
Computational Techniques in Quantum Chemistry and Molecular Physics

NATO ADVANCED STUDY INSTITUTES SERIES

Proceedings of the Advanced Study Institute Programme, which aims
at the dissemination of advanced knowledge and
the formation of contacts among scientists from different countries

The series is published by an international board of publishers in conjunction
with NATO Scientific Affairs Division

A Life Sciences           Plenum Publishing Corporation
B Physics                 London and New York

C Mathematical and        D. Reidel Publishing Company
  Physical Sciences       Dordrecht and Boston

D Behavioral and          Sijthoff International Publishing Company
  Social Sciences         Leiden

E Applied Sciences        Noordhoff International Publishing
                          Leiden

Series C - Mathematical and Physical Sciences

Volume 15 - Computational Techniques in Quantum Chemistry
and Molecular Physics

Computational Techniques
in Quantum Chemistry
and Molecular Physics

Proceedings of the NATO Advanced Study Institute
held at Ramsau, Germany, 4-21 September, 1974

edited by

G. H. F. DIERCKSEN, Munich
B. T. SUTCLIFFE, York
A. VEILLARD, Strasbourg

D. Reidel Publishing Company
Dordrecht-Holland / Boston-U.S.A.

Published in cooperation with NATO Scientific Affairs Division

Library of Congress Cataloging in Publication Data

NATO Advanced Study Institute on Computational Techniques in Quantum
Chemistry and Molecular Physics, Ramsau bei Berchtesgaden, Ger., 1974.
    Computational techniques in quantum chemistry and molecular
physics: [lectures]
    (NATO advanced study institutes series: Series C, mathematical
and physical sciences; v. 15)
    Bibliography: p.
    1. Electronic data processing-Molecules-Congresses.
    2. Electronic data processing-Quantum chemistry-Congresses.
    I. Diercksen, G. H. F. II. Title. III. Series.
    [DNLM: 1. Chemistry-Congresses. 2. Computers-Congresses.
    3. Physics-Congresses. 4. Quantum theory-Congresses.
    QD462A1 N279c 1974]
QC175.16.M6N2 1974
541'.28'02854
75-9913

ISBN-13: 978-94-010-1817-3
e-ISBN-13: 978-94-010-1815-9
DOI: 10.1007/978-94-010-1815-9

Published by D. Reidel Publishing Company,
P.O. Box 17, Dordrecht, Holland
Sold and distributed in the U.S.A., Canada, and Mexico
by D. Reidel Publishing Company, Inc.,
306 Dartmouth Street, Boston, Mass. 02116, U.S.A.

All Rights Reserved
Copyright 1975 by D. Reidel Publishing Company, Dordrecht
Softcover reprint of the hardcover 1st edition 1975
No part of this book may be reproduced in any form, by print, photoprint, microfilm,
or any other means, without written permission from the publisher

CONTENTS

PREFACE                                                         VII

B. T. Sutcliffe
FUNDAMENTALS OF COMPUTATIONAL QUANTUM CHEMISTRY                   1

G. H. F. Diercksen and W. P. Kraemer
FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE IN RELATION TO
QUANTUM CHEMICAL CALCULATIONS                                   107

A. Veillard
THE LOGIC OF SELF-CONSISTENT-FIELD PROCEDURES                   201

B. Roos
THE CONFIGURATION INTERACTION METHOD                            251

P. Swanstrøm and F. Hegelund
MOLECULAR PROPERTIES                                            299

V. R. Saunders
AN INTRODUCTION TO MOLECULAR INTEGRAL EVALUATION                347

N. C. Handy
CORRELATED WAVEFUNCTIONS                                        425

M. A. Robb
PAIR FUNCTIONS AND DIAGRAMMATIC PERTURBATION THEORY             435

R. McWeeny
SOME APPLICATIONS OF PROJECTION OPERATORS                       505

G. Winnewisser
MOLECULES IN ASTROPHYSICS                                       529

PREFACE

This book contains the transcripts of the lectures presented at the
NATO Advanced Study Institute on "Computational Techniques in
Quantum Chemistry and Molecular Physics", held at Ramsau, Germany,
4th - 21st September 1974.

Quantum theory was developed in the early decades of this century
and was first applied to problems in chemistry and molecular physics
as early as 1927. It soon emerged, however, that it was impossible
to consider any but the simplest systems in any quantitative detail,
because of the complexity of Schrödinger's equation, which is the
basic equation for chemical and molecular physics applications.
This remained the situation until the development, after 1950, of
electronic digital computers. It then became possible to attempt
approximate solutions of Schrödinger's equation for fairly
complicated systems, to yield results which were sufficiently
accurate to make comparison with experiment meaningful.

Starting in the early nineteen-sixties in the United States, at a
few centres with access to good computers, an enormous amount of
work went into the development and implementation of schemes for
approximate solution of Schrödinger's equation, particularly the
development of the Hartree-Fock self-consistent-field scheme. But
it was soon found that the integrals needed for application of the
methods to molecular problems are far from trivial to evaluate and
cannot be easily approximated. In the past five or so years,
however, big steps have been made in solving the integral
evaluation problem, and the field has progressed to such a stage
that it is generally accepted that the results of quantum
mechanical calculations are now sufficiently good to lead to a
better understanding of experimental results in chemistry and molecular


physics, and often to provide impetus for fresh experimental work.


The aim of the Institute was to familiarize young professionals in
the field with the current state of the art and to indicate to them
likely areas of advance in the near future. Basically the Institute
had three divisions: detailed instructional lectures given for the
whole period of the course, review lectures given in the last week
of the course, and problem solving and instructional sessions which
again were given throughout the course.

The Advanced Study Institute was financially sponsored by the NATO
Scientific Affairs Division. The installation of a computer terminal
system was made possible by a generous grant of IBM Germany.
Invaluable administrative assistance and the computer facilities
were supplied by the Max-Planck-Institute for Physics and
Astrophysics, München. The Organizing Committee wishes to express
its gratitude for this support. In particular we would like to thank
Dr. T. Kester (NATO, Brüssel), Dr. G. Hübner (IBM, Sindelfingen),
and Prof. Dr. L. Biermann (MPI, Munich) for their interest and
constant encouragement.

The editors would also like to thank the lecturers for their
co-operation in preparing the material that made this publication
possible.

The Institute itself was made possible by the enormous enthusiasm
of the students, lecturers and demonstrators on the course, and by
the untiring efforts of the administrative and technical staff and
of the service staff of the Alpenhotel Hochkater, Ramsau.
December 1974

The Organizing Committee


G.H.F. Diercksen, Munich
B.T. Sutcliffe, York
A. Veillard, Strasbourg

FUNDAMENTALS OF COMPUTATIONAL QUANTUM CHEMISTRY

B. T. Sutcliffe
Dept. of Chemistry, University of York, England

1. THE BASIC PROBLEM


The chief aim of this course is to describe practicable methods of
solving the eigen-value problem:

    H \Psi = E \Psi                                                  (1.1)

and the realisation of these methods on electronic digital
computers. Here E is one of a set of possible energies \{E_i\} and
\Psi one of a set of associated state functions \{\Psi_i\}. The
Hamiltonian operator H(1,2,\ldots,N) we shall take to be

    H(1,2,\ldots,N) = \sum_{i=1}^{N} h(i)
        + {\sum_{i,j=1}^{N}}' \frac{e^2}{4\pi\varepsilon_0 r_{ij}}   (1.2a)

    h(i) = -\frac{\hbar^2}{2m}\nabla^2(i)
        - \sum_{n=1}^{N_n} \frac{Z_n e^2}{4\pi\varepsilon_0 r_{in}}  (1.2b)

which describes (in conventional notation) the motion of N
electrons, moving in the field provided by N_n nuclei each with
charge Z_n, fixed in space, assuming only electrostatic
interactions. We shall generally quote this Hamiltonian in atomic
units, by quoting all distances as multiples of the fundamental
length a_0 = 4\pi\varepsilon_0\hbar^2/me^2.
Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 1-105.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


To a lesser extent we shall also be concerned with describing practicable methods of calculating expectation values of
operators, between the calculated state functions.
We shall not have much occasion in future to refer explicitly
to why we want to know how to solve these problems, so it seems
appropriate to describe the context at this stage, in the hope of
forestalling possible puzzlement or exasperation later.
We hope to be able to produce numbers from our calculations
which, at very least, can be compared with experimental numbers.
Better than this we hope to be able to anticipate numerically
the outcome of as yet unperformed experiments, and best of all,
we hope to have methods which will yield numbers so reliable as
to be useful alternatives to experimental measurement.
When we reflect on the fact that most experiments are done
on huge assemblies of molecules, (a tube of gas in a spectroscope,
a flask of liquid on a heating mantle and so on), and that these
assemblies are open, (if only by virtue of the intervention of
the measuring apparatus) and therefore developing in time, we may
wonder what possible relevance our simple isolated molecule, time
independent Hamiltonian, can have to an experimental situation.
The answer is, I think, that so far as we know, the solution of the
problem as we have stated it is a sine qua non of any progress
towards a complete description of the experimental situation.
Essentially we believe that the experimental situation can be
described by an equation of the form

    \tilde{H}\,\Psi = E\,\Psi                                        (1.3)

where \tilde{H} here is a complicated Hamiltonian describing the
whole system. We believe however that in systems where chemical
reaction is not taking place \tilde{H} can usefully be written as a sum of

isolated molecule Hamiltonians together with interaction terms.


In systems where the interactions are weak then the assembly
properties will be essentially those of the isolated molecules,
suitably averaged according to statistical mechanical principles.
Thus the solution of the isolated molecule problem here forms a
natural starting point.
In reacting systems and systems subject to strong time
dependent interactions, the situation is by no means so clear.
However it is perhaps not too much to assert that even in these
situations, it is extremely likely that isolated molecule
functions will play an important role in the explanation of the
overall behaviour of the system.
However the Hamiltonian that we have written down in (1.2) is not
the full Hamiltonian even for the isolated molecule, and
unfortunately we are not completely sure of what the full
Hamiltonian in fact is. This is essentially because the "classical"
Hamiltonian for the problem, though known, is inadequate, but we
are not sure of how we should construct properly the relativistic
one. We believe however that the classical Hamiltonian represents
the leading term (in most situations) of the full Hamiltonian.
We further believe that

most of the other terms can be taken care of by allowing every
particle to have spin, and by associating with the spin a magnetic
moment operator

    \mu(n) = g_n \frac{e}{2m_n}\, S(n)                               (1.4)

where S(n) is a spin operator appropriate to the n'th particle, and
g_n is assigned from experiment. We then add to the Hamiltonian all
those extra terms which we would expect to arise classically from
the presence of these extra magnetic moments.
But (1.2) still does not fully represent the isolated
molecule, even in the leading term approximation, since it lacks


terms which would describe the nuclear motion. However in many
cases the electronic behaviour is of paramount interest, and it
turns out that often this can be well described with a solution of
(1.2) providing a suitable fixed disposition of the nuclei is taken.
From this discussion I hope that it is clear that the task that we
have set ourselves, namely that of solving (1.1) using (1.2) as an
approximation to \tilde{H}, produces results which are still a long
way from being directly usable to achieve our desired end. The
solution of (1.1) in our case, though difficult enough, is simply
the start of a hard road to observation.
Now what I should like to do in this series of lectures is the
following. First of all I should like to consider how (1.2a) arises
from the full "classical" Schrödinger Hamiltonian, that is, I should
like to consider the problem of removing nuclear motion, without
considering spin or relativistic interactions. The object of this
section will be to indicate how we might "correct" our solutions to
(1.1) for the presence of nuclear motion.

Next I want to consider in a rather abstract mathematical way what
we know about the properties of the solutions of (1.1) and of the
solutions of some straightforward extensions to (1.1) (to cover the
case of including fields and simple spin-field interactions). The
object of this section is two-fold: to see what kind of guidelines
we can get, just from the problem itself, to the actual construction
of approximate solutions, and to examine what kind of terms we could
in principle actually put into the Hamiltonian and still get a
well-behaved operator. We are interested in this aspect of the
problem because we must consider some pretty weird and wonderful
operators if we intend to "correct" our solutions to (1.1) for
relativistic effects, or to calculate essentially relativistic
properties (like ESR or NMR coupling constants) from our wave
function.


We shall not go on directly to consider the problem of relativistic
corrections but will turn instead to the principles of constructing
approximate solutions to (1.1) in the light of what has been found
in the previous sections, and we shall confine our attention
completely to the problem of approximation for bound states, and
come fairly quickly to the nitty-gritty of the mathematics necessary
in actually constructing and computing approximate wave functions
of various kinds.

Then in the context of some specific approximations we shall finally
consider the problems of calculating properties, estimating bounds
and correcting calculations for comparison with experiment. In this
context we shall consider relativistic effects, but not in any
detail statistical mechanical problems.
In the first few sections I shall in fact be concerned with problems
which require for their solution extreme mathematical
sophistication, and a detailed account of the mathematics involved
would, I think, be quite inappropriate, and in large measure
unnecessary. I shall therefore content myself largely with results,
referring to books and papers for more detailed considerations. The
book that I have found most helpful in putting in useful perspective
the kind of problems that we shall be considering is E. C. Kemble,
"The Fundamental Principles of Quantum Mechanics" (Dover, 1937). It
is rather old now and needs to be supplemented in a few places with
modern results, and the book which I have found helpful for some
such results is G. Hellwig, "Differential Operators of Mathematical
Physics" (Addison-Wesley, 1967). It is a bit difficult to find the
results that one needs in it, but many of them are there, if you can
wade through the rather abstract maths. There is also a useful
review paper (which is unfortunately rather badly proof-read and not
always too clear) by Kato (Supp. Prog. Theoret. Phys. 40, 2 (1967))
which summarises much of the progress in the field up to 1966.

2. THE NUCLEAR MOTION PROBLEM

As an introduction to this may I remind you what steps we actually
go through in solving the hydrogen atom problem, a problem we can
solve exactly.
The full hydrogen-like atom Hamiltonian is

    H(1,2) = -\frac{\hbar^2}{2m_1}\nabla^2(1)
             - \frac{\hbar^2}{2m}\nabla^2(2)
             - \frac{Ze^2}{4\pi\varepsilon_0 r_{12}}                 (2.1)

and the first thing that we do is either to separate off the
translational motion in centre of mass co-ordinates, or to assume
the nuclear mass, m_1, infinite and so fix the nucleus as origin of
co-ordinates. Now apart from any physical reasons for doing one of
these we do it for a good mathematical reason, which we can explain
as follows:
We are going to seek eigen-solutions of the differential equation

    H(1,2)\,\Psi = E\,\Psi                                           (2.2)

with the \Psi in some domain which we must specify in detail.

Now it is easy to see that H(1,2) commutes with the translation
operator T(a), which maps x_i \to x_i + a etc., so that the \Psi
may be chosen to be simultaneous eigen-functions of T(a); but the
operator T(a) has no proper eigen-functions at all, in the sense
that its spectrum is completely continuous. This can easily be seen
from a one dimensional example: using t(a_x), then

    t(a_x)\, e^{kx} = e^{k(x + a_x)} = e^{a_x k}\, e^{kx}            (2.3)

so the "eigenvalue" is e^{a_x k} for arbitrary a_x.
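The point of (2.3) can be checked numerically; in the sketch below (Python with numpy; the values of k and the shifts a_x are arbitrary choices of the example) the shifted function is everywhere a constant multiple e^{a_x k} of the original, for every shift tried:

```python
import numpy as np

# Numerical sketch of (2.3): translating exp(k*x) by a_x multiplies it
# by the constant exp(a_x * k); the "eigenvalue" varies continuously
# with a_x, which is the continuous spectrum referred to in the text.
k = 0.7
x = np.linspace(0.0, 1.0, 101)
f = np.exp(k * x)

for a_x in (0.1, 0.25, 1.3):          # arbitrary shifts
    shifted = np.exp(k * (x + a_x))   # t(a_x) f, evaluated exactly
    ratio = shifted / f               # constant: the "eigenvalue"
    assert np.allclose(ratio, np.exp(a_x * k))
print("eigenvalue exp(a_x * k) reproduced for every shift tried")
```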


Thus in one sense (to be discussed later) the problem including


nuclear motion has no eigen-solutions at all. Now in fact this is an
important and quite general point (it holds for all atoms and
molecules): unless by some means or another we get rid of the centre
of mass motion of a quantum mechanical problem, we cannot proceed
any further with the solution of the problem in differential
equation form. This means, as we shall see in a moment, that we
cannot proceed with the associated variational approach either.


If we make, as is usual, the fixed nucleus assumption, then we get
in place of our above Hamiltonian

    -\frac{\hbar^2}{2m}\nabla^2 - \frac{Ze^2}{4\pi\varepsilon_0 r}   (2.3a)

using the nucleus as the origin of co-ordinates, which is obviously
a special case of the general problem that we are going to deal
with in this course.
It is interesting to compare this equation with the one we get if
we separate off centre of mass motion. To do this let us choose the
co-ordinate system, which we can express generally for any N_n
nucleus, N electron system, by the relations

    X = M^{-1}\Big(\sum_{n=1}^{N_n} m_n x_n + m\sum_{i=1}^{N} x_i\Big),
    x'_n = x_n - x_1, \quad n = 2,3,\ldots,N_n,
    x'_i = x_i - x_1, \quad i = 1,2,\ldots,N                         (2.4)

with M = \sum_{n=1}^{N_n} m_n + Nm, and similar expressions for Y,
Z, y'_j, z'_j (see e.g. Kemble Section 15). In the case of an atom
or ion N_n = 1, so the particle to which all others can be referred
is the nucleus. For a molecule there are of course N_n choices of
reference point. I should point out however that there is no need
to use the nucleus or one of the nuclei as a reference point;
technically what follows can be done


using any particle as a reference point. Indeed it is a very easy
matter to show that a separation of the kind given below can be
effected by any choice of co-ordinate system r'_i given by

    r'_i = \sum_{j} a_{ij}\, r_j, \qquad \sum_{j} a_{ij} = 0,
    \qquad i = 2,3,\ldots                                            (2.5)

but to return to our choice (2.4): we get, after a bit of
manipulation, the Hamiltonian for the hydrogen-like atom

    -\frac{\hbar^2}{2M}\nabla^2(R) - \frac{\hbar^2}{2m}\nabla^2
        - \frac{\hbar^2}{2m_1}\nabla^2
        - \frac{Ze^2}{4\pi\varepsilon_0 r}
    = -\frac{\hbar^2}{2M}\nabla^2(R) - \frac{\hbar^2}{2\mu}\nabla^2
        - \frac{Ze^2}{4\pi\varepsilon_0 r}                           (2.6)

where \mu = m m_1/(m_1 + m), m_1 is the nuclear mass, and r is the
nucleus-electron radial variable.

The operator \nabla^2(R) is in the centre of mass co-ordinates X, Y
and Z; it follows at once that a solution to (2.2) with (2.6) as
the Hamiltonian can be written as

    \Psi(R, r) = T(R)\,\psi(r)                                       (2.7)

where

    -\frac{\hbar^2}{2M}\nabla^2(R)\, T(R) = E_T\, T(R)               (2.8a)

and

    \Big(-\frac{\hbar^2}{2\mu}\nabla^2
        - \frac{Ze^2}{4\pi\varepsilon_0 r}\Big)\psi(r)
    = E'\,\psi(r)                                                    (2.3b)

Formal functions which satisfy (2.8a) are, of course:

    T(R) = N(k)\, e^{i\mathbf{k}\cdot\mathbf{R}}                     (2.9a)

    E_T = \frac{\hbar^2 k^2}{2M}                                     (2.9b)

where 0 \le |k|^2 < \infty.
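That every real k gives an acceptable T(R) is just the continuous spectrum met above. A one-dimensional finite-difference sketch (Python with numpy; the mass, wave number and grid are illustrative choices, with \hbar = 1) checks that the plane wave (2.9a) satisfies (2.8a) with the energy (2.9b):

```python
import numpy as np

# Check that T(R) = exp(i k R) satisfies -(1/2M) T'' = (k^2/2M) T,
# i.e. eqs. (2.8a)/(2.9b) in one dimension, for an arbitrary real k.
M = 1837.0           # total mass, illustrative (hydrogen-like, a.u.)
k = 2.5              # arbitrary wave number
h = 1e-3             # spacing for the central-difference derivative
R = np.linspace(-1.0, 1.0, 2001)

T = np.exp(1j * k * R)
T_plus = np.exp(1j * k * (R + h))
T_minus = np.exp(1j * k * (R - h))
second_deriv = (T_plus - 2.0 * T + T_minus) / h**2

E_T = k**2 / (2.0 * M)                # eq. (2.9b) with hbar = 1
lhs = -second_deriv / (2.0 * M)
assert np.allclose(lhs, E_T * T, atol=1e-8)
print("plane wave satisfies (2.8a) with E_T = k^2 / 2M")
```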

The relationship between (2.3a) and (2.3b) is a perfectly
straightforward one: it is, as we shall show, a simple scaling
relation, but in this very simple case it is obvious at once that
the two problems are essentially equivalent. Unfortunately this is
not the case in the many electron atom or molecule. Let us
concentrate for simplicity on the many electron atom problem; here
the fixed nucleus Hamiltonian is:

    H_1(1,2,\ldots,N) = \sum_{i=1}^{N}\Big(
        -\frac{\hbar^2}{2m}\nabla^2(i)
        - \frac{Ze^2}{4\pi\varepsilon_0 r_i}\Big)
        + {\sum_{i,j=1}^{N}}'\frac{e^2}{4\pi\varepsilon_0 r_{ij}}    (2.10a)

while it can readily be shown that the Hamiltonian equivalent to
(2.3b) is:

    H_2 = \sum_{i=1}^{N}\Big(
        -\frac{\hbar^2}{2\mu}\nabla^2(i)
        - \frac{Ze^2}{4\pi\varepsilon_0 r_i}\Big)
        + {\sum_{i,j=1}^{N}}'\frac{e^2}{4\pi\varepsilon_0 r_{ij}}
        + H_2''                                                      (2.10b)

where H_2'' = -(\hbar^2/2m_1){\sum_{i,j=1}^{N}}'\nabla(i)\cdot\nabla(j),
and we write H_2' = H_2 - H_2''.

Now it is very easy to show that H_1 and H_2' differ only by a
change of scale. This can be shown by replacing the variables r in
H_1 by r' = r/a_0 and in H_2' by the variables r' = r/a_0', where

    a_0 = 4\pi\varepsilon_0\hbar^2/me^2                              (2.11a)

and

    a_0' = 4\pi\varepsilon_0\hbar^2/\mu e^2                          (2.11b)

in which case both operators reduce to operators of the form

    b_0\Big[\sum_{i=1}^{N}\Big(-\tfrac{1}{2}\nabla^2(i)
        - Z/r_i'\Big)
        + {\sum_{i,j=1}^{N}}'\frac{1}{r_{ij}'}\Big]                  (2.12)

with b_0 for the equation arising from H_1 being

    b_0 = e^2/4\pi\varepsilon_0 a_0                                  (2.13a)

and in the equation arising from H_2' being (2.12) but with b_0'
replacing b_0,

    b_0' = e^2/4\pi\varepsilon_0 a_0'                                (2.13b)
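The two scale factors are tied together through \mu alone: a_0' = a_0(m/\mu) and b_0' = b_0(\mu/m). A small numeric sketch (Python; the masses, in multiples of the electron mass, are illustrative inputs of the example) for the hydrogen ground state:

```python
# Scaling check for (2.11)/(2.13): with mu = m*m1/(m1 + m),
# a0' = a0 * (m/mu) and b0' = b0 * (mu/m).
m = 1.0          # electron mass (atomic units)
m1 = 1836.0      # nuclear mass in multiples of m (illustrative)
mu = m * m1 / (m1 + m)

a0 = 1.0                     # bohr (unit of length)
b0 = 1.0                     # hartree (unit of energy)
a0_prime = a0 * m / mu       # larger: lengths stretch by m/mu
b0_prime = b0 * mu / m       # smaller: energies shrink by mu/m

# ground-state energy -Z^2/2, in units of b0 or b0' respectively
E_fixed = -0.5 * b0          # fixed-nucleus value
E_com = -0.5 * b0_prime      # centre-of-mass-removed value
print(E_com / E_fixed)       # mu/m = 1836/1837
```

The product a_0' b_0' equals a_0 b_0, since the two factors are reciprocal.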

Thus if we were to solve the problem specified by (2.12) in terms
of an energy expressed in terms of b_0 (say) as a unit, then the
solutions would be solutions to the fixed nucleus problem (with
position variables in units of a_0, energy in units of b_0), and
these solutions could be converted to solutions of the problem
specified by H_2' simply by multiplying the position variables in
any solution function by m/\mu and multiplying the energy by
\mu/m, where \mu and m are calculated in a system of units
consistent with a_0 being the unit of length and b_0 the unit of
energy. Such a system of units is of course the Hartree system of
atomic units, and it is easy to see that in this unit system we
have effectively e = 1, m = 1, \hbar = 1, 4\pi\varepsilon_0 = 1,
and thus on this system m/\mu = (1 + m_1)/m_1 where m_1 is measured
in multiples of m, so that for the hydrogen atom
m/\mu \approx 1837/1836.

Because of this scaling property of atomic units and because of
their general convenience we shall, as is common practice, use them
throughout the course from now on. The relationship of these units
to S.I. units is that

    a_0 \approx 5.2918 \times 10^{-11}\ \mathrm{m}                   (2.14a)

and

    b_0 \approx 4.3598 \times 10^{-18}\ \mathrm{J}                   (2.14b)
The unit b_0 is often called the hartree, and a_0 of course is the
bohr radius. Sometimes b_0 is quoted in electron volts,
b_0 \approx 27.2 eV, and sometimes it is multiplied by Avogadro's
number to give energy per mole, and as such it is roughly
2625.5 kJ mol^{-1} or 628 kcal mol^{-1}.
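These conversions can be checked directly from the SI values of \hbar, m_e, e and \varepsilon_0; the sketch below (Python; the 2018 CODATA constants used are an assumption of the example, not values quoted in the text) recovers (2.14a), (2.14b) and the derived figures:

```python
import math

hbar = 1.054571817e-34   # J s
m_e = 9.1093837015e-31   # kg
e = 1.602176634e-19      # C
eps0 = 8.8541878128e-12  # F/m
N_A = 6.02214076e23      # 1/mol

a0 = 4.0 * math.pi * eps0 * hbar**2 / (m_e * e**2)   # bohr radius, m
b0 = e**2 / (4.0 * math.pi * eps0 * a0)              # hartree, J

print(a0)                  # ~5.29e-11 m, eq. (2.14a)
print(b0)                  # ~4.36e-18 J, eq. (2.14b)
print(b0 / e)              # ~27.2 eV
print(b0 * N_A / 1000.0)   # ~2625.5 kJ/mol
```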
However even if we work in atomic units, the fixed nucleus problem
in the many electron case is not equivalent to the centre of mass
removed problem, because of the presence of the term H_2'', which
in atomic units is

    H_2'' = -\frac{1}{2m_1}{\sum_{i,j=1}^{N}}'
        \nabla(i)\cdot\nabla(j)                                      (2.15)

where m_1 is a pure number which is at least 1836. Since m_1 is so
big, even in the worst case, we can, at least initially, think of
this term as "small", and regard any solution to the fixed nucleus
problem for an atom, when appropriately scaled, as an approximate
solution to the centre of mass removed problem.
Now there is no essential difference in the discussion of the
removal of centre of mass motion for molecules from that shown for
atoms. However if we use the same method in a molecule the choice
of particle 1 is an arbitrary choice, and the solutions of the
fixed nucleus problem scale into solutions of the centre of mass
removed problem only on neglect of terms like those occurring in
H_2'' but involving nucleus-nucleus and nucleus-electron operator
pairs, and it is obviously much more problematic deciding whether
or not these terms are "small". But this is not our principal
difficulty in connection with molecules. If you look at (2.10a)


and (2.10b) you will see that they are both invariant to any
orthogonal co-ordinate transformation of the form that maps vectors
in the co-ordinate space:

    \mathbf{r}_i \to \mathbf{r}'_i \quad\text{where}\quad
    \mathbf{r}'_i = \hat{R}\,\mathbf{r}_i \quad\text{for}\quad
    i = 1,2,\ldots,N                                                 (2.16)

and in cartesian co-ordinates the operator \hat{R} has the 3 by 3
orthogonal matrix representative R, such that if

    \mathbf{r}_i = (\mathbf{i}\;\mathbf{j}\;\mathbf{k})
    \begin{pmatrix} x_i\\ y_i\\ z_i \end{pmatrix}                    (2.17)

with \mathbf{i}, \mathbf{j} and \mathbf{k} the cartesian unit
vector set, then

    \mathbf{r}'_i = (\mathbf{i}\;\mathbf{j}\;\mathbf{k})\, R
    \begin{pmatrix} x_i\\ y_i\\ z_i \end{pmatrix}                    (2.18)

and

    R^{T} R = R R^{T} = \mathbf{1}                                   (2.19)

where \mathbf{1} is a unit matrix and the superscript T on R
denotes a matrix transpose.
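Because R preserves lengths, every interparticle distance r_ij, and with it each Coulomb term of (2.10a) and (2.10b), is unchanged. A small numpy sketch (the rotation angle and particle positions are illustrative):

```python
import numpy as np

# An orthogonal R (rotation by 0.4 rad about z) satisfies (2.19)
# and leaves all interparticle distances, hence the Coulomb terms
# of the Hamiltonian, unchanged.
t = 0.4
R = np.array([[np.cos(t), -np.sin(t), 0.0],
              [np.sin(t),  np.cos(t), 0.0],
              [0.0,        0.0,       1.0]])

assert np.allclose(R.T @ R, np.eye(3))        # eq. (2.19)

r = np.array([[0.0, 0.0, 0.0],                # particle positions (bohr)
              [1.0, 0.2, -0.3],
              [-0.5, 0.9, 0.4]])
r_prime = r @ R.T                             # r_i' = R r_i for each i

for i in range(len(r)):
    for j in range(i + 1, len(r)):
        d = np.linalg.norm(r[i] - r[j])
        d_prime = np.linalg.norm(r_prime[i] - r_prime[j])
        assert abs(d - d_prime) < 1e-12
print("distances, and so the 1/r_ij terms, are invariant under R")
```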

Typically we think of the operators \hat{R} as rotations,
reflections and the inversion, and we think of this invariance as
reflected in the fact that we can classify atomic states according
to their L and M values (angular-momentum quantum numbers) and by
their parity, and we see nothing surprising in this. However if we
turn to molecules it is very easy to see that the fixed nucleus
Hamiltonian is not invariant under all such operators, precisely
because the nuclei are fixed and define the co-ordinate frame,
while the Hamiltonian in the centre-of-mass-removed system is
invariant under the orthogonal mapping of all electronic and
nuclear variables.

To get ahead a little in logical exposition, but not to say
anything that you are not already familiar with, this means that
the eigen-solutions (if any) of the fixed nucleus problem carry
only representations of the fixed nucleus point group, while those
(again if any) of the centre of mass removed system carry
representations of the full rotation-reflection-inversion group,
of which the point group is only a sub-group.
Now what this means technically is this: we are free in the case of
the centre of mass removed Hamiltonian to make a further choice of
co-ordinate system for the molecule (and indeed we can for the atom
too if we so desire) such that it is in some way embedded in the
molecule. If we then express the Hamiltonian in that co-ordinate
system (the so-called rotating co-ordinate system) then we find
that we have a Hamiltonian involving two distinct types of
variables. One set of variables describes what is essentially the
internal motion of a non-rotating system, and there are three less
of these variables than the total number of variables in the
problem, and the other set of three variables describes the overall
rotation of the system. Now the actual process of performing this
"separation off of rotation" is tedious and complicated in the
extreme (some discussion of it can be found in section 35 of
Kemble), but the point of interest here is that for very good
technical reasons the internal motion equation has so far resisted
solution in any situation other than a simple diatomic molecule
(see e.g. Hunter and Pritchard, J.C.P. 41, 121, 1967, or Kolos and
Wolniewicz, R.M.P. 55, 1963) and there is good reason to believe
that it always will be intractable.


Avoiding the detailed maths of why this is, one can perhaps
understand it qualitatively by pointing out that one makes no
assumption about rigidity in embedding the rotating co-ordinate
system in the molecule, but defines it with respect to freely
moving particles. This means that it is possible for the particles
to take up a configuration which in fact precludes the possibility
of a unique definition of an axis system (for example the three
defining particles could lie on a straight line). This possibility
shows itself up in essential singularities in the operators written
in this system, and it appears to be impossible to avoid these
except in the diatomic case.
As if this were not enough there is also another difficulty. If we
make the fixed nucleus approximation then by hypothesis we are
distinguishing between nuclei, so that we can label otherwise
identical nuclei by their spatial positions. However in the case of
the centre-of-mass-removed Hamiltonian for molecules we cannot,
even in principle, perform such a labelling, so that our
eigen-solutions of that problem (again, if any) must carry
appropriate representations of the symmetric group of the identical
particles, under the operators of which the centre-of-mass-removed
Hamiltonian is invariant.
Now in the centre-of-mass-removed Hamiltonian in general, a
permutation of particle variables shows up as a mapping of the
arbitrary variable in the problem into a linear combination of the
other variables. In certain circumstances, as for instance when
there is a unique centre in the problem (as in the case of NH3 or
H2O), it is possible to choose a separation system where this is
not the case, but it does not appear that this can be done
generally. This means that in practice it is fiendishly difficult
to construct even an approximate solution to the problem which is
properly symmetry adapted as far as the symmetric group is
concerned, and furthermore fiendishly difficult to evaluate any
potential energy integrals. Needless to say, perhaps, if you try to
separate off rotation the problem gets even worse, and one finds
that one simply cannot avoid permutations of the variables which
re-define the orientation of the rotating co-ordinate system.
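The way a permutation turns into a linear combination can be seen already with the particle-1-referenced co-ordinates of (2.4). In the three-particle sketch below (Python with numpy; the particle labels and random positions are illustrative), permuting particles 1 and 2 sends the relative co-ordinates (t2, t3) to (-t2, t3 - t2):

```python
import numpy as np

# Three particles; translation-free co-ordinates referred to
# particle 1 (cf. (2.4)): t2 = r2 - r1, t3 = r3 - r1.
rng = np.random.default_rng(0)
r1, r2, r3 = rng.standard_normal((3, 3))

t2, t3 = r2 - r1, r3 - r1

# Permute particles 1 and 2, then rebuild the relative co-ordinates:
p1, p2, p3 = r2, r1, r3
u2, u3 = p2 - p1, p3 - p1

# The permutation acts linearly on (t2, t3): u2 = -t2, u3 = t3 - t2
assert np.allclose(u2, -t2)
assert np.allclose(u3, t3 - t2)
print("P(12) maps (t2, t3) -> (-t2, t3 - t2): a linear combination")
```

A permutation not involving the reference particle (here, 2 and 3) would merely swap t2 and t3; it is permutations touching the reference that mix the variables.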


On top of all this we have the problem that in reasonable sized
systems the Hamiltonian including nuclear motion may well have
among its eigen-states not only isolated molecule states, but
states representing all the sub-divisions of the system that are
possible, that is, all possible dissociation products, and this in
itself poses enormous problems in the construction of adequate
approximate wave functions.

In the light of this discussion we see that, at present at any
rate, solving the general centre-of-mass removed problem is simply
not a possibility open to us in molecules, as it is in atoms, and
it is therefore rather important to relate, if possible, the fixed
nucleus problem to the centre of mass removed problem, if we wish
to utilise our solutions in any other than pseudo-static situations
and relate them meaningfully to experimentally observed quantities.
We have seen that we cannot in any useful way think of the fixed
nucleus solutions simply being "scaled" to centre-of-mass removed
solutions, and indeed there is no easy way in which we can convert
its solutions into solutions of the centre-of-mass-removed problem.
It should be noted that by adopting the convention that the
co-ordinate system is centred on the nucleus in the fixed nucleus
atomic problem, we have effectively removed three variables from
the problem and hence removed the translational invariance from
our wave function there.

In the fixed nucleus molecular problem the situation is somewhat
different however; here the nuclear attraction term in our
Hamiltonian has to be written as (cf. (1.2b))

    -\sum_{n=1}^{N_n} \frac{Z_n}{|\mathbf{r}_i - \mathbf{R}_n|}      (2.20)

and thus apparently our Hamiltonian still contains 3(N + N_n)
variables. Now of course the \mathbf{R}_n are in fact fixed in any
given calculation, and they are not really to be regarded as
variables of the problem but as parameters.
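Treating the R_n as parameters means that (2.20) defines a different electronic potential for every nuclear arrangement. A minimal sketch (Python; the charges and geometries are illustrative inputs, not taken from the text):

```python
import math

def nuclear_attraction(r_e, nuclei):
    """-sum_n Z_n / |r_e - R_n| in atomic units (cf. (2.20));
    the nuclear positions R_n enter only as fixed parameters."""
    total = 0.0
    for Z, R in nuclei:
        total -= Z / math.dist(r_e, R)
    return total

# the same electron position, two different (parametric) geometries
electron = (0.0, 0.0, 0.5)
geometry_a = [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.4))]  # H2-like
geometry_b = [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 2.0))]

print(nuclear_attraction(electron, geometry_a))
print(nuclear_attraction(electron, geometry_b))  # a different potential
```

Varying the geometry entries while re-solving the electronic problem is exactly the sense in which the R_n are "varied" in the optimisations mentioned below.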

Nevertheless we do in certain circumstances vary them (approximate
calculations to optimise nuclear positions are a commonplace) and
when we do, we recognise, almost without reflection, the need to
avoid uniform translations, and so we keep one nuclear parameter
fixed. Even if we do this however we still have enough freedom to
describe a rigid rotation of the system, and indeed if we performed
"fixed-nucleus" calculations for all non-translational dispositions
of the nuclei, but including rigid rotational dispositions, we could
obtain in principle a solution \Phi(\mathbf{r}, \mathbf{R}) which
would have the correct rotational symmetry for the
centre-of-mass-removed Hamiltonian. In practice of course it would
be pretty close to impossible to construct such a solution.

In this context it is also worth noting that approximate wave
functions constructed from orbitals centred on the various nuclei
of the fixed nucleus problem do (in the same sense as above)
contain all variables, and would therefore be very awkward to use
in the construction of functions like \Phi(\mathbf{r}, \mathbf{R}).

It should be clear to you now that though instinctively one may
think that the fixed nucleus Hamiltonian is easily related to the
Hamiltonian that arises in the Born-Oppenheimer approach, or in the
adiabatic approximation in Born's later approach (see Born and
Huang, "The Dynamical Theory of Crystal Lattices", App. VIII,
Oxford, 1954), this is simply not the case. It should of course be
noted that rigorous separation off of nuclear motion is not
possible in the presence of fields, nor is it possible in
Hamiltonians whose potentials are "velocity dependent", even in the
absence of fields. However the problems presented by these
possibilities are, relatively speaking, trivial and can be managed
if the major problem is considered solved.

FUNDAMENTALS OF COMPUTATIONAL QUANTUM CHEMISTRY

There seems to be no direct way in which these difficulties can be
resolved; that is to say, there seems to be no way in which,
starting from the quantum mechanical Hamiltonian in space fixed
co-ordinates, we can pass in a satisfactory manner to an equation
which exhibits clearly the status of the fixed nucleus
Hamiltonians as the limit of some well defined problem involving
nuclear and electronic motion.

This being the case there are some advantages to setting out the
best that is presently possible (albeit an unsatisfactory best)
in the context of the way that the molecular Hamiltonian is
usually derived by those interested in molecular spectroscopy.
I shall try and outline this now, leaning very heavily on the
account given by Howard and Moss in Mol. Phys. 19, 433, 1970
(see also Moss, "Advanced Molecular Quantum Mechanics", Chapman
and Hall, 1973, Ch. 10), which in my opinion is the best and most
satisfactory account so far offered.

Actually the approach offered by Moss also makes it possible to
include relativistic effects and external electro-magnetic fields.
It will not however deal properly with molecules whose "rigid"
configuration is linear, but it can be modified in a
straightforward fashion to cope with even this case (see Howard
and Moss, Mol. Phys. 20, 147, 1971).

I shall not attempt to describe the intimate details of the
approach, but merely attempt a summary which I hope will be useful
at least in indicating the tricky and non-obvious features. One
starts off by writing down a Hamiltonian in space fixed
co-ordinates which includes relativistic corrections up to terms
of order $1/c^2$ and external fields. This Hamiltonian is derived,
for both electrons and nuclei, by analogy with the way that Foldy
and Wouthuysen derived the Pauli equation from the Breit equation
for a pair of electrons.

Now there are many difficulties and uncertainties in this
procedure; some of them Moss discusses in his book, and you can
also find a discussion of them in McWeeny and Sutcliffe, "Methods
of Molecular Quantum Mechanics", Academic Press, 1969, App. 4.
It is perhaps sufficient to say here that this Hamiltonian is,
whatever the uncertainties, the best we can do and at least it is
a plausible one.

For our purposes it is sufficient to note that in order to get
anything at all you have to work in space fixed co-ordinates and
your initial approach has to be quantum mechanical, or else you do
not get the proper spin dependence in. It is impossible to start
in a localised co-ordinate system and with a classical Lagrangian
or Hamiltonian.

Having got this Hamiltonian it is possible to write it in a rather
nifty tensor operator form, taking up all the spin dependent terms
into the base tensors involved with particle momenta or into the
potential. Having done this it is possible to convert the quantum
mechanical Hamiltonian into Lagrangian form and hence to derive a
Lagrangian for the problem, and from this an equivalent
"classical" Lagrangian. In other words, to work in exactly the
reverse order from that which we usually do in establishing
quantum mechanical equations.
Having got a classical Lagrangian for the problem we are then in
business, for one can then take over, practically unchanged, the
usual vibration spectroscopic arguments such as are found for
example in Ch. 11 of Wilson, Decius and Cross, "Molecular
Vibrations", McGraw Hill, 1955. (For a rather more up to date
analysis see Watson, Mol. Phys. 15, 479, 1968).

To describe what is done in a little more detail, first a centre
of mass transformation is performed such that the co-ordinates in
the space fixed system $\vec{R}_A$ are given by

$$\vec{R}_A = \vec{R} + \vec{r}_A + \vec{\delta} \qquad (2.21)$$

where $\vec{R}$ is the centre of mass vector

$$\vec{R} = \sum_A m_A \vec{R}_A / M \qquad (2.22)$$

(the sum going over all electrons and nuclei) and where

$$\vec{\delta} = -\frac{m}{M}\sum_i \vec{r}_i \qquad (2.23)$$

where $m$ is the electron mass and $M$ the total molecular mass
($M = M_{nuc} + Nm$), and where $\vec{r}_i$ is an electronic
position vector, measured relative to the centre of nuclear mass,
in a molecule fixed system parallel to the space fixed system.
The transformation (2.21) is completely specified by requiring
that $\sum_A m_A(\vec{R}_A - \vec{R}) = 0$ and hence that

$$\sum_A m_A(\vec{r}_A + \vec{\delta}) = 0 \qquad (2.24)$$

Notice that this is essentially a classical procedure. We are not
free to do this kind of thing in quantum mechanics. Equation
(2.24) implies at once that

$$\sum_n m_n \vec{r}_n = 0 \qquad (2.25)$$

where the sum (2.25) goes over all nuclei. The conditions (2.24)
and (2.25) define the electron position vectors $\vec{r}_i$ and
the nuclear position vectors $\vec{r}_n$ completely.
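As a check on this piece of bookkeeping, the sketch below (the code and all the numbers in it are mine, not part of the text) generates nuclear positions satisfying (2.25), forms $\vec{\delta}$ according to (2.23), and verifies numerically that the over-all condition (2.24) then holds for electrons and nuclei together.

```python
import random

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def vscale(c, a):
    return [c * x for x in a]

random.seed(1)
m_e = 1.0                          # electron mass (arbitrary units)
m_nuc = [1836.0, 2500.0, 3600.0]   # hypothetical nuclear masses

# Electronic position vectors r_i: arbitrary.
r_e = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

# Nuclear positions built to satisfy (2.25): sum_n m_n r_n = 0.
r_n = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
r_n.append(vscale(-1.0 / m_nuc[-1],
                  [sum(m * r[k] for m, r in zip(m_nuc, r_n))
                   for k in range(3)]))

M = m_e * len(r_e) + sum(m_nuc)    # total mass, M = M_nuc + N m

# delta from (2.23): delta = -(m/M) sum_i r_i
delta = vscale(-m_e / M, [sum(r[k] for r in r_e) for k in range(3)])

# (2.24): the sum over ALL particles of m_A (r_A + delta) should vanish.
total = [0.0, 0.0, 0.0]
for r in r_e:
    total = vadd(total, vscale(m_e, vadd(r, delta)))
for m, r in zip(m_nuc, r_n):
    total = vadd(total, vscale(m, vadd(r, delta)))

residual = max(abs(x) for x in total)
print(residual)   # effectively zero: (2.24) is automatic
```

The point of the exercise is that, once (2.23) and (2.25) hold, (2.24) is an identity; the residual is pure round-off.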

We now suppose that it is possible in some way to embed an axis
frame in the system, which somehow rotates with the system with
angular velocity $\vec{\omega}$. By perfectly standard methods it
then follows that the velocity of a particle with respect to the
space fixed frame, $\vec{V}_A$, can be written as

$$\vec{V}_A = \vec{V} + \vec{v}_\delta + \vec{\omega}\times(\vec{r}_A + \vec{\delta}) + \vec{v}_A \qquad (2.26)$$

Here $\vec{V}$ is the centre of mass velocity, $\vec{v}_\delta$ is
$d\vec{\delta}/dt$ and $\vec{v}_A$ is the velocity of the particle
as measured in the rotating frame. Now it should be stressed that
this is a vector relation, so we need not specify in detail the
co-ordinate system in which $\vec{\omega}$, $\vec{r}_A$ or
$\vec{\delta}$ are to be described, but clearly if we take
$\vec{r}_A$ in the non-rotating system then $\vec{\omega}$ must be
in that system also.


To specify the rotating frame we follow Eckart (Phys. Rev. 47,
552, 1935) and postulate a set of constant reference vectors
$\vec{r}_n^{\,0}$, one for each nucleus, such that

$$\sum_n m_n\,\vec{r}_n^{\,0}\times\vec{v}_n = 0 \qquad (2.27)$$

where $\vec{v}_n$ is the nuclear velocity in the rotating frame.

Now again, since (2.26) is a vector relation, we can regard the
co-ordinate system as arbitrary, but since it is natural to regard
$\vec{v}_n$, because of its definition, as being expressed in the
rotating frame, then it is natural also to regard the
$\vec{r}_n^{\,0}$ as being expressed in the rotating frame. In
many ways it is perhaps better to regard the rotating frame as
specified by

$$\sum_n m_n\,\vec{r}_n^{\,0}\times\vec{r}_n = 0 \qquad (2.28)$$

together with the requirement that, if $\vec{r}_n^{\,0}$ is
expressed in the rotating co-ordinate system, then
$d\vec{r}_n^{\,0}/dt = 0$. In this case of course (2.28) implies
(2.27) at once. The condition (2.27) is easily seen to correspond
to the requirement that the nuclei have zero angular momentum in
the rotating frame when $\vec{r}_n = \vec{r}_n^{\,0}$ for all $n$.
It is therefore natural to think of the $\vec{r}_n^{\,0}$ as in
some sense the equilibrium nuclear positions, but we will simply
call them reference vectors.

It is perhaps appropriate to notice that at this point
difficulties begin to arise if one specifies one's "rigid"
framework in terms of $\vec{r}_n^{\,0}$ such that the nuclei are
co-linear, but we shall not consider that point here. If the
molecule is simply a diatomic then we need not use this approach
at all of course, since in this case we can do the whole thing
quantum mechanically from the start.
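To see what the Eckart condition (2.28) is doing, one can evaluate the vector $\sum_n m_n\,\vec{r}_n^{\,0}\times\vec{r}_n$ for a small model system (the masses and geometry below are hypothetical, chosen only for illustration): it vanishes at the reference configuration itself, but not once the nuclei are rigidly rotated away from it, which is how the condition pins the orientation of the embedded frame to the nuclei.

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def rot_z(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return [c*v[0] - s*v[1], s*v[0] + c*v[1], v[2]]

# Reference vectors r_n^0 for a bent three-nucleus frame, chosen so that
# sum_n m_n r_n^0 = 0 as (2.25) requires (invented numbers).
masses = [16.0, 1.0, 1.0]
r0 = [[0.0, 0.075, 0.0], [0.8, -0.6, 0.0], [-0.8, -0.6, 0.0]]

def eckart_vector(r):
    # the left-hand side of (2.28): sum_n m_n r_n^0 x r_n
    total = [0.0, 0.0, 0.0]
    for m, a, b in zip(masses, r0, r):
        c = cross(a, b)
        total = [t + m*x for t, x in zip(total, c)]
    return total

# At the reference configuration itself (2.28) holds trivially:
at_ref = max(abs(x) for x in eckart_vector(r0))

# Rigidly rotate the nuclei away from the reference orientation:
rotated = [rot_z(0.3, v) for v in r0]
off_ref = max(abs(x) for x in eckart_vector(rotated))

print(at_ref, off_ref)   # at_ref is zero; off_ref is not
```

For a rigid rotation the Eckart vector grows with the rotation angle, so requiring (2.28) singles out exactly one orientation of the embedded axes for a given nuclear configuration.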
Now providing that (2.28), (2.27), (2.23) and (2.25) are satisfied
by our $\vec{r}_n^{\,0}$ and $\vec{r}_n$ (no matter what
co-ordinate system we actually think of them as expressed in), we
can define a set of vibration co-ordinates $Q_q$ by the relation

$$\vec{r}_n = \vec{r}_n^{\,0} + m_n^{-1/2}\sum_q \vec{l}_{nq}\,Q_q \qquad (2.29)$$

where there are just $3N_n - 6$ of the $Q_q$, and we can also
define a set of Coriolis coupling constants $\vec{\zeta}_{pq}$ by
the relations

$$\sum_n m_n(\vec{r}_n - \vec{r}_n^{\,0})\times\vec{v}_n = \sum_{pq}\vec{\zeta}_{pq}\,Q_p\,v_{Q_q} \qquad (2.30)$$

so that

$$\vec{\zeta}_{pq} = \sum_n \vec{l}_{np}\times\vec{l}_{nq} \qquad (2.31)$$

where again $\vec{v}_n$ is measured in the rotating system and
$v_{Q_q}$ is $dQ_q/dt$ measured in the rotating system.

In order that the vibration co-ordinates shall satisfy
orthogonality relationships between themselves and between
themselves and rotations and translations, there are certain
accessory relations that must hold between the $\vec{l}_{nq}$ and
between $\vec{v}_n$ and $v_{Q_q}$ etc., which are discussed in
Moss and Howard's paper.


Now we can regard the relations that we have so far obtained as
specifying an orthogonal co-ordinate transformation from the space
fixed frame to the rotating frame (see e.g. Watson's equations
(8), (9) and (10)) and we can thus express our space fixed frame
Hamiltonians in terms of the new system of co-ordinates. The
process is in fact hair-raisingly tricky, involving masses of very
tedious algebra, but eventually one comes up with

$$H = \frac{1}{2M}\vec{P}^2 + \frac12\,\vec{N}\cdot\mu\cdot\vec{N} + \frac12\sum_q P_q^2 + \frac{1}{2m}\sum_i \vec{p}_i^{\,2} + \frac{1}{2M_{nuc}}\Bigl(\sum_i\vec{p}_i\Bigr)^2 + V \qquad (2.32)$$

where I have neglected all spin and external field effects, except
those that crop up in $V$ as dependent only on inter-particle
distances. In (2.32) $\vec{P}$ is the centre of mass momentum,
$\vec{p}_i$ is the electronic momentum in the rotating frame and
$P_q$ the momentum conjugate to the vibration co-ordinate $Q_q$,
again in the rotating frame.

The vector $\vec{N}$ is the rotational angular momentum

$$\vec{N} = \vec{J} - \sum_i(\vec{r}_i\times\vec{p}_i) - \sum_{pq}\vec{\zeta}_{pq}\,Q_p P_q \qquad (2.33)$$

where here $\vec{J}$ is the total angular momentum of the system,
which can be expressed in terms of the Euler angles for the
rotating frame, and an expression for this in these terms is given
for example as equation 6 in section 11.4 of Wilson, Decius and
Cross. For those of you familiar with the usual spectroscopic
approaches to this problem, $\vec{N}$ is equivalent to the vector
whose components are written $(M_x - m_x)$ etc. in Wilson, Decius
and Cross, or as $(P_\alpha - \pi_\alpha)$ in Nielsen (Rev. Mod.
Phys. 23, 90, 1951).

The tensor $\mu$ is the inverse of the instantaneous inertia
tensor $I'$, which has components $I'_{\alpha\beta}$
($\alpha, \beta = x, y, z$) given by

$$I'_{\alpha\beta} = \sum_n m_n \sum_{\gamma\delta\eta}\epsilon_{\alpha\gamma\delta}\,\epsilon_{\beta\gamma\eta}\,r_{n\delta}\,r_{n\eta} \qquad (2.34)$$

where $\epsilon_{\alpha\beta\gamma}$ is the unit anti-symmetric
tensor (sometimes also called the permutation symbol or the
Levi-Civita density) and is such that
$\epsilon_{\alpha\beta\gamma} = 1$ if $\alpha\beta\gamma$ is any
cyclic re-arrangement of $xyz$, $-1$ if any pair of
$\alpha\beta\gamma$ are transposed from cyclic order, and $0$ if
any pair of $\alpha\beta\gamma$ are the same. Again it does not
matter which co-ordinate system we imagine the inertia tensor
expressed in, but it is perhaps rational to think of it in the
molecule fixed but non-rotating frame, so that if the nuclear
framework were completely rigid the rotating axes could be chosen
as the principal axes of inertia.

One should perhaps also notice here that if the molecule has an
accessible linear configuration of nuclei, then $I'$ becomes
singular and in fact therefore possesses no inverse.
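The singularity for linear configurations is easy to exhibit numerically. The sketch below is my own illustration, using the standard contracted form of the nuclear inertia tensor, $I_{\alpha\beta} = \sum_n m_n(r_n^2\delta_{\alpha\beta} - r_{n\alpha}r_{n\beta})$, with invented masses and geometries: a bent arrangement has a positive determinant, while collinear nuclei give a zero determinant, so no inverse $\mu$ exists there.

```python
def inertia(masses, coords):
    # I_ab = sum_n m_n (r_n.r_n delta_ab - r_na r_nb)
    I = [[0.0]*3 for _ in range(3)]
    for m, r in zip(masses, coords):
        r2 = sum(x*x for x in r)
        for a in range(3):
            for b in range(3):
                I[a][b] += m * ((r2 if a == b else 0.0) - r[a]*r[b])
    return I

def det3(M):
    # determinant of a 3x3 matrix by cofactor expansion
    return (M[0][0]*(M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1]*(M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2]*(M[1][0]*M[2][1] - M[1][1]*M[2][0]))

masses = [16.0, 1.0, 1.0]
bent   = [[0.0, 0.075, 0.0], [0.8, -0.6, 0.0], [-0.8, -0.6, 0.0]]
linear = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]

d_bent = det3(inertia(masses, bent))
d_linear = det3(inertia(masses, linear))
print(d_bent, d_linear)   # d_bent > 0; d_linear is exactly zero
```

The zero arises because the moment of inertia about the molecular axis vanishes for a line of point nuclei, which is why the linear case has to be treated separately (Howard and Moss, 1971).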
The trick now is to convert (2.32) into a quantum mechanical
Hamiltonian, and this can be done using a method which seems to be
due originally to Podolsky (Phys. Rev. 32, 812, 1928) but is
described in some detail in Kemble, p. 273.


The real difficulty here is that the rotational angular momentum
is not conjugate to any co-ordinates (the components being defined
in terms of the classical Lagrangian as
$\partial L/\partial\omega_x = \partial T/\partial\omega_x$ etc.,
since the potential $V$ is not a function of $\vec{\omega}$).
However, as we can express $\vec{J}$ in terms of the Euler angles
for the rotating frame and momenta conjugate to these Euler
angles, we can proceed with the Podolsky transformation much in
the way done in section 11.4 of Wilson, Decius and Cross. The
actual process is a bit more tricky than in that section, because
of the presence of electronic terms, but there is no essential
difference.
The up-shot of this process, which is again very long, tedious and
involved, is to produce the quantum mechanical Hamiltonian

$$\hat{H} = \frac{1}{2M}\hat{P}^2 + \frac12\sum_{\alpha\beta}\tilde\mu^{1/4}\hat{N}_\alpha\,\tilde\mu^{-1/2}\mu_{\alpha\beta}\,\hat{N}_\beta\,\tilde\mu^{1/4} + \frac12\,\tilde\mu^{1/4}\sum_q\hat{P}_q\,\tilde\mu^{-1/2}\hat{P}_q\,\tilde\mu^{1/4} + \frac{1}{2m}\sum_i\hat{p}_i^{\,2} + \frac{1}{2M_{nuc}}\Bigl(\sum_i\hat{p}_i\Bigr)^2 + V \qquad (2.35)$$

where $\tilde\mu$ is the determinant of the matrix (tensor) $\mu$
and is therefore not just a scalar constant but a function of
position, and so does not commute with $\hat{N}_\alpha$ or
$\hat{P}_q$. It should by the way be noted here that reference
frame does matter for $\hat{H}$, in that its total angular
momentum component $\hat{J}$ has to be defined in terms of the
Euler angles referred to the molecule-fixed rotating co-ordinate
system. In deriving (2.35) it is necessary to decide how the wave
function shall be normalised, and the assumption has been made
that the translational motion can be factored off and that the
remaining rotation-vibration-electronic motion wave function shall
be normalised as

$$\int\psi^*\,\psi\,\sin\theta\,d\theta\,d\varphi\,d\chi\,\prod_q dQ_q\,\prod_i d\vec{r}_i = 1 \qquad (2.36)$$

where $\theta$, $\varphi$ and $\chi$ are the Euler angles of the
problem, and where the $dQ_q$ and $d\vec{r}_i$ are the "volume"
elements for the vibration co-ordinates and electronic
co-ordinates respectively, in the rotating co-ordinate system.
Now until quite recently that would be where our discussion would
have ended, but Watson in a really important paper (op. cit.) was
able to show (and Moss was able to generalise his discussion to
the case we are considering) that you could "commute out" the
$\tilde\mu$ factors and (neglecting the translational portion)
write the Hamiltonian as:

$$\hat{H}' = \frac12\sum_{\alpha\beta}\hat{N}_\alpha\,\mu_{\alpha\beta}\,\hat{N}_\beta + \frac12\sum_q\hat{P}_q^2 + \frac{1}{2m}\sum_i\hat{p}_i^{\,2} + \frac{1}{2M_{nuc}}\Bigl(\sum_i\hat{p}_i\Bigr)^2 + V + U \qquad (2.37)$$

In other words, to show that the total effect of the $\tilde\mu$
factors was to add a rather small mass dependent term $U$ to the
potential. He was also able to show that $\mu_{\alpha\beta}$ was a
function only of the elements of the inertia tensor at the
reference configuration and the vibrational co-ordinates, and thus
commuted with $\hat{N}_\alpha$ and $\hat{N}_\beta$, so that the
order of factors in (2.37) does not matter.

It is usual now to attempt an expansion of $\mu_{\alpha\beta}$,
and this can be done in terms of the matrices $a_p$ with elements

$$a_p^{\alpha\beta} = (\partial I'_{\alpha\beta}/\partial Q_p)_0 \qquad (2.38)$$

where the subscript $0$ signifies the reference configuration for
all nuclei. In terms of the $a_p$ and $I_0$ (the inertia tensor at
the reference configuration) we obtain

$$\mu = I'^{-1} = I_0^{-1} - I_0^{-1}\,a\,I_0^{-1} + I_0^{-1}\,a\,I_0^{-1}\,a\,I_0^{-1} - \cdots \qquad (2.39)$$

where

$$a = \sum_p a_p\,Q_p \qquad (2.40)$$

Here (2.39) is a series of constant terms which provide a correct
representation of the operator only if $I_0$ is a positive
definite matrix and the series converges. But of course the
convergence depends on the behaviour of $a$, and $a$ depends on
the $\vec{r}_n^{\,0}$ and the $\vec{l}_{nq}$ and, as we shall see,
we are not free to assign these to ensure convergence. The status
of (2.39) is therefore problematic.
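The convergence worry can be seen already in a one-dimensional caricature, where the expansion of the inverse becomes the geometric series for $1/(I_0 + a)$, convergent only when $|a/I_0| < 1$. The numbers below are invented purely for illustration.

```python
def mu_series(I0, a, terms):
    # partial sums of I0^-1 - I0^-1 a I0^-1 + I0^-1 a I0^-1 a I0^-1 - ...
    # which for scalars is (1/I0) * sum_k (-a/I0)^k
    s, term = 0.0, 1.0 / I0
    for _ in range(terms):
        s += term
        term *= -a / I0
    return s

I0 = 2.0
good, bad = 0.5, 3.0    # |a/I0| < 1 converges; |a/I0| > 1 does not

err_converging = abs(mu_series(I0, good, 40) - 1.0 / (I0 + good))

# For the large displacement the partial sums run away from 1/(I0 + a):
err_diverging = abs(mu_series(I0, bad, 40) - 1.0 / (I0 + bad))

print(err_converging, err_diverging)
```

Since $a$ grows with the vibrational amplitudes, the expansion can only be trusted for motions that stay close to the reference configuration, which is exactly the "strongly bound" caveat drawn below.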
And now we are on the home stretch, and the only thing we need
consider further is the spin co-ordinate. Let us just consider
electron spins for the moment. Though the Hamiltonian (2.37) does
not explicitly contain any spin terms (except perhaps in $V$), its
solutions will be functions of electron spin, and it will be
natural to refer electron spin to the rotating axis system (that
is Hund's case (a)) unless there is a strong external magnetic
field, or very strong spin-molecular-rotation interaction (Hund's
case (b)). Now if one wants to use spin functions referring to the
rotating co-ordinate system then one must find the transformation
operator, $\hat{U}$ say, which maps solutions of the space fixed
problem into solutions of the rotating problem,

$$\psi_{rot} = \hat{U}\,\psi_{fixed}$$

as far as the spins are concerned, and then transform the
Hamiltonian as $\hat{U}\hat{H}\hat{U}^{-1}$ so that it then
becomes the correct Hamiltonian to act on functions $\psi_{rot}$.
This transformation is of course just the rotation matrix for spin
functions, and as such is the $D^{1/2}$ representation of the full
rotation group. This representation is of course a function of the
Euler angles, and this affects $\hat{N}$. Fortunately it does so
in a very simple way, changing it from $\hat{N}$ to
$\hat{N} - \hat{S}$ (see Van Vleck, Phys. Rev. 33, 467, 1929),
where $\hat{S}$ is the spin operator appropriate to the rotating
frame. The $\hat{U}$ operator will of course also affect any terms
in $\hat{H}$ involving spin operators, but will not affect the
usual electrostatic terms in $V$, so we will not consider its
effects in this context in detail. (I must say I always find this
result $\hat{N} \to \hat{N} - \hat{S}$ a very surprising one,
since there are no "bare" $\hat{S}$ terms in the original
Hamiltonian, but from the analysis it has to be there.)
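For concreteness, the $D^{1/2}$ matrices just mentioned can be written down explicitly in a $z$-$y$-$z$ Euler-angle convention (the convention and the sample angles below are my own choices, not the text's): each is a $2\times 2$ unitary matrix, and a rotation through $2\pi$ gives $-1$ rather than $+1$, the familiar double-valuedness of the spin representation.

```python
import cmath
import math

def d_half(alpha, beta, gamma):
    # Wigner D^(1/2)(alpha, beta, gamma) in the z-y-z Euler convention:
    # D_{m'm} = exp(-i m' alpha) d_{m'm}(beta) exp(-i m gamma), m = +-1/2
    ea, eg = cmath.exp(-1j*alpha/2), cmath.exp(-1j*gamma/2)
    c, s = math.cos(beta/2), math.sin(beta/2)
    return [[ea*c*eg,             -ea*s*eg.conjugate()],
            [ea.conjugate()*s*eg,  ea.conjugate()*c*eg.conjugate()]]

def matmul2(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]

D = d_half(0.4, 1.1, -0.7)
U = matmul2(dagger(D), D)    # should be the 2x2 identity: D is unitary
unitarity_err = max(abs(U[i][j] - (1 if i == j else 0))
                    for i in range(2) for j in range(2))

# Double-valuedness: a rotation through 2*pi gives -1, not +1.
D2pi = d_half(0.0, 2*math.pi, 0.0)
minus_one_err = max(abs(D2pi[i][j] - (-1 if i == j else 0))
                    for i in range(2) for j in range(2))

print(unitarity_err, minus_one_err)
```

Because the matrix depends on the Euler angles, conjugating the Hamiltonian by it mixes spin into the rotational variables, which is the mechanism behind the replacement of $\hat{N}$ by $\hat{N} - \hat{S}$.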


Nuclear spins can be dealt with in a very similar way, in
fact in exactly the same way for spin

nuclei and by using the

appropriate representations of the rotation group for nuclei of


higher spin (spin 0 nuclei can of course be ignored, thank
goodness!)
Now let me stress that there is in fact nothing approximate about
(2.37), in the sense that no approximations have been made in the
course of deriving it from the fundamental problem (though of
course I have chosen to neglect external fields, spin terms in the
potential and so on, this was merely for ease of exposition; they
all go through without approximation if necessary). But of course
central to this approach is the notion of a reference
configuration of the nuclei, which we have so far assumed that we
are free to specify for use in (2.28) and (2.29), though we have
had at the back of our minds some idea of the reference
configuration as the "equilibrium" one in some fixed nucleus
sense.
But how free are we really to specify this framework? If we
combine (2.29) and (2.34) so that $I'$ involves only the $Q_p$ as
variables, it is easy to see then that the elements of $\mu$ will
be operators of order $Q_p^{-2}$. Now in the next section we shall
see that this kind of operator is really highly singular, and that
it is possible to show that a Hamiltonian containing such highly
singular terms is not generally a proper object in quantum
mechanics.
However, if we choose a reference configuration such that
$I_0^{-1}$ is well defined, then (2.37) is a perfectly good
Hamiltonian (if there is nothing nasty in the potentials). Now
suppose that we can find a manifold of solutions of (2.37) in this
approximation; we can then obtain perturbation (or variation)
solutions of the full problem in this manifold, providing that the
expectation values of the full $\hat{H}$ do not diverge. It is
therefore only if this second step can (at least in principle) be
made that (2.37) makes any sense as a Hamiltonian operator.

(In passing it is quite interesting to compare (2.37) with the
Hamiltonian proposed by Hirschfelder and Wigner, Proc. Nat. Acad.
Sci. 21, 113, 1935, where they did attempt to get at a result like
this by direct quantum mechanical discussion without specifying a
reference configuration so directly. Not surprisingly the same
trouble develops there, and it is manifested in ghastly
singularities in the operators and in dependence among the
variables.)
Supposing that we can attempt, say, a perturbation theoretic
approach to the full problem, then it only remains to determine if
the answers so obtained correspond to the situation one thinks one
is describing, and hopefully, by a good choice of the reference
configuration, one can always do this.
Thus we conclude that (2.37) can at best make sense for a
strongly bound system and we are not sure whether it does for all
such systems (for example it is not clear how far one could go
with this approach in systems that one would think of usually as
being single bonds about which "free" rotation was possible).
Setting aside these problems for the moment however, we see that
if $V$ is just the ordinary electrostatic potential, and hence not
a function of the Euler angles, and we quantise the spin along the
space fixed axes, then the solution to the problem specified by
the first approximation to $\hat{H}'$, i.e. one with
$\mu = I_0^{-1}$, can always be written as

$$\psi = \psi_{en}\,\psi_{rot}\,\sigma_e\,\sigma_n \qquad (2.41)$$

where $\sigma_e$ and $\sigma_n$ denote electronic and nuclear spin
functions referred to the space fixed axes. (In passing it should
be noticed that $\psi_{en}$ will be very tricky to make properly
anti-symmetric with respect to fermion nuclear interchange and
symmetric with respect to boson nuclear interchange, because the
nuclear space co-ordinates are all mixed up in the $Q$ and in any
case not all need necessarily be there.) Each function will of
course contain the $\vec{r}_n^{\,0}$ as parameters. Thus in this
case a true separation of rotation and electronic with
nuclear-vibrational motions can be obtained.
However, when the spins are referred to the rotating co-ordinate
system, the simple separation is no longer possible, even making
the above assumption about $V$ and $\mu$, and the general solution
becomes a sum over terms corresponding to various $M_N$, $M_S$ and
$M_I$ values for fixed values of $N$, $S$ and $I$ (and of course
there may be various different groups of $I$ values for different
groups of nuclei). We get back to simple product form only if the
electrons and the nuclei are both in singlet states. Now, in a
sense, we can avoid the problem of nuclear spins by noting that
since no part of $\hat{H}'$ in (2.37) depends on them in our
approximation, we can just write, if we neglect
symmetry/anti-symmetry considerations,

$$\psi = \psi_{en}\,\psi_{rot}\,\sigma_e\,\sigma_n$$

where the spins are now referred to the rotating frame, and we see
that the component of the energy associated with $\psi_{en}$ is
degenerate with respect to nuclear spin.
Now to cope with getting the right symmetry/anti-symmetry into the
products $\psi_{en}\,\sigma_{en}$ is, as I have said, tricky, but
it is of course perfectly do-able by straightforward group
theoretical methods, at least for small numbers of nuclei. We can
regard our solutions $\psi_{en}$ as solutions in which the nuclei
are identified but not fixed, and as such the solutions to the
problem specified by the last five terms in (2.37) (which we shall
call $\hat{H}_{en}$), assuming the displacement co-ordinates to be
distinguishable.

Now if we could solve for $\psi_{en}$ at this stage, then we could
check the validity of our assumption about the convergence of the
expansion of $\mu$ by calculating the expectation values, term by
term, of the expanded Hamiltonian over functions like
$\psi_{en}$, and seeing if the expansion converged. But of course
in practice this is simply not possible, and it is at this stage
that one is forced to make the Born-Oppenheimer or Born-Adiabatic
approximations, based on the assumption that the wave function can
be written as sums of products of the form
$\psi_e(\vec{r}_i, s_i, Q_p)\,\psi_n(Q_p)$, where we regard the
$Q_p$ in $\psi_e$ as parameters rather than variables.

In the simplest (Born-Oppenheimer) approach we regard $\psi_e$ as
determined by the solution of the problem specified by

$$\hat{H}_e\,\psi_e = E_e(Q_p)\,\psi_e \qquad (2.42)$$

for all values of $Q_p$ close to zero, and thus determine a
function $E_e(Q_p)$ as an "eigen-value" of (2.42). Now again in
practice we do not know the $Q_p$ at this stage, and what we do is
to get $E_e$ as a function of the complete set of displacement
co-ordinates $(\vec{r}_n - \vec{r}_n^{\,0})$ and add to $E_e$ the
nuclear repulsion potential in terms of the displacement
co-ordinates, to produce a so-called "total energy" $E$.

It is possible in a perfectly standard way to transform these
displacement co-ordinates into $3N_n - 6$ internal co-ordinates
(see e.g. Wilson, Decius and Cross, Ch. 4) which satisfy the
requirements placed on the $Q$ in so far as they are invariant to
translations and rotations. Now if $E$, as a function of these
internal co-ordinates, can be expanded in a Taylor series about a
set of reference positions of the nuclei defining the framework,
for a reasonable range of values of the internal co-ordinates,
then we are in business. We can replace the potential in the
nuclear motion equation by the Taylor series expansion of $E$, to
any order we require (but usually only up to second order). We can
then choose the $Q_p$ in terms of the internal co-ordinates as
those which diagonalise the quadratic part of the expanded
potential in the nuclear problem.

Given these co-ordinates and the reference configuration we can
move to a principal axis system for the inertia tensor $I_0$,
hence define the Euler angles and then the Coriolis coupling
constants, and we can thus solve the nuclear problem as accurately
as we like. Furthermore we can proceed to relax the
Born-Oppenheimer assumption, by replacing our single term function
by a sum and including the neglected terms in the nuclear motion
problem.
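The diagonalisation step just described can be sketched for a model with only two internal co-ordinates (the force constants below are invented for illustration): a single rotation of the co-ordinates kills the cross term in the quadratic part of the expanded potential, and the rotated co-ordinates then play the role of the $Q_p$.

```python
import math

# Quadratic part of the expanded potential in two internal co-ordinates:
# 2V = f11 s1^2 + 2 f12 s1 s2 + f22 s2^2   (hypothetical force constants)
f11, f12, f22 = 1.0, 0.3, 2.0

# Jacobi rotation angle that removes the cross term
theta = 0.5 * math.atan2(2*f12, f11 - f22)
c, s = math.cos(theta), math.sin(theta)

# Transformed force-constant matrix F' = R^T F R
g11 = c*c*f11 + 2*c*s*f12 + s*s*f22
g22 = s*s*f11 - 2*c*s*f12 + c*c*f22
g12 = (c*c - s*s)*f12 + c*s*(f22 - f11)

print(g11, g22, g12)   # g12 vanishes: the rotated co-ordinates are "normal"
```

The diagonal entries are the new quadratic force constants; in a full treatment the same rotation is carried out on the mass-weighted Hessian, and its eigenvalues give the harmonic vibration frequencies.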
In fact you can, with some difficulty, modify this approach to
deal with "free" internal rotation, and, with even more
difficulty, to deal with the "two reference configuration" problem
such as occurs in the inversion of ammonia, but it is not possible
or desirable to go into details here.
The point I want to make is that within the context of this
approach, the "fixed" nucleus electronic Hamiltonian has a clear
significance. If we solve the fixed nucleus Hamiltonian problem,
as usual, for various values of the internal (nuclear)
co-ordinates to determine $E$ as a function of these co-ordinates,
and if we find that $E$ has a reasonably deep minimum for some
value of these co-ordinates, then we know that we can, using the
preceding analysis, develop solutions to the whole molecule
problem as accurately as we choose. The resulting functions are of
course good only for the region of configuration space about this
minimum, but within this region we are on safe ground.
Now of course it is technically possible to solve the fixed
nucleus problem with nuclei fixed at arbitrary positions, and so
obtain an energy surface (or surfaces) for the complete range of
values of the internal co-ordinates, but as I hope that you can
now see, it is problematic in the extreme as to how far it is
legitimate to treat such a surface as a "potential" in which to
describe nuclear motion, except close to a minimum in that
surface. Essentially the situation is this: you can go ahead and
solve the problem for the nuclei on the electronic surface you get
(which is in practice one expressed in terms of $3N_n - 3$
co-ordinates, so that uniform translation, but not relative
translation, is taken out, but rotation is left in), and this is
now a growing art form. But one cannot be clear about what
relation this problem has to the physical situation one wishes to
describe, for there are bound to be regions on the surface where
the whole notion of separation of nuclear and electronic motions
becomes a nonsense. Thus the fixed nucleus problem for a system
does not have an invariant significance over the whole range of
internuclear separations, a conclusion which is of course
world-shatteringly unsurprising.
So far what I have tried to do is to outline, if you like, the
background to the problem of treating electrons in the fixed
nucleus approximation. As far as the nuclear motion problem was
concerned, it was my intention to convey to you the idea that the
fixed nucleus molecular Hamiltonian was a well defined object, in
a certain sense, only under certain rigidity conditions, which in
turn could only be developed by actually looking at the fixed
nucleus problem. I may perhaps have over-stressed this, and you
may be thinking that I was somehow impugning the Born-Adiabatic
formulation, but I was not, and it may be worth a few minutes just
to redevelop this.

First let me say that the original derivation in Born's paper and
in the book by Born and Huang ("The Dynamical Theory of Crystal
Lattices", App. VIII for this and VII for the Born-Oppenheimer)
certainly looks wrong, because they write down the full
Hamiltonian as:

$$\hat{H} = \hat{T}_E + \hat{T}_N + U(\vec{x}, \vec{X}) \qquad (2.43)$$

with $\hat{T}_E$ by implication given as
$-\frac12\sum_{i=1}^N \nabla^2(i)$ and so on.

Now as we have seen, this Hamiltonian has no square integrable
eigenfunctions because of the translational continuum, so that the
expansion that they later go on to make is, formally at least, a
nonsense.

If we follow Born and Huang for a moment however, they choose

$$\hat{H}_0 = \hat{T}_E + U(\vec{x}, \vec{X}) \qquad (2.44)$$

and solve the problem

$$\hat{H}_0\,\phi_n(\vec{x}, \vec{X}) = E_n^0(\vec{X})\,\phi_n(\vec{x}, \vec{X}), \qquad \langle\phi_n|\phi_m\rangle = \delta_{nm} \quad \text{for all } \vec{X} \qquad (2.45)$$

presumably by solving the fixed nucleus problem for all values of
$\vec{X}$. Assuming now that a complete set of $\phi_n$ exist, we
can write a solution to the whole problem as

$$\psi(\vec{x}, \vec{X}) = \sum_n \psi_n(\vec{X})\,\phi_n(\vec{x}, \vec{X}) \qquad (2.46)$$

and substituting we see our full problem is then

$$\sum_n(\hat{H}_0 + \hat{T}_N - E)\,\psi_n(\vec{X})\,\phi_n(\vec{x}, \vec{X}) = 0 \qquad (2.47)$$

Since $\hat{H}_0$ contains $\vec{X}$ only as a multiplier, we may
re-write the first term as

$$\sum_n E_n^0(\vec{X})\,\psi_n(\vec{X})\,\phi_n(\vec{x}, \vec{X}) \qquad (2.48)$$

and the second term we can write as

$$\hat{T}_N = \sum_{q=1}^{N_n}\frac{1}{2m_q}\,\hat{\vec{p}}(q)\cdot\hat{\vec{p}}(q) \qquad (2.49)$$

and our second term then becomes, using the product rule,

$$\hat{T}_N\,\psi_n\phi_n = \phi_n\,\hat{T}_N\psi_n + \psi_n\,\hat{T}_N\phi_n + \sum_{q=1}^{N_n}\frac{1}{m_q}\bigl(\hat{\vec{p}}(q)\psi_n\bigr)\cdot\bigl(\hat{\vec{p}}(q)\phi_n\bigr) \qquad (2.50)$$

If we multiply from the left by $\phi_m^*$ say, and integrate over
all electronic co-ordinates, then we get

$$\bigl(\hat{T}_N + E_m^0(\vec{X}) - E\bigr)\,\psi_m(\vec{X}) + \sum_n C_{mn}(\vec{X})\,\psi_n(\vec{X}) = 0 \qquad (2.51)$$

where

$$C_{mn}(\vec{X}) = \sum_{q=1}^{N_n}\frac{1}{m_q}\bigl(A_{mn}^q + B_{mn}^q\bigr) \qquad (2.52)$$

Now $A^q$ and $B^q$ must be Hermitian, and for stationary states
$\phi_m$ can be chosen real. Since $\hat{\vec{p}}$ is pure
imaginary, then clearly $A_{nn}^q = 0$. So we can re-write our
equation for nuclear motion as

$$\bigl(\hat{T}_N + U_m(\vec{X}) - E\bigr)\,\psi_m(\vec{X}) + \sum_{n\neq m} C_{mn}(\vec{X})\,\psi_n(\vec{X}) = 0 \qquad (2.53)$$

where

$$U_m(\vec{X}) = E_m^0(\vec{X}) + C_{mm}(\vec{X}) \qquad (2.54)$$

Now of course the formally wrong step in this derivation is the
expansion assumption, which probably should not be made anyway,
but is certainly not valid if the translational continuum is still
present, since then there is no discrete part to the sum. However,
the derivation goes through in a very similar manner if you assume
translational motion is separated off, and hence you may well get
a Hamiltonian of a somewhat different form as your base problem.
For example

$$\hat{H}' = \hat{T}'_E + \hat{T}'_N + U(\vec{x}, \vec{X}) + \sum_{i,q}^{N, N_n}\alpha_q\,\hat{\vec{p}}(i)\cdot\hat{\vec{p}}(q) \qquad (2.55)$$

where now both $\hat{T}'_E$ and $\hat{T}'_N$ may contain mass
polarisation terms, and there will be three less nuclear
co-ordinates than we started off with. You can easily see that the
presence (possibly) of mass polarisation terms merely alters the
significance of the term we have called $C_{nn}(\vec{X})$; in fact
it adds a few more terms to it, so in principle we could get this
kind of equation, though the precise details would depend on the
co-ordinate separation scheme, and the resulting "potential" for
nuclear motion would be essentially the same kind of thing as our
$U_m(\vec{X})$ above.
Now of course if our $C_{nm}$ or their equivalent are small, then
we can neglect the sum in our equation, and we can regard the
fixed nucleus equation for electronic motion as providing a
potential in which the nuclei move. But of course they may not be
small, and far away from an "equilibrium" geometry we have, for
obvious reasons, no reason to assume that they are. (Think for
example of when a molecule is reacting: then surely its electronic
wave function must be by hypothesis an extraordinarily sensitive
function of nuclear geometry.) We cannot then neglect the $C_{nm}$
and the whole idea of a potential falls to the ground again.


Now I do not mean to say or imply that as a consequence of this
you can never use the idea of a potential surface in thinking
about a chemical reaction. Clearly one can in many cases in
diatomics, and one can in situations where one artificially forces
on the system one's preconceptions about behaviour. Thus if one
decides to study the formation of NH4Cl from NH3 and HCl, and so
constructs one's fixed nucleus electronic problem that one is
always essentially describing NH3 and HCl as separate entities,
the free co-ordinates being their separation and their
almost-rigid rotations, then almost certainly one is not in
trouble. You can think of other examples.
What I wanted to say was simply that it is premature, even wrong,
to suppose that you can always give the significance of a
"potential" to the clamped nucleus eigenvalues as functions of the
internal co-ordinates. We simply do not know whether we can or
not. Whether we can or not is only very obliquely dependent on the
energy separation of our electronic states. Certainly we know that
in the case where we have a state which is degenerate for some
nuclear configuration, the $C_{nm}$ terms do become important, as
they lift the electronic degeneracy by distorting the nuclei (the
Jahn-Teller effect). They can also however be very important even
when the electronic states are well separated, if the vibrations
become very large, as for example in the Renner effect in NH2, and
of course in many other sorts of vibronic interactions. Well, at
any rate, I hope I have made my point. I don't want to stop people
calculating 'potential' energy surfaces (so-called), but I wish
you always to ask yourselves: is that what they really are?
I should of course also like to stress again that the Born-Adiabatic separation is a purely formal procedure except for diatomics. There seems to be very little chance indeed of developing even approximate U_n(R), for all R, even for triatomics,


B. T. SUTCLIFFE

other than by arbitrarily restricting the variables considered. (You would need a 6-dimensional space over which you had first to get approximate E_n(R) on a sufficiently fine grid to differentiate it, and then you would need to perform the coupling integrals over the electronic co-ordinates to check that the C_nm were small. These integrals would be fiendishly difficult.) You cannot in any sense pass from these formal equations to a set of equations like those I gave you for the almost-rigid-molecule, though you can use a Born-Adiabatic type argument on Moss's Hamiltonian, where the equivalents of the general nuclear motion co-ordinates we have used are replaced by the vibration co-ordinates I developed earlier; and indeed the Moss-Watson Hamiltonian is used in this way in discussing vibronic interactions, as I tried to indicate at the end of my lecture on it. This is the approach Dr. Swanstrøm has used in his lectures.
The point I made earlier still stands: at the present time, if you want to understand the nuclear motion of the molecule near equilibrium, you have to use a classical Hamiltonian until you have got rid of rotation, otherwise you are absolutely broke.
I should also say that you can of course develop a completely
variational approach along the lines of the Born-Huang approach
(you can guess what you might do) but of course the formal
equations are much more cumbersome and remain essentially quite
formal so there is perhaps no need to go into this.


3. THE MATHEMATICAL PRELIMINARIES. The stationary state solutions of the Schrödinger equation.


In many ways the last section was a huge digression, and to some extent it was the cart before the horse, because we kept on referring to eigen-solutions of this and that equation before we really knew whether or not such solutions existed at all. However, perhaps you will be prepared to believe from what has gone before that unless we fix the nuclei or separate off translational motion our full Hamiltonian has no useful eigen-solutions. However if I persuaded you of this in one sense I am afraid that I cheated you. A very important (but in a way elementary) point that I did not make explicit in the previous section was of course the well-known one that the differential equation determined formally by the operator H is in fact only determined completely if we specify boundary value conditions and continuity and differentiability conditions on the function space in which H operates.

Now it is a common-place of elementary quantum chemistry that all eigen-solutions of a quantum mechanical problem must be square-integrable over the configuration space of the problem, otherwise it is not possible to show that the associated eigen-values are real. Now if we took that restriction seriously we would have to say that our full Hamiltonian has no eigen-states whatsoever in a quantum mechanically useful space, since our putative eigen-function T(R) (see 2.9a) is definitely not square integrable.

It would obviously be quite inappropriate to consider in detail here the abstract mathematical problems raised in dealing in a satisfactory manner with the continuous spectrum of an operator in quantum mechanics, and this is particularly so because in fact we shall not be concerned in any practical sense with the continuous spectrum.

We shall not therefore discuss it further beyond referring again to Hellwig's book, where this very difficult problem is discussed in more detail, and commenting that it is


perfectly possible to construct appropriate square integrable solutions, though these are not eigen-solutions of the ordinary differential equation but merely solutions to a rather more abstract reflection of this problem, in a suitable Hilbert space.
Let us however assume that we have somehow got rid of the nuclear motion in a given problem and so avoided the problem of a ubiquitous continuous spectrum due to translation. We might well then confine ourselves to the space of ordinarily square integrable functions and ask ourselves what other properties such functions should have in order to belong to the discrete spectrum of the Hamiltonian operator, and this (as if you could not guess!) is itself an extremely tricky problem, for in the first place from a mathematical point of view we have to establish that the Hamiltonian operator has any discrete spectrum at all.
Now I can well understand that you might consider this a terrible quibble, because we just know that every reasonable Hamiltonian has a discrete spectrum, but I wonder do we really? Take this very simple problem

H(1,2) = Σ_{i=1,2} ( -½∇²(i) - Z/r_i ) + 1/r_12          (3.1)

which is of course the Hamiltonian (in atomic units) for the helium-like ion in the fixed nucleus approximation.

We "know" that it has a discrete spectrum for Z=2 because we have seen the line spectrum of the helium atom, and we know it has one for Z>2 on similar arguments, but has it got a discrete spectrum, say, for Z=1? That is, has H⁻ got a discrete spectrum? I take it that you will agree with me that in fact this is a very difficult question to decide, but it is obviously one of crucial importance, because it is no good trying to construct "approximate" solutions to this problem, which are square integrable, if in fact its spectrum is purely continuous.
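As a purely illustrative aside (not part of the original argument): the difficulty with H⁻ can be felt numerically. For the Hamiltonian (3.1) the simplest one-parameter trial function, both electrons in exp(-zr), gives the well-known closed-form energy E(z) = z² - 2Zz + (5/8)z in atomic units. Minimising it binds helium comfortably below the He⁺ threshold, but leaves H⁻ above the H + e⁻ threshold of -0.5, so this crude variational function cannot decide the question either way.

```python
import numpy as np

# Screened-orbital variational energy for a helium-like ion of nuclear
# charge Z, both electrons in exp(-z*r): E(z) = z^2 - 2*Z*z + (5/8)*z (a.u.).
def e_trial(z, Z):
    return z**2 - 2*Z*z + 0.625*z

zs = np.linspace(0.1, 3.0, 5801)
e_he = e_trial(zs, Z=2).min()   # about -2.848, below the He+ threshold of -2.0
e_hm = e_trial(zs, Z=1).min()   # about -0.473, ABOVE the H threshold of -0.5
print(e_he, e_hm)
```

So within this family of trial functions H⁻ shows no state below its continuum threshold; whether it really has one is exactly the kind of question this section is about.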


Or take problems where there is a constant electric or magnetic field involved; by hypothesis such fields are finite at infinity, and it is again by no means clear that it would be at all useful to look for discrete eigen functions of such problems. Again you may say, "Come off it! We can solve the problem without the electric field, say, and include the electric field later by perturbation theory, to describe the Stark effect, and similarly for the Zeeman effect". But of course, if in fact the Stark effect Hamiltonian had no discrete spectrum, if you carried out your perturbation theory properly, all you would find was that it was divergent; and indeed it is easy to show (and is shown, for example, in Landau and Lifshitz) that the perturbation expansion for the hydrogen-like atom is indeed divergent, precisely because the Stark effect Hamiltonian in this case has no discrete spectrum.
And so we could go on, with many other examples. Of course I agree with you when you assert that in 99 cases out of 100 in systems of chemical interest we are not going to run into any trouble on this kind of score, but I think that we are moving into an era in computational quantum chemistry where we are going to be asked to look at pretty bizarre molecules and ions for people, such as those postulated as reaction intermediates, and it will not always be so clear as it seems now whether there are any discrete solutions to the specified problem.
What then is known of these matters? The situation is that one could have answered this question pretty easily up to about 1960, by saying simply "Next to nothing". But since then there has been an enormous amount of work done, and the name associated with much of it is Kato; certainly he provides a good review of the field in his 1967 article in the Supplement of the Progress of Theoretical Physics 40, 3, 1967. (See also the article by Hepp in "Maths. of Contemp. Phys." (Ed. Streater), Academic


Press, 1972).) The important results for our purposes may be summarised, albeit a little crudely, as follows.

For any neutral molecule or positive ion the ordinary Schrödinger operator, either in a centre of mass co-ordinate system or in the fixed nucleus approximation, possesses a countably infinite number of square integrable eigen-functions in ordinary configuration space. The eigen-values associated with these eigen-functions have a lower bound which is negative and an upper limit point Λ, which is zero for a one particle system and less than zero for a many particle system.

It can also be shown that there are no eigen-functions of the continuous spectrum below Λ, and that the continuous spectrum extends from Λ to ∞ (in some well defined sense). It is further possible to show that there are no eigen-functions of the discrete spectrum with positive eigen-values, but it has not yet proved possible to establish in detail what happens in the interval (Λ, 0). Of course we believe there are quasi-discrete eigen-functions in this range, otherwise we don't really understand the phenomenon of Auger electrons; and there are theoretical reasons for believing that there are at least some states in the range (Λ, 0).

It is possible to extend these results to the case where a magnetic field is present, without any essential changes. It is not possible to extend them to negative ions, and nor is it possible to extend them to operators with potentials much more singular than the coulomb potential. More specifically, these results cannot be proved if the potential is more singular than r^(-3/2). It is further possible to show that the results obtained above still hold if it is required that the eigen-functions belong to irreducible representations of the permutation group, the rotation-inversion group and so on.
As for the detailed properties of the eigen-functions, it is


possible to show that they must be continuous everywhere and that they must be bounded everywhere, that is, they must be such that a constant k can be found so that for any eigen-function ψ, (ψ*ψ) < k. Furthermore they must possess first derivatives everywhere, except at singular points of the potential, and these derivatives must be bounded where they exist. From this result it follows of course that the eigen-functions do not possess derivatives of all orders everywhere, as is sometimes asserted on the basis of the undoubted fact that Hⁿψ = Eⁿψ for arbitrary n, on any eigen-function ψ in the discrete spectrum.
These last results may strike one as a little odd, particularly the result on the boundedness of ψ, because it contradicts a result of Fock, who asserted that for the helium atom in the fixed nucleus approximation ψ should go like ln(r_1² + r_2²) for small r_1, r_2 (the two electron variables), and clearly this is unbounded behaviour. It would seem that Fock's assertion is questionable.

The properties of the first and second derivatives are in fact commonplaces for the discrete spectrum, but we seldom notice them explicitly. Thus if we consider the exact ground state of hydrogen in the fixed nucleus approximation, Ne^(-ar), we see that

∂/∂x Ne^(-ar) = -Na(x/r)e^(-ar)   etc.          (3.2)

and it is obvious that this derivative does not exist at r = 0, though it is clearly bounded elsewhere. If we look at the second derivative in spherical polars

∇²(Ne^(-ar)) = N(a² - 2a/r)e^(-ar)          (3.3)

naturally this does not exist at r = 0 either.

This fact does not of course worry us at all, since the second part of the expression exactly 'cancels' with the coulomb term in the Hamiltonian, ensuring that the Hamiltonian operator as a whole can be applied as many times as required to the eigen-function.
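A quick numerical check of this cancellation (an illustrative aside of my own, with N = 1 and a = 1, so that the function is the hydrogen ground state with eigenvalue -1/2):

```python
import numpy as np

# Apply (-1/2)del^2 - 1/r to psi = exp(-r), using the closed form
# del^2 exp(-a*r) = (a^2 - 2a/r) exp(-a*r) from (3.3).  The 2a/r piece
# cancels the coulomb term exactly, leaving H psi = -1/2 psi everywhere.
a = 1.0
r = np.linspace(0.05, 10.0, 200)
psi = np.exp(-a*r)
hpsi = -0.5*(a**2 - 2.0*a/r)*psi - psi/r
ratio = hpsi/psi
print(ratio.min(), ratio.max())   # both -0.5 to rounding error
```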

As for the fact that there are no positive discrete eigen-values, this result follows easily from the virial theorem for all homogeneous potentials of degree -a, if 0 < a < 2. The interesting case for us is the coulomb potential, for which a = 1.


We do know something in a little more detail about the general behaviour of an eigen-function when a pair of variables approach a coulomb singularity. It is convenient to display these results in an integrated form due to Bingel (Z. Naturforsch. 18a, 1249, 1963) in the fixed nucleus approximation, though the original results were due to Kato. Bingel showed that as an electronic variable approached zero Kato's result could be written as

Ψ(r_1, r_2, ..., r_N) = Ψ(0, r_2, ..., r_N)(1 - Z r_1) + r_1 · a_1(r_2, r_3, ..., r_N) + O(r_1²)          (3.4a)

where r_i denotes the set (x_i, y_i, z_i) of electronic variables, r_i the associated "position vector" and r_i the radial variable. As a pair of electronic variables approach one another the equivalent result is

Ψ(r_1, r_2, r_3, ..., r_N) = Ψ(R, R, r_3, ..., r_N)(1 + ½r_12) + r_12 · C_12(R, r_3, ..., r_N) + O(r_12²)          (3.4b)

where R = ½(r_1 + r_2) and r_12 denotes the variable set associated with r_12.

Notice these results apply only when a variable approaches zero or a pair of variables approach one another; we do not at present know what happens when more than one singularity is approached simultaneously. The two results (3.4a) and (3.4b) are often called the integrated form of the Kato cusp conditions.
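As an illustrative aside, the nuclear cusp contained in (3.4a) is easy to verify numerically on the hydrogenic 1s function exp(-Zr): the spherically averaged radial slope at the nucleus must equal -Z times the value of the function there.

```python
import numpy as np

# Finite-difference check of the nuclear cusp condition implied by (3.4a):
# d(psi)/dr at r = 0 equals -Z * psi(0) for psi = exp(-Z*r), psi(0) = 1.
h = 1e-6
for Z in (1.0, 2.0, 3.0):
    slope = (np.exp(-Z*h) - 1.0)/h    # radial slope at the nucleus
    print(Z, slope)                   # close to -Z
```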
Now it might be thought at this stage that I am now going to say that "It can further be shown that the eigen functions of H form a complete set, if we include the continuum correctly", but I am not, because so far as is known at present they don't, at least not for the many-particle problem. Technically the situation is this. Kato was able to show that corresponding to the ordinary Schrödinger Hamiltonian there exists in a Hilbert space an abstract operator, H say, which is not in fact simply self-adjoint†. He was fortunately able to show however that this operator possessed a self-adjoint extension, so it was what is called essentially self-adjoint. However it is not in general possible to show that the abstract self-adjoint extension corresponds to an ordinary Schrödinger operator in configuration space. In consequence it is not possible to show that a complete set expansion can be built up from the eigen-solutions (discrete and continuous) of the ordinary N particle Schrödinger problem.

Now what this means is that "abstract" quantum mechanics as developed, say, by von Neumann (for a modern exposition see e.g. Jauch, "Foundations of Quantum Mechanics") is perfectly securely founded, since that depends only on abstract operator theory, but "practical" variation and perturbation theory is not so securely founded insofar as it involves appeal to specific

†Usually in quantum mechanics we use the terms "hermitian" and "self-adjoint" interchangeably. In detailed mathematical discussions a difference is usually made: the notion of hermiticity (sometimes just called "symmetry") merely ensures real eigen-values; self-adjointness, however, ensures that the eigen-function expansion is complete, in the mean.


solutions of the usual Schrödinger problem in configuration space. However, as you no doubt appreciate, the practical consequences of this lack of secure foundation are precisely nil!

I've tried to draw together in this section all the analytic results that I know of that have a bearing on the solution of the ordinary Schrödinger problem, and I hope they have not merely constituted a bald and unconvincing narrative; those of you who wish to give them a semblance of verisimilitude should consult the original references. Now I want, if I may, to end the section by drawing out some fairly practical consequences.


In the first place, of course, there is no reason, from what we have said, to believe that we will ever be able to get directly exact solutions of the ordinary Schrödinger equations; but since the exact solutions in configuration space are known to be reasonably well behaved, we can have every confidence that we can construct approximations to them by some rational approximation scheme, such as a kind of Fourier expansion. We shall return to this in the next section.


The next point is that it is clear that we cannot put any old terms in our Hamiltonian and assume that it will still have a discrete spectrum, because if the terms are too singular we know that this may well not be true. Thus we can with impunity include the operator

(geℏ/2m) B Σ_i s_z(i)          (3.5)

to describe the effect on spin of a uniform magnetic field applied along the z direction, and still assume a discrete spectrum, but we can't include, for example, the spin-orbit coupling operator

-(g²e²ℏ²/8πm²) Σ_{i≠j} r_ij⁻³ [ s(i)·(r_ij × p(j)) + s(j)·(r_ij × p(j)) ]          (3.6)


If we wish to include an operator like (3.6) we can include it at most by "perturbation theory", in the sense of an expectation value over an already computed wave function. We should therefore be very cautious about involving terms like (3.6) in the zero-th order problem, as one might be tempted to in, for example, "coupled" Hartree-Fock theory. A similar restriction must also be placed on considerations of an electric field. (In passing it should be noted that if one considers (3.6) as being of essentially relativistic origin, as one might well, arguments of relativistic consistency can be adduced for treating it solely as a perturbation, as well.)
We should also notice that our results imply that any neutral or positively charged agglomeration of nuclei and electrons whatsoever possesses a discrete spectrum. This in its turn implies that there is always a configuration of nuclei in the fixed nucleus approximation that, if associated with an appropriate number of electrons, has a bound state, and so in some sense "looks like a molecule". Of course this arrangement may be pretty unstable with respect to decomposition into various sub-units.

4. THE PRINCIPLES OF APPROXIMATE SOLUTION OF THE EIGEN-VALUE PROBLEM (1.2).


As we have seen, we cannot hope for a direct separable kind of solution in the many particle problem, and the natural thing for us to look for next is a series solution to the equation; and since, again because of the problems of inseparability, we cannot use the equations to define such a series solution, we must look for a series solution in a Fourier sense. That is, if Ψ is the desired solution, we must look for a set of functions Φ_i which are linearly independent and orthogonal such that we can write

Ψ = Σ_i c_i Φ_i(x_1, x_2, ..., x_N)          (4.1)

where we use the symbol x_i to denote the complete variable set of particle i (x_i, y_i, z_i and the spin co-ordinate s_i). We shall use the symbol r_i to denote the space variable set only.

The extent of the sum is generally infinite, and we shall be content if the sequence of partial sums

Φ^m = Σ_{i=1}^{m} c_i Φ_i          (4.2)

converge on the exact solution in the sense that

∫ |Φ^m - Ψ|² → 0   as   m → ∞          (4.3)

assuming that <Φ^m|Φ^m> = 1. The notation in (4.3) is simply that

∫ |f|² = ∫ f* f dx          (4.4)

the integral being over all configuration space. The requirement that the m'th partial sum be normalised to unity is expressed in the usual Dirac notation, that is, with any operator B,

<Φ|B|Φ> = ∫ Φ* B Φ dx          (4.5)

the integral again being over all configuration space.
The convergence expressed in (4.3) is sometimes called convergence in the mean. There is another type of convergence sometimes talked about in this context, and that is uniform or pointwise convergence, and that, as the name indicates, is expressed by the condition

sup_x |Φ^m(x) - Ψ(x)| ≤ ε, for some positive ε, all x in range          (4.6)

Now the expansion (4.1) is not at all straightforward unless we actually know that the chosen expansion set Φ_i is a complete discrete set and that Ψ is an eigenfunction of the discrete spectrum. Now we can and will confine attention to approximating discrete eigenfunctions, and first ask the question: are there any complete discrete sets of many-particle functions that are known? The answer appears to be that no such sets are known directly from the many particle point of view. However a number of complete discrete sets are known for one-particle problems in the whole one-particle configuration space (-∞ < r < ∞), and it is an easy matter to show that suitable complete many-particle expansions can be obtained in terms of products of such one particle functions. However, the only "physical" complete discrete set of one particle functions is the set of three-dimensional simple harmonic oscillator functions, though there are plenty of other single particle complete sets that do not arise directly from physical problems.
Now if we assume that we are working in a complete set of functions, the immediate problem is how to determine the c_i, and in the case of such a complete set we can show (and it is by no means trivial to show it; see e.g. Kemble, Section 51b, or Kato, Trans. Am. Math. Soc. 70, 195, 1951) that we can replace the eigenvalue problem by a variational problem to determine the c_i. In particular we can make use of the linear-variation theorem. (This theorem was almost certainly known in essence to Legendre, and has doubtless been re-discovered many times, but it is usual to attribute it to Hylleraas and Undheim (Z. Phys. 65, 759, 1930) or to MacDonald (Phys. Rev. 43, 830, 1933).)
I am sure that you are all familiar with it in outline. It is based on the observation that, whether or not the function set is complete, we can construct a sequence of approximation functions as solutions of the secular problem

H^m c^m = E^m c^m          (4.7)

which arises from minimising the functional <Φ^m|H|Φ^m>, subject to the normalisation <Φ^m|Φ^m> = 1, with respect to the linear coefficients. Here

<Φ_i|H|Φ_j> ≡ H_ij          (4.8a)

<Φ_i|Φ_j> = δ_ij          (4.8b)

and the elements of the column matrix c^m are the coefficients in (4.2). The energies E^m may be placed in non-decreasing sequence, and the theorem of interest (for which I evinced great antiquity) is simply that if we minimise the functional <Φ^{m+1}|H|Φ^{m+1}> then the roots E_i^{m+1} of this problem "split" the roots of the m-th order problem, that is

E_i^{m+1} ≤ E_i^m ≤ E_{i+1}^{m+1}

and so on.
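In matrix terms this splitting is just the interlacing (Cauchy separation) property of the eigenvalues of a symmetric matrix and its leading principal submatrices, and it is easy to watch numerically. A sketch (the random symmetric H standing in for a Hamiltonian matrix in an orthonormal basis is my own illustration, not anything from the lectures):

```python
import numpy as np

# The m-th order secular problem in an orthonormal basis is the leading
# m-by-m block of the full matrix H_ij = <phi_i|H|phi_j>.  Its roots are
# "split" by those of the (m+1)-th order problem (Cauchy interlacing).
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
H = (A + A.T)/2                      # a stand-in symmetric "Hamiltonian"

roots = [np.linalg.eigvalsh(H[:m, :m]) for m in range(1, 9)]
for m in range(1, 8):
    lo, hi = roots[m-1], roots[m]    # m-term roots vs (m+1)-term roots
    print(m, all(hi[i] <= lo[i] <= hi[i+1] for i in range(m)))
```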
Now of course if (and only if) the set is complete, this theorem implies that our sequence satisfies (4.3) for each and every eigenfunction of H; that is, our energies converge on the exact energy from above, and the overlap between our approximate and the exact wave function tends to unity. By the same token we are assured that even if the set is not complete, we will still not break through the upper bound, so all our approximate energies must be upper bounds, if calculated using (4.7).
Having summarised the theorem, I wish to make the following remarks.

First, a rather obvious one about symmetry: this theorem clearly holds only within a manifold of states of a given symmetry. Each symmetry type has its own associated secular problem, and the problems will not mix. If there is a degeneracy in a given problem due to symmetry, it is quite easy to show that the degenerate eigen-values are depressed as a degenerate block on extension of the basis, providing that at each stage the basis is extended by sufficient functions to provide a basis for the degenerate irreducible representation. (Thus if one were considering an atom and had chosen to look at P states, then at each stage one must add 3 equivalent functions of P symmetry, etc.)
Secondly, the theorem still holds even if the basis {Φ_i} is not orthonormal; it is sufficient simply that the basis be a linearly independent one.
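In the non-orthogonal case (4.7) becomes the generalised problem H c = E S c, with S the overlap matrix. A sketch of how this works in practice (the particle-in-a-box model and polynomial basis here are my own illustration, not the lecturer's):

```python
import numpy as np

# Linear variation in a non-orthogonal but linearly independent basis:
# particle in a box on (0,1) (H = -1/2 d^2/dx^2, exact E_1 = pi^2/2) with
# the polynomial basis f_k = x^k (1 - x), k = 1..4.  Overlap and Hamiltonian
# matrices follow from  integral of x^n over (0,1) = 1/(n+1).
n = 4
S = np.empty((n, n)); H = np.empty((n, n))
for i in range(1, n + 1):
    for j in range(1, n + 1):
        S[i-1, j-1] = 1/(i+j+1) - 2/(i+j+2) + 1/(i+j+3)
        H[i-1, j-1] = 0.5*(i*j/(i+j-1) - (i*(j+1) + j*(i+1))/(i+j)
                           + (i+1)*(j+1)/(i+j+1))

# Solve the generalised secular problem H c = E S c via Cholesky reduction.
L = np.linalg.cholesky(S)
Linv = np.linalg.inv(L)
E = np.linalg.eigvalsh(Linv @ H @ Linv.T)
print(E[0])   # an upper bound, just above pi^2/2 = 4.93480...
```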
Finally, some points about the {Φ_i} in practice. In general we shall not know whether any set that we have chosen is potentially completable in a uniformly convergent sense, and most function sets currently used in practice are certainly not. This has implications in the use of such wave functions to calculate properties, which we will discuss later.
Also in practice the {Φ_i} will themselves contain some parameters which are themselves capable of variation (for instance the internuclear distances). If we denote such parameters by R, then we can imagine that we might solve a sequence of secular problems involving the sets {Φ^m(R_1)}, {Φ^m(R_2)}, and so on. The upper bound theorems would apply separately to each problem, but providing that the functions are continuous functions of R it is possible to use the results of these calculations to develop energy-bound surfaces with R as the surface variable. It must be stressed that, to retain the bound theorems, we must first vary the R and then solve the linear variation problem. If the parameters R are in fact the internuclear distances, then particular care must be taken about the continuity requirement when a change of geometry implies a symmetry change.
Another point of practical interest is that we can in fact "concentrate our attention" on higher roots and still retain the bound theorem. Suppose that we construct two approximate functions

Φ^m = Σ_{i=1}^{m} c_i Φ_i   and   Φ̄^m = Σ_{i=1}^{m} c̄_i Φ̄_i

where Φ_i and Φ̄_i are different function sets, and we choose the Φ_i so as to make, say, E_1^m as low as possible, and we choose the Φ̄_i so as to make Ē_2^m as low as possible, where E_1^m is the lowest root of the first problem and Ē_2^m is the second lowest root of the second problem. It follows at once that Ē_2^m is still an upper bound to the true second lowest root, even though it will, by choice, be lower than E_2^m. In practice if you do this for a smallish set you find the ordering

E_1^m ≤ Ē_2^m ≤ E_2^m

though of course no theorem assures you of this.

Generally of course in this case <Φ^m|Φ̄^m> ≠ 0, though all other things being equal one might expect it to be rather small.


Now I do not wish to be thought to imply, from what has been said before, that the linear variation method is the only method of constructing an approximate solution of the eigenvalue problem. I think that it would be justified to say that in quantum chemistry, at any rate, it is the most popular and widely used method, but overall it is probable that various perturbation theoretic methods are equally popular. I shall not consider any of these in detail here, for some separate lectures in our course will be devoted to them, but I should point out that matrix elements like (4.7) have to be evaluated in these approaches too, and the basic mathematical considerations are very similar to those obtaining in linear variation theory; indeed in one sense it is possible to regard perturbation theory as approximate variation theory.
However there are some methods (for example the transcorrelated method, and various local energy methods) which are used and which seem to have little or no relation to variation theoretic methods. Again I do not wish to say much about them, beyond commenting that in the end they involve calculating, effectively if not explicitly, integrals like (4.8), and so can be considered as raising no essentially new mathematical problems in future discussions.

5. THE CHOICE OF APPROXIMATING FUNCTIONS


We have seen that the only practicable ways of obtaining approximate solutions involve expansions of the form

Ψ = Σ_{p=1}^{m} c_p Φ_p          (5.1)

and that our ability to determine the c_p depends effectively, in all approximate methods, on our ability to calculate integrals of the form

∫ dr_1 ... dr_N Φ_q*(r_1 ... r_N) Φ_p''(r_1 ... r_N)          (5.2)

where Φ_p'' denotes a function Φ_p after having been operated on by H. The integral is over all the 3N dimensional configuration space only, for the removal of spin from the integral is always essentially trivial.
If we suppose for a moment that we can get numerical values for these integrals, then we can determine the c_p in any fixed basis {Φ_p} by solving the secular problem (4.7), and this, as we have indicated, is at the practical level a rather easy matter for m up to about 10,000. Thus so long as we can evaluate the integrals we might well expect to be able to choose a function set {Φ_p} to obtain extremely good results. We would be able to evaluate integrals of this form quite generally if we had good many-dimensional numerical quadrature methods which could handle integrals with singularities in them (for example the 1/r_12 singularity). But unfortunately we simply do not have such methods. At present we have really good methods only for one dimension, and then only if we can avoid singularities. We have some methods which will handle certain kinds of three and six dimensional integration, but these are as yet only in their early stages of development as techniques.


Thus the primary limitation on our use of expansions like (5.1) is that our functions Φ_p must be chosen so that we can evaluate the integrals (5.2), and this means essentially that we must be able analytically to reduce (5.2) to an integral that can be performed by numerical quadrature. It is this limitation that pushes us almost inevitably to an orbital approximation, in which we represent a function Φ in (5.1) as an anti-symmetrical product of spin-orbitals, viz.

Φ(x_1 ... x_N) = A Π_{i=1}^{N} φ_i(r_i) χ_i(s_i)          (5.3)

where χ_i(s) denotes the spin function α(s) or β(s) as required, φ_i(r) the appropriate orbital, and A denotes the normalised antisymmetrising operator.
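Numerically, the antisymmetrised product (5.3) evaluated at a configuration point is just (1/√N!) times a determinant, and the antisymmetry can be checked directly. (The three 1-D "orbitals" below are my own illustrative choice, not functions from the text.)

```python
import math
import numpy as np

# Phi(x_1..x_N) = (1/sqrt(N!)) det[ phi_i(x_j) ].  Swapping two electron
# coordinates swaps two columns of the matrix, so Phi changes sign.
orbitals = [lambda x: np.exp(-x**2),
            lambda x: x*np.exp(-x**2),
            lambda x: (2*x**2 - 1)*np.exp(-x**2)]

def slater(xs):
    M = np.array([[phi(xx) for xx in xs] for phi in orbitals])
    return np.linalg.det(M)/math.sqrt(math.factorial(len(xs)))

x = [0.3, -0.7, 1.2]
x_swapped = [x[1], x[0], x[2]]
print(slater(x), slater(x_swapped))   # equal magnitude, opposite sign
```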
With functions like (5.3) it is easy to show that our integral (5.2) reduces to

<Φ_p|H|Φ_q> = Σ_{ij} d_ij <φ_i|h|φ_j> + ½ Σ_{ijkl} d_ijkl <φ_i φ_j|1/r_12|φ_k φ_l>          (5.4)

where the d_ij and the d_ijkl are in general sums of products of (N-1) (or (N-2)) orbital overlap integrals of the form <φ_m|φ_n>, with neither m nor n in ij (or ijkl). (For details see e.g. McWeeny and Sutcliffe pp. 50-51.) We have thus reduced our problem to that of performing at most six-dimensional quadratures.
Currently we do not have certain methods of handling six-dimensional numerical quadrature, and we tend to consider integrals not evaluable unless we can analytically reduce them until they involve at most a one dimensional numerical quadrature. Now there are some signs that this situation is beginning to change. In the early sixties some pioneers began looking at "Monte-Carlo" methods for such problems, and though the results there were not very encouraging, a small amount of work still appears to be going

on, judging from published articles, some of the most interesting by pupils of, or by those influenced by, the work of the late Professor Boys (see e.g. Hyslop, Theoret. Chim. Acta 31, 189 (1973); Boys and Handy, Theoret. Chim. Acta 31, 195 (1973)), but also by other workers (see e.g. Daudey and Diner, Int. J. Quant. Chem. VI, 575 (1972)). It is an area that could do with very much more work, since the requirement of analytical reducibility to one dimension is a very strong one indeed and effectively rules out some interesting possibilities for orbital choices, for example many centre orbitals or orbitals composed of numerical Hartree-Fock atomic orbitals.
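To illustrate the flavour of such methods (a sketch of my own, not a calculation from the text): a plain Monte-Carlo estimate of a six-dimensional integral containing the 1/r_12 singularity, here the mean of 1/r_12 for two electrons distributed as independent 3-D standard normals. The exact value is 1/√π, since r_1 - r_2 is normal with variance 2 per component.

```python
import numpy as np

# Crude Monte-Carlo: sample both electron positions, average 1/|r1 - r2|.
# The integrable singularity at r12 = 0 causes no trouble for sampling.
rng = np.random.default_rng(42)
n = 400_000
r1 = rng.standard_normal((n, 3))
r2 = rng.standard_normal((n, 3))
est = np.mean(1.0/np.linalg.norm(r1 - r2, axis=1))
print(est)   # close to 1/sqrt(pi) = 0.5642
```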
At the present time it seems as if lack of technique even further confines us to orbitals of the form

φ_i = Σ_{r=1}^{m} c_ri η_r,   m ≥ N          (5.5)

where the basis {η_r} consists of a set of orbitals, generally non-orthogonal, having local centres within the molecule, and of relatively simple analytic form. The η_r could, for example, be Slater orbitals on each nucleus.

Over the past 10 years (for reasons that will become clear to you from the section of this course devoted to integral evaluation) the front runner in the basis set stakes has been the gaussian orbital. I do not wish to analyse the reasons for this view, but I simply wish to point out that, in spite of the successes of computational work in this field in the past few years, there still remains an enormous problem of integral evaluation that has to be tackled if we are to use functions that can do better point by point than can gaussians.
For completeness it should perhaps at this stage be pointed out that
in the transcorrelated method one is not in fact forced back to
simple orbital product form, but to orbital product form supplemented
by a function involving pairs of the interelectronic variables.
However, even in this method, as you will see, there remains an
integral problem very similar to, but somewhat harder than, that
which arises in the more usual methods.
Given however that we are going to use orbitals of the kind (5.5),
then we must recognise that in order to get the integrals over the
$\varphi_i$ for use in (5.4) we have a transformation problem, and in
particular we have the celebrated four-index transformation problem:

$\langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_k\varphi_l\rangle \qquad (5.6)$

The details of how this may best be done form another part of the
course, but it is clear that the optimum algorithm for such a
transformation is of order $m^5$ operations for a complete
transformation.

Now we expect to use quite a large basis $\{\eta\}$ in our
calculations, if only because a large one will be forced on us in
reasonable sized molecules, and we recognise that this process of
transformation will be extremely time-consuming. Indeed until very
recently the process was considered to be computationally unfeasible
except for very small bases (less than about 20), and this fact again
had consequences, in that it forced concentration on just one-term
approximations in (5.1) where, as we shall see, explicit use of a
full transformation technique can be avoided.
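The $m^5$ operation count comes from doing the transformation one index at a time, as four successive "quarter" transformations. A minimal modern sketch (toy random data; a real program would also exploit the permutational symmetry of the integrals):

```python
import numpy as np

def four_index_transform(g_ao, C):
    """Transform two-electron integrals g_ao[p,q,r,s] from the basis {eta}
    to the orbital basis phi = eta C, one index at a time.

    Each of the four steps costs O(m^5); doing the whole contraction in
    one shot would cost O(m^8).
    """
    g = np.einsum('pqrs,pi->iqrs', g_ao, C)   # first quarter transformation
    g = np.einsum('iqrs,qj->ijrs', g, C)
    g = np.einsum('ijrs,rk->ijks', g, C)
    g = np.einsum('ijks,sl->ijkl', g, C)
    return g

# check against the one-shot O(m^8) contraction on a tiny basis
rng = np.random.default_rng(0)
m = 4
g_ao = rng.standard_normal((m, m, m, m))
C = rng.standard_normal((m, m))
ref = np.einsum('pqrs,pi,qj,rk,sl->ijkl', g_ao, C, C, C, C)
```

The two routes agree to machine precision; only the operation counts differ.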
Now it is, of course, the case that a transformation can be avoided
by using the basis $\{\eta\}$ directly, and if the basis consists of
atomic orbitals this approach leads to the classical valence-bond
(VB) method (for convenience we shall call all methods employing a
raw basis of non-orthogonal orbitals valence-bond methods). The
trouble with this approach is that, since the basis is generally
non-orthogonal, few of the $d_{ij}$ and the $d_{ijkl}$ of (5.4)
vanish, and these must be evaluated essentially as minors of order
$(N-1)$ (or $(N-2)$) of determinants of overlap integrals.

This procedure, though formally simple, is extremely time-consuming,
involving processes (at best) of order $m^3$ operations for each
integral like (5.4). Since one must, in this approach, use a
many-term approximation to get anywhere at all, the process can soon
get out of hand. Again until recently the only VB-type calculations
had been on simple first-row diatomics, and the VB method had in
consequence come to be regarded as a kind of bad joke, begun by
Pauling and continued by organic chemists with the sole object of
plaguing honest hard-working theoreticians. However the difficulties
of dealing with (5.4) were and are real enough (though there have
been some recent technical advances), and in consequence the tendency
is still to concentrate on an orthogonal basis.
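Note, in passing, that all the first-order cofactors (the signed minors of order $(N-1)$) of an overlap determinant follow from a single $O(m^3)$ matrix inversion, via the identity $\mathrm{adj}(\mathbf{S}) = \det(\mathbf{S})\,\mathbf{S}^{-1}$, rather than from $m^2$ separate determinant evaluations. A small numerical sketch, with an invented overlap matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                                   # number of occupied orbitals
# overlap matrix S_ij = <phi_i | phi_j> of a non-orthogonal orbital set
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)             # symmetric positive definite

# <Phi|Phi> for a determinant built from these orbitals is det(S); the
# coefficient of h_ij is the signed first-order cofactor of S (det of S
# with row i and column j struck out, times (-1)^(i+j))
detS = np.linalg.det(S)
cof = detS * np.linalg.inv(S).T         # all n^2 cofactors, one O(n^3) step

# compare one cofactor with the explicit struck-out determinant
i, j = 1, 3
minor = np.delete(np.delete(S, i, axis=0), j, axis=1)
explicit = (-1) ** (i + j) * np.linalg.det(minor)
```

The two evaluations agree, which is why the cofactor route became practicable once cheap matrix inversion was routine.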
Now again, as is well known, if you choose the orbitals from which
your $\Phi$ are constructed to form a mutually orthogonal set, then
you can cut down the number of basic integrals required, for, because
of the orthogonality restrictions, many vanish.

The very worst case that can arise here is the so-called diagonal
case, where the matrix element can be written as

$\langle\Phi|\hat H|\Phi\rangle = \sum_{i=1}^{N}\langle\varphi_i|\hat h|\varphi_i\rangle + \tfrac{1}{2}\sum_{i,j=1}^{p}\big\{\langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_i\varphi_j\rangle - \langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_j\varphi_i\rangle\big\}$
$\qquad + \tfrac{1}{2}\sum_{i,j=p+1}^{N}\big\{\langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_i\varphi_j\rangle - \langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_j\varphi_i\rangle\big\} + \sum_{i=p+1}^{N}\sum_{j=1}^{p}\langle\varphi_i\varphi_j|\,1/r_{12}\,|\varphi_i\varphi_j\rangle \qquad (5.7)$

where for definiteness the first p orbitals in the product are
associated with α spin functions and the orbitals from p+1 to N with
β spin functions. Here it is easy to see we need just N one-electron
and N(N+1) two-electron integrals, so that the problem is an order of
magnitude smaller than the general problem (5.4).
As is well known, in an orthogonal basis the off-diagonal matrix
elements $\langle\Phi_p|\hat H|\Phi_q\rangle$, p≠q, vanish unless the
products $\Phi_p$ and $\Phi_q$ have two or fewer spin-orbital
differences between them, and the non-vanishing matrix elements take
a reasonably simple form ("Slater's Rules", see e.g. McWeeny and
Sutcliffe pp. 49-50). This of course makes a many-term approximation
in (5.1) easier to handle as far as matrix elements go, but the
transformation problem is certain to arise if an orthogonal basis is
used to get the integrals over the orthogonal orbitals, so that a
one-term approach is again sought in order to avoid this.
Thus we see that no matter how we twist and turn we are presently
forced back on a one-term orbital product approximation, with really
very crude orbitals, as the only possibility for practical molecular
computation. There presently appear some possibilities of this
changing, but effectively all the history of the subject (which in a
sense does not begin in earnest until 1951 and does not get really
flying until 1964) is the history of this kind of one-term
approximation. We shall therefore examine it in greater detail in
the next section.

6. SOME PROPERTIES OF THE SINGLE TERM (DETERMINANT) FUNCTION


If we are going to choose a single term approximation to (5.1) then
it clearly is sensible to choose the orbitals in such a way that the
energy is minimised. This choice leads to the Hartree-Fock-Slater
Self-Consistent-Field method. The practice of this method, as
described by Dr. Veillard, constitutes a whole unit of this course,
but in this section I should simply like to say one or two things
about the theory underlying the SCF method. Some of what I say will
overlap with what is being said in other lectures, but I have a
different purpose in mind: I seek to indicate how what is said and
done may be justified.

We shall make our discussion quite general within the class of single
determinant approximations, i.e.

$\Phi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N) = \frac{1}{\sqrt{N!}}\,\det|\varphi_1\alpha\;\varphi_2\alpha \cdots \varphi_p\alpha\;\varphi_{p+1}\beta \cdots \varphi_N\beta| \qquad (6.1)$

and we shall assume that $\Phi$ is normalised to unity.

The first thing that must be said is that we can require, if we wish,
without any loss of generality, that the first p orbitals form an
orthonormal set $\{\varphi^\alpha\}$, as shall the set p+1 to N,
$\{\varphi^\beta\}$. In this case the spin-orbitals comprising $\Phi$
form an orthonormal set. To see this let us assume that the sets
$\{\varphi^\alpha\}$ and $\{\varphi^\beta\}$ are the minimizing
orbitals, but are not orthonormal sets, though they are linearly
independent sets (if they were not, $\Phi$ would vanish). Under these
circumstances the sets can be mapped into orthonormal sets by a
linear transformation.

-f"

'" -+ '"
-'"
""'--

'" V'"
'"
= f"
-,-.

and similarly for~.

<:i:_al
r_a>
'I'
'I'

(6.2)

It is immediately clear that all these

59

FUNDAMENTALS OF COMPUTATIONAL QUANTUM CHEMISTRY

transformations do is to add multiples of one column to another


in the determinant, and in consequence to change the determinant
at most by a multiplicative factor, which is taken care of by
the normalisation requirement.
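The transformation (6.2) can be realised concretely by the symmetric (Löwdin) choice $\mathbf{V} = \mathbf{S}^{-1/2}$. In the little sketch below the "orbitals" are simply the columns of a matrix, so that both the overlap matrix and the determinant can be computed directly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# columns of T: a linearly independent but non-orthonormal orbital set
T = rng.standard_normal((n, n)) + 2 * np.eye(n)
S = T.T @ T                             # overlap matrix of the set

# symmetric (Lowdin) orthogonalisation: V = S^(-1/2)
w, U = np.linalg.eigh(S)
V = U @ np.diag(w ** -0.5) @ U.T
T_orth = T @ V

# the new set is orthonormal ...
assert np.allclose(T_orth.T @ T_orth, np.eye(n))
# ... and the determinant built from it differs only by the
# multiplicative factor det(V), absorbed by renormalisation
assert np.isclose(np.linalg.det(T_orth),
                  np.linalg.det(T) * np.linalg.det(V))
```

Any non-singular $\mathbf{V}$ with $\mathbf{V}^{\dagger}\mathbf{S}\mathbf{V} = \mathbf{1}$ would do; the symmetric choice is merely the one that stays closest (in a least-squares sense) to the original set.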

In fact it is possible to choose a set of orbitals such that not only
are the $\varphi^\alpha$ and $\varphi^\beta$ orthonormal among
themselves, but that each orbital in $\varphi^\alpha$ is orthogonal
to all but one of the orbitals in $\varphi^\beta$ (and vice versa),
and still leave the energy unchanged. These orbitals are called
"corresponding orbitals" and are discussed, for example, by Harriman,
J.C.P. 40, 2827 (1964). They are closely related to the natural
orbitals of the problem and can be very useful in spin projection
analysis (see later). However they are not convenient orbitals for
direct calculation.


The next aim is rather more subtle but of the same kind as the
previous one. Suppose that the molecular Hamiltonian is invariant
under the operations of a point group G. If the orbital set is
arbitrary then, under some operation of the group G, the orbital set
$\varphi_i$ will be mapped into some other set $\varphi_i'$, and the
$\varphi_i'$ will not necessarily be related to the $\varphi_i$.
However if the set is chosen so that under any operation of the group

$\varphi_i \to \varphi_i' = \sum_{l=n}^{m} \varphi_l\,c_{li} \qquad (6.3)$

where n=1, m=p, or n=p+1, m=N, that is, if the orbitals are chosen
to be symmetry orbitals, then we have the same sort of energy
invariance that we had above, and furthermore we have a total wave
function $\Phi$ which is itself a symmetry function. It should be
noticed that (6.3) does not necessarily imply that the $\varphi_i$
belong to irreducible representations of the point group, but in
practice we usually choose them to do so.
It should be recognised however that to require that a single
determinant wave function be a symmetry function is in fact to impose
an extra requirement on it. Thus there may be, in any given problem,
a different solution at a different minimum to that obtained in the
symmetry restricted case. If there is such a solution it will of
course have a lower energy than the solution with symmetry.
It is, of course, obvious that a single term function can at
best transform like one basis vector carrying an irreducible
representation so that a single term function cannot be used to
describe a degenerate state, and in general single term functions
are restricted to describing states belonging to one dimensional
irreducible representations.
The incorporation of a space symmetry restriction, in practice,
presents no problems, for as we shall see shortly, we can set up an
SCF equation so that we can satisfy the symmetry requirements on the
orbitals automatically. However it is not so easy to satisfy the
other natural symmetry requirement, namely that $\Phi$ should be a
spin eigenfunction.

It is easy to show that a one-term function will be a spin
eigenfunction if and only if

$\varphi_i = \varphi_{p+i}, \qquad i = 1, \ldots, N_\beta \qquad (6.4)$

and thus to make $\Phi$ a spin eigenfunction we must impose this
extra restriction, and this is not easily done, except when
$N_\alpha = N_\beta$, the so-called closed-shell singlet case, of
which more later. It is however possible to develop a method of
solution containing this restriction, and this method will be dealt
with as the open-shell SCF scheme; but as the same scheme can be used
in limited instances of many-term functions, its consideration at
this point is deferred. For the moment then let us neglect spin
symmetry and write our MO's in LCAO form as

$\boldsymbol{\varphi}^{\alpha} = \boldsymbol{\eta}\,\mathbf{T}^{\alpha}, \qquad \boldsymbol{\eta}\ (1\times m),\ \mathbf{T}^{\alpha}\ (m\times N_\alpha) \qquad (6.5)$

$\boldsymbol{\varphi}^{\beta} = \boldsymbol{\eta}\,\mathbf{T}^{\beta}, \qquad \mathbf{T}^{\beta}\ (m\times N_\beta) \qquad (6.6)$

where each column of $\mathbf{T}$ is a set of orbital coefficients.
Straightforward manipulation of (4.3) then yields


$E = \mathrm{tr}\,\mathbf{h}\mathbf{R}^{\alpha} + \tfrac{1}{2}\,\mathrm{tr}\,\mathbf{R}^{\alpha}\big[\mathbf{J}(\mathbf{R}^{\alpha}+\mathbf{R}^{\beta}) - \mathbf{K}(\mathbf{R}^{\alpha})\big] + \text{a similar term with } \alpha \text{ and } \beta \text{ interchanged} \qquad (6.7)$

where

$\mathbf{R}^{\alpha} = \mathbf{T}^{\alpha}\mathbf{T}^{\alpha\dagger} \qquad (6.8)$

$(\mathbf{J}(\mathbf{R}))_{rs} = \sum_{tu} R_{ut}\,\langle\eta_r\eta_t|\,1/r_{12}\,|\eta_s\eta_u\rangle \qquad (6.9)$

$(\mathbf{K}(\mathbf{R}))_{rs} = \sum_{tu} R_{ut}\,\langle\eta_r\eta_t|\,1/r_{12}\,|\eta_u\eta_s\rangle \qquad (6.10)$

and "tr" denotes the operation of taking the matrix trace. The
elements of $\mathbf{h}$ are $h_{rs} = \langle\eta_r|\hat h|\eta_s\rangle$.

Our basic SCF problem is now to minimise E against variations in
$\mathbf{T}^{\alpha}$ and $\mathbf{T}^{\beta}$ subject to the
constraints

$\mathbf{T}^{\alpha\dagger}\mathbf{S}\,\mathbf{T}^{\alpha} = \mathbf{1}_{N_\alpha}, \qquad \mathbf{T}^{\beta\dagger}\mathbf{S}\,\mathbf{T}^{\beta} = \mathbf{1}_{N_\beta} \qquad (6.11)$

where $\mathbf{1}_n$ is a unit matrix of dimension n by n and
$S_{rs} = \langle\eta_r|\eta_s\rangle$.

Now it is at once apparent that the energy expression as it stands is
invariant under a unitary transformation of MO's of the type

$\boldsymbol{\varphi}^{\alpha\prime} = \boldsymbol{\varphi}^{\alpha}\mathbf{U}^{\alpha}, \qquad \boldsymbol{\varphi}^{\beta\prime} = \boldsymbol{\varphi}^{\beta}\mathbf{U}^{\beta} \qquad (6.12)$

and this we may see simply by writing, for example,

$\boldsymbol{\varphi}^{\alpha\prime} = \boldsymbol{\eta}\,\mathbf{T}^{\alpha}\mathbf{U}^{\alpha} = \boldsymbol{\eta}\,\mathbf{T}^{\alpha\prime} \qquad (6.13)$

and we see that

$\mathbf{R}^{\alpha\prime} = \mathbf{T}^{\alpha\prime}\mathbf{T}^{\alpha\prime\dagger} = \mathbf{T}^{\alpha}\mathbf{U}^{\alpha}\mathbf{U}^{\alpha\dagger}\mathbf{T}^{\alpha\dagger} = \mathbf{R}^{\alpha} \qquad (6.14)$

so the invariance of E follows at once from the invariance of
$\mathbf{R}^{\alpha}$. As Dr. Veillard will show, this invariance can
be made use of to show that the vectors forming the columns of
$\mathbf{T}$ which have the required minimum properties can be
obtained by iterative solution of the pseudo-eigenvalue problems

$\mathbf{h}^{F}_{\alpha}\,\mathbf{C}^{\alpha} = \mathbf{S}\,\mathbf{C}^{\alpha}\,\boldsymbol{\varepsilon}^{\alpha} \qquad (6.15)$

where

$\mathbf{h}^{F}_{\alpha} = \mathbf{h} + \mathbf{J}(\mathbf{R}^{\alpha}+\mathbf{R}^{\beta}) - \mathbf{K}(\mathbf{R}^{\alpha}) \qquad (6.16)$

with a similar problem for the β orbitals. The MO's obtained by
solution of this problem are (in the absence of degeneracy in the
$\boldsymbol{\varepsilon}^{\alpha}$) uniquely defined to within a
phase factor (in the case of real orbitals to within a sign factor),
and constitute the canonical molecular orbitals of the problem.
Their uniqueness is easily proved (pace degeneracy) by noting that
the set of all vectors $\mathbf{C}^{\alpha}$, $\mathbf{C}^{\beta}$ is
complete in the space of the problem and hence any other solution
vector may be expanded in terms of them.
This invariance of the energy under a unitary transformation also
enables one to show that if the initial orbitals are symmetry
orbitals then $\mathbf{h}^F$ represents a totally symmetric operator
in the tensor operator sense. That is, it transforms like the totally
symmetric representation of the molecular point group. This means
that the solutions to the pseudo-eigenvalue problem are themselves
symmetry functions, and will thus remain so throughout the iterative
solution process. To show this result it is sufficient to notice that
if the initial MO's are chosen to be symmetry orbitals then, under
any operation of the group,

$\mathbf{R}^{\alpha} \to \mathbf{R}^{\alpha\prime} = \mathbf{G}\,\mathbf{R}^{\alpha}\,\mathbf{G}^{\dagger} = \mathbf{R}^{\alpha} \qquad (6.17)$

where $\mathbf{G}$ can be chosen to be unitary, and thus $\mathbf{J}$
and $\mathbf{K}$ are invariant as before.

It is also easy to show that providing there exists a unique energy
minimum for the function $\Phi$ constructed from a given basis, then
the energy is invariant under a general linear transformation
$\boldsymbol{\eta} \to \boldsymbol{\eta}\mathbf{V}$. This means that
it is possible to work in an orthogonalised basis if we wish, without
paying any energy penalty, and, perhaps more important, to work in a
basis of symmetry orbitals if we wish. Thus if we have a
transformation $\mathbf{V}$ that reduces $\mathbf{S}$ to blocked
form, i.e.

$\mathbf{V}^{\dagger}\mathbf{S}\,\mathbf{V} = \mathbf{S}' = \begin{pmatrix} \mathbf{S}'_1 & & \mathbf{0} \\ & \mathbf{S}'_2 & \\ \mathbf{0} & & \mathbf{S}'_3 \end{pmatrix} \qquad (6.18)$

and providing that $\mathbf{S}'$ is not singular, then we can
re-write the eigenvalue equation as

(where $\mathbf{h}^F$ may be either $\mathbf{h}^F_{\alpha}$ or
$\mathbf{h}^F_{\beta}$)

$\mathbf{h}^{F}\,\mathbf{V}\,\mathbf{C}' = \mathbf{S}\,\mathbf{V}\,\mathbf{C}'\,\boldsymbol{\varepsilon} \qquad (6.19)$

and hence

$\mathbf{V}^{\dagger}\mathbf{h}^{F}\,\mathbf{V}\,\mathbf{C}' = \mathbf{V}^{\dagger}\mathbf{S}\,\mathbf{V}\,\mathbf{C}'\,\boldsymbol{\varepsilon}, \qquad \text{i.e.} \qquad \mathbf{h}^{F\prime}\,\mathbf{C}' = \mathbf{S}'\,\mathbf{C}'\,\boldsymbol{\varepsilon} \qquad (6.20)$

Because $\mathbf{S}'$ is blocked, then so must $\mathbf{h}^{F\prime}$
be, and we can therefore split the problem into symmetry blocks for
solution at this stage. The invariance assures us that in this way
we obtain exactly the same results as we would have obtained if we
had transformed the integrals from the basis $\boldsymbol{\eta}$ to
the basis $\boldsymbol{\eta}'$ and obtained symmetry blocking in
that way.


There is another point which it is perhaps worthwhile making known:
suppose that within any symmetry block we find a non-singular matrix
$\mathbf{Z}$ such that $\mathbf{Z}^{\dagger}\mathbf{S}'\,\mathbf{Z} = \mathbf{1}$;
then by the same technique as above we may write

$\mathbf{Z}^{\dagger}\mathbf{h}^{F\prime}\,\mathbf{Z}\,\mathbf{C}'' = \mathbf{C}''\,\boldsymbol{\varepsilon}, \qquad \mathbf{C}' = \mathbf{Z}\,\mathbf{C}'' \qquad (6.21)$

(where $\mathbf{C}''$ is an m×m matrix of coefficients and
$\boldsymbol{\varepsilon}$ a diagonal matrix of eigenvalues) and we
can thus convert the pseudo-eigenvalue problem into a proper problem
very easily.
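The whole reduction — find a $\mathbf{Z}$ with $\mathbf{Z}^{\dagger}\mathbf{S}\mathbf{Z} = \mathbf{1}$, diagonalise $\mathbf{Z}^{\dagger}\mathbf{h}^F\mathbf{Z}$, back-transform — is only a few lines in matrix form. A toy sketch, with an invented symmetric matrix standing in for $\mathbf{h}^F$ at one iteration and $\mathbf{Z}$ taken in the canonical form $\mathbf{U}\,\mathbf{s}^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6
# model Fock and overlap matrices (both symmetric, S positive definite)
hF = rng.standard_normal((m, m)); hF = hF + hF.T
A = rng.standard_normal((m, m)); S = A @ A.T + m * np.eye(m)

# Z with Z^T S Z = 1 (canonical orthogonalisation)
s, U = np.linalg.eigh(S)
Z = U @ np.diag(s ** -0.5)
assert np.allclose(Z.T @ S @ Z, np.eye(m))

# proper eigenvalue problem in the transformed basis
eps, Cpp = np.linalg.eigh(Z.T @ hF @ Z)
C = Z @ Cpp                             # back-transform: C' = Z C''

# C solves the pseudo-eigenvalue problem hF C = S C eps ...
assert np.allclose(hF @ C, S @ C @ np.diag(eps))
# ... and the resulting orbitals are naturally orthonormal
assert np.allclose(C.T @ S @ C, np.eye(m))
```

The two assertions at the end are exactly the two selling points of canonical form mentioned in the text: a problem a computer diagonaliser can handle directly, and orbitals that come out orthonormal for free.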
It should also be noticed that if you have a matrix
$\mathbf{C}^{(i)}$ obtained from solution of this problem at the
i'th iteration, then this matrix will transform $\mathbf{S}$ to a
unit matrix, and thus may be used in place of $\mathbf{Z}$ at the
(i+1)'th iteration. If the matrix $\mathbf{C}^{(i)}$ diagonalises
$\mathbf{h}^{F(i)}$, then close enough to self-consistency it should
also nearly diagonalise $\mathbf{h}^{F(i+1)}$ and so speed the
iterative convergence.

This discussion perhaps helps to explain in some measure why we
strive so hard to cast our SCF equations in canonical form, even in
open-shell situations where canonical form is far from natural. It
is because the pseudo-eigenvalue problem is easily forced into
eigenvalue form, and hence very easily solved using a computer, and
also because, as a consequence of the solution of this problem, you
obtain a set of orbitals which are naturally orthonormal.

Furthermore one is provided automatically with a set of (m-n)
orbitals which are not used in making up $\Phi$ (the unoccupied
orbitals) and which may be used (though many would say they should
not be used) in improving the given function by constructing extra
configurations and performing a many-configuration calculation.
However canonical form is not without its disadvantages,
since there are no theorems to show that an iterative process in
canonical form must converge and indeed it is easy (as we shall
see later) to construct examples where such a process definitely
does not converge.
If we do construct more determinants from an original function, by
replacing the occupied orbitals one, two, three and so on at a time
by orbitals from the unoccupied set (for any given spin), then we
know from Slater's rules that only determinants differing in two or
fewer orbitals will have matrix elements with the original
determinant. There is also a theorem, often credited to Brillouin,
which tells us that in fact there will not be matrix elements between
the singly substituted determinants and the original one. This
theorem is easy to prove and is proved in all the standard texts; I
simply mention it as we shall need to use it later.
While on this topic perhaps I should remind you of the
approximate result due to Koopmans, which states that ionisation
potentials are the negatives of the canonical orbital energies.
So far we have not really considered the problem of spin symmetry in
single determinant functions, but it is very easy to see (as
mentioned in Section 5) that our general one-determinant function is
not an eigenfunction of $\hat S^2$ though it is of $\hat S_z$, and
there is no particularly straightforward method of making it such an
eigenfunction. To achieve this end (if it is desired) there are three
possible ways.

1. Constrain the orbitals so that the single determinant is a spin
eigenfunction. This may be done by constructing a determinant of
orbitals doubly occupied as far as possible, i.e.

$\Phi(\mathbf{x}_1, \ldots, \mathbf{x}_N) = \frac{1}{\sqrt{N!}}\,\det|\varphi_1\alpha\;\varphi_2\alpha \cdots \varphi_{N_\alpha}\alpha\;\varphi_1\beta\;\varphi_2\beta \cdots \varphi_{N_\beta}\beta|$

This function is a spin eigenfunction with eigenvalues

$S = M_S = \tfrac{1}{2}(N_\alpha - N_\beta)$

2. Optimise the function as described previously (the method is often
called the unrestricted Hartree-Fock (UHF) method) but then project
out from the optimised function the required spin eigenfunction,
using the projector

$\hat O_S = \prod_{S' \neq S} \frac{\hat S^2 - S'(S'+1)}{S(S+1) - S'(S'+1)}$

3. Project out the required state from the general function and then
optimise the resulting projected function.
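The second route can be illustrated in the smallest non-trivial case: two electrons with $M_S = 0$, where the spin space is spanned by αβ and βα and the matrix of $\hat S^2$ (in units of $\hbar^2$) is [[1, 1], [1, 1]]:

```python
import numpy as np

# two-electron, M_S = 0 spin space in the basis {alpha beta, beta alpha}
S2 = np.array([[1.0, 1.0],
               [1.0, 1.0]])            # eigenvalues 0 (singlet), 2 (triplet)

# a UHF-like single determinant, a pure alpha-beta product: not a spin
# eigenfunction (it mixes the S = 0 and S = 1 states)
psi = np.array([1.0, 0.0])

# single annihilator removing the S' = 1 contaminant from the S = 0
# target: (S^2 - S'(S'+1)) / (S(S+1) - S'(S'+1)) with S = 0, S' = 1
A1 = (S2 - 2.0 * np.eye(2)) / (0.0 - 2.0)
projected = A1 @ psi                   # (alpha beta - beta alpha)/2

# the projected function is an S^2 eigenfunction with eigenvalue 0
assert np.allclose(S2 @ projected, 0.0 * projected)
assert np.allclose(projected, [0.5, -0.5])
```

With only two spin states, the "product over all contaminants" collapses to this single annihilation factor, which is why the single annihilator is such a cheap approximation in the general case.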
If the first procedure is adopted this leads to the open-shell SCF
problem, about which Dr. Veillard has spoken, except in the case
where $S = M_S = 0$, and there we get the usual closed-shell problem,
with which you are very familiar. By a happy accident Brillouin's
theorem and Koopmans' approximate result hold for closed-shell
results, but they do not hold for the general open-shell results. It
is also the case that sometimes functions produced in the third way
can also be managed in the conventional open-shell manner, but
generally a many-configuration SCF approach is necessary here to
perform the optimisation.

The second method is often adopted (at least in principle) following
a UHF calculation. In practice it is the custom to use just one term
from $\hat O_S$, the single annihilator, to remove the most important
contaminating spin component. There are no hard and fast rules for
deciding what that component will be, but it is often taken to be the
state with S value one higher than the state of interest. The state
of interest is nearly always the state with lowest permissible S
value. There are many programs available to perform a single
annihilation and at least one is available to perform complete
projection. More details of the theory in computational form can be
found in Amos and Snyder, J.C.P. 41, 1773 (1964) and Harriman,
J.C.P. 40, 2827 (1964). It is sufficient here to notice that
projection after optimisation leads to a many-determinant wave
function which is not itself an optimised many-determinant function.

7. BEYOND THE SINGLE TERM APPROXIMATION


In this section I want to provide a background for what

others will have to say on correlated wave functions.

To do this

I shall need to say a little bit about density matrices.

For a

detailed discussion of density matrices I might in all modesty


recommend the excellent exposition available in McWeeny and
Sutcliffe "Methods of Molecular Quantum Mechanics", especially
Ch. 4.

However I just remind you that the one-particle density matrix
derived from an electronic wave function $\Psi$ is defined as

$P_1(\mathbf{x}_1; \mathbf{x}_1') = N \int \Psi(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N)\,\Psi^{*}(\mathbf{x}_1', \mathbf{x}_2, \ldots, \mathbf{x}_N)\,d\mathbf{x}_2 \cdots d\mathbf{x}_N \qquad (7.1)$

and the two-particle density matrix is defined as

$P_2(\mathbf{x}_1, \mathbf{x}_2; \mathbf{x}_1', \mathbf{x}_2') = N(N-1) \int \Psi(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_N)\,\Psi^{*}(\mathbf{x}_1', \mathbf{x}_2', \mathbf{x}_3, \ldots, \mathbf{x}_N)\,d\mathbf{x}_3 \cdots d\mathbf{x}_N \qquad (7.2)$

The associated spinless quantities $P_1(\mathbf{r}_1; \mathbf{r}_1')$
and $P_2(\mathbf{r}_1, \mathbf{r}_2; \mathbf{r}_1', \mathbf{r}_2')$
are derived from these equations by integrating out the spin
variables.
For a single determinant function it is very easy to show that

$P_1(\mathbf{r}; \mathbf{r}') = P_1^{+}(\mathbf{r}; \mathbf{r}') + P_1^{-}(\mathbf{r}; \mathbf{r}') \qquad (7.3)$

where

$P_1^{+}(\mathbf{r}; \mathbf{r}') = \sum_{i=1}^{p} \varphi_i(\mathbf{r})\,\varphi_i^{*}(\mathbf{r}'), \qquad P_1^{-}(\mathbf{r}; \mathbf{r}') = \sum_{i=p+1}^{N} \varphi_i(\mathbf{r})\,\varphi_i^{*}(\mathbf{r}') \qquad (7.4)$

$P_1^{+}$ ($P_1^{-}$) is the probability density (distribution)
function for an electron with spin up (down), so that the probability
of finding a particle with spin up in a volume element V is just
$\int_V P_1^{+}(\mathbf{r})\,d\mathbf{r}$, and similarly for a
particle with spin down. The interpretation of $P_1(\mathbf{r})$ as a
probability density for a particle irrespective of its spin is then
clear.
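Two characteristic properties of $P_1$ for a single determinant — its trace is the electron number, and it is idempotent — are easily checked in a discrete toy model in which the orbitals are orthonormal columns of a matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
m, N = 8, 3                            # basis size, number of electrons
# orthonormal occupied orbitals as columns (a discrete stand-in for phi_i)
Q, _ = np.linalg.qr(rng.standard_normal((m, N)))

# one-particle density matrix of the determinant:
# P1(r; r') = sum_i phi_i(r) phi_i*(r')
P1 = Q @ Q.T

# its trace is the electron number ...
assert np.isclose(np.trace(P1), N)
# ... and for a single determinant P1 is idempotent (P1^2 = P1)
assert np.allclose(P1 @ P1, P1)
```

The idempotency is special to the one-determinant case; for correlated functions the eigenvalues of $P_1$ (the occupation numbers of the natural orbitals, see below) fall between 0 and 1.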
Now on the ordinary statistical definition of independence, two
events are independent if their joint distribution function is just
the product of the individual distribution functions. Here the joint
distribution function is just $P_2(\mathbf{r}_1, \mathbf{r}_2)$
$(= P_2(\mathbf{r}_1, \mathbf{r}_2; \mathbf{r}_1, \mathbf{r}_2))$,
and simple analysis shows that this function can be written in a form
very similar to that of (7.3), that is as a sum of terms $P_2^{++}$,
$P_2^{+-}$, $P_2^{-+}$ and $P_2^{--}$, the first term describing the
joint probability density for two particles both with spin up, the
second term that for particle one with spin up and two with spin
down, and so on.
Analysis of the single determinant wave function reveals that we can
write

$P_2^{++}(\mathbf{r}_1, \mathbf{r}_2) = P_1^{+}(\mathbf{r}_1)\,P_1^{+}(\mathbf{r}_2) - |P_1^{+}(\mathbf{r}_1; \mathbf{r}_2)|^2 \qquad (7.6)$

$P_2^{+-}(\mathbf{r}_1, \mathbf{r}_2) = P_1^{+}(\mathbf{r}_1)\,P_1^{-}(\mathbf{r}_2) \qquad (7.7)$

with similar expressions for $P_2^{--}$ and $P_2^{-+}$. Thus we see
that on our definition, particles with different spin behave as if
they were completely independent in the single determinant
approximation. We often say that their motions are completely
uncorrelated. However the motions of particles with like spin are
correlated, as they must be in order to obey the Pauli principle,
and indeed it can easily be seen from (7.6) that as
$\mathbf{r}_1 \to \mathbf{r}_2$, $P_2^{++} \to 0$ as required by this
principle. This kind of correlation is often called "Fermi
correlation" and it must be present in any electronic wave function.

However it is in a sense "accidental" and does

not reflect the fact that, since electrons interact, we should


expect their motions to be correlated.

Thus we should expect, in

any really good wave function, instead of a form like (7.7) rather
a form like
(7.8)
where the "coulomb hole" function f+- described the details of
the correlation between the particle motions.

It is often useful

to look at efforts to improve on the one term approximation as


efforts to describe f

+-

properly.
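The contrast between (7.6) and (7.7) can be made visible numerically. In the toy model below (two invented orthonormal "orbitals" on a one-dimensional grid) the same-spin pair density of a determinant vanishes identically at coalescence, while the opposite-spin product form does not:

```python
import numpy as np

# two real orthonormal "orbitals" on a 1-D grid (a toy stand-in)
x = np.linspace(-4, 4, 401)
dx = x[1] - x[0]
g0 = np.exp(-x**2 / 2); g0 /= np.sqrt(np.sum(g0**2) * dx)
g1 = x * np.exp(-x**2 / 2); g1 /= np.sqrt(np.sum(g1**2) * dx)

# spin-up density matrix for two up-spin electrons in g0 and g1
P1 = np.outer(g0, g0) + np.outer(g1, g1)     # P1(+)(x; x')
rho = np.diag(P1).copy()                     # P1(+)(x)

# same-spin pair density (7.6): rho(x1) rho(x2) - |P1(x1; x2)|^2
P2pp = np.outer(rho, rho) - P1**2

# Fermi hole: the same-spin pair density vanishes at coalescence x1 = x2,
# while the opposite-spin product rho(x1) rho(x2) stays finite there
assert np.allclose(np.diag(P2pp), 0.0)
assert np.diag(np.outer(rho, rho)).max() > 0.1
```

The first assertion is the Fermi hole; the second shows that the uncorrelated opposite-spin form (7.7) has no such hole, which is exactly the deficiency $f^{+-}$ is meant to repair.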

It is sensible therefore to ask what is known about $f^{+-}$. The
answer is, unfortunately, very little. About the strongest analytic
result that can have any bearing on the form of $f^{+-}$ is the one
due to Kato, of which we spoke in Section 3 and whose differential
form is

$\left(\frac{\partial \bar\Psi}{\partial r_{12}}\right)_{r_{12}=0} = \tfrac{1}{2}\,\bar\Psi(r_{12}=0) \qquad (7.9)$

where $\mathbf{R} = \tfrac{1}{2}(\mathbf{r}_1 + \mathbf{r}_2)$,
$\mathbf{r}_{12} = \mathbf{r}_1 - \mathbf{r}_2$, and where the bar
indicates a spherical average, to be taken as follows: take
$\mathbf{r}_{12}$ to define a polar system $r_{12}$, $\theta_{12}$,
$\phi_{12}$, and now integrate out the variables $\theta_{12}$ and
$\phi_{12}$, keeping $r_{12}$, $\mathbf{R}$, $\mathbf{x}_3$, etc.
constant. This result holds only for one pair of electrons coming
together. No one knows quite what should happen when more than one
pair come simultaneously close together, or when one pair come
together near a nucleus, and so on. Furthermore the relationship
(7.9) cannot tell us anything about what happens when $\mathbf{r}_1$
tends to $\mathbf{r}_2$ along various possible paths. It is of course
clear that (7.9) is a relation rather like the virial theorem, in
that the exact solution must satisfy it, but it provides no guarantee
that any approximate solution that does satisfy it is a good
solution.


We can integrate (7.9) in the sense that we can decide the most
general functional form for $\bar\Psi$ that satisfies the relation,
and we have seen that on this definition of integration

$\bar\Psi = \bar\Psi(r_{12}{=}0)\,\big(1 + \tfrac{1}{2} r_{12}\big) + O(r_{12}^2) \qquad (7.10)$

From this form it is easy to see that the exact solution possesses a
cusp at $r_{12} = 0$ and we should expect
$f^{+-}(\mathbf{r}_1, \mathbf{r}_2)$ to reflect this behaviour. This
more or less exhausts what is known.

We are, therefore, forced to proceed from this stage rather like
blind men, but it would seem that a reasonable way to proceed would
be to assume that an improvement over the single determinant function
would be a function

$\Phi = \frac{1}{\sqrt{N!}}\,\det|\varphi_1\alpha \cdots \varphi_p\alpha\;\varphi_{p+1}\beta \cdots \varphi_N\beta|\,\prod_{i>j} f_{ij}(\mathbf{r}_i, \mathbf{r}_j) \qquad (7.11)$

where the functions in the product are assumed so chosen as to make
the product symmetric under interchange of a pair of variables. This
is most easily achieved by choosing the $f_{ij}$ to be the same
function, say f, for all pairs ij.


What then should be the form of f? This again is very difficult to
say, but it would certainly be sensible to choose a form which at
least potentially could satisfy the cusp conditions. One of the most
popular forms (see e.g. Hirschfelder, J.C.P. 39, 3145 (1963)) is

$f(r_{12}) = 1 + \tfrac{1}{2}\,r_{12}\,e^{-\gamma r_{12}} \qquad (7.12)$

This form is of course not the only one, nor can it be a completely
proper form, since it depends only on $r_{12}$ and not on
$\mathbf{r}_1$ and $\mathbf{r}_2$ separately.

From what we have said before, (7.11) is clearly a non-starter from a
computational point of view, if we are going to use the variation
theorem and (7.12) directly. (Boys in his transcorrelated method uses
(7.11) and a function very like (7.12) directly, but he avoids the
matrix element problem by abandoning the variation theorem; Dr. Handy
will deal with this in more detail.) If we want to stick to the
variation theorem the best that we can probably do is to try and
simulate the behaviour of f using orbitals. If we knew the details of
(7.12) (what values of γ are appropriate and so on) then we could of
course simply attempt a least-squares expansion of (7.12) in a chosen
basis. I don't want to discuss this as a realistic possibility,
because we don't in general know γ, and anyway it may not be the best
way to use a given basis.

The point that I want to make is that, essentially, to describe
correlation we must simulate a function which is more or less
"peaked" at a particular $r_{12}$ value. Intuitively one would expect
that the function should be peaked at the $r_{12}$ value
corresponding to the most probable electron separation in the
zeroth-order approximation (i.e. the one-determinant approximation)
so as to make the maximum change at the most important point. It is
easy to see that in the one-determinant approximation
$P_2^{+-}(\mathbf{r}_1, \mathbf{r}_2)$ will be a maximum when
$P_1^{+}(\mathbf{r}_1)$ is a maximum and $P_1^{-}(\mathbf{r}_2)$ is a
maximum. But $P_1^{+}$ and $P_1^{-}$ have maxima where the sums of
the squares of the occupied orbitals have maxima, so that roughly
speaking any orbitals used to describe correlation functions should
have the maxima of their squares in the same regions of space as do
the occupied orbitals. This requirement poses an acute problem
numerically, since it is unlikely that one can choose many such
orbitals "localised" (in some sense) in the same region of space as
the occupied orbitals without choosing a set which, to the accuracy
of one's machine, are linearly dependent on one another. Possible
practical choices for the orbitals will be discussed in more detail,
but in the meantime it should be noted that if you do choose to
expand (7.11) in terms of an orbital basis, you effectively get the
usual C.I. approach, and that there are some theorems for the best
orbitals to use in such an approach.
There is a theorem due originally to Schmidt and rediscovered by
Coleman that states that if you wish to approximate a function $\Psi$
in a least-squares sense by a product expansion of the form

$\sum_{i=1}^{p}\sum_{j=1}^{p'} a_{ij}\,f_i(\mathbf{x}_1)\,g_j(\mathbf{x}_2, \ldots, \mathbf{x}_N) \qquad (7.13)$

then the least-squares error is minimised by a function of the form

$\sum_{i=1}^{p''} \gamma_i\,f_i(\mathbf{x}_1)\,g_i(\mathbf{x}_2, \ldots, \mathbf{x}_N) \qquad (7.14)$

where the $f_i$ are the p'' natural spin orbitals of $\Psi$
corresponding to the highest p'' occupation numbers, and p'' is the
smaller of p and p'. The functions $g_i$ can obviously be similarly
expanded, and so it follows that $\Phi$ is closest to $\Psi$ if it is
a product of natural spin orbitals with highest occupation numbers.
Antisymmetry requirements do not affect the argument. The natural
spin orbitals, may I remind you, are defined to be the
eigenfunctions of the one-particle density matrix, and the occupation
numbers are the associated eigenvalues.
Now of course this result as it stands is not much good in practice,
for in order to determine the natural spin orbitals we have to know
$\Psi$, but it has had an important effect on computational quantum
chemistry through the idea of a pseudo-natural orbital. The natural
orbitals of a problem are defined as the eigenfunctions of the
spinless $P_1$ (rather than of $P_1$ itself). (The connection between
the natural and natural spin orbitals of a problem is not always easy
to make, but in the particular case of a singlet function (where
$P_1^{+} = P_1^{-}$) it is very easy to show that, if the natural
orbitals are $\varphi_i$, then the natural spin-orbitals are
$\varphi_i\alpha$ and $\varphi_i\beta$.) The pseudo-natural orbitals
are defined just as are the natural orbitals, but for some
approximate density function. Krauss was able to show that if you did
a CI expansion and determined the pseudo-natural orbitals at any
stage, you were able to use these to shorten your CI expansion
length. Bender and Davidson carried these ideas further and designed
a CI scheme which involved the iterative construction of
pseudo-natural orbitals as a device for getting the optimal CI
expansion of a given length.
In fact however one can derive equations, the so-called M.C.S.C.F.
equations, which enable you to determine the optimum orbitals (on an
energy criterion) for a CI expansion of a given length, and about
this you will hear more from Dr. Veillard. It is in the development
of this method that currently there appears much hope for
calculations that go beyond the single-term approximation.

8. ERROR ESTIMATION IN EXPECTATION VALUES COMPUTED WITH APPROXIMATE WAVE FUNCTIONS

In this section I want to tackle the problem of how we can estimate
how far our solution is from the exact solution of our specified
problem. This is of course not the same thing as the relation our
calculated result bears to an observed result, and that delicate
problem I reserve for the next section. Let us suppose we have an
approximate solution $\Phi$ to our given problem and that we write
our exact solution $\Psi$ as

$\Psi = S\,\Phi + d\,\chi \qquad (8.1)$

where $\chi$ is chosen to be orthogonal to $\Phi$, and where $\Psi$,
$\Phi$ and $\chi$ are each normalised to unity. We then see
immediately that

$S^2 + d^2 = 1 \qquad (8.2)$

Let us for the moment assume that the correction function $\chi$ is
chosen so that the equality (8.1) is pointwise (rather than in the
mean), so that for any operator $\hat B$ we have

$\bar B_\Psi = S^2\,\bar B_\Phi + 2Sd\,B_x + d^2\,\bar B_\chi \qquad (8.3)$

where

$\bar B_\Psi = \langle\Psi|\hat B|\Psi\rangle, \quad \bar B_\Phi = \langle\Phi|\hat B|\Phi\rangle, \quad \bar B_\chi = \langle\chi|\hat B|\chi\rangle, \quad B_x = \mathrm{Re}\,\langle\Phi|\hat B|\chi\rangle \qquad (8.4)$

A little simple re-arrangement of (8.3) gives

$\Delta B \equiv \bar B_\Psi - \bar B_\Phi = 2Sd\,B_x + d^2(\bar B_\chi - \bar B_\Phi) \qquad (8.5)$

or, using (8.2) to eliminate d in favour of S,

$\Delta B = 2S(1 - S^2)^{1/2}\,B_x + (1 - S^2)(\bar B_\chi - \bar B_\Phi) \qquad (8.6)$

If we write $S = (1 - 10^{-n})$, then as n increases we get, after a
bit of fiddling,

$2S(1 - S^2)^{1/2} \approx 2\sqrt{2}\cdot 10^{-n/2} \qquad (8.7a)$

$(1 - S^2) \approx 2\cdot 10^{-n} \qquad (8.7b)$

Typical figures for these expressions as a function of n


are perhaps illuminating
n
28 (1 _ 82)~
(1 - 82 )

=1

=4

.7846

.279

0.028

.19

0.0.02

0.0002

=7

2.8 x 10- 3
2 x 10- 6

Thus the relations (8.7a and b) are pretty precise for S ≥ 0.99, but the term multiplying B_c does not in any real sense become small until S > 0.999999.
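The figures in the table are easy to regenerate; this sketch simply evaluates the two quantities 2S(1 - S^2)^1/2 and (1 - S^2) for S = 1 - 10^-n.

```python
import math

def error_coefficients(n):
    """For S = 1 - 10**-n, return 2S(1 - S^2)^(1/2) and (1 - S^2),
    the two coefficients appearing in (8.6)."""
    S = 1.0 - 10.0 ** (-n)
    return 2.0 * S * math.sqrt(1.0 - S * S), 1.0 - S * S

for n in (1, 2, 4, 6):
    c_cross, c_diag = error_coefficients(n)
    print(f"n = {n}:  {c_cross:.4g}  {c_diag:.4g}")
```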

Now of course it is very difficult to say anything very precise with the aid of (8.6), because we do not know B_c and B_X (and indeed could not without knowing ψ), but we can perhaps make some useful qualitative points. In the first place it is clearly in our interests to choose our Φ so that B_c is as small as possible, consistent with S remaining large, and we might hope to be able to determine, on general theoretical grounds, circumstances in which B_c was small. Unfortunately, for an arbitrary operator B this is even trickier than might be thought, because we generally choose our wave functions using the variation theorem, and thus we are content with a situation where, even in the limit, we can only be sure that (8.1) is satisfied in the mean. If (8.1) is true only in the mean, then of course (8.3) does not generally follow from (8.1). It may be that it does, but it will depend on the operator and on the wave functions used in the variation approach.

For example, it is very well known that if you put little pimples in the wave function far out from the centres of interest (for example, by putting a gaussian with a large exponent on a chargeless centre, far removed from the origin of molecular co-ordinates), then such a function will lower the energy slightly, but can have absolutely disastrous consequences for the dipole moment operator. On the other hand, if one uses a more commonsensical basis set designed to fall away uniformly from the molecular centres, then as the energy improves so, on the whole, does the dipole moment. Even if one uses such a reasonable set, say of gaussians, and calculates the spin density at the nucleus with them, then one finds that this, being a point property, seems to be pretty unrelated to the energy improvement with basis.
Thus we are really unjustified in using variation theoretic estimates of B_c, but there is little or nothing that can be done about it. We must just take great care to avoid pathological basis sets and nasty operators.


If we start by considering B as the Hamiltonian, and think of Φ as a single term Hartree-Fock function and X as a C.I. expansion formed from all single, double, triple and so on substitutions of orbitals in Φ, then by Brillouin's theorem, to first order in perturbation theory, single substitutions will be absent from X. In so far as this is rigorously correct and the expansion is complete, we can expect the cross term B_c = <Φ|H|X> to vanish in this case, and we should achieve a rather good energy from the Hartree-Fock function, all other things being equal. It obviously follows at once that we should also achieve rather good expectation values of one particle operators, since again for these B_c is zero, in so far as single substitutions are genuinely absent.

Again, somewhat intuitively, we can assert that we may well get better expectation values for totally positive (or totally negative) one particle operators than for the arbitrary operator, because even if B_X is big, there is more likely to be cancellation in the (B_X - B_Φ) term for these classes of operators.


Now we can expect these results only for the exact Hartree-Fock function, so that we have (hopefully) a complete set of functions in which to describe X, and so that we have also the largest value possible for S. Also, though Brillouin's theorem is true for any S.C.F. closed shell or unrestricted function, we should in fact expect single substitutions to play a bigger part proportionately in any expansion based on a poor approximation to the Hartree-Fock function.
Now it is, in theory at any rate, possible to design an orbital wave function in which single substitutions rigorously vanish. The orbitals are the so-called Brueckner or maximum overlap orbitals. They are discussed briefly in Ch. 7 of McWeeny and Sutcliffe, but unfortunately they are defined in terms of the exact wave function only, and so far attempts to approximate them have not been outstandingly successful. For those interested there is a fairly recent paper on these orbitals by Cizek et al., Phys. Rev. A 8, 640, 1973.

Of course a Φ constructed from Brueckner orbitals is not chosen to optimise any property in particular, but simply to have maximum overlap with the exact wave function, so we may not in fact do better for any particular property with this Φ than with the Hartree-Fock Φ. A similar situation arises with a Φ constructed from natural orbitals (or more generally natural spin-orbitals), for here again B_c vanishes for a one particle operator, but the Φ is not optimal in a particular property.

As we have already indicated, it is perfectly feasible to construct pseudo-natural orbitals, and if we use these then there is some evidence that on the whole we do get rather good one particle expectation values. Of course here we would not necessarily expect good two-particle expectation values, unless B_c were small and/or S were very large.

On the basis of this discussion, therefore, we expect by and large a near Hartree-Fock function to be pretty good for both energy and one-particle expectation values. However we have a similar kind of problem to the one we have so far discussed in deciding how close any LCAO-MO-SCF function is to the Hartree-Fock limit, since (except perhaps for atoms) we don't know the exact Hartree-Fock function.
There are theorems which shed, albeit obliquely, some light on the problem. For example, we know that the exact and the exact Hartree-Fock functions both satisfy the virial theorem. (For a review of the virial theorem see e.g. Löwdin, J. Mol. Spec. 3, 46, 1959.) We also know, again for the exact and the exact Hartree-Fock functions, that the Hellmann-Feynman forces are equal to the forces calculated as derivatives of the total energy with respect to some nuclear displacement co-ordinate. (For a review of the Hellmann-Feynman theorem see e.g. Deb, Rev. Mod. Phys. 45, 22, 1973.)

But of course these theorems help us only in so far as we know that if a wave function doesn't satisfy them then it cannot be the one we are after.
As far as closeness to the Hartree-Fock limit is concerned, recently some very careful work has been done by von Niessen, Diercksen and Kraemer (Theoret. Chim. Acta, in press) to try to estimate how close certain Gaussian bases come to the limit, in the case of hydrogen fluoride. They come to the conclusion that a quite enormous basis set, including d and f functions, is necessary to reach close to the limit. This is a very depressing result, but I suppose a not unexpected one.

In passing, it should be noticed that there have been semi-empirical schemes proposed for the estimation of the Hartree-Fock energy (see e.g. Hollister and Sinanoglu, J.A.C.S. 88, 13, 1966), but of course such schemes are always subject to quite a measure of uncertainty.
Now all our discussion so far has been soft and qualitative, since it is very difficult to say anything at all about B_c and B_X in general; but oddly enough it is not impossible to say something about S, in the sense of providing a lower bound for its value, and a discussion of this will lead us naturally into a discussion of the problem of bounds generally.
There is a theorem due to Eckart, Phys. Rev. 36, 878, 1930, which asserts that

S^2 ≥ (E_1 - E)/(E_1 - E_0)     (8.8)

where E is the approximate energy and E_1 and E_0 are the exact energies of the two lowest states of the same symmetry, or lower bounds to these energies.

This may strike you as a pretty

useless result since we don't in general know Eo or El or even


lower bounds to them, and essentially you are right, it is in
general from a computational point of view, a useless result.
But in one or two very simple cases we can use it to get an idea
about magnitudes.

This bound is in fact really rather a slack one, and Weinhold has shown (J. Chem. Phys. 46, 2448, 1967) that a better estimate is given by

S ≥ S_1 S_12 - [(1 - S_1^2)(1 - S_12^2)]^1/2     (8.9)

where S_1 = <ζ|ψ> and S_12 = <Φ|ζ>; here ζ is a "better" function than Φ, but not yet ψ. S_12 is of course evaluable, and S_1 may be estimated using the Eckart criterion. Obviously (8.9) is only of use in checking S at a rather modest level of approximation.
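As a numerical illustration of how the Weinhold bound combines the two overlaps (both overlap values here are invented purely for the example):

```python
import math

def weinhold_overlap_bound(s1, s12):
    """Lower bound on S = <Phi|psi> from S1 = <zeta|psi> and S12 = <Phi|zeta>:
    S >= S1*S12 - sqrt((1 - S1**2) * (1 - S12**2))."""
    return s1 * s12 - math.sqrt((1.0 - s1 ** 2) * (1.0 - s12 ** 2))

# Invented figures: suppose the Eckart criterion gives S1 >= 0.99 for the
# better function zeta, and the directly computed overlap S12 is 0.995.
bound = weinhold_overlap_bound(0.99, 0.995)
print(bound)
```

Note the sanity check built into the formula: if ζ were the exact function (S_1 = 1) the bound collapses to S ≥ S_12, as it must.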


Now naturally enough these bounds have only been really checked out on simple systems like helium, where we have a good knowledge of E_1 and E_0 and some extremely good wave functions. If we take a simple 1s^2 function for helium (that is, Slater orbitals with exponent 2) we get from the Eckart criterion S ≥ 0.89965. If we use a similar function with optimal orbital exponents we get S ≥ 0.962, if we use a split shell function we get S ≥ 0.981, and if we use the Pekeris 1078 term function we get S ≥ (1 - 3.02 × 10^-9). Using the Weinhold bound with the Hylleraas three term function as the better function, we tighten the bound on the screened function to S ≥ 0.987 and on the split shell function to S ≥ 0.993. The Hartree-Fock result for helium will probably lie between 0.987 and 0.993, and an estimate of S ≥ 0.990 would seem a reasonably conservative one. It would seem not unlikely then that for atoms we could say that for Hartree-Fock functions S ≥ 0.99.

Of course for molecules the wicket is a much stickier one, since E_0 and E_1 are much less easily accessible from experiment, but it is plausible at least that a good Hartree-Fock function for a diatomic or simple triatomic molecule should have S ≥ 0.95, though this is of course a guess.
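To see where the helium figures come from, here is the Eckart estimate evaluated with round literature values for the inputs (E_0 ≈ -2.9037 and E_1 ≈ -2.1460 hartree for the two lowest 1S levels, and <H> = -2.75 hartree for the exponent-2 1s^2 function). Since the precise inputs used above are not recorded, these assumed values only approximately reproduce the quoted 0.89965.

```python
import math

def eckart_overlap_bound(e_trial, e0, e1):
    """Eckart (1930): S^2 >= (E1 - <H>) / (E1 - E0), for E0 <= <H> < E1."""
    return math.sqrt((e1 - e_trial) / (e1 - e0))

# Assumed round values in hartree, not the exact inputs used in the text.
E0, E1 = -2.9037, -2.1460        # two lowest 1S levels of helium
E_TRIAL = -2.75                  # <H> for the screened 1s^2 function
print(eckart_overlap_bound(E_TRIAL, E0, E1))
```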


In the light of this discussion the thought must leap to mind of how nice it would be to be able to calculate bounds for any particular quantity; then one would know exactly where one was, and of course one ought to be able to calculate such bounds without recourse to any empirical data. Now recently there has been quite a lot of activity in connection with the calculation of bounds, and there is a pretty up-to-date review available (Weinhold, Advances in Quantum Chemistry, 6, 299, 1972), but the subject is a pretty venerable one and you will even find some discussion of it in Pauling and Wilson's classic text (p. 189).
Let us begin with the older work on energy bounds. There is a result by MacDonald (Phys. Rev. 46, 828, 1934) which asserts that, for suitable trial functions Φ, the integral

I' = ∫ [(H - a)^r Φ]* (H - a)^(m-r) Φ dτ,   a a constant, m a positive integer, 0 ≤ r ≤ m     (8.10)

will take the same value for all r, and is equivalent to the eigenvalue problem

(H - a)^m ψ^(m) = λ^(m) ψ^(m)     (8.11)

with I' = λ^(m) if ∫|Φ|^2 dτ = 1 and if λ^(m) is in the discrete spectrum of H. It follows at once by inspection that ψ^(m) = ψ and λ^(m) = (E - a)^m if

(H - E)ψ = 0     (8.12)

It therefore follows a fortiori that for a suitable class of trial functions

I' ≥ (E - a)^m

where E is that discrete eigenvalue which lies closest to a, in the sense that it is the eigenvalue which minimises (E - a)^m.

Thus if there are two adjacent discrete eigenvalues E_n and E_n+1 (E_n+1 > E_n) in the exact spectrum, then a is closest to E_n if

a < (E_n + E_n+1)/2     (8.13)

Now if we take the simple case with m = 2 and r = 1, we get

I'_2 = ∫ [(H - a)Φ]* [(H - a)Φ] dτ = <H^2> - 2a<H> + a^2 ≥ (E_n - a)^2     (8.14)

and we see at once that

-(I'_2)^1/2 ≤ (E_n - a) ≤ (I'_2)^1/2     (8.15)

where the positive square root is assumed. Thus we see that

a - (I'_2)^1/2 ≤ E_n ≤ a + (I'_2)^1/2     (8.16)

and so a - (I'_2)^1/2 provides us with a lower bound to E_n, if a is chosen correctly and if the Φ are suitable functions.

In order for this result to be correct it must be the case that H is Hermitian not only with respect to the manifold of the Φ, but also with respect to the manifold of the Φ^(1) = HΦ (in general, with respect to the Φ^(m) = H^m Φ). This is extremely hard to ensure. Thus if we choose the functions Φ to be determinants of molecular spin orbitals, composed say of Gaussian or of Slater orbitals, then H is Hermitian with respect to a manifold of such functions; but it is very problematic indeed whether or not the resulting functions Φ^(1) will provide a suitable manifold, since they will now be functions which depend on 1/r_ij and 1/r_ai.

It is easy to see the kind of trouble you can get into, simply by recognising that one of the terms arising in evaluating the expectation value of H^2 directly is going to be of the form

∫ Φ* ∇^2(1/r) Φ dτ     (8.17)

and that ∇^2(1/r) = -4πδ(r). If however you evaluate <H^2> as

<H^2> = ∫ (HΦ)* (HΦ) dτ     (8.18)

then of course such delta function terms do not arise; but if the manifold is suitable then the two approaches must yield identical results, and from this you can see that it is not at all certain that they do so.

In most applications however these difficulties are simply ignored, and the latter formula is used to evaluate the matrix elements. (For detailed formulae applicable to the evaluation of such quantities over certain sorts of Gaussian orbitals see e.g. Keaveny and Christoffersen, J.C.P. 50, 80, 1969.) The difficulties will clearly become very much greater as we go to higher m in this approach, though presumably the higher the m that we can consistently go to, the better will be our trial function.
Let us however be content with the case m = 2 and look at the result (8.16), which is usually called Stevenson's bound (Stevenson, Phys. Rev. 53, 199, 1938), and consider possible choices for a. If we choose a = <H>, and hopefully our trial function is sufficiently good for this to be an appropriate choice, we get

E_n ≥ <H> - (<H^2> - <H>^2)^1/2     (8.19a)

a result usually called Weinstein's bound and written

E_n ≥ E_n^W = <H> - Δ     (8.19b)

where Δ is of course just the width of H (the square root of the variance), and we see from (8.15) that, as expected, the result is exact when the variance vanishes.
Of course in this case our upper bound from (8.16) is artificially high, because we know that <H> alone is an upper bound, and this fact might lead one to suspect that the lower bound is also correspondingly slack; and indeed in cases where it can be worked out it turns out to be quite awful, for many, many functions which give very good upper bounds. One could of course choose the parameters in any trial function to minimise Δ and hence get as close a bracketing as possible, but to do so would obviously lead to hair-raisingly tricky equations even if one managed the integrals, and apart from one or two pioneering attempts on simple molecules the program has never been carried through.


As a matter of fact it can be shown that the best bound you can get of the Stevenson kind is to choose a to be (E^S(a) + E_n+1)/2, where E^S(a) denotes an energy calculated by some suitable choice of a in the Stevenson bound formula; one can then derive the relation

E_n ≥ <H> - Δ^2/(E_n+1 - <H>)     (8.20)

and this is called the Temple bound. Of course to use it one must know E_n+1, but one usually does not, so the formula is, to all intents and purposes, quite useless. It could be useful in theory, since it holds for any lower bound to E_n+1, but of course the problem is to get this bound, and we are thus in a circular situation.
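The relative quality of the Weinstein and Temple bounds is easy to demonstrate on a model problem where everything is known exactly; the 3x3 matrix Hamiltonian and trial vector below are invented. Note that the sketch takes E_1 from exact diagonalisation, which is precisely the information one does not have in a real molecular calculation.

```python
import numpy as np

# Invented model Hamiltonian (symmetric) and normalised trial vector.
H = np.array([[0.0, 0.1, 0.1],
              [0.1, 1.0, 0.0],
              [0.1, 0.0, 2.0]])
phi = np.array([1.0, 0.0, 0.0])

e_mean = phi @ H @ phi                       # <H>, the upper bound
variance = phi @ H @ H @ phi - e_mean ** 2   # <H^2> - <H>^2
width = np.sqrt(variance)                    # Delta of (8.19b)

E0, E1 = np.linalg.eigvalsh(H)[:2]           # exact levels, for comparison

weinstein = e_mean - width                   # (8.19b)
temple = e_mean - variance / (E1 - e_mean)   # (8.20); needs E1
print(weinstein, temple, E0, e_mean)
```

For this well-behaved example the ordering weinstein ≤ temple ≤ E0 ≤ <H> holds, i.e. the Temple bound is much the tighter of the two lower bounds.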
Thus the situation with respect to putting bounds on the calculated energy is at present pretty hopeless. The only truly non-empirical possibility gives lousy bounds, and in the way it is usually used is of doubtful validity. Opinion differs about whether or not this is an area in which more work ought to be done, or whether the situation is likely always to remain hopeless. Some people think that the "intermediate operator" methods first exploited in quantum mechanics by Bazley and Fox are a better line of attack on the lower bound problem, but numerically for molecules they are absolute non-starters, because of the necessity of solving some base equation exactly. Admittedly they can be modified to deal with a "truncated" basis, but in this condition it seems doubtful if they offer any advantages over the so-called classical bound methods that we have discussed so far.
What then about bounds on calculated properties? Well, you will perhaps not be surprised to learn that we are in even worse condition than with energies. All the formulae that I know of in this field depend on your knowing either the exact eigenvalues of the Hamiltonian or the overlap of the trial with the exact wave function, or sometimes both! As such it is probably not worthwhile considering them further in a course of this kind.
The situation in respect of accuracy estimates is thus really extremely depressing. We hope for accuracy if we have a good variational result, but our reasons for hope are not at all clear. We could of course form some idea of what our accuracy might be by looking at the ratio HΦ/Φ for some trial function Φ, but it would be extremely tedious to compute this and practically impossible to interpret the results in a useful way. Also, if early experience on helium is anything to go by, we would probably give up when we observed just how far from constant the ratio was. No one, so far as I know, has ever attempted to test wave functions in this way for anything but helium.
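The local-energy ratio HΦ/Φ is easy to inspect for a model problem; the sketch below uses the one-dimensional harmonic oscillator (ħ = m = ω = 1) rather than helium, purely as an assumed illustration of why the ratio is constant only for an exact eigenfunction.

```python
import numpy as np

def local_energy(alpha, x):
    """H phi / phi for the trial function phi = exp(-alpha x^2 / 2) with
    H = -(1/2) d^2/dx^2 + (1/2) x^2; analytically this ratio is
    alpha/2 + (1 - alpha**2) x**2 / 2."""
    return 0.5 * alpha + 0.5 * (1.0 - alpha ** 2) * x ** 2

x = np.linspace(-2.0, 2.0, 9)
print(local_energy(1.0, x))   # alpha = 1 is exact: constant 0.5 everywhere
print(local_energy(0.8, x))   # approximate function: the ratio wanders
```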


9. COMPARISON OF CALCULATED RESULTS WITH EXPERIMENT


It is perhaps as well at the beginning of this section to get out of the way some remarks which may easily sound to you at once platitudinous and rather offensive. It must always be remembered that for the most part so-called experimental results are not really experimental results, but the results of processing a set of primitive observations (meter readings, time measurements and so on) according to a particular theoretical model. Sometimes this theory is very deeply embedded in the language of the way we talk about the particular result, and there is always a difficulty in deciding whether the theory with the aid of which the results are derived is in fact the most appropriate one to use from a numerical point of view.
Let me try and clarify this point by reference to a couple of examples. If one looks up in tables results for bond lengths and bond angles of simple molecules, what one gets are often results that have been obtained by processing observations of the microwave spectrum, or of the rotation-vibration spectrum in the infra-red, or perhaps even of rotational fine structure in the electronic spectrum. The method of processing is usually based on the almost rigid molecule approach outlined earlier, and is for the most part also based on the assumption of an almost-quadratic potential function. If a theoretician wants to see if he can calculate bond distances etc. that agree with experiment, he has two courses of action open to him. He can so design his calculation as to be in tune with the model and produce results in line with the usually quoted experimental ones, or he can build his own model, perform his calculations in his own way, and see if he can get results in accord with the primitive observations. One can think of many similar examples like this where one has a hard and well developed model: atomic spectroscopy, many kinds of magnetic experiments and so on.


However, the situation is not always so clear, particularly in "ordinary" chemistry (which, as some cynical physicist once said, is largely devoted to explaining the un-explainable in terms of the unobservable). Let me take a rather extreme example from E.S.R. spectroscopy, where it is often asserted that molecular geometry can be obtained from a study of isotropic hyperfine coupling constants in solution. The theory that lies behind this assertion is essentially the old-fashioned theory of McConnell, which explains why so-called pi-radicals have any E.S.R. spectrum at all. The explanation is on the basis of an exceedingly crude semi-empirical independent particle model. For the most part it would simply not be possible or desirable to cast any calculation in such a form nowadays, so it would simply not be possible to speak in terms of the assumed model.

One would therefore choose a less crude model and calculate the primitive observational data in this model. But here one would run into difficulty. Naturally one would optimise one's geometry in such a calculation with respect to energy, and it may well be that the primitive data calculated at this geometry do not agree well with those observed. However, perfect agreement may well be observed at another, non-optimal geometry. What then is to be compared? How should one explain what one has obtained to the experimentalist? What really is the experimental result?
An even more acute situation arises when a theoretician is asked to decide by calculation some point which is apparently an experimental point, but arises merely in the course of developing a chemical explanation in terms of a model with quantum theoretical words in it, but with no contact any longer with quantum theory. (Examples are: "It is clear from experiment that this reaction goes via a non-classical carbonium ion; however Prof. X does not agree. Would you do a calculation to show that the classical intermediate does not meet the requirements of the observed kinetics?" Or: "It is clear from the results of the NMR experiment that in this case through-bond interactions predominate over through-space interactions. Prof. Y asserts the opposite. Would you do a calculation to show that he is wrong?") I have no advice about what to do in these circumstances; I am afraid that I often chicken out and say that I am very busy now but I will put it on my list of things to be done, or suggest that they borrow my INDO or (if they are rich) my POLYATOM package and do it for themselves.

This kind of situation is really very tricky, for on the whole the experimentalist often wants his prejudices confirmed, and would regard it as in doubtful taste if you tried to explain that you needed to know what actually happened in the experiment, and that even then you might not be able to help him, even if you could formulate the problem in computable terms, because his system might be so big as to resist solution at the required level of accuracy. The experimentalist here is a bit like the drunkard at the Salvation Army meeting who, on seeing that the band is about to start to play, shouts "I wan'na hear 'Temptation'", only to be told by the Salvation Army Officer (the theoretician) "We don't play that kind of music here", to which the drunkard replies (with some feeling) "O.K. then, show us your muscles". They both have a point, I feel.
But anyway, let us reserve this last situation for another time. There certainly are practitioners in our field who are able to participate in this kind of discussion with experimentalists to the benefit of both sides. However, to do so involves intuition of no ordinary order, combined with a careful and conscientious juxtaposition of very different calculations (perhaps even some semi-empirical ones) in order to cover the ground. The skills necessary to do this will be, I am sure, displayed by example by some of our "symposium" speakers, and indeed perhaps you have already gained some insight from listening to Dr. Veillard's talks, since he is one of the most distinguished practitioners of this art in Inorganic Chemistry, as is Prof. Heilbronner in Organic Chemistry.

I want to confine myself to talking about the situation where one has in fact, or in view, an experimental result which is something that can, in principle, be calculated. In this case one has just to analyse the experiment carefully and decide how one is going to set about dealing with it, as I indicated earlier; perhaps the paradigm of these kinds of experiments are spectroscopic experiments, and perhaps also certain kinds of scattering experiments. About the experimental aspects of these kinds of problems we will be hearing something from Prof. Winnewisser and Prof. Toennies respectively.
Now the detail of precisely how one sets about dealing with experimental results differs enormously from experiment to experiment, even in this limited context, and it is not really possible to give a coherent general account of how it should be done; so perhaps the best thing to do is to take an example and work it through in some detail, using it as a peg to hang some of the techniques on. I propose taking the experimental observation of Buhl and Snyder (Nature 228, 267, 1970; Astrophys. J. 180, 791, 1973) that there is a very strong microwave line arising in emission from at least four sources in the galaxy at 89.189 ± 0.002 GHz. The line shows no resolved fine structure. The question is, "what on earth can it be?" (In passing it should be mentioned that this looks like a piece of raw data, but isn't; it is already corrected for velocity shift, assuming a source velocity of +10 km/s. The correction here is obviously a matter of interest, but it is perhaps best not pursued further now. The correction figures are usually kHz or at most MHz.)
Obviously and very sensibly, the first move here would be to see if anything earth-bound shows up in this kind of region. One turns then, say, to Townes and Schawlow ("Microwave Spectroscopy", McGraw-Hill, 1955) or some other suitable book with a great table of stuff in the back, and hopes for the best. In these kinds of tables one finds rotational constants evaluated from observation on the basis of conventional theory, and by working backwards one can find if there are lines in the region of interest. To cut a long story short, what one finds there is that, for example, H¹²CN has a line at about 88.631 GHz and H¹³CN has one at about 86.340 GHz. Thus the spectrum suggests a small triatomic as a possibility, but of course only as a possibility.

Anyway, shortly after this line was observed (but before details of it were published!) Klemperer (Nature 227, 1230, 1970) hypothesised that in fact the line was due to HCO+; Herzberg in fact suggested HNC, and somewhat later Barsuhn suggested CCH (Astrophysical Letters 12, 169, 1972). The simple experimental fact allowing this plethora of suggestions was (and so far as I know still is) that there is no line observed in the lab which is spot on 89.189 GHz. The basis of all the theoretical suggestions is the

very simplest model for a molecule freely rotating as a rigid rotor, namely that the transition is between energy levels given by

E_J = (ℏ^2/2I) J(J + 1) = B J(J + 1)     (9.1)

with allowed transitions J → J + 1, so that

ΔE = hν = 2B(J + 1)     (9.2)

and the ground transition is just 2B. Here

I = Σ_{i>j} m_i m_j r_ij^2 / Σ_i m_i

is the moment of inertia of a rigid linear system about an axis through its centre of mass. To determine I, plausible bond distances must of course be chosen, but the isotopic masses are known.
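For a concrete feel for the numbers, the following sketch evaluates I from the pairwise-mass formula and the J = 0 → 1 frequency 2B for a rigid linear HCO+. The bond lengths are assumed values chosen only for illustration (r(CH) = 0.1093 nm, r(CO) = 0.1107 nm), not fitted results.

```python
import math

U = 1.66053906660e-27          # kg per unified atomic mass unit
H_PLANCK = 6.62607015e-34      # J s
mass = {"H": 1.007825, "C": 12.0, "O": 15.994915}   # amu

# Assumed bond lengths (metres) for the linear ion H-C-O+.
r_ch, r_co = 1.093e-10, 1.107e-10
z = {"H": 0.0, "C": r_ch, "O": r_ch + r_co}         # positions on the axis

# I = sum_{i>j} m_i m_j r_ij^2 / sum_i m_i, about the centre of mass.
atoms = list(mass)
M = sum(mass.values()) * U
I = sum(mass[a] * mass[b] * U * U * (z[a] - z[b]) ** 2
        for i, a in enumerate(atoms) for b in atoms[:i]) / M

B = H_PLANCK / (8.0 * math.pi ** 2 * I)    # rotational constant, Hz
line_ghz = 2.0 * B / 1e9                   # J = 0 -> 1 transition
print(line_ghz, "GHz")                     # lands close to the 89 GHz region
```

Shifting r(CH) by 0.0005 nm moves the predicted line by roughly a hundred MHz, which gives some feel for how severe the accuracy demand on a calculation is.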
It therefore seems sensible to see if we might be able, at least as a first go, to try to calculate non-empirically the bond lengths of the contenders and try for an assignment on the same model. Should we find any of them to be non-linear then of course we would have to abandon the rigid rotor model and assign on the basis of an asymmetric top model, but let us ignore that possibility for the moment. Before we do this, however, we should determine what kind of accuracy is required of us on this simple model, and estimate whether or not our contemplated calculation is up to it.

Thus if we were considering HCO+ as the molecule in question, then it is quite easy to show (see Wahlgren, Liu, Pearson and Schaefer, Nature 246, 4, 1973) that a spectrum of CH distances from 0.11244 nm to 0.10213 nm, with a corresponding spectrum of CO distances from 0.1100 nm to 0.1120 nm, will all give agreement with experiment on this model. It is easily seen that in this range the C-O distance has quite an overlap with conventional C-N and possibly C-C bond distances, so it is clear that for a calculation even to be potentially decisive in this model, we must get the bond lengths right to within 0.0005 nm. Can we do this?

At this stage, in the light of the way we have chosen to interpret the result, we are committed to a calculation in the almost rigid molecule approximation. In fact we don't really have enough information to decide one way or another, but the almost rigid molecule is obviously a sensible starting point. One would not on the whole abandon this viewpoint unless one could not cope with the spectrum in the usual way. (For a very interesting case of a situation in which this is forced on you, look at the history of the UV spectrum of the NH2 radical, from its discovery by Dressler and Ramsay, Phil. Trans. Roy. Soc. 251, 553, 1959, through its early interpretations by Pople and Longuet-Higgins, Mol. Phys. 1, 372, 1958, and on through later work by Dixon, Mol. Phys. 9, 357, 1965, and by Hardwick and Brand, Chem. Phys. Lett. 21, 458, 1973.)

Thus we can start off using the fixed nucleus Hamiltonian here, calculating the energy as a function of internuclear distances with some confidence, perhaps. That is, we can perform calculations at what, in a very interesting review (Adv. Chem. Phys. 23, 961, 1973), Browne and Matsen call the level of "coarse structure quantum chemistry". (See also the review by Browne in Advances in Atomic and Mol. Phys. 7, 47, 1971.) In this kind of situation Browne estimates that for light diatomics nuclear motion corrections are four orders of magnitude smaller in energy than are the electronic terms, and that relativistic corrections are eight orders of magnitude smaller, though we shall return to these estimates later.
these estimates later.
The natural starting point for us would therefore be SCF calculations (closed shell on HNC and HCO+, open shell or unrestricted SCF on CCH). Now how accurately do these kinds of calculations give bond lengths in first row diatomics? (Again there are questions here of what exactly is the bond length from experiment, but of course what really matters to us at this stage is how the simple rigid molecule model does against experiment in a region where much experimental information is known and very, very careful analysis possible.) In his book ("The Electronic Structure of Atoms and Molecules", Addison-Wesley, 1972) Schaefer suggests that for diatomics "it appears safe to say that a general characteristic of the (restricted) HF approximation is the prediction of bond distances as much as 0.005 nm too short". I must say I personally find it hard to see this regularity myself in the most accurate (i.e. nearest to HF) calculations that have been done. It seems to me that if the molecule dissociates "correctly" in the HF approximation then on the whole the bond length is over-estimated, but if it doesn't then the HF method under-estimates,


but you will now be jumping up and down and shouting "but what about LiH, or NaH, or NaF?" And I would have to agree: these do not fit my rule. What is perhaps more interesting is that we get, on the whole, differences between calculated and experimental bond lengths of up to 2.5%. In a way this may be unfair since, as you can easily see if you look at the calculations of Cade and Huo on the first row hydrides (J.C.P. 47, 614, 1967), the tendency to errors of the order of 2.5% is towards the end of the periodic table row, and in the earlier members the basis is more adequate (LiH +0.60%, CH -1.69%). However it is perhaps representative of the kind of error we can get with basis sets which are among the best we can handle.
It is clear then that if linear triatomics go anything like diatomics, we are not even going to get to first base with an SCF calculation. Indeed we can go a bit further: the linear triatomic HCN is a fantastically well studied species experimentally, and there are some extremely careful HF calculations on it by McLean and Yoshimine (Int. J. Quant. Chem. 15, 313, 1967). They obtain 0.1024 nm for the CH distance and 0.1116 nm for the CN distance. The best "experimental" values would seem to be 0.10655 nm and 0.1153 nm respectively (see Strey and Mills, Mol. Phys. 26, 129, 1973), and again the calculated bond lengths seem to be quite substantially short. It is quite interesting to note, too, that a gaussian orbital calculation of quite high quality ("double zeta + polarisation") carried out by Roos et al. (Astrophys. Journal 184, L19, 1973) yielded distances substantially away from those of McLean and Yoshimine (C-H = 0.1063 nm, C-N = 0.1137 nm), but in rather closer agreement with experiment. Though the results of Roos et al. are significantly less good in energy than the McLean-Yoshimine results, they in fact represent, as far as basis is concerned, about the best we can hope to do with light polyatomic molecules generally, as you must be aware from the lectures of Dr. Veillard. The other interesting thing is

95

FUNDAMENTALS OF COMPUTATIONAL QUANTUM CHEMISTRY

that this is an example of a rule which appears to be reasonably widely true in SCF calculations, namely: the better the basis, the shorter the bond length.
Thus we must go to more accurate levels before we can hope to manage this kind of problem, and on the whole, as you have seen, nowadays this means extensive CI. Now as yet there really is simply not enough experience to say how well one does in general with, say, all single and double excitation CI from an HF starting point for an internuclear distance, since very few CI calculations have been performed as a function of distance even on diatomics. The weight of informed opinion at the moment seems to be, however, that such calculations could give bond lengths that are too long by perhaps as much as 0.01 nm in "minimum" basis CI, though the error should decrease in more adequate bases, perhaps down as low as 0.0005 nm. However, we do know that in the case of HCN the same basis used by Roos et al. for the SCF calculation yielded, on all single and double excitation CI (6343 configurations), C-H = 0.1066 nm and C-N = 0.1153 nm, a result essentially in perfect agreement with experiment.

We must, however, regard this I think as largely fortuitous, though the authors here suggest an accuracy of 0.0003 nm for their calculations. My own feeling is that probably a better basis would in fact have reduced the bond lengths by more than that error, but perhaps in discussion here Dr. Roos can help us more. At any rate the point here is that this is the best that we can currently do and we must, if we wish to manage this kind of experiment, use it.

Roos et al. in fact performed calculations on HNC as well as HCN and were able to get bond lengths for HNC of C-N = 0.1169 nm and N-H = 0.0995 nm.

Wahlgren et al. (vide sup.) in a very similar calculation on HCO+ get a C-H distance of 0.1095 nm and a C-O distance of 0.11045 nm. Currently the CCH calculations are not at a similarly accurate stage, so we should perhaps not consider them further here.
Now we have done all that we can on the basis of the
conventional model in a coarse structure calculation and we
should now perhaps turn to a discussion of the conventional
model.
At the most general level we believe that the observed absorption can be accounted for by the interaction of the radiation field with a molecule, and in particular that this kind of transition arises through the electric dipole mechanism, as the first term in the expansion of $\vec{A}$ in the transition operator matrix element

$$\langle \psi_A |\, \sum_{i=1}^{N} \frac{Z_i e}{m_i}\, \hat{\vec{p}}_i \cdot \vec{A} \,| \psi_B \rangle$$

where $\vec{A}$ is the vector potential of the radiation field, and $\psi_A$ and $\psi_B$ are whole-molecule states (translation removed). (Other kinds of transitions of course arise, mediated by the interaction of the field with spin and other more complicated operators. For example, spin-induced transitions are observed in the microwave region, but generally only in the presence of static magnetic fields, as in ESR spectroscopy.) It is usual to use a commutation rule argument to replace the first-term expression by

$$\frac{i}{\hbar}\,(E_B - E_A)\, \langle \psi_A |\, \sum_i e Z_i\, \vec{r}_i \,| \psi_B \rangle$$

where the first term in the expansion has been considered as a unit vector and hence suppressed. There is of course considerable argument about whether the passage from the momentum (velocity) form to the length (dipole) form is legitimate in anything other than the simplest problem with exact solutions, but that is not a hare that can be profitably chased here, I think.


If a transition is allowed from $\psi_A$ to $\psi_B$ then we will observe a line centred about the frequency $\nu_{AB} = (E_B - E_A)/h$, whose intensity depends among other things on the value of the transition operator matrix element, and whose shape is a problem in statistical mechanics which we shall not consider.
What we must now ask is how our fixed-nucleus wave functions are to be fitted into all this. Well, in principle we know how to go on from our earlier discussions if we are in the nearly-rigid-molecule region or if we are dealing with a diatomic molecule. If we have explored (as we have) the energy surface for the molecule in question, then we shall know whether the rigid-molecule model is a starter or not. If it is, then we can be sure that for the range of interest we can develop whole-molecule solutions in the form of the Born adiabatic expansion

$$\Psi = \sum_n \chi_n(\mathbf{R})\, \psi_n(r, R)$$

where now we recognise the nuclear variables $\mathbf{R}$ to symbolise the Euler angles and the vibrational co-ordinates, $R$ the vibrational co-ordinates alone, and $n$ the sum over all states, including those arising from rotations within and between the electronic levels, which are of course the principal terms in the sum.
As a first step we could make the Born-Oppenheimer approximation and develop the fixed-nucleus electronic energy as a function of the internuclear parameters for use in the effective nuclear motion equation, as outlined previously. In fact very little has been done at this level except for diatomic molecules, and then mostly at the HF level for the electronic parts. Essentially the most that has been done in diatomics is to expand the calculated potential about the equilibrium internuclear distance and then to do a Dunham analysis up to the first anharmonic terms in the vibration and the first vibration-rotation interaction terms. The results here (see for example Cade and Huo, op. cit.) are really in perfectly good agreement with experiment for those systems for which the HF approximation gets the bond length reasonably well.
On a Dunham analysis the leading terms in the energy of a rotating-vibrating molecule are written conventionally as

$$E_{vJ} = \omega_e (v + \tfrac{1}{2}) - \omega_e x_e (v + \tfrac{1}{2})^2 + B_v J(J+1) - D_e J^2 (J+1)^2 + \cdots$$

with $v$ and $J$ integers, and

$$B_v = B_e - \alpha_e (v + \tfrac{1}{2}) + \gamma_e (v + \tfrac{1}{2})^2$$

where $B_e$ is given by the rigid-rotator formula and all the other constants $\omega_e x_e$ etc. can be given in terms of the potential parameters. Typically, say for the LiH molecule, one gets errors of about 1.2% in the estimate of $B_e$ (though the error in the estimate of $\alpha_e$ is about 9%).
Unfortunately again little systematic CI seems to have been done that might enable us to estimate how well we are going to be able to get agreement with experiment. However, it would seem likely that one could, if one took the trouble, come to within a few percent of $\alpha_e$ with CI wave functions. The important point, however, is that on the Dunham model the experimental observation in the microwave equivalent to the one we are considering would be one of $B_0$ (that is, the rotational transition without vibrational change). For HCl, for example, the difference between $B_e$, the crudely calculated quantity, and $B_0$ is of the order of 4525 MHz, which is a quite important correction.
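The Dunham-level bookkeeping just described is easy to mechanise. A minimal sketch in Python, using rough HCl-like spectroscopic constants purely as placeholders (the function names and numerical values here are illustrative assumptions, not figures from the lectures):

```python
# Leading Dunham terms for a rotating-vibrating diatomic, all in cm^-1.

def B_v(Be, alpha_e, gamma_e, v):
    """Effective rotational constant B_v = Be - alpha_e*(v+1/2) + gamma_e*(v+1/2)**2."""
    return Be - alpha_e * (v + 0.5) + gamma_e * (v + 0.5) ** 2

def E_vJ(we, wexe, Be, alpha_e, gamma_e, De, v, J):
    """E_vJ = we*(v+1/2) - we*xe*(v+1/2)**2 + B_v*J(J+1) - De*J^2(J+1)^2."""
    return (we * (v + 0.5) - wexe * (v + 0.5) ** 2
            + B_v(Be, alpha_e, gamma_e, v) * J * (J + 1)
            - De * (J * (J + 1)) ** 2)

# Rough HCl-like constants (cm^-1), for illustration only:
we, wexe = 2990.9, 52.8
Be, alpha_e, gamma_e, De = 10.593, 0.307, 0.0, 5.3e-4

B0 = B_v(Be, alpha_e, gamma_e, 0)
# Be - B0 = alpha_e/2 here; 1 cm^-1 is about 29979.2458 MHz.
print("Be - B0 =", round((Be - B0) * 29979.2458), "MHz")
```

With these placeholder constants the Be - B0 gap comes out at a few thousand MHz, the same order as the HCl figure quoted above.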

What now of the polyatomic molecule? Well, again in principle one could make the Born-Oppenheimer approximation and do a Dunham-type analysis at the same level as for diatomics, but here of course the whole situation is much more complicated, and even in the case of the linear molecule it is extremely tricky to tie the potential characteristics in to the formula analogous to the one given above for the nuclear energy levels. (For the classical account of the relationship see Nielsen, Rev. Mod. Phys. 23, 90, 1951, esp. pp. 111-113.) The upshot of this is that almost no work has been done on this problem, and we cannot say how likely we are to get agreement with experiment. I feel that this is a problem that we have to face soon in computational work, particularly in view of the fact that nowadays many experimental spectroscopists have computer programs and do analyses into which potential function characteristics may more readily be fitted (see for example Hoy, Mills and Strey, Mol. Phys. 24, 1265, 1972, or Burnham, Hougen and Johns, J. Mol. Spec. 34, 136, 1970).

There have been recent moves in this direction, for example by Meyer and Pulay, but only at the level of a quadratic force-field approximation, and we must I think try to do somewhat better if we are to help in this field at all. To illustrate this, if we use the calculations that we have been talking about on HNC and HCO+, our computed $B_e$ values turn out to be 45.43 GHz and 44.87 GHz. Now in HCN a very careful analysis of the experimental results shows that the spectrum can be fitted by a Dunham-type analysis with the observed ground vibrational state transition given by

$$B_0 = B_e - \alpha_1/2 - \alpha_2 - \alpha_3/2 - D$$

there being three rotation-vibration interaction terms because of the three normal modes (one degenerate) of HCN. In this case $\alpha_1 = 279$ MHz, $\alpha_2 = -21$ MHz and $\alpha_3 = 324$ MHz, and $D$ is singularly small (0.10 MHz), giving a total contribution of -582 MHz or -0.582 GHz.

Now if we assume the worst and take it that this is roughly the size of the uncertainty in our computed $B_e$ values, one must say that, to be fair, we cannot really distinguish between HCO+ and HNC. The best that we can do is to assume that the corrections will be in the same direction and about the size of the HCN corrections, so we can say that HNC should be observed close to 90 GHz and HCO+ close to 89 GHz, which is of course a big separation in the microwave; but it is, I think, an unsatisfactory assumption.
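The arithmetic behind those two predictions can be made explicit. A small sketch, assuming (as in the text) that the J = 1 &lt;- 0 line sits at twice the ground-state rotational constant and that the HCN-sized correction of -0.582 GHz transfers unchanged to both species, which is exactly the assumption just criticised:

```python
# Predicted J = 1 <- 0 line positions from the computed Be values,
# transferring the HCN ground-state correction (-0.582 GHz) to both ions.

def line_position_ghz(Be_ghz, correction_ghz=-0.582):
    """Return 2*B0, the J = 1 <- 0 rotational transition frequency."""
    B0 = Be_ghz + correction_ghz
    return 2.0 * B0

print("HNC :", round(line_position_ghz(45.43), 3), "GHz")   # close to 90 GHz
print("HCO+:", round(line_position_ghz(44.87), 3), "GHz")   # close to 89 GHz
```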
Now I have been through that problem in some detail to illustrate what really is involved in using results in an experimental context, in comparing with experiment. I could have chosen a vibrational problem, but if I had done so it would have turned out that, except in the simplest cases, we would not have been in even such good shape as we were there. For example, quite good spectroscopic constants for the ground state of the alkali halides and of the first-row hydrides can be obtained at the Hartree-Fock level. However the results for homonuclear diatomics are pretty poor. Calculated dipole moments and their agreement with experiment have been the subject of a recent review by Green (Adv. Chem. Phys. 25, 179, 1974), and it is well worth reading just to see how erratically HF theory predicts these. As you will see from that review, there has been surprisingly little straight CI work on this kind of thing.
If I had chosen to talk about the vibrational-rotational structure of electronic transitions, or indeed even of their gross structure, I can only say my talk would have been much shorter. The problem of dealing with excited electronic states at a reasonable level of accuracy is simply a largely unsolved one at the minute. On the whole it is fair to say that this is because of our old enemy, the non-orthogonality problem. It seems to be essential to use different bases for different electronic states, and this really gets us into trouble. The honourable exception to this is of course the hydrogen molecule, mostly due to the stupendous work of Kolos. If you need cheering up about things, just read his review on what can be done on that molecule (Adv. Quant. Chem. 5, 99, 1970).

On the other hand, if you really want a depressing read, have a look at the paper of Douglas and Kroll (Annals of Physics 82, 89, 1974) on the relativistic corrections for the fine structure of the $^3P$ state of Helium. If you know the paper already then you will realise only too well why it is that I have tried to avoid talking about the kind of situation where comparison with experiment really involves taking relativistic corrections into account.

There is, let me say, absolutely no doubt that if you are a theoretical atomic spectroscopist, beyond the first row you need these corrections to understand why certain transitions occur, and certainly to place them correctly. Even in the first row you need them to keep up with the most accurate experiments. I am afraid I shall have to reverse roles in the earlier joke and simply say "We don't play that kind of music here!" As I indicated at the start, the whole situation of relativistic corrections in molecules is a mess.

The fact is that we do need them to understand ESR and NMR, to understand fine structure and hyperfine structure in microwave and in IR spectra. In this context we even need to know something about nuclear quadrupoles. We might also, in very accurate molecular calculations, want to know how molecular energy levels are shifted by relativistic corrections. As far as this last is concerned, I had always thought that we might be in reasonable shape simply by using the Hartman-Clementi corrections for atoms (Phys. Rev. 133, A1295, 1964), and that quick reference to that would have been enough.

I was therefore pretty depressed last year to learn from a paper of Ermolaev and Jones (J. Phys. B, 6, 1, 1973) that "it is not in general sufficient to calculate the relativistic corrections using the Breit-Pauli Hamiltonians, together with approximate non-relativistic zero-order wave functions", which was what Clementi and Hartman did. But what then should one do?

According again to Ermolaev and Jones one may use the method, but only if one has very, very good wave functions, and they cite as examples Pekeris' helium function and Sims and Hagstrom's beryllium function (Phys. Rev. A, 4, 968, 1971); both of these functions are replete with $r_{12}$'s. What kind of corrections might we expect, however, for the energy in atoms? Walker, in a careful correction of Clementi and Hartman's results (J. Phys. B, 4, 399, 1971), shows that in the non-relativistic Hartree-Fock approximation relativistic corrections due to the reduced Breit interaction vary from 0.00070 au for Be and 0.016619 au for Ne up to 0.131749 au for Ar. Kim (Phys. Rev. 154, 17, 1967), in relativistic Hartree-Fock calculations on Be, finds that the full Breit operator considered in first order of perturbation theory contributes 0.00116 au, whereas doing the SCF using Dirac spinor products instead of orbitals lowers the energy from -14.57302 to -14.57590 au, giving a total relativistic correction, so to speak, of 0.00404 au.
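It is worth seeing how Kim's figures combine into that total; a one-line check using only the numbers quoted above:

```python
# Combining Kim's beryllium results (all in au): the first-order Breit
# contribution plus the SCF lowering obtained when Dirac spinor products
# replace orbitals.
breit_first_order = 0.00116
scf_lowering = -14.57302 - (-14.57590)          # 0.00288 au
total_relativistic = breit_first_order + scf_lowering

print(f"SCF lowering:     {scf_lowering:.5f} au")
print(f"total correction: {total_relativistic:.5f} au")
```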

Now of course precisely what these results mean is in dispute, but from them we can perhaps get a feeling for the kind of shift we might expect in energy levels from relativistic corrections. Of course it is not the case that every level of an atom (or of a molecule) will be affected equally, and perhaps the low-lying states will be affected more than the upper ones. Usually one argues that in chemistry the phenomena of interest arise principally from changes in what, in an independent particle picture, would be called the valence electrons or outer electrons, and that since the core remains unchanged the relativistic corrections will be extremely small, since most of the relativistic contributions come from the core. Indeed this is quite true at the Hartman-Clementi level, but is it really true? There seems to be no way of knowing at all accurately, except perhaps for helium where, because there are only two electrons, quantum electrodynamics can be used in a pretty reasonable way (Douglas and Kroll, vide sup.).

In helium the nuclear motion corrections are apparently as big as the terms of order $mc^2\alpha^4$ (where $\alpha = e^2/2\varepsilon_0 hc$) in the quantum electrodynamical corrections, so that relativistic terms and nuclear motion terms are about equally important; but then helium is a very light atom. It would seem that at this stage we must hope that we can get by without very detailed relativistic corrections.


Of course in another way we could not live without them, for explaining ESR and NMR spectra and so on, and in the kind of context in which we use them they certainly can be regarded as very small corrections. Thus typically a separation between molecular electronic levels is of the order of $10^{-2}$-$10^{-3}$ of a Hartree. At a field of 0.3 T (3 kG) the Bohr magneton is roughly $10^{-6}$ of a Hartree, so that we might expect, say, the Fermi contact coupling term to have a quite negligibly small effect on energy levels, and indeed it has (though it is not quite 8 orders of magnitude smaller than the electronic separations); but of course the effect, though small, is very important, to those doing ESR at any rate. The NMR terms are of course smaller still, and the interesting question we must ask ourselves here is at what level of accuracy we must calculate wave functions in order to make sensible predictions about the effects of these kinds of terms. For instance, since the ESR spectrum depends on the contact interaction at a nucleus, how important will nuclear motion be? I really hesitate even to speak about the problems in NMR spin-spin coupling constants, simply because to understand them at all we have to adopt a fixed (or at least distinguishable) nucleus approximation and obtain an expression using second-order perturbation theory (see e.g. McWeeny and Sutcliffe p. 230), and this, as we have already mentioned, is absolutely illegitimate. Whatever the relativistic effects are, we may only work in first order of perturbation theory to include them, and indeed it is perfectly easy to show that in second order the terms we use to explain NMR spectra result in divergent expressions unless we artificially exclude the "self-interactions" that arise naturally in the perturbation theory.
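The order-of-magnitude comparison made above (electronic separations of $10^{-2}$-$10^{-3}$ hartree against the Bohr magneton at 0.3 T) is easy to verify with a couple of constants; the CODATA-style values below are inserted by me for illustration, not taken from the lectures:

```python
# Zeeman scale check: Bohr magneton energy at 0.3 T, in hartree.
mu_B = 9.2740e-24        # Bohr magneton, J/T
hartree = 4.35974e-18    # 1 hartree in J
B_field = 0.3            # T (i.e. 3 kG)

zeeman_hartree = mu_B * B_field / hartree
print(f"mu_B * B = {zeeman_hartree:.1e} hartree")   # roughly 1e-6 hartree
```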


The answers to these problems, as you might well expect, are that nobody really knows. We know that we can qualitatively understand these phenomena with what we have, and that we can also do well at what might be called the semi-quantitative level in fixed-nucleus calculations; but the field really needs a very brave soul to look at these effects in high-accuracy calculations on simple molecules.
I trust that this does not sound too depressing an ending to this lecture course. It should be clear to you that I am not too hopeful about the possibilities of proper scattering calculations for molecules, and am very doubtful about the validity of some of the work done on reactions using a "potential surface" approach. However I also want to make it clear that I appreciate that at present we have little alternative to adopting this approach if we want to do anything on chemical reactions, and so I think work on it is certainly worthwhile. However I think that even ostensibly non-empirical work on such problems should really be regarded as semi-empirical, since one is in fact adopting a model for the system which may well correspond only roughly to the experimental situation.
I am also not too hopeful about calculations on large molecules containing transition metals and the like, in the sense that we cannot really cope with relativistic corrections and so on in these circumstances. However, again I think that fixed-nucleus non-empirical calculations on such systems are useful and important in the sense that they provide model results that can be compared with experiment, and sequences of such calculations are certainly useful, in the same way that semi-empirical calculations are useful in correlating and explicating experimental results.
I really am, however, extremely hopeful about highly accurate calculations, including nuclear motion corrections, for the ground and excited states of 10-20 electron systems formed from first and second row elements. I feel (though of course I cannot prove) that relativistic corrections in such systems will be of little importance. I have no doubt about our ability to get microwave and IR spectral results in these systems completely sewn up in the next few years, and probably we shall be able to make big holes in the interpretation of the visible and UV spectrum too, and possibly of the PES (UPS) spectrum as well, though for obvious reasons I am less confident about the ESCA results. So I hope you will go away with a feeling of cautious optimism about future progress in the field of the calculation of molecular structure.

FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE
IN RELATION TO QUANTUM CHEMICAL CALCULATIONS*)

Geerd H.F. Diercksen and Wolfgang P. Kraemer

Max-Planck-Institut für Physik und Astrophysik,
München, Germany

1. INTRODUCTION

The present series of lectures is aimed at introducing scientists to the fundamentals of computer hard- and software, in particular in relation to quantum chemical calculations.

In the early days of "electronic computers" it was quite normal, and even absolutely necessary, for every user to have quite a detailed knowledge of the computer he was working on: of its machine language, the tricks, the possibilities offered, and the limitations imposed. With the advent of modern large, high-speed electronic computers, and the development of powerful high-level programming languages, both becoming available to a wide variety of users rather than to a small group of specialists, the question came up of how much knowledge of computer hard- and software should be necessary and should be expected of a user in order to write a program and run it successfully. The answer largely depends on the problem the user is working on:
It seems to be well agreed in this context that computers, and in particular the system software, should be designed in such a way as to allow any user to solve "simple" problems with a minimum knowledge of the computer and its hard- and software features.

*) This contribution is based on the lectures presented by one of us (GHFD).

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 107-199.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.

GEERD H. F. DIERCKSEN AND WOLFGANG P. KRAEMER

The corresponding programs, written according to the standard conventions of some higher level programming language, are expected nowadays to run equally well on any computer independent of its design. Such programs are then said to be machine compatible. Most computer centers support this point of view, with more or less success in its realisation.
The answer is quite different, on the other hand, if it is intended to design programs for large-scale computations which need large amounts of computer time, as is the case in the quantum mechanical calculation of atomic and molecular structure, or if it is aimed to solve problems which are at the ultimate limits of the computer hard- and system software. In these situations a very careful understanding of the underlying principles of computer architecture is felt to be absolutely necessary, as well as a very extensive knowledge of the particular features and the system software of the computer actually to be used.

Normally, in the higher level programming languages no use can be made of the more advanced system features, and special machine language routines have to be written to make use of the full power of the computers. In general, application programs designed to make extensive use of special system features may, or normally will, perform rather poorly on computers designed according to a different logic. Such programs, or at least parts of them, have to be redesigned before they can be run economically on other computer systems. Programs which cannot be run without major modifications on computers different from the one they were designed for are called machine incompatible.

In the following lectures we will discuss the basic concepts of present-day medium to large scale electronic computers (section 2), and in particular the structure of the central processing unit, which performs all logical and arithmetical operations. Further, the organisation of two coupled and simultaneously working computer systems will be sketched briefly (section 3). We will define a pseudo-machine which includes the basic concepts of present-day computers, and we will use it to discuss the principles of instruction scheduling for such kinds of computers (section 4), as well as some basic compiler features, looking in particular into code optimisation techniques (section 5). Finally, the logic of modern input/output systems will be described, which provides a proper basis for a short discussion of the organisation of parallel computers and input/output operations in large problem programs (section 6).
2. BASIC PRINCIPLES OF COMPUTER DESIGN
To start from scratch we will try to develop the logical structure of a medium-sized electronic computer of the present generation.

For the moment we think of the computer as a black box. This computer is assumed to perform all the activities necessary to run any program written to solve the problems we have in mind. As we advance in the following lecture, we will refine the present picture according to our needs.

We assume further that there are two computers available to us, which are completely identical and independent of each other. In the literature of computer science such a system is usually called a symmetric independent multicomputer system. Two jobs are now submitted to this system at the same time, one to each of the two computers, and for simplicity they are both expected to finish simultaneously at some other time. Intuitively, we would say that the jobs have been running in parallel.

This simple scheme already contains practically all the basic features of modern computer systems, and we have to look at it in some more detail.

First we notice that a computer must be a rather intelligent instrument, one that can perform all the operations necessary to run the programs written to solve our problems. But we know that intelligence is a relative term. In our case, one could say that one computer is more intelligent than another if it is able to perform more operations in the same time, or if it can work them down in a more economical way than other computers of its class.

Before we start with the discussion we have to introduce some notations which will be frequently used in the following: the task, the agent that performs the task, and the time interval needed to perform the task. The task is all the activities necessary to complete a job, such as logical decisions, arithmetic operations, input/output manipulations, etc. These activities are performed by the agent, which is one of the two computers in the present case. The time interval is then the actual time taken from the beginning to the end of the job. The definitions given here are very simple and we shall have to redefine our notations later on a more detailed basis.
In our model we said intuitively that there were two tasks running in parallel and independently of each other on the two agents in the same time interval. In this picture two features are included which have played an important role in the development of modern computer architecture: logical independence and parallelism.

In each task the logical order of execution specified by the user in his program has to be preserved in order to produce the correct results. In the present example, where two tasks are running simultaneously on the two computers, the two programs have to be assumed to be logically independent of each other. The more logical dependency there is between the two tasks, or between parts of them, the smaller are the chances of running the tasks in parallel.
So far our model has consisted of two identical, completely independent computers. This means that the two computers have their own input devices, and jobs have to be submitted to them separately. Now we are going to modify this model by allowing both computers to have access to some common input device from which each computer can select jobs, according to some criteria, whenever it is free to start another task. A system where the computer selects its input without the interaction of the user is called a self-scheduling system; if the job selection takes place according to certain criteria it is called a dynamically self-scheduling system.

In our present model both computers are assumed to be able to access the common input device simultaneously and to select the same job. To prevent the system from selecting the same job twice, certain rules for controlling the shared input device have to be introduced.

These rules for controlling access to a common resource can be constructed according to the principles of "open" or of "restricted" access. Open access means in this context that all agents can make any use of the resources, whereas restricted access implies a set of constraints, taking into consideration the special needs of each agent as well as the relations between all the different agents. Typical examples of restricted access are privileged instructions and read-only access.
For our present model, called a symmetric independent multi-computer system, with both computers sharing the same input device, the status of the two computers is displayed graphically as a function of time for a given sequence of jobs in fig. 2.1.

[Fig. 2.1: status of computers A and B as a function of time for three example job sequences. Legend: work mode; work mode (overhead); select mode; delay mode.]

We have to distinguish between three different modes


or status of a computer: the work, the select, and the
delay
(wait) mode. In the first example of fig. 2.1
with the computers A and B it is shown that one of the
computers of the coupled system has to go into a delay
mode after terminating a task if the other one is selecting a new job (select mode) at the same time. Delays due to contention of resources play a central role
in all systems with shared resources. An immediate and
important result of contention delays is that the overall efficiency of the system decreases because parts

of the system will be idling while excluded from access to a shared resource.

GEERD H. F. DIERCKSEN AND WOLFGANG P. KRAEMER

To avoid the duplication of expensive resources, however, such inefficient situations sometimes have to be accepted within certain limits. Corresponding arguments play an essential role in the considerations about the design of a computer installation, because they are strongly related to the very important price/work ratio of the installation.
In the present example restricted access to the common input device can be achieved in a simple way. The computer that selects the shared input device sets a hardware switch that prevents the other one from using the input device simultaneously. This does not involve any communication between the two computers. Actually, a computer will not recognize that the other is in select mode unless it tries to enter select mode simultaneously. The main feature of this switch is its passivity.
Such systems, namely dynamically self-scheduling systems, have been set up in many large installations. The fundamental advantage is the ability to balance the loads of two computer systems, or more generally of two agents, by allowing them to acquire tasks as they are free to process them.
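In modern terms such a dynamically self-scheduling pair behaves like two workers drawing jobs from a shared queue guarded by a lock. The following Python sketch is a hypothetical illustration, with the lock playing the role of the passive hardware switch on the input device:

```python
import threading
import queue

jobs = queue.Queue()
for j in range(10):          # ten jobs waiting on the shared input device
    jobs.put(j)

processed = {"A": [], "B": []}
switch = threading.Lock()    # the passive hardware switch

def computer(name):
    while True:
        with switch:         # only one computer may be in select mode
            if jobs.empty():
                return
            job = jobs.get() # select mode: acquire the next job
        processed[name].append(job)  # work mode: process the job

a = threading.Thread(target=computer, args=("A",))
b = threading.Thread(target=computer, args=("B",))
a.start(); b.start(); a.join(); b.join()

# every job is selected exactly once, by whichever computer was free
assert sorted(processed["A"] + processed["B"]) == list(range(10))
```

Whichever worker finishes its current job first acquires the next one, which is precisely the load balancing described above.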
This form of our model can be further modified if we postulate that one of the two computers, say A, does the task selection for both computers in addition to processing jobs. However, to make this system work, we need some kind of active communication between the two computers. Computer B must be able to interrupt computer A in some way to acquire work. The rules under which this can be done will of course influence the
efficiency of the whole system considerably: first we
assume that computer B can interrupt computer A at any
time. In this case there will be no idle time on computer B, but normally computer A will have to interrupt
its task to service the request of computer B for work.
This means that some time will be necessary to clean
up the present activity of computer A and to resume the
work, after it has serviced computer B. This is shown
graphically in the second example of fig. 2.1. The overhead work which has to be performed by A will decrease the overall efficiency of the total system. Alternatively we may postulate that computer A is only interruptible if it is itself in select mode. In this case

there will be no overhead on computer A, which does all the selection work, for stopping and resuming its task.
The selection of two jobs at the same time (one for
each of the computers) will obviously be much faster
than the separate selection of two jobs. But, on the
other hand, the computer B may have to idle for a long
time before its request for work can be serviced by
computer A. This set up is illustrated in example 3
in fig. 2.1. Which of the two approaches is the more economic one depends on a number of factors, in particular: (a) on the amount of overhead to stop and resume work after servicing a request, compared to the average processing time, and (b) on the average waiting time for being serviced, which depends on the average processing time of a job. By introducing the modification described above into our computer model, we are already switching from a symmetric to an asymmetric computer system. In this context two fundamental notions arise which have to be clarified: the active connection between computers and the specialisation of computers.
It is quite clear that in the present form of our model of an asymmetric computer system the two components do not need to be identical any longer. Actually there are two alternatives for an asymmetric set up at this level: either a large and powerful controller system may be backed up by a number of less powerful computing slaves, or a support processor may be linked to a powerful computing processor. In both cases the primary computer will do all the job selection work and will delegate only those tasks which are especially suitable for the various computing slaves or processors. Thus we may speak of a master/slave relation, although the slave system might be the more powerful computer from the purely computational point of view.
The concept of asymmetry and functional specialisation is a very basic one and is used at various levels in the design of computer systems organized for concurrent operations. The master/slave computer set up as described above is only one representation of this concept. Another application is the development of functionally specialized units within one single processor. These semiautonomous subprocessors were originally floating-point units designed to perform floating point operations at high speed and essentially in parallel with other operations. The semiautonomous floating point unit, under the control

of the processor, determines by itself which task it
is going to process and how this is to be done. Semiautonomy in a computer system allows a specialized
unit to take work out of the main path in small functional chunks and to perform it in parallel with the
main path. The effectiveness of this system depends on
the intelligence of the subprocessor, i.e. it depends
on how much work it can take over and how independently it can operate.
Until now resource sharing has been motivated by the requirement to make full use of the resource and to avoid its duplication. Two other important motivations are reliability and flexibility. Let us assume for the moment that each agent has its own private resource. It may happen then that the resource of one of the agents has a malfunction. As a consequence the agent would have to stop processing if it needs the malfunctioning resource for the present task. Or it may happen for a certain task that one agent wants to supply work to two or more identical resources simultaneously. Then the agent would be idling for a long time while the only available resource of the necessary type works on the queue of tasks. In both cases delays could have been avoided if a number of agents had pooled access to two or more resources of identical type. If there is a malfunction in one of the identical resources, it can be disconnected from the pool for repair, and the agents are still able to work in degraded mode. Every agent, in addition, would be able, according to some rules and limitations depending on the status of the other agents, to acquire more than one resource of identical type for some time interval.
This shows already that there is a close relation between resource sharing and resource contention. If two or more agents are sharing the same resource(s), then an agent might find all resources assigned when it tries to acquire one, and it must go into the wait state and idle until one of the resources becomes free. In the case of very strong contention this may lead to an intolerable loss of efficiency of the system. The limit of intolerability is usually determined by economic considerations rather than by technological ones. It might very often be more economic to let inexpensive resources idle and to avoid only the duplication of very costly resources.


With these problems we already reach the discussion of the configuration of a computer installation, namely the determination of the populations of resources to be attached to a processor in order to achieve the necessary performance and reliability. To decide such questions during the design stage of a computer installation, extensive and careful studies have to be performed, analysing the expected delays due to resource contention and their influence on the productivity of the computer system. The same applies to the design of a computer itself.
For our further discussion we redefine our present model of the computer and assume now that it consists of a number of specialized units: the processor, performing all the program logic and arithmetic operations; the memory, for storing the information; the channels and control units, performing and controlling the input and output operations; and the input/output devices, such as card reader, line printer, card punch, and magnetic tape units. We are then concerned with providing a sufficient number of the various resources to the processor in order to meet the expected needs of the system performing its work load, and with the distribution of the resources among different processors if there is more than one in our computer
system. But even a single processor may compete with itself for access to a resource: assume that two tape drives are attached to the processor via one control unit and one channel. Then it is impossible for the processor to gain access to both tape units simultaneously, because control unit and channel can only serve one device at a time. This contention can only be relieved if we provide a control unit and a channel for each tape drive.
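The tape drive example can be put in numbers with a back-of-the-envelope sketch (the 100 ms transfer time is our assumption, chosen only for illustration):

```python
# Two tape transfers of 100 ms each (a hypothetical figure).
transfer_ms = 100

# One channel and control unit shared by both drives: the requests are
# serialized, so the second transfer must wait for the first.
shared_channel_total = 2 * transfer_ms

# One channel and control unit per drive: the transfers proceed in parallel.
private_channel_total = transfer_ms

assert shared_channel_total == 200
assert private_channel_total == 100
```

Whether the doubled hardware is worth halving the elapsed time is exactly the price/work question raised above.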
In what is normally understood to be a generalized multiprocessor system there is a very high degree of resource sharing, including the sharing of all primary, directly addressable memory. Each processor has full addressability to all locations. In the sharing of memory the concept of private ownership is entirely abandoned, and the assignment of space to a processor is normally dynamic, from a pool of available locations. The situation is slightly different in the case of the input/output channels, and each processor has separate channels assigned to it. But in various systems the possibility exists that one processor can request


another to perform input/output operations on its behalf.


From the preceding discussion it follows that the concept of sharing and its techniques are of fundamental importance at different levels of the design of a computer. In the following we therefore briefly consider two main forms of sharing in a computer system: time sharing and space sharing.
Two processors can space share storage devices that have collections of addressable units allocatable in portions of various sizes to different logical structures, for example primary processor partitions and data sets (data files). Two processors are said to space share a device when data relevant to the current tasks of both processors are recorded on the device.
An illustration of time sharing is the multiplexor channel servicing card readers and printers. We observe perfect simultaneity if we watch the card readers and printers of our installation: it seems as if each of them is connected via its own control unit and channel to the memory. To understand the technique applied here we have to know that the three different resources involved in the input/output process have characteristically different cycle rates: the memory is capable of delivering 500,000 characters per second, the channel can characteristically deliver 50,000 characters per second, and the devices (reader, printer) have a delivery rate of approximately 1,000-2,000 characters per second.
What actually happens is that the multiplexor channel services all the attached input/output devices in a cycle, each of them for a fixed small time interval. But as the data transfer rate of the channel is much higher than that of the devices, they can all be kept working simultaneously at their maximum speed. Time sharing basically occurs when asynchronous units have rates of service such that the basic cycle of the slower is maintained as if it had free and uncontrolled use of the shared resource.
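With the rates quoted above, the time sharing argument can be checked by simple arithmetic; a small sketch using the figures from the text:

```python
memory_rate  = 500_000   # characters per second, as quoted in the text
channel_rate = 50_000
device_rate  = 2_000     # a fast card reader or printer

# A multiplexor channel that serves each device for a small fixed time
# slice can keep this many devices running at their full speed:
max_devices = channel_rate // device_rate
assert max_devices == 25

# and the memory could in turn feed this many such channels:
assert memory_rate // channel_rate == 10
```

As long as no more devices are attached than this ratio allows, each device sees the channel as if it had free and uncontrolled use of it.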
In this context an important question arises: is there an upper limit to the number of resources of a computer system which can be shared, or is there a minimum processor? The minimum processor may be reduced to a location counter, and there are many proposals that start from this as a design concept for multiprocessor systems.

Another aspect of the time sharing concept is the
idea of multiprogramming. In general, multiprogramming is the interleaved attention of the system to tasks during a given time frame. That means that in a multiprogramming system different programs are active, that is, their instruction sequences are processed at different times. Control is given to the different programs according to certain priority rules. The system switches to another program if the present one cannot continue because a certain requested resource is not available, or if processing has to be delayed until certain external operations have been finished. In most cases the reason for such delays is that the program has to wait for the completion of input/output operations. The program is reactivated as soon as the necessary I/O operation has been completed. Thus most present day multiprogramming systems are input/output sensitive. Multiprogramming involves a non-negligible amount of overhead due to switching between tasks. Therefore the time slices have to be chosen such that the overhead remains tolerable.
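The trade-off between time slice length and switching overhead can be sketched as follows (a hypothetical illustration; the numbers are ours, not from the text):

```python
# Fraction of processor time lost to task switching for a given time
# slice length and per-switch overhead (both in microseconds).
def overhead_fraction(slice_us, switch_us):
    return switch_us / (slice_us + switch_us)

# A long time slice makes the switching overhead tolerable ...
assert overhead_fraction(10_000, 100) < 0.01
# ... while a very short slice wastes a large part of the processor.
assert overhead_fraction(200, 100) > 0.3
```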
We have looked at a computer system as a collection of resources and we have identified a number of possibilities for parallel operation. In particular we observed parallelism between arithmetic operations and input/output operations. Present day computers are actually collections of subprocessors with distributed intelligence. There are, for example, subprocessors which are able to undertake complex input/output operations, including error checking, without participation of the central processing unit.
We will finally discuss very briefly the basic
design concepts of present day large scale computers,
after having reviewed the fundamental principles of
a computer using a more and more refined artificial
model. The aim of any computer design has always been to increase performance at minimum cost. This has been realized to some extent within the past few years, primarily by the tremendous advances in computer technology (not to be discussed here) and by new concepts in computer organisation. Each technology imposes certain limitations, and within these limitations only certain things are logically and economically feasible to attempt. The technology of course limits the raw speed of a computer system. The speed with which electrical current passes down a line limits the size of a system. The natural alternative to

this limitation is the introduction of more parallel
capability into a computer. The present day computer
organisation is based on the recognition of independence and parallelism of tasks on various levels, the
classical example of which is the separation and parallelism of processor and of input/output operations.
Such a concept can only lead to shorter overall processing times if the different resources actually have something to work on in parallel. Although some parallelism is inherent in each program, especially at the very low level, as we will see later in some more detail, the design of the user's program itself can be very important in profiting from the possibility of task separation and parallel processing. This is especially true for the organisation of the input/output operations in many application programs.
From the observation of task separation and parallel processing there has resulted the classical organisation of large scale, high speed computers. It involves a hierarchy of specialized units, each with some local storage and with a limited intelligence. A corresponding scheme is shown in fig. 2.2.

Fig. 2.2. [Hierarchy of specialized units: storage unit (SU), central processor unit (CPU), channel (CH), control unit (CU), device (D).]


Units down the tree tend to be highly specialized, capable of performing reasonably sophisticated functions
in their area with high speed. But they are useless
outside this area. The intelligence of each unit is limited by the special function it has to perform within the total system. In a hierarchical system there is a pyramid-like distribution of general capability. Each parent unit interprets a function just far enough to determine to which unit of the tree it relates, and passes it on to that unit for further interpretation or for execution. For example, the central processor interprets an order to perform input/output
operations on a given channel and determines to what
channel to send the information necessary to perform
the operation. The channel receives relevant control
data, independently accesses any additionally required
data, and sends whatever is needed to a control unit
to which it initiates contact. The control unit addresses the device.
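This pyramid-like delegation can be sketched in a few lines (a hypothetical Python illustration; the unit names ch0, cu0, and tape0 are ours):

```python
# Each parent interprets a request only far enough to pick the child
# unit it relates to, then passes it down the tree.

class Device:
    def handle(self, op):
        return f"device executes {op}"

class ControlUnit:
    def __init__(self):
        self.devices = {"tape0": Device()}
    def handle(self, device, op):
        return self.devices[device].handle(op)   # addresses the device

class Channel:
    def __init__(self):
        self.control_units = {"cu0": ControlUnit()}
    def handle(self, cu, device, op):
        return self.control_units[cu].handle(device, op)

class CentralProcessor:
    def __init__(self):
        self.channels = {"ch0": Channel()}
    def start_io(self, ch, cu, device, op):
        # interprets the order only to determine the channel, then delegates
        return self.channels[ch].handle(cu, device, op)

cpu = CentralProcessor()
assert cpu.start_io("ch0", "cu0", "tape0", "read") == "device executes read"
```

Each level is useless outside its own area, but within it the request is handled without further help from above.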
3. THE FUNCTIONS OF A PROCESSOR.
After having outlined the basic principles concerning the design of a computer of the present generation, we now have to discuss in more detail the functions of the processor itself.
The first step to increase the efficiency of the processor is to separate the arithmetic operations from all input/output operations and to have the input/output operations controlled by special functional units. This possibility was recognized very early because of the different execution times of logical and arithmetic operations compared to input/output operations. But this separation has no real advantage for a number of jobs with massive data manipulation and relatively few input/output operations, as for example calculations on atomic and molecular structures. To improve the performance in this situation one has to introduce more parallel capability into the processor itself.
Parallelism is obtained here at the instruction level or below, and the user of the system remains partly unaware of the asynchrony and the overlap achieved by the different components of the processor. Taking proper advantage of it is normally left to the hardware design and to the compiler, which translates the source statements of a higher level language into a machine

executable sequence of instructions. Nevertheless, we
will try to point out in the following sections that
a careful organisation of higher level language statements can support the task of compilers and partly influence instruction scheduling and thus increase the
efficiency of a program for execution on a strongly
parallel organized processor. The same is even more
true for programming in assembler languages.
To be able to exhaust the various possibilities offered by the machine we have to look into the structure of the processor and to learn how arithmetic and logical operations are performed.

First it has to be noticed that processors have two cycles of operation: an instruction cycle (I-cycle), in which an instruction is acquired and stabilized in the computer, and an execution cycle (E-cycle), in which the particular function is actually performed. In table 3.1 nine steps (tasks) are listed which are necessary to perform an 'ADD' operation, assuming a computer with only one arithmetic register and without any indexing cycle.
Table 3.1
The different steps (tasks) that are necessary to perform an 'ADD'-operation.

'I-cycle', executed in the instruction unit (I-unit):

1. Update instruction counter.
2. Send instruction to instruction register.
3. Send operation code to operation code register.
4. Send operand address to operand address register.
5. Decode operation.

'E-cycle', executed in the execution unit (E-unit):

6. Send operand to storage interface register (SIR).
7. Send content of arithmetic register together with operand to adder.
8. Execute 'ADD'.
9. Send result to result register.

FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE

121

The first five activities (step 1 to 5) form the


I-cycle: The instruction counter is incremented to
point to the next instruction to be executed. The
instruction is fetched from primary storage and placed
into the instruction register. The instruction word
contains the operation code, defining the operation to
be performed, and the storage address of one of the
operands (assuming a single address machine). The
operation code is sent to the operation code register
and decoded. The operand address is finally sent to
the operand address register in the E-unit. The next
four activities (step 6 to 9) form the E-cycle: The
contents of the arithmetic register (treated as one
operand of the operation by default), and the operand
fetched from primary storage are sent to the adder,
and the 'ADD' instruction is executed. The result of
the operation is placed into the result register.
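The nine steps of table 3.1 can be sketched as a toy single-address accumulator machine (a hypothetical illustration, not any real instruction set):

```python
# A toy single-address machine stepping through the I- and E-cycle of
# table 3.1 for one 'ADD' instruction.
memory = {0: ("ADD", 100), 100: 7}   # instruction at address 0, operand at 100
arithmetic_register = 35             # one operand is held here by default
instruction_counter = 0

# I-cycle: fetch and decode
instruction_counter += 1                           # 1. update instruction counter
instruction_register = memory[0]                   # 2. fetch instruction
operation_code_register = instruction_register[0]  # 3. operation code
operand_address_register = instruction_register[1] # 4. operand address
# 5. decode: here trivially the op-code string itself

# E-cycle: execute
storage_interface_register = memory[operand_address_register]  # 6. fetch operand
if operation_code_register == "ADD":
    result_register = arithmetic_register + storage_interface_register  # 7./8.
arithmetic_register = result_register              # 9. store result

assert arithmetic_register == 42
```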
The sequence of steps summarized in table 3.1 to perform the 'ADD' operation can be worked through most economically if we split the processor into three different functional units: an instruction unit (I-unit), an execution unit (E-unit), and a storage unit (S-unit) (processor storage, primary storage, memory, core storage). The fact that the I- and E-cycles can be performed in two functionally independent units is the precondition for any parallelism in the processor. The I-unit contains an instruction counter, an instruction register, an operation code register, and a decode matrix which is associated with the operation code register. The E-unit contains an arithmetic register, an operand address register, an adder, a multiply/divide unit, and an arithmetic result register.
We have already mentioned that the machine described here is a single address machine (because the address of only one of the operands necessary for the operation is specified in the instruction word), with memory references possible from the E-unit as well as from the I-unit, which means that the I- and E-units share the same register interface to the memory, called the storage interface register (SIR). This set up is displayed in fig. 3.1.


Fig. 3.1. [Structure of the processor: storage unit (SU), connected via the storage interface register (SIR) to the instruction unit (IU), containing instruction counter (IC), instruction register, operation code register (OCR), and decoder, and to the execution unit (EU), containing arithmetic register, operand address register, adder, multiply/divide, arithmetic result register, and encoder.]

In some installations both the I- and E-units have local registers, and the two units can then be characterized as functionally specialized units with local store. The critical characteristic of these local registers is that they work much faster than primary storage. But for many machines these registers are just positions in core rather than in a local store in the particular unit. In this case their use has no advantage over working with references to processor storage; they are defined in those machines only to keep compatibility within the series of products. For example, in the IBM series 360, only from model 50 upwards have local storage registers actually been included in the functionally specialized subunits.
To evaluate the performance of the machine defined
here and to investigate further improvements, we have
to introduce some characteristic rates of performance
as a basis for the following discussion. For the present example, the ADD operation, the characteristic
timings of the different steps (tasks) are listed in
table 3.2.

Table 3.2
Characteristic timings for the different steps (tasks)
that are necessary to perform an 'ADD'-operation.

step    time     'I-cycle'                     'E-cycle'
1.       50 ns   update IC
2.      375 ns   fetch instruction
3.       50 ns   send OP-code to register
4./5.   225 ns   decode and send operand       receive signal
                 address to address register
6.      375 ns                                 fetch operand
7.       50 ns                                 send operand to adder
8.      200 ns                                 ADD
9.       50 ns                                 send result to register

Total time for I-cycle: 700 ns
Total time for E-cycle: 900 ns

It is assumed in our example that the I-cycle is fixed and independent of the instruction to be executed. This is of course a simplification, since most machines do not have fixed decode cycles but are decode sensitive. Further we see that the I-cycle is dominated by the storage reference required to fetch the instruction. This is generally true for all processors of medium to large size: they are normally capable of generating and processing storage references much faster than the memory can respond. The E-cycle is of course completely dependent on the instruction to be performed. The time
which the E-unit needs for an instruction depends characteristically on whether the instruction requires a memory fetch, as well as on the complexity of the instruction itself.
For our further discussion it is convenient to characterize the performance of a processor in terms of millions of instructions per second (MIPS). Since the time it takes to execute an instruction is variable, MIPS rates cannot be accurate criteria of performance. We are interested in the MIPS rates of the I- and E-units. From our example (table 3.2) the following MIPS characteristics can be obtained: 1.4 MIPS can pass through the I-unit and 1.1 MIPS through the E-unit, assuming for simplicity the average execution time to be equal to that of the ADD operation, and assuming that approximately 2,000,000 words of 36 bit length per second can be delivered by the S-unit (equal number of reads and writes).
As a consequence of our initial assumption that the I- and E-units have separate cycles following each other, the I-unit is idling for 675 ns while the E-unit executes the instruction. This reduces the delivery rate of the I-unit from the highest possible rate of 1.4 MIPS to approximately 0.727 MIPS, i.e. to a usage of only 51 % of the total time. This has of course the further implication that the E-unit receives far fewer instructions than has been assumed for its optimum performance rate, and that therefore the E-unit itself is also idling for about 35 % of the total time.
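These utilization figures follow directly from the timings of table 3.2; a small sketch reproducing them (assuming that 225 ns of the 900 ns E-cycle overlap with the I-cycle, so that the I-unit idles for the remaining 675 ns):

```python
# All times in nanoseconds, taken from table 3.2.
i_cycle = 700
e_cycle = 900
e_only  = 675   # part of the E-cycle during which the I-unit idles

serial_time = i_cycle + e_only            # cycles strictly follow each other
mips = 1e9 / serial_time / 1e6            # instructions per second, in millions

assert serial_time == 1375
assert abs(mips - 0.727) < 0.001                       # ~0.727 MIPS
assert round(100 * i_cycle / serial_time) == 51        # I-unit busy 51 %
assert round(100 * (1 - e_cycle / serial_time)) == 35  # E-unit idle ~35 %
```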
The maximum throughput of this processor is obviously equal to the throughput of its slowest resource, the E-unit, which is able to deliver 1.1 MIPS. To approach this limit one could try to arrange the I- and E-units in an asynchronous and overlapping set up, so that the I-cycle of an instruction overlaps with the E-cycle of the preceding one. Unfortunately this set up gives rise to some difficulties: (1) The time an instruction spends in the E-unit depends on its complexity; different instructions spend different times in the E-unit. If the processor enters a string of more complex operations, delivery will be delayed for a longer time, and fewer instructions per second can be performed. This relates immediately to one of the difficulties in evaluating the performance of a machine: the performance strongly depends on the "mixture" of operations to be
executed, in particular on their types, the number of occurrences, and the sequence of instructions.


(2) Even more critical is that now the I- and E-units generate many more storage references than the S-unit can satisfy, and that they block each other heavily from free access to the S-unit, as they are both referring to it at approximately the same time.
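Difficulty (1) can be illustrated with a small sketch of the overlapped arrangement, in which the I-cycle of an instruction proceeds while its predecessor is still in the E-unit (a hypothetical model that ignores the storage contention of difficulty (2)):

```python
# Times in ns; the I-cycle is fixed at 700 ns as in table 3.2.
I = 700

def total_time(e_times):
    i_done = e_done = 0
    for e in e_times:
        i_done += I                     # next instruction is fetched/decoded
        e_start = max(i_done, e_done)   # E-unit must be both free and fed
        e_done = e_start + e
    return e_done

simple   = total_time([900] * 4)        # four simple ADD-like instructions
complex_ = total_time([2000] * 4)       # four more complex instructions

assert simple == 700 + 4 * 900          # E-unit is the bottleneck: 4300 ns
assert complex_ == 700 + 4 * 2000       # a complex string delays delivery: 8700 ns
```

After the first instruction the E-unit never waits, so throughput is set entirely by the mixture of E-times, as stated above.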
There are of course a number of instructions which make no references to storage at all, for example shift and test operations. And there are further classes of operations for which storage references from the E-unit are not necessary, for example branch instructions. That means that for these kinds of instructions we have overestimated to some extent the contention for storage. But this depends of course on the structure of the particular program, and it is not possible to make any predictions in general.
One can think on the other hand of a number of methods to decrease contention for storage. (1) One possibility is to provide more registers and operand buffers as "scratch pads" for mathematical operations. This would of course not decrease the contention for memory if the registers are implemented in the S-unit rather than as local storage in the I- and/or E-units. Register usage decreases the number of references by holding frequently used values, for example base pointers to some instructions or data, and by holding intermediate results of arithmetic operations.
(2) Another possibility to decrease contention is to transfer more than 36 bits on each reference, for example 72 bits. The usefulness of this solution of course depends on how often the second 36 bits are really used for computation. For references to instructions this is usually the case, depending on the branching structure. For references to data it is true at least in the very important cases of mathematical operations on ordered strings of data, that is for mathematical operations on vectors and matrices. (3) Strongly related to this solution is the approach of using variable size instructions. Many instructions can be specified completely in 18 bits rather than 36 bits (in the IBM 360/370 series instruction set there are instructions of 16, 32, and 48 bits, respectively). (4) So far we have been assuming that the storage is one unit, allowing one reference to it at a time. An obvious extension is to divide the storage into a number of units, each of which allows one reference at a time, but all of which may be referenced


simultaneously.
There are two possibilities for arranging consecutive storage locations into the different units, sometimes called banks in CDC terminology: high bit overlapping and low bit overlapping, both approaching each other in the limit of an infinite number of units. In high bit overlapping those storage locations are collected in the same unit which have the same high bit pattern of the address. For example, if there are two units, each with 128k bytes, then bytes 0 to (128k-1) would be positioned in unit 1, and bytes 128k to (256k-1) in unit 2. In low bit overlapping those storage locations are collected in the same unit which have the same low bit address pattern. Assuming again 2 units, all storage locations with a 0 in the last address bit would be put into unit 1, and all with a 1 in the last address bit into unit 2. Thus consecutive storage locations would be put in different units. This form of arranging storage locations in different units is usually called interleaving. In some present day machines (for example the UNIVAC 1108 and the IBM 360/75 between them) both techniques are provided. Which of the two techniques has more advantages for avoiding contention depends on many factors, such as average program size, level of multiprogramming, and arrangement of data.

Fig. 3.2. [Structure of the processor with buffer registers: storage unit (SU); storage interface register (SIR); instruction unit (IU) with instruction counter (IC), instruction register, and operation code register; execution unit (EU).]


In the machine developed up to now, which is displayed in fig. 3.2, the I- and E-units are roughly balanced. The rate with which this machine can process instructions is limited by the rate of delivery of instructions by the I-unit. To speed up the machine without speeding up the electronic circuitry, we will analyse in more detail the functions of the I-unit.
Among the 700 ns of I-cycle time, 375 ns are spent in the interface between the I- and S-unit. This process of instruction fetching consists of: (1) the delivery of an address by the I-unit to the SIR, which we can assume to take 50 ns; (2) the fetching of the "word" from the S-unit into the SIR, estimated to take 275 ns; and (3) the delivery of the "word" from the SIR to the I-unit, assumed to take another 50 ns. While during this time interval the S-unit is fully busy, the I-unit is idling for the time needed to fetch the instruction word from the S-unit into the SIR. That means that the I-unit is actually busy only for 100 ns within the time interval of 375 ns. The question is now which modifications have to be introduced to avoid
now which modifications have to be introduced to avoid
this delay. At the beginning of this paragraph we have
stated that the performance of the machine is limited
by the instruction delivery rate of the I-unit. But
now we see that the limiting factor is not the I-unit
itself, but the dependence on the delivery rate of
information from the S-unit. If the storage would respond instantaneously, that means if it would have a
zero delivery time, then the processing time of each
instruction in the I-unit would be reduced to 425 ns
and its basic delivery rate would be increased to
2.35 MIPS. This situation is typical for large-scale
machines which are indeed memory bound/rather than
processor bound.
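The delivery rates quoted above follow directly from the per-instruction times; as a quick check (the helper name is ours):

```python
# Convert a per-instruction time in nanoseconds into millions of
# instructions per second (MIPS): 10^9 ns/s divided by the ns per
# instruction, divided by 10^6.
def mips(ns_per_instruction):
    return 1e3 / ns_per_instruction

print(round(mips(425), 2))  # 2.35 MIPS for a 425 ns I-cycle
```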
The other dominating step among the five tasks
performed in the I-cycle is the one in which the instructions are decoded, with an actual decoding time
of 225 ns per instruction (see table 3.2). Apart from
the wait-for-delivery time, the rest of the tasks takes
only about 200 ns. This part of the I-cycle could possibly be speeded up if the remaining tasks could be
arranged such that they are performed in parallel to
the decode operation. But before considering this possibility we turn back to the instruction
fetching step and try to reorganize the relation
between the I- and the S-unit in order to speed up
the corresponding interface.

128

GEERD H. F. DIERCKSEN AND WOLFGANG P. KRAEMER

We notice that the I- and S-units are elements with fundamentally different speed characteristics. The
classical approach of interfacing asynchronous devices
of different characteristic speeds is to build a buffer
between them. Such a buffer can of course not change
the actual speed of a unit. It serves as a repository
for data at the interface in such a way as to "smooth"
the operations of the asynchronous element. We will
analyse the effect of an eight instruction buffer between the I- and S-units, so that whenever the I-unit
references the S-unit eight instructions are transferred from the S-unit to the buffer simultaneously.
It is quite obvious that the total time to fetch
eight instructions at once is shorter than to fetch
eight instructions separately, because no time is
needed to deliver the addresses of the seven instructions following the first one. The buffer can directly
serve as an instruction register, and processing can
proceed immediately from there. The usefulness of such
an instruction buffer depends of course on the question
whether the eight instructions can be used as they are
in the buffer. The optimum situation occurs when no
branching is required, and all the instructions in the
buffer can be executed in sequence. If on the other
hand one instruction in the buffer is an unconditional
branch to another instruction out of the buffer, the
transfer time of all the instructions following this
branch instruction is lost. The situation becomes even
worse if a conditional branch instruction occurs, where
it is necessary to inhibit the branch until it is possible to decide what direction it will take. The problem is then what to do with the following instructions
until the condition on the branch is cleared up. There
are two possibilities: one can inhibit instruction advance, or one can try a technique of "tentative" advance, as is done in most large scale machines.
A special situation arises in the case that one
of the instructions is a branch pointing back to an
instruction, that is still in the instruction buffer.
In this case the machine encounters a special state,
usually called "short loop", or "loop mode". Instruction fetching is stopped, and processing proceeds entirely from the instructions resident in the instruction buffer. This speeds up the execution considerably, primarily because it relieves contention
for the S-unit. On the IBM 360/91 programs executing
largely in loop mode run up to 4 times faster than
otherwise.
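The loop-mode condition can be sketched as a simple address test (our illustration; an actual machine tracks the buffer contents with bound registers):

```python
# A branch enters "loop mode" when its target instruction is still
# resident in the instruction buffer. Here the buffer is taken to hold
# the instructions at addresses [oldest, newest]; names are illustrative.
def enters_loop_mode(branch_target, oldest, newest):
    return oldest <= branch_target <= newest

print(enters_loop_mode(100, 98, 105))  # True: backward branch inside buffer
print(enters_loop_mode(90, 98, 105))   # False: target already left the buffer
```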


For a further improvement of the availability of instructions in the I-unit, some modifications can be
introduced in the present buffer system: One could
think of providing two buffers, each with a length of
four instructions. While the I-unit works on instructions from one of the two buffers, the other buffer
could be filled. This would avoid waiting times if the
end-of-buffer condition is encountered. In our present
machine, a four word instruction buffer would be filled in approx. 1200 ns, while the I-unit can process four instructions in approx. 1100 ns. Thus with such a buffer
system the I- and S-units would be roughly balanced.
An even more flexible approach is possible and has
actually been implemented for example in the IBM 360
series, models 91 and 195. The instruction buffer with
a length of 8 double words (each of 8 bytes length) is
considered to be cyclic. An upper bound register is
pointing to the double word that has most recently
been placed into the buffer, and a lower bound register
points to the word which has been in the buffer longest.
buffer. Buffer filling involves then only the transfer
of one double word of instructions from the S-unit under
control of the upper bound register, which is incremented automatically. The buffer is filled in advance
until the upper bound register is 7 buffer positions
ahead of the instruction register. The instruction
fetch is made conditional upon the contention for the
S-unit. The priority of an instruction fetch is reduced compared to other references depending on how
many buffer positions the upper bound register is ahead
of the instruction register.
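The fill-ahead condition for this cyclic buffer can be sketched as follows (a minimal reconstruction; register names are ours):

```python
# Cyclic instruction buffer of 8 double words: fetch ahead only while
# the upper bound register is fewer than 7 positions ahead of the
# instruction register (positions counted modulo the buffer length).
BUF_LEN = 8

def should_fetch(upper_bound, instr_reg):
    ahead = (upper_bound - instr_reg) % BUF_LEN
    return ahead < 7

print(should_fetch(3, 0))  # True: only 3 positions pre-fetched
print(should_fetch(7, 0))  # False: buffer filled 7 positions ahead
```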
We see that the I/S-interface can be speeded up by supplying local storage in the I-unit, which smoothes out instruction fetching. Together with this
buffer, another level of addressing has to be introduced: addressing is no longer to positions in the
S-unit only, but to the instruction buffer as well.
Recently, large fast S-units have been developed, completely invisible to the programmer, that might make local
storage in the I-unit unnecessary (see below).
With the present machine organisation, we have
increased the instruction delivery rate of the I-unit to 3.64 MIPS. Depending on the programming language used, the programmer can contribute to some extent
to an optimum delivery rate of the I-unit if the following basic principles are obeyed:


(1) Whenever possible, loops should be made to fit into the instruction buffer, allowing the machine to run in
a special "short loop" mode. This can speed up program execution considerably. (2) The parameters of
conditional branches should be computed well ahead
of the branch instruction if possible. This allows
the I-unit to make an immediate decision and initiate buffer filling while other operations in the E-unit
might still be in progress.
With the improvements which have been introduced
for the I-unit and the I/S-interface the relation between the I- and E-unit appears now again to be seriously unbalanced.
At this stage two basic approaches are possible
to speed up the performance of the E-unit without increasing the raw speed of the circuitry: (1) to increase parallel capability in the E-unit so as to
allow the acceptance of more instructions from the
I-unit, and (2) to relieve the E-unit from some of
its functions. We will investigate both approaches.
The E-unit we have used so far has been executing
all instructions, except the input/output operations.
It has multiple arithmetic registers, the careful use
of which is left to the programmer or to the compiler.
However, the E-unit can only execute one instruction
at a time. If we want to introduce parallel capability into the E-unit we have to find different classes
of operations that can be executed simultaneously.
The decision connected with this classification of operations has a major impact on the organisation
of each machine, its register organisation, its instruction set, its addressing structure, and finally on the
compilers to be used for the computer system. If we
analyse from this point of view application programs
and the instruction sets and instruction timings of
modern computers we notice that LOAD and STORE operations are the most frequent instructions performed in
the E-unit. The short execution times of these instructions are compensated by their enormous frequency of occurrence. The occurrence of floating point operations, on the other hand, is rather low, but the individual execution times are quite large. If we optimize for
the first group of instructions, we would achieve an
increase in the general throughput of the machine. If
we optimize for the second group, however, we would
design a machine ideally suited for problems with a


high occurrence rate of floating point operations (as has actually been done for the IBM 360/91).
Several problems arise in connection with the introduction of parallel processing in the E-unit. If parallel processing is achieved just by adding another
E-unit, there has to be some mechanism so that the
I-unit has the possibility to select one of the two
E-units for processing the next instruction. This can
be done by associating a busy signal with each E-unit.
In case both E-units are busy, the instruction delivery by the I-unit has to be delayed. Further, the
addressable registers must be accessible by both E-units;
they can no longer be private to one of the units.
Another serious problem is the preservation of the
logical sequence of instructions. One solution to this
problem is to associate a control bit with each of the
registers; a 0 whenever it is free for reference, and
a 1 whenever it is interlocked. All registers in which
the result of an operation is to be stored are interlocked until this result has been delivered. Notice that in this context the register of a load instruction is a result register, which has to be interlocked
until the data are delivered from the S-unit. If an
instruction refers to an interlocked register, its
processing has to be delayed. This leads to an important limitation. The introduction of a second E-unit
into the machine will only increase the MIP rate of
the combined "E-units", if there is the possibility
for parallel processing in the instruction sequence.
This depends of course very much on the program structure (interdependence of the instruction stream).
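The interlock scheme just described can be sketched as follows (our minimal model, not the authors' circuit; one control bit per register, set while a result is pending):

```python
# Register interlocking: a control bit per register is 0 when the
# register is free for reference and 1 while it waits for a result.
class Registers:
    def __init__(self, n):
        self.locked = [False] * n

    def try_issue(self, sources, result):
        # Delay the instruction if any operand register or the result
        # register is still interlocked; otherwise interlock the result
        # register until its value is delivered.
        if any(self.locked[r] for r in sources) or self.locked[result]:
            return False
        self.locked[result] = True
        return True

    def deliver(self, result):
        # The result (e.g. data loaded from the S-unit) has arrived.
        self.locked[result] = False

regs = Registers(4)
print(regs.try_issue([0], 1))  # True: LOAD into register 1, now interlocked
print(regs.try_issue([1], 2))  # False: register 1 still waits for its data
regs.deliver(1)
print(regs.try_issue([1], 2))  # True: interlock cleared, instruction issues
```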
In the discussion about the performance rate of
the I-unit we have found that the I-unit is delayed
primarily because of the slow speed of the S-unit delivering the instructions. The same can be expected
for the E-unit. One instruction spends an (assumed)
average processing time of 675 ns in the E-unit. In
this time interval, 375 ns are needed for fetching the
operand data. To avoid operand fetching from the S-unit
we have supplied multiple registers. Instructions referring to these registers will be executed much faster
and will reduce the contention for the S-unit.
But there is still some need to improve the E-/
S-unit interface, for example by supplying a buffer
for operand fetching. Balancing, however, is more complex in this case than for the I-unit, because the request for operands is very variable, and the addresses generated will not be as regular as those for instruction fetching.
To avoid these problems one has to supply a massive, fast buffer storage, or to generate the addresses
and initiate the fetches for operands well ahead of delivering the instruction to the E-unit.
The first approach has been realized for example
in the IBM 360 model 85, and subsequently in the higher
models of the IBM 370 series. A 16K or 32K byte (monolithic) storage with only 80 ns access time is supplied,
which is invisible to the programmer, that means not
directly addressable. This buffer storage, or cache,
is organized in sectors of 1024 bytes, where each
sector in turn is divided into 16 blocks of 64 bytes
each. Each sector is assigned to some sector of 1024
bytes in the S-unit by a sector address register.
There are fewer sectors in the cache than in the S-unit,
and therefore the correspondence of a processor storage
sector to a buffer storage sector depends dynamically on the references.
The dynamic binding priority depends on how recently the sector was last referenced for execution or read.
Writes are executed directly and do not change the
priority. On the model 85 with a 4 times interleaved
S-unit, and a 16 byte wide data path, it is possible
to achieve a block load of 64 bytes in one storage
cycle. The advantage of this approach depends of course
on the success of finding data referenced in the cache.
This is the case for arithmetic operations on ordered
data like vectors and matrices. In the limit that a program and its data fit into such a cache, the machine would be able to operate like an 80 ns storage cycle machine.
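The address breakdown for these sectors and blocks can be sketched as follows (numbers from the text; the helper function is our illustration):

```python
# A 1024-byte sector consists of 16 blocks of 64 bytes each; an address
# therefore splits into a sector number and a block number within it.
SECTOR_SIZE = 1024
BLOCK_SIZE = 64

def sector_and_block(addr):
    sector = addr // SECTOR_SIZE
    block = (addr % SECTOR_SIZE) // BLOCK_SIZE
    return sector, block

print(sector_and_block(1024 + 65))  # (1, 1): second sector, second block
```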
The second approach to improve the E-/S-unit interface can only be realized with some profit if the distribution of the work between the I- and E-units is reorganized.
The I-unit fetches instructions, decodes them,
forwards operation codes to an op-code register asynchronously used by the I- and E-units, and forwards
operand addresses to the E-unit. But there is a complete class of instructions that do not really need
the participation of the E-unit, namely: LOAD,
STORE, and TRANSFER (unconditional and conditional branches) operations. Therefore we can assign the execution of these instructions to the I-unit. This will
relieve the E-unit of work, but it does not yet allow
parallel processing of the I- and E-unit: The same
op-code register is still used by both units, and
while the I-unit is processing an instruction, the
E-unit has to wait and vice versa. This situation
can be improved if we supply a second op-code register:
one is used for instructions executed in the I-unit,
and the other one for instructions to be executed in
the E-unit. This set up would now allow parallel processing within some limits, but no look ahead techniques. Once the I-unit is working on one instruction,
and finds another to be executed in the I-unit as well,
no instruction can be forwarded to the E-unit, which
might be free for processing some instructions independent of those executed in or waiting for the I-unit.
To get around this difficulty we have to introduce an
instruction stack for operations to be executed by the
I-unit. This allows the I-unit to advance, to locate
E-type instructions, and to forward them to the E-unit
for processing if possible. Now the I-unit is
able to evaluate addresses of operands needed in E-type
instructions, and to initiate the necessary LOAD instructions for the operands into local storage of the E-units.
We transfer this task completely to the I-unit, which will allow us to use the local storage of the E-units
sensibly, and fill it by a look ahead technique.
At this point it is easy to extend the machine
concept further and to include the possibility of
address modification (indexing). The necessary capability and the necessary registers have only to be included in the I-unit, because this is the only place
where addresses need to be evaluated.
We have now reduced the work load on the E-unit
in two ways: We have reduced the number of instructions to be processed (its population) and we have
provided a local buffer, sensibly filled by the I-unit
using some look ahead technique operating on a 50 ns
cycle. The average instruction time of the E-unit is
thus 325 ns faster than in our previous set up. This
has been achieved without reducing the delivery rate
of the E-unit, but by introducing some additional hardware resources.


The machine can finally be further speeded up so that one instruction is decoded every machine cycle,
that is, every 50 ns. This of course makes it necessary
to reconsider the decode task: We define instruction
sets in such a way that by partial inspection it can
be determined which unit is going to process the instruction. This predecoding then is the only decoding
work left to the I-unit, while all other decoding will
be performed within the appropriate units. This increases the parallel capability of the machine by
allowing decoding throughout the units. Furthermore
we split the instruction decoding capability (predecoding) of the I-unit from its instruction processing capability. For the LOAD-STORE-MODIFY-TRANSFER instruction processing we introduce an LSMT-unit
with the capabilities so far performed by the I-unit.
In addition we supply the E-unit, analogous to the
LSMT-unit, with local instruction buffer and operand
buffer store, in order not to delay instruction delivery and not to cause interlocks.
The pseudomachine described here is displayed in
fig. 3.3.

Fig. 3.3: Storage unit (SU), storage interface unit (SIR), and buffer storage unit (BSU); instruction unit (IU) with instruction counter (IC), instruction stack, and operation code register (OCR); load-store-modify-transfer unit (LSMTU) with operation code register (OCR), index registers (XR), and operand buffer; execution unit(s) (EU, EU1, EU2), each with operation code register (OCR), arithmetic registers, and operand buffer.

This set up includes most present day design characteristics of actual machines. Each of the two processors
has extended local store for various buffering purposes
and is actually a highly specialized subprocessor in
its own right. The machine design allows the inclusion of any number of subprocessors for special purposes by defining
appropriate instruction classes. Traditionally these
have been specialized units for arithmetic operations.
This machine has an extensive capability for look ahead
and parallel processing, but special care has to be
taken to preserve the principle of the logical sequence
of instruction processing.
To study the performance of instructions in this
final set up of our pseudomachine we consider as a
simple example the sequence of instructions listed
below:
LOAD  AR1,MEM1
MPLY  AR1,MEM2,AR2
STRE  AR2,MEM3

Following the concept used in IBM machines it is assumed that each instruction set consists only of two
types of instructions: register-register and register-memory instructions.
Initially the I-unit brings the LOAD instruction
to the predecoder, discovers it to be of the class
which is executed by the LSMT-unit, sends it to the
LSMT unit and places it into its instruction stack.
Simultaneously with the processing (decoding) of the
LOAD by the LSMT unit, the I-unit brings the MPLY to
the decoder, and discovers that it is to be executed
by the E-unit. However, it contains a memory reference
that has to be processed by the LSMT unit.
There is now a fundamental difference between these
two instructions to be handled in the LSMT-unit: In the
LOAD instruction the source and destination are explicitly specified. The operand will be sent directly
from memory to the arithmetic register AR1. To process
the MPLY instruction we have to remember that the E-unit
expects all operands either to be in an addressable register, or in an operand buffer (local storage). Therefore, the predecoder must decompose the MPLY instruction into two instructions, one for the LSMT-unit and
one for the E-unit. The instruction sent to the LSMT
unit is to load an operand from the memory into the


operand buffer of the E-unit, and the instruction sent to the E-unit will be to perform the multiplication
of the operand in arithmetic register AR1, and the
operand in the appropriate operand buffer. This leads
again to the problem of relating asynchronously arriving operands and instructions in the E-unit. One solution, which has already been pointed out, is to
interlock the usage of the arithmetic registers and
buffers until the operands have arrived. This makes
it necessary that the buffer to which the operand is
sent is known to the E-unit, which can be achieved by
allowing the I-unit to select the next empty operand
buffer in the E-unit and to place the address into the
instructions sent to the LSMT- and E-units.
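The selection of the next empty operand buffer can be sketched as follows (our toy model of the step just described; data structures are illustrative):

```python
# The I-unit selects the next empty operand buffer of the E-unit, marks
# it busy, and places its index into both the LSMT-unit LOAD and the
# E-unit instruction.
def select_buffer(free):
    for i, empty in enumerate(free):
        if empty:
            free[i] = False   # reserve this buffer for the incoming operand
            return i
    return None               # all buffers busy: instruction delivery delayed

buffers = [False, True, True]     # buffer 0 already in use
print(select_buffer(buffers))     # 1: first empty buffer, now reserved
```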
An interesting possibility that comes up here is to perform, via operand buffers, logically independent operations that are delayed because of the unavailability of addressable buffers. This again points to the possibility of abandoning addressable buffers completely,
and to leave the buffer organisation to the hardware,
using solely operand buffers.
After forwarding the instructions related to the
MPLY, the I-unit will bring the STRE instruction to
the predecoder, discovers that it is an instruction
to be executed by the LSMT-unit and sends it there.
The STRE instruction will be put into the LSMT instruction stack until the register AR2 receives the
result of the multiply instruction. Thus the sequence
of instructions listed above has been resolved into
the following form:
LOAD  AR1,MEM1
LOAD  EBUF,MEM2
MPLY  AR1,EBUF,AR2
STRE  AR2,MEM3
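The predecoder's decomposition can be sketched as a small function (our toy model of the step above, not the authors' hardware; EBUF stands for the operand buffer slot selected by the I-unit):

```python
# Resolve a register-memory instruction into a LOAD for the LSMT-unit
# plus a register-register instruction for the E-unit.
def predecode(op, operands):
    if op in ("LOAD", "STRE"):
        return [(op, operands)]           # LSMT-class: passed through as is
    src, mem, dst = operands              # register-memory E-class op
    return [("LOAD", ("EBUF", mem)),      # LSMT-unit fills the operand buffer
            (op, (src, "EBUF", dst))]     # E-unit gets the register-register form

program = [("LOAD", ("AR1", "MEM1")),
           ("MPLY", ("AR1", "MEM2", "AR2")),
           ("STRE", ("AR2", "MEM3"))]
resolved = [step for instr in program for step in predecode(*instr)]
for op, operands in resolved:
    print(op, ",".join(operands))
```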

We have pointed out very briefly quite a number of possibilities for how work can be split once the underlying principles have been recognized and established:
local store, functional specialisation, and functional
concurrence. The major concern in designing high speed
computers has been to increase the performance at minimum cost. The basis for this was to split the work for
parallel processing, leading to balanced, specialized
units with a high population of instructions executed. With the advanced technology of large-scale integration,
where 100's or 1000's of circuits can be placed on a


tiny chip, there is an enormous advantage of cost reduction in building identical parts. Therefore one
might expect that the design of future machines will
tend to introduce more identical units, rather than
functionally specialized units.
To conclude this chapter we will finally sketch
the basic design concepts of two high performance computer systems which are currently in use, the CDC 6600,
and the IBM 360/91.
The CDC 6600 has a control unit and 10 specialized
functional units: a branch unit, a Boolean unit, a
shift unit, a floating point add/subtract unit, two
floating-point/fixed-point multiply units, a divide unit,
two short-word fixed-increment units (used also for
indexing), and a fixed-point add/subtract unit. The
general structure of the machine is displayed in fig.
3.4.

Fig. 3.4: CDC 6600 structure: storage and control unit with the functional units: branch unit, increment units, Boolean unit, divide unit, shift unit, floating-point/fixed-point add/subtract unit, and floating-point/fixed-point multiply unit.


The short word add (increments) and branches are the only instructions that may have direct S-unit references. An interesting feature of this machine is
its use of registers for addressing and how it achieves independent operand fetches: The X-registers are
true 60 bit registers addressable by arithmetic and
logical instructions. Only the X-registers may be
addressed by these operations and therefore each arithmetic and logical operation depends on the presence of
operands in these registers. The A registers are 18 bit
registers, and there is a one to one correspondence
between the X and A registers. The B registers are
true index registers to be used in forming addresses.
A set of instructions is allowed to refer to the A
and B registers. The "increment" instruction is allowed
to refer to both registers. Whenever an address changes
in A, the corresponding X register is fetched from, or
stored in memory. Fetches occur if A1 to A5 is referenced, stores if A6 or A7 is referenced. All addresses
are relative addresses and are added to a relative
location register to form the true machine address.
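The fetch/store side effect of the A registers can be sketched as follows (our reconstruction of the convention just described; the relative addressing step is omitted, and the data structures are illustrative):

```python
# CDC 6600 address-register convention: placing an address in A1..A5
# fetches memory into the corresponding X register, while placing an
# address in A6 or A7 stores the corresponding X register into memory.
def set_A(i, addr, memory, X):
    if 1 <= i <= 5:
        X[i] = memory[addr]      # fetch side effect
    elif i in (6, 7):
        memory[addr] = X[i]      # store side effect

memory = {100: 42, 200: 0}
X = {i: 0 for i in range(8)}
set_A(1, 100, memory, X)         # A1 set: X1 loaded from address 100
X[6] = 7
set_A(6, 200, memory, X)         # A6 set: X6 stored to address 200
print(X[1], memory[200])         # 42 7
```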
The asynchrony of this machine is controlled by
the main processor. If the necessary subprocessor is
available to execute, the instruction is sent to the
subprocessor, otherwise it is held by the control unit.
The control unit itself has an 8 word instruction stack.
The subprocessors have no capability for instruction and operand staging (buffering). Successor instructions may be issued if appropriate units are available, in spite of instructions held by the control unit.
The impressive feature of this machine is the
breadth, allowing 10 independent instructions to proceed in parallel with decoding of another. The operational efficiency of the machine unfortunately is quite
sensitive to instruction sequencing and register assignment, and the programmer has to pay a lot of attention
to these considerations.
The IBM 360/91 (see fig. 3.5) is conceptually
simpler, and simpler to program efficiently. It has
4 floating point registers of 64 bits, and 16 general
purpose registers of 32 bits, available for fixed point
arithmetic operations and index manipulation. Memory
references are allowed on arithmetic instructions. The
instruction unit basically transforms such instructions
into a sequence of memory reference- and arithmetic
instructions.


Fig. 3.5: IBM 360/91 structure: instruction unit with instruction buffers (8), S-conflict buffers (4), store address buffers (3), and store data buffers (3), connected to the storage unit; fixed point operation unit with operation buffers (6), operand buffers (6), and registers (16); floating point operation unit with operation buffers (8), operand buffers (6), and registers (4).

The instruction unit has an 8 double word (64 byte) instruction fetch buffer. It is responsible for all
instruction fetches. Therefore there is a family of
store oriented buffers, in order to allow the instruction unit to issue instructions independently of the
response of the store. There are four storage conflict
buffers, and 3 address buffers. This allows the instruction unit to issue storage requests to busy units and
to bypass waits on store references. The data store
buffers allow functional units to free their local
storage and register without waiting for storage
access. The load operation and operand buffers allow the instruction unit to issue instructions (within limits) independently of the execution rate of the
functional units.
The IBM 360/91 achieves its speed by depth, while the CDC 6600 achieves it by breadth.


4. INSTRUCTION SCHEDULING AND RESOURCE ALLOCATION


After the preceding discussion of the concept of
present day computers, which are designed to support
parallel processing of logically independent tasks,
we turn now to the discussion of instruction scheduling and resource allocation. Instruction scheduling
has to be performed such that instructions are "submitted" for execution in a form that allows the computer
to make optimum use of its parallel capabilities. In
the last chapter we have "designed" a pseudomachine
which apart from its parallel capability is able to
bypass locked instructions, which means that logically
independent instructions can be processed ahead of
locked instructions. We found this look ahead feature
to be necessary, because we assumed by default that
instruction sequences may be submitted for execution
which are not very well suited for the actual machine
design. This look ahead capability is necessary in
particular because of the largely unpredictable contention of the S-unit for data transfer, independent
of how many simultaneously accessible S-units are
available.
Three different agents are directly responsible
for this instruction scheduling:
(1) The compiler which translates a sequence of
high level language statements into a machine executable sequence of instructions. Translation means in
this context the process of producing a more or less
optimum sequence of instructions according to the
actual features and capabilities offered by the computer, while preserving the logic of the program. (2) The
assembly language programmer himself, because assemblers perform only an interpretation of the instructions
written by the programmer, statement by statement, and
put them into the machine executable representation.
The register allocation is normally completely left to
the programmer. In the case of register allocation the
programmer is directly responsible for resource allocation, and in the case of instruction scheduling he
is indirectly responsible for resource allocation.
One could of course think of an assembler with an
optimization phase, assigning internal (true) registers
to the programmer specified registers in a way optimum
for the machine design. The main characteristics of
the source program and of the intention of the programmer
would still be preserved. (3) Finally the machine and


its actual configuration. We have seen that the processor of most present day computers has some flexibility to schedule logically independent instructions
out of sequence so as to avoid resource contention and
execution delay.
To some extent, there is a further agent which
is at least partly responsible for instruction scheduling: the high level language programmer. He has
obviously much less influence on the instruction scheduling than the assembly language programmer (who has
practically full control over it), in particular if
the compiler has an effective optimization phase. But
nevertheless, the programmer can write his high level
language statements in a way that supports the task of the optimization phase, because there are decisions
which a compiler logically cannot make and which thus
have to be made by the programmer himself, e.g., to
decide if a variable has to be stored in the S-unit,
or if it is just an intermediate result used only once
or a few times within a limited sequence of statements. In the latter case it would be sufficient and
preferable to keep the variable temporarily in a register or buffer. There are also optimization decisions
which classically are not thought to be tasks of the
optimization phase of a compiler. It is, for example,
well known that matrix arithmetic can be speeded up
by using code in which matrices are written in vector form, avoiding time-consuming index manipulations.
All the agents have the advantage that they "know"
a good deal about the design concept of the machine on
which the program is running, and thus can produce a
schedule of the instructions within the limits of logical independency that makes optimum use of the machine's hardware and organisational facilities. The
information most important for instruction scheduling on all levels is: the instruction interdependencies
(program logic), the type and number of resources
available, and the instruction (task) execution timings.
Unfortunately the timing of S-unit references is largely unknown, thus introducing an unpredictability into the instruction scheduling.
As an illustration we define a pseudomachine and
analyse how it executes codings with different instruction sequences of the same arithmetic sample equations:


It would be too complicated to analyse the execution
of the sample codings on a pseudomachine which has all
the advanced features characteristic of the
contemporary large scale computers discussed
previously, in particular the look ahead and bypass
techniques, which in some respects are introduced just
to smooth out the effect of non-optimum instruction
scheduling. Therefore we restrict the present
discussion to a much simpler pseudomachine that has
the basic features of present machines, just to give a
realistic picture of the impact of instruction
scheduling on the code execution.
The pseudomachine we will use is defined by the
following resources and characteristic features:
storage unit, storage interface register, instruction
unit, two fetch/store units, add/subtract unit,
multiply unit, divide unit, and seven registers.
This set up is shown in fig. 4.1.

Fig. 4.1: Pseudomachine resources: storage unit (SU),
storage interface register (SIR), instruction unit
(IU), fetch/store units (FSU), add/subtract unit
(ASU), multiply unit (MU), divide unit (DU), and
arithmetic registers (AR).


FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE

In addition, we have to make a number of postulates about how this machine works:


(a) Each instruction is spontaneously available from
the instruction stack and has a decode time of one
time unit, usually called one "cycle". This actually
means that we assume storage conflicts not to arise
on instruction fetching, and that we do not consider
buffer filling times to delay instruction availability.
(b) Data are available at the specified register 5
cycles after the operand address has been delivered
by the fetch/store unit to the storage interface register. This in fact means, that storage conflicts
are not taken into account in the present discussion
at all, that is, we are assuming practically a strongly
interleaved S-unit.
(c) Instructions can be passed from the instruction
unit to the selected functional unit only if this unit
is free. Otherwise the instruction unit is locked until
the instruction is delivered to the selected functional
unit. This actually means that the resources are
considered not to have instruction hold stacks, a
possible form of bypass technique. Results of
operations become available in the specified registers
one cycle before the unit becomes available again.
(d) The timings for instruction execution and the unit
busy times for this machine are listed in table 4.1.

Table 4.1: Pseudomachine instruction execution and
unit busy times.

INSTRUCTION     FUNCTIONAL   INSTRUCTION       UNIT BUSY
                UNIT         EXECUTION TIME    TIME
                             (CYCLES)          (CYCLES)
FETCH/STORE     FSU, SIR      4                 5
ADD/SUBTRACT    ASU           4                 5
MULTIPLY        MU           10                11
DIVIDE          DU           29                30


A number of rules are necessary for controlling
the register usage in order to guarantee that
instructions are executed in their logical order, that
is, as specified in the program. A register from which
an operand of an instruction is taken will be called a
source register, and a register in which the result is
to be put will be called a sink register. In this
context registers specified in FETCH instructions are
sink registers, and the registers specified in STORE
instructions are source registers. The following rules
are set up to control the register usage of our
pseudomachine:
(e) An instruction specifying a register as the sink
of an operation cannot be issued by the instruction
unit if the specified register is already in sink
mode. This rule enforces that operands (results) are
put into the register in the specified order. If it
were possible to issue one or more instructions
simultaneously that specify the same register as sink,
then the content of the register would be
unpredictable, because in the highly parallel machines
it is not foreseeable which instruction will be worked
down first. In some respect this rule serializes the
instruction processing with respect to result delivery.
(f) A register specified as source and sink in the
same instruction will be considered as a sink
register. This means that sink is the stronger mode.
(g) An instruction referencing a register as source
which is in sink mode, or referencing a register as
sink which is in source mode, will be forwarded to
the selected unit (if available). The register status
is set to sink/source, or source/sink, respectively,
and processing is delayed until the first mode is
removed, that is, until the first condition is
fulfilled.
(h) Registers specified as sink will be available for
references by other instructions after they have been
used once as source register, and registers specified
as source will be available for references by other
instructions (other than in source mode) one cycle
after instruction execution is started.
(i) Finally, fetches will not be allowed to be issued
on our pseudomachine as long as operand stores are in
progress. This, again, is necessary to guarantee a
well defined content of a storage location. This rule
could have been specified in a much more restricted
form, i.e. restricting only the access to the storage
location referenced by the store instruction.


This rather simple pseudomachine includes a number
of features of present day computers (especially of
the CDC 6600) most important for the understanding of
the performance of such machines. Other features
designed to decrease the influence of resource
contention (look ahead and bypass techniques), as well
as storage interleaving (instruction and operand
fetching), have not been included in this model.
Although vital for the performance of an individual
machine, they are not important for discussing optimum
instruction scheduling.
We will now consider a simple mathematical
operation and see how it is executed in the machine
depending on the actual coding:

X = A - B/C              (4.1a)
Y = X + D - E + F*G      (4.1b)

The coding of these expressions is given in table 4.2.


Assuming a completely sequencial machine, where an
instruction is issued only after the execution of the
previous instruction is finished, the total execution
time of this instruction sequence amounts to 142 cycles.
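On a strictly sequential machine the 142 cycles are just the sum of the per-instruction times of table 4.2; a one-line check (the timing list is taken from the table, the comments are ours):

```python
# Instruction times (cycles) of the 15 instructions of table 4.2:
# fetches and stores take 9 cycles, divide 30, multiply 11,
# and each add/subtract 5.
times = [9, 9, 9,      # FETCH A, B, C
         30,           # DIVIDE B/C
         5,            # SUBTRACT A - B/C
         9,            # STORE X
         9, 5,         # FETCH D, ADD X+D
         9, 5,         # FETCH E, SUBTRACT X+D-E
         9, 9,         # FETCH F, FETCH G
         11,           # MULTIPLY F*G
         5,            # ADD X+D-E+F*G
         9]            # STORE Y

# Sequential execution: no overlap, so the total is the plain sum.
total = sum(times)
print(total)  # 142
```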
The logical structure of these expressions, that
is the interdependencies of the operations, can best
be seen from the tree-diagram displayed in fig. 4.2.
Fig. 4.2: Tree diagram of the expressions
X = A - B/C (4.1a) and Y = X + D - E + F*G (4.1b).

Table 4.2: Instruction sequence for expressions (4.1)
and associated execution times on a "sequential"
machine.

 NO  OPERATION  OPERANDS    COMMENT(S)     INSTR.TIME  TOTAL TIME
                                           (CYCLES)    (CYCLES)
  1  FETCH      R1, A                          9            9
  2  FETCH      R2, B                          9           18
  3  FETCH      R3, C                          9           27
  4  DIVIDE     R2, R3, R2  B/C               30           57
  5  SUBTRACT   R1, R2, R6  A-B/C              5           62
  6  STORE      R6, X       X = A-B/C          9           71
  7  FETCH      R2, D                          9           80
  8  ADD        R6, R2, R1  X+D                5           85
  9  FETCH      R2, E                          9           94
 10  SUBTRACT   R1, R2, R1  X+D-E              5           99
 11  FETCH      R2, F                          9          108
 12  FETCH      R3, G                          9          117
 13  MULTIPLY   R2, R3, R2  F*G               11          128
 14  ADD        R1, R2, R6  X+D-E+F*G          5          133
 15  STORE      R6, Y       Y = X+D-E+F*G      9          142


Each node in this diagram represents an operation;
those marked with an operand represent the fetch or
store operation of that operand. Such a tree diagram
has a certain breadth and a depth. The breadth is the
number of nodes on a given level, and the depth is the
number of levels. The operation of each node depends
on the completion of all operations on the lower
levels which have branches to this node.
All operations on the same level can be executed in
parallel, independently of each other (if the number
of resources allows). Thus the breadth of each level
is a measure of the potential parallelism in the logic
of the expression. There is a qualitative relation
between the depth of the tree and the execution time
of the instruction set. But it is not possible to
conclude quantitatively from such a tree how much time
it will take to execute the instruction sequence. In
the present example, all operands could be fetched in
parallel, and the subexpressions B/C and F*G could be
formed in parallel, if the available resources allow.
All other operations have to be performed in sequence,
because each depends on the outcome of operations on a
lower level.
Assuming that no delays occur, we can determine
the minimum execution time for an instruction sequence
represented by the logic of the tree-diagram displayed
in fig. 4.2. This has been done graphically in
fig. 4.3.
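The "depth versus breadth" argument can be made concrete with a small critical-path calculation over the tree of fig. 4.2 (a sketch of ours, using the per-instruction times of table 4.2 and unlimited functional units; because this simplified model ignores the overlap of instruction issue with execution it does not reproduce the 58 cycles of fig. 4.3, but it shows how the longest dependence chain bounds the execution time):

```python
# Each node of the dependence tree: (name, latency_in_cycles, inputs).
# Latencies are the per-instruction times of table 4.2; an operation
# starts only when all of its inputs have finished.
def finish_time(node):
    name, latency, inputs = node
    start = max((finish_time(child) for child in inputs), default=0)
    return start + latency

fetch = lambda v: (f"FETCH {v}", 9, [])
div   = ("DIVIDE B/C", 30, [fetch("B"), fetch("C")])
sub_x = ("SUBTRACT A-B/C", 5, [fetch("A"), div])
add1  = ("ADD X+D", 5, [sub_x, fetch("D")])
sub2  = ("SUBTRACT -E", 5, [add1, fetch("E")])
mul   = ("MULTIPLY F*G", 11, [fetch("F"), fetch("G")])
add2  = ("ADD +F*G", 5, [sub2, mul])
store = ("STORE Y", 9, [add2])

print(finish_time(store))  # 68
```

Note how the multiply chain (finished after 20 cycles) hides completely under the divide chain, exactly the parallelism the breadth of the tree promises.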

Fig. 4.3: Minimum execution time for the tree of
fig. 4.2, plotted as functional units (fetch units,
add/subtract unit, multiply unit, divide unit) against
time in cycles. The operand fetches and the
subexpressions B/C and F*G overlap; the chain
1: A-B/C, 2: 1+D, 3: 2-E, 3+F*G follows.


Assuming that four fetch/store units are
available, the minimum execution time is determined to
be 58 cycles, that is, approximately half the time
needed on a sequential machine to execute the
instruction sequence of table 4.2. It should be
noticed, however, that the sequence of instructions
has been slightly changed compared to table 4.2 to
minimize the total execution time.
After the preceding discussion of the sequential
machine we are now going to analyse how the
pseudomachine defined above would execute the sequence
of instructions of table 4.2, and determine the total
execution time. The execution of this instruction
sequence is displayed in fig. 4.4. Such diagrams are
called Gantt charts. On the left hand side each
instruction is listed, together with the functional
unit to execute it. Horizontally, the time is plotted.
A solid line segment shows that the functional unit is
busy in that time interval. In the lower part of
fig. 4.4 the functional units and registers are
listed. Here, in addition, the status of the units and
registers as a function of time is shown.
From this diagram it is seen that the
pseudomachine needs 91 cycles to execute the given
instruction sequence. This is about 2/3 of the time
needed on a sequential machine to execute the same
instruction sequence. This is a rather disappointing
result: on a pseudomachine with highly parallel
capabilities using advanced organisational techniques
the execution time is only 1/3 shorter than on a
completely sequential machine, and is still 33 cycles
longer than the minimum execution time. This result
has to be analyzed more carefully.
There are 11 delays which arise during execution
of the instruction list on the pseudomachine. These
delays are mainly due to the following three reasons:
(1) waiting for operands (delays B',C,D,E',F',I,J, and
K); the longest delay is due to waiting for the result
of the divide operation B/C;
(2) waiting for units (delays A,F, and H); and
(3) register conflicts (B,E, and G). In particular we
would like to point in this context to the delays B
and E, which are sink/sink conflicts on register 2,
and to the delay E, which is a store/fetch conflict.
It is easily seen that there are enough functional
units available for parallel processing, as well as
enough registers that can be used for independent
instructions.

Fig. 4.4: Gantt chart of the execution of the
instruction sequence of table 4.2 on the pseudomachine
(instructions and functional units against time in
cycles, with the delays A-K marked; total execution
time 91 cycles).

The main reason for the delays is therefore a
non-optimum instruction sequence and a careless use of
registers.
To yield a better instruction sequence we have
rewritten the equations (4.1) in the following form:

X = B/C - A              (4.2a)
Y = F*G + D - E + X      (4.2b)

The tree-diagram for these equations is displayed
in fig. 4.5. It is immediately seen that this tree is
broader and lower than the tree of equations (4.1)
displayed in fig. 4.2.

Fig. 4.5: Tree diagram of the expressions
X = B/C - A (4.2a) and Y = F*G + D - E + X (4.2b).

The optimum instruction sequence that can be set
up for this tree-diagram, and the corresponding Gantt
chart displaying the execution of this instruction
sequence on the pseudomachine, are given in fig. 4.6.

Fig. 4.6: Optimum instruction sequence for the
tree-diagram of fig. 4.5, with the functional unit
assigned to each instruction (the operands B, C, F,
and G of the slow divide and multiply operations are
fetched first), and the corresponding Gantt chart of
its execution on the pseudomachine.


The most striking result is the decrease of the
total execution time of this instruction sequence to
54 cycles. That means that the pseudomachine can
execute this instruction sequence approximately twice
as fast as the instruction sequence of table 4.2. In
the following a short analysis of this instruction
sequence is given.
First it is recognized that both equations are
worked down in parallel. This allows a better
arrangement of the instructions than a separate
treatment of the two equations. It is easily seen that
all arithmetic operations of equation (4.2b) are
performed (except the addition of the variable X)
while the time consuming division B/C of equation
(4.2a) is executed. In particular the following holds:
the variables B, C, F, and G, which are operands of
the time consuming divide and multiply operations, are
loaded first into the registers R1 to R4. There is a
short delay in the fetch operation, but the time could
not be used for any other operation because no
operands are available. Then the divide instruction is
issued. Again, there is a delay in the execution, but
the operands of this instruction are the first to be
available. When the multiply instruction is started,
register 2 becomes available, and another fetch
operation is issued, to use the time until the
operands F and G are available to perform the multiply
operation. And so on. It can be stated as a rule that
operands should be fetched well ahead of the
instruction in which they are used, whenever possible.
This increases the probability that operands are
available whenever they are needed. The arithmetic
instructions are issued whenever the operands are
available. Obviously, even if this rule is followed
there are still conflicts between fetching operands
and arithmetic instructions, and delays cannot be
avoided completely.
In order to write an optimum code as in the last
example, a very good knowledge of the principles of
the computer is necessary, together with some
experience. Attempts have of course been made to
replace this intuitive code optimization by the
programmer. For this purpose optimizing phases have
been included in most present day compilers for higher
level languages. We will therefore discuss in the
following chapter the basic compiler principles and
analyse to what extent an optimum code can be
generated automatically.


5. BASIC COMPILER TECHNIQUES


Compilers are programs itself, which are incorporated into the machine to translate statements written
in some higher level language (like FORTRAN, ALGOL,
PL1, etc.) into sequences of machine executable instructions. Actually, the code generated by the compiler is
usually not directly executable. It contains already
the final instruction sequences and register assignments, but all references to external symbols (outside
the compiled modul) are still expressed symbolically,
and all addresses are specified relative to some basic
location of the code. The external references are resolved by another program which links all object codes
compiled at different times to build the final program
(linkage editor). The relative addresses are resolved
at the moment when the absolute position of the program
in the primary storage unit is known, that is at load
time (loader).
There are various compiler techniques and many
different approaches have been employed. We are
interested here only in the very basic ones, in the
context of the generation of optimum instruction
sequences for highly parallel central processors, and
we will mention more complex techniques only very
briefly if necessary.
The primary task of a compiler is to translate the
arithmetic expressions coded in the higher level
language from the infix form, common to most algebraic
languages, to the so-called postfix or (reverse)
Polish form.
In the well known infix form the operator is
placed between the associated operands (we restrict
ourselves here to binary operators, that is, to
operators with two operands). Expressions written in
the infix form are, on the other hand, sometimes
ambiguous. They get their unique meaning only by an
additional set of interpreting rules. Basically, these
rules determine the operator hierarchy and the
treatment of brackets. This makes the infix form
rather inconvenient for straight left to right
scanning during the process of code generation in the
compiler.
In the postfix form the operator is put behind the
two associated operands; this means the two operands
to the left of an operator are considered to be the
input for the operation (see the example of table 5.1).

Table 5.1: Translation from infix to postfix notation.

INFIX NOTATION:    X=A-B+C-D+E-F            (5.1)
POSTFIX NOTATION:  XAB-C+D-E+F-=

XAB-C+D-E+F-=     R1 = AB-     ->  XR1C+D-E+F-=
XR1C+D-E+F-=      R2 = R1C+    ->  XR2D-E+F-=
XR2D-E+F-=        R3 = R2D-    ->  XR3E+F-=
XR3E+F-=          R4 = R3E+    ->  XR4F-=
XR4F-=            R5 = R4F-    ->  XR5=

Obviously, operands may also be results of
previous operations, and may thus be represented
indirectly. Postfix expressions are always interpreted
from left to right, which is actually the only rule to
be followed. They are unique. This makes postfix
expressions very convenient for further manipulations
in the compiler.
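The left-to-right interpretation rule can be illustrated by a small evaluator (a sketch of ours, operating on numeric tokens rather than the symbolic operands of table 5.1):

```python
def eval_postfix(tokens):
    """Evaluate a postfix token list strictly left to right."""
    stack = []
    for t in tokens:
        if t in ('+', '-', '*', '/'):
            b = stack.pop()          # the two operands to the left of
            a = stack.pop()          # the operator are its input
            stack.append({'+': a + b, '-': a - b,
                          '*': a * b, '/': a / b}[t])
        else:
            stack.append(float(t))
    return stack.pop()

# A-B+C-D+E-F with A..F = 6,2,3,1,5,4, i.e. postfix AB-C+D-E+F-
print(eval_postfix(['6', '2', '-', '3', '+', '1', '-', '5', '+', '4', '-']))  # 7.0
```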
The translation of the infix form into the postfix
form thus consists in the interpretation and rewriting
of the infix form under the control of some
interpreting rules. Various methods have been
developed for this purpose. We will restrict ourselves
here to a very simple and straightforward technique
sufficient for our aims. It consists of a simple left
to right scan of the infix form, demonstrated for the
very simple expression X=A-B+C-D+E-F in table 5.2.
We will discuss only those points absolutely
necessary for the illustration. In particular we will
not discuss any of the problems connected with the
detection of syntax errors in the higher level
language expressions, with the variable length of
variable names, with the different types of variables
and operations (fixed point and floating point), and
with the

internal representation of variables in the program.
In all the following examples we assume correct syntax
and some unique type of variables and operations, and
restrict ourselves to the basic operators = (equal),
+ (plus), - (minus), * (multiply), and **
(exponentiate).

The lists necessary for generating the postfix
form by the selected technique are: (a) an input
string to hold the expression in infix form, (b) an
output string to hold the translated expression in
postfix form, and (c) a last-in-first-out (LIFO) stack
to hold operators temporarily. Characteristically,
entries are put into such a stack exclusively at the
top of the internal list, and they can also be removed
only from there, which is technically realized by a
pointer marking the top entry.
The scanning process from left to right is
performed such that: (a) if an operand is found, it is
directly moved to the output string, and (b) if an
operator is found, its hierarchy is compared to those
of the operators on the stack. So-called unstacking
rules now determine whether an operator has to be
unstacked or not, and whether the new operator joins
the stack. Obviously, these unstacking rules must
reflect the rules for interpreting the infix
expression, that is, they must be based on operator
hierarchies and bracket structures. In the expression
(5.1) we have only three operators: equal (=), plus
(+), and minus (-), where the two last ones are
considered to have the same priority. We assign to
each of these operators a hierarchy value: "equal" is
assigned the hierarchy value 0, and "plus" and "minus"
are assigned the hierarchy value +1. The following
unstacking rule is sufficient for the present purpose;
we will extend this rule when we proceed to more
complex examples:
Rule 1: An operator unstacks all operators in the
operator hold stack with equal or higher
priority.

It is easily realized that this rule decodes the
infix expression correctly: operations with operators
of higher priority are performed prior to operations
with operators of lower priority, and operations with
operators of equal priority are performed from left to
right.

Table 5.2: The generation of postfix form from the
infix expression (5.1) according to the unstacking
rule.

INPUT STRING     OPERATOR HOLD STACK  HIERARCHY  OUTPUT STRING
(INFIX FORM)     (LIFO STACK)         VALUE(S)   (POSTFIX FORM)
X=A-B+C-D+E-F    &                    -1
=A-B+C-D+E-F     &                    -1         X
A-B+C-D+E-F      = &                  0 -1       X
-B+C-D+E-F       = &                  0 -1       XA
B+C-D+E-F        - = &                1 0 -1     XA
+C-D+E-F         - = &                1 0 -1     XAB
C-D+E-F          + = &                1 0 -1     XAB-
-D+E-F           + = &                1 0 -1     XAB-C
D+E-F            - = &                1 0 -1     XAB-C+
+E-F             - = &                1 0 -1     XAB-C+D
E-F              + = &                1 0 -1     XAB-C+D-
-F               + = &                1 0 -1     XAB-C+D-E
F                - = &                1 0 -1     XAB-C+D-E+
                 - = &                1 0 -1     XAB-C+D-E+F
                 &                    -1         XAB-C+D-E+F-=


At the start of the scanning process we have the
following initial conditions: (a) The input list holds
the expression (5.1) in infix form. (b) The operator
hold stack contains the end-of-list indicator (&),
which has been assigned a hierarchy value of -1, that
is, a lower hierarchy value than any other operator.
(c) The output list is empty.
The first symbol in the input list is the operand
X. It is moved directly to the output list. The next
symbol analysed is the operator "=". Because the
operator hold stack holds no operator of equal or
higher priority, the operator joins the stack.
Then the operand A is found and moved directly to the
output string. This operand is followed by the operator
"-" which has a hierarchy value of +1. This value is
compared to the hierarchy of the last operator on the
stack, which is the operator "=" with hierarchy value 0.
According to the unstacking rule the operator "=" is
not unstacked, and the operator "-" joins the stack.
The next symbol analysed is the operand B, which again
is moved directly to the output string. This operand
is followed by the operator "+" with a hierarchy value
of +1. It is found that this hierarchy is equal to
that of the last operator on the stack, i.e. the
operator "-" must be unstacked and moved to the output
string. The operator "+" then joins the stack. The
next operand again is directly moved to the output
string, and the next symbol is the operator "+". A
comparison of the hierarchies of this operator and the
last on the stack shows that both are equal. Therefore
the operator on the stack is moved to the output
string and the new operator "+" joins the stack. The
translation of the rest of the infix expression
follows the same scheme. The final generated postfix
expression is found in the last line of table 5.2.
The generation of an actual sequence of machine
instructions from the postfix expression of table 5.2
is demonstrated now for a three address machine in
table 5.3. It has already been pointed out that the
postfix expression is scanned from left to right.
Operands are moved to another last-in-first-out stack,
the operand hold stack. Whenever an operator is encountered in the scan, the top two members of the
operand hold stack are used as its operands. The result
of the operation is assigned a symbol and this is put
on the operand hold stack as topmost member. The final
sequence of instructions is summarized at the end of
table 5.3.
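The procedure just described can be sketched as follows (an illustration of ours for a three address code generator; the mnemonics SUB, ADD, and STRE mirror those used in table 5.3):

```python
def generate_code(postfix):
    """Scan a postfix string left to right; operands go to the operand
    hold stack, each operator consumes the top two stack entries."""
    code, stack, next_reg = [], [], 0
    mnemonic = {'-': 'SUB', '+': 'ADD'}
    for ch in postfix:
        if ch in mnemonic:
            b, a = stack.pop(), stack.pop()
            next_reg += 1
            result = f"R{next_reg}"          # symbol assigned to the result
            code.append(f"{mnemonic[ch]} {a},{b},{result}")
            stack.append(result)             # result becomes topmost operand
        elif ch == '=':
            value, destination = stack.pop(), stack.pop()
            code.append(f"STRE {value},{destination}")
        else:
            stack.append(ch)
    return code

for line in generate_code("XAB-C+D-E+F-="):
    print(line)
# SUB A,B,R1
# ADD R1,C,R2
# SUB R2,D,R3
# ADD R3,E,R4
# SUB R4,F,R5
# STRE R5,X
```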

Table 5.3: The generation of an "instruction code"
from the postfix form of expression (5.1).

INPUT STRING      OPERAND HOLD   OUTPUT LIST
(POSTFIX FORM)    STACK          (GENERATED CODE)
XAB-C+D-E+F-=     &
AB-C+D-E+F-=      X &
B-C+D-E+F-=       A X &
-C+D-E+F-=        B A X &
C+D-E+F-=         R1 X &        SUB A,B,R1
+D-E+F-=          C R1 X &
D-E+F-=           R2 X &        ADD R1,C,R2
-E+F-=            D R2 X &
E+F-=             R3 X &        SUB R2,D,R3
+F-=              E R3 X &
F-=               R4 X &        ADD R3,E,R4
-=                F R4 X &
=                 R5 X &        SUB R4,F,R5
                  &             STRE R5,X

The complete generated code is:
SUB A,B,R1
ADD R1,C,R2
SUB R2,D,R3
ADD R3,E,R4
SUB R4,F,R5
STRE R5,X

We have described here a basic compiler activity,
in which a sequence of instructions is formed from an
expression coded in a high level language. We have not
discussed so far the optimization of this code.
Advanced present day compilers work in principle
in two steps: In the first step an intermediate form
is generated from the high level language expressions,
very much like the procedure described above. This
intermediate form is usually called the intermediate
language. It consists essentially of a list of triples
of the structure: operator, operand, operand. This
intermediate language allows for table representation
as well as for tree diagram representation. In the
second step a list of machine executable instructions
is formed from the intermediate language list.
Optimization takes place on both levels of processing.
The optimization performed in the first step, forming
the intermediate language, is usually called
conceptual level optimization, and the optimization
taking place in the final code generation is called
the machine specific level of optimization. On the
conceptual level the aim is to generate instruction
sequences with a minimum of interdependencies. On the
machine level, on the other hand, the aim is to find
an optimum instruction scheduling and to avoid
resource contention. The techniques used in the
optimization phases of modern compilers are highly
sophisticated. We will therefore consider here only
some basic principles.
A method to generate an instruction sequence
directly from the infix form of a high level language
expression is described in table 5.4 for our simple
example (5.1). It consists essentially of a
combination of the two steps pointed out in tables 5.2
and 5.3. In this method, an operator hold stack and an
operand hold stack are needed simultaneously. The
process of generating the instruction triples is
straightforward. The final list is given at the end of
table 5.4, and the corresponding tree representation
is displayed in fig. 5.1. It is recognized immediately
that the code represented by this narrow and high tree
diagram consists of a sequence of strongly dependent
instructions with nearly no possibility for parallel
processing. As has been explained earlier, the
generated code should, however, be represented by a
broad and low tree-diagram offering more possibilities
for parallel processing. For the present example (5.1)
we actually would like to have generated a code
represented by the tree-diagram of fig. 5.2.

Table 5.4: The direct generation of instruction
triples (intermediate language) for expression (5.1)
by a combined left to right scan with an operator hold
stack and an operand hold stack. The triples are
generated strictly in chain form, 1: -AB, 2: +1C,
3: -2D, 4: +3E, ..., each one using the result of its
predecessor as an operand.



Fig. 5.1: Tree diagram of X = A-B+C-D+E-F (5.1).

Fig. 5.2: Tree diagram of X = ((A-B)+(C-D))+(E-F)
(5.1a).


This tree diagram is a representation of the
expression:

X = ((A-B) + (C-D)) + (E-F)          (5.1a)

which is equivalent to expression (5.1) from the
mathematical point of view, and can be expected also
to be equivalent with respect to numerical accuracy
(rounding errors!) on most computers. In this
expression brackets have been used to define the order
of execution.
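That the regrouping leaves the value unchanged is easy to check numerically (a trivial sketch of ours; with the integer sample values the two forms agree exactly, while with floating point operands small rounding differences are possible in principle):

```python
def left_to_right(A, B, C, D, E, F):
    return A - B + C - D + E - F            # expression (5.1)

def bracketed(A, B, C, D, E, F):
    return ((A - B) + (C - D)) + (E - F)    # expression (5.1a)

sample = dict(A=10, B=4, C=3, D=2, E=8, F=5)
print(left_to_right(**sample), bracketed(**sample))  # 10 10
```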
There are several techniques for compiling such
expressions. As an illustration a simple method is
chosen here, which consists in taking care of the
bracket structure by increasing the hierarchy of each
operator by a multiple of the number of left (open)
brackets to its left hand side. The multiple must be
equal to the highest hierarchy value assigned to an
operator.
The generation of an intermediate language list
for the expression (5.1a) according to this procedure
is demonstrated in table 5.5. The resulting
instruction sequence, given at the end of the table,
can be represented by the tree diagram of fig. 5.2.
This shows that bracket structures are one possibility
to enforce an improved instruction scheduling.
Unfortunately, there is a tendency in designing modern
compilers to ignore bracket structures wherever
possible.
We are therefore interested in compilers, and
their optimizing phases, which directly generate from
any given high level language expression instruction
sequences that can be represented by low and broad
tree-diagrams, to make best use of the parallel
capabilities of the computer system. An analysis of
the process of code generation for the present
example, expression (5.1), shows that this can be
achieved in two ways (or by a combination of them):
(a) to delay the unstacking of every second operator
of equal hierarchy, or (b) to delay the unstacking of
equal hierarchy operators if the operands are results
of previous operations. This ensures a minimum
dependency of successive instructions on each other.
First we implement restriction (a) to our basic
unstacking rule, and apply the method then to the direct code generation from the sample expression (5.1).
Re 1: A switch (SWITCH1) is introduced to control unstacking. Initially the switch is set OFF. It is
turned ON if an operator unstacks an operator
of equal hierarchy, and it is turned OFF if an
operator of equal hierarchy finds the switch ON.

Table 5.5: The direct generation of instruction triples (intermediate language) for expression (5.1a). [Columns: INPUT LIST (infix expression in high level language), OPERATOR HOLD STACK, OPERATOR HIERARCHY VALUE, OPERAND HOLD STACK, OUTPUT LIST (instruction triples); the step-by-step trace is not recoverable from the scan.]


Unstacking of equal hierarchy operators is forbidden if the switch is ON.
The code generation for the sample expression is demonstrated in table 5.6 and is expected to be self-explanatory. The generated instruction sequence is shown at the end of the table. This instruction
sequence is represented by the tree diagram of fig.
5.2, which is equivalent to the one obtained from expression (5.1a). We have generated three instruction
triples depending on true operands only. These subexpressions represent the formation of the bracketed
differences of expression (5.1a). After the formation
of the first two independent instructions, an instruction is issued whose operands are the results of previous instructions in the sequence.
This order of instruction scheduling could have been
avoided by allowing each operator to unstack only one
operator of equal hierarchy. In this case the code
generated is represented by the following bracket structure of equation (5.1b):

X = (A-B) + ((C-D) + (E-F))                        (5.1b)

which is completely equivalent to expression (5.1a).


The same result could have been obtained by applying
the second restriction (b) to our unstacking rule:
Re 2: Each operator unstacks only operators of higher
hierarchy, or only one operator of equal hierarchy.
Then the operator enforcing the unstacking joins
the stack and a true-only mode is set (ON). In
this true-only mode (ON) only instruction
triples with true operands may be formed. The
operand hold stack may be searched for true operands within the range of equal hierarchy operators, if necessary. The true-only mode is cancelled (OFF) when the operator that has enforced
the mode to be ON is unstacked.
The implementation of this rule is very simple and consists in the introduction of another switch (SWITCH2)
which is turned ON and OFF according to the above rules
and controls the unstacking. The generation of intermediate language code under the control of this switch
is demonstrated for the expression (5.1) at the end of
table 5.6. It leads to the same sequence of instruction
triples, represented by expression (5.1b), as the
simpler rule of unstacking only one operator of the same
hierarchy at a time.

Table 5.6: The direct generation of instruction triples (intermediate language) for expression (5.1) under the control of SWITCH1 and SWITCH2, respectively. [Columns: INPUT LIST (infix expression in high level language), OPERATOR HOLD STACK, OPERATOR HIERARCHY VALUE, switch states S1 and S2, OPERAND HOLD STACK, OUTPUT LIST (instruction triples); the step-by-step trace is not recoverable from the scan.]
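The unstacking rules discussed above can be sketched in executable form. The fragment below is our reading of the scheme, not the authors' original code; the names gen_triples, use_switch1 and max_equal are invented here. Without any restriction the triples form the fully sequential left-to-right chain; with SWITCH1 the bracket structure of (5.1a) is obtained; additionally limiting each operator to one equal-hierarchy unstacking yields the structure of (5.1b).

```python
import math

HIER = {'=': 0, '+': 1, '-': 1, '*': 2, '/': 3}

def gen_triples(tokens, use_switch1=False, max_equal=math.inf):
    ops, vals, out = [], [], []        # operator/operand hold stacks, output
    switch1 = False                    # SWITCH1, initially OFF

    def emit():                        # unstack one operator -> one triple
        op = ops.pop()
        b, a = vals.pop(), vals.pop()
        out.append(f"{op}{a}{b}")
        vals.append(len(out))          # result operand = triple number

    for t in tokens:
        if t not in HIER:              # operand: push on operand hold stack
            vals.append(t)
            continue
        allow_equal, n_equal = True, 0
        if use_switch1 and ops and HIER[ops[-1]] == HIER[t] and switch1:
            switch1 = False            # equal operator finds the switch ON:
            allow_equal = False        # it turns OFF, no equal unstacking
        while ops and (HIER[ops[-1]] > HIER[t] or
                       (HIER[ops[-1]] == HIER[t] and allow_equal
                        and n_equal < max_equal)):
            if HIER[ops[-1]] == HIER[t]:
                n_equal += 1
                if use_switch1:
                    switch1 = True     # an equal operator was unstacked
            emit()
        ops.append(t)
    while ops:                         # end of statement: flush the stack
        emit()
    return out
```

Applied to expression (5.2), the plain rule already yields the triples /CD, *B1, *2E, +A3, -4F, since the higher hierarchy values of "*" and "/" force the delayed combination.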
So far we have been dealing only with a very
simple sample expression, containing only operators
of the same hierarchy. The simple unstacking rule and
its restrictions work, however, equally well for more
complicated expressions like

X = A + B*C/D*E - F                                (5.2)
The generation of instruction triples for this example
is demonstrated in table 5.7. The operators "*" and
"/" have been assigned a hierarchy value of +2 and +3,
respectively. The generation of the code is demonstrated both with and without using the restriction Re 2
to the unstacking rule. The two tree diagrams corresponding to the generated code are displayed in fig. 5.3
and fig. 5.4, respectively. As expected, the formation
of instruction triples with "result" operands is delayed longest under control of the unstacking restriction Re 2.
In our discussion of the generation of optimized intermediate language lists for expressions given
in a higher level language we have not yet considered
the problem of local optimization. In this context we
are particularly interested in the elimination of common
factors or of common subexpressions in larger, more
complex expressions, and in their influence on the execution
time. These questions are of particular interest to
the application programmer.
From the point of view of supporting parallel activities in highly parallel computer systems it may not
always be economical to eliminate common factors and
subexpressions completely. Sometimes this could make
the majority of operations in a sequence dependent on
the results of previous operations, and the total task
would be put into an inefficient sequence.
As an illustration we consider the expression

X = A + B*Z + C*Z**2 + D*Z**3                      (5.3)

which was first discussed by Hellermann. A completely factorized form of this expression is given
by the equation

X = A + Z*(B + Z*(C + Z*D)).                       (5.3a)

Fig. 5.3: Tree diagram for X = A+B*C/D*E-F (5.2), corresponding to the bracket structure X = ((C/D)*(B*E)) + (A-F).

Fig. 5.4: Tree diagram for X = A+B*C/D*E-F (5.2), corresponding to the bracket structure X = (A + ((B*(C/D))*E)) - F.

Table 5.7: The direct generation of instruction triples (intermediate language) for expression (5.2) without and under the control of SWITCH2, respectively. [Columns: INPUT LIST (infix expression in high level language), OPERATOR HOLD STACK, OPERATOR HIERARCHY VALUE, switch state S2, OPERAND HOLD STACK, OUTPUT LIST (instruction triples); the step-by-step trace is not recoverable from the scan.]

Another possibility of factorizing the expression (5.3)
is just to eliminate the common subexpression Z**2,
leading to the two equations

Z2 = Z*Z
X  = A + B*Z + C*Z2 + D*Z*Z2.                      (5.3b)

The number of multiplications in the three different
formulations of the same mathematical expression increases from equ. (5.3a) to (5.3b) and to (5.3). Because multiplication is a very time consuming
operation, one could intuitively - knowing nothing
about parallelism - conclude that the expression (5.3a)
is a better code than the other two.
The tree diagrams and the intermediate language
lists for the three expressions are given in figs. 5.5
to 5.7.
These tree diagrams show that the instruction
"structures" of the expressions (5.3) and (5.3b) are
much broader and slightly lower than the one for the
expression (5.3a). This means that the code generated
from the equations (5.3) and (5.3b) offers more possibilities for parallel processing than the code generated from the completely factorized expression (5.3a).
Actually, expression (5.3a) leads to a completely "sequential" instruction list without any possibility of
parallel processing.

Fig. 5.5: Tree diagram for X = A+B*Z+C*Z**2+D*Z**3 (5.3), with the instruction triples:

1  *  BZ
2  ** Z(2)
3  *  C2
4  ** Z(3)
5  *  D4
6  +  35
7  +  16
8  +  A7


Fig. 5.6: Tree diagram for X = A+Z*(B+Z*(C+Z*D)) (5.3a), with the instruction triples:

1  *  ZD
2  +  C1
3  *  Z2
4  +  B3
5  *  Z4
6  +  A5

Fig. 5.7: Tree diagram for Z2 = Z*Z, X = A+B*Z+C*Z2+D*Z*Z2 (5.3b), with the instruction triples:

1  *  ZZ
2  *  BZ
3  *  C1
4  *  DZ
5  *  41
6  +  35
7  +  26
8  +  A7


The minimum execution time for the three instruction lists depends of course strongly on the resources
which are available to the processor. Therefore the execution times of the three instruction sequences are determined in table 5.8 for two different configurations
of resources: for a system with 4 multiply units and
1 add unit, and for a system with 2 multiply units and
1 add unit. Only the execution times of the multiply
and add instructions have been taken into account, i.e.
instruction fetching, operand fetching and storing, and
storage conflicts have not been considered. In addition,
the actual processing time has been calculated and
listed, which means the time that the different resources actually work on the execution of the instruction sequences.
We notice that the expression (5.3) has the shortest minimum execution time on the machine configuration with 4 multiply units, and that the expression
(5.3b) has the shortest minimum execution time on the
machine with 2 multiply units, which can be considered
a realistic configuration. It is surprising that
the expression (5.3) is executed only slightly slower
than the completely factorized expression (5.3a) on a
machine with 2 multiply units, although the expression
(5.3) has twice as many multiplications as the expression (5.3a). For the expressions (5.3) and (5.3b) the
internal processing times are larger than that for the
expression (5.3a), which is to be expected, because
the expression (5.3a) has the smallest number of multiplications. Obviously, the completely factorized expression (5.3a) - which has the minimum number of multiplications - will be executed fastest on a computer
system with only 1 multiplication unit. Here we observe
the fact that although more actual work is done by the
resources, the minimum execution time is much shorter
than the internal processing time, because much work
(that is instruction execution) can be done in parallel.
This leads to the conclusion that the elimination
of common factors and subexpressions does not always
lead to a decrease in the execution time. Whether such an
elimination has some advantage or not depends strongly
on the independent functional units available in the computer system, and must be considered carefully by the
programmer. It must be stressed that so far no compiler
is available that transforms the three expressions
(5.3), (5.3a) and (5.3b) into each other in order to generate
a code optimally suited for execution on a computer
system with a given set of functional units.
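The timing comparison just described can be approximated with a small greedy list scheduler. This is a sketch under assumptions: triples are scheduled in their listed order (hand schedules such as those behind table 5.8 may do better), the latencies of 11 cycles per multiply and 4 cycles per add are the pseudomachine values, and all function and variable names are invented here.

```python
def list_schedule(triples, latency, units):
    # Greedy list scheduling in listed program order: operands that are
    # integers refer to the results of earlier triples; units[op] gives
    # the number of functional units able to execute operation `op`.
    ready = {}                            # triple number -> finish time
    free = {op: [0] * n for op, n in units.items()}  # per-unit free times
    for i, (op, a, b) in enumerate(triples, 1):
        deps_done = max(ready.get(a, 0), ready.get(b, 0))
        pool = free[op]
        k = pool.index(min(pool))         # earliest-free unit of this kind
        start = max(deps_done, pool[k])
        pool[k] = start + latency[op]
        ready[i] = pool[k]
    return max(ready.values())

# Instruction triples of figs. 5.6 and 5.7 (expressions (5.3a) and (5.3b))
T_53A = [('*', 'Z', 'D'), ('+', 'C', 1), ('*', 'Z', 2),
         ('+', 'B', 3), ('*', 'Z', 4), ('+', 'A', 5)]
T_53B = [('*', 'Z', 'Z'), ('*', 'B', 'Z'), ('*', 'C', 1), ('*', 'D', 'Z'),
         ('*', 4, 1), ('+', 3, 5), ('+', 2, 6), ('+', 'A', 7)]
```

Under these assumptions (5.3a) needs 45 cycles no matter how many multiply units are present, since its instruction list is completely sequential, while the broad tree of (5.3b) finishes in 34 cycles on 4 multiply units.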


Fig. 5.8: Gantt chart for the execution of the instruction list of table 5.10 on the pseudomachine:

FETCH R1,A
FETCH R2,B
MPLY  R1,R2,R6
FETCH R3,C
FETCH R4,D
MPLY  R3,R4,R7
FETCH R5,E
FETCH R1,F
SUB   R5,R1,R2
ADD   R2,R7,R7
ADD   R7,R6,R6
STRE  R6,X

[The timing bars (T/CYCLES axis, 10-40) are not recoverable from the scan.]

Fig. 5.9: Latest possible assignment for the instruction list of expression (5.4): the list is scanned backwards from a completion time of zero, and each instruction is given its latest possible initialisation time (values such as -45, -36, -34, -29, -25, -24 and -14 cycles appear in the scan). The resulting instruction list is:

FETCH R1,C
FETCH R2,D
FETCH R3,A
FETCH R4,B
MPLY  R1,R2,R6
FETCH R5,E
FETCH R1,F
MPLY  R3,R4,R7
SUB   R1,R5,R2
ADD   R2,R7,R7
ADD   R6,R7,R6
STRE  R6,X

Table 5.8: Minimum execution times (MRT) and "actual" processing times (APT), in cycles, for the expressions (5.3), (5.3a) and (5.3b) on pseudomachines with different numbers of functional units (4 MPLY units and 1 ADD unit, and 2 MPLY units and 1 ADD unit; 11 cycles per multiply operation, 4 cycles per add operation). [The individual table entries are not recoverable from the scan.]

So far we have discussed optimization on the conceptual level, that is the rearrangement of individual
operations in such a way as to ensure minimal interdependence between the instructions. Optimization on this
level leads to an intermediate language list which allows
a maximum rearrangement of subtasks for parallel processing.
We will turn now to a discussion of the optimization on the machine specific level, that is the rearrangement of independent subtasks in such a way as
to make most economic use of the available resources
(independent functional units) and to ensure parallel
processing whenever possible.
First we describe briefly the assignment of registers to instructions. Register optimization is a
very subtle task, and quite some literature exists
which is concerned with this problem. Again we restrict
ourselves here to a very simple approach which allows
us to demonstrate the main features. The pseudomachine is
defined to have 7 registers: 5 registers (R1-R5) for
operand fetching and 2 registers (R6, R7) to store results. Each set of registers is used cyclically. If no
result register is available we use one of the available operand registers (R1-R5) to store intermediate
results. It is expected that this cyclic use of registers increases on average the time before a register is re-used, and thus minimizes register conflicts.
Each register can be used again once it has been referenced as the source of an operand. In general, the appropriate rule will be more complex, because more than one
instruction can refer to the same operand. Then a counter has to be used to keep track of the number of source references to the register, and it becomes available for re-use when the counter is zero. We will
assume further that the registers available will be
sufficient to execute the given instruction sequence
without backing up register contents temporarily in
storage locations. In actual compilers, of course, care
has to be taken for such a backing up of registers.
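The cyclic register scheme just described can be sketched as follows. This is one plausible reading, not the authors' compiler: the fallback rule for borrowing an operand register, the printed operand order (table 5.10 prints the operands of the two ADDs in the opposite order), and all names are assumptions made here.

```python
OPNAME = {'*': 'MPLY', '/': 'DIV', '+': 'ADD', '-': 'SUB'}

def assign_registers(triples, dest):
    code = []
    next_op = 0                        # position in the cyclic R1..R5 use
    live = {}                          # triple number -> register holding it
    sourced = set()                    # registers already read as a source

    def next_operand_reg(need_sourced=False):
        nonlocal next_op
        while need_sourced and (next_op % 5) + 1 not in sourced:
            next_op += 1               # assumes a sourced register exists
        r = (next_op % 5) + 1
        next_op += 1
        return r

    def reg_of(x):                     # locate an operand in a register
        if isinstance(x, int):         # result of an earlier triple
            r = live.pop(x)
        else:                          # true operand: FETCH it first
            r = next_operand_reg()
            code.append(f"FETCH R{r},{x}")
        sourced.add(r)
        return r

    for n, (op, a, b) in enumerate(triples, 1):
        ra, rb = reg_of(a), reg_of(b)
        free = [r for r in (6, 7) if r not in live.values()]
        rd = free[0] if free else next_operand_reg(need_sourced=True)
        code.append(f"{OPNAME[op]} R{ra},R{rb},R{rd}")
        live[n] = rd
    code.append(f"STRE R{live[len(triples)]},{dest}")
    return code
```

Run on the triples *AB, *CD, -EF, +23, +14 of table 5.9, the sketch reproduces the register pattern of table 5.10, including the borrowing of R2 as a temporary result register for the SUB instruction.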
The register assignment, as well as the optimization on the machine specific level to be described,
will be demonstrated using the expression

X = A*B + C*D + E - F                              (5.4)

The intermediate language list (instruction triples)


generated for this expression is given in table 5.9.
The instruction triples are expanded in form of an instruction list such that each arithmetic instruction is
directly preceded by a FETCH for those true operands
which are involved in it. The register assignment
to the instructions of this sequence is demonstrated in
table 5.10. Registers R1 to R5 are used as operand registers, and R6 and R7 are assumed to be the result registers in our installation. A new operand register cycle has to be started at instruction 8, fetching the
operand F. There is no result register available when
the SUB instruction (9) is encoded, because both result registers R6 and R7 are in sink mode and have not
yet been referenced as a source of an operand. Therefore register R2 is taken temporarily as a result register to hold the result of the subtract operation.
The execution of the generated code in the pseudomachine defined previously, which has been extended by
one add/subtract unit, is displayed in form of a Gantt
chart in fig. 5.8. We notice that there are a number
of delays which possibly might have been avoided by a
different instruction sequence.
Two opposite strategies may be applied here to
improve the present instruction sequence. They can be
characterized as the latest possible assignment strategy
and the earliest possible assignment strategy.
In the latest possible assignment strategy an
instruction is issued at the latest possible time that
will not delay the next instruction that depends on its
result. The method can be illustrated using example (5.4).
We assume that the execution of the instruction list is
to be completed at some time zero. The instruction list
is then scanned backwards, and the latest possible initialisation times of the instructions are determined.
The execution times are chosen here to be those of our
pseudomachine. It is possible within this scheme to
take the machine configuration into account as well,
that means the number of independent functional units
available. In the present example this has been done
for the two multiply instructions using our pseudomachine with only one multiply unit. We see that the "first"
multiply instruction has to be shifted far ahead in order to get the "second" one started at its scheduled
time (-25). The approach and the generated instruction
list are displayed in fig. 5.9 and the corresponding
Gantt chart is shown in fig. 5.10.
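The backward scan of the latest possible assignment strategy can be sketched as below. The sketch ignores contention for functional units (which the text handles by shifting the first multiply instruction ahead); latency and succs are names invented here, and instructions are assumed to be listed in dependency order.

```python
def latest_start_times(latency, succs):
    # Backward scan: with the end of the list due at time zero, each
    # instruction gets the latest initialisation time that still lets
    # every instruction depending on its result start on schedule.
    # succs[i] lists the (later) instructions using instruction i's result.
    n = len(latency)
    start = [0] * n
    for i in reversed(range(n)):
        deadline = min((start[j] for j in succs[i]), default=0)
        start[i] = deadline - latency[i]
    return start
```

For a chain multiply-add-add with latencies 11, 4 and 4 cycles, the multiply must be initialised 19 cycles before the deadline, the adds 8 and 4 cycles before it.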


Table 5.9: Instruction triples (intermediate language) and "instruction list" for equation (5.4), X = A*B+C*D+E-F:

Instruction triples      "Instruction list"

*AB                      FETCH A
*CD                      FETCH B
-EF                      MPLY  A,B,R1
+23                      FETCH C
+14                      FETCH D
                         MPLY  C,D,R2
                         FETCH E
                         FETCH F
                         SUB   E,F,R3
                         ADD   R2,R3,R4
                         ADD   R1,R4,R5
                         STRE  R5,X

Table 5.10: Register assignment for the "instruction list" of expression (5.4):

 1  FETCH R1,A
 2  FETCH R2,B
 3  MPLY  R1,R2,R6
 4  FETCH R3,C
 5  FETCH R4,D
 6  MPLY  R3,R4,R7
 7  FETCH R5,E
 8  FETCH R1,F
 9  SUB   R5,R1,R2
10  ADD   R2,R7,R7
11  ADD   R7,R6,R6
12  STRE  R6,X

[The sink/source (K/S) occupation marks for registers R1-R7 are not recoverable from the scan.]

The instruction list generated avoids some of the
previous delays and thus executes the instruction sequence 4 cycles faster (~ 8 %) than in fig. 5.8. In
general, the latest possible assignment strategy is an
advisable technique, because it tends to balance resource utilisation by staggering instructions under an upper
time constraint.
The earliest possible assignment technique consists on the other hand in scheduling instructions as
soon as they become independent of preceding operations. An example of this method is the floating fetch
technique, where the fetch instructions are issued as
soon as operand registers are available. The idea of
this method is to increase the chance that operands
are available in the registers at the time they are
needed. The instruction list generated for the present
example using this technique is similar to that of
fig. 5.9, but with the second multiply and the fifth
fetch instruction interchanged. This does not lead to
any change in the execution time, as can be seen from
the Gantt chart of fig. 5.10. The earliest possible
assignment strategy tends, however, to lead to contentions for limited resources (in the above example for
the fetch units).
It is possible to combine both methods in such a
way as to "float" the fetches as before whenever operand
registers are available, but to issue arithmetic instructions whenever the necessary operands are available in
the registers. For the present example this combined
technique leads to the same instruction list as the
floating fetch technique alone.
Finally, we observe in our example that the execution of the expression (5.4) can be speeded up by
interchanging the operands of the two add operations, as
shown in fig. 5.11. Here we first add the results of the
first multiply and the subtract operation, and then
perform the final add. The execution of the expression
can thus be speeded up by another 4 cycles. This optimization cannot be achieved on the machine specific
level; it would have to be recognized by an optimization technique on the conceptual level.
In the present section some basic compiling and
optimization techniques have been described. The aim
has been to outline what optimization with regard to
parallel processing can be expected from a compiler,


and what optimization a compiler definitely cannot
perform. It has been pointed out that knowledge
of the basic compiler techniques can help in writing
optimum code for parallel processing even in
higher level languages, where the programmer has only
little influence on instruction scheduling.
Fig. 5.10: Gantt chart for the instruction list of fig. 5.9 (latest possible assignment). [The instruction timing bars are not recoverable from the scan.]

Fig. 5.11: Gantt chart for the instruction sequence of expression (5.4) with the operands of the two add operations interchanged. [The instruction list and timing bars are not recoverable from the scan.]


GEERD H. F. DIERCKSEN AND WOLFGANG P. KRAEMER

6. INPUT/OUTPUT OPERATIONS AND BUFFERING


We have already mentioned (cf. chapter 3) that
the separation between arithmetic and input/output
(I/O) operations has been the fundamental step towards parallel processing in modern computer systems.
Whereas in the early, completely sequential machines
the operations of an instruction sequence had to be
performed one after the other, because they were supervised and executed by the same central processing
unit, most present day computers allow for the parallel execution of independent operations. This leads
to a considerable reduction in the total elapsed time
and to an increased utilization of the independent
functional units. In the hypothetical case of completely overlapped arithmetic and I/O activities the
total elapsed time will be equal to the time which is
needed for the most time consuming operation. This
hypothetical limiting situation may in many programs
be reached if some care is taken in designing the program logic to allow for overlapped processing as much
as possible, and in the actual coding as well. Programs
in which the total elapsed time is defined by the CPU
time are called CPU bound, and programs in which the
total elapsed time is limited by the I/O time are
called I/O bound.
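The elapsed-time behaviour described above can be put into a one-line model (an idealized sketch; job_profile is a name invented here):

```python
def job_profile(cpu_s, io_s):
    # Idealized elapsed-time model: fully sequential execution adds the
    # CPU and I/O times, ideally overlapped execution hides the shorter
    # activity completely behind the longer one.
    return {
        "sequential": cpu_s + io_s,
        "overlapped": max(cpu_s, io_s),
        "bound": "CPU" if cpu_s >= io_s else "I/O",
    }
```

A job with 30 s of CPU time and 10 s of I/O time is CPU bound: sequential execution needs 40 s, ideally overlapped execution only 30 s.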
In general the ultimate goal of program design
and coding is to achieve a balance of CPU and I/O
timings within the same task. This cannot always be
realized if the problem itself does not allow for such
a balanced code. For such cases, multiprogramming, that
is the simultaneous execution of several tasks in the
machine, has been introduced to guarantee a balanced
usage of the various functional units in the computer.
Contemporary computer systems with I/O units separated from the central processor differ from each
other in how much "intelligence" is transferred to
these I/O units, and to what extent the I/O processes
are still controlled by the central processor. The
organization of input/output processing and the design of the necessary functional units have undergone rapid development within the last years. At an intermediate stage, the I/O functional units in a sense were
controlled via fixed length buffers. The central processor controlled the transfer of data from processor
storage to the buffer storage and vice versa. In particular, it was responsible for collecting the different data from their positions in core, and finally
to initialize the functional unit. The complete buffer area was then transferred to the external storage
under control of the I/O functional unit. Thus, the
central processor was still strongly involved in I/O
operations. There were a number of variations on this
approach, as for example the number of buffers supplied
or the technique of filling the buffer (scattered read
and write). But the fixed buffer length put quite some
restrictions on the data format on the external devices.
Present day I/O functional units are much more
advanced, that is, many more control functions have
been transferred from the CPU to the I/O functional
units. But still today the control logic differs considerably from manufacturer to manufacturer. The two
ends of the development are: highly general I/O processors that can serve many different devices, and
highly special I/O processors which are oriented to
serve one or a group of logically similar I/O devices.
We will discuss I/O processing using the IBM facilities as an example of a rather general and flexible system.
The input/output system basically consists of
channels, control units, and devices. The channel
provides paths for data and control signals from the
processor storage to the control unit and vice versa.
The channels vary in the following characteristic
features: (1) the speed, that is how many units of
data can be transferred in a time unit, (2) the transfer
width, that is how many units of data can be transferred simultaneously, and (3) the capability of
data transfer: simplex channels transfer data in one
direction, half-duplex channels in both directions sequentially,
and duplex channels in both directions simultaneously.
The use of a channel with a given speed characteristic is limited to devices which have equal
or smaller data speeds. Otherwise the channel would receive data from the control unit at a higher speed
than it can transfer them to core, and data would be
overlaid and lost. Specialized channels have been
developed for devices with a typically slow data rate,
like card readers, card punches, and line printers.
The characteristic of these multiplexor channels is
that they serve a number of devices at the same time,
by transferring data to and from the
devices in quantities of one or a few characters
rather than as a physical record (multiplex mode).
The control units include all those control
functions which are specific to a device or a class
of devices. This separation of control functions allows
channels to be designed which can be used for a number
of devices, and on the other hand a number of equal
(or similar) devices to be controlled by the same control unit
(hardware).
The devices traditionally are: card reader, card
punch, line printer, magnetic tapes, magnetic disks,
and magnetic drums. Within each class of devices, data
speed and capacity characteristics may vary considerably.
Highly advanced channels are processors which are
able to interpret and execute a limited instruction
set adequate and sufficient to control I/O operations.
The channel activity is started under control of the
central processor; basically, the address of the
entry point of a channel program resident in processor storage is passed. This channel program contains
all information to control I/O operations, like:
data length, data position in storage, device type
and unit, and data structure on the external device.
This program is processed by the channel, and it allows
a highly flexible data transfer completely under control of the channel. It is important in this context
to realize that the channel competes with the central
processor and other channels (if present) for access
to the primary storage unit. Data transfer (I/O operations) thus increases the contention for access to the
S-unit. The channel passes all information that is device specific to the control unit for interpretation
and execution. Very recent channels are able to serve
other I/O requests during the time a control unit is
busy with the execution of control functions that do
not involve data transfer. This capability is extremely important in the case of locating data positions
on disk devices, a process which on average takes
exceedingly long compared to the actual data transfer
(see below). After the I/O operation is completed or
aborted for some reason, the central processor is informed (interrupted). Extended error recovery is
usually available on all levels of processing.
FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE

195

Information is stored on magnetic tape as continuous areas of data and control information, called
physical records, separated by areas containing no information, called record gaps. To write and read information, the tape is transported over the write/
read head with constant speed. The time necessary
to write or read data is called the write/read time,
or data transfer time. The times necessary to accelerate
the tape to constant speed and to stop its movement
after the I/O operation are called the start and stop times.
It is actually because start and stop times are finite
that record gaps are necessary. The characteristic
feature of data stored on magnetic tape is that they
can be accessed only sequentially. The tape has to be
moved from its present position to the position of the
record to be accessed by passing over all records physically stored on the tape between the two positions.
Typical sizes of records and record gaps and typical timings
for read and write operations are given in table 6.1.
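The write/read time of a tape record follows directly from the quantities just defined. The helper below is a sketch; the parameter values in the example (1600 BPI density, 112.5 inch/s tape speed, 0.6 inch gap) are illustrative assumptions, not figures taken from table 6.1.

```python
def tape_record_time_ms(record_bytes, density_bpi, speed_ips,
                        gap_inch, start_stop_ms=0.0):
    # Time to read or write one physical record: the record plus its
    # inter-record gap pass the head at constant speed; the start/stop
    # time is added when the tape halts between records. All numbers
    # used with this function here are illustrative assumptions.
    length_inch = record_bytes / density_bpi + gap_inch
    return 1000.0 * length_inch / speed_ips + start_stop_ms
```

For a 7200-byte record under these assumed values the record occupies 4.5 inches of tape, and streaming it past the head takes roughly 45 ms; a finite start/stop time raises this further.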
A magnetic disk (pack) is built of a number of
disks mounted on the same axis. In operation, the
disk pack spins permanently with constant speed.
Above each disk surface, one (or a small number) of
write/read heads is positioned. Each write/read head
can access a circular trace on the spinning surface,
called a track; the collection of all tracks on different surfaces above and below each other is called
a cylinder. All write/read heads are mounted on the
same arm in such a way that data on the same cylinder
can be accessed without moving the arm. To access data
on different cylinders the disk arm usually has to perform a radial movement.
Data are stored on tracks; data exceeding one
track in length are stored on cylinders. The data are
stored on the disk in such a way that each track contains the same number of data. The characteristic
feature of data stored on disk is that each physical
record can be accessed directly. The write/read head
can be moved directly, according to control information,
into a position to access the specified record. This
may or may not (but usually will) involve a radial
movement of the disk arm. The average time needed for
this movement is called the average radial delay time.
The average time passing until the record to be accessed passes over the write/read head is called the
average rotational delay time. Typical data transfer
and average delay times are listed in table 6.1.
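The delay times just defined combine into a simple model of the average time needed to access one record directly on disk (a sketch; the function name and the parameter values used below are assumptions, not figures from table 6.1):

```python
def disk_access_time_ms(avg_seek_ms, rpm, record_bytes, track_bytes):
    # Average direct-access time for a disk record: average radial
    # (seek) delay + average rotational delay (half a revolution) +
    # the fraction of a revolution needed to pass the record under
    # the head. Parameter values in the examples are hypothetical.
    revolution_ms = 60000.0 / rpm
    return (avg_seek_ms                  # average radial delay
            + revolution_ms / 2.0        # average rotational delay
            + revolution_ms * record_bytes / track_bytes)
```

Note that at 2400 revolutions per minute one revolution takes 25 ms, so the average rotational delay alone is 12.5 ms; for a drum the radial term is simply zero.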

DEVICE             RECORD  RECORD  START/  AV.     AV.ROTAT.  DATA     AV.WRITE/
(MODEL)            LENGTH  GAP     STOP    ACCESS  DELAY      TRANSM.  READ
                   (inch)  LENGTH  TIME    TIME    TIME       TIME     TIME
                           (inch)  (ms)    (ms)    (ms)       (ms)     (ms)

MAGN.TAPE
IBM 2403/6
9 TRACK, 1600 BPI   4.6     0.6     5.3      -       -         40.8     46.1

MAGN.DRUM
IBM 2301             -       -       -       -       8.6        6.0     14.6

MAGN.DISK
IBM 2314 A1          -       -       -      60      12.5       23.1     95.6
IBM 3330             -       -       -      30       8.4        5.8     44.2

Table 6.1. Typical sizes of record and record-gap lengths, and typical timings
for write and read operations (for a record of 7200 bytes) for various devices.


FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE

197

The magnetic drum basically differs from the
magnetic disk, apart from the shape, in that there
is a separate write/read head for each track. That
means that on magnetic drums no average radial delay
time occurs.
A comparison of the sample timings listed in
table 6.1 leads to the following conclusion: because
of the rather large average delay times of the disk,
the total I/O time for records to be accessed sequentially
is shorter for data stored on magnetic tape than on disk.
We turn now to the techniques of achieving
parallel processing of independent compute (arithmetic) and I/O operations, including, as far as possible, parallel processing of independent I/O operations
themselves. Technically (hardware-wise) the possibility of
processing independent I/O operations in parallel depends
on the availability of different channels to process
the I/O requests in parallel.
The compute and I/O operations involve the manipulation of data. For different operations executed
in parallel, independent areas for data storage have
to be supplied in processor storage. Traditionally,
the storage area supplied for data processed in an I/O
operation is called buffer storage or buffer area.
A number of different techniques of organising
compute and I/O operations and the associated storage
requests have been developed, usually known as buffering
techniques. The different techniques basically differ in: (1) the number of buffers assigned,
and (2) the technique of assigning the buffers to the
independent tasks (operations).
The assignment of one buffer area to each data
set (data file) is called single buffering; the assignment of multiple buffers is called multiple buffering.
In particular, if two buffers are assigned to a data
set, this is called double buffering. Finally, the assignment of a variable number of buffers to one data set
is called dynamic buffering.
There are basically two techniques of assigning
the data areas to the independent compute and I/O
operations: fixed assignment and variable assignment.
In simple buffering a fixed area of core storage is
assigned to each parallel, independent task, and data

198

GEERD H. F. DIERCKSEN AND WOLFGANG P. KRAEMER

are transferred between the two areas on completion of both independent tasks. Normally, the completion of both tasks is controlled by the compute
task, but this is not a necessary condition. In
exchange buffering the assignment of the areas is
switched on completion of the independent tasks: the
buffer area becomes the compute data area and vice versa.
Obviously, dynamic buffering can only be realized
within the scheme of the exchange buffering technique.
A special form of exchange buffering is cyclic
buffering, where the compute data and buffer areas
are switched cyclically. The number of buffers to be
used and the buffering technique to be applied are
completely determined by the bounding characteristics
of the total elapsed time. We can distinguish between the
following cases:
(1) Compute bound problems, and
problems with balanced compute and I/O times:
Here a second subdivision is necessary.
(1a) The compute time of processing a data record
is always larger than the I/O time required
for the record: In such cases single buffering
is necessary and sufficient to achieve complete overlap of the operations. The assignment of more than one buffer is a waste of
processor storage and leads to no decrease
in total elapsed time.
(1b) The compute time of processing a record
varies and may be smaller than the I/O time
required for a record: In such cases multiple
buffering has to be used to achieve completely overlapped operations. The number of
buffers necessary to achieve complete overlap
is hard to predict exactly and depends on the
variations in compute time per record. In
general, the number of buffers has to be increased if the variations in compute processing times of different records increase. The
minimum number of buffers is two. The increase
of buffer size may be useful within certain
limits, but it becomes less effective if the
variations in compute time per record increase.
If input and output operations are necessary
for each processed record, then dynamical,
cyclic buffering is ideally suited and advised.

FUNDAMENTALS OF COMPUTER HARD- AND SOFTWARE

199

(2) I/O bound problems:
In these cases one buffer is always sufficient,
except perhaps in cases with very different
compute times per record. It should be stressed
in this context again, although obvious, that
buffering does not decrease the minimum elapsed
time, that is, the time of the slowest operation;
rather, it is an approach for smoothing asynchronous parallel operations. The minimum elapsed time
of I/O bound problems can only be decreased by decreasing the I/O time itself. For a given number
of data the pure write/read time cannot be decreased, but the start/stop times can be minimized.
In addition, the question should always be investigated whether it is possible to decrease the number of
data to be transferred, even at the cost of increased CPU time. It should always be realized
that I/O operations have their price, like CPU
operations.
The interested reader is referred, for further information on the questions dealt with in the present series of lectures, to:
D.E. Knuth, "The Art of Computer Programming", Volumes 1-3, Addison-Wesley Publishing Company, Reading (Mass.), 1969-73.

H. Lorin, "Parallelism in Hardware and Software", Prentice Hall Inc., Englewood Cliffs, 1972.

W.M. McKeeman, J.J. Horning, and D.B. Wortman, "A Compiler Generator", Prentice Hall Inc., Englewood Cliffs, 1970.

P. Wegner, "Programming Languages, Information Structures and Machine Organization", McGraw-Hill, London, 1971.

THE LOGIC OF SCF PROCEDURES

A. Veillard
C.N.R.S., Strasbourg (France)

INTRODUCTION
Since the Schrödinger equation cannot be solved exactly for
polyelectronic systems, one has to look for approximate solutions
of any desired accuracy. One way is the use of the variation method [1]. The Eckart theorem shows that any trial wavefunction \tilde\Psi
(which is normalizable) leads to a value of the energy \bar{E} which is
never lower than the true ground state energy E of the system:

H\Psi = E\Psi   (1)

\bar{E} = \frac{\langle\tilde\Psi|H|\tilde\Psi\rangle}{\langle\tilde\Psi|\tilde\Psi\rangle}   (2)

\bar{E} \geq E   (3)

This is the key to the power of the variation method of approximating solutions to the Schrödinger equation. One can choose
the "best" wavefunction from several alternatives on the basis of
the criterion of lowest energy (however, one should remember that
the energy can be an insensitive criterion with respect to a "best"
wavefunction for other physical properties).
SCF procedures have arisen from the use of a variety of approximate wavefunctions given as linear combinations of Slater
determinants (this ensures that the wavefunction is antisymmetric
with respect to exchange of any pair of electrons):

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 201-250.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


\Psi = \frac{1}{\sqrt{n!}}
\begin{vmatrix}
\Phi_1(1) & \Phi_1(2) & \cdots & \Phi_1(n) \\
\Phi_2(1) & \Phi_2(2) & \cdots & \Phi_2(n) \\
\vdots    &           &        & \vdots    \\
\Phi_n(1) & \Phi_n(2) & \cdots & \Phi_n(n)
\end{vmatrix}   (4)

or

\Psi(1,2,\ldots,n) = \left|\Phi_1(1)\,\Phi_2(2)\cdots\Phi_n(n)\right|   (5)

Each \Phi_i is a spin orbital.


For a closed-shell system of 2N electrons described by a single determinant of 2N spin-orbitals, built from N doubly occupied
spatial orbitals,

\Psi = \left|\varphi_1(1)\,\bar\varphi_1(2)\cdots\varphi_N(2N-1)\,\bar\varphi_N(2N)\right|   (6)

the use of the variational method to find the best possible orbitals \varphi leads to the Hartree-Fock SCF equations. If each M.O. \varphi is
expressed as a linear combination of basis functions (usually atomic orbitals, hence the name LCAO approximation), the Hartree-Fock (HF) equations turn into the Roothaan SCF equations [2].
Open-shell systems give rise to a number of difficulties.
For instance, in the case of two degenerate configurations such as

↑↓ ↑   or   ↑ ↑↓

the total energy of a pure-spin state cannot be identified with
the expectation value of a single Slater determinant. The most popular methods for handling open-shell systems are the Restricted
HF method and the Unrestricted HF method. In the RHF method, each
Slater determinant in the total wavefunction is built of doubly
occupied spatial orbitals describing the paired electrons. This is
a straightforward extension of the closed-shell case. In the UHF
or spin-polarized HF method, the trial wavefunction is a single
Slater determinant, but orbitals of opposite spin need not have
the same spatial functions:

\Psi = \left|\varphi_1(1)\alpha(1)\;\varphi_1'(2)\beta(2)\cdots\varphi_N(2N-1)\alpha(2N-1)\;\varphi_N'(2N)\beta(2N)\right|   (7)

SCF equations for open-shell systems have been given by Roothaan
[3] for the RHF method, and independently by Pople and Nesbet [4]
and Berthier [5] for the UHF method.


For a closed-shell system described by a linear combination
of Slater determinants, use of the variational method to find the
best possible orbitals leads to the MC-SCF methods. Special choices
in the expansion have first been proposed by Das and Wahl in the
OVC method [6] and by Veillard and Clementi in the MC-SCF method [7].

All the above methods lead to one or several pseudo-eigenvalue equations

F\varphi = \varepsilon\varphi \quad\text{or}\quad FC = \varepsilon SC   (8)

in the LCAO approximation, where \varphi = \chi C.
F is usually called the Fock operator or the Fock matrix, \varepsilon and \varphi
(or C) are the eigenvalues and eigenvectors, and S is the overlap
matrix. This is a pseudo-eigenvalue equation in the following
respects:
- it differs from a true eigenvalue equation FC = \varepsilon C
(the case S = 1);
- the F matrix is a function of the C matrix. Since F depends
on the solution of the problem, the problem has to be solved iteratively: given an assumed C, F is set up and FC = \varepsilon SC is solved.
Then, from the obtained solution C, F is recalculated, and so on.
Assuming that the process converges, after several iterations the
C obtained from the equation will differ inappreciably from the
one used in setting up F; the solution is then said to be self-consistent (or it is said that the electrons move in a self-consistent field) [8,9].
We shall deal first with the Roothaan SCF equations for closed-shell systems. This involves only one pseudo-eigenvalue equation,
which means that all the C vectors are obtained as eigenvectors of
only one eigenvalue equation. The open-shell case in the RHF method leads to one equation for the eigenvectors corresponding to
the doubly occupied orbitals, and one equation for the eigenvectors
corresponding to partly occupied orbitals. This makes things slightly more complicated. In the MC-SCF method, one may have as many
equations as orbitals, each orbital being an eigenvector of a
different Fock matrix.

Similar problems are encountered when trying to solve the various SCF equations in the most efficient way. Let us mention:
1) The problem of building in the most efficient way the F matrix
or matrices from the two-electron integrals, which can number up
to several millions and are necessarily held in a slow access store
(magnetic tape or disks);


2) Given the F matrix, what is the most efficient way to solve
the pseudo-eigenvalue equation FC = \varepsilon SC? This usually, though not necessarily, involves a transformation to an eigenvalue problem F'C' = \varepsilon' C';
3) Is it possible to speed up the convergence through the use of
extrapolation procedures, or to ensure SCF convergence? What should
be the criterion for self-consistency?
4) What is the benefit of using symmetry adapted functions instead
of basis functions?
5) Usually, only some of the eigenvectors obtained by solving the
eigenvalue equation are to be used in the build-up of the F matrix.
How do we ensure that we pick up the right vectors?
CHAPTER I. REVIEW OF THE CLOSED-SHELL CASE : THE HARTREE-FOCK AND ROOTHAAN SCF EQUATIONS

Only the important points and formulae in the derivation of
the SCF equations [2] are given here.
The Hamiltonian for a system of fixed nuclei (Born-Oppenheimer approximation) is (\mu and \nu denote the electrons, M and N the nuclei):

H = \sum_\nu \left(-\tfrac{1}{2}\Delta_\nu - \sum_N \frac{Z_N}{r_{N\nu}}\right) + \sum_{\mu<\nu} \frac{1}{r_{\mu\nu}} + \sum_{M<N} \frac{Z_M Z_N}{R_{MN}}   (9)

\Delta_\nu = \frac{\partial^2}{\partial x_\nu^2} + \frac{\partial^2}{\partial y_\nu^2} + \frac{\partial^2}{\partial z_\nu^2} \qquad (-\tfrac{1}{2}\Delta_\nu = \text{kinetic operator})   (10)

H = \sum_\nu H(\nu) + \sum_{\mu,\nu} H'(\mu,\nu)   (11,12)

\psi(1,2,\ldots,2N) = \frac{1}{\sqrt{(2N)!}}
\begin{vmatrix}
\phi_1(1)      & \phi_1(2)      & \cdots & \phi_1(2N)     \\
\bar\phi_1(1)  & \bar\phi_1(2)  & \cdots & \bar\phi_1(2N) \\
\vdots         &                &        & \vdots         \\
\bar\phi_N(1)  & \bar\phi_N(2)  & \cdots & \bar\phi_N(2N)
\end{vmatrix}   (13)

the total number of electrons being 2N.

\phi_{\lambda i} = \sum_{p=1}^{m} C_{\lambda ip}\,\chi_{\lambda p} = \chi C_{\lambda i}   (14)

\chi is a row vector, C a column vector; \lambda stands for an irreducible
representation of the molecular point group and m is the number of
basis functions.

E = \sum_{i=1}^{2N} I_i + \sum_{\substack{i,j=1 \\ i<j}}^{2N} J_{ij} - {\sum_{i,j}}' K_{ij}   (15)

The summation over J_{ij} is over each pair of spin-orbitals.
In the summation \Sigma', the summation runs only over spin-orbitals \phi_i and
\phi_j with the same spin \alpha or \beta.

I_i = \int \phi_i^*(\nu)\,H(\nu)\,\phi_i(\nu)\,d\tau_\nu = \int \varphi_i^*(\nu)\,H(\nu)\,\varphi_i(\nu)\,dv_\nu   (16)

with \phi_i a spin-orbital, \varphi_i a space orbital, and

d\tau = dv\,d\sigma   (17,18)

J_{ij} = \int \phi_i^*(\mu)\,\phi_j^*(\nu)\,H'(\mu,\nu)\,\phi_i(\mu)\,\phi_j(\nu)\,dv_\mu\,dv_\nu \quad\text{(coulomb integral)}   (19)

K_{ij} = \int \phi_i^*(\mu)\,\phi_j^*(\nu)\,H'(\mu,\nu)\,\phi_i(\nu)\,\phi_j(\mu)\,dv_\mu\,dv_\nu \quad\text{(exchange integral)}   (20)

In terms of the space orbitals \varphi_i (the summations now run over the space orbitals):

E = 2\sum_i I_i + \sum_{i,j}\left(2J_{ij} - K_{ij}\right)   (21)

With the LCAO expansion (14) this becomes

E = 2\sum_i \sum_{\lambda pq} C^*_{\lambda ip} C_{\lambda iq} H_{\lambda pq} + 2\sum_{\lambda,\mu}\sum_{i,j}\sum_{pqrs} C^*_{\lambda ip} C_{\lambda iq} C^*_{\mu jr} C_{\mu js}\left[\langle\lambda pq|\mu rs\rangle - \tfrac{1}{4}\left(\langle ps|qr\rangle + \langle pr|qs\rangle\right)\right]   (22)

= \sum_{\lambda pq} D_{\lambda pq} H_{\lambda pq} + \tfrac{1}{2}\sum_{\lambda pq}\sum_{\mu rs} D_{\lambda pq}\,\mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}   (23)

with

D_{\lambda pq} = 2\sum_i C^*_{\lambda ip} C_{\lambda iq}   (24)

\mathcal{P}_{\lambda pq,\mu rs} = \langle pq|rs\rangle - \tfrac{1}{4}\left[\langle ps|qr\rangle + \langle pr|qs\rangle\right]   (25)

\langle pq|rs\rangle = \int \chi_p(1)\chi_q(1)\,\frac{1}{r_{12}}\,\chi_r(2)\chi_s(2)\,dv_1\,dv_2, \qquad p,q \in \lambda, \quad r,s \in \mu   (26)

We require that the orbitals \varphi_i minimize the energy value.
This is obtained if the \varphi_i are such that \delta E = 0. The MOs \varphi_i have
to conform to the orthonormality conditions

\int \varphi_i^*(\mu)\,\varphi_j(\mu)\,dv_\mu = \delta_{ij}   (27)

\sum_{pq} C^*_{\lambda ip} S_{\lambda pq} C_{\lambda jq} = \delta_{ij} = C_i^{\dagger} S C_j   (28)

S_{\lambda pq} = \int \chi^*_{\lambda p}(\nu)\,\chi_{\lambda q}(\nu)\,dv_\nu   (29)

These orthonormality conditions are introduced through the use of
Lagrange multipliers \varepsilon_{ji}, leading to the SCF equations:

F\varphi_i = \sum_{j=1}^{N} \varphi_j\,\varepsilon_{ji}   (30)

\sum_q F_{\lambda pq} C_{\lambda iq} = \sum_j \sum_q S_{\lambda pq} C_{\lambda jq}\,\varepsilon_{ji}   (31)

F = H + \sum_{j=1}^{N}\left(2J_j - K_j\right)   (32)

F_{\lambda pq} = H_{\lambda pq} + \sum_{\mu}\sum_{rs} \mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}   (33)

J_i(\nu)f(\nu) = \left(\int \varphi_i^*(\mu)\,\frac{1}{r_{\mu\nu}}\,\varphi_i(\mu)\,dv_\mu\right) f(\nu) \quad\text{(Coulomb operator)}   (34)

K_i(\nu)f(\nu) = \left(\int \varphi_i^*(\mu)\,\frac{1}{r_{\mu\nu}}\,f(\mu)\,dv_\mu\right) \varphi_i(\nu) \quad\text{(Exchange operator)}   (35)

By subjecting the MOs \varphi to an appropriate unitary transformation, we can eliminate the off-diagonal Lagrangian multipliers.
We obtain the Hartree-Fock and Roothaan SCF equations:

F\varphi_i = \varepsilon_i\varphi_i   (39)

or

FC = \varepsilon SC   (40)

The M.O.'s \varphi or the vectors C are eigenfunctions of only one
operator or matrix.
From the above equations, it can be shown that

E = \sum_i \left(\varepsilon_i + I_i\right)   (41)

I_i = \sum_{\lambda pq} C^*_{\lambda ip} C_{\lambda iq} H_{\lambda pq}   (42)

\varepsilon_i = I_i + \sum_{j=1}^{N}\left(2J_{ij} - K_{ij}\right)   (43)

E = \sum_{\lambda pq} D_{\lambda pq} H_{\lambda pq} + \tfrac{1}{2}\sum_{\lambda pq}\sum_{\mu rs} D_{\lambda pq}\,\mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}   (44)

These relationships are satisfied only when self-consistency
has been reached.


CHAPTER II. THE LOGIC OF THE SCF CALCULATION FOR THE CLOSED-SHELL CASE

The SCF calculation proceeds in an iterative way. Each iteration includes essentially three steps:
- building the F matrix;
- solving the pseudo-eigenvalue equation FC = \varepsilon SC;
- recognizing whether self-consistency has been reached; if
not, one has to define a new set of trial vectors C to be used as
the input for the next iteration.
Each of these steps requires some auxiliary calculations like:
- the calculation of the density matrix D from a set of given vectors;
- the calculation of the total energy E at each iteration, simultaneously with the building of the Fock matrix;
- an orthogonalization process to ensure that the set of trial
vectors is orthonormalized.
We shall go through each of these three steps.
1 - Building the F matrix

This requires the density matrix D and either the 𝒫 supermatrix or the list of two-electron integrals <pq|rs>. Some molecular
programs generate a list of two-electron integrals (POLYATOM [10],
MUNICH [11], ASTERIX [12]) while some others (MOLECULE [13]) create
directly the supermatrix (i.e. a list of 𝒫 elements) without storing separately the two-electron integrals. We shall consider both cases.

A. From the D matrix and the 𝒫 supermatrix. We assume that
the basis set used is a basis of symmetry adapted functions belonging to the irreducible representations \lambda of the molecular
point group. Then the Fock matrix F is symmetry blocked*, consisting of one diagonal block for each representation \lambda = 1, 2, 3, \ldots

From the general expression

F_{\lambda pq} = H_{\lambda pq} + \sum_{\mu}\sum_{rs} \mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}   (45)

*If the basis set used is not a basis of symmetry orbitals but rather one of pure basis functions (for instance atomic functions,
either Slater functions or contracted Gaussian functions, centered
on each atom), then \lambda = 1.


F is found to be a symmetric square matrix:

F_{\lambda pq} = F_{\lambda qp}   (46)

(since \mathcal{P}_{\lambda pq,\mu rs} = \mathcal{P}_{\lambda qp,\mu rs} and H_{\lambda pq} = H_{\lambda qp}).

The closed-shell density matrix is defined as

D_{\lambda pq} = 2\sum_i C^*_{\lambda ip} C_{\lambda iq}   (47)

The index \lambda refers to the irreducible representations of the molecular point group. The index i refers to the occupied molecular
orbitals. The indices p and q refer to the basis functions. Clearly D_{\lambda pq} = D_{\lambda qp},
and we need to store in the computer only the lower half
of the square matrix (for instance p \geq q).
For example, in the case of two irreducible representations
with respectively three and two symmetry orbitals, we need to consider the following elements of the D matrix:

        \lambda q   11    12    13    24    25
\lambda p
   11              111
   12              121   122
   13              131   132   133
   24                                244
   25                                254   255

To use the computer storage in the most efficient way, these matrix
elements are stored not as a matrix but as a one-dimensional array
which is sometimes called a supervector. They will be stored in the
order 111 121 122 131 132 133 244 254 255.
The total number of matrix elements is

\sum_\lambda \frac{m_\lambda(m_\lambda+1)}{2}

with m_\lambda the number of symmetry functions in the
irreducible representation \lambda.

The algorithm for computing the density matrix can be written as:

Loop over \lambda
  Loop over p = 1, m_\lambda
    Loop over q = 1, p
      D_{\lambda pq} = 0
      Loop over i
        D_{\lambda pq} = D_{\lambda pq} + 2 C_{\lambda ip} C_{\lambda iq}

The C vectors to be used are either the trial vectors at the first
iteration, or the results of the previous iteration, or the results
of some extrapolation procedure (see below).


The next step is the contraction of the 𝒫 supermatrix with
the density matrix (this is sometimes said to produce the P matrix):

F_{\lambda pq} = H_{\lambda pq} + P_{\lambda pq} = H_{\lambda pq} + \sum_{\mu}\sum_{rs} \mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}   (48)

This is the time consuming operation in the building of the
F matrix. From the definition of the 𝒫 supermatrix

\mathcal{P}_{\lambda pq,\mu rs} = \langle pq|rs\rangle - \tfrac{1}{4}\left[\langle ps|qr\rangle + \langle pr|qs\rangle\right]   (49)

with the symmetry functions p and q belonging to the irreducible
representation \lambda, and r and s belonging to the representation \mu.
Clearly one has

\mathcal{P}_{\lambda pq,\mu rs} = \mathcal{P}_{\lambda qp,\mu rs} = \mathcal{P}_{\lambda pq,\mu sr} = \mathcal{P}_{\mu rs,\lambda pq}   (50)

So the supermatrix 𝒫 is symmetrical with respect to the exchanges
p \leftrightarrow q, r \leftrightarrow s, and \lambda pq \leftrightarrow \mu rs.

Only the distinct elements of 𝒫 with q \leq p, s \leq r, \mu \leq \lambda need to
be computed and stored. They are usually stored as a linear array
in the following order (labels \lambda pq / \mu rs):

111/111
121/111   121/121
122/111   122/121   122/122
131/111   ...
...
133/...   244/...   254/...   255/...

The algorithm for generating the 𝒫 supermatrix
may be written as:

Loop over \lambda
  Loop over p = 1, m_\lambda
    Loop over q = 1, p
      Loop over \mu = 1, \lambda
        Loop over r = 1, m_\mu    if \mu < \lambda
                 r = 1, p         if \mu = \lambda
          Loop over s = 1, r      if r < p
                   s = 1, q       if r = p
            \mathcal{P}_{\lambda pq,\mu rs}


The 𝒫 supermatrix elements are usually written in this order in a
file in slow store (disk or tape).

The algorithm for the contraction of the 𝒫 supermatrix with
the density matrix is then relatively straightforward. The matrix

P_{\lambda pq} = \sum_{\mu rs} \mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}

is computed in high speed store by reading the 𝒫 supermatrix in
sequential order from the slow store and by multiplying each element by the appropriate element of the density matrix (already in
high speed store):

Loop over \lambda
  Loop over p = 1, m_\lambda
    Loop over q = 1, p
      P_{\lambda pq} = 0

Loop over \lambda
  Loop over p = 1, m_\lambda
    Loop over q = 1, p
      Loop over \mu
        Loop over r     (limits as above)
          Loop over s
            Read \mathcal{P}_{\lambda pq,\mu rs}
            P_{\lambda pq} = P_{\lambda pq} + \mathcal{P}_{\lambda pq,\mu rs}\,D_{\mu rs}
            P_{\mu rs} = P_{\mu rs} + \mathcal{P}_{\lambda pq,\mu rs}\,D_{\lambda pq}

In order to have the correct contribution, it should be realized that each element \mathcal{P}_{\lambda pq,\mu rs} in the calculation of P_{\lambda pq} should
also account for the term \mathcal{P}_{\lambda pq,\mu sr}\,D_{\mu sr}. Since D_{\mu sr} = D_{\mu rs},
the easiest way to have the correct contribution when only nonredundant elements of the supermatrix are stored is to store the
off-diagonal elements of the density matrix after multiplication
by a factor of 2.


However, depending on the organization of the integral package, it is not always possible to generate the 𝒫 supermatrix in the
canonical order as defined by the above algorithm. Although this
could be done in a separate reordering step, it is probably
more economical to have the 𝒫 supermatrix elements written in an
arbitrary order. Then each supermatrix element must be assigned
two indices IJ and KL:


IJ = I(I-1)/2 + J
KL = K(K-1)/2 + L

This procedure is used in the MOLECULE program [13]. It
indeed allows one to eliminate all the 𝒫 supermatrix elements which are zero
or negligible, i.e. less than a given threshold (usually of the
order of 10^{-7}). In fact the construction of the Fock matrix may be
speeded up by this indexing, since the algorithm is now simply:

Loop over the 𝒫 supermatrix elements
  P(IJ) = P(IJ) + 𝒫(IJ,KL) * D(KL)
  P(KL) = P(KL) + 𝒫(IJ,KL) * D(IJ)

However, one now has to read from the slow store not only the
supermatrix elements but also the corresponding indices IJ and
KL. This may result in a significant increase of the I/O time whenever the number of
supermatrix elements is large.


B. From the D matrix and a list of two-electron integrals
<pq|rs>. Most molecular LCAO-MO-SCF programs create a list of two-electron integrals <pq|rs>. Some programs (POLYATOM, ASTERIX) take
advantage of the symmetry relationships between basis functions to
create a non-redundant symmetry adapted integral list (that list
of integrals has no integrals in it that are zero by symmetry and
will group together those integrals that are equal to within a
sign, so that only one member of the group needs evaluation [10]).
Then the two-electron integrals are computed and stored (in slow
store) in a random order. This implies that the integral <pq|rs>
has to be stored with its four corresponding indices P, Q, R and S.
Let us consider a two-electron integral <pq|rs> with
P > Q, R > S and PQ \geq RS, where

PQ = \tfrac{1}{2}P(P-1) + Q \qquad RS = \tfrac{1}{2}R(R-1) + S
We shall take advantage of the equalities

\langle pq|rs\rangle = \langle qp|rs\rangle = \langle pq|sr\rangle = \langle qp|sr\rangle = \langle rs|pq\rangle = \langle rs|qp\rangle = \langle sr|pq\rangle = \langle sr|qp\rangle   (54)

The general term of the F matrix is given by

F_{pq} = H_{pq} + G_{pq} = H_{pq} + \sum_{rs} D_{rs}\left[\langle pq|rs\rangle - \tfrac{1}{4}\langle pr|qs\rangle - \tfrac{1}{4}\langle ps|qr\rangle\right]   (55)

Let us list now all the contributions of a given <pq|rs>
integral to the lower half of the G matrix (q \leq p). This is found
most easily by using in the above expression

\mathcal{P}_{pq,rs} = \langle pq|rs\rangle - \tfrac{1}{2}\langle pr|qs\rangle   (56)

instead of

\mathcal{P}_{pq,rs} = \langle pq|rs\rangle - \tfrac{1}{4}\langle pr|qs\rangle - \tfrac{1}{4}\langle ps|qr\rangle   (57)

(these two definitions of \mathcal{P}_{pq,rs}
may be shown to be equivalent).

By considering all the possible permutations, one finds the
following contributions:

G_{pq} = G_{qp} :  \langle pq|rs\rangle D_{rs} + \langle pq|sr\rangle D_{sr}
G_{rs} = G_{sr} :  \langle rs|pq\rangle D_{pq} + \langle rs|qp\rangle D_{qp}
G_{ps} = G_{sp} :  -\tfrac{1}{2}\langle pq|sr\rangle D_{qr} - \tfrac{1}{2}\langle sr|pq\rangle D_{rq}
G_{qr} = G_{rq} :  -\tfrac{1}{2}\langle rs|qp\rangle D_{sp} - \tfrac{1}{2}\langle qp|rs\rangle D_{ps}
G_{pr} = G_{rp} :  -\tfrac{1}{2}\langle pq|rs\rangle D_{qs} - \tfrac{1}{2}\langle rs|pq\rangle D_{sq}
G_{qs} = G_{sq} :  -\tfrac{1}{2}\langle qp|sr\rangle D_{pr} - \tfrac{1}{2}\langle sr|qp\rangle D_{rp}   (58)

For the most general two-electron integral <pq|rs> with no coincidence among the four indices, these contributions reduce to

G_{pq} = G_{pq} + 2\langle pq|rs\rangle D_{rs}
G_{rs} = G_{rs} + 2\langle pq|rs\rangle D_{pq}
G_{ps} = G_{ps} - \tfrac{1}{2}\langle pq|rs\rangle D_{qr}
G_{qr} = G_{qr} - \tfrac{1}{2}\langle pq|rs\rangle D_{ps}
G_{pr} = G_{pr} - \tfrac{1}{2}\langle pq|rs\rangle D_{qs}
G_{qs} = G_{qs} - \tfrac{1}{2}\langle pq|rs\rangle D_{pr}   (59)

when the four indices satisfy the relationship P > Q > R > S
(see below for the case of other indicial relations).
The corresponding FORTRAN code may be written (labelling
the <pq|rs> integral as X):


X2 = X + X
X = 0.5*X
G(PQ) = G(PQ) + X2*D(RS)
G(RS) = G(RS) + X2*D(PQ)
G(PS) = G(PS) - X*D(QR)
G(QR) = G(QR) - X*D(PS)
G(PR) = G(PR) - X*D(QS)
G(QS) = G(QS) - X*D(PR)   (60)

The pair indices PQ, RS, etc. are defined by

PQ = \tfrac{1}{2}P(P-1) + Q \qquad (\text{with } P \geq Q)   (61)

The algorithm for the computation of the G matrix from a list
of two-electron integrals requires comparatively more operations
than the one which uses the 𝒫 supermatrix. The algorithm consists of
reading the entire two-electron integral file and, for each integral, adding its value with the appropriate coefficient to several
elements of the G or F matrix. Since the number of integrals to
be processed may be of the order of several millions, the elementary algorithm for each integral should be as efficient as possible.
Time saving is achieved by considering each half-matrix
A(I,J) as a linear array A(IJ), with IJ the pair index as defined
above (the fastest code for IJ obtains I(I-1)/2 from a precomputed
array and simply adds J). From Table 1, one may see that addressing
an element A(I,J) requires on the Univac 1108 one MSI instruction
(2.4 \mus) while addressing A(IJ) requires one LX and one SA instruction (0.75 + 0.75 \mus). Since the above algorithm includes the addressing of twelve matrix elements, the saving by working with a
linear array may be of the order of 10 \mus for each integral. If we
consider a list of 4·10^6 integrals, the saving in computer time
may amount to about 40 seconds of CPU time per iteration.
The typical algorithm for the general case P > Q > R > S is
given in Table 2 together with its transcription into machine
language. In order that this algorithm be as efficient as possible, the matrices D and G which are computed (D matrix) or used
(G matrix) in subroutines other than the one which builds the Fock
matrix have to be passed through a COMMON and not through the
use of a list of arguments (this would introduce some additional
instructions, as illustrated in Table 3). In Table 4 we have reported the CPU timing for one SCF iteration
- either using a two-index addressing for the matrix A(I,J), which
is passed as an argument of the subroutine;
- or using a one-index array A(IJ), which is transmitted through a
common.


      DIMENSION A(5,5),B(5,5)          DIMENSION ITABL(5),A(15),B(15)
      DATA I,J /4,2/               C   ITABL(I)=I*(I-1)/2
      A(I,J)=0.                        DATA ITABL /0,1,3,6,10/
      B(I,J)=0.                        DATA I,J /4,2/
      END                              IJ=ITABL(I)+J
                                       A(IJ)=0.
                                       B(IJ)=0.
                                       END

LMJ     X11,NINTR$                 LMJ     X11,NINTR$
+       0000,0                     +       0000,0
LA      AO,J                       LX      X1,I
MSI,XU  AO,5                       LA      AO,ITABL-1,X1
AA      AO,I                       AA      AO,J
LX      X1,AO                      SA      AO,IJ
SZ      A-6,X1                     LX      X2,IJ
SZ      B-6,X1                     SZ      A-1,X2
LMJ     X11,NSTOP$                 SZ      B-1,X2
+       (0050505050505)            LMJ     X11,NSTOP$
                                   +       (0050505050505)

Table 1. Fortran and machine code (on a Univac 1108) for addressing a matrix element A(I,J) (left) or an element of a linear
array A(IJ) (right).
C     IJKL  I>J>K>L
      TWOINT=TWOINO+TWOINO
      IJ=ITABL(I)+J
      IL=ITABL(I)+L
      IK=ITABL(I)+K
      JK=ITABL(J)+K
      JL=ITABL(J)+L
      KL=ITABL(K)+L
      F(KL)=F(KL)+P(IJ)*TWOINT
      F(IJ)=F(IJ)+P(KL)*TWOINT
      F(IL)=F(IL)+P(JK)*TWOIN1
      F(JK)=F(JK)+P(IL)*TWOIN1
      F(IK)=F(IK)+P(JL)*TWOIN1
      F(JL)=F(JL)+P(IK)*TWOIN1

LA    AO,TWOINO
FA    AO,TWOINO
SA    AO,TWOINT
LX    X2,I
LX    X6,J
LX    X4,K
LA    A2,ITABL-1,X2
AA    A2,J
SA    A2,IJ
LX    X5,IJ
LA    A4,ITABL-1,X2
AA    A4,L
SA    A4,IL
LX    X1,IL
LA    A6,ITABL-1,X2
AA    A6,K
SA    A6,IK
LX    X3,IK
LA    A8,ITABL-1,X6
AA    A8,K
SA    A8,JK
LX    X2,JK
LA    A10,ITABL-1,X6
AA    A10,L
SA    A10,JL
LX    X6,JL
LA    A12,ITABL-1,X4
AA    A12,L
SA    A12,KL
LX    X4,KL
LA    AO,P-1,X5
FM    AO,TWOINT
FA    AO,F-1,X4
SA    AO,F-1,X4
LA    A2,P-1,X4
FM    A2,TWOINT
FA    A2,F-1,X5
SA    A2,F-1,X5
LA    AO,P-1,X2
FM    AO,TWOIN1
FA    AO,F-1,X1
SA    AO,F-1,X1
LA    A2,P-1,X1
FM    A2,TWOIN1
FA    A2,F-1,X2
SA    A2,F-1,X2
LA    AO,P-1,X6
FM    AO,TWOIN1
FA    AO,F-1,X3
SA    AO,F-1,X3
LA    A2,P-1,X3
FM    A2,TWOIN1
FA    A2,F-1,X6
SA    A2,F-1,X6

Table 2. Algorithm for the computation of the G matrix for the
case P > Q > R > S: the Fortran algorithm and its transcription into
machine language (as given by the compiler).
Note: in the above algorithm, the notations F and P stand for the
matrices G and D of the text. A relatively large number of instructions may be cut down by writing this fraction of the program directly in machine language.

There is an appreciable saving of computer time (of the order of
10 %) in the last case.
The above method requires that different formulae be used
according to the nature of the equalities between the four indices
P, Q, R and S associated with a two-electron integral. If only the
half-matrices D and G are to be used, one has to consider fourteen
classes of integral labels, as defined in Table 5. Each class of
integral contributes to the Fock matrix in a different way. Using
this scheme, the reading-in of an integral involves taking a 14-way
branch by class according to a class index. This class index is
usually computed at the time of the integral computation and stored with each integral [10,12]. The most efficient FORTRAN code uses a computed GO TO, which requires only four instructions
(Table 6).
      SUBROUTINE AUX(A)                SUBROUTINE AUX
      DIMENSION A(3)                   COMMON A
      I=3                              DIMENSION A(3)
      A(I)=0.                          I=3
      RETURN                           A(I)=0.
      END                              RETURN
                                       END

LA,U    AO,*0,X11                  LA,XU   AO,3
A,XU    AO,-1                      SA      AO,I
SA      AO,NTEMP$                  LX      X1,I
LA,XU   AO,3                       SZ      A-1,X1
SA      AO,I
LX      X1,I
AX      X1,NTEMP$
SZ      0,X1

Table 3. Fortran and machine code (on a Univac 1108) for addressing
a matrix element which is passed either as a subroutine argument
(left) or through a common (right).

                                 NiN2        Al(H2O)4
Nb. of basis functions            43            65
Nb. of two-electron integrals   93582        1276823
CPU time   a)                   1'27"          4'27"
           b)                   1'21"          3'52"
           c)                   1'16"            -

a) Matrix element A(I,J) passed as an argument.
b) Matrix element A(IJ) passed as an argument.
c) Matrix element A(IJ) passed through a common.

Table 4. CPU time for one SCF iteration (on the Univac 1108).


I = J = K = L      IIII
I = K > J = L      IJIJ
I = J > K = L      IIKK
I = J = K > L      IIIL
I = J > K > L      IIKL
I > J = K = L      IJJJ
I > J > K = L      IJKK
I > K = L > J      IJKK
I > J = K > L      IJJL
I = K > J > L      IJIL
I > K > J = L      IJKJ
I > J > K > L      IJKL
I > K > J > L      IJKL
I > K > L > J      IJKL

Table 5. The fourteen classes of integral labels (the corresponding contributions to the Fock matrix may be found in
Refs. [10] and [12]).
GO TO (14,12,11,10,4,13,7,8,6,5,9,1,2,3),IC

LA      A2,IC
TLE,U   A2,15
SLJ     NERR2$
J       $+1,A2
J       14L
J       12L
J       11L
J       10L
J       4L
J       13L
J       7L
J       8L
J       6L
J       5L
J       9L
J       1L
J       2L
J       3L

Table 6. A computed GO TO statement and the machine code
(generated by the Fortran compiler on the Univac 1108) used for a
14-way branching.
For most of the integrals (P~Q~R~S), the processing of each
integral requires taking a 14 way branch by class and then accessing twelve elements from a pair of two dimensional arrays D and

218

A. VEILLARD

G, with six linear indices needed (designated IJ,KL,IK,JL,IL and


JK) (see Table 2). For each integral this process has to be repeated at each iteration. It has been proposed [14] to take all the
indexing calculations and branches by class outside the SCF process by
- storing the six indices needed together with the integral in
place of the four indices I,J,K,L ;
- the use of only one master formula to correctly describe the
contribution of any class of integral to the Fock matrix in place
of the above fourteen formulae ; this is achieved by scaling the
integral and using rescaling arrays (details will be found in

RefDY)

However, the benefit to be expected from this procedure probably depends on the specifications of the computer used. On the CDC 6600 computer for which it was originally designed, the 60-bit word length allows all six indices and the integral to be stored in two words. On the Univac 1108 with a 36-bit word, going from four to six indices would require three words instead of two, hence increasing the I/O time at each iteration.
Another way to eliminate the conditional GO TO statement for each integral is based on a physical separation of the integrals according to the fourteen categories above, by storing the integrals of different classes on different files (either disk or tape files or a combination of both). This can be done either at the time of the integral calculation or during a preprocessing step before entering the SCF cycles. Each file is then processed by a different section of code, likewise relieving the need for branching. This method also eliminates the need for storing a code number with each integral. It has been implemented in an early version of the ASTERIX program [15] and proposed independently by Billingsley [16]. However, the GO TO statement is relatively inexpensive (four machine instructions versus a total of about 50 machine instructions (Table 2) for the contribution of each integral to the G matrix). Hence the method is worthwhile only if the different files are created at the time of the integral calculation and not during an additional step after the integral step (since that step would require some additional I/O time, hence offsetting the benefit expected during the SCF calculation). For large calculations with a list of up to several million integrals, this can be achieved easily only if the computing installation has one fast disk with a large storage capacity.
However, none of the above approaches introduces any change in the set of equations (58), and in the end the integral processing time is governed by these equations.


C. Creating a P supermatrix file from a random two-electron integrals file.
Since the processing time for eq. (51) (using the P supermatrix) should be much smaller than for eq. (58) using the two-electron integrals (by a factor of about three), it may be more efficient to form the supermatrix P from the two-electron integrals prior to the SCF iterations. A procedure for preprocessing a random list of two-electron integrals in order to set up a list of P elements has been proposed by Raffenetti [17], and we shall describe it briefly.
In order to form the P supermatrix, each integral of the set (ij|kl), (il|jk) and (ik|jl) must be readily available at the same time. Since these two-electron integrals are usually computed without any particular ordering, some procedure for bringing a list of randomly ordered integrals into a specific order is needed. Such a procedure has been proposed by Yoshimine [18], based on the canonical order

I ≥ J ,   K ≥ L ,   IJ ≥ KL

The procedure consists of two distinct sort steps. In the first step each integral is ranked according to the value of its canonical index IJKL

IJKL = IJ(IJ-1)/2 + KL    if IJ ≥ KL
IJKL = KL(KL-1)/2 + IJ    if KL > IJ                    (62)

and placed in one of several categories. The size of each index range (from IJKL_n^min to IJKL_n^max) defining the nth category is determined by the maximum size of the central memory area available in the second step.
For instance, let us assume that we have a file of integrals on tape with the index IJKL running from 1 to 10^6. The maximum size of the central memory area in the second step is 5·10^4 words. Each integral is placed into one of 20 files according to the value of IJKL, the first file corresponding to 1 ≤ IJKL ≤ 5·10^4, the second file to 5·10^4 < IJKL ≤ 10^5, and so on. This is achieved (Fig. 1) by reading successively the integrals into core, where they are collected by category. They are then written into one of the twenty files on a random access device (disk) for later retrieval in the second step.
In the second step, each file for a given category is read into core and the integrals are retrieved and put into order in the sorting area according to the following algorithm:
- each integral of the category is processed in turn;
- from the four indices I, J, K, L the index IJKL is computed according to

IJ = ITAB(I) + J
KL = ITAB(K) + L
IJKL = ITAB(IJ) + KL

(IJKL was computed in the first step; one avoids recomputing it by storing it with the integral);
- the integral is stored in the sorting area according to the value of IJKL.
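The index arithmetic above can be sketched in a few lines (an illustration assuming one-based indices and a table ITAB(n) = n(n-1)/2, which is what the lookup table replaces):

```python
# Illustrative sketch of eq. (62); ITAB(n) = n(n-1)/2 is the lookup table
# used in place of repeated multiplications.
def itab(n):
    return n * (n - 1) // 2

def canonical_index(i, j, k, l):
    """Compound canonical index IJKL of an integral label."""
    ij = itab(max(i, j)) + min(i, j)        # IJ = ITAB(I) + J with I >= J
    kl = itab(max(k, l)) + min(k, l)        # KL = ITAB(K) + L with K >= L
    return itab(max(ij, kl)) + min(ij, kl)  # eq. (62)

print(canonical_index(1, 1, 1, 1))   # 1
print(canonical_index(2, 1, 2, 1))   # 3
```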
[Fig. 1 shows the data flow of the first step: the integral file on tape (IJKL from 1 to 10^6) is read through an integral buffer in core; the integrals are collected in 20 in-core buffers, one per category, and written out to 20 files on disk (size of each file 5·10^4): FILE 1 for 1 ≤ IJKL ≤ 5·10^4, FILE 2 for 5·10^4 < IJKL ≤ 10^5, ..., FILE 20 for 9.5·10^5 < IJKL ≤ 10^6.]

Fig. 1. First step for bringing a list of randomly ordered integrals into a specific order.
Each category, having been ordered in this way, is transferred in turn to a sequential file for future use.
This procedure is efficient since there are a maximum of two reads and two writes for each integral.
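The two steps can be illustrated by a toy in-memory sketch (hypothetical Python; the disk files are replaced by lists, and the core sorting area by a fixed-size array):

```python
# Toy sketch of Yoshimine's two-step sort: scatter integrals into coarse
# buckets (the disk "files"), then order each bucket inside a core area.
import random

NMAX = 100    # largest canonical index on the "tape"
CORE = 20     # size of the in-core sorting area -> 5 buckets
buckets = [[] for _ in range(NMAX // CORE)]

# A "tape" of 50 integrals with random canonical indices and dummy values.
tape = [(ijkl, 0.1 * ijkl) for ijkl in random.sample(range(1, NMAX + 1), 50)]

# First step: one read of the tape, one write of each integral to its bucket.
for ijkl, value in tape:
    buckets[(ijkl - 1) // CORE].append((ijkl, value))

# Second step: each bucket fits in core; place integrals by index.
ordered = []
for bucket in buckets:
    core = [None] * CORE                  # the in-core sorting area
    for ijkl, value in bucket:
        core[(ijkl - 1) % CORE] = (ijkl, value)
    ordered.extend(x for x in core if x is not None)

assert [x[0] for x in ordered] == sorted(ix for ix, _ in tape)
```

Each integral is indeed read twice and written twice, regardless of how disordered the input list is.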
In fact each integral must be ordered according to its three possible contributions to the P supermatrix. Each integral is then categorized according to IJKL, IKJL and ILJK. It can be written either in three different files on the random access device or in only one file, provided that the integrals entering as (IJKL) and those entering as (IKJL and ILJK) are kept distinct (this can be achieved by setting some low-order bit of the integral to zero or one). In the second stage of sorting, since three integrals will in general contribute to the same location, each must be added to the contents of that location, with the corresponding array set to zero at the start of this stage.
Some comparative timings for the SCF iteration, either from the integral file or from the P file, have been given by Raffenetti [17]. The benefit of preprocessing the integral file is probably not decisive when only one electronic state is considered. It may be worthwhile when several electronic states have to be computed. Since most excited states have open-shell configurations, the SCF treatment will then require two supermatrices P and Q (cf. below). Preprocessing of the integral file to form the P and Q files is a straightforward extension of the above procedure and has also been described by Raffenetti [17].

2 - Solving the Roothaan equation FC = ESC.


Two techniques are currently used for the calculation of the eigenvalues and eigenvectors of the Roothaan equation. The first one transforms the problem into a standard eigenvalue problem F'C' = EC' by subjecting the basis set to a Lowdin orthogonalization [19]. One of the many techniques for the calculation of eigenvalues and eigenvectors of real symmetric matrices is then used. The second technique, sometimes called the Single Vector Diagonalization (SVD) [20], is an iterative method by which, from a trial eigenvector for the equation FC = ESC, a more accurate one is computed. Both methods have been found useful. Turning to the standard eigenvalue problem is probably the easiest way for the closed-shell problem. The SVD technique has some interest when only a few eigenvectors are wanted (atomic calculations) or when one has to be sure to pick the right eigenvector out of many possible solutions. In this case, starting with some trial vector which represents a decent approximation to the solution ensures the right choice among the many solutions. This has been found useful for the open-shell problem and the multiconfiguration problem.
A. Through the standard eigenvalue problem F'C' = EC'.
a) The Lowdin orthogonalization.
Given the problem in the matrix form

FC = ESC                                                (64)

one has to find C and E. We first find a unitary matrix t such that

t+ S t = s                                              (65)

with s a diagonal matrix (s and t are the eigenvalues and eigenvectors of the overlap matrix S).


Then the diagonal matrix s^(-1/2) is set up and one constructs the matrix

V = s^(-1/2) t+                                         (66)

We rewrite (64) as

VFV+ (V+)^(-1) C = E VSV+ (V+)^(-1) C                   (67)

or

F'C' = EC'                                              (68)

with

F' = VFV+                                               (69)

C' = (V+)^(-1) C                                        (70)

since

VSV+ = s^(-1/2) t+ S t s^(-1/2) = s^(-1/2) s s^(-1/2) = 1    (71)

Given the overlap matrix S, one has to solve for its eigenvectors t and eigenvalues s. From these two matrices one computes the V matrix and the F' matrix. One finds C' by solving the eigenvalue problem (68) and one finds C as

C = V+ C'
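The sequence (64)-(71) can be checked with a small numpy sketch (an illustration only; the 4x4 "overlap" and "Fock" matrices below are invented):

```python
# Canonical-orthogonalization route for FC = ESC:
# diagonalize S, form V = s^(-1/2) t+, solve F'C' = EC', back-transform.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
S = X @ X.T + 4 * np.eye(4)      # a positive-definite "overlap" matrix
F = X + X.T                      # a symmetric "Fock" matrix

s, t = np.linalg.eigh(S)         # t+ S t = s           (eq. 65)
V = np.diag(s ** -0.5) @ t.T     # V = s^(-1/2) t+      (eq. 66)
Fp = V @ F @ V.T                 # F' = V F V+          (eq. 69)
E, Cp = np.linalg.eigh(Fp)       # F'C' = EC'           (eq. 68)
C = V.T @ Cp                     # C = V+ C'            (eq. 70)

# C solves the generalized problem FC = ESC and is S-orthonormal:
assert np.allclose(F @ C, S @ C @ np.diag(E))
assert np.allclose(C.T @ S @ C, np.eye(4))
```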

The storage requirements for this sequence of operations


consist of two square matrices NxN and two triangular arrays of
N(N+1)/2 elements if S is to be saved. Usually the V matrix is
saved either in slow store or in fast core from one iteration to
another. Through the use of the slow store and at the cost of a
slightly slower processing, the storage requirements can be
brought down to two square matrices. For instance, an SCF package
for closed-shell systems with two square matrices 120x120 can be
fitted in less than 60 K words (words of 36 bits) of the Univac
1108.
At this stage, it is possible to introduce the molecular symmetry into the SCF procedure [10], after building the F matrix over a basis set χ which is not symmetry adapted (namely, whenever the two-electron integrals are over the basis functions χ, not over symmetry adapted functions).
A set of nonorthonormal symmetry functions σ is defined through

σ_r = Σ_p Y_rp χ_p    or    σ = Y χ

The corresponding overlap matrix S'

S' = σσ+ = Y χχ+ Y+ = Y S Y+

is symmetry blocked. It is then diagonalized by block

u+ S' u = g

with g a diagonal matrix, and one sets

V = g^(-1/2) u+ Y

The rows of V represent functions which transform as the irreducible representations and are orthonormal, since

VSV+ = g^(-1/2) u+ Y S Y+ u g^(-1/2) = g^(-1/2) u+ S' u g^(-1/2) = 1
Then one sets

F' = VFV+ = g^(-1/2) u+ Y F Y+ u g^(-1/2)

and one diagonalizes F' by block

W+ F' W = ε

One finds C as

C = V+ W = Y+ u g^(-1/2) W                              (80)

One will notice that the requirements that the rows of V transform as the irreducible representations and are orthonormal are also met by any molecular orbital coefficient matrix C solution of (64).
b) The Jacobi diagonalization method.
The Jacobi diagonalization method [21] has been favored by many people for real symmetric matrices. If A is the matrix to be diagonalized, it involves a sequence of transformations T^(-1) A T which leave the eigenvalues unchanged, the final result of the transformations being a diagonal matrix. Each transformation, denoted T_pq, is equivalent to a 2x2 rotation: T_pq is the unit matrix except for the four elements

(T_pq)_pp = (T_pq)_qq = cos φ
(T_pq)_pq = sin φ ,   (T_pq)_qp = -sin φ                (81)


It is easily shown that this matrix is orthogonal, so that its inverse is simply its transpose

T_pq^(-1) = T_pq+                                       (82)

Let us define the B matrix

B = T_pq^(-1) A T_pq = T_pq+ A T_pq                     (83)

One will choose the rotation angle φ such that

b_pq = b_qp = 0                                         (84)

By successive transformations, one can transform the A matrix to a diagonal form. First one does successive transformations for p=1 q=2, p=1 q=3, ..., p=1 q=n. One proceeds next to the second row (p=2), and so on to the last row. Let us call C the matrix obtained after these successive transformations. If

Σ_{i,j, i≠j} C_ij^2 < ε

ε being a given threshold, the diagonalization is considered to be completed. Otherwise one proceeds again in the same way, starting with the C matrix instead of the A matrix. When the diagonalization is completed, the eigenvalues are equal to the diagonal elements of the final matrix. Eigenvectors are obtained from the product of the successive transformation matrices.
The number of operations, hence the computation time, goes approximately as n^3 if n is the dimension of the matrix.
The storage requirements for the Jacobi diagonalization method are two square matrices (they can easily be brought down to one square matrix and one half-matrix).
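A compact, unoptimized Python version of the sweep (an illustrative sketch; a production routine would update rows and columns in place rather than form the full rotation matrix):

```python
# Jacobi diagonalization of a real symmetric matrix, eqs. (81)-(84).
import numpy as np

def jacobi_eig(A, eps=1e-12, max_sweeps=50):
    A = A.copy()
    n = A.shape[0]
    T = np.eye(n)                       # product of the successive rotations
    for _ in range(max_sweeps):
        off = np.sum(A**2) - np.sum(np.diag(A)**2)
        if off < eps:                   # threshold test on the off-diagonal sum
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p, q] == 0.0:
                    continue
                # angle chosen so that b_pq = b_qp = 0 (eq. 84)
                phi = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(phi), np.sin(phi)
                R = np.eye(n)
                R[p, p] = R[q, q] = c
                R[p, q], R[q, p] = s, -s
                A = R.T @ A @ R         # eq. (83)
                T = T @ R
    return np.diag(A), T

M = np.array([[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 1.0]])
w, T = jacobi_eig(M)
assert np.allclose(np.sort(w), np.linalg.eigvalsh(M))
```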
B. Through the Single Vector Diagonalization method.
Let C_o be an approximate eigenvector of the eigenvalue problem

FC = ESC                                                (85)

and δC the correction to make it an actual eigenvector

C = C_o + δC                                            (86)

We assume that C_o is normalized and δC orthogonal to C_o:

C_o+ S C_o = 1                                          (87)

C_o+ S δC = 0                                           (88)

We define the approximate eigenvalue E_o in terms of C_o by

E_o = C_o+ F C_o                                        (89)

and let the actual eigenvalue differ from E_o by

E = E_o + δE                                            (90)

If we substitute C and E into Eq. (85) and multiply the resulting equation from the left by C_o+, we obtain readily

δE = C_o+ F δC = C_o+ (F - E_o S) δC                    (91)

After defining the auxiliary vectors

s = S C_o                                               (92)

g = (F - E_o S) C_o                                     (93)

one obtains, after eliminating δE from (85) and (91),

(F - E_o S) δC = -g + (g+ δC) s + ...                   (94)

We obtain finally an equation for δC correct to first order

G_o δC = -g                                             (95)

where G_o is the symmetric matrix defined by

G_o = F - E_o S - s g+ - g s+                           (96)

(the extra terms may be added since s+ δC = C_o+ S δC = 0 to first order; they make G_o symmetric).

It can be shown that the matrix G_o is singular. Eq. (95) is then solved by the method of Gaussian elimination. The set of equations

G_11 δC_1 + G_12 δC_2 + ... + G_1n δC_n = -g_1
G_21 δC_1 + G_22 δC_2 + ... + G_2n δC_n = -g_2
...

is replaced, after subtracting the nth equation from the preceding n-1 equations with an appropriate factor such that the terms δC_n are eliminated, then applying the same technique to the first n-1 equations and so forth, by a set of equations

T δC = -t


where T is a triangular matrix with elements only on and below the diagonal. Then, putting δC'_1 = 0, one computes δC'_2 from the second equation, δC'_3 from the third one, and so on.
Having found in this way a correction vector δC' for which δC'_1 = 0, we obtain the desired correction vector, orthogonal to C_o, from

δC = δC' - (C_o+ S δC') C_o                             (99)

C = C_o + δC now gives, after normalization, a better approximation to the eigenvector.
In practice, only a few iterations are needed to obtain convergence. The method is quadratic, that is, the iterated vector has roughly twice the number of significant figures compared to the trial vector.
The storage requirements are two square matrices N x N and one triangular array of N(N+1)/2 elements in fast core.
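One refinement step can be sketched with numpy (an illustration under simplifying assumptions: S is taken as the unit matrix, and the triangular elimination is replaced by an ordinary solve on the subblock with δC'_1 fixed at zero):

```python
# One Single Vector Diagonalization step, following eqs. (89)-(99).
import numpy as np

rng = np.random.default_rng(1)
n = 5
F = rng.standard_normal((n, n)); F = F + F.T        # a symmetric "Fock" matrix
S = np.eye(n)                                       # unit overlap for simplicity

c0 = rng.standard_normal(n)                         # a trial vector
c0 /= np.sqrt(c0 @ S @ c0)                          # normalized, eq. (87)

e0 = c0 @ F @ c0                                    # eq. (89)
s = S @ c0                                          # eq. (92)
g = (F - e0 * S) @ c0                               # eq. (93)
G = F - e0 * S - np.outer(s, g) - np.outer(g, s)    # eq. (96), symmetric

dc = np.zeros(n)
dc[1:] = np.linalg.solve(G[1:, 1:], -g[1:])         # solve with dC'_1 = 0
dc -= (c0 @ S @ dc) * c0                            # eq. (99): S-orthogonalize
c = (c0 + dc) / np.sqrt((c0 + dc) @ S @ (c0 + dc))  # refined, normalized vector

assert np.allclose(G, G.T)                          # G_o is indeed symmetric
assert abs(c0 @ S @ dc) < 1e-10                     # dC orthogonal to C_o
```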

3 - SCF Iterations and extrapolations


In the SCF procedure one starts with a set of trial vectors which are input to the calculation, calculates the density matrix from these trial vectors and then the Hamiltonian, obtains the eigenvectors and repeats the process until input and output vectors agree within a certain threshold. This procedure raises three different questions:
i) how do we choose the trial vectors?
ii) there is no guarantee that the iterative process will converge; even when it converges, it sometimes takes many iterations before convergence is actually achieved. Clearly some scheme is needed in most cases, both to ensure convergence and to speed it up;
iii) how do we recognize SCF convergence?
A. Choice of trial vectors. The choice of good initial vectors
is the most obvious way to reduce the number of iterations, and,
in addition, reduces the possibility of trapping of the solution
in a local minimum.
A trivial choice is a set of zero vectors. Although it may lead to a convergent process in some cases, it is nevertheless a very poor choice. Trial vectors do not need to be accurate, but must have some qualitative relationship with the final vectors. It is relatively easy to have fairly accurate trial vectors for the molecular inner shells from atomic calculations, since the inner shells are very similar in atoms and molecules. The easiest way is then to perform separate atomic calculations with the same basis functions. Trial vectors for the valence shells may be built from a localized bond model, each molecular orbital describing a bond between two atoms. In some cases trial vectors are obtained from the results of previous calculations, either for molecular fragments or for a related molecule, for instance:
- trial vectors for the complex ion CuCl4^2- will be obtained from the atomic orbitals of the ions Cu^2+ and Cl^-; this is a fairly good choice since this complex is relatively ionic;
- trial vectors for the borazane molecule BH3NH3 will start with the SCF vectors for the NH3 and BH3 molecules.
In the POLYATOM program, trial vectors are taken as the eigenvectors of the H matrix, id est the one-electron part of the Hamiltonian [10]. If the basis set used is a minimal basis set, trial vectors for the valence orbitals may be obtained as the solutions from one of the many semi-empirical methods for the valence electrons (Extended Huckel Theory [22], CNDO method [23]). If the basis set used is a double-zeta set, one may use the vectors solution of a semi-empirical method by splitting the corresponding coefficients. A scheme has been proposed for the construction of a trial density matrix when the basis set is made of localized orbitals (more precisely, for a basis set of floating gaussian functions), based on the idea of combining molecular fragments [24].

B. Extrapolation procedures. Extrapolation is an attempt to compute a better approximation to the SCF solution from the results of a few successive iterations. Many schemes of extrapolation have been proposed [20,25,26], of which two are commonly in use [20]. Both make use of three sets of vectors which are connected by two successive iterations:

Iteration no.      i            i+1
Trial vector       C_i          C_{i+1}
Final vector       C_{i+1}      C_{i+2}

C_i as input yields C_{i+1} as output.
Let the complete set of vectors representing all the occupied orbitals be collected in one vector C. Let C_i be the approximate vectors, where i denotes the iteration. We denote the differences of these vectors from the actual solution by ΔC_i, so that

ΔC_i = C_i - C                                          (100)

The first method, called the Aitken method and introduced by Hartree, is based on the assumption that each element of C_i approaches its limit with a geometric decrease of error

ΔC_{i+1} = m ΔC_i    (with m a diagonal matrix)         (101)

that is, for each component p,

C_{i+1,p} - C_p = m_p (C_{i,p} - C_p)                   (102)

C_{i+2,p} - C_p = m_p (C_{i+1,p} - C_p)                 (103)

We can eliminate m_p from equations (102) and (103) to obtain an explicit expression for the extrapolated solution C:

C_p = (C_{i+2,p} C_{i,p} - C_{i+1,p}^2) / (C_{i+2,p} - 2 C_{i+1,p} + C_{i,p})    (104)

Each component of the extrapolated SCF solution is obtained from the corresponding components of three sets of vectors connected by two successive SCF iterations.
In the second method, it is assumed that for successive iterations the endpoints of the vectors C_i will describe a spiral in a plane, centered around the SCF solution C. An expression for C in terms of C_i, C_{i+1}, C_{i+2} can be derived [20]:

C = (α - 2β + γ)^(-1) (α C_i - 2β C_{i+1} + γ C_{i+2})  (105)

α = (C_{i+2} - C_{i+1})+ (C_{i+2} - C_{i+1})
β = (C_{i+2} - C_{i+1})+ (C_{i+1} - C_i)
γ = (C_{i+1} - C_i)+ (C_{i+1} - C_i)                    (106)
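Both formulas can be checked on synthetic sequences (an illustrative sketch; the geometric ratio, rotation angle and starting errors below are invented):

```python
import numpy as np

# Aitken (104): a component converging geometrically, C_i = C + err * m**i
C_true, m, err = 1.0, 0.6, 0.3
c = [C_true + err * m**i for i in range(3)]
c_aitken = (c[2] * c[0] - c[1] ** 2) / (c[2] - 2 * c[1] + c[0])
assert abs(c_aitken - C_true) < 1e-12

# Spiral method (105)-(106): the error vector rotates and shrinks in a plane
C_vec = np.array([2.0, -1.0])
theta, rho = 0.9, 0.8
R = rho * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
e = np.array([0.5, 0.2])
cs = [C_vec + e, C_vec + R @ e, C_vec + R @ (R @ e)]
d1, d2 = cs[1] - cs[0], cs[2] - cs[1]
alpha, beta, gamma = d2 @ d2, d2 @ d1, d1 @ d1
C_ex = (alpha * cs[0] - 2 * beta * cs[1] + gamma * cs[2]) / (alpha - 2 * beta + gamma)
assert np.allclose(C_ex, C_vec)
```

In both cases the extrapolated value recovers the limit exactly when the assumed error model holds.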

One obvious precaution before extrapolating is to ensure that the three sets of vectors C_i, C_{i+1}, C_{i+2} have the same relative phase. The relative sign of the vectors obtained at successive iterations through a Jacobi diagonalization is random, and extrapolation without caution will lead to disastrous results if the signs of the three sets of vectors have not been made to agree.
Both extrapolation schemes have been found useful in general, either by extrapolating on the vectors or on the density matrix [27]. However, experience has also shown that both schemes may fail in some cases [28,29], usually when the vectors are too far away from the exact solution. A general procedure called damping [30] has been found useful in some cases [28,29]. It consists of scaling down the changes in the vectors produced by one iteration.


If E. is the total energy associated with the set of vectors C.
(and 1Ei + l with Ci + l ), the input to the iteration i+2 will be 1
taken as [28}

C = aC i +(l-a)C i + l

( 107)

with a of the order of 0.7. In Ref.~9]the damping procedure is


based on the density matrix instead of the molecular orbital coefficients.
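Equation (107) amounts to a simple component-wise mix of the last two sets of vectors (a one-line sketch with invented values):

```python
# damping, eq. (107): mix the last two sets of vectors, a of the order of 0.7
a = 0.7
def damped(c_i, c_ip1):
    return [a * x + (1.0 - a) * y for x, y in zip(c_i, c_ip1)]

mixed = damped([1.0, 0.0], [0.0, 1.0])
assert all(abs(u - v) < 1e-12 for u, v in zip(mixed, [0.7, 0.3]))
```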
C. The Convergence Guarantee. An unconditional guarantee of convergence may be obtained according to a procedure proposed by Saunders and Hillier [31]. For a 2n-electron system with a basis set of m basis functions, we can construct a set of m-n virtual orbitals such that all molecular orbitals form an orthonormal set. We denote φ_1 the set of doubly occupied m.o.'s and φ_2 the set of virtual orbitals:

(φ_1, φ_2) = χ (C_1, C_2)                               (108)

It is perfectly possible to consider the basis set at any given iteration as being the set of trial molecular orbitals at that iteration, and we write equation (108) in this new basis as

(φ_1, φ_2) = (φ_1, φ_2) 1                               (109)

with 1 a unit matrix. Arbitrary but small variations in the doubly occupied molecular orbitals may be written

φ'_1 = φ_1 + φ_2 Δ                                      (110)

Equation (110) does not consider mixing amongst the trial doubly occupied orbitals, since the total wavefunction and energy are invariant to such mixing [2]. With these variations, the doubly occupied molecular orbitals remain orthonormal to first order, since their overlap matrix is given by

S = φ'_1+ φ'_1 = (φ_1 + φ_2 Δ)+ (φ_1 + φ_2 Δ) = 1 + Δ+Δ    (111)

The corresponding variations (to first order) in the density matrix are given by

D + δD = (C_1 + C_2 Δ)(C_1 + C_2 Δ)+ = D + C_2 Δ C_1+ + C_1 Δ+ C_2+ + higher terms    (112)

with*

D = C_1 C_1+                                            (113)

according to equation (24). First order energy variations may then be deduced from equation (23) and are given by

δE = Σ_pq 2 δD_pq (H_pq + Σ_rs D_rs G_pq,rs) = Σ_pq 2 δD_pq F_pq    (114)

F^mo now stands for the matrix elements of the Fock operator in the basis set of the trial molecular orbitals. Then, by using equation (112) for the expression of δD, one obtains

δE = 4 Σ_i^occ. Σ_k^virtual Δ_ki F^mo_ki + higher terms    (115)

One notices that if
i) all Δ_ki are chosen of opposite sign to the corresponding F^mo_ki;
ii) all Δ_ki are sufficiently small to make the higher terms of equation (115) smaller in absolute value than the first order term;
then the energy will go down (or remain stationary if all F^mo_ki are zero, id est at self-consistency). Then one would have a method guaranteeing convergence to a stationary point on the energy surface (although not necessarily the ground state).
Let us perform an iteration using as basis set the trial molecular orbitals of this iteration. This is easily achieved by constructing the Fock matrix F in the basis functions representation and by transforming it to the molecular orbital basis through the linear transformation

F^mo = C+ F C                                           (116)

*In equation (24) we had included a factor of two in the density matrix. Here we have included it directly in the expression of the energy (114).


For the purpose of analysis we assume that the diagonal blocks of F^mo are in diagonal form. We consider the diagonalization of a matrix identical to F^mo except that the elements of the off-diagonal blocks C_1+ F C_2 and C_2+ F C_1 have been multiplied by a small positive number λ. If λ is chosen sufficiently small, we may use first-order perturbation theory. Then one finds that the result of the diagonalization may be written in the form given by equation (110), where Δ_ki is given by

Δ_ki = λ F^mo_ki / (F^mo_ii - F^mo_kk)                  (117)

We have assumed that the trial molecular orbitals obey the aufbau principle (all F^mo_kk greater than all F^mo_ii). Then we see that the Δ_ki are indeed of opposite sign to the corresponding F^mo_ki. However, one is unable to guarantee that the magnitudes of the Δ_ki are sufficiently small that convergence is assured. If the trial wavefunction is far from convergence, then the analysis is less clear, since the application of first order perturbation theory may be invalid, and energy reordering of trial molecular orbitals may invalidate equation (117).
However, convergence may be ensured by adding a positive constant λ to the diagonal elements F^mo_kk before diagonalization. One then performs the diagonalization of this "level-shifted" Hamiltonian and orders the resulting eigenvectors according to the aufbau principle based on the resulting eigenvalues. First order perturbation theory now yields

Δ_ki = F^mo_ki / (F^mo_ii - F^mo_kk - λ)                (118)

For λ sufficiently large:
i) the first order perturbation theory used above is valid;
ii) no swapping of molecular orbitals can occur from one iteration to the next;
iii) all Δ_ki have opposite sign to the corresponding F^mo_ki;
iv) all Δ_ki are sufficiently small that the higher terms in the expansion of equation (115) are smaller in absolute value than the first order term.
Then, for λ sufficiently large, the output wave function energy will be lower than or equal to the input wave function energy, and we are thus provided with an unconditional guarantee of convergence to some stationary point on the energy surface.
Implementation of this procedure only requires a linear transformation of the Fock matrix into the basis of molecular orbitals, then subtracting the level shifting parameter λ from the F^mo_ii elements, followed by a diagonalization of the Hamiltonian (the method is well suited to a Jacobi diagonalization routine, since F^mo achieves a diagonal form as convergence is approached). The resulting molecular orbitals are then expressed with respect to the basis functions through

C' = C V                                                (119)

C being the trial set of eigenvectors and V the eigenvectors resulting from the diagonalization of the level-shifted Hamiltonian. Then C_k, the eigenvectors defining the molecular orbitals at the kth iteration, are given by

C_k = C_0 V_1 V_2 ... V_k                               (120)

C_0 being a trial set of eigenvectors. Build-up of round-off error may arise from the matrix multiplications in equation (120). Such round-off error may be removed by Schmidt orthonormalization of the vectors of C'. However this orthonormalization is a rather expensive process in terms of computer time (cf. below), and experience has shown that it is enough to perform it every few iterations.

Whereas the standard SCF procedure requires a set of trial


vectors only for the occupied orbitals, trial vectors are needed
for the whole set of molecular orbitals (occupied and virtual) in
the level shifting procedure.
The level shifting procedure has been found efficient for systems like the transition metal carbonyls which are inherently divergent (divergence occurs even with trial vectors which are extremely close to the SCF solution) [32]. Relatively large values of the level shifting parameter (about 4 to 5 a.u.) have to be used in the first stage of the calculation, when the trial vectors are rather inaccurate. However, after a few cycles, the use of such a large value slows convergence down unnecessarily, and smaller values in the range 0.3 to 1 a.u. are used to achieve convergence [31,32].
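The damping effect of the shift on the occupied-virtual rotations, eq. (118), can be seen in a small numpy sketch (illustrative matrices; the shift used here is far larger than the 0.3 to 5 a.u. of practice, simply to make the bound obvious):

```python
import numpy as np

rng = np.random.default_rng(2)
n, nocc = 6, 2
F = rng.standard_normal((n, n)); F = F + F.T        # a symmetric "Fock" matrix
C = np.linalg.qr(rng.standard_normal((n, n)))[0]    # trial orthonormal MOs
Fmo = C.T @ F @ C                                   # eq. (116)

def occ_virt_mixing(shift):
    Fs = Fmo.copy()
    Fs[np.arange(nocc, n), np.arange(nocc, n)] += shift  # level shift on F_kk
    w, V = np.linalg.eigh(Fs)               # aufbau order (eigh sorts ascending)
    return np.abs(V[nocc:, :nocc]).max()    # occupied-virtual rotation amplitude

# Delta_ki ~ F_ki / (F_ii - F_kk - lambda): a large shift bounds the mixing
assert occ_virt_mixing(100.0) < 0.2
```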
D. Convergence Control. If, for each component of all the vectors (corresponding to the occupied orbitals), the computed value at a given iteration differs from the value used in building the F matrix by less than an appropriate threshold, the calculation is said to have converged. Let us call this convergence criterion for the SCF iterations the SCF threshold. One may specify its numerical value for each particular calculation with the other input data. However this method has some drawbacks. If the threshold is set too stiffly, an otherwise perfect calculation may be rejected because it "diverges". If the threshold is set too loosely, the results will be unnecessarily inaccurate. One way to avoid these difficulties is to let the program find the natural convergence level for each particular calculation [20].
At the beginning of the SCF run, the program will automatically set the SCF threshold at a given value, for instance 10^-6. It will look for the vector convergence only after some convergence has been achieved for the energy value (since the energy converges much faster than the vectors). All the components of all the vectors are then compared to the values with which the present SCF iteration was started. If all the differences are less than the common SCF threshold, the SCF process is said to have converged. If convergence is not achieved, the program may carry out an extrapolation (at least every two iterations) and the extrapolated vectors are the starting vectors for the next iteration.
SCF convergence will be attempted for a given number of iterations with the same SCF threshold. If it is not obtained within these iterations, the SCF threshold is multiplied by a given number, and this new value is used for another round of NXTRP iterations. This process can terminate in three ways:
- the SCF threshold exceeds an alarm threshold given as input; the run is then terminated;
- a maximum permissible number of iterations, also given as input, is exceeded; this will also terminate the run;
- SCF convergence occurred for an SCF threshold less than the alarm threshold; the final results of the calculation are printed and punched.
There are a number of other ways to define SCF convergence. Since one is often more interested in the energy value than in the wave function, one may look at the convergence of the energy value instead of the expansion coefficients. The relationship between the total energy and the orbital energies (equation (41)) may also provide a test for the achievement of self-consistency. One may also consider the magnitude of the off-diagonal elements (between occupied and virtual orbitals) of the Fock matrix in the basis of molecular orbitals, since they are zero at convergence.

4 - The use of molecular symmetry in the SCF calculation


We have already mentioned that symmetry adapted functions instead of atomic basis functions may be used to speed up the SCF calculations. All the one-electron matrices S, H, D and F become symmetry blocked, and the eigenvalue equation (64) can be solved block by block. However this is not an important time saver, since the secular determinants are only of order m (with m the size of the basis set) and the time needed to carry out the diagonalization goes as m^3, whereas the build-up of the Fock matrix goes as the size of the integral list, namely m^4. This last process is optimally carried out with integrals over symmetry adapted functions, since the number of two-electron integrals is reduced appreciably by the symmetry. However, the transformation time for computing symmetry adapted integrals from integrals over basis functions goes at best as the fifth power of the size of the basis set [33], and the price of this transformation becomes overwhelming when increasing the size of the basis set. Clearly one needs some way to take advantage of the symmetry in the construction of the Fock matrix without going through the integral transformation.
Such an approach has been proposed by Pitzer and collaborators [34-36] and is called the equal contribution method. It makes use of the theorem that symmetry related integrals over basis functions make equal contributions to symmetry adapted integrals with totally symmetric integrands. A simple instructive example given by Pitzer [35] corresponds to the calculation of overlap integrals between the symmetry adapted functions obtained from the three hydrogen s orbitals (denoted s1, s2, s3) in NH3 (C3v symmetry):

a1 = s1 + s2 + s3
ex = 2s1 - s2 - s3
ey = √3 (s2 - s3)                                       (121)

(with the z axis as the axis of the molecule and s1 centered on the x axis). There are six overlap integrals over basis functions, related by symmetry as follows:

(s1|s1) = (s2|s2) = (s3|s3) ,   (s1|s2) = (s2|s3) = (s3|s1)    (122)

The symmetry adapted overlap integrals are (a1|a1), which has a totally symmetric integrand, and (ex|ex) and (ey|ey), which do not, but which are required by symmetry to be equal. Then it is useful to define [35]

(e|e) = (1/2)(ex|ex) + (1/2)(ey|ey)                     (123)

which does have a totally symmetric integrand. As one may see from the contributions of the integrals over basis functions to the symmetry adapted integrals given in Table 7, the optimum procedure is to compute only two integrals over basis functions, such as (s1|s1) and (s1|s2), multiply the values by three to account for the other integrals, and then use the coefficients of the symmetry adapted orbitals to form the symmetry adapted integrals (a1|a1) and (e|e). For one-electron integrals the definition of symmetry adapted integrals with totally symmetric integrands is a straightforward matter. This is no longer true for two-electron integrals, and this problem has been considered in detail by Pitzer [36].
Symmetry adapted          Integrals over basis functions
integrals      (s1|s1)  (s2|s2)  (s3|s3)  (s1|s2)  (s2|s3)  (s3|s1)

(a1|a1)           1        1        1        2        2        2
(ex|ex)           4        1        1       -4        2       -4
(ey|ey)           0        3        3        0       -6        0
(e|e)             2        2        2       -2       -2       -2

Table 7. Contributions of the overlap integrals over basis functions to the symmetry adapted overlap integrals for the s(H) functions of NH3
The corresponding algorithm to utilize this theorem in an SCF program has been described in Ref.[34]. It requires that the integral routine first generates a list of all the integrals over basis functions, grouping together those which are required by symmetry to be equal in magnitude (more complicated symmetry determined relationships are not considered). Since each integral in a group will make the same contribution to a given symmetry adapted integral, only the first integral of a group is retained after being multiplied by the number of integrals originally in the group. Then the Fock matrix is constructed from this reduced list of integrals according to the method of Paragraph 1B, without modification for molecules whose symmetry groups possess one-dimensional irreducible representations only. The use of the theorem is based on the fact that the Fock operator is a totally symmetric operator and thus its integrals between symmetry adapted orbitals of the same symmetry have a totally symmetric integrand. In the case of molecules whose symmetry groups possess multidimensional irreducible representations, the integrals in the diagonal blocks of the symmetry blocked Fock matrix do not in general have a totally symmetric integrand and correct matrix elements can only be obtained by averaging these integrals [34,36].
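The grouping-and-weighting step of this algorithm can be sketched as follows (an illustrative fragment only, not the program of Ref.[34]; the `orbit_of` argument is a hypothetical stand-in for the symmetry analysis performed by the integral routine):

```python
def reduce_integral_list(integrals, orbit_of):
    """Keep one representative per group of symmetry-equal integrals.

    `integrals` maps index quadruples (p, q, r, s) to values; `orbit_of`
    maps a quadruple to a canonical representative of its symmetry group.
    The surviving integral is multiplied by the size of its group, so that
    it alone makes the whole group's contribution to the Fock matrix.
    """
    groups = {}
    for pqrs, value in integrals.items():
        groups.setdefault(orbit_of(pqrs), []).append(value)
    # every member of a group contributes equally, so keep the first one
    # scaled by the number of members it replaces
    return {rep: values[0] * len(values) for rep, values in groups.items()}
```

For the NH3 example above, the six s(H) overlap integrals would collapse in the same way to the two representatives (s1|s1) and (s1|s2), each carrying a factor of three.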


A. VEILLARD

Since the time consuming step in each SCF iteration is the construction of the Fock matrix (cf. below Paragraph 4), a reduction in the integral list by a given factor will reduce the entire SCF iteration time by approximately that factor (which clearly depends on the degree of molecular symmetry). Some examples from Ref.[34] are given in Table 8.
Molecule                                     H2O        MnO4^-
Point group                                  C2v        Td
Nb. of basis functions                       29         39
Nb. of integrals over basis functions        94830      304590
Nb. of non-zero integrals                    46848      288486
Nb. of unique non-zero integrals             26310      15132
Time per SCF iteration using
  all non-zero integrals                     8.8 sec.   62 sec.
Time per SCF iteration using
  unique integrals only                      4.6 sec.   4.5 sec.

Table 8. SCF timings (on an IBM 370/165) with and without symmetry (from Ref.[34])
A somewhat different approach has been proposed by Dacre to reduce the integral list [37] and was later extended by Elder [38]. However it leads to a more complicated algorithm for processing the shortened list of integrals, since all symmetry related contributions to the Fock matrix are generated from the one formed from the reduced list by applying symmetry operations.
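The idea can be caricatured in a few lines (a rough sketch under simplifying assumptions: the group operations are given as hypothetical orthogonal matrices acting on the basis functions, and the weight corrections needed when an integral is invariant under part of the group are ignored):

```python
import numpy as np

def symmetrize_skeleton(f_skel, ops):
    """Regenerate the full Fock matrix from a 'skeleton' matrix built from
    a reduced integral list, by summing its images under the symmetry
    operations (each element of `ops` is a matrix representing one
    point-group operation in the basis-function space).
    """
    return sum(r @ f_skel @ r.T for r in ops)
```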
4 - Sequence of operations and flow-chart for a SCF closed-shell program.

A simplified sequence of operations for a SCF closed-shell program would be:

1. Read input data (including trial vectors).
2. Set the SCF threshold at 10^-6.
3. Read S and H from tape.
4. Schmidt orthonormalize the trial vectors if needed.
5. Write input data.
6. Compute the closed-shell density matrix.
7. Read the <pq|rs> integrals from tape and compute the P matrix.
8. Compute the one- and two-electron energies and the F matrix.
9. Compute the total energy.
10. Diagonalize the overlap matrix if needed (first iteration if Lowdin orthogonalization is used).
11. Compute the F' matrix (Lowdin orthogonalization or transformation in the basis of molecular orbitals).
12. Solve the eigenvalue equation (Jacobi diagonalization).
13. Back transformation for the eigenvectors.
14. Order eigenvectors and eigenvalues.
15. Test for convergence.
16. If self-consistency is achieved, print and punch the final results.
17. If not, test for the number of iterations; multiply the SCF threshold if needed and extrapolate.
18. Test for interrupt. If no, go back to 6 (calculation of the density matrix).
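The sequence above can be condensed into a short sketch (illustrative only, written with present-day numpy rather than the Fortran of the period; a core-Hamiltonian guess replaces the trial vectors, convergence is tested on the energy change rather than on the coefficient change, and extrapolation and interrupt handling are omitted):

```python
import numpy as np

def scf_closed_shell(h, s, eri, n_occ, threshold=1e-6, max_iter=50):
    """Minimal closed-shell SCF loop; eri[p,q,r,s] = (pq|rs), chemists'
    notation, so the two-electron part of F is J - K/2."""
    e_s, u = np.linalg.eigh(s)
    x = u @ np.diag(e_s ** -0.5) @ u.T          # Lowdin S^(-1/2), steps 10-11
    eps, cp = np.linalg.eigh(x.T @ h @ x)       # core-Hamiltonian starting guess
    c = x @ cp
    energy = 0.0
    for _ in range(max_iter):
        d = 2.0 * c[:, :n_occ] @ c[:, :n_occ].T            # step 6: density
        j = np.einsum('pqrs,rs->pq', eri, d)               # Coulomb matrix
        k = np.einsum('prqs,rs->pq', eri, d)               # exchange matrix
        f = h + j - 0.5 * k                                # steps 7-8
        new_energy = float(np.sum(d * (h + 0.5 * (j - 0.5 * k))))  # step 9
        eps, cp = np.linalg.eigh(x.T @ f @ x)              # steps 11-12
        c = x @ cp                                         # step 13
        if abs(new_energy - energy) < threshold:           # step 15
            break
        energy = new_energy
    return new_energy, c
```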
The corresponding flow-chart is represented in Fig.2. Some illustrative timing for a relatively large calculation is given in Table 9. One will notice that eighty per cent of the CPU time goes into the computation of the P matrix from the list of two-electron integrals and that the Schmidt orthonormalization is a relatively expensive process. We have considered both the diagonalization of the F matrix after a Lowdin orthogonalization (step 4a in Table 9) according to the method of Paragraph 2A and the diagonalization of the F matrix expressed in the basis of trial molecular orbitals (step 4b in Table 9) according to the method of Paragraph 3C. Clearly this latter procedure is more economical, with respect to both the matrix manipulation and the matrix diagonalization.
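Alternative 4b amounts to transforming F into the basis of the current trial orbitals, where it is nearly diagonal close to convergence; a minimal sketch (with a full diagonalization standing in for the Jacobi sweeps of the text):

```python
import numpy as np

def fock_in_mo_basis(f_ao, c):
    """Transform the Fock matrix into the basis of the current molecular
    orbitals (alternative 4b).  Near convergence F_mo is almost diagonal,
    which is what makes the subsequent Jacobi diagonalization cheap."""
    f_mo = c.T @ f_ao @ c
    eps, v = np.linalg.eigh(f_mo)   # stand-in for the Jacobi step
    c_new = c @ v                    # back transformation
    return eps, c_new
```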
1.  Schmidt orthogonalization of the trial vectors (optional)           4.9
2.  Computation of the density matrix                                    .1
3.  Computation of the P matrix from the list of two-electron
    integrals                                                          10.3
4a) Diagonalization of the S matrix (optional, only at the first
    iteration)                                                          2.3
    Computation of the F' matrix (Lowdin orthogonalization)             1.0
    Diagonalization of the F' matrix                                    1.1
4b) Computation of the Fock matrix in the basis of molecular
    orbitals                                                             .4
    Diagonalization of the Fock matrix                                   .6
Total for one SCF iteration (without step 1 and with step 4b)          12.1

Table 9. Timing (in minutes) of one SCF iteration for Fe(CO)5 (with 115 basis functions and a list of 5.7x10^6 integrals) on a Univac 1108 [32] (CPU time only).


[Flow-chart boxes: read input data; set the SCF threshold to 10^-6; read S and H from tape; Schmidt orthonormalization of the trial vectors; write input data; process the <pq|rs> integrals to form P_λpq and F_λpq = H_λpq + P_λpq; E = E_1 + E_2 + E_nuc.rep.; order eigenvectors and eigenvalues; form SUP = max |C_μp - C'_μp|; test SUP ≤ SCF threshold; extrapolation and loop back, or write and punch the final results; end.]

Figure 2. Flow-chart for a SCF closed-shell program


CHAPTER III. THE OPEN-SHELL CASE : THE ROOTHAAN SCF EQUATIONS IN THE RESTRICTED HARTREE-FOCK METHOD.

1 - Review of the open-shell theory
A. The open-shell wavefunction and energy. We consider open-shell wavefunctions with the following specifications [3,39,40]:
i) the total wavefunction is, in general, a sum of Slater determinants, each of which contains a doubly occupied closed-shell core φ_c and a partially occupied open-shell chosen from a set φ_o, the different Slater determinants containing different subsets of φ_o. The combined set of orbitals φ is defined by

φ = (φ_c , φ_o)                                             (124)

and is assumed to be orthonormal, so that the two sets φ_c and φ_o are orthonormal and mutually orthogonal. In referring to the individual orbitals, we use the indices k and l for the closed-shell orbitals, m and n for the open-shell orbitals, i and j for orbitals of either set. It is assumed that there is at most one open-shell per irreducible representation (denoted λ and μ).

ii) The expectation value of the energy is given by

E = 2 Σ_k I_k + Σ_kl (2J_kl - K_kl) + f[2 Σ_m I_m + f Σ_mn (2a_λμ J_mn - b_λμ K_mn) + 2 Σ_km (2J_km - K_km)]    (125)

where a_λμ, b_λμ and f are numerical constants depending on the specific case and the open-shell orbitals m and n belong to the irreducible representations λ and μ. The first two sums represent the closed-shell energy, the next two sums the open-shell energy and the last sum the interaction energy of the closed and open-shell.
In the LCAO formalism this can be rewritten as
E = Σ_λk N_λk Σ_pq H_λpq C_λkp C_λkq + Σ_λm N_λm Σ_pq H_λpq C_λmp C_λmq

  + (1/2) Σ_λμ Σ_kl Σ_pqrs N_λk C_λkp C_λkq P_λpq,μrs N_μl C_μlr C_μls     (closed-shell closed-shell)

  + (1/2) Σ_λμ Σ_kn Σ_pqrs N_λk C_λkp C_λkq P_λpq,μrs N_μn C_μnr C_μns     (closed-shell open-shell)

  + (1/2) Σ_λμ Σ_lm Σ_pqrs N_λm C_λmp C_λmq P_λpq,μrs N_μl C_μlr C_μls     (open-shell closed-shell)

  + (1/2) Σ_λμ Σ_mn Σ_pqrs N_λm C_λmp C_λmq Q_λpq,μrs N_μn C_μnr C_μns     (open-shell open-shell)    (126)


the number of electrons per closed-shell orbital


(usually 2, but also 4 or 6 in the case of degenerate orbitals)
N

Am

the number of electrons per open-shell (usually


1, but also 2, 3, 4 or 5).

= N

~n

and the supermatrix


Q

~,

-(\,

~pq,~rs '{~pq,~rs

Q. defined through
= a,

~~

<pqlrs>

-.....6.l:!...4 prlqs>+<pslqr

( 121 )

yielding

QApq,~rs

( 128)

with
( 129)

A few examples will be given in order to illustrate the above expression:

i) for a free radical with one unpaired electron, the total wavefunction of the system will be a single Slater determinant

Ψ = [(2n-1)!]^(-1/2) |φ1(1)φ̄1(2) ... φ_{n-1}(2n-3)φ̄_{n-1}(2n-2)φ_n(2n-1)|    (130)

To find the corresponding energy expression is a straightforward application of the rules giving the energy associated with a Slater determinant [1]

E = Σ_i I_i + Σ_{i<j} J_ij - Σ'_{i<j} K_ij                    (131)

The summations are over spin-orbitals (the prime restricting the exchange sum to pairs of spin-orbitals with parallel spins), hence:

E = 2 Σ_{k=1}^{n-1} I_k + I_n + Σ_{k,l=1}^{n-1} (2J_kl - K_kl) + Σ_{k=1}^{n-1} (2J_{k,n} - K_{k,n})    (132)

Clearly

a = b = 0 ,   f = 1/2 ,   N_λm = 1

ii) For triplet and singlet excited states. For a closed-shell system, we consider an excitation of an electron from a doubly occupied orbital φ_m to an empty orbital φ_n. We assume φ_m and φ_n to be non-degenerate.


This leads to four possible configurations with respect to the spin

αα   ββ   αβ   βα

corresponding to four Slater determinants:

ψ1 = [(2n)!]^(-1/2) |φ1(1)φ̄1(2) .... φ_m(μ)φ_n(ν)|
ψ2 = |.............. φ̄_m(μ)φ̄_n(ν)|
ψ3 = |.............. φ_m(μ)φ̄_n(ν)|                           (133)
ψ4 = |.............. φ̄_m(μ)φ_n(ν)|

ψ1, ψ2, ψ3 and ψ4 are eigenfunctions of S_z but only ψ1 and ψ2 are eigenfunctions of S². From ψ3 and ψ4 we can build two eigenfunctions of S²

³ψ = (1/√2)(ψ3 + ψ4)
                                                             (134)
¹ψ = (1/√2)(ψ3 - ψ4)
If we compute the energy associated with ψ1, ψ2, ³ψ and ¹ψ, we find easily that

E(ψ1) = E(ψ2) = E(³ψ) = 2 Σ_{k≠m,n} I_k + I_m + I_n + Σ_{k,l≠m,n} (2J_kl - K_kl) + Σ_{k≠m,n} (2J_km - K_km + 2J_kn - K_kn) + J_mn - K_mn
                                                             (135)
E(¹ψ) = E(³ψ) + 2K_mn

ψ1, ψ2 and ³ψ are the three components of the triplet state. ¹ψ is the wavefunction associated with the singlet state. Since it is assumed that the two open-shells belong to different representations (this is required to insure the orthogonality condition between the open-shell molecular orbitals), expressions (135) of the energy may be fitted into the form of equation (125) with the following choices:
f = 1/2 ,  a_λλ = a_μμ = a_λμ = 1 ,  b_λλ = b_μμ = b_λμ = 2      for the triplet state

f = 1/2 ,  a_λλ = a_μμ = b_λλ = b_μμ = 0 ,  a_λμ = 1 ,  b_λμ = -2      for the singlet state

iii) The case where the open-shell set includes degenerate orbitals will be handled in a similar way. For instance a system of three electrons in two degenerate orbitals will be described by the wavefunction

Ψ = |φ1(1)φ̄1(2)φ2(3)|                                        (136)

with φ1 and φ2 being two degenerate orbitals. It is easily found that the corresponding energy is given by

E = (3/2) I_1 + (3/2) I_2 + (1/2) J_11 + (1/2) J_22 + 2J_12 - K_12    (137)

This amounts to choosing

f = 3/4   or   N_λm = 3/2

and, by considering that the two degenerate orbitals belong to different representations denoted 1 and 2:

a_12 = b_12 = 8/9   in order to have   f² x 2 x (2a_12 J_12 - b_12 K_12) = 2J_12 - K_12
(the summation includes the two terms mn and nm)

a_11 = b_11 = 8/9   in order to have   f² x (2a_11 - b_11) J_11 = (1/2) J_11
B. Review of the equations [3]. Minimization of the energy with respect to the orbitals φ_i together with the orthonormality conditions leads to two sets of equations respectively for the closed-shell and open-shell orbitals (we give only the LCAO equations):

Σ_q (H_λpq + P_λpq) C_λkq = Σ_q S_λpq (Σ_l C_λlq θ_λlk + Σ_n C_λnq θ_λnk)    (138)

Σ_q (H_λpq + P_λpq - Q_λpq) C_λmq = Σ_q S_λpq (Σ_l C_λlq θ_λlm/N_λm + Σ_n C_λnq θ_λmn/N_λm)    (139)

with

P_λpq = Σ_μrs P_λpq,μrs D_T,μrs                              (140)

Q_λpq = Σ_μrs Q'_λpq,μrs D_O,μrs                             (141)

Q'_λpq,μrs = P_λpq,μrs - Q_λpq,μrs                           (142)

D_C,λpq = Σ_k N_λk C_λkp C_λkq    (closed-shell density matrix)    (143)

D_O,λpq = Σ_m N_λm C_λmp C_λmq    (open-shell density matrix)      (144)

D_T,λpq = D_C,λpq + D_O,λpq       (total density matrix)
The orbitals can be subjected to the transformations

φ'_c = φ_c U_c ,   φ'_o = φ_o U_o                            (145)

where U_c and U_o are two unitary matrices which eliminate the off-diagonal multipliers θ_lk (closed-shell closed-shell term) and θ_nm (open-shell open-shell term). Such a transformation cannot eliminate the off-diagonal multipliers θ_nk and θ_lm which couple the closed and open-shells. However, the open-shell SCF equations are still reducible to pseudo-eigenvalue problems and this is achieved by reexpressing the closed-shell open-shell coupling terms in such a way that those terms can be absorbed into the left-hand sides of the equations. This is done by introducing two new square symmetrical matrices

R_C,λpq = [1/(N_Cλ - N_Oλ)] Σ_uw (S_λpu D_C,λuw Q_λwq + Q_λpu D_C,λuw S_λwq)    (146)

R_O,λpq = [1/(N_Cλ - N_Oλ)] Σ_uw (S_λpu D_O,λuw Q_λwq + Q_λpu D_O,λuw S_λwq)    (147)

(N_Cλ and N_Oλ being the occupation numbers of the closed-shell and open-shell orbitals in the representation λ). Then the SCF equations (138,139) may be rewritten as the pseudo-eigenvalue equations

Σ_q F_C,λpq C_λkq = ε_λk Σ_q S_λpq C_λkq    or    F_C C = E S C    (148)

Σ_q F_O,λpq C_λmq = ε_λm Σ_q S_λpq C_λmq    or    F_O C = E S C    (149)

where

F_C,λpq = H_λpq + P_λpq + R_O,λpq                            (150)

F_O,λpq = H_λpq + P_λpq - Q_λpq + R_C,λpq                    (151)
Roothaan has also given a different formulation of the SCF problem in which the closed-shell and open-shell orbitals are solutions of the same eigenvalue problem [3]

F C = E S C                                                  (152)

However it has been shown that this single eigenvalue method can give an energy that is not invariant to a unitary transformation of the orbitals, hence leading to possible discontinuities in the potential energy curve when different symmetries are investigated (for instance C3v and D3h) [41].

2 - The logic of the SCF calculation in the open-shell case.

With respect to many points, the SCF calculation will proceed for the open-shell case in the same way as for the closed-shell case. However, since we deal now with two sets of equations, the program will deal at each iteration, either simultaneously or successively, with the closed-shell and open-shell equations. In this respect, the computational burden could be twice for the open-shell what it is for the closed-shell. Furthermore, an optimal use of the fast store is now needed to handle the many matrices which appear in the computing of the F matrices. To illustrate the difference between the two cases, in the closed-shell case only one density matrix is needed and only once in computing the F matrix. In the open-shell case three density matrices are involved at different steps of the calculation of the F matrices. For instance the D_O matrix is needed twice, in the calculation of the Q matrix and in the calculation of the R_O matrix. There are three possibilities:
i) compute the D_O matrix once for all and keep it in fast store. If enough fast store is available, this is the best solution. However one is usually short of fast store;
ii) compute the D_O matrix, use it to compute the Q matrix, then save it in slow store. Get it back from the slow store when it is needed again to build the R_O matrix;
iii) compute the D_O matrix twice.
Since one is usually short of fast store, ii) usually represents the most "economical" solution.
Another point raised in connection with the optimization of the SCF process for open-shells is the relative intricacy of the sequence of calculations for the closed-shell hamiltonian and the open-shell hamiltonian. One way would be to compute first the closed-shell hamiltonian and to solve the corresponding eigenvalue equation, next to compute the open-shell hamiltonian. However, this is far from being the best way since several sequences of operations are common to the computation of the two hamiltonians. To proceed simultaneously with the two hamiltonians is more economical but raises some problems in connection with the optimal use of the fast store. We have mentioned that, for the closed-shell case, the most expensive operation is the computation of the P matrix from the list of two-electron integrals. In the open-shell case, processing of the two-electron integrals is needed for the computation of both matrices P and Q. This will be optimally done by computing simultaneously P and Q, and this requires four matrices (P, Q, D_T and D_O) in core. For relatively large basis sets, this may be achieved by storing only half-matrices which are considered as linear arrays (cf. Chapter II, Paragraph 1B).
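The simultaneous construction of P and Q in a single pass, with the four matrices held as triangular half-matrices stored as linear arrays, can be sketched as follows (the two supermatrix callables are hypothetical stand-ins for the processing of the two-electron integral file):

```python
import numpy as np

def triangular_index(p, q):
    """Linear-array index of element (p, q) of a symmetric matrix stored
    as a lower-triangle half-matrix."""
    if p < q:
        p, q = q, p
    return p * (p + 1) // 2 + q

def build_p_and_q(supermatrix_p, supermatrix_q, d_t, d_o, n):
    """One pass builds both P (from D_T) and Q (from D_O); only the four
    half-matrices need to be in fast store simultaneously.  `d_t` and
    `d_o` are half-matrices stored as linear arrays."""
    m = n * (n + 1) // 2
    p_half = np.zeros(m)
    q_half = np.zeros(m)
    pairs = [(a, b) for a in range(n) for b in range(a + 1)]
    for i, (pp, qq) in enumerate(pairs):
        for rr, ss in pairs:
            j = triangular_index(rr, ss)
            w = 1.0 if rr == ss else 2.0   # off-diagonal elements stored once
            p_half[i] += w * supermatrix_p(pp, qq, rr, ss) * d_t[j]
            q_half[i] += w * supermatrix_q(pp, qq, rr, ss) * d_o[j]
    return p_half, q_half
```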


Then it is obvious that the organization of an open-shell SCF calculation will depend largely on a number of specific features like the amount of fast store available, the access speed to the slow store, and the maximum size of the matrices to be treated. For all these reasons, the example given below for an open-shell SCF program is given only in an indicative way. It was implemented on the Univac 1108, with the maximum size of the basis set being 122 and enough core to accommodate three square matrices of maximum size 122. The total storage requirement was about 60 K words.
A. Diagonalization procedures and selection of the vectors in the open-shell case. The same techniques which have been used for the calculation of the eigenvalues and eigenvectors in the closed-shell case are still suitable in the open-shell case. The two pseudo-eigenvalue equations F_C C = ESC and F_O C = ESC can be solved through the use of a Lowdin orthogonalization followed by a Jacobi diagonalization. It may be convenient to use the single vector diagonalization technique for the open-shell equation.

The number of eigenvalues and eigenvectors obtained by solving the equation FC = ESC is equal to the size m of the basis set. For a closed-shell system of 2n electrons with n < m, in its ground state, it is customary to select the n eigenvectors corresponding to the lowest eigenvalues (aufbau principle). It is usually assumed that this choice will lead to the lowest value of the total energy, i.e. to the ground state wavefunction. For an open-shell system of 2n + 1 electrons with n doubly occupied molecular orbitals, it is also customary to select the n eigenvectors corresponding to the lowest eigenvalues of the closed-shell equation F_C C = ESC. But some care has to be exercised in picking the correct open-shell solutions. The operators F_C and F_O are physically little different, so that some eigenfunctions of F_O are expected to resemble closely the closed-shell orbitals and obviously must be rejected as open-shell orbitals. It might be expected that the corresponding sequence of eigenvalues is the same for the operators F_C and F_O. Then one would take for the open-shell orbital the eigenvector belonging to the n + 1 eigenvalue. However, experience has shown that this is generally not true and it is usually found that the open-shell orbital corresponds to the p eigenvalue with p < n + 1. In some unfavourable but still rather common cases, the rank p of the correct open-shell eigenvalue will change from one iteration to another. Clearly the choice of the open-shell eigenvector is not straightforward. This problem is solved by selecting the open-shell eigenvector with reference to the corresponding trial vector. This can be done in two ways:

- if the Jacobi diagonalization is used, one computes the overlap between the trial vector and the m eigenvectors of the open-shell equation. Since these m eigenvectors form a complete space, it can be shown that the overlap integral with one eigenvector should be larger than 1/√2, the overlap integrals with the remaining vectors being less than 1/√2. Then one chooses as the open-shell solution the first eigenvector leading to an overlap integral greater than 1/√2.
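The overlap criterion can be sketched as follows (assuming the eigenvectors are normalized with respect to the overlap matrix S, so that exactly one of them can exceed the 1/√2 threshold):

```python
import numpy as np

def select_open_shell_vector(c_trial, eigenvectors, s):
    """Pick the open-shell solution as the first eigenvector whose overlap
    with the trial vector exceeds 1/sqrt(2) in magnitude; `eigenvectors`
    holds the m solutions column-wise and `s` is the basis overlap matrix.
    """
    overlaps = c_trial @ s @ eigenvectors   # <trial|i> for each column i
    for i, o in enumerate(overlaps):
        if abs(o) > 1.0 / np.sqrt(2.0):
            return i, eigenvectors[:, i]
    raise ValueError("no eigenvector resembles the trial open-shell orbital")
```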

- if the single vector diagonalization is used, the open-shell trial vector is used as the approximate eigenvector C_0 at the start of the diagonalization process.

The efficiency of this method is of course dependent on the quality of the trial vector for the open-shell orbital. Experience indicates that it is relatively easy, with the help of symmetry and qualitative arguments, to build an open-shell trial vector of sufficient accuracy.

B. The organization of an open-shell SCF calculation. As mentioned above, the following organization was implemented with enough fast core to accommodate three square matrices (with the possibility of storing two half-matrices considered as linear arrays in the place of a square matrix). The corresponding sequence of operations for a SCF open-shell program is [12]:

1-5. Steps 1 to 5 identical to the ones of the closed-shell program (except for step 3, the matrices S and H being read from tape when they are needed).
6. Compute the total and open-shell density matrices D_T and D_O.
7. Read the <pq|rs> integrals from tape and compute the P and Q matrices.
8. Compute the R_O matrix.
9. Compute the closed-shell density matrix D_C.
10. Compute the R_C - Q matrix.
11. Compute the one- and two-electron energies and the F_C and F_O matrices.
12. Compute the total energy.
13. Compute the F'_C and F'_O matrices (Lowdin orthogonalization).
14. Diagonalize in turn the F'_C and F'_O matrices.
15. Back transformation for the eigenvectors.
16. Order the closed-shell set of eigenvectors and eigenvalues.
17. Select the appropriate open-shell orbital.
18-21. Identical to steps 15 to 18 of the closed-shell program.

Some illustrative timing for a large calculation is given in Table 10. One will notice that the computation of the P and Q matrices from the list of two-electron integrals represents in the open-shell case only fifty per cent of the CPU time (versus eighty per cent for the closed-shell case).


1. Computation of the matrix V (equation (76)) for the Lowdin
   orthogonalization in a symmetry-adapted basis set                   .5
2. Schmidt orthogonalization of the trial vectors (optional)          2.4
3. Computation of the density matrices D_T and D_O                     .1
4. Computation of the P and Q matrices from the list of
   two-electron integrals                                             9.3
5. Computation of the R_C and R_O matrices                            2.1
6. Computation of the F'_C and F'_O matrices                           .8
7. Jacobi diagonalization (twice)                                     1.5
8. Selection of the open-shell eigenvector                             .6
Total for one SCF iteration                                          17.4

Table 10. Timing (in minutes) of one SCF iteration (with 93 basis functions and a symmetry adapted basis set for a point group Cs) on a Univac 1108 [28] (CPU time only).

CONCLUSION
In the above pages, we have emphasized both the theoretical methods and the computational techniques which have been largely in use, during the last ten years, for large-scale molecular calculations (for molecules with up to 30 atoms and basis sets in the range of 50 to 150 functions). However, a number of other methods and techniques have been proposed in the literature, which have been less currently in use and for which less experience has been gained so far. We feel it worthwhile to mention them briefly, since some may gain more favor in the near future (or possibly have already been favored in some specific cases).
The SCF equations as given by Roothaan, both in the closed and open-shell cases, are solved by the pseudo-eigenvalue method, a technique which is at best linearly convergent. Numerical methods have been proposed for solving the SCF equations which are based on the Newton-Raphson iterative method (namely expanding the SCF equations in terms of first order changes δC and then solving the resulting equations for the optimum new δC) subject to the constraints imposed by a Schmidt orthonormalization, the method being quadratically convergent [42]. In the steepest descent method, one abandons the determination of the individual orbitals and the total energy is minimized subject to the idempotency condition of the density matrix [43]. Direct minimization of the energy functional has been based on unconstrained optimization techniques through the incorporation of the orbital orthonormality restrictions into the expressions for the energy gradient [44-46].


Alternatives to the Roothaan SCF equations have been sought mostly for the open-shell case, probably as a consequence of the increased computational burden associated with the open-shell problem. An account of some earlier methods may be found in Ref.[47]. The method of equivalence restrictions of Nesbet [48] is computationally attractive since it is based on the use of only one effective Hamiltonian and the off-diagonal multipliers are eliminated by a unitary transformation. However the corresponding wavefunction does not satisfy the variational principle since the Hamiltonian used does not correspond to the actual form of the assumed wavefunction. The Orthogonality Constrained Basis Set Expansion (OCBSE) method is an alternative to the coupling operator technique for the open-shell problem [49], the orthogonality of a given orbital to the remaining orbitals being ensured by limiting the variations to the space spanned by the functions orthogonal to those orbitals. The "superiteration" process proposed by Peters [50] is closely related to the OCBSE method. However it has been shown that both methods do not satisfy all the necessary conditions for the energy to be stationary [51,52]. An extension and a rectification of these two methods has been proposed by Dahl as the "Direct Method" [52]. A "level shifting" technique to permit guaranteed convergence for the open-shell case has been described [53], based on an energy minimization scheme which includes as a special case the single Hamiltonian procedure of Roothaan [3].
REFERENCES

[1] See for instance: a) R. McWeeny and B.T. Sutcliffe, "Methods of Molecular Quantum Mechanics", Academic Press, 1969; b) F.L. Pilar, "Elementary Quantum Chemistry", McGraw-Hill, New York, 1968.
[2] C.C.J. Roothaan, Rev. Mod. Phys., 23, 69 (1951).
[3] C.C.J. Roothaan, Rev. Mod. Phys., 32, 179 (1960).
[4] J.A. Pople and R.K. Nesbet, J. Chem. Phys., 22, 571 (1954).
[5] G. Berthier, J. Chim. Phys., 51, 363 (1954).
[6] G. Das and A.C. Wahl, J. Chem. Phys., 44, 87 (1966).
[7] A. Veillard and E. Clementi, Theoret. Chim. Acta, 7, 133 (1967).
[8] D.R. Hartree, Proc. Cambridge Phil. Soc., 24, 89 and 111 (1928).
[9] V. Fock, Z. Physik, 61, 126 (1930).
[10] D.B. Neumann, H. Basch, R.L. Kornegay, L.C. Snyder, J.W. Moskowitz, C. Hornback and S.P. Leibmann, "The Polyatom (Version 2) System of Programs", QCPE 199, Indiana University, Bloomington, Indiana 47401, USA.
[11] G.H.F. Diercksen and W.P. Kraemer, "MUNICH, Molecular Program System", Reference Manual, Special Technical Report, Max-Planck-Institut für Physik und Astrophysik, Munich, Germany.
[12] A. Dedieu, J. Demuynck, A. Strich and A. Veillard, "Asterix: a system of programs for the Univac 1108", unpublished work.
[13] J. Almlöf, USIP Technical Report 72-09, Stockholm, Sweden (1972).
[14] A.J. Duke, Chem. Phys. Let., 13, 76 (1972).
[15] A. Strich and A. Veillard, unpublished work.
[16] F.P. Billingsley, Int. J. Quant. Chem., 6, 617 (1972).
[17] R.C. Raffenetti, Chem. Phys. Let., 20, 335 (1973).
[18] M. Yoshimine, IBM Technical Report RJ-555, San Jose, USA (1969).
[19] P.O. Lowdin, J. Chem. Phys., 18, 365 (1950).
[20] C.C.J. Roothaan and P.S. Bagus, in "Methods in Computational Physics", B. Alder, S. Fernbach and M. Rotenberg ed., Vol. 2, Academic Press, New York, 1963, p. 47.
[21] See for instance: H. Margenau and G.M. Murphy, "The Mathematics of Physics and Chemistry", Van Nostrand, Princeton, 1967, Vol. 2, p. 71.
[22] R. Hoffmann, J. Chem. Phys., 39, 1397 (1963).
[23] J.A. Pople and D.L. Beveridge, "Approximate Molecular Orbital Theory", McGraw-Hill, New York, 1970.
[24] L.L. Shipman and R.E. Christoffersen, Chem. Phys. Let., 12, 469 (1972).
[25] N.M. Winter and T.H. Dunning, Chem. Phys. Let., 8, 169 (1971).
[26] W.B. Neilsen, Chem. Phys. Let., 18, 225 (1973).
[27] A. Strich and B. Roos, unpublished work.
[28] A. Dedieu, private communication.
[29] P.S. Bagus, I.P. Batra and E. Clementi, Chem. Phys. Let., 23, 305 (1973).
[30] S. Ehrenson, Theoret. Chim. Acta, 14, 136 (1969).
[31] V.R. Saunders and I.H. Hillier, Int. J. Quant. Chem., 7, 699 (1973).
[32] J. Demuynck, private communication.
[33] K.C. Tang and C. Edmiston, J. Chem. Phys., 52, 997 (1970).
[34] N.M. Winter, W.C. Ermler and R.M. Pitzer, Chem. Phys. Let., 19, 179 (1973).
[35] R.M. Pitzer, J. Chem. Phys., 58, 3111 (1973).
[36] R.M. Pitzer, J. Chem. Phys., 59, 3308 (1973).
[37] P.D. Dacre, Chem. Phys. Let., 7, 47 (1970).
[38] M. Elder, Int. J. Quant. Chem., 7, 75 (1973).
[39] S. Huzinaga, Phys. Rev., 120, 866 (1960).
[40] S. Huzinaga, Phys. Rev., 122, 131 (1961).
[41] T.E.H. Walker, Chem. Phys. Let., 9, 174 (1971).
[42] W.R. Wessel, J. Chem. Phys., 47, 3253 (1967).
[43] R. McWeeny, Proc. Roy. Soc. (London), A235, 496 (1956).
[44] R. Fletcher, Mol. Phys., 19, 55 (1970).
[45] R. Kari and B.T. Sutcliffe, Chem. Phys. Let., 7, 149 (1970).
[46] R. Kari and B.T. Sutcliffe, Int. J. Quant. Chem., 7, 459 (1973).
[47] G. Berthier, in "Molecular Orbitals in Chemistry, Physics and Biology", P.O. Lowdin and B. Pullman ed., Academic Press, New York, 1964, p. 57 and following.
[48] R.K. Nesbet, Proc. Roy. Soc. (London), A230, 312 (1955).
[49] W.J. Hunt, T.H. Dunning and W.A. Goddard, Chem. Phys. Let., 3, 606 (1969).
[50] D. Peters, J. Chem. Phys., 57, 4351 (1972).
[51] R. Albat and N. Gruen, Chem. Phys. Let., 18, 572 (1973).
[52] J.P. Dahl, in Proceedings of the SRC Atlas Symposium No. 4, "Quantum Chemistry. The State of the Art", V.R. Saunders and J. Brown ed., Atlas Computer Laboratory, Chilton, England, April 1974.
[53] M.F. Guest and V.R. Saunders, Mol. Phys., to be published.

THE CONFIGURATION INTERACTION METHOD

Bjorn Roos

1. Introduction

The Hartree-Fock method is a well established model for studying electronic structures of molecular systems. The Hartree-Fock energy, E_HF, accounts in most cases for more than 99 % of the exact eigenvalue, E_NRL, of the non-relativistic Hamiltonian of a system. However, usually one is not interested in absolute energies, but rather in energy differences. To be explicit, for a given system one usually wants to study the energy difference, ΔE = E_B - E_A, between two states A and B of the system, that is the energy connected with the "reaction"

A → B

For many such reactions, the HF method has proven to give results in good agreement with experiment ("good" here means that the error is of the order of 1 kcal/mole). Examples are: energies connected with conformational changes (inversion and rotation barriers, cis-trans isomerism, etc.); hydrogen bond energies; proton affinities; hydration energies, etc. Common to all cases for which the HF method is applicable is that no drastic electron rearrangements take place while the system changes from state A to B: no electron pairs are broken or formed, and the molecular orbitals are only slightly modified. If these conditions are fulfilled, the error connected with the HF approximation usually remains almost constant during the structural changes of the system.
Many interesting problems, however, do not belong to this category. It

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 251-297.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


suffices to mention examples such as: ionization and excitation energies, and
especially chemical reactions involving the formation (or rupture) of
chemical bonds. Furthermore the HF method will be inappropriate for most
problems involving very accurate studies where the error is required to be
smaller than 1 kcal/mole.
Thus, even if the HF method gives more than 99 % of the non-relativistic
energy, the remaining error is often of chemical significance. Let us take
an example. The HF energy for the N2 molecule has been calculated to be
-108.99 a.u., which is 99.5 % of the total non-relativistic energy
(E_NRL = -109.54 a.u.). A nitrogen atom has a HF energy of -54.40 a.u.
(E_NRL = -54.59 a.u.). Using these energies we can calculate the binding
energy for the N2 molecule. In the HF approximation a value of 0.19 a.u.
(5.17 eV) is obtained. The experimental value is 9.9 eV. Thus while the
errors in the total energies are only around 0.5 %, the calculated binding
energy is in error by as much as 48 %.
What is the cause of this error? In the independent particle model, which
forms the basis for the HF approximation, one writes the total wave function
as a product of spin-orbitals. This is equivalent to the assumption that the
motion of one specific electron is independent of the instantaneous positions
of the other electrons. The electron motion is said to be uncorrelated.
Energetically this is expressed by the form of the Coulomb operators in the
Hartree-Fock operator. An electron interacts only with the average field of
the other electrons. This approximation is good as long as the electrons are
far apart from each other. Suppose, however, that we take two electrons,
which originally are far apart, and pair them in the same molecular orbital
(thus forming a chemical bond). In the beginning they do not interact at all.
After the reaction, they occupy the same space and are in close contact with
each other. I n the HF mode 1 the ca 1cul a ted repu 1s i on energy i s no\~ too 1a rge
since the model does not take into account the instantaneous interaction and
therefore overestimates the probability of finding the two electrons close
together. Thus, we can explain the error in the HF binding energy for N2.
Three new electron pairs are formed corresponding to an increase of the
error of the HF energy of 4.8 eV (or 1.6 eV per electron pair).
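The numbers in this example are easy to verify; the sketch below (assuming the standard conversion factor 1 a.u. = 27.2114 eV) reproduces them from the quoted total energies.

```python
# Hartree-Fock and non-relativistic energies quoted in the text (a.u.)
E_HF_N2 = -108.99
E_HF_N = -54.40

HARTREE_TO_EV = 27.2114  # 1 a.u. of energy in eV

# Binding energy in the HF approximation: 2 E(N) - E(N2)
D_HF = (2 * E_HF_N - E_HF_N2) * HARTREE_TO_EV
D_exp = 9.9  # experimental binding energy (eV)

print(f"HF binding energy  : {D_HF:5.2f} eV")
print(f"error vs experiment: {D_exp - D_HF:5.2f} eV "
      f"({100 * (D_exp - D_HF) / D_exp:.0f} %)")
print(f"per electron pair  : {(D_exp - D_HF) / 3:5.2f} eV")
```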
Since a determinental wave function obeys the Pauli principle, two
electrons with parallel spin cannot occupy the same molecular orbital. The
motion of two such electrons is therefore correlated already in the Hartree-Fock model; and this correlation is expressed through the appearance of the
exchange integrals in the energy expression. This effect is usually called
Fermi correlation. The fact that the probability of finding two electrons
with parallel spins in the same space point is zero has an important implication. The probability of finding three electrons simultaneously close together

THE CONFIGURATION INTERACTION METHOD

253

will be small since at least two of them must have parallel spins. The
electron motions will therefore correlate pairwise, which can, as will be
seen later, be used to simplify the treatment of the correlation effects.
The above discussion has referred to the so called dynamical correlation
between the electrons. In certain cases the Hartree-Fock model breaks down
due to degeneracies or near degeneracies between different Slater determinants.
Such effects often occur when one tries to calculate energy surfaces. In the
above mentioned example, we compared the HF energy of a N2 molecule to the
HF energy for two separate nitrogen atoms. If we had instead made an attempt
to calculate the full potential curve for N2 by varying the NN distance, the
error in the binding energy would have been much larger. The reason for this
increase is that a single closed shell Slater determinant is not a correct
wave function for two nitrogen atoms at a large distance, but corresponds to
a mixture of a number of ionized and excited states. In order to obtain a
correct description of two N(⁴S) atoms coupled to a ¹Σg⁺ state, a linear combination of a number of Slater determinants must be used. This state of
affairs is sometimes referred to as the symmetry dilemma of the restricted HF
scheme. At large distances the bonding and antibonding molecular orbitals
become degenerate and should have equal occupation numbers. In the single
determinant wave function for N2, the bonding orbitals 3σg and 1πu are
completely occupied, while the corresponding antibonding orbitals 3σu and
1πg are empty. The correct wave function, at large distances, has to include
determinants where these orbitals are occupied.
Similar effects occur when two orbitals are close in energy but not
degenerate, and when only one is occupied in the HF scheme. In the Be atom,
for example, the 2p-orbital is very close in energy to the 2s-orbital.
The configuration (1s)²(2p)² has, as a result of this near degeneracy, a large
coefficient in the wave function for the ground state of the beryllium atom.
Strong correlation effects therefore occur for the 2s electron pair. This is
usually called near degeneracy correlation.
Many different methods have been proposed for a quantitative description
of correlation effects in molecular systems. In the following chapters we
shall discuss one such method, the method of superposition of configurations,
or simpler, the Configuration Interaction (CI) method.


2. The Configuration Interaction Method

2.1 General Method

The Hartree-Fock energy is an upper bound to the exact eigenvalue of the
lowest eigenstate of the Hamiltonian. The correlation energy is the
difference between these two energies:

    E_corr = E_NRL − E_HF
The methods for calculating wave functions with which we are concerned in
these lectures are those which take into account correlation effects and
thus also enable the correlation energy to be calculated.
In the Hartree-Fock method for an n-electron problem, the wave function
is built from the n first spin orbitals of an, in principle, complete set:

    ψ1, ψ2, ..., ψn;        ψn+1, ψn+2, ...                       (2)
    (occupied orbitals)     (virtual orbitals)

Obviously it ought to be possible to obtain an improved wave function by
including in some way the virtual orbital space. From the set (2), or any
other complete set of spin-orbitals, we can form a complete set of anti-symmetrized n-electron functions (Slater determinants). The wave function can
then be expanded in such a set:

    Ψ = Σν Cν Φν                                                  (3)

Each Φν is here a Slater determinant built from an ordered set of n spin
orbitals:

    Φν = A [ψμ1(x1) ψμ2(x2) ... ψμn(xn)]                          (4)

where μ = (μ1, μ2, ..., μn) with μ1 < μ2 < ... < μn, and where A is the
antisymmetrizer:

    A = (n!)^(-1/2) ΣP (-1)^p P                                   (5)

Although not necessary, we assume that the basis set (2) is orthonormalized.
The set of Slater determinants is then also orthonormal, and the normalization
condition for the wave function (3) takes the form

    Σν |Cν|² = 1

In order to determine the expansion coefficients Cν, we apply the
Rayleigh-Ritz variation method, which gives us the usual linear equation
system

    Σν {Hμν − E Sμν} Cν = 0                                       (6)

where the eigenvalues E are obtained from the secular equation

    det{Hμν − E Sμν} = 0                                          (7)

The matrix elements have their usual definitions:

    Hμν = ∫ Φμ* H Φν dτ
                                                                  (8)
    Sμν = ∫ Φμ* Φν dτ

The equations (6) and (7) give an exact solution to the Schrödinger
equation only if the basis set (2) is complete. In practice, however, we
always have to work with a finite number of spin-orbitals. Further, the
number of determinants which can be generated from this orbital basis is
usually intractable. Consider as an example the water molecule. This is a ten-electron system. In a so called 'double-zeta' calculation, we would use 14 basis
functions and thus obtain 28 molecular spin orbitals. From these spin-orbitals
we can construct (28 choose 10) determinants. It is of course impossible to
solve secular problems of this size ((28 choose 10) ≈ 1.3·10⁷). There are
however ways by which the number of configurations can be significantly
reduced. First of all the number of non-zero coefficients can be reduced by
symmetry. For example, in the ¹A₁ ground state of the water molecule, only
those Slater determinants which belong to the irreducible representation A1
of the point group C2v and which have an equal number of α and β spin-orbitals
contribute to the expansion (3). Furthermore, determinants which only differ
in the spin part of the orbitals, and thus belong to the same space configuration, are coupled together to form singlet states. This further reduces the number of independent coefficients Cν. Methods to obtain such pure
spin states will be discussed later. But even after this symmetry reduction,
the number of terms in (3) is, in our specific example (and in general), too
large to be manageable. A further reduction has to be based on ideas concerning the importance of different configurations. Also the choice of the
one-electron functions (2) is important. When canonical HF orbitals are
used, the expansion (3) is known to be slowly convergent. The search for
a basis set with optimum convergence properties therefore plays an important role in the development of efficient CI schemes. We will deal with
this problem in more detail in connection with the discussion of the different
CI methods. One optimum set of orbitals for a CI expansion of a given


length is obtained by applying the variation principle, not only to the
coefficients Cν, but also to the spin orbitals themselves. This approach
gives rise to the Multi-Configuration Self-Consistent Field (MC-SCF)
method, which will be taken up in the lectures given by Dr. Veillard and
will not be further discussed here.
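As an aside, the combinatorial estimate quoted above is easy to reproduce; the sketch below (standard library only) counts the determinants for the double-zeta water example.

```python
from math import comb

# 10 electrons distributed over 28 spin orbitals (double-zeta H2O)
n_dets = comb(28, 10)
print(f"{n_dets:,} determinants")  # ~1.3e7, as quoted in the text
```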
2.2 The Natural Orbitals
A set of orbitals with certain optimum properties are the natural spin
orbitals. They are the orbitals which diagonalize the first order reduced
density matrix ρ1(x1'; x1), where xi is a common designation for the space
and spin coordinates of an electron:

    ρ1(x1'; x1) = n ∫ Ψ*(x1', x2, ..., xn) Ψ(x1, x2, ..., xn) dx2 ... dxn   (9)

The integration is performed over the coordinates for all electrons except the
first. If the wave function is given by the CI expansion (3), the density
matrix can be written as

    ρ1(x1'; x1) = Σμν Cμ* Cν ρ1(μν | x1'; x1)                              (10)

where ρ1(μν | x1'; x1) is the transition density matrix

    ρ1(μν | x1'; x1) = n ∫ Φμ*(x1', x2, ..., xn) Φν(x1, x2, ..., xn) dx2 ... dxn   (11)

The diagonal term can be written as

    ρ1(μμ | x1'; x1) = Σ_{i∈μ} ψi*(x1') ψi(x1)                             (12)

and is the density matrix connected with the Slater determinant Φμ. The
non-diagonal terms are zero if the two determinants differ in more than one
spin-orbital. If they differ in just one spin-orbital (say that ψl in Φμ is
replaced by ψk in Φν), the transition density matrix is given by

    ρ1(μν | x1'; x1) = (−1)^q ψl*(x1') ψk(x1)                              (13)

where q is the parity of the permutation which puts equal orbitals into
equal positions in the two determinants.
The density matrix (9) can thus be written in the form

    ρ1(x1'; x1) = Σij ρij ψi*(x1') ψj(x1)                                  (14)


where the matrix ρ (with elements ρij) is the representation of ρ1(x1'; x1)
in the spin-orbital space. The diagonal elements of this matrix have the
form

    ρii = Σμ' |Cμ|²                                               (15)

where the summation is taken over all determinants which contain the spin-orbital ψi.
Using relation (14) and the definition (9), we arrive at the following
important properties of the density matrix:

    ρ† = ρ                                                        (16)

    0 ≤ ρii ≤ 1                                                   (17)

    Tr ρ = n                                                      (18)

The density matrix is thus Hermitian; every diagonal element lies between
zero and one; and the sum of the diagonal elements equals the number of
electrons.
Since ρ is Hermitian, it is possible to find a unitary matrix U
which brings this matrix to diagonal form

    n = U† ρ U                                                    (19)

where n is a diagonal matrix. Writing equation (14) in matrix form, we
obtain

    ρ1(x1'; x1) = ψ†(x1') ρ ψ(x1) = ψ†(x1') U n U† ψ(x1)

The matrix n is thus the representation of the first order reduced
density matrix in the transformed basis

    λi(x1) = Σj U*ji ψj(x1),   i.e.   λ = U† ψ                    (20)

In this basis the density matrix can be written in the form

    ρ1(x1'; x1) = Σi ni λi*(x1') λi(x1)                           (21)


where ni are the diagonal elements of the matrix n. The spin-orbitals
λi are called the natural spin-orbitals of the system (Löwdin, 1955); and
the numbers ni are interpreted as their occupation numbers. These numbers
fulfill the same conditions as the elements of ρ:

    0 ≤ ni ≤ 1   and   Σi ni = n                                  (22)

When the CI expansion (3) is reduced to one term (Hartree-Fock approximation), the occupation numbers are either 1 or 0, depending on whether the
corresponding orbital is present in the HF wave function or not. In this
case, the HF orbitals are themselves the natural spin-orbitals.
In a general CI expansion the HF determinant is often the dominant
term with a coefficient > 0.9. In such cases, the n first occupation numbers
are only slightly lower than one, and the corresponding natural spin-orbitals
to a large extent resemble the HF orbitals. The rest of the natural spin-orbitals have small occupation numbers and describe correlation effects.
We can form a spin-less density matrix by integrating equation (21) over
the spin variable. The orbitals for which this density matrix is diagonal are
called the natural orbitals. They have occupation numbers between 0 and 2.
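Numerically, finding the natural orbitals is just the Hermitian eigenvalue problem (19). A minimal sketch (a hypothetical two-configuration, two-electron toy density matrix, not a real molecular calculation):

```python
import numpy as np

# Toy density matrix: for a two-configuration, two-electron wave function
# the exact occupation numbers (per spin) are |c1|^2 and |c2|^2, but we
# express rho in a "scrambled" basis, as happens when rho is built in the
# original (e.g. canonical HF) orbital basis.
c1 = 0.98
occ_exact = np.array([c1**2, 1 - c1**2])     # natural occupation numbers

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))  # random orthogonal matrix
rho = Q @ np.diag(occ_exact) @ Q.T                # scrambled density matrix

# Diagonalizing rho recovers the occupation numbers (eigenvalues) and the
# natural orbitals (eigenvector columns), cf. eq. (19)
occ, U = np.linalg.eigh(rho)
occ = occ[::-1]                                   # decreasing occupation

print("occupations:", occ)       # approximately [0.9604, 0.0396]
print("trace (per spin):", rho.trace())
```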

Example: The H2 molecule.

Hagstrom and Shull studied the natural orbitals for the ground state of this
molecule (Rev. Mod. Phys. 35, 624 (1963)). They used a 33 configuration wave
function and obtained the following natural orbitals and occupation numbers
(only the first six are given here):

    1σg    1.9638
    1σu    0.0204
    1πu    0.0086
    2σg    0.0060
    1πg    0.0004
    3σg    0.0002

The first natural orbital 1σg corresponds closely to the HF-orbital.
The second one is an anti-bonding σ-orbital. Its effect is to increase the
probability for the electrons to be distributed so that if one of the
electrons is close to one of the nuclei the other electron is close to the
second nucleus. This is called left-right (or horizontal) correlation and
is in general described with natural orbitals which have a node in the
bond region. The natural orbital 1πu, when included in the wave function,


increases the probability to find the electrons on opposite sides of the
molecular axis. This effect is called angular correlation and is in the
general case described with orbitals having higher angular quantum numbers
than those of the corresponding HF-orbitals. The fourth natural orbital 2σg
has the same symmetry as the first, but has its maximum density further away
from the molecular axis. This orbital describes radial correlation effects;
that is, it increases the probability to find the second electron far away
from the axis when the first electron is close to it.

Thus we see that each natural orbital describes a specific correlation
of the electron motion. It is apparent that the natural orbitals are an
important tool in the analysis of complicated wave functions. A simple
expression is obtained for expectation values of one-electron properties:

    ⟨ Σi f(i) ⟩ = Σi ni ⟨λi | f | λi⟩                             (23)

where f is a one-electron operator. Also, when an atomic orbital basis is
used to express the natural orbitals, the density matrix (21) can be used
for a population analysis in the same way as is customary with HF wave
functions.
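Expression (23) is basis independent: the trace Tr(ρf) gives the same number in any orthonormal basis, and in the natural basis it reduces to the occupation-weighted sum (23). A small numerical check (with randomly generated matrices, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_symmetric(m):
    a = rng.standard_normal((m, m))
    return (a + a.T) / 2

# A fake (Hermitian, positive) density matrix and a one-electron operator f
rho = random_symmetric(5)
rho = rho @ rho.T                # make it positive semi-definite
f = random_symmetric(5)

# Trace in the original basis ...
expval_direct = np.trace(rho @ f)

# ... equals the occupation-weighted sum of eq. (23) in the natural basis
occ, U = np.linalg.eigh(rho)
f_nat = U.T @ f @ U
expval_natural = sum(n_i * f_nat[i, i] for i, n_i in enumerate(occ))

print(expval_direct, expval_natural)
```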
The natural orbitals are however also important in quite a different context. It can be proven that, in a certain sense, they give rise to the most
rapidly convergent CI expansion (Löwdin 1955, Coleman 1963). In order to
understand this property of the natural orbitals let us consider the following
finite trial expansion

    Φ(x1, τ) = Σ_{i,j=1}^{r} aij fi(x1) gj(τ)                     (24)

where τ is a common designation for the variables x2, x3, ..., xn, and fi is
a set of r orthonormal one-particle functions.
Our task is now to find the set of functions fi(x1) which minimizes
the least square error

    ε = ⟨Ψ − Φ | Ψ − Φ⟩                                           (25)

where Ψ is the exact eigenfunction of our Hamiltonian. In order to do this,
we introduce the functions hi(τ):

    ci hi(τ) = Σ_{j=1}^{r} aij gj(τ)                              (26)


where for each i the complex number ci is chosen such that the function
hi(τ) is normalized:

    ⟨hi | hi⟩τ = 1                                                (27)

Introducing (26) into (24), we can write the matrix element (25) in the
following form

    ε = ⟨Ψ|Ψ⟩ − Σ_{i=1}^{r} ci ⟨Ψ|fi hi⟩ − Σ_{i=1}^{r} ci* ⟨fi hi|Ψ⟩ + Σ_{i=1}^{r} |ci|²   (28)

In order to simplify this expression further, we introduce the functions

    pi(τ) = ⟨fi | Ψ⟩x1 = ∫ fi*(x1) Ψ(x1, τ) dx1                   (29)

where integration now takes place only over the first coordinate x1.
After some rearrangement, (28) takes the form

    ε = ⟨Ψ|Ψ⟩ + Σ_{i=1}^{r} ⟨ci hi − pi | ci hi − pi⟩τ − Σ_{i=1}^{r} ⟨pi | pi⟩τ   (30)

In this expression all terms are positive. ε is therefore minimized
if we minimize the second term and maximize the third. Since the functions
gj can be chosen freely, we can make the second term equal to zero by
choosing the functions hi such that

    ci hi(τ) = pi(τ)                                              (31)

Consider now the last sum in (30). If we write it out in detail, we
obtain

    Σ_{i=1}^{r} ⟨pi|pi⟩τ = Σ_{i=1}^{r} ∫ [∫ Ψ(x1, τ) fi*(x1) dx1] [∫ Ψ*(x1', τ) fi(x1') dx1'] dτ

        = Σ_{i=1}^{r} ∫∫ [∫ Ψ*(x1', τ) Ψ(x1, τ) dτ] fi(x1') fi*(x1) dx1 dx1'

The term within the bracket is, apart from the factor n, the exact first order
reduced density matrix (9), and we finally obtain:

    Σ_{i=1}^{r} ⟨pi|pi⟩τ = (1/n) Σ_{i=1}^{r} ∫∫ ρ1(x1'; x1) fi(x1') fi*(x1) dx1 dx1'

or in terms of the natural spin-orbitals λk:


    Σ_{i=1}^{r} ⟨pi|pi⟩τ = (1/n) Σ_{i=1}^{r} Σk nk |⟨λk | fi⟩|²   (32)

This sum takes its largest value when the functions fi are chosen as
the r natural spin-orbitals with the highest occupation numbers. This
choice thus minimizes the least square error (25), and the 'best' finite
expansion of the type (24) is obtained as

    Φ(x1, τ) = Σ_{i,j=1}^{r} aij λi(x1) gj(τ)                     (33)
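In matrix language this is a low-rank approximation theorem: discretize Ψ(x1, τ) on grids and the best rank-r factorization of the resulting matrix uses the eigenvectors of ρ1 with the largest eigenvalues. A toy numerical illustration (a random "wave function" matrix, not a physical system):

```python
import numpy as np

rng = np.random.default_rng(2)

# Discretized "wave function": rows index x1, columns index tau
psi = rng.standard_normal((8, 30))
psi /= np.linalg.norm(psi)                 # <psi|psi> = 1

rho = psi @ psi.T                          # reduced density matrix over x1
occ, nat = np.linalg.eigh(rho)             # eigenvalues in ascending order

def lsq_error(f):
    """Least square error (25) for the best expansion over columns of f."""
    proj = f @ (f.T @ psi)                 # projection of psi onto span{f_i}
    return np.linalg.norm(psi - proj) ** 2

r = 3
best = nat[:, -r:]                         # r natural orbitals, largest occ.
worst = nat[:, :r]                         # r natural orbitals, smallest occ.

# The high-occupation natural orbitals give the smallest error,
# and that error equals 1 minus the sum of the retained occupations
print(lsq_error(best), lsq_error(worst))
```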

The rest of the proof follows directly from the antisymmetry property of
the wave function. Suppose that λk(x1) is a natural spin-orbital orthogonal
to all orbitals present in (33) (i.e. k > r). This gives

    ∫ λk*(x1) Φ(x1, τ) dx1 = 0

Due to the antisymmetry of Φ, we can also write this result as

    ∫ λk*(x2) Φ(x1, x2, ..., xn) dx2 = 0

or

    Σ_{i=1}^{r} ci λi(x1) ∫ λk*(x2) hi(x2, x3, ..., xn) dx2 = 0   (34)

It follows that each integral in this sum must be zero. The functions hi
can be expanded in natural orbitals as follows:

    hi(x2, x3, ..., xn) = Σj bij λj(x2) fj(x3, ..., xn)

If the expansion is inserted into equation (34), it follows that bij = 0
for j > r; and we have

    hi(x2, x3, ..., xn) = Σ_{j=1}^{r} bij λj(x2) fj(x3, ..., xn)

The same procedure can be repeated for all the coordinates, and thus we
finally obtain the following 'optimum' expansion

    Φ(x1, x2, ..., xn) = Σν Cν Φν                                 (35)

which is an expansion of the trial function in configurations built with the
natural spin-orbitals as the one-electron basis set. Consequently, we call
this the natural expansion of the wave function.
We have thus proved that one optimal choice of a finite set of spin-
We have thus proved that one optimal choice of a finite set of spin-


orbitals for a CI-expansion is the natural spin-orbitals with the highest
occupation numbers. This choice gives a CI wave function with maximum
overlap with the true wave function. It does not always lead to the lowest
energy, however. For example, natural orbitals correlating inner shell electrons
have in general lower occupation numbers than those which are used to correlate
the valence electrons. Still, the effect on the correlation energy can be
larger. In most cases there is however a close parallelism between the
energy improvement, obtained by adding a certain natural orbital to the basis
set, and the occupation number for that orbital. Figure 1 illustrates this
principle. The example has been taken from a study of the water molecule.
Fig. 1. Correlation energy for H2O as a function of the number of one-electron
basis orbitals. Curve a is obtained with virtual canonical HF orbitals
ordered after increasing orbital energy, curve b with natural orbitals
ordered after decreasing occupation numbers.

First a set of CI-calculations was performed with canonical HF-orbitals
as basis functions. The virtual orbitals were ordered after increasing
orbital energy and were added one at a time to the basis set. The corresponding improvement in energy is shown by curve a in the figure. This curve is
close to a straight line, showing that each virtual orbital has approximately
the same weight in the CI expansion. The final wave function, including all
the 20 virtual orbitals, was then used to construct natural orbitals. These
orbitals were ordered after decreasing occupation numbers, and a new set of
CI calculations was performed with one natural orbital added at a time.
The result is shown in curve b. As can be seen from the curve, there is a
clear correlation between the occupation number and the energy gain. Actually
the ten first natural orbitals yield 93 % of the total correlation energy
while the ten first virtual orbitals yield only 50 %. It is obvious that the
use of natural orbitals as basis functions considerably improves the convergence of the CI expansion.
Usually the natural orbitals are not accessible from the start, so one
might think that the optimum properties of these orbitals have no practical
implications. There exist, however, methods by which we can obtain
approximate natural orbitals which can be used as basis functions. We will
come back to a discussion of some of these methods later.
One approach, which utilizes the benefits of the natural orbitals, is the
Iterative Natural Orbital (INO) method introduced by Bender and Davidson
(1966). As the name indicates, the natural orbitals are in this method obtained by an iterative procedure according to the following scheme:
1. The basis functions used are the canonical HF orbitals. A set of configurations is selected and a CI calculation is carried out.
2. The density matrix is then calculated using the CI wave function obtained
in the preceding step. This density matrix is diagonalized to obtain
the natural orbitals.
3. These orbitals are used to construct a new set of configurations and a
new CI calculation is carried out.
4. Steps 2 and 3 are repeated until an energy minimum has been reached.
A difficulty with this scheme is that one has to carry out several CI
calculations (usually from three to five) before an energy minimum is
reached. Since each calculation is, as we shall see later, rather time consuming, the INO method easily becomes very expensive. On the other hand,
significant energy improvements can be obtained with rather short expansions.
The method is similar in philosophy to the MC-SCF method. Actually, if
only the HF determinant plus all singly excited configurations are included
in the expansion, the INO method becomes an efficient tool for determining
the HF orbitals.


2.3 The CI expansion


As was pointed out in the previous section, it is usually not possible to
include all determinants which can be formed from a given set of spin-orbitals in the CI expansion. A selection has to be made. In this section
we shall make a classification of the different terms appearing in the
expansion (3) and use this classification to discuss the relative importance
of different terms.
In the following, the one-electron basis will be constructed from a set
of orthonormal functions dependent on the space coordinates of a single
electron, i.e. from an orbital set

    {φ} = {φ1, φ2, ..., φni, φni+1, ..., φni+ne}                  (36)

The orbitals are assumed to be symmetry adapted, i.e. to form a basis for
irreducible representations of the spatial symmetry group of the system.
The orbital set is divided into two subsets: an internal orbital set
{φi ; i = 1, ..., ni} and an external set {φi ; i = ni+1, ..., ni+ne}. This division
of the orbitals is closely related to the classification of the different
terms in the CI-expansion. Orbital sets can be obtained in many different
ways. Usually the CI calculation is preceded by a HF calculation, and the
orbitals can then be chosen as the canonical HF orbitals or some suitable
linear transformation of them (for example: localized orbitals or approximate natural orbitals). If the HF orbitals form the basis set, the internal
set must at least include the occupied orbitals, and the virtual orbitals
(or a part of them) then form the external set.

Example: The water molecule

We shall in this and some of the following sections use the water molecule
as an example. Suppose we start from a HF calculation using a minimal basis
set: a 1s, a 2s and three 2p functions on oxygen, and a 1s function on each
hydrogen atom, seven basis functions in all.

The molecular orbitals are linear combinations of these basis functions
and form a basis for the point group C2v:

    {φ} = {1a1, 2a1, 3a1, 1b2, 1b1, 4a1, 2b2}                     (37)


Of these, in the HF approximation, the first five are occupied and the
remaining two empty. Thus the internal set can be chosen to consist of the
five first members of {φ}.

In other cases the internal set is extended beyond the occupied HF space.
If we, for example, want to make a calculation of the full potential curve
for the F2 molecule (HF configuration (1σg)²(1σu)²(2σg)²(2σu)²(1πu)⁴(1πg)⁴(3σg)²),
we would include at least also the orbital 3σu in the internal set, even
if this orbital is not occupied in the single determinant HF wave function
for the ground state. The reason is that at large FF distances, the orbitals
3σg and 3σu become degenerate, implying that for a correct description
of the wave function at the dissociation limit, we must include a determinant in which 3σg has been replaced with 3σu.
If natural orbitals obtained from a previous study form the orbital set,
the internal orbitals include natural orbitals with large occupation numbers.
The spin-orbital basis is constructed from the orbital set by multiplication with either an α or a β spin function. This set thus includes
2(ni + ne) one-electron functions.
A configuration is now defined as a set of occupation numbers for the
orbitals:

    (n1, n2, ..., n_{ni+ne})                                      (38)

Each of the occupation numbers can take the values 0, 1, or 2; and the
sum of the occupation numbers must be equal to the number of electrons.

Example: H2O again.

The ground state HF configuration has two electrons in each of the five first
orbitals (cf. (37)):

    (2,2,2,2,2,0,0)                                               (39)

A configuration corresponding to the single excitation 3a1 → 4a1 is
characterized by the set of numbers:

    (2,2,1,2,2,1,0)                                               (40)

and the double excitation 3a1 1b2 → 4a1 2b2, by

    (2,2,1,1,2,1,1)                                               (41)

etc.


For each configuration we can in general form

    ∏i (2 choose ni)

Slater determinants by different choices of the spin functions. For the configurations (39)-(41) we can form 1, 4 and 16 Slater determinants respectively.
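This count is easily checked: the product has one factor 2 for every singly occupied orbital and a factor 1 otherwise. A sketch for the three configurations above:

```python
from math import comb, prod

def n_determinants(config):
    """Number of Slater determinants for an orbital occupation tuple."""
    return prod(comb(2, n) for n in config)

for config in [(2,2,2,2,2,0,0), (2,2,1,2,2,1,0), (2,2,1,1,2,1,1)]:
    print(config, "->", n_determinants(config))
# -> 1, 4 and 16 determinants, as stated in the text
```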
The Slater determinants are in general not spin eigenfunctions. In order
to obtain functions which are eigenfunctions of the spin operators, we have
to make a linear transformation of the original set. The new set of functions
are called Configuration State Functions (CSF). The CSF's will thus in
general be linear combinations of Slater determinants.

The configuration (39) contains only one Slater determinant, which is a
spin eigenfunction with the eigenvalues S=0 and MS=0 (singlet).
For the configuration (40) we can construct four determinants
(φ3 = 3a1 and φ6 = 4a1):

    Φ1 = |φ3α φ6α|,  Φ2 = |φ3α φ6β|,  Φ3 = |φ3β φ6α|,  Φ4 = |φ3β φ6β|   (42)

where we have for simplicity omitted the doubly occupied orbitals. There
are many ways in which we can obtain spin eigenstates from the set (42). One
way, which is used in some CI programs, would be to simply diagonalize the Ŝ²
matrix for each value of MS. One could also use the projection operator

    ÔS = ∏_{S'≠S} [Ŝ² − S'(S'+1)] / [S(S+1) − S'(S'+1)]           (43)

which gives zero when operating on spin eigenstates with eigenvalues
different from S(S+1).
Another method, which we will discuss in some detail later, is to construct so called bonded functions.
In the present example (42), we notice that Φ1 and Φ4 are already
spin eigenfunctions with S=1, and MS=1 and −1, respectively. It is also easy
to see (use the step-down operator on Φ1) that

    (1/√2)(Φ2 + Φ3)

is the third triplet state function and has MS=0. The orthogonal state
function

    (1/√2)(Φ2 − Φ3)

is then the singlet state with S=MS=0. It is left as an exercise to the
reader to construct the CSF's for the configuration (41). One finds one
quintet, three triplet and two singlet states.
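The MS=0 part of this construction can be checked numerically. In the basis {Φ2, Φ3}, the operator Ŝ² has the matrix [[1, 1], [1, 1]] in units of ħ² (quoted here as the standard two-electron result, easily derived with the step operators); diagonalizing it separates singlet from triplet:

```python
import numpy as np

# Matrix of S^2 in the basis {Phi2, Phi3} for two electrons in different
# orbitals (standard result, units of hbar^2)
S2 = np.array([[1.0, 1.0],
               [1.0, 1.0]])

eigval, eigvec = np.linalg.eigh(S2)
print(eigval)   # S(S+1) = 0 for the singlet and 2 for the triplet

# The eigenvectors are (Phi2 - Phi3)/sqrt(2) and (Phi2 + Phi3)/sqrt(2),
# i.e. exactly the combinations constructed in the text
```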

In the general case a branching diagram can be used for calculating the
number of spin eigenstates which can be constructed for a system with a
given number of unpaired electrons (cf. fig. 2).

Fig. 2. The branching diagram. The number of spin eigenstates with eigenvalue S for n unpaired electrons is obtained as the sum of the
number of states with S'=S+1/2 and S'=S−1/2 for n−1 unpaired
electrons.
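The branching-diagram rule translates directly into a recursion; a sketch (with the total spin held in units of 1/2 so that only integers appear):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def n_states(n, s2):
    """Number of spin eigenstates with total spin S = s2/2 for n
    unpaired electrons (branching diagram recursion)."""
    if s2 < 0 or s2 > n:
        return 0
    if n == 0:
        return 1 if s2 == 0 else 0
    # states with S come from S - 1/2 and S + 1/2 at n - 1 electrons
    return n_states(n - 1, s2 - 1) + n_states(n - 1, s2 + 1)

# Four unpaired electrons, as in configuration (41):
print([n_states(4, 2 * S) for S in (0, 1, 2)])   # -> [2, 3, 1]
# two singlets, three triplets and one quintet, as found in the exercise
```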


We can now write the CI expansion in terms of CSF's. First we define a
reference state as a set of state functions (all of the same spin and space
symmetry) in which only internal orbitals are occupied. These states are
thus associated with the configurations

    (n1, n2, ..., n_{ni}, 0, 0, ..., 0)
The CI expansion can now be written as:

    Ψ = C0 Φ0 + Σ_{i,a} Ci^a Φi^a + Σ_{i,j} Σ_{a,b} Cij^ab Φij^ab + ...   (44)

Here Φ0 is a linear combination of reference state functions. The
coefficients in Φ0 must in general (when the reference state consists of
CSF's from more than one configuration) be included in the CI vector. The
functions Φij...^ab... are CSF's in which one, two or more orbitals φi, φj, ...
in the reference state have been replaced with new orbitals φa, φb, ...
We call the configurations which have been generated in this way single,
double, etc. replacements (or substitutions). We can call these replacements
internal, semi-internal or external according to whether the set φa, φb, ...
is completely, partly or not at all within the internal subset of our orbital
set {φ}.
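A sketch of how such a truncated expansion is enumerated in practice, counting determinant-level single and double replacements out of a closed-shell reference with itertools (a toy count over spin orbitals, not the symmetry-adapted CSF count used later in the text):

```python
from itertools import combinations
from math import comb

def n_replacements(n_occ, n_virt, level):
    """Number of level-fold replacements of occupied spin orbitals
    (n_occ of them) by virtual spin orbitals (n_virt of them)."""
    return comb(n_occ, level) * comb(n_virt, level)

# Tiny invented example: 4 occupied and 6 virtual spin orbitals
singles = n_replacements(4, 6, 1)
doubles = n_replacements(4, 6, 2)
print(singles, doubles)

# The same numbers obtained by explicit enumeration
occ, virt = range(4), range(4, 10)
sd = [(i, a) for i in occ for a in virt]
dd = [(ij, ab) for ij in combinations(occ, 2) for ab in combinations(virt, 2)]
```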

Example:

For the ground state of H2O a CI expansion could take the form

    (45)

Φ0 is here the HF ground state CSF. The orbitals are numbered according
to their order in the orbital set (37). For simplicity only excitations from
the orbitals 3-5 are included in (45). Notice that only those CSF's which
belong to the irreducible representation A1 of the point group C2v contribute to the wave function. Thus only two single replacement CSF's contribute.
Notice also that there are two singlet CSF's for the configuration
((3a1)(1b2) → (4a1)(2b2)). These are labelled with 1 or 2 as a left upper
index. All the other double replacements correspond to closed shell configurations.


Let us now investigate the relative importance of the different terms in
the expansion (44). Suppose we have included in Φ0 all degeneracy and
near-degeneracy effects so that the rest of the coefficients can be considered
to be small compared to C0. We can then use perturbation theory to estimate
the magnitude of these coefficients. To do so, we construct a zeroth order
Hamiltonian which is a projection of Ĥ onto the subspace spanned by the
CSF's included in (44):

    Ĥ0 = Σμ |μ⟩ ⟨μ|Ĥ|μ⟩ ⟨μ|                                       (46)

where each state |μ⟩ corresponds to one of the CSF's, and |0⟩ to Φ0.
All the CSF's are eigenfunctions of Ĥ0 with eigenvalues ⟨μ|Ĥ|μ⟩.
In the following discussion we will assume that the CSF's are orthogonal to
each other and to Φ0. The perturbation part of the Hamiltonian is defined
as the difference between Ĥ and Ĥ0:

    Ĥ1 = Ĥ − Ĥ0                                                   (47)

Φ0 is taken as the zeroth order wave function Ψ(0); higher order contributions are expanded in the set of CSF's:

    Ψ(n) = Σ_{μ≠0} Cμ(n) |μ⟩                                      (48)

Using n'th order Rayleigh-Schrödinger perturbation theory, Ψ(n) is obtained from the equation

    (Ĥ0 − E0) Ψ(n) = −(Ĥ1 − E1) Ψ(n−1) + Σ_{k=2}^{n} Ek Ψ(n−k)    (49)

where En, the energy of order n, is obtained as

    En = ⟨Ψ(0) | Ĥ1 | Ψ(n−1)⟩                                     (50)

By inserting the expansion (48) into (49), we obtain for the coefficients
Cμ(n)

    (⟨μ|Ĥ0|μ⟩ − ⟨0|Ĥ0|0⟩) Cμ(n) =
        Σ_{k=1}^{n−2} E_{n−k} Cμ(k) − Σν (⟨μ|Ĥ1|ν⟩ − E1 δμν) Cν(n−1)   (51)

and for the energy,


    En = Σμ Cμ(n−1) ⟨0|Ĥ1|μ⟩                                      (52)

For the first order coefficients and the second order energy, we obtain
the usual expressions

    Cμ(1) = − ⟨μ|Ĥ1|0⟩ / (⟨μ|Ĥ0|μ⟩ − ⟨0|Ĥ0|0⟩)                    (53)

    E2 = − Σμ |⟨0|Ĥ1|μ⟩|² / (⟨μ|Ĥ0|μ⟩ − ⟨0|Ĥ0|0⟩)                 (54)

Notice that the diagonal elements of Ĥ and Ĥ0 are equal, as are the non-diagonal elements of Ĥ and Ĥ1.
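Expressions (53) and (54) can be exercised on a small model Hamiltonian matrix (the numbers below are invented for illustration), comparing the second order energy with exact diagonalization:

```python
import numpy as np

# A small model Hamiltonian in an orthonormal CSF basis (made-up numbers,
# diagonally dominant so that perturbation theory is applicable)
H = np.array([[-1.00, 0.05, 0.02],
              [ 0.05,  0.50, 0.00],
              [ 0.02,  0.00, 0.80]])

# H0 keeps only the diagonal, cf. eq. (46); H1 is the rest, cf. eq. (47)
E0 = H[0, 0]
E2 = -sum(H[0, m] ** 2 / (H[m, m] - E0) for m in range(1, 3))  # eq. (54)

E_exact = np.linalg.eigvalsh(H).min()
print(f"E0+E2 = {E0 + E2:.5f},  exact = {E_exact:.5f}")
```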
To first order in perturbation theory, only those CSF's which have a
non-zero matrix element with at least one of the CSF's in the reference state
thus have a non-vanishing coefficient in the CI expansion. The part of the
configuration space which fulfills this condition is called the first order
interacting space. The Hamiltonian Ĥ has the form

    Ĥ = Σ_{i=1}^{n} h(i) + (1/2) Σ_{i≠j} g(i,j)                   (55)

where h(i) is a one-particle operator and g(i,j) a two-particle operator.

Matrix elements of this Hamiltonian between Slater determinants vanish if


the two determinants differ in more than two spin-orbitals. Thus the first order interacting space will consist of, at most, all single and double replacements with respect to the reference state. It is possible to reduce this space further, since one can in general make linear combinations of the CSF's of a given configuration so that only a part of these new state functions interact with the reference state (McLean, 1973). The singly excited configurations are excluded from the first order interacting space if the reference state is a closed shell HF configuration (Brillouin's theorem). However, there are only a few singly excited configurations so they are usually included anyway.
These configurations are also of special importance since the first order
density matrix contains terms linear in their coefficients in the CI expansion.
They can therefore be expected to have a larger effect on the natural orbitals
than on the correlation energy.
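In a program this selection rule is a simple set comparison: representing each determinant by the set of its occupied spin-orbital indices, the excitation degree is the number of spin-orbitals by which the two sets differ. A minimal sketch (the index labels are invented):

```python
def excitation_degree(det1, det2):
    """Number of spin-orbitals occupied in det1 but not in det2."""
    return len(set(det1) - set(det2))

def may_interact(det1, det2):
    """A Hamiltonian containing only one- and two-particle operators,
    eq. (55), couples two determinants only if they differ in at most
    two spin-orbitals."""
    return excitation_degree(det1, det2) <= 2

reference = (0, 1, 2, 3)   # occupied spin-orbitals of the reference
double    = (0, 1, 4, 5)   # double replacement: 2,3 -> 4,5
triple    = (0, 4, 5, 6)   # triple replacement: outside the space
print(may_interact(reference, double), may_interact(reference, triple))
# prints: True False
```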
From these last considerations we can conclude that a CI expansion consisting of an appropriately chosen reference state plus all single and double replacements with respect to this state ought to give a good description of the wave function. Experience also shows that around 90 % of the correlation energy can be obtained with CI expansions of this type. However,

271

THE CONFIGURATION INTERACTION METHOD

even this truncated CI expansion can become very long. We shall later discuss methods which can handle expansions with a length of the order of 50 000 terms; but even this number is easily exceeded for larger systems (as an
example: with ni = 10 and ne = 56 a total of 56 268 singly and doubly
excited CSF's was obtained in a study of the water dimer). We must therefore
look for methods to further reduce the length of the expansion. We have
already mentioned some possible ways to reduce the number of orbitals in the
external set by the use of approximate natural orbitals. One way to obtain
such orbitals is to calculate the density matrix by means of first order
perturbation theory. The CI coefficients are obtained by the simple formula (53), and the density matrix elements are obtained by inserting these coefficients into equation (10).
Another frequently used method to reduce the number of terms in the CI
expansion is to include only those configurations which contribute to the
second order energy (54) with an amount larger than a given threshold. An
extension of this method is the CIPSI method (Huron et al., 1973). The
procedure used in this method can be briefly outlined in the following way:
1. The first order coefficients C_μ^(1) are calculated.
2. A subspace S^(1) is defined which includes the CSF's with C_μ^(1) larger than a given threshold; and the corresponding CI matrix is diagonalized.
3. The wave function Ψ^(1) obtained in this way is taken to be the new reference state. This wave function is again perturbed under the influence of the CSF's not included in S^(1). The most important of these new CSF's are added to S^(1), giving a larger subspace S^(2).
4. Steps 2 and 3 are repeated until the quantity

|⟨Ψ^(n)|Ĥ|Ψ^(n)⟩ + E₂^(n) − ⟨Ψ^(n−1)|Ĥ|Ψ^(n−1)⟩ − E₂^(n−1)|

is small enough. Here E₂^(n) and E₂^(n−1) are the second order energy contributions from CSF's not included in Ψ^(n) and Ψ^(n−1), respectively.
In this method the most important CSF's are treated variationally, while
contributions from the others are obtained by means of perturbation theory.
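The selection loop can be sketched on a small explicit matrix (all numbers invented; a real program of this kind generates the CSF's and their matrix elements on the fly rather than storing H):

```python
import numpy as np

def select_and_perturb(H, thresh=0.05, max_dim=20):
    """Iteratively enlarge a variational subspace S with the CSF's whose
    first order coefficients exceed thresh; the remaining CSF's are
    treated by second order perturbation theory."""
    n = H.shape[0]
    S = [0]                                     # start from the reference CSF
    while True:
        rest = [m for m in range(n) if m not in S]
        w, v = np.linalg.eigh(H[np.ix_(S, S)])  # diagonalize H in the subspace
        E_var, c = w[0], v[:, 0]
        coef = {m: -sum(c[i] * H[m, S[i]] for i in range(len(S)))
                   / (H[m, m] - E_var) for m in rest}
        E2 = -sum(sum(c[i] * H[m, S[i]] for i in range(len(S))) ** 2
                  / (H[m, m] - E_var) for m in rest)
        new = [m for m in rest if abs(coef[m]) > thresh]
        if not new or len(S) + len(new) > max_dim:
            return E_var, E2, sorted(S)
        S += new

H = np.diag([-1.0, 0.5, 1.0, 2.0])
H[0, 1] = H[1, 0] = 0.10
H[0, 3] = H[3, 0] = 0.20
H[1, 2] = H[2, 1] = 0.05
E_var, E2, S = select_and_perturb(H)
print(E_var, E2, S)
```

The subspace energy E_var is an upper bound to the exact lowest root, and E_var + E2 estimates the contribution of the CSF's left outside the variational space.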
Another limited type of CI wave function is the first order function (not
to be confused with the first order interacting space), developed by
Schaefer et al. (1969). The internal orbital set includes in this case all
valence shell orbitals and the reference state consists of all CSF's (of a
given symmetry) which can be generated from the internal set. In this way all
degeneracy and near degeneracy effects are automatically included into the
reference state. In addition to the reference state CSF's, all single


replacement configurations are, in principle, included. A wave function of


this type does not give a large portion of the total correlation energy
but seems in many cases to be able to extract a large fraction of the
structure dependent part of the correlation energy (e.g. the correlation
contribution to binding energies, etc.). Calculations using first order wave functions are usually made with the INO method, which ensures that the orbital set is always as optimal as the basis set allows.

Example:

The valence shell orbitals for the N₂ molecule are

2σg, 2σu, 3σg, 1πu, 1πu′, 1πg, 1πg′, 3σu

(In general the valence shell orbitals are those which can be constructed from a minimal atomic orbital basis set.) In the HF approximation only the first seven orbitals are occupied. If we, for simplicity, include only excitations out of 3σg and 1πu, we obtain the following reference state configurations:
(3σg)² → (3σu)²
(3σg)² → (1πg)²
(1πu)² → (3σu)²
(1πu)² → (1πg)²
(3σg)(1πu) → (3σu)(1πg)
(3σg)²(1πu)² → (3σu)²(1πg)²
(3σg)(1πu)³ → (3σu)(1πg)³
(1πu)⁴ → (3σu)²(1πg)²
(1πu)⁴ → (1πg)⁴
(3σg)²(1πu)⁴ → (3σu)²(1πg)⁴

All these configurations include ¹Σg⁺ state functions which are necessary for a correct description of the wave function for two separated nitrogen

atoms. Notice that one of the configurations corresponds to a sextuple replacement. A first order wave function would include all these configurations plus single replacements from the orbitals 3σg, 3σu, 1πu and 1πg into orbitals not included in the valence shell.

First order wave functions have been used to calculate potential energy
surfaces and also to study atomic hyperfine structure. The single replacements in the latter case account for the polarization of inner shell orbitals
in open shell systems which play an important role in the determination of
the hfs parameters.

2.3 Pair Correlation Energies, the IEPA and PNO-CI Methods

We mentioned in the introduction that the major part of the correlation


energy is due to a pairwise correlation of the electronic motion. In this
section we shall investigate how this fact can be used to simplify the calculation of correlation energies.
Consider a CI expansion of the wave function to first order in perturbation theory. According to (53) only those CSF's contribute which have a non-vanishing matrix element with the reference state function. To simplify, let us assume that the reference state is a closed shell HF determinant. The expansion can then be written as a sum over doubly excited configurations only:

Ψ = Φ₀ + Σ_P Σ_{a,b} C_P^{ab} Φ_P^{ab}                           (56)

where Φ_P^{ab} is the state function corresponding to an excitation of the electron pair P into the virtual orbitals φ_a and φ_b. There are three distinct types of pairs P. If the two electrons belong to the same molecular orbital φ_i, they couple to a singlet; but if they belong to different orbitals φ_i and φ_j, both a singlet and a triplet pair can be formed.
The following types are obtained:

Intraorbital singlet pairs,
Interorbital singlet pairs,
Interorbital triplet pairs.

These electron pairs are called spin irreducible since they have the correct spin symmetry. The orbitals φ_a and φ_b now couple to the hole state P to form a singlet state Φ_P^{ab}. For φ_i different from φ_j we can then form two doubly excited CSF's.


The energy to second order is, according to equation (52), obtained as

E₂ = Σ_P Σ_{a,b} C_P^{ab} ⟨Φ₀|Ĥ|Φ_P^{ab}⟩                        (57)

or, according to (54), as

E₂ = − Σ_P Σ_{a,b} |⟨Φ₀|Ĥ|Φ_P^{ab}⟩|² / (⟨Φ_P^{ab}|Ĥ|Φ_P^{ab}⟩ − ⟨Φ₀|Ĥ|Φ₀⟩)        (58)

It follows from the equations above that the correlation energy to second order in perturbation theory can be written as a sum over pair correlation energies ε_P:

E_corr = Σ_P ε_P                                                  (59)

where

ε_P = Σ_{a,b} C_P^{ab} ⟨Φ₀|Ĥ|Φ_P^{ab}⟩                           (60)
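In code the decomposition (59)-(60) is just a regrouping of the second order sum by pair index; the coefficients and matrix elements below are invented for the illustration.

```python
# C[P][(a,b)]  : coefficients C_P^ab                 (invented numbers)
# H0[P][(a,b)] : matrix elements <Phi_0|H|Phi_P^ab>  (invented numbers)
C  = {"(1,1)": {(1, 1): -0.030, (1, 2): -0.010},
      "(1,2)": {(1, 1): -0.020, (2, 2): -0.005}}
H0 = {"(1,1)": {(1, 1):  0.400, (1, 2):  0.100},
      "(1,2)": {(1, 1):  0.250, (2, 2):  0.200}}

# eq. (60): pair correlation energies; eq. (59): their sum
eps = {P: sum(C[P][ab] * H0[P][ab] for ab in C[P]) for P in C}
E_corr = sum(eps.values())
print(eps, E_corr)
```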

We expect this to be a good approximation for the correlation energy when the coefficients C_P^{ab} are small (no degeneracy or near degeneracy effects).

(Formally we can in the closed shell case always write the correlation energy in the form (57). The first row (μ = 0) of the equation system (6) gives the total energy as

E = H₀₀ + Σ_{μ≠0} C_μ H₀μ                                        (61)

where we have put C₀ equal to unity. H₀₀ is the HF energy, so the correlation energy is obtained as
where we have put Co equal to unity. HOO is the HF energy so the correlation energy is obtained as
E_corr = E − H₀₀ = Σ_P Σ_{a,b} C_P^{ab} ⟨Φ₀|Ĥ|Φ_P^{ab}⟩          (62)

since only the double replacement states will contribute when Φ₀ is taken as the HF wave function. However, even if the equations (62) and (57) look identical, there is an important difference. The coefficients in (62) have been obtained from the secular equation (6) with use of the full expansion (44). These coefficients then implicitly include higher order terms, thus taking into account orbital effects (single replacement states), pair-pair interactions, etc.).
As was mentioned earlier the calculation of the correlation energy with
the CI method gives rise to difficult computational problems due to the size


of the CI expansion. Many alternate routes to reduce the problem have therefore


been attempted.
The exact wave function can be written in the following form:

Ψ = Φ₀ + Σ_i U_i + Σ_{i,j} U_ij + Σ_{i,j,k} U_ijk + …            (63)

where U_i, U_ij, etc. are functions in which one, two, etc. spin-orbitals are replaced by arbitrary one-, two-, etc., electron functions, for example

(64)

This expansion, the so called cluster expansion of the wave function, was introduced by Sinanoglu (1963); and in a somewhat different form, by Nesbet (1958). Considering only the pair correlation functions U_ij, Sinanoglu, by applying the variation principle, derived an effective two-electron Schrödinger equation for U_ij:

(65)

The total energy is then obtained as the HF energy plus the sum of all pair correlation energies ε_ij. The many-electron problem is thus reduced to ½n(n−1) two-electron problems, that is, each pair correlation is treated separately.
Equation (63) becomes identical to a CI expansion of the wave function if the cluster functions are expanded in terms of configurations. Nesbet used this result to formulate the many-electron problem as a hierarchy of Bethe-Goldstone equations. In the first step each pair of electrons is considered separately with a trial function of the following type:
Ψ_ij = Φ₀ + Σ_a C_i^a Φ_i^a + Σ_b C_j^b Φ_j^b + Σ_{a,b} C_ij^{ab} Φ_ij^{ab}        (66)

Thus Ψ_ij includes only excitations out of the pair (i,j). The difference between the energy obtained with this trial function and the HF energy is defined as the pair correlation energy. The total energy is again obtained as a sum of the pair energies. In the next step, if carried out, an expansion of the form (66) is considered for each triplet of electrons, and so on. The final equation is, of course, identical to a full CI expansion. The power of the method lies in the fact that higher order contributions very quickly become small and can be neglected.

A similar method, called the Pseudo Natural Orbital - Independent Electron Pair Approximation (PNO-IEPA), has been developed by Kutzelnigg and co-workers


(Ahlrichs and Kutzelnigg, 1968). The pair correlation energies are here obtained by an expansion of the pair function Ψ_ij in terms of pseudonatural orbitals. The PNO's are the orbitals which diagonalize the first order density matrix for each pair function. The CI expansion in terms of these orbitals takes an especially simple form (Löwdin and Shull, 1956):

(67)

Thus for singlet pairs (S), only diagonal terms enter the expansion, while for triplet pairs the natural orbitals occur in degenerate pairs (b,b′).
The length of the expansion is now reduced from ne(ne+1)/2 terms (ne being the size of the external space) to ne terms. Obviously the problem of determining the CI coefficients for each pair function has been made much easier. The PNO's are obtained by applying the variational principle to the expression for the total energy. This leads, after some further approximations, to a pseudo eigenvalue equation which determines the PNO's.
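The central step, diagonalizing the first order density matrix of a pair function and truncating on the occupation numbers, can be sketched as follows (the density matrix below is invented for the illustration):

```python
import numpy as np

# First order density matrix of one pair function in the external orbital
# basis (symmetric; numbers invented).
D = np.array([[0.0200, 0.0040, 0.0000],
              [0.0040, 0.0080, 0.0010],
              [0.0000, 0.0010, 0.0002]])

occ, U = np.linalg.eigh(D)            # eigenvalues in ascending order
order = occ.argsort()[::-1]           # sort by occupation number, descending
occ, U = occ[order], U[:, order]

threshold = 1.0e-3
pnos = U[:, occ > threshold]          # natural orbitals kept in the expansion
print(occ, pnos.shape[1])
```

The trace of D (the occupation sum) is preserved by the diagonalization, and orbitals with negligible occupation numbers are simply dropped from the CI expansion.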
The independent electron pair approximation, characterized as a method in which the correlation energy is obtained as a sum of pair correlation energies, has however certain features which restrict its practicability. Most important, an upper bound to the exact energy is not obtained. Thus, it might happen that more than 100 % of the correlation energy is obtained in an accurate calculation. Recent experience also seems to indicate that the IEPA overestimates the variation of the correlation energy with structural changes of the system (e.g. bond distances become too long and force constants too small).
An interesting extension of the IEPA-PNO method, called the PNO-CI method, has recently been proposed (Meyer, 1973). He writes the total wave function in the following form (for simplicity we only consider the closed shell case):

Ψ = Φ₀ + Σ_i Σ_{a_ii} C_ii^a Φ_ii^a + Σ_{i<j} [ Σ_{a_ij} ^S C_ij^a ^S Φ_ij^a + Σ_{b_ij} ^T C_ij^b ^T Φ_ij^b ]        (68)

where the sums over a_ii, a_ij and b_ij extend over the PNO's for the intraorbital pair (i,i) and the interorbital singlet and triplet pairs (i,j) (for simplicity the indices ii and ij have been left out where a and b appear as upper indices). There are important differences between this type of CI expansion and the one we have discussed before. Since pseudonatural orbitals form the orbital set, only diagonal substitutions occur. However, this is only possible with non-orthogonal orbitals. There is one set of PNO's for each pair (i,j). Each set is orthonormalized:

⟨a_ij | b_ij⟩ = δ_ab                                              (69)

The correlation orbitals for different pairs (i,j) and (k,l) are in general
not orthogonal, however. The orthogonality relations (69) are sufficient to
guarantee the orthogonality between the different configurations occurring
in (68). The calculation of the matrix elements is therefore only slightly
more complicated than in the case with a completely orthonormalized basis set.
The diagonal form of the CI expansion is exact only if the PNO's are obtained from an over-all CI calculation, but such a procedure would make the whole method meaningless. Meyer has however shown that the PNO's used in IEPA are good approximations to the exact PNO's. The computational procedure in the PNO-CI method can be summarized as follows:
1. The coefficients C_P^{ab} are calculated for each pair P by a perturbational procedure. The pair functions Ψ_P are set up and the corresponding density matrix is diagonalized to obtain the PNO's.
2. A CI expansion of the type (68) is formed using the PNO's with occupation numbers larger than a given threshold.
3. The CI secular equation is solved to obtain the correlation energy and the CI coefficients.
There are several computational advantages with this method. The CI expansion becomes orders of magnitude shorter when only diagonal substitutions occur. The number of two-electron integrals necessary for the calculation of the matrix elements is also drastically reduced, which simplifies the transformation considerably. This simplification is however partly counterbalanced by the somewhat more complicated forms of the matrix elements, which is due to the use of non-orthogonal orbitals. The main drawback of the method seems to be the difficulty of extending it to general spin symmetries.

The PNO-CI method has been tested in calculations on a number of hydrides. For CH₄, 83 per cent of the correlation energy was obtained. Studies on a number of diatomic hydrides yielded bond distances, force constants and binding energies in very good agreement with experiment, the errors being less than 0.005 Å, 100 cm⁻¹ and 0.3 eV, respectively.


3. Computational Procedures

3.1 General Structure of the CI Program

In this section some of the computational problems encountered in a CI


calculation will be discussed. An outline of the general structure of a CI
program will be given and certain parts of this program will be taken up in
some detail. We shall concentrate on the type of CI expansion discussed in
section 2.2, that is, an expansion in configurations built from an orthonormal orbital set.
The over-all flow chart for a CI program usually has the structure shown
in figure 3. The one-electron functions are almost always expanded in a set
of basis functions taken as Slater, or contracted Gaussian, atomic orbitals.
In a first step all necessary one- and two-electron integrals over these basis
functions are computed. The second step involves a calculation of the orbital
set in terms of the atomic basis functions:

φ_i = Σ_μ c_iμ χ_μ                                                (70)

where χ_μ are the basis functions, and the coefficients c_iμ are to be
determined. This part of the program usually contains an SCF or an MC-SCF package. The SCF (or MC-SCF) orbitals are often used themselves as basis orbitals in the CI calculation. Sometimes unitary transformations are performed in order to obtain, for example, localized orbitals. As has already been pointed out, CI expansions based on canonical SCF orbitals are usually slowly convergent. Step 2 therefore often includes routines for calculating approximate natural orbitals, for example, by using first order perturbation theory.

Once the orbital set has been chosen, all one- and two-electron integrals have to be transformed to this new basis set. This transformation is computationally a very complicated problem which only recently has been solved in an efficient way (Yoshimine, 1973; Diercksen, 1974). With present day computer facilities, the transformation problem sets a limit to the size of the orbital set (ni+ne) at around 100 functions. The transformation process is discussed in detail later.
In step 4 a selection of configurations is made. Not all programs include
this part. In some cases all configurations of a given type are included;
in other cases, the selection is made by hand, and a list of configurations
is given as input. The principles behind the selection of the most important configurations have been discussed earlier and will not be treated further here.

1. Calculation of integrals over atomic basis orbitals
2. Orbital set construction (SCF, MC-SCF, approximate NO's etc.)
3. Integral transformation program
4. Configuration selector
5. Symbolic matrix element generator
6. Calculation of numerical matrix elements
7. Matrix diagonalization routines
8. Wave function analysis (pair correlation energies, natural orbitals, etc.)

Fig. 3  Flow chart of a CI program.

In step 5 the matrix elements of the Hamiltonian are obtained in symbolic form. Each matrix element ⟨Φ_μ|Ĥ|Φ_ν⟩ is reduced to a list of symbols identifying the matrix element and giving references to the one- and two-electron integrals contributing to it. Usually the list of symbolic matrix
elements is kept on some peripheral storage (disk or magnetic tape). Step 5
is the most time consuming part of the actual CI-calculation. From the
symbolic matrix elements, numerical matrix elements are formed in step 6.
The symbolic matrix element list can be used to construct numerical matrix
elements for several problems (e.g. for calculations on different points on a
potential curve) as long as the molecular symmetry and the type of configurations and orbitals are left unchanged. The lowest root or roots of the secular
equation are obtained in step 7. The size of the equation usually prevents
a direct diagonalization of the full CI matrix. Instead, only one or a few
roots are obtained by means of iterative or perturbative methods. Finally the
wave function is analyzed in step 8. The first order density matrix is
computed and diagonalized. The resulting natural orbitals can then be used
for studies of electron densities, molecular properties, etc. In the following sections some important parts of the general procedure will be discussed
in more detail.
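Step 7 above, extracting only the lowest root iteratively, can be sketched with a bare-bones Davidson-type update; the toy matrix here is dense and explicit, whereas a production CI program only ever forms the matrix-vector product Hc.

```python
import numpy as np

def lowest_root(H, tol=1e-8, max_iter=100):
    """Lowest eigenpair of a symmetric matrix by a simple Davidson
    iteration, the kind of scheme used for large CI matrices."""
    n = H.shape[0]
    d = np.diag(H)
    b = np.zeros(n)
    b[np.argmin(d)] = 1.0                 # start from the lowest diagonal
    V = b[:, None]                        # orthonormal search subspace
    for _ in range(max_iter):
        S = H @ V
        w, y = np.linalg.eigh(V.T @ S)    # small subspace eigenproblem
        lam, c = w[0], y[:, 0]
        x = V @ c                         # current eigenvector estimate
        r = S @ c - lam * x               # residual vector
        if np.linalg.norm(r) < tol:
            break
        denom = lam - d
        denom[np.abs(denom) < 1e-8] = 1e-8
        t = r / denom                     # Davidson update vector
        for _ in range(2):                # orthogonalize against V (twice)
            t -= V @ (V.T @ t)
        t /= np.linalg.norm(t)
        V = np.hstack([V, t[:, None]])
    return lam, x

# toy example (invented numbers): diagonally dominant symmetric matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30))
M = (A + A.T) / 2 + 2.0 * np.diag(np.arange(30.0))
lam, x = lowest_root(M)
print(lam)
```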

3.2 The Integral Transformation Program


The transformation of two-electron integrals from a basis of atomic orbitals to the orbital basis used in the CI wave function is a difficult computational problem. Efficient programs are however available and have been implemented in some CI programs (e.g. ALCHEMY, MUNICH, MOLECULE). Here we shall
discuss some features of the transformation program used in the program
system MOLECULE. The general method has been taken from the work of
Yoshimine and Diercksen. The basic ideas are therefore the same as in the
program systems ALCHEMY and MUNICH.
We define a two-electron integral over the atomic basis functions as

(μν|κλ) = ∫ χ_μ(1) χ_ν(1) g(1,2) χ_κ(2) χ_λ(2) dV₁ dV₂           (71)

and over the molecular orbitals as

(ij|kl) = ∫ φ_i(1) φ_j(1) g(1,2) φ_k(2) φ_l(2) dV₁ dV₂            (72)


where the orbitals φ are defined according to (70). The integrals (72) are obtained through a four index transformation of the integrals (71),

(ij|kl) = Σ_{μνκλ} c_iμ c_jν c_kκ c_lλ (μν|κλ)                   (73)

A direct transformation according to (73) implies that on the order of m⁸ multiplications and additions would have to be performed, where m is the basis set size. Consider, however, a stepwise transformation according to

(μν|κl) = Σ_λ c_lλ (μν|κλ)                                        (74)

(μν|kl) = Σ_κ c_kκ (μν|κl)                                        (75)

(μj|kl) = Σ_ν c_jν (μν|kl)                                        (76)

(ij|kl) = Σ_μ c_iμ (μj|kl)                                        (77)

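The four quarter transformations (74)-(77) can be written down directly for dense in-core arrays (a sketch only; the programs described here of course process the integrals in ordered batches rather than holding everything in core):

```python
import numpy as np

def four_index_transform(ao, C):
    """Transform AO integrals (mu nu|kappa lambda) to the MO basis by the
    four successive one-index contractions (74)-(77).  C[mu, i] holds the
    MO coefficients c_{i mu} of eq. (70)."""
    t = np.einsum('lq,mnkl->mnkq', C, ao)    # (74): last index
    t = np.einsum('kp,mnkq->mnpq', C, t)     # (75): third index
    t = np.einsum('nj,mnpq->mjpq', C, t)     # (76): second index
    return np.einsum('mi,mjpq->ijpq', C, t)  # (77): first index

m = 4
rng = np.random.default_rng(1)
ao = rng.standard_normal((m, m, m, m))
C = rng.standard_normal((m, m))
mo = four_index_transform(ao, C)

# check against the direct (and far more expensive) m^8 contraction (73)
direct = np.einsum('mi,nj,kp,lq,mnkl->ijpq', C, C, C, C, ao)
print(np.allclose(mo, direct))
```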
The transformation is now reduced to four m⁵ problems. In practice the transformation is carried through in two steps. First the transformations (74) and (75) are performed, and the resulting integrals (μν|kl) are stored on some peripheral memory device. Essentially the same procedure is then repeated for (76) and (77). The atomic integrals (μν|κλ) are generally stored on a magnetic tape in arbitrary order. Each integral is for identification tagged with a four index label. The number of such integrals is usually in the range 10⁵-10⁶, which is far beyond the capacity of any core storage. In order to perform the steps (74) and (75), one therefore likes to have this data list ordered so that elements with the same pair index μν come together. The necessary reordering is performed by means of direct access memory devices, and the process therefore works only on computers which have this facility. The reordering of the integrals and the subsequent transformations are performed as follows:
1. The basic integral list is read through, and the number of non-zero integrals is calculated for each pair index μν. These numbers are called n_μν (notice that the integral (μν|κλ) contributes both to n_μν and n_κλ if μν ≠ κλ).
2. Supposing it is possible to store K integrals simultaneously on the direct access memory, one puts MIN_μν = 1 and chooses MAX_μν to be the smallest number that gives

Σ_{μν = MIN_μν}^{MAX_μν + 1} n_μν > K                             (78)

282

BJORN ROOS

Integrals with pair indices μν in the range MIN_μν - MAX_μν can then be stored simultaneously on the direct access device. In a first cycle it is then possible to obtain the half-transformed integrals (μν|kl) for μν in this range.
3. Supposing that N words are available in the core storage, the core storage is divided into blocks, each with a size of m(m+1)/2, where m is the size of the basis χ_μ (μ = 1, 2, ..., m). Each such block can then store all basic integrals with a given value of μν. There will be M such blocks available in core, where

M = [ 2N / m(m+1) ]

4. The integrals are then grouped together so that each group contains M pair indices μν. There will be

N_g = [ (MAX_μν - MIN_μν) / M ] + 1

such groups:

group 1:    μν = MIN_μν to (MIN_μν + M - 1)
group 2:    μν = (MIN_μν + M) to (MIN_μν + 2M - 1)
...
group N_g:  μν = (MIN_μν + M(N_g - 1)) to MAX_μν

Each such group of integrals can be stored in core simultaneously, and it is then possible to perform the transformations (74) and (75) for each group of indices μν. The problem is only to order the integrals in such groups.
5. The core is divided into N_g records. Each record contains integrals belonging to one of the groups given above. The integrals are read into core, identified with respect to the pair index μν and distributed with their index labels into the appropriate records. The records have the following structure:

integrals (NSIZE words) | indices (NSIZE words) | LSIZE (1 word) | IADR (1 word)

where LSIZE is the actual number of integrals in this record (LSIZE ≤ NSIZE). IADR is from the beginning set equal to -1. When a record is full, it is written onto the direct access memory. IADR is then changed to contain the starting address for this record. In this way all integrals with pair indices in the range MIN_μν - MAX_μν are grouped together and distributed onto the direct access device. The last N_g values of IADR are kept in core.


6. The integrals can now be read into core again, group by group. The reading process has been named back-chaining since, starting with one of the IADR values kept in core, each record read contains the address of the next record of integrals which belongs to the same group. The reading process for a given group is terminated when IADR = -1.
7. The transformation steps (74) and (75) are performed in core for each group of integrals, and the resulting integrals (μν|kl) are written onto peripheral storage sequentially.
8. MIN_μν is set equal to MAX_μν + 1, and a new value for MAX_μν is computed using equation (78). The whole process is repeated until MAX_μν ≥ m(m+1)/2 (the total number of pairs μν).
9. Starting from the list of half-transformed integrals (μν|kl), the whole procedure is repeated for the transformation steps (76) and (77).

The process requires a minimum amount of I/O time. If the total number of non-zero integrals is NINT, the integral tape has to be read [NINT/K] + 2 times. For most computers this number can be kept in the range 1-10. The process is also not very sensitive to the size of the core storage. A decrease in storage only means that the number of pair indices μν within each group of integrals becomes smaller. Thus N_g increases, and the size of each record is smaller; due to the increased number of accesses, this procedure increases the I/O time somewhat but does not affect the CPU time.
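The back-chaining of steps 5-6 is simply a linked list threaded backwards through the direct access file; the following sketch mimics it with a Python list standing in for the file (the record layout follows the text, all data and the two-group setup are invented):

```python
# Simulated direct access file: each "record" is (integrals, indices, IADR),
# where IADR points at the previously written record of the same group
# and -1 terminates the chain.
disk = []
last_addr = {g: -1 for g in range(2)}     # one chain head per group

def write_record(group, integrals, indices):
    disk.append((integrals, indices, last_addr[group]))
    last_addr[group] = len(disk) - 1      # new chain head for this group

def read_group(group):
    """Back-chain: follow the IADR pointers until -1 is reached."""
    out, addr = [], last_addr[group]
    while addr != -1:
        integrals, indices, addr = disk[addr]
        out.extend(integrals)
    return out

write_record(0, [1.0, 2.0], [(0, 0), (0, 1)])
write_record(1, [3.0], [(1, 1)])
write_record(0, [4.0], [(0, 2)])
print(read_group(0), read_group(1))
```

Note that the chain is read in reverse order of writing, which is harmless here since the subsequent transformation only needs all integrals of a group together, not in any particular order.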

3.3 Matrix Element Calculation - Bonded Functions


In order to carry through a CI calculation, a method for defining the set of spin-symmetrized configuration state functions used in the CI expansion must first be found. An algorithm must then be derived to determine the matrix elements between the CSF's in terms of one- and two-electron integrals over the basis orbital set. As was pointed out earlier, many different techniques can be used to obtain the CSF's with given values of S and M for each configuration. One frequently used method employs bonded functions (Boys et al., 1956; for further details, see also Sutcliffe, 1966 and Reeves, 1966). A bonded function (BF) corresponding to p spin-coupled pairs and n-2p unpaired orbitals is defined as follows:

Φ = A { [φ₁ φ₂][φ₃ φ₄] ⋯ [φ_{2p-1} φ_{2p}] φ_{2p+1} ⋯ φ_n }       (79)

where the spin-coupled pairs are defined as


(80a)
(80b)

and the unpaired orbitals as

φ_i = φ_i(i) α(i)                                                 (80c)

As is easily seen the bonded function is an eigenfunction of Ŝ² and Ŝz with the eigenvalues S = ½(n-2p) and M = S. A complete set of linearly independent bonded functions with the same eigenvalues S and M is defined as a canonical set of BF's and is generated in the following way:

1. In each BF, identical orbitals must be spin-coupled.


2. The remaining left and right brackets are assigned to the non-repeated
orbitals. The brackets are assigned one to each orbital in all possible
ways such that there is always at least one more left bracket to the left
of any right bracket than there are right brackets to the left of that
bracket.
The second rule, which at first sight appears to be somewhat complicated, simply states that there should be no right brackets which cannot be paired with a left bracket. After the brackets have been assigned, the structure is reordered so that spin-coupled pairs are grouped together and unpaired orbitals appear to the right.
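Rule 2 is precisely the prefix condition on balanced bracket sequences, so the canonical assignments for k non-repeated orbitals (all of them paired, for simplicity) can be enumerated by brute force; a sketch:

```python
from itertools import combinations

def canonical_bracketings(k):
    """All assignments of k//2 left and k//2 right brackets to k
    non-repeated orbitals such that every right bracket can be paired
    with an earlier left bracket (rule 2)."""
    result = []
    for lefts in combinations(range(k), k // 2):
        seq = ['[' if i in lefts else ']' for i in range(k)]
        depth, ok = 0, True
        for s in seq:
            depth += 1 if s == '[' else -1
            if depth < 0:          # a right bracket with no partner
                ok = False
                break
        if ok:
            result.append(''.join(seq))
    return result

print(canonical_bracketings(4))
# prints: ['[[]]', '[][]']
```

For k = 4 this yields two couplings and for k = 6 five, the familiar dimensions of the singlet spin spaces of four and six electrons.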

Example:

Let us consider the following doubly excited configuration for H₂O.

Using the rules given above we can construct two singlet BF's for this configuration:

(81a)

and

which after reordering gives

(81b)


A bonded function composed of n orbitals, of which n-2p are unpaired and containing x identical pairs, can be written as a sum of 2^(p-x) determinants.

Example: For the functions (81a) and (81b), we have n=10, p=5 and x=3. Thus each of them can be written as a sum of 2² = 4 determinants. Inserting the definitions (80), we obtain for (81a)

(82)

A similar formula is obtained for (81b).
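The 2^(p-x) count comes about because each spin-coupled pair of two distinct orbitals contributes two spin assignments while an identical pair contributes only one; a sketch (signs and normalization omitted, and the tuple notation is invented for the illustration):

```python
from itertools import product

def determinant_terms(pairs, unpaired=()):
    """Enumerate the determinant products of a bonded function: a pair
    [i j] with i != j contributes the two terms (i alpha, j beta) and
    (j alpha, i beta); an identical pair [i i] contributes only
    (i alpha, i beta).  Unpaired orbitals carry alpha spin."""
    choices = []
    for i, j in pairs:
        if i == j:
            choices.append([((i, 'a'), (i, 'b'))])
        else:
            choices.append([((i, 'a'), (j, 'b')), ((j, 'a'), (i, 'b'))])
    terms = []
    for combo in product(*choices):
        spin_orbs = [so for pair in combo for so in pair]
        spin_orbs += [(k, 'a') for k in unpaired]
        terms.append(tuple(spin_orbs))
    return terms

# an (81a)-like function: three identical pairs and two coupled pairs,
# i.e. p = 5, x = 3, giving 2**(5-3) = 4 determinants
pairs = [(1, 1), (2, 2), (3, 3), (4, 5), (6, 7)]
print(len(determinant_terms(pairs)))
# prints: 4
```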

The next step is to evaluate formulas for matrix elements of the Hamiltonian (55) between the bonded functions. In general these matrix elements can be written as sums over individual integrals between the determinants of the two bonded functions. These matrix elements can in turn be written in terms of one- and two-electron integrals over the orbitals φ_i. The general expression is therefore:

⟨Φ_μ|Ĥ|Φ_ν⟩ = Σ_{ij} a_ij^{μν} ⟨φ_i|h|φ_j⟩ + Σ_{ijkl} b_ijkl^{μν} ⟨φ_i φ_j|g|φ_k φ_l⟩        (83)

The coefficients a_ij^{μν} and b_ijkl^{μν} are called Projective Reduction Coefficients (PRC). They depend only on the form in which the two bonded functions Φ_μ and Φ_ν depend on the orbitals. Formulas for calculating the PRC for general bonded functions using a diagram technique have been given by Sutcliffe. The principles are as follows (the reader is referred to the paper by Sutcliffe for more details):
1. The orbitals φ_μi in Φ_μ and φ_νi in Φ_ν are written parallel to each other. The orbitals in Φ_ν are rearranged so that: (1) identical orbitals appear opposite to one another; and (2) spin-coupled pairs are kept adjacent as often as possible.
2. The orbitals above each other are joined by a solid vertical line, and curved dotted lines are drawn between the spin-coupled pairs.


Example: The diagram connecting the bonded function (81a) with the HF determinant

(84)

is, with Φ₀ at the top (only indices written):

[diagram (85): the orbital indices of the two functions joined by vertical lines carrying alternating parities +1, -1, with dotted curves connecting the spin-coupled pairs]        (85)

3. Each diagram is in general composed of a number of patterns of two different types. Patterns which close back on themselves are called cycles; those which end on an unpaired orbital are referred to as chains. An even chain begins and ends on the same function (it involves an even number of vertical lines). An odd chain starts in one function and ends in the other. Obviously there must be as many chains in a diagram as there are unpaired orbitals. There is always an even number of even chains.

Example: 1. Excited states of H₂O⁺

Φ_μ = A [φ₁ φ₁][φ₂ φ₂][φ₃ φ₃][φ₄ φ₄] φ₅
Φ_ν = A [φ₁ φ₁][φ₂ φ₂][φ₃ φ₄][φ₅ φ₆] φ₇

[diagram (86): the orbital indices of Φ_μ and Φ_ν joined by vertical lines, with dotted curves marking the spin-coupled pairs]        (86)

This diagram is composed of two cycles and one odd chain.

287

THE CONFIGURATION INTERACTION METHOD

2. H₂O triplet states

Φ_μ = A [φ₁ φ₁][φ₂ φ₂][φ₃ φ₃][φ₄ φ₄] φ₅ φ₆
Φ_ν = A [φ₁ φ₁][φ₂ φ₂][φ₃ φ₅][φ₆ φ₆] φ₄ φ₇

[diagram (87): the orbital indices of Φ_μ and Φ_ν joined by vertical lines, with dotted curves marking the spin-coupled pairs]        (87)

This diagram is composed of three cycles and two even chains.

4. From the diagram we can now generate the matrix element. Each vertical line is given a parity p_i, starting with plus one for the first line, minus one for the second, and so on, as indicated in the diagram (85). The matrix elements are obtained according to the following formulas:

(a) with more than two even chains in the diagram,

(88a)

(88b)

(b) with two even chains in the diagram,

⟨Φ_μ|Φ_ν⟩ = 0                                                     (88c)

(88d)

(c) otherwise,

(88e)

⋯ + Δ Σ_{ij} Q_ij [ ⟨φ_μi φ_μj|g(1,2)|φ_νi φ_νj⟩ + q_ij ⟨φ_μi φ_μj|g(1,2)|φ_νj φ_νi⟩ ]        (88f)

where the operators have been defined according to (55).


The different constants appearing in the formulas have the following
definitions:

    Q_i  = 1  if φ_μk = φ_νk for all k ≠ i,        and 0 otherwise;
    Q_ij = 1  if φ_μk = φ_νk for all k ≠ i and j,  and 0 otherwise;

    Γ = (-1)^(σ_μ+σ_ν) (-1/2)^[(n-h)/2 - m] (√2)^J                    (89)

where σ_μ and σ_ν are the signatures of the permutations which bring the
unpaired orbitals back to their original order in Φ_μ and Φ_ν, respectively;
n is the number of electrons; h the number of chains (even and odd); m
the number of cycles; J the number of pairs for which φ_μi = φ_μj but
φ_νi ≠ φ_νj (or vice versa). The coefficients q_ij are given in table 1. Here
p_ij is the product of p_i and p_j; a letter D indicates that i and j occur
in different patterns, and a letter S indicates that they occur in the same
pattern.
Table 1. Coefficients for the two-electron integrals in (88d) and (88f).
The table lists q_ij for every combination of the patterns containing i
and j (cycle, odd chain, or even chain), of the parity product p_ij = ±1,
and of whether i and j occur in the same (S) or in different (D) patterns;
the entries take the values -1/2, +1, -1 and -2.


Example:

Calculation of the matrix element connected with the diagram (85).

Case (c) applies since all patterns are cycles. The overlap integral is zero
since not all orbitals in Φ_μ and Φ_ν are identical (Q_i in (88e) is always
zero). There is no one-electron contribution since there are two different
orbitals (Q_i in (88f) is always zero).

Calculation of Γ:  n = 10, h = 0 (no unpaired orbitals), m = 4, J = 2;
σ_μ + σ_ν = 0, so that

    Γ = (-1)^0 (-1/2)^[(10-0)/2 - 4] (√2)^2 = -1 .

There is only one term in the last sum of (88f) for which Q_ij ≠ 0, corresponding
to the positions 5 and 8 in the diagram. Thus p_ij = -1. Since the two
positions belong to the same cycle (pattern = S), q_ij is obtained from the
third row in table 1. This yields q_ij = -1, and the matrix element then
follows from the two-electron term of (88f).
The procedure given above is well suited for automation on a computer,
and a program using it has recently been developed and is included in the
MUNICH program system (Diercksen and Sutcliffe, 1974). The program first
generates a list of symbolic matrix elements, each of which has the following
general structure:
1. A matrix element identifier.
2. A symbolic overlap integral (zero if the overlap integral is zero; otherwise
   it contains an entry to a table of Γ-values).
3. Symbolic one- and/or two-electron integrals (each contains a sequence number
   for the integral, entries to tables of q-values, etc. There is one block
   for each contribution).
Once the list of symbolic matrix elements has been built, the references
have to be reordered in such a way that consecutive symbolic contributions
refer to integrals that can be kept in the core storage simultaneously. The
references to the integrals can then be resolved. The numerical contributions
to the different matrix elements then, in turn, have to be reordered so that
consecutive contributions refer to matrix elements which can be kept in core
storage simultaneously. It is then possible to resolve the references of the
numerical contributions and compute the matrix elements. The reordering
processes, which are necessary for obtaining numerical matrix elements from
the list of symbolic matrix elements, are very similar to the one used to
reorder the two-electron integrals in the transformation program.
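The two-pass reordering can be illustrated with a toy resolver (all names hypothetical; the real programs work with tape-resident blocks rather than Python lists): contributions are first bucketed by the integral block they reference, so that each block needs to be "in core" only once.

```python
from collections import defaultdict

def resolve_contributions(symbolic, integrals, block_size=2):
    """symbolic: list of (element_id, integral_index, coefficient).
    Pass 1 buckets the references by integral block; pass 2 loads one
    block at a time and accumulates the numerical matrix elements."""
    by_block = defaultdict(list)
    for elem, idx, coef in symbolic:
        by_block[idx // block_size].append((elem, idx, coef))
    numeric = defaultdict(float)
    for b in sorted(by_block):
        loaded = integrals[b * block_size:(b + 1) * block_size]  # "in core"
        for elem, idx, coef in by_block[b]:
            numeric[elem] += coef * loaded[idx - b * block_size]
    return dict(numeric)
```

The second reordering step of the text, by matrix-element block, has the same structure with the roles of integrals and matrix elements interchanged.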

3.3 Calculation of Eigenvalues and Eigenvectors of the CI Matrix

Once the list of matrix elements has been built, the next step in any CI
calculation is to compute one or more of the lowest eigenvalues and corresponding
eigenvectors of the CI matrix (equation (6)). The size of this matrix is often
so large that it cannot be completely kept in the computer's core storage
(typical dimensions are in the range 10^3-10^5). The methods used to obtain the
eigenvalues should therefore be such that they require only small sections of
the matrix (e.g. one row) at a time.

One method which has frequently been used for obtaining the lowest eigenvalue
is the iterative method of Nesbet (1965). This method is based on an
iterative scheme for solving linear equation systems of the type

    A x = x                                                          (90)

Starting from an initial guess vector x^(0), the first iteration gives
x^(1) = A x^(0); x^(1) is then inserted in the left hand side of (90), and
x^(2) is obtained from the right hand side, and so on.
Equation (6) can easily be rewritten in a form similar to (90):

    C_μ = Σ_{ν≠μ} (H_μν - E S_μν) C_ν / (E S_μμ - H_μμ)              (91)

By inserting on the right hand side the coefficients obtained in the (n-1)'th
iteration, equation (91) gives the coefficients of the n'th iteration. The
equation is usually written in a differential form. If we first define the
quantity

    σ_μ^(n) = Σ_ν (H_μν - E S_μν) C_ν^(n-1)                          (92)

we obtain

    ΔC_μ^(n) = σ_μ^(n) / (E S_μμ - H_μμ)                             (93)

One component of the coefficient vector is fixed at unity throughout the
calculation and the others are incremented according to (93).
The normalization constant D and the energy E have to be updated in
each iteration. The computational procedure is the following:

1. A starting vector c^(0) is guessed. The energy

       E = Σ_{μν} C_μ H_μν C_ν / Σ_{μν} C_μ S_μν C_ν                 (94)

   and the normalization constant

       D = Σ_{μν} C_μ S_μν C_ν                                       (95)

   are computed.
2. This step is performed for μ = 1,2,...,N, where N is the dimension of
   the problem, but the iteration is repeated from μ = 1 if |ΔC_μ^(n)|
   becomes larger than a given threshold ΔC_max. σ_μ^(n) and ΔC_μ^(n) are
   calculated according to (92) and (93). The increments to the normalization
   constant and the energy are computed as

       ΔD^(n) = ΔC_μ^(n) [ 2 Σ_ν C_ν^(n-1) S_μν + ΔC_μ^(n) ]         (96)

       ΔE^(n) = σ_μ^(n) ΔC_μ^(n) / (D + ΔD)                          (97)

3. Step 2 is repeated until all increments |ΔC_μ| become smaller than a
   fixed threshold ΔC_max, which has been specified as input.

The method is of second order in the energy, and convergence to the lowest
eigenvalue is monotonic. A slightly modified version of the Nesbet algorithm
is used in the program system MOLECULE. Here all the σ_μ's are calculated
simultaneously, and the normalization constant and the energy are updated at
the end of each iteration.
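A compact version of the relaxation scheme (92)-(97) can be written as follows; for brevity the sketch recomputes E and the normalization from scratch after each increment instead of using the update formulas (96)-(97):

```python
import numpy as np

def nesbet_lowest(H, S=None, fixed=0, tol=1e-10, max_sweeps=200):
    """Iterate eqs. (92)-(93) with one coefficient fixed at unity."""
    n = H.shape[0]
    S = np.eye(n) if S is None else S
    c = np.zeros(n)
    c[fixed] = 1.0
    E = H[fixed, fixed] / S[fixed, fixed]
    for _ in range(max_sweeps):
        biggest = 0.0
        for mu in range(n):
            if mu == fixed:
                continue
            sigma = (H[mu] - E * S[mu]) @ c            # eq. (92)
            dC = sigma / (E * S[mu, mu] - H[mu, mu])   # eq. (93)
            c[mu] += dC
            E = (c @ H @ c) / (c @ S @ c)              # E and D refreshed
            biggest = max(biggest, abs(dC))
        if biggest < tol:
            break
    return E, c / np.sqrt(c @ S @ c)

H = np.array([[0.0, 0.1, 0.2],
              [0.1, 1.0, 0.3],
              [0.2, 0.3, 2.0]])
E, c = nesbet_lowest(H)
```

For the small test matrix above the scheme reproduces the lowest eigenvalue of H to the requested threshold.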
A slightly different technique for obtaining the lowest roots of a symmetric
matrix has been proposed by Shavitt et al. (1973). Essentially the
method adjusts the components C_μ one at a time so as to decrease the
energy as much as possible in each step. The total energy is given as

    E = Σ_{μν} C_μ H_μν C_ν / Σ_{μν} C_μ S_μν C_ν                    (98)

When one of the components C_μ is incremented with, say, α,

    C_μ → C_μ + α

the energy can be treated as a function of the parameter α. It is then
possible to take the derivative of equation (98) with respect to α, and
thereby calculate the value of α which minimizes E. The parameter α is obtained
as one of the solutions to the quadratic equation

    a α² + b α + c = 0                                               (99)

where

    a = q_μ H_μμ - f_μ S_μμ
    b = H_μμ - E S_μμ                                                (100)
    c = f_μ - E q_μ

with

    q_μ = Σ_ν S_μν C_ν    and    f_μ = Σ_ν H_μν C_ν .

The value of α which minimizes E is

    α = -2c / (b - |b² - 4ac|^(1/2))                                 (101)

The computational procedure is essentially the same as in the Nesbet method.
Convergence is obtained when all α's in an iteration cycle (μ = 1,2,...,N)
are smaller than a given threshold. Higher roots can be obtained by making
adjustments in a subspace which is orthogonal to all vectors already
calculated.
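The one-dimensional minimisation (98)-(101) can be sketched as follows (the vector is S-normalised, so the coefficients of (100) apply directly; the minimising root is found by comparing the two energies rather than by the sign convention of (101)):

```python
import numpy as np

def optimal_increment(H, S, c, mu):
    """Increment of component c_mu that minimises the Rayleigh
    quotient (98); a, b, c2 are the coefficients of eq. (100)."""
    c = c / np.sqrt(c @ S @ c)             # normalisation assumed in (100)
    f = H[mu] @ c                          # f_mu = sum_nu H_mu,nu C_nu
    q = S[mu] @ c                          # q_mu = sum_nu S_mu,nu C_nu
    E = c @ H @ c
    a = q * H[mu, mu] - f * S[mu, mu]
    b = H[mu, mu] - E * S[mu, mu]
    c2 = f - E * q
    disc = np.sqrt(b * b - 4.0 * a * c2)

    def energy(alpha):
        d = c.copy()
        d[mu] += alpha
        return (d @ H @ d) / (d @ S @ d)

    roots = [(-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)]
    return min(roots, key=energy)

H = np.array([[0.0, 0.1, 0.2],
              [0.1, 1.0, 0.3],
              [0.2, 0.3, 2.0]])
S = np.eye(3)
c0 = np.array([1.0, 0.0, 0.0])
alpha = optimal_increment(H, S, c0, 1)
```

At the returned α the energy is stationary with respect to the chosen component, which is what the derivative condition on (98) expresses.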
Another way of obtaining the lowest eigenvalue and the corresponding eigenvector
is to use the perturbation theory constituted by equations (49)-(51).
This method has been implemented in MOLECULE and has proven to be quickly
convergent for very large CI expansions (up to 50 000 terms have been used)
where one of the terms has a large coefficient. The perturbation expansion does
not, however, always converge. Convergence can be assured by combining the
perturbation calculation with a perturbation-variation treatment (Goscinski
and Brandas, 1970). The wave function is in this method expanded in the
perturbation corrections:

    Ψ = Σ_{k=0}^{n} C_k ψ^(k)                                        (102)

where n is the current order of the perturbation expansion. The coefficients
C_k are obtained by the ordinary variation method. The computed energy is
always an upper bound to the true eigenvalue and decreases monotonically
towards it. The matrix elements over the perturbation corrections ψ^(k) are
simple functions of the perturbation energies E^(k) (k = 1,...,2n) and of the
overlap integrals ⟨ψ^(k)|ψ^(l)⟩, and can therefore easily be calculated
(Lowdin, 1965). The perturbation-variation method has been found to give
convergence in six figures of the correlation energy within 10-15 iterations,
also in cases where the ordinary perturbation expansion is strongly divergent.
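The variational step can be sketched generically: store the correction vectors ψ^(k) and diagonalise H in their span. For brevity the sketch below uses normalised Krylov vectors as crude stand-ins for the true perturbation corrections; the upper-bound and monotonicity properties of the subspace energy are the same.

```python
import numpy as np

def lowest_in_subspace(H, vecs):
    """Variational (upper-bound) estimate of the lowest eigenvalue of H
    in the span of the stored vectors."""
    V, _ = np.linalg.qr(np.array(vecs).T)   # orthonormalise the psi^(k)
    return np.linalg.eigvalsh(V.T @ H @ V)[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
H = (A + A.T) / 2.0
vecs = [np.eye(8)[0]]
for _ in range(4):                          # crude stand-ins for psi^(k)
    v = H @ vecs[-1]
    vecs.append(v / np.linalg.norm(v))
E2 = lowest_in_subspace(H, vecs[:2])
E5 = lowest_in_subspace(H, vecs)
```

Enlarging the subspace can only lower the estimate, and it can never fall below the exact lowest eigenvalue.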


3.4 A CI Method Using Integral Type Symbolic References

In a CI calculation based on the methods described in the previous section
the computational procedure can be outlined as follows:
1. A list of symbolic matrix elements is generated.
2. The references in the symbolic matrix elements are resolved to the
   numerical values of the integrals, and the numerical matrix elements are
   computed.
3. The lowest root(s) of the secular equation are calculated using an iterative
   or perturbative method.
Points 1 and 2 above set a limit to the size of the CI expansion which can
be treated with this method. It is hardly possible to treat problems which have
more than around 10^4 terms in the expansion. The number of matrix elements
would in this case be around 5·10^7. Even if only 10 % of these elements are
different from zero, around ten magnetic tapes would be necessary to store the
symbolic matrix elements, and four more tapes would be needed for the
numerical matrix elements. The computer time also is prohibitive in calculations
of this size. However, in calculations of high accuracy, where a large
fraction of the correlation energy is required, it is in many cases necessary
to go beyond this limit, at least when an orthogonal orbital set is used.
Extensions to larger expansions are, however, possible in special cases by
means of a technique avoiding the construction of a list of matrix elements
(symbolic and numerical). Here the CI vector is, instead, obtained directly
from the relatively short list of two-electron integrals (usually the length
of this list is of the order of 10^5-10^6) by using one of the iterative or
perturbative methods mentioned earlier (Roos, 1972). The important term to
calculate in all these methods is
    σ_μ^(n) = Σ_ν H_μν C_ν^(n-1)                                     (103)

In the Nesbet procedure this term is included in (92), while it is defined
as f_μ in the method of Shavitt et al. In the perturbation expansion method
the same quantity appears in the last sum in equation (51). If the matrix
elements H_μν have been precalculated, the evaluation of the sum (103)
presents no computational problem. But for very large CI matrices an explicit
calculation of these matrix elements is impossible. In special cases it is,
however, possible to proceed in a quite different way. If there are only a
few different types of symbolic matrix elements, it is possible to resolve
the references directly in equation (103) and write this equation as a sum
over two-electron integrals. Let us, as an example, consider a CI expansion

with a reference state which is a closed shell determinant and which includes
all single and double replacements:

    Ψ = c_0 Φ_0 + Σ_{i,a} c_i^a Φ_i^a + Σ_{i>j} Σ_{a>b} c_ij^ab Φ_ij^ab   (104)

There are in this case six different types of configuration state functions:

    Φ_0 ;  Φ_i^a ;  Φ_ii^aa ;  Φ_ii^ab ;  Φ_ij^aa ;  Φ_ij^ab              (105)

(one should notice that there are two independent CSF's for the configuration
ij → ab when i ≠ j and a ≠ b). Using these CSF's, we can construct 21 different
types of matrix elements. All these elements can be written as simple linear
combinations of one- and two-electron integrals, which can then be directly
inserted into equation (103). The sum over matrix elements is then replaced
with a sum over two-electron integrals. For example, the two-electron
contribution to σ_μ^(n) can, in the case where both μ and ν refer to doubly
excited configurations, be written as

      Σ_{k,l} [ A_μν (ik|jl) + A'_μν (il|kj) ] C_{kl→ab}^(n-1)
    + Σ_{c,d} [ A_μν (ac|bd) + A'_μν (ad|bc) ] C_{ij→cd}^(n-1)
    + Σ_{k,c} [ B_μν (ai|ck) + B'_μν (ac|ki) ] C_{kj→cb}^(n-1)
    + Σ_{k,c} [ B_μν (bj|ck) + B'_μν (bc|kj) ] C_{ik→ac}^(n-1)
    + Σ_{k,c} [ C_μν (bi|ck) + C'_μν (bc|ki) ] C_{kj→ac}^(n-1)
    + Σ_{k,c} [ C_μν (aj|ck) + C'_μν (ac|kj) ] C_{ik→cb}^(n-1)           (106)

Here μ = ij → ab, and ν corresponds to the configuration specified on
the coefficient C^(n-1). The numerical values of the coupling parameters
A_μν, B_μν and C_μν are completely determined by the types of the configurations
μ and ν. These values are thus easily tabulated (tables of them can be
found in the reference given above). Similar (but simpler) equations can be
derived for the interaction between doubly and singly excited configurations
and between the singly excited configurations. In these equations the integral
types act as the symbolic references. An integral type is defined by the

four orbitals from which it is formed. In the example used here, one has to
define the following different integral types:

    (ij|kl)   (ai|jk)   (ab|ij)   (ai|bj)   (ab|ci)   (ai|bc)   (ab|cd)

where the indices i, j, k and l refer to internal orbitals and a, b, c and d
to external orbitals.
In a program using integral type symbolic references, the σ_μ's are obtained
in the following way: (1) the integral list is read into core storage,
one block at a time; (2) each integral is identified with respect to its
type; (3) the reference is resolved, and the contribution from each integral
can be added to the appropriate σ_μ^(n) after multiplication with one of the
parameters A_μν, B_μν, etc., and a coefficient C_ν^(n-1). The computation of
the vector σ is completed when all two-electron integrals have been read. The
integral type symbolic reference method becomes efficient for large scale
calculations, since the list of two-electron integrals is much shorter than
the list of CI matrix elements in these cases. (In a recent full CI treatment
of non-linear H3 (all single, double and triple replacements were included)
the number of non-zero matrix elements was 28 million, but the list of
two-electron integrals was only around 300 000. A calculation using the
conventional method would have been impossible. Using the present scheme, each
iteration took only a few minutes of computer time.)
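The idea can be reduced to a few lines (toy labels and couplings, not the actual tables of Roos, 1972): each integral on the list carries, through its type, a set of coupling coefficients, and the vector σ of eq. (103) is accumulated integral by integral without ever forming H.

```python
import numpy as np

# toy integral list and coupling table; in a real program the (mu, nu, A)
# triples are generated from the integral type, as in eq. (106)
integrals = {"(11|11)": 0.6, "(12|12)": 0.2, "(22|22)": 0.3}
coupling = {
    "(11|11)": [(0, 0, 1.0)],
    "(12|12)": [(0, 1, 1.0), (1, 0, 1.0)],
    "(22|22)": [(1, 1, 1.0)],
}

def sigma(C):
    """sigma_mu = sum_nu H_mu,nu C_nu, eq. (103), accumulated directly
    from the integral list."""
    out = np.zeros_like(C)
    for label, value in integrals.items():
        for mu, nu, A in coupling[label]:
            out[mu] += A * value * C[nu]
    return out
```

For this toy coupling table the result is identical to multiplying with the explicit matrix H = [[0.6, 0.2], [0.2, 0.3]], which is exactly the point: only the integral list and the coupling tables need to be stored.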
The integral type symbolic reference method has been implemented in the
MOLECULE program system. Some typical calculations on closed shell systems
using this program are presented in table 2.
The method can be extended to open shell cases as long as the number of
integral types does not become prohibitively large. The MOLECULE system also
includes a program based on an unrestricted Hartree-Fock reference state
with a CI expansion in terms of Slater determinants. A version for full
CI calculations on three- and four-electron systems has recently been
developed. A generalization to an arbitrary open shell system involves the
creation of a list of symbolic matrix elements for each type of matrix
element. The references in the symbolic matrix elements can then be resolved
to the integral types, instead of to the numerical values of the integrals.
In this way a list of integral types and tables of coupling parameters
can be generated, which can then be used to distribute the two-electron
integrals to the different components σ_μ. Work along this line is in
progress.

Table 2. Some examples of CI calculations made with the program system MOLECULE

    Molecule    n_i(a)  n_e(b)  N_conf(c)  CPU time(d)          E_corr (a.u.)
                                           (minutes/iteration)
    H2O                  35       3304      0.8                 -0.2306
    H3O+                 35      10010      3.0                 -0.2120
    Li+(H2O)             39      10101      3.6                 -0.2402
    F-(H2O)              46      36204     10.5                 -0.4148
    (H2O)2               58      56268     22.1                 -0.4097

    a  number of internal orbitals
    b  number of external orbitals
    c  number of single and double replacement state functions
    d  on an IBM 360/91 computer

REFERENCES

R. Ahlrichs and W. Kutzelnigg, J. Chem. Phys. 48, 1819 (1968).
C.F. Bender and E.R. Davidson, J. Phys. Chem. 70, 2675 (1966).
S.F. Boys, C.M. Reeves and I. Shavitt, Nature 178, 1207 (1956).
E. Brandas and O. Goscinski, Phys. Rev. A1, 552 (1970).
A.J. Coleman, Rev. Mod. Phys. 35, 668 (1963).
G. Diercksen, Theoret. Chim. Acta (Berl.) 33, 1 (1974).
G. Diercksen and B.T. Sutcliffe, to be published.
B. Huron, J.P. Malrieu and P. Rancurel, J. Chem. Phys. 58, 5745 (1973).
A.D. McLean and B. Liu, IBM Corp. Technical Report (1973).
P.O. Lowdin, Phys. Rev. 97, 1474 (1955).
P.O. Lowdin and H. Shull, Phys. Rev. 101, 1730 (1956).
P.O. Lowdin, J. Math. Phys. 6, 1341 (1965).
W. Meyer, J. Chem. Phys. 58, 1017 (1973).
R.K. Nesbet, Phys. Rev. 109, 1632 (1958); see also: Advan. Chem. Phys. 9, 321 (1965).
R.K. Nesbet, J. Chem. Phys. 43, 311 (1965).
C.M. Reeves, Commun. ACM 9, 276 (1966).
B. Roos, Chem. Phys. Letters 15, 153 (1972).
H.F. Schaefer III, R.A. Klemm and F.E. Harris, Phys. Rev. 181, 137 (1969).
I. Shavitt, C.F. Bender, A. Pipano and R.P. Hosteny, J. Comput. Phys. 11, 90 (1973).
O. Sinanoglu, J. Chem. Phys. 36, 3198 (1962).
B.T. Sutcliffe, J. Chem. Phys. 45, 235 (1966).
M. Yoshimine, IBM Corp. Technical Report RJ555, 1973.

MOLECULAR PROPERTIES

Peter Swanstrøm and Flemming Hegelund

Department of Chemistry, University of Aarhus
DK-8000 Aarhus C, Denmark

Contents:
1. The Hamiltonian of a molecule exposed to an external
electromagnetic field.
2. Reduction of this operator to the A-subspace.
The concept of molecular properties.
3. Explicit algebraic formulas of some first and second
order one-electron properties.
4. Computation of properties using perturbation theory.
5. Gauge problems of magnetic properties.
This series of lectures is intended to provide a background for understanding
the concept of molecular properties. We shall in fact interpret properties as
functions which define the kind and magnitude of the various phenomena that
can actually be observed under experimental circumstances, using external
fields as perturbations. We believe that this approach is both fundamental
and easy to understand; for this reason it will be given a rather general
formulation which aims both to survey the elementary literature and to place
each individual property in its proper context.
It will always be subject to discussion whether a molecular
theory which involves electromagnetic fields should be presented
using field quantization methods, or not. In order to ensure a
maximum of overlap with the existing literature we have chosen
to consider the fields as not-quantized.
Some emphasis is given to the development of a theory which

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 299-345.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


is applicable in a computational evaluation of properties.


1. THE HAMILTONIAN OF A MOLECULE EXPOSED TO AN EXTERNAL
ELECTROMAGNETIC FIELD.
The derivation of the Hamiltonian is not the purpose of these
lectures; it has been done already by several authors and here we
shall merely state the form of the operator. We refer the reader
to the comprehensive study of this field presented by Moss [1],
whose formulation and results we have adopted.
We consider the molecule to be a collection of particles electrons and nuclei - all possessing mass, charge and spin. The
mutual interactions between these particles determine the possible stationary states of the molecule; we describe these interactions in terms of internal electromagnetic fields created
by the particles during their movements. In addition to the internal fields we shall allow the molecule to be exposed to some
measuring operation defined by the application of an external
electromagnetic field. Following Moss [1] we write the total
Hamiltonian as
    H = H_e + H_n + H_en                                             (1.1)

that is, as a sum of three operators representing pure electronic, pure
nuclear, and mixed electronic-nuclear interactions. The electronic part is

    H_e = Σ_i { mc² - eφ(r_i,t) + π_i²/2m - μ_i·H(r_i,t)
                + (1/4mc) μ_i·[π_i × E(r_i,t) - E(r_i,t) × π_i] }
          + Σ'_{i,j} e²/2r_ij                                        (1.2)

where we, somewhat arbitrarily, have neglected terms containing c² in the
denominator together with all inter-electronic interactions, except for the
pure Coulomb repulsion term. This will by no means imply any restrictions on
the generality of the principles outlined in the following; it is done only
as a matter of convenience, because the omitted operators contribute very
little to the effects that we want to describe.
We can give a short physical interpretation of the individual
operators appearing in (1.2). Term by term they represent (a) the
electronic rest energy, (b) interaction with external electric
potential, (c) kinetic energy, (d) Zeeman interaction, (e) spin
interaction due to the movement of the electrons relative to the
external electric field, (f) mutual electronic Coulomb repulsion.
In (1.2), E and H denote the external (applied) electric and magnetic field
strengths. They are derived from the electromagnetic scalar and vector
potentials, φ and A, through the well-known equations


    E = -∇φ - (1/c) ∂A/∂t                                            (1.3)

    H = ∇ × A = curl A                                               (1.4)

The mechanical momentum operator π_i is defined as

    π_i = p_i + (e/c) A(r_i,t) = -iħ∇_i + (e/c) A(r_i,t)             (1.5)

and the spin magnetic moment of the electron is represented by the operator

    μ = -g μ_B s        (μ_B = eħ/2mc)                               (1.6)

where s is the spin angular momentum operator of the electron, g is the
so-called electronic g-factor, and μ_B is the Bohr magneton. cgs units are
applied throughout.

The nuclear operator can be written

    H_n = Σ_a { m_a c² + Z_a eφ(r_a,t) + π_a²/2m_a - μ_a·H(r_a,t) }
          + Σ'_{a,b} Z_a Z_b e²/2r_ab                                (1.7)

where π_a

represents the mechanical momentum of nucleus a:

    π_a = p_a - (Z_a e/c) A(r_a,t) = -iħ∇_a - (Z_a e/c) A(r_a,t)     (1.8)

The spin magnetic moment of the nucleus a is defined by the operator

    μ_a = g_a μ_N I_a                                                (1.9)

where I_a is the spin angular momentum operator of the nucleus, g_a is the
nuclear g-factor, and μ_N the nuclear magneton. A comment on the origin of
spin and the associated magnetic moments might be added at this place. The
electron spin is predicted by the Dirac equation, thus being a purely
relativistic effect, and the electronic g-factor can be obtained from quantum
electrodynamics. The origin of nuclear spin is more complex. The Dirac
equation does not apply to nuclei, and quantum electrodynamics does not
provide g-factors in agreement with experiment. A crude explanation of the
nuclear spin, and therefore of the spin magnetic moment, is the following:
nuclei are not simply point charges; we may take the nucleus to be a
collection of elementary particles - protons and neutrons - which "move"
within the nuclear sphere or


ellipsoid creating local electric and magnetic fields at the nucleus. Thus we cannot say whether the nucleus has a spin because
of its magnetic moment, or vice versa. Another consequence of the
finite size and structure of the nucleus is that it may possess
an electric quadrupole (but not a dipole) moment; we shall return
to this point later. For these reasons the nuclear g-factors will
be treated as phenomenological values that can be determined by
experiment.
We can give a physical interpretation of the terms of the
nuclear operator (1.7): (a) the nuclear rest energ~ (b) interaction with external electric potential, (c) nuclear kinetic energy, (d) Zeeman-term, (e) Coulomb repulsion.
The operator covering the mixed electron-nucleus effects is

    H_en = Σ_i Σ_a { -Z_a e²/r_ai
           + (Z_a e/2mc) μ_i·(r_ai × π_i)/r³_ai
           + (e/mc) μ_a·(r_ai × π_i)/r³_ai
           + [ (μ_i·μ_a) r²_ai - 3(μ_i·r_ai)(μ_a·r_ai) ] / r⁵_ai
           - (8π/3) (μ_i·μ_a) δ(r_ai)
           + (e/2) r̃_ai Q̃_a r_ai / r⁵_ai }                           (1.10)

where we have discarded the operators containing c² or m_a in the
denominator. The vector r_ai is defined as r_ai = r_i - r_a. The last term
in (1.10) represents the interaction between the nuclear quadrupole moment
Q̃_a (a tensor of rank 2) and the electrons. As mentioned above, it appears
as a consequence of the non-spherical character of the charge distribution
within the nucleus; when the electric potential of the nucleus is expanded
in a Taylor series (the so-called multipole expansion to which we shall
return later) we obtain a number of terms, namely: (a) the Coulomb
interaction, which is the first operator appearing in (1.10), (b) a dipole
term which vanishes because of the symmetry of the nucleus, (c) the
quadrupole term, (d) and higher order pole interactions, like octupoles,
which we have neglected here. For a matter of convenience the quadrupole
term is usually rewritten as follows:

    (e/2) r̃_ai Q̃_a r_ai / r⁵_ai = -(1/6) Σ_{αβ} Q^(a)_αβ V^(a)_αβ(i)  (1.11)

where α and β denote the cartesian components of the (space-fixed)


coordinate system. The components of the quadrupole tensor Q̃_a can, in a
rather formal way, be written as

    Q^(a)_αβ = e Σ_p (3 α_p β_p - δ_αβ r²_p)                          (1.12)

The sum extends over the coordinates of the protons p within the nucleus a;
this expression is, of course, rather fictitious since it is based on
classical electrostatics. The V^(a)_αβ(i) are the components of the second
rank tensor Ṽ, which we denote the internal electric field gradient operator:

    V^(a)_αβ(i) = ∂²/∂α∂β (-e/r_ai)                                   (1.13)

or, using a tensor algebra formalism:

    Ṽ^(a)(i) = -e (3 r_ai r_ai - r²_ai 1̃) / r⁵_ai                    (1.14)

We observe that both Q̃ and Ṽ are traceless; this proves very useful in a
more detailed analysis of the quadrupole coupling.
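The tracelessness of Ṽ is immediate from (1.14), since tr(3 r r - r² 1̃) = 3r² - 3r² = 0; a quick numerical check (e = 1, arbitrary units):

```python
import numpy as np

def field_gradient(r, e=1.0):
    """V~(i) of eq. (1.14): -e (3 r r - r^2 1) / r^5."""
    r = np.asarray(r, dtype=float)
    r2 = r @ r
    return -e * (3.0 * np.outer(r, r) - r2 * np.eye(3)) / r2 ** 2.5

V = field_gradient([1.0, 2.0, 2.0])
```

The resulting 3x3 tensor is symmetric with vanishing trace, as stated.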
We conclude the presentation of the electron-nucleus operator
(1.10) by giving the physical interpretation of the individual
terms: (a) electron-nucleus Coulomb attraction, (b) electron
spin-electron orbit coupling, (c) nuclear spin-electron orbit
coupling, (d) electron spin-nuclear spin dipole coupling, (e)
Fermi contact interaction (penetration of the nucleus), (f) and
the nuclear quadrupole term.
Introduction of time-independent fields.
The operators presented above are still too general to be
useful for our purposes. From now on we shall restrict ourselves
to apply fields which are independent of time. For this reason
the following derivations only lead to time-independent properties. However, this restriction is not very severe because many
properties are measured in constant field experiments such as
NMR, ESR, and molecular beam spectroscopy. Thus we set

    E = constant                                                     (1.15)

    H = constant                                                     (1.16)

where we restrict the magnetic field to be uniform. In this case we can
choose the electromagnetic potentials time-independent as well, see (1.3)
and (1.4):

    φ(r,t) = φ(r)    and    A(r,t) = A(r)                            (1.17)

In this case Maxwell's equations take the form

    div E = ∇·E = 4π ρ_ext                                           (1.18)
    div H = ∇·H = 0                                                  (1.19)
    curl E = ∇ × E = 0                                               (1.20)
    curl H = ∇ × H = 0                                               (1.21)

which will be used in the subsequent reduction of the Hamiltonian.


The charge density ρ_ext in (1.18) is the external density that produces the
electric field; it vanishes when we assume that the molecule does not
approach these field-producing charges.

Since H = curl A = constant we can choose A to be a simple function of H,
namely

    A = (1/2) H × r                                                  (1.22)

which also ensures that A fulfills the so-called Coulomb gauge:

    div A = 0                                                        (1.23a)

The important consequence is that, for operators,

    ∇·A = div A + A·∇ = A·∇                                          (1.24)

that is, A and ∇ commute. Since we have chosen the scalar potential to be
time-independent, the potentials will also satisfy the Lorentz condition

    div A + (1/c) ∂φ/∂t = 0                                          (1.23b)

i.e. the electromagnetic potentials are relativistically invariant.
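That A = (1/2) H × r reproduces the uniform field and satisfies the Coulomb gauge is easy to verify numerically with finite differences (any field vector and test point will do):

```python
import numpy as np

Hfield = np.array([0.0, 0.0, 1.0])            # uniform field along z
A = lambda r: 0.5 * np.cross(Hfield, r)       # eq. (1.22)

def jacobian(F, r, h=1e-6):
    J = np.empty((3, 3))                      # J[i, j] = dF_i / dr_j
    for j in range(3):
        dr = np.zeros(3)
        dr[j] = h
        J[:, j] = (F(r + dr) - F(r - dr)) / (2.0 * h)
    return J

J = jacobian(A, np.array([0.3, -0.7, 1.1]))
curl_A = np.array([J[2, 1] - J[1, 2], J[0, 2] - J[2, 0], J[1, 0] - J[0, 1]])
div_A = np.trace(J)
```

Since A is linear in r, the central differences are exact up to rounding: curl A equals the chosen field and div A vanishes.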


Transformation to a molecule-fixed coordinate system.
We are now in principle able to simplify the space-fixed
Hamiltonian (1.1) considerably, that is we could rewrite it
using the Maxwell equations (1.18)-(1.21) of the constant external fields together with the Coulomb gauge (1.24). This would lead
to an operator which would be a function of the external fields;
before performing this simplification we shall, however, transform the Hamiltonian to a molecule-fixed system because we want
to include the effect of molecular rotation and vibration in our
considerations. Such a transformation is by no means trivial to


perform - and again we shall merely state the results of it in a suitable and
approximate form, neglecting the effects that are known to be small. The
reader is referred to Moss [1].

We shall assume that the molecule is free to rotate as a rigid framework,
i.e. that the nuclei remain fixed in their equilibrium positions during
rotation. In addition it will be assumed that the molecule is free to perform
vibrations and that the corresponding nuclear motions do not affect any of
the operators of the Hamiltonian, except for the direct electron-nucleus
and nucleus-nucleus Coulomb interactions. In this case the transformation
will only affect the following operators: the nuclear kinetic energy operator
and the nuclear Coulomb potentials. Apart from these operators, which we
shall analyse separately below, all the Hamiltonian terms given in (1.2),
(1.7), and (1.10) remain unchanged, except that all coordinates refer to the
molecule-fixed system and all nuclear coordinates are the coordinates of
the equilibrium configuration.
The nuclear Hamiltonian (1.7) contains an operator representing the nuclear
kinetic energy. We can approximate this operator, using equations (1.8) and
(1.24), as follows:

    π_a²/2m_a = p_a²/2m_a - (Z_a e/m_a c) A_a·p_a + (Z_a² e²/2m_a c²) A_a²
              ≈ p_a²/2m_a                                            (1.25)

that is, the effect of the external fields on the kinetic energy of a nucleus
is neglected. In this case the nuclear kinetic energy, when transformed to a
molecule-fixed system, yields three operators that represent translation,
rotation, and vibration of the nuclear framework:

    Σ_a p_a²/2m_a = P²/2M + Σ_α N_α²/2I_α + (1/2) Σ_s p_s²           (1.26)

We shall ignore the translation operator because it does not affect the stationary states of the molecule. In the rotational
operator, N denotes the angular momentum of the nuclear framework
and I (a ~ x,y,z) are the three moments of inertia (choosing
princ~pal axes); since we have assumed that the molecule rotates
rigidly the moments of inertia are constants. The summation in
the vibrational operator extends over the moments conjugate to
the molecule-fixed coordinates. The removal of the 6 (for linear
molecules 5) redundant operators must be accomplished by means of
the Eckart conditions.
We shall discuss for a moment the rotation operator of (1.26).
This operator represents the rotational energy of the nuclei only,
but we shall attempt to express it in terms of the total molecular


angular momentum. We observe that the total angular momentum J of the
molecule (neglecting spins) is just the sum of the electronic momentum L and
the nuclear momentum N:

    J = N + L    where    L = Σ_i l_i = Σ_i (r_i × p_i)              (1.27)

If we substitute this into the rotational operator we obtain

    Σ_α N_α²/2I_α = Σ_α (J_α - L_α)²/2I_α
                  = Σ_α { J_α²/2I_α - J_α L_α/I_α + L_α²/2I_α }      (1.28)

The resulting operators represent over-all molecular rotation, coupling
between the total and electronic angular momenta, and pure electronic
rotation energy, respectively. In what follows below this latter operator
will be omitted since it merely gives rise to a constant energy contribution
within a given electronic state.

We have now concluded the transformation of the nuclear kinetic energy
operator to the molecule-fixed system. We shall now turn to an evaluation of
the nuclear Coulomb interaction operators, which we assumed were responsible
for the nuclear vibration potential. We expand the operators in terms of
infinitesimal distortions δ = (..., δr_a, ..., δr_b, ...) of the nuclear
framework:

    Q(δ) = Q(0) + Q'(0)δ + (1/2) δ Q''(0) δ + ...
         = Q(0) + Σ_a (∂Q/∂r_a)·δr_a + ...                           (1.29)

For the electron-nucleus Coulomb attraction we obtain

$$-\sum_c^N \sum_i^n \frac{Z_c e^2}{r_{ci}(\delta)} = -\sum_c^N \sum_i^n \frac{Z_c e^2}{r_{ci}} - \sum_a^N \delta\vec r_a \cdot \sum_i^n \frac{Z_a e^2\,\vec r_{ai}}{r_{ai}^3} + \cdots \qquad (1.30)$$

and the expansion of the mutual nucleus-nucleus repulsion yields

MOLECULAR PROPERTIES

$$\sum_{c,d}' \frac{Z_c Z_d e^2}{2 r_{cd}(\delta)} = \sum_{c,d}' \frac{Z_c Z_d e^2}{2 r_{cd}} + \sum_a^N \delta\vec r_a \cdot \Big( \sum_c' \frac{Z_a Z_c e^2\,\vec r_{ac}}{r_{ac}^3} \Big) + \frac12 \sum_a^N \delta\vec r_a \cdot \Big( \sum_c' \frac{Z_a Z_c e^2\,(3\,\vec r_{ac}\vec r_{ac} - r_{ac}^2\,\tilde I)}{r_{ac}^5} \Big) \cdot \delta\vec r_a + \frac12 \sum_{a,b}' \delta\vec r_a \cdot \frac{Z_a Z_b e^2\,(r_{ab}^2\,\tilde I - 3\,\vec r_{ab}\vec r_{ab})}{r_{ab}^5} \cdot \delta\vec r_b \qquad (1.31)$$

We are now in a position to give the transformed Hamiltonians $H_e$, $H_{en}$, and $H_n$. The electronic operator (1.2) becomes

$$H_e = \sum_i^n \Big\{ mc^2 - e\varphi_i + \frac{1}{2m}\Big(\vec P_i + \frac{e}{c}\vec A_i\Big)^2 - \vec\mu_i\cdot\vec H \Big\} + \sum_{i,j}' \frac{e^2}{2 r_{ij}} \qquad (1.32)$$

If we apply the field relations derived above we obtain, after some manipulation:

$$H_e = \sum_i^n \Big\{ -e\varphi_i + \frac{P_i^2}{2m} - \vec\mu_i\cdot\vec H + \frac{e}{2mc}\,\vec H\cdot\vec L_i + \frac{e^2}{8mc^2}\,\vec H\cdot(r_i^2\,\tilde I - \vec r_i\vec r_i)\cdot\vec H + \frac{1}{2mc}\,\vec E_i\cdot(\vec\mu_i\times\vec P_i) \Big\} + \sum_{i,j}' \frac{e^2}{2 r_{ij}} \qquad (1.33)$$

where we have omitted the electronic rest energy $mc^2$.


The nuclear Hamiltonian (1.7) becomes

$$H_n = \sum_a^N \Big\{ m_a c^2 + Z_a e\varphi_a + \frac{1}{2m_a}\Big(\vec P_a - \frac{Z_a e}{c}\vec A_a\Big)^2 - \vec\mu_a\cdot\vec H \Big\} + \sum_{a,b}' \frac{Z_a Z_b e^2}{2 r_{ab}(\delta)} \qquad (1.34)$$

and after some manipulations:


$$H_n = \sum_a^N \Big\{ Z_a e\varphi_a - \vec\mu_a\cdot\vec H + \frac{p_a^2}{2m_a} \Big\} + \sum_\alpha \Big\{ \frac{J_\alpha^2}{2I_\alpha} - \frac{J_\alpha L_\alpha}{I_\alpha} \Big\} + \sum_{a,b}' \frac{Z_a Z_b e^2}{2 r_{ab}(\delta)} \qquad (1.35)$$

where we have used the transcriptions (1.25), (1.26), and (1.28) of the nuclear kinetic energy, and where the nuclear rest energy $m_a c^2$ is omitted. For the Coulomb interaction operator the expansion (1.31) is understood.
The electron-nucleus interaction operator becomes

$$H_{en} = \sum_i^n \sum_a^N \Big\{ -\frac{Z_a e^2}{r_{ai}(\delta)} - \frac{Z_a e}{2mc}\,\frac{\vec\mu_i\cdot[\vec r_{ai}\times(\vec P_i + \frac{e}{c}\vec A_i)]}{r_{ai}^3} + \frac{e}{mc}\,\frac{\vec\mu_a\cdot[\vec r_{ai}\times(\vec P_i + \frac{e}{c}\vec A_i)]}{r_{ai}^3} + \frac{(\vec\mu_i\cdot\vec\mu_a)\,r_{ai}^2 - 3(\vec\mu_i\cdot\vec r_{ai})(\vec\mu_a\cdot\vec r_{ai})}{r_{ai}^5} + \frac16\sum_{\alpha\beta} Q_{\alpha\beta}^{(a)}\,V_{\alpha\beta}^{(a)}(i) \Big\} \qquad (1.36)$$

and after some manipulations:

$$H_{en} = \sum_i^n \sum_a^N \Big\{ -\frac{Z_a e^2}{r_{ai}(\delta)} - \frac{Z_a e}{2mc}\,\frac{\vec\mu_i\cdot\vec L_{ai}}{r_{ai}^3} - \frac{Z_a e^2}{4mc^2}\,\vec H\cdot\frac{\vec\mu_i(\vec r_{ai}\cdot\vec r_i) - \vec r_{ai}(\vec r_i\cdot\vec\mu_i)}{r_{ai}^3} + \frac{e}{mc}\,\frac{\vec\mu_a\cdot\vec L_{ai}}{r_{ai}^3} + \frac{e^2}{2mc^2}\,\vec H\cdot\frac{(\vec r_{ai}\cdot\vec r_i)\,\tilde I - \vec r_{ai}\vec r_i}{r_{ai}^3}\cdot\vec\mu_a + \frac{(\vec\mu_i\cdot\vec\mu_a)\,r_{ai}^2 - 3(\vec\mu_i\cdot\vec r_{ai})(\vec\mu_a\cdot\vec r_{ai})}{r_{ai}^5} - \frac{8\pi}{3}\,\delta(\vec r_{ai})\,\vec\mu_i\cdot\vec\mu_a + \frac16\sum_{\alpha\beta} Q_{\alpha\beta}^{(a)}\,V_{\alpha\beta}^{(a)}(i) \Big\} \qquad (1.37)$$

where the expansion (1.30) should be substituted for the Coulomb interaction operator. $\vec L_{ai} = \vec r_{ai}\times\vec P_i$ denotes the angular momentum with respect to nucleus $a$.

In principle we have now concluded the transformation of the Hamiltonian (1.1) to a molecule-fixed system. The last step to be undertaken in this section will be to rearrange the operator so as to obtain a form which is suitable to handle.
The partitioned Hamiltonian.
We shall now try to separate the terms of the molecular Hamiltonian according to the effects that they represent. Such a procedure is of course somewhat arbitrary and can be justified only through the results that we obtain using it. We shall in fact partition the Hamiltonian as follows:

$$H = H_o + H_{rot} + H_{vib} + H_{int} + H_E + H_H \qquad (1.38)$$

where $H_o$ is a model Hamiltonian representing the 'unperturbed' system:

$$H_o = \sum_i^n \Big\{ \frac{p_i^2}{2m} - \sum_a^N \frac{Z_a e^2}{r_{ai}} \Big\} + \sum_{i,j}' \frac{e^2}{2 r_{ij}} + \sum_{a,b}' \frac{Z_a Z_b e^2}{2 r_{ab}} \qquad (1.39)$$

In (1.38), $H_{rot}$ describes the rigid rotation of the molecule, and $H_{vib}$ the nuclear vibration. $H_{int}$ represents the internal perturbations to the model Hamiltonian in the absence of external fields. $H_E$ and $H_H$ cover the interactions between the molecule and the external electric and magnetic fields. By inspection of (1.33), (1.35), and (1.38) we find:
$$H_{rot} = \sum_\alpha \frac{J_\alpha^2}{2I_\alpha} \qquad (1.40)$$
which gives rise to the pure rotational states,

$$H_{vib} = \sum_a^N \frac{p_a^2}{2m_a} + V(\delta) \qquad (1.41)$$

where $V(\delta)$ is the vibrational potential energy operator obtained from the expansions (1.30) and (1.31):
$$V(\delta) = \sum_a^N \sum_i^n \Big\{ -\delta\vec r_a\cdot\frac{Z_a e^2\,\vec r_{ai}}{r_{ai}^3} - \frac12\,\delta\vec r_a\cdot\frac{Z_a e^2\,(3\,\vec r_{ai}\vec r_{ai} - r_{ai}^2\,\tilde I)}{r_{ai}^5}\cdot\delta\vec r_a \Big\} + \sum_a^N \delta\vec r_a\cdot\Big(\sum_c' \frac{Z_a Z_c e^2\,\vec r_{ac}}{r_{ac}^3}\Big) + \frac12\sum_a^N \delta\vec r_a\cdot\Big(\sum_c' \frac{Z_a Z_c e^2\,(3\,\vec r_{ac}\vec r_{ac} - r_{ac}^2\,\tilde I)}{r_{ac}^5}\Big)\cdot\delta\vec r_a + \frac12\sum_{a,b}' \delta\vec r_a\cdot\frac{Z_a Z_b e^2\,(r_{ab}^2\,\tilde I - 3\,\vec r_{ab}\vec r_{ab})}{r_{ab}^5}\cdot\delta\vec r_b \qquad (1.42)$$

We shall later reduce this potential to the more familiar force constant representation. The internal perturbation operator is given by
$$H_{int} = \sum_a^N \sum_i^n \Big\{ -\frac{Z_a e}{2mc}\,\frac{\vec\mu_i\cdot\vec L_{ai}}{r_{ai}^3} + \frac16\sum_{\alpha\beta} Q_{\alpha\beta}^{(a)}\,V_{\alpha\beta}^{(a)}(i) \Big\} - \sum_\alpha \frac{J_\alpha L_\alpha}{I_\alpha} + \sum_a^N \vec\mu_a\cdot\sum_i^n \Big\{ \frac{e}{mc}\,\frac{\vec L_{ai}}{r_{ai}^3} + \frac{\vec\mu_i\,r_{ai}^2 - 3\,\vec r_{ai}(\vec\mu_i\cdot\vec r_{ai})}{r_{ai}^5} - \frac{8\pi}{3}\,\delta(\vec r_{ai})\,\vec\mu_i \Big\} \qquad (1.43)$$
As a matter of convenience the latter (bracketed) operator will be denoted the hyperfine structure operator, hfs:
$$\text{hfs}(a) = \sum_i^n \Big\{ \frac{e}{mc}\,\frac{\vec L_{ai}}{r_{ai}^3} + \frac{\vec\mu_i\,r_{ai}^2 - 3\,\vec r_{ai}(\vec\mu_i\cdot\vec r_{ai})}{r_{ai}^5} - \frac{8\pi}{3}\,\delta(\vec r_{ai})\,\vec\mu_i \Big\} \qquad (1.44)$$
$H_{int}$ represents various intrinsic molecular effects - mainly those involving spin - which cannot be classified as purely vibrational or rotational. The individual terms of (1.43) describe: (a) electron spin-orbit coupling, (b) nuclear quadrupole interaction, (c) coupling between total and pure electronic angular momenta, (d) nuclear spin-electron orbit coupling, (e) magnetic dipole interactions, (f) Fermi contact between electrons and nuclei. The latter three effects are entirely covered by the hyperfine structure operator (1.44).
We proceed to the field-dependent operators:

$$H_E = \sum_i^n \Big\{ -e\varphi(\vec r_i) + \frac{1}{2mc}\,\vec E(\vec r_i)\cdot(\vec\mu_i\times\vec P_i) \Big\} + \sum_a^N Z_a e\,\varphi(\vec r_a) \qquad (1.45)$$

This operator will be discussed in more detail below when we have introduced the multipole expansion. For the magnetic operator we get

$$H_H = \sum_i^n \Big\{ \frac{e}{2mc}\,\vec H\cdot\vec L_i - \vec H\cdot\vec\mu_i - \sum_a^N \frac{Z_a e^2}{4mc^2}\,\vec H\cdot\frac{\vec\mu_i(\vec r_{ai}\cdot\vec r_i) - \vec r_{ai}(\vec r_i\cdot\vec\mu_i)}{r_{ai}^3} \Big\} + \sum_i^n \frac{e^2}{8mc^2}\,\vec H\cdot(r_i^2\,\tilde I - \vec r_i\vec r_i)\cdot\vec H + \sum_i^n \sum_a^N \frac{e^2}{2mc^2}\,\vec H\cdot\frac{(\vec r_{ai}\cdot\vec r_i)\,\tilde I - \vec r_{ai}\vec r_i}{r_{ai}^3}\cdot\vec\mu_a - \vec H\cdot\sum_a^N \vec\mu_a \qquad (1.46)$$

The operator which is linear in $\vec H$ describes the interaction between the external magnetic field and the molecular magnetic moment, composed of electron orbit, electron spin, and mixed orbit-spin contributions. The operator which is quadratic in $\vec H$ describes the interaction between the external field and the induced magnetic moment, i.e. it defines the ability of the molecule to be polarized by the field (the 'susceptibility'). The last operator covers the interaction between the nuclear spins and the field, often denoted the nuclear Zeeman effect.
The multipole expansion.
We observe that the electric operator (1.45) contains the scalar potential $\varphi$ which is related to the external electric field through equation (1.3):

$$\vec E = -\nabla\varphi \qquad (1.47)$$

We shall find a suitable representation of the potential $\varphi$.

[Figure: a particle with charge Z sits at the point $\vec r$ in the molecule-fixed system, whose origin lies at $\vec R$, so that $\vec R' = \vec R + \vec r$; the field-producing charge q is at the origin of the space-fixed coordinate system.]

The electric potential at $\vec R'$ is simply

$$\varphi(\vec R') = \frac{q}{R'} = \frac{q}{|\vec R + \vec r|} \qquad (1.48)$$

We expand $\varphi$ in powers of $\vec r$:

$$\varphi(\vec R') = \varphi(\vec R) + \vec r\cdot\nabla\varphi(\vec R) + \tfrac12\,\vec r\cdot(\nabla\nabla\varphi(\vec R))\cdot\vec r + \cdots \qquad (1.49)$$

or, in molecule-fixed coordinates:

$$\varphi(\vec r) = \varphi(0) - \vec r\cdot\vec E(0) - \tfrac12\,\vec r\cdot\tilde F(0)\cdot\vec r + \cdots \qquad (1.50)$$

where 0 is the origin of the molecule-fixed system. In (1.50) we have introduced the external field and field gradient defined by:

$$\vec E(0) = -\nabla\varphi(0) = q\,\vec R/R^3 \qquad (1.51)$$

$$\tilde F(0) = -\nabla\nabla\varphi(0) = q\,(3\,\vec R\vec R - R^2\,\tilde I)/R^5 \qquad (1.52)$$
The last term of (1.50) can be transcribed using (1.52). A particle with charge Z (Z = -e for electrons, and Z = $Z_a e$ for nuclei) at the point $\vec r$ will thus experience an electric potential energy

$$Z\varphi(\vec r) = Z\varphi(0) - Z\,\vec r\cdot\vec E(0) - \tfrac13\sum_{\alpha\beta}\Theta_{\alpha\beta}\,F_{\alpha\beta}(0) + \cdots \qquad (1.53)$$

where we have introduced the quadrupole moment operator

$$\Theta_{\alpha\beta} = \tfrac12\,Z\,(3\,r_\alpha r_\beta - r^2\,\delta_{\alpha\beta}) \qquad (1.54)$$
The expansion (1.53) is called the electric multipole expansion. It evaluates the interaction between a charged particle and an electric field in terms of a direct Coulomb (point charge) interaction plus dipole, quadrupole, and higher order pole interactions. This type of expansion has in fact been used above in the expansion of the vibrational potentials (1.30) and (1.31), and it was implicitly used in the expansion of the nuclear electric potential into a Coulomb and a nuclear quadrupole term.
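The convergence behaviour of such a truncated expansion can be checked numerically. The following sketch (not from the original text; all numbers are illustrative) compares the exact potential of a distant point charge with its expansion through the quadrupole term, in the spirit of (1.48)-(1.52):

```python
import numpy as np

# Compare the exact potential q/|R + r| with the truncated expansion
# phi(0) - r.E(0) - (1/2) r.F(0).r of eqs. (1.48)-(1.52).
q = 1.0
R = np.array([10.0, 0.0, 0.0])    # molecule-fixed origin, far from the charge
r = np.array([0.3, 0.2, -0.1])    # particle position inside the molecule

exact = q / np.linalg.norm(R + r)

Rn = np.linalg.norm(R)
E = q * R / Rn**3                                        # field, eq. (1.51)
F = q * (3.0*np.outer(R, R) - Rn**2*np.eye(3)) / Rn**5   # gradient, eq. (1.52)

approx = q/Rn - r @ E - 0.5 * (r @ F @ r)                # eq. (1.50)
print(exact, approx)    # agreement improves order by order in r/R
```

Dropping the quadrupole term worsens the agreement by roughly a factor r/R, which is the sense in which the expansion is useful for a charge far from the molecule.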
We are now able to rewrite the electric field Hamiltonian
(1.45) using the multipole expansion. We obtain


$$H_E = \sum_i^n \Big\{ e\,\vec r_i\cdot\vec E + \frac{1}{2mc}\,\vec E\cdot(\vec\mu_i\times\vec P_i) - \frac13\sum_{\alpha\beta}\Theta_{\alpha\beta}(i)\,F_{\alpha\beta} \Big\} + \sum_a^N \Big\{ -Z_a e\,\vec r_a\cdot\vec E - \frac13\sum_{\alpha\beta}\Theta_{\alpha\beta}(a)\,F_{\alpha\beta} \Big\} \qquad (1.55)$$

where we have omitted the (constant) Coulomb potential $\varphi(0)$ and the higher order pole interactions. The physical interpretation of the terms in this representation is: (a) electronic dipole-field interaction, (b) electronic spin dipole-field interaction, (c) electronic quadrupole-field gradient interaction, (d) nuclear dipole-field interaction, (e) nuclear quadrupole-field gradient interaction.
2. REDUCTION OF THE HAMILTONIAN TO THE Λ-SUBSPACE. THE CONCEPT OF MOLECULAR PROPERTIES.
The Hamiltonian established in Section 1 operates in general on all the electronic and nuclear space and spin coordinates, and it includes operators representing interactions between the molecule and the external electric and magnetic fields. In this section we shall, on the basis of this Hamiltonian, construct a new energy operator which operates only on the nuclear coordinates, and we shall use the new operator to introduce the concept of molecular properties.
We use the notation $\bar\Lambda = (\bar\Lambda_{int}, \bar\Lambda_{ext})$ to denote the complete collection of non-electronic operators appearing in the Hamiltonian (1.38). The subscripts indicate that the operators are of internal and external origin, respectively. Thus

$$\bar\Lambda_{int} = (\vec p_a,\ \delta\vec r_a,\ \vec J,\ \vec\mu_a,\ \tilde Q_a \mid a = 1, 2, \ldots, N) \qquad (2.1)$$

$$\bar\Lambda_{ext} = (\vec E,\ \tilde F,\ \vec H) \qquad (2.2)$$

$\bar\Lambda_{int}$ is obviously a collection of operators representing (in order) nuclear vibrational momentum, nuclear displacement, total angular momentum, nuclear spin magnetic moment, and nuclear quadrupole moment, whereas $\bar\Lambda_{ext}$ collects the external electric field, the field gradient, and the magnetic field perturbations.
By inspection it is seen that we can partition the Hamiltonian (1.38) into a sum of three different operators

$$H = H_o' + H_1 + H_2 \qquad (2.3)$$

where $H_o'$ is independent of $\bar\Lambda$ whereas $H_1$ and $H_2$ contain the $\Lambda$-operators to first and second power. Thus

$$H_1 = \bar\Lambda\cdot\bar H^{(1)} \qquad\text{and}\qquad H_2 = \tfrac12\,\bar\Lambda\cdot\tilde H^{(2)}\cdot\bar\Lambda \qquad (2.4)$$

where $\bar H^{(1)}$ is a vector and $\tilde H^{(2)}$ is a matrix, both containing pure electronic operators. In this way we have obviously expanded the Hamiltonian in terms of $\bar\Lambda$ as follows:

$$H = H_o' + \bar\Lambda\cdot\bar H^{(1)} + \tfrac12\,\bar\Lambda\cdot\tilde H^{(2)}\cdot\bar\Lambda \qquad (2.5)$$

Since $H_o'$ is independent of $\bar\Lambda$ it will consist of the model Hamiltonian (1.39) defining the unperturbed system, and of the electron spin-orbit coupling operator appearing in (1.43), i.e.

$$H_o' = \sum_i^n \Big\{ \frac{p_i^2}{2m} - \sum_a^N \frac{Z_a e^2}{r_{ai}} \Big\} + \sum_{i,j}' \frac{e^2}{2 r_{ij}} + \sum_{a,b}' \frac{Z_a Z_b e^2}{2 r_{ab}} - \sum_i^n \sum_a^N \frac{Z_a e}{2mc}\,\frac{\vec\mu_i\cdot\vec L_{ai}}{r_{ai}^3} \qquad (2.6)$$

In this way we have in fact shifted the electron spin-orbit interaction operator from the internal perturbation operator $H_{int}$ (1.43) into $H_o'$. The apparent typographical confusion between the new $H_o'$ (2.6) and the old $H_o$ (1.39) disappears when we enter actual calculations, in which the spin-orbit interaction conventionally is again considered as an independent perturbation to $H_o$. In order to keep the subsequent derivations physically rigorous as far as possible we shall, however, maintain the new definition (2.6).

$\bar H^{(1)}$ and $\tilde H^{(2)}$ can be written down immediately if one collects the terms of the Hamiltonian (1.38) which are linear and quadratic, respectively, in the elements of $\bar\Lambda$. Since this is completely trivial we shall take the resulting operator (2.5) as it stands without stating explicitly the form of the operators $\bar H^{(1)}$ and $\tilde H^{(2)}$.

We intend to solve the stationary eigenstate problem

$$H\,\Psi_k = e_k\,\Psi_k \qquad (2.7)$$

The form of the Hamiltonian, (2.3) and (2.5), suggests the use of perturbation theory, considering $H_1$ and $H_2$ as first and second order perturbations to the model system described by $H_o'$. We introduce the Hamiltonian

$$H(\mu) = H_o' + \mu H_1 + \mu^2 H_2 \qquad (2.8)$$


where $\mu$ is an ordering parameter, and we shall use ordinary Rayleigh-Schrödinger perturbation theory to find the approximate solutions to the stationary state problem (2.7). We expand energies and wave functions in terms of $\mu$:

$$e_k = e_k^{(0)} + \mu\,e_k^{(1)} + \mu^2 e_k^{(2)} + \cdots \qquad (2.9)$$

$$\Psi_k = \Psi_k^{(0)} + \mu\,\Psi_k^{(1)} + \mu^2\,\Psi_k^{(2)} + \cdots \qquad (2.10)$$

and insert into the eigenvalue equation (2.7). If we assume that the zeroth order problem has been completely solved, i.e.

$$H_o'\,\Psi_k^{(0)} = e_k^{(0)}\,\Psi_k^{(0)} \qquad (2.11)$$

we obtain the well-known results for the energy corrections through second order:

$$e_k^{(1)} = \langle\Psi_k^{(0)}|H_1|\Psi_k^{(0)}\rangle \qquad (2.12)$$

$$e_k^{(2)} = \langle\Psi_k^{(0)}|H_2|\Psi_k^{(0)}\rangle + \sum_{l\neq k} \frac{\langle\Psi_k^{(0)}|H_1|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|H_1|\Psi_k^{(0)}\rangle}{e_k^{(0)} - e_l^{(0)}} \qquad (2.13)$$
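The corrections (2.12)-(2.13) are easy to verify numerically. The sketch below is not part of the text and the model Hamiltonian is hypothetical; it compares the second order Rayleigh-Schrödinger energy with exact diagonalization of a small matrix:

```python
import numpy as np

# H = H0 + V with H0 diagonal; the zeroth order states are unit vectors.
E0 = np.array([0.0, 1.0, 2.5])            # e_k^(0)
V = 0.05 * np.array([[0.0, 1.0, 0.4],     # perturbation, no diagonal part
                     [1.0, 0.0, 0.7],
                     [0.4, 0.7, 0.0]])

k = 0
e1 = V[k, k]                                           # eq. (2.12)
e2 = sum(V[k, l]**2 / (E0[k] - E0[l])                  # eq. (2.13)
         for l in range(len(E0)) if l != k)

exact = np.linalg.eigvalsh(np.diag(E0) + V)[0]
print(E0[k] + e1 + e2, exact)    # agree through second order in V
```

The residual difference is of third order in V, which can be made visible by scaling V up or down.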

Since $H_o'$ operates exclusively on electronic space and spin coordinates, $\Psi_k^{(0)}$ is a function of these electronic space and spin variables only. Because $\Psi_k^{(0)}$ is expected to describe an electronic quantum state, the Pauli principle of anti-symmetry must be imposed on the wave function, i.e. $\Psi_k^{(0)}$ must change its sign when the space and spin coordinates of two electrons are interchanged. Equation (2.11) is usually referred to as the electronic Schrödinger equation.

It is important to observe that the electronic states $\Psi_k^{(0)}$ include the effect of spin-orbit coupling, i.e. they include the so-called fine structure splitting of the electronic energy levels. Thus all perturbations represented by $H_1$ and $H_2$ are, in principle, perturbations to a single fine structure level.

In actual calculations where the spin-orbit operator is discarded from $H_o'$ the fine structure level is approximated simply by the 'pure' electronic state. In this case $\Psi_k^{(0)}$ is primarily a function of the electronic space coordinates only, because $H_o$ is spinless. However, it must still obey the anti-symmetry principle. It follows that $\Psi_k^{(0)}$ becomes a function of both space and spin coordinates, though in this case the spin dependence results exclusively from an additional physical requirement and not from the form of the operator; the spin dependence is designed merely

to satisfy the anti-symmetry principle.


In any case it follows at once that the integrations in (2.12) and (2.13) only extend over the electronic space and spin variables, leaving the $\Lambda$-operators appearing in $H_1$ and $H_2$ unaffected. We use this fact to rewrite the energy expression (2.9) in terms of $\bar\Lambda$ using (2.4). If at the same time we switch the perturbation fully on ($\mu \to 1$) we obtain

$$e_k = E_k^{(0)} - \bar\Lambda\cdot\bar E_k^{(1)} - \tfrac12\,\bar\Lambda\cdot\tilde E_k^{(2)}\cdot\bar\Lambda + \cdots \qquad (2.14)$$

where

$$E_k^{(0)} = e_k^{(0)} = \langle\Psi_k^{(0)}|H_o'|\Psi_k^{(0)}\rangle \qquad (2.15)$$

$$\bar E_k^{(1)} = -\langle\Psi_k^{(0)}|\bar H^{(1)}|\Psi_k^{(0)}\rangle \qquad (2.16)$$

$$\tilde E_k^{(2)} = -\langle\Psi_k^{(0)}|\tilde H^{(2)}|\Psi_k^{(0)}\rangle - \sum_l' \frac{\langle\Psi_k^{(0)}|\bar H^{(1)}|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\bar H^{(1)}|\Psi_k^{(0)}\rangle + \text{h.c.}}{E_k^{(0)} - E_l^{(0)}} \qquad (2.17)$$

It is remarkable that $e_k$ is an operator in the nuclear space, since it contains the $\Lambda$-operators; we shall denote this operator the reduced energy operator. We say that the complete Hamiltonian has been reduced to the Λ-subspace because the electronic space and spin variables have been removed by means of integration with the electronic state functions. $e_k$ must in fact be interpreted as the average or effective molecular Hamiltonian corresponding to the k'th electronic state.

The formal similarity between the expansion (2.14) of the reduced energy operator and an ordinary Taylor series allows for the following re-definition or re-interpretation of the expansion coefficients:
$$E_k^{(0)} = \lim_{\bar\Lambda\to 0} e_k \qquad (2.18)$$

$$\bar E_k^{(1)} = -\lim_{\bar\Lambda\to 0} \frac{\partial e_k}{\partial\bar\Lambda} \qquad (2.19)$$

$$\tilde E_k^{(2)} = -\lim_{\bar\Lambda\to 0} \frac{\partial^2 e_k}{\partial\bar\Lambda\,\partial\bar\Lambda} \qquad (2.20)$$

Thus the elements of $\bar E^{(1)}$ and $\tilde E^{(2)}$ can be considered simply as derivatives of the total molecular energy with respect to the components of $\bar\Lambda$, evaluated at $\bar\Lambda = 0$. The somewhat artificial definition of $\bar E^{(1)}$ and $\tilde E^{(2)}$ as negative energy corrections follows already established conventions. The reasons for this will become clearer below.
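The derivative definitions (2.19)-(2.20) suggest a simple finite-difference check. In the sketch below (not from the text; the two-level Hamiltonian and the 'property' matrix d are hypothetical) the first and second order properties are extracted as energy derivatives with respect to a single perturbation strength F:

```python
import numpy as np

H0 = np.diag([0.0, 1.0])
d = np.array([[0.3, 0.5],
              [0.5, -0.3]])      # hypothetical property operator

def e0(F):
    """Ground state energy with the perturbation -F*d switched on."""
    return np.linalg.eigvalsh(H0 - F * d)[0]

h = 1e-4
first = -(e0(h) - e0(-h)) / (2*h)             # -de/dF, cf. eq. (2.19)
second = -(e0(h) - 2*e0(0) + e0(-h)) / h**2   # -d2e/dF2, cf. eq. (2.20)
print(first, second)   # 0.3 and 0.5 for this model
```

Here `first` reproduces the expectation value ⟨0|d|0⟩ = 0.3 (Hellmann-Feynman) and `second` the sum-over-states value 2|d01|²/(E1-E0) = 0.5, illustrating the equivalence of the derivative and perturbation-sum definitions.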
Before analyzing the reduced energy operator in detail we shall briefly interpret it by referring to some well-known results.

Born and Oppenheimer [2] postulated, though in a somewhat different formulation, that the operator which we call the reduced energy operator defines the 'nuclear' eigenstates $\Phi$, i.e.

$$e_k\,\Phi = E_k(\bar\Lambda_{ext})\,\Phi \qquad (2.21)$$

where $\Phi$ is a function of the nuclear space and spin coordinates, and where the corresponding energies are functions of the external field perturbations only. $\Phi$ will accordingly describe molecular rotation and vibration as well as various nuclear spin couplings.

We have in the above perturbation treatment essentially used the same ideas as Born and Oppenheimer, namely that it is possible - to a certain degree of approximation - to treat the electronic and nuclear motions separately. In this approximation the total wave function of the molecule is simply a product of the electronic and nuclear eigenstates:

$$\Psi_{total} = \Psi_k^{(0)}\,\Phi \qquad (2.22)$$

and the energy of this product state is the eigenvalue $E_k(\bar\Lambda_{ext})$ obtained from the nuclear equation (2.21).
Molecular properties.
We shall now focus our attention on the reduced energy operator (2.14) and employ it to introduce the concept of molecular properties. First we realize that $\bar E_k^{(1)}$ and $\tilde E_k^{(2)}$ define in general the energetic response of the molecule to some internal or external perturbation described by $\bar\Lambda$. The magnitude of this response is a characteristic property of the molecule and of its actual electronic state. Accordingly we shall call this response a molecular property, and the elements of $\bar E_k^{(1)}$ and $\tilde E_k^{(2)}$ will be denoted as first and second order properties, respectively. Before giving the exact algebraic formulas of these molecular properties we shall interpret the contents of the property matrices in a qualitative way, on the basis of physical principles. This physical interpretation is summarized in Figures I and II.

Seen from the physical view-point the molecular properties

[Figure I: the first order property matrix $\bar E^{(1)}$, with rows labelled by the perturbations. $\vec p_a$: (null property). $\delta\vec r_a$: nuclear force (vanishes in equilibrium). $\vec J$: Lambda doubling. $\vec\mu_a$: hyperfine coupling constants. $\tilde Q_a$: internal electric field gradients. $\vec E$: permanent molecular dipole moment. $\tilde F$: permanent molecular quadrupole moment. $\vec H$: permanent molecular magnetic moment.]

Figure I. The first order property matrix $\bar E^{(1)}$. The dotted line separates the internal from the external properties. The null-property arises from the fact that the Hamiltonian does not contain any operators which are linear in $\vec p_a$.

fall into two distinct groups: those which connect to an internal perturbation, $\bar\Lambda_{int}$, and those which connect only to an external perturbation, $\bar\Lambda_{ext}$. These properties are different in the sense that the former ones will lead to a splitting of the energy levels of the molecule, whereas the latter ones merely shift all the levels equally much, within the considered electronic state. Thus the internal properties can be observed in a spectroscopic experiment, the external ones not. In the case of mixed internal and external properties, the external perturbation serves as a means of amplifying the splitting. According to this it is obvious that the Lambda doubling, the hyperfine coupling, and the internal field gradients will lead to field-independent splittings, and the same will hold for e.g. the vibrational operators and the nuclear spin-spin coupling. The nuclear magnetic shielding and the rotational magnetic moment factors cause splittings which are proportional to the applied magnetic field strength, and the magnetic susceptibility shifts all the levels by the same amount, proportional to the square of the magnetic field.

3. EXPLICIT ALGEBRAIC FORMULAS OF SOME FIRST AND SECOND ORDER PROPERTIES.

We could now easily write down formulas of all first and second order properties $\bar E^{(1)}$ and $\tilde E^{(2)}$, using the operators and the principles outlined in the preceding sections. This would, of course, be a rather comprehensive work; in this section we shall select some properties of common interest in order to illustrate the method.

In order to simplify the notation in the following we shall omit the subscript that refers to the electronic fine structure level. It is thus understood that we consider the stationary properties of a molecule in a single (but unspecified) electronic state.
Hyperfine coupling constants (first order).

As seen from (1.44), the hyperfine splitting is due to the coupling between electronic and nuclear spin magnetic moments. The hyperfine coupling constant is defined as

$$\bar A(a) = -\partial e/\partial\vec\mu_a \qquad (3.1)$$

Here and in the following it is understood that all derivatives are evaluated at $\bar\Lambda = 0$. From (1.38), (1.43), and (1.44) we find

$$\partial H/\partial\vec\mu_a = \partial H_{int}/\partial\vec\mu_a = \text{hfs}(a) \qquad (3.2)$$

whence
[Figure II: the second order property matrix $\tilde E^{(2)}$, with rows and columns labelled by the perturbations. Diagonal blocks: vibrational G-matrix ($\vec p_a$); quadratic force constants ($\delta\vec r_a$); reciprocal inertial tensor ($\vec J$); nuclear spin-spin coupling constants ($\vec\mu_a$); nuclear quadrupole-quadrupole coupling constants ($\tilde Q_a$); molecular dipole polarizability tensor ($\vec E$); molecular quadrupole polarizability tensors ($\tilde F$); molecular magnetic susceptibility tensor ($\vec H$). Off-diagonal blocks include: spin rotation constants, rotational magnetic moments, nuclear magnetic shielding tensors, molecular dipole moment derivatives, nuclear quadrupole shielding tensors, and induced electric moments.]

Figure II. The second order property matrix $\tilde E^{(2)}$. The matrix is symmetric. The blank fields refer to properties which are null-properties or which are unknown to the authors.

$$\bar A(a) = -\langle\Psi^{(0)}|\,\text{hfs}(a)\,|\Psi^{(0)}\rangle \qquad (3.3)$$

$$\bar A(a) = -\Big\langle\Psi^{(0)}\Big|\sum_i^n\Big\{\frac{e}{mc}\,\frac{\vec L_{ai}}{r_{ai}^3} + \frac{\vec\mu_i\,r_{ai}^2 - 3\,\vec r_{ai}(\vec\mu_i\cdot\vec r_{ai})}{r_{ai}^5} - \frac{8\pi}{3}\,\delta(\vec r_{ai})\,\vec\mu_i\Big\}\Big|\Psi^{(0)}\Big\rangle \qquad (3.4)$$

that is, the hyperfine coupling constant is simply the expectation value of the hyperfine structure operator, hfs. The first of the operators in (3.4) merely causes a shift of all the hyperfine levels, but not a splitting; for this reason it is usually neglected in this context, although it is not negligible in magnitude. The second operator in (3.4) is the dipole interaction term, which leads to anisotropic hyperfine splitting, and the last operator is the Fermi contact term, which gives the isotropic hyperfine splitting. The hyperfine coupling constant vanishes for molecular states with no unpaired electrons. The anisotropic coupling averages to zero for gases and liquids. The contact term vanishes unless there is a finite electron density at the nucleus a.
It is still doubtful whether (3.4) in fact fully accounts for the hyperfine coupling between electronic and nuclear spins. The reason for this is that the hfs operator is based on a multipole expansion of the internal vector potential, and this expansion cannot be proved to be convergent. The same point of dispute can be raised for the nuclear spin-spin coupling and shielding.
The hyperfine splitting is field-independent; it is observable in electron spin resonance spectroscopy where transitions
are induced between the Zeeman components of the fine structure
level, whose degeneracy has been removed by a strong magnetic
field.
Electric dipole and quadrupole moments (first order).

The molecular dipole and quadrupole moments define the first order response of the molecule to an external electric field and field gradient, respectively. Thus

$$\vec\mu = -\partial e/\partial\vec E = -\langle\Psi^{(0)}|\,\partial H/\partial\vec E\,|\Psi^{(0)}\rangle \qquad (3.5)$$

$$\tilde\Theta = -3\,\partial e/\partial\tilde F = -3\,\langle\Psi^{(0)}|\,\partial H/\partial\tilde F\,|\Psi^{(0)}\rangle \qquad (3.6)$$

It is readily seen that only the electric field operator (1.55)


contributes to the derived operators in (3.5) and (3.6), and we
obtain


$$\vec\mu = -\Big\langle\Psi^{(0)}\Big|\sum_i^n\Big\{e\,\vec r_i + \frac{\vec\mu_i\times\vec P_i}{2mc}\Big\}\Big|\Psi^{(0)}\Big\rangle + \sum_a^N Z_a e\,\vec r_a \qquad (3.7)$$

and for the quadrupole moment tensor

$$\tilde\Theta = -\frac{e}{2}\,\Big\langle\Psi^{(0)}\Big|\sum_i^n (3\,\vec r_i\vec r_i - r_i^2\,\tilde I)\Big|\Psi^{(0)}\Big\rangle + \frac{e}{2}\sum_a^N Z_a\,(3\,\vec r_a\vec r_a - r_a^2\,\tilde I) \qquad (3.8)$$

The spin dependent operator in (3.7) gives a vanishing contribution for singlet states. The dipole and quadrupole moments of a molecule can be determined experimentally by, e.g., molecular beam spectroscopy.
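As an illustration of a first order property as a ground state expectation value, the sketch below (not from the text; a one-dimensional model with hypothetical units, e = 1) computes the electronic part of (3.7), $-e\langle x\rangle$, for a particle in an asymmetric potential well:

```python
import numpy as np

n = 501
x = np.linspace(-5.0, 5.0, n)
dx = x[1] - x[0]
V = 0.5*x**2 + 0.2*x                 # asymmetric well -> nonzero <x>

# Kinetic energy via three-point finite difference of -1/2 d^2/dx^2
T = (np.diag(np.full(n, 1.0))
     - 0.5*np.diag(np.ones(n-1), 1)
     - 0.5*np.diag(np.ones(n-1), -1)) / dx**2
w, v = np.linalg.eigh(T + np.diag(V))
psi = v[:, 0] / np.sqrt(dx)          # normalized ground state on the grid

mu_el = -np.sum(psi**2 * x) * dx     # electronic dipole term of (3.7)
print(mu_el)                          # ~ +0.2: the density sits at x < 0
```

Since the model potential is a harmonic well shifted to x = -0.2, the expectation value ⟨x⟩ = -0.2 is known analytically, and the grid calculation reproduces it.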
Nuclear vibration (second order).

The first two diagonal blocks of $\tilde E^{(2)}$ define jointly a fragment of the reduced energy operator which is responsible for the oscillations of the nuclei within the molecule. We can write this operator fragment in the compact form

$$H_{vib} = \frac12\sum_a^N G_a\,p_a^2 + \frac12\sum_{a,b}^N \delta\vec r_a\cdot\tilde F_{ab}\cdot\delta\vec r_b \qquad (3.9)$$

where $G_a = 1/m_a$ and where $\tilde F_{ab}$ is the so-called force constant tensor. The presence of this operator will cause a collection of vibrational levels to be stacked on top of the electronic fine structure levels. The vibration operator obviously describes a superposition of 3N harmonic oscillators. We observe that $\tilde G$ is a diagonal matrix representing the kinetic energy of the oscillators, and that $\tilde F$ represents the corresponding potential energy. That coordinate system which diagonalizes $\tilde F$ and makes $\tilde G$ unity is known as the normal coordinate system.
Since there are only 3N-6 (for linear molecules 3N-5) internal modes of vibration, 6 (5) of the harmonic oscillators described by (3.9) will be non-genuine or redundant, and they give


zero contributions. This redundancy is established via a number


of conditions which the force constants must satisfy, known as
the translational and rotational invariance conditions because
they reflect the invariance of the vibration potential during
translation and rotation. These relations are inherent properties
of the potential; they appear because the potential operator is
derived from Coulomb interactions which only involve distances
between electrons and nuclei within the molecule.
An important remark should be added at this point. The invariance of the vibration potential during translation and rotation is an obvious physical requirement but in principle it
applies only to the complete potential - including the higher
order (anharmonic) terms. Since the rotational invariance relations are non-linear they connect potential terms of different
orders; the harmonic potential in (3.9) is a truncated expansion
and accordingly not invariant to rotations.
The translational invariance relations are linear and the
harmonic potential is exactly invariant during molecular translation.
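The redundant zero-frequency modes can be seen directly by diagonalizing the mass-weighted force constant matrix implied by (3.9). The sketch below is not from the text; masses and force constant are illustrative, and the model is a one-dimensional three-atom chain whose single translational degree of freedom shows up as a zero eigenvalue:

```python
import numpy as np

m = np.array([1.0, 16.0, 1.0])   # light-heavy-light chain, arbitrary units
k = 0.5                          # nearest-neighbour force constant

# Force constant matrix of two equal springs in one dimension
F = k * np.array([[ 1.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  1.0]])
Ghalf = np.diag(1.0/np.sqrt(m))
lam = np.sort(np.linalg.eigvalsh(Ghalf @ F @ Ghalf))   # omega^2 values

print(lam)   # [0, k/m, k*(1/m + 2/M)]: translation, sym. and asym. stretch
```

The zero eigenvalue reflects the translational invariance discussed above; for a three-dimensional molecule there would be 6 (5 for a linear molecule) such redundant modes.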
The force constant tensors are obviously defined by

$$\tilde F_{ab} = \frac{\partial^2 e}{\partial\vec r_a\,\partial\vec r_b} = \langle\Psi^{(0)}|\,\partial^2 H/\partial\vec r_a\,\partial\vec r_b\,|\Psi^{(0)}\rangle + 2\sum_l' \frac{\langle\Psi^{(0)}|\partial H/\partial\vec r_a|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\partial H/\partial\vec r_b|\Psi^{(0)}\rangle}{E^{(0)} - E_l^{(0)}} \qquad (3.10)$$

It is seen that only $H_{vib}$, (1.41)-(1.42), contributes to the operator derivatives in (3.10). We find easily

$$\tilde F_{ab} = -\,\delta_{ab}\,\Big\langle\Psi^{(0)}\Big|\sum_i^n \frac{Z_a e^2\,(3\,\vec r_{ai}\vec r_{ai} - r_{ai}^2\,\tilde I)}{r_{ai}^5}\Big|\Psi^{(0)}\Big\rangle + 2\sum_l' \frac{\langle\Psi^{(0)}|\sum_i Z_a e^2\,\vec r_{ai}/r_{ai}^3|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\sum_i Z_b e^2\,\vec r_{bi}/r_{bi}^3|\Psi^{(0)}\rangle}{E^{(0)} - E_l^{(0)}} \qquad (3.11)$$


This is the force constant representation of the vibration potential (1.42).

The vibrational splitting of the electronic levels is field-independent. It can be observed in, e.g., infrared or Raman spectroscopy, or as spectral fine structure in the UV.
Nuclear spin-spin coupling (second order).

The spin magnetic moments of the nuclei in a molecule interact via the electrons, and the interaction causes another field-independent splitting, known as the nuclear hyperfine splitting. The coupling constants are diagonal elements of the property matrix $\tilde E^{(2)}$:

$$\tilde J_{ab} = \frac{\partial^2 e}{\partial\vec\mu_a\,\partial\vec\mu_b} = \sum_l' \frac{\langle\Psi^{(0)}|\text{hfs}(a)|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\text{hfs}(b)|\Psi^{(0)}\rangle + \text{h.c.}}{E^{(0)} - E_l^{(0)}} \qquad (3.12)$$

where hfs is the hyperfine structure operator (1.44). $\tilde J_{ab}$ is named the nuclear spin-spin coupling tensor. Since the Hamiltonians (1.38) and (1.43) contain no operators of second order in the nuclear spin moments, there is no direct interaction between nuclear spins. For this reason, (3.12) consists only of the perturbation sum over the electronic states. The nuclear spin-spin coupling is obviously of the same origin as the hyperfine coupling between electronic and nuclear spins, except that it is classified as a second order effect.

The nuclear spin-spin interaction is observable in nuclear magnetic resonance experiments where transitions are induced between nuclear Zeeman levels whose degeneracy has been removed by a strong magnetic field.
Nuclear magnetic shielding (second order).

When a molecule is exposed to an external magnetic field the nuclear spin magnetic moments will interact with the field. This interaction is the nuclear Zeeman effect or the nuclear magnetic shielding. The magnitude of the shielding is defined by the shielding tensor

$$\tilde\sigma(a) = -\frac{\partial^2 e}{\partial\vec\mu_a\,\partial\vec H} = -\langle\Psi^{(0)}|\,\partial^2 H/\partial\vec\mu_a\,\partial\vec H\,|\Psi^{(0)}\rangle - \sum_l' \frac{\langle\Psi^{(0)}|\partial H/\partial\vec H|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\partial H/\partial\vec\mu_a|\Psi^{(0)}\rangle + \text{h.c.}}{E^{(0)} - E_l^{(0)}} \qquad (3.13)$$

It is seen from (1.43), (1.44), and (1.46) that

$$\frac{\partial H}{\partial\vec H} = \sum_i^n\Big\{\frac{e}{2mc}\,\vec L_i - \vec\mu_i\Big\} - \sum_i^n\sum_b^N \frac{Z_b e^2}{4mc^2}\,\frac{\vec\mu_i(\vec r_{bi}\cdot\vec r_i) - \vec r_{bi}(\vec r_i\cdot\vec\mu_i)}{r_{bi}^3} + \sum_i^n\sum_b^N \frac{e^2}{2mc^2}\,\frac{(\vec r_{bi}\cdot\vec r_i)\,\tilde I - \vec r_{bi}\vec r_i}{r_{bi}^3}\cdot\vec\mu_b - \sum_b^N \vec\mu_b, \qquad \frac{\partial H}{\partial\vec\mu_a} = \text{hfs}(a) + \cdots$$

Restricting ourselves to molecular states with no unpaired electrons, the electron spin dependent operators give vanishing contributions to (3.13), which then reduces to the Ramsey formula [3] for the nuclear shielding:

$$\tilde\sigma(a) = \tilde I - \frac{e^2}{2mc^2}\,\Big\langle\Psi^{(0)}\Big|\sum_i^n \frac{(\vec r_{ai}\cdot\vec r_i)\,\tilde I - \vec r_{ai}\vec r_i}{r_{ai}^3}\Big|\Psi^{(0)}\Big\rangle - \frac{e^2}{2m^2c^2}\sum_l' \frac{\langle\Psi^{(0)}|\sum_i \vec L_i|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\sum_i \vec L_{ai}/r_{ai}^3|\Psi^{(0)}\rangle + \text{h.c.}}{E^{(0)} - E_l^{(0)}} \qquad (3.14)$$

The second and the third terms are usually denoted the diamagnetic and the paramagnetic contributions, respectively, although this distinction is closely connected to the choice of gauge origin of the electromagnetic potentials and is therefore rather arbitrary. This problem will be further discussed in Section 5.

The nuclear shielding is often denoted as the chemical shift, though strictly speaking these two properties are not identical. We can introduce the chemical shift in the following way: the external magnetic field interacts with the nuclear spin magnetic moments both directly and via the electrons. The chemical shift tensor is defined to account for the electron-coupled interaction, and the over-all energy shift can be written as

$$\Delta E = -\vec\mu_a\cdot(\tilde I - \tilde\sigma_{shift}(a))\cdot\vec H \qquad (3.15)$$
Comparing this expression to the definition of the shielding (3.13) we obtain

$$\tilde\sigma_{shift}(a) = \tilde I - \tilde\sigma(a) \qquad (3.16)$$

The chemical shift tensor defines the effective magnetic field experienced by the nucleus; we define the effective field at the nucleus by

$$\Delta E = -\vec\mu_a\cdot\vec H_{eff}(a) \qquad (3.17)$$

whence

$$\vec H_{eff}(a) = (\tilde I - \tilde\sigma_{shift}(a))\cdot\vec H \qquad (3.18)$$

corresponding to the fact that the electrons shield the magnetic moment of the nucleus by decreasing the applied field strength.
The static electric polarizability (second order).
When a molecule is exposed to some external electric field there will be a first order interaction between the field and the molecular dipole moment. But in addition to this effect, the field induces a molecular dipole moment which then interacts with the applied field. This second order 'feed-back' effect is known as polarization, and it has its analogues in the diagonal blocks of the property matrix $\tilde E^{(2)}$. By inspection of Figure II we see that the polarizability tensor is defined by

$$\tilde\alpha = -\frac{\partial^2 e}{\partial\vec E\,\partial\vec E} \qquad (3.19)$$

From (1.38) and (1.55) we deduce that

$$\frac{\partial H}{\partial\vec E} = \sum_i^n \Big\{ e\,\vec r_i + \frac{\vec\mu_i\times\vec P_i}{2mc} \Big\} - \sum_a^N Z_a e\,\vec r_a \qquad (3.20)$$

Using the short-hand notation $\vec d$ for this operator, the polarizability tensor becomes simply

$$\tilde\alpha = -2\sum_l' \frac{\langle\Psi^{(0)}|\vec d|\Psi_l^{(0)}\rangle\langle\Psi_l^{(0)}|\vec d|\Psi^{(0)}\rangle}{E^{(0)} - E_l^{(0)}} \qquad (3.21)$$

328

PETER SWANSTR~M AND FLEMMING HEGELUND

because d is a real operator.
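The sum-over-states expression (3.21) can be checked against the finite-field definition (3.19) on a small model. The sketch below is not from the text; the three-level spectrum and the real symmetric 'dipole' matrix d are hypothetical:

```python
import numpy as np

E = np.array([0.0, 0.8, 2.0])          # model E^(0), E_1^(0), E_2^(0)
d = np.array([[ 0.1, 0.6,  0.2],
              [ 0.6, 0.0,  0.3],
              [ 0.2, 0.3, -0.1]])      # real operator, d = d^T

# Sum over excited states, eq. (3.21)
alpha = -2.0 * sum(d[0, l]*d[l, 0] / (E[0] - E[l]) for l in (1, 2))

# Finite-field cross-check of alpha = -d2e/dEdE, eq. (3.19)
def ground(F):
    return np.linalg.eigvalsh(np.diag(E) - F*d)[0]

h = 1e-3
alpha_ff = -(ground(h) - 2*ground(0) + ground(-h)) / h**2
print(alpha, alpha_ff)    # both ~ 0.94
```

The two numbers agree because the second field derivative of the exact ground state energy is, at zero field, identical to the second order perturbation sum.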


We can give some interesting physical interpretations of $\tilde\alpha$. Let us define the over-all electric moment of the molecule as
$$\vec\mu(\vec E) = -\frac{\partial e}{\partial\vec E} = \vec\mu_o + \tilde\alpha\cdot\vec E + \cdots \qquad (3.22)$$

where we expand the derivative $-\partial e/\partial\vec E$ in terms of the field. Using the definition (3.19) of the polarizability the dipole moment becomes

$$\vec\mu(\vec E) = \vec\mu_o + \tilde\alpha\cdot\vec E + \tfrac12\,\vec E\cdot\tilde\beta\cdot\vec E + \cdots \qquad (3.23)$$

where we have introduced the hyperpolarizability $\tilde\beta = -\partial^3 e/\partial\vec E^3$. This higher order polarizability is of importance when strong electric fields are applied, e.g. in laser beams. We see that the molecular dipole moment is a sum of the permanent dipole $\vec\mu_o$ and the induced moments, whose magnitudes and directions are determined by the polarizabilities.
The polarizability is also connected to some macroscopic quantities, namely the dielectric constant and the refractive index.

The specific polarization of an isotropic substance in an external field is

$$\vec P = N_V\,\bar\alpha\,\vec E = \kappa\,\vec E \qquad (3.24)$$

where $N_V$ is the number of molecules per cm³, $\bar\alpha$ is the average molecular polarizability (= trace $\tilde\alpha/3$), and where $\kappa$ is denoted the dielectric susceptibility. The effective charge distribution $\rho$ in the substance is
p = Po - div P

(3.25)

where $\rho_0$ is the field-free charge distribution. The electric field strength is defined by Coulomb's law, i.e.

$\mathrm{div}\,\bar E = 4\pi\rho = 4\pi(\rho_0 - \mathrm{div}\,\bar P)$

whence

$\mathrm{div}(\bar E + 4\pi\bar P) = 4\pi\rho_0$   (3.26)

and the effective field within the substance is

$\bar E_{\rm eff} = \bar E + 4\pi\bar P = (1 + 4\pi\kappa)\,\bar E = \varepsilon\,\bar E$   (3.27)

MOLECULAR PROPERTIES

The factor ε which connects the effective and the applied field strengths is the dielectric constant:

$\varepsilon = 1 + 4\pi N_V\,\bar\alpha$   (3.28)

and the refractive index n is defined by

$n^2 = \varepsilon,\quad\text{or}\quad n = \sqrt{1 + 4\pi N_V\,\bar\alpha}$   (3.29)

It is well known that the refractive index is frequency dependent. This frequency dependence appears when the applied fields are time-dependent, i.e. when we deal with electromagnetic radiation. In this case we must employ time-dependent perturbation theory to determine the frequency dependent or dynamic polarizability.
4. COMPUTATION OF PROPERTIES USING PERTURBATION THEORY.

In Section 2 we introduced the concept of molecular properties on the basis of perturbation theory. This gave rise to properties of various orders named E^(1), E^(2) and so on. In this section we shall present methods to compute the elements of E^(1) and E^(2).

When we want to perform such property calculations we primarily have to face the problem of solving the zeroth order electronic Schrödinger equation (2.11), because the stationary state functions enter the first and second order property expressions (2.16) and (2.17). Frequently one is only interested in the properties of one single electronic state (e.g. the ground state). In that case we need of course only the wave function of that state to compute E^(1), but if we want to calculate E^(2) we certainly need all the other states in order to evaluate the perturbation sum of (2.17). Since most computations of electronic wave functions aim at providing ground state functions, we shall restrict ourselves to finding properties for that state.

Computation of first order properties does not imply severe difficulties, since such properties are simply average or expectation values of the various property operators corresponding to the ground state function. As to the second order properties the situation is somewhat more complicated. The reason for this is of course that the zeroth order problem cannot be solved reasonably accurately for all the excited states.
At first sight the Hartree-Fock approach seems to be of little use, because it is a single state method. We shall return to this point later. Instead, the configuration interaction method could be of interest in a property calculation, because it generates a number of excited states which we might simply take as the excited states. Though at present we do not know much about the success of such an approach, it presents at least one major problem: the perturbation sum of (2.17) seems to be very slowly converging, and it will not include the continuum, which is claimed to contribute considerably in some cases.
Below we present a method which is strictly confined to the
Hartree-Fock level. It is a perturbation theory designed within
the SCF-LCAO frame usually referred to as the perturbed or coupled Hartree-Fock method. The method has proved successful in many
cases but it is restricted to closed shell molecules. We refer
the reader to the works by Lipscomb [12] and by Gerratt and Mills
[11] for a comprehensive and detailed presentation of the theory.
Perturbation theory has also been applied to the unrestricted
Hartree-Fock approach. This seems to have some success in the
case of spin-dependent properties.
In very general terms it can be said that there seems to be evidence that one-electron properties evaluated at the Hartree-Fock level will deviate only a few per cent from the exact or experimental data.
The coupled or perturbed Hartree-Fock approach.

The Fock operator operates on all the electronic and nuclear space and spin variables, and it contains operators which account for the interaction between the molecule and an external field. It is a pseudo one-electron operator

$f = h + g$   (4.1)

where h is a genuine one-electron operator while g covers the mutual electronic interactions. When we deal with one-electron perturbations as in the preceding sections, h depends explicitly on λ, and g implicitly via the molecular orbitals. We partition f into three terms according to powers of λ:

$f = f_0 + f_1 + f_2$   (4.2)

where $f_0 = f^{(0)}$ is the usual unperturbed Fock operator, and

$f_1 = \tilde\lambda f^{(1)},\qquad f_2 = \tfrac12\,\tilde\lambda f^{(2)}\lambda$   (4.3)

We can introduce the auxiliary operator

$f(\mu) = f_0 + \mu f_1 + \mu^2 f_2$   (4.4)

where μ is an ordering parameter.


Employing the (finite) basis $X = \{X_i\,|\,i = 1,2,\ldots,w\}$ of atomic orbitals for the expansion of $f(\mu)$, the SCF-LCAO equations take the form

$\tilde F_{AO}(\mu)\,\bar C_i(\mu) = \varepsilon_i(\mu)\,\tilde S_{AO}(\mu)\,\bar C_i(\mu)$   (4.5)

$\bar C_i(\mu)^+\,\tilde S_{AO}(\mu)\,\bar C_j(\mu) = \delta_{ij}$   (4.6)

which must be fulfilled for any value of μ in the permitted interval [0,1]. The matrices in (4.5) and (4.6) are simply

$[\tilde F_{AO}(\mu)]_{pq} = \langle X_p(\mu)|f(\mu)|X_q(\mu)\rangle$   (4.7)

$[\tilde S_{AO}(\mu)]_{pq} = \langle X_p(\mu)|X_q(\mu)\rangle$   (4.8)

We allow the basis functions to depend on the perturbation λ and therefore on μ. Thus the molecular orbitals defined by (4.5) and (4.6) are given by

$\varphi_i(\mu) = X(\mu)\,\bar C_i(\mu)$   (4.9)

For computational reasons we shall transform the general SCF-equations (4.5) and (4.6) to a coordinate system in which the unperturbed operator $f^{(0)}$ is diagonal. This can be accomplished by means of the matrix $\tilde C^{(0)}$ formed by the unperturbed eigenvectors $\bar C_i^{(0)}$ as columns:

$\tilde C^{(0)} = \{\bar C_1^{(0)},\ \bar C_2^{(0)},\ \ldots,\ \bar C_w^{(0)}\}$   (4.10)

We obtain

$\tilde F(\mu)\,\bar u_i(\mu) = \varepsilon_i(\mu)\,\tilde O(\mu)\,\bar u_i(\mu)$   (4.11)

$\bar u_i(\mu)^+\,\tilde O(\mu)\,\bar u_j(\mu) = \delta_{ij}$   (4.12)

where

$\tilde F(\mu) = \tilde C^{(0)+}\,\tilde F_{AO}(\mu)\,\tilde C^{(0)}$   (4.13)

$\tilde O(\mu) = \tilde C^{(0)+}\,\tilde S_{AO}(\mu)\,\tilde C^{(0)}$   (4.14)

$\bar C_i(\mu) = \tilde C^{(0)}\,\bar u_i(\mu)$   (4.15)
(The inverse of $\tilde C^{(0)}$ exists because $\tilde C^{(0)}$ is an orthogonal matrix.) Inserting (4.15) into (4.9) we obtain the relation between the molecular orbitals and the new eigenvectors $\bar u_i$:

$\varphi_i(\mu) = \Theta(\mu)\,\bar u_i(\mu)$   (4.16)

where $\Theta = X\,\tilde C^{(0)}$ is the transformed basis. Θ resembles the unperturbed molecular orbitals because

$\Theta(0) = X(0)\,\tilde C^{(0)} = \varphi(0)$   (4.17)

It follows that $\tilde u(0) = \{\bar u_1(0), \bar u_2(0), \ldots, \bar u_w(0)\}$ is a unit matrix, and that

$\tilde F_{ij}(0) = \delta_{ij}\,\varepsilon_i(0)$   (4.18)

$\tilde O_{ij}(0) = \delta_{ij}$   (4.19)

We now expand $\tilde F$, $\tilde O$, $\bar u_i$ and $\varepsilon_i$ in powers of the ordering parameter. Inserting the expansions into the SCF-equation (4.11) we obtain

$(\tilde F^{(0)} + \mu\tilde F^{(1)} + \ldots)(\bar u_j^{(0)} + \mu\bar u_j^{(1)} + \ldots) = (\varepsilon_j^{(0)} + \mu\varepsilon_j^{(1)} + \ldots)(\tilde 1 + \mu\tilde O^{(1)} + \ldots)(\bar u_j^{(0)} + \mu\bar u_j^{(1)} + \ldots)$   (4.20)

Collecting together corresponding orders in μ we get the equations

$\tilde F^{(0)}\,\bar u_j^{(0)} = \varepsilon_j^{(0)}\,\bar u_j^{(0)}$   (4.21)

$(\tilde F^{(0)} - \varepsilon_j^{(0)}\tilde 1)\,\bar u_j^{(1)} = -(\tilde F^{(1)} - \varepsilon_j^{(0)}\tilde O^{(1)} - \varepsilon_j^{(1)}\tilde 1)\,\bar u_j^{(0)}$   (4.22)

and so on. Here we are mainly interested in the first order equation, because we want to determine the $\bar u_j^{(1)}$ which in turn define the first order correction to the wave function. The zeroth order equation (4.21) is assumed to be exactly solved, i.e. we assume that the ordinary unperturbed Hartree-Fock solution is already established.

In order to simplify the following derivations we shall let the basis X be independent of the perturbation, i.e. $X(\mu) = X(0)$,


whence from (4.16) and (4.17):

$\varphi_i(\mu) = \Theta(0)\,\bar u_i(\mu) = \varphi(0)\,\bar u_i(\mu)$   (4.23)

It should be emphasized that this restriction has the character of an approximation. This approximation seems reasonable when dealing with external field perturbations. However, in the case of nuclear displacements leading to force constants it is not reasonable, i.e. when performing force constant calculations the basis set should be chosen dependent on the nuclear coordinates. This latter assumption makes the perturbation treatment somewhat more complicated; see e.g. Gerratt and Mills [11], and Thomsen and Swanstrøm [8].

In the invariable basis approximation we account for the perturbation of the wave function only through the coefficients $\bar u_i$. This means the overlap matrix $\tilde O$ is a unit matrix, whence (4.22) becomes

$(\tilde F^{(0)} - \varepsilon_j^{(0)}\tilde 1)\,\bar u_j^{(1)} = -(\tilde F^{(1)} - \varepsilon_j^{(1)}\tilde 1)\,\bar u_j^{(0)}$   (4.24)

After multiplication from the left by $\bar u_i^{(0)+}$ we get

$(\varepsilon_j^{(0)} - \varepsilon_i^{(0)})\,u_{ij}^{(1)} = \tilde F_{ij}^{(1)}$   (4.25)

where $\tilde F_{ij}^{(1)} = \tilde H_{ij}^{(1)} + \tilde G_{ij}^{(1)}$ is the first order matrix representation of the Fock operator. By definition

$\tilde H_{ij}^{(1)} = [\tilde C^{(0)+}\tilde H_{AO}^{(1)}\tilde C^{(0)}]_{ij} = \sum_{pq} C_{pi}^{(0)*}C_{qj}^{(0)}\,\langle X_p|h^{(1)}|X_q\rangle$   (4.26)

$\tilde G_{ij}^{(1)} = [\tilde C^{(0)+}\tilde G_{AO}^{(1)}\tilde C^{(0)}]_{ij} = \sum_{pq} C_{pi}^{(0)*}C_{qj}^{(0)}\,\langle X_p|g^{(1)}|X_q\rangle$   (4.27)

$h^{(1)}$ is the first order one-electron Hamiltonian, which we derive directly from (1.38). $g^{(1)}$ is the first order electronic repulsion operator, which in the Hartree-Fock scheme depends on the molecular orbitals. We shall derive $\tilde G_{ij}^{(1)}$ explicitly:

$g(\mu) = \sum_k^M\{2\hat J_k(\mu) - \hat K_k(\mu)\}$   (4.28)

where M is the number of occupied molecular orbitals and $\hat J$ and $\hat K$ the Coulomb and exchange operators defined by

$\langle X_p|\hat J_k(\mu)|X_q\rangle = \sum_{rs} C_{rk}(\mu)^*C_{sk}(\mu)\,\langle X_rX_p|\tfrac{1}{r_{12}}|X_sX_q\rangle$   (4.29)

$\langle X_p|\hat K_k(\mu)|X_q\rangle = \sum_{rs} C_{rk}(\mu)^*C_{sk}(\mu)\,\langle X_rX_p|\tfrac{1}{r_{12}}|X_qX_s\rangle$   (4.30)

whence

$\langle X_p|g(\mu)|X_q\rangle = \sum_k^M\sum_{rs} C_{rk}(\mu)^*C_{sk}(\mu)\,\{2\langle rs|pq\rangle - \langle rq|ps\rangle\}$   (4.31)

where $\langle rs|pq\rangle$ is the usual two-electron integral in the atomic orbital basis. Using (4.27) and (4.31) we find $\tilde G_{ij}^{(1)}$ in terms of the atomic orbital coefficients, or, using (4.15):

$\tilde G_{ij}^{(1)} = \frac{\partial}{\partial\mu}\sum_k^M\sum_{lm} u_{lk}(\mu)^*u_{mk}(\mu)\,\{2[lm|ij] - [lj|im]\}\Big|_{\mu=0}$   (4.32)

where the two-electron integrals are now transformed to the (unperturbed) molecular orbital basis. To evaluate the derivative (4.32) we perform the following operations. Equation (4.32) is written in the short-hand form

$\tilde G_{ij}^{(1)} = \frac{\partial}{\partial\mu}\sum_k^M \bar u_k(\mu)^+\,\tilde M\,\bar u_k(\mu)\Big|_{\mu=0}$   (4.33)

where the matrix $\tilde M$ (suppressing the indices i, j) is defined as

$\tilde M_{lm} = 2[lm|ij] - [lj|im]$   (4.34)

and (4.33) becomes

$\tilde G_{ij}^{(1)} = \sum_k^M\{\bar u_k^{(1)+}\tilde M\,\bar u_k^{(0)} + \bar u_k^{(0)+}\tilde M\,\bar u_k^{(1)}\} = \sum_k^M\sum_l^M\{u_{lk}^{(1)*}\tilde M_{lk} + u_{lk}^{(1)}\tilde M_{kl}\} + \sum_k^M\sum_{\tau=M+1}^w\{u_{\tau k}^{(1)*}\tilde M_{\tau k} + u_{\tau k}^{(1)}\tilde M_{k\tau}\}$   (4.35)

To simplify this further we utilize the orthogonality condition (4.12):

$(\bar u_i^{(0)} + \mu\bar u_i^{(1)} + \ldots)^+\,\tilde 1\,(\bar u_j^{(0)} + \mu\bar u_j^{(1)} + \ldots) = \delta_{ij}$

whence

$\bar u_i^{(1)+}\bar u_j^{(0)} + \bar u_i^{(0)+}\bar u_j^{(1)} = 0,\quad\text{i.e.}\quad u_{ij}^{(1)} + u_{ji}^{(1)*} = 0$   (4.36)
(4.36)

Using this in (4.35) we get simply

$\tilde G_{ij}^{(1)} = \sum_k^M\sum_{\tau=M+1}^w\{u_{\tau k}^{(1)*}\tilde M_{\tau k} + u_{\tau k}^{(1)}\tilde M_{k\tau}\}$   (4.37)

Now, if the $u_{\tau k}^{(1)}$ elements are real, corresponding to real one-electron perturbations (e.g. electric field perturbations), $\tilde G_{ij}^{(1)}$ becomes

$\tilde G_{ij}^{(1)} = \sum_k^M\sum_{\tau=M+1}^w u_{\tau k}^{(1)}\,(4[\tau k|ij] - [\tau j|ik] - [kj|i\tau])$   (4.38)

and if the $u_{\tau k}^{(1)}$ are purely imaginary, corresponding to imaginary one-electron perturbations (e.g. magnetic field perturbations), $\tilde G_{ij}^{(1)}$ becomes

$\tilde G_{ij}^{(1)} = \sum_k^M\sum_{\tau=M+1}^w u_{\tau k}^{(1)}\,([\tau j|ik] - [kj|i\tau])$   (4.39)

If we insert (4.38) and (4.39) in the first order equation we get two sets of equations from which we can determine the elements $u_{\tau k}^{(1)}$ (τ = M+1, ..., w; k = 1, ..., M). These equations are called the perturbed SCF-equations, and they define the first order corrections to the molecular orbitals due to a one-electron perturbation. The equations are

$\sum_k^M\sum_{\tau=M+1}^w\big\{4[\tau k|\sigma j] - [\tau j|\sigma k] - [kj|\sigma\tau] + \delta_{\sigma\tau}\delta_{kj}\,(\varepsilon_\sigma^{(0)} - \varepsilon_j^{(0)})\big\}\,u_{\tau k}^{(1)} = -H_{\sigma j}^{(1)}$   (4.40)

for the case of real perturbations. We use latin letters for occupied orbitals and greek letters for the virtual orbitals. For imaginary perturbations the equations become

$\sum_k^M\sum_{\tau=M+1}^w\big\{[\tau j|\sigma k] - [kj|\sigma\tau] + \delta_{\sigma\tau}\delta_{kj}\,(\varepsilon_\sigma^{(0)} - \varepsilon_j^{(0)})\big\}\,u_{\tau k}^{(1)} = -H_{\sigma j}^{(1)}$   (4.41)

We see that only the right-hand side of these equations depends on the perturbations, whereas the left-hand side depends on the zeroth order SCF solution only. The equations are ordinary linear equations with M(w−M) unknowns. The equations have non-trivial solutions if the wave function is stable to one-electron perturbations. The solutions determine the contributions from the virtual orbitals to the first order occupied orbitals.
The energy corrections.

Having worked out the first order perturbation corrections to the wave function, we are now able to determine the energy corrections, i.e. the molecular properties E^(1) and E^(2).

The energy corresponding to a closed shell wave function is

$e(\mu) = 2\sum_i^M \bar u_i(\mu)^+\,\tilde H(\mu)\,\bar u_i(\mu) + (\text{Coulomb} - \text{exchange term})$   (4.42)

where we have used (4.23). $\tilde H$ is then the matrix representation in MO basis of the one-electron operators appearing in the total Hamiltonian (1.38). We expand e in terms of μ:

$e(\mu) = e^{(0)} + \mu\,e^{(1)} + \mu^2 e^{(2)} + \ldots$   (4.43)

and we shall subsequently extract the properties from the energy corrections. By inspection we find

$e^{(1)} = 2\sum_i^M\{\bar u_i^{(0)+}\tilde H^{(1)}\bar u_i^{(0)} + \bar u_i^{(1)+}\tilde H^{(0)}\bar u_i^{(0)} + \bar u_i^{(0)+}\tilde H^{(0)}\bar u_i^{(1)}\} + \sum_{i,k}^M\{\bar u_k^{(1)+}\tilde M(i,i)\,\bar u_k^{(0)} + \bar u_k^{(0)+}\tilde M(i,i)\,\bar u_k^{(1)}\}$   (4.44)

$\tilde M(i,i)$ is the representation (4.34) of the electron repulsion operators. Using the orthogonality condition (4.36) we obtain (after operations similar to the ones leading from (4.32) to (4.39)):

$e^{(1)} = 2\sum_i^M H_{ii}^{(1)} + 2\sum_k^M\sum_{\tau=M+1}^w u_{\tau k}^{(1)*}\big\{H_{\tau k}^{(0)} + \sum_i^M(2[\tau k|ii] - [\tau i|ik])\big\} + \mathrm{c.c.}$   (4.45)

But the zeroth order SCF equations (4.21) state that

$H_{\tau k}^{(0)} = -\sum_i^M(2[\tau k|ii] - [\tau i|ik])$   (4.46)

Thus the second term in (4.45) and its complex conjugate vanish. Since $\tilde H^{(1)}$ is of first order in the perturbation operators λ,

$e^{(1)} = -\tilde\lambda\,\bar E^{(1)}$   (4.47)

where we have introduced the first order property matrix E^(1) of (2.16). The general expression for a first order property is then

$E^{(o)} = -2\sum_i^M H_{ii}^{(o)}$   (4.48)

where we use the superscript o to indicate that the property refers to the perturbation $\lambda_o$. This expression may be used to evaluate the first order properties listed in Figure I, Section 2. The first order energy correction (4.48) could also have been obtained by straightforward application of the Hellmann-Feynman theorem. The reasons for this are that we have assumed implicitly that the exact eigenfunctions of the zeroth order Fock operator are known, and that the basis functions are independent of the perturbations. This is discussed in [8].
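Numerically, a first order property is therefore nothing more than a sum over the occupied diagonal elements of the property operator's MO matrix. A minimal sketch, with invented H^(o)_ii values and the sign convention used here (a property is minus the corresponding energy derivative):

```python
# First order property as twice the occupied-diagonal sum of the
# property operator's MO matrix.  The H^(o)_ii values are invented
# illustration numbers, and the overall sign follows the convention
# that a property is minus the corresponding energy derivative.
M = 2                                  # number of occupied orbitals
H_diag = [0.42, -0.13, 0.07, 0.30]     # H^(o)_ii for i = 1..w, with w = 4

E1 = -2.0 * sum(H_diag[:M])            # only the occupied part enters
```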

Next we shall derive the second order energy corrections. This is most easily accomplished if we expand the general first order expression

$e^{(1)}(\mu) = 2\sum_i^M \bar u_i(\mu)^+\,\tilde H^{(1)}(\mu)\,\bar u_i(\mu)$   (4.49)

to second order in μ. We get

$e^{(2)} = 2\sum_i^M\{\bar u_i^{(0)+}\tilde H^{(2)}\bar u_i^{(0)} + \bar u_i^{(1)+}\tilde H^{(1)}\bar u_i^{(0)} + \bar u_i^{(0)+}\tilde H^{(1)}\bar u_i^{(1)}\}$   (4.50)

Again we use the orthogonality condition (4.36) to obtain

$e^{(2)} = 2\sum_i^M H_{ii}^{(2)} + 4\,\mathrm{Re}\sum_i^M\sum_{\tau=M+1}^w u_{\tau i}^{(1)*}H_{\tau i}^{(1)}$   (4.51)
This expression is of second order in λ:

$e^{(2)} = -\tfrac12\,\tilde\lambda\,\tilde E^{(2)}\lambda$   (4.52)

where $\tilde E^{(2)}$ is the second order property matrix (2.17). The general expression for a second order property is then

$E^{(ov)} = -2\sum_i^M H_{ii}^{(ov)} - 4\,\mathrm{Re}\sum_i^M\sum_{\tau=M+1}^w u_{\tau i}^{(v)*}H_{\tau i}^{(o)}$   (4.53)

Again we use superscripts o and v to distinguish between the various perturbations $\lambda_o$ and $\lambda_v$. By definition E^(ov) will be an element of the property matrix of Figure II, Section 2. The expression (4.53) can be used in the computational evaluation of such properties.
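The evaluation of (4.53) itself is a short contraction once the first order coefficients are available. The sketch below assumes a real perturbation (so the Re(...) is an ordinary product) and takes the H^(ov) term to be zero, as it is for an electric dipole polarizability; every number is an invented toy value:

```python
# Second order property for a toy case: M = 1 occupied orbital,
# virtuals tau = 2, 3.  The H^(ov) contribution is taken as zero (as
# for an electric polarizability); u^(v) and H^(o) are invented numbers
# of the kind a perturbed SCF calculation would deliver.
u_v = {2: -0.19, 3: 0.043}    # first order coefficients u^(v)_{tau 1}
H_o = {2: 0.15, 3: -0.05}     # property matrix elements H^(o)_{tau 1}

E2 = -4.0 * sum(u_v[t] * H_o[t] for t in (2, 3))   # occupied-virtual sum
```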
Though the coupled Hartree-Fock theory is straightforward, there are a number of practical difficulties to overcome. The most severe one is due to the sets of linear equations (4.40) and (4.41). Because of their extremely high dimension, M(w−M), the computer core storage will limit applications of the theory to rather small molecules. The most well-known attempt to solve this problem is due to Dalgarno [13], who suggested the neglect of all contributions from the two-electron integrals on the left hand side of the equations. This approach causes the coefficient matrix to be diagonal, but the results obtained in this way may be 50% in error.


5. GAUGE PROBLEMS OF MAGNETIC PROPERTIES.

As stated in Section 1, the electromagnetic potentials $\bar A(\bar r t)$ and $\varphi(\bar r t)$ define the electric and magnetic fields; see equations (1.3) and (1.4). However, these equations do not define $\bar A$ and $\varphi$ uniquely, but rather a set of potentials; two sets of potentials, $(\bar A,\varphi)$ and $(\bar A',\varphi')$, are said to be connected by a gauge transformation if they lead to the same $\bar E$ and $\bar H$. $(\bar A',\varphi')$ are obtained from $(\bar A,\varphi)$ by such a gauge transformation if

$\bar A'(\bar r t) = \bar A(\bar r t) - \nabla g(\bar r t)$   (5.1)

$\varphi'(\bar r t) = \varphi(\bar r t) + \frac1c\,\frac{\partial g(\bar r t)}{\partial t}$   (5.2)

for any decent function $g(\bar r t)$. This follows readily from application of (1.3) and (1.4):

$\bar E'(\bar r t) = -\nabla\varphi'(\bar r t) - \frac1c\,\frac{\partial\bar A'(\bar r t)}{\partial t} = \bar E(\bar r t)$   (5.3)

$\bar H'(\bar r t) = \nabla\times\bar A'(\bar r t) = \nabla\times\bar A(\bar r t) = \bar H(\bar r t)$   (5.4)

This fact leaves us completely free to choose the gauge function g. We shall take advantage of this freedom to select a special gauge which will prove convenient for our considerations.

Let $(\bar A',\varphi')$ be a set of potentials obtained from $(\bar A,\varphi)$ by an arbitrary gauge transformation. If we apply (1.3) and (1.4) to Ampère's law:

$\nabla\times\bar H - \frac1c\,\frac{\partial\bar E}{\partial t} = \frac{4\pi}{c}\,\bar j$   (5.5)

where $\bar j$ is the current density, we obtain

$\nabla\times(\nabla\times\bar A') + \frac1c\,\frac{\partial}{\partial t}\Big(\nabla\varphi' + \frac1c\,\frac{\partial\bar A'}{\partial t}\Big) = \frac{4\pi}{c}\,\bar j$   (5.6)

or, after some simplification:

$\nabla^2\bar A' - \frac1{c^2}\,\frac{\partial^2\bar A'}{\partial t^2} - \nabla\Big(\nabla\cdot\bar A' + \frac1c\,\frac{\partial\varphi'}{\partial t}\Big) = -\frac{4\pi}{c}\,\bar j$   (5.7)

We predict the existence of a gauge function g which makes the parenthesis in (5.7) vanish identically:

$\nabla\cdot\bar A' + \frac1c\,\frac{\partial\varphi'}{\partial t} = 0$   (5.8)

This condition, which we shall impose on the potentials, is the Lorentz condition of relativistic invariance. The vector potential $\bar A'$ is thus a solution of the simple inhomogeneous differential equation

$\nabla^2\bar A' - \frac1{c^2}\,\frac{\partial^2\bar A'}{\partial t^2} = -\frac{4\pi}{c}\,\bar j$   (5.9)

if $(\bar A',\varphi')$ fulfil the Lorentz invariance condition (5.8). Equation (5.9) is known as d'Alembert's equation. In the case of a perfect vacuum ($\bar j = 0$) it leads to the well-known plane electromagnetic waves [1]; but that is another story.

We are now able to prove the existence of the gauge function g predicted above. Inserting (5.1) and (5.2) into (5.8) we obtain

$\nabla^2 g(\bar r t) - \frac1{c^2}\,\frac{\partial^2 g(\bar r t)}{\partial t^2} = \nabla\cdot\bar A(\bar r t) + \frac1c\,\frac{\partial\varphi(\bar r t)}{\partial t}$   (5.10)

For any decent set of potentials $(\bar A,\varphi)$ we can find infinitely many solutions g of (5.10), and these solutions ensure the existence of gauge transforms $(\bar A',\varphi')$ which fulfil the Lorentz invariance condition (5.8).

Let us consider the time-independent case introduced in Section 1, equations (1.15) and (1.16). We will choose the potentials $(\bar A,\varphi)$ time-independent as well. We shall now seek a time-independent gauge function which ensures that $(\bar A',\varphi')$ fulfil the Lorentz invariance condition.

We can take the potential (1.22) as the arbitrary vector potential $\bar A$:

$\bar A(\bar r) = \tfrac12\,(\bar H\times\bar r)$   (5.11)

where $\bar H$ is the uniform magnetic field and $\bar r$ is a position vector in the molecule-fixed system. That this $\bar A$ is a genuine potential follows simply from the fact that $\mathrm{curl}\,\bar A = \bar H$. We insert (5.11) into (5.10) and look for a set of solutions $g(\bar r)$. We obtain

$\nabla^2 g(\bar r) = 0$   (5.12)

which may be satisfied by functions of the type

$g(\bar r) = -\tfrac12\,(\bar H\times\bar r)\cdot\bar R$   (5.13)

where $\bar R$ is any vector in the coordinate space. Thus, since we can

always find time-independent gauge functions g, it follows from (5.2) that

$\varphi'(\bar r) = \varphi(\bar r)$   (5.14)

Any potential

$\bar A'(\bar r) = \bar A(\bar r) - \nabla g(\bar r) = \tfrac12(\bar H\times\bar r) + \nabla\big(\tfrac12(\bar H\times\bar r)\cdot\bar R\big) = \tfrac12(\bar H\times\bar r) - \nabla\big(\tfrac12\,\bar r\cdot(\bar H\times\bar R)\big) = \tfrac12\,\bar H\times(\bar r - \bar R)$   (5.15)

will thus be a proper potential, in the sense that (5.14) and (5.15) together satisfy the Lorentz condition. Moreover, since $\varphi' = \varphi$, this condition is fulfilled for any scalar potential φ, and the Lorentz condition simplifies to

$\nabla\cdot\bar A' = 0$   (5.16)

also known as the Coulomb gauge.

It is seen that $\bar A' = \bar A$ if we choose $\bar R = 0$. $\bar A'$ differs from $\bar A$ only in the sense that while $\bar A$ is zero at the origin of the molecule-fixed coordinate system, $\bar A'$ is zero at the point defined by $\bar R$. The gauge transformation defined by the gauge function (5.13) will thus correspond to a translation of the potential zero point within the molecule. The point $\bar R$ is frequently denoted the gauge origin.
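That a shift of gauge origin leaves the magnetic field unchanged can be checked numerically: the curl of A′(r) = ½H×(r−R), taken here by central differences, reproduces the same uniform H for any choice of R. The field, gauge origins and test point below are arbitrary illustration values:

```python
# Numerical check that A'(r) = H x (r - R) / 2 yields the same uniform
# field H = curl A' for any gauge origin R.  H, R and the test point
# are arbitrary illustration values.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def A(r, H, R):
    d = tuple(ri - Ri for ri, Ri in zip(r, R))
    return tuple(0.5 * c for c in cross(H, d))

def curl(F, r, h=1e-6):
    def dF(i, j):                       # dF_i/dx_j by central difference
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        return (F(tuple(rp))[i] - F(tuple(rm))[i]) / (2 * h)
    return (dF(2, 1) - dF(1, 2), dF(0, 2) - dF(2, 0), dF(1, 0) - dF(0, 1))

H = (0.0, 0.0, 1.5)
r = (0.3, -0.7, 0.2)
curl0 = curl(lambda x: A(x, H, (0.0, 0.0, 0.0)), r)   # gauge origin at 0
curlR = curl(lambda x: A(x, H, (2.0, -1.0, 0.5)), r)  # shifted gauge origin
```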

In the time-independent case treated here, the electric field is defined exclusively by the scalar potential $\varphi'$, and the magnetic field exclusively by the vector potential $\bar A'$. But because there are infinitely many vector potentials leading to the same magnetic field $\bar H$, we shall have to prove that all magnetic properties are independent of the gauge origin. It is also clear that this kind of problem is non-existent for the electric properties, because $\varphi' = \varphi$ (5.14).
Gauge transformations of molecular wave functions.

Above we have discussed gauge transformations of the electromagnetic potentials. It was shown that, though we restrict ourselves to potentials which are relativistically invariant, there will still be infinitely many potentials which lead to identical electromagnetic fields. Since the potentials enter the molecular Hamiltonian, it is important to investigate the effect of these gauge transformations on the molecular wave functions and energies.

We shall demonstrate that the energy is independent of the choice of the gauge function under certain circumstances.

Let $(\bar A,\varphi)$ and $(\bar A',\varphi')$ denote potentials which are connected by a gauge function $g(\bar r t)$. g is primarily designed to ensure the invariance of $(\bar A',\varphi')$, i.e. it is defined by equation (5.10). Of course, if the potentials $(\bar A,\varphi)$ are already invariant, the right hand side of this equation vanishes because of (5.8). In any case this differential equation has infinitely many solutions for the gauge function. For the n-electron system that we consider we select n different solutions $\{g_i\,|\,i = 1,2,\ldots,n\}$. In this way we are able to assign to every electron an individual gauge function. We define an overall gauge function for the n-electron system by

$G(\bar r_1\bar r_2\ldots\bar r_n, t) = \sum_i^n g_i(\bar r_i t)$   (5.17)

and we introduce the gauge transformed wave function

$\Psi' = \exp\!\big(\tfrac{ie}{\hbar c}G\big)\,\Psi$   (5.18)

where Ψ is an (eventually approximate) electronic wave function. One can easily show that

$\bar\pi_i'\,\Psi' = \exp\!\big(\tfrac{ie}{\hbar c}G\big)\,\bar\pi_i\,\Psi$   (5.19)

and

$(\bar\pi_i'\cdot\bar\pi_i')\,\Psi' = \exp\!\big(\tfrac{ie}{\hbar c}G\big)\,(\bar\pi_i\cdot\bar\pi_i)\,\Psi$   (5.20)

where $\bar\pi_i'$ is the gauge transform of $\bar\pi_i$:

$\bar\pi_i' = \bar p_i + \tfrac{e}{c}\,\bar A'(\bar r_i t)$   (5.21)

If we deal with time-independent fields we can choose the potentials and the gauge functions time-independent as well. In this case $\varphi' = \varphi$, whence

$\Psi' = \exp\!\big(\tfrac{ie}{\hbar c}G\big)\,\Psi$   (5.22)

Since $\bar A$ enters the Hamiltonian only via the momentum operator $\bar\pi$, and G enters only as a multiplication operator, we conclude that

$H'\,\Psi' = \exp\!\big(\tfrac{ie}{\hbar c}G\big)\,H\,\Psi$   (5.23)

and

$\langle\Psi'|H'|\Psi'\rangle = \langle\Psi|H|\Psi\rangle$   (5.24)

This holds for approximate as well as for exact wave functions. We conclude that for any given set of potentials and for any given electronic state function Ψ we can always find new potentials which are relativistically invariant, and new state functions Ψ' which preserve the energy. This statement holds for time-independent fields.

The transformation (5.18) of the wave function is called a gauge transformation of the second kind; the transformations (5.1) and (5.2) are transformations of the first kind. As shown above, these transformations do not alter the physical contents of our model; they merely shift the zero point of some potentials and multiply the wave functions by a phase factor.
Gauge invariance of molecular properties.

In Section 2 we solved the stationary eigenstate problem (2.7) by means of perturbation theory. This led us to an expansion of the energy in terms of the perturbing operators λ:

$e = E_k^{(0)} - \tilde\lambda\,\bar E_k^{(1)} - \tfrac12\,\tilde\lambda\,\tilde E_k^{(2)}\lambda + \ldots$

with

$\bar\lambda = (\bar P_a,\ \bar r_a,\ \bar J,\ \bar\mu_a,\ \tilde Q_a;\ \bar E,\ \tilde F,\ \bar H)$

About the energy $e_k$ we know that it is gauge invariant, whether it is approximate or exact (5.24). It is also clear that the perturbation operators $\bar\lambda$ are gauge independent. The question to be answered is: are the electronic properties gauge invariant as well?

Seen from the physical viewpoint the answer is trivial: since the properties (directly or indirectly) represent observable quantities, they must be independent of the choice of gauge.

However, from the computational point of view the situation is not so simple. The reason is that $E_k^{(1)}$ and $E_k^{(2)}$ do not involve the gauge invariant momentum operators $\bar\pi$, because we dissolved this operator into its components already in Section 1. Since a time-independent gauge transformation results in the addition of an arbitrary constant to the vector potential, as seen from (5.15),

we foresee the possibility of gauge dependence of the magnetic properties. In fact the change of gauge origin will result in a change of the relative magnitudes of the diamagnetic and paramagnetic contributions to a magnetic property, thus making the distinction between these two contributions as arbitrary as the gauge origin. It can be shown, however, that the sum of the diamagnetic and paramagnetic terms is constant in case we employ a complete basis. Thus, while the total energy is gauge invariant even with a finite basis, the magnetic properties require an infinite basis. The proof of this statement involves a great number of algebraic operations, and for this reason we shall omit the proof from the present text; the reader is referred to Davies [4].

This gauge invariance is not obtained with finite basis sets, and consequently we may obtain any result that we want by shifting the gauge origin around in space. This nuisance may be overcome by the application of the so-called gauge invariant atomic orbitals (GIAO), first introduced by London [5], and later used by e.g. Hameka [6] and Ditchfield [7]. This method involves the assumption that the (approximate) wave function depends explicitly on the magnetic field in a predefined way. Of course this assumption cannot be justified physically, since the stationary state functions must depend on the external fields in an implicit way, but it has the virtue of leading to properties which are gauge invariant within a certain approximation.

The method of changing, in the wave function, an implicit parameter dependence to an explicit one is used frequently in calculations of force constants (see for instance [8] and references therein), and sometimes also in the evaluation of electric properties [9]. In no case can this be justified physically; it is merely a computational trick designed so as to obtain control over the dependence of the wave function on some perturbation parameters.

As stated above, the gauge problem is non-existent for the stationary electric properties, because the scalar potential is invariant under time-independent gauge transformations. Still, some electric properties may depend on the choice of molecule-fixed origin, i.e. a translation of the molecular framework within the molecule-fixed coordinate system changes some of the electric properties. This is so because the multipole expansion (1.53) of the scalar potential involves the values of the field and the field gradient at the molecule-fixed origin. Buckingham [10] has shown that only the first non-vanishing component of the multipole expansion is origin-independent. Thus the dipole moment is origin-independent for a neutral molecule, and the quadrupole moment will be independent only for non-dipolar molecules. This coordinate dependence should not be confused with the ordinary gauge dependence.
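Buckingham's origin-dependence rule is easy to verify for a point-charge model: for a neutral but dipolar charge set, the dipole moment is unchanged by a shift of origin while the quadrupole moment is not. The charges and positions below are arbitrary illustration values (only the traceless zz-component of the quadrupole is computed):

```python
# Origin dependence of multipole moments for a neutral, dipolar set of
# point charges.  Charges and positions are arbitrary illustration
# values; the quadrupole is the traceless Theta_zz component.
charges = [(1.0, (0.0, 0.0, 0.5)), (-1.0, (0.0, 0.0, -0.3))]

def dipole_z(charges, origin_z=0.0):
    return sum(q * (z - origin_z) for q, (_, _, z) in charges)

def quad_zz(charges, origin_z=0.0):
    total = 0.0
    for q, (x, y, z) in charges:
        zz = z - origin_z
        r2 = x * x + y * y + zz * zz
        total += q * (3 * zz * zz - r2) / 2.0   # Theta_zz = sum q(3z^2 - r^2)/2
    return total

mu_a, mu_b = dipole_z(charges, 0.0), dipole_z(charges, 1.0)   # equal
th_a, th_b = quad_zz(charges, 0.0), quad_zz(charges, 1.0)     # different
```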

BIBLIOGRAPHY

[1] R.E. Moss: Advanced Molecular Quantum Mechanics. (Chapman and Hall, 1973).
[2] See e.g. M. Born and K. Huang: Dynamical Theory of Crystal Lattices. (Oxford University Press, 1954).
[3] N.F. Ramsey: Molecular Beams. (Oxford Univ. Press, 1956).
[4] D.W. Davies: The Theory of the Electric and Magnetic Properties of Molecules. (J. Wiley, 1967).
[5] F. London, J.Phys.Radium (Paris) 8, 397 (1937).
[6] See e.g. H. Hameka: Advanced Quantum Chemistry. (Addison-Wesley, 1965).
[7] R. Ditchfield, J.Chem.Phys. 56, 5688 (1972).
[8] K. Thomsen and P. Swanstrøm, Mol.Phys. 26, 735 (1973).
[9] See e.g. Fraga and Malli: Many-Electron Systems: Properties and Interactions. (Saunders, 1968).
[10] A.D. Buckingham, Adv.Chem.Phys. 12, 107 (1967).
[11] J. Gerratt and I.M. Mills, J.Chem.Phys. 49, 1719 (1968).
[12] W.N. Lipscomb, Adv.Magn.Resonance 2, 137 (1966).
[13] A. Dalgarno, Adv.Phys. 11, 281 (1962).
[14] B.J. Howard and R.E. Moss, Mol.Phys. 19, 433 (1970).

AN INTRODUCTION TO MOLECULAR INTEGRAL EVALUATION

V.R. Saunders
Atlas Computer Laboratory, Chilton, Didcot, Oxfordshire OX11 0QY, England

1. INTRODUCTION

The expectation value of the electronic energy of an n-electron system may be written as the ratio of two integrals, the integration being taken over the 3n spatial and n spin coordinates of the electrons. Procedures for integration over the spin variables are not the concern of the present work, and we proceed under the assumption that the spin integrations of the energy expectation value have been completed. We are thus left with the evaluation of two integrals over the 3n spatial coordinates.

A rather widely used procedure is to construct the trial form of the many-electron wavefunction, Ψ, from a set of functions of the coordinates of one electron. We will denote the basis set of one-electron functions by {φ}. Notice that each basis function defines a set of n building blocks to be used in the construction of the wavefunction, since each function may be written in the coordinates of any of the n electrons. It is now usual to find, irrespective of the exact rules used in the
Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 347-424.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


construction of the trial wavefunction, that the integrals occurring in the energy expectation value can be reduced to linear combinations of three and six dimensional integrals of the following form (we assume that $H_e$ denotes the usual spin-free clamped nucleus Hamiltonian):

(a) Overlap Integral

$S_{ij} = \int\varphi_i(1)\,\varphi_j(1)\,d\tau_1$   (1a)

(b) Kinetic Energy Integral

$T_{ij} = -\tfrac12\int\varphi_i(1)\,\nabla^2\varphi_j(1)\,d\tau_1$   (1b)

(c) Nuclear Attraction Integral

$V_{ij}^C = \int\varphi_i(1)\,\varphi_j(1)\,r_{1C}^{-1}\,d\tau_1$   (1c)

where $r_{1C}$ denotes the distance between a fixed point with Cartesian coordinates $C_x$, $C_y$ and $C_z$, and a variable point with coordinates x, y, z, the latter being the position of electron 1.

(d) Electron Repulsion Integral

$(ij|kl) = \iint\varphi_i(1)\varphi_j(1)\,r_{12}^{-1}\,\varphi_k(2)\varphi_l(2)\,d\tau_1\,d\tau_2$   (1d)

Clearly the evaluation of the energy integrals is of the utmost importance for the application of the quantum theory to problems of molecular electronic structure, and it is this class of integral which will receive our most direct attention.
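For 1s Gaussian functions (introduced in Section 2 below) the overlap integral (1a) has a well-known closed form following from the Gaussian product theorem, S = (π/(α+β))^{3/2} exp(−αβ|A−B|²/(α+β)). The sketch below checks this standard result against direct numerical quadrature, using the fact that the integral factorizes into three one-dimensional integrals; exponents and centres are arbitrary illustration values:

```python
from math import exp, pi

# Overlap of two unnormalized 1s Gaussians exp(-a r_A^2), exp(-b r_B^2),
# checked against the Gaussian-product-theorem closed form.
# Exponents and centres are arbitrary illustration values.
a, b = 0.7, 1.1
A, B = (0.0, 0.0, 0.0), (0.0, 0.0, 1.2)

def overlap_1d(a, Ax, b, Bx, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule quadrature of one Cartesian factor of the overlap."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += exp(-a * (x - Ax) ** 2) * exp(-b * (x - Bx) ** 2)
    return total * h

S_num = 1.0
for Ax, Bx in zip(A, B):          # the 3-d integral factorizes per axis
    S_num *= overlap_1d(a, Ax, b, Bx)

AB2 = sum((Ax - Bx) ** 2 for Ax, Bx in zip(A, B))
S_exact = (pi / (a + b)) ** 1.5 * exp(-a * b * AB2 / (a + b))
```

It is exactly this reduction to closed forms that makes the Gaussian basis attractive, as discussed in Section 2.3.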
In the evaluation of molecular properties and higher terms in the full Hamiltonian, other categories of integral become important. For example, if we wish to evaluate the electric field at a point, C, we require integrals of the form

$\int\varphi_i(1)\,\frac{x_{1C}}{r_{1C}^3}\,\varphi_j(1)\,d\tau_1$

where $x_{1C}$ is the x coordinate of electron 1, measured from the point C. Similar integrals for the y and z axes are also required.

We will not take up the evaluation of such 'property' integrals in any great detail, since we prefer to indicate how methods to be described for the evaluation of the energy integrals may be generalized so as to be applicable to as wide a range of integrals as possible.

2. BASIS FUNCTIONS

2.1 Gaussian Type Functions

The unnormalized three-dimensional '1s' Gaussian type function (GTF) is given by

$G(\alpha,\bar A) = \exp(-\alpha r_A^2)$   (2)

where α, which is normally real and positive, is a parameter (the orbital exponent), and $r_A$ denotes the distance between a given fixed point, A, with coordinates $A_x$, $A_y$ and $A_z$, and a variable point with coordinates x, y and z. The fixed point will be the centre of the Gaussian. Thus

$r_A^2 = (x - A_x)^2 + (y - A_y)^2 + (z - A_z)^2$

The following abbreviations will be used

$x_A = x - A_x,\qquad y_A = y - A_y,\qquad z_A = z - A_z$

A more general class of function was first proposed by Boys [1], of the form

$G(\alpha,\bar A,\ell,m,n) = x_A^\ell\, y_A^m\, z_A^n\,\exp(-\alpha r_A^2)$   (3)

where ℓ, m and n are integers ≥ 0. It should be noted that the GTF can be factored into three simple (one dimensional) Gaussians, thus

$G(\alpha,\bar A,\ell,m,n) = X(\alpha,A,\ell)\,Y(\alpha,A,m)\,Z(\alpha,A,n)$   (4a)

where

$X(\alpha,A,\ell) = x_A^\ell\,\exp(-\alpha x_A^2)$   (4b)

and similarly for the Y and Z factors.
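The factorization (4a) is trivial to confirm numerically: a Cartesian GTF evaluated at a point equals the product of its three one-dimensional factors (4b). The exponent, centre, powers and evaluation point below are arbitrary illustration values:

```python
from math import exp

# Check of the factorization (4a)/(4b) of a Cartesian GTF.
# Exponent, centre, powers and evaluation point are arbitrary values.

def gtf(r, a, A, l, m, n):
    xA, yA, zA = (r[i] - A[i] for i in range(3))
    return xA**l * yA**m * zA**n * exp(-a * (xA*xA + yA*yA + zA*zA))

def factor(t, a, At, k):
    return (t - At) ** k * exp(-a * (t - At) ** 2)

a, A, (l, m, n) = 0.9, (0.1, -0.2, 0.3), (1, 0, 2)
r = (0.7, 0.4, -0.5)

g3d = gtf(r, a, A, l, m, n)
g1d = factor(r[0], a, A[0], l) * factor(r[1], a, A[1], m) * factor(r[2], a, A[2], n)
```

This separability is what reduces the three- and six-dimensional energy integrals over GTFs to products of one-dimensional quantities.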


2.2 Slater and Exponential Type Functions

If $r_A$, θ and φ denote the spherical polar coordinates of a point measured from an origin at A, thus

$x_A = r_A\sin\theta\cos\varphi$
$y_A = r_A\sin\theta\sin\varphi$
$z_A = r_A\cos\theta$

then a normalized Slater type function (STF) may be written

$S(\alpha,\bar A,n,\ell,m) = N\,r_A^{\,n-1}\exp(-\alpha r_A)\,Y_{\ell m}(\theta,\varphi)$

where $Y_{\ell m}(\theta,\varphi)$ denotes a spherical harmonic function of the polar angles (see e.g. reference [2], page 143), n denotes an integer (n ≥ 1), and N is a normalization factor. If the spherical harmonics are specified in real form, then the first few members

of the set are:

1s:  $(\alpha^3/\pi)^{1/2}\,\exp(-\alpha r_A)$
2s:  $(\alpha^5/3\pi)^{1/2}\,r_A\,\exp(-\alpha r_A)$
2p_x:  $(\alpha^5/\pi)^{1/2}\,x_A\,\exp(-\alpha r_A)$, and similarly for 2p_y, 2p_z
3s:  $(2\alpha^7/45\pi)^{1/2}\,r_A^2\,\exp(-\alpha r_A)$
3p_x:  $(2\alpha^7/15\pi)^{1/2}\,r_A x_A\,\exp(-\alpha r_A)$, and similarly for 3p_y, 3p_z
3d_xy:  $(2\alpha^7/3\pi)^{1/2}\,x_A y_A\,\exp(-\alpha r_A)$, and similarly for 3d_xz, 3d_yz
We shall find it preferable to consider a related form of basis function, which may be written, in unnormalized form,

$E(\alpha,\bar A,\mu,\ell,m,n) = r_A^{\,\mu}\,x_A^\ell\, y_A^m\, z_A^n\,\exp(-\alpha r_A)$   (5)

which we shall refer to as an exponential type function (ETF). Notice that any STF may be written as a linear combination of ETF.
2.3

Why consider two classes of basis function?

The exponential (or Slater) functions have often been preferred because:

(a) The exact wavefunction for a one-electron system in the presence of a central attractive Coulomb field (eg the hydrogen atom) may be written as a linear combination of either a finite number of ETF, or an infinite number of GTF. We may conclude that the ETF will prove more suitable for the construction of accurate wavefunctions in those regions of a molecule where the electrostatic potential may be closely approximated by a Coulombic central field, eg in the immediate neighbourhood of a nucleus.

(b) It may be shown that the exact wavefunction should obey certain cusp conditions. For either a one or two electron atomic system, a spherically symmetric wavefunction should obey the following nuclear-electron cusp condition

(∂ψ/∂r_A) at r_A = 0  =  -Z_A ψ(r_A = 0)

where Z_A denotes the atomic number of the nucleus. Clearly a 1s GTF centred at A is incapable of complying with the cusp condition, since the LHS of the cusp condition will be identically zero. However, a 1s ETF obeys the cusp condition exactly if we let the orbital exponent equal Z_A, again indicating the potential superiority of ETF for the construction of wavefunctions designed to be accurate in the immediate environs of a nucleus.
(c) The wavefunction should exhibit an exponential rather than Gaussian decay with respect to the distance of any electron from the nuclear framework at large distance. Clearly the asymptotic rate of decay of a GTF is too rapid.

(d) Perhaps the most important factor in practical computations: it has been found that fewer ETF than GTF are required to achieve a given accuracy in the energy.

Notwithstanding the above considerations, the GTF has been the most widely used basis function in recent ab initio calculations. The reason for this is almost certainly due to the fact that molecular integrals are so much more tractable when GTFs are used. Thus we find:

(a) The coding of a general purpose computer program to produce accurate values of molecular integrals is comparatively easily accomplished for basis sets of GTF, whilst it would still be fair to say that no such comparable program exists for the STF.

(b) The amount of functional and numerical analysis (and numerical experimentation) required to produce a viable GTF integral program is vastly less than for the corresponding STF case.

(c) The computer time taken to produce a molecular integral is considerably less in the GTF basis. However, the fact that more GTF must be used for a given accuracy partially offsets this effect.
In passing, it seems appropriate to note that our list of possible classes of basis function is by no means complete. For example, Steiner et al. [3] have suggested the use of so-called 'cusp' functions to supplement a GTF basis, so as to improve the quality of the basis in the neighbourhood of nuclear-electron cusps. Many other special-purpose sets of basis function are known (particularly for use in atomic and diatomic cases), and simple variants of the basic GTF or ETF/STF forms have also been used. None of these will be discussed. It is our hope that by omitting discussion of such integrals, we can devote more time and hence give greater depth to our discussion of molecular integrals for GTF and ETF. It is further hoped that the following pages will show many of the most useful techniques and something of the attitude to be adopted towards molecular integral evaluation.

3.

THE GAMMA FUNCTION AND RELATED DEFINITE INTEGRALS

The following formulae are of central importance for the evaluation of molecular integrals.

3.1

Definition of the complete Gamma function

The complete Gamma function may be defined through the definite integral

Γ(m) = ∫₀^∞ exp(-x) x^(m-1) dx    (6)

3.2

Recurrence relationship for the Gamma function

Integrating equation (6) by parts (∫u dv = [uv] - ∫v du), letting u = x^(m-1), dv = exp(-x) dx,

Γ(m) = [-x^(m-1) exp(-x)]₀^∞ + (m-1) ∫₀^∞ exp(-x) x^(m-2) dx    (7)

Now the first term on the RHS of equation (7) is zero if m is greater than 1, and using the definition of the Gamma function, equation (6), we obtain the recurrence relationship

Γ(m) = (m-1) Γ(m-1)    (m > 1)    (8)

3.3

Gamma functions of integral argument

The following is readily proved:

Γ(1) = ∫₀^∞ exp(-x) dx = [-exp(-x)]₀^∞ = 1

and from the recurrence relationship, we may show:

Γ(n) = (n-1)!    (n integral and ≥ 1)    (9)

so long as we adopt the convention 0! = 1.


3.4

Gamma functions of half integral argument

By means of the substitution u = a x^2, and using the definition of the Gamma function (6),

∫₀^∞ exp(-a x^2) x^t dx = (1/2) a^(-(t+1)/2) ∫₀^∞ exp(-u) u^((t-1)/2) du = (1/2) a^(-(t+1)/2) Γ((t+1)/2)    (10)

Hence, by symmetry,

∫₋∞^∞ exp(-a x^2) x^t dx = 0    (t odd integer)

∫₋∞^∞ exp(-a x^2) x^t dx = a^(-(t+1)/2) Γ((t+1)/2)    (t even integer)    (11)

From equation (11), we may show

[Γ(1/2)]^2 = ∫₋∞^∞ ∫₋∞^∞ exp(-(x^2+y^2)) dx dy

We now transform into plane polar coordinates

x = r cosθ,   y = r sinθ

the Jacobian of the transformation (see reference [4] page 182) is given by

| ∂x/∂r  ∂x/∂θ |
| ∂y/∂r  ∂y/∂θ |  =  r

Hence

[Γ(1/2)]^2 = ∫₀^∞ ∫₀^(2π) r exp(-r^2) dθ dr = 2π ∫₀^∞ r exp(-r^2) dr

By means of the substitution u = r^2, we find

[Γ(1/2)]^2 = π ∫₀^∞ exp(-u) du = π

Therefore

Γ(1/2) = π^(1/2)

From the recurrence relationship (8), we find

Γ(n+1/2) = (2n-1)!! π^(1/2) / 2^n    (n integral and ≥ 0)    (12)

where we adopt the convention (-1)!! = 1.
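Equation (12) is easily exercised numerically. The sketch below (an illustration added to these notes; function name ours) compares the double-factorial formula with a library Gamma function:

```python
import math

def dfact(k):
    """(2n-1)!! with the convention (-1)!! = 1."""
    out = 1
    while k > 1:
        out, k = out * k, k - 2
    return out

# equation (12) against the library Gamma function
worst = max(abs(math.gamma(n + 0.5) - dfact(2*n - 1) * math.sqrt(math.pi) / 2**n)
            / math.gamma(n + 0.5)
            for n in range(10))
# worst relative deviation sits at machine-precision level
```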

3.5

Some definite integrals related to the Gamma function

By means of the substitution z = x - β/2a, and using equations (11) and (12),

∫₋∞^∞ exp(-a x^2 + βx) dx = (π/a)^(1/2) exp(β^2/4a)    (13)

An alternative proof of equation (13) may be accomplished by expanding exp(βx) in power series, integrating term by term (using equation (11)), and noting that the resulting infinite series may be written as an exponential in β^2/4a with a factor (π/a)^(1/2). This latter proof is also valid for integrals of the form

∫₋∞^∞ exp(-a x^2 + iβx) dx = (π/a)^(1/2) exp(-β^2/4a)    (14)

where i = (-1)^(1/2), and where we expand exp(iβx) in power series to prove the result. Equation (14) may be generalised by differentiation with respect to β, using Leibnitz's rule to move the differential operator under the integral sign:

i^n ∫₋∞^∞ x^n exp(-a x^2 + iβx) dx = ∂^n/∂β^n [ (π/a)^(1/2) exp(-β^2/4a) ]    (15)

Noting the following definition of the Hermite polynomial, see reference [5] page 150,

H_n(z) = (-1)^n exp(z^2) (∂^n/∂z^n) exp(-z^2)

we find

∫₋∞^∞ x^n exp(-a x^2 + iβx) dx = (i/2a^(1/2))^n (π/a)^(1/2) exp(-β^2/4a) H_n(β/2a^(1/2))    (16)

The Hermite polynomial, H_n(z), may be written

H_n(z) = n! Σ_{k=0}^{[n/2]} (-1)^k (2z)^(n-2k) / ( k! (n-2k)! )    (17)

where the summation maximum, [n/2], denotes the largest integer less than or equal to n/2.
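The explicit sum (17) can be cross-checked against the familiar three-term recurrence H_{n+1}(z) = 2z H_n(z) - 2n H_{n-1}(z). The following sketch (added for illustration; names ours) does so:

```python
from math import factorial

def hermite_sum(n, z):
    """Equation (17): H_n(z) = n! Σ_k (-1)^k (2z)^(n-2k) / (k! (n-2k)!)."""
    return factorial(n) * sum((-1)**k * (2*z)**(n - 2*k)
                              / (factorial(k) * factorial(n - 2*k))
                              for k in range(n // 2 + 1))

def hermite_rec(n, z):
    """Recurrence H_{n+1} = 2z H_n - 2n H_{n-1}, with H_0 = 1, H_1 = 2z."""
    h0, h1 = 1.0, 2.0 * z
    if n == 0:
        return h0
    for j in range(1, n):
        h0, h1 = h1, 2*z*h1 - 2*j*h0
    return h1

z = 0.73
dev = max(abs(hermite_sum(n, z) - hermite_rec(n, z)) for n in range(8))
# dev sits at round-off level
```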
It is of course possible to achieve a somewhat similar generalization of equation (13), an exercise we leave to the student. Equation (13) will prove of use in the evaluation of two-electron and nuclear attraction integrals by means of a Laplace transformation technique, whilst equation (16) will be used for the evaluation of such integrals via a Fourier transform.

4.
OVERLAP AND KINETIC ENERGY INTEGRALS INVOLVING GTF

In this section we will consider the evaluation of the overlap and kinetic energy integrals for GTF. These integrals are of a rather simple form, and serve to illustrate the 'Gaussian product theorem' outlined below.

4.1

The Gaussian product theorem

The Gaussian product theorem states that the product of two s-type GTF having different centres, A and B, is itself an s-type Gaussian (multiplied by a constant factor) with a centre somewhere along the line AB, at a point P. That is:

G(a₁,A) G(a₂,B) = K G(γ,P)    (18)

To prove this theorem, we proceed as follows, noting r⃗_A = r⃗ - A⃗, r⃗_B = r⃗ - B⃗,

G(a₁,A) G(a₂,B) = exp(-(a₁+a₂) r⃗·r⃗ + 2(a₁A⃗+a₂B⃗)·r⃗ - a₁A⃗·A⃗ - a₂B⃗·B⃗)    (19)

also, noting r⃗_P = r⃗ - P⃗,

K G(γ,P) = K exp(-γ r⃗·r⃗ + 2γ P⃗·r⃗ - γ P⃗·P⃗)    (20)

To obtain the equivalence between the right hand sides of equations (19) and (20) implied by equation (18), we find (equate powers of r⃗)

γ = a₁ + a₂    (21)

therefore, using equation (21),

P⃗ = (a₁A⃗ + a₂B⃗)/γ    (22)

and, using equations (21) and (22), we find

K = exp(-a₁a₂(AB)^2/γ)    (23)

where we define (AB)^2 = (A⃗-B⃗)·(A⃗-B⃗). The notation AB_x = A_x - B_x will also be used in subsequent discussion. Equations (21) to (23) constitute proof of the product theorem.

When the GTFs are not 1s orbitals, extra factors such as x_A^ℓ₁ x_B^ℓ₂ appear in the product G(a₁,A,ℓ₁,m₁,n₁) G(a₂,B,ℓ₂,m₂,n₂). For later convenience we redefine these factors in terms of x_P, a coordinate relative to the point P. Thus x_A = (x-P_x) + (P_x-A_x) = x_P + PA_x, and similarly x_B = x_P + PB_x, and by the binomial theorem,

x_A^ℓ₁ = Σ_{i=0}^{ℓ₁} (ℓ₁ choose i) PA_x^(ℓ₁-i) x_P^i,   x_B^ℓ₂ = Σ_{j=0}^{ℓ₂} (ℓ₂ choose j) PB_x^(ℓ₂-j) x_P^j

and the product

x_A^ℓ₁ x_B^ℓ₂ = Σ_{k=0}^{ℓ₁+ℓ₂} f_k(ℓ₁,ℓ₂,PA_x,PB_x) x_P^k

where

f_k(ℓ₁,ℓ₂,PA_x,PB_x) = Σ_{i=0}^{ℓ₁} Σ_{j=0}^{ℓ₂} (i+j=k) (ℓ₁ choose i)(ℓ₂ choose j) PA_x^(ℓ₁-i) PB_x^(ℓ₂-j)

where the notation Σ_i Σ_j (i+j=k) means that we sum over the indices i and j in the indicated range, for all values of the indices satisfying the condition i+j=k. Combining the results of the present section together, we find

G(a₁,A,ℓ₁,m₁,n₁) G(a₂,B,ℓ₂,m₂,n₂) = K exp(-γ r_P^2) [Σ_{k=0}^{ℓ₁+ℓ₂} f_k(ℓ₁,ℓ₂,PA_x,PB_x) x_P^k] [Σ_{k=0}^{m₁+m₂} f_k(m₁,m₂,PA_y,PB_y) y_P^k] [Σ_{k=0}^{n₁+n₂} f_k(n₁,n₂,PA_z,PB_z) z_P^k]    (24)
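A pointwise numerical check of the product theorem is immediate. The sketch below (illustrative, not from the original) compares exp(-a₁r_A^2) exp(-a₂r_B^2) with K exp(-γ r_P^2) at an arbitrary point, using equations (21) to (23):

```python
import math

def s_gauss(a, A, r):
    """s-type GTF exp(-a r_A^2) centred at A."""
    rA2 = sum((r[i] - A[i])**2 for i in range(3))
    return math.exp(-a * rA2)

a1, a2 = 0.9, 0.4
A = (0.0, 0.1, -0.3)
B = (1.2, -0.5, 0.7)
gamma = a1 + a2                                              # equation (21)
P = tuple((a1*A[i] + a2*B[i]) / gamma for i in range(3))     # equation (22)
AB2 = sum((A[i] - B[i])**2 for i in range(3))
K = math.exp(-a1*a2*AB2/gamma)                               # equation (23)

r = (0.3, 0.8, -0.2)
lhs = s_gauss(a1, A, r) * s_gauss(a2, B, r)
rhs = K * s_gauss(gamma, P, r)
# lhs and rhs agree to machine precision
```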
4.2

Normalization of the GTFs

A normalized GTF is defined by means of

N G(a,A,ℓ,m,n)

where N is such that

N^2 ∫ G^2(a,A,ℓ,m,n) dτ = 1

Hence N = I^(-1/2), where I = ∫ G^2(a,A,ℓ,m,n) dτ. I may be considered as a rather simple example of the overlap integral defined by equation (1a), and may be written

I = I_x I_y I_z

where

I_x = ∫₋∞^∞ x_A^(2ℓ) exp(-2a x_A^2) dx = (2a)^(-(2ℓ+1)/2) Γ((2ℓ+1)/2)    (and similarly for I_y, I_z)

using equation (11). Hence

I = (2a)^(-(2ℓ+2m+2n+3)/2) Γ((2ℓ+1)/2) Γ((2m+1)/2) Γ((2n+1)/2)
  = (π/2a)^(3/2) (2ℓ-1)!! (2m-1)!! (2n-1)!! / (4a)^(ℓ+m+n)

where we have used equation (12). In all subsequent analysis, we shall work with unnormalized GTF. Clearly an integral over normalized GTF may be expressed as the corresponding integral over unnormalized GTF multiplied by appropriate normalization constants.


4.3

The overlap integral for unnormalized GTF

The overlap integral between two unnormalized GTF may be written, using equation (24),

S = K I_x I_y I_z    (25a)

where

I_x = Σ_{k=0}^{ℓ₁+ℓ₂} f_k(ℓ₁,ℓ₂,PA_x,PB_x) ∫₋∞^∞ x_P^k exp(-γ x_P^2) dx_P    (25b)

and similarly for I_y and I_z. I_x may be evaluated using equation (11), to yield (note that the odd powers of x_P in equation (25) give rise to zero-valued integrals)

I_x = Σ_{i=0}^{[(ℓ₁+ℓ₂)/2]} f_{2i}(ℓ₁,ℓ₂,PA_x,PB_x) γ^(-(2i+1)/2) Γ((2i+1)/2)

This result may be written, using equation (12),

I_x = (π/γ)^(1/2) Σ_{i=0}^{[(ℓ₁+ℓ₂)/2]} f_{2i}(ℓ₁,ℓ₂,PA_x,PB_x) (2i-1)!! / (2γ)^i    (26)

with corresponding results for I_y and I_z, thus completing the formulation of the overlap integral.

4.4

The kinetic energy integral for unnormalized GTF


(unsymmetric form)

Before proceeding to an analysis of the kinetic energy integral, it will prove convenient to establish a short hand notation for integrals related to the overlap integral. We first define the symbol <0|0>,

<0|0> = ∫ G(a₁,A,ℓ₁,m₁,n₁) G(a₂,B,ℓ₂,m₂,n₂) dτ    (27)

The symbol <+n|0>_x will denote an integral of the form given by equation (27), except that the quantum number ℓ₁ has been incremented by n. Similar notations will apply to the m₁ and n₁ quantum numbers by the use of subscripts y and z. We will also use the symbols <0|+n>_x, where we have incremented the quantum number ℓ₂ by n, and <+m|+n>_x, where the quantum numbers ℓ₁ and ℓ₂ have been incremented by m and n respectively. The symbol <-m|-n>_x will denote an integral where the quantum numbers ℓ₁ and ℓ₂ have been decremented by m and n respectively. If the result of such decrementation is to produce a negative quantum number, the integral will be set to zero.

The kinetic energy integral was defined by equation (1b), and may be written:

KE = -½ ∫ G(a₁,A,ℓ₁,m₁,n₁) (∂²/∂x² + ∂²/∂y² + ∂²/∂z²) G(a₂,B,ℓ₂,m₂,n₂) dτ

Clearly, the integral can be written as a sum of three components,

KE = I_x + I_y + I_z    (28)

where

I_x = -½ ∫ G(a₁,A,ℓ₁,m₁,n₁) ∂²/∂x² G(a₂,B,ℓ₂,m₂,n₂) dτ    (29)

and similarly for I_y and I_z. The following results are easily obtained:

∂/∂x G(a₂,B,ℓ₂,m₂,n₂) = ℓ₂ G(a₂,B,ℓ₂-1,m₂,n₂) - 2a₂ G(a₂,B,ℓ₂+1,m₂,n₂)    (30)

∂²/∂x² G(a₂,B,ℓ₂,m₂,n₂) = ( ℓ₂(ℓ₂-1) x_B^(-2) - 2a₂(2ℓ₂+1) + 4a₂² x_B² ) G(a₂,B,ℓ₂,m₂,n₂)

Whence I_x may be written

I_x = -½ ( ℓ₂(ℓ₂-1) <0|-2>_x - 2a₂(2ℓ₂+1) <0|0> + 4a₂² <0|+2>_x )    (31)

and similarly for I_y and I_z.

Equations (28) and (31) comprise our first scheme, the unsymmetric scheme, for the evaluation of kinetic energy integrals. The scheme is unsymmetric because the integrals which appear in equation (31) are of the form where the quantum numbers of the GTF centred on B are altered, whilst those of the GTF on A are not.
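The formulae of sections 4.1 to 4.4 translate directly into code. The sketch below (an illustrative implementation under the conventions of these notes: unnormalized GTF; the function names f, overlap and kinetic are ours) builds the overlap from equations (23), (25a) and (26) and the unsymmetric kinetic energy from equation (31):

```python
import math
from math import comb

def dfact(k):
    """Double factorial (2i-1)!!, with the convention (-1)!! = 1."""
    out = 1
    while k > 1:
        out, k = out * k, k - 2
    return out

def f(k, l1, l2, pa, pb):
    """Binomial product coefficient f_k(l1, l2, PA_x, PB_x) of section 4.1."""
    return sum(comb(l1, i) * comb(l2, j) * pa**(l1 - i) * pb**(l2 - j)
               for i in range(l1 + 1) for j in range(l2 + 1) if i + j == k)

def overlap1d(a1, Ax, l1, a2, Bx, l2):
    """One Cartesian factor I_x of equation (26) (K factor excluded)."""
    g = a1 + a2
    Px = (a1 * Ax + a2 * Bx) / g
    pa, pb = Px - Ax, Px - Bx
    return math.sqrt(math.pi / g) * sum(
        f(2 * i, l1, l2, pa, pb) * dfact(2 * i - 1) / (2 * g)**i
        for i in range((l1 + l2) // 2 + 1))

def overlap(a1, A, lmn1, a2, B, lmn2):
    """<0|0> of equation (27), via equations (23) and (25a)."""
    g = a1 + a2
    K = math.exp(-a1 * a2 * sum((A[i] - B[i])**2 for i in range(3)) / g)
    return K * math.prod(overlap1d(a1, A[d], lmn1[d], a2, B[d], lmn2[d])
                         for d in range(3))

def kinetic(a1, A, lmn1, a2, B, lmn2):
    """Unsymmetric scheme, equations (28) and (31)."""
    total = 0.0
    for d in range(3):
        l2 = lmn2[d]
        def S(dl, d=d):   # <0|dl>_d : increment/decrement quantum number on B
            q2 = list(lmn2)
            q2[d] += dl
            return 0.0 if q2[d] < 0 else overlap(a1, A, lmn1, a2, B, tuple(q2))
        total += -0.5 * (l2 * (l2 - 1) * S(-2)
                         - 2 * a2 * (2 * l2 + 1) * S(0)
                         + 4 * a2 * a2 * S(+2))
    return total
```

Because the GTF factorize, both quantities can be verified against products of one-dimensional numerical quadratures, which is how the sketch was checked.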

4.5

The kinetic energy integral for unnormalized GTF (symmetric form)

The x component of the Laplacian in equation (1b) gives rise to the integral

I_x = -½ ∫∫∫ φ_i ∂²φ_j/∂x² dx dy dz

Integrate by parts in the x coordinate, letting u = φ_i, dv = (∂²φ_j/∂x²) dx, whence du = (∂φ_i/∂x) dx, v = ∂φ_j/∂x. Hence

I_x = -½ ∫∫ dy dz ( [φ_i ∂φ_j/∂x]₋∞^∞ - ∫₋∞^∞ (∂φ_i/∂x)(∂φ_j/∂x) dx )    (32)

Now the term [φ_i ∂φ_j/∂x]₋∞^∞ in equation (32) is identically zero for basis functions where both φ_i → 0 and ∂φ_j/∂x → 0 as x → ±∞. Such basis functions are the only allowable ones for bound state problems. Therefore:

I_x = ½ ∫∫∫ (∂φ_i/∂x)(∂φ_j/∂x) dx dy dz    (33)

We now apply equation (30) twice, to obtain the result

I_x = ½ ( ℓ₁ℓ₂ <-1|-1>_x - 2a₁ℓ₂ <+1|-1>_x - 2a₂ℓ₁ <-1|+1>_x + 4a₁a₂ <+1|+1>_x )    (34)

Thus equations (28) and (34) define an alternative symmetric scheme for the evaluation of kinetic energy integrals. There is little to choose between the two schemes in terms of computational efficiency. The present author makes use of the symmetric scheme, principally because more of the auxiliary functions required for the kinetic energy integrals are in common with those required for dipole moment integrals in the symmetric scheme, hence making the simultaneous evaluation of kinetic energy and dipole moment integrals slightly easier.
4.6

Dipole moment integrals for unnormalized GTF

We discuss these non-energy integrals here because of the frequency of their use, and their close connection with the overlap and kinetic energy integrals. The general dipole moment integral may be written

D_x = ∫ G(a₁,A,ℓ₁,m₁,n₁) x_C G(a₂,B,ℓ₂,m₂,n₂) dτ

and similarly for the operators y_C and z_C. A convenient procedure is to redefine x_C in terms of x_A or x_B, and we will use x_A. Thus

x_C = x - C_x = (x-A_x) + (A_x-C_x) = x_A + AC_x

and we find

D_x = <+1|0>_x + AC_x <0|0>

and similarly for D_y and D_z.


5.

INTEGRALS RELATED TO THE INCOMPLETE GAMMA FUNCTION

We now present a study of a definite integral which cannot be evaluated in closed form, and which will be required for the evaluation of nuclear attraction and electron repulsion integrals. Our presentation is intended to give familiarity with a number of widely used techniques for the approximation of non closed form functions. Much of the material of this chapter is to be found in a review due to Shavitt [6].

5.1

Definition of F_m(W)

We are interested in a function of the form

F_m(W) = ∫₀¹ exp(-W t²) t^(2m) dt    (35)

where W is real and ≥ 0, and m is an integer ≥ 0. The most obvious feature of this function, which follows from the form of the integrand, is that the greater the value of m, the smaller the value of the function (for a given value of W). F_m(0) is a special case expressible in closed form,

F_m(0) = (2m+1)^(-1)    (36)

5.2
Recurrence relation

Let u = exp(-W t²) and dv = t^(2m) dt, and integrate equation (35) by parts,

F_m(W) = [exp(-W t²) t^(2m+1)/(2m+1)]₀¹ + (2W/(2m+1)) ∫₀¹ t^(2m+2) exp(-W t²) dt
       = ( exp(-W) + 2W F_{m+1}(W) ) / (2m+1)    (37)

The recursion implied by equation (37) is in a downwards direction. It is possible to recast equation (37) into a form suitable for recursion upwards. Thus

F_{m+1}(W) = ( (2m+1) F_m(W) - exp(-W) ) / 2W    (38)

However recursion upwards is not recommended, particularly for small values of W, because the RHS of equation (38) consists of two almost cancelling components for small W, and indeed limits to the indefinite form 0/0 as W → 0. Generally one should be extremely careful when using recursion formulae, as often they suffer from excessive round-off if applied in the wrong direction.
5.3

Power series by successive application of the recurrence relation

Repeated application of equation (37) gives rise to a power series for F_m(W), as follows:

F_m(W) = exp(-W)/(2m+1) + 2W F_{m+1}(W)/(2m+1)

       = exp(-W) [ 1/(2m+1) + 2W/((2m+1)(2m+3)) ] + (2W)² F_{m+2}(W)/((2m+1)(2m+3))

       = exp(-W) [ 1/(2m+1) + 2W/((2m+1)(2m+3)) + (2W)²/((2m+1)(2m+3)(2m+5)) ] + (2W)³ F_{m+3}(W)/((2m+1)(2m+3)(2m+5))

More generally,

F_m(W) = exp(-W) Σ_{i=0}^∞ (2W)^i / ( (2m+1)(2m+3)······(2m+2i+1) )    (39)

This power series is unconditionally convergent to the required result for all values of W, a fact which is easily proved by noting that the greater the value of m, the smaller the value of F_m(W), as explained in section 5.1. However convergence becomes less rapid as W achieves large values, and in practice the series is rarely used when W > 10.

The application of the series (39) is best achieved by direct use of the recurrence relation (37). Thus if we set F_{m+j}(W) to zero, and recur down, F_m(W) will be equal to the series (39) where the last term included is such that i = j-1. This procedure ensures that the summation is performed so that the small terms are added in first, giving rise to a smaller loss of accuracy when compared with the forward summation procedure. As a general principle, sum the smallest terms first. Of course successful use of the recurrence relationship implies that we can choose a suitable value of j, so as to produce the desired accuracy in F_m(W). The appropriate value of j will depend upon the values of m and W, and also on the required accuracy; we have determined by numerical experimentation the following formula for j so as to produce a relative accuracy of at least 10⁻⁹ for all values of m in the range 0 → 12:

j = 8 + m/2 + 2W
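The downward-recursion prescription just described is sketched below (illustrative code added to these notes; a slightly more conservative truncation than the j formula above is used here). For m = 0 there is an independent closed form, F₀(W) = (1/2)(π/W)^(1/2) erf(W^(1/2)), against which the recursion can be checked:

```python
import math

def boys(m, W):
    """F_m(W) of equation (35) via the series (39), summed smallest-terms-first
    by downward recursion on equation (37)."""
    if W < 1e-13:
        return 1.0 / (2 * m + 1)           # equation (36)
    j = int(12 + m / 2 + 3 * W)            # conservative variant of the j formula
    expw = math.exp(-W)
    F = 0.0                                # seed F_{m+j}(W) = 0, recur downwards
    for k in range(m + j, m, -1):
        F = (expw + 2 * W * F) / (2 * k - 1)
    return F

W = 4.2
closed_form = 0.5 * math.sqrt(math.pi / W) * math.erf(math.sqrt(W))   # F_0 exactly
# boys(0, 4.2) matches closed_form to better than 1e-10
```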

It is sometimes useful to recast equation (39) in terms of complete Gamma functions of half-integral argument. Thus noting from equation (12) that

Γ(m+i+3/2) = π^(1/2) (2m+2i+1)!! / 2^(m+i+1)

we find

F_m(W) = (1/2) Γ(m+1/2) exp(-W) Σ_{i=0}^∞ W^i / Γ(m+i+3/2)    (40)

5.4

Power series by Taylor series expansion

Using Leibniz's rule to differentiate F_m(W) with respect to W, we find

∂F_m(W)/∂W = ∫₀¹ dt t^(2m) ∂exp(-W t²)/∂W = -∫₀¹ dt t^(2m+2) exp(-W t²) = -F_{m+1}(W)    (41)

and using a Taylor series expansion about W₀,

F_m(W₀+W) = Σ_{i=0}^∞ (-W)^i F_{m+i}(W₀) / i!    (42)

We may particularize equation (42) to the case W₀ = 0, using equation (36),

F_m(W) = Σ_{i=0}^∞ (-W)^i / ( i! (2m+2i+1) )    (43)

The power series (43) is as rapidly convergent as (39), and yet is little used by comparison, despite the presence of the exponential in (39). Perhaps the reason for this is the alternation of sign of the terms in (43), making (43) slightly less numerically stable. Another reason is connected with the fact that we usually require the F_m(W) functions for a range of m values, which we obtain by recursion down. Since the recursion relationship (37) requires evaluation of exp(-W), the latter's presence in equation (39) causes no additional expense.
5.5

Evaluation using continued fractions

The method of evaluation we now consider is suitable for the case when W is large in magnitude (say W > 8), thus complementing the power series method, the latter being of value for small values of W. First consider the integral (see reference [7] page 263, [8] page 356, [9] page 144)

Γ(a,W) = ∫_W^∞ exp(-u) u^(a-1) du = W^a exp(-W) [ 1/(W+  (1-a)/(1+  1/(W+  (2-a)/(1+  2/(W+  (3-a)/(1+  3/(W+ ··· ]    (44)

where we have used a standard notation for a continued fraction, ie,

b₀ + a₁/(b₁+  a₂/(b₂+  a₃/(b₃+ ···  =  b₀ + a₁/( b₁ + a₂/( b₂ + a₃/( b₃ + ··· ) ) )    (45)

The repeated fraction appearing in equation (44) is convergent for W > 0, |a| < ∞.

We now define a function, Φ_m(W), complementary to F_m(W),

Φ_m(W) = ∫₁^∞ exp(-W t²) t^(2m) dt    (46)

We note that, using equation (10),

F_m(W) = ∫₀^∞ exp(-W t²) t^(2m) dt - Φ_m(W) = (1/2) W^(-(2m+1)/2) Γ((2m+1)/2) - Φ_m(W)    (47)

By means of the substitution u = W t² in equation (46) we find, using equation (44),

Φ_m(W) = (1/2) W^(-(m+1/2)) ∫_W^∞ exp(-u) u^(m-1/2) du = (1/2) W^(-(m+1/2)) Γ(m+1/2, W)    (48)

Therefore

F_m(W) = (1/2) W^(-(m+1/2)) Γ(m+1/2) - (1/2) exp(-W) P    (49)

where P is the repeated fraction given in equation (44), with a = m+1/2.


5.6

The numerical evaluation of continued fractions

The most complete work of reference on the theory of continued fractions in functional analysis is due to Wall [8], and the following represents a digest of a recursive procedure for the evaluation of continued fractions explained more fully in [8] page 15 or [9] page 2. Let us define the n'th partial result, sometimes referred to as the n'th approximant or convergent,

P_n = b₀ + a₁/(b₁+  a₂/(b₂+ ···  a_n/(b_n  =  A_n/B_n    (50)

In the practical evaluation of an infinite repeated fraction, we would like to proceed as follows:

(a) Evaluate P_n and P_{n+1} and compare their values. If the difference is less than a given threshold, use the value of P_{n+1}.

(b) If the difference is greater than the threshold, evaluate P_{n+2}, compare its value with P_{n+1}, and if the difference is less than the threshold accept the value of P_{n+2}. If the difference is greater than the threshold, continue to evaluate higher order partial results, until convergence is achieved.

A moment's reflection on this process leads us to the conclusion that what we require is a recursion relationship for the P_n, for if we have to evaluate each P_n ab initio, the whole process is going to be rather expensive. In fact no recursion relationship exists for the P_n, but it is possible to give recursion formulae for the A_n and B_n of equation (50), as follows:

A_n = b_n A_{n-1} + a_n A_{n-2}
B_n = b_n B_{n-1} + a_n B_{n-2}

where A_{-1} = 1, B_{-1} = 0, A₀ = b₀ and B₀ = 1.
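Putting equations (44), (49) and (50) together gives a practical large-W evaluator for F_m(W). The sketch below (illustrative; the coefficient pattern a₁ = 1, a_{2j} = j-a, a_{2j+1} = j, b_odd = W, b_even = 1 is read off equation (44), and the function names are ours) uses the A_n/B_n recursion with rescaling to avoid overflow:

```python
import math

def upper_gamma_cf(a, W, tol=1e-14, itmax=300):
    """Γ(a,W) from the continued fraction (44), via the A_n/B_n recursion of
    equation (50) with A_-1 = 1, B_-1 = 0, A_0 = b_0 = 0, B_0 = 1."""
    Am, Bm = 1.0, 0.0            # A_{n-2}, B_{n-2}
    A, B = 0.0, 1.0              # A_{n-1}, B_{n-1}
    P, prev = 0.0, float('inf')
    for n in range(1, itmax):
        if n == 1:
            an, bn = 1.0, W
        elif n % 2 == 0:
            an, bn = n // 2 - a, 1.0
        else:
            an, bn = float((n - 1) // 2), W
        A, Am = bn * A + an * Am, A
        B, Bm = bn * B + an * Bm, B
        if B != 0.0:
            A, Am, Bm, B = A / B, Am / B, Bm / B, 1.0   # rescale; ratio unchanged
            P = A
            if abs(P - prev) <= tol * abs(P):
                break
            prev = P
    return W**a * math.exp(-W) * P

def boys_large_w(m, W):
    """Equation (49): F_m(W) = (1/2) W^-(m+1/2) ( Γ(m+1/2) - Γ(m+1/2, W) )."""
    a = m + 0.5
    return 0.5 * W**(-a) * (math.gamma(a) - upper_gamma_cf(a, W))

ref = 0.5 * math.sqrt(math.pi / 25.0) * math.erf(5.0)   # closed form for m = 0
# boys_large_w(0, 25.0) agrees with ref to roundoff
```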

Before leaving our discussion of continued fractions, it is worth mentioning that continued fractions find a widespread use in the mathematical sciences, although they sometimes arise in rather disguised forms. For example it is possible to show that Padé approximants (a Padé approximant is simply the ratio of two polynomials, the (ℓ,m) approximant denoting a form where the numerator and denominator are polynomials of order ℓ and m respectively) may be written as repeated fractions. Thus the elementary functions (exponential, sine, square root, etc) are normally evaluated with the aid of rational approximations deduced by the truncation of a continued fraction. The student may care to confirm that the n'th partial result of the repeated fraction in equation (44) may be written as a (n-1,n) Padé approximant.

5.7

Asymptotic series for the function Φ_m(W)

Our purpose is to describe an alternative procedure for the approximate evaluation of Φ_m(W), and hence F_m(W) via equation (47). First integrate equation (46) by parts, letting u = t^(2m-1), dv = t exp(-W t²) dt, to yield

Φ_m(W) = ( exp(-W) + (2m-1) Φ_{m-1}(W) ) / 2W    (51)

It is possible to deduce from equation (51), given that Φ_m(W) is intrinsically a positive quantity, irrespective of the value of m, the inequality

Φ_m(W) < exp(-W)/2W    if m ≤ 0    (52)

If we apply the recurrence relation (51) i times, we obtain a series

Φ_m(W) = exp(-W) [ 1/2W + (2m-1)/(2W)² + (2m-1)(2m-3)/(2W)³ + ···· + (2m-1)····(2m-2i+3)/(2W)^i ] + R    (53)

where R, the remainder, is given by

R = (2m-1)····(2m-2i+1) Φ_{m-i}(W) / (2W)^i

We deduce from the inequality (52) that if i ≥ m,

|R| < | (2m-1)···(2m-2i+1) exp(-W) / (2W)^(i+1) |

Therefore the absolute value of the remainder in equation (53) is smaller than the absolute value of the first term omitted from the series (the i+1'th term), given that i ≥ m, and we have a true asymptotic series. Notice that we have not shown that the terms decrease without limit. A typical pattern of behaviour is shown in the figure.

[Figure: absolute value of term plotted against term number, showing a point of minimum.]
There is no guarantee that for a given m and W value the point of minimum error will give a sufficiently small error. In practice it is found that, so long as W is larger than some critical value, the latter depending on m and on the required accuracy, the asymptotic series is capable of affording an adequate precision.

It is worth noting that the theory of asymptotic series and continued fractions is rather closely related. In fact it is sometimes possible to convert a divergent asymptotic series into a convergent repeated fraction, but we refer to Wall [8] page 350 for the details.

5.8

Tabular interpolation

Let us suppose we have constructed, using the methods outlined above, a table of F_m(W_i) for a series of values of the argument, W_i. The value of F_m(W) may then be obtained from such a table by interpolation. However, because F_m(W) is a rapidly varying function, with comparatively large high order derivatives, a high order interpolation scheme and/or a closely spaced table will be required. Consider the function f_m(W) = W^(m+1/2) F_m(W). We know from the asymptotic series for F_m(W), equation (49), that f_m(W) will be extremely slowly varying as W → ∞. In practice, it is found that f_m(W) is more slowly varying than F_m(W) down to rather small values of W. Thus a better scheme is to compute f_m(W) by a comparatively low order interpolation scheme, and compute F_m(W) from

F_m(W) = f_m(W) W^(-(m+1/2))
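The advantage of interpolating f_m(W) rather than F_m(W) itself is easily demonstrated. In the sketch below (illustrative; F₀ is obtained from its erf closed form rather than from a real table), linear interpolation on a unit-spaced grid is far more accurate through f₀ than through F₀ directly:

```python
import math

def F0(W):
    """Closed form F_0(W) = (1/2) (pi/W)^(1/2) erf(W^(1/2))."""
    return 0.5 * math.sqrt(math.pi / W) * math.erf(math.sqrt(W))

lo, hi, W = 6.0, 7.0, 6.5            # unit-spaced table entries, midpoint query

# linear interpolation of F_0 directly
F_direct = 0.5 * (F0(lo) + F0(hi))

# linear interpolation of f_0(W) = W^(1/2) F_0(W), then divide out W^(1/2)
f_interp = 0.5 * (math.sqrt(lo) * F0(lo) + math.sqrt(hi) * F0(hi))
F_via_f = f_interp / math.sqrt(W)

err_direct = abs(F_direct - F0(W))
err_via_f = abs(F_via_f - F0(W))
# err_via_f is smaller than err_direct by well over an order of magnitude
```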


6.

NUCLEAR ATTRACTION AND ELECTRON REPULSION INTEGRALS FOR GTF - THE LAPLACE TRANSFORM METHOD

The Laplace transform method was used by Boys [1] in his pioneering work on Gaussian orbitals, but has now been largely superseded by the Fourier transform method, the latter being the subject of section 7 of the present notes. Nonetheless we have felt it worthwhile to present a discussion of the Laplace transform method, largely because it provides an opportunity to introduce integral transform techniques without too many side complications.

6.1

The Laplace transform

The integral transform we shall be using is of the form

r⁻¹ = π^(-1/2) ∫₀^∞ exp(-s r²) s^(-1/2) ds    (54)

which is easily proved using the substitution u = s r² and equations (6) and (12).

A generalization of equation (54) is available,

r^(-λ) = [Γ(λ/2)]^(-1) ∫₀^∞ exp(-s r²) s^(λ/2 - 1) ds

which is useful for certain property integrals, and which is again easily proved from equations (6) and (12). Tabulations of Laplace transforms are available [10], in which f(t) is quoted against g(s), where

f(t) = ∫₀^∞ exp(-ts) g(s) ds

To convert the exponential in the above equation to 'Gaussian' form, it is merely necessary to perform the substitution r² = t. For example, equation (54) will be found in a standard tabulation as f(t) = t^(-1/2), g(s) = (πs)^(-1/2).
6.2

The nuclear attraction integral for unnormalized '1s' GTFs

This integral may be written

NAI = ∫ exp(-a₁ r_A²) r_C⁻¹ exp(-a₂ r_B²) dτ

We now apply the Gaussian product theorem, equations (21) to (23),

NAI = K ∫ exp(-γ r_P²) r_C⁻¹ dτ

where γ = a₁+a₂, K = exp(-a₁a₂(AB)²/γ), and the coordinates of the point P are given by equation (22). We now apply the Laplace transform (54) to r_C⁻¹,

NAI = K π^(-1/2) ∫₀^∞ s^(-1/2) ∫ exp(-γ r_P²) exp(-s r_C²) dτ ds

The order of integration is now inverted, so that we integrate over the spatial coordinates first. We first apply the Gaussian product theorem,

exp(-γ r_P²) exp(-s r_C²) = exp(-γs(PC)²/(γ+s)) exp(-(γ+s) r_D²)

where the coordinates of D are given by D⃗ = (γP⃗ + sC⃗)/(γ+s), so that

NAI = K π^(-1/2) ∫₀^∞ s^(-1/2) exp(-γs(PC)²/(γ+s)) ∫ exp(-(γ+s) r_D²) dτ ds

The integration over the spatial coordinates is accomplished using equations (11) and (12),

NAI = K π ∫₀^∞ s^(-1/2) (γ+s)^(-3/2) exp(-γs(PC)²/(γ+s)) ds

Making the substitution t² = s/(γ+s), we find

NAI = (2π/γ) K ∫₀¹ exp(-γ(PC)² t²) dt    (55a)

The integral on the RHS of equation (55a) is of the form discussed in section 5, ie

∫₀¹ exp(-γ(PC)² t²) dt = F₀(γ(PC)²)    (55b)
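Equations (55a,b) give the complete 1s-1s nuclear attraction integral. The sketch below (illustrative; F₀ is evaluated here from its erf closed form, and the names are ours) implements it, and checks the one-centre case A = B against an independent radial quadrature, using the fact that the angular integral of 1/|r⃗ - R⃗| is 4π/max(r,R):

```python
import math

def F0(W):
    return 1.0 if W < 1e-13 else 0.5 * math.sqrt(math.pi / W) * math.erf(math.sqrt(W))

def nai_1s(a1, A, a2, B, C):
    """Equation (55): NAI = (2 pi / gamma) K F0(gamma (PC)^2)."""
    g = a1 + a2
    AB2 = sum((A[i] - B[i])**2 for i in range(3))
    K = math.exp(-a1 * a2 * AB2 / g)
    P = [(a1 * A[i] + a2 * B[i]) / g for i in range(3)]
    PC2 = sum((P[i] - C[i])**2 for i in range(3))
    return (2 * math.pi / g) * K * F0(g * PC2)

def simpson(f, lo, hi, n):
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += f(lo + i * h) * (4 if i % 2 else 2)
    return s * h / 3.0

# single Gaussian exp(-g r^2) with the nucleus a distance R from its centre:
# NAI = 4 pi ∫_0^∞ r^2 exp(-g r^2) / max(r, R) dr
g, R = 1.3, 0.8
f = lambda r: 4 * math.pi * r * r * math.exp(-g * r * r) / max(r, R)
quad = simpson(f, 0.0, R, 2000) + simpson(f, R, 12.0, 4000)   # split at the kink
analytic = nai_1s(g / 2, (0.0, 0.0, 0.0), g / 2, (0.0, 0.0, 0.0), (0.0, 0.0, R))
# quad and analytic agree to quadrature accuracy
```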

6.3

The electron repulsion integral for unnormalized '1s' GTF

The procedure for the evaluation of the electron repulsion integral is basically a more complicated example of the procedures used for the nuclear attraction integral. Thus

ERI = ∫∫ exp(-a₁r₁A² - a₂r₁B²) r₁₂⁻¹ exp(-a₃r₂C² - a₄r₂D²) dτ₁ dτ₂

where the overlap distributions (OD) on the left and RHS of r₁₂⁻¹ are functions of the coordinates of electrons 1 and 2 respectively. An OD is defined as the product of two basis functions. As a first step we apply the Gaussian product theorem to each of the ODs, thus converting from a four to a two centre integral,

ERI = K₁K₂ ∫∫ exp(-γ₁r₁P²) r₁₂⁻¹ exp(-γ₂r₂Q²) dτ₁ dτ₂

where K₁ = exp(-a₁a₂(AB)²/γ₁), K₂ = exp(-a₃a₄(CD)²/γ₂), γ₁ = a₁+a₂ and γ₂ = a₃+a₄. We introduce the Laplace transform of r₁₂⁻¹, to yield

ERI = K₁K₂ π^(-1/2) ∫₀^∞ s^(-1/2) I_x I_y I_z ds    (56)

where

I_x = ∫∫ exp(-γ₁x₁P² - γ₂x₂Q² - s(x₁-x₂)²) dx₁ dx₂

and similarly for I_y and I_z. Let u = x₁ - P_x and v = x₂ - Q_x, so that x₁ - x₂ = (u - v + P_x - Q_x) = (u - v + PQ_x). Therefore

I_x = exp(-s(PQ_x)²) ∫₋∞^∞ du exp(-(γ₁+s)u² - 2PQ_x s u) ∫₋∞^∞ dv exp(-(γ₂+s)v² + 2(PQ_x+u)s v)

Integration over v is accomplished using equation (13),

I_x = π^(1/2) (γ₂+s)^(-1/2) exp(-s(PQ_x)² γ₂/(γ₂+s)) ∫₋∞^∞ du exp( -((γ₁+γ₂)(ρ+s)/(γ₂+s)) u² - 2PQ_x s γ₂ u/(γ₂+s) )

where ρ = γ₁γ₂/(γ₁+γ₂). We now use equation (13) again, to integrate over u,

I_x = π (γ₁+γ₂)^(-1/2) (ρ+s)^(-1/2) exp(-ρ(PQ_x)² s/(ρ+s))

Substituting this formula for I_x (and similarly for I_y and I_z) into equation (56), and using the substitution t² = s/(ρ+s),

ERI = K₁K₂ π^(5/2) (γ₁+γ₂)^(-3/2) ∫₀^∞ s^(-1/2) (ρ+s)^(-3/2) exp(-ρ(PQ)² s/(ρ+s)) ds

    = ( 2π^(5/2) / (γ₁γ₂(γ₁+γ₂)^(1/2)) ) K₁K₂ ∫₀¹ exp(-ρ(PQ)² t²) dt

    = ( 2π^(5/2) / (γ₁γ₂(γ₁+γ₂)^(1/2)) ) K₁K₂ F₀(ρ(PQ)²)    (57)
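Equation (57) is the closed form for the (1s 1s|1s 1s) repulsion integral. The sketch below (illustrative; ρ = γ₁γ₂/(γ₁+γ₂) as in the derivation above, F₀ taken from the downward recursion of section 5.3, names ours) evaluates it; as a cross-check, for concentric ODs it can be compared with the equivalent two-Gaussian-charge-cloud expression π³ erf(ρ^(1/2) PQ) / ((γ₁γ₂)^(3/2) PQ):

```python
import math

def boys0(W, terms=60):
    """F_0(W) by downward recursion on equation (37)."""
    if W < 1e-13:
        return 1.0
    expw, F = math.exp(-W), 0.0
    for k in range(terms, 0, -1):
        F = (expw + 2 * W * F) / (2 * k - 1)
    return F

def eri_1s(a1, A, a2, B, a3, C, a4, D):
    """Equation (57): (1s 1s | 1s 1s) electron repulsion integral."""
    g1, g2 = a1 + a2, a3 + a4
    K1 = math.exp(-a1 * a2 * sum((A[i] - B[i])**2 for i in range(3)) / g1)
    K2 = math.exp(-a3 * a4 * sum((C[i] - D[i])**2 for i in range(3)) / g2)
    P = [(a1 * A[i] + a2 * B[i]) / g1 for i in range(3)]
    Q = [(a3 * C[i] + a4 * D[i]) / g2 for i in range(3)]
    PQ2 = sum((P[i] - Q[i])**2 for i in range(3))
    rho = g1 * g2 / (g1 + g2)
    return (2 * math.pi**2.5 / (g1 * g2 * math.sqrt(g1 + g2))) * K1 * K2 \
        * boys0(rho * PQ2)
```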

6.4

Higher quantum numbers - the step operator method

In the general case, integrals involving GTFs of higher angular quantum number are required. It is of course possible to derive the formulae for such integrals directly on the lines of sections 6.2 and 6.3 above. This direct procedure is found to be extremely cumbersome, and the following step operator technique is more commonly used. The equation

∂G/∂A_x (a,A,ℓ,m,n) = 2a G(a,A,ℓ+1,m,n) - ℓ G(a,A,ℓ-1,m,n)

is easily proved from the definition of the GTF (equation (3)). Therefore

G(a,A,ℓ+1,m,n) = ( ∂G/∂A_x (a,A,ℓ,m,n) + ℓ G(a,A,ℓ-1,m,n) ) / 2a

Similar relations also hold for the y and z quantum numbers. This means that integral formulae for orbitals of higher quantum numbers can be derived from the basic formulae (equations (55) and (57)) by differentiations with respect to parameters. Consider, for example, the nuclear attraction integral over a 2p_x GTF at A and a 1s GTF at B,

NAI = (π/(γa₁)) ∂/∂A_x [ exp(-a₁a₂(AB)²/γ) F₀(γ(PC)²) ]

where we have used the inverse of Leibniz's theorem to move the differential operator from under the integral sign. Hence, using equation (55b),

NAI = (2π/γ) exp(-a₁a₂(AB)²/γ) [ (C_x-P_x) F₁(γ(PC)²) - (a₂/γ)(A_x-B_x) F₀(γ(PC)²) ]    (58)

where we have used equation (41) to accomplish the necessary differentiations of F₀(γ(PC)²).
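The step relation ∂G/∂A_x = 2a G(ℓ+1) - ℓ G(ℓ-1) is the engine of this section, so it is worth a direct numerical check. The sketch below (illustrative) differentiates a GTF with respect to the centre coordinate A_x by central differences and compares with the right hand side:

```python
import math

def gtf(a, A, l, m, n, r):
    """Unnormalized Cartesian GTF of equation (3)."""
    xA, yA, zA = (r[i] - A[i] for i in range(3))
    return xA**l * yA**m * zA**n * math.exp(-a * (xA*xA + yA*yA + zA*zA))

a, l, m, n = 0.9, 2, 1, 0
A, r = [0.2, -0.1, 0.4], (0.7, 0.3, -0.2)

h = 1e-6
Ap = [A[0] + h, A[1], A[2]]
Am = [A[0] - h, A[1], A[2]]
numeric = (gtf(a, Ap, l, m, n, r) - gtf(a, Am, l, m, n, r)) / (2 * h)
step = 2 * a * gtf(a, A, l + 1, m, n, r) - l * gtf(a, A, l - 1, m, n, r)
# numeric and step agree to finite-difference accuracy
```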


7.

NUCLEAR ATTRACTION AND ELECTRON REPULSION INTEGRALS FOR GTF - THE FOURIER TRANSFORM METHOD

The Laplace transform method discussed in section 6 is ill-suited to the generation of general formulae for the nuclear attraction and electron repulsion integrals; a rather better procedure is via the Fourier transform to be described in the present section. The latter method, which is essentially due to Wright [11], has been discussed by Huzinaga and co-workers [12]. (Beware of typographical errors when reading [12].)


7.1

The Fourier transform and other preliminary material

The one dimensional Fourier transform is defined by (see [13] page 301)

F(x) = (2π)^(-1/2) ∫₋∞^∞ exp(ixk) f(k) dk    (59)

where F(x) is said to be the Fourier transform of f(k), and i = (-1)^(1/2). The n-dimensional generalization of (59) is of the form

F(r⃗) = (2π)^(-n/2) ∫ exp(ir⃗·k⃗) f(k⃗) dk⃗

where the notation dk⃗ = dk₁ dk₂ ··· dk_n has been used, and r⃗, k⃗ are vectors with components x₁, x₂, etc and k₁, k₂, etc respectively. The particular transform we shall be interested in is three dimensional, and of the form

r⁻¹ = 2⁻¹ π⁻² ∫ k⁻² exp(ir⃗·k⃗) dk⃗    (60)

382

V. R. SAUNDERS

where k² = k·k = k_x² + k_y² + k_z² and r² = r·r = x² + y² + z². The proof of equation (60) is rather simple, and involves transforming the k-space variables into spherical polar coordinates k, θ_k, φ_k, using the r vector as polar axis. Hence, noting the transformation (k_x, k_y, k_z) → (k, θ_k, φ_k), we find

I = 2⁻¹π⁻² ∫∫∫ dk_x dk_y dk_z k⁻² exp(i r·k)
  = 2⁻¹π⁻² ∫_0^∞ ∫_0^π ∫_0^{2π} dk dθ_k dφ_k sin θ_k exp(irk cos θ_k)

Integration over φ_k is immediate, and using the substitutions u = cos θ_k, k' = rk, we find

I = r⁻¹π⁻¹ ∫_0^∞ ∫_{−1}^{1} dk' du exp(ik'u)

Integration over u yields

I = r⁻¹π⁻¹ ∫_0^∞ dk' (ik')⁻¹ (exp(ik') − exp(−ik'))
  = r⁻¹ (2/π) ∫_0^∞ dk' sin k'/k'    (61)

The integral on the RHS of equation (61) is a standard form and equals π/2 (best evaluated by the method of contour integration, see [15], page 184), thus completing the proof.


The following equation will be used:

exp(−εk²) = 2εk² ∫_0^1 w⁻³ exp(−εw⁻²k²) dw    (62)

This is derived from the identity e^{−a} = a ∫_1^∞ e^{−ax} dx by letting a = εk² and x = w⁻².
7.2
The nuclear attraction integral

The general nuclear attraction integral over GTF is defined by

NAI = ∫ x_A^{ℓ₁} y_A^{m₁} z_A^{n₁} exp(−α₁r_A²) r_C⁻¹ x_B^{ℓ₂} y_B^{m₂} z_B^{n₂} exp(−α₂r_B²) dτ

This three centre integral is decomposed into a sum of two centre integrals by means of the Gaussian product theorem, equations (21) to (24), to yield

NAI = K Σ_ℓ Σ_m Σ_n f_ℓ(ℓ₁,ℓ₂,PA_x,PB_x) f_m(m₁,m₂,PA_y,PB_y) f_n(n₁,n₂,PA_z,PB_z) ∫ x_P^ℓ y_P^m z_P^n exp(−γr_P²) r_C⁻¹ dτ    (63)

where K = exp(−α₁α₂(AB)²/γ) and γ = α₁+α₂. The summations in equation (63) run from ℓ=0 to (ℓ₁+ℓ₂), m=0 to (m₁+m₂) and n=0 to (n₁+n₂). Noting that

r_C = r − C = (r − P) + (P − C) = r_P + PC,  where PC = P − C,

the Fourier transform of r_C⁻¹ may be written

r_C⁻¹ = (2π²)⁻¹ ∫ d³k k⁻² exp(i r_P·k) exp(i PC·k)

Hence

NAI = K(2π²)⁻¹ Σ_ℓ Σ_m Σ_n f_ℓ f_m f_n ∫ d³k k⁻² exp(i PC·k) I_ℓ^x I_m^y I_n^z    (64)

where

I_ℓ^x = ∫_{−∞}^{∞} x_P^ℓ exp(−γx_P² + ik_x x_P) dx_P

and similarly for I_m^y and I_n^z. Integration over the spatial coordinates is accomplished with the aid of equations (16) and (17). Thus, letting ε = (4γ)⁻¹,

I_ℓ^x = (π/γ)^{1/2} i^ℓ ε^{ℓ/2} exp(−εk_x²) ℓ! Σ_{q=0}^{[ℓ/2]} (−1)^q (2ε^{1/2} k_x)^{ℓ−2q} / (q! (ℓ−2q)!)    (65)

Remember that [a] denotes the largest integer less than or equal to a. Similar results for I_m^y and I_n^z can be obtained.
m
n
Substituting the result of equation (65) into (64) we obtain

NAI = K(2π²)⁻¹ (π/γ)^{3/2} Σ_{ℓ,m,n} f_ℓ f_m f_n i^{ℓ+m+n} ε^{(ℓ+m+n)/2} ℓ! m! n!
      Σ_{q,r,s} (−1)^{q+r+s} (2ε^{1/2})^{L+M+N} / (q! r! s! L! M! N!) ∫ d³k k⁻² k_x^L k_y^M k_z^N exp(−εk² + i PC·k)    (66)

where L = ℓ−2q, M = m−2r, N = n−2s. Integration over k-space is facilitated by equation (62), followed by equations (16) and (17).

Thus

∫ d³k k⁻² k_x^L k_y^M k_z^N exp(−εk² + i PC·k)
  = 2ε ∫_0^1 dw w⁻³ ∫ dk_x k_x^L exp(−εw⁻²k_x² + iPC_x k_x) ∫ dk_y k_y^M exp(−εw⁻²k_y² + iPC_y k_y) ∫ dk_z k_z^N exp(−εw⁻²k_z² + iPC_z k_z)

  = 2π^{3/2} ε^{−1/2} i^L (1/(2ε^{1/2}))^L L! i^M (1/(2ε^{1/2}))^M M! i^N (1/(2ε^{1/2}))^N N!
    Σ_{t=0}^{[L/2]} Σ_{u=0}^{[M/2]} Σ_{v=0}^{[N/2]}
    [(−1)^t (PC_x)^{L−2t} (1/ε^{1/2})^{L−2t} / (t! (L−2t)!)]
    [(−1)^u (PC_y)^{M−2u} (1/ε^{1/2})^{M−2u} / (u! (M−2u)!)]
    [(−1)^v (PC_z)^{N−2v} (1/ε^{1/2})^{N−2v} / (v! (N−2v)!)]
    ∫_0^1 dw w^{2(L−t+M−u+N−v)} exp(−γ(PC)² w²)    (67)

Note that the integral on the RHS of equation (67) is of the form of the function discussed in section 5,

∫_0^1 dw w^{2(L−t+M−u+N−v)} exp(−γ(PC)² w²) = F_{L−t+M−u+N−v}(γ(PC)²)

hence, by substituting the result of equation (67) into (66), we obtain

NAI = (2πK/γ) Σ_{ℓ=0}^{(ℓ₁+ℓ₂)} (−1)^ℓ ε^{ℓ/2} ℓ! f_ℓ(ℓ₁,ℓ₂,PA_x,PB_x) Σ_{q=0}^{[ℓ/2]} (1/q!) Σ_{t=0}^{[L/2]} (−1)^t (PC_x)^{L−2t} (1/ε^{1/2})^{L−2t} / (t! (L−2t)!)
      × Σ_{m=0}^{(m₁+m₂)} (−1)^m ε^{m/2} m! f_m(m₁,m₂,PA_y,PB_y) Σ_{r=0}^{[m/2]} (1/r!) Σ_{u=0}^{[M/2]} (−1)^u (PC_y)^{M−2u} (1/ε^{1/2})^{M−2u} / (u! (M−2u)!)
      × Σ_{n=0}^{(n₁+n₂)} (−1)^n ε^{n/2} n! f_n(n₁,n₂,PA_z,PB_z) Σ_{s=0}^{[n/2]} (1/s!) Σ_{v=0}^{[N/2]} (−1)^v (PC_z)^{N−2v} (1/ε^{1/2})^{N−2v} / (v! (N−2v)!)
      × F_{L−t+M−u+N−v}(γ(PC)²)    (68)

Equation (68) is the general formula for the nuclear attraction integral over GTFs. It is not yet in a convenient form for efficient implementation on a computer, and it is to matters of efficiency that we now turn.
7.3 Efficient implementation of the nuclear attraction integral formula

Noting that ε^{ℓ/2} (1/ε^{1/2})^{L−2t} = ε^{q+t}, and using the fact that (−1)^ℓ (PC_x)^{L−2t} = (CP_x)^{L−2t} (ℓ and L−2t having the same parity), etc., we may rewrite equation (68) as

NAI = (2πK/γ) Σ_{i=0}^{(ℓ₁+ℓ₂)} Σ_{j=0}^{(m₁+m₂)} Σ_{k=0}^{(n₁+n₂)} X_i Y_j Z_k F_{i+j+k}(γ(PC)²)    (69)

where

X_i = Σ_{ℓ,q,t; L−t=i} ε^{q+t} ℓ! f_ℓ(ℓ₁,ℓ₂,PA_x,PB_x) (−1)^t (CP_x)^{L−2t} / (q! t! (L−2t)!)    (70)

The notation Σ_{L−t=i} was introduced in section 4.1, and means that we must loop over the indices ℓ, q and t in the appropriate range (as indicated by equation (68)), and sum terms into X_i for indicial values satisfying the condition L−t=i, or equivalently ℓ−2q−t=i. The computation of the X_i is made more efficient by recasting equation (70) into the form

X_i = Σ_{L,t; L−t=i} G_L^x (−ε)^t (CP_x)^{L−2t} / (t! (L−2t)!)    (71)

where the range of indices is L=0 to (ℓ₁+ℓ₂), t=0 to [L/2], and X_i consists of the sum of all possible terms satisfying the condition L−t=i.

The G_L^x are defined by

G_L^x = Σ_{ℓ,q; ℓ−2q=L} ε^q ℓ! f_ℓ(ℓ₁,ℓ₂,PA_x,PB_x) / q!    (72)

the indicial range being ℓ=0 to (ℓ₁+ℓ₂), q=0 to [ℓ/2]. In the event that we require nuclear attraction integrals over more than one nucleus, which is the most usual case, a further breakdown of equation (71) is useful. We note that X_i may be written

X_i = Σ_{λ=max(0,2i−ℓ₁−ℓ₂)}^{i} H^x(i,λ) (CP_x)^λ    (73)

The notation λ=max(0,2i−ℓ₁−ℓ₂) means that the start point for the λ summation is chosen as the algebraically larger of zero or (2i−ℓ₁−ℓ₂), and

H^x(i,λ) = Σ_{L,t; L−t=i, L−2t=λ} G_L^x (−ε)^t / (t! (L−2t)!)    (74)

where the range of indices is L=0 to (ℓ₁+ℓ₂), t=0 to [L/2], and H^x(i,λ) consists of a sum of all terms satisfying the dual conditions L−t=i, L−2t=λ.


The procedure for the evaluation of nuclear attraction integrals will now be summarized.
(a) Clear all store locations designated to hold the values of G_L^x (L=0 to (ℓ₁+ℓ₂)). Loop over ℓ and q (ℓ=0 to (ℓ₁+ℓ₂), q=0 to [ℓ/2]), evaluate L=ℓ−2q, and add the corresponding term of equation (72) to G_L^x; in this way all possible G_L^x are computed. Similarly compute the G_M^y and G_N^z.
(b) Clear all storage locations designated to hold the values of H^x(i,λ) (i=0 to (ℓ₁+ℓ₂), λ=max(0,2i−ℓ₁−ℓ₂) to i). Loop over L and t (L=0 to (ℓ₁+ℓ₂), t=0 to [L/2]), compute i=L−t, λ=L−2t and add the corresponding term of equation (74) to H^x(i,λ). Similarly compute the H^y(j,μ) and H^z(k,ν).
(c) Loop over all possible nuclei. Compute the factors X_i (i=0 to (ℓ₁+ℓ₂)) from equation (73), and similarly for Y_j and Z_k. Also compute F_m(γ(PC)²) for m=0 to (ℓ₁+ℓ₂+m₁+m₂+n₁+n₂). The nuclear attraction integral is then computed from equation (69).


All of this may seem a strangely complicated way of evaluating the straightforward equation (68). Nonetheless, the 'complicated' route suggested above will prove greatly more efficient than direct use of equation (68), particularly when large angular quantum numbers are involved, and/or if large numbers of nuclei are present. Our analysis will serve as a model for the factorization of the sums involved in the more complicated electron repulsion integral, to which we now turn our attention.

7.4 The electron repulsion integral

Our purpose is to give a discussion of the Fourier transform method as applied to electron repulsion integral evaluation. The treatment will be somewhat abbreviated, on the grounds that the analysis proceeds on lines parallel to the development of the formula for the nuclear attraction integral. We do not recommend study of the present section until the contents of sections 7.2 and 7.3 have been understood.
The electron repulsion integral over GTF is defined by

ERI = ∫∫ x_{1A}^{ℓ₁} y_{1A}^{m₁} z_{1A}^{n₁} exp(−α₁r_{1A}²) x_{1B}^{ℓ₂} y_{1B}^{m₂} z_{1B}^{n₂} exp(−α₂r_{1B}²) r_{12}⁻¹
        x_{2C}^{ℓ₃} y_{2C}^{m₃} z_{2C}^{n₃} exp(−α₃r_{2C}²) x_{2D}^{ℓ₄} y_{2D}^{m₄} z_{2D}^{n₄} exp(−α₄r_{2D}²) dτ₁ dτ₂

where the overlap distributions on the LHS and RHS of r_{12}⁻¹ are functions of the coordinates of electrons 1 and 2 respectively. Now apply the Gaussian product theorem (equations (21) to (24)) twice, to yield

ERI = K₁K₂ Σ_{ℓ,m,n} Σ_{ℓ',m',n'} f_ℓ f_m f_n f_{ℓ'} f_{m'} f_{n'} ∫∫ x_{1P}^ℓ y_{1P}^m z_{1P}^n exp(−γ₁r_{1P}²) r_{12}⁻¹ x_{2Q}^{ℓ'} y_{2Q}^{m'} z_{2Q}^{n'} exp(−γ₂r_{2Q}²) dτ₁ dτ₂    (75)

where γ₁ = α₁+α₂, γ₂ = α₃+α₄, K₁ = exp(−α₁α₂(AB)²/γ₁) and K₂ = exp(−α₃α₄(CD)²/γ₂).

The summation limits being ℓ=0 to (ℓ₁+ℓ₂), ℓ'=0 to (ℓ₃+ℓ₄), etc. The Fourier transform of r_{12}⁻¹ may be written

r_{12}⁻¹ = (2π²)⁻¹ ∫ d³k k⁻² exp(i r_{1P}·k) exp(−i r_{2Q}·k) exp(i PQ·k)

where we have used equation (60) and noted that r_1 − r_2 = r_{1P} − r_{2Q} + PQ. Substitution of this transform into the integral part of the RHS of equation (75) yields

I = (2π²)⁻¹ ∫ d³k k⁻² exp(i PQ·k) I_ℓ^x I_{ℓ'}^x I_m^y I_{m'}^y I_n^z I_{n'}^z    (76)

where, using equations (16) and (17),

I_ℓ^x = ∫_{−∞}^{∞} x_{1P}^ℓ exp(−γ₁x_{1P}² + ik_x x_{1P}) dx_{1P}
      = i^ℓ π^{1/2} γ₁^{−1/2} ε₁^{ℓ/2} ℓ! exp(−ε₁k_x²) Σ_{q=0}^{[ℓ/2]} (−1)^q (2ε₁^{1/2})^L k_x^L / (q! L!)    (77a)

and similarly

I_{ℓ'}^x = ∫_{−∞}^{∞} x_{2Q}^{ℓ'} exp(−γ₂x_{2Q}² − ik_x x_{2Q}) dx_{2Q}
         = (−i)^{ℓ'} π^{1/2} γ₂^{−1/2} ε₂^{ℓ'/2} ℓ'! exp(−ε₂k_x²) Σ_{q'=0}^{[ℓ'/2]} (−1)^{q'} (2ε₂^{1/2})^{L'} k_x^{L'} / (q'! L'!)    (77b)

where ε₁ = (4γ₁)⁻¹, ε₂ = (4γ₂)⁻¹, L = ℓ−2q and L' = ℓ'−2q'. The I^y and I^z factors may be evaluated similarly. Substitution of the results of equation (77) into (76), followed by an application of equation (62), gives the result

I = πδ(γ₁γ₂)^{−3/2} ∫_0^1 dw w⁻³ J_x J_y J_z    (78)

where δ = ε₁+ε₂ = (γ₁+γ₂)/(4γ₁γ₂), and

J_x = i^ℓ (−i)^{ℓ'} ε₁^{ℓ/2} ε₂^{ℓ'/2} ℓ! ℓ'! Σ_{q=0}^{[ℓ/2]} Σ_{q'=0}^{[ℓ'/2]} (−1)^{q+q'} (2ε₁^{1/2})^L (2ε₂^{1/2})^{L'} / (q! q'! L! L'!)
      ∫ dk_x k_x^{L+L'} exp(−δw⁻²k_x² + iPQ_x k_x)

and similarly for J_y and J_z. Integration over k-space is accomplished using equations (16) and (17); for example, after some manipulation,

J_x = π^{1/2} δ^{−1/2} (−1)^ℓ ℓ! ℓ'! Σ_{q,q'} ε₁^{ℓ−q} ε₂^{ℓ'−q'} / (q! L! q'! L'!) (L+L')!
      Σ_{t=0}^{[(L+L')/2]} (−1)^t (PQ_x)^{L+L'−2t} / (t! (L+L'−2t)! δ^{L+L'−t}) w^{2(L+L'−t)+1} exp(−(PQ_x)² w²/(4δ))    (79)

Substitution of the result of equation (79) into (78), followed by a substitution of the result into (75), gives the following formula for the electron repulsion integral:

ERI = (2π^{5/2} K₁K₂ / (γ₁γ₂(γ₁+γ₂)^{1/2})) Σ_{ℓ,ℓ',m,m',n,n'} (−1)^{ℓ+m+n} ℓ! ℓ'! m! m'! n! n'! f_ℓ f_{ℓ'} f_m f_{m'} f_n f_{n'}
      Σ_{q,q',t} G^x Σ_{r,r',u} G^y Σ_{s,s',v} G^z F_{L+L'−t+M+M'−u+N+N'−v}((PQ)²/(4δ))    (80)

where

G^x = ε₁^{ℓ−q} ε₂^{ℓ'−q'} (L+L')! (−1)^t (PQ_x)^{L+L'−2t} / (q! L! q'! L'! t! (L+L'−2t)! δ^{L+L'−t})    (81)

and similarly for the G^y and G^z factors. The summation limits in equation (80) are q=0 to [ℓ/2], q'=0 to [ℓ'/2] and t=0 to [(L+L')/2] = [(ℓ−2q+ℓ'−2q')/2], and similarly for the other indices.

7.5 Efficient implementation of the electron repulsion integral formula

We now describe an efficient scheme for the evaluation of equation (80). We first recast equation (80) into the form

ERI = (2π^{5/2} K₁K₂ / (γ₁γ₂(γ₁+γ₂)^{1/2})) Σ_i Σ_j Σ_k X_i Y_j Z_k F_{i+j+k}((PQ)²/(4δ))    (82)

where

X_i = Σ_{ℓ,ℓ',q,q',t; L+L'−t=i} (−1)^ℓ ℓ! ℓ'! f_ℓ f_{ℓ'} ε₁^{ℓ−q} ε₂^{ℓ'−q'} (L+L')! (−1)^t (PQ_x)^{L+L'−2t} / (q! L! q'! L'! t! (L+L'−2t)! δ^{L+L'−t})    (83)

and similarly for Y_j and Z_k. The evaluation of the X_i is made more efficient by recasting equation (83) into the form

X_i = Σ_{L,L',t; L+L'−t=i} H_L^x H'^x_{L'} (L+L')! (−1)^t (PQ_x)^{L+L'−2t} / (t! (L+L'−2t)! δ^{L+L'−t})    (84)

where the indicial ranges are L=0 to (ℓ₁+ℓ₂), L'=0 to (ℓ₃+ℓ₄) and t=0 to [(L+L')/2], and

H_L^x = Σ_{ℓ,q; ℓ−2q=L} (−1)^ℓ ε₁^{ℓ−q} ℓ! f_ℓ(ℓ₁,ℓ₂,PA_x,PB_x) / (q! L!)    (85a)

H'^x_{L'} = Σ_{ℓ',q'; ℓ'−2q'=L'} ε₂^{ℓ'−q'} ℓ'! f_{ℓ'}(ℓ₃,ℓ₄,QC_x,QD_x) / (q'! L'!)    (85b)

The procedure for the evaluation of the electron repulsion integral will now be summarized.
(a) Clear the storage locations designated to hold the factors H_L^x (L=0 to (ℓ₁+ℓ₂)) and H'^x_{L'} (L'=0 to (ℓ₃+ℓ₄)). Loop over ℓ and q (ℓ=0 to (ℓ₁+ℓ₂), q=0 to [ℓ/2]), evaluate L=ℓ−2q and add the appropriate term of equation (85a) to H_L^x. Similarly, evaluate the H'^x_{L'} (from equation (85b)), and also the factors H_M^y, H'^y_{M'}, H_N^z, H'^z_{N'}.
(b) Clear all storage locations designated to hold the X_i (i=0 to (ℓ₁+ℓ₂+ℓ₃+ℓ₄)). Loop over L, L' and t (L=0 to (ℓ₁+ℓ₂), L'=0 to (ℓ₃+ℓ₄), t=0 to [(L+L')/2]), evaluate i=L+L'−t and add the appropriate term of equation (84) to X_i. Similarly evaluate the Y_j and Z_k.
(c) Evaluate the F_m((PQ)²/(4δ)) for m=0 to (ℓ₁+ℓ₂+ℓ₃+ℓ₄+m₁+m₂+m₃+m₄+n₁+n₂+n₃+n₄).
(d) Evaluate the electron repulsion integral using equation (82).

8. SOME DEFINITE INTEGRALS RELATED TO THE MODIFIED BESSEL FUNCTIONS OF THE SECOND KIND

The purpose of the present section is to collect a few formulae relating to the modified Bessel functions of the second kind which will be required when we discuss the evaluation of certain molecular integrals over ETF in section 9. The student is referred to Watson [16] for a coherent treatment of Bessel functions.


8.1 The modified Bessel function of the second kind

We may define the modified Bessel function of the second kind through an integral representation (for W>0),

K_ν(W) = ∫_0^∞ dy y^{2ν−1} exp(−(W/2)(y² + y⁻²))    (87)

It is more convenient to work with a reduced function

k_ν(W) = W^ν ∫_0^∞ dy y^{2ν−1} exp(−(W/2)(y² + y⁻²)) = W^ν K_ν(W)    (88)

Other integral representations can be found, for example

K_ν(W) = (π^{1/2}(W/2)^ν / Γ(ν+1/2)) ∫_1^∞ exp(−Wt) (t²−1)^{ν−1/2} dt

but these will not be of direct concern to us.


8.2 The function I_ν(A,B)

We define

I_ν(A,B) = ∫_0^∞ dt t^{ν−1} exp(−A²t − B²/t)    (89)

Let y² = At/B; noting the definition of the reduced Bessel function (88),

I_ν(A,B) = 2(B/A)^ν ∫_0^∞ dy y^{2ν−1} exp(−AB(y² + y⁻²)) = 2(2A²)^{−ν} k_ν(2AB)    (90)

8.3

Recurrence relationship for reduced Bessel functions

Integrate equation (88) by parts, letting u = exp(−(W/2)(y² + y⁻²)) and dv = y^{2ν−1} dy:

k_ν(W) = W^ν [y^{2ν} exp(−(W/2)(y² + y⁻²))/(2ν)]_0^∞ + (W^{ν+1}/(2ν)) ∫_0^∞ dy (y^{2ν+1} − y^{2ν−3}) exp(−(W/2)(y² + y⁻²))

The above equation is not valid if ν=0. Noting that the first term on the RHS is zero, and using equation (88), we find

k_ν(W) = (k_{ν+1}(W) − W² k_{ν−1}(W))/(2ν)

giving rise to the following:

k_{ν+1}(W) = 2ν k_ν(W) + W² k_{ν−1}(W)    (91a)

k_{ν−1}(W) = (k_{ν+1}(W) − 2ν k_ν(W))/W²    (91b)

It is immediately obvious from the form of the integral in equation (88) that the reduced Bessel function is a positive quantity, so that recursion upwards using (91a) will be numerically stable for ν>0, whilst (91b), recursion downwards, is stable for ν<0.

8.4 Modified Bessel functions of half-integral order

By performing the substitution z = y⁻¹ in equation (88), we find

k_ν(W) = W^ν ∫_0^∞ dz z^{−(2ν+1)} exp(−(W/2)(z² + z⁻²))

so that

k_{−ν}(W) = W^{−2ν} k_ν(W)    (92)

Therefore, using (92),

k_{1/2}(W) = (k_{1/2}(W) + W k_{−1/2}(W))/2 = (W^{1/2}/2) ∫_0^∞ dy (1 + y⁻²) exp(−(W/2)(y² + y⁻²))

Noting that y² + y⁻² = (y − y⁻¹)² + 2,

k_{1/2}(W) = (W^{1/2}/2) exp(−W) ∫_0^∞ dy (1 + y⁻²) exp(−(W/2)(y − y⁻¹)²)

Let z = y − y⁻¹; then, using equations (11) and (12),

k_{1/2}(W) = (W^{1/2}/2) exp(−W) ∫_{−∞}^{∞} dz exp(−(W/2)z²) = (π/2)^{1/2} exp(−W)    (93a)

and, using equation (92),

k_{−1/2}(W) = (π/2)^{1/2} exp(−W)/W    (93b)
All other Bessel functions of half-integral order may be computed by the recurrences (91).
8.5 Modified Bessel functions of integral order

Clearly, if we are able to evaluate k_0(W) and k_1(W), all other positive order Bessel functions are available by recursion upwards (91a). Negative integral order functions can be made available by computing k_{−1}(W) from equation (92), followed by recursion down (91b), remembering that the recurrence formulae are not valid for ν=0. The basic problem is to compute k_0(W) and k_1(W). No closed form exists, and the following series are usually used (see, for example, [17], pages 375 and 378):
(a) Asymptotic expansion for large W. Denoting μ = 4ν²,

K_ν(W) ≈ (π/(2W))^{1/2} exp(−W) {1 + (μ−1)/(8W) + (μ−1)(μ−9)/(2!(8W)²) + (μ−1)(μ−9)(μ−25)/(3!(8W)³) + ...}    (94)

If ν≥0, the remainder after k terms does not exceed the (k+1)th term in absolute magnitude, provided that k≥ν−1/2.

(b) Series in ascending powers of W (only for integral order Bessel functions). Provided that n≥0,

K_n(W) = (1/2)(W/2)^{−n} Σ_{k=0}^{n−1} ((n−k−1)!/k!) (−W²/4)^k
       − (−W/2)^n log_e(W/2) Σ_{k=0}^{∞} (W²/4)^k / (k!(n+k)!)
       + (1/2)(−W/2)^n Σ_{k=0}^{∞} (ψ(k+1) + ψ(n+k+1)) (W²/4)^k / (k!(n+k)!)    (95)

where

ψ(k) = −γ + Σ_{j=1}^{k−1} j⁻¹

and γ is Euler's constant,

γ = 0.5772 15664 90153 ...

The reduced functions may be computed from (88).

Series (95) is used for arguments W<8, whilst series (94) is used for W≥8. Other procedures, based on tabular interpolation, Padé approximants and numerical integration, are also in use.

9. EVALUATION OF MOLECULAR INTEGRALS OVER ETF/STF

The following notes give details of a sample of the more commonly used procedures for the evaluation of molecular integrals over STF. Our treatment is far from exhaustive. Perhaps the most obvious omission concerns a lack of explicit consideration of one centre integrals. The latter are normally evaluated using spherical polar coordinates, and an excellent summary is to be found in reference [18].
9.1 Evaluation of two centre overlap and kinetic energy integrals by the A and B function method

The evaluation of this class of integrals is facilitated by a transformation to elliptic coordinates, as shown in the figure.

[Figure: two centres A and B separated by the distance R, with the electron at distances r_A and r_B from them.]

Such coordinates are best defined with respect to two local Cartesian systems centred at A and B, the z axis of both being chosen to lie along the bond AB. Then the elliptic coordinates λ, μ and φ are given by

λ = (r_A + r_B)/R,  μ = (r_A − r_B)/R

Therefore

r_A = R(λ+μ)/2,  r_B = R(λ−μ)/2

The Jacobian of the transformation is given by (R/2)³(λ²−μ²), the limits of integration being 1 to ∞, −1 to 1 and 0 to 2π for λ, μ and φ respectively.

Consider a simple example of the type of integral which can arise, the overlap integral between two unnormalized 1s Slater orbitals, centred on A and B:

S = ∫ exp(−ar_A) exp(−br_B) dτ = (R/2)³ ∫_1^∞ ∫_{−1}^{1} ∫_0^{2π} exp(−Pλ − Qμ) (λ²−μ²) dλ dμ dφ
  = 2π(R/2)³ [A_2(P)B_0(Q) − A_0(P)B_2(Q)]

with P = (R/2)(a+b) and Q = (R/2)(a−b), where

A_n(P) = ∫_1^∞ exp(−Pλ) λⁿ dλ    (96a)

B_n(Q) = ∫_{−1}^{1} exp(−Qμ) μⁿ dμ    (96b)

Clearly the evaluation of other overlap integrals involving ETF of higher quantum numbers will follow a similar, but more complicated, path, and involve the evaluation of A and B functions of higher order [19]. The evaluation of the kinetic energy integrals is also simple if we use the analogue of the 'symmetric' method for GTF described in section 4.4, noting that differentiation of an ETF E(a,A,ℓ,m,n) with respect to a Cartesian coordinate yields a linear combination of E functions of shifted quantum numbers. Note that the integrals evaluated above have been of the form where the x, y and z axes have been defined rather conveniently with respect to the AB bond.

In the general polyatomic molecule, we will require integrals involving orbitals with angular factors which are defined with respect to a global Cartesian system, which will not in general be parallel to the local coordinate systems at A and B chosen above. However, a Cartesian coordinate in the global frame may be written as a linear combination of coordinates in the local frame, via a rotational transformation; hence an integral over basis functions defined in the global frame may be expressed as a linear combination of integrals evaluated in the local frame.
9.2 Evaluation of the A_n(P) functions

A_0(P) may be evaluated directly, to give

A_0(P) = exp(−P)/P

A-functions of high order may be evaluated by recursion upwards. Thus integration of equation (96a) by parts leads to

A_{n+1}(P) = ((n+1)A_n(P) + exp(−P))/P

9.3 Evaluation of the B_n(Q) functions

We first derive a recursion relationship, by integration by parts,

B_{n+1}(Q) = ((n+1)B_n(Q) − exp(−Q) − (−1)ⁿ exp(Q))/Q

This recursion formula is numerically stable [20] for |Q| > n_max/e in the ascending direction, and for |Q| ≤ n_max/e in the descending direction, where n_max is the highest order of B-function required. Therefore, if |Q| > n_max/e, we evaluate B_0(Q) and recur in the ascending direction, noting that

B_0(Q) = (exp(Q) − exp(−Q))/Q

If |Q| ≤ n_max/e we evaluate B_{n_max}(Q) from a power series in Q, and then recur downwards, noting that

B_n(Q) = 2 Σ_{i=0}^{∞} Q^{2i} / ((2i)! (n+2i+1))    (n even)

B_n(Q) = −2 Σ_{i=1}^{∞} Q^{2i−1} / ((2i−1)! (n+2i))    (n odd)

The above formulae are easily derived by expanding the exponential in power series, and integrating equation (96b) term by term.
9.4 Gaussian integration

The most widely investigated method for approximating a definite integral is

∫_a^b w(x) f(x) dx ≈ Σ_{i=1}^{n} A_i f(x_i)    (97)

The weight-function or kernel, w(x), is such that the moments (the so-called monomial integrals)

d_k = ∫_a^b w(x) x^k dx

are defined.
We can say that a quadrature formula of the type (97) has a degree of exactness, m, if it is exact whenever f(x) is a polynomial of degree ≤ m (or equivalently, whenever f(x) = 1, x, ..., x^m) and it is not exact for f(x) = x^{m+1}. The x_i are called the points (or nodes) of the quadrature, and the A_i are called coefficients (or weights). Clearly, a given n-point quadrature cannot have, in general, a degree of exactness greater than 2n−1, there being only 2n adjustable parameters in the quadrature formula. It can be shown [27] that if w(x) is non-negative on the interval (a,b), then n points and coefficients can be found to make (97) exact for all the monomial integrals d_k, k=0 to 2n−1, and hence all polynomials of degree ≤ 2n−1, and that the n points will all be interior to (a,b). Such quadrature formulae are normally called Gaussian quadrature formulae, because they were first studied by Gauss for the case w(x) = 1.
It can further be shown that the points of the n-point quadrature are the roots of the orthogonal polynomial of degree n, defined for the interval (a,b) and with the weight function w(x). Gaussian quadratures are well known for the weight functions exp(−x) (on the interval (0,∞)), exp(−x²) (on the interval (−∞,∞)) and w(x) = (1−x)^α (1+x)^β (on the interval (−1,1)), these being referred to as Gauss-Laguerre, Gauss-Hermite and Gauss-Jacobi methods respectively. A special case of Gauss-Jacobi is where w(x) = 1 (therefore α=β=0), referred to as the Gauss-Legendre procedure.

There is a large literature concerning the convergence of the Gaussian integration technique, and also concerning methods for the determination of the points and weights. This literature has been adequately surveyed in reference [21], to which we refer.


An approximate integration formula for a multiple integral over an m-dimensional space corresponding to equation (97) is

∫_{a_1}^{b_1} ... ∫_{a_m}^{b_m} w(x_1,...,x_m) f(x_1,...,x_m) dx_1 ... dx_m ≈ Σ_{i=1}^{n} A_i f(x_{i,1},...,x_{i,m})

where the x_i = (x_{i,1},...,x_{i,m}) are points in the space, and the A_i have the property that the approximation is exact whenever f(x_1,...,x_m) is a polynomial which does not exceed a certain degree, d, in the m variables. Such a formula is known as a cubature formula. Clearly, the calculation of the values of the A_i and x_i so as to produce a cubature formula of a given degree of exactness is, in general, a matter of far greater complexity than for the one dimensional case, and reference [22] should be consulted for a discussion of certain known cases. Normally, we set our sights somewhat lower, and apply the ordinary one dimensional Gaussian procedures m times.

Consider the two dimensional case:

∫_{a_2}^{b_2} ∫_{a_1}^{b_1} w(x_1,x_2) f(x_1,x_2) dx_1 dx_2 = ∫_{a_2}^{b_2} ∫_{a_1}^{b_1} w_1(x_1) w_2(x_2) [f(x_1,x_2) w(x_1,x_2) / (w_1(x_1) w_2(x_2))] dx_1 dx_2
  ≈ Σ_{j=1}^{n_2} Σ_{i=1}^{n_1} A_i^{(1)} A_j^{(2)} f(x_i^{(1)}, x_j^{(2)}) w(x_i^{(1)}, x_j^{(2)}) / (w_1(x_i^{(1)}) w_2(x_j^{(2)}))

where the one dimensional rules (A^{(1)}, x^{(1)}) and (A^{(2)}, x^{(2)}) belong to the weight functions w_1 and w_2 respectively; if w factorizes exactly as w_1(x_1)w_2(x_2), the residual weight ratio is unity.
9.5 Nuclear attraction integral over unnormalized '1s' STF using the Gaussian transform method

The method to be described has been extensively discussed by Shavitt [6] and Shavitt and Karplus [23]. The integral transform we shall use is of the form

exp(−ar) = (a/(2π^{1/2})) ∫_0^∞ s^{−3/2} exp(−r²s − a²/(4s)) ds    (98)

Equation (98) is easily proved using the results presented in section 8. Thus

∫_0^∞ s^{−3/2} exp(−r²s − (a/2)²/s) ds = I_{−1/2}(r, a/2)    (using equation (89))
  = 2^{3/2} r k_{−1/2}(ar)    (using equation (90))
  = (2π^{1/2}/a) exp(−ar)    (using equation (93b))
thus completing the proof.


Before proceeding with our discussion of the Gaussian transform method (GTM) we should like to mention an entirely different approach based on the Fourier transform of STF, and refer to the work of Silverstone [24]. We will concentrate on the GTM because of the latter's wider usage at the present time. The three centre nuclear attraction integral over unnormalized 1s STF may be written

NAI = ∫ exp(−α₁r_A) exp(−α₂r_B) r_C⁻¹ dτ
NAI = (α₁α₂/(4π)) ∫_0^∞ ∫_0^∞ ds₁ ds₂ (s₁s₂)^{−3/2} exp(−α₁²/(4s₁) − α₂²/(4s₂)) ∫ exp(−s₁r_A² − s₂r_B²) r_C⁻¹ dτ

where we have applied the Gaussian transform twice. The integration over the spatial variables is of the form of a NAI over 1s GTF, as discussed in sections 6.2 and 7.2. Therefore, using equation (55), we find

NAI = (α₁α₂/2) ∫_0^∞ ∫_0^∞ ds₁ ds₂ (s₁s₂)^{−3/2} (s₁+s₂)⁻¹ exp(−s₁s₂(AB)²/(s₁+s₂) − α₁²/(4s₁) − α₂²/(4s₂)) F_0((s₁+s₂)(CP)²)

where the coordinates of the point P are given by P = (s₁A + s₂B)/(s₁+s₂). We now define the coordinate transformation

u = s₁/(s₁+s₂),  z = s₁+s₂

Therefore s₁ = uz and s₂ = z(1−u). The Jacobian of the transformation is given by

J(s₁,s₂ / u,z) = z

the limits of integration being u = 0 to 1, z = 0 to ∞.

Hence

NAI = (α₁α₂/2) ∫_0^1 du [u(1−u)]^{−3/2} ∫_0^∞ dz z⁻³ exp[−u(1−u)(AB)²z − ((α₁/2)²/u + (α₂/2)²/(1−u))/z] F_0(z(CP)²)    (99)

where

P = uA + (1−u)B    (100a)

Hence

C−P = u(C−A) + (1−u)(C−B)    (100b)

We now consider the evaluation of equation (99). To integrate over u, we use the Gauss-Legendre quadrature rule, noting that the transformation u' = 2u−1 leads to an integration over u' in the interval (−1,1). In practice we perform a direct quadrature over u, the coefficients and points of the u quadrature (A_i, u_i) being related to the Gauss-Legendre coefficients and points (A'_i, u'_i) via

u_i = (u'_i + 1)/2,  A_i = A'_i/2

(note that du = du'/2). We thus reduce the NAI to the form

NAI ≈ (α₁α₂/2) Σ_{i=1}^{n} A_i u_i^{−3/2} (1−u_i)^{−3/2} ∫_0^∞ dz z⁻³ exp(−g_i z − h_i/z) F_0(z(CP_i)²)

where (CP_i)² may be calculated from equation (100). We deal first with the integration over z in the case that (CP_i)² is zero (this can only be so if the point C is on the line AB, and even then is not necessarily so). We then have F_0(0) = 1, and the integral over z reduces to the form I_{−2}(X,Y), where

X = [u_i(1−u_i)]^{1/2} (AB),  Y = [(α₁/2)²/u_i + (α₂/2)²/(1−u_i)]^{1/2}

and I_{−2}(X,Y) was defined in section 8 (see equation (89)).


We now discuss the integration over z in the case that (CP_i)² is finite. Let w = z(CP_i)², so that

NAI ≈ (α₁α₂/2) Σ_{i=1}^{n} A_i u_i^{−3/2} (1−u_i)^{−3/2} (CP_i)⁴ M(−2, 0, g, h)    (101)

where

M(ℓ,m,g,h) = ∫_0^∞ dw w^{ℓ−1} exp(−gw − h/w) F_m(w)    (102)

and

g = (CP_i)^{−2} (AB)² u_i (1−u_i),  h = (CP_i)² [(α₁/2)²/u_i + (α₂/2)²/(1−u_i)]

Equation (101) defines our scheme for the evaluation of the NAI, provided that we are able to evaluate M(ℓ,m,g,h).

9.6 The evaluation of M(ℓ,m,g,h)

The most rapidly varying part of the integrand on the RHS of equation (102) is exp(−gw − h/w), which rises from zero at w=0 to a maximum at w₀ = (h/g)^{1/2} and then decays exponentially.

[Figure: the function exp(−gw − h/w) plotted against w, showing the single maximum at w₀.]

The characteristics of this behaviour are of some importance for the methods of evaluation of M(ℓ,m,g,h), and it is to the latter we now turn.
(a) The first method consists of expanding F_m(w) in the asymptotic series, equations (47) and (53), to give

M(ℓ,m,g,h) ≈ (Γ(m+1/2)/2) I_{ℓ−m−1/2}(g^{1/2}, h^{1/2})
            − (1/2) I_{ℓ−1}((g+1)^{1/2}, h^{1/2})
            − ((2m−1)/2²) I_{ℓ−2}((g+1)^{1/2}, h^{1/2})
            − ... − ((2m−1)(2m−3)...(2m−2i+3)/2^i) I_{ℓ−i}((g+1)^{1/2}, h^{1/2}) − ...

where the I-functions were discussed in section 8, see equation (89). Clearly the method will only prove accurate when the maximum of exp(−gw − h/w) lies at sufficiently large w.


The method is particularly attractive if only the first term
needs to be included, because Bessel functions of half-integral
order only are required.
(b) Expand F_m(w) in a Taylor series about the point w₀ = (h/g)^{1/2} (the maximum of exp(−gw − h/w)), using equation (42), to give

F_m(w) = Σ_{i=0}^{∞} ((−1)^i F_{m+i}(w₀)/i!) (w−w₀)^i = Σ_{i=0}^{∞} (F_{m+i}(w₀)/i!) Σ_{j=0}^{i} (−1)^j C(i,j) (w₀)^{i−j} w^j

where C(i,j) is the binomial coefficient. Substituting into equation (102), we find

M(ℓ,m,g,h) = Σ_{i=0}^{∞} (F_{m+i}(w₀)/i!) Σ_{j=0}^{i} (−1)^j C(i,j) (w₀)^{i−j} I_{ℓ+j}(g^{1/2}, h^{1/2})

(c) Expand F_m(w) in the series given by equation (40), so that

M(ℓ,m,g,h) = (Γ(m+1/2)/2) Σ_{i=0}^{∞} (1/Γ(m+i+3/2)) I_{ℓ+i}((g+1)^{1/2}, h^{1/2})

Clearly this procedure will be rapidly convergent only when the maximum of exp(−gw − h/w) is found at a small value of w.

(d) Let us rewrite equation (102) using the integral representation of F_m(w):

M(ℓ,m,g,h) = ∫_0^1 ∫_0^∞ dt dw w^{ℓ−1} t^{2m} exp(−(g+t²)w − h/w)

Integrating over w first,

M(ℓ,m,g,h) = ∫_0^1 dt t^{2m} I_ℓ((g+t²)^{1/2}, h^{1/2})

The integration over t is now performed using Gauss-Legendre quadrature.
9.7 The electron repulsion integral for unnormalized '1s' STF using the Gaussian transform method

The four centre electron repulsion integral over unnormalized 1s STF is defined by

ERI = ∫∫ exp(−α₁r_{1A}) exp(−α₂r_{1B}) r_{12}⁻¹ exp(−α₃r_{2C}) exp(−α₄r_{2D}) dτ₁ dτ₂

We apply the Gaussian transform (98) four times, to give


00<00000

ERI

(l1(l2(l3(l4

16rr2

ffII

0000

*ff
We notice that the integration over the spatial coordinates is of
the form of an electron repulsion integral over GTF, see sections
6.3 and 7.4, so that after using equation (57), we obtain.
ERI

*
*
(103)

411

AN INTRODUCTION TO MOLECULAR INTEGRAL EV ALUA TlON

where the coordinates of P and Q are given by

We now apply the coordinate transformation

s₁ = utz,  s₂ = (1−u)tz,  s₃ = v(1−t)z,  s₄ = (1−v)(1−t)z

The Jacobian of the transformation is t(1−t)z³, hence

ERI = (α₁α₂α₃α₄π^{1/2}/8) ∫_0^1∫_0^1∫_0^1∫_0^∞ du dv dt dz z^{−11/2} t⁻³ (1−t)⁻³ [u(1−u)v(1−v)]^{−3/2} exp(−gz − h/z) F_0(zd)    (104)

where

g = u(1−u)t (AB)² + v(1−v)(1−t) (CD)²
h = [α₁²/(ut) + α₂²/((1−u)t) + α₃²/(v(1−t)) + α₄²/((1−v)(1−t))]/4
d = t(1−t) (PQ)²

and

P = Au + B(1−u),  Q = Cv + D(1−v)
The procedure is now to integrate over u, v and t numerically,
the integration over z being similar to that discussed for the
nuclear attraction integral in sections 9.5 and 9.6.
9.8 Miscellaneous notes on the Gaussian transform method

(a) When dealing with a one-centre overlap distribution, it is best to apply a single Gaussian transform to the overlap distribution, rather than two separate transformations, one for each basis function. The student will find it instructive to work through the case of the two centre nuclear attraction integral involving two 1s STF on the same centre, following the single transform method. This integral can be reduced to closed form, as is to be expected, since it can also be evaluated by the A and B function method.

(b)

The numerical integration over the u variable of equation (99) becomes slowly convergent if the orbital exponents of the two STF (α₁ and α₂) become greatly different, and if the suggested Gauss-Legendre integration scheme is used. Analysis of the corresponding two centre overlap integral by the Gaussian transform method shows up the reason for this, and is suggestive of ways to improve the convergence of the u quadrature. We leave such an analysis as an exercise for the student. Note that rather similar difficulties arise in the case of the u and v integrations of equation (104).


(c) A procedure, known as the 'alternate Gaussian transform method', is known for the ERI. The alternate method differs from the ordinary method by the use of a different coordinate transformation in the development commencing at equation (103). We refer to the literature [6,23,25] for a discussion of the alternate method.
(d) Formulae for molecular integrals involving basis functions of higher quantum numbers may be deduced by differentiation of the basic equations (99) and (104), using step operators which raise the quantum numbers of the ETF E(a,A,ν,l,m,n): differentiation with respect to the orbital exponent a raises ν by one, whilst differentiation with respect to a centre coordinate A_x yields a linear combination of E functions with l and ν shifted. This process of stepping up the quantum numbers (which is very similar to that for GTF described in section 6.4) gives rise to extremely tedious algebraic manipulations, so that it has proved necessary to generate the formulae for molecular integrals over STF using computers, using a sort of 'compiler'.

10. ORGANIZATIONAL ASPECTS OF A MOLECULAR INTEGRALS PROGRAM

In this final section, we turn to a consideration of methods for the avoidance of the evaluation of molecular integrals. Thus the most simple-minded scheme is to loop over all the necessary molecular integrals, evaluate each one in turn, and output the result to a file for later use. The following gives details of modifications of this primitive scheme, which can enhance the rate of molecular integral evaluation very considerably. In all cases we use the two electron integrals as an example, although the methods are also applicable to the other classes of molecular integral.
10.1 Use of molecular symmetry

The notation of equation (1d) will be used for the two electron integral. Two integrals, (ij/kl) and (i'j'/k'l'), can be seen to be related if the basis functions which appear in one integral are transformed into the basis functions of the other integral under a transformation of the dummy variables (the coordinates of electrons 1 and 2), provided that the Jacobian of the transformation is unity, and the transformation leaves the operator r_{12}⁻¹ unchanged. The method is to apply to an integral a sequence of orthogonal transformations of the spatial variables which leave the operator unaltered, thus generating a sublist of integrals which are equal (within a sign) to the original. If it should turn out that one of these is the negative of the original, then obviously the value of all these integrals must be zero.
For example, consider the square planar configuration shown in the figure below:

[figure: four equivalent centres A, B, C and D at the corners of a square]

AN INTRODUCTION TO MOLECULAR INTEGRAL EVALUATION

Suppose we site 1s (spherically symmetric) basis functions on A, B, C and D, which we denote s_A, s_B, s_C and s_D respectively.

Consider a rotation of the coordinate systems of electrons 1 and 2 by 90° in the plane of the paper about the origin in the anticlockwise direction. Basis function s_A in the old coordinate system will 'look like' basis function s_B in the new system, the overall effect being as if we had rotated the system by 90° in the clockwise direction, whilst leaving the coordinate systems unaltered. We can say that this rotation operator transforms s_A into s_B, s_B into s_C, s_C into s_D and s_D into s_A. The two-electron integral (s_A s_B/s_C s_D) will transform into (s_B s_C/s_D s_A), so that (s_B s_C/s_D s_A) = (s_A s_B/s_C s_D).

We now propose an algorithm for the use of such symmetry transformation operators, and present a formal analysis thereof. Let an integral (ij/kl) be tagged with a unique index or label, L. The effect of a transformation operator, T, on an integral will be represented by its effect on the integral label, thus:

T L = L'

The algorithm we propose to use is as follows:

(i) Loop through all possible integrals.

(ii) For each integral we

(a) Generate the associated label, L₀.

(b) For each transformation operator we generate L_i = T_i L₀ (i = 1 to m).

(c) If all the L_i are ≥ L₀, we calculate the value of the integral, tag the value of the integral with all the distinct labels in the range L₀ to L_m, and output to file.

(d) If any of the L_i are < L₀, we do not evaluate the integral, or output anything to file.
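In outline, steps (i)-(ii) evaluate one representative per symmetry orbit. The toy Python sketch below illustrates the bookkeeping for the square-planar example; the packed label, the sign convention and the permutation tables are our own illustrative assumptions, not a transcription of any production integrals program:

```python
def label(q):
    # unique packed integer label for the quadruple (i j | k l)
    i, j, k, l = q
    return ((i * 100 + j) * 100 + k) * 100 + l

def apply_op(T, q):
    # T maps a basis-function index to (image index, sign)
    sign = 1
    out = []
    for idx in q:
        img, s = T[idx]
        out.append(img)
        sign *= s
    return tuple(out), sign

def process(q, ops, evaluate, out_list):
    L0 = label(q)
    images = [apply_op(T, q) for T in ops]
    if any(label(p) < L0 for p, s in images):
        return                           # step (d): a lower label exists; skip
    if any(label(p) == L0 and s < 0 for p, s in images):
        value = 0.0                      # equal to its own negative: zero
    else:
        value = evaluate(q)              # step (c): evaluate once per orbit
    for lab in sorted({label(p) for p, s in images} | {L0}):
        out_list.append((lab, value))    # tag value with all distinct labels

# four equivalent s functions 0..3 on the corners of a square, and the
# three non-trivial powers of the C4 rotation (all signs positive here)
c4  = {0: (1, 1), 1: (2, 1), 2: (3, 1), 3: (0, 1)}
c42 = {0: (2, 1), 1: (3, 1), 2: (0, 1), 3: (1, 1)}
c43 = {0: (3, 1), 1: (0, 1), 2: (1, 1), 3: (2, 1)}
ops = [c4, c42, c43]

out, n_eval = [], 0
def evaluate(q):
    global n_eval
    n_eval += 1
    return 1.0

for q in [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]:
    process(q, ops, evaluate, out)

print(n_eval)   # the orbit of four symmetry-related integrals is evaluated once
```

Only the non-trivial powers of the rotation need be supplied; as the analysis that follows shows, the operators (apart from the identity) must form a group for the label test to yield a complete, non-redundant file.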


The question we must now answer is: what conditions must the set of transformation operators satisfy in order that a complete and non-redundant list of integrals is eventually placed on file?
Consider the following:

(a) Suppose T₁L₀ = L₁, with L₁ > L₀. This means that when we evaluate the integral labelled L₀, we will place both L₀ and L₁ on file. Now when we come to process the integral labelled L₁, clearly we must be able to guarantee that the transformation operators will generate a label less than L₁, and we are only able to guarantee this by having T₁⁻¹ in our set of operators, because

T₁⁻¹L₁ = L₀ < L₁

We conclude that in order that we do not produce a redundant list, each operator and its inverse must be present in the set.
(b) Suppose we have two transformation equations

T₁I = K and T₂J = K

where K > I and J. We see at once the possibility that K may be loaded to file twice (once when we process I, and again when we process J). Further suppose that J > I. Clearly we could avoid loading K to file twice if we have an operator, T₃, in the set such that:

T₃I = J

Now by rule (a) above we must have T₁⁻¹ and T₂⁻¹ in the set, so that

T₁⁻¹K = I and T₂⁻¹K = J

Hence

T₃ = T₂⁻¹T₁

Assuming a label list of the most general kind, the implication is that for the proposed algorithm to work satisfactorily, the product of any two operators in the set must itself be an operator in the set, and the set of operators must comprise a mathematical group. (Note that it is not strictly necessary to include the identity, however.)


10.2 Practical realization of the label list algorithm

At the heart of the practical realization lies a table constructed [26] so that each column corresponds to a transformation, each row to a basis function. The element M_ij of the table denotes the effect that T_j has on basis function φ_i. A non-zero entry, k, means that transformation T_j sends φ_i into φ_k. A zero entry means that the result of the transformation is a function which is not a member of the basis set, and can arise, for example, when we use local symmetry operators. The table may be supplied as part of the data, or computer generated, the latter being quite simple for such operators as C₂, C₄, S₂, S₄, σ, i, given the coordinates of the point about which these operations are to be performed. In the most general program, both methods of definition of the table would be used.

The completion of a partial group of transformations is normally performed by program, although in some early implementations the group had to be supplied complete. As an example consider the following table:

[table: transformation table M_ij; rows are basis functions, columns the transformations T₁-T₆, and signed entries ±k indicate that T_j sends φ_i into ±φ_k]

Let us suppose we are supplied with a first set of transformations, T₁ and T₂. We consider all products of operators within this set, and each 'new' operator is appended to the list of transformations. Thus T₃ = T₁T₂, T₄ = T₂T₁, T₅ = T₁² and T₆ = T₂² in the above, and these comprise a second set. Now consider all products within the second set, and also any cross products between operators of the first and second set. Any new operators from this procedure are appended to the list, and we continue in this fashion until no new operators are formed, when the table is complete. In the case that the table contains zero-valued entries, it is necessary to adopt the convention that 'zero transforms to zero' under any operation.
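The completion procedure just described is a closure computation. A minimal Python sketch, using plain permutations as a stand-in for the signed transformation table:

```python
def close_group(generators, compose):
    # keep forming products and appending 'new' operators until none appear
    ops = list(generators)
    grown = True
    while grown:
        grown = False
        for a in list(ops):
            for b in list(ops):
                c = compose(a, b)
                if c not in ops:
                    ops.append(c)
                    grown = True
    return ops

def compose(p, q):
    # permutations as tuples: p[i] is the image of basis function i
    return tuple(p[q[i]] for i in range(len(p)))

c4 = (1, 2, 3, 0)        # 90 degree rotation of four equivalent functions
sigma = (1, 0, 3, 2)     # a reflection swapping A-B and C-D
group = close_group([c4, sigma], compose)
print(len(group))        # closure of these two generators has 8 elements
```

For a finite set of transformations, closure under products automatically supplies the inverses and the identity, consistent with the group condition derived above.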
Note: the following statement gives a slightly weaker condition for a transformation matrix to generate a complete non-redundant list: take the transformation matrix, and change all negative signs to positive. Add a column corresponding to the identity. The transformations specified by this table alone must form a group.
10.3 Batching of integrals with certain common factors

Let G^A denote a 'group' of basis functions with certain common features. To be useful, these common features should be that

(a) The orbitals have a common exponent

(b) The orbitals are located on the same centre.

The most common example would be where we have three p-orbitals on a common centre and of common exponent (p_x, p_y and p_z), or similarly six d-orbitals (d_x², d_y², d_z², d_xy, d_xz, d_yz). Larger groups of basis functions may be generated by requiring, for example, that the orbital exponents of s and p orbitals on a given centre be in common.

Denote a member of the group G^A by G^A_i. Then a batch of integrals will be denoted (G^A G^B/G^C G^D) and includes all distinct integrals of the form (G^A_i G^B_j/G^C_k G^D_l). Now it is the case that the integrals of such a batch possess many common factors, so that an optimal strategy is to evaluate these common factors first, and then evaluate the batch of integrals 'simultaneously'. Such a procedure will normally produce a large enhancement of the rate of calculation of molecular integrals.
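The saving can be made concrete with the simplest possible batch: the nine overlap integrals of a (p|p) shell pair over primitive Cartesian Gaussians. The Gaussian product prefactor and the composite centre depend only on the shell pair, so they are computed once; the per-component work is then trivial. (The formula used is the standard Obara-Saika-type expression for unnormalized primitives; the exponents and geometry below are arbitrary illustrative choices.)

```python
import math

def p_shell_overlap_batch(a, A, b, B):
    # common factors for the (p|p) shell pair, computed once
    p  = a + b
    P  = [(a * A[i] + b * B[i]) / p for i in range(3)]   # composite centre
    R2 = sum((A[i] - B[i]) ** 2 for i in range(3))
    S_ss = (math.pi / p) ** 1.5 * math.exp(-a * b / p * R2)
    PA = [P[i] - A[i] for i in range(3)]
    PB = [P[i] - B[i] for i in range(3)]
    # the nine (p_i | p_j) overlaps then follow from the shared factors
    return [[(PA[i] * PB[j] + (0.5 / p if i == j else 0.0)) * S_ss
             for j in range(3)] for i in range(3)]

S = p_shell_overlap_batch(0.9, (0.0, 0.0, 0.0), 1.2, (0.0, 0.0, 1.0))
print(S[0][0], S[0][2], S[2][2])
```

The same organization carries over to two-electron integrals, where the shared quantities per shell quartet are far more expensive and the enhancement correspondingly larger.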
It is of course possible to combine the use of molecular symmetry and batching into one program. To do this one needs two transformation matrices, the first being for the basis functions, the second being for the groups. In the latter, the element N_Aj = B means that under transformation T_j the basis functions in G^A transform into the basis functions of G^B. That is, there is a one-to-one correspondence between a basis function in G^A and one in G^B. The procedure is to use the group transformation table to give the overall connection between batches of integrals, and the basis function transformation table to give the detail of that connection.
Suppose under a given transformation a basis function goes into a linear combination of other basis functions. Such an event is quite common when considering rotation operators which do not involve angles of 90° or 180°. Consider the effect of C₃ operations on p orbitals sited on A, B and C of the following system:

[figure: p orbitals on three equivalent centres A, B and C related by a C₃ axis]

By a slight generalization of the basis function transformation table, such transformations can be handled efficiently if the combined batching and symmetry algorithm is used. If batching is not used, the organizational difficulties are such that such a C₃ operation as above is rarely used to the maximum of its potential. For an example of the use of symmetry and batching, see the program ATMOL2 [27].

10.4 Other organizational aspects

(a) Permutational redundancy.

It is clear from inspection that

(ij/kl) = (ji/kl) = (ij/lk) = (ji/lk) = (kl/ij) = (kl/ji) = (lk/ij) = (lk/ji).

Clearly it would be extremely inefficient to evaluate such integrals separately. It is necessary to organize the looping structure of the molecular integrals program so that only one member of such a permutationally redundant set is computed and output to file. It is the responsibility of the program designed to subsequently process such a file to take account of such permutational redundancy elimination, and Dr Veillard will discuss this in the context of the SCF method.
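The standard looping structure that visits exactly one member of each permutationally redundant set can be sketched as follows (a small Python illustration, with canonical() mapping any index quadruple onto its unique loop representative):

```python
from itertools import product

def canonical(i, j, k, l):
    # order within each pair, then order the pairs, to pick one representative
    i, j = max(i, j), min(i, j)
    k, l = max(k, l), min(k, l)
    if (i, j) < (k, l):
        (i, j), (k, l) = (k, l), (i, j)
    return i, j, k, l

n = 4
unique = [(i, j, k, l)
          for i in range(n) for j in range(i + 1)
          for k in range(i + 1) for l in range(k + 1)
          if (i, j) >= (k, l)]           # the standard 'triangular' loops

# every one of the n**4 quadruples maps onto exactly one loop member
hits = {canonical(*q) for q in product(range(n), repeat=4)}
print(len(unique), len(hits))
```

With m = n(n+1)/2 index pairs, the loop produces m(m+1)/2 unique integrals, an eightfold saving over the full n⁴ list for large n.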


(b) Integrals which are 'zero' by distance.

As a model, consider a regular planar lattice of points, and at each point we site an s-orbital, all s-orbitals being of equal exponent. We can now show that the number of overlap distributions with differential overlap greater than a given factor, ε (note that the differential overlap is equal to ∫|φ_i φ_j| dτ), is proportional to the number of basis functions, N, and not to N², no matter how small is ε. Because a two-electron integral decays roughly as the product of the differential overlaps of its two overlap distributions, the number of two-electron integrals greater than a given threshold, δ, will rise as N² and not N⁴. Clearly a rather similar situation will obtain in any large chemical system, and we see that if we deduce a rapid method for obtaining an approximation to an integral, we will be able to completely avoid a full computation in the case of many such integrals.
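The O(N) count of significant overlap distributions is easily demonstrated for the lattice model. The sketch below uses exp(-αR²/2) for the differential overlap of two equal-exponent s-Gaussians at separation R; the exponent, unit spacing and threshold are arbitrary illustrative choices:

```python
import math

def significant_pairs(n_side, alpha=1.0, eps=1e-3):
    # n_side x n_side square lattice, unit spacing, one s-Gaussian per point
    pts = [(x, y) for x in range(n_side) for y in range(n_side)]
    count = 0
    for a in range(len(pts)):
        for b in range(a, len(pts)):
            r2 = (pts[a][0] - pts[b][0]) ** 2 + (pts[a][1] - pts[b][1]) ** 2
            if math.exp(-0.5 * alpha * r2) > eps:   # differential overlap
                count += 1
    return count

for n_side in (4, 8, 16):
    n = n_side * n_side
    print(n, significant_pairs(n_side), n * (n + 1) // 2)
```

As the lattice grows, the screened count (middle column) approaches linear growth in N, while the unscreened count (right-hand column) grows as N².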
(d) The multipole expansion of r₁₂⁻¹.

Consider the two-electron integral (ij/kl). Let the overlap distribution φ_iφ_j be 'distributed' over centres A and B, and φ_kφ_l over C and D.

[figure: E and F denote the centroids of the two overlap distributions]

Expand the operator r₁₂⁻¹ in a Taylor series using the variables x₁E, y₁E, z₁E, x₂F, y₂F and z₂F. Substitution of such a Taylor series into the two-electron integral gives as a first term

S_ij S_kl / R_EF

where S_ij and S_kl are the overlap integrals ∫φ_iφ_j dτ and ∫φ_kφ_l dτ, and R_EF is the distance between E and F. If R_EF is large compared with R_AB and R_CD, the asymptotic approximation is extremely accurate, and is surprisingly accurate in many less favourable situations, and as such will often provide a useful basis for forming an approximation to an integral.
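For two s-Gaussian charge distributions the quality of this leading term can be checked in closed form, since the exact Coulomb interaction of two unit Gaussian charge clouds is erf(√(pq/(p+q)) R)/R, while the leading multipole term (with unit overlaps) is just 1/R. The exponents below are arbitrary illustrative choices:

```python
import math

def coulomb_ss(p, q, R):
    # exact Coulomb repulsion of two unit s-Gaussian charge clouds
    t = math.sqrt(p * q / (p + q))
    return math.erf(t * R) / R

p, q = 1.5, 0.8
for R in (2.0, 4.0, 8.0):
    exact = coulomb_ss(p, q, R)
    approx = 1.0 / R     # leading term S_ij S_kl / R_EF with unit overlaps
    print(R, exact, approx, abs(exact - approx))
```

The error falls off faster than any power of R_EF, which is why the approximation remains useful well outside the strictly asymptotic region.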

A method based closely upon the above considerations is to approximate the overlap distributions φ_iφ_j and φ_kφ_l by spherical GTFs centred at E and F respectively, whose exponents and coefficients are chosen so that the overlap integrals ∫φ_iφ_j dτ and ∫φ_kφ_l dτ are reproduced. We will then have an approximation to (ij/kl) as a single two-electron integral over the two spherical GTFs. It is to be noted that the success of the multipole expansion method implies that any numerical integration procedure proposed for two-electron integrals which fails to give good values for overlap and higher multipole one-electron integrals is most unlikely to be very successful.

REFERENCES

[1] BOYS S F, (1950). Proc Roy Soc, A200, 542.
[2] PILAR F L, (1968). Elementary Quantum Chemistry. (McGraw-Hill).
[3] STEINER E, and SYKES S, (1972). Mol Phys, 23, 643. STEINER E, (1972). Mol Phys, 23, 657, 669. STEINER E, and WALSH B C, (1974). Quantum Chemistry - The State of the Art, SRC Atlas Symposium 4, (ed V R SAUNDERS and J BROWN).
[4] JEFFREYS H, and JEFFREYS B S, (1950). Methods of Mathematical Physics. (Cambridge University Press).
[5] SNEDDON I N, (1961). Special Functions of Mathematical Physics and Chemistry. (Oliver and Boyd).
[6] SHAVITT I, (1963). Methods in Computational Physics. (Academic Press).
[7] DAVIS P J, (1964). Handbook of Mathematical Functions, (ed M ABRAMOWITZ and I A STEGUN). (Dover).
[8] WALL H S, (1948). Analytic Theory of Continued Fractions. (Wiley).
[9] KHOVANSKII A N, (1963). The Application of Continued Fractions and their Generalizations to Problems in Approximation Theory. Translated by P WYNN. (Noordhoff).
[10] OBERHETTINGER F, and BADII L, (1973). Tables of Laplace Transforms. (Springer-Verlag).
[11] WRIGHT J P, (1963). Quarterly Progress Report of the Solid State and Molecular Theory Group, MIT, 35.
[12] TAKETA H, HUZINAGA S, and O-OHATA K, (1966). J Phys Soc (Japan), 21, 2313.
[13] CARRIER G F, KROOK M, and PEARSON C E, (1966). Functions of a Complex Variable. (McGraw-Hill).
[14] CAMPBELL G, and FOSTER R, (1948). Fourier Integrals for Practical Applications. (D Van Nostrand).
[15] SPIEGEL M R, (1964). Complex Variables. (McGraw-Hill).
[16] WATSON G N, (1952). A Treatise on the Theory of Bessel Functions. (Cambridge University Press).
[17] OLVER F W J, (1964). Handbook of Mathematical Functions, (ed M ABRAMOWITZ and I A STEGUN). (Dover).
[18] MILLER J, and BROWNE J C, (1962). Collection Formulas for Diatomic Integrals. Technical report of the Molecular Physics Group, University of Texas, Austin.
[19] MULLIKEN R S, RIEKE C A, ORLOFF D, and ORLOFF H, (1949). J Chem Phys, 17, 1248.
[20] GAUTSCHI W, (1961). J Assoc Comp Mach.
[21] STROUD A H, and SECREST D, (1966). Gaussian Quadrature Formulas. (Prentice-Hall).
[22] STROUD A H, (1971). Approximate Calculation of Multiple Integrals. (Prentice-Hall).
[23] SHAVITT I, and KARPLUS M, (1965). J Chem Phys, 43, 398.
[24] SILVERSTONE H J, (1966). J Chem Phys, 45, 4337; (1967). ibid, 537.
[25] BOWERS M J T, (1974). J Chem Phys, 60, 3705.
[26] CSIZMADIA I G, HARRISON M G, MOSKOWITZ J W, SEUNG S, SUTCLIFFE B T, and BARNETT M P, (1964). The POLYATOM system, Technical note 36, Cooperative Computing Laboratory, MIT.
[27] CHIU M F, GUEST M F, and SAUNDERS V R, (1973-74). ATMOL2 User notices 1-12, particularly note 3.

CORRELATED WAVEFUNCTIONS

N.C. Handy
University Chemical Laboratory, Lensfield Road,
Cambridge, England, CB2 1EW.

It is desirable to obtain highly accurate solutions to the Schrodinger equation

H Ψ = W Ψ     (1)

where

H = -½ Σ_i ∇_i² - Σ_I Σ_i Z_I/r_Ii + Σ_{i<j} 1/r_ij + Σ_{I<J} Z_I Z_J/R_IJ     (2)

Here we are working in atomic units, and the Born-Oppenheimer [1] approximation has been used to separate the motion of the nuclei from that of the electrons. I, J denote nuclei; i, j denote electrons, and r_Ii denotes the distance between nucleus I and electron i. In addition to equation (1), the wavefunction Ψ must obey the Pauli Principle

P_ij Ψ = -Ψ     (3)

where P_ij permutes the coordinates of electrons i and j.

Because a review has recently been written on this subject [2] we shall briefly describe two common methods which are used to obtain approximate solutions to equation (1), and follow by giving examples of correlated wavefunctions for various small atoms and molecules.

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 425-433.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.

In the variational method, an approximate wavefunction is written as a linear combination of expansion functions:

Ψ = Σ_s c_s Φ_s     (4)

where the coefficients c_s are determined by solving the secular equations

Σ_s <Φ_r| H - W |Φ_s> c_s = 0     (5)

Here m denotes the number of expansion functions Φ_s. If the eigenvalues of equations (5) are denoted W_1^(m) ≤ W_2^(m) ≤ ... ≤ W_m^(m), then these eigenvalues can be shown to obey

W_i^(m+1) ≤ W_i^(m)     (6)

It is usually assumed that this implies

W_i(exact) ≤ W_i^(m)     (7)

where W_i(exact) is the exact ith lowest eigenvalue of equation (1). It is the properties (6) and (7) which make the variational method so appealing, but it must be noted that the method can only be used if the integrals <Φ_r|H|Φ_s> can be evaluated.
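Property (6) is the familiar interlacing behaviour of linear variational roots (the Hylleraas-Undheim/MacDonald theorem) and can be checked on any small secular problem. A minimal sketch, assuming an orthonormal expansion set so that the secular problem is an ordinary eigenvalue problem (the matrix elements below are arbitrary numbers, not integrals over real wavefunctions):

```python
import math

# a 2 x 2 secular problem over an orthonormal expansion set {phi_1, phi_2}
H11, H22, H12 = -1.0, 0.5, 0.3

# m = 1: the single root is just H11
w1 = H11

# m = 2: the two roots of det(H - W I) = 0
avg = (H11 + H22) / 2.0
d = math.hypot((H11 - H22) / 2.0, H12)
w2_lo, w2_hi = avg - d, avg + d

print(w2_lo <= w1 <= w2_hi)   # the m=1 root is bracketed by the m=2 roots
```

Adding the second expansion function lowers the lowest root, exactly as (6) requires.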
In the method of moments, two different expansion sets are used:

Φ₁, Φ₂, ..., Φ_m  and  θ₁, θ₂, ..., θ_m

The equations associated with the method are

Σ_s <θ_r| H - W |Φ_s> c_s = 0     (8)

Σ_s <Φ_r| H - W |θ_s> d_s = 0,   r = 1, 2, ..., m     (9)

Thus two approximations, Σ_s c_s Φ_s and Σ_s d_s θ_s, are obtained for Ψ. There is no variational principle here, but the advantage is that it may be possible to evaluate the integrals <θ_r|H|Φ_s> when it is not possible to evaluate the integrals <Φ_r|H|Φ_s>. If λ and μ are a measure of the errors in the expansion sets Φ and θ, being defined by

λ = min_c < Ψ(exact) - Σ_s c_s Φ_s | Ψ(exact) - Σ_s c_s Φ_s >

μ = min_d < Ψ(exact) - Σ_s d_s θ_s | Ψ(exact) - Σ_s d_s θ_s >     (10)

then it can be shown that W, the eigenvalue of equations (8) and (9), satisfies


Σ_s c_s Φ_s = Ψ(exact) + λ Ψ' + ...

Σ_s d_s θ_s = Ψ(exact) + μ Ψ'' + ...     (11)

W = W(exact) + λμ W'' + ...     (12)

The above are perturbation-type expansions, with Ψ', Ψ'', etc. being terms in the expansions. These expressions should be apparent when it is realised that W will be exact if either of the sets Φ or θ spans the exact wavefunction, and likewise Σ_s c_s Φ_s will be exact if θ spans the exact wavefunction. Further details of this method can be found in reference [3].

Here we are concerned with the determination of correlated wavefunctions. In this context we shall mean wavefunctions which explicitly include powers of r_ij, the interelectronic distances. This means we exclude configuration interaction wavefunctions. It can be shown that configuration interaction implicitly introduces powers of r_ij into the wavefunction, but this author believes that it is the failure of the method to include linear terms in r_ij which makes its convergence so slow. However it must be recognised that configuration interaction is the simplest method for including correlation effects in wavefunctions. It has been used with much success on a variety of small atoms and molecules, its success being dependent upon the ability to evaluate the integrals <Φ_r|H|Φ_s>, where the Φ's are Slater determinants. The wavefunctions which are determined by configuration interaction may contain 10,000 determinants, which is rather undesirable. It will now be seen that by including powers and products of r_ij in wavefunctions, the methods for correlated wavefunctions are able to provide compact and accurate wavefunctions for a small, select set of atoms and molecules.

The "original" correlated wavefunction of Hylleraas [4] for the Helium atom in its ground state had the form

Ψ = exp(-ks)(1.0 - 0.100828 s + 0.033124 s² + 0.128521 t² + 0.355808 u - 0.031779 u²)     (13)

with W = -2.90324 a.u., and s = r₁ + r₂, t = r₁ - r₂, u = r₁₂. He demonstrated therefore that this six-parameter variationally determined wavefunction gave an energy within 0.00048 a.u. of the exact value. Being an atom, all the integrals were trivial. Using ten parameters an energy error of 0.00012 a.u. is obtained. More recently Pekeris [5] extended the expansion set to obtain an eigenvalue which is within experimental error of the exact eigenvalue. It appears that the inclusion of terms linear in r₁₂ is vital to obtain such accuracy. (Although the cusp condition at r₁₂ = 0 demands a linear dependence on r₁₂ in the wavefunction near r₁₂ = 0, it is in fact a linear dependence away from r₁₂ = 0 which is desirable in the wavefunction.)

The form of variational correlated wavefunctions for the hydrogen molecule in its ground state was originally due to James and Coolidge [6]. Their wavefunction took the form

Ψ = [2.23779 + 0.80483(μ₁² + μ₂²) - 0.5599 μ₁μ₂ - 0.60985(λ₁ + λ₂) + 0.56906 ρ] exp(-0.75(λ₁ + λ₂)) (αβ - βα)     (14)

at an internuclear distance of 1.4 a.u., with an energy of -1.1665 a.u. (exact value -1.1744 a.u.). In this wavefunction λ_i = (r_iA + r_iB)/R and μ_i = (r_iA - r_iB)/R, with ρ = 2r₁₂/R, A and B denoting the two nuclei, and R the internuclear distance. Using 13 terms an energy of -1.1735 a.u. was obtained. Here again, this form represents a highly accurate and compact wavefunction, and its success can be attributed to the terms in r₁₂. The variational integrals are not trivial, but can easily be coped with on a computer. More recently Kolos and Wolniewicz [7] have extended these calculations to include 100 terms, and after including relativistic and other corrections, predict a dissociation energy D of 36117.4 cm⁻¹, whereas the experimental value [8] satisfies 36116.3 < D < 36118.3 cm⁻¹.
Variational correlated wavefunction calculations were also performed on the ground state of the Lithium atom by James and Coolidge [9], using a wavefunction of the form

Ψ = φ(r₁,r₂,r₃) α(1)β(2)α(3) + φ(r₂,r₃,r₁) α(1)α(2)β(3) + φ(r₃,r₁,r₂) β(1)α(2)α(3)     (15)

In (15), φ(r₁,r₂,r₃) is antisymmetric in r₁ and r₂ and is given by

φ = Σ_μ c_μ χ_μ     (16)

with the expansion set χ_μ having the form

χ = r₁^i r₂^j r₃^k r₁₂^l r₁₃^m r₂₃^n exp(-a r₁ - b r₂ - c r₃)     (17)

Using ten terms in equation (16), they obtained an energy of -7.47607 a.u. (The Hartree-Fock energy is -7.4327 a.u. and the exact eigenvalue is -7.47806 a.u.) By including 60 terms, Larsson [10] has recently obtained a value of -7.478025 a.u. Thus again experimental accuracy has been achieved using explicitly correlated wavefunctions. The most difficult integrals in the calculation are of the form

∫ r₁₂^l r₁₃^m r₂₃^n exp(-a r₁ - b r₂ - c r₃) dτ     (18)


which can be reduced to a rapidly convergent infinite series, and are thus eminently suitable for a computer.
Correlated wavefunctions have also been determined for the ground state of the Beryllium atom using the variational method. They have been constructed from a set of expansion functions of the form

A { φ_a(r₁) φ_b(r₂) φ_c(r₃) φ_d(r₄) r_ij^ν × (spin function) }     (19)

In (19), φ_a, φ_b, φ_c and φ_d are linear combinations of Slater-type basis functions, (ij) denotes any pair of electrons and ν is any positive integer. The integrals for this expansion set are no more difficult than those met in the lithium calculations. Szasz and Byrne [11] were the first workers to perform such a calculation, but more recently Sims and Hagstrom [12] have used 107 such expansion functions in a calculation. They obtained W = -14.66654 a.u., which compares with an estimated exact value of -14.6667 a.u., and with the best configuration interaction value of -14.66419 a.u. [13].
Beryllium represents the largest atom and H₂ the largest molecule for which variationally determined correlated wavefunctions are available. It is probable that in the future we shall see further correlated wavefunctions for small atoms, but the difficulty of calculating the integrals <Φ_r|H|Φ_s>, where the Φ are correlated functions, for molecules with more than three electrons makes it improbable that correlated wavefunctions for such molecules will appear.
Because of this difficulty with integral evaluation, one may consider the application of the method of moments to the determination of molecular wavefunctions. The transcorrelated method [14] represents one such application. In this method the wavefunction has the form

Ψ = C Φ     (20)

where

C = Π_{i>j} f(r_i, r_j)

and Φ = A(φ₁ φ₂ ... φ_N) is a Slater determinant. For this wavefunction the Schrodinger equation (1) is

H C Φ = W C Φ     (21)

or

C⁻¹ H C Φ = W Φ     (22)

The transcorrelated hamiltonian C⁻¹HC, while being non-hermitian, contains at most three-electron operators:


the three-electron part having the form

-½ Σ_{j≠i} Σ_{k≠i} ∇_i f_ij · ∇_i f_ik     (23)

where the notation f_ij = f(r_i, r_j) has been used. Thus matrix elements of C⁻¹HC across determinants involve integrals no worse than double six-dimensional, and so the following expression for the energy,

W = <Φ | C⁻¹HC | Φ>     (24)

can possibly be evaluated for certain basis functions.


A specific form of C and Φ for which such calculations can be performed is one in which each orbital of Φ is a spherical Gaussian,

φ_k(r) = exp(-q_k r²)     (25)

multiplied by a spin function ξ, and the correlation factor f is expanded in Gaussian functions of the form

G_k(r_i, r_j) = exp(-a_k r_i² - b_k r_j² - c_k r_ij²)     (26)

To determine the parameters d_k and D_k the following equations are solved:

<∂Φ/∂d_k | C⁻¹HC - W | Φ> = 0     (27)

<(∂C/∂D_k)Φ | H - W | CΦ> = 0     (28)

By noting that ∂Φ/∂d_k is a single replacement of Φ, equation (27) may be recognised as a type of Brillouin condition for the operator C⁻¹HC and the single determinant Φ. Likewise equation (28) is plausible because the (∂C/∂D_k)Φ are related to double replacements of Φ. The remaining parameters are chosen so that C⁻¹HC and H differ as little as possible, and consequently ensure that the calculations are as near variational as possible.
Because it can be shown that

∂W/∂d_p = <∂Φ/∂d_p | C⁻¹HC - W | Φ> + <Φ | C⁻¹HC - W | ∂Φ/∂d_p>     (29)

a convenient equation to determine the d_p is therefore

<Φ | C⁻¹HC - W | ∂Φ/∂d_p> = 0     (30)

Thus equations (24), (27), (28) and (30) are sufficient to find all the parameters and the energy for the transcorrelated wavefunction CΦ. The connection of the transcorrelated method and the method of moments is apparent when equations (27) and (28) are compared with equations (8) and (9).
For the particular basis set (26), the integrals in the method can be rather trivially evaluated, being dependent on the formula

∫∫∫ exp(-p r₁A² - q r₂B² - s r₃C² - c₁₂ r₁₂² - c₁₃ r₁₃² - c₂₃ r₂₃²) dV₁ dV₂ dV₃ = π^(9/2) exp(-F/D) / D^(3/2)     (31)

where F and D are simple polynomials in the exponents p, q, s, c₁₂, c₁₃, c₂₃ and in the squared distances between the centres A, B and C     (32)

As an example of a typical calculation, a minimum gaussian basis set (FSGO) was used for the orbital basis φ_k of the water molecule. (This gives a self-consistent field energy of -64.23 a.u. compared to an estimated Hartree-Fock value of -76.07 a.u.) Using four correlation expansion functions, a correlation function of the form

C = Π_{i>j} exp[-0.2044 e^(-0.3 r_ij²) - 0.0117 e^(-1.0 r_ij²) - 0.0742 e^(-3.0 r_ij²) - 0.0681 e^(-12.0 r_ij²) + Σ_p D_p φ_p(r_i) φ_p(r_j)]     (33)

was obtained. The calculated value of the correlation energy, W_SCF - W, was 0.24 a.u. (or 66% of the correlation energy).

This result is sufficient to demonstrate the power of the transcorrelated method. A compact wavefunction CΦ, with C given by equation (33), has been obtained, the form of which is easy to understand. A high percentage of the correlation energy was obtained. The difficulties with the method are its non-variational characteristic and, for this example, the fact that the number of integrals to be calculated rises steeply with both Q and K, where Q is the number of correlation functions G_p and K is the number of orbital basis functions φ_k.

It is sometimes said that the method suffers because expectation values cannot be calculated in the usual way. But a way around this difficulty can be found through the Hellmann-Feynman theorem. Let B be the operator whose expectation value is required. Obtain transcorrelated energies W_λ corresponding to the Hamiltonian H + λB for various discrete values of λ. The Hellmann-Feynman theorem states, for exact wavefunctions, that

<Ψ|B|Ψ> / <Ψ|Ψ> = dW_λ/dλ at λ = 0     (34)

A numerical estimate for dW_λ/dλ at λ = 0 can be obtained from the calculated set of values for W_λ.

In conclusion, it appears that the transcorrelated method represents one attempt to find correlated wavefunctions for small molecules. It is not a method which is ready for wide usage, because there is a need for further research in order to understand fully the convergence of the method and associated problems.

References

1. Born M. and Oppenheimer J.R. (1927) Ann. Phys. 84, 457
2. Handy N.C., in the volume on Theoretical Chemistry, in Physical Chemistry, Series 2, of the Medical and Technical Publishing Company's biennial reviews of science (to be published)
3. Boys S.F. (1969) Proc. Roy. Soc. (London) A309, 195
4. Hylleraas E.A. (1929) Z. Physik 54, 347
5. Pekeris C.L. (1958) Phys. Rev. 112, 1649
6. James H.M. and Coolidge A.S. (1933) J. Chem. Phys. 1, 825
7. Kolos W. and Wolniewicz L. (1968) Phys. Rev. Lett. 20, 243
8. Herzberg G. (1970) Phys. Rev. Lett., 1081
9. James H.M. and Coolidge A.S. (1936) Phys. Rev. 49, 688
10. Larsson S. (1968) Phys. Rev. 169, 49
11. Szasz L. and Byrne J. (1967) Phys. Rev. 158, 34
12. Sims J.S. and Hagstrom S. (1971) Phys. Rev. A4, 908
13. Bunge C.F. (1968) Phys. Rev. 168, 92
14. Boys S.F. and Handy N.C. (1969) Proc. Roy. Soc. (London) A309, 209; (1969) ibid A310, 43. Handy N.C. (1972) Mol. Phys. 23, 1. Handy N.C. (1969) J. Chem. Phys. 51, 3205

PAIR FUNCTIONS AND DIAGRAMMATIC PERTURBATION THEORY

M. A. Robb
Department of Chemistry, Queen Elizabeth College,
Campden Hill Road, Kensington, London W8 7AR

INTRODUCTION
The purpose of this review is to examine the quantum mechanical basis

of some of the methods used for the computation of correlation energies


using pair functions. We shall formulate this discussion using the "language"
of Many-Body Perturbation Theory (MBPT) with the aid of Feynman diagrams.
The reason for such an approach is that, while some of the methods which we
shall discuss correspond to variational wavefunctions, the majority must be
viewed as non-variational. Hence, perturbation methods provide a common
approach to the discussion of both types of theory.
This review is aimed at the non-specialist. We shall therefore use
diagrammatic methods only as a language. We do not intend to become
enmeshed in the details of MBPT itself. Thus, we shall avoid any discussion
which requires the use of second quantization, or time-dependent perturbation methods (see references 1, 2 and 3). Rather, our discussion will be formulated using strictly algebraic methods such as those employed by Brandow (4) or

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 435-503.
All Rights Reserved. Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


Lowdin (5). We shall see, however, that the language of MBPT is very powerful, in that it not only exposes the nature of the types of approximations
inherent in many of the more traditional quantum mechanical theories, but
also suggests some new and useful approaches.
The scope of this review must be rather limited. We shall be concerned
primarily with the calculation of ground state correlation energies. However,
it must be pointed out that the techniques we shall describe become even
more powerful when applied to the calculation of energies for electronic
excited states or second order one-electron properties. A general review of
many-body theories of electronic structure has recently been given by
Freed (6). Also, we must omit any detailed discussion of those methods which
have their origin in the valence bond method. Most of these methods may be
related to the pair function methods which we shall discuss if full orbital
optimization is allowed.

II  CONFIGURATION INTERACTION (CI) AND THE DEFINITION OF PAIR CORRELATION FUNCTIONS
We shall begin our discussion with a brief review of the manner in

which all pair function theories are related to the general CI expansion of
the wavefunction. We assume that a good zeroth order wavefunction for our
quantum state of interest takes the form of a single antisymmetrized product, |0), of orthonormal spin orbitals, {φ_α}, α = 1, ..., n. Further, we assume that we have available the orthogonal complement to the set {φ_α}: a second set {φ_k}, k = n+1, ..., n+m, such that the two sets together form a complete set as m → ∞. We shall refer to the first set, φ_α, φ_β, φ_γ, ..., as occupied orbitals or "hole" states, and to the set φ_k, φ_l, φ_m, ..., as virtual orbitals or "particle" states.

437

PAIR FUNCTIONS AND DIAGRAMMATIC PERTURBATION THEORY

The CI expansion of the best wave function for the considered state
can then be written (7, 8)

[1]  Ψ = |0⟩ + Σ_{α;k} C_α^k |Φ_α^k⟩ + Σ_{α>β; k>l} C_{αβ}^{kl} |Φ_{αβ}^{kl}⟩
        + Σ_{α>β>γ; k>l>m} C_{αβγ}^{klm} |Φ_{αβγ}^{klm}⟩
        + Σ_{α>β>γ>δ; k>l>m>n} C_{αβγδ}^{klmn} |Φ_{αβγδ}^{klmn}⟩ + …

We use the symbol |Φ_{αβ…}^{kl…}⟩ to denote the Slater determinant formed by
replacing spin orbitals α, β, … in |0⟩ by virtual orbitals k, l, … . The
C_{αβ…}^{kl…} are linear variation coefficients which are chosen to minimize
the energy. The individual terms in equation [1] are referred to as
"single substitutions", "double substitutions", etc., up to "n-fold substitutions".
Application of the variation method to equation [1] requires a knowledge of the
matrix elements of the Hamiltonian, H, over the expansion functions. It is
therefore convenient to introduce some notation at this point.


As suggested by Taylor (9) we shall denote matrix elements over various
orders of substituted configurations as

[2]  ⟨0|H|0⟩ = E₀ ,  {⟨0|H|Φ_α^k⟩} = H_{0,1} ,  {⟨0|H|Φ_{αβ}^{kl}⟩} = H_{0,2} , …… ,
     {⟨Φ_α^k|H|Φ_β^l⟩} = H_{1,1} ,  {⟨Φ_{αβ}^{kl}|H|Φ_{γδ}^{mn}⟩} = H_{2,2} , ……

and similarly for the coefficients, so that

[3]  C_1 = {C_α^k} ,  C_2 = {C_{αβ}^{kl}} , ……

Further, since H is a sum of one- and two-electron operators, then

[4]  H_{0,3} = 0 ,  H_{0,4} = 0 , …… ,  H_{0,n} = 0 ,


and if our orbitals are Hartree-Fock orbitals, then

[5]  H_{0,1} = 0

and

[6]  C_1 = 0

holds approximately. (For a detailed discussion of the conditions under which
equations [5] and [6] are valid see reference 10). If we now define

[7]  { H_{0,3} , H_{0,4} , …… , H_{0,n} } ,  { H_{2,3} , H_{2,4} , …… , H_{2,n} } ,
     { H_{3,3} , H_{4,3} , …… , H_{n,n} }

and

[8]  { C_3 , C_4 , …… , C_n }

then the application of the variation method to equation [1] gives a scalar
and two matrix equations (9):

[9]   H_{0,2} C_2 = E − E₀

[10]  H_{2,0} + (H_{2,2} − E) C_2 + H_{2,n} C_n = 0

[11]  H_{n,2} C_2 + (H_{n,n} − E) C_n = 0

We may formally solve equation [11] for C_n so that

[12]  C_n = −(H_{n,n} − E)⁻¹ H_{n,2} C_2

When this result is substituted in [10] we have

[13]  H_{2,0} + (H̃_{2,2} − E) C_2 = 0

where

[14]  H̃_{2,2} = H_{2,2} − H_{2,n} (H_{n,n} − E)⁻¹ H_{n,2}

Thus from equations [9] and [13] we have the important result that in
order to determine the energy one needs only the coefficients of the double
substitutions, C_2. However, these coefficients depend upon the higher order
substitutions through the definition of H̃_{2,2}. Through equations [9] and [13]
one may then compute the energy directly from pair functions as we shall now
discuss.
If we write out equation [9] in full, we have

[15]  E = E₀ + Σ_{α>β; k>l} C_{αβ}^{kl} H_{0,αβ}^{kl}

We may then define Δe_{αβ} as

[16]  Δe_{αβ} = Σ_{k>l} C_{αβ}^{kl} H_{0,αβ}^{kl}

so that Δe_{αβ} is the contribution of spin orbital pair {φ_α φ_β} to the
correlation energy. Further, if we define an antisymmetrized pair function |rs⟩
by

[17]  |rs⟩ = (1/√2) [φ_r(1) φ_s(2) − φ_s(1) φ_r(2)]

then we may define pair correlation functions |U_{αβ}⟩ as

[18]  |U_{αβ}⟩ = Σ_{k>l} C_{αβ}^{kl} |kl⟩

Equation [16] now reduces to

[19]  Δe_{αβ} = ⟨U_{αβ}| r₁₂⁻¹ |αβ⟩ .

Thus a knowledge of the U_{αβ} gives us the correlation energy.

With this definition of the |U_{αβ}⟩ at hand, the equation system [13]
now reduces to a set of coupled, non-linear, inhomogeneous equations which
will have the general form

[20]  O₁₂{G_{αβ}(1,2)} O₁₂ |U_{αβ}⟩ + Σ_{γδ≠αβ} O₁₂{L_{αβ}^{γδ}(1,2)} O₁₂ |U_{γδ}⟩
      + O₁₂ r₁₂⁻¹ |αβ⟩ = 0

where the operators G_{αβ} and L_{αβ}^{γδ} are defined in equation [21].


One could imagine solving the equation system [20] or [13] by any of the
standard methods. However, this would only be feasible for the smallest of
systems. Thus one is forced to seek approximations. The form of equations
[12] and [13] suggests the expansions of perturbation theory (5) and it is to
this method that we shall devote the remainder of our discussion. First,
however, one must comment briefly on the calculation of the energy with
approximate U_{αβ} using equation [19].

Let us suppose we have a set of U_{αβ} which are approximations to the true
U_{αβ}. The sum of the Δe_{αβ} calculated from equation [19] will not give an
upper bound to the energy. On improving one's U_{αβ} one does not approach the
true Δe_{αβ} from above. The upper bound to the energy must be computed from

[22]  E = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩

where Ψ is the complete expansion in equation [1]. Thus one needs all the
expansion coefficients C_{αβγ…}^{klm…} and not just the C_{αβ}^{kl}. Of course,
one could define a trial function which was truncated at double substitutions,

[23]  Ψ_DS = |0⟩ + Σ_{α>β; k>l} C_{αβ}^{kl} |Φ_{αβ}^{kl}⟩

and calculate an upper bound from equation [22]. However, as one approaches
the exact U_{αβ}, equation [19] becomes exact, whereas the use of equation [23]
in [22] is not. Thus with approximate U_{αβ} the Δe_{αβ} computed from
equation [19] may be a much better approximation to the true variational
energy than the upper bound computed with the trial function defined by
equation [23]. Sinanoglu (11, 12, 13, 14) has discussed this point in great
detail, and further discussion has been given by Nesbet (7, 8) and Szasz (15, 16).
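The non-variational character of the pair energies can be seen on a toy model. The sketch below (a hypothetical 3×3 Hamiltonian; the names E_pair and E_var are illustrative, not from the text) deliberately overestimates the doubles coefficients: the Rayleigh quotient [22] with the truncated trial function [23] never falls below the exact energy, while the linear pair-energy estimate based on [9] and [15] does.

```python
import numpy as np

# Hypothetical 3-state model: reference plus two "double substitutions".
H = np.array([[0.0, 0.2, 0.1],
              [0.2, 4.0, 0.3],
              [0.1, 0.3, 6.0]])
evals, vecs = np.linalg.eigh(H)
exact = evals[0]

c_exact = vecs[1:, 0] / vecs[0, 0]       # exact intermediate-normalized C2
c_approx = 1.2 * c_exact                 # deliberately imperfect coefficients

E_pair = H[0, 0] + H[0, 1:] @ c_approx   # linear estimate, cf. [9] and [15]

psi = np.concatenate(([1.0], c_approx))  # truncated trial function [23]
E_var = psi @ H @ psi / (psi @ psi)      # Rayleigh quotient [22]

assert E_var >= exact - 1e-12            # variational bound always holds
assert E_pair < exact                    # linear estimate overshoots downward
```

Because E_pair is linear in the coefficients, any overestimate of C₂ pushes it straight past the exact energy; the quadratic Rayleigh quotient cannot do this.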
However, the reason for this discrepancy is apparent from very simple physical
arguments. An example (10) of two isolated two-electron systems serves to
illustrate the essential ideas. Suppose our two isolated systems A and B have
wavefunctions
[24]  ψ_A(1,2) = Â₂ [φ₁(1) φ₂(2)] + U₁₂(1,2)

[25]  ψ_B(3,4) = Â₂ [φ₃(3) φ₄(4)] + U₃₄(3,4)

In the limit where there is no interaction between the two sub-systems, the
exact wavefunction for the combined system is just the antisymmetrized
product of ψ_A and ψ_B:

[26]  ψ_AB(1,2,3,4) = Â₄ { φ₁(1) φ₂(2) φ₃(3) φ₄(4) + φ₁(1) φ₂(2) U₃₄(3,4)
                           + φ₃(3) φ₄(4) U₁₂(1,2) + U₁₂(1,2) U₃₄(3,4) }

Because of the appearance of the term involving the product of two pair
correlation functions, the trial function Ψ_DS given by equation [23] is
clearly not appropriate.

The correct form of trial wavefunction for use in pair function theories
has been discussed by Cizek (17) and also by Primas (10). It takes the form
of a "linked pair expansion"

[27]  Ψ = exp(T̂₂) |0⟩

where the operator T̂₂ has the form

[28]  T̂₂ = Σ_{α>β} T̂_{αβ}

and

[29]  T̂_{αβ} |0⟩ = Σ_{k>l} C_{αβ}^{kl} |Φ_{αβ}^{kl}⟩

Equation [26] could then be written

[30]  ψ_AB(1,2,3,4) = exp(T̂₁₂ + T̂₃₄) Â₄ φ₁(1) φ₂(2) φ₃(3) φ₄(4)

However, we shall not pursue this point since such considerations arise
naturally in the formalism of MBPT.

III  CALCULATION OF PAIR CORRELATION FUNCTIONS USING
     THE METHODS OF MANY-BODY PERTURBATION THEORY

III - 1  Introduction to Diagrammatic Perturbation Methods


We shall begin our discussion by giving a brief synopsis of what we hope
to accomplish. Suppose we formally solve equation [13] for the coefficients
C_2, so that

[31]  C_2 = −(H̃_{2,2} − E)⁻¹ H_{2,0}

Inserting this result into equation [9] we have

[32]  E = E₀ − H_{0,2} (H̃_{2,2} − E)⁻¹ H_{2,0}

Now, using the identity


[33]  (A − B)⁻¹ = A⁻¹ + A⁻¹ B A⁻¹ + A⁻¹ B A⁻¹ B A⁻¹ + ……

and choosing suitable definitions of A and B, one may expand the inverse
matrices in equations [32] and [14] and so derive (5) most of the standard
"sum over states" perturbation methods. However, as soon as one introduces
explicit expressions for the matrix elements in terms of spin-orbitals,
considerable simplifications and cancellations occur. MBPT is merely a reduction
of these perturbation expansions to elementary bits, where each bit is a
matrix element with respect to one- and two-electron operators over the
original spin-orbital basis set. All cancellations are taken care of from the
outset. Each bit is then enumerated using diagram techniques. The power of
the method arises from the fact that one may then re-sum selected terms in
the resulting expansion to yield approximate equations for the U_{αβ}. In
transforming ordinary perturbation theory to MBPT one focuses one's
attention on the types of interaction one will include at the spin-orbital level
(i.e. those terms which are to be summed), rather than on the form of one's
variational wavefunction.
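Identity [33] is just the operator geometric series, and it can be verified numerically. The sketch below (hypothetical 3×3 matrices, chosen so that the series converges) compares the partial sums against a direct inverse.

```python
import numpy as np

# Check the operator expansion [33]: (A - B)^-1 = A^-1 + A^-1 B A^-1 + ...
A = np.diag([2.0, 3.0, 4.0])                              # dominant part
B = 0.3 * np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.5]])                     # small perturbation
Ainv = np.linalg.inv(A)

term, series = Ainv.copy(), Ainv.copy()
for _ in range(60):
    term = Ainv @ B @ term        # next term: A^-1 B (previous term)
    series += term

exact = np.linalg.inv(A - B)
assert np.allclose(series, exact, atol=1e-12)
```

The series converges because the spectral radius of A⁻¹B is well below one here; when it is not, the partial summations discussed later in this chapter are precisely the device for restoring convergence.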


Let us proceed, then, from the text-book formulation of
Rayleigh-Schrodinger Perturbation Theory (RSPT). We let the Hamiltonian
be decomposed into an unperturbed part H₀ and a perturbed part V, so that

[34]  H = H₀ + λV

where λ is a formal parameter which is to be set equal to 1. Given a set
of zeroth order states

[35]  Φ₀ , Φ₁ , Φ₂ , ……

with energies

[36]  E₀⁰ , E₁⁰ , ……

we can define a propagator

[37]  R̂ = Σ_{μ≠0} |Φ_μ⟩⟨Φ_μ| / (E₀⁰ − E_μ⁰)

and we can then write the RSPT series correct to λ⁴ as

[38]  E₀ = E₀⁰ + λ ⟨0|V|0⟩ + λ² ⟨0|V R̂ V|0⟩
         + λ³ { ⟨0|V R̂ V R̂ V|0⟩ − ⟨0|V|0⟩ ⟨0|V R̂² V|0⟩ }
         + λ⁴ { ⟨0|V R̂ V R̂ V R̂ V|0⟩ − ⟨0|V|0⟩ ⟨0|V R̂ V R̂² V|0⟩
                − ⟨0|V|0⟩ ⟨0|V R̂² V R̂ V|0⟩ + ⟨0|V|0⟩² ⟨0|V R̂³ V|0⟩
                − ⟨0|V R̂ V|0⟩ ⟨0|V R̂² V|0⟩ }
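The series [38], with the propagator [37], can be checked order by order on a small matrix. The following sketch (a hypothetical 4×4 matrix H = H₀ + V, not a molecular example) evaluates the λ¹–λ⁴ terms of [38] and compares the truncated sum with the exact lowest eigenvalue.

```python
import numpy as np

# Hypothetical unperturbed energies and a small symmetric perturbation.
H0 = np.diag([0.0, 1.0, 1.5, 2.3])
V = 0.05 * np.array([[0.5, 1.0, 1.0, 1.0],
                     [1.0, 0.2, 1.0, 1.0],
                     [1.0, 1.0, 0.1, 1.0],
                     [1.0, 1.0, 1.0, 0.3]])

# Propagator [37] in the H0 eigenbasis: R = sum_{mu!=0} |mu><mu|/(E0 - E_mu)
d = H0[0, 0] - np.diag(H0)
R = np.zeros((4, 4))
R[1:, 1:] = np.diag(1.0 / d[1:])

e0 = np.zeros(4)
e0[0] = 1.0

E1 = e0 @ V @ e0
E2 = e0 @ V @ R @ V @ e0
E3 = e0 @ V @ R @ V @ R @ V @ e0 - E1 * (e0 @ V @ R @ R @ V @ e0)
E4 = (e0 @ V @ R @ V @ R @ V @ R @ V @ e0
      - E1 * (e0 @ V @ R @ V @ R @ R @ V @ e0)
      - E1 * (e0 @ V @ R @ R @ V @ R @ V @ e0)
      + E1 ** 2 * (e0 @ V @ R @ R @ R @ V @ e0)
      - E2 * (e0 @ V @ R @ R @ V @ e0))

E_pt = H0[0, 0] + E1 + E2 + E3 + E4
exact = np.linalg.eigvalsh(H0 + V)[0]
assert abs(E_pt - exact) < 1e-4          # agreement through fourth order
```

Note how the λ³ and λ⁴ "renormalization" products of [38] appear verbatim as the subtracted terms; the cancellations discussed below happen inside these expressions once explicit spin-orbital matrix elements are inserted.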

The states Φ_μ may be chosen identical to those in the expansion [1].
It will be convenient to define H₀ in terms of the Hartree-Fock (HF)
operator although this does not imply that we need use HF orbitals. We shall
make the same choice as Amos and Musher (18) and define

[39]  H₀ = Σᵢ h(i)

so that

[40]  h(1) = t(1) + V(1) + V^HF(1) − h̃(1)

The operators t and V are the usual kinetic and nuclear-electron potential
operators, and V^HF is the HF average potential defined as

[41]  ⟨r|V^HF|s⟩ = Σ_γ ⟨rγ|sγ⟩

and h̃(1) is the off-diagonal remainder

[42]  ⟨r|h̃|s⟩ = (1 − δ_rs) ⟨r|t + V + V^HF|s⟩

where

[43]  ε_r = ⟨r|t + V + V^HF|r⟩ .

Thus, provided our orbitals are HF orbitals or some unitary transformation of
the HF orbitals, we will always have

[44]  ⟨r|h|s⟩ = ε_r δ_rs

and hence

[45]  E₀⁰ = Σ_γ ε_γ .

Further, since

[46]  V = H − H₀ = Σ_{i>j} r_ij⁻¹ − Σᵢ V^HF(i) + Σᵢ h̃(i)

we will still have the usual first order energy

[47]  E₀⁽¹⁾ = Σ_{γ>δ} ⟨γδ|γδ⟩ − Σ_γ ⟨γ|V^HF|γ⟩ = −½ Σ_γ ⟨γ|V^HF|γ⟩

where we have used the notation

[48]  ⟨rs|tu⟩ = ⟨rs| r₁₂⁻¹ (1 − P₁₂) |tu⟩ .

The term involving h̃(1) does not enter until third order.

If we now introduce the explicit form of the expansion functions, Φ_μ,
from equation [1] into equation [38] and evaluate the resulting matrix
elements, we observe certain cancellations between the first term in each
order and the remaining "renormalization" terms at the same order. This is
easily illustrated at third order. Let us consider a third order term involving a
diagonal intermediate interaction V_μμ. Combining both terms we have
[49]  E⁽³⁾_μμ = |V_{0μ}|² (V_{μμ} − V_{00}) / (E₀⁰ − E_μ⁰)²

where V_{00} comes from the renormalization term in λ³. If we set μ = |Φ_{αβ}^{kl}⟩
we have

[50]  V_{μμ} = [ Σ_{γ>δ; γ,δ≠α,β} ⟨γδ|γδ⟩ − Σ_{γ≠α,β} ⟨γ|V^HF|γ⟩ ]
            + [ Σ_{γ≠α,β} (⟨γk|γk⟩ + ⟨γl|γl⟩) − ⟨k|V^HF|k⟩ − ⟨l|V^HF|l⟩ ]
            + ⟨kl|kl⟩

and

[51]  V_{00} = [ Σ_{γ>δ; γ,δ≠α,β} ⟨γδ|γδ⟩ − Σ_{γ≠α,β} ⟨γ|V^HF|γ⟩ ]
            + [ Σ_{γ≠α,β} (⟨αγ|αγ⟩ + ⟨βγ|βγ⟩) − ⟨α|V^HF|α⟩ − ⟨β|V^HF|β⟩ ]
            + ⟨αβ|αβ⟩

It is easily seen that the first term in V_{00} cancels the first term in V_{μμ}.
Further, there is cancellation within the second terms, leaving only
contributions from V^HF. Thus we have

[52]  V_{μμ} − V_{00} = ⟨kl|kl⟩ − ⟨αk|αk⟩ − ⟨αl|αl⟩ − ⟨βk|βk⟩ − ⟨βl|βl⟩ + ⟨αβ|αβ⟩


This same sort of cancellation occurs to all orders of RSPT. For example,
one could verify that there is a partial cancellation between the first term
in )., 4

and the renormalization terms which involve the first order energy

(olvlo).

However, the cancellation between, say, the fifth term in).,4

and the first term in )., 4

is much more difficult to verify beCl,lUse of the

appearance of the energy denominators. It is at this point that diagrammatic

446

M.A. ROBB

methods begin to become invaluable, since the cancellations are immediately


apparent from the topological structure of the diagrams. Thus, we must now
digress to discuss the general features of the diagrammatic method.
We begin with a simple example. Consider the first term in λ³ of equation
[38]. If we take μ = |Φ_{αβ}^{kl}⟩ and ν = |Φ_{γδ}^{mn}⟩ and evaluate

[53]  V_{0μ} V_{μν} V_{ν0} / ((E₀⁰ − E_μ⁰)(E₀⁰ − E_ν⁰))

we obtain the algebraic expression for one of the resulting terms which is
shown on the right of Figure 1. The corresponding Feynman-like diagram is
shown to the left. All we now require is a set of rules for going from left
to right. Before giving the general rules, let us note the two essential
topological features of the diagram in Figure 1; namely, the oriented lines
which are labelled by hole or particle states, and the horizontal "interaction"
lines which intersect the oriented lines at nodes. The dictionary given in
Figure 2 then gives the rules for translating the diagrams into matrix element
expressions. The downward lines are hole or occupied orbital lines and the
upward lines are particle or virtual orbital lines. To keep relative signs
correct, we add a factor of −1 for each hole-line segment and for each
closed fermion loop. The numerator of each diagram carries a factor for each
interaction line. The various possibilities are summarized in lines 3 to 6
of Figure 2. Returning to our example, Figure 1, the three two-electron
matrix elements that occur in the numerator are easily evaluated if one
remembers the rule:

     left in, right in, operator, left out, right out.

Figure 1.  [Feynman-like diagram and its corresponding algebraic
           expression; graphic not reproduced]

DIAGRAM ELEMENT                      MEANING

downward line segment                hole-line segment (occupied orbital);
                                     carries a factor −1
upward line segment                  particle-line segment (virtual orbital)
dashed interaction line              r₁₂⁻¹
wavy interaction line                antisymmetrized interaction (1 − P₁₂) r₁₂⁻¹
crossed one-electron vertex          −⟨k| Σ_γ (J_γ − K_γ) |l⟩
closed fermion loop                  factor −1
each pair of interaction lines       energy denominator Σ ε (holes) − Σ ε (particles)

Figure 2.  Rules for Evaluating Feynman-like Diagrams


Also, we have 3 hole-line segments and two closed loops, giving an overall
sign factor of −1. For the denominators, each pair of interaction lines
contributes a factor equal to the sum of hole-state energies minus particle-state
energies. This rule is illustrated in Figure 3 and the application to
Figure 1 follows directly.

At this point it is convenient to introduce a simplification. For each
diagram with N two-electron interaction lines there will be 2^N − 1 corresponding
diagrams resulting from exchange. We shall include all 2^N diagrams in one
diagram by using antisymmetrized interaction lines. (It is easily verified that
the sign rule for the complete set of diagrams still holds). For example, we
can obtain all the terms resulting from equation [53] by replacing the dashes
by the antisymmetrized interactions (wavy lines).
We now have all the formalism necessary to write the expansion [38] in
diagrammatic form. We begin with some general considerations relating to
the lowest order terms. Since only the double substitutions contribute to the
energy at second order we have
[54]  E₀⁽²⁾ = Σ_{α>β; k>l} ⟨0|V|Φ_{αβ}^{kl}⟩² / (E₀⁰ − E⁰_{αβ,kl})
           = Σ_{α>β; k>l} ⟨αβ|kl⟩ ⟨kl|αβ⟩ / (ε_α + ε_β − ε_k − ε_l)
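Equation [54] can be exercised with toy data. The sketch below (random antisymmetrized integrals and hypothetical orbital energies, purely for illustration) accumulates the second order energy pair by pair, α > β and k > l, and checks it against the equivalent unrestricted sum carrying a factor ¼.

```python
import numpy as np

rng = np.random.default_rng(1)
nh, npart = 4, 4
eps_h = -1.0 - rng.random(nh)            # hole orbital energies
eps_p = 0.5 + rng.random(npart)          # particle orbital energies

# Toy antisymmetrized integrals <ab|kl> = -<ba|kl> = -<ab|lk>
g = rng.normal(scale=0.1, size=(nh, nh, npart, npart))
g = g - g.transpose(1, 0, 2, 3)
g = g - g.transpose(0, 1, 3, 2)

# Restricted sum of [54], accumulated spin-orbital pair by pair
pair_e = np.zeros((nh, nh))
for a in range(nh):
    for b in range(a):
        for k in range(npart):
            for l in range(k):
                denom = eps_h[a] + eps_h[b] - eps_p[k] - eps_p[l]
                pair_e[a, b] += g[a, b, k, l] ** 2 / denom
E2 = pair_e.sum()

# Equivalent unrestricted form: (1/4) sum over all a, b, k, l
denom = (eps_h[:, None, None, None] + eps_h[None, :, None, None]
         - eps_p[None, None, :, None] - eps_p[None, None, None, :])
E2_unres = 0.25 * np.sum(g ** 2 / denom)
assert abs(E2 - E2_unres) < 1e-12
```

Each entry of pair_e is one Δe_{αβ}⁽²⁾; their sum is the full second order correlation energy, negative because every denominator is negative.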

The last summation is represented by the diagram, Figure 4a. We may now
generate a large class of related diagrams by inserting intermediate interaction
lines, in all possible ways, into this second order Diagram 4a. All the diagrams
in Figure 4 are derived in this manner, subject to the constraint that no

Figure 3.  [Illustration of the energy-denominator rule;
           graphic not reproduced]

Figure 4.  Diagonal Hole-Line Diagrams Contributing to the
           Correlation Energy of Spin-Orbital Pairs
           [graphics not reproduced]

intermediate interaction on a hole-line may change the hole-line label. Because
of this constraint, we shall call them diagonal hole-line diagrams. This class
of diagram will play a vital role in the pair function theories we shall discuss
subsequently. However, first one must comment on the way in which the
cancellations, which occur as a result of the lowest order renormalization
effects, are accounted for. Diagram 4b with m = k and n = l, Diagram 4d
with m = k, and Diagram 4e merely sum to give the intermediate interaction
of equation [52]; however, Diagrams 4c and 4d (m ≠ k) result from a
different type of cancellation that deserves some comment. The general
situation is illustrated in Figure 5. At third order, one encounters diagrams
of the form 5a and 5b. In the summation in Diagram 5a, it is perfectly
consistent to include or ignore Diagram 5e. However, since 5e is exactly
cancelled by 5f, one must either include both diagrams or neither of them.
If we include 5e, then 5a and 5b cancel exactly, with the result that only
Diagrams 5c and 5d occur in one's expansion. Diagrams of the type 5e have
been called "exclusion principle violating" (EPV) diagrams, since one has two
holes with the same label excited at the same time. Unfortunately, the same
acronym has been applied to diagrams of the type 5b and this has been the
source of some confusion. The important point is that if we define our
H₀ in terms of V^HF, the "bubble" diagrams of the type 5a are cancelled
but terms of the type 5c must be included.

Finally, we should mention that one may write the diagrammatic
expansion for the wavefunction itself in the same manner as we have done
for the energy. For example, the coefficient C(1)_{αβ}^{kl} of |Φ_{αβ}^{kl}⟩
correct to the first order is shown in Figure 6. One now has a new diagram

Figure 5.  Cancellation of Diagrams Involving −V^HF and the Introduction
           of EPV Diagrams
           [graphics not reproduced]

Figure 6.  [First order contribution to the coefficient of a double
           substitution; graphic not reproduced]

element; namely, "free" particle and hole lines. The free lines label the
states in |Φ_{αβ}^{kl}⟩ and the two pairs of free lines correspond to a double
substitution.

III - 2  The Many-Body Linked Cluster Expansion

As we discussed in the previous subsection, it is at fourth order in


RSPT that the diagram techniques become a very powerful tool. The reason
for this is a theorem first discussed algebraically by Brueckner (19, 20) and
later proved by Goldstone (21). A statement of this theorem sufficient for
our purposes is:
Any part of a diagram which has no free lines and which is completely
disconnected from the rest of the diagram is called an unlinked part.
All diagrams having unlinked parts will be cancelled by terms arising
from the renormalization corrections. Thus only diagrams which are
fully linked need be included.
Let us illustrate the theorem using an example first discussed by
Brueckner (20). We shall consider the partial cancellation between the first
term in λ⁴ in equation [38] and the fifth, which is a renormalization
term involving the second order energy. (We have already dealt with the
renormalization corrections involving the first order energy.) At the top of
Figure 7 we give the first term in λ⁴ in the usual "sum over states"
notation. We now consider the case where state I is a double substitution
|Φ_{αβ}^{kl}⟩ and examine the resulting expressions when J = |Φ_{αβγδ}^{klmn}⟩,
with K = |Φ_{αβ}^{kl}⟩ or K = |Φ_{γδ}^{mn}⟩.
The resulting diagrams for these two possibilities are given on the left of
Figure 7. The arrows indicate the evaluation of the last two factors in the

Figure 7.  Factorizations of Fourth Order Unlinked Diagrams
           [graphics not reproduced; the bottom line gives the identity
           1/(x(x+y)) + 1/(y(x+y)) = 1/(xy)]

energy denominators. Two topological features of the diagrams are apparent:

a) both diagrams have unlinked parts
b) the upper and lower diagrams differ only in the relative order of
the last two interaction lines.

Because of a) both diagrams must be cancelled, and it is b) that makes the
cancellation possible. Using the algebraic identity given in the bottom line of
Figure 7, the sum of the two diagrams factorizes to give:

[55]  Diagram 7 = [⟨αβ|kl⟩² / (ε_α + ε_β − ε_k − ε_l)]
               × [⟨γδ|mn⟩² / (ε_γ + ε_δ − ε_m − ε_n)]
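The denominator identity quoted from the bottom line of Figure 7, which drives the factorization in [55], is easily checked on arbitrary (hypothetical) denominator values:

```python
# Identity behind the factorization: 1/(x(x+y)) + 1/(y(x+y)) = 1/(xy),
# where x and y are the energy denominators of the two unlinked parts.
def combined(x, y):
    # the two relative orderings of the last two interaction lines
    return 1.0 / (x * (x + y)) + 1.0 / (y * (x + y))

for x, y in [(1.3, 2.1), (-3.0, -1.7), (0.4, -2.2)]:
    assert abs(combined(x, y) - 1.0 / (x * y)) < 1e-12
```

Summing both orderings replaces the awkward combined denominator x + y by the product of the two independent pair denominators, which is what allows [55] to factor.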

Now let us examine the renormalization term in λ⁴ given in the top line
of Figure 8. With I = |Φ_{γδ}^{mn}⟩ and K = |Φ_{αβ}^{kl}⟩ this then reduces to the
algebraic expression on the right of the bottom line of Figure 8. This term
exactly cancels the sum of the two disconnected diagrams of Figure 7
(equation [55]).

Because of the factorization, we can write the sum of the two diagrams
in Figure 7 as the single diagram shown on the left side of Figure 8. The
diagram involving γ δ m n may be regarded as though it were inserted in the
diagram with labels α β k l. We use the braces to remind ourselves:

a) that we have two factors in the denominator of the left hand part
of the diagram

and:

b) that the energy denominator for the left hand part is evaluated as
appropriate for the left hand part alone and similarly for the right
hand part.

In fact, as we shall presently discuss, whenever we have a set of diagrams

Figure 8.  Cancellation of Fourth Order Renormalization Effects
           [graphics not reproduced]

where several relative orderings of the intermediate interactions are possible,


we have a similar factorization.
In summary, we should like to emphasize two features about this
cancellation which will be of importance to subsequent discussion. First,
there would be no cancellation of these renormalization effects without the
introduction of the quadruple substitutions. Secondly, the cancellation is not
complete. There are many terms in λ⁴, both from the first term and
from the renormalization part, that are not cancelled. However, these terms
all correspond to linked diagrams.
III - 3

Contribution of Quadruple Substitutions to Pair Correlation Energies

We have just observed that the introduction of quadruple substitutions
causes a partial cancellation of the renormalization effects of the second
order pair energies. In this subsection, we shall examine those contributions
of the quadruple substitutions that are not cancelled. In particular, we shall
demonstrate that the contribution of the quadruple substitutions at fourth order
is factorizable into products of contributions from the first order pair functions.
This result is generalizable to infinite order. More detailed discussion has been
given by Brenig (22), Sinanoglu (12, 13), Kelly (23, 24, 25), and more
recently by Brandow (4).

A second order contribution to the coefficients of a quadruple
substitution |Φ_{αβγδ}^{klmn}⟩ is given by the two diagrams of Figure 9. There are
also contributions from those diagrams where γ and β, and δ and β, are
interchanged. For each of these exchanges there are also terms arising for
kl ↔ mn, k ↔ m, etc. The algebraic expression for each diagram is given on the

Figure 9.  Factorization of a Second Order Contribution to the
           Coefficient of a Quadruple Substitution |Φ_{αβγδ}^{klmn}⟩
           [graphics not reproduced]

right. Note that the diagrams are not unlinked since both parts contain free
lines.

In the same manner as our previous example, when we consider both
relative orderings of the two interaction lines we obtain a factorization. The
sum of the upper and lower diagram gives a second order contribution to
the coefficient of |Φ_{αβγδ}^{klmn}⟩ which may be written as

[56]  C(2)_{αβγδ}^{klmn} = C(1)_{αβ}^{kl} C(1)_{γδ}^{mn}

and from Figure 6, allowing for exchange of γ and β, δ and β, etc., we have
the corresponding products

[57]  C(1)_{αγ}^{kl} C(1)_{βδ}^{mn} ,  C(1)_{αδ}^{kl} C(1)_{βγ}^{mn} ,  etc.
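The same denominator identity is what makes [56] work: summing the two relative orderings of the interaction lines in Figure 9 turns the second order quadruple coefficient into a product of first order pair coefficients. A minimal numerical sketch (toy integrals and denominators, not from the text):

```python
# Two orderings of the interactions in Figure 9 sum to a product of
# first order pair coefficients, eq. [56].  All values are hypothetical.
v_ab, v_gd = 0.07, -0.04      # <ab|kl> and <gd|mn> toy integrals
x, y = -2.4, -3.1             # pair denominators e_a+e_b-e_k-e_l, etc.

# ordering 1: (ab->kl) first, then (gd->mn); ordering 2: the reverse
c2 = v_ab * v_gd / (x * (x + y)) + v_gd * v_ab / (y * (x + y))

c1_ab, c1_gd = v_ab / x, v_gd / y       # first order pair coefficients
assert abs(c2 - c1_ab * c1_gd) < 1e-15
```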

We may now add any number of intermediate interaction lines (with all
possible relative orderings) in either part of each diagram without altering
this result, so that one may immediately generalize equation [57] to infinite
order. However, if we add an intermediate interaction line that joins both
disjoint parts of each diagram in Figure 9, this sort of factorization is not
possible. Fortunately, the factorization becomes possible when we come to
evaluate the contribution to the correlation energy.

The fourth order contribution to the correlation energy from Diagram 9
is obtained by taking the four pairs of free lines, in the upper and lower
diagrams, and joining them together, two pairs at a time, by two final
interaction lines. In fact, one obtains only five different types of diagram. We
have already dealt with one type, Figure 7, which was unlinked. The

remaining four are not cancelled.

One example is shown in Figure 10. The upper and lower diagrams
correspond to final states |Φ_{αγ}^{kl}⟩ and |Φ_{δβ}^{mn}⟩ respectively. The calculation
of the energy denominators is indicated in the same manner as in Figure 7.


We again have a factorization, so that:

[58]  Diagram 10a + Diagram 10b
        = (⟨αβ|kl⟩ / (ε_α + ε_β − ε_k − ε_l))
        × [⟨γδ|mn⟩ ⟨βδ|mn⟩ / (ε_β + ε_δ − ε_m − ε_n)]
        × (⟨αγ|kl⟩ / (ε_α + ε_γ − ε_k − ε_l))

Now, if we include the analogous diagram with γ and δ interchanged and add
a factor of ½ to allow the γδ index to be independent of the αβ
index, then we may treat the right hand side of the diagram as though it
were inserted in the left as shown in Figure 11. The inserted part has
energy denominators independent of the part into which it is inserted, so that

[59]  G₁ = ½ Σ_{δmn} ⟨γδ|mn⟩ ⟨βδ|mn⟩ / (ε_β + ε_δ − ε_m − ε_n)

and the factor G₁ above takes the place of the expression in square
brackets in equation [58] above. The insertion above can also be written in
terms of the coefficients of first order pairs so that

[60]  G₁ = ½ Σ_{δmn} ⟨γδ|mn⟩ C(1)_{βδ}^{mn}

and similarly for the first and last factors in [58]. In summary we have
thus established two important results:
a) Diagram 10 factorizes so that one part may be thought of as

Figure 10.  Hole-Line Rearrangement Diagram
            [graphics not reproduced]

Figure 11.  Factorization of the Hole-Line Rearrangement Diagram
            [graphics not reproduced]

inserted into the other,

and:

b) the fourth order contribution of the quadruple substitutions from
Diagram 10 is expressible in terms of first order pairs.

Finally, we should note that the diagram in the bottom line of Figure 11 is
formally similar to a third order diagram involving an intermediate interaction
with a one-electron operator.

Before we consider other diagrams which may be derived from
Figure 9, we must comment briefly on the case where β = γ in Figure 10.
This situation would result from a quadruply excited state |Φ_{αββδ}^{klmn}⟩ which
would violate the exclusion principle. Remarkably, such a diagram does arise
from uncancelled terms in the renormalization expression

     ( Σ_I [⟨0|V|I⟩]² / (E₀⁰ − E_I⁰)² ) × ( Σ_K [⟨0|V|K⟩]² / (E₀⁰ − E_K⁰) )

where I = |Φ_{αβ}^{kl}⟩ and K = |Φ_{αβ}^{kl}⟩. Such diagrams are also called EPV diagrams;
however, their inclusion in no way implies a neglect of proper antisymmetry
constraints on the wavefunction. The important point is that these EPV
diagrams must be included and, as we shall see, they play an important role
in all pair function theories. The factorization of diagrams of the type shown
in Figure 10 was first demonstrated by Brueckner and Goldman (26) who
called these diagrams "rearrangement" diagrams.

There are three other types of diagram that may be derived from
Diagram 9. These are shown in Figures 12, 13, and 14. When these diagrams
are treated in the same manner as Figure 10 they have values summarized in

Figure 12.  Particle-Hole Rearrangement Diagram
            [graphics not reproduced]

Figure 13.  Particle-Line Rearrangement Diagram
            [graphics not reproduced]

Figure 14.  Particle-Particle Rearrangement Diagrams
            [graphics not reproduced]

Table I. One may again identify the coefficients in the expansion of the
first order pairs as was done for Diagram 10. The results are again
generalizable to infinite order by adding intermediate interaction lines joining
k l, α β, m n, …… but not l m, α γ, …… .

Finally, we must now treat the case where the two double substitutions
of Figure 9 are coupled by some intermediate interaction. For example, one
might add an intermediate interaction of the form ⟨lm|l′m′⟩. The resulting
third order expression for the coefficient of the quadruple substitution
would not be factorizable in the same manner as Diagram 9.
However, the resulting fifth order contribution to the correlation energy is
factorizable. For example, if we include all relative orderings of the
interactions in Figure 15a we have:

[61]  Diagram 15a = Σ_{γδmn} (⟨αβ|kl⟩ / (ε_α + ε_β − ε_k − ε_l)) × ⟨lm|l′m′⟩
                  × [⟨γδ|mn⟩ / (ε_γ + ε_δ − ε_m − ε_n)]
                  × [⟨γδ|m′n⟩ / (ε_γ + ε_δ − ε_m′ − ε_n)]

We may again identify the expansion coefficients of the first order pairs and
subsequently generalize to infinite order. Some other examples are shown in
Figures 15b and 15c. One must also include diagrams where the same type of
interaction is introduced into Figures 10, 12, 13, and 14.

To summarize, the central result of this subsection is the factorization
of the contributions of the quadruple substitutions. Provided we can neglect

TABLE I.  Matrix Elements for Rearrangement Diagrams
          [entries for Diagrams 10, 12, 13, and 14; each is a product of
          first order pair factors of the form ⟨αβ|kl⟩/(ε_α + ε_β − ε_k − ε_l)
          with a bracketed insertion factor; table not reproduced]

Figure 15.  Higher Order Rearrangement Diagrams
            [graphics not reproduced]

the single and triple substitutions, then the linked pair expansion (equation
[27]) should thus be an excellent trial function.
III - 4  The Partial Summations of MBPT

We have now accomplished a transcription of RSPT into MBPT. In order


to obtain numerical results, one must now sum the resulting series. However,
the enumeration of all possible diagrams beyond fourth order is a formidable
task. In addition, straightforward summation, order by order, would be almost
futile because of slow convergence. Therefore, what is done instead is to sum
over certain types of diagram to infinite order, a procedure called partial or
selective summation. One first identifies various diagrams according to
topological classes, then identifies what one hopes are the dominant classes,
and sums only these diagrams. In adopting this type of approach one has no
upper bound to the energy and one must justify one's methods either by
comparison with variational calculations, or by examining selected higher
order terms. These infinite order perturbation theories are related to various
methods of quantum chemistry that are already familiar to the reader. The
simplest method for infinite order summation of diagrams is based upon the
summation of a geometric series
[62] 1/(1 − a) = 1 + a + a² + a³ + ....     |a| < 1

This technique is used to sum out a series of repeated intermediate interactions


such as those illustrated in Figure 16. The series illustrated in the top line of
Figure 16 factorizes into two terms, one of which is a geometric series as
shown on the second line. The result, as shown on the last line of Figure 16
is just a new energy denominator which is shifted by the value of the
repeated interaction. A specific example illustrates this point. Consider the

Figure 16. Summation of Diagonal-Hole Ladder Diagrams using a Geometric Series

series

Diagram 4a + Diagram 4e + Diagram 4e' + .....

from Figure 4. The intermediate interaction ⟨kl|kl⟩ may be summed to infinite
order to give a shifted energy denominator. In a similar manner,

Diagram 4a + Diagram 4b (m = k, l = n) + Diagram 4b' (m = k, l = n, o = k, p = l) + .....

contributes a denominator shift of −⟨kl||kl⟩, while the analogous series
containing Diagram 4d contributes a shift of +⟨αβ||αβ⟩. Thus we may sum
all the diagrams that result from the partial cancellation of the renormalization
effects which involve ⟨0|V|0⟩. This result is well known and corresponds
(27, 28) to starting RSPT with the denominators shifted by the diagonal
interactions (the Epstein-Nesbet partitioning).

This is exactly equivalent to the summation of EPV diagrams used by Kelly


(24, 25). One could imagine summing all the diagrams of Figure 4, using
"nested" geometric series to give Feenberg (29) type continued fractions;
however, this sort of approach has never been investigated numerically. Kelly
has used an approximate geometric series to sum the non-EPV diagrams of
Figure 4. This series is illustrated in Figure 17 and is correct to fourth order.
However, the most general method of diagram summation is the partial
differential equation (PDE) method which, in turn, leads one to the familiar
variation-perturbation (30) method.
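The denominator-shift result of Figure 16 can be checked numerically. The following sketch uses purely hypothetical numbers: D plays the role of the unperturbed energy denominator and v the repeated diagonal interaction; summing the geometric series of equation [62] order by order reproduces the shifted denominator 1/(D − v).

```python
# Hypothetical values: D is an energy denominator (e.g. e_a + e_b - e_k - e_l),
# v a repeated diagonal intermediate interaction, with |v/D| < 1 for convergence.
D = -1.35
v = 0.22

# order-by-order sum: (1/D) * sum_n (v/D)^n, truncated at high order
series = sum((v / D) ** n / D for n in range(200))

# closed form: the geometric series collapses into a shifted denominator
shifted = 1.0 / (D - v)

assert abs(series - shifted) < 1e-12
```

The same collapse is what turns an infinite class of ladder diagrams into a single shifted-denominator expression.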

Figure 17. Approximate Summation of Non-diagonal Particle-Particle Ladder Diagrams using a Geometric Series

The basis of the PDE method is well established and we shall be


content with summarizing the essential ideas using an example discussed in
some detail by Freed (13). Starting from the second order contribution to the
correlation energy of the pair αβ

[64] Δe(2)_αβ = Σ_{k>l} |⟨αβ|kl⟩|² / (ε_α + ε_β − ε_k − ε_l)

and defining

[65] Q_12 = Σ_{k>l} |kl⟩⟨kl|

[66] H_0(1,2) = h̃(1) + h̃(2)

where h̃(1) is defined in equation [40], we may write

[67] Δe(2)_αβ = ⟨αβ| (1/r_12) Q_12 {ε_α + ε_β − H_0(1,2)}⁻¹ Q_12 (1/r_12) |αβ⟩

We may then identify first order pair functions |u(1)_αβ⟩ so that

[68] Δe(2)_αβ = ⟨αβ| (1/r_12) |u(1)_αβ⟩

where the u(1)_αβ are determined from

[69] |u(1)_αβ⟩ = Q_12 {ε_α + ε_β − H_0(1,2)}⁻¹ Q_12 (1/r_12) |αβ⟩

The above equation is then inverted to give a PDE


[70] Q_12 {H_0(1,2) − ε_α − ε_β} Q_12 |u(1)_αβ⟩ + Q_12 (1/r_12) |αβ⟩ = 0

which may be solved variationally. The third order energy expression which
includes Diagrams 4b, 4c, and 4d is given in the same manner as
[71] Δe(3)_αβ = ⟨αβ| (1/r_12) Q_12 {ε_α + ε_β − H_0(1,2)}⁻¹ V_αβ(1,2) {ε_α + ε_β − H_0(1,2)}⁻¹ Q_12 (1/r_12) |αβ⟩

where

[72] V_αβ(1,2) = 1/r_12 − (J_α(1) − K_α(1)) − (J_β(1) − K_β(1)) − (J_α(2) − K_α(2)) − (J_β(2) − K_β(2)) + ⟨αβ| 1/r_12 |αβ⟩

In the above equation, 1/r_12 comes from Diagram 4b, J_α and K_α from 4c, and h̃(1)
from 4e. The summation to infinite order is then given as

[73] Δe_αβ = ⟨αβ| (1/r_12) Q_12 {ε_α + ε_β − H_0(1,2) − V_αβ(1,2)}⁻¹ Q_12 (1/r_12) |αβ⟩

Using the identity

[74] (A − B)⁻¹ = A⁻¹ + A⁻¹ Σ_{n=1}^∞ (B A⁻¹)ⁿ

with

[75] A = Q_12 {ε_α + ε_β − H_0(1,2)} Q_12 ,  B = Q_12 V_αβ(1,2) Q_12

we can write equation [73] as

[76] Δe_αβ = ⟨αβ| (1/r_12) Q_12 A⁻¹ Σ_{n=0}^∞ (B A⁻¹)ⁿ Q_12 (1/r_12) |αβ⟩

where each term in the sum introduces one further intermediate interaction.
Thus by analogy with equations [69] and [70] we have

[77] Q_12 {H_0(1,2) + V_αβ(1,2) − ε_α − ε_β} Q_12 |u_αβ⟩ + Q_12 (1/r_12) |αβ⟩ = 0

and thus

[78] Δe_αβ = ⟨αβ| (1/r_12) |u_αβ⟩
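The operator identity (A − B)⁻¹ = A⁻¹ + A⁻¹ Σ_{n≥1} (BA⁻¹)ⁿ underlying this summation is easy to verify numerically even for non-commuting operators. The sketch below uses two arbitrary 2×2 matrices chosen so that the expansion converges; A stands in for the zeroth order resolvent denominator and B for the summed interaction.

```python
# Pure-Python 2x2 matrix check of (A - B)^-1 = A^-1 + A^-1 * sum_n (B A^-1)^n.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[ X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det,  X[0][0] / det]]

A = [[4.0, 1.0], [0.5, 3.0]]   # hypothetical stand-in for Q12{e_a+e_b-H0}Q12
B = [[0.3, 0.1], [0.2, 0.4]]   # hypothetical "small" stand-in for Q12 V Q12

lhs = mat_inv(mat_sub(A, B))   # direct evaluation of (A - B)^-1

# right-hand side: accumulate A^-1 (B A^-1)^n = (A^-1 B)^n A^-1 term by term
Ainv = mat_inv(A)
AinvB = mat_mul(Ainv, B)
term = Ainv
rhs = [row[:] for row in Ainv]
for _ in range(100):
    term = mat_mul(AinvB, term)
    rhs = [[rhs[i][j] + term[i][j] for j in range(2)] for i in range(2)]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-10 for i in range(2) for j in range(2))
```

Convergence requires the spectral radius of BA⁻¹ to be below one, which is the operator analogue of the |a| < 1 condition in equation [62].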

The equations [77] are in fact related to the Bethe-Goldstone equations used


by Nesbet (33) or the pair functions of Sinanoglu (13). We shall return to


this point shortly. The important fact which we wish to establish at this
stage is that once the types of diagram to be summed have been identified
(eg. the diagrams of Figure 4), the summation to infinite order results in a
PDE. The essential factor is that the diagrams to be summed need only be
identified at the lowest order at which the intermediate interaction occurs for
the first time, and this interaction is included in V_αβ(1,2).
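The equivalence of the sum-over-states form and the linear-equation (PDE) form can be made concrete with a toy discrete basis. In the sketch below all integrals and denominators are hypothetical numbers, and H_0 is taken diagonal in the excited pair states, so the linear equations [69] solve trivially component by component.

```python
# Hypothetical <ab|kl> matrix elements and e_a+e_b-e_k-e_l denominators
V = [0.11, 0.06, 0.04]
eps = [-0.9, -1.3, -1.8]

# sum-over-states second order pair energy, as in equation [64]
de_sos = sum(v * v / d for v, d in zip(V, eps))

# linear-equation route: solve (e_a+e_b-H0) u = V per component ([69]),
# then contract with the interaction as in equation [68]
u = [v / d for v, d in zip(V, eps)]
de_pde = sum(v * ui for v, ui in zip(V, u))

assert abs(de_sos - de_pde) < 1e-14
```

In a real calculation H_0 is not diagonal in the pair basis, and the linear system (or its variational equivalent) is what one actually solves; the identity above is the trivial limiting case.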
III - 5

The Pair Correlation Functions of Quantum Chemistry

I) Non-Variational Methods
With the techniques of MBPT now at hand our task of examining the
origin of the various methods used for the computation of pair functions is
readily accomplished. From the general form of equation [13] or equation
[20] and from the discussion of the last subsection, it is clear that the PDE
for the pair functions must have the form
[79] Q_12 {H_0(1,2) − ε_α − ε_β + V_αβ(1,2)} Q_12 |u_αβ⟩ + Σ_{γδ ≠ αβ} L_αβ;γδ(1,2) Q_12 |u_γδ⟩ + Q_12 (1/r_12) |αβ⟩ = 0

Thus the various approximate methods used to calculate pair functions must
differ only in the choice of V_αβ(1,2) and L_αβ;γδ(1,2). Since these

operators may in turn be constructed by infinite order summation of


diagrams, we have established our basis of comparison of the various methods.
We shall begin our discussion with an examination of some of the
methods first used by Nesbet (33),

Sinanoglu (12, 13), and Szasz (15, 16)

which have come to be known as the Independent Electron Pair Approximation


(IEPA). The relationship of the IEPA to perturbation methods has been

extensively reviewed by Freed (31, 32), Cizek (17) and is implicit in earlier
work by Kelly (23). Indeed, the very early work of Brueckner (20) is
fundamental to the present approach.
Much of the derivation of the IEPA equations has been discussed in the
derivation of equation [77]; however, the final steps are particularly instructive and we shall now sketch the details.
The IEPA corresponds to the infinite order summation of
a) all diagrams of Figure 4, yielding equation [77] with V_αβ(1,2) defined in equation [72]

b) Diagrams 10, 12, 13, and 14 where γδ = αβ and the labels of any crossed lines in each diagram are equal.


Let us illustrate b) above using Figure 10 as an example. Referring to our
previous analysis summarized in Figure 11, we note that if γ = β and δ = α
and we sum over m > n, then the value of the inserted part is just Δe(2)_αβ.
From our previous analysis we know how to sum an insertion in a second
order diagram to infinite order. Thus Diagram 10 contributes a factor of
−Δe_αβ to V_αβ(1,2) in equation [79] above. The corresponding insertion on
the α line gives an additional factor of −Δe_αβ. In the same manner, Diagram
12 contributes a factor of 2Δe_αβ, Diagram 13, a factor of −2Δe_αβ, and
Diagram 14, a factor of Δe_αβ, with a resulting net contribution of −Δe_αβ
to V_αβ(1,2). Thus the IEPA equations become

[80] Q_12 {H_0(1,2) + V_αβ(1,2) − (ε_α + ε_β + Δe_αβ)} Q_12 |u_αβ⟩ + Q_12 (1/r_12) |αβ⟩ = 0


The non-linearity due to Δe_αβ arises (32) from the summation of the EPV
rearrangement diagrams with only two hole states α and β.

It must be emphasized that these EPV diagrams that we have just

summed result from the incomplete cancellation of the renormalization


corrections as discussed in III - 3. Thus the IEPA implicitly includes some
contribution from the quadruple substitutions, and thus comparison with
variational calculations that do not include up to quadruple substitutions is
not meaningful.
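The self-consistent character of this non-linearity can be illustrated with a toy model, all numbers hypothetical: the pair energy Δe shifts its own denominator, Δe = Σ_{kl} |v_kl|² / (D_kl + Δe), and the fixed point is found by simple iteration.

```python
# Hypothetical <ab|kl> integrals and e_a+e_b-e_k-e_l denominators
v = [0.10, 0.07, 0.05]
D = [-1.0, -1.4, -2.1]

# fixed-point iteration de -> sum_kl v_kl^2 / (D_kl + de);
# converges quickly here because the coupling is weak
de = 0.0
for _ in range(50):
    de = sum(vi * vi / (Di + de) for vi, Di in zip(v, D))

# the converged de satisfies the implicit (non-linear) equation
resid = de - sum(vi * vi / (Di + de) for vi, Di in zip(v, D))
assert abs(resid) < 1e-12
```

The converged Δe is slightly less negative in magnitude than the plain second order value, which is exactly the effect of shifting ε_α + ε_β to ε_α + ε_β + Δe_αβ in equation [80].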
The obvious generalization of the IEPA equations is due to Kelly (25).
His choice of diagrams was dictated somewhat by purely technical considerations since he could only sum diagrams that formed geometric series. At
second order, he included all terms resulting from the infinite order generalization of Diagrams 10, 12, 13, 14 subject to the constraint that they be
EPV diagrams. Thus, Diagram 10 gives a contribution to V_αβ(1,2) given by

[81] Diagram 10 = − Σ_{γ>δ} (Δe_αγ + Δe_αδ + Δe_γβ + Δe_δβ)

There is a partial cancellation between the EPV terms for Diagrams 12, 13,
and 14 so that, for example, 13 and 14 sum to zero if mn = kl.
The sum of the EPV terms resulting from Diagrams 12, 13 and 14 gives a
further contribution to V_αβ(1,2) of the form

[82] Diagram 12 + 13 + 14 = Σ_{k>l} |kl⟩ ΔA_αβ(k,l) ⟨kl|

where

[83] ΔA_αβ(k,l) = Σ_{γ>δ ≠ αβ} Σ_{m>n ≠ k,l} |⟨γδ|mn⟩|² / (ε_γ + ε_δ − ε_m − ε_n)

Thus when Kelly's second order equations are generalized, one obtains
equations similar to equation [80] with the additional terms arising from
equations [81] and [82] included. Thus Kelly's second order method
represents an improvement over the IEPA in that it represents an attempt to
include all the contributions of the quadruple substitutions correctly. The
main defect of both the IEPA and the second order method of Kelly is that
the pairs are assumed to be uncoupled (except for the rearrangement diagrams).
Topologically, we have included no diagrams where the hole-line label is
changed by the intermediate interaction. Thus we have no contributions to
the coupling operator L_αβ;γδ(1,2) in equation [79]. The equations proposed

by Cizek (17) on the other hand, include all coupling terms except those
arising from Figure 15 and we shall now comment briefly on this work.
The diagrams that couple the pairs begin to occur at third order and
the "direct" diagrams are summarized in Figure 18. In the same manner as

our previous discussion, they sum to give a contribution to L_αβ;γδ(1,2).
In addition, there are non-EPV terms arising from Diagram 10 and Diagram 12
that contribute to L_αβ;γδ(1,2), and also some non-EPV terms from Diagrams
13 and 14 that contribute to V_αβ(1,2). The corresponding matrix elements can

be derived from Table I. The resulting equations give the coupled pair many-electron theory (CPMET) of Cizek (17). These equations would be exact except
for the neglect of diagrams of the type shown in Figure 15.
There are two other methods which have recently been proposed which
lie in between the IEPA and CPMET. The coupled electron pair CEPA of


Figure 18. Third Order Diagrams that Couple the Independent Pairs


Meyer (33, 34) includes the coupling effects due to the diagrams in Figure
18 to infinite order but uses the same approximation as Kelly in the
treatment of the rearrangement diagrams. On the other hand, the Independent
Pair Potential (IPP) and the Coupled Independent Pair Potential (CIPP) used by
Van der Velde (36) include only the terms from diagrams 18a and b. The
relationship between the various methods is summarized in Table II.
Finally, we should mention that one may solve the IEPA equations
[80] in any manner one chooses. In particular, if one is using variational
methods, different basis sets may be used for each pair. Kutzelnigg and
Ahlrichs (37, 38) have derived equations for optimum orbitals based on a
different pseudo-natural orbital expansion of each pair. Meyer (34) has
extended this work so that the coupling terms L_αβ;γδ(1,2) may be

accounted for in this non-orthogonal basis. However, the evaluation of the


higher order rearrangement terms in this non-orthogonal basis would be
almost impossible.
(ii)

Variational Methods

All of the variational methods for the calculation of pair correlation


functions involve constraints that severely limit their usefulness where
accurate correlation energies are required. However, the various non-variational
methods can only be tested by comparison with variational calculations and
some of the methods we shall now discuss provide practical methods for
accomplishing this. For this reason, it is important to compare both types of
theory using the same methods.
The simplest variational pair theory is the Separated Electron Pair (SEP)

TABLE II

NON-VARIATIONAL PAIR THEORIES

DIAGRAM          IEPA              CEPA(a)                    CEPA(b)           CIPP              CPMET   KELLY 2nd ORDER
4                all               all                        all               all               all     all, using geometric approximation
10, 12, 13, 14   EPV with γδ = αβ  all EPV except 13 and 14   EPV with γδ = αβ  EPV with γδ = αβ  all     all EPV
15               none              none                       none              none              none    none
18a, 18b         none*             all                        all               all               all     none
18c              none*             all                        all               none*             all     none

* When symmetry adapted pairs are used some contribution from these diagrams is included but
not to infinite order.

model proposed by Hurley, Lennard-Jones and Pople (39). The SEP model
corresponds to a wavefunction which is a single antisymmetrized product of
strongly orthogonal geminals APSG:
[85] Ψ_APSG = Â {Λ_1(1,2) Λ_2(3,4) ..... Λ_{N/2}(N−1,N)}

where the geminals Λ_R satisfy the strong orthogonality condition

[86] ∫ Λ_R*(1,2) Λ_S(1,3) dτ_1 = 0 ,  R ≠ S

The strong orthogonality constraint is consistent with a partition of the
orbital basis into disjoint sets {φ_Ri} so that each geminal Λ_R is expanded
entirely in its own orbital basis. Thus using the same notation as in our
previous discussions we may write

[87] Λ_R(1,2) = |R_1 R̄_1⟩ + |u_{R_1 R̄_1}⟩

[88] |u_{R_1 R̄_1}⟩ = Σ_{R_k > R_l} C_{R_k R_l} |R_k R_l⟩

The equation [85] then becomes


\.IJAPSG =

[89]

+1
R>R
k

A {111 1

1)

....

.,>

R1

>" .. }

IR1 R 1

486

M. A. ROBB

+ ...... .
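The way the disjoint orbital partition enforces condition [86] can be seen in a small sketch with entirely hypothetical pair coefficients: the strong orthogonality integral contracts the two geminal coefficient matrices over a shared orbital index, and if the two geminals live in disjoint orbital sets every term in that contraction is zero.

```python
# Geminals represented by antisymmetric-style coefficient matrices over a
# common orbital basis of size norb; values are hypothetical.
norb = 6

# geminal R expanded only in orbitals {0, 1, 2}, geminal S only in {3, 4, 5}
C_R = [[0.0] * norb for _ in range(norb)]
C_S = [[0.0] * norb for _ in range(norb)]
C_R[0][1], C_R[1][2], C_R[0][2] = 0.9, 0.3, 0.1
C_S[3][4], C_S[4][5], C_S[3][5] = 0.8, 0.4, 0.2

# discrete analogue of  int Lam_R(1,2) Lam_S(1,3) d1 : contract over the
# shared first-electron orbital index i
overlap = sum(C_R[i][j] * C_S[i][k]
              for i in range(norb) for j in range(norb) for k in range(norb))

assert overlap == 0.0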

It is possible to minimize the energy directly with respect to variations in
the Λ_R subject to the constraint [86] (see reference 40, 41, or 42).
Although the SEP model is not a very good one, its analysis into a
diagrammatic expression is particularly instructive. Referring to Figure 6, the
only coefficients to be determined are those for which α = R_1, β = R̄_1,
k = R_k, l = R_l.
Thus in Figure 4, it is easily verified that we need only those diagrams
where
a) all orbitals in any fermion loop belong to the same group
b) all orbitals at any interaction line must belong to the same group.
Because of constraint b) all the diagrams of Figure 18 vanish. Further, the
only non-vanishing contributions from the rearrangement diagrams 10, 12, 13
and 14 are the same EPV terms we encountered in the IEPA. The APSG
pairs are thus coupled only by diagrams of the type shown in Figure 15.
Robb and Csizmadia (43) have demonstrated numerically that the coupling
from the diagrams of Figure 15 is completely negligible. Thus for the SEP
constraint the IEPA gives essentially a variational result.
Wavefunctions computed with the SEP model also provide a numerical


demonstration of the inadequacy of the trial function consisting of only


double substitutions (equation [23]) and thus of the importance of the EPV
terms that result from the unlinked cluster factorization. Recent extensive
calculations by Saunders and Guest (44) using APSG wavefunctions showed
errors of up to 5% of the correlation energy when the expression [89] is
truncated at double substitutions.


It is possible to generalize the SEP model to include interpair (i.e.

interorbital correlations) and various methods have been suggested by


McWeeny (40), Kapuy (45, 46), Mehler (47), and by Robb and Csizmadia
(48). These models when expressed in diagrammatic MBPT would be
equivalent to keeping only the constraint that all orbitals in any closed
fermion loop belong to the same group.

This would eliminate Diagram 18c

and many diagrams of the type shown in Figure 10, 12, 13, and 14. Thus
the IEPA, with a SEP type partitioned basis might be very close to a
variation calculation. However, we are aware of no actual calculations.
As soon as the strong orthogonality constraint is relaxed, the resulting
variational pair equations only remain computationally tractable for closed
orbital pairs. Even then the pair expansions must be restricted to natural
orbital form which implies that one must use some multi-configuration SCF
procedure (49).
III - 6

Symmetry Adapted Pair Functions and the Use of Localized


Orbitals

Introduction of group theoretical considerations into the theories based


on pair functions often causes some confusion. For example, if one requires


that the pair functions in the IEPA be so defined that the total wavefunction corresponds to an eigenfunction of S², then the resulting analysis

into orders of perturbation theory becomes rather complex. However, in the


general MBPT expansion, to take care of symmetry restrictions, one has
merely to include those diagrams that must couple the pair functions because
of group theoretical considerations, to infinite order and one ultimately
converges to the correct state.
Let us illustrate using the SU(2) spin symmetry as an example. As
discussed by McWeeny and Steiner (50), we may first couple the pair
functions so that they are eigenfunctions of a two-electron spin operator:

[91] φ(K, S, m | 1,2) = Σ_{μν} C_{μν}(K, S, m) |μν⟩

[92] u(K, S, m | 1,2) = Σ_{μν} C_{μν}(K, S, m) |u_{μν}⟩

The summation over μν must involve a pair of orbitals with the same spatial
parts but different spin and thus

K is an index denoting different pairs of

spatial orbitals. Those pair functions with the same K but different m are
then further coupled to give a pair function with the correct S² value for
the state of interest. Provided one is using the full variational coupled
equations there is no ambiguity and one can greatly reduce the number of
equations to be solved. It is only when one attempts to decouple the resulting equations to give a spin adapted IEPA that conceptual problems begin to
arise. Thus for a closed shell case one could imagine four possible schemes
1) no spin coupling (equivalent to the original IEPA)

2) couple pairs only so that they are eigenfunctions of a two-electron

spin operator

3) couple pairs with the same S only, leaving pairs with different S
uncoupled (38, 51)

4) couple all pairs with the same K (52, 34)
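The two-electron spin coupling behind [91] can be checked directly: the singlet and triplet combinations of two spin-1/2 electrons are eigenfunctions of S² with eigenvalues S(S+1) = 0 and 2 (ħ = 1). The sketch below builds S² from the standard one-electron spin matrices in pure Python, keeping S_y/i real to avoid complex arithmetic.

```python
# Verify S^2 eigenvalues of the coupled two-electron spin functions.
def kron(A, B):
    m = len(B)
    n = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n)]
            for i in range(n)]

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
sx = [[0.0, 0.5], [0.5, 0.0]]
sy_i = [[0.0, -0.5], [0.5, 0.0]]   # S_y / i, kept real; note S_y^2 = -(S_y/i)^2
sz = [[0.5, 0.0], [0.0, -0.5]]

def total(op):                      # op(1) + op(2) on the two-electron space
    a, b = kron(op, I2), kron(I2, op)
    return [[a[i][j] + b[i][j] for j in range(4)] for i in range(4)]

Sx, Sy_i, Sz = total(sx), total(sy_i), total(sz)
Sx2, Sy2, Sz2 = mul(Sx, Sx), mul(Sy_i, Sy_i), mul(Sz, Sz)
S2 = [[Sx2[i][j] - Sy2[i][j] + Sz2[i][j] for j in range(4)] for i in range(4)]

r2 = 0.5 ** 0.5                     # basis order: |aa>, |ab>, |ba>, |bb>
singlet = [0.0, r2, -r2, 0.0]
triplet = [0.0, r2, r2, 0.0]

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

assert all(abs(x) < 1e-12 for x in apply(S2, singlet))              # S(S+1) = 0
assert all(abs(x - 2.0 * t) < 1e-12 for x, t in zip(apply(S2, triplet), triplet))
```

These singlet and triplet combinations are exactly the two-electron eigenfunctions that the coupling coefficients in [91] produce for each spatial pair K.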

The correspondence of scheme 4 with the diagrammatic method is


straightforward. Scheme 4 implies the inclusion of all the diagrams of
Figure 18 where only the spin label of any hole line is unchanged by the
intermediate interaction. This rule would also apply to the rearrangement
diagrams (Figure 10, 12, 13, 14 and 15) but this has been ignored in all
calculations with the exception of those of Cizek (17). Scheme 3 has been
the most popular one and causes some problems since contributions occur
for certain diagrams which occur in coupling terms in Scheme 2.
One should point out, that if one were starting from one of the
geminal theories (APSG) one might proceed rather differently when one comes
to the interpair terms. For example, one might excite two singlet geminals to
triplets and couple the triplets to a singlet and proceed as in Scheme 3 above.
However, this approach has never been attempted in the context of the IEPA.
Considerations similar to those just discussed also apply to symmetry
restrictions arising from point group symmetry, and the use of localized
orbitals. In general, the third order diagrams of the type shown in Figure 18
will have different signs and different magnitudes if one subjects one's
orbitals to unitary transformations. For example, Diagram 18a becomes nonzero for localized orbitals but Diagram 18c may become very small. Some
numerical studies are available (for example, references 34 and 53) but no


general conclusions can be made at this point.

III - 7

Some General Considerations On the Calculation of Pair


Functions using Perturbation Methods

At the beginning of our discussion on pair functions, it was suggested


that one might first identify those diagrams which were important at lowest
order and then sum only those diagrams to infinite order. One might improve
on this scheme somewhat as follows:
a) sum dominant diagrams to infinite order
b) correct the resulting pair correlation energies to first order in the
intermediate interactions that were ignored in a)
Let us illustrate this with an example. Consider Diagram 18c which has the
value
[93] Diagram 18c

We can then generalize this result by adding all possible interactions of the
type shown in Figure 4 in both the top and bottom half of Figure 18c. Thus
the first order correction to the IEPA correlation energy from Diagram 18c is
⟨u_αβ^IEPA | 1/r_12 | u_γδ^IEPA⟩

We have illustrated the scheme for the coupling of the IEPA pairs but it
could equally well be applied to the rearrangement diagrams or for correcting
for the single excitations.
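A toy 3x3 CI model makes the two-step scheme concrete. All numbers below are hypothetical: two "pairs" couple to a reference with strengths v1, v2 and to each other with strength c; step a) gives the independent second order pair energies, step b) adds the first order correction in the neglected coupling, and the result is compared against exact diagonalization.

```python
v1, v2 = 0.05, 0.04     # hypothetical reference-double couplings
D1, D2 = 1.0, 1.5       # hypothetical excitation energies
c = 0.02                # hypothetical pair-pair coupling (a Diagram 18c analogue)

# a) independent pair (second order) energies and first order pair functions
u1, u2 = -v1 / D1, -v2 / D2
e_pairs = v1 * u1 + v2 * u2            # = -v1^2/D1 - v2^2/D2

# b) first order correction in the neglected pair-pair coupling
e_corr = e_pairs + 2.0 * c * u1 * u2

# exact reference: lowest eigenvalue of the 3x3 CI matrix
H = [[0.0, v1, v2], [v1, D1, c], [v2, c, D2]]

def lowest_eig(H, s=3.0, iters=5000):
    # power iteration on (s*I - H): its largest eigenvalue is s - min eig of H
    x = [1.0, -0.1, -0.1]
    for _ in range(iters):
        y = [sum((s * (i == j) - H[i][j]) * x[j] for j in range(3))
             for i in range(3)]
        n = max(abs(t) for t in y)
        x = [t / n for t in y]
    return s - n

e_exact = lowest_eig(H)
assert abs(e_corr - e_exact) < 5e-4     # agrees up to higher order terms
```

The residual difference is of higher order in the couplings, which is exactly the error budget of the scheme a) plus b) described above.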
At this point, one should also comment briefly on Nesbet's (28)
method of higher order Bethe-Goldstone increments for the correction of the
IEPA. Nesbet's technique is equivalent to solving a set of equations for all


diagrams with 2 holes, a set for all diagrams with 3 holes and so on. This
scheme does not take advantage of the factorization which is possible in
Diagrams 10, 12, 13, 14, and 15 and thus is unnecessarily cumbersome.
However, some modification of his scheme, applied to Cizek's (17) equations
might be very successful.
Mehler (47) has also recently discussed a method for the inclusion of
the terms which couple the pairs in the IEPA approximation. He proposes
using the IEPA pairs as the starting point for a full variational calculation.
The zeroth iteration would give the first order corrections just discussed.
However, beyond the zeroth iterate, the solution of these equations would
be exceedingly difficult because all the rearrangement effects (which occur
in his equations as overlap matrix elements between pairfunctions) are included
to infinite order.
IV

SOME COMMENTS ON NUMERICAL APPLICATIONS


Pair function calculations, particularly within the framework of the

IEPA, are now being routinely applied to problems of chemical interest (see,
for example, references 54, 55, and 56). In addition, applications have even
been made within the framework of semi-empirical theories (57). However,
we shall limit ourselves to a brief discussion of some selected recent work
which illustrates some of the theoretical problems we have just been
discussing.
The most accurate calculations have been made on atoms. Recently,
two papers by Sasaki and Yoshimine (58, 59), report the results of extensive
CI calculations (up to "i" type orbitals, including up to quadruple substitutions)


on the atoms B, C, N, O, F, and Ne. These detailed calculations now


permit the critical assessment of previous MBPT calculations using the second
order method of Kelly (25), and the IEPA type variation-perturbation calculations. Before attempting an analysis of these results, however, we must
comment on some of the technical aspects of Kelly's method.
Kelly evaluates the MBPT diagrams by direct integration (as opposed to
summation in a discrete basis) over the complete set of continuum states.
Thus, he must sum the diagrams order by order rather than using the PDE
method. For this reason, the summation of the diagrams in Figure 4 with
non-diagonal intermediate interactions had to be performed using the
approximate geometric series shown in Figure 17. This is undoubtedly an
approximation. We have independently tested this formula in our own
discrete basis calculations and the error was usually less than 10^-5. We did,
however, encounter a few exceptional cases where the error increased to 10^-3.
In any variation-perturbation or CI calculation one must make a point-wise
approximation to the complete set of states considered by Kelly. Sasaki and
Yoshimine (SY) have developed an efficient method of accomplishing this
step and thus their results may be compared with the MBPT results.
The most directly comparable calculations are those of Miller and
Kelly (60) and those of SY on the L shell of C(³P). Kelly's second order
result (including the EPV Diagrams 10, 11, 12, 13, 14) is -0.1048 which is
to be compared with the variational result of SY which gives -0.0984.
Kelly evaluated the third order ring diagrams (18a) to give the three-body
correction of 0.0041 which added to the second order result gives -0.1007.

PAIR FUNCTIONS AND DIAGRAMMATIC PERTURBATION THEORY

493

The discrepancy between Kelly's calculation and that of SY is consistent with:


a) Kelly's neglect of Diagram 18c and higher order rearrangement
diagrams
and
b) the truncation errors of SY which they estimate to be of the order
of 4%.
The truly remarkable point is that a simple third order correction due to
pair-pair repulsions (Diagram 18a) brings Kelly's calculations (which are an
extension of the IEPA) into agreement with the variational result of SY.
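The arithmetic of this comparison (the quoted energies, presumably in hartree) is simply:

```python
kelly_2nd = -0.1048   # Kelly's second order result, EPV diagrams included
ring_3rd = 0.0041     # third order ring (Diagram 18a) three-body correction
sy_ci = -0.0984       # variational CI result of Sasaki and Yoshimine

corrected = kelly_2nd + ring_3rd
assert abs(corrected - (-0.1007)) < 1e-12   # the quoted corrected value
assert abs(corrected - sy_ci) < 0.005       # within a few mhartree of the CI value
```

The residual gap of roughly 0.002 is consistent with points a) and b) discussed next.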
Unfortunately, the agreement of the MBPT calculations by Lee Dutta
and Das (61) and those of SY on Ne(¹S) is not as good. The L shell

result of Lee, Dutta and Das (LDD), calculated using Kelly's method is
-0.3584. LDD then evaluated the Diagrams 18a and 18c to give a correction
due to pair-pair repulsions of 0.0224 which, in turn, yields a final result of
-0.3362. This result does not compare well with the SY result of -0.3052.
The LDD result for Diagram 4a is -0.26582 which is considerably lower than
the result -0.18693 computed by King (62) by solving the second order
equations [70] using Gaussian geminals. Obviously, these discrepancies require
further study. However, the IEPA results of Nesbet, Barr, and Davidson (63)
are in better agreement with those of SY if one adds the estimate of LDD
for the Diagrams 18a and 18c. The Nesbet, Barr and Davidson IEPA result is
-0.3454 and when corrected for pair-pair repulsions is -0.3221. The difference
between the IEPA and Kelly's second order method results from the EPV
diagrams of Figure 10, 12, 13 and 14 which LDD calculate will reduce the
IEPA result by 12%. This correction would bring the Nesbet, Barr and Davidson


result into agreement with the variational calculations of SY.


The significant feature of the MBPT and IEPA calculations is the
importance of the diagrams which couple the pairs. It would appear that a
simple third order calculation of the diagram of Figure 18 will suffice (the
procedure of subsection III - 7 should be more accurate still). Of particular
interest are the signs of the pair-pair repulsions calculated by LDD. They
found the diagonal terms (2s2p - 2s2p, 2p2p - 2p2p) to be positive and the
non-diagonal terms (2s2s - 2p2p, 2s2p - 2p2p, 2s2s - 2p2s) to be negative.
Thus as one goes to larger numbers of electrons the net contribution may
become negative.
The calculations of SY on F- were performed using a variety of
IEPA type wavefunctions as well as CI. The results provide some information
as to the efficacy of symmetry adapted pair function schemes. For the L
shell of F- SY obtain -0.3529 using symmetry adapted pairs (analogous to
scheme 3 of subsection III - 6), -0.3103 using space orbital pairs (scheme 4
of subsection III - 6) and -0.3020 using the CI method. The good agreement
using space orbital pairs is encouraging, but it must be remembered that the
higher order EPV diagrams of Figure 10, 12, 13 and 14 which have been
included by Kelly are ignored in both these types of IEPA calculations. When
these terms were included, the IEPA calculation with space orbital pairs might
even come above the CI result.
In connection with the preceding discussion, one should also mention
the calculations of Mehler using the IPP method (35). He computes pair-like wavefunctions using the wavefunction form

[94] |0⟩ + Σ_{γ>δ} Σ_{k>l} C^{kl}_{γδ} |^{kl}_{γδ}⟩

In his calculations on BH, as the basis was extended the IPP correlation
energy actually became worse than the IEPA energy. At first this may seem
surprising as by this coupling procedure one has included the intermediate
interactions to be found in Figure 18a. However, in doing this one has
changed completely the form of the higher order pair-pair repulsions. When
one uses some sort of symmetry adapted pair functions one is doing a similar
sort of coupling on a smaller scale. One should not be too surprised if one
encounters examples where the symmetry adapted pairs give a worse result
than the IEPA model using simple spin-orbital pairs. The treatment of the
rearrangement terms is particularly ambiguous in the symmetry adapted
schemes.
The most sophisticated pair function calculation on molecules is
undoubtedly that of Paldus, Cizek and Shavitt (64) on the BH3 molecule.
They have used the complete coupled pair many-electron theory (CPMET)
equations of Cizek (17). This set of equations, as we have discussed in subsection III - 5 (i), includes the diagrams of Figure 4 and 18 as well as all
the rearrangement effects of Figure 10, 12, 13 and 14. Further, the theory
was extended to include the single and triple substitutions. Thus their
results will differ from a full variational calculation only by neglect of
terms which arise from diagrams of the type found in Figure 15. While they
have used only a double zeta basis in order that a comparison with a
complete CI can be made, the agreement of the variational and variation-perturbation
calculations is remarkable: the CI (double + quadruple excitations) result is
-0.048050 and the result for the corresponding coupled pair calculation is
-0.048048. The agreement is similar with the inclusion of
single substitutions. Unfortunately there have been no numerical tests of the
relative importance of the non-EPV rearrangement diagrams and the EPV
diagrams summed by Kelly. This is important, since Kelly's approximation
yields results which are not invariant with respect to unitary transformations
of the orbitals. Some calculations performed in our own laboratory on H2O
showed that the inclusion of non-EPV diagrams of the type shown in Figure
13 and 14 gave an additional positive contribution to the correlation energy
of the order of 2% but much numerical investigation is still required.
The only molecular calculations of accuracy comparable to the atomic
calculations are those of Meyer on H2O (33) and CH4 (34). The CH4
calculations are particularly interesting because the calculations were performed
using not only the CEPA (33, 34) but also two symmetry adapted IEPA.
Technically, these calculations represent a considerable innovation since they
were performed using different non-orthogonal bases for each pair function.
The CEPA calculations included the coupling of the pairs via the diagrams of
Figure 18 to infinite order. The EPV rearrangement diagrams were treated
in a manner almost identical to that of Kelly (25). In his IEPA calculations,
Meyer observed a similar behaviour as in the calculations of Sasaki and
Yoshimine on F- for the efficiency of the symmetry adapted pairs as
opposed to spatial orbital pairs. Provided the higher order non-EPV rearrangement
diagrams prove to be negligible then this method shows much promise for
molecular calculations.

PAIR FUNCTIONS AND DIAGRAMMATIC PERTURBATION THEORY


Finally, we should briefly mention some calculations done in our own
laboratory. The choice of methodology used was influenced primarily by a
desire to remain as close as possible to the simple second order calculations
of Kelly and by the technical problem of constructing all required matrix
elements directly from the molecular orbital integral list (65) without
the need to store supermatrices or symbolic matrix elements on backing store.
Thus we have used the pair coupling scheme 2 of subsection III - 7. In
Table III we present results for CH 2 F 2 in a localized orbital basis. These
numbers give an indication as to how the pair-pair repulsion (computed by
correcting the pairs to first order as described in subsection III - 7) may
behave for more extended systems. The most significant feature of the results
is that the "diagonal" pair-pair repulsions are large and positive while
non-diagonal pair-pair repulsions are mostly negative. A similar feature is observed
in our results displayed in Table IV for H2O (using an orbital basis which

included p and d orbitals at the centroids of the bonds and lone pairs). The
diagonal pair-pair repulsions are positive and the off-diagonal ones are
negative. Note that the pair-pair repulsions are extremely sensitive to the
inclusion of the EPV rearrangement diagram, as observed by Kelly (60).

TABLE III

Pair correlation energies of CH2F2 in DZ basis (LMO)

Pair           Pair corr. en. / pair-pair repulsion

CH1-CH1        -0.0186
CH1-CH2        -0.0076 / 0.0010
CH1-F1lp       -0.0053 / 0.0007
CH1-F1lp'      -0.0003 / +0.0000
CH1-CF1        -0.0008 / 0.0001
F2lp-F1lp      -0.0228
F2lp-F1lp'     -0.0035 / 0.0005
F2lp-F2lp'     -0.0178 / 0.0024
F2lp-F1lp'     -0.0003 / +0.0000
F1lp-CF2       -0.0228 / 0.0035
F2lp-CF1       -0.0008 / +0.0000
F2lp'-F2lp'    -0.0092
F1lp'-F1lp'    -0.0000 / +0.0000
F2lp'-CF2      -0.0123 / 0.0013
F2lp'-CF1      -0.0001 / 0.0000
F2lp'-F2lp''   -0.0092 / 0.0009
CF2-CF2        -0.0104
CF2-CF1        -0.0003 / 0.0000

Non-diagonal pair-pair repulsions:
CH2-F2lp -0.0001; CH1-CF1 0.0001; CH1-CF2 0.0001; F1lp-CF1 -0.0001;
F2lp'-F2lp' 0.0003; CF2-F2lp' 0.0002; CF2-CF2 0.0003; F2lp'-F1lp'' 0.0003;
F2lp-F2lp' -0.0003; F2lp-CF2 -0.0003; F2lp'-F2lp' -0.0004; F1lp'-F2lp'' -0.0007;
F2lp-CF2 -0.0004; CF2-CF2 -0.0004; F2lp-F2lp'' -0.0004; CF2-CF2 0.0001;
F2lp'-F1lp'' -0.0001; F2lp'-CF2 0.0002; CF2-F2lp'' -0.0009; CF2-CF2 0.0002;
F2lp'-F1lp'' -0.0006; F2lp'-F2lp' -0.0001; CF2-F1lp'' -0.0006; F2lp'-F2lp' 0.0001;
CF2-F2lp'' 0.0002

Sum of pairs   -0.3669
Diagram 18a     0.0180
Diagram 18b    -0.0058
Diagram 18c     0.0058


TABLE IV

Pair correlation energies for H2O in s, p, d basis

                                              CMO       LMO
Sum of spin-irreducible pairs               -0.2657   -0.2564
  with higher order rearrangement effects   -0.2601   -0.2498
Pair-pair repulsions                         0.0031    0.0012
  with rearrangement effects                 0.0006    0.0027
Diagram 18a (β = i, γ = j)                   0.0165    0.0154
Diagram 18a (α = i, β = j)                  -0.0143   -0.0063
Diagram 18a (α ≠ β ≠ γ)                     -0.0043   -0.0011
Diagram 18b                                 -0.0069
Diagram 18c (α = β = i, γ = δ = j)           0.0029    0.0010
Diagram 18c (α ≠ β ≠ γ ≠ δ)                  0.0001    0.0010

CONCLUSION
It would appear that completely variational pair function calculations
without severe constraints will always be computationally intractable except
for the smallest systems. Thus one is forced to seek methods which are a
combination of variation and perturbation methods. However, we still have
very little numerical information to guide one's selection of those terms
which must be evaluated variationally (infinite order perturbation theory) and
those terms which need to be included only at lowest order. Certainly
the IEPA is not good enough; however, one hopes that one will not have to
solve the much more complicated CPMET equations of Cizek.


M. A. ROBB

References
1) R.D. Mattuck, A Guide to Feynman Diagrams in the Many Body Problem (New York, Academic Press)
2) N.H. March, W.H. Young and S. Sampanthar, The Many Body Problem in Quantum Mechanics (Cambridge University Press, 1967)
3) D.J. Thouless, The Quantum Mechanics of Many Body Systems (Academic Press, 1961)
4) B.H. Brandow, Rev. Mod. Phys. 39, 771 (1967)
5) P.-O. Lowdin, J. Math. Phys. 3, 969 (1962)
6) K.F. Freed, Ann. Rev. Phys. Chem. 22, 313 (1971)
7) R.K. Nesbet, Phys. Rev. 109, 1632 (1958)
8) R.K. Nesbet, Adv. Chem. Phys. 9, 321 (1965)
9) W.J. Taylor, Chem. Phys. Letters 26, 29 (1974)
10) H. Primas, in Modern Quantum Chemistry, O. Sinanoglu ed. (New York, Academic Press, 1965)
11) O. Sinanoglu, J. Chem. Phys. 36, 706, 3198 (1962)
12) O. Sinanoglu, Adv. Chem. Phys. 6, 315 (1964)
13) O. Sinanoglu, Adv. Chem. Phys. 14, 239 (1969)
14) O. Sinanoglu and K.A. Brueckner, Three Approaches to Electron Correlation in Atoms (Yale University Press, New Haven, 1970)
15) L. Szasz, Phys. Rev. 126, 169 (1962)
16) L. Szasz, Phys. Rev. 132, 939 (1963)
17) J. Cizek, Adv. Chem. Phys. 14, 35 (1969)
18) A.T. Amos and J.I. Musher, J. Chem. Phys. 54, 2380 (1971)
19) K.A. Brueckner, Phys. Rev. 100, 36 (1955)
20) K.A. Brueckner, in The Many-Body Problem (Les Houches, 1958), C. De Witt ed. (Dunod Cie, Paris, 1958)
21) J. Goldstone, Proc. Roy. Soc. A239, 267 (1957)
22) W. Brenig, Nucl. Phys. 4, 363 (1957)
23) H.P. Kelly, Phys. Rev. 134, A1450 (1964)
24) H.P. Kelly, Adv. Theoret. Phys. 2, 75 (1968)
25) H.P. Kelly, Adv. Chem. Phys. 14, 129 (1969)
26) K.A. Brueckner and D.T. Goldman, Phys. Rev. 117, 307 (1960)
27) P.S. Epstein, Phys. Rev. 28, 695 (1926)
28) R.K. Nesbet, Proc. Roy. Soc. (London) A230, 312, 322 (1955)
29) E. Feenberg, Phys. Rev. 74, 206 (1948)
30) J.O. Hirschfelder, W.B. Brown and S.T. Epstein, Adv. Quantum Chem. 1, 256 (1964)
31) K.F. Freed, Phys. Rev. 173, 1 (1968); Chem. Phys. Letters (1970)
32) R.K. Nesbet, Adv. Chem. Phys. 14, 1 (1969)
33) W. Meyer, Int. J. Quantum Chem. 5, 341 (1971)
34) W. Meyer, J. Chem. Phys. 58, 1017 (1973)
35) E.L. Mehler, Int. J. Quantum Chem. 7, 437 (1973)
36) G.A. Van der Velde, PhD Thesis, University of Groningen (1974)
37) R. Ahlrichs and W. Kutzelnigg, J. Chem. Phys. 48, 1819 (1968)
38) M. Jungen and R. Ahlrichs, Theoret. Chim. Acta 17, 339 (1970)
39) A.C. Hurley, J. Lennard-Jones and J.A. Pople, Proc. Roy. Soc. A220, 446 (1953)
40) R. McWeeny, Rev. Mod. Phys. 32, 335 (1960)
41) J.M. Parks and R.G. Parr, J. Chem. Phys. 28, 335 (1958)
42) E. Kapuy, Acta Phys. Acad. Sci. Hung., 185 (1960)
43) M.A. Robb and I.G. Csizmadia, J. Chem. Phys. 54, 3646 (1971)
44) V.R. Saunders and M.F. Guest, in Quantum Chemistry - the State of the Art, ed. V.R. Saunders and J. Brown (Proceedings of SRC Atlas Symposium No. 4, 1974)
45) E. Kapuy, Theoret. Chim. Acta 6, 281 (1966)
46) E. Kapuy, Acta Phys. Acad. Sci. Hung. 12, 351 (1960)
47) E.L. Mehler, J. Chem. Phys. 59, 3485 (1973)
48) M.A. Robb and I.G. Csizmadia, Int. J. Quantum Chem. 4, 36 (1970)
49) A. Veillard and E. Clementi, Theoret. Chim. Acta 7, 133 (1967)
50) R. McWeeny and E. Steiner, Adv. Quantum Chem. 2, 93 (1965)
51) A.W. Weiss, Phys. Rev. A3, 126 (1971)
52) J.W. Viers, F.E. Harris and H.F. Schaefer, Phys. Rev. A1, 24 (1970)
53) E.R. Davidson and C.F. Bender, J. Chem. Phys. 56, 4334 (1972)
54) B. Zurawski, R. Ahlrichs and W. Kutzelnigg, Chem. Phys. Letters 21, 309 (1973)
55) H. Lischka and V. Dyczmons, Chem. Phys. Letters 23, 167 (1973)
56) W. Meyer, in Quantum Chemistry - the State of the Art, ed. V.R. Saunders and J. Brown (Proceedings of SRC Atlas Symposium No. 4, 1974)
57) S. Diner, J.P. Malrieu and P. Claverie, Theoret. Chim. Acta, 390 (1967)
58) F. Sasaki and M. Yoshimine, Phys. Rev. A9, 17 (1974)
59) F. Sasaki and M. Yoshimine, Phys. Rev. A9, 26 (1974)
60) J.H. Miller and H.P. Kelly, Phys. Rev. A3, 578 (1971)
61) T. Lee, N.C. Dutta and T.P. Das, Phys. Rev. A4, 1410 (1971)
62) K.C. Pan and H.F. King, J. Chem. Phys. 56, 4467 (1972)
63) R.K. Nesbet, T.L. Barr and E.R. Davidson, Chem. Phys. Letters 4, 203 (1969)
64) J. Paldus, J. Cizek and I. Shavitt, Phys. Rev. A5, 50 (1972)
65) B. Roos, Chem. Phys. Letters 15, 153 (1972)

Acknowledgements
The author is indebted to his two research students, Miss S.P. Prime
and Mr. S. Chin, for carrying out the numerical calculations on
H2O and CH2F2.

SOME APPLICATIONS OF PROJECTION OPERATORS

R. McWeeny
Department of Chemistry,
The University, Sheffield S3 7HF, U.K.

This review is intended to indicate the power and generality
of projection operator techniques, starting with simple examples
and building up to recent applications in multi-configuration
self-consistent field (MC-SCF) theory.

1. DEFINITIONS AND BASIC PROPERTIES


In its best known context, a projection operator applies to
a vector visualized as a directed line segment. In two
dimensions, $v = v_1 e_1 + v_2 e_2$ and the operation of "picking out" the
component in the $e_1$ direction may be defined by

$$P_1 v = v_1 e_1$$

giving the part of v which lies in a certain subspace of the
vector space. Evidently

$$P_1 (P_1 v) = P_1 (v_1 e_1) = v_1 e_1 = P_1 v$$

so $P_1^2$ has exactly the same effect, applied to any vector, as $P_1$.

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 505-528.
All Rights Reserved Copyright 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


We write

$$P_1^2 = P_1$$

- the property of "idempotency", which characterizes all
projection operators.
Projection operators may, however, be defined more generally
in terms of sets, e.g. any collection of objects. Suppose we take
a basket of food (f) containing apples, bananas and cabbages
(a, b, c), and that

$$f = 3a + 2b + 5c$$

indicates symbolically the composition of the collection (3
apples etc.). Then the operation of selecting or "projecting
out" apples is defined by

$$P_a f = 3a$$

and again has the idempotency property $P^2 = P$: selecting
apples from the subset already selected gives no further change.
On the other hand selection of bananas from the subset of apples
would give zero:

$$P_b P_a f = 0$$

We are constructing an algebraic representation of the operations,
introducing the rules as we go on; thus 3a is a set containing
3 apples, $0a = 0$ is the null set, containing nothing, etc. The
properties of the projectors are thus

$$P_K^2 = P_K \quad (K = a, b, c) \quad (1.1)$$


$$P_K P_L = 0 \quad (K \neq L) \quad (1.2)$$

where the null operator 0 produces from any set the null set.
If we introduce the identity operator 1, which leaves any set
unchanged, we discover another important property. For

$$P_a f + P_b f + P_c f = 3a + 2b + 5c = f$$

and with the usual convention (distributive law)

$$(P_a + P_b + P_c) f = f$$

this implies, since the result is true for any f, the operator
identity

$$P_a + P_b + P_c = 1 \quad (1.3)$$

which is called the "resolution of the identity". It simply
means that if we select all the subsets and then put them
together again we do not really change the set in any way!
Equations (1.1)-(1.3) summarize the general properties of
projection operators. We note that the subsets may be combined
without disturbing the basic properties: if $P_{ab} = P_a + P_b$ is the
operator for selection of fruit (not vegetables), then

$$P_{ab}^2 = P_{ab}, \quad P_{ab} P_c = 0, \quad P_{ab} + P_c = 1$$

Only the number of subsets is changed.
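The set algebra of (1.1)-(1.3) can be mimicked directly in code; the following is an illustrative sketch only (the basket example, with Python counters standing in for the symbolic sets):

```python
from collections import Counter

def projector(item):
    """Return an operator that selects only `item` from a basket.
    The unary + strips zero counts, so projecting an absent item
    yields the null set Counter()."""
    return lambda basket: +Counter({item: basket[item]})

f = Counter(a=3, b=2, c=5)          # f = 3a + 2b + 5c
Pa, Pb, Pc = projector('a'), projector('b'), projector('c')

assert Pa(Pa(f)) == Pa(f)           # idempotency: P^2 = P, eq. (1.1)
assert Pb(Pa(f)) == Counter()       # P_b P_a f = 0, eq. (1.2)
assert Pa(f) + Pb(f) + Pc(f) == f   # resolution of the identity, eq. (1.3)
```

Selecting all the subsets and recombining them reproduces f exactly, which is the content of the resolution of the identity.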

2. MATRIX REPRESENTATIONS
Let us now consider an m-dimensional Hermitian vector space,
spanned by a basis e, i.e. by a set of orthonormal vectors
$e_1, e_2, \ldots, e_m$. With every operator (mapping) in the space
we can associate an m x m matrix:

$$P e_i = \sum_j e_j P_{ji} \quad (2.1)$$

The ith column thus contains the components of the new vector
$e_i' = P e_i$ relative to the original "fixed" basis. We say $\mathbb{P}$ is
the matrix representing P. If we have an arbitrary vector v,
and collect its components into a column $\mathbf{v}$, we can write $v = e\mathbf{v}$
and obtain a new vector $v' = Pv = e\mathbf{v}'$, where $\mathbf{v}' = \mathbb{P}\mathbf{v}$.
The effect of P is thus to change vector components according to

$$\mathbf{v} \rightarrow \mathbf{v}' = \mathbb{P}\mathbf{v} \quad (2.2)$$

Any sequence of operations is equivalent to some single operation
(AB = C) and this property is mimicked by the matrices of the
representation ($\mathbb{A}\mathbb{B} = \mathbb{C}$).

What matrix do we associate with
projection onto a single unit vector v? The unit length
condition implies

$$\mathbf{v}^\dagger \mathbf{v} = v_1^* v_1 + v_2^* v_2 + \ldots + v_m^* v_m = 1$$

and clearly, if we consider the square matrix $\mathbf{v}\mathbf{v}^\dagger$, we obtain
$(\mathbf{v}\mathbf{v}^\dagger)\mathbf{v} = \mathbf{v}(\mathbf{v}^\dagger \mathbf{v}) = \mathbf{v}$. Thus the projection operator $P_v$ onto the
direction of an arbitrary vector v is represented by

$$\mathbb{P}_v = \mathbf{v}\mathbf{v}^\dagger \quad (2.3)$$

Obviously, it has the basic projection operator property, now
in matrix form,

$$\mathbb{P}_v^2 = \mathbf{v}(\mathbf{v}^\dagger \mathbf{v})\mathbf{v}^\dagger = \mathbf{v}\mathbf{v}^\dagger = \mathbb{P}_v$$

together with a "metric" property (we have introduced a vector of
unit "length")

$$\mathrm{tr}\, \mathbb{P}_v = 1 \quad (2.4)$$
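Properties (2.3) and (2.4) are easy to verify numerically; a minimal sketch with NumPy (the dimension and random seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(5, 1))
v /= np.linalg.norm(v)               # unit length: v†v = 1

P = v @ v.T                          # P_v = v v†, eq. (2.3) (real case)

assert np.allclose(P @ P, P)         # idempotency in matrix form
assert np.isclose(np.trace(P), 1.0)  # tr P_v = 1, eq. (2.4)

# P picks out the component of any vector along v:
w = rng.normal(size=(5, 1))
assert np.allclose(P @ w, v * (v.T @ w).item())
```

The last assertion shows the geometric content: applying the matrix is the same as extracting the scalar component along v and rebuilding that piece of the vector.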

Suppose now we define any m orthonormal unit vectors (e.g. a new
basis $e_1', \ldots, e_m'$) by writing

$$e' = e\mathbb{V} \quad (2.5)$$

where the columns of $\mathbb{V}$ are $\mathbf{v}_1, \mathbf{v}_2, \ldots$ - the components of $v_1, v_2, \ldots$
relative to the basis e. The matrices

$$\mathbb{P}_i = \mathbf{v}_i \mathbf{v}_i^\dagger \quad (2.6)$$

then clearly possess the properties

$$\mathbb{P}_i^2 = \mathbb{P}_i, \quad \mathbb{P}_i \mathbb{P}_j = \mathbb{0} \ (i \neq j), \quad \sum_{i=1}^m \mathbb{P}_i = \mathbb{1} \quad (2.7)$$

where $\mathbb{1}$ is the (m x m) unit matrix. (The last property follows
because $\mathbb{V}$ is a unitary matrix and hence $\mathbb{V}\mathbb{V}^\dagger = \sum_i \mathbf{v}_i \mathbf{v}_i^\dagger = \mathbb{1}$.) They
define a complete set of 1-dimensional projection operators.
Let us now take the first n columns of $\mathbb{V}$ as a rectangular
(m x n) matrix $\mathbb{T}$. Then $\mathbb{T}\mathbb{T}^\dagger$ is evidently the sum of the first n
projection operators:

$$\mathbb{P} = \mathbb{T}\mathbb{T}^\dagger = \sum_{i=1}^n \mathbf{v}_i \mathbf{v}_i^\dagger \quad (2.8)$$

describes projection onto the n-dimensional subspace spanned by
the vectors $v_1, v_2, \ldots, v_n$. The remaining (m-n) vectors allow us
to define a projection matrix $\mathbb{P}'$ for the complementary subspace
(m-n dimensional), and the properties of $\mathbb{P}$ and $\mathbb{P}'$ are

$$\mathbb{P}^2 = \mathbb{P}, \quad \mathbb{P}'^2 = \mathbb{P}', \quad \mathbb{P}\mathbb{P}' = \mathbb{P}'\mathbb{P} = \mathbb{0}, \quad \mathbb{P} + \mathbb{P}' = \mathbb{1} \quad (2.9)$$

Only two subsets are now distinguished but the projection operators
which partition the space still have the basic properties (1.1),
(1.2) and (1.3). What distinguishes whether $\mathbb{P}$ projects onto a
1-dimensional subspace or an n-dimensional subspace? Clearly
it is the trace (from (2.4) and (2.8)):

$$\mathrm{tr}\, \mathbb{P} = n \quad (2.10)$$

- the dimension of the subspace.

Any m-dimensional vector space
may thus be partitioned into subspaces, of dimensions $n_1, n_2, \ldots$,
by use of a set of projectors $\mathbb{P}_K$, with $\mathrm{tr}\, \mathbb{P}_K = n_K$ and $\sum_K n_K = m$.
As examples, we might mention the occupied and unoccupied MO's
of a closed-shell SCF calculation, performed in an m-dimensional
basis (two subspaces); or the doubly occupied, singly occupied, and
empty MO's in an open-shell calculation (three subspaces). In such
examples an arbitrary MO is described by a column of AO coefficients;
if there are m MO's altogether, $\phi_1, \phi_2, \ldots$ with coefficients
$\mathbf{c}_1, \mathbf{c}_2, \ldots$ we can write $\phi_i = \chi \mathbf{c}_i$, and if we denote the projector
onto a subspace of n occupied MO's by $\mathbb{R}$ (to avoid confusion with
$\mathbb{P}$, the density matrix), then it follows readily that

$$\mathbb{R}\mathbf{c}_i = \mathbf{c}_i \ (\phi_i \ \text{occupied}), \qquad \mathbb{R}\mathbf{c}_i = 0 \ (\phi_i \ \text{unoccupied})$$

In other words projection picks out the part of an orbital which
can be expressed in terms of the occupied orbitals, and annihilates
the remainder. This is exactly what happens, for example, when
a three-dimensional vector is projected onto a 2-dimensional
subspace (i.e. a plane).
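The partition into a subspace projector and its complement, (2.9)-(2.10), can be checked with a few lines of NumPy; the dimensions below are arbitrary illustrative choices:

```python
import numpy as np

m, n = 6, 2
rng = np.random.default_rng(1)
# Any orthonormal set of m vectors, e.g. from a QR decomposition:
V, _ = np.linalg.qr(rng.normal(size=(m, m)))

T = V[:, :n]                  # first n columns, the (m x n) matrix T
P = T @ T.T                   # projector onto the n-dimensional subspace, eq. (2.8)
P_c = np.eye(m) - P           # projector onto the complementary subspace

assert np.allclose(P @ P, P) and np.allclose(P_c @ P_c, P_c)   # eq. (2.9)
assert np.allclose(P @ P_c, np.zeros((m, m)))                  # eq. (2.9)
assert np.isclose(np.trace(P), n)                              # tr P = n, eq. (2.10)
assert np.isclose(np.trace(P_c), m - n)
```

The trace counts the dimension of the subspace, exactly as (2.10) states.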

3. CLOSED-SHELL SCF THEORY


With a one-determinant wavefunction of n doubly occupied MO's,
the energy expectation value is given by

$$E = 2 \sum_A \langle \phi_A | h | \phi_A \rangle + \sum_{A,B} \left[ 2 \langle \phi_A \phi_B | g | \phi_A \phi_B \rangle - \langle \phi_A \phi_B | g | \phi_B \phi_A \rangle \right] \quad (3.1)$$

or, expanding in terms of m AO's (or arbitrary basis functions),

$$E = 2\, \mathrm{tr}\, \mathbb{R}\mathbb{h} + \mathrm{tr}\, \mathbb{R}\mathbb{G} \quad (3.2)$$

where $\mathbb{R}$ is the matrix, denoted by $\mathbb{P}$ in (2.8), describing
projection onto the occupied subspace. The matrix $\mathbb{G}$ in (3.2) is
defined in terms of $\mathbb{R}$ by

$$\mathbb{G}(2\mathbb{R}) = \mathbb{J}(2\mathbb{R}) - \mathbb{K}(\mathbb{R}) \quad (3.3)$$

where the functional notation simply indicates dependence on the
matrix argument, and

$$J(2\mathbb{R})_{ij} = \sum_{k,l} 2 R_{lk} \langle \chi_i \chi_k | g | \chi_j \chi_l \rangle, \qquad K(\mathbb{R})_{ij} = \sum_{k,l} R_{lk} \langle \chi_i \chi_k | g | \chi_l \chi_j \rangle \quad (3.4)$$

The optimum orbitals, of LCAO form, follow directly from the
variation theorem by requiring that E in (3.2) be a minimum for
variations $\mathbb{R} \rightarrow \mathbb{R} + \delta\mathbb{R}$, subject to the constraint

$$\mathbb{R}^2 = \mathbb{R}, \quad \mathbb{R}^\dagger = \mathbb{R}, \quad \mathrm{tr}\, \mathbb{R} = n \quad (3.5)$$

These conditions provide a complete statement of the SCF problem:
usually the problem is expressed in terms of orbitals, and the
corresponding orthonormality constraint is dealt with using
Lagrangian multipliers. Here we proceed another way, by
incorporating the constraint automatically.
To incorporate the constraint (3.5) we require

$$(\mathbb{R} + \delta\mathbb{R})^2 = \mathbb{R} + \delta\mathbb{R}$$

or, to first order of small quantities,

$$\mathbb{R}\,\delta\mathbb{R} + \delta\mathbb{R}\,\mathbb{R} = \delta\mathbb{R} \quad (3.6)$$
Now $\mathbb{R}$ describes the projection operator onto the subspace spanned
by the n occupied orbitals, and $\mathbb{1} - \mathbb{R}$ that for the complementary
subspace (virtual orbitals) of dimension m-n. If $\mathbb{R}$ and $\mathbb{1} - \mathbb{R}$ are
momentarily denoted by $\mathbb{R}_1$ and $\mathbb{R}_2$, then (cf. (2.9))

$$\mathbb{R}_1 + \mathbb{R}_2 = \mathbb{1} \quad (3.7)$$

and it is trivial to show that

$$\mathbb{A} = \mathbb{B} \iff \mathbb{R}_i \mathbb{A} \mathbb{R}_j = \mathbb{R}_i \mathbb{B} \mathbb{R}_j \quad (i, j = 1, 2) \quad (3.8)$$

i.e. the equality of all corresponding projections is the
necessary and sufficient condition for the equality of two
matrices. On applying this condition to (3.6) we obtain
(cf. McWeeny 1962)

$$\mathbb{R}\,\delta\mathbb{R}\,\mathbb{R} = \mathbb{0}, \qquad (\mathbb{1} - \mathbb{R})\,\delta\mathbb{R}\,(\mathbb{1} - \mathbb{R}) = \mathbb{0}$$

while the 1,2 and 2,1 components are undetermined and may be
non-zero. The most general variation $\delta\mathbb{R}$ preserving idempotency
to first order is thus

$$\delta\mathbb{R} = \mathbb{R}\mathbb{X}(\mathbb{1} - \mathbb{R}) + (\mathbb{1} - \mathbb{R})\mathbb{Y}\mathbb{R}$$

where $\mathbb{X}$ and $\mathbb{Y}$ are arbitrary m x m matrices, while the requirement
that $\mathbb{R}$ remains Hermitian symmetric implies $\mathbb{Y} = \mathbb{X}^\dagger$ and hence

$$\delta\mathbb{R} = \mathbb{R}\mathbb{X}(\mathbb{1} - \mathbb{R}) + (\mathrm{h.c.}) \quad (3.9)$$


where (h.c.) indicates the Hermitian conjugate of the preceding
term.
The condition for stationary E may be written, from (3.2),

$$\delta E = 2\, \mathrm{tr}\, \delta\mathbb{R}\,\mathbb{h} + \mathrm{tr}\, \delta\mathbb{R}\,\mathbb{G} + \mathrm{tr}\, \mathbb{R}\,\delta\mathbb{G} = 2\, \mathrm{tr}\, \delta\mathbb{R}\,(\mathbb{h} + \mathbb{G})$$

where we make use of a basic property of the $\mathbb{J}$ and $\mathbb{K}$ matrices
defined in (3.4) to show that

$$\mathrm{tr}\, \mathbb{A}\,\mathbb{G}(\mathbb{B}) = \mathrm{tr}\, \mathbb{B}\,\mathbb{G}(\mathbb{A}) \quad (3.10)$$

for arbitrary $\mathbb{A}$ and $\mathbb{B}$. Thus, to first order,

$$\delta E = 2\, \mathrm{tr}\, \delta\mathbb{R}\,\mathbb{h}^F \quad (3.11)$$

As would be expected, the elements of $\mathbb{h}^F = \mathbb{h} + \mathbb{G}$ are the matrix
elements in the basis $\{\chi_i\}$ of the usual Hartree-Fock operator. The
condition for stationary E subject to (3.5) follows on
substituting (3.9) in (3.11):

$$\delta E = 2\, \mathrm{tr}\, \mathbb{X}(\mathbb{1} - \mathbb{R})\,\mathbb{h}^F\,\mathbb{R} + (\mathrm{c.c.}) \quad (3.12)$$

The vanishing of each term, for arbitrary $\mathbb{X}$, leads to

$$(\mathbb{1} - \mathbb{R})\,\mathbb{h}^F\,\mathbb{R} = \mathbb{0}, \qquad \mathbb{R}\,\mathbb{h}^F\,(\mathbb{1} - \mathbb{R}) = \mathbb{0}$$

and hence, by subtraction,

$$\mathbb{R}\,\mathbb{h}^F - \mathbb{h}^F\,\mathbb{R} = \mathbb{0} \quad (3.13)$$

The constrained stationary value condition is thus that the matrix
$\mathbb{R}$ (and hence the density matrix) commute with the Hartree-Fock
matrix $\mathbb{h}^F$. It should be noticed that the orbitals (determined
by their expansion coefficients $\mathbf{c}_A, \mathbf{c}_B, \ldots$) do not appear in this
formulation; they are in fact arbitrary to within a unitary
transformation, under which the one-determinant wavefunction is
invariant. The solution of the closed-shell Hartree-Fock problem
is contained in the matrix $\mathbb{R}$ alone, i.e. in the projection
operator which defines the subspace of the occupied orbitals.
The orbitals themselves, if we wish to introduce them, simply
define a particular basis within that subspace.
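The two key facts here — that the projector built from the lowest eigenvectors commutes with the Fock matrix, eq. (3.13), and that a unitary mixing of the occupied columns leaves R unchanged — can be illustrated numerically. The "Fock" matrix below is a random symmetric stand-in, not a self-consistent Hamiltonian:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 3
A = rng.normal(size=(m, m))
hF = A + A.T                        # stand-in Hermitian (real symmetric) "Fock" matrix

eps, C = np.linalg.eigh(hF)         # hF T = T eps, cf. eq. (3.15)
T = C[:, :n]                        # occupy the n lowest orbitals
R = T @ T.T                         # R = T T†

assert np.allclose(R @ hF, hF @ R)  # [R, hF] = 0, eq. (3.13)
assert np.allclose(R @ R, R)        # idempotency constraint, eq. (3.5)
assert np.isclose(np.trace(R), n)

# A unitary mixing of the occupied columns leaves R unchanged:
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
assert np.allclose((T @ U) @ (T @ U).T, R)
```

The last check is the matrix form of the statement that the one-determinant wavefunction, and hence R, is invariant under unitary transformation of the occupied orbitals.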
To cast the condition (3.13) in the more familiar form of an
eigenvalue equation, we introduce the orbitals explicitly by
using $\mathbb{R} = \mathbb{T}\mathbb{T}^\dagger$. The commutation condition then becomes

$$\mathbb{h}^F\,\mathbb{T}\mathbb{T}^\dagger = \mathbb{T}\mathbb{T}^\dagger\,\mathbb{h}^F$$

or, multiplying from the right by $\mathbb{T}$ and noting that the
orthonormality conditions require $\mathbb{T}^\dagger\mathbb{T} = \mathbb{1}$,

$$\mathbb{h}^F\,\mathbb{T} = \mathbb{T}\,\boldsymbol{\epsilon} \quad (3.14)$$

where $\boldsymbol{\epsilon} = \mathbb{T}^\dagger \mathbb{h}^F \mathbb{T}$ is an n x n Hermitian matrix. This is not a
simple eigenvalue equation as it stands, but it may be made so by
exploiting the invariance of the solution under a unitary
transformation. Thus, if we introduce $\mathbb{T}' = \mathbb{T}\mathbb{V}$, $\mathbb{T} = \mathbb{T}'\mathbb{V}^\dagger$,
the density matrix is unchanged ($\mathbb{T}\mathbb{T}^\dagger = \mathbb{T}'\mathbb{V}^\dagger\mathbb{V}\mathbb{T}'^\dagger = \mathbb{T}'\mathbb{T}'^\dagger$) while
(3.14) becomes, after multiplication by $\mathbb{V}$ from the right,

$$\mathbb{h}^F\,\mathbb{T}' = \mathbb{T}'\,(\mathbb{V}^\dagger \boldsymbol{\epsilon}\, \mathbb{V})$$

Since $\boldsymbol{\epsilon}$ is Hermitian it can be brought to diagonal form by
suitable choice of $\mathbb{V}$; so $\boldsymbol{\epsilon}$ may be assumed diagonal without loss
of generality. On dropping the primes, the matrix equation is
seen to be equivalent to a set of equations, one for each column
of $\mathbb{T}$:

$$\mathbb{h}^F\,\mathbf{t}_i = \epsilon_i\,\mathbf{t}_i$$

In other words, the Hartree-Fock orbitals in LCAO form are
determined by solving the matrix eigenvalue equation

$$\mathbb{h}^F\,\mathbb{T} = \mathbb{T}\,\boldsymbol{\epsilon} \quad (3.15)$$


which is the exact analogue of the usual operator form. If the
set $\{\chi_i\}$ were complete, this equation (with infinite matrices)
would indeed be simply the matrix transcription of the exact
Hartree-Fock equation. By proceeding from first principles,
however, using the variational formulation with finite LCAO-type
orbitals, we have shown that truncation to finite basis form
allows us to obtain best approximations to Hartree-Fock orbitals
in the usual sense of variation theory.
The actual mode of solution (i.e. of reaching self-consistency)
is well-known and will not be discussed here.

We note, however, that the basis is usually non-orthogonal, with
overlap matrix $\mathbb{S}$, and that all equations may be modified
accordingly. The $\chi$ basis used so far may be related
symmetrically to the non-orthogonal basis ($\chi_0$ say) by (Lowdin 1950)

$$\chi = \chi_0\, \mathbb{S}^{-1/2} \quad (3.16)$$

When the previous equations are transcribed in terms of $\chi_0$, the
final equation (3.15) is simply replaced by

$$\mathbb{h}^F\,\mathbb{T} = \mathbb{S}\,\mathbb{T}\,\boldsymbol{\epsilon} \quad (3.17)$$

where all quantities are now defined in the non-orthogonal basis
(subscript 0 finally discarded).

It is also possible to use
gradient methods (e.g. McWeeny, 1956; for a recent review see
Garfield & Sutcliffe, 1974) in minimizing (3.2) directly, without
recourse to repeated solution of eigenvalue equations.
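A sketch of the Lowdin step (3.16): forming S^(-1/2) converts the generalized problem (3.17) back to the orthonormal-basis form (3.15). All matrices below are random stand-ins of my own construction, chosen only so that S is symmetric positive-definite:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5
B = rng.normal(size=(m, m))
S = B @ B.T + m * np.eye(m)         # a symmetric positive-definite "overlap" matrix
A = rng.normal(size=(m, m))
hF = A + A.T                        # stand-in Fock matrix in the non-orthogonal basis

# S^(-1/2) via the spectral decomposition of S:
s, U = np.linalg.eigh(S)
S_half_inv = U @ np.diag(s**-0.5) @ U.T

# Solve hF T = S T eps by diagonalizing the Lowdin-transformed matrix:
eps, C = np.linalg.eigh(S_half_inv @ hF @ S_half_inv)
T = S_half_inv @ C                  # back-transform to the original basis

assert np.allclose(hF @ T, S @ T @ np.diag(eps))   # generalized problem, eq. (3.17)
assert np.allclose(T.T @ S @ T, np.eye(m))         # S-orthonormality of the orbitals
```

The back-transformed eigenvectors satisfy (3.17) directly and are orthonormal with respect to the overlap metric, which is the practical content of the symmetric orthogonalization.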

4. MULTI-SHELL SCF THEORY


In dealing with electron configurations containing one or
more incompletely filled shells, various difficulties arise:
the states of the configuration are usually not well described by
single determinants, and if a single determinant is used it may
give a Hamiltonian $\mathbb{h}^F$ which lacks the full symmetry of the atom
or molecule considered, giving SCF orbitals (and a resultant wave
function) which violate symmetry requirements. An example will
make this clear.


In their work on atoms, Hartree and his collaborators used
a method of "spherical averaging" in order to set up a one-electron
Hamiltonian ($\mathbb{h}^F$) with spherical symmetry. In dealing
with boron, for example, the charge distribution due to a single
2p electron (in the orbital $p_0$, say) would have the form $|p_0(\mathbf{r})|^2$,
with axial symmetry; Hartree performed an angular
integration, to obtain a potential function depending on radial
distance only, and used this function in solving the radial wave
equation. Suppression of the angle-dependence of the potential
was essential (in order to separate the variables) and the
resultant potential was identical with that of a charge density

$$\tfrac{1}{3} \left\{ |p_{+1}(\mathbf{r})|^2 + |p_0(\mathbf{r})|^2 + |p_{-1}(\mathbf{r})|^2 \right\}$$

i.e. one sixth of the charge

density for a filled shell of doubly occupied 2p orbitals.

In this way, an incompletely filled shell was treated using the
closed-shell theory with the introduction of a "fractional
occupation number" in the electron interaction terms. Another
way of looking at this procedure, which ensures that the SCF
orbitals are symmetry constrained, is to regard the symmetrized
charge density (or, more generally, density matrix) as an
ensemble average in which the densities associated with all
states of the configuration (i.e. all permitted values of the
quantum numbers $m_\ell$ and $m_s$ which specify the orbital and spin
symmetry) are combined with equal weight factors.


It can in fact be shown that ensemble averaging provides a
convenient method of deriving a fully symmetrized Hamiltonian
for a system with any number of partially occupied shells
e.g. an ionized molecule in which electrons have been ejected
from inner shells, as in photoelectron and Auger spectroscopy.
We discuss a system (generally a molecule) of any given symmetry

517

SOME APPLICATIONS OF PROJECTION OPERATORS

in a state described by a certain "configuration" corresponding


to an allocation of n K electrons to a "shell" of mK degenerate
orbitals, n L to another shell of mL orbitals, etc., assuming that
each degenerate set provides an irreducible representation of
the molecular symmetry group.

The individual states associated

with such a configuration would be obtained by vector-coupling


the determinants corresponding to all possible allocations of
electrons to the degenerate spin-orbitals of each shell, keeping
The energies of the
only the occupation numbers (nK) fixed.
states would be calculated by solving a corresponding secular
equation;

but the sum of the energies of the states (giving the
average energy of the configuration) is equal to the sum of the
diagonal elements of the secular matrix, which are energy
expectation values computed using all possible single determinants.
The average energy thus coincides with that of an ensemble of
systems in which all states of the configuration receive equal
weight. It is not difficult to find the ensemble-averaged
density matrices (which exhibit the full symmetry of the molecule,
just as in Hartree's averaging of the electron density) and the
resultant average energy.
Let us denote the orbitals of shell K by $\phi_1^K, \phi_2^K, \ldots, \phi_{m_K}^K$.

The energy expression is found to be (McWeeny, 1974)

$$E_{av} = \sum_K \left[ \frac{n_K}{m_K} H_K + \frac{n_K (n_K - 1)}{m_K (2 m_K - 1)} G_{KK} \right] + \frac{1}{2} \sum_{K,L}{}' \frac{n_K n_L}{m_K m_L} G_{KL} \quad (4.1)$$

where the primed sum excludes K = L, and

$$H_K = \sum_i \langle \phi_i^K | h | \phi_i^K \rangle \quad (4.2)$$

$$G_{KL} = \sum_{i,j} \left[ \langle \phi_i^K \phi_j^L | g | \phi_i^K \phi_j^L \rangle - \tfrac{1}{2} \langle \phi_i^K \phi_j^L | g | \phi_j^L \phi_i^K \rangle \right] \quad (4.3)$$

Evidently $H_K$ is the energy of a half-filled shell K (1 electron
per orbital), $G_{KK}$ is a corresponding average electron interaction
energy, and $G_{KL}$ gives an inter-shell repulsion energy for half-filled
shells K and L. The fractional occupation of the various
shells appears only in the numerical coefficients in (4.1).


We pass directly to a finite basis approximation, as used
in Section 3, by expressing the orbitals for all shells in terms
of a single set of m basis functions $\{\chi_i\}$ initially assumed
orthonormal. The set of $m_K$ orbitals of shell K is then defined
by an m x $m_K$ matrix $\mathbb{T}_K$ of expansion coefficients, $\phi^K = \chi \mathbb{T}_K$, and
the K-shell subspace is characterised by the matrix $\mathbb{R}_K = \mathbb{T}_K \mathbb{T}_K^\dagger$
representing a projection operator. On defining a $\mathbb{G}$ matrix
exactly as in (3.3) it is then a simple matter to express $E_{av}$ in
the form

$$E_{av} = \sum_K \nu_K\, \mathrm{tr}\, \mathbb{R}_K \left( \mathbb{h} + \tfrac{1}{2} \mathbb{G}_K \right) \quad (4.4)$$

where

$$\mathbb{G}_K = \mathbb{G}(\nu_K' \mathbb{R}_K) + \sum_{L (\neq K)} \mathbb{G}(\nu_L \mathbb{R}_L) \quad (4.5)$$

and the fractional occupation numbers are

$$\nu_K = \frac{n_K}{m_K}, \qquad \nu_K' = \frac{2(n_K - 1)}{2 m_K - 1} \quad (4.6)$$

Clearly $0 < \nu_K \leq 2$, while the modified occupation number $\nu_K'$
vanishes when the shell considered contains only one electron
(eliminating a "self-interaction"). The terms in (4.5) are
defined exactly as in (3.3) and (3.4).
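Assuming the definitions $\nu_K = n_K/m_K$ and $\nu_K' = 2(n_K - 1)/(2m_K - 1)$ for the occupation numbers of (4.6) (an assumption on my part, chosen to be consistent with the limits quoted in the text), the limiting behaviour is easy to check:

```python
def occupations(n_K, m_K):
    """Fractional occupation numbers for a shell of m_K degenerate
    orbitals holding n_K electrons (assumed form of eq. (4.6))."""
    nu = n_K / m_K
    nu_prime = 2 * (n_K - 1) / (2 * m_K - 1)
    return nu, nu_prime

# A closed shell (n_K = 2 m_K) recovers the factor 2 of Section 3:
assert occupations(6, 3) == (2.0, 2.0)
# A shell holding a single electron has nu' = 0 (no self-interaction):
assert occupations(1, 3)[1] == 0.0
# In general 0 < nu <= 2:
nu, _ = occupations(3, 3)
assert 0 < nu <= 2
```

With these forms, a closed shell gives G_K = G(2R_K) + ..., matching the closed-shell G(2R) of (3.3), while a one-electron shell contributes no intra-shell interaction.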

The stationary value condition $\delta E_{av} = 0$, under variations
$\mathbb{R}_K \rightarrow \mathbb{R}_K + \delta\mathbb{R}_K$, is now accompanied by constraints which
ensure the orthonormality of orbitals in the same shell and the
mutual orthogonality of orbitals in different shells. In terms
of the matrices these constraints become (cf. (3.7))

$$\mathbb{R}_K \mathbb{R}_L = \delta_{KL}\, \mathbb{R}_K \quad \text{(all K, L)} \quad (4.7)$$

where K, L label subspaces spanned by the orbitals of the fully
occupied, partially occupied, and empty shells. The matrix which
describes projection onto the set of empty orbitals (shell Z,
say) will be denoted, where necessary, by $\mathbb{R}_Z$, and the resolution
of the identity (represented by the m x m unit matrix) becomes

$$\sum_K \mathbb{R}_K + \mathbb{R}_Z = \mathbb{1} \quad (4.8)$$

Evidently, we shall need to consider explicitly only variations
of the occupied $\mathbb{R}_K$.
The method used in the closed-shell case again applies and
yields most general constrained variations of the form

$$\delta\mathbb{R}_K = \sum_{L (\neq K)} \left( \mathbb{X}_{KL} + \mathbb{X}_{KL}^\dagger \right) \quad \text{(all K)} \quad (4.9)$$

where $\mathbb{X}_{KL}$ is a matrix with only a KL-projection:

$$\mathbb{X}_{KL} = \mathbb{R}_K \mathbb{M}_{KL} \mathbb{R}_L \quad (4.10)$$

where $\mathbb{M}_{KL}$ is an arbitrary m x m matrix. The second of the
constraints (4.7) yields another condition, which in turn implies

$$\mathbb{X}_{LK} = -\mathbb{X}_{KL}^\dagger \quad (4.11)$$

The stationary value condition on the energy then follows by
substituting (4.9) in (4.4). By use of the symmetry condition
(3.10), the first-order change may be put in a form exactly
analogous to (3.11):

$$\delta E_{av} = \sum_K \nu_K\, \mathrm{tr}\, \delta\mathbb{R}_K\, \mathbb{h}_K \quad (4.12)$$

520

R. McWEENY

where the effective Hamiltonian of an electron in shell K is

$$\mathbb{h}_K = \mathbb{h} + \mathbb{G}_K \quad (4.13)$$

Exactly as in the closed-shell theory, this differs from the
matrix $\mathbb{h} + \tfrac{1}{2}\mathbb{G}_K$ which appears in the expression for the total
electronic energy, where the $\mathbb{G}_K$ term takes a factor $\tfrac{1}{2}$ to avoid
counting interactions twice in the summation over all shells.
The total energy (4.4) may be written

$$E_{av} = \sum_K \nu_K\, \mathrm{tr}\, \mathbb{R}_K\, \mathbb{h}_K^{ad} = \sum_K \nu_K\, \mathrm{tr}\, \mathbb{R}_K \left( \mathbb{h} + \tfrac{1}{2}\mathbb{G}_K \right) \quad (4.14)$$

where the Hamiltonian which gives an "additive partitioning"
of the energy is

$$\mathbb{h}_K^{ad} = \mathbb{h} + \tfrac{1}{2}\mathbb{G}_K \quad (4.15)$$
and the expression (4.14) is familiar from closed-shell theory.
Substitution of (4.9) in (4.4) yields the stationary
condition, with all constraints incorporated,

$$\delta E = \sum_{K (\mathrm{occ})} \sum_{L (\neq K)} \nu_K \left[ \mathrm{tr}\, \mathbb{X}_{KL}\, \mathbb{h}_K + \mathrm{tr}\, \mathbb{X}_{KL}^\dagger\, \mathbb{h}_K \right] \quad (4.16)$$

The terms in the square bracket are complex conjugate: each
trace must therefore vanish separately, for arbitrary $\mathbb{M}_{KL}$.
This implies, taking L = Z,

$$\mathbb{R}_K\, \mathbb{h}_K\, \mathbb{R}_Z = \mathbb{0} \quad \text{(all K)} \quad (4.17)$$

and, on taking the remaining terms in pairs and noting (4.11),

$$\mathbb{R}_K \left( \nu_K \mathbb{h}_K - \nu_L \mathbb{h}_L \right) \mathbb{R}_L = \mathbb{0} \quad \text{(all K, L)} \quad (4.18)$$

where now K, L refer only to the occupied or partially occupied


shells.
As in the closed-shell case, it is convenient to obtain the
solution of these equations by factorizing each $\mathbb{R}$ matrix in the
form $\mathbb{R}_K = \mathbb{T}_K \mathbb{T}_K^\dagger$ and showing that the columns of $\mathbb{T}_K$ can be chosen
as eigenvectors of a certain effective Hamiltonian; this
Hamiltonian cannot be simply $\mathbb{h}_K$, however, owing to the presence
of inter-shell orthogonality constraints. We therefore seek a
single matrix $\bar{\mathbb{h}}$ such that (cf. (3.13))

$$\bar{\mathbb{h}}\, \mathbb{R}_K - \mathbb{R}_K\, \bar{\mathbb{h}} = \mathbb{0} \quad \text{(all K)} \quad (4.19)$$

be the necessary and sufficient condition for the $\mathbb{R}$ matrices to
satisfy (4.17) and (4.18); for then $\mathbb{R}_K (= \mathbb{T}_K \mathbb{T}_K^\dagger)$ can be
constructed from a set of $m_K$ eigenvectors of $\bar{\mathbb{h}}$.

To this end it is convenient to define projection matrices
for the union of two subspaces, e.g. $\mathbb{R}_{KZ} = \mathbb{R}_K + \mathbb{R}_Z$. The eigenvectors
of a matrix such as $\mathbb{R}_K\, \mathbb{h}_K\, \mathbb{R}_K$ would clearly be confined to the
separate subspaces, irrespective of any iterative adjustment
of the $\mathbb{h}_K$, and such a matrix would therefore lack the flexibility
required of $\bar{\mathbb{h}}$. A matrix $\sum_K \mathbb{R}_{KZ}\, \mathbb{h}_K\, \mathbb{R}_{KZ}$ would have eigenvectors
corresponding to mixtures of occupied and empty subspaces but
would not permit variations associated with mixing between
different partially occupied shells. If, however, we choose

$$\bar{\mathbb{h}} = \sum_K a_K\, \mathbb{R}_{KZ}\, \mathbb{h}_K\, \mathbb{R}_{KZ} + \sum_{K<L} b_{KL}\, \mathbb{R}_{KL} \left( \nu_K \mathbb{h}_K - \nu_L \mathbb{h}_L \right) \mathbb{R}_{KL} \quad (4.20)$$

(K, L referring only to the fully or partially occupied shells) it
is easy to verify that the vanishing of all possible projections
of the commutator (4.19) is identically equivalent to the conditions
(4.17) and (4.18), for all non-zero values of the numerical
parameters $a_K$, $b_{KL}$.

The commutation conditions are in turn
satisfied when the matrices $\mathbb{T}_K$ are constructed from the
corresponding sets of eigenvectors of the equation

$$\bar{\mathbb{h}}\, \mathbb{T} = \mathbb{T}\, \boldsymbol{\epsilon} \quad (4.21)$$

exactly as in the closed-shell case (cf. (3.15)), where now the
orbitals of all shells are determined simultaneously from a
single, but more complicated, effective Hamiltonian.
Since the introduction of $\bar{\mathbb{h}}$ in (4.20) is merely a
mathematical device for obtaining the desired solutions, this
matrix does not necessarily have physical significance. In
fact, the eigenvalues in (4.21) may be shifted at will by changing
the numerical values of the arbitrary parameters $a_K$ and $b_{KL}$
(though the eigenvectors remain invariant). There is thus no
immediate analogue of, for example, the Koopmans theorem. This
arbitrariness may be removed in a physically satisfactory way by
requiring that the strictly one-electron part of $\bar{\mathbb{h}}$ (e.g. the
kinetic energy), which depends in no way on the inclusion of
interactions in an "effective field", shall yield the same
expectation values as the true one-electron Hamiltonian $\mathbb{h}$.
The "canonical Hamiltonian" so defined contains parameters
(McWeeny, 1974)

$$a_K = \frac{1}{N_S} \quad \text{(all K)}, \qquad b_{KL} = \frac{1}{\nu_K - \nu_L} \quad \text{(all K, L; } K < L \text{)} \quad (4.22)$$

where $N_S$ is the total number of shells (excluding the empty
shell Z). With this choice the eigenvalues normally follow
the expected sequence of orbital energies, although there is
still no direct analogue of the Koopmans theorem.
The actual solution of the multi-shell SCF problem is exactly parallel to that for a closed-shell system. From an initial approximation to the R_K (constructed from eigenvectors of any plausible "model" Hamiltonian), h is set up according to (4.20); solution of (4.21) gives sets of eigenvectors from which the R_K may be calculated; h may then be revised and the process continued until self-consistency is achieved. During the iteration, the eigenvectors belonging to a given shell are easily identified by projection: if R_K c = c, then c belongs to shell K, while if R_K c = 0 it belongs to another shell.

As in the closed-shell case, various devices may be used to speed convergence; and the modifications necessary to admit basis-set non-orthogonality are exactly similar.
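The iterative cycle just described (build the effective Hamiltonian from the current R_K, solve the single eigenvalue problem, regroup the eigenvectors into shells, repeat to self-consistency) can be sketched in a few lines. The sketch below is only an illustration, not McWeeny's procedure: it assumes an orthonormal basis, and the electron-interaction build G(R) is replaced by an invented linear coupling g·sum(R_K), purely to close the loop.

```python
# Schematic multi-shell SCF cycle (an illustrative sketch, not McWeeny's code).
# Assumptions: orthonormal basis; the electron-interaction build G(R) is
# replaced by a made-up linear coupling g * sum(R_K), purely to close the loop.
import numpy as np

def multishell_scf(h_core, shell_sizes, g=0.1, max_iter=200, tol=1e-10):
    """Return the shell projection matrices R_K at self-consistency."""
    m = h_core.shape[0]
    R = [np.zeros((m, m)) for _ in shell_sizes]
    for _ in range(max_iter):
        h_eff = h_core + g * sum(R)          # stand-in for the build in (4.20)
        _, C = np.linalg.eigh(h_eff)         # one eigenvalue problem, cf. (4.21)
        R_new, lo = [], 0
        for n_K in shell_sizes:              # lowest eigenvectors fill shell 1, etc.
            c = C[:, lo:lo + n_K]
            R_new.append(c @ c.T)            # R_K = c c^T: an idempotent projector
            lo += n_K
        if max(abs(Rn - Ro).max() for Rn, Ro in zip(R_new, R)) < tol:
            break
        R = R_new
    return R_new
```

Because each R_K is assembled from orthonormal eigenvector columns, R_K² = R_K, tr R_K = n_K and R_K R_L = 0 hold at every iteration, which is exactly the projection test used above to assign an eigenvector to its shell.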


Finally, we note that in certain special cases an ensemble average, over a particular degenerate set of states, may coincide with the energy of a single symmetry-constrained state. Examples have been given by Roothaan (1960), all for the case of a single open shell, the best known being that of n_1 doubly-occupied orbitals and n_2 singly occupied with all spins parallel. Such states yield energy expressions of the form (4.4), apart from a modification of the weights of the J and K matrices in the open-shell G matrix.

5. MULTI-CONFIGURATION SCF THEORY


It is not generally possible to obtain an energy expression

from a many-determinant wavefunction in terms of projection


operators alone;

the two-electron density matrix is not in general

determined by projection operators for the occupied orbitals.


There is however one special case, which lends itself well to
treatment along the lines of the last Section and which we now
consider.
A restricted, but extremely effective, form of MC SCF theory was introduced by Clementi & Veillard (1967) and by Das & Wahl (1966): these authors consider a system with a non-degenerate closed-shell ground state, represented in first approximation by a single determinant Φ_0 of n (= ½N) doubly occupied orbitals, and admit mixing with all determinants Φ_TU in which a pair of electrons has been transferred from an occupied orbital φ_T to an unoccupied orbital φ_U. The approximate wave function is thus

Φ = a_0 Φ_0 + Σ_{T,U} a_TU Φ_TU    (5.1)

and yields an expectation value (taking all quantities real, for convenience)

E = a_0² ⟨Φ_0|H|Φ_0⟩ + 2 Σ_{T,U} a_0 a_TU ⟨Φ_0|H|Φ_TU⟩ + Σ_{T,U} Σ_{T',U'} a_TU a_T'U' ⟨Φ_TU|H|Φ_T'U'⟩    (5.2)

It is then possible, after expressing the orbitals {φ_A, …, φ_T, …, φ_U, …} in terms of a common basis set {χ_r}, to rewrite the energy expression as

E = Σ_T ν_T tr(R_T h_T^ad) + Σ_U ν_U tr(R_U h_U^ad)    (5.3)

where T runs over the n orbitals in Φ_0, and U over the "weakly occupied" orbitals in the Φ_TU (in general a subset of the remaining m − n linearly independent functions). The matrices R_T and R_U represent projection operators onto the one-dimensional subspaces spanned by φ_T and φ_U, respectively, with expansion coefficients collected into columns c_T and c_U:

R_T = c_T c_Tᵀ,   R_U = c_U c_Uᵀ    (5.4)

The matrices h_T^ad and h_U^ad are effective Hamiltonian matrices which give an "additive partitioning" of the energy into orbital contributions, weighted with occupation numbers ν_T and ν_U.
We have written the expression given by Clementi & Veillard (1967) in the form (5.3) to emphasise the parallel with the multi-shell SCF theory of Section 4: the matrices h_T^ad and h_U^ad for the "strongly" and "weakly" occupied orbitals, respectively, may be written

h_T^ad = h + ½ G_T^ad,   h_U^ad = h + ½ G_U^ad    (5.5)

where the electron interaction terms are defined presently.


To proceed, we introduce an ensemble interpretation in which, for example, |a_TU|² indicates the fractional number of systems in which an electron pair has been moved from φ_T to φ_U. Thus, the fractional number with an electron pair in orbital φ_T is

P_T = a_0² + Σ_{T'(≠T),U} a_T'U²

whilst that of a pair in φ_U is P_U = Σ_T a_TU²; since normalization requires

a_0² + Σ_{T,U} a_TU² = 1

it follows easily that

P_T = 1 − Σ_U a_TU²,   P_U = Σ_T a_TU²    (5.6)

whilst the average occupation numbers are simply

ν_T = 2P_T,   ν_U = 2P_U    (5.7)
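The ensemble bookkeeping in (5.6) and (5.7) is easy to verify numerically. In the sketch below the CI coefficients a_TU are invented (two strongly occupied orbitals T and two weakly occupied orbitals U); the check confirms that the occupation numbers ν_K = 2P_K conserve the total electron count 2n.

```python
# Numerical check of the ensemble bookkeeping in (5.6)-(5.7).  The CI
# coefficients below are invented for illustration; T labels the two
# strongly occupied orbitals, U the two weakly occupied ones.
a = {(0, 0): 0.10, (0, 1): 0.05, (1, 0): 0.08, (1, 1): 0.02}   # a_TU (made up)
a0 = (1.0 - sum(c * c for c in a.values())) ** 0.5  # from a0^2 + sum a_TU^2 = 1

def P_T(T):          # probability that the pair is still in phi_T, eq. (5.6)
    return 1.0 - sum(a[T, U] ** 2 for U in (0, 1))

def P_U(U):          # probability that a pair has been promoted into phi_U
    return sum(a[T, U] ** 2 for T in (0, 1))

nu = [2 * P_T(0), 2 * P_T(1), 2 * P_U(0), 2 * P_U(1)]   # eq. (5.7)
print([round(v, 6) for v in nu])        # [1.975, 1.9864, 0.0328, 0.0058]
# the occupation numbers conserve the total electron count 2n = 4:
assert abs(sum(nu) - 4.0) < 1e-12
```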

Clementi & Veillard (1967) interpreted their energy expression in terms of the "pair probabilities" (5.6), and this interpretation has been developed further by Golebiewski & Nowak-Broclawik (1973). The fractional numbers of systems in the ensemble with one pair in T and one in T', or with one in T and one in U, are easily seen to be

P_TT' = 1 − Σ_U a_TU² − Σ_U a_T'U²,   P_TU = Σ_{T'(≠T)} a_T'U²    (5.8)

The only other new quantities needed, in casting the energy expression in a neat form, are the "coupling coefficients"

w_TT',   w_TU    (5.9)

Clearly the "probabilities" determine the parts of the energy expression arising from diagonal matrix elements, the coupling coefficients those parts coming from the off-diagonal elements. To write the energy expression (5.2) in its most compact explicit form we use the letters K, L to label both strongly and weakly occupied orbitals, interpreting the quantities P_K, P_KL (= P_LK), w_KL (= w_LK) according to (5.6), (5.8) and (5.9), which give the only non-zero values.
Thus, for example, (5.3) becomes

E = Σ_K ν_K tr(R_K h_K^ad),   h_K^ad = h + ½ G_K^ad    (5.10)

and, explicitly, we find

E = Σ_K P_K [tr(R_K h) + tr R_K G(R_K)] + …    (5.11)

which contains pair energies and pair-pair interactions, with appropriate weight factors (P_K, P_KL), plus exchange terms arising from the CI. Comparison of the last two forms shows that

(5.12)

The stationary value problem may now be solved exactly as in Section 4.
We consider a variation of all the strongly and weakly occupied orbitals, with fixed values of the expansion coefficients a_TU, and use the property (3.10) to eliminate the first-order variations of the G_K. To first order, we find

δE = Σ_K ν_K tr(h_K δR_K)    (5.13)

where the effective Hamiltonians are

h_K = h + …    (5.14)

The difference between h_K and h_K^ad, in (5.14) and (5.10), is exactly analogous to that between (4.13) and (4.15), the factor ½ in (5.10) ensuring that electron interactions are not counted twice in summing the contributions to E.

The variation (5.13) must now be made to vanish subject to the constraints (4.7), where K, L are to run over all orbitals: T-type, U-type and any empty orbitals (Z) orthogonal to those involved in the CI expansion (5.1). Solution therefore proceeds along exactly the same lines: iterative solution of the eigenvalue equation (4.21) yields a set of self-consistent orbitals which minimize E for a given set of CI mixing coefficients.
Finally, we recall that the mixing coefficients are optimized by solution of the usual CI secular equations. In terms of the matrices R_K, the required matrix elements take the forms (using R_0 for the "closed-shell" R-matrix of Φ_0)

(5.15)

After each optimization of the orbitals, the CI coefficients a_TU must be redetermined by solving

H a = E a    (5.16)

with the matrix elements of H determined as above.


We shall not discuss the practical details of the double

528

R.McWEENY

iteration procedure.

It is, however, practicable and many

variations are possible;

for example, the repeated solution of

the eigenvalue equation for the orbitals may be replaced by


descent or two-by-two rotation methods.

Results obtained so far

have been striking (see, for example, Das & Wahl, 1970;

Wahl &

Das, 1970).

SOME APPLICATIONS OF PROJECTION OPERATORS - REFERENCES

E. Clementi & A. Veillard (1967) Theoret.chim.Acta, 7, 133
G. Das & A.C. Wahl (1966) J.Chem.Phys., 44, 876
G. Das & A.C. Wahl (1970) Phys.Rev.Letters, 24, 440
D. Garton & B.T. Sutcliffe (1974) in Specialist Periodical Reports, "Theoretical Chemistry", Vol. 1, Quantum Mechanics, Chemical Society (London)
A. Golebiewski & E. Nowak-Broclawik (1973) Mol.Phys., 26, 989
P.O. Lowdin (1950) J.Chem.Phys., 18, 365
R. McWeeny (1956) Proc.Roy.Soc.(Lond.), A235, 496
R. McWeeny (1960) Rev.Mod.Phys., 32, 335
R. McWeeny (1962) Phys.Rev., 126, 1028
R. McWeeny (1974) Mol.Phys. (in press)
C.C.J. Roothaan (1960) Rev.Mod.Phys., 32, 179
A.C. Wahl & G. Das (1970) in Adv. in Quant.Chem., 5, 261, ed. P.O. Lowdin, Academic Press, New York & London

MOLECULES IN ASTROPHYSICS

G. Winnewisser
Max-Planck-Institut für Radioastronomie,
D-53 Bonn, Germany

The total mass of our Galaxy is 2 × 10¹¹ M☉.* In the Galaxy more than 90 % of this mass is in the form of stars, and less than 10 % represents the tenuous gaseous material located between the stars: the interstellar matter (ISM). The principal constituents of interstellar matter are gas and fine dust particles. The gas consists mainly of hydrogen and helium, with an approximate ratio by mass of H : He : (all heavier elements) = 70 : 28 : 2. Dust accounts for about 1 % of the mass of the ISM. The raw material for the formation of young stars is supplied by the ISM. Conversely, stars can reach their final stable "white dwarf" stage only when their mass has decreased to less than 1.4 M☉. During phases of extensive mass loss, through stellar winds, novae, supernovae and planetary nebulae, stars expel mass into interstellar space. Thus, the ISM consists not only of matter left over from the formation of stars but also of matter that was processed through the interior of stars. Obviously stars and the ISM are not two separate entities but are very closely coupled through the cosmic recycling process of star formation and subsequent evolution.

* One solar mass, 1 M☉ = 2 × 10³³ g. In comparison, the mass of the earth is 6 × 10²⁷ g.

Diercksen et al. (eds.), Computational Techniques in Quantum Chemistry and Molecular Physics, 529-568.
All Rights Reserved. Copyright © 1975 by D. Reidel Publishing Company, Dordrecht-Holland.


The first evidence that interstellar molecules inhabit tenuous interstellar clouds came from the detection of CH, CN, and CH+. These strong absorption lines occur in the optical region and are transitions from the ground electronic state to excited electronic states. Several diffuse broad absorption features between 4400 and 6100 Å have not been identified, but may be of molecular origin. The first radio radiation from a molecule was discovered in 1963 (Weinreb et al.). It is the well-known Λ-type transition between the lowest rotational levels of the ²Π₃/₂ state of OH. In 1968 the detection of the first polyatomic molecule, ammonia, NH3, was announced (Cheung et al. 1968), followed in 1969 by formaldehyde, H2CO (Snyder et al. 1969), and water, H2O (Cheung et al. 1969). Since then more than 150 transitions of some 45 different molecules have been reported. With the improvement of receiver sensitivity, particularly in the millimeter-wave portion of the electromagnetic spectrum, and the knowledge of new laboratory rest frequencies, combined with more extensive interstellar line searches, the number of known interstellar molecules will certainly increase.

The known interstellar molecular transitions now span a frequency range from 0.8 GHz to 230 GHz and the extension of radio observations up to 300 GHz is expected. The number of different molecules, their sizes and chemical complexity and the variety of the degrees of excitation reveal a totally unexpected complexity of the cool gas component of interstellar clouds. Interstellar molecules have become of importance to radioastronomy, spectroscopy and chemistry for a number of reasons.

(1) Within our Galaxy molecules are found predominantly in dense and cool interstellar clouds. These regions cannot be investigated by the λ 21-cm hyperfine line of atomic hydrogen, because most of the hydrogen is in molecular form, nor by optical means, because the optical extinction is too large to "see" into them. Millimeter and microwave molecular transitions allow for the first time a detailed study of dense clouds.


(2) Observations of different interstellar molecules and various transitions of any one molecule provide an excellent tool for studying the physical state, dynamics, radiation fields and chemistry within individual clouds. Many of these clouds appear to be likely sites for star formation.
(3) Maser emission spectra of OH, H2O and SiO allow the study of the early and late stages in the evolution of stars. In particular, this applies to the study of red giant variable stars in the SiO lines, whereas the OH and H2O masers appear to be primarily associated with the early phases of star formation.
(4) Optical and radio observations provide an indirect method of measuring the intensity and spectrum of the microwave background radiation.
(5) Molecules are now being increasingly used to study galactic structure, in particular the galactic nucleus, thus complementing and extending the information gained from both the λ 21-cm line of atomic hydrogen and the recombination lines.
(6) The presently known composition of molecules indicates that simple bio-molecules may exist in interstellar clouds. They may be detectable once their rest frequencies are precisely known.
(7) The discovery of simple free radicals and molecular ions in interstellar clouds allows the study of their spectra and derivation of their structure, which has not been possible by laboratory techniques. Examples are CH, CN, C2H, N2H+.
In Table I we provide a complete list of all interstellar molecules detected at optical and radio frequencies up to September 1974. Interstellar transitions in the radio frequency portion of the electromagnetic spectrum are displayed in Fig. 1, together with the laboratory spectra of three molecules, HCN, H2CO and H2CNH. Detailed comparisons between detected interstellar spectra and the rich rotational spectrum of most molecules show that only a few transitions of each individual molecule have been detected in interstellar space. Interstellar transitions occur in emission or absorption. All molecular lines in the millimeter wave region are seen in emission, whereas at centimeter wavelengths they can be seen both in


Table I. Molecules Detected in Interstellar Space
(Listing ends Aug. 1974. Recent detections are underlined)

a) Optical or UV detection
12CH, 12CH+, 13CH+, CN, CO, 13CO, H2, HD

b) By radioastronomical techniques
…, X-ogen (U 89190), (HNC?) (U 90665), …


Fig. 1: Comparison between interstellar molecular transitions and laboratory spectra. All interstellar lines which have been detected in emission or absorption are plotted in the frequency range below 220 GHz. The atmospheric attenuation (~10 dB) due to O2 and H2O is indicated by the dashed areas. The laboratory spectra of 3 molecules are plotted in the same frequency range.


absorption and emission. This behaviour outlines another field of current interest: the understanding of the excitation mechanisms which overpopulate or underpopulate certain energy levels with respect to each other.

It is not possible to present in a short review paper a systematic and comprehensive account of the subject of microwave molecular spectroscopy together with the astrophysical developments in observation and interpretation of interstellar millimeter and microwave molecular spectra. I will try to give in Section I a short outline of how to derive from microwave molecular spectra information about the molecules. In part II certain aspects of interstellar molecular spectroscopy will be discussed, such as the distribution of molecules in various interstellar sources and the physical parameters of some clouds as derived from the observed interstellar molecular spectra.
I. MICROWAVE SPECTRA AND MOLECULAR STRUCTURE
a) Diatomic and linear molecules
Most of the spectra observed in the microwave and far infrared portion of the electromagnetic spectrum are pure rotational spectra. Table II lists the most important energy terms necessary both to explain the high resolution spectra and to interpret the observed interstellar spectra.

Table II
Most important Energy Terms for Microwave Spectroscopy

Rigid Rotation
Centrifugal Distortion
Internal Rotation, Rotation-Inversion
Nuclear Quadrupole Coupling
Electron Spin-Rotation Coupling
Nuclear Spin-Rotation Coupling
Coriolis Coupling
Zeeman and Stark Effect


The rotational spectrum of a molecule depends primarily on the structure of the molecule. For a diatomic or linear polyatomic molecule such as CO, HCN, HCCCN and others, the rotational spectrum assumes a simple form. It consists of a series of equidistant lines. The frequencies of the individual transitions are readily calculated by using

ν_ul = (E_u − E_l) / h

where E_u and E_l are the energies of the upper and lower level involved in the transition, together with the analytic energy expression for the rigid rotor as derived from quantum mechanics (see for example Gordy and Cook, 1970)

E_J = (h² / 8π²I_0) J(J+1)    (J = 0, 1, 2, …)

I_0 is the moment of inertia perpendicular to the internuclear axis passing through the center of mass. J is the rotational quantum number which gives the angular momentum in units of ħ. Fig. 2 shows the energy levels of 12C32S and 12C34S together with the rotational spectrum

ν = (E_{J+1} − E_J) / h = 2 (h / 8π²I_0) (J+1) = 2B_0 (J+1)

derived from the selection rule ΔJ = +1. Here we have introduced the rotational constant

B_0 = h / 8π²I_0

which is proportional to the reciprocal moment of inertia I_0. The proportionality factor is given by B × I = 5.053763 × 10⁵ MHz amu Å². B values of presently detected interstellar molecules extend over a wide range of values. For example, light molecules such as CH and OH have B values of 417.7 GHz and 555 GHz, respectively. Their first rotational transitions lie therefore in the far infrared region. Heavy molecules like cyanoacetylene, HCCCN, and carbonyl sulfide, OCS, have small values of B_0 = 4.549 GHz and 6.081 GHz respectively. Rotational transitions of these molecules occur at nearly regular intervals throughout the entire microwave region.
The precise determination of the rotational constant from the microwave and millimeter-wave spectra allows a precise evaluation of the internuclear distance in the rotational ground state, since

I_0 = μ r_0²

where μ is the reduced mass and r_0 the effective internuclear distance.

Fig. 2: Energy levels and rotational spectrum of CS. The numbers next to the arrows indicate the transition frequencies in MHz.

In the determination of I_0 from the observed spectra, deviations of the molecule from the model of a rigid rotor have to be taken into account, i.e. centrifugal distortion effects. These produce slight shifts in the position of the rigid-rotor lines. Centrifugal distortion constants can be determined with the measurements of more than one rotational transition:

ν = 2B_0 (J+1) − 4D_0 (J+1)³ + …

where the centrifugal stretching constant D_0 ≪ B_0. As an example we give 12C32S (Kewley et al. 1963):

B_0 = 24 495.574 6(51) MHz;  D_0 = 40.24(29) kHz;  r_0 = 1.53772(6) Å
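These constants tie together neatly: the line frequencies of Fig. 2 follow from ν = 2B_0(J+1) − 4D_0(J+1)³, and r_0 from I_0 = 505376/B_0(MHz). A short numerical check (the atomic masses used below are rounded literature values, an added assumption, so r_0 agrees with the quoted figure only to about the last digit):

```python
# Line frequencies and r0 for 12C32S from the constants quoted above:
#   nu(J+1 <- J) = 2 B0 (J+1) - 4 D0 (J+1)^3,   I0 = 505376 / B0(MHz).
# The atomic masses are rounded literature values (an added assumption).
B0 = 24495.5746            # MHz
D0 = 0.04024               # MHz  (= 40.24 kHz)

def nu(J):                 # frequency of the J+1 <- J transition, in MHz
    return 2.0 * B0 * (J + 1) - 4.0 * D0 * (J + 1) ** 3

I0 = 505376.0 / B0                     # moment of inertia, amu A^2
mC, mS = 12.0, 31.97207                # masses of 12C and 32S, amu
mu = mC * mS / (mC + mS)               # reduced mass
r0 = (I0 / mu) ** 0.5                  # effective internuclear distance, A

print(round(nu(0), 1), round(nu(1), 1))   # ~48991.0 and ~97981.0 MHz (cf. Fig. 2)
print(round(r0, 5))                        # ~1.53772 A
```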
In the case of linear polyatomic molecules the moment of inertia depends on the different internuclear distances and can be evaluated only if the B_0 values for two or more isotopically substituted molecules are known. Moments of inertia can be obtained from the spectra with an accuracy of seven or more significant figures. However, an evaluation of the molecular structure with accuracy comparable to that of the moments of inertia is difficult, if not impossible. Sources of uncertainty in the determination of the structural parameters are their dependence on the vibrational energy: in the ground state the molecule still retains the zero-point vibrational energy. Additional effects are the uncertainty in Planck's constant, and higher-order corrections such as electronic effects, the wobble-stretch effect and others. Different procedures have been proposed to correct for the various vibrational effects. This has led to different conceptions of interatomic distances. Four types of bond lengths are defined:

(1) The r_0 structure. The effective bond length for the ground vibrational state.

(2) The r_e structure. The equilibrium bond length for the hypothetical vibrationless state. Vibrational effects including zero-point vibration have been removed to obtain B_e.


(3) The r_s structure. The substitution bond length is derived from the isotopic substitution method. The effects of zero-point vibration are partially cancelled but not actually removed.

(4) The ⟨r⟩ structure. The average bond distances are evaluated from a partial correction of vibrational effects involving the mean-square harmonic vibrational amplitude.

In Table III results obtained from the various methods of calculating the HCN bond distances are summarized and compared, along with the dispersion in the values.
COMPARISON OF DIFFERENT TYPES OF INTERATOMIC DISTANCES IN HCN (Å)a

                               r(C-H)                  r(C≡N)                  r(N-H)
r_s                            1.06316 ± 0.00010 b     1.15512 ± 0.00015 b     2.21828 ± 0.00012 b
r_0                            1.064   ± 0.011 b       1.1564  ± 0.0024 b      2.2205  ± 0.0090 b
r_e (with α and γ)c            1.06549 ± 0.00024 d     1.15321 ± 0.00005 d     2.21870 ± 0.00020 d
r_e (Rank et al.)e             1.06593 ± 0.00010       1.15313 ± 0.00002       2.21906 ± 0.00008
r_e (from 100, 020, 001)f      1.06573 ± 0.00080 d     1.15317 ± 0.00025 d     2.21876 ± 0.00060 d

a All calculations used I = 505376/B(MHz) and unified mass units given by L. A. König, J. H. E. Mattauch and A. H. Wapstra, Nucl. Phys. 31, 18 (1962).
b The r_s and r_0 distances are averages of the values obtained for all possible permutations of B_0 values. The uncertainty is due to the dispersion of the calculated values and does not allow for measurement errors.
c B_e(HCN) = 44512.36 ± 0.23 MHz; B_e(DCN) = 36329.49 ± 1.53 MHz.
d The uncertainty is the estimated error limit from the B_e values.
e See D. H. Rank, G. Skorinko, D. P. Eastman and T. A. Wiggins, J. Opt. Soc. Amer. 50, 421 (1960).
f Calculated using B_e = (5/2)B_0 − ½B_100 − ½B_020 − ½B_001 for HCN, DCN and D13CN.
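The combination of B values used in the table's footnote f can be checked algebraically: with the standard vibrational dependence B_v = B_e − Σ_i α_i(v_i + d_i/2) and d_2 = 2 for the doubly degenerate bend of a linear triatomic, the 100, 020 and 001 states give B_e = (5/2)B_0 − ½(B_100 + B_020 + B_001) exactly, whatever the α's. A numerical sketch in which only B_e is taken from the table and the α values are invented:

```python
# Algebraic check of the Be combination in footnote f.  With
#   B(v1,v2,v3) = Be - a1*(v1+1/2) - a2*(v2+1) - a3*(v3+1/2)
# (the bend is doubly degenerate, d2 = 2), the 100, 020 and 001 states give
#   Be = (5/2)*B0 - (B100 + B020 + B001)/2   exactly, whatever the alphas.
# Only Be is taken from the table; the alpha values are invented.
Be = 44512.36                        # MHz, Be(HCN) quoted in the table
a1, a2, a3 = 10.0, -3.6, 10.4        # MHz, hypothetical rotation-vibration constants

def B(v1, v2, v3):
    return Be - a1 * (v1 + 0.5) - a2 * (v2 + 1.0) - a3 * (v3 + 0.5)

B0, B100, B020, B001 = B(0, 0, 0), B(1, 0, 0), B(0, 2, 0), B(0, 0, 1)
Be_recovered = 2.5 * B0 - 0.5 * (B100 + B020 + B001)
print(round(Be_recovered, 6))        # recovers Be
assert abs(Be_recovered - Be) < 1e-9
```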

The vibrational dependence of the B values has the general form

B_v = B_e − Σ_i α_i (v_i + d_i/2) + Σ_{i≤j} γ_ij (v_i + d_i/2)(v_j + d_j/2) + …

where the sums run over the normal modes and d_i is the degeneracy of mode i. The rotation-vibration constants α, γ are very much smaller than the rotational constants. Costain (1958) has proposed the use of the substitution distances (r_s) as a means of partially cancelling out the effects of zero-point vibration without really removing it, the idea being that r_s distances are more consistent (have less scatter) regardless of which sets of isotopic species are used. It has been pointed out (Winnewisser et al., 1971) that the r_s distances or substitution bond lengths are still the most reliable and readily measurable quantities for structural comparisons between molecules. The r_s structures of various molecules derived from microwave data are compared with each other below. The numbers in parenthesis are the experimental errors in units of the last significant figures (Winnewisser et al., 1972).

Methinophosphide   HCP      r(H-C) = 1.0667(5)      r(C≡P) = 1.5421(5)
Hydrogen cyanide   HCN      r(H-C) = 1.06316(10)    r(C≡N) = 1.15512(15)
Cyanoacetylene     HCCCN    r(H-C) = 1.058          r(C≡C) = 1.205     r(C-C) = 1.378     r(C≡N) = 1.159
Fluoroacetylene    HCCF     r(H-C) = 1.053          r(C≡C) = 1.198     r(C-F) = 1.279
Fulminic acid      HCNO     r(H-C) = 1.0276(4)      r(C≡N) = 1.1679(4)   r(N-O) = 1.1994(4)
Nitrous oxide      N2O      r(N≡N) = 1.128(2)       r(N-O) = 1.187(3)

(bond lengths in Å)

Note for example that the r_s(CH) of HCNO is anomalously short. The shrinkage by 0.0356 Å of the CH bond length cannot be explained in terms of changing hybridization of the carbon atoms, since the carbon atom in both HCN and HCNO forms predominantly sp bonds, and any other hybridization must lead to a longer CH distance. A reinterpretation of the r_s structure parameters of fulminic acid in the light of the quasilinear model leads to an explanation of the short CH bond distance of 1.027 Å as the projection of a CH bond length of 1.060(5) Å upon the heavy-atom axis (Winnewisser et al. 1974).
b) Symmetric and asymmetric rotor molecules

For nonlinear molecules, the rotational energies and the resulting spectra are more complex. The inertial properties of the molecule have to be described by the three moments of inertia

I_a = h / 8π²A_0,    I_b = h / 8π²B_0,    I_c = h / 8π²C_0

with the definition I_a ≤ I_b ≤ I_c; A_0, B_0 and C_0 are the three rotational constants.

If two of the three moments of inertia are equal, I_a < I_b = I_c, the molecule is a symmetric top, such as CH3CN, CH3CCH, NH3 and others. The three rotational degrees of freedom are quantum mechanically described by a complete set of commuting operators: the Hamiltonian operator H, the square of the total angular momentum operator P² and its projection on the figure axis P_z. The rotational energy can therefore be expressed by the quantum number J of the total angular momentum and K, its projection along the symmetry axis:

E/h = B_0 J(J+1) + (A_0 − B_0) K²,    with J ≥ K


I_a is the moment of inertia about the a-axis (symmetry axis) and I_b that about the b-axis, which is perpendicular to the a-axis. Note that there exists a degeneracy in the K levels, due to the values of −K and +K yielding the same energy. The selection rules are ΔJ = +1 and, in addition on account of symmetry, ΔK = 0. Therefore each line of the rotational spectrum consists of J+1 components, which coincide for the rigid molecule but are separated if centrifugal distortion is present. An example of such a spectrum is presented in Fig. 3, where the laboratory spectrum of CH3CN is compared with the astrophysical spectrum (Solomon et al. 1971, 1973).
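The K-structure visible in Fig. 3 follows from the symmetric-top energy expression above once centrifugal distortion is added; to lowest order, ν(J+1 ← J, K) = 2B_0(J+1) − 4D_J(J+1)³ − 2D_JK(J+1)K². The constants below are approximate literature values for CH3CN (not quoted in the text), used only to illustrate the splitting of the J = 6 ← 5 multiplet:

```python
# K-structure of a symmetric-top line, to lowest order in the distortion:
#   nu(J+1 <- J, K) = 2 B0 (J+1) - 4 DJ (J+1)^3 - 2 DJK (J+1) K^2.
# Constants are approximate literature values for CH3CN (an added assumption).
B0, DJ, DJK = 9198.899, 0.003807, 0.1767     # MHz

def nu(J, K):
    Jp = J + 1
    return 2.0 * B0 * Jp - 4.0 * DJ * Jp ** 3 - 2.0 * DJK * Jp * K ** 2

for K in range(6):                            # K = 0 ... J for the J = 6 <- 5 line
    print(K, round(nu(5, K) / 1000.0, 4))     # GHz; total spread ~53 MHz, cf. Fig. 3
```

The K = 0 component falls near 110.383 GHz and K = 5 near 110.331 GHz, roughly the frequency window shown in Fig. 3.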
With all three moments of inertia different, I_a ≠ I_b ≠ I_c, the molecule is classified as an asymmetric top. Then the Hamiltonian is

H_rot/h = A_0 J_a² + B_0 J_b² + C_0 J_c² + ¼ Σ_{α,β} τ'_ααββ J_α² J_β²

where J_a, J_b and J_c are components of the angular momentum in units of ħ, A_0, B_0 and C_0 are the effective rotational constants, and the τ'_ααββ are the centrifugal distortion constants as defined by Kivelson and Wilson (1952), summed over a, b, c. The centrifugal distortion constants are smaller than the rotational constants by factors of about 10⁻⁴ - 10⁻⁶. Higher-order terms in the angular momenta (six and higher powers) have to be included to predict precisely high-J and high-K energy levels. For all except the lightest molecules the series converges rapidly. The fact that only even powers of the angular momentum operators occur is related to the invariance of the Hamiltonian under time reversal.
The relationship of the rotational constants to the energy levels and hence to the observed spectra is complicated by the fact that it is not possible to write an explicit expression for the eigenvalues of the Hamiltonian of the asymmetric top, as was possible for the other cases. This problem is discussed in detail by King et al. (1943) and Cook and Gordy (1970). The influence of the centrifugal distortion constants is further complicated by the fact that only five linear combinations of the six τ' constants of the J⁴ term are determinable from the spectrum, as discussed by Watson (1966, 1967). A general description of the perturbation calculation by which the complete Hamiltonian is reduced to an effective rotational Hamiltonian within each vibrational state has been given by Mills (1972).
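Although no closed form exists for general J, the lowest levels are simple: for J = 1 the asymmetric-rotor secular problem factorizes and the three rigid-rotor levels are E(1_01) = B + C, E(1_11) = A + C and E(1_10) = A + B (in frequency units). A sketch using the vinyl cyanide constants quoted in Table IV, rigid-rotor limit only:

```python
# J = 1 asymmetric-rotor levels in closed form (rigid-rotor limit):
#   E(1_01) = B + C,   E(1_11) = A + C,   E(1_10) = A + B   (frequency units).
# Constants: vinyl cyanide, from Table IV (MHz); distortion is neglected here.
A, B, C = 49850.700, 4971.2141, 4513.8289

E101, E111, E110 = B + C, A + C, A + B
print(round(E110 - E111, 4))   # = B - C, the 1_10 / 1_11 asymmetry splitting
print(round(E101, 4))          # also the rigid-rotor 1_01 <- 0_00 frequency
```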


Fig. 3: Comparison of the interstellar emission spectrum of CH3CN with the laboratory absorption spectrum of the J = 6-5 transition (frequency range 110.33-110.39 GHz). The interstellar emission spectrum is seen in the direction of Sagittarius B2 (V = 63 km/sec) and shows the various values of K (Solomon et al., 1971). The line shapes of the laboratory spectra resemble a first derivative.

From the observed spectra all three moments of inertia can usually be determined with relative ease, once the characteristic low-J transitions have been assigned. Refined rotational constants and a set of centrifugal distortion constants can be derived from a more complete data set, which is obtained by a bootstrap procedure as suggested by Kirchhoff (1972). The rotational spectra of a large number of asymmetric top molecules have been studied and their structures have been determined. In Table IV we quote ground-state spectroscopic constants of some asymmetric top molecules whose spectra have been studied recently in the laboratory and are of potential astrophysical interest.

Table IV. Rotational Constants of some Asymmetric Rotors (in MHz)1

Molecule                                 A0                 B0                  C0                  Ref.
H2C2O     ketene                         282 081.(12)       10 293.972 2(59)    9 915.239 6(62)     a
HCCCHO    propynal                       68 035.299 4(429)  4 826.301 4(73)     4 499.510 7(69)     b
H2C2HCN   vinyl cyanide                  49 850.700(13)     4 971.214 1(8)      4 513.828 9(8)      c
H2C2HNC   vinyl isocyanide               51 479.457 8(30)   5 386.648 56(25)    4 868.940 21(25)    d
H2C2HNH2  vinylamine                     56 312.9(190)      10 034.756(58)      8 564.930(60)       e
HNCO      isocyanic acid                 912 712.288(136)   11 071.010 27(62)   10 910.577 48(64)   f
HCOSH     monothioformic acid, cis       62 927.723(9)      6 134.260 6(8)      5 584.753 9(8)      g
          monothioformic acid, trans     62 036.112(19)     6 125.305 6(5)      5 569.643 7(5)      g

1) Figures in parenthesis are standard errors in units of the last significant figures.
a) Johns, J.W.C., Stone, J.M.R. and Winnewisser, G., J.Molec.Spectrosc. 42, 523, 1972.
b) Winnewisser, G., J.Molec.Spectrosc. 46, 16, 1973.
c) Gerry, M.C.L. and Winnewisser, G., J.Molec.Spectrosc. 48, 1, 1973.
d) Yamada, K. and Winnewisser, M. (private communication) 1974.
e) Lovas, F., Clark, F.O. and Tiemann, E. (private communication) 1974.
f) Hocking, W.H., Gerry, M.C.L. and Winnewisser, G., Can.J.Phys. (to be published) 1975.
g) Hocking, W.H. and Winnewisser, G., J.C.S. Chem.Comm. (in press) 1975.


The assumed model works well for most molecules. However, for light molecules, such as H2O, H2S and others, the centrifugal distortion effects become huge (Helminger et al., 1972) and terms up to the tenth power in the angular momenta have to be introduced into the Hamiltonian. Each degree n in the angular momentum adds (n + 1) distortion coefficients to the reduced Hamiltonian. Thus for HDO the 1_10 - 1_11 transition at 80578.15 MHz is shifted by −199.1 kHz from its rigid-rotor position, whereas for the 6_16 - 5_23 transition at 138530.57 MHz of the same molecule the distortion effect accounts already for a shift of 15 541.49 MHz (De Lucia et al., 1971). The effects of centrifugal distortion on the line positions of several molecules are displayed in Fig. 4. Table V presents the distortion constants of vinyl cyanide, recently discovered in interstellar space.
Table V: Ground state spectroscopic constants of vinyl cyanide (Gerry and Winnewisser 1973)

Parameter                                  Value

Centrifugal distortion constants (MHz)
  ΔJ        (2.2448 ± 0.0021) × 10⁻³
  ΔJK       (−8.5442 ± 0.0018) × 10⁻²
  ΔK        2.7183 ± 0.0021
  δJ        (4.5716 ± 0.0037) × 10⁻⁴
  δK        (2.4575 ± 0.0097) × 10⁻²
  HJ        (4.73 ± 2.0) × 10⁻⁹
  HJK       (2.30 ± 1.7) × 10⁻⁷
  HKJ       (−9.03 ± 0.60) × 10⁻⁶
  HK        (4.51 ± 0.80) × 10⁻⁴
  hJ        (3.38 ± 0.83) × 10⁻⁹
  hJK       (−3.25 ± 3.25) × 10⁻⁷
  hK        (7.49 ± 2.2) × 10⁻⁵

Nuclear quadrupole coupling constants (MHz)
  −3.74
  0.127

Dipole moment
  μ_a (D)   3.68
  μ_b (D)   1.25
  μ   (D)   3.89 ± 0.08

c) Different effects in rotational spectra

In addition to the pure rotational spectra there are several other types of spectra in the microwave region. I want to mention a few only. The inversion spectrum of the ammonia molecule, NH3, is a notable example of the tunneling phenomena encountered in polyatomic molecules which have two or more equivalent equilibrium forms separated by some sort of potential energy barrier. In the inversion spectrum of NH3 the nitrogen atom tunnels through the plane of the hydrogen atoms. The rotational quantum numbers J and K do not change and the transitions occur between the two inversion levels. These transitions are very strong and occur near 23 GHz; seven transitions of NH3 have been observed in interstellar space.

Another common type of tunneling involves the internal rotation of a CH3 group about the bond joining it to the rest of the molecule. The different equilibrium positions are separated by a potential barrier. Tunneling splits the threefold degeneracy into a doubly degenerate (E) level and a non-degenerate (A) level. Between the different types of levels transitions do not occur, but Coriolis coupling can be different and cause a splitting in the rotational lines (Lin and Swalen 1959). The splitting can range from very small (a few kHz), where it may just be recognizable as such (for example CH3CHO, CH3COOH, HSSH, …), to very large (CH3OH, HOOH, …). The tunneling phenomenon is quite

Fig. 4: Comparison between some a-type R-branch transitions of molecules with potential astrophysical interest. For each K-component the calculated rigid rotor positions (x) and the laboratory measured (·) line positions, taking centrifugal distortion into account, are shown. The length and direction of the arrows indicate the effect of centrifugal distortion.

G. WINNEWISSER

common in larger molecules.

Molecules with an odd number of electrons, so-called open-shell molecules or radicals, show splittings of the rotational levels. Open shell molecules have by definition an electronic configuration other than a 1Σ ground electronic state. The magnetic moment of the electrons interacts with the magnetic moment generated by the rotation of the molecule. Well-known examples are the Λ-type spectra of OH, CH and SH, as well as SO, CN, O2 and the recently discovered interstellar radical C2H. For an introduction to the spectra of free radicals the reader is referred to books by Herzberg (1971) and Carrington (1973).

A rotational transition is affected by the interaction of the nuclear electric quadrupole moment of nuclei of spin > 1/2 with the electric field gradient. This interaction occurs in molecules with the nuclei D, 14N, 17O, 33S and leads to a characteristic hyperfine splitting. The internal relative spacings and the relative intensities of the hyperfine components are accurately predictable and, if observable, they are most useful for the identification of a transition. For more detailed information the reader is referred to existing textbooks.
In addition to a precise determination of the rotational energy levels and the resulting spectra, microwave spectroscopy allows accurate evaluation of electric dipole moments of gaseous molecules to be made. From these, precise values are calculated for the Einstein coefficients of spontaneous emission. An example of a recent structural and electric dipole determination is given for isocyanic acid (Hocking et al., 1974, 1975). These data are summarized in Fig. 5.

In conclusion, the rotational spectra of most molecules can be adequately accounted for and fitted with the existing theory. However, it should be pointed out that it is not possible to guarantee a very close prediction of the spectra of molecules not studied in the laboratory. The prediction from theoretical calculations can, however, be close enough to warrant a preliminary laboratory investigation which, if successful, will be useful to astronomical searches. On the other hand, the theory may be useful in supporting
the identification of an astronomical line by predic-

Fig. 5: Molecular structure and positions of the atoms in the principal axis system of isocyanic acid (in the principal inertial axis system of H14N12C16O). Bond lengths r(NH) = 0.986 Å, r(NC) = 1.209 Å, r(CO) = 1.166 Å; dipole moment components μ_a = 1.57 D, μ_b = 1.35 D, total μ = 2.07 D. The direction of the total electric dipole moment is almost coincident with the NH bond.

ting other observable lines. Furthermore, interstellar molecular clouds provide huge "sample cells" in which radicals and ions such as CH, CN, C2H and tentatively N2H+ have been identified. Laboratory work in the microwave region on all these systems is beset with many difficulties. Finally it should be mentioned that microwave spectroscopy has not yet been applied extensively towards the determination of collisional cross sections, which are so important for astrophysics. In all these areas theoretical work will be of utmost importance.
II. ASTRONOMICAL OBSERVATIONS

Our Galaxy is one of billions (~10^11) of similar systems which fill in all directions the space of the observable universe. Many of these systems, for example the Andromeda Nebula, or in catalogue numbers M 31 or NGC 224, are flattened and similar to our Galaxy(1). This suggests that they are rotating objects. The average distance between the galaxies is 600 kpc ≈ 2 x 10^24 cm, whereas the diameter of the Galaxy is about 40 kpc. The Galaxy contains about 10^11 stars, whose spacing and diameter are on the average 10^19 cm and 10^10 cm respectively. The sun is but one of them. The space between the stars contains a number of different components. In addition to gas, including plasmas and dust particles, interstellar space is permeated by stellar radiation, the isotropic microwave radiation, cosmic rays, gamma rays, x-rays and magnetic fields.

Stars and ISM form a huge flat disk whose plane of symmetry is called the "galactic plane". A schematic cross section through the Galaxy is shown in Fig. 6. Stars and interstellar matter are closely confined to the galactic plane. In the center of the Galaxy the

(1) The designation M 31 refers to object 31 in Messier's Catalog of 103 objects. The designation NGC 224 refers to object 224 in the New General Catalog. The New General Catalog has two later supplements, known as the Index Catalog, IC. NGC and IC list about 13000 objects.
    1 parsec = 3.0856 x 10^18 cm = 3.2615 light years
    1 kpc    = 10^3 pc
    1 year   = 3.155 692 6 x 10^7 sec.


[Fig. 6 panels: SURFACE DENSITIES vs RADIUS (giant HII regions, atomic hydrogen, total mass); ROTATIONAL VELOCITY vs RADIUS; CROSS SECTION THROUGH THE GALAXY (globular clusters, stars).]
Fig. 6: Distribution of stars and interstellar matter in the Galaxy. The lower part of the diagram is a cross section perpendicular to the galactic plane. Globular clusters are the oldest stellar systems and must therefore have been formed in the early evolutionary stages of the Galaxy. The upper left part of the diagram shows the column density of stars, hydrogen gas and HII regions as a function of the distance R from the galactic center. The rotational (orbital) velocity is given as a function of R on the right of the upper diagram. At the distance of the sun this velocity is 250 km/sec; it takes the sun 2.5 x 10^8 yr to complete one rotation around the galactic center. (Winnewisser et al., 1974).
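The orbital period quoted in the caption follows directly from the unit conversions given in the footnote; a quick check, using R = 10 kpc and v = 250 km/sec:

```python
import math

# Check of the galactic rotation period quoted in the Fig. 6 caption:
# T = 2 pi R / v with R = 10 kpc and v = 250 km/sec.
PC_CM = 3.0856e18          # 1 parsec in cm (conversion given in the text)
YR_S = 3.1556926e7         # 1 year in seconds (conversion given in the text)

R = 10.0e3 * PC_CM         # galactocentric distance of the sun, cm
v = 250.0e5                # orbital velocity, cm/s

period_yr = 2.0 * math.pi * R / v / YR_S
print(f"orbital period ~ {period_yr:.2e} yr")
```

The result is ~2.5 x 10^8 yr, as stated.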


stars form a spherical dense region, known as the nuclear bulge. The oldest objects in the Galaxy, the globular clusters, show an approximately spherical distribution, interpreted as the original shape of the giant cloud out of which the Galaxy formed. Globular clusters are deficient in heavy elements.

a) Interstellar Clouds

Nearly all information about the ISM, and in particular its cold gas component, has been gained from observations in our own Galaxy by means of spectroscopic methods. In the past the ISM has generally been considered to consist mainly of atoms and ions, with an average density of about 1 atom per cm^3. The gases are ionized by the UV radiation from stars, producing electrons and ions spanning a temperature range Tk = 500 to 10 000 K. There exist, however, huge amounts of neutral atomic hydrogen H and large quantities of molecular hydrogen, H2. Thus, in addition to the hot and frequently ionized and tenuous gas, the neutral component is found to prevail in clouds of higher density (n_H > 10 cm^-3). This gas is at considerably lower temperatures (T_k ~ 100 K). The gas can essentially exist in two stable conditions. The coexistence of the two phases (two component model) can be explained by the fact that once the gas density increases, collisions allow it to cool more efficiently. Once this process is triggered it accelerates itself by very efficient cooling while simultaneously intercepting less heating energy from the outside. Thus a low density cloud starts to contract, probably into the high density clouds. It is in these high density clouds where the complex molecules are being detected and where star formation is expected to take place. Simultaneously the dust becomes dense enough to make the cloud appear "dark" or "black" by screening the light from background stars. This is the primary reason why it is not possible to investigate them by optical astronomy.

However, with the discovery of polyatomic molecules these dark clouds have received considerable attention in the last 5 years. The microwave and infrared radiation which emanate from these clouds can penetrate them with ease, because the size of the dust particles, of radii a ≈ 0.15 μ, is small compared to the wavelength of this radiation. Light and some of the microwave radiation come through the earth's atmosphere, whereas in the case of the infrared much of the radiation is blocked by water vapor and carbon dioxide in the atmosphere.


b) Molecular Clouds

Although there exists no standard interstellar cloud it is possible to divide them roughly into two types. Observations show that the majority of the normal dust clouds or dark clouds (with temperatures 3 K ≲ Tk ≲ 50 K and densities 10 ≲ n_H2 ≲ 10^3 cm^-3) show the spectra of relatively simple molecules such as CO, OH, and the centimeter transitions of H2CO. The observed lines are generally fairly narrow, their line width corresponding to the thermal motion of the gas. Molecular lines which need higher excitation, such as the millimeter wave transitions of HCN, H2CO and others, are generally not seen.

The wide distribution of OH, CO, HCN, H2CO, NH3 and others allows some of their molecular transitions to be used for extensive surveys. The first results show that the distribution of molecules is similar to that of atomic hydrogen. Molecules are concentrated towards the galactic plane in a fairly thin layer of less than ~300 pc. Within this flat disk their distribution is fairly uniform, with an apparent concentration towards the galactic nucleus. The more complex molecules are restricted to a smaller number of fairly dense and cool condensations. These are the massive molecular clouds which exhibit the very rich and often very intense molecular spectra. Particularly rich molecular sources are Sgr B2, Sgr A, the Orion complex, W51, W3, DR21 and IRC+10216. Their sizes and masses as determined from their molecular lines dwarf the giant HII regions often associated with them. The galactic positions of both HII regions and molecular sources are shown in Fig. 7.

An HII region is an almost fully ionized cloud of gas and dust. In most cases they are ionized by the UV radiation of one or more hot stars embedded in the cloud. A very readable account of recent radio observations of galactic HII regions has been given by Churchwell (1974).

Systematic surveys of molecular clouds are being conducted by use of various molecules. In this context CO is of special importance, due to its high abundance and the ease with which it can be excited at relatively low gas densities. Liszt et al. (1974) find that the CO line intensities decrease smoothly with increasing distance away from the dense central region, where usually also other molecules with much


[Fig. 7 data points include: Orion (16 molecules), Sgr A (11), Sgr B2 (20).]

Fig. 7: Galactic distribution of giant HII regions (after Churchwell, 1970). Six molecular clouds are identified by arrows. The number of interstellar molecules known in each source is indicated at the end of the arrows.

smaller angular distribution are found. Thus the CO emission comes from the interior of the cloud. The distribution of CO is similar to that of atomic hydrogen as seen in the λ 21 cm line (Schwartz et al., 1973).
c) Physical properties of molecular clouds

Observations of molecular spectra in interstellar clouds have shown that they can be used as fairly reliable tools for analyzing the temperature, mass and internal motion of the interstellar clouds. The physical properties of the clouds are related to the observed quantities by using the equation of radiative transfer and considering the collisional and radiative processes likely to be responsible for the excitation. In interstellar clouds at least four different temperatures have to be considered: the excitation temperature T_ex of a molecule, defined by the relative population of at least two levels; the kinetic temperature T_k, corresponding to the Maxwellian velocity distribution of the gas particles; the radiation temperature T_r, assuming a black body radiation distribution; and finally the temperature of the dust particles, T_dust. Thermodynamic equilibrium is the exception in interstellar space and therefore all four temperatures will usually be different. The non-equilibrium conditions are likely to be caused in part by the delicate balance between the various mechanisms of excitation and de-excitation of molecular energy levels. These include collisions with other particles, radiative transitions and in part the molecular formation process itself.
a) Rotational excitation temperature and kinetic temperature.

The excitation temperature T_ex scans a very wide range of values, from temperatures just above the background radiation to more than 50 K in HII regions. Since the excitation temperature is a measure of the population difference between the two levels associated with the transition, one obtains

    n_u / n_l = (g_u / g_l) exp(-h ν_ul / k T_ex)

where n is the number of molecules in the energy levels and g their statistical weights. The subscripts refer to the upper and lower levels. Statistical equilibrium requires that the number of transitions u → l equals the number of inverse transitions:

    n_u (A_ul + B_ul u_ν + C_ul) = n_l (B_lu u_ν + C_lu)

where A and B are the Einstein coefficients of spontaneous and induced emission as well as absorption. u_ν is the energy density of the radiation field, corresponding to u_ν = (4π/c) B_ν(T_r), where in the general ISM T_r = 2.7 K. The C coefficients describe the collisionally induced transitions:

    C_ul = n ⟨σ_ul v⟩

⟨σv⟩ is the average value of the collisional cross section for a Maxwellian velocity distribution. Thus the collisionally induced coefficients depend on the density n of the collision partners (mainly H2 in the dense clouds) and on their kinetic temperature Tk.


One may summarize:

    T_ex = T_r = 2.7 K   for C_ul << A_ul
    T_ex = T_k           for C_ul >> A_ul

The excitation temperature is therefore expected to lie between the kinetic gas temperature T_k and the radiation temperature T_r = 2.7 K. T_ex can be slightly lower (~1.7 K) than the background radiation, T_bb = 2.7 K, which has been observed in various K-type transitions of H2CO. This phenomenon is referred to as anomalous absorption of the microwave background radiation. H2O and OH are observed in maser emission, in which case T_ex may range up to 10^12 K. In general, however, deviations from thermodynamic equilibrium are weak.
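The competition between the collision rate C_ul and the Einstein coefficient A_ul can be made concrete in a minimal two-level sketch that solves the statistical equilibrium condition for n_u/n_l and converts the ratio to T_ex. The CO J = 1-0 numbers used below (ν = 115.27 GHz, A_ul = 7.4 x 10^-8 s^-1, ⟨σv⟩ ~ 5 x 10^-11 cm^3 s^-1) are representative values, and the single-partner, two-level treatment is of course only illustrative.

```python
import math

def excitation_temperature(n_h2, T_kin, T_rad, nu, A_ul, sigma_v, g_u=3, g_l=1):
    """Two-level excitation temperature from statistical equilibrium (sketch)."""
    T0 = 6.6262e-27 * nu / 1.3807e-16      # h*nu/k in K
    C_ul = n_h2 * sigma_v                  # collisional de-excitation rate, s^-1
    nbar = 1.0 / math.expm1(T0 / T_rad)    # photon occupation of the background
    # balance: n_u (A(1+nbar) + C_ul) = n_l (g_u/g_l)(A nbar + C_ul exp(-T0/T_kin))
    ratio = (g_u / g_l) * (A_ul * nbar + C_ul * math.exp(-T0 / T_kin)) / \
            (A_ul * (1.0 + nbar) + C_ul)
    return T0 / math.log(g_u / (g_l * ratio))

# illustrative CO J=1-0 parameters; T_ex runs from T_rad to T_kin with density
for n in (1e0, 1e3, 1e6):
    T_ex = excitation_temperature(n, 40.0, 2.7, 115.27e9, 7.4e-8, 5e-11)
    print(f"n(H2) = {n:8.0e} cm^-3  ->  T_ex = {T_ex:5.2f} K")
```

In the low-density limit the routine returns the radiation temperature, in the high-density limit the kinetic temperature, exactly as summarized above.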

For the interpretation of line observations the optical depth τ is another important quantity. For a detailed compilation and derivation of the equation of radiative transfer I refer to a review paper by Winnewisser et al. (1974). The optical depth integrated over the line profile is

    ∫ τ_ν dν = (c^2 / 8π ν_ul^2) (g_u / g_l) A_ul N_l [1 - exp(-h ν_ul / k T_ex)]

where N_l = ∫ n_l dl is the column density of molecules in the lower level. The column density in one level may be related to the total column density by assuming a Boltzmann distribution.
Observations of the J = 1-0 emission lines of 12C16O and of various isotopic species show that in several sources carbon monoxide is optically thick. Thus, from the equation of radiative transfer one finds that the brightness temperature of the line is T_L = T_ex - T_b, where T_b = T_bb + T_c is the continuum background brightness temperature, expressed as the sum of the 2.7 K isotropic background radiation, T_bb, and the temperature, T_c, of a continuum source which may be in the line of sight and located behind the molecular cloud. With T_c ≈ 0 at 115 GHz and T_bb = 2.7 K, T_b = 2.7 K. Thus the observed brightness temperature is equal to the excitation temperature and independent of the column density. The kinetic temperature T_k is related to the excitation temperature T_ex, and therefore to the population of the rotational energy levels. Observational and theoretical evidence indicates that collisions are the most important source of excitation. If collisions dominate the entire population

between two states, then T_ex is close to or equal to the kinetic temperature of the gas. This seems to be the case for CO, which has a small dipole moment (μ = 0.1 Debye) and is therefore weakly coupled to the radiation field. Molecules with large dipole moments like CS, HCN and others couple more strongly to the radiation field and therefore their T_ex is lower.

In the small central cores associated with infrared objects the kinetic temperature may reach 150 K, as has been deduced from short lived, highly excited states in polyatomic molecules like NH3, CH3CN and HNCO. On the average, however, one concludes that for the Orion cloud Tk ≈ 50 K, for Sgr B2 Tk ≈ 30 K and for black clouds Tk ≈ 30 K.
d) Densities, sizes and masses

The kinetic temperature Tk, the total mass and the number density of a cloud are the most important quantities for understanding cloud fragmentation and the formation of star clusters. The total mass of a cloud is calculated from the hydrogen density, which in interstellar clouds ranges between 10^3 and 10^7 molecules per cm^3. These values are determined by using one of two methods: (1) the observed molecular column densities are correlated with hydrogen densities, (2) the measured excitation of rotational transitions can be used to determine hydrogen densities. Both methods are in fair agreement.

For an optically thin cloud the molecular abundance can be determined. With τ << 1, one obtains for the integrated observed brightness temperature of an emission line

    ∫ T_L dν ∝ T_ex ∫ τ_ν dν

Evaluation of the right hand side in terms of the absorption coefficient yields

    N_l = (3 k c / 8 π^3 μ^2 ν^2) ∫ T_L dν

where N_l represents the number of molecules in the lower energy level contained in a column of cross section 1 cm^2 along the line of sight. N_l can be related to the total column density N by evaluating the rotational partition function. For a diatomic or linear molecule this can be carried out in closed form, and one obtains

    N = N_l (2J_l + 1)^-1 (k T_ex / h B) exp(E_l / k T_ex)

where B is the rotational constant. From observations


of two spectral lines it becomes possible to determine both excitation temperature and column density.
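The closed-form relation between the level column density and the total column density is easy to evaluate; the sketch below assumes the high-temperature partition function Q ≈ kT/hB of a linear molecule and uses the J = 0 level of CO (B ≈ 57 635.97 MHz, E_0 = 0) as an illustrative case.

```python
import math

# Total column density from the lower-level column for a linear molecule:
#   N = N_l (2J_l+1)^-1 (k T_ex / h B) exp(E_l / k T_ex)
# Illustrated for the J = 0 level of CO (B ~ 57635.97 MHz, E_0 = 0).
K_B = 1.3807e-16   # Boltzmann constant, erg/K
H = 6.6262e-27     # Planck constant, erg s

def total_column(N_l, J_l, B_hz, T_ex, E_l_erg=0.0):
    """Total column density (cm^-2) from the column N_l in level J_l."""
    return (N_l / (2 * J_l + 1)) * (K_B * T_ex / (H * B_hz)) \
           * math.exp(E_l_erg / (K_B * T_ex))

N = total_column(1e15, 0, 57635.97e6, 10.0)
print(f"N ~ {N:.1e} cm^-2")
```

At T_ex = 10 K the factor kT/hB for CO is only ~3.6, so the ground-state column already carries a large fraction of the total.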
Column densities obtained by this method range from ~10^13 cm^-2 for NH2CHO, which represents about the lower limit of detectability for present-day receiving systems, to 10^19 cm^-2 for CO. 12CO is believed to be optically thick in almost all sources, and therefore only T_ex can be determined. The column density is then obtained from the optically thin 13C16O or 12C18O species (Penzias et al., 1971).
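The optically thin column density relation can be turned into numbers directly. The sketch below works in CGS units and assumes illustrative 13CO J = 1-0 parameters (μ = 0.11 D, ν = 110.2 GHz, an integrated intensity of 10 K km/s); these are plausible round values, not fitted ones.

```python
import math

# Lower-level column density from an optically thin integrated line (CGS):
#   N_l = 3 k c / (8 pi^3 mu^2 nu^2) * Int T_L d(nu)
K_B = 1.3807e-16      # Boltzmann constant, erg/K
C_LIGHT = 2.9979e10   # speed of light, cm/s

def column_density_lower(int_T_dv_kkms, mu_debye, nu_hz):
    """N_l in cm^-2 from an integrated line temperature in K km/s."""
    mu = mu_debye * 1e-18                               # Debye -> esu cm
    int_T_dnu = int_T_dv_kkms * 1e5 * nu_hz / C_LIGHT   # K km/s -> K Hz
    return 3.0 * K_B * C_LIGHT / (8.0 * math.pi**3 * mu**2 * nu_hz**2) * int_T_dnu

# assumed 13CO J=1-0 example: mu = 0.11 D, nu = 110.2 GHz, 10 K km/s
N_l = column_density_lower(10.0, 0.11, 110.2e9)
print(f"N_l ~ {N_l:.2e} cm^-2")
```

The result, ~10^15 cm^-2, falls squarely in the range quoted above for optically thin species.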
All molecules which show optically thin lines are believed to be minor constituents of the cloud compared to the expected molecular hydrogen, H2. The molecular hydrogen column density of the cloud is inferred from the 13CO column density by using the observed 12C16O/13C16O isotope abundance ratio (~90) and the cosmic C/H abundance (~3.0 x 10^-4). This assumption produces a lower limit for the H2 column density of about 6 x 10^22 cm^-2. Assuming a cloud thickness of about 10 pc (~3 x 10^19 cm) one expects n_H2 ≈ 2 x 10^3 cm^-3.
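The arithmetic of that density estimate can be checked in one line, using the column density and thickness quoted above:

```python
# Density from column density and path length, n = N / L, with the
# lower-limit H2 column (~6e22 cm^-2) and thickness (~10 pc ~ 3e19 cm)
# quoted in the text.
N_H2 = 6e22    # cm^-2
L = 3e19       # cm

n_H2 = N_H2 / L
print(f"n(H2) ~ {n_H2:.0e} cm^-3")
```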
Molecular hydrogen densities can also be obtained from the assumption that the excitation of the rotational energy levels which produce the observed interstellar transitions is caused by H2 collisions. Theoretical considerations and observational results have shown this assumption to be valid. By equating the collision rate 1/τ_c = n_H2 ⟨σv⟩ (σ = collisional cross section ≈ 10^-15 cm^2; v = thermal velocity ≈ 5 x 10^4 cm sec^-1) with the transition rate 1/τ_r = (k T_b / h ν_ul) A_ul, one finds the minimum H2 density to be (Rank et al., 1971)

    n_H2 ≳ (k T_b / h ν_ul) (A_ul / ⟨σv⟩)

The Einstein coefficient for spontaneous emission A_ul is proportional to the square of the dipole moment matrix element and the cube of the transition frequency. Substituting for CO μ = 0.1 Debye and ν = 115 GHz, one finds for clouds with Tk = 40 K, n_H2 > 10^3 cm^-3. For molecules with larger dipole moments and with transitions in the millimeter wave region (HCN, H2CO, HNCO and others) the hydrogen density required is considerably higher. Values as high as 10^8 molecules per cm^3 are needed to populate some of these levels.
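The contrast between weak and strong dipoles can be sketched with the ratio A_ul/⟨σv⟩, using the cross section and velocity quoted above. The Einstein A values below are representative literature numbers for the two lines and should be taken as approximate.

```python
# Minimum H2 density for collisions to dominate over spontaneous emission,
# n_min ~ A_ul / <sigma v>, with sigma and v as quoted in the text.
sigma = 1e-15    # cm^2, collisional cross section
v = 5e4          # cm/s, thermal velocity

A_UL = {
    "CO  J=1-0": 7.4e-8,   # s^-1, small dipole moment -> low n_min
    "HCN J=1-0": 2.4e-5,   # s^-1, large dipole moment -> high n_min
}

for line, A in A_UL.items():
    print(f"{line}: n_min ~ {A / (sigma * v):.1e} cm^-3")
```

CO is thermalized already near 10^3 cm^-3, while HCN needs several 10^5 cm^-3, which is why its millimeter lines trace only the dense cores.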
This explains why rotational transitions emitted from energy levels which have short lifetimes for spontaneous emission (~100 sec) are only seen in the dense cores of the molecular clouds. In fact the shortest lived energy levels observed to date in emission are detected in Sgr B2 only. These include the 4_13 - 3_12 transition of HNCO (Snyder and Buhl, 1973), the 2Π_3/2 J = 5/2 and J = 7/2 (Yen et al., 1969; Turner et al., 1970) and the 2Π_1/2 hyperfine structure transitions of OH, and the J,K = (3,2), (2,1) transitions of NH3 (Gardner et al., 1971; Zuckerman et al., 1971, respectively). Furthermore, transitions such as the J = 6-5, K = 5 of CH3CN and the (J,K) = (6,6) of NH3 arise from longer lived levels but need excitation temperatures ≳ 300 K. If collisional excitation is considered as the only source of excitation, densities of the order of some 10^7 cm^-3 are needed. However, these density limits may turn out to be lower once other excitation mechanisms such as infrared pumping or trapping are included. The molecular distribution in the Sgr B2 molecular source is summarized in Fig. 8. Up to date more molecules have been found in the Sgr A and Sgr B2 molecular clouds than in any other cloud in the Galaxy. Recently Gardner and Winnewisser found the first interstellar molecule containing a carbon-carbon double bond, vinyl cyanide, H2C=CHCN. The energy levels and interstellar detection are shown in Fig. 9. Other recent detections are U93.174 (Turner, 1974), tentatively assigned as N2H+ (Green, 1974), and the ethynyl radical, C2H (Tucker et al., 1974), as well as vibrationally excited SiO maser emission (Buhl and Snyder, 1974; Thaddeus, 1974).

The total mass of a molecular cloud can now be estimated from its measured angular size, its distance and the hydrogen density. The linear size depends on the distance, which is obtainable only indirectly and hence is somewhat uncertain. Dust grains cannot contribute more than about 1.4% of the total mass of a molecular cloud. In Table VI we have collected some relevant cloud parameters. It is seen that molecular clouds are by far the largest individual objects in the Galaxy. Sgr B2 is a particularly massive cloud (Martin and Downes, 1972).
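A rough sketch of such a mass estimate, assuming a spherical cloud with the radius and density values derived earlier in the text (round illustrative numbers, not a fit to any particular source):

```python
import math

# Rough total mass of a spherical molecular cloud, M = (4/3) pi R^3 n m(H2),
# with R = 10 pc and n(H2) = 2e3 cm^-3 as example values.
PC_CM = 3.0856e18     # 1 parsec in cm
M_H2 = 3.35e-24       # mass of an H2 molecule, g
M_SUN = 1.989e33      # solar mass, g

R = 10.0 * PC_CM      # cloud radius, cm
n = 2e3               # H2 number density, cm^-3

mass_msun = (4.0 / 3.0) * math.pi * R**3 * n * M_H2 / M_SUN
print(f"M ~ {mass_msun:.1e} solar masses")
```

The result lands at a few 10^5 solar masses, consistent with the 10^5 to 10^6 M_sun quoted later for the most massive clouds.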

Fig. 8: Continuum radiation and molecular distribution in declination for Sgr B2. The top part shows the brightness distribution at 10.6 GHz and 5 GHz in different spatial resolution obtained with the Effelsberg 100-m telescope (D. Downes) and the Cambridge One Mile Telescope (Martin and Downes, 1972). The molecular distribution shows that transitions from short lived energy levels are excited only in the central core of the source, whereas other transitions are seen over a wide range.

Fig. 9: Energy levels and detection of interstellar vinyl cyanide in Sgr B2. The average spectra of the 2_11 - 2_12 transition are plotted in (a) with a channel spacing of 10 kHz smoothed to 20 kHz resolution and (b) with a 4 kHz spacing smoothed to 16 kHz resolution (after Gardner and Winnewisser, 1975).

Table VI. Physical Properties of Interstellar Clouds.

                              Size                  Density in
Object                 angular     linear       powers of 10    Tkin    Trad      Mass
                       (' or ")    (pc)         (n_H2)(cm^-3)   (K)     (K)       (M_sun)

1) Dust clouds
   (Black clouds)        8'       0.03<R<10         ~3          <10               10^2

2) Sgr B2
   a) Dust cloud        ~15'      10<R<30           ~3           30
   b) Molecular cloud
      general           ~14'      25<R<50           3-5         20-80   0.3-60    10^2-10^6
      core region        6'       1.5<R<6           ~8           50               ~10^6
      maser point
      source            <0.1"     <5x10^-3                              (>10^12?)
   ionized regions     0.5'-2'                                  10^4              10^5

3) Orion
   a) Molecular cloud
      general           6'-10'    0.7<R<1.2         ~4          20-60
      core region        2'-4'    0.1<R<1.0         ≥5           50      500
      maser point
      source           <0.001"    2x10^-6                               (>10^12?)
   ionized regions     0.5'-3'    10^-5<R<4x10^-1               10^4              10^2

The numbers of this table are taken from various authors. Particularly I would like to mention:
Martin, A.H.M. and Downes, D., Astrophys. Lett. 11, 219, 1972.
Heiles, C., Ann. Rev. Astron. Astrophys. 9, 293, 1971.
Mezger, P.G., Proc. 2nd Adv. Course in Astr. and Astrophys., Interstellar Matter: An Observer's View, Geneva Observ. 1972.


Throughout the Galaxy it is found that the OH and H2O masers are located within massive clouds. These sites are believed to be ultra-dense spots with n_H ~ 10^11 cm^-3 and apparent linear dimensions of 1-100 AU (1 AU = Astronomical Unit ≈ 1.5 x 10^13 cm), separated by distances of about 10,000 AU (Hills et al., 1972). The strong OH and H2O maser emission consists usually of many peaks corresponding to different Doppler shifts. The individual peaks show a pronounced change in line intensity on a time scale as short as a few days. These spots may be protostars in their pre-main-sequence adiabatic contraction state. For further information on the rate and efficiency of star formation in our Galaxy I refer to a review paper by Mezger (1974).
e) Formation and destruction

The important processes during the life cycle of an interstellar molecule are formation, relaxation and destruction, as indicated in Fig. 10. Possibly best understood is the dissociation of a molecule as part of the destruction process. All molecules observed to date in interstellar space can be dissociated by absorption of UV photons of wavelengths λ > 912 Å, the ionization limit for hydrogen. Exchange reactions and absorption on dust grains also contribute to the loss of molecules. In fact the lifetime of these molecules is less than 100 years if they are not protected against the radiation, but is expected to increase by several orders of magnitude (~10^6 years) in areas of strong shielding.

The relaxation processes which take place during the lifetime of a molecule are important for the understanding of the observational data. The formation of molecules is the least understood of the three processes, and it is in this area where new insight into the basic chemical reactions in a low density medium is required.

Since the detection of interstellar molecules several mechanisms have been proposed to explain their presence in the interstellar medium. Two general types of formation schemes have been suggested: the first proposes that molecules are formed in stellar atmospheres or protostellar nebulae and then expelled into the interstellar medium, while the second hypothesis is that molecules are constantly formed within the

The Life of a Molecule

[Fig. 10 schematic: in interstellar space, BIRTH (gas phase reactions, evaporation from dust) is followed by relaxation and pumping over ~10^-2 - 10^4 hr before DEATH (photo-dissociation, chemical reaction); in the laboratory a steady state (Boltzmann distribution) is reached in ~10^-3 sec.]

Fig. 10: The life cycle of molecules in interstellar space and in the laboratory. The time elapsed between the birth and the destruction of a molecule is indicated by the different steps during the relaxation processes. With radio telescopes molecules are detected in different rotational states in the electronic ground state and most likely in the lowest vibrational state. During the molecule's stay in the lower rotational levels it becomes subject to various pumping schemes. In the laboratory the Boltzmann distribution is reached in a fraction of a second.


clouds where they are detected.

Although the dense environment (~10^14 molecules per cm^3) of cool stars would favour multiple-body gas phase reactions occurring at a fast rate, this cannot be considered, for a variety of reasons, to be the principal formation mechanism unless dust and molecules are produced and blown out simultaneously into the interstellar medium. Intercloud distances of several pc are too far for molecules to travel unharmed during their lifetimes. Some interstellar clouds with 10^5 to 10^6 M_sun are so massive that many stars would have to blow out molecules simultaneously. Furthermore, observed isotopic abundance ratios do not support this picture.
Local formation processes are therefore strongly
favored. The formation processes discussed involve
gas-phase and surface reactions, the latter thought
to take place upon interstellar dust grains. Several
types of gas-phase reactions have been proposed, such as radiative recombination, ion-neutral, neutral-neutral,
chemical exchange and recombination reactions
(Herbst and Klemperer, 1973). The low temperature of
interstellar space allows the occurrence of exothermic
gas phase reactions only. The chemistry of an interstellar cloud depends upon whether it has low, medium
or high density. For example, the observed molecular
abundance of the diatomic molecules CH, CH+, CN in
the direction of ζ Ophiuchi can quantitatively be accounted for by two-body radiative association of the
form (Solomon and Klemperer, 1971)

C+ + H → CH+ + hν.

Once CH+ is formed, recombination reactions of an
electron with CH+ (CH+ + e → CH + hν) or chemical exchange
reactions may follow:

CH+ + X (X = N, O, C) → CX + H+.

These processes are operative only in low density regions where C is ionized and may not be expected to
take place in the central core of dense clouds. Ion-neutral gas phase reactions are known to be very fast
and may be important for clouds with densities 10^4 <
n(H2) < 10^6 cm^-3. All gas phase models are sensitive
to the presence of interstellar dust, which shields
the molecules from the destructive interstellar ultraviolet radiation field.

G. WINNEWISSER

However, in all model calculations rate constants are needed. They are either determined experimentally or estimated on theoretical
grounds. Their precise determination often poses
serious difficulties, and the possibility that not all relevant reactions are included in the treatment
adds considerable uncertainty to all gas-phase calculations. Barsuhn (1974) finds, by fitting the
OAO-C observations of n(H2)/n_total and of the dust extinction to model
calculations, that the density of these clouds must
lie between 50 < n(H2) < 1000.
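To make the role of rate constants concrete, the two reactions above can be cast as a toy rate network and integrated numerically; every rate coefficient and density in this sketch is an invented placeholder, not a value from the text:

```python
# Toy two-reaction network (illustration only; all numbers are placeholders):
#   formation:   C+ + H  -> CH+ + hv    rate = k1 * n_Cp * n_H
#   destruction: CH+ + e -> CH  + hv    rate = k2 * n_CHp * n_e

def integrate_chp(n_Cp, n_H, n_e, k1, k2, dt, steps):
    """Euler-integrate the CH+ density [cm^-3], holding the reservoir
    densities of C+, H and electrons fixed."""
    n_CHp = 0.0
    for _ in range(steps):
        formation = k1 * n_Cp * n_H      # cm^-3 s^-1
        destruction = k2 * n_CHp * n_e   # cm^-3 s^-1
        n_CHp += (formation - destruction) * dt
    return n_CHp

# With placeholder inputs the integration relaxes toward the analytic
# steady state n(CH+) = k1 * n(C+) * n(H) / (k2 * n(e)):
n_eq = (1.0e-17 * 1.0e-4 * 1.0e2) / (1.0e-7 * 1.0e-4)  # cm^-3
```

Because the equilibrium abundance scales linearly with k1, an uncertain rate constant translates directly into an equally uncertain predicted abundance, which is the sensitivity described above.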
Calculations considering formation on grain
surfaces (Watson and Salpeter, 1972) are considerably
less well defined and more uncertain, since the relevant physical processes on the surfaces of interstellar dust
grains are still largely speculative. The information
needed to describe the processes includes: condensation of the gas onto grains, sticking probability,
surface mobility, reaction rates, ejection mechanisms
and rates for molecules newly formed on grain surfaces.
It is almost certain that H2 is formed from atomic hydrogen on interstellar
grain surfaces (Hollenbach et al., 1971), but it is
uncertain whether polyatomic molecules can form in
this way.
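The H2 formation rate on grains is commonly estimated as half the rate at which H atoms strike grain surfaces, multiplied by a sticking probability. A schematic version of that estimate follows; all input numbers are placeholders for illustration, not measured grain properties:

```python
import math

def h2_formation_rate(n_H, T, n_gr, sigma_gr, sticking):
    """Schematic H2 formation rate on grain surfaces [cm^-3 s^-1]:
    0.5 * (H-atom flux onto grains) * (sticking probability).
    Arguments: n_H, n_gr in cm^-3; T in K; sigma_gr in cm^2."""
    m_H = 1.67e-24                                    # H-atom mass, g
    k_B = 1.38e-16                                    # Boltzmann constant, erg/K
    v_H = math.sqrt(8.0 * k_B * T / (math.pi * m_H))  # mean thermal speed, cm/s
    return 0.5 * n_H * n_gr * sigma_gr * v_H * sticking

# Placeholder inputs: n_H = 10 cm^-3, T = 20 K, and invented grain values.
rate = h2_formation_rate(10.0, 20.0, 1.0e-12, 1.0e-9, 0.3)
```

The sticking probability, surface mobility and ejection rates listed above all enter such estimates, which is why the grain-surface route remains so uncertain.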
III. FUTURE PROSPECTS
The very existence of molecules in interstellar
space has provided considerable new insight into the
interstellar chemistry of our Galaxy. By providing
birth and shelter to molecules like HCN, H2O, NH3
and H2CO, known to be important in reactions which
synthesize amino acids, the interstellar environment
is not nearly as hostile as had originally been assumed. Chemical evolution has clearly advanced much
further than had been expected under the cold (~20 K)
and high vacuum conditions (~10^-15 Torr) of interstellar clouds.
It seems to me that the most important aspect of
the detection of interstellar molecular spectra is
that they provide probes into the interior of clouds.
Especially the millimeter wave emission and maser
emission emanate from deep within the core of very dense
and often giant molecular clouds. Some of the molecular radiation comes from molecular species presently
outside the reach of laboratory spectroscopy. Thus two
avenues of interstellar molecular spectroscopy will
be particularly rewarding in the near future.

(1) We are trying to investigate and to understand
in more detail the physical conditions in interstellar
clouds and their relation to conditions which lead to
and govern processes of star formation. Selected molecules serve as ideal probes, offering their complete
rotational spectrum rather than just one transition.
(2) It remains an intriguing task to search for
new and more complex organic molecules, to see how far
chemical evolution in interstellar clouds has proceeded towards the evolution of bio-molecules. The recent detection of methylamine, CH3NH2 (Fourikis et al.,
1974), and dimethyl ether, (CH3)2O (Winnewisser and
Gardner, 1974; Snyder et al., 1974), in both the Sagittarius and Orion clouds raises the hope that one
could expect to find simple bio-molecules such as the
amino acids glycine, NH2CH2COOH, alanine, serine and
others. These particular molecules differ from each
other only by the nature of their side chains. They
are the basic building blocks of all proteins. At present there exist no laboratory microwave spectra for
this class of molecules. It is also important to establish whether ring molecules exist in interstellar
clouds. It is well known that heterocyclic ring molecules are necessary for the synthesis of complex bio-molecules such as ribonucleic acid, RNA, and deoxyribonucleic acid, DNA, which constitute the genetic material of cells. Although this complex class of bio-molecules will not be detectable as such, precursors
of them may very well be accessible to radio spectroscopy.
ACKNOWLEDGEMENTS
It is my pleasure to thank Dr. E. Churchwell for
reading part of the manuscript and Dr. D. Downes
for providing some results prior to publication. I would
also like to thank Mrs. H. Winnewisser for her efficient typing of the manuscript.


REFERENCES
Barsuhn, J., Mitt. Astron. Gesellsch. 35, 197, 1974.
Buhl, D., and Snyder, L.E., Astrophys. J. 189, L31, 1974.
Carrington, A., Microwave Spectroscopy of Free Radicals, London, Academic Press, 1974.
Cheung, A.C., Rank, D.M., Townes, C.H., Thornton, D.D., and Welch, W.J., Phys. Rev. Lett. 21, 1701, 1968.
Cheung, A.C., Rank, D.M., Townes, C.H., Thornton, D.D., and Welch, W.J., Nature 221, 626, 1969.
Churchwell, E., Ph.D. Thesis, Univ. of Indiana, 1970.
Churchwell, E., I.A.U. Symposium 60, ed. F.J. Kerr and Simonson, 1974.
Costain, C.C., J. Chem. Phys. 29, 864, 1958.
De Lucia, F.C., Cook, R.L., Helminger, P., and Gordy, W., J. Chem. Phys. 55, 5334, 1971.
Fourikis, N., Takagi, K., and Morimoto, M., Astrophys. J. 191, L139, 1974.
Gardner, F.F., Ribes, J.C., and Sinclair, M.W., Astrophys. J. 169, L109, 1971.
Gardner, F.F., and Winnewisser, G., Astrophys. J. (in press), 1975.
Gerry, M.C.L., and Winnewisser, G., J. Mol. Spectrosc. 48, 1, 1973.
Green, S., Montgomery Jr., J.A., and Thaddeus, P., Astrophys. J. 193, L89, 1974.
Helminger, P., Cook, R.L., and De Lucia, F.C., J. Mol. Spectrosc. 40, 125, 1971.
Herbst, E., and Klemperer, W., Astrophys. J. 185, 505, 1973.
Herzberg, G., The Spectra and Structures of Simple Free Radicals, Ithaca, Cornell University Press.
Hills, R., Janssen, M.A., Thornton, D.D., and Welch, W.J., Astrophys. J. 175, L59, 1972.
Hocking, W.H., Gerry, M.C.L., and Winnewisser, G., Astrophys. J. 187, L89, 1974.
Hocking, W.H., Gerry, M.C.L., and Winnewisser, G., to be published, Canad. J. Phys., 1975.
Hollenbach, D.J., Werner, M.W., and Salpeter, E.E., Astrophys. J. 163, 165, 1971.
Kaifu, N., Morimoto, M., Nagane, K., Akabane, K., and Takagi, K., Astrophys. J. 191, L135, 1974.
Kewley, R., Sastry, K.V.L.N., Winnewisser, M., and Gordy, W., J. Chem. Phys. 39, 2856, 1963.
King, G.W., Hainer, R.M., and Cross, P.C., J. Chem. Phys. 11, 27, 1943.
Kirchhoff, W.H., J. Mol. Spectrosc. 41, 333, 1972.


Kivelson, D., and Wilson, E.B., J. Chem. Phys. 20, 1575, 1952.
Lin, C.C., and Swalen, J.D., Rev. Mod. Phys. 31, 841, 1959.
Liszt, H.S., Penzias, A.A., and Wilson, R.W., private communication, submitted to Astrophys. J., 1974.
Martin, A.H.M., and Downes, D., Astrophys. Lett. 11, 219, 1972.
Mills, I.M., Molecular Spectroscopy: Modern Research, ed. K.N. Rao and C.W. Mathews, Academic Press, 1972.
Penzias, A.A., Jefferts, K.B., and Wilson, R.W., Astrophys. J. 165, L229, 1971.
Rank, D.M., Townes, C.H., and Welch, W.J., Science 174, 1083, 1971.
Schwartz, P.R., Wilson, W.J., and Epstein, E.E., Astrophys. J. 186, 529, 1973.
Snyder, L.E., Buhl, D., Zuckerman, B., and Palmer, P., Phys. Rev. Lett. 22, 679, 1969.
Snyder, L.E., and Buhl, D., Nature Phys. Sci. 243, 45, 1973.
Snyder, L.E., Buhl, D., Schwartz, P.R., Clark, F.O., Johnson, D.R., Lovas, F.J., and Giguere, P.T., Astrophys. J. 191, L79, 1974.
Solomon, P.M., Jefferts, K.B., Penzias, A.A., and Wilson, R.W., Astrophys. J. 168, L107, 1971.
Solomon, P.M., and Klemperer, W., Astrophys. J. 178, 389, 1971.
Solomon, P.M., Penzias, A.A., Jefferts, K.B., and Wilson, R.W., Astrophys. J. 185, L67, 1973.
Thaddeus, P., Mather, J., Davis, J.H., and Blair, G.N., Astrophys. J. 192, L33, 1974.
Tucker, K.D., Kutner, M.L., and Thaddeus, P., Astrophys. J. 193, L115, 1974.
Turner, B.E., Palmer, P., and Zuckerman, B., Astrophys. J. 160, L125, 1970.
Turner, B.E., Astrophys. J. 193, L83, 1974.
Watson, J.K.G., J. Chem. Phys. 45, 1360, 1966; 46, 1935, 1967.
Watson, W.D., and Salpeter, E.E., Astrophys. J. 175, 659, 1972.
Weinreb, S., Barrett, A.H., Meeks, M.L., and Henry, J.C., Nature 200, 829, 1963.
Winnewisser, B.P., Winnewisser, M., and Winther, F., J. Mol. Spectrosc. 51, 65, 1974.


Winnewisser, G., Maki, A.G., and Johnson, D.R., J. Mol. Spectrosc. 39, 149, 1971.
Winnewisser, G., Winnewisser, M., and Winnewisser, B.P., MTP Rev. of Science, Phys. Chem., Ser. 1, Vol. 3, Spectroscopy (ed. D.A. Ramsay), Oxford: Medical and Technical Publ. Co. Ltd. and London: Butterworth Co. Ltd., 1972.
Winnewisser, G., Mezger, P.G., and Breuer, H.D., Topics in Current Chemistry, Vol. 44, Berlin, Springer Verlag, 1974.
Winnewisser, G., and Gardner, F.F., Astrophys. J. (to be published), 1974.
Yen, J.L., Zuckerman, B., Palmer, P., and Penfield, H., Astrophys. J. 156, L127, 1969.
Zuckerman, B., Morris, M., Turner, B.E., and Palmer, P., Astrophys. J. 169, L105, 1971.
