
BACKGROUND

In this handout, we discuss some points of elementary logic that are apt to cause confusion,
and also introduce ideas of set theory, and establish the basic terminology and notation.
This is not examinable material, but read it carefully, as this forms the basic language of
mathematics.
1. How to read mathematics
Don't just read it, fight it! Mathematics says a lot with a little. The reader must participate.
After reading every sentence, stop, pause and think: do I really understand this sentence?
Don't read too fast. Reading mathematics too quickly results in frustration. A half hour
of concentration while reading a novel perhaps buys you 20 pages with full comprehension.
The same half hour in a math article buys you just a couple of lines. There is no substitute
for work and time.
An easy way to progress in a mathematics course is to read the relevant section of the
course notes or book before the lectures, and then once again on the same day after the
lecture is over. Keep up, as Mathematics is different from other disciplines: you need to know
yesterday's material to understand today's. Don't save it all for one long night of cramming,
which simply won't work with Mathematics.
Before attempting the exercises, make sure you read the corresponding section from the
lecture notes or book. After reading an exercise, stop and think if you know all the terms in
the exercise, and if you understand what is being asked. Then think about what is given and
what is required. You might then see a possible way of proceeding. You can do some rough
work by writing down a few things in order to convince yourself that your strategy indeed
works. Then write down your answer in a manner that a person can understand logically
what your argument is. Justify each step. Writing proofs is an art, and one gets better at it
only by practice. Every step in the proof is a (mathematical) statement, but it is a sentence in
English! So make sure that each step in your argument reads like a simple sentence (avoid
a dangling chain of symbols such as '⇒', and pay attention to punctuation and grammar!).
2. Definitions, Lemmas, Theorems and all that
2.1. Definitions. A definition in Mathematics is a name given to a mathematical object by
specifying what the mathematical object is. Just like in biology we define that
An animal is called a fish if it is a cold-blooded, water-dwelling vertebrate with gills.
Observe that in defining a fish, we have listed the characterizing properties that specify which
animals are fish and which aren't. In the same way, in Mathematics, a mathematical object
(a set or a function) is given a certain name if it satisfies certain properties.
Any definition of a mathematical term or phrase will roughly have the form

⟨term⟩ if ⟨defining property⟩,

where ⟨term⟩ is the term which is being defined, and ⟨defining property⟩ is its defining property. For example:


Definition 1. An integer is called even if it is divisible by 2.


Definition 2. An integer is called prime if it is greater than 1 and it has no divisors other
than 1 and itself.
Definition 3. A function f : [a, b] → ℝ is continuous at c ∈ [a, b] if for every ε ∈ ℝ such
that ε > 0, there exists a δ ∈ ℝ such that δ > 0 and for all x ∈ [a, b] satisfying |x − c| < δ,
|f(x) − f(c)| < ε.
In a definition the word 'if' really means 'if and only if'. So in the definition of even integers,
all integers which are not divisible by 2 are ruled out from being even.
Definitions are very important, as without knowing what the objects precisely mean, how
can we prove things about them?
2.2. Lemmas, Theorems, . . . . Not all results proved in mathematics are called theorems.
Some are called lemmas or corollaries. Strictly speaking, there is no difference among them.
However, the distinction rests on some aspects such as utility, depth and beauty. A lemma
is useful in a limited context (often only as a preparatory step for some theorem) and is too
technical to have an aesthetic appeal. A theorem carries with it some depth and a certain
succinctness of form and often represents the culmination of some coherent piece of work,
while a corollary is like an outgrowth of a theorem, and it is an easy consequence of the
theorem.
3. Some remarks about logic
3.1. Bivalued logic. A mathematical statement is a sentence that allows only two possibilities: either it is true or it is false. Thus there is nothing between true and false. There is
no such thing as 'very true', 'almost true', 'substantially true', 'partially true', or 'having an
element of truth', although we commonly use such phrases in everyday life.
Of course there are many mathematical statements for which we do not know if they are
true or false. An example of this is Goldbach's conjecture, which states the following:
(1) Every even integer greater than 2 can be expressed as a sum of two prime numbers.

Although the above statement has been verified for an impressive range of cases, nobody has
proved it for all cases. Nor has it been disproved, that is, nobody has so far discovered even
a single even integer greater than 2 which cannot be expressed as a sum of two primes. Still,
even today, the statement is either true or false, even though we do not know which way it is.
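Verifying statement (1) for an initial range of cases is a simple computation. The following sketch (plain Python; the helper names are our own) searches for a witnessing pair of primes for each even number up to 1000:

```python
def is_prime(n):
    # trial division; fine for small n
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pair(n):
    # return primes (p, q) with p + q == n, or None if no such pair exists
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

# every even integer in 4..1000 has a Goldbach pair
assert all(goldbach_pair(n) is not None for n in range(4, 1001, 2))
```

Of course, no amount of such checking proves the conjecture; it merely fails to disprove it.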
3.2. Statements about a class. The above remark about quantification of truth is important for statements about a class. The statement
(2) All rich men are happy

is about a class, namely that comprising rich men. Goldbach's conjecture above is a statement
about the class of all even integers greater than 2.
A layman is apt to regard these statements as true or nearly true when they hold in a
large number of cases. Even if there are a few exceptions, he is likely to ignore them, saying
'The exception proves the rule!' In mathematics, this is not so. Even a single exceptional case
(a counter-example, as it is called) renders a statement about a class false. Thus even one
unhappy rich man makes the statement (2) as false as millions of such men would. In other
words, in mathematics, we interpret the words 'all' and 'every' quite literally, not allowing


even a single exception. If we want to make a true statement after taking the exceptional
ones into account, we would have to make a different statement such as
All rich men other than Mr. X are happy.
But loose expressions such as 'most', 'a great many' or 'almost all' cannot be used in mathematical statements, unless, of course, they have been precisely defined earlier.
There is another type of statement made about a class. These do not assert that something
holds for all elements of the class, but instead that it holds for at least one element from that
class. Take for example the statement
There exists a man whose height is 5 feet 7 inches
or
There exists a natural number k with 1 < k < 4294967297 that divides 4294967297.
These statements refer respectively to the class of all men and to the class of all natural
numbers between 1 and 4294967297. In each case, the statement says that there is at least
one member of the class having a certain property. It does not say how many such members
there are, nor which ones they are. Thus the first statement tells us nothing by way of the
name and address of the person with that height, and the second one does not say what this
divisor is. These statements are, therefore, not as strong as, respectively, the
statements, say,
Mr. X in London is 5 feet 7 inches tall
or
641 divides 4294967297
which are very specific. A statement which merely asserts the existence of something without
naming it or without giving any method for finding it is called an existence statement. In
the bivalued logic setting of mathematics, existence statements are either true or false, even
if they are not specific.
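The second existence statement above can be settled by a finite search, which also produces the specific witness. In Python:

```python
# 4294967297 = 2**32 + 1; search for its smallest divisor greater than 1
n = 4294967297
k = next(d for d in range(2, n) if n % d == 0)

assert 1 < k < n and n % k == 0
print(k)  # → 641
```

Here the computation does what the bare existence statement does not: it names the divisor.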
3.3. Negation of a statement. A negation of a statement is a statement which is true
precisely when the original statement is false and vice-versa. The simplest way to negate a
statement is to precede it with the phrase "It is not the case that ...". Thus the negation of
Mr. X is rich
is
It is not the case that Mr. X is rich.
Note that the negation of
All men are mortal
is not
All men are immortal.
In view of the comments we have made about the truth of statements about a class, the
statement 'All men are mortal' is false even when it fails to hold in just one case, that is,
when there is even one man who is not mortal. So the correct logical negation is
There exists a man who is immortal.


Not surprisingly, the negation of an existence statement is a statement asserting that every
member of the class (to which the existence statement refers) fails to have the property
asserted by the existence statement. Thus, the negation of
There exists a rich man
is
No man is rich
that is,
Every man is poor .
If we keep in mind these simple facts, we can almost mechanically write down the negation of
any complicated statement. For example, if a_1, a_2, a_3, . . . is a sequence of real numbers, then
the negation of
∀ ε ∈ ℝ such that ε > 0, ∃ N ∈ ℕ such that ∀ n ∈ ℕ satisfying n > N, |a_n − L| < ε
is
∃ ε ∈ ℝ such that ε > 0 and ∀ N ∈ ℕ, ∃ n ∈ ℕ such that n > N and |a_n − L| ≥ ε.
If a statement is denoted by some symbol P, then the negation of P is denoted by ¬P.
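Over a finite class these negation rules can be checked mechanically. In Python, `all` and `any` play the roles of ∀ and ∃ (the example data is of course invented):

```python
men = [{"name": "Alpha", "mortal": True},
       {"name": "Beta", "mortal": False}]

# "All men are mortal"
statement = all(m["mortal"] for m in men)
# its negation: "there exists a man who is immortal"
negation = any(not m["mortal"] for m in men)

assert statement is False and negation is True
# a statement and its negation always have opposite truth values
assert negation == (not statement)
```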
3.4. Vacuous truth. An interesting point arises while dealing with statements about a class.
A class which contains no elements at all is called a vacuous or empty class. For example,
the class of all six-legged men is empty because there is no man who has six legs. But now
consider the statement
(3) Every six-legged man is happy.

Is this statement true or false? We cannot call it meaningless. It has a definite meaning, just
like the statement
Every rich man is happy.
We may call the statement (3) useless, but that does not debar it from being true or false.
Which way is it then? Here the reasoning goes as follows. Because of bivalued logic, the
statement (3) has to be either true or false, but not both. If it is false, then its negation is
true. But the negation is the statement
There exists a six-legged man who is not happy.
But this statement can never be true because there exists no six-legged man whatsoever (and
so the question of his being happy or unhappy does not arise at all). So the negation has to
be false, and hence the original statement is true!
A layman may hesitate in accepting the above reasoning, and we give some recognition to
his hesitation by calling such statements vacuously true, meaning that they are true
simply because there is no example to render them false.
Note, by the way, that the statement
(4) Every six-legged man is unhappy

is also true (albeit vacuously). There is no contradiction here because the statements (3) and
(4) are not negations of each other.
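This convention is built into the usual programming quantifiers too: in Python, `all` over an empty collection returns True and `any` returns False, which is exactly the reasoning above.

```python
six_legged_men = []  # the empty class

# statement (3): every six-legged man is happy -- vacuously true
assert all(m["happy"] for m in six_legged_men)
# its negation: there exists an unhappy six-legged man -- false
assert not any(not m["happy"] for m in six_legged_men)
# statement (4): every six-legged man is unhappy -- also vacuously true
assert all(not m["happy"] for m in six_legged_men)
```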


What is the use of vacuously true statements? Certainly, no mathematician goes on proving
theorems which are known to be vacuously true. But such statements sometimes arise as
special cases of a more general result.
3.5. Logical precision in mathematics. The importance of logic in mathematics cannot
be over-emphasized. Logical reasoning being the soul of mathematics, even a single flaw of
reasoning can thwart an entire piece of research work. We already pointed out that in mathematics every theorem has to be deduced from the axioms in a strictly deductive manner.
Every step has to be justified, and this is the rule for all mathematics. But it deserves to
be emphasized here, since in high-school mathematics, the concern was usually with numbers. Consequently the required justifications were based upon some very basic properties
of numbers, and their specific mention was rarely made. For example, if we are to solve
3(x + 3)(x − 3) = 30, we mechanically solve it in the following steps:
(x + 3)(x − 3) = 10
x² − 9 = 10
x² = 19
x = √19 or −√19.

Although no justification is given for these steps, they require various properties of real
numbers such as associativity, commutativity and cancellation laws for multiplication and
addition, distributivity of multiplication over addition, and finally, the existence of square
roots of real numbers. So far in high-school, we ignored these. But in mathematics, one
sometimes considers abstract algebraic systems where some of these laws of associativity,
distributivity and so on do not hold. Then the justification for each step will have to be given
carefully, starting from the axioms. Hence a proof is needed even when the statement may
seem obvious.
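The roots obtained above can at least be checked by substitution (reading the equation as 3(x + 3)(x − 3) = 30):

```python
import math

for x in (math.sqrt(19), -math.sqrt(19)):
    # substitute each root back into the equation, up to floating-point error
    assert abs(3 * (x + 3) * (x - 3) - 30) < 1e-9
```

Such a check verifies the particular roots, but it is no substitute for justifying each algebraic step.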
4. Sets and functions
It is sometimes said that mathematics is the study of sets and functions. This is an
oversimplification of matters, but there is much truth in it.
4.1. Sets. A set is a collection of objects considered together. For instance, the set of all
positive integers, or the set of all rational numbers, and so on. The set comprising no elements
is called the empty set, and it is denoted by ∅. The objects belonging to the set are called its
elements. For example, 2 is an element belonging to the set of positive integers.
There are two standard methods of specifying a particular set.
Method 1. Whenever it is feasible to do so, we can list its elements between braces. Thus
{1, 2, 3} is the set comprising the first three positive integers.
This manner of specifying a set, by listing its elements, is unworkable in many circumstances. We then use the second method, which is to use a property that characterizes the
elements of the set.
Method 2. If P denotes a certain property of elements, then {a | P} stands for the set of all
elements a for which the property P is true. (The symbol | is read as 'such that'; some
authors use : instead of |.) The set then contains all those elements (and


no others) which possess the stated property. For example,


{a | a is real and irrational}
is the set of all a such that a is real and irrational, that is, those real numbers that cannot be
written as a quotient of two integers. The set {1, 2, 3} can also be described as
{n | n is an integer and 0 < n < 4}.
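Both methods have direct analogues in Python: a literal listing, and a set comprehension playing the role of {a | P} (unlike in mathematics, the comprehension must range over an explicitly finite universe, here range(-10, 10)):

```python
# Method 1: list the elements
A = {1, 2, 3}
# Method 2: characterize the elements by a property
B = {n for n in range(-10, 10) if 0 < n < 4}

assert A == B  # the same set, described in two ways
```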
We usually denote elements by small letters and sets by capital letters. If a is an element
of the set A, then we abbreviate this by writing
a ∈ A.
Similarly, if a does not belong to the set A, then we write a ∉ A.
We say that a set A is a subset of a set B if every element belonging to A also belongs to
B. We then write
A ⊆ B.
(Some authors use the notation A ⊂ B instead of A ⊆ B.) If A ⊆ B, we sometimes say that
A is contained in B or that B contains A. For example, the set of integers is contained in
the set of rational numbers: ℤ ⊆ ℚ.
For any set A, A ⊆ A and ∅ ⊆ A. Two sets A and B are said to be equal if they consist of
exactly the same elements, and we then write A = B. A is equal to B iff A ⊆ B and B ⊆ A.
If A ⊆ B and A ≠ B, then we say that A is strictly contained in B, or that B strictly
contains A.
The intersection of sets A and B, denoted by A ∩ B, is the set of all elements that belong
to A and to B:
A ∩ B = {a | a ∈ A and a ∈ B}.
If A ∩ B = ∅, then the sets A and B are said to be disjoint. For example, the intersection of
the set of all integers divisible by 2 and the set of all integers divisible by 3 is the set of all
integers divisible by 6. More generally, if A_1, . . . , A_n are sets, then their intersection is the
set
{a | for all i ∈ {1, . . . , n}, a ∈ A_i}.
The intersection of the sets A_1, . . . , A_n is denoted by ⋂_{i=1}^{n} A_i. If we have an infinite family of
sets, say, A_n, n ∈ ℕ, then their intersection is the set
{a | for all i ∈ ℕ, a ∈ A_i},
which is denoted by ⋂_{i=1}^{∞} A_i.

The union of sets A and B, denoted by A ∪ B, is the set of all elements that belong to A
or to B:
A ∪ B = {a | a ∈ A or a ∈ B}.
For example, the union of the set of even integers and the set of odd integers is the set of all
integers. More generally, if A_1, . . . , A_n are sets, then their union is the set
{a | there exists an i ∈ {1, . . . , n} such that a ∈ A_i}.
The union of the sets A_1, . . . , A_n is denoted by ⋃_{i=1}^{n} A_i. If we have an infinite family of sets,
say, A_n, n ∈ ℕ, then their union is the set
{a | there exists an i ∈ ℕ such that a ∈ A_i},
which is denoted by ⋃_{i=1}^{∞} A_i.
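The divisibility examples above can be replayed with Python's built-in set operations, restricted to a finite range of integers:

```python
N = 100
div2 = {n for n in range(1, N + 1) if n % 2 == 0}
div3 = {n for n in range(1, N + 1) if n % 3 == 0}
div6 = {n for n in range(1, N + 1) if n % 6 == 0}
odd = {n for n in range(1, N + 1) if n % 2 == 1}

assert div2 & div3 == div6                 # intersection: divisible by 2 and by 3
assert div2 | odd == set(range(1, N + 1))  # union: evens and odds give everything
assert div2 & odd == set()                 # evens and odds are disjoint
```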

Given sets A and B, the product of A and B is defined as the set of all ordered pairs (a, b)
such that a is from A and b is from B. The product of the sets A and B is denoted by A × B.
Thus,
A × B = {(a, b) | a ∈ A and b ∈ B}.
We do not define an ordered pair, but remark that unless a = b, (a, b) is not the same as
(b, a). The name product is justified, since if A and B are finite, and have m and n elements,
respectively, then the set A × B has mn elements. Note that as sets A × B and B × A are not
equal, even though they have the same number of elements. Similarly, given sets A_1, . . . , A_n,
we define A_1 × · · · × A_n by
A_1 × · · · × A_n = {(a_1, . . . , a_n) | a_i ∈ A_i for all i ∈ {1, . . . , n}}.
If all the A_i's are equal to the set A, then we denote A × · · · × A by A^n.
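For finite sets the product, the claim that A × B has mn elements, and the fact that A × B differs from B × A can all be checked with itertools.product:

```python
from itertools import product

A = {1, 2, 3}        # m = 3 elements
B = {"x", "y"}       # n = 2 elements

AxB = set(product(A, B))   # all ordered pairs (a, b)
BxA = set(product(B, A))

assert len(AxB) == len(A) * len(B)   # mn elements
assert AxB != BxA                    # as sets they are not equal ...
assert len(AxB) == len(BxA)          # ... though they have the same size
```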

4.2. Functions or maps. Let A and B be two nonempty sets. A function (or a map) f is a
rule which assigns to each element a ∈ A an element of the set B.
The set A is called the domain, and B is called the codomain of the function f. We write
f : A → B,
where A is the domain and B is the codomain. If a ∈ A, then f takes a to an element in B,
and this element of B is denoted by f(a). The element f(a) (∈ B) is called the image of
a under f. We sometimes also say that f maps a to f(a). The set of all images is called the
image of f, and this set is denoted by f(A):
f(A) = {b ∈ B | there exists an a ∈ A such that f(a) = b}.
Clearly f(A) ⊆ B.
For example, if we take A = B = ℤ and consider the rule f that assigns to the integer n
the integer n², then we obtain a function f : ℤ → ℤ given by
(5) f(n) = n²,    n ∈ ℤ.
We observe that the image of f is the set f(ℤ) = {0, 1, 4, 9, . . . } comprising the squares of
integers, which is strictly contained in the codomain ℤ. Thus f(ℤ) ⊆ ℤ, but f(ℤ) ≠ ℤ.
Note that while talking about a function, one has to keep in mind that a function really
consists of three objects: its domain A, its codomain B and the rule f. Thus, for example, if
the function g : ℤ → ℚ is given by g(n) = n², n ∈ ℤ, then g is a different function from the
function f : ℤ → ℤ given by (5) above, since the codomain of f is ℤ, while that of g is ℚ.
Functions can be between far more general objects than sets comprising numbers. The
important thing to remember is that the rule of assignment is such that for each element
from the domain there is only one element assigned from the codomain. For example, if we
take the set A to be the set of all human beings in the world, and B to be the set of all
females on the planet, and f : A → B to be the function which associates to a person his/her


mother. Then we see that f is a function. However, if g is the rule which assigns to each person
a sister he/she has, then clearly this is not a function, since there are people with more
than one sister (and also there are people who do not have any sister).
Properties of functions play an important role in mathematics, and we highlight two very
important types of functions.
A function f : A → B is said to be injective (or one-to-one) if
(6) for all a_1 and a_2 in A such that f(a_1) = f(a_2), a_1 = a_2.
Equivalently, a function f : A → B is injective iff
for all a_1 and a_2 in A such that a_1 ≠ a_2, f(a_1) ≠ f(a_2).

This means that for each point b in the image f(A) of a function f : A → B, there is a unique
point a in the domain A such that f(a) = b. The function f : ℤ → ℤ given by (5) is not
injective, since for instance for the points 1, −1 in the domain ℤ, we see that 1 ≠ −1, but
f(1) = 1 = f(−1). However, the function g : ℤ → ℤ given by
g(n) = 2n,    n ∈ ℤ,
is injective.
A function f : A → B is said to be surjective (or onto) if
(7) for all b ∈ B, there exists an a in A such that f(a) = b.
Note that (7) is equivalent to f(A) = B. In other words, a function is surjective if every
element of the codomain is the image of some element. The map f : ℤ → ℤ given by (5) is
not surjective. Indeed, −1 is an element of the codomain, but there is no element n of
the domain ℤ such that f(n) = n² = −1. Consider the map g : ℤ → {0, 1, 4, 9, . . . } given
by
g(n) = n²,    n ∈ ℤ.
Then clearly g is surjective.
A function f : A → B is said to be bijective if it is injective and surjective. Thus to check
that a map is bijective we have to check two things, injectivity and surjectivity. Consider the
map h from the set of all integers to the set of all even integers, given by
h(n) = 2n,    n ∈ ℤ.

Then h is injective and surjective, and so it is bijective.
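For functions on finite sets, definitions (6) and (7) can be tested mechanically. The helper names below are our own, and a finite range of integers stands in for ℤ:

```python
def is_injective(f, domain):
    # injective: no two distinct points share an image
    images = [f(a) for a in domain]
    return len(images) == len(set(images))

def is_surjective(f, domain, codomain):
    # surjective: every element of the codomain is hit
    return {f(a) for a in domain} == set(codomain)

Z = range(-10, 11)                               # finite stand-in for the integers
assert not is_injective(lambda n: n * n, Z)      # f(n) = n^2 is not injective
assert is_injective(lambda n: 2 * n, Z)          # g(n) = 2n is injective
evens = {2 * n for n in Z}
assert is_surjective(lambda n: 2 * n, Z, evens)  # h : integers -> even integers is onto
```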


Finally we mention a situation that arises frequently in mathematics. Let f : A → B be
a function, and suppose that S is a nonempty subset of A. Then one can construct a new
function from f as follows. Consider the function g : S → B, defined by
g(a) = f(a),    a ∈ S.

This function is called the restriction of f to S, and it is denoted by f|_S.


5. Some common mathematical notation
ℕ is the set of natural numbers 1, 2, 3, . . .
ℤ is the set of integers . . . , −3, −2, −1, 0, 1, 2, 3, . . .
ℚ is the set of rational numbers
ℝ is the set of real numbers
ℂ is the set of complex numbers.


∀ means 'for all' or 'for every'
∃ means 'there exists'
:= means 'is defined to be' or 'defined by'

Optimisation Theory

2007/08

MA 208
Notes 1
Introduction to Continuous Optimisation
Some Mathematical Background
As mentioned in the general information of this course, the first part of this course is based
on the text book A First Course in Optimization Theory by R.K. Sundaram (Cambridge University Press (1996), ISBN 0-521-49770-1). (Notice the difference between the British spelling
'optimisation' and the American 'optimization'. We will use the British spelling, which
means that the correct title of this course is Optimisation Theory.) As far as possible, we
will follow the notation and conventions in that book.

The notes will mainly consist of extra material ( or a different explanation of material covered in the book ), plus a description of the parts of the book relevant for the topic under
consideration.
1.1 The basic problems
Throughout this part of the course we will assume that we are given a function f : D → ℝ,
where D is a certain subset of ℝ^n, for some n ≥ 1. The function f is called the objective
function and D is the constraint set.
And the optimisation problem is: what is the maximum or minimum of f(x) when x ∈ D?
We will write these problems as
maximise f(x) subject to x ∈ D    and    minimise f(x) subject to x ∈ D;
or more compactly as
maximise f(x) for D    and    minimise f(x) for D,
or
max{ f(x) | x ∈ D }    and    min{ f(x) | x ∈ D }.

In this course we will make no assumptions on f or D (or on the dimension n, for that
matter) unless explicitly stated. So don't assume that every function is differentiable or
continuous, or that D will not be some nasty set.

Author: Jan van den Heuvel    © London School of Economics, 2008


More precisely, a solution to the problem max{ f(x) | x ∈ D } is a point x ∈ D such that
f(x) ≥ f(y),    for all y ∈ D.
And a solution to the problem min{ f(x) | x ∈ D } is a point z ∈ D such that
f(z) ≤ f(y),    for all y ∈ D.
If such points x or z do exist, then they are called a global maximum of f on D and a global
minimum of f on D, respectively.
Notice that there is a difference between 'a solution' to one of the optimisation problems
above and 'the solutions' (or the set of solutions) to the optimisation problem.
Also realise that quite often there will be no solution at all; the set of solutions is the empty
set in that case.
We will use the notation arg max{ f(x) | x ∈ D } to denote the set of all solutions to the
problem max{ f(x) | x ∈ D }, and arg min{ f(x) | x ∈ D } for the minimisation problem.

The set of attainable values of f on D, or the image of D under f, is
f(D) = { y ∈ ℝ | f(x) = y for some x ∈ D }.
Using this notation it is immediately clear that f has a maximum (minimum) on D if and
only if f(D) has a maximal element (minimal element).
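When D is finite, both problems and the set arg max can be computed by enumeration; a small sketch:

```python
D = [-2, -1, 0, 1, 2]
f = lambda x: -x * x                      # objective function

fD = {f(x) for x in D}                    # set of attainable values f(D)
max_value = max(fD)                       # f has a maximum iff f(D) has a maximal element
arg_max = {x for x in D if f(x) == max_value}

assert max_value == 0
assert arg_max == {0}
```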

1.2 First results


The following are some easy-to-prove first results. Because they are so obvious, we often use
them implicitly without referring to them.
Theorem
A point x ∈ D is a maximum of f on D if and only if x is a minimum of −f on D.
Because of the theorem above, we usually only discuss the maximisation problem.
Theorem
Suppose we can write D as a finite union of sets D = D_1 ∪ D_2 ∪ · · · ∪ D_k, such that f has a
maximum on each subset D_i. Then f has a maximum on D.
Note that in this theorem the subsets need not be disjoint.
Theorem
Suppose we can write D as a union of two sets D = D_1 ∪ D_2, such that the following two
properties hold:
* There is a point x ∈ D_1 such that for each y ∈ D_2 we have f(x) ≥ f(y).
* f has a maximum on D_1.
Then f has a maximum on D.


Theorem 2.5 in the book looks kind of scary, but it is also quite obvious. The composition φ ∘ f of
a function φ : ℝ → ℝ and a function f : D → ℝ is the function given by
(φ ∘ f)(x) = φ(f(x)),    for all x ∈ D.

A better way to state the theorem would be :


Theorem 2.5
Suppose φ : f(D) → ℝ is a strictly increasing function, hence for all y_1, y_2 ∈ f(D) with
y_1 > y_2 we have φ(y_1) > φ(y_2). Then x is a maximum of f on D if and only if x is also a
maximum of the composition φ ∘ f on D.
As an example, if f(y) ≥ 0 for all y ∈ D, then finding the maximum of f on D is the same as
finding the maximum of (f(y))² or √(f(y)) on D.
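Theorem 2.5 is easily illustrated on a finite constraint set: composing the objective with a strictly increasing φ (here φ(y) = y³, our choice) leaves the set of maximisers unchanged.

```python
D = [-3, -1, 0, 2, 5]
f = lambda x: 10 - (x - 2) ** 2     # maximised at x = 2
phi = lambda y: y ** 3              # strictly increasing on all of R

arg_max_f = {x for x in D if f(x) == max(f(z) for z in D)}
arg_max_phi_f = {x for x in D if phi(f(x)) == max(phi(f(z)) for z in D)}

assert arg_max_f == arg_max_phi_f == {2}
```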

1.3 The very basics of mathematics needed


Basic mathematical notation and conventions can be found in Appendix A of the book
( starting at page 315 ). You should read it to refresh your memory.
The other appendices in the book can be ignored.

Notice that in this course we use no clear distinction in notation between a real number
x ∈ ℝ and a vector or point y ∈ ℝ^n. Also, vectors are in general written with the coordinates
in a row, x = (x_1, . . . , x_n), but will be considered as column vectors. If we want to regard the
same vector as a row vector, an accent x′ is added to its name.
Be careful with the notation to compare two vectors. So if x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n)
have the same dimension, then
x = y,    if x_i = y_i for all i = 1, . . . , n;
x ≥ y,    if x_i ≥ y_i for all i = 1, . . . , n;
x > y,    if x_i ≥ y_i for all i = 1, . . . , n and x_i > y_i for at least one i = 1, . . . , n;
x ≫ y,    if x_i > y_i for all i = 1, . . . , n.
Similar definitions exist for ≤, <, and ≪.


Also be careful that, for instance, if x ≱ y, then that doesn't mean that x < y. Quite often it
is not possible to compare two vectors, such as x = (0, 1) and y = (1, 0).
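The componentwise comparisons can be written out explicitly (the helper names are our own); note how x = (0, 1) and y = (1, 0) satisfy none of them:

```python
def vec_ge(x, y):   # x >= y : componentwise >=
    return all(xi >= yi for xi, yi in zip(x, y))

def vec_gt(x, y):   # x > y : componentwise >=, strict in at least one coordinate
    return vec_ge(x, y) and any(xi > yi for xi, yi in zip(x, y))

def vec_gg(x, y):   # x >> y : strict in every coordinate
    return all(xi > yi for xi, yi in zip(x, y))

x, y = (0, 1), (1, 0)
# the two vectors are incomparable: neither x >= y nor y >= x
assert not vec_ge(x, y) and not vec_ge(y, x)
assert vec_gt((2, 1), (1, 1)) and not vec_gg((2, 1), (1, 1))
```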

The notation for the inner product is again somewhat different from the one you may be
used to. In this course we just write x · y for the inner product of two vectors. So we can
write the norm of a vector x as ‖x‖ = √(x · x).
The distance d(x, y) of two vectors x, y ∈ ℝ^n is just the norm of their difference: d(x, y) =
‖x − y‖.

The most important result on the norm of vectors and the distance between two vectors is
the Triangle Inequality. It comes in two flavours:
* For any two vectors x, y ∈ ℝ^n we have ‖x + y‖ ≤ ‖x‖ + ‖y‖.
* For any three vectors x, y, z ∈ ℝ^n we have d(x, z) ≤ d(x, y) + d(y, z).
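Both flavours of the Triangle Inequality are easy to spot-check numerically (with our own helper functions for the norm and distance):

```python
import math

def norm(x):
    # ||x|| = sqrt(x . x)
    return math.sqrt(sum(xi * xi for xi in x))

def dist(x, y):
    # d(x, y) = ||x - y||
    return norm([xi - yi for xi, yi in zip(x, y)])

x, y, z = (1.0, 2.0), (3.0, -1.0), (0.0, 0.5)
assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y)
assert dist(x, z) <= dist(x, y) + dist(y, z)
```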


1.4 Basic concepts for functions


Some important general definitions for functions are:
* If f : S → T, where S ⊆ ℝ^n and T ⊆ ℝ^m, then S is called the domain of f and T is the
range of f.
* For R ⊆ S, the set of attainable values of f on R, or the image of R under f, is
f(R) = { y ∈ T | f(x) = y for some x ∈ R }.
* For U ⊆ T, define the inverse image of U, f⁻¹(U), by
f⁻¹(U) = { x ∈ S | f(x) = y for some y ∈ U }.
Notice that f⁻¹ is not the inverse of f, because it is not always true that f⁻¹(f({x})) = {x}
for all x ∈ S, or f(f⁻¹({y})) = {y} for all y ∈ T.
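Images and inverse images of finite sets can be computed directly, and the computation shows why f⁻¹ is not an inverse of f (the helper names are our own):

```python
S = range(-3, 4)            # finite domain
f = lambda x: x * x

def image(f, R):
    return {f(x) for x in R}

def inverse_image(f, S, U):
    return {x for x in S if f(x) in U}

assert image(f, S) == {0, 1, 4, 9}
assert inverse_image(f, S, {4}) == {-2, 2}
# f^{-1}(f({2})) = {-2, 2}, which is not {2}
assert inverse_image(f, S, image(f, {2})) != {2}
```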

1.5 Sequences, convergence and limits


In order to be able to do some more involved analysis, there is no way to avoid sequences
and limits.
A sequence x_1, x_2, . . . will be denoted by {x_k}. Because subscripts are also used to indicate
the different elements of a vector (as in x = (x_1, . . . , x_n)), we also use x^1, x^2, . . . and {x^k}.
This last notation is used in particular when we have a sequence of vectors. Then we have
the sequence {x^k}, and each x^k ∈ ℝ^n has n coordinates: x^k = (x_1^k, x_2^k, . . . , x_n^k).

In a way, the following two are the only definitions of convergence involving epsilons and
deltas you need to know.
Definition
A sequence of real numbers {x_k} converges to zero, notation x_k → 0, if for all ε > 0 there is
an integer K such that |x_k| < ε for all k ≥ K.
Definition
A sequence of real numbers {x_k} diverges to +∞, notation x_k → +∞ or x_k → ∞, if for all
M ∈ ℝ there is an integer K such that x_k > M for all k ≥ K.
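For the concrete sequence x_k = 1/k, which converges to zero, a suitable K can be computed for each ε; a sketch:

```python
import math

def K_for(eps):
    # any integer K > 1/eps works: then |1/k| < eps for all k >= K
    return math.ceil(1 / eps) + 1

for eps in (0.1, 0.01, 0.001):
    K = K_for(eps)
    # check the defining condition on a stretch of the tail
    assert all(abs(1 / k) < eps for k in range(K, K + 1000))
```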

From the first definition in the previous paragraph, we can obtain all other convergence
concepts we need.
Definition
A sequence of real numbers {x_k} converges to x for some x ∈ ℝ, notation x_k → x, if
x_k − x → 0.
Definition
A sequence of points {x_k} in ℝ^n converges to x for some x ∈ ℝ^n, notation x_k → x, if
‖x_k − x‖ → 0.
Those of you who like definitions with epsilons and so on can use the following equivalent
definition:


Definition
A sequence of points {x_k} in ℝ^n converges to x for some x ∈ ℝ^n, notation x_k → x, if for all
ε > 0 there is an integer K such that ‖x_k − x‖ < ε for all k ≥ K.
Instead of 'converges to x' we also write 'is convergent with limit x'.
Note that there is no concept of convergence to infinity if we are dealing with sequences of
points in ℝ^n.

Apart from knowing the definition of convergence, you should also have a fairly good idea
what it means if a sequence is not convergent:
* A sequence of points {x_k} in ℝ^n is not convergent to x ∈ ℝ^n if there exists an ε > 0 such
that for all integers K there is a k > K with ‖x_k − x‖ ≥ ε.
In particular this means that if we have a sequence of real numbers {x_k} with x_k → +∞,
then {x_k} does not converge.

Some of the basic results concerning convergence of sequences are :


Theorem
A sequence in ℝ^n has at most one limit.
Theorem
A sequence {x^k} in ℝ^n converges to x, where x^k = (x_1^k, . . . , x_n^k) and x = (x_1, . . . , x_n), if and
only if x_i^k → x_i for all i = 1, . . . , n.
Although the latter theorem looks fairly obvious, the formal proof with all details is actually
quite subtle. You should have a look at this proof at least once. It can be found on page 9 of
the book.

There is one more property that is probably well-known.


Property
Suppose a sequence {x_k} of real numbers is monotonic nondecreasing (so x_1 ≤ x_2 ≤ · · ·)
and there is a number M ∈ ℝ such that x_k ≤ M for all k = 1, 2, . . .. Then the sequence has a
limit x, with x ≤ M.

1.6 Open and closed sets


For any x ∈ ℝ^n and real number r > 0, the open ball B(x, r) with centre x and radius r is the set
B(x, r) = { y ∈ ℝ^n | ‖x − y‖ < r }.
We will use the following definitions for open and closed sets:
Definition
A set S ⊆ ℝ^n is open if for all x ∈ S there is an r > 0 such that B(x, r) ⊆ S.
A set S ⊆ ℝ^n is closed if for all sequences {x_k} in S that converge to a limit x, also x ∈ S.
The two definitions are related by the following theorem, which is equivalent to Theorem 1.20 in the book:
Theorem
A set S ⊆ ℝ^n is closed if and only if its complement S^c = { x ∈ ℝ^n | x ∉ S } is open.


The following should be known notation for the union and intersection of arbitrary collections of sets (S_α)_{α ∈ A}, where A is some index set:
⋃_{α ∈ A} S_α = { x | x ∈ S_α for some α ∈ A };
⋂_{α ∈ A} S_α = { x | x ∈ S_α for all α ∈ A }.
Also, for two sets S_1, S_2 ⊆ ℝ^n define the sum S_1 + S_2 by:
S_1 + S_2 = { x ∈ ℝ^n | x = x_1 + x_2 for some x_1 ∈ S_1 and x_2 ∈ S_2 }.

You shouldn't try to remember the following results, but try to get a feeling for why they are
true.
* The union of an arbitrary collection of open sets is again open.
* The intersection of an arbitrary collection of open sets is not always open.
* The intersection of a finite collection of open sets is again open.
* The sum of two open sets is again open.
* The union of an arbitrary collection of closed sets is not always closed.
* The union of a finite collection of closed sets is again closed.
* The intersection of an arbitrary collection of closed sets is again closed.
* The sum of two closed sets is not always closed.
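The second bullet is the one that usually surprises people. A classic counterexample ( not in the notes, but standard ) is the family of open intervals I_k = ( −1/k, 1/k ) : each is open, but their intersection is { 0 }, which is not open. The sketch below checks this numerically :

```python
# Each I_k = (-1/k, 1/k) is open and contains 0, but any ball (-r, r)
# around 0 eventually pokes out of I_k once 1/k < r: the intersection
# of all the I_k is {0}, which is not open.
def in_interval(x, k):
    return -1.0 / k < x < 1.0 / k

zero_in_all = all(in_interval(0.0, k) for k in range(1, 1001))

r = 0.01                               # candidate radius of a ball around 0
escapes = not in_interval(r / 2, 500)  # r/2 lies in (-r, r) but not in I_500
```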

1.7 Upper and lower bound; supremum and infimum; maximum and minimum
As said in the first subsection, when looking at a problem of the type max{ f ( x ) | x ∈ D }
we're actually looking at the properties of f ( D ). Since f ( D ) is a subset of R, we take a closer
look at subsets of R.
Definition
Let A be a nonempty subset of R.
* An upper bound of A is a point u ∈ R such that u ≥ a for all a ∈ A.
A lower bound of A is a point ℓ ∈ R such that ℓ ≤ a for all a ∈ A.
* If A has at least one upper bound, then the supremum of A, notation sup( A ), is the smallest
upper bound of A.
If A has no upper bound, then we set sup( A ) = +∞.
If A has at least one lower bound, then the infimum of A, notation inf( A ), is the largest
lower bound of A.
If A has no lower bound, then we set inf( A ) = −∞.
* The maximum of A, notation max( A ), is a point z ∈ A such that z ≥ a for all a ∈ A.
The minimum of A, notation min( A ), is a point w ∈ A such that w ≤ a for all a ∈ A.


The most complicated of the definitions above are those for supremum and infimum. From
the definitions it's often not so easy to show whether or not a point is the supremum of a set.
An alternative definition would be the following :
Property
Let A ⊆ R be a nonempty set. If A has an upper bound, then the supremum of A is the
unique point m with the following properties :
For each x > m we have that x ∉ A.
For each x < m we have that there is an a ∈ A such that a ≥ x.
If A has no upper bound, then sup( A ) = +∞.


A second important property is that a nonempty subset of R always has an infimum and a
supremum. This is basically what is said in Theorem 1.13 in the book. The following, which
will be proved in the lectures, will more or less prove the same.
Property
Let A ⊆ R be a nonempty set with an upper bound. Then there exists a sequence { a_k } in A
such that a_k → a, where a is an upper bound of A. In particular, a is the supremum of A.
Some further important relations are the following.
Property
Let A ⊆ R be a nonempty set.
If the supremum sup( A ) is an element of A, then A has a maximum, and max( A ) = sup( A ).
If the maximum max( A ) exists, then also max( A ) = sup( A ).

1.8 Matrices
Section 1.3 in the book contains much about matrices that I assume you are familiar with.
Read it sometime to refresh your memory. Some items will be discussed in more detail later
in the course, when appropriate.

Remember that vectors are assumed to be column vectors, although written with the coordinates in a row. So if A is an m × n matrix ( hence has m rows and n columns ) and
x = ( x_1 , . . . , x_n ) is an n-vector, then we will write A x for the product of A and x. If we want
to multiply the matrix from the left by the row vector x, we will write x′ A.
The assumption above means that we could write the inner product of two vectors x, y ∈ R^n
as x · y = x′ y. We will in general only use the second notation when a matrix is involved in
the middle. Hence if we take the inner product of x and A y, then we will usually write this
as x′ A y, which is the same as x · A y.
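A tiny sketch of this convention in plain Python ( the matrix and vectors are invented for illustration ) : x′ A y is just the inner product of x with the vector A y.

```python
def mat_vec(A, v):
    # product of a matrix (list of rows) with a column vector
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = [1.0, 2.0]
A = [[1.0, 0.0],
     [0.0, 3.0]]
y = [4.0, 5.0]

xAy = dot(x, mat_vec(A, y))   # x' A y  =  x . (A y)
```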


From the book


The first two subsections of these notes correspond more or less with Section 2.1 in the
book.
Section 2.2 in the book deals with optimisation problems in parametric form, where there
is some external parameter determining the objective function and/or the constraint set.
We will now and then encounter this kind of problem, but the approach in Section 2.2 is
too heavy for us. Ignore this section.
Section 2.3 in the book contains a whole collection of optimisation problems encountered
in economics or finance. Read it; many of these examples will return later.
Section 2.4 lists some general questions we will keep ourselves busy with during this
course. Nothing complicated there.
Section 2.5 gives an outline of the book, which is not the same as an outline of this course.
Ignore.

Section 1.1 in the book gives a list of some basic mathematical facts that I assume to be well-known. Proofs from here will not be discussed in the course.
You should have some idea of what is happening in subsections 1.2.1 – 1.2.3 of the book.
Compare with section 1.3 in these notes.
Theorem 1.5 and the related Theorem 1.8 are obvious, but important results. Theorems 1.10 and 1.11 will be needed now and then in some of the more formal proofs,
but you yourself don't really need to know them in detail. Proofs in these sections of the
book are of little interest to us.
Everything we need to know about subsection 1.2.4 in the book can be found in section 1.7
of these notes. The main ideas of the proof of Theorem 1.13 will be discussed in the
lectures.
Subsections 1.2.5 and 1.2.6 are of little interest to us.
Everything you need to know about subsection 1.2.7 is in section 1.6 of these notes.
Subsection 1.2.8 will be discussed in the next notes.
We will try to avoid convex sets as much as possible in this course! There are plenty of
courses at the LSE where you can learn about them. So ignore subsection 1.2.9.
The contents of subsection 1.2.10 will be discussed at some places in these notes. Read
this subsection once to see what is happening.

As said before, Section 1.3 in the book contains much about matrices that I assume you
are familiar with. Read it sometime to refresh your memory.
We will need all kinds of bits and pieces from Sections 1.4 to 1.6 in the book. We will
discuss them when they are needed. So ignore at the moment.


Suitable exercises from the book related to these notes


Section 1.7 : 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 23, 24, 29, 30, 38, 39, 40, 42, 44.
Section 2.6 : 1, 2, 3, 5, 6, 7, 8.

Extra Exercises
1 (a) Use the definition in 1.4 to prove that the sequence { x_k } given by x_k = 1/k does converge to zero.
(b) Use the definition in 1.4 to prove that the sequence { y_k } given by y_k = (−1)^k does not
converge to any limit.

2 Use the definition of closed set to prove the following statements.


(a) Let 0 be the origin of R^n. Then the set A = { 0 } is closed.
(b) Any set B ⊆ R^n with one element is closed.
(c) Define 0 as in (a) and let x = ( 1, . . . , 1 ) ∈ R^n. Then the set C = { 0, x } is closed.
(d) Any set D ⊆ R^n with two elements is closed.
(e) Any finite set E ⊆ R^n is closed.
(f) If F = Z is the set of integers, then F is a closed subset of R.
(g) Let G ⊆ R^n be the empty set ( i.e., the set without elements ). Then G is closed.
(h) Let H = R^n be the entire space. Then H is closed.

3 Determine which of the sets A, B, C, D, E, F, G, H from the previous question are open. Justify your answers !

4 Let A ⊆ R be a nonempty set. Use the definitions from 1.7 to prove the following statements.
(a) If inf( A ) = sup( A ), then A has only one element.
(b) If A is an open set, then sup( A ) ∉ A.

Optimisation Theory
MA 208

2007/08

Notes 2
Bounded and compact sets
Continuous Functions
Weierstrass Theorem

2.1 Bounded and compact sets


A set S ⊆ R^n is bounded if there exists an M > 0 such that S ⊆ B( 0, M ). In other words, S is
bounded if there is an M > 0 such that ∥ x ∥ < M for all x ∈ S.
Do not mix up the notion of a set being bounded, which makes sense in any dimension, with
that of having an upper or lower bound, which only applies to subsets of R.

We use the following two statements as equivalent definitions of a compact set in R^n :
* A set S ⊆ R^n is compact if for all sequences { x_k } in S, there exists a subsequence { x_{m(k)} }
such that x_{m(k)} → x for some x ∈ S.
( A subsequence { x_{m(k)} } of a sequence { x_k } is an infinite sequence x_{m(1)} , x_{m(2)} , x_{m(3)} , . . .,
where m(1), m(2), m(3), . . . is an increasing sequence of integers. )
* A set S ⊆ R^n is compact if it is both closed and bounded.
The reason that two different definitions exist originates in the more general setting of compact
sets in topological spaces. The second definition is in general easier to use
if you need to show that a given set is compact; but the first one is often more useful if you
need to prove certain properties of a compact set.

Again, don't learn the following properties by heart, but try to get a feeling for why they are
true.
* The union of an arbitrary collection of bounded sets is not always bounded.
* The union of a finite collection of bounded sets is again bounded.
* The intersection of an arbitrary collection of bounded sets is again bounded.
* The sum of two bounded sets is again bounded.
* The union of an arbitrary collection of compact sets is not always compact.
* The union of a finite collection of compact sets is again compact.
* The intersection of an arbitrary collection of compact sets is again compact.
* The sum of two compact sets is again compact.
Author : Jan van den Heuvel

© London School of Economics, 2008


2.2 Limits of functions

Most of you will have a notion of what it means for a function from the real numbers to
itself to have a limit. Definitions for these concepts usually include notions such as "x
approaches a" or "x approaches a from above". But if we are dealing with functions f :
R^n → R^m, we must be careful what we mean if we say "x approaches a", since x and a are
points in some higher-dimensional space. We will use the following definition :
Definition
Given a function f : S → R^m, where S ⊆ R^n. Then we say that f ( x ) → ℓ as x → a,
where ℓ ∈ R^m and a ∈ R^n, if for every sequence { x_k } in S such that x_k ≠ a but x_k → a we
have that f ( x_k ) → ℓ.
Instead of "f ( x ) → ℓ as x → a" we sometimes write lim_{x→a} f ( x ) = ℓ.

Those of you who like epsilons and so on can use the following equivalent definition :
Definition
Given a function f : S → R^m, where S ⊆ R^n. Then we say that f ( x ) → ℓ as x → a,
where ℓ ∈ R^m and a ∈ R^n, if for every ε > 0 there is a δ > 0 such that for all x ∈ S with
0 < ∥ x − a ∥ < δ we have ∥ f ( x ) − ℓ ∥ < ε.
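The sequence definition can be probed numerically ( a sketch with an invented example ) : for f ( x ) = sin( x ) / x we expect f ( x_k ) → 1 along any sequence x_k → 0 with x_k ≠ 0.

```python
import math

def f(x):
    return math.sin(x) / x   # defined for x != 0; the limit at 0 is 1

xs = [1.0 / k for k in range(1, 1001)]   # one sequence with x_k -> 0, x_k != 0
gap = abs(f(xs[-1]) - 1.0)               # |f(x_k) - l| for the last term
```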

For functions f : S → R, whose range is the real numbers, we have the following additional concept :
Definition
Given a function f : S → R, where S ⊆ R^n. Then we say that f ( x ) diverges to +∞ as x → a,
where a ∈ R^n, if for every sequence { x_k } in S such that x_k ≠ a but x_k → a we have that
f ( x_k ) → +∞.
Here is an equivalent definition not using sequences :
Definition
Given a function f : S → R, where S ⊆ R^n. Then we say that f ( x ) diverges to +∞ as x → a,
where a ∈ R^n, if for all M ∈ R there is a δ > 0 so that for all x ∈ S with 0 < ∥ x − a ∥ < δ,
we have that f ( x ) > M.
We use the notation f ( x ) → +∞ as x → a, or lim_{x→a} f ( x ) = +∞.

And for functions f : R → R^m, whose domain is the real numbers, we have the following
additional concept :
Definition
Given a function f : R → R^m. Then we say that f ( x ) → ℓ as x → ∞, where ℓ ∈ R^m, if
for every sequence { x_k } in R such that x_k → ∞ we have that f ( x_k ) → ℓ.
And also this time we can give a definition with epsilons :
Definition
Given a function f : R → R^m. Then we say that f ( x ) → ℓ as x → ∞, where ℓ ∈ R^m, if
for all ε > 0 there exists an M ∈ R such that ∥ f ( x ) − ℓ ∥ < ε for all x > M.
For this we use the notation lim_{x→∞} f ( x ) = ℓ.

We use similar definitions for x → −∞.


2.3 Continuous functions

We use the following definitions for a function to be continuous :
* A function f : S → R^m, where S ⊆ R^n, is continuous at x ∈ S if for all sequences { x_k }
in S with x_k → x, we have that f ( x_k ) → f ( x ).
* And f is continuous on S if f is continuous at every point in S.
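The sequence criterion is easy to test numerically ( a sketch; the example is invented ) : for f ( x, y ) = x² + y², the values f ( x_k ) approach f ( 1, 1 ) = 2 along the sequence x_k = ( 1 + 1/k, 1 − 1/k ) → ( 1, 1 ).

```python
def f(x, y):
    return x * x + y * y

target = f(1.0, 1.0)   # = 2
# gaps |f(x_k) - f(x)| along x_k = (1 + 1/k, 1 - 1/k); here the gap is 2/k^2
gaps = [abs(f(1.0 + 1.0 / k, 1.0 - 1.0 / k) - target) for k in (10, 100, 1000)]
shrinking = gaps[0] > gaps[1] > gaps[2]
```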

If f : S → R^m, then we can think of f as being defined by m coordinate functions f ( x ) =
( f_1 ( x ), . . . , f_m ( x ) ). It can be shown that this means that :
* f is continuous at x ∈ S if and only if each f_i is continuous at x ∈ S.

2.4 Weierstrass Theorem

Weierstrass Theorem, Theorem 3.1 in the book, reads as follows :


Weierstrass Theorem
Let D ⊆ R^n be a nonempty compact subset of R^n and f : D → R a function which is
continuous on D. Then f has both a maximum and a minimum on D.
The proof can be found in Section 3.3 of the book, and will also be discussed in class. It
basically has two elements, both of which are of independent interest as well :
Lemma
Let A ⊆ R be a nonempty compact set. Then A has a maximum and a minimum.
Lemma
If f : D → R is continuous on D and D is compact, then f ( D ) is also compact.
The first lemma has more or less been proved in the lectures already. We showed that a
closed subset of R that has an upper bound has a maximum; and that a closed subset of R
with a lower bound has a minimum. But it's very easy to show that a bounded set in R has
a lower and an upper bound. Since compact means both closed and bounded, we are done.

Note that Weierstrass Theorem gives a sufficient condition, not a necessary one. So if the
function f is not continuous or the set D is not compact, you cannot conclude that a maximum of f on D does not exist.

2.5 Examples
In Section 3.2 of the book you can find a selection of worked-out examples in which Weierstrass Theorem can provide some insight into certain optimisation problems. The examples
are also described in Sections 2.3.1, 2.3.4, and 2.3.7 in Chapter 2. Have a look at these sections
in order to understand the examples better.

Here you find some of the typical notation of the book. For instance, x ∈ R^n_+ means that
x = ( x_1 , . . . , x_n ) is a vector with x_i ≥ 0 for all i. And p ≫ 0 doesn't mean that p is much
larger than 0, but means that p = ( p_1 , . . . , p_n ) is a vector with p_i > 0 for all i.


In order to apply Weierstrass Theorem you need to check that the conditions are satisfied.
This is usually not a trivial exercise, in particular since the definitions of compact set and
continuous function are kind of nasty. Here are some hints that may be helpful.
The formal definition for a function to be continuous is too cumbersome to use in general.
That is why you can use the following rule :
Every reasonable-looking function, whose definition involves polynomials and quotients of polynomials, trigonometric functions, exponential and logarithmic functions,
and the like, is continuous at every point where it is defined.
A function whose domain is split into different parts, with different function descriptions for the different parts, does not have to be continuous ( but it can be ).
A set S ⊆ R^n is compact if and only if it is closed and bounded.
To show that a set S ⊆ R^n is bounded, you need to find a real number M such that
∥ x ∥ ≤ M for all x ∈ S.
A particular way to show that this is the case is the following :
Suppose that for a set S ⊆ R^n there exist numbers m_1 , . . . , m_n such that for all x ∈ S,
where x = ( x_1 , . . . , x_n ), we have | x_i | ≤ m_i for all i = 1, . . . , n. Then S is a bounded set.
This follows from the previous statement since we have
∥ x ∥ = √( x_1² + · · · + x_n² ) ≤ √( m_1² + · · · + m_n² ),
so you could take M = √( m_1² + · · · + m_n² ).
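A numeric sketch of this bound ( the coordinates are invented ) : with coordinate bounds m = ( 2, 3, 6 ) the recipe gives M = √( 4 + 9 + 36 ) = 7, and any x with | x_i | ≤ m_i indeed has ∥ x ∥ ≤ 7.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

m = [2.0, 3.0, 6.0]
M = math.sqrt(sum(t * t for t in m))   # sqrt(4 + 9 + 36) = 7

x = [-1.5, 3.0, -5.0]                  # satisfies |x_i| <= m_i coordinatewise
within_bound = norm(x) <= M
```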

Again, the formal definition of a closed set should only be used when asked to do so explicitly. You can assume the following rules :
The whole space R^n is closed ( but not bounded, of course ).
Any subset of R^n described by one or more constraints of the type g( x ) = a or g( x ) ≥ a,
where a ∈ R and g : R^n → R is continuous on R^n, is closed.
But be aware : if there is a constraint of the type g( x ) < a, then the set is often not closed.
Further investigation is needed in that case.

As an example, the set
D = { ( x, y ) ∈ R² | x ≥ 0, y ≥ 0, x y ≥ 1 }
is closed, because the functions g_1 ( x, y ) = x, g_2 ( x, y ) = y, and g_3 ( x, y ) = x y are trivially
continuous.
But D is not bounded, since for every M, if we take x = | M | + 1 and y = | M | + 1, then x ≥ 0,
y ≥ 0, and x y = (| M | + 1)² ≥ 1, so ( x, y ) ∈ D. But
∥ ( x, y ) ∥ = √( x² + y² ) = √2 · (| M | + 1) > √2 · | M | ≥ M.
So we can never find an M such that ∥ z ∥ ≤ M for all z ∈ D.


Suppose we are given the problem to maximise f ( x, y ) = 1 / ( x + y ) on the set D given above.
First note that f is defined on D, since for all ( x, y ) ∈ D we have x > 0 and y > 0. Following
the rule of "looking reasonable", f is continuous. So we would like to use Weierstrass
Theorem to conclude that a maximum exists. But we've seen above that D is not bounded,
hence not compact.

The trick here is to apply the final theorem on page 2 of notes 1. It's easy to see that, for
instance, ( 3, 3 ) ∈ D and since f ( 3, 3 ) = 1/6 we know that if a maximum exists it must have
function value at least 1/6. Now define
D_1 = { ( x, y ) ∈ D | x + y ≤ 6 }   and   D_2 = { ( x, y ) ∈ D | x + y ≥ 6 }.
Then D = D_1 ∪ D_2. Also, for every ( x, y ) ∈ D_2 we have f ( x, y ) ≤ 1/6 = f ( 3, 3 ), with
( 3, 3 ) ∈ D_1. So the theorem will guarantee that f has a maximum on D if we can show that f
has a maximum on D_1.

D_1 is a closed set, because D_1 = { ( x, y ) ∈ R² | x ≥ 0, y ≥ 0, x y ≥ 1, x + y ≤ 6 } and so we
can use the rule from above. And now D_1 is bounded as well. This follows since for every
( x, y ) ∈ D_1 we have 0 < x < 6 and 0 < y < 6, so
∥ ( x, y ) ∥ = √( x² + y² ) < √( 36 + 36 ) = √72.
This means that D_1 is compact. Since f is continuous, Weierstrass Theorem tells us that f has
a maximum on D_1, which must be the maximum on the whole set D.
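A rough grid search ( a numerical sketch, not a proof ) agrees with this conclusion : on D_1 the largest value of f found is 1/2, attained at ( 1, 1 ), since x y ≥ 1 forces x + y ≥ 2.

```python
def f(x, y):
    return 1.0 / (x + y)

best = 0.0
steps = 600
for i in range(1, steps + 1):
    for j in range(1, steps + 1):
        x, y = 6.0 * i / steps, 6.0 * j / steps
        if x * y >= 1.0 and x + y <= 6.0:     # (x, y) in D1
            best = max(best, f(x, y))
# best is f(1, 1) = 1/2: by AM-GM, x*y >= 1 implies x + y >= 2
```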

Note that in the example above we can also conclude that f has a minimum on D1 , since
Weierstrass Theorem guarantees both a maximum and a minimum. But we cannot conclude
that f has a minimum on D .
So, to complete the example above, what can we say about the existence of a minimum of f
on D ? First note that for all ( x, y ) ∈ D we have f ( x, y ) > 0, but there is no ( a, b ) ∈ D such
that f ( a, b ) = 0. On the other hand, it is fairly easy to show that f ( x, y ) can get arbitrarily
close to 0 for the right ( x, y ) ∈ D. For instance, for all x ≥ 1 we know that ( x, x ) ∈ D. But
f ( x, x ) = 1 / ( 2 x ) → 0 as x → ∞. It follows that 0 is the infimum of f ( D ) but not the minimum.
We must conclude that f has no minimum on D.

From the book


Section 2.1 of the notes corresponds with Section 1.2.8 and bits and pieces from Section 1.2.10 in the book. The main part of Section 1.2.8 establishes that the second definition we give implies the first one. You should have some idea of what is happening in
that proof.
Continuous functions appear in Subsection 1.4.1 of the book. But all you need to know is
in section 2.3 of these notes.

Weierstrass Theorem is the topic of Chapter 3 in the book. More or less everything in
that chapter is relevant to us as well.


Suitable exercises from the book related to these notes


Section 1.7 : 25, 26, 27, 32, 33, 46, 48, 49, 51, 53, 54.
Section 3.4 : 1, 2, 5, 6, 7, 8, 9, 13, 14.

Extra Exercises
1 For the sets A given below, show that every function f : A → R is continuous.
(a) A set A ⊆ R^n with just one element.
(b) A set A ⊆ R^n with exactly two elements.
(c) A set A ⊆ R^n with a finite number of elements.
(d) The set A = Z, considered as a subset of R.

2 Determine which of the sets in (a) – (d) from above are compact. ( Use question 2 from
notes 1. )

3 Give examples of sets and functions with the following properties :
(a) an open set D ⊆ R and a continuous function f : D → R that has both a maximum
and a minimum on D ;
(b) a compact set D′ ⊆ R and two functions g, h : D′ → R such that neither g nor h has a
maximum or minimum on D′, but the sum g + h does.

4 Let D = { 0 } ∪ { 1/n | n = 1, 2, . . . }. Determine, justifying your answers, if the following
statements are true :
(a) If f : D → R is a continuous function, then f has a maximum on D.
(b) If g : D → R is a function so that g( x ) ∈ [ 0, 1 ] for all x ∈ D, then g has a maximum or
a minimum on D.

5 Let f : R → R be a continuous function such that lim_{x→−∞} f ( x ) = 0 and lim_{x→∞} f ( x ) = 0.
(a) Show that f must have a minimum or a maximum on R.
(b) Give an example that shows that it is not true in general that f has both a maximum and
a minimum on R.

Optimisation Theory

2007/08

MA 208
Notes 3
Differentiation of functions
Unconstrained Optimisation
First-Order Conditions

3.1 Differentiable functions in general


We use the following definitions for a function to be differentiable :
* A function f : S → R^m, where S ⊆ R^n, is differentiable at x_0 ∈ S, where x_0 must be in the
interior of S, if there exists an m × n matrix A so that
( f ( x ) − f ( x_0 ) − A ( x − x_0 ) ) / ∥ x − x_0 ∥ → 0   as x → x_0.
The matrix A is called the derivative of f at x_0 and is denoted by D f ( x_0 ).
Moreover, f is differentiable on S if f is differentiable at every point of S.
A couple of remarks about the definitions above. First recall what it means to say that
( f ( x ) − f ( x_0 ) − A ( x − x_0 ) ) / ∥ x − x_0 ∥ → 0   as x → x_0.
From the definition in Notes 2, Section 2.2, it follows that this can be translated as : for all
sequences { x_k } in S with x_k ≠ x_0 and x_k → x_0 we have that
( f ( x_k ) − f ( x_0 ) − A ( x_k − x_0 ) ) / ∥ x_k − x_0 ∥ → 0.
For those who really want to see a definition with epsilons and everything, that should be :
for all ε > 0 there is a δ > 0 so that for all x ∈ S with 0 < ∥ x − x_0 ∥ < δ we have that
∥ f ( x ) − f ( x_0 ) − A ( x − x_0 ) ∥ / ∥ x − x_0 ∥ < ε.
Also note that A is assumed to be a matrix here, so the product A ( x − x_0 ) is in fact the
product of a matrix and a vector. In particular, you shouldn't write ( x − x_0 ) A, because that
will be undefined in general.

If f is differentiable on a set S, then the derivative D f can in its turn be seen as a function
D f : S → R^{m×n}. If this function is continuous, then f is said to be continuously differentiable,
or f is C¹.



3.2 Further about differentiable functions


Section 3.1 above is far more general than we will need.

In this course we will be mainly interested in functions from R^n into R, i.e., in functions
f : S → R, where S ⊆ R^n. For these functions the definition of differentiable becomes :
* A function f : S → R, where S ⊆ R^n, is differentiable at x_0 ∈ S, where x_0 must be in the
interior of S, if there exists an n-vector a such that
( f ( x ) − f ( x_0 ) − a · ( x − x_0 ) ) / ∥ x − x_0 ∥ → 0   as x → x_0.
Again we say that D f ( x_0 ) = a, where a is now a vector.

More interesting for practical use are partial derivatives. Let e_j ∈ R^n be the j-th unit vector, i.e.,
e_j has a 1 in the j-th coordinate and a 0 everywhere else. Then the j-th partial derivative of f at
a point x is the number ∂f ( x ) / ∂x_j ( or ∂/∂x_j f ( x ) ) such that
( f ( x + t e_j ) − f ( x ) ) / t → ∂f ( x ) / ∂x_j   as t → 0.
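This limit is exactly what a finite-difference quotient approximates. A sketch ( function and point invented for illustration ) : for f ( x_1, x_2 ) = x_1² x_2 at ( 2, 3 ) the partial derivatives are 2 x_1 x_2 = 12 and x_1² = 4.

```python
def partial(f, x, j, t=1e-6):
    # difference quotient (f(x + t e_j) - f(x)) / t for a small step t
    xt = list(x)
    xt[j] += t
    return (f(xt) - f(x)) / t

def f(x):
    return x[0] ** 2 * x[1]   # f(x1, x2) = x1^2 * x2

d1 = partial(f, [2.0, 3.0], 0)   # exact value: 2 * x1 * x2 = 12
d2 = partial(f, [2.0, 3.0], 1)   # exact value: x1^2 = 4
```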

The important result connecting the general derivative and partial derivatives is Theorem 1.54
in the book. We only need the following part of that theorem.
* Theorem
Let f : S → R be a function, where S ⊆ R^n is an open set. The function f is C¹ on S
( i.e., the derivative D f ( x ) exists for all x ∈ S and is continuous ) if and only if all partial
derivatives of f exist and are continuous for all x ∈ S.
In that case we also have D f ( x ) = ( ∂f ( x ) / ∂x_1 , ∂f ( x ) / ∂x_2 , . . . , ∂f ( x ) / ∂x_n ) for all x ∈ S.
Note that the statement above is useless if the derivative or some of the partial derivatives
are not continuous. Have a look at Example 1.55 in the book to see that continuity is really
essential, and that there really is something happening in the theorem above.

The theorem above is usually used as follows : given the function f, deduce
all partial derivatives. If these look reasonable, i.e., they exist everywhere on S and are
continuous, then we will assume f to be differentiable.

3.3 Taylors Theorem


Taylor's Theorem for functions from R^n into R^m is quite a complicated story in general. We
only need to know part of it, for the case m = 1. The following is a reformulation of the first part
of Theorem 1.75 from the book :


* Theorem ( Taylor's Theorem in R^n, first-order version )
Let f : S → R be a function, where S ⊆ R^n is an open set. Pick x_0 ∈ S. If f is C¹ on S,
then for any x ∈ S we can write
f ( x ) = f ( x_0 ) + D f ( x_0 ) · ( x − x_0 ) + R ( x ) ∥ x − x_0 ∥.
Here R : S → R is a remainder term, depending on x_0, with the properties that R ( x_0 ) = 0
and R ( x ) → 0 as x → x_0.
Proof Although the theorem looks kind of scary, it's actually very easy to prove. Of course,
easy if you understand what the definition of differentiability and so on is.
The definition of f being differentiable at x_0 means that
( f ( x ) − f ( x_0 ) − D f ( x_0 ) · ( x − x_0 ) ) / ∥ x − x_0 ∥ → 0   as x → x_0.
Now define
R ( x ) = ( f ( x ) − f ( x_0 ) − D f ( x_0 ) · ( x − x_0 ) ) / ∥ x − x_0 ∥   for x ∈ S \ { x_0 }.
( The reason that we exclude x_0 is that the quotient is not defined if x = x_0. ) Then we get for
free that R ( x ) → 0 as x → x_0. Also, by rearranging this definition we immediately find the
main formula
f ( x ) = f ( x_0 ) + D f ( x_0 ) · ( x − x_0 ) + R ( x ) ∥ x − x_0 ∥.
So the only thing left to do is decide what to do with R ( x_0 ). But by defining R ( x_0 ) = 0 ( remember,
R ( x ) was until now undefined for x = x_0 ), everything fits nicely together.
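A quick numerical check of the remainder ( a sketch; f is our own choice ) : for f ( x ) = e^x at x_0 = 0 we have f ( x_0 ) = 1 and D f ( x_0 ) = 1, so R ( x ) = ( e^x − 1 − x ) / | x |, which should shrink to 0 as x → 0.

```python
import math

def remainder(x):
    # R(x) for f(x) = exp(x) at x0 = 0, where f(x0) = 1 and Df(x0) = 1
    return (math.exp(x) - 1.0 - x) / abs(x)

rs = [abs(remainder(10.0 ** (-k))) for k in (1, 3, 5)]
shrinks = rs[0] > rs[1] > rs[2]   # R behaves roughly like |x| / 2 here
```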

3.4 Unconstrained optimisation

Remember that the open ball B( x, r ) with centre x ∈ R^n and radius r is the set B( x, r ) =
{ y ∈ R^n | d( x, y ) < r }, where d( x, y ) = ∥ x − y ∥ is the distance between x and y.
* For a set S ⊆ R^n, a point x ∈ S is in the interior of S if there exists an r > 0 such that
B( x, r ) ⊆ S.
The set of all interior points of S is denoted by int S.
Note that by this definition we see that a set S is open if and only if every point of S is an
interior point of S.

These are unifying definitions for several important maximum concepts. Here we assume
that D ⊆ R^n is a non-empty set and f : D → R a real-valued function. Several of these can
also be found in Chapter 4 of the book :
* A point x ∈ D is a global maximum of f on D if for all y ∈ D we have f ( y ) ≤ f ( x ).
* A point x ∈ D is a local maximum of f on D if there exists an r > 0 such that for all
y ∈ D ∩ B( x, r ) we have f ( y ) ≤ f ( x ).
* A point x ∈ D is an unconstrained local maximum of f on D if there exists an r > 0 such
that for all y ∈ B( x, r ) we have y ∈ D and f ( y ) ≤ f ( x ).
Notice the slightly different conditions on the points y we need to consider in the definitions
of local maximum and unconstrained local maximum.


For completeness we give the following easy lemma :


* Lemma
Suppose f : D → R has a local maximum in x ∈ D where x ∈ int D. Then x is an
unconstrained local maximum of f on D.
Proof The fact that x is an interior point of D means that there exists an r_1 > 0 such that
for all y ∈ B( x, r_1 ) we have y ∈ D. And the fact that x is a local maximum of f on D means
that there is an r_2 > 0 such that for all y ∈ D ∩ B( x, r_2 ) we have f ( y ) ≤ f ( x ).
Now take r = min{ r_1 , r_2 }. Then r > 0 and for all y ∈ B( x, r ) we have y ∈ D and f ( y ) ≤ f ( x ),
which proves the lemma.

3.5 First-Order Conditions for an unconstrained optimum


Theorem 4.1 in the book gives the well-known First-Order Conditions for an unconstrained
local maximum. The theorem in the book doesn't talk about "unconstrained", but from the
lemma above it follows that a local maximum which is also an interior point is in fact an
unconstrained local maximum. So this is an alternative formulation of the theorem :
* Theorem 4.1
Suppose x* ∈ D is an unconstrained local maximum of f on D. If f is differentiable at x*,
then D f ( x* ) = 0.

A point x ∈ D with D f ( x ) = 0 will be called a critical point.
Hence the theorem says that an unconstrained local maximum of a differentiable function
must be a critical point. But it follows from the classical Example 4.2 in the book that not
every critical point is an unconstrained local maximum.

Two proofs of Theorem 4.1 can be found in Section 4.5 in the book. Here is another proof,
using Taylor's Theorem.
Proof The fact that x* is an unconstrained local maximum of f on D means that there is an
r > 0 such that for all x ∈ B( x*, r ) we have that x ∈ D and f ( x ) ≤ f ( x* ).
From Taylor's Theorem we know that there is a function R_1 : D → R such that R_1 ( x* ) = 0,
R_1 ( x ) → 0 as x → x*, and
f ( x ) = f ( x* ) + D f ( x* ) · ( x − x* ) + R_1 ( x ) ∥ x − x* ∥.
Now suppose that D f ( x* ) ≠ 0. For easy notation say D f ( x* ) = a, where a ∈ R^n \ { 0 }. For
any real number t define x_t = x* + t a. Then we have
f ( x_t ) = f ( x* ) + a · ( x_t − x* ) + R_1 ( x_t ) ∥ x_t − x* ∥
 = f ( x* ) + a · ( t a ) + R_1 ( x* + t a ) ∥ t a ∥
 = f ( x* ) + t a · a + R_1 ( x* + t a ) ∥ t a ∥.
Now a · a = ∥ a ∥² and ∥ t a ∥ = | t | ∥ a ∥. ( Notice that t is a real number and a a vector, so we
must treat them differently. ) So we can write
f ( x_t ) = f ( x* ) + t ∥ a ∥² + R_1 ( x* + t a ) | t | ∥ a ∥
 = f ( x* ) + ∥ a ∥ ( t ∥ a ∥ + | t | R_1 ( x* + t a ) ).


Now first look at the case where t > 0. Then | t | = t, so the formula becomes
f ( x_t ) = f ( x* ) + t ∥ a ∥ ( ∥ a ∥ + R_1 ( x* + t a ) ).
Now use that for t small enough we know that ∥ x* − x_t ∥ < r, so that f ( x_t ) ≤ f ( x* ). But we
also know R_1 ( x ) → 0 as x → x*, so again by taking t small enough we can be sure that
R_1 ( x* + t a ) > −∥ a ∥, where we use that a ≠ 0, so ∥ a ∥ > 0. So for sufficiently small values
of t > 0 we get that
f ( x_t ) = f ( x* ) + t ∥ a ∥ ( ∥ a ∥ + R_1 ( x* + t a ) )
 > f ( x* ) + t ∥ a ∥ ( ∥ a ∥ − ∥ a ∥ )
 = f ( x* ).
But this contradicts the fact that for all sufficiently small t we must have f ( x_t ) ≤ f ( x* ) ! The
only place where we assumed something that could cause this contradiction is
the assumption that D f ( x* ) ≠ 0. So this assumption must be false; hence we must have
D f ( x* ) = 0.
We can handle the case that x* is an unconstrained local minimum in the same way by
looking at the case t < 0, hence | t | = −t.

3.6 Applying the First-Order Conditions

Theorem 4.3 in the book describes the well-known Second-Order Conditions for an unconstrained maximum or minimum. We actually won't discuss these. The reason for that is
that the Second-Order Conditions say nothing about the possibility that a critical point is
a global optimum. Parts 3 and 4 of Theorem 4.3 can only be used to guarantee that a
certain critical point is a local maximum or minimum. And parts 1 and 2 can only be used
in the sense that if the Hessian D² f ( x ) is neither negative nor positive semidefinite, then x
cannot be a local maximum or minimum.
The general way to apply the First-Order Conditions for finding an optimum of f ( x ) on D
is as follows :
Determine if a maximum or minimum exists. Weierstrass Theorem can be used for
this. But it may also be possible to show that no maximum exists because you can show
that f ( x ) can get arbitrarily large for x ∈ D.
Determine the derivative D f on the interior of D.
Use the First-Order Conditions to find the critical points of f. These points are candidate
optima.
Since critical points must be interior points of D, you need to deal with the non-interior points
of D in a different way. ( At this moment, we don't have a good method to do so, and
hence must rely on ad-hoc methods and common sense. )
If all candidate optima are known ( either because they are critical points, or from inspecting
the non-interior points ), then calculate the function values at these points in order
to find a possible global maximum or minimum.
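A sketch of the recipe on an invented example, f ( x, y ) = 4x + 6y − x² − y² on D = R² : here D f = ( 4 − 2x, 6 − 2y ), so the only critical point is ( 2, 3 ), and since f gets arbitrarily negative far away, this critical point is the global maximum, with f ( 2, 3 ) = 13.

```python
def f(x, y):
    return 4 * x + 6 * y - x ** 2 - y ** 2

def grad(x, y):
    # Df(x, y) computed by hand: (4 - 2x, 6 - 2y)
    return (4 - 2 * x, 6 - 2 * y)

cx, cy = 2.0, 3.0                 # the unique critical point
critical = grad(cx, cy) == (0.0, 0.0)

# f(2 + dx, 3 + dy) = 13 - dx^2 - dy^2, so nearby values never exceed 13
nearby_not_larger = all(
    f(cx + dx, cy + dy) <= f(cx, cy)
    for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)
)
```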


Notice that the Second-Order Conditions play no role in the recipe above. The reason for that
was already given above : the Second-Order Conditions say nothing about the possibility
that a critical point is a global optimum. And in general we are only interested in finding the
global optima.
We could use the Second-Order Conditions to decide that some critical points cannot be local
optima, hence not global optima. In other words, this might remove some critical points as
candidates for being an optimum. But we would still have to do most of the steps above to
determine the global optima.

From the book


Differentiability and derivatives of functions are discussed in Sections 1.4.2 – 1.4.5 of the
book. All important concepts and points can be found in the first three sections of these
notes. But to get an idea about the subtleties and other points we ignore, you can read
some parts in the book. I recommend Section 1.4.2 and Example 1.55.
Taylor's Theorem can be found in Section 1.6.2 of the book. But you can safely ignore that
section, and look at the part in these notes.

Unconstrained optimisation is the topic of Chapter 4 of the book. Everything in Sections 4.1, 4.2, and 4.4 is actually fairly good material to read. ( Even though Section 4.4
has the words "Second-Order Conditions" in its title, nothing is done with those. ) Together with what's in the notes, that should be more than enough. You can ignore the
fairly complicated proof in Section 4.5. But it would be nice if you have some idea what
is happening in the proofs in these notes, in particular how the Taylor approximations
very quickly lead to some good insights on what is happening in the neighbourhood of
specific points.
Finally, Sections 4.3 and 4.6 are completely irrelevant for us.

Suitable exercises from the book related to these notes
• Section 1.7: 57.
• Section 4.7: 1, 2 (ignore the part about the second-order test), 4 (only determine if the critical points are global extrema), 5, 8.
Optimisation Theory
MA 208
2007/08

Notes 4
Constrained Optimisation with Equality Constraints
Lagrange's Theorem and Method

4.1 Introduction
The kind of optimisation problems we will look at in these and the following notes are problems with a constraint set of the form
D = U ∩ { x ∈ Rn | g(x) = 0, h(x) ≥ 0 }.
Here U is an open set (often the whole Rn) and g, h are certain multidimensional functions:
g : Rn → Rk, for some k, and h : Rn → Rℓ, for some ℓ.
We will usually explicitly write the k components of g and the ℓ components of h. So instead of g(x) = 0, h(x) ≥ 0 we will write
g1(x) = 0,  g2(x) = 0,  ...,  gk(x) = 0;
h1(x) ≥ 0,  h2(x) ≥ 0,  ...,  hℓ(x) ≥ 0.
You may notice that there are no constraints of the form φi(x) > 0, i = 1, ..., m. The reason for this is that for reasonable functions φi : Rn → R (for instance, if φi is continuous) the set { x ∈ Rn | φ1(x) > 0, ..., φm(x) > 0 } will be an open set. That means that constraints of that form will appear in the definition of the open set U.
The main reason to give the open set U such a separate role is that every point of an open set is an interior point of that set. And we already know quite a lot about how to handle optimisation for interior points from Notes 3.
It is often necessary to rewrite the constraint set to the standard form above. For instance, suppose we are asked to find the minimum of a certain function f : R3 → R for x ∈ D, where D is defined by
D = { (x, y, z) ∈ R3 | x > 0, y > 0, x + z = 3, y ≥ z2, y + z ≥ ln(x) }.
In order to write this in the standard form, we need to define the set U and the functions gi and hi. It is fairly straightforward that this can be done as follows:
U = { (x, y, z) ∈ R3 | x > 0, y > 0 };
g1(x, y, z) = x + z − 3;
h1(x, y, z) = y − z2;
h2(x, y, z) = −ln(x) + y + z.
This gives the following standard form for D:
D = U ∩ { x ∈ R3 | g1(x) = 0, h1(x) ≥ 0, h2(x) ≥ 0 }.
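Such a rewriting can be checked mechanically. The sketch below (the helper names in_U, g1, h1, h2 are made up for this illustration) tests, at a few sample points, that the standard form describes the same set D as the original description:

```python
import math

# Sanity check that the standard form and the original description of D agree.
def in_U(x, y, z):                      # the open set U: strict inequalities
    return x > 0 and y > 0

def g1(x, y, z): return x + z - 3
def h1(x, y, z): return y - z**2
def h2(x, y, z): return -math.log(x) + y + z

def in_D_standard(x, y, z):
    return (in_U(x, y, z) and g1(x, y, z) == 0
            and h1(x, y, z) >= 0 and h2(x, y, z) >= 0)

def in_D_original(x, y, z):
    return (x > 0 and y > 0 and x + z == 3
            and y >= z**2 and y + z >= math.log(x))

samples = [(1.0, 5.0, 2.0), (1.0, 1.0, 2.0), (3.0, 1.0, 0.0), (2.0, 4.0, 1.0)]
agree = all(in_D_standard(*p) == in_D_original(*p) for p in samples)
print(agree)
```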
Author: Jan van den Heuvel
© London School of Economics, 2008
4.2 Lagrange's Theorem

Lagrange's Theorem deals with problems where the constraint set above contains only equality constraints of the form gi(x) = 0.
A formulation of Lagrange's Theorem can be found in Theorem 5.1 in the book. That formulation involves a rather technical condition "Suppose also that ρ(Dg(x*)) = k", which we will translate a little here.
For a matrix A, ρ(A) denotes the rank of A. The definition and some properties can be found in the book in Section 1.3.3. I advise you to read that section (ignore the final Theorem 1.43). In particular you should know that the column rank of A (the maximum number of independent columns of A) is equal to the row rank of A (the maximum number of independent rows of A); and that this number is called the rank of A.
If the constraint function g is seen as a function from Rn into Rk, then Dg(x) is an n × k matrix. Using the definition of the rank of a matrix, it follows that this matrix has rank k if and only if it has column rank k, hence if and only if the k column vectors of Dg(x) are linearly independent. This gives:
* For a function g : Rn → Rk, the rank of Dg(x) is k if and only if the vectors { Dg1(x), ..., Dgk(x) } form an independent set of vectors.
Using the observation above, Lagrange's Theorem can be written as follows. Recall that a function is a C1 function if its derivative exists and the derivative is continuous everywhere where the function is defined.
* Theorem (Lagrange's Theorem)
Let f : U → R be a C1 function on a certain open set U ⊆ Rn, and let gi : Rn → R, i = 1, ..., k, be C1 functions. Suppose x* is a local maximum or minimum of f on the set
D = U ∩ { x ∈ Rn | gi(x) = 0, i = 1, ..., k }.
Suppose also that the derivatives { Dg1(x*), ..., Dgk(x*) } form an independent set of vectors.
Then there exist λ1, ..., λk ∈ R such that Df(x*) + ∑_{i=1}^{k} λi Dgi(x*) = 0.

Note that Lagrange's Theorem only gives a necessary condition for a local minimum or maximum. If we have a point x0 with Df(x0) + ∑_{i=1}^{k} λi Dgi(x0) = 0 for some λ1, ..., λk ∈ R, it doesn't mean that that point must be a local minimum or maximum.
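To see the theorem's conclusion concretely, here is a small numeric sketch on a made-up problem (the problem and all names are my own illustration, not from the notes): at the known maximiser of f(x, y) = xy on the line x + y = 2, a multiplier λ with Df(x*) + λ Dg(x*) = 0 can be recovered by least squares:

```python
import numpy as np

# Maximise f(x, y) = x*y subject to g(x, y) = x + y - 2 = 0.
# On the line y = 2 - x, f = x*(2 - x) is maximised at x* = (1, 1).
x_star = np.array([1.0, 1.0])
Df = np.array([x_star[1], x_star[0]])   # Df(x, y) = (y, x)
Dg = np.array([[1.0], [1.0]])           # Dg(x, y) = (1, 1), as a column

# Solve Df(x*) + lam * Dg(x*) = 0 for lam by least squares.
lam, residuals, rank, _ = np.linalg.lstsq(Dg, -Df, rcond=None)
print(lam[0])
```

The residual Df(x*) + λ Dg(x*) is exactly zero here, as the theorem requires (with λ = −1).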
Also note that there is this weird condition "Suppose also that the derivatives { Dg1(x*), ..., Dgk(x*) } form an independent set of vectors". If this is not the case for some point, then such a point may fail to satisfy the final conclusion of the theorem, even if the point is a local minimum or maximum. (See also Section 4.4 below.)
And finally, Lagrange's Theorem only says something about local maxima or minima. Different methods are needed to decide which of those are a global maximum or minimum.
4.3 Proof of Lagrange's Theorem

The proof of Lagrange's Theorem can be found in Section 5.6 in the book. Although that proof is not too long, it depends on the Implicit Function Theorem, Theorem 1.77 in the book. And a look at that theorem will convince you that this is no laughing matter. Moreover, the proof as given in the book doesn't give you much idea of what is actually going on. That's why a sketch of an alternative proof is given below.

Sketch of Proof  Use the notation from the statement of the theorem on the previous page, and suppose x* is a local maximum of f on D. Using the first order Taylor Approximation at x* we get that
f(x) = f(x*) + Df(x*) · (x − x*) + remainder,
where a more precise description of the remainder term can be found in Notes 3. Now write x = x* + a, where ‖a‖ is small enough to guarantee that x* + a ∈ U. (This is possible since U is an open set.) Then we get that
f(x* + a) = f(x*) + Df(x*) · a + remainder.
Now recall that x* was a local maximum on D. Thus for small enough ‖a‖ with x* + a ∈ D we must have that f(x* + a) ≤ f(x*) and f(x* − a) ≤ f(x*). If we substitute that in the formula above, and forget about the remainder term, then this must mean that Df(x*) · a ≤ 0 and Df(x*) · (−a) ≤ 0. We get the following necessary condition:
(1)  For all a ∈ Rn with x* + a ∈ D and ‖a‖ small enough we have that Df(x*) · a = 0.
In order to get an idea what it means that x* + a ∈ D, we have a look at the constraint functions g1, ..., gk. Their Taylor Approximation around x*, writing x = x* + a, looks as
gi(x* + a) = gi(x*) + Dgi(x*) · a + remainder,  for i = 1, ..., k.
But we know that x* ∈ D, hence gi(x*) = 0; and we are only interested in those a such that x* + a ∈ D, hence such that gi(x* + a) = 0. Filling this in into the formula above means that we are only interested in those a such that 0 = 0 + Dgi(x*) · a + remainder, for i = 1, ..., k. Again ignoring the remainder term, this means that we get the following statement:
(2)  In order to have x* + a ∈ D for a certain a ∈ Rn with ‖a‖ small, we need that Dgi(x*) · a = 0, for all i = 1, ..., k.
Now note that if we have an a with Df(x*) · a = 0 or Dgi(x*) · a = 0, then the same holds for every scalar multiple λa. So we can ignore the condition "‖a‖ small", to get the following combination of statements (1) and (2).
(3)  For all a ∈ Rn with Dgi(x*) · a = 0 for i = 1, ..., k, we need that Df(x*) · a = 0.
In a lemma below we will show that the only way that statement (3) can be true is if Df(x*) is a linear combination of Dg1(x*), ..., Dgk(x*). So there must exist µ1, ..., µk ∈ R such that
Df(x*) = ∑_{i=1}^{k} µi Dgi(x*). But this is the same as Df(x*) + ∑_{i=1}^{k} (−µi) Dgi(x*) = 0, which gives
Lagrange's Theorem if we set λi = −µi, for i = 1, ..., k.
Here is the lemma promised above.

* Lemma
Let x, y1, ..., yk ∈ Rn. Suppose that we know that for all a ∈ Rn with yi · a = 0 for i = 1, ..., k, we also have x · a = 0. Then x is a linear combination of y1, ..., yk.
Proof  Suppose that x is not a linear combination of y1, ..., yk. In other words, x is not in the subspace of Rn spanned by the vectors y1, ..., yk. Now write x = z1 + z2, where z1 is the orthogonal projection of x on the subspace spanned by y1, ..., yk, and z2 = x − z1. Then we have z2 ≠ 0, and also b · z2 = 0 for all vectors b in the subspace spanned by y1, ..., yk. In particular we have that yi · z2 = 0 for i = 1, ..., k. By the condition in the lemma, this means that x · z2 = 0. But we also have that z1 · z2 = 0, which means that x · z2 = (z1 + z2) · z2 = z1 · z2 + z2 · z2 = 0 + ‖z2‖2 ≠ 0. This gives a contradiction, so we must have that x is a linear combination of y1, ..., yk.
4.4 The Constraint Qualification

The condition in Lagrange's Theorem that the derivatives { Dg1(x*), ..., Dgk(x*) } form an independent set of vectors is called the Constraint Qualification. There are two easy cases in which the Constraint Qualification fails:
• if there is a gi such that Dgi(x*) = 0;
• if the number of constraints k is larger than the dimension n. (The number of elements in an independent set is at most the dimension of the space the vectors come from.)
But you should not look for just those two cases; it may be that the Constraint Qualification is not satisfied in a certain point because in that point the derivatives of the constraint functions just don't form an independent set.
If you go through the sketch of the proof of Lagrange's Theorem in Section 4.3 above, then you may notice that the Constraint Qualification doesn't seem to play a role there. The reason is that we did some hand-waving at a couple of places. In particular, we neglected the remainder terms in the first order Taylor Approximations for the constraint functions gi. But if, for instance, a constraint function has Dgi(x*) = 0, then the remainder term is actually the most important term in deciding if x* + a ∈ D. So ignoring it in that case makes the rest of the argument pretty useless.
Something similar, but a bit more subtle, happens when the vectors { Dg1(x*), ..., Dgk(x*) } are not independent.
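A classic made-up example of such a failure (my own illustration, not from the notes) is maximising f(x, y) = x subject to x3 + y2 = 0: the constraint forces x ≤ 0, so the maximum is at (0, 0), but there Dg(0, 0) is the zero vector and no multiplier satisfies Lagrange's condition. A small sympy sketch:

```python
import sympy as sp

# Maximise f(x, y) = x subject to g(x, y) = x**3 + y**2 = 0.
# The maximum is at (0, 0), where the Constraint Qualification fails.
x, y, lam = sp.symbols('x y lam', real=True)
f, g = x, x**3 + y**2

Dg_at_opt = sp.Matrix([g.diff(x), g.diff(y)]).subs({x: 0, y: 0})
# Dg(0, 0) is the zero vector, so { Dg(0, 0) } is a dependent set.

# And indeed the Lagrangean equations have no solution at all:
eqs = [sp.Eq(f.diff(x) + lam * g.diff(x), 0),
       sp.Eq(f.diff(y) + lam * g.diff(y), 0),
       sp.Eq(g, 0)]
sols = sp.solve(eqs, [x, y, lam], dict=True)
print(Dg_at_opt.T, sols)
```

So here the true maximum would be missed entirely by someone who only collects solutions of the Lagrangean equations, which is why points failing the Constraint Qualification must be kept as candidates.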
4.5 The Lagrangean Multipliers

The numbers λ1, ..., λk in Lagrange's Theorem are called the Lagrangean Multipliers. They have a meaning for the corresponding local optimum x* as follows:
* Property
Suppose x* is a local optimum of f for which Lagrange's Theorem with Lagrangean Multipliers λ1, ..., λk holds. Then a small relaxation of the j-th constraint, replacing gj(x) = 0 by gj(x) + ε = 0, will give a new optimum x(ε) for which we have approximately f(x(ε)) ≈ f(x*) + ελj.
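A quick numeric sketch of this property on a made-up problem (all choices are my own illustration): for maximising f(x, y) = xy subject to x + y − 2 = 0, the optimum is (1, 1) with multiplier λ = −1, and relaxing the constraint by ε changes the maximum by approximately ελ:

```python
# Maximise f(x, y) = x*y subject to g(x, y) = x + y - 2 = 0.
# The optimum is x* = (1, 1) with f(x*) = 1, and the multiplier is lam = -1,
# since Df(1, 1) = (1, 1), Dg = (1, 1) and Df + lam*Dg = 0.

def relaxed_max(eps):
    # With the relaxed constraint g(x) + eps = 0, i.e. x + y = 2 - eps,
    # the maximum of x*y on that line is at x = y = (2 - eps) / 2.
    t = (2.0 - eps) / 2.0
    return t * t

f_star, lam = 1.0, -1.0
eps = 1e-3
prediction = f_star + eps * lam          # f(x*) + eps * lam
error = abs(relaxed_max(eps) - prediction)
print(error)                             # of order eps**2
```

The approximation error is of order ε2, which matches the fact that only the first order Taylor term was used in the property above.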
Sketch of proof  We give a sketch of the proof of the statement above, using again the first order Taylor Approximations of the functions involved. We assume that we replaced the j-th constraint gj(x) = 0 by gj(x) + ε = 0, for some small ε. In other words, we use a new j-th constraint function gj(ε) given by gj(ε)(x) = gj(x) + ε. And after this change we get a new local optimum x(ε).
We first look at the Taylor Approximations at x* for the constraint functions:
gi(x) = gi(x*) + Dgi(x*) · (x − x*) + remainder,  for i = 1, ..., k.
For the constraints that haven't changed, we have that both gi(x*) = 0 and gi(x(ε)) = 0. If we fill in x = x(ε) into the formula above, using the knowledge from the previous sentence, and neglecting the remainder term, we get
(4)  Dgi(x*) · (x(ε) − x*) ≈ 0,  for i = 1, ..., k, i ≠ j.
For the new and old j-th constraint we need that gj(x*) = 0 and gj(ε)(x(ε)) = 0, which gives
0 = gj(ε)(x(ε)) = gj(x(ε)) + ε = gj(x*) + Dgj(x*) · (x(ε) − x*) + ε + remainder
  = 0 + Dgj(x*) · (x(ε) − x*) + ε + remainder.
Again neglecting the remainder term, we find that
(5)  Dgj(x*) · (x(ε) − x*) ≈ −ε.
Now use that we assume that x* satisfies the condition Df(x*) + ∑_{i=1}^{k} λi Dgi(x*) = 0, hence Df(x*) = −∑_{i=1}^{k} λi Dgi(x*). Filling this in into the Taylor Approximation for f at x*, neglecting the remainder term, and using the knowledge in (4) and (5), we get
f(x(ε)) ≈ f(x*) + Df(x*) · (x(ε) − x*)
  = f(x*) − ∑_{i=1}^{k} λi Dgi(x*) · (x(ε) − x*)
  ≈ f(x*) − ( ∑_{i=1, i≠j}^{k} λi · 0 + λj · (−ε) )
  = f(x*) + ελj.
This proves the statement.
An economic interpretation of the above is the following: Suppose the optimum x* is a maximum, and you are given the option to replace the j-th constraint gj(x) = 0 by the constraint gj(x) + c = 0, provided you pay a certain price p. Then you should only pay this price if the increase in the maximum is more than p, hence if λj c ≥ p.
Because of this interpretation, λi is sometimes called the shadow price of constraint i at x*.
4.6 Second-Order Conditions

Similar to the fact that the First-Order Conditions for Unconstrained Optima in Notes 3 can be accompanied by Second-Order Conditions, there exists a Second-Order companion for Lagrange's Theorem. You can find it in Section 5.3 of the book. But this time the Second-Order Conditions are very complicated, both to state and to use for actual problems. (And their proof in Section 5.7 is an experience you want to avoid at all cost.)
Because of this, we won't look at the Second-Order Conditions. You may note that the conditions are so awkward that even the book never uses them in further discussion or asks about them in the exercises.
4.7 Applying Lagrange's Theorem

Sections 5.4 and 5.5 in the book give explicit descriptions of how to use Lagrange's Theorem in all kinds of situations. They all have the form (or can be translated to the form):
maximise f(x) subject to x ∈ D = U ∩ { x ∈ Rn | g1(x) = 0, ..., gk(x) = 0 },
where f, gi : Rn → R are given C1 functions, and U ⊆ Rn is an open set.

The Cookbook Procedure in Section 5.4.1 is simply trying to find the points x ∈ Rn and the Lagrangean Multipliers λ1, ..., λk such that g1(x) = 0, ..., gk(x) = 0, and Df(x) + ∑_{i=1}^{k} λi Dgi(x) = 0. The last equation is actually a vector equation, involving vectors with n coordinates. If we write out the equations coordinate by coordinate we get
gi(x) = 0,  for all i = 1, ..., k;
∂f/∂xj(x) + ∑_{i=1}^{k} λi ∂gi/∂xj(x) = 0,  for all j = 1, ..., n.
We sometimes call the equations above the Lagrangean equations.
Another way to describe these equations is by defining the Lagrangean
L(x, λ) = f(x) + ∑_{i=1}^{k} λi gi(x),
where x ∈ Rn and λ = (λ1, ..., λk) ∈ Rk. Then the equations above can be written as
∂L/∂λi(x*, λ*) = 0,  for all i = 1, ..., k;
∂L/∂xj(x*, λ*) = 0,  for all j = 1, ..., n.

The Cookbook Procedure involves solving a system of n + k equations in the n + k unknowns x1, ..., xn, λ1, ..., λk. It is not always easy to find all solutions.
Moreover, all solutions to the Lagrangean equations are only candidates for local or global optima; you still need to find out the true nature of these points.
And finally, the procedure doesn't work for points where the Constraint Qualification is not satisfied. These points should be identified separately, and all points for which the Constraint Qualification is not satisfied should be added to the set of candidates for the optima.
And really finally, don't forget that there may be an open set U used in the definition of D. If you find a point x that satisfies the Lagrangean equations above and the Constraint Qualification, but which lies outside U, then it should not be considered as a candidate optimum.

The problem with the Cookbook Procedure from the book is that it ignores some facts (see the previous paragraphs). So here is an improved recipe for solving equality constrained optimisation problems.
Given: the optimisation problem
max/min-imise f(x) subject to x ∈ D = U ∩ { x ∈ Rn | g1(x) = 0, ..., gk(x) = 0 },
where f, gi : Rn → R are C1 functions, and U ⊆ Rn is an open set.
1. If possible, find a good reason why a maximum or minimum must exist. For this, Weierstrass' Theorem would be a prime source of knowledge, but sometimes ad-hoc methods will be required.
2. Determine the derivatives Dg1(x), ..., Dgk(x) of the constraint functions. Try to find all points in D for which the vectors { Dg1(x), ..., Dgk(x) } are not independent.
3. Determine the derivative Df(x) of the objective function and formulate the Lagrangean equations:
gi(x) = 0,  for all i = 1, ..., k;
∂f/∂xj(x) + ∑_{i=1}^{k} λi ∂gi/∂xj(x) = 0,  for all j = 1, ..., n.
4. Find all values x ∈ U and multipliers λ1, ..., λk for which the equations above are satisfied.
At this point you should have a collection of candidates for the optima: the points from 2 for which the Constraint Qualification failed, and the points x in 4 satisfying the Lagrangean equations. No other point can be a maximum or minimum of f on D.
5. If you know from step 1 that a maximum or minimum must exist, then calculate the function values for all candidate points from above. The points x which give the maximal value f(x) are the global maxima. And similarly for the global minima.
If you haven't been able in step 1 to guarantee the existence of a maximum or minimum, then you probably have to do some more work. Check the candidate points and see which could be a global maximum or minimum and why (or why not).
If you haven't been able in step 1 to guarantee the existence of a maximum or minimum, and no candidate points are found in steps 2 or 4, then no maximum and no minimum exists. It may be a good idea to check if you can confirm that using some other reasons. (For instance, by showing that the function has no upper or lower bound on D.)
If no candidate point is left from steps 2 and 4, but you claimed in step 1 that a maximum or minimum must exist, then there is something seriously wrong. Check your work and try to find the mistake(s).
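The recipe can be followed almost literally with a computer algebra system. A sketch on a made-up problem (the problem is my own illustration, not from the notes), using sympy:

```python
import sympy as sp

# The recipe applied to: max/min-imise f(x, y) = x + y subject to
# g(x, y) = x**2 + y**2 - 2 = 0, with U = R^2.
x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 2

# Step 1: the circle is compact and f is continuous, so by Weierstrass'
# Theorem both a maximum and a minimum exist.
# Step 2: Dg = (2x, 2y) is nonzero at every point of the circle (the origin
# is not feasible), so the Constraint Qualification holds on all of D.
# Steps 3-4: solve the Lagrangean equations.
eqs = [sp.Eq(f.diff(x) + lam * g.diff(x), 0),
       sp.Eq(f.diff(y) + lam * g.diff(y), 0),
       sp.Eq(g, 0)]
sols = sp.solve(eqs, [x, y, lam], dict=True)

# Step 5: compare the function values at all candidate points.
values = [(f.subs(s), s) for s in sols]
f_max = max(v for v, _ in values)
f_min = min(v for v, _ in values)
print(f_max, f_min)
```

The two candidates found are (1, 1) and (−1, −1); since step 1 guarantees that the optima exist, step 5 identifies them as the global maximum and minimum respectively.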
From the book
• Rank of matrices appears in Section 1.3.3 in the book. Most should be familiar from linear algebra courses.
• Equality constrained optimisation and Lagrange's Theorem is the topic of Chapter 5 of the book. You should have a good look at Sections 5.1, 5.2.1 and 5.2.2. Section 5.2.3 is slightly more technical than Section 4.5, and can be skipped.
• Study Sections 5.4 and 5.5 from the book to see how Lagrange's Theorem is applied and where the problems can appear. But it may be a good idea to use the more extensive Cookbook Procedure from the previous page.
• You can ignore Sections 5.3 and 5.7 on the Second-Order Conditions. Also the proof in Section 5.6 is beyond our reach. Look at the sketch of the proof in Section 4.3. Again, don't learn that proof by heart, but try to get an understanding of the main ideas, in particular the use of the Taylor Approximations. A similar remark holds for the role of the Lagrangean Multipliers in Section 4.5.
Suitable exercises from the book related to these notes
• Section 5.8: 1 to 11.
Most of the questions will take a considerable amount of time, often because it is quite some work to find all solutions of the Lagrangean equations. Make sure you get enough practice.
Optimisation Theory
MA 208
2007/08

Notes 5
Constrained Optimisation with Inequality Constraints
Kuhn-Tucker's Theorem

5.1 Introduction
In these notes we will look at constraint sets of the form
D = U ∩ { x ∈ Rn | hi(x) ≥ 0, i = 1, ..., ℓ },
called an optimisation problem with inequality constraints; and of the form
D = U ∩ { x ∈ Rn | gj(x) = 0, j = 1, ..., k; hi(x) ≥ 0, i = 1, ..., ℓ },
called an optimisation problem with mixed constraints.
Here U ⊆ Rn is an open set and the functions gj, hi are assumed to be C1, i.e., the functions have continuous first derivatives.
5.2 Kuhn-Tucker's Theorem

Section 6.1.1 contains the statement of Kuhn-Tucker's Theorem. We need one extra definition: An inequality constraint hi(x) ≥ 0 is said to be effective at a certain point x* if we have hi(x*) = 0, i.e., if the constraint holds with equality at x*.
* Theorem (Kuhn-Tucker's Theorem for Local Maxima)
Let f : U → R be a C1 function on a certain open set U ⊆ Rn, and let hi : Rn → R, i = 1, ..., ℓ, be C1 functions. Suppose x* is a local maximum of f on the set
D = U ∩ { x ∈ Rn | hi(x) ≥ 0, i = 1, ..., ℓ }.
Let E ⊆ {1, ..., ℓ} denote the set of effective constraints at x*. Suppose that the derivatives { Dhi(x*) | i ∈ E } form an independent set of vectors.
Then there exist λ1, ..., λℓ ∈ R such that
λi ≥ 0,  for i = 1, ..., ℓ;
λi hi(x*) = 0,  for i = 1, ..., ℓ;
Df(x*) + ∑_{i=1}^{ℓ} λi Dhi(x*) = 0.
By replacing f(x) by −f(x) we get a result concerning local minima.

* Corollary (Kuhn-Tucker's Theorem for Local Minima)
Let f : U → R be a C1 function on a certain open set U ⊆ Rn, and let hi : Rn → R, i = 1, ..., ℓ, be C1 functions. Suppose x* is a local minimum of f on the set
D = U ∩ { x ∈ Rn | hi(x) ≥ 0, i = 1, ..., ℓ }.
Let E ⊆ {1, ..., ℓ} denote the set of effective constraints at x*. Suppose that the derivatives { Dhi(x*) | i ∈ E } form an independent set of vectors.
Then there exist λ1, ..., λℓ ∈ R such that
λi ≥ 0,  for i = 1, ..., ℓ;
λi hi(x*) = 0,  for i = 1, ..., ℓ;
Df(x*) − ∑_{i=1}^{ℓ} λi Dhi(x*) = 0.

Note that the Kuhn-Tucker Theorems for maxima and for minima are not exactly the same. So in order to find both the maximum and the minimum for an inequality constrained optimisation problem, you must do certain steps twice.
The conditions λi hi(x*) = 0 for all i are called the complementary slackness conditions. Since we must have that hi(x*) ≥ 0 and λi ≥ 0, we can only have slack in one of the conditions (i.e., λi > 0 or hi(x*) > 0) if the other condition is satisfied with equality (i.e., hi(x*) = 0 or λi = 0, respectively).
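As a small sketch (the one-dimensional problem is my own illustration, not from the notes), the conditions of the theorem can be verified directly at a claimed maximum:

```python
# Maximise f(x) = -(x - 2)**2 subject to h1(x) = x >= 0 and h2(x) = 1 - x >= 0,
# so D = [0, 1] and the maximum is at x* = 1, where only h2 is effective.
x_star = 1.0
Df = -2.0 * (x_star - 2.0)        # Df(x) = -2(x - 2), so Df(1) = 2
Dh1, Dh2 = 1.0, -1.0
h1, h2 = x_star, 1.0 - x_star

# Complementary slackness forces lam1 = 0 (since h1(x*) = 1 > 0), and then
# Df + lam1*Dh1 + lam2*Dh2 = 0 gives lam2.
lam1 = 0.0
lam2 = -(Df + lam1 * Dh1) / Dh2

assert lam1 >= 0 and lam2 >= 0                      # multipliers nonnegative
assert lam1 * h1 == 0 and lam2 * h2 == 0            # complementary slackness
assert Df + lam1 * Dh1 + lam2 * Dh2 == 0            # the final equation
print(lam1, lam2)
```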
5.3 Proof of Kuhn-Tucker's Theorem

The proof of Kuhn-Tucker's Theorem in Section 6.5 of the book is long and complicated in its full generality and with all subtleties accounted for. The sketch below avoids many of the subtleties by assuming that the linear Taylor Approximations give a good idea of the behaviour of the functions concerned. Many steps in the proof are comparable to certain steps in the proof of Lagrange's Theorem from Notes 4. So you might want to have a look at that proof again.

Sketch of Proof  We use the notation from the statement of the theorem above. In particular, x* is a local maximum of f on D and E is the set of effective constraints, i.e., E = { i | hi(x*) = 0 }. This means that hi(x*) > 0 for all i ∉ E. Since the hi are continuous functions, the sets { x ∈ Rn | hi(x) > 0 } are open for all i ∉ E. So if we define
U′ = U ∩ { x ∈ Rn | hi(x) > 0, i ∉ E }  and  D′ = U′ ∩ { x ∈ Rn | hi(x) ≥ 0, i ∈ E },
then U′ is an open set, and x* ∈ D′ ⊆ D. Since x* is a maximum of f on D, it certainly is a maximum of f on D′. In the remainder of the proof we forget about the non-effective constraints, because they are taken care of by U′.
One thing we still have to do is say what the values of λi are for i ∉ E. We just take
λi = 0,  for all i ∉ E;
in that way we are sure that the conditions λi ≥ 0 and λi hi(x*) = 0 are satisfied for i ∉ E.
Now we look back to the start of the proof of Lagrange's Theorem in Notes 4. The Taylor Approximation around x* of f, writing x = x* + a, is
f(x* + a) = f(x*) + Df(x*) · a + remainder.
Recall that x* is a maximum of f on D′. Thus for small enough ‖a‖ with x* + a ∈ D′ we must have that f(x* + a) ≤ f(x*). If we substitute that in the formula above, and forget about the remainder term, then this must mean that Df(x*) · a ≤ 0. So we get the following necessary condition:
(1)  For all a ∈ Rn with x* + a ∈ D′ and ‖a‖ small enough we have that Df(x*) · a ≤ 0.
Continuing to follow the proof of Lagrange's Theorem, the Taylor Approximations of hi around x*, writing x = x* + a, become
hi(x* + a) = hi(x*) + Dhi(x*) · a + remainder,  for i ∈ E.
But we know that hi(x*) = 0; and we are only interested in those a such that x* + a ∈ D′, hence such that hi(x* + a) ≥ 0, for i ∈ E. Filling this in into the formula above means that we are only interested in those a such that 0 + Dhi(x*) · a + remainder ≥ 0, for i ∈ E. If we ignore the remainder term, then this gives the following statement:
(2)  In order to have x* + a ∈ D′ for a certain a ∈ Rn with ‖a‖ small, we need that Dhi(x*) · a ≥ 0, for all i ∈ E.
Note that if we have an a with Df(x*) · a ≤ 0 or Dhi(x*) · a ≥ 0, then the same holds for every scalar multiple λa with λ ≥ 0. So we can ignore the condition "‖a‖ small", to get the following combination of statements (1) and (2).
(3)  For all a ∈ Rn with Dhi(x*) · a ≥ 0 for all i ∈ E, we need that Df(x*) · a ≤ 0.
In a lemma below we will show that the only way that statement (3) can be true is if
(4)  Df(x*) = ∑_{i∈E} µi Dhi(x*),  with µi ≤ 0.
So if we set λi = −µi for i ∈ E, then certainly λi ≥ 0 for i ∈ E. And since hi(x*) = 0 for i ∈ E, we also have λi hi(x*) = 0.
So the only thing left to show is that the last equation in Kuhn-Tucker's Theorem is satisfied. But that is now very easy, using (4) and the fact that λi = 0 for i ∉ E:
Df(x*) + ∑_{i=1}^{ℓ} λi Dhi(x*) = Df(x*) + ∑_{i∈E} λi Dhi(x*) + ∑_{i∉E} λi Dhi(x*)
  = Df(x*) + ∑_{i∈E} (−µi) Dhi(x*) + ∑_{i∉E} 0 · Dhi(x*)
  = Df(x*) − Df(x*) + 0 = 0,
which completes the proof.
Here is the promised lemma.

* Lemma
Let x, y1, ..., ym ∈ Rn, where y1, ..., ym is an independent set of vectors. Suppose that we know that for all a ∈ Rn with yi · a ≥ 0 for i = 1, ..., m, we also have x · a ≤ 0. Then x can be written as x = µ1 y1 + · · · + µm ym with µi ≤ 0.
Proof  Suppose first that x is not a linear combination of y1, ..., ym. In other words, x is not in the subspace of Rn spanned by the vectors y1, ..., ym. Now write x = z1 + z2, where z1 is the orthogonal projection of x on the subspace spanned by y1, ..., ym, and z2 = x − z1. Then we have z2 ≠ 0, and also b · z2 = 0 for all vectors b in the subspace spanned by y1, ..., ym. In particular we have that yi · z2 = 0 for i = 1, ..., m. By the hypothesis in the lemma, this means that x · z2 ≤ 0. But we also have that z1 · z2 = 0, which means that x · z2 = (z1 + z2) · z2 = z1 · z2 + z2 · z2 = 0 + ‖z2‖2 > 0. This gives a contradiction, so we must have that x is a linear combination of y1, ..., ym.
So at this moment we know that we can write x = µ1 y1 + · · · + µm ym, but we still have to show that all µi ≤ 0. We do this for µ1 only; the other ones can be done similarly.
Since y1, ..., ym are independent, y1 is not a vector in the subspace spanned by y2, ..., ym. Now write y1 = v1 + v2, where v1 is the orthogonal projection of y1 on the subspace spanned by y2, ..., ym, and v2 = y1 − v1. Then we have v2 ≠ 0, and also b · v2 = 0 for all vectors b in the subspace spanned by y2, ..., ym. In particular we have that
yi · v2 = 0,  for i = 2, ..., m.
But we also have that v1 · v2 = 0, which means that
y1 · v2 = (v1 + v2) · v2 = v1 · v2 + v2 · v2 = 0 + ‖v2‖2 > 0.
By the hypothesis in the lemma, this means that x · v2 ≤ 0. Using the formulas above gives
0 ≥ x · v2 = ∑_{i=1}^{m} µi yi · v2 = µ1 y1 · v2 = µ1 ‖v2‖2.
This gives µ1 ≤ 0, as promised.
5.4 Mixed Constraint Optimisation Problems

By combining Lagrange's Theorem and Kuhn-Tucker's Theorem, we get the following result for mixed constraint optimisation problems where
D = U ∩ { x ∈ Rn | g1(x) = 0, ..., gk(x) = 0, h1(x) ≥ 0, ..., hℓ(x) ≥ 0 }.
To make the notation not too cumbersome, we define functions ψi : Rn → R for i = 1, ..., k + ℓ, where
ψi = gi,  for i = 1, ..., k;
ψi = hi−k,  for i = k + 1, ..., k + ℓ.
We also only formulate the result for maxima. As before, a constraint i is effective at a point x ∈ D if ψi(x) = 0. This means that the constraints 1, ..., k are always effective.
* Theorem
Let f : U → R be a C1 function on a certain open set U ⊆ Rn, and let ψi : Rn → R, i = 1, ..., k + ℓ, be C1 functions. Suppose x* is a local maximum of f on the set
D = U ∩ { x ∈ Rn | ψi(x) = 0, i = 1, ..., k; ψi(x) ≥ 0, i = k + 1, ..., k + ℓ }.
Let E ⊆ {1, ..., k + ℓ} denote the set of effective constraints at x*. Suppose that the derivatives { Dψi(x*) | i ∈ E } form an independent set of vectors.
Then there exist λ1, ..., λk+ℓ ∈ R such that
λj ≥ 0,  for j = k + 1, ..., k + ℓ;
λj ψj(x*) = 0,  for j = k + 1, ..., k + ℓ;
Df(x*) + ∑_{i=1}^{k+ℓ} λi Dψi(x*) = 0.
5.5 The Constraint Qualification

The condition in Kuhn-Tucker's Theorem that the derivatives { Dhi(x*) | i ∈ E } form an independent set of vectors is again called the Constraint Qualification. And also this time, if the qualification fails for a certain point, then the theorem may not work for that point.
But for Kuhn-Tucker's Theorem it is usually more work to find out if and when the Constraint Qualification is not satisfied. The reason is that the qualification only looks at the constraints that are effective at a certain point. So, for instance, even if the number of constraints ℓ is larger than the dimension n, the Constraint Qualification may hold everywhere, because only a small number of constraints are effective at any point in D.

More precisely, we say that the Constraint Qualification fails for a certain point x* ∈ D if the following holds: First, let E ⊆ {1, ..., ℓ} be the set of effective constraints, so hi(x*) = 0 if and only if i ∈ E. And then the Constraint Qualification fails if { Dhi(x*) | i ∈ E } is not an independent set.
So how do you check if the Constraint Qualification holds on D, or for which x ∈ D it fails? The main problem is that there are many possibilities for the set E of effective constraints.
For instance, consider the set D = { (x, y) ∈ R2 | x + y ≥ 0, x2 ≥ 0 }, with h1(x, y) = x + y and h2(x, y) = x2. Then (0, 0) ∈ D with h1(0, 0) = 0 and h2(0, 0) = 0, so E = {1, 2} for x = (0, 0). But also (1, −1) ∈ D with h1(1, −1) = 0 but h2(1, −1) > 0, so E = {1} for x = (1, −1). And finally, (1, 1) ∈ D with h1(1, 1) > 0 and h2(1, 1) > 0, so E = ∅ in this case.

Because of the different possibilities for E, checking for which x ∈ D the Constraint Qualification fails is quite some work. Here is a tedious, but safe, method:
1. Write down each subset of {1, ..., ℓ}, except the empty set. So if ℓ = 3, then we have the subsets {h1}, {h2}, {h3}, {h1, h2}, {h1, h3}, {h2, h3}, and {h1, h2, h3}.
2. For each of the subsets E from step 1, see if there are points x ∈ D such that
hi(x) = 0, for all i ∈ E;  hi(x) > 0, for all i ∉ E;  and  { Dhi(x) | i ∈ E } is dependent.
In general, there will be many E for which there is no x satisfying this. In particular, for many E it won't be possible to have a point x satisfying hi(x) = 0 for all i ∈ E.

If you are working with a mixed constraint problem, then the procedure above should be adapted to take into account that every equality constraint gi(x) = 0 is always effective, so you should only consider sets E that include all equality constraints.
Every point x ∈ D found in the procedure above is a potential problem case in Kuhn-Tucker's Theorem. So it has to be considered as a candidate maximum or minimum, until we have a good reason to remove it from the list of such candidates.
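For a concrete constraint set such as the example above, the pointwise version of this check is easy to automate. A sketch (the function names are my own) for D = { (x, y) | x + y ≥ 0, x2 ≥ 0 }:

```python
import numpy as np

# Check where the Constraint Qualification fails for
# D = { (x, y) : h1 = x + y >= 0, h2 = x**2 >= 0 }.
hs  = [lambda x, y: x + y, lambda x, y: x**2]
Dhs = [lambda x, y: np.array([1.0, 1.0]),
       lambda x, y: np.array([2.0 * x, 0.0])]

def cq_fails(x, y, tol=1e-9):
    if any(h(x, y) < -tol for h in hs):
        return False                       # point not in D
    E = [i for i, h in enumerate(hs) if abs(h(x, y)) <= tol]
    if not E:
        return False                       # no effective constraints: CQ holds
    M = np.array([Dhs[i](x, y) for i in E])
    return np.linalg.matrix_rank(M) < len(E)   # dependent set of derivatives?

# E = {1, 2} at (0, 0); E = {1} at (1, -1); E is empty at (1, 1):
print(cq_fails(0.0, 0.0), cq_fails(1.0, -1.0), cq_fails(1.0, 1.0))
```

Here the Constraint Qualification fails exactly at the points with x = 0, because there h2 is effective and Dh2 = (0, 0) makes the set of derivatives dependent.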
5.6 The Kuhn-Tucker Multipliers


The numbers λ1, . . . , λℓ in Kuhn-Tucker's Theorem are called the Kuhn-Tucker Multipliers.
They have a meaning similar to the Lagrangean multipliers from the previous chapter :
Property
* If hi is a non-effective constraint, then hi(x*) > 0, and relaxing the constraint hi(x) ≥ 0
further to hi(x) + ε ≥ 0 will not help anything. This behaviour is reflected in the fact that
for those constraints we have λi = 0.
* If hj is an effective constraint, then hj(x*) = 0, and then we can expect to have a new
maximum if we relax the constraint hj(x) ≥ 0 to hj(x) + ε ≥ 0. In a way similar to the
effect of the Lagrangean multipliers we can show the following :

If hj is an effective constraint, then a small relaxation of the j-th constraint, replacing
hj(x) ≥ 0 by hj(x) + ε ≥ 0, will give a new maximum x*(ε) for which we have approximately f(x*(ε)) ≈ f(x*) + ε λj.

Proofs of these facts are very similar to those for the Lagrangean multipliers.
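A quick numerical illustration of the approximation f(x*(ε)) ≈ f(x*) + ε λj, using a one-variable example of our own ( not from the notes ) : maximise f(x) = −(x − 2)² subject to h(x) = 1 − x ≥ 0. The maximum is x* = 1, and Df(x*) + λ Dh(x*) = 0 gives 2 − λ = 0, so λ = 2.

```python
def f(x):
    return -(x - 2.0) ** 2          # objective; unconstrained max at x = 2

def x_star(eps):
    # Relaxed constraint 1 - x + eps >= 0, i.e. x <= 1 + eps; for small eps
    # the constrained maximum still sits on the boundary.
    return 1.0 + eps

lam = 2.0                            # multiplier at x* = 1 (computed above)

eps = 0.01
exact = f(x_star(eps))               # the new maximal value after relaxation
approx = f(x_star(0.0)) + eps * lam  # the prediction f(x*) + eps * lam
print(exact, approx, abs(exact - approx))   # the error is of order eps**2
```

Here exact = −0.9801 and approx = −0.98, so the prediction is off only by ε² = 0.0001.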


The result above also holds for mixed constraint problems, where you must realise that all
equality constraints are always effective. The main difference between equality and inequality constraints is that for the latter ones we know that λj ≥ 0. In particular that means that
if ε > 0, then the maximum changes from f(x*) to approximately f(x*) + ε λj, hence will increase, or stay
the same if λj = 0.
For the multiplier μj of an equality constraint we don't know the sign of μj, so the difference
ε μj can be positive, negative or 0.

5.7 Applying Kuhn-Tuckers Theorem


Sections 6.2 and 6.3 in the book contain all kinds of examples of optimisation problems and
their behaviour with respect to the Kuhn-Tucker theorem. The standard form of the problems is
maximise f(x) subject to x ∈ D = U ∩ { x ∈ Rn | hi(x) ≥ 0, i = 1, . . . , ℓ },
where f, hi : Rn → R are given C1 functions, and U ⊆ Rn is an open set.
You may note that a detailed analysis of even fairly simple problems can be quite a bit more work
than similar equality constraint problems using Lagrange's Theorem. Some of the reasons
for this extra work are :
* The Constraint Qualification for Kuhn-Tucker's Theorem is more complicated and harder
to check.
* There are different theorems for maxima and minima; so if you want to find all optima
you need to do a certain amount of work twice.

The Cookbook Procedure in Section 6.2.1 for using Kuhn-Tucker's Theorem to find a maximum can be described as follows :
1. If possible, find a good reason why a maximum must exist. For this, Weierstrass' Theorem
would be a prime source of knowledge, but sometimes ad-hoc methods will be required.
2. Determine the derivatives Dh1(x), . . . , Dhℓ(x) of the constraint functions.
Determine all points for which the Constraint Qualification fails. Unless there are obvious
shortcuts, this has to be done by looking at every possible combination of constraints
using the procedure from Section 5.5 in these notes.
Any point in which the Constraint Qualification fails must be considered as a candidate
optimum.
3. Determine the derivative Df(x) of the objective function and formulate the Kuhn-Tucker
equations :

λi ≥ 0   and   λi hi(x) = 0,   for i = 1, . . . , ℓ;

Df(x) + ∑_{i=1}^{ℓ} λi Dhi(x) = 0.


4. Find all values x ∈ U and multipliers λ1, . . . , λℓ for which the equations above are satisfied.
At this point you should have a collection of candidates for the maxima : the points from 2
for which the Constraint Qualification failed and the points x in 4 satisfying the Kuhn-Tucker equations. No other point can be a maximum of f on D.
5. If you know from step 1 that a maximum must exist, then calculate the function values
for all candidate points from above. The points x which give the maximal value f(x)
must form a global maximum.
If you haven't been able in step 1 to guarantee the existence of a maximum, then you
probably have to do some more work. Check the candidate points and see which could
be a global maximum and why ( or why not ).
If you haven't been able in step 1 to guarantee the existence of a maximum, and no candidate points are found in steps 2 or 4, then no maximum exists. It may be
a good idea to check if you can confirm that using some other reasons. ( For instance, the
function has no upper bound on D. )
If no candidate point is left from steps 2 and 4, but you claimed in step 1 that a maximum
must exist, then there is something seriously wrong. Check your work and try to find the
mistake(s).

In order to find a minimum, the procedure above needs some small adaptation, although the
critical step 2 is identical. In fact, if we want to find both a maximum and a minimum, then
step 2 needs to be done only once.
For mixed constraint optimisation problems, some further adaptation is required.

Another way to describe the equations in step 3 is by defining the Lagrangean

L(x, λ) = f(x) + ∑_{i=1}^{ℓ} λi hi(x),

where x ∈ Rn and λ = (λ1, . . . , λℓ) ∈ Rℓ. Then the equations above, together with the
condition x ∈ D, can be written as

∂L/∂λi (x, λ) ≥ 0,   λi ≥ 0,   λi · ∂L/∂λi (x, λ) = 0,   for i = 1, . . . , ℓ,

∂L/∂xj (x, λ) = 0,   for j = 1, . . . , n.
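The conditions above are easy to verify mechanically at a candidate point. Here is a sketch on a toy instance of our own choosing ( not from the book ) : maximise f(x, y) = x + y subject to h1 = 1 − x ≥ 0, h2 = 1 − y ≥ 0, h3 = x ≥ 0, h4 = y ≥ 0, with candidate optimum (1, 1) and multipliers λ = (1, 1, 0, 0).

```python
# Candidate optimum and multipliers for: maximise x + y subject to
# 1 - x >= 0, 1 - y >= 0, x >= 0, y >= 0  (our own toy instance).
x, y = 1.0, 1.0
lam = (1.0, 1.0, 0.0, 0.0)

h = (1.0 - x, 1.0 - y, x, y)          # constraint values h_i(x, y)
dLdx = 1.0 - lam[0] + lam[2]          # dL/dx = 1 - lam1 + lam3
dLdy = 1.0 - lam[1] + lam[3]          # dL/dy = 1 - lam2 + lam4

assert all(li >= 0.0 for li in lam)   # lam_i >= 0
assert all(hi >= 0.0 for hi in h)     # dL/dlam_i = h_i(x, y) >= 0
assert all(li * hi == 0.0 for li, hi in zip(lam, h))  # complementary slackness
assert dLdx == 0.0 and dLdy == 0.0    # stationarity in x and y
print("Kuhn-Tucker conditions hold at (1, 1)")
```

Every assertion passes, so (1, 1) with these multipliers satisfies all the Kuhn-Tucker conditions.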

Note that no Second Order Conditions for inequality constrained optimisation problems
are described in the book. Such conditions exist in the literature, but they are so
cumbersome that they are only of theoretical interest.
You can also note from the description above, and from all examples in the book, that Kuhn-Tucker's Theorem is only used when searching for global optima. Local optima are usually
discarded as being too hard to guarantee and of little interest.


From the book


Most of Section 6.1 is covered in these notes, and will be considered part of the material
for this course. You can skip the more formal argumentation about the Kuhn-Tucker
Multipliers on page 149.
Section 6.2.1 gives some more detail concerning the use of the Lagrangean L(x, λ).
The rest of Section 6.2 and Section 6.3 give all kinds of examples of when and how to use the
Kuhn-Tucker method. Have a look at what is happening.
Section 6.4 formulates the method for mixed constraint problems. It's a pity that not much
more is done with it.
You can skip Section 6.5, the proof of Kuhn-Tucker's Theorem.

Suitable exercises from the book related to these notes


Section 6.6 : 1 – 12.
The actual values mentioned at the end of Question 6 make this a very hard question.
Better use p1 = p2 = 1 and w1 = w2 = 1.
Also the numbers in Question 7 (a) are nasty. Use that x1 has unit price 5 2/3 and x2 has unit
price 1/2 instead.

Optimisation Theory

2007/08

MA 208
Notes 6
Linear Programming and Duality

6.1 Introduction
This topic doesn't appear in the book, so you'll have to make do with these notes and the lectures
and classes.

A linear function is a function f : Rn → R of the form

f(x) = a1 x1 + a2 x2 + · · · + an xn + b,

where a1, . . . , an and b are real constants.
A linear programming problem ( or LP-problem ) is a constrained optimisation problem of the
form
maximise ( or minimise ) f(x) subject to
x ∈ D = { x ∈ Rn | gj(x) = 0, j = 1, . . . , k; hi(x) ≥ 0, i = 1, . . . , ℓ },   (1)

where the objective function f and all constraints g1, . . . , gk and h1, . . . , hℓ are linear functions.

The above gives the general form for an LP-problem. But for the rest of these notes we will
always assume that an LP-problem has the following standard form :

maximise c1 x1 + c2 x2 + · · · + cn xn
subject to
a1^(1) x1 + a2^(1) x2 + · · · + an^(1) xn ≤ b1,
a1^(2) x1 + a2^(2) x2 + · · · + an^(2) xn ≤ b2,   (2)
. . .
a1^(m) x1 + a2^(m) x2 + · · · + an^(m) xn ≤ bm,
x1 ≥ 0,   x2 ≥ 0,   . . . ,   xn ≥ 0.

Here the ci, bj and ai^(j), i = 1, . . . , n, j = 1, . . . , m, are real constants.

Author : Jan van den Heuvel

© London School of Economics, 2008


Of course, not every linear programming problem in general format (1) will look like the
standard form in (2). In this paragraph we will show how to transform a linear programming
problem in the more general form (1) into the form prescribed in (2).
* If we want to minimise f(x), then this is equivalent to maximising −f(x). Similarly,
maximising a1 x1 + · · · + an xn + b will give the same solutions as maximising a1 x1 +
· · · + an xn, where only the function value in the maxima will differ by b.
* Next, it is straightforward that a constraint of the form h(x) ≥ 0, where h is a linear
function, can always be written as a1 x1 + · · · + an xn ≤ b, for constants a1, . . . , an and b.
So we are left with the problem of what to do with equality constraints g(x) = 0, where g is a
linear function, and how to make sure that all constraints of the form xi ≥ 0 are present.
* First assume we have an equality constraint of the form g(x) = 0 in (1), where g is a linear
function. In other words, we have a constraint of the form a1 x1 + · · · + an xn + b = 0 for
some constants a1, . . . , an and b. We can assume that not all ai are 0, otherwise we have
something that is always true ( if b = 0; and hence would be a constraint we can ignore ),
or that is always violated ( if b ≠ 0, which would mean that D = ∅ and the whole problem has no solution ).
So take any such ai ≠ 0. For simplicity we assume an ≠ 0. Then a1 x1 + · · · + an xn + b = 0
is the same as saying xn = (−b − a1 x1 − · · · − an−1 xn−1)/an. Then by just substituting this
value for xn in the objective function and in all constraints, we get a new linear programming problem. This new problem no longer has the equality a1 x1 + · · · + an xn + b = 0.
But note that xn ≥ 0 gets translated to (−b − a1 x1 − · · · − an−1 xn−1)/an ≥ 0, which can be
rewritten as an inequality of the form a′1 x1 + · · · + a′n−1 xn−1 ≤ b′ ( the exact coefficients
depend on the sign of an ). So we lose the inequality xn ≥ 0, but must add a new inequality instead.
This way we get a new system which is one dimension lower than the original, because xn has disappeared.
We should repeat the procedure until all equality constraints in (1) have been removed.
* It's somewhat trickier to make sure that all constraints of the form xi ≥ 0 are really
present.
First notice that any a ∈ R can be written as a = b − c for some b, c ∈ R with b, c ≥ 0
( for instance, take b = max{0, a} and c = max{0, −a} ). In fact, there are always many
possibilities for b and c, since if a = b − c with b, c ≥ 0, then also a = (b + 1) − (c + 1)
with b + 1, c + 1 ≥ 0.
Anyhow, if the constraint xi ≥ 0 is not one of the constraints in (1), then we write every
occurrence of xi in the objective function and the constraints as xi = xi⁺ − xi⁻, and we
add the two constraints xi⁺ ≥ 0 and xi⁻ ≥ 0. This way we get a new linear programming problem with one more variable ( xi is replaced by xi⁺ and xi⁻ ), and two further
constraints xi⁺ ≥ 0 and xi⁻ ≥ 0.
By repeatedly applying the procedures above, any linear program of the general form (1) can
be transformed into an equivalent LP-problem in the standard form (2); possibly with a different number of variables and/or constraints. Once we're done analysing the LP-problem
in standard form ( for instance we've found a maximum ), then we can do the inverse of the
procedures above to obtain the analysis of the original problem.
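A small numerical check ( in Python, with numbers of our own choosing ) that the substitution step for an equality constraint really preserves objective values on the constraint set. We use the constraint 2 x1 + x2 − 4 = 0, so a1 = 2, a2 = 1, b = −4, and the objective 3 x1 + 5 x2 :

```python
a1, a2, b = 2.0, 1.0, -4.0        # equality constraint a1*x1 + a2*x2 + b = 0
c1, c2 = 3.0, 5.0                 # objective c1*x1 + c2*x2

def x2_of(x1):
    # Solve the equality constraint for x2 (possible since a2 != 0).
    return -(b + a1 * x1) / a2

def original(x1, x2):
    return c1 * x1 + c2 * x2

def reduced(x1):
    # Objective after substituting x2 out: one variable fewer.
    return c1 * x1 + c2 * x2_of(x1)

for x1 in (-1.0, 0.0, 2.5):
    assert original(x1, x2_of(x1)) == reduced(x1)
print("substitution preserves objective values on the constraint set")
```

So on every point of the constraint set, the reduced one-variable problem gives the same objective value as the original two-variable problem.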


6.2 Linear programming and Kuhn-Tuckers Theorem


In this section we will see what the consequences are of giving an LP-problem in standard
form (2) as an input for the cookbook procedure of Kuhn-Tucker's Theorem. First note that
all functions are linear, so certainly C1.
As you should know by now, an essential part when applying Kuhn-Tucker's Theorem is
checking if the Constraint Qualification holds. We will discuss that later, and go directly to
the Kuhn-Tucker equations.

In order to find critical points according to the Kuhn-Tucker equations, we first rewrite an
LP-problem in standard form to the form we need for Kuhn-Tucker :

maximise c1 x1 + c2 x2 + · · · + cn xn
subject to
b1 − a1^(1) x1 − a2^(1) x2 − · · · − an^(1) xn ≥ 0,
b2 − a1^(2) x1 − a2^(2) x2 − · · · − an^(2) xn ≥ 0,
. . .
bm − a1^(m) x1 − a2^(m) x2 − · · · − an^(m) xn ≥ 0,
x1 ≥ 0,   x2 ≥ 0,   . . . ,   xn ≥ 0.

Since there are clearly two different types of constraints, we also will use two names for the
Kuhn-Tucker multipliers : λj for the constraints bj − a1^(j) x1 − · · · − an^(j) xn ≥ 0 and multipliers μi
for the constraints xi ≥ 0.
Using Kuhn-Tucker's Theorem, and ignoring the Constraint Qualification for the moment,
we see that in order for (x1, . . . , xn) to be a maximum, we must be able to find (λ1, . . . , λm,
μ1, . . . , μn) such that
λj ≥ 0,   for j = 1, . . . , m,   (3a)
bj − a1^(j) x1 − · · · − an^(j) xn ≥ 0,   for j = 1, . . . , m,   (3b)
λj ( bj − a1^(j) x1 − · · · − an^(j) xn ) = 0,   for j = 1, . . . , m,   (3c)
μi ≥ 0,   for i = 1, . . . , n,   (3d)
xi ≥ 0,   for i = 1, . . . , n,   (3e)
μi xi = 0,   for i = 1, . . . , n,   (3f)
ci − ∑_{j=1}^{m} λj ai^(j) + μi = 0,   for i = 1, . . . , n.   (3g)

From equations (3g) we find

μi = ∑_{j=1}^{m} λj ai^(j) − ci = ai^(1) λ1 + · · · + ai^(m) λm − ci,   for i = 1, . . . , n.

Substituting this into (3d) and (3f), rearranging (3a) – (3f), and renaming the λj to yj, we get
that if (x1, . . . , xn) is a maximum of the LP-problem (2) in standard form, then there must


exist (y1, . . . , ym) such that

yj ≥ 0,   for j = 1, . . . , m,   (4a)
bj − a1^(j) x1 − · · · − an^(j) xn ≥ 0,   for j = 1, . . . , m,   (4b)
yj ( bj − a1^(j) x1 − · · · − an^(j) xn ) = 0,   for j = 1, . . . , m,   (4c)
xi ≥ 0,   for i = 1, . . . , n,   (4d)
ai^(1) y1 + · · · + ai^(m) ym − ci ≥ 0,   for i = 1, . . . , n,   (4e)
xi ( ai^(1) y1 + · · · + ai^(m) ym − ci ) = 0,   for i = 1, . . . , n.   (4f)

So to summarise we get :
* If (x1, . . . , xn) is a maximum of the LP-problem in (2), then there exist (y1, . . . , ym) such
that the equations (4a) – (4f) are satisfied.

If you look at the equations (4a) – (4f), you notice a remarkable symmetry between the xi
and the yj. In fact, we would obtain exactly the same equations if we analysed what it
means for an m-dimensional vector (y1, . . . , ym) to be the solution to the following problem :
minimise b1 y1 + b2 y2 + · · · + bm ym
subject to
a1^(1) y1 + a1^(2) y2 + · · · + a1^(m) ym ≥ c1,
a2^(1) y1 + a2^(2) y2 + · · · + a2^(m) ym ≥ c2,   (5)
. . .
an^(1) y1 + an^(2) y2 + · · · + an^(m) ym ≥ cn,
y1 ≥ 0,   y2 ≥ 0,   . . . ,   ym ≥ 0.

We can summarise this as :
* If (y1, . . . , ym) is a minimum of the linear programming problem in (5), then there exist
(x1, . . . , xn) such that the equations (4a) – (4f) are satisfied.

So what about the Constraint Qualification for the linear programming problems discussed
above ? If we are dealing with linear constraint functions, then it is possible to prove that
even in points where the Constraint Qualification fails, a maximum must occur as a solution
to the Kuhn-Tucker equations.
The reason for this behaviour can be found by looking back at the proof of Kuhn-Tucker's
Theorem. There the Constraint Qualification appeared because in some points the first order Taylor approximation may not give a good description of what is happening with the
constraint functions at such a point. But this problem cannot occur if all functions are linear, because then the first order Taylor approximation is exactly the function itself ! So the
first order Taylor approximations used in the proof of Kuhn-Tucker's Theorem give an exact
description of the behaviour of the constraint functions.
There is one thing you should be aware of when there is a maximum of a linear programming problem in a point for which the Constraint Qualification fails. Although such a point
will appear as a solution to the Kuhn-Tucker equations, it is possible that there will be no
unique solution, and that many different multipliers are possible. This doesn't influence the
question about the existence of the multipliers, but it may make them harder to find, and it also
will spoil their interpretation as shadow prices.
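The key fact used above, that the first order Taylor approximation of a linear function is the function itself, is easy to check numerically. A small sketch with numbers of our own choosing, contrasting a linear function with a non-linear one :

```python
def taylor1(f, df, x0, x):
    """First-order Taylor approximation of f around x0, evaluated at x."""
    return f(x0) + df(x0) * (x - x0)

lin = lambda x: 3.0 * x + 7.0          # a linear function ...
dlin = lambda x: 3.0                   # ... and its (constant) derivative

sq = lambda x: x * x                   # a non-linear comparison function
dsq = lambda x: 2.0 * x

for x0, x in ((0.0, 5.0), (-2.0, 1.5)):
    assert taylor1(lin, dlin, x0, x) == lin(x)   # exact for a linear function
assert taylor1(sq, dsq, 1.0, 3.0) != sq(3.0)     # approximation error remains
```

For the linear function the approximation is exact at every point, while for x², expanding around 1 and evaluating at 3 gives 5 rather than the true value 9.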


6.3 Linear Duality


We will now write the linear programming problems seen so far in a more compact form.
Given an LP-problem in standard form (2), let A be the m × n matrix whose entry in row j
and column i is ai^(j), so that row j of A lists the coefficients a1^(j), . . . , an^(j) of the j-th constraint,
and define column vectors c = (c1, . . . , cn), b = (b1, . . . , bm), x = (x1, . . . , xn), and y =
(y1, . . . , ym). Also, let 0n and 0m be the n-dimensional and the m-dimensional null-vector,
respectively ( so they have all coordinates equal to 0 ).
Remember that the inner product of two vectors a and b of the same dimension is denoted
by a · b. And we use A′ to denote the transpose of A, i.e., the matrix obtained from A by
taking the reflection in the diagonal. ( Transposes are usually indicated by At or A^T, but we
follow the notation in the book. )

Using the notation above, we can write an LP-problem in standard form as

maximise c · x
subject to A x ≤ b,   (6)
x ≥ 0n,

where the matrix A, the n-vector c and the m-vector b are given. This problem is sometimes
called the primal linear programming problem.
And if we have such an LP-problem in the standard form given in (6), we define the dual
linear programming problem ( or DLP-problem ) as the following constrained optimisation problem :

minimise b · y
subject to A′ y ≥ c,   (7)
y ≥ 0m.

Notice that this is exactly the linear programming problem in (5).

The constraint set of an LP-problem according to (6) is given by

D_P = { x ∈ Rn | A x ≤ b; x ≥ 0n }.

This set is usually called the feasible set of the ( primal ) LP-problem, and a point x ∈ D_P is a
feasible point of the LP-problem.
Similarly, for the dual LP-problem we have the constraint set

D_D = { y ∈ Rm | A′ y ≥ c; y ≥ 0m };

and D_D is the feasible set of the DLP-problem and a point y ∈ D_D is a feasible point of the
dual problem.


We can also rewrite the equations (4a) – (4f) in a more compact form, and re-order to get :

A x ≤ b,   (8a)
A′ y ≥ c,   (8b)
x ≥ 0n,   (8c)
y ≥ 0m,   (8d)
yj ( bj − a1^(j) x1 − · · · − an^(j) xn ) = 0,   for j = 1, . . . , m,   (8e)
xi ( ai^(1) y1 + · · · + ai^(m) ym − ci ) = 0,   for i = 1, . . . , n.   (8f)

Using the Kuhn-Tucker Theorem, in Section 6.2 we proved the following :


Proposition 6.1
* If there exists a solution x* to the LP-problem in (6), then there exists a y so that x* and y
satisfy equations (8a) – (8f).
* If there exists a solution y* to the DLP-problem in (7), then there exists an x so that x and y*
satisfy equations (8a) – (8f).
This proposition shows strong connections between an LP-problem and its dual problem. In
the remainder of this section we further extend these connections.
Corollary 6.2
* If there exists a solution x* to the LP-problem in (6), then D_D is not empty.
* If there exists a solution y* to the DLP-problem in (7), then D_P is not empty.
Proof From Proposition 6.1 we get that if there exists a solution x* to the LP-problem in (6),
then there exists a y so that x* and y satisfy equations (8a) – (8f). In particular we find a y so
that A′ y ≥ c and y ≥ 0m. But that exactly means that we find a y ∈ D_D, hence D_D is not
empty.
The second part is proved in the same way.

Lemma 6.3
(a) If a, b ∈ Rn so that a ≥ 0n and b ≥ 0n, then a · b ≥ 0 ( where this last 0 is the real number
zero ).
(b) If a, b, c ∈ Rn so that a ≥ 0n and b ≥ c, then a · b ≥ a · c.
(c) If M is an n × m matrix with real entries, a ∈ Rn and b ∈ Rm, then a · (M b) = (M′ a) · b.
Proof Part (a) follows immediately, since a ≥ 0n means ai ≥ 0, for all i = 1, . . . , n, and
similarly for b. Writing out the inner product we get : a · b = ∑_{i=1}^{n} ai bi ≥ 0.
The second part is almost equally trivial, since b ≥ c means b − c ≥ 0n, which according to the
first part means a · b − a · c = a · (b − c) ≥ 0, and we are done.
And (c) is easily obtained by writing out what a · (M b) and (M′ a) · b are. If the i, j-entry
of matrix M is Mij, then you will find that a · (M b) = ∑_{i=1}^{n} ∑_{j=1}^{m} ai Mij bj = ∑_{j=1}^{m} ( ∑_{i=1}^{n} Mij ai ) bj =
(M′ a) · b.
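A quick numerical sanity check of part (c) of the lemma, with a concrete matrix of our own choosing and hand-rolled helpers :

```python
# Check a . (M b) = (M' a) . b for a concrete 2 x 3 matrix.
M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]                      # here n = 2 and m = 3

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

a = [1.0, -1.0]                            # a vector in R^n
b = [2.0, 0.0, 1.0]                        # a vector in R^m

lhs = dot(a, matvec(M, b))                 # a . (M b)
rhs = dot(matvec(transpose(M), a), b)      # (M' a) . b
assert lhs == rhs
```

Both sides come out as −9 here; the identity of course holds for any choice of M, a and b.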


Theorem 6.4 ( Weak Duality )
Given an LP-problem in the form according to (6), and its dual in (7). Then for any feasible
point x of the LP-problem and any feasible point y of its dual we have c · x ≤ b · y.
Proof For a feasible point x we know A x ≤ b and x ≥ 0n; and for a feasible point y we
know A′ y ≥ c and y ≥ 0m. By repeatedly using the appropriate parts of Lemma 6.3 we can
deduce : c · x ≤ (A′ y) · x = y · (A x) ≤ y · b.
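A numerical illustration of Weak Duality on a tiny primal/dual pair of our own ( not from the notes ) : maximise x1 + x2 subject to x1 ≤ 1, x2 ≤ 1, x ≥ 0, so A is the identity matrix, b = (1, 1) and c = (1, 1) :

```python
A = [[1.0, 0.0], [0.0, 1.0]]   # our toy instance: A = I, so A' = A
b = [1.0, 1.0]
c = [1.0, 1.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def primal_feasible(x):
    return all(r <= bj for r, bj in zip(matvec(A, x), b)) and all(xi >= 0 for xi in x)

def dual_feasible(y):
    return all(r >= ci for r, ci in zip(matvec(A, y), c)) and all(yi >= 0 for yi in y)

x = [0.5, 0.25]                # an arbitrary feasible primal point
y = [2.0, 1.0]                 # an arbitrary feasible dual point
assert primal_feasible(x) and dual_feasible(y)
assert dot(c, x) <= dot(b, y)  # weak duality: 0.75 <= 3.0
```

Any other choice of feasible points would give the same inequality, which is exactly what Theorem 6.4 asserts.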

Theorem 6.5 ( Strong Duality )
Given an LP-problem in the form according to (6), and its dual in (7).
* If the LP-problem has an optimal solution x*, then there exists an optimal solution y* for
the DLP-problem.
* If the DLP-problem has an optimal solution y*, then there exists an optimal solution x*
for the LP-problem.
* Optimal solutions x* and y* as above satisfy c · x* = b · y*. ( In other words, the maximal
value of the LP-problem is the same as the minimal value of the DLP-problem. )
Proof Suppose x* is an optimal solution of the LP-problem. Then by Proposition 6.1 we
know that there exists y = (y1, . . . , ym) so that

A′ y ≥ c,   (9a)
y ≥ 0m,   (9b)
yj ( bj − a1^(j) x1* − · · · − an^(j) xn* ) = 0,   for j = 1, . . . , m,   (9c)
( ai^(1) y1 + · · · + ai^(m) ym − ci ) xi* = 0,   for i = 1, . . . , n.   (9d)

( These are just equations (8b), (8d), (8e) and (8f). ) As a consequence of (9a) and (9b) we see
that y ∈ D_D. And as a consequence of (9c) we get

y · ( b − A x* ) = ∑_{j=1}^{m} yj ( bj − a1^(j) x1* − · · · − an^(j) xn* ) = 0,

and so y · b = y · (A x*). Similarly, from (9d) we find (A′ y − c) · x* = 0, which gives (A′ y)
· x* = c · x*. Using this, and Lemma 6.3 (c) again, we get

b · y = y · b = y · (A x*) = (A′ y) · x* = c · x*.

But on the other hand, we know that b · z ≥ c · x* for all z ∈ D_D because of the Weak Duality
Theorem 6.4. So the y we are working with is in fact a y ∈ D_D that minimises b · y and which
satisfies b · y = c · x*. Renaming y to y* completes the first and the third part.
Starting from the assumption that there is an optimal solution y* for the DLP-problem, we
can follow a similar reasoning to obtain a proof of the second part.


As a final consequence of the proof of Theorem 6.5 we obtain the following important result.
Theorem 6.6 ( Complementary Slackness for Linear Programming )
Given an LP-problem in the form according to (6), and its dual in (7). If x* is a solution to
the LP-problem and y* is a solution to its dual, then they must satisfy the so-called complementary slackness conditions :

if xi* > 0, then (A′ y*)i = ci;   if (A′ y*)i > ci, then xi* = 0;
if yj* > 0, then (A x*)j = bj;   if (A x*)j < bj, then yj* = 0.

Proof From the proof of Theorem 6.5 we know that for optimal solutions x* and y* we must
have

yj* ( bj − a1^(j) x1* − · · · − an^(j) xn* ) = 0,   for j = 1, . . . , m,
( ai^(1) y1* + · · · + ai^(m) ym* − ci ) xi* = 0,   for i = 1, . . . , n.

So for each j we have that yj* = 0 or bj − a1^(j) x1* − · · · − an^(j) xn* = 0, which can be rewritten as
yj* = 0 or (A x*)j = bj. Similarly, for each i we have xi* = 0 or ai^(1) y1* + · · · + ai^(m) ym* − ci = 0,
which is equivalent to xi* = 0 or (A′ y*)i = ci. The result follows.
Notice that complementary slackness only gives you a conclusion when a constraint is
slack ( i.e., not satisfied with equality ) or when a variable is strictly positive. If you have a
tight constraint, then you cannot conclude that the corresponding variable in the other
problem is positive. For example, if xi* = 0, then it is still possible that
(A′ y*)i = ci as well.
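The conditions of Theorem 6.6 can be verified mechanically at a pair of optima. A sketch on a toy instance of our own : A = I, b = (1, 1), c = (1, 1), whose primal and dual optima are x* = (1, 1) and y* = (1, 1), both with value 2 :

```python
# Complementary slackness check for our toy instance A = I, b = c = (1, 1).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
c = [1.0, 1.0]
x_opt = [1.0, 1.0]
y_opt = [1.0, 1.0]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

Ax = [dot(row, x_opt) for row in A]          # A x*
Aty = [dot(col, y_opt) for col in zip(*A)]   # A' y*

for i in range(2):
    if x_opt[i] > 0:
        assert Aty[i] == c[i]                # dual constraint i must be tight
    if Aty[i] > c[i]:
        assert x_opt[i] == 0
for j in range(2):
    if y_opt[j] > 0:
        assert Ax[j] == b[j]                 # primal constraint j must be tight
    if Ax[j] < b[j]:
        assert y_opt[j] == 0

assert dot(c, x_opt) == dot(b, y_opt)        # strong duality: both values are 2
```

All the implications hold here with every constraint tight, which also illustrates the caveat above : tightness alone tells you nothing about the sign of the corresponding variable.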

6.4 Solving Linear Programming Problems


Since this is a theory course, we won't deal much with actually solving LP-problems.
But a few ideas about why what we've done so far is useful in a more practical context as well
might be interesting for you.

Suppose you are asked to solve a linear programming problem in its most general form (1).
Then the following steps can be useful, although they don't give a full Cookbook Procedure :
1. Use the procedures in Section 6.1 to translate the problem to an LP-problem in standard
form according to (2).
2. If you are left with a problem with dimension n at most 2, then you should be able to
solve the problem graphically : In the x1, x2-plane, sketch the areas corresponding to
the constraints, i.e., find the (x1, x2) satisfying x1, x2 ≥ 0 and a1^(j) x1 + a2^(j) x2 ≤ bj, for j =
1, . . . , m. Also sketch some level sets of the objective function; lines with c1 x1 + c2 x2 = α
for some α. This sketch should give you an idea in which point of the feasible set D_P a
maximum is attained ( if any ).

3. If you are left with a problem with a small number of constraints of the form a1^(j) x1 +
· · · + an^(j) xn ≤ bj, then it may be possible to derive fairly easily that the objective function
has no maximum on the feasible set.
4. If you can't solve the primal LP-problem directly, then formulate the dual LP-problem
according to (5) or (7).
5. Similarly to steps 2 or 3, it may be possible to solve the DLP-problem directly, or to
show it has no solution.
6. If you can conclude in step 5 that the DLP-problem has no solution, then you know that
the primal LP-problem has no solution as well.
7. If in step 5 you were able to find an optimal solution y* of the DLP-problem, then
you can use information about y* in the Complementary Slackness Conditions in Theorem 6.6 to obtain information about an optimal solution x* of the primal LP-problem.
Also, the Strong Duality Theorem gives you an equation c · x* = b · y* about x*. This
information is often enough to find an optimal solution x*.
8. If neither the primal nor the dual LP-problem can be solved directly, or can be shown
to have no solutions, then the last resort is to formulate the equations in (4a) – (4f) and
try to solve these. This is basically the same as solving the Kuhn-Tucker equations in
general, so will often lead to a detailed case analysis. For large n and m this is not
feasible to do by hand, but for smaller dimensions it should be possible.

As a small aside, the procedure above is not really what is done in general for solving big
LP-problems using a computer. ( Where big can mean n equal to a couple of thousand and m
of the order of millions. ) Other techniques are available, but all of these use duality and
complementary slackness to get to a solution as fast as possible.
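For a two-dimensional problem, the graphical approach of step 2 amounts to checking the vertices of the feasible set, where boundary lines of constraints intersect. A brute-force sketch on an instance of our own choosing ( maximise x + y subject to x + 2y ≤ 3, 2x + y ≤ 3, x, y ≥ 0 ) :

```python
from itertools import combinations

# Constraints as rows (a1, a2, e) meaning a1*x + a2*y <= e; the sign
# constraints x >= 0 and y >= 0 are encoded as -x <= 0 and -y <= 0.
rows = [(1.0, 2.0, 3.0), (2.0, 1.0, 3.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
c = (1.0, 1.0)                       # maximise c1*x + c2*y

def intersect(r1, r2):
    (a, b, e), (p, q, f) = r1, r2
    det = a * q - b * p
    if det == 0.0:
        return None                  # parallel boundary lines
    return ((e * q - b * f) / det, (a * f - e * p) / det)

def feasible(pt, tol=1e-9):
    return all(a * pt[0] + b * pt[1] <= e + tol for a, b, e in rows)

vertices = [v for r1, r2 in combinations(rows, 2)
            if (v := intersect(r1, r2)) is not None and feasible(v)]
best = max(vertices, key=lambda v: c[0] * v[0] + c[1] * v[1])
print(best)   # the vertex attaining the maximum
```

Here the feasible vertices are (0, 0), (1.5, 0), (0, 1.5) and (1, 1), and the maximum of x + y over them is 2, attained at (1, 1). For problems of realistic size this enumeration is hopeless, which is one reason the duality-based techniques mentioned above exist.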

Exercises
1 Given is a linear programming problem of the form

Minimise x1 + x2 − x4
for D = { x ∈ R4 | x1 + x2 + 3 x3 + 4 x4 ≤ 18, 3 x1 + 2 x2 + x3 + x4 = 10 }.

(a) Transform this problem into the standard format for a Linear Programming problem
according to (2) on page 1.
(b) Formulate the DLP-problem for this problem.


2 Given is the following LP-problem :

Maximise x + y
subject to x + 2 y ≤ 3,
2 x + y ≤ 2,
5 x ≤ 6,
x, y ≥ 0.

(a) Write this problem as a standard LP-problem, sketch its feasible set, and find an optimal
solution.
(b) Give the dual LP-problem. Describe what the Complementary Slackness Conditions
mean for the primal and dual LP-problem.
(c) Find an optimal solution for the dual LP-problem.

3 Given is the following LP-problem :

Maximise 60 x1 + 84 x2 + 72 x3
subject to 3 x1 + 7 x2 + 3 x3 ≤ 30,
2 x1 + 2 x2 + 3 x3 ≤ 12,
x1, x2, x3 ≥ 0.

(a) Give the dual LP-problem, sketch the feasible set of the DLP-problem, and find an optimal solution for the DLP-problem.
(b) Describe what the Complementary Slackness Conditions mean for the primal and dual
LP-problem.
(c) Find an optimal solution for the original LP-problem.

4 Consider a linear programming problem in standard form :

maximise c · x
subject to A x ≤ b,
x ≥ 0n.

It is known that the feasible set D_P of this problem is not empty and that c ≤ 0n, c ≠ 0n.
(a) Explain carefully why you can conclude that an optimal solution of the LP-problem
must exist.
Now suppose that, in addition, for this specific LP-problem we have that every point in the
feasible set is an optimal solution.
(b) Is it true that we also must have that every point in the feasible set D_D of the dual LP-problem is an optimal solution of the dual LP-problem ? ( Justify your answer, either by
proving the statement, or by giving a counterexample. )


5 Given is the following LP-problem :

Maximise 3 x1 − x2 − x3
subject to x1 − x2 + x3 ≤ 1,
x1 + x2 − x3 ≤ 1,
x1, x2, x3 ≥ 0.

(a) Give the dual LP-problem and sketch the feasible set of the DLP-problem.
(b) Show that the DLP-problem has more than one optimal solution.
(c) Use the Complementary Slackness Conditions to find an optimal solution for the primal
LP-problem, and show this solution is unique.
(d) Show that if we interpret the LP-problem as an inequality constrained optimisation
problem in Kuhn-Tucker's Theorem, then the Constraint Qualification would fail in the
optimum found in (c).

Optimisation Theory
MA 208

2007/08

Notes 7
Digraphs and Networks
Shortest Paths
Order of Functions
Algorithms and their Analysis
Shortest Path Algorithms
We now start the combinatorial optimisation part of the course. The Sundaram book
doesn't cover this, so instead we switch to the Biggs book : N.L. Biggs, Discrete Mathematics ( 2nd edition ), Oxford University Press (2002), ISBN 0-19-850717-8. The sections from
this book that are most relevant for us are : 14.1 – 14.7, 15.1, 15.4, 16.6 and 18.1 – 18.4.
For most of the remainder, we will be looking at graphs as our object of study ( well,
directed graphs actually ). If you want to read more on graph theory, then almost any book
with the words introduction, graph and theory in the title should do. An excellent
source is also the book Graph Theory with Applications, by J.A. Bondy and U.S.R. Murty, North
Holland (1976). This book is out of print ( and has been out of print for ages ). But the full text
is available online for personal use. You can find it via www.ecp6.jussieu.fr/pageperso/bondy/books/gtwa/gtwa.html. ( Bondy and Murty recently published a new book on graph
theory; that is a far more advanced book, and not a 2nd edition of the book mentioned
above. )
7.1 Graphs, digraphs and networks
The definition of a graph can be found in Section 15.1 of the Biggs book. But we will mostly
be interested in digraphs ( or directed graphs ) defined in Section 18.1 of the book.

For us a digraph D = (V, A) consists of a finite set V, called the vertices ( singular vertex ),
and a collection A of ordered pairs of different elements from V. The elements in A are called arcs
( sometimes also called directed edges ).
Since we don't allow pairs (u, v) with u = v, we don't allow what the book calls loops.
We also are not allowed to take the same pair twice. ( Certain authors happily allow that and
call it parallel arcs. ) But we do allow that both arcs (u, v) and (v, u) are present in A.
We think of an arc (u, v) as a line connecting u to v, with a direction from u to v. Hence we
often talk about the arc from u to v. We call u the tail and v the head of the arc (u, v).

A network (D, w) is a digraph D = (V, A) with a weight function w : A → Z on the arcs.
( The book only allows non-negative weights, but allowing negative weights makes life so
much more interesting. )
You might wonder why we concentrate on digraphs, and don't consider undirected graphs.
The reason is that for most of the problems we are interested in, if we understand the problems for digraphs, then we can fairly easily deduce what is happening for graphs. The other
way round, going from digraphs to graphs, is not always that easy.
For most of the problems on graphs we might be interested in, we can use the following
transformation to digraphs : replace each undirected edge {u, v} by two arcs (u, v) and (v, u).
So we would do the following for the simple graph below :
( Figure : a single undirected edge between u and v becomes the two arcs (u, v) and (v, u). )
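In code, this transformation is a one-liner; the following sketch ( ours, not from the book ) represents the arc set A as a set of ordered pairs :

```python
def to_digraph(edges):
    """Replace each undirected edge {u, v} by the two arcs (u, v) and (v, u)."""
    arcs = set()
    for u, v in edges:
        arcs.add((u, v))
        arcs.add((v, u))
    return arcs

assert to_digraph([("u", "v")]) == {("u", "v"), ("v", "u")}
```

Note that the map is not invertible : the digraph with only the arc ("u", "v") and the one with both arcs produce the same undirected edge, which is the loss of information discussed below.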

On the other hand, it's not clear how we would go from digraphs to graphs without losing
some information :
( Figure : a digraph with just the single arc (u, v); replacing the arc by an undirected edge between u and v loses the direction information. )

7.2 Walks, paths, cycles, strong connectivity

Let D = (V, A) be a digraph. A directed walk is a sequence of vertices v1, v2, . . . , vk so that
every two consecutive vertices form an arc : (vi, vi+1) ∈ A for all i = 1, 2, . . . , k − 1. If we
want to specify the beginning and end of the walk, we use the term directed v1, vk-walk or
directed walk from v1 to vk.
A directed walk v1, v2, . . . , vk in which the first and last vertex are the same is called a directed
tour. Note that if u, v are vertices so that both (u, v) and (v, u) are arcs, then u, v, u is an
allowed directed tour.
A directed walk v1, v2, . . . , vk in which all the vertices are distinct is a directed path or a directed
v1, vk-path.
A directed walk v1, v2, . . . , vk in which all vertices are distinct, except that v1 = vk, is a directed
cycle. Again, if u, v are vertices so that both (u, v) and (v, u) are arcs, then u, v, u is an allowed
directed cycle.

Since we are always assuming we are working in a digraph, we often omit the "directed" in
the names above, and just talk about walk, u, v-walk, tour, path, u, v-path, cycle.
Note that for a pair of vertices u, v, there may be infinitely many u, v-walks. On the other
hand, since in a path no vertex can appear more than once, the number of u, v-paths is always
finite.
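Since a path never repeats a vertex, all u, v-paths can in principle be listed by a depth-first search. As a small illustration ( the function and the example digraph are my own, not part of the notes ) :

```python
# Enumerate all directed u,v-paths in a digraph given as a set of arcs.
# Because a path never repeats a vertex, the recursion always terminates,
# which is exactly why the number of u,v-paths is finite.
def all_paths(arcs, u, v, visited=None):
    visited = visited or [u]
    if u == v:
        return [visited]
    paths = []
    for (x, y) in arcs:
        if x == u and y not in visited:
            paths.extend(all_paths(arcs, y, v, visited + [y]))
    return paths

arcs = {("s", "a"), ("a", "t"), ("s", "t"), ("t", "s")}
print(all_paths(arcs, "s", "t"))  # the two s,t-paths
```

Note that the same digraph has infinitely many s, t-walks ( going round s, t, s, t, . . . as often as we like ), but only two s, t-paths.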


A digraph D = (V, A) is strongly connected if for every two distinct vertices u, v there is a walk
from u to v.
Property 7.1
A digraph D = (V, A) is strongly connected if and only if for every two distinct vertices u, v there is
a path from u to v ( exercise ).

7.3 Combinatorial optimisation and algorithms


Below we start considering some problems from combinatorial optimisation. These can be
formulated in the traditional way as "maximise or minimise f : D → R for some set D".
But the structure of the function f and the set D usually means that such a description would
not really be enlightening. So we usually formulate the problems more informally ( but not
less precisely ! ) as "find the shortest path", "find the longest cycle", etc.

Another phenomenon is that the existence of the optima is usually guaranteed. Here's an
easy argument to prove this.
Let D be a finite, non-empty, set, and f : D → R be any function on D. Consider the set
{ f ( a) | a ∈ D }. This is a finite, non-empty, set of numbers from R. Hence this set has a
maximum and a minimum ( you will be asked to prove this formally in an exercise ), and
this maximum/minimum is the maximum/minimum of f on D.

So for most questions in combinatorial optimisation we are not concerned about proving
the existence of a maximum or minimum, but about an efficient way of finding them. We
usually do this by describing an algorithm to find the maximum or minimum, proving that
the algorithm will indeed find the solution we want, and discussing how long the algorithm
will take to find the solution.
Just as in Section 14.1 of the Biggs book, we won't define precisely what we mean by an
algorithm. We just stick to the same description as there : "An algorithm is a sequence
of instructions. Each instruction must be carried out, in its proper place, by the person or
machine for whom the algorithm is intended."

We will usually write an algorithm in some kind of "pseudo computer language", being
precise if we can be, and using more general language if that is more convenient.
So for instance, the following could be an algorithm to find the minimum in a finite set S of
real numbers :
1. make the set A equal to the set S;
2. take an element a from A;
3. set m = a and remove a from A;
4. as long as A is not empty :
5. { take an element a from A;
6. if a < m, then replace m by a;
7. remove a from A };
8. declare the result to be m

We will see later how we can prove that such an algorithm gives the correct answer.
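As an illustration only, the eight steps above translate almost line by line into Python ( a sketch, not part of the notes ) :

```python
def find_minimum(S):
    """Find the minimum of a finite non-empty set S, following the
    eight-step algorithm: keep a candidate m and shrink A until empty."""
    A = set(S)            # 1. make the set A equal to the set S
    a = A.pop()           # 2. take an element a from A
    m = a                 # 3. set m = a (pop already removed a from A)
    while A:              # 4. as long as A is not empty
        a = A.pop()       # 5. take an element a from A
        if a < m:         # 6. if a < m, then replace m by a
            m = a         # 7. (the removal was again done by pop)
    return m              # 8. declare the result to be m

print(find_minimum({3, 1, 4, 1, 5, 9, 2, 6}))  # 1
```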


We will allow ourselves to describe our algorithms in fairly informal language : "remove a
from A", "calculate x + y", "take the maximum of x1 , . . . , xk ", etc. To prevent ourselves
from both writing one-line algorithms ( like "solve the problem stated" ), and from writing
detailed descriptions involving a large number of small but simple steps, we assume the
following operations are always allowed as building blocks :
- arithmetic operations involving integers;
- operations on sets ( inclusion, difference, union, adding/removing a single element );
- comparing integers ( i.e., checking if a < b or not );
- assigning values to new variables ( set a = 2·b, provided b has a value at that moment );
- checking if a ∈ S or not, for some element a and set S;
- checking if a set is empty or not.

7.4 Shortest walks in networks

When we have a network ( D, w) and W = v1 , . . . , vk is a walk in D, then the weight w(W ) of
the walk is just the sum of the weights of the arcs of that walk :

w(W ) = Σ_{i=1}^{k−1} w(vi , vi+1 ).

The weight of tours, paths, cycles, etc., is defined analogously.
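In code, this weight is a single sum over consecutive pairs of vertices; a minimal sketch, assuming the weight function is stored as a dictionary mapping arcs to numbers ( the example network is my own ) :

```python
def walk_weight(w, walk):
    # w(W) = sum of w(v_i, v_{i+1}) for i = 1, ..., k-1
    return sum(w[(walk[i], walk[i + 1])] for i in range(len(walk) - 1))

# A hypothetical three-vertex network with one negative arc.
w = {("s", "a"): 4, ("a", "t"): -4, ("s", "t"): 2}
print(walk_weight(w, ["s", "a", "t"]))  # 0
print(walk_weight(w, ["s", "t"]))       # 2
```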


In the remainder of these notes we will be considering the problem : "Given a network ( D, w)
and two distinct vertices s, t, what is the s, t-walk of minimum weight ?" Instead of "walk of
minimum weight", we will usually use the term shortest walk.

Let u, v be two vertices in a network ( D, w). Let Y (u, v) be the set of all u, v-walks in D, and
set Z (u, v) = { w(W ) | W ∈ Y (u, v) }.
Now we define the distance from u to v, denoted dist(u, v), as follows :
If there is no u, v-walk ( Y (u, v) and Z (u, v) are empty ), then we set dist(u, v) = +∞.
If Z (u, v) has no lower bound, then we set dist(u, v) = −∞.
In all other cases, dist(u, v) is the weight of a shortest u, v-walk, the minimum of Z (u, v).
The fact that in the last case we can take the minimum of Z (u, v), and not the infimum, requires
some proof. This will be done in the lectures.
Life is a lot easier if we assume the digraph D is strongly connected and if none of the
weights w( a) is negative ( see exercises ).

In the definition of distance above we introduced +∞ and −∞. But we should not start
using them just as if they are numbers. They are meant to indicate certain concepts ( in
particular as a shorthand for two different reasons why there is no shortest walk ).
Property 7.2
Let ( D, w) be a network. There exist two vertices u, v with dist(u, v) = −∞ if and only if there exists
a cycle C in D with w(C ) < 0.


Property 7.3
Let u, v be two vertices in a network ( D, w).
If dist(u, v) ≠ ±∞, then there is a u, v-path P so that dist(u, v) = w( P).
If w( a) ≥ 0 for all arcs a and there is a u, v-walk in D, then there is a u, v-path P so that
dist(u, v) = w( P).
Because of Property 7.3, for the case that all weights are non-negative, we could also define
dist(u, v) as the weight of a shortest path from u to v, if such a path exists.
Property 7.4
If u, v, w are three vertices in a network ( D, w) so that dist(u, v) ≠ −∞, dist(v, w) ≠ −∞ and
dist(u, w) ≠ −∞, then dist(u, w) ≤ dist(u, v) + dist(v, w).
Property 7.5
Let ( D, w) be a network. Then for all v ∈ V we have either dist(v, v) = 0 or dist(v, v) = −∞.
Most of the properties will be proved in the lectures.

7.5 Dijkstras Algorithm


We now describe our first algorithm for finding shortest walks in networks. This algorithms
is called Dijkstras Algorithm, after one of the persons who described this algorithm1 . It is only
guaranteed to work if there are no negative weights. So we assume were given a network
( D, w) where D = (V, A) is a digraph and w : A Z+ = {0, 1, 2, . . .} is a weight function.
We are also given two vertices s, t and want to find out dist(s, t).
1. colour all vertices black;
2. colour s white and set d(s) = 0;
3. for all v ∈ V with (s, v) ∈ A : colour v grey and set d(v) = w(s, v);
4. as long as there are grey vertices :
5. { let u be the grey vertex with d(u) minimal;
6. colour u white;
7. for all v ∈ V with (u, v) ∈ A :
8. { if v is black : colour v grey and set d(v) = d(u) + w(u, v);
9. if v is grey and d(u) + w(u, v) < d(v) : replace d(v) by d(u) + w(u, v)
10. };
11. };
12. for all black vertices v : set d(v) = +∞;
13. declare dist(s, t) to be d(t)

In the lectures we will prove why this algorithm gives the right results.
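A direct Python transcription of the thirteen steps might look as follows ( a sketch; the black/grey/white colouring follows the pseudocode above, and the example network is my own ) :

```python
import math

def dijkstra(V, w, s):
    """Dijkstra's Algorithm with the black/grey/white colouring of the notes.
    V is the set of vertices; w maps arcs (u, v) to non-negative weights."""
    colour = {v: "black" for v in V}        # 1. colour all vertices black
    d = {}
    colour[s] = "white"                     # 2. colour s white, set d(s) = 0
    d[s] = 0
    for (x, v) in w:                        # 3. neighbours of s become grey
        if x == s:
            colour[v] = "grey"
            d[v] = w[(s, v)]
    while any(c == "grey" for c in colour.values()):               # 4.
        u = min((v for v in V if colour[v] == "grey"),
                key=lambda v: d[v])         # 5. grey vertex with d(u) minimal
        colour[u] = "white"                 # 6. colour u white
        for (x, v) in w:                    # 7. all arcs with tail u
            if x == u:
                if colour[v] == "black":    # 8. black neighbours become grey
                    colour[v] = "grey"
                    d[v] = d[u] + w[(u, v)]
                elif colour[v] == "grey" and d[u] + w[(u, v)] < d[v]:
                    d[v] = d[u] + w[(u, v)] # 9. improve grey neighbours
    for v in V:                             # 12. unreached vertices
        if colour[v] == "black":
            d[v] = math.inf
    return d                                # 13. d[t] is dist(s, t)

V = {"s", "a", "b", "t"}
w = {("s", "a"): 1, ("s", "b"): 4, ("a", "b"): 2, ("b", "t"): 1}
print(dijkstra(V, w, "s"))
```

On this small example the algorithm returns d(s) = 0, d(a) = 1, d(b) = 3 and d(t) = 4.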

¹ Actually, the first known description of the algorithm is in a little-known report by Leyzorek, Gray, Johnson,
Ladew, Meaker, Petry & Seitz from 1957; Dijkstra's publication is from 1959. You can imagine why the name
Dijkstra's Algorithm stayed more popular than the name Leyzorek-Gray-Johnson-Ladew-Meaker-Petry-Seitz
Algorithm, even after the historical inaccuracy became known.


Note that in fact Dijkstra's Algorithm finds dist(s, v) for every vertex v ∈ V. This is a feature
of most algorithms to find shortest paths. That is why these algorithms are often known as
Single Source Shortest Path algorithms ( where the source is the single vertex s that forms the
base of all the distances ). And we might as well replace the final line of the algorithm by

13. for all v ∈ V : declare dist(s, v) to be d(v)

Since all the algorithms we look at have the property that they determine dist(s, v) for one
source s ∈ V and all vertices v ∈ V, from now on we expect only the source s to be given
( and not s and t ).
Of course, in the case of Dijkstra's Algorithm, we could stop earlier, as soon as dist(s, t) is
determined ( that happens when t is coloured white ).
We have a look at how efficient Dijkstra's Algorithm is in the next session.

Dijkstra's Algorithm might not work if there are arcs a with negative weight. An example
where this happens is the following small network, with weights beside the arcs :

[ Figure : vertices s, u and t, with an arc (s, u) of weight 4, an arc (u, t) of weight −4, and an
arc (s, t) of weight 2. ]

Dijkstra's Algorithm will give dist(s, t) = 2, while the correct distance is dist(s, t) = 0.

7.6 Order of functions and number of operations in an algorithm

Let f , g be two functions f , g : N → R+ . We say that f (n) = O( g(n)) if there exist
constants C1 , C2 so that f (n) ≤ C1 + C2 ·g(n) for all n ∈ N.
The expression f (n) = O( g(n)) is usually pronounced as " f (n) is big-oh g(n)". It is a
precise expression of the idea " f (n) is at most the same order as g(n)".

The big-oh notation comes with its own kind of arithmetic. For instance, the following
statements are easy to prove :
If f1 (n) = O( g1 (n)) and f2 (n) = O( g2 (n)), then f1 (n) + f2 (n) = O( g1 (n) + g2 (n)).
We often write this as O( g1 (n)) + O( g2 (n)) = O( g1 (n) + g2 (n)).
If f1 (n) = O( g1 (n)) and f2 (n) = O( g2 (n)), then f1 (n)·f2 (n) = O( g1 (n)·g2 (n)).
We often write this as O( g1 (n))·O( g2 (n)) = O( g1 (n)·g2 (n)).
If f1 (n) = O( g(n)) and f2 (n) = O( g(n)), then f1 (n) + f2 (n) = O( g(n)).
We often write this as O( g(n)) + O( g(n)) = O( g(n)).
If f (n) = O( g(n)) and g(n) = O(h(n)), then f (n) = O(h(n)).

We have another look at Dijkstra's Algorithm, and ask ourselves how long it would take to
solve a particular problem ( i.e., if we start with a network ( D, w) and a source s, how long
before the algorithm finishes ). Since it doesn't seem to make much sense to talk about "how
long" in time, we assume the question is supposed to be "how many steps will it do ?".


So now we have to decide what we mean by "number of steps". An obvious attempt to
answer that is by saying we mean the number of operations, where an operation is one of
the operations described in Section 7.3.
But that still doesn't answer the question how many operations we need for, for instance,
"if v is grey and d(u) + w(u, v) < d(v) : replace d(v) by d(u) + w(u, v)". Is
checking "if v is grey" one operation ? And how many operations would it be to check if
d(u) + w(u, v) < d(v) ( there is one addition and one comparison ) ? And even if we decided
how many operations it takes to check "v is grey and d(u) + w(u, v) < d(v)", then
depending on the outcome of that check we do something or we don't. So the number of
operations to do "if v is grey and d(u) + w(u, v) < d(v) : replace d(v) by d(u) + w(u, v)" is
not constant, but depends on the values of d(u), d(v) and w(u, v) !
In order to overcome most of these problems, we follow the following conventions,
which are more or less standard in the analysis of algorithms :
- All simple operations from Section 7.3 always take some constant number of steps.
- We always assume the worst case when considering conditional behaviour.
- For the final answer for the number of operations we are only interested in the order of
that number, expressed in the essential information that was given as input.

As an example, consider the simple algorithm to find the minimum of a finite set S in Section 7.3. Recall that |S| denotes the number of elements in S.
Line 1, copying the set S, is |S| operations. Line 2 is 1 operation, while line 3 is 2 operations.
The check in line 4 is one operation, but we may have to do this check possibly many times.
In fact, doing lines 4–7 once takes at most some small constant number a of operations. And
each time we encounter those lines, the set A becomes one smaller. So we might have to
do those steps |S| times before the set A is empty. Hence lines 4–7 in total may take a·|S|
operations. And then there is the final operation in line 8.
So in total we might have up to |S| + 1 + 2 + a·|S| + 1 = 4 + ( a + 1)·|S| operations, where a is
some small constant. We express this as saying that the algorithm in Section 7.3 for finding
the minimum of a set S takes O(|S|) operations.
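To make this accounting concrete, here is an instrumented version of the minimum-finding algorithm that counts the charged operations; the exact charge per line is my own illustrative choice, and only the linear order of growth matters :

```python
def count_min_operations(S):
    """Instrumented Section 7.3 minimum-finding algorithm: returns the
    minimum of S together with a count of charged elementary operations."""
    ops = len(S)              # line 1: copying S costs |S| operations
    A = set(S)
    a = A.pop()               # line 2: take an element a from A
    ops += 1
    m = a                     # line 3: one assignment plus one removal
    ops += 2
    while True:
        ops += 1              # line 4: the emptiness check
        if not A:
            break
        a = A.pop()           # line 5: take an element a from A
        ops += 1
        ops += 1              # line 6: the comparison a < m ...
        if a < m:
            m = a
            ops += 1          # ... plus the replacement, in the worst case
        ops += 1              # line 7: remove a from A (done by pop above)
    ops += 1                  # line 8: declare the result
    return m, ops

for n in (10, 100, 1000):
    m, ops = count_min_operations(set(range(n)))
    print(n, ops)             # the count grows linearly in n
```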

7.7 The number of operations in Dijkstras Algorithm

We can do a similar analysis of Dijkstra's Algorithm. Here the essential input is the digraph
D = (V, A) with |V | vertices and | A| arcs, the | A| weights w( a) on the arcs, and the two
vertices s and t.
First observe that we can do lines 1–3 in O(|V |) operations.
Now the lines 4–11 are repeated as long as we have grey vertices. Since every time we do
these lines, we recolour one of the grey vertices to white, and a white vertex never changes
colour, the number of times we have to do lines 4–11 is at most the number of vertices |V |.
The check in line 4 we've assumed is just one operation.
In line 5 we need to find the minimum of a finite set, and the size of that set is the number of
grey vertices at that moment. We don't really know how large that set is, and the best we can
say is that we never have more than |V | grey vertices. Using our minimisation procedure


from Section 7.3, this means that one run of line 5 takes O(|V |) operations. And since we
might have to do this up to |V | times, the total number of operations is O(|V |²).
Line 6 is one operation, which we have to do O(|V |) times.
In lines 7–10, we do something with all arcs that have the chosen vertex u as their tail. Each
vertex appears at most once as a chosen vertex u in those lines ( since u becomes white at
this point, and white vertices are never recoloured and never chosen in line 5 ). Lines 8–10
all take a constant number of operations. So the number of operations required in total in all
the times we perform lines 7–10 is at most a constant times the number of arcs. That leads
to an estimate of O(| A|) operations.
Once we've reached line 12, we only have to worry about the black vertices. We can't predict how many there will be, but it's certainly not more than |V |. So line 12 takes O(|V |)
operations.
And, finally, line 13 is one operation.
Putting it all together, and applying the arithmetic of big-oh, we see that we need at most
O(|V |) + O(|V |²) + O(| A|) = O(|V |² + | A|) operations. In fact, since | A| ≤ |V |·(|V | − 1)
( exercise ), we can write O(|V |² + | A|) = O(|V |²).
So we can conclude :
Dijkstra's Algorithm requires O(|V |²) operations to find the distance dist(s, t) of two vertices s, t in
a network ( D, w) with |V | vertices.

It is possible to organise the way we deal with grey vertices and the way we manipulate
the values d(v) for the grey vertices more carefully, in such a way that the total number of
operations required to find all the minima in line 5 is O(|V | ln |V |) instead of O(|V |²). Hence
that version of Dijkstra's Algorithm would use O(| A| + |V | ln |V |) operations.
So is Dijkstra's Algorithm any good ? In particular, is it more efficient than just finding all
paths ( or walks ) and checking which one has the lowest weight ? Well, in Exercise 10 you
will be asked to prove that there exist graphs with (n + 1)² vertices that have more than 2ⁿ
different paths between two particular vertices. So checking all paths for those graphs would
involve at least 2^(√|V| − 1) operations ( and that is ignoring the number of operations involved
in finding all paths and calculating the weight of each path ).
Since 2^(√|V| − 1) grows much, much faster than |V |², we can indeed say that Dijkstra's Algorithm is much more efficient than the brute force approach of finding all paths.

7.8 The Bellman-Ford Algorithm

As noticed already, Dijkstra's Algorithm is not guaranteed to work if there are negative
weights in the network. ( It might give the right answer, but we can't rely on it. ) The algorithm in this section, generally known as the Bellman-Ford Algorithm², can be used whether
or not there are negative weights.
² Regarding the names attached to this algorithm, we are in a similar situation as we were with Dijkstra's
Algorithm. The first known published version was by Shimbel from 1955; it was rediscovered by Moore, and
by Woodbury & Dantzig, both in 1957; and then by Bellman in 1958. Since Bellman in his description used some
ideas from a publication of Ford from 1956, the name Ford became attached to the algorithm as well.


Here is the algorithm. Again, we assume we have been given a network ( D, w) where D =
(V, A) is a digraph and w : A → Z is a weight function. We are also given a source vertex s
and want to find dist(s, v) for all v ∈ V.
1. set d(s) = 0;
2. for all v ∈ V with (s, v) ∈ A : set d(v) = w(s, v);
3. for all v ∈ V, v ≠ s, with (s, v) ∉ A : set d(v) = +∞;
4. repeat |V | times :
5. { for all arcs (u, v) ∈ A :
6. { if d(u) ≠ +∞ :
7. { if d(v) = +∞ : set d(v) = d(u) + w(u, v);
8. if d(v) ≠ +∞ and d(v) > d(u) + w(u, v) : set d(v) = d(u) + w(u, v)
9. };
10. };
11. };
12. for all arcs (u, v) ∈ A :
13. { if d(u) ≠ +∞ and d(v) > d(u) + w(u, v) :
14. declare "Negative Cycle !" and STOP IMMEDIATELY;
15. };
16. for all v ∈ V : declare dist(s, v) to be d(v)

We will prove the correctness of the Bellman-Ford Algorithm in the lectures. A few further
comments about the algorithm are in place as well.
First notice that this algorithm doesn't treat t specially either. If there is no v ∈ V with
dist(s, v) = −∞, then the algorithm will find dist(s, v) for all v ∈ V. If there are certain
vertices v for which we have dist(s, v) = −∞, then it will return this fact by giving "Negative
Cycle !". This means that the algorithm discovered that there is a cycle with negative weight
and that there is a walk from s to a vertex on that cycle. It then immediately follows that
dist(s, v) = −∞ for all vertices v on the cycle.
Of course, it might be possible that there are some vertices v ∈ V with dist(s, v) = −∞,
but that t is not one of them. And hence the algorithm might give the outcome "Negative
Cycle !", although it also could have determined dist(s, t). There are ways to overcome this
problem, but we won't go into them here.
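As a sketch, the sixteen steps can be rendered in Python as follows; the two updates in lines 7 and 8 are merged into one relaxation, which has the same effect because comparisons with Python's math.inf behave like comparisons with +∞ ( the example network is my own ) :

```python
import math

def bellman_ford(V, w, s):
    """Bellman-Ford as in the notes: returns the distances from s,
    or raises ValueError if a reachable negative cycle is detected.
    V is the set of vertices; w maps arcs (u, v) to (possibly negative) weights."""
    d = {v: math.inf for v in V}                  # line 3: unreached is +inf
    d[s] = 0                                      # line 1
    for (u, v) in w:                              # line 2: neighbours of s
        if u == s:
            d[v] = w[(u, v)]
    for _ in range(len(V)):                       # line 4: repeat |V| times
        for (u, v) in w:                          # lines 5-10: relax every arc
            if d[u] != math.inf and d[u] + w[(u, v)] < d[v]:
                d[v] = d[u] + w[(u, v)]
    for (u, v) in w:                              # lines 12-15: still relaxable?
        if d[u] != math.inf and d[u] + w[(u, v)] < d[v]:
            raise ValueError("Negative Cycle !")
    return d                                      # line 16

V = {"s", "a", "t"}
w = {("s", "a"): 4, ("a", "t"): -4, ("s", "t"): 2}
print(bellman_ford(V, w, "s"))
```

On this example, where Dijkstra's Algorithm would return 2 for dist(s, t), the Bellman-Ford Algorithm correctly returns 0.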

It is possible to rewrite the whole of lines 6–10 as

set d(v) = min{ d(v), d(u) + w(u, v) },

provided we agree on the convention that for any number a we have (+∞) + a = +∞,
min{+∞, a} = a, and that min{+∞, +∞} = +∞.

Finally, how many operations do we need for the Bellman-Ford Algorithm ? Looking through
the algorithm, it should be obvious that most of the work is done in lines 4–11. In fact,
the whole of lines 5–11 is done |V | times. And then for each time we do those lines, we
need to do | A| times the operations in lines 6–10. So the work in lines 4–11 requires
O(|V |·| A|) operations.


None of the other steps requires more operations than that, hence we can say :
The Bellman-Ford Algorithm requires O(|V |·| A|) operations to find a negative cycle or the distances
dist(s, v) for all v ∈ V, in a network ( D, w) with |V | vertices and | A| arcs.

7.9 All-Pairs Shortest Paths and the Floyd-Warshall Algorithm


In this section we will look at algorithms that try to find dist(s, t) for all pairs of vertices s, t.
We saw already that both Dijkstra's Algorithm and the Bellman-Ford Algorithm in fact determine dist(s, v) for a given s and all v ∈ V ( for the specific problems for which the algorithms
can be applied ). So we could just use these algorithms |V | times, once for each possible
source s ∈ V.
With our version of Dijkstra's Algorithm, this would take O(|V |³) operations; while the
Bellman-Ford Algorithm would use O(|V |²·| A|) operations to find the distance between all
pairs. It is actually hard to beat Dijkstra's Algorithm in this case if there are no negative arc
weights. But if there are negative arc weights, but no negative cycles, then we actually
can do better than the O(|V |²·| A|) operations that repeated application of the Bellman-Ford
Algorithm would give us.

For the remainder of the section we assume that there are no negative cycles, but there might
still be arcs with negative weight. The following algorithm, known as the Floyd-Warshall
Algorithm³, will find dist(s, t) for all vertices s, t. It works as long as there are no negative
cycles, but allows negative weight arcs.
Here is the algorithm. Again, we assume we have been given a network ( D, w) where D =
(V, A) is a digraph and w : A → Z is a weight function. For this algorithm we assume that
the vertices are numbered from 1 to |V |.
1. for all (u, v) ∈ A : set d(u, v; 0) = w(u, v);
2. for all u, v ∈ V with (u, v) ∉ A : set d(u, v; 0) = +∞;
3. set k = 1;
4. as long as k ≤ |V | :
5. { for all u, v ∈ V :
6. { if d(u, v; k − 1) ≤ d(u, k; k − 1) + d(k, v; k − 1) : set d(u, v; k) = d(u, v; k − 1);
7. if d(u, k; k − 1) + d(k, v; k − 1) < d(u, v; k − 1) :
8. set d(u, v; k) = d(u, k; k − 1) + d(k, v; k − 1)
9. };
10. set k = k + 1
11. };
12. for all s, t ∈ V : declare dist(s, t) to be d(s, t; |V |)

The ideas behind this algorithm, and the proof of its correctness, will be discussed in the
lecture.
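A Python sketch of the algorithm follows; it keeps a single distance table instead of one table per k, a standard space-saving variant that produces the same final values ( the example network is my own ) :

```python
import math

def floyd_warshall(n, w):
    """Floyd-Warshall as in the notes, with vertices numbered 1..n and
    w a dict mapping arcs (u, v) to weights; assumes no negative cycles.
    Unreachable pairs keep the value +infinity, matching line 2."""
    d = {(u, v): math.inf
         for u in range(1, n + 1) for v in range(1, n + 1)}
    for (u, v), weight in w.items():          # lines 1-2: initialise from arcs
        d[(u, v)] = weight
    for k in range(1, n + 1):                 # lines 4-11
        for u in range(1, n + 1):
            for v in range(1, n + 1):
                # keep d(u,v), or route through k, whichever is smaller;
                # inf + inf == inf, so the +infinity convention holds
                if d[(u, k)] + d[(k, v)] < d[(u, v)]:
                    d[(u, v)] = d[(u, k)] + d[(k, v)]
    return d                                  # line 12

w = {(1, 2): 4, (2, 3): -4, (1, 3): 2}
d = floyd_warshall(3, w)
print(d[(1, 3)])  # 0
```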

³ It's getting boring, but once again we have an example where the persons after whom it is named published
their work later ( in two different papers from 1962 ) than the oldest known appearance of the algorithm ( by Roy
in 1959 ).


Here we need to give some words of care about lines 6–8. Some of the d(i, j; k − 1) in the
checking of d(u, v; k − 1) ≤ d(u, k; k − 1) + d(k, v; k − 1) may be equal to +∞. So we again use
the convention that (+∞) + (+∞) = +∞, and for any number a we agree (+∞) + a = +∞
and a < +∞.

Once again, we have a quick look at the number of operations needed in the Floyd-Warshall
Algorithm. It shouldn't be too hard to convince yourself that most of the work is done in
lines 4–11. The number of times we go through those lines is |V | : it takes that long to get
from k = 1 to a value of k larger than |V |, since we increase k by one every time. And then for
every k, we go through lines 6–9 for all vertex pairs u, v. Hence lines 6–9 are encountered
|V |² times for each value of k. Since the operations in lines 6–9 require some small constant
number of operations, we see that the total number of operations when performing all of
lines 4–11 is O(|V |·|V |²) = O(|V |³).
All the other steps require significantly less than that number of operations. We summarise :
The Floyd-Warshall Algorithm requires O(|V |³) operations to find the distances dist(s, t) for all
vertex pairs s, t in a network ( D, w) without negative cycles ( where |V | is the number of vertices in
the digraph D ).
In particular we see that the number of operations of the Floyd-Warshall Algorithm is of
the same order as the number of operations of repeating Dijkstra's Algorithm for all starting
vertices s. But Floyd-Warshall can be used if there are negative weights, whereas Dijkstra's
Algorithm can't be trusted in these situations.

Suitable exercises from the Biggs book related to these notes


Section 14.1 : Questions 1, 2 and 3.
Section 14.6 : Questions 1, 2, 3 and 4.
Section 18.1 : Questions 1 and 4.

Extra Exercises

1  Give a formal proof of the following claim : If A is a non-empty finite set of real numbers, then A
contains a maximum, i.e., there is an a* ∈ A so that a ≤ a* for all a ∈ A. ( Hint : use induction
on the number of elements of A. )

2  Let u, v, w be three distinct vertices in a digraph D. Prove that if there is a path from u to v
and a path from v to w, then there is a path from u to w. ( Note : we can't just take a u, v-path
and a v, w-path and put them in a row. ( Why not ? ) )

3  Prove Property 7.1. I.e., prove that it wouldn't matter if we replaced "walks" by "paths" in the
definition of strongly connected.


4  Let u, v be two distinct vertices in a strongly connected digraph.
(a) Prove that if there is an arc (u, v), then there is a directed cycle containing both u and v.
(b) Decide whether or not we can always reach the same conclusion if (u, v) is not an arc.

5  Suppose that ( D, w) is a network so that D is strongly connected.
(a) Assume w( a) ≥ 0 for all arcs a. Prove that this means that dist(u, v) ≠ ±∞ for all
pairs u, v.
(b) Now assume D contains a tour T with negative weight, w( T ) < 0. Prove that this means
that dist(u, v) = −∞ for all pairs u, v ( even if u = v ).

6  The big-oh notation is defined in Section 14.6 of the Biggs book in a slightly different way
than we do in Section 7.6 of these notes. So let's define f (n) = O′( g(n)) the Biggs way : For
two functions f , g : N → R+ , we say that f (n) = O′( g(n)) if there exists a constant K > 0
so that f (n) ≤ K·g(n) for all n ∈ N, with possibly a finite number of exceptions.
Prove that the two definitions are equivalent. I.e., prove :
(a) If f (n) = O( g(n)) ( definition from these notes ), then f (n) = O′( g(n)) ( definition from
the Biggs book ).
(b) If f (n) = O′( g(n)), then f (n) = O( g(n)).

7  Let p(n) be a polynomial of degree d ≥ 1 so that p(n) ∈ N for all n ∈ N.
(a) Prove that p(n) = O(nᵈ).
(b) Prove that nᵈ = O( p(n)).

8  Prove that for all real numbers a, b > 1 we have loga (n) = O(logb (n)).

9  Let D = (V, A) be a digraph.
(a) Prove that | A| ≤ |V |·(|V | − 1).
Is it possible to have equality ? If so, for what digraphs does that happen ?
(b) Suppose that D has the property that for each pair u, v, u ≠ v, at most one of the two
possible arcs (u, v), (v, u) is present. What is the largest number of arcs that D can have
under that condition ?

10  For m, n ≥ 0, let M (m, n) be the Manhattan digraph. This digraph has as vertices all pairs (i, j)
with 0 ≤ i ≤ m and 0 ≤ j ≤ n. And there is an arc from a pair (i, j) to a pair (i′, j′) if i′ = i
and j′ = j + 1, or if j′ = j and i′ = i + 1. This is a sketch of M (3, 2) :
[ Figure : the 4 × 3 grid of vertices of M (3, 2), with all arcs pointing to the right or upwards;
A = (0, 0) is the bottom-left vertex and B = (3, 2) the top-right vertex. ]

We are interested in the number of directed paths from A = (0, 0) to B = (m, n). Let's call this
number p(m, n).
(a) Show that p(0, n) = 1 and p(m, 0) = 1 for all m, n ≥ 0.
(b) Show that for all m, n ≥ 1 we have p(m, n) = p(m − 1, n) + p(m, n − 1).


(c) Prove that p(m, n) = (m+n choose n) = (m + n)!/(m!·n!), for all m, n ≥ 0. ( Here
(a choose b) is the binomial number a!/(b!·( a − b)!), and a! is the factorial
a! = a·( a − 1)·( a − 2)···2·1. )
( Hint : use the earlier parts and induction. )
(d) Prove that for all n ≥ 0 we have p(n, n) ≥ 2ⁿ.

11  Consider the following network, with vertex set V = {s, a, b, c, d, e, f }, and the weights
given beside the arcs :
[ Figure : a network on the vertices s, a, b, c, d, e, f ; the arcs and their weights are shown in
the original figure. ]
(a) Describe how Dijkstra's Algorithm would progress on this network when determining
dist(s, v) for all vertices v ∈ V.
(b) Describe how the Bellman-Ford Algorithm would progress on this network when determining dist(s, v) for all vertices v ∈ V.
(c) Describe how the Floyd-Warshall Algorithm would progress on this network when
determining dist(u, v) for all pairs u, v.

12  Consider the following network, with vertex set V = {s, a, b, c, d, e, f }, and the weights
given beside the arcs :
[ Figure : a network on the vertices s, a, b, c, d, e, f ; the arcs and their weights are shown in
the original figure. ]
(a) Describe how Dijkstra's Algorithm would progress on this network when determining
dist(s, v) for all vertices v ∈ V. Are all the answers correct ?
(b) Describe how the Bellman-Ford Algorithm would progress on this network when determining dist(s, v) for all vertices v ∈ V.
(c) Describe how the Floyd-Warshall Algorithm would progress on this network when
determining dist(u, v) for all pairs u, v.

13  Suppose we want to calculate dist(s, t) in a network where certain weights are negative. We
use the Bellman-Ford Algorithm, and for the network we're interested in it declares "Negative Cycle !". That's good information to have, but it doesn't really tell us what dist(s, t)
is.
Describe how you can modify the Bellman-Ford Algorithm so that for specific given s, t it
will return the right answer dist(s, t) ( −∞ if there are s, t-walks but their weights have no
lower bound; +∞ if there are no s, t-walks; and the minimum weight of an s, t-walk if this
minimum exists ).

MA 208 Optimisation Theory
2007/08

Example exam questions

Below you'll find a couple of questions and their solutions that could be exam questions for
the part of the course related to the material in Notes 7 ( digraphs and networks, shortest
paths, order of functions, algorithms and their analysis, shortest path algorithms ).
These questions are meant to give an impression of what can be expected at the real exam.
The fact that certain types of questions or topics do not appear in this note does not mean
they can't appear in the real exam ! They are a good training tool to see how you are doing
with your revision, but they don't have much predictive value.

Example Questions

1  Let D = (V, A) be a digraph.
(a) Give the definition of a directed tour in D.
Give the definition of a directed cycle in D.
Suppose the digraph D has a directed tour containing all vertices of D.
(b) Prove that this means that for every vertex v there is a directed cycle containing v.
(c) Decide whether or not we can always find a directed cycle in D containing all vertices.

2  Let R≥1 = { x ∈ R | x ≥ 1 }.
(a) Prove that if two functions f , g : N → R≥1 satisfy f (n) = O( g(n)), then we have
ln( f (n)) = O(ln( g(n))).
(b) Give an example of two functions f , g : N → R≥1 so that ln( f (n)) = O(ln( g(n))) but
not f (n) = O( g(n)).

3  Let N = ( D, w) be a network, with D = (V, A) a digraph and w : A → Z+ a non-negative
weight function on the arcs of D. And let s ∈ V.
(a) Describe Dijkstra's Algorithm to find dist(s, v) for all v ∈ V.
Recall that for given vertices s, t, Dijkstra's Algorithm will find the minimal weight w(W )
of a walk W from s to t ( if such a walk exists ). Suppose we are interested in the minimum
weight w( T ) of a directed tour T containing s ( if such a tour exists ).
(b) Describe how you can use Dijkstra's Algorithm to solve this problem.
(c) Give an estimate for the number of operations needed for your algorithm in (b), in terms
of the input variables.


Solutions
1

(a) A directed tour is a sequence of vertices v1 , v2 , . . . , vk where the first and the last vertex
are the same ( v1 = vk ) and every two consecutive vertices form an arc : (vi , vi+1 ) A
for all i = 1, 2, . . . , k 1.
A directed cycle is a directed tour in which all vertices, except the first and the last, are
distinct.
(b) Let T = v1 , . . . , vk be a directed tour that contains all vertices of D. And let v be a vertex
of D. Then we must have v = vi for some vi .
Proof 1 If T is a cycle, then we are done. So assume T is not a cycle, hence there
are v p , vq p < q, ( p, q) 6= (1, k), so that v p = vq . Then the two sequences T1 =
v1 , . . . , v p , vq+1 , . . . , vk and T2 = v p , . . . , vq are also directed tours, both of them shorter
than the original tour T. Moreover, vi must be in at least one of the two tours. Take the
tour T1 or T2 on which vi lies, and continue shortening it if it is not a cycle, but making
sure vi is still on it. After a finite number of reductions in length we must be done, and
then we have found a cycle containing vi .
Proof 2 We know that there is at least one tour containing v. Now among all tours that
contain v, let T′ be the shortest one ( the one whose sequence v1 , . . . , vk′ has the smallest
number of vertices ). Then using the ideas from Proof 1, we cannot have any vertex
appearing more than once ( except v1 = vk′ ). So T′ must be a cycle, and we are done.
(c) No, this is not the case. Consider the following digraph :
[ Figure : a digraph on vertices v1 , . . . , v5 , with arcs (v1 , v2 ), (v2 , v3 ), (v3 , v5 ), (v5 , v2 ),
(v2 , v4 ), (v4 , v1 ) : two directed triangles sharing the vertex v2 . ]
Then the sequence v1 , v2 , v3 , v5 , v2 , v4 , v1 is a directed tour containing all vertices. But


there is no directed cycle containing all vertices.

(a) The fact that f (n) = O( g(n)) means that there exist constants C1 , C2 so that f (n) ≤
C1 + C2 g(n) for all n ∈ N. Since g(n) ≥ 1 for all n ∈ N, this certainly means that
f (n) ≤ (C1 + C2 ) g(n) for all n ∈ N.
Then we have that ln( f (n)) ≤ ln((C1 + C2 ) g(n)) = ln(C1 + C2 ) + ln( g(n)) for all n ∈
N. So by taking C1′ = ln(C1 + C2 ) and C2′ = 1, we have shown that ln( f (n)) ≤ C1′ +
C2′ ln( g(n)) for all n ∈ N. This means that ln( f (n)) = O(ln( g(n))) as required.
(b) Let f (n) = n² and g(n) = n for all n ∈ N. Then we have that ln( f (n)) = ln(n²) =
2 ln(n) and ln( g(n)) = ln(n) for all n ∈ N. So by taking C1 = 0 and C2 = 2, we have
shown that ln( f (n)) ≤ C1 + C2 ln( g(n)) for all n ∈ N. This means that ln( f (n)) =
O(ln( g(n))).
But we don't have f (n) = O( g(n)). For that we would need that there are constants
D1 , D2 so that n² ≤ D1 + D2 n for all n ∈ N. But for all constants D1 , D2 , if we take n
large enough, then we will always achieve n² > D1 + D2 n. So it is not the case that
f (n) = O( g(n)).
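A quick numerical sketch of the same point (the function names are ours, purely for illustration): with f (n) = n² and g(n) = n, the ratio ln f (n) / ln g(n) is the constant 2, while f (n)/g(n) = n exceeds any constant eventually.

```python
import math

def log_ratio(n):
    # ln(f(n)) / ln(g(n)) with f(n) = n^2, g(n) = n; equals 2 for all n >= 2
    return math.log(n ** 2) / math.log(n)

def plain_ratio(n):
    # f(n) / g(n) = n, which is unbounded
    return (n ** 2) / n
```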


(a) The following is a full description of Dijkstras Algorithm :


1. colour all vertices black;
2. colour s white and set d(s) = 0;
3. for all v ∈ V with (s, v) ∈ A : colour v grey and set d(v) = w(s, v);
4. as long as there are grey vertices :
5. { let u be the grey vertex with d(u) minimal;
6. colour u white;
7. for all v ∈ V with (u, v) ∈ A :
8. { if v is black : colour v grey and set d(v) = d(u) + w(u, v);
9. if v is grey and d(u) + w(u, v) < d(v) : replace d(v) by d(u) + w(u, v)
10. };
11. };
12. for all black vertices v : set d(v) = +∞;
13. declare dist(s, t) to be d(t)
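The colour-based description above translates directly into code. The following is a minimal sketch under our own assumptions about the input format (the digraph is a dict mapping each vertex to a dict of out-neighbours with arc weights):

```python
import math

def dijkstra(arcs, s):
    # collect the vertex set from the adjacency structure
    vertices = set(arcs)
    for nbrs in arcs.values():
        vertices |= set(nbrs)
    colour = {v: "black" for v in vertices}  # line 1
    d = {s: 0}
    colour[s] = "white"                      # line 2
    for v, wv in arcs.get(s, {}).items():    # line 3
        colour[v] = "grey"
        d[v] = wv
    while any(c == "grey" for c in colour.values()):   # line 4
        # line 5: grey vertex with minimal tentative distance (linear scan)
        u = min((v for v in vertices if colour[v] == "grey"), key=lambda v: d[v])
        colour[u] = "white"                  # line 6
        for v, wuv in arcs.get(u, {}).items():  # lines 7-9
            if colour[v] == "black":
                colour[v] = "grey"
                d[v] = d[u] + wuv
            elif colour[v] == "grey" and d[u] + wuv < d[v]:
                d[v] = d[u] + wuv
    for v in vertices:                       # line 12
        if colour[v] == "black":
            d[v] = math.inf
    return d
```

The minimal grey vertex is found here by a linear scan, so the whole run costs O(|V|²) operations, the estimate used for Dijkstra's Algorithm in part (c).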

(b) Every tour containing s can be written as s, v2 , . . . , vk , s, where each two consecutive
vertices are connected by an arc. In particular notice that such a tour must start with an
arc (s, v2 ) and then is a walk from v2 to s. And the weight of such a tour is the weight
w(s, v2 ) of the arc (s, v2 ) plus the weight of the walk from v2 to s.
So to find the shortest tour containing s, we can do the following :
1. set M = +∞;
2. for all v ∈ V do the following :
3. { if (s, v) is not an arc : do nothing;
4. if (s, v) is an arc :
5. { use Dijkstra's Algorithm to find the shortest walk from v to s;
6. call the weight of this shortest walk m;
7. if w(s, v) + m < M : replace M by w(s, v) + m
8. };
9. };
10. declare the minimum weight of a tour containing s to be M
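Lines 1-10 can be sketched in code as follows. The dict-of-dicts digraph format is our own assumption, and this sketch uses a heap-based Dijkstra inside the loop rather than the O(|V|²) colouring version described in (a):

```python
import heapq

def dijkstra_dist(arcs, source):
    # standard heap-based Dijkstra; returns tentative distances from source
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in arcs.get(u, {}).items():
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def min_tour_through(arcs, s):
    M = float("inf")                              # line 1
    for v, wsv in arcs.get(s, {}).items():        # every tour starts with an arc (s, v)
        m = dijkstra_dist(arcs, v).get(s, float("inf"))  # shortest v-to-s walk
        M = min(M, wsv + m)                       # line 7
    return M
```

For example, on arcs {s→a (1), a→s (4), a→b (1), b→s (2)} the tours through s have weights 5 and 4, and the sketch returns 4.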

(c) Most of the work for the algorithm in (b) is done in lines 2–9. In those lines we follow
lines 5–8 for every vertex v for which (s, v) is an arc. Lines 6 and 7 cost a small constant
number of operations. But in line 5 we perform Dijkstra's Algorithm for the network,
hence that line requires O(|V|²) operations. So the number of operations for lines 5–8
is O(|V|²).
And the number of times we have to go through lines 5–8 is O(|V|). So the number of
operations for lines 2–9 is O(|V|) · O(|V|²) = O(|V|³).
Lines 1 and 10 are just a few operations.
So we can conclude that the algorithm in (b) will require O(|V|³) operations.

Summer 2008 examination

MA 208
Optimisation Theory
(Half Unit)
2007 / 08 syllabus only; not for resit candidates

Instructions to candidates
Time allowed :

2 hours.

This exam contains 5 questions. You may attempt as many questions as you wish,
but only your best 4 questions will count towards the final mark.
All questions carry equal numbers of marks.
Answers should be justified by showing work.
You are supplied with :

Answer booklets.

Remarks :

Calculators are NOT allowed in this exam.

© LSE 2008 / MA 208

Page 1 of 5

(a) Let N = (D, w) be a network, with D = (V, A) a digraph and w : A → Z a weight
function on the arcs.
(i) Give the definition of the distance dist(u, v) for two vertices u, v.
(ii) Prove that if dist(u, v) ≠ ±∞ for some u ≠ v, then there is a path P so that
dist(u, v) = w(P).
(b) Given the following Linear Programming problem :
Maximise x1 + x2 + x3 ,
subject to x1 + x2 ≤ 2,
x1 + x3 ≤ 3,                              (1)
x2 + x3 ≤ 5,
x1 , x2 , x3 ≥ 0.
An optimal solution of this system is obtained at x∗ = (0, 2, 3). ( You don't have to
prove this. )
(i) Formulate the Dual LP-problem of (1).
(ii) Describe the information that can be obtained from the Strong Duality Property
for the LP-problem in (1) and its Dual.
(iii) Describe the information that the Complementary Slackness Conditions provide
for the Dual LP-problem obtained in (i).
(iv) Find an optimal solution for the Dual LP-problem from (i).
Is this solution unique ?

(a) Consider the following function f : R² → R :
f (x, y) = y² + x² cos( ½ y² ).
(i) Show that f has no maximum or minimum on R².
Let D be the set D = { (x, y) ∈ R² | −1 < x < 4, −1 < y < 2 }.
(ii) Find the two critical points of f in D.
(iii) Determine if f has a global maximum or minimum on D.
(b) (i) Define what it means for two functions f, g : N → R to satisfy f (n) = O(g(n)).
For a function f : N → R⁺, define Sf : N → R⁺ by Sf (n) = Σ_{k=1}^{n} f (k) for all n ∈ N.
(ii) Prove that if f : N → R⁺ satisfies f (n) = O(n), then Sf (n) = O(n²).


(iii) Give an example of two functions f, g : N → R⁺ so that f (n) = O(g(n)), but it
is not the case that Sf (n) = O((g(n))²). ( Make sure you justify your answer. )

Consider the following inequality constraints optimisation problem
maximise f (x, y, z) = x y + 2 x z + 2 y z,
for D = { (x, y, z) ∈ R³ | 2 z² ≤ x + y and x² + y² ≤ 2 }.
(a) (i) Explain why f must have a maximum on D.
(ii) Check for which points the Constraint Qualifications are satisfied.
(iii) Formulate the Kuhn-Tucker equations whose solutions provide information about
the maxima of f on D. ( You do not have to solve these equations ! )
Which of the following points are candidate maxima, according to these equations ?
(1.) (x, y, z) = (1, 1, 1);
(2.) (x, y, z) = (1, 1, −1);
(3.) (x, y, z) = (1, −1, 0);
(4.) (x, y, z) = (−1, 1, 0);
(5.) (x, y, z) = (9, 9, 3).
(iv) Under the assumption that no other points than those encountered so far can be
maxima, what is the solution to the optimisation problem ?
(b) Without repeating calculations from above, what can you say about the maximum
of f (x, y, z) on :
(i) D′ = { (x, y, z) ∈ R³ | 2 z² < x + y and x² + y² < 2 };
(ii) D′′ = { (x, y, z) ∈ R³ | 2 z² = x + y and x² + y² ≤ 2 }.


(a) A monopolist produces one product in a market. He has a fixed cost C > 0, and
when he produces x units of product, then he has a cost c(x) per unit and can ask a
price p(x) per unit so that all units can be sold. So his profit as a function of units
produced is
π(x) = x p(x) − x c(x) − C,
and of course this profit should be maximised, for x ≥ 0.
It is known that p : R⁺ → R⁺ is a continuous function satisfying p(x) → α as
x → ∞, for some α ≥ 0; and that c : R⁺ → R⁺ is a continuous function satisfying
c(x) → β as x → ∞, for some β ≥ 0.
For the following cases, decide if a maximum of π(x) for x ≥ 0 always exists or not.
Justify your answers.
(i) α > β;
(ii) β > α;
(iii) α = 0 and β = 0.
(b) Consider the function h(x, y) = x² + y² on the set D = { (x, y) | y² = x²(1 − x²) }.
(i) Explain why you can be sure that h has a maximum and a minimum on D.
(ii) Find the maxima and minima of h on D.
Make sure to explain why you are justified in each of the steps you do in your
analysis.
The constraint set is replaced by the set Dε = { (x, y) | y² = x²(1 − x²) + ε }, where
ε is a ( negative or positive ) real number close to zero. So now we are looking for the
extrema of h on Dε .
(iii) Based on the results obtained in (ii), what can you say approximately about the
extrema of h on Dε ?


(a) Determine, justifying your answer, if the following statements are true :
(i) Let D1 = { x ∈ R | |x| < 1 } and D2 = { x ∈ R | |x| ≤ 1 }. If f : R → R is a
( not necessarily continuous ) function which has a maximum on D1 , then f has a
maximum on D2 .
(ii) Let D3 = { x ∈ R² | ‖x‖ < 1 } and D4 = { x ∈ R² | ‖x‖ ≤ 1 }. If g : R² → R is a
( not necessarily continuous ) function which has a maximum on D3 , then g has a
maximum on D4 .
(b) Let N = (D, w) be a network, with D = (V, A) a digraph and w : A → Z a weight
function on the arcs. And let s ∈ V.
(i) Describe the Bellman-Ford Algorithm to find dist(s, v) for all v ∈ V.
The Bellman-Ford Algorithm is run on a particular network N = (D, w). For that
particular instance, the output of the algorithm appears to be a list of finite values
dist(s, v) ≠ ±∞ for all v ∈ V. Moreover, for all v ≠ s the output gives dist(s, v) > 0.
Based on the output of the Bellman-Ford Algorithm for this network, which of the
following statements are justified ? ( Make sure you justify your answers, either by
explaining why it is true, or by providing a counterexample. )
(ii) We must have w(a) ≥ 0 for all a ∈ A.
(iii) There are no cycles in the digraph with negative weight.
(iv) The digraph D is strongly connected.

END OF EXAM


1. (a) State the Weierstrass Theorem.
A continuous real valued function defined on a compact set has a
minimum and a maximum.
(b) Define (in terms of sequences) what it means for a subset of Rⁿ
to be compact and show that a compact set must be bounded.
A set is compact if every sequence of elements of the set has a
subsequence which converges to a limit in the set. Suppose that
K is an unbounded set. For every n we may choose an x_n ∈ K
with ‖x_n‖ ≥ n. Consider the sequence {x_n}_{n=1}^∞ we have generated.
Suppose some subsequence converges to a limit x, with length ‖x‖ = L.
But ‖x_n − x‖ ≥ | ‖x_n‖ − ‖x‖ | ≥ |n − L|, which certainly does not
converge to 0 along any subsequence (it goes to infinity), so no
subsequence of {x_n} can converge to any limit x. Hence K is not compact.
(c) Show that the function f : R² → R defined by
f (x, y) = sin²(x + y) cos³(x − y)
has a minimum and a maximum on R². What are they? (Hint:
consider f (x + π, y), f (x, y + π). Don't take any derivatives.)
Define the set K = { (x, y) : 0 ≤ x ≤ 2π, 0 ≤ y ≤ 2π }. Clearly K is
a closed (and bounded) square, and consequently is compact. So by
the Weierstrass Theorem, f has a max and min in K. For every
(x, y) in R², there exist integers k, l such that (x + 2kπ, y + 2lπ)
belongs to K. Consequently the max (min) on K is a max (min)
on R².
To find the max, we first see if we are lucky and can simultaneously
maximize both factors of f. We can! To do this we must have
x + y = π/2 and x − y = 0, which can be accomplished by x = y =
π/4; f = 1. Similarly, to get the minimum possible value of −1, we
need sin²(x + y) = 1, or sin(x + y) = ±1, and cos(x − y) = −1,
or x − y = π. Hence we need to solve x + y = π/2, x − y = π, so
x = 3π/4 and y = −π/4 gives the minimum, f = −1. In fact, if
described carefully, this analysis can avoid the WT.
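The values just found can be confirmed numerically (our own quick check, not part of the model answer):

```python
import math

def f(x, y):
    return math.sin(x + y) ** 2 * math.cos(x - y) ** 3

# maximiser: x + y = pi/2, x - y = 0 gives f = 1
fmax = f(math.pi / 4, math.pi / 4)
# minimiser: x + y = pi/2, x - y = pi gives f = -1
fmin = f(3 * math.pi / 4, -math.pi / 4)
```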

(d) Prove that every polynomial p(x) = a_n xⁿ + a_{n−1} x^{n−1} + · · · + a_0 of
even degree (n is even and a_n ≠ 0) has a maximum or a minimum,
but not both. What can you say about maxima and minima of
odd degree polynomials?
Suppose a_n > 0. Then lim_{x→∞} p(x) = ∞ and lim_{x→−∞} p(x) = ∞.
Hence there is an M⁺ > 0 such that x ≥ M⁺ implies p(x) ≥ p(0),
and an M⁻ < 0 such that x ≤ M⁻ implies p(x) ≥ p(0).
Observe that the interval K = [M⁻, M⁺] is closed and bounded,
hence compact. A polynomial is continuous. By the WT, there
is an x∗ ∈ K which minimizes p on K. Since 0 belongs to K,
p(x∗) ≤ p(0). For every x ∈ R, we have either x < M⁻, x ∈ K,
or x > M⁺. In all three cases we have shown that p(x) ≥ p(x∗),
so p has a global minimum at x∗. On the other hand, p cannot
have a maximum on R because it gets arbitrarily large as x goes
to infinity. If a_n < 0 the same argument shows that it has a
maximum and no minimum. If p has odd degree, its range is all
of R, which has no max or min element.

2. (a) Use Lagrange's Theorem to find the maximum and minimum of
the function f (x, y) = x + 2y, on the ellipse 2x² + 2y² − xy = 1.
Write g(x, y) = 2x² + 2y² − xy − 1. We have
Df = (1, 2) and Dg = (4x − y, −x + 4y).
So Dg forms an independent set unless
4x − y = 0 and −x + 4y = 0,
which holds only at (0, 0), which is not on the ellipse. Otherwise,
we have
1 = λ (4x − y),
2 = λ (−x + 4y).
Solving for λ in the two equations, we get
2 (4x − y) = −x + 4y, or 8x − 2y = −x + 4y, or 9x = 6y, or y = (3/2) x.
Substituting this into the constraint gives
2x² + 2 · (9/4) x² − (3/2) x² = 1, or 5x² = 1, or x² = 1/5, so the solutions are
x = 1/√5, y = 3/(2√5), and x = −1/√5, y = −3/(2√5).
We have f (1/√5, 3/(2√5)) = 4/√5 = (4/5)√5 > 0 and f (−1/√5, −3/(2√5)) = −(4/5)√5 < 0,
so the former must be the maximum and the latter the minimum.
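A numerical spot-check of the two candidate points (our own verification sketch):

```python
import math

def g(x, y):
    # the ellipse constraint 2x^2 + 2y^2 - xy - 1 = 0
    return 2 * x ** 2 + 2 * y ** 2 - x * y - 1

def f(x, y):
    return x + 2 * y

# the candidate maximiser from the Lagrange analysis
x0, y0 = 1 / math.sqrt(5), 3 / (2 * math.sqrt(5))
# (x0, y0) lies on the ellipse with f = 4/sqrt(5); the antipodal
# point (-x0, -y0) gives the negative of that value
```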

(b) Analyse the constraint qualifications for the following sets of constraints.
i. g1(x, y, z) = (x − 1)² + y² − 1 = 0, g2(x, y, z) = (x − 2)² + y² − 4 = 0.
(These are cylinder sets.)
We have Dg1 = ( 2x − 2, 2y, 0 ) and Dg2 = ( 2x − 4, 2y, 0 ) ( as column vectors ).
The constraint set is of the form (0, 0, z), that is, the vertical
line x = y = 0. On this set Dg1 = ( −2, 0, 0 ) = ½ ( −4, 0, 0 ) = ½ Dg2 ,
so Dg1 and Dg2 are dependent. The constraint qualification is not satisfied.
ii. g1(x, y) = x − 1 = 0, g2(x, y) = y − 1 = 0.
Here we have Dg1 = ( 1, 0 ) and Dg2 = ( 0, 1 ), so they are
independent, and the constraint qualification is satisfied.

3. (a) Maximize the function f (x, y) = y² − x, over the unit disk D =
{ x² + y² ≤ 1 }.
The objective function is continuous and D is closed and bounded,
hence compact. The W-Theorem guarantees a maximum. Write the
constraint in the form g(x, y) = 1 − x² − y². We have
Df = ( −1, 2y ), Dg = ( −2x, −2y ).
We have Dg = 0 (the zero vector) only at (0, 0), but g ≠ 0 at (0, 0).
There is no interior maximum, as Df ≠ 0 everywhere.
Otherwise we have, for (x, y) with g(x, y) = 0,
( −1, 2y ) − λ ( −2x, −2y ) = 0, or
−1 + 2λ x = 0,
2y + 2λ y = 2y (1 + λ) = 0.
The second equation gives y = 0 or λ = −1. If y = 0, the constraint
g(x, y) = 0 gives x = ±1, and the candidate pairs (±1, 0). If λ = −1,
the first equation gives x = −1/2, and then the constraint gives
y² = 1 − (−1/2)² = 3/4, or y = ±√3/2. We have
f (0, 0) = 0,
f (1, 0) = −1, f (−1, 0) = 1,
f (−1/2, ±√3/2) = 3/4 + 1/2 = 5/4.
So f is maximized at the two points (−1/2, ±√3/2).

(b) Consider the constraint qualifications for the three constraints x² +
y² ≤ 1, y ≤ 1/2, and y ≥ −1/2.
Write g1(x, y) = 1 − x² − y², g2(x, y) = 1/2 − y, and g3(x, y) = y + 1/2.
We have
Dg1 = ( −2x, −2y ), Dg2 = ( 0, −1 ), Dg3 = ( 0, 1 ).
No point satisfies gi = 0 for both i = 2, 3. If E = {1} ( only
g1 = 0 ) then Dg1 = 0 only for x = y = 0, but that point does not
satisfy g1 = 0, so this causes no failure. If E = {1, 3}, then Dg1 = μ Dg3 gives x = 0,
but (0, −1/2) does not satisfy g1 = 0. Next, consider E = {1, 2};
then Dg1 = μ Dg2 gives x = 0, but (0, 1/2) does not satisfy g1 = 0.
If E = {2} or {3}, the single vector is not zero, so is independent.
So the constraint qualification holds everywhere.

4. (a) Consider the LP
minimize 2y1 − 3y2 + 4y3 , subject to:
−2y2 − y3 ≥ −2
2y1 + 3y3 ≥ 3
y1 − 3y2 ≥ −4
y1 , y2 , y3 ≥ 0
i. Show that this LP is feasible
The second inequality has the highest bound, so try that first.
Try, say, y1 = 0 and y3 = 1. Then take y2 = 0 to satisfy the other
two inequalities.
ii. Write down its dual (and compare it to the original primal
problem)
maximize −2x1 + 3x2 − 4x3 , subject to
2x2 + x3 ≤ 2
−2x1 − 3x3 ≤ −3
−x1 + 3x2 ≤ 4
x1 , x2 , x3 ≥ 0
The domain is the same ( the coefficient matrix is antisymmetric and
the right-hand side is the negative of the objective coefficients, so the
dual constraints are exactly the primal ones ), and the objective function
is the negative of that in the primal.
iii. Show that the value (that is, the minimum) of the original
problem is 0 without doing any calculations.
The dual has the same domain D as the original problem, with
an objective function which is the negative of that one. So if
the minimum of the original problem is V, the maximum
of this problem is −V. But the strong law of duality says that
these values are the same, so −V = V, or V = 0.

(b) Consider the LP
minimize 8u + 5v + 30w, subject to
u + v + 2w ≥ 5
2u + v + 3w ≥ 6
u, v, w ≥ 0
i. Write down the dual maximum problem.
max 5x + 6y, subject to
x + 2y ≤ 8
x + y ≤ 5
2x + 3y ≤ 30
x ≥ 0
y ≥ 0
ii. Describe the feasible set by indicating its vertices.
D has vertices (0, 0), (0, 4), (2, 3), (5, 0).
iii. Solve the maximum problem.
The maximum of 28 is at (2, 3).
iv. Which elements of this solution are slack?
x + 2y = 8
x + y = 5
2x + 3y < 30 (slack)
x > 0 (slack)
y > 0 (slack)
v. Use this information to solve the original minimum problem.
Since the third inequality is slack, the third variable w is 0. Since both
variables are slack, both primal inequalities are equalities. This gives
a modified problem:
minimize 8u + 5v, subject to
u + v = 5
2u + v = 6
u ≥ 0
v ≥ 0
Minimum of 28 is at u = 1, v = 4 (with w = 0).
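The graphical solution of the dual can be double-checked by brute force: intersect every pair of constraint lines and keep the intersections that satisfy all constraints (a verification sketch; the constraint encoding is our own):

```python
from itertools import combinations

# constraints a*x + b*y <= c, including the sign constraints x >= 0, y >= 0
cons = [(1, 2, 8), (1, 1, 5), (2, 3, 30), (-1, 0, 0), (0, -1, 0)]

def feasible(x, y, tol=1e-9):
    return all(a * x + b * y <= c + tol for a, b, c in cons)

vertices = []
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue  # parallel lines, no intersection
    x = (c1 * b2 - c2 * b1) / det  # Cramer's rule
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y):
        vertices.append((round(x, 9), round(y, 9)))

best = max(vertices, key=lambda p: 5 * p[0] + 6 * p[1])
```

The feasible intersections are exactly the four vertices above, and the objective 5x + 6y is largest at (2, 3) with value 28.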

5. Multiplying an m × k matrix by a k × n matrix requires mkn multiplications (k for each of the mn elements of the m × n product matrix).
(a) Given matrices A (2 × 5), B (5 × 1), C (1 × 3), and D (3 × 6):
((AB)C)D has 2·5·1 + 2·1·3 + 2·3·6 = 10 + 6 + 36 = 52 mults
(A(BC))D has 5·1·3 + 2·5·3 + 2·3·6 = 15 + 30 + 36 = 81 mults
A((BC)D) has 5·1·3 + 5·3·6 + 2·5·6 = 15 + 90 + 60 = 165 mults
(AB)(CD) has 2·5·1 + 1·3·6 + 2·1·6 = 10 + 18 + 12 = 40 mults
A(B(CD)) has 1·3·6 + 5·1·6 + 2·5·6 = 18 + 30 + 60 = 108 mults
(only the third and fourth were required for this answer)


(b) More generally, we may ask this question for the matrix product
A1 A2 · · · An , where Ai is a di × di+1 matrix, given d1 , d2 , . . . , dn+1 .
Let V(i, j) denote the fewest multiplications required to calculate
Ai Ai+1 · · · Aj , so that V(1, n) is what we want. Note that the
outermost parentheses partition the product at some k (after some
Ak); the example in part (a) has k = 3, the location of the last
matrix multiplication. At the optimum, the matrices on either
side of k must be optimally multiplied. So write the Bellman
equation in a form beginning with
V(i, j) = min_{i ≤ k ≤ j−1} [ di dk+1 dj+1 + ? + ? ],
where the ?s refer to multiplications to the left and right of k:
V(i, j) = min_{i ≤ k ≤ j−1} [ di dk+1 dj+1 + V(i, k) + V(k + 1, j) ].

(c) What is V(i, i + 1)?
V(i, i + 1) = di di+1 di+2 ; also V(i, i) = 0.
(d) For n = 5, write the formula for the entries V(3, 4) and V(2, 4) of the
table V ( whose diagonal entries are V(i, i) = 0 and whose first
off-diagonal entries are V(1, 2) = d1 d2 d3 , V(2, 3) = d2 d3 d4 , and so on ).
Clearly V(3, 4) = d3 d4 d5 , and
V(2, 4) = min[ d2 d3 d5 + 0 + d3 d4 d5 , d2 d4 d5 + d2 d3 d4 + 0 ].
(e) For n = 5 and d1 = 2, d2 = 3, d3 = 1, d4 = 3, d5 = 6, d6 = 4, the
recurrence fills in the table of values V(i, j) as:
          j = 1   j = 2   j = 3   j = 4   j = 5
i = 1 :     0       6      12      36      56
i = 2 :             0       9      36      54
i = 3 :                     0      18      42
i = 4 :                             0      72
So the fewest multiplications for the full product is V(1, 5) = 56.

1. Let D = { x = (x1 , x2 ) : h1 (x1 , x2 ) = 2x1² − x2 ≥ 0, h2 (x1 , x2 ) = −x1³ + x2 ≥ 0 }.
a Draw the set D and analyze the Constraint Qualifications.
ans:
[ Figure : the set D, drawn for 0 ≤ x1 ≤ 2 : the region between the curves x2 = x1³
and x2 = 2x1², which meet at (0, 0) and at (2, 8). ]
Dh1 = (4x1 , −1), Dh2 = (−3x1², +1).
Since neither is ever zero we need only check where they are
collinear, which means their first coordinates must be negatives
of each other. So consider
4x1 = 3x1², with solutions x1 = 0 and x1 = 4/3.
But of these, the constraints are both effective only at x1 = 0, where x2 = 0
also. So the Constraint Qualification holds on D except at (0, 0).

b Determine the set of vectors c such that the objective function c · x
is maximized on D at (i) (2, 8); (ii) (0, 0). In which case is the
answer simply the set of non-negative linear combinations of the
outward normals −Dh1 and −Dh2 at that point?
ans: If (2, 8) is the maximizing point, c must be of the form
c = λ1 (−Dh1) + λ2 (−Dh2) = λ1 (−8, 1) + λ2 (12, −1), with λ1 , λ2 ≥ 0,
by the Kuhn-Tucker analysis. If (0, 0) is the maximizing point,
then c1 ≤ 0 (c must point leftwards). The set of non-negative
linear combinations of the outward normals −Dh1 = (0, 1) and
−Dh2 = (0, −1) at (0, 0) is simply the set of c with c1 = 0 (c points
vertically).


c Maximise f (x1 , x2 ) = x1² − x2 on D.
ans: Consider the KT equations
(2x1 , −1) + λ1 (4x1 , −1) + λ2 (−3x1², +1) = (0, 0),
λ1 h1 = λ1 (2x1² − x2 ) = 0,
λ2 h2 = λ2 (−x1³ + x2 ) = 0,
h1 , h2 ≥ 0,
λ1 , λ2 ≥ 0.
If h1 = h2 = 0, then (x1 , x2 ) must be (0, 0) (which we consider
separately later, as it is not covered by KT) or (2, 8). So the first
equation becomes
(4, −1) = λ1 (−8, 1) + λ2 (12, −1),
which has no solution for λ1 , λ2 ≥ 0.
If h1 = 0 and h2 > 0, then λ2 = 0, so the first equation becomes
(2x1 , −1) + λ1 (4x1 , −1) = (0, 0),
but equating coordinates gives λ1 < 0.
Finally, if h1 > 0 and h2 = 0, we have
(2x1 , −1) + λ2 (−3x1², +1) = (0, 0),
−x1³ + x2 = 0.
The top equation, second coordinate, gives λ2 = 1. Then the first
coordinate gives
3x1² = 2x1 , so x1 = 2/3.
The second equation gives
x2 = (2/3)³, and f (2/3, (2/3)³) = 4/9 − 8/27 = 4/27.
We must compare this positive number to the value f (0, 0) = 0
for the single point (0, 0) where KT does not apply.
Now the set D is closed and bounded, and the objective function
is continuous. So by WT there is a maximum. By KT the max
can only be at (2/3, (2/3)³) or (0, 0), so it is at the former. Note:
A few students noticed (correctly) that the domain is unbounded
to the left of the y-axis, as a boundary condition was omitted
from the question. Obviously full credit was given for that correct
solution, but students who did the problem as here were also given full
credit.

2. a Define what it means for a subset K of Rⁿ to be compact (state
the definition in terms of sequences). State two properties which
together are necessary and sufficient for a subset C of Rⁿ to be
compact. Give examples of subsets F1 and F2 of R¹ which are not
compact but have one of these properties (a different one for F1
and for F2), and use the definition of compactness to show they
are not compact.
ans: K is compact if every sequence in K has a subsequence which
converges to a limit in K. Take, for example, F1 = Z and F2 =
(0, 1). The first is closed but not bounded; the second is bounded
but not closed. The first is not compact because the sequence
(1, 2, 3, . . . ) has no convergent subsequence; the second is the same
for the sequence 1 − 1/n, where any convergent subsequence has
limit 1, which is not in the set.
b Prove that if K ⊆ Rⁿ is compact and f : Rⁿ → Rᵐ is continuous,
then f (K) is compact.
ans: bookwork
c Consider the infinite dimensional (Hilbert) space H = R^∞ which
consists of all sequences of real numbers x = (x1 , x2 , x3 , . . . ) such
that
x1² + x2² + x3² + · · · < ∞, with metric (distance)
d(x, y) = √( (x1 − y1)² + (x2 − y2)² + · · · ).
Let B denote the unit ball in H, B = { x ∈ H : d(0, x) ≤ 1 },
where 0 = (0, 0, . . . ). The set B is closed and bounded. Show
that it is not compact.
ans: Consider the sequence of unit vectors v^i = (0, 0, . . . , 0, 1, 0, . . . ),
where the 1 is in the i-th place. The distance between v^i and v^j,
i ≠ j, is √2. Hence there is no convergent subsequence (no
subsequence is a Cauchy sequence, for example).
d Let g : [1, ∞) → R be a continuous function with lim_{x→∞} g(x) = 2
and g(1) = 3. Prove that g has a maximum but not necessarily
a minimum. If g has no minimum, what can you say about b =
inf { g(x) : x ∈ [1, ∞) }?
ans: Let g be as stated. Since lim_{x→∞} g(x) = 2, there is an N > 0
such that g(x) ≤ 3 for all x ≥ N. Let z be the maximizer of g
on the compact set [1, N], which exists by the W-Theorem. Clearly
g(z) ≥ g(1) = 3, so g(z) is the global maximum. The continuous
function f (x) = 2 + 1/x, for example, has no minimum on [1, ∞).
If g has no minimum, then b = 2 : on each compact set [1, N] the
infimum is attained, so an unattained infimum can only come from
the tail, where g(x) → 2.

3. a Consider the problem
min x² + y², subject to (x − 1)³ − y² = 0.
Draw the constraint curve and determine where the Constraint
Qualification fails. Explain why the minimum exists. Show that
Lagrange's equation is never satisfied. So where is the minimum?
ans: The constraint is the curve y² = (x − 1)³, and the latter is
non-negative only for x ≥ 1.
[ Figure : the curve y² = (x − 1)³ for 1 ≤ x ≤ 2 : two branches
y = ±(x − 1)^{3/2} meeting in a cusp at (1, 0). ]
Note that original solutions had only the top branch, so this
(and other close pictures) was accepted from students. Since x ≥ 1 and y² ≥ 0,
the objective function is at least 1. Since 1² + 0² = 1, this is
one way to establish the minimum. For x > 2 or |y| > 2 the objective is at
least 4, greater than f (1, 0) = 1. So by the W-Theorem there is a
minimum on [1, 2] × [−2, 2], which is a global minimum. We have
Dg = D( (x − 1)³ − y² ) = ( 3(x − 1)², −2y ), which is (0, 0) only when x = 1
and y = 0. Lagrange's equations are
2x − 3λ (x − 1)² = 0,
2y + 2λ y = 0,
(x − 1)³ − y² = 0.
The second equation has the solution y = 0 (and then x = 1 from the
bottom equation, but then x = 0 from the top one, a contradiction) or λ = −1.
When λ = −1 the top quadratic equation has negative discriminant and hence no
solution. So the minimum must occur at the single point where
the CQ fails, namely (1, 0).
b Let f : R³ → R be defined by f (x, y, z) = x² + y² + z² − 2xyz.
Show that f is neither bounded above nor below. (Hint: find
paths p(t), q(t) : [0, ∞) → R³ such that lim_{t→∞} f (p(t)) = ∞
and lim_{t→∞} f (q(t)) = −∞.) Show that (0, 0, 0) and (1, 1, 1) are
critical points and determine whether each is a local maximum, a
local minimum, or neither.
ans: Take p(t) = (t, 0, 0), so f (p(t)) = t² → ∞. Take q(t) =
(t, t, t), so f (q(t)) = 3t² − 2t³ → −∞.
Df = ( 2x − 2yz, 2y − 2xz, 2z − 2xy ) = ( 0, 0, 0 ) at both points.
D²f = [ 2, −2z, −2y ; −2z, 2, −2x ; −2y, −2x, 2 ],
which equals
[ 2, 0, 0 ; 0, 2, 0 ; 0, 0, 2 ] at (0, 0, 0) and
[ 2, −2, −2 ; −2, 2, −2 ; −2, −2, 2 ] at (1, 1, 1).
The first is positive definite, so (0, 0, 0) is a local minimum. At
(1, 1, 1), D²f is neither positive nor negative definite. The objective
function has a max along the path q, since the second derivative
6(1 − 2t) of f (q(t)) = 3t² − 2t³ is negative at t = 1. However along the path (t, 1, 1), where
f is t² + 2 − 2t, it has a min, since the second derivative 2 is
positive. So (1, 1, 1) is neither a local max nor a local min.

4. a Consider the LP
min 11x1 + 2x2 + 6x3 , subject to
2x1 − x2 ≥ 1
3x1 + x2 + x3 ≥ 5
x1 , x2 , x3 ≥ 0
i Write down the dual maximum problem.
ii Draw the feasible set of the dual and solve it graphically. Note
which of the constraints are tight at the maximum.
iii Use the previous part to solve the original LP.
ans: the dual is
max y1 + 5y2 , subject to
2y1 + 3y2 ≤ 11
−y1 + y2 ≤ 2
y2 ≤ 6
y1 , y2 ≥ 0
[ Figure : the feasible set of the dual, with the maximum at the vertex (1, 3). ]
Graphical analysis gives the point (1, 3) as the maximum, with
V = 16. The third constraint is slack, so we know from the
Complementary Slackness Theorem that x3 = 0 at the minimum of the
primal problem. Since y1 and y2 are positive, this reduces the
primal to
2x1 − x2 = 1
3x1 + x2 = 5
Solution is: x1 = 6/5, x2 = 7/5, and the value is 11 (6/5) + 2 (7/5) =
16.
iv Suppose you want to demonstrate to your friend who does
not know duality theory of linear programming (and does not
want to learn it) that the answer you have arrived at, call it
V, is indeed the minimum value. Use your solution to the dual
problem to show that for some non-negative numbers a and b,
a times the first inequality plus b times the second inequality
gives the inequality
11x1 + 2x2 + 6x3 ≥ V.
This will prove to your friend that the objective function cannot be less than V.
ans: Use the dual variable optimum, y1 = 1 and y2 = 3 :
1 · (2x1 − x2 ) ≥ 1 · (1)
3 · (3x1 + x2 + x3 ) ≥ 3 · (5)
adding gives
11x1 + 2x2 + 3x3 ≥ 16 = V,
and since x3 ≥ 0 also 11x1 + 2x2 + 6x3 ≥ 16 = V.
11x1 + 2x2 + 6x3

b (This part counts much less than part a.) Explain why a pair of dual
linear programs cannot both be unbounded. For full credit, prove
any theorem you use.
Suppose that
max c · x, subject to Ax ≤ b, x ≥ 0,
and its dual minimum program
min b · y, subject to yA ≥ c, y ≥ 0,
are both unbounded, and in particular feasible. Then for any
feasible x, y for the respective problems (taking y as a row vector),
we have
c · x ≤ (yA) x = y (Ax) ≤ y · b.
Hence y · b is an upper bound for c · x, for any feasible x, so the maximum
program cannot be unbounded, a contradiction. Actually we
have proved more: if one is unbounded, the other is infeasible.

5. An object is known to be hidden in one of two tunnels, each currently
filled with earth, and is known to be at distance 1, 2, 3 or 4 from the
surface in tunnel I, or at distance 1 or 2 in tunnel II. It is
hidden in tunnel I at distance i = 1, 2, 3, 4 with probability pi and in
tunnel II at distance j = 1, 2 with probability qj . So p1 + p2 + p3 + p4 +
q1 + q2 = 1. In each time period t the searcher digs to the next distance
in one of the tunnels. If he digs in, say, tunnels (I, I, II, II, I, I) at times
t = 1, . . . , 6 = T, the expected time E to find the object is
E = 1 p1 + 2 p2 + 3 q1 + 4 q2 + 5 p3 + 6 p4 .
He wishes to minimize E. The state at the beginning of period t =
1, . . . , 6 can be described as (i, j) (with i + j = t − 1) if at that time
tunnel I has been searched up to distance i, tunnel II to distance j.
Solve the dynamic program from (0, 0), using the following steps.
(a) Write down the state space S, the action space A and feasible
action correspondence Γ(s), the transition function f (s, a), and
the reward function rt(s, a).
ans: A state of the system is given by (i, j), which means tunnel
I has been searched to distance i and tunnel II to distance j.
State (i, j) can occur only at time i + j + 1. The state space is
S = { (i, j) : i ∈ {0, 1, 2, 3, 4}, j ∈ {0, 1, 2} }. The action space is
A = {1, 2}, with
Γ((i, j)) = {1, 2} if i < 4 and j < 2 ; {1} if i < 4 and j = 2 ; {2} if i = 4 and j < 2.
r((i, j), a) = (i + j + 1) pi+1 if a = 1, and (i + j + 1) qj+1 if a = 2.
f ((i, j), a) = (i + 1, j) if a = 1, and (i, j + 1) if a = 2, for (i, j) ≠ (4, 2).

(b) What is the Bellman Equation for this problem?
ans:
V(i, j) = Vi+j+1(i, j)
= min( (i + j + 1) pi+1 + V(i + 1, j) , (i + j + 1) qj+1 + V(i, j + 1) ),
where an option is omitted if the corresponding tunnel is already fully
searched, and V(4, 2) = 0.

(c) Suppose that p = (.1, .3, .25, .05) and q = (.2, .1). Solve the dynamic
program, stating the minimum of E and the optimal sequence of
searching the two tunnels. Hint: copy the diagram below and complete
the backwards induction, writing V at the nodes and indicating the
optimal action at each node by a thick arrow.
[ Figure : the grid of states (i, j), i = 0, . . . , 4 and j = 0, 1, 2, with the
stage rewards (i + j + 1) pi+1 and (i + j + 1) qj+1 on the arcs, and the
probabilities p1 = .1, p2 = .3, p3 = .25, p4 = .05, q1 = .2, q2 = .1. ]
ans:
[ Figure : the completed backwards induction, with the value V(i, j)
at each node ( for example V(0, 0) = 3.05, V(1, 0) = 2.95, V(2, 0) = 2.35,
V(3, 0) = 1.6, V(0, 1) = 2.9, V(1, 1) = 2.7, V(1, 2) = 2.75 ) and thick
arrows marking the minimizing action at each node. ]
The minimum expected time is V(0, 0) = 3.05, and the
optimal path is (I, I, I, II, II, I).
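The backwards induction can be reproduced in a few lines (a sketch; the state encoding follows the answer above, with action 1 meaning "dig in tunnel I"):

```python
p = [0.1, 0.3, 0.25, 0.05]
q = [0.2, 0.1]

V = {}
policy = {}
for i in range(4, -1, -1):
    for j in range(2, -1, -1):
        if i == 4 and j == 2:
            V[i, j] = 0.0  # everything searched
            continue
        t = i + j + 1
        options = []
        if i < 4:
            options.append((t * p[i] + V[i + 1, j], 1))  # dig in tunnel I
        if j < 2:
            options.append((t * q[j] + V[i, j + 1], 2))  # dig in tunnel II
        V[i, j], policy[i, j] = min(options)

# follow the optimal policy from (0, 0)
path, state = [], (0, 0)
while state != (4, 2):
    a = policy[state]
    path.append("I" if a == 1 else "II")
    state = (state[0] + 1, state[1]) if a == 1 else (state[0], state[1] + 1)
```

Running this recovers V(0, 0) = 3.05 and the optimal sequence (I, I, I, II, II, I).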

Optimisation Theory

2007/08

MA 208
Solutions 2008 Exam
1

(a) (i) If there is no walk from u to v in D, then dist(u, v) = +∞. If there are walks from u
to v, but the set { w(W) | W a u, v-walk } has no lower bound, then dist(u, v) = −∞.
Finally, if neither of the two previous cases applies, then dist(u, v) is the minimum weight
of a u, v-walk.
(ii) The fact that dist(u, v) ≠ ±∞ means that there is a walk W from u to v so that dist(u, v) =
w(W). Now let W = x1 , x2 , . . . , xk , with x1 = u, xk = v, be a walk so that dist(u, v) =
w(W) and so that W is as short as possible ( in terms of the number of vertices k ). We
will prove that W must be a path.
Suppose W is not a path. The only reason why this can be the case is if some vertex
appears more than once on W. Say xi = xj , for some i < j. If i = 1 ( so xj = xi =
x1 = u ), then let W′ be the walk xj , xj+1 , . . . , xk , while if i ≠ 1, then let W′ be the walk
x1 , x2 , . . . , xi−1 , xj , xj+1 , . . . , xk . And let T be the tour xi , xi+1 , . . . , xj . ( This is a tour since
the first and last vertex are the same. )
We have that W′ is a walk from u to v with fewer vertices than W. Hence we must have
w(W′) > w(W). Since w(W) = w(W′) + w(T), this means that w(T) < 0. But then we
can construct walks from u to v with arbitrarily low total weight : first walk from u to xi ,
then go round the tour T as many times as you want, and then walk from xj to xk . This
would mean that dist(u, v) = −∞, contradicting the hypothesis.
So we can conclude that W is a path with dist(u, v) = w(W).

(b) (i) Setting c = (1, 1, 1), A = [ 1 1 0 ; 1 0 1 ; 0 1 1 ] and b = (2, 3, 5), the Dual LP-problem is
Minimise b · y
subject to Aᵀ y ≥ c,
y ≥ 0,
which becomes
Minimise 2 y1 + 3 y2 + 5 y3
subject to y1 + y2 ≥ 1,
y1 + y3 ≥ 1,
y2 + y3 ≥ 1,
y1 , y2 , y3 ≥ 0.


(ii) For this LP-problem we are given that an optimal solution x∗ for the primal LP-problem
exists. So we know that an optimal solution y∗ for the Dual LP-problem above exists.
Moreover, we know that this optimal solution must satisfy b · y∗ = c · x∗, which gives
2 y1∗ + 3 y2∗ + 5 y3∗ = x1∗ + x2∗ + x3∗ = 5.
(iii) From the Complementary Slackness Conditions the following information can be obtained about an optimal solution y∗ of the Dual LP-problem :
x1∗ = 0, so we have no information on the tightness of y1 + y2 ≥ 1;
x2∗ > 0, so y1 + y3 ≥ 1 must be tight : y1∗ + y3∗ = 1;
x3∗ > 0, so y2 + y3 ≥ 1 must be tight : y2∗ + y3∗ = 1;
x1∗ + x2∗ = 2, so the primal constraint is tight and we have no information on the tightness of y1 ≥ 0;
x1∗ + x3∗ = 3, so the primal constraint is tight and we have no information on the tightness of y2 ≥ 0;
x2∗ + x3∗ = 5, so the primal constraint is tight and we have no information on the tightness of y3 ≥ 0.
(iv) From (ii) and (iii) it follows that we are looking for (y1, y2, y3) with 2 y1 + 3 y2 + 5 y3 = 5,
y1 + y3 = 1 and y2 + y3 = 1. The last two equations give y1 = 1 − y3 and y2 =
1 − y3. Substituting this into the first equation gives 2 − 2 y3 + 3 − 3 y3 + 5 y3 = 5. This
simplifies to 5 = 5.
So all solutions to these three equations have the form (1 − a, 1 − a, a), for any a. But the
coordinates must also be non-negative, hence we must add 1 − a ≥ 0 and a ≥ 0. Also
y1 + y2 ≥ 1, which means 2 (1 − a) ≥ 1, hence a ≤ 1/2. We can conclude that all points
of the form y∗ = (1 − a, 1 − a, a) with 0 ≤ a ≤ 1/2 are optimal solutions to the Dual
LP-problem.
In particular, the optimal solution is not unique.
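As an informal check ( not part of the exam answer ), the dual optimum can be verified with a standard LP solver. The sketch below assumes scipy is available; it negates the ≥-rows because scipy.optimize.linprog works with ≤-constraints.

```python
import numpy as np
from scipy.optimize import linprog

# Dual LP from (b)(i): minimise 2 y1 + 3 y2 + 5 y3
# subject to y1 + y2 >= 1, y1 + y3 >= 1, y2 + y3 >= 1, y >= 0.
c = [2, 3, 5]
A_ge = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]])
# linprog expects A_ub @ y <= b_ub, so multiply the >=-rows by -1.
res = linprog(c, A_ub=-A_ge, b_ub=[-1, -1, -1], bounds=[(0, None)] * 3)
print(res.fun)   # optimal value: 5, matching part (ii)
print(res.x)     # an optimal point; it has the form (1 - a, 1 - a, a)
```

Any optimal point the solver returns should lie on the segment (1 − a, 1 − a, a), 0 ≤ a ≤ 1/2, found in (iv).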

(a) (i) For all points (a, 0) ∈ R² we have h(a, 0) = 0² + a² cos(0) = a². So h(a, 0) → ∞ if
a → ∞, and hence h cannot have a maximum on R².
If y = √(2π), then ½ y² = π, and hence cos(½ y²) = −1. So we find for all points
(a, √(2π)) ∈ R² that h(a, √(2π)) = 2π − a². This means that h(a, √(2π)) → −∞ if a → ∞,
and hence h cannot have a minimum on R².

(ii) For the derivative we find Dh(x, y) = ( 2 x cos(½ y²),  2 y − x² y sin(½ y²) ).
For the critical points we need to find the solutions in D to Dh(x, y) = 0, hence we need
to find the solutions (x, y) to 2 x cos(½ y²) = 0 and 2 y − x² y sin(½ y²) = 0. The
first equation has solutions if x = 0 or cos(½ y²) = 0.
The possibility x = 0 in the second equation gives 2 y = 0, so y = 0.
We know that cos(½ y²) = 0 only if ½ y² = (½ + k) π for k ∈ Z. So we must have
y² = (1 + 2 k) π for some k ∈ Z. Since −1 < y < 2√π, hence 0 ≤ y² < 4π, the only possible
solutions are y² = π and y² = 3π, hence y = √π and y = √(3π) ( the negative roots lie
outside D ).
The choice y = √π in the second equation gives 2 √π − x² √π sin(½ π) = 0. Since sin(½ π) =
1, this simplifies to 2 − x² = 0, hence x² = 2. Since we require −1 < x < 2π, the only
solution is x = √2.
And the possibility y = √(3π) in the second equation gives 2 √(3π) − x² √(3π) sin(3π/2) = 0.
Since sin(3π/2) = −1, this simplifies to 2 √(3π) + x² √(3π) = 0. This has no solutions for x.
So we find two critical points : (x, y) = (0, 0) and (x, y) = (√2, √π).


(iii) For the function values in the critical points we have h(0, 0) = 0 and h(√2, √π) = π +
2 cos(½ π) = π. We next show that there are function values for points in D below
and above these two values, so that neither of them is a global minimum or maximum.
Since a minimum or maximum can only occur in a critical point ( the set D is open ), this
means that h has no global minimum or maximum on D.
Firstly, h(a, 0) = a². We have (2, 0) ∈ D. Since 2² = 4 > π, we find h(2, 0) = 4 > π =
h(√2, √π) > h(0, 0).
Secondly, we have (3, √(2π)) ∈ D and h(3, √(2π)) = 2π − 9 < 0 = h(0, 0) < h(√2, √π).
So none of the critical points is a global maximum or minimum on D.
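The calculations above can be checked numerically. The function used below is an assumption of this sketch, reconstructed from the derivative in (ii), namely h(x, y) = y² + x² cos(½ y²); both critical points should give a (numerically) zero gradient.

```python
import math

# Assumption: from the derivative in (ii), h(x, y) = y**2 + x**2 * cos(y**2 / 2).
def h(x, y):
    return y * y + x * x * math.cos(y * y / 2)

def grad_h(x, y):
    return (2 * x * math.cos(y * y / 2),
            2 * y - x * x * y * math.sin(y * y / 2))

# Both critical points found in (ii) should give a (numerically) zero gradient.
for point in [(0.0, 0.0), (math.sqrt(2), math.sqrt(math.pi))]:
    gx, gy = grad_h(*point)
    assert abs(gx) < 1e-9 and abs(gy) < 1e-9

print(round(h(math.sqrt(2), math.sqrt(math.pi)), 9))  # approximately pi
```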

(b) (i) We say that f(n) = O(g(n)) if there exist constants C1, C2 so that f(n) ≤ C1 + C2 g(n)
for all n ∈ N.
(ii) Suppose f(n) = O(n), so that there exist constants C1, C2 so that f(n) ≤ C1 + C2 n for
all n ∈ N. Then for the function S_f(n) we can derive, for all n ∈ N,

    S_f(n) = Σ_{k=1}^{n} f(k) ≤ Σ_{k=1}^{n} (C1 + C2 k) = C1 n + C2 Σ_{k=1}^{n} k
           ≤ C1 n + C2 Σ_{k=1}^{n} n = C1 n + C2 n² ≤ C1 n² + C2 n² = (C1 + C2) n².

So if we set C3 = 0 and C4 = C1 + C2, then for all n ∈ N we have S_f(n) ≤ C3 + C4 n²,
proving that S_f(n) = O(n²).
(iii) Define the functions f, g : N → R+ by setting, for all n ∈ N, f(n) = 1 and g(n) = 1.
Then taking C1 = 0 and C2 = 1 we trivially have f(n) ≤ C1 + C2 g(n), hence f(n) =
O(g(n)).
For S_f(n) we find S_f(n) = Σ_{k=1}^{n} 1 = n, and we also have (g(n))² = 1.
If it were the case that S_f(n) = O((g(n))²), then there are constants C3, C4 so that
for all n ∈ N we have S_f(n) ≤ C3 + C4 (g(n))². This is equivalent to n ≤ C3 + C4 for
all n ∈ N. Since no constants C3, C4 can have C3 + C4 ≥ n for all n ∈ N, this
gives a contradiction. And hence we can conclude that it is not the case that S_f(n) =
O((g(n))²).
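The bound in (ii) can be illustrated numerically. The concrete choice f(n) = 3 + 2 n below is hypothetical ( so C1 = 3, C2 = 2 ); the derivation says S_f(n) ≤ (C1 + C2) n² for every such f.

```python
# Illustrating (ii): whenever f(n) <= C1 + C2 * n, the sum S_f(n) = f(1) + ... + f(n)
# satisfies S_f(n) <= (C1 + C2) * n**2.  The concrete f below is a hypothetical choice.
C1, C2 = 3, 2

def f(n):
    return C1 + C2 * n          # f(n) = 3 + 2n, so f(n) = O(n)

def S_f(n):
    return sum(f(k) for k in range(1, n + 1))

for n in (1, 10, 100, 1000):
    assert S_f(n) <= (C1 + C2) * n ** 2   # the bound derived in (ii)
print(S_f(1000))  # 1004000, well below 5 * 1000**2 = 5000000
```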

(a) (i)

The objective function f ( x, y, z) is continuous, and the constraint set D is closed ( all
constraints are of the type h( x, y, z) 0 with continuous constraint functions ). So we

only need to check if D is bounded. From x2 + y2 2 we


see that | x | 2 and
p

|y| 2 . This gives that 2 z2 x + y 2 2 , hence |z|


2. It follows that D is
bounded.
So by Weierstrass Theorem we can conclude that f ( x, y, z) has a maximum on D .


(ii) There are two constraint functions : h1(x, y, z) = x + y − 2 z² ≥ 0 and h2(x, y, z) =
2 − x² − y² ≥ 0. The derivatives are Dh1(x, y, z) = (1, 1, −4 z) and Dh2(x, y, z) =
(−2 x, −2 y, 0).
The possible sets of effective constraints are { h1 }, { h2 } and { h1, h2 }.
It is obvious that { Dh1(x, y, z) } is never a dependent set, since Dh1(x, y, z) is never the
zero vector.
We have that { Dh2(x, y, z) } is dependent if (−2 x, −2 y, 0) = (0, 0, 0), hence if x = 0
and y = 0. But for (0, 0, z) we have h2(0, 0, z) = 2 > 0. So the constraint h2(x, y, z) = 0 is never
effective if x = 0 and y = 0.
So we are left to check if there are points (x, y, z) ∈ D with h1(x, y, z) = 0 and
h2(x, y, z) = 0, and where { Dh1(x, y, z), Dh2(x, y, z) } is a dependent set. The set
{ (1, 1, −4 z), (−2 x, −2 y, 0) } is dependent only if −4 z = 0 and −2 x = −2 y, hence if z = 0
and x = y. Since we must have 2 z² = x + y, from z = 0 and x = y we find x = 0 and
y = 0. But for (0, 0, 0) the second constraint h2(x, y, z) = 0 is not effective.
We can conclude that the Constraint Qualification is satisfied everywhere on D.

(iii) We have Df(x, y, z) = (y + 2 z, x + 2 z, 2 x + 2 y), Dh1(x, y, z) = (1, 1, −4 z), and
Dh2(x, y, z) = (−2 x, −2 y, 0).
Using the Kuhn-Tucker Theorem, we get the following equations for (x, y, z, λ1, λ2) :

    λ1 ≥ 0,   x + y − 2 z² ≥ 0,   λ1 (x + y − 2 z²) = 0;      (1)
    λ2 ≥ 0,   2 − x² − y² ≥ 0,    λ2 (2 − x² − y²) = 0;       (2)
    y + 2 z + λ1 − 2 λ2 x = 0;                                (3)
    x + 2 z + λ1 − 2 λ2 y = 0;                                (4)
    2 x + 2 y − 4 λ1 z = 0.                                   (5)

1. Substituting (x, y, z) = (1, 1, −1) in (5) gives 4 + 4 λ1 = 0, hence λ1 = −1. But
then (3) or (4) give −2 − 2 λ2 = 0, hence λ2 = −1, which violates (2). So (1, 1, −1) is not
a candidate maximum.
2. Substituting (x, y, z) = (1, 1, 1) in (5) gives 4 − 4 λ1 = 0, hence λ1 = 1. Then (3)
or (4) give 4 − 2 λ2 = 0, hence λ2 = 2. These values satisfy all equations, hence we have
a candidate (x, y, z; λ1, λ2) = (1, 1, 1; 1, 2).
3. Substituting (x, y, z) = (1, −1, 0) into (3) gives −1 + λ1 − 2 λ2 = 0, while substituting into (4) gives 1 + λ1 + 2 λ2 = 0. The only solution of these two equations is λ1 = 0
and λ2 = −1/2. This violates (2), so this point is no candidate maximum.
4. The point (x, y, z) = (−1, 1, 0) is no candidate maximum for the same reasons as the
point (1, −1, 0) in 3. above.
5. The point (x, y, z) = (9, 9, −3) fails the second inequality in (2), hence cannot be a
candidate maximum.


(iv) From the analysis above, there is only one candidate for a maximum : ( x, y, z) = (1, 1, 1).
We also know from (i) and (ii) that a maximum must exist and must appear as a solution
of the Kuhn-Tucker equations. It follows that (1, 1, 1) is the maximum of f ( x, y, z) on D .
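As a rough numerical sanity check ( the objective f(x, y, z) = x y + 2 x z + 2 y z below is an assumption, reconstructed from the gradient Df in (iii) ), random feasible points should never beat the candidate (1, 1, 1) :

```python
import random

random.seed(0)

# Assumption: the objective f(x, y, z) = x*y + 2*x*z + 2*y*z is reconstructed
# from the gradient Df in (iii).  Feasible set:
# D = { (x, y, z) : x + y >= 2 z^2 and x^2 + y^2 <= 2 }.
def f(x, y, z):
    return x * y + 2 * x * z + 2 * y * z

def feasible(x, y, z):
    return x + y >= 2 * z * z and x * x + y * y <= 2

best = f(1.0, 1.0, 1.0)   # the candidate maximum from (iv); f(1, 1, 1) = 5
for _ in range(200000):
    x, y, z = (random.uniform(-1.5, 1.5) for _ in range(3))
    if feasible(x, y, z):
        assert f(x, y, z) <= best + 1e-9   # no sampled feasible point beats it
print(best)  # 5.0
```

The sampling box covers D, since |x|, |y| ≤ √2 and |z| ≤ ⁴√2 are all below 1.5.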
(b) (i) For any a ≥ 1 we have (1, 1, a) ∈ D′. But f(1, 1, a) = 1 + 4 a → ∞ as a → ∞. So
f(x, y, z) has no maximum on D′.
(ii) In part (a) we found that the function f has a maximum on D in (x, y, z) = (1, 1, 1).
Hence we know that f(1, 1, 1) ≥ f(x, y, z) for all (x, y, z) ∈ D. We have that D″ ⊆ D
and (1, 1, 1) ∈ D″. Hence we have that f(1, 1, 1) ≥ f(x, y, z) for all (x, y, z) ∈ D″ as
well. So we can conclude immediately that f will have the same maximum on D″ :
(x, y, z) = (1, 1, 1).

(a) If p(x) → α and c(x) → β as x → ∞, then

    π(x) = x p(x) − x c(x) − C ≈ α x − β x − C = (α − β) x − C,

as x → ∞.
(i) Following the above, if α > β, then π(x) → ∞ as x → ∞, so no maximum exists.
(ii) If β > α, then π(x) → −∞, so there is an x1 such that π(x) < −C for all x > x1. Now
consider the interval I = [0, x1]. The profit π is continuous on the compact set I, so, by
Weierstrass' Theorem, π has a maximum on I. And since π(0) = −C, this maximum is
at least −C and hence is a maximum on R+.
(iii) There is no maximum in general. For instance, take p(x) = 1/√(x + 1) and c(x) =
1/(x + 1). Then p(x) → 0 and c(x) → 0. But the profit function satisfies π(x) =
x/√(x + 1) − x/(x + 1) − C → ∞ as x → ∞, and hence has no upper bound.
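A quick numerical look at the counterexample in (iii), with the set-up cost C = 1 chosen arbitrarily, confirms that the profit keeps growing :

```python
import math

# Counterexample of (iii): p(x) = 1/sqrt(x + 1) and c(x) = 1/(x + 1) both tend to 0,
# yet pi(x) = x p(x) - x c(x) - C is unbounded above.  C = 1 is an arbitrary choice.
C = 1.0

def profit(x):
    return x / math.sqrt(x + 1) - x / (x + 1) - C

# pi grows roughly like sqrt(x): it eventually exceeds any bound.
print(profit(0), profit(100) > profit(10), profit(10**6) > 900)  # -1.0 True True
```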

(b) (i) The function h is continuous. The set D is clearly closed, since it is described by an
equality involving a continuous function.
To check that D is bounded, we first notice that for all (x, y) ∈ D we have y² ≥ 0. So
we must have x² (1 − x²) ≥ 0, which implies x² ≤ 1. This gives |x| ≤ 1. But then also
x² (1 − x²) ≤ 1, hence y² ≤ 1. So also |y| ≤ 1. So the coordinates of every point in D are
bounded. It follows that D is bounded.
So we have a continuous function h on a compact set D. And hence by Weierstrass'
Theorem we know that a maximum and minimum must exist.

(ii) Setting j(x, y) = x² − x⁴ − y², the problem becomes finding the maximum and minimum of h(x, y) for points (x, y) satisfying j(x, y) = 0.
We first check if there are points failing the Constraint Qualification. For the derivative
of j we have Dj(x, y) = (2 x − 4 x³, −2 y). The constraint set { Dj(x, y) } is only dependent
if Dj(x, y) = 0, hence if 2 x − 4 x³ = 0 and −2 y = 0. The second equation gives y = 0.
The first equation can be simplified to 2 x (1 − 2 x²) = 0. This has solutions x = 0 and
x = ±1/√2. But if x = ±1/√2 and y = 0, then j(x, y) ≠ 0. So the only point in D for
which Dj(x, y) = 0 is (x, y) = (0, 0).
We conclude that the Constraint Qualification is satisfied everywhere on D except at the
origin (0, 0).


Next we set up the Lagrangean equations. Since Dh(x, y) = (2 x, 2 y) and Dj(x, y) =
(2 x − 4 x³, −2 y), we are looking for solutions (x, y, λ) of :

    x² − x⁴ − y² = 0;            (1)
    2 x + λ (2 x − 4 x³) = 0;    (2)
    2 y + λ (−2 y) = 0.          (3)

Equation (3) is equivalent to 2 y (1 − λ) = 0, which has solutions y = 0 or λ = 1.
If we substitute y = 0 in equation (1), then we get x² (1 − x²) = 0, which has solutions
x = −1, x = 0 and x = 1. If x = −1, then (2) becomes −2 + λ (−2 + 4) = 0, hence
λ = 1. If x = 0, then (2) becomes 0 = 0, hence no information about λ ! And if x = 1,
then (2) becomes 2 + λ (2 − 4) = 0, again leading to λ = 1.
If we substitute λ = 1 in equation (2), then we get 4 x − 4 x³ = 0, hence 4 x (1 − x²) = 0.
This again leads to solutions x = −1, x = 0 and x = 1, which all give y = 0 again.
So from the Lagrangean equations we get the following solutions for (x, y, λ) :

    (−1, 0, 1),    (1, 0, 1),    (0, 0, λ),

where λ is undetermined in the third point.


From all of the above we have three candidates for the minimum and maximum : (−1, 0),
(1, 0) and (0, 0). The function values are h(−1, 0) = h(1, 0) = 1 and h(0, 0) = 0. Since
there are no other points in which an extremum is possible, and since by (i) we know that
a maximum and minimum must exist, we can conclude that h has a maximum on D
in (−1, 0) and (1, 0) and a minimum in (0, 0).
(iii) The change in constraint means that instead of looking at points satisfying j(x, y) =
x² − x⁴ − y² = 0, we are now considering points satisfying j_ε(x, y) = x² − x⁴ − y² +
ε = 0. From a Theorem in the lectures, we know that for an original extremum x∗ with
a Lagrangean multiplier λ∗, we get a new extremum x∗_ε with function value h(x∗_ε) ≈
h(x∗) + ε λ∗.
So for the maxima in (−1, 0) and (1, 0), with the multiplier λ∗ = 1, we get that there will
be new maxima (x∗_ε, y∗_ε) with function value h(x∗_ε, y∗_ε) ≈ 1 + ε.
For the minimum in (0, 0) we cannot make a prediction what the new minimum will
be. There are two reasons for this : there is no corresponding unique multiplier, and the
Constraint Qualification fails in (0, 0), so the Lagrangean Theorem cannot give
any information about what is happening in and around that point.
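The extrema found in (ii) can be double-checked by evaluating h along the constraint curve : on y² = x² (1 − x²) we have h(x, y) = x² + y² = 2 x² − x⁴ with x ∈ [−1, 1].

```python
# On the constraint curve y^2 = x^2 (1 - x^2) ( so x in [-1, 1] ) we have
# h(x, y) = x^2 + y^2 = 2 x^2 - x^4, which lets us scan the extrema found in (ii).
def h_on_curve(x):
    return 2 * x * x - x ** 4

xs = [-1 + 2 * i / 1000 for i in range(1001)]   # grid over [-1, 1]
values = [h_on_curve(x) for x in xs]
print(max(values), min(values))   # 1.0 ( at x = -1 and x = 1 ) and 0.0 ( at x = 0 )
```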

(a) (i) This is true. Since D2 = D1 ∪ {−1, 1}, if f has a maximum on D1 in x∗, then

    max{ f(x) | x ∈ D2 } = max{ f(x∗), f(−1), f(1) }.

And this latter maximum always exists, since it is just the maximum of a finite set.


(ii) This is false. For a counterexample, define g : R² → R as follows :

    g(x1, x2) = 0     if ‖(x1, x2)‖ ≠ 1 or x2 ∈ {−1, 1};
    g(x1, x2) = x2    if ‖(x1, x2)‖ = 1 and x2 ∉ {−1, 1}.

Then g(x1, x2) = 0 for all (x1, x2) ∈ D3, hence g has a maximum on D3. But g(D4) is the
real interval (−1, 1), so no maximum exists on that set.
(b) (i) The following is the Bellman-Ford Algorithm :

    1.  set d(s) = 0;
    2.  for all v ∈ V with (s, v) ∈ A : set d(v) = w(s, v);
    3.  for all v ∈ V, v ≠ s, with (s, v) ∉ A : set d(v) = +∞;
    4.  repeat |V| times :
    5.  { for all arcs (u, v) ∈ A :
    6.      set d(v) = min{ d(v), d(u) + w(u, v) }
    7.  };
    8.  for all arcs (u, v) ∈ A :
    9.  { if d(u) ≠ +∞ and d(v) > d(u) + w(u, v) :
    10.     declare "Negative Cycle !" and STOP IMMEDIATELY;
    11. };
    12. for all v ∈ V : declare dist(s, v) to be d(v)

In step 6 we follow the convention that for any number a we have (+∞) + a = +∞,
min{+∞, a} = a, and that min{+∞, +∞} = +∞.
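The pseudocode above translates almost line by line into Python; the sketch below uses math.inf for +∞ and a dict of arc weights ( these data-structure choices are ours, not part of the notes ) :

```python
import math

def bellman_ford(vertices, arcs, s):
    """Steps 1-12 of (b)(i); arcs maps (u, v) to the weight w(u, v)."""
    # Steps 1-3: initial tentative distances from s.
    d = {v: math.inf for v in vertices}
    d[s] = 0
    for (u, v), w in arcs.items():
        if u == s:
            d[v] = min(d[v], w)
    # Steps 4-7: |V| rounds of relaxation over all arcs.
    # Note that math.inf + w == math.inf, matching the convention of step 6.
    for _ in range(len(vertices)):
        for (u, v), w in arcs.items():
            d[v] = min(d[v], d[u] + w)
    # Steps 8-11: any further improvement reveals a negative cycle.
    for (u, v), w in arcs.items():
        if d[u] != math.inf and d[v] > d[u] + w:
            raise ValueError("Negative Cycle !")
    # Step 12: d now holds dist(s, v) for every v.
    return d

# A small network: s -> a with weight 4, s -> b with weight 2, a -> b with weight -2.
arcs = {('s', 'a'): 4, ('s', 'b'): 2, ('a', 'b'): -2}
print(sorted(bellman_ford({'s', 'a', 'b'}, arcs, 's').items()))
# [('a', 4), ('b', 2), ('s', 0)]
```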
(ii) This is false. The following is a counterexample : take V = {s, a, b} with three arcs,
(s, a) with weight 4, (s, b) with weight 2, and (a, b) with weight −2 :

        4        −2
    s -----> a -----> b
     \               ^
      \______2______/

For this network we have dist(s, a) = 4 and dist(s, b) = 2, even though the arc (a, b) has
negative weight.
negative weight.
(iii) This is true. The fact that dist(s, v) ≠ +∞ for all v ∈ V means that there is a path from s
to v for all v ∈ V. If there were a cycle C with negative weight and v is a vertex
on that cycle, then dist(s, v) = −∞, since we can form a walk from s to v and then walk
around the cycle as many times as we want.
In such a case, the Bellman-Ford Algorithm would not give the distances as outcome,
but would give "Negative Cycle !" instead.
(iv) This is false. The digraph in (ii) is also a counterexample for this : there is no walk
from b to s, hence the digraph is not strongly connected.
