
Lecture Notes on

Microeconomics*

r. vijay krishna


Riddled with typos. Use at your own peril.
Contents

1 Theory of Choice 1
1.1 Introduction 1
1.2 Preference Relations 1
1.3 Revealed Preference 5
1.4 Summary 8

2 Ordinal Utility 9
2.1 Utility Representations 9
2.2 Continuous Utility 13

3 Consumer Preferences 18
3.1 Some Assumptions 18
3.1.1 Marginal Rate of Substitution 20
3.2 Temporal Prizes 26

4 Consumer Demand 27
4.1 The Basic Problem 27
4.2 Differentiable Utility Functions 29
4.3 Demand Functions 29
4.4 The Lagrangian with Applications 31
4.5 Rationalisable Demand 35
4.5.1 Law of Demand 38

5 Duality 40
5.1 Induced Preferences over Choice Sets and Indirect Utility 40
5.1.1 Preferences Over Budget Sets 41
5.2 The Dual Problem 43
5.3 Hicksian Demand 44
5.4 Expenditure Function 45

6 Comparative Statics 48
6.1 Monotone Comparative Statics — A First Look 50
6.2 A Brief Detour into Lattice Theory 56
6.3 Monotone Comparative Statics — Another Look 58
6.4 Comparative Statics with Changing Constraints 62
6.4.1 An Abstract Theorem 62
6.5 Constrained Optimisation in Euclidean Space 65
6.6 Ordering Constraint Sets 70
6.7 Comparative Statics 71
6.8 Applications 72

7 Choice under Uncertainty 74
7.1 Preliminaries 74
7.2 Preferences Over Lotteries 78
7.3 Compound Lotteries 79
7.4 Expected Utility Theory 81
7.5 Consistency of our Assumptions 86

8 Risk and Risk Aversion 88
8.1 Monetary Prizes 88
8.1.1 Marginal Utility 90
8.2 Comparing Lotteries 91
8.2.1 First Order Stochastic Dominance 92
8.3 Risk Aversion 95
8.4 Risk Premia 96
8.5 Comparing Risk Aversion 97
8.6 The Arrow-Pratt Measure 99
8.7 A Portfolio Problem 102
8.8 Increased Risk Aversion and the Optimal Portfolio 104
8.9 More Stochastic Dominance 106
8.10 Non-expected Utility 109
8.11 Subjective States and Utility 110

9 General Equilibrium Theory 114
9.1 Basic Definitions 114
9.2 Pareto Optimal Allocations 114
9.3 Competitive Mechanisms 118
9.4 Equilibrium 120
9.5 Welfare Theorems 123

10 Dynamic Programming 125
10.1 Markov Decision Models 125
10.1.1 Valuing Strategies 127
10.1.2 Optimality and the Bellman Equation 129
10.2 Bandit Problems 133
10.2.1 A Single-Project Bandit Problem 133
10.2.2 A More Concrete Problem 140
10.2.3 Poisson Version of a Concrete Example 145
10.3 The Space of Bounded Functions 152
10.3.1 Defining a Norm 152
10.3.2 Completeness 153
10.3.3 An Order Structure 154
10.4 Contraction Mappings 154
10.5 Ordinary Differential Equations 157
10.5.1 A More Involved Example 159

11 Metric Spaces 162
11.1 Basic Definitions 162
11.2 Completeness 166
11.3 Product Spaces 167
11.4 Continuous Functions 168
11.5 Compactness 170

12 Envelope Theorems 175
12.1 Motivation 175
12.2 Results 175
12.3 Applications 180

Bibliography 182

©r. vijay krishna 29th September 2010 22:59


1 Theory of Choice

1.1 Introduction

The theory of choice underlies most of microeconomic
theory. It concerns the study of an agent’s choices over
alternatives. As primitives, we have
• Choice set X
• axioms on preferences ≽ over X
• (utility) representation theorem of preferences
From the normative point of view, we usually ask if
you (the analyst) want to obey the axioms (and hence
use the representation theorem). From the descriptive
point of view, we ask if the individuals actually obey
the axioms, and if the axioms have testable implica-
tions. Consider the following ‘real-world’ example.

1 Example: There are two lotteries A and B. Lottery A is
[(0.5; 10), (0.2; 60), (0.3; 100)], and lottery B is [(0.5; 0), (0.5; 60)].
What would you choose? What should an agent choose?
Should the two answers be the same?

1.2 Preference Relations

The following are the primitives in our model of consumer
behaviour.


• X is a choice set
• ≽ is a preference relation; it captures the behavi-
oural property of ‘most preferred’
• X × X ∶= {(x, y) ∶ x ∈ X, y ∈ X} is the Cartesian product
• A binary relation is B ⊂ X × X; if (x, y) ∈ B, we write
x B y; if (x, y) ∉ B, we write ¬[x B y]

The way to interpret a binary relation is the following:
If x B y, we read it as ‘x is related to y according to B.’

2 Example: Let us consider the following examples

(a) X ∶= {1, 2}. What is X×X? Let B1 ∶= {(1, 1), (1, 2), (2, 1), (2, 2)}
and B2 ∶= {(1, 1), (1, 2)}.
(b) X ∶= R. Define x B y if and only if x ⩾ y. In other
words, B ∶= {(x, y) ∈ R2 ∶ x ⩾ y}.
(c) X ∶= R, x B y if and only if x = y + 2, y ∈ N.
(d) Let X ∶= {Bill, Hillary, Chelsea}. Suppose B is the
binary relation

B ∶= {(Hillary, Chelsea), (Bill, Chelsea)}

This can be interpreted as capturing the relationship of
parenthood between x and y. ♢

The following are useful kinds of binary relations. Fix a
choice set X. We say that B ⊂ X × X is
(a) reflexive if x B x for all x ∈ X
(b) irreflexive if ¬[x B x] for all x ∈ X
(c) symmetric if x B y implies y B x for all x, y ∈ X
(d) asymmetric if x B y implies ¬[y B x] for all x, y ∈ X


(e) antisymmetric if x B y and y B x implies x = y for
all x, y ∈ X
(f) transitive if x B y and y B z implies x B z for all
x, y, z ∈ X

(g) negatively transitive if ¬[x B y] and ¬[y B z] implies
¬[x B z] for all x, y, z ∈ X
(h) complete or connected if x B y or y B x for all x, y ∈
X

(i) weakly connected if x B y or y B x or x = y for all
x, y ∈ X

(j) acyclic if x1 B x2 , x2 B x3 , . . . , xn−1 B xn implies
x1 ≠ xn for all x1 , . . . , xn ∈ X
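These properties are easy to check mechanically when X is finite. Below is a minimal Python sketch (the encoding of a relation as a set of ordered pairs is this sketch’s assumption, not the notes’), applied to Example 2(b) restricted to the finite set {1, 2, 3}.

```python
# Binary relations on a finite choice set X, encoded as sets of
# ordered pairs (x, y), with checkers for some of the properties above.
from itertools import product

def is_reflexive(B, X):
    return all((x, x) in B for x in X)

def is_symmetric(B, X):
    return all((y, x) in B for (x, y) in B)

def is_transitive(B, X):
    return all((x, z) in B
               for (x, y) in B for (w, z) in B if y == w)

def is_complete(B, X):
    return all((x, y) in B or (y, x) in B for x, y in product(X, X))

# Example 2(b) restricted to a finite set: x B y iff x >= y.
X = {1, 2, 3}
B = {(x, y) for x, y in product(X, X) if x >= y}
print(is_reflexive(B, X), is_symmetric(B, X),
      is_transitive(B, X), is_complete(B, X))  # True False True True
```

As expected, ⩾ on a finite set of numbers is reflexive, transitive and complete, but not symmetric.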

Our approach. Start with a binary relation, and make
some assumptions about what properties it satisfies.
The assumptions will take the fanciful name ‘axioms’.

3 Exercise: Let B be a binary relation that represents a
consumer’s preferences. What properties above do you
think are reasonable to assume? ♢

4 Definition: A binary relation ≽ on a set X is a preference
relation if it is complete and transitive. ♢

5 Example: Let us think a little more carefully about the
assumptions we have made. Suppose X ∶= (0, ∞) × (0, ∞)
represents bundles (b, w) of beer and wine. Are the
following comparable, (22, 17) and (18, 21)? What about
(24, 16) and (17, 22)?
So much for completeness. What about transitivity? Is
this reasonable? Let X ∶= N represent the number of
grains of sugar in a cup of tea. It is reasonable to assume
that for all n, n ∼ n + 1. But is n ∼ 100^100? ♢


Given a preference ≽ on X, we may define ≻ ⊂ X × X as
x ≻ y if x ≽ y and ¬[y ≽ x]

and ∼ ⊂ X × X as

x ∼ y if x ≽ y and y ≽ x

≻ is referred to as the agent’s strict preference and ∼ is
his indifference relation.

6 Exercise: Let ≽ be a preference relation. Show that
≽ = ≻ ∪ ∼. Show that ≻ is asymmetric and negatively transitive.
Show that ∼ is transitive and symmetric. ♢

7 Proposition: Let ≻ be asymmetric and negatively transitive.
Define (i) x ≽ y if ¬[y ≻ x] and (ii) x ∼ y if ¬[x ≻ y]
and ¬[y ≻ x]. Show that ≽ is complete and transitive. ♢

8 Example: Let B ⊂ N × N be defined as a B b if and only if
ab ≠ 0. Then, B is symmetric, B is transitive and B is
not reflexive (since ¬[0 B 0]). ♢

9 Exercise: Let B ⊂ N × N be defined as a B b if and only
if ab ≠ 1. Show that B is symmetric. Show that B is not
transitive. ♢

A useful definition is the following.

10 Definition: Let X be a set. An equivalence relation on X
is a binary relation ∼ that is reflexive, symmetric and
transitive. ♢

11 Exercise: Let R ⊂ X × X be symmetric and transitive.
Define Y ∶= {x ∈ X ∶ x R x}.
(a) Show that (i) Y ⊂ X and (ii) R is an equivalence
relation on Y .
(b) Let Z ∶= X ∖ Y . Show that for any z ∈ Z, there is no
w ∈ Z such that z R w or w R z. ♢


12 Exercise: Let B ⊂ R2+ × R2+ , where x ∶= (xa , xb ) is a bundle
of apples and bananas. B is such that x B y if and only if
(i) u(x) ⩾ u(y) and (ii) v(x) ⩾ v(y), where u, v ∶ R2+ → R are
given by u(x) ∶= xa^α xb^(1−α) and v(x) ∶= xa^(1−α) xb^α , α ∈ [0, 1]. Is
B transitive? Complete? If not, for what values of α is B
complete? ♢

1.3 Revealed Preference

Descriptive point of view: observe choices being made.
The agent’s choice behaviour reveals his preferences.
From a normative point of view, the question is, Given
preferences, what should choices look like?
Another way of stating this is that at some level, choices
are more fundamental than preferences. After all, we
observe choices, not preferences. Assume X is finite,
and 2^X is the set of subsets of X.

13 Exercise: If X ∶= {1, . . . , n}, show that the cardinality of
2^X is 2^n. ♢

14 Definition: A choice function for a finite set X is a
function c ∶ 2^X ∖ ∅ → 2^X ∖ ∅ such that for all A ∈ 2^X ∖ ∅,
c(A) ⊂ A. ♢

The idea behind the choice function is this. If the agent
is offered the set A to choose from, any member of c(A)
will do. Let us consider a simple example of a choice
function that is of some interest.

15 Example (Money Pump): Let X ∶= {x, y, z}, and suppose
c is a choice function such that c(X) = X, c({α}) = {α} for
all α ∈ X, c({x, y}) = {x}, c({y, z}) = {y} and c({x, z}) =
{z}. Notice that if an agent has such a choice function,


then an unscrupulous confidence man will be able to
extract all of the agent’s wealth. (Can you see how?) ♢
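To see the extraction concretely, here is a small simulation (the starting wealth and the per-trade fee are invented for illustration): the pairwise choices cycle, so the agent willingly pays a fee at every swap and returns to his starting bundle poorer.

```python
# The choice function of Example 15 on two-element menus:
# c({x,y}) = {x}, c({y,z}) = {y}, c({x,z}) = {z}.
def choice(pair):
    c = {frozenset({'x', 'y'}): 'x',
         frozenset({'y', 'z'}): 'y',
         frozenset({'x', 'z'}): 'z'}
    return c[frozenset(pair)]

holding, wealth, fee = 'x', 100.0, 1.0
for offer in ['z', 'y', 'x'] * 3:          # three full cycles of trades
    if choice({holding, offer}) == offer:  # agent strictly prefers offer
        holding, wealth = offer, wealth - fee  # and pays to trade up
print(holding, wealth)  # x 91.0 — back at x, nine fees poorer
```

Each swap looks like an improvement to the agent, yet after every three trades he holds exactly what he started with.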

If preferences are ≽, then

c(A; ≽) ∶= max_{x∈A} (≽) = {x ∈ A ∶ x ≽ y, ∀ y ∈ A}

is a choice function.
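For finite X, the induced choice function can be computed directly. A sketch (the encoding of ≽ as a set of pairs (x, y), read as x ≽ y, is an assumption of the sketch):

```python
# c(A; >=) = the set of maximal elements of the menu A under >=.
X = {'a', 'b', 'c'}
# a ≽ b ≽ c, complete and transitive, written out as pairs:
pref = {('a', 'a'), ('b', 'b'), ('c', 'c'),
        ('a', 'b'), ('b', 'c'), ('a', 'c')}

def c(A, pref):
    return {x for x in A if all((x, y) in pref for y in A)}

print(c({'a', 'b', 'c'}, pref))  # {'a'}
print(c({'b', 'c'}, pref))       # {'b'}
```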

16 Exercise: Is c(A; ≽) well defined? Ie, is c(A; ≽) ≠ ∅ for all
A? Is it always a choice function? Can you think of when
it might not be well defined? (Think of the cardinality of
X.) ♢

The following is a useful characterisation of choice
functions.

17 Exercise: Let ≽ be complete. Then, c(⋅; ≽) is a choice
function if and only if ≻ is acyclic. ♢

We shall now make some assumptions about choice
functions.

18 Definition (Houthakker’s Axiom aka WARP): If x, y ∈
A ∩ B and x ∈ c(A), y ∈ c(B), then x ∈ c(B) and y ∈ c(A). ♢

19 Definition (Sen’s decomposition): Sen’s α: If x ∈ B ⊂ A
and x ∈ c(A), then x ∈ c(B). Sen’s β: If x, y ∈ c(A), A ⊂ B
and y ∈ c(B), then x ∈ c(B). ♢

20 Exercise: Which of Sen’s α and β does the choice
function in Example 15 not satisfy? ♢

We begin with a simple proposition about choice
functions induced by preferences.


21 Proposition: If ≽ is a preference relation, then c(⋅; ≽)
satisfies WARP and hence both Sen’s α and β. ♢

Proof. If x, y ∈ A ∩ B, x ∈ c(A; ≽) and y ∈ c(B; ≽), then
x ≽ z for all z ∈ A and y ≽ z for all z ∈ B. In particular,
x ≽ y and y ≽ x, so x ∼ y. By transitivity, y ≽ z for all
z ∈ A and x ≽ z for all z ∈ B, ie, y ∈ c(A; ≽) and
x ∈ c(B; ≽). ∎

22 Exercise: Show that WARP implies Sen’s α and β. ♢

The following is a fundamental characterisation of
WARP and Sen’s conditions α and β.

23 Proposition: If c satisfies both α and β, then there exists
a preference relation ≽ such that c(⋅) = c(⋅; ≽). ♢

Proof. Define ≽ as follows: x ≽ y if x ∈ c({x, y}).
Clearly, ≽ is complete, since c({x, y}) ≠ ∅. We first
need to check that ≽ is transitive.
To check transitivity: Let A ∶= {x, y, z}. Suppose x ∈
c({x, y}) and y ∈ c({y, z}). Is x ∈ c({x, z})? If it is, then
we are done. To see this, let us ask ourselves, What is
c(A)?

If z ∈ c(A), then z ∈ c({y, z}) (Sen’s α). By Sen’s β, it
must be that y ∈ c(A). Then, by Sen’s β again (since
y ∈ c({x, y}) by Sen’s α), x ∈ c(A), so that x ≽ z.
So let us suppose z ∉ c(A). Then, if y ∈ c(A), it must
be that y ∈ c({x, y}) by Sen’s α. Thus, x ∈ c(A) by
Sen’s β, so that x ≽ y ≻ z. Finally, if y ∉ c(A), then
x ∈ c(A), so that x ≻ z. Thus, ≽ is transitive.

We now have to show that c(A) = c(A; ≽) for all A. Fix
an A ⊂ X.
(a) If x ∈ c(A), then for all z ∈ A, x ≽ z. To see this, note
that if z ≻ x, then c({x, z}) = {z}, contradicting Sen’s α.
Thus, x ∈ c(A; ≽), so that c(A) ⊂ c(A; ≽).


(b) If x ∈ c(A; ≽), then x ≽ y for all y ∈ A. Suppose x ∉
c(A), and take some y ∈ c(A). By Sen’s α, y ∈ c({x, y});
and if x ∈ c({x, y}) too, Sen’s β would give x ∈ c(A). So
c({x, y}) = {y}, ie, y ≻ x, a contradiction. ∎

24 Corollary: c satisfies WARP if and only if c satisfies α
and β, if and only if there exists a preference relation ≽
such that c(⋅) = c(⋅; ≽). ♢
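The corollary can be verified mechanically on a three-element set. A sketch (all encodings are assumptions of the sketch): check Sen’s α and β for a rational choice function, then recover ≽ from two-element menus exactly as in the proof of Proposition 23.

```python
from itertools import combinations

X = ['x', 'y', 'z']
menus = [frozenset(s) for r in (1, 2, 3) for s in combinations(X, r)]

# a rational choice function: x ≻ y ≻ z on every menu
c = {A: frozenset({min(A, key='xyz'.index)}) for A in menus}

def sen_alpha(c):
    # x ∈ B ⊂ A and x ∈ c(A) imply x ∈ c(B)
    return all(c[A] & B <= c[B] for A in menus for B in menus if B < A)

def sen_beta(c):
    # x, y ∈ c(A), A ⊂ B and y ∈ c(B) imply x ∈ c(B)
    return all(c[A] <= c[B] or not (c[A] & c[B])
               for A in menus for B in menus if A < B)

print(sen_alpha(c), sen_beta(c))  # True True

# recover ≽ from pairwise choices: x ≽ y iff x ∈ c({x, y})
weak = {(a, b) for a in X for b in X
        if a == b or a in c[frozenset({a, b})]}
print(('x', 'z') in weak, ('z', 'x') in weak)  # True False
```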

1.4 Summary

We have seen that preferences and choices are two sides
of the same coin. We have also defined rationality so
that an agent is rational if he satisfies WARP, which is
the same as saying that he has complete and transitive
preferences. Of course, there is nothing sacrosanct about
these assumptions. Indeed, there has been a lot of work
on relaxing these assumptions. A final note: notice that
we haven’t introduced utility functions. It is possible to
do a lot without them (as we shall see below), but it is
useful to know when they represent preferences.



2 Ordinal Utility

2.1 Utility Representations

As before, X is a choice set and ≽ is a preference relation.
Let u ∶ X → R be a real function. We want a numerical
representation of ≽.

1 Definition: A function u ∶ X → R represents a preference
≽ if
x ≽ y ⟺ u(x) ⩾ u(y).
u is also referred to as a numerical representation or
utility representation of ≽. ♢

The question we will ask in this chapter is if and when
a utility representation of a preference exists. We hasten
to note that even if a utility representation exists, it is
never unique. To see this, suppose the function u ∶ X →
R represents ≽. Then, the function

v(x) ∶= e^u(x)

also represents the preference. (Check this!) Indeed, the
same is true more generally.

2 Proposition: If u represents ≽ and f ∶ R → R is any
strictly increasing function, then v ∶= f ○ u also represents
≽. ♢

Proof. x ≽ y iff u(x) ⩾ u(y) (since u represents ≽) iff
f(u(x)) ⩾ f(u(y)) (since f is strictly increasing) iff
v(x) ⩾ v(y) (by the definition of v). ∎
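A quick numerical illustration of the proposition (the utility u and the list of alternatives below are assumed for the example): composing with a strictly increasing f leaves the ranking of alternatives unchanged.

```python
import math

alternatives = [0.3, 1.0, 2.5, 4.0]   # a few points of X = R+
u = lambda x: math.log(1 + x)         # an assumed utility
v = lambda x: math.exp(u(x))          # v = f ∘ u with f(t) = e^t

rank_u = sorted(alternatives, key=u)
rank_v = sorted(alternatives, key=v)
print(rank_u == rank_v)  # True: u and v order the alternatives identically
```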


3 Exercise: Let X be nonempty and let u, v ∶ X → R
represent ≽ on X. Show that there exists a strictly increasing
function f ∶ u(X) → R such that v = f ○ u. ♢

We begin with a representation theorem for the case
where X is finite.
4 Proposition: Let X be finite. Then, a binary relation ≽ is
a preference relation if and only if there exists a function
u ∶ X → R that represents ≽. ♢

Proof. We begin with a claim. We claim that if A is
finite, then maxA (≽) and minA (≽) exist. We prove
this by induction on ∣A∣. If ∣A∣ = 1, it is easily seen.
Suppose ∣A∣ = n, so that by the induction assumption,
the statement is true for all subsets of A with cardinality
n − 1. Take x ∈ A, so that A ∖ {x} has cardinality n − 1.
Let z ∈ A ∖ {x} be maximal. If x ≽ z, then x is maximal
in A; if z ≽ x, then z is maximal in A. Similarly for the
minimal element.
To prove the proposition, let [x] ∶= {y ∈ X ∶ y ∼ x} be
the equivalence class of x. Since X is finite, [x] is finite
for each x ∈ X. Notice that, by definition of ∼, y ≁ x
implies [y] ∩ [x] = ∅. Thus, there exists a partition
X = X1 ∪ ⋅ ⋅ ⋅ ∪ Xk such that Xj = [x] for some x ∈ X, for
j = 1, . . . , k. Moreover, by the claim above, we can order
X1 , . . . , Xk such that x ≻ y if and only if x ∈ Xj , y ∈ Xℓ
and j > ℓ.
Now, define the function u(x) ∶= j if x ∈ Xj . It is easy
to see that u represents ≽.
A final comment. We can take the utility function u so
that u(X) ⊂ (−1, 1). To see this, suppose u is defined as
above, and consider the function v(x) ∶= tan−1 (u(x))/(π/2).
Since tan−1 is strictly increasing and takes values in
(−π/2, π/2), v represents ≽ and v(X) ⊂ (−1, 1). ∎
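The construction in the proof can be carried out mechanically. A sketch (the encoding of ≽ as a set of pairs is assumed): repeatedly peel off the minimal indifference class of what remains and assign it the next utility value.

```python
X = ['a', 'b', 'c', 'd']
# a ∼ b ≻ c ≻ d, written out as pairs (x, y) meaning x ≽ y
pref = {(x, x) for x in X} | {('a', 'b'), ('b', 'a'), ('a', 'c'),
                              ('b', 'c'), ('a', 'd'), ('b', 'd'), ('c', 'd')}

def utility(X, pref):
    u, remaining, j = {}, set(X), 0
    while remaining:
        j += 1                       # the j-th class from the bottom
        minimal = {x for x in remaining
                   if all((y, x) in pref for y in remaining)}
        for x in minimal:
            u[x] = j
        remaining -= minimal
    return u

u = utility(X, pref)
print(u['a'] == u['b'], u['b'] > u['c'] > u['d'])  # True True
```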


5 Exercise: In the Proposition above, show that if u exists,
≽ is complete and transitive. ♢

We now consider the proposition when X is countable.
We begin with a lemma.

6 Proposition (Cantor): Let X be countable and ≽ an
antisymmetric preference relation. Then, there exists a
function f ∶ X → Q that represents ≽. ♢

Proof. Let X ∶= {x0 , x1 , . . . } and Q ∶= {q0 , q1 , . . . }. To
construct f, let f(x0 ) ∶= q0 . If x1 ≽ x0 , set f(x1 ) as
the first element (with respect to the indexing) in
{q1 , q2 , . . . } such that qk ⩾ f(x0 ). Similarly, if x0 ≽ x1 ,
let f(x1 ) be the first element (with respect to the
indexing) in {q1 , q2 , . . . } such that qk ⩽ f(x0 ).
Proceeding inductively, for any n, suppose we have
defined f(xj ) for j < n. Then, set f(xn ) as the first
element of

{q0 , q1 , . . . } ∖ {f(x0 ), . . . , f(xn−1 )}

so that this rational number has the same (order) rela-


tion to the collection of rationals f(x0 ), . . . , f(xn−1 ) that
xn has to x0 , . . . , xn−1 .

You should check that f is well defined. It is easy to
see that f represents ≽. Notice that instead of taking
the rationals, we could instead have taken, for instance,
an enumeration {q0′ , q1′ , . . . } of the rationals in (−1, 1),
so that {q0′ , q1′ , . . . } = Q ∩ (−1, 1). ∎
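The inductive construction is easy to run. A sketch with one flagged simplification: instead of scanning a fixed enumeration of Q for the first suitable rational, we take midpoints in (−1, 1), which likewise produces an order-preserving map into the rationals.

```python
from fractions import Fraction

def embed(xs):
    """xs: distinct reals presented one by one, as in the proof;
    returns order-preserving rational values f(x0), f(x1), ..."""
    assigned = []                     # pairs (x, f(x)) built so far
    for x in xs:
        below = [q for (y, q) in assigned if y < x]
        above = [q for (y, q) in assigned if y > x]
        lo = max(below) if below else Fraction(-1)
        hi = min(above) if above else Fraction(1)
        assigned.append((x, (lo + hi) / 2))  # a rational between lo, hi
    return dict(assigned)

f = embed([0.0, 2.7, -1.5, 1.0])
print(all((x < y) == (f[x] < f[y]) for x in f for y in f))  # True
```

Every new point receives a rational strictly between the values of its neighbours, so the order relation to all earlier points is preserved, just as in the proof.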

Notice that we cannot let f take values only in N.

7 Exercise: Let X ∶= Q ∩ [0, 1], and ≽ so that x ≽ y if and
only if x ⩾ y. Show that there is no utility representation
of ≽ that takes values only in N. ♢


8 Proposition: Let X be countable and ≽ a preference
relation. Then, ≽ has a utility representation. ♢

Proof. Since ∼ is an equivalence relation, X/∼ is a
quotient set. The binary relation P on X/∼ is given as
follows: [x] P [y] if and only if x ≽ y. Then, P is a
preference relation that is antisymmetric.
By the proposition above, we see that there exists a
function f ∶ X/∼ → Q that represents P. Now, for any
x ∈ X, let u(x) ∶= f([x]). Clearly, u represents ≽, as
desired. ∎

An obvious question is, Does every X with a preference
≽ have a utility representation? Surprisingly enough, the
answer is no. Let us look at the standard example that
shows this. First, a definition.

9 Definition: A set X is countable if there exists an
injection ψ ∶ X → N. If ψ is a bijection, X is countably infinite
or denumerable. Thus, N and Q are countably infinite,
but R is not. ♢

10 Example (Lexicographic Preferences): Let X ∶= [0, 1] ×
[0, 1]. Say that (x1 , x2 ) ≽ (y1 , y2 ) if x1 > y1 , or if x1 = y1
and x2 ⩾ y2 . This is known as the lexicographic order or
dictionary order. (Can you guess why?)
Suppose u represents ≽. Then, for any r ∈ [0, 1], (r, 1) ≻
(r, 0), so that u((r, 1)) > u((r, 0)). Therefore, there exists
a rational number qr ∈ (u((r, 0)), u((r, 1))). If s > r, then
qs > qr . But this means that there are uncountably many
rationals (one for each r ∈ [0, 1]), which is untrue. Hence,
≽ cannot have a utility representation. ♢
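Note that the ranking itself is perfectly easy to compute; it is only a utility representation of it that fails to exist. A sketch of the comparison (encoding assumed):

```python
# x ≽ y in the lexicographic order on [0,1] × [0,1]: compare the
# first coordinate, and use the second only to break ties.
def lex_weakly_preferred(x, y):
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

print(lex_weakly_preferred((0.5, 0.0), (0.4, 1.0)))  # True
print(lex_weakly_preferred((0.5, 0.2), (0.5, 0.3)))  # False
```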


Since a lot of the interesting choice sets are uncountable,
we must introduce some topological notions if we are to
make any progress. This brings us to continuous utility.

2.2 Continuous Utility

We begin with some definitions. Fix a metric space X.

11 Definition (Continuity — C1): A preference ≽ is
continuous if for x ≻ y, there exist disjoint balls B(x, r) and
B(y, r) such that for all a ∈ B(x, r) and all b ∈ B(y, r),
a ≻ b. ♢

12 Definition (Continuity — C2): A preference ≽ is
continuous if the graph of ≽ — ie, the set {(x, y) ∶ x ≽ y} ⊂
X × X — is a closed set. That is to say, if (xn , yn ) is a
sequence in X × X such that (i) (xn , yn ) → (x, y) and (ii)
xn ≽ yn for all n, then x ≽ y. ♢

The different notions of continuity will be useful in
different contexts, provided they are equivalent. This is
what we shall prove next.

13 Theorem: A preference ≽ satisfies C1 if and only if ≽
satisfies C2. ♢

Proof. (i) (C1 implies C2). Assume ≽ is continuous
as per C1. To show that C2 holds, let (xn , yn ) be a
sequence such that (xn , yn ) → (x, y) and xn ≽ yn for
all n. We want to show that x ≽ y. Toward this end, let
us suppose, by way of contradiction, that this isn’t the
case. That is, suppose y ≻ x. Then, by C1, there exists
r > 0 such that B(x, r) ∩ B(y, r) = ∅ and for all


a ∈ B(x, r) and b ∈ B(y, r), a ≻ b. Also, there exists
large enough N such that for all n > N, xn ∈ B(x, r)
and yn ∈ B(y, r). But C1 then implies (by the definition
of r) that yn ≻ xn for n > N, which contradicts the
hypothesis.
(ii) (C2 implies C1). Assume ≽ is continuous as per C2 and let x ≻ y. Now
suppose C1 does not hold. Then, for all n > 0, there
exists xn ∈ B(x, 1/n) and yn ∈ B(y, 1/n) such that
yn ≽ xn . It is easy to see that xn → x and yn → y. Thus,
by C2, it follows that y ≽ x, a contradiction. ∎

Two other definitions of continuity are useful.

14 Definition (Continuity — C3): A preference ≽ is
continuous if for all x ∈ X, the sets {y ∈ X ∶ y ≻ x} and
{y ∈ X ∶ x ≻ y} are open. ♢

15 Definition (Continuity — C4): A preference ≽ is
continuous if for all x ∈ X, the sets {y ∈ X ∶ y ≽ x} and
{y ∈ X ∶ x ≽ y} are closed. ♢

16 Exercise: Show that the definitions C1 — C4 are
equivalent. ♢

Our goal is now to prove (at least some part of) Debreu’s
theorem which says that every continuous preference
has a continuous utility representation. We shall not
even state the theorem in complete generality. Moreover,
we shall only show that there exists a utility
representation.¹ We begin with some definitions.

¹ The Open Gap Lemma shows that if such a representation
exists, then a continuous representation also exists. The proof
of this lemma is beyond the scope of these notes.

17 Definition: A set X ⊂ RN is convex if for all x, y ∈ X and
λ ∈ [0, 1], λx + (1 − λ)y ∈ X. ♢


Intuitively, a set is convex if it has no holes or troughs.

18 Definition: A metric space X is connected if there are no
two disjoint (non-empty) open sets O1 and O2 such that
X = O1 ∪ O2 . ♢

19 Exercise: Show that RN+ is convex. Show that it is
connected. What is RN+ ∩ ∅? ♢

For a convex set X, if x, y ∈ X, let [x, y] ∶= {z = λx + (1 − λ)y ∶
0 ⩽ λ ⩽ 1} be the interval joining x and y.

20 Lemma: Let ≽ be continuous and X ⊂ RN be convex. If
x ≻ y, there exists a z ∈ X such that x ≻ z ≻ y. ♢

Proof. Suppose not. Since X is convex, [x, y] ⊂ X. We
will construct two sequences (xt ) and (yt ) by first
letting x0 ∶= x and y0 ∶= y. For any t ∈ N, suppose we
have defined xt and yt such that xt ≽ x and y ≽ yt .
Now, let m ∶= (xt + yt )/2. Since there is no z such that
x ≻ z ≻ y, either m ≽ x or y ≽ m, but not both (as
x ≻ y). If the first is true, let xt+1 ∶= m and yt+1 ∶= yt . If
the second is true, let yt+1 ∶= m and xt+1 ∶= xt . In either
case, xt+1 ≽ x and y ≽ yt+1 .
The sequences (xt ) and (yt ) converge to the same
point, since ∥xt − yt ∥ is halved at each step, so that
∥xt − yt ∥ → 0. Let the common limit point be z. Then,
since ≽ is continuous, z ≽ x and y ≽ z, so that by
transitivity y ≽ x. But this contradicts the hypothesis
that x ≻ y. ∎
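The proof argues by contradiction, but when a continuous utility is at hand the same bisection can be run forward to exhibit z. A numerical sketch (the Cobb-Douglas utility and the two bundles are assumed for illustration):

```python
def u(z):                       # an assumed continuous utility on R^2_+
    return (z[0] * z[1]) ** 0.5

x, y = (4.0, 4.0), (1.0, 0.0)   # u(x) = 4 > u(y) = 0, so x ≻ y
while True:
    m = ((x[0] + y[0]) / 2, (x[1] + y[1]) / 2)  # midpoint of [x, y]
    if u(x) > u(m) > u(y):
        break                   # found z with x ≻ z ≻ y
    elif u(m) >= u(x):
        x = m                   # m is on the x side: shrink from x
    else:
        y = m                   # m is on the y side: shrink from y
print(u(x) > u(m) > u(y))       # True
```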

As is often the case, greater generality leads to a more
transparent proof.


21 Lemma: Let ≽ be continuous and X ⊂ RN be connected.
If x ≻ y, there exists a z ∈ X such that x ≻ z ≻ y. ♢

Proof. Suppose not. Let Oy ∶= {z ∶ z ≻ y} and Ox ∶= {z ∶
x ≻ z}. Then, if there is no z such that x ≻ z ≻ y, Ox and
Oy are disjoint. They are also nonempty since x ∈ Oy
and y ∈ Ox . Since ≽ is continuous, Ox and Oy are also
open. Moreover, X = Ox ∪ Oy , which implies that X is
not connected. But this is a contradiction. ∎

22 Exercise: Let X ⊂ RN be convex. Show that it is
connected (in the relative topology). ♢

Recall that a set Y ⊂ X is dense in X if X is the closure
of Y . In other words, for all x ∈ X and for all ε > 0,
B(x, ε) ∩ Y ≠ ∅. Thus, Q is dense in R, but N is not. A set
X is separable if it has a countable dense subset.

23 Lemma: Assume X ⊂ RN is convex (or connected) and
separable. If ≽ is continuous, there exists a utility
representation. ♢

Proof. Let Y ⊂ X be a countable dense subset. From
above, we know that there exists a utility function
v ∶ Y → (−1, 1) that represents ≽ on Y . For each x ∈ X,
define U(x) ∶= sup{v(z) ∶ z ∈ Y and x ≻ z}.
If there is no z ∈ Y such that x ≻ z (which could happen
if x is the worst element in X or Y ), then let U(x) ∶=
−1. (If z is the minimal element in X, it could be that
U(z) < v(z).) Now to check that U(⋅) represents ≽.
First, suppose that x ∼ y. Then, x ≻ z iff y ≻ z, so that
U(x) = U(y).


Next, suppose that x ≻ y. Then, by the lemmas above,
there exists z ∈ X such that x ≻ z ≻ y. Since ≽ is
continuous, there exists ε > 0 such that x ≻ B(z, ε) ≻ y. Since Y
is dense, there exists z1 ∈ Y ∩ B(z, ε). By the same
arguments, there exists z2 such that x ≻ z1 ≻ z2 ≻ y.
Therefore, U(x) ⩾ v(z1 ) (by the definition of U and since
x ≻ z1 ), v(z1 ) > v(z2 ) (since z1 ≻ z2 ) and v(z2 ) ⩾ U(y)
(since z2 ≻ y and from the definition of U).
We have thus shown that ≽ on X has a utility repres-
entation. ∎



3 Consumer Preferences

In this chapter, we shall make some structural
assumptions on the choice set X and the preference ≽. In
particular, we shall assume throughout that X = RN+ where
N ⩾ 1. The set X has three kinds of structural properties,
namely topological, algebraic and lattice (ie order)
structures. The assumptions on ≽ shall exploit these
properties in turn.
The topological properties of X that will be useful are
completeness and path connectedness. The algebraic
structure that is useful is the fact that X is convex. Fi-
nally, it is easily seen that X admits a partial order,
where we compare elements of X pointwise (ie, coordin-
atewise). To see this order, we require some notation.
Suppose x = (x1 , . . . , xN ). Let us say that
• x ⩾ y if xm ⩾ ym for all m
• x > y if x ⩾ y and x ≠ y. That is, xm ⩾ ym for all m
and xm > ym for some m
• x ≫ y if xm > ym for all m
Thus, ⩾ is a partial (ie, not complete or linear) order on
X.

3.1 Some Assumptions

1 Definition (MON): A preference ≽ is monotone if x ⩾ y
implies x ≽ y, and if x ≫ y implies x ≻ y. ♢


2 Definition (S-MON): A preference ≽ is strongly
monotone if x > y implies x ≻ y. ♢

MON captures the notion that more is better. S-MON
says that more of any good is strictly better. Clearly,
S-MON implies MON. Below is an example distinguishing
between the two.

3 Example: Suppose the preference ≽ is represented by
the utility function u. (i) Let u(x) = min{x1 , x2 }. Then, ≽
is monotone but not strongly monotone. (ii) Let u(x) =
x1 + x2 , so that ≽ is strongly monotone. ♢

As before, we shall assume that ≽ is continuous. By
assuming that ≽ is also monotone, we shall now prove
that there exists a utility representation. The trick is to
show that the diagonal of X is connected and that
preferences restricted to the diagonal have a continuous utility
representation (this is where we use monotonicity). We
can then find for each bundle x ∈ X a unique bundle
on the diagonal that is indifferent to x, thereby giving a
utility function for the entire domain.

4 Proposition: Let ≽ be continuous and satisfy MON.
Then, it has a continuous utility representation. ♢

Proof. Let e = (1, . . . , 1), so that the diagonal is D ∶=
{λe ∶ λ ⩾ 0}. Define the utility function u(λe) = λ on the
diagonal D. Clearly, u is continuous on the diagonal.
Now, consider x ∈ X, and let Mx ∶= (maxn xn , . . . , maxn xn ) ∈
D. By MON, Mx ≽ x ≽ 0. Then, by continuity (the sets
{λ ∶ λe ≽ x} and {λ ∶ x ≽ λe} are closed, nonempty and
cover the connected set [0, maxn xn ]), there exists λx
such that λx e ∼ x. By MON, such a λx is unique.
Therefore, let u(x) = λx .


It is easy to see that u ∶ X → R is well defined. To
see that u is continuous, let c ∈ R. If c < 0, the sets
{y ∈ X ∶ u(y) ⩾ c} = X and {y ∈ X ∶ u(y) ⩽ c} = ∅ are
closed.
Now suppose c ⩾ 0, so that c = u(λe) for some λ ⩾ 0.
Then, {y ∶ u(y) ⩾ c} = {y ∶ y ≽ λe} is closed. Similarly,
{y ∶ u(y) ⩽ c} = {y ∶ λe ≽ y} is closed. Thus, u is
continuous. ∎
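The calibration u(x) = λx in the proof is easy to compute numerically. A sketch (the utility u0 below is an assumed continuous, monotone example): bisect on [0, maxn xn ] for the λ with λe ∼ x.

```python
def u0(z):                      # an assumed continuous, monotone utility
    return min(z[0], 2 * z[1])

def calibrate(x, tol=1e-10):
    lo, hi = 0.0, max(x)        # by MON, (max_n x_n)e ≽ x ≽ 0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if u0((lam, lam)) >= u0(x):
            hi = lam            # λe ≽ x: shrink from above
        else:
            lo = lam            # x ≻ λe: shrink from below
    return (lo + hi) / 2

x = (3.0, 1.0)                  # u0(x) = min(3, 2) = 2
print(abs(calibrate(x) - 2.0) < 1e-6)  # True: 2e = (2, 2) ∼ x
```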

3.1.1 Marginal Rate of Substitution

Before posing the main question of this section, notice
that the extra utility to a consumer of increasing his
consumption of good 1 by an infinitesimal amount at
bundle x is ∂u/∂x1 (x), and is therefore the Marginal
Utility of good 1, denoted by MU1 (x). As before, this
depends on the bundle x. A similar definition applies to
the Marginal Utility of good 2, denoted by MU2 (x).

An extremely important question in consumer theory is
(a version of) the following. Suppose the consumer has
the bundle (5, 3) consisting of 5 apples and 3 bananas.
How many apples is the consumer willing to give up
for an extra banana? This amount is known as the
marginal rate of substitution between bananas and apples
at (5, 3). Now that we have some familiarity with the
calculus, we can make this idea more precise.

Let z∗ ∈ R2+ be a bundle. Consider the indifference class
I(z∗ ). Suppose that for each x1 ⩾ 0, there exists an x2 (x1 )
(which depends on x1 ) such that

u(x1 , x2 (x1 )) = u(z∗ ).


In other words, the graph of x2 (⋅) is the indifference
class I(z∗ ): each point (x1 , x2 (x1 )) lies in I(z∗ ). In
particular, x2 (z∗1 ) = z∗2 .

Suppose we were to increase x1 by a small amount.
How much of good 2 would the consumer be willing
to give up for this extra offer of good 1? Clearly, he
does not want to fall below his original utility level.
Therefore, we can ask the question, how much of good
2 does the consumer have to give up so that he remains
at the same utility level (when given a small amount of
good 1)? The answer to this lies in differentiating the
equation above with respect to x1 . Notice that the right
hand side of the equation is a constant (it is just a fixed
level of utility), so differentiating the right hand side
with respect to x1 gives us 0. On the left hand side, we
have

d/dx1 [u(x1 , x2 (x1 ))] = ∂u/∂x1 (x1 , x2 (x1 )) + ∂u/∂x2 (x1 , x2 (x1 )) ⋅ dx2/dx1 = 0.

We can rearrange this equation as

dx2/dx1 = − [∂u/∂x1 (x1 , x2 (x1 ))] / [∂u/∂x2 (x1 , x2 (x1 ))].

When evaluated at the bundle z∗ , we find that

dx2/dx1 (z∗1 ) = − MU1 (z∗ )/MU2 (z∗ ) =∶ MRS12 (z∗ ).

5 Definition: The marginal rate of substitution (MRS) of
a utility function at the bundle z∗ is denoted by MRS12 (z∗ )
and is given by

MRS12 (z∗ ) = − MU1 (z∗ )/MU2 (z∗ ). ♢
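A numerical sanity check of the formula (the Cobb-Douglas utility, its exponent and the bundle are assumed for illustration): central differences approximate MU1 and MU2 , and −MU1 /MU2 matches the closed form.

```python
a = 0.3
def u(x1, x2):                  # assumed Cobb-Douglas utility
    return x1 ** a * x2 ** (1 - a)

def mrs(x1, x2, h=1e-6):
    mu1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)  # central difference
    mu2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return -mu1 / mu2

z = (5.0, 3.0)
closed_form = -(a * z[1]) / ((1 - a) * z[0])  # -(a x2) / ((1 - a) x1)
print(abs(mrs(*z) - closed_form) < 1e-6)  # True
```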


How do we interpret this geometrically, especially since
we have derived the expression for the marginal rate
of substitution in terms of derivatives? Suppose you
were to draw the indifference class I(z∗ ). Now, draw
the tangent to the curve I(z∗ ) at the point z∗ . The slope
of this line is precisely the marginal rate of substitution
at z∗ .

What properties does the MRS have? Consider the
bundle (5, 3) and an arbitrary utility function u. Then,
the MRS denotes the amount of good 2 that the
consumer is willing to give up for an extra unit of good 1.
Let us suppose this amount is ε. What about the bundle
(6, 3 − ε)? Presumably, since the consumer has less of
good 2 to give up, he doesn’t value an extra unit of
good 1 so much, so the extra amount of good 2 that
he is willing to give up is smaller. This leads us to the
following assumption, stated in terms that are not so
obvious.

6 Assumption: Preferences are convex so that indiffer-


ence curves are either linear or bowed toward the ori-
gin. ♢

Therefore, indifference curves are never bowed away


from the origin. They are either bowed towards the
origin (consider the utility function u(x) = xa1 1 xa2 2 ) or
the indifference curves are linear (for instance, u(x) =
a1 x1 + a2 x2 ). We end with two simple exercises.

7 Exercise: Find the marginal rates of substitution of the


linear, Leontief, Cobb-Douglas and the quasilinear utility
functions given above, at the bundle (1, 2). ♢


8 Exercise: The object of this exercise is to show you


what is not true. A reasonable conjecture might be that
MRS(x1 , x2 ) is decreasing in x1 . Consider the function u(x) = x1^2 x2^4 . Show that the conjecture is not true at any x ≫ 0. ♢

A common economic assumption is that as we move


along an indifference curve, the marginal rate of substi-
tution decreases. This property is implied by the follow-
ing assumptions. First some notation. For each y ∈ X,
let U (y) ∶= {x ∶ x ≽ y} be the upper contour set at y.
Similarly, L (y) ∶= {x ∶ y ≽ x} is the lower contour set at
y.

9 Definition (CONV1): A preference ≽ is convex if x ≽ y


and α ∈ (0, 1) implies αx + (1 − α)y ≽ y. ♢

10 Definition (CONV2): A preference ≽ is convex if the set


U (y) ∶= {x ∶ x ≽ y} is convex for each y ∈ X. ♢

11 Proposition: CONV1 iff CONV2. ♢

Proof. (i) [CONV1 implies CONV2]. Fix y ∈ X, and


suppose a ≽ b ≽ y. Then, αa + (1 − α)b ≽ b ≽ y, so that
αa + (1 − α)b ∈ U (y).

(ii) [CONV2 implies CONV1]. Suppose x ≽ y, so x, y ∈


U (y). Then, since U (y) is convex, αx +(1− α)y ∈ U (y),
ie, αx + (1 − α)y ≽ y. ∎

A stronger assumption is the following.

12 Definition (S-CONV): Preference ≽ is strictly convex if, whenever a, b ∈ U (y), a ≠ b and α ∈ (0, 1), we have αa + (1 − α)b ≻ y. ♢



13 Example: Here are some examples. (i) Let u(x) = √x1 + √x2 . This represents a preference that is S-CONV, and hence CONV. (ii) The preferences represented by u(x) = x1 + x2 and by u(x) ∶= min{x1 , x2 } are CONV, but not S-CONV. ♢

14 Exercise: Show that the lexicographic preference rela-


tion is S-CONV. ♢

Now to a definition regarding utility functions.

15 Definition: The function u ∶ X → R is quasiconcave if


{x ∶ u(x) ⩾ u(y)} is convex for each y ∈ X. ♢

NOTE. The convexity of ≽ does NOT imply that a representing utility u is concave. Quasiconcavity is the best we can do. Another useful property of preferences is the following.

16 Definition (HOM): Preference ≽ is homothetic if x ≽ y


implies αx ≽ αy for all α ⩾ 0. ♢

17 Definition: A function u ∈ R^X is positively homogeneous of degree λ if u(αx) = α^λ u(x) for all α ⩾ 0. ♢

Some consequences. If u represents ≽ and is positively homogeneous of degree λ, then ≽ is homothetic. To see this, notice that for α > 0, x ≽ y iff u(x) ⩾ u(y) iff α^λ u(x) ⩾ α^λ u(y) iff u(αx) ⩾ u(αy) iff αx ≽ αy.
More is true. If ≽ is homothetic, continuous and satisfies MON, then it has a utility representation that is homogeneous of degree 1. To see this, for each x let u(x) be the number λ such that x ∼ λe, where e ∶= (1, . . . , 1). Then, by homotheticity, αx ∼ αλe, so that u(αx) = αλ = αu(x).

A useful observation about homothetic preferences


is the following. Suppose X = R2+ . Then, MRS(x) =


MRS(αx) for all α > 0. In particular, this means that


if x maximises a consumer’s utility when he has w
as wealth, then changing his wealth by a factor of α
changes the utility maximising bundle by a factor of α.
Thus, knowing how the consumer will behave at one
wealth level tells us how he will behave at any other
wealth level.
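This invariance of the MRS along rays through the origin is easy to check numerically; the sketch below (not part of the notes — the degree-one homogeneous utility and the bundle are illustrative assumptions) does so:

```python
# For homothetic preferences, MRS(x) = MRS(alpha * x) for all alpha > 0.
# Illustrative check with u(x) = x1^0.4 * x2^0.6, homogeneous of degree 1.

def u(x1, x2):
    return x1 ** 0.4 * x2 ** 0.6

def mrs(x1, x2, h=1e-6):
    mu1 = (u(x1 + h, x2) - u(x1 - h, x2)) / (2 * h)
    mu2 = (u(x1, x2 + h) - u(x1, x2 - h)) / (2 * h)
    return -mu1 / mu2

x = (1.5, 4.0)
for alpha in (0.5, 2.0, 10.0):
    print(alpha, mrs(*x), mrs(alpha * x[0], alpha * x[1]))  # MRS agrees
```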
A further useful class of preferences are the quasilinear
ones.
18 Definition: Preference ≽ is quasilinear in commodity 1
(the numeraire) if x ≽ y implies x + εe1 ≽ y + εe1 , where
ε > 0 and e1 ∶= (1, 0, . . . , 0). ♢

Quasilinear preferences have a very natural representa-


tion.
19 Proposition: If ≽ is continuous, satisfies S-MON (in
good 1, at least) and is quasilinear in good 1, then it
is represented by a utility function of the form x1 +
v(x2 , . . . , xn ). ♢

Proof. For each (0, x2 , . . . , xn ), there exists v(x2 , . . . , xn )


such that

(v(x2 , . . . , xn ), 0, . . . , 0) ∼ (0, x2 , . . . , xn ).

(Show this!) Therefore, for each bundle x, (x1 +v(x2 , . . . , xn ), 0, . . . , 0) ∼


(x1 , . . . , xn ), and because of S-MON in good 1, x1 +
v(x2 , . . . , xn ) represents ≽. ∎

20 Exercise: Suppose X ∶= R2+ and ≽ is quasilinear in good


1. Let y1 ∈ R such that (0, y1 ) ∼ (x1 , 0) for some x1 ∈ R.
Show that for all y ∈ [0, y1 ), there exists x(y) ∈ R+ such
that (0, y) ∼ (x(y), 0). ♢


3.2 Temporal Prizes



4 Consumer Demand

4.1 The Basic Problem

Here we let X ∶= R^N_+ be the choice set. Preferences are ≽ on X. Let us also assume that the agent has a wealth w. If the goods are available at price p ∈ R^N_{++} , this defines a budget set B(p, w) ∶= {x ∈ X ∶ ⟨x, p⟩ ⩽ w}, where ⟨⋅, ⋅⟩ ∶ R^N_+ × R^N_+ → R is the standard inner product. Clearly, B(p, w) is convex and compact.

1 Definition (Consumer’s Problem – CP): The consumer’s


problem is to find a ≽-best element in B(p, w). ♢

2 Proposition: If ≽ is continuous, then CP has a solu-


tion. ♢

Proof. We shall give two proofs of the proposition.


(i) For each x ∈ B(p, w), let Ux ∶= {y ∈ B(p, w) ∶ y ≽ x} be the upper contour set within the budget set, which is closed by continuity of ≽. Let F ∶= {Ux ∶ x ∈ B(p, w)} be this collection of closed sets. To see that F has the fip, let x1 , . . . , xn be such that x1 ≽ . . . ≽ xn . Then, xi ∈ Uxj whenever i ⩽ j. Thus,

⋂_{j=1}^{n} Uxj ≠ ∅.

Since B(p, w) is compact, it follows that

⋂_{x∈B(p,w)} Ux ≠ ∅.


Let
x∗ ∈ ⋂_{x∈B(p,w)} Ux ,

so that x∗ ≽ y for all y ∈ B(p, w). Thus, x∗ is ≽-maximal.


(ii) For each x ∈ B(p, w), let Lx− ∶= {y ∈ B(p, w) ∶ x ≻ y},
which is open since ≽ is continuous. If there is no ≽-
maximal element, then for each x ∈ B(p, w), there
exists z ∈ B(p, w) such that x ∈ Lz− . Thus,

B(p, w) = ⋃_{x∈B(p,w)} Lx− ,

ie, B(p, w) has an open cover. Since B(p, w) is com-


pact, there is a finite subcover. That is, there exist x1 , . . . , xn , where x1 ≽ . . . ≽ xn , such that B(p, w) = Lx−1 ∪ ⋅ ⋅ ⋅ ∪ Lx−n .
But this means that x1 ≻ y for all y ∈ B(p, w), a contra-
diction. ∎

3 Proposition: If ≽ is Convex, then the set of solutions to CP (if non-empty) is convex. ♢

Proof. Let x, y be ≽-maximal. Then, x ∼ y, so that for


all α ∈ (0, 1), αx + (1 − α)y ≽ x ≽ z for all z ∈ B(p, w). ∎

4 Proposition: If ≽ is Strongly Convex, the solution (if it


exists) is unique. ♢

Proof. Let x ≠ y be solutions. Then x ∼ y, so that by strict convexity, B(p, w) ∋ (1/2)x + (1/2)y ≻ x, a contradiction. ∎


4.2 Differentiable Utility Functions

4.3 Demand Functions

Assume that ≽ is Strongly Convex, so that the unique solution to CP is x(p, w), where x ∶ R^N_{++} × R_+ → R^N_+ . The function x(⋅, ⋅) is referred to as the demand function. We shall explore some properties of the demand function.

5 Proposition: For all λ > 0, x(p, w) = x(λp, λw). ♢

Proof. Notice that B(p, w) = B(λp, λw), so the result


follows immediately. ∎

QUESTION: Is the proposition above behaviourally


plausible?

6 Proposition (Walras’ Law): If ≽ satisfies MON, then


⟨p, x(p, w)⟩ = w, ie, x(p, w) lies on the boundary. ♢

Proof. If not, ⟨p, x⟩ < w, so there exists ε > 0 such that


x+ε(1, . . . , 1) ∈ B(p, w). (Why?) By MON, x+ε(1, . . . , 1) ≻
x, which is a contradiction. ∎
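Both properties — homogeneity of degree zero in (p, w) and Walras’ Law — are visible at a glance in the Cobb-Douglas demand system x_i(p, w) = α_i w/p_i (cf. Example 13 below). The sketch below (in Python, with illustrative numbers that are not from the notes) checks them:

```python
# Cobb-Douglas demand x_i(p, w) = alpha_i * w / p_i satisfies
# (i)  x(p, w) = x(lam * p, lam * w)  (homogeneity of degree zero), and
# (ii) <p, x(p, w)> = w               (Walras' Law).

def demand(p, w, alpha=(0.2, 0.3, 0.5)):
    return [a * w / pi for a, pi in zip(alpha, p)]

p, w, lam = [1.0, 2.0, 4.0], 10.0, 3.0
x = demand(p, w)
x_scaled = demand([lam * pi for pi in p], lam * w)
print(x)         # [2.0, 1.5, 1.25]
print(x_scaled)  # the same bundle
print(sum(pi * xi for pi, xi in zip(p, x)))  # 10.0, all wealth spent
```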

7 Proposition: If ≽ is continuous, then x(p, w) is continuous in prices. ♢

Proof. If we have a utility representation, we can


directly appeal to the Theorem of the Maximum,
which says that if u(x, p) is continuous, then f(p) ∶=
arg maxx u(x, p) is continuous in p. We shall provide a
more direct proof here.


Assume not, so there is a sequence (pn ) with pn → p∗ , x(p∗ , w) = x∗ and x(pn , w) =∶ xn ↛ x∗ . Passing to a subsequence if necessary, there exists ε > 0 such that ∥xn − x∗ ∥ > ε for all n.
Since (pn ) is a convergent sequence, the set {pn ∶ n ∈
N} is bounded. Moreover, since pn ≫ 0 and p∗ ≫ 0,
there exists m > 0 such that pn ≫ (m, . . . , m) for all n.
Therefore, xn ∈ [0, w/m]N . Since [0, w/m]N is compact,
the sequence (xn ) has a convergent subsequence. Re-
naming the subsequence if necessary, let us say that
xn → y∗ ≠ x∗ .

Recall the following continuity property of inner


product spaces. If (xn ) and (yn ) are sequences in Rn
that converge to x and y respectively, then ⟨xn , yn ⟩ →
⟨x, y⟩.
Now, for all n, ⟨pn , xn ⟩ ⩽ w, so that ⟨p∗ , y∗ ⟩ ⩽ w, ie,
y∗ ∈ B(p, w). But x∗ ≻ y∗ (because of Strong Convex-
ity). Let Nε (z) denote the ε ball around z ∈ X. Since
≽ is continuous, there exists an ε such that Nε (x∗ ) ∩
Nε (y∗ ) = ∅ and for any z ∈ Nε (x∗ ) and z ′ ∈ Nε (y∗ ),
z ≻ z ′ . Moreover, there exists M such that for any
n > M, xn ∈ Nε (y∗ ).

Let δ > 0 be such that z ∶= x∗ − δ(1, . . . , 1) ∈ Nε (x∗ ). Since pn ≫ (m, . . . , m), we have ⟨pn , z⟩ = ⟨pn , x∗ ⟩ − δ ∑i p_n^i < ⟨pn , x∗ ⟩ − δNm, and since ⟨pn , x∗ ⟩ → ⟨p∗ , x∗ ⟩ = w, it follows that ⟨pn , z⟩ < w for all n large enough, ie, z ∈ B(pn , w). Moreover, z ≻ xn for all n > M, which contradicts the optimality of xn . Therefore, x(p, w) is continuous. ∎

8 Exercise: Show that x(p, w) is continuous in w. ♢


4.4 The Lagrangian with Applications

In this section, we shall see some heuristics that de-


scribe how we can fruitfully use the method of the Lag-
rangian in the consumer’s problem. As before, consider
the consumer’s problem,

(CP) max_{x ∈ R^2_+} u(x) such that x ∈ B(p, w).

Such a problem is called a constrained optimisation


problem, since the consumer wants to pick the optimal
bundle (hence the optimisation) from a budget set
(which represents his constraint). Notice that if there
were no budget constraint, the consumer would want
more of everything. We shall now see a very general
method of solving constrained optimisation problems.

Let x ∈ R2 and let f ∶ R2 → R be a function that we


want to maximise. The constraint is that the solution
must satisfy g(x) = c for some constant c ∈ R, where g ∶
R2 → R is a function. We shall refer to f as the objective
function. In other words, we want to solve the problem

max_{x ∈ R^2} f(x) such that g(x) − c = 0.

We shall assume that both f and g are differentiable,


which means that they have all the partial derivatives
that are of interest to us.

Lagrange’s Method. Let us define the function

L (x, λ) ∶= f(x) + λ[c − g(x)]

called the Lagrangian after Joseph Louis Lagrange, the


inventor of the method. The real number λ > 0 is called
the Lagrange multiplier. It has not yet been determined


(we get to choose it), but we know that it is positive.


Notice that L is a function from R2 × R+ to R. Suppose
we want to maximise the function L , which, by con-
struction, has no constraints. In other words, we are
free to choose any (x, λ) ∈ R2 × R+ that maximises L .
(Of course, that λ cannot be negative is a constraint, but
it is one we shall ignore, since we know it will always
be satisfied, with the assumptions we have made.) An
obvious first step is to differentiate L and write down
the three first order conditions. Denote the partial deriv-
atives of L by

Lj ∶= ∂L /∂xj , Lλ ∶= ∂L /∂λ.

Then,
Lj (x, λ) = fj (x) − λgj (x),
and
Lλ = c − g(x).
The first-order necessary conditions require that we set

Lj (x, λ) = 0 for j = 1, 2, and Lλ (x, λ) = 0.

We can now state Lagrange’s Theorem.

Lagrange’s Theorem. Suppose f, g and L are


as described above. If x∗ maximises f(x) sub-
ject to g(x∗ ) = c with no other constraints
(such as non-negativity, unless they automat-
ically hold), and if gj (x∗ ) ≠ 0 for at least one
j, then there is a value of λ > 0 such that
(9) Lj (x∗ , λ) = 0 for j = 1, 2, and Lλ (x∗ , λ) = 0.

What does the theorem tell us? It says that if we know


the solution to the original problem, of maximising f(x)


such that g(x) = c, then there exists a λ such that the


equations in (9) hold. This doesn’t seem very useful,
but this isn’t how we shall use Lagrange’s Theorem.

What we shall do is the following. We shall maximise


L as above. Then, we shall see which solutions of the
new maximisation problem are solutions to the original
problem. In all the problems we shall encounter, this
second step will be unnecessary, in that if (x∗ , λ) max-
imises L (x, λ), then it will turn out that x∗ maximises
f(x∗ ) while satisfying g(x∗ ) = c.

This is a truly amazing leap. We started out with the


problem of maximising a function with some constraint
set that could have been very complicated. By attach-
ing the Lagrange multiplier, we turn it into an uncon-
strained problem which is very easy to solve: we just
look at the first order conditions.

Why Does It Work? Why does this mysterious method


work? The secret lies in determining λ, which tells us
the value of violating the constraint. Indeed, let λ be
any positive real number. Then, maximising the Lag-
rangian is a simple unconstrained problem. If we ex-
ceed the budget or feasibility constraint, ie if g(x) > c,
then the penalty paid for exceeding the constraint is
λ(c − g(x)) < 0. By choosing the λ carefully, we can guar-
antee that the feasibility constraint is just met. If it is
violated, then a penalty is paid, as above, and the value
of the Lagrangian is lowered. This careful balancing act
is very subtle in theory, but is the easiest thing to use
in practice. The λ in the theorem, the optimal λ, is inter-
preted as the marginal value of violating the constraint.
In the consumer’s context, it is called the shadow price
of wealth. Let us consider a simple example.


10 Example: Let the consumer have utility function u(x) ∶=


α ln(x1 ) + β ln(x2 ) and wealth w, and suppose prices are
p = (p1 , p2 ).

[In the notation introduced above, u is the function f, w


is c and g(x) = p1 x1 + p2 x2 .]

We write the Lagrangian as

L (x, λ) = α ln(x1 ) + β ln(x2 ) + λ[w − p1 x1 − p2 x2 ].

Then, the first order conditions become

∂L /∂x1 = α/x1 − λp1 = 0


∂L /∂x2 = β/x2 − λp2 = 0

and
∂L /∂λ = w − p1 x1 − p2 x2 = 0.
Notice that the last equation is just the budget constraint.
From the first two equations, we see that

α/(p1 x1 ) = β/(p2 x2 ) = λ.

In other words,

p1 x1 = (α/β) p2 x2 .

Substituting this into the budget equation and solving for p2 x2 , we get

p2 x2 = [β/(α + β)] w,

so that

p1 x1 = [α/(α + β)] w,

which means that

λ = (α + β)/w.
Notice that with the Cobb-Douglas utility function, the
consumer spends a constant share of his wealth on each


good. This is a very useful property in theoretical mod-


els, the kind we shall see in this course.

One final comment. The number λ turns out to be the


marginal value of wealth. Intuitively, it is the penalty (in
utility terms) that the consumer must pay if he exceeds
his budget by a single dollar. ♢
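As a sanity check on the algebra (not part of the original notes — the parameter values are illustrative assumptions), the sketch below plugs numbers into the closed-form solution and verifies the first order conditions:

```python
# Plugging illustrative numbers into the closed-form solution of
# Example 10: p1*x1 = a*w/(a+b), p2*x2 = b*w/(a+b), lambda = (a+b)/w.

a, b = 2.0, 3.0
p1, p2, w = 4.0, 5.0, 100.0

x1 = a * w / ((a + b) * p1)
x2 = b * w / ((a + b) * p2)
lam = (a + b) / w

# The first order conditions and the budget constraint all hold:
print(a / x1 - lam * p1)      # 0.0
print(b / x2 - lam * p2)      # 0.0
print(p1 * x1 + p2 * x2 - w)  # 0.0
```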

Now for some exercises.

11 Exercise: Suppose prices are p = (p1 , p2 ) and the con-


sumer has wealth w. Find the consumer’s optimal bundle
if his utility function is (i) u(x) = x1^α x2^β and (ii) u(x) = x1 − e^{−x2} . ♢

4.5 Rationalisable Demand

Suppose we are given a function x(p, w). Under what


conditions can we say that the function x(p, w) is the de-
mand function for some preference? In this section, we
seek to impose some conditions on the demand func-
tion that ensure that the function is indeed the demand
function for some preference relation. Thus, we say that
the function x(p, w) is rationalised by the preference ≽.
More specifically,

12 Definition: The preference ≽ fully rationalises x(p, w) if


for all (p, w), x(p, w) is the unique ≽-maximal element of
B(p, w). ♢


Unfortunately, we shall not see a general answer to this


question. Necessary and sufficient conditions are bey-
ond the scope of this course. Nevertheless, we shall find
necessary and sufficient conditions when there are only
two goods, and show that these conditions are neces-
sary, but not sufficient in general.
Two obvious necessary conditions are that (i) x(p, w) is
positively homogeneous of degree zero, and (ii) x(p, w)
satisfies Walras’ Law, ie, all the wealth is expended.
If an agent wants to signal to others his wealth or social
status, then he may, for instance, always purchase the
more expensive good. Such induced preferences cannot
be rationalised in our model of consumer behaviour.
Now consider the following example.

13 Example: Suppose x(p, w) = (α1 w/p1 , . . . , αn w/pn ), where αi > 0 for all i = 1, . . . , n and ∑i αi = 1. Then, it is clear that the preference represented by the (Cobb-Douglas) utility function

u(x) ∶= ∏_{i=1}^{n} xi^{αi}

rationalises x(p, w). ♢
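A quick numerical sanity check of this claim (a crude random search with illustrative, made-up parameters; not a proof): the candidate bundle xi = αi w/pi should do at least as well as any randomly drawn bundle on the budget hyperplane.

```python
# Check that x_i = alpha_i * w / p_i maximises u(x) = prod_i x_i^alpha_i
# over the budget set, by comparing it against random feasible bundles.
import random

random.seed(0)
alpha = (0.5, 0.3, 0.2)
p, w = (2.0, 1.0, 4.0), 60.0

def u(x):
    out = 1.0
    for xi, ai in zip(x, alpha):
        out *= xi ** ai
    return out

xstar = [ai * w / pi for ai, pi in zip(alpha, p)]

best_sampled = 0.0
for _ in range(50_000):
    # A random point on the budget hyperplane <p, x> = w.
    shares = [random.random() for _ in p]
    total = sum(shares)
    x = [w * s / (total * pi) for s, pi in zip(shares, p)]
    best_sampled = max(best_sampled, u(x))

print(u(xstar) >= best_sampled)  # True: no sampled bundle beats x*
```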

Notice that for each (p, w), the budget set B(p, w) is a
choice set. Therefore, the set of all choice sets is para-
metrised by (p, w), which is a generalisation of the
choice theoretic model considered earlier. Thus, it seems
reasonable that a necessary condition for a function to
be rationalisable is that it satisfy the weak axiom of re-
vealed preference (WARP). We shall now impose the
weak axiom in the budget setting.


14 Definition: A function x(p, w) satisfies WARP if ⟨p, x(p ′ , w ′ )⟩ ⩽


w and x(p, w) ≠ x(p ′ , w ′ ) implies ⟨p ′ , x(p, w)⟩ > w ′ . ♢

In words, if x(p ′ , w ′ ) is affordable at (p, w) but is not


optimal, then for the function to be rationalisable, it
must be that x(p, w) is unaffordable at (p ′ , w ′ ).

[INSERT PICTURE]

WARP (along with positive homogeneity of degree zero and Walras’ Law) is necessary and sufficient for a function to be rationalisable when there are only two goods, but is only necessary in general. For the general case, the following axiom is used instead.

15 Definition: Let (xn )_{n=1}^{N} be bundles and (B(pn , wn ))_{n=1}^{N} be budget sets such that for all 1 < n ⩽ N, xn−1 ≠ xn , and for all n, xn ∶= x(pn , wn ). The function x(p, w) satisfies the Strong Axiom of Revealed Preference (SARP) if, whenever xn+1 ∈ B(pn , wn ) for all 1 ⩽ n < N, we have x1 ∉ B(pN , wN ). ♢

The Strong Axiom (SARP) says that any finite collection of bundles can be effectively rationalised. This provides us with a revealed preference relation that is transitive, but not complete. To extend it to a complete preference, we need to use Zorn's Lemma. We shall now see that the weak
axiom is nevertheless useful in that we can prove a Law
of Compensated Demand. In effect, we shall compare
the demand where prices move from p to p ′ , wealth
moves from w to w ′ , and the original optimal bundle
x(p, w) is still just affordable at the new price, so that
⟨p ′ , x(p, w)⟩ = w ′ .


4.5.1 Law of Demand

We can now prove a (compensated) law of demand.


If x(p, w) is a demand function, we are interested in
the effect of (i) a change in price from p to p ′ and (ii)
a change in wealth from w to w ′ , while keeping the
optimal bundle at (p, w), namely x(p, w), still affordable
at the new budget set.

16 Proposition: Let x(⋅, ⋅) be a demand function that sat-


isfies Walras’ Law and WARP. If w = ⟨p, x(p ′ , w ′ )⟩, then
either x(p, w) = x(p ′ , w ′ ) or

⟨p ′ − p, x(p ′ , w ′ ) − x(p, w)⟩ < 0. ♢

Proof. Assume x(p, w) ≠ x(p ′ , w ′ ). Then,

⟨p ′ − p, x(p ′ , w ′ ) − x(p, w)⟩
  = ⟨p ′ , x(p ′ , w ′ )⟩ − ⟨p ′ , x(p, w)⟩ − ⟨p, x(p ′ , w ′ )⟩ + ⟨p, x(p, w)⟩
  = w ′ − ⟨p ′ , x(p, w)⟩
  < 0,

where ⟨p ′ , x(p ′ , w ′ )⟩ = w ′ and ⟨p, x(p, w)⟩ = w by Walras’ Law, ⟨p, x(p ′ , w ′ )⟩ = w by hypothesis, and the final inequality follows from WARP: if ⟨p, x(p ′ , w ′ )⟩ ⩽ w and x(p, w) ≠ x(p ′ , w ′ ), it must be that ⟨p ′ , x(p, w)⟩ > w ′ . ∎

We can use this result to prove a Law of Demand.

17 Corollary (Law of Demand): Suppose p ′ − p = (pi′ − pi )ei


where pi′ − pi > 0 and ei ∶= (0, . . . , 1, . . . , 0), and suppose
the hypotheses of Proposition 16 are satisfied. Then,
xi (p ′ , w ′ ) − xi (p, w) < 0. ♢


18 Exercise: Consider the following bundles xi and prices pi , i = 1, 2. Verify whether they satisfy WARP.

(a) p1 = (1, 3), x1 = (4, 2); p2 = (3, 5), x2 = (3, 1).


(b) p1 = (1, 2), x1 = (3, 1); p2 = (2, 2), x2 = (1, 2). ♢
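Exercises of this sort can be checked mechanically. Below is a small checker (a sketch; it assumes, via Walras’ Law, that wealth in each observation is wi = ⟨pi , xi ⟩):

```python
# WARP checker for a pair of price-bundle observations, with wealth
# inferred from Walras' Law: w_i = <p_i, x_i>. WARP fails exactly when
# the two (distinct) bundles are each affordable in the other's budget.

def dot(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))

def satisfies_warp(p1, x1, p2, x2):
    w1, w2 = dot(p1, x1), dot(p2, x2)
    if x1 == x2:
        return True
    return not (dot(p1, x2) <= w1 and dot(p2, x1) <= w2)

print(satisfies_warp((1, 3), (4, 2), (3, 5), (3, 1)))  # part (a)
print(satisfies_warp((1, 2), (3, 1), (2, 2), (1, 2)))  # part (b)
```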

19 Exercise: Suppose there are only two goods, and that a


consumer’s choice function x(p, w) satisfies Walras’ Law.
Show that if x(p, w) satisfies WARP, then the induced re-
vealed preference relation ≽ has no intransitive cycles. ♢



5 Duality

5.1 Induced Preferences over Choice Sets and


Indirect Utility

Let X be a choice set (not necessarily finite), and ≽ a


preference on X. Let C be a class of subsets of X. For
instance, if X is finite, then C could be the space of
all non-empty subsets of X. Let c(⋅; ≽) be the induced
choice function. Given the induced choice function, we
have an induced preference relation on C that we shall
denote as ≽∗ , defined as follows: for A, B ∈ C ,

A ≽∗ B if and only if c(A; ≽) ≽ c(B; ≽).

We shall make the standard assumptions about the pref-


erence ≽, thereby ruling out
• Temptation: {b} ≻ {b, c} ≻ {c}
• Guilt: {a} ≻ {a, b}, {b}
If u represents the preference ≽, then we can say that
the function v represents ≽∗ , wherein

v(A) ∶= u(c(A; ≽))

for all A ∈ C . We shall refer to v as the indirect utility


function. Notice that if A ≻∗ A ∖ {a}, then c(A; ≽) = {a}.
In what follows, we shall take X ∶= RN + , and let C be the
collection of all budget sets in X.


5.1.1 Preferences Over Budget Sets

Recall that a budget set is B(p, w) ⊂ R^N_+ , where (p, w) ∈ R^N_{++} × R_{++} . Thus, the space of all budget sets, which we referred to as C above, is (isomorphic to) R^N_{++} × R_{++} . An important property of convex sets is that if A, B ⊂ R^N are convex, then αA + (1 − α)B is also convex for all α ∈ [0, 1], where

αA + (1 − α)B ∶= {αx + (1 − α)y ∶ x ∈ A, y ∈ B}.

Thus, the space of budget sets is convex. It is useful to


note that this is also true in our space of parametrised
budget sets, ie, the isomorphism alluded to above pre-
serves convexity.

1 Exercise: Let A, B ⊂ R^N be convex. Show that αA + (1 − α)B is also convex for α ∈ (0, 1). ♢

2 Exercise: Let B(p, w) and B(p ′ , w ′ ) be budget sets. Con-


sider the claim that, For any α ∈ (0, 1),

αB(p, w)+(1− α)B(p ′ , w ′ ) = B(αp +(1− α)p ′ , αw +(1− α)w ′ ).

Is this true? Prove or provide a counterexample. ♢

Let x(p, w) be a demand function for some preference ≽.


Then,

(p, w) ≽∗ (p ′ , w ′ ) if and only if x(p, w) ≽ x(p ′ , w ′ ).

We now examine some of the properties of the induced


preference ≽∗ .


(a) ≽∗ is homothetic. To see this, let λ > 0. Then,


(λp, λw) ∼∗ (p, w), since B(λp, λw) = B(p, w).
(b) ≽∗ is monotone, in the sense that B(p ′ , w ′ ) ⊂ B(p, w)
implies that (p, w) ≽∗ (p ′ , w ′ ). Thus, ≽∗ is non-
increasing in pn and increasing in w.
(c) ≽∗ is continuous if ≽ is continuous. To see this, no-
tice that if ≽ is continuous, then x(p, w) is continu-
ous, so that u(x(p, w)) is continuous, ie, v(p, w) is
continuous, so that ≽∗ is continuous.
(d) Indirect utility is quasiconvex. In other words, if (p, w) ≽∗ (p ′ , w ′ ), then (p, w) ≽∗ α(p, w) + (1 − α)(p ′ , w ′ ) for all α ∈ [0, 1].

To see this, suppose (p, w) ≽∗ (p ′ , w ′ ), and let z ∶=


x(αp + (1 − α)p ′ , αw + (1 − α)w ′ ). Then,

⟨αp + (1 − α)p ′ , z⟩ ⩽ αw + (1 − α)w ′ .

Therefore, either ⟨p, z⟩ ⩽ w or ⟨p ′ , z⟩ ⩽ w ′ must hold (else the displayed inequality cannot hold). In the first case x(p, w) ≽ z; in the second, x(p ′ , w ′ ) ≽ z, and since x(p, w) ≽ x(p ′ , w ′ ), again x(p, w) ≽ z. Either way, (p, w) ≽∗ α(p, w) + (1 − α)(p ′ , w ′ ).

Remark. Notice that in showing the quasiconvexity


of indirect utility, we have not made any assumptions
about the underlying preference ≽.
We also have Roy's Identity:

(∂v/∂pn )(p∗ , w∗ ) / (∂v/∂w)(p∗ , w∗ ) = −xn (p∗ , w∗ ).
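Roy's Identity can be verified numerically in a concrete case. The sketch below (not part of the notes — the Cobb-Douglas form and the expenditure share a are illustrative assumptions) differentiates the indirect utility by finite differences:

```python
# Numerical check of Roy's identity for Cobb-Douglas:
#   v(p, w) = u(x(p, w)) with u(x) = x1^a * x2^(1-a),
#   x1 = a*w/p1, x2 = (1-a)*w/p2, and Roy gives
#   -(dv/dp1)/(dv/dw) = x1(p, w).

a = 0.4

def v(p1, p2, w):
    x1, x2 = a * w / p1, (1 - a) * w / p2
    return x1 ** a * x2 ** (1 - a)

def roy_x1(p1, p2, w, h=1e-6):
    dv_dp1 = (v(p1 + h, p2, w) - v(p1 - h, p2, w)) / (2 * h)
    dv_dw = (v(p1, p2, w + h) - v(p1, p2, w - h)) / (2 * h)
    return -dv_dp1 / dv_dw

p1, p2, w = 2.0, 5.0, 30.0
print(roy_x1(p1, p2, w))  # about 6.0
print(a * w / p1)         # 6.0, the Marshallian demand for good 1
```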


5.2 The Dual Problem

Let us refer to the following as the Primal Problem, de-


noted by P(p, y):

Find a ≽-maximal x ∈ B (p, ⟨p, y⟩), ie, solve


max_x {x ∶ ⟨p, x⟩ ⩽ ⟨p, y⟩}.

We shall refer to the following as the Dual Problem de-


noted by D(p, y):

Find x that minimises ⟨p, x⟩ such that x ≽ y, ie, solve


min_x {⟨p, x⟩ ∶ x ≽ y}.

The primal problem is also referred to as the Consumer’s


Problem (CP). The dual problem is also referred to as
the Expenditure Minimisation Problem (EMP). The fol-
lowing theorem says that knowing the solution to one
tells us the solution of the other.

3 Theorem: (a) If x∗ solves P(p, x∗ ), it solves D(p, x∗ ). (b)


If x∗ solves D(p, x∗ ), it solves P(p, x∗ ). ♢
Proof. (a) Assume x∗ solves P(p, x∗ ). If x∗ does not solve D(p, x∗ ), then there exists x such that x ≽ x∗ and ⟨p, x⟩ < ⟨p, x∗ ⟩. Then, there exists ε > 0 such that ⟨p, x + ε1⟩ < ⟨p, x∗ ⟩ (where 1 = (1, . . . , 1) ∈ R^N ). But then, x + ε1 ≻ x ≽ x∗ by MON, contradicting the assumption that x∗ solves P(p, x∗ ).
(b) Assume x∗ solves D(p, x∗ ). If x∗ does not solve
P(p, x∗ ), then there exists x such that x ≻ x∗ and
⟨p, x⟩ ⩽ ⟨p, x∗ ⟩. Therefore, by CONtinuity of ≽, there
exists ε > 0 such that x − ε1 ≻ x∗ , and ⟨p, x − ε1⟩ <
⟨p, x∗ ⟩, which contradicts the assumption that x∗
solves D(p, x∗ ). ∎


Remark. Notice that for the theorem to be true, we


have to make some assumptions, ie, we don’t get it
for free. In particular, for (a) to be true, we need that
≽ satisfy MON. For (b) to be true, we need that ≽ be
CONtinuous. These are the only assumptions that are
needed.

5.3 Hicksian Demand

Assume that the dual problem D(p, u) has a unique


solution. This unique solution is referred to as the Hick-
sian demand function, denoted by h(p, u). We shall now
explore some properties of Hicksian demand.

(a) h(λp, y) = h(p, y) for λ > 0. To see this, notice that λ ⟨p, x⟩ = ⟨λp, x⟩. Therefore, x∗ solves min_x {⟨p, x⟩ ∶ x ≽ y} if and only if x∗ solves min_x {λ ⟨p, x⟩ ∶ x ≽ y} if and only if x∗ solves min_x {⟨λp, x⟩ ∶ x ≽ y}.
(b) hk (p, y) is non-increasing in pk . To see this, notice that for any other p ′ , ⟨p, h(p, y)⟩ ⩽ ⟨p, h(p ′ , y)⟩. (This is because h(p, y) is the cheapest way
to be at least as good as y, which means that al-
though h(p ′ , y) is at least as good as y, it can-
not be strictly cheaper.) Similarly, ⟨p ′ , h(p ′ , y)⟩ ⩽
⟨p ′ , h(p, y)⟩. Thus,

⟨p − p ′ , h(p, y) − h(p ′ , y)⟩ = ⟨p, h(p, y) − h(p ′ , y)⟩


+ ⟨p ′ , h(p ′ , y) − h(p, y)⟩
⩽ 0.

When p − p ′ = (0, . . . , ε, . . . , 0), we see that hk (p, y) −


hk (p ′ , y) ⩽ 0. This result looks a lot like the law of
demand that we encountered earlier. This result is


the reason that the Hicksian demand function is


also referred to as the compensated demand func-
tion.
(c) h(p, y) is continuous in p. (You should check this.)

5.4 Expenditure Function

The expenditure function, given by e(p, y) ∶= ⟨p, h(p, y)⟩, is the cost or the value of the Hicksian demand function. Intuitively, e(p, y) is the smallest amount that has
to be spent to get a bundle at least as good as y. The
following are properties of the expenditure function. A
lot of the properties follow from properties of the Hick-
sian demand function.

(a) e(λp, y) = λe(p, y). To see this, notice that e(λp, y) =


⟨λp, h(λp, y)⟩ = ⟨λp, h(p, y)⟩ = λ ⟨p, h(p, y)⟩ = λe(p, y),
since h(λp, y) = h(p, y).
(b) e(p, y) is non-decreasing in pk and strictly increas-
ing in yk .
(c) e(p, y) is continuous in p (since h(p, y) is continu-
ous in p).
(d) e(p, y) is concave in p. To see this, let x ∶= h(λp +
(1 − λ)p ′ , y), so that, from the definition of the
expenditure function, e(p, y) ⩽ ⟨p, x⟩ and e(p ′ , y) ⩽
⟨p ′ , x⟩. Thus,

e(λp + (1 − λ)p ′ , y) = ⟨λp + (1 − λ)p ′ , x⟩


= λ ⟨p, x⟩ + (1 − λ) ⟨p ′ , x⟩
⩾ λe(p, y) + (1 − λ)e(p ′ , y).


4 Proposition (Shephard’s Lemma): Suppose u(⋅) is continuous, monotone, and strictly concave, and represents a preference ≽ on X ∶= R^n_+ . Then, for all p ≫ 0 and u, Hicksian demand h(p, u) = Dp e(p, u). ♢

Proof. Recall that the Expenditure Minimisation Problem is

(EMP) min ⟨p, x⟩ such that u(x) ⩾ u.

Let V (pk ) be the value function for this problem, for fixed values of the other prices p−k . The Lagrangian for (EMP) is

L (x, λ) = ⟨p, x⟩ − λ[u(x) − u].

Then, by the envelope theorem (or more precisely, by a corollary to the envelope theorem), we have

∂V /∂pk = (∂L /∂pk )(x∗ , λ∗ ) = x∗k ,

where x∗ is the solution to (EMP), ie, x∗ = h(p, u). Since V (pk ) = e(p, u) for the given p−k , it follows that hk (p, u) = (∂e/∂pk )(p, u), ie, h(p, u) = Dp e(p, u). ∎

We have the following results.

Duality Theorem:


(i) x(p, w) = h(p, x(p, w))


(ii) h(p, y) = x(p, e(p, y)).
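The Duality Theorem can be illustrated numerically. The sketch below (not part of the notes — the utility function, prices, and wealth are assumptions) solves CP and EMP by elementary one-dimensional golden-section search and checks that the solutions coincide:

```python
# Illustrative check of the Duality Theorem for u(x) = x1^0.5 * x2^0.5:
#   (i)  x(p, w) = h(p, x(p, w))   and   e(p, x(p, w)) = w.
# Both searches below are one-dimensional golden-section minimisations.

PHI = (5 ** 0.5 - 1) / 2

def golden(f, lo, hi, iters=200):
    # Minimise a unimodal function f on [lo, hi].
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - PHI * (b - a), a + PHI * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def u(x1, x2):
    return x1 ** 0.5 * x2 ** 0.5

def marshallian(p, w):
    # CP: maximise u along the budget line x2 = (w - p1*x1)/p2.
    p1, p2 = p
    x1 = golden(lambda t: -u(t, (w - p1 * t) / p2), 1e-9, w / p1)
    return (x1, (w - p1 * x1) / p2)

def hicksian(p, y):
    # EMP: minimise cost along the indifference curve x2 = u(y)^2 / x1.
    p1, p2 = p
    ubar = u(*y)
    x1 = golden(lambda t: p1 * t + p2 * ubar ** 2 / t, 1e-6, 1e6)
    return (x1, ubar ** 2 / x1)

p, w = (2.0, 8.0), 64.0
x = marshallian(p, w)           # approximately (16, 4)
hx = hicksian(p, x)             # again (16, 4), as the theorem predicts
e = p[0] * hx[0] + p[1] * hx[1]
print(x, hx, e)                 # e is (numerically) w
```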

The Slutsky Matrix is the matrix S(p, w) whose (i, j)-th entry is

sij (p, w) ∶= (∂xi /∂pj )(p, w) + xj (p, w)(∂xi /∂w)(p, w).

The Slutsky matrix is symmetric. This follows from a fundamental property known as the Slutsky Equation:

(∂xi /∂pj )(p, w) = (∂hi /∂pj )(p, u) − xj (p, w)(∂xi /∂w)(p, w),

where u is the utility level attained at x(p, w). In other words, S(p, w) = Dp h(p, u), which is symmetric because it is the Hessian D2p e(p, u) of the expenditure function.


6 Comparative Statics

Comparative statics is the lifeblood of an economist. It is also known as sensitivity analysis in the engineering sciences. The basic question can be phrased as follows:
sciences. The basic question can be phrased as follows:

Suppose we can find some variable y that is


endogenous to the economic model in ques-
tion as a function of some fixed parameter x.
What happens when you change x?

Quite often, the function y will be the result of some


optimisation, so it may be the solution to some first
order conditions. For instance, in consumer theory, y is
the demand function and x may be a vector of prices,
or the agent’s wealth level. In producer theory, y could
be the supply function and x could be a price vector or
a parameter governing the production technology. In
such a case, we may use the implicit function theorem
to answer the question. For instance, we write

f(x0 , y0 ) = z0

which may represent a first order condition. We want a function y = g(x) satisfying f(x, g(x)) = z0 near (x0 , y0 ), and want to find the derivative Dg. Dg tells us the sign of the change in y as well as the rate of that change. However, this approach has the
following shortfalls.
• It assumes that the solution y is sufficiently differ-
entiable.


• It only holds for sufficiently small changes in x. In


particular, it says nothing about larger changes in
x, which may be the question we have in mind if,
for instance, x represents a tax rate that is to be
raised.
• It assumes that the solution to the optimisation
problem is unique, which may not always be the
case.
• Sufficient conditions that ensure the Implicit Func-
tion Theorem is applicable are either unintuitive or
extremely strong. (See the example below.)
Consider the following example.

1 Example: Suppose the profit of a firm is π(q, p) = pq −


c(q), where p is the price of the good which the firm
takes as given, q is the quantity produced, and c(q) is
the cost of production. For each price p, let q(p) be the
profit maximising level of output. The question is, What
happens to q(p) if p rises? In the standard approach
that uses the Implicit Function Theorem, to answer this
question, we need to make a lot of assumptions about
c. As we will see below, these assumptions are neither
warranted nor necessary. ♢

2 Exercise: Assume that c(q) is concave, smooth and


increasing. What conclusions can you draw about how
q(p) behaves when p is changed? Be explicit and careful
in all your statements. ♢

In these notes, we shall consider two kinds of problems,


namely Type A and Type B problems.

©r. vijay krishna 29th September 2010 22:59



Type A These are problems of the following form: X, Θ


are sets, S ⊂ X, and f ∶ X × Θ → R is a function. Let
Φ(θ) = arg maxx∈S f(x, θ) be the set of maximisers
as a function of θ. We are interested in how Φ(θ)
changes as a function of θ. For instance, how does
demand change with a change in preference?
Type B These are problems of the following form: X
is a set, S ⊂ X, and f ∶ X → R is a function. Let
Φ(S) = arg maxx∈S f(x) be the set of maximisers
as a function of S. We are interested in how Φ(S)
changes as a function of S, the constraint set. For
instance, how does demand change with a change
in wealth?

We shall begin with a treatment of Type A problems,


and then address Type B problems.

6.1 Monotone Comparative Statics — A First


Look

The kinds of comparative statics questions that we can


ask are primarily of two kinds. The first involves the
effect of a change in the constraint set (eg, a change in
prices or a change in wealth level). The second kind of
question involves a change in the objective function.
The theory of Monotone Comparative Statics that we
shall outline below will mainly address the second kind
of question. But remember that every problem has an
associated dual problem, and quite often, the dual prob-
lem may be amenable to analysis, where the primal
problem may not. We follow an ordinal approach where
there are fewer assumptions, but the assumptions will


be behavioural in nature. We start by assuming that all


relevant spaces are single dimensional.

3 Definition: Let X, T ⊂ R, and u ∶ X × T → R. We shall


say that u has increasing differences in (x, t) if for all
xH , xL ∈ X and tH , tL ∈ T such that xL < xH and tL < tH ,

u(xH , tH ) − u(xH , tL ) ⩾ u(xL , tH ) − u(xL , tL ).

We shall say that u has strictly increasing differences if


the inequality is strict. ♢

Notice that the inequality is symmetric in (x, t) in the


sense that we can rewrite it as

u(xH , tH ) − u(xL , tH ) ⩾ u(xH , tL ) − u(xL , tL ).

The notion of increasing differences is a form of com-


plementarity between x and t. Thus, if t is a parameter,
then one would expect that a rise in t would mean an
increase in the optimal x. It is always useful to know
what mathematical operations preserve increasing differ-
ences. The following proposition states that increasing
differences are preserved under addition and (point-
wise) multiplication.

4 Proposition: Let f, g ∶ X × T → R have increasing differ-


ences in (x, t). Then, u ∶ X × T → R given by
(i) u(x, t) ∶= f(x, t) + g(x, t) has increasing differences, and
(ii) u(x, t) ∶= αf(x, t) has increasing differences, where
α ⩾ 0. ♢

5 Exercise: Prove the proposition above. ♢


6 Proposition: Let f ∶ X → R and g ∶ T → R both be


increasing (or decreasing). Then, u ∶ X × T → R given by
u(x, t) ∶= f(x)g(t) has increasing differences in (x, t). ♢

Proof. Suppose f and g are both increasing, and let
xL < xH and tL < tH . Then

u(xH , tH ) − u(xH , tL ) = f(xH )[g(tH ) − g(tL )]
⩾ f(xL )[g(tH ) − g(tL )]
= u(xL , tH ) − u(xL , tL ),

since f(xH ) ⩾ f(xL ) and g(tH ) − g(tL ) ⩾ 0. If both f and
g are decreasing, both factors reverse sign and the
inequality again holds. ∎
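The preservation results above are easy to check numerically. The sketch below verifies increasing differences of f(x)g(t) on a finite grid; the particular f and g are illustrative choices, not from the text.

```python
import math

def f(x):                 # increasing in x
    return math.sqrt(x)

def g(t):                 # increasing in t
    return math.log(1 + t)

def u(x, t):
    return f(x) * g(t)

def has_increasing_differences(u, xs, ts):
    """Check u(xH,tH) - u(xH,tL) >= u(xL,tH) - u(xL,tL) on a finite grid."""
    return all(u(xh, th) - u(xh, tl) >= u(xl, th) - u(xl, tl) - 1e-12
               for xl in xs for xh in xs if xh > xl
               for tl in ts for th in ts if th > tl)

grid = [0.5 * k for k in range(1, 11)]
print(has_increasing_differences(u, grid, grid))  # True
```

Replacing g by a decreasing function (while keeping f increasing) makes the check fail, in line with the "both increasing or both decreasing" hypothesis of the proposition.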

Note well that we have not made any topological as-


sumptions thus far. But quite often, we will know that
X is convex and, for each t, u(⋅, t) ∈ C1 (X), in which case u
has increasing differences if and only if ∂u/∂x (x, t) is non-
decreasing in t. If T is also convex and we know that
u ∈ C2 (X × T ), then u has increasing differences if and only
if ∂2 u/∂x∂t (x, t) ⩾ 0.
Now for some notation. Let us suppose that T is the
parameter space, and u is the objective function. Define
Ψ ∶ T → 2X as

Ψ(t) ∶= arg maxx∈X u(x, t).

Thus, for each t ∈ T , Ψ(t) is the set of maximisers of


u(x, t). Note well that Ψ is a function, as defined. But in
keeping with conventions, we shall denote it as Ψ ∶ T ↠
X and refer to it as a correspondence, or a set valued
map.
Let us first consider the case where each maximiser is
unique, ie for each t ∈ T , Ψ(t) is a singleton. This leads to


7 Theorem: Let u have strictly increasing differences in


(x, t) and suppose Ψ ∶ T ↠ X is a function. Then, for
tL , tH ∈ T where tL < tH , if xL ∶= Ψ(tL ) and xH ∶= Ψ(tH ),
then xH ⩾ xL . ♢

Proof. Suppose not, ie suppose xL > xH . Then, by


strict increasing differences,

u(xL , tH ) − u(xL , tL ) > u(xH , tH ) − u(xH , tL ).

By the definition of Ψ, it must be the case that u(xL , tL ) ⩾


u(xH , tL ). Therefore,

u(xL , tH ) > u(xH , tH ) + [ u(xL , tL ) − u(xH , tL ) ],

where the bracketed term is ⩾ 0 by the definition of Ψ.

This implies u(xL , tH ) > u(xH , tH ), which contradicts


the definition of Ψ. Therefore, it must be that xH ⩾
xL . ∎

Note that once again, we have not made any topological


assumptions. Thus, because of the potential discrete-
ness, we cannot claim that xH > xL , and have to make
do with a weak increase. We are now ready to revisit an
old friend.

8 Example: Let π(q, p) ∶= pq − c(q) be the firm’s profit


function, where c(q) is a cost function. Then, π(q, p)
has strict increasing differences in (q, p). Assuming that
there is a unique maximiser, it is easily seen that if pH >
pL , then qH ⩾ qL . Notice again that we haven’t made
any assumptions about the function c(q). In particular,
it need not even be everywhere measurable, let alone
differentiable. ♢
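The conclusion of Example 8 can be seen by brute force. In the sketch below, the cost function c is a deliberately non-smooth, non-convex illustrative choice; the (largest) profit-maximising output on a grid is nevertheless nondecreasing in the price, exactly as strict increasing differences predicts.

```python
def c(q):
    # deliberately badly behaved: quadratic + fixed-cost jump + oscillation
    return 0.5 * q * q + (4.0 if q > 3 else 0.0) + (1.0 if int(q) % 2 else 0.0)

def q_star(p, quantities):
    """Largest profit-maximising quantity on a finite grid."""
    best = max(p * q - c(q) for q in quantities)
    return max(q for q in quantities if p * q - c(q) == best)

quantities = [0.5 * k for k in range(21)]   # 0, 0.5, ..., 10
prices = [0.5 * k for k in range(1, 21)]    # 0.5, 1, ..., 10
outputs = [q_star(p, quantities) for p in prices]
print(all(hi >= lo for lo, hi in zip(outputs, outputs[1:])))  # True
```

No smoothness or convexity of c is used anywhere; only the increasing differences of pq − c(q) in (q, p) matter.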


We want to relax two assumptions made in the theorem


above. Namely the assumptions that Ψ(t) is always a
singleton and that u has strict increasing differences in
(x, t).
First, suppose that there are multiple solutions, and
that for each t, Ψ(t) is an interval. Then, for tH > tL ,
let Ψ(tL ) ∶= [x̲L , x̄L ] and Ψ(tH ) ∶= [x̲H , x̄H ] and suppose
Ψ(tL ) ≠ Ψ(tH ).

Assume that u has increasing differences (not necessar-
ily strict), take xL ∈ Ψ(tL ) and xH ∈ Ψ(tH ), and suppose
xH < xL . Then, by increasing differences, we have

0 ⩾ u(xL , tH ) − u(xH , tH )
⩾ u(xL , tL ) − u(xH , tL )
⩾ 0.

Therefore, it must be that
• u(xL , tL ) − u(xH , tL ) = 0, so that xH ∈ Ψ(tL ), and
• u(xL , tH ) − u(xH , tH ) = 0, so that xL ∈ Ψ(tH ).
Moreover, taking xL and xH to be the endpoints of the two
intervals shows that x̄H ⩾ x̄L and x̲H ⩾ x̲L . This motivates
the notion of Strong Set Order.

9 Definition: Let S, T ⊂ R. We shall say that S dominates


T in the strong set order and denote it as S ⩾S T , if for
every x ∈ S and y ∈ T , y ⩾ x implies x, y ∈ S ∩ T . We shall
say that S >S T if S ⩾S T and ¬(T ⩾S S). ♢
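For finite sets, Definition 9 can be transcribed directly into code; the sets A and B below are illustrative.

```python
# Definition 9 for finite subsets of R: S dominates T in the strong set
# order iff every pair x in S, y in T with y >= x lies in both sets.
def dominates(S, T):
    S, T = set(S), set(T)
    return all(x in T and y in S for x in S for y in T if y >= x)

A, B = {2, 4}, {1, 2}
print(dominates(A, B), dominates(B, A))  # True False
```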

10 Exercise: Verify that in the strong set order, S ⩽S ∅ ⩽S


S. ♢


11 Exercise: Is the strong set order reflexive? Transitive?


Symmetric? Antisymmetric? Complete? Justify all your
answers. ♢

12 Example: Some examples of the strong set order on
subsets of R: [2, 4] ⩾S [1, 3], while [1, 4] and [2, 3] are not
comparable in the strong set order. ♢

Thus, on 2R , the space of all subsets of R, the strong set
order is reflexive. But it is not transitive (see the exercise
above) and most definitely not complete.
The order on 2R also provides us a way to talk about
the order structure of correspondences.

13 Definition: A correspondence Ψ ∶ R ↠ R is nondecreas-


ing if for x > y, Ψ(x) ⩾S Ψ(y). ♢

We are now ready to prove our first real theorem of this


section.

14 Theorem: Let Ψ(t) ∶= arg maxx∈X u(x, t) and suppose
u(x, t) has increasing differences in (x, t). Then, Ψ ∶ T ↠
X is (weakly) increasing in t. ♢

Proof. Let t2 > t1 , let xi ∈ Ψ(ti ) for i = 1, 2, and suppose
x1 > x2 . Then

0 ⩾ u(x1 , t2 ) − u(x2 , t2 )
⩾ u(x1 , t1 ) − u(x2 , t1 )
⩾ 0

where the first inequality follows from the definition
of x2 , the second from increasing differences and the
third from the definition of x1 . Hence all the inequal-
ities hold with equality, so x1 ∈ Ψ(t2 ) and x2 ∈ Ψ(t1 ).
Thus, Ψ is weakly increasing in t, as desired. ∎


6.2 A Brief Detour into Lattice Theory

Let X be a set and ⩾ a binary relation on X. The pair


(X, ⩾) is a partially ordered set or poset if ⩾ is reflex-
ive, transitive and antisymmetric. For any x, y ∈ X,
the supremum or least upper bound or join is denoted
by x ∨X y ∶= inf{z ∈ X ∶ z ⩾ x, z ⩾ y}, and the in-
fimum or greatest lower bound or meet is denoted by
x ∧X y ∶= sup{z ∈ X ∶ z ⩽ x, z ⩽ y}.

A lattice is a poset (X, ⩾) such that for all x, y ∈ X,


x ∨X y ∈ X and x ∧X y ∈ X. Notice that in the notation,
I have added a subscript of ‘X’ to the ∨ and the ∧. In
future, when this operation is obvious, I shall drop the
subscript. Now for some examples of lattices.

The reals (R, ⩾). For any x, y ∈ R, x ∨ y ∶= max{x, y} and


x ∧ y ∶= min{x, y}.

Euclidean Space (RN , ⩾). For any x, y ∈ RN ,

x ∨ y ∶= { max{x1 , y1 }, . . . , max{xN , yN }}

and

x ∧ y ∶= { min{x1 , y1 }, . . . , min{xN , yN }}.

Function Space (RX , ⩾), where X is any set. For any


f, g ∈ RX , f ∨ g ∶= max{f, g}, and f ∧ g ∶= min{f, g}, ie
(f ∨ g)(x) ∶= max{f(x), g(x)} etc.
Sequence Space (R∞ , ⩾). For any x, y ∈ R∞ , x ∨ y ∶=
( max{xi , yi }) and x ∧ y ∶= ( min{xi , yi }).
Space of Subsets (2X , ⩾), where for any set X, 2X is the
space of all subsets, and A ⩾ B iff A ⊃ B. Then, for
A, B ⊂ X, A ∨ B ∶= A ∪ B and A ∧ B ∶= A ∩ B.


Space of Subsets (2X , ⩾), where for any set X, 2X is the


space of all subsets, and A ⩾ B iff A ⊂ B. Then, for
A, B ⊂ X, A ∨ B ∶= A ∩ B and A ∧ B ∶= A ∪ B.

A non-example Let S ∶= {(1, 0), (0, 1)} ⊂ R2 , and for


all x, y ∈ S, x ⩾S y if and only if x ⩾RN y. Then,
(1, 0) ∨R2 (0, 1) ∶= (1, 1) ∉ S and (1, 0) ∧R2 (0, 1) ∶=
(0, 0) ∉ S. Thus, the join of (1, 0) and (0, 1) exists in
R2 but not in S.
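The componentwise lattice operations, and this non-example, are easy to check in code (a small sketch, with join and meet as componentwise max and min):

```python
# Componentwise join and meet on R^n, and the non-example above: the join
# and meet of (1,0) and (0,1) both escape S = {(1,0), (0,1)}.
def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

S = {(1, 0), (0, 1)}
print(join((1, 0), (0, 1)), meet((1, 0), (0, 1)))   # (1, 1) (0, 0)
print(join((1, 0), (0, 1)) in S)                    # False
```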

Another non-example Let ∆N ∶= {x ∈ R+N+1 ∶ ∑i xi = 1}.


Then, for distinct x, y ∈ ∆N , x ∨ y ∉ ∆N .

A subset S of a lattice X is a sublattice if for all x, y ∈ S,


x ∨X y ∈ S and x ∧X y ∈ S. In other words, with the lattice
operations inherited from X, the set S is a lattice in its
own right.
Finally, we define product lattices. Let (Xi , ⩾i ) be a lat-
tice, for i = 1, . . . , n. Then, define the product lattice
X ∶= X1 × ⋅ ⋅ ⋅ × Xn by giving it the following order, join
and meet: For x ∶= (x1 , . . . , xn ), y ∶= (y1 , . . . , yn ) ∈ X, x ⩾ y
if and only if xi ⩾i yi for all i = 1, . . . , n. Also, x ∨ y ∶=
(x1 ∨1 y1 , . . . , xn ∨n yn ) and x ∧ y ∶= (x1 ∧1 y1 , . . . , xn ∧n yn ).
With these definitions and examples, we can now return
to monotone comparative statics.

15 Exercise: Verify that (Rn , ⩾) is a product lattice, as the


n-fold product of (R, ⩾). ♢

16 Exercise: Show that x ∨ y ⩾ x, y and x, y ⩾ x ∧ y. ♢

17 Exercise: For x, y ∈ Rn , show that the four points x, y, x ∨


y, x ∧ y form a two-dimensional rectangle. ♢


6.3 Monotone Comparative Statics — Another


Look

In its fullest generality, the family of optimisation prob-


lems that we shall consider can be indexed by a parameter
t ∈ T:

maxx∈S u(x, t)

where u ∶ X × T → R, X is a lattice, T is a poset and S ⊂ X.


We shall frequently take X and T as Euclidean space,
but this is clearly not necessary. Moreover, our doing so
will not make the proofs of the theorems any easier.
The strong set order introduced earlier has a counter-
part in the abstract setting introduced in this section.
Consider the lattice (X, ⩾), and suppose A, B ⊂ X.

18 Definition (Strong Set Order): A is higher than B in the


strong set order if for each a ∈ A and b ∈ B, a ∨ b ∈ A and
a ∧ b ∈ B. ♢

Notice that the strong set order coincides with the un-
derlying lattice order in the sense that {a} ⩾S {b} if
and only if a ⩾ b. Moreover, the strong set order also
coincides with the definition provided earlier (in the
one-dimensional case, as it should). To see this, suppose
X = R, and let A, B ⊂ R. The definition reduces to saying
that if A ⩾S B, then a ∈ A and b ∈ B with b ⩾ a implies
that a, b ∈ A ∩ B.

19 Exercise: Verify that in the strong set order, S ⩽S ∅ ⩽S


S. ♢

20 Exercise: Is the strong set order reflexive? Transitive?


Symmetric? Antisymmetric? Complete? Justify all your
answers. ♢


21 Exercise: For any S ⊂ X, show that S ⩾S S if and only if S


is a sublattice of X. ♢

As before, let Ψ ∶ T ↠ S be the set of maximisers of u


for each t ∈ T . Then, Ψ is increasing if t ⩾ t ′ implies
Ψ(t) ⩾S Ψ(t ′ ). (Again, compare with the definition given
above.) A function g ∶ X → R is supermodular if for all
x, y ∈ X,
g(x ∨ y) + g(x ∧ y) ⩾ g(x) + g(y).
A function f ∶ X → R is submodular if the function −f is
supermodular.
If X is a lattice and T is a poset, the function g ∶ X × T →
R has increasing differences in (x, t) if t ⩾ t ′ implies
g(x, t) − g(x, t ′ ) is increasing in x. This condition can be
rewritten as
g(x, t) − g(x, t ′ ) ⩾ g(x ′ , t) − g(x ′ , t ′ )
where t ⩾ t ′ and x ⩾ x ′ , which is in consonance with
the definition we saw earlier. Supermodularity is about
how different choice variables interact with each other
for a fixed value of the parameter, while increasing dif-
ferences is about how the level of the choice variable
interacts with the level of the parameter. The following
exercise shows there is a distant relation between the
two concepts.

22 Exercise: Suppose g(x, t) is supermodular in (x, t).


Show that g has increasing differences in (x, t). ♢
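A brute-force sketch of these two definitions on a finite grid in R2, viewed as a product lattice (the function g below is an illustrative choice): a supermodular g also passes the increasing-differences test, in line with Exercise 22.

```python
from itertools import product

def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))
def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))

def is_supermodular(g, points):
    # g(x v y) + g(x ^ y) >= g(x) + g(y) for all pairs of points
    return all(g(join(x, y)) + g(meet(x, y)) >= g(x) + g(y)
               for x, y in product(points, points))

def has_increasing_differences(g, points):
    # interpret each point as (x, t) and check the defining inequality
    return all(g((xh, th)) - g((xh, tl)) >= g((xl, th)) - g((xl, tl))
               for (xl, tl), (xh, th) in product(points, points)
               if xh > xl and th > tl)

g = lambda z: z[0] * z[1]                # cross-partial is 1 >= 0
points = list(product(range(4), range(4)))
print(is_supermodular(g, points), has_increasing_differences(g, points))  # True True
```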

The following theorem is the most general theorem we


shall prove. You are encouraged to consult Milgrom
and Shannon, 1994 for a more thorough, and yet en-
tirely readable treatment.


23 Theorem: Let u ∶ X × T → R be supermodular in x


for each t ∈ T , and suppose S is a sublattice of X. Also
suppose u has increasing differences in (x, t). Then, Ψ is
increasing. ♢

Proof. Notice that since S is a sublattice, x ∨ y, x ∧ y ∈ S


for all x, y ∈ S. Let s > t where s, t ∈ T , and suppose
x ∈ Ψ(s) and y ∈ Ψ(t). To prove our claim, we need
to show that x ∨ y ∈ Ψ(s) and x ∧ y ∈ Ψ(t). To see this,
notice that

0 ⩾ u(x ∨ y, s) − u(x, s)
⩾ u(x ∨ y, t) − u(x, t)
⩾ u(y, t) − u(x ∧ y, t)
⩾ 0.

The first inequality follows from the definition of Ψ
and the fact that x ∈ Ψ(s), the second inequality fol-
lows from increasing differences, the third from the
supermodularity of u on S, and the fourth from the
definition of Ψ and the fact that y ∈ Ψ(t). Hence all
four inequalities hold with equality, so x ∨ y ∈ Ψ(s)
and x ∧ y ∈ Ψ(t), as desired. ∎
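The theorem can also be illustrated numerically. In the sketch below, u is an illustrative choice that is supermodular in x and has increasing differences in (x, t), and the argmax sets rise in the strong set order as t rises.

```python
from itertools import product

def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))
def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))

def u(x, t):
    # x0*x1 gives supermodularity in x; t*(x0+x1) gives increasing
    # differences in (x, t); the quadratic terms keep the maximiser interior
    return x[0] * x[1] + t * (x[0] + x[1]) - x[0] ** 2 - x[1] ** 2

S = list(product(range(5), range(5)))   # a finite sublattice of Z^2

def Psi(t):
    best = max(u(x, t) for x in S)
    return {x for x in S if u(x, t) == best}

def strong_dominates(A, B):
    # Definition 18: A is higher than B iff joins land in A, meets in B
    return all(join(a, b) in A and meet(a, b) in B for a in A for b in B)

print(Psi(1), Psi(3))                    # {(1, 1)} {(3, 3)}
print(strong_dominates(Psi(3), Psi(1)))  # True
```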

We end with some results that show how to discern su-


permodularity and increasing differences in Euclidean
applications. Also see Milgrom and Shannon, 1994.

24 Theorem: Let f ∶ Rn × Rm → R be twice continuously


differentiable. Then, f has increasing differences in (x, t)
if and only if ∂2 f/∂xi ∂tj ⩾ 0 for i = 1, . . . , n and j = 1, . . . , m;
and f is supermodular if and only if ∂2 f/∂xi ∂xj ⩾ 0 for all
t ∈ T , and i, j ∈ {1, . . . , n}. ♢


25 Theorem: If f and g are supermodular, and α, β ⩾ 0,


then αf + βg is supermodular. If the functions f1 , f2 , . . . are
supermodular and f∗ is the pointwise limit of (fn ), then
f∗ is supermodular. Also, ∑∞n=1 fn is supermodular if this
pointwise sum is well defined. If f is supermodular and
increasing and g ∶ R → R is increasing and convex, then
g ○ f is supermodular. ♢

Finally, the firm’s problem again and some more exer-


cises.

26 Example: Let π(p, q) ∶= ⟨p, q⟩ − c(q) be the firm’s profit


function, where p, q ∈ Rn + . If −c(q) is supermodular,
π(p, q) is supermodular in q and has increasing differ-
ences in (p, q). Therefore, if p rises (in the pointwise
order), then the set of optimal choices Ψ(p) also rises in
the strong set order. ♢

27 Exercise: Let f, g ∶ X × T → R have increasing differences


in (x, t). Show that f(x, t) + g(x, t) has increasing differ-
ences, and that αf(x, t) has increasing differences, where
α ⩾ 0. ♢

28 Exercise: Let f ∶ X → R and g ∶ T → R both be increasing


(or decreasing). Show that u ∶ X × T → R given by u(x, t) ∶=
f(x)g(t) has increasing differences in (x, t). ♢

29 Exercise: Show that the profit function in Example 26


is supermodular in q if and only if the cost function c(q)
is submodular in q. Show that π(p, q) has increasing
differences in (p, q). ♢

30 Exercise: Let p ∈ ℓ1 and q ∈ ℓ∞ . Let ⟨p, q⟩ ∶= ∑i pi qi .
Show that ∣⟨p, q⟩∣ < ∞. ♢


31 Exercise: Suppose p ∈ ℓ1 and q ∈ ℓ∞ (so that ⟨p, q⟩ ∶=
∑i pi qi < ∞). Show that the profit function is supermodu-
lar if and only if the cost function c(q) is submodular in q.
Does it have increasing differences in (p, q)? ♢

32 Exercise: Let p ∈ ℓ1 and q ∈ ℓ∞ . Show that ⟨p, q⟩ ∶=
∑i pi qi < ∞. Let c ∶ ℓ∞ → R be a function such that
−c is supermodular. Show that the function π(p, q) ∶=
⟨p, q⟩ − c(q) is supermodular. Show that π has increasing
differences in (p, q). What happens to the set of optimal
choices (when maximising π(p, q) for given p), assuming
it exists, when p128 rises? (In other words, what hap-
pens when only the 128th price in the price sequence p
rises.) ♢

6.4 Comparative Statics with Changing


Constraints

We shall now consider comparative statics problems of


Type B. Before we do that, we shall take another look
at the standard theory developed above, but follow the
more abstract approach of Quah, 2007.

6.4.1 An Abstract Theorem

Recall that the approach taken to Type A problems


above imposed the condition of supermodularity. Un-
fortunately, supermodularity is a cardinal property and
not ordinal. Why is this important? Let f ∶ X × T →
R be a function, S ⊂ X a constraint set, and Ψ(t) =
arg maxx∈S f(x, t) the set of maximisers. Suppose Ψ(t) ⩾


Ψ(t ′ ) (in some order) where t > t ′ . Now take a mono-


tone transformation of f, ie let h ∶ R → R be strictly
increasing, and consider (h ○ f)(x, t). It is clear that
arg maxx∈S (h ○ f)(x, t) ⩾ arg maxx∈S (h ○ f)(x, t ′ ), whenever
t > t ′ . But it is clear that (h ○ f) need not be supermod-
ular, even if f is. Therefore, what is needed is an ordinal
approach to the problem of monotone comparative stat-
ics. We shall explore that now.

33 Exercise: Let h and f be as above. Show that

arg maxx∈S (h ○ f)(x, t) = arg maxx∈S f(x, t). ♢
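Exercise 33 is easy to confirm numerically: composing with a strictly increasing h leaves the set of maximisers unchanged. The f and h below are illustrative choices.

```python
import math

S = range(-5, 6)                          # integer grid

def f(x, t):
    return -(x - t) ** 2                  # maximised at x = t

def argmax(obj, t):
    best = max(obj(x, t) for x in S)
    return {x for x in S if obj(x, t) == best}

h = math.atan                             # strictly increasing transformation
hf = lambda x, t: h(f(x, t))

print(argmax(f, 2) == argmax(hf, 2))      # True
```

Note that h ○ f here is generally not supermodular even when f is, which is exactly why an ordinal notion is needed.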

To keep matters simple, we shall ignore the parameter


space T , and only concentrate on increases in the con-
straint set, since this shall be our main concern in what
follows. Let X be a set (not necessarily a lattice), and let
△, ▽ ∶ X × X → X be two binary operations. We shall write
△(x, y) as x △ y etc. We want to think of △ and ▽ as the
meet and join operations in a lattice. (Notice that in a
lattice, we are guaranteed the existence of a meet and
a join. We are not, for a general set, especially if that
set doesn’t have an order structure.) We shall make the
following assumption about the binary operations △ and
▽.

34 Assumption: ▽ and △ commute, ie x ▽ y = y ▽ x and
x △ y = y △ x.1 ♢
1 Quah doesn't make this assumption explicitly, but as far
as I can tell, it is necessary.
Say that a function f ∶ X → R is (▽, △)-supermodular
(spm) if for all x, y ∈ X,

f(x ▽ y) − f(y) ⩾ f(x) − f(x △ y).

Of course, the notion of (▽, △)-supermodularity is not


ordinal, in that monotone transformations of f need


not be supermodular. To remedy this, we introduce an


ordinal version of the property above. We shall say that
the function f is (▽, △)-quasisupermodular (q-spm) if
for all x, y ∈ X,

f(x) ⩾ (>) f(x △ y) implies f(x ▽ y) ⩾ (>) f(y).

It is straightforward to verify that q-spm is an ordinal


property. It is also clear that spm implies q-spm, but
not vice-versa.
35 Exercise: Verify that spm is not an ordinal property,
and that q-spm is. Verify that spm is a strictly stronger
property than q-spm. ♢

Let S, S ′ ⊂ X be two subsets. We may define the induced


strong set order by (▽, △), and write S ′ ⩾(▽,△) S, if for
any x ∈ S and y ∈ S ′ , x ▽ y ∈ S ′ and x △ y ∈ S. We can now
state the main theorem of this subsection.

36 Theorem: Let ▽ and △ be two operations on X, and let


f ∶ X → R. Then, the following are equivalent.
(i) f is a (▽, △)-q-spm function
(ii) For all S, S ′ ⊂ X, S ′ ⩾(▽,△) S implies

arg maxx∈S ′ f(x) ⩾(▽,△) arg maxx∈S f(x). ♢

Proof. (i) → (ii). Suppose f is q-spm. Let x ∈ arg maxx ′ ∈S f(x ′ )
and y ∈ arg maxx ′ ∈S ′ f(x ′ ). Since S ′ ⩾(▽,△) S, x ▽ y ∈ S ′
and x △ y ∈ S. Since x is a maximiser of f on S, it
must be that f(x) ⩾ f(x △ y). By q-spm, we have
f(x ▽ y) ⩾ f(y); since y maximises f on S ′ and x ▽ y ∈ S ′ ,
this forces f(x ▽ y) = f(y), so x ▽ y ∈ arg maxx ′ ∈S ′ f(x ′ ). If
we show that x △ y ∈ arg maxx ′ ∈S f(x ′ ), we will be done.
If x △ y ∉ arg maxx ′ ∈S f(x ′ ), then f(x) > f(x △ y). But
q-spm then implies f(x ▽ y) > f(y), which is a contra-
diction, so we are done.

(ii) → (i). For any x, y ∈ X, let S ∶= {x, x △ y} and
S ′ ∶= {x ▽ y, y}. Clearly, S ′ ⩾(▽,△) S, so by assumption,
arg maxx∈S ′ f(x) ⩾(▽,△) arg maxx∈S f(x).
Suppose f(x) ⩾ f(x △ y), so that x ∈ arg maxx ′ ∈S f(x ′ ),
and suppose, for a contradiction, that f(y) > f(x ▽ y).
Then arg maxx ′ ∈S ′ f(x ′ ) = {y}, and domination gives
x ▽ y ∈ arg maxx ′ ∈S ′ f(x ′ ) = {y}, ie x ▽ y = y, which
contradicts f(y) > f(x ▽ y). Hence f(x ▽ y) ⩾ f(y).
If, moreover, f(x) > f(x △ y), then arg maxx ′ ∈S f(x ′ ) = {x};
if f(y) ⩾ f(x ▽ y), then y ∈ arg maxx ′ ∈S ′ f(x ′ ), and dom-
ination gives x △ y ∈ arg maxx ′ ∈S f(x ′ ) = {x}, ie
x △ y = x, contradicting f(x) > f(x △ y). Hence
f(x ▽ y) > f(y). Thus, f is q-spm. ∎

6.5 Constrained Optimisation in Euclidean


Space

We shall now consider constrained optimisation prob-


lems in Rn . In particular, we shall consider Type B prob-
lems. The main difficulty in using the standard lattice
order on Rn is that constraint sets like budget sets can-
not be ordered in the induced strong set order.

37 Exercise: Consider two budget sets B(p, w) and B(p, w ′ ),


where w < w ′ . Show that (i) B(p, w) ⊂ B(p, w ′ ), and (ii)
B(p, w) and B(p, w ′ ) cannot be ordered in the strong set
order. ♢

The main idea we shall use can be summarised in the


following recipe.

(a) Define operators (△, ▽) which can potentially de-


pend on a parameter, that allow budget sets and
the like to be ordered
(b) Ensure that the objective function is (quasi)supermodular
in this order


The rest of this section is devoted to developing such an


order. In what follows, we shall assume that the domain
X ⊂ Rn is a convex sublattice. For each λ ∈ [0, 1], define
the operator ▽λi on X as follows:


x ▽λi y ∶= y if xi ⩽ yi , and x ▽λi y ∶= λx + (1 − λ)(x ∨ y) if xi > yi .

Notice that since X is a convex lattice (in particular, a


sublattice of Rn ), x ∨ y, x ∧ y ∈ X for all x, y ∈ X. Therefore,
the convex combination above is in X. Similarly, for
each λ ∈ [0, 1], define


x △λi y ∶= x if xi ⩽ yi , and x △λi y ∶= λy + (1 − λ)(x ∧ y) if xi > yi .
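These operators can be transcribed into code, along with the identities x ▽λi y = (x ∨ y) − λv and x △λi y = (x ∧ y) + λv used below; the points x, y and the value of λ are illustrative.

```python
def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))
def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))

def nabla(x, y, i, lam):
    # x (nabla^lam_i) y
    if x[i] <= y[i]:
        return y
    return tuple(lam * a + (1 - lam) * j for a, j in zip(x, join(x, y)))

def delta(x, y, i, lam):
    # x (delta^lam_i) y
    if x[i] <= y[i]:
        return x
    return tuple(lam * b + (1 - lam) * m for b, m in zip(y, meet(x, y)))

x, y, lam = (4.0, 1.0), (2.0, 3.0), 0.25   # x1 > y1 and x, y unordered
v = tuple(j - a for j, a in zip(join(x, y), x))
print(v == tuple(b - m for b, m in zip(y, meet(x, y))))                          # True
print(nabla(x, y, 0, lam) == tuple(j - lam * c for j, c in zip(join(x, y), v)))  # True
print(delta(x, y, 0, lam) == tuple(m + lam * c for m, c in zip(meet(x, y), v)))  # True
```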

The property we shall require of functions f ∶ X →


R is now described. We shall say that f is (▽λi , △λi )-
supermodular for some λ ∈ [0, 1] if for all x, y ∈ X, we
have
f(x ▽λi y) − f(y) ⩾ f(x) − f(x △λi y).

Let us now examine the intuition behind the defin-


ition. For any x, y ∈ X, notice that if xi ⩽ yi , the
condition above is easily satisfied, since both left
and right hand sides are zero. Therefore, the only
interesting case is the one where xi > yi . Notice
first that λ = 0 amounts to requiring that f be
supermodular. Consider then a λ ∈ (0, 1), and
define v ∶= y − (x ∧ y) = (x ∨ y) − x. This im-
plies x △λi y = (x ∧ y) + λv and x ▽λi y = (x ∨ y) − λv.
[Marginal figure: the parallelogram with vertices x ∧ y,
x △λi y, y on one side and x, x ▽λi y, x ∨ y on the other.]
Therefore, (▽λi , △λi )-supermodularity requires that the
difference in the value of the function on the left side


of the (backward bending) parallelogram defined by


{x, x △λi y, y, x ▽λi y} is less than the difference in the value
of f along the right side of the parallelogram.
Define now Ci ∶= {(▽λi , △λi ) ∶ λ ∈ [0, 1]}, and say that a
function f ∶ X → R is Ci -supermodular if it is (▽λi , △λi )-
supermodular for each (▽λi , △λi ) ∈ Ci , ie for each
λ ∈ [0, 1]. Therefore, we are requiring that the
inequality hold for each λ ∈ [0, 1].
Again, requiring the parallelogram inequality only for λ = 0
is equivalent to requiring that f be supermodular. Re-
quiring it instead for each λ ∈ [0, 1] is therefore a much
more stringent requirement. Fortunately, as our next
result shows, Ci -supermodularity follows from super-
modularity and concavity of f. In what follows, we shall
say that f is C -supermodular if it is Ci -supermodular
for each i = 1, . . . , n. First, a simple property of concave
functions, that will be very useful.

38 Exercise: Let f ∶ X → R be concave, x, v ∈ X and s, t > 0.


Then, f(x) − f(x − tv) ⩽ f(x − sv) − f(x − sv − tv). For x, y ∈ X,
define v ∶= x ∨ y − x. Show that v = y − x ∧ y. Show also that
for all λ ∈ (0, 1), f(x ∨ y) − f(x ∨ y − λv) ⩽ f(x + λv) − f(x). ♢

39 Exercise: Let f ∶ X → R be spm and x, y ∈ X. Define


v ∶= x ∨ y − x. Show that v = y − x ∧ y. Show also that for all
λ ∈ (0, 1), f(x ∨ y) − f(x + λv) ⩾ f(y) − f(x ∧ y + λv). ♢

40 Proposition: Let f ∶ X → R be spm and concave. Then, f


is Ci -spm for each i = 1, . . . , n. ♢

Proof. Let x, y ∈ X be unordered, with xi > yi (by


the definition of (▽λi , △λi ), this is the only case to con-
sider). Define v ∶= (x ∨ y) − x = y − (x ∧ y) and recall that


x ▽λi y = (x ∨ y) − λv, and x △λi y = (x ∧ y) + λv. We can


write

f(x ∨ y − λv) − f(y)


= [f(x ∨ y − λv) − f(x ∨ y)] + [f(x ∨ y) − f(y)]

and

f(x) − f(x ∧ y + λv)


= [f(x) − f(x + λv)] + [f(x + λv) − f(x ∧ y + λv)]

Since f is concave, it follows from exercise 38 above


that

f(x ∨ y − λv) − f(x ∨ y) ⩾ f(x) − f(x + λv)

From the supermodularity of f (and exercise 39), it


follows that

f(x ∨ y) − f(y) ⩾ f(x + λv) − f(x ∧ y + λv)

Adding the two displays gives us

f(x ∨ y − λv) − f(y) ⩾ f(x) − f(x ∧ y + λv)

as desired. ∎

The above is Proposition 2 in Quah, 2007. He states


a slightly more general theorem, and you are encour-
aged to consult the original. It is possible to define
an ordinal version of C -spm. Say that f ∶ X → R is
Ci -quasisupermodular if f is (▽λi , △λi )-q-spm for each
(▽λi , △λi ) ∈ Ci .
What does Ci -q-spm say in the case of a consumer with
preferences over two goods? Suppose the consumer’s
preferences are represented by the utility function f ∶
X → R, where X ∶= R2+ , and preferences are monotone, so


that f1 , f2 ⩾ 0. Recall that for any x ∈ X, the slope of the


indifference curve through x is precisely −f1 (x)/f2 (x).
Since f is concave, we know that as we move
along the indifference curve by increasing x1 , the slope
flattens, ie the MRS f1 /f2 falls.
In general, the slope of the indifference curve does not
decrease as you keep x2 fixed, but increase x1 . However,
C2 -q-spm ensures the following declining slope property:
the slope of the indifference curve through x = (x1 , x2 )
falls as x1 rises, keeping x2 fixed. To see this, let us in-
stead assume that f is C2 -spm, and notice that

d/dx1 (−f1 (x)/f2 (x)) = (−f2 f11 + f1 f12 )/(f2 )2

But f11 ⩽ 0, since f is concave, and f12 ⩾ 0 since f is C2 -


spm. Therefore, d/dx1 (−f1 (x)/f2 (x)) ⩾ 0, as required. See the
supplement to Quah, 2007 for more on the special case
where X = R2+ . We end with some exercises that further
explore Ci -spm.

41 Exercise: Let us define (▽̃λi , △̃λi ) as follows:

x ▽̃λi y ∶= y if xi ⩽ yi , and x ▽̃λi y ∶= λy + (1 − λ)(x ∨ y) if xi > yi ;
x △̃λi y ∶= x if xi ⩽ yi , and x △̃λi y ∶= λx + (1 − λ)(x ∧ y) if xi > yi .

Assume f ∶ X → R is concave and spm. Show that f is
(▽̃λi , △̃λi )-spm for each λ ∈ [0, 1]. ♢

42 Exercise: Consider now the two-parameter family Ci (λ, µ)
defined as follows: First define (▽λi , △λi ) as above. Now
define
x ▽λi ,µ y ∶= µy + (1 − µ)(x ▽λi y)
and
x △λi ,µ y ∶= µx + (1 − µ)(x △λi y).
Let x ′ ∶= (x △λi y) − (1 − µ)w. Show that x △λi ,µ y = x ′ △τi y
and x ▽λi ,µ y = x ′ ▽τi y for some τ ∈ [0, 1]. [Marginal figure:
the points x, y, x ∨ y, x ∧ y, x ′ , x ▽λi y, x △λi y, x ▽λi ,µ y and
x △λi ,µ y.] As above, assume that f ∶ X → R is concave and
spm. Show that f satisfies

f(x ▽λi ,µ y) − f(y) ⩾ f(x) − f(x △λi ,µ y).

Put differently, let Ci (λ, µ) ∶= {(▽λi ,µ , △λi ,µ ) ∶ λ, µ ∈ [0, 1]}.
Show that f is Ci (λ, µ)-spm. ♢

6.6 Ordering Constraint Sets

Let S, S ′ ⊂ X, where X is a convex sublattice of Rn .
We shall say that S ′ dominates S in the Ci -flexible
set order (and write S ′ ⩾i S) if for any x ∈ S and
y ∈ S ′ , there exists (▽λi , △λi ) ∈ Ci such that x ▽λi y ∈ S ′
and x △λi y ∈ S. We shall say that S ′ dominates S in
the C -flexible set order if S ′ ⩾i S for all i, and write
S ′ ⩾ S.
[Marginal figure: the points x, x ∨ y, x ∧ y, y, a ∶= x △λi y
and b ∶= x ▽λi y.]

What does the ordering entail? Suppose x ∈ S and y ∈ S ′ .


If xi ⩽ yi , then x ▽λi y = y and x △λi y = x, so it follows
immediately that S ′ ⩾i S. Therefore, to check that S ′ ⩾i S,
we only need to consider the case where xi > yi , and
x, y are unordered. In this case, what is required is that
we find λ such that a ∶= x △λi y ∈ S and b ∶= x ▽λi y ∈ S ′ .
Notice that the definition of the Ci -flexible set order is
remarkably flexible. More specifically, we could have
instead defined Ci as C̃i ∶= {(▽̃λi , △̃λi ) ∶ λ ∈ [0, 1]}, where


▽̃λi , △̃λi are defined as above. We could also have used
the family Ci (λ, µ) ∶= {(▽λi ,µ , △λi ,µ ) ∶ λ, µ ∈ [0, 1]}, and
indeed, in some applications, it is advantageous to do
so. An exercise illustrates the ideas.

43 Exercise: Consider a price vector p ∈ Rn


++ and two
budget sets B(p, w) and B(p, w ′ ), where w < w ′ . Show
that B(p, w ′ ) ⩾ B(p, w) in the C -flexible set order. (Hint:
First consider points x and y that satisfy the budget con-
straint with equality.) ♢
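The hint can be made concrete with a small numerical sketch: for given x ∈ B(p, w) and y ∈ B(p, w ′ ), a grid search finds a λ for which x △λi y is affordable at w and x ▽λi y at w ′ . All the numbers below are illustrative.

```python
def join(x, y): return tuple(max(a, b) for a, b in zip(x, y))
def meet(x, y): return tuple(min(a, b) for a, b in zip(x, y))
def dot(p, x): return sum(a * b for a, b in zip(p, x))

def witness_lambda(x, y, i, p, w, w2, steps=1000):
    """Search [0,1] for lam with x delta y in B(p,w) and x nabla y in B(p,w2)."""
    if x[i] <= y[i]:
        return 0.0                        # then delta = x, nabla = y: trivial
    for k in range(steps + 1):
        lam = k / steps
        d = tuple(lam * b + (1 - lam) * m for b, m in zip(y, meet(x, y)))
        n = tuple(lam * a + (1 - lam) * j for a, j in zip(x, join(x, y)))
        if dot(p, d) <= w + 1e-9 and dot(p, n) <= w2 + 1e-9:
            return lam
    return None

p, w, w2 = (1.0, 2.0), 4.5, 6.0
x = (4.0, 0.0)    # affordable at wealth 4.5
y = (0.0, 3.0)    # on the budget line at wealth 6
lam = witness_lambda(x, y, 0, p, w, w2)
print(lam is not None)  # True
```

In this example the feasible window for λ is roughly [2/3, 3/4]: small λ keeps x △λi y cheap but leaves x ▽λi y too expensive, and conversely.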

6.7 Comparative Statics

As before, X is a convex sublattice of Rn . Let us say that


f ∶ X → R is i-increasing if S ′ ⩾i S implies arg maxx∈S ′ f(x) ⩾i
arg maxx∈S f(x). More generally, f is increasing if S ′ ⩾ S
implies arg maxx∈S ′ f(x) ⩾ arg maxx∈S f(x). Our main the-
orem says that the i-increasing property holds if, and
only if, f is q-spm. As before, the proof is very straight-
forward, since all the terms are well defined.

44 Theorem: Let f ∶ X → R where X ⊂ Rn is a convex


sublattice. Then, the following are equivalent.
(a) f is Ci -q-spm.
(b) f has the i-increasing property. ♢

Proof. (a) implies (b). Let f be Ci -q-spm, S ′ ⩾i S,
x ∈ arg maxx ′ ∈S f(x ′ ) and y ∈ arg maxx ′ ∈S ′ f(x ′ ). We
may assume, without loss of generality, that xi > yi
(since this is the only case that needs to be considered).
Since S ′ ⩾i S, there exists λ∗ ∈ [0, 1] such that
y ≠ (x ▽λ∗i y) ∈ S ′ and x ≠ (x △λ∗i y) ∈ S.
Define C ′ ∶= {y, x ▽λ∗i y} and C ∶= {x, x △λ∗i y}. Also
by definition, we have x ∈ arg maxx ′ ∈C f(x ′ ) and y ∈
arg maxx ′ ∈C ′ f(x ′ ). Since f is Ci -q-spm, it is in particular
(▽λ∗i , △λ∗i )-q-spm, in the sense of Theorem 36. There-
fore, by Theorem 36, we have C ′ ⊂ arg maxx ′ ∈S ′ f(x ′ )
and C ⊂ arg maxx ′ ∈S f(x ′ ), as desired.
(b) implies (a). Exercise. ∎

45 Exercise: Complete the proof of Theorem 44. (Hint:


Follow the logic of Theorem 36.) ♢

6.8 Applications

We end our discussion of monotone comparative statics


with some applications, which are extremely hard to
reach via other methods.

46 Example (Normal Goods): Consider a consumer with preferences over Rn+ and suppose his preferences are represented by f ∶ Rn+ → R. Let p ∈ Rn++ be a price vector and w his wealth level. Let x(p, w) ∶= arg maxx∈B(p,w) f(x) be his demand correspondence. Recall that a good is normal (for the consumer) if an increase in wealth increases his demand for the good.
Let w′ > w. By exercise 43, we see that B(p, w′) ⩾ B(p, w) in the C-flexible set order. Moreover, if f is Ci-q-spm, it follows from Theorem 44 that x(p, w′) ⩾i x(p, w).
Therefore, a sufficient condition on f for all goods to be
normal is that f be concave and supermodular. Notice


that while most texts provide a definition of a normal good, and also observe that normality is a property of the underlying preference, few (if any) give sufficient conditions on preferences that guarantee normality of demand. However, with the methods developed above, the conclusion is trivial, illustrating the power of the technique. ♢
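The conclusion of the example can be checked numerically for a particular utility. The sketch below is purely illustrative (the Cobb-Douglas utility, the prices, and the grid search are my choices, not part of the text): it brute-forces demand at two wealth levels for a concave, supermodular f and confirms that demand for both goods rises with wealth.

```python
def demand(p, w, f, step=0.01):
    """Brute-force maximiser of f over the budget set {x >= 0 : p.x <= w}.
    Searches a grid for good 1 and, since f is increasing, spends the
    remainder of the budget on good 2."""
    best, best_x = float('-inf'), None
    n = int(w / p[0] / step)
    for i in range(n + 1):
        x1 = i * step
        x2 = max(0.0, (w - p[0] * x1) / p[1])
        val = f(x1, x2)
        if val > best:
            best, best_x = val, (x1, x2)
    return best_x

# A concave and supermodular utility (Cobb-Douglas, chosen for illustration):
f = lambda x1, x2: (x1 ** 0.5) * (x2 ** 0.5)
p = (1.0, 2.0)
x_lo = demand(p, 1.0, f)    # demand at wealth w = 1
x_hi = demand(p, 2.0, f)    # demand at wealth w' = 2 > w

# Demand rises in both components: both goods are normal.
assert x_hi[0] >= x_lo[0] and x_hi[1] >= x_lo[1]
```

For this f the exact demands are x1 = w/(2p1) and x2 = w/(2p2), so the grid search should land near (0.5, 0.25) at w = 1 and (1.0, 0.5) at w = 2.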

47 Exercise: Let ≽ be a continuous preference on Rn+, and suppose that the induced demand function x(p, w) is monotone in w, ie w′ > w implies x(p, w′) ⩾ x(p, w). What does this imply about ≽? Put differently, what properties must ≽ necessarily have if demand is normal? ♢



7 Choice under Uncertainty

Until now, we have modelled situations where

action ↦ consequence

and the map is deterministic. We now want to look at situations where an action leads to a probability distribution over consequences, ie the consequence is stochastic:

action ↦ (consequence 1, . . . , consequence n, . . . )

The action is viewed as choosing a lottery ticket and a


consequence is a prize (or outcome). First a brief review
of definitions from probability theory.

7.1 Preliminaries

Let Z be a set. A σ-algebra Z is a collection of subsets of Z such that (i) for all A ∈ Z, Ac ∶= Z ∖ A ∈ Z, (ii) for any (An) ∈ Z∞, ⋃n An ∈ Z, and (iii) Z ∈ Z. The pair (Z, Z) is referred to as a measurable space. Notice that {∅, Z} is also a σ-algebra, the trivial σ-algebra. The set of all subsets of Z, namely 2Z, is also a σ-algebra. In what follows, we shall assume that Z is sufficiently rich in that it contains all the singletons, ie {z} ∈ Z for all z ∈ Z.

A (finite, positive) measure is a set-function P ∶ Z → R+


(in the sense that it takes as input an element of Z )


such that (i) P(Z) < ∞, and (ii) for any collection of
pairwise disjoint sets (An ) ∈ Z ∞ , P (⋃n An ) = ∑n P(An ).
The triple (Z, Z , P) is known as a measure space. If, in
addition, it is the case that P(Z) = 1, then (Z, Z , P) is a
probability space.
WARNING. There are a few tracts that define a prob-
ability measure as a set function that satisfies (i) above
and (ii-a) for any finite, pairwise disjoint collection of
sets (A1 , . . . , An ) ∈ Z n , P (⋃n An ) = ∑n P(An ).
Note. The property (ii) in the definition of a probability measure is known as countable additivity, and the property (ii-a) is known as finite additivity; a finitely additive probability measure is also known as a charge. Countable additivity, also known as continuity of the measure, is essential for the proof of the Laws of Large Numbers and the Central Limit Theorems, which occupy central positions in the theory of probability and are indispensable in applications. Nevertheless, finitely additive measures show up quite often in analysis (and therefore in infinite dimensional economies, for instance when the prize space is ℓ∞) and seem to be more reasonable from a behavioural point of view. Even so, we shall see some odd behaviour associated with finitely additive measures.

1 Example: Let Z ∶= R and Z ∶= B, the Borel σ-algebra, which is the σ-algebra generated¹ by the open intervals. Then (R, B) is a measurable space and (R, B, Leb) is a measure space, where Leb is the Lebesgue measure. ♢

¹ Let C be a collection of subsets of Z. Then, C ⊂ 2Z, the latter being a σ-algebra. Let σ(C) be the smallest σ-algebra that contains C. Then, σ(C) is the σ-algebra generated by C.

2 Example: The measurable space (Rn, Bn) is n-dimensional Euclidean space with the Borel σ-algebra. As above, (Rn, Bn, Lebn) is a measure space, where Lebn is the n-dimensional Lebesgue measure. ♢


3 Example: The probability space ([0, 1], B[0, 1], P), where
P([0, 1]) = 1 and P(A) ⩾ 0 for all A ∈ B[0, 1]. ♢

4 Example: Let (Z, Z, P) be a finite measure space with P(Z) > 0 and P(A) ⩾ 0 for all A ∈ Z. Then, we can define the probability measure Q ∶ Z → [0, 1] as Q ∶= (1/P(Z)) P, so that (Z, Z, Q) is a probability space. ♢

5 Example: Let (Z, Z) be a measurable space, and fix z0 ∈ Z. Now define the probability measure δz0 by

δz0(A) ∶= 1 if z0 ∈ A, and 0 otherwise,

for all A ∈ Z. The measure δz0 is known as the Dirac measure, and is concentrated at the point z0. ♢

In the special case where Z is finite, Z ∶= 2Z, the space of all subsets of Z. A probability measure p ∶ Z → [0, 1] is uniquely determined by the values it takes on members of Z. In particular, if Z ∶= {z0, . . . , zn}, so that ∣Z∣ = n + 1, p can be represented as (p0, . . . , pn), where pi ⩾ 0 for all i and ∑i pi = 1. Intuitively, pi ∶= p({zi}) for all i.

It will be especially useful for us to consider the space of all probability measures on Z, which is clearly

∆n ∶= {p ∈ Rn+1 ∶ pi ⩾ 0 for all i and ∑i pi = 1},

the standard n-dimensional simplex: n-dimensional since ∆n is of dimension n in the sense that its affine hull is an n-dimensional linear manifold, simplex since it is a generalisation of a triangle, and standard since it is the canonical simplex in much the same way that an equilateral triangle is the canonical triangle.


6 Exercise: Sketch ∆n for n = 0, 1, 2. ♢

For p ∈ ∆n , the support of the measure p is the set


supp(p) ∶= {z ∈ Z ∶ p(z) > 0}, ie all elements of Z that
are assigned positive probability by the measure p. We
shall generalise this definition in the sequel to other
probability spaces.
Notice that ∆n is a convex subset of Rn+1. Therefore, for p, q ∈ ∆n, (1/2)p + (1/2)q ∈ ∆n, ie the convex combination of probability measures is also a probability measure. For instance, if n = 2 and p = (1/2, 1/2, 0) and q = (0, 0, 1), then (1/2)p + (1/2)q = (1/4, 1/4, 1/2). Thus, taking a convex combination of p and q changes the support of the resulting measure. This is a more general property.

7 Exercise: Let p, q ∈ ∆n. Show that for all α ∈ (0, 1), supp (αp + (1 − α)q) = supp(p) ∪ supp(q). ♢
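Exercises 7–9 can be sanity-checked numerically for particular finite lotteries (a check, not a proof). A minimal sketch, representing a lottery on a finite Z as a tuple of probabilities:

```python
# Finite lotteries on Z = {z0, ..., zn} represented as tuples of probabilities.

def mix(alpha, p, q):
    """Convex combination alpha*p + (1 - alpha)*q of two lotteries."""
    return tuple(alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q))

def supp(p):
    """Support of a lottery: the indices assigned positive probability."""
    return {i for i, pi in enumerate(p) if pi > 0}

p = (0.5, 0.5, 0.0)
q = (0.0, 0.0, 1.0)
r = mix(0.5, p, q)

assert abs(sum(r) - 1) < 1e-12        # still a probability measure
assert supp(r) == supp(p) | supp(q)   # supp(αp + (1-α)q) = supp(p) ∪ supp(q)
print(r)   # (0.25, 0.25, 0.5), as in the example above
```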

8 Exercise: Let (Z, Z ) be a measurable space and let p, q


be probability measures on this measurable space. Show
that for any α ∈ (0, 1), αp + (1 − α)q is also a probability
measure, where for any set A ∈ Z , (αp + (1 − α)q)(A) ∶=
αp(A) + (1 − α)q(A). ♢

9 Exercise: Let (Z, Z ) be a measurable space and p, q


probability measure with finite support on this meas-
urable space. Show that for all α ∈ (0, 1), the measure
αp + (1 − α)q also has finite support. ♢

Fix a probability space (Z, Z, P). A random variable (rv) is a measurable function f ∶ Z → R, ie for each B ∈ B, f−1(B) ∈ Z. The expectation of f with respect to the measure P is

EP(f) ∶= ∫Z f(z) P(dz).


Notice that Ep is a linear functional on the space of rv’s.


For a measurable space (Z, Z ), the space of lotteries on
Z is denoted by L(Z) and is the space of all probability
measures on (Z, Z ). Notice that L(Z) is always convex.
Often, it is more convenient to look at the space of all
lotteries with finite support, L0 (Z). From the exercises
above, it is easy to see that L0 (Z) is also convex.
If Z is finite, then L(Z) = L0 (Z) ∶= ∆∣Z∣−1 and L(Z) ⊂ RZ .
In what follows, unless otherwise specified, we shall
assume that Z is finite.
We are interested in preferences over L(Z). Members
of L(Z) represent “objective” lotteries, in the sense that
all the agents in the economy assign the same odds to
the outcomes. The implicit assumption therefore is that
only distributions over outcomes matter.

7.2 Preferences Over Lotteries

Let us look at some examples of preferences over lotteries.

Preference for Uniformity – I: Suppose ∣Z∣ = n; then p ≽ q if ∑i (pi − 1/n)² ⩽ ∑i (qi − 1/n)², so the most preferred lottery is the uniform distribution.
Preference for Uniformity – II: A wider class of preferences that prefer uniformity are those where p ∼ q implies (1/2)p + (1/2)q ≽ p. Clearly, the preferences above have this property.


Worst Case: Suppose there exists a function v ∈ RZ, and p ≽ q if

min {v(z) ∶ z ∈ supp(p)} ⩾ min {v(z) ∶ z ∈ supp(q)}
Lexicographic Order: p ≽ q if (p1 , . . . , pn ) ≽L (q1 , . . . , qn ),


where ≽L is the lexicographic order.
Expected Utility: If there exists a function v ∈ RZ , p ≽ q
if
Ep [v(z)] ⩾ Eq [v(z)]

We shall shortly explore some properties of the preferences described above.
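Two of these criteria can be implemented directly and compared on the same pair of lotteries. In the sketch below the prizes, probabilities, and the function v are made up for illustration; it shows that the Worst Case and Expected Utility criteria can rank a pair of lotteries in opposite ways.

```python
# Lotteries on Z = {0, ..., n} as tuples of probabilities; v assigns a
# value v[z] to each prize z.  (Both v and the lotteries are illustrative.)

def worst_case(p, v):
    """Value of the worst prize in the support of p."""
    return min(v[z] for z, pz in enumerate(p) if pz > 0)

def expected_utility(p, v):
    """E_p[v] = sum over z of p(z) v(z)."""
    return sum(pz * v[z] for z, pz in enumerate(p))

v = [0.0, 5.0, 10.0]
p = (0.0, 1.0, 0.0)     # prize 1 for sure
q = (0.1, 0.0, 0.9)     # mostly the best prize, a small chance of the worst

# The two criteria disagree on this pair:
print(worst_case(p, v) > worst_case(q, v))              # True: p ≻ q under Worst Case
print(expected_utility(q, v) > expected_utility(p, v))  # True: q ≻ p under EU
```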

7.3 Compound Lotteries

Fix a (finite) set of prizes Z and let L(Z) be the space of lotteries over Z. We shall say that a lottery p ∈ L(Z) is simple. Consider now the lottery described as follows:
(a) In the afternoon, balls are drawn from an urn, and with probability α1, the outcome is a simple lottery p1 ∈ L(Z), and with probability α2 ∶= 1 − α1, the outcome is a simple lottery p2 ∈ L(Z).
(b) In the evening, a perfect die is rolled to determine the outcome of the relevant simple lottery (ie p1 or p2).

Call the above lottery (in the sense that there is a dis-
tribution over prizes) a compound lottery. Compound,
since there are two layers of uncertainty, and after the
first level of uncertainty is resolved, we are still left with
a simple lottery whose uncertainty is yet to be resolved.


Consider the compound lottery A where, as above, with probability α1 = 1/4, the outcome is the simple lottery p1 ∶= [(z1, z2, z3) ∶ (1/3, 1/3, 1/3)], and with probability α2 ∶= 3/4, the outcome is the simple lottery p2 ∶= [(z4, z5, z6) ∶ (1/7, 2/7, 4/7)]. Now consider the simple lottery q ∶= [(z1, z2, z3, z4, z5, z6) ∶ (1/12, 1/12, 1/12, 3/28, 3/14, 3/7)]. Clearly, both the compound lottery A and the simple lottery q have the same final distribution over outcomes. So the question is,
Is the decision maker indifferent (should he
be) between the compound lottery A and the
simple lottery q?
Notice that this isn’t a mathematical question. Instead,
it is a behavioural question, about how the decision
maker will (should) behave. Therefore, there is no cor-
rect answer. Instead, you, as the modeller, should be
comfortable with whatever answer you think reason-
able.
Notice also that the compound lottery is really a simple
lottery if we let the outcome space be L(Z), ie A ∈ L(L(Z)).
Therefore, A and q are defined over different domains.
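The reduction of A to q is a purely mechanical computation. The sketch below redoes it with exact rational arithmetic, using the numbers from the example (the dict representation of a simple lottery is an implementation choice):

```python
from fractions import Fraction as F

# Compound lottery A: probability alpha_k of facing the simple lottery p_k.
# Simple lotteries are dicts mapping prizes to probabilities.
p1 = {'z1': F(1, 3), 'z2': F(1, 3), 'z3': F(1, 3)}
p2 = {'z4': F(1, 7), 'z5': F(2, 7), 'z6': F(4, 7)}
A = [(F(1, 4), p1), (F(3, 4), p2)]

def reduce_compound(A):
    """Induced simple lottery: prize z receives sum over k of alpha_k * p_k(z)."""
    out = {}
    for alpha, p in A:
        for z, pz in p.items():
            out[z] = out.get(z, F(0)) + alpha * pz
    return out

q = {'z1': F(1, 12), 'z2': F(1, 12), 'z3': F(1, 12),
     'z4': F(3, 28), 'z5': F(3, 14), 'z6': F(3, 7)}

assert reduce_compound(A) == q   # same final distribution over prizes as q
```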

10 Exercise: Describe a canonical mapping Z → L(Z). Does the same canonical mapping work for L(Z) → L(L(Z))? In other words, is every simple lottery in a sense also compound? ♢

Going back to the initial question, suppose z1 = 15, z2 = 8


and z3 = 10, and z4 = z5 = z6 = 9, and suppose a prize rep-
resents a wealth level. Recall that a simple lottery (the
second stage lottery) is only resolved in the evening.
Now suppose there is some unmodelled information
that the decision maker receives in the afternoon, say


about what he will have for lunch in the afternoon. This


could conceivably change how he feels about the two
simple lotteries p1 and p2 . Therefore, it is conceivable
that the agent does not think that the lotteries A and q
are equivalent. Note well that the argument depends on
unmodelled things (the information in the afternoon).
Note well also that there is no economic model that
captures all that goes on in an agent’s life.

7.4 Expected Utility Theory

We shall now introduce the celebrated Expected Utility


Theory, originally due to von Neumann and Morgen-
stern. Note well that the theory we shall introduce will
treat the compound lottery A and simple lottery q as
mathematically equivalent and hence render them be-
haviourally indistinguishable by assumption. In any
event, we shall begin with the axioms.
Let Z be a finite set, and L(Z) the space of lotteries over
Z. As always, ≽ is a preference relation over L(Z). The
first axiom requires that preferences be continuous.

11 Assumption (Continuity — CONT): If p ≻ q ≻ r, there


exists α ∈ (0, 1) such that q ∼ αp + (1 − α)r. ♢

This assumption is equivalent to the standard version of continuity that we have seen earlier. Nevertheless, the above version seems to be somewhat more difficult to swallow. For instance, suppose p is the lottery that gives $1,000,000 for sure, q is a lottery that gives $1,000 for sure and r is the lottery where you lose your life. Most people

would say, “I wouldn’t trade my life for anything, even


the smallest chance of getting a million dollars.” Now
suppose the million dollars is four city blocks away and
you have to go on foot to get it. There is clearly a small
chance that you could get struck by lightning or hit by
a truck, ie of getting the lottery r. So the axiom doesn’t
seem so bad after all. The next axiom is more controver-
sial.

12 Assumption (Independence — IND): For all p, q, r, p ≻


q implies αp + (1 − α)r ≻ αq + (1 − α)r for all α ∈ (0, 1). ♢

The standard story behind this axiom is that we can


consider two compound lotteries A and B. With prob-
ability 1 − α, they both give the simple lottery r, and
with probability α, A gives lottery p while B gives lot-
tery q. If the outcome is r, then the decision maker has
no choice to make. If the outcome is to make a choice
between p and q, well we know that the decision maker
strictly prefers p. Thus, if the decision maker strictly
prefers p to q, then he should strictly prefer A to B for
any α and r.
Or so the story goes. Recall the example in the previous
section, where we argued that reducing a compound
lottery to a simple lottery is behaviourally suspect. But
with this warning in mind, we shall maintain this as-
sumption.

13 Exercise: Which of the preferences in §7.2 satisfy Con-


tinuity? Independence? ♢

14 Exercise: Show that the Continuity Axiom presented in


this section and the Continuity Axiom presented earlier
in terms of contour sets are equivalent. ♢


15 Exercise: Suppose that ≽ satisfies IND. Show that for


each p ∈ L0 (Z), the sets {q ∶ q ≽ p}, {q ∶ p ≽ q} and
{q ∶ q ∼ p} are convex. Is the convexity of the three sets
for each p sufficient to ensure that ≽ satisfies IND? ♢

Before we state and prove the Expected Utility Theorem,


we shall state a couple of intermediate results that are
interesting in their own right.

16 Lemma: Let ≽ satisfy IND and suppose p, q ∈ L0(Z) are such that p ≻ q. Then, for any 1 ⩾ α > β ⩾ 0,

p ≽ αp + (1 − α)q ≻ βp + (1 − β)q ≽ q. ♢

Proof. By IND, we know that p ≽ αp + (1 − α)q. Now,


let γ ∶= β/α ∈ (0, 1). Then, γ(αp + (1 − α)q) + (1 − γ)q =
βp + (1 − β)q. By IND, αp + (1 − α)q ≻ γ(αp + (1 − α)q) +
(1 − γ)q ≽ q, as required. ∎

17 Lemma (Substitution): Let (pn)Nn=1 and (qn)Nn=1 be collections of lotteries such that pn ∼ qn for all n ∈ {1, . . . , N}. Then, for any (αn) ∈ RN+ such that ∑n αn = 1,

∑n αn pn ∼ ∑n αn qn. ♢

Proof. We shall prove the lemma for the case of N = 2. The general case is left as an exercise. We know that p1 ∼ q1 and p2 ∼ q2. Now, let α ∈ [0, 1]. Then, by IND, αp1 + (1 − α)p2 ∼ αq1 + (1 − α)p2. Again by IND, αq1 + (1 − α)p2 ∼ αq1 + (1 − α)q2. Transitivity of ∼ proves the claim. ∎


18 Exercise: Prove the Substitution lemma above for gen-


eral N. ♢

19 Exercise: Suppose δx ∼ δy for all x, y ∈ Z. Show that for


all p, q ∈ L0 (Z), p ∼ q. ♢

Now for the Theorem. When Z is finite, it is easy to see


that L0 (Z) is compact and convex. Hence, by Debreu’s
theorem, there exists a utility representation, since pref-
erences are continuous. But this utility representation
lacks the structure we desire. Therefore, we shall need a
new line of proof, which is presented below.

20 Theorem (Expected Utility Theorem): Let ≽ be a preference over L0(Z). Then, the following are equivalent.
(i) ≽ satisfies IND and CONT.
(ii) There exists a function v ∶ Z → R such that for any lotteries p, q ∈ L0(Z), p ≽ q if and only if

V(p) ∶= ∑z p(z)v(z) ⩾ ∑z q(z)v(z) =∶ V(q).

Moreover, the function v is unique up to positive affine transformation.² ♢

² For a set X, a function w ∈ RX is a positive affine transformation of a function v ∈ RX if there exist a > 0 and b ∈ R such that w ∶= av + b.

Proof. We shall only prove that (i) implies (ii); the other implication is left as an exercise. Let δz⁰ and δz₀ be the best and worst degenerate lotteries respectively. If δz⁰ ∼ δz₀, it follows from the Substitution lemma above (and exercise 19) that p ∼ q for all p, q ∈ L0(Z). Thus, we can let v(z) = 0 (or any other constant) for all z ∈ Z, which implies ∑z p(z)v(z) = 0 for all p ∈ L0(Z), as desired.
Now suppose δz⁰ ≻ δz₀. Let v(z₀) = 0 and v(z⁰) = 1. Then, by CONT, for each z ∈ Z, there exists a v(z) ∈

[0, 1] such that δz ∼ v(z)δz⁰ + (1 − v(z))δz₀. By lemma 16, v(z) is unique. By the Substitution lemma,

p = ∑z p(z)δz
  ∼ ∑z p(z)[v(z)δz⁰ + (1 − v(z))δz₀]
  = [∑z p(z)v(z)] δz⁰ + [1 − ∑z p(z)v(z)] δz₀.

Moreover, by lemma 16, p ≽ q if and only if ∑z p(z)v(z) ⩾ ∑z q(z)v(z), so that V represents ≽.
Now to show that v is unique up to positive affine transformation. Suppose that W(p) ∶= ∑z p(z)w(z) also represents ≽. We shall show that w is a positive affine transformation of the function v identified above. First notice that since δz⁰ ≻ δz₀, w(z⁰) > w(z₀) and v(z⁰) > v(z₀). Thus, there exists a unique (a, b) ∈ R++ × R such that

w(z⁰) = av(z⁰) + b and w(z₀) = av(z₀) + b.

Indeed,

a ∶= [w(z⁰) − w(z₀)] / [v(z⁰) − v(z₀)]

and b is then determined implicitly. Then, for any z ∈ Z, there exists an α ∈ [0, 1] such that δz ∼ αδz⁰ + (1 − α)δz₀, so that

w(z) = αw(z⁰) + (1 − α)w(z₀)
     = α[av(z⁰) + b] + (1 − α)[av(z₀) + b]
     = a[αv(z⁰) + (1 − α)v(z₀)] + b
     = av(z) + b.

Thus, v is unique up to positive affine transformation. ∎


21 Exercise: Show that (ii) implies (i) in the theorem above.♢

22 Exercise: Prove the expected utility theorem above for


the case where Z is infinite. ♢
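Both the representation and its uniqueness claim are easy to illustrate numerically. In the sketch below (the utility v and the lotteries are made up for illustration), the ranking induced by V is unchanged by a positive affine transformation of v:

```python
def V(p, v):
    """Expected utility of lottery p (a tuple of probabilities) under v."""
    return sum(pz * vz for pz, vz in zip(p, v))

v = [0.0, 1.0, 4.0]            # a vN-M utility on Z = {z0, z1, z2} (illustrative)
w = [2 * vz + 7 for vz in v]   # positive affine transformation: a = 2, b = 7

p = (0.2, 0.5, 0.3)
q = (0.0, 0.9, 0.1)

# The ranking of p and q is the same under v and under w = 2v + 7:
assert (V(p, v) >= V(q, v)) == (V(p, w) >= V(q, w))
print(V(p, v), V(q, v))   # p is ranked above q: V(p) = 1.7 > 1.3 = V(q)
```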

23 Exercise: Let Z ⊂ Rn be convex. Suppose ≽ is a prefer-


ence relation on Z that is continuous and satisfies the
following: x ≻ y implies αx + (1 − α)z ≻ αy + (1 − α)z for
all x, y, z ∈ Z and all α ∈ (0, 1]. Show that there exists an
affine function u ∶ Z → R that represents ≽. Now consider
the cases where Z ⊂ C[0, 1] and Z ⊂ `∞ respectively. ♢

7.5 Consistency of our Assumptions

Let us now consider two examples that show us that


we should be careful with the assumptions we are mak-
ing about the properties of an agent’s preferences over
lotteries. It is easiest to do so via some exercises.

24 Exercise (Allais' Paradox): Consider the lotteries in the table below.

       $0m    $1m    $5m
  J    0      1      0
  K    0.01   0.89   0.10
  L    0.89   0.11   0
  M    0.9    0      0.1

(a) Would you choose J over K?
(b) Would you choose L over M?
(c) Show that the preferences with J ≻ K and M ≻ L cannot have an expected utility representation. It will suffice to show that ≽ does not satisfy one of the vN-M axioms. ♢
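One way to see (c): J ≻ K requires 0.11·u(1) > 0.01·u(0) + 0.10·u(5), while M ≻ L requires exactly the reverse strict inequality, so no vN-M utility u can produce both. A brute-force search over random utilities makes the same point (a numerical illustration, not a proof):

```python
import random

def eu(lottery, u):
    """Expected utility of a lottery (p0, p1, p5) over prizes ($0m, $1m, $5m)."""
    return sum(p * uz for p, uz in zip(lottery, u))

J = (0.0, 1.0, 0.0)
K = (0.01, 0.89, 0.10)
L = (0.89, 0.11, 0.0)
M = (0.9, 0.0, 0.1)

random.seed(0)
found = False
for _ in range(100_000):
    u = sorted(random.uniform(-100, 100) for _ in range(3))  # any u(0) <= u(1) <= u(5)
    if eu(J, u) > eu(K, u) and eu(M, u) > eu(L, u):
        found = True
        break
print(found)   # False: the pattern J ≻ K and M ≻ L never arises for an EU maximiser
```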

25 Exercise (Zeckhauser’s Paradox): Consider the fol-


lowing version of the Russian Roulette. Some bullets are
loaded into a revolver with six chambers. The cylinder is
spun and the gun put to your head.


(a) Would you be willing to pay more to get one bullet


removed from the gun when only one bullet was
loaded or when four bullets were loaded?
(b) Show that if you have vN-M preferences, you
would be willing to pay the same amount of money
to have a bullet removed, regardless of the number
of bullets in the gun.
(c) Can you see why you would be willing to pay the same amount to have a bullet removed? Does this seem reasonable? ♢

In sum, vN-M preferences admit (appropriate) utility representations that are linear in probabilities. As Allais' and Zeckhauser's Paradoxes show, this is not always very reasonable. Nevertheless, when sums are small, and probabilities don't change by too much, it seems reasonable that vN-M preferences are a good approximation of reality.



8 Risk and Risk Aversion

8.1 Monetary Prizes

One of the most important applications of expected


utility theory is when the prizes are money amounts.
The money amount can be interpreted as final wealth (a stock) or as income (a flow). Based on the interpretation, certain other assumptions may or may not be palatable. Be careful what you have in mind. We shall assume that the prize space Z ⊂ R. As above, preferences are over L0(Z), the space of lotteries with finite support.
A lottery is a probability distribution over R with finite support. In the last chapter, we presented a utility function over Z that represented preferences over lotteries. In applications, we would like more structure on the utility function; in particular, that it be continuous. Unfortunately, this property of the utility function does not follow from the axioms considered in the last chapter.
Roughly, we need to strengthen our continuity axiom.
We need, as before, that upper and lower contour sets
be closed. But if Z is not finite, we can’t use the Euc-
lidean metric, so we need another metric on the space
of lotteries. This is the technical detail that we wish to
avoid here, so we shall simply assume that there exists
a continuous vN-M utility function u on Z, and that the
utility of a lottery p is Ep [u] ∶= ∫ u(z) p(dz).


We will naturally assume that if x > y, then δx ≻ δy, ie that preferences are monotone. We shall also wish to consider lotteries over an unbounded prize space (though typically with finite support).
More advanced treatments allow for lotteries with infinite support. In such treatments, a consequence is that the utility function must be bounded. To see why this is necessary, suppose the utility function u ∶ Z → R is unbounded below and lotteries can have unbounded (and countably infinite) support. Then, for each k ∈ N, there exists zk ∈ Z such that u(zk) ⩽ −2^k. Now consider the lottery p∗ with countable, unbounded support, where the probability of zk is p∗(zk) = 2^−k. Then, the expected utility of the lottery p∗ is

Ep∗[u] = ∑∞k=1 u(zk) p∗(zk) = −∞.

If we allow lotteries with −∞ utility, then the weak Continuity axiom fails. For if p ≻ q ≻ p∗, then for all α ∈ (0, 1), Eαp+(1−α)p∗[u] = −∞ < Eq[u]. This can be avoided if u is bounded, or if we only allow lotteries with bounded support, thereby bounding utility.
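The divergence in the construction above is easy to see in partial sums: with u(zk) = −2^k and p∗(zk) = 2^−k, every term contributes exactly −1. A minimal sketch:

```python
def partial_expected_utility(K):
    """Partial sum over k = 1..K of u(z_k) * p(z_k), with u(z_k) = -2**k and
    p(z_k) = 2**-k.  Each term is exactly -1, so the partial sum is -K."""
    return sum((-2 ** k) * (2 ** -k) for k in range(1, K + 1))

for K in (10, 100, 1000):
    print(K, partial_expected_utility(K))   # -10.0, -100.0, -1000.0: no lower bound
```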
There is another, supposedly market-based reason for not allowing unbounded utility, the so-called St. Petersburg Paradox. Suppose the agent has an unbounded utility function u; let us say it is unbounded above. Now suppose the agent has wealth z0. A confidence man approaches him and makes the following offers.
(a) A lottery that reduces his wealth to 0 with probability 1/2 and gives wealth z1 with probability 1/2, where z1 is such that u(z0) < [u(0) + u(z1)]/2. Since u is unbounded, such a z1 always exists.


(b) If the agent loses, the confidence man stops. If not, rather than pay up, the confidence man offers him a lottery that reduces his wealth to 0 with probability 1/2 and gives him wealth z2 with probability 1/2. As before, z2 is such that u(z1) < [u(0) + u(z2)]/2. Since u is unbounded, such a z2 always exists.
(c) And so on and so forth . . . .


Clearly, with probability 1, the agent ends up with zero wealth. Again, this is just an imaginary market; it is the same type of argument used for restricting attention to risk averse individuals (as we shall see below). It is not clear where such a confidence man can be found, or why, if there are other confidence men, they don't bid down the bets. Therefore, you should take this argument with a grain of salt.

8.1.1 Marginal Utility

A crucial concept in economics is the marginal utility


of consumption or wealth. Nevertheless, it is not an
idea we have yet mentioned, and with good reason. In
the standard model of the consumer, the idea makes
no sense. This is because the consumer’s preferences
can only be identified up to a monotone transformation.
As is easily seen, the derivative is not preserved under
monotone transformation, hence the notion of marginal
utility is not well defined.
But if we have lotteries over money, and if preferences
over lotteries are linear, then the Expected Utility The-
orem tells us that the vN-M utility function is identified
up to positive affine transformation. But the derivative
is then identified up to positive constant, hence mar-
ginal utility is identified up to positive constant, giving


the notion some meaning. It is amusing to note that the notion of marginal utility of wealth requires the use of lotteries over wealth. For more on this, see von Neumann and Morgenstern (1953).

8.2 Comparing Lotteries

The question that we will address in this section is the


following:

Given a set of preference relations on L(Z), for


what lotteries p and q will it be the case that p ≽ q
for all agents in this set?

The type of agents we shall consider are those who satisfy the EU hypothesis and have increasing utility for money, ie those whose preferences are represented by a non-decreasing function u, so that p ≽ q if and only if Ep[u] ⩾ Eq[u].
We shall assume that lotteries are over R. We shall not make any assumptions on the support of lotteries, except to require that it be compact (and not necessarily finite). Let F be the cdf of a lottery and, without loss of generality, assume that F(0) = 0 and F(1) = 1, ie that the support lies in [0, 1].
Before we proceed, recall that a lottery F ∶ R → R has the
following properties:
• limx→−∞ F(x) = 0 and limx→∞ F(x) = 1
• F is non-decreasing, and
• F is right continuous.


1 Exercise: Show that the set of discontinuities of F(⋅) is


countable. ♢

8.2.1 First Order Stochastic Dominance

Suppose we say that 'F yields unambiguously higher returns than G'. This could mean
(a) every EU maximiser prefers F to G, or
(b) for all x, F(x) ⩽ G(x).
Notice that the second definition is independent of EU
theory – it is purely a property of distributions. Now
for a definition. For this definition, we shall take the
first meaning above, and in our theorem, we shall show
that it is equivalent to the second definition.

2 Definition: F(⋅) first order stochastically dominates


G(⋅) if for all u ∶ R → R, non-decreasing, we have

∫ u(x) F(dx) ⩾ ∫ u(x) G(dx).

We shall write this as F D1 G. ♢

The following is a characterisation of first order stochastic


dominance.

3 Theorem: F D1 G if and only if F ⩽ G. ♢

Proof. (i) Suppose F D1 G. Let H(x) ∶= F(x) − G(x). By assumption, H(0) = H(1) = 0. We want to show that H(x) ⩽ 0 for all x ∈ [0, 1]. So suppose there exists x∗ ∈ (0, 1) such that H(x∗) > 0. Toward a contradiction, define u ∶ R → R as

u(x) ∶= 0 if x < x∗, and 1 if x ⩾ x∗.

Thus, u is non-decreasing. Therefore, by definition of D1, ∫ u(x) H(dx) ⩾ 0. But ∫ u(x) H(dx) = ∫¹ₓ∗ u(x) H(dx) = u(1)H(1) − u(x∗)H(x∗) = −H(x∗) < 0, a contradiction. Therefore, F ⩽ G.

(ii) Suppose F ⩽ G. It suffices to consider u ∈ C²(R), since any non-decreasing function can be approximated by a twice continuously differentiable function. As above, let H(x) ∶= F(x) − G(x). Then, integrating by parts,

∫ u(x) H(dx) = [u(x)H(x)]₀¹ − ∫₀¹ u′(x)H(x) dx = −∫₀¹ u′(x)H(x) dx,

since H(0) = H(1) = 0. Therefore, ∫ u(x) H(dx) ⩾ 0 if and only if ∫₀¹ u′(x)H(x) dx ⩽ 0. But since u is non-decreasing, u′ ⩾ 0 ae. Moreover, H ⩽ 0, so that ∫₀¹ u′(x)H(x) dx ⩽ 0, as desired. ∎

4 Exercise: Let u ∶ R → R be non-decreasing. Show that if ∫ u(x) F(dx) < ∫ u(x) G(dx), there exists a twice continuously differentiable, non-decreasing function v ∶ R → R such that ∫ v(x) F(dx) < ∫ v(x) G(dx). ♢

Notice that if F D1 G, then ∫ x F(dx) ⩾ ∫ x G(dx). (Prove


this.) Consider the following simple example of the D1
order.


5 Example: Let

G(x) ∶= 0 for x ⩽ 0,  2x for 0 ⩽ x ⩽ 1/2,  and 1 for x ⩾ 1/2.

Let F(x) ∶= G(x − 1/2). Then, F D1 G. ♢
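Example 5 and the pointwise criterion of Theorem 3 can be checked numerically (a sanity check, not a proof; the grid and the particular non-decreasing u are illustrative choices):

```python
import random

def G(x):
    """cdf of Example 5: the uniform distribution on [0, 1/2]."""
    return 0.0 if x <= 0 else (2 * x if x <= 0.5 else 1.0)

def F(x):
    """G shifted right by 1/2: the uniform distribution on [1/2, 1]."""
    return G(x - 0.5)

# Pointwise criterion of Theorem 3: F <= G everywhere on a grid.
grid = [i / 1000 for i in range(1001)]
assert all(F(x) <= G(x) for x in grid)

# Monte Carlo check that E_F[u] >= E_G[u] for a non-decreasing u:
# F is uniform on [1/2, 1] and G is uniform on [0, 1/2].
random.seed(1)
u = lambda x: x ** 2                # non-decreasing on [0, 1]
EF = sum(u(0.5 + 0.5 * random.random()) for _ in range(10_000)) / 10_000
EG = sum(u(0.5 * random.random()) for _ in range(10_000)) / 10_000
assert EF > EG                      # consistent with F D1 G
```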

6 Exercise: In the example above, let Ft(x) ∶= G(x − t), for 0 ⩽ t ⩽ 1/2. Show that for each such t, Ft D1 G. Sketch Ft and G. ♢

7 Exercise: Suppose F D1 G. Show that ∫ x F(dx) ⩾ ∫ x G(dx).


Show that for all α ∈ (0, 1), F D1 αF + (1 − α)G D1 G. ♢

8 Exercise: Is the order D1 on L(Z) complete? Transitive?


Prove or disprove. ♢

9 Exercise: Let (pn ) be a sequence of lotteries in L0 ([0, 1])


such that pn+1 D1 pn for all n. Moreover, suppose pn → p
pointwise (or uniformly if that makes life easier–but
look up Dini’s theorem in an analysis text). Show that
p D1 pn for all n. ♢

10 Exercise: Let (Fn ) be a sequence of cdf’s (lotteries) in R


such that Fn+1 D1 Fn for all n. Moreover, suppose Fn → F
pointwise (or uniformly if that makes life easier–but
look up Dini’s theorem in an analysis text). Show that
F D1 Fn for all n. ♢


8.3 Risk Aversion

We say that ≽ is risk averse if for any lottery p, δEp[x] ≽ p, risk loving if p ≽ δEp[x], and risk neutral if δEp[x] ∼ p. In other words, a risk averse agent prefers the expectation of a lottery (received for sure) to the lottery itself. We will now characterise risk aversion for agents with vN-M utility functions. Let C ⊂ Rn be a convex set. Recall that a function u ∶ C → R is concave if u(αx + (1 − α)y) ⩾ αu(x) + (1 − α)u(y) for all x, y ∈ C, α ∈ [0, 1]. Now for some properties of concave functions defined on R.

(a) If u ∶ R → R is concave, then it is continuous everywhere. If u ∶ [a, b] → R is concave, then it is continuous on the interior of its domain.
(b) If u is differentiable, x < y implies u ′ (x) ⩾ u ′ (y).
(c) Jensen’s Inequality: For all x1 , . . . , xN and α1 , . . . , αN
such that αn ⩾ 0 and ∑n αn = 1, it is the case that
u (∑n αn xn ) ⩾ ∑n αn u(xn ).

(d) Three Strings Lemma: x < y < z and u concave im-


plies [u(z) − u(y)]/(z − y) ⩽ [u(z) − u(x)]/(z − x) ⩽
[u(y) − u(x)]/(y − x). (Draw a picture to see why
the Lemma is named so.)

11 Exercise: Prove the properties above. ♢

12 Proposition: Let ≽ be a preference on L(Z), represented


by a vN-M utility function u. Then, ≽ is risk averse if
and only if u is concave. ♢

Proof. (i) Assume ≽ is risk averse and is represented


by u. For any α ∈ (0, 1) and x, y ∈ Z, by risk aversion it


follows that δαx+(1−α)y ≽ αδx + (1 − α)δy . Thus, u(αx +


(1 − α)y) ⩾ αu(x) + (1 − α)u(y), so that u is concave.
(ii) Suppose now that u is concave. Then, for any lot-
tery p with finite support, u( Ep [x]) ⩾ Ep [u], ie ≽ is
risk averse. ∎

13 Exercise: Prove the proposition above for lotteries with


infinite support. Use the following generalised form of
Jensen’s inequality: u (∫ x F(dx)) ⩾ ∫ u(x) F(dx), for all
cdf’s F and concave u. ♢
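Property (c) is exactly what drives Proposition 12, and it is easy to check on a concrete lottery. A minimal numerical sketch (the lottery and the square-root utility are illustrative choices, not from the text):

```python
import math

# A finite money lottery (illustrative numbers).
prizes = [1.0, 4.0, 9.0]
probs = [0.2, 0.5, 0.3]

u = math.sqrt  # a concave, increasing vN-M utility

mean = sum(p * x for p, x in zip(probs, prizes))   # E_p[x] = 4.9
eu = sum(p * u(x) for p, x in zip(probs, prizes))  # E_p[u] ≈ 2.1

# Jensen's inequality for concave u: u(E_p[x]) >= E_p[u],
# ie the risk averse agent prefers the mean to the lottery itself.
assert u(mean) >= eu
```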

8.4 Risk Premia

For a preference ≽, the certainty equivalent of a lottery
p ∈ L(Z) is CE(p) ∈ R such that δCE(p) ∼ p. If preference
≽ is continuous and monotonic, then CE(p) always ex-
ists. If ≽ is risk averse, the CE is unique. Let us define
RP(p) ∶= Ep [x] − CE(p) as the risk premium. A preference
is risk averse if and only if RP(p) ⩾ 0.

14 Exercise: Show that if ≽ is continuous and monotonic


(ie, x > y implies δx ≽ δy ), then CE exists. Show that if ≽ is
risk averse, then CE is unique. ♢

15 Exercise: Show that RP ⩾ 0 if and only if ≽ is risk


averse. ♢
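For a strictly increasing vN-M utility u, the certainty equivalent can be computed as CE(p) = u−1 (Ep [u]). A hedged numerical sketch with log utility (the lottery is the 50-50 gamble over $1 and $2 that also appears in the next section):

```python
import math

prizes, probs = [1.0, 2.0], [0.5, 0.5]
u, u_inv = math.log, math.exp  # u strictly increasing, so CE = u^{-1}(E_p[u])

mean = sum(p * x for p, x in zip(probs, prizes))          # E_p[x] = 1.5
ce = u_inv(sum(p * u(x) for p, x in zip(probs, prizes)))  # CE(p) = sqrt(2)
rp = mean - ce                                            # risk premium

assert rp > 0  # log utility is risk averse, so RP(p) > 0
```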


8.5 Comparing Risk Aversion

Is there a way to compare the risk aversions of agents?


Note, we are not (yet) necessarily asking for a numer-
ical measure of risk aversion (which would solve our
problem). All we want is a definition that we can use
in practice. To see what a possible definition might be,
let us think about a plausible notion in the following
example.
Let p be a lottery that gives prizes $1 and $2 with equal
probability. Suppose agent A has a certainty equivalent
CEA (p). If A is risk averse, CEA (p) ⩽ 3/2. If B is more risk
averse, it is reasonable to require that CEB (p) ⩽ CEA (p).
We shall take this as a definition.

16 Definition (cra1): ≽1 is more risk averse than ≽2 if for


all p, CE1 (p) ⩽ CE2 (p). ♢

A weaker version of this statement (weaker if prefer-


ences are not monotone) is the following.

17 Definition (cra2): ≽1 is more risk averse than ≽2 if for


all p and any degenerate lottery x, p ≽1 δx implies p ≽2
δx . ♢

Finally, if preferences have a vN-M representation, there


is the following notion.

18 Definition (cra3): Let ui be a vN-M utility function
that represents ≽i , for i = 1, 2. Then, ≽1 is more risk
averse than ≽2 if there is an increasing concave function
g such that u1 = g ○ u2 . ♢


In other words, ≽1 is more risk averse than ≽2 if u1 is


more concave than u2 . Thus, measuring the curvature
of the utility function seems to be a worthwhile exercise.
We shall come back to this idea. First, we shall show
that the three notions defined above are equivalent for
EU preferences.

19 Theorem: Suppose ≽1 and ≽2 are continuous and mono-


tone. Then, the above definitions are equivalent. ♢

Proof. (i) [cra1 implies cra2] Suppose cra1 holds,
so that CE1 (p) ⩽ CE2 (p) for all p. Now suppose p ≽1 δx
for some x. Since δCE1 (p) ∼1 p, transitivity and
monotonicity give CE1 (p) ⩾ x, and by assumption
CE2 (p) ⩾ CE1 (p) ⩾ x. Therefore, p ∼2 δCE2 (p) ≽2 δx .

(ii) [cra2 implies cra3] Suppose cra2 holds. Define


g implicitly by g(u2 (x)) = u1 (x). We want to show that
g is concave. Suppose g is NOT concave, in which
case there exist x, z ∈ Z and α ∈ (0, 1) such that
g(αu2 (x) + (1 − α)u2 (z)) < αg(u2 (x)) + (1 − α)g(u2 (z)).
Let p ∶= αδx + (1 − α)δz and y ∶= CE2 (p). Then, δy ∼2 p,
but p ≻1 δy . By continuity of preferences, there exists
w > y such that p ≻1 δw but δw ≻2 p, which contradicts
cra2.
(iii) [cra3 implies cra1] Suppose cra3 holds, so
g is concave. Then, for any lottery p, u2 (CE2 (p)) =
∑ p(xk )u2 (xk ). Now, u1 (CE2 (p)) = g(u2 (CE2 (p))) =
g( ∑ p(xk )u2 (xk )) ⩾ ∑ p(xk )g(u2 (xk )) = ∑ p(xk )u1 (xk ) =
Ep [u1 ] = u1 (CE1 (p)). Since u1 is increasing, CE1 (p) ⩽
CE2 (p), as desired. ∎

20 Exercise: Show that cra1 is equivalent to cra2 if pref-


erences are monotone. ♢


21 Exercise: Consider the utility function u(c) ∶= c1−σ /(1 −


σ), where σ ∈ (0, 1). Does an increase in σ make the
utility function more or less risk averse? ♢
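Exercise 21 can be probed numerically: under u(c) = c1−σ /(1 − σ) with 0 < σ < 1, the certainty equivalent of a lottery is CE = (E[c1−σ ])1/(1−σ) , since u is a monotone transformation of c1−σ . A sketch (the lottery is an arbitrary choice):

```python
def crra_ce(prizes, probs, sigma):
    """Certainty equivalent under u(c) = c**(1-sigma)/(1-sigma), 0 < sigma < 1."""
    eu = sum(p * x ** (1 - sigma) for p, x in zip(probs, prizes))
    return eu ** (1 / (1 - sigma))

prizes, probs = [1.0, 4.0], [0.5, 0.5]

ce_low = crra_ce(prizes, probs, 0.2)   # less concave u
ce_high = crra_ce(prizes, probs, 0.8)  # more concave u

# Higher sigma means a more concave u, hence (by cra1) a lower
# certainty equivalent: the sigma = 0.8 agent is more risk averse.
assert ce_high < ce_low < 2.5  # both below the mean E[x] = 2.5
```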

8.6 The Arrow-Pratt Measure

As mentioned above, the concavity of the utility func-


tion indicates the degree of risk aversion. We shall now
look at a measure of risk aversion which is closely re-
lated to the curvature of the utility function.
Consider two agents 1 and 2, with initial wealth levels
w1 and w2 , and preferences ≽1 and ≽2 . Now suppose
p is a lottery over money. Then, w + p is a lottery over
final wealth levels, which assigns probability p(xk ) to the
outcome w + xk for all
xk ∈ supp(p). Suppose now that the agents can be rep-
resented by vN-M utility functions u1 and u2 over lotter-
ies over money.
We shall interpret the two agents as different selves
of one agent, but at different wealth levels. Intuitively,
the agent ‘changes’ whenever his wealth level changes.
More precisely, consider the wealth levels w1 and w2 as
above. Then, the agent’s preferences at wealth level wi
are ≽i , i = 1, 2. But the following doctrine says that this
is not of great concern.

Consequentialism Suppose the agent has preference


≽w at wealth level w. Then, there exists a single utility
function u such that for all wealth levels w and for all
lotteries p and q over money, p ≽w q if and only if
Ew+p [u] ⩾ Ew+q [u].


Is this assumption, which economists make all the time,


reasonable? Markowitz (1952) disavows the
doctrine, suggesting that any change in wealth should
cause the agent to rerank the entire space of lotteries. A
reasonable question then is, How much should prefer-
ences change if there is a very small increase in wealth?

As Machina (1982) points out, dropping the


doctrine of consequentialism would be extremely dis-
turbing to economists, and more importantly, would
make welfare comparisons nearly impossible to perform.
In what follows, we shall adopt the doctrine without
further qualification.

Coming back to our discussion of a measure of risk


aversion, we note that we want to compare an agent’s
risk aversion at different wealth levels.

22 Definition: Suppose the agent’s preferences are ≽w at


wealth level w, and suppose the doctrine of consequen-
tialism holds. Then, the agent is decreasingly (absolute)
risk averse (DARA) if w < w ′ implies that ≽w is more risk
averse than ≽w ′ . ♢

Thus, if y is a constant amount of wealth, and if the


agent is more risk averse at wealth level w than at wealth
level w ′ , then for any lottery p, p ≽w δy implies p ≽w ′ δy .
Another way of stating this is that the agent is DARA
if for any lottery p and all wealth levels w < w ′ and y,
Ew+p [u] > u(w + y) implies Ew ′ +p [u] > u(w ′ + y). (This is
the form in which it is usually stated in textbooks.) The
following theorem helps us characterise DARA.


23 Theorem: Suppose ≽ has a vN-M representation u.


Then, ≽ is DARA if and only if

Au (w) ∶= −u ′′ (w)/u ′ (w)

is non-increasing in w. The function Au is called the


Arrow-Pratt measure of risk aversion. ♢

The theorem follows from the next theorem below. As a


consequence of the above theorem, we have the follow-
ing definition.

24 Definition: A utility function u has constant (absolute)
risk aversion (CARA) if Au (w) is constant in w. Then,
for some a > 0, b ∈ R,

u(w) ∶= aw + b if Au (w) = 0, or
u(w) ∶= −ae−λw + b if Au (w) = λ > 0. ♢
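The CARA claim is easy to verify numerically by approximating Au with central finite differences (the wealth grid, step size and parameter values below are arbitrary choices):

```python
import math

def arrow_pratt(u, w, h=1e-4):
    """Central-difference approximation of A_u(w) = -u''(w)/u'(w)."""
    u1 = (u(w + h) - u(w - h)) / (2 * h)
    u2 = (u(w + h) - 2 * u(w) + u(w - h)) / h ** 2
    return -u2 / u1

lam = 0.5
u_cara = lambda w: -math.exp(-lam * w)  # CARA utility with a = 1, b = 0

# A_u(w) should equal lam at every wealth level.
for w in [1.0, 2.0, 5.0, 10.0]:
    assert abs(arrow_pratt(u_cara, w) - lam) < 1e-4
```

The same helper recovers Au (w) = 1/w for log utility, a DARA example used later.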

The Arrow-Pratt measure also allows us to compare risk


aversion across individuals.

25 Theorem: v is more risk averse than u if and only if


Av ⩾ Au . ♢

Proof. (i) Suppose v is more risk averse than u, so


that v ∶= g ○ u for some concave increasing g. Then,

(26) v ′ (c) = g ′ (u(c))u ′ (c)

and

(27) v ′′ (c) = g ′ (u(c))u ′′ (c) + g ′′ (u(c))(u ′ (c))².

Then, dividing −1 times equation (27) by (26), we get

Av (c) ∶= −v ′′ (c)/v ′ (c) = Au (c) − [g ′′ (u(c))/g ′ (u(c))] u ′ (c).

But g is concave so g ′′ ⩽ 0, so that Av ⩾ Au .

(ii) Suppose u and v are increasing and Av ⩾ Au . Let g
be implicitly defined by v =∶ g ○ u, ie g(y) ∶= v(u−1 (y)).
Since u ′ , v ′ > 0, it follows from (26) that g ′ > 0. We can
rewrite (27) as

−[g ′′ (u(c))/g ′ (u(c))] u ′ (c) = Av (c) − Au (c) ⩾ 0

where the last inequality is by hypothesis. But u ′ , g ′ > 0,
so that g ′′ ⩽ 0, as desired. ∎

8.7 A Portfolio Problem

We shall now use these ideas in a simple portfolio prob-


lem. Consider an investor with wealth w. There are
two assets in which she can invest. The first is riskless,
and offers a constant rate of return of 1. The second is
a risky asset that gives a (net) return ri with probabil-
ity pi , i = 1, . . . , n. We shall denote the random rate of
return by r. Let x denote the part of her wealth that
she allocates to the risky asset. Then, final wealth is
(w − x) + x(1 + r) = w + xr.
Suppose she has vN-M utility function u and is DARA.
Then, her expected utility is Ep [u(w + xr)], and her
problem is to maximise expected utility, ie to solve
max0⩽x⩽w ∑i pi u(w + ri x).


For ease of notation, let U(x) ∶= ∑i pi u(w + ri x). The first
thing to do is to check that the optimal choice x∗ > 0. If
x∗ = 0, it must be that (dU/dx)(x∗ ) ⩽ 0. But this happens if
and only if

∑ pi ri u ′ (w + ri x)∣x=0 = u ′ (w) ∑ pi ri ⩽ 0.

In other words, x∗ = 0 if and only if the expected return


of the asset ∑ pi ri ⩽ 0. So from now onwards, we shall
assume that ∑ pi ri > 0 which implies that x∗ > 0. Let us
now assume that x∗ < w. Then, the first order condition
is

(FOC) ∑ pi ri u ′ (w + ri x) = 0

and the second order condition is

(SOC) ∑ pi ri² u ′′ (w + ri x) < 0.

Notice that SOC is strict because of risk aversion.


Now to our question. What happens to x∗ (w) as w in-
creases? To answer our question, let us differentiate
FOC with respect to w. Using the implicit function the-
orem, we see that

dx∗ /dw = − [∑ pi ri u ′′ (w + x∗ ri )] / [∑ pi ri² u ′′ (w + x∗ ri )]

The denominator in the expression above is the SOC


and hence negative. Now consider the numerator. By
DARA,

Au (w) > Au (w + x∗ ri ) if ri > 0, and
Au (w) < Au (w + x∗ ri ) if ri < 0,

so that for all i = 1, . . . , n,

Au (w)ri > Au (w + x∗ ri )ri .


By the definition of Au , for each i,

−u ′′ (w + x∗ ri )ri = Au (w + x∗ ri )ri u ′ (w + x∗ ri )
< Au (w)ri u ′ (w + x∗ ri )

Summing over i, we see that

− ∑ pi ri u ′′ (w + x∗ ri ) < Au (w) ∑ pi ri u ′ (w + x∗ ri ) = 0

which implies that


dx∗ /dw > 0.

27 Exercise: Solve the portfolio problem and examine the


impact on the portfolio of an increase in wealth for the
following utility functions: (i) u(c) ∶= c1−σ /(1 − σ), where
σ ∈ (0, 1), (ii) u(c) ∶= ln c, and (iii) u(c) ∶= −ae−λc , where
a, λ > 0. ♢
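The comparative static dx∗ /dw > 0 can be checked directly. Log utility is DARA (Au (w) = 1/w), so the optimal risky holding should rise with wealth. A sketch using a crude grid search (the returns and probabilities are invented for illustration; for this particular example the FOC can be solved by hand to give x∗ = w/2):

```python
import math

returns, probs = [-0.5, 1.0], [0.5, 0.5]  # expected net return 0.25 > 0

def optimal_x(w, u=math.log, steps=20_000):
    """Grid-search argmax of U(x) = sum_i p_i u(w + r_i x) over x in [0, w]."""
    return max((w * k / steps for k in range(steps + 1)),
               key=lambda x: sum(p * u(w + r * x) for p, r in zip(probs, returns)))

# DARA: the optimal risky investment increases with wealth.
x_poor, x_rich = optimal_x(1.0), optimal_x(2.0)
assert x_poor < x_rich
assert abs(x_poor - 0.5) < 1e-3 and abs(x_rich - 1.0) < 1e-3  # x* = w/2 here
```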

8.8 Increased Risk Aversion and the Optimal


Portfolio

In the portfolio problem above, suppose there is instead


an increase in risk aversion. We shall now study the
impact of such a change on the optimal portfolio. As
above, investors have a wealth w. There are two as-
sets in which they can invest. The first is riskless, and
offers a constant rate of return of 1. The second is a
risky asset that gives a (net) return ri with probabil-
ity pi , i = 1, . . . , n. We shall denote the random rate
of return by r. Let x denote the part of wealth that
is allocated to the risky asset. Then, final wealth is
(w − x) + x(1 + r) = w + xr.


Suppose one agent has vN-M utility function u and an-


other agent has vN-M utility function v ∶= g ○ u, where g
is an increasing and concave function. Thus, the second
agent is more risk averse. For simplicity, we shall as-
sume that g is differentiable everywhere.
As above, we shall assume that the first agent invests
some but not all of her wealth in the risky asset, and
the optimal allocation is x∗ . Then, her first order condi-
tion is

(FOC) ∑ pi ri u ′ (w + ri x∗ ) = 0.

By the definition of v, we see that for each ri , v ′ (w +


ri x∗ ) = g ′ (u(w + ri x∗ ))u ′ (w + ri x∗ ). Since u is monotone
and g ′ is decreasing, we see that for all ri > 0,

∑{ri >0} pi ri g ′ (u(w + ri x∗ ))u ′ (w + ri x∗ )
⩽ g ′ (u(w)) ∑{ri >0} pi ri u ′ (w + ri x∗ ).

Similarly, for all ri < 0,

∑{ri <0} pi ri g ′ (u(w + ri x∗ ))u ′ (w + ri x∗ )
⩽ g ′ (u(w)) ∑{ri <0} pi ri u ′ (w + ri x∗ ).

Summing up, we see that

∑i pi ri g ′ (u(w + ri x∗ ))u ′ (w + ri x∗ )
⩽ g ′ (u(w)) ∑i pi ri u ′ (w + ri x∗ )
= 0,

ie the derivative of the second agent’s objective is non-positive
at x∗ . Since her objective is concave, the second (more risk
averse) agent will invest no more than the first agent in the
risky asset. This is a very intuitive result, and it is good
that it holds.


28 Exercise: Does the result hold when there are two risky
assets? ♢

29 Exercise: How important is the assumption that g is


differentiable everywhere? Justify your claim. ♢
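The conclusion of this section can also be verified numerically: composing u with an increasing concave g should (weakly) lower the optimal risky holding. A self-contained sketch (all primitives invented for illustration):

```python
import math

returns, probs = [-0.5, 1.0], [0.5, 0.5]
w = 2.0

def optimal_x(u, steps=20_000):
    """Grid-search argmax of sum_i p_i u(w + r_i x) over x in [0, w]."""
    return max((w * k / steps for k in range(steps + 1)),
               key=lambda x: sum(p * u(w + r * x) for p, r in zip(probs, returns)))

u = math.log            # the less risk averse agent
v = lambda c: -1.0 / c  # v = g(u) with g(t) = -exp(-t) increasing and concave

# The more risk averse agent holds less of the risky asset.
assert optimal_x(v) < optimal_x(u)
```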

8.9 More Stochastic Dominance

First order stochastic dominance gives us a very weak


(and hence crude) way of comparing lotteries. Indeed,
F D1 G if (and only if) every EU agent with non-decreasing
utility function prefers F to G. We shall now focus our
attention on a smaller group of agents, and the follow-
ing question.

Given a set of preference relations on L(Z), for


what lotteries p and q will it be the case that p ≽ q
for all agents who are risk averse?

The risk averse agents we shall consider are those who


satisfy the EU hypothesis and therefore have increas-
ing and concave utility for money. For generality (and
because we shall avoid most proofs in this section),
we shall assume that lotteries can have infinite sup-
port. The notion that we shall introduce in this section
is second order stochastic dominance. To focus more
clearly on the underlying riskiness of agents, we shall
assume that distributions being compared have the
same mean.


30 Definition: For cdf’s F and G with the same mean, F


second order stochastically dominates G if for all non-
decreasing concave u ∶ R+ → R, we have

∫ u(x) dF(x) ⩾ ∫ u(x) dG(x).

We shall denote this as F D2 G. ♢

31 Example: Let F ∶= δ0 be the cdf of the lottery with all


its mass at 0. Let G1 be the cdf of the probability meas-
ure that puts equal weight on +1 and −1. Let u be non-
decreasing and concave. Then, by Jensen’s inequality
u(0) = ∫ u(x) dF(x) = u (∫ x dG1 (x)) ⩾ ∫ u(x) dG1 (x).
Therefore, every risk averse EU agent will prefer F to G1 .
The intuition behind this is simple. Lottery G has the
same mean as F and yet, in an obvious sense, is riskier.
Now suppose G2 is the cdf of the uniform probability
measure on [−1, +1]. Once again, F D2 G2 . This too is
true for the same reason. ♢

Let us try and generalise the intuition of the above


example. Let F be a cdf. For each x that can be the
outcome of running F, let Hx (z) be the cdf such that
∫ z dHx (z) = 0. In other words, EHx [z∣x] = 0, ie the ex-
pectation of Hx , conditional on x, is zero. Now consider
the compound lottery G obtained by first running F to
obtain x and then running Hx , the final outcome being
x + z. We shall say that G is a mean-preserving
spread of F. Clearly, the mean of F is the same as the
mean of G. Moreover, every risk averter will prefer F to
G. It turns out that this property of distributions is equi-
valent to second order stochastic dominance. We state
this in the following theorem. See also §6.D of MWG.


32 Theorem: Let F and G be cdf’s with the same mean.


Then, the following are equivalent.
(i) F D2 G.
(ii) G is a mean preserving spread of F. ♢

33 Exercise: Let G be a mean preserving spread of F. Show


that every risk averter will prefer F to G. ♢
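A mean preserving spread is easy to build and test on finite lotteries. In the sketch below, G splits each atom x of F into x − 1 and x + 1 with equal probability (so the conditional mean of the added noise is zero); every concave u should then weakly prefer F:

```python
import math
from collections import defaultdict

F = {2.0: 0.5, 5.0: 0.5}  # a finite lottery, written as outcome -> probability

# Mean preserving spread: at each outcome x, add noise z in {-1, +1}
# with equal probability (conditional mean zero).
G = defaultdict(float)
for x, p in F.items():
    G[x - 1.0] += p / 2
    G[x + 1.0] += p / 2

mean = lambda q: sum(p * x for x, p in q.items())
eu = lambda q, u: sum(p * u(x) for x, p in q.items())

assert abs(mean(F) - mean(G)) < 1e-12  # same mean
for u in (math.sqrt, math.log, lambda c: -1.0 / c):
    assert eu(F, u) >= eu(G, u)  # F second order stochastically dominates G
```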

34 Exercise: Let p0 ∶= δ0 and, for all n ⩾ 1,
pn ∶= (1/2ⁿ) δ−2ⁿ⁻¹ + (1 − 1/2ⁿ⁻¹) δ0 + (1/2ⁿ) δ2ⁿ⁻¹ .
Show that
i. p1 is a mean preserving spread of p0 .
ii. pn+1 is a mean preserving spread of pn .
iii. Show that pn →w p0 . ♢
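The sequence of Exercise 34, pn ∶= (1/2ⁿ)δ−2ⁿ⁻¹ + (1 − 1/2ⁿ⁻¹)δ0 + (1/2ⁿ)δ2ⁿ⁻¹ , can be tabulated to see the claims at work: each pn has mean zero, and the mass at 0 tends to one, which is the weak convergence to p0 = δ0 :

```python
def p(n):
    """The lottery p_n of Exercise 34 as a dict outcome -> probability."""
    if n == 0:
        return {0.0: 1.0}
    a = 2 ** (n - 1)
    return {-a: 1 / 2 ** n, 0.0: (a - 1) / a, a: 1 / 2 ** n}

for n in range(1, 12):
    pn = p(n)
    assert abs(sum(pr * x for x, pr in pn.items())) < 1e-12  # mean zero
assert p(11)[0.0] > 0.999  # mass at 0 -> 1, so p_n -> delta_0 weakly
```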

35 Exercise: Let p0 ∶= δ0 and pn ∶= αn δ−n + (1 − 2αn )δ0 + αn δn


for all n. Find αn such that
i. p1 is a mean preserving spread of p0 .
ii. pn+1 is a mean preserving spread of pn .
iii. Is it the case that pn →w p0 ? ♢

Recall that the notation pn →w p means pn weakly con-
verges to p. In other words, for all bounded continuous
functions f ∈ Cb (R) (the space of continuous bounded
functions on R), ∫ f(x) dpn (x) → ∫ f(x) dp(x). This
definition also holds for probability measures defined
on arbitrary topological spaces. For probability distribu-
tions on R, we have the following characterisation.
Let pn , p be probability measures on R and Fn , F the
corresponding cdf’s. Then, pn →w p if and only if
Fn (x) → F(x) at all x ∈ R where F is continuous. Recall
that since F is increasing, it can have at most countably
many points of discontinuity.


The major results on second order stochastic dominance


are from Rothschild and Stiglitz (1970, 1971). The exercise
above with unbounded support of the lotteries is from
Müller (1998), who extends
the results of Rothschild-Stiglitz.

8.10 Non-expected Utility

Thus far, we have seen that the expected utility hypo-


thesis (ie, the assumption that preferences satisfy Inde-
pendence) is extremely useful, since utility functions
over lotteries take very simple forms. Nevertheless, as
alluded to before, there are a lot of reasons not to be
very satisfied with this.
There are some descriptive reasons, an example of
which is the Allais Paradox (see MWG). Roughly, it
turns out that a lot of people do not satisfy Independ-
ence. How to deal with this?
There are also methodological reasons to not be satis-
fied with the expected utility hypothesis. Consider, for
instance, the utility function u(w) = w1−σ /(1 − σ) where
σ ∈ (0, 1). Then, u ′ (w) = w−σ and u ′′ (w) = −σw−σ−1 ⩽ 0.
Therefore, Au (w) = σw−1 .
Notice that for the utility function u, σ measures how
quickly the marginal utility of consumption declines.
Are there normative or descriptive reasons for the
curvature of u to be
directly related to the agent’s risk aversion? Of course,
this follows from the EU hypothesis, but the question
then becomes, If this is a consequence of assuming the
EU hypothesis, should we assume it in the first place?


These are not easy questions, but suffice it to say that


a lot of research has been done to find utility functions
over lotteries with nice properties that do not satisfy
the EU hypothesis, and this is still a frontier of research.
Moreover, preferences that disentangle risk aversion
and marginal utility in static models, and risk aversion
and intertemporal elasticity of substitution in dynamic
models are of great use in finance and macroeconomics.

8.11 Subjective States and Utility

Thus far, we have considered agents who agree on lot-


teries. For instance, consider the lottery that gives +1
and −1 as outcomes. We shall call the lottery objective
if all agents agree on the probabilities of the two out-
comes. Note that the agents may all be wrong, in that
the true probability, whatever that may mean or be, can
be quite different from what the agents believe.
But suppose the outcome is the outcome of a horse race.
Do all agents agree on the odds that one of two horses
will win? Typically, the answer will be No. In such a
situation, it becomes important to talk both about the
utilities that the agents have contingent on the outcome
as well as their subjective assessments of the relative
likelihood of the outcomes. To do this, we need a model
of subjective uncertainty.
Let S be a set of states. For instance, we could have S ∶=
{yankees, mets}, where a state is a team winning the
World Series. Different agents have different probability
assessments about each team winning the World Series.
Let L0 (Z) be the space of all simple objective lotteries on


a prize space Z. (We could take Z ∶= R.) We shall refer to


members of L0 (Z) as roulette wheel lotteries.
Now, let H ∶= {f ∶ S → L0 (Z)}, be the space of all
functions that map from the set of states to the set of
roulette-wheel lotteries. A member f ∈ H will be referred
to as a horse race lottery.
Note well the timing implicit here: First, a horse race
lottery is run. Then, depending on the outcome, the
agent receives a roulette wheel lottery (which is then
run). In other words, for each s ∈ S, f(s) ∈ L0 (Z). If
∣S∣ = n, then we can write f as f ∶= (f1 , f2 , . . . , fn ).
Notice also that H is a convex subset of a vector space.
In particular, for all f, g ∈ H and α ∈ [0, 1], (αf + (1 −
α)g)(s) = αf(s) + (1 − α)g(s) ∈ L0 (Z) for all s ∈ S. If Z is
finite, then H is a convex subset of a finite dimensional
vector space. We now move to preferences.
As always, a preference is a binary relation ≽ ⊂ H × H
that is also continuous (or weakly continuous if Z is
infinite). We shall assume that ≽ satisfies the following
axiom.

36 Axiom (Independence): If f ≻ g, then for all α ∈ (0, 1)


and h, αf + (1 − α)h ≻ αg + (1 − α)h. ♢

Recall now from Exercise 23 in Chapter 7 that there


exists a utility function U ∶ H → R such that U(αf + (1 −
α)g) = αU(f) + (1 − α)U(g). In fact, we can say much
more about the form of U.

37 Proposition: Let U be as above. Then, there exist func-


tions (us )s∈S such that, for all f ∈ H,
U(f) = ∑s ∑z us (z)fs (z). ♢


Proof. Let us assume that ∣S∣ = n and fix some h∗ ∈
H. For any other f ∈ H and for each s ∈ S, let (fs , h∗−s ) ∈
H denote the horse race lottery h∗ with its sth component
replaced by fs ∶= f(s).
Notice that

(1/n) f + ((n − 1)/n) h∗ = ∑s (1/n) (fs , h∗−s ).

Then, by induction and from the linearity of U, we see
that

(X) (1/n) U(f) + ((n − 1)/n) U(h∗ ) = ∑s (1/n) U(fs , h∗−s ).

Now, for each s ∈ S, define Us ∶ L0 (Z) → R as

Us (p) ∶= U(h∗1 , . . . , h∗s−1 , p, h∗s+1 , . . . , h∗n ) − ((n − 1)/n) U(h∗ ).

Thus, for f ∈ H, we have

Us (fs ) = U(fs , h∗−s ) − ((n − 1)/n) U(h∗ ).

Summing this equation over all s ∈ S, and dividing by
n, we get

(1/n) ∑s Us (fs ) = (1/n) ∑s U(fs , h∗−s ) − ((n − 1)/n) U(h∗ ).

But comparing this equation with equation (X), we
see that

U(f) = ∑s Us (fs ).

By the linearity of U and from the definition of Us , we


see that for all p, q ∈ L0 (Z) and α ∈ (0, 1),

Us (αp + (1 − α)q) = αUs (p) + (1 − α)Us (q).


Now, for each z ∈ Z, define us (z) ∶= Us (δz ), so that


after some standard arguments, we get

Us (p) = ∑ p(z)us (z).


z

This gives us the desired result. ∎

Let us see what we have done above. The key was to


write down the utility in each state, and being able to
isolate a roulette wheel lottery in a state is key here,
hence the horse race lotteries fs . Then, since utility over
horse race lotteries is additive, the next step is to define
a utility state by state. This is what Us accomplishes,
and the rest is just accounting.
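The additively separable representation is mechanical to evaluate once the state utilities are in hand. A small sketch with two states and entirely made-up numbers:

```python
# Two states (eg yankees, mets); prizes 0 and 100.
# State-dependent utilities u_s(z); the numbers are illustrative.
u = {
    "yankees": {0: 0.0, 100: 1.0},
    "mets": {0: 0.0, 100: 2.0},  # prizes are worth more in state 'mets'
}

# A horse race lottery f: a roulette wheel lottery in each state.
f = {
    "yankees": {0: 0.5, 100: 0.5},
    "mets": {0: 0.75, 100: 0.25},
}

# U(f) = sum_s sum_z u_s(z) f_s(z)
U = sum(u[s][z] * f[s][z] for s in f for z in f[s])
assert abs(U - (0.5 * 1.0 + 0.25 * 2.0)) < 1e-12  # = 1.0
```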
The above representation is called an additively separ-
able representation. Notice that the utilities are state
dependent, and that there is no role for the agent’s sub-
jective belief about the probability of a state.
To see this, suppose there is a subjective measure µ on
S, and the utility of a horse race lottery is

U(f) = ∑s µ(s)Us (fs ).

Then, we can define the function Ũs ∶= µ(s)Us , so that

U(f) = ∑s Ũs (fs ).

To get an active role for the subjective probability meas-


ure, we need state independent utility. For this, we
would need to define a null state and a Monotonicity
axiom that will deliver for us a state independent util-
ity. This would take us too far, but we will see such a
representation used in the sequel.



9 General Equilibrium Theory

9.1 Basic Definitions

An economy is a collection E = (H, (uh , eh )h∈H , X), where


H is a (finite) set of households and X ⊂ RL+ is the commod-
ity or consumption space. For each h ∈ H, the endowment
is eh ≫ 0 and uh ∶ X → R is a utility function that repres-
ents preferences.
An allocation is (xh ) ∈ XH . We shall say that an alloca-
tion is feasible if ∑h xh ⩽ ∑h eh . Let F ⊂ XH denote the
set of all feasible allocations. Now for some standing
assumptions. Where relevant, we shall indicate which
assumptions can be weakened.

1 Assumption: For all h ∈ H, uh is continuous, increasing


and strictly concave. ♢

9.2 Pareto Optimal Allocations

We shall now define a very weak form of optimality for


allocations. We shall also show that these allocations
exist.

9 General Equilibrium Theory 115

2 Definition: An allocation x̃ ∶= (x̃h ) Pareto dominates


an allocation x ∶= (xh ) if uh (x̃h ) ⩾ uh (xh ) for all h ∈ H
and uh (x̃h ) > uh (xh ) for some h ∈ H and if (x̃h ) ∈ F .
An allocation x is Pareto optimal if it is feasible and is
not Pareto dominated by any other allocation. The set of
Pareto optimal allocations is denoted by PO. ♢

3 Proposition: The set PO ≠ ∅. ♢

Proof. Suppose x̃ ∶= (x̃h ) solves

maxx∈F ∑h uh (xh ).

Then, x̃ is Pareto optimal. To see this, suppose not, ie,


suppose x̂ ∈ F Pareto dominates x̃. Then, uh (x̂h ) ⩾
uh (x̃h ) for all h ∈ H and uh (x̂h ) > uh (x̃h ) for some
h ∈ H. Thus, ∑h uh (x̂h ) > ∑h uh (x̃h ), contradicting our
assumption about x̃.
All that remains is to show that we can solve maxx∈F ∑h uh (xh ).
Since uh is continuous for each h, the objective func-
tion is continuous. Since F is closed and bounded, it
is compact. Hence, the objective always has a maxim-
iser, which proves our claim. ∎
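The construction in the proof can be mimicked in a toy exchange economy: maximise the sum of utilities over a grid of feasible allocations, then confirm that no feasible allocation on the grid Pareto dominates the maximiser. Everything below (two households, one good, square-root utilities) is an illustrative choice:

```python
import math

e = 1.0  # total endowment of the single good
grid = [k / 50 for k in range(51)]
F = [(a, b) for a in grid for b in grid if a + b <= e + 1e-12]  # feasible set

u = math.sqrt                          # common utility: concave and increasing
welfare = lambda x: u(x[0]) + u(x[1])  # unweighted utilitarian sum

best = max(F, key=welfare)  # the maximiser of sum_h u_h over F

def dominates(y, x):
    return (u(y[0]) >= u(x[0]) and u(y[1]) >= u(x[1])
            and (u(y[0]) > u(x[0]) or u(y[1]) > u(x[1])))

assert not any(dominates(y, best) for y in F)  # best is Pareto optimal in F
```

By strict concavity the welfare maximiser here is the equal split (0.5, 0.5).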

Notice that the key assumption is that F is compact. In


infinite dimensional spaces, this assumption is usually
harder (but not impossible) to come by. We shall now
show that an allocation is Pareto optimal if and only
if it is optimal with respect to a special class of social
welfare functions; namely those that are weighted sums
of households’ utilities. In particular, let a ∶= (ah ) ∈
RH h
+ ∖ {0}, and for an allocation x, consider ∑h ah uh (x ).
As we vary the weights (ah ), we obtain the entire set of
Pareto optimal allocations.


4 Proposition: Let x̃ ∈ F solve

maxx∈F ∑h ah uh (xh ).

where ah > 0 for all h. Then, x̃ is Pareto optimal. ♢

Proof. Suppose not, so that x̂ ∈ F Pareto dominates


x̃. Since ah > 0 for all h, we see that ∑h ah uh (x̂h ) >
∑h ah uh (x̃h ), contradicting our assumption. ∎

Notice once again that the proposition does not require


any assumptions on the functions uh . We shall now
look for a converse to the above proposition. However,
this will necessitate some (strong) assumptions about
the utility functions. First a useful lemma.

5 Lemma: Let the economy E be such that for all h, uh ∶


X → R is concave. Then, the utility possibility set

U ∶= {v ∈ RH ∶ (∃ x ∈ F ) (v ⩽ u(x))}

is convex, where u(x) = (uh (xh ))h∈H . ♢

Proof. We shall first show that F is convex. To see
this, let x̂, x̃ ∈ F , α ∈ [0, 1] and x ∶= αx̃ + (1 − α)x̂. Then,

∑h xh = α ∑h x̃h + (1 − α) ∑h x̂h ⩽ α ∑h eh + (1 − α) ∑h eh = ∑h eh ,

where the inequality follows from the feasibility of x̃
and x̂, so x ∈ F . Now suppose ṽ ⩽ u(x̃) and v̂ ⩽ u(x̂),
and let v ∶= αṽ + (1 − α)v̂. Then,

vh = αṽh + (1 − α)v̂h
⩽ αuh (x̃h ) + (1 − α)uh (x̂h )
⩽ uh (αx̃h + (1 − α)x̂h ) = uh (xh )


for all h, where the last inequality follows from the


concavity of uh . This proves the convexity of U . ∎

A useful theorem is the following, where int X is the


interior of the set X.

6 Theorem (Minkowski Separation Theorem): Let X, Y ⊂


RN be convex and such that int X ≠ ∅ and Y ∩ int X = ∅.
Then, there exists a ∈ RN ∖ {0} such that supx∈X ⟨a, x⟩ ⩽
infy∈Y ⟨a, y⟩, ie a separates X and Y . ♢

We can now prove the converse to Proposition 4.

7 Theorem: Let the economy E be such that for all h, uh ∶


X → R is concave. If x̃ ∈ F is Pareto optimal, then there
exists a ∈ RH
+ ∖ {0} such that x̃ solves

maxx∈F ∑h ah uh (xh ). ♢

Proof. Let U be the utility possibility set for the eco-


nomy, x̃ a Pareto optimal allocation and ṽ ∶= u(x̃).
Now define,
Γ ∶= {v ∈ RH ∶ v ⩾ ṽ}.
We shall first show that U ∩ Γ = {ṽ}. To see this, suppose
not, ie suppose w ∈ U ∩ Γ with w ≠ ṽ. Then, since w ∈ U ,
there exists x ∈ F such that w ⩽ u(x). But w ∈ Γ and
w ≠ ṽ, which means that wh ⩾ ṽh for all h and wh > ṽh
for some h.
Therefore, uh (xh ) ⩾ ṽh for all h and uh (xh ) > ṽh for
some h, which contradicts the Pareto optimality of x̃.
Notice now that U is convex (by the lemma above), Γ
is (obviously) convex and int Γ ≠ ∅ (since Γ − ṽ = RH
+ ).


Moreover, ṽ ∉ int Γ , so that U ∩ int Γ = ∅. Thus, by


Minkowski’s Separation Theorem, there exists a ≠ 0
that separates U and Γ , ie infw∈Γ ⟨a, w⟩ ⩾ supv∈U ⟨a, v⟩.
We shall now show that a > 0. To see that ah ⩾ 0,
notice that ṽ + ih ∈ Γ where ih ∶= (0, . . . , 1, . . . , 0) with the
1 in the h-th entry. Then, ⟨a, ṽ + ih ⟩ = ⟨a, ṽ⟩ + ah ⩾ ⟨a, ṽ⟩,
ie ah ⩾ 0 for all h. But a ≠ 0, which shows that a > 0.
Finally, we need to show that

∑ ah uh (x̃h ) ⩾ ∑ ah uh (xh )
h h

for all x ∈ F . To see this, notice that ṽ ∈ Γ . If x ∈ F ,


then u(x) ∈ U . By the definition of a, it follows that
⟨a, ṽ⟩ ⩾ ⟨a, u(x)⟩, as desired. ∎

9.3 Competitive Mechanisms

We shall first make an assumption about the consump-


tion set. Notice that the set A ∶= {x ∈ XH ∶ ∑h (xh − eh ) ⩽ 0}
is compact. Since preferences are convex, we can restrict
attention to consumption sets that contain A in the in-
terior.

8 Assumption: The consumption set X is compact. ♢

A price is any p ∈ RL+ ∖ {0}. The value or cost to house-


hold h of a consumption bundle xh is then ⟨p, xh ⟩. A
generalised competitive mechanism (GCM) is a func-
tion Ih ∶ RL+ ∖ {0} → R++ , that for each price p, assigns
household h an income of Ih (p). Consider the following
examples of generalised competitive mechanisms.


9 Example: Given (eh ), let Ih (p) ∶= ⟨p, eh ⟩. This is the


standard Walrasian mechanism. As a second example,
let x̃ = (x̃h ) be a Pareto optimal allocation. Then, Ih (p) ∶=
⟨p, x̃h ⟩ is a generalised competitive mechanism. ♢

Given a GCM (Ih ), let xh (p) be the utility maximising


bundle subject to ⟨p, xh (p)⟩ ⩽ Ih (p), ie h’s demand func-
tion. We can then define the excess demand function

z(p) ∶= ∑h [xh (p) − eh ].

The following are properties of the excess demand func-


tion.

10 Proposition: Suppose the GCM (Ih ) is such that Ih (tp) =


tIh (p) for all t > 0 and Ih (p) is continuous. Then,

(a) z(p) is continuous,


(b) z(tp) = z(p) for all t > 0, and
(c) If (Ih ) is the Walrasian mechanism, then for all p,
⟨p, z(p)⟩ = 0, which is known as Walras’ Law. ♢

Proof. (a) Since Ih (p) is continuous, X is compact and
preferences are continuous and strictly convex, the utility
maximising bundle xh (p) is unique and, by the theorem of
the maximum, continuous in p. Therefore, z(p) is continuous.
(b) Ih (tp) = tIh (p) implies xh (tp) = xh (p) for all t > 0.
Therefore, z(tp) = z(p) for t > 0.
(c) Since preferences are increasing, ⟨p, xh (p)⟩ = ⟨p, eh ⟩
for all h and all p. Therefore, z(p) ∶= ∑h [xh (p) − eh ]
and ⟨p, z(p)⟩ = 0. ∎
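For Cobb-Douglas utilities the Walrasian demand is available in closed form, so the properties above can be confirmed numerically. A sketch (two goods, two households, all parameter values invented):

```python
# Household h has u_h(x) = a_h log x_1 + (1 - a_h) log x_2 and endowment e_h,
# so Walrasian demand is x_1 = a_h I_h / p_1 and x_2 = (1 - a_h) I_h / p_2.
households = [
    {"a": 0.3, "e": (1.0, 2.0)},
    {"a": 0.7, "e": (2.0, 1.0)},
]

def excess_demand(p):
    z = [0.0, 0.0]
    for h in households:
        income = p[0] * h["e"][0] + p[1] * h["e"][1]  # Walrasian mechanism
        demand = (h["a"] * income / p[0], (1 - h["a"]) * income / p[1])
        z[0] += demand[0] - h["e"][0]
        z[1] += demand[1] - h["e"][1]
    return z

p = (1.0, 2.0)
z = excess_demand(p)
zt = excess_demand((3.0, 6.0))  # prices scaled by t = 3

assert abs(p[0] * z[0] + p[1] * z[1]) < 1e-9          # Walras' Law
assert all(abs(a - b) < 1e-9 for a, b in zip(z, zt))  # z(tp) = z(p)
```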


9.4 Equilibrium

An equilibrium of a GCM is a price vector p and an


allocation (xh ) such that (i) for each h, xh is utility max-
imising subject to ⟨p, xh ⟩ ⩽ Ih (p), and (ii) ∑h xh ⩽ ∑h eh ,
ie x ∈ F . A Walrasian equilibrium (WE) is an equilib-
rium of the Walrasian competitive mechanism. In this
section, we shall restrict attention to the Walrasian com-
petitive mechanism.
Notice that in order to find a competitive equilibrium, it
suffices to find a price p at which z(p) ⩽ 0. But z(p) is
positively homogeneous of degree 0, so we can normal-
ise prices. We shall adopt, where convenient, any one of
the following normalisations.
Let SL−1 ⊂ RL be the unit sphere, ie SL−1 ∶= {x ∈ RL ∶ ∥x∥2 =
1}. Our first normalisation is the set SL+−1 ∶= {x ∈ RL+ ∶
∥x∥2 = 1}, the positive orthant of the sphere. The second
normalisation is ∆L−1 ∶= {x ∈ RL+ ∶ ∥x∥1 = 1}. Our final
normalisation sets pℓ = 1 for some good ℓ.

11 Exercise: Let p ∈ RL+ ∖ {0}. Show that there exists (i) λ > 0
such that λp ∈ SL+−1 , (ii) λ > 0 such that λp ∈ ∆L−1 . Now
suppose p ≫ 0. Show that there exists λ > 0 such that for
p′ ∶= λp, p′ℓ = 1. ♢

Before we proceed, it is useful to prove the following


lemma. It is an example of the so-called variational tech-
nique. The rest of the proof follows Geanakoplos (2003).

12 Lemma (Concave Perturbation Lemma): Let X ⊂


RN be convex, x̃ ∈ X and u ∶ X → R concave. Then,


arg maxx∈X [u(x) − ∥x − x̃∥₂²] is at most a singleton. If
x̃ ∈ arg maxx∈X [u(x) − ∥x − x̃∥₂²], then u(x) ⩽ u(x̃) for
all x ∈ X, ie x̃ ∈ arg maxx∈X [u(x)]. ♢

Proof. Since u is concave and −∥x − x̃∥₂² is strictly con-
cave, it follows that u(x) − ∥x − x̃∥₂² is strictly concave.
Therefore, a maximiser, if it exists, is unique.
Now suppose x̃ ∈ arg maxx∈X [u(x) − ∥x − x̃∥₂²]. Let
ε ∈ (0, 1), and define yε ∶= (1 − ε)x̃ + εx for some x ∈ X,
so that yε − x̃ = ε(x − x̃). Then,

0 ⩾ [u(yε ) − ∥yε − x̃∥₂²] − [u(x̃) − ∥x̃ − x̃∥₂²]
= [u(yε ) − ε²∥x − x̃∥₂²] − u(x̃)
⩾ ε[(u(x) − u(x̃)) − ε∥x − x̃∥₂²],

where the second line uses ∥yε − x̃∥₂² = ε²∥x − x̃∥₂² and
the third uses the concavity of u. Dividing by ε, we get

u(x) − u(x̃) ⩽ ε∥x − x̃∥₂²
for all ε ∈ (0, 1). This implies u(x) ⩽ u(x̃). Since x ∈
X is arbitrary, it follows that x̃ ∈ arg maxx∈X u(x), as
desired. ∎

13 Exercise: Let f ∶ X → R be concave and g ∶ X → R be


strictly concave. Show that f + g is strictly concave. Show
that g(x) ∶= −∥x − x̃∥₂² is strictly concave. Let gp ∶ RN → R
be given by gp (x) ∶= −(∥x − x̃∥p )^p for 1 ⩽ p ⩽ ∞ and some
x̃ ∈ RN . For what values of p is gp strictly concave? ♢
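A one-dimensional grid search illustrates the concave perturbation lemma concretely. The function u and all numbers below are our own illustrative choices: u is concave with a whole interval of maximisers, yet the perturbed objective has exactly one.

```python
# Grid-search illustration of the concave perturbation lemma (the
# functions and numbers are ours, not the text's).  u is concave with
# a whole interval of maximisers, [0.5, 1]; the perturbed objective
# u(x) - (x - x_tilde)^2 with x_tilde = 0.8 has the single maximiser
# 0.8, and since x_tilde solves the perturbed problem, the lemma says
# x_tilde also maximises u itself.

u = lambda x: min(x, 0.5)        # concave; argmax of u is [0.5, 1]
x_tilde = 0.8
grid = [i/1000 for i in range(1001)]

vals = [u(x) - (x - x_tilde)**2 for x in grid]
best = max(vals)
argmax = [x for x, v in zip(grid, vals) if abs(v - best) < 1e-12]
assert argmax == [0.8]           # the perturbed problem has one solution
assert all(u(x) <= u(x_tilde) for x in grid)   # the lemma's conclusion
```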

An extremely useful theorem for existence results is the


following theorem.

14 Theorem (Brouwer): Let f ∶ ∆L−1 → ∆L−1 be continuous.


Then, there exists p ∈ ∆L−1 such that f(p) = p. ♢


The proof of the above theorem would take us too far


afield. Nevertheless, with this theorem in hand, we can
prove the existence of a Walrasian equilibrium.

15 Theorem: There exists p ∈ ∆L−1 such that the Walrasian


excess demand function has z(p) ⩽ 0. ♢

Proof. It is clear from the arguments above that for


such a p, (p, (xh (p))) constitutes a Walrasian equilib-
rium. Let us define ψ ∶ ∆L−1 → ∆L−1 as
(16) ψ(p̃) ∶= arg maxp∈∆L−1 [ ⟨p, z(p̃)⟩ − ∥p − p̃∥₂² ].

Since ∆L−1 is convex, ⟨p, z(p̃)⟩ is a linear function of


p and −∥p − p̃∥₂² is a strictly concave function of p.
Therefore,

maxp∈∆L−1 [ ⟨p, z(p̃)⟩ − ∥p − p̃∥₂² ]

has at most one maximiser. But ⟨p, z(p̃)⟩ − ∥p − p̃∥₂² is
a continuous function of p, and ∆L−1 is compact, so that
a maximiser exists. Thus, ψ(p̃) is a function, and is well
defined.
By the Theorem of the Maximum, ψ(p̃) is continu-
ous, and so by Brouwer’s theorem, there exists p̃ such
that ψ(p̃) = p̃. By the concave perturbation lemma,
⟨p, z(p̃)⟩ ⩽ ⟨p̃, z(p̃)⟩ = 0 for all p ∈ ∆L−1 , where the
equality is Walras’ Law. Therefore, z(p̃) ⩽ 0. ∎

17 Exercise: Show that ⟨p, x⟩ ⩽ 0 for all p ∈ ∆L−1 implies


x ⩽ 0. ♢

To get an intuition for the result, consider the set of


maximisers of ⟨p, z(p̃)⟩, denoted as arg maxp∈∆L−1 ⟨p, z(p̃)⟩.


If zℓ (p̃) > 0, then we can set pℓ = 1, ie, if there is excess


demand for a good, then the price of the good rises. In
general however, ⟨p, z(p̃)⟩ can have many maximisers.
This is where we use the concave perturbation lemma,
which ensures that there is a unique maximiser. Thus,
the mapping ψ can be viewed as an adjustment process,
with a fixed point being a rest point of the process.
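The adjustment process can be simulated. Since the maximiser in (16) is exactly the Euclidean projection of p̃ + z(p̃)/2 onto the simplex, ψ is a projected step in the direction of excess demand. The sketch below uses an assumed Cobb-Douglas economy and a damped step size η in place of the factor 1/2, because Brouwer's theorem only guarantees that a fixed point exists, not that iterating ψ converges; the damped map has the same interior fixed points.

```python
# Iterating a damped variant of the adjustment map psi on an assumed
# two-household Cobb-Douglas economy.  psi(p~) is the Euclidean
# projection of p~ + z(p~)/2 onto the simplex; we take a smaller step
# eta*z(p~) for numerical stability (the undamped iteration can
# oscillate even though a fixed point exists).

alphas = [0.3, 0.7]                      # assumed parameters
endowments = [(1.0, 2.0), (2.0, 1.0)]

def excess_demand(p):
    z = [0.0, 0.0]
    for alpha, e in zip(alphas, endowments):
        income = p[0]*e[0] + p[1]*e[1]
        x = (alpha*income/p[0], (1 - alpha)*income/p[1])
        z[0] += x[0] - e[0]
        z[1] += x[1] - e[1]
    return z

def project_simplex(v):
    # Euclidean projection onto {p >= 0, sum(p) = 1}: sort-and-threshold
    u_sorted = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u_sorted, start=1):
        css += ui
        t = (css - 1.0)/i
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

eta = 0.1                                 # assumed damping step size
p = [0.6, 0.4]
for _ in range(300):
    z = excess_demand(p)
    p = project_simplex([p[i] + eta*z[i] for i in range(2)])

z = excess_demand(p)
assert max(abs(zi) for zi in z) < 1e-6    # markets clear at the rest point
```

For this economy the iteration settles at p = (1/2, 1/2), where excess demand vanishes in both markets.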

9.5 Welfare Theorems

We now prove the fundamental welfare theorems of


general equilibrium theory. An important aspect of
our treatment is that apart from assuming there ex-
ists a Walrasian equilibrium, following Maskin and
Roberts (2008), we shall make no
other convexity assumptions.

18 Theorem (First Welfare Theorem): If uh is increasing,


then any WE is Pareto optimal. ♢

Proof. Let (p, (xh )) be a WE and suppose x ∉ PO.


Then, there exists x̃ ∈ F that Pareto dominates x, ie
uh (x̃h ) ⩾ uh (xh ) for all h and uh (x̃h ) > uh (xh ) for some
h. Since preferences are increasing, it follows that
⟨p, x̃h ⟩ ⩾ ⟨p, xh ⟩ for all h, and ⟨p, x̃h ⟩ > ⟨p, xh ⟩ for the
household h for which uh (x̃h ) > uh (xh ). This implies

⟨p, ∑h x̃h ⟩ > ⟨p, ∑h xh ⟩ = ⟨p, ∑h eh ⟩

where the last equality is because preferences are


increasing. Thus, x̃ ∉ F , which is a contradiction,
thereby proving our theorem. ∎


19 Theorem (Decentralisation Theorem): Suppose pref-


erences are increasing, x̃ ∈ PO and e = x̃. If a WE exists,
x̃ is an equilibrium allocation. ♢

Proof. Let p be a WE price and x̂ a WE allocation.


Then, by the First Welfare Theorem, x̂ ∈ PO. But
x̃h = eh ∈ Bh (p) for each h, so uh (x̂h ) ⩾ uh (x̃h ) for
all h. Since x̃ ∈ PO and x̂ is feasible, this forces
uh (x̃h ) = uh (x̂h ) for all h, so each x̃h is also utility
maximising on Bh (p), and (p, (x̃h )) is a WE. ∎

As a corollary of the decentralisation theorem, we ob-


tain the following.

20 Corollary: Let x̃ ∈ PO, and p such that (p, x̃) is a WE.


Also let ẽ be another endowment such that ⟨p, x̃h ⟩ =
⟨p, ẽh ⟩. Then, for this new endowment, (p, x̃) is still a
WE. ♢

21 Theorem (Second Welfare Theorem): Suppose the


economy satisfies some assumptions that ensure the ex-
istence of a WE (such as those made previously), and
suppose x̃ ∈ PO, with x̃ ≫ 0, and that preferences are
increasing. Then, there exists price p and balanced trans-
fers (T h ) (ie, ∑h T h = 0) such that Walrasian income is
⟨p, eh ⟩ + T h and x̃ is an equilibrium allocation. ♢

Proof. Let p be a price such that (p, x̃) is a WE, as in
the Decentralisation Theorem. Then, T h ∶= ⟨p, x̃h − eh ⟩
and ∑h T h = 0, as desired. ∎

Finally, it should be remarked that the proofs given here


hold even in infinite dimensional consumption spaces,
as long as a WE exists.



10 Dynamic Programming

In these notes, I shall consider Markov decision prob-


lems, both deterministic and stochastic, with applic-
ations. The idea is not to substitute for a complete
course, but to provide you with a gentle introduction
to the ideas and methods.

10.1 Markov Decision Models

Consider a decision maker who takes an action at ∈ A


at each point in time, t = 0, 1, 2, . . .. The set of possible
actions is A, but at each point in time, only a subset of
the actions are feasible. At each point in time, there is
a state variable xt that serves the following three func-
tions: (i) It specifies the set of actions A(xt ) available to
the decision maker, (ii) it specifies the (instantaneous)
payoffs to the decision maker, contingent on the action
at , and (iii) it serves as a sufficient statistic of the his-
tory of events up to that point in time. Thus, all predic-
tions about the future based on observations about the
past can be made equally well with just the knowledge
of xt .
By choosing action at in state xt , she (the decision
maker) gets an instantaneous payoff of u(at , xt ). She
discounts future payoffs at the rate of δ ∈ [0, 1). Condi-
tional on her action at in state xt , the state in period t + 1
is xt+1 with probability P(xt+1 ∣at , xt ).


The decision maker’s objective is to choose actions so as
to maximise expected discounted utility

(1 − δ) E [ ∑_{t=0}^{∞} δ^t u(at , xt ) ]

where the expectation is over the states that occur in the


future.
Such a problem is referred to as a stationary Markov
decision problem. Markov because at any point in time
t, all necessary information about the past is captured
by the state variable xt ; stationary because the decision
maker’s problem at time t + 1 looks the same as her
problem at time t, conditional on being in the same
state.
Remark. A broad swath of modern dynamic economics
falls under the rubric of stationary Markov decision
problems. Unfortunately, in practice, decision problems
do not come with state variables clearly identified, so
the main difficulty in practice is in identifying a suitable
state variable. In many problems, the choice of state
variable is not immediately apparent. In some problems,
there are multiple candidates for state variable. Put
differently, the art is in choosing the state variable so as
to capture the decision problem at hand. We will see
examples of these below.
First, a simple example.

1 Example: Consider the so-called one sector growth


model. The agent has an initial capital of k0 and wants
to maximise

max_{(ct ,kt+1 )} (1 − δ) ∑_{t=0}^{∞} δ^t u(ct )


subject to

ct + kt+1 ⩽ f(kt )
ct , kt+1 ⩾ 0, t = 0, 1, 2, . . .

with u, f ∶ R+ → R concave. A natural choice of state


variable here is capital at time t, namely kt . Clearly, the
transition to a state in the next period is a deterministic
function. ♢
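As an illustration of Example 1, here is a value-iteration sketch on a discretised capital grid, with assumed functional forms u(c) = √c and f(k) = k^0.3 + 0.5k (nothing in the text pins these down). Discretising the state space sidesteps the continuity questions that arise with an uncountable state space, since on a finite grid the Bellman operator is trivially well defined.

```python
# Value iteration for a discretised one-sector growth model, with
# assumed forms u(c) = sqrt(c) and f(k) = k**0.3 + 0.5*k.  The Bellman
# operator is
#   (Psi w)(k) = max_{k' in [0, f(k)]} [(1-delta)*u(f(k)-k') + delta*w(k')],
# restricted here to grid points k'; it is a contraction of modulus delta.

import math

delta = 0.9
grid = [0.05*i for i in range(1, 41)]    # assumed capital grid
u = lambda c: math.sqrt(c)
f = lambda k: k**0.3 + 0.5*k

def bellman(w):
    out = []
    for k in grid:
        out.append(max((1 - delta)*u(f(k) - k2) + delta*w[j]
                       for j, k2 in enumerate(grid) if k2 <= f(k)))
    return out

w = [0.0]*len(grid)
for _ in range(300):
    w_new = bellman(w)
    gap = max(abs(a - b) for a, b in zip(w, w_new))
    w = w_new

assert gap < 1e-10                            # geometric convergence
assert all(b >= a for a, b in zip(w, w[1:]))  # value rises with capital
```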

10.1.1 Valuing Strategies

A strategy for the Markov decision problem is a func-


tion that, at each point in time t, takes into account all
the information available, namely all past actions taken
and all states that occurred in the past. We shall denote
such a strategy σ. Naturally, the space of all possible
strategies is quite complicated. As one might imagine,
each strategy induces a probability distribution over
future states, and therefore, a probability distribution
over the future rewards the decision maker will receive.
Thus, the value of the strategy σ to the decision maker, at
initial state x0 is vσ (x0 ). Although such values always
exist, it is usually next to impossible to compute these
values without making some additional assumptions.
The Markovian nature of the problem permits us to
restrict attention to strategies that are (i) Markovian
and (ii) stationary. Markovian because at time t, the
strategy only depends xt and not (x0 , x1 , . . . , xt ) and
(ao , a1 , . . . , at−1 ); stationary because the dependence on
the state xt is independent of time.
In what follows, we shall make two (restrictive) assump-
tions. Let the set of all possible states be S. We shall


first assume that the function u is such that supx∈S sup{∣u(a, x)∣ ∶
a ∈ A(x)} is finite. Thus, u is uniformly bounded
both above and below. Second, for all x ∈ S and a ∈ A(x),
the set {y ∈ S ∶ P(y∣a, x) > 0} is finite.
Consider a stationary strategy σ. When in state x, the
strategy prescribes the action σ(x) ∈ A(x). A fundamental
result is the following:
2 Proposition: Suppose u is uniformly bounded
both above and below. Then, for each stationary Markov
strategy σ ∶ S → A that is feasible (ie σ(x) ∈ A(x) for
all x ∈ S), there exists a unique value function vσ (⋅) that
satisfies the recursive equation below:
vσ (x) = (1 − δ)u(σ(x), x) + δ ∑y vσ (y) P(y∣σ(x), x). ♢

We remark that if u is unbounded above, then v = +∞


is also a solution to the above recursive equation. It
is worthwhile to see how to prove such a proposition.
Toward this end, let B(S) denote the space of bounded
functions on S endowed with the sup norm. Consider
the stationary Markov strategy σ. Then, for any f ∈ B(S),
let us define the operator Ψ ∶ B(S) → B(S) as:
(Ψf)(x) ∶= (1 − δ)u(σ(x), x) + δ ∑y f(y) P(y∣σ(x), x).

It is easy to verify that Ψ is well defined, ie for each


f ∈ B(S), Ψf is indeed in B(S). (A word on notation.
Even though Ψ is, in general, non-linear, we shall use
the standard operator notation and write Ψ(f) as Ψf.)
The operator Ψ has the following properties that are
easily verified:

Monotonicity For f, g ∈ B(S), f ⩽ g implies


Ψf ⩽ Ψg.

Discounting Ψ(f + c1) ⩽ Ψf + δc1, where c ⩾ 0.


3 Exercise: If f ∈ B(S), show that Ψf ∈ B(S). Verify that the


operator Ψ satisfies Monotonicity and Discounting. ♢

Then, by Blackwell’s Theorem (Theorem 29 below), we


see that the operator Ψ is a contraction (see §10.4), and
so there is a unique bounded function (that depends on
σ) that we shall call vσ ∈ B(S) such that Ψvσ = vσ . Thus,
we have proved that there exists a value function that
represents the value of the decision problem at each
state.
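Proposition 2 can be checked directly on a small example: with finitely many states, the recursive equation is a linear system, so vσ can be computed exactly and compared against iterates of Ψ. The two-state payoffs and transitions below are assumed for illustration, with the strategy σ already substituted in.

```python
# Checking Proposition 2 on a two-state example.  A stationary strategy
# sigma is substituted in: u[x] is the payoff u(sigma(x), x) and P[x][y]
# the induced transition probability (all values assumed).  v_sigma
# solves the linear system v = (1-delta)*u + delta*P v, and iterating
# Psi from any starting point reaches the same (unique) solution.

delta = 0.8
u = [1.0, 0.0]                           # per-state payoffs under sigma
P = [[0.5, 0.5], [0.2, 0.8]]             # transitions under sigma

def Psi(fv):
    return [(1 - delta)*u[x] + delta*sum(P[x][y]*fv[y] for y in range(2))
            for x in range(2)]

v = [5.0, -3.0]                          # arbitrary starting point
for _ in range(200):
    v = Psi(v)                           # contraction of modulus delta

# exact solution of (I - delta*P) v = (1-delta)*u by Cramer's rule
a, b = 1 - delta*P[0][0], -delta*P[0][1]
c, d = -delta*P[1][0], 1 - delta*P[1][1]
det = a*d - b*c
r = [(1 - delta)*u[0], (1 - delta)*u[1]]
v_exact = [(r[0]*d - b*r[1])/det, (a*r[1] - c*r[0])/det]
assert all(abs(x - y) < 1e-10 for x, y in zip(v, v_exact))
```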

10.1.2 Optimality and the Bellman Equation

Thus far, we have seen that with every strategy σ, there


exists an associated value function vσ (⋅). The decision
maker’s problem is to choose the strategy with the
highest value function. The optimal value function is
then given by
V (x) = supσ vσ (x)

where the supremum is taken over all conceivable strategies


(and not just the stationary Markov ones). We shall say
that strategy σ is optimal if V (x) = vσ (x) for all x. The
optimal value function has a recursive structure.

4 Proposition: The optimal value function V satisfies the


recursive equation

V (x) = supa∈A(x) [ (1 − δ)u(a, x) + δ ∑y V (y) P(y∣a, x) ]. ♢

The equation above is referred to as Bellman’s equation


or the optimality equation. It says that it is best to take
the best possible action today, contingent on following


the optimal path tomorrow onwards. This idea is also


referred to as Bellman’s Principle of Optimality. As
intuitive as it is, it requires proof. Essentially, one has to
ensure that all the sums are bounded.
A strategy σ is called a conserving strategy if it attains
the supremum in Bellman’s equation.
That is, for all x ∈ S, σ satisfies

V (x) = (1 − δ)u(σ(x), x) + δ ∑y V (y) P(y∣σ(x), x).

It is called conserving because if you follow the strategy


for one stage and then follow an optimal strategy, you
get the optimal value. The following proposition con-
nects the twin concepts of conserving and optimal
strategies.

5 Proposition: (a) Any optimal strategy σ is conserving.


(b) If u is bounded above, then any conserving strategy
is optimal. ♢

The notion of conserving strategy can be very useful in


problems where A(x) is finite for each x ∈ S. This is be-
cause if A(x) is finite, then there must be a conserving
strategy that achieves the supremum in Bellman’s equa-
tion. Then, for u bounded above, we are guaranteed
that there exists an optimal strategy.
For many examples, it is often easier to compute the
optimal strategy by making other assumptions on u,
A(x) and the transition probabilities P(y∣a, x). But there
is an important class of examples where the set of feas-
ible actions is always binary; stop (ie retire) or continue.
Such problems are referred to as optimal stopping prob-
lems. An important instance of such problems are re-
ferred to as Bandit Problems and will occupy much of
our attention hereon.


Before we proceed, we would be remiss if we didn’t ask


an obvious question

Why is Stokey-Lucas so bloody hard?

After all, there seems to be theorem after theorem in


that book, even in the deterministic case, while we seem
to have avoided it, especially in the next section, where
we actually solve a Bellman equation by finding a value
function explicitly and solving for the optimal strategy.
The reason is that Stokey-Lucas are looking for con-
serving strategies, whose existence is not easy to show,
unless, as in what follows, the action space is always
finite. To see why this matters, consider again Example
1, the one sector growth model. In such a model, let
S ∶= R+ be the state space, and assume for simplicity
that u and f are bounded. Also let Cb (S) be the space
of continuous, bounded functions on S. Then, for any
w ∈ Cb (S), define the dynamic programming operator
Ψ ∶ Cb (S) → Cb (S) as

(Ψw)(k) ∶= maxk′∈[0,f(k)] [ (1 − δ)u(f(k) − k′ ) + δw(k′ ) ].

It is easy to see that Ψ satisfies Monotonicity and Dis-


counting, so we should be able to conclude very easily
that there is a solution, ie a value function that solves
this problem. Or can we? The problem is that we haven’t
verified that Ψ is well defined. In particular, for any
w ∈ Cb (S), we haven’t shown that Ψw ∈ Cb (S).

Indeed, fix any k ∈ S. Then, since w ∈ Cb (S), and u and f


are bounded, it is easily seen that there exists a solution
to the maximisation problem

maxk′∈[0,f(k)] [ (1 − δ)u(f(k) − k′ ) + δw(k′ ) ]


and let the value of this solution be Ψw(k). Therefore,


Ψw(k) exists for each k ∈ S. Nevertheless, the question
remains, Is Ψw ∈ Cb (S)?
If you look carefully, you will notice that this is quite
difficult to prove. It is precisely this hurdle that needs
to be overcome. To do this, Stokey-Lucas introduce the
notions of correspondences (since there may not be a
unique maximiser) and then prove Berge’s Theorem of
the Maximum, which ensures that Ψ is well defined. All
of the problems that they attack require an uncountable
state space, even in the deterministic case. Thus, this is
a problem they need to (and do) tackle immediately.
A final word. Using the techniques outlined in Stokey-
Lucas, it is possible to solve deterministic dynamic pro-
gramming problems, using a first order condition and
the envelope theorem. This leads us to the (discrete
time) Euler equation, the solution to which gives us the
value function. Nevertheless, in the more complicated
stochastic case in Stokey-Lucas, this approach typically
fails miserably, since there is always a pesky expectation
functional E floating in the Bellman equation. This is
not a problem that has a solution in the discrete time
model.
However, if we were to take time as continuous, and
assumed that uncertainty follows some combination
of Brownian motion and a Poisson process, then we
can show that the solution to the Bellman equation is
the solution to a partial differential equation, known as
the HJB (Hamilton-Jacobi-Bellman) equation. (If there
is Brownian uncertainty, we use the celebrated Itô’s
Lemma to do this.) In particular, there is no expecta-
tion operator in the partial differential equation, and
with some parametrisations, the PDE becomes an ODE,
which are generally easier to solve. An example of this


is given in §10.2.2, where we treat a bandit problem in


continuous time.

10.2 Bandit Problems

Consider the following problem of sequential decision


making under uncertainty. Suppose there are n inde-
pendent projects you can work on at each point in time,
time being discrete and given by t = 0, 1, 2, . . . . If, at
some time t, a particular project is in state x and the
decision maker decides to work on it, then she receives
an expected reward u(x) and the next state of the pro-
ject becomes y with probability pxy . As always, she
discounts the future at a constant rate of δ ∈ [0, 1).
Moreover, the remaining projects, which remain idle,
do not change state. Moreover, at any point in time, she
can retire, an option which earns her a one-time pay-
ment of M, and then a payoff of 0 forever. We begin
with an analysis of the decision maker’s problem when
there is only one project, ie the case where n = 1.

10.2.1 A Single-Project Bandit Problem

Let the set of states that the project can be in be given


by S. At any point in time, the project is in some state.
After observing the state, the decision maker must de-
cide if she should operate the project or if she should
retire. If she operates on the project when the project is
in state x, she receives a payoff of u(x), and the project
moves to state y ∈ S with probability pxy . We shall
assume that the set {y ∈ S ∶ pxy > 0} is finite for each
x ∈ S. In other words, after working on a project that is


in state x, the project can only move to one of finitely


many states. (We make this assumption to avoid ques-
tions of measurability. It is quite easy to relax this as-
sumption.) If the decision maker decides to retire, then
she earns a reward of M (in normalised, present value
terms).
We let v(x, M) denote the maximal expected discounted
return when the state is x ∈ S, future payoffs are dis-
counted at rate δ, 0 ⩽ δ < 1, and the retirement option
gives the decision maker M. Then, the value function v
satisfies the Bellman equation

v(x, M) = max [ M, (1 − δ)u(x) + δ ∑y pxy v(y, M) ].

The first question to address is the existence of such


a value function. Toward this end, let us assume that
function u ∶ S → R is bounded, and for a fixed M ∈ R,
let B(S) denote the space of bounded functions on S
endowed with the sup norm. Then, for any f ∈ B(S), let
us define the operator Ψ ∶ B(S) → B(S) as:

(Ψf)(x, M) ∶= max [ M, (1 − δ)u(x) + δ ∑y pxy f(y, M) ].

It is easy to verify that Ψ is well defined, ie for each


f ∈ B(S), Ψf is indeed in B(S). (A word on notation.
Even though Ψ is, in general, non-linear, we shall use
the standard operator notation and write Ψ(f) as Ψf.)
The operator Ψ has the following properties that are
easily verified:

Monotonicity For f, g ∈ B(S), f ⩽ g implies


Ψf ⩽ Ψg.

Discounting Ψ(f + c1) ⩽ Ψf + δc1, where c ⩾ 0.


6 Exercise: If f ∈ B(S), show that Ψf ∈ B(S). Verify that the


operator Ψ satisfies Monotonicity and Discounting. ♢

Then, by Blackwell’s Theorem (Theorem 29 below), we


see that the operator Ψ is a contraction (see §10.4), and
so there is a unique bounded function v ∈ B(S) such
that Ψv = v. Thus, we have proved that there exists a
value function that represents the value of the decision
problem at each state.
Our first objective is to show that if it is optimal to re-
tire when the termination reward is M, it is also op-
timal to retire when the terminal reward is M ′ > M.
This is achieved via the following lemma, which also
describes some other properties of the value function.

7 Lemma: Let B ∶= sup{∣u(x)∣ ∶ x ∈ S}. Then, for each


(fixed) x ∈ S, the value function v(x, M) has the following
properties.
(a) v(x, M) − M is decreasing in M.
(b) v(x, M) is convex and increasing in M.
(c) v(x, M) is constant for M ⩽ −B/(1 − δ).
(d) v(x, M) = M for all M ⩾ B/(1 − δ). ♢

Proof. Consider the operator Ψ ∶ B(S) → B(S), defined


as above, so that

(Ψf)(x, M) ∶= max [ M, (1 − δ)u(x) + δ ∑y pxy f(y, M) ].

Notice that Ψ is a contraction mapping, so that for


any v0 ∈ B(S), the sequence defined inductively by


vn+1 ∶= Ψvn converges to the unique fixed point of Ψ,


denoted by v.
Let v0 (x, M) ∶= max[0, M], and define the sequence
(vn ) by the inductive rule

vn+1 (x, M) ∶= (Ψvn )(x, M).

We shall show by induction that the function vn (x, M)


has the properties (a) to (d) above, so that the lim-
iting value function v(x, M) also has all the desired
properties. It is clear that the function v0 (x, M) ∶=
max[0, M] has all the properties (b) to (d). Notice that
v0 (x, M) − M = max[0, −M] so that v0 (x, M) − M satisfies
(a). Now suppose vn has properties (a) to (d). To see
that vn+1 has property (a), notice that

(Ψvn )(x, M) − M = max [ 0, (1 − δ)u(x) + δ ∑y pxy (vn (y, M) − M) − (1 − δ)M ],

which is clearly decreasing in M. Similarly, assume


that vn (x, M) is convex in M. Then, it is easy to show
that vn+1 is convex in M. If vn is increasing in M, then
so is vn+1 . Similar arguments establish properties (c)
and (d) for the function vn+1 . Thus, the limiting func-
tion v also has all the desired properties. ∎

8 Exercise: Show that if vn is convex in M, then so is vn+1 .


(Hint: You will need to show that for any a, b, c, d ∈ R,
max[a + c, b + d] ⩽ max[a, b] + max[c, d].) Show that
the function v(x, M) is continuous in M. (How does this
follow from the convexity of v in M?) ♢

Now suppose M ′ > M and suppose the project is in


state x. We claim that if it is optimal to retire when the


terminal reward is M, then it is optimal to retire when


the terminal reward is M ′ . To see this, it will suffice to
show that v(x, M ′ ) = M ′ . This is true because

0 ⩽ v(x, M ′ ) − M ′ by the Bellman equation


⩽ v(x, M) − M by Lemma 7
= 0 since retirement is optimal at M,

so that v(x, M ′ ) = M ′ .
Let G(x) denote the smallest value of M at which retire-
ment at x is optimal. That is

G(x) ∶= min {M ∶ v(x, M) = M}.

G(x) is referred to as the Gittins Index of the project


in state x. (We can take the minimum instead of the
infimum since v(x, ⋅) is continuous in M.) Thus, the op-
timal strategy of the decision maker can be summarised
by the following theorem.

9 Theorem: When the project is in state x, it is optimal


for the decision maker to retire if M ⩾ G(x), and to con-
tinue if M ⩽ G(x). ♢
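The threshold characterisation suggests a numerical recipe: compute v(x, M) by value iteration and bisect on M for the point where v(x, M) − M hits zero, using the monotonicity from Lemma 7. The three-state project below is an assumed example; it happens to be "deteriorating" in the sense of Theorem 11 below, so the computed indices should equal the payoffs u.

```python
# Computing Gittins indices for a small assumed project: three states,
# payoffs u and transitions p below.  For each retirement reward M we
# find v(., M) by value iteration; since v(x, M) - M is decreasing in M
# (Lemma 7), G(x) = min{M : v(x, M) = M} is located by bisection.

delta = 0.9
u = [1.0, 0.5, 0.1]
p = [[0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6],
     [0.0, 0.0, 1.0]]                    # state 2 is absorbing

def value(M, iters=400):
    v = [max(M, 0.0)]*3
    for _ in range(iters):
        v = [max(M, (1 - delta)*u[x]
                 + delta*sum(p[x][y]*v[y] for y in range(3)))
             for x in range(3)]
    return v

def gittins(x):
    lo, hi = -1.0, 2.0                   # bounds chosen to bracket u
    for _ in range(60):
        mid = (lo + hi)/2
        if value(mid)[x] - mid > 1e-12:
            lo = mid                     # continuing still strictly better
        else:
            hi = mid                     # retiring already optimal
    return hi

G = [gittins(x) for x in range(3)]
# this project is deteriorating, so G(x) = u(x) in every state
assert all(abs(g - ux) < 1e-6 for g, ux in zip(G, u))
```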

We end with a more probabilistic approach to some


properties of the value function, so that we can get a
more explicit expression for the Gittins Index. A word
on strategies. It is clear what the optimal strategy in
the Bandit problem is. More generally, for a fixed M,
a stationary, Markov strategy is a mapping σ ∶ S →
{Retire, Continue}. The strategy is Markovian because it
only depends on the state and is stationary because it is
independent of time.
Since the transitions between states are random, each
strategy induces a probability measure over the set


of states so that the time of exit is random. For each


strategy σ, there is an associated random stopping time
Tσ . We now see a probabilistic proof of the proposition
that v(x, M) is convex in M.

10 Proposition: For a fixed x ∈ S, v(x, M) is convex in M. ♢

Proof. Consider a stationary Markov strategy σ, with


random retirement time Tσ . Then,
vσ (x, M) ∶= (1 − δ) Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) + (δ^{Tσ}/(1 − δ)) M ∣ x0 = x ]

which is the expected discounted return before retir-


ing at the random time Tσ . Notice that vσ (x, M) is an
affine function in M. Hence,
v(x, M) = (1 − δ) maxσ Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) + (δ^{Tσ}/(1 − δ)) M ∣ x0 = x ]

is a convex function in M, since it is the maximum of


a family of affine functions in M. (We divide by (1 − δ)
above since M is in normalised present-value terms.)∎

The proposition above is useful because it leads us nat-


urally to another interpretation of the Gittins Index.
Consider the one-armed bandit in state x. Since the
decision maker is indifferent between retiring and con-
tinuing when M = G(x), we have that for any other
strategy σ,

G(x) ⩾ (1 − δ) Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) ∣ x0 = x ] + G(x) Eσ [ δ^{Tσ} ∣ x0 = x ],

with equality for the optimal strategy σ. Thus,
G(x) ⩾ (1 − δ) Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) ∣ x0 = x ] / ( 1 − Eσ [ δ^{Tσ} ∣ x0 = x ] )


with equality for the optimal policy. Rearranging, we


get,
G(x) = supσ Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) ∣ x0 = x ] / ( Eσ [ 1 − δ^{Tσ} ∣ x0 = x ] / (1 − δ) )

= supσ (1 − δ) Eσ [ ∑_{t=0}^{Tσ−1} δ^t u(xt ) ∣ x0 = x ] / ( (1 − δ) Eσ [ ∑_{t=0}^{Tσ−1} δ^t ∣ x0 = x ] )

where the numerator represents the expected discoun-


ted return prior to Tσ , while the denominator repres-
ents the expected discounted time prior to Tσ . G(x)
can be interpreted as the (normalised, present discoun-
ted) value of a lump sum retirement payment such that
the decision maker is indifferent between receiving the
lump sum and retiring or continuing to play optimally
and receiving the lump sum after some (optimal, but
random) number of further plays. Since Tσ is a random
stopping time it is, in general, not possible to compute
G(x) explicitly. Nevertheless, we consider below a spe-
cific case where it is possible to compute the Gittins
Index explicitly.

11 Theorem (The deteriorating case): Suppose that


v(y, M) ⩽ v(x, M) for all M and for all y ∈ {z ∈ S ∶ pxz > 0}.
Then,
G(x) ∶= u(x).

Proof. The Bellman equation is such that v satisfies

v(x, M) = max [M, (1 − δ)u(x) + δ E[v(y, M) ∣ x]]

where y ∈ {z ∈ S ∶ pxz > 0}. If M = G(x), so that


v(x, M) = M, and stopping is optimal at x, then by the
hypothesis that the value function is deteriorating, it


must be optimal to stop at any y ∈ {z ∈ S ∶ pxz > 0}.


Moreover, for any such y, v(y, M) = M (as v(y, M) ⩾ M
always). Therefore,

M = (1 − δ)u(x) + δM,

ie M = u(x), which proves our theorem. ∎

We now proceed to a more concrete formulation of the


bandit problem, where it is possible to say a little more
about the value function.

10.2.2 A More Concrete Problem

The name “bandit” problem comes from the infamous


one-armed bandit: the slot machine. In this section, we
consider the simplest such problem and try and deduce
some properties.
Suppose there is a (slot) machine that makes a payoff
each time a lever is pulled. Payoffs lie in the set {0, 1}.
The machine is one of two types. The high-return ma-
chine gives a payoff of 1 with probability θH ∈ (0, 1) and
the low-return machine gives a payoff of 1 with probab-
ility θL ∈ [0, θH ). As above, time is discrete and future
payoffs are discounted at the rate of δ ∈ [0, 1). The de-
cision maker can retire at the beginning of any period
(before pulling the lever), thereby getting a termination
reward of M.
At the beginning of a period, the decision maker has
prior probability p that the machine is the high-return
machine. If the decision maker pulls the lever, she gets
a payoff and moves to the next period with a new prior.
Let the set of all possible prior probabilities be [0, 1].


With this as our state space, we are back to the abstract


bandit problem described above in §10.2.1. If the de-
cision maker enters a period with prior probability p,
then upon seeing a reward of +1, his posterior probabil-
ity that θ = θH is
pθH
ϕ1 (p) ∶=
pθH + (1 − p)θL
while his posterior probability that θ = θH , upon seeing
a reward of 0, is
p(1 − θH )
ϕ0 (p) ∶= .
p(1 − θH ) + (1 − p)(1 − θL )
Here is a useful observation.

12 Exercise: Show that Ep [ϕx (p)] = p, where x ∈ {0, 1} is


the outcome of the lever being pulled and the expecta-
tion is over all possible outcomes. This is known as the
martingale property of beliefs. ♢
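The posterior formulas and the martingale property of Exercise 12 are easy to check numerically; the values of θH and θL below are our own assumptions.

```python
# Numerical check of the posterior formulas and the martingale property
# of beliefs from Exercise 12 (theta_H and theta_L are assumed values).

theta_H, theta_L = 0.7, 0.2

def phi1(p):    # posterior that theta = theta_H after a payoff of 1
    return p*theta_H / (p*theta_H + (1 - p)*theta_L)

def phi0(p):    # posterior that theta = theta_H after a payoff of 0
    return p*(1 - theta_H) / (p*(1 - theta_H) + (1 - p)*(1 - theta_L))

for p in [0.1, 0.5, 0.9]:
    prob1 = p*theta_H + (1 - p)*theta_L          # chance of observing 1
    posterior_mean = prob1*phi1(p) + (1 - prob1)*phi0(p)
    assert abs(posterior_mean - p) < 1e-12       # beliefs are a martingale
    assert phi0(p) < p < phi1(p)   # bad news lowers the belief, good news raises it
```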

The only change we will make to the formulation of


the problem is to endow the state space [0, 1] with the
natural Euclidean topology (and metric). Now, consider
the vector space of (real valued) continuous functions
C[0, 1] endowed with the sup norm. Using the argu-
ment used in §10.2.1, we see that there is a unique (con-
tinuous) function v ∈ C[0, 1] that satisfies the Bellman
equation

v(p, M) = max {M, (1−δ)(pθH +(1−p)θL )+δ E [v(ϕx (p), M)∣x, p] }

where x is the outcome 0 or 1, and E is the expectation


functional. With probability pθH + (1 − p)θL , x takes the
value 1 and with probability p(1 − θH ) + (1 − p)(1 − θL ), x
takes the value 0.


13 Exercise: Show that v(p, M) is increasing in p. (You


should start with a continuous function as in Lemma 7
above, but that is also increasing in p and show that the
operator preserves this property.) You may consider the
simpler case where θL = 0. ♢

Proof (Solution). Let f ∶ [0, 1] → R be an increasing


function. Also, let fx (p) ∶= f(ϕx (p)) for x ∈ {0, 1}. No-
tice that ϕx (⋅) is an increasing function of p, therefore
fx (⋅) is increasing in p. Then, the dynamic program-
ming operator is Ψ ∶ C[0, 1] → C[0, 1], wherein

Ψf(p, M) = max {M, (1−δ)(pθH +(1−p)θL )+δ E [f(ϕx (p), M)∣x, p] }.

Notice that pθH + (1 − p)θL = p(θH − θL ) + θL is in-


creasing in p. Therefore, it will suffice to show that
E [f(ϕx (p), M)∣x, p] is increasing in p. To this end, let p′ > p. Then, E [f(ϕx (p′ ), M)∣x, p′ ] − E [f(ϕx (p), M)∣x, p] (after suppressing the dependence on M) is

= p ′ f1 (p ′ ) + (1 − p ′ )f0 (p ′ ) − pf1 (p) − (1 − p)f0 (p)


= p[f1 (p ′ ) − f1 (p)] + (1 − p)[f0 (p ′ ) − f0 (p)] + (p ′ − p)[f1 (p ′ ) − f0 (p ′ )]
⩾ 0,

since fx (⋅) is an increasing function and since f1 (p′ ) ⩾ f0 (p′ ) (because ϕ1 (p′ ) ⩾ ϕ0 (p′ ) and f is increasing). Thus, Ψf is also an increasing function in p, which implies that the value function v, which is the unique fixed point of Ψ and satisfies v ∶= limn→∞ Ψn f, is also increasing in p. ∎

Unfortunately, even though we have a seemingly straight-


forward Bellman equation, it is not possible to obtain a
closed form solution for the value function v. Indeed,
it is not even possible to compute the Gittins Index ex-
plicitly. Nevertheless, it is possible to determine a few

properties of the value function that are quite enlighten-


ing. We shall do this via some exercises.

14 Exercise: Let p ∈ (0, 1). Show that v(p, 0) > pθH + (1 − p)θL . ♢

15 Exercise: For M ∈ (0, 1), argue that (i) v(p, M) ⩾ max [M, v(p, 0)]
and (ii) v(p, M) ≠ max [M, v(p, 0)]. ♢

16 Exercise: Is it the case that v(p, M) is convex in p? ♢

We now consider an extreme case of our parametrisa-


tion, where it is possible to compute the Gittins Index.

17 Exercise: Consider the bandit problem above, where


θL = 0. Show that for each p ∈ [0, 1], the Gittins Index is

G(p) ∶= pθH / (δp + (1 − δ)).

Why do you suppose G(p) > pθH , which is the expec-


ted (normalised, present discounted) value of the pay-
offs if the arm is high-return? (Hint: Argue that G(p) ⩾
v(p, 0) > pθH .) ♢
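The inequality G(p) > pθH on (0, 1) is a one-line computation; the values of δ and θH below are illustrative assumptions.

```python
# The Gittins Index from Exercise 17 (theta_L = 0) versus the expected
# flow payoff p*theta_H: the index carries an option value on top of it.
delta, theta_H = 0.9, 0.8

def gittins(p):
    return p * theta_H / (delta * p + (1 - delta))

for i in range(1, 100):
    p = i / 100
    assert gittins(p) > p * theta_H   # strict on (0, 1)
```

Indeed G(p)/(pθH ) = 1/(δp + 1 − δ) > 1 exactly when p < 1, and G(1) = θH .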

Notice that in the example above, even though we were


able to compute the Gittins Index explicitly, we are not
able to explicitly solve for the value function v(p, M) for
general M. We end with the following useful observa-
tion. Suppose it is the case that

θL < M < θH .

Suppose, for the moment, that the decision maker knows the type of the machine. Then, she would choose to retire

if and only if the machine is the low-return variety. In


case there is uncertainty, notice that with positive prob-
ability, she will retire even if the machine is actually of
the high-return kind.

18 Remark: We have conveniently assumed that there are


only two states that are possible and that each period,
we get a noisy signal about the state. Suppose at some
point in time, the probability that the true state is p.
Then, with our (binary) specification of uncertainty, the
variance of the estimate is p(1 − p). Since p ′ , the posterior
distribution upon the receipt of a new signal, can be
greater or smaller than p, it follows that the variance can
either go up or go down.
Now consider the case where the true state θ is initially
distributed normally with mean y and variance τ2 . Sup-
pose the observation is s ∶= θ + ε, where ε ∼ N(0, σ2 ) is
noise. If a signal s is observed, the posterior distribution
is also normal, with mean

m ∶= (σ2 y + τ2 s)/(σ2 + τ2 )

and variance

τ′2 ∶= σ2 τ2 /(σ2 + τ2 ).

Then, the precision of the posterior distribution of θ is

ρ′ ∶= 1/τ′2 = ρ + ρσ ,
where ρ ∶= 1/τ2 and ρσ ∶= 1/σ2 . Notice that in this case,
ρ ′ > ρ regardless of the realisation of ε, so the variance of
the estimate is always decreasing.
This is an important structural difference between the two popular models of uncertainty in the learning literature. ♢
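The precision recursion ρ′ = ρ + ρσ is immediate to check in code; the prior mean, prior variance and noise variance below are illustrative assumptions.

```python
# Normal-learning update from the remark above: posterior mean m and
# posterior variance tau'^2 after observing s = theta + eps.
def update(y, tau2, s, sigma2):
    m = (sigma2 * y + tau2 * s) / (sigma2 + tau2)
    tau2_new = sigma2 * tau2 / (sigma2 + tau2)
    return m, tau2_new

y, tau2, sigma2 = 0.0, 4.0, 1.0          # illustrative prior and noise
for s in [-3.0, 0.0, 5.0]:               # any realisation of the signal
    m, tau2_new = update(y, tau2, s, sigma2)
    rho, rho_sigma = 1 / tau2, 1 / sigma2
    assert abs(1 / tau2_new - (rho + rho_sigma)) < 1e-12
    assert tau2_new < tau2               # variance falls regardless of s
```

Unlike the binary model, the posterior variance here is deterministic: it falls by the same amount whatever signal is realised.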

We now consider the bandit problem in continuous


time, where we get an explicit, closed form solution
of the Bellman equation, ie, the value function. It will
also be possible to verify a number of properties of the
solution in a straightforward manner.

10.2.3 Poisson Version of a Concrete Example

We shall now consider a version of the bandit problem


in continuous time. The biggest advantage of doing this
is that we are able to eliminate the expectation func-
tional so that the value function is the solution to a first
order ODE. We shall consider a model where beliefs
move downward monotonically, with the possibility
of a single upward jump to the boundary. The specific
formulation we shall use also has the advantage of not
requiring the use of Itô’s lemma.
The Poisson version of the bandit problem in continu-
ous time (with discounting) was introduced by Presman,
1990. Here we shall follow Keller, Rady and Cripps,
2005 (henceforth KRC), and use a simple parametrised
version of the problem solved by Presman, 1990.
Time t ∈ [0, ∞) is continuous, and the discount rate is
r > 0. There is 1 unit of a perfectly divisible resource.
The decision maker faces a two-armed bandit problem
where she continually has to decide what fraction of
the available resource to allocate to each arm. One arm
S is safe and yields a known deterministic payoff. The
other arm R is risky and can either be bad or good. If it
is bad, then it always yields a return of 0, regardless of
the fraction of the resource allocated to it; if it is good,
then it yields lump-sum payoffs at random times, the

arrival rate of these payoffs being proportional to the


fraction of the resource allocated to it.
More precisely, if the decision maker allocates kt ∈ [0, 1]
of the resource to R over an interval [t, t + dt), and con-
sequently fraction 1 − kt to S, then she receives the payoff
(1 − kt )sdt from S, where s > 0 is perfectly known. The
probability that she receives a lump-sum payoff from R
at some point in the interval is kλθdt, where λ > 0 is per-
fectly known, and θ = 1 if the arm is good, while θ = 0
if the arm is bad. Lump-sums are independent draws
from a time-invariant distribution on R++ with a known
mean h. Thus, when the fraction kt of the resource is
allocated to R on [t, t + dt), the overall expected payoff
increment conditional on θ is [(1 − kt )s + kt gθ]dt, where g ∶= λh.

To keep things interesting, let us assume that 0 < s < g,


so that if the type of the arm is known perfectly, the
decision maker will choose R if and only if it is good. The decision maker’s strategy is a process (kt ) such that each kt is measurable with respect to the natural filtration. Then,
her total expected payoff, normalised to present-period
terms is
E [ ∫0∞ re−rt [(1 − kt )s + kt gθ] dt ]

where the expectation is over both θ and the stochastic


process (kt ). (The second part of the expectation, the
conditioning on (kt ), is for the following reason: If at
some point in time, the decision maker gets a positive
payoff from R, then from that point on, all her resources
will go to R. But this moment is random (and it may
never occur), so her strategy is random, and dependent
on her history of observations.) Before we proceed, we
perform some calculations.

19 Exercise: Suppose the decision maker has to place all


of her resources in the safe arm. Show that her expected
payoff is s. Suppose she knows the risky arm is good.
Show that her expected payoff is g. ♢

We are interested in stationary strategies, with the state


variable being p ∈ [0, 1], the current belief that the arm
R is good. If the current action is k, then the expected
payoff increment is
[(1 − k)s + kgp] dt
and the subjective probability of a breakthrough on the
risky arm is pkλdt. Let us now address the evolution of
beliefs.
Warning. In what follows, we shall only consider ex-
pressions up to terms of order o(dt), which we shall
ignore. That this is legitimate is more difficult to prove
and would take us too far from our path. Moreover, we
shall use the notation dt rather informally, instead of
being more formal with the derivation of the ODE that
is the solution to the Bellman equation below. Neverthe-
less, it turns out that everything is legitimate, but this
is also quite tedious to prove, and so we shall return to
this question on another day.
When current belief is p and investment in R is k, then
the probability that a breakthrough occurs is pkλdt. In
this case, the posterior jumps to p = 1. The probability
that there is no breakthrough is 1 − pkλdt = p(1 − kλdt) +
(1 − p). To calculate the posterior probability, use Bayes’ rule, which says that P(θ = 1∣0) = P(θ = 1 & 0)/P(0) (where 0 represents the event “no breakthrough”). Formally, this means that

p + dp = p(1 − kλdt) / ((1 − p) + p(1 − kλdt)).

We can rewrite this as


dp = pkλdt(−1 + p) / (1 − pkλdt)
   = [pkλdt(−1 + p) / (1 − pkλdt)] ⋅ [(1 + pkλdt) / (1 + pkλdt)]
   = (−kλp(1 − p)dt + o(dt)) / (1 − o(dt))
   = −kλp(1 − p)dt.
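With k ≡ 1, the law of motion dp = −λp(1 − p)dt is a (downward) logistic equation; solving it gives p(t) = p0 e−λt /(1 − p0 + p0 e−λt ). A crude Euler scheme confirms this; λ, p0 and the horizon below are illustrative assumptions.

```python
import math

# Euler simulation of dp = -lam*p*(1-p) dt (the k = 1 case), compared
# against the closed form obtained by solving the logistic equation.
lam, p0, T, n = 2.0, 0.7, 1.0, 100_000
dt = T / n
p = p0
for _ in range(n):
    p += -lam * p * (1 - p) * dt

closed = p0 * math.exp(-lam * T) / (1 - p0 + p0 * math.exp(-lam * T))
assert abs(p - closed) < 1e-4
```

Absent a breakthrough, beliefs drift down monotonically, exactly as described in the text; a breakthrough would send p to 1 in a single jump.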

We now move to the Bellman equation. We shall write


down the Bellman equation, so the value function satis-
fies

u(p) ∶= maxk∈[0,1] {r[(1 − k)s + kgp]dt + e−rdt E [u(p + dp)∣p, k]}.

The first term is the average instantaneous payoff and


the second term is the continuation payoff.
When current belief is p and investment in R is k, then
the probability that a breakthrough occurs is pkλdt. In
this event, the value function jumps to u(1) = g (since
the new posterior is p = 1). The probability that there is
no breakthrough is 1 − pkλdt = p(1 − kλdt) + (1 − p), and
the value function changes to u(p + dp) ≈ u(p) + u ′ (p)dp =
u(p) − kλp(1 − p)u ′ (p)dt. Expanding e−rdt as 1 − rdt
(and ignoring terms of order o(dt)), we can rewrite the
Bellman equation as follows:

u(p) = maxk∈[0,1] {r[(1 − k)s + kgp]dt + (1 − rdt)[pkλgdt + (1 − pkλdt)(u(p) − kλp(1 − p)u′ (p)dt)]}.

Now expand all the terms on the right hand side, and
collect all terms of order o(dt). Cancel the term u(p)
from both the left hand and right hand side of the equa-
tion. Next, move the term −ru(p)dt to the left hand side

of the equation. Finally, divide throughout by rdt and


by letting dt → 0, ignore o(dt)/dt. This gives us

u(p) = s + maxk∈[0,1] k[b(p, u) − c(p)]

where

c(p) = s − gp

and

b(p, u) = (pλ/r)(g − u(p) − (1 − p)u′ (p)).
r
Clearly, c(p) represents the opportunity cost of playing
R, while b(p, u) is the discounted expected benefit of
playing R. The first part of b(p, u) is the expected value
of the jump should a breakthrough occur; the second
part is the deterioration in expected payoff if no break-
through occurs.
Finally, notice that the optimal strategy always requires
that k ∈ {0, 1}, depending on whether the benefit of
playing R is greater than the opportunity cost or not.
In the latter case, when it is better to play S, the value
function is u(p) = s; in the former case, u satisfies the
first order ODE

λp(1 − p)u′ (p) + (r + λp)u(p) = (r + λ)gp.

For the solution to this ODE, see §10.5 and, in particular, §10.5.1. The solution has the form

v(p) = gp + C(1 − p)Ω(p)µ

where

Ω(p) = (1 − p)/p and µ ∶= r/λ.
Let us interpret the value function. The first term is
the expected payoff from committing to the risky arm,
while the second term is the option value of being able
to switch to the safe arm.

[SOMETHING ABOUT SMOOTH PASTING]


We know that there exists a point p∗ such that v(p) = s
for p ⩽ p∗ and by smooth pasting, we know that v ′ (p∗ ) =
0. The condition v(p∗ ) = s implies that
C = (s − gp∗ ) / ((1 − p∗ )Ω(p∗ )µ )

while the smooth pasting condition v ′ (p∗ ) = 0 implies


that (after some rearranging)

g = CΩ(p∗ )µ (µ + p∗ )/p∗ = (s − gp∗ )(µ + p∗ ) / ((1 − p∗ )p∗ ).

We can solve this to give us


p∗ ∶= µs / (µg + g − s).

Thus, the value function can be written as


(20)  v(p) = gp + (s − gp∗ ) ((1 − p)/(1 − p∗ )) (Ω(p)/Ω(p∗ ))µ

when p > p∗ and v(p) = s otherwise. We can summarise


this in the proposition below.

21 Proposition: The optimal strategy for the decision


maker is to play R exclusively if p > p∗ and S exclus-
ively for p < p∗ . The value function is given in equation
(20). ♢
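Value matching and smooth pasting at p∗ can be verified numerically from the closed form in (20); the parameter values g, s, r, λ below are illustrative assumptions, and the derivative at p∗ is taken by a finite difference.

```python
# Check v(p*) = s (value matching) and v'(p*) = 0 (smooth pasting) for the
# closed-form value function; all parameters are illustrative.
g, s, r, lam = 1.0, 0.4, 0.5, 1.0
mu = r / lam
pstar = mu * s / (mu * g + g - s)

def Omega(p):
    return (1 - p) / p

C = (s - g * pstar) / ((1 - pstar) * Omega(pstar) ** mu)

def v(p):
    return g * p + C * (1 - p) * Omega(p) ** mu

assert abs(v(pstar) - s) < 1e-9                     # value matching
eps = 1e-6
slope = (v(pstar + eps) - v(pstar)) / eps
assert abs(slope) < 1e-4                            # smooth pasting
```

Note that 0 < p∗ < s/g, so that s − gp∗ > 0 and the option-value term in (20) is positive, as it should be.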

Let us look back at some of the more interesting proper-


ties of the solution.

• The solution is convex in p

• The solution has a threshold property, and as in


the discrete time case, the risky arm is abandoned
with positive probability even when it is good
• The probability of such a mistake is increasing in s
and decreasing in r
• u(p) = gp, if s = 0, as in the discrete time case
• Here too, there is an option value to being able to
change to the safe arm

22 Exercise: Show that v(p) is convex in p. (It will suffice


to show that v ′′ ⩾ 0.) Is it the case that v is increasing in
p? ♢

10.3 The Space of Bounded Functions

Let S be an arbitrary set and let B(S, R) be the space of


all bounded functions from S to R, that we shall write
as B(S). Thus,

B(S) ∶= {f ∈ RS ∶ supx∈S ∣f(x)∣ < ∞}.

We shall show here that B(S) is a complete normed


space, ie a Banach space.

10.3.1 Defining a Norm

Note first that B(S) is a vector space, ie for all f, g ∈ B(S)


and α, β ∈ R, αf + βg ∈ B(S).

23 Exercise: Verify that B(S) is a vector space. If S = R, are


the function f(x) = ∣x∣, f(x) = ln(1 + x2 ) and f(x) = x sin( x1 )
in B(R)? If f ∈ B([0, 1]), does f have to be continuous?
Does f have to be measurable? ♢

Let f ∈ B(S). Define ∥f∥∞ ∶= supx∈S ∣f(x)∣. By definition of


B(S), ∥f∥∞ < ∞ for all f ∈ B(S). Before we check that we
indeed have a norm, we first recall the definition of a
norm.

24 Definition: Let V be a vector space. Then, the function


∥⋅∥ ∶ V → R+ is a norm if it satisfies the following: For all u, v ∈ V ,

(a) ∥u∥ = 0 if and only if u = 0.


(b) ∥λu∥ = ∣λ∣ ∥u∥ for all λ ∈ R (absolute homogeneity)
(c) ∥u + v∥ ⩽ ∥u∥ + ∥v∥ (subadditivity)

25 Exercise: Show that ∥⋅∥∞ satisfies properties (a) and (b)


above. ♢

To see that ∥⋅∥∞ is subadditive, let f, g ∈ B(S). Then, for


all x ∈ S, ∣f(x) + g(x)∣ ⩽ ∣f(x)∣ + ∣g(x)∣. (Why?) Therefore,
∥f + g∥∞ = supx∈S ∣f(x) + g(x)∣ ⩽ supx∈S ∣f(x)∣+supx∈S ∣g(x)∣ =
∥f∥∞ + ∥g∥∞ , as desired.

10.3.2 Completeness

Let (fn ) be a Cauchy sequence in B(S). We want to


show that there exists a function f ∈ B(S) such that
∥fn − f∥∞ → 0. Clearly, the only candidate for such an f
must be defined as follows:

f(x) ∶= limn→∞ fn (x) for all x ∈ S.

We first need to show that f ∈ B(S). To see this, let ε > 0,


so that by the Cauchyness of the sequence (fn ), there
exists N > 0 such that for all m, n > N, ∥fn − fm ∥∞ ⩽ ε.
Then, for all m, n ⩾ N, and for all x ∈ S, we have that

∣f(x) − fn (x)∣ = limm→∞ ∣fm (x) − fn (x)∣ ⩽ limm→∞ ∥fm − fn ∥∞ ⩽ ε.

Since this holds for all x ∈ S, we see that supx∈S ∣f(x) − fn (x)∣ =
∥f − fn ∥∞ ⩽ ε. Therefore, ∥f∥∞ = ∥(f − fn ) + fn ∥∞ ⩽
∥f − fn ∥∞ + ∥fn ∥∞ ⩽ ∥fn ∥∞ + ε. Thus, f ∈ B(S).
All that needs to be shown now is that ∥f − fn ∥∞ → 0.
Fortunately, this also follows from the fact that ∥f − fn ∥∞ ⩽
ε, since ε is arbitrary.

10.3.3 An Order Structure

It is useful to mention that there is a natural (incom-


plete) ordering on B(S). For any functions, f, g ∈ B(S),
we say that f ⩽ g if and only if f(x) ⩽ g(x) for all x ∈ S.
This is also referred to as the pointwise order.

10.4 Contraction Mappings

Let (X, d) be a metric space. We begin with a definition


for maps from X to itself.

26 Definition: A mapping f ∶ X → X is said to be a contrac-


tion if there exists a number α ∈ [0, 1) such that

d(f(x), f(y)) ⩽ αd(x, y)

for all x, y ∈ X. The mapping f is said to have modulus of


contraction α. ♢

The following is a very useful exercise.

27 Exercise: Let (X, d) be a complete metric space and let


(xn ) be a sequence in X such that for all n ⩾ 0, d(xn+1 , xn ) ⩽
αd(xn , xn−1 ) for some α ∈ [0, 1). Then, (xn ) is conver-
gent. ♢

Proof (Solution to Exercise 27). Since (X, d) is com-


plete, it suffices to prove that (xn ) is Cauchy. By defin-
ition, d(x2 , x1 ) ⩽ αd(x1 , x0 ), d(x3 , x2 ) ⩽ αd(x2 , x1 ) ⩽
α2 d(x1 , x0 ), and d(xn+1 , xn ) ⩽ αn d(x1 , x0 ). If m > n, then
d(xm , xn ) ⩽ ∑k=n+1,…,m d(xk , xk−1 ) ⩽ ∑k=n+1,…,m αk−1 d(x1 , x0 ) ⩽ ∑k⩾n+1 αk−1 d(x1 , x0 ) = (αn /(1 − α)) d(x1 , x0 ),

which proves that (xn ) is a Cauchy sequence (since αn /(1 − α) → 0 as n → ∞), and hence convergent. ∎

We now state the Banach fixed point theorem.

28 Theorem (Banach): Let (X, d) be a complete metric


space and f ∶ X → X be a contraction. Then, f has a
unique fixed point x. ♢

Proof. We first show that if f has a fixed point, it is


unique. To this end, suppose x ≠ y and both x and
y are fixed points of f. Recall that since x and y are
distinct, d(x, y) > 0. Then,

d(x, y) = d(f(x), f(y)) ⩽ αd(x, y)

which is impossible unless d(x, y) = 0. Hence, if f has


a fixed point, it is unique. Now to show that f does
have a fixed point.
Let x0 ∈ X be an arbitrary point. Define a sequence
inductively as follows:

xn+1 ∶= f(xn ), n = 0, 1, 2, . . . .

Then, for n ⩾ 1, we have

d(xn+1 , xn ) = d(f(xn ), f(xn−1 )) ⩽ αd(xn , xn−1 ).

By Exercise 27, we see that the sequence (xn ) con-


verges to x ∈ X. But notice that f is continuous. There-
fore, the sequence (f(xn−1 )) converges to f(x). But
f(xn−1 ) = xn , so that f(x) = x, which completes the
proof. ∎

Notice that computing the fixed point is very simple (in


principle). All we need to do is pick some point x0 ∈ X
and iterate the contraction f, so that xn ∶= f(xn−1 ) =
f(n) (x0 ), where f(n) (x0 ) ∶= f ○ f(n−1) (x0 ) ∶= f(f(n−1) )(x0 ) is
defined inductively, with f(1) (x0 ) ∶= f(x0 ). The sequence
(xn ) is known as the orbit of the point x0 under the
action of the mapping f.
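The orbit construction is easy to see in action. A standard illustration (not from the notes) is f(x) = cos x, which maps [cos 1, 1] into itself with ∣f′ (x)∣ = ∣sin x∣ ⩽ sin 1 < 1 there, so it is a contraction on that interval.

```python
import math

# Iterating the contraction f(x) = cos(x): the orbit of any starting point
# converges to the unique fixed point x = cos(x).
x = 0.0
for _ in range(200):
    x = math.cos(x)

assert abs(x - math.cos(x)) < 1e-12   # numerically a fixed point
```

After 200 iterations the error is of order (sin 1)^200, far below machine precision; the fixed point is approximately 0.7390851332.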
Although contraction mappings are extremely useful
(because they work in such generality), it is often hard
to verify that a mapping is a contraction. Nevertheless,
a natural sufficient condition for a mapping to be a con-
traction was identified by Blackwell. We are now ready
to state Blackwell’s Theorem, which is fundamental to
the theory of discounted dynamic programming.

29 Theorem (Blackwell’s Theorem): Let S be a non-empty


set, B(S) the Banach space of bounded functions with
the sup norm, and let L ⊂ B(S) be a complete subspace
of B(S) that contains all the constant functions. Let
Ψ ∶ L → L be an operator that satisfies:

(i) Ψ is monotone in the sense that f ⩽ g implies Ψ(f) ⩽


Ψ(g).

(ii) There exists some constant α ∈ [0, 1) such that for each constant c ⩾ 0, we have Ψ(f + c1) ⩽ Ψ(f) + αc1, where 1 is the constant function that maps each x ∈ S to 1.

Then, Ψ has a unique fixed point. ♢

Proof. The theorem is proved by showing that the


mapping Ψ is a contraction with modulus α. Toward
this end, let f, g ∈ L and let c ∶= ∥f − g∥∞ . Then, we have
both f ⩽ g + c1 and g ⩽ f + c1. From properties (i) and
(ii) above, it follows that Ψ(f) ⩽ Ψ(g) + αc1 and Ψ(g) ⩽
Ψ(f) + αc1. In other words, ∣Ψ(f)(x) − Ψ(g)(x)∣ ⩽ αc for
each x ∈ S. Thus,

∥Ψ(f) − Ψ(g)∥∞ = sup ∣Ψ(f)(x) − Ψ(g)(x)∣ ⩽ αc = α∥f − g∥∞


x∈S

which proves that Ψ is a contraction with modulus α,


as required. ∎

In many instances, the set S has a natural topological


structure. For instance, S ⊂ Rn etc. In such an instance,
we may like to take advantage of this structure. This is
done by considering the space of continuous bounded
functions on S, Cb (S), which is a closed subspace of
B(S), and hence is complete. Therefore, letting L ∶= Cb (S) in the theorem above implies that the fixed point lies in Cb (S). By suitably varying L, other properties of the fixed point, such as monotonicity in S (assuming S has an order structure) can also be deduced. The biggest difficulty here is ensuring that the operator Ψ maps L into itself.

10.5 Ordinary Differential Equations

Below is a quick review of a class of ordinary differ-


ential equations that are of interest to us. A first order

differential equation is an equation governing the evolu-


tion of y(t) as a function of y and t. Here y, t ∈ R. More
specifically, the first order ODE is

y ′ = f(t, y).

There are three common questions that are immediately


posed of such equations.
(a) Does the ODE have a solution, when y(t0 ) = y0 ?
(b) If so, is there an explicit solution?
(c) If not, can we nevertheless say something about
the qualitative properties of the solution?
The first question is usually settled by (once again) the
contraction mapping theorem. We will consider the
second question in the specific case of the first-order
ODE that can be written in the normal linear form

y ′ + p(t)y = q(t).

Such an ODE can be solved using an integrating factor


µ(t) ∶= e∫ p(t) dt . This is done by observing that

(µ(t)y)′ = µ(t)y′ + µ′ (t)y,

which follows from the product rule, together with the chain rule applied to µ(t) = e∫ p(t) dt , so that µ′ (t) = p(t)µ(t).
To solve the ODE, we multiply both sides of the ODE
by µ(t), so that

µ(t)q(t) = µ(t)[y′ + p(t)y] = (µ(t)y)′

which implies that

µ(t)y = ∫ µ(t)q(t) dt + C

where C is a constant of integration. Dividing both


sides by µ(t) gives us the solution to the linear ODE.

30 Example: Consider the linear ODE y′ − 2ty = t. Then,

µ(t) = e∫ −2t dt = e−t² .

Also,

∫ te−t² dt = −(1/2)e−t² ,

which means that the solution to the ODE is

y = Cet² − 1/2. ♢
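The Example’s solution can be cross-checked against a direct numerical integration; the Runge–Kutta scheme and the initial condition y(0) = 1 (which pins down C = 3/2) are illustrative choices.

```python
import math

# Integrate y' = 2ty + t (the Example's ODE rearranged) by classical RK4
# from y(0) = 1 and compare with the closed form y = (3/2) e^{t^2} - 1/2.
def f(t, y):
    return t + 2 * t * y

n_steps = 10_000
h = 1.0 / n_steps
y = 1.0
for i in range(n_steps):
    t = i * h
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

exact = 1.5 * math.exp(1.0) - 0.5       # closed form at t = 1
assert abs(y - exact) < 1e-8
```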

31 Exercise: Solve the ODE y′ = 4y/t using the method outlined above. ♢

10.5.1 A More Involved Example

We are interested in solving the ODE

y′ (t) + [ (r + Nλt) / (Nλt(1 − t)) ] y(t) = (r + Nλ)g / (Nλ(1 − t)).

This is an ODE in normal linear form where

p(t) ∶= (r + Nλt) / (Nλt(1 − t))

and

q(t) ∶= (r + Nλ)g / (Nλ(1 − t)).

We shall proceed via a number of simple steps.

Step 1. Evaluating ∫ p(t) dt.

∫ p(t) dt = ∫ [ r/(Nλt(1 − t)) + 1/(1 − t) ] dt
          = ∫ [ (r/Nλ)(1/t + 1/(1 − t)) + 1/(1 − t) ] dt
          = (r/Nλ)[ln t − ln(1 − t)] − ln(1 − t)

Step 2. Evaluating e∫ p(t) dt .

From Step 1, we see that

e∫ p(t) dt = [ t/(1 − t) ]r/Nλ (1/(1 − t)) =∶ µ(t).

Step 3. Evaluating ∫ µ(t)q(t) dt.

Therefore,

∫ µ(t)q(t) dt = ∫ µ(t) (r + Nλ)g/(Nλ(1 − t)) dt = ((r + Nλ)g/Nλ) ∫ [t/(1 − t)]r/Nλ (1/(1 − t)2 ) dt.

Now, let

Ω(t) ∶= (1 − t)/t,

so we can rewrite the integral as

((r + Nλ)g/Nλ) ∫ [t/(1 − t)]r/Nλ (t2 /(1 − t)2 )(1/t2 ) dt = ((r + Nλ)g/Nλ) ∫ (1/(Ω(t)r/Nλ+2 t2 )) dt.

Now make the substitution Ω(t) = z, noting that −(1/t2 ) dt = dz. Thus, the integral in question becomes

((r + Nλ)g/Nλ) ∫ (1/(Ω(t)r/Nλ+2 t2 )) dt = −((r + Nλ)g/Nλ) ∫ (1/zr/Nλ+2 ) dz
= ((r + Nλ)g/Nλ)(Nλ/(r + Nλ))(1/Ω(t)r/Nλ+1 )
= g/Ω(t)r/Nλ+1
= ∫ µ(t)q(t) dt

Step 4. Evaluating (∫ µ(t)q(t) dt)/µ(t).

From the steps above, we see that

(∫ µ(t)q(t) dt)/µ(t) = (g/Ω(t)r/Nλ+1 ) ⋅ Ω(t)r/Nλ (1 − t) = g(1 − t)/Ω(t) = gt.

Step 5. Solving the ODE.


Thus, the solution to the ODE is
y(t) = gt + C(1 − t)Ω(t)r/Nλ ,

where C is a constant of integration determined by


boundary conditions.
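As a sanity check, one can verify numerically that y(t) = gt + C(1 − t)Ω(t)r/Nλ solves the ODE for any constant C; the parameter values below are illustrative assumptions, and the derivative is taken by central differences.

```python
# Residual check that y(t) = g*t + C*(1-t)*Omega(t)^(r/(N*lam)) satisfies
# y' + p(t) y = q(t), with p and q as defined in this subsection.
r, N, lam, g, C = 0.5, 1.0, 1.0, 1.0, 0.3
a = r / (N * lam)

def Omega(t):
    return (1 - t) / t

def y(t):
    return g * t + C * (1 - t) * Omega(t) ** a

def p(t):
    return (r + N * lam * t) / (N * lam * t * (1 - t))

def q(t):
    return (r + N * lam) * g / (N * lam * (1 - t))

h = 1e-6
for t in [0.3, 0.5, 0.8]:
    yprime = (y(t + h) - y(t - h)) / (2 * h)   # central difference
    assert abs(yprime + p(t) * y(t) - q(t)) < 1e-5
```

The residual vanishes for every choice of C because the family above is exactly the general solution produced by the integrating-factor method.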

11 Metric Spaces

11.1 Basic Definitions

A metric space (X, d) is a set X and a function d ∶ X × X →


R such that

• d(x, y) = d(y, x) for all x, y ∈ X (symmetry),


• d(x, y) ⩾ 0 for all x, y ∈ X (positive definiteness) and
d(x, y) = 0 if and only if x = y, and

• d(x, z) ⩽ d(x, y) + d(y, z) (triangle property).


The function d is referred to as the metric on the space
X. Here are some examples of metric spaces.

Euclidean Space: X ∶= RN and d2 (x, y) ∶= ∥x − y∥2 = ( ∑i=1,…,N (xi − yi )2 )1/2 .

RN with another metric: X ∶= RN and dp (x, y) ∶= ∥x − y∥p = ( ∑i=1,…,N ∣xi − yi ∣p )1/p , where 1 ⩽ p < ∞.

With yet another metric: X ∶= RN and d∞ (x, y) ∶= ∥x − y∥∞ = supi ∣xi − yi ∣.

A nonexample: X ∶= RN and dp (x, y) ∶= ( ∑i=1,…,N ∣xi − yi ∣p )1/p , where 0 < p < 1. (Show that dp does not satisfy the triangle property. In particular, let x = (1, 0, . . . , 0) and y = (0, . . . , 0, 1), so that dp (0, x) + dp (0, y) < dp (x, y).)
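The triangle-inequality failure in the nonexample is concrete enough to compute, here in R2 with p = 1/2:

```python
# d_{1/2} in R^2: d(u, v) = ( sum |u_i - v_i|^(1/2) )^2. With x = (1, 0)
# and y = (0, 1), d(0, x) + d(0, y) = 2 < 4 = d(x, y).
def d_half(u, v):
    return sum(abs(a - b) ** 0.5 for a, b in zip(u, v)) ** 2

x, y, o = (1, 0), (0, 1), (0, 0)
assert d_half(o, x) == 1.0 and d_half(o, y) == 1.0
assert d_half(x, y) == 4.0
assert d_half(o, x) + d_half(o, y) < d_half(x, y)   # triangle property fails
```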

Discrete Metric Space: X is any set, and d ∈ RX×X is given by d(x, y) ∶= 1 if x ≠ y, and d(x, y) ∶= 0 otherwise.

Sequence Space I: `p (R) is the space of all sequences in R wherein x ∈ `p if and only if ∑i⩾0 ∣xi ∣p < ∞, where 1 ⩽ p < ∞. This is a vector space with norm ∥x∥p ∶= ( ∑i⩾0 ∣xi ∣p )1/p . This norm is referred to as the `p -norm (read “little ell p” norm). The induced metric is dp (x, y) ∶= ∥x − y∥p .

Sequence Space II: `∞ (R) is the space of all sequences in R wherein x ∈ `∞ if and only if supi ∣xi ∣ < ∞. This is a vector space with norm ∥x∥∞ ∶= supi ∣xi ∣. The norm is referred to as the “sup” norm. The induced metric is d∞ (x, y) ∶= ∥x − y∥∞ .

Function Space I: Let X be any set. The space of all bounded real functions is B(X), which is a vector space. It has norm ∥f∥∞ = supx∈X ∣f(x)∣. This is referred to as the “sup” norm. The induced metric is d∞ (f, g) ∶= ∥f − g∥∞ .

Function Space II: Let X be any metric space. The space of all bounded, continuous real functions is Cb (X), which is a vector space. It has norm ∥f∥∞ = supx∈X ∣f(x)∣. This is referred to as the “sup” norm. Once again, the induced metric is d∞ (f, g) ∶= ∥f − g∥∞ .

Function Space III: Let X ∶= [0, 1], and C[0, 1] the space of all continuous real functions on [0, 1]. For each p ∈ [1, ∞), there is a norm on C[0, 1] wherein ∥f∥p ∶= ( ∫01 ∣f(t)∣p dt )1/p . The induced metric is dp (f, g) ∶= ∥f − g∥p .

An open ball around the point x ∈ X of radius r > 0 is


the set B(x, r) ∶= {y ∈ X ∶ d(x, y) < r}. The closed ball is
the set B[x, r] ∶= {y ∈ X ∶ d(x, y) ⩽ r}.

1 Example: Let X ∶= RN
+ and endow it with the discrete
metric. Let x ∈ RN + . Then, for r ⩽ 1, B(x, r) = {x}, while for
r > 1, B(x, r) = X. (This is clearly true for any set X.) ♢

2 Exercise: Draw the open ball B(0, 1) for the following


metric spaces: (R2 , d1 ), (R2 , d2 ), (R2 , d4 ), (R2 , d∞ ) and (R2 , d1/2 ). ♢

Let A be a subset of X. A point x ∈ X is said to be in the


interior of A if there exists rx > 0 (so the rx can depend
on x) such that B(x, rx ) ⊂ A. A set O ⊂ X is open if for each x ∈ O, there exists rx > 0 (so the rx depends on x) such that B(x, rx ) ⊂ O, that is, if each point x of O is in the interior of O.

3 Exercise: Let (Oλ )λ∈A be a collection of open sets, where


A is an index set. Show that ⋃λ∈A Oλ is open. Let O1 and
O2 be open. Show that O1 ∩ O2 is also open. ♢

4 Exercise: Show that On ∶= (−1/n, 1/n) is an open subset


of R. What is ⋂n On ? Is it open? ♢

5 Exercise: Show that ∅ and X are open. ♢

Clearly, r < r ′ implies B(x, r) ⊂ B(x, r ′ ). A warning


though. In general metric space, “balls” don’t look like
the balls in Euclidean space. Here is a useful exercise.

6 Exercise: Does there exist a metric space X with x, y ∈ X


and r ′ > r > 0 such that B(y, r ′ ) ⊂ B(x, r)? ♢

A set C ⊂ X is closed if the complement of C, given by


Cc ∶= X ∖ C, is open. Another characterisation of closed
sets is in terms of open balls around a point. A point
x ∈ X is a contact point of the set C if for every r > 0,
B(x, r) ∩ C ≠ ∅. The set of all contact points of the set C is written as C̄, and is referred to as the closure of C. Clearly, C ⊂ C̄. Thus, a set is closed if and only if C = C̄.
7 Theorem: The closure operator has the following properties.
(a) C ⊂ D implies C̄ ⊂ D̄;
(b) the closure of C̄ is C̄ itself;
(c) the closure of C ∪ D is C̄ ∪ D̄;
(d) ∅̄ = ∅ and X̄ = X. ♢

8 Exercise: Prove the theorem above. ♢

9 Exercise: For a set A, let FA ∶= {C ⊂ X ∶ C ⊃ A, C = C̄}. Show that

Ā = ⋂C∈FA C. ♢

10 Exercise: Show Fn ∶= [−1 + 1/n, 1 − 1/n] is closed in R.


What is ⋃n Fn ? ♢

11 Exercise: Let (Fα )α∈A be a collection of closed sets,


where A is an index set. Show that ⋂α Fα is closed. ♢

12 Exercise: Show that B(x, r) ⊂ B[x, r]. Is the inclusion


always strict? (Hint: Consider the discrete metric.) ♢

A sequence in the set X is a function f ∶ N → X. The set


of all possible sequences is written as X∞ (rather than
XN ). A sequence (xn ) ∈ X∞ converges to a point x ∈ X if
for each ε > 0, there exists Nε > 0 such that for all n > Nε ,
xn ∈ B(x, ε). The point x is known as the limit of the
sequence.

13 Exercise: Show that the limit of a sequence must be


unique. ♢

14 Theorem: A point x ∈ X is a contact point of a set C


if and only if there exists a sequence (xn ) ∈ C∞ that
converges to x. ♢

15 Exercise: Prove the theorem above. Show that ∅ and X are closed. ♢

Notice that the sequence could be the constant sequence.


This is something you should keep in mind.

11.2 Completeness

A sequence (xn ) ∈ X∞ is Cauchy if for every ε, there ex-


ists N (that can depend on ε) such that for all n, m > N,
d(xn , xm ) < ε. Notice that the Cauchyness of a sequence
depends only the terms in the sequence and the metric
in question. Thus, it is a property of a metric space.

16 Exercise: Let (xn ) ∈ X∞ be a convergent sequence. Show


that it is Cauchy. ♢

Thus, every convergent sequence is Cauchy. Is the con-


verse true? Let’s see this by example.

17 Example: Let X ∶= (0, 1], and let xn ∶= 2−n . You should


show that the sequence (xn ) is Cauchy. But it is not convergent, though through no fault of its own. This is because the obvious “limit point” is x = 0. Unfortunately, 0
is not in the space. Nevertheless, in the space [0, 1], the
sequence does converge. ♢

Thus, all Cauchy sequences are not convergent. More


importantly, we have established that Cauchyness is
a property of the sequence, whereas convergence is
a property of both the sequence and the underlying
space. A class of spaces that are useful to study are
ones where all Cauchy sequences are convergent. A
metric space is complete if every Cauchy sequence is
convergent.

18 Exercise: Show that RN is complete. Let X be the dis-


crete metric space. Show that X is complete. (What does
a Cauchy sequence look like in X?) ♢

A useful property of complete metric spaces is that


closed subsets of complete metric spaces are also com-
plete. Actually more is true, but first a definition. Sup-
pose Y ⊂ X. Then, the function d∣Y ∶ Y × Y → R+ , which
is the restriction of d to Y is also a metric on Y . Then,
(Y , d∣Y ) is a metric space in its own right. We shall refer
to it as the metric subspace of X with the metric d (and
thereby omit mentioning d∣Y ).

19 Theorem: Let X be a metric space and Y a subspace of


X.
(i) If X is complete and if Y is closed in X, then Y is com-
plete.
(ii) If Y is complete, then it is closed in X. ♢

As always, you should try and prove this.

11.3 Product Spaces

For j = 1, . . . , n, let (Xj , dj ) be metric spaces. The product


metric space is X ∶= X1 × ⋅ ⋅ ⋅ × Xn , with the product metric

D1 , defined as

D1 (x, y) ∶= ∑j=1,…,n dj (xj , yj ),

where x = (x1 , . . . , xn ) ∈ X. We could also have defined the product metric D∞ as

D∞ (x, y) ∶= maxj=1,…,n dj (xj , yj ).

The two metrics are equivalent in the sense that they


define the same open sets. More precisely, consider a
set A ⊂ X. Then, A is open with respect to the metric
D1 if and only if it is open with the metric D∞ . This is
because (fill in the details)
D∞ (x, y) ⩽ D1 (x, y) ⩽ nD∞ (x, y).
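The sandwich inequality is easy to verify numerically. A quick sketch, with each dj taken to be the usual absolute-value metric on R (an assumption made for illustration):

```python
import random

def d1(x, y):
    # product metric D1: sum of the coordinate distances
    return sum(abs(a - b) for a, b in zip(x, y))

def dinf(x, y):
    # product metric D_infinity: largest coordinate distance
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
n = 5
sandwich_holds = True
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    sandwich_holds &= dinf(x, y) <= d1(x, y) <= n * dinf(x, y)
```

The sum of n nonnegative terms is at least their maximum and at most n times it, which is all the inequality says.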

11.4 Continuous Functions

Before we talk about continuity, an exercise.


20 Exercise: Prove de Morgan’s laws:

    ( ⋃_{i∈I} Ai )^c = ⋂_{i∈I} Ai^c   and   ( ⋂_{i∈I} Ai )^c = ⋃_{i∈I} Ai^c ,

where (Ai )i∈I is any collection of subsets of X. ♢

The intuition behind continuity is the same as in the


real case. Controlling the distance between points in the
domain allows us to control the distance between their
images in the codomain. Let us make this precise. Here,
(X, dX ) and (Y , dY ) are metric spaces.
21 Definition (Continuity–I): A function f ∶ X → Y is con-
tinuous at x ∈ X if for each ε > 0, there exists δ > 0 such
that y ∈ B(x, δ) implies f(y) ∈ B(f(x), ε). The function
f is continuous everywhere (or just continuous) if it is
continuous at each x ∈ X. ♢


22 Definition (Continuity–II): f ∶ X → Y is continuous if for


each open set O ⊂ Y , f−1 (O) is open in X. ♢

Here, f−1 (O) ∶= {x ∈ X ∶ f(x) ∈ O}. This leads us to yet


another definition.
23 Definition (Continuity–III): f ∶ X → Y is continuous if for
each closed set C ⊂ Y , f−1 (C) is closed in X. ♢

The following theorem is fundamental.


24 Theorem: Let f ∶ X → Y be a function. Then, the follow-
ing are equivalent.
(i) Continuity I.
(ii) Continuity II.
(iii) Continuity III. ♢

Proof. That (ii) and (iii) are equivalent is left as an ex-


ercise. We shall show that (i) implies (ii). Toward this
end, let O be open in Y and let x ∈ f−1 (O). Since f(x) ∈
O, there exists an ε such that B(f(x), ε) ⊂ O. Then,
by Continuity I, there exists a δ such that y ∈ B(x, δ)
implies f(y) ∈ B(f(x), ε). In other words, f(B(x, δ)) ⊂
B(f(x), ε), ie, B(x, δ) ⊂ f−1 (B(f(x), ε)) ⊂ f−1 (O). Thus,
x is an interior point of f−1 (O). Since x is arbitrary,
f−1 (O) is open, so that Continuity II is satisfied.

Let us assume that (ii) holds and fix x ∈ X. Then, for


each ε > 0, f−1 (B(f(x), ε)) is open in X. Thus, there
exists a δ such that B(x, δ) ⊂ f−1 (B(f(x), ε)). Clearly,
f(B(x, δ)) ⊂ B(f(x), ε), so that Continuity I is satis-
fied. ∎

25 Exercise: Show that Continuity II and Continuity III


are equivalent. ♢

Here is an important characterisation of real continuous


functions.


26 Theorem: Let f ∶ X → R. Then, the following are equival-


ent.
(i) f is continuous.
(ii) {x ∶ f(x) ⩾ c} and {x ∶ f(x) ⩽ c} are closed for each
c ∈ R.
(iii) {x ∶ f(x) > c} and {x ∶ f(x) < c} are open for each
c ∈ R. ♢

A caveat to keep in mind: a continuous function need not map Cauchy sequences to Cauchy sequences (consider x ↦ 1/x on (0, 1] applied to the sequence (1/n)); a uniformly continuous function, however, always does. We end with some useful properties of continuous functions. We leave the proofs as exercises.

27 Exercise: Let f ∶ X → Y and g ∶ Y → Z be continuous


functions. Show that the function h ∶ X → Z defined as
h ∶= g ○ f is also continuous. ♢

28 Exercise: Suppose f1 , f2 ∶ X → R are continuous. Show


that f1 + f2 , ∣f1 ∣, −f1 , min{f1 , f2 } and 1/f1 are continuous
(where we assume that f1 is never zero). ♢

11.5 Compactness

Finite spaces have some very desirable properties. For


instance, a function defined on a finite space always has
a maximiser. Compactness can be thought of as a finite-
ness structure on infinite spaces. If X is a finite space
(with, for instance, the discrete metric), then for any
ε, there exist finitely many points x0 , . . . , xn such that
X = ⋃i B(xi , ε). Thus, all the analysis on the space X is
reduced to the analysis of finitely many open balls of
arbitrarily small radius. If ε is sufficiently small, and the
question at hand is nice (ie, the functions are continu-
ous and the space is complete), then the problem can


be solved by first solving it locally (ie, for some ε) and


then looking over finitely many of these ε-balls. This
reduces the complexity of the problem at hand, and
spaces with this property are highly treasured. In the
discussion above, we seem to have isolated two proper-
ties of compact spaces, the first being completeness. The
second is defined below. As we shall see below, these
two properties completely characterise compact metric
spaces.

29 Definition: A metric space X is totally bounded if for


each ε > 0, there exists a finite subset {x0 , . . . , xn } ⊂ X such
that X = ⋃i B(xi , ε). ♢

To formally define compactness, we need a few more


definitions.

30 Definition: An open cover of a set S ⊂ X is a collection


of open sets O such that S ⊂ ⋃O∈O O. ♢

31 Definition: A metric space X is compact if every open


cover of X has a finite subset that also covers X. The fi-
nite subset is known as a subcover. ♢

A subset S of a metric space is compact if every open


cover of S has a finite subcover. Notice that we can use
the original definition if we treat S as a metric space in
its own right (ie, as a subspace of X). The two spaces
are distinct, but only if we are pedantic. Otherwise, we
shall not distinguish between the two. When X = R,
compactness has a very convenient characterisation.
First a definition.

32 Definition: A set K ⊂ R is bounded if there exists a finite


a ∈ [0, ∞) such that K ⊂ (−a, a). ♢


33 Theorem (Heine-Borel): A set K ⊂ R is compact if and


only if K is closed and bounded. ♢

34 Definition: A collection of sets F has the finite intersec-


tion property if for all F1 , . . . , Fn ∈ F , F1 ∩ ⋅ ⋅ ⋅ ∩ Fn ≠ ∅. ♢

We shall now see that compactness ensures that every collection F of closed subsets of X with the finite intersection property has a nonempty intersection, ie, ⋂F∈F F ≠ ∅. Indeed, this is a defining characteristic of compact spaces.

35 Theorem: Let X be a metric space. Then, the following


are equivalent.
(i) X is compact.
(ii) Every collection of closed subsets F of X that has
the finite intersection property has a nonempty intersec-
tion. ♢

36 Exercise: Prove the theorem above. (Use de Morgan’s


laws.) ♢

Compact subsets of R have the nice property that their


glb and lub are in the set, ie, they have a greatest and
least element. Now for the property alluded to above.

37 Theorem (Weierstraß): Let X be a compact metric


space and f ∶ X → R be continuous. Then, f attains a
maximum and minimum on X. ♢

Proof. (First proof.) We shall actually prove that if


X is compact and f continuous, then f(X) is compact.
Let O be an open cover of f(X) ⊂ R. Then, since f is
continuous, f−1 (O) is open for each O ∈ O. It is easy
to see that (f−1 (O))O∈O is an open cover of X. (Why?)


But X is compact, so there exist O1 , . . . , On ∈ O such


that ⋃i f−1 (Oi ) covers X. But then, ⋃i Oi covers f(X).
(Why?) Thus, f(X) has a finite subcover, and so is
compact. By the Heine-Borel theorem, the conclusion
follows.
(Second proof.) Let Ax ∶= {y ∶ f(x) > f(y)}, which is open for each x ∈ X. If there is no maximiser, then for each y ∈ X there exists x ∈ X such that y ∈ Ax . Thus, X = ⋃x∈X Ax . But X is compact, so there is a finite subcover Ax1 , . . . , Axn of X. Without loss of generality, let us assume f(x1 ) ⩾ . . . ⩾ f(xn ). Then x1 ∈ Axi for some i, ie, f(xi ) > f(x1 ), a contradiction.

(Third Proof.) Let Cx ∶= {y ∶ f(x) ⩽ f(y)}, which is closed for each x ∈ X, since f is continuous. Let F be the collection of closed sets (Cx )x∈X . Given any Cx1 , . . . , Cxn ∈ F , relabel so that f(x1 ) ⩾ . . . ⩾ f(xn ); then x1 ∈ ⋂_{i=1}^{n} Cxi . Thus, F has the fip. Since X is compact, it follows that ⋂x∈X Cx ≠ ∅, so let x∗ ∈ ⋂x∈X Cx . But this implies that x∗ ∈ Cx for all x ∈ X, ie, f(x∗ ) ⩾ f(x) for all x ∈ X. Thus, x∗ is a maximiser. ∎
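The theorem is easy to see in action numerically. The sketch below (with a grid search standing in for the theorem's guarantee, and the functions being illustrative choices): a continuous function on the compact interval [0, 1] attains its maximum, while on the non-compact (0, 1) the supremum of g(x) = x is merely approached.

```python
# Weierstrass on a compact domain: f(x) = x(1 - x) on [0, 1].
f = lambda x: x * (1 - x)

m = 10 ** 5
grid = [i / m for i in range(m + 1)]          # fine grid over [0, 1]
best = max(grid, key=f)                       # approximate maximiser
max_val = f(best)                             # attained maximum: 1/4 at x = 1/2

# Contrast: g(x) = x on the NON-compact (0, 1) has sup 1 but no maximiser.
approach = [1 - 1 / n for n in (10, 100, 1000)]   # points marching towards 1
sup_gap = 1 - max(approach)                   # always strictly positive
```

On the compact domain the maximum is a value of f at a point of the space; on (0, 1) every candidate leaves a positive gap to the supremum.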

38 Definition: A metric space X is sequentially compact if


every sequence in X has a convergent subsequence. ♢

We end with a characterisation of compactness in metric


spaces.

39 Theorem: The following are equivalent for a metric


space X.
(i) X is compact.
(ii) X is sequentially compact.
(iii) X is complete and totally bounded. ♢

40 Theorem: Let X be a metric space and Y a subspace of


X.


(i) If X is compact and Y is closed, then Y is compact.


(ii) If Y is compact, then it is closed in X. ♢

41 Exercise: Prove the theorems above. ♢

42 Example: We end with an example showing that the Heine-Borel theorem fails in infinite dimensional spaces. Consider C[0, 1], the Banach space of continuous real functions on [0, 1] with the sup norm. Let f ∶ C[0, 1] → R be defined as

    f(x) ∶= ∫_0^{1/2} x(t) dt − ∫_{1/2}^1 x(t) dt.

It is easy to see that f is continuous, since ∣f(x)∣ ⩽ ∥x∥ for each x ∈ C[0, 1], and linear, since f(x + αy) = f(x) + αf(y) for all x, y ∈ C[0, 1] and α ∈ R. Over the closed unit ball {x ∈ C[0, 1] ∶ ∥x∥ ⩽ 1},

    sup_{∥x∥⩽1} f(x) = 1.

(Can you show this?) But no continuous function in the unit ball achieves this supremum, even though the ball is closed and bounded, showing that the Heine-Borel theorem fails. (There is, however, a discontinuous but measurable function that achieves this supremum. Such a measurable function is an element of Lp [0, 1], for some 1 ⩽ p ⩽ ∞.) ♢
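This can be seen numerically. The sketch below (illustrative; the ramp functions xₙ are my own choice, not from the text) evaluates f at continuous functions of sup-norm 1 that equal +1 to the left of 1/2 and −1 to the right, joined linearly: f(xₙ) = 1 − 1/n climbs towards the supremum 1 without ever reaching it.

```python
def ramp(n):
    """Continuous x_n with sup-norm 1: +1 on [0, 1/2 - 1/n], -1 on
    [1/2 + 1/n, 1], and linear in between."""
    def x(t):
        if t <= 0.5 - 1 / n:
            return 1.0
        if t >= 0.5 + 1 / n:
            return -1.0
        return (0.5 - t) * n          # linear bridge through (1/2, 0)
    return x

def F(x, m=20000):
    """f(x) = integral over [0, 1/2] minus integral over [1/2, 1],
    computed with the midpoint rule."""
    h = 1 / m
    left = sum(x((i + 0.5) * h) for i in range(m // 2)) * h
    right = sum(x((i + 0.5) * h) for i in range(m // 2, m)) * h
    return left - right

vals = [F(ramp(n)) for n in (10, 100, 1000)]  # 1 - 1/n, up to quadrature error
```

Each xₙ is continuous and in the unit ball, yet f(xₙ) = 1 − 1/n < 1: the supremum is only attained by the discontinuous sign-like limit.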



12 Envelope Theorems

12.1 Motivation

Consider the set of parametrised optimisation problems

sup{f(x, t) ∶ x ∈ X}

where X is an arbitrary choice set, T is a set of parameters, and f ∶ X × T → R is the objective. If one is only interested in the maximum value for each t ∈ T , we may consider the value function

(1) V (t) = sup{f(x, t) ∶ x ∈ X}


(2) X∗ (t) = {x ∈ X ∶ f(x, t) = V (t)}

where X∗ (t) is the set of maximisers of f(x, t) at t. Sup-


pose now, for simplicity, that T = [0, 1]. An envelope
theorem tells us the rate at which the value function var-
ies with t, ie it provides a formula for V ′ (t). Envelope
Theorems play an integral part in economic analysis,
and make appearances in, for instance, consumer theory,
the theory of choice under uncertainty, and mechanism
design. Below we shall provide a simple, but extremely
elegant and powerful approach to envelope theorems, in
the fashion of Milgrom and Segal (2002).

12.2 Results

In this section, we shall prove two theorems, assuming throughout that T = [0, 1]. The first provides a formula


for the derivative of V at each t ∈ [0, 1], provided cer-


tain conditions hold. The second provides sufficient
conditions for us to integrate the value function over a
subinterval. We will also assume that for each t, X∗ (t)
is nonempty. This typically follows in economic models
from other assumptions.
3 Theorem: Take t ∈ [0, 1] and x∗ ∈ X∗ (t), and suppose
that ft (x∗ , t) exists. If t > 0 and V is left-hand differ-
entiable at t, then V ′ (t−) ⩽ ft (x∗ , t). If t < 1 and V is
right-hand differentiable at t, then V ′ (t+) ⩾ ft (x∗ , t). If t ∈
(0, 1) and V is differentiable at t, then V ′ (t) = ft (x∗ , t). ♢

Proof. Using (1) and (2), it is easy to see that for any
t ′ ∈ [0, 1], it must be that

f(x∗ , t ′ ) − f(x∗ , t) ⩽ V (t ′ ) − V (t)

Suppose t′ ∈ (t, 1). Then,

    ft (x∗ , t) = lim_{t′↓t} [f(x∗ , t′ ) − f(x∗ , t)] / (t′ − t)
               ⩽ lim_{t′↓t} [V (t′ ) − V (t)] / (t′ − t)
               = V ′ (t+).

On the other hand, if t′ ∈ (0, t), we have

    ft (x∗ , t) = lim_{t′↑t} [f(x∗ , t) − f(x∗ , t′ )] / (t − t′ )
               ⩾ lim_{t′↑t} [V (t) − V (t′ )] / (t − t′ )
               = V ′ (t−).

Finally, if V is differentiable at t, the envelope formula


V ′ (t) = ft (x∗ , t) obtains. ∎
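A numerical check of the envelope formula, using the illustrative objective f(x, t) = 2tx − x² on X = [0, 1] (my own choice, not from the text): here x∗(t) = t, V(t) = t², and ft(x, t) = 2x, so the formula predicts V′(t) = 2x∗(t) = 2t.

```python
f = lambda x, t: 2 * t * x - x ** 2

m = 10 ** 5
grid = [i / m for i in range(m + 1)]          # grid over X = [0, 1]

def V(t):
    return max(f(x, t) for x in grid)         # value function, cf. (1)

def xstar(t):
    return max(grid, key=lambda x: f(x, t))   # maximiser, cf. (2)

t = 0.3
h = 1e-4
V_prime = (V(t + h) - V(t - h)) / (2 * h)     # numerical derivative of V at t
envelope = 2 * xstar(t)                       # f_t evaluated at x*(t)
```

The two numbers agree at 2t = 0.6: differentiating the value function is the same as differentiating the objective in t at the fixed maximiser.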

Notice that this powerful version of the envelope the-


orem requires no assumptions on the choice set X. In-
stead, for each x ∈ X, we think of f(x, ⋅) as a function of t.


This gives us a family of functions {f(x, ⋅) ∶ x ∈ X} to con-


sider, where X is merely an index set. We will see next
that this is a very important point of view. Notice that
the theorem above is somewhat unsatisfactory, in the
sense that it assumes V is differentiable to provide the
envelope formula. We will now provide some simple
sufficient conditions that ensure that V is differentiable
almost everywhere, and a useful integral version of the
envelope condition holds.
First some preliminaries. A function g ∶ [0, 1] → R is Lipschitz if there exists M > 0 such that ∣g(t) − g(t′ )∣ ⩽ M ∣t − t′ ∣ for all t, t′ ∈ [0, 1]. In this case, we shall say that g is Lipschitz of rank M. Intuitively, a function is Lipschitz if its graph is never too steep. Lipschitz functions are immensely useful, since they are more rigid than arbitrary continuous functions. This is reflected in the following useful theorem of Rademacher:

    Every Lipschitz function g ∶ [0, 1] → R is differentiable almost everywhere, and is the integral of its derivative.

4 Exercise: Show that every Lipschitz function is absolutely continuous.1 Show that an absolutely continuous function is uniformly continuous. Show that a uniformly continuous function is continuous. Show that if a function is Lipschitz of rank M, it is also Lipschitz of rank N, whenever N > M. ♢

1 This is a standard result, but one that requires you to know what an absolutely continuous function is. The definition can be found in any analysis text.

A family of functions {f(x, ⋅) ∶ x ∈ X} that map [0, 1] to


R is equi-Lipschitzian if there exists an M > 0 such that
for all x ∈ X, the function f(x, ⋅) ∶ [0, 1] → R is Lipschitzian
of rank M. In this case, we shall say that the family of
functions {f(x, ⋅) ∶ x ∈ X} is equi-Lipschitzian of rank
M. All our definition says is that each member of the


family is Lipschitzian of rank M. We now state a useful


lemma.

5 Lemma: Suppose V is defined as above in (1). Then, for


all t1 , t2 ∈ [0, 1], ∣V (t1 ) − V (t2 )∣ ⩽ supx∈X ∣f(x, t1 ) − f(x, t2 )∣. ♢

Proof. Suppose V (t1 ) − V (t2 ) ⩾ 0. For any ε > 0, there


exists y1 such that V (t1 ) ⩽ f(y1 , t1 ) + ε. This implies

    0 ⩽ V (t1 ) − V (t2 ) ⩽ f(y1 , t1 ) − f(y1 , t2 ) + ε ⩽ sup_{x∈X} ∣f(x, t1 ) − f(x, t2 )∣ + ε.

Since ε was arbitrary, we see that V (t1 ) − V (t2 ) ⩽


supx∈X ∣f(x, t1 ) − f(x, t2 )∣. Similarly, if V (t1 ) − V (t2 ) ⩽
0, for any ε > 0, we can choose y2 so that V (t2 ) ⩽
f(y2 , t2 ) + ε. Then,

    0 ⩾ V (t1 ) − V (t2 ) ⩾ f(y2 , t1 ) − f(y2 , t2 ) − ε ⩾ − sup_{x∈X} ∣f(x, t1 ) − f(x, t2 )∣ − ε,

which implies V (t1 ) − V (t2 ) ⩾ − supx∈X ∣f(x, t1 ) − f(x, t2 )∣.


Combining the two inequalities gives us the desired
result. ∎

We are now ready to state the second main theorem of


this section.

6 Theorem: Suppose the family of functions {f(x, ⋅)}x∈X


is equi-Lipschitzian of rank M > 0. Then V is Lipschitz.
Suppose, in addition, there is a set S ⊂ [0, 1] of Lebesgue measure 0 such that each f(x, ⋅) is differentiable on Sc ∶= [0, 1] ∖ S, and that X∗ (t) ≠ ∅ for all t ∈ Sc . Then, for any selection x∗ (t) ∈ X∗ (t) where t ∈ Sc , we have

(7)    V (t) = V (0) + ∫_0^t ft (x∗ (s), s) ds ♢


Proof. For any t1 , t2 ∈ [0, 1], notice (by the lemma


above) that

    ∣V (t1 ) − V (t2 )∣ ⩽ sup_{x∈X} ∣f(x, t1 ) − f(x, t2 )∣ ⩽ M ∣t1 − t2 ∣,

which establishes the Lipschitzness of V . Therefore, V is differentiable almost everywhere, and V (t) = V (0) + ∫_0^t V ′ (s) ds. If ft (x, t) exists for all x ∈ X and t ∈ Sc , then V ′ (s) is given by the envelope formula for all s ∈ Sc , as established above. The integral condition follows immediately. ∎
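The integral formula (7) can be checked on a toy example (illustrative, not from the text): take f(x, t) = tx − x² on X = [0, 1]. The family {f(x, ⋅)} is equi-Lipschitzian of rank 1, since ∣ft(x, t)∣ = ∣x∣ ⩽ 1; analytically x∗(t) = t/2, V(t) = t²/4, and ft(x∗(s), s) = s/2.

```python
def xstar(t):
    return t / 2                      # interior maximiser of t*x - x**2

def V(t):
    x = xstar(t)
    return t * x - x ** 2             # = t**2 / 4

t = 0.8
m = 10 ** 5
h = t / m
# midpoint rule for the integral of f_t(x*(s), s) over [0, t],
# where f_t(x, s) = x, so the integrand is x*(s) = s/2
integral = sum(xstar((i + 0.5) * h) for i in range(m)) * h

lhs = V(t) - V(0)                     # should equal the integral: t**2/4 = 0.16
```

Both sides come out to t²/4, exactly as (7) demands.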

We should note that the above theorem is stated with


a little more generality in Milgrom and Segal (2002).
Essentially, they assume that each function f(x, ⋅) is ab-
solutely continuous, and that each ft (x, t) is dominated
by an integrable function. The proof is, nevertheless,
almost identical. We avoid this more general theorem
only because it is more tedious to define absolute con-
tinuity.
It should be noted that these sufficient conditions are far from necessary. Consider the following example. Let

    f(x, t) = { x + (1 − t)    if x ⩽ t,
              { −x + (1 + t)   if x ⩾ t,

ie, f(x, t) = 1 − ∣x − t∣. For fixed x, the function f(x, ⋅) is not differentiable at t = x, where f(x, x) = 1. Now define for each t ∈ [0, 1], as before,

    V (t) = max{f(x, t) ∶ x ∈ R}.

But V (t) = 1 for all t ∈ [0, 1], so V is differentiable everywhere, giving us an instance where V ′ (t) ∈ ∂f(t, t) (the subdifferential of f(t, ⋅) at t) even though the hypotheses of the theorem fail. For an envelope theorem for such an occasion, and one that tells us more, see Clarke (1983).
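Numerically, a sketch of the example above: V(t) = maxₓ f(x, t) = 1 for every t, so V′ = 0, even though each f(x, ⋅) has a kink at t = x where its one-sided derivatives are +1 and −1.

```python
f = lambda x, t: 1 - abs(x - t)

def V(t, m=10 ** 4):
    # maximise over a grid on [0, 1]; the maximiser is x = t
    return max(f(i / m, t) for i in range(m + 1))

vals = [V(t) for t in (0.0, 0.25, 0.5, 0.9)]   # the value function is constant

# One-sided derivatives of f(x, .) at the kink t = x disagree:
x, h = 0.5, 1e-6
left = (f(x, x) - f(x, x - h)) / h             # approaches +1
right = (f(x, x + h) - f(x, x)) / h            # approaches -1
```

The value function is flat at 1, so V′(t) = 0, which sits inside the interval [−1, +1] spanned by the one-sided slopes: the subdifferential inclusion from the text.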


12.3 Applications

Consider the constrained optimisation problem with k inequality constraints:

    V (t) = sup{f(x, t) ∶ x ∈ X, g(x, t) ⩾ 0},   where g ∶ X × [0, 1] → R^k ,
    X∗ (t) = {x ∈ X ∶ g(x, t) ⩾ 0, f(x, t) = V (t)}.

In the problem above, we shall assume that X is a convex set (typically in Rn ), and f and g are concave in x. It is well known that if there exists x̂ ∈ X such that g(x̂, t) ≫ 0 (this is referred to as a Slater condition), then the problem above can be represented as a saddle-point problem for a Lagrangean. Put differently, the Slater condition ensures that the Karush-Kuhn-Tucker (KKT) conditions for optimality hold, which allows us to use our cookbook recipes for solving Lagrangeans.
If we now make some assumptions, such as equi-Lipschitzness, and some others that guarantee that the Lagrange multipliers λi vary nicely with t, we can show that

    V ′ (t) = ft (x∗ , t) + ∑_i λ∗_i (t) ∂gi /∂t (x∗ , t),

where for each t, (x∗ , λ∗ (t)) is the solution to the Lagrangean problem.
I should note that Milgrom and Segal (2002) provide
a complete treatment of this problem, with all the as-
sumptions filled in. I have refrained from going to their
level of detail, since we haven’t discussed saddle point
problems in class. But if you think carefully, you should
see what kind of assumptions are needed for the result
noted above to hold.
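To see the formula at work, here is a toy constrained problem (entirely my own, hypothetical): maximise f(x, t) = −x² subject to g(x, t) = x − (1 − t) ⩾ 0, for t ∈ [0, 1). The constraint binds, so x∗(t) = 1 − t, the KKT stationarity condition −2x∗ + λ = 0 gives λ∗(t) = 2(1 − t), and the formula predicts V′(t) = ft + λ∗ ∂g/∂t = 0 + 2(1 − t), matching the direct computation V(t) = −(1 − t)².

```python
def xstar(t):
    return 1 - t                       # the constraint x >= 1 - t binds

def V(t):
    return -xstar(t) ** 2              # value function: -(1 - t)**2

def lam(t):
    return 2 * (1 - t)                 # multiplier from -2*x* + lam = 0

t, h = 0.4, 1e-6
V_prime = (V(t + h) - V(t - h)) / (2 * h)   # direct numerical V'(t)
formula = 0 + lam(t) * 1.0                  # f_t = 0 and dg/dt = 1
```

Both routes give V′(0.4) = 1.2: the multiplier times the rate at which the constraint relaxes is exactly the marginal value of the parameter.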



Bibliography

Clarke, Frank H (1983). Optimization and Nonsmooth Analysis. John Wiley and Sons, New York.

Geanakoplos, John (2003). ‘Nash and Walras via Brouwer’, Economic Theory, 21, pp. 585–603.

Keller, Godfrey, Sven Rady and Martin Cripps (2005). ‘Strategic Experimentation with Exponential Bandits’, Econometrica, 73(1), pp. 39–68.

Machina, Mark (1982). ‘“Expected Utility” Analysis without the Independence Axiom’, Econometrica, 50(2), pp. 277–323.

Markowitz, Harry (1952). ‘The Utility of Wealth’, Journal of Political Economy, 60, pp. 151–158.

Maskin, Eric S and Kevin W S Roberts (2008). ‘On the Fundamental Theorems of General Equilibrium’, Economic Theory, 35, pp. 233–240.

Milgrom, Paul and Ilya Segal (2002). ‘Envelope Theorems for Arbitrary Choice Sets’, Econometrica, 70(2), pp. 583–601.

Milgrom, Paul and Chris Shannon (1994). ‘Monotone Comparative Statics’, Econometrica, 62, pp. 157–180.

Müller, A (1998). ‘Comparing Risks with Unbounded Distributions’, Journal of Mathematical Economics, 30, pp. 229–239.

Neumann, John von and Oskar Morgenstern (1953). Theory of Games and Economic Behavior, 3rd ed. Princeton University Press, Princeton, NJ.

Presman, E L (1990). ‘Poisson Version of the Two-Armed Bandit Problem with Discounting’, Theory of Probability and Its Applications, 35, pp. 307–317.

Quah, John K-H (2007). ‘The Comparative Statics of Constrained Optimization Problems’, Econometrica, 75(2), pp. 401–431.

Rothschild, M and J E Stiglitz (1970). ‘Increasing Risk I: A Definition’, Journal of Economic Theory, 2, pp. 225–243.

Rothschild, M and J E Stiglitz (1971). ‘Increasing Risk II: Its Economic Consequences’, Journal of Economic Theory, 3, pp. 66–84.
