Analysis I Notes by Garcia

Lectures on Real Analysis I
Stephan Ramon Garcia

Department of Mathematics
Pomona College
610 N. College Ave.
Claremont, CA 91711-6348
Preliminary Version
Last Revised: January 21, 2008
http://pages.pomona.edu/sg064747
stephan.garcia@pomona.edu
Contents
Lecture 1. Introduction
1.1. Preliminaries
1.2. Hippasus' Theorem
1.3. A Nonconstructive Proof
1.4. A Third Proof of Hippasus' Theorem
Lecture 2. The Archimedean Property and Its Consequences
2.1. The Archimedean Property
2.2. The Binomial Theorem & Bernoulli's Inequality
2.3. An Analytic Proof of Hippasus' Theorem
Lecture 3. The Least Upper Bound Property
3.1. The Least Upper Bound Property
3.2. The Existence of $\sqrt{2}$
Lecture 4. Monotone Sequences and Series
4.1. The Monotone Sequence Property and Infinite Series
4.2. Series with Non-negative Terms and Decimal Expansions
Lecture 5. Bijections
5.1. Counting Without Counting
5.2. Galileo's Paradox
5.3. Injections
5.4. Surjections
5.5. Bijections
Lecture 6. Cardinality
6.1. Cardinality
6.2. Countable Sets
Lecture 7. Cantor's Theorem
7.1. Constructions with Countable Sets
7.2. Cantor's Diagonal Argument
Lecture 8. The Continuum Hypothesis
8.1. Cantor's Powerset Theorem
8.2. Russell's Paradox
8.3. The Continuum Hypothesis
8.4. Digression on Geometry
Lecture 9. Normed Vector Spaces
9.1. Vector Spaces
9.2. Norms on Vector Spaces
Lecture 10. Metric Spaces
10.1. Metric Spaces
10.2. Convergent Sequences
Lecture 11. Subsequences, Continuity
11.1. Subsequences
11.2. Continuity
Lecture 12. Sequences and Continuity
12.1. Sequential Characterization of Continuity
12.2. Continuity and Composition
12.3. Limit, Accumulation, and Isolated Points
Lecture 13. Closed Sets
13.2. Closed Sets
Lecture 14. Open Sets
14.1. Closed Sets
14.2. Open Sets
Lecture 15. Set Operations with Open and Closed Sets
15.1. Complements of Open and Closed Sets
15.2. Set Operations with Open and Closed Sets
Lecture 16. Topological Characterization of Continuity
16.1. Inverse Images
16.2. Topological Characterization of Continuity
Lecture 17. Cauchy Sequences
17.1. Cauchy Sequences
17.2. Completeness
Lecture 18. Completeness
18.1. Completions of Metric Spaces
Lecture 19. Infinite Series
19.1. Cauchy Criterion for Series
19.2. The Divergence and Comparison Tests
20.1. An Extended Example
Lecture 21. Integral Test
21.1. The Harmonic Series and Integral Test
Lecture 22. Alternating Series
22.1. The Alternating Series Test
22.2. Manipulating Series
Lecture 23. Rearrangements of Series
23.1. Rearrangements of Series
23.2. Cauchy Products of Series
Lecture 24. Products of Series
24.2. The Cauchy Product of Convergent Series Can Diverge!
24.3. The Euler Product Formula
24.4. Euler's Refinement of Euclid's Theorem
Lecture 25. Compactness
25.1. Compactness
25.2. Compact Sets in $\mathbb{R}^n$
Lecture 26. The Cantor Set
26.1. The Cantor Set
26.2. The Cantor Ternary Function
26.3. Cantor Set Trivia
Lecture 27. Compactness and Continuity
27.1. Continuity and Compactness
27.2. Uniform Continuity
Lecture 28. Uniform Continuity
28.1. Nested Compact Sets
Lecture 29. Contraction Mapping Principle
29.1. The Contraction Mapping Principle
Lecture 30. Derivatives
30.1. Derivatives
30.2. Basic Theorems
Lecture 31. Mean Value Theorem
Lecture 32. Functions Behaving Badly
32.1. Functions Behaving Badly
Lecture 33. Uniform Convergence
33.1. Pointwise Convergence
33.2. Uniform Convergence
34.1. Completeness of C(X)
Lecture 35. Weierstrass M-test
35.1. Weierstraß M-Test
35.2. Weierstrass Approximation Theorem
35.3. Cauchy's Mean Value Theorem
Lecture 36. L'Hôpital's Rule and Taylor's Theorem
36.1. L'Hôpital's Rule
36.2. Taylor's Theorem
Lecture 37. Taylor Series
37.1. Smoothness Classes
37.2. Some Smooth Functions
Lecture 38. Initial Value Problems
38.1. Existence and Uniqueness of Solutions
Lecture 39. Picard Iteration
39.1. Initial Value Problems
39.2. Extended Example
Appendix A. Basic Logic
A.1. Primitive Concepts
A.2. Negation (NOT)
A.3. Conjunction (AND)
A.4. Disjunction (OR)
A.5. Manipulating Propositions
A.6. Implication ($P \Rightarrow Q$)
A.7. Converse ($Q \Rightarrow P$)
A.8. If and only if ($\Leftrightarrow$)
A.9. Contrapositive
Appendix B. Basic Set Theory
B.1. Sets
B.2. Using Properties to define Sets
B.3. Russell's Paradox
B.4. Quantifiers
B.5. Negating Propositions With Quantifiers
B.6. Subsets
B.7. Complement, Union, and Intersection
B.8. Ordered Pairs
B.9. Cartesian Products
B.10. Power Sets
B.11. Concerning Exceptional Penguins
Appendix C. Mathematical Induction
C.1. The Power Sum Problem
C.2. Mathematical Induction
C.3. The Binomial Coefficient $\binom{n}{k}$
C.4. Pascal's Triangle
C.5. The Binomial Theorem
C.6. Bernoulli's Solution to the Power Sum Problem
Appendix D. Ordered Fields
D.1. Fields
D.2. Ordered Fields
Appendix E. Prime Numbers
E.1. Euclid's Theorem
E.2. The Prime Number Theorem
Appendix F. Galileo's Paradox
Appendix G. Inner Product Spaces
G.1. Review: The Dot Product
G.2. Inner Products
G.3. Norms Defined by Inner Products
G.4. Orthogonal Vectors
G.5. The Cauchy-Schwarz-Bunyakowsky Inequality
G.6. The Triangle Inequality
Appendix H. Covering Compactness
H.1. Covering Compactness
H.2. Covering Compactness = Sequential Compactness
H.3. Total Boundedness
LECTURE 1
Introduction
1.1. Preliminaries
Since the real number system (denoted by R) is basic to real analysis, we need to know
exactly what real numbers are. As we will see, this is a far less trivial problem than it first
appears and it deserves serious consideration.
Although to some, the rigorous construction of the real number system can be endlessly fascinating, to others it may appear tedious and pedantic. There is a certain undeniable beauty to seeing the real number system built from the ground up, using logic and set
theory alone. On the other hand, while the grand scheme may be inspiring, many of the
details are quite mechanical and uninteresting. We will content ourselves with some sort
of middle ground, leaving some of the details to the homework and later lectures, while
omitting others altogether.
Since we all believe that R exists, in some form or another, we will introduce the
real numbers somewhat axiomatically. This means that we will not go into the details of
justifying why such a number system exists; in proving the existence of R we would stray
too far into the realm of set theory and away from real analysis itself. We will simply state
the basic properties of R as axioms and highlight their importance.
Among other things, the real number system contains a number of distinguished subsets:
Definition. R denotes the set of real numbers, $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$ denotes the set of natural numbers, $\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ denotes the set of integers$^1$, and Q denotes
the set of rational numbers (fractions).
In particular, note that our definition of N includes the number 0. This is somewhat
standard, as far as set theory and logic go, but in other branches of mathematics you may
see N introduced starting from 1. This is not a major mathematical issue, but it is important
to point out the notation that we will be using.
1.2. Hippasus' Theorem
The complement of Q in R is the set
$$\mathbb{I} = \mathbb{R} \setminus \mathbb{Q} = \{x \in \mathbb{R} : x \notin \mathbb{Q}\}$$
of all irrational numbers (i.e. real numbers which are not rational). It turns out that $\mathbb{I} \neq \emptyset$ (the
empty set); in other words, irrational numbers exist. In particular, we will demonstrate
that $\sqrt{2}$ (the length of the diagonal of a unit square) is irrational. While it appears that the
original proof of this fact was due to Hippasus of Metapontum (who was a Pythagorean),
the irrationality of $\sqrt{2}$ is often attributed to Pythagoras himself. Regardless of who thought
of it, the proof is a standard example of proof by contradiction.
$^1$The letter Z stands for the first letter of the German word Zahlen.
Theorem 1.1 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.
Proof #1. Suppose toward a contradiction that $\sqrt{2}$ is rational. Let us write $\sqrt{2} = a/b$
where the fraction is reduced to lowest terms (so that the greatest common divisor gcd(a, b)
of a and b is 1). Squaring the preceding equation, we obtain $2b^2 = a^2$. This shows that
$a^2$ is even, whence a is even as well (since the square of an odd number is odd). Writing
$a = 2c$, we find that $2b^2 = (2c)^2 = 4c^2$ and thus $b^2 = 2c^2$. This shows that $b^2$, and
hence b itself, is even. Therefore a and b are both divisible by 2, contradicting the
hypothesis that the fraction a/b was reduced to lowest terms. Our initial assumption that
$\sqrt{2}$ is rational is therefore false and we conclude that $\sqrt{2}$ is irrational.
It is important to note that the preceding proof implicitly relied on the Fundamental
Theorem of Arithmetic. There are several other basic properties of N that we often take
for granted. Chief among these is the following:
Theorem 1.2 (Well-Ordering Property of N). Every nonempty subset of N contains a
smallest element. In other words, if $S \subseteq \mathbb{N}$ and $S \neq \emptyset$, then there exists $n \in S$ such
that $n \leq m$ for every $m \in S$.
The Well-Ordering Property of N can be proved using the Principle of Mathematical
Induction, which can itself be proved from the axioms of set theory. However, we will not
concern ourselves with such details. We now give another proof of Theorem 1.1 which
relies on a minimality argument:
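Proof #2. Suppose toward a contradiction that $\sqrt{2}$ is rational. Let b be the smallest positive
integer such that $\sqrt{2} = a/b$ for some $a \in \mathbb{Z}$. First observe that $a > b$ since otherwise
$$2b^2 = a^2 \leq b^2,$$
whence $2 \leq 1$, an absurdity. A few algebraic manipulations lead us to another representation of $\sqrt{2}$ as a rational number:
$$\sqrt{2} = \sqrt{2}\left(\frac{\sqrt{2} - 1}{\sqrt{2} - 1}\right) = \frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \frac{2 - \frac{a}{b}}{\frac{a}{b} - 1} = \frac{2b - a}{a - b}.$$
Since $a - b > 0$ and $\sqrt{2} > 0$, it follows from the preceding that $2b - a > 0$. However,
$a > b$ and $2b - a > 0$ mean that
$$0 < a - b < b,$$
so $\sqrt{2} = (2b - a)/(a - b)$ has a smaller positive denominator than b, contradicting the minimality of b. We therefore conclude that $\sqrt{2}$ is irrational.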

1.3. A Nonconstructive Proof
Some proofs are constructive. In other words, the method of the proof can be used
to explicitly construct examples of the objects which they assert exist. On the other hand,
some proofs are nonconstructive in the sense that they establish the existence of something
without directly constructing it. A particularly striking example of a nonconstructive proof
is the following:
Theorem 1.3. There exist irrational numbers a and b so that $a^b$ is rational.
Proof. Let us consider the number $c = \sqrt{2}^{\sqrt{2}}$. This number is either rational or irrational
and thus there are two possible cases to consider:
(i) If c is rational, then let $a = b = \sqrt{2}$. In this case, $a^b = c$ is rational while a and
b are irrational.
S.R. Garcia Lectures on Real Analysis I (Preliminary Version)
(ii) If c is irrational, then let $a = c$ and $b = \sqrt{2}$. In this case, the usual rules for
manipulating exponents yield
$$a^b = \left(\sqrt{2}^{\sqrt{2}}\right)^{\sqrt{2}} = \sqrt{2}^{\sqrt{2} \cdot \sqrt{2}} = \sqrt{2}^{2} = 2.$$
In this case, $a^b$ is rational while a and b are irrational.
Since both cases lead to the conclusion that there are irrational numbers a and b such that
$a^b$ is rational, the proof is finished.

Observe that the preceding proof does not tell us whether c is irrational or not. It turns
out that c is irrational; this follows from the famed Gelfond-Schneider Theorem, a deep
and difficult result in the theory of transcendental numbers.
1.4. A Third Proof of Hippasus' Theorem
Before giving our third proof of Hippasus' Theorem, we need a few preliminaries. A
particularly familiar application of the Well-Ordering Property is the following:
Theorem 1.4 (Division Algorithm). Given $a, b \in \mathbb{Z}$ with $b > 0$, there exist unique $q, r \in \mathbb{Z}$
such that $a = qb + r$ and $0 \leq r < b$.
In other words, when you divide a by b, you wind up with a quotient q and a remainder
r which satisfies $0 \leq r < b$. Thus the Division Algorithm is just a familiar fact from grade
school arithmetic. Although we omit the proof, it is important to mention that the Division
Algorithm can be proved from more primitive notions. In particular, almost everything in
mathematics can be built up from the basic axioms of set theory.
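Python's built-in divmod performs exactly this kind of division, so a small sketch can make the statement concrete. The helper name divide is hypothetical, and the sketch simply relies on the language's floor division rather than proving anything from the Well-Ordering Property; it only exhibits the conditions $a = qb + r$ and $0 \leq r < b$.

```python
def divide(a: int, b: int) -> tuple:
    """Return (q, r) with a == q * b + r and 0 <= r < b (requires b > 0)."""
    if b <= 0:
        raise ValueError("the Division Algorithm as stated requires b > 0")
    # For b > 0, Python's divmod uses floor division, so the remainder
    # already satisfies 0 <= r < b even when a is negative.
    q, r = divmod(a, b)
    return q, r


print(divide(17, 5))   # (3, 2)
print(divide(-17, 5))  # (-4, 3), since -17 = (-4)(5) + 3
```

Note that for negative a the quotient is rounded down, not toward zero; this is what keeps the remainder non-negative.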
If a, b are two nonzero integers, then gcd(a, b) will denote the greatest common divisor
of a and b. An important fact about the greatest common divisor is the following theorem,
the proof of which demonstrates both minimality and maximality arguments.
Theorem 1.5 (Linear Representation of GCD). Let a, b be nonzero integers. If $g = \gcd(a, b)$, then there exist integers $x_0$ and $y_0$ such that $g = ax_0 + by_0$. In other words, the
greatest common divisor of a and b is an integral linear combination of a and b.
Proof. Without loss of generality, we may assume that $a, b > 0$. The set
$$S = \{ax + by : x, y \in \mathbb{Z}\}$$
contains positive integers (as well as 0). By the Well-Ordering Property of N, there exist
$x_0$ and $y_0$ such that $l = ax_0 + by_0$ is the smallest positive integer in S. It will turn out that
l is the greatest common divisor of a and b, that is $l = g$ where $g = \gcd(a, b)$. Notice that
$$0 < l \leq a, \qquad 0 < l \leq b$$
by the definition of l (since $a = a \cdot 1 + b \cdot 0$ and $b = a \cdot 0 + b \cdot 1$ are positive elements of S).
We first need to show that l is a common divisor of a and b. By the Division Algorithm,
we may write $a = lq + r$ where $0 \leq r < l$ (i.e. q is the quotient and r is the remainder
when a is divided by l). Therefore
$$r = a - lq = a - q(ax_0 + by_0) = a(1 - qx_0) + b(-qy_0).$$
Since r is of the form $ax + by$, it follows that $r \in S$. Since $0 \leq r < l$ and l is the smallest
positive element of S, we see that $r = 0$. In other words, l evenly divides a (since the
remainder, r, is zero). Similar reasoning shows that l divides b as well. Therefore l is a
common divisor of a and b.
Since $g = \gcd(a, b)$ is the greatest common divisor of a and b, it is a common divisor
of a and b. Thus we can write $a = gA$ and $b = gB$ for some integers A and B. Hence
$$l = ax_0 + by_0 = g(Ax_0 + By_0)$$
and thus g divides l (in particular, $g \leq l$). On the other hand, l is a common divisor of a and b,
so $l \leq g$ since g is the greatest common divisor of a and b. Therefore $g = l$.

In terms of Abstract Algebra, the preceding theorem states that the ideal generated
by the integers a, b in the ring Z (a principal ideal domain) must be generated by a single
element, namely g.
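The proof above is nonconstructive: it shows that a smallest positive element of S exists without saying how to find $x_0$ and $y_0$. A standard constructive alternative, swapped in here and not the method of the text, is the extended Euclidean algorithm, which applies the Division Algorithm repeatedly while tracking the coefficients. The following is a minimal Python sketch for positive a and b; the function name extended_gcd is hypothetical.

```python
def extended_gcd(a: int, b: int) -> tuple:
    """Return (g, x0, y0) with g = gcd(a, b) = a*x0 + b*y0, for a, b > 0."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r  # quotient from the Division Algorithm
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    # The last nonzero remainder is the gcd, with its coefficients.
    return old_r, old_x, old_y


g, x0, y0 = extended_gcd(240, 46)
print(g, x0, y0)  # 2 -9 47
```

For example, $240 \cdot (-9) + 46 \cdot 47 = 2 = \gcd(240, 46)$.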
We now present yet another proof of Hippasus' Theorem. In fact, we prove that $\sqrt{n}$ is
irrational when n is not a perfect square. The following approach is not as well-known as
the others and it has a completely different flavor altogether:
Theorem 1.6 (Hippasus of Metapontum). If a natural number n is not a perfect square,
then $\sqrt{n}$ is irrational.
Proof #3. Suppose toward a contradiction that $\sqrt{n} = a/b$ where the fraction a/b has been reduced to lowest
terms. In other words, a and b share no common factors and hence $\gcd(a, b) = 1$. By the
linear representation of the GCD, there exist integers x, y so that $1 = ax + by$. It therefore
follows that
$$\sqrt{n} = \sqrt{n}(ax + by) = (\sqrt{n}\,a)x + (\sqrt{n}\,b)y = bnx + ay$$
since $\sqrt{n}\,a = bn$ and $\sqrt{n}\,b = a$. However, $bnx + ay$ is an integer, which implies that $\sqrt{n}$
is also an integer. Since this contradicts the hypothesis that n is not a perfect square, we
conclude that $\sqrt{n}$ is irrational.
It turns out that we have one more proof of Hippasus' Theorem in store. . .
LECTURE 2
The Archimedean Property and Its Consequences

2.1. The Archimedean Property
An extremely useful property of R is the so-called Archimedean Property:
Theorem 2.1 (Archimedean Property of R). For every $\varepsilon > 0$ and $M \in \mathbb{R}$, there exists
$n \in \mathbb{N}$ so that $M < n\varepsilon$.
The proof that R enjoys the Archimedean Property requires the Least Upper Bound
Principle, which we will discuss relatively soon. It is important to note that there are
mathematical structures (namely, certain ordered fields; see the Notes on Fields) which are similar to R,
yet which do not enjoy the Archimedean Property.
The Archimedean Property is often used in the following form:
Corollary 1. For every $\varepsilon > 0$, there exists $n \in \mathbb{N}$ with $n \geq 1$ such that $1/n < \varepsilon$.
Another consequence of the Archimedean Property is:

Theorem 2.2 (Greatest Integer Function). For each $x \in \mathbb{R}$ there exists a unique integer,
denoted [x], such that
$$[x] \leq x < [x] + 1. \qquad (2.1)$$
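Proof. Without loss of generality, suppose that $x > 0$. By the Archimedean Property (with
$\varepsilon = 1$ and $M = x$), the set $S = \{n \in \mathbb{N} : x < n\}$ is not empty. By the Well-Ordering
Principle, S has a smallest element, say m. Since $x > 0$, we have $0 \notin S$, so $m \geq 1$; moreover,
$m - 1 \notin S$ by the minimality of m. In other words, we have
$$m - 1 \leq x < m.$$
It is clear that $[x] = m - 1$ has the desired property (2.1).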

We must now show that [x] is the unique integer which satisfies (2.1). Suppose that
$a \in \mathbb{Z}$ satisfies $a \leq x < a + 1$. We consider two possible cases. First, suppose that $[x] \geq a$.
In this case, it follows that $0 \leq [x] - a \leq x - a < 1$, whence $[x] = a$ since both a and [x]
are integers. A similar argument takes care of the case $[x] < a$.

In light of the preceding, we introduce the following useful notation:
Definition. For each $x \in \mathbb{R}$, let [x] denote the greatest integer $\leq x$ and let $\langle x \rangle = x - [x]$ denote the fractional part of x. In other words, each real number x can be written
(uniquely) in the form $x = [x] + \langle x \rangle$ where $[x] \in \mathbb{Z}$ and $0 \leq \langle x \rangle < 1$.
In case you have encountered the notation $\langle x, y \rangle$ for inner products in linear algebra, we
should mention that the notation $\langle x \rangle$ for the fractional part of x is completely unrelated. In
fact, many textbooks use the notation $\{x\}$ instead. Needless to say, in a course where one
deals with sets all the time, the use of set brackets for the purpose of denoting fractional
parts is quite unwise.
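As a quick illustration of [x] and $\langle x \rangle$, here is a small Python sketch using exact rational numbers (the standard fractions module) to avoid floating-point rounding. The helper names greatest_integer and fractional_part are hypothetical. Note that [x] is the floor of x, which differs from truncation toward zero (Python's int) when x is negative.

```python
import math
from fractions import Fraction


def greatest_integer(x):
    """[x]: the unique integer with [x] <= x < [x] + 1, i.e. the floor of x."""
    return math.floor(x)


def fractional_part(x):
    """<x> = x - [x], which always lies in [0, 1), even for negative x."""
    return x - math.floor(x)


x = Fraction(-37, 10)                           # the number -3.7, stored exactly
print(greatest_integer(x), fractional_part(x))  # -4 3/10
print(int(x))                                   # -3: truncation, not [x]
```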
The following theorem asserts that Q is dense in R, in the sense that between any two
real numbers one can find a rational number:
Theorem 2.3 (Density of Q in R). If $a < b$, then there exists $x \in \mathbb{Q}$ so that $a < x < b$.
Proof. Since $b - a > 0$, the Archimedean Property asserts that there exists $n \in \mathbb{N}$ such
that $(b - a)n > 1$. Since $bn - an > 1$, it follows that there exists an integer m such that
$$an < m < bn. \qquad (2.2)$$
Indeed, it follows from (2.1) that we may let $m = [an] + 1$:
$$an < \underbrace{[an] + 1}_{m} \leq an + 1 < bn.$$
Dividing (2.2) through by n (note that $n \geq 1$), we find that $a < x < b$ where $x = m/n$.
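The same statement holds for the set of irrational numbers $\mathbb{I} = \mathbb{R} \setminus \mathbb{Q}$:
Theorem 2.4 (Density of I in R). If $a < b$, then there exists $x \in \mathbb{I}$ such that $a < x < b$.
Proof. By the preceding theorem, there exists a nonzero rational number y such that
$$\frac{a}{\sqrt{2}} < y < \frac{b}{\sqrt{2}}$$
(if 0 lies in this interval, apply the preceding theorem to the interval $(0, b/\sqrt{2})$ instead), or equivalently,
$$a < \underbrace{\sqrt{2}\,y}_{x} < b.$$
Since $\sqrt{2}$ is irrational and $y \neq 0$, it follows that $x = \sqrt{2}\,y$ is irrational as well (otherwise $\sqrt{2} = x/y$ would be rational).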
It follows from the preceding theorems that:

MORAL: In between any two distinct real numbers there are
infinitely many rational numbers and infinitely many irrational
numbers.
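The proof of Theorem 2.3 is constructive, so it can be turned directly into a procedure. The sketch below assumes the endpoints a and b are themselves rational (so that exact arithmetic with Fraction is possible); the name rational_between is hypothetical. It chooses $n = [1/(b-a)] + 1$, which satisfies $(b - a)n > 1$, and then $m = [an] + 1$, exactly as in the proof.

```python
import math
from fractions import Fraction


def rational_between(a: Fraction, b: Fraction) -> Fraction:
    """Return a rational x with a < x < b, following the proof of Theorem 2.3."""
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step: n = [1/(b - a)] + 1 exceeds 1/(b - a), so (b - a) n > 1.
    n = math.floor(1 / (b - a)) + 1
    # Greatest-integer step: an < [an] + 1 <= an + 1 < bn, so take m = [an] + 1.
    m = math.floor(a * n) + 1
    return Fraction(m, n)


print(rational_between(Fraction(1, 3), Fraction(1, 2)))  # 3/7
```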
2.2. The Binomial Theorem & Bernoulli's Inequality
The Binomial Theorem is one result from elementary algebra that turns out to be exceedingly useful in analysis:
Theorem 2.5 (Binomial Theorem). The formula
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$
holds for any integer $n \geq 1$ and any real numbers x, y. Moreover, the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$
is always an integer.
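A numerical spot check of the Binomial Theorem for integer values of x and y, using Python's math.comb; this is an illustration under the assumption that checking small cases is informative, not a proof. The helper binomial_expand is a hypothetical name for the right-hand side of the formula.

```python
from math import comb, factorial


def binomial_expand(x: int, y: int, n: int) -> int:
    """Right-hand side of the Binomial Theorem: sum of C(n, k) x^k y^(n-k)."""
    return sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))


for n in range(1, 10):
    for k in range(n + 1):
        # n! is divisible by k!(n-k)!, so the binomial coefficient is an integer.
        assert factorial(n) % (factorial(k) * factorial(n - k)) == 0
        assert comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k))
    # Compare both sides of the formula for the sample values x = 3, y = -2.
    assert binomial_expand(3, -2, n) == (3 + (-2)) ** n
```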
For a proof of the Binomial Theorem, you can consult the Notes on Induction. As
an immediate consequence of the Binomial Theorem, we obtain the following:
Theorem 2.6 (Bernoulli's Inequalities). The inequalities
$$(1 + a)^n \geq 1 + na \qquad \text{(Weak Version)}$$
$$(1 + a)^n \geq 1 + na + \frac{n(n-1)}{2}a^2 \qquad \text{(Strong Version)}$$
hold for all $a \geq 0$ and $n \in \mathbb{N}$.
Proof #1. The right-hand sides of these inequalities are simply the first two and three
terms, respectively, in the binomial expansion of $(1 + a)^n$. Since each term in the binomial
expansion is $\geq 0$, the desired result follows.

Both versions of Bernoullis Inequality can be proved by Mathematical Induction. For
instance, here is an inductive proof of the weak version of Bernoullis Inequality:
Proof #2. Let $P(n)$ be the statement
$$(a > -1) \Rightarrow \left((1 + a)^n \geq 1 + na\right). \qquad (2.3)$$
We will use mathematical induction to show that the statements $P(0), P(1), \ldots$ are all true.
BASE CASE: Clearly $P(0)$ is true since the desired inequality reduces to $1 \geq 1$, which is
obviously true.
INDUCTIVE STEP: Suppose that $P(n)$ is true for some value of n. In other words, suppose
that (2.3) is true for this specific value of n. Multiplying the inequality in (2.3) through by
$1 + a$, which is positive since $a > -1$, we find that
$$\begin{aligned} (1 + a)^{n+1} &= (1 + a)^n(1 + a) \\ &\geq (1 + na)(1 + a) \\ &= 1 + na + a + na^2 \\ &= 1 + (n + 1)a + na^2 \\ &\geq 1 + (n + 1)a. \end{aligned}$$
In other words, the statement $P(n + 1)$ is also true; we have established that $P(n) \Rightarrow P(n + 1)$. This completes the inductive step.
CONCLUSION: By mathematical induction, it follows that $P(n)$ is true for $n = 0, 1, 2, \ldots$
and hence $(1 + a)^n \geq 1 + na$ for every $a > -1$ (in particular, for every $a \geq 0$) and every integer $n \geq 0$.
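Both versions of Bernoulli's Inequality can be spot-checked with exact rational arithmetic. The following Python sketch (with the hypothetical helper bernoulli_gaps) computes how far $(1 + a)^n$ exceeds each right-hand side for a few sample values $a \geq 0$; the sample values are arbitrary choices, and a finite check is of course no substitute for the proofs above.

```python
from fractions import Fraction


def bernoulli_gaps(a: Fraction, n: int) -> tuple:
    """Return how far (1 + a)^n exceeds the weak and strong Bernoulli bounds."""
    lhs = (1 + a) ** n
    weak = 1 + n * a
    strong = 1 + n * a + Fraction(n * (n - 1), 2) * a ** 2
    return lhs - weak, lhs - strong


# Spot-check a few values a >= 0 and n in N, using exact rational arithmetic.
for a in [Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(3)]:
    for n in range(0, 12):
        weak_gap, strong_gap = bernoulli_gaps(a, n)
        assert weak_gap >= 0 and strong_gap >= 0
```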

As a consequence of the weak version of Bernoullis Inequality, we can prove the
following well-known and useful result:
Theorem 2.7. If $x > 1$ and $M > 0$, then there exists $n \in \mathbb{N}$ such that $x^n > M$. Similarly,
if $0 \leq x < 1$ and $\varepsilon > 0$, then there exists $n \in \mathbb{N}$ such that $0 \leq x^n < \varepsilon$.
Proof. Since the second assertion follows immediately from the first (if $x = 0$ take $n = 1$;
otherwise apply the first assertion to $1/x > 1$ and $M = 1/\varepsilon$), we prove only the
first statement in the theorem. If $x > 1$, then write $x = 1 + a$ where $a > 0$ and use the
weak form of Bernoulli's Inequality:
$$x^n = (1 + a)^n \geq 1 + na.$$
By the Archimedean Property, there exists $n \in \mathbb{N}$ so that $na > M - 1$, whence $x^n > M$,
as desired.
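The proof of Theorem 2.7 actually produces an explicit exponent: writing $x = 1 + a$, any n with $na > M - 1$ works. A minimal Python sketch of that choice, assuming rational inputs so that exact arithmetic applies (the name exponent_exceeding is hypothetical):

```python
import math
from fractions import Fraction


def exponent_exceeding(x: Fraction, M: Fraction) -> int:
    """Return n with x**n > M, for x > 1 and M > 0, via Bernoulli's bound.

    Writing x = 1 + a, any n with n*a > M - 1 works because x**n >= 1 + n*a.
    """
    a = x - 1
    if a <= 0 or M <= 0:
        raise ValueError("need x > 1 and M > 0")
    # Smallest n >= 0 with n*a > M - 1 (this is 0 when M < 1).
    return max(0, math.floor((M - 1) / a) + 1)


x, M = Fraction(11, 10), Fraction(1000)
n = exponent_exceeding(x, M)
print(n, x ** n > M)  # 9991 True
```

The exponent supplied by Bernoulli's Inequality is far from optimal: for $x = 1.1$ and $M = 1000$ it gives $n = 9991$, whereas $n = 73$ already works. The argument only needs some n, not the best one.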

2.3. An Analytic Proof of Hippasus' Theorem
In this section, we provide yet another proof of Hippasus' Theorem. The following
proof is of a more analytical nature than our earlier proofs and it also illustrates some of
the techniques that we have developed.
Theorem 2.8 (Hippasus of Metapontum). $\sqrt{2}$ is irrational.
Proof #4. Assume toward a contradiction that $\sqrt{2} = p/q$ where p, q are integers and
$q \geq 1$. Define the numbers $e_n$ via the formula
$$e_n = \langle \sqrt{2} \rangle^n = (\sqrt{2} - 1)^n$$
and observe that
$$0 < \sqrt{2} - 1 < \tfrac{1}{2}. \qquad (2.4)$$
Indeed, elementary arithmetic shows that the preceding inequality is equivalent to the obvious inequality $1 < 2 < \frac{9}{4}$ (i.e. we can establish (2.4) without the use of a calculator or
decimal expansions). It follows from (2.4) and the definition of $e_n$ that
$$0 < e_n < \frac{1}{2^n} \qquad (2.5)$$
for all $n \geq 1$. Now observe that for each $n \in \mathbb{N}$ there exist integers $a_n, b_n$ such that
$$e_n = a_n + b_n\sqrt{2}. \qquad (2.6)$$
Although this statement can be proved by Mathematical Induction, it can also be proved directly from the Binomial Theorem:
$$e_n = (\sqrt{2} - 1)^n = \sum_{k=0}^{n} \binom{n}{k} (\sqrt{2})^k (-1)^{n-k}.$$
Since the binomial coefficients $\binom{n}{k}$ are integers and since $(\sqrt{2})^k$ is either an integer or an
integer times $\sqrt{2}$, the desired formula (2.6) follows immediately. By (2.6), we have
$$e_n = a_n + b_n\sqrt{2} = a_n + b_n\frac{p}{q} = \frac{a_n q + b_n p}{q} = \frac{c_n}{q},$$
where $c_n$ is an integer. Since $e_n > 0$ and $q \geq 1$, $c_n$ is a positive integer, so $c_n \geq 1$ and hence $e_n \geq 1/q$. Putting this
together with (2.5), we find that
$$\frac{1}{q} \leq e_n < \frac{1}{2^n}$$
for every $n \geq 1$. However, the resulting inequality $2^n < q$ fails for sufficiently large n by
Theorem 2.7. This contradiction shows that $\sqrt{2}$ must be irrational.
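Formula (2.6) can also be seen through a recurrence: multiplying $a + b\sqrt{2}$ by $\sqrt{2} - 1$ gives $(2b - a) + (a - b)\sqrt{2}$. The Python sketch below (hypothetical name e_coefficients) generates the integers $a_n, b_n$ this way and checks (2.5) numerically. The floating-point check is only an illustration for small n, since no exact irrational arithmetic is being done.

```python
import math


def e_coefficients(n: int) -> tuple:
    """Return integers (a_n, b_n) with (sqrt(2) - 1)**n == a_n + b_n*sqrt(2).

    Multiplying a + b*sqrt(2) by (sqrt(2) - 1) gives (2b - a) + (a - b)*sqrt(2),
    so the coefficients stay integers at every step, as (2.6) asserts.
    """
    a, b = 1, 0  # e_0 = 1
    for _ in range(n):
        a, b = 2 * b - a, a - b
    return a, b


for n in range(1, 15):
    a_n, b_n = e_coefficients(n)
    e_n = a_n + b_n * math.sqrt(2)  # floating-point check only
    assert 0 < e_n < 1 / 2 ** n      # inequality (2.5)
```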
LECTURE 3
The Least Upper Bound Property

3.1. The Least Upper Bound Property
The key property that singles out R among all ordered fields (such as Q or R(x); see the
Notes on Ordered Fields for background) is the least upper bound property. To discuss
this important property, we first need a few definitions.
Definition. If $A \subseteq \mathbb{R}$, then an upper bound for the set A is a number $s \in \mathbb{R}$ such that
$a \leq s$ for all $a \in A$. If the set A has an upper bound, then we say that it is bounded above.
In terms of the real line visualization of R, an upper bound for A is simply a point
that lies to the right of the entire set A.
Definition. We call a real number s a least upper bound (or supremum) for A if
(i) s is an upper bound for A, and
(ii) if t is any upper bound for A, then $s \leq t$.
This is written $s = \sup A$, where sup stands for supremum. If A is not bounded above,
then we say that $\sup A = \infty$.
The corresponding notion of greatest lower bound (also called the infimum) of a set A
(denoted inf A) is defined analogously.
Note that sup A, when it exists, is uniquely determined. Indeed, if $s_1, s_2$ are two least
upper bounds for A, then $s_1, s_2$ are both upper bounds. Since $s_1$ is a least upper bound,
it follows that $s_1 \leq s_2$. Similarly, we find that $s_2 \leq s_1$ and hence $s_1 = s_2$. Thus we can
speak of the least upper bound of a set.
Example 3.1. If A is a finite subset of R, then sup A is simply the largest element of A
and inf A is the smallest.
Example 3.2. $\sup \mathbb{N} = \infty$ since N is not bounded above (this follows from the Archimedean
property).
Example 3.3. $\sup[0, 1) = 1$, where [0, 1) denotes the half-open interval:
$$[0, 1) = \{x \in \mathbb{R} : 0 \leq x < 1\}.$$
Clearly 1 is an upper bound for [0, 1), so condition (i) in the definition is satisfied.
Now let us check condition (ii). We claim that
$$x \text{ is an upper bound for } [0, 1) \Rightarrow x \geq 1.$$
Instead of proceeding directly, we prove the contrapositive:
$$x < 1 \Rightarrow x \text{ is not an upper bound for } [0, 1). \qquad (3.1)$$
If x is any number smaller than 1, we can say that $x = 1 - \varepsilon$, where $\varepsilon > 0$. But then
$x < \max\{0, 1 - \varepsilon/2\} \in [0, 1)$ and hence x is not an upper bound for [0, 1). This proves (3.1), and thus
1 is the least upper bound for [0, 1); that is, $\sup[0, 1) = 1$.
Note that the preceding example demonstrates that sup A does not have to belong to
A.
Example 3.4. Consider the set
$$A = \{1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots\}.$$
The set A is bounded above, since every element of A is $\leq 2$. In other words, 2 is an upper
bound for A. Of course, we also recognize that A contains a list of better and better rational
approximations to
$$\sqrt{2} = 1.414213562\ldots.$$
The problem with the rational number system Q is that it has holes which must be repaired.
Our intuition tells us that the sequence
$$1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots$$
is increasing up to $\sqrt{2}$. That is, there should be some real number that is the least upper
bound for A. In other words, the sequence above is approaching a hole in Q. This hole
will be plugged with the real number $\sqrt{2}$.

Although we will not prove the following theorem, rest assured that it can be proved
from the basic axioms of Set Theory:
Theorem 3.1 (Least Upper Bound Property). Every nonempty subset of R that is bounded
above has a least upper bound in R.
It turns out that R is the only ordered field that has this property. In other words, no
matter what construction is used to produce an ordered field with the least upper bound
property, the end result will be essentially R. Therefore it is almost silly to speak of least
upper bounds in any other field than R.
Theorem 3.2 (Approximation Property of Suprema). If $s = \sup A$ exists and is finite, then
for every $\varepsilon > 0$ there exists $a \in A$ such that $s - \varepsilon < a$.
Proof. Suppose toward a contradiction that the statement is false. In other words,
$(\exists \varepsilon > 0)(\forall a \in A)(s - \varepsilon \geq a)$. This means that $s - \varepsilon$ is an upper bound for A, which is impossible
since s is supposedly the least (i.e. smallest) upper bound for A. Therefore the statement
is true.
3.2. The Existence of $\sqrt{2}$

In this section, we prove that there exists a real number $s > 0$ such that $s^2 = 2$. In
other words, we prove that $\sqrt{2}$ actually exists in the system of real numbers (we never
attempted to justify this before). The proof depends on the Least Upper Bound Principle
and the following well-known, but often overlooked, theorem:
Theorem 3.3 (Trichotomy Law). If x, y R, then one and only one of the following
statements holds: x > y, x < y, or x = y.
We are now ready to proceed:

Theorem 3.4. There exists a real number $s > 0$ such that $s^2 = 2$. In other words, $\sqrt{2}$
exists in R.
Proof. Define the set
$$A = \{x \in \mathbb{R} : x \geq 0,\ x^2 \leq 2\}.$$
Since $0 \in A$, it is clear that A is nonempty. Furthermore, since
$$x > 2 \Rightarrow x^2 > 4 > 2,$$
it follows that A is bounded above by 2. By the Least Upper Bound Principle, the set A
has a least upper bound in R. Let $s = \sup A$ and note that $s \geq 1$ (since $1 \in A$).
By the Trichotomy Law, there are three possible cases to check:
$$s^2 < 2, \qquad s^2 > 2, \qquad s^2 = 2.$$
If we can show that the first two cases lead to contradictions, then we can conclude that
$s^2 = 2$, as desired.
(i) If $s^2 < 2$, then we claim that $(s + \frac{1}{n})^2 < 2$ holds for sufficiently large $n \in \mathbb{N}$.
Since
$$\left(s + \frac{1}{n}\right)^2 = s^2 + \frac{2s}{n} + \frac{1}{n^2} \leq s^2 + \frac{2s}{n} + \frac{1}{n} = s^2 + \frac{2s + 1}{n},$$
if we make n large enough so that
$$s^2 + \frac{2s + 1}{n} < 2,$$
then $(s + \frac{1}{n})^2 < 2$ will hold. In particular, any value of n such that
$$\frac{2s + 1}{2 - s^2} < n$$
will suffice. This contradicts the fact that s is an upper bound for A since the
larger number $s + \frac{1}{n}$ also belongs to A.
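(ii) If $s^2 > 2$, then we claim that $(s - \frac{1}{n})^2 > 2$ holds for sufficiently large $n \in \mathbb{N}$.
Since
$$\left(s - \frac{1}{n}\right)^2 = s^2 - \frac{2s}{n} + \frac{1}{n^2} > s^2 - \frac{2s}{n},$$
if we make n large enough so that
$$s^2 - \frac{2s}{n} > 2,$$
then $(s - \frac{1}{n})^2 > 2$ will hold. In particular, any value of n such that
$$n > \frac{2s}{s^2 - 2}$$
will suffice. For such n we have $s - \frac{1}{n} > 0$ (since $n > \frac{2s}{s^2 - 2} > \frac{2}{s}$), and $s - \frac{1}{n}$ is an upper bound for A since $t > s - \frac{1}{n}$ implies
that
$$t^2 > \left(s - \frac{1}{n}\right)^2 > 2,$$
whence $t \notin A$. This contradicts the fact that s is the least upper bound for A.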
Since (i) and (ii) led to contradictions, it follows from the Trichotomy Law that $s^2 = 2$.
In other words, $\sqrt{2}$ exists in R.
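The two cases in the proof come with explicit choices of n. A small Python sketch, using exact rational arithmetic and arbitrarily chosen sample points $s = 7/5$ and $s = 3/2$ (these are not $\sup A$; they merely illustrate the two inequalities), with hypothetical helper names:

```python
import math
from fractions import Fraction


def n_for_case_i(s: Fraction) -> int:
    """For s > 0 with s^2 < 2, return n with (2s+1)/(2-s^2) < n, so (s+1/n)^2 < 2."""
    return math.floor((2 * s + 1) / (2 - s * s)) + 1


def n_for_case_ii(s: Fraction) -> int:
    """For s > 0 with s^2 > 2, return n with n > 2s/(s^2-2), so (s-1/n)^2 > 2."""
    return math.floor(2 * s / (s * s - 2)) + 1


low = Fraction(7, 5)    # 1.4, whose square is below 2
n = n_for_case_i(low)
assert (low + Fraction(1, n)) ** 2 < 2   # a larger number whose square is still <= 2

high = Fraction(3, 2)   # 1.5, whose square exceeds 2
n = n_for_case_ii(high)
assert (high - Fraction(1, n)) ** 2 > 2  # a smaller number whose square is still > 2
```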
Similar, but more complicated, arguments guarantee the existence of $\sqrt[n]{x}$ whenever
$x \geq 0$ and $n \geq 1$. We will not go into the details any further.
LECTURE 4
Monotone Sequences and Series

4.1. The Monotone Sequence Property and Infinite Series
You might recall the following definition from Math 101, Math 31H, or even Calculus
II:
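Definition. Let $a_n$ be a sequence of real numbers. We say that the limit of the sequence
$a_n$ is L, written $\lim_{n\to\infty} a_n = L$, if for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that
$|a_n - L| < \varepsilon$ holds whenever $n \geq N$. In symbols, this reads:
$$(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(n \geq N \Rightarrow |a_n - L| < \varepsilon).$$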
One of the most important consequences of the Least Upper Bound Property is the
so-called Monotone Sequence Property, which asserts that an increasing sequence which
is bounded above must converge:
Theorem 4.1 (Monotone Sequence Property). If $a_n$ is a sequence of real numbers which
is monotonically increasing (i.e. $a_n \leq a_{n+1}$ for all n) and which is bounded above (i.e.
there exists $M \in \mathbb{R}$ so that $a_n \leq M$ for all n), then $a_n$ is convergent.
The proof of the preceding theorem is requested on an upcoming homework assignment.
Definition. An infinite series $\sum_{i=0}^{\infty} a_i$ of real numbers is said to converge to S if the
sequence of partial sums
$$S_n = \sum_{i=0}^{n} a_i$$
tends to S:
$$\sum_{i=0}^{\infty} a_i = S \quad \text{means that} \quad \lim_{n\to\infty} S_n = S.$$
One of the simplest and most important examples of a convergent series of real numbers is the geometric series $\sum_{n=0}^{\infty} x^n$ with common ratio $|x| < 1$:
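Theorem 4.2 (Geometric Series Formula). If $|x| < 1$, then
$$\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}. \qquad (4.1)$$
The preceding series is called the Geometric Series.
Proof. If $|x| < 1$ and $\varepsilon > 0$ are given, then let $N \in \mathbb{N}$ be so large that
$$|x|^N < \varepsilon|1 - x|.$$
We are guaranteed that such an N exists since $|x| < 1$ (by Theorem 2.7). Recalling that
$$S_n = 1 + x + x^2 + \cdots + x^{n-1} = \frac{1 - x^n}{1 - x}$$
(here $S_n$ denotes the sum of the first n terms; this shift of index relative to the definition above does not affect the limit),
it follows that if $n \geq N$, then
$$\left|S_n - \frac{1}{1 - x}\right| = \left|\frac{1 - x^n}{1 - x} - \frac{1}{1 - x}\right| = \left|\frac{x^n}{1 - x}\right| = \frac{|x|^n}{|1 - x|} \leq \frac{|x|^N}{|1 - x|} < \varepsilon.$$
Thus
$$\lim_{n\to\infty} S_n = \frac{1}{1 - x},$$
which is equivalent to the desired formula (4.1).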
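A minimal Python sketch of the $\varepsilon$-N argument above, using floating-point numbers (an assumption that rounding error is negligible at this scale). The helpers geometric_N and partial_sum are hypothetical names; geometric_N searches for the smallest N with $|x|^N < \varepsilon|1 - x|$, and the loop then checks that later partial sums lie within $\varepsilon$ of $1/(1 - x)$.

```python
def geometric_N(x: float, eps: float) -> int:
    """Smallest N with |x|**N < eps*|1 - x|, assuming |x| < 1 and eps > 0."""
    N = 0
    while abs(x) ** N >= eps * abs(1 - x):
        N += 1
    return N


def partial_sum(x: float, n: int) -> float:
    """S_n = 1 + x + ... + x^(n-1), as in the proof of Theorem 4.2."""
    return sum(x ** i for i in range(n))


x, eps = -0.5, 1e-6
N = geometric_N(x, eps)
for n in range(N, N + 20):
    assert abs(partial_sum(x, n) - 1 / (1 - x)) < eps
```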
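Let us now recall the following fact from elementary arithmetic:
Corollary 2. If D is a block of decimal digits of length d, then
$$0.\overline{D} = \frac{D}{10^d - 1}. \qquad (4.2)$$
Indeed, by (4.1), $0.\overline{D} = \sum_{k=1}^{\infty} D \cdot 10^{-dk} = D \cdot 10^{-d} \cdot \frac{1}{1 - 10^{-d}} = \frac{D}{10^d - 1}$.
The preceding is nothing more than the familiar recipe for dealing with repeating
decimal expansions. For example:
Example 4.1.
$$\frac{4}{9} = 0.\overline{4}, \qquad \frac{47}{99} = 0.\overline{47}, \qquad \frac{476}{999} = 0.\overline{476}, \quad \ldots.$$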
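Formula (4.2) is easy to test with exact fractions. In the sketch below (hypothetical helper names repeating_block_value and first_digits), the block D is passed as a string so that leading zeros, as in $0.\overline{05} = 5/99$, are preserved; long division then recovers the repeating digits.

```python
from fractions import Fraction


def repeating_block_value(D: str) -> Fraction:
    """Value of 0.DDD... for a block D of d decimal digits, via (4.2)."""
    d = len(D)
    return Fraction(int(D), 10 ** d - 1)


def first_digits(x: Fraction, k: int) -> str:
    """Long division: the first k digits after the decimal point of 0 <= x < 1."""
    digits = []
    for _ in range(k):
        x *= 10
        digit = int(x)  # x >= 0 here, so int() agrees with the floor
        digits.append(str(digit))
        x -= digit
    return "".join(digits)


print(repeating_block_value("47"))                    # 47/99
print(first_digits(repeating_block_value("476"), 9))  # 476476476
```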
4.2. Series with Non-negative Terms and Decimal Expansions

Theorem 4.3. If $a_i \geq 0$ for all $i \in \mathbb{N}$ and if the sequence $S_n = \sum_{i=0}^{n} a_i$ is bounded above,
then the series $\sum_{i=0}^{\infty} a_i$ converges.
Proof. Since $a_i \geq 0$ for all $i \in \mathbb{N}$, it follows that the sequence $S_n$ is monotonically
increasing ($S_n \leq S_{n+1}$ for all $n \in \mathbb{N}$). By hypothesis, the $S_n$ are bounded above and
hence $\lim_{n\to\infty} S_n$ exists by the Monotone Sequence Property.

As a result, we find that base-b expansions are valid:
Theorem 4.4 (Existence of Base-b Expansions). Let $b \geq 2$ be an integer. If $d_i \in \{0, 1, \ldots, b - 1\}$ for each i, then the infinite series
$$\sum_{i=1}^{\infty} \frac{d_i}{b^i} \qquad (4.3)$$
converges to a real number in [0, 1], denoted
$$(0.d_1 d_2 d_3 d_4 \ldots)_b. \qquad (4.4)$$
Furthermore, each $x \in [0, 1)$ has a unique representation (called a base-b expansion) of
the form (4.4) which does not eventually terminate in a string of $(b - 1)$'s.
Proof. First note that if $d_i \in \{0, 1, \ldots, b - 1\}$ for each i, then the formula for the summation of a geometric series tells us that the partial sums of (4.3) are bounded above by
$$\sum_{i=1}^{\infty} \frac{b - 1}{b^i} = (b - 1)\sum_{i=1}^{\infty} \frac{1}{b^i} = (b - 1)\,\frac{1/b}{1 - 1/b} = (b - 1)\,\frac{1}{b - 1} = 1.$$
By the preceding theorem, it follows that any series of the form (4.3) converges to a real
number in [0, 1].
Suppose now that $x \in [0, 1)$ and let $d_1$ be the largest natural number such that
$$\frac{d_1}{b} \leq x. \qquad (4.5)$$
In other words, let $d_1 = [xb]$. Observe that $0 \leq d_1 \leq b - 1$ since $d_1 \geq b$ would contradict the fact that $x < 1$. Similarly, let $d_2$ be the largest natural number such that
$$\frac{d_1}{b} + \frac{d_2}{b^2} \leq x \qquad (4.6)$$
(this can again be defined in terms of the greatest integer function) and observe that $0 \leq d_2 \leq b - 1$ since $d_2 \geq b$ would violate the maximality of $d_1$ in (4.5). Proceeding in this manner, we obtain a sequence $d_1, d_2, d_3, \ldots$ of base-$b$ digits of $x$ which satisfy the inequality
$$\frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} \leq x \qquad (4.7)$$
for each $n = 1, 2, 3, \ldots$. Let
$$A = \left\{ \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_n}{b^n} : n = 1, 2, 3, \ldots \right\}.$$
Since $A \neq \emptyset$ and $x$ is an upper bound for $A$, it follows that $s = \sup A$ exists. The definition of $\sup A$ implies that $s \leq x$. We claim now that $s = x$.
Suppose toward a contradiction that $s < x$. Let $m \in \mathbb{N}$ be so large that
$$\frac{1}{b^m} < x - s.$$
By the definition of $d_m$, it follows that
$$x < \underbrace{\frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_m}{b^m}}_{\leq\, s} + \frac{1}{b^m} \leq s + \frac{1}{b^m} < x.$$
However, this implies that $x < x$, a contradiction. Since $s \leq x$, it follows that we must actually have $x = s$, as claimed. The remainder of the proof (the fact that the series (4.3) converges to $x$ and the uniqueness assertion) is left to the reader.
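The digit-selection step in the proof can be run directly. In the sketch below (our illustration; `base_b_digits` is a hypothetical name), the quantity `r` is $b^n$ times the gap $x - (d_1/b + \cdots + d_n/b^n)$, so choosing the largest $d_{n+1}$ with $d_1/b + \cdots + d_{n+1}/b^{n+1} \leq x$ amounts to taking $d_{n+1} = [b\,r]$. Exact rationals are used to avoid floating-point error.

```python
import math
from fractions import Fraction

def base_b_digits(x: Fraction, b: int, count: int) -> list[int]:
    """First `count` digits d_1, d_2, ... of x in [0, 1), chosen greedily:
    each d_n is the largest natural number keeping d_1/b + ... + d_n/b^n <= x."""
    if b < 2 or not (0 <= x < 1):
        raise ValueError("need b >= 2 and 0 <= x < 1")
    digits = []
    r = x  # invariant: r = b^n * (x - partial sum of first n digits), 0 <= r < 1
    for _ in range(count):
        d = math.floor(b * r)  # greatest integer [b r]; lies in {0, ..., b - 1}
        digits.append(d)
        r = b * r - d
    return digits

print(base_b_digits(Fraction(1, 3), 10, 6))  # 1/3 in base 10
print(base_b_digits(Fraction(5, 8), 2, 6))   # 5/8 in base 2
print(base_b_digits(Fraction(1, 2), 10, 4))  # greedy choice gives 5000..., not 4999...
```

Since the invariant $0 \leq r < 1$ is maintained, every digit lands in $\{0, \ldots, b-1\}$, matching the bounds derived in the proof, and (as the example with $1/2$ suggests) the greedy rule produces the expansion that does not end in a string of $(b-1)$s.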

An important fact about infinite decimal expansions is that they can help us better understand the relationship between rational numbers and irrational numbers. The following
theorem precisely characterizes rational and irrational numbers according to their infinite
decimal expansions:
Theorem 4.5. A real number has an eventually repeating decimal expansion if and only if it is rational. In other words, a real number $x$ is a rational number if and only if its infinite decimal expansion is of the form
$$x = A.B\overline{C}$$
where $A, B, C$ are finite blocks of decimal digits.
Proof. Suppose that the real number $x$ has a repeating decimal expansion:
$$x = A.B\overline{C}$$
where $A, B, C$ are blocks of digits of lengths $a, b, c$, respectively. Clearly,
$$x - A = 0.B\overline{C},$$
whence
$$10^b (x - A) = B.\overline{C} = B + 0.\overline{C}.$$
By Corollary 2, we know that
$$0.\overline{C} = \frac{C}{10^c - 1}.$$
Solving the equation
$$10^b (x - A) = B + \frac{C}{10^c - 1}$$
for $x$ leads to
$$x = A + 10^{-b} B + \frac{10^{-b}\, C}{10^c - 1},$$
which shows that $x$ is a rational number.

On the other hand, if $x$ is rational, then $x = a/b$ where $a, b$ are integers with $b > 0$. When performing the long division $a/b$, past a certain point only 0s will drop down since $a$ is an integer (i.e., $a = a.000000\ldots$). Once the division has proceeded to the point where only 0s drop down, there are only $b$ possible remainders ($0, 1, \ldots, b-1$) at every step of the division. Eventually some remainder must repeat, and from then on the division forms a loop (of length at most $b$), so the decimal expansion of $x$ eventually repeats.
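Both directions of Theorem 4.5 can be illustrated in a few lines of Python. The sketch below is our own illustration with hypothetical helper names: `decimal_expansion` performs the long division $a/b$ for $a \geq 0$, $b > 0$, recording each remainder so that it stops the moment a remainder repeats (after at most $b$ steps, by the argument above), and `from_expansion` evaluates the formula $x = A + 10^{-b}B + 10^{-b}C/(10^c - 1)$ from the first half of the proof, with the block lengths taken from the strings $B$ and $C$.

```python
from fractions import Fraction

def decimal_expansion(a: int, b: int) -> tuple[int, str, str]:
    """Long division of a/b (a >= 0, b > 0). Returns (A, B, C) with a/b = A.B(CCC...)."""
    if a < 0 or b <= 0:
        raise ValueError("need a >= 0 and b > 0")
    whole, r = divmod(a, b)
    digits = []
    first_seen = {}              # remainder -> position of the digit it produces
    while r not in first_seen:   # only b remainders are possible, so this stops
        first_seen[r] = len(digits)
        r *= 10                  # "bring down" a 0
        digits.append(r // b)
        r %= b
    start = first_seen[r]        # the loop begins where this remainder first appeared
    pre = "".join(map(str, digits[:start]))
    period = "".join(map(str, digits[start:]))
    return whole, pre, period

def from_expansion(A: int, B: str, C: str) -> Fraction:
    """Value of A.B(CCC...) via x = A + 10^(-len B) * (B + C / (10^(len C) - 1))."""
    len_B, len_C = len(B), len(C)
    B_value = int(B) if B else 0
    return A + Fraction(1, 10**len_B) * (B_value + Fraction(int(C), 10**len_C - 1))

for a, b in [(1, 7), (1, 6), (1, 4), (22, 7)]:
    A, B, C = decimal_expansion(a, b)
    assert from_expansion(A, B, C) == Fraction(a, b)  # round trip recovers a/b
    print(f"{a}/{b} = {A}.{B}({C})")
```

Terminating expansions such as $1/4 = 0.25$ show up with the repeating block `0`, consistent with treating them as eventually repeating.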

Example 4.2. The preceding theorem implies that
0.123456789101112131415 . . .
and
0.2030507011013017019 . . .
are irrational numbers. Although their infinite decimal expansions have definite patterns that we can concretely describe, these expansions do not eventually repeat. On the other hand,
$$\pi = 3.1415926535\ldots$$
is irrational (this requires a complicated proof) and its infinite decimal expansion does not seem to have any discernible pattern.
LECTURE 5
Bijections
5.1. Counting Without Counting
The set
A = {apple, bird, cat}
has three elements. What do we mean by three? This is a philosophical question, but
clearly there is some property that the set A shares with the set
B = {a, b, c}.
There is some abstract notion of the number three that A and B share and we instantly
recognize this property even though we cannot define it.
We know that the sets A and B above have the same number of elements since they both have three elements. Unfortunately, this procedure will not work for infinite sets, the types of sets that are of interest in real analysis. Nevertheless, by pairing up elements
(apple, a),
(bird, b),
(cat, c)
and noting that there are no elements of A or B left over, we can conclude that A and B have the same number of elements without counting. In other words, seeing that A and B have the same number of elements does not actually require us to count to three (or even to know what three is). For finite sets A and B, we observe that
If there is a one-to-one correspondence between the elements
of two finite sets A and B, then A and B have the same number
of elements.
In order to carry over this scheme of counting without counting to more general
sets, we need to discuss functions and their properties. However, let us first examine what
happens if we naively try to count infinite sets.
5.2. Galileo's Paradox
In his final book, The Discourses and Mathematical Demonstrations Relating to Two New Sciences (1638), Galileo presents a dialogue between two characters about infinite sets. They discuss what is now known as Galileo's Paradox. Galileo did not have permission from the Inquisition to publish this book: after a heresy trial based on an earlier book, the Roman Inquisition had banned Galileo from publishing anything. After failed attempts to publish his book in Germany, France, and Poland, it was finally published in the Netherlands.
Let
$$S = \{0, 1, 4, 9, 16, \ldots\}$$
denote the set of perfect squares. Clearly $S$ is a proper subset of $\mathbb{N}$. In other words, $S \subseteq \mathbb{N}$ and $S \neq \mathbb{N}$, since there are natural numbers (like 3) that are not perfect squares.
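However, look at what happens when we line up the elements of $S$ and $\mathbb{N}$:
$$\begin{array}{c|ccccccccc}
\mathbb{N} & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots \\
S & 0 & 1 & 4 & 9 & 16 & 25 & 36 & 49 & \cdots
\end{array}$$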
Galileo's Paradox is the apparent contradiction that although $S$ is much smaller than $\mathbb{N}$, we can still pair off elements of $\mathbb{N}$ with elements of $S$. According to our intuition obtained from studying finite sets, we might say that $\mathbb{N}$ and $S$ have the same number of elements.
In more precise terminology, Galileo's Paradox is essentially the observation that the set $\mathbb{N}$ properly contains
$$S = \{0, 1, 4, 9, 16, 25, \ldots\},$$
even though the function $f : \mathbb{N} \to S$ defined by $f(n) = n^2$ is a one-to-one correspondence between $S$ and $\mathbb{N}$:
$$\begin{array}{c|ccccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & \cdots \\
f(n) & 0 & 1 & 4 & 9 & 16 & 25 & 36 & 49 & \cdots
\end{array}$$
Intuitively we know that this cannot happen for finite sets:

A finite set cannot be put into a one-to-one correspondence
with a proper subset of itself.
For example, there is clearly no way that the set {1, 2, 3} can be put into a one-to-one correspondence with {1, 2}. But what exactly do we mean by a one-to-one correspondence?
When we say that $f(n) = n^2$ is a one-to-one correspondence between $\mathbb{N}$ and $S$ we mean first of all that $f$ is a function. It assigns exactly one element of $S$ (the target set) to each element of $\mathbb{N}$ (the domain). Moreover, this function has two nice properties:
(i) $f$ is one-to-one (injective). This means that no two distinct inputs (natural numbers) get sent to the same output (a perfect square).
(ii) $f$ is onto (surjective). In other words, $f$ hits everything in $S$: all of the elements in the target set $S$ are outputs of $f$.
5.3. Injections
Let $A$ and $B$ be sets. A function $f : A \to B$ can be thought of as a rule which assigns to each $a \in A$ some corresponding element of $B$, called $b = f(a)$. A function does not have to be defined by a formula, however. It is simply some definite method of assigning an element of $B$ to each element of $A$.
Definition. A function $f : A \to B$ is injective (often called one-to-one) if $f(a_1) = f(a_2)$ implies that $a_1 = a_2$. A function that is injective is called an injection.
Essentially, the preceding definition states:
A function is injective if distinct inputs lead to distinct outputs.
Indeed the contrapositive of the definition is
$$a_1 \neq a_2 \implies f(a_1) \neq f(a_2). \qquad (5.1)$$
To determine whether a function is injective or not often requires a short proof.
Example 5.1. The function $f : \mathbb{N} \to \mathbb{N}$ defined by
$$f(n) = n + 1$$
is injective. If $f(x) = f(y)$, then $x + 1 = y + 1$ and hence $x = y$. Therefore $f$ is injective.
Intuitively, it is clear that f (n) = n + 1 is a one-to-one function. The technical definition
of injectivity represents this idea in a rigorous way.
Example 5.2. The function $f : \mathbb{Z} \to \mathbb{N}$ defined by $f(n) = |n|$ is not injective since $f(1) = f(-1)$, for instance. However, the function $g : \mathbb{N} \to \mathbb{N}$ defined by $g(n) = |n|$ is injective. This illustrates the fact that changing the domain of a function can affect injectivity.
Example 5.3. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is not injective. To prove that $f$ is not injective, all we have to do is find a pair of distinct elements in the domain $\mathbb{R}$ which $f$ maps to the same output. This is easy, since $f(-1) = f(1) = 1$, for example (to show that $f$ is not injective, we need only find one such pair).
Example 5.4. Let us prove that the function $f : [0, \infty) \to \mathbb{R}$ defined by $f(x) = x^2$ is injective. If $f(x) = f(y)$ for some $x, y \in [0, \infty)$, then $x^2 = y^2$. This implies that $(x - y)(x + y) = 0$ and hence either $x - y = 0$ or $x + y = 0$. There are two possible cases we must consider:
(i) If $x - y = 0$, then $x = y$.
(ii) If $x + y = 0$, then $x = -y$. But this implies that $0 \leq x = -y \leq 0$ (since $x \geq 0$ and $y \geq 0$), from which we see that $x = y = 0$.
Since both cases lead to the conclusion that $x = y$, it follows that $f$ is injective.
Observe that the preceding proof did not assume that $f$ is invertible (i.e., we did not make any use of the square-root function). Using the square-root function would be inappropriate here, since our reasoning would then be circular.
Another proof that $f$ is injective can be based upon the contrapositive (5.1) of the definition. If $x_1 \neq x_2$ and $x_1, x_2 \in [0, \infty)$, then without loss of generality suppose that $0 \leq x_1 < x_2$. It follows from this that $x_1^2 < x_2^2$, whence $f(x_1) \neq f(x_2)$. To be really picky, one should prove that $0 \leq x_1 < x_2$ implies that $x_1^2 < x_2^2$. Let $x_2 = x_1 + \delta$ where $\delta = x_2 - x_1 > 0$. It follows that
$$x_2^2 = x_1^2 + 2x_1\delta + \delta^2 > x_1^2$$
since $\delta > 0$ and $x_1 \geq 0$.
The following theorem is useful for constructing various examples:
Theorem 5.1. If $f$ is differentiable on an open interval $I$ and $f'(x) \neq 0$ for all $x \in I$, then $f$ is injective on $I$.
Proof. Suppose toward a contradiction that $a, b \in I$, $a < b$, and $f(a) = f(b)$. By the Mean Value Theorem from Calculus I, there exists some $c$ with $a < c < b$ such that
$$f(b) - f(a) = f'(c)(b - a).$$
Since $f(b) - f(a) = 0$ and $f'(c) \neq 0$, it follows that $b - a = 0$, whence $a = b$. This contradicts $a < b$ and proves that $f$ is injective.
5.4. Surjections
Definition. For a function $f : A \to B$, the set $A$ is called the domain of $f$. The set $B$ is sometimes called the target set of $f$. The range of $f$ is defined by
$$\operatorname{Ran} f = \{ b \in B : (\exists a \in A)(b = f(a)) \}.$$
The range of $f$ is also sometimes called the image of $f$ and denoted $f(A)$.
Note that the range $f(A)$ of $f$ is not always equal to $B$.
Example 5.5. We can define a function $f : \mathbb{R} \to \mathbb{R}$ via the formula $f(x) = x^2$. Here $A = B = \mathbb{R}$ so that the domain and target set of $f$ are both $\mathbb{R}$. The range of $f$, however, is the interval $[0, \infty) = \{x \in \mathbb{R} : 0 \leq x\}$. This is because not every element of the target set $\mathbb{R}$ is hit by the function. This points out the distinction between the target set $B$ and the range of a function. The target set is what you are aiming for and the range is what you hit.
Definition. A function $f : A \to B$ is called surjective (with respect to $B$) if for every $b \in B$, there exists an $a \in A$ such that $f(a) = b$. A surjective function is called a surjection.
In symbols, the definition reads:
$$(\forall b \in B)(\exists a \in A)(f(a) = b).$$
Another commonly used terminology (which you may have heard in your calculus class)
is onto.
Observe that the target set $B$ is of fundamental importance in the definition of surjectivity. By definition, a function is surjective if and only if $\operatorname{Ran} f = B$, that is, if and only if the range of the function equals the entire target set $B$.
To say that a function is surjective is the same as saying that it
hits its entire target set.
Whether a function is surjective or not depends heavily on the target set B.
Example 5.6. The function $f : \mathbb{N} \to \mathbb{N}$ defined by $f(n) = n + 1$ is not surjective since $f(n) \neq 0$ for any $n \in \mathbb{N}$.
Example 5.7. The function $f : \mathbb{Z} \to \mathbb{Z}$ defined by $f(n) = n + 1$ is surjective. Indeed, for any $b \in \mathbb{Z}$, there exists an $a \in \mathbb{Z}$ (namely $a = b - 1$) so that $f(a) = b$.
The preceding example illustrates the following rule:
A function f : A B is surjective if and only if the equation
f (a) = b has a solution for every b in B.
Example 5.8. The function $f : \mathbb{R} \to [-1, 1]$ defined by $f(x) = \sin x$ is surjective. Note that the equation $f(x) = y$ for $y \in [-1, 1]$ has infinitely many solutions. In particular, surjectivity does not guarantee that solutions to $f(a) = b$ are necessarily unique.
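For functions between finite sets, both definitions can be checked by brute force. The sketch below uses hypothetical helper names and is our own illustration; on infinite sets such as $\mathbb{N}$ or $\mathbb{Z}$ it can only test a finite piece, so it can exhibit a counterexample like $f(1) = f(-1)$ in Example 5.2 but cannot by itself prove injectivity or surjectivity.

```python
from typing import Callable, Hashable, Iterable

def is_injective(f: Callable, domain: Iterable[Hashable]) -> bool:
    """True if f(a1) == f(a2) forces a1 == a2 on the given finite domain."""
    elements = set(domain)
    return len({f(a) for a in elements}) == len(elements)

def is_surjective(f: Callable, domain: Iterable, target: Iterable) -> bool:
    """True if every b in the finite target set equals f(a) for some a in the domain."""
    image = {f(a) for a in domain}
    return all(b in image for b in target)

def shift_mod_5(n: int) -> int:
    """A finite analogue of n -> n + 1: a bijection of {0, 1, 2, 3, 4}."""
    return (n + 1) % 5

# Example 5.2: |n| on a piece of Z is not injective; on a piece of N it is.
print(is_injective(abs, range(-3, 4)), is_injective(abs, range(0, 4)))
# Example 5.6, truncated: n + 1 never hits 0.
print(is_surjective(lambda n: n + 1, range(10), range(10)))
print(is_injective(shift_mod_5, range(5)), is_surjective(shift_mod_5, range(5), range(5)))
```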
5.5. Bijections
Definition. If $f : A \to B$ is both injective and surjective, then we say that $f$ is bijective.
Bijections are special. If $f : A \to B$ is a bijection, then we can define an inverse function $f^{-1} : B \to A$ by setting $f^{-1}(b) = a$ whenever $f(a) = b$. This is well-defined since $f$ is both surjective and injective. Indeed, if $f$ is not surjective, then $f^{-1}(b)$ cannot be defined for those $b \in B \backslash f(A)$. If $f$ is not injective, then there may be two distinct $a_1, a_2 \in A$ such that $f(a_1) = f(a_2) = b$ and hence $f^{-1}(b)$ does not make sense.
Note that
$$f^{-1} \circ f : A \to A, \qquad f \circ f^{-1} : B \to B,$$
and hence $f^{-1} \circ f$ and $f \circ f^{-1}$ are different functions (they have different domains) unless $A = B$. In fact, $f^{-1} \circ f = I_A$ and $f \circ f^{-1} = I_B$, where $I_A$ and $I_B$ denote the identity functions on $A$ and $B$, respectively.
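Example 5.9. The table below covers a number of examples:
$$\begin{array}{lllllll}
f(x) & \text{Domain} & \text{Target Set} & \text{Range} & \text{Injective} & \text{Surjective} & \text{Bijection} \\ \hline
x + 1 & \mathbb{N} & \mathbb{N} & \{1, 2, 3, \ldots\} & \text{Yes} & \text{No} & \text{No} \\
x + 1 & \mathbb{Z} & \mathbb{Z} & \mathbb{Z} & \text{Yes} & \text{Yes} & \text{Yes} \\
\sin x & \mathbb{R} & \mathbb{R} & [-1, 1] & \text{No} & \text{No} & \text{No} \\
x^3 - x & \mathbb{R} & \mathbb{R} & \mathbb{R} & \text{No} & \text{Yes} & \text{No} \\
\tan x & (-\tfrac{\pi}{2}, \tfrac{\pi}{2}) & \mathbb{R} & \mathbb{R} & \text{Yes} & \text{Yes} & \text{Yes}
\end{array}$$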
Most of the entries in the table are relatively self-explanatory. A few are worth mentioning specifically, however. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3 - x$ is a surjection but not an injection. It is a surjection since $\lim_{x \to \pm\infty} f(x) = \pm\infty$ and $f$ is continuous (hence, by the Intermediate Value Theorem from Calculus I, its range is $\mathbb{R}$). It is not an injection since $f(-1) = f(1) = 0$.
Definition. Suppose that $f : A \to B$ and $g : B \to C$ are two functions. The composition $g \circ f$ is the function $g \circ f : A \to C$ defined by
$$(g \circ f)(a) = g(f(a)).$$
Observe that function composition is associative. Indeed, if $h : C \to D$, then
$$\begin{aligned}
(h \circ (g \circ f))(a) &= h((g \circ f)(a)) \\
&= h(g(f(a))) \\
&= (h \circ g)(f(a)) \\
&= ((h \circ g) \circ f)(a)
\end{aligned}$$
for all $a \in A$ and hence we may write $h \circ g \circ f$ without parentheses.
From our perspective, the most important property of function composition is that it
respects the properties of injectivity, surjectivity, and bijectivity:
Theorem 5.2. Let $f : A \to B$ and $g : B \to C$ be functions.
(i) If $f$ and $g$ are injections, then $g \circ f : A \to C$ is an injection.
(ii) If $f$ and $g$ are surjections, then $g \circ f : A \to C$ is a surjection.
(iii) If $f$ and $g$ are bijections, then $g \circ f : A \to C$ is a bijection.
(iv) If $f : A \to B$ is a bijection, then $f^{-1} : B \to A$ is also a bijection.
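Proof. We first prove (i). For each $a_1, a_2 \in A$ we have
$$\begin{aligned}
(g \circ f)(a_1) = (g \circ f)(a_2) &\iff g(f(a_1)) = g(f(a_2)) \\
&\implies f(a_1) = f(a_2) \\
&\implies a_1 = a_2
\end{aligned}$$
and hence $g \circ f$ is injective. The two implications hold because $g$ and $f$ are injections, respectively. Now for (ii). If $c \in C$, then we must find some $a \in A$ such that $(g \circ f)(a) = c$. Since $g$ is surjective, there exists some $b \in B$ such that $g(b) = c$. Since $f$ is surjective, there exists some $a \in A$ such that $f(a) = b$. Hence
$$(g \circ f)(a) = g(f(a)) = g(b) = c$$
and $g \circ f$ is surjective. The proof of (iii) follows immediately from (i) and (ii). Statement (iv) was discussed above when we defined inverse functions: $f^{-1}$ is surjective since $f^{-1}(f(a)) = a$ for every $a \in A$, and it is injective since $f^{-1}(b_1) = f^{-1}(b_2)$ implies $b_1 = f(f^{-1}(b_1)) = f(f^{-1}(b_2)) = b_2$.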
LECTURE 6
Cardinality
6.1. Cardinality
Definition. Let $A$ and $B$ be sets. If there exists a bijection $f : A \to B$, then $A$ and $B$ are said to have equal cardinality (equivalently, $A$ and $B$ are of the same cardinality). This is written $A \cong B$.
Example 6.1. For finite sets, $A \cong B$ just means that $A$ and $B$ have the same number of elements. For instance, the sets
A = {apple, bird, cat},
B = {a, b, c}
have the same cardinality since there is a bijection $f : A \to B$.
One of the most important properties of the symbol $\cong$ is that it is an equivalence relation. In other words, it behaves like an equal sign:
Theorem 6.1. $\cong$ is an equivalence relation. In other words, for any sets $A, B, C$ the following are true:
(i) $A \cong A$.
(ii) $A \cong B$ implies that $B \cong A$.
(iii) $A \cong B$ and $B \cong C$ implies that $A \cong C$.
Proof. (i) follows from the fact that the identity function $I_A : A \to A$ is a bijection. (ii) follows from the fact that a bijection $f : A \to B$ has an inverse function $f^{-1} : B \to A$ which is also a bijection. (iii) follows from the fact that the composition of bijections is also a bijection.

The concept of cardinality allows us to divide up the universe of sets into various
categories. Some important definitions are:
Definition. We say that a set $A$ is
(i) finite if $A = \emptyset$ or $A \cong \{1, 2, \ldots, n\}$ for some $n \in \mathbb{N}$,
(ii) infinite if $A$ is not finite,
(iii) countable if $A$ is finite or $A \cong \mathbb{N}$,
(iv) uncountable if $A$ is not countable.
A countable infinite set is sometimes called countably infinite.
There is an alternate definition of infinite which is sometimes used. We state it in the
form of a theorem (without proof):
Theorem 6.2. A set $A$ is infinite if and only if there exists a proper subset $B \subsetneq A$ such that $A \cong B$.
6.2. Countable Sets
Let us discuss some examples of countable sets and various methods for constructing
them.
Example 6.2. $\mathbb{N} \cong \mathbb{N}$. Indeed, the identity function $I : \mathbb{N} \to \mathbb{N}$ defined by $I(n) = n$ for all $n \in \mathbb{N}$ is clearly a bijection.
Theorem 6.3. Any subset of a countable set is countable.
Sketch of Pf. If $A$ is a countable set, then we may list the elements of $A$:
$$a_0, a_1, a_2, \ldots.$$
If $B \subseteq A$, then we simply make a new list by crossing out those elements of $A$ which do not belong to $B$. This produces a new (finite or infinite) list of the elements of $B$, which provides a recipe for the required bijection.

Example 6.3. $S \cong \mathbb{N}$, where $S = \{0, 1, 4, 9, 16, \ldots\}$. Indeed, the function $f(n) = n^2$ is a bijection from $\mathbb{N}$ onto $S$.
If $A$ is a countable infinite set, then there is a bijection $f : \mathbb{N} \to A$ which provides a list of the elements of $A$:
$$f(0), f(1), f(2), f(3), \ldots.$$
In other words, the elements of a countably infinite set can be listed.

Theorem 6.4. If $A$ and $B$ are countable, then $A \cup B$ is countable.
Proof. Without loss of generality, suppose that both $A$ and $B$ are countably infinite and disjoint: $A \cap B = \emptyset$. The elements of $A$ and $B$ can be listed:
$$a_0, a_1, a_2, a_3, a_4, \ldots$$
$$b_0, b_1, b_2, b_3, b_4, \ldots.$$
We can simply interlace the two lists to obtain a listing of every element of $A \cup B$:
$$a_0, b_0, a_1, b_1, a_2, b_2, a_3, b_3, a_4, b_4, \ldots.$$
Since $A \cap B = \emptyset$, this list provides a bijection from $\mathbb{N}$ to $A \cup B$. In fact, the function
$$f(n) = \begin{cases} a_{n/2} & n \text{ even}, \\ b_{(n-1)/2} & n \text{ odd} \end{cases}$$
is an explicit bijection $f : \mathbb{N} \to A \cup B$. The case where one or both of $A, B$ is finite is left to the reader (as is the case where $A \cap B \neq \emptyset$).
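The interlacing formula in the proof of Theorem 6.4 translates directly into code. In this sketch (hypothetical names; listings are modeled as Python functions on the natural numbers), taking $A$ to be the natural numbers and $B$ the negative integers produces the listing $0, -1, 1, -2, 2, \ldots$ of $\mathbb{Z}$, which is another way to see Example 6.4.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def interlace(a: Callable[[int], T], b: Callable[[int], T]) -> Callable[[int], T]:
    """Given listings a(0), a(1), ... and b(0), b(1), ... of disjoint sets A and B,
    return f with f(n) = a(n/2) for even n and f(n) = b((n-1)/2) for odd n."""
    def f(n: int) -> T:
        return a(n // 2) if n % 2 == 0 else b((n - 1) // 2)
    return f

def list_naturals(k: int) -> int:   # A = {0, 1, 2, ...}
    return k

def list_negatives(k: int) -> int:  # B = {-1, -2, -3, ...}
    return -(k + 1)

f = interlace(list_naturals, list_negatives)
print([f(n) for n in range(9)])
```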

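Example 6.4. $\mathbb{Z} \cong \mathbb{N}$. Indeed,
$$0, 1, -1, 2, -2, 3, -3, 4, -4, \ldots$$
is a complete listing of $\mathbb{Z}$. Implicitly, this listing defines a function $f : \mathbb{N} \to \mathbb{Z}$ whose values are represented in the following table:
$$\begin{array}{c|cccccccc}
n & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\
f(n) & 0 & 1 & -1 & 2 & -2 & 3 & -3 & 4
\end{array}$$
In fact, an explicit formula for this function is
$$f(n) = \begin{cases} -\dfrac{n}{2} & n \text{ even}, \\[4pt] \dfrac{n+1}{2} & n \text{ odd}. \end{cases}$$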
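The explicit formula can be checked against the table, and its inverse is easy to write down, which confirms that $f$ is a bijection. The inverse `g` below is our own addition (it is not stated in the notes): $g(z) = 2z - 1$ for $z > 0$ and $g(z) = -2z$ for $z \leq 0$.

```python
def f(n: int) -> int:
    """The listing 0, 1, -1, 2, -2, ... of Z: f(n) = -n/2 (n even), (n+1)/2 (n odd)."""
    return -(n // 2) if n % 2 == 0 else (n + 1) // 2

def g(z: int) -> int:
    """Inverse of f (our addition): positive z sit at odd positions, z <= 0 at even ones."""
    return 2 * z - 1 if z > 0 else -2 * z

print([f(n) for n in range(8)])  # compare with the table in Example 6.4
assert all(g(f(n)) == n for n in range(10_000))
assert all(f(g(z)) == z for z in range(-5_000, 5_000))
```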
Example 6.5. $\mathbb{N}^2 \cong \mathbb{N}$. This example is so important that we explain it in two different ways. First, consider Figure 1, which illustrates a definite procedure for listing every element of $\mathbb{N}^2$. In fact, one can find a polynomial in two variables which accomplishes this task (we leave the derivation of this formula to the homework).
Figure 1. A listing of $\mathbb{N}^2$.
On a completely different note, there is also a brief number-theoretic argument which provides another bijection $f : \mathbb{N}^2 \to \mathbb{N}$. We claim that the function
$$f(a, b) = 2^a(2b + 1) - 1$$
is a bijection. If $f(a, b) = f(c, d)$, then
$$2^a(2b + 1) - 1 = 2^c(2d + 1) - 1,$$
whence
$$2^a(2b + 1) = 2^c(2d + 1).$$
Since $2b + 1$ and $2d + 1$ are odd, it follows from the Fundamental Theorem of Arithmetic that $a = c$. This implies that $2b + 1 = 2d + 1$, whence $b = d$. Therefore $f$ is injective. If $n \in \mathbb{N}$ is given, then use the Fundamental Theorem of Arithmetic to factor $n + 1$ to yield $a, b \in \mathbb{N}$ such that
$$n + 1 = 2^a(2b + 1).$$
Clearly this implies that $f(a, b) = n$, whence $f$ is surjective.
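The number-theoretic bijection and its inverse are short enough to implement directly. In this sketch (hypothetical names `pair` and `unpair`), the inverse strips the factors of 2 from $n + 1$, which is precisely the factorization $n + 1 = 2^a(2b + 1)$ used in the surjectivity argument.

```python
def pair(a: int, b: int) -> int:
    """f(a, b) = 2^a (2b + 1) - 1, a bijection from N^2 onto N."""
    return 2**a * (2 * b + 1) - 1

def unpair(n: int) -> tuple[int, int]:
    """Inverse of pair: factor n + 1 = 2^a * m with m odd, then b = (m - 1) / 2."""
    if n < 0:
        raise ValueError("n must be a natural number")
    m, a = n + 1, 0
    while m % 2 == 0:
        m //= 2
        a += 1
    return a, (m - 1) // 2

assert all(pair(*unpair(n)) == n for n in range(10_000))                         # onto
assert all(unpair(pair(a, b)) == (a, b) for a in range(12) for b in range(50))  # one-to-one
print([unpair(n) for n in range(8)])
```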
Example 6.6. $\mathbb{Q} \cap [0, 1] \cong \mathbb{N}$. Using a similar idea, we can construct a list of every rational number in the closed interval $[0, 1]$:
$$0, 1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{2}{3}, \tfrac{1}{4}, \tfrac{3}{4}, \tfrac{1}{5}, \tfrac{2}{5}, \tfrac{3}{5}, \tfrac{4}{5}, \tfrac{1}{6}, \tfrac{5}{6}, \tfrac{1}{7}, \ldots$$
To do this, simply list the fractions in $[0, 1]$ with denominator $1, 2, 3, \ldots$ without repeats. Since there are at most $n + 1$ fractions in $[0, 1]$ having denominator $n$, it follows that each stage in this procedure is finite.
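The listing procedure can be written as a Python generator. In this sketch (a hypothetical helper, our illustration), "without repeats" is implemented by keeping only fractions $p/q$ already in lowest terms, i.e. with $\gcd(p, q) = 1$, which is one way to skip repeats such as $2/4$.

```python
import math
from fractions import Fraction
from itertools import islice
from typing import Iterator

def rationals_in_unit_interval() -> Iterator[Fraction]:
    """Yield every rational in [0, 1] exactly once, by increasing denominator q."""
    q = 1
    while True:
        for p in range(q + 1):       # the q + 1 candidates 0/q, 1/q, ..., q/q
            if math.gcd(p, q) == 1:  # lowest terms only, so no value repeats
                yield Fraction(p, q)
        q += 1

print([str(r) for r in islice(rationals_in_unit_interval(), 14)])
```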
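Example 6.7. It is possible to prove that the Newman function
$$x \mapsto \frac{1}{[x] + 1 - \langle x \rangle},$$
where $[x]$ denotes the greatest integer function and $\langle x \rangle = x - [x]$ the fractional part of $x$, recursively generates (starting from 1) the Calkin-Wilf sequence
$$\frac{1}{1}, \frac{1}{2}, \frac{2}{1}, \frac{1}{3}, \frac{3}{2}, \frac{2}{3}, \frac{3}{1}, \frac{1}{4}, \frac{4}{3}, \ldots$$
which contains every positive rational number exactly once.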
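The recursion is easy to experiment with using exact rationals. In this sketch (hypothetical helper names), `newman` implements $x \mapsto 1/([x] + 1 - \langle x \rangle)$ with $[x]$ computed by `math.floor`; iterating from 1 reproduces the terms listed above. Running it only checks finitely many terms, of course; it is no substitute for the proof that every positive rational appears exactly once.

```python
import math
from fractions import Fraction

def newman(x: Fraction) -> Fraction:
    """x -> 1 / ([x] + 1 - <x>), where [x] = floor(x) and <x> = x - [x]."""
    whole = math.floor(x)
    frac = x - whole
    return 1 / (whole + 1 - frac)

def calkin_wilf(count: int) -> list[Fraction]:
    """The first `count` terms of the Calkin-Wilf sequence, starting from 1."""
    terms = []
    x = Fraction(1)
    for _ in range(count):
        terms.append(x)
        x = newman(x)
    return terms

print([str(t) for t in calkin_wilf(9)])
```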
LECTURE 7
Cantor's Theorem
7.1. Constructions with Countable Sets
Theorem 7.1. If $A_n$ is countable for each $n \in \mathbb{N}$, then $\bigcup_{n \in \mathbb{N}} A_n$ is countable. In other words, the countable union of countable sets is countable.
Sketch of Pf. Without loss of generality, suppose that each of the $A_n$ is countably infinite and that $A_i \cap A_j = \emptyset$ if $i \neq j$. For each $n \in \mathbb{N}$, arrange the elements of $A_n$ in a list:
$$A_n = \{a_{0n}, a_{1n}, a_{2n}, a_{3n}, \ldots\}.$$
The function
$$f(i, j) = i\text{th element in the listing of } A_j = a_{ij}$$
defines a bijection $f : \mathbb{N}^2 \to \bigcup_{n \in \mathbb{N}} A_n$. It therefore follows that
$$\bigcup_{n \in \mathbb{N}} A_n \cong \mathbb{N}^2 \cong \mathbb{N}.$$
Theorem 7.2. The Cartesian product $A \times B$ of two countable sets $A, B$ is countable.

Sketch of Pf. Without loss of generality, suppose that $A$ and $B$ are countably infinite sets. Let $a_0, a_1, a_2, \ldots$ be a listing of the elements of $A$. Since
$$A \times B = \{(a, b) : a \in A,\ b \in B\} = \bigcup_{n \in \mathbb{N}} \{(a_n, b) : b \in B\}$$
and each set $A_n = \{(a_n, b) : b \in B\}$ is countable, it follows from the preceding theorem that $A \times B$ is countable.

Example 7.1. $\mathbb{Z}^2 \cong \mathbb{N}$. Indeed, $\mathbb{Z} \cong \mathbb{N}$ (i.e., $\mathbb{Z}$ is countable) and hence the preceding theorem tells us that $\mathbb{Z}^2$ is countable as well. This can also be established via a "snake eating the dots" argument. First regard $\mathbb{Z}^2$ as a subset of the Euclidean plane. Starting at $(0, 0)$, trace out a square spiral pattern which hits every lattice point $(a, b) \in \mathbb{Z}^2$.
Example 7.2. $\mathbb{Q} \cong \mathbb{N}$. For each point $(a, b) \in \mathbb{Z}^2$, we can associate the fraction $a/b$. Some of these will be meaningless (if $b = 0$) and many will be repeats, since $1/2 = 2/4 = 3/6 = \cdots$, for example. We can, however, produce a list of all of $\mathbb{Q}$ by using the snake argument from the preceding example and skipping each meaningless fraction and each repeat, which yields a complete list of the rational numbers without repetition.
Another way to prove that $\mathbb{Q} \cong \mathbb{N}$ is to use the fact that $\mathbb{Q} \cap [0, 1] \cong \mathbb{N}$ and employ some of our theorems on constructing countable sets. We leave this as an exercise.

7.2. Cantor's Diagonal Argument
One might begin to suspect that bijections between infinite sets are essentially meaningless and that all infinities are the same. Shockingly, it turns out that this is not the
case. The following remarkable theorem is due to Georg Cantor:
Theorem 7.3 (Cantor). $\mathbb{R}$ is uncountable. In other words, there does not exist a bijection $f : \mathbb{N} \to \mathbb{R}$.
Proof. Suppose toward a contradiction that a bijection $f : \mathbb{N} \to \mathbb{R}$ exists (in fact, we will prove that no surjection $f : \mathbb{N} \to \mathbb{R}$ exists). We will use the fact that any real number can be written uniquely as a sequence of decimal digits¹. Since the function $f$ is supposed to be a bijection from $\mathbb{N}$ to $\mathbb{R}$, we obtain a complete listing
$$f(0), f(1), f(2), f(3), \ldots$$
of all of $\mathbb{R}$. Let us write this list as an array:
$$\begin{aligned}
f(0) &= d_{00}.d_{01}d_{02}d_{03}d_{04}\ldots \\
f(1) &= d_{10}.d_{11}d_{12}d_{13}d_{14}\ldots \\
f(2) &= d_{20}.d_{21}d_{22}d_{23}d_{24}\ldots \\
f(3) &= d_{30}.d_{31}d_{32}d_{33}d_{34}\ldots \\
f(4) &= d_{40}.d_{41}d_{42}d_{43}d_{44}\ldots \\
&\;\;\vdots
\end{aligned}$$
where the $d_{i0}$'s are integers and the $d_{ij} \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ for $j \geq 1$. We will take the diagonal number
$$d_{00}.d_{11}d_{22}d_{33}d_{44}\ldots$$
and tweak it so that the resulting number cannot possibly be on our list. This will be our desired contradiction.
Consider the new number
$$x = D_0.D_1D_2D_3\ldots$$
where the new digits $D_n$ are defined by
$$D_n = \begin{cases} 4 & d_{nn} \neq 4, \\ 7 & d_{nn} = 4. \end{cases}$$
Note that for each $n \in \mathbb{N}$, the $n$th decimal place of $x$ is different from the $n$th decimal place of $f(n)$ (for $n = 0$, this refers to the integer parts). In other words, $x$ cannot be any of the $f(n)$ and hence the function $f : \mathbb{N} \to \mathbb{R}$ is not surjective, a contradiction.

The numbers 4 and 7 in the preceding proof are not important. We just do not want to use 9s in either case since otherwise the number $x$ produced might end in all 9s, which would cause a problem since we are using decimal expansions that do not trail off in all 9s.
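The digit rule can be illustrated on a finite table. In this sketch (our illustration; `diagonal_escape` is a hypothetical name), `rows[n]` holds $[d_{n0}, d_{n1}, d_{n2}, \ldots]$, so that entry $n$ of row $n$ plays the role of $d_{nn}$. Only finitely many rows are used, so this shows the mechanics of the construction, not the theorem itself.

```python
def diagonal_escape(rows: list[list[int]]) -> list[int]:
    """rows[n] = [d_n0, d_n1, d_n2, ...] encodes f(n) = d_n0 . d_n1 d_n2 ...
    (each row needs more than n entries). Return [D_0, D_1, ...] with
    D_n = 4 if d_nn != 4 and D_n = 7 if d_nn == 4."""
    return [7 if row[n] == 4 else 4 for n, row in enumerate(rows)]

rows = [
    [4, 1, 4, 1, 5],   # 4.1415...
    [0, 4, 0, 0, 0],   # 0.4000...
    [12, 9, 9, 2, 1],  # 12.9921...
    [2, 7, 1, 4, 2],   # 2.7142...
]
D = diagonal_escape(rows)
print(D)
assert all(D[n] != rows[n][n] for n in range(len(rows)))  # differs from f(n) at place n
assert 9 not in D  # no 9s, so x cannot end in a tail of 9s
```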
MORAL: R is so much larger than N that it belongs to a higher
class of infinite sets. In other words, there are different levels
of infinity.
¹If we agree not to end in all 9s. For instance: $0.50000\ldots = 0.4999999\ldots$.
Corollary 3. The set $\mathbb{R} \backslash \mathbb{Q}$ of irrational numbers is uncountable. In particular, there are more irrational numbers than rational numbers.
Proof. Recall that $\mathbb{Q}$ is countable. Since the union of two countable sets is countable, if $\mathbb{R} \backslash \mathbb{Q}$ were countable, then $\mathbb{R} = \mathbb{Q} \cup (\mathbb{R} \backslash \mathbb{Q})$ would be countable too. This would contradict Cantor's Theorem.

Corollary 4. Every subinterval $(a, b)$ of $\mathbb{R}$ with $a < b$ is uncountable.
Sketch of Pf. It suffices to find a bijection between $(a, b)$ and $\mathbb{R}$. This can be done, for instance, by composing an appropriate linear function $f(x) = \alpha x + \beta$ (with $\alpha \neq 0$) with a continuous, monotone increasing function with two vertical asymptotes, such as $g(x) = \tan x$ on the interval $(-\pi/2, \pi/2)$ or $h(x) = x/(1 - x^2)$ on the interval $(-1, 1)$.

The previous corollary asserts that not only does $\mathbb{R}$ contain vastly more elements than $\mathbb{N}$, but so does every subinterval of $\mathbb{R}$, no matter how small. This may seem paradoxical at first, and it takes a long time to digest. Try to think of this in the context of the fact that between any two rational numbers there is an irrational number, and between any two irrational numbers there is a rational number.
LECTURE 8
The Continuum Hypothesis

8.1. Cantor's Powerset Theorem
Definition. If $A$ is a set, then the power set of $A$, denoted $\mathcal{P}(A)$, is defined to be the set of all subsets of $A$. In symbols:
$$\mathcal{P}(A) = \{B : B \subseteq A\}.$$
Example 8.1. If $A = \{a, b, c\}$, then
$$\mathcal{P}(A) = \big\{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{b, c\}, \{a, c\}, A \big\}.$$
Example 8.2. Describing the power set of infinite sets is much trickier. For instance, $\mathcal{P}(\mathbb{N})$ contains every possible subset of $\mathbb{N}$ and hence contains the sets
$$\emptyset,\ \{1\},\ \{5, 23\},\ \{2, 4, 6, 8, \ldots\},\ \{100, 101, 102, \ldots\},\ \{2, 3, 5, 7, 11, 13, \ldots\}.$$
It turns out that P(N) is much larger than N itself. In fact, Cantor showed that there
are infinitely many levels of infinity:
Theorem 8.1 (Cantor). If $S$ is any set, then there does not exist a bijection $f : S \to \mathcal{P}(S)$. In other words, $\mathcal{P}(S)$ is of a strictly larger cardinality than $S$.
Proof. Assume toward a contradiction that $f : S \to \mathcal{P}(S)$ is a bijection. For each $x \in S$, we have $f(x) \subseteq S$ and hence either $x \in f(x)$ or $x \notin f(x)$. Let
$$E = \{x \in S : x \notin f(x)\}.$$
Since $f$ is a bijection, there exists a $z \in S$ such that $f(z) = E$. However,
$$\begin{aligned}
z \in E &\iff z \notin f(z) && \text{(definition of } E\text{)} \\
&\iff z \notin E && \text{(since } f(z) = E\text{)}.
\end{aligned}$$
This contradiction shows that no such $f$ can exist.
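For a small finite set, the diagonal argument can be verified exhaustively. The sketch below (our illustration; helper names are hypothetical) lists $\mathcal{P}(S)$ for $S = \{a, b, c\}$, as in Example 8.1, runs through all $8^3 = 512$ functions $f : S \to \mathcal{P}(S)$, and checks that the set $E = \{x \in S : x \notin f(x)\}$ is never in the range of $f$. Exhaustive search only works for finite $S$; the proof above is what covers infinite sets.

```python
from itertools import combinations, product

def power_set(S):
    """All subsets of the finite collection S, as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def diagonal_set(f, S):
    """E = {x in S : x not in f(x)}, the set used in the proof of Theorem 8.1."""
    return frozenset(x for x in S if x not in f[x])

S = ["a", "b", "c"]
P = power_set(S)
assert len(P) == 2 ** len(S)

checked = 0
for images in product(P, repeat=len(S)):     # each choice of images is one f : S -> P(S)
    f = dict(zip(S, images))
    assert diagonal_set(f, S) not in images  # E is never hit, so f is not surjective
    checked += 1
print(checked, "functions checked")
```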
The preceding theorem shows that if S is an infinite set, then P(S) is much bigger
than S itself, so much bigger that it bumps up to a higher level of infinity. Moreover, we
can obtain a chain of ever larger infinite sets:
S,
P(S),
P(P(S)), . . . .
This is a somewhat shocking thought! Now for a big question:

Question. If $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then is it necessarily the case that either $A \cong \mathbb{N}$ or $A \cong \mathbb{R}$? Phrased another way, are there intermediate cardinalities between that of $\mathbb{N}$ and that of $\mathbb{R}$?
8.2. Russell's Paradox

Having worked with sets a little bit, you might be surprised to learn that our approach
to sets is not logically sound. In fact, it is called naive set theory to distinguish it from
the rigorous axiomatic approach used in formal set theory. A startlingly simple logical
paradox due to Bertrand Russell immediately shows that the basis of this approach to sets
is unsound.
One of the basic principles of naive set theory is the General Comprehension Principle, which we implicitly used above. In the early days of set theory (around 1873–1900), mathematicians and logicians had always assumed that you can define a set if you have a definite property $P(x)$. In other words, given a reasonable statement $P(x)$, the set of all $x$ for which $P(x)$ is true should exist, logically speaking. Essentially, they assumed that
$$\{x : P(x)\}$$
should always exist and be something that we are allowed to think about and discuss logically. Surprisingly, this is not the case.
The death blow to naive set theory came in 1901, and it is called Russell's Paradox. Russell begins by letting
$$R = \{x : (x \text{ is a set}) \wedge (x \notin x)\}.$$
In other words,
R is the set of all sets that are not elements of themselves.
The expression
$$P(x) = (x \text{ is a set}) \wedge (x \notin x)$$
is quite unambiguous. An object $x$ should either be a set or not a set. An object $x$ should either be an element of itself or not be an element of itself. Thus $P(x)$ looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to think about the set $R$.
Russell then asks: Does $R$ contain itself or not? Unfortunately, the definition of $R$ implies that
$$R \in R \iff R \notin R.$$
Neither $R \in R$ nor $R \notin R$ is logically possible! This means that we cannot treat $R$ as a set; it is simply too large of an idea to be considered in a logically sound manner. In other words, we cannot logically consider the set of all sets that are not elements of themselves without running into paradoxes. We just cannot; it is a law of the universe. Russell's Paradox shows that the General Comprehension Principle is not correct.
Russell discovered this paradox and sent it to Gottlob Frege (1848–1925) as Frege was finishing his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the laws of arithmetic from supposedly logical axioms. Russell's Paradox invalidated much of Frege's work. Indeed, Frege noted:
A scientist can hardly meet with anything more undesirable than to have the
foundation give way just as the work is finished. I was put in this position
by a letter from Mr. Bertrand Russell when the work was nearly through the
press.
There are many other logical paradoxes that have been discovered throughout the years, but Russell's paradox is one of the most important. It forced mathematicians and logicians to completely reevaluate mathematics and logic from the ground up. Russell's Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic fashion. The rules would have to be explicitly stated in such a way that Russell's Paradox would not occur in the universe of Axiomatic Set Theory. Although we will not discuss axiomatic set theory in this course, it is important to be aware that sets and set theory are not as simple as they sound.
Here are a couple of paradoxes which are somewhat similar in spirit:
Example 8.3. A car is equipped with a Russell light on its dashboard. The light turns on
to warn the driver if a light has burnt out. What happens when the Russell light burns out?
Example 8.4. The following paradox of Eubulides of Miletus¹ (4th century BCE) indicates that self-reference can be troublesome:
This statement is false.
This is a troublesome sentence (call it $P$) since
$$P \text{ is true} \iff P \text{ is false}.$$
Thus Eubulides' statement is not a logical proposition. This paradox is similar to the liar paradox: "I am lying."
8.3. The Continuum Hypothesis
Question. If $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then is it necessarily the case that either $A \cong \mathbb{N}$ or $A \cong \mathbb{R}$? Phrased another way, are there intermediate cardinalities between that of $\mathbb{N}$ and that of $\mathbb{R}$?
The Continuum Hypothesis (CH) asserts that if $\mathbb{N} \subseteq A \subseteq \mathbb{R}$, then either $A \cong \mathbb{N}$ or $A \cong \mathbb{R}$. Georg Cantor believed CH to be true, and spent years attempting to prove it. David Hilbert, one of the greatest mathematicians in history, placed it first on his list of open questions presented to the 1900 International Congress of Mathematicians in Paris.
Surprisingly, the question of whether CH is true or false is not possible to answer.
In 1940, Kurt Gödel proved that CH cannot be disproved from the axioms of set theory. Specifically, he showed that CH cannot be disproved using the Zermelo-Fraenkel (ZF)
axioms, or using the Zermelo-Fraenkel axioms with the addition of the (at one time controversial) Axiom of Choice (AC). This extended axiom system is denoted ZFC. In 1963, Paul
Cohen demonstrated that CH cannot be proved from ZFC either, and hence CH is logically
independent of ZFC: it can be neither proved nor disproved from the standard axioms of set
theory. (Of course, the results of Gödel and Cohen rely on the assumption that ZFC is
consistent, that is, not flawed in itself.)
Starting from the standard (ZFC) axioms of set theory, one can add either CH or its negation to
obtain two different versions of mathematics, one in which CH is true and one in which
CH is false. Each universe is as valid as the other: since CH can be neither proved nor
disproved from ZFC, which of the two one adopts is a matter of opinion. This seems bizarre,
but it is easier to understand if we examine a similar situation that occurred in classical
geometry.
¹ I have actually been to Miletus (now known as Milet, in modern Turkey). There are many fascinating
Roman-era ruins, partially sunken below a swamp, which are open to the public. There are, however, few tourists
who visit the site.
8.4. Digression on Geometry

Around 2300 years ago, the famous geometer Euclid of Alexandria (in modern Egypt)
wrote the Elements, a monumental treatise on geometry and related topics. What is remarkable about the Elements is that it attempts to build geometry in a logical and rigorous
manner from a few basic axioms. Although Euclid's book contains numerous oversights
and hidden assumptions, it is nonetheless a magnificent intellectual achievement (even
though many of the results in the Elements were already known to others and Euclid
collected them together into a textbook).
Euclid worked with certain primitive notions, for example points, lines, and circles,
although he did attempt to give some vague definitions. For instance, he says that:
Definition. A point is that which has no part.
Definition. A line is breadthless length.
Although this might appear silly, how would you define a point? One might be
tempted to say that:
A point is an ordered pair (x, y) whose entries x and y are real numbers,
although the modern Cartesian definition above is not much better. Indeed, what exactly
are real numbers anyway? One cannot simply say that
A real number is a point on a line,
since this would lead us in circles! Similarly, how would you define the words "length" and
"angle"?
After making 23 somewhat lengthy definitions, defining everything from circles to
isosceles triangles to rhomboids, Euclid is ready to begin talking about axioms², statements which are taken as true. Euclidean geometry then refers to the vast body of theorems which can be proved using Euclid's definitions and axioms. Euclid's axioms (he
called them postulates) for geometry are:
Postulate 1. A straight line segment can be drawn joining any two points.
Postulate 2. Any straight line segment can be extended indefinitely in a straight line.
Postulate 3. Given any straight line segment, a circle can be drawn having the segment as
radius and one endpoint as center.
Postulate 4. All right angles are congruent.
Postulate 5. If two lines are drawn which intersect a third in such a way that the sum of
the inner angles on one side is less than two right angles, then the two lines inevitably must
intersect each other on that side if extended far enough.
From these he proceeds to prove many well-known theorems of plane geometry. Unfortunately, Euclid's 5th Postulate looks far more complicated than the other four. Is the 5th Postulate something
that we should accept as true? Euclid himself must have been unsatisfied with his 5th
Postulate, since he held off from using it for as long as he could (until his twenty-ninth proposition, Proposition I.29).
Perhaps Postulate 5 is redundant and can be proved from Postulates 1–4? If that were
possible, we would not need to assume that Postulate 5 is true at all: we could prove it
from Postulates 1–4 and call it a theorem instead. This is precisely what people tried
(unsuccessfully) to do for 2000 years.
² He actually also talked about "common notions," which were mainly explicit rules for logical deduction.
Given only Postulates 1–4, it is impossible to prove or to disprove Postulate 5. In
other words, Euclid's 5th Postulate is neither true nor false in the mathematical universe
generated by Postulates 1–4. This is not a statement about universal truth and universal
falsehood, which is reserved for philosophy. It means only that if we are given only Postulates 1–4 as true, we cannot logically deduce the truth or falsehood of Postulate 5. One
says that the 5th Postulate is logically independent of Postulates 1–4.
This opens up two possible mathematical universes, each as valid as the other. If one
assumes that Euclid's 5th Postulate is true and proceeds to prove theorems based
on Postulates 1–5, then one is proving theorems about Euclidean (flat) geometry. If one
assumes that Euclid's 5th Postulate is false and proceeds to prove theorems in this setting,
then one is proving theorems about hyperbolic geometry (a type of curved geometry). The
existence of curved geometries is not surprising to us in the 21st century, since we are used
to hearing of relativity and curved space-time. Many years ago, however, this was an
extremely radical thought. Indeed, the philosopher Immanuel Kant went so far as to say
that Euclidean geometry is "the inevitable necessity of thought."
LECTURE 9
Normed Vector Spaces

9.1. Vector Spaces
You may have encountered vectors and vector spaces in other settings before. An
abstract vector space is a generalization of $n$-dimensional Euclidean space $\mathbb{R}^n$.
Definition. A vector space is a set V endowed with (and closed under) operations called
vector addition and scalar multiplication such that the following hold:
(i) COMMUTATIVITY: $u + v = v + u$ for all $u, v \in V$.
(ii) ASSOCIATIVITY: $(u + v) + w = u + (v + w)$ for all $u, v, w \in V$.
(iii) ADDITIVE IDENTITY: There exists a vector $0 \in V$ such that
$$u + 0 = 0 + u = u$$
for all $u \in V$.
(iv) ADDITIVE INVERSE: For every $u \in V$, there exists a $v \in V$ such that $u + v = 0$.
(v) MULTIPLICATIVE IDENTITY: $1u = u$ for all $u \in V$.
(vi) DISTRIBUTIVITY:
$$a(u + v) = au + av, \qquad (a + b)u = au + bu$$
for all $a, b \in \mathbb{R}$ and $u, v \in V$.
(vii) COMPATIBILITY: $a(bu) = (ab)u$ for all $a, b \in \mathbb{R}$ and $u \in V$.
Note that in general we do not have a rule that lets us multiply two vectors (e.g., a
cross product). An important theorem for constructing and identifying new vector
spaces is the following:
Theorem 9.1. A nonempty subset $W$ of a vector space $V$ is itself a vector space (with the operations
inherited from $V$) if and only if $c_1 w_1 + c_2 w_2 \in W$ for all $c_1, c_2 \in \mathbb{R}$ and $w_1, w_2 \in W$.
Example 9.1. R itself is a vector space. The operations are simply the usual operations of
addition and multiplication.
Example 9.2. The simplest and most important nontrivial example of a vector space is
$n$-dimensional Euclidean space $\mathbb{R}^n$, with the usual operations of vector addition
$$(x_1, \ldots, x_n) + (y_1, \ldots, y_n) = (x_1 + y_1, \ldots, x_n + y_n)$$
and scalar multiplication
$$a(x_1, \ldots, x_n) = (ax_1, \ldots, ax_n).$$
Example 9.3. The set $M_n(\mathbb{R})$ of all $n \times n$ matrices is a vector space. Indeed, matrices
can be added and multiplied by constants (one can check that the vector space axioms are
satisfied). In fact, $M_n(\mathbb{R})$ is really a disguised version of $\mathbb{R}^{n^2}$.
Example 9.4. Let $P_n(\mathbb{R})$ denote the set of polynomials of degree at most $n$:
$$P_n(\mathbb{R}) = \{a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n : a_0, a_1, \ldots, a_n \in \mathbb{R}\}.$$
We consider each polynomial
$$a_0 + a_1 x + \cdots + a_n x^n$$
to be a vector in $P_n(\mathbb{R})$, and we define vector addition by
$$(a_0 + \cdots + a_n x^n) + (b_0 + \cdots + b_n x^n) = (a_0 + b_0) + \cdots + (a_n + b_n)x^n$$
and scalar multiplication by
$$c(a_0 + \cdots + a_n x^n) = ca_0 + \cdots + ca_n x^n.$$
Notice the similarity between $P_n(\mathbb{R})$ and $\mathbb{R}^{n+1}$: each polynomial in $P_n(\mathbb{R})$ is uniquely
determined by the $(n + 1)$-tuple $(a_0, a_1, \ldots, a_n)$ of its coefficients.
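To make the identification of $P_n(\mathbb{R})$ with $\mathbb{R}^{n+1}$ concrete, here is a minimal Python sketch (not part of the original notes) that stores a polynomial as its coefficient list [a0, ..., an]. The helper names add, scale, and evaluate are our own. The sketch checks at a few sample points that the coefficientwise operations agree with adding and scaling the polynomials as functions.

```python
from typing import List

# A polynomial a0 + a1 x + ... + an x^n in P_n(R) is stored as [a0, a1, ..., an].
Poly = List[float]


def add(p: Poly, q: Poly) -> Poly:
    """Vector addition in P_n(R): add the coefficients of like powers of x."""
    if len(p) != len(q):
        raise ValueError("both polynomials must be stored with n + 1 coefficients")
    return [a + b for a, b in zip(p, q)]


def scale(c: float, p: Poly) -> Poly:
    """Scalar multiplication in P_n(R): multiply every coefficient by c."""
    return [c * a for a in p]


def evaluate(p: Poly, x: float) -> float:
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for a in reversed(p):
        result = result * x + a
    return result


p = [1.0, 2.0, 3.0]   # 1 + 2x + 3x^2
q = [0.0, -1.0, 4.0]  # -x + 4x^2
r = add(p, scale(2.0, q))
print(r)  # [1.0, 0.0, 11.0], i.e. 1 + 11x^2

# Coefficientwise operations agree with adding and scaling the functions pointwise.
for x in [-2.0, 0.5, 3.0]:
    assert abs(evaluate(r, x) - (evaluate(p, x) + 2.0 * evaluate(q, x))) < 1e-9
```

Storing only coefficients is exactly the point of the example: the vector space operations on $P_n(\mathbb{R})$ never need to look at $x$.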
Example 9.5. If $X \subseteq \mathbb{R}^n$, then the set $C(X)$ of all continuous real-valued functions
$f : X \to \mathbb{R}$ is a vector space. Using the fact that sums and constant multiples of continuous
functions are continuous, one can verify that $C(X)$ is closed under the operations of addition and scalar
multiplication. In particular, note that the zero function plays the role of the zero vector in
$C(X)$.
Example 9.6. The set
$$V = \{f : \mathbb{R} \to \mathbb{R} : (\forall x \in \mathbb{R})(f''(x) + f(x) = 0)\}$$
of all solutions to the differential equation
$$y''(x) + y(x) = 0 \qquad (9.1)$$
is a vector space (with the usual pointwise addition and scalar multiplication of functions playing the roles of
vector addition and scalar multiplication).
Recall from Calculus I that every differentiable function is automatically continuous.
It therefore follows that $V \subseteq C(\mathbb{R})$, a known vector space. Since $V$ is nonempty (it contains the
zero function), Theorem 9.1 shows that to prove $V$ is a vector space, we need only show that if $y_1$ and $y_2$ are
two solutions to (9.1) and $c_1, c_2 \in \mathbb{R}$, then $c_1 y_1 + c_2 y_2$ also satisfies the differential equation (9.1):
$$\begin{aligned}
(c_1 y_1 + c_2 y_2)'' + (c_1 y_1 + c_2 y_2) &= c_1 y_1'' + c_2 y_2'' + c_1 y_1 + c_2 y_2 \\
&= c_1 (y_1'' + y_1) + c_2 (y_2'' + y_2) \\
&= c_1 \cdot 0 + c_2 \cdot 0 \\
&= 0.
\end{aligned}$$
In any case, the main point of this discussion is to explain that vector spaces composed
of functions often arise in natural settings.
9.2. Norms on Vector Spaces
Definition. A norm on a vector space $V$ is any function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the
following conditions:
(i) $\|v\| \geq 0$ for all $v \in V$, and $\|v\| = 0$ if and only if $v = 0$;
(ii) $\|av\| = |a|\,\|v\|$ for all $a \in \mathbb{R}$ and $v \in V$;
(iii) $\|v + w\| \leq \|v\| + \|w\|$ for all $v, w \in V$.
A vector space $V$ equipped with a norm is called a normed linear space.
The inequality (iii) in the preceding definition is known as the Triangle Inequality.
Example 9.7. $\mathbb{R}$ is a normed linear space when equipped with the norm $\|a\| = |a|$. In fact,
norms are generalizations of the absolute value function to vector spaces. Also observe
that if $\lambda > 0$, then $\|a\| = \lambda |a|$ is also a norm on $\mathbb{R}$.
Often there are several possible norms on a given vector space. In that case, we should
be specific about stating which norm we are using.
Example 9.8. There are many different norms on $\mathbb{R}^n$. For instance, the following norms
on $\mathbb{R}^n$ are extremely important:
$$\|v\|_1 = \sum_{i=1}^n |v_i|, \qquad \|v\|_2 = \sqrt{\sum_{i=1}^n |v_i|^2}, \qquad \|v\|_\infty = \sup_{1 \leq i \leq n} |v_i|.$$
Here $v = (v_1, v_2, \ldots, v_n)$ denotes a typical vector in $\mathbb{R}^n$. Observe that the 2-norm $\|\cdot\|_2$ on
$\mathbb{R}^n$ is simply the standard Euclidean norm that you encountered in Multivariable Calculus
and/or Linear Algebra. You should check that the axioms (i), (ii), and (iii) for a norm are
indeed satisfied by each of the above.
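As a companion to the exercise of checking the axioms, here is a hedged Python sketch (our own illustration, not from the notes) that implements $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$ and spot-checks axioms (i)–(iii) on random vectors. The function names and the tolerance are assumptions; random testing is numerical evidence only and does not replace the proofs you are asked to write.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]


def norm_1(v: Vector) -> float:
    """The 1-norm: the sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)


def norm_2(v: Vector) -> float:
    """The 2-norm (Euclidean norm): the square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))


def norm_inf(v: Vector) -> float:
    """The infinity-norm: the largest absolute value of an entry."""
    return max(abs(x) for x in v)


def spot_check_norm(norm: Callable[[Vector], float], n: int = 5,
                    trials: int = 1000, tol: float = 1e-9) -> bool:
    """Test axioms (i)-(iii) on random vectors in R^n.

    Passing is numerical evidence only, not a proof; axiom (i) is checked
    only in the forms norm(v) >= 0 and norm(0) == 0.
    """
    rng = random.Random(0)  # fixed seed so the check is reproducible
    if norm([0.0] * n) != 0.0:
        return False
    for _ in range(trials):
        v = [rng.uniform(-10, 10) for _ in range(n)]
        w = [rng.uniform(-10, 10) for _ in range(n)]
        a = rng.uniform(-10, 10)
        if norm(v) < 0:                                                    # (i)
            return False
        scaled = norm([a * x for x in v])
        if abs(scaled - abs(a) * norm(v)) > tol * (1.0 + scaled):          # (ii)
            return False
        if norm([x + y for x, y in zip(v, w)]) > norm(v) + norm(w) + tol:  # (iii)
            return False
    return True


v = [3.0, -4.0, 0.0]
print(norm_1(v), norm_2(v), norm_inf(v))  # 7.0 5.0 4.0
```

The same checker rejects a non-norm such as the sum of squares $\sum v_i^2$, which fails axiom (ii).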
Example 9.9. Any positive multiple of a norm is also a norm. For instance,
$$\|v\| = \frac{1}{n} \sum_{i=1}^n |v_i|$$
is a norm on $\mathbb{R}^n$. It is $1/n$ times the 1-norm $\|\cdot\|_1$ on $\mathbb{R}^n$. Observe that this new norm is
simply the mean of the absolute values of the entries of $v$. In particular, it is not hard to
see how this norm would come up in statistics.
Example 9.10. If $V = C([a, b])$, the vector space of continuous functions on $[a, b]$, then we
have a choice of many possible norms. The following norms on $C([a, b])$ are all extremely
important:
$$\|f\|_1 = \int_a^b |f(x)|\,dx, \qquad \|f\|_2 = \sqrt{\int_a^b |f(x)|^2\,dx}, \qquad \|f\|_\infty = \sup_{a \leq x \leq b} |f(x)|.$$
Since the functions we are considering are continuous, the preceding norms are
actually well-defined. For instance, if $f$ is continuous, then $|f|$ attains an absolute maximum
on $[a, b]$ by the Extreme Value Theorem from Calculus I, so $\|f\|_\infty$ is well-defined. In fact, you have
been using the $\infty$-norm (also called the sup norm) for most of your mathematical career, since
$$\|f\|_\infty = \text{the absolute maximum of } |f(x)| \text{ on } [a, b].$$
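For intuition, the three norms can be approximated numerically. The following Python sketch is our own illustration (the name approx_norms and the default of 100,000 subintervals are assumptions): it uses the midpoint rule for the two integrals and a maximum over equally spaced sample points in place of the supremum, so every value it returns is only an approximation of the true norm.

```python
import math
from typing import Callable, Tuple


def approx_norms(f: Callable[[float], float], a: float, b: float,
                 n: int = 100_000) -> Tuple[float, float, float]:
    """Approximate (||f||_1, ||f||_2, ||f||_inf) for f in C([a, b]).

    The integrals are approximated by the midpoint rule with n subintervals,
    and the supremum by a maximum over n + 1 equally spaced sample points,
    so all three returned values are approximations only.
    """
    h = (b - a) / n
    midpoints = [a + (k + 0.5) * h for k in range(n)]
    norm_1 = h * sum(abs(f(x)) for x in midpoints)
    norm_2 = math.sqrt(h * sum(f(x) ** 2 for x in midpoints))
    grid = [a + k * h for k in range(n + 1)]
    norm_inf = max(abs(f(x)) for x in grid)
    return norm_1, norm_2, norm_inf


# For f = sin on [0, pi] the exact values are 2, sqrt(pi/2), and 1.
print(approx_norms(math.sin, 0.0, math.pi))
```

Sampling can only underestimate a supremum; for continuous $f$ the Extreme Value Theorem guarantees there is a true maximum for the samples to approach.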
Example 9.11. If $V$ denotes the vector space of all possible functions $f : [a, b] \to \mathbb{R}$,
then none of the norms from the preceding examples can be used on $V$. Without any
restrictions on $f$, the integrals defining $\|f\|_1$ and $\|f\|_2$ need not exist (they can blow up or be
undefined), and the supremum defining $\|f\|_\infty$ can be infinite: for instance, take $f(a) = 0$ and
$f(x) = 1/(x - a)$ for $a < x \leq b$. In other words, this vector space is simply too large to carry
any of these useful geometric structures.
LECTURE 10
Metric Spaces
10.1. Metric Spaces
Having seen that normed linear spaces and inner product spaces (see Notes on Inner
Products) are natural generalizations of $\mathbb{R}^n$, we now turn to metric spaces, which can be
loosely characterized as sets in which we have a halfway decent notion of distance.
Definition. A metric space is a set $M$, whose elements are called points, endowed with a
metric $d : M \times M \to \mathbb{R}$ that satisfies the following properties:
(i) $d(x, y) \geq 0$ for all $x, y \in M$. Moreover, $d(x, y) = 0$ if and only if $x = y$,
(ii) $d(x, y) = d(y, x)$ for all $x, y \in M$,
(iii) $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z \in M$.
The third property is called the Triangle Inequality.
The basic idea is that $d$ is a distance function which assigns a distance $d(x, y)$ to
any two points $x, y \in M$. Essentially, metric spaces are the most general mathematical
objects to which the notion of distance applies. Many metric spaces are familiar; others
are quite strange and pathological. Since a set can often be equipped with more than one
metric, we sometimes say that $(M, d)$ is a metric space when we want to be specific about
which metric $d$ we are using.
Example 10.1. Any normed vector space is automatically a metric space. The metric is
simply given by
$$d(x, y) = \|x - y\|.$$
Properties (i) and (ii) of a metric are obviously satisfied, and the triangle inequality (iii)
follows from the short computation
$$\begin{aligned}
d(x, z) &= \|x - z\| \\
&= \|(x - y) + (y - z)\| \\
&\leq \|x - y\| + \|y - z\| \\
&= d(x, y) + d(y, z).
\end{aligned}$$
In particular, $\mathbb{R}^n$ is a metric space in several different ways (recall that there are many
different ways to place a norm on $\mathbb{R}^n$). When confusion might occur, we should be specific
about which metric we are using.
Example 10.2. The set
$$M_n(\mathbb{R}) = \{A : A \text{ is an } n \times n \text{ matrix}\}$$
is a metric space. Indeed, it can be viewed as $\mathbb{R}^{n^2}$, and hence we can equip $M_n(\mathbb{R})$ with
any of the metrics that we place on $\mathbb{R}^{n^2}$. For instance, if $a_{ij}$ denotes the $ij$th entry of $A$,
then
$$\|A\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij}|^2}$$
defines a norm on $M_n(\mathbb{R})$. Thus if $A$ and $B$ are $n \times n$ matrices with entries $a_{ij}$ and $b_{ij}$, then
$$d_2(A, B) = \|A - B\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij} - b_{ij}|^2}$$
gives us a metric on $M_n(\mathbb{R})$. Other useful metrics are
$$d_1(A, B) = \|A - B\|_1 = \sum_{i,j=1}^n |a_{ij} - b_{ij}|$$
and
$$d_\infty(A, B) = \|A - B\|_\infty = \max_{1 \leq i,j \leq n} |a_{ij} - b_{ij}|.$$
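The identification of $M_n(\mathbb{R})$ with $\mathbb{R}^{n^2}$ is easy to see in code: flatten the entries of $A - B$ into one list and apply the vector norms to it. The Python sketch below is our own illustration; matrices are assumed to be given as lists of rows, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def entry_differences(A: Matrix, B: Matrix) -> List[float]:
    """List the n^2 numbers a_ij - b_ij, i.e. view A - B as a vector in R^(n^2)."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("A and B must have the same shape")
    return [a - b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b)]


def d_1(A: Matrix, B: Matrix) -> float:
    """d_1(A, B): the sum of |a_ij - b_ij| over all entries."""
    return sum(abs(x) for x in entry_differences(A, B))


def d_2(A: Matrix, B: Matrix) -> float:
    """d_2(A, B): the square root of the sum of |a_ij - b_ij|^2."""
    return math.sqrt(sum(x * x for x in entry_differences(A, B)))


def d_inf(A: Matrix, B: Matrix) -> float:
    """d_inf(A, B): the largest |a_ij - b_ij|."""
    return max(abs(x) for x in entry_differences(A, B))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [0.0, 4.0]]
print(d_1(A, B), d_2(A, B), d_inf(A, B))  # 5.0, then sqrt(13) = 3.6055..., then 3.0
```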
One important thing to note is that, unlike normed vector spaces (which include inner
product spaces), general metric spaces come with no notion of addition or scalar
multiplication.
Definition. If $X$ is a nonempty set, then we define the discrete metric on $X$ to be the
metric defined by
$$d(x, y) = \begin{cases} 0 & x = y, \\ 1 & x \neq y. \end{cases}$$
The discrete metric comes up occasionally in graph theory and computer science. If
$X$ is equipped with the discrete metric, then every point of $X$ lies at distance 1 from
every other point. This is a difficult thing to visualize. For instance, try to picture
what $\mathbb{R}$ looks like when equipped with the discrete metric.
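On a finite set, the metric axioms can be verified by brute force, which gives a quick sanity check of the discrete metric. The following Python sketch is our own illustration (the names discrete_metric and is_metric_on are hypothetical); it only applies to finite sets, since it literally tries every pair and triple of points.

```python
from itertools import product
from typing import Callable, Hashable, Iterable


def discrete_metric(x: Hashable, y: Hashable) -> int:
    """d(x, y) = 0 if x == y, and 1 otherwise."""
    return 0 if x == y else 1


def is_metric_on(points: Iterable, d: Callable) -> bool:
    """Brute-force check of metric axioms (i)-(iii) on a FINITE set of points."""
    pts = list(points)
    for x, y in product(pts, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):   # (i)
            return False
        if d(x, y) != d(y, x):                          # (ii)
            return False
    for x, y, z in product(pts, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):                 # (iii)
            return False
    return True


print(is_metric_on(["a", "b", "c", "d"], discrete_metric))  # True
```

By contrast, $d(x, y) = (x - y)^2$ on $\{0, 1, 2\}$ fails the triangle inequality, since $d(0, 2) = 4 > 2 = d(0, 1) + d(1, 2)$.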
Definition. If $X$ is a metric space and $Y \subseteq X$, then $Y$ is also a metric space (when
equipped with the metric inherited from $X$). We say that $Y$ is a subspace of $X$.¹
Example 10.3. A Möbius strip in $\mathbb{R}^3$ (with the standard metric) is also a metric space.
The set of all invertible $n \times n$ matrices is a metric subspace of $M_n(\mathbb{R})$, although it is not a
vector subspace, since it is not closed under addition: $I$ and $-I$ are invertible, but their sum $0$ is not.
Example 10.4. The spiral
$$Y = \{(r \cos r, r \sin r) \in \mathbb{R}^2 : r \geq 0\}$$
is a subspace of $X = \mathbb{R}^2$ (equipped with the usual Euclidean metric).

Essentially every subset of a metric space gives you a new metric space. However,
we must be careful with the word "subspace." If $V$ is a normed vector space, then it is
automatically a metric space with $d(x, y) = \|x - y\|$. Any subset of $V$ is automatically a
metric space, but not all subsets of $V$ are normed vector spaces, since not all subsets of $V$
are vector spaces.
¹ Do not confuse this with the notion of subspace from linear algebra.
10.2. Convergent Sequences

A sequence of points in a metric space $M$ is simply a list $x_0, x_1, x_2, \ldots$ of points
in $M$ (possibly with repetition). Formally speaking, a sequence is a function $f : \mathbb{N} \to M$,
and what we think of as the $n$th term $x_n$ of the sequence is actually $f(n)$. In fact, every
function $f : \mathbb{N} \to M$ defines a sequence and vice-versa. However, the subscript notation
is more natural.
Definition. A sequence $x_n$ in a metric space $(M, d)$ converges to the limit $x \in M$ if for
every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $d(x_n, x) < \epsilon$ whenever $n \geq N$. In symbols:
$$(\forall \epsilon > 0)(\exists N \in \mathbb{N})(n \geq N \Rightarrow d(x_n, x) < \epsilon).$$
We denote this by $\lim_{n \to \infty} x_n = x$, $x_n \to x$, or $x_n \xrightarrow{d} x$ (when we wish to be specific
about which metric we are using).
It should be remarked that when using the notation $\lim_{n \to \infty} x_n = x$ or $x_n \to x$, it is
important that we know which metric we are using. Only use this notation when there is
no chance of confusion.
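The definition asks for a recipe that turns each $\epsilon$ into an $N$. As a hedged illustration (our own example, not from the notes), take $x_n = 1/(n + 1)$ in $(\mathbb{R}, |\cdot|)$: if $n \geq \lfloor 1/\epsilon \rfloor$, then $n + 1 > 1/\epsilon$, so $x_n < \epsilon$. The Python sketch below returns one more than that bound as a margin against floating-point rounding, and spot-checks the defining implication on finitely many $n$, which of course is not a proof.

```python
import math


def x(n: int) -> float:
    """The sequence x_n = 1/(n + 1), n = 0, 1, 2, ..., converging to 0 in (R, |.|)."""
    return 1.0 / (n + 1)


def witness_N(eps: float) -> int:
    """Return an N such that n >= N implies |x_n - 0| < eps.

    If n >= floor(1/eps), then n + 1 > 1/eps, so 1/(n + 1) < eps; we add 1
    to that bound as a margin against floating-point rounding.
    """
    return math.floor(1.0 / eps) + 1


for eps in [0.5, 0.1, 0.003]:
    N = witness_N(eps)
    # A finite spot-check of "n >= N implies d(x_n, 0) < eps" (not a proof).
    assert all(abs(x(n) - 0.0) < eps for n in range(N, N + 10_000))
    print(f"eps = {eps}: N = {N}")
```

Any larger $N$ would also work; the definition only requires that some $N$ exists for each $\epsilon$.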
The following simple fact from arithmetic is often useful:
Lemma 1 (The $\epsilon$-Principle). Let $x \in \mathbb{R}$. If $|x| < \epsilon$ for every $\epsilon > 0$, then $x = 0$.
Proof. By the Trichotomy Law, either $x < 0$, $x = 0$, or $x > 0$. If $x > 0$, then let $\epsilon = x/2$
so that $0 < x < x/2$; dividing by $x/2 > 0$ leads to the contradiction $2 < 1$. Similar reasoning
(with $\epsilon = -x/2$) shows that $x < 0$ is impossible as well. Therefore $x = 0$.

The following theorem is a classic example of an $\epsilon/2$-argument:
Theorem 10.1. Limits are unique (when they exist). In other words, if $x_n$ is a sequence of
points in a metric space $M$ such that $x_n \to x$ and $x_n \to y$, then $x = y$.
Proof. Let $\epsilon > 0$. By the definition of convergence, there exist $N_x, N_y \in \mathbb{N}$ such that
$$n \geq N_x \Rightarrow d(x_n, x) < \epsilon/2 \qquad \text{and} \qquad n \geq N_y \Rightarrow d(x_n, y) < \epsilon/2.$$
Let $N = \max\{N_x, N_y\}$. If $n \geq N$, then it follows from the triangle inequality that
$$d(x, y) \leq d(x, x_n) + d(x_n, y) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
Thus $0 \leq d(x, y) < \epsilon$ for every $\epsilon > 0$, whence $d(x, y) = 0$ by the $\epsilon$-Principle. By the
definition of a metric, it follows that $x = y$, as desired.

Another important property of convergent sequences is that they are always bounded:
Theorem 10.2. A convergent sequence is bounded. In other words, if $\lim_{n \to \infty} x_n = x$,
then there exists $R > 0$ so that $d(x_n, x) \leq R$ for all $n \in \mathbb{N}$.
Proof. Letting $\epsilon = 1$ in the definition of convergence, we find that there exists $N \in \mathbb{N}$ so
that $d(x_n, x) < 1$ whenever $n \geq N$. Since
$$d(x_0, x), d(x_1, x), \ldots, d(x_{N-1}, x)$$
is a finite sequence, there exists $R_0 > 0$ so that
$$d(x_n, x) < R_0, \qquad n = 0, 1, 2, \ldots, N - 1.$$
This implies that the inequality
$$d(x_n, x) < \max\{1, R_0\} = R$$
holds for every $n \in \mathbb{N}$.
Geometrically speaking, the preceding theorem asserts that the open ball
$$B_R(x) = \{y \in M : d(x, y) < R\}$$
centered at $x$ with radius $R$ contains every point of the sequence $x_n$.
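The proof of Theorem 10.2 is constructive, so it can be traced on a concrete sequence. In the hedged Python sketch below (our own illustration; bound_from_proof is a hypothetical name), the caller supplies an $N$ that works for $\epsilon = 1$, $R_0$ is taken to be one more than the largest of the finitely many earlier distances, and the function returns $R = \max\{1, R_0\}$. The example sequence $x_n = 5/(n + 1)$ is an assumption chosen for illustration.

```python
from typing import Callable


def bound_from_proof(x: Callable[[int], float], limit: float, N: int) -> float:
    """Follow the proof of Theorem 10.2 for a sequence in (R, |.|).

    Assumes (not checked) that |x_n - limit| < 1 for every n >= N, which is
    the epsilon = 1 step of the proof. R0 is taken to be one more than the
    largest of the finitely many distances |x_n - limit| with n < N.
    """
    R0 = max((abs(x(n) - limit) for n in range(N)), default=0.0) + 1.0
    return max(1.0, R0)


def x(n: int) -> float:
    """Example sequence x_n = 5/(n + 1) -> 0; here |x_n| < 1 exactly when n >= 5."""
    return 5.0 / (n + 1)


R = bound_from_proof(x, 0.0, N=5)
print(R)  # 6.0
```

Here the earlier distances are $5, 2.5, 5/3, 1.25, 1$, so $R_0 = 6$ and every term of the sequence lies in $B_6(0)$.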
LECTURE 11
Subsequences, Continuity
11.1. Subsequences
Definition. If $x_n$ is a sequence in a metric space $(M, d)$, then we say that $y_k$ is a subsequence of $x_n$ if there is a sequence $n_k$ of natural numbers such that
$$0 \leq n_0 < n_1 < n_2 < \cdots \quad \text{and} \quad y_k = x_{n_k}.$$
Observe that the terms in the subsequence $y_k$ must appear in the same order that they
appeared in $x_n$. Also note that $k \leq n_k$ for all $k \in \mathbb{N}$ (by induction: $n_0 \geq 0$, and
$n_{k+1} \geq n_k + 1 \geq k + 1$).
Example 11.1. The sequence
$$1, \frac{1}{2}, \frac{1}{3}, \ldots$$
converges to 0 in the metric space $(\mathbb{R}, d)$, where $d(x, y) = |x - y|$ is the usual metric. The
sequence
$$1, \frac{1}{3}, \frac{1}{5}, \frac{1}{7}, \frac{1}{9}, \ldots$$
is a subsequence of the original sequence. On the other hand,
$$\frac{1}{5}, \frac{1}{3}, \frac{1}{3}, \frac{1}{9}, 1, \ldots$$
is not a subsequence, for a variety of reasons. First, the terms do not appear in the same
order as they did in the original sequence. Second, $\frac{1}{3}$ appears twice, but occurs only
once in the original sequence.
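A subsequence is determined by its index sequence $n_k$, so checking that a list of terms is a subsequence amounts to checking that the indices are nonnegative and strictly increasing. The Python sketch below is our own illustration (helper names are hypothetical); it indexes $1, \frac{1}{2}, \frac{1}{3}, \ldots$ from $n = 0$ and, since a program can only inspect finitely many indices, checks only the ones supplied.

```python
from typing import Callable, List, Sequence


def is_increasing_indices(idx: Sequence[int]) -> bool:
    """Check 0 <= n_0 < n_1 < ... on the finitely many indices supplied."""
    return all(i >= 0 for i in idx) and all(a < b for a, b in zip(idx, idx[1:]))


def subsequence(x: Callable[[int], float], idx: Sequence[int]) -> List[float]:
    """Return the terms y_k = x_{n_k} for the supplied indices n_0 < n_1 < ..."""
    if not is_increasing_indices(idx):
        raise ValueError("indices must be nonnegative and strictly increasing")
    return [x(n) for n in idx]


def x(n: int) -> float:
    """The sequence 1, 1/2, 1/3, ... indexed from n = 0, so x_n = 1/(n + 1)."""
    return 1.0 / (n + 1)


# n_k = 2k picks out 1, 1/3, 1/5, 1/7, 1/9; note that k <= n_k.
print(subsequence(x, [2 * k for k in range(5)]))

# The list 1/5, 1/3, 1/3, 1/9, 1 would need indices 4, 2, 2, 8, 0: not increasing.
print(is_increasing_indices([4, 2, 2, 8, 0]))  # False
```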
Theorem 11.1. Every subsequence of a convergent sequence converges, and it converges
to the same limit as the original sequence does.
Proof. Let $y_k = x_{n_k}$ be a subsequence of $x_n$, where $x_n \to x$. If $\epsilon > 0$, let $N \in \mathbb{N}$ be so large that
$d(x_n, x) < \epsilon$ for $n \geq N$. Since $k \leq n_k$ for all $k \in \mathbb{N}$, it follows that $d(y_k, x) =
d(x_{n_k}, x) < \epsilon$ whenever $k \geq N$. Thus $y_k \to x$.

Keep in mind the following examples of sequences and subsequences.
Example 11.2. The sequence $x_n = (-1)^n$ in $\mathbb{R}$ (with the usual metric) does not converge.
However, the subsequences $1, 1, 1, \ldots$ and $-1, -1, -1, \ldots$ do converge (to different limits).
Example 11.3. The sequence (defined for $n \geq 1$)
$$x_n = \begin{cases} 1/n & n \text{ even}, \\ n & n \text{ odd} \end{cases}$$
in $\mathbb{R}$ (with the standard metric) does not converge. However, one can show that every
subsequence of $x_n$ which does converge converges to 0.

11.2. Continuity
The principal objects of study in analysis are functions. In particular, we are interested in continuous functions. The following definition is a generalization of the notion of
continuity encountered in calculus:
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. We say that a function $f : A \to B$
is continuous at a point $x_0 \in A$ if
$$(\forall \epsilon > 0)(\exists \delta > 0)(d_A(x, x_0) < \delta \Rightarrow d_B(f(x), f(x_0)) < \epsilon).$$
We say that $f : A \to B$ is a continuous function on $A$ if $f$ is continuous at each point of
$A$.
Example 11.4. The preceding definition coincides with the definition of continuity from
calculus, when A = B = R, equipped with the regular metric d(x, y) = |x y|. Feel free
to assume that the continuous functions you knew from calculus are actually continuous.
Example 11.5. Since a norm on a vector space always gives rise to a metric, it follows that
$(M_n(\mathbb{R}), d)$ is a metric space, where
$$d(A, B) = \|A - B\|_2 = \sqrt{\sum_{i,j=1}^n |a_{ij} - b_{ij}|^2}.$$
In fact, this is essentially the Euclidean metric on $\mathbb{R}^{n^2}$! Thus the determinant function
$\det : M_n(\mathbb{R}) \to \mathbb{R}$ is continuous (with respect to any of the metrics above) since $\det$
is a polynomial in the $n^2$ real variables $a_{ij}$ (where $1 \leq i, j \leq n$) and we know from
Multivariable Calculus that polynomial functions are continuous functions.
The following theorem is an example of a standard continuity argument:
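Theorem 11.2. Let $(A, d_A)$ be a metric space and let $B$ be a normed vector space with
norm $\|\cdot\|_B$.¹ If $f : A \to B$ and $g : A \to B$ are continuous, then $f + g$ is also continuous.
Proof. This is another $\epsilon/2$ argument. Fix $x \in A$, let $\epsilon > 0$ be given, and note that the definition of
continuity (of $f$ and of $g$ at $x$) gives us $\delta_1 > 0$ and $\delta_2 > 0$ so that
$$d_A(x, y) < \delta_1 \Rightarrow d_B(f(x), f(y)) < \frac{\epsilon}{2}, \qquad d_A(x, y) < \delta_2 \Rightarrow d_B(g(x), g(y)) < \frac{\epsilon}{2}.$$
Let $\delta = \min\{\delta_1, \delta_2\}$. Then $d_A(x, y) < \delta$ implies that
$$\begin{aligned}
d_B((f + g)(x), (f + g)(y)) &= \|(f + g)(x) - (f + g)(y)\|_B \\
&= \|(f(x) + g(x)) - (f(y) + g(y))\|_B \\
&= \|(f(x) - f(y)) + (g(x) - g(y))\|_B \\
&\leq \|f(x) - f(y)\|_B + \|g(x) - g(y)\|_B \\
&= d_B(f(x), f(y)) + d_B(g(x), g(y)) \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}$$
Since $x \in A$ was arbitrary, $f + g$ is continuous.
¹ Recall that $B$ is automatically a metric space with metric $d_B(x, y) = \|x - y\|_B$.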
Example 11.6. If we take A = B = R with the usual metric, then the preceding theorem
simply states that the sum of two continuous functions is continuous.
LECTURE 12
Sequences and Continuity

12.1. Sequential Characterization of Continuity
The following theorem is extremely useful for establishing the convergence of certain
sequences:
Theorem 12.1 (Squeeze Theorem). Let $a_n \geq 0$ be a sequence in $\mathbb{R}$ such that $a_n \to 0$. If
$x_n$ is a sequence of points in a metric space $(M, d)$ and $x \in M$ satisfies $d(x, x_n) \leq a_n$ for all $n$, then
$x_n \to x$.
Proof. Let $\epsilon > 0$ be given and let $N \in \mathbb{N}$ be so large that $n \geq N$ implies that $a_n < \epsilon$. It
follows that $n \geq N$ implies $d(x, x_n) \leq a_n < \epsilon$, whence $x_n \to x$.

Using the preceding, we can establish the following sequential characterization of
continuity:
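Theorem 12.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A function $f : A \to B$ is
continuous at a point $a \in A$ if and only if
$$a_n \xrightarrow{d_A} a \quad \Rightarrow \quad f(a_n) \xrightarrow{d_B} f(a).$$
In particular, if $f$ is continuous at a point $a \in A$, then whenever $a_n \xrightarrow{d_A} a$,
$$\lim_{n \to \infty} f(a_n) = f\Big(\lim_{n \to \infty} a_n\Big). \qquad (12.1)$$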
Proof. (⇒) Suppose that $f$ is continuous at $a$ and that $a_n \xrightarrow{d_A} a$. Let $\epsilon > 0$ and use the
definition of continuity to obtain a $\delta > 0$ so that
$$d_A(a, a') < \delta \Rightarrow d_B(f(a), f(a')) < \epsilon.$$
The definition of convergence gives us $N \in \mathbb{N}$ so that
$$n \geq N \Rightarrow d_A(a_n, a) < \delta.$$
Putting this together, we see that
$$n \geq N \Rightarrow d_B(f(a_n), f(a)) < \epsilon,$$
and hence $f(a_n) \xrightarrow{d_B} f(a)$ as desired.

(⇐) Suppose toward a contradiction that (12.1) holds for every sequence $a_n$ which converges to $a$, but that $f$ is not continuous at $a$. To see what this means, we must negate the
following:
$$(\forall \epsilon > 0)(\exists \delta > 0)(\forall a' \in A)(d_A(a, a') < \delta \Rightarrow d_B(f(a), f(a')) < \epsilon).$$
Performing the negation, we find that
$$(\exists \epsilon > 0)(\forall \delta > 0)(\exists a' \in A)(d_A(a, a') < \delta \text{ and } d_B(f(a), f(a')) \geq \epsilon).$$
Applying this successively with $\delta = \frac{1}{2^n}$, we obtain a sequence $a_n$ such that
$$d_A(a, a_n) < \frac{1}{2^n} \quad \text{and} \quad d_B(f(a), f(a_n)) \geq \epsilon.$$
By the Squeeze Theorem, $a_n \xrightarrow{d_A} a$, whence $f(a_n) \xrightarrow{d_B} f(a)$ by hypothesis. However, this
contradicts the fact that $d_B(f(a_n), f(a)) \geq \epsilon$ for all $n \in \mathbb{N}$. Thus $f$ must actually be
continuous at $a$, as desired.
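Theorem 12.2 also gives a practical test for discontinuity: exhibit one sequence $a_n \to a$ with $f(a_n) \not\to f(a)$. The Python sketch below is our own illustration using an assumed step function (0 for $t < 0$, 1 for $t \geq 0$) and the sequence $a_n = -1/2^n \to 0$; the step values stay at 0 while $f(0) = 1$, so the step function is not continuous at 0. The comparison with $t^2$ is numerical illustration only, not a proof of continuity.

```python
def step(t: float) -> float:
    """A step function: 0 for t < 0 and 1 for t >= 0 (discontinuous at 0)."""
    return 0.0 if t < 0 else 1.0


def square(t: float) -> float:
    """t^2, which is continuous at 0, for comparison."""
    return t * t


# a_n = -1/2^n converges to 0 in (R, |.|).
a = [-1.0 / 2 ** n for n in range(1, 40)]

print([step(t) for t in a[-3:]], step(0.0))      # step(a_n) stays 0, but step(0) = 1
print([square(t) for t in a[-3:]], square(0.0))  # square(a_n) approaches square(0) = 0
```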
12.2. Continuity and Composition

Using the sequential characterization of continuity, we can see that the composition of
continuous functions is continuous:
Theorem 12.3. The composition of continuous functions is continuous.
Pf. 1. Let $(A, d_A)$, $(B, d_B)$, and $(C, d_C)$ be three metric spaces and let $f : A \to B$ and
$g : B \to C$ be two continuous functions. Let $a_n \xrightarrow{d_A} a$ be a convergent sequence in $A$.
Since $f$ is continuous, it follows that
$$f(a_n) \xrightarrow{d_B} f(a).$$
Since $g$ is continuous, it follows that
$$g(f(a_n)) \xrightarrow{d_C} g(f(a)).$$
Therefore $(g \circ f)(a_n) \xrightarrow{d_C} (g \circ f)(a)$ whenever $a_n \xrightarrow{d_A} a$. By the previous theorem, we can
conclude that the composition $g \circ f : A \to C$ is continuous.
Of course, we can also prove this theorem using the definition:

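Pf. 2. Let $(A, d_A)$, $(B, d_B)$, and $(C, d_C)$ be metric spaces and let $f : A \to B$ and $g :
B \to C$ be continuous functions. Fix $a_1 \in A$ and let $\epsilon > 0$ be given. The continuity of $g$ at $f(a_1)$ gives us
some $\eta > 0$ so that
$$d_B(f(a_1), b) < \eta \Rightarrow d_C(g(f(a_1)), g(b)) < \epsilon.$$
By the continuity of $f$ at $a_1$, there exists $\delta > 0$ so that
$$d_A(a_1, a_2) < \delta \Rightarrow d_B(f(a_1), f(a_2)) < \eta.$$
We conclude that there exists $\delta > 0$ so that
$$d_A(a_1, a_2) < \delta \Rightarrow d_C(g(f(a_1)), g(f(a_2))) < \epsilon,$$
whence $g \circ f$ is continuous at $a_1$. Since $a_1 \in A$ was arbitrary, $g \circ f$ is continuous.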


12.3. Limit, Accumulation, and Isolated Points
Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$.
(i) A point $x \in M$ is a limit point of $S$ if there exists a sequence $x_n$ in $S$ so that
$x_n \to x$.
(ii) A point $x \in M$ is an accumulation point of $S$ if there exists a sequence $x_n$ of
distinct points of $S$ so that $x_n \to x$.
(iii) A point $x \in M$ is called an isolated point of $S$ if there exists $\epsilon > 0$ such that
$B_\epsilon(x) \cap S = \{x\}$.
Here $B_\epsilon(x)$ denotes the open ball of radius $\epsilon$ centered at $x$.

In particular, note that
• An accumulation point of $S$ is automatically a limit point of $S$,
• An isolated point of $S$ automatically belongs to $S$.

The following example illustrates more basic facts about limit, accumulation, and isolated
points:
Example 12.1. Let M = R and d(x, y) = |x y|. If
(i) S = [0, 1), then 1 is a limit point and an accumulation point of S. In particular,
note that neither a limit point nor an accumulation point of S need actually
belong to S.
(ii) S = {0}, then 0 is an isolated point of S. On the other hand, 0 is also a limit
point of S since the sequence 0, 0, 0, . . . of points of S converges to 0.
LECTURE 13
Closed Sets
Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$.
(i) A point $x \in M$ is a limit point of $S$ if there exists a sequence $x_n$ in $S$ so that
$x_n \to x$.
(ii) A point $x \in M$ is an accumulation point of $S$ if there exists a sequence $x_n$ of
distinct points of $S$ so that $x_n \to x$.
(iii) A point $x \in M$ is called an isolated point of $S$ if there exists $\epsilon > 0$ such that
$B_\epsilon(x) \cap S = \{x\}$.
Here $B_\epsilon(x)$ denotes the open ball of radius $\epsilon$ centered at $x$.
In particular, note that
• An accumulation point of $S$ is automatically a limit point of $S$,
• An isolated point of $S$ automatically belongs to $S$.
The following example illustrates more basic facts about limit, accumulation, and isolated
points:
Example 13.1. Let M = R and d(x, y) = |x y|. If
(i) S = [0, 1), then 1 is a limit point and an accumulation point of S. In particular,
note that neither a limit point nor an accumulation point of S need actually
belong to S.
(ii) S = {0}, then 0 is an isolated point of S. On the other hand, 0 is also a limit
point of S since the sequence 0, 0, 0, . . . of points of S converges to 0.
Theorem 13.1. Let (A, dA ) and (B, dB ) be metric spaces. If a is an isolated point of A
and f : A B is any function, then f is continuous at a.
Proof. Since $a$ is an isolated point of $A$, there exists $\delta > 0$ such that $d_A(x, a) < \delta$ implies
that $x = a$. If $\epsilon > 0$ is given, then observe that
$$d_A(x, a) < \delta \Rightarrow x = a \Rightarrow d_B(f(x), f(a)) = 0 < \epsilon.$$
Thus $f$ is continuous at $a$.

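Theorem 13.2. If $(M, d)$ is a metric space and $S \subseteq M$, then $x \in S$ is an accumulation
point of $S$ if and only if $x$ is not an isolated point of $S$.
Proof. (⇒) If $x \in S$ is an accumulation point of $S$, then there exists a sequence $x_n$ of distinct
points of $S$ such that $x_n \to x$. Since the $x_n$ are distinct, at most one of them equals $x$; discarding
that term if necessary, we may assume that $x_n \neq x$ for all $n$. It follows that for each
$\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $n \geq N$ implies that $d(x, x_n) < \epsilon$. This implies that
$x_N \in B_\epsilon(x) \cap S$ with $x_N \neq x$, so $B_\epsilon(x) \cap S \neq \{x\}$ for every $\epsilon > 0$, whence $x$ is not an isolated point of $S$.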
(⇐) Suppose that $x \in S$ is not an isolated point of $S$. We will inductively define a sequence
$x_n$ of distinct points of $S \setminus \{x\}$ such that $x_n \to x$. First, since $B_1(x) \cap S \neq \{x\}$, we can select a point
$x_0 \in S$ with $x_0 \neq x$ and $d(x, x_0) < 1$.
Now assume that we have found distinct points $x_0, x_1, \ldots, x_n$ of $S \setminus \{x\}$ such that
$d(x, x_k) < \frac{1}{2^k}$ for $k = 0, 1, \ldots, n$.
We claim that there exists a point $x_{n+1} \in S \setminus \{x\}$, not among the points $x_0, x_1, \ldots, x_n$,
such that $d(x, x_{n+1}) < \frac{1}{2^{n+1}}$. Indeed, suppose toward a contradiction that no such choice
of $x_{n+1}$ is possible. Choose $\epsilon$ with
$$0 < \epsilon < \min\Big\{\frac{1}{2^{n+1}}, d(x, x_0), d(x, x_1), \ldots, d(x, x_n)\Big\}$$
(this minimum is positive since each $x_k \neq x$) and observe that
$$B_\epsilon(x) \cap S = \{x\},$$
whence $x$ is an isolated point of $S$. Since this is a contradiction, it follows that such an
$x_{n+1}$ can always be found.
The sequence $x_n$ so constructed satisfies $d(x, x_n) < \frac{1}{2^n}$ for all $n \in \mathbb{N}$, whence $x_n \to x$
by the Squeeze Theorem. By definition, it follows that $x$ is an accumulation point of $S$.
13.2. Closed Sets
Definition. The set of limit points of $S$ is denoted $\overline{S}$ and called the closure of $S$ (with
respect to $M$ and $d$). A subset $S$ of a metric space $(M, d)$ is called closed (with respect to
$M$ and $d$) if every limit point of $S$ belongs to $S$. In other words, $S$ is closed if and only if
$\overline{S} = S$.
Example 13.2. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are both closed sets. Indeed, the
closure of $\emptyset$ is $\emptyset$ since there are no elements of $\emptyset$ to make sequences with. On the other
hand, $M$ is closed since any convergent sequence in $M$ converges to a point of $M$. Thus
$M$ contains all of its limit points.
Lemma 2. $S \subseteq \overline{S}$ holds for any subset $S$ of a metric space $(M, d)$.
Proof. Each element $x \in S$ is automatically a limit point of $S$. Indeed, $x$ is the limit of
the constant sequence $x, x, x, \ldots$.

Example 13.3. This example illustrates that one needs to make sure that the "big" metric
space is declared beforehand. For instance, $\mathbb{Q}$ is a closed subset of the metric space $(\mathbb{Q}, d)$,
where $d$ denotes the standard metric $d(x, y) = |x - y|$. When using $\mathbb{Q}$ as the big metric
space, it is as if we are assuming that irrational numbers no longer exist; as far as $\mathbb{Q}$ is
concerned, they do not.
On the other hand, $\mathbb{Q}$ is not closed when considered as a subset of the metric space
$(\mathbb{R}, d)$. For example, $\sqrt{2}$ is a limit point of $\mathbb{Q}$ in $\mathbb{R}$ and $\sqrt{2} \notin \mathbb{Q}$. In fact, the density of $\mathbb{Q}$ in
$\mathbb{R}$ implies that $\overline{\mathbb{Q}} = \mathbb{R}$. We therefore see that the property of being closed depends strongly
on the big metric space to which the set in question belongs.
Example 13.4. Consider $C([a, b])$ with the infinity metric:
$$d_\infty(f, g) = \|f - g\|_\infty = \sup_{a \leq x \leq b} |f(x) - g(x)|.$$
Consider the subspace $P = \{p(x) : p \text{ is a polynomial}\}$ of $C([a, b])$. What is $\overline{P}$? This
is a deep and difficult question (which was answered by Weierstrass). We will hopefully
discuss the answer later in the course.
LECTURE 14
Open Sets
14.1. Closed Sets
Definition. The set of all limit points of $S$ is denoted $\overline{S}$ and called the closure of $S$ (with
respect to $M$ and $d$).
Since each element $x \in S$ is a limit point of $S$ (i.e., $x$ is the limit of the constant sequence
$x, x, x, \ldots$), it follows that
$$S \subseteq \overline{S}. \qquad (14.1)$$
Definition. A subset $S$ of a metric space $(M, d)$ is called closed (with respect to $M$ and
$d$) if every limit point of $S$ belongs to $S$. In other words, $S$ is closed if and only if
$\overline{S} = S$.
Example 14.1. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are both closed sets. Indeed, the
closure of $\emptyset$ is $\emptyset$ since there are no elements of $\emptyset$ to make sequences with. On the other
hand, $M$ is closed since any convergent sequence in $M$ converges to a point of $M$. Thus
$M$ contains all of its limit points.
Theorem 14.1. If $(M, d)$ is a metric space and $S \subseteq M$, then
$$\overline{\overline{S}} = \overline{S}.$$
In other words, the closure of a set is a closed set.
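Proof. By (14.1), we need only prove that
$$\overline{\overline{S}} \subseteq \overline{S}. \qquad (14.2)$$
Given $x \in \overline{\overline{S}}$, we must find a sequence $x_n$ in $S$ such that $x_n \to x$. Since $x \in \overline{\overline{S}}$,
we know that there exists a sequence $y_n$ in $\overline{S}$ such that $y_n \to x$. Since each $y_n$ belongs to $\overline{S}$,
it is the limit of a sequence in $S$, so for each $n$ there exists $x_n \in S$ such that
$d(x_n, y_n) < \frac{1}{n + 1}$. Now let $\epsilon > 0$ and choose $N \in \mathbb{N}$ so large that
$$n \geq N \Rightarrow d(x, y_n) < \frac{\epsilon}{2} \quad \text{and} \quad \frac{1}{n + 1} < \frac{\epsilon}{2}.$$
Putting this all together, we find that $n \geq N$ implies that
$$d(x, x_n) \leq d(x, y_n) + d(y_n, x_n) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
Thus $x_n$ is a sequence in $S$ which converges to $x$, whence $x \in \overline{S}$. This establishes (14.2)
and completes the proof.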
Example 14.2. This example illustrates that one needs to make sure that the "big" metric
space is declared beforehand. For instance, $\mathbb{Q}$ is a closed subset of the metric space $(\mathbb{Q}, d)$,
where $d$ denotes the standard metric $d(x, y) = |x - y|$. When using $\mathbb{Q}$ as the big metric
space, it is as if we are assuming that irrational numbers no longer exist; as far as $\mathbb{Q}$ is
concerned, they do not.
On the other hand, $\mathbb{Q}$ is not closed when considered as a subset of the metric space
$(\mathbb{R}, d)$. For example, $\sqrt{2}$ is a limit point of $\mathbb{Q}$ in $\mathbb{R}$ and $\sqrt{2} \notin \mathbb{Q}$. In fact, the density of $\mathbb{Q}$ in
$\mathbb{R}$ implies that $\overline{\mathbb{Q}} = \mathbb{R}$. We therefore see that the property of being closed depends strongly
on the big metric space to which the set in question belongs.
Example 14.3. Consider $C([a, b])$ with the infinity metric:
$$d_\infty(f, g) = \|f - g\|_\infty = \sup_{a \leq x \leq b} |f(x) - g(x)|.$$
Consider the subspace $P = \{p(x) : p \text{ is a polynomial}\}$ of $C([a, b])$. What is $\overline{P}$? This is a deep and difficult question (which was answered by Weierstrass). We will hopefully discuss the answer later in the course.
Theorem 14.2. Let $(M, d)$ be a metric space. For each $x \in M$ and for each $\epsilon > 0$, the set
$$C_\epsilon(x) = \{y \in M : d(x, y) \leq \epsilon\}$$
is a closed subset of $M$. The set $C_\epsilon(x)$ is referred to as the closed ball of radius $\epsilon$ centered at $x$.
Proof. To prove that $C_\epsilon(x)$ is closed, we need only show that $C_\epsilon(x)$ contains all of its limit points. Suppose that $y_n$ is a sequence in $C_\epsilon(x)$ which converges to a point $y$ of $M$. This means that for any $\delta > 0$, there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies d(y_n, y) < \delta.$$
Therefore $n \geq N$ implies that
$$d(x, y) \leq d(x, y_n) + d(y_n, y) < \epsilon + \delta.$$
Since this holds for every $\delta > 0$, it follows from the $\epsilon$-Principle that $d(x, y) \leq \epsilon$, whence $y \in C_\epsilon(x)$.
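Example 14.4. It is not true in general that
$$\overline{B_\epsilon(x)} = C_\epsilon(x) = \{y \in M : d(x, y) \leq \epsilon\}.$$
Let $M$ be any set which contains at least two points, let $d$ be the discrete metric on $M$, and let $\epsilon = 1$. In this case
$$B_1(x) = \{y \in M : d(x, y) < 1\} = \{x\},$$
whence $\overline{B_1(x)} = \{x\}$. (With respect to the discrete metric, a convergent sequence is eventually constant, so $\{x\}$ is closed.) On the other hand,
$$\{y \in M : d(x, y) \leq 1\} = M \neq \{x\}$$
since $M$ contains at least two points.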
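Corollary 5. The interval $[a, b]$ (where $a < b$) is a closed subset of $\mathbb{R}$ (equipped with the standard metric).
Proof. The interval $[a, b]$ is simply
$$C_{\frac{b-a}{2}}\left(\tfrac{b+a}{2}\right) = \left\{y \in \mathbb{R} : \left|y - \tfrac{b+a}{2}\right| \leq \tfrac{b-a}{2}\right\},$$
the closed ball in $\mathbb{R}$ of radius $(b - a)/2$ centered at $(b + a)/2$. Indeed, the inequalities
$$-\frac{b-a}{2} \leq y - \frac{b+a}{2} \leq \frac{b-a}{2}$$
are equivalent to $a \leq y \leq b$.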
14.2. Open Sets

Definition. A subset $S$ of a metric space $(M, d)$ is called open (with respect to $M$ and $d$) if for each $x \in S$ there exists a corresponding $\epsilon > 0$ such that
$$B_\epsilon(x) \subseteq S.$$
Example 14.5. If $(M, d)$ is a metric space, then $\emptyset$ and $M$ are open subsets of $M$.
We must be careful with the term open ball since we now have a technical definition
for the term open. Is what we call an open ball actually an open set, according to our
definition? Fortunately, the answer is yes.
Theorem 14.3. Let $(M, d)$ be a metric space. For each $x \in M$ and each $\epsilon > 0$, the subset
$$B_\epsilon(x) = \{y \in M : d(x, y) < \epsilon\}$$
is an open subset of $M$.
Proof. Let $x \in M$ and let $\epsilon > 0$. If $y \in B_\epsilon(x)$, then $d(x, y) < \epsilon$ and hence
$$r = \epsilon - d(x, y) > 0.$$
Consider the corresponding open $r$-ball centered at $y$:
$$B_r(y) = \{z \in M : d(y, z) < r\}.$$
For each $z$ in $B_r(y)$, it follows that
$$d(x, z) \leq d(x, y) + d(y, z) < d(x, y) + (\epsilon - d(x, y)) = \epsilon.$$
Therefore $z \in B_\epsilon(x)$, whence $B_r(y) \subseteq B_\epsilon(x)$. By the definition of open sets, it follows that $B_\epsilon(x)$ is open.
LECTURE 15
Set Operations with Open and Closed Sets

15.1. Complements of Open and Closed Sets
The relationship between open and closed sets is the following duality theorem:
Theorem 15.1. In a metric space (M, d),
(i) the complement of an open set is closed
(ii) the complement of a closed set is open.
Pf. of (i). Suppose that $S$ is an open subset of $M$. We wish to prove that its complement $S^c$ is closed. Suppose that $x_n$ is a sequence in $S^c$ and that $x_n$ converges to some point $x$ in $M$. Suppose toward a contradiction that $x \notin S^c$ (i.e., $x \in S$). Since $S$ is an open set, there exists $\epsilon > 0$ so that $B_\epsilon(x) \subseteq S$. By the definition of convergence, there exists $N \in \mathbb{N}$ so that $n \geq N$ implies that $d(x_n, x) < \epsilon$. This implies that
$$x_N \in B_\epsilon(x) \subseteq S.$$
Since $x_N \in S$ contradicts the fact that $x_N \in S^c$, it follows that $x \in S^c$. Thus $S^c$ contains all of its limit points, so $S^c$ is closed.
Pf. of (ii). Let $S \subseteq M$ be closed and suppose toward a contradiction that $S^c$ is not open. In other words, suppose that there exists some $x \in S^c$ such that $B_\epsilon(x) \not\subseteq S^c$ for every $\epsilon > 0$. It follows that for every $n \in \mathbb{N}$, there exists $x_n \in S$ such that $x_n \in B_{1/2^n}(x)$. This implies that $d(x_n, x) < \frac{1}{2^n}$, whence $x_n \to x$ by the Squeeze Theorem. Since $S$ is closed, it follows that $x$ belongs to $S$. However, this contradicts the fact that $x$ belongs to $S^c$.

Example 15.1. The half-open intervals $[a, b)$ and $(a, b]$ are neither open nor closed in $\mathbb{R}$. On the other hand, $\emptyset$ and $\mathbb{R}$ are both closed and open in $\mathbb{R}$.
Definition. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is called clopen if $S$ is both open and closed.
Example 15.2. In a metric space $(M, d)$, the sets $\emptyset$ and $M$ are clopen. There are sometimes other clopen sets in a metric space; we will learn more about clopen sets when we discuss connectivity.
15.2. Set Operations with Open and Closed Sets
It turns out that open and closed sets have relatively nice properties, set theoretically
speaking. The following theorem tells us how open sets react to the usual set theoretical
operations:
Theorem 15.2. Let (M, d) be a metric space.

(i) the arbitrary union of open sets is open
(ii) the intersection of finitely many open sets is open
(iii) $\emptyset$ and $M$ are open sets.
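Pf. of (i). Let $I$ denote an index set and consider the indexed union
$$\bigcup_{i \in I} A_i = \{x \in M : (\exists i \in I)(x \in A_i)\}$$
of open sets $A_i$. If $x \in \bigcup_{i \in I} A_i$, then there exists $i \in I$ so that $x \in A_i$. Since the set $A_i$ is open, there exists $\epsilon > 0$ so that $B_\epsilon(x) \subseteq A_i \subseteq \bigcup_{i \in I} A_i$. Thus $\bigcup_{i \in I} A_i$ is open.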
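Pf. of (ii). Let $A_1, A_2, \ldots, A_n$ be open subsets of $M$ and let
$$x \in \bigcap_{i=1}^{n} A_i.$$
Since each $A_i$ is open, there exist corresponding $\epsilon_i > 0$ so that $B_{\epsilon_i}(x) \subseteq A_i$. If
$$\epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\},$$
then $\epsilon > 0$ (this is where finiteness is used: the minimum of finitely many positive numbers is positive) and it follows that
$$B_\epsilon(x) \subseteq B_{\epsilon_i}(x) \subseteq A_i$$
for $i = 1, 2, \ldots, n$. Therefore $B_\epsilon(x) \subseteq \bigcap_{i=1}^{n} A_i$ and hence $\bigcap_{i=1}^{n} A_i$ is an open set.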
Pf. of (iii). Trivial.
We can phrase the preceding theorem in terms of closed sets:

Corollary 6. Let (M, d) be a metric space.
(i) the arbitrary intersection of closed sets is closed
(ii) the union of finitely many closed sets is closed
(iii) $\emptyset$ and $M$ are closed sets.
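Proof. This follows from Theorem 15.2, Theorem 15.1, and de Morgan's laws
$$\Big(\bigcup_{i \in I} A_i\Big)^c = \bigcap_{i \in I} A_i^c, \qquad \Big(\bigcap_{i \in I} A_i\Big)^c = \bigcup_{i \in I} A_i^c,$$
which are valid for any index set $I$ (whether finite or infinite).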
A useful theorem (which we shall not prove, at least yet) is the following:
Theorem 15.3. An open set $S \subseteq \mathbb{R}$ can be uniquely expressed as a countable union of
disjoint open intervals in such a way that the endpoints of these intervals do not belong to
S.
LECTURE 16
Topological Characterization of Continuity

16.1. Inverse Images
Definition. Let $f : A \to B$ be a function and let $Y \subseteq B$. The inverse image of $Y$ under $f$ is the set
$$f^{-1}(Y) = \{a \in A : f(a) \in Y\}.$$
Be aware that the notation $f^{-1}$ has nothing to do with reciprocals or inverse functions. In particular, $f^{-1}(Y)$ exists as a set, even if the function $f$ is not invertible or $Y = \emptyset$.
It is important to note that
$$a \in f^{-1}(Y) \iff f(a) \in Y.$$
It is rather convenient that inverse images work well with the standard set operations:
Theorem 16.1. If $f : A \to B$ is a function and $C, D, Y \subseteq B$, then
(i) $f^{-1}(C \cup D) = f^{-1}(C) \cup f^{-1}(D)$,
(ii) $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$,
(iii) $\big(f^{-1}(Y)\big)^c = f^{-1}(Y^c)$.¹
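Pf. of (i). Since
$$\begin{aligned}
x \in f^{-1}(C \cup D) &\iff f(x) \in C \cup D && \text{def. inv. img.}\\
&\iff (f(x) \in C) \vee (f(x) \in D) && \text{def. of } \cup\\
&\iff (x \in f^{-1}(C)) \vee (x \in f^{-1}(D)) && \text{def. inv. img.}\\
&\iff x \in f^{-1}(C) \cup f^{-1}(D) && \text{def. of } \cup,
\end{aligned}$$
it follows that the conditions for membership in $f^{-1}(C \cup D)$ and $f^{-1}(C) \cup f^{-1}(D)$ are identical.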

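Pf. of (ii). This is similar to the previous proof:
$$\begin{aligned}
x \in f^{-1}(C \cap D) &\iff f(x) \in C \cap D && \text{def. inv. img.}\\
&\iff (f(x) \in C) \wedge (f(x) \in D) && \text{def. of } \cap\\
&\iff (x \in f^{-1}(C)) \wedge (x \in f^{-1}(D)) && \text{def. inv. img.}\\
&\iff x \in f^{-1}(C) \cap f^{-1}(D) && \text{def. of } \cap.
\end{aligned}$$
Since the conditions for membership in $f^{-1}(C \cap D)$ and $f^{-1}(C) \cap f^{-1}(D)$ are identical, it follows that $f^{-1}(C \cap D) = f^{-1}(C) \cap f^{-1}(D)$.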

¹Observe that $Y^c \subseteq B$ and that $f^{-1}(Y)^c \subseteq A$. In other words, the complements are taken with respect to $B$ and $A$, respectively.
We leave the proof of (iii) to the reader.
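As a sanity check (not a proof), the identities of Theorem 16.1 can be tested on small finite sets. The Python sketch below is a minimal illustration. The sets `A` and `B`, the map `f(a) = a % 5`, and the helper `preimage` are hypothetical choices made only for this example. As in the footnote, the complement on the left of (iii) is taken in `A` and the one on the right in `B`.

```python
# Hypothetical finite example: f maps A = {0, ..., 9} into B = {0, ..., 4}.
A = set(range(10))
B = set(range(5))


def f(a):
    return a % 5


def preimage(Y):
    """Inverse image f^{-1}(Y) = {a in A : f(a) in Y}; defined even though f is not invertible."""
    return {a for a in A if f(a) in Y}


C = {0, 1, 2}
D = {2, 3}
Y = {4}

union_ok = preimage(C | D) == preimage(C) | preimage(D)         # Theorem 16.1 (i)
intersection_ok = preimage(C & D) == preimage(C) & preimage(D)  # Theorem 16.1 (ii)
complement_ok = (A - preimage(Y)) == preimage(B - Y)            # Theorem 16.1 (iii)

print(union_ok, intersection_ok, complement_ok)  # True True True
```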

16.2. Topological Characterization of Continuity
Theorem 16.2. Let (A, dA ) and (B, dB ) be metric spaces and let f : A B. The
following statements are equivalent:
(i) $f$ is continuous (i.e., $f$ satisfies the $\epsilon$-$\delta$ definition).
(ii) $f^{-1}(Y)$ is a closed subset of $A$ whenever $Y$ is a closed subset of $B$.
(iii) $f^{-1}(Y)$ is an open subset of $A$ whenever $Y$ is an open subset of $B$.
(iv) $f(a_n) \xrightarrow{d_B} f(a)$ whenever $a_n \xrightarrow{d_A} a$.
Proof. (i) $\iff$ (iv): This was a previous theorem proved in class.
(i) $\implies$ (ii): Let $Y$ be a closed subset of $B$. Let $x_n$ be a sequence in $f^{-1}(Y)$ which converges to some point $x$ in $A$. Since $f$ is continuous, the equivalence (i) $\iff$ (iv) ensures that
$$f(x_n) \xrightarrow{d_B} f(x).$$
Since the sequence $f(x_n)$ belongs to the closed subset $Y$, it follows that $f(x)$ also belongs to $Y$. Thus $x$ belongs to $f^{-1}(Y)$ and therefore $f^{-1}(Y)$ is a closed set.
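(ii) $\implies$ (iii): This follows immediately from Theorem 15.1 and the fact that
$$\big(f^{-1}(Y)\big)^c = f^{-1}(Y^c)$$
for $Y \subseteq B$. Indeed, if $Y$ is open, then $Y^c$ is closed, so $f^{-1}(Y^c) = \big(f^{-1}(Y)\big)^c$ is closed by (ii), whence $f^{-1}(Y)$ is open.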
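(iii) $\implies$ (i): Let $x \in A$, let $\epsilon > 0$, and note that $B_\epsilon(f(x))$ is an open subset of $B$. By condition (iii), the set $f^{-1}(B_\epsilon(f(x)))$ is an open subset of $A$ which contains $x$. Therefore there exists $\delta > 0$ so that
$$B_\delta(x) \subseteq f^{-1}(B_\epsilon(f(x))).$$
In other words,
$$\begin{aligned}
d_A(x, y) < \delta &\implies y \in B_\delta(x)\\
&\implies y \in f^{-1}(B_\epsilon(f(x)))\\
&\implies f(y) \in B_\epsilon(f(x))\\
&\implies d_B(f(x), f(y)) < \epsilon,
\end{aligned}$$
which is exactly what we need to satisfy the $\epsilon$-$\delta$ condition.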
Example 16.1. The theorem does not say that $f(X)$ is open if $X$ is open. Indeed, let $A = B = \mathbb{R}$ and let $f$ be the zero function. Then $f(X) = \{0\}$, which is not open, for any nonempty subset $X$ of $\mathbb{R}$, regardless of whether $X$ is open or not.
Example 16.2. The theorem does not say that $f(X)$ is closed if $X$ is closed. Indeed, let $A = B = \mathbb{R}$ and let $f(x) = \tan^{-1} x$. Note that $X = \mathbb{R}$ is a closed subset of $\mathbb{R}$ but that $f(X) = (-\pi/2, \pi/2)$, which is not closed.
In addition to providing an elegant topological characterization of continuity, the preceding theorem is extremely useful since it often provides a shortcut to proving that sets are open or closed.
Example 16.3. The set $S = \{n\pi : n \in \mathbb{Z}\}$ is a closed subset of $\mathbb{R}$ since $S = \sin^{-1}(\{0\})$. Here $\sin^{-1}(\{0\})$ denotes the inverse image of the closed set $\{0\}$ under the sine function (which we are assuming is continuous). There are of course other ways of proving that $S$ is closed, but none quite so easy.²
Example 16.4. The interior of an ellipse
$$S = \{(x, y) \in \mathbb{R}^2 : ax^2 + by^2 < c\}, \qquad a, b, c > 0,$$
is an open subset of $\mathbb{R}^2$ (with respect to the usual metric). Indeed, the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = ax^2 + by^2$$
is clearly continuous (it is the sum of the continuous functions $g(x, y) = ax^2$ and $h(x, y) = by^2$, which are themselves products of continuous functions, and so on). Since the set $(-\infty, c)$ is an open subset of $\mathbb{R}$, it follows that
$$S = f^{-1}\big((-\infty, c)\big)$$
is an open subset of $\mathbb{R}^2$. Similar arguments apply to other planar regions defined in terms of strict inequalities.
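Example 16.5. Consider $\mathbb{R}^3$, equipped with the usual metric. Recall that a plane $P$ is a subset of the form
$$P = \{(x, y, z) \in \mathbb{R}^3 : ax + by + cz = d\},$$
where $a, b, c, d \in \mathbb{R}$ are constants (with $a, b, c$ not all zero). Since the function $f : \mathbb{R}^3 \to \mathbb{R}$ defined by
$$f(x, y, z) = ax + by + cz$$
is continuous, it follows that
$$f^{-1}(\{d\}) = \{(x, y, z) \in \mathbb{R}^3 : f(x, y, z) = d\} = P$$
is a closed subset of $\mathbb{R}^3$ since $\{d\}$ is a closed subset of $\mathbb{R}$. Similar arguments show that most familiar surfaces in $\mathbb{R}^3$ are closed sets: any set of the form $\{(x, y, z) \in \mathbb{R}^3 : g(x, y, z) = k\}$ with $g : \mathbb{R}^3 \to \mathbb{R}$ continuous is closed.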
Example 16.6. There does not exist a continuous function $f : \mathbb{R} \to \mathbb{R}$ (with respect to the usual metric on $\mathbb{R}$) so that $f(x) \geq 0$ if $x \in \mathbb{Q}$ and $f(x) < 0$ if $x \notin \mathbb{Q}$. Indeed, if such an $f$ existed, then $f^{-1}([0, \infty))$ would be a closed subset of $\mathbb{R}$. However, $f^{-1}([0, \infty)) = \mathbb{Q}$ is not closed.
²Although $S$ is a union of the closed sets $\{n\pi\}$ ($n \in \mathbb{Z}$), this union is not a finite union.
LECTURE 17
Cauchy Sequences
17.1. Cauchy Sequences
Definition. Let $(M, d)$ be a metric space. A sequence $x_n$ in $M$ satisfies the Cauchy condition if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ so that
$$m, n \geq N \implies d(x_n, x_m) < \epsilon.$$
If the sequence $x_n$ satisfies the Cauchy condition, then we say that $x_n$ is a Cauchy sequence.
In more descriptive terms, one might say that a Cauchy sequence is a sequence whose
terms eventually get arbitrarily close to each other. One important fact about Cauchy sequences is the following:
Theorem 17.1. Every convergent sequence is a Cauchy sequence.
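Proof. Let $(M, d)$ be a metric space and suppose that $x_n \to x$. If $\epsilon > 0$, then let $N \in \mathbb{N}$ be so large that $n \geq N$ implies that $d(x_n, x) < \epsilon/2$. Therefore if $n, m \geq N$ we have
$$d(x_n, x_m) \leq d(x_n, x) + d(x, x_m) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$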
Using the preceding theorem we can easily prove that the harmonic series
$$1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots \tag{17.1}$$
diverges.
Theorem 17.2. The harmonic series (17.1) diverges.
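Proof. Suppose toward a contradiction that the series (17.1) converges. In other words, suppose that the sequence
$$S_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$$
of partial sums converges. Since $\lim_{n \to \infty} S_n$ exists, it follows that $S_n$ is a Cauchy sequence. Thus if $\epsilon = \frac{1}{2}$, there exists a corresponding $N \in \mathbb{N}$ such that
$$n, m \geq N \implies |S_m - S_n| < \frac{1}{2}.$$
However, this contradicts the observation that $2N, N \geq N$ and
$$\begin{aligned}
S_{2N} - S_N &= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2N}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N}\right)\\
&= \frac{1}{N+1} + \frac{1}{N+2} + \cdots + \frac{1}{2N}\\
&\geq \underbrace{\frac{1}{2N} + \frac{1}{2N} + \cdots + \frac{1}{2N}}_{N \text{ times}}\\
&= \frac{1}{2}.
\end{aligned}$$
(Each of the $N$ terms $\frac{1}{N+k}$, $1 \leq k \leq N$, is at least $\frac{1}{2N}$; equality holds when $N = 1$.)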
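The key estimate $S_{2N} - S_N \geq \frac{1}{2}$ can be checked with exact rational arithmetic. The following minimal Python sketch uses the standard `fractions` module, and its function names are illustrative. It also shows that equality occurs for $N = 1$, which is why the estimate is stated with $\geq$.

```python
from fractions import Fraction


def harmonic_partial_sum(n):
    """Exact S_n = 1 + 1/2 + ... + 1/n as a Fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def block_sum(N):
    """Exact S_{2N} - S_N = 1/(N+1) + ... + 1/(2N)."""
    return harmonic_partial_sum(2 * N) - harmonic_partial_sum(N)


for N in (1, 2, 10, 100):
    gap = block_sum(N)
    # The block has N terms, each at least 1/(2N), so gap >= 1/2.
    print(N, float(gap), gap >= Fraction(1, 2))
```

However large $N$ is, doubling the number of terms adds at least $\frac{1}{2}$, so the partial sums cannot satisfy the Cauchy condition with $\epsilon = \frac{1}{2}$.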
The following example illustrates that there are Cauchy sequences which do not converge:
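Example 17.1. In the metric space $\mathbb{Q}$, endowed with the usual metric, the sequence
$$1,\ 1.4,\ 1.41,\ 1.414,\ \ldots \tag{17.2}$$
of rational approximations to $\sqrt{2}$ is Cauchy. Although this can be verified directly, it also follows from the fact that the sequence (17.2) converges in $\mathbb{R}$. Indeed, the sequence (17.2) is Cauchy in $\mathbb{R}$ and is thus Cauchy in $\mathbb{Q}$, since the metrics on $\mathbb{Q}$ and $\mathbb{R}$ agree on $\mathbb{Q}$. However, the sequence (17.2) does not converge in $\mathbb{Q}$. A limit in $\mathbb{Q}$ would also be a limit in $\mathbb{R}$ and would therefore equal $\sqrt{2}$ by the uniqueness of limits, but $\sqrt{2}$ is not rational.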
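The terms of (17.2) can be generated exactly as rationals. The Python sketch below uses only the standard library, and the function name `truncation` is hypothetical. It computes the $k$-digit decimal truncation $x_k$ of $\sqrt{2}$ using integer arithmetic via `math.isqrt`, then checks two things: the Cauchy-type estimate $0 \leq x_m - x_k < 10^{-k}$ for $m \geq k$, and the bracket $x_k^2 < 2 < (x_k + 10^{-k})^2$, which pins the only possible limit at $\sqrt{2}$.

```python
from fractions import Fraction
from math import isqrt


def truncation(k):
    """k-digit decimal truncation of sqrt(2) as an exact rational.

    isqrt(2 * 10**(2k)) equals floor(sqrt(2) * 10**k), computed with integers only.
    """
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)


xs = [truncation(k) for k in range(10)]

# Cauchy-type estimate: for m >= k, x_m and x_k agree in the first k decimals.
cauchy_ok = all(
    0 <= xs[m] - xs[k] < Fraction(1, 10 ** k)
    for k in range(len(xs))
    for m in range(k, len(xs))
)

# Each x_k lies just below sqrt(2): x_k^2 < 2 < (x_k + 10^-k)^2.
bracket_ok = all(x * x < 2 < (x + Fraction(1, 10 ** k)) ** 2 for k, x in enumerate(xs))

print([float(x) for x in xs[:4]], cauchy_ok, bracket_ok)
```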

This reflects the idea that Q is a metric space which has holes. In some sense, it is
incomplete. In this case, we have already remedied the problem by enlarging Q to form
R by using the Least Upper Bound Principle.
17.2. Completeness
Definition. A metric space (M, d) is called complete if every Cauchy sequence in M
converges (to a limit in M ). In other words, a metric space (M, d) is complete if and only
if every Cauchy sequence in M converges.
One of the most important aspects of completeness is that any closed subspace of a
complete metric space inherits the property of completeness:
Theorem 17.3. Every closed subset of a complete metric space is itself a complete metric
space. In other words, if (M, d) is a complete metric space and S is a closed subset of M ,
then (S, d) is a complete metric space.
Proof. Let $(M, d)$ be a complete metric space and suppose that $S$ is a closed subset of $M$. If $x_n$ is a Cauchy sequence in $S$, then clearly $x_n$ is a Cauchy sequence in $M$. Since $M$ is complete (with respect to $d$), it follows that $x_n$ converges to some point $x$ in $M$. Since $S$ is closed, this limit point $x$ must belong to $S$. Therefore every Cauchy sequence in $S$ converges to a limit in $S$ and hence $(S, d)$ is a complete metric space.

We state the following theorem without proof:
Theorem 17.4. $\mathbb{R}$, endowed with the usual metric $d(x, y) = |x - y|$, is complete.
The proof that R is a complete metric space ultimately rests on the Least Upper Bound
Principle. However, the details are somewhat lengthy and consequently the proof is omitted.
LECTURE 18
Completeness
Lemma 3. The metrics $d_1, d_2, d_\infty$ on $\mathbb{R}^n$ satisfy
$$d_\infty(x, y) \leq d_2(x, y) \leq d_1(x, y) \leq n\, d_\infty(x, y) \tag{18.1}$$
for all $x, y$ in $\mathbb{R}^n$.
Proof. Each of the inequalities can be verified directly from the definitions of $d_1, d_2, d_\infty$.
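The inequalities (18.1) can be spot-checked numerically before verifying them by hand. The Python sketch below uses only the standard library, and the helper names `d1`, `d2`, `dinf`, and `check_lemma3` are illustrative. It tests (18.1) on random vectors. A passing check is evidence, not a proof, and a small tolerance absorbs floating-point rounding.

```python
import math
import random


def d1(x, y):
    """Taxicab metric: sum of |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))


def d2(x, y):
    """Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dinf(x, y):
    """Max metric: largest |x_i - y_i|."""
    return max(abs(a - b) for a, b in zip(x, y))


def check_lemma3(n, trials=1000, seed=0, tol=1e-12):
    """Test d_inf <= d_2 <= d_1 <= n * d_inf on random pairs of vectors in R^n."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        a, b, c = dinf(x, y), d2(x, y), d1(x, y)
        if not (a <= b + tol and b <= c + tol and c <= n * a + tol):
            return False
    return True


print(check_lemma3(2), check_lemma3(5), check_lemma3(20))
```

The extreme cases show that the constants in (18.1) cannot be improved. When only one coordinate differs, all three distances coincide. When all $n$ coordinate gaps are equal, $d_1 = n\, d_\infty$.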

Theorem 18.1. A subset $A \subseteq \mathbb{R}^n$ is open (resp. closed) with respect to $d_1$, $d_2$, or $d_\infty$ if and only if it is open (resp. closed) with respect to each of the others. In other words, the metrics $d_1, d_2, d_\infty$ are equivalent in the sense that they induce the same open and closed sets.
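Sketch of Pf. One verifies that the inequality (18.1) implies the equivalence of the following statements:
(i) $x_i \xrightarrow{d_1} x$,
(ii) $x_i \xrightarrow{d_2} x$,
(iii) $x_i \xrightarrow{d_\infty} x$,
where $x_i$ denotes a sequence in $\mathbb{R}^n$ and $x \in \mathbb{R}^n$. For instance, let us prove that (iii) implies (i). If $x_i \xrightarrow{d_\infty} x$, then for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$i \geq N \implies d_\infty(x, x_i) < \frac{\epsilon}{n}.$$
By (18.1), this implies that
$$i \geq N \implies d_1(x, x_i) < \epsilon,$$
whence $x_i \xrightarrow{d_1} x$.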
It follows from the equivalence of (i), (ii), and (iii) that the closures of a subset $S$ of $\mathbb{R}^n$ with respect to $d_1$, $d_2$, and $d_\infty$ are identical. Since a set is closed if and only if it equals its closure, it follows that the metric spaces $(\mathbb{R}^n, d_1)$, $(\mathbb{R}^n, d_2)$, $(\mathbb{R}^n, d_\infty)$ have exactly the same closed sets. Since the complement of a closed (resp. open) set is open (resp. closed), it follows that these metric spaces also have precisely the same open sets.

Theorem 18.2. $\mathbb{R}^n$ is complete with respect to any of the metrics $d_1, d_2, d_\infty$.
Proof. By (18.1), a sequence in $\mathbb{R}^n$ which is Cauchy (resp. convergent) with respect to $d_1$, $d_2$, or $d_\infty$ is automatically Cauchy (resp. convergent) with respect to the other two metrics. It therefore suffices to prove that $\mathbb{R}^n$ is complete with respect to the metric $d_\infty$.
If a sequence $x_n$ is Cauchy with respect to $d_\infty$, then the $i$th entries $x_j(i)$ and $x_k(i)$ of the $j$th and $k$th vectors $x_j$ and $x_k$ satisfy
$$|x_j(i) - x_k(i)| \leq \max\{|x_j(1) - x_k(1)|, \ldots, |x_j(n) - x_k(n)|\} = d_\infty(x_j, x_k)$$
for $i = 1, 2, 3, \ldots, n$. Thus if $x_n$ is a Cauchy sequence with respect to the metric $d_\infty$ on $\mathbb{R}^n$, then $x_n(i)$ is a Cauchy sequence in $\mathbb{R}$ (with the usual metric) for each $i = 1, 2, \ldots, n$. Since $\mathbb{R}$ is complete, we may define a vector $x \in \mathbb{R}^n$ by setting
$$x = (x(1), x(2), \ldots, x(n)),$$
where the entries $x(1), x(2), \ldots, x(n)$ are defined by
$$\lim_{j \to \infty} x_j(i) = x(i), \qquad i = 1, 2, \ldots, n.$$
Our next goal is to show that the sequence $x_n$ in $\mathbb{R}^n$ converges to $x$ with respect to the metric $d_\infty$.
Let $\epsilon > 0$ be given and let $M_1, M_2, \ldots, M_n \in \mathbb{N}$ be so large that
$$j \geq M_i \implies |x_j(i) - x(i)| < \epsilon$$
for $i = 1, 2, 3, \ldots, n$. If
$$N = \max\{M_1, M_2, \ldots, M_n\},$$
then
$$j \geq N \implies (\forall i \in \{1, 2, \ldots, n\})\big(|x_j(i) - x(i)| < \epsilon\big).$$
Putting this together, we find that if $j \geq N$ then
$$d_\infty(x_j, x) = \max\{|x_j(1) - x(1)|, |x_j(2) - x(2)|, \ldots, |x_j(n) - x(n)|\} < \epsilon.$$
Therefore $x_n \xrightarrow{d_\infty} x$ and hence $\mathbb{R}^n$ is complete with respect to $d_\infty$.
Corollary 7. $M_n(\mathbb{R})$ is complete (with respect to any of the metrics $d_1, d_2, d_\infty$).

Proof. The metrics $d_1, d_2, d_\infty$ on $M_n(\mathbb{R})$ are essentially identical to the corresponding metrics on $\mathbb{R}^{n^2}$. In fact, $M_n(\mathbb{R})$ (as a normed vector space) is essentially $\mathbb{R}^{n^2}$ with a different labeling scheme.

18.1. Completions of Metric Spaces
If $(M, d)$ is a metric space which is not complete, then there is a somewhat elaborate process by which $(M, d)$ may be embedded into a larger metric space $(\widetilde{M}, \widetilde{d})$ which is complete. In other words, $(\widetilde{M}, \widetilde{d})$ is a complete metric space and there exists an injection $f : M \to \widetilde{M}$ that satisfies $d(x, y) = \widetilde{d}(f(x), f(y))$ for all $x, y \in M$ (such a function is called an isometry). Thus we can embed a copy of $(M, d)$ inside $(\widetilde{M}, \widetilde{d})$ such that the metric on the larger metric space $(\widetilde{M}, \widetilde{d})$ agrees with the original metric on $M$.
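Theorem 18.3. Every metric space $(M, d)$ can be completed. In other words, there exists a complete metric space $(\widetilde{M}, \widetilde{d})$ and an injection $f : M \to \widetilde{M}$ which satisfies $d(x, y) = \widetilde{d}(f(x), f(y))$ for all $x, y \in M$.¹
¹The completion $(\widetilde{M}, \widetilde{d})$ depends heavily on $d$. If one also requires $f(M)$ to be dense in $\widetilde{M}$ (as the standard construction guarantees), then the completion is unique up to isometry, and in particular up to homeomorphism. For all practical purposes, then, any two such completions of $(M, d)$ are identical.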
The proof of the theorem is somewhat lengthy, and the difficulties are mostly notational (although the concepts are interesting). The basic idea is to construct $\widetilde{M}$ from equivalence classes of Cauchy sequences from $M$. We will not go into the details of the construction, however.
Example 18.1. Consider the metric space $(\mathbb{Q}, d)$ where $d(x, y) = |x - y|$. It can be shown that the completion of $(\mathbb{Q}, d)$ is essentially $\mathbb{R}$ (with the usual metric). In fact, some real analysis textbooks begin by constructing $\mathbb{R}$ explicitly from $\mathbb{Q}$ using this technique.
Example 18.2. It turns out that $C([a, b])$ is complete with respect to the metric $d_\infty$ (we will prove this later in the course), but not with respect to $d_1$ or $d_2$. The completions of $C([a, b])$ with respect to $d_1$ and $d_2$ turn out to be the Lebesgue spaces $L^1[a, b]$ and $L^2[a, b]$,
respectively. To go into more detail would require a long digression on measure theory and
the Lebesgue integral.
LECTURE 19
Infinite Series
19.1. Cauchy Criterion for Series
Definition. An infinite series $\sum_{n=0}^{\infty} a_n$ in a normed vector space $V$ is said to converge to a vector $S \in V$ if the partial sums $S_m = \sum_{n=0}^{m} a_n$ tend to $S$:
$$\sum_{n=0}^{\infty} a_n = S \iff \lim_{m \to \infty} S_m = S.$$
Here the limit is taken with respect to the metric $d(x, y) = \|x - y\|$ on $V$.

Definition. We say that a normed vector space $(V, \|\cdot\|)$ is complete if $V$ is a complete metric space when equipped with the metric $d(x, y) = \|x - y\|$. A complete normed vector space is sometimes called a Banach space.
Standard examples of Banach spaces are $\mathbb{R}^n$ and $M_n(\mathbb{R})$ (with respect to any of $d_1, d_2, d_\infty$). It also turns out that $(C([a, b]), d_\infty)$ is a Banach space, where $d_\infty$ denotes the metric
$$d_\infty(f, g) = \sup_{a \leq x \leq b} |f(x) - g(x)|.$$
Later in the course, we will prove that $(C([a, b]), d_\infty)$ is indeed complete. In graduate
analysis or differential equations you will encounter many other Banach spaces, including
the Lebesgue spaces and the Sobolev spaces. For now, we will mostly be concerned with
Rn and Mn (R).
In a Banach space, we have the Cauchy Convergence Criterion for series:
Theorem 19.1. A series $\sum_{n=0}^{\infty} a_n$ in a Banach space $V$ converges if and only if for every $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$k \geq j \geq N \implies \Big\|\sum_{n=j}^{k} a_n\Big\| < \epsilon.$$
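Proof. Since $V$ is complete, the given series converges if and only if the sequence
$$S_m = \sum_{n=0}^{m} a_n$$
of partial sums is a Cauchy sequence. If $S_m$ is a Cauchy sequence, then for each $\epsilon > 0$ there exists $N' \in \mathbb{N}$ such that $m, p \geq N'$ implies $\|S_m - S_p\| < \epsilon$. Setting $N = N' + 1$, we see that $k \geq j \geq N$ implies that
$$\Big\|\sum_{n=j}^{k} a_n\Big\| = \|S_k - S_{j-1}\| = d(S_k, S_{j-1}) < \epsilon.$$
On the other hand, if the preceding condition holds, then $m > p \geq N$ implies that $\|S_m - S_p\| = \big\|\sum_{n=p+1}^{m} a_n\big\| < \epsilon$, so the partial sums $S_m$ form a Cauchy sequence. Since $V$ is complete, it follows that $\lim_{m \to \infty} S_m$ exists.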

The following often overlooked theorem is quite useful in many contexts:

Theorem 19.2. If $(V, \|\cdot\|)$ is a Banach space and if the series $\sum_{n=0}^{\infty} a_n$ converges, then
$$\lim_{m \to \infty}\left(\sum_{n=m+1}^{\infty} a_n\right) = 0.$$
Here $0$ denotes the zero vector in $V$. In other words, the tail end of a convergent series tends to zero.
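Proof. Let $S = \sum_{n=0}^{\infty} a_n$ and let $S_m$ denote the $m$th partial sum, so that $\sum_{n=m+1}^{\infty} a_n = S - S_m$. Let $\epsilon > 0$ and find $N \in \mathbb{N}$ such that
$$m \geq N \implies \underbrace{\|S - S_m\|}_{d(S, S_m)} < \epsilon.$$
In other words, we have
$$m \geq N \implies \|(S - S_m) - 0\| < \epsilon,$$
whence $S - S_m$ converges to the vector $0$. However, this is just another way of saying that
$$\lim_{m \to \infty}\left(\sum_{n=m+1}^{\infty} a_n\right) = \lim_{m \to \infty} (S - S_m) = 0.$$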
19.2. The Divergence and Comparison Tests

An important consequence of the preceding theorem is the so-called Divergence Test
from Calculus II:
Theorem 19.3. Let $\sum_{n=0}^{\infty} a_n$ be a series in a Banach space $(V, \|\cdot\|)$.
(i) If $\sum_{n=0}^{\infty} a_n$ converges, then $\lim_{n \to \infty} a_n = 0$ (the zero vector). In other words: the terms of a convergent series tend to $0$.
(ii) If $\lim_{n \to \infty} a_n \neq 0$ (this includes the case where the limit does not exist), then $\sum_{n=0}^{\infty} a_n$ diverges.
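Proof. Since (ii) is the contrapositive of (i), it suffices to prove (i). By the Cauchy Criterion for Series (applied with $j = k = n$), we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \underbrace{\|S_n - S_{n-1}\|}_{\|a_n\|} < \epsilon.$$
Putting this all together, we find that for any $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \|a_n - 0\| < \epsilon,$$
from which it follows that $\lim_{n \to \infty} a_n = 0$.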
Another important consequence of the Cauchy Criterion for Series is the following
generalization of the Comparison Theorem from Calculus II:
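Theorem 19.4. Let $\sum_{n=0}^{\infty} a_n$ be a series in a Banach space $(V, \|\cdot\|)$.
(i) If $\sum_{n=0}^{\infty} b_n$ is a convergent series of non-negative real numbers and if there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies \|a_n\| \leq b_n,$$
then $\sum_{n=0}^{\infty} a_n$ converges. In particular, if $\sum_{n=0}^{\infty} \|a_n\|$ converges (in $\mathbb{R}$), then $\sum_{n=0}^{\infty} a_n$ converges (in $V$). In other words, every absolutely convergent series in $V$ converges.
(ii) If $\sum_{n=0}^{\infty} c_n$ is a divergent series of non-negative real numbers and if there exists $N \in \mathbb{N}$ so that
$$n \geq N \implies 0 \leq c_n \leq \|a_n\|,$$
then $\sum_{n=0}^{\infty} \|a_n\|$ diverges. (One cannot conclude that $\sum_{n=0}^{\infty} a_n$ itself diverges. For the alternating harmonic series of Example 19.1 we have $|a_n| = \frac{1}{n}$ and $\sum \frac{1}{n}$ diverges, yet $\sum a_n$ converges.)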
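Proof. Suppose that $\sum_{n=0}^{\infty} b_n$ is a convergent series of non-negative real numbers and that $\|a_n\| \leq b_n$ for $n \geq N$. By the Cauchy Criterion for Series (in $\mathbb{R}$), for each $\epsilon > 0$ there exists $N' \geq N$ so that
$$k \geq j \geq N' \implies \sum_{n=j}^{k} b_n < \epsilon.$$
Thus $k \geq j \geq N'$ implies that
$$\|S_k - S_{j-1}\| = \Big\|\sum_{n=j}^{k} a_n\Big\| \leq \sum_{n=j}^{k} \|a_n\| \leq \sum_{n=j}^{k} b_n < \epsilon.$$
By the Cauchy Criterion for Series in a Banach Space, it follows that the series $\sum_{n=0}^{\infty} a_n$ converges in $V$. This establishes (i). We leave the proof of (ii) to the reader.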
The importance of the preceding theorem is that it allows us to conclude that a series of
vectors in a normed vector space converges if a corresponding numerical series converges.
Needless to say, it is usually much easier to test for the convergence of a series of real numbers than a series of vectors in a normed vector space.
Definition. A series $\sum_{n=0}^{\infty} a_n$ in a Banach space is called absolutely convergent if $\sum_{n=0}^{\infty} \|a_n\|$ converges in $\mathbb{R}$.
Example 19.1. Not every convergent series (even in $\mathbb{R}$) converges absolutely. For instance, it can be shown that the Alternating Harmonic Series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$$
converges (in fact to the value $\ln 2$). However, the corresponding series of absolute values is simply the harmonic series, which diverges.
Example 19.2. If $A$ is any $n \times n$ matrix, then we may define the exponential of $A$ by the series
$$\exp(A) = \sum_{k=0}^{\infty} \frac{A^k}{k!}.$$
Since $M_n(\mathbb{R})$ is complete with respect to $d_2$ (and $d_1$, $d_\infty$ as well), we need only show that the terms of the preceding series are bounded in norm by the terms of a convergent numerical series. Using the submultiplicativity of the 2-norm, we find that for $k \geq 1$
$$\left\|\frac{A^k}{k!}\right\|_2 = \frac{1}{k!}\|A^k\|_2 \leq \frac{1}{k!}\|A\|_2^k.$$
Since
$$\sum_{k=0}^{\infty} \frac{\|A\|_2^k}{k!}$$
is a convergent series in $\mathbb{R}$ (via the Ratio Test from Calculus II), it follows from Theorem 19.4 that
$$\sum_{k=0}^{\infty} \frac{A^k}{k!}$$
is a convergent series in $(M_n(\mathbb{R}), d_2)$. The limit matrix is defined to be $\exp(A)$.

This is important for the study of differential equations. Let $A$ be an $n \times n$ matrix and consider the initial value problem
$$y'(t) = Ay, \qquad y(0) = y_0,$$
where
$$y = \begin{pmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{pmatrix}, \qquad y_0 = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.$$
One can show that the solution is given by the simple formula
$$y(t) = e^{At} y_0.$$
This is entirely analogous to the fact that the solution to
$$y'(t) = ay(t), \qquad y(0) = y_0$$
is given by
$$y(t) = e^{at} y_0.$$
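The partial sums in Example 19.2 can be computed directly. The Python sketch below uses only the standard library. The helpers `mat_mul`, `mat_exp`, and `linear_ivp_solution` are hypothetical names, and the fixed cutoff of 60 terms is an assumption that is adequate only for matrices of modest norm. The sketch truncates the series for $\exp(A)$ and uses it to evaluate $y(t) = e^{At} y_0$. For the rotation generator $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ the exact answer is $e^{At} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$, which gives a convenient check.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(A, terms=60):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)! of exp(A)."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]  # running partial sum, starting with the k = 0 term
    term = [row[:] for row in identity]    # current term A^k / k!
    for k in range(1, terms):
        # A^k / k! = (A^(k-1) / (k-1)!) * A / k
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        result = [[r + t for r, t in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    return result


def linear_ivp_solution(A, y0, t):
    """Evaluate y(t) = e^{At} y0, the solution of y' = Ay, y(0) = y0."""
    E = mat_exp([[t * a for a in row] for row in A])
    return [sum(E[i][j] * y0[j] for j in range(len(y0))) for i in range(len(y0))]


rotation = [[0.0, -1.0], [1.0, 0.0]]
t = 2.0
y = linear_ivp_solution(rotation, [1.0, 0.0], t)
error = max(abs(y[0] - math.cos(t)), abs(y[1] - math.sin(t)))
print(y, error)  # error should be at the level of floating-point rounding
```

A fixed number of terms keeps the sketch short. Production code typically uses scaling-and-squaring or other specialized algorithms instead of summing the raw series.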
LECTURE 20
Infinite Series
20.1. An Extended Example
Matrices are particularly interesting to study because their algebraic structure (e.g.,
multiplication and inversion) is closely related to their analytic structure (e.g., metrics and
convergence). This example highlights a few such connections.
The following algebraic lemma is quite useful. You have seen it before in the case of $1 \times 1$ matrices (i.e., real numbers):
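Lemma 4. The formulae
$$\begin{aligned}
I - A^m &= (I - A)(I + A + A^2 + \cdots + A^{m-1})\\
&= (I + A + A^2 + \cdots + A^{m-1})(I - A)
\end{aligned}$$
hold for all $A \in M_n(\mathbb{R})$ and all $m \in \mathbb{N}$. Here we use the notation
$$A^0 = I, \quad A^1 = A, \quad A^2 = AA, \quad A^3 = AAA, \quad \ldots,$$
where $I$ denotes the $n \times n$ identity matrix.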

Proof. Matrix multiplication is not commutative in general. Fortunately, powers of a matrix $A$ commute with each other since matrix multiplication is associative:
$$A^j A^k = \underbrace{(AA \cdots A)}_{j}\underbrace{(AA \cdots A)}_{k} = \underbrace{(AA \cdots A)}_{j+k} = \underbrace{(AA \cdots A)}_{k+j} = \underbrace{(AA \cdots A)}_{k}\underbrace{(AA \cdots A)}_{j} = A^k A^j.$$
Since any power of $A$ (including $A^0 = I$) commutes with any other power of $A$, the identities
$$(I - A)(I + A + \cdots + A^{m-1}) = I - A^m$$
and
$$(I + A + \cdots + A^{m-1})(I - A) = I - A^m$$
can be proved using the same arguments used in the real case: expand the product and observe that the sum telescopes.
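Because the argument only uses the fact that powers of $A$ commute, the identities of Lemma 4 can be checked exactly for integer matrices, where there is no rounding. The Python sketch below is illustrative only. The helper names and the particular matrix are hypothetical choices.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def geometric_sum(A, m):
    """Return the pair (I + A + ... + A^{m-1}, A^m)."""
    n = len(A)
    total = [[0] * n for _ in range(n)]
    power = identity(n)  # A^0 = I
    for _ in range(m):
        total = mat_add(total, power)
        power = mat_mul(power, A)
    return total, power


A = [[1, 2], [3, 4]]  # an arbitrary integer matrix (hypothetical choice)
m = 5
G, A_m = geometric_sum(A, m)
I = identity(2)
left = mat_mul(mat_sub(I, A), G)
right = mat_mul(G, mat_sub(I, A))
print(left == mat_sub(I, A_m), right == mat_sub(I, A_m))  # True True
```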

Lemma 5. If $\|A\|_2 < 1$, then $\sum_{n=0}^{\infty} A^n$ converges (the limit being taken with respect to the $d_2$ metric).
Proof. It suffices to prove that the series is absolutely convergent. Note that we cannot assume that the series sums to $(I - A)^{-1}$ since we do not yet know whether $I - A$ is invertible.
Since we will be dealing only with the 2-norm, we will simply drop the subscript 2 in the following. The series is absolutely convergent since
$$\sum_{n=0}^{\infty} \|A^n\| \leq \|I\| + \sum_{n=1}^{\infty} \|A\|^n = \|I\| + \frac{\|A\|}{1 - \|A\|}.$$
Indeed, the preceding shows that the partial sums of the real series $\sum_{n=0}^{\infty} \|A^n\|$ (which has nonnegative terms) are bounded above. Also observe that we used the fact that $\|A\| < 1$ to sum the resulting real geometric series. Moreover, the inequality $\|A^n\| \leq \|A\|^n$ for $n \geq 1$ follows from the submultiplicativity of the 2-norm. The $n = 0$ term is handled separately because the 2-norm of the identity matrix need not equal $1$. Since $M_n(\mathbb{R})$ is complete, we know that every absolutely convergent series in $M_n(\mathbb{R})$ converges in $M_n(\mathbb{R})$, and therefore the given series converges to some matrix $S$.

Lemma 6. Matrix inverses are unique, when they exist. In other words, if $X, Y, Z \in M_n(\mathbb{R})$ satisfy $XY = YX = I$ and $XZ = ZX = I$, then $Y = Z$.
Proof. Using the fact that matrix multiplication is associative, we see that
$$Y = YI = Y(XZ) = (YX)Z = IZ = Z.$$
Theorem 20.1. If $\|A\|_2 < 1$, then $I - A$ is invertible and
$$(I - A)^{-1} = \sum_{n=0}^{\infty} A^n,$$
where the series converges absolutely with respect to the $d_2$ metric.

Proof. By a preceding lemma, $\sum_{n=0}^{\infty} A^n$ converges to some matrix $S$. Let
$$S_m = \sum_{n=0}^{m-1} A^n$$
denote the $m$th partial sum of the series we are concerned with. Since
$$I - A^m = (I - A)S_m = S_m(I - A)$$
by a preceding lemma, we may pass to the limit. Here we use the fact that $A^m \xrightarrow{d_2} 0$ (the zero matrix), which holds because $\|A^m\|_2 \leq \|A\|_2^m \to 0$ when $\|A\|_2 < 1$. Using the fact that multiplication by $I - A$ is continuous with respect to $d_2$, we have
$$I = (I - A)S = S(I - A),$$
from which it follows (using the uniqueness of inverses) that $S = (I - A)^{-1}$.
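Theorem 20.1 can be illustrated numerically. The Python sketch below uses only the standard library, and its helper names and matrix are hypothetical. It sums $I + A + \cdots + A^{m-1}$ for a $2 \times 2$ matrix with $\|A\|_2 < 1$, where $\|\cdot\|_2$ is taken to be the Euclidean norm of the entries, consistent with identifying $M_n(\mathbb{R})$ with $\mathbb{R}^{n^2}$. It then compares the result with the inverse of $I - A$ obtained from the explicit $2 \times 2$ formula.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def norm2(A):
    """Euclidean norm of the entries of A (A viewed as a vector in R^{n^2})."""
    return math.sqrt(sum(a * a for row in A for a in row))


def neumann_partial_sum(A, m):
    """S_m = I + A + A^2 + ... + A^{m-1}."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0 = I
    for _ in range(m):
        total = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(total, power)]
        power = mat_mul(power, A)
    return total


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[0.2, 0.1], [0.3, 0.4]]
I_minus_A = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
S = neumann_partial_sum(A, 200)
exact = inverse_2x2(I_minus_A)
error = max(abs(S[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print(norm2(A), error)  # norm about 0.548 (< 1); error at rounding level
```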
LECTURE 21
Integral Test
21.1. The Harmonic Series and Integral Test
In this section, we consider only series with real terms. In other words, we have V = R
and our metric is implicitly given by $d(x, y) = |x - y|$.
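Example 21.1. Recall that the harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n}$$
diverges. In particular, with $a_n = \frac{1}{n}$ we have
$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} \frac{1}{n} = 0,$$
but $\sum_{n=1}^{\infty} a_n$ diverges. Thus the implication "$\lim_{n \to \infty} a_n = 0$ implies the convergence of $\sum_{n=1}^{\infty} a_n$" is false in general. Make sure you remember this.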
Although we have already proved that the harmonic series diverges (Lecture 17), a second proof of the divergence of the harmonic series is requested on an upcoming homework
assignment. A cheap1 way of seeing that the harmonic series diverges is the Integral Test
from Calculus II:
Theorem 21.1 (The Integral Test). Suppose that $f : [0, \infty) \to \mathbb{R}$ is continuous, positive, and decreasing. If $a_n = f(n)$, then $\int_1^{\infty} f(x)\,dx$ converges if and only if $\sum_{n=1}^{\infty} a_n$ converges.
Proof. By considering the graph of $f(x)$ and interpreting the partial sums as sums of the areas of boxes, one obtains the inequalities
$$\int_1^n f(x)\,dx \leq a_1 + a_2 + \cdots + a_n \leq a_1 + \int_1^n f(x)\,dx. \tag{21.1}$$
Since both the partial sums and the integrals $\int_1^n f(x)\,dx$ increase with $n$, the convergence of either the improper integral or the series implies the convergence of the other.

Since we have not introduced the integral, let alone improper integrals, in a respectable
manner yet, please do not use the integral test on the homework (until we cover integrals
more formally). In any case, back to the harmonic series:
Example 21.2. The estimates (21.1) from the integral test (with $f(x) = 1/x$) imply that
$$\ln n \leq 1 + \frac{1}{2} + \cdots + \frac{1}{n} \leq 1 + \ln n. \tag{21.2}$$
¹In the sense that we have not introduced integrals in a rigorous manner.
In particular, the partial sums of the harmonic series tend to infinity, but extremely slowly. For instance, the sum of the first million terms satisfies:
$$13.815\ldots < \sum_{n=1}^{10^6} \frac{1}{n} < 14.815\ldots.$$
The sum of the first billion terms satisfies:
$$20.7233\ldots < \sum_{n=1}^{10^9} \frac{1}{n} < 21.7233\ldots.$$
In particular, observe that it would be very difficult indeed to conclude that the harmonic series diverges based on purely numerical evidence. In fact, by (21.2) we would have to add at least $e^{99} \approx 9.889 \times 10^{42}$ terms to get a partial sum of the harmonic series greater than $100$, while $e^{100} \approx 2.688 \times 10^{43}$ terms certainly suffice.
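The bounds (21.2) and the million-term figure quoted above can be reproduced with a few lines of Python. This sketch uses only the standard library. It relies on `math.fsum` to reduce rounding error in the long sums, and the function name `harmonic` is illustrative. The billion-term sum is left out because it is slow in pure Python.

```python
import math


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum to limit rounding error."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


for n in (10, 1_000, 1_000_000):
    h = harmonic(n)
    # (21.2): ln n <= H_n <= 1 + ln n
    print(n, round(math.log(n), 3), round(h, 3), math.log(n) <= h <= 1 + math.log(n))

# H_n <= 1 + ln n, so H_n > 100 forces n > e^99; and n >= e^100 guarantees H_n >= 100.
print(f"{math.exp(99):.4g} {math.exp(100):.4g}")
```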
It follows from (21.2) that
$$0 \leq \underbrace{1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n}_{= F(n)} \leq 1.$$
This implies that the sequence F (n) is non-negative (i.e., bounded below by 0). Moreover,
F (n) is a decreasing sequence since:

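$$\begin{aligned}
F(n) - F(n+1) &= \left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) - \left(1 + \frac{1}{2} + \cdots + \frac{1}{n+1} - \ln(n+1)\right)\\
&= \ln(n+1) - \ln(n) - \frac{1}{n+1}\\
&= \int_n^{n+1} \frac{dx}{x} - \frac{1}{n+1}\\
&> 0.
\end{aligned}$$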
The inequality follows from the fact that
$$\frac{1}{x} > \frac{1}{n+1}$$
for $x \in [n, n+1)$, and hence the area under the graph of $f(x) = 1/x$ from $x = n$ to $x = n + 1$ must be greater than $1/(n+1)$.
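It follows from the preceding computations that
$$F(n) = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n$$
is a decreasing sequence of real numbers which is bounded below. By the Monotone Sequence Property, this implies that $\lim_{n \to \infty} F(n)$ exists. This limit is called the Euler-Mascheroni constant and it is denoted $\gamma$ (lower-case gamma):
$$\gamma = \lim_{n \to \infty}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right) \approx 0.5772156649\ldots.$$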
After $0$, $1$, $\pi$, $e$, and the imaginary unit $i$, $\gamma$ is perhaps the most important mathematical constant. The Euler-Mascheroni constant appears, among other places, in number theory and complex analysis. For instance, Dirichlet proved that if $d(n)$ denotes the number of divisors of a positive integer $n$, then
$$\lim_{n \to \infty}\left(\frac{1}{n}\sum_{i=1}^{n} d(i) - \ln n\right) = 2\gamma - 1.$$
This is an interesting statement about the average number of divisors of positive integers.
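Dirichlet's statement can be explored numerically. The sketch below is our own illustration, and the helper `average_divisor_count` is hypothetical. It uses the counting identity $d(1) + \cdots + d(n) = \sum_{k=1}^{n} \lfloor n/k \rfloor$, which holds because each k divides exactly $\lfloor n/k \rfloor$ of the integers $1, \ldots, n$. This avoids factoring each integer separately.

```python
import math

GAMMA = 0.5772156649015329  # numerical value of the Euler-Mascheroni constant

def average_divisor_count(n):
    """Return (1/n) * (d(1) + d(2) + ... + d(n)).

    Each k <= n divides exactly n // k of the integers 1, ..., n,
    so d(1) + ... + d(n) equals the sum of n // k over k = 1, ..., n.
    """
    return sum(n // k for k in range(1, n + 1)) / n

for n in (10**3, 10**5, 10**6):
    print(n, average_divisor_count(n) - math.log(n), 2 * GAMMA - 1)
```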
LECTURE 22
Alternating Series
22.1. The Alternating Series Test
The following lemma is useful in a variety of scenarios:
Lemma 7. If $a_n$ is a sequence in a metric space (M, d) such that $\lim_{n\to\infty} a_{2n} = L$ and $\lim_{n\to\infty} a_{2n+1} = L$, then $\lim_{n\to\infty} a_n = L$.
Proof. Considering $a_{2n}$ and $a_{2n+1}$ as sequences in their own right, it follows that for each $\epsilon > 0$ there exist $N_1, N_2 \in \mathbb{N}$ such that
$$n \ge N_1 \implies d(a_{2n}, L) < \epsilon, \qquad n \ge N_2 \implies d(a_{2n+1}, L) < \epsilon.$$
Now let $N = \max\{2N_1, 2N_2 + 1\}$ and observe that
$$n \ge N \implies d(a_n, L) < \epsilon,$$
regardless of whether n is even or odd.
The Alternating Series Test applies to series of real numbers whose terms alternate
between positive and negative values.
Theorem 22.1 (Alternating Series Test). If $a_n \ge a_{n+1} > 0$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} a_n = 0$, then the alternating series
$$\sum_{n=0}^{\infty} (-1)^n a_n = a_0 - a_1 + a_2 - a_3 + \cdots$$
converges.
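Proof. Let $S_m$ denote the mth partial sum of the given series. Observe that
$$\begin{aligned}
S_0 &= a_0 \ge 0,\\
S_2 &= a_0 - a_1 + a_2 = S_0 + (a_2 - a_1) \le S_0,\\
S_4 &= a_0 - a_1 + a_2 - a_3 + a_4 = S_2 + (a_4 - a_3) \le S_2.
\end{aligned}$$
In general, since $S_{2n} = (a_0 - a_1) + (a_2 - a_3) + \cdots + (a_{2n-2} - a_{2n-1}) + a_{2n} \ge 0$, we have
$$0 \le \cdots \le S_4 \le S_2 \le S_0.$$
Since the evenly indexed partial sums $S_{2n}$ form a decreasing sequence which is bounded below, it follows that $\lim_{n\to\infty} S_{2n}$ exists. Let us denote this limit by S. A similar argument shows that $\lim_{n\to\infty} S_{2n+1}$ exists as well. By the preceding lemma, it suffices to show that $\lim_{n\to\infty} S_{2n+1} = S$. Since $S_{2n+1} = S_{2n} - a_{2n+1}$, we have
$$\begin{aligned}
\lim_{n\to\infty} S_{2n+1} &= \lim_{n\to\infty} (S_{2n} - a_{2n+1})\\
&= \lim_{n\to\infty} S_{2n} - \lim_{n\to\infty} a_{2n+1}\\
&= S - 0\\
&= S.
\end{aligned}$$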
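The structure of the proof can be watched numerically. In the Python sketch below, the choice $a_n = 1/\sqrt{n+1}$ is ours; any positive decreasing sequence tending to 0 would do. The assertions check three things:

- the even partial sums decrease;
- the odd partial sums increase;
- every odd partial sum lies below every even one.

The sum of the series is therefore trapped between them.

```python
import math

def a(n):
    """An illustrative choice of a_n: positive, decreasing, tending to 0."""
    return 1 / math.sqrt(n + 1)

# partial sums S_0, S_1, ..., S_M of the series sum (-1)^n a_n
M = 10000
S, s = [], 0.0
for n in range(M + 1):
    s += (-1) ** n * a(n)
    S.append(s)

even, odd = S[0::2], S[1::2]
assert all(x >= y for x, y in zip(even, even[1:]))   # S_0 >= S_2 >= S_4 >= ...
assert all(x <= y for x, y in zip(odd, odd[1:]))     # S_1 <= S_3 <= S_5 <= ...
assert max(odd) <= min(even)                          # odd sums lie below even sums
print(odd[-1], even[-1])   # the sum of the series lies between these two numbers
```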
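Example 22.1. The series
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}} = 1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \frac{1}{\sqrt{5}} - \cdots$$
is convergent by the Alternating Series Test.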
Recall that the alternating harmonic series is the series
$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$
obtained by inserting signs into the harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots.$$
The Alternating Series Test asserts that the alternating harmonic series converges, but we can say much more: it is possible to find the sum of the alternating harmonic series explicitly. In particular, this eliminates the need to appeal to the Alternating Series Test in the first place. We can show that the alternating harmonic series converges without it, and we can compute the sum exactly.
Theorem 22.2. The alternating harmonic series converges to ln 2:
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.1}$$
Proof. Let
$$H_m = 1 + \frac{1}{2} + \cdots + \frac{1}{m}$$
denote the mth partial sum of the harmonic series and let
$$S_m = 1 - \frac{1}{2} + \cdots + \frac{(-1)^{m+1}}{m}$$
denote the mth partial sum of the alternating harmonic series. Recall that the Euler-Mascheroni constant is defined by the limit
$$\gamma = \lim_{m\to\infty} (H_m - \ln m) \approx 0.577\ldots.$$
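In particular, we proved that the preceding limit exists. A clever trick now shows that the evenly indexed partial sums $S_{2m}$ of the alternating harmonic series converge to ln 2. Observe that
$$\begin{aligned}
S_{2m} &= 1 - \frac{1}{2} + \frac{1}{3} - \cdots + \frac{1}{2m-1} - \frac{1}{2m}\\
&= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - 2\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \cdots + \frac{1}{2m}\right)\\
&= \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2m}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}\right)\\
&= H_{2m} - H_m\\
&= H_{2m} - \ln(2m) + \ln(2m) - H_m\\
&= [H_{2m} - \ln(2m)] + \ln 2 + [\ln m - H_m].
\end{aligned}$$
Taking the limit as $m \to \infty$ we find that
$$\begin{aligned}
\lim_{m\to\infty} S_{2m} &= \ln 2 + \lim_{m\to\infty} [H_{2m} - \ln(2m)] - \lim_{m\to\infty} [H_m - \ln m]\\
&= \ln 2 + \gamma - \gamma\\
&= \ln 2.
\end{aligned}$$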
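On the other hand,
$$\begin{aligned}
\lim_{m\to\infty} S_{2m+1} &= \lim_{m\to\infty}\left(S_{2m} + \frac{1}{2m+1}\right)\\
&= \lim_{m\to\infty} S_{2m} + \lim_{m\to\infty} \frac{1}{2m+1}\\
&= \ln 2 + 0\\
&= \ln 2.
\end{aligned}$$
Since $\lim_{m\to\infty} S_{2m} = \lim_{m\to\infty} S_{2m+1} = \ln 2$, it follows from Lemma 7 that $\lim_{n\to\infty} S_n = \ln 2$. In other words, we have proved (22.1).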
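The key identity $S_{2m} = H_{2m} - H_m$ and the limit ln 2 can be checked with a few lines of Python. The helper names `H` and `S` follow the notation of the proof, and floating-point arithmetic is only an approximation.

```python
import math

def H(m):
    """m-th partial sum of the harmonic series."""
    return math.fsum(1.0 / k for k in range(1, m + 1))

def S(m):
    """m-th partial sum 1 - 1/2 + 1/3 - ... + (-1)^(m+1)/m."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, m + 1))

for m in (10, 100, 1000, 10**5):
    # the identity S_{2m} = H_{2m} - H_m from the proof, and the approach to ln 2
    print(m, S(2 * m), H(2 * m) - H(m), math.log(2))
```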
22.2. Manipulating Series
Theorem 22.3. If $\sum_{n=1}^{\infty} a_n = A$ and $\sum_{n=1}^{\infty} b_n = B$ are convergent series in a Banach space V, then
$$\sum_{n=1}^{\infty} (a_n + b_n) = A + B$$
and
$$\sum_{n=1}^{\infty} c\,a_n = cA$$
for any $c \in \mathbb{R}$.
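Proof. We prove only the first portion of the theorem. The proof of the second statement is considerably easier. If $\epsilon > 0$ is given, then the partial sums
$$S_m = \sum_{n=1}^{m} a_n, \qquad T_m = \sum_{n=1}^{m} b_n$$
are convergent sequences in V. Therefore there exist $N_1, N_2 \in \mathbb{N}$ so that
$$m \ge N_1 \implies \|S_m - A\| < \frac{\epsilon}{2}, \qquad m \ge N_2 \implies \|T_m - B\| < \frac{\epsilon}{2}.$$
If $m \ge N = \max\{N_1, N_2\}$, then
$$\begin{aligned}
\left\|\sum_{n=1}^{m} (a_n + b_n) - (A + B)\right\| &= \left\|\left[\left(\sum_{n=1}^{m} a_n\right) - A\right] + \left[\left(\sum_{n=1}^{m} b_n\right) - B\right]\right\|\\
&\le \left\|\left(\sum_{n=1}^{m} a_n\right) - A\right\| + \left\|\left(\sum_{n=1}^{m} b_n\right) - B\right\|\\
&= \|S_m - A\| + \|T_m - B\|\\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2}\\
&= \epsilon.
\end{aligned}$$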
In other words, we may add convergent series together (or multiply them by constants) without affecting convergence. Things work as we would expect, one might say. On the other hand, taking products of infinite series is tricky indeed. First of all, products are not defined in most complete normed vector spaces, since there is no notion of multiplication. Second, even for $M_n(\mathbb{R})$, which does have a notion of multiplication, we have the additional difficulty that multiplication is not commutative. Indeed, the situation is delicate enough when using only real numbers, as we shall shortly see.
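Example 22.2. We know that
$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \ln 2. \tag{22.2}$$
Multiplying (22.2) by $\frac{1}{2}$ we find that
$$\frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \cdots = \frac{1}{2}\ln 2. \tag{22.3}$$
Inserting zeros between the terms of (22.3) we find that
$$0 + \frac{1}{2} + 0 - \frac{1}{4} + 0 + \frac{1}{6} + 0 - \frac{1}{8} + 0 + \cdots = \frac{1}{2}\ln 2. \tag{22.4}$$
Adding (22.2) and (22.4) we find that
$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots = \frac{3}{2}\ln 2.$$
In other words, by rearranging the terms of the series (22.2) so that each negative term occurs after a pair of positive terms, we have changed the sum.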
Theorem 22.4. Consider the rearrangement of the alternating harmonic series (22.2) that consists of repeated blocks of p positive terms followed by q negative terms, with the positive terms and the negative terms each kept in their original relative order. Its sum is equal to $\log 2 + \frac{1}{2}(\log p - \log q)$.
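Proof. Let $H_n$ denote the nth partial sum of the harmonic series and observe that
$$H_n = \log n + \gamma_n + \frac{1}{n},$$
where $\gamma_n$ is a sequence converging to the Euler-Mascheroni constant $\gamma$. Indeed, note that
$$\begin{aligned}
H_n - \log n &= H_n - \int_1^n \frac{dx}{x}\\
&= \sum_{k=1}^{n-1}\left(\frac{1}{k} - \int_k^{k+1} \frac{dx}{x}\right) + \frac{1}{n}\\
&= \sum_{k=1}^{n-1} \int_k^{k+1} \left(\frac{1}{k} - \frac{1}{x}\right) dx + \frac{1}{n}\\
&= \gamma_n + \frac{1}{n},
\end{aligned}$$
where $\gamma_n$ denotes the sum of integrals in the third line. Since $\lim_{n\to\infty}(H_n - \log n) = \gamma$ and $1/n \to 0$, it follows that $\lim_{n\to\infty} \gamma_n = \gamma$.
Consider only partial sums that consist of whole blocks of terms. Since the pattern has period p + q, we consider the partial sums $S_{(p+q)n}$. Since the sum of any p + q consecutive terms tends to zero, it suffices to prove that
$$\lim_{n\to\infty} S_{(p+q)n} = \log 2 + \frac{1}{2}(\log p - \log q). \tag{22.5}$$
Now we compute
$$\begin{aligned}
S_{(p+q)n} &= \left(\frac{1}{1} + \frac{1}{3} + \cdots + \frac{1}{2pn-1}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn}\right)\\
&= \left(\frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{2pn}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2pn}\right) - \left(\frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2qn}\right)\\
&= H_{2pn} - \frac{1}{2}H_{pn} - \frac{1}{2}H_{qn}\\
&= \log(2pn) + \gamma_{2pn} + \frac{1}{2pn} - \frac{1}{2}\left(\log(pn) + \gamma_{pn} + \frac{1}{pn}\right) - \frac{1}{2}\left(\log(qn) + \gamma_{qn} + \frac{1}{qn}\right)\\
&= \log 2 + \log p + \log n - \frac{1}{2}\log p - \frac{1}{2}\log n - \frac{1}{2}\log q - \frac{1}{2}\log n + \gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} - \frac{1}{2qn}\\
&= \log 2 + \frac{1}{2}(\log p - \log q) + \gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} - \frac{1}{2qn}.
\end{aligned}$$
Passing to the limit gives (22.5), since $\gamma_{2pn} - \frac{1}{2}\gamma_{pn} - \frac{1}{2}\gamma_{qn} \to \gamma - \frac{1}{2}\gamma - \frac{1}{2}\gamma = 0$ and $\frac{1}{2qn} \to 0$.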
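Theorem 22.4 can be tested numerically. The Python sketch below sums whole blocks of p positive terms followed by q negative terms and compares the result with $\log 2 + \frac{1}{2}(\log p - \log q)$. The function names are ours. The case p = 2, q = 1 is the rearrangement of Example 22.2, whose sum is $\frac{3}{2}\log 2$.

```python
import math
from itertools import count

def block_rearrangement_sum(p, q, blocks):
    """Sum of the first `blocks` blocks of the rearrangement of 1 - 1/2 + 1/3 - ...
    that takes p positive terms, then q negative terms, each kept in order."""
    odd = count(1, 2)     # 1, 3, 5, ...: denominators of the positive terms
    even = count(2, 2)    # 2, 4, 6, ...: denominators of the negative terms
    terms = []
    for _ in range(blocks):
        terms.extend(1.0 / next(odd) for _ in range(p))
        terms.extend(-1.0 / next(even) for _ in range(q))
    return math.fsum(terms)

def predicted(p, q):
    """The value log 2 + (log p - log q)/2 from Theorem 22.4."""
    return math.log(2) + 0.5 * (math.log(p) - math.log(q))

for p, q in [(1, 1), (2, 1), (1, 2), (3, 1)]:
    print(p, q, block_rearrangement_sum(p, q, 20000), predicted(p, q))
```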
LECTURE 23
Rearrangements of Series
23.1. Rearrangements of Series
Definition. A series $\sum_{n=0}^{\infty} a_n$ of real numbers is called conditionally convergent if it is convergent but not absolutely convergent.
Example 23.1. The alternating harmonic series is conditionally convergent.
Definition. Let $a_n$ be a sequence in a metric space (M, d). A rearrangement of $a_n$ is a sequence $b_n$ of the form $b_n = a_{\sigma(n)}$, where $\sigma : \mathbb{N} \to \mathbb{N}$ is a bijection.
Theorem 23.1 (Riemann). If $\sum_{n=0}^{\infty} a_n$ is a conditionally convergent series of real numbers, then for each $\alpha \in \mathbb{R}$ there exists a rearrangement $b_n$ of $a_n$ such that $\sum_{n=0}^{\infty} b_n = \alpha$. Additionally, there also exist rearrangements whose sums diverge to $\infty$ or $-\infty$.
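A standard way to prove Theorem 23.1 is a greedy construction. One adds unused positive terms while the running sum is at most $\alpha$, and unused negative terms while it exceeds $\alpha$. The Python sketch below applies this idea to the alternating harmonic series; the series, the function name, and the number of terms are our choices.

The positive terms and the negative terms each form a divergent series, so the running sum crosses $\alpha$ infinitely often. Hence every term is eventually used exactly once. Since the terms tend to 0, the overshoot shrinks.

```python
from itertools import count

def greedy_rearrangement(target, n_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`.

    While the running sum is <= target, take the next unused positive term
    (1, 1/3, 1/5, ...); otherwise take the next unused negative term
    (-1/2, -1/4, ...). Returns the terms used, in order, and their sum.
    """
    pos = count(1, 2)   # denominators of the positive terms
    neg = count(2, 2)   # denominators of the negative terms
    s, used = 0.0, []
    for _ in range(n_terms):
        term = 1.0 / next(pos) if s <= target else -1.0 / next(neg)
        used.append(term)
        s += term
    return used, s

for target in (-1.0, 0.5, 3.0):
    _, s = greedy_rearrangement(target, 100000)
    print(target, s)
```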
There are cases when rearrangement does not affect the sum of a series:
Theorem 23.2. If $a_n \ge 0$ for all $n \in \mathbb{N}$ and $\sum_{n=0}^{\infty} a_n$ converges, then $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n$ for every rearrangement $b_n$ of $a_n$. In particular, every rearrangement of $\sum_{n=0}^{\infty} a_n$ is convergent.
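Proof. Since $b_n$ is a rearrangement of $a_n$, there exists a bijection $\sigma : \mathbb{N} \to \mathbb{N}$ such that $b_n = a_{\sigma(n)}$ for all $n \in \mathbb{N}$. It suffices to show that $\sum_{n=0}^{\infty} b_n \le \sum_{n=0}^{\infty} a_n$. Indeed, if we can prove this, then the reverse inequality $\sum_{n=0}^{\infty} a_n \le \sum_{n=0}^{\infty} b_n$ will follow since $a_n$ is also a rearrangement of $b_n$ (i.e., $a_n = b_{\sigma^{-1}(n)}$). For each $N \in \mathbb{N}$, we have
$$\sum_{n=0}^{N} b_n = \sum_{n=0}^{N} a_{\sigma(n)} \le \sum_{n=0}^{M} a_n \le \sum_{n=0}^{\infty} a_n,$$
where
$$M = \max\{\sigma(0), \sigma(1), \ldots, \sigma(N)\}.$$
The first inequality holds because $\sigma(0), \ldots, \sigma(N)$ are distinct elements of $\{0, 1, \ldots, M\}$ and every $a_n \ge 0$. Since $b_n \ge 0$ for all $n \in \mathbb{N}$ and the partial sums of the series $\sum_{n=0}^{\infty} b_n$ are bounded above by $\sum_{n=0}^{\infty} a_n$, it follows that $\sum_{n=0}^{\infty} b_n$ converges and $\sum_{n=0}^{\infty} b_n \le \sum_{n=0}^{\infty} a_n$.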
Along similar lines, the following theorem is true:
Theorem 23.3. If $\sum_{n=0}^{\infty} a_n$ is an absolutely convergent series in a Banach space, then $\sum_{n=0}^{\infty} a_n = \sum_{n=0}^{\infty} b_n$ for any rearrangement $b_n$ of $a_n$.
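If $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect to be able to compute it by term-by-term multiplication (the so-called Cauchy product¹ of the two series):
$$\begin{aligned}
\left(\sum_{i=0}^{\infty} a_i\right)\left(\sum_{j=0}^{\infty} b_j\right) &= (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots)\\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots\\
&= c_0 + c_1 + c_2 + \cdots\\
&= \sum_{n=0}^{\infty} c_n,
\end{aligned}$$
where the terms of the new series are given by
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$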
The following theorem of Mertens implies that products of series can be taken term-by-term as long as at least one of the series is absolutely convergent.
Theorem 23.4 (Mertens). Suppose $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are both convergent, with sums A and B, respectively, and at least one of the two series is absolutely convergent. Then the product of the two series may be taken term-by-term:
$$\sum_{n=0}^{\infty} c_n = AB,$$
where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}.$$
In other words, an absolutely convergent series and a convergent series can be multiplied together term-by-term via the Cauchy formula.
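As a sanity check of Mertens' theorem, the Python sketch below (our own example) forms the Cauchy product of the absolutely convergent geometric series $\sum x^n$ with itself. Here $c_n = (n+1)x^n$, and the theorem predicts $\sum c_n = 1/(1-x)^2$. With x = 1/2 this is 4, and the truncation after 60 terms is far below floating-point resolution.

```python
def cauchy_product(a, b):
    """Return [c_0, ..., c_{N-1}] with c_n = sum_{k=0}^n a_k b_{n-k}, N = min(len(a), len(b))."""
    N = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

x = 0.5
a = [x ** n for n in range(60)]   # terms of the geometric series, whose sum is 1/(1 - x) = 2
c = cauchy_product(a, a)          # here c_n = (n + 1) * x**n
print(sum(c), (1 / (1 - x)) ** 2)
```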
We will not prove Mertens' theorem here, since it is more important to understand this particular result than to reproduce its proof. A related theorem of N. H. Abel is the following:
Theorem 23.5 (Abel). If $\sum_{n=0}^{\infty} a_n = A$, $\sum_{n=0}^{\infty} b_n = B$, and $\sum_{n=0}^{\infty} c_n = C$ are convergent series of real numbers (where $c_n = \sum_{k=0}^{n} a_k b_{n-k}$), then $C = AB$.
In other words, Abel's theorem says that if the term-by-term product series $\sum_{n=0}^{\infty} c_n$ converges (without the absolute convergence assumption that Mertens' theorem requires), then the sum must actually be AB. The proof requires a clever argument based on partial summation (a discrete analog of integration by parts) and a theorem on the boundary behavior of power series near their circle of convergence.
¹We introduce this method of multiplication because we need to add every possible term $a_i b_j$. We cannot sum with respect to i first, since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to j first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many ellipses. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.
LECTURE 24
Products of Series
If $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$ are two convergent series in $\mathbb{R}$, then what might we say about their product? Formally, we expect that term-by-term multiplication (the so-called Cauchy product¹ of the two series)
$$\begin{aligned}
\left(\sum_{i=0}^{\infty} a_i\right)\left(\sum_{j=0}^{\infty} b_j\right) &= (a_0 + a_1 + a_2 + \cdots)(b_0 + b_1 + b_2 + \cdots)\\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0) + (a_0 b_2 + a_1 b_1 + a_2 b_0) + \cdots\\
&= c_0 + c_1 + c_2 + \cdots\\
&= \sum_{n=0}^{\infty} c_n,
\end{aligned}$$
where
$$c_n = \sum_{k=0}^{n} a_k b_{n-k},$$
should yield a convergent series. As the following example shows, this is not always the case.
24.2. The Cauchy Product of Convergent Series Can Diverge!
Consider the series
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{\sqrt{n+1}}.$$
By the Alternating Series Test, this series converges to some value A. What happens when we square this series and perform term-by-term multiplication? In other words, let
$$a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}$$
and consider the Cauchy product of the two series $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$.
¹We introduce this method of multiplication because we need to add every possible term $a_i b_j$. We cannot sum with respect to i first, since this would lead to the sum of infinitely many infinite series. Similarly, summing with respect to j first would lead to the same problem. This is similar to the problem of counting $\mathbb{N} \times \mathbb{N}$: by thinking diagonally we can actually list every $a_i b_j$ without introducing infinitely many ellipses. The sums defining the new terms $c_n$ are finite sums, and hence cause us no trouble.
The formula for the new terms $c_n$ tells us that
$$\begin{aligned}
c_n &= \sum_{k=0}^{n} a_k b_{n-k}\\
&= \sum_{k=0}^{n} \frac{(-1)^k}{\sqrt{k+1}} \cdot \frac{(-1)^{n-k}}{\sqrt{n-k+1}}\\
&= \sum_{k=0}^{n} \frac{(-1)^k (-1)^{n-k}}{\sqrt{(n-k+1)(k+1)}}\\
&= (-1)^n \sum_{k=0}^{n} \frac{1}{\sqrt{(n-k+1)(k+1)}}.
\end{aligned}$$
CLAIM: The inequality
$$(n-k+1)(k+1) \le \left(\frac{n}{2} + 1\right)^2$$
holds for $0 \le k \le n$.
Pf. of Claim. The inequality can be verified by simply multiplying it all out and checking to see whether we are led to a true inequality:
$$\begin{aligned}
(n-k+1)(k+1) &\overset{?}{\le} \left(\frac{n}{2} + 1\right)^2\\
nk + n - k^2 - k + k + 1 &\overset{?}{\le} \frac{n^2}{4} + n + 1\\
nk - k^2 &\overset{?}{\le} \frac{n^2}{4}\\
0 &\overset{?}{\le} \frac{n^2}{4} - nk + k^2\\
0 &\le \left(\frac{n}{2} - k\right)^2. \qquad \text{(TRUE)}
\end{aligned}$$
This last inequality is clearly true, and hence so is our desired inequality (i.e., working backward from the last inequality yields the desired inequality).

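Returning to the formula for the terms $c_n$ we find that
$$\begin{aligned}
|c_n| &= \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}}\\
&\ge \sum_{k=0}^{n} \frac{1}{\sqrt{\left(\frac{n}{2} + 1\right)^2}}\\
&= \sum_{k=0}^{n} \frac{1}{\frac{n}{2} + 1}\\
&= (n+1)\,\frac{2}{n+2}\\
&= \frac{2n+2}{n+2}.
\end{aligned}$$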
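From this it is clear that $|c_n| \ge \frac{2n+2}{n+2} \ge 1$ for every n, so $c_n$ does not tend to 0. Therefore the series $\sum_{n=0}^{\infty} c_n$ does not converge (by the so-called Divergence Test). In other words, attempting to compute
$$\left(\sum_{j=0}^{\infty} \frac{(-1)^j}{\sqrt{j+1}}\right)\left(\sum_{k=0}^{\infty} \frac{(-1)^k}{\sqrt{k+1}}\right)$$
by multiplying term-by-term leads to a divergent series.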
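The following Python sketch computes the terms $c_n$ of the Cauchy product above directly from the defining sum, and compares $|c_n|$ with the lower bound $\frac{2n+2}{n+2}$. The helper name is ours. The terms clearly do not tend to 0.

```python
import math

def cauchy_term(n):
    """c_n = sum_{k=0}^n a_k b_{n-k} with a_k = b_k = (-1)^k / sqrt(k + 1)."""
    return math.fsum(
        (-1) ** k / math.sqrt(k + 1) * (-1) ** (n - k) / math.sqrt(n - k + 1)
        for k in range(n + 1)
    )

for n in (0, 1, 10, 100, 1000):
    c = cauchy_term(n)
    print(n, c, (2 * n + 2) / (n + 2))   # |c_n| is at least the last number
```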
24.3. The Euler Product Formula

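If p is a fixed prime number and z > 1, then the series
$$\sum_{n=0}^{\infty} \frac{1}{p^{zn}} = \sum_{n=0}^{\infty} \left(\frac{1}{p^z}\right)^n = \frac{1}{1 - \frac{1}{p^z}}$$
converges absolutely since $|1/p^z| < 1$. Since Mertens' theorem says that we may multiply absolutely convergent series term-by-term, it follows (using p = 2, 3 in the above) that
$$\begin{aligned}
\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} &= \left(1 + \frac{1}{2^z} + \frac{1}{2^{2z}} + \cdots\right)\left(1 + \frac{1}{3^z} + \frac{1}{3^{2z}} + \cdots\right)\\
&= \left(1 + \frac{1}{2^z} + \frac{1}{4^z} + \cdots\right)\left(1 + \frac{1}{3^z} + \frac{1}{9^z} + \cdots\right)\\
&= 1 + \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{6^z} + \frac{1}{8^z} + \frac{1}{9^z} + \frac{1}{12^z} + \cdots
\end{aligned} \tag{24.1}$$
The last sum includes terms corresponding exactly to those numbers whose prime factorizations only use 2 and 3. To see why, note that we must multiply each term $1/2^{jz}$ with each term $1/3^{kz}$ when expanding out the multiplication on the right hand side of (24.1). Similarly, one can show that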
$$\frac{1}{1 - \frac{1}{2^z}} \cdot \frac{1}{1 - \frac{1}{3^z}} \cdot \frac{1}{1 - \frac{1}{5^z}} = 1 + \frac{1}{2^z} + \frac{1}{3^z} + \frac{1}{4^z} + \frac{1}{5^z} + \frac{1}{6^z} + \frac{1}{8^z} + \frac{1}{9^z} + \frac{1}{10^z} + \frac{1}{12^z} + \frac{1}{15^z} + \cdots \tag{24.2}$$
where this time the sum includes terms corresponding to all numbers whose factorizations use only the primes 2, 3, and 5. The great mathematician Leonhard Euler (1707-1783) recognized that this process can be repeated indefinitely, producing what is now known as the Euler product formula. We describe this amazing formula below.
The Riemann $\zeta$-function is the function
$$\zeta(z) = \sum_{n=1}^{\infty} \frac{1}{n^z}, \tag{24.3}$$
defined (at the moment) for real variable z > 1. The Euler Product Formula relates the series (24.3) for the $\zeta$-function to an infinite product indexed by the prime numbers:
$$\sum_{n=1}^{\infty} \frac{1}{n^z} = \prod_{p \in \mathcal{P}} \left(1 - \frac{1}{p^z}\right)^{-1}. \tag{24.4}$$
Here $\mathcal{P} = \{2, 3, 5, 7, \ldots\}$ denotes the set of all prime numbers. We will not go into the details of the proof here. We will mention, however, that it involves the Fundamental Theorem of Arithmetic, which tells us that each term $1/n^z$ can be written in the form
$$\frac{1}{n^z} = \frac{1}{(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r})^z} = \left(\frac{1}{p_1^{a_1}}\right)^z \left(\frac{1}{p_2^{a_2}}\right)^z \cdots \left(\frac{1}{p_r^{a_r}}\right)^z$$
in exactly one way (up to the order of the factors). From (24.4), we can see exactly why the $\zeta$-function is important to number theorists. It connects analysis (i.e., infinite series and later functions of a complex variable) to the prime numbers.
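Formula (24.4) can be compared numerically at z = 2, where the sum is $\pi^2/6$. The Python sketch below truncates both sides: the series after finitely many terms, and the product after the primes up to a bound (found with a sieve of Eratosthenes). The function names and cutoffs are our own choices, and agreement to a few decimal places is all one should expect.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # cross off i*i, i*i + i, ... (smaller multiples were crossed off earlier)
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, flag in enumerate(is_prime) if flag]

def zeta_partial(z, n_terms):
    """Partial sum 1 + 1/2^z + ... + 1/n_terms^z of the series (24.3)."""
    return math.fsum(1.0 / n ** z for n in range(1, n_terms + 1))

def euler_product(z, bound):
    """Product of (1 - p^(-z))^(-1) over the primes p <= bound."""
    prod = 1.0
    for p in primes_up_to(bound):
        prod /= 1.0 - p ** (-z)
    return prod

print(zeta_partial(2, 10**5), euler_product(2, 10**5), math.pi ** 2 / 6)
```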
24.4. Euler's Refinement of Euclid's Theorem
Recall that Euclid (over 2300 years ago) showed that the set of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem, was proved in Book IX of Euclid's Elements. One proof of Euclid's theorem is in Lecture 1.² An 18th century proof based on the Euler Product Formula (24.4) is given below:
Theorem 24.1 (Euclid's Theorem). The number of primes is infinite.
Pf. 1 (Euler, 1737). If the set $\mathcal{P}$ of primes were finite, then the product in the Euler Product Formula (24.4) would have only finitely many factors. Each factor is at most $(1 - 1/p)^{-1}$, so the product would remain bounded as $z \to 1^+$. This contradicts the fact that the series is unbounded as $z \to 1^+$ (since the harmonic series $\sum_{n=1}^{\infty} 1/n$ diverges).
Using infinite series techniques, Euler proved a much sharper version of Euclid's theorem. Euler's version roughly tells us that there are enough primes to make the series of prime reciprocals
$$\frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \frac{1}{13} + \cdots \tag{24.5}$$
diverge. Compare this with the series of reciprocals of perfect squares
$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots \tag{24.6}$$
which converges by the integral test (Euler also proved that (24.6) converges to $\pi^2/6$, an important result which we will discuss later). Although there are infinitely many primes and infinitely many perfect squares, the primes are packed close enough together to make (24.5) diverge, while the perfect squares are far enough apart to make (24.6) converge.
The recent proof of Euler's theorem presented below is due to Clarkson:
Theorem 24.2 (Euler, 1737). If $p_n$ denotes the nth prime number, then the series
$$\sum_{n=1}^{\infty} \frac{1}{p_n} \tag{24.7}$$
diverges. In particular, this implies that the number of primes is infinite.

²Suppose that the set $\mathcal{P} = \{p_1, p_2, \ldots, p_n\}$ of all primes is finite. The number $N = p_1 p_2 \cdots p_n + 1$ is not divisible by any of the primes $p_j$, since division by any $p_j$ would leave a remainder of 1. Therefore the prime factors of N do not belong to $\mathcal{P}$, a contradiction.
Pf. (Clarkson, 1966). Suppose toward a contradiction that the series (24.7) converges. It follows that there exists a positive integer k such that
$$\sum_{m=k+1}^{\infty} \frac{1}{p_m} < \frac{1}{2}. \tag{24.8}$$
This is because the left hand side of (24.8) is the tail-end of a convergent series and hence tends to 0 as $k \to \infty$ (i.e., we are letting $\epsilon = \frac{1}{2}$). Now let
$$Q = p_1 p_2 \cdots p_k$$
and note that none of the numbers
$$1 + nQ, \qquad n = 1, 2, 3, \ldots$$
is divisible by any of the primes $p_1, \ldots, p_k$. This follows from the same trick used in Euclid's original proof (see Lecture 1). Hence the prime factors³ of each number $1 + nQ$ all belong to the set $\{p_{k+1}, p_{k+2}, \ldots\}$.
For each $N \ge 1$ we have
$$\sum_{n=1}^{N} \frac{1}{1 + nQ} \le \sum_{j=0}^{\infty} \left(\sum_{m=k+1}^{\infty} \frac{1}{p_m}\right)^j. \tag{24.9}$$
The inequality holds because the sum on the right hand side of (24.9), when expanded, includes every term on the left hand side. Indeed, if $1 + nQ = p_{m_1} p_{m_2} \cdots p_{m_j}$ with each $m_i \ge k+1$, then $1/(1 + nQ)$ appears in the expansion of the jth power. Now observe that (24.8) tells us that the right hand side of (24.9) is dominated by the convergent geometric series
$$\sum_{j=0}^{\infty} \left(\frac{1}{2}\right)^j.$$
This implies that
$$\sum_{n=1}^{\infty} \frac{1}{1 + nQ}$$
converges, since it is a series of positive terms which has bounded partial sums. The integral test, however, reveals that this is false.⁴ This contradiction shows that the original series (24.7) diverges.

³We are implicitly using
Theorem 24.3 (Fundamental Theorem of Arithmetic). Every integer n > 1 can be expressed as a product of primes. Specifically, $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$, where the $p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer n > 1 into primes is unique, apart from the order of the prime factors.
⁴That is, $\int_1^{\infty} \frac{dx}{1 + xQ}$ diverges.
There are many variants and further refinements of this theorem. For instance, a sharpened form says that
$$\lim_{x\to\infty}\left(\sum_{p \le x} \frac{1}{p} - \log\log x\right) = B_1,$$
where the sum runs over the primes $p \le x$ and $B_1 \approx 0.2614972\ldots$ is called Mertens' constant. This was first demonstrated (independently) by Meissel in 1866 and by Mertens in 1874. A shocking refinement of Euler's theorem is Brun's theorem (1919). This theorem states that the series of reciprocals of twin primes converges. In fact
$$\left(\frac{1}{3} + \frac{1}{5}\right) + \left(\frac{1}{5} + \frac{1}{7}\right) + \left(\frac{1}{11} + \frac{1}{13}\right) + \cdots \approx 1.9021606\ldots.$$
It is not known whether the constant $1.9021606\ldots$ is rational or irrational. Furthermore, it is not even known whether or not there are infinitely many twin primes.
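The slow divergence of (24.7), and the refinement involving $B_1$, can be observed with a sieve. In the Python sketch below (names and cutoffs are ours), the difference $\sum_{p \le x} 1/p - \log\log x$ should already be close to the quoted value of $B_1$ for moderate x, even though the sum itself grows without bound.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = [False] * len(range(i * i, n + 1, i))
    return [i for i, flag in enumerate(is_prime) if flag]

B1 = 0.2614972  # Mertens' constant, value quoted in the text

for x in (10**3, 10**5, 10**6):
    s = math.fsum(1.0 / p for p in primes_up_to(x))   # partial sum of (24.7) up to x
    print(x, s, s - math.log(math.log(x)), B1)
```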
LECTURE 25
Compactness
25.1. Compactness
Definition. Let (M, d) be a metric space. A subset $S \subseteq M$ is called compact¹ if every sequence $x_n$ in S has a subsequence $x_{n_k}$ which converges to a point in S.
Example 25.1. The empty set $\emptyset$ is (vacuously) a compact subset of any metric space.
Example 25.2. Any finite subset of a metric space is compact. Indeed, let (M, d) be a metric space and let $S = \{a_1, a_2, \ldots, a_n\}$ be a finite subset of M. If $x_m$ is a sequence in S, then there exists $i \in \{1, 2, \ldots, n\}$ so that $x_m = a_i$ for infinitely many m. In other words, there exists a sequence $m_k$ of natural numbers such that the corresponding subsequence $x_{m_k}$ of $x_m$ is constant (each term is $a_i$). In particular, the subsequence $x_{m_k}$ converges to $a_i$.
Example 25.3. The set $S = \{\frac{1}{n} : n \in \mathbb{N}\}$ is not compact in $\mathbb{R}$ (with respect to the usual metric). Indeed, the sequence $x_n = \frac{1}{n}$ in S converges to 0, so every subsequence of it converges to 0 as well. Since this limit point is not an element of S, S is not compact. On the other hand, $S \cup \{0\}$ is a compact subset of $\mathbb{R}$.
Theorem 25.1. Every closed interval $[a, b] \subseteq \mathbb{R}$ is compact.²
Proof. Without loss of generality (the map $t \mapsto a + (b - a)t$ carries [0, 1] onto [a, b]), suppose that $x_n$ is a sequence in [0, 1]. Let $I_0 = [0, 1]$ and select any $x_{n_0} \in I_0$. Now observe that $x_n \in [0, \frac{1}{2}]$ or $x_n \in [\frac{1}{2}, 1]$ for infinitely many values of n. Let $I_1$ denote one of these subintervals which contains $x_n$ for infinitely many values of n, and select $x_{n_1} \in I_1$ where $n_0 < n_1$. Continuing this bisection procedure, we construct a sequence of subintervals $I_k$ such that
$$I_{k+1} \subseteq I_k \subseteq \cdots \subseteq I_1 \subseteq I_0 = [0, 1] \qquad \text{and} \qquad \text{Length}(I_k) = \frac{1}{2^k},$$
and corresponding points $x_{n_k} \in I_k$ such that $n_0 < n_1 < \cdots$. The subsequence $x_{n_k}$ so constructed satisfies the condition
$$j, k \ge N \implies |x_{n_k} - x_{n_j}| \le \frac{1}{2^N},$$
since the terms $x_{n_k}$ and $x_{n_j}$ are restricted to lie in the interval $I_N$. Therefore the subsequence $x_{n_k}$ is Cauchy. Since [0, 1] is complete (it is a closed subset of the complete metric space $\mathbb{R}$), it follows that the subsequence $x_{n_k}$ converges to a limit in [0, 1].
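The bisection procedure can be imitated on a finite sample, with one honest caveat. A program cannot decide which half contains $x_n$ for infinitely many n, so the Python sketch below uses a stand-in rule: keep the half containing more of the remaining sample points. The function name, the sample sequence $x_n = n\phi \bmod 1$ with $\phi = (\sqrt{5} - 1)/2$, and the depth are our choices. The tests check the two facts used in the proof: the chosen indices increase, and points chosen at stage k or later lie within $2^{-k}$ of each other.

```python
def bisection_subsequence(xs, depth):
    """Finite illustration of the bisection argument for a sample xs in [0, 1].

    The proof keeps a half-interval containing x_n for infinitely many n.
    A program cannot test "infinitely many", so as a stand-in this keeps the
    half containing more of the remaining sample points. Returns the indices
    n_0 < n_1 < ... < n_depth and the final interval I_depth = (lo, hi).
    """
    lo, hi = 0.0, 1.0
    pool = list(range(len(xs)))        # indices n with x_n in the current interval
    chosen = []
    for k in range(depth + 1):
        later = [n for n in pool if not chosen or n > chosen[-1]]
        if not later:                  # sample exhausted: stop early
            break
        chosen.append(later[0])        # x_{n_k} lies in I_k and n_k > n_{k-1}
        if k == depth:
            break
        mid = (lo + hi) / 2
        left = [n for n in later[1:] if xs[n] <= mid]
        right = [n for n in later[1:] if xs[n] >= mid]
        if len(left) >= len(right):
            hi, pool = mid, left
        else:
            lo, pool = mid, right
    return chosen, (lo, hi)

golden = (5 ** 0.5 - 1) / 2
xs = [(n * golden) % 1.0 for n in range(100000)]   # a sample sequence in [0, 1)
idx, (lo, hi) = bisection_subsequence(xs, 12)
print(idx, lo, hi)
```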
¹The term sequentially compact is sometimes used to distinguish this concept from covering compactness, which we will discuss later. It turns out that in metric spaces the two concepts are equivalent (this is a major theorem). Therefore we can safely use the term compact without worrying too much in the long run.
²By definition [a, b] is of finite length b - a. In other words, we do not mean to include closed intervals of the form $[a, \infty)$ or $(-\infty, b]$.
Theorem 25.2. A box $[a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n]$ is compact in $\mathbb{R}^n$.
Proof. We prove the theorem in $\mathbb{R}^2$. The general case (where n > 2) is similar, but the notation is more cumbersome. If $(x_n, y_n)$ is a sequence in the box $[a_1, b_1] \times [a_2, b_2]$, then $x_n$ is a sequence in the compact subset $[a_1, b_1]$ of $\mathbb{R}$. By the compactness of $[a_1, b_1]$ (as a subset of $\mathbb{R}$), there exists a subsequence $x_{n_k}$ of $x_n$ so that $x_{n_k}$ converges to some $x \in [a_1, b_1]$. Now $y_{n_k}$ lives in the compact set $[a_2, b_2]$ and hence has a convergent subsequence $y_{n_{k_j}}$ which converges to some point $y \in [a_2, b_2]$. Since $x_{n_{k_j}}$ is a subsequence of $x_{n_k}$, it still converges to x. Therefore the sub-subsequence $(x_{n_{k_j}}, y_{n_{k_j}})$ converges to the point (x, y), which lies in the box. This proves that the box $[a_1, b_1] \times [a_2, b_2]$ is compact.
Theorem 25.3. Every compact subset of a metric space (M, d) is closed and bounded.
Proof. Let S be a compact subset of a metric space (M, d). Suppose that $x_n$ is a sequence in S which converges in M. In other words, there exists $x \in M$ so that $x_n \to x$. Since S is compact, there is a subsequence $x_{n_k}$ of $x_n$ that converges to some point $y \in S$. But subsequences of a convergent sequence must converge to the original limit. Therefore $x = y \in S$, and hence S is closed (since S contains all of its limit points).
To see that S is bounded, fix $x \in M$. Either S is bounded (i.e., there exists R > 0 so that $S \subseteq B_R(x)$), or else for each $n \in \mathbb{N}$ there exists $x_n \in S$ so that $d(x, x_n) > n$. In the latter case, since S is compact, there exists a subsequence $x_{n_k}$ of $x_n$ which converges to a point $y \in S$. However,
$$n_k < d(x_{n_k}, x) \le d(x_{n_k}, y) + d(y, x)$$
for all $k \in \mathbb{N}$. Since $x_{n_k} \to y$ and d(y, x) is constant, the right hand side of the preceding inequality is bounded, whereas $n_k \to \infty$, a contradiction. Thus S is bounded, as claimed.
Theorem 25.4. Every closed subset of a compact metric space (M, d) is compact.
Proof. Let S be a closed subset of a compact metric space (M, d). If xn is a sequence
in S, then the compactness of M yields a subsequence xnk which converges to a point x
in M . However, xnk is a sequence in the closed set S and hence the limit point x must
belong to S. In particular, this means that every sequence in S has a subsequence which
converges in S. Thus S is compact.

Corollary 8. The arbitrary intersection of compact sets is compact. In other words, if $A_i$ is a compact subset of (M, d) for all $i \in I$, then $\bigcap_{i \in I} A_i$ is also compact.
Proof. Each $A_i$ is closed by Theorem 25.3, and recall that the arbitrary intersection of closed sets is closed. It therefore follows that $\bigcap_{i \in I} A_i$ is a closed subset of a compact set (namely any of the $A_i$), and is thus compact by the preceding theorem.

It is also true that the union of finitely many compact sets is also compact.
25.2. Compact Sets in $\mathbb{R}^n$
Theorem 25.5 (Bolzano-Weierstrass). A bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.³
³Note that the limit point does not have to be a member of the sequence. The sequence $1, \frac{1}{2}, \frac{1}{3}, \ldots$ in $\mathbb{R}$ is bounded and has many convergent subsequences. However, the limit point 0 does not belong to the original sequence.
Proof. A bounded sequence in $\mathbb{R}^n$ is contained in a box. Since boxes are compact, some subsequence converges to a limit contained in the box.

The preceding theorem is sometimes stated without mention of sequences:
Theorem 25.6 (Bolzano-Weierstrass). A bounded, infinite subset S of $\mathbb{R}^n$ has an accumulation point in $\mathbb{R}^n$.⁴
Recall that if a subset S in a metric space (M, d) is compact, then S must be closed and bounded. In general, the converse is false (i.e., it is possible for S to be closed and bounded, but not compact). However, $\mathbb{R}^n$ is particularly nice in the sense that the converse is true for subsets of $\mathbb{R}^n$:⁵
Theorem 25.7 (Heine-Borel). A subset S of $\mathbb{R}^n$ is compact if and only if S is closed and bounded.
Proof. We have already proved that a compact set must be closed and bounded. Indeed, this holds in any metric space (M, d), not just $\mathbb{R}^n$.
On the other hand, suppose that S is a closed and bounded subset of $\mathbb{R}^n$. Since S is bounded, it follows that S is contained in a box $B = [a_1, b_1] \times [a_2, b_2] \times \cdots \times [a_n, b_n] \subseteq \mathbb{R}^n$. Since B is compact, it follows that every sequence $x_n$ in S has a subsequence $x_{n_k}$ which converges to a limit x in B. However, since S is closed, it follows that x belongs to S. Therefore S is compact.
⁴The limit point need not belong to S.
⁵Here $\mathbb{R}^n$ is endowed with the standard Euclidean metric $d_2$. However, a sequence in $\mathbb{R}^n$ converges with respect to one of $d_1, d_2, d_\infty$ if and only if it converges with respect to the other two metrics. As a result, the Heine-Borel theorem holds regardless of which of the three metrics $d_1, d_2, d_\infty$ one uses.
LECTURE 26
The Cantor Set

26.1. The Cantor Set
The Cantor Set is an interesting and bizarre mathematical object and one of the first
fractals to be discovered. It is an endless source of counterexamples and quite simply an
interesting object to ponder. Moreover, the Cantor Set is the tip of the iceberg of a whole
theory of fractals (self-similar objects) and can be used to produce examples of Peano
curves (space-filling curves).
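Define a sequence of subsets $C_n$ of [0, 1] according to the following scheme:
$$\begin{aligned}
C_0 &= [0, 1]\\
C_1 &= [0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1]\\
C_2 &= [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1]\\
&\;\;\vdots
\end{aligned}$$
where $C_{n+1}$ is obtained from $C_n$ by removing the open middle third of each of the closed intervals that make up $C_n$. To be more specific: $C_n$ consists of $2^n$ closed intervals of length $\frac{1}{3^n}$.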
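The construction of the sets $C_n$ is easy to carry out with exact rational arithmetic. The Python sketch below represents $C_n$ as a list of closed intervals; the function name is ours. The tests check that there are $2^n$ intervals, each of length $3^{-n}$.

```python
from fractions import Fraction

def cantor_stage(n):
    """Return C_n as a list of closed intervals (left, right) with exact endpoints."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for left, right in intervals:
            third = (right - left) / 3
            # keep the two outer closed thirds; the open middle third is removed
            refined.append((left, left + third))
            refined.append((right - third, right))
        intervals = refined
    return intervals

for left, right in cantor_stage(2):
    print(left, right)   # endpoints of the four intervals of C_2
```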
The Cantor set is defined to be
$$C = \bigcap_{n=0}^{\infty} C_n.$$
In other words, C is the set that is left over after removing the open intervals
$$(\tfrac{1}{3}, \tfrac{2}{3}), \quad (\tfrac{1}{9}, \tfrac{2}{9}), \quad (\tfrac{7}{9}, \tfrac{8}{9}), \quad \ldots$$
from [0, 1]. One might at first think that C is empty. Nothing could be further from the truth. In fact, it is immediately clear that C is infinite, since $0, 1, \frac{1}{3}, \frac{2}{3}, \frac{1}{9}, \ldots$ all belong to C. This is because each of these numbers belongs to every $C_n$ (i.e., these numbers are never removed from [0, 1] during the construction of C). In other words, the endpoints of any of the closed subintervals that belong to any of the $C_n$ also belong to the Cantor set C.
It turns out that C is a highly nontrivial example of a compact set:
Theorem 26.1. C is compact.
Proof. Each set $C_n$ is a finite union of closed intervals and hence closed. It follows that C, being the intersection of closed sets, is itself closed. Since C is also bounded (it is contained in [0, 1]), it follows that C is compact by the Heine-Borel theorem.
One of the first things one notices about C is that it is somewhat sparse. Although we
have not discussed the concept of Lebesgue measure (a Math 137-138 topic), the following
theorem is too curious to pass up:
Theorem 26.2. C has measure zero. In other words, the length of C is zero.
Proof. We will show that the complement [0, 1]\C of the Cantor set has length 1. This is
a much more intuitive thing to do since [0, 1]\C is simply a union of open intervals, each
of which has a well-defined length. Therefore the length of [0, 1]\C is simply the sum of
the lengths of the intervals that are removed in the formation of C. Recalling that the set
$C_n$ from the nth stage of the construction of the Cantor set consists of $2^n$ closed intervals of length $\frac{1}{3^n}$, we compile the following table:
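$$\begin{array}{cccc}
\text{Stage} & \text{Number of Intervals Removed} & \text{Length of Intervals Removed} & \text{Total Removed}\\
\hline
0 & 0 & 0 & 0\\
1 & 1 & 1/3 & 1/3\\
2 & 2 & 1/9 & 2/9\\
3 & 4 & 1/27 & 4/27\\
\vdots & \vdots & \vdots & \vdots\\
n & 2^{n-1} & 1/3^n & 2^{n-1}/3^n\\
\vdots & \vdots & \vdots & \vdots
\end{array}$$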
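The total removed is therefore
$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \frac{1}{3}\left(1 + \frac{2}{3} + \frac{4}{9} + \cdots\right) = \frac{1}{3}\sum_{n=0}^{\infty}\left(\frac{2}{3}\right)^n = \frac{1}{3} \cdot \frac{1}{1 - \frac{2}{3}} = 1.$$
Thus $[0, 1] \setminus C$ has length 1, which implies that C has zero length.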
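The finite stages of this computation can be done exactly. The Python sketch below adds the total lengths removed in stages 1 through N; the function name is ours. It checks the closed form $1 - (2/3)^N$, which tends to 1.

```python
from fractions import Fraction

def removed_through_stage(N):
    """Exact total length removed in stages 1, ..., N, i.e. the sum of 2^(n-1)/3^n."""
    return sum((Fraction(2 ** (n - 1), 3 ** n) for n in range(1, N + 1)), Fraction(0))

for N in (1, 2, 3, 10, 50):
    total = removed_through_stage(N)
    # closed form of the finite geometric sum
    assert total == 1 - Fraction(2, 3) ** N
    print(N, float(total))
```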
Of course we may have expected the preceding since C seems so sparse and spread
out. However, in another sense C is quite large:
Theorem 26.3. C is uncountable. In fact, C has the same cardinality as [0, 1] itself.
Sketch of Pf. One notes that a number $x \in [0, 1]$ belongs to C if and only if x has a base-3 expansion which contains only the digits 0 and 2. In other words, each $x \in C$ can be represented uniquely in the form
$$x = \frac{a_1}{3} + \frac{a_2}{3^2} + \frac{a_3}{3^3} + \cdots$$
where $a_i \in \{0, 2\}$ for each i. Thus C has the same cardinality as the set of all infinite strings of 0s and 2s. This is the same cardinality as that of the set of all infinite binary strings (strings which use only the symbols 0 and 1). The set of all infinite binary strings, in turn, corresponds to [0, 1] itself, which is uncountable.
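For rational x, the ternary criterion in this sketch can be turned into a finite test, because the base-3 expansion of a rational number is eventually periodic. The Python sketch below is our own illustration. It generates digits with exact arithmetic and treats a digit 1 as harmless only when it is the last nonzero digit, since $0.d_1\ldots d_k1_3 = 0.d_1\ldots d_k0222\ldots_3$. It stops once a remainder repeats.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether a rational x lies in the Cantor set C.

    Generates base-3 digits of x. A digit 1 is harmless only when nothing
    follows it (0.d1...dk1 = 0.d1...dk0222... in base 3); any other 1 means
    x lies in a removed open middle third. Rational x have eventually
    periodic expansions, so a repeated remainder means no bad digit occurs.
    """
    x = Fraction(x)
    if not 0 <= x <= 1:
        return False
    if x == 1:                 # 1 = 0.2222... in base 3
        return True
    seen = set()
    while x not in seen:
        seen.add(x)
        x *= 3
        digit = int(x)         # floor, since x >= 0
        x -= digit
        if digit == 1:
            return x == 0
    return True

print([in_cantor_set(Fraction(k, 9)) for k in range(10)])
```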

The preceding theorem is quite remarkable, since it says that C has the same cardinality as [0, 1] (and hence R itself) despite the fact that the length of C, by any reasonable
standard of measurement, is zero.
Recall that the Cantor set C is a peculiar compact subset of [0, 1] which is uncountable, yet is of measure 0. Unlike other sets that we have encountered in our cardinality discussions, the Cantor set is closed in $\mathbb{R}$. In other words, C contains all of its limit points.¹
In fact, it turns out that even more is true:
Theorem 26.4. Every point of C is an accumulation point of C.
Proof. If $x \in C$ and $\epsilon > 0$ are given, then let $n \in \mathbb{N}$ be so large that $3^{-n} < \epsilon$. Since $x \in C_n$, there exists a closed interval I of length $3^{-n}$ such that $x \in I \subseteq C_n$. Let y be an endpoint of I which is distinct from x. Since endpoints are never removed, $y \in C$, and $0 < |x - y| \le 3^{-n} < \epsilon$. This implies that x is an accumulation point of C.

Thus C is an uncountable subset of [0, 1] of measure zero which is somehow so dense
in itself that every point of C is the limit of a sequence of distinct points of C. Moreover,
C is totally disconnected, in the sense that between any two points x, y C, there exists
a point z in between them which is not an element of C. To show this, it suffices to prove
that C contains no intervals.
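Theorem 26.5. C contains no intervals.
Proof. Let I be a subinterval of [0, 1] of length $\delta > 0$. Let $n \in \mathbb{N}$ be so large that $3^{-n} < \delta$. Then $C_n$ consists entirely of disjoint closed intervals of length $< \delta$. An interval contained in $C_n$ would have to lie inside one of these, so $I \not\subseteq C_n$. In particular, this means that $I \not\subseteq C$.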

26.2. The Cantor Ternary Function
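Using the Cantor set, one can create a host of pathological functions. For instance, consider the Cantor ternary function (a.k.a. the Devil's Staircase) $f : [0, 1] \to [0, 1]$ defined by
$$f(x) = \begin{cases}
\frac{1}{2} & \text{if } x \in [\frac{1}{3}, \frac{2}{3}],\\
\frac{1}{4} & \text{if } x \in [\frac{1}{9}, \frac{2}{9}],\\
\frac{3}{4} & \text{if } x \in [\frac{7}{9}, \frac{8}{9}],\\
\;\vdots & \;\;\vdots
\end{cases}$$
(see Figure 1). Clearly this is well defined if $x \notin C$. If x does belong to C, then recall that $x \in C$ if and only if there exists a sequence $a_n$ of 0s and 2s so that
$$x = \sum_{n=1}^{\infty} \frac{a_n}{3^n}.$$
For such x, we define
$$f\left(\sum_{n=1}^{\infty} \frac{a_n}{3^n}\right) = \sum_{n=1}^{\infty} \frac{a_n}{2^{n+1}}.$$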
Theorem 26.6. The Cantor ternary function $f : [0, 1] \to [0, 1]$ is continuous and increasing (i.e., non-decreasing) and satisfies $f'(x) = 0$ for all $x \notin C$.
Sketch of Pf. Since f is constant on each of the open intervals removed during the construction of C, we need only show that f is continuous at each point of C itself. Let $\epsilon > 0$ be given and let $n \in \mathbb{N}$ be so large that $1/2^n < \epsilon$. If $|x - y| < \delta = \frac{1}{3^n}$, then there are ternary expansions of x and y whose first n symbols agree. Therefore the first n symbols in the binary expansions of f(x) and f(y) agree, implying that
$$|f(x) - f(y)| \le \frac{1}{2^n} < \epsilon.$$
Since f is constant on each of the open intervals that make up $[0, 1] \setminus C$, it follows that $f'(x) = 0$ for all $x \notin C$.
FIGURE 1. Graph of the Cantor ternary function f.
¹For instance, contrast this with $\mathbb{Q} \cap [0, 1]$, which is certainly not closed.
In other words, the Cantor ternary function is flat almost everywhere yet is still
increasing. Using such tricks, one can create even more bizarre functions:
Example 26.1. Let $f$ denote the Cantor ternary function. The function $g : [0,1] \to \mathbb{R}$ defined by
$$
g(x) = x - 2f(x)
$$
satisfies (see Figure 2) $g'(x) = 1$ almost everywhere (namely, whenever $x \notin C$) while also satisfying
$$
g(0) = 0, \qquad g(1) = -1.
$$
In particular, $g$ has derivative $1$ almost everywhere, yet manages to decrease overall.

There are many more strange fractals that can be created using this same circle of ideas. For instance, the second stage in the construction of the Menger sponge is depicted in Figure 3.
26.3. Cantor Set Trivia
There is a whole host of deeper theorems concerning the Cantor set (which we will not prove). For example:
Theorem 26.7. $D(C) = \{x - y : x, y \in C\} = [-1, 1]$.
In other words, the set of all differences of elements of $C$ is all of $[-1,1]$. This is despite the fact that $C$ itself has measure zero.
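Theorem 26.7 is not proved here, but its finite-stage analogue is easy to test. Because $[a,b] - [c,d] = [a - d, b - c]$, the set $C_n - C_n$ is a finite union of intervals, and the Python sketch below (helper names are our own) merges them to check that $C_n - C_n = [-1,1]$ for the first few stages. This is only evidence, not a proof; passing from the stages $C_n$ to $C$ itself requires an additional compactness argument.

```python
from fractions import Fraction

def cantor_intervals(n):
    """Closed intervals (a, b) of the n-th stage C_n of the Cantor construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = [piece
                     for a, b in intervals
                     for piece in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return intervals

def difference_set(n):
    """C_n - C_n as a sorted list of disjoint closed intervals."""
    pieces = cantor_intervals(n)
    # [a, b] - [c, d] = [a - d, b - c]
    parts = sorted((a - d, b - c) for a, b in pieces for c, d in pieces)
    merged = [list(parts[0])]
    for lo, hi in parts[1:]:
        if lo <= merged[-1][1]:            # overlaps or touches the current piece
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(p) for p in merged]

for n in range(1, 6):
    print(n, difference_set(n))
```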
Another shocking theorem is the following Cantor Surjection Theorem:
Theorem 26.8. If $(M, d)$ is a nonempty compact metric space, then there exists a continuous surjection $f : C \to M$, where $C$ is the Cantor set.²
²Here $C$ is endowed with the usual metric $d(x, y) = |x - y|$.
FIGURE 2. Graph of $g(x) = x - 2f(x)$.
F IGURE 3. The second stage in the construction of the Menger sponge.
In other words, all compact metric spaces are continuous images of the Cantor set.
This does not contradict anything we have learned about connectedness. Although the
continuous image of a connected set is connected, the continuous image of a disconnected
set can certainly be connected.
Another byproduct of the construction of the Cantor Set is the construction of Peano
curves (space-filling curves). For example:
Theorem 26.9. There exists a continuous function $f : [0,1] \to [0,1] \times [0,1]$ which is surjective.
Finally, we leave off with the shocking fact that a nontrivial metric space can be homeomorphic to its own Cartesian product:
Theorem 26.10. $C$ is homeomorphic to $C \times C$.
LECTURE 27
Compactness and Continuity

27.1. Continuity and Compactness
Theorem 27.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces and let $f : A \to B$ be a continuous function. If $S$ is a compact subset of $A$, then $f(S)$ is a compact subset of $B$. In other words, the continuous image of a compact set is compact.
Proof. Let $S$ be a compact subset of $A$ and suppose that $y_n$ is a sequence in $f(S)$. For each $n \in \mathbb{N}$, select $x_n$ in $S$ so that $f(x_n) = y_n$.¹ Since $S$ is compact, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a limit $x \in S$. By the continuity of $f$, it follows that the sequence $y_{n_k} = f(x_{n_k})$ converges to the point $y = f(x)$, which clearly belongs to $f(S)$. In particular, we have shown that every sequence $y_n$ in $f(S)$ has a subsequence which converges in $f(S)$. Thus $f(S)$ is compact.

Corollary 9. If $S$ is a closed and bounded subset of $\mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}^m$ is a continuous function, then $f(S)$ is a compact subset of $\mathbb{R}^m$.
Proof. This follows immediately from the Heine-Borel theorem (a subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded) and the fact that the continuous image of a compact set is compact.

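Example 27.1. Even for continuous functions $f : \mathbb{R} \to \mathbb{R}$, the image of a closed set need not be closed. Similarly, for a continuous function defined only on a non-closed subset of $\mathbb{R}$, the image of a bounded set need not be bounded. (A continuous $f : \mathbb{R} \to \mathbb{R}$ does map bounded sets to bounded sets, since every bounded set lies in some closed interval $[-M, M]$, whose image is compact.) The reader is urged to come up with examples demonstrating this.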
For real-valued continuous functions, the preceding theorem has the following important corollary:
Theorem 27.2 (Extreme Value Theorem). If $(A, d_A)$ is a nonempty compact metric space, then each real-valued continuous function $f : A \to \mathbb{R}$ is bounded (i.e., there exists $M > 0$ such that $|f(x)| \le M$ for all $x \in A$). Moreover, it assumes global maximum and minimum values. A special case from Calculus I is:
A continuous function $f : [a,b] \to \mathbb{R}$ is bounded and assumes an absolute maximum and absolute minimum somewhere on $[a,b]$.
Proof. By the preceding theorem, $f(A)$ is a compact subset of $\mathbb{R}$. In particular, this means that $f(A)$ is closed and bounded. Since $f(A)$ is nonempty and bounded above (resp. below), it follows that $\sup f(A)$ (resp. $\inf f(A)$) exists and is finite. It is clear that
$$
\inf f(A) \le f(x) \le \sup f(A)
$$
holds for all $x \in A$. We claim that $f$ attains the absolute maximum value $y = \sup f(A)$ at some point of $A$ (the proof that $f$ attains its absolute minimum value $\inf f(A)$ is similar).
¹In other words, let $x_n$ be any element of $f^{-1}(\{y_n\})$.
By the Approximation Property of Suprema, there exists a sequence $y_n \in f(A)$ such that $y_n \to y$ (if such a sequence did not exist, then $y$ would not be the least upper bound of $f(A)$). Thus there exists a sequence $x_n \in A$ so that $f(x_n) \to y$. Since $A$ is compact, it follows that some subsequence $x_{n_k}$ converges to a point $x$ in $A$. In particular, by the continuity of $f$,
$$
f(x) = \lim_{k \to \infty} f(x_{n_k}) = y = \sup f(A)
$$
and thus $f$ assumes the maximum value $\sup f(A)$ at $x$.
Example 27.2. There is a hottest place on Earth. If we imagine the surface of the Earth to be the sphere $S = \{(x,y,z) \in \mathbb{R}^3 : x^2 + y^2 + z^2 = 1\}$, we see that $S$ is compact (it is closed and bounded). Since the function $T(x,y,z)$ describing the temperature at any point of $S$ is continuous (one would assume), the preceding theorem says that there is a point on $S$ at which the temperature is an absolute maximum.
27.2. Uniform Continuity
Recall that a function $f : A \to B$ is said to be continuous on $A$ if $f$ is continuous at each point $x$ in $A$. In particular, observe that if $f$ is continuous on $A$, then the $\delta$ in the definition of continuity is allowed to depend upon $x$. In other words, given $\epsilon > 0$, the same $\delta$ is not guaranteed to work for each $x$ in $A$; some $x$ will require smaller $\delta$s than others.
There is a stronger notion of continuity that is extremely important in analysis:
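Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A function $f : A \to B$ is called uniformly continuous if
$$
(\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y \in A)\big( d_A(x,y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \big).
$$
The key difference between uniform continuity and continuity is that once $\epsilon > 0$ is fixed, the same $\delta > 0$ must work for all $x, y \in A$. Let us make this distinction explicit. A function $f : A \to B$ is continuous on $A$ if
$$
(\forall x \in A)(\forall \epsilon > 0)(\exists \delta > 0)(\forall y \in A)\big( d_A(x,y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \big)
$$
and uniformly continuous on $A$ if
$$
(\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y \in A)\big( d_A(x,y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon \big).
$$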
Example 27.3. The function $f : [0,1] \to \mathbb{R}$ defined by $f(x) = x^2$ is uniformly continuous. One can prove this directly from the definition, but we prefer a more clever approach. Consider the following:
$$
\begin{aligned}
|f(x) - f(y)| &= |x^2 - y^2| \\
&= |x + y|\,|x - y| \\
&\le 2|x - y|.
\end{aligned}
$$
It follows that if $\epsilon > 0$ is given, then we may take $\delta = \epsilon/2$ for all $x, y \in [0,1]$. Indeed, $|x - y| < \delta$ immediately implies, by the preceding inequalities, that $|f(x) - f(y)| < \epsilon$.
The following example indicates that the domain of the function plays a significant
role in whether a continuous function is uniformly continuous:
Example 27.4. The function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$ is continuous, but not uniformly continuous. What does it mean to not be uniformly continuous? Since the definition of uniform continuity is somewhat complicated, let us explicitly negate the definition. In other words, let us evaluate the following logical expression:
$$
\neg\Big[ (\forall \epsilon > 0)(\exists \delta > 0)(\forall x, y)\big( |x - y| < \delta \Rightarrow |f(x) - f(y)| < \epsilon \big) \Big].
$$
It follows that $f$ is not uniformly continuous if and only if the following holds²:
$$
(\exists \epsilon > 0)(\forall \delta > 0)(\exists x, y)\big( |x - y| < \delta \,\wedge\, |f(x) - f(y)| \ge \epsilon \big).
$$
Thus we must prove that the preceding statement is satisfied by our given function. Indeed, if $\epsilon = 1$ and $\delta > 0$ is given, we wish to find $x, y$ such that $|x - y| < \delta$ and $|x^2 - y^2| \ge 1$. We claim that if $x$ is sufficiently large and if $y = x + \frac{\delta}{2}$, then both conditions will hold. We therefore wish to find $x \ge 0$ such that the inequality
$$
\begin{aligned}
1 &\le |f(x + \tfrac{\delta}{2}) - f(x)| \\
&= |(x + \tfrac{\delta}{2})^2 - x^2| \\
&= 2x\big(\tfrac{\delta}{2}\big) + \big(\tfrac{\delta}{2}\big)^2 \\
&= x\delta + \frac{\delta^2}{4}
\end{aligned}
$$
is satisfied. A short computation reveals that any $x$ which satisfies
$$
x \ge \frac{1}{\delta} - \frac{\delta}{4}
$$
will do. In particular, this shows that $f$ is not uniformly continuous.
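The choice of witnesses in Example 27.4 can be checked mechanically. The Python sketch below (the function name `witness` is hypothetical) takes $x = 1/\delta$, which satisfies $x \ge 1/\delta - \delta/4$, together with $y = x + \delta/2$; exact rational arithmetic avoids any rounding doubts when $\delta$ is tiny.

```python
from fractions import Fraction

def witness(delta):
    """Points x, y with |x - y| < delta but |x**2 - y**2| >= 1 (epsilon = 1 in Example 27.4)."""
    delta = Fraction(delta)
    x = 1 / delta              # any x >= 1/delta - delta/4 would do
    y = x + delta / 2
    return x, y

for delta in (Fraction(1), Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**9)):
    x, y = witness(delta)
    # y**2 - x**2 = x*delta + delta**2/4 = 1 + delta**2/4
    print(delta, abs(x - y) < delta, abs(y**2 - x**2) >= 1)
```

However small $\delta$ is, the witnesses exist; on $[0,1]$, by contrast, Example 27.3 shows $\delta = \epsilon/2$ always works.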
²Recall that $P \Rightarrow Q$ is equivalent to $\neg P \vee Q$. Thus $\neg(P \Rightarrow Q)$ is equivalent to $P \wedge (\neg Q)$.
LECTURE 28
Uniform Continuity
Theorem 28.1. A continuous function on a compact set is uniformly continuous.
Proof. Let $(A, d_A)$ be a compact metric space, let $(B, d_B)$ be a metric space, and let $f : A \to B$ be continuous. Suppose toward a contradiction that $f$ is not uniformly continuous. Hence there exists $\epsilon > 0$ so that no matter how small $\delta > 0$ is, there exist $x, y \in A$ so that $d_A(x,y) < \delta$ but $d_B(f(x), f(y)) \ge \epsilon$.
Letting $\delta = \frac{1}{2^n}$ for $n \in \mathbb{N}$, we may therefore find sequences $x_n, y_n$ in $A$ so that
$$
d_A(x_n, y_n) < \frac{1}{2^n} \quad \text{and} \quad d_B(f(x_n), f(y_n)) \ge \epsilon.
$$
Since $(A, d_A)$ is compact, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a point $x$ in $A$. However, since the right-hand side of
$$
d_A(x, y_{n_k}) \le d_A(x, x_{n_k}) + d_A(x_{n_k}, y_{n_k})
$$
tends to zero, it follows that $y_{n_k}$ also converges to $x$:
$$
\lim_{k \to \infty} x_{n_k} = \lim_{k \to \infty} y_{n_k} = x.
$$
The sequential characterization of continuity tells us that
$$
\lim_{k \to \infty} f(x_{n_k}) = \lim_{k \to \infty} f(y_{n_k}) = f(x)
$$
and thus the right-hand side of
$$
d_B(f(x_{n_k}), f(y_{n_k})) \le d_B(f(x_{n_k}), f(x)) + d_B(f(x), f(y_{n_k}))
$$
tends to zero. In particular, it follows that $d_B(f(x_{n_k}), f(y_{n_k})) < \epsilon$ holds for sufficiently large $k$. However, this contradicts the fact that $d_B(f(x_{n_k}), f(y_{n_k})) \ge \epsilon$.

Example 28.1. Let $A = C \cup [2,3]$ where $C$ is the Cantor set. Since $A$ is closed and bounded in $\mathbb{R}$, it follows that $A$ is compact. If we define $f : A \to \mathbb{R}$ by
$$
f(x) = \frac{e^{x^2} \sin x + 47 + \cos(\sin(\cos(\sqrt[3]{x} + 47)))}{e^{47x} + 47},
$$
then the preceding theorem asserts that $f$ is uniformly continuous on $A$. Clearly this is not something you would ever want to verify by direct computation.
28.1. Nested Compact Sets
Theorem 28.2. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ such that
$$
A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots,
$$
then $A = \bigcap_{n=0}^{\infty} A_n$ is compact and nonempty.
Proof. Recall that any closed subset of a compact set is compact. Each $A_n$, being compact, is closed, and since the arbitrary intersection of closed sets is closed, it follows that $A$ is a closed subset of $A_0$, whence $A$ is compact. We must now show that $A$ is nonempty.
For each $n \in \mathbb{N}$, select a point $x_n \in A_n$. Since $A_0$ is compact and since $x_n \in A_0$ for all $n \in \mathbb{N}$, it follows that some subsequence $x_{n_k}$ of $x_n$ converges to a limit $x \in A_0$. We wish to show that the limit point $x$ of this subsequence also belongs to each $A_n$. To do this, note that for each $n \in \mathbb{N}$ the tail sequence
$$
x_n, x_{n+1}, x_{n+2}, \ldots
$$
is a sequence in $A_n$ (because the sets are nested) and $A_n$ is closed; moreover, this tail has a subsequence which converges to $x$. Hence it follows that the limit point $x$ belongs to $A_n$ for each $n \in \mathbb{N}$. Therefore $x \in \bigcap_{n=0}^{\infty} A_n = A$ and $A \neq \emptyset$.

Definition. Let $(M, d)$ be a metric space. The diameter $\operatorname{diam}(S)$ of a subset $S \subseteq M$ is defined by
$$
\operatorname{diam}(S) = \sup\{d(x,y) : x, y \in S\}.
$$
In other words, the diameter of a set is the supremum of the distances between points of $S$.
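Theorem 28.3. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ satisfying $A_0 \supseteq A_1 \supseteq A_2 \supseteq \cdots$ and
$$
\lim_{n \to \infty} \operatorname{diam}(A_n) = 0,
$$
then $A = \bigcap_{n=0}^{\infty} A_n$ consists of a single point.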
Proof. By the preceding theorem, $A \neq \emptyset$. On the other hand, $A \subseteq A_n$ for each $n \in \mathbb{N}$, and thus for any $x, y \in A$ it follows that
$$
0 \le d(x,y) \le \operatorname{diam}(A_n) \xrightarrow{\;n \to \infty\;} 0.
$$
In other words, $d(x,y) = 0$ for every $x, y \in A$, whence $x = y$ for all $x, y \in A$. Thus $A$ consists of exactly one point.
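A familiar instance of Theorem 28.3 is bisection in $M = \mathbb{R}$. In the Python sketch below (our own example, not from the text), the nested closed intervals $A_k = [a_k, b_k]$ are chosen so that $a_k^2 \le 2 \le b_k^2$ and $\operatorname{diam}(A_k) = 2^{-k}$; the single point in their intersection is $\sqrt{2}$.

```python
from fractions import Fraction

def nested_intervals(n):
    """Bisection for sqrt(2): return [A_0, ..., A_n] with A_k = (a_k, b_k),
    A_0 = [1, 2], a_k**2 <= 2 <= b_k**2, and diam(A_k) = 2**-k."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(n):
        m = (a + b) / 2
        if m * m <= 2:
            a = m          # sqrt(2) lies in the right half [m, b]
        else:
            b = m          # sqrt(2) lies in the left half [a, m]
        intervals.append((a, b))
    return intervals

A = nested_intervals(30)
print(float(A[-1][0]), float(A[-1][1]))   # both endpoints approximate sqrt(2)
```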
LECTURE 29
Contraction Mapping Principle

29.1. The Contraction Mapping Principle
To avoid the unnecessary proliferation of parentheses, we will sometimes denote T (x)
by T x (as is done in linear algebra).
Definition. Let $(M, d)$ be a metric space. A function $T : M \to M$ is a
(i) contraction if $d(Tx, Ty) \le d(x,y)$ for all $x, y \in M$,
(ii) strict contraction if $d(Tx, Ty) < d(x,y)$ for all $x, y \in M$ with $x \neq y$,
(iii) uniformly strict contraction if there exists $\alpha \in [0,1)$ so that
$$
d(Tx, Ty) \le \alpha\, d(x,y)
$$
for all $x, y \in M$.
Lemma 8. If T is a contraction, then T is continuous.

Proof. Suppose that $T : M \to M$ is a contraction. Let $\epsilon > 0$ and let $\delta = \epsilon$. If $d(x,y) < \delta$, then $d(Tx, Ty) \le d(x,y) < \delta = \epsilon$. Thus $T$ is continuous (in fact, uniformly continuous).

Theorem 29.1 (Contraction Mapping Principle). If $T : M \to M$ is a uniformly strict contraction on a nonempty complete metric space $(M, d)$, then $T$ has a unique fixed point $x \in M$ (i.e., $Tx = x$). Furthermore, for any $x_0 \in M$ the iterates of $x_0$ under $T$ converge to $x$. In other words,
$$
\lim_{n \to \infty} T^n x_0 = x
$$
for all $x_0 \in M$, where
$$
T^n = \underbrace{T \circ T \circ \cdots \circ T}_{n}.
$$
Proof. We need to show that a fixed point $x$ of $T$ exists and that it is unique.
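EXISTENCE: Since $T : M \to M$ is a uniformly strict contraction, there exists a constant $\alpha \in [0,1)$ so that $d(Tx, Ty) \le \alpha\, d(x,y)$ for all $x, y \in M$. Fix any $x_0 \in M$ and set $x_n = T^n x_0$ for $n \ge 1$. It follows that
$$
\begin{aligned}
d(x_{n+1}, x_n) &= d(T^{n+1} x_0, T^n x_0) \\
&= d(T(T^n x_0), T(T^{n-1} x_0)) \\
&= d(T x_n, T x_{n-1}) \\
&\le \alpha\, d(x_n, x_{n-1}).
\end{aligned}
$$
By induction, $d(x_{n+1}, x_n) \le \alpha^n d(x_1, x_0)$, so it follows from a previous HW assignment that $x_n$ is a Cauchy sequence. Since $M$ is complete, the sequence $x_n = T^n x_0$ converges to some limit $x$:
$$
\lim_{n \to \infty} T^n x_0 = x.
$$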
Since $T$ is a contraction, it is continuous, and hence
$$
\begin{aligned}
Tx &= T\Big(\lim_{n \to \infty} x_n\Big) \\
&= \lim_{n \to \infty} T x_n \\
&= \lim_{n \to \infty} x_{n+1} \\
&= x.
\end{aligned}
$$
Thus $Tx = x$ and $x$ is a fixed point of $T$. However, it appears that $x$ might depend upon the initial point $x_0$. We must show that this is not the case; in other words, we must show that $T$ can have at most one fixed point.
UNIQUENESS: If $y$ is a fixed point of $T$, then
$$
0 \le d(x,y) = d(Tx, Ty) \le \alpha\, d(x,y),
$$
which, since $0 \le \alpha < 1$, is possible if and only if $d(x,y) = 0$ (i.e., when $x = y$).
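As a concrete sketch of the iteration in the proof, take $M = [0,1]$ (closed in $\mathbb{R}$, hence complete) with $d(x,y) = |x - y|$ and $T = \cos$; this is our own example, not one from the text. Since $\cos$ maps $[0,1]$ into $[\cos 1, 1] \subseteq [0,1]$ and, by the Mean Value Theorem, $|\cos x - \cos y| \le (\sin 1)|x - y|$ there, $T$ is a uniformly strict contraction with $\alpha = \sin 1 \approx 0.84$. Summing the bound $d(x_{k+1}, x_k) \le \alpha^k d(x_1, x_0)$, which follows by induction from the inequality $d(x_{n+1}, x_n) \le \alpha\, d(x_n, x_{n-1})$ in the proof, gives $d(x_n, x) \le \frac{\alpha^n}{1 - \alpha} d(x_1, x_0)$; the Python helper below (a hypothetical name) uses this as its stopping rule.

```python
import math

def fixed_point(T, x0, alpha, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) for a uniformly strict contraction with constant alpha < 1.

    Stops once the a priori bound alpha**n / (1 - alpha) * d(x_1, x_0)
    on the distance from x_n to the fixed point drops below tol.
    """
    x = T(x0)                     # x_1
    d10 = abs(x - x0)             # d(x_1, x_0)
    for n in range(1, max_iter + 1):
        if alpha**n / (1 - alpha) * d10 < tol:
            return x, n           # x is x_n, within tol of the fixed point
        x = T(x)                  # advance to x_{n+1}
    raise RuntimeError("bound did not drop below tol within max_iter steps")

# T = cos on M = [0, 1] with alpha = sin(1) < 1
x_star, steps = fixed_point(math.cos, 0.0, alpha=math.sin(1.0))
print(x_star, steps)
```

Starting from a different $x_0 \in [0,1]$ lands on the same point, as uniqueness predicts.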
The following example demonstrates the power of the Contraction Mapping Principle
and of the entire metric space machinery that we have built up. In particular, the Contraction Mapping Principle can provide proofs that certain complicated differential and integral
equations have unique solutions. Moreover, it also provides an algorithm by which these
solutions can be computed.
Example 29.1. If $g : [0,1] \to \mathbb{R}$ is continuous, then there exists a continuous real-valued function $f : [0,1] \to \mathbb{R}$ such that
$$
f(x) - \int_0^x f(x - t)\, e^{-t^2}\, dt = g(x). \tag{29.1}
$$
Define the function $T : C[0,1] \to C[0,1]$ by
$$
(Tf)(x) = g(x) + \int_0^x f(x - t)\, e^{-t^2}\, dt.
$$
Clearly if T f = f , then f is a solution to (29.1). Moreover, any solution f to (29.1) also

satisfies T f = f . In other words, the solutions of (29.1) (if any exist) correspond to the
fixed points of the function T .
Recall that $C([0,1])$ is complete with respect to $d_\infty$ (we mentioned this earlier and will prove it later in the course). Using some of the basic properties of integrals from Calculus II, we find that
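$$
\begin{aligned}
d_\infty(Tf_1, Tf_2) &= \|Tf_1 - Tf_2\|_\infty \\
&= \sup_{0 \le x \le 1} \left| g(x) + \int_0^x f_1(x - t)\, e^{-t^2}\, dt - g(x) - \int_0^x f_2(x - t)\, e^{-t^2}\, dt \right| \\
&= \sup_{0 \le x \le 1} \left| \int_0^x [f_1(x - t) - f_2(x - t)]\, e^{-t^2}\, dt \right| \\
&\le \sup_{0 \le x \le 1} \int_0^x |f_1(x - t) - f_2(x - t)|\, e^{-t^2}\, dt \\
&\le \|f_1 - f_2\|_\infty \sup_{0 \le x \le 1} \int_0^x e^{-t^2}\, dt \\
&= \|f_1 - f_2\|_\infty \int_0^1 e^{-t^2}\, dt \\
&= \alpha\, d_\infty(f_1, f_2),
\end{aligned}
$$
where
$$
\alpha = \int_0^1 e^{-t^2}\, dt = 0.746824\ldots < 1.
$$
The fact that $\alpha < 1$ can be seen by considering the graph of $e^{-x^2}$ for $0 \le x \le 1$ (see Figure 1).

FIGURE 1. Graph of $y = e^{-x^2}$.

An alternate approach might be to use the fact that the series expansion for $e^{-x^2}$ is alternating:
$$
\begin{aligned}
e^{-x^2} &= \sum_{n=0}^{\infty} \frac{(-x^2)^n}{n!} \\
&= 1 - x^2 + \frac{x^4}{2} - \frac{x^6}{6} + \cdots \\
&\le 1 - x^2 + \frac{x^4}{2}
\end{aligned}
$$
for $x \in [0,1]$ (there the terms $x^{2n}/n!$ decrease in $n$). Thus
$$
\int_0^1 e^{-t^2}\, dt \le \int_0^1 \Big(1 - t^2 + \frac{t^4}{2}\Big)\, dt = 1 - \frac{1}{3} + \frac{1}{10} = \frac{23}{30} \approx 0.767 < 1.
$$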
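Both numerical claims about $\alpha$ are easy to confirm. The Python sketch below uses the standard identity $\int_0^1 e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(1)$ via `math.erf`, cross-checks it with a composite midpoint-rule sum (our choice of method and of $n = 1000$ subintervals), and compares it with the upper bound $\int_0^1 (1 - t^2 + t^4/2)\,dt = 23/30$ coming from the truncated alternating series.

```python
import math

# alpha = integral of exp(-t^2) over [0, 1], via (sqrt(pi)/2) * erf(1)
alpha = math.sqrt(math.pi) / 2 * math.erf(1.0)

# composite midpoint rule with n subintervals as an independent cross-check
n = 1000
midpoint = sum(math.exp(-((k + 0.5) / n) ** 2) for k in range(n)) / n

# integral of the truncated alternating series 1 - t^2 + t^4/2 over [0, 1]
upper = 1 - 1 / 3 + 1 / 10       # = 23/30

print(f"{alpha:.6f} {midpoint:.6f} {upper:.6f}")   # alpha < upper < 1
```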
By the Contraction Mapping Principle, it follows that $T$ has a unique fixed point. In other words, there exists a unique continuous real-valued function $f : [0,1] \to \mathbb{R}$ such that (29.1) holds.
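To see the iteration $f \mapsto Tf$ in action, one can discretize it. The Python sketch below is our own illustration, not part of the text: it replaces $[0,1]$ by the grid $x_i = i/N$, approximates the integral by the trapezoid rule, uses the hypothetical data $g \equiv 1$, and iterates from $f = g$ until successive iterates differ by less than `tol` in the discrete sup norm. Because the discrete operator only approximates $T$, this illustrates the Contraction Mapping Principle rather than proving anything about (29.1).

```python
import math

def solve_29_1(g, N=100, tol=1e-12, max_iter=500):
    """Approximate the solution of f(x) - int_0^x f(x - t) exp(-t^2) dt = g(x) on [0, 1].

    Iterates a trapezoid-rule discretization of T f = g + int_0^x f(x - t) exp(-t^2) dt
    on the grid x_i = i/N. Returns (grid, values, number of iterations).
    """
    h = 1.0 / N
    xs = [i * h for i in range(N + 1)]
    w = [math.exp(-(j * h) ** 2) for j in range(N + 1)]   # kernel values exp(-t_j^2)
    gs = [g(x) for x in xs]

    def T(f):
        out = [gs[0]]                                     # the integral vanishes at x = 0
        for i in range(1, N + 1):
            # trapezoid rule on [0, x_i]; f(x_i - t_j) is stored as f[i - j]
            s = 0.5 * (f[i] * w[0] + f[0] * w[i])
            s += sum(f[i - j] * w[j] for j in range(1, i))
            out.append(gs[i] + h * s)
        return out

    f = gs[:]                                             # start the iteration at f = g
    for k in range(1, max_iter + 1):
        f_next = T(f)
        if max(abs(a - b) for a, b in zip(f_next, f)) < tol:   # discrete sup-norm distance
            return xs, f_next, k
        f = f_next
    raise RuntimeError("iteration did not converge")

xs, f, iterations = solve_29_1(lambda x: 1.0)    # hypothetical data g = 1
print(iterations, f[0], f[-1])
```

Refining the grid (for instance $N = 200$) changes the computed values only slightly, as one expects from a consistent discretization.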
The preceding example shows the power of this abstract approach. It also indicates that we need to have a better understanding of integrals, derivatives, infinite series, and the $d_\infty$ metric in order to handle some of the sophisticated problems that one encounters in other branches of mathematics and in applications.
LECTURE 30
Derivatives
The following notion of convergence is commonly used in Calculus:
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$, and let $f : A \setminus \{a\} \to B$ be a function. We say that
$$
\lim_{x \to a} f(x) = y
$$
if for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$
0 < d_A(x, a) < \delta \Rightarrow d_B(f(x), y) < \epsilon.
$$
The following theorem asserts that the preceding definition of limits agrees with our
original definition and also agrees quite well with the definition of continuity:
Theorem 30.1. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces, let $a \in A$ and let $f : A \setminus \{a\} \to B$ be a function. The following are equivalent:
(i) $\lim_{x \to a} f(x) = y$,
(ii) $\lim_{n \to \infty} f(x_n) = y$ for any sequence $x_n$ in $A \setminus \{a\}$ which converges to $a$,
(iii) $f$ can be extended continuously to $a$ by setting $f(a) = y$. In other words, the function $\tilde{f} : A \to B$ defined by
$$
\tilde{f}(x) = \begin{cases} f(x) & x \neq a, \\ y & x = a, \end{cases}
$$
is continuous at $a$.
30.1. Derivatives
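Definition. A function $f : (a,b) \to \mathbb{R}$ is said to be differentiable at $x_0 \in (a,b)$ if there exists $L \in \mathbb{R}$ such that for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$
0 < |x - x_0| < \delta \Rightarrow \left| \frac{f(x) - f(x_0)}{x - x_0} - L \right| < \epsilon. \tag{30.1}
$$
We call $L$ the derivative of $f$ at $x_0$, denoted $f'(x_0)$.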
The preceding condition is frequently shortened to
$$
\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} = L
$$
or equivalently
$$
\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = L.
$$
In other words, $f$ is differentiable at $x_0$ if the function
$$
g(x) = \begin{cases} \dfrac{f(x) - f(x_0)}{x - x_0} & x \neq x_0, \\ L & x = x_0, \end{cases}
$$
is continuous at $x_0$.
One can also see that (30.1) is equivalent to saying that for every $\epsilon > 0$ there exists $\delta > 0$ such that
$$
0 < |x - x_0| < \delta \Rightarrow \big| \underbrace{f(x) - f(x_0) - f'(x_0)(x - x_0)}_{E(x)} \big| < \epsilon |x - x_0| \tag{30.2}
$$
so that
$$
f(x) = f(x_0) + f'(x_0)(x - x_0) + E(x),
$$
where the error term $E(x)$ has the property that for each $\epsilon > 0$, there exists $\delta > 0$ so that $0 < |x - x_0| < \delta$ implies that
$$
\left| \frac{E(x)}{x - x_0} \right| < \epsilon.
$$
In other words, the error term $E(x)$ goes to zero faster than $x - x_0$ does:
$$
\lim_{x \to x_0} \frac{E(x)}{x - x_0} = 0.
$$
The astute reader will observe that there is a slight problem: the quotient $E(x)/(x - x_0)$ is not defined at $x = x_0$. However, the preceding limit shows that it can be extended continuously to $x_0$ by assigning it the value $0$ (see HW).
Summing things up, a differentiable function $f$ is well approximated by the linear function $f(x_0) + f'(x_0)(x - x_0)$ near $x_0$.
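For a concrete instance of the error term, take $f(x) = x^3$ and $x_0 = 1$ (our example), so that $f'(1) = 3$ and $E(x) = x^3 - 1 - 3(x - 1) = (x - 1)^2(x + 2)$. The Python sketch below evaluates $E(x)/(x - x_0)$ exactly with rational arithmetic as $x \to x_0$; the ratio equals $(x - 1)(x + 2)$ and visibly tends to $0$.

```python
from fractions import Fraction

x0, L = Fraction(1), Fraction(3)          # f(x) = x**3 has f'(1) = 3

def E(x):
    """Error of the linear approximation f(x0) + L (x - x0) to f(x) = x**3."""
    return x**3 - x0**3 - L * (x - x0)

for k in range(1, 6):
    x = x0 + Fraction(1, 10**k)
    ratio = E(x) / (x - x0)               # equals (x - 1)(x + 2), which tends to 0
    print(k, float(ratio))
```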
Theorem 30.2. If $f : (a,b) \to \mathbb{R}$ is a constant function, then $f'(x) = 0$ for all $x \in (a,b)$.
Proof. Since the difference quotient is always 0, the theorem follows immediately from
the definition of the derivative.

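Theorem 30.3. If $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.
Proof. This follows from the fact that
$$
\begin{aligned}
\lim_{x \to x_0} |f(x) - f(x_0)| &= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| |x - x_0| \\
&= \lim_{x \to x_0} \left| \frac{f(x) - f(x_0)}{x - x_0} \right| \cdot \lim_{x \to x_0} |x - x_0| \\
&= |f'(x_0)| \lim_{x \to x_0} |x - x_0| \\
&= 0.
\end{aligned}
$$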
LECTURE 31
Mean Value Theorem

Definition. A function $f : (a,b) \to \mathbb{R}$ has a local maximum (resp. a local minimum) at $x_0 \in (a,b)$ if there exists an open interval $I$ such that $x_0 \in I \subseteq (a,b)$ and $f(x) \le f(x_0)$ (resp. $f(x) \ge f(x_0)$) for all $x \in I$.
The following theorem is familiar from Calculus I. It justifies the traditional method of
maximizing or minimizing differentiable functions by searching for zeros of the derivative.
Theorem 31.1. If $f : (a,b) \to \mathbb{R}$ has a local maximum or a local minimum at $x_0 \in (a,b)$ and $f$ is differentiable at $x_0$, then $f'(x_0) = 0$.
Proof. Without loss of generality, suppose that $f$ has a local maximum at $x_0$. Thus there exists an open interval $I \subseteq (a,b)$ such that $x_0 \in I$ and $f(x) \le f(x_0)$ for all $x \in I$. In other words,
$$
f(x) - f(x_0) \le 0
$$
for all $x \in I$. Now let $\delta > 0$ be so small that $(x_0 - \delta, x_0 + \delta) \subseteq I$ and then observe that
$$
x_0 < x < x_0 + \delta \Rightarrow \frac{f(x) - f(x_0)}{x - x_0} \le 0,
$$
whence $f'(x_0) \le 0$. On the other hand, we also see that
$$
x_0 - \delta < x < x_0 \Rightarrow \frac{f(x) - f(x_0)}{x - x_0} \ge 0,
$$
whence $f'(x_0) \ge 0$. Putting this all together, we find that $f'(x_0) = 0$, as desired.
Another incredibly useful theorem from Calculus I is the following:

Theorem 31.2 (Mean Value Theorem). If $f : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$ and differentiable on $(a,b)$, then there exists $x_0 \in (a,b)$ so that
$$
f(b) - f(a) = f'(x_0)(b - a). \tag{31.1}
$$
Proof. Let
$$
S = \frac{f(b) - f(a)}{b - a}
$$
denote the slope of the secant line of the graph of $f$ from $(a, f(a))$ to $(b, f(b))$. Next define the auxiliary function
$$
g(x) = f(x) - Sx
$$
and note that $g(a) = g(b)$, since $f(x)$ and $Sx$ have the same net rise over the interval $[a,b]$. Let us make this more precise:
$$
\begin{aligned}
g(a) &= f(a) - Sa = f(a) - a\,\frac{f(b) - f(a)}{b - a} = \frac{(b - a)f(a) - af(b) + af(a)}{b - a} = \frac{bf(a) - af(b)}{b - a}, \\
g(b) &= f(b) - Sb = f(b) - b\,\frac{f(b) - f(a)}{b - a} = \frac{(b - a)f(b) - bf(b) + bf(a)}{b - a} = \frac{bf(a) - af(b)}{b - a},
\end{aligned}
$$
whence
$$
g(a) = g(b) = \frac{bf(a) - af(b)}{b - a}.
$$
There are two cases to investigate:
(i) If $g$ is constant on $[a,b]$, then
$$
0 = g'(x) = f'(x) - S
$$
for all $x \in (a,b)$, whence $f'(x_0) = S$ for every $x_0 \in (a,b)$. In particular, this proves (31.1).
(ii) Suppose now that $g$ is not constant. Since $g$ is continuous on $[a,b]$, it follows from the Extreme Value Theorem that $g$ assumes an absolute maximum and absolute minimum on $[a,b]$. Since $g$ is not constant, at least one of these extreme values differs from the common value $g(a) = g(b)$, so either the absolute maximum or the absolute minimum value of $g$ on $[a,b]$ is attained at some $x_0 \in (a,b)$. It follows that $g(x_0)$ is either a local maximum or a local minimum, whence $g'(x_0) = 0$. However, this implies that $f'(x_0) = S$, whence (31.1) follows.
An immediate corollary of the Mean Value Theorem is Rolle's Theorem:
Theorem 31.3 (Rolle's Theorem). If $f : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$, differentiable on $(a,b)$, and $f(a) = f(b)$, then there exists $x_0 \in (a,b)$ so that $f'(x_0) = 0$.
Example 31.1. If $f(x) = x^3 + px + q$ where $p > 0$, then $f$ has a unique real root. First let us observe that at least one root exists. Since $\lim_{x \to \pm\infty} f(x) = \pm\infty$, it follows that $f$ assumes both positive and negative values, whence a root must exist by the Intermediate Value Theorem. Now suppose toward a contradiction that there exist $a < b$ such that $f(a) = f(b) = 0$. By Rolle's Theorem there would exist $c \in (a,b)$ such that $0 = f'(c) = 3c^2 + p > 0$. This is a contradiction.
Theorem 31.4. If $f$ is differentiable on $(a,b)$ and $f'(x) \ge 0$ for all $x \in (a,b)$, then $f$ is increasing on $(a,b)$. In other words, $x \le y$ implies that $f(x) \le f(y)$.
Proof. Let $x < y$ (the case $x = y$ is trivial) and note that the Mean Value Theorem gives us $c \in (x,y)$ so that
$$
f(y) - f(x) = f'(c)(y - x) \ge 0.
$$
Example 31.2. The Mean Value Theorem can be used to prove various interesting inequalities for everyday functions. For example, given $a, b \in (-\pi/2, \pi/2)$ with $a \neq b$, there exists $c$ strictly between $a$ and $b$ so that
$$
\tan b - \tan a = (\sec^2 c)(b - a),
$$
from which it follows that
$$
|\tan b - \tan a| \ge |b - a|,
$$
since $\sec^2 c \ge 1$ for all $c \in (-\pi/2, \pi/2)$.
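The inequality of Example 31.2 can be spot-checked by brute force. The Python sketch below (the helper name is ours) tests every pair from a grid on $[-1.5, 1.5] \subseteq (-\pi/2, \pi/2)$; the tiny `slack` only absorbs floating-point rounding when $a \approx b$. A finite check like this is a sanity test, not a proof.

```python
import math

def tan_gap_ok(a, b, slack=1e-12):
    """Check |tan b - tan a| >= |b - a| (up to rounding slack) for a, b in (-pi/2, pi/2)."""
    return abs(math.tan(b) - math.tan(a)) >= abs(b - a) - slack

grid = [-1.5 + 3.0 * i / 300 for i in range(301)]   # points of [-1.5, 1.5]
print(all(tan_gap_ok(a, b) for a in grid for b in grid))
```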
Example 31.3. The function $f(x) = e^{x^2}$ is uniformly continuous on $[0,1]$. Although this is guaranteed by the fact that $f$ is continuous and $[0,1]$ is compact, we can prove this directly. Given $0 \le x < y \le 1$, the Mean Value Theorem asserts that there exists $c \in (x,y)$ so that
$$
e^{x^2} - e^{y^2} = 2c\, e^{c^2} (x - y).
$$
Since $0 \le c \le 1$, it follows that
$$
|e^{x^2} - e^{y^2}| \le 2e|x - y|.
$$
If $\epsilon > 0$ is given, it follows from the preceding inequality that
$$
|x - y| < \delta = \frac{\epsilon}{2e} \Rightarrow |f(x) - f(y)| < \epsilon.
$$
Since $\delta$ depends only upon $\epsilon$, it follows that $f(x) = e^{x^2}$ is uniformly continuous on $[0,1]$.
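The choice $\delta = \epsilon/(2e)$ can be sanity-checked numerically. The Python sketch below (helper names are ours) pairs each grid point $x \in [0,1]$ with a point $y \in [0,1]$ at distance just under $\delta$ and confirms $|e^{x^2} - e^{y^2}| < \epsilon$; sampling finitely many pairs illustrates, but of course does not replace, the Mean Value Theorem argument.

```python
import math

def delta_for(eps):
    """The choice delta = eps / (2e) from Example 31.3 (independent of x and y)."""
    return eps / (2 * math.e)

def sampled_check(eps, samples=2000):
    """Return True if every sampled pair with |x - y| < delta_for(eps) has |f(x) - f(y)| < eps."""
    d = delta_for(eps)
    f = lambda t: math.exp(t * t)
    for i in range(samples + 1):
        x = i / samples
        y = min(1.0, x + 0.999 * d)       # stays in [0, 1] and within delta of x
        if abs(f(x) - f(y)) >= eps:
            return False
    return True

print([sampled_check(eps) for eps in (1.0, 0.1, 0.01)])
```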
The following theorem (which we state without proof) asserts that the derivative of a differentiable function has the intermediate value property:
Theorem 31.5. Let $f : [a,b] \to \mathbb{R}$ be differentiable at each point of $[a,b]$. If $f'(a) < y < f'(b)$ or $f'(b) < y < f'(a)$, then there exists $x \in (a,b)$ such that $f'(x) = y$.
Example 31.4. There does not exist a function $F : \mathbb{R} \to \mathbb{R}$ such that $F'(x) = [x]$ for all $x \in \mathbb{R}$. Since the greatest integer function certainly does not have the intermediate value property, it is clear from the preceding theorem that it cannot be the derivative of another function.
LECTURE 32
Functions Behaving Badly

32.1. Functions Behaving Badly
Since we know most of the standard theorems from Calculus I pertaining to derivatives, we will not bother to prove them here. It is simply more important to know how to
use these classic theorems (and to understand exactly what they do and do not say) rather
than how to prove them.
The following examples illustrate that derivatives can have discontinuities which are not jump discontinuities:
Example 32.1. The function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$
f(x) = \begin{cases} x^2 \sin \frac{1}{x} & x > 0, \\ 0 & x \le 0 \end{cases}
$$
(see Figure 1) is differentiable everywhere, even at $x = 0$.

FIGURE 1. Graph of $f(x) = x^2 \sin(1/x)$.

Indeed, the standard formulas from Calculus I tell us that the derivative (see Figure 2) is given by
$$
f'(x) = 2x \sin \frac{1}{x} - \cos \frac{1}{x}
$$
for $x > 0$. Using the definition of the derivative, we also see that
$$
\lim_{x \to 0^-} \frac{f(x) - f(0)}{x - 0} = \lim_{x \to 0^-} 0 = 0
$$
and
$$
\lim_{x \to 0^+} \left| \frac{f(x) - f(0)}{x - 0} \right| = \lim_{x \to 0^+} \left| x \sin \frac{1}{x} \right| \le \lim_{x \to 0^+} x = 0,
$$
whence $f'(0) = 0$.

FIGURE 2. Graph of $f'(x) = 2x \sin(1/x) - \cos(1/x)$.

Since $f'(x)$ oscillates wildly (with amplitude approaching 1 as $x \to 0^+$), it follows that $\lim_{x \to 0} f'(x)$ does not exist; in particular, $f'$ is discontinuous at $x = 0$ even though $f'(0) = 0$. Note that $f'$ exists everywhere and the discontinuity at $x = 0$ is not a jump discontinuity.
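The claim that $f'$ has no limit at $0$ can be made concrete: at $x = 1/(2\pi k)$ we have $\sin(1/x) = 0$ and $\cos(1/x) = 1$, so $f'(x) = -1$, while at $x = 1/((2k+1)\pi)$ we get $f'(x) = 1$. The Python sketch below (our own illustration) evaluates the formula for $f'$ at both families of points, which crowd into $0$ as $k$ grows.

```python
import math

def fprime(x):
    """Derivative of f(x) = x^2 sin(1/x) for x > 0 (Example 32.1)."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for k in (1, 10, 100, 1000):
    x_minus = 1 / (2 * math.pi * k)        # cos(1/x) = 1,  sin(1/x) = 0
    x_plus = 1 / ((2 * k + 1) * math.pi)   # cos(1/x) = -1, sin(1/x) = 0
    print(k, fprime(x_minus), fprime(x_plus))
```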
Example 32.2. An even more bizarre function can be constructed by modifying the preceding example. Consider the function (see Figure 3)
$$
g(x) = \begin{cases} x^{3/2} \sin \frac{1}{x} & x > 0, \\ 0 & x \le 0. \end{cases}
$$

FIGURE 3. Graph of $g(x) = x^{3/2} \sin(1/x)$.

By reasoning similar to that of the preceding example, we see that $g'(0) = 0$. However, the standard derivative formulas from Calculus I tell us that
$$
g'(x) = \frac{3}{2}\sqrt{x}\, \sin \frac{1}{x} - \frac{1}{\sqrt{x}} \cos \frac{1}{x}
$$
for $x > 0$. Thus $g$ is differentiable at $x = 0$, but $g'$ oscillates with increasing frequency and unbounded amplitude as $x$ approaches zero (see Figure 4). In particular, $g'$ exists everywhere but is discontinuous at $0$ in an extreme way.

FIGURE 4. Graph of $g'(x) = \frac{3}{2}\sqrt{x}\, \sin(1/x) - \frac{1}{\sqrt{x}} \cos(1/x)$.
Example 32.3. This example should dispel the common misconception that if $f'(x_0) > 0$, then $f$ must be increasing in some neighborhood of $x_0$. Using similar reasoning to the preceding examples, one can show that the function (see Figure 5)
$$
h(x) = \begin{cases} x + 2x^2 \sin \frac{1}{x} & x \neq 0, \\ 0 & x = 0 \end{cases}
$$
satisfies $h'(0) = 1 > 0$, and hence one might naively assume that $h$ is increasing in some small neighborhood of $0$ (this is not guaranteed by any theorem; read your calculus text more closely). This turns out to be false. Indeed, the derivative of $h$ is given by
$$
h'(x) = \begin{cases} 1 + 4x \sin \frac{1}{x} - 2 \cos \frac{1}{x} & x \neq 0, \\ 1 & x = 0, \end{cases}
$$
which oscillates between positive and negative values infinitely often as $x$ approaches $0$. Thus $h$ is not increasing on any open interval that contains $0$.

FIGURE 5. Graph of $h(x) = x + 2x^2 \sin(1/x)$.
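The failure of monotonicity near $0$ in Example 32.3 can be observed directly. At $x_k = 1/(2\pi k)$ we have $\cos(1/x_k) = 1$, so $h'(x_k) = -1$; on the tiny interval $[x_k - \eta, x_k + \eta]$ with $\eta = x_k^2/10$ (our choice, small enough that $1/x$ moves by only about $0.1$), $h'$ stays negative, so $h$ actually decreases there. The Python sketch below checks this for a few $k$.

```python
import math

def h(x):
    """h(x) = x + 2x^2 sin(1/x) for x != 0 and h(0) = 0 (Example 32.3)."""
    return x + 2 * x * x * math.sin(1 / x) if x != 0 else 0.0

def h_prime(x):
    """h'(x) = 1 + 4x sin(1/x) - 2cos(1/x) for x != 0 and h'(0) = 1."""
    return 1 + 4 * x * math.sin(1 / x) - 2 * math.cos(1 / x) if x != 0 else 1.0

for k in (10, 100, 1000):
    xk = 1 / (2 * math.pi * k)    # here cos(1/xk) = 1, so h'(xk) = -1
    eta = xk * xk / 10            # keeps 1/x within about 0.1 of 2*pi*k
    print(k, h_prime(xk), h(xk - eta) > h(xk + eta))
```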
Example 32.4. Another common misconception is that if a function has a local minimum or maximum at a point, then the derivative of that function must undergo a simple change of sign at that point. Consider the function (see Figure 6)
$$
k(x) = \begin{cases} x^4 \big(2 + \sin \frac{1}{x}\big) & x \neq 0, \\ 0 & x = 0. \end{cases}
$$

FIGURE 6. Graph of $k(x) = x^4 (2 + \sin(1/x))$.

In particular, we note that $k$ attains its absolute minimum value $k(0) = 0$ at $x = 0$, since $k(x) \ge x^4 > 0$ for $x \neq 0$. Using the standard formulas for $x \neq 0$ and the definition of the derivative at $x = 0$, we can see that
$$
k'(x) = \begin{cases} 4x^3 \big(2 + \sin \frac{1}{x}\big) - x^2 \cos \frac{1}{x} & x \neq 0, \\ 0 & x = 0. \end{cases}
$$
In particular, $k'(0) = 0$ as expected. However, the formula for $k'$ shows that $k'(x)$ assumes both positive and negative values in any neighborhood of $0$, and hence $k$ is not monotonic on any interval $(0, \delta)$ or $(-\delta, 0)$ for any $\delta > 0$.
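The sign changes of $k'$ in Example 32.4 occur at explicit points. At $a = 1/(2\pi m)$ we have $\sin(1/a) = 0$ and $\cos(1/a) = 1$, so $k'(a) = a^2(8a - 1) < 0$ once $a < 1/8$; at $b = 1/((2m+1)\pi)$ the cosine equals $-1$, so $k'(b) = b^2(8b + 1) > 0$. The mirror-image points $-a$ and $-b$ behave similarly on the negative side. The Python sketch below (our own check) confirms these signs for a few $m$.

```python
import math

def k_prime(x):
    """Derivative of k(x) = x^4 (2 + sin(1/x)) for x != 0, with k'(0) = 0 (Example 32.4)."""
    if x == 0:
        return 0.0
    return 4 * x**3 * (2 + math.sin(1 / x)) - x**2 * math.cos(1 / x)

for m in (10, 100, 1000):
    a = 1 / (2 * math.pi * m)          # cos(1/a) = 1:  k'(a) = a^2 (8a - 1) < 0
    b = 1 / ((2 * m + 1) * math.pi)    # cos(1/b) = -1: k'(b) = b^2 (8b + 1) > 0
    # right of 0: k'(a) < 0 < k'(b); left of 0: k'(-a) < 0 < k'(-b)
    print(m, k_prime(a) < 0 < k_prime(b), k_prime(-a) < 0 < k_prime(-b))
```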
LECTURE 33
Uniform Convergence
33.1. Pointwise Convergence
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges pointwise to a function $f : A \to B$ if
$$
\lim_{n \to \infty} f_n(a) = f(a)
$$
for each $a \in A$.
In other words, a sequence fn converges pointwise to f if and only if it converges
point-by-point. Unfortunately, pointwise convergence is of limited use since it does not
respect continuity. Consider the following example:
Example 33.1. The functions $f_n : [0,1] \to \mathbb{R}$ defined by $f_n(x) = x^n$ (see Figure 1) are each continuous.

FIGURE 1. Graphs of $x, x^2, \ldots, x^{20}$.

Nevertheless,
$$
\lim_{n \to \infty} f_n(x) = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1, \end{cases}
$$
and thus the pointwise limit of the functions fn is discontinuous at x = 1. In particular,

this shows that the notion of pointwise convergence is not restrictive enough to ensure that
the limit of continuous functions is continuous.
Even more disheartening is the fact that $[0,1]$ is compact and hence each function $f_n$ is uniformly continuous (as opposed to merely continuous) on $[0,1]$. We can verify this directly. Consider some specific function $f_n$ in our sequence. If $\epsilon > 0$ is given, then let $\delta = \epsilon/n$ so that $|x - y| < \delta$ implies that
$$
\begin{aligned}
|f_n(x) - f_n(y)| &= |x^n - y^n| \\
&= |(x - y)(x^{n-1} + x^{n-2}y + \cdots + xy^{n-2} + y^{n-1})| \\
&\le \underbrace{(1 + 1 + \cdots + 1 + 1)}_{n} |x - y| \\
&= n|x - y| \\
&< \epsilon,
\end{aligned}
$$
so that each $f_n$ is uniformly continuous. Thus even the pointwise limit of uniformly continuous functions need not be continuous. Although our $\delta$s do not depend on $x, y$, they do depend on $n$ and $\epsilon$.
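A quick numerical look at Example 33.1 shows why the convergence there is so badly behaved: for every $n$, the point $x_n = 2^{-1/n} \in [0,1)$ satisfies $x_n^n = 1/2$, so somewhere on $[0,1)$ the function $f_n$ is still at distance $1/2$ from the pointwise limit $0$, even though $x_n \to 1$. The Python sketch below (function name ours) computes these points.

```python
def half_gap_point(n):
    """Return x_n = 2**(-1/n) in [0, 1) and the gap |f_n(x_n) - f(x_n)| = x_n**n,
    where f_n(x) = x**n and the pointwise limit f is 0 on [0, 1)."""
    x = 0.5 ** (1.0 / n)
    return x, x**n

for n in (1, 10, 100, 1000):
    x, gap = half_gap_point(n)
    print(n, x, gap)      # x creeps toward 1 while the gap stays at 1/2
```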
33.2. Uniform Convergence
Since pointwise convergence does not preserve continuity, we need a stronger, more
restrictive notion of convergence. Fortunately, we have already laid some of the groundwork for this in our discussion of normed vector spaces.
Definition. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. A sequence of functions $f_n : A \to B$ converges uniformly to a function $f : A \to B$ if for each $\epsilon > 0$ there exists $N \in \mathbb{N}$ so that for all $a \in A$
$$
n \ge N \Rightarrow d_B(f_n(a), f(a)) < \epsilon.
$$
The main way to picture uniform convergence (at least in the special case of functions from some closed interval $[a,b]$ to $\mathbb{R}$) is via $\epsilon$-tubes (see Figure 2).

FIGURE 2. Graphs of a function $f$ (in black), an $\epsilon$-tube, and a function $f_n$ (in blue) satisfying $|f(x) - f_n(x)| < \epsilon$ for all $x$.

Based upon the definitions, one can clearly see that a sequence which converges uniformly also converges pointwise. Moreover, uniform convergence is another name for something we have encountered before:
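Theorem 33.1. A sequence $f_n \in C([a,b])$ converges uniformly to a function $f \in C([a,b])$ if and only if $f_n \xrightarrow{d_\infty} f$. In other words, convergence with respect to the metric $d_\infty$ is equivalent to uniform convergence.
Proof. This is due to the fact that $|f_n(x) - f(x)| \le \epsilon$ for all $x \in [a,b]$ if and only if $\sup\{|f_n(x) - f(x)| : x \in [a,b]\} \le \epsilon$, if and only if $d_\infty(f_n, f) \le \epsilon$. (Replacing $< \epsilon$ by $\le \epsilon$ in the definition of uniform convergence does not change the notion, since $\epsilon > 0$ is arbitrary.)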

An important fact about uniform convergence is that it preserves continuity:
Theorem 33.2. Let $(A, d_A)$ and $(B, d_B)$ be metric spaces. If $f_n : A \to B$ is a sequence of continuous functions which converges uniformly to $f : A \to B$, then $f$ is continuous. In other words: the uniform limit of continuous functions is continuous.
Proof. Suppose that $f_n : A \to B$ converges uniformly to some function $f : A \to B$. We wish to show that the limit function $f$ is continuous on $A$. It therefore suffices to show that $f$ is continuous at each point $x \in A$. To this end, let $\epsilon > 0$ and let $x \in A$. Since $f_n$ converges to $f$ uniformly on $A$, there exists $N \in \mathbb{N}$ so that
$$
n \ge N \Rightarrow d_B(f_n(y), f(y)) < \frac{\epsilon}{3} \tag{33.1}
$$
for any $y \in A$ (in particular, this holds for $n = N$). Since $f_N$ is continuous at $x$, there exists $\delta > 0$ so that
$$
d_A(x,y) < \delta \Rightarrow d_B(f_N(x), f_N(y)) < \frac{\epsilon}{3}. \tag{33.2}
$$
Putting (33.1) and (33.2) together, we find that if $d_A(x,y) < \delta$, then
$$
\begin{aligned}
d_B(f(x), f(y)) &\le d_B(f(x), f_N(x)) + d_B(f_N(x), f_N(y)) + d_B(f_N(y), f(y)) \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} \\
&= \epsilon.
\end{aligned}
$$
In other words, given any $\epsilon > 0$ and any $x \in A$, we can find a corresponding $\delta > 0$ so that
$$
d_A(x,y) < \delta \Rightarrow d_B(f(x), f(y)) < \epsilon.
$$
Thus the limit function $f$ is continuous on $A$.
LECTURE 34
Uniform Convergence
34.1. Completeness of C(X)
Definition. Let $(X, d)$ be a compact metric space. $C(X)$ denotes the normed vector space of all continuous functions $f : X \to \mathbb{R}$ endowed with the norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$.
By the Extreme Value Theorem, it follows that $\|f\|_\infty$ is finite for each $f \in C(X)$ (this is why we need $X$ to be compact). Being a normed vector space, $C(X)$ is automatically a metric space when equipped with the associated metric
$$
d_\infty(f, g) = \sup_{x \in X} |f(x) - g(x)|.
$$
Theorem 34.1. If (X, d) is a compact metric space, then C(X) is a complete metric space.
Proof. Let $f_n$ be a Cauchy sequence in $C(X)$. It follows from the fact that
$$
|f_n(x) - f_m(x)| \le \sup_{t \in X} |f_n(t) - f_m(t)| = d_\infty(f_n, f_m)
$$
that $f_n(x)$ is Cauchy in $\mathbb{R}$ for each $x \in X$. Since $\mathbb{R}$ is complete, the sequence $f_n$ converges pointwise and we may define a function $f : X \to \mathbb{R}$ by the formula
$$
f(x) = \lim_{n \to \infty} f_n(x).
$$
We claim that the sequence $f_n$ converges to $f$ uniformly.
If $\epsilon > 0$ is given, then the fact that $f_n$ is Cauchy (with respect to $d_\infty$) implies that there exists $N \in \mathbb{N}$ so that
$$
n, m \ge N \Rightarrow d_\infty(f_n, f_m) < \frac{\epsilon}{2}.
$$
Since $f_n$ converges to $f$ pointwise, it follows that for each $x \in X$ there exists an $m(x) \ge N$ so that
$$
|f_{m(x)}(x) - f(x)| < \frac{\epsilon}{2}.
$$
Putting this all together, we see that $n \ge N$ implies that
$$
\begin{aligned}
|f_n(x) - f(x)| &\le |f_n(x) - f_{m(x)}(x)| + |f_{m(x)}(x) - f(x)| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} \\
&= \epsilon
\end{aligned}
$$
for any $x \in X$. This implies that the sequence $f_n$ converges to $f$ uniformly on $X$. Since the uniform limit of continuous functions is continuous, it follows that $f$ is continuous (and hence belongs to $C(X)$). Moreover, $d_\infty(f_n, f) \le \epsilon$ for $n \ge N$, so $f_n \to f$ in $C(X)$. Therefore $C(X)$ is complete.

Another fact (which we shall not prove) is the following:
Theorem 34.2. If $f_n \in C([a, b])$ converges to $f$ uniformly, then
\[ \lim_{n \to \infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx. \]
In other words, it is legal to interchange integration and uniform limits.

The situation for derivatives is somewhat more complicated (see Figure 1).
FIGURE 1. The uniform limit of the differentiable functions $f_n(x) = \sqrt{x^2 + \frac{1}{2^n}}$ is $f(x) = |x|$, which is not differentiable at $x = 0$.
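A quick numerical check that the convergence in Figure 1 is uniform. This is a sketch: the window $[-1, 1]$, the grid size and the helper names are our own choices. Since $\sqrt{x^2 + c} - |x| = c / (\sqrt{x^2 + c} + |x|)$ is largest at $x = 0$, we expect $d_\infty(f_n, f) = 2^{-n/2} \to 0$ on $[-1, 1]$.

```python
import math

def f_n(x: float, n: int) -> float:
    """The differentiable approximant sqrt(x^2 + 2^(-n)) from Figure 1."""
    return math.sqrt(x * x + 2.0 ** (-n))

def sup_error(n: int, samples: int = 2001) -> float:
    """Grid estimate of sup over [-1, 1] of |f_n(x) - |x||; the grid contains x = 0."""
    xs = [-1 + 2 * k / (samples - 1) for k in range(samples)]
    return max(abs(f_n(x, n) - abs(x)) for x in xs)

for n in (0, 4, 8, 16):
    print(n, sup_error(n), 2 ** (-n / 2))   # the two columns agree
```

The derivatives $f_n'(x) = x / \sqrt{x^2 + 2^{-n}}$ are not needed for this check. Their limit is discontinuous at $0$, which is exactly the point of the figure.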
LECTURE 35
Weierstrass M-test
35.1. Weierstrass M-Test
Theorem 35.1 (Weierstrass M-Test). Let $(X, d)$ be a metric space and let $f_n : X \to \mathbb{R}$ be a sequence of functions satisfying
\[ |f_n(x)| \le M_n \]
for all $x \in X$ and for all $n \in \mathbb{N}$. If $\sum_{n=0}^\infty M_n$ converges, then $\sum_{n=0}^\infty f_n$ converges uniformly (and absolutely for each $x \in X$). In particular, if each $f_n$ is continuous, then the limit function $f : X \to \mathbb{R}$ is continuous.
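Proof. For each $x \in X$, the numerical series $\sum_{n=0}^\infty |f_n(x)|$ converges by comparison with $\sum_{n=0}^\infty M_n$. In particular, $\sum_{n=0}^\infty f_n(x)$ is absolutely convergent for each $x \in X$, whence we may define a function $f : X \to \mathbb{R}$ by setting $f(x) = \sum_{n=0}^\infty f_n(x)$.
Given $\epsilon > 0$, let $N \in \mathbb{N}$ be so large that
\[ n \ge N \implies \sum_{j=n+1}^\infty M_j < \epsilon. \]
This follows from the fact that the tail end of a convergent series goes to zero. For each $x \in X$ and each $n \ge N$ we now have
\[ \Big| f(x) - \sum_{j=0}^n f_j(x) \Big| = \Big| \sum_{j=n+1}^\infty f_j(x) \Big| \le \sum_{j=n+1}^\infty |f_j(x)| \le \sum_{j=n+1}^\infty M_j < \epsilon. \]
Thus the series $\sum_{n=0}^\infty f_n(x)$ converges uniformly to $f(x)$.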

Now suppose that each $f_n$ is continuous. Then each partial sum $\sum_{j=0}^n f_j$ is continuous. Since the uniform limit of continuous functions is continuous, it follows that the limit function $f : X \to \mathbb{R}$ is continuous.

The Weierstrass M-Test is a useful tool for constructing continuous functions using infinite series:
Corollary 10. If $f_n : [a, b] \to \mathbb{R}$ is a sequence of continuous functions such that $\sum_{n=0}^\infty \|f_n\|_\infty$ converges, then $f = \sum_{n=0}^\infty f_n$ is well-defined and continuous on $[a, b]$.
Example 35.1. Let $f_n(x) = a_n \sin nx$ where $|a_n| \le 1/n^2$. It follows that $f(x) = \sum_{n=1}^\infty a_n \sin nx$ converges uniformly on $\mathbb{R}$ (why?) and is continuous there.
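To see the M-Test at work numerically, take the illustrative choice $a_n = 1/n^2$; any choice with $|a_n| \le 1/n^2$ would do. Since $\sum_{j > N} 1/j^2 < \int_N^\infty dt/t^2 = 1/N$, two partial sums beyond the $N$th never differ by more than $1/N$, and this bound does not depend on $x$. The sketch below checks this on a grid over one period. The grid, the cutoffs and the helper names are arbitrary choices of ours.

```python
import math

def partial_sum(x: float, n_terms: int) -> float:
    """S_N(x) = sum_{n=1}^{N} sin(n x) / n^2, i.e. the illustrative choice a_n = 1/n^2."""
    return sum(math.sin(n * x) / n**2 for n in range(1, n_terms + 1))

def max_gap(n_small: int, n_large: int, samples: int = 400) -> float:
    """Largest |S_large(x) - S_small(x)| over a uniform grid on [0, 2*pi]."""
    xs = [2 * math.pi * k / samples for k in range(samples + 1)]
    return max(abs(partial_sum(x, n_large) - partial_sum(x, n_small)) for x in xs)

N = 20
gap = max_gap(N, 200)
print(gap, 1 / N)   # the gap stays below the x-independent tail bound 1/N
```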
The Weierstrass M-Test also furnishes a way of producing somewhat bizarre continuous functions. For example, one can show that a sequence of everywhere differentiable functions can converge uniformly to a nowhere differentiable function.
Example 35.2. Start with a sawtooth function $w : \mathbb{R} \to \mathbb{R}$ defined by
\[ w(x) = 1 - |2\langle x \rangle - 1| \]
where $\langle x \rangle$ denotes the fractional part of $x$ (see Figure 1).
FIGURE 1. Graphs of $w(x)$, $w(4x)$, $w(16x)$.
The Weierstrass nowhere differentiable function is defined by
\[ W(x) = \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n w(4^n x). \]
Since $0 \le w \le 1$, the $n$th term is bounded by $M_n = (\frac{3}{4})^n$, and $\sum_{n=0}^\infty (\frac{3}{4})^n$ converges. Hence, by the Weierstrass M-Test, the series defining $W(x)$ converges uniformly on $[0, 1]$. In particular, $W(x)$ is continuous on $[0, 1]$ (and hence uniformly continuous).
However, one can show that $W(x)$ does not have a derivative at any point of $[0, 1]$. In light of how spiky the graphs of the functions $\sum_{n=0}^{3} (\frac{3}{4})^n w(4^n x)$ and $\sum_{n=0}^{4} (\frac{3}{4})^n w(4^n x)$ are (see Figure 2), it is not surprising that the limit function $W$ is not differentiable anywhere.
FIGURE 2. Graphs of the fourth and fifth partial sums of the Weierstrass series.
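The partial sums in Figure 2 are easy to evaluate directly. This Python sketch uses function names of our own. Floating-point evaluation of $w(4^n x)$ loses accuracy for large $n$, so the sketch keeps $n \le 15$. It checks the uniform Cauchy estimate behind the M-Test: adding the terms with $6 \le n \le 15$ moves the partial sum by at most $\sum_{n \ge 6} (3/4)^n = 4(3/4)^6$ at every sample point.

```python
import math

def w(x: float) -> float:
    """Sawtooth w(x) = 1 - |2<x> - 1|, where <x> = x - floor(x) is the fractional part."""
    frac = x - math.floor(x)
    return 1 - abs(2 * frac - 1)

def weierstrass_partial(x: float, n_max: int) -> float:
    """Partial sum of W: sum_{n=0}^{n_max} (3/4)^n * w(4^n x)."""
    return sum(0.75 ** n * w(4 ** n * x) for n in range(n_max + 1))

def tail_bound(n_max: int) -> float:
    """M-test bound on the omitted terms: sum_{n > n_max} (3/4)^n = 4 * (3/4)^(n_max + 1)."""
    return 4 * 0.75 ** (n_max + 1)

# Refining the partial sum never moves it by more than the (x-independent) tail bound.
xs = [k / 1000 for k in range(1001)]
gap = max(abs(weierstrass_partial(x, 15) - weierstrass_partial(x, 5)) for x in xs)
print(gap <= tail_bound(5))   # True
```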
35.2. Weierstrass Approximation Theorem

A famous result which we might prove in Math 132 is the Weierstrass Approximation Theorem:
Theorem 35.2. The polynomials are dense in $C([a, b])$ with respect to the $d_\infty$ metric. That is, given $f \in C([a, b])$, there exist polynomials $p_n$ such that $p_n$ converges to $f$ uniformly (i.e., $\|p_n - f\|_\infty \to 0$).
The Weierstrass Theorem is remarkable, for it asserts that even a continuous function which is not differentiable anywhere on $[a, b]$ (like the infamous Weierstrass function of Example 35.2) is the uniform limit of polynomials (which are themselves infinitely differentiable).
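One standard constructive proof uses the Bernstein polynomials $B_n f(x) = \sum_{k=0}^n f(k/n) \binom{n}{k} x^k (1 - x)^{n-k}$ on $[0, 1]$. This is not necessarily the proof we would give in Math 132. The Python sketch below applies them to $f(x) = |x - \tfrac12|$, which is continuous but not differentiable at $\tfrac12$, and estimates $\|B_n f - f\|_\infty$ on a grid. The test function and the grid are illustrative choices.

```python
from math import comb

def bernstein(f, n: int, x: float) -> float:
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))

def sup_error(f, n: int, samples: int = 501) -> float:
    """Grid estimate of ||B_n f - f||_inf on [0, 1]."""
    xs = [j / (samples - 1) for j in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

def f(x: float) -> float:
    """Continuous on [0, 1] but not differentiable at x = 1/2."""
    return abs(x - 0.5)

for n in (4, 16, 64, 256):
    print(n, sup_error(f, n))   # shrinks roughly like 1/sqrt(n) for this f
```

Convergence is slow here (about $1/\sqrt{n}$). Bernstein polynomials are chosen for their simplicity, not their efficiency.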
35.3. Cauchy's Mean Value Theorem
The following theorem is known as Cauchy's Mean Value Theorem or the Extended Mean Value Theorem:
Theorem 35.3 (Cauchy's Mean Value Theorem). If $f(x)$ and $g(x)$ are both continuous on the closed interval $[a, b]$, differentiable on the open interval $(a, b)$, and $g'(x) \ne 0$ for all $x \in (a, b)$, then there exists some $c \in (a, b)$ such that¹
\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \tag{35.1} \]
Setting $g(x) = x$ on $[a, b]$ yields the standard version of the Mean Value Theorem.
Proof. Define an auxiliary function
\[ h(x) = f(x) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g(x) \]
and observe that
\[ h(a) = h(b) = \frac{f(a)g(b) - f(b)g(a)}{g(b) - g(a)} \]
(this is a straightforward, but slightly tedious, computation). Since $h$ is continuous on $[a, b]$ and differentiable on $(a, b)$, it follows from Rolle's Theorem that there exists $c \in (a, b)$ such that $h'(c) = 0$. In other words,
\[ 0 = f'(c) - \frac{f(b) - f(a)}{g(b) - g(a)}\, g'(c). \]
The preceding equation immediately implies (35.1).
¹Observe that the denominator $g(b) - g(a)$ is nonzero. Indeed, if $g(a) = g(b)$, then by Rolle's Theorem there exists $x_0 \in (a, b)$ such that $g'(x_0) = 0$. This contradicts the hypothesis of the theorem.
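Here is a small worked instance of (35.1). For $f(x) = x^3$ and $g(x) = x^2$ on $[1, 2]$, we have $\frac{f(2) - f(1)}{g(2) - g(1)} = \frac{7}{3}$. Solving $\frac{f'(c)}{g'(c)} = \frac{3c}{2} = \frac{7}{3}$ gives $c = \frac{14}{9} \in (1, 2)$. The Python sketch below finds such a $c$ numerically by bisecting $h'(x) = f'(x) - \frac{f(b) - f(a)}{g(b) - g(a)} g'(x)$, where $h$ is the auxiliary function from the proof. Rolle's Theorem only guarantees a zero of $h'$, not a sign change, so the sketch assumes that $h'$ changes sign on $[a, b]$. The function name is ours.

```python
def cauchy_mvt_point(f, g, df, dg, a: float, b: float, tol: float = 1e-12) -> float:
    """Find c in (a, b) with f'(c)/g'(c) = (f(b) - f(a))/(g(b) - g(a)).

    Bisects h'(x) = f'(x) - K g'(x), K the difference quotient; assumes h' changes sign on [a, b]."""
    K = (f(b) - f(a)) / (g(b) - g(a))

    def h_prime(x: float) -> float:
        return df(x) - K * dg(x)

    lo, hi = a, b
    if h_prime(lo) * h_prime(hi) > 0:
        raise ValueError("h' does not change sign on [a, b]; bisection does not apply")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h_prime(lo) * h_prime(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

c = cauchy_mvt_point(lambda x: x**3, lambda x: x**2, lambda x: 3 * x**2, lambda x: 2 * x, 1.0, 2.0)
print(c, 14 / 9)
```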
LECTURE 36
L'Hôpital's Rule
and Taylor's Theorem
36.1. L'Hôpital's Rule
An important consequence of Cauchy's Mean Value Theorem is the following:
Theorem 36.1 (L'Hôpital's Rule). If
(i) $f, g$ are differentiable on $(a, b)$,
(ii) $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = 0$,
(iii) $g(x) \ne 0$ and $g'(x) \ne 0$ for all $x \in (a, b)$,
(iv) $\frac{f'(x)}{g'(x)}$ tends to a finite limit $L$ as $x \to a^+$,
then
\[ \lim_{x \to a^+} \frac{f(x)}{g(x)} = \lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L. \tag{36.1} \]
Similar statements hold in the cases where $x \to \infty$ and/or $\lim_{x \to a^+} f(x) = \lim_{x \to a^+} g(x) = \infty$.
Proof. Observe that (ii) ensures that $f$ and $g$ extend continuously to $[a, b)$ by setting
\[ f(a) = g(a) = 0. \]
Let $x_n$ be a sequence in $(a, b)$ tending to $a$. By Cauchy's Mean Value Theorem (applied on $[a, x_n]$), there exists a sequence $c_n$ such that $a < c_n < x_n$ for all $n \in \mathbb{N}$ and such that
\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n) - f(a)}{g(x_n) - g(a)}. \]
Since $f(a) = g(a) = 0$, it follows that
\[ \frac{f'(c_n)}{g'(c_n)} = \frac{f(x_n)}{g(x_n)} \]
for all $n \in \mathbb{N}$. As $x_n \to a^+$, it follows from the Squeeze Theorem that $c_n \to a^+$, whence
\[ \lim_{n \to \infty} \frac{f(x_n)}{g(x_n)} = \lim_{n \to \infty} \frac{f'(c_n)}{g'(c_n)} = L. \]
By (iv), the limit $L$ is independent of the sequence $c_n$. In particular, the preceding holds for every sequence $x_n$ in $(a, b)$ tending to $a$, from which the desired result (36.1) follows.
Example 36.1. Condition (iv) is essential. Consider the functions
\[ f(x) = x + \sin x, \qquad g(x) = x. \]
Clearly
\[ \lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} (1 + \cos x), \]
which does not exist. On the other hand, it is clear that
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{x + \sin x}{x} = 1 + \lim_{x \to \infty} \frac{\sin x}{x} = 1 + 0 = 1. \]
An incorrect application of L'Hôpital's rule has led to the wrong answer.
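The Python sketch below evaluates both ratios at large multiples of $\pi$; the sample points are our choice. The ratio $\frac{f(x)}{g(x)} = 1 + \frac{\sin x}{x}$ stays within $\frac{1}{x}$ of $1$. The ratio $\frac{f'(x)}{g'(x)} = 1 + \cos x$ keeps alternating between values near $2$ and near $0$.

```python
import math

def ratio(x: float) -> float:
    """f(x)/g(x) = (x + sin x)/x from Example 36.1."""
    return (x + math.sin(x)) / x

def derivative_ratio(x: float) -> float:
    """f'(x)/g'(x) = 1 + cos x."""
    return 1 + math.cos(x)

for k in (10, 100, 1000):
    even, odd = 2 * k * math.pi, (2 * k + 1) * math.pi
    print(f"{ratio(even):.6f}  {derivative_ratio(even):.6f}  {derivative_ratio(odd):.6f}")
```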

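Example 36.2. Condition (iii) is also essential. Consider the functions
\[ f(x) = x + \cos x \sin x, \qquad g(x) = e^{\sin x}(x + \cos x \sin x) \]
and observe that
\[ \begin{aligned} f'(x) &= 2\cos^2 x, \\ g'(x) &= 2e^{\sin x}\cos^2 x + e^{\sin x}\cos x\,(x + \cos x \sin x) \\ &= e^{\sin x}\cos x\,(x + \cos x \sin x + 2\cos x). \end{aligned} \]
In particular, note that $g'\big(\frac{(2n+1)\pi}{2}\big) = 0$ for each $n \in \mathbb{Z}$ because of the factor of $\cos x$ in the expression for $g'(x)$. On one hand,
\[ \begin{aligned} \lim_{x \to \infty} \frac{f'(x)}{g'(x)} &= \lim_{x \to \infty} \frac{\frac{d}{dx}(x + \cos x \sin x)}{\frac{d}{dx}\big(e^{\sin x}(x + \cos x \sin x)\big)} \\ &= \lim_{x \to \infty} \frac{2\cos^2 x}{2e^{\sin x}\cos^2 x + e^{\sin x}\cos x\,(x + \cos x \sin x)} \\ &= \lim_{x \to \infty} \frac{2\cos x}{e^{\sin x}(x + \sin x \cos x + 2\cos x)} \\ &= 0. \end{aligned} \]
The last limit is $0$ because, for $x > 3$, the last expression is bounded in absolute value by $\frac{2e}{x - 3}$ (use $e^{\sin x} \ge e^{-1}$ and $x + \sin x \cos x + 2\cos x \ge x - 3$). On the other hand,
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{1}{e^{\sin x}} \]
does not exist. An incorrect application of L'Hôpital's rule in this instance leads to the wrong answer.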
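Here is a numerical check of Example 36.2 as a Python sketch. It does three things:
• compares the closed form for $g'(x)$ against a central difference (the step $h = 10^{-6}$ is an arbitrary choice);
• shows that $\frac{f(x)}{g(x)} = e^{-\sin x}$ keeps oscillating;
• confirms the bound $\left|\frac{f'(x)}{g'(x)}\right| \le \frac{2e}{x - 3}$ at a sample point where $\cos x \ne 0$.

```python
import math

def f(x: float) -> float:
    return x + math.cos(x) * math.sin(x)

def g(x: float) -> float:
    return math.exp(math.sin(x)) * (x + math.cos(x) * math.sin(x))

def f_prime(x: float) -> float:
    return 2 * math.cos(x) ** 2

def g_prime(x: float) -> float:
    """Factored form e^{sin x} cos x (x + cos x sin x + 2 cos x) from the text."""
    c, s = math.cos(x), math.sin(x)
    return math.exp(s) * c * (x + c * s + 2 * c)

def central_diff(func, x: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient, used only to sanity-check g_prime."""
    return (func(x + h) - func(x - h)) / (2 * h)

# f/g = e^{-sin x} keeps swinging between roughly 1/e and e, so lim f/g does not exist ...
for x in (1000 * math.pi + math.pi / 2, 1000 * math.pi + 3 * math.pi / 2):
    print(f(x) / g(x))

# ... while, away from the zeros of cos x, |f'/g'| <= 2e/(x - 3), which tends to 0.
print(abs(f_prime(1000.0) / g_prime(1000.0)) <= 2 * math.e / (1000.0 - 3))  # True
```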
36.2. Taylor's Theorem
An important generalization of the Mean Value Theorem is Taylor's Theorem:
Theorem 36.2 (Taylor's Theorem). Let $n \ge 0$ and $f : [a, b] \to \mathbb{R}$. If
(i) $f, f', \ldots, f^{(n)}$ are continuous on $(a, b)$,
(ii) $f^{(n+1)}$ exists on $(a, b)$,
(iii) $x, x_0 \in (a, b)$,
then there exists $\xi$ strictly between $x$ and $x_0$ such that
\[ f(x) = \underbrace{f(x_0) + f'(x_0)(x - x_0) + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n}_{P_n(x)} + \underbrace{\frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}}_{R_n(x)}. \]
Proof. Fix $x \ne x_0$ and let $r_n = r_n(x)$ be the number defined by
\[ f(x) = \underbrace{\sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k}_{P_n(x)} + \underbrace{\frac{r_n}{(n+1)!}(x - x_0)^{n+1}}_{R_n(x)} \]
(for this specific value of $x$ we are certainly not asserting that $f$ is a polynomial of degree $n + 1$). We wish to show that $r_n = f^{(n+1)}(\xi)$ for some $\xi$ lying strictly between $x$ and $x_0$.
Define the auxiliary function
\[ F(t) = \sum_{k=0}^n \frac{f^{(k)}(t)}{k!}(x - t)^k + \frac{r_n}{(n+1)!}(x - t)^{n+1} \]
and observe that
\[ F(x_0) = F(x) = f(x). \]
A computation based on the telescoping series trick shows that
\[ F'(t) = \frac{f^{(n+1)}(t)}{n!}(x - t)^n - \frac{r_n}{n!}(x - t)^n = \frac{(x - t)^n}{n!}\big(f^{(n+1)}(t) - r_n\big). \]
Since $F$ is continuous on the closed interval between $x_0$ and $x$ and differentiable on the corresponding open interval, it follows from Rolle's Theorem that there exists a $\xi$ lying strictly between $x_0$ and $x$ such that $F'(\xi) = 0$. In other words,
\[ 0 = F'(\xi) = \frac{(x - \xi)^n}{n!}\big(f^{(n+1)}(\xi) - r_n\big), \]
whence $r_n = f^{(n+1)}(\xi)$ (because $\xi \ne x$), as desired.
The expression
\[ P_n(x) = \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x - x_0)^k \]
is called the $n$th order Taylor approximation to $f$ at $x_0$. It is a polynomial of degree at most $n$.
Observe that Taylor's Theorem provides the estimate
\[ |f(x) - P_n(x)| = \frac{|f^{(n+1)}(\xi)|}{(n+1)!}|x - x_0|^{n+1} \]
of the error in approximating $f(x)$ by $P_n(x)$. The expression
\[ R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1} \]
is called the $n$th order remainder term.
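For $f(x) = e^x$ every derivative equals $e^x$. So for $\xi$ between $x_0$ and $x$ we have $|f^{(n+1)}(\xi)| \le e^{\max(x, x_0)}$, and hence $|f(x) - P_n(x)| \le e^{\max(x, x_0)} \frac{|x - x_0|^{n+1}}{(n+1)!}$. The Python sketch below compares the actual error with this bound. It is an illustration only; the choices $f = \exp$, $x_0 = 0$ and $x = 1$ are ours.

```python
import math

def taylor_exp(x: float, x0: float, n: int) -> float:
    """P_n(x) for f = exp centered at x0 (every derivative of exp is exp)."""
    return sum(math.exp(x0) * (x - x0) ** k / math.factorial(k) for k in range(n + 1))

def remainder_bound(x: float, x0: float, n: int) -> float:
    """|R_n(x)| <= e^{max(x, x0)} |x - x0|^{n+1} / (n+1)!, since |f^{(n+1)}(xi)| = e^xi."""
    return math.exp(max(x, x0)) * abs(x - x0) ** (n + 1) / math.factorial(n + 1)

x, x0 = 1.0, 0.0
for n in (1, 3, 5, 7, 9):
    error = abs(math.exp(x) - taylor_exp(x, x0, n))
    print(n, error, remainder_bound(x, x0, n), error <= remainder_bound(x, x0, n))
```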
LECTURE 37
Taylor Series
Theorem 37.1 (Taylor's Inequality). If $|f^{(n+1)}(x)| \le M$ for $|x - x_0| < r$, then the remainder $R_n(x)$ of the Taylor series satisfies
\[ |R_n(x)| \le \frac{M}{(n+1)!}|x - x_0|^{n+1} \]
for $|x - x_0| < r$.
Proof. This follows from Taylor's Theorem and the fact that $|f^{(n+1)}(\xi)| \le M$ for all $\xi$ between $x$ and $x_0$.
Definition. If $f$ is infinitely differentiable at $x_0$, then
\[ \sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]
is called the Taylor series for $f$ centered at $x_0$. We do not claim that the series converges, nor that it converges to $f$ on some open interval containing $x_0$.
Theorem 37.2 (Taylor Expansion Theorem). Suppose that
(i) the Taylor series for $f$ centered at $x_0$ converges¹ for $|x - x_0| < r$,
(ii) $\lim_{n \to \infty} R_n(x) = 0$ for $|x - x_0| < r$.
Then
\[ f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n \]
for $|x - x_0| < r$. In other words, the Taylor series for $f(x)$ centered at $x_0$ converges to the value $f(x)$.
Proof. Let $|x - x_0| < r$ and recall that $f(x) - P_n(x) = R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}$, where $\xi$ is strictly between $x$ and $x_0$. Taking absolute values and applying (ii), we find that $\lim_{n \to \infty} |f(x) - P_n(x)| = 0$. Since $P_n(x)$ is simply the $n$th partial sum of the Taylor series for $f(x)$ centered at $x_0$, the desired result follows.
To most students of Calculus II, it is surprising that both (i) and (ii) are required for the conclusion of the theorem to hold. We will consider several bizarre examples shortly.
¹In particular, this implies that $f$ is infinitely differentiable at $x_0$.
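Taylor's Inequality is what usually verifies hypothesis (ii). For $f = \sin$ and $x_0 = 0$, every derivative is bounded by $M = 1$. Hence $|R_n(x)| \le \frac{|x|^{n+1}}{(n+1)!} \to 0$ for every real $x$, and Theorem 37.2 gives $\sin x = \sum_{k=0}^\infty \frac{(-1)^k x^{2k+1}}{(2k+1)!}$. The Python sketch below checks the inequality numerically at sample points of our choosing.

```python
import math

def sin_taylor(x: float, n: int) -> float:
    """n-th order Taylor polynomial of sin at x0 = 0; only odd powers k <= n appear."""
    return sum((-1) ** (k // 2) * x**k / math.factorial(k) for k in range(1, n + 1, 2))

def taylor_inequality(x: float, n: int, M: float = 1.0) -> float:
    """Right side of Taylor's Inequality with x0 = 0; M = 1 bounds every derivative of sin."""
    return M * abs(x) ** (n + 1) / math.factorial(n + 1)

x = 3.0
for n in (1, 5, 9, 13, 17):
    print(n, abs(math.sin(x) - sin_taylor(x, n)), taylor_inequality(x, n))
```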

37.1. Smoothness Classes
Definition. Let $f : (a, b) \to \mathbb{R}$ be a function.
• If $f$ is continuous on $(a, b)$, then $f$ is $C^0$ on $(a, b)$.
• If $f$ is continuously differentiable (i.e., $f'$ exists and is continuous) on $(a, b)$, then $f$ is $C^1$ on $(a, b)$.
• If $f$ is $n$th order continuously differentiable on $(a, b)$ (i.e., $f^{(n)}$ exists and is continuous), then $f$ is $C^n$ on $(a, b)$.
• If $f$ is infinitely differentiable on $(a, b)$ (i.e., $f^{(n)}$ exists for each $n \in \mathbb{N}$), then $f$ is $C^\infty$ on $(a, b)$. $C^\infty$ functions are also called smooth functions.
This leads to the hierarchy of smoothness classes:
\[ C^0 \supseteq C^1 \supseteq C^2 \supseteq \cdots \supseteq C^\infty = \bigcap_{n \in \mathbb{N}} C^n. \]
Each inclusion is a proper inclusion since one can show that (on any interval containing $0$)
\[ \begin{aligned} f_0(x) &= |x| && \text{is } C^0 \text{ but not } C^1, \\ f_1(x) &= x|x| && \text{is } C^1 \text{ but not } C^2, \\ f_2(x) &= |x|^3 && \text{is } C^2 \text{ but not } C^3, \\ &\ \,\vdots \end{aligned} \]
Definition. A function $f : (a, b) \to \mathbb{R}$ is analytic on $(a, b)$ if for each $x_0 \in (a, b)$ there is a power series
\[ f(x) = \sum_{n=0}^\infty a_n (x - x_0)^n \]
centered at $x_0$ which converges to $f(x)$ in some interval $|x - x_0| < r$ with $r > 0$. If $f$ is analytic on $(a, b)$, then $f$ is said to be of class $C^\omega$.
We know from Calculus II that the coefficients are given by Taylor's Formula
\[ a_n = \frac{f^{(n)}(x_0)}{n!}. \]
Indeed, this is not hard to derive assuming that term-by-term differentiation is allowable. Also recall that a power series is infinitely differentiable on the interior of its interval of convergence.² In particular, this implies that $C^\omega \subseteq C^\infty$. We will discuss analytic functions in more detail when we have discussed uniform convergence. However, it is important to realize (and not at all obvious) that $C^\omega \subsetneq C^\infty$.
²This requires proof, of course. The proper setting for power series is in complex analysis and you will learn more about them there.
37.2. Some Smooth Functions
Example 37.1. In this example, we demonstrate the existence of an infinitely differentiable function which does not equal its own Taylor series on any nontrivial open interval around its center. In particular, condition (ii) in the Taylor Expansion Theorem cannot be ignored. To be more specific, we claim that the function $f : \mathbb{R} \to \mathbb{R}$ defined by
\[ f(x) = \begin{cases} e^{-1/x^2} & x \ne 0, \\ 0 & x = 0 \end{cases} \]
is $C^\infty$ but not $C^\omega$ (see Figure 1). More precisely, we claim that $f$ is infinitely differentiable at each point of $\mathbb{R}$, and that
\[ f^{(n)}(0) = 0, \qquad n = 0, 1, 2, \ldots. \]
FIGURE 1. Graph of $f(x)$ near $x_0 = 0$. Despite appearances, the function is nonconstant near $x = 0$.
In other words, we are claiming that the Taylor series of $f(x)$ centered at $x_0 = 0$ is the zero function. In particular, $f$ is an example of an infinitely differentiable function which does not equal its own Taylor series on any open interval containing $0$. This shows that $C^\omega$ is a proper subset of $C^\infty$.
By the standard differentiation formulas, $f$ is infinitely differentiable at $x_0$ as long as $x_0 \ne 0$. It therefore remains to show that $f^{(n)}(0)$ exists for each $n = 0, 1, 2, \ldots$ (in particular, we will show that $f^{(n)}(0) = 0$). We do this by induction.
BASE CASE: Since $f^{(0)}(0) = f(0) = 0$ by definition, the base case is trivial.
INDUCTIVE STEP: Suppose that we have already shown that
\[ f(0) = f'(0) = \cdots = f^{(n-1)}(0) = 0. \]
For $x \ne 0$, we see that
\[ \begin{aligned} f'(x) &= 2x^{-3} e^{-1/x^2}, \\ f''(x) &= (4x^{-6} - 6x^{-4}) e^{-1/x^2}, \\ f'''(x) &= (8x^{-9} - 36x^{-7} + 24x^{-5}) e^{-1/x^2}, \\ &\ \,\vdots \end{aligned} \]
and, more generally, we observe that $f^{(n)}(x)$ is a polynomial³ in $1/x$ times $e^{-1/x^2}$:
\[ f^{(n)}(x) = P_n\big(\tfrac{1}{x}\big) e^{-1/x^2}, \qquad x \ne 0. \]
³This requires a short inductive proof itself.
Using L'Hôpital's rule we see that
\[ \begin{aligned} f^{(n)}(0) &= \lim_{x \to 0} \frac{f^{(n-1)}(x) - f^{(n-1)}(0)}{x - 0} = \lim_{x \to 0} \frac{f^{(n-1)}(x)}{x} \\ &= \lim_{x \to 0} \tfrac{1}{x} P_{n-1}\big(\tfrac{1}{x}\big) e^{-1/x^2} = \lim_{|t| \to \infty} t P_{n-1}(t) e^{-t^2} \\ &= 0 \end{aligned} \]
since $e^{-t^2}$ tends to zero faster⁴ than any polynomial can blow up. Thus $f^{(n)}(0) = 0$ for all $n = 0, 1, 2, \ldots$ as claimed. In particular, the Taylor series for $f(x)$ at $x_0 = 0$ is the zero function!
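The inductive claim of footnote 3 can be made concrete. Write $t = 1/x$, so that $\frac{dt}{dx} = -t^2$, and $f^{(n)}(x) = P_n(t) e^{-t^2}$. The chain rule then gives the recursion $P_{n+1}(t) = 2t^3 P_n(t) - t^2 P_n'(t)$ with $P_0 = 1$. The Python sketch below stores each $P_n$ as a dictionary from powers of $t$ to integer coefficients and reproduces the three formulas displayed above. The helper name is ours.

```python
# A polynomial in t = 1/x is stored as {power: coefficient}.

def next_poly(p: dict) -> dict:
    """Given f^(n)(x) = P(1/x) e^{-1/x^2}, return Q with f^(n+1)(x) = Q(1/x) e^{-1/x^2}.

    Chain rule with t = 1/x (so dt/dx = -t^2):  Q(t) = 2 t^3 P(t) - t^2 P'(t)."""
    q: dict = {}
    for k, c in p.items():
        q[k + 3] = q.get(k + 3, 0) + 2 * c          # from differentiating e^{-1/x^2}
        if k:
            q[k + 1] = q.get(k + 1, 0) - k * c      # from differentiating t^k = x^(-k)
    return {k: c for k, c in q.items() if c}

polys = [{0: 1}]                                    # P_0(t) = 1, i.e. f itself
for _ in range(3):
    polys.append(next_poly(polys[-1]))
print(polys[1], polys[2], polys[3])   # {3: 2} {6: 4, 4: -6} {9: 8, 7: -36, 5: 24}
```

Reading $\{9: 8, 7: -36, 5: 24\}$ as $8t^9 - 36t^7 + 24t^5$ recovers the formula for $f'''(x)$.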
Definition. Let $f : \mathbb{R} \to \mathbb{R}$ be a function. The support of $f$ is the set
\[ \operatorname{supp}(f) = \overline{\{x \in \mathbb{R} : f(x) \ne 0\}}. \]
In other words, $\operatorname{supp}(f)$ is the closure of the set upon which $f$ does not vanish. In particular, $\operatorname{supp}(f)$ is always closed.
Example 37.2. In this example, we construct a $C^\infty$ function with compact support. Consider the bump function
\[ f(x) = \begin{cases} e^{-1/(1 - x^2)} & \text{if } |x| < 1, \\ 0 & \text{otherwise}. \end{cases} \]
An argument similar to that used in Example 37.1 shows that $f \in C^\infty(\mathbb{R})$ (see Figure 2). We say that $f$ has compact support since $\operatorname{supp}(f) = [-1, 1]$ is compact. Keep in mind that $f$ accomplishes this despite being infinitely differentiable on $\mathbb{R}$. Bump functions are incredibly useful tools in advanced topology, advanced partial differential equations, Fourier analysis, and functional analysis.
FIGURE 2. Graph of a bump function.
Example 37.3. Let $f(x)$ be the bump function constructed in the preceding example and let $A = \int_{-1}^1 f(x)\,dx$. The function
\[ g(x) = \frac{1}{A} \int_{-1}^x f(t)\,dt \]
is $C^\infty$ (by the Fundamental Theorem of Calculus) and satisfies
\[ g(x) \begin{cases} = 0 & \text{if } x \le -1, \\ \in (0, 1) & \text{if } -1 < x < 1, \\ = 1 & \text{if } x \ge 1. \end{cases} \]
Thus we have a $C^\infty$ ramp function.
⁴This requires yet another application of L'Hôpital's Rule to verify.
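The constant $A$ and the ramp function $g$ of Example 37.3 can be approximated numerically. The Python sketch below uses composite Simpson's rule with 2000 subintervals; this quadrature is our choice and is not part of the text. It writes the bump as $e^{-1/((1-x)(1+x))}$ to avoid rounding trouble near $x = \pm 1$.

```python
import math

def bump(x: float) -> float:
    """f(x) = exp(-1/(1 - x^2)) for |x| < 1 and 0 otherwise; 1 - x^2 written as (1 - x)(1 + x)."""
    if abs(x) >= 1:
        return 0.0
    return math.exp(-1.0 / ((1.0 - x) * (1.0 + x)))

def simpson(func, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

A = simpson(bump, -1.0, 1.0)

def ramp(x: float) -> float:
    """g(x) = (1/A) * integral_{-1}^{x} f(t) dt, using the exact values 0 and 1 outside (-1, 1)."""
    if x <= -1:
        return 0.0
    if x >= 1:
        return 1.0
    return simpson(bump, -1.0, x) / A

print(A)
print([round(ramp(x), 4) for x in (-1.5, -0.5, 0.0, 0.5, 1.5)])
```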
The following theorem shows that condition (i) in the Taylor Expansion Theorem cannot be ignored:
Theorem 37.3. For each sequence $a_0, a_1, \ldots$ of real numbers, there exists a function $f \in C^\infty(\mathbb{R})$ such that
\[ \frac{f^{(n)}(0)}{n!} = a_n \]
for all $n \in \mathbb{N}$. In particular, for each prospective Taylor series $\sum_{n=0}^\infty a_n x^n$, there exists a function $f$ whose Taylor coefficients are precisely the $a_n$. Moreover, the choice $a_n = n^n$ yields a $C^\infty$ function whose Taylor series diverges whenever $x \ne 0$.
Sketch of Pf. Let $\phi$ be a $C^\infty$ function supported in $[-2, 2]$ such that $\phi(x) = 1$ if $x \in [-1, 1]$ (it takes some work to justify the existence of such a function). In particular, observe that $\phi^{(n)}(0) = 0$ for $n = 1, 2, 3, \ldots$ since $\phi$ is constant on $[-1, 1]$. Now let
\[ f_n(x) = a_n x^n \phi(\lambda_n x) \]
where $\lambda_n$ is a sequence of positive numbers to be defined later. Now observe that
\[ f_n^{(j)}(0) = \begin{cases} n!\,a_n & \text{if } j = n, \\ 0 & \text{if } j \ne n. \end{cases} \]
We need only choose $\lambda_n$ such that the series $f(x) = \sum_{n=0}^\infty f_n(x)$ converges to a $C^\infty$ function. We claim that
\[ \lambda_n = n + \sum_{j=0}^n |a_j| \]
does the job.
LECTURE 38
Initial Value Problems

38.1. Existence and Uniqueness of Solutions
Consider the initial value problem
\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{38.1} \]
where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function of $x, y$. Many standard problems in pure and applied mathematics are of this form.
Theorem 38.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the initial value problem (38.1) and for which $|y(x) - y_0| < \epsilon$ for all $x \in I$.
Proof. By the hypotheses of the theorem, there exists a closed rectangle $R$ (a compact set) centered at $(x_0, y_0)$ and constants $M_0, M_1 > 0$ such that
\[ |F(x, y)| \le M_0, \qquad \left| \frac{\partial F}{\partial y}(x, y) \right| \le M_1, \qquad (x, y) \in R. \]
Now let $(x, y_1), (x, y_2) \in R$. By the Mean Value Theorem applied to $F$ with respect to the variable $y$, there exists $c$ lying strictly between $y_1$ and $y_2$ such that
\[ |F(x, y_1) - F(x, y_2)| = \left| \frac{\partial F}{\partial y}(x, c) \right| |y_1 - y_2| \le M_1 |y_1 - y_2|. \]
In summary, we have obtained the inequalities
\[ \begin{aligned} |F(x, y)| &\le M_0, && (x, y) \in R, \\ |F(x, y_1) - F(x, y_2)| &\le M_1 |y_1 - y_2|, && (x, y_1), (x, y_2) \in R. \end{aligned} \]
By considering an even smaller rectangle, we may also presume that $R$ has width $2\delta$, where $\delta > 0$ is sufficiently small so that
\[ \delta M_0 \le \epsilon, \qquad \delta M_1 < 1. \]
Let $I = [x_0 - \delta, x_0 + \delta]$ and let $E$ denote the closed $\epsilon$-ball in $C(I)$ centered at the constant function $y_0$. In other words,
\[ E = \{\phi \in C(I) : \|\phi - y_0\|_\infty \le \epsilon\}. \]
Since $E$ is a closed subset of the complete metric space $(C(I), d_\infty)$, it follows that $(E, d_\infty)$ is itself a complete metric space.
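Now consider the function $T : E \to C(I)$ defined by
\[ [T\phi](x) = y_0 + \int_{x_0}^x F(t, \phi(t))\,dt, \qquad \phi \in E. \]
We claim that $T(E) \subseteq E$. In other words, $T$ maps $E$ into itself and can be regarded as a function $T : E \to E$. Indeed, let $\phi \in E$ (i.e., $\|\phi - y_0\|_\infty \le \epsilon$). For each $x \in I$ it follows that
\[ |[T\phi](x) - y_0| = \left| \int_{x_0}^x F(t, \phi(t))\,dt \right| \le M_0 |x - x_0| \le \delta M_0 \le \epsilon. \]
Thus $\|T\phi - y_0\|_\infty \le \epsilon$ and $T\phi \in E$.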
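Having established that $T$ maps $E$ into $E$, we now show that $T : E \to E$ is a strict uniform contraction. In fact, we will show that the contraction constant is
\[ \alpha = \delta M_1 < 1. \]
Let $\phi_1, \phi_2 \in E$ and note that
\[ \begin{aligned} \|T\phi_1 - T\phi_2\|_\infty &= \sup_{x \in I} \left| \int_{x_0}^x F(t, \phi_1(t))\,dt - \int_{x_0}^x F(t, \phi_2(t))\,dt \right| \\ &\le \sup_{x \in I} \left| \int_{x_0}^x |F(t, \phi_1(t)) - F(t, \phi_2(t))|\,dt \right| \\ &\le \sup_{x \in I} M_1 \left| \int_{x_0}^x |\phi_1(t) - \phi_2(t)|\,dt \right| \\ &\le M_1 \|\phi_1 - \phi_2\|_\infty \sup_{x \in I} |x - x_0| \\ &= \delta M_1 \|\phi_1 - \phi_2\|_\infty. \end{aligned} \]
Since $\alpha = \delta M_1 < 1$, it follows that $T$ is a strict uniform contraction.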

Since $T : E \to E$ is a strict uniform contraction, it follows from the Contraction Mapping Principle that $T$ has a unique fixed point $y \in E$. In other words, there exists a function $y \in E$ such that
\[ y(x) = y_0 + \int_{x_0}^x F(t, y(t))\,dt \tag{38.2} \]
for all $x \in I$. In particular,
\[ y(x_0) = y_0. \]
Moreover, taking the derivative of (38.2) and using the Fundamental Theorem of Calculus, it follows that $y' = F(x, y)$ for all $x \in I$. In other words, our fixed point $y \in E$ is a solution to the initial value problem (38.1). Moreover, it is the only solution in $E$.
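The proof is constructive enough to run. The Python sketch below is a rough discretization of our own and is not part of the proof:
• $\phi$ is represented by its values on a uniform grid over $[x_0, x_0 + \delta]$ (only the right half of $I$, for simplicity);
• the integral in $T$ is approximated by a cumulative trapezoid rule;
• $T$ is iterated starting from the constant function $y_0$.
For $y' = y$, $y(0) = 1$ and $\delta = \frac12$ we have $M_1 = 1$ and $\alpha = \frac12$, and the iterates should approach $e^x$. The helper name `picard` is hypothetical.

```python
import math

def picard(F, x0: float, y0: float, delta: float, iterations: int = 30, steps: int = 2000):
    """Iterate [T phi](x) = y0 + integral_{x0}^{x} F(t, phi(t)) dt on a grid over [x0, x0 + delta].

    phi is stored by its grid values; the integral is a cumulative trapezoid rule."""
    h = delta / steps
    xs = [x0 + k * h for k in range(steps + 1)]
    phi = [y0] * (steps + 1)                        # phi_0 is the constant function y0
    for _ in range(iterations):
        vals = [F(x, y) for x, y in zip(xs, phi)]
        new_phi = [y0]
        for k in range(steps):
            new_phi.append(new_phi[-1] + h * (vals[k] + vals[k + 1]) / 2)
        phi = new_phi
    return xs, phi

xs, ys = picard(lambda x, y: y, 0.0, 1.0, 0.5)
print(max(abs(y - math.exp(x)) for x, y in zip(xs, ys)))   # small discretization error
```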
LECTURE 39
Picard Iteration
39.1. Initial Value Problems
Consider the initial value problem
\[ y'(x) = F(x, y(x)), \qquad y(x_0) = y_0 \tag{39.1} \]
where $y$ is a function of $x$, $x_0$ and $y_0$ are real constants, and $F(x, y)$ is a continuous function of $x, y$. Many standard problems in pure and applied mathematics are of this form.
Theorem 39.1. Suppose that $F(x, y)$ and $\frac{\partial F}{\partial y}(x, y)$ are continuous on an open neighborhood of $(x_0, y_0) \in \mathbb{R}^2$. If $\epsilon > 0$ is given, then there exists a $\delta > 0$ such that there is a unique continuously differentiable function $y(x)$ on $I = [x_0 - \delta, x_0 + \delta]$ which satisfies the initial value problem (39.1) and for which $|y(x) - y_0| < \epsilon$ for all $x \in I$.
Let us recall a few things about the method of proof of the Existence and Uniqueness Theorem. First, we obtained the $\delta > 0$ at the beginning of the proof. Recall that the size of $\delta$ (and consequently the length of the interval upon which we expect a solution to (39.1) to exist) was determined by the behavior of $F$ and $\frac{\partial F}{\partial y}$. Once $\delta > 0$ was determined, we defined $I = [x_0 - \delta, x_0 + \delta]$ and defined $E$ to be the closed $\epsilon$-ball in $C(I)$ centered at the constant function $y_0$.
Next, we noticed that $y(x)$ is a solution to the initial value problem (39.1) if and only if
\[ y(x) - y(x_0) = \int_{x_0}^x y'(t)\,dt = \int_{x_0}^x F(t, y(t))\,dt. \]
Since the preceding is equivalent to
\[ y(x) = y_0 + \int_{x_0}^x F(t, y(t))\,dt, \]
it follows that $y \in C(I)$ is a solution to (39.1) if and only if $y$ is a fixed point of the integral operator
\[ [T\phi](x) = y_0 + \int_{x_0}^x F(t, \phi(t))\,dt. \]
39.2. Extended Example

Consider the initial value problem
\[ y' = 2x(1 + y), \qquad y(0) = 0. \]
Since the equation is both separable and linear, it is easy to solve (assuming you have taken an elementary course in differential equations):
\[ y(x) = e^{x^2} - 1. \]
In fact, it is easy to check that the above is indeed a solution to the initial value problem.
Let us see the Contraction Mapping Principle in action. Here $x_0 = y_0 = 0$ and $F(x, y) = 2x(1 + y)$. The corresponding initial value problem can be rewritten as the integral equation
\[ \phi(x) = \int_0^x 2t[1 + \phi(t)]\,dt. \]
We therefore would like to find a fixed point of the integral operator
\[ [T\phi](x) = \int_0^x 2t[1 + \phi(t)]\,dt. \]
We let $\phi_0(x) = y_0 = 0$ (the zero function) and repeatedly apply $T$:
\[ \phi_n(x) = [T^n \phi_0](x). \]
The sequence $\phi_n$ should approach (with respect to $d_\infty$) the actual solution to our initial value problem (at least on some neighborhood of $x_0 = 0$). This method is known as Picard iteration.
With the initial approximation $\phi_0(x) = y_0 = 0$, it follows that
\[ \phi_1(x) = [T\phi_0](x) = \int_0^x 2t[1 + 0]\,dt = \int_0^x 2t\,dt = x^2. \]
Similarly,
\[ \phi_2(x) = [T\phi_1](x) = \int_0^x 2t[1 + \phi_1(t)]\,dt = \int_0^x 2t[1 + t^2]\,dt = \int_0^x (2t + 2t^3)\,dt = x^2 + \frac{x^4}{2}. \]
Computing again,
\[ \phi_3(x) = [T\phi_2](x) = \int_0^x 2t[1 + \phi_2(t)]\,dt = \int_0^x 2t\left[1 + t^2 + \frac{t^4}{2}\right]dt = \int_0^x (2t + 2t^3 + t^5)\,dt = x^2 + \frac{x^4}{2} + \frac{x^6}{6}. \]
The general pattern
\[ \phi_n(x) = x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \cdots + \frac{x^{2n}}{n!} \]
can be established by mathematical induction. In particular, the Contraction Mapping Principle tells us that the sequence $\phi_n$ converges uniformly on some interval containing $x_0 = 0$ to our solution.
The iterative method of solving the initial value problem is clearly leading us to the solution
\[ e^{x^2} - 1 = x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \cdots = \sum_{n=1}^\infty \frac{x^{2n}}{n!}. \]
In particular, $\phi_n$ is simply a partial sum of the Taylor series for $e^{x^2} - 1$.
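The induction claimed above can at least be spot-checked by exact computation. The Python sketch below stores polynomials as lists of `Fraction` coefficients, applies $T$ symbolically, and compares $\phi_n$ with $\sum_{k=1}^{n} \frac{x^{2k}}{k!}$ for $n = 1, \dots, 4$. The helper names are ours. A finite check is no substitute for the induction, but it catches arithmetic slips.

```python
from fractions import Fraction
from math import factorial

def apply_T(phi: list) -> list:
    """[T phi](x) = integral_0^x 2t (1 + phi(t)) dt on exact coefficient lists (phi[k] multiplies x^k)."""
    one_plus_phi = phi[:]
    one_plus_phi[0] += 1                                        # 1 + phi(t)
    integrand = [Fraction(0)] + [2 * c for c in one_plus_phi]   # multiply by 2t
    # antiderivative vanishing at 0: c t^k  ->  c x^(k+1) / (k+1)
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(integrand)]

phi = [Fraction(0)]                                             # phi_0 = 0
for n in range(1, 5):
    phi = apply_T(phi)
    expected = [Fraction(0)] * (2 * n + 1)
    for k in range(1, n + 1):
        expected[2 * k] = Fraction(1, factorial(k))             # coefficient of x^(2k) is 1/k!
    print(n, phi == expected)                                   # True for each n
```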
APPENDIX A
Basic Logic
A.1. Primitive Concepts
To do meaningful mathematics one needs to start out with various primitive concepts. There are many things that we cannot adequately define without some form of
self-reference. For example, try to define the following without referring to other concepts
that require further definitions:
Idea
Statement
True, false
Sets, objects
Everything, nothing
There are a host of words that we use every day that we simply cannot define without reference to other, equally hard-to-define concepts. You might say, "a set is a collection of objects." But what is a collection? What are objects? Simply put, to convey information to someone, you must already share a common language and several primitive concepts that both parties understand beforehand.
Another interesting example is that of numbers. What exactly is 2? What is a whole number, exactly? Can you define it? Of course, one might just say that this is silly: we all know what numbers are, don't we? It turns out, however, that some languages only have words for "one" and "many" but no words to express the concept of two, three, etc.
There are certain ideas (such as sets, true, false, etc.) which mathematicians use freely,
without worrying about any of the philosophical difficulties involved. On the other hand,
many philosophers are not satisfied with this situation and seek further to clarify the meaning of some of these words (in analytic philosophy). In keeping with our main theme
(learning about real analysis), we will not be overly picky with the philosophical details.
The terms "sentence" and "statement" will be used interchangeably in these notes to refer to an expression that is well-formed in the rules of the language in which it is written. This brings up the ideas of languages and of what exactly constitutes meaning (these are issues that are discussed in the realms of computer science and philosophy). There are expressions like "i(&*#dfs9[{" and "at the the up" that have no meaningful interpretation in the language in which they are written, and we will not consider these to be sentences.
Sentences are classified according to their truth value:
Example A.1. Some sentences are true. The sentences
"1 + 1 = 2"
and
"There are infinitely many prime numbers"
are true. The fact that the second statement (known as Euclid's theorem) is true is not obvious; it requires proof.
Example A.2. Some sentences are false. For instance
"0 > 1"
and
"One can get an A in Math 131 without doing the homework"
are false statements.
We also adopt the convention that a statement cannot be simultaneously true and false, although a sentence can be neither true nor false.¹ A proposition is a statement which has a definite truth value (it is either true or false). For example, "1 + 1 = 3" is a proposition (which is false). Of course, there are many propositions whose exact truth value is unknown to us. For instance:
(i) There are infinitely many pairs of twin primes.²
(ii) There exists an odd perfect number.³
Nevertheless, either an odd perfect number exists or one does not. The sentence "There exists an odd perfect number" is a proposition. Unfortunately, we have not been able to determine its exact truth value at this point in time.
A.2. Negation (NOT)
There are several basic operations which allow us to create new propositions from old ones. The simplest of these operations is called negation, which simply reverses the truth value of its argument. The negation $\neg P$ (read "not $P$") of a proposition $P$ is the proposition
"It is not the case that $P$."
When negating English sentences, one can often write things in a more elegant fashion.
Example A.3. If $P$ is the proposition
"Class meets at 9am,"
then $\neg P$ would be
\[ \text{``}\underbrace{\text{It is not the case that}}_{\neg}\ \underbrace{\text{class meets at 9am}}_{P}\text{.''} \]
A shorter sentence which has the same meaning is
"Class does not meet at 9am."

Example A.4. If $P$ is the proposition $1 + 1 = 2$, then $\neg P$ is simply $1 + 1 \ne 2$.
Example A.5. If $P$ is the proposition $e^\pi < \pi^e$, then $\neg P$ is simply $e^\pi \ge \pi^e$. Which proposition is true? This is actually a moderately interesting calculus problem. Can you determine (without a calculator) which of $e^\pi$ and $\pi^e$ is larger?
¹This does not occur often in practice, but it does come up when considering meta-mathematical issues. We will consider only one such example in this course.
²Twin primes are prime numbers like (17 and 19) or (29 and 31) which differ from each other by 2.
³A natural number $n$ is called perfect if $n$ is equal to the sum of its proper divisors. For instance, $6 = 1 + 2 + 3$, so $6$ is a perfect number. The next perfect number is $28$ since $28 = 1 + 2 + 4 + 7 + 14$. Can you find more?
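A proposition $P$ and its negation $\neg P$ are related by the following truth table:
\[ \begin{array}{c|c} P & \neg P \\ \hline T & F \\ F & T \end{array} \]
Moreover, it is not hard to see that the expressions $P$ and $\neg\neg P$ have the same truth table:
\[ \begin{array}{c|c|c} P & \neg P & \neg\neg P \\ \hline T & F & T \\ F & T & F \end{array} \]
The importance of this observation is that the roles of $P$ and $\neg\neg P$ can be interchanged in mathematical arguments. We say that $P$ and $\neg\neg P$ are equivalent and write
\[ P \equiv \neg\neg P \]
to symbolize this relationship.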

Example A.6. Many times, we have propositions which depend on variables. For instance, let $x$ denote a real variable and let $P(x)$ be the statement
"$x$ is rational."⁴ (A.1)
The truth value of $P(x)$ depends, of course, on $x$. Since a real number which is not rational is called irrational, we can write $P(x)$ as
"$x$ is not irrational." (A.2)
Clearly, (A.1) and (A.2) are saying the same thing in two different ways.
A.3. Conjunction (AND)
If $P$ and $Q$ are propositions, then the new proposition $P \wedge Q$ is interpreted as "$P$ and $Q$," just as in English. In other words, the sentence $P \wedge Q$ is true if and only if both statements $P$ and $Q$ are true. Therefore the truth value of $P \wedge Q$ is related to $P$ and $Q$ via the following table:
\[ \begin{array}{cc|c} P & Q & P \wedge Q \\ \hline T & T & T \\ T & F & F \\ F & T & F \\ F & F & F \end{array} \]
Example A.7. If
$P$ = "It is Thursday"
$Q$ = "It is raining today",
then $P \wedge Q$ is the proposition
\[ \text{``}\underbrace{\text{It is Thursday}}_{P}\ \underbrace{\text{and}}_{\wedge}\ \underbrace{\text{it is raining today}}_{Q}\text{.''} \]
The proposition $P \wedge Q$ is therefore true only on rainy Thursdays (when $P$ and $Q$ are both true).
⁴A rational number is a fraction, a ratio $a/b$ of integers $a$ and $b$, where $b \ne 0$.
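Example A.8. Using truth tables, we can derive the associative law for $\wedge$:
\[ P \wedge (Q \wedge R) \equiv (P \wedge Q) \wedge R. \]
Indeed, we merely need to produce the truth tables for $P \wedge (Q \wedge R)$ and $(P \wedge Q) \wedge R$ and compare entries. Since these expressions have three propositional variables $(P, Q, R)$, our truth table will have $8 = 2^3$ rows since there are two possibilities for each variable (namely $T$ or $F$).
\[ \begin{array}{ccc|cc|cc} P & Q & R & P \wedge Q & (P \wedge Q) \wedge R & Q \wedge R & P \wedge (Q \wedge R) \\ \hline T & T & T & T & T & T & T \\ T & T & F & T & F & F & F \\ T & F & T & F & F & F & F \\ T & F & F & F & F & F & F \\ F & T & T & F & F & T & F \\ F & T & F & F & F & F & F \\ F & F & T & F & F & F & F \\ F & F & F & F & F & F & F \end{array} \]
Since the truth tables for $P \wedge (Q \wedge R)$ and $(P \wedge Q) \wedge R$ are the same, they are equivalent statements.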
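Checking an equivalence by truth table is mechanical, so it is easy to automate. The Python sketch below enumerates all $2^3$ assignments in the same order as the table above, prints each row as T/F, and confirms the associative law. The helper `equivalent` is a hypothetical name of our own.

```python
from itertools import product

def equivalent(lhs, rhs, n_vars: int) -> bool:
    """Formulas are equivalent iff they agree on all 2**n_vars rows of the truth table."""
    return all(lhs(*row) == rhs(*row) for row in product([True, False], repeat=n_vars))

def left(p: bool, q: bool, r: bool) -> bool:
    return p and (q and r)          # P ∧ (Q ∧ R)

def right(p: bool, q: bool, r: bool) -> bool:
    return (p and q) and r          # (P ∧ Q) ∧ R

# Rows in the same order as the table: P, Q, R, P∧Q, (P∧Q)∧R, Q∧R, P∧(Q∧R).
for p, q, r in product([True, False], repeat=3):
    row = [p, q, r, p and q, right(p, q, r), q and r, left(p, q, r)]
    print(" ".join("T" if v else "F" for v in row))

print(equivalent(left, right, 3))   # True
```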
A.4. Disjunction (OR)
If $P$ and $Q$ are propositions, then $P \vee Q$ is the new proposition
"$P$ or $Q$"
where the word "or" is to be interpreted as an inclusive or (see below). Specifically, the truth value of the proposition $P \vee Q$ is related to $P$ and $Q$ via the following table:
\[ \begin{array}{cc|c} P & Q & P \vee Q \\ \hline T & T & T \\ T & F & T \\ F & T & T \\ F & F & F \end{array} \]
Example A.9. A mathematician is in a restaurant and sees a sign (which is presumably truthful) that says
"Lunch comes with soup or salad."
Everyday English often uses an exclusive or, meaning that one can have either soup or salad, but not both. This is not how things work in mathematics: from the mathematician's viewpoint, having both soup and salad with lunch is a definite possibility.
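Example A.10. If
$P$ = "It is Thursday"
$Q$ = "It is raining today",
then $P \vee Q$ is the proposition
\[ \text{``}\underbrace{\text{It is Thursday}}_{P}\ \underbrace{\text{or}}_{\vee}\ \underbrace{\text{it is raining today}}_{Q}\text{.''} \]
This proposition is false only on days that are neither Thursday nor rainy.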
A.5. Manipulating Propositions

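Now that we have introduced $\neg$, $\wedge$, and $\vee$, we need to know how they interact with each other. One of the most important conventions is that $\neg$ takes priority over $\wedge$ and $\vee$. Two other basic rules for manipulating propositions are called de Morgan's laws:
\[ \neg(P \wedge Q) \equiv \neg P \vee \neg Q \tag{A.3} \]
\[ \neg(P \vee Q) \equiv \neg P \wedge \neg Q. \tag{A.4} \]
A short computation shows that the expressions $\neg(P \wedge Q)$ and $\neg P \vee \neg Q$ have the same truth tables, which establishes (A.3):
\[ \begin{array}{cc|cc|ccc} P & Q & P \wedge Q & \neg(P \wedge Q) & \neg P & \neg Q & \neg P \vee \neg Q \\ \hline T & T & T & F & F & F & F \\ T & F & F & T & F & T & T \\ F & T & F & T & T & F & T \\ F & F & F & T & T & T & T \end{array} \]
We could do something similar to show that (A.4) is correct, but there is a better way. Since (A.3) holds for any two propositions $P$ and $Q$, we can insert $\neg P$ and $\neg Q$ in their place to obtain
\[ \neg(\neg P \wedge \neg Q) \equiv (\neg\neg P) \vee (\neg\neg Q) \equiv P \vee Q. \]
Negating both sides gives
\[ \neg P \wedge \neg Q \equiv \neg\neg(\neg P \wedge \neg Q) \equiv \neg(P \vee Q), \]
which establishes (A.4), the second of de Morgan's laws.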

Example A.11. Let $x$ denote an integer variable, $x \ge 2$, and define
$P(x)$ = "$x$ is prime"
$Q(x)$ = "$x$ is odd".
Thus
\[ P(x) \wedge Q(x) = \text{``$x$ is prime and $x$ is odd''} = \text{``$x$ is an odd prime''}. \]
The proposition $P(x) \wedge Q(x)$ is therefore true for
\[ x = 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, \ldots \]
and false for all other integers $x \ge 2$. By (A.3), the first of de Morgan's laws, it follows that the negation of the proposition "$x$ is an odd prime" is
\[ \begin{aligned} \neg(\text{$x$ is an odd prime}) &\equiv \neg\underbrace{(\text{$x$ is prime and $x$ is odd})}_{P(x) \wedge Q(x)} \\ &\equiv \underbrace{\neg(\text{$x$ is prime})}_{\neg P(x)} \vee \underbrace{\neg(\text{$x$ is odd})}_{\neg Q(x)} \\ &\equiv (\text{$x$ is not prime}) \vee (\text{$x$ is not odd}) \\ &\equiv (\text{$x$ is composite}) \vee (\text{$x$ is even}) \\ &\equiv \text{$x$ is composite or $x$ is even} \\ &\equiv \text{$x$ is composite or even.} \end{aligned} \]
(Recall that an integer $n \ge 2$ is composite if it is divisible by a positive integer other than $1$ and $n$.)
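The predicates of Example A.11 are easy to test by brute force. The Python sketch below lists the odd primes below 40 and confirms the instance of (A.3) used above for every $2 \le x < 1000$. Trial division and the range checked are our own illustrative choices. Of course, a finite check illustrates the law rather than proving it.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def P(x: int) -> bool: return is_prime(x)   # "x is prime"
def Q(x: int) -> bool: return x % 2 == 1    # "x is odd"

odd_primes = [x for x in range(2, 40) if P(x) and Q(x)]
print(odd_primes)   # [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

# De Morgan (A.3): not (P and Q) agrees with (not P) or (not Q) at every x tested.
print(all((not (P(x) and Q(x))) == ((not P(x)) or (not Q(x))) for x in range(2, 1000)))  # True
```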
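A.6. Implication ($P \Rightarrow Q$)
The proposition $\neg P \vee Q$ (called an implication) is read
"If $P$, then $Q$"
or
"$P$ implies $Q$"
and is commonly denoted
\[ P \Rightarrow Q \qquad \text{or} \qquad Q \Leftarrow P. \]
The proposition $P$ is called the hypothesis of the implication and the proposition $Q$ is called the conclusion. Be careful with the order since $\neg P \vee Q$ and $\neg(P \vee Q)$ are quite different expressions.⁵ Always remember that $\neg$ takes priority over $\wedge$ and $\vee$.
The truth table
\[ \begin{array}{cc|c|c} P & Q & \neg P & \neg P \vee Q \\ \hline T & T & F & T \\ T & F & F & F \\ F & T & T & T \\ F & F & T & T \end{array} \tag{A.5} \]
for $P \Rightarrow Q$ shows that the only case where $P \Rightarrow Q$ is false is when $P$ is true and $Q$ is false.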
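In some texts, $P \Rightarrow Q$ is defined to be
$$\neg(P \wedge \neg Q).$$
There is no contradiction, since the truth table
$$\begin{array}{cc|c|c|c}
P & Q & \neg Q & P \wedge \neg Q & \neg(P \wedge \neg Q) \\ \hline
T & T & F & F & T \\
T & F & T & T & F \\
F & T & F & F & T \\
F & F & T & F & T
\end{array}$$
for $\neg(P \wedge \neg Q)$ is identical to the truth table (A.5) for $\neg P \vee Q$, and hence
$$\neg P \vee Q \iff \neg(P \wedge \neg Q).$$
This also follows from de Morgan's laws.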
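As a small illustration, the sketch below encodes $P \Rightarrow Q$ with the definition $\neg P \vee Q$ used in these notes and prints its truth table in the row order of (A.5). The names `implies` and `implication_table` are hypothetical helpers, not notation from the text.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q, using the definition (not P) or Q from the text."""
    return (not p) or q

def implication_table() -> list[tuple[bool, bool, bool]]:
    """Rows (P, Q, P => Q) in the same order as table (A.5)."""
    return [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, value in implication_table():
    # The alternative definition not (P and not Q) agrees on every row.
    assert value == (not (p and not q))
    print(p, q, value)
```

The only row printed with `False` in the last column is `True False False`, matching the observation that an implication fails only when the hypothesis is true and the conclusion is false.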

Example A.12. The term "implies" has a slightly different meaning in mathematics than
in the everyday world. The truth value of an implication $P \Rightarrow Q$ does not depend on the
actual meaning of $P$ and $Q$, only on their truth values. For instance,
If $1 + 1 = 2$, then penguins can swim. (TRUE)
If $1 + 1 = 2$, then penguins can fly. (FALSE)
If penguins can fly, then $1 + 1 = 2$. (TRUE)
If penguins can fly, then $1 + 1 \neq 2$. (TRUE)
$^5$In fact, if you want to be extra careful, you can write $(\neg P) \vee Q$ instead.
Example A.13. If
P = You miss the final
Q = You fail the course,
then in plain English the proposition $P \Rightarrow Q$ reads:
If you miss the final, then you fail the course.
Going back to the technical definition $\neg P \vee Q$ of $P \Rightarrow Q$, we see that $P \Rightarrow Q$ can also be
interpreted as
$$\underbrace{\text{You do not miss the final}}_{\neg P} \quad \underbrace{\text{or}}_{\vee} \quad \underbrace{\text{you fail the course}}_{Q}.$$
In other words, $P \Rightarrow Q$ can be read as
You take the final or you fail the course.

Note that the "or" does not mean that taking the final and failing the course are mutually
exclusive possibilities. Indeed, it is certainly possible to take the final and fail the course.
A.7. Converse ($P \Leftarrow Q$)
The converse of $P \Rightarrow Q$ is the proposition $Q \Rightarrow P$, which we usually write as
$$P \Leftarrow Q.$$
In English, this might be read:
"$P$ is implied by $Q$."
Example A.14. If
$P =$ Every student passes the course
$Q =$ You pass the course,
then $P \Rightarrow Q$ reads:
If every student passes the course, then you pass the course.
On the other hand, the converse of $P \Rightarrow Q$ is the proposition $Q \Rightarrow P$, which can be written
as:
If you pass the course, then every student passes the course.
Clearly these mean quite different things. It is quite possible that $P$ is false (someone will
fail) and $Q$ is true (you pass). In this case, $P \Rightarrow Q$ is true but $Q \Rightarrow P$ is false. This makes
perfect sense: just because you pass the course does not mean that everyone else will.

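A.8. If and only if ($\iff$)
The expression $(P \Rightarrow Q) \wedge (P \Leftarrow Q)$ is so important that it has its own symbol,
$$P \iff Q.$$
We read this as
"$P$ if and only if $Q$"
or "$P$ iff $Q$." In plain English, we sometimes say that
"$P$ and $Q$ are equivalent statements."
The truth table for $P \iff Q$ looks like
$$\begin{array}{cc|c|c|c}
P & Q & P \Rightarrow Q & P \Leftarrow Q & P \iff Q \\ \hline
T & T & T & T & T \\
T & F & F & T & F \\
F & T & T & F & F \\
F & F & T & T & T
\end{array}$$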
Essentially, $P \iff Q$ means that $P$ and $Q$ are either simultaneously true or simultaneously
false. Also of note is the fact that (make the appropriate truth table)
$$(P \iff Q) \iff (\neg P \iff \neg Q).$$
This does not conflict with our earlier usage of $\iff$. For instance, we wrote de Morgan's
first rule (A.3) as:
$$\neg(P \wedge Q) \iff \neg P \vee \neg Q. \qquad \text{(A.6)}$$
The preceding can itself be thought of as a statement, as opposed to simply relating the
truth values of the two statements $\neg(P \wedge Q)$ and $\neg P \vee \neg Q$. A short computation shows
that the expression (A.6) has the following truth table:
$$\begin{array}{cc|c|c|c}
P & Q & \neg(P \wedge Q) & \neg P \vee \neg Q & \neg(P \wedge Q) \iff \neg P \vee \neg Q \\ \hline
T & T & F & F & T \\
T & F & T & T & T \\
F & T & T & T & T \\
F & F & T & T & T
\end{array}$$
In other words, the statement (A.6) is true regardless of the truth values (or meanings) of $P$
and $Q$. Such statements (in more precise terminology, sentential forms) are called tautologies.
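Checking that a sentential form is a tautology amounts to evaluating it on every row of its truth table. A minimal Python sketch, assuming the hypothetical helpers `is_tautology` and `iff` defined below (with $\iff$ modeled as equality of truth values):

```python
from itertools import product
from typing import Callable

def is_tautology(formula: Callable[..., bool], num_vars: int) -> bool:
    """Return True if `formula` is true for every assignment of truth values."""
    return all(formula(*values) for values in product([True, False], repeat=num_vars))

def iff(a: bool, b: bool) -> bool:
    """A <=> B is true exactly when A and B have the same truth value."""
    return a == b

# (A.6): not (P and Q) <=> (not P) or (not Q)
print(is_tautology(lambda p, q: iff(not (p and q), (not p) or (not q)), 2))  # True
# P <=> Q on its own is not a tautology (it fails when P and Q differ).
print(is_tautology(lambda p, q: iff(p, q), 2))  # False
```

Because every row is enumerated, this approach costs $2^n$ evaluations for $n$ variables, which is fine for the small formulas in this appendix.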
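A.9. Contrapositive
The contrapositive of an implication $P \Rightarrow Q$ is defined to be
$$\neg Q \Rightarrow \neg P.$$
Contrapositives are important because they are equivalent to their
original implications:
$$(P \Rightarrow Q) \iff (\neg Q \Rightarrow \neg P).$$
This follows from examining the associated truth table:
$$\begin{array}{cc|c|cc|c|c}
P & Q & P \Rightarrow Q & \neg Q & \neg P & \neg Q \Rightarrow \neg P & (P \Rightarrow Q) \iff (\neg Q \Rightarrow \neg P) \\ \hline
T & T & T & F & F & T & T \\
T & F & F & T & F & F & T \\
F & T & T & F & T & T & T \\
F & F & T & T & T & T & T
\end{array}$$
Thus if one wants to prove that the statement $P \Rightarrow Q$ is true, one can prove $\neg Q \Rightarrow \neg P$
instead.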
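The sketch below (assuming the encoding $P \Rightarrow Q$ as $\neg P \vee Q$; the variable names are illustrative) confirms that an implication always agrees with its contrapositive, and lists the rows where it disagrees with its converse, which is the situation described in Example A.14.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """P => Q encoded as (not P) or Q."""
    return (not p) or q

rows = list(product([True, False], repeat=2))
# Contrapositive: (P => Q) agrees with (not Q => not P) on every row.
contrapositive_ok = all(implies(p, q) == implies(not q, not p) for p, q in rows)
# Converse: (P => Q) versus (Q => P); collect the rows where they disagree.
converse_mismatches = [(p, q) for p, q in rows if implies(p, q) != implies(q, p)]
print(contrapositive_ok)     # True
print(converse_mismatches)   # [(True, False), (False, True)]
```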
Example A.15. A positive integer $x \geq 2$ is called perfect if $x$ equals the sum of its proper
divisors (its positive divisors other than $x$ itself). For instance, 6 and 28 are perfect numbers since
$$6 = 1 + 2 + 3, \qquad 28 = 1 + 2 + 4 + 7 + 14.$$
Let $x$ denote a positive integer $\geq 2$ and let
$P(x) =$ "$x$ is perfect"
$Q(x) =$ "$x$ is even".
In plain English, we might say that
$$(P(x) \Rightarrow Q(x)) \iff (\neg Q(x) \Rightarrow \neg P(x))$$
"If $x$ is perfect, then $x$ is even" $\iff$ "If $x$ is odd, then $x$ is not perfect."
Note that these two propositions say exactly the same thing, but in different ways. It is
unknown whether an odd perfect number exists (an unsolved problem for over 2000 years),
so the truth values of the propositions above are unknown.
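To experiment with the definition, one can compute sums of proper divisors directly. This is a brute-force sketch with hypothetical helper names `sum_of_proper_divisors` and `is_perfect`; a search over a finite range cannot settle whether an odd perfect number exists, it only illustrates the definition.

```python
def sum_of_proper_divisors(n: int) -> int:
    """Sum of the positive divisors of n that are strictly less than n (n >= 2)."""
    total = 1  # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            partner = n // d
            if partner != d:  # avoid counting a square root twice
                total += partner
        d += 1
    return total

def is_perfect(n: int) -> bool:
    return n >= 2 and sum_of_proper_divisors(n) == n

perfect = [n for n in range(2, 10_000) if is_perfect(n)]
print(perfect)  # [6, 28, 496, 8128]
# Every perfect number found in this range is even, consistent with P(x) => Q(x) here.
print(all(n % 2 == 0 for n in perfect))  # True
```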
APPENDIX B
Basic Set Theory

B.1. Sets
Almost everything in mathematics can be defined in terms of sets. Indeed, most of
mathematics fits comfortably inside the framework of set theory. What exactly is a set?
As we mentioned before, this is a difficult question. According to Georg Cantor (the
founder of set theory),
By a set we are to understand any collection into a whole of definite and
separate objects of our intuition or our thought.
This definition is somewhat circular, and it underlines one of the obstacles in talking about
sets. One cannot define a set as "a collection" without first knowing what a collection
is. After all, how does one define a collection? We will simply have to accept that the
student understands what is meant by the term set. We do not have the time to grapple
with the deep philosophical issues that are clearly at hand.
Sets have elements, also known as members. If $A$ is a set, then $x \in A$ stands for the
proposition
"$x$ belongs to $A$"
or
"$x$ is an element of $A$."
For example, $2 \in \{0, 1, 2\}$ is a true proposition. One way to describe a set is by just
writing out its members between the set brackets { and }. The proposition $\neg(x \in A)$,
which translates as
"$x$ is not an element of $A$,"
is usually written $x \notin A$.
Example B.1. According to the definition, we have
$$2 \notin \{0, \text{penguin}, \{0, 1, 2\}\}.$$
This example shows a couple of things. First, the elements of a set do not have to be the
same type of thing. Second, a set (namely $\{0, 1, 2\}$) can be an element of another set. If
one thinks of a set as being a box in which objects are placed, then one sees that it is not
unreasonable for a box to contain some items and possibly another box.
Two sets $A$ and $B$ are called equal, written $A = B$, if and only if they have exactly the
same elements. If two sets $A$ and $B$ are not equal, we write $A \neq B$ (which literally means
$\neg(A = B)$). This is the case whenever $A$ contains an object that $B$ does not, or vice-versa.
Example B.2. According to our definition of set equality, a set is completely determined
by its members. For instance,
$$\{\pi, e, e, \pi\} = \{\pi, \pi, \pi, e\} = \{\pi, e\} = \{e, \pi\}.$$
Repetition and order do not matter when listing the members of a set. Also observe that
$$\{\pi, e, \{e\}\} \neq \{\pi, e\}$$
since $\{e\}$ and $e$ are not the same thing. One way to think about this is that $e$ and a box
containing $e$ are not the same thing.
The set
$$\varnothing = \{\}$$
is called the empty set. It has no elements; it contains nothing. One can think of it as an
empty box. There is one catch, however, for the empty set is considered to be unique: it
is the only set with no elements.
Example B.3. Using the definition of set equality, we see that
$$\varnothing \neq \{\varnothing\}$$
since $\varnothing \in \{\varnothing\}$ while $\varnothing \notin \varnothing$ (and therefore $\varnothing$ and $\{\varnothing\}$ do not have exactly the same
elements). Think of it this way: a box with an empty box inside is not the same thing as
an empty box.
Example B.4. The following sets
$$\{\varnothing\}, \quad \{\varnothing, \{\varnothing\}\}, \quad \{\varnothing, \{\varnothing\}, \{\varnothing, \{\varnothing\}\}\}, \quad \ldots$$
are all distinct from one another. In fact, each successive set in our list contains all of the
preceding ones as elements. They are all created from nothing, using only the primitive
notion of sets. You can therefore build quite complicated sets without assuming that actual
objects exist! In fact, if one wants to axiomatize set theory and construct all of mathematics
rigorously from the basic principle of a set, one can take the sequence of sets above as a
starting point for defining the natural numbers.
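The sets in this list can be built literally in Python, assuming `frozenset` as a stand-in for a set that may itself be an element of another set (ordinary Python `set` objects are not hashable, so they cannot be elements). Each step forms $s \cup \{s\}$, the rule used in von Neumann's construction of the natural numbers; the helper name `successor` is ours, chosen for illustration.

```python
def successor(s: frozenset) -> frozenset:
    """Return s ∪ {s}: the next set in the list, containing s and all of s's elements."""
    return s | frozenset({s})

empty = frozenset()  # the empty set ∅
sets = []
current = empty
for _ in range(4):
    current = successor(current)
    sets.append(current)

# sets[0] = {∅}, sets[1] = {∅, {∅}}, sets[2] = {∅, {∅}, {∅, {∅}}}, ...
print([len(s) for s in sets])  # [1, 2, 3, 4]
# Each set has every earlier set in the list as an element.
print(all(sets[i] in sets[j] for j in range(len(sets)) for i in range(j)))  # True
```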
Since this is a mathematics course, we will obviously be talking about numbers (of
various sorts) quite often. Some important sets of numbers which we will frequently refer
to are the following:
$\mathbb{P} = \{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, \ldots\}$, the set of prime numbers.$^1$
$\mathbb{N} = \{1, 2, 3, \ldots\}$, the set of natural numbers.
$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$, the set of integers.
$\mathbb{Q}$, the set of rational numbers (fractions).
$\mathbb{R}$, the set of real numbers.
$^1$This is not standard mathematical notation.
Although all of the above can be rigorously constructed from the basic axioms of
set theory, we will not do so here. In this course, we make the bold assumption that
$\mathbb{P}$, $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ exist. The real numbers, however, are a different matter altogether. We will
explore the nature and structure of $\mathbb{R}$ shortly. The real numbers, it turns out, are much more
complicated than you might think.
B.2. Using Properties to Define Sets
We can use propositions to define sets using the so-called set builder notation. If $P(x)$
is a proposition (whose truth value depends on the variable object $x$), we define
$$\{x : P(x)\}$$
to be the set of all $x$ such that $P(x)$ is true. We are overlooking a few fine points of logic
here,$^2$ but this definition will be sufficient for most purposes (although we will see how
unrestricted use of the set builder notation can lead to logical paradoxes).
Example B.5. The set $\mathbb{P}$ of all prime numbers can be written as
$$\{2, 3, 5, 7, 11, 13, \ldots\} = \{x : x \text{ is a prime number}\} = \{y : y \text{ is a prime number}\}.$$
Note that the particular symbol used as a variable is irrelevant. It is a dummy variable, as
in calculus: $\int_0^1 f(x)\,dx = \int_0^1 f(y)\,dy$.
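Python's set comprehensions mirror the notation $\{x : P(x)\}$, with one important difference: $x$ must range over an explicitly given finite collection (here `range(50)`, an arbitrary choice), which sidesteps the unrestricted comprehension discussed in Section B.3. The sketch also shows that renaming the dummy variable does not change the set; `is_prime` is a hypothetical helper using trial division.

```python
def is_prime(n: int) -> bool:
    """Trial division: n is prime if n >= 2 and no d with 2 <= d*d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

# Python needs a finite universe to draw x from; here we assume range(50).
primes_x = {x for x in range(50) if is_prime(x)}
primes_y = {y for y in range(50) if is_prime(y)}  # renaming the dummy variable

print(sorted(primes_x))      # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(primes_x == primes_y)  # True
```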
Example B.6. When using the set builder notation, we must be careful to use conditions
that are unambiguous. For instance
$$\{x : x \text{ is a lucky number}\}$$
is not well-defined since "$x$ is a lucky number" is not a proposition depending on $x$
(it is an opinion).
Example B.7. Using the set builder notation, one can easily create sets which exist, logically speaking, but whose elements are impossible to find explicitly. For instance, the
elements of
$$\{x : x \text{ is a finite string of digits from the decimal expansion of } \pi\}$$
cannot all be explicitly produced, since we do not know the entire decimal expansion
of $\pi$ (it is an infinite string of digits without any apparent pattern). We know that
certain strings of digits, like 1415 and 535, belong to the set above, but in general it is not
a set that we can grasp in its entirety.
B.3. Russell's Paradox
Having worked with sets a little bit, you might be surprised to learn that our approach
to sets is not logically sound. In fact, it is called naive set theory to distinguish it from
the rigorous axiomatic approach used in formal set theory. A startlingly simple logical
paradox due to Bertrand Russell immediately shows that the basis of this approach to sets
is unsound.
One of the basic principles of naive set theory is the General Comprehension Principle, which we implicitly used above. In the early days of set theory (around 1873–1900),
mathematicians and logicians had assumed that you can always define a set if you
have a definite property $P(x)$. In other words, given a reasonable statement $P(x)$, the set
of all $x$ for which $P(x)$ is true should exist, logically speaking. Essentially, they assumed
that
$$\{x : P(x)\}$$
should always exist and be something that we are allowed to think about and discuss logically. Surprisingly, this is not the case.
$^2$For instance, what is an object? Is $x$ allowed to be any object? Clearly $x$ should be restricted to all objects
for which $P(x)$ makes sense, whatever that would mean.
The death blow to naive set theory came in 1901, and it is called Russell's Paradox.
Russell begins by letting
$$R = \{x : (x \text{ is a set}) \wedge (x \notin x)\}.$$
In other words,
"$R$ is the set of all sets that are not elements of themselves."
The expression
$$P(x) = (x \text{ is a set}) \wedge (x \notin x)$$
is quite unambiguous. An object $x$ should either be a set or not a set. An object $x$ should
either be an element of itself or not be an element of itself. Thus $P(x)$ looks like an unambiguous, if a little unusual, condition. As logical human beings, we should be permitted to
think about the set $R$.
Russell then asks: Does $R$ contain itself or not? This is a simple yes/no question, and
there are clearly only two possibilities.
CASE 1: If $R \in R$ is true, then $R \notin R$ is true by the definition of $R$. However, this is not
logically possible, since $R \notin R$ is false when $R \in R$ is true.
CASE 2: If $R \notin R$ is true, then $R \in R$ is true by the definition of $R$. However, this is not
logically possible, since $R \in R$ is false when $R \notin R$ is true.
Neither $R \in R$ nor $R \notin R$ is logically possible! This means that we cannot treat $R$
as a set: it is simply too large of an idea to be considered in a logically sound manner.
In other words, we cannot logically consider the set of all sets that are not elements of
themselves without running into paradoxes. We just cannot; it is a law of the universe.
Russell's Paradox shows that the General Comprehension Principle is not correct. Russell
discovered this paradox and sent it to Gottlob Frege (1848–1925) as Frege was finishing
his Grundgesetze der Arithmetik, a work which attempted to rigorously derive the laws of
arithmetic from supposedly logical axioms. Russell's Paradox invalidated much of Frege's
work. Indeed, Frege noted:
A scientist can hardly meet with anything more undesirable than to have the
foundation give way just as the work is finished. I was put in this position
by a letter from Mr. Bertrand Russell when the work was nearly through the
press.
There are many other logical paradoxes that have been discovered throughout the
years, but Russell's paradox is one of the most important. It forced mathematicians and
logicians to completely reevaluate mathematics and logic from the ground up. Russell's
Paradox ushered in a new age in which sets would have to be treated in a rigorous axiomatic fashion. The rules would have to be explicitly stated in such a way that Russell's
Paradox would not occur in the universe of axiomatic set theory. Although we will not
discuss axiomatic set theory in this course, it is important to be aware that sets and set
theory are not as simple as they sound.

Here are a couple of paradoxes which are somewhat similar in spirit:
Example B.8. A car is equipped with a "Russell light" on its dashboard. The light turns on
to warn the driver if a light has burnt out. What happens when the Russell light burns out?
Example B.9. The following paradox of Eubulides of Miletus$^3$ (4th century BCE) indicates that self-reference can be troublesome:
"This statement is false."
This is a troublesome sentence (call it $P$) since
$$P \text{ is true} \iff P \text{ is false}.$$
Thus Eubulides' statement is not a logical proposition. This paradox is similar to the liar
paradox: "I am lying."
B.4. Quantifiers
In mathematics, we often deal with propositions which depend on variables. The
special symbols $\forall$ and $\exists$ (called quantifiers) will help us. The symbol $\forall$ stands for
"for all," "for every," or "for each" (depending on which makes more grammatical
sense). The symbol $\exists$ stands for "there exists."
There are many ways to use quantifiers and various ways to combine them with other
symbols. The best way to understand how to read and translate sentences with quantifiers
is to study a number of examples.
Example B.10. The statement
$$(\forall x > 0)(\exists y)(x = e^y)$$
can be translated in a number of ways:
For every $x > 0$, there exists a $y$ such that $x = e^y$.
For each $x > 0$, there is a $y$ so that $x = e^y$.
For each positive $x$ there exists a $y$ such that $x = e^y$.
Every positive number is the exponential of another number.
Every positive number has a logarithm.
Note that we did not even bother specifying what type of object $y$ is. To be more precise,
we could have written $\exists y \in \mathbb{R}$ instead of $\exists y$. Most of the time, however, this level of
precision is more cumbersome than it is worth. It is clear, in this context, that $y$ is a real
number and not (for instance) a function, a matrix, or a penguin.
The order in which quantifiers appear is extremely important. Changing the order
of quantifiers often completely changes the meaning of a statement.
$^3$I have actually been to Miletus (now known as Milet, in modern Turkey). There are many fascinating
Roman era ruins, partially sunken below a swamp, which are open to the public. There are, however, few tourists
who visit the site.
Example B.11. The statement
$$(\exists y)(\forall x > 0)(x = e^y)$$
translates as
"There exists a $y$ such that for every $x > 0$, $x = e^y$."
In more proper English, this reads
"There exists a $y$ such that $x = e^y$ for every $x > 0$."
This is completely false. It asserts that there is a single number $y$ with the property that
$y = \ln x$ for every positive $x$. Presumably, that is not what we intended to say!
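Over a finite domain, $\forall$ and $\exists$ correspond to Python's built-in `all` and `any`, so the effect of swapping quantifiers can be checked directly. In this sketch the domain `range(5)` and the relation $x = y$ are illustrative assumptions standing in for the statement about $e^y$; `related` is a hypothetical helper.

```python
# Assumption: a small finite domain D stands in for the numbers in the text.
D = range(5)

def related(x: int, y: int) -> bool:
    """Example relation: x = y (chosen for illustration only)."""
    return x == y

forall_exists = all(any(related(x, y) for y in D) for x in D)  # (∀x)(∃y)(x = y)
exists_forall = any(all(related(x, y) for x in D) for y in D)  # (∃y)(∀x)(x = y)

print(forall_exists)  # True: each x can pick its own y, namely y = x
print(exists_forall)  # False: no single y equals every x
```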
Great liberties are often taken when translating from mathematical notation to mathematical English. It takes a while to get used to switching back and forth and mathematicians normally do their thinking somewhere in between the two extremes.

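Example B.12. The statement
$$(\forall x, y \in \mathbb{Q})\big(((x + y) \in \mathbb{Q}) \wedge (xy \in \mathbb{Q})\big)$$
can be translated in many different ways:
For all $x, y \in \mathbb{Q}$, $x + y \in \mathbb{Q}$ and $xy \in \mathbb{Q}$.
For all $x, y \in \mathbb{Q}$, both $x + y$ and $xy$ belong to $\mathbb{Q}$.
For any rational numbers $x$ and $y$, it is the case that $x + y$ and $xy$ are rational.
The sum and product of rational numbers are rational.
The rational numbers are closed under addition and multiplication.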
Although most mathematical writing is in English, you must always be able to break
things down symbolically if you have any doubt as to the logical meaning of a statement.
This is especially important if you need to negate a complicated logical statement. This
occurs, for instance, when beginning a proof by contradiction.
Example B.13. Recall that $\mathbb{P}$ denotes the set of prime numbers, so that
$$\mathbb{P} = \{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, \ldots\}.$$
The statement
$$(\forall p \in \mathbb{P})(\exists q \in \mathbb{P})(p < q)$$
translates as
$$\text{For every prime } p \text{ there exists a prime } q \text{ such that } p < q. \qquad \text{(B.1)}$$
In other words, this says that
"For any prime, there is a bigger prime."
This proposition is true, and it is known as Euclid's Theorem$^4$, which is usually stated as
"There are infinitely many prime numbers."
$^4$There are several theorems that go by the name Euclid's Theorem. The theorem we are discussing is
Proposition IX.20 of the Elements.
This is a somewhat liberal translation of (B.1), of course. Nevertheless, it emphasizes the
fact that there is not a direct correspondence between mathematics and English.
Example B.14. As mentioned earlier, the order in which the quantifiers appear is crucial.
For example, the proposition
$$(\exists p \in \mathbb{P})(\forall q \in \mathbb{P})(p < q)$$
means
"There exists a prime $p$ such that for every prime $q$, $p < q$."
The preceding statement asserts that there is a prime that is strictly less than every prime
(itself included). We can easily demonstrate that this is false. Take $q = 2$ and note that there is no
prime $p$ such that $p < 2$.
Example B.15. Another symbol which occurs quite frequently is the $\exists!$ symbol. The "!" stands
for "unique," so that $(\exists! x)$ is translated as "there exists a unique $x$ such that."
Using $\mathbb{R}$ as our universal set, consider the statement $(\exists! x)(\forall y)(xy = y)$. In other words,
"There exists a unique $x$ such that for every $y$, $xy = y$."
This is a true statement; in fact, the $x$ in question is simply the number 1. It is the only
real number $x$ with the property that $xy = y$ for every $y$.
B.5. Negating Propositions With Quantifiers
We negate propositions involving quantifiers according to the rules:
$$\neg[(\forall x)P(x)] \iff (\exists x)(\neg P(x))$$
$$\neg[(\exists x)P(x)] \iff (\forall x)(\neg P(x)).$$
If the quantifiers have additional symbols attached, the rules are the same. For instance:
$$\neg[(\forall x \in A)P(x)] \iff (\exists x \in A)(\neg P(x))$$
$$\neg[(\exists x \in A)P(x)] \iff (\forall x \in A)(\neg P(x)).$$
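The negation rules can be spot-checked in the same way, assuming a finite set `A` so that $\forall x \in A$ becomes `all(...)` and $\exists x \in A$ becomes `any(...)`. The set and the predicate "$x$ is even" below are only examples.

```python
# Assumption: A is a small finite set and P is an arbitrary predicate, so that
# "for all x in A" becomes all(...) and "there exists x in A" becomes any(...).
A = [1, 4, 9, 16, 25]

def P(x: int) -> bool:
    """Illustrative predicate: x is even."""
    return x % 2 == 0

# not [(∀x ∈ A) P(x)]  <=>  (∃x ∈ A)(not P(x))
rule1 = (not all(P(x) for x in A)) == any(not P(x) for x in A)
# not [(∃x ∈ A) P(x)]  <=>  (∀x ∈ A)(not P(x))
rule2 = (not any(P(x) for x in A)) == all(not P(x) for x in A)
print(rule1, rule2)  # True True
```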
Consider the following example:

Example B.16. There are a number of ways that the statement
$$(\exists x \in \mathbb{Q})(x^2 = 2) \qquad \text{(B.2)}$$
could be translated. For instance, we might say that

There exists an $x$ in $\mathbb{Q}$ such that $x^2 = 2$.
There exists a rational number $x$ such that $x^2 = 2$.
There is a rational number $x$ such that $x^2 = 2$.
2 has a rational square root.
It turns out that (B.2) is false. In fact, $\sqrt{2}$ is an irrational number and hence cannot be
written in the form $a/b$ where $a$ and $b$ are integers.$^5$
We can negate (B.2) to obtain a true statement. In words, we might say:
"There does not exist a rational number $x$ such that $x^2 = 2$."
$^5$According to legend, this was discovered by the Pythagorean philosopher Hippasus of Metapontum (an
ancient Greek colony in southern Italy) around 500 BCE. The numbers $e$ and $\pi$ were proved to be irrational by
Euler and Lambert in 1737 and 1760, respectively.
We would like to write this in terms of quantifiers ("There does not exist" is not a
quantifier). According to the rules for negating propositions with quantifiers, the negation
of (B.2) is:
$$\neg(\exists x \in \mathbb{Q})(x^2 = 2) \iff (\forall x \in \mathbb{Q})(\neg(x^2 = 2)) \iff (\forall x \in \mathbb{Q})(x^2 \neq 2).$$
There are several ways to interpret this:
For every $x$ in $\mathbb{Q}$, $x^2 \neq 2$.
For each rational number $x$, it is the case that $x^2 \neq 2$.
For all rational $x$, $x^2 \neq 2$.

Of course, the three interpretations above are all equivalent ways of saying:
"$\sqrt{2}$ is irrational."
This is a true statement, known as Hippasus' theorem (and often attributed to Pythagoras).
B.6. Subsets
Definition (Set Inclusion). If $A$ and $B$ are sets, then we say that $A \subseteq B$ (read: "$A$ is a
subset of $B$") if every member of $A$ is also a member of $B$. In other words,
$$(A \subseteq B) \iff (\forall x)(x \in A \Rightarrow x \in B). \qquad \text{(B.3)}$$
When we write $\forall x$, the variable $x$ actually lives in some universal set $U$. Typically, $U$
will be a set of numbers, functions, or other mathematical objects. Moreover, exactly what
the universal set $U$ is will typically be clear from context.
The following theorem can be proved from the basic definitions and logical principles
(although we will not prove it in these notes):
Theorem B.1. $(A = B) \iff [(A \subseteq B) \wedge (B \subseteq A)]$
The importance of the theorem above is that if we wish to prove that $A = B$, it suffices
to prove $A \subseteq B$ and $B \subseteq A$ separately. This is sometimes easier than proving $A = B$
directly.
Example B.17. $\{0, 1\} \subseteq \{0, 1, 2\} \subseteq \mathbb{R}$. Indeed, $\{0, 1\}$ is a subset of $\{0, 1, 2\}$ since
every element in $\{0, 1\}$ (namely the numbers 0 and 1) also belongs to $\{0, 1, 2\}$. We also
see that $\{0, 1, 2\}$ is a subset of $\mathbb{R}$ since 0, 1, and 2 are all elements of $\mathbb{R}$ (the set of real
numbers).
Example B.18. Observe that $A \subseteq A$ holds for any set $A$. In other words, every set is a
subset of itself. Indeed, the proposition
$$(\forall x)((x \in A) \Rightarrow (x \in A))$$
is true for any $x$ in our universal space. To see this, simply write out a truth table for the
implication $P \Rightarrow Q$ where $P = (x \in A)$ and $Q = (x \in A)$:
$$\begin{array}{cc|c}
x \in A & x \in A & (x \in A) \Rightarrow (x \in A) \\ \hline
T & T & T \\
F & F & T
\end{array}$$
We can therefore say that
"For any $x \in A$, it is the case that $x \in A$."
By the definition (B.3) we can conclude that $A \subseteq A$.

Example B.19. Also observe that $\varnothing \subseteq A$ for any set $A$. In other words, every set has the
empty set as a subset. Indeed, the statement $(\forall x)((x \in \varnothing) \Rightarrow (x \in A))$ is true since
there are no elements of the empty set and hence the hypothesis $x \in \varnothing$ (the $P$ in $P \Rightarrow Q$)
of the implication above is always false:
$$\begin{array}{cc|c}
x \in \varnothing & x \in A & (x \in \varnothing) \Rightarrow (x \in A) \\ \hline
F & T & T \\
F & F & T
\end{array}$$
We can therefore say that $(\forall x)((x \in \varnothing) \Rightarrow (x \in A))$.

Note that any proposition of the form $(x \in \varnothing) \Rightarrow Q(x)$ is true in the same way that
the proposition
"If a penguin can fly, then it will rule the world."
is true: there are no penguins that can fly, and hence the
hypothesis "a penguin can fly" is always false.
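Instead of writing $\neg(A \subseteq B)$ we write $A \not\subseteq B$, which is read: "$A$ is not a
subset of $B$."
Example B.20. What does $A \not\subseteq B$ really mean?
$$\begin{aligned}
A \not\subseteq B &\iff \neg[(\forall x)(x \in A \Rightarrow x \in B)] \\
&\iff (\exists x)\,\neg[(x \in A \Rightarrow x \in B)] \\
&\iff (\exists x)[\neg(\neg(x \in A) \vee (x \in B))] \\
&\iff (\exists x)[(x \in A) \wedge (x \notin B)].
\end{aligned}$$
Hence another way of saying that $A \not\subseteq B$ is:

"There exists an $x$ such that $x$ is an element of
$A$ and $x$ is not an element of $B$."
In other words, to show that $A \not\subseteq B$ it suffices to show that there exists some $x$ which
belongs to $A$ but not to $B$.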
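For finite sets, both (B.3) and the characterization of $A \not\subseteq B$ from Example B.20 translate into short loops. The helper names `is_subset` and `non_subset_witness` are hypothetical; Python's built-in `A <= B` performs the same subset test. Note that `all` over an empty collection returns `True`, which is exactly the vacuous truth behind $\varnothing \subseteq A$.

```python
def is_subset(A: set, B: set) -> bool:
    """A ⊆ B  iff  (∀x)(x ∈ A ⇒ x ∈ B); for finite sets, check each x in A."""
    return all(x in B for x in A)

def non_subset_witness(A: set, B: set):
    """Return some x with x ∈ A and x ∉ B, or None if A ⊆ B."""
    for x in A:
        if x not in B:
            return x
    return None

print(is_subset({0, 1}, {0, 1, 2}))           # True  (Example B.17)
print(is_subset(set(), {0, 1, 2}))            # True  (∅ ⊆ A: vacuously true)
print(non_subset_witness({0, 3}, {0, 1, 2}))  # 3     (so {0, 3} ⊈ {0, 1, 2})
```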
Definition (Proper Subset). If $A$ and $B$ are sets, then we say that $A \subset B$ (read: "$A$ is a
proper subset of $B$") if $A \subseteq B$ but $A \neq B$. This is sometimes written $A \subsetneq B$ as
well.
Example B.21. Recalling from previous lectures our definitions of standard sets of numbers, we immediately recognize that
$$\mathbb{P} \subset \mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}.$$
Here we have added $\mathbb{C}$, the set of complex numbers, to our list of number sets.$^6$
$^6$We might also add $\mathbb{C} \subset \mathbb{H} \subset \mathbb{O}$ where $\mathbb{H}$ is the quaternion number system and $\mathbb{O}$ is the octonion number
system. The quaternions are a noncommutative 4-dimensional number system (whose elements are of the form
$a + bi + cj + dk$ where $a, b, c, d \in \mathbb{R}$) discovered by the Irish mathematician William Rowan Hamilton. The
idea came to him while he was walking to a meeting of the Irish Academy. Hamilton scratched the fundamental
formulas $i^2 = j^2 = k^2 = ijk = -1$ on the stone of Brougham Bridge (Dublin). Hamilton's graffiti remains to
this day a mathematical tourist attraction. The octonions $\mathbb{O}$ are a bizarre number system (also called the Cayley
numbers) which we do not describe here in any further detail.
B.7. Complement, Union, and Intersection

Having introduced sets and some basic logical principles, we are now ready to discuss
relationships between sets and methods of constructing new sets from old ones. First we
will discuss the notions of inclusion and complement.
Definition (Set Complement). If $A$ and $B$ are sets, then the complement of $B$ in $A$ is the
set$^7$
$$A \setminus B = \{x \in A : x \notin B\}.$$
Example B.22. Let $A = \{0, 1, \{a, b\}\}$ and $B = \{a, b, 1\}$. Then $A \setminus B = \{0, \{a, b\}\}$. On
the other hand, $B \setminus A = \{a, b\}$.
Typically, one works inside of some universal set $U$, and in terms of Venn diagrams, the
complement of $A$ in $U$ is just the outside of $A$. If the universal set is declared beforehand
(or obvious from context), then we sometimes denote the complement of $A$ in $U$ by $\overline{A}$ or
$A^c$.
Example B.23. If the universal set is $U = \mathbb{Z}$ and $A = \mathbb{N}$, then
$$\mathbb{N}^c = \mathbb{Z} \setminus \mathbb{N} = \{\ldots, -3, -2, -1, 0\},$$
the set of non-positive integers (recall that $0 \notin \mathbb{N}$).
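Python's set difference operator `-` computes $A \setminus B$. The sketch below reproduces Example B.22, using a `frozenset` for the element $\{a, b\}$ (so that it can sit inside another set), and then forms a complement inside a finite stand-in for $U = \mathbb{Z}$; the range $-5, \ldots, 5$ is an arbitrary assumption, since Python cannot hold all of $\mathbb{Z}$.

```python
ab = frozenset({"a", "b"})  # the element {a, b}; frozenset so it can sit inside a set
A = {0, 1, ab}
B = {"a", "b", 1}

print(A - B == {0, ab})     # True: A \ B = {0, {a, b}}
print(B - A == {"a", "b"})  # True: B \ A = {a, b}

# Complement relative to a finite stand-in for the universal set U (an assumption:
# we take the integers from -5 to 5).
U = set(range(-5, 6))
N_part = {n for n in U if n >= 1}  # the natural numbers inside U
print(sorted(U - N_part))          # [-5, -4, -3, -2, -1, 0]
```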
Definition (Union). If $A$ and $B$ are sets, then the union of $A$ and $B$ is the set
$$A \cup B = \{x : (x \in A) \vee (x \in B)\}.$$
Definition (Intersection). If $A$ and $B$ are sets, then the intersection of $A$ and $B$ is the set
$$A \cap B = \{x : (x \in A) \wedge (x \in B)\}.$$
There are many laws about how unions and intersections interact. They can be derived
from the rules for $\wedge$ and $\vee$. For example:
Theorem B.2. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
Proof. One way to show that the two sets $A \cap (B \cup C)$ and $(A \cap B) \cup (A \cap C)$ are equal
is to show that the conditions for membership in them are logically equivalent. We will
therefore try to show that the statements $x \in [A \cap (B \cup C)]$ and $x \in [(A \cap B) \cup (A \cap C)]$
are logically equivalent:$^8$
$$\begin{aligned}
x \in [A \cap (B \cup C)] &\iff (x \in A) \wedge [x \in (B \cup C)] && \text{def. of } \cap \\
&\iff (x \in A) \wedge [(x \in B) \vee (x \in C)] && \text{def. of } \cup \\
&\iff [(x \in A) \wedge (x \in B)] \vee [(x \in A) \wedge (x \in C)] && \text{dist. law for } \wedge, \vee \\
&\iff [x \in (A \cap B)] \vee [x \in (A \cap C)] && \text{def. of } \cap \\
&\iff x \in [(A \cap B) \cup (A \cap C)] && \text{def. of } \cup
\end{aligned}$$
In conclusion, we have shown that $x \in A \cap (B \cup C)$ if and only if $x \in (A \cap B) \cup (A \cap C)$,
and therefore the two sets are equal.

$^7$Many books denote the set complement of $B$ in $A$ by $A - B$. Both notations are common in mathematics, but
I prefer $A \setminus B$ to $A - B$ since it is not uncommon in abstract algebra to consider sets of the form $\{a - b : (a \in A) \wedge (b \in B)\}$ where $A$ and $B$ are sets of numbers (or, more generally, elements of a commutative group).
$^8$It is good form, especially when you are starting out, to write your reasoning on the side. You should write
in such a manner that a fellow student could look at your work and understand the reasoning behind what you are
doing.
Although Venn diagrams are not always accurate (they are limited by the constraints
of being in two dimensions and are hence unsuitable for picturing complex relationships
between large numbers of sets), they are generally a good tool for getting the feel of a
statement. For instance, draw some Venn diagrams to convince yourself that the theorem
above is true.
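In the same spirit as drawing Venn diagrams, one can test Theorem B.2 on many randomly chosen finite sets. This is a sanity check under the stated assumptions (subsets of $\{0, \ldots, 9\}$, a fixed random seed), not a proof; `distributive_law_holds` is a hypothetical helper name.

```python
import random

def distributive_law_holds(A: set, B: set, C: set) -> bool:
    """Compare A ∩ (B ∪ C) with (A ∩ B) ∪ (A ∩ C) using Python's & and | operators."""
    return A & (B | C) == (A & B) | (A & C)

random.seed(0)  # fixed seed so the run is reproducible
universe = range(10)
trials = [
    tuple(set(random.sample(universe, random.randint(0, 10))) for _ in range(3))
    for _ in range(1000)
]
print(all(distributive_law_holds(A, B, C) for A, B, C in trials))  # True
```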
B.8. Ordered Pairs
Since we will typically be concerned with ordered pairs (and ordered $n$-tuples) of real
numbers, we do not need to go further into the subject of ordered tuples and Cartesian
products at this point. However, let us briefly mention the technical definition:
Definition (Ordered Pairs). The symbol $(a, b)$ denotes an ordered pair. It has the property
that if $(c, d)$ is another ordered pair, then
$$(a, b) = (c, d) \iff (a = c) \wedge (b = d).$$
It is important to note that the existence of a definition does not logically imply the
existence of the object defined. For instance, we might make the following definition:
Definition. A penguin p is called exceptional if p can fly.
It is clear that no exceptional penguins exist, despite the nice definition we made for
them. We have not actually proved that ordered pairs exist or that some structure satisfying
the definition can be constructed using sets. However, one can actually define the ordered
pair (a, b) to be the set
$$(a, b) = \{\{a\}, \{a, b\}\}$$
and then verify that this set satisfies the property of the definition. However, we will not
go through the (somewhat tedious) proof.
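The set-theoretic definition $(a, b) = \{\{a\}, \{a, b\}\}$ (usually credited to Kuratowski) can be checked on a small sample of elements. The sketch below uses `frozenset` so that sets can be nested; `kuratowski_pair` is a hypothetical name. Checking finitely many cases is not the general proof, which the notes omit.

```python
from itertools import product

def kuratowski_pair(a, b) -> frozenset:
    """The set {{a}, {a, b}} used above to model the ordered pair (a, b)."""
    return frozenset({frozenset({a}), frozenset({a, b})})

# Check the defining property on a small sample of elements:
# {{a},{a,b}} = {{c},{c,d}}  exactly when  a = c and b = d.
elements = [0, 1, 2]
property_holds = all(
    (kuratowski_pair(a, b) == kuratowski_pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(elements, repeat=4)
)
print(property_holds)         # True
print(kuratowski_pair(1, 1))  # frozenset({frozenset({1})})
```

The second line shows a subtle point: when $a = b$ the set $\{\{a\}, \{a, a\}\}$ collapses to $\{\{a\}\}$, yet the defining property still holds.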
B.9. Cartesian Products
Definition (Cartesian Product). If A and B are sets, then the Cartesian product of A and
B is the set
$$A \times B = \{(a, b) : a \in A, b \in B\}.$$
Example B.24. If $A = \mathbb{R}$ and $B = \mathbb{R}$, then the Cartesian product $\mathbb{R} \times \mathbb{R}$ is denoted $\mathbb{R}^2$.
This is typically thought of as the $xy$-plane in analytic geometry.
Example B.25. $[-1, 1] \times (0, 1) = \{(x, y) : (-1 \leq x \leq 1) \wedge (0 < y < 1)\}$.

Example B.26. If $A = \{0, 1\}$ and $B = \mathbb{R}$, then $A \times B$ can be thought of as the set of
points which are on the vertical lines $x = 0$ and $x = 1$ in the $xy$-plane.
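For finite sets, `itertools.product` from Python's standard library enumerates $A \times B$ as tuples; the sets `A` and `B` below are arbitrary examples.

```python
from itertools import product

A = {0, 1}
B = {"x", "y", "z"}

# A × B = {(a, b) : a ∈ A, b ∈ B}, built with itertools.product.
AxB = set(product(A, B))
print(len(AxB) == len(A) * len(B))       # True: 2 * 3 = 6 pairs
print((1, "y") in AxB, ("y", 1) in AxB)  # True False: order matters
```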
B.10. Power Sets
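Definition. If $A$ is a set, then the power set of $A$, denoted $\mathcal{P}(A)$, is defined to be the set of
all subsets of $A$. In symbols:
$$\mathcal{P}(A) = \{B : B \subseteq A\}.$$
Example B.27. If $A = \varnothing$, then $\mathcal{P}(A) = \{\varnothing\}$.
Example B.28. If $A = \{a\}$, then $\mathcal{P}(A) = \{\varnothing, A\}$.
Example B.29. If $A = \{a, b\}$, then $\mathcal{P}(A) = \{\varnothing, \{a\}, \{b\}, A\}$.
Example B.30. If $A = \{a, b, c\}$, then
$$\mathcal{P}(A) = \big\{\varnothing, \{a\}, \{b\}, \{c\}, \{a, b\}, \{b, c\}, \{a, c\}, A\big\}.$$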
You can probably see a pattern forming here. In this case, our intuition is correct:
Theorem B.3. If $A$ is finite and $\#A = n$, then $\#\mathcal{P}(A) = 2^n$.

Sketch of Pf. Since $A$ is finite and has exactly $n$ elements, we may index the elements of
$A$:
$$A = \{a_1, a_2, \ldots, a_n\}.$$
To see why the theorem is true, the easiest way is to think like a computer. There is a
one-to-one correspondence between subsets of $A$ and binary strings (strings of 0s and 1s)
of length $n$. For a given subset $B$ of $A$, the $j$th binary digit of the corresponding string is 0
if $a_j \notin B$ and 1 if $a_j \in B$. For instance, the subset
$$B = \{a_1, a_3, a_5\}$$
corresponds to the binary string $10101000\cdots000$. Since there are $2^n$ possible strings,
there are $2^n$ possible subsets.

Another way to think of the preceding sketch is to consider how many choices one
has when creating a subset of $A$. To construct a subset of $A$, one has to choose whether to
include $a_1$ or not. Then one has to choose whether to include $a_2$ or not, and so forth. In all,
there will be $n$ choices to make, each with two options, so there are $2^n$ possible ways of doing this.
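The binary-string correspondence in the proof sketch translates directly into code: count `mask` from $0$ to $2^n - 1$ and include $a_j$ exactly when the corresponding bit of `mask` is 1. In this sketch (the helper name `power_set` is hypothetical), `elements[j]` plays the role of $a_{j+1}$ and bits are read from the least significant end, so the string $10101000\cdots000$ in the text corresponds to bits $0, 2, 4$ being set.

```python
def power_set(elements: list) -> list[frozenset]:
    """List all subsets of `elements`, following the binary-string correspondence:
    bit j of the counter `mask` is 1 exactly when elements[j] is in the subset."""
    n = len(elements)
    subsets = []
    for mask in range(2 ** n):  # every binary string of length n
        subsets.append(frozenset(elements[j] for j in range(n) if (mask >> j) & 1))
    return subsets

P = power_set(["a", "b", "c"])
print(len(P))                      # 8 = 2**3, as in Theorem B.3
print(frozenset() in P)            # True: ∅ is always a subset
print(frozenset({"a", "c"}) in P)  # True
```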
Example B.31. Describing the power set of infinite sets is much trickier. For instance,
$\mathcal{P}(\mathbb{N})$ contains every possible subset of $\mathbb{N}$, and hence the sets
$$\varnothing, \{1\}, \{5, 23\}, \{2, 4, 6, 8, \ldots\}, \{100, 101, 102, \ldots\}, \{2, 3, 5, 7, 11, 13, \ldots\}$$
all belong to $\mathcal{P}(\mathbb{N})$.
B.11. Concerning Exceptional Penguins

Definition. A penguin p is called exceptional if p can fly.
This is clearly a definition whose use may lead to contradictions and falsehoods. Just
because we can define something, it does not mean that this thing necessarily exists, that it
is logically sound, or that it is necessarily a useful concept. Furthermore, you can never be
sure that you will not run into contradictions far down the road if you just invent a new
mathematical concept.
The present digression concerns odd perfect numbers, a class of much-studied numbers which few people actually believe exist. Nevertheless, there is an enormous literature
on the subject and many theorems have been proved about them.
The Pythagoreans (who were numerologists) regarded the number 6 as special because
it is equal to the sum of its proper divisors. Specifically
1 + 2 + 3 = 6.
The next numbers with this property are 28, 496, and 8128 since:
$$\begin{aligned}
28 &= 1 + 2 + 4 + 7 + 14 \\
496 &= 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248 \\
8128 &= 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127 \\
&\quad + 254 + 508 + 1016 + 2032 + 4064.
\end{aligned}$$
One of the cornerstones of Pythagorean philosophy was the assignment of mystical qualities to numbers. They chose to call numbers like 6, 28, 496, and 8128 perfect numbers.
Later philosophers and theologians like St. Augustine and Alcuin of York would expound the special nature of such numbers. For instance, in the City of God, St. Augustine
(354–430) said:
Six is a number perfect in itself, and not because God created the world in six
days; rather the contrary is true. God created the world in six days because this
number is perfect, and it would remain perfect, even if the work of the six days
did not exist.
The fact that it takes 28 days for the Moon to travel round the Earth was also seen to
confirm the importance of perfect numbers.
In his book Introductio Arithmetica, Nicomachus of Gerasa (ca. 60-120 C.E.) conjectured that there is exactly one perfect number with k digits for each $k \ge 1$, and that they
alternately end in 6 and 8. Both of these claims are incorrect. The fifth and sixth
perfect numbers are 33,550,336 and 8,589,869,056, so there is no perfect number with five, six, or seven digits, and the fifth and sixth both end in 6.
After Euclid and until Euler, most mathematicians implicitly assumed that all perfect
numbers were generated by a formula due to Euclid and Euler.[9] This formula produces
only even perfect numbers. Some, like Descartes and Mersenne, admitted that they saw no
reason why odd perfect numbers should not exist, despite the fact that no one had yet found
one.
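The formula referred to here is the classical one: $2^{p-1}(2^p - 1)$ is perfect whenever $2^p - 1$ is prime (Euclid), and every even perfect number has this form (Euler). The sketch below lists the first six numbers it produces. The helper names are ours, and trial division is simply the easiest primality test to read, not an efficient one. The output shows that the fifth and sixth perfect numbers have 8 and 10 digits and both end in 6.

```python
def is_prime(m):
    """Trial-division primality test; fine for the small m used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def even_perfect_numbers(count):
    """First `count` values of 2**(p-1) * (2**p - 1) with 2**p - 1 prime."""
    found = []
    p = 2
    while len(found) < count:
        mersenne = 2 ** p - 1
        if is_prime(mersenne):
            found.append(2 ** (p - 1) * mersenne)
        p += 1
    return found


print(even_perfect_numbers(6))
# [6, 28, 496, 8128, 33550336, 8589869056]
```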
Euler was among the first to attack this problem, one of the oldest and most intriguing
in number theory, and he proved an important theorem on odd perfect numbers.
Although no odd perfect numbers are known to exist, there are many conditions that a
hypothetical odd perfect number must satisfy. As Sylvester (1814-1897) noted:
. . . the existence of [an odd perfect number] - its escape, so to say, from the
complex web of conditions which hem it in on all sides would be little short of
a miracle.
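Euler's theorem states that an odd perfect number can be written as $n = p^{a} q_1^{k_1} \cdots q_r^{k_r}$, where $p, q_1, \ldots, q_r$ are distinct primes, $p \equiv a \equiv 1 \pmod 4$, and every exponent $k_i$ is even. A (by no means complete) list of conditions that an odd perfect number n must satisfy is
given below: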
• n has at least four distinct prime factors. (Cole, 1888)
• If n is not divisible by 3, 5, or 7, then n has at least 26 distinct prime factors. (Catalan, 1888)
• n has at least 5 distinct prime factors and $n > 2 \cdot 10^6$. (Turcanov, 1908)
• n has at least 6 distinct prime factors. (Gradshtein, 1925)
• Not all of the even exponents $k_i$ can be 2. (Steuerwald, 1937)
• Not all of the even exponents $k_i$ can be 4. Nor is it possible for one of them to be 4 and the others 2. (Kanold, 1941)
• $n > 10^{20}$. (Kanold, 1957)
• Not all of the even exponents $k_i$ can be 6. (Hagis, McDaniel, 1972)
• $n > 10^{36}$. (Tuckerman, 1976)
• If 3 does not divide n, then n has at least 11 distinct prime factors. (Hagis, 1983)
• $n > 10^{160}$. (Brent, Cohen, 1987)
• $n > 10^{300}$. (Brent, 1991)

[9] Marking a collaboration of "Eu" mathematicians 2000 years apart!

A whole theory has been developed about a class of numbers that probably do not
exist. The goal of these mathematicians is to show that odd perfect numbers are so strange
that they cannot exist (so the whole topic of odd perfect numbers can be thought of as
a giant proof by contradiction). Their work makes it seem extremely plausible that odd
perfect numbers do not exist, but so far a proof has evaded mathematicians for over 2000
years.
This indicates that one should take definitions seriously. One can define a concept
(odd perfect numbers, for instance) and prove many theorems about it, but the concept may nevertheless be vacuous, since it may contradict earlier axioms.
For instance, someone might prove one day that odd perfect numbers do not exist. The
theorems quoted above would then be about a class of numbers that does not exist.[10]
[10] Fortunately, in this case the aim of these theorems is to show that the properties that an odd
perfect number must satisfy are so restrictive that no number can satisfy them. Indeed, no one actually
believes that odd perfect numbers exist.
APPENDIX C
Mathematical Induction
C.1. The Power Sum Problem
What is
$$1 + 2 + 3 + \cdots + 100\,? \tag{C.1}$$
According to mathematical folklore, the correct answer (namely 5050) was given immediately by the young Carl Friedrich Gauss (1777-1855) when his teachers assigned his class
this busy-work problem. His teachers soon realized Gauss's prodigious talent, and his
education was later sponsored by the Duke of Brunswick.
The young Gauss found the sum (C.1) using the formula
$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}. \tag{C.2}$$
This formula[1] can be derived by adding the two equations
$$\begin{aligned}
1 + 2 + 3 + \cdots + n &= S\\
n + (n-1) + (n-2) + \cdots + 1 &= S
\end{aligned}$$
together, column by column, to obtain the equation
$$n(n+1) = 2S$$
for the unknown sum S (each of the n columns sums to n + 1). Dividing by 2 yields the formula $S = \frac{1}{2}n(n+1)$. In particular,
setting n = 100 in Gauss's formula (C.2) gives the answer 5050 to (C.1).
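The column-by-column argument translates directly into a few lines of Python. In this small illustrative sketch, `gauss_sum` is a name we chose. The code writes the sum forwards and backwards, checks that every column adds to n + 1, and compares the closed form with a direct sum.

```python
def gauss_sum(n):
    """1 + 2 + ... + n by Gauss's pairing argument: 2S = n(n + 1)."""
    return n * (n + 1) // 2  # n(n + 1) is always even, so // is exact


n = 100
forwards = list(range(1, n + 1))   # 1, 2, ..., n
backwards = forwards[::-1]         # n, n - 1, ..., 1
columns = [a + b for a, b in zip(forwards, backwards)]
assert all(c == n + 1 for c in columns)  # every column sums to n + 1
print(gauss_sum(n), sum(forwards))       # 5050 5050
```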
What about sums of squares? Noting that
$$\begin{aligned}
1^2 &= 1\\
1^2 + 2^2 &= 5\\
1^2 + 2^2 + 3^2 &= 14\\
1^2 + 2^2 + 3^2 + 4^2 &= 30\\
1^2 + 2^2 + 3^2 + 4^2 + 5^2 &= 55,
\end{aligned}$$
some trial and error might lead us to conjecture the formula
$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \tag{C.3}$$
Of course, we have not actually proved that (C.3) is the correct formula, since checking
a finite number of cases does not prove that the formula is valid for every $n = 1, 2, \ldots$.
Consider the following example:
Example C.1. Let $p(n) = n^2 + n + 41$, so that p(0) = 41, p(1) = 43, p(2) = 47,
p(3) = 53, p(4) = 61, and so on. Do you notice a pattern? It appears that $p(0), p(1), p(2), \ldots$
are always primes. In fact, p(n) is prime for $n = 0, 1, 2, \ldots, 39$, but p(40) is composite.
Indeed,
$$p(40) = 40^2 + 40 + 41 = 40(40 + 1) + 41 = 40 \cdot 41 + 41 = 41 \cdot 41.$$
This striking example is due to Leonhard Euler.

[1] The formula (C.2) has been known since ancient times (and hence was merely rediscovered by the young Gauss).
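Euler's example is easy to reproduce by machine. The sketch below uses our own helper names `is_prime` and `p`, with trial division as a simple primality test. It searches for the first n at which $n^2 + n + 41$ fails to be prime and confirms that the culprit is $41^2$. It also illustrates the moral: forty successful checks still prove nothing.

```python
def is_prime(m):
    """Trial-division primality test."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def p(n):
    """Euler's polynomial n^2 + n + 41."""
    return n * n + n + 41


first_composite = next(n for n in range(1000) if not is_prime(p(n)))
print(first_composite, p(first_composite), 41 * 41)  # 40 1681 1681
```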
Is (C.3) correct? Can we prove it? Furthermore, how do we find formulas for sums of
cubes and higher powers?
C.2. Mathematical Induction
Suppose that P(n) is a statement depending on the natural number n. The Principle
of Mathematical Induction[2] states that if both of the statements
(1) P(1) is true,
(2) if P(n) is true, then P(n + 1) is true,
hold, then P(n) is true for all $n \ge 1$.
The informal reason why is the following. By (1), P(1) is true. By (2), P(2) =
P(1 + 1) is true since P(1) is true. By (2), P(3) = P(2 + 1) is true since P(2) is true,
and so on. Another way to think about induction is to imagine climbing an infinite ladder.
Condition (2) means that if you are on step #n of the ladder, then you are able to climb to
step #(n + 1). Condition (1) says that you start on step #1. We conclude from this that we
can eventually reach every single step on the ladder.[3]
Proving that P(1) is true is called the base case of the induction. The second step is
a little more conceptually difficult. We must prove the implication
"If P(n) is true, then P(n + 1) is true."
We are not asserting that P(n) IS true for all n, but rather we are trying to prove that
P(n + 1) is true under the inductive hypothesis that P(n) is true. The key word is "if."
Theorem C.1. The summation formula
$$1 + 2 + \cdots + n = \frac{n(n+1)}{2} \tag{C.4}$$
holds for all $n \in \mathbb{N}$.

Proof. Let P(n) be the formula (C.4). We want to show that P(n) is true for all integers
$n \ge 1$.
BASE CASE: P(1) is true since $1 = \frac{1 \cdot 2}{2}$.
This establishes the base case.
INDUCTIVE STEP: If P(n) is true for some number n, does it follow that P(n + 1) is also
true? We want to prove the statement "If P(n) is true, then P(n + 1)
is true." In other words, we must show that
$$1 + 2 + \cdots + n = \frac{n(n+1)}{2} \quad\Longrightarrow\quad 1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2}.$$
Therefore our goal is to derive the formula
$$1 + 2 + \cdots + n + (n+1) = \frac{(n+1)((n+1)+1)}{2}$$
from the formula
$$1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$$
Adding n + 1 to both sides of the preceding formula gives
$$\begin{aligned}
(1 + 2 + \cdots + n) + (n+1) &= \frac{n(n+1)}{2} + (n+1)\\
&= \frac{n(n+1) + 2(n+1)}{2}\\
&= \frac{(n+1)(n+2)}{2}\\
&= \frac{(n+1)((n+1)+1)}{2}.
\end{aligned}$$
Therefore P(n + 1) is true if P(n) is true. By induction, the formula holds for all $n \in \mathbb{N}$.

[2] It is possible to prove the principle of mathematical induction from the axioms of set theory, but we will
not do that here.
[3] Note that there is no "infinity" step and that it is improper to speak of a "last" step, since there is no last
natural number.
Similarly, we can prove that (C.3) is correct by mathematical induction:

Theorem C.2. The summation formula
$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.5}$$
holds for all natural numbers $n \ge 1$.

Proof. We proceed using mathematical induction.
BASE CASE: The base case n = 1 is easily verified:
$$1^2 = \frac{1 \cdot 2 \cdot 3}{6}.$$
INDUCTIVE STEP: Suppose that the formula (C.5) holds for some unspecified value of n
(this is our inductive hypothesis). In other words, suppose that
$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{C.6}$$
for some specific value of n. Adding $(n+1)^2$ to both sides of (C.6), we obtain
$$\begin{aligned}
1^2 + 2^2 + \cdots + n^2 + (n+1)^2 &= \frac{n(n+1)(2n+1)}{6} + (n+1)^2\\
&= \frac{n(n+1)(2n+1)}{6} + (n^2 + 2n + 1)\\
&= \frac{n(n+1)(2n+1)}{6} + \frac{6(n^2 + 2n + 1)}{6}\\
&= \tfrac{1}{6}\big((n^2 + n)(2n + 1) + 6n^2 + 12n + 6\big)\\
&= \tfrac{1}{6}\big(2n^3 + 3n^2 + n + 6n^2 + 12n + 6\big)\\
&= \tfrac{1}{6}\big(2n^3 + 9n^2 + 13n + 6\big)\\
&= \tfrac{1}{6}\big((2n^3 + 7n^2 + 6n) + (2n^2 + 7n + 6)\big)\\
&= \tfrac{1}{6}(n + 1)(2n^2 + 7n + 6)\\
&= \tfrac{1}{6}(n + 1)(n + 2)(2n + 3)\\
&= \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6}.
\end{aligned}$$
Hence if (C.5) holds for some value of n, it must also hold for n + 1. Since we
have established that the formula holds for n = 1, it follows that it also holds for n = 2. Since
it holds for n = 2, it must hold for n = 3, and so on. This is the essence of mathematical
induction.
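Numerical checks are a useful sanity test alongside the proofs, although Example C.1 shows that they never replace one. The sketch below uses a function name of our own choosing. It compares running sums with (C.4) and (C.5) for many n. It also checks the identity at the heart of the inductive step of Theorem C.2, with both sides multiplied by 6.

```python
def check_power_sum_formulas(limit):
    """Compare running sums with (C.4) and (C.5) for n = 1, ..., limit.

    Also checks the identity used in the inductive step of Theorem C.2,
    multiplied through by 6:
        n(n+1)(2n+1) + 6(n+1)^2 = (n+1)(n+2)(2n+3).
    """
    total, total_sq = 0, 0
    for n in range(1, limit + 1):
        total += n
        total_sq += n * n
        assert 2 * total == n * (n + 1)                       # (C.4)
        assert 6 * total_sq == n * (n + 1) * (2 * n + 1)      # (C.5)
        assert (n * (n + 1) * (2 * n + 1) + 6 * (n + 1) ** 2
                == (n + 1) * (n + 2) * (2 * n + 3))           # inductive step
    return True


print(check_power_sum_formulas(1000))  # True
```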

Based on the facts that
\begin{align}
1 + 1 + 1 + \cdots + 1 &= n \notag\\
1 + 2 + 3 + \cdots + n &= \frac{n(n+1)}{2} \tag{C.7}\\
1^2 + 2^2 + 3^2 + \cdots + n^2 &= \frac{n(n+1)(2n+1)}{6}, \tag{C.8}
\end{align}
we might conjecture that the power sum
$$S_m(n) = 1^m + 2^m + \cdots + n^m \tag{C.9}$$
is always a polynomial in the variable n of degree m + 1. This was proved by Jacob
Bernoulli (1654-1705) in his book Ars Conjectandi (published posthumously in 1713). He
proudly proclaimed that in less than half a quarter of an hour he was able to sum the
tenth powers of the first thousand integers. Before we solve this old problem, we need to
reintroduce the binomial coefficient, which you may have first encountered in calculus.

C.3. The Binomial Coefficient $\binom{n}{k}$
You may have seen the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad 0 \le k \le n, \tag{C.10}$$
before. Here n! (where n is a natural number) denotes the product
$$n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$$
if $n \ge 1$. If n = 0, then 0! is defined to be 1. Although it is not obvious, it turns out that
$\binom{n}{k}$ is always an integer. Indeed, looking at the formula (C.10), it is actually somewhat
remarkable that $\binom{n}{k}$ is an integer! We will see why shortly.
The symbol $\binom{n}{k}$ is sometimes read "n choose k," since it also represents
the number of ways to choose k objects from a collection of n. In other words, $\binom{n}{k}$
is the number of k-element subsets of a set with n elements:
Theorem C.3. Let S denote a set containing exactly n elements. For any integer k with
$0 \le k \le n$, the number of subsets of S containing precisely k elements is given by $\binom{n}{k}$.
Proof. If S has n elements and we wish to form an ordered list of exactly k (necessarily
distinct) elements of S, we have n ways to choose the first element, n - 1 ways to choose the
second, and so forth, down to n - k + 1 ways to choose the kth. There are therefore
$$n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$$
separate lists of k elements from S. Each subset with k elements can be ordered in k!
different ways, so each such subset appears exactly k! times among these lists. Hence there are
$\frac{n!}{k!\,(n-k)!} = \binom{n}{k}$ distinct subsets containing precisely k elements.

Corollary 11. $\binom{n}{k}$ is always an integer.

Although it follows from the combinatorial interpretation that $\binom{n}{k}$ is always an integer,
we will present an independent proof of this fact later. The following example shows how
the preceding theorem works:
Example C.2. Consider the set S = {a, b, c, d, e}. How many two-element subsets of S
are there? To make a two-element subset, we first need to choose one element, and there
are 5 ways of doing this. Let's say that we pick a:
{a}.
Now we have to choose an additional element of S to put into our subset. There are 4
ways of doing this. Let's pick b:
{a, b}.
We have produced a two-element subset of S. There were
$$5 \cdot 4 = \frac{5!}{3!}$$
ways of doing this. However, if we had chosen b first and then a, we would have {b, a}
instead. But {a, b} = {b, a}, and we must therefore divide 5!/3! by the number of ways to
order a set with 2 objects, namely 2!. Therefore the total number of two-element subsets of
S is
$$\frac{5!}{3!\,2!} = \binom{5}{2} = 10,$$
or "5 choose 2." Of course this example does not prove anything, but it gives you
a little bit of the feel for the proof of the preceding theorem.
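The count in Example C.2 can be reproduced with the standard library's `itertools`. This is a small sketch, and the variable names are ours. `permutations(S, 2)` lists the ordered choices, giving 5!/3! of them. `combinations(S, 2)` lists each 2-element subset once, which is the ordered count divided by 2!.

```python
from itertools import combinations, permutations
from math import factorial

S = ["a", "b", "c", "d", "e"]
ordered = list(permutations(S, 2))    # ordered pairs of distinct elements
unordered = list(combinations(S, 2))  # 2-element subsets, each listed once

print(len(ordered), len(unordered))   # 20 10
# 5!/3! ordered choices, each subset counted 2! times:
assert len(ordered) == factorial(5) // factorial(3)
assert len(unordered) == len(ordered) // factorial(2)
```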
C.4. Pascal's Triangle

A useful mnemonic for remembering binomial coefficients is Pascal's Triangle, the
first few rows of which are reproduced below:
$$\begin{array}{ccccccccccc}
&&&&&1&&&&&\\
&&&&1&&1&&&&\\
&&&1&&2&&1&&&\\
&&1&&3&&3&&1&&\\
&1&&4&&6&&4&&1&\\
1&&5&&10&&10&&5&&1
\end{array} \tag{C.11}$$
From Pascal's Triangle, one can deduce Pascal's Rule, which describes the (n + 1)st row
of Pascal's Triangle in terms of the nth row.
Theorem C.4 (Pascal's Rule). For integers $n \ge 1$ and $0 \le k \le n - 1$,
$$\binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}.$$
Proof. This is a straightforward computation:
$$\begin{aligned}
\binom{n}{k} + \binom{n}{k+1} &= \frac{n!}{k!\,(n-k)!} + \frac{n!}{(k+1)!\,(n-k-1)!}\\
&= \frac{n!\,(k+1)}{(k+1)!\,(n-k)!} + \frac{n!\,(n-k)}{(k+1)!\,(n-k)!}\\
&= \frac{n!\,(k+1+n-k)}{(k+1)!\,((n+1)-(k+1))!}\\
&= \frac{(n+1)!}{(k+1)!\,((n+1)-(k+1))!}\\
&= \binom{n+1}{k+1}.
\end{aligned}$$

Using Pascal's Rule, we see that the entries $\binom{n+1}{k}$ in the (n + 1)st row of the triangle
are integers because the entries $\binom{n}{k}$ in the preceding row are integers. Sometimes
Pascal's Rule is written in the form
$$\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$$
for $n \ge 1$ and $1 \le k \le n$.

Corollary 12. $\binom{n}{k}$ is always an integer.
Proof. We prove this by induction on n. Here P(n) is the statement
"$\binom{n}{k}$ is an integer for all $k \in \{0, 1, 2, \ldots, n\}$."
We may start our induction at n = 1, since the remaining case n = 0 is settled by $\binom{0}{0} = 1$, which is clearly an integer.
BASE CASE: The statement P(1) is true since $\binom{1}{0} = \binom{1}{1} = 1$, which follows immediately from
the definition of $\binom{n}{k}$.
INDUCTIVE STEP: Now we must show that
"If P(n) is true, then P(n + 1) is true"
to complete the proof. If P(n) is true, then $\binom{n}{k}$ is an integer when $0 \le k \le n$. We must
then show, under this hypothesis, that $\binom{n+1}{k}$ is an integer when $0 \le k \le n + 1$. This is
where Pascal's Rule comes in.
For each $k \in \{1, 2, 3, \ldots, n\}$, Pascal's Rule says that
$$\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}.$$
If P(n) is true, then $\binom{n}{k-1}$ and $\binom{n}{k}$ are both integers, and hence so is $\binom{n+1}{k}$. Therefore
$\binom{n+1}{k}$ is an integer when $1 \le k \le n$. For k = 0 and k = n + 1, the equalities
$$\binom{n+1}{0} = 1, \qquad \binom{n+1}{n+1} = 1$$
follow from the definition of the binomial coefficient. Therefore $\binom{n+1}{k}$ is an integer when
$0 \le k \le n + 1$, and hence P(n) implies P(n + 1). This completes the proof.
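The induction above is also an algorithm. Each row of Pascal's Triangle is built from the previous row by integer addition alone, so every entry is an integer. The sketch below is our own illustration, and `pascal_rows` is a hypothetical name. It builds the rows that way, prints the six rows of (C.11), and checks each entry against the factorial quotient (C.10). The quotient is computed as an exact `Fraction`, so the code does not assume the division comes out even.

```python
from fractions import Fraction
from math import factorial


def pascal_rows(n_max):
    """Rows 0, ..., n_max of Pascal's Triangle, built by integer addition only."""
    rows = [[1]]                                   # row 0: C(0, 0) = 1
    for n in range(n_max):
        prev = rows[-1]                            # row n (all integers)
        inner = [prev[k - 1] + prev[k] for k in range(1, n + 1)]  # Pascal's Rule
        rows.append([1] + inner + [1])             # C(n+1, 0) = C(n+1, n+1) = 1
    return rows


rows = pascal_rows(12)
for row in rows[:6]:
    print(row)
# The factorial quotient, computed as an exact fraction, always matches
# the integer entry produced by additions alone:
assert all(Fraction(factorial(n), factorial(k) * factorial(n - k)) == rows[n][k]
           for n in range(len(rows)) for k in range(n + 1))
```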

C.5. The Binomial Theorem
Expanding out $(x+y)^n$ by brute force for small n shows that the coefficient of the term $x^k y^{n-k}$
in the expansion of $(x+y)^n$ is given by $\binom{n}{k}$. The first few binomial expansions (for small
integer exponents) are written below:
$$\begin{aligned}
(x+y)^0 &= 1\\
(x+y)^1 &= x + y\\
(x+y)^2 &= x^2 + 2xy + y^2\\
(x+y)^3 &= x^3 + 3x^2y + 3xy^2 + y^3\\
(x+y)^4 &= x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4\\
(x+y)^5 &= x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5.
\end{aligned}$$
The binomial theorem says that this pattern (based on Pascal's Triangle) continues indefinitely:
Theorem C.5 (Binomial Theorem). The formula
$$(x+y)^n = \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k} \tag{C.12}$$
holds for any integer $n \ge 1$ and any real numbers x, y.
Proof. We prove this by induction on n. Here P(n) is the statement
$$(x+y)^n = \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k}.$$
BASE CASE: The statement P(1) is true since $(x+y)^1 = x + y = \binom{1}{0}x^0y^1 + \binom{1}{1}x^1y^0$.
INDUCTIVE STEP: Now we must prove the statement "If P(n) is true, then P(n + 1) is true." In other words, we must show that
$$\left[(x+y)^n = \sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k}\right] \Longrightarrow \left[(x+y)^{n+1} = \sum_{k=0}^{n+1}\binom{n+1}{k}x^k y^{(n+1)-k}\right].$$
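If P(n) is true, then
$$\begin{aligned}
(x+y)^{n+1} &= (x+y)(x+y)^n\\
&= (x+y)\sum_{k=0}^{n}\binom{n}{k}x^k y^{n-k}\\
&= \sum_{k=0}^{n}\binom{n}{k}x^{k+1} y^{n-k} + \sum_{k=0}^{n}\binom{n}{k}x^{k} y^{n-k+1}\\
&= \sum_{k=1}^{n+1}\binom{n}{k-1}x^{k} y^{n-(k-1)} + \sum_{k=0}^{n}\binom{n}{k}x^{k} y^{(n+1)-k}\\
&= \sum_{k=1}^{n+1}\binom{n}{k-1}x^{k} y^{(n+1)-k} + \sum_{k=0}^{n}\binom{n}{k}x^{k} y^{(n+1)-k}\\
&= \binom{n}{0}x^0y^{n+1} + \binom{n}{n}x^{n+1}y^0 + \sum_{k=1}^{n}\left[\binom{n}{k-1} + \binom{n}{k}\right]x^k y^{(n+1)-k}\\
&= \binom{n+1}{0}x^0y^{n+1} + \sum_{k=1}^{n}\binom{n+1}{k}x^k y^{(n+1)-k} + \binom{n+1}{n+1}x^{n+1}y^0\\
&= \sum_{k=0}^{n+1}\binom{n+1}{k}x^k y^{(n+1)-k}.
\end{aligned}$$
In the second-to-last step, we applied Pascal's Rule to each bracketed term and used $\binom{n}{0} = \binom{n+1}{0} = 1$ and $\binom{n}{n} = \binom{n+1}{n+1} = 1$. We have shown "If P(n) is true, then P(n + 1) is true," and hence P(n) is
true for all $n \in \mathbb{N}$ by induction.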
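The inductive step can be mirrored by a tiny computation on coefficient lists. In the sketch below, `multiply_by_x_plus_y` is our own name, and `math.comb` requires Python 3.8 or later. A polynomial in x and y is represented by the list whose kth entry is the coefficient of $x^k y^{n-k}$. Multiplying by x shifts this list and multiplying by y does not, so adding the two is exactly the re-indexing in the proof.

```python
from math import comb


def multiply_by_x_plus_y(coeffs):
    """Multiply a homogeneous polynomial of degree n by (x + y).

    coeffs[k] is the coefficient of x^k y^(n-k).  Multiplying by x raises
    each power of x by one (a shift of the list); multiplying by y keeps the
    power of x.  Adding the two lists is the re-indexing step in the proof
    of Theorem C.5: the new k-th coefficient is coeffs[k-1] + coeffs[k].
    """
    times_x = [0] + coeffs   # x * x^k y^(n-k) = x^(k+1) y^(n-k)
    times_y = coeffs + [0]   # y * x^k y^(n-k) = x^k y^(n+1-k)
    return [a + b for a, b in zip(times_x, times_y)]


coeffs = [1]                             # (x + y)^0 = 1
for n in range(1, 9):
    coeffs = multiply_by_x_plus_y(coeffs)
    assert coeffs == [comb(n, k) for k in range(n + 1)]  # agrees with (C.12)
print(coeffs)  # [1, 8, 28, 56, 70, 56, 28, 8, 1]
```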

Example C.3. The equation
$$2^n = (1+1)^n = \sum_{k=0}^{n}\binom{n}{k}$$
follows from the binomial theorem. Recall that $\binom{n}{k}$ is the number of k-element subsets of
a set with n elements. The preceding equation tells us that there are $2^n$ total subsets of a
set with n elements.
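The identity can be checked by sorting the subsets of a small set by size. In the sketch below, the helper `subsets_by_size` is our own illustration. It counts the k-element subsets of $\{0, \ldots, n-1\}$ with `itertools.combinations`. It then confirms that these counts are the binomial coefficients (Theorem C.3) and that they add up to $2^n$.

```python
from itertools import combinations
from math import comb


def subsets_by_size(n):
    """List how many k-element subsets {0, ..., n-1} has, for k = 0, ..., n."""
    return [sum(1 for _ in combinations(range(n), k)) for k in range(n + 1)]


for n in range(10):
    counts = subsets_by_size(n)
    assert counts == [comb(n, k) for k in range(n + 1)]  # Theorem C.3
    assert sum(counts) == 2 ** n                          # Example C.3
print(subsets_by_size(5))  # [1, 5, 10, 10, 5, 1]
```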
C.6. Bernoulli's Solution to the Power Sum Problem

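Jacob Bernoulli discovered an extremely clever solution to the power sum problem,
which we present here. Using the Binomial Theorem to expand $(x+1)^{m+1}$, we find that
$$\begin{aligned}
(x+1)^{m+1} - x^{m+1} &= \left(\sum_{k=0}^{m+1}\binom{m+1}{k}x^k\, 1^{m+1-k}\right) - x^{m+1}\\
&= \left(\sum_{k=0}^{m+1}\binom{m+1}{k}x^k\right) - x^{m+1}\\
&= \left(1 + \binom{m+1}{1}x + \binom{m+1}{2}x^2 + \cdots + \binom{m+1}{m}x^m + x^{m+1}\right) - x^{m+1}\\
&= 1 + \binom{m+1}{1}x + \binom{m+1}{2}x^2 + \cdots + \binom{m+1}{m}x^m,
\end{aligned}$$
yielding the formula
$$(x+1)^{m+1} - x^{m+1} = 1 + \binom{m+1}{1}x + \binom{m+1}{2}x^2 + \cdots + \binom{m+1}{m}x^m.$$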
Since this holds for $x = 1, 2, 3, \ldots, n$, we may add these n equations together as x runs from 1
to n to obtain
$$\sum_{x=1}^{n}\left[(x+1)^{m+1} - x^{m+1}\right] = \sum_{x=1}^{n}\left[1 + \binom{m+1}{1}x + \binom{m+1}{2}x^2 + \cdots + \binom{m+1}{m}x^m\right].$$
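The sum on the left telescopes (every term cancels except $(n+1)^{m+1} - 1^{m+1}$), and hence
$$\begin{aligned}
(n+1)^{m+1} - 1 &= \sum_{x=1}^{n}\left[1 + \binom{m+1}{1}x + \binom{m+1}{2}x^2 + \cdots + \binom{m+1}{m}x^m\right]\\
&= \sum_{x=1}^{n}1 + \binom{m+1}{1}\sum_{x=1}^{n}x + \binom{m+1}{2}\sum_{x=1}^{n}x^2 + \cdots + \binom{m+1}{m}\sum_{x=1}^{n}x^m\\
&= n + \binom{m+1}{1}S_1(n) + \binom{m+1}{2}S_2(n) + \cdots + \binom{m+1}{m}S_m(n),
\end{aligned}$$
where
$$S_m(n) = 1^m + 2^m + \cdots + n^m$$
denotes the sum of the first n mth powers. All of these computations yield Bernoulli's
formula
$$(n+1)^{m+1} - 1 = n + \binom{m+1}{1}S_1(n) + \binom{m+1}{2}S_2(n) + \cdots + \binom{m+1}{m}S_m(n).$$
This is a recursive formula for $S_m(n)$. In other words, if we have formulas for $S_k(n)$ for
$k = 1, 2, \ldots, m - 1$, we can solve the equation above for $S_m(n)$, whose coefficient is $\binom{m+1}{m} = m + 1$.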
Example C.4. Recall that our experimentation suggested that
$$S_3(n) = 1^3 + 2^3 + \cdots + n^3 = \left(\frac{n(n+1)}{2}\right)^2.$$
This formula can be derived from Bernoulli's recursive procedure. Indeed, we have
$$S_1(n) = \frac{n(n+1)}{2}, \qquad S_2(n) = \frac{n(n+1)(2n+1)}{6},$$
and hence setting m = 3 in Bernoulli's formula we see that
$$(n+1)^4 - 1 = n + \binom{4}{1}\frac{n(n+1)}{2} + \binom{4}{2}\frac{n(n+1)(2n+1)}{6} + \binom{4}{3}S_3(n).$$
Expanding out both sides of the preceding equation yields
$$n^4 + 4n^3 + 6n^2 + 4n = n + (2n^2 + 2n) + (2n^3 + 3n^2 + n) + 4S_3(n).$$
Collecting common terms reduces the preceding to
$$n^4 + 2n^3 + n^2 = 4S_3(n),$$
from which it follows that
$$S_3(n) = \frac{n^4 + 2n^3 + n^2}{4} = \left(\frac{n(n+1)}{2}\right)^2,$$
as desired. Although this formula could also be proved using mathematical induction, one
would first have to know the formula beforehand (i.e., via numerical computations and
guesswork, as we have done). The advantage of Bernoulli's method is that knowledge of
lower-order power sums leads directly to formulas for higher-order power sums, without
having to derive formulas from numerical computations and inspired guesswork.
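Bernoulli's recursion is easy to mechanize. The sketch below uses a function name of our own, `bernoulli_power_sums`. It solves Bernoulli's formula for $S_m(n)$ one m at a time, using only the previously computed $S_1(n), \ldots, S_{m-1}(n)$ and exact integer arithmetic. As a nod to Bernoulli's boast, it computes the sum of the tenth powers of the first thousand integers and compares the result with a brute-force sum.

```python
from math import comb


def bernoulli_power_sums(m_max, n):
    """Return [S_0(n), S_1(n), ..., S_{m_max}(n)] from Bernoulli's formula

        (n+1)^(m+1) - 1 = n + C(m+1,1) S_1(n) + ... + C(m+1,m) S_m(n),

    solved for S_m(n) using the previously computed S_1(n), ..., S_{m-1}(n).
    """
    sums = [n]  # S_0(n) = 1 + 1 + ... + 1 = n
    for m in range(1, m_max + 1):
        known = sum(comb(m + 1, k) * sums[k] for k in range(1, m))
        remainder = (n + 1) ** (m + 1) - 1 - n - known  # equals (m+1) S_m(n)
        sums.append(remainder // (m + 1))  # exact, since C(m+1, m) = m + 1
    return sums


fast = bernoulli_power_sums(10, 1000)[10]
slow = sum(x ** 10 for x in range(1, 1001))
assert fast == slow
print(fast)  # 91409924241424243424241924242500
```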
APPENDIX D
Ordered Fields
D.1. Fields
The two prominent modern methods of constructing the real numbers (starting only
with the rational numbers, set theory, and logic) are Dedekind cuts and equivalence
classes of Cauchy sequences. We will briefly touch on these later in the course and
in the homework. However, we will not dwell on them now. Rather, we will examine
the properties of the real numbers that make them what (we think) they are.[1]
Let us assume for the moment that R exists. What type of object is R? Where does it
fit into the grand scheme of things? In algebraic terminology, the real numbers R form a
field, a type of generalized number system which shares many of the standard properties
of elementary arithmetic.
Definition. A field is a set K endowed with two operations, denoted + and ·, which satisfy
the following axioms:
(i) COMMUTATIVITY: $x + y = y + x$ and $x \cdot y = y \cdot x$ for every $x, y \in K$.
(ii) ASSOCIATIVITY: $(x + y) + z = x + (y + z)$ and $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for
every $x, y, z \in K$.
(iii) DISTRIBUTIVITY: $x \cdot (y + z) = x \cdot y + x \cdot z$ for every $x, y, z \in K$.
(iv) ADDITIVE AND MULTIPLICATIVE IDENTITIES: There are distinct elements of K,
called 0 and 1, such that $x + 0 = x$ and $1 \cdot x = x$ for every $x \in K$.
(v) ADDITIVE AND MULTIPLICATIVE INVERSES: For each $x \in K$, there exists an
element of K, denoted $-x$, such that $x + (-x) = 0$. For each nonzero $x \in K$,
there exists an element of K, denoted $x^{-1}$, such that $x \cdot x^{-1} = 1$.
It is important to be explicit about these axioms, for there are many algebraic systems
which do not obey all of the rules above. For instance, one can add and multiply $n \times n$
matrices, but matrix multiplication is not commutative, nor does every nonzero matrix have
an inverse.
Most of the rules of basic algebra that you are familiar with from grade school can be
proved from these basic axioms. Unless you have taken abstract algebra, you might not
have known that there are many other number systems that obey these rules too.
It is important to understand that many different fields exist, and that the operations
+ and · do not necessarily correspond to our usual understanding of addition and multiplication. Furthermore, the symbols 0 and 1 do not necessarily correspond to the numbers 0
and 1 in the usual sense. Consider the following examples:
[1] Fortunately, mathematicians have proved that the real number system exists and that it satisfies the properties of a complete ordered field. These properties are not assumed as axioms; rather, they can be deduced
logically from either construction method referred to above.
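Example D.1. Let K be a set containing the symbols 0 and 1, and define the operations +
and · by the following tables:
$$\begin{array}{c|cc} + & 0 & 1\\ \hline 0 & 0 & 1\\ 1 & 1 & 0 \end{array} \qquad \begin{array}{c|cc} \cdot & 0 & 1\\ \hline 0 & 0 & 0\\ 1 & 0 & 1 \end{array}$$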
One can check that K = {0, 1}, equipped with the operations above, forms a field. In fact,
you are already familiar with this field since it corresponds to the algebra of even and odd
numbers (represented by 0 and 1).
Example D.2. One can sometimes make new fields from pieces of old number systems.
If p is a prime number, then the set $\mathbb{Z}_p = \{0, 1, 2, \ldots, p-1\}$ forms a field[2] when the
operations are defined by
$$\begin{aligned}
x + y &= \text{remainder of } x + y \text{ when divided by } p,\\
x \cdot y &= \text{remainder of } x \cdot y \text{ when divided by } p.
\end{aligned}$$
As expected, 0 and 1 play the role of additive and multiplicative identities in this field.
Note also that $\mathbb{Z}_2$ is simply the field from the previous example.
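The only field axiom that can fail for remainder arithmetic is the existence of multiplicative inverses. The others carry over from ordinary integer arithmetic, because taking remainders is compatible with + and ·. The brute-force sketch below uses helper names of our own. For each modulus, it searches for an inverse of every nonzero element. It recovers the primes below 20 and shows that 2 has no inverse modulo 4.

```python
def multiplicative_inverse(x, p):
    """Return y in {1, ..., p - 1} with (x * y) % p == 1, or None if none exists."""
    for y in range(1, p):
        if (x * y) % p == 1:
            return y
    return None


def nonzero_elements_invertible(p):
    """Brute-force check that every nonzero element of Z_p has an inverse."""
    return all(multiplicative_inverse(x, p) is not None for x in range(1, p))


print([p for p in range(2, 20) if nonzero_elements_invertible(p)])
# [2, 3, 5, 7, 11, 13, 17, 19]
print(multiplicative_inverse(2, 4))  # None: 2 has no inverse in Z_4
```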
Example D.3. The rational numbers Q, endowed with the standard operations, form a
field. It is a subfield of R.
Example D.4. The set
$$\mathbb{Q}(\sqrt{2}) = \{a + b\sqrt{2} : a, b \in \mathbb{Q}\},$$
endowed with the usual operations of addition and multiplication, is also a field.
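The least obvious field axiom here is the existence of inverses. It follows from rationalizing the denominator: $1/(a + b\sqrt{2}) = (a - b\sqrt{2})/(a^2 - 2b^2)$. The denominator is nonzero unless a = b = 0, because $\sqrt{2}$ is irrational. The sketch below is a minimal model of this field using exact `Fraction` coordinates. The class name `QSqrt2` and its methods are our own illustrative choices, and division by zero simply raises Python's `ZeroDivisionError`.

```python
from fractions import Fraction


class QSqrt2:
    """An element a + b*sqrt(2) of Q(sqrt 2), with a and b stored as exact rationals."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, using r^2 = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)

    def inverse(self):
        # 1 / (a + b r) = (a - b r) / (a^2 - 2 b^2).  The denominator is
        # nonzero unless a = b = 0, because sqrt(2) is irrational.
        d = self.a ** 2 - 2 * self.b ** 2
        return QSqrt2(self.a / d, -self.b / d)

    def __eq__(self, other):
        return self.a == other.a and self.b == other.b

    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt(2)"


x = QSqrt2(1, 1)
print(x.inverse())                      # -1 + 1*sqrt(2)
assert x * x.inverse() == QSqrt2(1, 0)  # the product is 1
```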
Example D.5. The complex number system $\mathbb{C}$ is a field. Notice also that
$$\mathbb{Q} \subset \mathbb{Q}(\sqrt{2}) \subset \mathbb{R} \subset \mathbb{C}.$$
Example D.6. R(x), the set of (real) rational functions, is a field (when endowed with
the usual addition and multiplication of functions). The constant functions 0 and 1 are the
additive and multiplicative identities.
MORAL: Although R is a field, the field axioms (i.e. standard
properties of commutativity, associativity, and distributivity) do
not narrow things down to the point where R is the only such
object. Can we list more properties of R? In fact, can we find
a list of properties that characterize R completely?
D.2. Ordered Fields
One property that helps to distinguish R from other fields is the fact that R comes
equipped with an ordering. Specifically, the real numbers form what is called an ordered
field. In addition to the standard field axioms, an ordered field also satisfies the following:
Definition. A field K is an ordered field if there is a subset $K^+$ of K such that
(i) If $x, y \in K^+$, then $x + y \in K^+$ and $x \cdot y \in K^+$.
(ii) TRICHOTOMY: For each $x \in K$, one and only one of the following is true:
$$x \in K^+, \qquad x = 0, \qquad -x \in K^+.$$
[2] If p is not a prime number, then $\mathbb{Z}_p$ is not a field. For instance, 2 has no multiplicative inverse in $\mathbb{Z}_4$.
One then says that $x < y$ if $y - x \in K^+$. The elements of $K^+$ are called positive and the
elements x such that $-x \in K^+$ are called negative.
Example D.7. $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}$, endowed with the usual notions of positive and negative,
are ordered fields.
Example D.8. The field $\mathbb{R}(x)$ of all rational functions in the variable x can be ordered.
Specifically, we say that $f >_{\mathbb{R}(x)} 0$ if f is eventually positive. In other words,
$$f >_{\mathbb{R}(x)} 0 \iff (\exists M > 0)(\forall x > M)(f(x) > 0).$$
Although this example may seem somewhat alien at first, it directly corresponds to the
intuitive notion of how "strong" a function is as $x \to \infty$. For instance, in the ordering of
$\mathbb{R}(x)$ we have $x^2 > x > 1/x$. Unlike $\mathbb{R}$, however, $\mathbb{R}(x)$ does not have the Archimedean
Property. Indeed, in $\mathbb{R}(x)$ we have $1 <_{\mathbb{R}(x)} x$, but $n \cdot 1 <_{\mathbb{R}(x)} x$ also holds for every $n \in \mathbb{N}$.
Every ordered field comes equipped with an absolute value, defined by
$$|x| = \begin{cases} x & \text{if } x \ge 0,\\ -x & \text{if } x < 0. \end{cases}$$
It is not too hard to show that the absolute value enjoys the standard features that we all
expect it to. However, there are two important properties that are often forgotten:
$$\begin{aligned}
|x + y| &\le |x| + |y| && \text{(Triangle Inequality)}\\
\big||x| - |y|\big| &\le |x - y| && \text{(Reverse Triangle Inequality).}
\end{aligned}$$
Know these inequalities well; you will use them many times in this course.
Another important consequence of the order axioms is the Trichotomy Law:
Theorem D.1 (Trichotomy Law). Let K be an ordered field. Given $x, y \in K$, one
and only one of the following statements is true: $x < y$, $x = y$, $x > y$.
Example D.9. Ordered fields are much rarer than fields. For example, no finite field is
an ordered field. Furthermore, the complex number system C is not ordered. Indeed, if
C were an ordered field, then by the Trichotomy Law, either i > 0, i = 0, or i < 0.
Manipulating these inequalities quickly leads to contradictions (try it).
Example D.10. Some fields can be ordered in more than one way. For example, $\mathbb{Q}(\sqrt{2})$
sits inside $\mathbb{R}$ and, as such, has a natural ordering. However, one can
declare a new ordering by saying that $a + b\sqrt{2}$ is positive in the new sense if $a - b\sqrt{2}$ is positive in the
usual sense. It requires some checking, but it turns out that this gives $\mathbb{Q}(\sqrt{2})$ two possible
orderings. Fortunately, $\mathbb{Q}$ and $\mathbb{R}$ themselves can be ordered in one and only one way (this
requires checking too).
Adding the order axioms to the field axioms narrows things down a bit. We are closer
to obtaining a list of properties that characterizes $\mathbb{R}$. However, $\mathbb{Q}$, $\mathbb{Q}(\sqrt{2})$, and $\mathbb{R}(x)$ are
also ordered fields. We therefore need to add at least one more axiom to make sure that we
have completely characterized $\mathbb{R}$.
APPENDIX E
Prime Numbers
E.1. Euclids Theorem
Recall that the prime numbers are the building blocks of all integers. You are probably at least informally acquainted (via grade school arithmetic) with many of their basic
properties.
Definition. An integer p > 1 is called a prime number if there is no (integer) divisor d of
p such that 1 < d < p. A positive integer that is not prime is called a composite number.
Example E.1. The integers 2, 3, 5, and 7 are primes and 4, 6, 8, and 9 are composites.
Less obvious examples are 1299709 (the 100000th prime number) and 1299711, which is
divisible by 3 and hence composite.
Theorem E.1 (Fundamental Theorem of Arithmetic). Every integer n > 1 can be expressed as a product of primes. Specifically, we may write $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$, where the
$p_k$ are distinct primes and the $a_k$ are positive integers. The factorization of an integer
n > 1 into primes is unique, apart from the order of the prime factors.
This theorem first appeared (somewhat vaguely) as Proposition 14 of Book IX of Euclid's Elements (ca. 300 BCE):
If a number be the least that is measured by prime numbers, it will not be
measured by any other prime except those originally measuring it.
However, C.F. Gauss (in his groundbreaking 1801 treatise Disquisitiones Arithmeticae) was the first to state and prove the Fundamental Theorem of Arithmetic in a rigorous
way. Incidentally, Gauss was also the first to prove the Fundamental Theorem of Algebra
in a rigorous way!
An important mathematical fact is that the set
$$P = \{2, 3, 5, 7, 11, 13, 17, 19, 23, \ldots\}$$
of prime numbers is infinite. This nontrivial assertion, now known as Euclid's theorem,
was proved in Book IX of Euclid's Elements. Euclid's proof, along with the irrationality of $\sqrt{2}$ (commonly attributed to Pythagoras, but most likely due to the Pythagorean
Hippasus of Metapontum), is considered one of the most mathematically elegant contributions of the ancient Greeks.
In his famous book A Mathematician's Apology, the great early 20th-century English
mathematician G.H. Hardy stated that
I can hardly do better than go back to the Greeks. I will state and prove two of
the famous theorems of Greek mathematics. They are simple theorems, both
in idea and in execution, but there is no doubt at all about their being theorems
of the highest class. Each is as fresh and significant as when it was discovered
. . . two thousand years have not written a wrinkle on either of them . . . The first
is Euclid's proof of the existence of an infinity of prime numbers. . . . My second
example is Pythagoras's proof of the irrationality of $\sqrt{2}$. . .
Euclid's proof is startling in its simplicity and its elegant use of reductio ad absurdum
(proof by contradiction). As Hardy says:
Reductio ad absurdum, which Euclid loved so much, is one of a mathematician's
finest weapons. It is a far finer gambit than any chess gambit: a chess player may
offer the sacrifice of a pawn or even a piece, but a mathematician offers the
game.
We are now ready to prove Euclid's theorem:

Theorem E.2 (Euclid of Alexandria). The number of primes is infinite.
Proof. Suppose toward a contradiction that the set $P = \{p_1, p_2, \ldots, p_n\}$ of all primes is
finite. If this is the case, then the number $N = p_1 p_2 \cdots p_n + 1$ is not divisible by any of
the primes $p_j$. Indeed, division by $p_j$ leaves a remainder of 1, since N is one more than
$p_1 p_2 \cdots p_n$, which is divisible by $p_j$. Since N > 1, Theorem E.1 guarantees that N has at least one prime factor, and no such factor can belong to
the set P (which was supposed to contain all of the prime numbers). This contradicts our
hypothesis that P contains every prime number, and hence this hypothesis must be false. In
other words, the set of all primes cannot be finite; it must be infinite.
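Here is the construction in the proof applied to a concrete finite list. This is an illustrative sketch, and `smallest_prime_factor` is our own helper. Note that N need not be prime itself. Here $N = 2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 + 1 = 30031 = 59 \cdot 509$. The proof only needs some prime factor of N lying outside the list.

```python
def smallest_prime_factor(m):
    """Smallest prime dividing m (m >= 2), by trial division."""
    d = 2
    while d * d <= m:
        if m % d == 0:
            return d
        d += 1
    return m  # no divisor up to sqrt(m): m itself is prime


primes = [2, 3, 5, 7, 11, 13]   # pretend this were the list of "all" primes
N = 1
for p in primes:
    N *= p
N += 1                          # N = 2*3*5*7*11*13 + 1

q = smallest_prime_factor(N)
print(N, q)                     # 30031 59
assert all(N % p == 1 for p in primes)   # remainder 1, as in the proof
assert q not in primes                   # a prime missing from the list
```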

Although there are infinitely many primes, it is always possible to find arbitrarily large
gaps between consecutive primes. The proof of this fact is relatively straightforward and
provides an example of a direct proof:
Theorem E.3. There are arbitrarily large gaps in the sequence of primes: for each integer
$n \ge 2$, there exists a sequence of n consecutive composite integers.
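Proof. Let $n \ge 2$ be a given integer and note that $(n+1)!$, being the product of
$1, 2, \ldots, n, n+1$, is divisible by each of the numbers $2, 3, \ldots, n+1$. Therefore
$$\begin{aligned}
(n+1)! + 2 &\quad \text{is divisible by } 2,\\
(n+1)! + 3 &\quad \text{is divisible by } 3,\\
&\;\;\vdots\\
(n+1)! + (n+1) &\quad \text{is divisible by } n+1.
\end{aligned}$$
Each of these n consecutive integers is larger than the divisor listed next to it, so each is composite. Hence there exist n consecutive composite numbers.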
Example E.2. For n = 4, the construction used in the proof of Theorem E.3 produces the
sequence
$$\begin{aligned}
122 &= 2 \cdot 61\\
123 &= 3 \cdot 41\\
124 &= 4 \cdot 31\\
125 &= 5 \cdot 25
\end{aligned}$$
of four consecutive composite integers. However, 24, 25, 26, 27 and 32, 33, 34, 35 are
also runs of four consecutive composite integers, made up of much smaller numbers. In general, the method of Theorem E.3
produces much larger numbers than necessary. This also illustrates the fact that although
a proof might work, it does not mean that the methods used are necessarily optimal.
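The comparison in Example E.2 can be reproduced directly. The sketch below uses helper names of our own. It builds the run $(n+1)! + 2, \ldots, (n+1)! + (n+1)$ and checks that each entry is composite. It then searches upward from 2 for the first run of n consecutive composites, which shows how wasteful the factorial construction is.

```python
from math import factorial, isqrt


def is_composite(m):
    """True if m has a divisor d with 1 < d < m."""
    return any(m % d == 0 for d in range(2, isqrt(m) + 1))


def factorial_run(n):
    """The n consecutive integers (n+1)! + 2, ..., (n+1)! + (n+1) from Theorem E.3."""
    base = factorial(n + 1)
    return [base + j for j in range(2, n + 2)]


def first_run(n):
    """Smallest m >= 2 such that m, m + 1, ..., m + n - 1 are all composite."""
    m = 2
    while not all(is_composite(m + i) for i in range(n)):
        m += 1
    return m


print(factorial_run(4))  # [122, 123, 124, 125]
print(first_run(4))      # 24
```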

E.2. The Prime Number Theorem
Legendre was the first to publicly make a significant conjecture regarding the large-scale distribution of prime numbers. In his Essai sur la Théorie des Nombres (1798), he
proposed that
$$\lim_{x\to\infty} \frac{\pi(x)}{x/(\log x - 1.08366)} = 1,$$
where π(x) denotes the number of primes ≤ x and log denotes the natural logarithm.
Based on numerical evidence, Gauss (as a child) conjectured that
$$\lim_{x\to\infty} \frac{\pi(x)}{x/\log x} = 1 \tag{E.1}$$
and
$$\lim_{x\to\infty} \frac{\pi(x)}{\mathrm{Li}(x)} = 1, \tag{E.2}$$
where the function
$$\mathrm{Li}(x) = \int_2^x \frac{dt}{\log t}$$
is called the logarithmic integral. It appears that Gauss's work on the subject began in 1791
(at the age of fourteen), well before Legendre's book was written. The conjecture (E.1) is
true, and it is now known as the Prime Number Theorem. The proof of the Prime Number
Theorem would have to wait until the end of the 19th century.
A major step was taken in 1850, when the Russian mathematician Pafnuty Lvovich
Chebyshev proved that there exist positive constants $c_1, c_2$ such that
$$c_1 \frac{x}{\log x} < \pi(x) < c_2 \frac{x}{\log x}$$
for sufficiently large x. He also proved that if
$$\lim_{x\to\infty} \frac{\pi(x)}{x/\log x}$$
exists, then this limit must equal 1. Unfortunately, Chebyshev was not able to prove that
the limit actually exists.
In 1896, Hadamard and de la Vallée Poussin (independently) proved the celebrated
Prime Number Theorem:
Theorem E.4 (Prime Number Theorem).
$$\lim_{x\to\infty} \frac{\pi(x)}{x/\log x} = 1.$$
Their proofs are technical and involve the use of complex function theory and the
Riemann -function. In 1949, Selberg and Erdos succeeded in proving the Prime Number Theorem without using complex function theory. Their so-called elementary proof is
exceedingly complicated, but does not use advanced complex analysis.
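It is interesting to note that the conjecture (E.2) of the fourteen-year-old Gauss is also true. Since $\operatorname{Li}(x)/(x/\log x) \to 1$ as $x \to \infty$, the limit statement (E.2) is equivalent to (E.1); however, $\operatorname{Li}(x)$ is a far more accurate approximation to $\pi(x)$ than $x/\log x$ is.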
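To see these approximations side by side, here is a minimal Python sketch (not part of the original notes). It counts primes with a sieve of Eratosthenes and evaluates $\operatorname{Li}(x)$ by substituting $t = e^u$, which turns the integral into $\int_{\log 2}^{\log x} e^u/u\,du$, and applying composite Simpson's rule. The function names, the substitution, and the step count are illustrative choices.

```python
import math

def prime_count(x):
    """pi(x): the number of primes <= x, via a sieve of Eratosthenes."""
    n = int(x)
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross out p*p, p*p + p, ... as composite.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)

def log_integral(x, steps=10_000):
    """Li(x) = integral_2^x dt / log t, rewritten with t = e^u as
    integral_{log 2}^{log x} e^u / u du and evaluated by composite
    Simpson's rule (steps must be even)."""
    a, b = math.log(2.0), math.log(x)
    h = (b - a) / steps
    f = lambda u: math.exp(u) / u
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

for x in (10**3, 10**4, 10**5, 10**6):
    print(f"x = {x:>7}: pi(x) = {prime_count(x):>6}, "
          f"x/log x = {x / math.log(x):9.1f}, "
          f"Legendre = {x / (math.log(x) - 1.08366):9.1f}, "
          f"Li(x) = {log_integral(x):9.1f}")
```

In the printed table the $\operatorname{Li}(x)$ column sits much closer to $\pi(x)$ than the $x/\log x$ column does, even though both ratios tend to 1.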
A result of Littlewood (1914) shows that the difference $\pi(x) - \operatorname{Li}(x)$ assumes both positive and negative values infinitely often. However, the first value of $x$ for which $\pi(x) > \operatorname{Li}(x)$ is not known. In 1933, Skewes proved, assuming the Riemann Hypothesis, that such an $x$ must occur before
$$e^{e^{e^{79}}} \approx 10^{10^{10^{34}}}. \tag{E.3}$$
The number (E.3) is called Skewes' number and is widely believed to be the largest number that has ever appeared for a genuine purpose. Subsequently this extravagant bound has been reduced to $1.165 \times 10^{1165}$ by Lehman (1966), $8.185 \times 10^{370}$ by te Riele (1987), and it is now known to be somewhat less than $1.39822 \times 10^{316}$.
APPENDIX F
Galileo's Paradox
The following is the passage from The Discourses and Mathematical Demonstrations Relating to Two New Sciences concerning Galileo's Paradox:
SIMPLICIO: Here a difficulty presents itself which appears to me insoluble. Since it is clear that we may have one line greater than another, each containing an infinite number of points, we are forced to admit that, within one and the same class, we may have something greater than infinity, because the infinity of points in the long line is greater than the infinity of points in the short line. This assigning to an infinite quantity a value greater than infinity is quite beyond my comprehension.
SALVIATI: This is one of the difficulties which arise when we attempt, with our finite minds, to discuss the infinite, assigning to it those properties which we give to the finite and limited; but this I think is wrong, for we cannot speak of infinite quantities as being the one greater or less than or equal to another. To prove this I have in mind an argument which, for the sake of clearness, I shall put in the form of questions to Simplicio who raised this difficulty. I take it for granted that you know which of the numbers are squares and which are not.
SIMPLICIO: I am quite aware that a squared number is one which results from the multiplication of another number by itself; thus 4, 9, etc., are squared numbers which come from multiplying 2, 3, etc., by themselves.
SALVIATI: Very well; and you also know that just as the products are called squares so the factors are called sides or roots; while on the other hand those numbers which do not consist of two equal factors are not squares. Therefore if I assert that all numbers, including both squares and non-squares, are more than the squares alone, I shall speak the truth, shall I not?
SIMPLICIO: Most certainly.
SALVIATI: If I should ask further how many squares there are one might reply truly that there are as many as the corresponding number of roots, since every square has its own root and every root its own square, while no square has more than one root and no root more than one square.
SIMPLICIO: Precisely so.
SALVIATI: But if I inquire how many roots there are, it cannot be denied that there are as many as the numbers because every number is the root of some square. This being granted, we must say that there are as many squares as there are numbers because they are just as numerous as their roots, and all the numbers are roots. Yet at the outset we said that there are many more numbers than squares, since the larger portion of them are not squares. Not only so, but the proportionate number of squares diminishes as we pass to larger numbers. Thus up to 100 we have 10 squares, that is, the squares constitute 1/10 part of all the numbers; up to 10000, we find only 1/100 part to be squares; and up to a million only 1/1000 part; on the other hand in an infinite number, if one could conceive of such a thing, he would be forced to admit that there are as many squares as there are numbers taken all together.
SAGREDO: What then must one conclude under these circumstances?
SALVIATI: So far as I see we can only infer that the totality of all numbers is infinite, that the number of squares is infinite, and that the number of their roots is infinite; neither is the number of squares less than the totality of all the numbers, nor the latter greater than the former; and finally the attributes "equal," "greater," and "less," are not applicable to infinite, but only to finite, quantities. When therefore Simplicio introduces several lines of different lengths and asks me how it is possible that the longer ones do not contain more points than the shorter, I answer him that one line does not contain more or less or just as many points as another, but that each line contains an infinite number.
APPENDIX G
Inner Product Spaces

G.1. Review: The Dot Product
Let us recall some ideas you may have seen in your basic Linear Algebra and/or Multivariable Calculus course. Recall that the norm of a vector $x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$ is given by
$$\|x\| = \sqrt{x_1^2 + x_2^2 + x_3^2}.$$
You may also recall that the dot product (or scalar product) of two vectors $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ in $\mathbb{R}^3$ is defined by the formula
$$x \cdot y = x_1 y_1 + x_2 y_2 + x_3 y_3.$$
Note that the dot product takes two vectors as input and outputs a scalar. Hence the dot
product does not provide us a way to multiply two vectors together to obtain another vector.
Of paramount importance is the geometric relation
$$x \cdot y = \|x\|\|y\|\cos\theta,$$
where $\theta$ is the angle between $x$ and $y$. Since $|\cos\theta| \le 1$, this easily implies the Cauchy-Schwarz-Bunyakowsky inequality
$$|x \cdot y| \le \|x\|\|y\|$$
for all $x, y$ in $\mathbb{R}^3$.
One of the most important properties of the dot product is the following:
$$x \cdot x = x_1 x_1 + x_2 x_2 + x_3 x_3 = x_1^2 + x_2^2 + x_3^2 = \|x\|^2.$$
In particular, we note that
$$x \cdot x \ge 0$$
for any vector $x$. The dot product also satisfies the properties
$$(x + y) \cdot z = x \cdot z + y \cdot z \qquad\text{and}\qquad x \cdot y = y \cdot x,$$
as you should recall.
G.2. Inner Products

Since the dot product proved so useful in Vector Calculus and basic Linear Algebra,
we would like to generalize it as much as possible. Motivated by the ideas in the preceding
section, we are led to the following formal definition:
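Definition. An inner product on a vector space $V$ is a function
$$\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$$
such that, for all $u, v, w \in V$:
(i) (Positivity) $\langle v, v\rangle \ge 0$;
(ii) (Definiteness) $\langle v, v\rangle = 0$ if and only if $v = 0$;
(iii) (Additivity in first slot) $\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle$;
(iv) (Symmetry) $\langle u, v\rangle = \langle v, u\rangle$;
(v) (Homogeneity) $\langle au, v\rangle = a\langle u, v\rangle$ for all $a \in \mathbb{R}$.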
An inner product space is simply a vector space V equipped with an inner product.
There are a couple of additional properties that inner products have, which follow quickly from the definitions. For example, combining (iii) and (v) yields
$$\langle au + bv, w\rangle = a\langle u, w\rangle + b\langle v, w\rangle$$
for all $a, b \in \mathbb{R}$ and $u, v, w \in V$.
Example G.1. $\mathbb{R}^n$, when equipped with the dot product, is an inner product space. With our new notation, we have
$$\langle x, y\rangle = \sum_{i=1}^n x_i y_i,$$
where $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$.
Example G.2. If $A$ is an invertible $n \times n$ matrix, then
$$\langle x, y\rangle_A = \langle Ax, Ay\rangle$$
defines an inner product on $\mathbb{R}^n$. Here $\langle Ax, Ay\rangle$ refers to the standard inner product on $\mathbb{R}^n$ from the preceding example. Let us briefly check that this satisfies properties (i) through (v):
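(i) If $x \in \mathbb{R}^n$, then $\langle x, x\rangle_A = \langle Ax, Ax\rangle \ge 0$ since the standard inner product (i.e., the dot product) satisfies (i). More geometrically, we note that $\langle x, x\rangle_A = \|Ax\|^2$, where
$$\|Ax\| = \sqrt{\langle Ax, Ax\rangle}$$
is the Euclidean norm of the vector $Ax \in \mathbb{R}^n$.
(ii) If $\langle x, x\rangle_A = 0$, then $\langle Ax, Ax\rangle = 0$, whence $Ax = 0$ since the standard inner product satisfies (ii). Since $A$ is invertible, it follows that $x = 0$ since the homogeneous system $Ax = 0$ has only the trivial solution.
(iii) This is a straightforward computation using the fact that multiplication by $A$ is linear:
$$\begin{aligned} \langle u + v, w\rangle_A &= \langle A(u + v), Aw\rangle \\ &= \langle Au + Av, Aw\rangle \\ &= \langle Au, Aw\rangle + \langle Av, Aw\rangle \\ &= \langle u, w\rangle_A + \langle v, w\rangle_A. \end{aligned}$$
(iv) Since the standard inner product satisfies (iv), it follows that
$$\langle u, v\rangle_A = \langle Au, Av\rangle = \langle Av, Au\rangle = \langle v, u\rangle_A.$$
(v) Using the fact that the standard inner product satisfies (v) along with the fact that $A(au) = a(Au)$ for all $a \in \mathbb{R}$ and $u \in \mathbb{R}^n$, we see that
$$\langle au, v\rangle_A = \langle A(au), Av\rangle = \langle aAu, Av\rangle = a\langle Au, Av\rangle = a\langle u, v\rangle_A.$$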
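The five checks above can also be spot-checked numerically. The following Python sketch uses plain lists and no libraries; the matrix $A$ and the test vectors are arbitrary illustrative choices (not from the notes), and a numerical check is of course not a proof.

```python
def matvec(A, x):
    """Multiply the matrix A (given as a list of rows) by the vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(x, y):
    """Standard inner product on R^n."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))

def inner_A(A, x, y):
    """<x, y>_A = <Ax, Ay>, the inner product of Example G.2."""
    return dot(matvec(A, x), matvec(A, y))

# An invertible matrix: det = 2*(1*1 - 3*0) - 1*(0*1 - 3*1) + 0 = 5.
A = [[2, 1, 0],
     [0, 1, 3],
     [1, 0, 1]]
u, v, w = [1, -2, 3], [0, 4, -1], [2, 2, 5]
a, b = 3, -2

# (iv) symmetry
print(inner_A(A, u, v) == inner_A(A, v, u))
# (iii) and (v) combined: <a u + b v, w>_A = a <u, w>_A + b <v, w>_A
combo = [a * ui + b * vi for ui, vi in zip(u, v)]
print(inner_A(A, combo, w) == a * inner_A(A, u, w) + b * inner_A(A, v, w))
# (i)/(ii): a nonzero vector has positive "length squared" ||Au||^2
print(inner_A(A, u, u) > 0)
```

Integer entries keep the arithmetic exact, so the equalities are tested without rounding error.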
In summary, there are many possible inner products on Rn . It turns out that the inner
products described above are the only possible inner products on Rn .
G.3. Norms Defined by Inner Products
Recall that a norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ that satisfies the following conditions:
(i) $\|v\| \ge 0$ for all $v \in V$, and $\|v\| = 0$ if and only if $v = 0$;
(ii) $\|av\| = |a|\|v\|$ for any $a \in \mathbb{R}$ and $v \in V$;
(iii) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$.
It turns out that an inner product space is always a normed vector space. In fact, the following definition is a generalization of the fact that if $x = (x_1, x_2, x_3)$ is a vector in $\mathbb{R}^3$, then its Euclidean length $\|x\|$ satisfies $\|x\|^2 = x \cdot x$.
Definition. If $V$ is an inner product space and $v \in V$, then the norm on $V$ induced by the inner product is defined by
$$\|v\| = \sqrt{\langle v, v\rangle}. \tag{G.1}$$
It turns out that (G.1) indeed defines a norm on $V$. In other words, one can verify that the axioms for a norm are satisfied by the expression (G.1):
Theorem G.1. If $V$ is an inner product space, then $\|v\| = \sqrt{\langle v, v\rangle}$ defines a norm on $V$. In particular, $\|v\|$ satisfies the axioms (i), (ii), and (iii) for a norm on $V$, and $V$ is thus a normed vector space.
Proof. Property (i) is easily verified: $\sqrt{\langle v, v\rangle} \ge 0$ for all $v \in V$ is automatic since $\langle v, v\rangle \ge 0$ for all $v \in V$ by the definition of an inner product. If $\sqrt{\langle v, v\rangle} = 0$, then $\langle v, v\rangle = 0$, whence $v = 0$ by the definition of an inner product.
Property (ii) is slightly trickier:
$$\begin{aligned} \|av\|^2 &= \langle av, av\rangle \\ &= a\langle v, av\rangle \\ &= a\langle av, v\rangle \\ &= a^2\langle v, v\rangle \\ &= |a|^2\|v\|^2. \end{aligned}$$
Make sure you see why each step was valid: look at the axioms for inner products to see which rules we used. Taking square roots yields the desired formula
$$\|av\| = |a|\|v\|.$$
We postpone the proof of Property (iii), the Triangle Inequality, until later.
Example G.3. We can define an inner product on $C([a, b])$, the vector space of continuous (real-valued) functions on the closed interval $[a, b]$, by defining
$$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx.$$
The reason for using continuous functions is to ensure that the preceding integral exists and is finite. Something that requires proof is that
$$\langle f, f\rangle = \int_a^b |f(x)|^2\,dx$$
equals zero if and only if $f$ is the zero function. We will overlook this for the moment.
The preceding inner product is not so bizarre. In fact, vectors in $\mathbb{R}^n$ are just functions, if you think of them the right way. One usually thinks of a vector $f \in \mathbb{R}^n$ as an $n$-tuple
$$f = (a_1, a_2, \ldots, a_n).$$
One can also think of $f$ as the function
$$f : \{1, 2, 3, \ldots, n\} \to \mathbb{R}$$
such that $f(x) = a_x$ for each $x \in \{1, 2, 3, \ldots, n\}$. From this point of view, if $g = (b_1, b_2, \ldots, b_n)$ is regarded in the same way, the inner product on $\mathbb{R}^n$ is simply
$$\langle f, g\rangle = \sum_{x=1}^n a_x b_x = \sum_{x=1}^n f(x)g(x).$$
Keeping in mind that integration is a type of summation process (think Riemann sums),
one begins to see the relationship between the standard inner products on Rn and C([a, b]).
They are essentially the same, except that one is discrete and one is continuous. In light of
this revelation, we will begin using the symbols f, g to denote generic vectors (as opposed
to u, v, . . .).
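The analogy can be made concrete: a Riemann sum for $\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$ is just $\Delta x$ times the discrete dot product of the sampled vectors $(f(x_1), \ldots, f(x_N))$ and $(g(x_1), \ldots, g(x_N))$. Here is a minimal Python sketch; the midpoint rule and the default $N$ are our own illustrative choices.

```python
import math

def sample(f, a, b, n):
    """Values of f at the midpoints of n equal subintervals of [a, b]."""
    dx = (b - a) / n
    return [f(a + (k + 0.5) * dx) for k in range(n)]

def dot(u, v):
    """The discrete inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

def inner_C(f, g, a, b, n=10_000):
    """Midpoint-rule approximation of <f, g> = integral of f*g over [a, b]:
    dx times the dot product of the two sampled vectors."""
    dx = (b - a) / n
    return dx * dot(sample(f, a, b, n), sample(g, a, b, n))

print(inner_C(lambda x: x, lambda x: x, 0, 1))   # close to 1/3
print(inner_C(math.sin, math.cos, 0, math.pi))    # close to 0
```

As $N \to \infty$ the scaled dot product converges to the integral, which is one way to see why the two inner products behave alike.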
G.4. Orthogonal Vectors
Definition. Two vectors $u, v \in V$ are called orthogonal if $\langle u, v\rangle = 0$.
Example G.4. In the real inner product space $\mathbb{R}^n$, vectors $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ are orthogonal if and only if
$$\langle a, b\rangle = \sum_{k=1}^n a_k b_k = 0.$$
Recall that in $\mathbb{R}^n$ we have
$$\langle u, v\rangle = \|u\|\|v\|\cos\theta,$$
where $\theta$ denotes the angle between $u$ and $v$. Therefore, for nonzero $u$ and $v$, $\langle u, v\rangle = 0$ if and only if $u$ and $v$ are perpendicular vectors. In light of the preceding example, we see that the concept of orthogonality is a generalization of the notion of perpendicularity in $\mathbb{R}^n$. Indeed, we study inner products precisely because we want to import as many geometric notions as possible into the study of abstract inner product spaces.
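Example G.5. If $m, n \in \mathbb{Z}$, then $\cos 2\pi nx$ and $\sin 2\pi mx$ are orthogonal in $C([0, 1])$ with respect to the inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$. Indeed, the following integral can be verified directly (for instance, using the identity $2\cos A \sin B = \sin(A + B) - \sin(A - B)$):
$$\int_0^1 \cos(2\pi nx)\sin(2\pi mx)\,dx = 0.$$
This is the main observation behind the theory of Fourier series.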
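A quick numerical sanity check of Example G.5 follows. It is only a sketch: the integral is approximated with composite Simpson's rule, and the range of $m, n$ tested is arbitrary. For comparison it also computes $\langle \sin 2\pi x, \sin 2\pi x\rangle$, which is not zero.

```python
import math

def simpson(func, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    dx = (b - a) / steps
    total = func(a) + func(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(a + k * dx)
    return total * dx / 3

def fourier_inner(n, m):
    """<cos(2 pi n x), sin(2 pi m x)> in C([0, 1])."""
    return simpson(
        lambda x: math.cos(2 * math.pi * n * x) * math.sin(2 * math.pi * m * x),
        0.0, 1.0)

# Largest |<cos 2 pi n x, sin 2 pi m x>| over a small range of integers.
worst = max(abs(fourier_inner(n, m)) for n in range(-3, 4) for m in range(-3, 4))
print(worst)                                                    # tiny (rounding level)
print(simpson(lambda x: math.sin(2 * math.pi * x) ** 2, 0.0, 1.0))  # about 0.5
```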
Given two perpendicular line segments which form the sides of a right triangle, the
Pythagorean theorem tells us how to find the length of the hypotenuse. Although this is
one of the most basic theorems in all of mathematics, a surprising number of math majors
do not know how to prove it from basic principles. Here is a simple proof:
Theorem G.2 (Classical Pythagorean Theorem). If $a, b, c$ are the lengths of the two sides and hypotenuse of a right triangle, respectively, then $a^2 + b^2 = c^2$.
Proof. Put four copies of the triangle around a square of side $c$ to make a square of side $a + b$. Comparing the area of the big square to the sum of the areas of its components, we get
$$(a + b)^2 = c^2 + 4\left(\tfrac{1}{2}ab\right).$$
Expanding and canceling terms shows that $a^2 + b^2 = c^2$.
Properly interpreted, the Pythagorean theorem suggests something about inner product
spaces. The Euclidean plane is simply the inner product space R2 and the sides of our
triangle are orthogonal vectors u and v. In this form, the Pythagorean theorem states
$$\|u + v\|^2 = \|u\|^2 + \|v\|^2.$$
This is true in complete generality and it is one of the most fundamental properties of
abstract inner product spaces:
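Theorem G.3 (Abstract Pythagorean Theorem). If $f$ and $g$ are orthogonal vectors in an inner product space, then
$$\|f + g\|^2 = \|f\|^2 + \|g\|^2.$$
Proof. If $f, g$ are orthogonal, then $\langle f, g\rangle = \langle g, f\rangle = 0$ by definition (and symmetry). Thus
$$\begin{aligned} \|f + g\|^2 &= \langle f + g, f + g\rangle \\ &= \langle f, f\rangle + \langle f, g\rangle + \langle g, f\rangle + \langle g, g\rangle \\ &= \|f\|^2 + \|g\|^2. \end{aligned}$$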
Another geometrically inspired theorem is the following:

Theorem G.4 (Parallelogram Identity). If $f, g$ are vectors in an inner product space, then
$$\|f + g\|^2 + \|f - g\|^2 = 2(\|f\|^2 + \|g\|^2).$$
Proof. The proof is a straightforward computation based upon the properties of an inner product: expanding as in the proof of Theorem G.3 gives $\|f \pm g\|^2 = \|f\|^2 \pm 2\langle f, g\rangle + \|g\|^2$, and adding these two identities cancels the cross terms.
It is important to note that the Parallelogram Identity does not hold for normed vector spaces in general. The Parallelogram Identity is a special property of norms arising from inner products.
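To see the Parallelogram Identity fail for a norm that does not come from an inner product, compare the Euclidean norm on $\mathbb{R}^2$ with the max norm $\|(x_1, x_2)\|_\infty = \max(|x_1|, |x_2|)$. The Python sketch below (the test vectors $f = (1, 0)$, $g = (0, 1)$ are our choice) computes the "gap" $\|f+g\|^2 + \|f-g\|^2 - 2(\|f\|^2 + \|g\|^2)$ for each norm.

```python
import math

def euclidean_norm(v):
    return math.sqrt(sum(x * x for x in v))

def max_norm(v):
    return max(abs(x) for x in v)

def parallelogram_gap(norm, f, g):
    """||f+g||^2 + ||f-g||^2 - 2(||f||^2 + ||g||^2); zero for inner-product norms."""
    s = [x + y for x, y in zip(f, g)]
    d = [x - y for x, y in zip(f, g)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * (norm(f) ** 2 + norm(g) ** 2)

f, g = (1.0, 0.0), (0.0, 1.0)
print(parallelogram_gap(euclidean_norm, f, g))  # 0 up to rounding
print(parallelogram_gap(max_norm, f, g))        # -2.0: the identity fails
```

Since any norm induced by an inner product must satisfy Theorem G.4, the nonzero gap shows that the max norm is not induced by any inner product on $\mathbb{R}^2$.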

The abstract Pythagorean Theorem highlights the usefulness of orthogonal vectors. In
fact, the same idea of projection of one vector along another (think back to the dot product)
also applies in arbitrary inner product spaces. If $f, g \in V$ with $g \ne 0$, then the equation
$$f = cg + (f - cg)$$
obviously holds for all $c \in \mathbb{R}$. It will be useful to find a constant $c$ such that
$$\langle f - cg, cg\rangle = 0.$$
In other words, we want to write $f$ as a scalar multiple of $g$ plus something orthogonal to $g$. To do this, we solve the above equation for the constant $c$:
$$\begin{aligned} 0 &= \langle f - cg, cg\rangle \\ &= c\langle f, g\rangle - c^2\langle g, g\rangle \\ &= c\left(\langle f, g\rangle - c\|g\|^2\right), \end{aligned}$$
and thus either $c = 0$ or
$$c = \frac{\langle f, g\rangle}{\|g\|^2}.$$
With the latter choice we obtain the orthogonal decomposition $f = cg + h$, where the vector
$$h = f - \frac{\langle f, g\rangle}{\|g\|^2}\,g$$
is orthogonal to $g$. Notice the important fact that $h = 0$ if and only if $f$ and $g$ are scalar multiples of one another.
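The decomposition $f = cg + h$ is easy to compute in $\mathbb{R}^n$ with the standard inner product. A minimal Python sketch (the function name `decompose` is ours) that returns $c$ and $h$, so that $\langle h, g\rangle = 0$ can be confirmed directly:

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))

def decompose(f, g):
    """Write f = c*g + h with <h, g> = 0, using c = <f, g> / ||g||^2.
    Requires g != 0."""
    gg = dot(g, g)
    if gg == 0:
        raise ValueError("g must be nonzero")
    c = dot(f, g) / gg
    h = [fi - c * gi for fi, gi in zip(f, g)]
    return c, h

g = [1.0, 1.0, 0.0]
c, h = decompose([3.0, 1.0, 2.0], g)
print(c, h, dot(h, g))   # 2.0 [1.0, -1.0, 2.0] 0.0
```

When $f$ is already a multiple of $g$, for example `decompose([2.0, 4.0], [1.0, 2.0])`, the remainder $h$ comes out as the zero vector, matching the remark above.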
G.5. The Cauchy-Schwarz-Bunyakowsky Inequality
One of the most useful inequalities in all of mathematics is the Cauchy-Schwarz-Bunyakowsky Inequality. In the West, it has traditionally been known as the Schwarz Inequality or the Cauchy-Schwarz Inequality. In Eastern Europe, it is frequently called the Bunyakowsky Inequality. In light of this, many authors simply refer to it as the CSB Inequality.
Theorem G.5 (Cauchy-Schwarz-Bunyakowsky Inequality). If $\langle \cdot, \cdot \rangle$ is an inner product on $V$, then
$$|\langle f, g\rangle| \le \|f\|\|g\|$$
for all $f, g \in V$. Equality holds if and only if $f$ and $g$ are scalar multiples of one another.
Pf. #1. If either $f$ or $g$ is the zero vector, then the inequality is obviously true. Thus it suffices to check the case where neither $f$ nor $g$ is zero. Write down the orthogonal decomposition of $f$ with respect to $g$:
$$f = \frac{\langle f, g\rangle}{\|g\|^2}\,g + h.$$
Here the vector $h$ is orthogonal to $g$. The Pythagorean Theorem states that
$$\begin{aligned} \|f\|^2 &= \left\|\frac{\langle f, g\rangle}{\|g\|^2}\,g\right\|^2 + \|h\|^2 \\ &\ge \frac{|\langle f, g\rangle|^2\,\|g\|^2}{\|g\|^4} \\ &= \frac{|\langle f, g\rangle|^2}{\|g\|^2}. \end{aligned}$$
Multiplying by $\|g\|^2$ and taking square roots yields the CSB inequality. Equality holds in the CSB inequality if and only if $h = 0$, which by the comment at the end of the preceding section implies that $f$ and $g$ are scalar multiples of one another.
There is an entirely different proof of the Cauchy-Schwarz inequality that is interesting in and of itself. We present it below:
Pf. #2. Let $f, g \in V$ and let $t \in \mathbb{R}$ be any real scalar. Furthermore, suppose that $f \ne 0$ and $g \ne 0$ to avoid any trivialities. Now observe that
$$p(t) = \|tf + g\|^2$$
is a real-valued function of the variable $t$ and that $p(t) \ge 0$ for all $t$. We can use the definition of the norm and some basic properties of inner products to derive an explicit formula for $p(t)$:
$$\begin{aligned} p(t) &= \|tf + g\|^2 \\ &= \langle tf + g, tf + g\rangle \\ &= \langle tf, tf\rangle + \langle tf, g\rangle + \langle g, tf\rangle + \langle g, g\rangle \\ &= t^2\langle f, f\rangle + 2t\langle f, g\rangle + \langle g, g\rangle \\ &= \|f\|^2 t^2 + 2\langle f, g\rangle t + \|g\|^2. \end{aligned}$$
We can rewrite this as
$$p(t) = at^2 + bt + c,$$
where $a = \|f\|^2 > 0$, $b = 2\langle f, g\rangle$, and $c = \|g\|^2$. The graph of $p(t)$ is a parabola which opens upward. Moreover, $p(t)$ is always nonnegative, so it cannot have two distinct real roots (it would be negative between them). Hence the discriminant is nonpositive:
$$b^2 - 4ac \le 0.$$
Substituting in for $a, b, c$ yields $4\langle f, g\rangle^2 \le 4\|f\|^2\|g\|^2$, which is the CSB inequality.
If equality holds in the CSB inequality, then $b^2 - 4ac = 0$, whence the quadratic polynomial $p(t)$ has a unique real root, say $t_0$. Thus $p(t_0) = \|t_0 f + g\|^2 = 0$, whence $t_0 f + g$ is the zero vector. In particular, this implies that $f$ and $g$ are scalar multiples of each other.
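Proof #2 can be watched in action: for concrete vectors $f, g \in \mathbb{R}^n$ with the standard inner product, compute the coefficients $a, b, c$ of $p(t)$ and the discriminant $b^2 - 4ac$, which should never be positive. This Python sketch uses randomly generated vectors, so it is a sampling check rather than a proof; the seed and dimensions are arbitrary.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def discriminant(f, g):
    """b^2 - 4ac for p(t) = ||t f + g||^2 = a t^2 + b t + c."""
    a, b, c = dot(f, f), 2 * dot(f, g), dot(g, g)
    return b * b - 4 * a * c

random.seed(1)
for _ in range(5):
    f = [random.uniform(-1, 1) for _ in range(4)]
    g = [random.uniform(-1, 1) for _ in range(4)]
    print(discriminant(f, g) <= 0)

# Equality case: g = -2 f, so p(t) = (t - 2)^2 ||f||^2 has the double root t0 = 2.
f = [1, 2, 3]
g = [-2 * x for x in f]
print(discriminant(f, g))   # 0
```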
Example G.6. Applying the CSB inequality to the inner product
$$\langle f, g\rangle = \int_a^b f(x)g(x)\,dx$$
on $C([a, b])$ yields the highly nontrivial inequality
$$\left|\int_a^b f(x)g(x)\,dx\right| \le \sqrt{\int_a^b |f(x)|^2\,dx}\,\sqrt{\int_a^b |g(x)|^2\,dx},$$
valid for all continuous functions $f, g$ on $[a, b]$. Try proving that directly!
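Example G.7. If $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are real numbers, then
$$\left|\sum_{i=1}^n x_i y_i\right|^2 \le \left(\sum_{i=1}^n i|x_i|^2\right)\left(\sum_{i=1}^n \frac{|y_i|^2}{i}\right).$$
Why? Because of the CSB inequality. Let
$$u = (x_1, \sqrt{2}\,x_2, \sqrt{3}\,x_3, \ldots, \sqrt{n}\,x_n), \qquad v = \left(y_1, \frac{y_2}{\sqrt{2}}, \frac{y_3}{\sqrt{3}}, \ldots, \frac{y_n}{\sqrt{n}}\right).$$
Then $\langle u, v\rangle = \sum_{i=1}^n x_i y_i$, $\|u\|^2 = \sum_{i=1}^n i|x_i|^2$, and $\|v\|^2 = \sum_{i=1}^n |y_i|^2/i$. Since $u, v \in \mathbb{R}^n$, we may use the CSB inequality for the standard inner product to get
$$|\langle u, v\rangle| \le \|u\|\|v\|,$$
which (when squared) yields exactly the strange inequality proposed above.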
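A quick randomized check of the inequality in Example G.7 follows, together with an equality case. Equality in CSB requires the rescaled vectors $(x_1, \sqrt{2}\,x_2, \ldots)$ and $(y_1, y_2/\sqrt{2}, \ldots)$ to be parallel, i.e. $y_i = \lambda\, i\, x_i$. This is a Python sketch; the random data and the function name are illustrative only.

```python
import random

def example_g7(xs, ys):
    """Return (lhs, rhs) for |sum x_i y_i|^2 <= (sum i x_i^2)(sum y_i^2 / i),
    with indices i = 1, ..., n."""
    n = len(xs)
    lhs = sum(x * y for x, y in zip(xs, ys)) ** 2
    rhs = (sum(i * x * x for i, x in zip(range(1, n + 1), xs))
           * sum(y * y / i for i, y in zip(range(1, n + 1), ys)))
    return lhs, rhs

random.seed(3)
for _ in range(5):
    xs = [random.uniform(-5, 5) for _ in range(6)]
    ys = [random.uniform(-5, 5) for _ in range(6)]
    lhs, rhs = example_g7(xs, ys)
    print(lhs <= rhs)

# Equality case: y_i = i * x_i makes the rescaled vectors parallel.
print(example_g7([1, 2, 3], [1, 4, 9]))   # (1296, 1296.0)
```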
G.6. The Triangle Inequality

We mentioned earlier (Theorem G.1) that whenever you have an inner product, you get a norm for free via the formula
$$\|u\| = \sqrt{\langle u, u\rangle}.$$
We proved that this proposed norm satisfies (i) and (ii) of the axioms for a norm, but we never showed that (iii), the Triangle Inequality, was satisfied.
A fundamental theorem in plane geometry says that the sum of the lengths of two sides of a triangle is always greater than the length of the third side. The following theorem generalizes this idea to inner product spaces:
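Lemma 9 (Triangle Inequality). Let $V$ be an inner product space. If $f, g \in V$, then
$$\|f + g\| \le \|f\| + \|g\|.$$
Equality holds if and only if $f$ and $g$ are nonnegative scalar multiples of each other.
Proof.
$$\begin{aligned} \|f + g\|^2 &= \langle f + g, f + g\rangle \\ &= \langle f, f\rangle + \langle f, g\rangle + \langle g, f\rangle + \langle g, g\rangle \\ &= \|f\|^2 + 2\langle f, g\rangle + \|g\|^2 \\ &\le \|f\|^2 + 2|\langle f, g\rangle| + \|g\|^2 \\ &\le \|f\|^2 + 2\|f\|\|g\| + \|g\|^2 \\ &= (\|f\| + \|g\|)^2, \end{aligned}$$
where the second inequality is the CSB inequality. Taking square roots of both sides yields the triangle inequality.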
In conclusion, we have the following relationships between vector spaces, normed vector spaces, and inner product spaces:
$$\{\text{inner product spaces}\} \subsetneq \{\text{normed vector spaces}\} \subsetneq \{\text{vector spaces}\}.$$
APPENDIX H
Covering Compactness
It turns out that there is a completely different approach to the concept of compactness.
These notes give a brief introduction to this viewpoint.
H.1. Covering Compactness
Definition. Let $(M, d)$ be a metric space and let $S \subseteq M$. We say that $S$ is covering compact if, whenever $S$ is contained in the union of a collection of open subsets of $M$, $S$ is contained in the union of a finite number of these open subsets.
This definition is frequently stated as:
S is covering compact if every open cover of S has a finite subcover.
Example H.1. Any finite set $S = \{x_1, x_2, \ldots, x_n\}$ in a metric space $(M, d)$ is covering compact. Let $\{A_\alpha\}_{\alpha \in I}$ be an open cover of $S$. In other words, $I$ is an index set¹ and for each $\alpha \in I$ we have an open subset $A_\alpha$ of $M$. Since
$$S \subseteq \bigcup_{\alpha \in I} A_\alpha,$$
it follows that each $x_i$ belongs to at least one of the $A_\alpha$. In other words, there exist $\alpha_1, \alpha_2, \ldots, \alpha_n \in I$ so that $x_i \in A_{\alpha_i}$ for $i = 1, 2, \ldots, n$. In particular,
$$S \subseteq \bigcup_{i=1}^n A_{\alpha_i}.$$
Thus the open cover $\{A_\alpha\}_{\alpha \in I}$ for $S$ can be refined to produce a subcover
$$\{A_{\alpha_1}, A_{\alpha_2}, \ldots, A_{\alpha_n}\}$$
of $S$ containing $n$ of the $A_\alpha$.
Example H.2. $(0, 1]$ is not covering compact since the open cover defined by
$$A_\epsilon = (\epsilon, 1 + \epsilon), \qquad \epsilon > 0,$$
(each $x \in (0, 1]$ lies in $A_\epsilon$ for any $\epsilon < x$) does not have a finite subcover which still covers $(0, 1]$. Indeed, take $n$ of the $A_\epsilon$:
$$A_{\epsilon_1}, A_{\epsilon_2}, \ldots, A_{\epsilon_n},$$
and note that
$$0 < x < \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\} \implies x \notin \bigcup_{i=1}^n A_{\epsilon_i}.$$
In other words, the union of any finite number of the $A_\epsilon$ excludes points of $(0, 1]$ which are sufficiently close to zero. Since there exists an open cover of $(0, 1]$ which cannot be refined to produce a finite subcover of $(0, 1]$, it follows that $(0, 1]$ is not covering compact.
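The argument in Example H.2 is constructive: given finitely many sets $A_{\epsilon_1}, \ldots, A_{\epsilon_n}$, the point $x = \tfrac{1}{2}\min\{\epsilon_1, \ldots, \epsilon_n, 1\}$ lies in $(0, 1]$ but in none of them. (Including 1 in the minimum is our small addition; it keeps $x \le \tfrac12$ even if every $\epsilon_i$ is large.) A Python sketch using exact rational arithmetic, with hypothetical helper names:

```python
from fractions import Fraction

def in_A(eps, x):
    """Membership in A_eps = (eps, 1 + eps)."""
    return eps < x < 1 + eps

def uncovered_point(epsilons):
    """Return a point of (0, 1] missed by the union of A_eps, eps in epsilons.
    Assumes every eps is positive."""
    x = min(min(epsilons), Fraction(1)) / 2   # 0 < x <= 1/2 and x < every eps
    assert 0 < x <= 1 and not any(in_A(e, x) for e in epsilons)
    return x

print(uncovered_point([Fraction(1, 3), Fraction(1, 10), Fraction(2, 7)]))  # 1/20
```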
¹The index set can be finite, countably infinite, or even uncountable; there are no restrictions.
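Example H.3. The subset
$$S = \{\tfrac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$$
of $\mathbb{R}$ is covering compact. If $\{A_\alpha\}_{\alpha \in I}$ is an open cover of $S$, then there exists some $\alpha_0 \in I$ so that $0 \in A_{\alpha_0}$. Since this set $A_{\alpha_0}$ is open, there exists $\epsilon > 0$ so that $B_\epsilon(0) \subseteq A_{\alpha_0}$. Since $\lim_{n\to\infty} \frac{1}{n} = 0$, there exists $N \in \mathbb{N}$ so that $n \ge N$ implies that $\frac{1}{n} < \epsilon$. In particular, all but finitely many of the $\frac{1}{n}$ belong to $A_{\alpha_0}$, and hence only finitely many other $A_\alpha$ are needed to cover the remaining points $1, \frac{1}{2}, \ldots, \frac{1}{N-1}$. Thus $S$ is covering compact.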
Example H.4. $\mathbb{N}$, regarded as a subset of $\mathbb{R}$, is not covering compact. Indeed, if $A_n = (n - 1, n + 1)$ for $n \in \mathbb{N}$, then clearly each $A_n$ is open and $\mathbb{N} \subseteq \bigcup_{n \in \mathbb{N}} A_n$. Nevertheless, there is no finite subcollection of the $A_n$ whose union contains all of $\mathbb{N}$, since each $A_n$ contains only one natural number (namely $n$).
H.2. Covering Compactness = Sequential Compactness
Covering compactness is a somewhat difficult property of a set to verify directly, since
it involves checking that every possible open cover of a set always reduces to a finite
subcover of that set. Fortunately, we have the following theorem:
Theorem H.1. Let $(M, d)$ be a metric space. A subset $S \subseteq M$ is compact if and only if $S$ is covering compact. In other words, the notions of sequential compactness² and covering compactness are equivalent.
Proof. Suppose toward a contradiction that $S$ is covering compact but not sequentially compact. Thus there exists a sequence $x_n$ in $S$ which has no subsequence that converges in $S$. In particular, $x_n$ assumes infinitely many distinct values, since otherwise $x_n$ would have a subsequence $x_{n_k}$ which is constant.
Therefore for each $a \in S$ there exists $\epsilon_a > 0$, which depends on $a$, such that $B_{\epsilon_a}(a)$ contains only finitely many of the distinct values of the sequence $x_n$ (otherwise we could extract a subsequence of $x_n$ converging to $a$). Therefore
$$\{B_{\epsilon_a}(a) : a \in S\}$$
forms an open cover of $S$. Since $S$ is covering compact, there exists a finite subcover
$$\{B_{\epsilon_1}(a_1), B_{\epsilon_2}(a_2), \ldots, B_{\epsilon_m}(a_m)\}$$
of $S$, where $\epsilon_i = \epsilon_{a_i}$. However, each $B_{\epsilon_i}(a_i)$ contains only finitely many of the values of the sequence $x_n$. On the other hand, since
$$S \subseteq \bigcup_{i=1}^m B_{\epsilon_i}(a_i),$$
it follows that the sequence $x_n$ assumes only finitely many values, a contradiction.
The proof that sequential compactness implies covering compactness is significantly
more difficult (it would take a couple pages) and is therefore omitted.

H.3. Total Boundedness
Definition. A set $S \subseteq M$ is totally bounded if for each $\epsilon > 0$ there exists a finite covering of $S$ by $\epsilon$-balls. In other words, $S$ is totally bounded if for each $\epsilon > 0$ there exist $x_1, x_2, \ldots, x_n \in M$ such that $S \subseteq \bigcup_{i=1}^n B_\epsilon(x_i)$.
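For example, the unit square $[0, 1]^2 \subseteq \mathbb{R}^2$ with the Euclidean metric is totally bounded: a grid of centers with spacing $\epsilon$ in each coordinate works, because every point of the square lies within $\epsilon\sqrt{2}/2 < \epsilon$ of some grid point. The Python sketch below builds such a finite $\epsilon$-net and spot-checks it on random points; the function names are hypothetical and the sampling is only a sanity check.

```python
import math
import random

def epsilon_net_unit_square(eps):
    """Finite list of centers whose eps-balls cover [0, 1]^2 (Euclidean metric).
    Grid points j*eps for j = 0, ..., ceil(1/eps) reach at least 1 in each axis."""
    k = math.ceil(1 / eps)
    return [(i * eps, j * eps) for i in range(k + 1) for j in range(k + 1)]

def is_covered(point, centers, eps):
    """True if point lies in some open ball B_eps(center)."""
    return any(math.dist(point, c) < eps for c in centers)

centers = epsilon_net_unit_square(0.25)
print(len(centers))   # 25 centers
random.seed(0)
print(all(is_covered((random.random(), random.random()), centers, 0.25)
          for _ in range(1000)))
```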
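Theorem H.2. Let $(M, d)$ be a metric space and let $S \subseteq M$. Consider the following conditions:
(i) $S$ is (sequentially) compact;
(ii) $S$ is covering compact;
(iii) $S$ is closed and totally bounded;
(iv) $S$ is complete and totally bounded.
Conditions (i), (ii), and (iv) are equivalent, and if $M$ is complete they are also equivalent to (iii). (Completeness of $M$ is needed here: in $M = \mathbb{Q}$, the set $S = \mathbb{Q} \cap [0, 1]$ is closed and totally bounded but not compact.) If $M = \mathbb{R}^n$ and $d$ is the Euclidean metric, then all four conditions are equivalent to $S$ being closed and bounded.
²What we have been referring to as compactness.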
There are essentially two totally different ways of looking at compactness. We have
chosen to use the sequential approach because it is somewhat more intuitive. The covering
approach is a little more abstract and difficult to motivate. Nevertheless, the concept of covering compactness is open to greater generalization. When one studies point-set topology
(typically in graduate school), one no longer considers metric spaces, but rather topological
spaces where open and closed sets exist, but there is no notion of distance. Consequently,
the notion of compactness one encounters there is actually covering compactness.
For each theorem about compactness which we proved using the sequential definition,
there is typically a corresponding proof which uses the covering definition. For example:
Theorem H.3. A continuous function on a compact metric space is uniformly continuous.
Pf. (via Covering Compactness). Let $(A, d_A)$ be compact, let $(B, d_B)$ be a metric space, and let $f : A \to B$ be continuous. Fix $\epsilon > 0$. For each $x \in A$ there exists a number $\delta(x) > 0$ so that
$$d_A(x, y) < \delta(x) \implies d_B(f(x), f(y)) < \tfrac{\epsilon}{2}.$$
The open balls $B_{\delta(x)/2}(x)$ form an open cover of $A$ since $x \in B_{\delta(x)/2}(x)$ for each $x$. Since $A$ is (covering) compact, there exist $x_1, x_2, \ldots, x_n \in A$ so that
$$A \subseteq \bigcup_{i=1}^n B_{\delta(x_i)/2}(x_i).$$
Now let
$$\delta = \min\left\{\frac{\delta(x_1)}{2}, \frac{\delta(x_2)}{2}, \ldots, \frac{\delta(x_n)}{2}\right\}.$$
Now suppose that $x, y \in A$ and $d_A(x, y) < \delta$, and let $x \in B_{\delta(x_i)/2}(x_i)$ for some $i \in \{1, 2, \ldots, n\}$. It follows that
$$d_A(x_i, y) \le d_A(x_i, x) + d_A(x, y) < \frac{\delta(x_i)}{2} + \frac{\delta(x_i)}{2} = \delta(x_i).$$
Therefore
$$d_A(x_i, y) < \delta(x_i) \qquad\text{and}\qquad d_A(x_i, x) < \delta(x_i),$$
which implies that
$$d_B(f(x_i), f(y)) < \tfrac{\epsilon}{2} \qquad\text{and}\qquad d_B(f(x), f(x_i)) < \tfrac{\epsilon}{2}.$$
Putting this all together, we have shown that $d_A(x, y) < \delta$ implies that
$$d_B(f(x), f(y)) \le d_B(f(x), f(x_i)) + d_B(f(x_i), f(y)) < \tfrac{\epsilon}{2} + \tfrac{\epsilon}{2} = \epsilon.$$
Since $\delta$ depends only upon $\epsilon$ (and not on $x$ or $y$), it follows that $f$ is uniformly continuous.
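Theorem H.4. Let $(M, d)$ be a metric space. If $A_n$ is a sequence of nonempty, compact subsets of $M$ such that
$$A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots,$$
then $A = \bigcap_{n=1}^\infty A_n$ is also compact and nonempty.³
Pf. (via Covering Compactness). We have already seen that the arbitrary intersection of compact sets is compact. Thus $A$ is a compact subset of $(M, d)$. We must now show that $A$ is nonempty.
Suppose toward a contradiction that $A = \bigcap_{n=1}^\infty A_n = \emptyset$. This implies that
$$M = \bigcup_{n=1}^\infty A_n^c,$$
and hence $\{A_n^c : n \in \mathbb{N}\}$ is an open cover of $M$ (each $A_n$ is compact, hence closed). In particular, it is an open cover of the compact set $A_1$, so it has a finite subcover of $A_1$. In other words, there exist
$$n_1 < n_2 < \cdots < n_m$$
so that
$$A_1 \subseteq A_{n_1}^c \cup A_{n_2}^c \cup \cdots \cup A_{n_m}^c.$$
Since
$$A_{n_1} \supseteq A_{n_2} \supseteq \cdots \supseteq A_{n_m},$$
it follows that
$$A_{n_1}^c \subseteq A_{n_2}^c \subseteq \cdots \subseteq A_{n_m}^c,$$
whence $A_1 \subseteq A_{n_m}^c$. Since $A_{n_m} \subseteq A_1$, this forces $A_{n_m} = \emptyset$, which is a contradiction.
³The important part of the theorem is the assertion that the intersection is nonempty!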
