
Lectures on Stochastic Processes

William G. Faris
November 8, 2001
Contents

1 Random walk
  1.1 Symmetric simple random walk
  1.2 Simple random walk
  1.3 Problems

2 Gambler's ruin and martingales
  2.1 Gambler's ruin: fair game
  2.2 Gambler's ruin: unfair game
  2.3 The dominated convergence theorem
  2.4 Martingales
  2.5 Problems

3 The single server queue
  3.1 A discrete time queue
  3.2 Emptying the queue
  3.3 The monotone convergence theorem
  3.4 The stationary distribution
  3.5 Problems

4 The branching process
  4.1 The branching model
  4.2 Explosion
  4.3 Extinction
  4.4 Problems

5 Markov chains
  5.1 The Markov property
  5.2 The strong Markov property
  5.3 Transient and recurrent states
  5.4 Recurrent classes
  5.5 Martingales
  5.6 The first passage time
  5.7 Problems

6 Markov chains: stationary distributions
  6.1 Stationary distributions
  6.2 Detailed balance
  6.3 Positive recurrence and stationary distributions
  6.4 The average time in a state
  6.5 Examples
  6.6 Convergence to a stationary distribution
  6.7 Problems

7 The Poisson process
  7.1 The Bernoulli process
  7.2 The Poisson process
  7.3 The Poisson paradox
  7.4 Combining Poisson processes
  7.5 Problems

8 Markov jump processes
  8.1 Jump rates
  8.2 Hitting probabilities
  8.3 Stationary distributions
  8.4 The branching process
  8.5 The N server queue
  8.6 The Ehrenfest chain
  8.7 Transition probabilities
  8.8 The embedded Markov chain
  8.9 Problems

9 The Wiener process
  9.1 The symmetric random walk
  9.2 The Wiener process
  9.3 Continuity and differentiability
  9.4 Stochastic integrals
  9.5 Equilibrium statistical mechanics
  9.6 The Einstein model of Brownian motion
  9.7 Problems

10 The Ornstein-Uhlenbeck process
  10.1 The velocity process
  10.2 The Ornstein-Uhlenbeck position process
  10.3 Stationary Gaussian Markov processes
  10.4 Problems

11 Diffusion and drift
  11.1 Stochastic differential equation
  11.2 Diffusion equations
  11.3 Stationary distributions
  11.4 Boundary conditions
  11.5 Martingales and hitting probabilities
  11.6 Problems

12 Stationary processes
  12.1 Mean and covariance
  12.2 Gaussian processes
  12.3 Mean and covariance of stationary processes
  12.4 Convolution
  12.5 Impulse response functions
  12.6 Problems

13 Spectral analysis
  13.1 Fourier transforms
  13.2 Convolution and Fourier transforms
  13.3 Smoothness and decay
  13.4 Some transforms
  13.5 Spectral densities
  13.6 Frequency response functions
  13.7 Causal response functions
  13.8 White noise
  13.9 1/f noise
  13.10 Problems
Chapter 1
Random walk
1.1 Symmetric simple random walk
Let X_0 = x and

    X_{n+1} = X_n + ξ_{n+1}.   (1.1)

The ξ_i are independent, identically distributed random variables such that
P[ξ_i = 1] = P[ξ_i = −1] = 1/2. The probabilities for this random walk also depend on
x, and we shall denote them by P_x. We can think of this as a fair gambling
game, where at each stage one either wins or loses a fixed amount.
Let T_y be the first time n ≥ 1 when X_n = y. Let ρ_{xy} = P_x[T_y < ∞] be the
probability that the walk starting at x ever gets to y at some future time.
First we show that ρ_{12} = 1. This says that in the fair game one is almost
sure to eventually get ahead by one unit. This follows from the following three
equations.
The first equation says that in the first step the walk either goes from 1 to
2 directly, or it goes from 1 to 0 and then must go from 0 to 2. Thus

    ρ_{12} = 1/2 + (1/2) ρ_{02}.   (1.2)

The second equation says that to go from 0 to 2, the walk has to go from
0 to 1 and then from 1 to 2. Furthermore, these two events are independent.
Thus

    ρ_{02} = ρ_{01} ρ_{12}.   (1.3)

The third equation says that ρ_{01} = ρ_{12}. This is obvious.
Thus ρ = ρ_{12} satisfies

    ρ = (1/2) ρ^2 + 1/2.   (1.4)

This is a quadratic equation that can also be written as ρ^2 − 2ρ + 1 = 0, or
(ρ − 1)^2 = 0. Its only solution is ρ = 1.
How long does it take, on the average, to get ahead by this amount? Let
m_{xy} = E_x[T_y], the expected time that the random walk takes to get to y,
starting at x. We have just seen that if x = 1, then T_2 < ∞ with probability
one. Let us do the same kind of computation for m_{12} = E_1[T_2].
The first equation says that in the first step the walk either goes from 1 to
2 directly, or it goes from 1 to 0 and then must go from 0 to 2. Thus

    m_{12} = (1/2)(1 + m_{02}) + (1/2) · 1.   (1.5)

The second equation says that to go from 0 to 2, the walk has to go from 0
to 1 and then from 1 to 2. Thus

    m_{02} = m_{01} + m_{12}.   (1.6)

The third equation says that m_{01} = m_{12}. This is obvious.
Thus m = m_{12} satisfies

    m = (1/2)(1 + 2m) + 1/2.   (1.7)

This is a linear equation that can also be written as m = 1 + m. Its only solution
is m = ∞. Thus we have seen that m_{01} = ∞.
At first this seems strange, but it is quite natural. In the fair game there is
a small probability of a bad losing streak. It takes a long time to recover from
the losing streak and eventually get ahead. Infinitely long, on the average.
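It is instructive to see this numerically. The following Monte Carlo sketch is my addition, not part of the original notes; the function name, step caps, and run count are arbitrary choices. The fraction of walks that reach +1 stays near one as the cap grows, while the empirical mean hitting time keeps growing without settling down, just as an infinite mean predicts.

import random

def time_to_get_ahead(cap):
    # Steps until the symmetric walk first reaches +1, or None if not within cap.
    x = 0
    for n in range(1, cap + 1):
        x += random.choice((-1, 1))
        if x == 1:
            return n
    return None

random.seed(0)
for cap in (10**2, 10**4, 10**6):
    times = [time_to_get_ahead(cap) for _ in range(1000)]
    hits = [t for t in times if t is not None]
    print("cap =", cap,
          " fraction reaching +1:", len(hits) / len(times),
          " mean time among them:", sum(hits) / len(hits))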
The calculation above uses a rather subtle property of random walk. Namely,
it was assumed that after the walk accomplishes the task of going from 0 to 1,
then it has an equally difficult task of going from 1 to 2. Now this is delicate,
because the walk going from 1 to 2 starts out at a random time, not at a fixed
time. Namely, if we start the walk at 0 and set T = T_1 as the first time the
walk gets to 1, then from that time on the walk is X_T, X_{T+1}, X_{T+2}, ... with
X_T = 1 by definition. The point is that this has the same distribution as the
walk X_0, X_1, X_2, ... with X_0 = 1. This fact is called the strong Markov property.
The strong Markov property has the following statement. Start the walk
out at x and let T = T_y. Let B be a set of random walk paths. We must prove
that

    P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B | T < ∞] = P_y[(X_0, X_1, X_2, ...) ∈ B].   (1.8)

This says that the walk does not care when and how it got to y for the first
time; from that point on it moves just as if it were started there at time zero.
The proof is the following. We write

    P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B, T < ∞] = Σ_{n=1}^∞ P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B, T = n].   (1.9)

However

    P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B, T = n] = P_x[(X_n, X_{n+1}, X_{n+2}, ...) ∈ B, T = n].   (1.10)
Now the event T = n depends only on the walk X_1, X_2, ..., X_n, which depends
only on the ξ_1, ξ_2, ..., ξ_n. On the other hand, the walk X_n, X_{n+1}, X_{n+2}, ... with
X_n = y depends only on the ξ_{n+1}, ξ_{n+2}, .... Thus the events are independent,
and we have
    P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B, T = n] = P_x[(X_n, X_{n+1}, X_{n+2}, ...) ∈ B] P[T = n],   (1.11)

where X_n = y. Finally, this is

    P_x[(X_T, X_{T+1}, X_{T+2}, ...) ∈ B, T = n] = P_y[(X_0, X_1, X_2, ...) ∈ B] P[T = n].   (1.12)

The proof is finished by summing over n.
The picture of symmetric simple random walk that emerges is the following.
Starting from any point, the probability of eventually getting to any other point
is one. However the expected time to accomplish this is infinite. Thus an
imbalance in one direction is always compensated, but this random process is
incredibly inefficient and can take a huge amount of time to do it.
1.2 Simple random walk
Let X_0 = x and

    X_{n+1} = X_n + ξ_{n+1}.   (1.13)

The ξ_i are independent, identically distributed random variables such that
P[ξ_i = 1] = p, P[ξ_i = −1] = q, and P[ξ_i = 0] = r. Here p + q + r = 1, so
the walk can change position by only one step at a time. The probabilities for
this random walk depend on x, and we shall denote them by P_x. We can think
of this as a gambling game, where at each stage one either wins or loses a fixed
amount.
Let T_y be the first time n ≥ 1 when X_n = y. Let ρ_{xy} = P_x[T_y < ∞] be the
probability that the walk starting at x ever gets to y at some future time.
First we calculate ρ_{12}. The calculation again uses three equations.
The first equation says that in the first step the walk either goes from 1 to 2
directly, or it stays at 1 and then must go to 2, or it goes from 1 to 0 and then
must go from 0 to 2. Thus

    ρ_{12} = q ρ_{02} + r ρ_{12} + p.   (1.14)

The second equation says that to go from 0 to 2, the walk has to go from
0 to 1 and then from 1 to 2. Furthermore, these two events are independent.
Thus

    ρ_{02} = ρ_{01} ρ_{12}.   (1.15)

The third equation says that ρ_{01} = ρ_{12}. This is obvious.
Thus ρ = ρ_{12} satisfies

    ρ = q ρ^2 + r ρ + p.   (1.16)

This is a quadratic equation that can also be written as q ρ^2 + (r − 1) ρ + p = 0,
or (ρ − 1)(q ρ − p) = 0. If q > 0 its solutions are ρ = 1 and ρ = p/q.
If q ≤ p, then it is clear that the probability ρ_{01} = 1, except in the case
p = q = 0. The game is even or favorable, so one is eventually ahead.
If p < q, then it would seem plausible that the probability is ρ_{01} = p/q.
The game is unfavorable, and there is some chance that there is an initial losing
streak from which one never recovers. There are a number of ways of seeing
that this is the correct root. If the root were one, then the process starting at
any number x ≤ 0 would be sure to eventually reach 1. However according to
the strong law of large numbers the walk satisfies X_n/n → p − q < 0 as n → ∞. Thus the
walk eventually becomes negative and stays negative. This is a contradiction.
How long does it take, on the average, to get ahead by this amount? Let
m_{xy} = E_x[T_y], the expected time that the random walk takes to get to y,
starting at x. We have just seen that if x = 1 and p ≥ q (not both zero),
then T_2 < ∞ with probability one. Let us do the same kind of computation for
m_{12} = E_1[T_2].
The first equation says that in the first step the walk either goes from 1 to
2 directly, or it remains at 1 and then goes to 2, or it goes from 1 to 0 and then
must go from 0 to 2. Thus

    m_{12} = q(1 + m_{02}) + r(1 + m_{12}) + p · 1.   (1.17)

The second equation says that to go from 0 to 2, the walk has to go from 0
to 1 and then from 1 to 2. Thus

    m_{02} = m_{01} + m_{12}.   (1.18)

Here we are implicitly using the strong Markov property.
The third equation says that m_{01} = m_{12}. This is obvious.
Thus m = m_{12} satisfies

    m = q(1 + 2m) + r(1 + m) + p.   (1.19)

This is a linear equation that can also be written as m = 1 + (1 + q − p)m. Its
solutions are m = ∞ and m = 1/(p − q).
If p ≤ q, then it is clear that the expectation m_{12} = ∞. The game is even
or unfavorable, so it can take a very long while to get ahead, if ever.
If q < p, then it would seem plausible that the expected number of steps to
get ahead by one is m_{12} = 1/(p − q). The game is favorable, so one should
win relatively soon, on the average. There are a number of ways of seeing that
this is the correct solution. One way would be to use the identity

    E_1[T_2] = Σ_{k=0}^∞ k P_1[T_2 = k] = Σ_{k=1}^∞ P_1[T_2 ≥ k].   (1.20)

Then one can check that if q < p, then the probabilities P_1[T_2 ≥ k] go rapidly
to zero, so the series converges. This rules out an infinite expectation.
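Both conclusions are easy to check numerically. The sketch below is my addition; the helper name and the step caps standing in for "forever" are arbitrary choices. For p > q every run gets ahead and the mean time is near 1/(p − q); for p < q the fraction of runs that ever get ahead is near p/q.

import random

def steps_to_get_ahead(p, q, cap):
    # Steps until the walk first gains one unit, or None if not within cap.
    x = 0
    for n in range(1, cap + 1):
        u = random.random()
        x += 1 if u < p else (-1 if u < p + q else 0)
        if x == 1:
            return n
    return None

random.seed(1)
p, q = 0.6, 0.3                 # favorable game, r = 0.1
times = [steps_to_get_ahead(p, q, 10**5) for _ in range(5000)]
times = [t for t in times if t is not None]
print("mean time:", sum(times) / len(times), " theory 1/(p - q):", 1 / (p - q))

p, q = 0.3, 0.6                 # unfavorable game
hits = sum(steps_to_get_ahead(p, q, 2000) is not None for _ in range(5000))
print("fraction ever ahead:", hits / 5000, " theory p/q:", p / q)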
1.3 Problems

1. For the symmetric simple random walk starting at zero, find the mean and
standard deviation of X_n. Find an approximate expression for P[X_n = y].
(Hint: If n is even, then y is even, and if n is odd, then y is odd. So for
fixed n the spacing between adjacent values of y is Δy = 2. Use the
central limit theorem, and replace the integral with a Riemann sum with
Δy = 2.)

2. For the simple random walk starting at zero, find the mean and standard
deviation of X_n.

3. For the simple random walk starting at 1, compute ρ_{11}, the probability of
a return to the starting point.

4. For the simple random walk starting at 1, compute m_{11}, the expected
time to return to the starting point.

5. A gambler plays roulette with the same stake on each play. This is simple
random walk with p = 9/19 and q = 10/19. The player has deep pockets,
but must win one unit. What is the probability that he does this in the
first three plays? What is the probability that he ever does this?
Chapter 2

Gambler's ruin and martingales

2.1 Gambler's ruin: fair game
Let X_0 = x and

    X_{n+1} = X_n + ξ_{n+1}.   (2.1)

The ξ_i are independent, identically distributed random variables such that
P[ξ_i = 1] = p, P[ξ_i = −1] = q, and P[ξ_i = 0] = r. Here p + q + r = 1, so
the walk can change position by only one step at a time. The probabilities for
this random walk depend on x, and we shall denote them by P_x.
We can think of this as a gambling game, where at each stage one either wins
or loses a fixed amount. However now we want to consider the more realistic
case when one wants to win a certain amount, but there is a cap on the possible
loss. Thus take a < x < b. The initial capital is x. We want to play until the
time T_b when the earnings achieve the desired level b or until the time T_a when
we are broke. Let T be the minimum of these two times. Thus T is the time
the game stops.
There are a number of ways to solve the problem. One of the most elegant
is the martingale method. A martingale is a fair game.
We first want to see when the game is fair. In this case the conditional
expectation of the winnings at the next stage is the present fortune. Thus

    E[X_{n+1} | X_n = z] = z.   (2.2)

However this just says that q(z − 1) + rz + p(z + 1) = z, or p = q.
It follows from this equation that

    E[X_{n+1}] = Σ_z E[X_{n+1} | X_n = z] P[X_n = z] = Σ_z z P[X_n = z] = E[X_n].   (2.3)

Therefore, starting at x we have

    E_x[X_n] = x.   (2.4)
The expected fortune stays constant.
Let T ∧ n be the minimum of T and n. Then X_{T∧n} is the game stopped at
T. It is easy to show that this too is a fair game. Either one has not yet won or
lost, and the calculation of the conditional expectation is as before, or one has
won or lost, and future gains and losses are zero. Therefore

    E_x[X_{T∧n}] = x.   (2.5)

Now the possible values of the stopped game are bounded: a ≤ X_{T∧n} ≤ b
for all n. Furthermore, since T < ∞ with probability one, the limit as n → ∞
of X_{T∧n} is X_T. It follows from the dominated convergence theorem (described
below) that

    E_x[X_T] = x.   (2.6)

Thus the game remains fair even in the limit of infinite time.
From this we see that

    a P_x[X_T = a] + b P_x[X_T = b] = x.   (2.7)

It is easy then to calculate, for instance, that the probability of winning is

    P_x[X_T = b] = (x − a)/(b − a).   (2.8)
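Formula (2.8) is easy to confirm by simulation. This sketch is my addition (the boundary and starting values are arbitrary); it plays the fair game p = q = 1/2, r = 0 until the fortune hits a or b.

import random

def ruin_game(x, a, b):
    # Play the fair game until the fortune reaches a or b; return the endpoint.
    while a < x < b:
        x += 1 if random.random() < 0.5 else -1
    return x

random.seed(2)
a, b, x, runs = 0, 10, 3, 20000
wins = sum(ruin_game(x, a, b) == b for _ in range(runs))
print("simulated P[win]:", wins / runs, " theory (x - a)/(b - a):", (x - a) / (b - a))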
2.2 Gambler's ruin: unfair game

If the original game is not fair, then we want to modify it to make it fair. The
new game will be that game whose accumulated winnings at stage n are f(X_n).
The conditional expectation of the winnings in the new game at the next stage
is the present fortune of this game. Thus

    E[f(X_{n+1}) | X_n = z] = f(z).   (2.9)

This says that q f(z − 1) + r f(z) + p f(z + 1) = f(z). One solution of this is to
take f to be a constant function. However this will not give a very interesting
game. Another solution is obtained by trying an exponential f(z) = a^z. This
gives q/a + r + pa = 1. If we take a = q/p this gives a solution. Thus in the
following we take

    f(X_n) = (q/p)^{X_n}   (2.10)

as the fair game. If p < q, so the original game is unfair, then the new
martingale game rewards being ahead greatly and penalizes being behind only
a little.
It follows from this equation that

    E[f(X_{n+1})] = Σ_z E[f(X_{n+1}) | X_n = z] P[X_n = z] = Σ_z f(z) P[X_n = z] = E[f(X_n)].   (2.11)

Therefore, starting at x we have

    E_x[f(X_n)] = f(x).   (2.12)
The expected fortune in the new martingale game stays constant.
Notice that to play this game, one has to arrange that the gain or loss at
stage n + 1 is

    f(X_{n+1}) − f(X_n) = ((q/p)^{ξ_{n+1}} − 1) f(X_n).   (2.13)

If, for instance, p < q, then the possible gain of ((q − p)/p) f(X_n) is greater than
the possible loss of ((q − p)/q) f(X_n); the multiplicative factor of q/p makes this
a fair game.
Let T ∧ n be the minimum of T and n. Then X_{T∧n} is the game stopped at
T. It is easy to show that this too is a fair game. Therefore the same calculation
shows that

    E_x[f(X_{T∧n})] = f(x).   (2.14)

Now if p < q and f(x) = (q/p)^x, we have that a ≤ x ≤ b implies f(a) ≤
f(x) ≤ f(b). It follows that f(a) ≤ f(X_{T∧n}) ≤ f(b) is bounded for all n with
bounds that do not depend on n. This justifies the passage to the limit as
n → ∞. This gives the result

    E_x[f(X_T)] = f(x).   (2.15)

The game is fair in the limit.
From this we see that

    (q/p)^a P_x[X_T = a] + (q/p)^b P_x[X_T = b] = (q/p)^x.   (2.16)

It is easy then to calculate, for instance, that the probability of winning is

    P_x[X_T = b] = ((q/p)^x − (q/p)^a) / ((q/p)^b − (q/p)^a).   (2.17)

If we take p < q, so that we have a losing game, then we can recover our
previous result for the probability of eventually getting from x to b in the random
walk by taking a = −∞. Then we get (q/p)^{x−b} = (p/q)^{b−x}.
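The same check works for the unfair game. The sketch below is my addition, with arbitrary parameter choices and r = 0; it compares formula (2.17) to a direct simulation.

import random

def ruin_game(x, a, b, p):
    # Play until the fortune reaches a or b; return the endpoint.
    while a < x < b:
        x += 1 if random.random() < p else -1
    return x

def win_prob(x, a, b, p):
    s = (1 - p) / p                       # s = q/p
    return (s**x - s**a) / (s**b - s**a)  # formula (2.17)

random.seed(3)
a, b, x, p, runs = 0, 10, 5, 0.45, 20000
wins = sum(ruin_game(x, a, b, p) == b for _ in range(runs))
print("simulated:", wins / runs, " formula (2.17):", win_prob(x, a, b, p))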
2.3 The dominated convergence theorem

The dominated convergence theorem gives a useful condition on when one can
take a limit inside an expectation.

Theorem 2.1 Let X_n be a sequence of random variables. Suppose that the
limit as n → ∞ of X_n is X, for each outcome in the sample space. Let Y ≥ 0
be a random variable with E[Y] < ∞ such that |X_n| ≤ Y for each outcome in
the sample space and for each n. Then the limit as n → ∞ of E[X_n] is E[X].

It is perhaps almost too trivial to remark, but it is a fact that a constant
M is a random variable with finite expectation. This gives the following special
case of the dominated convergence theorem. This special case is sometimes
called the bounded convergence theorem.

Corollary 2.1 Let X_n be a sequence of random variables. Suppose that the
limit as n → ∞ of X_n is X, for each outcome in the sample space. Let M ≥ 0
be a constant such that |X_n| ≤ M for each outcome in the sample space and for
each n. Then the limit as n → ∞ of E[X_n] is E[X].
2.4 Martingales

The general definition of a martingale is the following. We have a sequence
of random variables ξ_1, ξ_2, ξ_3, .... The random variable Z_n is a function of
ξ_1, ..., ξ_n, so that

    Z_n = h_n(ξ_1, ..., ξ_n).   (2.18)

The martingale condition is that

    E[Z_{n+1} | ξ_1 = a_1, ..., ξ_n = a_n] = h_n(a_1, ..., a_n).   (2.19)

Thus a martingale is a fair game. The expected value at the next stage, given
the present and past, is the present fortune.
It follows by a straightforward calculation that

    E[Z_{n+1}] = E[Z_n].   (2.20)

In particular E[Z_n] = E[Z_0].
The fundamental theorem about martingales says that if one applies a gambling
scheme to a martingale, then the new process is again a martingale. That
is, no gambling scheme can convert a fair game to a game where one has an
unfair advantage.

Theorem 2.2 Let ξ_1, ξ_2, ξ_3, ... be a sequence of random variables. Let X_0, X_1, X_2, ...
be a martingale defined in terms of these random variables, so that X_n is a function
of ξ_1, ..., ξ_n. Let W_n be a gambling scheme, that is, a function of ξ_1, ..., ξ_n.
Let Z_0, Z_1, Z_2, ... be a new process such that Z_{n+1} − Z_n = W_n (X_{n+1} − X_n).
Thus the gain in the new process is given by the gain in the original process
modified by the gambling scheme. Then Z_0, Z_1, Z_2, ... is also a martingale.
Proof: The condition that X_n is a martingale is that the expected gain
E[X_{n+1} − X_n | ξ_1 = a_1, ..., ξ_n = a_n] = 0. Let

    W_n = g_n(ξ_1, ..., ξ_n)   (2.21)

be the gambling scheme. Then

    E[Z_{n+1} − Z_n | ξ_1 = a_1, ..., ξ_n = a_n] = E[W_n (X_{n+1} − X_n) | ξ_1 = a_1, ..., ξ_n = a_n].   (2.22)

On the other hand, this is equal to

    g_n(a_1, ..., a_n) E[X_{n+1} − X_n | ξ_1 = a_1, ..., ξ_n = a_n] = 0.   (2.23)
Example: Say that X_n = ξ_1 + ξ_2 + ... + ξ_n is symmetric simple random walk.
Let the gambling scheme be W_n = 2^n. That is, at each stage one doubles the
amount of the bet. The resulting martingale is Z_n = ξ_1 + 2ξ_2 + 4ξ_3 + ... + 2^{n−1} ξ_n.
This game gets wilder and wilder, but it is always fair. Can one use this double
the bet game to make money? See below.
One important gambling scheme is to quit gambling once some goal is
achieved. Consider a martingale X_0, X_1, X_2, .... Let T be a time with the
property that the event T ≤ n is defined by ξ_1, ..., ξ_n. Such a time is called
a stopping time. Let the gambling scheme W_n be 1 if n < T and 0 if T ≤ n.
Then the stopped martingale is given by taking Z_0 = X_0 and Z_{n+1} − Z_n =
W_n (X_{n+1} − X_n). Thus if T ≤ n, Z_{n+1} − Z_n = 0, while if n < T then
Z_{n+1} − Z_n = X_{n+1} − X_n. As a consequence, if T ≤ n, then Z_n = X_T, while if
n < T, then Z_n = X_n. This may be summarized by saying that Z_n = X_{T∧n},
where T ∧ n is the minimum of T and n. In words: the process no longer changes
after time T.

Corollary 2.2 Let T be a stopping time. Then if X_n is a martingale, then so
is the stopped martingale Z_n = X_{T∧n}.

It might be, for instance, that T is the first time that the martingale X_n
belongs to some set. Such a time is a stopping time. Then the process X_{T∧n} is
also a martingale.
Example: A gambler wants to use the double the bet martingale Z_n =
ξ_1 + 2ξ_2 + 4ξ_3 + ... + 2^{n−1} ξ_n to get rich. The strategy is to stop when ahead.
The process Z_{T∧n} is also a martingale. This process will eventually win one
unit. However, unfortunately for the gambler, at any particular non-random
time n the stopped process is a fair game. Z_{T∧n} is either 1 with probability
1 − 1/2^n or 1 − 2^n with probability 1/2^n. It looks like an easy win, most of the
time. But a loss is a disaster.
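The distribution of the stopped game is easy to verify by simulation. This sketch is my addition (the names and the truncation n are arbitrary): it plays the double-the-bet scheme, quitting at the first win or after n losses, and checks that the payoff is 1 with probability 1 − 1/2^n, is 1 − 2^n otherwise, and has mean zero.

import random

def doubled_bets(n):
    # Payoff of Z_{T ^ n}: double the stake after each loss, quit when ahead.
    stake, total = 1, 0
    for _ in range(n):
        if random.random() < 0.5:
            return total + stake   # first win: the total is now exactly +1
        total -= stake
        stake *= 2
    return total                   # n straight losses: 1 - 2^n

random.seed(4)
n, runs = 10, 100000
payoffs = [doubled_bets(n) for _ in range(runs)]
print("mean payoff:", sum(payoffs) / runs, "(theory: 0)")
print("P[payoff = 1]:", payoffs.count(1) / runs, " theory:", 1 - 2**-n)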
If Z_n is a martingale, then the process Z_{T∧n} where one stops at the stopping
time T is also a martingale. However what if T < ∞ with probability one,
there is no limit on the time of play, and one plays until the stopping time?
Does the game remain fair in the limit of infinite time?

Theorem 2.3 If T < ∞ and if there is a random variable Y ≥ 0 with E[Y] < ∞
such that for all n we have |Z_{T∧n}| ≤ Y, then E[Z_{T∧n}] → E[Z_T] as n → ∞.

This theorem, of course, is just a corollary of the dominated convergence
theorem. In the most important special case the dominating function Y is just
a constant. This says that for a bounded martingale we can always pass to
the limit. In general the result can be false, as we shall see in the following
examples.
Example. Let X_n be symmetric simple random walk starting at zero. Then
X_n is a martingale. Let T be the first n with X_n = b > 0. The process
X_{T∧n} that stops when b is reached is a martingale. The stopped process is
a martingale. However the infinite time limit is not fair! In fact X_T = b by
definition. A fair game is converted into a favorable game. However this is only
possible because the game is unbounded below.
Example. Let X_n be simple random walk starting at x, not symmetric.
Then (q/p)^{X_n} is a martingale. Let b > x and T be the first n ≥ 1 with
X_n = b. Then (q/p)^{X_{T∧n}} is a martingale. If p < q, an unfavorable game,
then this martingale is bounded. Thus we can pass to the limit. This gives
(q/p)^x = (q/p)^b P[T_b < ∞]. On the other hand, if q < p, then the martingale
is badly unbounded. It is not legitimate to pass to the limit. And in fact
(q/p)^x ≠ (q/p)^b P[T_b < ∞] = (q/p)^b.
Example: Consider again the double the bet gambling game where one quits
when ahead by one unit. Here Z_{T∧n} = 1 with probability 1 − 1/2^n and Z_{T∧n} =
1 − 2^n with probability 1/2^n. Eventually the gambler will win, so Z_T = 1.
This limiting game is no longer a martingale; it is favorable to the gambler.
However one can win only by having unlimited credit. This is unrealistic in this
kind of game, since the losses along the way can be so huge.
2.5 Problems

1. HPS, Chapter 1, Exercise 25.

2. For the simple random walk with p + q + r = 1 and p ≠ q, show that X_n −
(p − q)n is a martingale. This is the game in which an angel compensates
the player for the average losses. Then let a < x < b and let T be the
first time the walk starting at x gets to a or b. Show that the stopped
martingale X_{T∧n} − (p − q)(T ∧ n) is a martingale. Finally, use E_x[T] < ∞
to show that E_x[X_T] = (p − q)E_x[T] + x. Compute E_x[T] explicitly.

3. For the symmetric simple random walk with p + q + r = 1 and p = q,
show that X_n^2 − (1 − r)n is a martingale. The average growth of X_n^2 is
compensated to make a fair game. Then let a < x < b and let T be the first
time the symmetric walk starting at x gets to a or b. Show that the stopped
martingale X_{T∧n}^2 − (1 − r)(T ∧ n) is a martingale. Finally, use E_x[T] < ∞ to
show that for the symmetric walk E_x[X_T^2] = (1 − r)E_x[T] + x^2. Compute
E_x[T] explicitly.
Chapter 3

The single server queue

3.1 A discrete time queue

Here the process is the following. Clients enter the queue. From time to time
a client is served, if any are present. Let X_n be the number of clients in the
queue just before service. Let W_n be the number of clients in the queue just
after service. Then

    W_n = (X_n − 1) ∨ 0   (3.1)

and

    X_{n+1} = W_n + ξ_{n+1},   (3.2)

where the ξ_{n+1} ≥ 0 are the numbers of new clients that enter. We shall assume
that these are independent random variables whose values are natural numbers.
Thus P[ξ_i = k] = f(k).
The queue may be described either in terms of X_n or in terms of W_n. In
the first description

    X_{n+1} = (X_n − 1) ∨ 0 + ξ_{n+1}.   (3.3)

In the second description

    W_{n+1} = (W_n + ξ_{n+1} − 1) ∨ 0.   (3.4)

We shall use the first description. This process behaves like a random walk
away from zero. Thus if X_n ≥ 1, then X_{n+1} = X_n + ξ_{n+1} − 1. Thus it can step
up by any amount, but it can step down by at most one. On the other hand,
when X_n = 0, then X_{n+1} = X_n + ξ_{n+1}. An empty queue can only step up in
size.
3.2 Emptying the queue

From now on we make the assumption that P[ξ_i ≥ 2] = 1 − f(0) − f(1) > 0. If
this is violated then the queue cannot possibly grow in size. We also make the
assumption that P[ξ_i = 0] = f(0) > 0. If this does not hold, then the queue
cannot decrease in size.
We want to see if the queue is sure to eventually empty. For this we need to
compute ρ_{x0} = P_x[T_0 < ∞] for x ≥ 1. First we note that

    ρ_{10} = f(0) + Σ_{y=1}^∞ f(y) ρ_{y0}.   (3.5)
This just says that at the first step something happens. The second observation
is that for x ≥ 1 we have

    ρ_{x0} = ρ_{10}^x.   (3.6)

This follows from the strong Markov property of random walk.
Let σ = ρ_{10}. We have shown that

    σ = Σ_{y=0}^∞ f(y) σ^y.   (3.7)

We can write this also as a fixed point equation

    σ = φ(σ),   (3.8)

where φ(t) is the generating function of the number ξ_i of new entrants to the queue.
This generating function has remarkable properties in the interval from 0
to 1. First, note that φ(0) = f(0) and φ(1) = 1. Furthermore, φ′(t) ≥ 0, so φ
is increasing from f(0) to 1. Furthermore, φ″(t) ≥ 0, so φ(t) is a convex
function. In fact, since there is some i with i ≥ 2 and f(i) > 0, it follows that
φ″(t) > 0 for 0 < t ≤ 1, so φ(t) is strictly convex. Finally, φ′(1) = μ, the
expected number of new entrants to the queue. All these properties give a fairly
complete picture of φ(t).
There are two cases. If μ ≤ 1, then φ(t) > t for 0 ≤ t < 1. So the only root
of φ(t) = t is σ = 1. This says that the queue with one customer eventually
becomes empty, with probability one. This is indeed what one would expect for
a random walk with mean step size μ − 1 ≤ 0.
If, on the other hand, μ > 1, then φ(t) = t has a second root σ < 1. This
must be the probability of the queue with one customer eventually becoming
empty. Then there is some probability 1 − σ > 0 that the queue never becomes
empty. This is indeed what one gets by using the strong law of large numbers
with the random walk with mean step size μ − 1 > 0. In fact, the queue will
then eventually grow at a linear rate.
Starting the queue from zero is exactly the same as starting it from 1. We
have therefore proved the following theorem.
Theorem 3.1 Consider the queue model with the above assumptions. Start the
queue with x customers. If the mean number μ of entrants to the queue between
service times satisfies μ ≤ 1, then the queue eventually empties, with probability one.
If this mean number is μ > 1, then the probability that the queue ever empties
in the future is σ^x < 1 for x ≥ 1 and is σ for x = 0, where σ is the unique root
of φ(t) = t that is less than one.
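The root σ is easy to compute numerically. The sketch below is my addition (the Poisson choice of entrant distribution is only an example): since φ is increasing and starts above the diagonal at 0, iterating t ↦ φ(t) from t = 0 converges upward to the smallest root of φ(t) = t, which is σ. Here φ(t) = e^{μ(t−1)} for Poisson(μ) entrants.

import math

def smallest_fixed_point(phi, iters=5000):
    t = 0.0
    for _ in range(iters):
        t = phi(t)     # monotone increase up to the smallest fixed point
    return t

for mu in (0.8, 1.0, 1.5, 2.0):
    phi = lambda t, m=mu: math.exp(m * (t - 1.0))   # Poisson(mu) generating function
    print("mu =", mu, " sigma =", round(smallest_fixed_point(phi), 6))

For μ ≤ 1 the output is the root σ = 1 (convergence is slow exactly at μ = 1), and for μ > 1 it is the root strictly below one.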
When the mean number of entrants to the queue between service times is
μ ≤ 1, then the queue eventually empties. Let us see how long it takes on the
average to empty. For this we need to compute m_{x0} = E_x[T_0] for x ≥ 1. First
we note that

    m_{10} = f(0) · 1 + Σ_{y=1}^∞ f(y)(1 + m_{y0}).   (3.9)

This just says that either one succeeds at the first step, or one has wasted one
unit of time. The second observation is that for x ≥ 1 we have

    m_{x0} = x m_{10}.   (3.10)

This follows from the strong Markov property of random walk.
Let m = m_{10}. We have shown that

    m = 1 + μm.   (3.11)

If μ ≥ 1, then this has only the solution m = ∞. However if μ < 1, then there
is also the solution m = 1/(1 − μ). This is the correct formula for the expected
time. To see this, one needs to show that the expected time for the random
walk to get from 1 to 0 is finite. This can be done in various ways, such as by
a martingale argument.
Here is a sketch of the martingale argument. Consider the martingale X_{T∧n} −
(μ − 1)(T ∧ n) starting at x ≥ 1. This is just the random walk stopped at the
time T when it first gets to zero, with a correction term that compensates for
the fact that the walk is not symmetric. Since it is a martingale, its expectation
is the initial value x. This says that E[X_{T∧n}] − (μ − 1)E[T ∧ n] = x. But
X_{T∧n} ≥ 0, and so its expectation is also greater than or equal to zero. This
says −(μ − 1)E[T ∧ n] ≤ x. Since μ < 1, this is equivalent to E[T ∧ n] ≤
x/(1 − μ). Let n → ∞. By the monotone convergence theorem (see below),
m = E[T] ≤ x/(1 − μ). So it cannot be infinite.
Theorem 3.2 Consider the queue model with the above assumptions. Start the
queue with x customers. If the mean number μ of customers entering the queue
between service times satisfies μ < 1, then the expected time until the queue is empty
again is equal to x/(1 − μ) for x ≥ 1 and is 1/(1 − μ) for x = 0.
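A direct simulation of the queue recursion agrees with this. The sketch below is my addition; the Poisson(μ) arrival law and all parameter values are arbitrary choices. It averages the time for the queue to empty from x customers.

import math, random

def poisson(mu):
    # Inverse-transform sampling of a Poisson(mu) variate.
    u, k = random.random(), 0
    p = math.exp(-mu)
    c = p
    while u > c:
        k += 1
        p *= mu / k
        c += p
    return k

def time_to_empty(x, mu, cap=10**6):
    # Service steps until the queue empties, for X_{n+1} = (X_n - 1) v 0 + xi.
    steps = 0
    while x > 0 and steps < cap:
        x = (x - 1) + poisson(mu)   # one service, then a batch of arrivals
        steps += 1
    return steps

random.seed(5)
x, mu, runs = 5, 0.7, 4000
mean = sum(time_to_empty(x, mu) for _ in range(runs)) / runs
print("mean time to empty:", mean, " theory x/(1 - mu):", x / (1 - mu))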
3.3 The monotone convergence theorem

Theorem 3.3 Let X_n ≥ 0 be a sequence of positive random variables such that
X_n ≤ X_{n+1} for each n. Assume that for each outcome X_n → X as n → ∞.
Then E[X_n] → E[X] as n → ∞.

The monotone convergence theorem would be nothing new if we knew that
the limiting function X had a finite expectation. In that case, we would have
0 ≤ X_n ≤ X, and so we could apply the dominated convergence theorem.
The point is that the theorem shows that if the X_n are positive and monotone
increasing and E[X_n] converges to a finite limit, then E[X] is finite and has this
limit as its value.
3.4 The stationary distribution

Consider the queue with μ < 1. The theorem says that the mean time for a
return to 0 is 1/(1 − μ). It would then seem reasonable that the proportion of
time that the queue is empty would be 1 − μ. In fact, for every state of the
queue there should be a stable proportion of time that the queue spends in this
state.
These numbers that predict the proportion of time in the state are probabilities
π(x). They are called the stationary distribution of the queue. The queue
is defined by the numbers

    f(y) = P[ξ_i = y].   (3.12)

The problem is to go from the probabilities f(y) to the probabilities π(x). This
should be set up in a way that works whenever μ < 1.
The general requirement for a stationary distribution is

    P[X_n = y] = P[X_{n+1} = y].   (3.13)

If we define π(y) = P[X_n = y], we obtain for the queue that

    π(y) = π(0) f(y) + Σ_{x=1}^{y+1} π(x) f(y − x + 1).   (3.14)

This equation may be solved recursively for π(y + 1) in terms of π(0), ..., π(y).
But it is not so easy to interpret the solutions. Therefore we transform it to
another form from which it is apparent, for instance, that the solutions are
positive.
Recall that the expected number of entrants to the queue is

    μ = E[ξ_i] = Σ_{y=0}^∞ y f(y).   (3.15)

Let

    g(z) = P[ξ_i ≥ z] = Σ_{y=z}^∞ f(y).   (3.16)

Recall the alternative formula for the mean:

    μ = Σ_{z≥1} g(z).   (3.17)
Take z ≥ 1. Sum the equation for the stationary distribution from 0 to z − 1.
This gives

    Σ_{y=0}^{z−1} π(y) = π(0) Σ_{y=0}^{z−1} f(y) + Σ_{y=0}^{z−1} Σ_{x=1}^{y+1} π(x) f(y − x + 1).   (3.18)

Interchange the order of summation. This gives

    Σ_{y=0}^{z−1} π(y) = π(0)(1 − g(z)) + Σ_{x=1}^{z} Σ_{y=x−1}^{z−1} π(x) f(y − x + 1).   (3.19)

This can be rewritten as

    Σ_{y=0}^{z−1} π(y) = π(0)(1 − g(z)) + Σ_{x=1}^{z} π(x)(1 − g(z − x + 1)).   (3.20)

This can be solved for π(z).
This can be solved for (z).
The nal equation for the stationary distribution is that for each z 1 we
have
(z)[1 g(1)] = (0)g(z) +
z1

x=1
(x)g(z x + 1). (3.21)
In more abstract language, this is a balance equation of the form
P[X
n+1
< z, X
n
z] = P[X
n+1
z, X
n
< z]. (3.22)
This equation says that the rate of transition from having z or more clients to
fewer than z clients is equal to the rate of transition from having fewer than
z clients to z or more clients. The left hand says that to make the transition
down you have to have exactly z clients and no new customers. The right hand
size says that either you have 0 clients and z or more new customers, or you
have x 1 clients, serve 1, leaving x 1 and then add z (x 1) or more new
customers.
These equations are easy to solve recursively. Suppose we know (0). We
compute (1)f(0) = (0)g(1), then (2)f(0) = (0)g(2) + (1)g(2), then
(3)g(0) = (0)g(3) +(1)g(3) +(2)g(4), and so on.
If the stationary probabilities are to exist, then π(0) must be chosen so
that the sum of the π(z) for all z = 0, 1, 2, 3, ... is 1. To see when this works,
sum the equation for z = 1, 2, 3, .... Then interchange the order of summation.
This gives the product of two independent sums on the right hand side. The
result is

    [1 − π(0)][1 − g(1)] = π(0) μ + [1 − π(0)][μ − g(1)].   (3.23)

It equates the rate of moving down by one from anywhere in the queue except
zero to the rate for moving up to anywhere in the queue except zero. This
simplifies to 1 − π(0) = μ, which can balance with π(0) > 0 only if
μ < 1. In fact, we see that

    π(0) = 1 − μ.   (3.24)

We have proved the following theorem.
Theorem 3.4 Consider the queue model with μ < 1. Then there is a stationary
distribution satisfying π(x) = P[X_n = x] = P[X_{n+1} = x]. It satisfies the balance
equation

    P[X_{n+1} < z, X_n ≥ z] = P[X_{n+1} ≥ z, X_n < z].   (3.25)

Consider the probability f(x) that exactly x clients enter between services and
the cumulative probability g(z) = Σ_{x≥z} f(x). The balance equation takes the
form

    π(z) f(0) = π(0) g(z) + Σ_{x=1}^{z−1} π(x) g(z − x + 1)   (3.26)

for z ≥ 1. Furthermore, π(0) = 1 − μ.
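The recursion of the theorem is immediate to run. The sketch below is my addition; it takes geometric arrivals f(k) = p q^k (as in Problem 3 below), where the answer should be π(z) = (1 − μ)μ^z, and checks the recursion against it.

def stationary(f, g, mu, n):
    # Solve pi(z) f(0) = pi(0) g(z) + sum_{x=1}^{z-1} pi(x) g(z - x + 1), pi(0) = 1 - mu.
    pi = [1.0 - mu]
    for z in range(1, n):
        s = pi[0] * g(z) + sum(pi[x] * g(z - x + 1) for x in range(1, z))
        pi.append(s / f(0))
    return pi

p = 0.6
q = 1.0 - p
mu = q / p                      # mean number of arrivals between services
f = lambda k: p * q**k          # P[xi = k]
g = lambda z: q**z              # P[xi >= z]
print([round(v, 6) for v in stationary(f, g, mu, 8)])
print([round((1 - mu) * mu**z, 6) for z in range(8)])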
3.5 Problems

1. Think of a queue process taking place in continuous time. Say that customers
are entering the queue randomly at rate λ_1 per second. Thus the
number of customers who enter the queue in a time period of length t
seconds is Poisson with mean λ_1 t. Say that the time between services is
exponential with rate λ_2 per second. Show that the number of customers
arriving in a time period between services is geometric with parameter
p = λ_2/(λ_1 + λ_2). Show that the expected number of customers arriving
is μ = q/p. Suppose that λ_2 > λ_1, so the service rate is faster than the
incoming customer rate. Show that it follows that p > 1/2, so μ < 1.

2. Now go to the discrete time model. Assume that the number of customers
arriving in a time period between services is geometric with parameter p.
Let q = 1 − p and μ = q/p. Suppose p ≤ 1/2, so that μ ≥ 1. Suppose
that there are x customers present initially. The manager has promised to
quit his job the next time that the queue empties. What is the probability
that this ever happens?

3. Continue with the discrete time model. Assume that the number of customers
arriving in a time period between services is geometric with parameter
p. Let q = 1 − p and μ = q/p. Suppose p > 1/2, so that μ < 1. Show
that the stationary distribution of the queue is geometric with parameter
1 − μ. Hint: Use g(z) = q g(z − 1) to show π(z) p = π(z − 1) q p + π(z − 1) q^2 =
π(z − 1) q.
4. Consider a stochastic process that satisfies detailed balance (time reversibility):

    P[X_{n+1} = y, X_n = x] = P[X_{n+1} = x, X_n = y].   (3.27)

Show that it has a stationary measure:

    P[X_{n+1} = y] = P[X_n = y].   (3.28)
Chapter 4

The branching process

4.1 The branching model

The process is the following. There are X_n individuals at time n. The ith
individual has a random number ξ_i^{(n+1)} of children. Here P[ξ_i^{(n+1)} = k] = f(k).
Thus

    X_{n+1} = ξ_1^{(n+1)} + ξ_2^{(n+1)} + ... + ξ_{X_n}^{(n+1)}.   (4.1)

Thus the next generation consists of all the children of the individuals of the
previous generation.
One could also write this in a more abstract notation as

    X_{n+1} = Σ_{i=1}^∞ ξ_i^{(n+1)} 1_{i ≤ X_n}.   (4.2)
4.2 Explosion
The first thing one can do to analyze the model is to look at expectations and
variances. This turns out to be interesting, but ultimately misleading, in that
it misses some of the essential stochastic features.
The first computation is that of the mean. Let μ = E[ξ_i^{(n+1)}] be the mean
number of children of an individual. Since the conditional expectation of the
number of children of z individuals is μz, we have

    E[X_{n+1} | X_n = z] = μz.   (4.3)

It follows that

    E[X_{n+1}] = Σ_z E[X_{n+1} | X_n = z] P[X_n = z] = Σ_z μz P[X_n = z] = μ E[X_n].   (4.4)

From this we see that

    E_x[X_n] = x μ^n.   (4.5)
If μ > 1, then the mean size of the population grows exponentially.
The second computation is that of the variance. Let σ^2 be the variance of
the number of children of an individual. Then the conditional variance of the
number of children of z individuals is σ^2 z. We compute

    E[(X_{n+1} − μz)^2 | X_n = z] = σ^2 z.   (4.6)

Write

    (X_{n+1} − μE[X_n])^2 = (X_{n+1} − μz)^2 + 2(X_{n+1} − μz)(μz − μE[X_n]) + (μz − μE[X_n])^2.   (4.7)

Take the conditional expectation of both sides, and note that the cross term has
conditional expectation zero. This gives

    E[(X_{n+1} − μE[X_n])^2 | X_n = z] = σ^2 z + μ^2 (z − E[X_n])^2.   (4.8)
However

    Var(X_{n+1}) = Σ_z E[(X_{n+1} − E[X_{n+1}])^2 | X_n = z] P[X_n = z].   (4.9)

It follows that

    Var(X_{n+1}) = σ^2 E[X_n] + μ^2 Var(X_n).   (4.10)

This equation says that the variance in the next generation is the mean size
of the current generation times the variance of the number of children plus
the variance of the current generation times the square of the mean number of
children.
One can solve this explicitly. The solution is

    Var(X_n) = x σ^2 μ^{n−1} (1 + μ + μ^2 + ... + μ^{n−1}).   (4.11)

If μ ≠ 1, this is a geometric series with sum

    Var(X_n) = x σ^2 μ^{n−1} (μ^n − 1)/(μ − 1).   (4.12)
Theorem 4.1 For the branching process the expected size of the nth generation
is

    E_x[X_n] = x μ^n   (4.13)

and the variance of the size of the nth generation is

    Var(X_n) = x σ^2 μ^{n−1} (μ^n − 1)/(μ − 1)   (4.14)

if μ ≠ 1, and

    Var(X_n) = x σ^2 n   (4.15)

if μ = 1. Furthermore, the random variable

    Z_n = X_n / μ^n   (4.16)

is a martingale, and if μ > 1 its variance

    Var(Z_n) = (x σ^2 / (μ(μ − 1))) (1 − 1/μ^n)   (4.17)

is bounded independent of n.
The fact that for μ > 1 the ratio Z_n = X_n/μ^n is a martingale with bounded
variance is important. It says that in the case of mean exponential growth the
variability of the actual size of the population relative to its expected size is
under control. There is a remarkable martingale theorem that says that under
these circumstances there is a random variable Z_∞ so that Z_n → Z_∞ as n → ∞.
Thus the asymptotic growth of the population is actually exponential, with a
random coefficient Z_∞ ≥ 0. As we shall now discuss in detail, with μ > 1 it will
happen that Z_∞ > 0 with strictly positive probability, but also Z_∞ = 0 with
strictly positive probability.
Why does the average size calculation miss the point? Because even if μ > 1,
so that the expected population size is undergoing exponential growth, there is
a possibility that the actual population size is zero. That is, there is a chance
that fairly soon the population becomes extinct. There is no way to recover
from this.
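A simulation sketch (my addition; the Poisson offspring law and all parameters are arbitrary choices) shows both behaviors at once: with μ = 1.5 a sizeable fraction of runs dies out, while on the surviving runs Z_n = X_n/μ^n is of order one.

import math, random

def poisson(mu):
    # Inverse-transform sampling of a Poisson(mu) variate.
    u, k = random.random(), 0
    p = math.exp(-mu)
    c = p
    while u > c:
        k += 1
        p *= mu / k
        c += p
    return k

def run_generations(x, mu, n):
    # Evolve the branching process for n generations from x individuals.
    for _ in range(n):
        x = sum(poisson(mu) for _ in range(x))
    return x

random.seed(6)
mu, n, runs = 1.5, 15, 1000
finals = [run_generations(1, mu, n) for _ in range(runs)]
print("fraction extinct by generation", n, ":", sum(x == 0 for x in finals) / runs)
survivors = [x / mu**n for x in finals if x > 0]
print("mean of Z_n over surviving runs:", sum(survivors) / len(survivors))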
4.3 Extinction
From now on we make the assumption that P[ξ_i = 1] = f(1) < 1. If this is violated
then each individual has exactly one child, and the number of individuals
remains constant. It turns out that under this hypothesis the branching process
is sure either to become extinct or to grow to infinity.
We want to see when the process will become extinct. For this we need to
compute ρ_{x0} = P_x[T_0 < ∞] for x ≥ 1. First we note that

    ρ_{10} = f(0) + Σ_{y=1}^∞ f(y) ρ_{y0}.   (4.18)

This just says that if there is one individual, then for the line to become extinct,
each child must have a line that becomes extinct. The second observation is
that for x ≥ 1 we have

    ρ_{x0} = ρ_{10}^x.   (4.19)

This is because the extinction events for the line of each child are independent.
Let σ = ρ_{10}. We have shown that

    σ = Σ_{y=0}^∞ f(y) σ^y.   (4.20)
We can write this also as a fixed point equation

    σ = φ(σ),   (4.21)

where φ(t) is the generating function of the number ξ_i of children of an individual.
This generating function has remarkable properties in the interval from 0 to
1. First, note that φ(0) = f(0) and φ(1) = 1. Furthermore, φ′(t) ≥ 0, so φ is
increasing from f(0) to 1. Furthermore, φ″(t) ≥ 0, so φ(t) is a convex function.
Finally, φ′(1) = μ, the expected number of children of an individual.
There are two cases. If μ ≤ 1, then φ(t) > t for 0 ≤ t < 1. So the only root
of φ(t) = t is σ = 1. This says that the descendants of the starting individual
eventually die out, with probability one. This is indeed what one would expect
from such pitiful reproductive capability.
If, on the other hand, μ > 1, then φ(t) = t has a second root σ < 1. This
root is unique. In fact, μ > 1 implies that there is some i ≥ 2 with f(i) > 0. It
follows that φ″(t) > 0 for 0 < t ≤ 1, so φ(t) is strictly convex.
This second root must be the probability that the line starting with one
individual eventually dies out. To see this, consider σ^{X_n}. This is a martingale,
so E_1[σ^{X_n}] = σ. As n → ∞, either X_n → ∞ or X_n is eventually zero. So
σ^{X_n} converges to the indicator function 1_A of the event A of eventual extinction.
By the bounded convergence theorem E_1[1_A] = σ. This says that P_1[A] = σ.
Theorem 4.2 Consider the branching model with the above assumptions. Start
with x individuals. If the mean number of children of an individual is μ ≤ 1,
then the line eventually dies out, with probability one. If this mean number of
children is μ > 1, then the probability that the line dies out is σ^x < 1 for each
x ≥ 1. Here σ is the unique root of φ(t) = t that is less than one.
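As a concrete check (my addition; the offspring law is chosen so the answer is exact), take f(0) = 1/4, f(1) = 1/4, f(2) = 1/2, so μ = 5/4 > 1. Then φ(t) = t is a quadratic whose root below one is exactly σ = 1/2, and bisection on φ(t) − t, which is positive to the left of σ and negative between σ and 1, recovers it.

def phi(t, f=(0.25, 0.25, 0.5)):
    # Generating function of the offspring distribution.
    return sum(fk * t**k for k, fk in enumerate(f))

def extinction_prob(phi, tol=1e-12):
    lo, hi = 0.0, 1.0 - 1e-9       # search strictly below the root at t = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) > mid:         # still left of sigma: move right
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print("sigma =", extinction_prob(phi))   # expect 0.5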
4.4 Problems
1. HPS, Chapter 1, Problem 32
2. HPS, Chapter 1, Problem 33
3. HPS, Chapter 1, Problem 34
Chapter 5

Markov chains

5.1 The Markov property

Let S be a countable set. We consider random variables X_n with values in S.
The official definition of Markov chain is through the following Markov property:

    P[X_{n+1} = b | X_0 = a_0, X_1 = a_1, ..., X_n = a_n] = P[X_{n+1} = b | X_n = a_n].   (5.1)

In other words, the next step in the future depends on the past and present only
through the present. We shall also assume that the process is time homogeneous,
so that the transition probabilities are always the same. This says that

    P[X_{n+1} = b | X_n = a] = P[X_1 = b | X_0 = a].   (5.2)
Let

    π(x) = P[X_0 = x]   (5.3)

be the initial probabilities. Let

    P[X_{n+1} = y | X_n = x] = P(x, y)   (5.4)

be the transition probabilities. If these data are the same, then this is regarded
as the same chain. In fact, the chain is often thought of as associated just with
the transition probabilities.
Let us look at the Markov property in a special case, to see how it works.
Say that we want to compute

    P[X_{n+2} = b_2, X_{n+1} = b_1 | X_n = a] = P[X_{n+2} = b_2 | X_{n+1} = b_1, X_n = a] P[X_{n+1} = b_1 | X_n = a].   (5.5)

We would be completely stuck unless we could use the Markov property

    P[X_{n+2} = b_2 | X_{n+1} = b_1, X_n = a] = P[X_{n+2} = b_2 | X_{n+1} = b_1].   (5.6)

Then we get

    P[X_{n+2} = b_2, X_{n+1} = b_1 | X_n = a] = P[X_{n+2} = b_2 | X_{n+1} = b_1] P[X_{n+1} = b_1 | X_n = a].   (5.7)

It follows by summing that

    P[X_{n+2} = b_2 | X_n = a] = Σ_{b_1} P[X_{n+2} = b_2 | X_{n+1} = b_1] P[X_{n+1} = b_1 | X_n = a].   (5.8)
There is a considerably more general statement of the Markov property. If
G is an arbitrary event defined in terms of the values of X_0, X_1, ..., X_n, then

    P[X_{n+1} = b_1, X_{n+2} = b_2, X_{n+3} = b_3, ... | G] = P_{X_n}[X_1 = b_1, X_2 = b_2, X_3 = b_3, ...].   (5.9)

It is just as if the process started all over at X_n; none of the history in G is
relevant, other than the value of X_n.
It is not too hard to show that

    P[X_{n+k} = y | X_n = x] = P^k(x, y),   (5.10)

where P^k is the kth matrix power of the transition probability matrix. Thus there
is a strong connection between Markov chains and linear algebra. We denote
the process starting with probability density π by P_π, so

    P_π[X_k = y] = Σ_x π(x) P^k(x, y).   (5.11)

In particular, if the process starts at x, then we write

    P_x[X_k = y] = P^k(x, y).   (5.12)
Theorem 5.1 Let S be a countable set. Let F be a function that sends a pair
(x, r) with x ∈ S and r a number to a value y = F(x, r) in S. Let ξ_1, ξ_2, ξ_3, ...
be a sequence of independent identically distributed random variables. Let X_0
be a random variable with values in S. Then the sequence of random variables
X_n with values in S defined by

    X_{n+1} = F(X_n, ξ_{n+1})   (5.13)

is a Markov chain.
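As an illustration (my addition; the geometric arrival law and its parameter are arbitrary), the queue of Chapter 3 fits this pattern: X_{n+1} = F(X_n, ξ_{n+1}) with ξ uniform on [0, 1], where F serves one client and inverts the uniform variable into a geometric(p) number of arrivals.

import random

def F(x, r, p=0.6):
    # One queue step driven by a uniform r: serve one client, then add
    # arrivals with P[xi = k] = p (1 - p)^k, obtained by inverting r.
    arrivals = 0
    c = term = p                 # running value of P[xi <= arrivals]
    while r > c:
        arrivals += 1
        term *= 1.0 - p
        c += term
    return max(x - 1, 0) + arrivals

random.seed(7)
x, path = 3, []
for _ in range(20):
    x = F(x, random.random())
    path.append(x)
print(path)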
5.2 The strong Markov property

The strong Markov property is an extension of the Markov property to a random
time. This time must be a stopping time. This means that the event that T = n
must be definable in terms of the values X_0, X_1, X_2, ..., X_n. In other words, to
know when you stop, it is sufficient to know the past.
A typical example is when T is the first time in the future that the process
arrives at some set. To know that T = n you only need to know that X_1, ..., X_{n−1}
are not in the set and that X_n is in the set.
The strong Markov property says that if G is any event defined in terms of
the value of T and of X_0, X_1, ..., X_T that implies T < ∞, then

    P[X_{T+1} = b_1, X_{T+2} = b_2, X_{T+3} = b_3, ... | G] = P_{X_T}[X_1 = b_1, X_2 = b_2, X_3 = b_3, ...].   (5.14)

It is just as if the process started all over at X_T. None of the history in G is
relevant.
Here is a special case:

    P[X_{T+1} = b_1, X_{T+2} = b_2, X_{T+3} = b_3, ... | T < ∞] = P_{X_T}[X_1 = b_1, X_2 = b_2, X_3 = b_3, ...].   (5.15)
Here is a typical application. Suppose the chain starting at x has the property
that it can get to z only by first passing through y. Then

    P_x[T_z < ∞] = P_x[T_z < ∞, T_y < ∞] = P_x[T_z < ∞ | T_y < ∞] P_x[T_y < ∞].   (5.16)

By the strong Markov property this is

    P_x[T_z < ∞] = P_y[T_z < ∞] P_x[T_y < ∞].   (5.17)

Introduce the notation ρ_{xz} = P_x[T_z < ∞]. Then this result says that if the chain
starting at x can get to z only by first passing through y, then ρ_{xz} = ρ_{xy} ρ_{yz}.
5.3 Transient and recurrent states

First we fix some notation. Consider a Markov chain starting at x. Let T_y be
the least n ≥ 1 such that X_n = y. If there is no such n, then T_y = ∞. The
hitting probability of y starting from x is

    ρ_{xy} = P_x[T_y < ∞].   (5.18)

Consider a Markov chain starting at x. Let N(y) be the number of n ≥ 1
such that X_n = y. We allow the possibility that N(y) = ∞.

Theorem 5.2 For each m ≥ 1

    P_x[N(y) ≥ m] = ρ_{xy} ρ_{yy}^{m−1}.   (5.19)

Corollary 5.1 If ρ_{yy} < 1, then

    P_x[N(y) = ∞] = 0.   (5.20)

Corollary 5.2 If ρ_{yy} = 1, then

    P_x[N(y) = ∞] = ρ_{xy}.   (5.21)

A state y of a Markov chain is transient if ρ_{yy} < 1 and recurrent if ρ_{yy} = 1.
From the previous corollaries, if the chain is started at a transient state, it returns
only finitely many times. If the chain is started at a recurrent state, it returns
infinitely many times.
5.4 Recurrent classes

We say that x leads to y if ρ_{xy} > 0.

Theorem 5.3

    E_x[N(y)] = Σ_{n=1}^∞ P^n(x, y).   (5.22)

Corollary 5.3 The state x leads to the state y if and only if there exists some
k ≥ 1 with P^k(x, y) > 0.

Proof: If P^k(x, y) > 0 for some k ≥ 1, then P_x[T_y ≤ k] > 0 and so ρ_{xy} > 0.
On the other hand, if P^k(x, y) = 0 for all k ≥ 1, then E_x[N(y)] = 0, so
N(y) = 0 almost surely. This can only happen if ρ_{xy} = 0.
Theorem 5.4 If x is recurrent and x leads to y, then y is recurrent and y leads
to x. Furthermore ρ_{xy} = ρ_{yx} = 1.

Proof: Suppose x is recurrent and x leads to y. Then ρ_{xx} = 1 and ρ_{xy} > 0.
Take the least n such that P_x[T_y = n] > 0. If the process gets from x to y in
time n and does not return to x, then the process has left x and never returned.
Thus P_x[T_y = n](1 − ρ_{yx}) ≤ 1 − ρ_{xx} = 0. This proves that ρ_{yx} = 1. In particular, y
leads to x.
Start the process at x. Let A_r be the event that the process visits y between
the rth and (r + 1)st visits to x (including the starting visit). Let A be the event
∪_r A_r, which is the same as the event T_y < ∞. Then P_x[A_r] = α is constant, by
the strong Markov property. Furthermore, α > 0. The reason is that if P_x[A_r] =
0 for each r, then P_x[A] = 0, and so ρ_{xy} = 0. It follows that P_x[A_r^c] = 1 − α < 1.
Since the events A_r are independent, P_x[A^c] = P_x[∩_r A_r^c] = Π_r P_x[A_r^c] = 0. We
conclude that ρ_{xy} = P_x[A] = 1.
We have proved that if x is recurrent and x leads to y, then ρ_{yx} = 1 and
ρ_{xy} = 1. In particular y leads to x. Since ρ_{yx} = 1 and ρ_{xy} = 1, it follows that
ρ_{yy} = 1. So y is recurrent.
Given a recurrent state, the set of all states that it leads to is called its
recurrent class. We have the following theorem.

Theorem 5.5 The states of a Markov chain consist of transient states and
states belonging to a disjoint collection of recurrent classes.

Let C be a recurrent class. Let x be a state. Let us write ρ_C(x) for the
probability that the process, starting from x, ever gets into the recurrent class
C. Of course once it gets there, it remains there.

Theorem 5.6 For each recurrent class C the hitting probability function ρ_C is
a function that satisfies the equation

    P ρ_C = ρ_C   (5.23)

with the boundary condition that ρ_C(y) = 1 for each y in C and ρ_C(x) = 0 for
each x belonging to another recurrent class C′ ≠ C.
This theorem shows how to find hitting probabilities of a recurrent class.
The idea is to solve the equation Pf = f with the boundary condition that f is
one on the class and 0 on all other classes. This is a system of linear equations.
The intuitive significance of the equation is exhibited by writing it as

    f(x) = Σ_y P(x, y) f(y).   (5.24)

This shows that one is looking at the hitting probability as a function of the
initial point and conditioning on the first step. Since this is using the initial point
as the variable, this is sometimes called the backward equation or backward
method.
In matrix language, one is finding a column vector that is an eigenvector of
P with eigenvalue 1. The dimension of this space of column vectors is at least
equal to the number of recurrent classes.
Remark: Say that one wants to find the hitting probability of some particular
state. Then change to a new chain in which this state is an absorbing state.
Then in this new chain one has a recurrent class with just one point. All the
theory above applies.
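Here is the backward method carried out as a linear system (my addition; the chain is the fair ruin walk on {0, 1, 2, 3, 4} with absorbing endpoints, so the answer is known to be f(x) = x/4 for the class {4}).

import numpy as np

n = 5
P = np.zeros((n, n))
P[0, 0] = P[n - 1, n - 1] = 1.0          # absorbing states 0 and 4
for x in range(1, n - 1):
    P[x, x - 1] = P[x, x + 1] = 0.5

# Unknowns are f at the interior states; f(x) = sum_y P(x, y) f(y) with the
# boundary values f(0) = 0 and f(4) = 1 folded into the right hand side.
interior = list(range(1, n - 1))
A = np.eye(len(interior)) - P[np.ix_(interior, interior)]
b = P[np.ix_(interior, [n - 1])].ravel()
f = np.linalg.solve(A, b)
print(f)                                  # expect [0.25, 0.5, 0.75]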
5.5 Martingales
If we have a Markov chain with transition probability matrix P, then a function
f with Pf = f is called a harmonic function.

Theorem 5.7 Let X_n be a Markov chain. Let f be a harmonic function on
the state space. Then Z_n = f(X_n) is a function of the Markov chain that is a
martingale.

Suppose that f is a bounded function satisfying Pf = f. Then we know from
the martingale property that f(x) = E_x[f(X_n)]. There is a general theorem that
a bounded martingale Z_n = f(X_n) converges. It is also true that a harmonic function
is constant on each recurrent class. Call its constant value on the recurrent
class C by the name f(C). Assume that with probability one the X_n eventually
enters a recurrent class. Then it is not hard to see that

    f(x) = Σ_C ρ_C(x) f(C).   (5.25)

This shows that in this case the bounded martingales that are functions of the
process are just linear combinations of the hitting probability functions. Thus
the martingale method gives nothing new.
On the other hand, from the point of view of interpretation the martingale method
is quite different. It is a kind of forward method, in that one starts at a fixed x
and follows the process forward in time, seeing where it eventually arrives.
There are examples where the martingale method is more general. Consider,
for instance, simple random walk with probability p of going up and q of going
down. Assume that p < q, so this is a losing game. Then the solution of Pf = f
is f(x) = (q/p)^x. Say that we are interested in winning an amount b. So we start
at x < b and stop the random walk when we get to b. This gives a Markov chain
X_n with b as an absorbing state. The martingale (q/p)^{X_n} is bounded, since it
can never be larger than (q/p)^b or smaller than zero. So it converges as n → ∞.
The equation we obtain is (q/p)^x = ρ_{xb} (q/p)^b + (1 − ρ_{xb}) (q/p)^{−∞}. The second
term is zero, so this is just (q/p)^x = ρ_{xb} (q/p)^b, a known equation. However
notice that the term that is zero is associated not with another recurrent class,
but with the process running off to −∞.
5.6 The first passage time

Consider a Markov chain. Let A be a set, and let x be a point not in A. Let T_A be the first time the chain is in the set. Then m_x = E_x[T_A] satisfies the equation m_x = \sum_y P(x, y)(1 + m_y), provided that we make the convention that m_y = 0 for y \in A. This can be stated without any reference to the values of m on A by writing it in the form

m_x = \sum_{y \notin A} P(x, y) m_y + 1, (5.26)

for x \notin A.

Let Q be the matrix P with the rows and columns corresponding to A taken out. Then

P_x[T_A > n] = \sum_y Q^n(x, y). (5.27)

In matrix language this is P_x[T_A > n] = Q^n 1(x), where Q^n is the matrix power and 1 is the column vector with all entries equal to 1. From this we see that the rate of decrease is bounded by a constant times the nth power of the largest eigenvalue of Q. Now we can calculate the expectation by summing over all n = 0, 1, 2, 3, .... This gives

m_x = E_x[T_A] = \sum_{n=0}^\infty \sum_y Q^n(x, y) = \sum_y (1 - Q)^{-1}(x, y). (5.28)

In matrix language this is m_x = (1 - Q)^{-1} 1(x). In other words, the column vector m satisfies m = Qm + 1. This is the same result as before.
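As a numerical illustration, the relation m = Qm + 1 can be solved directly as a linear system. The following minimal sketch (a made-up gambler's ruin example, not taken from the notes) computes the expected absorption times; the values of p and q are hypothetical.

    import numpy as np

    # Hypothetical gambler's ruin on {0,1,2,3,4} with absorbing set A = {0,4}.
    p, q = 0.4, 0.6  # made-up probabilities of stepping up and down

    # Q is P with the rows and columns for A removed (transient states 1,2,3).
    Q = np.array([[0.0, p,   0.0],
                  [q,   0.0, p  ],
                  [0.0, q,   0.0]])

    # Solve (1 - Q) m = 1, i.e. m = Q m + 1, rather than forming the inverse.
    m = np.linalg.solve(np.eye(3) - Q, np.ones(3))
    print(m)  # m_x = E_x[T_A] for x = 1, 2, 3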
5.7 Problems
1. HPS, Chapter 1, Problem 18
2. HPS, Chapter 1, Problem 19
3. HPS, Chapter 1, Problem 20
4. HPS, Chapter 1, Problem 22
Chapter 6
Markov chains: stationary
distributions
6.1 Stationary distributions

Recall that a Markov process is determined by a transition probability matrix P(x, y). It satisfies 0 \le P(x, y) \le 1 and

\sum_y P(x, y) = 1 (6.1)

for each x. Thus each row sums to one.

A stationary distribution is a probability mass function \pi(x) such that

\sum_x \pi(x) P(x, y) = \pi(y) (6.2)

for each y. Furthermore,

\sum_x \pi(x) = 1. (6.3)

This equation has a more probabilistic interpretation. Let \pi(y) = P[X_n = y]. It says that

\sum_x P[X_{n+1} = y | X_n = x] P[X_n = x] = P[X_n = y]. (6.4)

In other words,

P[X_{n+1} = y] = P[X_n = y]. (6.5)

The probabilities are stationary, or time invariant.

In matrix language the equation for a stationary distribution is \pi P = \pi. In other words, \pi is a row vector that is a left eigenvector of P with eigenvalue 1.
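Numerically, such a left eigenvector can be found from the transpose of P. A minimal sketch (the two-state matrix is a made-up example, not from the notes):

    import numpy as np

    # Hypothetical two-state transition matrix; rows sum to one.
    P = np.array([[0.9, 0.1],
                  [0.3, 0.7]])

    # Left eigenvectors of P are right eigenvectors of P transpose.
    w, v = np.linalg.eig(P.T)
    i = np.argmin(np.abs(w - 1.0))   # locate the eigenvalue 1
    pi = np.real(v[:, i])
    pi = pi / pi.sum()               # normalize to a probability vector
    print(pi)                        # here: [0.75, 0.25]
    print(pi @ P)                    # check: pi P = pi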
6.2 Detailed balance

The probability mass function \pi(x) is said to satisfy detailed balance if

\pi(x) P(x, y) = \pi(y) P(y, x). (6.6)

This equation may also be expressed in probabilistic language. It says that

P[X_{n+1} = y | X_n = x] P[X_n = x] = P[X_{n+1} = x | X_n = y] P[X_n = y]. (6.7)

This can be expressed even more simply as

P[X_{n+1} = y, X_n = x] = P[X_{n+1} = x, X_n = y]. (6.8)

From this it is clear that detailed balance is the same thing as time reversibility. There are exactly as many transitions from x to y as there are from y to x.

Theorem 6.1 If \pi is a probability mass function that satisfies detailed balance, then it is a stationary distribution.

Theorem 6.2 If \pi satisfies detailed balance and is strictly positive, then

P(x, y) P(y, z) P(z, x) = P(x, z) P(z, y) P(y, x). (6.9)

Theorem 6.3 If for some z the transition probability P(x, z) > 0 for all x, and if

P(x, y) P(y, z) P(z, x) = P(x, z) P(z, y) P(y, x), (6.10)

and if c > 0, then

\pi(x) = c \frac{P(z, x)}{P(x, z)} (6.11)

satisfies

\pi(x) P(x, y) = \pi(y) P(y, x). (6.12)

If this last theorem applies, and if the resulting \pi satisfies \sum_x \pi(x) < \infty, then the c may be chosen so that \pi is a stationary distribution.
6.3 Positive recurrence and stationary distributions

A recurrent state x of a Markov chain is positive recurrent if m_x = E_x[T_x] < \infty. It is null recurrent if m_x = E_x[T_x] = \infty.

Theorem 6.4 If x is positive recurrent, and if x leads to y, then y is positive recurrent.

According to this theorem, if one state in a recurrent class is positive recurrent, then all states in this class are positive recurrent. So we may speak of the class as being positive recurrent.
Theorem 6.5 If a recurrent class C is positive recurrent, then it has a stationary distribution \pi_C given by

\pi_C(x) = \frac{1}{m_x} (6.13)

for x in C.

Theorem 6.6 Consider a Markov chain with a collection of positive recurrent classes. For each recurrent class C there is a stationary distribution \pi_C. Let \alpha_C be coefficients with \alpha_C \ge 0 and with \sum_C \alpha_C = 1. Then

\pi(x) = \sum_C \alpha_C \pi_C(x) (6.14)

is a stationary distribution. All stationary distributions are of this form.

We begin to get the following picture of a Markov chain. There are a collection of recurrent classes. For each recurrent class there is a bounded function f that solves Pf = f and is one on the given class, zero on the others. For each positive recurrent class there is a stationary distribution \pi that solves \pi P = \pi and is concentrated on the class.

In the case when there are only finitely many states, every recurrent class is positive recurrent.
6.4 The average time in a state

Theorem 6.7 Consider a Markov chain starting at x. Let y be a positive recurrent state belonging to a positive recurrent class C. Let N_n(y) be the number of visits to y in times 1 up to n. Then

\lim_{n \to \infty} \frac{N_n(y)}{n} = 1_{\{T_y < \infty\}} \pi_C(y). (6.15)

Corollary 6.1 Consider a Markov chain starting at x. Let y be a positive recurrent state belonging to a positive recurrent class C. Then

\lim_{n \to \infty} \frac{\sum_{k=1}^n P^k(x, y)}{n} = \rho_C(x) \pi_C(y). (6.16)

In matrix language this result says that if all the recurrent classes C are positive recurrent, then

\lim_{n \to \infty} \frac{\sum_{k=1}^n P^k}{n} = \sum_C \rho_C \pi_C, (6.17)

where the \rho_C are column vectors and the \pi_C are row vectors.
6.5 Examples

Random walk. The random walk on the integers is transient except in the symmetric case, when it is null recurrent.

Gambler's ruin. The gambler's ruin is a random walk on the integers where the game stops when a fixed amount is won or lost. All states are transient except for the winning and losing states. These are each a positive recurrent class consisting of one point.

The single server queue. Let \mu be the expected number of clients between services. If \mu > 1 it is transient; the queue length goes to infinity. If \mu = 1 it is null recurrent; the queue empties, but less and less frequently. If \mu < 1 it is positive recurrent. In this last case there is a stationary distribution.

The branching process. There is one positive recurrent class consisting of zero individuals. All other states are transient. Let \mu be the expected number of children of an individual. If \mu > 1 there is a possibility of a population explosion. If \mu \le 1 there can only be extinction.
6.6 Convergence to a stationary distribution

The analysis of a Markov chain consists of two parts. If we start with a transient state, then it is interesting to see whether it reaches a recurrent class, and if so, which one. If it reaches a recurrent class, then it will stay there. So then the only question is how it spends its time.

If the class is a positive recurrent class, then the proportion of time in each state is given by the stationary distribution. We may as well consider the Markov chain to consist of just this one class. We always have that for each initial state x

\lim_{n \to \infty} \frac{\sum_{k=1}^n P^k(x, y)}{n} = \pi(y). (6.18)

However in some cases there is a better result. This result needs a hypothesis that says that the process is sufficiently irregular in time.

Say, for instance, that for each state there is some non-zero probability P(x, x) > 0 of remaining at that state. Then if there is some non-zero probability P^{n_0}(x, y) > 0 of getting to another state y in n_0 steps, then for each n \ge n_0 there is a non-zero probability P^n(x, y) > 0 of getting to y in n steps. This is just because one could have been delayed at x for the first n - n_0 time steps and then gotten to y in the remaining n_0 time steps.

Theorem 6.8 Suppose the Markov chain consists of a single positive recurrent class. Assume that for each pair of states x, y there exists a number n_0 such that for each n \ge n_0 we have

P^n(x, y) > 0. (6.19)
Then for each initial state x and final state z

\lim_{n \to \infty} P^n(x, z) = \pi(z). (6.20)

Proof: Let X_n be the Markov chain starting at x. Let Y_n be another copy of the chain starting at a random point with probability distribution \pi. Run the two chains independently. Then after n steps the probability distribution of X_n is P[X_n = z] = P^n(x, z) and the probability distribution of Y_n is P[Y_n = z] = \pi(z).

The doubled chain (X_n, Y_n) has a stationary distribution \pi(x)\pi(y), in which the two components are independent and each individually has the stationary distribution. Therefore each state is positive recurrent. In particular, each state is recurrent.

It follows from the hypotheses of the theorem that each state (x, y) of this chain leads to each other state (z, w). Let a be any state. In particular, each state (x, y) leads to the state (a, a) on the diagonal. That is, P_{(x,y)}[T_{(a,a)} < \infty] > 0.

Now the crucial point is that the fact that (x, y) leads to (a, a) with probability greater than zero implies that (x, y) leads to (a, a) with probability one. That is, P_{(x,y)}[T_{(a,a)} < \infty] = 1.

Let T be the first n \ge 1 such that X_n = Y_n. Then since T \le T_{(a,a)}, it follows that T < \infty with probability one.

By the strong Markov property, P[X_n = z | T \le n] = P[Y_n = z | T \le n]. It follows that P[X_n = z, T \le n] = P[Y_n = z, T \le n]. Thus we have

P[X_n = z] - P[Y_n = z] = P[X_n = z, n < T] - P[Y_n = z, n < T]. (6.21)

Now we can use the general inequality |P[A, C] - P[B, C]| \le P[C]. This gives

|P^n(x, z) - \pi(z)| = |P[X_n = z] - P[Y_n = z]| \le P[n < T]. (6.22)

This is the key equation. Since T < \infty with probability one, it follows that P[n < T] tends to zero as n \to \infty. Therefore the right hand side approaches zero.
This theorem says in matrix language that under suitable hypotheses

\lim_{n \to \infty} P^n = 1\pi, (6.23)

where the column vector 1 multiplies the row vector \pi to give a matrix.

The hypothesis of the theorem is intended to rule out some kind of periodicity that would be preserved over arbitrarily long times. A typical periodicity is where there is a division of the states into even states and odd states, and the process at each time moves from an even state to an odd one, or from an odd state to an even one.

The theorem by itself does not give a very good idea of how fast P[n < T] approaches zero. All it does is reduce the problem of approach to an equilibrium to a first passage problem. This first passage time is the time for the doubled process to enter the set of diagonal elements (z, z). Here is a simple result along this line.
Corollary 6.2 Consider a Markov chain consisting of one positive recurrent class with N < \infty states. Assume that there is a number k such that P^k(x, y) \ge \epsilon > 0 for every pair of states x, y. Then

|P^{nk}(x, z) - \pi(z)| \le (1 - N\epsilon^2)^n. (6.24)

Proof: Again let T be the hitting time for the diagonal. We estimate P[n < T]. Notice that from every initial state (x, y) and every final state (z, z) on the diagonal we have P^k(x, z) P^k(y, z) \ge \epsilon^2. Therefore, in every k steps we have a chance at least \sum_z P^k(x, z) P^k(y, z) \ge N\epsilon^2 of hitting the diagonal.
Let

Q = P - 1\pi. (6.25)

Then

Q^n = P^n - 1\pi. (6.26)

So the rate of convergence to the stationary distribution is determined by the eigenvalue \lambda of Q of largest absolute value. This is the eigenvalue of P of second largest absolute value. The difference 1 - |\lambda| is called the spectral gap. One of the most fundamental and difficult problems of Markov process theory is to estimate the spectral gap from below.

In the case of periodicity there can be eigenvalues of absolute value one other than the obvious eigenvalue 1. This is a nuisance with which one may not care to deal. However it is easy to destroy periodicity by changing the process to wait a random amount of time before jumping.
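For a small chain the spectral gap can be read off numerically: compute \pi, form Q = P - 1\pi, and take the largest eigenvalue modulus of Q. A minimal sketch (reusing the made-up two-state matrix from above):

    import numpy as np

    P = np.array([[0.9, 0.1],
                  [0.3, 0.7]])          # hypothetical example chain

    # Stationary distribution as the left eigenvector for eigenvalue 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()

    Q = P - np.outer(np.ones(2), pi)      # Q = P - 1 pi
    lam = max(abs(np.linalg.eigvals(Q)))  # second largest eigenvalue of P
    print("spectral gap:", 1 - lam)       # for this P: 1 - 0.6 = 0.4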
6.7 Problems

1. Consider a Markov chain consisting of one positive recurrent class. Assume that there is a number k and a state z such that P^k(x, z) \ge \epsilon > 0 for every state x. Show that

|P^{nk}(x, z) - \pi(z)| \le (1 - \epsilon^2)^n. (6.27)
2. HPS, Chapter 2, Problem 6
3. HPS, Chapter 2, Problem 19
4. HPS, Chapter 2, Problem 20
Chapter 7
The Poisson process
7.1 The Bernoulli process

It is tempting to introduce the Poisson process as a limiting case of a Bernoulli process. The Bernoulli process is just repeatedly flipping a coin that has a probability p of coming up heads.

Let time be divided into intervals of length \Delta t > 0. Thus we consider times \Delta t, 2\Delta t, \ldots, n\Delta t, \ldots. At the end of each interval there is an event that is a success with probability p = \lambda \Delta t and a failure with probability q = 1 - \lambda \Delta t. Let N(t) be the number of successes up to time t = n\Delta t. Let T_r be the time of the rth success.

Then N(t) has the binomial probabilities

P[N(t) = k] = \binom{n}{k} p^k (1 - p)^{n-k}. (7.1)

We can use p = \lambda t / n to write this in the form

P[N(t) = k] = \binom{n}{k} \frac{1}{n^k} (\lambda t)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k}. (7.2)

On the other hand, T_r has the negative binomial probabilities

P[T_r = t] = P[N(t - \Delta t) = r - 1] p = \binom{n-1}{r-1} p^{r-1} (1 - p)^{n-r} p. (7.3)

We can also write this as

P[T_r = t] = \binom{n-1}{r-1} \frac{1}{n^{r-1}} (\lambda t)^{r-1} \left(1 - \frac{\lambda t}{n}\right)^{n-r} \lambda \Delta t. (7.4)

In particular, T_1 has the (shifted) geometric probabilities

P[T_1 = t] = P[N(t - \Delta t) = 0] p = (1 - p)^{n-1} p = \left(1 - \frac{\lambda t}{n}\right)^{n-1} \lambda \Delta t. (7.5)
It is worth recalling that for this geometric distribution

P[T_1 > t] = P[N(t) = 0] = (1 - p)^n = \left(1 - \frac{\lambda t}{n}\right)^n. (7.6)
7.2 The Poisson process

The Poisson process N(t) is obtained by letting n \to \infty and \Delta t \to 0 with \lambda = p/\Delta t and t = n\Delta t fixed. The calculations use the fact that

\binom{n}{k} \frac{1}{n^k} \to \frac{1}{k!} (7.7)

as n \to \infty with k fixed. They also use the fact that

\left(1 - \frac{\lambda t}{n}\right)^n \to e^{-\lambda t} (7.8)

as n \to \infty with \lambda t fixed.

Let N(t) be the number of successes up to time t. Let T_r be the time of the rth success. Then N(t) has the Poisson probabilities

P[N(t) = k] = \frac{1}{k!} (\lambda t)^k e^{-\lambda t}. (7.9)

On the other hand, T_r has the gamma probability density

P[t < T_r < t + dt] = \frac{1}{(r-1)!} (\lambda t)^{r-1} e^{-\lambda t} \lambda \, dt. (7.10)

This is easy to remember, because

P[t < T_r < t + dt] = P[N(t) = r - 1] \lambda \, dt. (7.11)

In particular, T_1 has the exponential probability density

P[t < T_1 < t + dt] = \lambda e^{-\lambda t} \, dt. (7.12)

It is worth recalling that for this exponential distribution

P[T_1 > t] = P[N(t) = 0] = e^{-\lambda t}. (7.13)

The Poisson process has the Markov property

P[N(t + s) = k + m | N(t) = k] = P[N(s) = m]. (7.14)

If we take m = 0 and k = 0 we get

P[T_1 > t + s | T_1 > t] = P[T_1 > s]. (7.15)

This is the famous Markov property of exponential waiting times.

The two key properties of the Poisson process are the following:

Poisson increments. For each s < t the random variable N(t) - N(s) is Poisson with mean \lambda(t - s).

Independent increments. The increments N(t) - N(s) corresponding to disjoint intervals are independent.
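These properties also suggest a direct simulation recipe: generate independent exponential waiting times and take running sums as the jump times. A minimal sketch (the rate and horizon are made-up values):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, t_max = 2.0, 10.0          # hypothetical rate and time horizon

    # Waiting times between jumps are independent exponentials with rate lam.
    waits = rng.exponential(1.0 / lam, size=1000)
    jump_times = np.cumsum(waits)
    jump_times = jump_times[jump_times <= t_max]

    # N(t_max) should be Poisson with mean lam * t_max = 20.
    print("N(t_max) =", len(jump_times))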
7.3 The Poisson paradox

Now there is an interesting paradox. The waiting times between jumps in the Poisson process have the exponential distribution with parameter \lambda. Let a be a fixed number. Let T_a be the first T_r with T_r \ge a. So if r > 1 we have T_{r-1} < a \le T_r = T_a. Thus T_a - a is only part of the waiting time T_r - T_{r-1} associated with this r. Nevertheless, T_a - a has the exponential distribution.

To see this, compute

P[T_a > a + s] = \sum_{r=1}^\infty P[N(a) = r - 1, N(a + s) = r - 1]. (7.16)

Introducing conditional probability this becomes

P[T_a > a + s] = \sum_{r=1}^\infty P[N(a + s) = r - 1 | N(a) = r - 1] P[N(a) = r - 1]. (7.17)

However P[N(a + s) = r - 1 | N(a) = r - 1] = e^{-\lambda s}, by the Markov property. Thus

P[T_a > a + s] = e^{-\lambda s}. (7.18)

How can T_a - a have the exponential distribution, in spite of the fact that T_a - a is only part of the waiting time up to the jump time T_a? The resolution of the paradox is that the waiting interval that happens to bracket a will not have the exponential distribution; in fact it will tend to be longer than the average. The remarkable thing is that this does not depend on the particular time a.
7.4 Combining Poisson processes

Theorem 7.1 Say that N_1(t) is a Poisson process with rate \lambda_1, and N_2(t) is a Poisson process with rate \lambda_2. Say that these processes are independent. Let N(t) = N_1(t) + N_2(t). Then N(t) is a Poisson process with rate \lambda = \lambda_1 + \lambda_2.

This theorem can be proved in several ways. The first is to use independence to compute the distribution

P[N(t) = n] = \sum_{k=0}^n P[N_1(t) = k, N_2(t) = n - k] = \sum_{k=0}^n P[N_1(t) = k] P[N_2(t) = n - k]. (7.19)

This says that

P[N(t) = n] = \sum_{k=0}^n \frac{(\lambda_1 t)^k}{k!} e^{-\lambda_1 t} \frac{(\lambda_2 t)^{n-k}}{(n-k)!} e^{-\lambda_2 t}. (7.20)

In other words,

P[N(t) = n] = \frac{1}{n!} \sum_{k=0}^n \binom{n}{k} (\lambda_1 t)^k (\lambda_2 t)^{n-k} e^{-\lambda_1 t - \lambda_2 t}. (7.21)
It then follows from the binomial theorem that

P[N(t) = n] = \frac{1}{n!} (\lambda_1 t + \lambda_2 t)^n e^{-(\lambda_1 t + \lambda_2 t)}. (7.22)

The second proof is much simpler. Consider the waiting times between jumps. The probability that W_1 > t is e^{-\lambda_1 t}. The probability that W_2 > t is e^{-\lambda_2 t}. Let W be the minimum of the random variables. For the minimum to be larger than t, both individual times have to be larger than t. If they are independent, then this is the product of the probabilities. This gives the result P[W > t] = e^{-(\lambda_1 t + \lambda_2 t)}.

The third proof is simplest of all, though it could be criticized on the basis of rigor. Suppose that the probability of a jump in the first process in the time interval from t to t + dt is \lambda_1 dt. Similarly, suppose that the probability of a jump in the second process in the same time interval is \lambda_2 dt. If the jumps are independent, then the probability of two jumps in the time interval is \lambda_1 dt \lambda_2 dt, which is negligible. Therefore the probability that one or the other jumps is the sum \lambda_1 dt + \lambda_2 dt.
7.5 Problems
1. HPS, Chapter 3, Problem 3
2. HPS, Chapter 3, Problem 4
3. HPS, Chapter 3, Problem 5
Chapter 8
Markov jump processes
8.1 Jump rates

The idea of a Markov jump process is that there is a space of states. For every pair of states x \ne y there is a jump rate q_{xy} \ge 0. This is the probability per second of a jump from x to y. Thus q_{xy} dt is the conditional probability of a jump from x to y in time from t to t + dt, given that the particle is at x at time t.

Example: The Poisson process is an example. The state space consists of the natural numbers. The transition from x to x + 1 takes place at rate \lambda. All other transitions have rate zero.

Define

q_x = \sum_{y \ne x} q_{xy}. (8.1)

This is the probability per second of a jump from x to anywhere else. Thus q_x dt is the conditional probability of a jump from x in time from t to t + dt, given that the particle is at x at time t.

The time the particle remains at x is exponential with rate q_x. The expected time the particle remains at x is thus 1/q_x, measured in seconds.

Define

q_{xx} = -q_x. (8.2)

With this convention, there is a matrix q_{xy} defined for all pairs of states x, y. This matrix is called the generator of the jump process.

This matrix has the properties that all entries with x \ne y have q_{xy} \ge 0, while q_{xx} \le 0. Furthermore,

\sum_y q_{xy} = 0 (8.3)

for each x. The row sums are zero.
8.2 Hitting probabilities

Consider the process starting at x. Let T_y be the first time the particle gets to y after it has jumped from x. If x is not an absorbing state, then we set

\rho_{xy} = P_x[T_y < \infty]. (8.4)

On the other hand, if x is an absorbing state with q_x = 0, we set \rho_{xy} = 0 for x \ne y and \rho_{xx} = 1.

We say that x leads to y if \rho_{xy} > 0.

A state y is called recurrent if \rho_{yy} = 1 and transient if \rho_{yy} < 1.

Theorem 8.1 Let y be a recurrent state. Let \rho_{xy} be the probability of ever getting from x to y. Then

\sum_z q_{xz} \rho_{zy} = 0. (8.5)

The boundary conditions are that \rho_{yy} = 1 and \rho_{xy} = 0 if x does not lead to y. In matrix language the equation says that q\rho = 0. Here the generator q is a square matrix, and \rho is a column vector.

Proof: The probability of a transition from x in time dt that eventually leads to y is given by solving

\rho_{xy} = \sum_{z \ne x} q_{xz} \, dt \, \rho_{zy} + (1 - q_x \, dt) \rho_{xy}. (8.6)

This can also be written as an equality of the transition rates from x to y in the form

q_x \rho_{xy} = \sum_{z \ne x} q_{xz} \rho_{zy}. (8.7)

This translates to the matrix equation in the theorem.

Note: If y is not a recurrent state, then one can compute the probability of ever getting from x to y by considering a modified chain in which y has been converted to an absorbing state.
8.3 Stationary distributions

Theorem 8.2 The equation for a stationary distribution \pi is

\sum_x \pi(x) q_{xy} = 0. (8.8)

In matrix language the equation says that \pi q = 0. Here the generator q is a square matrix, and \pi is a row vector.

Proof: The rate of arriving at y is equal to the rate of leaving y. This says that

\sum_{x \ne y} \pi(x) q_{xy} = \pi(y) q_y. (8.9)
This translates to the matrix equation in the theorem.

We define m_x = E_x[T_x] and define a recurrent state to be positive recurrent if this is finite or if the state is absorbing.

Theorem 8.3 The stationary probability associated with a positive recurrent state (other than an absorbing state) is given by

\pi(x) = \frac{1/q_x}{m_x}. (8.10)

This theorem says that the probability of being at x is the ratio of the average time in seconds spent at x before a jump to the average total time in seconds that it takes to jump from x, wander around, and then return.

Theorem 8.4 If a distribution \pi satisfies detailed balance \pi(x) q_{xy} = \pi(y) q_{yx} for all pairs of states, then it satisfies the equation for a stationary distribution.

This equation says that the process is time reversible: The rate of transitions from x to y is the same as the rate from y to x.
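Numerically, \pi can be found as a null vector of the transpose of the generator. A minimal sketch (a hypothetical two-state process with made-up rates):

    import numpy as np

    # Generator for a hypothetical two-state jump process:
    # rate a from 0 to 1, rate b from 1 to 0. Row sums are zero.
    a, b = 1.0, 3.0
    q = np.array([[-a,  a],
                  [ b, -b]])

    # pi q = 0 means q^T pi^T = 0: take the null eigenvector of q^T.
    w, v = np.linalg.eig(q.T)
    pi = np.real(v[:, np.argmin(np.abs(w))])
    pi = pi / pi.sum()
    print(pi)          # [b/(a+b), a/(a+b)] = [0.75, 0.25]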
8.4 The branching process

The branching process involves a situation where there are x individuals at a given time. Each individual has a rate \lambda of splitting in two and a rate \mu of dying. So the birth rate is \lambda x and the death rate is \mu x. Clearly 0 is an absorbing state.

If \lambda \le \mu the process is sure to die out. On the other hand, if \lambda > \mu there is a chance of a population explosion. If there are x individuals, the probability that the line of an individual dies out is \rho_{x0} = \rho_{10}^x, since the lines are independent. The probability of extinction is given by solving

(\lambda + \mu) \rho_{10} = \lambda \rho_{20} + \mu (8.11)

and using \rho_{20} = \rho_{10}^2. This can be written as a quadratic equation

\lambda \rho_{10}^2 - (\lambda + \mu) \rho_{10} + \mu = 0. (8.12)

The solutions are \rho_{10} = \mu/\lambda and \rho_{10} = 1. The probability of extinction is the non-trivial root \mu/\lambda.
8.5 The N server queue

Here N \ge 1 is the number of servers. Customers come in at rate q_{x,x+1} = \lambda. If there are x \le N customers in the queue, each gets served, at the same average rate \mu. So the service rate is q_{x,x-1} = x\mu. If there are x > N customers, only the first N are being served. So the service rate is then q_{x,x-1} = N\mu.

If \lambda > N\mu, then the queue is badly overloaded, and there is some probability that the queue never empties. To see this, take x > N. We solve the equation
for \rho_{xN}. This will tell us that we are not almost sure to get to N, and this in turn will tell us that the chain is transient. This equation for N + 1 is

(\lambda + N\mu) \rho_{N+1,N} = \lambda \rho_{N+2,N} + N\mu. (8.13)

On the other hand, when x > N reducing the queue by two steps involves reducing it by one step and then by one more step. So \rho_{N+2,N} = \rho_{N+1,N}^2. So we have a quadratic equation for \rho = \rho_{N+1,N} of the form \lambda \rho^2 - (\lambda + N\mu) \rho + N\mu = 0. The solutions are 1 and \rho = N\mu/\lambda. The second root is the relevant one, and it is less than one.

On the other hand, if \lambda < N\mu, the service can keep up and there should be a stationary distribution. The detailed balance equation is

\pi(x + 1)(x + 1)\mu = \pi(x)\lambda (8.14)

for x + 1 \le N and

\pi(x + 1) N\mu = \pi(x)\lambda (8.15)

for x + 1 > N. From the second equation we see that we get a convergent series precisely when \lambda/(N\mu) < 1.
8.6 The Ehrenfest chain

The Ehrenfest chain consists of a process on the d + 1 points 0, 1, 2, 3, ..., d. The idea is that there are d particles, some in box 1 and some in box 0. The state of the chain is the number of particles in box 1.

The transition rate from x to x + 1 is (d - x)\lambda. This is because if there are x particles in box 1 and d - x particles in box 0, each of the d - x particles has the same chance of making the jump. Each particle jumps at rate \lambda. So the total jump rate is (d - x)\lambda.

The transition rate from x + 1 to x is (x + 1)\mu. This is because if there are x + 1 particles in box 1 and d - x - 1 particles in box 0, each of the x + 1 particles has the same chance of making the jump. Each particle jumps at rate \mu. So the total jump rate is (x + 1)\mu. The Ehrenfest chain actually corresponds to the case when \lambda = \mu, but it is no trouble to handle the extra generality.

The detailed balance equation then says that

\pi(x + 1)(x + 1)\mu = \pi(x)(d - x)\lambda. (8.16)

This equation can be solved for all the \pi(x) in terms of \pi(0). The result is that \pi(x) = \binom{d}{x} (\lambda/\mu)^x \pi(0). The conclusion is that the stationary distribution

\pi(x) = \binom{d}{x} \left(\frac{\lambda}{\lambda + \mu}\right)^x \left(\frac{\mu}{\lambda + \mu}\right)^{d-x} (8.17)

is binomial.

There is also a microscopic view of the Ehrenfest chain. First we consider a rather special two state Markov jump process. Its states are 0 and 1. The transition rate from 0 to 1 is \lambda > 0, and the transition rate from 1 to 0 is \mu > 0. Clearly the stationary measure for such a process satisfies the detailed balance condition \pi(0)\lambda = \pi(1)\mu. This implies that the stationary distribution assigns probability \pi(1) = \lambda/(\lambda + \mu) to 1 and \pi(0) = 1 - \pi(1) = \mu/(\lambda + \mu) to 0.

The microscopic Ehrenfest chain consists of d two state Markov jump processes. We think of each chain as describing what one particle does. This chain has a total of 2^d states. Its stationary distribution is clearly the independent distribution on the 2^d states, assigning probability \lambda/(\lambda + \mu) to an individual particle being in state 1 rather than 0.

The previous macroscopic Ehrenfest chain is obtained by counting the number of the individual particles that are in state one. In the stationary distribution this is the number of successes in d trials, where the probability of success on each trial is \lambda/(\lambda + \mu). So it must be binomial.
8.7 Transition probabilities

The transition probabilities

P_{xy}(t) = P_x[X(t) = y] (8.18)

may in principle be computed from the jump rates.

One method is to use the backward equation

\frac{d}{dt} P_{xy}(t) = \sum_z q_{xz} P_{zy}(t). (8.19)

This equation says that to go from x to y in time t one can jump in the first interval dt to z and then proceed to y, or one can not jump at all in this first interval. In matrix language this equation is

\frac{d}{dt} P(t) = q P(t). (8.20)

Another method is to use the forward equation

\frac{d}{dt} P_{xy}(t) = \sum_z P_{xz}(t) q_{zy}. (8.21)

This equation says that to go from x to y in time t one can jump to z just before t and then jump in the last interval dt to y, or one can already be at y and remain there. In matrix language this equation is

\frac{d}{dt} P(t) = P(t) q. (8.22)
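Both matrix equations are solved by the matrix exponential P(t) = e^{tq}, so for a finite state space the transition probabilities can be computed directly. A minimal sketch (reusing the hypothetical two-state generator from the earlier sketch):

    import numpy as np
    from scipy.linalg import expm

    a, b = 1.0, 3.0
    q = np.array([[-a,  a],
                  [ b, -b]])

    t = 0.5
    P_t = expm(t * q)          # P(t) = e^{tq} solves both equations
    print(P_t)
    print(P_t.sum(axis=1))     # rows sum to one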
Example: Let us solve the forward equation for the Poisson process. In this case all the transitions are from x to x + 1 with the same rate \lambda. The forward equation says that for x \le y

\frac{d}{dt} P_{xy}(t) = \lambda P_{x,y-1}(t) - \lambda P_{xy}(t). (8.23)
When y = x the first term is not present. The initial condition is P_{xx}(0) = 1 and P_{xy}(0) = 0 for all y \ne x.

To solve this equation, start with the case y = x. The solution is obviously P_{xx}(t) = e^{-\lambda t}. Then it is easy to show by induction as we consider y = x + 1, x + 2, x + 3, \ldots that for x \le y

P_{xy}(t) = \frac{(\lambda t)^{y-x}}{(y - x)!} \exp(-\lambda t). (8.24)
8.8 The embedded Markov chain

The picture that emerges of a jump process is that a particle stays at x for some random time with exponential distribution with parameter q_x. If q_x > 0, then eventually it makes a transition to some other state y. To which state? The conditional probability that it makes a jump to state y, given that it has decided to jump, is q_{xy}/q_x. So it makes the jump according to this law. It then continues the process of waiting and jumping.

This leads to a formal definition of a Markov chain associated with the jump process. The conditional probability of a jump to y, given that there is a jump from x in time interval dt, is

Q_{xy} = \frac{q_{xy}}{q_x} (8.25)

for y \ne x. If q_x > 0 we define Q_{xx} = 0. If q_x = 0, so that we have an absorbing state, then we define Q_{xx} = 1 and Q_{xy} = 0 for y \ne x. The matrix Q is the matrix of transition probabilities of a Markov chain. This is called the embedded Markov chain of the jump process.

The hitting probabilities \rho_{xy} may be calculated either with the jump process or with the embedded Markov chain. These give the same results. The reason is they are the probabilities of ever getting to a final state. Thus they do not depend at all on the timing.

The equation for the stationary distribution, on the other hand, must be expressed in terms of the jump rates. The probability \pi(x) of being at a point depends not only on where the particle wanders, but on how long on the average it lingers at the state.
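This waiting-and-jumping description is also a simulation recipe: hold at x for an exponential time with rate q_x, then jump according to the row Q_{x.} of the embedded chain. A minimal sketch (the three-state generator is a made-up example); the long-run fraction of time in each state approximates \pi:

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical three-state generator; off-diagonal rates, zero row sums.
    q = np.array([[-2.0,  1.0,  1.0],
                  [ 0.5, -0.5,  0.0],
                  [ 1.0,  2.0, -3.0]])

    x, t, t_max = 0, 0.0, 1000.0
    time_in_state = np.zeros(3)
    while True:
        rate = -q[x, x]                     # q_x, the total jump rate
        hold = rng.exponential(1.0 / rate)  # exponential holding time
        if t + hold > t_max:
            time_in_state[x] += t_max - t
            break
        time_in_state[x] += hold
        t += hold
        Q_row = q[x] / rate                 # embedded chain probabilities
        Q_row[x] = 0.0
        x = rng.choice(3, p=Q_row)

    print(time_in_state / t_max)            # approximates pi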
8.9 Problems
1. HPS, Chapter 3, Problem 10
2. HPS, Chapter 3, Problem 12
3. HPS, Chapter 3, Problem 13
4. HPS, Chapter 3, Problem 21
Chapter 9
The Wiener process
9.1 The symmetric random walk

It is tempting to introduce the Wiener process as a limit of a symmetric random walk process. This is just repeatedly flipping a fair coin and keeping track of the imbalance between heads and tails.

Let time be divided into intervals of length \Delta t > 0. Let space be divided into intervals of length

\Delta x = \sigma \sqrt{\Delta t}. (9.1)

Here \sigma > 0 is a positive constant that measures the amount of diffusion. Often \sigma^2/2 is called the diffusion constant. For each interval there is a random variable \xi_i that is \pm 1 with probabilities one half for either sign. These are thus identically distributed random variables with mean zero and variance one.

We take the random walk process to be

W_t = \xi_1 \Delta x + \xi_2 \Delta x + \cdots + \xi_n \Delta x (9.2)

where t = n\Delta t. Then W_t has mean zero and variance n(\Delta x)^2 = \sigma^2 t. Then W_t has the binomial probabilities associated to (n + k)/2 positive steps and (n - k)/2 negative steps:

P[W_t = x] = \binom{n}{(n + k)/2} \left(\frac{1}{2}\right)^n (9.3)

where x = k\Delta x.
9.2 The Wiener process

The Wiener process is obtained by letting n \to \infty and \Delta t \to 0 with n\Delta t = t fixed. Thus W(t) has mean zero and variance \sigma^2 t and is Gaussian.

Now this argument does not prove that there is such a mathematical object as the Wiener process. But Wiener proved that it exists, and while the proof is somewhat technical, there are many versions now available that are not so terribly difficult.

Without going into the construction of the process, we can write properties that characterize it.

The two key properties of the Wiener process are:

Gaussian increments. For each s < t the random variable W(t) - W(s) is Gaussian with mean zero and variance \sigma^2(t - s).

Independent increments. The increments W(t) - W(s) corresponding to disjoint intervals are independent.

It is often convenient to specify that the Wiener process has a particular value at t = 0, for instance W(0) = 0. Sometimes we may specify another value. It is also convenient to think of the Wiener process as defined for all real t. This can be arranged by imagining another Wiener process going backward in time and joining the two at time zero.
9.3 Continuity and differentiability

Wiener proved that the process can be chosen so that the function that sends t to W(t) is continuous with probability one. This is another property that should be assumed.

The continuity becomes less obvious when one realizes that the Wiener process is not differentiable. This is made plausible by the following remark.

Theorem 9.1 Let W(t) be the Wiener process. Let h > 0. Then the variance of the difference quotient is

\mathrm{Var}\left(\frac{W(t + h) - W(t)}{h}\right) = \frac{\sigma^2}{h}. (9.4)

Hence the variance of the difference quotient approaches infinity as h \to 0.
9.4 Stochastic integrals

Even though the Wiener process is not differentiable, we would like to make sense of integrals of the form

\int_{-\infty}^{\infty} f(t) W'(t) \, dt. (9.5)

The derivative of the Wiener process is called white noise. In order to get a well-defined random variable out of white noise, one has to integrate it.
Theorem 9.2 Let f(t) be a function that is square-integrable:

\int_{-\infty}^{\infty} |f(t)|^2 \, dt < \infty. (9.6)

Then there is a corresponding Gaussian random variable

\int_{-\infty}^{\infty} f(t) \, dW(t) (9.7)

with mean zero and variance

\sigma^2 \int_{-\infty}^{\infty} |f(t)|^2 \, dt. (9.8)

Theorem 9.3 Let f(t) and g(t) be square-integrable functions. Then the covariance of the associated stochastic integrals is

\mathrm{Cov}\left(\int_{-\infty}^{\infty} f(t) \, dW(t), \int_{-\infty}^{\infty} g(s) \, dW(s)\right) = \sigma^2 \int_{-\infty}^{\infty} f(t) g(t) \, dt. (9.9)

If you like delta functions, you can think of this result as saying that for white noise the covariance of W'(t) and W'(s) is \sigma^2 \delta(t - s). However it is not necessary to talk this way.
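The variance formula can be checked by approximating the stochastic integral with a Riemann sum against independent Gaussian increments. A minimal sketch (the integrand is a made-up choice):

    import numpy as np

    rng = np.random.default_rng(2)
    sigma, dt = 1.0, 0.01
    t = np.arange(-5.0, 5.0, dt)
    f = np.exp(-t**2)  # hypothetical square-integrable integrand

    # Approximate int f dW by sum_i f(t_i) (W(t_{i+1}) - W(t_i)), many samples.
    dW = rng.normal(0.0, sigma * np.sqrt(dt), size=(5000, len(t)))
    samples = dW @ f

    print(samples.var())                 # sample variance ...
    print(sigma**2 * np.sum(f**2) * dt)  # ... versus sigma^2 int f^2 dt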
9.5 Equilibrium statistical mechanics

Equilibrium statistical mechanics is based on a very simple principle: lower energy states are more likely than higher energy states. This principle is used to define a probability density. Then the energy is a random variable.

Here are the details. The energy of a system is a function H(x) of the state x of the system. The probability density is supposed to be a decreasing function of the energy. Furthermore, it should have the property that if the energy H(x) = H_1(x_1) + H_2(x_2) is the sum of two terms that depend on different coordinates, then the corresponding probabilities should be independent.

A formula that makes this work is to take the probability density to be

\frac{1}{Z(\beta)} e^{-\beta H(x)}. (9.10)

That is, the probability is a decreasing exponential function of the energy. The constant

Z(\beta) = \int e^{-\beta H(x)} \, dx (9.11)

is chosen to make this a probability density.

This rule satisfies the factorization property needed for independence:

\frac{1}{Z(\beta)} e^{-\beta(H_1 + H_2)} = \frac{1}{Z_1(\beta)} e^{-\beta H_1} \frac{1}{Z_2(\beta)} e^{-\beta H_2}. (9.12)
Now the only remaining problem is to identify the coefficient \beta. This is a long story, but the result is that

\beta = \frac{1}{kT}, (9.13)

where kT is the absolute temperature measured in energy units. Perhaps one could think of this as the definition of absolute temperature.

Once we have the probability density, we can think of H as a random variable. The expectation of H is

E(\beta) = \frac{1}{Z(\beta)} \int H(x) e^{-\beta H(x)} \, dx. (9.14)

Lemma 9.1 The normalization constant Z(\beta) satisfies

\frac{d}{d\beta} Z(\beta) = -E(\beta) Z(\beta). (9.15)

Theorem 9.4 The rate of change of the energy with respect to \beta is

\frac{d}{d\beta} E(\beta) = -\frac{1}{Z(\beta)} \int H(x)^2 e^{-\beta H(x)} \, dx + E(\beta)^2 = -\mathrm{Var}(H). (9.16)

This shows that the expected energy E(\beta) is a decreasing function of \beta. Thus the expected energy is an increasing function of the temperature.
9.6 The Einstein model of Brownian motion

The Einstein model of Brownian motion describes the motion of a particle. The particle is bombarded randomly by molecules. In the Einstein model the x component of the particle as a function of time is a Wiener process with a certain diffusion constant \sigma_E^2/2. The fact that the motion is so irregular is due to the fact that the molecular bombardment is extremely rapid and random.

This model is very strange from the point of view of physics. In physics the fundamental physical law is that mass times acceleration equals force. In the Einstein model the diffusing particle not only does not have an acceleration, it does not even have a velocity. Therefore it is remarkable that Einstein was able to do physics with this model. His great achievement was to derive a formula for the diffusion constant.

The formula involves the absolute temperature T, measured in degrees Kelvin. There is a constant k, called Boltzmann's constant, that converts temperature units into energy units. The formula actually involves the energy quantity kT, which is proportional to temperature. Recall that energy units are force times distance. The formula also involves a frictional coefficient \gamma that measures the drag on the particle due to the surrounding fluid. If the particle is dragged through the fluid, it experiences a frictional force proportional to the speed
with which it is dragged. This proportionality constant \gamma has the units of force over velocity.

The Einstein formula is

\frac{1}{2} \sigma_E^2 = \frac{kT}{\gamma}. (9.17)

The diffusion constant has the units of distance times velocity, that is, distance squared over time. This reflects the fact that the distance over which diffusion takes place is proportional, on the average, to the square root of the time.

Because of this famous theory, the Wiener process is sometimes called Brownian motion. However this is misleading, because one could imagine various models of the physical process of Brownian motion. But there is only one mathematical Wiener process.

Here is how Einstein found his formula. He used an amazing trick. He considered an external force field in which the Brownian motion is taking place. This force f is taken to be constant. It does not matter how large or small it is, as long as it is not zero! This force balances the frictional force to produce an average terminal velocity a. Again this can be arbitrarily small. Thus Einstein's model for the displacement X(t) of a diffusing particle is really

X(t) - X(0) = at + W(t) - W(0), (9.18)

where for the purposes of the derivation a is non-zero, but possibly very small. Strictly speaking the particle has no velocity, since the difference quotient of the displacement (X(t + \Delta t) - X(t))/\Delta t does not have a limit as \Delta t \to 0. However the expectation of the difference quotient does, and the limit of the expectation is a. So a is the velocity in some average sense.

The external force f on the particle makes it achieve the terminal average velocity a. The relation between the force and the terminal average velocity is

f - \gamma a = 0. (9.19)

This just says that the external force and the average frictional force sum to zero when the terminal average velocity is reached. That is, the assumption is that the average acceleration is zero.

Now assume that there is a stationary density \rho for the particles satisfying detailed balance. The particles diffuse, but they also drift systematically in the direction of the force at an average velocity of a. We imagine that the motion takes place in a bounded interval of space. At the endpoints the particles just reflect off, so this is like keeping the particles in a box. It may help to think of the interval as being a vertical column, and the force f being the force of gravity in a downward direction. Then the density will be higher at the bottom than at the top. The particles drift downward, but since there are more of them at the bottom, the net diffusion is upward.

The detailed balance equation, as we shall see later, is

a \rho(x) - \frac{1}{2} \sigma^2 \frac{\partial \rho(x)}{\partial x} = 0. (9.20)
This says that the systematic motion of particles in the direction of the velocity a plus the diffusion in the direction from higher density to lower density gives a balance of zero transport of particles.

Now the fundamental principle of equilibrium statistical mechanics is that the probability is determined by the formula (1/Z) \exp(-H/(kT)), where H is the energy and kT is the absolute temperature, measured in energy units. In some sense this is the fundamental definition of temperature. For our problem the energy is H = -fx, given by force times distance. Thus

\rho(x) = \frac{1}{Z} \exp\left(\frac{fx}{kT}\right). (9.21)

Inserting this in the detailed balance equation gives

\frac{1}{2} \sigma^2 \frac{f}{kT} = a. (9.22)

The Einstein relation comes from eliminating a/f from these equations. However Einstein's derivation is really too clever. We shall see that there is another method that is more straightforward and intuitive. This involves a slightly more detailed model of Brownian motion, due to Ornstein and Uhlenbeck.
9.7 Problems

1. The transition probability density p_t(x, y) for the Wiener process satisfies

E_x[f(W(t))] = \int_{-\infty}^{\infty} p_t(x, y) f(y) \, dy. (9.23)

Write an explicit formula for p_t(x, y).

2. Prove the theorems about stochastic integrals for the case when the functions f(t) and g(s) are piecewise constant.

3. Suppose that the energy H(x) = \frac{1}{2} c x^2 is quadratic. Show that in equilibrium the expected energy E(\beta) is proportional to the absolute temperature T. (Recall that 1/\beta = kT.) What is the proportionality constant? How does it depend on c?
4. HPS, Chapter 4, Problem 18
5. HPS, Chapter 4, Problem 19
6. HPS, Chapter 5, Problem 15
Chapter 10
The Ornstein-Uhlenbeck
process
10.1 The velocity process

We shall describe the Ornstein-Uhlenbeck velocity process as a process describing the velocity of a diffusing particle. (This process is also called the Langevin process.) The same mathematics works for any diffusion process with linear drift.

The law of motion is mass times acceleration equals force. After multiplying by dt we get a stochastic differential equation

m \, dV(t) = -\gamma V(t) \, dt + dW(t). (10.1)

Here m > 0 is the mass and \gamma > 0 is the friction coefficient. This is Newton's second law of motion. Both sides have the dimensions of momentum. The dW(t) represents a random force. The corresponding diffusion constant \sigma_{OU}^2 has the dimensions of momentum squared over time.

Let

\lambda = \frac{\gamma}{m} (10.2)

be the relaxation rate. Then the equation becomes

dV(t) = -\lambda V(t) \, dt + \frac{1}{m} \, dW(t). (10.3)

For a linear force law like this it is easy to solve the equation explicitly by a stochastic integral. The solution is

V(t) = e^{-\lambda t} V(0) + \frac{1}{m} \int_0^t e^{-\lambda(t - s)} \, dW(s). (10.4)
Theorem 10.1 The Ornstein-Uhlenbeck velocity process starting at v is a random variable V(t) defined for t \ge 0 that is Gaussian with mean

E[V(t)] = e^{-\lambda t} v (10.5)

and variance

\mathrm{Var}(V(t)) = \frac{\sigma_{OU}^2}{2m^2 \lambda} (1 - e^{-2\lambda t}). (10.6)

Proof: The mean comes from taking the mean in the first term in the solution. The mean of the stochastic integral in the second term is zero. On the other hand, the variance of the first term is zero. The variance of the stochastic integral in the second term is computed by using the properties of stochastic integrals.

Theorem 10.2 The Ornstein-Uhlenbeck process is correlated over a time roughly equal to 2/\lambda. Its covariance for t_1 \ge 0 and t_2 \ge 0 is

\mathrm{Cov}(V(t_1), V(t_2)) = \frac{\sigma_{OU}^2}{2m^2 \lambda} (e^{-\lambda|t_2 - t_1|} - e^{-\lambda(t_1 + t_2)}). (10.7)

Proof: The covariance comes from the stochastic integral in the second term of the solution.

If we wait for a long time, we see that the mean is zero and the covariance of the process is

\mathrm{Cov}(V(t_1), V(t_2)) = \frac{\sigma_{OU}^2}{2m^2 \lambda} e^{-\lambda|t_2 - t_1|}. (10.8)

This identifies the stationary measure as the Gaussian measure with mean zero and variance \sigma_{OU}^2/(2m^2 \lambda). This relation is due to a balance between the fluctuation \sigma_{OU}^2/m^2 and the dissipation \lambda.

From these results we can identify \sigma_{OU}^2. Consider the process in equilibrium, that is, in the stationary measure. The expectation of the kinetic energy is given by equilibrium statistical mechanics. Since the kinetic energy is quadratic, it is just

E[\frac{1}{2} m V(t)^2] = \frac{1}{2} kT. (10.9)

This comes from treating the velocity as a Gaussian variable with density proportional to \exp(-\frac{1}{2} m v^2/(kT)). That is, the velocity is Gaussian with mean zero and variance kT/m. Since the variance of the velocity is also (1/2)\sigma_{OU}^2/(m^2 \lambda), this allows us to identify

\frac{1}{2} \sigma_{OU}^2 = m \lambda kT = \gamma kT. (10.10)
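As a sanity check, the velocity equation can be integrated numerically by the Euler-Maruyama scheme and the empirical stationary variance compared with kT/m. A minimal sketch (all parameter values are made up):

    import numpy as np

    rng = np.random.default_rng(3)
    m, lam, kT = 1.0, 2.0, 1.0            # hypothetical mass, rate, temperature
    sigma_OU = np.sqrt(2 * m * lam * kT)  # from (1/2) sigma_OU^2 = m lam kT

    dt, n = 0.01, 200_000
    v = np.zeros(n)
    for i in range(1, n):
        dW = rng.normal(0.0, sigma_OU * np.sqrt(dt))
        v[i] = v[i-1] - lam * v[i-1] * dt + dW / m  # Euler-Maruyama step

    print(v[1000:].var())  # should be close to kT/m = 1.0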
10.2 The Ornstein-Uhlenbeck position process

The result of the above discussion is a model for the velocity of a diffusing particle. In order to compare the results with the Einstein model, one must find the position. This is given by

X(t) = \int_0^t V(s) \, ds. (10.11)

However this is easy to figure out. Integrate the original stochastic differential equation directly. This gives

m(V(t) - V(0)) = -\gamma(X(t) - X(0)) + W(t) - W(0). (10.12)

Solving for the position we get

X(t) - X(0) = \frac{1}{\gamma}(W(t) - W(0)) - \frac{m}{\gamma}(V(t) - V(0)). (10.13)

This is not the same as the process described by Einstein, but it is rather close. In fact, if we can neglect the second term, we get that

X(t) - X(0) \approx \frac{1}{\gamma}(W(t) - W(0)). (10.14)

Thus in this approximation the position process is a Wiener process with variance (\sigma_{OU}^2/\gamma^2) t = 2(kT/\gamma) t. This is the same as the prediction \sigma_E^2 t of the Einstein theory. However in this case the Einstein formula was obtained by applying statistical mechanics to the velocity instead of to the position.

How good is this approximation? The process V(t) - V(0) has a stationary distribution. This stationary distribution has mean zero and variance \sigma_{OU}^2/(2m\gamma). So the process in the second term has a stationary distribution with variance \sigma_{OU}^2 m/(2\gamma^3) = (1/2)(m/\gamma)\sigma_E^2. So this shows that as soon as t is larger than (1/2)(m/\gamma), this is a good approximation for the displacement.
10.3 Stationary Gaussian Markov processes

There is a sense in which the stationary Ornstein-Uhlenbeck velocity process is the nicest of all possible stochastic processes. It is a stationary stochastic process with continuous time parameter and with continuous real values. Thus the covariance function for the process at two times s and t is of the form r(t - s), where r(t) = r(-t) is a symmetric function. It is also a Gaussian process with mean zero. Thus the covariance function completely describes the process. Finally, it is a Markov process. This forces the covariance to have the decaying exponential form r(t) = r(0) e^{-\lambda|t|} for some \lambda > 0. Thus only one parameter is needed to describe the process: the decay rate \lambda.
10.4 Problems

1. The transition probability density p_t(v, w) for the Ornstein-Uhlenbeck velocity process V(t) satisfies

E_v[f(V(t))] = \int_{-\infty}^{\infty} p_t(v, w) f(w) \, dw. (10.15)

Write an explicit formula for p_t(v, w).

2. Show directly that

V(t) = \frac{1}{m} \int_{-\infty}^t e^{-\lambda(t - s)} \, dW(s) (10.16)

has the covariance of the stationary Ornstein-Uhlenbeck velocity process.

3. Find the covariance function for the Ornstein-Uhlenbeck position process X(t) - X(0). (Hint: Write the process as a sum of two stochastic integrals. Then the covariance is a sum of four terms.)

4. Is the Ornstein-Uhlenbeck velocity process a Markov process? Discuss.

5. Is the Ornstein-Uhlenbeck position process a Markov process? Discuss.
Chapter 11
Diffusion and drift
11.1 Stochastic differential equation

We can look at stochastic differential equations of a more general form

dX(t) = a(X(t)) \, dt + dW(t). (11.1)

Here W(t) is the Wiener process with variance \sigma^2 t. The function a(x) is called the drift. If we think of x as displacement, then a(x) is a kind of average velocity. The meaning of this stochastic differential equation is given by converting it to an integral equation

X(t) - X(t_0) = \int_{t_0}^t a(X(s)) \, ds + W(t) - W(t_0). (11.2)

Both sides of this equation are intended to be well-defined random variables.

If the function a(x) is not linear, then it is difficult to find an explicit solution. However under appropriate hypotheses one can prove that a solution exists by iterating the integral equation and showing that these iterates converge to a solution.

11.2 Diffusion equations

If we start this process with X(0) = x, then we can try to compute

E_x[f(X(t))] = \int_{-\infty}^{\infty} p_t(x, y) f(y) \, dy. (11.3)

Here p_t is the probability density for getting from x to near y in time t.

Theorem 11.1 Let

L = \frac{1}{2} \sigma^2 \frac{\partial^2}{\partial x^2} + a(x) \frac{\partial}{\partial x}. (11.4)
Then

u(x, t) = E_x[f(X(t))] = \int_{-\infty}^{\infty} p_t(x, y) f(y) \, dy (11.5)

satisfies the backward equation

\frac{\partial u}{\partial t} = Lu (11.6)

with initial condition u(x, 0) = f(x).

Theorem 11.2 Let

L^* = \frac{1}{2} \sigma^2 \frac{\partial^2}{\partial y^2} - \frac{\partial}{\partial y}(a(y) \cdot). (11.7)

Thus L^* f = (1/2)\sigma^2 f'' - (a(y) f)'. Let

\rho(y, t) = \int \rho_0(x) p_t(x, y) \, dx (11.8)

be the probability density as a function of y, when the process is started with density \rho_0(x). Then \rho(y, t) satisfies the forward equation (or Fokker-Planck equation)

\frac{\partial \rho}{\partial t} = L^* \rho (11.9)

with initial condition \rho(y, 0) = \rho_0(y).

Note that the forward equation may be written

\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial y} J(y) = 0, (11.10)

where

J(y) = -\frac{1}{2} \sigma^2 \frac{\partial}{\partial y} \rho(y) + a(y) \rho(y) (11.11)

is the probability current.
11.3 Stationary distributions

Corollary 11.1 Let

L^* = \frac{1}{2} \sigma^2 \frac{\partial^2}{\partial y^2} - \frac{\partial}{\partial y}(a(y) \cdot). (11.12)

The equation for a stationary probability density \rho(y) is

L^* \rho = 0. (11.13)

Corollary 11.2 Say that a probability density \rho satisfies detailed balance:

J(y) = -\frac{1}{2} \sigma^2 \frac{\partial}{\partial y} \rho(y) + a(y) \rho(y) = 0. (11.14)

Then it is a stationary probability density.
The nice thing is that it is possible to solve the detailed balance equation in terms of an integral. Let

I(y) = \int_{y_0}^y a(z) \, dz. (11.15)

The solution is then

\rho(y) = C \exp\left(\frac{2}{\sigma^2} I(y)\right). (11.16)

Example: Suppose that the drift is constant: a(y) = a and I(y) = ay. Then the stationary distribution is exponential

\rho(y) = C \exp\left(\frac{2a}{\sigma^2} y\right). (11.17)

The total integral cannot be one, unless there are some boundary conditions that place a limit to the exponential growth. We shall see that limiting the process to a bounded interval with reflecting boundary conditions at the end points will give a stationary process.

Example: Suppose that the drift is linear: a(y) = -\lambda y and I(y) = -\lambda y^2/2. Then

\rho(y) = C \exp\left(-\frac{\lambda}{\sigma^2} y^2\right). (11.18)

If \lambda > 0 this makes sense as a Gaussian. This is in some sense the nicest of all stochastic processes: it is Markov, stationary, and Gaussian. We have already encountered it as the Ornstein-Uhlenbeck velocity process.
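The formula \rho(y) = C \exp((2/\sigma^2) I(y)) is easy to evaluate numerically for any drift. A minimal sketch (using the double-well drift a(y) = by - cy^3 from the problems below, with made-up values of b and c):

    import numpy as np

    sigma, b, c = 1.0, 1.0, 1.0   # hypothetical parameters
    y = np.linspace(-3, 3, 601)
    dy = y[1] - y[0]

    a = b * y - c * y**3          # drift
    I = np.cumsum(a) * dy         # I(y) = int_{y0}^{y} a(z) dz
    rho = np.exp(2.0 * I / sigma**2)
    rho /= rho.sum() * dy         # normalize so int rho dy = 1

    print(y[np.argmax(rho)])      # one of the two peaks near +-sqrt(b/c)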
11.4 Boundary conditions

One can also think of the motion as taking place in an interval. Then one has to specify what happens when the diffusing particle gets to an end point.

The first case is that when the particle reaches the boundary point x_1 it is sent to a storage place. This is the case of an absorbing boundary. The boundary conditions for this case are u(x_1, t) = 0 for the backward equation and \rho(x_1, t) = 0 for the forward equation. Usually when there is an absorbing boundary the process is transient, except for the absorbing state. Notice that the detailed balance equation, when combined with the boundary condition of vanishing at a point, gives only the zero solution. This is not eligible to be a probability density.

The second case is when the particle is reflected when it reaches the boundary point x_1. This is the case of a reflecting boundary. The boundary conditions for this case are \partial u/\partial x (x_1, t) = 0 for the backward equation and J(x_1, t) = a(x_1) \rho(x_1, t) - (\sigma^2/2) \partial \rho/\partial y (x_1, t) = 0 for the forward equation. If both boundaries are reflecting, then the process is usually recurrent.

If one has a one dimensional diffusion with reflecting boundary conditions, then solving for a stationary probability density is easy. Detailed balance says that the current is zero, and the reflecting boundary condition says that the current continues to be zero at the boundary. An example would be the Wiener process with two reflecting boundaries. Then the stationary probability density is uniform.
11.5 Martingales and hitting probabilities

A function f(X(t)) of the diffusion process is a martingale if the equation Lf = 0 is satisfied. The general solution to this equation is given by

f(x) = C_1 \int_{x_0}^x \exp\left(-\frac{2}{\sigma^2} I(y)\right) dy + C_2. (11.19)

In particular, hitting probabilities define martingales. Thus if x_0 < x < x_1, and f(x) is the probability of hitting x_1 before x_0, then f(x) satisfies this equation with f(x_0) = 0 and f(x_1) = 1. Thus

f(x) = C_1 \int_{x_0}^x \exp\left(-\frac{2}{\sigma^2} I(y)\right) dy. (11.20)

Furthermore, we require that

f(x_1) = C_1 \int_{x_0}^{x_1} \exp\left(-\frac{2}{\sigma^2} I(y)\right) dy = 1. (11.21)

This fixes the constant C_1.
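Equations (11.20) and (11.21) are one-dimensional integrals, so the hitting probability can be computed by simple quadrature. A minimal sketch (constant negative drift on a made-up interval, in the spirit of the gambler's ruin problem below):

    import numpy as np

    sigma, a = 1.0, -0.5             # hypothetical diffusion and drift
    x0, x1 = 0.0, 4.0
    y = np.linspace(x0, x1, 2001)
    dy = y[1] - y[0]

    I = a * (y - x0)                 # I(y) = int a dz for constant drift
    g = np.exp(-2.0 * I / sigma**2)
    F = np.cumsum(g) * dy            # int_{x0}^{x} exp(-(2/sigma^2) I) dy
    f = F / F[-1]                    # normalize so f(x1) = 1

    x = 1.0                          # probability of hitting x1 before x0
    print(np.interp(x, y, f))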
This technique also works for investigating recurrence and transience. Thus, for instance, take the case when x_1 = +\infty. If the integral of \exp(-(2/\sigma^2) I(y)) converges near +\infty, then this shows that there is a strictly positive probability of running off to +\infty. Thus the process is transient. If the integral does not converge, then the probability of going to +\infty before reaching x_0 is zero. The boundary condition at +\infty cannot be satisfied. There is a similar criterion that says that if the integral of \exp(-(2/\sigma^2) I(y)) converges near -\infty, then there is a strictly positive probability of running off to -\infty. So the only way to have recurrence is to have both integrals diverge, at +\infty and at -\infty.

It is amusing to compare this with the condition for positive recurrence, that is, for having a stationary measure. This condition says that the integral of \exp((2/\sigma^2) I(y)) converges at +\infty and at -\infty. However the sign in front of the drift coefficient in the exponential is opposite. This condition is a stronger condition than the condition for recurrence. The Wiener process is an example that is recurrent but not positive recurrent.

Example: Consider diffusion on the half line x \ge 0 with constant drift a \ne 0. The solution to the martingale equation is f(x) = C_1 e^{-2ax/\sigma^2} + C_2. Say that we are interested in the probability that the particle escapes to +\infty before getting to 0. The solution that vanishes at 0 is f(x) = C_1(1 - e^{-2ax/\sigma^2}). If a > 0, then the solution is f(x) = 1 - e^{-2ax/\sigma^2}. On the other hand, if a < 0 the solution is f(x) = 0.
11.6 Problems

1. Recurrence of the Wiener process. Show that starting at any point x, the probability of reaching a point b is one.

2. Transience of the Wiener process with drift. Show that if the drift is negative, then starting at any point x, the probability of reaching a point b with x < b is less than one. Calculate this probability.

3. Gambler's ruin. Consider diffusion on an interval a < x < b with absorbing boundary conditions. Take the case of constant negative drift. Find the probability of reaching b as a function of x.

4. A queue. Consider diffusion on the interval 0 < x < \infty with reflecting boundary condition at zero. Take the case of a constant negative drift. Find the stationary probability density.

5. The double well. Consider diffusion on the line with drift a(x) = bx - cx^3, where b > 0 and c > 0. Sketch the drift a(x). This is positive recurrent, so it has a stationary probability density \rho(x). Solve for \rho(x) and sketch it.

6. Punctuated equilibrium. Consider the diffusion in the preceding problem. Take b large with b/c fixed. Sketch a typical sample path. Discuss how the process combines features of the Ornstein-Uhlenbeck process with features of the two-state Markov process.
Chapter 12
Stationary processes
12.1 Mean and covariance

Finally we leave the realm of Markov processes. We go to the realm of probability where we study processes through their covariance functions.

Consider a stochastic process X(t) defined for real t. (We could also consider a stochastic process X_n defined for integer n; much of the theory is parallel.) The mean of the process is

\mu(t) = E[X(t)]. (12.1)

The covariance of the process is

r(s, t) = \mathrm{Cov}(X(s), X(t)). (12.2)

The variance of the process is of course r(t, t).

Not every function can be a covariance function. It is obvious that r(s, t) = r(t, s), so this symmetry property is essential. Furthermore, it is not hard to prove from the Schwarz inequality that |r(s, t)| \le \sqrt{r(s, s)} \sqrt{r(t, t)}. However there is another deeper property.

Theorem 12.1 For each integrable function f the covariance r(s, t) has the positivity property

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(s) r(s, t) f(t) \, ds \, dt \ge 0. (12.3)

Proof: Consider the linear combination

\int f(t) X(t) \, dt. (12.4)

Its variance is greater than or equal to zero.
12.2 Gaussian processes

In general, knowing the mean and covariance of a process is not enough to determine the process. However for a Gaussian process this always works.

A Gaussian process is a process that has the property that for each t_1, \ldots, t_n the random variables X(t_1), \ldots, X(t_n) have a joint Gaussian (normal) distribution.

The mean vector of X(t_1), \ldots, X(t_n) is the vector \mu(t_1), \ldots, \mu(t_n) with components \mu(t_j). The covariance matrix of X(t_1), \ldots, X(t_n) is the matrix with entries r(t_i, t_j). Since the mean vector and covariance matrix of jointly Gaussian random variables determine the density function, a Gaussian process is determined by its mean and covariance functions. (Technically, this is true only for properties of the process that are determined by what happens at a finite number of time instants.)

Thus in the following there are two points of view. The first is to think of a general process with finite mean and covariance functions. Then these functions are giving seriously incomplete information about the process. However there are circumstances where one has a non-Gaussian process, but one still gets useful information from knowing the covariance. The second is to think of a Gaussian process. This is much more special, but one is sure that the mean and covariance is giving a complete description.
12.3 Mean and covariance of stationary processes

A process is stationary if for every function f(x_1, \ldots, x_n) for which the expectations are defined we have

E[f(X(t_1), X(t_2), \ldots, X(t_n))] = E[f(X(t_1 - s), X(t_2 - s), \ldots, X(t_n - s))]. (12.5)

In particular the mean satisfies \mu(t) = \mu(0) and so

E[X(t)] = \mu, (12.6)

independent of t. We shall usually consider a situation when this constant is zero.

The covariance of the process satisfies r(t_1, t_2) = r(t_1 - t_2, 0). Thus we may write r(t) for r(t, 0) and obtain

\mathrm{Cov}(X(t_1), X(t_2)) = r(t_1 - t_2). (12.7)

Clearly the function r(t) = r(-t) is symmetric. The variance of the process is of course the constant r(0). We have the inequality |r(t)| \le r(0). Furthermore, the covariance r(s - t) has the positivity property

\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(s) r(s - t) f(t) \, ds \, dt \ge 0. (12.8)
12.4 Convolution

The convolution of two functions f and g is the function f * g defined by

(f * g)(t) = \int_{-\infty}^{\infty} f(t - s) g(s) \, ds. (12.9)

This is a commutative product, since a change of variable gives

(f * g)(t) = \int_{-\infty}^{\infty} f(u) g(t - u) \, du. (12.10)

Ordinarily we want to assume that one of the functions is integrable and the other at least bounded.

The interpretation of f * g is that it is a weighted integral of translates of f with weight function g. Of course it can also be thought of as a weighted integral of translates of g with weight function f.

Sometimes we want one of the functions to satisfy f(t) = 0 for t < 0. In this case we say that the convolution is causal. Then the convolution takes the form

(f * g)(t) = \int_{-\infty}^t f(t - s) g(s) \, ds (12.11)

or

(f * g)(t) = \int_0^{\infty} f(u) g(t - u) \, du. (12.12)

We see that in this case the convolution (f * g)(t) only depends on g(s) for s \le t, or on g(t - u) for u \ge 0. In other words, the convolution f * g at time t is a weighted integral of the values of g in the past of t.
12.5 Impulse response functions

Let Y(t) be a stationary process with mean zero. Let h(t) be a real integrable function, which we call the impulse response function. Then we define a new process by the convolution of the impulse response function with the process. The result

X(t) = \int_{-\infty}^{\infty} h(t - u) Y(u) \, du = \int_{-\infty}^{\infty} h(v) Y(t - v) \, dv (12.13)

is another stationary process. If the first process has covariance function r_Y, the new process has covariance function r_X. We can express r_X in terms of r_Y by the formula

r_X(t - s) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(t - u) h(s - v) r_Y(u - v) \, du \, dv. (12.14)
The impulse response function is causal if $h(t) = 0$ for all $t < 0$. Then we may write the new process as
\[ X(t) = \int_{-\infty}^{t} h(t - u) Y(u) \, du = \int_{0}^{\infty} h(v) Y(t - v) \, dv. \tag{12.15} \]
The covariances are then related by
\[ r_X(t - s) = \int_{-\infty}^{t} \int_{-\infty}^{s} h(t - u) h(s - v) r_Y(u - v) \, du \, dv. \tag{12.16} \]
One important but rather singular case is when $Y(t) = dW(t)/dt$ is white noise. We take $h(t)$ to be a real square integrable function. Then the convolution
\[ X(t) = \int_{-\infty}^{\infty} h(t - u) \, dW(u) \]
is another stationary process. Since the white noise process has covariance $\sigma^2 \delta(u - v)$, the covariance of the resulting process is
\[ r_X(t - s) = \sigma^2 \int_{-\infty}^{\infty} h(t - u) h(s - u) \, du. \]
Again the impulse response function is causal if $h(t) = 0$ for all $t < 0$. Then we may write the new process as
\[ X(t) = \int_{-\infty}^{t} h(t - u) \, dW(u). \tag{12.17} \]
The covariance is given by
\[ r_X(t - s) = \sigma^2 \int_{-\infty}^{\min(s,t)} h(t - u) h(s - u) \, du. \tag{12.18} \]
In particular, for $t \ge 0$ this is
\[ r_X(t) = \sigma^2 \int_{0}^{\infty} h(t + u) h(u) \, du. \tag{12.19} \]
Example: We have the standard example of the Ornstein-Uhlenbeck process. This is given by
\[ X(t) = \int_{-\infty}^{t} e^{-\lambda(t - s)} \, dW(s). \tag{12.20} \]
Thus the impulse response function is $h(t) = e^{-\lambda t}$ for $t \ge 0$ and zero elsewhere. The corresponding covariance function is
\[ r_X(t) = \sigma^2 \int_{0}^{\infty} e^{-\lambda(t + s)} e^{-\lambda s} \, ds = \frac{\sigma^2}{2\lambda} e^{-\lambda t} \tag{12.21} \]
for $t \ge 0$.
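As a check on this example, here is a simulation sketch (Python with NumPy; the values of $\lambda$, $\sigma$, the step size, and the sample length are hypothetical). It builds the process by the recursion $X_{k+1} = e^{-\lambda\,dt} X_k + \Delta W_k$, a first-order discretization of the stochastic integral (12.20), and compares the sample covariance with $(\sigma^2/(2\lambda)) e^{-\lambda t}$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma, dt, n = 1.0, 1.0, 0.01, 200_000

# Wiener increments dW with variance sigma^2 dt.
dW = sigma * np.sqrt(dt) * rng.standard_normal(n)

# X(t) = int_{-inf}^t e^{-lam (t-s)} dW(s), discretized step by step.
# (Starting at X[0] = 0 gives a short transient that barely affects averages.)
X = np.zeros(n)
for k in range(n - 1):
    X[k + 1] = np.exp(-lam * dt) * X[k] + dW[k]

# Compare the empirical covariance at a few lags with (sigma^2/(2 lam)) e^{-lam t}.
for lag in (0, 50, 100):              # lags in units of dt
    emp = np.mean(X[:n - lag] * X[lag:])
    exact = sigma**2 / (2 * lam) * np.exp(-lam * lag * dt)
    print(lag * dt, emp, exact)
```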
We can ask the converse question: given a covariance function $r(t)$, can it be represented by some causal impulse response function $h(t)$ in the form
\[ r(t) = \sigma^2 \int_{0}^{\infty} h(t + u) h(u) \, du \tag{12.22} \]
for $t \ge 0$? This would say that this covariance function could be explained as that of a causal process formed from white noise. The answer turns out to be somewhat surprising. It will appear in the next chapter. However here is one easy remark. We shall see that it can be accomplished by an impulse response function with $h(0) \ne 0$ only for covariances with a slope discontinuity at zero. The reason is that, differentiating (12.22) and integrating by parts, we can compute the right hand derivative to be
\[ r'(0+) = -\frac{1}{2} \sigma^2 h(0)^2. \tag{12.23} \]
Example: Take $c > 0$ and consider the covariance function
\[ r(t) = \frac{c^2}{t^2 + c^2}. \tag{12.24} \]
Since $r'(0+) = 0$, this cannot be explained by a causal impulse response function with $h(0) \ne 0$. It is too smooth. We shall see in the next chapter that it cannot be explained by any causal impulse response function.
12.6 Problems
1. HPS, Chapter 4, Problem 9
2. HPS, Chapter 4, Problems 20(b)(c)
3. HPS, Chapter 4, Problem 20(d)
4. HPS, Chapter 5, Problem 6
5. HPS, Chapter 5, Problem 7.
6. Consider the identity $r'(0+) = -(1/2)\sigma^2 h(0)^2$ for the case of the Ornstein-Uhlenbeck process with $r(t) = r(0) e^{-\lambda |t|}$ and $h(t) = e^{-\lambda t}$. Show that this gives the fluctuation-dissipation relation.
Chapter 13
Spectral analysis
13.1 Fourier transforms
Let $f$ be a complex function of time $t$ such that the quantity
\[ M^2 = \int_{-\infty}^{\infty} |f(t)|^2 \, dt < \infty. \tag{13.1} \]
Define the Fourier transform $\hat f$ as another complex function of angular frequency $\omega$ given by
\[ \hat f(\omega) = \int_{-\infty}^{\infty} e^{-i\omega t} f(t) \, dt. \tag{13.2} \]
Then
\[ M^2 = \int_{-\infty}^{\infty} |\hat f(\omega)|^2 \, \frac{d\omega}{2\pi} < \infty \tag{13.3} \]
with the same constant $M^2$.
The big theorem about Fourier transforms is the inversion formula. It says that every square integrable function of $t$ can be represented as an inverse Fourier transform, that is, as an integral involving the functions $e^{i\omega t}$ for varying angular frequencies $\omega$.
Theorem 13.1 Let $f(t)$ be a square-integrable function, and let $\hat f(\omega)$ be its Fourier transform. Then
\[ f(t) = \int_{-\infty}^{\infty} e^{i\omega t} \hat f(\omega) \, \frac{d\omega}{2\pi}. \tag{13.4} \]
Recall that $e^{i\omega t} = \cos(\omega t) + i \sin(\omega t)$, so this is an expansion of an arbitrary function of $t$ in terms of trigonometric functions $\cos(\omega t)$ and $\sin(\omega t)$ with varying angular frequencies $\omega$.
If $f(t)$ is real, then $\hat f(-\omega) = \overline{\hat f(\omega)}$, so if we write $\hat f(\omega) = a(\omega) - i b(\omega)$ and $\hat f(-\omega) = a(\omega) + i b(\omega)$, we get the inversion formula in real form as
\[ f(t) = \int_{0}^{\infty} \left[ \cos(\omega t)\, a(\omega) + \sin(\omega t)\, b(\omega) \right] \frac{d\omega}{\pi}. \tag{13.5} \]
If in addition $f(t)$ is even, so that $f(-t) = f(t)$, then $\hat f(\omega)$ is real and even, and so $\hat f(\omega) = a(\omega)$ and $b(\omega) = 0$. Thus we have the real even form
\[ f(t) = \int_{-\infty}^{\infty} \cos(\omega t) \hat f(\omega) \, \frac{d\omega}{2\pi} = \int_{0}^{\infty} \cos(\omega t) \hat f(\omega) \, \frac{d\omega}{\pi}. \tag{13.6} \]
The real expressions do much to demystify the Fourier transform, but in practice
everybody uses the complex notation.
13.2 Convolution and Fourier transforms
The Fourier transform is particularly useful for simplifying convolutions. Recall that the convolution of $f$ and $g$ is the function $f * g$ defined by
\[ (f * g)(t) = \int_{-\infty}^{\infty} f(t - s) g(s) \, ds. \tag{13.7} \]
The fundamental result is that the Fourier transform of the convolution is given by
\[ \widehat{(f * g)}(\omega) = \hat f(\omega) \hat g(\omega). \tag{13.8} \]
Thus the Fourier transform of a convolution is the product of the Fourier transforms.
Another useful fact is that for real $f$ the Fourier transform of the reflected function $f(-t)$ is the complex conjugate $\overline{\hat f(\omega)}$ of the Fourier transform of $f(t)$.
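These facts are easy to test numerically. The sketch below (Python with NumPy; the grid and the two test functions are hypothetical) forms a discrete convolution with the FFT and checks that its transform equals the product of the transforms, up to rounding error:

```python
import numpy as np

n, dt = 1024, 0.05
t = np.arange(n) * dt

f = np.exp(-t)                    # two test functions on a periodic grid
g = np.exp(-0.5 * (t - 10)**2)

# Discrete (circular) convolution via the FFT, scaled by dt so that it
# approximates the integral definition of f*g.
conv = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))) * dt

# Transform of the convolution vs. product of the transforms.
lhs = np.fft.fft(conv) * dt
rhs = (np.fft.fft(f) * dt) * (np.fft.fft(g) * dt)
print(np.max(np.abs(lhs - rhs)))  # agrees to rounding error
```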
13.3 Smoothness and decay
Here are two more big theorems about Fourier transforms.
Theorem 13.2 Let $f(t)$ be a square integrable function, that is, a function such that
\[ \int_{-\infty}^{\infty} |f(t)|^2 \, dt < \infty. \tag{13.9} \]
Then its Fourier transform $\hat f(\omega)$ is also square integrable. Furthermore,
\[ \int_{-\infty}^{\infty} |f(t)|^2 \, dt = \int_{-\infty}^{\infty} |\hat f(\omega)|^2 \, \frac{d\omega}{2\pi}. \tag{13.10} \]
There is a corresponding result for the inverse Fourier transform.
Theorem 13.3 Let $f(t)$ be an integrable function, that is, a function such that
\[ \int_{-\infty}^{\infty} |f(t)| \, dt < \infty. \tag{13.11} \]
Then its Fourier transform $\hat f(\omega)$ is bounded and continuous. There is a corresponding result for the inverse Fourier transform.
Proof: It is clear that
\[ |\hat f(\omega)| = \left| \int_{-\infty}^{\infty} e^{-i\omega t} f(t) \, dt \right| \le \int_{-\infty}^{\infty} |f(t)| \, dt \tag{13.12} \]
for each $\omega$. Since the bound on the right hand side does not depend on $\omega$, the function $\hat f$ is bounded.
The fact that $\hat f$ is continuous is proved in more advanced treatments. (It is an immediate consequence of the dominated convergence theorem.)
The second theorem has some useful corollaries.
Corollary 13.1 Let $f$ be a function such that $f(t)$ and $t^m f(t)$ are integrable. Then each derivative of the Fourier transform $\hat f(\omega)$ up to order $m$ is bounded and continuous. There is a corresponding result for the inverse Fourier transform.
Proof: The Fourier transform of $(-it)^m f(t)$ is the $m$th derivative of $\hat f(\omega)$.
Corollary 13.2 Let $f$ be a function such that $f(t)$ and its $m$th derivative $f^{(m)}(t)$ are integrable. Then the Fourier transform $\hat f(\omega)$ is bounded and also satisfies a bound $|\hat f(\omega)| \le C/|\omega|^m$. It thus goes to zero at infinity at least at this rate. There is a corresponding result for the inverse Fourier transform.
Proof: The Fourier transform of the $m$th derivative of $f(t)$ is $(i\omega)^m \hat f(\omega)$.
What do these corollaries mean intuitively? Let us look at the case of the inverse Fourier transform. If $\hat f(\omega)$ and $\omega^m \hat f(\omega)$ are integrable, then the inverse Fourier transform $f(t)$ has $m$ continuous derivatives. This says that if there are few high frequencies in $\hat f(\omega)$, then the only kind of function $f(t)$ that you can synthesize out of sines and cosines is a rather smooth function. In particular, to synthesize a discontinuous function $f(t)$ you need a lot of high frequencies, so $\hat f(\omega)$ cannot even be integrable.
On the other hand, if $\hat f(\omega)$ and its derivatives through order $m$ are integrable, then the inverse Fourier transform $f(t)$ approaches zero at infinity at least as fast as a constant times $1/|t|^m$. This says that the smoothness of the Fourier transform $\hat f(\omega)$ produces a lot of cancellation in $f(t)$ at long distances. However a sharp frequency cutoff in $\hat f(\omega)$ leaves persistent oscillations in $f(t)$ that do not approach zero rapidly.
13.4 Some transforms
Most Fourier transforms are difficult to compute. However there are a few cases where it is quite easy. Here are some of them. Notice that whenever you have computed a Fourier transform, you have also computed an inverse Fourier transform. The only change is a sign and a factor of $2\pi$. In these examples each of the functions whose Fourier transform is being computed is a probability density.
Take $\lambda > 0$. The Fourier transform of the function that is $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ and zero elsewhere is
\[ \int_{0}^{\infty} e^{-i\omega t} \lambda e^{-\lambda t} \, dt = \frac{\lambda}{\lambda + i\omega}. \tag{13.13} \]
Notice that the function $f(t)$ is discontinuous, and its Fourier transform $\hat f(\omega)$ is not integrable.
It follows that the Fourier transform of the function $(1/2)(f(t) + f(-t)) = (\lambda/2) e^{-\lambda |t|}$ is $(1/2)(\hat f(\omega) + \hat f(-\omega))$, which gives
\[ \int_{-\infty}^{\infty} e^{-i\omega t} \frac{\lambda}{2} e^{-\lambda |t|} \, dt = \frac{\lambda^2}{\lambda^2 + \omega^2}. \tag{13.14} \]
In this case the Fourier transform $\hat f(\omega)$ is integrable, and the function $f(t)$ is continuous. On the other hand, the derivative of $f(t)$ is not continuous, and correspondingly $\omega \hat f(\omega)$ is not integrable.
Take $a > 0$. The Fourier transform of the function that is $f(t) = 1/a$ for $0 \le t \le a$ and zero elsewhere is
\[ \int_{0}^{a} e^{-i\omega t} \frac{1}{a} \, dt = \frac{1}{i\omega a}\left(1 - e^{-i\omega a}\right). \tag{13.15} \]
The function $f(t)$ is discontinuous, and its Fourier transform $\hat f(\omega)$ is not integrable.
It follows that the Fourier transform of the function $(1/2)(f(t) + f(-t)) = 1/(2a)$ for $-a \le t \le a$, zero elsewhere, is $(1/2)(\hat f(\omega) + \hat f(-\omega))$, which gives
\[ \int_{-a}^{a} e^{-i\omega t} \frac{1}{2a} \, dt = \frac{\sin(\omega a)}{\omega a}. \tag{13.16} \]
The function is not continuous, and its Fourier transform is not integrable.
As an example of the convolution theorem, take $f(t) = 1/(2a)$ for $|t| \le a$ and zero otherwise. Then the convolution $(f * f)(t)$ has Fourier transform $\hat f(\omega)^2$. This says that
\[ \int_{-2a}^{2a} e^{-i\omega t} \frac{1}{2a} \left(1 - \frac{1}{2a}|t|\right) dt = \frac{\sin^2(\omega a)}{(\omega a)^2}. \tag{13.17} \]
This time the Fourier transform $\hat f(\omega)^2$ is integrable, and the function $(f * f)(t)$ is continuous. However, the derivative of $(f * f)(t)$ is not continuous, and correspondingly $\omega \hat f(\omega)^2$ is not integrable.
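These closed forms can be confirmed by direct quadrature. A small sketch (Python with NumPy; the grid and the value of $\lambda$ are hypothetical) checks the two-sided exponential transform (13.14):

```python
import numpy as np

lam = 2.0
t = np.linspace(-40, 40, 400_001)
dt = t[1] - t[0]
f = (lam / 2) * np.exp(-lam * np.abs(t))     # the density (lambda/2) e^{-lambda |t|}

for w in (0.0, 1.0, 5.0):
    ft = np.sum(np.exp(-1j * w * t) * f) * dt    # Riemann sum for the transform
    print(w, ft.real, lam**2 / (lam**2 + w**2))  # matches (13.14) closely
```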
13.5 Spectral densities
Let $X(t)$ be a real stationary process, indexed by the time parameter $t$. Assume that the mean is zero, so $E[X(t)] = 0$. Then the covariance function is given by
\[ r(t - s) = E[X(t) X(s)]. \tag{13.18} \]
If $r(t)$ is the covariance function of a real stationary process, then $r(t)$ is real and even. Define the spectral density function to be $\hat r(\omega)$. This is real and even. Then the covariance function may be expressed as an integral over frequencies by
\[ r(t) = \int_{-\infty}^{\infty} \cos(\omega t) \hat r(\omega) \, \frac{d\omega}{2\pi} = \int_{0}^{\infty} \cos(\omega t) \hat r(\omega) \, \frac{d\omega}{\pi}. \tag{13.19} \]
Note that in the present treatment the spectral density is always the density with respect to $d\omega/(2\pi)$. Whenever one does an integral over the frequency parameter $\omega$ one must use this weight, with the $1/(2\pi)$ factor.
It is a remarkable fact (demonstrated below) that for a covariance function the spectral density is positive: $\hat r(\omega) \ge 0$. Thus one can think of the variance as the integral over variances associated with each frequency:
\[ r(0) = \int_{-\infty}^{\infty} \hat r(\omega) \, \frac{d\omega}{2\pi} = \int_{0}^{\infty} \hat r(\omega) \, \frac{d\omega}{\pi}. \tag{13.20} \]
Since $X(t)$ is a stationary random process, there is no reason to believe that it would have a Fourier transform that is a function. However one can define the truncated Fourier transform
\[ \hat X_T(\omega) = \int_{-T}^{T} e^{-i\omega t} X(t) \, dt, \tag{13.21} \]
and this is a well-defined complex-valued random variable. Its variance is $E[|\hat X_T(\omega)|^2]$. There is no reason to expect this variance to have a limit as $T \to \infty$. However we shall consider the limit of the normalized variance $E[|\hat X_T(\omega)|^2]/(2T)$.
Theorem 13.4 Consider a mean zero stationary process $X(t)$ with integrable correlation function $r(t)$. Then the corresponding spectral density $\hat r(\omega) \ge 0$ for all $\omega$. This positivity has an interpretation in terms of the process. Namely, let $\hat X_T(\omega)$ be the truncated Fourier transform of the process. Then
\[ \frac{1}{2T} E[|\hat X_T(\omega)|^2] \to \hat r(\omega) \tag{13.22} \]
as $T \to \infty$.
This theorem proves that the spectral density function is positive. Furthermore, it gives an interpretation of the spectral density function as a limit of the normalized variance of the truncated Fourier transform $\hat X_T(\omega)$ as $T \to \infty$.
Proof: We can calculate the normalized variance
\[ \frac{1}{2T} E[|\hat X_T(\omega)|^2] = \frac{1}{2T} \int_{-T}^{T} \int_{-T}^{T} e^{-i\omega(t - s)} r(t - s) \, dt \, ds. \tag{13.23} \]
Change to variables $u = t - s$ and $v = t + s$. We obtain
\[ \frac{1}{2T} E[|\hat X_T(\omega)|^2] = \frac{1}{4T} \int_{-2T}^{2T} \int_{-(2T - |u|)}^{2T - |u|} e^{-i\omega u} r(u) \, dv \, du. \tag{13.24} \]
This gives the explicit formula
\[ \frac{1}{2T} E[|\hat X_T(\omega)|^2] = \int_{-2T}^{2T} \left(1 - \frac{|u|}{2T}\right) e^{-i\omega u} r(u) \, du. \tag{13.25} \]
For each $u$ the limit of the integrand as $T \to \infty$ is just $e^{-i\omega u} r(u)$. Hence, by dominated convergence (the integrand is dominated by the integrable function $|r(u)|$), the limit of the integrals is
\[ \lim_{T \to \infty} \frac{1}{2T} E[|\hat X_T(\omega)|^2] = \int_{-\infty}^{\infty} e^{-i\omega u} r(u) \, du = \hat r(\omega). \tag{13.26} \]
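The convergence (13.22) can also be seen numerically from the triangular-window formula (13.25). This sketch (Python with NumPy) uses the Ornstein-Uhlenbeck covariance $r(u) = e^{-\lambda |u|}$ as a hypothetical test case, for which $\hat r(\omega) = 2\lambda/(\lambda^2 + \omega^2)$:

```python
import numpy as np

lam, w = 1.0, 0.7
r = lambda u: np.exp(-lam * np.abs(u))   # OU covariance with r(0) = 1
r_hat = 2 * lam / (lam**2 + w**2)        # its spectral density

for T in (1.0, 10.0, 100.0):
    u = np.linspace(-2 * T, 2 * T, 200_001)
    du = u[1] - u[0]
    integrand = (1 - np.abs(u) / (2 * T)) * np.exp(-1j * w * u) * r(u)
    approx = np.sum(integrand).real * du  # the right side of (13.25)
    print(T, approx, r_hat)               # approx tends to r_hat as T grows
```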
13.6 Frequency response functions
Let $Y(t)$ be a stationary process. Let $h(t)$ be a real function, which we call the impulse response function. Then the convolution
\[ X(t) = \int_{-\infty}^{\infty} h(t - u) Y(u) \, du \tag{13.27} \]
is another stationary process. If the first process has covariance function $r_Y$, the new process has covariance function $r_X$. We can express $r_X$ in terms of $r_Y$ by the formula
\[ r_X(t - s) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(t - u) h(s - v) r_Y(u - v) \, du \, dv. \tag{13.28} \]
This is a messy integral. However consider the Fourier transform $\hat h(\omega)$, which we call the frequency response function. Then it is easy to calculate that
\[ \hat r_X(\omega) = |\hat h(\omega)|^2 \hat r_Y(\omega). \tag{13.29} \]
Thus the covariance of the new process has a very simple description in the frequency domain.
Let $h(t)$ be an impulse response function that is causal, that is, $h(t) = 0$ for $t < 0$. Then the relation between the processes is
\[ X(t) = \int_{-\infty}^{t} h(t - u) Y(u) \, du. \tag{13.30} \]
We can now express $r_X$ in terms of $r_Y$ by the formula
\[ r_X(t - s) = \int_{-\infty}^{t} \int_{-\infty}^{s} h(t - u) h(s - v) r_Y(u - v) \, du \, dv. \tag{13.31} \]
What is causality in terms of the frequency response function? In that case
\[ \hat h(\omega) = \int_{0}^{\infty} e^{-i\omega t} h(t) \, dt. \tag{13.32} \]
The special feature now is that one is allowed to take $\omega$ complex, with negative imaginary part, and the integral still converges, in fact very rapidly. Write $\omega = \omega_1 + i\omega_2$ with $\omega_2 < 0$. Then for $t \ge 0$
\[ |e^{-i\omega t}| = e^{\omega_2 t} \le 1 \tag{13.33} \]
is not only bounded by one, it also decreases exponentially as $t \to \infty$. So the integral is very rapidly convergent.
Now look at the case where the input is white noise. Let $h$ be a square integrable function. Then the general (not necessarily causal) convolution
\[ X(t) = \int_{-\infty}^{\infty} h(t - u) \, dW(u) \tag{13.34} \]
is a stationary process. This has covariance function $r_X$. We can express $r_X$ by
\[ r_X(t) = \sigma^2 \int_{-\infty}^{\infty} h(t + u) h(u) \, du. \tag{13.35} \]
Again this is simpler when expressed in terms of the frequency response function. We see that
\[ \hat r_X(\omega) = |\hat h(\omega)|^2 \sigma^2. \tag{13.36} \]
The white noise has spectral density $\sigma^2$, a constant. Since all frequencies contribute equally, this is a justification for the terminology white noise.
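This relation can be checked by filtering simulated white noise. The sketch below (Python with NumPy; the filter, the parameters, and the grid are hypothetical choices) pushes Gaussian white noise through the causal exponential filter $h(t) = e^{-\lambda t}$ and compares the averaged, normalized periodogram with $\sigma^2 |\hat h(\omega)|^2 = \sigma^2/(\omega^2 + \lambda^2)$, anticipating the Ornstein-Uhlenbeck example below:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, sigma, dt, n, reps = 1.0, 1.0, 0.05, 4096, 200

acc = np.zeros(n)
for _ in range(reps):
    dW = sigma * np.sqrt(dt) * rng.standard_normal(n)
    X = np.zeros(n)
    for k in range(n - 1):               # causal filter h(t) = e^{-lam t}
        X[k + 1] = np.exp(-lam * dt) * X[k] + dW[k]
    XT = np.fft.fft(X) * dt              # truncated Fourier transform
    acc += np.abs(XT)**2 / (n * dt)      # |X_T(w)|^2 / (2T) with 2T = n dt
spec = acc / reps

w = 2 * np.pi * np.fft.fftfreq(n, d=dt)
print(spec[1:5])                          # empirical spectral density
print(sigma**2 / (w[1:5]**2 + lam**2))    # prediction sigma^2 |h_hat|^2
```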
Let $h(t)$ be an impulse response function that is causal, that is, $h(t) = 0$ for $t < 0$. Then the relation between the processes is
\[ X(t) = \int_{-\infty}^{t} h(t - u) \, dW(u). \tag{13.37} \]
We can now express $r_X(t)$ for $t \ge 0$ by
\[ r_X(t) = \sigma^2 \int_{0}^{\infty} h(t + u) h(u) \, du. \tag{13.38} \]
The special feature of the frequency response function $\hat h(\omega)$ in the causal case is again that it is defined for all $\omega$ in the lower half plane.
Example: We have the standard example of the Ornstein-Uhlenbeck process. This is given by
\[ X(t) = \int_{-\infty}^{t} e^{-\lambda(t - s)} \, dW(s). \tag{13.39} \]
Thus the impulse response function is $h(t) = e^{-\lambda t}$ for $t \ge 0$ and zero elsewhere. The corresponding covariance function is
\[ r_X(t) = \sigma^2 \int_{0}^{\infty} e^{-\lambda(t + s)} e^{-\lambda s} \, ds = \frac{\sigma^2}{2\lambda} e^{-\lambda t} \tag{13.40} \]
for $t \ge 0$. The frequency response function is
\[ \hat h(\omega) = \frac{1}{i\omega + \lambda}, \tag{13.41} \]
which is defined for all $\omega$ in the lower half plane. The corresponding spectral density is
\[ \hat r_X(\omega) = \sigma^2 |\hat h(\omega)|^2 = \sigma^2 \frac{1}{\omega^2 + \lambda^2} = r(0) \frac{2\lambda}{\omega^2 + \lambda^2}. \tag{13.42} \]
In the physics literature this is called Lorentzian. Recall that integrals involving $\omega$ involve the weight $d\omega/(2\pi)$. So the Lorentzian in the literature is often
\[ \frac{\hat r_X(\omega)}{2\pi} = r(0) \frac{1}{\pi} \frac{\lambda}{\omega^2 + \lambda^2}. \tag{13.43} \]
13.7 Causal response functions
It is interesting to ask when a correlation function $r(t)$ comes from convolving a causal impulse response function with white noise. This is equivalent to the representation
\[ r(t) = \sigma^2 \int_{0}^{\infty} h(t + s) h(s) \, ds \tag{13.44} \]
for $t \ge 0$. Here $h(t)$ is a square-integrable function such that $h(t) = 0$ for all $t < 0$.
There is an interesting theorem that describes when this happens. To describe this theorem, we need a preliminary definition. Take $\epsilon > 0$ and define the approximate delta function $\delta_\epsilon$ by
\[ \delta_\epsilon(\omega) = \frac{1}{\pi} \frac{\epsilon}{\omega^2 + \epsilon^2}. \tag{13.45} \]
This function has integral one and is very peaked near zero when $\epsilon$ is small.
Theorem 13.5 Let $r(t)$ be a covariance function such that the corresponding spectral density $\hat r(\omega)$ has a logarithm $\log \hat r(\omega)$ such that for each real $\omega$ and for each real $\epsilon > 0$ the convolution
\[ (\delta_\epsilon * \log \hat r)(\omega) = \int_{-\infty}^{\infty} \delta_\epsilon(\omega - \nu) \log \hat r(\nu) \, d\nu > -\infty. \tag{13.46} \]
Then $r(t)$ may be represented by a square integrable causal impulse response function.
The hypothesis of the theorem ensures that $\hat r(\omega)$ is never too close to zero. Thus it cannot be zero on an interval, and it cannot approach zero too rapidly at infinity. Otherwise $\log \hat r(\omega)$ is $-\infty$ on an interval, or approaches $-\infty$ rapidly at infinity. This would in turn make the convolution integral diverge to $-\infty$.
It is only possible to give an indication of the proof of the theorem. The condition that $r(t)$ may be represented in this way may be stated in terms of the corresponding frequency response function. Recall that $\hat r(\omega)$ is even and positive. Furthermore $\hat h(\omega)$ satisfies $\hat h(-\omega) = \overline{\hat h(\omega)}$. The representation takes the form
\[ \hat r(\omega) = \sigma^2 |\hat h(\omega)|^2. \tag{13.47} \]
The causality condition is that $\hat h(\omega)$ may be extended to all $\omega$ with negative imaginary part. Another way to write this representation is
\[ \hat r(\omega) = \sigma^2 \hat h(\omega) \overline{\hat h(\omega)}. \tag{13.48} \]
Thus to see whether the correlation $r(t)$ comes from a causal convolution with white noise, the task is to see whether the spectral density $\hat r(\omega)$ is a product of a function defined in the lower half plane with its complex conjugate.
Sums are easier than products. So the idea is to express
\[ \hat r(\omega) = \exp(\log \hat r(\omega)) \tag{13.49} \]
and try to write $\log \hat r(\omega)$ as the sum of a function with its complex conjugate. However under the hypothesis of the theorem, we can write this as a limit using
\[ (\delta_\epsilon * \log \hat r)(\omega) \to \log \hat r(\omega) \tag{13.50} \]
as $\epsilon \to 0$.
Now the trick is to use the decomposition
\[ \delta_\epsilon(\omega) = \frac{1}{2\pi i} \frac{1}{\omega - i\epsilon} - \frac{1}{2\pi i} \frac{1}{\omega + i\epsilon} \tag{13.51} \]
of the approximate delta function as the sum of two complex conjugates. Then we define
\[ \sigma \hat h(\omega - i\epsilon) = \exp\left( \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{1}{\omega - i\epsilon - \nu} \log \hat r(\nu) \, d\nu \right). \tag{13.52} \]
Then
\[ \sigma \overline{\hat h(\omega - i\epsilon)} = \exp\left( -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{1}{\omega + i\epsilon - \nu} \log \hat r(\nu) \, d\nu \right). \tag{13.53} \]
It follows that
\[ \sigma^2 |\hat h(\omega - i\epsilon)|^2 = \sigma^2 \hat h(\omega - i\epsilon) \overline{\hat h(\omega - i\epsilon)} = \exp\left( \int_{-\infty}^{\infty} \delta_\epsilon(\omega - \nu) \log \hat r(\nu) \, d\nu \right). \tag{13.54} \]
Take $\epsilon \to 0$. In the limit we get
\[ \sigma^2 |\hat h(\omega)|^2 = \exp(\log \hat r(\omega)) = \hat r(\omega). \tag{13.55} \]
Example: Take the Ornstein-Uhlenbeck example with $\lambda > 0$. Here the impulse response function is causal and given by
\[ h(t) = e^{-\lambda t} \tag{13.56} \]
for $t \ge 0$. Therefore the frequency response function
\[ \hat h(\omega) = \frac{-i}{\omega - i\lambda} = \frac{1}{i\omega + \lambda} \tag{13.57} \]
extends to the lower half plane. The spectral density is
\[ \hat r(\omega) = \sigma^2 \frac{1}{\omega^2 + \lambda^2}. \tag{13.58} \]
The corresponding correlation function is
\[ r(t) = \frac{\sigma^2}{2\lambda} e^{-\lambda |t|}. \tag{13.59} \]
Example: Take the example
\[ r(t) = \frac{c^2}{c^2 + t^2}. \tag{13.60} \]
The corresponding spectral density is
\[ \hat r(\omega) = \pi c \, e^{-c|\omega|}. \tag{13.61} \]
The convolution integral (13.46) then diverges to $-\infty$, since $\log \hat r(\nu)$ decreases linearly at infinity, and so there is no causal representation. The problem is that there are not enough high frequencies in the spectral density. The time correlations are too smooth to have a causal explanation.
See the book Gaussian Processes, Function Theory, and the Inverse Spectral
Problem, by Dym and McKean. It has much more on this subject.
13.8 White noise
The reason for the term white noise is that the spectral density of the white noise process is a constant $\sigma^2$. This is the density with respect to $d\omega/(2\pi)$, so the contribution to the variance from any frequency interval of fixed length is the same, the integral of $\sigma^2 \, d\omega/(2\pi)$ over the interval. All frequencies are equally represented, hence the term white. Of course there is no such process, at least not if we require it to be realized in the form of a function of $t$.
We can approximate white noise $dW(t)/dt$ by difference quotients of the Wiener process. Take $a > 0$ and define
\[ X(t) = \frac{W(t) - W(t - a)}{a}. \tag{13.62} \]
This is obtained as a Wiener stochastic integral by using a causal impulse response function $h(t)$ that is $1/a$ on the interval from zero to $a$. The corresponding frequency response function is
\[ \hat h(\omega) = \frac{1 - e^{-i\omega a}}{i\omega a}. \tag{13.63} \]
The corresponding spectral density of the $X(t)$ process is
\[ \hat r(\omega) = \sigma^2 |\hat h(\omega)|^2 = \sigma^2 \frac{4 \sin^2(\omega a / 2)}{a^2 \omega^2}. \tag{13.64} \]
Thus the correlation function of the $X(t)$ process is
\[ r(t) = \sigma^2 \frac{1}{a} \left(1 - \frac{|t|}{a}\right) \tag{13.65} \]
for $|t| \le a$, and zero elsewhere.
In applications people do not often worry about whether their process is really white noise for all frequencies, just over a sufficiently large range of frequencies. So, for instance, if we take $a$ to be small, then as long as $\omega$ is much less than $\pi/a$, the spectral density for the $X(t)$ process will be quite uniform over this range. So this might be a model for an actual white noise process in an experimental situation.
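One can quantify this flatness by evaluating (13.64) directly; here is a tiny sketch (Python with NumPy, with a hypothetical value of $a$):

```python
import numpy as np

sigma, a = 1.0, 0.01
w = np.array([1.0, 10.0, 100.0, np.pi / a])
spec = sigma**2 * 4 * np.sin(w * a / 2)**2 / (a**2 * w**2)
# Close to sigma^2 for w << pi/a; down to (4/pi^2) sigma^2 at w = pi/a.
print(spec / sigma**2)
```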
13.9 1/f noise
The frequency $\omega$ that we have been using up to now is actually the angular frequency $\omega = 2\pi f$, where $f$ is the frequency. The frequency $f$ tells how many complete oscillations per second; the angular frequency $\omega$ describes how many radians per second.
The subject of 1/f noise (or flicker noise) is more mysterious. This refers to experimental situations where the spectral density at low frequencies $\omega$ is approximately proportional to $1/\omega$. The paradox is that $1/\omega$ is not integrable near $\omega = 0$, and so the total variance would be infinite.
One explanation of 1/f noise is that it is noise due to the additive effect of independent processes on various time scales. Think of $\lambda$ as determining a time rate (so that $1/\lambda$ is a time duration). The Ornstein-Uhlenbeck process has a fixed rate $\lambda$, so
\[ r(t) = r(0) e^{-\lambda |t|} \tag{13.66} \]
and
\[ \hat r(\omega) = r(0) \frac{2\lambda}{\lambda^2 + \omega^2} \tag{13.67} \]
are determined by this parameter. A process with a wide variety of rates might be determined by
\[ \hat r(\omega) = \int_{0}^{\infty} p(\lambda) \frac{2\lambda}{\lambda^2 + \omega^2} \, d\lambda. \tag{13.68} \]
Take the case when $p(\lambda) = c/\lambda$ for $0 < a \le \lambda \le b < \infty$ and is zero elsewhere. The $a$ is a lower cutoff on the rate of decay, and $b$ is an upper cutoff. Notice that this gives a lot of weight to slow processes with a decay rate near $a$. This gives
\[ \hat r(\omega) = \int_{a}^{b} \frac{2c}{\lambda^2 + \omega^2} \, d\lambda. \tag{13.69} \]
This simplifies to
\[ \hat r(\omega) = 2c \int_{a}^{b} \frac{1}{\lambda^2 + \omega^2} \, d\lambda = \frac{1}{\omega} \, 2\pi c \int_{a}^{b} \delta_\omega(\lambda) \, d\lambda, \tag{13.70} \]
where $\delta_\omega$ is the approximate delta function of (13.45) with width $\omega$. If $0 < a$ is small compared to $\omega$, and if $b$ is large compared to $\omega$, then for this range of $\omega$ the integral is approximately equal to $1/2$. Thus in this range
\[ \hat r(\omega) \approx \frac{\pi c}{\omega}. \tag{13.71} \]
So this provides a possible covariance that could simulate 1/f noise over a large range. Of course this is just an empirical fit, not an explanation. The expression above shows that there can be a covariance that gives 1/f noise over some range; it does not give the underlying mechanism.
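The $1/\omega$ behavior between the cutoffs is easy to verify by quadrature. This sketch (Python with NumPy; the constant $c$, the cutoffs, and the grid are hypothetical) evaluates (13.69) numerically and compares it with the approximation $\pi c/\omega$:

```python
import numpy as np

c, a, b = 1.0, 1e-3, 1e3               # weight constant and rate cutoffs
lams = np.geomspace(a, b, 20_001)      # log-spaced grid over [a, b]

def spec(w):
    # r_hat(w) = int_a^b 2c/(lam^2 + w^2) dlam by trapezoidal quadrature.
    vals = 2 * c / (lams**2 + w**2)
    return np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(lams))

for w in (0.1, 1.0, 10.0):             # frequencies well inside (a, b)
    print(w, spec(w), np.pi * c / w)   # quadrature vs. the 1/w approximation
```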
If we want to see what this looks like in terms of time correlations, we can write
\[ r(t) = \int_{a}^{b} \frac{c}{\lambda} e^{-\lambda |t|} \, d\lambda. \tag{13.72} \]
This can also be written
\[ r(t) = \int_{a|t|}^{b|t|} \frac{c}{s} e^{-s} \, ds = c\left[ \log(b|t|) e^{-b|t|} - \log(a|t|) e^{-a|t|} + \int_{a|t|}^{b|t|} \log(s) e^{-s} \, ds \right]. \tag{13.73} \]
If $a|t|$ is much smaller than one and $b|t|$ is much larger than one, then this becomes
\[ r(t) \approx c\left[ -\log(a|t|) - \gamma \right]. \tag{13.74} \]
Here $\gamma$ is the integral
\[ \gamma = -\int_{0}^{\infty} \log(s) e^{-s} \, ds. \tag{13.75} \]
The expression for the covariance shows that for this range of time the correlation decreases with time at a logarithmic rate. This is a very slow rate of decrease.
Much more information about 1/f noise may be found in Kogan's book on Electronic Noise and Fluctuations in Solids, and elsewhere.
13.10 Problems
1. HPS, Chapter 6, Problem 21
2. HPS, Chapter 6, Problem 22