
Stat333 Lecture Notes

Applied Probability Theory

Jiahua Chen
Department of Statistics and Actuarial Science
University of Waterloo
© Jiahua Chen
Fall, 2003

Course Outline

Stat333
Review of basic probability. Generating functions and their applications.
Simple random walk, branching process and renewal events. Discrete time
Markov chain. Poisson process and continuous time Markov chain. Queueing
theory and renewal processes.

Contents

1 Introduction                                              1
  1.1 Probability Model                                     1
  1.2 Conditional Probabilities and Independence            3
  1.3 Bayes Formula                                         4
  1.4 Key Facts                                             5
  1.5 Problems                                              5

2 Random Variables                                          7
  2.1 Random Variable                                       7
  2.2 Discrete Random Variables                             9
  2.3 Continuous Random Variables                          10
  2.4 Expectations                                         11
  2.5 Joint Distribution                                   12
  2.6 Independence                                         14
  2.7 Formulas for Expectations                            14
  2.8 Key Results and Concepts                             15
  2.9 Problems                                             16

3 Conditional Expectation                                  19
  3.1 Introduction                                         19
  3.2 Formulas                                             22
  3.3 Comment                                              24
  3.4 Problems                                             25

4 Generating Functions                                     29
  4.1 Introduction                                         29
  4.2 Probability Generating Functions                     32
  4.3 Convolution                                          34
      4.3.1 Key Facts                                      36
  4.4 The Simple Random Walk                               36
      4.4.1 First Passage Times                            38
      4.4.2 Returns to Origin                              40
      4.4.3 Some Key Results in the Simple Random Walk     41
  4.5 The Branching Process                                42
      4.5.1 Mean and Variance of Zn                        43
      4.5.2 Probability of Extinction                      44
      4.5.3 Some Key Results in the Branching Process      48
  4.6 Problems                                             49

5 Renewal Events                                           59
  5.1 Introduction                                         59
  5.2 The Renewal and Lifetime Sequences                   61
  5.3 Some Properties                                      64
  5.4 Delayed Renewal Events                               67
  5.5 Summary                                              69
  5.6 Problems                                             69

6 Discrete Time MC                                         73
  6.1 Introduction                                         73
  6.2 Chapman-Kolmogorov Equations                         80
  6.3 Classification of States                             82
  6.4 Limiting Probabilities                               89
  6.5 Mean Time Spent in Transient States                  95
  6.6 Problems                                             96

7 Exponential and Poisson                                 105
  7.1 Definition and Some Properties                      106
  7.2 Properties of Exponential Distribution              106
  7.3 The Poisson Process                                 109
      7.3.1 Inter-arrival and Waiting Time Distributions  112
  7.4 Further Properties                                  113
  7.5 Conditional Distribution of the Arrival Times       114
  7.6 Problems                                            116

8 Continuous Time Markov Chain                            119
  8.1 Birth and Death Process                             122
  8.2 Kolmogorov Differential Equations                   125
  8.3 Limiting Probabilities                              130
  8.4 Problems                                            134

9 Queueing Theory                                         139
  9.1 Cost Equations                                      139
  9.2 Steady-State Probabilities                          141
  9.3 Exponential Model                                   143
  9.4 Single Server                                       144
  9.5 Network of Queues                                   149
      9.5.1 Open System                                   149
      9.5.2 Closed Systems                                150
  9.6 Problems                                            154

10 Renewal Process                                        155
  10.1 Distribution of N(t)                               156
  10.2 Limiting Theorems and Their Applications           159
  10.3 Problems                                           161

11 Sample Exam Papers                                     165
  11.1 Quiz 1: Winter 2003                                165
  11.2 Quiz 2: Winter 2003                                167
  11.3 Final Exam: Winter 2003                            169

Chapter 1
Introduction

1.1 Probability Model

A probability model consists of three parts: a sample space, a collection of
events, and a probability measure.
Assume an experiment is to be done. The set of all possible outcomes is
called the Sample Space. For example, if we roll a die, {1, 2, 3, 4, 5, 6} is the
sample space. We use the notation S for the sample space. Every element of
S is called a sample point. Mathematically, the sample space is merely an
arbitrary set. There is no need of a corresponding experiment.
Roughly speaking, every subset of S is an event. The collection of events
is then all possible subsets of S. In some cases, however, we only admit a
specific class of subsets of S as events. We do not discuss this point in this
course.
For every event, we assign a probability to it. To make this meaningful,
we have to maintain some internal consistency. That is, the assignment is
required to have some properties. The following conditions are placed on
assigning probabilities.

Axioms of Probability Measure
A probability measure P is a function of events such that:
1. 0 ≤ P(E) ≤ 1 for any event E;
2. P(S) = 1;
3. P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) for any mutually exclusive events E_i,
i = 1, 2, . . ., i.e. E_i E_j = ∅ for all i ≠ j.

Mathematically, the above definition does not depend on the hypothetical
experiment. A probability model consists of a sample space S, a σ-algebra B
(a collection of subsets of S with some properties), and a probability measure
P.
The axioms for a probability model imply that the probability measure
has many other properties not explicitly stated as axioms. For example, since
P(∅) = P(∅ ∪ ∅) = P(∅) + P(∅), we must have P(∅) = 0.
Let E^c be the complement of event E, which consists of all sample points
which do not belong to E. Axioms 2 and 3 imply that
1 = P(S) = P(E ∪ E^c) = P(E) + P(E^c).
Hence, P(E^c) = 1 − P(E).
For any two events E_1 and E_2, we have
P(E_1 ∪ E_2) = P(E_1) + P(E_2) − P(E_1 E_2).
In general,
P(∪_{i=1}^n E_i) = Σ_i P(E_i) − Σ_{i_1 < i_2} P(E_{i_1} E_{i_2}) + · · · + (−1)^{n+1} P(E_1 E_2 · · · E_n).

Example 1.1
At a party, n men throw their hats into the centre of the room. Each man
then randomly picks up a hat. What is the probability that nobody gets his
own hat? What is the limit of this probability as n → ∞?
Solution: Let A_i = the event that the ith man gets his own hat, for
i = 1, 2, . . . , n. Then the event that nobody gets his own hat is [∪_i A_i]^c.
Note that
P(∪_i A_i) = n P(A_1) − (n choose 2) P(A_1 A_2) + · · · .
Using the classical definition of the probability measure (which satisfies the
three Axioms),
P(A_1) = (n − 1)!/n!;  P(A_1 A_2) = (n − 2)!/n!
and so on. We get
P(∪_i A_i) = 1 − 1/2! + 1/3! − · · · + (−1)^{n+1} (1/n!).
The answer to the question is then
1 − P(∪_i A_i) = 1 − [1 − 1/2! + 1/3! − · · · + (−1)^{n+1} (1/n!)].
The limit as n → ∞ is then exp(−1).
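The inclusion-exclusion sum above is easy to evaluate numerically; a minimal Python sketch (the function name is my own) shows how quickly it converges to exp(−1) ≈ 0.3679:

```python
from math import factorial

def prob_no_match(n):
    # 1 - P(union of A_i) = sum_{k=0}^{n} (-1)^k / k!  (inclusion-exclusion)
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

for n in (3, 5, 10):
    print(n, prob_no_match(n))
# The values approach exp(-1) = 0.36787... very quickly.
```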

1.2 Conditional Probabilities and Independence

Two events A and B are independent if and only if
P(AB) = P(A)P(B).
Some people may have a probabilistic instinct for why this relationship
describes independence, and why our intuitive notion of independence implies
this relationship. However, once the notion of independence is defined as
above, this relationship serves as our gold standard. We always try to verify
it, whether we work on assignment problems or on applications of the concept
of independence. For instance, to test whether being a smoker is independent
of having heart disease, we check whether the above relationship is true by
collecting data on these incidents.
A sequence of events A_1, . . . , A_n are independent of each other if and only
if
P(∩_{i∈I} A_i) = ∏_{i∈I} P(A_i)
for all subsets I of {1, 2, . . . , n}.
We would like to emphasize that pairwise independence does not imply
overall independence.

Let E and F be two events with P(F) > 0. We define the conditional
probability of E given F by
P(E|F) = P(EF)/P(F).
As already defined, two events E and F are independent if and only if
P(EF) = P(E)P(F). When events E and F are independent, we find
P(E|F) = P(E)
when P(F) > 0. However, we should not use this relationship as the
definition of independence. When P(F) = 0, the conditional probability is
not defined, but E and F can still be two independent events.

1.3 Bayes Formula

Let F_i, i = 1, 2, . . . , n, be mutually exclusive events such that ∪_i F_i = S, and
P(E) > 0. Then
P(F_k|E) = P(E F_k)/P(E) = P(E|F_k)P(F_k) / Σ_i P(E|F_i)P(F_i).
The Bayes formula is a mathematical consequence of the definition of
conditional probability. However, this formula has generated a lot of thinking
in statistics. We could think of E as an event (a subset of the sample space)
of some experiment to be done, while the F_i's classify the sample points of
the same experiment according to a possibly different rule (than the rule of
E). Somehow, E is readily observed, but the F_i's are not. Before the
experiment is done, we may have some prior information on what the
probabilities of the F_i's are. When the experiment is done and the outcome
(the sample point) is known to belong to E, but its membership in the F_i's
remains unknown, the Bayes formula allows us to update our assessment of
the chance of F_i in view of the occurrence of E. For example, before we toss
a die, it is known that the chance of observing 2 is 1/6. After a die is tossed,
if you are told that the outcome is an even number, then the conditional
probability becomes 1/3.
Here is a less straightforward example.

Example 1.2
There are three coins in a box: 1. two-headed; 2. fair; 3. biased with
P(H) = 0.75.
One of the coins is selected at random and flipped, and it shows a head.
What is the probability that it is the two-headed coin?
Solution: Let C_1, C_2 and C_3 represent the events that the two-headed,
fair or biased coin is selected, respectively. We want to find P(C_1|H).
P(C_1|H) = P(H|C_1)P(C_1) / Σ_{i=1}^3 P(H|C_i)P(C_i).
The answer is 4/9.
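The computation in Example 1.2 can be sketched in a few lines of Python (the variable names are mine):

```python
priors = [1/3, 1/3, 1/3]    # each coin equally likely to be selected
p_head = [1.0, 0.5, 0.75]   # P(H | two-headed), P(H | fair), P(H | biased)

# Bayes formula: P(C1 | H) = P(H | C1) P(C1) / sum_i P(H | Ci) P(Ci)
evidence = sum(p * q for p, q in zip(p_head, priors))
posterior = p_head[0] * priors[0] / evidence
print(posterior)  # 4/9 = 0.444...
```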

Remark: What matters is not memorizing the Bayes formula but
understanding the definition of conditional probability. Once you understand
conditional probability, you can work out the formula easily.

1.4 Key Facts

A probability space consists of three components: the sample space, the
collection of events, and the probability measure. The probability measure
satisfies three Axioms, from which we introduce the concepts of conditional
probability and independence. The Bayes theorem is a simple consequence
of manipulating the idea of conditional probability. However, the result has
incited philosophical debate in statistics.

1.5 Problems

1. Suppose that in an experiment, a fair die is rolled twice. Let A = {the
first outcome is even}, B = {the total score is 4}, C = the total score,
D = the absolute difference between the two scores.
(a) Which of A, B, C, D are events? Which of them are random
variables?
(b) Which of the following make sense? Which of them do not?
(i) A ∪ B, (ii) P(C), (iii) E(A), (iv) Var(D).

2. Let S be the sample space of a particular experiment, A and B be
events, and P be a probability measure. Which of the following are
Axioms, definitions and formulas?
(i) P(A ∪ B) = P(A) + P(B) − P(AB).
(ii) P(S) = 1.
(iii) P(A|B) = P(AB)/P(B) when P(B) ≠ 0.
3. Using only the axioms of probability, show that
1) P(A ∪ B) = P(A) + P(B) − P(AB);
2) P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) +
P(ABC).
4. a) Prove that P(ABC) = P(A|BC)P(B|C)P(C).
b) Prove that if A and B are independent, then so are A^c and B^c.
5. Let A and B be two events.
(a) Show that in general, if A and B are mutually exclusive, then they
are not necessarily independent.
(b) Find a particular pair of events A and B such that they are both
mutually exclusive and independent.
6. Prove Boole's inequalities:
(a) P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i),
(b) P(∩_{i=1}^n A_i) ≥ 1 − Σ_{i=1}^n P(A_i^c).

7. Let A_1 ⊇ A_2 ⊇ · · · be a sequence of events. If ∩_{i=1}^∞ A_i = ∅ (empty),
show that
lim_{n→∞} P(A_n) = 0.

Chapter 2
Random Variables

2.1 Random Variable

In practice, we may describe the outcomes of an experiment in any
terminology. For example, if Mary and Paul compete in a game, the outcomes
can be: Mary wins; Mary loses; it is a draw.
However, it is more convenient in mathematics to code the outcomes by
numbers. For example, we may define the outcome as 1 if Mary wins, as −1
if Mary loses, and as 0 if it is a draw. That is, we can transform the outcomes
in S into numbers. There are many ways to transform the outcomes.
In probability theory, we call the mechanism of transforming sample
points into numbers a Random Variable. More formally, we define a
random variable as a function on the sample space S.
We use capital letters X, Y , and so on for random variables.
In most applications, we focus mainly on the value of the function (the
random variable). That is why it appears that random variables are numbers,
rather than mechanisms of transforming sample points into numbers.
As a function, a random variable is totally deterministic. There is nothing
random about it. However, the inputs of this function are random, and this
fact implies that the outcome of the transformation is random. This is how
we get the notion that random variables are random.

Example 2.1


Let S be the ordered outcomes of rolling two fair dice. Define X to be the
sum of the two outcomes. If ω = (2, 5), which is a sample point, then
X(ω) = 7. Nothing is random.

Since in a specific experiment we are not certain in advance whether the
two outcomes will be ω = (2, 5), we hence do not know whether the outcome
of X will be 7. This gives us the illusion of X being random. Its randomness
is inherited from the randomness of the outcome in S.
When we use the notation X = 7, we often do not mean that the outcome
of X is 7 in a specific experiment. Rather, we define it as
{X = 7} = the set of sample points which make X equal 7.
Hence, in this example,
{X = 7} = {(1, 6), (2, 5), . . . , (6, 1)},
which is a subset of S. Consequently, it is an event. When the dice are fair,
the classical definition assigns a probability of 1/6 to this event.
If the dice are not fair, we usually assign a different value to it, or we
do not know what value is most suitable in this application. However, we
believe that a suitable value must exist, and it does not have any effect on
the definition of X.
There is another excuse for not focusing on the fact that a random variable
X is a function. We care more about probabilities associated with events of
the form {X ≤ x} than about how X maps S into real numbers. Once
P(X ≤ x) is available for all real numbers x, we then classify X according
to the form of this function, and ignore X itself.
Example 2.2
Toss a coin until the first head appears. Suppose that in each trial P(H) = p
and the trials are independent. Define X = the number of tosses when the
experiment is completed.
In this experiment, the sample space is
S = {H, TH, TTH, . . .}.
The corresponding values of X are
{1, 2, 3, . . .}.
We find
P(X = n) = p(1 − p)^{n−1}
for all n ≥ 1. Once this is done, we say X has a geometric distribution. How
this X is defined becomes irrelevant.
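As a sanity check on the geometric probability function above, one can verify numerically that the probabilities sum to one and that the mean is 1/p. A truncated-series sketch (not part of the original notes):

```python
p = 0.3
# P(X = n) = p (1 - p)**(n - 1) for n = 1, 2, ...; truncate the series far out.
probs = [p * (1 - p) ** (n - 1) for n in range(1, 300)]
total = sum(probs)
mean = sum(n * q for n, q in enumerate(probs, start=1))
print(total, mean)  # total is close to 1; mean is close to 1/p = 3.333...
```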

If X is a random variable, we call
F(x) = P(X ≤ x)
the cumulative distribution function (c.d.f.). It is known that F(x) is a
c.d.f. of some random variable in some probability model if and only if
1. 0 ≤ F(x) ≤ 1;
2. F(∞) = 1, F(−∞) = 0;
3. F(x) is monotone increasing and right continuous.
That is, we can construct a sample space together with a probability measure
and a random variable, so that the cumulative distribution function of this
random variable is given by F(x).

2.2 Discrete Random Variables

If the set of all possible outcomes of a random variable X is countable, then
we say that the random variable X is discrete.
For example, if a random variable can only take values in {0.2, 0.5, 2, π},
it is discrete. The discrete random variables most commonly seen in our
textbooks take integer values. However, we should remember that a discrete
random variable can take any values, as long as the number of possible values
remains countable.
By the way, the notion of countable needs to be clarified. If we can find
a one-to-one map from a set to a set of integers, then this set is countable.
The set of all even numbers is countable. The set of the numbers
{1, .1, .01, .001, . . .} is also countable. Being countable implies that we can
arrange the elements of the set into a sequence. We often represent a
countable set of real numbers as {x_1, x_2, . . .}.
If {t_1, t_2, t_3, . . .} is the set of possible outcomes of X, we call the function
f(t_i) = P(X = t_i)
the probability (mass) function (p.m.f.) of X.
Note that in this definition, I used the notation t_i for the possible values
of the random variable X. Although it is common practice to use the x_i's for
the possible values of the random variable X, this is not a requirement. It is
very important for us to make a distinction between (the notation for) the
possible values of X, and X itself.

2.3 Continuous Random Variables

If the c.d.f. of a random variable, F(x) = P(X ≤ x), can be written as
F(x) = ∫_{−∞}^x f(t) dt
for some non-negative f(t), we say X is absolutely continuous. We have
f(x) = dF(x)/dx
(for almost all x), and f(x) is called the density function of X.
We classify random variables according to their cumulative distribution
functions, probability functions or density functions. We usually do not mind
how these random variables are defined.
Example 2.3
1. X has a Binomial (n, p) distribution if f(i) = P(X = i) = (n choose i) p^i (1 − p)^{n−i}
for i = 0, 1, . . . , n.
2. X has a Poisson (λ) distribution if
f(i) = P(X = i) = λ^i exp(−λ)/i!
for i = 0, 1, 2, . . ..
3. X has a uniform [0, 1] distribution if F(x) = P(X ≤ x) = x for x ∈ [0, 1],
or f(x) = 1 for x ∈ [0, 1].
4. X has an exponential distribution with mean parameter θ if its c.d.f. is
given by F(x) = 1 − exp(−x/θ), or if its p.d.f. is given by
f(x) = θ^{−1} exp(−x/θ) for x ≥ 0.

Note that we do not have to specify the sample space, the probability
measure, or how the random variables are defined in the above example.
Two basic types of random variables have been introduced. In theory,
there is a third type of random variable. However, the third type is usually
not discussed in elementary probability courses. Notice that the sum of two
random variables is clearly another random variable. When we add a
continuous random variable to a discrete random variable, the new random
variable is neither discrete nor continuous. That is, we cannot always classify
a random variable into one of the three possible types. A measure theory
result states, however, that any random variable can be written as a linear
combination of three random variables, one of each type.

2.4 Expectations

A proper definition of the expectation of a random variable needs advanced
knowledge of real analysis. We give a handicapped definition as follows.
If X is discrete with possible values {x_0, x_1, x_2, . . .}, then we calculate
its expectation as
E(X) = Σ_{i=0}^∞ x_i P(X = x_i)
when the summation converges absolutely.
If X is (absolutely) continuous with density function f(x), then we
calculate its expectation as
E(X) = ∫_{−∞}^∞ t f(t) dt
when the integration converges absolutely.

When the convergence does not hold, we say the expectation does not
exist.
To calculate the expectation of any random variable, we should pay close
attention to the "if" part before we start. Many students lose the thread
because they ignore this part of the definition.

Example 2.4
Calculate the expectations of Binomial and Exponential random variables.
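Example 2.4 can be checked numerically by applying the two definitions directly: summation for the Binomial case, and a Riemann-sum approximation of the integral for the Exponential case. A sketch, with my own function names:

```python
from math import comb, exp

def binomial_mean(n, p):
    # E(X) = sum_i i P(X = i); the closed form is n p.
    return sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))

def exponential_mean(theta, dt=1e-4, t_max=50.0):
    # E(X) = integral of t f(t) dt with f(t) = exp(-t/theta)/theta; equals theta.
    steps = int(t_max / dt)
    return sum((k * dt) * exp(-(k * dt) / theta) / theta * dt for k in range(steps))

print(binomial_mean(10, 0.3))  # n p = 3, up to float rounding
print(exponential_mean(2.0))   # approximately theta = 2
```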

2.5 Joint Distribution

Let X and Y be two random variables. Note that it is possible to define two
functions on the same sample space. For example, suppose our sample space
is [0, 1] × [0, 1], the unit square. Every sample point can be represented as
(w_1, w_2). Let
X(w_1, w_2) = w_1,  Y(w_1, w_2) = w_2,
and assume the probability measure on [0, 1] × [0, 1] is uniform. Then both
X and Y have the uniform distribution. We find
P(X ≤ s, Y ≤ t) = st
when (s, t) ∈ [0, 1] × [0, 1].
Let Z be another random variable such that
Z(w_1, w_2) = 1 − w_1.
We find Z also has the uniform distribution. However,
P(X ≤ s, Z ≤ t) ≠ st
in general.
The moral of this example is: knowing the individual distributions of X,
Y and Z is not enough to tell their joint behaviour.
The joint random behaviour of two random variables X and Y is
characterized by their joint c.d.f., defined as
F(x, y) = P(X ≤ x, Y ≤ y).

The joint c.d.f. of more random variables is defined similarly.
Let us point out again that the lower case letters x, y are notation for
dummy variables. They do not have to be associated with the random
variables X and Y . That is, we may use
F(s, t) = P(X ≤ s, Y ≤ t)
to represent exactly the same joint c.d.f. It is the appearance of X, Y in the
definition that makes F the joint c.d.f. of X and Y .
The marginal c.d.f. of X or Y can be obtained by taking a limit:
F_X(s) = P(X ≤ s) = lim_{t→∞} F(s, t);
F_Y(y) = P(Y ≤ y) = lim_{s→∞} F(s, y).
Note that I used (s, t, y) on purpose. It is certainly not good practice,
but the point is, X does not have to be linked with x.
When both X and Y are discrete, it is more convenient to work with the
joint probability (mass) function:
f(x, y) = P(X = x, Y = y).
When there exists a non-negative function f(x, y) such that
F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(s, t) dt ds
for all real numbers (x, y), we say that X and Y are jointly (absolutely)
continuous and f(x, y) is their joint density function.
The marginal probability function (for the discrete case) can be obtained
as
f_X(x) = Σ_y f(x, y).
The marginal density function (for the continuous case) can be obtained as
f_X(x) = ∫_{−∞}^∞ f(x, y) dy.

2.6 Independence

If the joint c.d.f. of X and Y satisfies F(x, y) = F_X(x) F_Y(y) for all x, y,
then we say X and Y are independent.
When both X and Y are discrete, independence is equivalent to
f(x, y) = f_X(x) f_Y(y)
for all (x, y), where f(x, y) is the joint probability function. When X and Y
are jointly continuous, independence is equivalent to
f(x, y) = f_X(x) f_Y(y)
for almost all (x, y), where f(x, y) is the joint density function.

2.7 Formulas for Expectations

Let X and Y be two random variables. We define
Var(X) = E(X − E(X))^2 = E(X^2) − (EX)^2;
Cov(X, Y) = E[(X − EX)(Y − EY)].
It is known that
E(aX + bY) = aEX + bEY;
Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y),
where a, b are two real numbers (constants).
Let Z = X + Y be a newly created random variable. Its c.d.f. can be
derived from the joint c.d.f. of X and Y . This task is not always simple.
There are two special cases.
First, assume X and Y are independent and jointly continuous. Assume
that X has density function f(x) and Y has density function g(y). Then
we know that the joint density function is f(x, y) = f(x)g(y). The density
function of Z = X + Y is given by
f_Z(t) = ∫_{−∞}^∞ f(t − y) g(y) dy.

Second, assume X and Y are independent, take non-negative integer
values only, and have probability functions f(x) and g(y). (Note that the
notation looks the same as before.) The probability function of Z = X + Y
is
P(Z = n) = Σ_{i=0}^n f(i) g(n − i).
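The discrete convolution formula is easy to implement directly. The sketch below (function names mine) uses it to check the classical fact that the sum of independent Poisson(1) and Poisson(2) variables has a Poisson(3) distribution:

```python
from math import exp, factorial

def poisson(lam):
    # p.m.f. of a Poisson random variable with mean lam
    return lambda i: lam**i * exp(-lam) / factorial(i)

def convolve(f, g, n):
    # P(Z = n) = sum_{i=0}^{n} f(i) g(n - i), Z = X + Y, X and Y independent
    return sum(f(i) * g(n - i) for i in range(n + 1))

f, g, h = poisson(1.0), poisson(2.0), poisson(3.0)
for n in range(5):
    print(n, convolve(f, g, n), h(n))  # the two columns agree
```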

Example 2.5 Derivation of the distribution of X + Y .
1. Both X and Y have the exponential distribution with common density
f(x) = exp(−x) for x ≥ 0.
2. X and Y have Poisson distributions with means λ_1 and λ_2.

2.8 Key Results and Concepts

Random variables are real valued functions defined on the sample space.
Their randomness is a consequence of the randomness of the outcome from
the sample space. We classify them according to their cumulative distribution
functions or, equivalently, their probability mass functions or probability
density functions.
A discrete random variable takes at most a countable number of possible
values. An absolutely continuous random variable has a cumulative
distribution function which can be obtained from a density function by
integration. (Or roughly, its cumulative distribution function is
differentiable.) The third type of random variable is not discussed.
A random variable has, say, a Poisson distribution if its probability
function has the form
λ^n exp(−λ)/n!,  n = 0, 1, 2, . . . .
In general, the distribution of a random variable is named after the form of
its cumulative distribution function.
The mean, variance and moments of a random variable are determined by
its distribution. In many examples, they can be obtained by summation or
integration (to some students) easily. In other examples, the mean and
variance of a random variable can be obtained via its relationship to other
random variables. Thus, memorizing some formulas is useful.

2.9 Problems

1. If X and Y are two random variables, what do we mean by
(i) F(x) is the cumulative distribution function of X?
(ii) X ≤ 4 is independent of Y ≤ 2?

2. Let X be a random variable with the Binomial distribution with
parameters n = 3, p = 0.4, i.e.
p_X(k) = (3 choose k) (0.4)^k (1 − 0.4)^{3−k}
when k = 0, 1, 2, 3.
Let Y = (X − 1)^2.
(i) Let F_X(x) be the cumulative distribution function of X. Calculate
F_X(2.4).
(ii) Tabulate the probability function of Y .
(iii) Tabulate the probability function of X given Y = 1.
(iv) Tabulate E(X|Y ).

3. A random number N of fair dice is thrown. P(N = n) = 2^{−n}, n ≥ 1.
Let S be the sum of the scores. Find the probability that
a) N = 2 given S = 4;
b) S = 4 given N = 2;
c) S = 4 given N is even;
d) the largest number shown by any die is r.
4. A coupon is selected at random from a series of k coupons and placed
in each box of cereal. A house-husband has bought N boxes of cereal.
What is the probability that all k coupons are obtained? (Hint: consider
the event that the ith coupon is not obtained. The answer is in a nice
summation format.)

5. If birthdays are equally likely to fall in each of the twelve months of
the year, find the probability that all twelve months are represented
among the birthdays of 20 people selected at random.
(Hint: let A_i be the event that the ith month is not included and
consider A_1 ∪ A_2 ∪ · · · ∪ A_12.)
6. Let X be a random variable and g(·) be a real valued function.
(a) What do we mean by "X is discrete"?
(b) If X is a discrete random variable, argue that g(X) is also a random
variable, and discrete.
(c) If X is a continuous random variable, is g(X) necessarily a
continuous random variable? Why?

7. Let a and b be independent random variables uniformly distributed on
(0, 1). What is the probability that x^2 + ax + b = 0 has no real roots?
8. Express the distribution functions of
X^+ = max{0, X},  X^− = −min{0, X},  |X| = X^+ + X^−,
in terms of the distribution function F of the random variable X.

9. Is it generally true that E(1/X) = 1/E(X)? Is it ever true that
E(1/X) = 1/E(X)?

10. Suppose that 10 cards, of which 5 are red and 5 are green, are put
at random into 10 envelopes, of which 7 are red and 3 are green, so
that each envelope will contain a card. Determine the probability that
exactly k envelopes will contain a card with a matching colour (k = 0, 1,
. . . , 10).


Chapter 3
Conditional Distribution and
Expectations

3.1 Introduction

Suppose both X and Y are discrete and hence have a joint probability
function f(x, y). Then, we have
P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/f_Y(y).
Of course, this is meaningful only if P(Y = y) = f_Y(y) > 0.
When we pay no attention to the part "Y = y", this is a function of
x only. However, this function (or the way of crunching the number x and
reporting a number called a probability) is determined by X, Y and the
number y jointly. As a function of x, it is a probability function. Since it is
determined by X and Y = y, we say it is the conditional probability function
of X given Y = y. A commonly used notation is f_{X|Y}(x|y).
Example 3.1
There are two urns. The first contains 4 white and 6 black balls, and the
second contains 2 white balls and 8 black balls. An urn is selected randomly,
and then we randomly pick 5 balls from the urn (with replacement). Define
X = the number of white balls selected. What is the probability function of
X?
Solution: Consider the situations when different urns are selected. Define
Y = i if the ith urn is selected.
Let us work on the conditional probability functions first:
P(X = j|Y = 1) = (5 choose j) (.4)^j (.6)^{5−j}
and
P(X = j|Y = 2) = (5 choose j) (.2)^j (.8)^{5−j}
for j = 0, 1, . . . , 5.
The marginal probability function of X is given by
P(X = j) = (.5)[(5 choose j) (.4)^j (.6)^{5−j}] + (.5)[(5 choose j) (.2)^j (.8)^{5−j}].

As we have noticed, when Y = 1 is given, X has the binomial distribution
with n = 5, p = 0.4. This distribution has expectation 2. We use the notation
E(X|Y = 1) = 2.
In general, we define
E(X|Y = y) = Σ_x x P(X = x|Y = y),
where the sum is over all possible values of X.
Remark: Again, we should always first determine whether X is discrete.
If it is, then determine what the possible values of X are before this formula
is applied.
When both X and Y are discrete, E(X|Y = y) is well defined. There are
several components in this definition. Whenever we use a new value y, the
outcome will probably change. In the last example,
E(X|Y = 1) = 2,  E(X|Y = 2) = 1.

When we focus on the value of Y in this expression, we find we have a
function of y, defined as
ψ(y) = E(X|Y = y).
Just like a function such as g(y) = y^2, we know that ψ(Y) is also a random
variable. Thus, we might want to know the expectation of this new random
variable. It turns out that
E[ψ(Y)] = Σ_y ψ(y) P(Y = y)
        = Σ_y E(X|Y = y) P(Y = y)
        = Σ_y [Σ_x x P(X = x|Y = y)] P(Y = y)
        = Σ_{x,y} x P(X = x, Y = y)
        = E(X).
To be more concrete, we do not use ψ(Y) in textbooks, but write it as
E(X|Y) and call it the conditional expectation of X given Y . For those with
mathematical curiosity, we may write
E(X|Y) = E[X|Y = y]|_{y=Y}.
Hence, the above identity can be stated as
E[E(X|Y)] = E(X).
One intuitive interpretation of this result is: the grand average is the weighted
average of sub-averages. To find the average mark of students in stat230, we
may first calculate the average in each of 6 sections. Hence, we obtain 6
conditional expectations (conditioning on which section a student is in). We
then calculate the weighted average of section averages according to the size
of each section. This is the second expectation being applied on the left hand
side of the above formula.
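The identity E[E(X|Y)] = E(X) can be checked numerically on the urn example above; here is a quick sketch in Python, using the same mixture of Binomial(5, 0.4) and Binomial(5, 0.2) with equal weights:

```python
from math import comb

# Conditional pmfs from the urn example: Binomial(5, 0.4) and Binomial(5, 0.2),
# each urn chosen with probability 0.5.
def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1 - p) ** (n - j)

# E(X) computed directly from the marginal pmf of X
direct = sum(j * (0.5 * binom_pmf(5, 0.4, j) + 0.5 * binom_pmf(5, 0.2, j))
             for j in range(6))

# E(X) as the weighted average of conditional expectations E(X|Y = y)
tower = 0.5 * (5 * 0.4) + 0.5 * (5 * 0.2)

print(direct, tower)  # both equal 1.5
```

Both computations give 1.5: the grand average equals the weighted average of the sub-averages.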
It turns out that this concept applies to continuous random variables too. If (X, Y) are jointly continuous, we define the conditional density function of X given Y = y as

f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)}

where f(x, y) is the joint density, f_X and f_Y are the marginal density functions, and we assume that f_Y(y) is larger than zero.
The conditional expectation will then be defined as

E(X|Y = y) = \int x \frac{f(x, y)}{f_Y(y)} dx
which is again a function of y. The same argument implies we could define E(X|Y) in exactly the same way as before. It is easy to verify that

E[E(X|Y)] = E(X).

In fact, this equality is true regardless of the type of random variables (as long as they are properly defined). The only restriction is that all relevant quantities exist.

3.2 Formulas

Most formulas for ordinary expectation remain valid for the conditional expectation. For example,
E(aX + bY |Z) = aE(X|Z) + bE(Y |Z).
If g(\cdot) is a function, we have

E[g(Y)X | Y] = g(Y)E[X|Y]

as g(Y) is regarded as non-random given Y.
At last, we define

Var(X|Y) = E[(X - E(X|Y))^2 | Y].

Then

Var(X) = E[Var(X|Y)] + Var[E(X|Y)].

To show this, notice that

E[Var(X|Y)] = E{E[(X - E(X|Y))^2 | Y]}
            = E{E(X^2|Y) - [E(X|Y)]^2}
            = E(X^2) - E{[E(X|Y)]^2},

and

Var(E(X|Y)) = E{[E(X|Y)]^2} - [E{E(X|Y)}]^2
            = E{[E(X|Y)]^2} - [E(X)]^2.

Adding them up, we get the conclusion.
Example 3.2
A miner is trapped in a mine with 3 doors. If he uses the first door, he will
be free 2 hours later. If he uses the second, he will be back to the same spot
3 hours later. If he uses the third door, he will be back to the same spot 5
hours later. Assume that he does not have memory and will always pick a
door at random. What is the expected time it takes for him to get free?
Solution: Let X be the number of hours it takes until he gets free. We
are asked to calculate E(X).
It seems that the expectation is simpler if we know which door he selected
in the first place. For this reason, we define random variable Y to be the
door he selects in the first try.
Now it is simple to write down

E(X|Y = 1) = 2.

However, we only have

E(X|Y = 2) = 3 + E(X),  E(X|Y = 3) = 5 + E(X).

Even though it does not directly answer our question, we do have

E(X) = E[E(X|Y)] = \frac{1}{3}[2 + (3 + E(X)) + (5 + E(X))].

This is a simple linear equation; solving it, we find E(X) = 10.


Can we use the same idea to calculate Var(X)? It is seen that

Var(X|Y = 1) = 0;  Var(X|Y = 2) = Var(X|Y = 3) = Var(X).

Hence,

E[Var(X|Y)] = \frac{2}{3}Var(X),

Var[E(X|Y)] = \frac{1}{3}[2^2 + 13^2 + 15^2] - 10^2 = \frac{98}{3}.

Consequently, we find

Var(X) = \frac{2}{3}Var(X) + \frac{98}{3}

and hence Var(X) = 98.
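A short simulation (a sketch, not part of the original example) agrees with both answers:

```python
import random

def escape_time(rng):
    """One realization of the miner's escape: door 1 frees him after 2 hours;
    doors 2 and 3 return him to the same spot after 3 and 5 hours."""
    t = 0
    while True:
        door = rng.randint(1, 3)  # memoryless: a door at random each time
        if door == 1:
            return t + 2
        t += 3 if door == 2 else 5

rng = random.Random(0)
samples = [escape_time(rng) for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # close to the theoretical E(X) = 10 and Var(X) = 98
```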

Remark: We certainly do not believe that the miner will be memoryless. Such an example might be more useful to model a trapped mouse. We might be able to infer whether a mouse learns after repeating this experiment a number of times: we could compare the observed average with this theoretical average under the memoryless assumption. Any discrepancy may point to the possibility that the mouse is in fact learning.

3.3 Comment

It could be claimed that probability theory is a special case of measure theory in mathematics. However, the concepts of independence and
conditional expectation allow probability theory to be a separate scientific
discipline.
Our subsequent developments depend heavily on the use of conditional
expectation.

3.4 Problems

1. Let X be a random variable such that

   P(X = n) = p(1 - p)^n,  n = 0, 1, 2, . . .

   is its probability function and 0 < p < 1.
   (i) Show that P(X \geq k) = (1 - p)^k for k = 0, 1, 2, . . ..
   (ii) Prove the memoryless property:

   P(X \geq k_1 + k_2 | X \geq k_1) = P(X \geq k_2)

   for all non-negative integers k_1 and k_2.
   (iii) Calculate the probability that X is even.
2. Suppose X and Y are independent and exponentially distributed with parameter \lambda > 0. Their common probability density function is

   f(t) = \lambda e^{-\lambda t},  t \geq 0.

   (i) Calculate P(X > 5 | X > 3).
   (ii) Calculate P(X + Y \leq 1).
3. There are two TAs for a certain course. For a particular assignment
handed in, if it were marked by the first TA, the mark would be random
with mean 75% and variance (0.1)2 ; while if it were marked by the
second TA, the mark would be random with mean 70% and variance
(0.05)2 . The first TA has 40% chance to mark any single assignment.
Let X be the mark of the particular assignment. Calculate the mean
and variance of X.
4. Let X_1, X_2, X_3, . . . be independently distributed random variables such that X_n has probability mass function

   f_n(k) = P(X_n = k) = \binom{n}{k} p^k (1 - p)^{n-k},  k = 0, 1, . . . , n.


   (a) Find the probability generating function of X_n.
   (b) Find the probability generating function of X_1 + X_2 + X_3.
   (c) Let N be a positive integer valued random variable with probability generating function G(s) and assume it is independent of X_1, X_2, . . .. Find the probability generating function of X_N.
   (d) Continuing from (c), find the probability generating function of X_N + X_{N+1}.
5. An integer N is chosen from the geometric distribution with probability function

   f_N(n) = \theta(1 - \theta)^{n-1},  n = 1, 2, . . . ,

   where 0 < \theta < 1. Given N = n, X has the uniform distribution on 1, 2, . . . , n.
   a) Find the joint p.f. of X and N.
   b) Find the conditional p.f. of N given X = x.
6. The number of fish that Elise catches in a day is a Poisson random variable with mean 30. However, on the average, Elise tosses back two out of every three fish she catches. What is the probability that, on a given day, Elise takes home n fish? What are the mean and variance of
   (a) the number of fish she catches,
   (b) the number of fish she takes home?
   (What independence assumptions have you made?)
7. Let X_1, X_2, X_3 be independent random variables taking values in the positive integers and having probability function given by

   P(X_i = x) = (1 - p_i)p_i^{x-1}

   for x = 1, 2, . . . , and i = 1, 2, 3.
   (a) Show that

   P(X_1 < X_2 < X_3) = \frac{(1 - p_1)(1 - p_2)p_2 p_3^2}{(1 - p_2 p_3)(1 - p_1 p_2 p_3)}.

   (b) Find P(X_1 \leq X_2 \leq X_3).

8. Suppose that 13 cards are selected at random from a regular deck of 52 playing cards. (a) If it is known that at least one ace has been selected, what is the probability that at least two aces have been selected? (b) If it is known that the ace of hearts has been selected, what is the probability that at least two aces have been selected?
9. The number of children N in a randomly chosen family has mean \mu and variance \sigma^2. Each child is male with probability p independently, and X represents the number of male children in a randomly chosen family. Find the mean and variance of X.
10. Suppose we have ten coins which are such that if the ith one is flipped then heads will appear with probability i/10, i = 1, 2, . . . , 10. When one of the coins is randomly selected and flipped, it shows heads. What is the conditional probability that it was the fifth coin?


Chapter 4

Generating functions and their applications

4.1 Introduction

Suppose that {a_j} = {a_0, a_1, . . .} is a sequence of real numbers. If

A(s) = \sum_{j=0}^\infty a_j s^j = a_0 + a_1 s + a_2 s^2 + \cdots            (4.1)

converges in some interval |s| \leq s_0 where s_0 > 0, then A(s) is called the generating function of the sequence {a_j}_0^\infty. The generating function provides a convenient summary of a real number sequence. In many examples, simple and explicit expressions of A(s) can be obtained. This enables us to study the properties of {a_j}_0^\infty conveniently.
Example 4.1
The Fibonacci sequence {f_j} is defined by f_0 = 0, f_1 = 1 and the recursive relationship

f_j = f_{j-1} + f_{j-2},  j = 2, 3, . . . .            (4.2)

We use the tool of generating function to find explicit expressions of fj .



Solution: Multiplying (4.2) by s^j and summing over j gives

\sum_{j=2}^\infty f_j s^j = \sum_{j=2}^\infty f_{j-1} s^j + \sum_{j=2}^\infty f_{j-2} s^j.            (4.3)

Note the summation starts from j = 2 because (4.2) is valid only when j = 2, 3, . . .. By defining F(s) = \sum_{j=0}^\infty f_j s^j, we get

\sum_{j=2}^\infty f_j s^j = \sum_{j=0}^\infty f_j s^j - f_0 - f_1 s = F(s) - s.

With similar treatment on the right hand side of (4.3), we obtain

F(s) - s = sF(s) + s^2 F(s).            (4.4)

Ignoring the convergence issue for the moment, we find

F(s) = \frac{s}{1 - s - s^2}.

This is surely a simple and explicit generating function. To study other properties of the sequence, let us note that in general, a generating function has the Maclaurin series expansion

A(s) = A(0) + A'(0)s + A''(0)s^2/2! + \cdots

which by comparison with (4.1) gives

a_j = \frac{A^{(j)}(0)}{j!}.

This, of course, requires the function to be analytic at 0, which is true when A(s) converges in a neighbourhood of 0. An obvious conclusion is: real number sequences and generating functions have a one-to-one correspondence when the convergence and analyticity properties hold.
Now let us get back to the example. F(s) clearly converges at least for |s| \leq 0.5. This allows us to look for its Maclaurin series expansion. Note that

1 - s - s^2 = \left(1 - \frac{1+\sqrt{5}}{2}s\right)\left(1 - \frac{1-\sqrt{5}}{2}s\right)

and by the method of partial fractions

F(s) = \frac{1}{\sqrt{5}}\left[\sum_{j=0}^\infty \left(\frac{1+\sqrt{5}}{2}\right)^j s^j - \sum_{j=0}^\infty \left(\frac{1-\sqrt{5}}{2}\right)^j s^j\right].

Recalling the property of one-to-one correspondence,

f_j = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^j - \left(\frac{1-\sqrt{5}}{2}\right)^j\right],  j = 0, 1, 2, . . . .

It is interesting to note that

\lim_{j\to\infty} f_j/f_{j-1} = (1 + \sqrt{5})/2,

which is the golden ratio, to which the ancient Egyptians attributed many mystical qualities.
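The closed form read off from F(s) can be checked against the defining recursion directly (a quick sketch):

```python
from math import sqrt

def fib_closed(j):
    """Closed form for f_j obtained from F(s) = s/(1 - s - s^2)."""
    r = sqrt(5.0)
    return round((((1 + r) / 2) ** j - ((1 - r) / 2) ** j) / r)

# Check against the defining recursion f_j = f_{j-1} + f_{j-2}.
f = [0, 1]
for j in range(2, 30):
    f.append(f[-1] + f[-2])
assert all(fib_closed(j) == f[j] for j in range(30))
print(f[:10])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```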
In this example, the generating function has been used as a tool for solving the difference equation (4.2). Generating functions will be seen to be far more useful than just this. For example, if A(s) converges in |s| \leq s_0 with s_0 > 1, then

A(1) = \sum_{j=0}^\infty a_j,  A'(1) = \sum_{j=1}^\infty j a_j

and so on.
Example 4.2
Consider the following series:

a_j = 1, j = 0, 1, 2, . . . ;
b_j = 1/j!, j = 0, 1, 2, . . . ;
c_0 = 0, c_j = 1/j, j = 1, 2, . . . .

Easy calculation shows their corresponding generating functions are A(s) = (1-s)^{-1}, B(s) = e^s and C(s) = -\log(1-s), where the regions of convergence are |s| < 1 for A(s) and C(s), and all real s for B(s).


4.2 Probability Generating Functions

Let X be a random variable taking non-negative integer values with probability function {p_j}, where

p_j = P(X = j),  j = 0, 1, 2, . . . .

The generating function of {p_j} is called the probability generating function of X and we write

G(s) = G_X(s) = E(s^X) = p_0 + p_1 s + p_2 s^2 + \cdots.            (4.5)

Of course, this function provides a convenient summary of the probability function of X. Note that it converges at least for |s| \leq 1 since, for s in this interval,

\sum_{j=0}^\infty p_j |s|^j \leq \sum_{j=0}^\infty p_j = 1.

Using some mathematical tools, we can easily find

G'(1) = E(X) = \sum_{j=0}^\infty j p_j,  G^{(r)}(1) = E(X^{(r)}) = \sum_{j=0}^\infty j^{(r)} p_j

whenever the corresponding quantities exist. Otherwise, G^{(r)}(1) has to be replaced by \lim_{s\uparrow 1} G^{(r)}(s) and an infinite outcome is allowed. Note j^{(r)} = j(j-1)\cdots(j-r+1) and E(X^{(r)}) is the rth factorial moment of X. The variance of X can be expressed as

Var(X) = E(X^{(2)}) + E(X) - [E(X)]^2 = G''(1) + G'(1) - [G'(1)]^2.
Example 4.3
Suppose X has geometric distribution with parameter p so that

p_j = P(X = j) = p(1-p)^j,  j = 0, 1, 2, . . . .

The probability generating function of X is

G(s) = E(s^X) = \sum_{j=0}^\infty p(1-p)^j s^j = p[1 - (1-p)s]^{-1}

for |s| < (1-p)^{-1}. The mean of X is

E(X) = G'(1) = p(1-p)[1 - (1-p)s]^{-2}|_{s=1} = p^{-1} - 1.

From

E(X^{(2)}) = G''(1) = 2p(1-p)^2[1 - (1-p)s]^{-3}|_{s=1} = 2p^{-2}(1-p)^2,

we have

Var(X) = 2p^{-2}(1-p)^2 + p^{-1} - 1 - (p^{-1} - 1)^2 = p^{-2}(1-p).

Let us note that the definition of the geometric distribution differs from place to place. We should check which definition is in use before we start.
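These moment formulas can be checked by summing the series for the factorial moments directly (a sketch; the truncation at j = 2000 is far into the geometric tail):

```python
# Check E(X) = 1/p - 1 and Var(X) = (1-p)/p^2 for the geometric pmf
# p_j = p(1-p)^j by direct summation of the (truncated) series.
p = 0.3
pmf = [p * (1 - p) ** j for j in range(2000)]
mean = sum(j * pj for j, pj in enumerate(pmf))
second_fact = sum(j * (j - 1) * pj for j, pj in enumerate(pmf))  # E(X^{(2)}) = G''(1)
var = second_fact + mean - mean ** 2
print(mean, var)  # approximately 1/0.3 - 1 = 2.3333 and 0.7/0.09 = 7.7778
```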
Consider now the sequence of tail area probabilities {q_j} defined by

q_j = P(X > j) = p_{j+1} + p_{j+2} + \cdots,  j = 0, 1, 2, . . . .

Let Q(s) = \sum_{j=0}^\infty q_j s^j be the corresponding generating function and note that since q_j \leq 1 for all j, it follows that

\sum_{j=0}^\infty q_j s^j \leq \sum_{j=0}^\infty s^j = (1-s)^{-1}

for |s| < 1. Note that q_0 = 1 - p_0 and

q_j = q_{j-1} - p_j,  j = 1, 2, . . . .            (4.6)

Again, (4.6) is true for all j starting from 1. Multiplying (4.6) by s^j and summing over j we obtain

\sum_{j=1}^\infty q_j s^j = \sum_{j=1}^\infty q_{j-1} s^j - \sum_{j=1}^\infty p_j s^j

so that

Q(s) - (1 - p_0) = sQ(s) - [G(s) - p_0].

Thus, for all |s| < 1, we have that

Q(s) = \frac{1 - G(s)}{1 - s}.            (4.7)


Since G(1) = 1, it follows from (4.7) and the Mean Value Theorem in calculus that, for given |s| < 1, there exists s^* \in (s, 1) such that

Q(s) = G'(s^*).

It follows that

\lim_{s\uparrow 1} Q(s) = \lim_{s\uparrow 1} G'(s^*) = E(X).

Note, we have, in fact, proved

E(X) = \sum_{j=0}^\infty q_j.

As a way to check whether you understand the technique of obtaining Q(s),


please use the similar technique to find the generating function of P (X j).
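The identity E(X) = \sum_{j\geq 0} q_j can be checked numerically for the geometric distribution, whose tail probabilities are q_j = (1-p)^{j+1} (a quick sketch):

```python
# Check E(X) = sum_{j>=0} P(X > j) for the geometric pmf p_j = p(1-p)^j,
# whose tail probabilities are P(X > j) = (1-p)^{j+1}.
p = 0.25
mean_from_pmf = sum(j * p * (1 - p) ** j for j in range(2000))
mean_from_tails = sum((1 - p) ** (j + 1) for j in range(2000))
print(mean_from_pmf, mean_from_tails)  # both approximately 1/p - 1 = 3.0
```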

4.3 Convolutions and Sums of Independent Random Variables

Let {a_j} and {b_j} be two sequences of real numbers and let {c_j} be the sequence defined by

c_j = \sum_{l=0}^j a_l b_{j-l} = a_0 b_j + a_1 b_{j-1} + \cdots + a_j b_0,  j = 0, 1, 2, . . . .

The new sequence is called the convolution of {a_j} and {b_j}; we write

{c_j} = {a_j} * {b_j}.
Theorem 4.1
If A(s), B(s) and C(s) are the generating functions of {a_j}, {b_j} and {c_j} = {a_j} * {b_j} respectively, then (when they all exist at s)
C(s) = A(s)B(s).
Proof: Let b_j = 0 when j = -1, -2, . . .. Hence,

c_j = \sum_{l=0}^\infty a_l b_{j-l}.

Thus,

C(s) = \sum_{j=0}^\infty c_j s^j = \sum_{j=0}^\infty \sum_{l=0}^\infty a_l b_{j-l} s^j
     = \sum_{l=0}^\infty \sum_{j=0}^\infty a_l b_{j-l} s^j = \sum_{l=0}^\infty \left[a_l s^l \sum_{j=0}^\infty b_{j-l} s^{j-l}\right]
     = \sum_{l=0}^\infty [a_l s^l B(s)] = A(s)B(s).

If X and Y are two non-negative integer valued independent random variables and A(s) and B(s) are their probability generating functions, then the probability generating function of Z = X + Y is

C(s) = E(s^Z) = E(s^{X+Y}) = E(s^X)E(s^Y) = A(s)B(s).

Thus, the above theorem implies that the probability function of Z is the convolution of those of X and Y. Namely,

P(Z = j) = \sum_{l=0}^j P(X = l)P(Y = j - l),  j = 0, 1, 2, . . . ,

a fact we knew already.


However, the theorem is more useful than this. First, we may directly
identify the distribution of Z by the form of C(s) rather than by the form of
P (Z = j). Second, by expanding C(s), we can avoid the direct summation
to find P (Z = j).
Example 4.4
Assume X and Y are independent and have binomial distributions with parameters (n, p) and (m, p) respectively. Then
A(s) =

n
X

j=0

n
X
n j
n
p (1 p)nj sj =
(ps)j (1 p)nj = (1 p + ps)n ,
j
j
j=0


and B(s) = (1 - p + ps)^m. Hence, the probability generating function of X + Y is C(s) = A(s)B(s) = (1 - p + ps)^{m+n}. Thus, we know X + Y has binomial distribution right away, and it is simple to expand C(s) to find

P(X + Y = j) = \binom{m+n}{j} p^j (1-p)^{m+n-j},  j = 0, 1, . . . , m + n.
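The claim that convolving the two binomial probability functions gives the Binomial(m + n, p) probability function can be checked directly (a sketch; n = 4, m = 6, p = 0.3 are arbitrary choices):

```python
from math import comb

def binom_pmf(n, p):
    return [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]

def convolve(a, b):
    """The convolution {c_j} = {a_j} * {b_j} with c_j = sum_l a_l b_{j-l}."""
    c = [0.0] * (len(a) + len(b) - 1)
    for l, al in enumerate(a):
        for k, bk in enumerate(b):
            c[l + k] += al * bk
    return c

n, m, p = 4, 6, 0.3
conv = convolve(binom_pmf(n, p), binom_pmf(m, p))
direct = binom_pmf(n + m, p)
assert all(abs(x - y) < 1e-12 for x, y in zip(conv, direct))
```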

4.3.1 Key Facts

Suppose that X has probability generating function G(s). Then

E(X) = G'(s)|_{s=1},
Var(X) = [G''(s) + G'(s) - \{G'(s)\}^2]|_{s=1}.

If X and Y are independent random variables,

G_{X+Y}(s) = G_X(s)G_Y(s).

4.4 The Simple Random Walk

Let Z_1, Z_2, . . . be a sequence of independent random variables with P(Z_n = 1) = p and P(Z_n = -1) = q = 1 - p, 0 < p < 1. Let X_0 = 0 and X_n = X_{n-1} + Z_n for n \geq 1. The stochastic process {X_n}_{n=0}^\infty is called a simple random walk. By plotting X_n against n, we may obtain a figure as follows.
When used to model gambling, X_n would be the net winnings of a gambler after n games, where each game results in a gain of one dollar with probability p, or a loss of one dollar with probability q = 1 - p. When used to model the movement of a particle, X_n would be the location of the particle on a line after n unit times. If we use it to model the movement of a not so sober individual walking on a line, then X_n would be his position after n steps. This gives us the idea why this process has such a name.

[Figure: a sample path of the simple random walk, plotted as X_n against n.]
Here, we use generating functions to examine properties of the process {X_n}. Some quantities to be investigated are

u_n = P(X_n = 0)
f_n = P(X_1 \neq 0, . . . , X_{n-1} \neq 0, X_n = 0)
\lambda_n = P(X_1 < 1, . . . , X_{n-1} < 1, X_n \geq 1)
\lambda_n^{(r)} = P(X_1 < r, . . . , X_{n-1} < r, X_n \geq r)
\lambda_n^{(-r)} = P(X_1 > -r, . . . , X_{n-1} > -r, X_n \leq -r)

for n = 1, 2, . . . and r = 1, 2, . . ..
For convenience, we define u_0 = 1, f_0 = \lambda_0 = \lambda_0^{(r)} = \lambda_0^{(-r)} = 0. In the simple random walk as presented, Z_n can be either 1 or -1. Thus, it is impossible for X_{n-1} < r, X_n > r to occur for any n. We insist on using X_{n-1} < r, X_n \geq r instead of X_{n-1} < r, X_n = r in the definitions of \lambda_n^{(r)}. This has the advantage of retaining the same definition for more general random walks.
Each of these quantities represents the probability of a particular outcome
of the simple random walk after n trials. We summarize them in the following
table.
Symbol            Probability of
u_n               return to 0 at trial n
f_n               first return to 0 at trial n
\lambda_n         first passage through 1 at trial n
\lambda_n^{(r)}   first passage through r at trial n



Clearly, \lambda_n^{(1)} = \lambda_n. It is also easily seen that

u_{2n+1} = f_{2n+1} = 0;  \lambda_{2n} = 0

because, for example, it is impossible for an odd number of \pm 1's to sum to 0.
Since X_{2n} = \sum_{i=1}^{2n} Z_i, if X_{2n} = 0, we must have an equal number of Z_i being 1 and being -1. Thus, it is simple to find

u_{2n} = \binom{2n}{n}(pq)^n,  n = 1, 2, . . . .

You can try to verify that the generating function of {u_n} is given by

U(s) = (1 - 4pqs^2)^{-1/2}.
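The formula for u_{2n} can be checked by simulation (a sketch; the choices p = 1/2 and 2n = 10 are arbitrary):

```python
import random
from math import comb

# Empirical check of u_{2n} = C(2n, n)(pq)^n by simulating many walks.
p, n_steps, trials = 0.5, 10, 100_000
rng = random.Random(1)
hits = 0
for _ in range(trials):
    x = sum(1 if rng.random() < p else -1 for _ in range(n_steps))
    if x == 0:
        hits += 1
u_exact = comb(n_steps, n_steps // 2) * (p * (1 - p)) ** (n_steps // 2)
print(hits / trials, u_exact)  # both close to 0.2461
```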
To find the generating functions of the other sequences, let F(s), \Lambda(s), and \Lambda^{(r)}(s) be the generating functions of {f_n}, {\lambda_n} and {\lambda_n^{(r)}}. We will obtain them through some difference equations.

4.4.1 First Passage Times

The first thing we do is to find a relationship between \lambda_n and \lambda_n^{(2)}. Note that for the random walk to reach 2 at trial n, it has to reach 1 at some time between trial 1 and n. Let k be the first time when the walk reaches 1, after which it will reach 2 at trial n. Thus, this event can equivalently be described as

A_k = {X_1 - X_0 < 1, X_2 - X_0 < 1, . . . , X_{k-1} - X_0 < 1, X_k - X_0 = 1}
      \cap {X_{k+1} - X_k < 1, . . . , X_{n-1} - X_k < 1, X_n - X_k = 1} = B_{0k} \cap B_{kn}

where B_{0k} and B_{kn} are independent events. Clearly, P(B_{0k}) = \lambda_k and P(B_{kn}) = \lambda_{n-k}. Thus, we have

\lambda_n^{(2)} = P(\cup_{k=1}^{n-1} A_k) = \sum_{k=1}^{n-1} \lambda_k \lambda_{n-k} = \sum_{k=0}^n \lambda_k \lambda_{n-k}


since \lambda_0 = 0. Note this identity is still true even when n = 0. Therefore, we have found {\lambda_n^{(2)}} = {\lambda_n} * {\lambda_n} (convolution) and \Lambda^{(2)}(s) = [\Lambda(s)]^2. In like manner,

\Lambda^{(r)}(s) = [\Lambda(s)]^r,  r = 2, 3, . . . .

Although the above relationship is neat, we cannot solve it to obtain an explicit expression for \lambda_n yet. Let us work on another relationship between {\lambda_n^{(2)}} and {\lambda_n}. It is obvious that \lambda_1 = p. If the first passage through 1 occurs at trial n with n > 1, it requires Z_1 = X_1 = -1. After that, it requires the simple random walk to gain a value of 2 in exactly n - 1 steps. Thus

\lambda_n = q\lambda_{n-1}^{(2)},  n = 2, 3, . . . .            (4.8)

Multiplying both sides of (4.8) by s^n and summing over n with care over its range, we have

\sum_{n=2}^\infty \lambda_n s^n = q \sum_{n=2}^\infty \lambda_{n-1}^{(2)} s^n.

We find

\Lambda(s) - ps = qs\Lambda^{(2)}(s) = qs[\Lambda(s)]^2

from the first relationship.


It is easy to find the two possible forms:

\Lambda(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2qs}.

When s \to 0, we should have \Lambda(s) \to 0, so we must have

\Lambda(s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2qs} = -(2qs)^{-1} \sum_{j=1}^\infty \binom{1/2}{j} (-4pqs^2)^j

where the binomial expansion has been used. From this we find \lambda_{2n} = 0 and

\lambda_{2n-1} = -(2q)^{-1} \binom{1/2}{n} (-4pq)^n = (2n-1)^{-1} \binom{2n-1}{n} p^n q^{n-1},  n = 1, 2, . . . .

The generating function \Lambda(s) will tell us more about the simple random walk. Since

\Lambda(s) = \sum_{n=0}^\infty \lambda_n s^n,



\Lambda(1) = \lambda_0 + \lambda_1 + \lambda_2 + \cdots
           = P(first passage through 1 ever occurs)
           = (1 - \sqrt{1 - 4p + 4p^2})/2q = (1 - |p - q|)/2q
           = \begin{cases} 1 & p \geq q \\ p/q & p < q. \end{cases}

The walk is certain to pass through 1 when p > q, or even when p = q = 1/2.
If p \geq q, we may define the random variable N which is the waiting time until the first passage through 1 occurs. That is,

N = \min\{n : X_n = 1\}

and we know, in this case, that P(N < \infty) = 1. Since P(N = j) = \lambda_j, the probability generating function of N is \Lambda(s) and

E(N) = \Lambda'(1^-) = \begin{cases} (p - q)^{-1} & p > q \\ \infty & p = q. \end{cases}

Can we still define N when p < q?


If the walk is used to model gambling, the above conclusions amount
to say: the gambler is certain to have positive net winning at some time if
p 1/2. If, however, p < 1/2, the gambler may never have a net winning at
any time. If p = 1/2, the average waiting time until the gambler wins some
money can be infinity although it is certain to occur.

4.4.2 Returns to Origin

For a first return to the origin at trial n, the walk must begin with either X_1 = -1 or X_1 = 1. In the first case, the event can be written as

A = {X_1 = -1, X_2 - X_1 < 1, X_3 - X_1 < 1, . . . , X_{n-1} - X_1 < 1, X_n - X_1 \geq 1}.

Hence P(A) = q\lambda_{n-1}. In the second case, the event becomes

B = {X_1 = 1, -(X_2 - X_1) < 1, . . . , -(X_{n-1} - X_1) < 1, -(X_n - X_1) \geq 1}.

Note {-X_n} is also a simple random walk, but one whose steps take the value 1 with probability q rather than p. Hence the event B has a similar structure to the event A. Let

\lambda_n^{(-1)} = P(-X_1 < 1, -X_2 < 1, . . . , -X_{n-1} < 1, -X_n = 1).

Then {\lambda_n^{(-1)}} has the same generating function as that of {\lambda_n} except with p and q switched. In addition, P(B) = P(X_1 = 1)\lambda_{n-1}^{(-1)} and therefore, for n \geq 1,

f_n = P(A) + P(B) = p\lambda_{n-1}^{(-1)} + q\lambda_{n-1}.

Equivalently,

F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s)
     = ps \frac{1 - \sqrt{1 - 4pqs^2}}{2ps} + qs \frac{1 - \sqrt{1 - 4pqs^2}}{2qs}
     = 1 - \sqrt{1 - 4pqs^2}.

The probability that the process ever returns to the origin is

F(1) = \sum_{n=0}^\infty f_n = 1 - |p - q|

and so a return is certain only if p = q = 1/2. In this case, the mean time to return is

F'(1^-) = \lim_{s\uparrow 1} \frac{d}{ds}\left[1 - \sqrt{1 - s^2}\right] = \infty.

Thus, if the game is fair and you have lost some money at the moment, we have good news for you: the chance that you will win back all your money is 1. The bad news is, the above result also tells you that, on average, you may not live long enough to see it.
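As a quick sanity check (a simulation sketch): for p = 0.6 the return probability should be F(1) = 1 - |p - q| = 0.8. A finite horizon slightly undercounts very late returns, but that error is negligible here:

```python
import random

# Empirical check of P(ever return to 0) = 1 - |p - q| for p = 0.6.
p, horizon, trials = 0.6, 1000, 5000
rng = random.Random(2)
returned = 0
for _ in range(trials):
    x = 0
    for _ in range(horizon):
        x += 1 if rng.random() < p else -1
        if x == 0:
            returned += 1
            break
print(returned / trials)  # close to 0.8
```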

4.4.3 Some Key Results in the Simple Random Walk

Symbol            Expression                                Generating function
u_{2n}            \binom{2n}{n}(pq)^n                       U(s) = (1 - 4pqs^2)^{-1/2}
f_{2n}            (2n-1)^{-1}\binom{2n}{n}(pq)^n            F(s) = 1 - (1 - 4pqs^2)^{1/2}
\lambda_{2n-1}    (2n-1)^{-1}\binom{2n-1}{n}p^n q^{n-1}     \Lambda(s) = (2qs)^{-1}[1 - (1 - 4pqs^2)^{1/2}]



The following are key steps in deriving the results in the above table:

qs[\Lambda(s)]^2 - \Lambda(s) + ps = 0;
F(s) = 1 - [U(s)]^{-1};
F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s);
\Lambda^{(2)}(s) = [\Lambda(s)]^2.

4.5 The Branching Process

Now let us study the second example of simple stochastic processes. Here we have particles that are capable of producing particles of like kind. Assume that all such particles act independently of one another, and that each particle has a probability p_j of producing exactly j new particles, j = 0, 1, 2, . . ., with \sum p_j = 1. For simplicity, we assume that the 0th generation consists of a single particle, and the direct descendants of that particle form the first generation. Similarly, the direct descendants of the nth generation form the (n+1)th generation.
[Figure: a branching process family tree with Z_0 = 1, Z_1 = 4, Z_2 = 5, Z_3 = 9.]

Let Z_n be the population of the nth generation so that Z_0 = 1 and P(Z_1 = j) = p_j, j = 0, 1, 2, . . .. Let X_{ni} be the number of direct descendants of the ith individual in the nth generation. Hence, we have

Z_{n+1} = \sum_{i=1}^{Z_n} X_{ni}

for all n \geq 1. In addition, all X_{ni} are independent and have the same distribution as that of Z_1.


Due to the assumption of Z_0 = 1, we have Z_1 = X_{01}. That is, the population size of the first generation is the same as the family size of the founding member.
Let H_n(s) = E(s^{Z_n}) and G(s) = E(s^{X_{01}}). With the assumption Z_0 = 1, we have H_0(s) = s and H_1(s) = G(s). For an obvious reason, we call G(s) the probability generating function of the family size distribution.
If Z_n = k is given, we would have

Z_{n+1} = \sum_{i=1}^k X_{ni}.

Thus,

E(s^{Z_{n+1}} | Z_n = k) = E[s^{\sum_{i=1}^k X_{ni}}] = G^k(s).

Recalling the definition of the conditional expectation, we have shown

E(s^{Z_{n+1}} | Z_n) = G^{Z_n}(s).            (4.9)

To avoid confusion, let us write

H_n(t) = E[t^{Z_n}],

and substitute t = G(s) into it. Then, we get

H_n(G(s)) = E{[G(s)]^{Z_n}} = E{E(s^{Z_{n+1}} | Z_n)} = E(s^{Z_{n+1}}) = H_{n+1}(s)

with the help of (4.9). That is, we have found an iterative relationship

H_{n+1}(s) = H_n(G(s)) = \cdots = G(H_n(s)),  n = 1, 2, . . . .            (4.10)

This relationship can be used to calculate H_n(s) recursively, although the calculations are generally not pleasant.
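Numerically, however, the recursion is trivial to iterate. A sketch, for a hypothetical family-size distribution p_0 = 0.2, p_1 = 0.3, p_2 = 0.5 (not from the notes):

```python
# Iterate H_{n+1}(s) = G(H_n(s)) with H_0(s) = s, for the hypothetical
# family-size pgf G(s) = 0.2 + 0.3 s + 0.5 s^2.
def G(s):
    return 0.2 + 0.3 * s + 0.5 * s ** 2

def H(n, s):
    for _ in range(n):
        s = G(s)  # compose G with itself n times, starting from H_0(s) = s
    return s

# H_n(0) = P(Z_n = 0) is the probability of extinction by generation n;
# it increases with n toward the ultimate extinction probability.
probs = [H(n, 0.0) for n in range(1, 6)]
print(probs)  # an increasing sequence starting at p_0 = 0.2
```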

4.5.1 Mean and Variance of Z_n

If the expectation of Z_1 exists, the mean population of the nth generation is

\mu_n = E(Z_n) = H_n'(1)

and from (4.10) it follows that

\mu_n = G'(H_{n-1}(1))H_{n-1}'(1) = \mu\mu_{n-1},  n = 1, 2, . . .            (4.11)

where \mu = G'(1) is the mean family size and we have used H_{n-1}(1) = 1. Since \mu_0 = 1, it follows from (4.11) that \mu_n = \mu^n. Thus, if \mu > 1, the average population size increases exponentially. If \mu < 1, E(Z_n) approaches 0 at an exponential rate as n \to \infty. The case \mu = 1 gives the curious result that E(Z_n) = 1 for all n.
More directly,

Var(Z_n) = E[Var(Z_n | Z_{n-1})] + Var[E(Z_n | Z_{n-1})]
         = E(Z_{n-1}\sigma^2) + Var(\mu Z_{n-1}).

Thus

\sigma_n^2 = \mu_{n-1}\sigma^2 + \mu^2\sigma_{n-1}^2,  n = 1, 2, . . .

where \sigma_n^2 = Var(Z_n) and \sigma^2 is the variance of the family size distribution. Noting \sigma_0^2 = 0 and \mu_{n-1} = \mu^{n-1}, we find (for \mu \neq 1)

\sigma_n^2 = \frac{\mu^{n-1}(1 - \mu^n)\sigma^2}{1 - \mu},  n = 1, 2, . . .

which can be established by an inductive argument.


Remark: The mean of Z_n can be obtained by direct arguments, while the variance of Z_n can be obtained by using the iterative relationship (4.10). Those two derivations are about as difficult as the derivations presented here.
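The result \mu_n = \mu^n can also be checked by Monte Carlo. A sketch, for a hypothetical family-size distribution p_0 = 0.25, p_1 = 0.25, p_2 = 0.5 (not from the notes), whose mean family size is \mu = 1.25:

```python
import random

# Monte Carlo check of E(Z_n) = mu^n for the hypothetical family-size
# distribution p_0 = 0.25, p_1 = 0.25, p_2 = 0.5, so mu = 1.25.
rng = random.Random(3)

def offspring():
    u = rng.random()
    return 0 if u < 0.25 else (1 if u < 0.5 else 2)

def generation_size(n):
    z = 1  # Z_0 = 1
    for _ in range(n):
        z = sum(offspring() for _ in range(z))
    return z

n, trials = 5, 20_000
mean = sum(generation_size(n) for _ in range(trials)) / trials
print(mean, 1.25 ** n)  # empirical mean vs mu^5 = 3.0518...
```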

4.5.2 Probability of Extinction

Let q_n represent the probability that the population is extinct by the nth generation. That is, define

q_n = P(Z_n = 0) = H_n(0),  n = 0, 1, 2, . . . .

Thus, q_0 = 0 and, by (4.10),

q_n = G(q_{n-1}).            (4.12)

Note that q_0 \leq q_1 \leq q_2 \leq \cdots and q_j \leq 1 for all j. Thus,

q = \lim_{n\to\infty} q_n

exists and represents the probability that the population ever becomes extinct. From (4.12), it follows that q is a fixed point of the probability generating function G(s); that is,

q = G(q).

This gives us the idea that we need only solve the equation G(s) - s = 0 to obtain the probability of extinction. However, when the equation has more than one solution, which one gives the probability of extinction?
Theorem 4.2
Let {Z_n}_{n=0}^\infty be a branching process as specified in this section such that Z_0 = 1, and the family size generating function is given by G(s). Then the probability of extinction q for this branching process is the smallest solution of the equation

s = G(s)

in the interval [0, 1].
Proof: Assume that the smallest solution in [0, 1] is q^* and we want to show that q = q^*.
Let q_n = P(Z_n = 0) for n = 0, 1, . . .. Clearly, q_0 = 0 \leq q^*. Assume that q_k \leq q^* for some k. Note that G(s) is an increasing function for s \in [0, 1]. Hence, G(q_k) \leq G(q^*) = q^*. Consequently, q_{k+1} = G(q_k) \leq q^*. This implies q_n \leq q^* for all n. Letting n \to \infty, we obtain q \leq q^*. Since q is also a solution in [0, 1], and q^* is the smallest such solution, we must have q = q^*.

In many situations, we do not have to solve the equation to determine the value of q. Let h(s) = G(s) - s. One obvious solution in [0, 1] is s = 1. Note

h'(s) = G'(s) - 1,  h''(s) = G''(s) = \sum_{j=2}^\infty j(j-1)p_j s^{j-2} \geq 0 for s \in [0, 1].

Thus, h(s) is a convex function. There are several possibilities:
1. If h'(1) = G'(1) - 1 = \mu - 1 > 0, the curve of h(s) goes down from s = 0 and then comes up to hit 0 at s = 1. Since h(0) = P(X_{01} = 0) \geq 0, the curve crosses the 0 line exactly once before s = 1. Since q is the smallest solution in [0, 1], we must have q < 1 in this case.
2. If h'(1) = G'(1) - 1 = \mu - 1 < 0, we must have h(0) = P(X_{01} = 0) > 0. Thus, the curve decreases all the way down to 0 at s = 1. Thus s = 1 is the unique solution in [0, 1], and we must have q = 1.
3. Suppose h'(1) = 0. If, further, h(0) = P(X_{01} = 0) > 0, then we are in the same situation as in case 2. On the other hand, h(0) = P(X_{01} = 0) = 0 implies the family size is fixed at 1, hence q = 0.

Remark: Despite the above summary, most students tend to always solve the equation to find the probability of ultimate extinction. This is often more than what is needed.
Example 4.5


Lotka (see Feller 1968, page 294) showed that, to a reasonable approximation, the distribution of the number of male offspring in an American family was described by

p_0 = 0.4825,  p_k = (0.2126)(0.5893)^{k-1},  k \geq 1,

which is a geometric distribution with a modified first term. The corresponding probability generating function is

G(s) = \frac{0.4825 - 0.0717s}{1 - 0.5893s}

and G'(1) = 1.261. Thus, for example, in the 16th generation, the average population of male descendants of a single root ancestor is

\mu_{16} = (1.261)^{16} = 40.685.

The probability of extinction, however, is the smallest solution of

q = \frac{0.4825 - 0.0717q}{1 - 0.5893q}.

Thus, we find q = 0.8197. This suggests that for those family names that do survive to the 16th generation, the average size is very much more than 40.685. (All the calculations are subject to original round off errors.)
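One convenient way to find q numerically is to run the recursion q_n = G(q_{n-1}) from q_0 = 0, which by the argument of Section 4.5.2 converges to the smallest fixed point of G on [0, 1] (a sketch):

```python
# Iterate q_n = G(q_{n-1}) for Lotka's pgf; the iteration converges to the
# smallest fixed point of G on [0, 1], i.e. the extinction probability.
def G(s):
    return (0.4825 - 0.0717 * s) / (1 - 0.5893 * s)

q = 0.0
for _ in range(200):
    q = G(q)
print(round(q, 4))  # approximately 0.82 (the notes' 0.8197 up to rounding)
```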

Example 4.6
From the point of view of epidemiology, it is more important to control the spread of a disease than to cure the infected patients. Suppose that the spread of a disease can be modeled by a branching process. Then it is very important to make sure that the average number of people infected by a single patient is less than 1. If so, the probability of extinction of the disease will be one. Even if the average number of people infected is larger than one, however, there is still a positive chance that the disease will become extinct.
A scientist in Health Canada analyzed the data from the SARS (severe acute respiratory syndrome) epidemic in the year 2003. It was noticed that many interesting phenomena could be partially explained by results on branching processes.


First, many countries imported SARS patients who did not cause epidemics. This can be explained by the fact that the probability of extinction is not small (even when the average number of people infected by a single patient is larger than 1).
Second, a few patients were nicknamed super-spreaders. They might simply correspond to realizations of the branching process that do not become extinct.
Third, after government intervention, the average number of people infected by a single patient was substantially reduced. When it fell below 1, the epidemic was doomed to die out.
Finally, it was judged not cost effective to screen all airplane passengers, but rather to take strict and quick quarantine measures on new and old cases. When the average number of people infected by a single patient falls below one, the disease will be controlled with probability one.

4.5.3 Some Key Results in the Branching Process

For simplicity, we assumed that the population starts from a single individual: Z_0 = 1; we also assumed the numbers of offspring of the various individuals are independent and have the same distribution.
Under these assumptions, we have shown that

\mu_n = \mu^n

and

\sigma_n^2 = \frac{(\mu^n - 1)\mu^{n-1}\sigma^2}{\mu - 1}

where \mu and \sigma^2 are the mean and the variance of the family size and \mu_n and \sigma_n^2 are the mean and the variance of the size of the nth generation.
We have shown that the probability of extinction, q, is the smallest non-negative solution to

G(s) = s

where G(s) is the probability generating function of the family size. Further, it is known that q = 1 when \mu < 1; when \mu > 1, q < 1. When \mu = 1, q = 1 unless the family size is non-random (fixed at 1).


These results can all be derived from the fact that

H_n(s) = H_{n-1}(G(s))

where H_n(s) is the probability generating function of the population size of the nth generation.

4.6 Problems

1. Find the mean and variance of X when
   (a) X has Poisson distribution with p(x) = \frac{\lambda^x}{x!}e^{-\lambda}, x = 0, 1, . . ..
   (b) X has exponential distribution with f(x) = \lambda e^{-\lambda x}, x \geq 0.

2. (a) If X and Y are exponentially distributed with rate \lambda = 1 and independent of each other, find the density function of X + Y.
   (b) If X and Y are geometrically distributed with parameter p and independent of each other, find the probability mass function of X + Y.
   (c) Pick a typical discrete distribution and a typical continuous distribution (not discussed in class) and repeat questions (a) and (b).
3. Suppose that, given N = n, X has a binomial distribution with parameters n and p. Suppose also that N has a Poisson distribution with parameter
λ. Use the technique of generating functions to find
(a) the marginal distribution of X.
(b) the distribution of N − X.
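For (a), the generating-function argument shows that X is Poisson with mean λp (the "thinning" property). A quick numerical check of this claim (hypothetical values λ = 3, p = 0.4, added here for illustration):

```python
import math

# Check numerically that if N ~ Poisson(lam) and X | N=n ~ Binomial(n, p),
# then X ~ Poisson(lam * p).  Parameters are illustrative.
lam, p = 3.0, 0.4

def poisson(k, m):
    return math.exp(-m) * m**k / math.factorial(k)

def binom(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def marginal(x, terms=100):
    """P(X = x) = sum_n P(N = n) P(X = x | N = n), truncated at `terms`."""
    return sum(poisson(n, lam) * binom(x, n, p) for n in range(x, terms))

max_err = max(abs(marginal(x) - poisson(x, lam * p)) for x in range(10))
```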
4. Let X_1, X_2, X_3, . . . be independent and identically distributed random
variables such that X_1 has probability mass function

f(k) = P(X_1 = k) = p(1 − p)^k,   k = 0, 1, 2, . . . .

(a) Find the probability generating function of X1 .



(b) Let I_n = 1 if X_n ≥ n and I_n = 0 if X_n < n, for n = 0, 1, 2, . . ..
That is, I_n is an indicator random variable. Show that the probability
generating function of I_n is given by

H_n(s) = 1 + (s − 1)(1 − p)^n.

(c) Let N be a random variable with probability generating function
G(s) and assume it is independent of X_1, X_2, . . .. Let I_N = I_n when
N = n, where I_n is the indicator random variable defined in (b). Show
that

E[s^{I_N} | N] = H_N(s) = 1 + (s − 1)(1 − p)^N.

Find the probability generating function of I_N.
5. A coin is tossed repeatedly, heads appearing with probability p = 2/3
on each toss.
(a) Let X be the number of tosses until the first occasion by which two
heads have appeared successively. Write down a difference equation for
f (k) = P (X = k). Assume that f (0) = 0.
(b) Show that the generating function of f(k) is given by

F(s) = (4s²/27)[ 2/(1 − (2/3)s) + 1/(1 + (1/3)s) ].

(c) Find an explicit expression for f (k) and calculate E(X).
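For this problem, one standard difference equation (a sketch under the assumption that f(k) is derived by conditioning on the first toss or two) is f(k) = q f(k−1) + pq f(k−2) for k ≥ 3 with f(2) = p². The code below checks this recursion numerically against the closed form read off the partial-fraction expansion in (b), and computes E(X):

```python
# Waiting time for two successive heads with p = 2/3 (illustrative check,
# assuming the difference equation f(k) = q f(k-1) + p q f(k-2), k >= 3,
# with f(1) = 0 and f(2) = p^2).
p, q = 2/3, 1/3
N = 200

f = [0.0] * (N + 1)
f[2] = p * p
for k in range(3, N + 1):
    f[k] = q * f[k - 1] + p * q * f[k - 2]

total = sum(f)                               # ~1: X is finite w.p. 1
EX = sum(k * fk for k, fk in enumerate(f))   # E(X) = (1 + p)/p^2 = 3.75

# Closed form from the partial-fraction expansion of F(s):
# f(k) = (4/27) * (2*(2/3)**(k-2) + (-1/3)**(k-2)), k >= 2.
closed8 = (4/27) * (2 * (2/3)**(8 - 2) + (-1/3)**(8 - 2))
```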


6. Let X and Y be independent random variables, each with the negative binomial
distribution and probability function

p_i = C(−k, i) p^k (p − 1)^i,   i = 0, 1, . . . ,

where C(−k, i) denotes the generalized binomial coefficient.

(a) Show that the probability generating function of X is given by

G(s) = p^k / (1 + (p − 1)s)^k.
(b) Find the probability function of X + Y .
(c) Calculate E(e^X) and Var(e^X); what condition on the size of p
is needed?


7. Give the sequences generated by the following:

1) A(s) = (1 − s)^{−1.5};
2) B(s) = (s² − s − 12)^{−1};
3) C(s) = s log(1 − s²)/log(1 − α);
4) D(s) = s/(5 + 3s);
5) E(s) = (3 + 2s)/(s² − 3s − 4);
6) F(s) = (p + qs)^n.
8. Turn the following equation systems into equations in generating functions.

1) b_0 = 1; b_j = b_{j−1} + 2a_j, j = 1, 2, . . .; a_0 = 0.
2) b_0 = 0, b_1 = p, b_n = q Σ_{r=1}^{n−1} b_r b_{n−1−r}, n = 2, 3, . . ..

9. 1) Find the generating function of the sequence a_j = j(j + 1), j = 0, 1, 2, . . ..
2) Find the generating function of the sequence a_j = j/(j + 1), j = 0, 1, 2, . . ..
3) Let X be a non-negative integer valued random variable and define
r_j = P(X ≥ j). Find the generating function of {r_j} in terms of the
probability generating function of X.
10. 1) Negative binomial:

p_j = C(−k, j) (−p)^j (1 − p)^k,   j = 0, 1, . . . ,

where k > 0 and 0 < p < 1.
2) Let r_0 = 0, r_j = c/[j(j + 2)], j = 1, 2, . . . (find the constant c by
yourselves).

Find the means and the variances of the above distributions, whichever
exist.


11. Find the probability generating function of the following distributions:


1. Discrete uniform on 0, 1, . . . , N .
2. Geometric.
3. Binomial.
4. Poisson.
12. Let {a_n} be a sequence with generating function A(s), |s| < R, R > 0.
Find the generating functions of
1) {c + a_n} where c is a real number.
2) {ca_n} where c is a real number.
3) {a_n + a_{n+2}}.
4) {(n + 1)a_n}.
5) the sequence {a_0, 0, a_2, 0, a_4, . . .}.
6) the sequence {a_0, 0, 0, a_3, 0, 0, a_6, . . .}.
13. Consider a usual branching process: let the population size of the nth
generation be X_n and the family size of the ith family in the nth generation be Z_{n,i}. Thus, X_n = Σ_{i=1}^{X_{n−1}} Z_{n,i} and X_0 = 1. Assume the Z_{n,i} are
independent and identically distributed, and

P(Z_{1,1} = 0) = 1/2 + a,   P(Z_{1,1} = 1) = 1/4 − 2a,   P(Z_{1,1} = 3) = 1/4 + a,

for some a.
(a) Find the probability generating function of the family size. When a =
1/8, find the probability generating function of X_2.
(b) Find the range of a such that the probability of extinction is less than
1.
(c) When a = 1/8, find the expectation and variance of the population
size of the 5th generation and the probability of extinction.
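With a = 1/8 the family-size pgf becomes G(s) = 5/8 + (3/8)s³, and the answers can be checked numerically. A small sketch (an added illustration, assuming a = 1/8 as in parts (a) and (c)):

```python
# Numerical companion to Problem 13, assuming a = 1/8, so that
# P(Z=0) = 5/8, P(Z=1) = 0, P(Z=3) = 3/8 and G(s) = 5/8 + (3/8) s^3.
def G(s):
    return 5/8 + (3/8) * s**3

q = 0.0
for _ in range(500):          # H_n(0) = G(H_{n-1}(0)) increases to q
    q = G(q)

mu = 3 * (3/8)                # mean family size = 9/8 > 1, so q < 1
EZ5 = mu**5                   # E(Z_5) = mu^5
```

The fixed point satisfies 3q³ − 8q + 5 = 0, whose relevant root is q = (√69 − 3)/6 ≈ 0.884.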
14. For a branching process with family size distribution given by

P_0 = 1/6,   P_2 = 1/3,   P_3 = 1/2,


calculate the probability generating function of Z_2 given Z_0 = 1, where
Z_2 is the population of the second generation. Find also the mean
and variance of Z_2 and the probability of extinction. Repeat the same
calculation when Z_0 = 3 and

P_0 = 1/6,   P_1 = 1/2,   P_3 = 1/3.
15. Let the probability p_n that a family has exactly n children be αp^n when
n ≥ 1, and p_0 = 1 − αp(1 + p + p² + · · ·). Assume that all 2^n sex sequences
in a family of n children have probability 2^{−n}. Show that for k ≥ 1,
the probability that a family has exactly k boys is 2αp^k/(2 − p)^{k+1}.
Given that a family includes at least one boy, what is the probability
that there are two or more boys?
16. Let X_i, i ≥ 1, be independent uniform (0, 1) random variables, and
define N by

N = min{n : X_n < X_{n+1}},

where X_0 = x. Let f(x) = E(N).
(a) Derive an integral equation for f (x) by conditioning on X1 .
(b) Differentiate both sides of the equation derived in (a).
(c) Solve the resulting equation obtained in (b).
17. Consider a sequence defined by r_0 = 0, r_1 = 1 and r_j = r_{j−1} + 2r_{j−2},
j ≥ 2. Find the generating function R(s) of {r_j} and determine r_25. For
what region of s values does the series for R(s) converge?
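A quick computational check (an added sketch, not part of the original problem): the recursion gives r_25 directly, and partial fractions of R(s) = s/(1 − s − 2s²) give the closed form r_j = (2^j − (−1)^j)/3.

```python
# r_0 = 0, r_1 = 1, r_j = r_{j-1} + 2 r_{j-2}; compare with the closed form
# r_j = (2^j - (-1)^j)/3 obtained from R(s) = s/((1 - 2s)(1 + s)).
r = [0, 1]
for j in range(2, 26):
    r.append(r[j - 1] + 2 * r[j - 2])

r25 = r[25]
closed25 = (2**25 - (-1)**25) // 3
```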
18. Let X_1, X_2, . . . be independent random variables with common p.g.f.
G(s) = E(s^{X_i}). Let N be a random variable with p.g.f. H(s), independent of the X_i's. Let T be defined as 0 if N = 0 and Σ_{i=1}^{N} X_i if N > 0.
Show that the p.g.f. of T is given by H(G(s)). Hence find E(T) and
Var(T) in terms of E(X), Var(X), E(N) and Var(N).
19. Consider a branching process in which the family size distribution is
Poisson with mean λ.



(a) Under what condition will the probability of extinction of the process be less than 1?
(b) Find the extinction probability when λ = 2.5 numerically.
(c) When λ = 2.5, find the expected size of the 10th generation, and
the probability of extinction by the 5th generation. Comment on the
relationship between this second number and the ultimate extinction
probability obtained in (b).
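Parts (b) and (c) can be sketched numerically (an added illustration): with G(s) = exp(λ(s − 1)), the nth iterate of G at 0 equals the probability of extinction by generation n.

```python
import math

# Poisson(lam) branching process with lam = 2.5: iterate
# H_n(0) = G(H_{n-1}(0)) to get P(extinct by generation n), increasing to q.
lam = 2.5

def G(s):
    return math.exp(lam * (s - 1.0))

h = 0.0
by_gen = []                  # P(extinct by generation n), n = 1, 2, ...
for n in range(500):
    h = G(h)
    by_gen.append(h)

q = h                        # ultimate extinction probability (~0.107)
E10 = lam**10                # expected size of the 10th generation
extinct_by_5 = by_gen[4]     # strictly smaller than q
```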

20. Consider a branching process in which the family size distribution is
geometric with parameter p. (The geometric distribution has p.m.f.
p_j = p(1 − p)^j, j = 0, 1, . . ..)
(a) Under what condition will the probability of extinction of the process be less than 1?
(b) Find the probability of extinction when p = 1/3.
(c) When p = 1/3, find the expectation and variance of the size of the
10th generation and the probability of extinction by the 5th generation.
21. Let {Z_n}_{n=0}^∞ be a usual branching process with Z_0 = 1. It is known that
P_0 = p, P_1 = pq, P_2 = q², with 0 ≤ p ≤ 1 and q = 1 − p.
1) Find a condition on the size of p such that the probability of extinction is 0.
2) Find the range of p such that the probability of extinction
is smaller than 1. Calculate the probability of extinction when p = 1/2.
3) Calculate the mean and the variance of Z_n when p = 1/2.
22. Let X_1, X_2, . . . be independent random variables with common p.g.f.
G(s) = E(s^{X_i}). Let N be a random variable with p.g.f. H(s). Show
that

T = Σ_{i=1}^{N} X_i if N ≥ 1,   and   T = 0 if N = 0,

has p.g.f. H(G(s)). Hence, find the mean and variance of T in terms
of the means and variances of X_i and N. Remark: Can you see the
relevance of this problem to the usual branching process?


23. Branching with immigration. Each generation of a branching process (with a single progenitor) is augmented by a random number of
immigrants who are indistinguishable from the other members of the
population. Suppose that the numbers of immigrants in different generations are independent of each other and of the past history of the
branching process, each such number having probability generating
function H(s). Show that the probability generating function G_n of
the size of the nth generation satisfies G_{n+1}(s) = G_n(G(s))H(s), where
G is the probability generating function of a typical family of offspring.
24. Consider the random walk X_0 = 0, X_n = X_{n−1} + Z_n, where P(Z_n =
+1) = p, P(Z_n = −1) = q, n = 1, 2, . . ., independently (p + q = 1).
Find the probability that the event X_n = r will ever occur, where r is
a fixed positive integer. If p > q, find the expected time until its first
occurrence.
25. Consider the random walk X_0 = 0, X_n = X_{n−1} + Z_n, where P(Z_n =
+1) = p, P(Z_n = −2) = q, n = 1, 2, . . ., independently (p + q = 1). Let
Λ^{(r)}(s) and λ_n be defined as in class. Show that Λ^{(r)}(s) = [Λ(s)]^r
and derive the relationship

λ_n = q λ_{n−1}^{(3)},   n = 2, 3, . . . .

Hence, show that

qs[Λ(s)]³ − Λ(s) + ps = 0.
26. Consider the random walk X_0 = 0, X_n = X_{n−1} + Z_n, where Z_1, Z_2, . . . are
independent, but P(Z_n = 1) = p, P(Z_n = −1) = q and P(Z_n = 0) = r,
n = 1, 2, . . . (p + q + r = 1). Let

f_n = P(X_1 ≠ 0, . . . , X_{n−1} ≠ 0, X_n = 0),
λ_n = P(X_1 < 1, . . . , X_{n−1} < 1, X_n = 1).

Find their generating functions F(s) and Λ(s). Hence, obtain the probability that 0 ever recurs and the probability that the walk ever passes
through 1.


27. If an unbiased coin is tossed repeatedly, show that the probability that
the number of heads ever exceeds twice the number of tails is (√5 − 1)/2.
28. Let p_{r,k} be the probability that the simple random walk visits state r
(r > 0) exactly k times.
a) If p = q = 0.5, show that p_{r,k} = 0, k = 0, 1, 2, . . ..
b) If p > q, show that

p_{r,k} = 0 for k = 0,   and   p_{r,k} = θ(1 − θ)^{k−1} for k = 1, 2, . . . ,

where θ = |p − q|.
c) If p < q, show that

p_{r,k} = 1 − π for k = 0,   and   p_{r,k} = πθ(1 − θ)^{k−1} for k = 1, 2, . . . ,

where π = (p/q)^r and θ = |p − q| as before.
29. Consider a gambler who at each play of the game has probability p of
winning one unit and probability q = 1 − p of losing one unit. Assuming that successive plays of the game are independent, what is the
probability that, starting with i units, the gambler's fortune will reach
N before reaching 0? Hint: Let P_i, i = 0, . . . , N, denote the probability that, starting with i, the gambler's fortune will eventually reach N.
Derive a relationship among the P_i's.
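Conditioning on the first play gives P_i = pP_{i+1} + qP_{i−1} with P_0 = 0 and P_N = 1. A sketch of the standard solution (an added illustration with hypothetical values p = 0.6, N = 10; the closed form assumes p ≠ q):

```python
# Gambler's ruin: P_i = p P_{i+1} + q P_{i-1}, P_0 = 0, P_N = 1.
# Telescoping gives P_{i+1} - P_i = (q/p)(P_i - P_{i-1}), so
# P_i = P_1 * sum_{k<i} r^k with r = q/p, and P_N = 1 fixes P_1.
p, q, N = 0.6, 0.4, 10
r = q / p

P1 = 1.0 / sum(r**k for k in range(N))
P = [P1 * sum(r**k for k in range(i)) for i in range(N + 1)]

# Closed form for p != q:
closed = [(1 - r**i) / (1 - r**N) for i in range(N + 1)]
max_err = max(abs(a - b) for a, b in zip(P, closed))
```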
30. Using the facts that u_{2n+1} = 0 and u_{2n} = C(2n, n) p^n q^n, show that

U(s) = (1 − 4pqs²)^{−1/2}.


31. (a) Consider a simple random walk with a reflecting barrier. Let Z_1, Z_2, . . .
be independent and identically distributed random variables such that
P(Z_n = 1) = p, P(Z_n = −1) = 1 − p = q. Assume 0 < p < 1 and
X_0 = 1. Also X_{n+1} = X_n + Z_{n+1} if X_n > 0 and X_{n+1} = 1 if X_n = 0. Verify that {X_n} is a Markov chain and find its transition matrix. Classify
the state space.


(b) Let f_n = P(X_1 ≠ 0, X_2 ≠ 0, . . . , X_{n−1} ≠ 0, X_n = 0 | X_0 = 1) for
n = 1, 2, . . . and f_0 = 0. It is known that the generating function of f_n
is given by

F(s) = (1 − √(1 − 4pqs²)) / (2ps)

and √(1 − 4pq) = |p − q|. Find the probability that 0 will ever be reached.
(c) Find the range of p such that state 0 is recurrent.


Chapter 5

Renewal Events and Discrete Renewal Processes

5.1 Introduction

Consider a sequence of trials that are not necessarily independent, and let
ξ represent some property which, on the basis of the outcomes of the first
n trials, can be said unequivocally to occur or not to occur at trial n. By
convention, we suppose that ξ has just occurred at trial 0, and E_n represents
the event that ξ occurs at trial n, n = 1, 2, . . ..
We call ξ an event in renewal theory. However, it is not an event in the
sense of probability models, in which events are subsets of the sample space.
Taking the simple random walk {X_n} as an example, we regard X_n as the
outcome of the nth trial. Thus, the {X_n} themselves are outcomes of a sequence
of trials. An event ξ_1 can be used to describe: the outcome X_n is 0. That
is, "the event ξ_1 has just occurred at trial n" is the event

E_n = {X_n = 0}

for a given n.
Similarly, another possible event ξ_2 can be defined such that "ξ_2 has just
occurred at trial n" is the event

E_n = {X_n − X_{n−1} = 1, X_{n−1} − X_{n−2} = 1},   n = 2, 3, . . . .


The events E0 and E1 have to be defined separately.


In general, if we have a well defined event ξ, then we can easily describe
the event E_n for every n. If we have a complete description of the event E_n for
every n, the event ξ is then well defined. It is convenient to define f_0 = 0
and, for n ≥ 1, f_n = P(E_1^c E_2^c · · · E_{n−1}^c E_n). Thus, f_n is the probability that
ξ occurs for the first time at trial n (after trial 0).
We say that ξ is a renewal event if each time ξ occurs, the process
undergoes a renewal or regeneration. That is to say, at the point when ξ
occurs, the outcomes of the successive trials have the same stochastic properties as the outcomes of the successive trials started at time 0. In particular, the probability that ξ will next occur after n additional trials is f_n,
n = 1, 2, . . .. Mathematically, it means

1. P(E_{n+m} | E_n) = P(E_m | E_0);
2. P(E_{n+m} E_{n+m−1}^c · · · E_{n+1}^c | E_n) = P(E_m E_{m−1}^c · · · E_2^c E_1^c | E_0).

Another simple (but not rigorous) way to define a renewal event is: independent of the previous outcomes of the trials, once ξ occurs, the waiting time
for the next occurrence of ξ has a fixed distribution.
Example 5.1
Consider a sequence of Bernoulli trials in which P(S) = p and P(F) = q
with p + q = 1. Let ξ represent the event that trials n − 2, n − 1 and n result
respectively in F, S and S. We shall say that ξ is the event "FSS". It is
clear that ξ is a renewal event. If ξ occurs at n, the process regenerates and
the waiting time for its next occurrence has the same distribution as had the
waiting time for the first occurrence.

Example 5.2
In the same situation as above, let ξ represent the event "SS". That is, ξ is
said to occur at trial n if trials n − 1 and n both give S as the outcome. In
this case, ξ is not a renewal event; the occurrence of ξ does not constitute a
renewal of the process. The reason is, if ξ has occurred at trial n, the chance
it will recur at trial n + 1 is p, but the chance that ξ occurs on the first trial
is 0.


Example 5.3
In most situations, the event of record breaking is not a renewal event. Consider the record high temperature: the record only gets higher and hence becomes
harder to break. Thus, the waiting time for the next occurrence tends to
become longer and longer. Hence, it cannot be a renewal event.

Example 5.4
The simple random walk provides a rich source of examples of renewal
events. As before, we assume X_0 = 0 and X_n = X_{n−1} + Z_n, where Z_n = +1
or −1 with respective probabilities p and q, independently, n = 1, 2, . . ..
a) Let ξ represent "return to the origin". Then ξ is a renewal event. In
fact, the notation that we used in our analysis of the simple random walk
will motivate our choice of notation for renewal events as introduced in the
next section.
b) Let ξ represent "a ladder point in the walk". By this we mean that ξ
occurs at trial n if

X_n = max{X_0, X_1, . . . , X_{n−1}} + 1

and we assume ξ to have occurred at trial 0. Thus, the first occurrence of ξ
corresponds to first passage through 1, the second occurrence of ξ corresponds to first passage through 2, and so on. Here again, ξ is a renewal
event, since each ladder point corresponds to a regeneration of the process.
c) As a final example, suppose that ξ is said to occur at trial n if the number of positive values in Z_1, . . . , Z_n is exactly twice the number of negative
values. Equivalently, ξ occurs at trial n if and only if X_n = n/3.

5.2 The Renewal and Lifetime Sequences

Let ξ represent a renewal event and, as before, define the lifetime sequence
{f_n}, where f_0 = 0 and

f_n = P{ξ occurs for the first time at trial n},   n = 1, 2, . . . .


In like manner, we define the renewal sequence {u_n}, where u_0 = 1 and

u_n = P{ξ occurs at trial n},   n = 1, 2, . . . .

Let F(s) = Σ f_n s^n and U(s) = Σ u_n s^n be the generating functions of
{f_n} and {u_n}. Note that

f = Σ f_n = F(1) ≤ 1,

where f has the interpretation that ξ recurs at some time in the sequence.
Since the event may not occur at all, it is possible for f to be less than 1.
Clearly, 1 − f represents the probability that ξ never recurs in the infinite
sequence of trials.
If f < 1, the waiting time for ξ to occur is not really a random variable.
This is because it has probability 1 − f of being infinite, which is not
allowed for a random variable. For this kind of renewal event, after each
occurrence of ξ, there is a probability 1 − f that it will never occur again.
The probability that it will occur exactly k times is f^k(1 − f), k = 0, 1, . . ..
(Use the model that we toss a coin to decide whether it will recur.)
We may compute the chance of ξ occurring at most 100 times as

Σ_{k=0}^{100} f^k(1 − f) = 1 − f^{101}.

The chance of it occurring at most m times is

Σ_{k=0}^{m} f^k(1 − f) = 1 − f^{m+1}.

Thus, as m tends to infinity, the chance for ξ to occur no more than m
times tends to 1. Based on this fact, we say that such a renewal event
occurs finitely often. It is transient.
If f = 1, then ξ will occur some time in the future with probability
one. Only then can we discuss the waiting time for the next occurrence
of ξ. For a renewal event ξ with this property, the waiting times from
the nth occurrence to the (n + 1)th (hence called inter-occurrence times)
are independent and have the same distribution. The function F(s) defined


earlier is the corresponding probability generating function. A renewal event
ξ with f = 1 is called recurrent.
For a recurrent event, F(s) is a probability generating function. The
mean inter-occurrence time is

μ = F′(1) = Σ_{n=0}^{∞} n f_n.

If μ < ∞, we say that ξ is positive recurrent. If μ = ∞, we say that ξ is
null recurrent.
Finally, if ξ can occur only at n = t, 2t, 3t, . . . for some positive integer
t > 1, we say that ξ is periodic with period t. More formally, let t =
g.c.d.{n : f_n > 0} (g.c.d. stands for the greatest common divisor). If t > 1,
the recurrent event ξ is said to be periodic with period t. If t = 1, ξ is said
to be aperiodic.
Note that even if the first few f_n values are zero, the renewal event can
still be aperiodic. Many students believe that if f_1 = f_2 = 0, the period of
the renewal event must be at least 3. This is wrong. The renewal event can
still be aperiodic if, say, f_8 > 0 and f_11 > 0: the greatest common divisor
of 8 and 11 is one! No additional information is needed.
Another remark: suppose f_i > 0 and f_j > 0 for some integers i and
j. If, in addition, i and j are relatively prime, then the greatest common divisor
of the set {i, j, any additional numbers} is 1. That is, we know that the
period is 1 already; there is no need to look further.
To show that the greatest common divisor is some t larger than 1, we
have to make sure f_n = 0 whenever n is not a multiple of t. This is much
harder in general.
In the simple random walk, the renewal event of returning to zero has
period 2. This is because f_n > 0 only if we lose and win equal numbers of
games in a total of n games; thus n must be even when f_n > 0. The period
is t = 2, rather than anything larger, because f_2 > 0, so the greatest common
divisor cannot be larger than 2.


5.3 Some Properties

For a renewal event ξ to occur at trial n ≥ 1, either ξ occurs for the first
time at n, with probability f_n = f_n u_0, or ξ occurs for the first time at some
intermediate trial k < n and then occurs again at n; the probability of this
event is f_k u_{n−k}. Noting that f_0 = 0 and u_0 = 1, we therefore have

u_n = f_0 u_n + f_1 u_{n−1} + · · · + f_{n−1} u_1 + f_n u_0,   n = 1, 2, . . . .

This equation is called the renewal equation.
Using the typical generating function methodology, we get

U(s) − 1 = F(s)U(s).

Hence

U(s) = 1/(1 − F(s))   or   F(s) = 1 − 1/U(s).

Recall that when we discussed the simple random walk, we found in that
context

U(s) = (1 − 4pqs²)^{−1/2},   F(s) = 1 − √(1 − 4pqs²).

It is simple to see that this relationship holds between them.
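The renewal equation also gives a direct computational route: given {u_n}, the {f_n} follow recursively. The sketch below (an added illustration, using the return-to-origin example with hypothetical p = 0.6) recovers {f_n} and checks f = F(1) = 1 − |p − q|:

```python
# Renewal equation illustration: f_n = u_n - sum_{k=1}^{n-1} f_k u_{n-k}.
# For return to the origin in the simple random walk, u_{2n} = C(2n,n) p^n q^n
# and u_{odd} = 0; then sum f_n should be 1 - |p - q| = 0.8 for p = 0.6.
p, q = 0.6, 0.4
N = 2000

u = [0.0] * (N + 1)
u[0] = 1.0
val = 1.0
for n in range(1, N // 2 + 1):
    # build C(2n,n) p^n q^n iteratively to avoid huge intermediate integers
    val *= (2 * n) * (2 * n - 1) / (n * n) * (p * q)
    u[2 * n] = val

f = [0.0] * (N + 1)
for n in range(1, N + 1):
    f[n] = u[n] - sum(f[k] * u[n - k] for k in range(1, n))

fsum = sum(f)       # ~0.8; the walk with p != q is transient
```

The first coefficients agree with the series of F(s) = 1 − √(1 − 4pqs²): f_2 = 2pq and f_4 = 2(pq)².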
The concepts defined in the last section are all related to the {u_n} sequence, and we summarize this in the following.
Theorem 5.1
The renewal event ξ is
1. transient if and only if u = Σ u_n = U(1) < ∞;
2. recurrent if and only if u = ∞;
3. periodic with period t if t = g.c.d.{n ≥ 1 : u_n > 0} is greater than 1, and aperiodic if
t = 1;
4. null recurrent if and only if Σ u_n = ∞ and u_n → 0 as n → ∞.


Proof of 1 and 2:

u = Σ_{n=0}^{∞} u_n = lim_{s↑1} U(s) = lim_{s↑1} [1 − F(s)]^{−1}.

It follows that u < ∞ when f = F(1) < 1 and u = ∞ when f = 1.
The event ξ is transient in the former case and recurrent in the latter.
3. If ξ has period d > 1, then F(s) = Σ f_n s^n contains only powers of s^d.
Since

U(s) = [1 − F(s)]^{−1} = 1 + F(s) + F²(s) + · · · ,

it follows that U(s) = Σ u_n s^n contains only powers of s^d, and so t = g.c.d.{n :
u_n > 0} = g.c.d.{md : u_{md} > 0} is such that d | t. But since u_n = 0 implies
that f_n = 0, it follows that t | d. Hence t = d.
4. This result will follow from the renewal theorem.
The following is the famous renewal theorem.
Theorem 5.2 (The renewal theorem).
Let ξ be a recurrent and aperiodic renewal event and let

μ = Σ n f_n = F′(1)

be the mean inter-occurrence time. Then

lim_{n→∞} u_n = μ^{−1}.

Proof: See Feller (1968, page 335).


When = which implies is null recurrent, then 1 = 0. This proves
4) in the last theorem.
For recurrent periodic renewal event , we might be able to re-scale the
time unit and then make use of this theorem. Suppose that has period
d > 1. We can define a new sequence of trials so that each new trial is a
combination of d original trials. That is, if the outcome of the original trials
are X1 , X2 , . . .. For instance, define
Ym+1 = (Xmd+1 , Xmd+2 , . . . , X(m+1)d ).
The new sequence {Y0 , Y1 , . . .} can also be used to define the renewal event
. However, in this case, becomes aperiodic and the theorem can then be
applied.


Example 5.5
Let ξ represent the occurrence of "FFS" in a sequence of Bernoulli trials with
P(S) = p and P(F) = q (p + q = 1). In this case

u_0 = 1,   u_1 = u_2 = 0,

and

u_n = pq²,   n = 3, 4, . . . .

Thus

U(s) = 1 + pq²(s³ + s⁴ + · · ·) = 1 + pq²s³/(1 − s),   |s| < 1,

and

F(s) = 1 − [U(s)]^{−1} = pq²s³/(1 − s + pq²s³).

Note that F(1) = f = 1, so that ξ is recurrent. Since u_n = pq² > 0 for all
n ≥ 3, it follows that ξ is positive recurrent and the mean inter-occurrence
time is μ = (pq²)^{−1}.
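As a numerical illustration of this example (added here; it takes p = q = 1/2), one can recover {f_n} from {u_n} via the renewal equation and confirm μ = (pq²)^{−1} = 8:

```python
# Example 5.5 with p = q = 1/2: u_n = p q^2 = 1/8 for n >= 3.
# Recover f_n from f_n = u_n - sum_{k=1}^{n-1} f_k u_{n-k} and check
# that f = sum f_n = 1 and mu = sum n f_n = 1/(p q^2) = 8.
p, q = 0.5, 0.5
N = 500
a = p * q * q                                  # = 1/8

u = [1.0, 0.0, 0.0] + [a] * (N - 3)            # u_0 = 1, u_1 = u_2 = 0
f = [0.0] * N
for n in range(1, N):
    f[n] = u[n] - sum(f[k] * u[n - k] for k in range(1, n))

fsum = sum(f)                                  # ~1: FFS is recurrent
mu = sum(n * fn for n, fn in enumerate(f))     # ~8
```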

Example 5.6
Consider again the simple random walk and let ξ represent return to the
origin. It is known that

F(s) = 1 − √(1 − 4pqs²),   |s| ≤ 1,

and

U(s) = (1 − 4pqs²)^{−1/2}.

Since f_n = 0 for all odd n and non-zero for even n, it follows that ξ is periodic
with period d = 2. If p = q, F(1) = f = 1, so that ξ is in this case recurrent.
If p ≠ q,

f = F(1) = 1 − |p − q| < 1

and ξ is transient. When p = q, μ = lim_{s↑1} F′(s) = ∞, so that ξ is null
recurrent.

5.4 Delayed Renewal Events

In a simple random walk with X_0 = 0, the event of the walk returning to
0 is a renewal event. When X_n = 0 for some n, the process renews itself:
it behaves as if we have just observed X_0 = 0, and we can re-set the clock
back to 0. More specifically, if X_10 = 0, then {X_10 = 0, X_{10+1}, X_{10+2}, . . .} is
stochastically the same as {X_0 = 0, X_1, X_2, . . .}. However, let ξ be the event
that X_n = 1; then ξ is not a renewal event. When X_5 = 1, {X_5, X_6, . . .} does
not behave the same as {X_0, X_1, . . .}. Hence, we cannot re-set the clock back
to 0 and pretend that nothing happened.
If we observe that X_5 = 1 and X_19 = 1, then {X_{19+0} = 1, X_{19+1}, . . .} will
have the same stochastic properties as the system {X_5 = 1, X_{5+1}, X_{5+2}, . . .}.
Hence, the event ξ does not renew the process when it first occurs, but after its
first occurrence, each future occurrence of ξ renews the process to the time
when ξ first occurred. Such events are called delayed renewal events.
The main difference between delayed renewal events and the usual renewal
events is that the waiting time for the first occurrence of ξ has a different distribution from that of the inter-occurrence times. An informal way to
describe a delayed renewal event is: we missed the beginning and started
from the middle of the sequence.
Suppose that ξ is a delayed renewal event. Let us define some quantities:
1) {b_n}: the probability that ξ first occurs on trial n, n = 0, 1, 2, . . .;
2) {f_n}: the probability that ξ first occurs again n trials later once ξ
has occurred;
3) {u_n}: the probability that ξ occurs on trial n, given that ξ occurred
on trial 0;
4) {v_n}: the unconditional probability that ξ occurs on trial n.
By convention, we suppose that f_0 = 0, but we do allow b_0 > 0, so that ξ
may occur for the first time at trial 0. Let B(s), F(s), U(s) and V(s) be the
corresponding generating functions. We have

U(s) = [1 − F(s)]^{−1},   |s| < 1,

which can be proved in the same way as for renewal events.


Let b = Σ_{n=0}^{∞} b_n be the probability that ξ ever occurs. The properties of
the delayed renewal event are determined by the {f_n} sequence. Thus, ξ
is recurrent if f = Σ f_n = 1 and transient if f < 1. Periodicities are
determined by examining g.c.d.{n : f_n > 0}. Note that it is possible that
ξ is a recurrent event and yet there is non-zero probability that ξ will never occur;
but once it does occur, it then occurs infinitely often.
To find V(s), note that if ξ occurs at trial n ≥ 0, either ξ
occurs for the first time at n, with probability b_n = b_n u_0, or ξ occurs for the
first time at some earlier trial k < n and then occurs again at n. Thus,

v_n = b_0 u_n + b_1 u_{n−1} + · · · + b_n u_0,   n = 0, 1, 2, . . . .

We recognize the right side as the convolution of {b_n} with {u_n}, and so

V(s) = B(s)U(s),   |s| < 1,

and

V(s) = B(s)[1 − F(s)]^{−1},   |s| < 1.

Example 5.7
Consider the simple random walk and let ξ represent passage through 1.
Thus, ξ occurs at trial n if X_n = 1. Then ξ is a delayed renewal event. In
the notation used for the random walks,

b_n = λ_n = P(first passage at trial n)

and B(s) = Λ(s). Once ξ occurs, the probability that it recurs after n additional
steps is the same as the probability of a return to the origin in n steps. Thus

U(s) = (1 − 4pqs²)^{−1/2},

and {v_n}, where v_n = P(X_n = 1), n = 1, 2, . . ., has generating function

V(s) = Λ(s)U(s) = (1 − √(1 − 4pqs²)) / (2qs√(1 − 4pqs²)).

Theorem. If ξ is a delayed renewal event that is recurrent and aperiodic, then

lim_{n→∞} v_n = b · lim_{n→∞} u_n = b μ^{−1}.

That is, we have to wait until the event ξ occurs once; from then on it
behaves as a renewal event.
The conclusions obtained here will be very useful when we discuss
Markov chains.

5.5 Summary
Table 5.1: Summary of some concepts

Event: A property of a stochastic process whose occurrence can be determined after n trials.

Renewal Event: When this type of event occurs, the stochastic process undergoes a renewal: the random behavior of the process from this point on is the same as that of the process from time zero.

Delayed Renewal Event: From the second occurrence of this type of event onward, the process undergoes a renewal: the random behavior of the process from this point on is the same as that of the process from the time when the event occurred for the first time.

Recurrent: The renewal event will occur with probability 1.

Transient: The renewal event may never occur.

Positive Recurrent: The renewal event is recurrent and the expected waiting time for the next occurrence is finite.

Null Recurrent: The renewal event is recurrent but the expected waiting time for the next occurrence is infinite.

Period: The greatest common divisor of the numbers of trials after which the renewal event can happen.

Aperiodic: The period of the renewal event is 1.

5.6 Problems

1. A fair die is rolled repeatedly. We keep a record of the score of each
roll. Let ξ be the renewal event that the scores 1, 2, 3 occur consecutively.
(a) Let u_n = P(ξ occurs at trial n) = P(X_{n−2} = 1, X_{n−1} = 2, X_n = 3).
Show that u_n = 1/216 for most n. Find the values of n for which u_n ≠ 1/216.



Recall that we assume u_0 = 1.
(b) Obtain the generating function of the sequence u_n and therefore
show that the generating function of

f_n = P(ξ occurs at trial n for the first time)

is given by

F(s) = s³ / (s³ + 216(1 − s)).

(c) Is the renewal event periodic? Is it recurrent? If so, find the mean
inter-occurrence time.
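A computational check of (b) and (c) (an added sketch): the denominator of F(s) yields a linear recurrence for f_n, from which f = 1 and μ = 216 follow numerically. (That μ = 216 = 6³ reflects the fact that the pattern 1, 2, 3 has no self-overlap.)

```python
# Expand F(s) = s^3 / (s^3 + 216(1 - s)) via the recurrence implied by the
# denominator: 216 f_n - 216 f_{n-1} + f_{n-3} = [n = 3], i.e.
# f_n = f_{n-1} - f_{n-3}/216 (+ 1/216 when n = 3).
N = 20000
f = [0.0] * (N + 1)
for n in range(3, N + 1):
    f[n] = f[n - 1] - f[n - 3] / 216.0
    if n == 3:
        f[n] += 1.0 / 216.0

fsum = sum(f)                                # ~1: the event is recurrent
mu = sum(n * fn for n, fn in enumerate(f))   # mean inter-occurrence time ~216
```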
2. In a simple random walk, examine whether the following are renewal events.
(a) ξ is said to occur at trial n if at trial n a return to the origin from the
positive side takes place.
(b) ξ is said to occur at trial n if at trial n the walk is to the right side
of the origin.
3. The lifetime distribution of a fuse is given by f_n = β^{n−1}(1 − β), n = 1, 2, . . ..
(a) Show that P(X = m + n | X > m) = f_n, n = 1, 2, . . ..
(b) Suppose that a new fuse is placed in service on day 0 and immediately
upon its failure is replaced with an identical fuse. Also assume that
the lifetimes are i.i.d. random variables with the distribution given above.
The event ξ is said to occur at trial n if a new fuse is put in at trial n.
Note that ξ is a recurrent event. Obtain F(s) and U(s), and hence
determine u_n for this event.
(c) Let T be the survival time of the fuse in service at time n (if a
failure occurs at time n, the fuse in service is the replacement). Write
T as a sum of indicator variables Y_0, Y_1, . . ., where Y_i = 1 if the fuse in
service at n is also in service at time i, and 0 otherwise. Show that

E(T) = (1 + β − β^{n+1}) / (1 − β).


Note that as n → ∞, E(T) → (1 + β)/(1 − β), which is strictly greater
than the mean inter-occurrence time. Can you explain this fact on
intuitive grounds?
4. In a symmetric random walk in two dimensions, a particle begins at the
origin and then moves 1 unit to the N, S, E, or W, each with probability
1/4. Let ξ designate return to the origin.
(a) Show that u_{2n+1} = 0 and

u_{2n} = 4^{−2n} C(2n, n) Σ_{i=0}^{n} C(n, i)².

(b) Show that the particle returns to the origin with probability 1.
Argue from this result that the particle must pass through every point
in the integer lattice.
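A small check of the formula in (a) (an added illustration): by the identity Σ_i C(n, i)² = C(2n, n), the expression equals the square of the one-dimensional return probability with p = q = 1/2.

```python
import math

# u_{2n} = 4^{-2n} C(2n,n) sum_i C(n,i)^2; by Vandermonde's identity this
# equals (C(2n,n)/4^n)^2, the squared 1-D return probability.
def u2n(n):
    return 4.0**(-2 * n) * math.comb(2 * n, n) * sum(
        math.comb(n, i)**2 for i in range(n + 1)
    )

check = all(
    abs(u2n(n) - (math.comb(2 * n, n) / 4.0**n)**2) < 1e-15
    for n in range(1, 20)
)
u2 = u2n(1)     # probability of returning at step 2 is 1/4
```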
5. Consider a renewal event ξ with the {f_n} sequence having generating
function F(s). Let N_k denote the number of occurrences in the first k
trials and let q_{k,n} = P(N_k = n).
Show that Q_n(s) = Σ_k q_{k,n} s^k is given by

Q_n(s) = {1 − F(s)}F^n(s) / (1 − s).

Hint: q_{k,n} is a sum of terms of the form P(T_1 = τ_1)P(T_2 = τ_2) · · · P(T_n =
τ_n)P(T_{n+1} > τ), where T_i is the number of trials between the (i − 1)th
and the ith occurrence and τ = k − τ_1 − · · · − τ_n.

6. A coin is tossed repeatedly, heads appearing with probability p = 2/3
on each toss. Let ξ be the renewal event that THH occurs consecutively.
(a) Let u_n = P(ξ occurs at trial n). Show that u_n = 4/27 for most n.
Find the values of n for which u_n = 4/27 does not hold (e.g. u_0 = 1
by convention).
(b) Obtain the generating function of u_n and therefore show that the
generating function of

f_n = P(ξ occurs at trial n for the first time)



is given by

F(s) = 4s³ / (27 − 27s + 4s³).

(c) Is it a recurrent renewal event? If so, find the mean inter-occurrence time. Otherwise, find the probability that ξ will ever occur
again.
7. (Self-organizing data retrieval system). Consider a shelf containing two
books, B1 and B2 (among others). These books have two possible
orders on the shelf, namely B1 B2 or B2 B1. Assume that at epochs
n = 0, 1, 2, . . . a book is required by a library user, and that at any
epoch the probability that B_j is needed is p_j, j = 1, 2, independently of
what happens in other epochs. Assume p_1 > 0, p_2 > 0, p_1 + p_2 < 1. To
obtain the required book, the librarian always searches the book-shelf
from left to right, so that the average search time for the requested book
is minimized if the book with the higher p_j value is on the left.
However, the librarian does not know which book is more popular, and
therefore cannot decide whether B1 B2 or B2 B1 is the better arrangement. To increase the chance of having the requested book nearer the
left much of the time, the following algorithm has been devised. Whenever any book is requested, it is placed at the left end of the shelf when it
is returned. Thus, if B2 is demanded and the shelf order is B1 B2, the
new arrangement will be B2 B1 once B2 is returned.
(a) Let ξ be the renewal event "the shelf order of B1 and B2 is B1 B2".
Let f_n be the lifetime sequence for ξ, having generating function F(s).
Show that

F(s) = (1 − p_2)s + p_1 p_2 s² / (1 − (1 − p_1)s).

(b) Show that ξ is aperiodic and recurrent. Hence determine lim u_n =
P(ξ at epoch n). Determine also the long run probability that the shelf
order is B2 B1.

Chapter 6

Discrete Time Markov Chain

6.1 Introduction

We have studied properties of several special stochastic processes. There is often no precise definition for a stochastic process; a rough definition is that a stochastic process is a collection of random variables. Thus, even a single random variable qualifies as a stochastic process. The simple random walk, branching process, and renewal process are all typical examples of stochastic processes. Some common features of these processes are: each process contains countably many random variables; these random variables are arranged in some order, that is, we know exactly which one is a precedent of the other; and even though the random variables under investigation are not necessarily independent of each other, they are often functions of a sequence of independent and identically distributed random variables.

Why do these stochastic processes share these properties? It is not because stochastic processes in the real world happen to have these properties. Rather, it is because our current knowledge does not allow us to draw meaningful conclusions about more general processes. We have to limit ourselves to these simple, yet useful, stochastic processes.

In this chapter, we consider a slightly more general stochastic process {Xn, n = 0, 1, 2, . . .}. Note that we still allow only countably many random variables, and they are arranged in a fixed order. In addition, we only allow Xn to take countably many possible values. With this restriction, all Xn
are discrete random variables. Since it is always possible to label the set of possible values of the Xn by non-negative integers, we assume that Xn takes only non-negative integer values. When a stochastic process takes values other than non-negative integers, most of our conclusions still apply.
The most important additional assumption on the stochastic process we
make is the following Markov property:
      P{Xn+1 = j | Xn = i, Xn−1 = in−1, . . . , X1 = i1, X0 = i0}
          = P{Xn+1 = j | Xn = i}
          = P{X1 = j | X0 = i}
          = pij.
The first equality specifies the Markov property. It is often described as the property that, given the present (Xn = i), the future (Xn+1 = j) is independent of the past (the outcomes of X1, . . . , Xn−1). The second equality further requires that the Markov property be time-homogeneous; that is, the conditional probability does not depend on the time n. The third equality simply assigns a special notation. We call this quantity the transition probability from state i to state j.
The set of all possible values of the Xn is called the state space. The subscript n of Xn is regarded as time; that is, the value of Xn is the state of the process at time n. Unless otherwise mentioned, the state space will be denoted {0, 1, 2, . . .}, and time will also run over {0, 1, 2, . . .}. If Xn = i, we say that the Markov chain is in state i at time n.

It should be clear that the state space contains all the possible values of X1, all possible values of X2, all possible values of X3, and so on. It is not dictated by any single random variable.
Example 6.1
Suppose X0 = 0, X1 has a discrete uniform distribution on {0, 1}, X2 has a uniform distribution on {0, 1, 2}, and so on. In general, Xn has a discrete uniform distribution on {0, 1, 2, . . . , n} for n = 0, 1, 2, . . ..

The stochastic process {Xn, n = 0, 1, . . .} has state space S = {0, 1, 2, . . .}. The state space is NOT {0, 1, . . . , n}.


Definition 6.1
A stochastic process is a discrete time Markov chain if it
1. consists of a sequence of random variables (that is, countably many),
2. has a countable state space, and
3. has the Markov property.

As already mentioned, the notation pij is used for the transition probability P(Xn+1 = j | Xn = i). It is also called the one-step transition probability from state i to state j. Obviously,

      pij ≥ 0,  i, j ≥ 0;        Σ_{j=0}^{∞} pij = 1,  i = 0, 1, 2, . . . .

It is most convenient to use the matrix notation P = [pij]. All entries of P are non-negative, and the sum of every row of P equals 1.
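These two conditions are easy to check numerically. Below is a minimal sketch in Python (the helper name `is_stochastic` is ours, not from the notes):

```python
def is_stochastic(P, tol=1e-12):
    """Return True if P is a valid transition probability matrix:
    all entries non-negative and every row summing to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol
        for row in P
    )

# A two-state matrix (cf. the communication system example below):
print(is_stochastic([[0.9, 0.1], [0.1, 0.9]]))   # True
print(is_stochastic([[0.9, 0.2], [0.1, 0.9]]))   # False: first row sums to 1.1
```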
Example 6.2 (A communication system)
Consider a communication system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages; at each stage there is a probability p that the digit entered will be unchanged when it leaves. Let Xn be the digit entering the nth stage. Then {Xn, n = 1, 2, . . .} is a two-state Markov chain. The state space is {0, 1}. The transition matrix is

      P = [   p     1 − p ]
          [ 1 − p     p   ].
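For this two-state chain, diagonalizing P gives the standard closed form p00^(n) = 1/2 + (2p − 1)^n / 2, which can be checked against direct matrix powering. A sketch (helper names ours):

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, m):
    """m-th matrix power by repeated multiplication."""
    n = len(P)
    R = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(m):
        R = mat_mult(R, P)
    return R

p = 0.9
P = [[p, 1 - p], [1 - p, p]]
for m in (1, 5, 50):
    print(m, mat_pow(P, m)[0][0], 0.5 + 0.5 * (2 * p - 1) ** m)
```

As m grows, p00^(m) → 1/2 whenever 0 < p < 1: after many stages the digit leaving the system is nearly equally likely to be 0 or 1.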

Example 6.3 (Simple random walk):


It is more convenient to denote the state space as {0, ±1, ±2, . . .}. The transition probabilities are

      pij = p,        j = i + 1;
            1 − p,    j = i − 1;
            0,        |i − j| ≠ 1.

Example 6.4 (Branching process):


Assume Z0 = 1 and the family size has a Binomial(2, p) distribution. Then the state space is {0, 1, 2, . . .}, and the transition probabilities are

      pij = (2i choose j) p^j (1 − p)^(2i−j),   j = 0, 1, . . . , 2i.
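Since the total offspring of i individuals, each with an independent Binomial(2, p) family size, is Binomial(2i, p), each row of the transition matrix is a binomial pmf. A quick sketch (function name ours):

```python
from math import comb

def branching_row(i, p):
    """Transition probabilities out of state i: the next generation
    size is Binomial(2i, p), so p_ij = C(2i, j) p^j (1-p)^(2i-j)."""
    return [comb(2 * i, j) * p ** j * (1 - p) ** (2 * i - j)
            for j in range(2 * i + 1)]

row = branching_row(3, 0.4)          # from state 3: next state 0, ..., 6
print([round(x, 4) for x in row])
print(sum(row))                      # each row sums to 1
```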

To model an experiment (or real-world phenomenon) by a (discrete time) Markov chain, we should go through the following three steps:

  1. identify a sequence of random variables;
  2. identify the corresponding state space;
  3. obtain the transition probabilities.

We should further confirm that the state space is countable, and that the Markov property is suitable. In the above examples, you may easily find that the state spaces are countable. If we work more carefully, we often find that the Markov property holds, as we can calculate the transition probabilities without conditioning on the state of Xn−1 and so on.

Finally, we should certainly make sure that the Markov chain so defined serves the purpose of solving the problem under investigation. We may define a flawless Markov chain that nevertheless does not help us answer the question asked.
Example 6.5 (Weather forecasting):


Suppose that there are only two possible weather conditions for any single day: rain or sunny. In addition, we assume that tomorrow's weather depends on today's weather, but not on previous weather conditions once today's weather is given. Also, the chance of rain tomorrow given that it rains today is α, and the chance of it being sunny tomorrow given that today is sunny is β. We can model this experiment as a discrete time Markov chain.

First, we define

      Yn = 1, if it is sunny on the nth day;
           0, if it rains on the nth day.

The state space is clearly {0, 1} by the above definition, and it is countable. The Markov property is satisfied as it is clearly stated in the description of the problem. We certainly cannot blindly believe that the real world can indeed be modeled by a stochastic process with the Markov property. At the same time, it is hoped that this is a harmless mathematical assumption. The transition probability matrix is

      P = [   α     1 − α ]
          [ 1 − β     β   ].

The model in the above example might be too simplistic to be useful in the real world. One possible remedy, which gives a more realistic model that is still mathematically simple, is to assume that tomorrow's weather depends only on the weather of yesterday and today. Let us use the notation R for rain and S for sunny, and assume the transition probabilities are as follows:

      Yesterday (n − 1)   Today (n)   P(Tomorrow = R)
             R                R             0.7
             S                R             0.5
             R                S             0.4
             S                S             0.2

Under the current specification, the process {Yn} defined in the last example is no longer a Markov chain. The probability that Yn+1 = 1 depends on both Yn and


Yn−1. (Can we verify this mathematically?) However, it is possible to build from this process a different process which is a Markov chain.
Let us define, for n ≥ 1,

      Yesterday (n − 1)   Today (n)   Xn
             R                R        0
             S                R        1
             R                S        2
             S                S        3

We may have to ignore X0. The stochastic process {Xn, n = 1, 2, . . .} has state space {0, 1, 2, 3}, which is still countable. It also has the Markov property; for instance,

      P(Xn+1 = 0 | Xn = 0, whatever Xn−1, Xn−2, . . .) = 0.7.

Note that Xn−1 is assumed to be consistent with Xn so that the probability of the event in the conditioning part is non-zero. In general, we may define P(A|B) to be anything if P(B) = 0.

To verify the Markov property, we need only work out the transition matrix.
The transition matrix turns out to be

      P = [ 0.7    0    0.3    0  ]
          [ 0.5    0    0.5    0  ]
          [  0    0.4    0    0.6 ]
          [  0    0.2    0    0.8 ].

It is simple to check again that pij ≥ 0 and Σj pij = 1.
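The construction above — recovering the Markov property by tracking the pair (yesterday, today) — can be sketched in a few lines. The encoding 0–3 matches the table; the dictionary `rain_prob` and the helper name are ours:

```python
# States encode (yesterday, today): 0 = (R,R), 1 = (S,R), 2 = (R,S), 3 = (S,S).
# rain_prob[state] = P(tomorrow = R | yesterday, today), from the table above.
rain_prob = {0: 0.7, 1: 0.5, 2: 0.4, 3: 0.2}

def next_state(state, tomorrow_rains):
    today_rains = state in (0, 1)          # states 0 and 1 have today = R
    if today_rains:
        return 0 if tomorrow_rains else 2  # new pair (R,R) or (R,S)
    return 1 if tomorrow_rains else 3      # new pair (S,R) or (S,S)

# Build the 4x4 transition matrix of the lifted chain.
P = [[0.0] * 4 for _ in range(4)]
for s in range(4):
    P[s][next_state(s, True)] = rain_prob[s]
    P[s][next_state(s, False)] = 1 - rain_prob[s]

for row in P:
    print(row)
```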


Remark: There is more than one way to label the state space of a stochastic process. If we exchange the states 0 and 1 in the above definition of the random variable Yn, then the transition matrix would have its first and second rows, and its first and second columns, exchanged too. This may create some problems for marking your assignments, but it is not a problem in theory.


When trying to use a probability model to describe a real-world phenomenon, we are certain that it is not correct. We are happy to learn that it might still be very useful. Further, by increasing the model complexity, we can often obtain a model that is closer to reality and very useful.
Example 6.6
Suppose there are 3 white and 3 black balls distributed between two urns, each containing 3 balls. At each step, we draw a ball randomly from each urn, then exchange the two balls and put them back into the urns. We are interested in the number of white balls in the first urn after n exchanges.
Let Xn be the number of white balls in the first urn after n exchanges.
We will show {Xn } is a Markov chain.
Step 1: It is obvious that Xn can only be 0, 1, 2, or 3, regardless of n.
Therefore, the state space is {0, 1, 2, 3} which is countable.
Step 2: We need to verify the Markov property. First note that

      P(Xn+1 = j | Xn = i, Xn−1 = in−1, . . .) = 0   if |i − j| ≥ 2.

The notation in−1 in the above expression simply means some number. The equation says that the transition probability from state i to state j does not depend on the value of in−1 or the others; it equals 0 as long as |i − j| ≥ 2.

We have a number of other cases. If i = 0 and j = 0, we find

      P(Xn+1 = 0 | Xn = 0, whatever others) = 0.

This is because the ball drawn from the second urn will definitely be white. Obviously

      P(Xn+1 = 1 | Xn = 0, whatever others) = 1.

There is no need to consider other cases when i = 0.
If i = 3, we have

      P(Xn+1 = 2 | Xn = 3, whatever others) = 1.

If i = 2, we have

      P(Xn+1 = 2 | Xn = 2, whatever others) = 4/9,
      P(Xn+1 = 1 | Xn = 2, whatever others) = 4/9,
      P(Xn+1 = 3 | Xn = 2, whatever others) = 1/9.

If i = 1, we have

      P(Xn+1 = 2 | Xn = 1, whatever others) = 4/9,
      P(Xn+1 = 1 | Xn = 1, whatever others) = 4/9,
      P(Xn+1 = 0 | Xn = 1, whatever others) = 1/9.
As none of the above transition probabilities depend on the "whatever" part, the Markov property has been verified. The transition probability matrix is

      P = [  0     1     0     0  ]
          [ 1/9   4/9   4/9    0  ]
          [  0    4/9   4/9   1/9 ]
          [  0     0     1     0  ].

This completes the task of modeling this experiment as a Markov chain.
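The case analysis above can be automated by conditioning on the colors of the two balls drawn, using exact fraction arithmetic (the function name is ours):

```python
from fractions import Fraction

def urn_row(i, n=3):
    """Transition probabilities from state i (= number of white balls in
    urn 1) when each urn holds n balls, with n white and n black in total,
    and one ball drawn from each urn is exchanged."""
    row = [Fraction(0)] * (n + 1)
    p_white1 = Fraction(i, n)       # urn 1 holds i white balls
    p_white2 = Fraction(n - i, n)   # urn 2 holds the other n - i white
    for w1 in (0, 1):               # w1 = 1: white drawn from urn 1
        for w2 in (0, 1):           # w2 = 1: white drawn from urn 2
            prob = (p_white1 if w1 else 1 - p_white1) * \
                   (p_white2 if w2 else 1 - p_white2)
            if prob > 0:
                row[i - w1 + w2] += prob   # exchange changes count by w2 - w1
    return row

P = [urn_row(i) for i in range(4)]
for row in P:
    print([str(x) for x in row])
```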

The above model turns out to be very useful for describing diffusion processes in the physical world. If you drop colored water into a cup of clean water, soon the color will spread out, and it will mix with the water perfectly without our help. By imagining molecules moving randomly from one part of the cup to another, we can see that, in the limit, the color molecules will be distributed uniformly over the whole cup.

6.2 Chapman-Kolmogorov Equations

Suppose that in Example 6.6 it is known that X0 = 1. Given this information, what is the probability that X2 = 1? More generally, what is the probability that X100 = 1?


The first question has a specific answer; the second one can be answered in principle.

      P(X2 = 1 | X0 = 1) = P(X2 = 1, X0 = 1) / P(X0 = 1)
                         = Σ_{j=0}^{3} P(X2 = 1, X1 = j, X0 = 1) / P(X0 = 1)
                         = Σ_{j=0}^{3} P(X2 = 1 | X1 = j) P(X1 = j | X0 = 1)
                         = Σ_{j=0}^{3} p1j pj1
                         = 41/81.

The solution to P(X100 = 1 | X0 = 1) can be obtained in the same way, except that we would need to sum terms which are products of 100 numbers. Due to the (time-homogeneous) Markov property, we know that

      P(Xn+2 = j | Xn = i) = P(X2 = j | X0 = i),

which does not depend on n. We hence call it the two-step transition probability. Using the notation pij^(2), we find

      pij^(2) = Σk pik pkj.

The above formula may look complex. However, expressed in matrix form, it becomes

      P^(2) = P^2,

where P^(2) is the two-step transition probability matrix and P is the one-step transition probability matrix. In general, we have

      P^(m) = P^m.
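For the urn chain of Example 6.6, both questions above can be answered by matrix powering: p11^(2) = 41/81 falls out of P^2, and P^100 answers the second question numerically (its rows are all nearly equal, anticipating the limiting behaviour discussed in Section 6.4). A sketch with exact fractions:

```python
from fractions import Fraction as F

# One-step matrix of Example 6.6.
P = [[F(0), F(1), F(0), F(0)],
     [F(1, 9), F(4, 9), F(4, 9), F(0)],
     [F(0), F(4, 9), F(4, 9), F(1, 9)],
     [F(0), F(0), F(1), F(0)]]

def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mult(P, P)
print(P2[1][1])                    # 41/81, the two-step probability

Pn = P
for _ in range(99):                # Pn = P^100
    Pn = mat_mult(Pn, P)
print([float(x) for x in Pn[1]])   # close to (1/20, 9/20, 9/20, 1/20)
```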
When the state of the Markov chain at time 0 is not given, but we know the distribution of X0, say

      P(X0 = i) = αi,

we can then find the distribution of X1 as follows:

      P(X1 = j) = Σk αk pkj.

Let α be the row vector of the αi and α1 the row vector of the P(X1 = j); then we have

      α1 = αP.

Similarly, if αm is the row vector of the P(Xm = j), we have

      αm = αP^m.
The formula P^(m) = P^m and its generalized form

      P^(m) P^(n) = P^(m+n)

are called the Chapman-Kolmogorov equations. They are simple and straightforward. I am sure that if you had been born a few centuries earlier, it could be your name that is attached to these formulas.

6.3 Classification of States

In almost any discipline, it is often possible to classify the objects under investigation into several groups, where the objects in the same group share some common properties.

Recall that all possible values of all Xn, n = 0, 1, . . ., form the state space S of the Markov chain. For convenience, we denote it as {0, 1, 2, . . .} most of the time. In this section, we classify these states according to the stochastic properties of the Markov chain. As you may have already guessed, the transition probability matrix P, together with the state space S, almost completely determines the properties of the Markov chain.
Accessible: We say that state j is accessible from state i if there exists m ≥ 0 such that

      pij^(m) = P(Xn+m = j | Xn = i) > 0.

This means that if the Markov chain ever enters state i, it is possible for it to enter state j in the future.


Communicate: If states i and j are accessible from each other, we say that they communicate, and we use the notation i ↔ j. We now use this relation to classify the states.

We first claim that any state of the Markov chain communicates with itself. This might be interpreted as saying that state i is accessible from itself in 0 steps; it may also be regarded simply as a convention.

Second, it is easy to see that if state i communicates with state j, then state j communicates with state i. This is a consequence of the definition of communication being symmetric in the states i and j.
Third, if state i communicates with state j, and state j communicates with state k, then state i communicates with state k too. This can be proved as follows. The assumptions imply the existence of integers m1 and m2 such that

      P(Xm1 = j | X0 = i) > 0,      P(Xm2 = k | X0 = j) > 0.

Hence,

      P(Xm1+m2 = k | X0 = i) = Σl P(Xm1+m2 = k | Xm1 = l) P(Xm1 = l | X0 = i)
                             ≥ P(Xm1+m2 = k | Xm1 = j) P(Xm1 = j | X0 = i)
                             > 0.

Similarly, we can show that

      P(Xm1+m2 = i | X0 = k) > 0

for a possibly different pair of m1 and m2.
The above three properties show that "communicate" is an equivalence relation. An equivalence relation divides the state space uniquely into equivalence classes, which have the following two properties:
1. Any two states in the same class are equivalent to each other.
2. Two states from two different classes are not equivalent.


Consequently, the relation "communicate" divides the state space of a Markov chain into distinct classes. The states in the same class communicate with each other; states from different classes do not communicate. If all states communicate with each other, then there is only one class in the state space; in this case, we say that the Markov chain is irreducible. The idea behind this notion is: if a Markov chain is reducible, we might be able to reduce it before we study its other properties. One principle in mathematics is always to simplify the problem under investigation.
Example 6.7
Consider a Markov chain consisting of four states 0, 1, 2, 3 and having transition matrix

      P = [ 1/2   1/2    0     0  ]
          [ 1/2   1/2    0     0  ]
          [ 1/4   1/4   1/4   1/4 ]
          [  0     0     0     1  ].

Note that our setup assumes that {Xn, n = 0, 1, . . .} has been defined, that the state space is {0, 1, 2, 3}, and that the transition probabilities are already specified by the transition probability matrix. In many assignment problems, you may have to go through these omitted steps yourself. When you get lost on an assignment problem before you even get started, first ask what these ingredients are.




[Transition diagram: states 0 and 1 point to each other, state 2 has arrows to all four states, and state 3 loops to itself.]



It is obvious that states 0 and 1 communicate with each other. Hence,


they belong to the same class. All states can be accessed from state 2.
However, state 2 can only be accessed from itself. Therefore, state 2 does
not belong to the class that contains states 0 and 1. It has to form its own
class. State 3 is an absorbing state. When the Markov chain enters this
state, it will stay there forever. Consequently, it does not communicate with
other states either. Hence, state 3 also forms its own class.
In summary, the state space of this Markov chain is divided into three
classes: {0, 1}, {2}, and {3}.
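The partition into classes can be computed mechanically from the transition matrix: find which states are reachable from which, then intersect. A sketch for the chain of Example 6.7 (function name ours):

```python
def communicating_classes(P):
    """Partition the states of a finite chain into communicating classes.
    reach[i] = states accessible from i (i itself included, in 0 steps)."""
    n = len(P)
    reach = [{i} for i in range(n)]
    for i in range(n):
        stack = [i]
        while stack:
            k = stack.pop()
            for j in range(n):
                if P[k][j] > 0 and j not in reach[i]:
                    reach[i].add(j)
                    stack.append(j)
    classes = []
    for i in range(n):
        # i and j communicate iff each is reachable from the other.
        cls = {j for j in range(n) if j in reach[i] and i in reach[j]}
        if cls not in classes:
            classes.append(cls)
    return classes

P = [[0.5, 0.5, 0, 0],
     [0.5, 0.5, 0, 0],
     [0.25, 0.25, 0.25, 0.25],
     [0, 0, 0, 1]]
print(communicating_classes(P))  # [{0, 1}, {2}, {3}]
```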

What is the consequence of a Markov chain being reducible? In the above example, if X0 = 0, then states 2 and 3 will never be reached. Thus, the Markov chain practically has the reduced state space {0, 1}.
Next, we consider the properties of the states in the same class.

It is obvious that "the Markov chain enters state i" is a renewal event for any given i. Well, it is probably a delayed renewal event, as we cannot assume X0 = i for every i. However, once the Markov chain is in state i at time n, the future does not depend on the outcomes of Xk, k < n (conditionally). This justifies the claim that "Xn = i" is a (delayed) renewal event.

With this, we may ask: is the renewal event "Xn = i" transient, positive recurrent, or null recurrent? In short, we simply ask: is state i transient, positive recurrent, or null recurrent? To avoid the complication raised by the delay, we answer this question under the assumption that X0 = i whenever we investigate the property of state i.
Recall that a renewal event is recurrent if the probability of its occurrence in the future is one. Let

      fi = P(Xn = i for some n ≥ 1 | X0 = i).

Then state i is recurrent if and only if fi = 1. If fi < 1, state i is transient. In this case, the Markov chain will enter state i only a finite number of times; sooner or later, the Markov chain will leave this state forever.
Consider the Markov chain defined in Example 6.7; it is easily seen that state 2 is transient. Starting from state 2, the Markov chain has probability 0.25 of staying in the same state after one transition. However, once it leaves state 2, the Markov chain will never re-enter this state.
According to our discussion of renewal events, it is also true that state i is transient if and only if

      Σn P(Xn = i | X0 = i) = Σn pii^(n) < ∞.

(Recall that we obtained this theorem using generating functions.) Consequently, state i is recurrent if and only if

      Σn P(Xn = i | X0 = i) = Σn pii^(n) = ∞.

Recall that if state i is transient, the Markov chain will visit i only a finite number of times (over the infinite time horizon). If a Markov chain has only a finite number of states, at least one of the states must be visited infinitely many times. We hence conclude:

Corollary 6.1
If a Markov chain has a finite state space, then at least one of its states is positive recurrent.

Further, we find the following is true.

Corollary 6.2
If state i is recurrent, and states i and j communicate, then state j is also recurrent.

Proof: We use the criterion that a renewal event is recurrent if and only if Σn un = ∞; see Theorem 5.1. For state j, the un is pjj^(n).

Since i and j communicate, there exist m1 and m2 such that

      pij^(m1) > 0,      pji^(m2) > 0.

In addition,

      pjj^(m1+n+m2) ≥ pji^(m2) pii^(n) pij^(m1).


Now, summing over n on both sides, we find

      Σn pjj^(m1+n+m2) ≥ pji^(m2) [Σn pii^(n)] pij^(m1) = ∞.

Note that this implies Σn pjj^(n) = ∞.

Remarks: The above result implies that recurrence is a class property: if one state in a class is recurrent, then all states in the same class are recurrent. Further, we see that transience is also a class property: if one state is transient, then all states in the same class are transient. We claim without proof that positive recurrence and periodicity are also class properties.

Example 6.8
Let the Markov chain consisting of the states 0, 1, 2, and 3 have the transition probability matrix

      P = [ 0   0   1/2   1/2 ]
          [ 1   0    0     0  ]
          [ 0   1    0     0  ]
          [ 0   1    0     0  ].
Determine which states are transient and which are recurrent.
Solution: The Markov chain is irreducible and has a finite state space; hence all states are recurrent.

Example 6.9
Consider the Markov chain having states 0, 1, 2, 3, and 4, with the transition probability matrix

      P = [ 1/2   1/2    0     0     0  ]
          [ 1/2   1/2    0     0     0  ]
          [  0     0    1/2   1/2    0  ]
          [  0     0    1/2   1/2    0  ]
          [ 1/4   1/4    0     0    1/2 ].

Classify the state space (identify the transient, positive recurrent, and null recurrent classes, and the period of each class).


Solution: This chain consists of the three classes {0, 1}, {2, 3}, and {4}. The first two classes are positive recurrent and the third is transient. All classes are aperiodic.

Example 6.10 (Simple random walk)


We already know many properties of the simple random walk. It is simple to see that the chain is irreducible. When p = 0.5, all states are recurrent; when p ≠ 0.5, all states are transient. Note that this chain has an infinite state space.

We may verify the recurrence property of state 0 directly. Recall

      p00^(2n) = (2n)! / (n! n!) {p(1 − p)}^n

for all n = 1, 2, . . .. Using Stirling's formula, we find

      p00^(2n) ≈ {4p(1 − p)}^n / √(πn).

Hence, when 4p(1 − p) < 1, Σn p00^(n) < ∞ and state 0 is transient. This happens when p ≠ 0.5. Otherwise the sum is infinite and state 0 is recurrent.
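The dichotomy can be seen numerically. Computing each term of the series from the ratio C(2n+2, n+1)/C(2n, n) = 2(2n+1)/(n+1) avoids huge binomial coefficients; the partial sums converge for p ≠ 0.5 and keep growing for p = 0.5. The known identity Σn C(2n, n) x^n = 1/√(1 − 4x) even recovers the return probability f0 = 1 − |p − q| (a sketch, not from the notes):

```python
from math import sqrt

def sum_p00(p, N):
    """Partial sum over n = 0..N of p00^(2n) = C(2n, n) {p(1-p)}^n,
    using the term ratio C(2n+2, n+1)/C(2n, n) = 2(2n+1)/(n+1)."""
    x = p * (1 - p)
    term, total = 1.0, 1.0       # the n = 0 term is 1
    for n in range(N):
        term *= 2 * (2 * n + 1) / (n + 1) * x
        total += term
    return total

# p = 0.6: the full series sums to 1/sqrt(1 - 4p(1-p)) = 5, so the
# probability of ever returning to 0 is f0 = 1 - 1/5 = 0.8 = 1 - |p - q|.
print(sum_p00(0.6, 2000), 1 / sqrt(1 - 4 * 0.6 * 0.4))

# p = 0.5: the partial sums grow without bound (roughly like sqrt(n)).
print(sum_p00(0.5, 100), sum_p00(0.5, 10000))
```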

It turns out that the two-dimensional random walk (on the grid) has a similar property: if the probabilities of moving in the 4 directions are equal, then all states are recurrent. Simple random walks in three or higher dimensions lose this property: all states are transient.

Remark: It is often asked whether a closed class is necessarily a recurrent class. The simple random walk answers this question. When p ≠ 0.5, the state space is a closed class, but it is also a transient class.
Remark: We have discussed several ways to determine whether a state is recurrent or transient, and it may not be clear to inexperienced users which method should be applied. My rules of thumb are:

1. When confused, ask yourself what the definitions of recurrence and transience are.

2. First try to see if the state is transient. If it belongs to an open class, then it is transient.

3. If it belongs to a closed class, check the finiteness of the class. If the class is finite, state i is positive recurrent.

4. If it belongs to a class which is closed but of infinite size, try to determine whether Σn pii^(n) is finite or infinite.

Unfortunately, you have to be resourceful in order to use the last criterion. Thus, you should use it only as a last resort.

6.4 Limiting Probabilities

Recall that a renewal event is aperiodic if its period is one, and a Markov chain is irreducible if all its states belong to the same class. A positive recurrent, aperiodic state is called ergodic. These properties are all class properties: if one state is found to have the property, all states in the same class share it.

We have seen that the precise distribution of Xn is often hard to obtain for each n. However, when the process has been running for a long time, the distribution of Xn seems to stabilize. For example, shortly after we drop coloured water into a cup of clean water, we cannot tell exactly where the colour molecules are; after we have waited long enough, we are certain that they have spread out very uniformly. Mathematically, we can find a limit for P(Xn = j) as n → ∞. The result is summarized by the following theorem.
Theorem 6.1
For an irreducible ergodic Markov chain, limn→∞ pij^(n) exists and is independent of the initial state i. Further, letting

      πj = limn→∞ pij^(n),

the πj are the unique nonnegative solution of

      πj = Σ_{i=0}^{∞} πi pij,        Σj πj = 1.

We do not present a proof of this theorem. However, since the Markov chain entering state j is a renewal event, the renewal theorem applies. Hence, under the conditions of this theorem, lim pjj^(n) exists and equals 1/μj, where μj is the average time it takes for the Markov chain to come back to state j. Further, when the Markov chain is irreducible and ergodic, for any state j the probability that the chain eventually enters j is 1. This implies that the limit of pij^(n) is the same as that of pjj^(n), regardless of the state from which the chain starts.

Let π be the vector of the limiting probabilities, and πn the vector of the P(Xn = j | X0 = i). From the Chapman-Kolmogorov equations, we have

      πn+1 = πn P.

Letting n → ∞ on both sides, under the assumption that the limit of pij^(n) exists, we have

      π = πP.

That is, π must be a solution of this equation. However, P − I does not have full rank, so there exist infinitely many solutions. The one which also satisfies Σj πj = 1 gives the limiting probabilities; the solution with this property is unique.

The renewal theorem claims πj = 1/μj, where μj is the expected inter-occurrence time. Hence πj is the long-run proportion of time the Markov chain spends in state j.
When the Markov chain is irreducible and positive recurrent, but not aperiodic, we may still have a unique non-negative solution of

      π = πP

satisfying Σj πj = 1. In this case, πj is still the long-run proportion of time the Markov chain spends in state j, but the limit of pij^(n) may not exist. We still have πj = 1/μj; that is, the expected inter-occurrence time is given by μj = 1/πj. When limn→∞ pij^(n) exists and is equal for all i, then limn→∞ P(Xn = j) = πj.

If a solution to π = πP satisfying Σj πj = 1 exists and the Markov chain is irreducible, then all states are positive recurrent.

Recall that P(Xn = j) = Σ_{i=0}^{∞} pij^(n) P(X0 = i). Letting n → ∞ results in

      lim P(Xn = j) = πj.

If πn = π, then πn+m = π for all m = 1, 2, . . .. Hence, we say that π is the stationary distribution of the Markov chain; in some books, it is also called the steady state of the Markov chain. Note that a stationary distribution may exist even when the Markov chain is reducible (not irreducible); in this case, there can exist more than one stationary distribution.

When πn = π, we also say that the Markov chain has reached equilibrium. In this status, the rate of the chain entering any given state is the same as the rate of the chain leaving this state.
Example 6.11
A problem of interest to sociologists is to determine the proportion of society that has an upper- or lower-class occupation. One possible mathematical model is to assume that transitions between the social classes of successive generations in a family can be regarded as transitions of a Markov chain. Let us examine a single family line through their first child. Let Xn = 0, 1, 2 according to the social class of the child in the nth generation. Suppose Xn is a Markov chain with transition probability matrix

      P = [ 0.45   0.48   0.07 ]
          [ 0.05   0.70   0.25 ]
          [ 0.01   0.50   0.49 ].
Solving the equations πP = π and Σj πj = 1, we get

      π0 ≈ 0.062,   π1 ≈ 0.623,   π2 ≈ 0.314.

In other words, in the long run the child under consideration has about a 6% chance of belonging to class 0. If this model applies to all individuals in the society, about 6% of the people in the population will belong to class 0 in the long run.
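The linear system is easy to solve numerically. The sketch below exploits the convergence of πn = πn−1 P (Theorem 6.1), iterating from a uniform start; it gives approximately (0.062, 0.623, 0.314). The function name is ours:

```python
def stationary(P, n_iter=500):
    """Approximate the stationary distribution by iterating pi <- pi P
    from a uniform start; valid here because the chain is irreducible
    and aperiodic, so the iterates converge (Theorem 6.1)."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

P = [[0.45, 0.48, 0.07],
     [0.05, 0.70, 0.25],
     [0.01, 0.50, 0.49]]
pi = stationary(P)
print([round(x, 3) for x in pi])   # approximately [0.062, 0.623, 0.314]
```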

Example 6.12
Consider a large population of individuals and their genotypes at a specific locus. Each individual has a pair of genes at this locus. A gene can take different forms, called alleles. In this example, we assume that there are only two possible alleles, named A and a. In generation 0, the proportions of individuals with genotypes AA, aa, and Aa are respectively p0, q0, and r0 (p0 + q0 + r0 = 1). Mendel's law states that a child inherits one gene from each parent, and the two genes of each parent are equally likely to be transmitted to the offspring.

It is a bit difficult to define a sequence of random variables Xn explicitly here. Consider a line of individuals such that each person is the first child of the person considered in the previous generation. This person serves as a representative of the general population.

Let Xn be the genotype of the nth person in this line, n = 0, 1, 2, . . .. Assume

      P(X0 = AA) = p0,   P(X0 = Aa) = r0,   P(X0 = aa) = q0.

That is, the first individual is chosen randomly from the population. In addition, we assume his/her spouse is selected from the population randomly (at least in terms of genotype). Let Yn be the genotype of the spouse of the nth person. We assume that Yn has the same distribution as Xn, independently of Xn.
      P(X1 = AA | X0 = AA)
          = P(X1 = AA, Y0 = AA | X0 = AA)
            + P(X1 = AA, Y0 = Aa | X0 = AA)
            + P(X1 = AA, Y0 = aa | X0 = AA)
          = p0 + r0/2 + 0.


If Yn has distribution (pn, rn, qn), we have

      P(Xn+1 = AA | Xn = AA)
          = P(Xn+1 = AA, Yn = AA | Xn = AA)
            + P(Xn+1 = AA, Yn = Aa | Xn = AA)
            + P(Xn+1 = AA, Yn = aa | Xn = AA)
          = pn + rn/2 + 0.

Note that P(Yn = AA | Xn = AA) = P(Yn = AA) due to independence. Straightforward calculation reveals that the transition probability matrix from the nth generation to the (n + 1)th generation is given by

p +
n
Pn =
0

pn
+
2

rn
2
rn
4

0
qn +
qn
+
2

rn
2
rn
4

q + r2n
p + r2n
pn
+ 2q +
2

rn
2

Using the notation (pn, rn, qn) for the distribution of Xn, we have, for all n ≥ 1,

      (pn, rn, qn) = [ (p0 + r0/2)^2,  2(p0 + r0/2)(q0 + r0/2),  (q0 + r0/2)^2 ].

Our computation implies that the distribution of Xn stabilizes after one generation. It is simple to verify that all Pn, n ≥ 0, are in fact equal; hence {Xn, n = 0, 1, . . .} is a Markov chain. Its limiting distribution is the distribution of X1, namely (p1, r1, q1), and the transition probability matrix is given by P1.
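The stabilization after one generation (the Hardy-Weinberg equilibrium) is easy to verify numerically. The map below sends (pn, rn, qn) to (pn+1, rn+1, qn+1); it is exactly multiplication of the row vector (pn, rn, qn) by Pn (the function name is ours):

```python
def next_generation(p, r, q):
    """One generation of random mating.  a = P(a random parent transmits
    allele A) = p + r/2, b = q + r/2; offspring genotypes AA, Aa, aa then
    have probabilities a^2, 2ab, b^2."""
    a = p + r / 2
    b = q + r / 2
    return (a * a, 2 * a * b, b * b)

dist = (0.5, 0.2, 0.3)                 # an arbitrary starting population
for n in range(4):
    dist = next_generation(*dist)
    print(n + 1, tuple(round(x, 6) for x in dist))
```

The printed distributions are identical from generation 1 onward, because the allele frequency a = p + r/2 is conserved from one generation to the next.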

Example 6.13 (Renewal events):


Consider a sequence of Bernoulli trials with outcomes H and T, with P(H) = p and P(T) = q. We may model any pattern as a state of a Markov chain. For instance, if we are interested in occurrences of TH, we may define

      Outcome at trials (n − 1, n)   Xn
                TT                    0
                TH                    1
                HT                    2
                HH                    3


We may have to define X1 differently. This is obviously an irreducible, ergodic Markov chain, and for all n ≥ 2,

      P(Xn = 1) = pq.

By Theorem 6.1 on the limiting probabilities of a Markov chain, we have π1 = pq. The average time it takes for the Markov chain to come back to state 1 (starting from 1) is therefore 1/(pq).

Since starting from state 1 happens to be equivalent to starting from scratch, the expected waiting time for TH to occur is the same as the average inter-occurrence time.

This can also be regarded as a consequence of the Renewal Theorem (Theorem 5.2).
Example 6.14
If we want to know the expected time until the occurrence of the pattern THT, caution is needed. Using the same argument, we can get the average time until the next appearance of THT starting from THT, which is 1/(pq^2). However, starting from THT is different from starting from scratch; you may notice that "THT occurs" is a delayed renewal event.

To calculate the average waiting time for the first occurrence of THT from the beginning, note the following fact: once T has occurred, the waiting time distribution for THT from that moment is the same as the waiting time distribution for THT from the moment when THT has just occurred. Therefore, we need only add the average waiting time for the first occurrence of T. Since T is a renewal event, the technique used in Example 6.13 applies.

The renewal theorem, or the limiting probability theorem for Markov chains, tells us that this average waiting time is 1/q. Consequently, the average waiting time for THT to occur from the beginning is

      1/q + 1/(pq^2).

Can you justify now that

E[waiting time until THTH appears] = 1/(pq) + 1/(p²q²)?

6.5 Mean Time Spent in Transient States
If a state is transient, then the Markov chain will eventually leave this state
forever after some finite amount of time. Let f_i be the probability that the chain will
return to state i (starting from i). Then the number of future visits N_i (starting
from i) has geometric distribution

P(N_i = n) = f_i^n (1 − f_i),  n = 0, 1, . . . .

Hence E[N_i] = f_i/(1 − f_i), so the expected total number of time periods spent
in state i, counting the initial one, is (1 − f_i)^{−1}. Please note that the probability
of success (leaving forever from now on) is 1 − f_i in this case.
It is, however, not obvious how to calculate f_i. Assume we have a finite
class of transient states T = {1, 2, . . . , t}. Let

P_T = [p_{ij}]

be the matrix of transition probabilities within this class. This is just a submatrix
of the full transition probability matrix.

We claim that at least one of its rows has a sum less than 1. Otherwise,
the Markov chain would stay inside this class forever, and the class would not
be transient.

Let s_{ij} be the expected number of time periods that the chain spends in
state j, given that it starts from i. Let δ_{ij} = 1 when i = j and 0 otherwise.

Recall that once the chain leaves this class, it can never come back: otherwise
some state outside the class would communicate with a state inside it, which
contradicts the definition of a class. Using the conditional expectation technique,
we get

$$s_{ij} = \delta_{ij} + \sum_{k=1}^{t} p_{ik} s_{kj}.$$

Letting S = [s_{ij}], we may write the above equation as

S = I + P_T S.

Hence S = (I − P_T)^{−1}.
Example 6.15
Consider the gambler's ruin problem with p = 0.4 and N = 7. The class of
transient states consists of {1, 2, . . . , 6}. We can easily find p_{i,i+1} = 0.4 and
p_{i,i−1} = 0.6 for these transient states. Inverting I − P_T gives a big matrix
(using Splus or whatever you can think of). It turns out that

s_{3,5} = 0.9228,  s_{3,2} = 2.3677.

How does the above calculation relate to the f_i we discussed earlier? Let f_{ij}
be the probability that, starting from state i, the Markov chain will ever visit
state j (ever return, if i = j). Hence f_i = f_{ii}. It can be seen that

s_{ij} = (δ_{ij} + s_{jj}) f_{ij} + δ_{ij} (1 − f_{ij}) = δ_{ij} + f_{ij} s_{jj}.

This is a very simple relation.
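A short numpy sketch reproduces the Splus computation in Example 6.15 and the relation between S and the f_{ij}:

```python
import numpy as np

# Transient class T = {1,...,6} of the gambler's ruin chain with p = 0.4, N = 7.
t, p = 6, 0.4
PT = np.zeros((t, t))
for i in range(t):
    if i + 1 < t:
        PT[i, i + 1] = p        # win one unit
    if i - 1 >= 0:
        PT[i, i - 1] = 1 - p    # lose one unit
# (The rows for states 1 and 6 sum to less than 1: the chain can leave the class.)

S = np.linalg.inv(np.eye(t) - PT)   # S = (I - P_T)^{-1}

s35 = S[2, 4]   # s_{3,5}: state i corresponds to row i-1 (0-based indexing)
s32 = S[2, 1]   # s_{3,2}
f32 = s32 / S[1, 1]   # f_{3,2} = (s_{3,2} - delta_{3,2}) / s_{2,2}
```

The computed entries match the values quoted in the example to four decimals.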

6.6 Problems

1. Let {X_n}_{n=0}^∞ be a stochastic process.
(a) If for each fixed n, X_n has density function

f(x) = 1 when x ∈ [n, n + 1],

write down the state space of this process.
(b) If for each fixed n, P(X_n = n) = P(X_n = 1) = 0.5, write down
the state space of this process.
(c) Which of the above state spaces, the one in (a) or in (b), is countable?
2. Suppose that whether or not it rains today depends on previous weather
conditions through the last three days. Show how this system may
be analyzed by using a Markov chain. How many states are needed?
Define the stochastic process and list the state space.
Suppose also that if it has rained for the past three days, then it will
rain today with probability 0.8; if it did not rain for any of the past
three days, then it will rain today with probability 0.2; and in any other
case the weather today will, with probability 0.6, be the same as the
weather yesterday. Determine the transition matrix.
3. Let the transition probability matrix of a two-state Markov chain be
given by

$$P = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}.$$

Show by mathematical induction that

$$P^{(n)} = \begin{pmatrix} 0.5 + 0.5(2p-1)^n & 0.5 - 0.5(2p-1)^n \\ 0.5 - 0.5(2p-1)^n & 0.5 + 0.5(2p-1)^n \end{pmatrix}.$$
4. Let the one-step transition matrix of an MC be

$$P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \qquad 0 < a, b < 1.$$

Show that the n-step transition matrix is

$$P^n = \frac{1}{a+b} \begin{pmatrix} b & a \\ b & a \end{pmatrix} + \frac{(1-a-b)^n}{a+b} \begin{pmatrix} a & -a \\ -b & b \end{pmatrix}.$$

Use matrix multiplication directly to obtain P³ when a = b = 0.25.
Verify the result by using the formula you just obtained.
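The requested check can be sketched in a few lines of numpy (any numerical tool would do equally well):

```python
import numpy as np

a = b = 0.25
P = np.array([[1 - a, a],
              [b, 1 - b]])

# Direct matrix multiplication: P^3 = P @ P @ P.
P3_direct = P @ P @ P

# Closed-form expression for P^n, evaluated at n = 3.
n = 3
P3_formula = (np.array([[b, a], [b, a]])
              + (1 - a - b) ** n * np.array([[a, -a], [-b, b]])) / (a + b)
```

Both routes give P³ with diagonal entries 0.5625 and off-diagonal entries 0.4375.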
5. Specify the classes of the following Markov chains and determine whether
they are transient or recurrent, whether they are periodic or aperiodic.
For recurrent states, find their mean recurrence time.

$$P_1 = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}, \qquad
P_2 = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1/2 & 1/2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},$$

$$P_3 = \begin{pmatrix} 1/2 & 0 & 1/2 & 0 & 0 \\ 1/4 & 1/2 & 1/4 & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 & 0 \\ 0 & 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1/2 & 1/2 \end{pmatrix}, \qquad
P_4 = \begin{pmatrix} 1/4 & 3/4 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix},$$


$$P_5 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0.5 & 0.5 & 0 \end{pmatrix}, \qquad
P_6 = \begin{pmatrix} 1/3 & 2/3 & 0 & 0 & 0 & 0 \\ 2/3 & 1/3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/4 & 3/4 & 0 & 0 \\ 0 & 0 & 1/5 & 4/5 & 0 & 0 \\ 1/4 & 0 & 1/4 & 0 & 1/4 & 1/4 \\ 1/6 & 1/6 & 1/6 & 1/6 & 1/6 & 1/6 \end{pmatrix},$$

$$P_7 = \begin{pmatrix} 1/3 & 1/3 & 1/3 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 2/3 & 0 & 1/3 \end{pmatrix}, \qquad
P_8 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 3/4 & 1/4 & 0 & 0 & 0 \\ 0 & 1/8 & 7/8 & 0 & 0 & 0 \\ 1/4 & 1/4 & 0 & 1/8 & 3/8 & 0 \\ 1/3 & 0 & 1/6 & 1/6 & 1/3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

6. Let {X_n}_{n=0}^∞ be a Markov chain with transition probability matrix

$$P = \begin{pmatrix} 2/3 & 1/3 & 0 & 0 \\ 3/4 & 1/4 & 0 & 0 \\ 1/3 & 0 & 1/3 & 1/3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

1) Classify the state space into classes. Assume the state space is
{0, 1, 2, 3}.
2) Which of them are recurrent, or transient?
3) Find the period of state 2.
4) Find the expected inter-occurrence times for all recurrent states.
(The answers for some states should be obvious; limiting probabilities
are useful if you know how to get them.)
7. Prove that if the number of states in a Markov chain is M, and if state
j can be reached from state i, then it can be reached in M steps or less.

8. A transition matrix P is said to be doubly stochastic if the sum over
each column equals one; that is,

$$\sum_i p_{ij} = 1, \quad \text{for all } j.$$

If such a chain is irreducible and aperiodic and consists of M + 1 states,
0, 1, . . . , M, show that the limiting probabilities are given by

π_j = 1/(M + 1),  j = 0, 1, . . . , M.

9. Let {X_n}_{n=0}^∞ be a Markov chain with transition probability matrix

$$P = \begin{pmatrix} 2/3 & 1/3 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 1/4 & 0 & 1/4 & 1/2 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

1) Classify the state space into classes.
2) Which of them are recurrent, or transient?
3) Find the period of state 2. (Assume the state space is {0, 1, 2, 3}.)
4) Find the expected inter-recurrence times for all recurrent states. (The
answers for some states should be obvious; limiting probabilities are useful.)
10. Consider the transition matrix

$$P = \begin{pmatrix} 1/4 & 3/4 & 0 & 0 & 0 \\ 3/4 & 1/4 & 0 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1/3 & 2/3 & 0 \end{pmatrix}.$$

(a) Show that S consists of 2 closed classes and 2 open classes. What
are these classes?
(b) Determine the period of each of the closed classes.
Note that it is impossible to return to either of the transient states 2
and 4 in this chain. In this case, we set the period of the state to be
infinity, to indicate that the chain cannot return to this state.
(c) Find the unique steady state corresponding to each of the closed
classes.
(d) Write down the general form of all steady states for P.
(e) If X_0 = 2, what is the probability of absorption into the class
{0, 1}? If X_0 = 4, what is the probability of absorption into the class
{0, 1}?

11. Consider the transition matrix

$$P = \begin{pmatrix} 1/5 & 1/5 & 1/5 & 1/5 & 0 & 1/5 \\ 0 & 1/3 & 0 & 2/3 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 3/5 & 0 & 2/5 & 0 & 0 \\ 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ 1/4 & 1/4 & 0 & 0 & 1/2 & 0 \end{pmatrix}.$$

(a) Show that S consists of two closed classes and one open class.
(b) Find the period of each of the three classes.
(c) Find the unique steady state corresponding to each closed class,
and write down the general form of all steady states for P.
(d) Find the probability of absorption into {1, 3} from state 0 and
the probability of absorption into {1, 3} from state 5. What can you
say about the probabilities of absorption into {2, 4} from states 0 and 5
respectively?
12. Consider the transition matrix

$$P = \begin{pmatrix} 0 & 1/3 & 2/3 & 0 & 0 \\ 0 & 0 & 0 & 1/4 & 3/4 \\ 0 & 0 & 0 & 1/4 & 3/4 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

(a) Check that P is irreducible and find the period of P.
(b) Solve for the unique steady state of P.
(c) Use the periodic form of the Main Convergence Theorem to find
the mean recurrence time of each of the states.


13. Consider a chain with states 0, 1, 2, . . . , a with

p_{0,1} = 1,  p_{a,a−1} = 1,

and, for i ≠ 0, a,

$$P_{ij} = \begin{cases} (i/a)^2 & j = i-1 \\ \big((a-i)/a\big)^2 & j = i+1 \\ 2i(a-i)/a^2 & j = i. \end{cases}$$

Show that the chain is ergodic and obtain the stationary distribution.
14. One form of a random walk with two reflecting barriers has transition
matrix given by

P_{00} = 1 − p, P_{01} = p;
P_{j,j−1} = q, P_{j,j} = r, P_{j,j+1} = p when 0 < j < a;
P_{a,a−1} = q, P_{a,a} = 1 − q,

where p + q + r = 1 and all are non-zero. Show that the chain is irreducible
and aperiodic. Determine the stationary distribution for this chain.
15. Let {Z_n}_{n=0}^∞ be a branching process with the family size distribution
given by P(X = 0) = 1/3, P(X = 2) = 2/3.
1) State the definition of a Markov chain.
2) Verify that {Z_n}_{n=0}^∞ is a Markov chain. Calculate the transition
probabilities p_{ij}. (Think about situations such as i = 0; j = 0; j is odd,
etc.)
3) Classify the state space. Indicate whether the states are recurrent or
transient. Give a one-line explanation.
4) Can you find a stationary distribution?
16. Each morning an individual leaves his house and goes for a run. He is
equally likely to leave either from his front or back door. Upon leaving
the house, he chooses a pair of running shoes (or goes running barefoot
if there are no shoes at the door from which he departed). On his return
he is equally likely to enter, and leave his running shoes, either by the
front or back door. If he owns a total of k pairs of running shoes, what
proportion of the time does he run barefooted?


17. The proof copy of a book is read by an infinite sequence of editors
checking for mistakes. Each mistake is detected with probability p
at each reading; between readings the printer corrects the detected
mistakes but introduces a random number of new errors (errors may
be introduced even if no mistakes were detected). Assume as much
independence as usual, and that the numbers of new errors after different
readings are independent and have a Poisson distribution. Find
the stationary distribution of the number X_n of errors after the nth
editor-printer cycle.
18. For a series of dependent trials, the probability of success on any trial
is (k + 1)/(k + 2), where k is equal to the number of successes on the
previous two trials. Compute lim_{n→∞} P(success on the nth trial).

19. It is known that for a Markov chain, the limiting probabilities exist if it
is ergodic and aperiodic. Find a simple example of a Markov chain that
does not satisfy all the conditions but for which the limiting probabilities
still exist.
20. Suppose there are 5 white and 5 black balls in an urn. On each day, a
ball is selected randomly and replaced by a ball of the other color. Let
X_n = 0 if a white ball is selected and X_n = 1 otherwise. Also, let Y_n
be the number of white balls in the urn after the nth selection. Are {X_n}
and {Y_n} Markov chains? If not, explain why. If yes, list the state space
and obtain the transition matrix.
21. Suppose 4 balls are placed into two urns A and B. On each day, one
ball is selected such that each of the four balls is equally likely to be
selected, and the ball is then placed into the other urn.
Let X_n be the number of balls in urn A on the nth day. Let Y_n be the
number of balls in urn A on the 2nth day.
a) Are {X_n}_{n=0}^∞ and {Y_n}_{n=0}^∞ Markov chains? If any of them are, write
down their state spaces and transition matrices and do the usual classification.
b) Given X_0 = 1, find the probability function of X_2.
c) In the long run, what proportion of the time is at least one urn
empty?
d) Given X_0 = k, calculate the probability that the number of balls in
urn A reaches 0 before the number of balls in urn B reaches 0, for
k = 0, 1, 2, 3 and 4.
22. Suppose that coin 1 has probability 0.7 of coming up heads, and coin 2
has probability 0.6 of coming up heads. If the coin flipped today comes
up heads, then we select coin 1 to flip tomorrow, and if it comes up
tails, then we select coin 2 to flip tomorrow. If the coin initially (on
the 0th day) flipped is equally likely to be coin 1 or coin 2, then what
is the probability that the coin flipped on the third day after the initial
flip is coin 1?
23. A particle moves on a circle through points which have been marked
0, 1, 2, 3, 4 (in clockwise order). At each step it has probability p of
moving to the right (clockwise) and 1 − p to the left (counterclockwise).
Let X_n denote its location on the circle after the nth step. Show that
the process {X_n, n ≥ 0} is a Markov chain.
(a) Find the transition probability matrix.
(b) If we know X_1 = 2, what is the probability that X_3 = 4?
(c) If X_0 is equally likely to be 0, 1, 2, 3, 4, what is the probability that
X_3 = 4?
24. Consider a process {X_n, n = 1, 2, . . .} which takes on the values 0, 1,
or 2. Suppose

P{X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, . . . , X_0 = i_0} = P^{I}_{ij} when n is even, and = P^{II}_{ij} when n is odd,

where Σ_{j=0}^{2} P^{I}_{ij} = Σ_{j=0}^{2} P^{II}_{ij} = 1 for i = 0, 1, 2. Is {X_n, n ≥ 0} a Markov
chain? If not, then show how, by enlarging the state space, we may
transform it into a Markov chain.
25. Show that if state i is recurrent and state i does not communicate with
state j, then p_{ij} = 0. This implies that once a process enters a recurrent
class of states it can never leave that class. For this reason, a recurrent
class is often referred to as a closed class.

Chapter 7
Exponential Distribution and
the Poisson Process
Recall that we commented that real world processes are often too complex
to be analyzed based on our current mathematical knowledge. We hence often
restrict ourselves to stochastic processes with simple and nice mathematical
properties. Hopefully, the results we develop are still applicable to the real
world approximately. If the model is too far off, we may then increase the
complexity of our model to see if the generalized model helps.

The discrete time Markov chain ignores the duration between two transitions.
It is often still satisfactory when we use it to model the gambling
problem, English text, or even music notes. When it is used to model
a population size, the idea of a generation is obviously too rough. It is very
important to recognize that some individuals may give birth at a younger age
than others.

The waiting time for the next transition (when one gives birth, for example)
should clearly be regarded as the outcome of a random variable. It
turns out that a simple yet more realistic assumption on its distribution is
exponential. Unlike the normal distribution, the exponential distribution is
non-negative, its cumulative distribution function has a simple form, and it
has the memoryless property.

7.1 Definition and Some Properties

The density function of an exponential distribution with intensity parameter λ
is given by

f(x) = λ exp(−λx),  x ≥ 0.

Its c.d.f. is given by

F(x) = 1 − exp(−λx),  x ≥ 0.

It is simple to find that the moment generating function is given by

φ(t) = λ/(λ − t),

which is defined for all t < λ.

If X has exponential distribution with parameter λ, then

E(X) = 1/λ,  Var(X) = 1/λ².

Please note that if we call µ = E(X), then λ = 1/µ. We also call λ the
rate of the exponential distribution. It is more convenient to parameterize
the distribution in the current way in this course. However, it might be more
convenient to use the mean as a parameter in other courses such as STAT230.
Hence, when reading the phrase that X has exponential distribution with parameter
equaling 2.4 (say), we must first clarify what this parameter stands
for. Two possibilities are: 1) the mean equals 2.4, or 2) the intensity parameter
(the rate) equals 2.4.

7.2 Properties of Exponential Distribution

The exponential distribution has the well known memoryless property. If X
has exponential distribution, then for any s, t > 0,

P(X > s + t | X > t) = P(X > s).

In fact, the exponential distribution family is the only class of distributions
with the above property.
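The memoryless identity is easy to check directly from the survival function P(X > x) = exp(−λx), and to contrast with a distribution that is not memoryless (the uniform lifetime below is an illustrative choice of mine, not from the notes):

```python
import math

def surv_exp(x, lam=2.4):
    # Survival function P(X > x) of an exponential with rate lam.
    return math.exp(-lam * x)

s, t = 0.7, 1.3
# Memoryless: P(X > s + t | X > t) = P(X > s + t)/P(X > t) equals P(X > s).
cond = surv_exp(s + t) / surv_exp(t)
plain = surv_exp(s)

# Contrast: a Uniform(0, 3) lifetime is NOT memoryless.
def surv_unif(x, b=3.0):
    return max(0.0, 1.0 - x / b)

cond_u = surv_unif(s + t) / surv_unif(t)
plain_u = surv_unif(s)
```

For the exponential the two quantities agree exactly; for the uniform they differ substantially.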


The memoryless property has interesting implications. If how long we live had an
exponential distribution, then no matter how old each of us is now, the
expected remaining lifetime would be the same. The insurance company should
not ask for a higher premium for life insurance from older folks. For any
continuous random variable, we may calculate the conditional probability

P(X ∈ (t, t + dt) | X > t) ≈ [f(t)/(1 − F(t))] dt = r(t) dt.

We call

r(t) = f(t)/(1 − F(t))

the hazard rate. The hazard rate for the exponential distribution is a constant.
That is, it does not depend on t.

Our lifetime distribution has a non-constant hazard rate for obvious
reasons. Hence, it does not make sense for insurance companies to use an
exponential model. However, the hazard remains almost constant for a period of
time, say from age 22 to 40. An exponential model is still helpful in many
ways.
Example 7.1
Let X_1, X_2, . . . , X_n be independent exponential random variables with
respective rates λ_1, . . . , λ_n, where λ_i ≠ λ_j when i ≠ j. Let N be a random
variable independent of these random variables, such that

$$p_j = P(N = j), \qquad \sum_{j=1}^{n} p_j = 1.$$

Then X_N is said to be a hyper-exponential random variable. Its density
function is a mixture:

$$f(t) = \sum_{j=1}^{n} p_j \lambda_j \exp(-\lambda_j t).$$

It can be shown that the hazard rate r(t) converges to min_j(λ_j) as t → ∞.
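The convergence of the hazard rate toward the smallest rate can be seen numerically; this sketch uses an arbitrary two-component mixture (rates 1 and 3, mixed half and half):

```python
import math

rates = [1.0, 3.0]   # arbitrary rates; the slow component has rate 1.0
probs = [0.5, 0.5]

def pdf(t):
    return sum(p * lam * math.exp(-lam * t) for p, lam in zip(probs, rates))

def surv(t):
    return sum(p * math.exp(-lam * t) for p, lam in zip(probs, rates))

def hazard(t):
    # r(t) = f(t) / (1 - F(t))
    return pdf(t) / surv(t)

h0 = hazard(0.0)       # at t = 0 the hazard equals the average rate, 2.0
h_far = hazard(20.0)   # for large t the slow component dominates: close to 1.0
```

Intuitively, surviving a long time makes it more and more likely that the slow component was drawn, so the hazard decreases toward the minimum rate.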


Example 7.2
Let X_i, i = 1, 2, . . . , n be iid random variables with exponential distribution
and rate λ. Then the density function of S_n = X_1 + · · · + X_n is

$$f_n(t) = \frac{\lambda^n t^{n-1} \exp(-\lambda t)}{(n-1)!}$$

for t ≥ 0. This distribution is called Gamma with n degrees of freedom and
scale parameter 1/λ.
Example 7.3
Assume X_1 and X_2 are two independent exponential random variables with
rates λ_1 and λ_2. Then

1. min(X_1, X_2) has exponential distribution with rate λ_1 + λ_2.
2. P(X_1 < X_2) = λ_1/(λ_1 + λ_2).
3. Further, the event X_1 < X_2 is independent of the event min(X_1, X_2) ≥ t
for any t.

We can generalize the above result easily. Suppose X_i, i = 1, 2, . . . , n are
independent exponential random variables with rates λ_i. Then

1. min{X_i : i = 1, . . . , n} has exponential distribution with rate Σ_i λ_i.
2. The probability that X_1 has the smallest value (among the X_i) is given by
λ_1/Σ_i λ_i.
3. The event "X_i has the smallest value" is independent of the actual
value of the smallest random variable.
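A quick simulation sketch supports the first two claims (rates 1.5 and 2.5 and the sample size are arbitrary choices; note numpy parameterizes by scale = 1/rate):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, n = 1.5, 2.5, 200_000

x1 = rng.exponential(1.0 / lam1, size=n)
x2 = rng.exponential(1.0 / lam2, size=n)

# 1. min(X1, X2) is exponential with rate lam1 + lam2, so mean 1/(lam1+lam2) = 0.25.
mean_min = np.minimum(x1, x2).mean()
# 2. P(X1 < X2) = lam1/(lam1 + lam2) = 0.375.
p_first = (x1 < x2).mean()
```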

7.3 The Poisson Process

We start with the simplest continuous time stochastic process. If, starting
from a conceptual beginning at t = 0, we are able to determine the value of
N(t) for each given t > 0 such that N(t) represents the number of occurrences
of some incidents (events), we say {N(t) : t ≥ 0} is a counting process.

Let {N(t) : t ≥ 0} be a stochastic process. To qualify as a counting
process, it must satisfy
(i) N(t) ≥ 0;
(ii) N(t) is integer valued;
(iii) if s < t, then N(s) ≤ N(t).

We say the counting process has independent increments if

N(t_2) − N(t_1) and N(s_2) − N(s_1)

are independent whenever t_1 ≤ t_2 ≤ s_1 ≤ s_2. Assume N(t) represents
the number of phone calls by time t from the beginning of the day. The
independent increment property implies that the number of calls I receive in
the first hour is independent of the number of phone calls in the second hour.
If I receive more phone calls in the second hour, it is because it happens
to be so, not because I received fewer than usual calls in the first hour.

We say that the counting process has stationary increments if the distribution of

N(t + s) − N(t)

depends only on s (is determined by s), but not on t. If the number of students
who come to my office hours could be modeled by a counting process with stationary
increments, then I should receive a similar number of visits every week. This
is obviously not true, since I do observe a significant increase in the week
with an assignment due. Thus, this counting process does not have stationary
increments.

The most important special case of the counting process is the Poisson
process. Here we put forward a regular, but slightly different, definition from
Stat230.
Definition 7.1 (Poisson Process, Definition 1)
The counting process {N(t), t ≥ 0} is said to be a Poisson process having
rate λ, λ > 0, if
(i) N(0) = 0;
(ii) the process has independent increments;
(iii) the number of events (incidents) in any interval of length t is Poisson
distributed with mean λt. That is, for all s, t ≥ 0,

$$P[N(s+t) - N(s) = n] = \frac{(\lambda t)^n}{n!} \exp(-\lambda t), \qquad n = 0, 1, \ldots$$

Remark: Note that the number of events is a random variable. Hence,
its random behavior is specified by a cumulative distribution function. This
definition requires this distribution to be Poisson in order for the process
to be called a Poisson process.
The definition may not be very useful: it is hard to see, in applications,
why the third condition might be satisfied. That is why an equivalent
definition is called for. To this aim, we define the notation o(h) (call it "small
o of h").

Definition 7.2 (Definition of o(h))
If a function f(h) satisfies

$$\lim_{h \to 0} \frac{f(h)}{h} = 0,$$

we say f(h) = o(h).

A simple non-trivial example is f(h) = 1 − cos(h) = o(h).
Here comes the second, equivalent definition of the Poisson process.
Definition 7.3 (Poisson Process, Definition 2)
The counting process {N(t), t ≥ 0} is said to be a Poisson process having
rate λ, λ > 0, if
(i) N(0) = 0;
(ii) the process has stationary and independent increments;
(iii) P[N(h) = 1] = λh + o(h);
(iv) P[N(h) ≥ 2] = o(h).


Theorem 7.1
The two definitions of the Poisson process are equivalent.

Proof: We will only show that conditions (iii) and (iv) in Definition 2 imply
condition (iii) in Definition 1.
We first work on P(N(t) = 0). Define P_0(t) = P(N(t) = 0).
Let h > 0 be a small number. Then

P_0(t + h) = P(N(t) = 0, N(t + h) − N(t) = 0)
= P(N(t) = 0) P(N(t + h) − N(t) = 0)   [independent increments]
= P_0(t) P(N(h) = 0)                    [stationary increments]
= P_0(t){1 − P(N(h) = 1) − P(N(h) ≥ 2)}
= P_0(t){1 − λh + o(h)}.

Consequently, we have

$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}.$$

Let h → 0. The left hand side becomes the derivative of P_0(t) and the right hand
side gives the result. That is,

P_0′(t) = −λP_0(t).

The solution of this differential equation is given by

P_0(t) = exp(−λt),

in view of the boundary condition P_0(0) = 1.

Next, we build on top of this result. We use mathematical induction for
the other cases. Define

P_n(t) = P(N(t) = n)

and assume

$$P_k(t) = \frac{(\lambda t)^k}{k!} \exp(-\lambda t)$$

for k = 0, 1, . . . , n − 1. We have shown that this assumption is true when
n = 1.


Based on the above induction assumption, we try to show the expression
is true when k = n.
We have

P_n(t + h) = P(N(t + h) = n)
= P(N(t) = n, N(t + h) − N(t) = 0)
+ P(N(t) = n − 1, N(t + h) − N(t) = 1)
+ P(N(t + h) = n, N(t + h) − N(t) ≥ 2)
= P_n(t){1 − λh + o(h)} + (λh)P_{n−1}(t) + o(h).

Note that we have used the fact o(h) + o(h) = o(h). Thus, we get

$$\frac{P_n(t+h) - P_n(t)}{h} = -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h}.$$

Letting h → 0, we get

$$P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t) = -\lambda P_n(t) + \lambda \frac{(\lambda t)^{n-1}}{(n-1)!} \exp(-\lambda t).$$

This is a differential equation which can be solved by standard methods. We,
however, choose to point out that the analytical form suggested for P_n(t)
solves the above equation.
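That the suggested form solves the differential equation is easy to confirm numerically with a finite-difference derivative (a sketch; λ = 2, n = 3 and the evaluation point are arbitrary):

```python
import math

lam, n = 2.0, 3

def P(k, t):
    # Candidate solution P_k(t) = (lam t)^k exp(-lam t) / k!
    return (lam * t) ** k * math.exp(-lam * t) / math.factorial(k)

t, h = 0.8, 1e-6
# Central finite-difference approximation of the derivative P_n'(t).
lhs = (P(n, t + h) - P(n, t - h)) / (2 * h)
# Right hand side of the differential equation.
rhs = -lam * P(n, t) + lam * P(n - 1, t)
```

The two sides agree to within the finite-difference error, as the exact differentiation of (λt)^n e^{−λt}/n! also shows.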

7.3.1 Inter-arrival and Waiting Time Distributions

Let T_1 be the waiting time for the first event in a Poisson process with rate
λ. It is obvious that for all t ≥ 0,

P(T_1 > t) = P(N(t) = 0) = exp(−λt).

Hence, T_1 has exponential distribution with rate λ.
Now, let T_2 be the waiting time for the second event after the first event
has occurred. We call it an inter-arrival time. What is the distribution of T_2?
Note that

P(T_2 > t | T_1 = s) = P(N(s + t) − N(s) = 0) = exp(−λt).


Hence, T_2 has exponential distribution with rate λ too.

One caution is that P(T_1 = s) = 0, so the above argument is not fully
mathematically satisfactory. We, however, do not have the tools to completely
avoid this problem.

Theorem 7.2
The inter-arrival times T_n, n = 1, 2, . . . are independent and identically
distributed with common exponential distribution having rate λ (or mean 1/λ).

Let S_n = Σ_{i=1}^{n} T_i. Then

P(S_n ≤ t) = P(N(t) ≥ n).

The density function of S_n is

$$f(t) = \frac{\lambda (\lambda t)^{n-1}}{(n-1)!} \exp(-\lambda t).$$

The corresponding distribution is called the Gamma distribution with n degrees
of freedom and scale parameter 1/λ.

7.4 Further Properties

Suppose that the events in a Poisson process can be classified into two types:
I and II. Further, this classification is random, and it is independent of the
process itself. For example, suppose we can model the number of customers
entering a store as a Poisson process. We classify customers into two types:
class one consists of customers who will buy something; class two consists of
customers who will just have a look. If we further assume that their purchasing
decisions are made independently, then we are in a situation where the
following model will apply.
Let N(t) be the original process. Let N_1(t) be the number of type I
events occurring in [0, t]. Similarly define N_2(t).

Theorem 7.3

Under the assumption that each event in a Poisson process can be independently
classified as type I or type II, the two sub-counting processes are both
Poisson processes, with rates λp and λ(1 − p), where p is the probability of an
event being type I.

Proof: In this situation, it is more convenient to use the first definition of
the Poisson process.
We calculate P(N_1(t) = n, N_2(t) = m) for each pair of non-negative
integers. This will give us the joint distribution of N_1(t) and N_2(t). Whether
N_1(t) and N_2(t) are independent, and whether they both have Poisson distributions,
are questions that will then be answered easily. The other conditions in the
definitions are obvious.
Here is our calculation:

P(N_1(t) = n, N_2(t) = m) = P(N_1(t) = n, N_1(t) + N_2(t) = n + m)
= P(N_1(t) = n, N(t) = n + m)
= P(N_1(t) = n | N(t) = n + m) P(N(t) = n + m)

$$= \binom{n+m}{n} p^n (1-p)^m \frac{(\lambda t)^{n+m}}{(n+m)!} \exp(-\lambda t)
= \frac{(\lambda p t)^n}{n!} \exp(-\lambda p t) \cdot \frac{\{\lambda(1-p)t\}^m}{m!} \exp\{-\lambda(1-p)t\}.$$

Obviously, N_1(t) and N_2(t) are independent and both have Poisson distributions.
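The thinning result is easy to probe by simulation (all parameters below are arbitrary choices): draw N(t), split each event independently, and check that both counts behave like Poisson variables with near-zero correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p, t, reps = 3.0, 0.3, 4.0, 50_000

n_total = rng.poisson(lam * t, size=reps)   # N(t) for each replication
n1 = rng.binomial(n_total, p)               # type I events: N_1(t)
n2 = n_total - n1                           # type II events: N_2(t)

m1, v1 = n1.mean(), n1.var()     # both should be near lam * p * t = 3.6
m2, v2 = n2.mean(), n2.var()     # both should be near lam * (1 - p) * t = 8.4
corr = np.corrcoef(n1, n2)[0, 1] # independence shows up as correlation near 0
```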

7.5 Conditional Distribution of the Arrival Times

Suppose that in a Poisson process, it is known that N(t) = 1. We want to
know when this event occurred during the period [0, t].
As before, let T_1 be the time when the first event occurred. For any
s ≤ t, we have

P(T_1 ≤ s | N(t) = 1) = P(T_1 ≤ s, N(t) = 1) / P(N(t) = 1)
= P(N(s) = 1, N(t) − N(s) = 0) / P(N(t) = 1)
= s/t.

That is, the first event is equally likely to have occurred at any moment in
[0, t]. This is further evidence of uniformity.
Let S_i, i = 1, 2, . . . be the time when the ith event occurred. Given
N(t) = n, what is the conditional joint distribution of the S_i, for i = 1, 2, . . . , n?
For this purpose, let s_i, i = 1, 2, . . . , n be an increasing sequence of positive
numbers such that s_n < t and none of them are equal. Let us try to calculate
the probability of the event

"S_i ∈ (s_i, s_i + ds_i) for all i = 1, 2, . . . , n".

The notations ds_i are just some imaginary small numbers. Roughly, we may
believe that

P(S_i ∈ (s_i, s_i + ds_i), i = 1, 2, . . . , n | N(t) = n)
= P(S_i ∈ (s_i, s_i + ds_i), i = 1, 2, . . . , n, N(t) = n) / P(N(t) = n)
= [P(N(s_i + ds_i) − N(s_i) = 1, N(s_i) − N(s_{i−1} + ds_{i−1}) = 0, i = 1, 2, . . . , n,
N(t) − N(s_n + ds_n) = 0)] / [P(N(t) = n)]

$$\approx \frac{n!}{t^n}\, ds_1\, ds_2 \cdots ds_n.$$

That is, the joint density function of (S_1, . . . , S_n) is given by

f(s_1, . . . , s_n) = n!/tⁿ

for all 0 ≤ s_1 ≤ s_2 ≤ · · · ≤ s_n ≤ t. Note that including the equalities does not change
the density function.
The question is then: what story does this density tell us? Suppose
Y_1, Y_2, . . . , Y_n are n independent and identically distributed uniform random
variables on [0, t]. Arranging them in increasing order and denoting the
resulting sequence by Y_{(i)}, i = 1, 2, . . . , n, we get the order statistics. It turns out


that the ordered independent uniform random variables on [0, t] have the joint
density function given by

f(s_1, . . . , s_n) = n!/tⁿ.

The moral is: given that n events occurred in [0, t], their occurrence times are
jointly distributed as ordered independent uniforms on [0, t].
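One consequence of the order-statistics description is that E[S_i | N(t) = n] = i·t/(n + 1). The simulation sketch below (parameters arbitrary) conditions on exactly n events in [0, t] by rejection and checks these conditional means:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, t, n = 1.0, 2.0, 2

# Given N(t) = n, arrival times should behave like the order statistics of
# n independent Uniform(0, t) variables, so E[S_i | N(t) = n] = i * t / (n + 1).
kept = []
while len(kept) < 20_000:
    arrivals = np.cumsum(rng.exponential(1.0 / lam, size=n + 4))
    if np.sum(arrivals <= t) == n:   # keep runs with exactly n events in [0, t]
        kept.append(arrivals[:n])
means = np.mean(kept, axis=0)        # estimates of E[S_1 | .] and E[S_2 | .]
```

With t = 2 and n = 2 the conditional means should be close to 2/3 and 4/3.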

7.6 Problems

1. For a Poisson process show, for s < t, that

$$P\{N(s) = k \mid N(t) = n\} = \binom{n}{k} \Big(\frac{s}{t}\Big)^k \Big(1 - \frac{s}{t}\Big)^{n-k}, \qquad k = 0, 1, \ldots, n.$$
2. The number of bankruptcies in normal years Canada-wide follows a
Poisson process with intensity parameter λ = 10000 per month. Among
them, each bankruptcy has probability 0.98 of being a personal bankruptcy;
the others are business bankruptcies. Assume the independence of the two
types of bankruptcies. Assume also that all 12 months are of equal
length.
(a) If in a particular month, 8500 bankruptcies have been observed,
what is the probability that no more than 10 of these are business
bankruptcies (expression only)?
(b) If in another particular month, 1000 business bankruptcies were
observed, what is the expected total number of bankruptcies in that
month?
(c) Assume the debt for each personal bankruptcy has normal distribution
with mean µ = $100,000 and standard deviation $40,000,
independent of each other. Let X be the total debt of the personal
bankruptcies in a month. Calculate the mean and variance of X.
3. The number of meteorites that hit the Earth follows a Poisson process
with intensity parameter λ = 200 per month. Each meteorite has
probability p of reaching the ground; otherwise it burns up in the air.
Assume the usual independence when necessary.
(a) If 200 meteorites have hit the Earth in a particular month, what is
the expected number of them that reached the ground?
(b) If in another particular month, 20 meteorites were found to have
reached the ground, what is the expected number of meteorites (including
those burnt up in the air) to have hit the Earth in that month?
(c) Assume the masses of meteorites have an exponential distribution
with mean µ = 1000 kg, independent of each other. Let X be the total
mass of meteorites that hit the Earth in a year. Calculate the mean
and variance of X. Assume a year equals exactly 12 months.
4. Cars pass a point on the highway at a Poisson rate of one per minute.
If five percent of the cars on the road are Dodges, then
(a) what is the probability that at least one Dodge passes by during an
hour?
(b) given that ten Dodges have passed by in an hour, what is the
expected number of cars to have passed by in that time?
(c) if 50 cars have passed by in an hour, what is the probability that
five of them were Dodges?
5. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let S_n denote the
time of the nth event. Find
(a) E(S_4),
(b) E[S_4 | N(1) = 2],
(c) E[N(4) − N(2) | N(1) = 3].
6. Two individuals, A and B, both require kidney transplants. If she does
not receive a new kidney, then A will die after an exponential time with
rate µ_A, and B after an exponential time with rate µ_B. New kidneys
arrive in accordance with a Poisson process having rate λ. It has been
decided that the first kidney will go to A (or to B if B is alive and A
is not at that time) and the next one to B (if still living).
(a) What is the probability that A obtains a new kidney?
(b) What is the probability that B obtains a new kidney?


7. Telephone calls arrive at a switchboard in a Poisson process at the rate
of 2 per minute. A random one-tenth of the calls are long distance.
(a) What is the probability that no call arrives between 9:00-9:05am?
(b) What is the probability that at least 2 calls arrive between 10:00-10:02am?
(c) What is the probability of at least one long distance call in a ten
minute period?
(d) Given that there have been 8 long distance calls in an hour, what
is the expected number of calls to have arrived in the same period?
(e) Given that there were 90 calls in an hour, what is the probability
that 10 were long distance?
8. Three customers, A, B and C, enter a bank: A and B to deposit money,
and C to buy a money order. Suppose that the time it takes to deposit
money is exponentially distributed with mean 2 minutes, and that the
time it takes to buy a money order is exponentially distributed with
mean 4 minutes. If all three customers are served immediately, what is
the probability that C is finished first? That A is finished last?

Chapter 8
Continuous Time Markov Chain
One of the shortcomings of the discrete time Markov chain is that it can only
be used to model situations where a transition occurs only at discrete times.
This is not a problem when modeling the outcomes of gambling, an English
text or a piece of music. It might also be an ideal mathematical model for DNA
sequences. However, it is a stretch to model the sizes of some animal
populations this way.
The continuous time Markov chain represents one of the directions in which
the discrete time Markov chain is generalized. Other than allowing the inter-arrival
time between two transitions to be a continuous random variable, we retain
the other requirements of the corresponding stochastic process.
Let {X(t), t ≥ 0} be a stochastic process. It is a continuous time Markov
chain if it has the following two properties:
(i) It has a countable state space;
(ii) It has the Markov property:
P(X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s)
= P(X(t + s) = j | X(s) = i).
The concept remains the same. Given the present (X(s) = i), the future
outcome X(s + t) = j is independent of the past (X(u) = x(u), 0 ≤ u < s).
When P(X(t + s) = j | X(s) = i) = P(X(t) = j | X(0) = i), this Markov chain
is homogeneous in time.
Naturally, we call
pij (t) = P (X(t) = j|X(0) = i)
the transition probability of the Markov chain from state i to state j in a
period of time t. We use notation P (t) for the matrix formed by these
transition probabilities.
Recall that the state space is countable. Hence we denote it as {0, 1, 2, . . .}.
Suppose X(0) = 3. As time t goes on, X(t) may move out of state 3. Let T3 be
the time it takes for this transition to occur. Due to the Markov property,
which can also be interpreted as the memoryless property, T3 must have an
exponential distribution with some rate. Let us call this rate v3. At the moment
t = T3 + ε, where ε is an imaginary tiny quantity, X(t) could equal 0, 1,
2, 4, 5, . . .. What is the probability that X(t) = 4 given a transition at T3? Since it
occurs in a very short period of time, we use p34 for this probability
and call it the instantaneous transition probability. Since this probability is
computed under the condition that a transition has occurred, we must have
p33 = 0.
In general, we have
pii = 0,  Σj pij = 1
for all i = 0, 1, 2, . . .. We use the notation P for the instantaneous transition
probability matrix.
Be aware: this P is different from the P defined for the discrete time Markov
chain.
Example 8.1
Consider two machines that are maintained by a single repair-person. Machine i functions for an exponential time with rate λ before breaking down.
The repair time for either machine is exponential with rate μ.
Let X(t) = the number of machines functioning at time t. Then, we get
a continuous time stochastic process {X(t), t ≥ 0}.
The state space of this process is obviously {0, 1, 2} and countable.

The Markov property can be verified since the waiting time for a transition is
exponential regardless of which state the Markov chain is in at the moment.
For example, if X(0) = 0, the waiting time for a transition to occur is
the same as the waiting time for the repair-person to get one machine repaired.
This waiting time is exponential with rate μ.
If X(0) = 1, a transition occurs either when the broken-down machine is
repaired, or when the functioning machine breaks, whichever occurs first. Assuming
the independence of the two waiting times, the shorter of the two has an exponential
distribution with rate λ + μ.
If X(0) = 2, a transition occurs when one of the machines breaks down.
Again, under the independence assumption, the waiting time for the first breakdown is exponential with rate 2λ.
Note also that not only is the waiting time for a transition independent of the
past (given the present), but the transition probability is also independent of the
past.
When X(0) = 0, the only possible transition is from 0 to 1. Hence,
p01 = 1.
When X(0) = 1, it transfers to 0 if the functioning machine breaks down
before the broken-down machine is repaired (the chance of them occurring
at exactly the same time is nil). This occurs with probability
λ/(λ + μ)
and this event is independent of the occurrence time. (Review our discussion
on exponential distributions.) Hence, p10 = λ/(λ + μ), and p12 = μ/(λ + μ).
When X(0) = 2, the only possible transition is from 2 to 1. Hence,
p21 = 1.
The above discussion fully verifies the Markov property, and we find
P =
[    0          1         0      ]
[ λ/(λ + μ)     0     μ/(λ + μ)  ]
[    0          1         0      ]
The rates for T0, T1, T2 are v0 = μ, v1 = λ + μ and v2 = 2λ.
In the future, we will tie P and the vi together so it is simpler to memorize
them.

Note that to verify a stochastic process as a continuous time Markov
chain, we go through the following three steps.
Step 0: Define the process {X(t) : t ≥ 0} if it is not given;
Step 1: Identify the state space and check its countability;
Step 2: (i) Verify that the distribution of the waiting time for a transition
is exponential for every state i. (ii) Verify that the instantaneous transition
probabilities pij do not depend on the history of the stochastic process given
the present state i.
Finally, we normally present the transition matrix P and the transition
rates vi at the concluding step.
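As a concrete illustration of these steps, the machine-repair chain of Example 8.1 can be simulated directly from the rates vi and the instantaneous transition probabilities pij: hold an exponential time in the current state, then jump according to pij. The sketch below is only an illustration; the numerical values of λ and μ are arbitrary assumptions, not taken from the text.

```python
import random

# Illustrative sketch: simulate the two-machine repair chain of Example 8.1.
# State = number of working machines (0, 1, 2).  The failure rate lam and
# repair rate mu below are arbitrary choices for the illustration.
lam, mu = 1.0, 2.0

rates = {0: mu, 1: lam + mu, 2: 2 * lam}           # transition rates v_i
jumps = {                                          # (target, p_ij) pairs
    0: [(1, 1.0)],
    1: [(0, lam / (lam + mu)), (2, mu / (lam + mu))],
    2: [(1, 1.0)],
}

def simulate(t_end, state=2, seed=1):
    """Return the fraction of time spent in each state up to t_end."""
    rng = random.Random(seed)
    t, occupancy = 0.0, {0: 0.0, 1: 0.0, 2: 0.0}
    while t < t_end:
        hold = rng.expovariate(rates[state])       # exponential holding time
        occupancy[state] += min(hold, t_end - t)
        t += hold
        targets, probs = zip(*jumps[state])
        state = rng.choices(targets, probs)[0]     # jump according to p_ij
    return {s: occupancy[s] / t_end for s in occupancy}

fractions = simulate(10000.0)
```

With λ = 1 and μ = 2, the long-run fractions of time in states 0, 1, 2 work out to (0.2, 0.4, 0.4), which the simulation reproduces approximately over a long horizon.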

8.1 Birth and Death Process

The example we just gave is a special case of the birth and death process, while
the latter is a special continuous time Markov chain.
Suppose we are investigating a specific biological species. Somehow we
have a starting point t = 0, and define
X(t) = Population size at time t.
We now have a continuous time stochastic process with countable state space.
To qualify as a Markov chain, we make several assumptions on its random
behavior:
(i) When X(t) = n, the waiting time for the next birth to occur has an
exponential distribution with rate λn for n ≥ 0.
(ii) When X(t) = n, the waiting time for the next death to occur has an exponential distribution with rate μn for n ≥ 1 (μ0 = 0). Also, the occurrences
of the birth and the death are independent of each other.
We call such a stochastic process birth and death process. It is seen
that
(a) State space: S = {0, 1, 2, . . .}.
(b) (i) The waiting time for the next transition to occur has an exponential distribution with rate λn + μn for n = 1, 2, . . .. (ii) The instantaneous transition
probabilities are:
p01 = 1 (no twins),
pi,i+1 = λi/(λi + μi),  pi,i−1 = μi/(λi + μi)  (i ≥ 1).
Unless we are asked to model the birth and death process as a continuous
time Markov chain, we need only specify the birth and death rates in order
to have the birth and death process defined.
Let us consider a special birth and death process.

Example 8.2
A birth and death process is said to be a pure birth process if μn = 0 for all n. It
further has a linear birth rate if λn = nλ.

Example 8.3 (A linear growth model)


Consider a birth and death process with linear birth and death rates. That
is, we have a birth and death process {X(t) : t ≥ 0} with λn = nλ and
μn = nμ for n = 0, 1, . . ..
Assume X(0) = 1. What would be the value of E[X(t)|X(0) = 1]?
Define M (t) = E[X(t)|X(0) = 1]. Let T1 be the time when the first event
occurs, whether it is a death or a birth. Then
M (t) = E{E[X(t)|T1 , X(0) = 1]}.
Depending on the value of T1 , the conditional expectation has different
outcomes. (That is why we say that the conditional expectation is a function
of T1 ).
If T1 < t, and the event is a death, we have E[X(t)|T1, X(0) = 1] = 0.
If T1 = s < t, and the event is a birth, we have E[X(t)|T1, X(0) = 1] =
E[X(t)|X(s) = 2]. Hence, E[X(t)|T1, X(0) = 1] = 2M(t − s).
If T1 > t, E[X(t)|T1, X(0) = 1] = 1.
Combining these cases together, we have
M(t) = P(T1 > t) + ∫_0^t [2M(t − s)] (λ/(λ + μ)) (λ + μ) exp{−(λ + μ)s} ds
+ ∫_0^t [0] (μ/(λ + μ)) (λ + μ) exp{−(λ + μ)s} ds
= exp{−(λ + μ)t} + 2λ ∫_0^t M(t − s) exp{−(λ + μ)s} ds.

Taking the derivative with respect to t and simplifying the outcome, we have
M′(t) = (λ − μ)M(t).
Therefore, we must have M(t) = exp{(λ − μ)t}.
It is simple to claim that if M(0) = i, then Mi(t) = i exp{(λ − μ)t}.
This completes this example.
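The conclusion M(t) = exp{(λ − μ)t} lends itself to a quick Monte Carlo check. The sketch below simulates the linear growth model forward to a fixed time and averages; the rate values and horizon are arbitrary assumptions for the illustration, not from the text.

```python
import math
import random

# Illustrative sketch: Monte Carlo check of M(t) = exp{(lam - mu) t}
# for the linear growth model with X(0) = 1.  Parameter values are
# arbitrary assumptions.
lam, mu, t_end = 0.6, 0.4, 1.0

def linear_growth(rng):
    """Simulate X(t_end) with birth rate n*lam and death rate n*mu."""
    t, n = 0.0, 1
    while n > 0:
        t += rng.expovariate(n * (lam + mu))       # time of the next event
        if t > t_end:
            break
        # a birth with probability lam/(lam+mu), otherwise a death
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n

rng = random.Random(7)
estimate = sum(linear_growth(rng) for _ in range(200000)) / 200000
exact = math.exp((lam - mu) * t_end)               # e^{(lam - mu) t}
```

With many replications the sample mean of X(t_end) should land close to exp{(λ − μ)t_end}.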

Example 8.4
Let us consider a birth and death process {X(t), t ≥ 0} with birth and death
rates given by λi, μi, with μ0 = 0.
Let Ti be the time it takes for the process, starting from state i, to enter
state i + 1 for the first time.
Assuming λi > 0 for all i, we have
E(T0) = 1/λ0.

What can we say about the expectation of Ti with i ≥ 1?


Let us define
Ii = 1 if the first transition after X(0) = i is a birth; 0 if the first
transition after X(0) = i is a death.

Then we have the following.
E[Ti | Ii = 1] = expected time until the first event occurs = 1/(λi + μi).
E[Ti | Ii = 0] = expected time until the first event occurs
+ expected time it takes to go from i − 1 to i
+ expected time it takes to go from i to i + 1
= 1/(λi + μi) + E[Ti−1] + E[Ti].
Using the formula for conditional expectation, we get
E[Ti] = P(Ii = 1)E[Ti | Ii = 1] + P(Ii = 0)E[Ti | Ii = 0]
= 1/(λi + μi) + (μi/(λi + μi)) [E(Ti) + E(Ti−1)].

We hence arrive at a recursive relationship:
E(Ti) = 1/λi + (μi/λi) E(Ti−1)
for i = 1, 2, . . ..
In particular, if the birth and death rates are constant (λi = λ, μi = μ), we have
E(Ti) = (1/λ)[1 + (μ/λ) + (μ/λ)^2 + · · · + (μ/λ)^i] = [1 − (μ/λ)^{i+1}] / (λ − μ).
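The recursion and the constant-rate closed form are easy to check against each other numerically. In this sketch the rate values are arbitrary assumptions.

```python
# Illustrative sketch: the recursion E(T_i) = 1/lam + (mu/lam) E(T_{i-1})
# for constant birth and death rates, checked against the closed form
# [1 - (mu/lam)^{i+1}] / (lam - mu).  Rate values are arbitrary.
lam, mu = 2.0, 1.0

def hitting_times(n):
    """E(T_i) for i = 0..n computed by the recursion."""
    e = [1.0 / lam]                        # E(T_0) = 1/lam_0
    for _ in range(n):
        e.append(1.0 / lam + (mu / lam) * e[-1])
    return e

def closed_form(i):
    return (1.0 - (mu / lam) ** (i + 1)) / (lam - mu)

times = hitting_times(10)
```

The two agree term by term, which is a convenient check on the algebra above.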

8.2 Kolmogorov Differential Equations

Recall the Chapman and Kolmogorov equations for the discrete time Markov
chain. The system tells us that the n-step transition matrix is the product
of n one-step transition matrices. This equation system remains true for the
continuous time Markov chain with some modifications.
Lemma 8.1
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain. Let pij(t) =
P[X(t) = j|X(0) = i] be its transition probability function. We have
pij(t + s) = Σ_{k=0}^{∞} pik(t) pkj(s).

In matrix form, we have


P (t + s) = P (t)P (s) = P (s)P (t).

The proof is the same as that for the discrete time Markov chain.
For discrete time Markov chain, the shortest time unit is 1. There is no
shortest time unit for continuous time Markov chain. If P (0.01) is known,
we can work out P(0.01n) for all positive integers n in principle. We need
only multiply P(0.01) with itself n times, even though you might be bored
to death by this task. The real challenge, however, is to compute, say, P(0.002)
based on that. Can we find an analytical form for P(t) based on the parameters
vi and pij, the instantaneous transition probabilities? The answer is positive in
principle.
Lemma 8.2
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with exponential rates vi and instantaneous transition probabilities pij. Let pij(t) be its
transition probabilities for a time period of t. Then we have:
(a)
lim_{h→0} [1 − pii(h)]/h = vi;
(b)
lim_{h→0} pij(h)/h = vi pij.
Proof: We have, by definition,
pii(h) = P[X(h) = i|X(0) = i]
= P{no transitions in (0, h]}
+ P{2 or more transitions in (0, h]}
= exp{−vi h} + o(h).
Therefore, we have
[1 − pii(h)]/h = [1 − exp{−vi h} + o(h)]/h
whose limit is vi as h → 0. This proves (a).
Similarly,
pij(h) = P{X(h) = j|X(0) = i}
= P{one transition from i to j in (0, h]}
+ P{2 or more transitions resulting in j during (0, h]}
= pij P{one transition in (0, h]} + o(h)
= vi pij h + o(h).

The result (b) is now obvious.

Let V = diag(v0, v1, . . .) and define G = V(P − I) where I is the identity
matrix. The above result can then be extended and summarized in a neat
matrix form.
Theorem 8.1 Kolmogorov's Backward Equations
For a continuous time Markov chain, we have
P′(t) = GP(t)
where P′(t) is the component-wise derivative of P(t) with respect to t.
Proof: It is simple to see that
[P(t + h) − P(t)]/h = [P(h) − I]P(t)/h.
The limit is obviously GP(t).


Theorem 8.2 Kolmogorov's Forward Equations
For a continuous time Markov chain, we have
P′(t) = P(t)G
where P′(t) is the component-wise derivative of P(t) with respect to t.
Proof: It is simple to note that
[P(t + h) − P(t)]/h = P(t)[P(h) − I]/h.
The limit is obviously P(t)G.

Unfortunately, the proofs above are not truly rigorous. The problem is
the order of the transition matrix, which could be infinite. In the case when the
state space is infinite (but countable, of course), the matrix multiplication
involves a summation of infinitely many terms. The above manipulation implies taking
derivatives term by term inside the summation, which is not always valid.
Therefore, the theorem on the forward equation
must include some regularity conditions. While we do not specify them here,

we would like to mention that they are satisfied whenever the state space
is finite. They also hold for birth and death processes.
Let us assume all the processes to be considered in this course
are regular.
The matrix G plays an important role in these two equations. It is called
the infinitesimal generator. The backward equation applies P(t) from the back
of G, and the forward equation applies P(t) from the front of G.
In principle, once G is known, we can solve the backward equation to find
the transition matrix P (t). In reality, this is not always feasible. We have a
few examples for which this can be done.
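For a finite chain, assembling G = V(P − I) is mechanical. The sketch below does this for the two-machine chain of Example 8.1, under arbitrary assumed values of λ and μ; a useful sanity check is that every row of a generator sums to zero.

```python
# Illustrative sketch: build G = V(P - I) for the chain of Example 8.1.
# lam (failure rate) and mu (repair rate) are arbitrary assumptions.
lam, mu = 1.0, 2.0

v = [mu, lam + mu, 2 * lam]                # transition rates v_0, v_1, v_2
P = [                                      # instantaneous transition probs
    [0.0, 1.0, 0.0],
    [lam / (lam + mu), 0.0, mu / (lam + mu)],
    [0.0, 1.0, 0.0],
]

# G[i][j] = v_i * (P[i][j] - I[i][j])
G = [[v[i] * (P[i][j] - (1.0 if i == j else 0.0)) for j in range(3)]
     for i in range(3)]
row_sums = [sum(row) for row in G]         # should all be zero
```

Note that the diagonal of G is −vi and the off-diagonal entries are vi pij, exactly the limits in Lemma 8.2.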
Example 8.5
Pure birth process with constant birth rate λ. In other words: the Poisson process. We practically used the differential equation to show that the number
of events in a fixed period of time has a Poisson distribution.

Example 8.6
Consider a lab with one machine. The waiting time until it breaks is exponential with rate λ, and when it is broken, the waiting time until it is
repaired is exponential with rate μ. Let us define
X(t) = 0 if the machine works at time t; 1 if the machine is under repair at time t.
It is obvious that {X(t) : t ≥ 0} is a birth and death process. Its infinitesimal
generator is
G = [ −λ    λ ]
    [  μ   −μ ].
How do we get the transition probability matrix P (t)?
Solution: According to Kolmogorov's backward equation, we know that
P′(t) = GP(t)
and P(0) = I. Let us try to solve the equation.


Component-wise, we have
p00′(t) = λ[p10(t) − p00(t)],
p10′(t) = μ[p00(t) − p10(t)].
We can then find
μ p00′(t) + λ p10′(t) = 0.
This implies
μ p00(t) + λ p10(t) = C
where C is a constant. Checking the value at t = 0, we find C = μ. Hence
p10(t) = (μ/λ)[1 − p00(t)]. Substituting back, we find
p00′(t) = μ − (λ + μ)p00(t).
Solving this equation, we find
p00(t) = (λ/(λ + μ)) exp{−(λ + μ)t} + μ/(λ + μ).

We can similarly work out other components of P (t).


Using this result, we are able to answer questions such as: if the
machine works at t = 0, what is the probability that it is working at t = 10?
The answer is:
p00(10) = (λ/(λ + μ)) exp{−10(λ + μ)} + μ/(λ + μ).

Note that if t → ∞, the limit of p00(t) is
μ/(λ + μ).
Thus, the long term proportion of time when the machine is working is
μ/(λ + μ). The answer is very reasonable.
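The closed form for p00(t) can be checked by integrating the backward equation P′(t) = GP(t) numerically. The sketch below uses a crude Euler scheme on the 2 × 2 system; the rate values are arbitrary assumptions.

```python
import math

# Illustrative sketch: Euler integration of P'(t) = G P(t) for the
# one-machine chain of Example 8.6, checked against the closed form.
# lam and mu are arbitrary assumptions.
lam, mu, t_end = 1.0, 3.0, 2.0

def p00_closed(t):
    return (lam / (lam + mu)) * math.exp(-(lam + mu) * t) + mu / (lam + mu)

G = [[-lam, lam], [mu, -mu]]
P = [[1.0, 0.0], [0.0, 1.0]]               # P(0) = I
h = 1e-4
for _ in range(int(t_end / h)):
    dP = [[sum(G[i][k] * P[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]                # dP = G P
    P = [[P[i][j] + h * dP[i][j] for j in range(2)] for i in range(2)]
```

As t grows, p00(t) approaches μ/(λ + μ), the long-run fraction of time the machine works.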

8.3 Limiting Probabilities

Similar to the discrete time Markov chain, when t → ∞, pij(t) often has a
limit which does not depend on i. The conditions for the validity of this result
are also similar. However, we are no longer bothered by periodicity.
Theorem 8.3
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with infinitesimal
generator G. Suppose
(a) all states of the Markov chain communicate with each other:
P{X(t) = j for some t > 0|X(0) = i} > 0
for all i, j (irreducible);
(b) letting Tij = the amount of time from X(0) = i until X(t) = j for the
first time, we have E(Tij) < ∞.
Then lim_{t→∞} pij(t) = πj exists for all i, j, and the vector π satisfies
πG = 0;  Σj πj = 1.

Remark
1. The limiting probability πi still has the interpretation of the long run
proportion of time the Markov chain stays in state i.
2. Assume that the Markov chain is irreducible, and a non-zero solution
to πG = 0 exists. Then the limiting probability exists and all the states
are positive recurrent. That is, we need not verify condition (b)
before solving for the limiting probabilities.
3. With the notation G = V P − V, the equation that π satisfies can
be written as
πj vj = Σk πk vk pkj.
We may regard πj vj as the rate the Markov chain leaves state j,
and Σk πk vk pkj as the rate the Markov chain enters state j. When

the time t goes to infinity, the Markov chain reaches equilibrium: the rates
of entering and leaving a state are the same for all states. For this reason,
when the Markov chain reaches this stage, it is in equilibrium.
4. When the limiting probability exists, the Markov chain is called ergodic. The limiting probability vector is also a stationary probability
distribution, or equilibrium distribution.
5. The expected inter-occurrence time of state j is again given by 1/πj.
We do not have to rely on the equation πG = 0 to find π. See the
following example.
Example 8.7 Birth and death process
Consider a typical birth and death process with birth and death rates λn and
μn. We easily set up the following table:
State   rate of leaving      rate of entering
0       π0 λ0                π1 μ1
1       π1 (λ1 + μ1)         π2 μ2 + π0 λ0
2       π2 (λ2 + μ2)         π3 μ3 + π1 λ1
3       π3 (λ3 + μ3)         π4 μ4 + π2 λ2
...
n       πn (λn + μn)         πn+1 μn+1 + πn−1 λn−1
...

Since the birth and death process has to settle down to some states, the rates
of moving between states have to be balanced. This observation gives
State   rate of going up     rate of coming down
0       π0 λ0                π1 μ1
1       π1 λ1                π2 μ2
2       π2 λ2                π3 μ3
3       π3 λ3                π4 μ4
...
n       πn λn                πn+1 μn+1
...


In this case, we get
π1 = (λ0/μ1) π0;
π2 = (λ1/μ2) π1;
π3 = (λ2/μ3) π2;
. . . .

From the fact that Σ πn = 1, we find
1 = π0 [ 1 + Σ_{n=1}^{∞} ∏_{i=0}^{n−1} (λi/μi+1) ].

A meaningful solution exists if and only if
Σ_{n=1}^{∞} ∏_{i=0}^{n−1} (λi/μi+1) < ∞.

This is the necessary and sufficient condition for the birth and death process
to reach equilibrium.
When this condition is satisfied, we find
πn = [ ∏_{i=0}^{n−1} (λi/μi+1) ] / [ 1 + Σ_{m=1}^{∞} ∏_{i=0}^{m−1} (λi/μi+1) ].

Remark
(i) When the birth rates are too high, the population will keep increasing.
No equilibrium can be reached.
(ii) When λn = 0 for some n = N, the population size will be
capped by N. It is easy to see that the equilibrium is then always possible.
Example 8.8
A job shop has M machines and one repair person. Assume each machine
works an exponential time with rate λ, independent of each other, and the
repair time is also exponential with rate μ, regardless of how many machines
are working at the moment.
Define X(t) to be the number of machines not working at time t. Then
{X(t) : t ≥ 0} is a birth and death process:
State space    0      1           2           . . .   M
Birth rates    Mλ     (M − 1)λ    (M − 2)λ    . . .   0
Death rates    0      μ           μ           . . .   μ


(a) What is the average number of machines not working in the long run?
We need to work out the limiting probabilities to answer this question.
Using the argument of the rates of movement, we note that the rate of moving
up from state n must balance the rate of moving down from state n + 1:
(M − n)λ πn = μ πn+1.
Thus,
πn+1 = ((M − n)λ/μ) πn = (λ/μ)^{n+1} [(M − n)(M − n + 1) · · · M] π0.
From Σ πn = 1, we find
π0 = [ Σ_{i=0}^{M} (M!/(M − i)!) (λ/μ)^i ]^{−1}.

There is no closed-form solution. The average number of machines not working
is
lim_{t→∞} E[X(t)] = Σ_{n=0}^{M} n πn.

(b) In the long run, the proportion of machines which are working is
1 − (Σ_{n=0}^{M} n πn)/M.
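These job-shop formulas can be evaluated numerically for a small shop. The values of M, λ and μ in the sketch below are arbitrary assumptions.

```python
import math

# Illustrative sketch: limiting distribution of the number of broken
# machines in the job shop of Example 8.8.  M, lam, mu are arbitrary.
M, lam, mu = 4, 0.5, 2.0
rho = lam / mu

# pi_i proportional to (lam/mu)^i * M!/(M - i)!
weights = [rho**i * math.factorial(M) / math.factorial(M - i)
           for i in range(M + 1)]
total = sum(weights)
pi = [w / total for w in weights]

avg_broken = sum(n * p for n, p in enumerate(pi))
working_fraction = 1.0 - avg_broken / M
```

The balance condition (M − n)λ πn = μ πn+1 provides a simple correctness check on the computed distribution.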

Example 8.9


When the birth and death rates are all constant (they do not depend on the
state), the solution for the limiting probability is very simple. The limiting
probabilities are given by
πn = (λ/μ)^n (1 − λ/μ),  n = 0, 1, 2, . . . ,
when λ < μ.
This model is also called an M/M/1 queue: a work station has a single
server who works at a constant rate, and a steady stream of customers arrives for
service. If the service rate is larger than the arrival rate, the system is
stable. A customer will find on average (1 − λ/μ)^{−1} customers in front of him
upon arrival.
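The geometric form of the M/M/1 limiting distribution makes numerical work trivial; in particular, the long-run mean number in the system is (λ/μ)/(1 − λ/μ). The rates below are arbitrary assumptions and the infinite sum is truncated.

```python
# Illustrative sketch: M/M/1 limiting probabilities pi_n = rho^n (1 - rho)
# with rho = lam/mu < 1.  Rates are arbitrary; the sum is truncated.
lam, mu = 1.0, 2.0
rho = lam / mu

pi = [rho**n * (1.0 - rho) for n in range(200)]    # geometric tail ignored
mean_in_system = sum(n * p for n, p in enumerate(pi))
```

With ρ = 1/2 the mean number in the system is ρ/(1 − ρ) = 1.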

8.4 Problems

1. Suppose that a one-celled organism can be in one of two states, either A
or B. An individual in state A will change to state B at an exponential
rate λ; an individual in state B divides into two new individuals of
type A at an exponential rate μ. Define an appropriate continuous-time
Markov chain for a population of such organisms and determine
the appropriate parameters for this model.
2. Potential customers arrive at a single-server station in accordance with
a Poisson process with rate λ. However, if an arrival finds n customers
already in the station, then he will enter the system with probability
αn. Assuming an exponential service rate μ, set this up as a birth and
death process and determine the birth and death rates.
3. Consider a birth and death process with birth rates λi = (i + 1)λ, i ≥ 0,
and death rates μi = iμ, i ≥ 0.
(a) Determine the expected time to go from state 0 to state 2.
(b) Determine the expected time to go from state 2 to state 3.
(c) Determine the variances in parts (a) and (b).


4. There are two TAs for a particular course who answer questions in a
tutorial center. The number of students who come to ask questions
can be modeled by a Poisson process with intensity λ = 15/hour. The
amount of time it takes to answer the questions of a single student has an
exponential distribution with rate μ = 12/hour. Assume the center is
large enough for 4 students, including those who are asking questions,
and new arrivals will not enter when the room is full.
(a) Set up a birth and death process to model this situation. This includes: define {X(t), t ≥ 0}; write down its state space and its birth
and death rates.
(b) Write down its infinitesimal generator G.
(c) Obtain the limiting probabilities of this process.
(d) What proportion of the time is the room full? Assume the center
has been in service for a very long time.
(e) What proportion of the time can at least one of the TAs have a
rest?
5. A job shop consists of three machines and two repairmen. The amount
of time a machine works before breaking down is exponentially distributed with mean 10. If the amount of time it takes a single repairman to fix a machine is exponentially distributed with mean 8, then
(a) what is the average number of machines not in use?
(b) what proportion of the time are both repairmen busy?
6. Each individual in a biological population is assumed to give birth at
an exponential rate λ, and to die at an exponential rate μ. In addition,
there is an exponential rate of increase θ due to immigration. However,
neither immigration nor birth is allowed when the population size reaches N.
(a) Set this up as a birth and death model.
(b) If N = 3, λ = θ = 1, μ = 2, determine the proportion of time that
immigration is restricted.


7. Potential customers arrive at a full-service, one-pump gas station at a
Poisson rate of 20 cars per hour. However, customers will only enter
the station for gas if there are no more than two cars (including the
one currently being attended to) at the pump. Suppose the amount of
time required to service a car is exponentially distributed with a mean
of five minutes.
(a) What fraction of the attendant's time will be spent servicing cars?
(b) What fraction of potential customers are lost?
8. A parking lot has N spaces. The incoming traffic is of Poisson type at a
rate of λ cars per hour, whereas the occupancy times are exponentially
distributed with a mean of 1/μ hours.
(1) Find the appropriate differential equations for the probabilities,
Pn(t), of finding exactly n spaces occupied at time t.
(2) When N = 5, λ = 2 and μ = 1, obtain the variance of the number
of spaces occupied if the process has been operating for a very long
time.
9. A small barbershop, operated by a single barber, has room for at most
two customers. Potential customers arrive at a Poisson rate of three
per hour, and the successive service times are independent exponential
random variables with mean 1/4 hour. What is
(a) the average number of customers in the shop?
(b) the proportion of potential customers that enter the shop?
(c) If the barber could work twice as fast, how much more business
would he do?
10. Consider two machines, both of which have an exponential lifetime with
mean 1/λ. There is a single repairman who can service machines at an
exponential rate μ. Set up the Kolmogorov backward equations; you
need not solve them. If you could solve these equations, what questions
would you be able to answer?


11. Consider two machines. Machine i operates for an exponential time
with rate λi and then fails; its repair time is exponential with rate
μi, i = 1, 2. The machines act independently of each other. Define
a four-state continuous-time Markov chain which jointly describes the
condition of the two machines. Use the assumed independence to compute the transition probabilities for this chain and then verify that these
transition probabilities satisfy the forward and backward equations.
12. There are 6 copies of the movie Toy Story in a 24-hour video rental
shop. The demand for this movie is a Poisson process with intensity
parameter λ = 5/day. Once a tape is rented, the waiting time for its
return has an exponential distribution with rate μ = 1/day. Define
{X(t), t ≥ 0} to be the number of copies of Toy Story in the shop at time
t.
(a) Model {X(t), t ≥ 0} as a birth and death process. Specify its state
space and its birth and death rates.
(b) Write down its infinitesimal generator G.
(c) Obtain the limiting probabilities of this process.
(d) What proportion of the time is the shop out of copies of Toy
Story? Assume the shop has been in service for a very long time.
(e) If the owner charges $2.5 per day for the rental of one copy, how
much money does the owner make from Toy Story per day? (Assume
the owner charges $1.25 for half a day, and so on, for simplicity.)
13. Let {X(t)} be a typical birth and death process with birth rates λn and
death rates μn, n = 0, 1, . . ., and μ0 = 0. (You are responsible for knowing
any other assumptions made in a general birth and death process.)
(a) In this setup, let Bn be the waiting time until a birth when X(t) = n
and Dn be the waiting time until a death when X(t) = n. What are
the distributions of Bn and Dn and their related parameters?
(b) Let Tn = min{Bn, Dn}. Calculate P(Tn > t) for t > 0. What is
the distribution of Tn?
(c) Calculate P(Bn < Dn) for any non-negative integer n.



(d) Verify that {X(t)} is a continuous time Markov chain. Identify the
corresponding parameters: the exponential rate vn and the conditional
transition probability pij (given that X(t) is leaving i, the chance that it
enters j).

14. A computer can handle N tasks simultaneously. The tasks are submitted to the computer as a Poisson process with a rate of λ per second,
and the amount of time it takes to complete a task is independent of
the other tasks and has an exponential distribution with a mean of 1/μ seconds.
The tasks submitted while the computer is at full load will be lost
without any warning.
(a) Set up a birth and death process to model
this situation. This includes: define {X(t), t ≥ 0}; write down its state
space and its birth and death rates.
(b) Write down its infinitesimal generator G.
(c) Assume N = 3, λ = 4 and μ = 1;
(i) obtain the limiting probabilities of this process;
(ii) obtain the mean number of tasks the computer handles at any
moment if the computer has been operating for a very long time;
(iii) what proportion of the jobs you submit will get lost in the long
run?

Chapter 9
Queueing Theory
Queueing theory is closely related to the continuous time Markov chain. It
has the following basic setup. There is a service station with several servers.
Customers come for service. They leave after being served. There are three
important factors that determine the properties of a queueing system.
The first factor is the random mechanism of the arrival of the customers.
Is the waiting time for the next customer a constant? Is it independent of
what happened already?
The second factor is the number of servers. How many customers can be
served simultaneously?
The third factor is the random mechanism of the service time. How long
does it take to serve a customer? Is it random?
The model becomes more complex if the number of servers changes according to the length of the queue. Customers may also be divided into
several classes so that some of them receive priority service.
There are also several questions whose answers we might be interested in.
On average, how long does a customer have to wait before being served? What
proportion of the time is the server idle? Once we have a sufficient
understanding of the queue, the system can be optimized.

9.1 Cost Equations

Let us define the following quantities:

L, the average number of customers in the system;


LQ , the average number of customers waiting in the queue;
W , the average amount of time a customer spends in the system;
WQ , the average amount of time a customer spends waiting in the queue.
These quantities are obviously not independent of each other. A lower
average number of waiting customers implies a shorter waiting time. The fundamental constraint is that there is a balance between staying in the queue and
being served in the system.
To make this balance relationship more explicit, we imagine that each customer pays to stay in the system, and hence the system makes money.
For a balanced system,
The amount customers pay = The amount the system earns.
When the above equality is computed per unit time, it becomes
average rate the system earns
= (average amount an entering customer pays)
× (the rate of entering customers).
Using this argument, when every customer pays one dollar per unit time,
we find
L = λa W
where λa is the rate of entering customers.
If customers pay only when they are waiting, not when they are being
served, then the relation becomes
LQ = λa WQ.
If customers pay for the service time only, we get
average number of customers in service = λa E[S],
where E(S) is the average amount of time a customer spends in service.
In order for these identities to hold, the system has to be able to reach an
equilibrium. That is, at some time in the distant future, the rate of entering
equals the rate of leaving.
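The identity L = λa W can be checked by simulation. The sketch below runs an M/M/1 queue through the Lindley recursion for customer delays and compares λa W with the known long-run value of L for that queue; the rates and sample size are arbitrary assumptions.

```python
import random

# Illustrative sketch: checking L = lam_a * W on a simulated M/M/1 queue.
# Rates are arbitrary assumptions with lam < mu for stability.
lam, mu, n_cust = 1.0, 2.0, 200000
rng = random.Random(3)

arrivals, t = [], 0.0
for _ in range(n_cust):
    t += rng.expovariate(lam)
    arrivals.append(t)
services = [rng.expovariate(mu) for _ in range(n_cust)]

# Lindley recursion: queueing delay of successive customers
delay, sojourns = 0.0, []
for k in range(n_cust):
    if k > 0:
        inter = arrivals[k] - arrivals[k - 1]
        delay = max(0.0, delay + services[k - 1] - inter)
    sojourns.append(delay + services[k])   # time in system = delay + service

W = sum(sojourns) / n_cust                 # average time in system
lam_a = n_cust / arrivals[-1]              # observed entering rate
L_estimate = lam_a * W                     # Little's law: L = lam_a * W
```

For M/M/1 with λ < μ, theory gives W = 1/(μ − λ) and L = (λ/μ)/(1 − λ/μ), so the product λa W should land near that value.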

9.2 Steady-State Probabilities
Let us now define a stochastic process for the queueing system. At any
given time t, we might be interested in several aspects of the queueing system.
We define
X(t) = number of customers in the system at time t.
Hence X(t) is the total number of customers including those being served at
the moment and those who are waiting. One quantity of interest is P {X(t) =
n} for each n. Namely, the probability (mass) function of X(t) for each
given t. Mathematically, this is often too hard to be computed analytically.
Instead, consider
πn = lim_{t→∞} P{X(t) = n}
when it exists. Under certain conditions, computing this limit is simple. This
quantity can be interpreted as the long-run proportion of time when there are
exactly n customers in the system. It is also referred to as the steady-state
probability of exactly n customers in the system. It is usually true that
πn is the long-run proportion of time when the system contains n customers.
If π3 = 0.2, then about 20% of the time, the system contains 3 customers.
On average (in the long run), there are Σ n πn customers in the
system.
Let Tm be the arrival time of the mth customer; then X(Tm−) is the
number of customers in the system when the mth customer arrives. Define
an = lim_{m→∞} P(X(Tm−) = n).
That is, the system is sampled when a new customer arrives.


Let Sm be the departure time of the mth customer. Let
dn = lim_{m→∞} P(X(Sm+) = n).
So, we sample the system when a customer leaves.


All three limits, n , an and dn are long-run proportion of times when
there are n customers in the system under the specific sampling plan.


Example 9.1
Consider a queueing model in which all customers have their service time
equal to 1, and where the times between successive customers are always
greater than 1. In this case, the system is always empty when a new customer
arrives, and when a customer leaves. We hence find
a0 = d0 = 1.
However, π0 > 0 as long as there is a steady stream of customers arriving.
If you work in a service station with this property, your supervisor can
always pick the inspection times so that you are found idle every time, even
though you are very busy in between.
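The gap between a0, d0 and π0 can be checked numerically. The sketch below (the constant inter-arrival time 1.5 is an assumed illustration, not part of the example) tracks how long the system sits empty:

```python
def empty_time_fraction(n_arrivals=100_000, interarrival=1.5, service=1.0):
    """Fraction of time the system is empty when arrivals come every
    `interarrival` units and each service lasts `service` < interarrival.
    Every arrival finds the system empty (a0 = 1) and every departure
    leaves it empty (d0 = 1), yet the system is NOT always empty."""
    empty, last_departure = 0.0, 0.0
    for m in range(1, n_arrivals + 1):
        arrival = m * interarrival
        empty += arrival - last_departure   # idle gap before this arrival
        last_departure = arrival + service
    return empty / last_departure           # long-run proportion pi_0

print(round(empty_time_fraction(), 3))      # 0.333, i.e. 1 - service/interarrival
```

With these assumed values, π0 ≈ 1/3 even though every arrival and every departure sees an empty system.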
Example 9.2
If there are no multiple arrivals, and there is only one server, then an = dn
for all n.
If the system reaches a balance, the long-run number of transitions of
X(t) from n to n + 1 has to be the same as the number of transitions from
n + 1 to n. The former corresponds to an and the latter to dn. So they
are equal.
The conditions of single arrivals and a single server make sure that transitions such as from n to n + 2 cannot happen.

Example 9.3
If the customers arrive according to a Poisson process, then
πn = an.
Due to possible sampling bias, the supervisor may not always know how
busy you are on average. However, if he/she picks the next inspection time
according to an exponential distribution, he/she will not be at risk of misjudging your average workload in the long run.

9.3 Exponential Model

A special queueing model arises when (i) customers arrive according to a
Poisson process with rate λ; (ii) the service station has one server; (iii) the
service time has exponential distribution with rate μ. This type of queueing
model is also called the M/M/1 model. The letter M stands for the Markov
property: that is, the memoryless property of the exponential distributions
used to describe the arrivals and the service. The digit 1 stands for the number
of servers. Obviously, if X(t) is the number of customers in the system at
time t, then {X(t) : t ≥ 0} is simply a birth and death process. If πn is the
limit of P[X(t) = n], then it satisfies the equation πG = 0. From another point
of view, considering the rates of X(t) moving up and down, we have

State  up     down
0      λπ0    μπ1
1      λπ1    μπ2
2      λπ2    μπ3
...
n      λπn    μπn+1
Equating these pairs of rates, we find
πn = (λ/μ)^n π0.
Hence, when λ < μ so that the limits exist,
π0 = 1 − (λ/μ).
As a distribution, π is geometric.
On average, how many customers are there in the system? The answer
is simple; the geometric distribution has mean
L = (λ/μ)/(1 − λ/μ) = λ/(μ − λ).
(Pay attention that this distribution starts from 0.)
The average waiting time is W = L/λ = (μ − λ)^{−1}.


Let S be the service time a customer receives. Then, by assumption, it
has exponential distribution with rate μ. Hence, E(S) = 1/μ. (Too bad we
use μ as a notation for the rate, not for the mean.) So, on average, a customer
spends the following amount of time in the queue (not being served):
WQ = W − E(S) = λ/{μ(μ − λ)}.
The average number of customers waiting in the queue is
LQ = λWQ = λ^2/{μ(μ − λ)}.

Let W be the amount of time an arbitrary customer spends in the system.
Note that before taking the average, it is random. Its average (expectation)
is given by W = (μ − λ)^{−1}. What is its distribution?
If we know the number of customers in the system when this customer
arrives, then the conditional distribution is gamma. Let N be the number of
customers in the system at the moment when this customer arrives; the
customer then goes through N + 1 independent exponential(μ) service
stages (the N services ahead of him plus his own), so
P[W ≤ t | N = n] = ∫_0^t μ e^{−μs} (μs)^n / n! ds.
Since Poisson arrivals see time averages, we have
P(N = n) = πn = (λ/μ)^n (1 − λ/μ).
Hence
P[W ≤ t] = E{P[W ≤ t | N]} = 1 − exp{−(μ − λ)t}.
So, the waiting time of a randomly selected customer is also exponentially
distributed.
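The four quantities derived above fit into a few lines of code. This is a sketch under the assumptions of this section (λ < μ); the function name and the example rates are mine:

```python
def mm1(lam, mu):
    """Steady-state quantities of the M/M/1 queue; requires lam < mu."""
    if lam >= mu:
        raise ValueError("need lam < mu for the limits to exist")
    rho = lam / mu
    L = lam / (mu - lam)               # mean number in system
    W = 1.0 / (mu - lam)               # mean time in system
    Wq = lam / (mu * (mu - lam))       # mean wait in queue: W - 1/mu
    Lq = lam ** 2 / (mu * (mu - lam))  # mean queue length: lam * Wq
    pi = lambda n: (1 - rho) * rho ** n    # geometric steady-state pmf
    return L, W, Wq, Lq, pi

L, W, Wq, Lq, pi = mm1(1.0, 2.0)
print(L, W, Wq, Lq, pi(0))   # 1.0 1.0 0.5 0.5 0.5
```

Note that Little's identity L = λW holds automatically for these formulas.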

9.4 A Single-Server Exponential Queueing System Having Finite Capacity

Assume the service station can only hold N customers. When the system is
full, the arrivals get lost. This system can be analyzed in the same fashion
as before.


State  up       down
0      λπ0      μπ1
1      λπ1      μπ2
2      λπ2      μπ3
...
N−1    λπN−1    μπN

Note that this balance sheet stops at N.
Now, we have
π0 = (1 − λ/μ) / {1 − (λ/μ)^{N+1}}.
In case you have not noticed the difference, here we put down a list:
(1) This result remains true regardless of whether λ > μ. If more customers
arrive than the system can handle, we simply turn them away.
(2) We now have a sum of a finite number of terms. Sharpening your memory
on geometric summation is necessary.
It is often the case that the finite case is harder than the infinite case. We
have, for this system,
L = Σ_{n=0}^{N} n·πn = λ{1 + N(λ/μ)^{N+1} − (N + 1)(λ/μ)^N} / [(μ − λ){1 − (λ/μ)^{N+1}}].

What is the average amount of time a customer spends in the system?
This depends on whether we count those who are turned away. If they
are counted (their time is zero), the answer is L/λ. Otherwise, only λa =
λ(1 − πN) customers per unit time enter the system, and their average time
spent in the system is L/{λ(1 − πN)}.
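These finite-capacity formulas are easy to mis-type; a short check (with assumed parameter values, λ ≠ μ) confirms that the closed form for L agrees with the direct sum:

```python
def mm1_finite(lam, mu, N):
    """Steady state of the single-server queue with capacity N (lam != mu)."""
    r = lam / mu
    pi0 = (1 - r) / (1 - r ** (N + 1))
    pi = [pi0 * r ** n for n in range(N + 1)]
    L_sum = sum(n * p for n, p in enumerate(pi))   # direct computation
    # closed form from the text
    L = lam * (1 + N * r ** (N + 1) - (N + 1) * r ** N) / \
        ((mu - lam) * (1 - r ** (N + 1)))
    lam_entering = lam * (1 - pi[N])   # rate of customers actually accepted
    W = L / lam_entering               # mean time in system for those who enter
    return pi, L_sum, L, W

pi, L_sum, L, W = mm1_finite(1.0, 2.0, 2)
print(round(L, 4), round(L_sum, 4))   # both 4/7 = 0.5714
```

Unlike the infinite-capacity case, the function also works when λ > μ.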
Example 9.4
Suppose it costs cμ dollars per hour to provide service at rate μ. Suppose
also that we profit A dollars for each customer served. If the system has
capacity N, what service rate μ maximizes our total profit?


Solution: We can work out the relationship between the net profit and the
service rate μ, together with the arrival rate λ. Let us assume that the finite-capacity
M/M/1 model is suitable.
Net profit per hour = λ(1 − πN)A − cμ = λA[1 − (λ/μ)^N] / [1 − (λ/μ)^{N+1}] − cμ.
We cannot find the μ that maximizes the above expression analytically. If
N = 2, λ = 1, A = 10 and c = 1, then
Net profit per hour = 10(μ^3 − μ)/(μ^3 − 1) − μ.
We may find the value of μ that maximizes the above numerically. The answer
is approximately μ = 2.
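A grid search reproduces this numerical answer; the profit function below is a sketch of the example (the grid range and step are arbitrary choices):

```python
def net_profit(mu, lam=1.0, A=10.0, c=1.0, N=2):
    """Net profit per hour for Example 9.4; assumes lam != mu."""
    r = lam / mu
    accepted = lam * (1 - r ** N) / (1 - r ** (N + 1))  # lam * (1 - pi_N)
    return A * accepted - c * mu

# search mu on a grid strictly above lam = 1 to avoid the 0/0 at mu = lam
grid = [1.1 + 0.001 * i for i in range(2000)]
best = max(grid, key=net_profit)
print(round(best, 2))   # maximizer is near mu = 2
```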

Example 9.5 A shoeshine shop

It is not entirely clear why we should be interested in a shoeshine shop.
This example, however, shows what happens when the possible paths of a
continuous-time Markov chain form a net. This structure makes the task of
solving the equations for the limiting probabilities harder.
Ignoring the background, this continuous-time Markov chain has 5 states:
(0, 0): no customers in the system;
(1, 0): one customer in the system, receiving type I service;
(0, 1): one customer, receiving type II service;
(1, 1): two customers, receiving type I and type II service respectively;
(b, 1): two customers, one having finished type I and blocked, the other receiving
type II service.
The system can accept at most two customers. A customer who arrives when
the system is in state (1, 0), (1, 1) or (b, 1) will be turned away.
The service time of type I is exponential with rate μ1, and that of type II with
rate μ2. The arrival rate of new customers is λ. You bet that everything is
exponential.
The following transitions are possible:
(0, 0) → (1, 0),

(1, 0) → (0, 1),
(0, 1) → (1, 1),
(0, 1) → (0, 0),
(1, 1) → (0, 1),
(1, 1) → (b, 1),
(b, 1) → (0, 1).

Let us equate the rate of entering and the rate of leaving each state:

state    Leave               Enter
(0, 0)   λπ0,0               μ2 π0,1
(1, 0)   μ1 π1,0             λπ0,0 + μ2 π1,1
(0, 1)   (λ + μ2)π0,1        μ1 π1,0 + μ2 πb,1
(1, 1)   (μ1 + μ2)π1,1       λπ0,1
(b, 1)   μ2 πb,1             μ1 π1,1
General solutions are involved. We work on special cases:
(a) If λ = 1, μ1 = 1, μ2 = 2, we find
[π(0,0), π(1,0), π(1,1), π(0,1), π(b,1)] = (12, 16, 2, 6, 1)/37.
So L = 28/37 and W = 28/18. (We should be careful about the rate of entering:
only arrivals in states (0, 0) and (0, 1) are accepted.)
(b) If λ = 1, μ1 = 2, μ2 = 1, we find
[π(0,0), π(1,0), π(1,1), π(0,1), π(b,1)] = (3, 2, 1, 3, 2)/11.
So L = 1 and W = 11/6. It is a better arrangement to make room for the
next customer.
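A quick arithmetic check (case (a) only) confirms that the stated vector satisfies all five balance equations and reproduces L and W:

```python
# Case (a) of the shoeshine shop: lam = 1, mu1 = 1, mu2 = 2.
lam, mu1, mu2 = 1.0, 1.0, 2.0
p00, p10, p11, p01, pb1 = (x / 37 for x in (12.0, 16.0, 2.0, 6.0, 1.0))

balance = [
    lam * p00 - mu2 * p01,                        # state (0,0)
    mu1 * p10 - (lam * p00 + mu2 * p11),          # state (1,0)
    (lam + mu2) * p01 - (mu1 * p10 + mu2 * pb1),  # state (0,1)
    (mu1 + mu2) * p11 - lam * p01,                # state (1,1)
    mu2 * pb1 - mu1 * p11,                        # state (b,1)
]
assert all(abs(b) < 1e-12 for b in balance)

L = p10 + p01 + 2 * (p11 + pb1)   # one customer in (1,0),(0,1); two otherwise
lam_a = lam * (p00 + p01)         # arrivals are accepted only in these states
print(L, L / lam_a)               # 28/37 and 28/18
```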

Example 9.6 A queueing system with bulk service

Consider the system in which the single server can serve two customers at a
time. The service time for the two customers is identical. An elevator is such
an example. Because of this, it is still a one-server system.
What makes it special is this: when there are two or more customers waiting
after the server finishes serving the previous customers, it takes the
next two customers. If there is only one waiting, it takes just one. When
nobody is waiting, it idles. It seems more convenient to let
X(t) = the number of customers in the line.
When nobody is waiting, we still have two different situations: the server is
idle, or the server is busy. So, we define X(t) = 0 when no one is in the line
but the server is busy, and X(t) = 0′ when no one is in the line and the
server is idle.
To find the limiting probabilities, we notice that there is still a general
direction in which the Markov chain moves: it moves up by one state at a
time, although it may move down by two. Equating the rates at which the
chain crosses each cut of the state space in the two directions gives:

state  up     down
0′     λπ0′   μπ0
0      λπ0    μ(π1 + π2)
1      λπ1    μ(π2 + π3)
...
n      λπn    μ(πn+1 + πn+2)

Although it is possible to solve this system of equations with generating
functions, we need only use this idea to justify that the solution has the form
πn = α^n π0
for all n = 0, 1, 2, . . .. Substituting into the relationship λπn = μ(πn+1 + πn+2), we find
α = {√(1 + 4λ/μ) − 1}/2.
Further, from the fact that Σi πi = 1 (including π0′), we find
π0 = λ(1 − α)/{λ + μ(1 − α)}.

The rest of the limiting probabilities can then be easily calculated. For instance,
π0′ = (μ/λ)π0.
One remark is that the solution makes sense only if α < 1. This requires
2μ > λ, which is obviously necessary: the server can handle at most two
customers per service period.

The proportion of customers who are served alone is
(λπ0′ + μπ1)/λ,
and
LQ = λα / {(1 − α)[λ + μ(1 − α)]}.
It is seen that
WQ = LQ/λ,
and so on.
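The α and π0 formulas can be verified directly against the cut equations; a sketch (the rates λ = μ = 1 are chosen arbitrarily):

```python
import math

def bulk_queue(lam, mu):
    """Limiting probabilities of the bulk-service queue of Example 9.6;
    requires lam < 2*mu so that alpha < 1."""
    alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2
    pi0 = lam * (1 - alpha) / (lam + mu * (1 - alpha))
    pi = lambda n: alpha ** n * pi0   # n customers in line, server busy
    pi0_idle = (mu / lam) * pi0       # state 0': line empty, server idle
    return alpha, pi, pi0_idle

alpha, pi, pi0_idle = bulk_queue(1.0, 1.0)
total = pi0_idle + pi(0) / (1 - alpha)    # probabilities over all states
print(round(alpha, 3), round(total, 6))   # 0.618 1.0
```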

9.5 Network of Queues

9.5.1 Open System

Consider a two-server system in which customers arrive at a Poisson rate λ
at server 1. After being served by server 1, they then join the queue in
front of server 2. Assume there is infinite waiting space at both servers.
Each server serves one customer at a time. The service time of server i
has exponential distribution with rate μi, i = 1, 2. Such a system is called a
tandem or sequential system. Note that this system is similar to the shoeshine
example. The difference is that the shoeshine example has finite capacity.
It is more convenient to define
X(t) = (n, m)
if there are n customers in the queue for server 1 and m customers in the
queue for server 2 at time t. The state space of this stochastic process is
obviously countable. Although it is possible to identify the corresponding
infinitesimal generator, we may find it not so useful in determining the limiting
probabilities.
Note that from state (n, m), the Markov chain may enter three possible
states: (n − 1, m + 1) if a customer completes service at server 1 first; (n, m − 1)
if a customer completes service at server 2 first; (n + 1, m) if a new customer
arrives before anyone completes service. If either n or m is zero, we need to
make some minor adjustments.


Now consider the states from which the Markov chain may enter state
(n, m). They include (n − 1, m), (n + 1, m − 1) and (n, m + 1). Again, we make
some minor adjustments if either n or m is zero.
So the general balance equation is
πn,m(λ + μ1 + μ2) = λπn−1,m + μ1 πn+1,m−1 + μ2 πn,m+1.
For the boundary cases, we have
λπ0,0 = μ2 π0,1;
(λ + μ1)πn,0 = μ2 πn,1 + λπn−1,0;
(λ + μ2)π0,m = μ2 π0,m+1 + μ1 π1,m−1.
Rather than solving this system of equations directly, it is more convenient
to guess the solution and verify it. The idea is this: the system under
consideration is similar to two M/M/1 systems. If equilibrium is to be
reached, the arrival rate for the second server also has to be λ.
Hence, we must have
πn,· = (1 − λ/μ1)(λ/μ1)^n
and
π·,m = (1 − λ/μ2)(λ/μ2)^m.
And we guess
πn,m = πn,· π·,m.
Needless to say, this guess is correct. We may verify it quickly.
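One such quick verification is to check the interior balance equation numerically at an arbitrary state; the rates below are assumed sample values:

```python
lam, mu1, mu2 = 1.0, 2.0, 3.0   # sample rates with lam < mu1 and lam < mu2

def pi(n, m):
    """Guessed product-form limiting probability pi_{n,m}."""
    return ((1 - lam / mu1) * (lam / mu1) ** n
            * (1 - lam / mu2) * (lam / mu2) ** m)

n, m = 2, 3                     # any interior state (n, m >= 1)
lhs = pi(n, m) * (lam + mu1 + mu2)
rhs = lam * pi(n - 1, m) + mu1 * pi(n + 1, m - 1) + mu2 * pi(n, m + 1)
print(abs(lhs - rhs) < 1e-12)   # True: the balance equation holds
```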
Note that the limiting distribution is a product of independent geometric
distributions. The total number of customers in the system has expectation
L = λ/(μ1 − λ) + λ/(μ2 − λ).

9.5.2 Closed Systems

If customers come and go, the system is called open. If no new customers
enter, and existing ones never depart, the system is closed. The best way to
imagine such a system is that the system is so large that it includes all the
service stations one can possibly have.


Suppose we have m customers in the system and there are k service
stations. Whenever a customer completes a service, he immediately joins the
queue in front of another server (possibly the same one). If we follow a single
customer and define Xn to be the index of the server whose line this customer
enters after completing n services, we can easily see that {Xn} is a discrete-time
Markov chain. Certainly, we have to assume that the choice of the next
server depends only on which service this customer has just completed, not
on the history of the services. Let P be the transition probability matrix of
this discrete-time Markov chain. Assume further that this Markov chain is
irreducible and has a stationary distribution π. It is known that
π = πP
with Σ_{j=1}^k πj = 1. Note that it is more convenient to write the state space
as {1, 2, . . . , k}.
From the experience of the last example, it seems reasonable to believe that
the arrivals to service station j again form a Poisson process with some rate, say
λm(j). If so, we must have
λm = λm P.
As the solution to the above type of equation is unique up to a scalar, we
must have
λm = ‖λm‖ π,
where ‖λm‖ = Σj λm(j). We may interpret ‖λm‖ as the average service
completion rate of the entire system; it is the system throughput rate.
I would like to add that these arguments are valid only under the exponential
service time assumption, with service rate μj at station j. We must also
assume that the service stations are independent of each other. Our next question
is: how do these m customers distribute themselves among the k servers?
Let Y(t) = (n1, n2, . . . , nk) be the vector whose jth component equals the
number of customers at the jth station at time t. Then {Y(t) : t ≥ 0} is a
continuous-time Markov chain (with vector-valued states). Let
Pm(n1, n2, . . . , nk) = lim_{t→∞} P[Y(t) = (n1, n2, . . . , nk)].


It can be shown that, if it exists,
Pm(n1, n2, . . . , nk) = Km ∏_{j=1}^k (λm(j)/μj)^{nj}
for all possible vectors (n1, n2, . . . , nk), where Km is a normalizing constant.
Due to the relationship between πj and λm(j), we have
Pm(n1, n2, . . . , nk) = Cm ∏_{j=1}^k (πj/μj)^{nj}.
Note that the normalizing constant Km becomes Cm now.


At first sight, one might claim this is a multinomial probability function.
It looks like one and fits intuition. Unfortunately, it is not. The main
difference is that the individual terms do not have a multinomial coefficient.
Because of this, even if the πj and μj are given, it is still hard to determine Cm
when m is large. There are m + k − 1 choose m possible vectors (n1, n2, . . . , nk)
such that Σ nj = m.
Even without knowing what Cm equals numerically, we can still learn
a lot from the above expression. Consider the moment when a particular
customer has just completed service i and will enter service j. What is the
probability that s/he will see (m1, m2, . . . , mk) customers at the k stations? Note
that we have Σ_{j=1}^k mj = m − 1.

P(seeing (m1, m2, . . . , mk) | i → j)
  = P(Y(t) = (m1, m2, . . . , mi + 1, . . . , mk), i → j) / P(i → j)
  = P(Y(t) = (m1, m2, . . . , mi + 1, . . . , mk)) μi Pij
    / Σ P(Y(t) = (n1, n2, . . . , ni + 1, . . . , nk)) μi Pij
  = K (πi/μi) ∏_{j=1}^k (πj/μj)^{mj}
  = C ∏_{j=1}^k (πj/μj)^{mj}.


Since C is a normalizing constant, we find that this conditional probability
function is the same as Pm−1. Hence, we claim:
Theorem 9.1 (The arrival theorem)
In the closed network system with m customers, the system as seen by arrivals
to server j is distributed as the stationary distribution of the same network
system when there are only m − 1 customers.

That is, this customer may pretend that s/he is an observer from outside.
Let Lm(j) and Wm(j) be the average number of customers and the average
time a customer spends at server j when there are m customers in the
network. Upon conditioning on the number nj of customers found at server j
by an arrival to that server, it follows that
Wm(j) = {1 + E[nj]}/μj = {1 + Lm−1(j)}/μj.
Replacing E[nj] by Lm−1(j) in the last equality is based on the arrival
theorem. (Sorry that we have used lower case for the random variable and
upper case for the expected value here.)
In addition, since λm−1(j) = λm−1 πj, the cost equation implies
Lm−1(j) = λm−1 πj Wm−1(j).
Substituting back into Wm(j), we get
Wm(j) = {1 + λm−1 πj Wm−1(j)}/μj.
Since Σj Lm−1(j) = m − 1, we obtain
m − 1 = λm−1 Σj πj Wm−1(j),
or
λm−1 = (m − 1) / Σj πj Wm−1(j).
These manipulations result in
Wm(j) = 1/μj + (m − 1) πj Wm−1(j) / {μj Σi πi Wm−1(i)}.


After so much work, we may rightfully ask: so what? Note that W1(j) =
1/μj, which is very easy to calculate. The above relationship enables us to
obtain W2(j), and in general from Wm−1(j) we can easily get Wm(j). Thus, we
can compute Wm(j) iteratively. The cost equation then makes it possible to
calculate all the Lm(j).
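The iteration is easy to implement. A sketch (the two-station rates and routing probabilities are invented for illustration):

```python
def mva(mu, pi, m):
    """Iteratively compute W_m(j) for the closed network: mu[j] is the
    service rate and pi[j] the stationary routing probability of station j."""
    k = len(mu)
    W = [1.0 / mu[j] for j in range(k)]       # W_1(j) = 1/mu_j
    for n in range(2, m + 1):                 # build W_n from W_{n-1}
        denom = sum(pi[i] * W[i] for i in range(k))
        W = [1.0 / mu[j] + (n - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]
    return W

mu, pi, m = [2.0, 1.0], [0.5, 0.5], 4
W = mva(mu, pi, m)
lam_m = m / sum(pi[j] * W[j] for j in range(2))    # throughput factor
L = [lam_m * pi[j] * W[j] for j in range(2)]       # cost equation
print([round(w, 3) for w in W], round(sum(L), 6))  # sum(L) equals m = 4
```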

9.6 Problems

1. Consider a single-server bank for which customers arrive in accordance
with a Poisson process with rate λ. A customer will enter the bank only
if the server is free when he arrives. If the service time of a customer
has distribution G, what proportion of time is the server busy?
2. Consider the following queueing system. Customers arrive in a Poisson
process at rate λ > 0 and are served, in order of arrival, by a single
server. Service times are independent; however, they are not identically
distributed, since it has been observed that the server works more
quickly when there are a number of customers waiting in the queue.
To model this phenomenon of state-dependent service times, assume that
when there are j customers in the system the server provides exponential
service at rate jμ, j = 1, 2, . . ..
(a) Show that {πn}, the equilibrium probability distribution for the
number of customers in the system (including the one being served), is
Poisson with mean ρ = λ/μ.
(b) Let W be the equilibrium waiting time for a customer who joins the
queue, and suppose that W has pdf w(x) with corresponding Laplace
transform w∗(s). If Π(z) = Σ_{n=0}^∞ πn z^n, show that
(i) Π(z) = w∗(λ − λz);
(ii) w∗(s) = e^{−s/μ}.
(c) Using the results of (b) or otherwise, find E(W).

Chapter 10
Renewal Process

In the Poisson process model, the inter-arrival times are assumed to be
independent and identically distributed exponential random variables. We
now seek to relax this requirement slightly.
Definition 10.1
Let X1, X2, . . . be a sequence of independent and identically distributed
non-negative random variables. Define
N(t) = max{n : Σ_{i=1}^n Xi ≤ t}
for all t ≥ 0. Then {N(t) : t ≥ 0} is called a renewal process.


Compared to the Poisson process, the inter-arrival times no longer have
to have exponential distribution for the renewal process. Thus, the renewal
process losses the Markov or memoryless property. If it has been a while
since the occurrence of the last event, the waiting time for the next event
from the moment may be substantially shorter than the usual waiting time.
At the same time, if an event has just occurred, the waiting time for the
next event has the same distribution no matter how often events occurred
before the time of last occurrence. In this sense, the process renews itself at
the moment when an event occurs. We may now link this process with the
concepts of renewal events discussed before.

Let us define S0 = 0 and Sn = Σ_{i=1}^n Xi for n ≥ 1. Assume that
P(X1 = 0) < 1.
Let μ = E[X1]. Obviously, μ > 0. We did not really pay attention to whether
N(t) is well behaved for each t. Is it possible that N(t) < 200, say, no
matter how large t is?
It turns out that this cannot happen. According to the strong law of
large numbers, we have
Sn/n → μ
almost surely as n → ∞. It is hence true that Sn ≈ nμ for large n. Thus, as
n increases to infinity, Sn also increases to infinity almost surely. By the
definition of N(t), we easily see that
P(N(t) < ∞) = 1
for all t ≥ 0. At the same time, lim_{t→∞} N(t) = ∞ almost surely.

10.1 Distribution of N(t)

For each given t, N(t) is a random variable. What is its distribution? The
answer is usually not available unless the distribution of X1 has a convenient
form. Some discussion is still possible.
Note that the event {N(t) ≥ n} is the same as {Sn ≤ t}. Thus, it is seen that
P{N(t) = n} = P{N(t) ≥ n} − P{N(t) ≥ n + 1}
            = P{Sn ≤ t} − P{Sn+1 ≤ t}.
Denoting by Fn(t) = P{Sn ≤ t} the convolution of the distributions of
X1, . . . , Xn, we have the expression
P{N(t) = n} = Fn(t) − Fn+1(t).
As indicated earlier, this expression does not usually provide a practical means
of computing the distribution of N(t).


Example 10.1
Suppose that in a renewal process the inter-arrival times X1, X2, . . . are
uniformly distributed on the unit interval [0, 1]. Then for 0 ≤ t ≤ 1,
Fn(t) = t^n/n!
for n = 1, 2, . . .. However, the expression for t > 1 is usually very complex.

Example 10.2
Suppose that in a renewal process the inter-arrival times X1, X2, . . . are
discretely uniform on the integers {0, 1, 2, 3}. Then the expressions for F1(t)
and F2(t) are easy to obtain:
F1(i) = (i + 1)/4,   i = 0, 1, 2, 3.
The probability mass function f2(t) and the cumulative distribution function
F2(t) are given by

t        0  1  2  3   4   5   6
16f2(t)  1  2  3  4   3   2   1
16F2(t)  1  3  6  10  13  15  16
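The f2 row of the table is just the convolution of the uniform pmf with itself, which a few lines verify:

```python
def convolve_pmf(f, g):
    """Convolution of two pmfs on {0, 1, ...} given as lists."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

f = [0.25] * 4                      # X uniform on {0, 1, 2, 3}
f2 = convolve_pmf(f, f)             # pmf of S2 = X1 + X2
F2, c = [], 0.0
for p in f2:                        # cumulative distribution of S2
    c += p
    F2.append(c)
print([round(16 * p) for p in f2])  # [1, 2, 3, 4, 3, 2, 1]
print([round(16 * v) for v in F2])  # [1, 3, 6, 10, 13, 15, 16]
```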

It is also possible to find examples where there is a simple expression for
the distribution of N(t). Other than the standard special case of the Poisson
process, we have the following example.
Example 10.3
Consider the renewal process whose inter-arrival times have the geometric
distribution
P(X = i) = p(1 − p)^{i−1},   i ≥ 1.


It is seen that
P(Sn = k) = C(k − 1, n − 1) p^n (1 − p)^{k−n},   k ≥ n,
where C(a, b) denotes the binomial coefficient. Thus, we have
P(N(t) = n) = Σ_{k=n}^{[t]} C(k − 1, n − 1) p^n (1 − p)^{k−n}
            − Σ_{k=n+1}^{[t]} C(k − 1, n) p^{n+1} (1 − p)^{k−n−1}.

Let us now consider the problem of computing the expectation of N(t), the
mean-value function
m(t) = E[N(t)].
This function is also called the renewal function.
Recall the expectation formula derived for non-negative integer-valued
random variables:
m(t) = Σ_{n=1}^∞ P{N(t) ≥ n} = Σ_{n=1}^∞ Fn(t).

It can be shown (by using characteristic functions) that the renewal function
and the inter-arrival distribution uniquely determine each other. Thus, if the
inter-arrival time distribution is exponential with rate λ = 2, then we find
m(t) = λt = 2t. Conversely, if the renewal function is
m(t) = λt = 2t,
we know that {N(t) : t ≥ 0} is a Poisson process with rate λ = 2.
One mathematical problem is the finiteness of m(t) for any given t. Suppose
P(X1 > 0) > 0. It can be shown that for any given t, Fn(t) decreases
at an exponential rate as n becomes large. Thus, m(t) is always finite when
P(X1 > 0) > 0.
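The identification m(t) = λt for the exponential case can be checked by simulation; the seed, rate and sample sizes below are arbitrary choices:

```python
import random

def renewal_count(t, draw):
    """N(t): number of renewals by time t; draw() samples one inter-arrival."""
    n, s = 0, draw()
    while s <= t:
        n += 1
        s += draw()
    return n

random.seed(0)
lam, t, reps = 2.0, 5.0, 20_000
m_hat = sum(renewal_count(t, lambda: random.expovariate(lam))
            for _ in range(reps)) / reps
print(round(m_hat, 1))   # close to lam * t = 10
```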
The relationship between m(t) and the distribution of the inter-arrival
time is made explicit in the following theorem.


Theorem 10.1
Let m(t) be the renewal function of the renewal process {N(t) : t ≥ 0} and
F(t) be the distribution of the inter-arrival time. Assume that F(0) < 1.
Then
m(t) = F(t) + ∫_0^t m(t − x) dF(x).

Proof: We have
m(t) = E{E[N(t)|X1]}
     = ∫_0^∞ E[N(t)|X1 = x] dF(x)
     = ∫_0^t [1 + m(t − x)] dF(x)
     = F(t) + ∫_0^t m(t − x) dF(x),
since E[N(t)|X1 = x] = 0 when x > t.

10.2 Limiting Theorems and Their Applications

Theorem 10.2
Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time
X1 has non-zero expectation μ. Then
N(t)/t → 1/μ
almost surely as t → ∞.
Proof: Let Sn be the occurrence time of the nth event as before. By the
definition of the renewal process, we have
S_{N(t)} ≤ t < S_{N(t)+1},
which implies
S_{N(t)}/N(t) ≤ t/N(t) < S_{N(t)+1}/N(t).


By the law of large numbers, we have
Sn/n → μ
almost surely. Since N(t) → ∞ almost surely when t → ∞, we have
S_{N(t)}/N(t) → μ,
and
S_{N(t)+1}/N(t) = [S_{N(t)+1}/(N(t) + 1)] · [1 + 1/N(t)] → μ.
Thus, t/N(t) → μ almost surely, and we have the result.
The elementary renewal theorem is as follows.

Theorem 10.3
Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time
X1 has non-zero expectation μ. Then, the renewal function satisfies
m(t)/t → 1/μ
as t → ∞.

We do not provide a proof here. It should be noted that this result cannot
be obtained directly from the last theorem.
If the renewal theorem is assumed, the limiting probabilities of a discrete-time
Markov chain can be derived as follows.
Example 10.4
Let {Xn : n = 0, 1, . . .} be a discrete-time Markov chain. Assume that it is
irreducible, aperiodic and positive recurrent. Let the state space be denoted
by S = {0, 1, . . .}.
Consider the case when X0 = i for some i. Define Tk to be the kth
inter-arrival time between visits of the Markov chain to state i. Thus, we can
define a renewal process Ni(t) as the number of times state i is visited by time t.


By the renewal theorem, the long-run proportion of times state i is visited
is given by
lim_{n→∞} Ni(n)/n = 1/μi,
where μi = E[T1]. That is, πi = 1/μi.
Example 10.5
Let {N(t) : t ≥ 0} be a renewal process and X1, X2, . . . be the inter-arrival
times. Let μ = E[X1] > 0. For any given n, the event {N(t) + 1 = n} implies
that the (n − 1)th event has occurred by time t but the nth event has not
occurred yet. In other words, we know that
Σ_{i=1}^{n−1} Xi ≤ t < Σ_{i=1}^{n} Xi.

Consequently, it has nothing to do with the values of Xn+1, Xn+2, . . .. A
random variable T = N(t) + 1, with the property that {T = n} is independent
of the future outcomes Xn+1, Xn+2, . . ., is called a stopping time. It can be
shown that for a stopping time,
E[Σ_{i=1}^{T} Xi] = E[T]E[X].
In our case, we have
E[Σ_{i=1}^{N(t)+1} Xi] = E[N(t) + 1]E[X].

It turns out that N (t) is not a stopping time and the above formula is
not applicable to N (t).
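A small simulation makes the distinction concrete: the identity holds for the stopping time N(t) + 1 but visibly fails for N(t). All parameters are assumed sample values:

```python
import random

random.seed(1)
mean_x, t, reps = 0.5, 3.0, 20_000   # exponential inter-arrivals, mean 0.5

sum_T = sum_SN1 = sum_SN = 0.0
for _ in range(reps):
    s, last, n = 0.0, 0.0, 0
    while s <= t:                    # draw X_1, X_2, ... until S_n > t
        last = random.expovariate(1 / mean_x)
        s += last
        n += 1
    # now s = S_n with n = N(t) + 1, and s - last = S_{N(t)}
    sum_T += n
    sum_SN1 += s
    sum_SN += s - last

print(round(sum_SN1 / reps, 2), round(sum_T / reps * mean_x, 2))       # nearly equal
print(round(sum_SN / reps, 2), round((sum_T / reps - 1) * mean_x, 2))  # clearly differ
```

The first pair estimates E[S_{N(t)+1}] and E[N(t) + 1]E[X]; the second pair shows that the same identity fails when T = N(t).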

10.3 Problems

1. Suppose that the inter-arrival distribution for a renewal process is Poisson
with mean λ, shifted up by one. That is, suppose
P(Xn = k) = λ^{k−1} e^{−λ}/(k − 1)!,   k = 1, 2, . . . .



(a) Find the distribution of Sn.
(b) Calculate P(N(t) ≥ n).
(c) Find m(t) = E[N(t)] (not necessarily in a closed form).

2. Mr. Smith works on a temporary basis. The mean length of each job
he gets is three months. If the amount of time he spends between jobs
is exponentially distributed with mean 2 months, then at what rate does
Mr. Smith get new jobs?
3. Each time a machine is repaired, it remains up for an exponentially
distributed time with rate λ. It then fails, and its failure is one of
two types. If it is a type 1 failure, then the time to repair the machine
is exponential with rate μ1; if it is a type 2 failure, then the repair time
is exponential with rate μ2. Each failure is, independently of the time
it took the machine to fail, a type 1 failure with probability p and a
type 2 failure with probability 1 − p. What proportion of time is the
machine down due to a type 1 failure? What proportion of time is the
machine down due to a type 2 failure? What proportion of time is it
up?
4. A machine in use is replaced by a new machine either when it fails
or when it reaches the age of T years. If the lifetimes of successive
machines are independent with a common distribution F having density
f, show that
(a) the long-run rate at which machines are replaced equals
[∫_0^T x f(x) dx + T(1 − F(T))]^{−1};
(b) the long-run rate at which machines in use fail equals
F(T) / [∫_0^T x f(x) dx + T(1 − F(T))].

5. Machines in a factory break down at an exponential rate of six per hour.
There is a single repairman who fixes machines at an exponential rate
of eight per hour. The cost incurred in lost production when machines
are out of service is $10 per hour per machine. What is the average
cost rate incurred due to failed machines?
6. The manager of a market can hire either Mary or Alice. Mary, who
gives service at an exponential rate of 20 customers per hour, can be
hired at a rate of $3 per hour. Alice, who gives service at an exponential
rate of 30 customers per hour, can be hired at a rate of $C per hour.
The manager estimates that, on average, each customer's time is
worth $1 per hour and should be accounted for in the model. If customers
arrive at a Poisson rate of 10 per hour, then
(a) what is the average cost per hour if Mary is hired? If Alice is hired?
(b) find C if the average cost per hour is the same for Mary and Alice.
7. Consider a renewal process {N(t), t ≥ 0} having a gamma(r, λ) inter-arrival
distribution. That is, the inter-arrival density is
f(x) = λ e^{−λx} (λx)^{r−1} / (r − 1)!,   x > 0.
(a) Show that
P{N(t) ≥ n} = Σ_{i=nr}^∞ e^{−λt} (λt)^i / i!.
(b) Use (a) to show that
m(t) = Σ_{i=r}^∞ [i/r] e^{−λt} (λt)^i / i!,
where [i/r] is the largest integer less than or equal to i/r.


Chapter 11
Sample Exam Papers

11.1 Quiz 1: Winter 2003

1. [4] Using only the axioms of probability, show that if A and B are
two events such that A ⊂ B, then
P(A) ≤ P(B).
2. [2] Two independent random variables X and Y have probability mass
functions
P(X = k) = 1/3,   k = 0, 1, 2,
and
P(Y = k) = (1/2)^{k+1},   k = 0, 1, 2, . . . .
That is, X has the uniform distribution on {0, 1, 2}, and Y has a geometric
distribution.
[3] (a) Find the probability generating function of X.
[3] (b) Find a closed-form expression for the probability generating
function of Y. (This means that leaving it as a summation is not
enough.)
[3] (c) Find the probability generating function of XY.

3. [2] Let X be a random variable with probability function given by the
following table:
x  -2  -1   0   1   2
p  .3  .2  .1  .2  .2
Let Y = (X + 1)^2.
[3] (a) Tabulate the probability function of Y.
[3] (b) Tabulate the conditional probability function of X given Y = 0.
[3] (c) Tabulate E(X|Y).
[3] (d) Compute Var[E(X|Y)].
4. [3] The number of claims received at an insurance company during a
week is a random variable with mean 20 and variance 120. The amount
paid for each claim is a random variable with mean 350 and variance
10000. Assume that the amounts of different claims are independent.
(a) [4] Suppose this company received exactly 3 claims in a particular
week. The amount of each claim is still random, as already specified.
What are the mean and variance of the total amount paid for these 3
claims in this week?
(b) [4] Assume that in one week, all claims received the same payment
of 300. What are the mean and variance of the total amount paid in
this week?
(c) [4] What are the mean and variance of the total amount paid for
claims in an ordinary week?
5. [3] A secretary puts n letters into n envelopes randomly. Let An be the
event that at least one letter is in the correct envelope.
[4] (a) Show that for all n = 1, 2, . . .,
pn = 1 − P(An) = 1 + (−1)/1! + (−1)^2/2! + ··· + (−1)^n/n!.
Hint: for each given n, define Bi = the event that the ith letter is in
the ith envelope, i = 1, 2, . . . , n. Then An = B1 ∪ B2 ∪ ··· ∪ Bn.

[4] (b) Define p0 = 0, and find the generating function of {pn}_{n=0}^∞.
Hint: obtain a difference equation first.

11.2 Quiz 2: Winter 2003

1. [3] State the definition of a (discrete-time) Markov chain.
2. [4] State the definitions of the concepts of transient, positive recurrent
and null recurrent for a renewal event.
3. Assume the coding sequence of a DNA sequence in a region without
genes can be modeled as a random sample of the symbols A, G, T, C with
corresponding probabilities
PA = 0.2, PG = 0.2, PT = 0.4, PC = 0.2.
Assume X0 = T.
[3] (a) Let
un = P(TAT occurs at trial n) = P(Xn−2 = T, Xn−1 = A, Xn = T | X0 = T).
Show that un = 0.032 for n ≥ 3. Find the values of un for n = 0, 1, 2.
[3] (b) Obtain the generating function of the sequence un.
[2] (c) Is TAT a renewal or delayed renewal event? Give a one-sentence
justification.
[3] (d) Show that TAT is recurrent.
[3] (e) Using the renewal theorem, compute the mean inter-occurrence
time for TAT after its first occurrence.
4. Assume {Xn}_{n=0}^∞ is a Markov chain with transition probability matrix

        0    0.3  0.2  0.5
P =     0.3  0    0.5  0.2
        0    0    0.4  0.6
        0    0    0.3  0.7



[3] (a) Find the two-step transition probability matrix.
[3] (b) Suppose that the probability function of X_1 is given by the
vector π_1 = (0, 0.5, 0, 0.5). Find the probability function of X_3.
[6] (c) Classify the state space. For each class, determine whether it is
recurrent or transient. Determine their periods.
[2] (d) What does it mean for a MC to be irreducible? Is this MC reducible?
[5] (e) Find the long-run proportions of times when the MC is in state
0, in state 2. (Do not blindly solve π = πP.)
[3] (f) Calculate lim_{n→∞} E[X_n].
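Parts (a) and (b) are matrix computations, so they can be verified numerically; the sketch below (an aid, not an official solution) uses NumPy.

```python
import numpy as np

P = np.array([
    [0.0, 0.3, 0.2, 0.5],
    [0.3, 0.0, 0.5, 0.2],
    [0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 0.3, 0.7],
])

P2 = P @ P                       # two-step transition matrix, part (a)
pi1 = np.array([0.0, 0.5, 0.0, 0.5])
pi3 = pi1 @ P2                   # probability function of X_3, part (b)

# Sanity checks: P2 is stochastic and pi3 is a probability vector.
assert np.allclose(P2.sum(axis=1), 1.0)
assert np.isclose(pi3.sum(), 1.0)
```

Raising P to a high power (e.g. `np.linalg.matrix_power(P, 100)`) also gives a numerical hint for the long-run behaviour asked about later, since probability mass drains out of the transient states.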

5. Let {Z_n, n ≥ 0} be a usual branching process with Z_0 = 1 and
Z_n = Σ_{j=1}^{Z_{n-1}} X_{n-1,j} for n > 0, with family sizes X_{n,j} being iid random variables.
Assume X_{0,1} has the discrete uniform distribution on 0, 1, . . . , k for some
positive integer k.
For example, if k = 3, then P(X_{0,1} = j) = 0.25 for j = 0, 1, 2, 3.
[2] (a) For what values of k is the probability of extinction equal to 1?
[4] (b) When k = 3, compute the probability of extinction.
[4] (c) When k = 5, calculate the mean and variance of Z_5.
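For part (b), the extinction probability is the smallest fixed point in [0, 1] of the offspring pgf G(s) = (1 + s + · · · + s^k)/(k + 1); the iteration below (a numerical sketch, not the official solution) finds it.

```python
def extinction_prob(k, tol=1e-12):
    # Offspring pgf of the uniform family size on {0, 1, ..., k}.
    def G(s):
        return sum(s ** j for j in range(k + 1)) / (k + 1)

    # Iterating s -> G(s) from s = 0 converges monotonically to the
    # smallest fixed point of G in [0, 1], the extinction probability.
    s = 0.0
    while abs(G(s) - s) > tol:
        s = G(s)
    return s

# The mean family size is k/2, so extinction is certain exactly when k/2 <= 1.
```

For k = 3 the iteration converges to √2 − 1 ≈ 0.414, the root in (0, 1) of s³ + s² − 3s + 1 = 0 obtained from G(s) = s.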
6. In a more complex random walk, Z_1, Z_2, . . . are independent and identically distributed random variables with

P(Z_1 = 1) = p,   P(Z_1 = 0) = r   and   P(Z_1 = -2) = q,   (11.1)

such that p + r + q = 1 and all of p, r, q are non-zero. As usual,
X_n = Σ_{i=1}^{n} Z_i for n ≥ 1, with X_0 = 0.

Define (as usual)

π_n^{(r)} = P(X_n = r, X_{n-1} < r, X_{n-2} < r, . . . , X_2 < r, X_1 < r | X_0 = 0)

for r > 0.

You are given that the corresponding generating functions satisfy

Π^{(r)}(s) = [Π(s)]^r.

[4] (a) Show that Π(s) satisfies the equation

q s [Π(s)]^3 + (rs - 1) Π(s) + p s = 0.
[4] (b) When p = 0.25, q = 0.25 and r = 0.5, find the probability that
X_n = 1 will ever occur for some n ≥ 1.
[bonus 5] (c) Show that when p = 2q > 0, X_n = 0 is a recurrent renewal
event (concentrate on the recurrence part).
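For part (b), the probability that X_n = 1 ever occurs is Π(1), which by part (a) is a root of a cubic. The sketch below solves that cubic numerically, assuming the standard first-passage argument that Π(1) is the smallest root in [0, 1]; it is an illustration, not the official solution.

```python
import numpy as np

p, q, r = 0.25, 0.25, 0.5
s = 1.0

# Part (a): q*s*x^3 + (r*s - 1)*x + p*s = 0 with x = Pi(s); no x^2 term.
roots = np.roots([q * s, 0.0, r * s - 1.0, p * s])
real_roots = sorted(x.real for x in roots if abs(x.imag) < 1e-9)

# Pi(1) is the smallest root lying in [0, 1].
ever_reach_1 = min(x for x in real_roots if 0.0 <= x <= 1.0)
```

With these parameters the cubic reduces to x³ − 2x + 1 = 0, whose roots are 1 and (−1 ± √5)/2, so the relevant root is (√5 − 1)/2 ≈ 0.618.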

11.3 Final Exam: Winter 2003

1. Let X_n, n = 1, 2, . . . be a sequence of independent and identically
distributed geometric random variables such that X_n (for all n) has
probability mass function
f(k) = P(X_n = k) = p(1 - p)^k, k = 0, 1, 2, . . .
for some parameter p ∈ (0, 1).
(a) [5] Find the probability generating function of X_n.
(b) [5] Find the probability generating function of X_1 + X_3.
(c) [5] Let N be a Poisson random variable with mean λ, independent of
X_1, X_2, . . .. Find the probability generating function of T_N = Σ_{i=1}^{2N+1} X_i.
(d) [5] Compute E(T_N) and Var(T_N), where T_N is defined as in (c).
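As a quick check of part (a): the pgf of a geometric variable with mass p(1 − p)^k is p/(1 − (1 − p)s) for |s| < 1/(1 − p). The snippet below compares a truncated series against that closed form (an illustration, not the exam solution).

```python
def pgf_truncated(p, s, terms=5_000):
    # Partial sum of E[s^X] = sum_k p (1 - p)^k s^k.
    return sum(p * ((1 - p) * s) ** k for k in range(terms))

def pgf_closed(p, s):
    # Closed form of the geometric pgf.
    return p / (1 - (1 - p) * s)

assert abs(pgf_truncated(0.3, 0.7) - pgf_closed(0.3, 0.7)) < 1e-10
```

By independence, the pgf in part (b) is the square of this expression, which the same truncated-sum check also confirms.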
2. Suppose 4 balls are placed into two urns A and B. On each day, one
ball is selected, each of the four balls being equally likely to be selected,
and the selected ball is then placed into the other urn.
Let X_n be the number of balls in urn A on the nth day, and Y_n the
number of balls in urn A on the 2nth day, for n = 0, 1, 2, . . ..


a) [6] Are {X_n, n ≥ 0} and {Y_n, n ≥ 0} Markov chains? If any of them are,
write down their state spaces and transition matrices and do the usual
classification.

b) [4] Given X_0 = 3, find the probability function of X_2.

c) [4] In the long run, what proportion of the time is at least one urn
empty?
d) [6] Given X_0 = k, calculate the probability that the number of balls
in urn A reaches 0 before the number of balls in urn B reaches 0, for
k = 0, 1, 2, 3 and 4.
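The chain {X_n} here is an Ehrenfest-type urn chain on states 0, . . . , 4. The sketch below (a numerical aid, not the official solution) builds its transition matrix and checks that the Binomial(4, 1/2) distribution is stationary.

```python
import numpy as np
from math import comb

# From state k, the selected ball is in urn A with probability k/4
# (move to k-1) and in urn B with probability (4-k)/4 (move to k+1).
P = np.zeros((5, 5))
for k in range(5):
    if k > 0:
        P[k, k - 1] = k / 4
    if k < 4:
        P[k, k + 1] = (4 - k) / 4

pi = np.array([comb(4, k) / 16 for k in range(5)])  # Binomial(4, 1/2)
assert np.allclose(pi @ P, pi)

x2_given_x0_3 = (P @ P)[3]   # part b): probability function of X_2 given X_0 = 3
```

Note the chain has period 2, so powers of P do not converge; the stationary π still gives the long-run proportions of time asked about in part c).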
3. A student has a practically infinite number of assignment problems to
work on at the moment. The time it takes to solve a problem is random
with an exponential distribution whose mean is 10 minutes. The
probability that her solution is correct is 80%. Assume the solving times
of the problems and the correctness of the answers are all independent.
The worth of each correct answer is random with uniform distribution
on 1, 2, 3 marks. (No partial marks for wrong answers, for simplicity.)
(a) [4] What is the probability that she solved exactly 10 problems in
an hour?
(b) [4] What is the probability that she used more than 1 hour to solve
the first 3 problems?
(c) [4] If she solved 10 problems in an hour, calculate the probability
that she got at least 9 of them correct.
(d) [4] Suppose she solved 9 problems correctly in one hour. Given this,
what is her expected number of problems solved in the same period?
(e) [4] Suppose she worked on the assignment for one hour and handed in
whatever she completed in that hour. Let T be the total mark of her
hand-in. Calculate the mean and variance of T.
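Since solving times are exponential with mean 10 minutes, completions form a Poisson process with rate 6 per hour, and parts (a) and (b) reduce to Poisson probabilities; the sketch below is illustrative, not the official solution.

```python
from math import exp, factorial

rate = 6.0  # completions per hour, since the mean solving time is 10 minutes

def pois_pmf(k, mu):
    # P(N = k) for N ~ Poisson(mu).
    return exp(-mu) * mu ** k / factorial(k)

# (a) exactly 10 problems solved in one hour.
p_a = pois_pmf(10, rate)

# (b) the first 3 problems take more than an hour
#     <=> at most 2 completions occur in [0, 1].
p_b = sum(pois_pmf(k, rate) for k in range(3))
```

Part (b) uses the equivalence between the Gamma waiting time to the 3rd event and the Poisson count in a fixed interval.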
4. A professor has access to two computer servers, and he has a computing
job to be done.


Assume that the time it takes to complete a job is random with an exponential
distribution, independent of everything else, for both servers. The
rates are 3 per hour and 2 per hour for Servers 1 and 2 respectively.
Let X_1(t) and X_2(t) be the numbers of jobs in the queues for Servers
1 and 2 respectively at time t. The jobs in the queues do not switch
between servers even if the other machine is sometimes idle.
The professor submitted the same job to both servers at time t = 0.
His job is done as soon as one of two servers completes it.
Suppose X1 (0) = 1 and X2 (0) = 1.
[5] (1) What is the probability that Server 1 will start work on his job
before Server 2?
[5] (2) What is the probability that he has to wait for 0.5 hours or
longer before any server starts working on his job?
[5] (3) What is the probability that his job is completed by Server 1
before Server 2 starts working on this job?
[5] (4) What is the probability that he has to wait at least 2 hours
before the job is done?
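Parts (1)-(3) are races between independent exponential clocks, and by memorylessness each race restarts afresh. The sketch below records one possible line of reasoning numerically; it is an aid, not the official solution.

```python
import math

rate1, rate2 = 3.0, 2.0   # completion rates per hour for Servers 1 and 2

# (1) Server 1's current job finishes before Server 2's current job:
# an exponential race is won with probability rate1 / (rate1 + rate2).
p1 = rate1 / (rate1 + rate2)

# (2) Neither current job finishes within 0.5 hours: the minimum of the
# two exponentials is Exp(rate1 + rate2), so this is its tail at 0.5.
p2 = math.exp(-(rate1 + rate2) * 0.5)

# (3) Server 1 finishes its current job and then the professor's job,
# all before Server 2 finishes its current job: by memorylessness this
# is two independent races in a row, each won with probability p1.
p3 = (rate1 / (rate1 + rate2)) ** 2
```

Part (4) needs the distribution of the minimum of the two (delayed) completion times and is left to the usual Gamma/competing-exponential computation.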
5. A closed population has N individuals. Assume the number of flu
cases can be modeled by a birth and death process. Let X(t) be the
number of flu cases in this population at time t. The birth rate is λ_k =
λ(k + 1)(N - k) and the death rate (not the death of the individual,
but the death of the flu) is μ_k = μk^2 when X(t) = k, k = 0, 1, . . . , N.
(a) [5] Given X(0) = 0, what are the expected waiting times until X(t) =
1 and until X(t) = 2?
(b) [5] Given X(0) = k for some 0 < k < N, what is the probability
that after the next transition there will be one more case rather than
one fewer case?
(c) [5] Assume λ = 1, μ = 9 and N = 5. In the long run, what is the
proportion of time when there are no flu cases in the population?
(d) [5] Assume λ = 1, μ = 9 and N = 5. What is the average number
of flu cases at any moment in the long run?



(e) [bonus 2] Answer (c) for a general N .
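For parts (c) and (d), the stationary distribution of a birth and death process satisfies π_k ∝ ∏_{i=1}^{k} λ_{i−1}/μ_i. The sketch below (a numerical aid, not the official solution) evaluates it for λ = 1, μ = 9, N = 5.

```python
# Birth and death rates: lambda_k = lam*(k+1)*(N-k), mu_k = mu*k**2.
lam, mu, N = 1.0, 9.0, 5

w = [1.0]                            # unnormalized stationary weights, w_0 = 1
for k in range(1, N + 1):
    birth = lam * k * (N - k + 1)    # lambda_{k-1}
    death = mu * k ** 2              # mu_k
    w.append(w[-1] * birth / death)

total = sum(w)
pi = [x / total for x in w]

p_no_flu = pi[0]                                     # part (c)
mean_cases = sum(k * pi[k] for k in range(N + 1))    # part (d)
```

The products telescope to w_k = C(N, k)(λ/μ)^k, i.e. a Binomial(N, λ/(λ + μ)) stationary law, which is a useful observation for the bonus part (e).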
