Jiahua Chen
Department of Statistics and Actuarial Science
University of Waterloo
© Jiahua Chen
Fall, 2003
Course Outline
Stat333
Review of basic probability. Generating functions and their applications. Simple
random walk, branching process and renewal events. Discrete time Markov chain.
Poisson process and continuous time Markov chain. Queueing theory and renewal
processes.
Contents

1 Introduction
  1.1 Probability Model
  1.2 Conditional Probabilities and Independence
  1.3 Bayes Formula
  1.4 Key Facts
  1.5 Problems

2 Random Variables
  2.1 Random Variable
  2.2 Discrete Random Variables
  2.3 Continuous Random Variables
  2.4 Expectations
  2.5 Joint Distribution
  2.6 Independence
  2.7 Formulas for Expectations
  2.8 Key Results and Concepts
  2.9 Problems

3 Conditional Expectation
  3.1 Introduction
  3.2 Formulas
  3.3 Comment
  3.4 Problems

4 Generating Functions
  4.1 Introduction
  4.2 Probability Generating Functions
  4.3 Convolution
      4.3.1 Key Facts
  4.4 The Simple Random Walk
      4.4.1 First Passage Times
      4.4.2 Returns to Origin
      4.4.3 Some Key Results in the Simple Random Walk
  4.5 The Branching Process
      4.5.1 Mean and Variance of Zn
      4.5.2 Probability of Extinction
      4.5.3 Some Key Results in the Branching Process
  4.6 Problems

5 Renewal Events
  5.1 Introduction
  5.2 The Renewal and Lifetime Sequences
  5.3 Some Properties
  5.4 Delayed Renewal Events
  5.5 Summary
  5.6 Problems

6 Discrete Time MC
  6.1 Introduction
  6.2 Chapman-Kolmogorov Equations
  6.3 Classification of States
  6.4 Limiting Probabilities
  6.5 Mean Time Spent in Transient States
  6.6 Problems

7
  7.4
  7.5
  7.6

10 Renewal Process
  10.1 Distribution of N(t)
  10.2 Limiting Theorems and Their Applications
  10.3 Problems

11 Sample Exam Papers
  11.1 Quiz 1: Winter 2003
  11.2 Quiz 2: Winter 2003
  11.3 Final Exam: Winter 2003
Chapter 1
Introduction
1.1 Probability Model
A probability measure P satisfies three axioms:

1. 0 ≤ P(E) ≤ 1 for every event E;

2. P(S) = 1;

3. P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) for any mutually exclusive events E_i, i = 1, 2, . . ., i.e. E_i E_j = ∅ for all i ≠ j.

From the axioms one obtains the inclusion-exclusion formula

P(∪_{i=1}^n E_i) = Σ_i P(E_i) − Σ_{i1<i2} P(E_{i1} E_{i2}) + ⋯ + (−1)^{n+1} P(E_1 E_2 ⋯ E_n).
Example 1.1

At a party, n men throw their hats into the centre of the room. Each man then randomly picks up a hat. What is the probability that nobody gets his own hat? What is the limit of this probability as n → ∞?

Solution: Let A_i be the event that the ith man gets his own hat, for i = 1, 2, . . . , n. The event that nobody gets his own hat is [∪_i A_i]^c.

Note that, by inclusion-exclusion,

P(∪_i A_i) = nP(A_1) − C(n, 2)P(A_1 A_2) + ⋯
           = 1 − 1/2! + 1/3! − ⋯ + (−1)^{n+1} 1/n!.

Hence

P(nobody gets his own hat) = 1 − [1 − 1/2! + 1/3! − ⋯ + (−1)^{n+1} 1/n!]
                           = 1/2! − 1/3! + ⋯ + (−1)^n 1/n! → e^{−1}

as n → ∞.
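The matching probability can be checked numerically. Below is a small sketch (the function names and the brute-force enumeration are my own illustration, not part of the notes): the inclusion-exclusion sum is compared against direct enumeration of permutations, and against the limit e^{−1}.

```python
from itertools import permutations
from math import exp, factorial

def p_no_match(n):
    # Inclusion-exclusion: P(no match) = sum_{k=0}^{n} (-1)^k / k!
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

def p_no_match_brute(n):
    # Brute force: fraction of permutations with no fixed point (derangements).
    perms = list(permutations(range(n)))
    good = sum(all(p[i] != i for i in range(n)) for p in perms)
    return good / len(perms)

for n in (3, 4, 5):
    assert abs(p_no_match(n) - p_no_match_brute(n)) < 1e-12

print(abs(p_no_match(10) - exp(-1)))  # already very small for n = 10
```

The agreement with e^{−1} is excellent even for moderate n, because the series converges factorially fast.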
1.2 Conditional Probabilities and Independence
Let E and F be two events with P(F) > 0. We define the conditional probability of E given F by

P(E|F) = P(EF)/P(F).
As already defined, two events E and F are independent if and only if
P (EF ) = P (E)P (F ). When events E and F are independent, we find
P (E|F ) = P (E)
when P (F ) > 0. However, we should not use this relationship as the definition of independence. When P (F ) = 0, the conditional probability is not
defined, but E and F can still be two independent events.
1.3 Bayes Formula

Let F_1, F_2, . . . be mutually exclusive events whose union is the sample space. Then

P(F_k|E) = P(EF_k)/P(E) = P(E|F_k)P(F_k) / Σ_i P(E|F_i)P(F_i).
The Bayes formula is a mathematical consequence of the definition of conditional probability. However, this formula has generated a lot of thinking in statistics. We could think of E as an event (a subset of the sample space) of some experiment to be done, while the F_i's classify the sample points of the same experiment according to a possibly different rule (than the rule for E). Somehow, E is readily observed, but the F_i's are not. Before the experiment is done, we may have some prior information on what the probabilities of the F_i's are. When the experiment is done and the outcome (the sample point) is known to belong to E, but its membership in the F_i's remains unknown, the Bayes formula allows us to update our assessment of the chance of each F_i in view of the occurrence of E. For example, before we toss a die, the chance of observing 2 is known to be 1/6. After the die is tossed, if you are told that the outcome is an even number, then the conditional probability becomes 1/3.
Here is a less straightforward example.
Example 1.2
There are three coins in a box: 1. two headed; 2. fair; 3. biased with P (H) =
0.75.
When one of the coins is selected at random and flipped, it shows head.
What is the probability that it is the two headed coin?
Solution: Let C_1, C_2 and C_3 represent the events that the two-headed, fair or biased coin is selected, respectively. We want to find P(C_1|H). By the Bayes formula,

P(C_1|H) = P(H|C_1)P(C_1) / Σ_{i=1}^3 P(H|C_i)P(C_i).

The answer is 4/9.
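The arithmetic can be verified directly from the definition of conditional probability. A minimal sketch (the dictionary layout and labels are my own):

```python
# Priors: each coin is chosen with probability 1/3; P(H | coin) for each coin.
priors = {"two-headed": 1/3, "fair": 1/3, "biased": 1/3}
p_head = {"two-headed": 1.0, "fair": 0.5, "biased": 0.75}

# Bayes formula: P(C1 | H) = P(H | C1) P(C1) / sum_i P(H | Ci) P(Ci).
evidence = sum(p_head[c] * priors[c] for c in priors)
posterior = p_head["two-headed"] * priors["two-headed"] / evidence

print(posterior)  # 4/9 = 0.444...
```

Here the denominator is P(H) = (1 + 0.5 + 0.75)/3 = 0.75, so the posterior is (1/3)/0.75 = 4/9, matching the solution above.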
Remark It is not so important to memorize the Bayes formula as it is to understand the definition of conditional probability. Once you understand conditional probability, you can work out the formula easily.
1.4 Key Facts
A probability space consists of three components: the sample space, the collection of events, and the probability measure. The probability measure satisfies three axioms, from which we introduce the concepts of conditional probability and independence. The Bayes theorem is a simple consequence of manipulating the idea of conditional probability; nevertheless, the result has incited philosophical debate in statistics.
1.5 Problems
2. Let S be the sample space of a particular experiment, A and B be events, and P be a probability measure. Which of the following are axioms, which are definitions, and which are formulas?

(i) P(A ∪ B) = P(A) + P(B) − P(AB).

(ii) P(S) = 1.

(iii) P(A|B) = P(AB)/P(B) when P(B) ≠ 0.
3. Using only the axioms of probability, show that

1) P(A ∪ B) = P(A) + P(B) − P(AB);

2) P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) + P(ABC).
4. a) Prove that P(ABC) = P(A|BC)P(B|C)P(C).

b) Prove that if A and B are independent, then so are A^c and B^c.
5. Let A and B be two events.

(a) Show that if A and B are mutually exclusive, they are not necessarily independent.

(b) Find a particular pair of events A and B that are both mutually exclusive and independent.
6. Prove Boole's inequalities:

(a) P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i);

(b) P(∩_{i=1}^n A_i) ≥ 1 − Σ_{i=1}^n P(A_i^c).
Chapter 2
Random Variables
2.1 Random Variable
In practice, we may describe the outcomes of an experiment in any terminology. For example, if Mary and Paul compete in a game, the outcomes can be: Mary wins; Mary loses; it is a draw.

However, it is more convenient in mathematics to code the outcomes by numbers. For example, we may define the outcome as 1 if Mary wins, −1 if Mary loses, and 0 if it is a draw. That is, we can transform the outcomes in S into numbers. There are many ways to transform the outcomes.
In probability theory, we call the mechanism that transforms sample points into numbers a Random Variable. More formally, we define a random variable as a function on the sample space S.
We use capital letters X, Y , and so on for random variables.
In most applications, we focus mainly on the value of the function (the random variable). That is why random variables appear to be numbers, rather than mechanisms for transforming sample points into numbers. As a function, a random variable is totally deterministic; there is nothing random about it. However, the inputs of this function are random, and this implies that the output of the transformation is random. This is how we get the notion that random variables are random.
Example 2.1
Let S be the set of ordered outcomes of rolling two fair dice. Define X to be the sum of the two outcomes. If ω = (2, 5), which is a sample point, then X(ω) = 7. Nothing is random.
2.2 Discrete Random Variables

A random variable is discrete if its set of possible values is countable. For example, if a random variable can only take values in {.2, .5, 2, π}, it is discrete. The discrete random variables most commonly seen in textbooks take integer values. However, we should remember that a discrete random variable can take any values, as long as the number of possible values remains countable.
By the way, the notion of countable needs clarification. If we can find a one-to-one map from a set to a set of integers, then the set is countable. The set of all even numbers is countable. The set of numbers {1, .1, .01, .001, . . .} is also countable. Being countable implies that we can arrange the elements of the set into a sequence. We often represent a countable set of real numbers as {x_1, x_2, . . .}.
If {t_1, t_2, t_3, . . .} is the set of possible values of X, we call the function

f(t_i) = P(X = t_i)

the probability (mass) function (p.m.f.) of X.
Note that in this definition, I used notation ti for possible values of the
random variable X. Although it is a common practice that we use xi s for
possible values of the random variable X, this is not a requirement. It is
very important for us to make a distinction between (the notation of) the
possible values of X, and X itself.
2.3 Continuous Random Variables

A random variable X is (absolutely) continuous if there exists a non-negative function f such that

P(X ∈ A) = ∫_A f(t)dt

for every interval A; f is called the density function of X. For comparison, two standard discrete examples are the Poisson probability function

λ^i exp(−λ)/i!,  i = 0, 1, 2, . . . ,

and the binomial probability function

C(n, i) p^i (1 − p)^{n−i},  i = 0, 1, . . . , n.
Note that we did not have to specify the sample space, the probability measure, or how the random variables are defined in the above example.

Two basic types of random variables have been introduced. In theory, there is a third type of random variable, which is usually not discussed in elementary probability courses. Notice that a mixture of a continuous random variable and a discrete random variable (for instance, a random variable that equals a discrete X or a continuous Y according to the flip of a coin) is neither discrete nor continuous. That is, we cannot always classify a random variable as discrete or continuous. A measure theory result states, however, that any random variable can be written as a linear combination of three random variables, one of each type.
2.4 Expectations

If X is discrete with possible values x_0, x_1, . . ., its expectation is defined as

E(X) = Σ_i x_i P(X = x_i),

if the sum converges absolutely. If X is continuous with density f, its expectation is

E(X) = ∫ t f(t)dt,

if the integral converges absolutely.
When the convergence does not hold, we say the expectation does not exist.

To calculate the expectation of any random variable, we should pay attention to the "if" part of the definition before we start. Many students lose the thread because they ignore this part of the definition.
Example 2.4
Calculate expectation of Binomial and Exponential random variables.
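As a sketch of how such calculations can be checked numerically (the helper functions and the crude Riemann-sum integration are my own illustration, not the derivation intended in the example):

```python
from math import comb, exp

def mean_binomial(n, p):
    # E(X) = sum_i i * P(X = i); the "if" part is automatic for a finite sum.
    return sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))

def mean_exponential(lam, upper=20.0, step=1e-4):
    # E(X) = integral of t * lam * exp(-lam * t) dt over (0, infinity),
    # approximated by a midpoint Riemann sum; the tail beyond `upper` is negligible.
    t, total = 0.0, 0.0
    while t < upper:
        mid = t + step / 2
        total += mid * lam * exp(-lam * mid) * step
        t += step
    return total

print(mean_binomial(10, 0.3))  # np = 3.0
print(mean_exponential(2.0))   # close to 1/lambda = 0.5
```

The closed forms E(Binomial(n, p)) = np and E(Exponential(λ)) = 1/λ are what the example asks you to derive by summation and integration.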
2.5 Joint Distribution
Let X and Y be two random variables. Note that it is possible to define two functions on the same sample space. For example, suppose our sample space is [0, 1] × [0, 1], the unit square. Every sample point can be represented as (w_1, w_2). Let

X(w_1, w_2) = w_1,  Y(w_1, w_2) = w_2,

and assume the probability measure on [0, 1] × [0, 1] is uniform. Then both X and Y have uniform distributions. We find

P(X ≤ s, Y ≤ t) = st

when (s, t) ∈ [0, 1] × [0, 1].

Let Z be another random variable such that

Z(w_1, w_2) = 1 − w_1.

We find that Z also has a uniform distribution. However,

P(X ≤ s, Z ≤ t) ≠ st

in general.

The moral of this example is: knowing the individual distributions of X, Y and Z is not enough to tell their joint behaviour.

The joint random behaviour of two random variables X and Y is characterized by their joint c.d.f., defined as

F(x, y) = P(X ≤ x, Y ≤ y).
The marginal c.d.f. of Y can be obtained as

F_Y(y) = P(Y ≤ y) = lim_{s→∞} F(s, y).

Note that I used (s, t, y) on purpose. It is certainly not good practice, but the point is, X does not have to be linked with x.
When both X and Y are discrete, it is more convenient to work with the
joint probability (mass) function:
f (x, y) = P (X = x, Y = y);
When there exists a non-negative function f(x, y) such that

F(x, y) = ∫_{−∞}^x ∫_{−∞}^y f(s, t) dt ds

for all real numbers (x, y), we say that X and Y are jointly (absolutely) continuous, and f(x, y) is their joint density function.

The marginal probability function (in the discrete case) can be obtained as

f_X(x) = Σ_y f(x, y),

and the marginal density function (in the continuous case) as

f_X(x) = ∫ f(x, y)dy.
2.6 Independence
If the joint c.d.f. of X and Y satisfies F (x, y) = FX (x)FY (y) for all x, y, then
we say X and Y are independent.
When both X and Y are discrete, then the independence is equivalent to
f (x, y) = fX (x)fY (y)
for all (x, y), where f(x, y) is the joint probability function. When X and Y are jointly continuous, the independence is equivalent to

f(x, y) = f_X(x)f_Y(y)

for almost all (x, y), where f(x, y) is the joint density function.
2.7 Formulas for Expectations

First, assume X and Y are independent continuous random variables with density functions f and g. The density function of Z = X + Y is

f_Z(t) = ∫ f(t − y)g(y)dy.
Second, assume X and Y are independent and take non-negative integer values only, with probability functions f(x) and g(y). (Note that the notation looks the same as before.) The probability function of Z = X + Y is

P(Z = n) = Σ_{i=0}^n f(i)g(n − i).
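As an illustration with two fair dice (my own example; the formula above is stated for values starting at 0, but the same convolution works for any integer support):

```python
# pmf of one fair die: f(i) = 1/6 for i = 1..6.
f = {i: 1/6 for i in range(1, 7)}
g = dict(f)

# P(Z = n) = sum_i f(i) g(n - i): the convolution formula above.
pz = {}
for n in range(2, 13):
    pz[n] = sum(f.get(i, 0) * g.get(n - i, 0) for i in range(1, 7))

print(pz[7])  # 6/36, the most likely total of two dice
assert abs(sum(pz.values()) - 1) < 1e-12  # the convolution is again a pmf
```

The familiar triangular shape of the two-dice distribution drops straight out of the convolution sum.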
2.8 Key Results and Concepts
Random variables are real valued functions defined on the sample space.
Their randomness is the consequence of the randomness of the outcome from
the sample space. We classify them according to their cumulative distribution functions or equivalently, their probability mass functions or probability
density functions.
A discrete random variable takes at most a countable number of possible values. An absolutely continuous random variable has a cumulative distribution function which can be obtained from a density function by integration. (Or, roughly, its cumulative distribution function is differentiable.) The third type of random variable is not discussed.

A random variable has, say, the Poisson distribution if its probability function has the form

λ^n exp(−λ)/n!,  n = 0, 1, 2, . . . .

In general, the distribution of a random variable is named after the form of its cumulative distribution function.
The mean, variance and moments of a random variable are determined by its distribution. In many examples, they can be obtained by summation or integration easily (at least for some students). In other examples, the mean and variance of a random variable can be obtained via its relationship to other random variables. Thus, memorizing some formulas is useful.
2.9 Problems

2. Let X have the probability function

p_X(k) = C(3, k)(0.4)^k (1 − 0.4)^{3−k},  k = 0, 1, 2, 3,

and let Y = (X − 1)².

(i) Let F_X(x) be the cumulative distribution function of X. Calculate F_X(2.4).

(ii) Tabulate the probability function of Y.

(iii) Tabulate the probability function of X given Y = 1.

(iv) Tabulate E(X|Y).
3. A random number N of fair dice is thrown, with P(N = n) = 2^{−n}, n ≥ 1.
Let S be the sum of the scores. Find the probability that
a) N = 2 given S = 4;

b) S = 4 given N = 2;

c) S = 4 given N is even;

d) the largest number shown by any die is r.
4. A coupon is selected at random from a series of k coupons and placed in each box of cereal. A house-husband has bought N boxes of cereal. What is the probability that all k coupons are obtained? (Hint: consider the event that the ith coupon is not obtained. The answer is in a nice summation form.)
5. If birthdays are equally likely to fall in each of the twelve months of
the year, find the probability that all twelve months are represented
among the birthdays of 20 people selected at random.
(Hint: let A_i be the event that the ith month is not included and consider A_1 ∪ A_2 ∪ ⋯ ∪ A_{12}.)
6. Let X be a random variable and g(·) be a real-valued function.
(a) What do we mean by X is discrete?
(b) If X is a discrete random variable, argue that g(X) is also a random
variable and discrete.
(c) If X is a continuous random variable, is g(X) necessarily a continuous random variable? Why?
7. Let a and b be independent random variables, uniformly distributed on (0, 1). What is the probability that x² + ax + b = 0 has no real roots?
8. Express the distribution functions of

X⁺ = max{0, X},  X⁻ = −min{0, X},  |X| = X⁺ + X⁻

in terms of the distribution function of X.
Chapter 3
Conditional Distribution and Expectations
3.1 Introduction
Suppose both X and Y are discrete and hence have a joint probability function f(x, y). Then we have

P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = f(x, y)/f_Y(y).
For example, suppose Y takes values 1 and 2 with probability .5 each, and that given Y, X is binomial with 5 trials and success probability .4 (when Y = 1) or .2 (when Y = 2). Then

P(X = j|Y = 1) = C(5, j)(.4)^j (.6)^{5−j}

and

P(X = j|Y = 2) = C(5, j)(.2)^j (.8)^{5−j}

for j = 0, 1, . . . , 5. The marginal probability function of X is given by

P(X = j) = (.5)C(5, j)(.4)^j (.6)^{5−j} + (.5)C(5, j)(.2)^j (.8)^{5−j}.

The conditional expectation of X given Y = y is defined as

E(X|Y = y) = Σ_x x P(X = x|Y = y).

In this example, E(X|Y = 1) = 5(.4) = 2 and E(X|Y = 2) = 5(.2) = 1.
Define ψ(y) = E(X|Y = y). Then ψ(Y) is a random variable, and

E[ψ(Y)] = Σ_y ψ(y)P(Y = y)
        = Σ_y E(X|Y = y)P(Y = y)
        = Σ_y [Σ_x x P(X = x|Y = y)]P(Y = y)
        = Σ_{x,y} x P(X = x, Y = y)
        = E(X).
To be more concrete, we do not write ψ(Y) in textbooks; we write it as E(X|Y) and call it the conditional expectation of X given Y. For those with mathematical curiosity, we may write

E(X|Y) = E[X|Y = y]|_{y=Y}.

Hence the above identity can be stated as

E[E(X|Y)] = E(X).
One intuitive interpretation of this result is: the grand average is the weighted average of sub-averages. To find the average mark of students in Stat230, we may first calculate the average in each of its 6 sections. This gives 6 conditional expectations (conditioning on which section a student is in). We then calculate the weighted average of the section averages according to the size of each section. This weighted average is the outer expectation on the left-hand side of the above formula.
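The "weighted average of sub-averages" identity can be checked numerically on the two-binomial mixture discussed earlier in this section (the layout below is my own sketch):

```python
from math import comb

def binom_pmf(n, p, j):
    return comb(n, j) * p**j * (1 - p)**(n - j)

# The mixture: P(Y = 1) = P(Y = 2) = .5; given Y, X is binomial(5, .4) or (5, .2).
p_y = {1: 0.5, 2: 0.5}
p_x_given_y = {1: [binom_pmf(5, 0.4, j) for j in range(6)],
               2: [binom_pmf(5, 0.2, j) for j in range(6)]}

# Sub-averages E(X | Y = y), then their weighted average E[E(X|Y)].
e_given = {y: sum(j * p_x_given_y[y][j] for j in range(6)) for y in p_y}
lhs = sum(e_given[y] * p_y[y] for y in p_y)

# Direct computation of E(X) from the marginal probability function of X.
marginal = [sum(p_y[y] * p_x_given_y[y][j] for y in p_y) for j in range(6)]
rhs = sum(j * marginal[j] for j in range(6))

print(lhs, rhs)  # both equal .5(2) + .5(1) = 1.5
```

The two routes to E(X) agree exactly, which is precisely the statement E[E(X|Y)] = E(X).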
It turns out that this concept applies to continuous random variables too. If (X, Y) are jointly continuous, we define the conditional density function of X given Y = y as

f_{X|Y}(x|y) = f(x, y)/f_Y(y),

where f(x, y) is the joint density, f_X and f_Y are the marginal density functions, and we assume that f_Y(y) is greater than zero. The conditional expectation is then defined as

E(X|Y = y) = ∫ x f(x, y)/f_Y(y) dx.
3.2 Formulas
Most formulas for ordinary expectation remain valid for the conditional expectation. For example,
E(aX + bY |Z) = aE(X|Z) + bE(Y |Z).
If g(·) is a function, we have

E[g(Y)X|Y] = g(Y)E[X|Y],

as g(Y) is regarded as non-random given Y.
Finally, we define

Var(X|Y) = E[(X − E(X|Y))²|Y].

Then

Var(X) = E[Var(X|Y)] + Var[E(X|Y)].
3.3 Comment
It could be claimed that probability theory is a special case of measure theory in mathematics. However, the concepts of independence and conditional expectation make probability theory a separate scientific discipline.
Our subsequent developments depend heavily on the use of conditional
expectation.
3.4 Problems
Chapter 4
Generating Functions and Their Applications
4.1 Introduction

If the power series

A(s) = Σ_{j=0}^∞ a_j s^j = a_0 + a_1 s + a_2 s² + ⋯   (4.1)

converges in some interval |s| < s_0, where s_0 > 0, then A(s) is called the generating function of the sequence {a_j}_0^∞. The generating function provides a convenient summary of a real number sequence. In many examples, simple and explicit expressions for A(s) can be obtained. This enables us to study the properties of {a_j}_0^∞ conveniently.
Example 4.1

The Fibonacci sequence {f_j} is defined by f_0 = 0, f_1 = 1 and the recursive relationship

f_j = f_{j−1} + f_{j−2},  j = 2, 3, . . . .   (4.2)
Multiplying (4.2) by s^j and summing over j,

Σ_{j=2}^∞ f_j s^j = Σ_{j=2}^∞ f_{j−1} s^j + Σ_{j=2}^∞ f_{j−2} s^j.   (4.3)

Note the summation starts from j = 2 because (4.2) is valid only for j = 2, 3, . . .. Defining F(s) = Σ_{j=0}^∞ f_j s^j, we get

Σ_{j=2}^∞ f_j s^j = Σ_{j=0}^∞ f_j s^j − f_0 − f_1 s = F(s) − s.   (4.4)

Similarly, the two sums on the right-hand side of (4.3) equal sF(s) and s²F(s). Hence F(s) − s = sF(s) + s²F(s), so that

F(s) = s/(1 − s − s²).

In general, the coefficients of a generating function can be recovered from the derivatives of A(s) at 0:

a_j = A^{(j)}(0)/j!.

This, of course, requires the function to be analytic at 0, which is true when A(s) converges in a neighbourhood of 0. An obvious conclusion is: real number sequences and generating functions have a one-to-one correspondence when the convergence and analyticity properties hold.
Now let us get back to the example. F(s) clearly converges at least for |s| ≤ 0.5. This allows us to look for its Maclaurin series expansion. Note that

1 − s − s² = (1 − ((1 + √5)/2)s)(1 − ((1 − √5)/2)s),

so by partial fractions,

F(s) = (1/√5)[Σ_{j=0}^∞ ((1 + √5)/2)^j s^j − Σ_{j=0}^∞ ((1 − √5)/2)^j s^j].

Recalling the one-to-one correspondence property,

f_j = (1/√5)[((1 + √5)/2)^j − ((1 − √5)/2)^j],  j = 0, 1, 2, . . . .

As j → ∞, the ratio f_{j+1}/f_j converges to (1 + √5)/2, the golden ratio, to which the ancient Egyptians attributed many mystical qualities.
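The closed form extracted from the generating function can be checked against the recursion directly (a small sketch of my own):

```python
from math import sqrt

def fib_closed(j):
    # f_j = (phi^j - psi^j) / sqrt(5), with phi = (1 + sqrt 5)/2, psi = (1 - sqrt 5)/2.
    phi, psi = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    return (phi**j - psi**j) / sqrt(5)

# The defining recursion f_j = f_{j-1} + f_{j-2}, f_0 = 0, f_1 = 1.
f = [0, 1]
for j in range(2, 20):
    f.append(f[-1] + f[-2])

assert all(abs(fib_closed(j) - f[j]) < 1e-9 for j in range(20))
print(f[10])  # 55
```

Because |ψ| < 1, the ψ^j term dies off geometrically, which is exactly why f_{j+1}/f_j approaches the golden ratio.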
In this example, the generating function has been used as a tool for solving the difference equation (4.2). Generating functions will be seen to be far more useful than just this. For example, if A(s) converges in |s| ≤ s_0 with s_0 > 1, then

A(1) = Σ_{j=0}^∞ a_j,  A′(1) = Σ_{j=1}^∞ j a_j,

and so on.
Example 4.2
Consider the following series:
aj = 1, j = 0, 1, 2, . . . ;
bj = 1/j!, j = 0, 1, 2, . . . ;
c0 = 0, cj = 1/j, j = 1, 2, . . . .
Easy calculation shows that the corresponding generating functions are A(s) = (1 − s)^{−1}, B(s) = e^s and C(s) = −log(1 − s), where the regions of convergence are |s| < 1 for A(s) and C(s), and all real s for B(s).
4.2 Probability Generating Functions

Let X be a random variable taking non-negative integer values, with probability function {p_j}, where

p_j = P(X = j),  j = 0, 1, 2, . . . .

The generating function of {p_j} is called the probability generating function of X, and we write

G(s) = G_X(s) = E{s^X} = p_0 + p_1 s + p_2 s² + ⋯ .   (4.5)
Since Σ_{j=0}^∞ p_j |s|^j ≤ Σ_{j=0}^∞ p_j = 1, the series converges absolutely for |s| ≤ 1. Differentiating at s = 1 gives

G′(1) = Σ_{j=0}^∞ j p_j = E(X),

and, more generally, the rth factorial moment of X is

E[X(X − 1) ⋯ (X − r + 1)] = G^{(r)}(1) = Σ_{j=0}^∞ j^{(r)} p_j,

where j^{(r)} = j(j − 1) ⋯ (j − r + 1), j = 0, 1, 2, . . . .
Next, consider the tail probabilities q_j = P(X > j), with generating function Q(s) = Σ_{j=0}^∞ q_j s^j. Since each q_j ≤ 1 and Σ_{j=0}^∞ s^j = (1 − s)^{−1}, Q(s) converges for |s| < 1. Note that

q_j = q_{j−1} − p_j,  j = 1, 2, . . . .   (4.6)

Again, (4.6) is true for all j starting from 1. Multiplying (4.6) by s^j and summing over j, we obtain

Σ_{j=1}^∞ q_j s^j = Σ_{j=1}^∞ q_{j−1} s^j − Σ_{j=1}^∞ p_j s^j,

so that

Q(s) − (1 − p_0) = sQ(s) − [G(s) − p_0].

Thus, for all |s| < 1, we have

Q(s) = (1 − G(s))/(1 − s).   (4.7)
Since G(1) = 1, it follows from (4.7) and the Mean Value Theorem of calculus that, for a given |s| < 1, there exists s* ∈ (s, 1) such that

Q(s) = G′(s*).

It follows that

lim_{s→1} Q(s) = lim_{s→1} G′(s*) = E(X),

that is,

E(X) = Σ_{j=0}^∞ q_j.
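The tail-sum formula E(X) = Σ_j P(X > j) is easy to verify for a concrete distribution; here is a sketch using Binomial(5, 0.4) (my own choice of example):

```python
from math import comb

n, p = 5, 0.4
pmf = [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

# Tail probabilities q_j = P(X > j); these vanish for j >= n.
tails = [sum(pmf[j + 1:]) for j in range(n + 1)]

mean = sum(j * pmf[j] for j in range(n + 1))
print(sum(tails), mean)  # both equal np = 2.0
```

The identity is handy whenever the tail probabilities have a simpler form than the pmf itself.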
4.3 Convolution

Let {a_j} and {b_j} be two sequences of real numbers, and let {c_j} be the sequence defined by

c_j = Σ_{l=0}^j a_l b_{j−l} = a_0 b_j + a_1 b_{j−1} + ⋯ + a_j b_0,  j = 0, 1, 2, . . . .

The new sequence is called the convolution of {a_j} and {b_j}; we write

{c_j} = {a_j} * {b_j}.
Theorem 4.1
If A(s), B(s) and C(s) are the generating functions of {aj }, {bj } and {cj } =
{aj } {bj } respectively, then (when they all exist at s)
C(s) = A(s)B(s).
Proof. Let b_j = 0 when j = −1, −2, . . .. Hence

c_j = Σ_{l=0}^∞ a_l b_{j−l}.

Thus,

C(s) = Σ_{j=0}^∞ c_j s^j = Σ_{j=0}^∞ Σ_{l=0}^∞ a_l b_{j−l} s^j
     = Σ_{l=0}^∞ Σ_{j=0}^∞ a_l b_{j−l} s^j
     = Σ_{l=0}^∞ [a_l s^l Σ_{j=0}^∞ b_{j−l} s^{j−l}]
     = A(s)B(s).

If X and Y are independent random variables taking non-negative integer values, then

P(X + Y = j) = Σ_{l=0}^j P(X = l)P(Y = j − l),  j = 0, 1, 2, . . . ,

so the probability function of X + Y is the convolution of those of X and Y, and hence G_{X+Y}(s) = G_X(s)G_Y(s).
For example, if X is binomial with parameters n and p, then

G_X(s) = Σ_{j=0}^n C(n, j) p^j (1 − p)^{n−j} s^j = Σ_{j=0}^n C(n, j)(ps)^j (1 − p)^{n−j} = (1 − p + ps)^n.

Hence, if X and Y are independent binomial random variables with parameters (m, p) and (n, p), then G_{X+Y}(s) = (1 − p + ps)^{m+n}, so that

P(X + Y = j) = C(m + n, j) p^j (1 − p)^{m+n−j},  j = 0, 1, . . . , m + n.
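The same conclusion can be reached without generating functions by convolving the two pmfs directly, which makes a nice numerical check (a sketch; the helper names are mine):

```python
from math import comb

def binom_pmf(n, p):
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def convolve(a, b):
    # c_j = sum_l a_l b_{j-l}: the pmf of the sum of independent variables.
    c = [0.0] * (len(a) + len(b) - 1)
    for l, al in enumerate(a):
        for k, bk in enumerate(b):
            c[l + k] += al * bk
    return c

p = 0.3
c = convolve(binom_pmf(3, p), binom_pmf(4, p))
target = binom_pmf(7, p)  # Bin(3, p) + Bin(4, p) should be Bin(7, p)
assert all(abs(x - y) < 1e-12 for x, y in zip(c, target))
```

Note that the argument requires the same success probability p for both variables; the sum of binomials with different p's is not binomial.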
4.3.1 Key Facts
4.4 The Simple Random Walk

Let X_0 = 0 and let X_n be the position of the walk after n steps, where the steps are independent and each step equals +1 with probability p and −1 with probability q = 1 − p.

(figure: sample paths of a simple random walk)

We study the following probabilities:

u_n = P(return to 0 at trial n),
f_n = P(first return to 0 at trial n),
λ_n = P(first passage through 1 at trial n),
λ_n^{(r)} = P(first passage through r at trial n).
By counting paths, a return to the origin can occur only at even trials, and

u_{2n} = C(2n, n)(pq)^n,  n = 1, 2, . . . .

You can try to verify that the generating function of {u_n} is given by

U(s) = (1 − 4pqs²)^{−1/2}.

To find the generating functions of the other sequences, let F(s), Λ(s), and Λ^{(r)}(s) be the generating functions of {f_n}, {λ_n} and {λ_n^{(r)}}. We will obtain them through some difference equations.
4.4.1 First Passage Times

To pass through 2 for the first time at trial n, the walk must first pass through 1 at some trial k, and then move from 1 to 2 for the first time in the remaining n − k trials. Hence

λ_n^{(2)} = Σ_{k=1}^{n−1} λ_k λ_{n−k} = Σ_{k=0}^n λ_k λ_{n−k},

since λ_0 = 0. That is, {λ_n^{(2)}} = {λ_n} * {λ_n}, so Λ^{(2)}(s) = [Λ(s)]². Also, by conditioning on the first step,

λ_n = qλ_{n−1}^{(2)},  n = 2, 3, . . . .   (4.8)
Multiplying both sides of (4.8) by s^n and summing over n with care over its range, we have

Σ_{n=2}^∞ λ_n s^n = q Σ_{n=2}^∞ λ_{n−1}^{(2)} s^n.

We find

Λ(s) − ps = qsΛ^{(2)}(s) = qs[Λ(s)]²,

so that

Λ(s) = (1 ± √(1 − 4pqs²))/(2qs).

When s → 0 we should have Λ(s) → 0, so we must take the minus sign:

Λ(s) = (1 − √(1 − 4pqs²))/(2qs) = (2qs)^{−1}[1 − Σ_{j=0}^∞ C(1/2, j)(−4pqs²)^j],

where the binomial expansion (with generalized binomial coefficients) has been used. From this we find λ_{2n} = 0 and

λ_{2n−1} = −(2q)^{−1} C(1/2, n)(−4pq)^n = (2n − 1)^{−1} C(2n − 1, n) p^n q^{n−1},  n = 1, 2, . . . .
The generating function Λ(s) tells us more about the simple random walk. Since Λ(s) = Σ_{n=0}^∞ λ_n s^n, the probability that the walk ever passes through 1 is

Λ(1) = Σ_n λ_n = 1 when p ≥ q, and p/q when p < q.

The walk is certain to pass through 1 when p > q, and even when p = q = 1/2. If p ≥ q, we may define the random variable N, the waiting time until the first passage through 1 occurs. That is,

N = min{n : X_n = 1},

and we know, in this case, that P(N < ∞) = 1. Since P(N = j) = λ_j, the probability generating function of N is Λ(s), and

E(N) = Λ′(1−) = (p − q)^{−1} when p > q, and ∞ when p = q.
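The closed form for λ_{2n−1} can be checked by brute force for small n, enumerating all length-n paths and keeping those whose first visit to 1 occurs at the last step (my own sketch, feasible only for short paths):

```python
from itertools import product
from math import comb

def first_passage_prob(n, p):
    # lambda_n = P(first passage through 1 at trial n), by path enumeration.
    q = 1 - p
    total = 0.0
    for steps in product((1, -1), repeat=n):
        pos, path = 0, []
        for s in steps:
            pos += s
            path.append(pos)
        # First passage at trial n: stay below 1 until the last step, end at 1.
        if path[-1] == 1 and all(x < 1 for x in path[:-1]):
            total += p ** steps.count(1) * q ** steps.count(-1)
    return total

p, q = 0.6, 0.4
for n in (1, 2, 3):
    formula = comb(2 * n - 1, n) * p**n * q**(n - 1) / (2 * n - 1)
    assert abs(first_passage_prob(2 * n - 1, p) - formula) < 1e-12
```

For n = 2, for example, the only admissible path of length 3 is down-up-up, with probability qp², which is exactly what the formula gives.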
4.4.2 Returns to Origin
For a first return to the origin at trial n, the walk must begin with either X_1 = −1 or X_1 = 1. In the first case, the event can be written as

A = {X_1 = −1, X_2 − X_1 < 1, X_3 − X_1 < 1, . . . , X_{n−1} − X_1 < 1, X_n − X_1 = 1};

that is, after the first step the walk makes a first passage through 1 in n − 1 steps. Hence P(A) = qλ_{n−1}. In the second case, the event becomes

B = {X_1 = 1, −(X_2 − X_1) < 1, . . . , −(X_{n−1} − X_1) < 1, −(X_n − X_1) = 1}.

Note that {−X_n} is also a simple random walk, but with up-step probability q rather than p. Hence the event B has a structure similar to that of the event A. Let

λ_n^{(−1)} = P(−X_1 < 1, −X_2 < 1, . . . , −X_{n−1} < 1, −X_n = 1).

Then {λ_n^{(−1)}} has the same generating function as {λ_n}, except with p and q switched. In addition, P(B) = P(X_1 = 1)λ_{n−1}^{(−1)}, and therefore, for n ≥ 1,

f_n = P(A) + P(B) = qλ_{n−1} + pλ_{n−1}^{(−1)}.

Equivalently,

F(s) = qsΛ(s) + psΛ^{(−1)}(s)
     = qs(1 − √(1 − 4pqs²))/(2qs) + ps(1 − √(1 − 4pqs²))/(2ps)
     = 1 − √(1 − 4pqs²).
Setting s = 1,

Σ_{n=0}^∞ f_n = 1 − |p − q|,

and so a return is certain only if p = q = 1/2. In this case, the mean time to return is

F′(1−) = lim_{s→1} d/ds [1 − √(1 − s²)] = ∞.
Thus, if the game is fair and you have lost some money at the moment, we have good news for you: the chance that you will win back all your money is 1. The bad news is that the same result also tells you that, on average, you may not live long enough to see it.
4.4.3 Some Key Results in the Simple Random Walk

Sequence                         Generating function                     Nonzero terms
u_n (return to 0)                U(s) = (1 − 4pqs²)^{−1/2}               u_{2n} = C(2n, n)(pq)^n
f_n (first return to 0)          F(s) = 1 − √(1 − 4pqs²)                 f_{2n} = (2n − 1)^{−1} C(2n, n)(pq)^n
λ_n (first passage through 1)    Λ(s) = (2qs)^{−1}[1 − √(1 − 4pqs²)]     λ_{2n−1} = (2n − 1)^{−1} C(2n − 1, n) p^n q^{n−1}
4.5 The Branching Process

Now let us study a second example of a simple stochastic process. Here we have particles that are capable of producing particles of like kind. Assume that all such particles act independently of one another, and that each particle has probability p_j of producing exactly j new particles, j = 0, 1, 2, . . ., with Σ_j p_j = 1. For simplicity, we assume that the 0th generation consists of a single particle; the direct descendants of that particle form the first generation. Similarly, the direct descendants of the nth generation form the (n + 1)th generation.
(figure: a branching process tree with Z_0 = 1, Z_1 = 4, Z_2 = 5, Z_3 = 9)

Let Z_n denote the size of the nth generation. Then

Z_{n+1} = Σ_{i=1}^{Z_n} X_{ni}

for all n ≥ 1, where X_{ni} is the number of offspring of the ith member of the nth generation. In addition, all the X_{ni} are independent and have the same distribution as that of Z_1.
Let G(s) be the probability generating function of the family size Z_1, and let H_n(s) = E(s^{Z_n}) be the probability generating function of Z_n. Given Z_n = k,

Z_{n+1} = Σ_{i=1}^k X_{ni}.

Thus,

E(s^{Z_{n+1}} | Z_n = k) = E[s^{Σ_{i=1}^k X_{ni}}] = G^k(s),   (4.9)

and hence, averaging over k,

H_{n+1}(s) = E[G^{Z_n}(s)] = H_n(G(s)).   (4.10)

Iterating (4.10) from H_1 = G shows that also H_n(s) = G(H_{n−1}(s)).
4.5.1 Mean and Variance of Zn

Let μ_n = E(Z_n) and σ_n² = Var(Z_n). Differentiating (4.10) at s = 1 gives

μ_n = μ μ_{n−1},  n = 1, 2, . . . ,   (4.11)

where μ = G′(1) is the mean family size and we have used H_{n−1}(1) = 1. Since μ_0 = 1, it follows from (4.11) that μ_n = μ^n. Thus, if μ > 1, the average population size increases exponentially. If μ < 1, E(Z_n) approaches 0 at an exponential rate as n → ∞. The case μ = 1 gives the curious result that E(Z_n) = 1 for all n.

More directly, with σ² the variance of the family size,

Var(Z_n) = E[Var(Z_n|Z_{n−1})] + Var[E(Z_n|Z_{n−1})] = E(Z_{n−1}σ²) + Var(μZ_{n−1}).

Thus

σ_n² = μ^{n−1}σ² + μ²σ_{n−1}²,  n = 1, 2, . . . ,

and solving this recursion gives

σ_n² = σ²μ^{n−1}(1 − μ^n)/(1 − μ),  n = 1, 2, . . . ,

when μ ≠ 1 (and σ_n² = nσ² when μ = 1).
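A quick simulation illustrates the mean formula μ_n = μ^n. The family-size law below is my own choice for illustration (it is not from the notes): P(0) = .2, P(1) = .5, P(2) = .3, so μ = 1.1.

```python
import random

random.seed(2003)

def family():
    # Illustrative family-size distribution: P(0)=.2, P(1)=.5, P(2)=.3.
    u = random.random()
    return 0 if u < 0.2 else (1 if u < 0.7 else 2)

mu = 0.5 * 1 + 0.3 * 2  # mean family size = 1.1

def simulate(n):
    # One realization of Z_n starting from Z_0 = 1.
    z = 1
    for _ in range(n):
        z = sum(family() for _ in range(z))
    return z

reps = 20000
avg = sum(simulate(5) for _ in range(reps)) / reps
print(avg, mu**5)  # the sample mean should be close to mu^5 = 1.61051
```

Note that although the mean grows like μ^n > 1 here, a substantial fraction of the simulated populations are already extinct by generation 5, which is the topic of the next subsection.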
4.5.2 Probability of Extinction

Let q_n represent the probability that the population is extinct by the nth generation. That is, define

q_n = P(Z_n = 0) = H_n(0),  n = 0, 1, 2, . . . .

Thus q_0 = 0 and, by (4.10),

q_n = G(q_{n−1}).   (4.12)
Remark Because of the above summary, most students tend always to solve the equation to find the probability of ultimate extinction. This is often more than what is needed.
Example 4.5

Lotka (see Feller 1968, page 294) showed that, to a reasonable approximation, the distribution of the number of male offspring in an American family is described by

p_0 = 0.4825,  p_k = (0.2126)(0.5893)^{k−1}, k ≥ 1,

which is a geometric distribution with a modified first term. The corresponding probability generating function is

G(s) = (0.4825 − 0.0717s)/(1 − 0.5893s),

and G′(1) = 1.261. Thus, for example, in the 16th generation, the average population of male descendants of a single root ancestor is

μ_16 = (1.261)^16 = 40.685.

The probability of extinction, however, is the smallest solution of

q = (0.4825 − 0.0717q)/(1 − 0.5893q).

We find q = 0.8197. This suggests that for those family names that do survive to the 16th generation, the average size is very much larger than 40.685. (All calculations are subject to the original round-off errors.)
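Rather than solving the quadratic, the extinction probability can be found by iterating (4.12), q_n = G(q_{n−1}), starting from q_0 = 0; the iterates increase to the smallest root. A sketch for Lotka's G:

```python
def G(s):
    # Lotka's probability generating function for the number of male offspring.
    return (0.4825 - 0.0717 * s) / (1 - 0.5893 * s)

# Fixed-point iteration q_n = G(q_{n-1}) starting from q_0 = 0 converges
# monotonically to the smallest non-negative root of G(q) = q.
q = 0.0
for _ in range(200):
    q = G(q)

print(q)  # approximately 0.8197
```

The convergence rate is governed by G′(q) ≈ 0.8, so a couple of hundred iterations are more than enough here.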
Example 4.6

From the point of view of epidemiology, it is more important to control the spread of a disease than to cure the infected patients. Suppose that the spread of a disease can be modeled by a branching process. Then it is very important to make sure that the average number of people infected by a single patient is less than 1. If so, the probability of extinction of the disease will be one. Even if the average number of people infected is larger than one, there is still a positive chance that the disease will go extinct.

A scientist in Health Canada analyzed the data from the SARS (severe acute respiratory syndrome) epidemic in 2003. It was noticed that many interesting phenomena could be partially explained by results in branching processes.

First, many countries imported SARS patients who did not cause epidemics. This can be explained by the fact that the probability of extinction is not small (even when the average number of people infected by a single patient is larger than 1).

Second, a few patients were nicknamed super-spreaders. They may simply correspond to the branching processes that do not go extinct.

Third, after government intervention, the average number of people infected by a single patient was substantially reduced. Once it fell below 1, the epidemic was doomed to die out.

Finally, it was not cost effective to screen all airplane passengers, but rather to take strict and quick quarantine measures on new and old cases. Once the average number of people infected by a single patient falls below one, the disease will be controlled with probability one.
4.5.3
For simplicity, we assumed that the population starts from a single individual: $Z_0 = 1$; we also assumed that the numbers of offspring of the various individuals are independent and have the same distribution.
Under these assumptions, we have shown that
$$\mu_n = \mu^n$$
and
$$\sigma_n^2 = \frac{\sigma^2 \mu^{n-1}(\mu^n - 1)}{\mu - 1}, \qquad \mu \ne 1,$$
where $\mu$ and $\sigma^2$ are the mean and the variance of the family size, and $\mu_n$ and $\sigma_n^2$ are the mean and the variance of the size of the nth generation.
We have shown that the probability of extinction, q, is the smallest non-negative solution to
$$G(s) = s,$$
where G(s) is the probability generating function of the family size. Further, it is known that q = 1 when $\mu < 1$, and q < 1 when $\mu > 1$. When $\mu = 1$, q = 1 unless the family size is non-random (in which case every generation has size exactly one and extinction never occurs).
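Since the iterates $G^{(n)}(0)$ increase monotonically to q, the extinction probability can be computed by fixed-point iteration. A minimal sketch (the example distributions are hypothetical):

```python
def extinction_prob(pmf, tol=1e-12):
    """Smallest non-negative root of G(s) = s, found by iterating
    q <- G(q) from q = 0 (the iterates increase monotonically to q)."""
    G = lambda s: sum(p * s**k for k, p in enumerate(pmf))
    q = 0.0
    while abs(G(q) - q) >= tol:
        q = G(q)
    return q

# supercritical case: P(0)=0.2, P(1)=0.4, P(2)=0.4, mean 1.2 > 1
print(extinction_prob([0.2, 0.4, 0.4]))   # close to 0.5, the exact smallest root
# subcritical case: mean 0.5 < 1, so extinction is certain
print(extinction_prob([0.5, 0.5]))        # close to 1.0
```

For the first distribution, $G(s) = s$ reduces to $0.4s^2 - 0.6s + 0.2 = 0$, whose roots are 0.5 and 1; the iteration picks out the smaller one, as the theory requires.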
4.6
Problems
$$p_x = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, \ldots.$$
k = 0, 1, 2, . . . .
[A displayed generating function here was too garbled to recover; it has the form $s[\,\cdots\,]$ with factors $1 - \frac13 s$ and $1 + \frac13 s$ in the denominators.]
$$\binom{-k}{i} p^k (p-1)^i, \qquad i = 0, 1, \ldots.$$
$$\sum_{r=1}^{n-1} b_r\, b_{n-1-r}, \qquad n = 2, 3, \ldots.$$
$$p_j = \binom{-k}{j} (-p)^j (1-p)^k, \qquad j = 0, 1, \ldots,$$
where k > 0 and 0 < p < 1.
2) Let $r_0 = 0$, $r_j = c/\{j(j+2)\}$, $j = 1, 2, \ldots$ (find the constant c yourselves).
Find the means and the variances of the above distributions, whichever exist.
$$P(Z_{1,1} = 0) = \frac12 + a, \qquad P(Z_{1,1} = 1) = \frac14 - 2a, \qquad P(Z_{1,1} = 3) = \frac14 + a,$$
for some a.
(a) Find the probability generating function of the family size. When a = 1/8, find the probability generating function of $X_2$.
(b) Find the range of a such that the probability of extinction is less than 1.
(c) When a = 1/8, find the expectation and variance of the population size of the 5th generation, and the probability of extinction.
14. For a branching process with family size distribution given by
P0 = 1/6, P2 = 1/3, P3 = 1/2;
23. (Branching with immigration) Each generation of a branching process (with a single progenitor) is augmented by a random number of immigrants who are indistinguishable from the other members of the population. Suppose that the numbers of immigrants in different generations are independent of each other and of the past history of the branching process, each such number having probability generating function H(s). Show that the probability generating function $G_n$ of the size of the nth generation satisfies $G_{n+1}(s) = G_n(G(s))H(s)$, where G is the probability generating function of a typical family of offspring.
24. Consider the random walk $X_0 = 0$, $X_n = X_{n-1} + Z_n$, where $P(Z_n = +1) = p$, $P(Z_n = -1) = q$, $n = 1, 2, \ldots$, independently (p + q = 1). Find the probability that the event $X_n = r$ will ever occur, where r is a fixed positive integer. If p > q, find the expected time until its first occurrence.
25. Consider the random walk $X_0 = 0$, $X_n = X_{n-1} + Z_n$, where $P(Z_n = +1) = p$, $P(Z_n = -2) = q$, $n = 1, 2, \ldots$, independently (p + q = 1). Let $\Phi^{(r)}(s)$ and $\phi_n$ be defined as in class. Show that $\Phi^{(r)}(s) = [\Phi(s)]^r$ and derive the relationship
$$\phi^{(3)}_n = q\,\phi_{n-1}, \qquad n = 2, 3, \ldots.$$
27. If an unbiased coin is tossed repeatedly, show that the probability that the number of heads ever exceeds twice the number of tails is $(\sqrt{5}-1)/2$.
28. Let $p_{r,k}$ be the probability that the simple random walk visits state r (r > 0) exactly k times.
a) If p = q = 0.5, show that $p_{r,k} = 0$, $k = 0, 1, 2, \ldots$.
b) If p > q, show that
$$p_{r,k} = \begin{cases} 0, & k = 0;\\ (1-\delta)^{k-1}\delta, & k = 1, 2, \ldots \end{cases}$$
where $\delta = |p - q|$.
c) If p < q, show that
$$p_{r,k} = \begin{cases} 1-\theta, & k = 0;\\ \theta(1-\delta)^{k-1}\delta, & k = 1, 2, \ldots \end{cases}$$
where $\delta = |p - q|$ and $\theta = (p/q)^r$.
29. Consider a gambler who at each play of the game has probability p of winning one unit and probability q = 1 - p of losing one unit. Assuming that successive plays of the game are independent, what is the probability that, starting with i units, the gambler's fortune will reach N before reaching 0? Hint: let $P_i$, $i = 0, \ldots, N$, denote the probability that, starting with i, the gambler's fortune will eventually reach N. Derive a relationship among the $P_i$'s.
30. Using the fact that $u_{2n+1} = 0$ and $u_{2n} = \binom{2n}{n} p^n q^n$, show that
$$F(s) = \frac{1 - \sqrt{1-4pqs^2}}{2ps}$$
and $\sqrt{1-4pq} = |p-q|$. Find the probability that 0 will ever be reached.
(c) Find the range of p such that state 0 is recurrent.
Chapter 5
Renewal Events and Discrete
Renewal Processes
5.1
Introduction
Consider a sequence of trials that are not necessarily independent, and let $\lambda$ represent some property which, on the basis of the outcomes of the first n trials, can be said unequivocally to occur or not to occur at trial n. By convention, we suppose that $\lambda$ has just occurred at trial 0, and $E_n$ represents the event that $\lambda$ occurs at trial n, n = 1, 2, . . ..
We call $\lambda$ an event in renewal theory. However, it is not an event in the sense of probability models, in which events are subsets of the sample space.
Taking the simple random walk $\{X_n\}$ as an example, we regard $X_n$ as the outcome of the nth trial. Thus, the $\{X_n\}$ themselves are outcomes of a sequence of trials. An event $\lambda_1$ can be used to describe: the outcome $X_n$ is 0. That is, the event "$\lambda_1$ has just occurred at trial n" is the event
$$E_n = \{X_n = 0\}$$
for a given n.
Similarly, another possible event $\lambda_2$ can be defined such that "$\lambda_2$ has just occurred at trial n" is the event
$$E_n = \{X_n - X_{n-1} = 1,\ X_{n-1} - X_{n-2} = 1\},$$
n = 2, 3, . . . .
Another simple (but not rigorous) way to define a renewal event is: independently of the previous outcomes of the trials, once $\lambda$ occurs, the waiting time for the next occurrence of $\lambda$ has the same fixed distribution.
Example 5.1
Consider a sequence of Bernoulli trials in which P(S) = p and P(F) = q with p + q = 1. Let $\lambda$ represent the event that trials n - 2, n - 1 and n result respectively in F, S and S. We shall say that $\lambda$ is the event F SS. It is clear that $\lambda$ is a renewal event. If $\lambda$ occurs at n, the process regenerates, and the waiting time for its next occurrence has the same distribution as had the waiting time for the first occurrence.
Example 5.2
In the same situation as above, let $\lambda$ represent the event SS. That is, $\lambda$ is said to occur at trial n if trials n - 1 and n both give S as the outcome. In this case, $\lambda$ is not a renewal event; the occurrence of $\lambda$ does not constitute a renewal of the process. The reason is: if $\lambda$ has occurred at trial n, the chance that it will recur at trial n + 1 is p, but the chance that $\lambda$ occurs on the first trial is 0.
Example 5.3
In most situations, the event of breaking a record is not a renewal event. Consider the record high temperature: each new record is higher than the last and is therefore harder to break. Thus, the waiting time for the next occurrence tends to become longer and longer, so record breaking cannot be a renewal event.
Example 5.4
The simple random walk provides a rich source of examples of renewal events. As before, we assume $X_0 = 0$ and $X_n = X_{n-1} + Z_n$, where $Z_n = +1$ or $-1$ with respective probabilities p and q, independently, n = 1, 2, . . ..
a) Let $\lambda$ represent return to the origin. Then $\lambda$ is a renewal event. In fact, the notation that we used in our analysis of the simple random walk will motivate our choice of notation for recurrent events as introduced in the next section.
b) Let $\lambda$ represent a ladder point in the walk. By this we mean that $\lambda$ occurs at trial n if
$$X_n = \max\{X_0, X_1, \ldots, X_{n-1}\} + 1,$$
and we assume $\lambda$ to have occurred at trial 0. Thus, the first occurrence of $\lambda$ corresponds to first passage through 1, the second occurrence of $\lambda$ corresponds to first passage through 2, and so on. Here again, $\lambda$ is a renewal event, since each ladder point corresponds to a regeneration of the process.
c) As a final example, suppose that $\lambda$ is said to occur at trial n if the number of positive values in $Z_1, \ldots, Z_n$ is exactly twice the number of negative values. Equivalently, $\lambda$ occurs at trial n if and only if $X_n = n/3$.
5.2
Let $\lambda$ represent a renewal event and, as before, define the lifetime sequence $\{f_n\}$, where $f_0 = 0$ and
$$f_n = P\{\lambda \text{ occurs for the first time at trial } n\}, \qquad n = 1, 2, \ldots.$$
Define
$$f = \sum_{n=1}^{\infty} f_n = F(1) \le 1,$$
since f has the interpretation of the probability that $\lambda$ recurs at some time in the sequence. Since the event may not occur at all, it is possible for f to be less than 1. Clearly, 1 - f represents the probability that $\lambda$ never recurs in the infinite sequence of trials.
If f < 1, the waiting time for $\lambda$ to occur is not really a random variable, because it has probability 1 - f of being infinite, which is not allowed for a random variable. For this kind of renewal event, after each occurrence of $\lambda$, there is a probability 1 - f that it will never occur again. The probability that it will occur exactly k times is $f^k(1-f)$, k = 0, 1, . . .. (Use the model that we toss a coin to decide whether it will recur.)
We may compute the chance of $\lambda$ occurring at most 100 times as
$$\sum_{k=0}^{100} f^k(1-f) = 1 - f^{101}.$$
In general, the chance of $\lambda$ occurring at most m times is
$$\sum_{k=0}^{m} f^k(1-f) = 1 - f^{m+1}.$$
Thus, as m tends to infinity, the chance for $\lambda$ to occur no more than m times tends to 1. Based on this fact, we claim that such a renewal event occurs finitely often. It is transient.
If f = 1, then $\lambda$ will occur at some time in the future with probability one. Only then can we discuss the waiting time for the next occurrence of $\lambda$. For a renewal event $\lambda$ with this property, the waiting times from the nth occurrence to the (n + 1)th (hence called inter-occurrence times) are independent and have the same distribution. The function F(s) defined above is then a proper probability generating function, and the mean inter-occurrence time is
$$\mu = F'(1) = \sum_{n=0}^{\infty} n f_n.$$
5.3
Some Properties
For a renewal event $\lambda$ to occur at trial n >= 1, either $\lambda$ occurs for the first time at n, with probability $f_n = f_n u_0$, or $\lambda$ occurs for the first time at some intermediate trial k < n and then occurs again at n. The probability of this latter event is $f_k u_{n-k}$. Noting that $u_0 = 1$ and $f_0 = 0$, we therefore have
$$u_n = f_0 u_n + f_1 u_{n-1} + \cdots + f_{n-1} u_1 + f_n u_0, \qquad n = 1, 2, \ldots.$$
This equation is called the renewal equation.
Using the typical generating function methodology, we get
$$U(s) - 1 = F(s)U(s).$$
Hence
$$U(s) = \frac{1}{1-F(s)} \qquad \text{or} \qquad F(s) = 1 - \frac{1}{U(s)}.$$
Recall that when we discussed the simple random walk, we found in that context
$$U(s) = (1-4pqs^2)^{-1/2}, \qquad F(s) = 1 - \sqrt{1-4pqs^2}.$$
It is simple to see that this relationship is true.
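The relation $U(s) = 1/(1-F(s))$ is equivalent to the renewal equation, which gives a direct recursion for computing $u_n$ from the lifetime sequence $f_n$. A small sketch (the geometric lifetime used as a check is our choice; for the event "trial n is a success" one has $u_n = p$ for every n >= 1):

```python
def u_from_f(f, n_max):
    """Renewal equation: u_n = sum_{k=1}^n f_k u_{n-k}, with u_0 = 1."""
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(f[k] * u[n - k] for k in range(1, n + 1)))
    return u

# check: geometric lifetime f_n = q^{n-1} p ("success at trial n"),
# for which u_n = p for every n >= 1
p, q = 0.3, 0.7
f = [0.0] + [q ** (n - 1) * p for n in range(1, 21)]
u = u_from_f(f, 20)
print(u[1], u[5], u[20])    # all equal to p = 0.3
```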
The concepts defined in the last section are all related to the $\{u_n\}$ sequence; we summarize this in the following.
Theorem 5.1
The renewal event $\lambda$ is
1. transient if and only if $u = \sum_{n=0}^{\infty} u_n = U(1) < \infty$;
2. null recurrent if and only if $\sum_{n=0}^{\infty} u_n = \infty$ and $u_n \to 0$ as $n \to \infty$.
Proof of 1 and 2: note that
$$\sum_{n=0}^{\infty} u_n = \lim_{s \to 1} U(s) \qquad \text{and} \qquad \mu = \sum_{n=1}^{\infty} n f_n = F'(1).$$
Example 5.5
Let $\lambda$ represent the occurrence of F F S in a sequence of Bernoulli trials with P(S) = p and P(F) = q (p + q = 1). In this case
$$u_0 = 1, \qquad u_1 = u_2 = 0,$$
and
$$u_n = pq^2, \qquad n = 3, 4, \ldots.$$
Thus
$$U(s) = 1 + pq^2(s^3 + s^4 + \cdots) = 1 + \frac{pq^2 s^3}{1-s}, \qquad |s| < 1,$$
and
$$F(s) = 1 - [U(s)]^{-1} = \frac{pq^2 s^3}{1 - s + pq^2 s^3}.$$
Note that F(1) = f = 1, so that $\lambda$ is recurrent. Since $u_n \to pq^2 > 0$ as $n \to \infty$, it follows that $\lambda$ is positive recurrent and the mean inter-occurrence time is $\mu = (pq^2)^{-1}$.
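The conclusion that the long-run rate of occurrence of F F S equals $1/\mu = pq^2$ can be checked by simulation. The sketch below (with p = q = 0.5, so that $\mu = 8$) is illustrative only:

```python
import random

random.seed(2)
p = 0.5                            # P(S); P(F) = 1 - p; mu = 1/(p q^2) = 8
trials = 200_000
seq = [random.random() < p for _ in range(trials)]   # True = S, False = F
# lambda occurs at trial n when trials n-2, n-1, n give F, F, S
hits = sum(1 for n in range(2, trials)
           if not seq[n - 2] and not seq[n - 1] and seq[n])
est = trials / hits                # observed mean inter-occurrence time
print(est)                         # close to 8
```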
Example 5.6
Consider again the simple random walk and let $\lambda$ represent return to the origin. It is known that
$$F(s) = 1 - \sqrt{1-4pqs^2}, \qquad |s| \le 1,$$
and
$$U(s) = (1-4pqs^2)^{-1/2}.$$
Since $f_n = 0$ for all odd n and non-zero for even n, it follows that $\lambda$ is periodic with period d = 2. If p = q, then F(1) = f = 1, so that $\lambda$ is in this case recurrent. If $p \ne q$,
$$f = F(1) = 1 - |p-q| < 1$$
and $\lambda$ is transient. When p = q, $\mu = \lim_{s\to 1} F'(s) = \infty$, so that $\lambda$ is null recurrent.
5.4
Let $\lambda$ be a delayed renewal event: its waiting time until the first occurrence has lifetime sequence $\{b_n\}$ with generating function B(s), |s| < 1, while after the first occurrence it renews according to a lifetime sequence $\{f_n\}$ with generating function F(s). Let $v_n = P(\lambda$ occurs at trial $n)$, with generating function V(s). The event $\lambda$ is recurrent if $f = \sum f_n = 1$ and transient if f < 1. Periodicities are determined by examining g.c.d. $\{n : f_n > 0\}$. Note that it is possible that $\lambda$ is a recurrent event and yet there is a non-zero probability that $\lambda$ will never occur; but once it does occur, it then occurs infinitely often.
To find V(s), note that when $\lambda$ occurs at trial n >= 1, either $\lambda$ occurs for the first time at n, with probability $b_n = b_n u_0$, or $\lambda$ occurs for the first time at some intermediate trial k < n and then occurs again at n. Thus,
$$v_n = b_0 u_n + b_1 u_{n-1} + \cdots + b_n u_0, \qquad n = 0, 1, 2, \ldots.$$
We recognize the right side as the convolution of $\{b_n\}$ with $\{u_n\}$, and so
$$V(s) = B(s)U(s) = B(s)[1-F(s)]^{-1}, \qquad |s| < 1.$$
Example 5.7
Consider the simple random walk and let $\lambda$ represent passage through 1. Thus, $\lambda$ occurs at trial n if $X_n = 1$. Then $\lambda$ is a delayed renewal event. In the notation used for the random walks,
$$b_n = \phi_n = P(\text{first passage at trial } n)$$
and $B(s) = \Phi(s)$. Once $\lambda$ occurs, the probability that it recurs after n additional steps is the same as the probability of a return to the origin in n steps. Thus
$$U(s) = (1-4pqs^2)^{-1/2},$$
and $\{v_n\}$, where $v_n = P(X_n = 1)$, n = 1, 2, . . ., has generating function
$$V(s) = \Phi(s)U(s) = \frac{1-\sqrt{1-4pqs^2}}{2qs\sqrt{1-4pqs^2}}.$$
That is, we have to wait until the event occurs once, and from then on it behaves as a renewal event.
The conclusions we obtained here will be very useful when we discuss Markov chains.
5.5 Summary
Table 5.1: Summary of some concepts

Event: A property of a stochastic process; its occurrence can be determined after n trials.

Renewal Event: When this type of event occurs, the stochastic process undergoes a renewal: the random behavior of the process from this point on is the same as that of the process from time zero.

Delayed Renewal Event: From the second occurrence onwards, the process undergoes a renewal at each occurrence: the random behavior of the process from this point on is the same as that of the process from the time of the first occurrence.

Recurrent: The renewal event will occur with probability 1.

Transient: The renewal event may never occur.

Positive Recurrent: The renewal event is recurrent and the expected waiting time for the next occurrence is finite.

Null Recurrent: The renewal event is recurrent but the expected waiting time for the next occurrence is infinite.

Period: The greatest common divisor of the numbers of trials after which the renewal event can occur.

Aperiodic: The period of the renewal event is 1.

5.6 Problems
$$\frac{s^3}{s^3 + 216(1-s)}.$$
(c) Is the renewal event periodic? Is it recurrent? If so, find the mean inter-occurrence time.
2. In a simple random walk, examine whether the following are renewal events.
(a) $\lambda$ is said to occur at trial n if at trial n a return to the origin from the positive side takes place.
(b) $\lambda$ is said to occur at trial n if at trial n the walk is to the right of the origin.
3. The lifetime distribution of a fuse is given by $f_n = \theta^{n-1}(1-\theta)$, n = 1, 2, . . ..
(a) Show that $P(X = m + n \mid X > m) = f_n$, n = 1, 2, . . ..
(b) Suppose that a new fuse is placed in service on day 0 and immediately upon its failure is replaced with an identical fuse. Also assume that the lifetimes are i.i.d. random variables with the distribution given above. The event $\lambda$ is said to occur at trial n if a new fuse is put in service at trial n. Note that $\lambda$ is a recurrent event. Obtain F(s) and U(s), and hence determine $u_n$ for this event.
(c) Let T be the survival time of the fuse in service at time n (if a failure occurs at time n, the fuse in service is the replacement). Write T as a sum of indicator variables $Y_0, Y_1, \ldots$, where $Y_i = 1$ if the fuse in service at n is also in service at time i, and 0 otherwise. Show that
$$E(T) = \frac{1 + \theta - \theta^{n+1}}{1-\theta}.$$
(a) Show that
$$u_{2n} = 4^{-2n}\binom{2n}{n}\sum_{i=0}^{n}\binom{n}{i}^2.$$
(b) Show that the particle returns to the origin with probability 1. Argue from this result that the particle must pass through every point in the integer lattice.
5. Consider a renewal event with the $\{f_n\}$ sequence having generating function F(s). Let $N_k$ denote the number of occurrences in the first k trials and let $q_{k,n} = P(N_k = n)$. Show that $Q_n(s) = \sum_k q_{k,n} s^k$ is given by
$$Q_n(s) = \frac{\{1-F(s)\}F^n(s)}{1-s}.$$
$$\frac{4s^3}{27 - 27s + 4s^3}.$$
(c) Is it a recurrent renewal event? If so, find the mean inter-occurrence time. Otherwise, find the probability that $\lambda$ will ever occur again.
7. (Self-organizing data retrieval system) Consider a shelf containing two books, $B_1$ and $B_2$ (among others). These books have two possible orders on the shelf, namely $B_1B_2$ or $B_2B_1$. Assume that at epochs n = 0, 1, 2, . . ., a book is required by a library user, and that at any epoch the probability that $B_j$ is needed is $p_j$, j = 1, 2, independently of what happens at other epochs. Assume $p_1 > 0$, $p_2 > 0$, $p_1 + p_2 < 1$. To obtain the required book, the librarian always searches the bookshelf from left to right, so that the average search time for the requested book is minimized if the book with the higher $p_j$ value is on the left.
However, the librarian does not know which book is more popular, and therefore cannot decide whether $B_1B_2$ or $B_2B_1$ is the better arrangement. To increase the chance of having the requested book nearer the left much of the time, the following algorithm has been devised: whenever any book is requested, it is placed at the left end of the shelf when it is returned. Thus if $B_2$ is demanded and the shelf order is $B_1B_2$, the new arrangement will be $B_2B_1$ once $B_2$ is returned.
(a) Let $\lambda$ be the renewal event "shelf order of $B_1$ and $B_2$ is $B_1B_2$". Let $f_n$ be the lifetime sequence for $\lambda$, having generating function F(s). Show that
$$F(s) = (1-p_2)s + \frac{p_1 p_2 s^2}{1-(1-p_1)s}.$$
(b) Show that $\lambda$ is aperiodic and recurrent. Hence determine $\lim u_n = \lim P(\lambda$ at epoch $n)$. Determine also the long-run probability that the shelf order is $B_2B_1$.
Chapter 6
Discrete Time Markov Chain
6.1
Introduction
are discrete random variables. Since it is always possible to label the set of possible values of the $X_n$'s by non-negative integers, we assume that each $X_n$ takes only non-negative integer values. When a stochastic process takes values other than non-negative integers, most of our conclusions will remain valid.
The most important additional assumption we make on the stochastic process is the following Markov property:
$$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P\{X_{n+1} = j \mid X_n = i\} = P\{X_1 = j \mid X_0 = i\} = p_{ij}.$$
The first equality specifies the Markov property. It is often described as the property that given the present ($X_n = i$), the future ($X_{n+1} = j$) is independent of the past (the outcomes of $X_0, X_1, \ldots, X_{n-1}$). The second equality further requires the Markov property to be time homogeneous; that is, the conditional probability does not depend on the time n. The third equality simply assigns a special notation. We call this quantity the transition probability from state i to state j.
The set of all possible values of the $X_n$'s is called the state space. The subscripts of $X_n$ are regarded as time; that is, the value of $X_n$ is the state of the process at time n. Unless otherwise mentioned, the state space will be denoted $\{0, 1, 2, \ldots\}$ and the time will also run over $\{0, 1, 2, \ldots\}$. If $X_n = i$, we say that the Markov chain is in state i at time n.
It should be clear that the state space contains all the possible values of $X_1$, and all possible values of $X_2$, and all possible values of $X_3$, and so on. It is not dictated by any single random variable.
Example 6.1
Suppose $X_0 = 0$, $X_1$ has a discrete uniform distribution on {0, 1}, $X_2$ has a uniform distribution on {0, 1, 2}, and so on. In general, $X_n$ has a discrete uniform distribution on $\{0, 1, 2, \ldots, n\}$ for n = 0, 1, 2, . . ..
The stochastic process $\{X_n\}_{n=0}^{\infty}$ has state space S = {0, 1, 2, . . .}. The state space is NOT {0, 1, . . . , n}.
Definition 6.1
A stochastic process is a discrete time Markov chain if it
1. consists of a sequence of random variables (that is, countably many),
2. has a countable state space, and
3. has the Markov property.
The transition probabilities satisfy
$$p_{ij} \ge 0, \quad i, j \ge 0; \qquad \sum_{j=0}^{\infty} p_{ij} = 1, \quad i = 0, 1, 2, \ldots.$$
For example, a two-state chain may have transition probability matrix
$$P = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}.$$
For the simple random walk,
$$p_{ij} = \begin{cases} p, & j = i+1;\\ 1-p, & j = i-1;\\ 0, & |i-j| \ne 1. \end{cases}$$
Another example is
$$p_{ij} = \binom{2i}{j} p^j (1-p)^{2i-j}.$$
Suppose that there are only two possible weather conditions for any single day: rain or sunny. In addition, we assume that tomorrow's weather depends on today's weather, but not on previous weather conditions once today's weather is given. Also, the chance of rain tomorrow given that today is rainy is $\alpha$, and the chance of it being sunny tomorrow given that today is sunny is $\beta$. We can model this experiment as a discrete time Markov chain.
First, we define
$$Y_n = \begin{cases} 0, & \text{if it rains on day } n;\\ 1, & \text{if day } n \text{ is sunny.} \end{cases}$$
The state space is clearly {0, 1} by the above definition, and is countable. The Markov property is satisfied as it is clearly stated in the description of the problem. We certainly cannot blindly believe that the real world can indeed be modeled by a stochastic process with the Markov property. At the same time, it is hoped that this is a harmless mathematical assumption. The transition probability matrix is
$$P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}.$$
Suppose instead that tomorrow's weather depends on the weather both today and yesterday, with the conditional probabilities of rain tomorrow given by:

Yesterday   Today   P(Tomorrow = R)
R           R       0.7
S           R       0.5
R           S       0.4
S           S       0.2

Code the pairs of (yesterday, today) weather conditions as states:

Yesterday   Today   X_n
R           R       0
S           R       1
R           S       2
S           S       3

The transition probability matrix is then
$$P = \begin{pmatrix} 0.7 & 0 & 0.3 & 0\\ 0.5 & 0 & 0.5 & 0\\ 0 & 0.4 & 0 & 0.6\\ 0 & 0.2 & 0 & 0.8 \end{pmatrix}.$$
When trying to use a probability model to describe a real-world phenomenon, we can be certain that the model is not exactly correct. We are happy to learn that it might still be very useful. Further, by increasing the model complexity, we can often obtain a model that is closer to reality and very useful.
Example 6.6
Suppose there are 3 white and 3 black balls distributed between two urns, each containing 3 balls. At each step, we draw a ball randomly from each urn; we then exchange the two balls and put them back into the urns. We are interested in the number of white balls in the first urn after n exchanges.
Let $X_n$ be the number of white balls in the first urn after n exchanges. We will show that $\{X_n\}$ is a Markov chain.
Step 1: It is obvious that $X_n$ can only be 0, 1, 2, or 3, regardless of n. Therefore, the state space is {0, 1, 2, 3}, which is countable.
Step 2: We need to verify the Markov property. First note that
$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots) = 0 \quad \text{if } |i-j| \ge 2.$$
The notation $i_{n-1}$ in the above expression simply means some number. The equation implies that the transition probability from state i to state j does not depend on the value of $i_{n-1}$ or the others; it equals 0 as long as $|i-j| \ge 2$.
We have a number of other cases. If i = 0 and j = 0, we find
$$P(X_{n+1} = 0 \mid X_n = 0, \text{whatever others}) = 0.$$
This is because we will definitely draw a white ball from the second urn. Obviously,
$$P(X_{n+1} = 1 \mid X_n = 0, \text{whatever others}) = 1.$$
There is no need to consider other cases when i = 0.
If i = 3, we have
$$P(X_{n+1} = 2 \mid X_n = 3, \text{whatever others}) = 1.$$
If i = 2, we have
$$P(X_{n+1} = 2 \mid X_n = 2, \text{whatever others}) = \frac{4}{9},$$
and similarly
$$P(X_{n+1} = 1 \mid X_n = 1, \text{whatever others}) = \frac{4}{9}, \qquad P(X_{n+1} = 0 \mid X_n = 1, \text{whatever others}) = \frac{1}{9}.$$
As none of the above transition probabilities depend on the "whatever" part, the Markov property has been verified. The transition probability matrix is
$$P = \begin{pmatrix} 0 & 1 & 0 & 0\\[2pt] \frac19 & \frac49 & \frac49 & 0\\[2pt] 0 & \frac49 & \frac49 & \frac19\\[2pt] 0 & 0 & 1 & 0 \end{pmatrix}.$$
This completes the task of modeling this experiment as a Markov chain.
The above model turns out to be very useful for describing diffusion processes in the physical world. If you drop some colored water into a cup of clean water, the color soon spreads out; it becomes perfectly mixed with the water without our help. By imagining molecules moving randomly from one part of the cup to another, we can see that, in the limit, the color molecules will be distributed uniformly over the whole cup.
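The transition probabilities of the urn model can be computed mechanically: X increases when a black ball is drawn from urn 1 and a white ball from urn 2, and decreases in the opposite case. A sketch (the helper function is ours, not from the notes):

```python
from fractions import Fraction

def urn_row(i, m=3):
    """One-step transition probabilities from state i
    (i = number of white balls in the first urn; m balls per urn)."""
    up   = Fraction(m - i, m) ** 2   # black drawn from urn 1, white from urn 2
    down = Fraction(i, m) ** 2       # white drawn from urn 1, black from urn 2
    stay = 1 - up - down
    return {i - 1: down, i: stay, i + 1: up}

print(urn_row(2))   # {1: Fraction(4, 9), 2: Fraction(4, 9), 3: Fraction(1, 9)}
```

Calling `urn_row(0)` reproduces the boundary case worked out above: the chain moves to state 1 with probability one.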
6.2
Chapman-Kolmogorov Equations
The first question has a specific answer; the second one can be answered in principle. For the urn example,
$$P(X_2 = 1 \mid X_0 = 1) = \frac{P(X_2 = 1, X_0 = 1)}{P(X_0 = 1)} = \frac{\sum_{j=0}^{3} P(X_2 = 1, X_1 = j, X_0 = 1)}{P(X_0 = 1)} = \sum_{j=0}^{3} p_{1j}p_{j1} = \frac{41}{81}.$$
In general, the two-step transition probabilities are
$$p_{ij}^{(2)} = \sum_{k} p_{ik}\, p_{kj}.$$
The above formula may look complex. However, if expressed in matrix format, it becomes
P (2) = P 2
where P (2) is the two-step transition probability matrix and P is the one step
transition probability matrix. In general, we have
P (m) = P m .
When the state of the Markov chain at time 0 is not given, but we know the distribution of $X_0$ as
$$P(X_0 = i) = \alpha_i,$$
then
$$P(X_1 = j) = \sum_{k=0}^{\infty} \alpha_k\, p_{kj}.$$
Let $\alpha$ be the row vector of the $\alpha_i$ and $\beta$ the row vector of the $P(X_1 = j)$; we have
$$\beta = \alpha P.$$
Similarly, if $\beta_m$ is the row vector of the $P(X_m = j)$, we have
$$\beta_m = \alpha P^m.$$
The above formula $P^{(m)} = P^m$ and its generalized form
$$P^{(m)} P^{(n)} = P^{(m+n)}$$
are called the Chapman-Kolmogorov equations. They are simple and straightforward; I am sure that if you had been born a few centuries earlier, it could have been your name attached to these formulas.
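The Chapman-Kolmogorov equations are easy to verify numerically. The following sketch squares the transition matrix of the urn example and recovers the two-step probability 41/81 computed above:

```python
import numpy as np

P = np.array([[0,     1,     0,     0    ],
              [1 / 9, 4 / 9, 4 / 9, 0    ],
              [0,     4 / 9, 4 / 9, 1 / 9],
              [0,     0,     1,     0    ]])
P2 = P @ P            # two-step transition matrix P^(2) = P^2
print(P2[1, 1])       # 41/81 = 0.50617..., as computed above
```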
6.3
Classification of States
84
0 0
2
2
1 1
0 0
2
2
P=
1 1 1 1 .
- 0
1
6
@
I
@
@
@
@
@
- 3
2
85
86
0.25 to stay in the same state after one transition. However, once it leaves
state 2, the Markov chain will never re-enter this state.
According to our discussion of renewal events, it is also true that state i is transient if and only if
$$\sum_{n} P(X_n = i \mid X_0 = i) = \sum_{n} p_{ii}^{(n)} < \infty.$$
(Recall that we obtained this theorem by using generating functions.) Consequently, state i is recurrent if and only if
$$\sum_{n} P(X_n = i \mid X_0 = i) = \sum_{n} p_{ii}^{(n)} = \infty.$$
Recall that if a state i is transient, the Markov chain will visit i only a finite number of times (over the infinite time horizon). If a Markov chain has only a finite number of states, at least one of the states must be visited infinitely many times (over the infinite time horizon). We hence conclude:
Corollary 6.1
If a Markov chain has a finite state space, then at least one of its states is positive recurrent.
Suppose states i and j communicate and state i is recurrent. Then there exist $m_1$ and $m_2$ such that
$$p_{ij}^{(m_1)} > 0, \qquad p_{ji}^{(m_2)} > 0.$$
In addition,
$$p_{jj}^{(m_1+n+m_2)} \ge p_{ji}^{(m_2)}\, p_{ii}^{(n)}\, p_{ij}^{(m_1)},$$
so that
$$\sum_{n} p_{jj}^{(m_1+n+m_2)} \ge p_{ji}^{(m_2)}\Big[\sum_{n} p_{ii}^{(n)}\Big] p_{ij}^{(m_1)} = \infty.$$
Remarks: The above result implies that recurrence is a class property. If one state in a class is recurrent, then all the states in the same class are recurrent. Further, we see that transience is also a class property: if one state is transient, then all states in the same class are transient.
We claim without proof that positive recurrence and periodicity are also class properties.
Example 6.8
Let the Markov chain consisting of the states 0, 1, 2 and 3 have the transition probability matrix
$$P = \begin{pmatrix} 0 & 0 & \frac12 & \frac12\\[2pt] 1 & 0 & 0 & 0\\[2pt] 0 & 1 & 0 & 0\\[2pt] 0 & 1 & 0 & 0 \end{pmatrix}.$$
Determine which states are transient and which are recurrent.
Solution: The Markov chain is irreducible and has a finite state space; hence all states are recurrent.
Example 6.9
Consider the Markov chain having states 0, 1, 2, 3, 4 and transition probability matrix
$$P = \begin{pmatrix} \frac12 & \frac12 & 0 & 0 & 0\\[2pt] \frac12 & \frac12 & 0 & 0 & 0\\[2pt] 0 & 0 & \frac12 & \frac12 & 0\\[2pt] 0 & 0 & \frac12 & \frac12 & 0\\[2pt] \frac14 & \frac14 & 0 & 0 & \frac12 \end{pmatrix}.$$
Classify the state space (including identifying the transient, positive recurrent and null recurrent classes, and the period of each class).
Solution: This chain consists of the three classes {0, 1}, {2, 3} and {4}. The first two classes are positive recurrent and the third is transient. All classes are aperiodic.
For the simple random walk,
$$p_{00}^{(2n)} = \binom{2n}{n} p^n (1-p)^n \sim \frac{\{4p(1-p)\}^n}{\sqrt{\pi n}}.$$
Hence, when $4p(1-p) < 1$, $\sum_n p_{00}^{(n)} < \infty$ and state 0 is transient. This happens when $p \ne 0.5$. Otherwise the sum is infinite and state 0 is recurrent.
It turns out that the two-dimensional random walk (on the grid) has a similar property: if the probabilities of moving in the four directions are all equal, then all the states are recurrent. Simple random walks in three or higher dimensions lose this property; all states are transient.
Remark: It is often asked whether a closed class is necessarily a recurrent class. The simple random walk answers this question: when $p \ne 0.5$, the state space is a closed class, but it is also a transient class.
Remark: We have discussed many ways to find out whether a state is recurrent or transient. It may not be clear to an inexperienced user which method should be applied. My rules of thumb are:
6.4
Limiting Probabilities
For an irreducible ergodic Markov chain, $\lim_{n\to\infty} p_{ij}^{(n)}$ exists and is independent of the initial state i. Further, letting
$$\pi_j = \lim_{n\to\infty} p_{ij}^{(n)},$$
the $\pi_j$ are the unique non-negative solution of
$$\pi_j = \sum_{i=0}^{\infty} \pi_i\, p_{ij}, \qquad \sum_{j=0}^{\infty} \pi_j = 1.$$
Letting $n \to \infty$ on both sides, under the assumption that the limit of $p_{ij}^{(n)}$ exists, we have
$$\pi = \pi P.$$
That is, $\pi$ must be a solution of this equation. However, $P - I$ does not have full rank, so there exist many solutions. The one which also satisfies $\sum_{j=0}^{\infty} \pi_j = 1$ gives the limiting probabilities; the solution with this property is unique.
The renewal theorem asserts that $\pi_j = \frac{1}{\mu_j}$, where $\mu_j$ is the expected inter-occurrence time of visits to state j. Hence $\pi_j$ is the long-run proportion of time the Markov chain spends in state j.
When the Markov chain is irreducible and positive recurrent, but not aperiodic, we may still have a unique non-negative solution of
$$\pi = \pi P$$
satisfying $\sum_j \pi_j = 1$. Conversely, if such a solution exists and the Markov chain is irreducible, then all states are positive recurrent.
Recall that $P(X_n = j) = \sum_{i=0}^{\infty} p_{ij}^{(n)} P(X_0 = i)$. Letting $n \to \infty$ results in
$$\lim_{n\to\infty} P(X_n = j) = \pi_j.$$
If $\beta_n = \pi$, then $\beta_{n+m} = \pi$ for all m = 1, 2, . . .. Hence, we say that $\pi$ is the stationary distribution of the Markov chain. In some books, it is also called the steady state of the Markov chain. It can be seen that a stationary distribution may exist even when the Markov chain is reducible (not irreducible); in this case, there can exist more than one stationary distribution.
When $\beta_n = \pi$, we also say that the Markov chain has reached equilibrium. In this state, the rate at which the chain enters any given state is the same as the rate at which the chain leaves this state.
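The stationary distribution can be found numerically by solving $\pi(P - I) = 0$ together with $\sum_j \pi_j = 1$. For the urn example the answer is the hypergeometric distribution (1, 9, 9, 1)/20, which gives a convenient check; a sketch:

```python
import numpy as np

P = np.array([[0,     1,     0,     0    ],
              [1 / 9, 4 / 9, 4 / 9, 0    ],
              [0,     4 / 9, 4 / 9, 1 / 9],
              [0,     0,     1,     0    ]])
n = P.shape[0]
# stack pi (P - I) = 0 with the normalization sum(pi) = 1
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.zeros(n + 1); b[-1] = 1
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)             # [0.05 0.45 0.45 0.05] = (1, 9, 9, 1)/20
```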
Example 6.11
A problem of interest to sociologists is to determine the proportion of society that has an upper- or lower-class occupation. One possible mathematical model is to assume that transitions between the social classes of successive generations in a family can be regarded as transitions of a Markov chain. Let us follow a single family line through first children. Let $X_n = 0, 1, 2$ according to the social class of the child in the nth generation. Suppose $X_n$ is a Markov chain with transition probability matrix
$$P = \begin{pmatrix} 0.45 & 0.48 & 0.07\\ 0.05 & 0.70 & 0.25\\ 0.01 & 0.50 & 0.49 \end{pmatrix}.$$
Solving the equations $\pi P = \pi$ and $\sum_j \pi_j = 1$, we get $\pi = (0.07, 0.62, 0.31)$.
In other words, in the long run the child under consideration has a 7% chance of belonging to class 0. If this model applies to all individuals in the society, about 7% of the people in the population will belong to class 0 in the long run.
Example 6.12
Consider a large population of individuals, and consider their genotype at a specific locus. Each individual has a pair of genes at this locus. A gene can have different forms, called alleles. In this example, we assume that there are only two possible alleles, named A and a. In generation 0, the proportions of individuals with genotypes AA, aa and Aa are respectively $p_0$, $q_0$ and $r_0$ ($p_0 + r_0 + q_0 = 1$). The Mendelian law states that the child of a couple inherits one gene from each parent, and each of the two genes of a parent is equally likely to be the one transmitted to the offspring.
It is a bit difficult to define a sequence of random variables $X_n$ explicitly here. Consider a line of individuals such that each person is the first child of the person considered in the previous generation. This person serves as a representative of the general population.
Let $X_n$ be the genotype of the nth person in this line, n = 0, 1, 2, . . .. Assume
$$P(X_0 = AA) = p_0, \quad P(X_0 = Aa) = r_0, \quad P(X_0 = aa) = q_0.$$
That is, the first individual is chosen randomly from the population. In addition, we assume his/her spouse is selected from the population randomly (at least in terms of genotype). Let $Y_n$ be the genotype of the spouse of the nth person. We assume that $Y_n$ has the same distribution as $X_n$.
$$\begin{aligned} P(X_1 = AA \mid X_0 = AA) &= P(X_1 = AA, Y_0 = AA \mid X_0 = AA)\\ &\quad + P(X_1 = AA, Y_0 = Aa \mid X_0 = AA)\\ &\quad + P(X_1 = AA, Y_0 = aa \mid X_0 = AA)\\ &= p_0 + \tfrac12 r_0 + 0. \end{aligned}$$
Similar calculations give the transition probability matrix (rows and columns ordered AA, Aa, aa)
$$P_n = \begin{pmatrix} p_n + \frac{r_n}{2} & q_n + \frac{r_n}{2} & 0\\[4pt] \frac{p_n}{2} + \frac{r_n}{4} & \frac12 & \frac{q_n}{2} + \frac{r_n}{4}\\[4pt] 0 & p_n + \frac{r_n}{2} & q_n + \frac{r_n}{2} \end{pmatrix}.$$
Using the notation $(p_n, r_n, q_n)$ for the distribution of $X_n$, we have for all $n \ge 1$,
$$(p_n, r_n, q_n) = \Big[\big(p_0 + \tfrac{r_0}{2}\big)^2,\ 2\big(p_0 + \tfrac{r_0}{2}\big)\big(q_0 + \tfrac{r_0}{2}\big),\ \big(q_0 + \tfrac{r_0}{2}\big)^2\Big].$$
Our computational results imply that the distribution of $X_n$ stabilizes after one generation. It is simple to verify that all $P_n$, $n \ge 0$, are in fact equal. Hence $\{X_n\}_{n=0}^{\infty}$ is a Markov chain; its limiting distribution is given by the distribution of $X_1$, and the transition probability matrix by $P_1$.
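The one-generation stabilization (the Hardy-Weinberg principle) is easy to verify directly: applying the random-mating update twice returns the same distribution. A sketch (the starting values are hypothetical):

```python
def next_gen(p, r, q):
    """(P(AA), P(Aa), P(aa)) for a child under random mating,
    given the genotype distribution (p, r, q) of the parents."""
    A = p + r / 2          # frequency of allele A among transmitted genes
    a = q + r / 2          # frequency of allele a
    return A * A, 2 * A * a, a * a

d1 = next_gen(0.5, 0.2, 0.3)   # generation 1
d2 = next_gen(*d1)             # generation 2
print(d1)                      # approximately (0.36, 0.48, 0.16)
print(d2)                      # identical: the distribution is already stationary
```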
This result can also be regarded as an application of the Renewal Theorem (Theorem 5.2).
Example 6.14
If we want to know the expected time until the occurrence of the pattern THT, some caution has to be applied. Using the same argument as before, we can get the average time it takes to the next appearance of THT starting from THT, which is $\frac{1}{pq^2}$. However, starting from THT is different from starting from nothing. You may notice that THT is a delayed renewal event.
To calculate the average waiting time for the first occurrence of THT from the beginning, we note the following fact: once T has occurred, the waiting time distribution for the occurrence of THT from that moment is the same as the waiting time distribution for THT from the moment when THT last occurred. Therefore, we can simply calculate the average waiting time for the first occurrence of T. It happens that T is a renewal event, and the technique used in Example 6.13 applies.
The renewal theorem, or the limiting probability theorem for Markov chains, tells us that this average waiting time is $\frac{1}{q}$. Consequently, the average waiting time for THT to occur from the beginning is
$$\frac{1}{q} + \frac{1}{pq^2}.$$
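This answer can be checked by simulation. For a fair coin (p = q = 1/2) the formula gives $1/q + 1/(pq^2) = 2 + 8 = 10$; the sketch below is illustrative:

```python
import random

def wait_for_THT(p=0.5):
    """Number of tosses until the pattern THT first appears."""
    flips = ""
    while not flips.endswith("THT"):
        flips += "H" if random.random() < p else "T"
    return len(flips)

random.seed(3)
runs = 20_000
est = sum(wait_for_THT() for _ in range(runs)) / runs
print(est)    # close to 1/q + 1/(p q^2) = 2 + 8 = 10
```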
6.5
If a state is transient, then the Markov chain will leave this state for good after
some finite amount of time. Let f_i be the probability that the chain will
return to state i (starting from i). Then the number of future visits N_i (starting
from i) has geometric distribution
P(N_i = n) = f_i^n (1 − f_i),  n = 0, 1, . . . .
s_ij = δ_ij + Σ_{k=1}^{t} p_ik s_kj,
where t is the number of transient states and s_ij denotes the expected number
of visits to transient state j starting from transient state i.
Consider the gambler's ruin problem with p = 0.4 and N = 7. The class of
transient states consists of {1, 2, . . . , 6}. We can easily find p_{i,i+1} = 0.4 and
p_{i,i−1} = 0.6 for these transient states. Inverting I − P_T gives a big matrix
(using Splus or whatever software you can think of). It turns out that
s_{3,5} = 0.9228,  s_{3,2} = 2.3677.
How does the above calculation relate to the f_i we discussed earlier? Let f_ij
be the probability that, starting from state i, the Markov chain will ever visit
state j (again, if i = j). Hence, f_i = f_ii. It can be seen that
s_ij = (δ_ij + s_jj) f_ij + δ_ij (1 − f_ij)
= δ_ij + f_ij s_jj.
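The matrix inversion above can be reproduced with a few lines of numpy; the zero-based index translation (state 3 is row 2, and so on) is the only bookkeeping needed:

```python
import numpy as np

# Sketch: reproduce s_{3,5} and s_{3,2} for the gambler's ruin chain with
# p = 0.4 and N = 7 by inverting I - P_T over the transient states {1,...,6}.
p, m = 0.4, 6
PT = np.zeros((m, m))
for i in range(m):
    if i + 1 < m:
        PT[i, i + 1] = p        # win a bet: i -> i+1
    if i - 1 >= 0:
        PT[i, i - 1] = 1 - p    # lose a bet: i -> i-1

S = np.linalg.inv(np.eye(m) - PT)   # s_ij = expected visits to j from i
print(round(S[2, 4], 4), round(S[2, 1], 4))   # s_{3,5} = 0.9228, s_{3,2} = 2.3677

# f_{3,5}, the probability of ever visiting 5 from 3, follows from
# s_ij = delta_ij + f_ij * s_jj:
f35 = S[2, 4] / S[4, 4]
```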
6.6
Problems
1. Let {X_n}_{n=0}^∞ be a stochastic process.
(a) If for each fixed n, X_n has density function
f(x) = 1
three days, then it will rain today with probability 0.2; and in any other
case the weather today will, with probability 0.6, be the same as the
weather yesterday. Determine the transition matrix.
3. Let the transition probability matrix of a two-state Markov chain be
given by

P =
  p      1 − p
  1 − p  p

Show by mathematical induction that P^(n) can be found in closed form.
More generally, for
P =
  1 − a  a
  b      1 − b
with 0 < a, b < 1, show that

P^(n) = 1/(a + b) ·
  b  a
  b  a
+ (1 − a − b)^n/(a + b) ·
  a   −a
  −b   b
4. Determine which states are transient and which are recurrent for the
Markov chains with the following transition probability matrices:

P1 =
  0    0.5  0.5
  0.5  0    0.5
  0.5  0.5  0

P2 =
  0    0    0    1
  0    0    0    1
  1/2  1/2  0    0
  0    0    1    0

P3 =
  1/2  0    1/2  0    0
  1/4  1/2  1/4  0    0
  1/2  0    1/2  0    0
  0    0    0    1/2  1/2
  0    0    0    1/2  1/2

P4 =
  1/4  3/4  0    0    0
  1/2  1/2  0    0    0
  0    0    1    0    0
  0    0    1/3  2/3  0
  1    0    0    0    0
5. Do the same for the following transition probability matrices:

P5 =
  0    0    1
  1    0    0
  1/2  1/2  0

P6 =
  1/3  2/3  0    0    0    0
  2/3  1/3  0    0    0    0
  0    0    1/4  3/4  0    0
  0    0    1/5  4/5  0    0
  1/4  0    1/4  0    1/4  1/4
  1/6  1/6  1/6  1/6  1/6  1/6

P7 =
  1    0    0    0
  0    1    0    0
  1/3  1/3  1/3  0
  0    0    0    1

P8 =
  0    0    0    1/3  0    2/3
  0    0    0    3/4  1/4  0
  0    0    0    1/8  7/8  0
  1/4  1/4  0    1/8  3/8  0
  1/3  0    1/6  1/6  1/3  0
  0    0    0    0    0    1
6. Let {X_n}_{n=0}^∞ be a Markov chain with transition probability matrix

P =
  2/3  1/3  0    0
  3/4  1/4  0    0
  1/3  0    1/3  1/3
  0    0    0    1
1) Classify the state space into classes. Assume the state space is
{0, 1, 2, 3}.
2) Which of them are recurrent, and which are transient?
3) Find the period of state 2.
4) Find the expected inter-occurrence times for all recurrent states.
(The answers for some states should be obvious; limiting probabilities
are useful if you know how to get them.)
7. Prove that if the number of states in a Markov chain is M , and if state
j can be reached from state i, it can be reached in M steps or less.
8. A transition matrix P is said to be doubly stochastic if the sum over
each column equals one; that is,
Σ_i P_ij = 1  for all j.
If such a chain is irreducible and aperiodic with states 0, 1, . . . , M,
show that the limiting probabilities are
π_j = 1/(M + 1),  j = 0, 1, . . . , M.
9. Let {X_n}_{n=0}^∞ be a Markov chain with transition probability matrix

P =
  2/3  1/3  0    0
  1/2  1/2  0    0
  1/4  0    1/4  1/2
  0    0    0    1
10. Consider the transition matrix

P =
  1/4  0    3/4  0    0
  1/2  0    1/2  0    0
  3/4  0    1/4  0    0
  0    1/3  0    0    2/3
  0    0    0    0    1
(a) Show that S consists of 2 closed classes and 2 open classes. What
are these classes?
(b) Determine the period of each of the closed classes.
Note that it is impossible to return to either of the transient states 2
and 4 in this chain. In this case, we set the period of the state to be
infinity, to indicate that the chain cannot return to this state.
(c) Find the unique steady state corresponding to each of the closed
classes.
(d) Write down the general form of all steady states for P .
11. Consider the transition matrix

P =
  1/5  1/5  1/5  1/5  0    1/5
  0    1/3  0    2/3  0    0
  0    0    1/2  0    1/2  0
  0    3/5  0    2/5  0    0
  0    0    1/2  0    1/2  0
  1/4  0    1/4  0    1/2  0
(a) Show that S consists of two closed classes and one open class.
(b) Find the period of each of the three classes.
(c) Find the unique steady state corresponding to each closed class,
and write down the general form of all steady states for P .
(d) Find the probability of absorption into {1, 3} from state 0 and
the probability of absorption into {1, 3} from state 5. What can you
say about the probabilities of absorption in {2, 4} from states 0 and 5
respectively?
12. Consider the transition matrix

P =
  0  1/3  2/3  0    0
  0  0    0    1/4  3/4
  0  0    0    1/4  3/4
  1  0    0    0    0
  1  0    0    0    0
13. A Markov chain on the states 0, 1, . . . , a has transition probabilities

P_ij = i^2/a^2          if j = i − 1,
       (a − i)^2/a^2    if j = i + 1,
       2i(a − i)/a^2    if j = i   (i ≠ 0, a).

Show that the chain is ergodic and obtain the stationary distribution.
14. One form of a random walk with two reflecting barriers has transition
matrix given by
P_00 = 1 − p,  P_01 = p;
P_{j,j−1} = q,  P_{j,j} = r,  P_{j,j+1} = p  when 0 < j < a;
P_{a,a−1} = q,  P_{a,a} = 1 − q,
where p + q + r = 1 and all are non-zero. Show that the chain is irreducible
and aperiodic. Determine the stationary distribution for this chain.
15. Let {Z_n}_{n=0}^∞ be a branching process with the family size distribution
given by P(X = 0) = 1/3, P(X = 2) = 2/3.
1) State the definition of a Markov chain.
2) Verify that {Z_n}_{n=0}^∞ is a Markov chain. Calculate the transition
probabilities p_ij. (Think about situations such as i = 0; j = 0; j odd,
etc.)
3) Classify the state space. Indicate whether the states are recurrent or
transient. Give a one-line explanation.
4) Can you find a stationary distribution?
16. Each morning an individual leaves his house and goes for a run. He is
equally likely to leave either from his front or back door. Upon leaving
the house, he chooses a pair of running shoes (or goes running barefoot
if there are no shoes at the door from which he departed). On his return
he is equally likely to enter, and leave his running shoes, either by the
front or back door. If he owns a total of k pairs of running shoes, what
proportion of the time does he run barefooted?
a) Are {X_n}_{n=0}^∞ and {Y_n}_{n=0}^∞ Markov chains? If any of them are, write
down their state spaces and transition matrices and do the usual classification.
25. Show that if state i is recurrent and state i does not communicate with
state j, then p_ij = 0. This implies that once a process enters a recurrent
class of states, it can never leave that class.
Chapter 7
Exponential Distribution and
the Poisson Process
Recall that we commented that real world processes are often too complex
to be analyzed based on our current mathematical knowledge. We hence often
restrict ourselves to stochastic processes with simple and nice mathematical
properties. Hopefully, the results we develop are still applicable to the real
world, at least approximately. If the model is too far off, we may then increase the
complexity of our model to see if the generalized model helps.
The discrete time Markov chain ignores the duration between two transitions.
It is often still satisfactory when we use it to model the gambling
problem, English text, or even music notes. When it is used to model
a population size, the idea of a generation is obviously too rough. It is very
important to recognize that some individuals may give birth at a younger age than
others.
The waiting time for the next transition (when one gives birth, for example)
should clearly be regarded as the outcome of a random variable. It
turns out that a simple yet more realistic assumption on its distribution is
the exponential. Unlike the normal distribution, the exponential distribution is
non-negative, its cumulative distribution function has a simple form, and it
has the memoryless property.
7.1
If X has exponential distribution with rate λ, then
E(X) = 1/λ,  Var(X) = 1/λ^2.
7.2
P(X ∈ (t, t + dt) | X > t) = f(t)dt/(1 − F(t)) = r(t)dt.
We call
r(t) = f(t)/(1 − F(t))
the hazard rate. The hazard rate for the exponential distribution is a constant.
That is, it does not depend on t.
Our lifetime distribution has a non-constant hazard rate for obvious
reasons. Hence, it does not make sense to use an exponential model for insurance
companies. However, the hazard rate remains almost constant over a period of
time, say from age 22 to 40. An exponential model is still helpful in many
ways.
Example 7.1
Let X_1, X_2, . . . , X_n be independent exponential random variables with respective
rates λ_1, . . . , λ_n, where λ_i ≠ λ_j when i ≠ j. Let N be a random
variable independent of these random variables, such that
p_j = P(N = j),  Σ_{j=1}^{n} p_j = 1.
Then the density function of X_N is
f(t) = Σ_{j=1}^{n} p_j λ_j exp{−λ_j t}.
Example 7.2
Let X_i, i = 1, 2, . . . , n, be iid random variables with exponential distribution
and rate λ. Then the density function of S_n = X_1 + · · · + X_n is
f_n(t) = λ^n t^{n−1} exp(−λt)/(n − 1)!,
which is the gamma density with n degrees of freedom and scale parameter 1/λ.
Example 7.3
Assume X_1 and X_2 are two independent exponential random variables with
rates λ_1 and λ_2. Then
1. min(X_1, X_2) has exponential distribution with rate λ_1 + λ_2.
2. P(X_1 < X_2) = λ_1/(λ_1 + λ_2).
The same conclusions hold for n independent exponential random variables:
the minimum is exponential with rate Σ_i λ_i.
7.3
We start with the simplest continuous time stochastic process. Suppose, starting
from a conceptual beginning at t = 0, we are able to determine the value of
N(t) for each given t > 0 such that N(t) represents the number of occurrences
of some incidents (events). We say {N(t) : t ≥ 0} is a counting process.
Let {N(t) : t ≥ 0} be a stochastic process. To qualify as a counting
process, it must satisfy
(i) N(t) ≥ 0;
(ii) N(t) is integer valued;
(iii) if s < t, then N(s) ≤ N(t).
We say the counting process has independent increments if
N(t_2) − N(t_1) and N(s_2) − N(s_1)
are independent whenever the intervals (t_1, t_2] and (s_1, s_2] do not overlap.
P(N(t) = n) = (λt)^n exp(−λt)/n!,  n = 0, 1, . . . .
A function f is said to be o(h) if
lim_{h→0} f(h)/h = 0.
Theorem 7.1
The two definitions of the Poisson process are equivalent.
Proof: We will only show that conditions (iii) and (iv) in Definition 2 imply
condition (iii) in Definition 1.
We first work on P(N(t) = 0). Define P_0(t) = P(N(t) = 0).
Let h > 0 be a small number. Then
P_0(t + h) = P(N(t) = 0, N(t + h) − N(t) = 0)
= P(N(t) = 0) P(N(t + h) − N(t) = 0)   (independent increments)
= P_0(t) P(N(h) = 0)   (stationary increments)
= P_0(t){1 − P(N(h) = 1) − P(N(h) ≥ 2)}
= P_0(t){1 − λh + o(h)}.
Consequently, we have
[P_0(t + h) − P_0(t)]/h = −λP_0(t) + o(h)/h.
Let h → 0. The left hand side is the derivative of P_0(t) and the right hand
side gives the result. That is,
P_0′(t) = −λP_0(t).
The solution of this differential equation is given by
P_0(t) = exp(−λt)
in view of the boundary condition P_0(0) = 1.
Next, we build on top of this result. We use mathematical induction for
the other cases. Define
P_n(t) = P(N(t) = n)
and assume
P_k(t) = (λt)^k exp{−λt}/k!
for k = 0, 1, . . . , n − 1. We have shown that this assumption is true when
n = 1.
7.3.1
Let T_1 be the waiting time for the first event in a Poisson process with rate
λ. It is obvious that for all t ≥ 0,
P(T_1 > t) = P(N(t) = 0) = exp(−λt).
Hence, T_1 has exponential distribution with rate λ.
Now, let T_2 be the waiting time for the second event after the first event
has occurred. We call it an inter-arrival time. What is the distribution of T_2?
Note that
P(T_2 > t | T_1 = s) = P(N(s + t) − N(s) = 0) = exp(−λt).
Hence T_2 is also exponential with rate λ and independent of T_1, and the same
applies to all subsequent inter-arrival times.
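This characterization suggests a simple way to simulate a Poisson process: add up iid exponential gaps. A sketch with illustrative parameter values, checking that E[N(t)] comes out close to λt:

```python
import random

# Sketch: the inter-arrival times of a Poisson process are iid exponential(lam),
# so the process can be simulated by summing exponential gaps.  The average
# count over many runs should be close to lam*t.  lam = 2 and t = 3 are
# arbitrary illustration values.
random.seed(42)
lam, t, reps = 2.0, 3.0, 20000

total = 0
for _ in range(reps):
    clock, count = random.expovariate(lam), 0
    while clock < t:        # count events arriving before time t
        count += 1
        clock += random.expovariate(lam)
    total += count

print(total / reps)         # close to lam*t = 6.0
```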
If S_n = Σ_{i=1}^{n} T_i, then
P(S_n ≤ t) = P(N(t) ≥ n).
The density function of S_n is
f(t) = λ(λt)^{n−1} exp(−λt)/(n − 1)!.
The corresponding distribution is called the gamma distribution with n degrees
of freedom and scale parameter 1/λ.
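The identity P(S_n ≤ t) = P(N(t) ≥ n) can be checked numerically by integrating the gamma density and comparing with the Poisson tail; λ = 2, n = 3, t = 1.5 are arbitrary illustration values:

```python
import math

# Sketch: numerically verify P(S_n <= t) = P(N(t) >= n) for a Poisson process
# by integrating f(s) = lam*(lam*s)**(n-1)*exp(-lam*s)/(n-1)! over [0, t].
lam, n, t = 2.0, 3, 1.5

def gamma_density(s):
    return lam * (lam * s) ** (n - 1) * math.exp(-lam * s) / math.factorial(n - 1)

# trapezoidal rule for P(S_n <= t)
steps = 100000
h = t / steps
cdf = sum(gamma_density(i * h) for i in range(1, steps)) * h
cdf += 0.5 * (gamma_density(0.0) + gamma_density(t)) * h

# P(N(t) >= n) = 1 - sum_{k<n} exp(-lam*t)(lam*t)^k/k!
poisson_tail = 1.0 - sum(math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
                         for k in range(n))
print(abs(cdf - poisson_tail) < 1e-6)   # True
```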
7.4
Further Properties
Suppose that the events in a Poisson process can be classified into two types:
I and II. Further, this classification is random, and it is independent of the
process itself. For example, suppose we can model the number of customers
entering a store as a Poisson process. We classify customers into two types:
type one consists of customers who will buy something; type two consists of
customers who will just have a look. If we further assume that their purchasing
decisions are made independently, then we are in a situation where the
following model will apply.
Let N(t) be the original process. Let N_1(t) be the number of type one
events that occurred in [0, t]. Similarly define N_2(t).
Theorem 7.3
Under the assumption that each event in a Poisson process can be independently
classified as type I or type II, the two sub-counting processes are both
Poisson processes, with rates λp and λ(1 − p), where p is the probability of an
event being type I.
Proof: In this situation, it is more convenient to use the first definition of
the Poisson process.
We calculate P(N_1(t) = n, N_2(t) = m) for each pair of non-negative
integers. This will give us the joint distribution of N_1(t) and N_2(t). Whether
N_1(t) and N_2(t) are independent, and whether they each have a Poisson distribution,
will then be answered easily. The other conditions in the definitions
are obvious.
Here is our calculation:
P(N_1(t) = n, N_2(t) = m) = P(N_1(t) = n, N_1(t) + N_2(t) = n + m)
= P(N_1(t) = n, N(t) = n + m)
= P(N_1(t) = n | N(t) = n + m) P(N(t) = n + m)
= C(n + m, n) p^n (1 − p)^m (λt)^{n+m} exp(−λt)/(n + m)!
= [(λpt)^n exp(−λpt)/n!] · [{λ(1 − p)t}^m exp{−λ(1 − p)t}/m!].
Obviously, N_1(t) and N_2(t) are independent and both have Poisson distributions.
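The theorem can be illustrated by simulation: thin a simulated Poisson process and compare the average counts with λpt and λ(1 − p)t. All parameter values below are arbitrary illustration choices:

```python
import random

# Sketch: simulate thinning of a Poisson process.  Events arrive at rate lam
# on [0, T]; each is independently type I with probability prob.  Averaged
# over many runs, N1(T) and N2(T) should be close to lam*prob*T and
# lam*(1-prob)*T respectively.
random.seed(1)
lam, prob, T, reps = 3.0, 0.4, 2.0, 20000

n1_total = n2_total = 0
for _ in range(reps):
    t = random.expovariate(lam)      # first arrival
    while t < T:
        if random.random() < prob:   # classify the event independently
            n1_total += 1
        else:
            n2_total += 1
        t += random.expovariate(lam)

print(n1_total / reps)   # close to lam*prob*T = 2.4
print(n2_total / reps)   # close to lam*(1-prob)*T = 3.6
```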
7.5
P(T_1 ≤ s | N(t) = 1) = P(T_1 ≤ s, N(t) = 1)/P(N(t) = 1) = s/t.
That is, the first event is equally likely to have occurred at any moment in
[0, t]. This is more evidence for uniformity.
Let S_i, i = 1, 2, . . ., be the time when the ith event occurred. Given
N(t) = n, what is the conditional joint distribution of the S_i, for i = 1, 2, . . . , n?
For this purpose, let s_i, i = 1, 2, . . . , n, be an increasing sequence of positive
numbers such that s_n < t and none of them are equal. Let us try to calculate
the probability of the event
"S_i ∈ (s_i, s_i + ds_i) for all i = 1, 2, . . . , n".
The notation ds_i stands for some imaginary small numbers. Roughly, we may
believe that
P(S_i ∈ (s_i, s_i + ds_i), i = 1, 2, . . . , n | N(t) = n)
= P(S_i ∈ (s_i, s_i + ds_i), i = 1, 2, . . . , n, N(t) = n)/P(N(t) = n)
= [P(N(s_i + ds_i) − N(s_i) = 1, N(s_i) − N(s_{i−1} + ds_{i−1}) = 0, i = 1, 2, . . . , n,
N(t) − N(s_n + ds_n) = 0)]/[P(N(t) = n)]
= (n!/t^n) ds_1 ds_2 · · · ds_n.
Recall that the ordered values of n independent uniform random variables on [0, t]
have joint density function given by
f(s_1, . . . , s_n) = n!/t^n.
The moral is: given n events in [0, t], their joint occurrence times are again
distributed as ordered uniforms.
7.6
Problems
1. Let {N(t), t ≥ 0} be a Poisson process. Show that
P{N(s) = k | N(t) = n} = C(n, k) (s/t)^k (1 − s/t)^{n−k},  k = 0, 1, . . . , n.
(a) If 200 meteorites have hit the Earth in a particular month, what is
the expected number of them that reached the ground?
(b) If in another particular month, 20 meteorites were found to have
reached ground, what is the expected number of meteorites (including
those burnt up in the air) to have hit the Earth in that month?
(c) Assume the masses of the meteorites have an exponential distribution
with mean μ = 1000 kg, independent of each other. Let X be the total
mass of meteorites that hit the Earth in a year. Calculate the mean
and variance of X. Assume a year equals exactly 12 months.
4. Cars pass a point on the highway at a Poisson rate of one per minute.
If five percent of the cars on the road are Dodges, then
(a) what is the probability that at least one Dodge passes by during an
hour?
(b) given that ten Dodges have passed by in an hour, what is the
expected number of cars to have passed by in that time?
(c) if 50 cars have passed by in an hour, what is the probability that
five of them were Dodges?
5. Let {N(t), t ≥ 0} be a Poisson process with rate λ. Let S_n denote the
time of the nth event. Find
(a) E(S_4),
(b) E[S_4 | N(1) = 2],
(c) E[N(4) − N(2) | N(1) = 3].
6. Two individuals, A and B, both require kidney transplants. If she does
not receive a new kidney, then A will die after an exponential time with
rate μ_A, and B after an exponential time with rate μ_B. New kidneys
arrive in accordance with a Poisson process having rate λ. It has been
decided that the first kidney will go to A (or to B if B is alive and A
is not at that time) and the next one to B (if still living).
(a) What is the probability A obtains a new kidney?
(b) What is the probability B obtains a new kidney?
Chapter 8
Continuous Time Markov
Chain
One of the shortcomings of the discrete time Markov chain is that it can only
be used to model situations where a transition occurs at discrete times.
This is not a problem when modeling the outcome of gambling, an English
text, or a piece of music. It might also be an ideal mathematical model for DNA
sequences. However, it is a bit of a stretch to model the sizes of some animal
populations.
The continuous time Markov chain represents one of the directions in which
the discrete time Markov chain is generalized. Other than allowing the inter-arrival
time between two transitions to be a continuous random variable, we retain
the other requirements of the corresponding stochastic process.
Let {X(t), t ≥ 0} be a stochastic process. It is a continuous time Markov
chain if it has the following two properties:
(i) it has a countable state space;
(ii) it has the Markov property:
P(X(t + s) = j | X(s) = i, X(u) = x(u), 0 ≤ u < s)
= P(X(t + s) = j | X(s) = i).
The concept remains the same. Given the present (X(s) = i), the future
outcome X(s + t) = j is independent of the past (X(u) = x(u), 0 ≤ u < s).
The Markov property can be verified, as the waiting time for a transition is
exponential regardless of which state the Markov chain is in at the moment.
For example, if X(0) = 0, the waiting time for a transition to occur is
the same as the waiting time for the repair-person to get one machine repaired.
This waiting time is exponential with rate μ.
If X(0) = 1, a transition occurs either when the broken-down machine is
repaired, or when the functioning machine breaks, whichever occurs first. Assuming
the independence of the two waiting times, the shorter of the two has exponential
distribution with rate λ + μ.
If X(0) = 2, a transition occurs when one of the machines breaks down.
Again, under the independence assumption, the waiting time for the first breakdown
is exponential with rate 2λ.
Note also that not only is the waiting time for a transition independent of the
past (given the present), the transition probability is also independent of the
past.
When X(0) = 0, the only possible transition is from 0 to 1. Hence,
p_01 = 1.
When X(0) = 1, it transfers to 0 if the functioning machine breaks down
before the broken-down machine is repaired (the chance of them occurring
at exactly the same time is nil). This occurs with probability λ/(λ + μ),
and this event is independent of the occurrence time. (Review our discussion
on exponential distributions.) Hence, p_10 = λ/(λ + μ) and p_12 = μ/(λ + μ).
When X(0) = 2, the only possible transition is from 2 to 1. Hence,
p_21 = 1.
The above discussion fully verifies the Markov property, and we find

P =
  0          1  0
  λ/(λ+μ)    0  μ/(λ+μ)
  0          1  0
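The probability λ/(λ + μ) for the middle row can be checked by simulating the two competing exponential clocks; λ = 1 and μ = 2 are arbitrary illustration values:

```python
import random

# Sketch: from state 1 (one machine working, one under repair), the next
# transition goes to state 0 exactly when the working machine breaks before
# the repair finishes.  This should happen with probability lam/(lam + mu).
random.seed(0)
lam, mu, reps = 1.0, 2.0, 100000

to_zero = 0
for _ in range(reps):
    breakdown = random.expovariate(lam)   # working machine fails
    repair = random.expovariate(mu)       # broken machine is fixed
    if breakdown < repair:
        to_zero += 1

print(to_zero / reps)    # close to lam/(lam + mu) = 1/3
```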
8.1
The example we just gave is a special case of the birth and death process, while
the latter is a special continuous time Markov chain.
Suppose we are investigating a specific biological species. Somehow we
have a starting point t = 0, and define
X(t) = population size at time t.
We now have a continuous time stochastic process with countable state space.
To qualify as a Markov chain, we make several assumptions on its random
behavior:
(i) When X(t) = n, the waiting time for the next birth to occur has
exponential distribution with rate λ_n for n ≥ 0.
(ii) When X(t) = n, the waiting time for the next death to occur has exponential
distribution with rate μ_n for n ≥ 1 (μ_0 = 0). Also, the occurrences
of the birth and the death are independent of each other.
We call such a stochastic process a birth and death process. It is seen
that
(a) State space: S = {0, 1, 2, . . .}.
(b) (i) The waiting time for the next transition to occur has exponential
distribution with rate λ_n + μ_n for n = 1, 2, . . .. (ii) The instantaneous transition
probabilities are:
Example 8.2
A birth and death process is said to be a pure birth process if μ_n = 0 for all n. It
further has linear birth rate if λ_n = nλ.
= exp{−(λ + μ)t} + 2∫ M(t − s) exp{−(λ + μ)s} ds.
Example 8.4
Let us consider a birth and death process {X(t), t ≥ 0} with birth and death
rates given by λ_i, μ_i, with μ_0 = 0.
Let T_i be the time it takes for the process, starting from state i, to enter
state i + 1 for the first time.
Assuming λ_i > 0 for all i, we have
E(T_0) = 1/λ_0.
For i ≥ 1, the waiting time for the first transition out of state i has mean
1/(λ_i + μ_i), and conditioning on whether that transition is a birth or a death
gives
E(T_i) = 1/λ_i + (μ_i/λ_i) E(T_{i−1})
for i = 1, 2, . . ..
In particular, if the birth and death rates are constant (λ_i = λ, μ_i = μ), we have
E(T_i) = (1/λ)[1 + (μ/λ) + (μ/λ)^2 + · · · + (μ/λ)^i] = [1 − (μ/λ)^{i+1}]/(λ − μ).
8.2
Recall the Chapman–Kolmogorov equations for the discrete time Markov
chain. The system tells us that the n-step transition matrix is the product
of n one-step transition matrices. This equation system remains true for the
continuous time Markov chain with some modifications.
Lemma 8.1
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain. Let p_ij(t) =
P[X(t) = j | X(0) = i] be its transition probability function. We have
p_ij(t + s) = Σ_{k=0}^{∞} p_ik(t) p_kj(s).
The proof is the same as that for the discrete time Markov chain.
For the discrete time Markov chain, the shortest time unit is 1. There is no
shortest time unit for the continuous time Markov chain. If P(0.01) is known,
we can work out P(0.01n) for all positive integers n in principle. We need
only multiply P(0.01) by itself n times, even though you might be bored
to death by this task. The real challenge, however, is to compute, say, P(0.002)
based on that. Can we find an analytical form for P(t) based on the parameters
v_i and p_ij, the instantaneous transition rates and probabilities? The answer is
positive in principle.
Lemma 8.2
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with exponential
rates v_i and instantaneous transition probabilities p_ij. Let p_ij(t) be its
transition probabilities for a time period of t. Then we have:
(a) lim_{h→0} [1 − p_ii(h)]/h = v_i;
(b) lim_{h→0} p_ij(h)/h = v_i p_ij.
Unfortunately, the proofs above are not truly rigorous. The problem is
the order of the transition matrix, which could be ∞. When the
state space is infinite (but countable, of course), the matrix multiplication
involves summation of infinitely many terms. The above manipulation implies taking
derivatives term by term in the summation, which is not always valid.
Therefore, the theorem on the forward equation
must include some regularity conditions. While we do not specify them here,
we would like to mention that they are satisfied whenever the state space
is finite. They also hold for birth and death processes.
Let us assume all the processes to be considered in this course
are regular.
The matrix G plays an important role in these two equations. It is called
the infinitesimal generator. The backward equation applies G to P(t) from the
front (P′(t) = GP(t)), and the forward equation applies G to P(t) from the back
(P′(t) = P(t)G).
In principle, once G is known, we can solve the backward equation to find
the transition matrix P(t). In reality, this is not always feasible. We have a
few examples for which this can be done.
Example 8.5
A pure birth process with constant birth rate λ is, in other words, the Poisson
process. We practically used the differential equation to show that the number
of events in a fixed period of time has Poisson distribution.
Example 8.6
Consider a lab with one machine. The waiting time until it breaks is exponential
with rate λ, and when it is broken, the waiting time until it is
repaired is exponential with rate μ. Let us define
X(t) = 0 if the machine is working at time t, and X(t) = 1 otherwise.
Component-wise, we have
p_00′(t) = λ[p_10(t) − p_00(t)],
p_10′(t) = μ[p_00(t) − p_10(t)].
We can then find
μ p_00′(t) + λ p_10′(t) = 0.
This implies
μ p_00(t) + λ p_10(t) = C
where C is a constant. Checking the value at t = 0, we find C = μ. Hence
p_10(t) = (μ/λ)[1 − p_00(t)]. Substituting back, we find
p_00′(t) = μ − (λ + μ)p_00(t).
Solving this equation, we find
p_00(t) = [λ/(λ + μ)] exp{−(λ + μ)t} + μ/(λ + μ).
For instance,
p_00(10) = [λ/(λ + μ)] exp{−10(λ + μ)} + μ/(λ + μ).
Thus, the long term proportion of time when the machine is working is
μ/(λ + μ).
The answer is very reasonable.
8.3
Limiting Probabilities
Similar to the discrete time Markov chain, when t → ∞, p_ij(t) often has a
limit which does not depend on i. The condition for the validity of this result
is also similar. However, we are no longer bothered by periodicity.
Theorem 8.3
Suppose {X(t) : t ≥ 0} is a continuous time Markov chain with infinitesimal
generator G. Suppose
(a) all states of the Markov chain communicate with each other:
P{X(t) = j for some t > 0 | X(0) = i} > 0
for all i, j (irreducible);
(b) letting T_ij be the amount of time from X(0) = i until X(t) = j for the
first time, we have E(T_ij) < ∞.
Then lim_{t→∞} p_ij(t) = π_j exists for all i, j, and the vector π satisfies
πG = 0;  Σ_j π_j = 1.
Remark
1. The limiting probability π_i still has the interpretation of the long run
proportion of time the Markov chain stays in state i.
2. Assume that the Markov chain is irreducible, and a non-zero solution
to πG = 0 exists. Then the limiting probabilities exist and all the states
are positive recurrent. That is, we need not verify condition (b)
before solving for the limiting probabilities.
3. Without the notation G = VP − V, the equations that π satisfies can
be written as
v_j π_j = Σ_k π_k v_k p_kj.
We may regard v_j π_j as the rate at which the Markov chain leaves state j,
and Σ_k π_k v_k p_kj as the rate at which the Markov chain enters state j. When
the time t goes to infinity, the Markov chain reaches equilibrium: the rates
of entering and leaving a state are the same for all states. For this reason,
when the Markov chain reaches this stage, it is said to be in equilibrium.
4. When the limiting probabilities exist, the Markov chain is called ergodic.
The limiting probability vector is also a stationary probability
distribution, or equilibrium distribution.
5. The expected inter-occurrence time of state j is again given by μ_j = 1/π_j.
We do not have to rely on the equation πG = 0 to find π. See the
following example.
Example 8.7 Birth and death process
Consider a typical birth and death process with birth and death rates λ_n and
μ_n. We easily set up the following table:

State   rate of leaving        rate of entering
0       π_0 λ_0                π_1 μ_1
1       π_1 (λ_1 + μ_1)        π_2 μ_2 + π_0 λ_0
2       π_2 (λ_2 + μ_2)        π_3 μ_3 + π_1 λ_1
3       π_3 (λ_3 + μ_3)        π_4 μ_4 + π_2 λ_2
...
n       π_n (λ_n + μ_n)        π_{n+1} μ_{n+1} + π_{n−1} λ_{n−1}
...

Since the birth and death process has to settle down to some state, the rates
of moving between states have to be balanced. This observation gives

State   rate of up     rate of down
0       π_0 λ_0        π_1 μ_1
1       π_1 λ_1        π_2 μ_2
2       π_2 λ_2        π_3 μ_3
3       π_3 λ_3        π_4 μ_4
...
n       π_n λ_n        π_{n+1} μ_{n+1}
...
Setting π_n λ_n = π_{n+1} μ_{n+1} for each n, we find
π_n = π_0 Π_{i=0}^{n−1} (λ_i/μ_{i+1}),
and summing over all states,
1 = π_0 [1 + Σ_{n=1}^{∞} Π_{i=0}^{n−1} (λ_i/μ_{i+1})],
which requires
Σ_{n=1}^{∞} Π_{i=0}^{n−1} (λ_i/μ_{i+1}) < ∞.
This is the necessary and sufficient condition for the birth and death process
to reach equilibrium.
When this condition is satisfied, we find
π_n = [Π_{i=0}^{n−1} (λ_i/μ_{i+1})] / [1 + Σ_{m=1}^{∞} Π_{i=0}^{m−1} (λ_i/μ_{i+1})].
Remark
(i) When the birth rates are too high, the population will keep increasing.
No equilibrium can be reached.
(ii) When λ_N = 0 for some N, the population size will be
capped by N. It is easy to see that equilibrium is then always possible.
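The formula for π_n can be evaluated numerically for constant rates, truncating the state space at a large N, and checked against the balance equations; λ = 1 and μ = 2 are arbitrary illustration values:

```python
# Sketch: compute pi_n for a birth and death process with constant rates,
# truncated at N states, and verify pi_n*lam_n = pi_{n+1}*mu_{n+1}.
lam, mu, N = 1.0, 2.0, 200

weights = [1.0]
for n in range(1, N + 1):
    weights.append(weights[-1] * lam / mu)   # product of lam_i/mu_{i+1}

total = sum(weights)
pi = [w / total for w in weights]

# check the detailed balance (rate up = rate down) at every state
ok = all(abs(pi[n] * lam - pi[n + 1] * mu) < 1e-12 for n in range(N))
print(ok)                  # True
print(round(pi[0], 6))     # close to 1 - lam/mu = 0.5, the M/M/1 value
```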
Example 8.8
A job shop has M machines and one repair person. Assume each machine
works for an exponential time with rate λ, independent of the others, and the
repair time is also exponential with rate μ, regardless of how many machines
are working at the moment.
Define X(t) to be the number of machines not working at time t. Then
{X(t) : t ≥ 0} is a birth and death process:

State space   0    1          2          · · ·  M
Birth rates   Mλ   (M − 1)λ   (M − 2)λ   · · ·  0
Death rates   0    μ          μ          · · ·  μ

(a) What is the average number of machines not working in the long run?
We need to work out the limiting probabilities to answer this question.
Using the argument of the rates of movements, we note that the rate from n
to n + 1 must balance the rate from n + 1 to n:
(M − n)λ π_n = μ π_{n+1}.
Thus,
π_{n+1} = [(M − n)λ/μ] π_n,
and iterating,
π_{n+1} = (λ/μ)^{n+1} [(M − n)(M − n + 1) · · · M] π_0.
From Σ π_n = 1, we find
π_0 = [Σ_{i=0}^{M} (M!/(M − i)!) (λ/μ)^i]^{−1}.
There is no closed-form solution. The average number of machines not working
is
lim_{t→∞} E[X(t)] = Σ_{n=0}^{M} n π_n,
and the corresponding long-run proportion of machines not working is
Σ_{n=0}^{M} n π_n / M.
Example 8.9
When the birth and death rates are all constant (they do not depend on the
state), the solution for the limiting probabilities is very simple. The limiting
probabilities are given by
π_n = (λ/μ)^n (1 − λ/μ),  n = 0, 1, 2, . . . ,
when λ < μ.
This model is also called an M/M/1 queue: a work station has a single
server who works at a constant rate, and a steady stream of customers arrives for
service. If the service rate is larger than the arrival rate, the system is
stable. A customer will find on average (λ/μ)(1 − λ/μ)^{−1} customers in front of him
upon arrival.
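A quick numeric check of the geometric limiting distribution and the associated mean number of customers; λ = 2 and μ = 3 are arbitrary illustration values:

```python
# Sketch: for pi_n = rho**n * (1 - rho) with rho = lam/mu < 1, the mean
# number of customers in the system is rho/(1 - rho).
lam, mu = 2.0, 3.0
rho = lam / mu

# truncate the sum far enough that the geometric tail is negligible
mean = sum(n * rho ** n * (1 - rho) for n in range(2000))
print(round(mean, 6))    # 2.0 = rho/(1 - rho)
```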
8.4
Problems
4. There are two TAs for a particular course who answer questions in a
tutorial center. The number of students who come to ask questions
can be modeled by a Poisson process with intensity λ = 15/hour. The
amount of time it takes to answer the questions of a single student has an
exponential distribution with rate μ = 12/hour. Assume the center is only
large enough for 4 students, including those who are asking questions,
and new arrivals will not enter when the room is full.
(a) Set up a birth and death process to model this process. This includes: define {X(t), t 0}; write down its state space and its birth
and death rates.
(b) Write down its infinitesimal generator G.
(c) Obtain the limiting probabilities of this process.
(d) What proportion of the time is the room full? Assume the center
has been at service for a very long time.
(e) What proportion of the time can at least one of the TAs have a
rest?
5. A job shop consists of three machines and two repairmen. The amount
of time a machine works before breaking down is exponentially distributed
with mean 10. If the amount of time it takes a single repairman to fix a
machine is exponentially distributed with mean 8, then
(a) what is the average number of machines not in use?
(b) what proportion of the time are both repairmen busy?
6. Each individual in a biological population is assumed to give birth at
an exponential rate λ, and to die at an exponential rate μ. In addition,
there is an exponential rate of increase θ due to immigration. However,
neither immigration nor birth is allowed when the population size reaches N.
(a) Set this up as a birth and death model.
(b) If N = 3, λ = θ = 1, μ = 2, determine the proportion of time that
immigration is restricted.
14. A computer can handle N tasks simultaneously. The tasks are submitted
to the computer as a Poisson process with a rate of λ per second,
and the amount of time it takes to complete a task is independent of
other tasks and has exponential distribution with a mean of 1/μ seconds.
Tasks submitted while the computer is at full load are lost
without any warning.
(a) Set up a birth and death process to model this process. This includes:
define {X(t), t ≥ 0}; write down its state space and its birth and death rates.
(b) Write down its infinitesimal generator G.
(c) Assume N = 3, λ = 4 and μ = 1.
(i) Obtain the limiting probabilities of this process.
(ii) Obtain the mean number of tasks the computer handles at any
moment, if the computer has been operating for a very long time.
(iii) What proportion of the jobs you submit will get lost in the long
run?
Chapter 9
Queueing Theory
Queueing theory is closely related to the continuous time Markov chain. It
has the following basic setup: there is a service station with several servers;
customers come for service, and they leave after being served. There are three
important factors that determine the properties of a queueing system.
The first factor is the random mechanism of the arrival of the customers.
Is the waiting time for the next customer a constant? Is it independent of
what happened already?
The second factor is the number of servers. How many customers can be
served simultaneously?
The third factor is the random mechanism of the service time. How long
does it take to serve a customer? Is it random?
The model becomes more complex if the number of servers changes according to the length of the queue. Customers may also be divided into
several classes so that some of them receive priority service.
There are also several questions whose answers we might be interested in.
On average, how long does a customer have to wait before being served? What
proportion of the time is the server idle? Once we have a sufficient
understanding of the queue, the system can be optimized.
9.1
Cost Equations
9.2
Steady-State Probabilities
Let us now define a stochastic process for the queueing system. At any
given time t, we might be interested in several aspects of the queueing system.
We define
X(t) = number of customers in the system at time t.
Hence X(t) is the total number of customers including those being served at
the moment and those who are waiting. One quantity of interest is P {X(t) =
n} for each n. Namely, the probability (mass) function of X(t) for each
given t. Mathematically, this is often too hard to be computed analytically.
Instead, consider
n = lim P {X(t) = n}
t
when it exists. Under certain conditions, computing this limit is simple. This
quantity can be interpreted as long-run proportion of times when there will
be exactly n customers in the system. It is also referred to as steady-state
probability of exactly n customers in the system. It is usually true that
π_n is the long-run proportion of time when the system contains n customers.
If π_3 = 0.2, then about 20% of the time, the system contains 3 customers.
On average (in the long run), there are Σ_n n π_n customers in the
system.
Let T_m be the arrival time of the mth customer; then X(T_m) is the
number of customers in the system when the mth customer arrives. Define

a_n = lim_{m→∞} P(X(T_m) = n),

the long-run proportion of arriving customers who find n customers in the
system. Similarly, let d_n be the long-run proportion of departing customers
who leave n customers behind.
Example 9.1
Consider a queueing model in which all customers have their service time
equal to 1, and where the times between successive customers are always
greater than 1. In this case, the system is always empty when a new customer
arrives, and when a customer leaves. We hence find
a0 = d0 = 1.
However, π_0 < 1 as long as there is a steady stream of customers arriving.
If you work in a service station with this property, your supervisor can
always pick the right times so that you are found idle every time, even
though you are very busy in between.
Example 9.2
If there are no multiple arrivals, and there is only one server, then a_n = d_n
for all n.
If the system reaches a balance, the long-term number of transitions of
X(t) from n to n + 1 has to be the same as the number of transitions from
n + 1 to n. The former corresponds to a_n and the latter to d_n. So they
are equal.
The conditions of single arrivals and a single server make sure that
transitions such as from n to n + 2 cannot happen.
Example 9.3
If the customers arrive according to a Poisson process model, then

π_n = a_n.

Due to possible sampling bias, the supervisor may not always know how
busy you are on average. However, if he/she picks the next inspection time
according to an exponential distribution, he/she will not be at risk of
misjudging your average workload in the long run.
9.3 Exponential Model
A special queueing model arises when (i) customers arrive according to a
Poisson process with rate λ; (ii) the service station has one server; (iii) the
service time has an exponential distribution with rate μ. This type of queueing
model is called the M/M/1 model. The letter M stands for the Markov
property: that is, the memoryless property of the exponential distribution
used to describe the arrivals and the service. The digit 1 stands for the number
of servers. Obviously, if X(t) is the number of customers in the system at
time t, then {X(t) : t ≥ 0} is simply a birth and death process. If π_n is the
limit of P[X(t) = n], then it satisfies the equation πG = 0. From another point
of view, considering the rates at which X(t) moves up and down, we have
State   up       down
0       λπ_0     μπ_1
1       λπ_1     μπ_2
2       λπ_2     μπ_3
...
n       λπ_n     μπ_{n+1}

Equating the up and down rates gives π_{n+1} = (λ/μ)π_n, so that
π_n = (λ/μ)^n π_0. Since the probabilities add up to one,

1 = π_0 / (1 − λ/μ),

so π_0 = 1 − λ/μ and π_n = (1 − λ/μ)(λ/μ)^n, provided λ < μ. The average
number of customers in the system is then

L = Σ_n n π_n = λ/(μ − λ),

and the average number waiting in the queue is

L_Q = λ²/[μ(μ − λ)].

Conditional on there being n customers in the system upon arrival, a
customer's waiting time is the sum of n independent exponential service
times, whose density is

μ(μs)^{n−1} e^{−μs} / (n − 1)!.
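As a quick numerical check, the limiting probabilities and the mean queue size can be computed directly. This is a minimal sketch; the rates λ = 2 and μ = 3 are illustrative values with λ < μ.

```python
# Sketch: steady-state quantities of an M/M/1 queue.
# lam (arrival rate) and mu (service rate) are illustrative values.

def mm1_pi(n, lam, mu):
    """pi_n = (1 - lam/mu) * (lam/mu)**n."""
    rho = lam / mu
    return (1 - rho) * rho ** n

def mm1_L(lam, mu):
    """Average number of customers in the system: L = lam/(mu - lam)."""
    return lam / (mu - lam)

lam, mu = 2.0, 3.0
total = sum(mm1_pi(n, lam, mu) for n in range(200))        # ~1
L_direct = sum(n * mm1_pi(n, lam, mu) for n in range(200))
print(total)                      # ~1.0
print(L_direct, mm1_L(lam, mu))   # both ~2.0
```

The direct sum Σ n π_n over a long truncated range agrees with the closed form λ/(μ − λ), as it should.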
9.4 Exponential Model with Finite Capacity

Assume the service station can only hold N customers. When the system is
full, the arrivals get lost. This system can be analyzed in the same fashion
as before.
State   up           down
0       λπ_0         μπ_1
1       λπ_1         μπ_2
2       λπ_2         μπ_3
...
N − 1   λπ_{N−1}     μπ_N

Note that this balance sheet stops at N.
Now, we have

π_0 = (1 − λ/μ) / (1 − (λ/μ)^{N+1}).

In case you have not noticed the difference, here we put down a list:
(1) This result remains true regardless of whether λ > μ. If there are more
customers than the system can handle, we simply turn them away.
(2) We have a sum of a finite number of terms. Sharpening your memory
of geometric summation is necessary.
It is often the case that the finite case is harder than the infinite case. We
have, for this system,

L = Σ_{n=0}^{N} n π_n
  = λ{1 + N(λ/μ)^{N+1} − (N + 1)(λ/μ)^N} / [(μ − λ){1 − (λ/μ)^{N+1}}].
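The closed form for L can be sanity-checked against the direct sum Σ n π_n. This sketch uses illustrative values of λ, μ and N.

```python
# Sketch: limiting probabilities and mean queue size of the finite
# capacity (M/M/1/N) system; lam, mu, N below are illustrative.

def mm1N_pis(lam, mu, N):
    """pi_n = pi_0 * (lam/mu)**n with pi_0 = (1 - rho)/(1 - rho**(N+1))."""
    rho = lam / mu
    pi0 = (1 - rho) / (1 - rho ** (N + 1))
    return [pi0 * rho ** n for n in range(N + 1)]

def mm1N_L(lam, mu, N):
    """Closed form for L = sum of n*pi_n."""
    rho = lam / mu
    num = lam * (1 + N * rho ** (N + 1) - (N + 1) * rho ** N)
    den = (mu - lam) * (1 - rho ** (N + 1))
    return num / den

lam, mu, N = 2.0, 3.0, 5
pis = mm1N_pis(lam, mu, N)
L_direct = sum(n * p for n, p in enumerate(pis))
print(sum(pis))                      # ~1
print(L_direct, mm1N_L(lam, mu, N))  # both ~1.4226
```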
Solution: We can work out the relationship between the net profit and the
service rate μ, together with the arrival rate λ. Let us assume that the M/M/1
model with finite capacity is suitable. Then

Net profit per hour = (1 − π_N)λA − cμ
                    = λA[1 − (λ/μ)^N] / [1 − (λ/μ)^{N+1}] − cμ.

With the numerical values of the problem substituted, this becomes a function
of μ alone. We may find the value of μ that maximizes the above numerically.
The answer is approximately μ = 2.
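The numerical maximization can be done with a simple grid search. The values of A, c, λ and N in this sketch are illustrative stand-ins, not the values of the original problem; only the shape of the computation matters.

```python
# Sketch: maximize net profit g(mu) over the service rate mu by grid
# search. A, c, lam and N are made-up illustrative values.

def profit(mu, lam=1.0, A=10.0, c=1.0, N=3):
    rho = lam / mu
    return lam * A * (1 - rho ** N) / (1 - rho ** (N + 1)) - c * mu

mus = [0.5005 + 0.001 * i for i in range(5000)]  # grid over (0.5, 5.5)
best = max(mus, key=profit)
print(round(best, 3), round(profit(best), 3))
```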
(1, 0) → (0, 1),
(0, 1) → (1, 1),
(0, 1) → (0, 0),
(1, 1) → (0, 1),
(1, 1) → (b, 1),
(b, 1) → (0, 1).
next two customers. If there is only one waiting, it will take just one. When
nobody is waiting, it idles. It seems more convenient to let
X(t) = the number of customers in the line.
When nobody is waiting, we still have two different situations: the server is
idle, or the server is busy. So, we define X(t) = 0 when no one is in the line
but the server is busy, and X(t) = 0′ when no one is in the line and the
server is idle.
To find the limiting probabilities, we notice that there is still a general
direction in which the Markov chain moves: it moves up or down to a nearby
state.
state   up        down
0′      λπ_{0′}   μπ_0
0       λπ_0      μ(π_1 + π_2)
1       λπ_1      μ(π_2 + π_3)
...
n       λπ_n      μ(π_{n+1} + π_{n+2})

Although it is possible to solve this system of equations with generating
functions, we need only use this idea to justify that the solution has the form

π_n = α^n π_0

for all n = 0, 1, 2, . . .. Substituting into the relationship
λπ_n = μ(π_{n+1} + π_{n+2}), we find

α = [√(1 + 4λ/μ) − 1] / 2.

Further, from the fact that π_{0′} + Σ_{i=0}^{∞} π_i = 1, we find

π_0 = λ(1 − α) / [λ + μ(1 − α)].

The rest of the limiting probabilities can then be easily calculated. For
instance,

π_{0′} = (μ/λ) π_0.

One remark is that the solution makes sense only if α < 1. This requires
2μ > λ, which is obviously necessary.
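As a sketch (with illustrative rates λ = μ = 1, which satisfy λ < 2μ), one can verify numerically that α solves α² + α = λ/μ and that the limiting probabilities sum to one:

```python
import math

# Sketch: limiting probabilities of the two-at-a-time service queue.
# lam and mu are illustrative; the model needs lam < 2*mu.
lam, mu = 1.0, 1.0
alpha = (math.sqrt(1 + 4 * lam / mu) - 1) / 2
pi0 = lam * (1 - alpha) / (lam + mu * (1 - alpha))
pi0_idle = (mu / lam) * pi0            # pi_{0'}: empty line, idle server
pis = [pi0 * alpha ** n for n in range(200)]

print(alpha)                           # root of alpha**2 + alpha = lam/mu
print(pi0_idle + sum(pis))             # ~1
```

With λ = μ, α is the reciprocal golden ratio (√5 − 1)/2, since α² + α = 1.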
The expected number waiting in the queue is

L_Q = Σ_{n≥1} n π_n = λα / {(1 − α)[λ + μ(1 − α)]}.

It is seen that

W_Q = L_Q / λ

and so on.
9.5 Network of Queues

9.5.1 Open System
Now consider the states from which the Markov chain may enter state
(n, m). They include (n − 1, m), (n + 1, m − 1) and (n, m + 1). Again, we make
some minor adjustments if either of n, m is zero.
So the general balance equation is

π_{n,m}(μ_1 + μ_2 + λ) = λπ_{n−1,m} + μ_1 π_{n+1,m−1} + μ_2 π_{n,m+1}.

For the special cases, we have

λπ_{0,0} = μ_2 π_{0,1};
(λ + μ_1)π_{n,0} = μ_2 π_{n,1} + λπ_{n−1,0};
(λ + μ_2)π_{0,m} = μ_2 π_{0,m+1} + μ_1 π_{1,m−1}.
Rather than solving this system of equations directly, it is more
convenient to guess the solution and verify it. The idea is: the system under
consideration is similar to two M/M/1 systems. If the balance (equilibrium)
is to be reached, the arrival rate for the second server also has to be λ.
Hence, we must have

π_{n,·} = (1 − λ/μ_1)(λ/μ_1)^n

and

π_{·,m} = (1 − λ/μ_2)(λ/μ_2)^m.

And we guess

π_{n,m} = π_{n,·} π_{·,m}.

Needless to say, this guess is correct. We may verify it quickly.
Note that the limiting distribution is a product of independent geometrics.
The total number of customers in the system has expectation given by

L = λ/(μ_1 − λ) + λ/(μ_2 − λ).
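The product-form guess is quickly verified numerically. This sketch uses illustrative rates with λ smaller than both μ_1 and μ_2, and checks the general balance equation at one interior state:

```python
# Sketch: product-form limiting probabilities of the two-station open
# network; lam, mu1, mu2 are illustrative values with lam < mu1, mu2.
lam, mu1, mu2 = 1.0, 3.0, 2.0

def pi(n, m):
    return ((1 - lam / mu1) * (lam / mu1) ** n *
            (1 - lam / mu2) * (lam / mu2) ** m)

# Check the general balance equation at an interior state (n, m):
n, m = 2, 3
lhs = pi(n, m) * (mu1 + mu2 + lam)
rhs = lam * pi(n - 1, m) + mu1 * pi(n + 1, m - 1) + mu2 * pi(n, m + 1)
print(abs(lhs - rhs) < 1e-12)    # True

L = lam / (mu1 - lam) + lam / (mu2 - lam)
print(L)                          # 1.5
```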
9.5.2 Closed Systems
λ_m = λ_m P.

As the solution to the above type of equation is unique up to a scalar, we
must have

λ_m = ‖λ_m‖ π,

where ‖λ_m‖ = Σ_j λ_m(j). We may interpret ‖λ_m‖ as the average service
completion rate of the entire system. It is the system throughput rate.
I would like to add that these arguments are valid under the exponential
service time assumption with service rate μ_j at station j. We must also
assume that the service stations are independent of each other. Our next
question is: how do these m customers distribute themselves among these k
servers?
Let Y(t) = (n_1, n_2, . . . , n_k) be the vector whose jth component equals the
number of customers in the jth station at time t. Then {Y(t) : t ≥ 0} is a
continuous time Markov chain (with vector valued states). Let
P_m(n_1, . . . , n_k) denote its limiting probabilities. They have the product
form

P_m(n_1, . . . , n_k) = C ∏_{j=1}^{k} (λ_m(j)/μ_j)^{n_j}
                      = C ∏_{j=1}^{k} (π_j/μ_j)^{n_j},

where C is a normalizing constant. Conditional on a customer arriving at a
server, the probability that the remaining m − 1 customers are distributed as
(m_1, . . . , m_k) is proportional to

∏_{j=1}^{k} (π_j/μ_j)^{m_j},

that is, it equals

C′ ∏_{j=1}^{k} (π_j/μ_j)^{m_j}.
Since C′ is a normalizing constant, we find this conditional probability
function is the same as P_{m−1}. Hence, we claim:

Theorem 9.1 (The arrival theorem)
In the closed network system with m customers, the system as seen by arrivals
to server j is distributed as the stationary distribution of the same network
system when there are only m − 1 customers.
That is, this customer may pretend that s/he is an observer from outside.
Let Lm (j) and Wm (j) be the average number of customers and the average time a customer spends at server j when there are m customers in the
network. Upon conditioning on the number of customers found at server j
by an arrival to that server, it follows that
W_m(j) = (1 + E_m[n_j]) / μ_j = (1 + L_{m−1}(j)) / μ_j.

Replacing E_m[n_j] by L_{m−1}(j) in the last equality is based on the arrival
theorem. (Sorry that we have used lower case for the random variable, and
upper case for the expected value here.)
In addition, since λ_{m−1}(j) = λ_{m−1} π_j, the cost equation implies

L_{m−1}(j) = λ_{m−1} π_j W_{m−1}(j).
Substituting back into W_m(j), we get

W_m(j) = [1 + λ_{m−1} π_j W_{m−1}(j)] / μ_j.

Since

Σ_j λ_{m−1} π_j W_{m−1}(j) = Σ_j L_{m−1}(j) = m − 1,

we have

λ_{m−1} = (m − 1) / Σ_j π_j W_{m−1}(j).

These manipulations result in

W_m(j) = 1/μ_j + (m − 1) π_j W_{m−1}(j) / [μ_j Σ_i π_i W_{m−1}(i)].
After so much work, we may rightfully ask: so what? Note that W_1(j) =
1/μ_j, which is very easy to calculate. The above relationship enables us to
obtain W_2(j), and in general from W_{m−1}(j) we can easily get W_m(j). Thus,
we can compute W_m(j) iteratively. The cost equation will then make it
possible to calculate all L_m(j).
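The iteration above is easy to code. The sketch below assumes a two-station closed network in which customers alternate between the stations, so π = (0.5, 0.5) solves π = πP; the service rates are illustrative.

```python
# Sketch of mean value analysis: W_1(j) = 1/mu_j, and W_m is computed
# iteratively from W_{m-1} using the recursion in the text.

def mva(pi, mu, m_total):
    k = len(mu)
    W = [1.0 / mu[j] for j in range(k)]               # W_1(j) = 1/mu_j
    for m in range(2, m_total + 1):
        denom = sum(pi[i] * W[i] for i in range(k))   # sum_i pi_i W_{m-1}(i)
        W = [1.0 / mu[j] + (m - 1) * pi[j] * W[j] / (mu[j] * denom)
             for j in range(k)]
    return W

pi = [0.5, 0.5]          # two stations visited alternately
mu = [2.0, 3.0]
print(mva(pi, mu, 2))    # ≈ [0.8, 0.4667]
print(mva(pi, mu, 3))
```

From W_m(j), the throughput λ_m and all L_m(j) then follow from the cost equation, as the text describes.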
9.6 Problems

1. If Π(z) = Σ_{n=0}^{∞} π_n z^n, show that
(i) Π(z) = w(λ − λz);
(ii) w(s) = e^{−s/μ}.
(c) Using the results of (b) or otherwise, find E(W).
Chapter 10

Renewal Process

In the Poisson process model, the inter-arrival times are assumed to be
independent and identically distributed exponential random variables. We now
seek to relax this requirement slightly.
Definition 10.2
Let X_1, X_2, . . . , be a sequence of independent and identically distributed
non-negative random variables. Define

N(t) = max{n : Σ_{i=1}^{n} X_i ≤ t}.

The counting process {N(t) : t ≥ 0} is called a renewal process. Write
S_n = Σ_{i=1}^{n} X_i for the occurrence time of the nth event, and assume

P(X_1 = 0) < 1.
Let μ = E[X_1]. Obviously, μ > 0. We did not really pay attention to whether
N(t) is well defined for each t. Is it possible that N(t) < 200, for instance, no
matter how large t is?
It turns out that this cannot happen. According to the strong law of
large numbers, we have

S_n/n → μ

almost surely as n → ∞. It is hence true that S_n ≈ nμ. Thus, when
n increases to infinity, S_n also increases to infinity almost surely. By the
definition of N(t), we easily see that

P(N(t) < ∞) = 1

for all t ≥ 0. At the same time, lim_{t→∞} N(t) = ∞.
10.1 Distribution of N(t)
For each given t, N (t) is a random variable. What is its distribution? The
answer is usually not available unless the distribution of X1 has a convenient
form. Some discussions are possible.
Note that the event N(t) ≥ n is the same as S_n ≤ t. Thus, it is seen that

P{N(t) = n} = P{N(t) ≥ n} − P{N(t) ≥ n + 1}
            = P{S_n ≤ t} − P{S_{n+1} ≤ t}.

Denote by F_n(t) = P{S_n ≤ t} the convolution of the distributions of
X_1, . . . , X_n; we have the expression

P{N(t) = n} = F_n(t) − F_{n+1}(t).
As indicated earlier, this expression does not provide any practical means
of computing the distribution of N (t).
Example 10.1
Suppose that in a renewal process, the inter-arrival times X1 , X2 , . . . , are
uniformly distributed on the unit interval [0, 1]. Then for 0 ≤ t ≤ 1,

F_n(t) = t^n / n!.
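The formula F_n(t) = t^n/n! is easy to check by simulation. This sketch uses n = 3 and t = 0.8 as illustrative values:

```python
import random
import math

# Sketch: Monte Carlo check of F_n(t) = P(S_n <= t) = t**n / n!
# for Uniform(0, 1) inter-arrival times and 0 <= t <= 1.
random.seed(1)
n, t, reps = 3, 0.8, 200_000
hits = sum(
    sum(random.random() for _ in range(n)) <= t
    for _ in range(reps)
)
print(hits / reps)                 # close to the exact value below
print(t ** n / math.factorial(n))  # 0.0853...
```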
Example 10.2
Suppose that in a renewal process, the inter-arrival times X1 , X2 , . . . , are
discretely uniform on the integers {0, 1, 2, 3}. Then the expressions of F_1(t) and
F_2(t) are easy to obtain:

F_1(i) = (i + 1)/4,   i = 0, 1, 2, 3.

The probability mass function f_2(t) and the cumulative distribution function
F_2(t) are given by

t          0   1   2   3   4   5   6
16 f_2(t)  1   2   3   4   3   2   1
16 F_2(t)  1   3   6   10  13  15  16
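The table above can be reproduced by convolving the Uniform{0, 1, 2, 3} mass function with itself:

```python
# Sketch: reproduce the f2/F2 table by direct convolution; entries
# are scaled by 16 as in the table.
f1 = [1, 1, 1, 1]                   # 4*f1(i) for i = 0, 1, 2, 3
f2 = [0] * 7
for i in range(4):
    for j in range(4):
        f2[i + j] += f1[i] * f1[j]  # f2[t] = 16*P(S_2 = t)
F2 = [sum(f2[:t + 1]) for t in range(7)]
print(f2)   # [1, 2, 3, 4, 3, 2, 1]
print(F2)   # [1, 3, 6, 10, 13, 15, 16]
```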
It is also possible to find examples where a simple expression for the
distribution of N(t) exists. Other than the standard special case of the
Poisson process, we have the following examples.

Example 10.3
Consider the renewal process whose inter-arrival times have a geometric
distribution such that

P(X = i) = p(1 − p)^{i−1},   i ≥ 1.
It is seen that

P(S_n = k) = C(k − 1, n − 1) p^n (1 − p)^{k−n},   k ≥ n.

Thus, we have

P(N(t) = n) = Σ_{k=n}^{[t]} C(k − 1, n − 1) p^n (1 − p)^{k−n}
            − Σ_{k=n+1}^{[t]} C(k − 1, n) p^{n+1} (1 − p)^{k−n−1}.
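Since the inter-arrival times are geometric, the arrivals form a Bernoulli process, so N(t) is Binomial([t], p) with mean [t]p. The sketch below checks that the two sums above give a proper distribution with that mean (p and t are illustrative):

```python
from math import comb

# Sketch: check the expression for P(N(t) = n) with geometric
# inter-arrival times; p and t below are illustrative.
p, t = 0.4, 6

def P_Sn(n, k):
    """P(S_n = k) = C(k-1, n-1) * p**n * (1-p)**(k-n), k >= n."""
    return comb(k - 1, n - 1) * p ** n * (1 - p) ** (k - n)

def F(n):
    """P(S_n <= t)."""
    if n == 0:
        return 1.0
    return sum(P_Sn(n, k) for k in range(n, t + 1))

probs = [F(n) - F(n + 1) for n in range(t + 1)]  # P(N(t) = n)
print(sum(probs))                                # ~1.0
mean = sum(n * q for n, q in enumerate(probs))
print(mean)                                      # ~2.4 = t*p
```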
The renewal function is defined as m(t) = E[N(t)]. It satisfies

m(t) = Σ_{n=1}^{∞} P{N(t) ≥ n} = Σ_{n=1}^{∞} F_n(t).
It can be shown (by using characteristic functions) that the renewal function
and the inter-arrival distribution uniquely determine each other. Thus, if the
inter-arrival time distribution is exponential with rate λ = 2, then we find
m(t) = λt = 2t. Conversely, if the renewal function is

m(t) = λt = 2t,

we know that {N(t) : t ≥ 0} is a Poisson process with rate λ = 2.
One mathematical problem is the finiteness of m(t) for any given t. Suppose P (X1 > 0) > 0. It can be shown that for any given t, Fn (t) decreases
at an exponential rate when n is large. Thus, m(t) is always finite when
P (X1 > 0) > 0.
The relationship between m(t) and the distribution of the inter-arrival
time is made explicit in the following theorem.
Theorem 10.1
Let m(t) be the renewal function of the renewal process {N(t) : t ≥ 0} and
F(t) be the distribution of the inter-arrival time. Assume that F(0) < 1.
Then

m(t) = F(t) + ∫_0^t m(t − x) dF(x).

Proof: Given X_1 = x, N(t) = 0 when x > t, while N(t) is distributed as 1
plus the number of renewals in the remaining time t − x when x ≤ t. Hence

m(t) = E{E[N(t)|X_1]}
     = ∫_0^t [1 + m(t − x)] dF(x)
     = F(t) + ∫_0^t m(t − x) dF(x).
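Theorem 10.1 also gives a way to compute m(t) numerically: discretize [0, t] and solve the renewal equation recursively. For exponential inter-arrival times the answer should be m(t) = λt, which this sketch uses as a check (λ, T and the grid size are illustrative):

```python
import math

# Sketch: solve m(t) = F(t) + integral_0^t m(t - x) dF(x) on a grid.
# For exponential inter-arrivals with rate lam, m(t) = lam*t.
lam, T, h = 2.0, 3.0, 0.002
N = int(T / h)
F = [1 - math.exp(-lam * i * h) for i in range(N + 1)]
m = [0.0] * (N + 1)
for i in range(1, N + 1):
    # Riemann-sum version of the convolution integral
    conv = sum(m[i - j] * (F[j] - F[j - 1]) for j in range(1, i + 1))
    m[i] = F[i] + conv
print(m[N])   # ~ lam*T = 6
```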
10.2 Limit Theorems
Theorem 10.2
Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time
X_1 has non-zero expectation μ. Then

N(t)/t → 1/μ

almost surely as t → ∞.

Proof: Let S_n be the occurrence time of the nth event as before. By the
definition of the renewal process, we have

S_{N(t)} ≤ t < S_{N(t)+1},

which implies

S_{N(t)}/N(t) ≤ t/N(t) < S_{N(t)+1}/N(t).

By the strong law of large numbers, S_n/n → μ almost surely. Since
N(t) → ∞ almost surely when t → ∞, we have

S_{N(t)}/N(t) → μ,

and

S_{N(t)+1}/N(t) = [S_{N(t)+1}/(N(t) + 1)] [1 + 1/N(t)] → μ.

Thus t/N(t) → μ, and we have the result.
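A short simulation illustrates the theorem. Inter-arrival times here are Uniform(0, 2), so μ = 1 and N(t)/t should be close to 1 for large t (all values illustrative):

```python
import random

# Sketch: N(t)/t -> 1/mu almost surely; Uniform(0, 2) inter-arrivals
# give mu = 1, so the printed ratio should be near 1.
random.seed(7)
t_max = 100_000.0
s, count = 0.0, 0
while True:
    s += random.uniform(0.0, 2.0)
    if s > t_max:
        break
    count += 1
print(count / t_max)   # ~1.0
```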
The elementary renewal theorem is as follows.

Theorem 10.3
Suppose that {N(t) : t ≥ 0} is a renewal process and the inter-arrival time
X_1 has non-zero expectation μ. Then the renewal function satisfies

m(t)/t → 1/μ

as t → ∞.

We do not provide a proof here. It should be noted that this result cannot
be directly obtained from the last theorem.
If the renewal theorem is assumed, the limiting probabilities of a discrete
time Markov chain can be derived as follows.

Example 10.4
Let {X_n : n = 0, 1, . . .} be a discrete time Markov chain. Assume that it is
irreducible, aperiodic and positive recurrent. Let the state space be denoted
as S = {0, 1, . . .}.
Consider the case when X_0 = i for some i. Define T_k to be the kth
inter-arrival time between visits of the Markov chain to state i. Thus, we can
define a renewal process N_i(t) to be the number of times state i is visited by
time t.
By the renewal theorem, the long-term proportion of times state i is
visited is given by

lim_{n→∞} N_i(n)/n = 1/μ_i,

where μ_i = E[T_1]. That is, π_i = μ_i^{−1}.
Example 10.5
Let {N(t) : t ≥ 0} be a renewal process and X_1, X_2, . . . , be the inter-arrival
times. Let μ = E[X_1] > 0. For any given n, the event N(t) + 1 = n implies
that the (n − 1)th event has occurred by time t but the nth event has not
occurred yet. In other words, we know that

Σ_{i=1}^{n−1} X_i ≤ t < Σ_{i=1}^{n} X_i.

Hence the event N(t) + 1 = n is determined by X_1, . . . , X_n alone; that is,
N(t) + 1 is a stopping time for the sequence X_1, X_2, . . .. For a stopping
time T, Wald's equation gives

E[Σ_{i=1}^{T} X_i] = E[T] E[X].

It turns out that N(t) is not a stopping time, and the above formula is
not applicable to N(t).
10.3 Problems

1. Suppose that the inter-arrival distribution for a renewal process is
Poisson distributed with mean λ. That is, suppose

P(X_n = k) = λ^{k−1} exp(−λ)/(k − 1)!,   k = 1, 2, . . . .
2. Mr. Smith works on a temporary basis. The mean length of each job
he gets is three months. If the amount of time he spends between jobs
is exponentially distributed with mean 2, then at what rate does Mr.
Smith get new jobs?
3. Each time a machine is repaired it remains up for an exponentially
distributed time with rate λ. It then fails, and its failure is one of
two types. If it is a type 1 failure, then the time to repair the machine
is exponential with rate μ_1; if it is a type 2 failure, then the repair time
is exponential with rate μ_2. Each failure is, independently of the time
it took the machine to fail, a type 1 failure with probability p and a
type 2 failure with probability 1 − p. What proportion of time is the
machine down due to a type 1 failure? What proportion of time is the
machine down due to a type 2 failure? What proportion of time is it
up?
4. A machine in use is replaced by a new machine either when it fails
or when it reaches the age of T years. If the lifetimes of successive
machines are independent with a common distribution F having density
f, show that

(a) the long-run rate at which machines are replaced equals

[∫_0^T x f(x) dx + T(1 − F(T))]^{−1};

(b) the long-run rate at which machines in use fail equals

F(T) / {∫_0^T x f(x) dx + T[1 − F(T)]}.
of eight per hour. The cost incurred in lost production when machines
are out of service is $10 per hour per machine. What is the average
cost rate incurred due to failed machines?
6. The manager of a market can hire either Mary or Alice. Mary, who
gives service at an exponential rate of 20 customers per hour, can be
hired at a rate of $3 per hour. Alice, who gives service at an exponential
rate of 30 customers per hour, can be hired at a rate of $C per hour.
The manager estimates that, on the average, each customer's time is
worth $1 per hour and should be accounted for in the model. If customers
arrive at a Poisson rate of 10 per hour, then
(a) what is the average cost per hour if Mary is hired? if Alice is hired?
(b) find C if the average cost per hour is the same for Mary and Alice.
7. Consider a renewal process {N(t), t ≥ 0} having a gamma(r, λ)
inter-arrival distribution. That is, the inter-arrival density is

f(x) = λ e^{−λx} (λx)^{r−1} / (r − 1)!,   x > 0.

Show that

P{N(t) ≥ n} = Σ_{i=nr}^{∞} e^{−λt} (λt)^i / i!

and

m(t) = Σ_{i=r}^{∞} [i/r] e^{−λt} (λt)^i / i!,

where [x] denotes the integer part of x.
Chapter 11
Sample Exam Papers
11.1

1. [4] Using only the axioms of probability, show that if A and B are
two events such that A ⊂ B, then

P(A) ≤ P(B).
2. [2] Two independent random variables X and Y have probability mass
functions
P(X = k) = 1/3,   k = 0, 1, 2,
and
P(Y = k) = (1/2)^{k+1},   k = 0, 1, 2, . . . .
That is: X has uniform distribution on {0, 1, 2}, and Y has geometric
distribution.
[3] (a) Find the probability generating function of X.
[3] (b) Find a closed form expression for the probability generating
function of Y . (This means that leaving it as a summation is not
enough).
[3] (c) Find the probability generating function of XY .
(−1)/1! + (−1)^2/2! + · · · + (−1)^n/n!.

Hint: for each given n, define B_i = the event that the ith letter is in
the ith envelope, i = 1, 2, . . . , n. Then Ā_n = B_1 ∪ B_2 ∪ · · · ∪ B_n.
11.2
P =
0
0 0.3 0.7
5. Let {Z_n}_{n=0}^{∞} be a usual branching process with Z_0 = 1 and
Z_n = Σ_{j=1}^{Z_{n−1}} X_{n−1,j} for n > 0, with family sizes X_{n,j} being
iid random variables. Assume X_{0,1} has a discrete uniform distribution on
0, 1, . . . , k for some positive integer k.
For example, if k = 3, then P(X_{0,1} = j) = 0.25 for j = 0, 1, 2, 3.
[2] (a) For what values of k is the probability of extinction 1?
[4] (b) When k = 3, compute the probability of extinction.
[4] (c) When k = 5, calculate the mean and variance of Z_5.
6. In a more complex random walk, Z_1, Z_2, . . . are independent and
identically distributed random variables with

P(Z_1 = 1) = p,   P(Z_1 = 0) = r   and   P(Z_1 = 2) = q,   (11.1)

and the walk is given by the partial sums Σ_{i=1}^{n} Z_i.
11.3
Assume that the time it takes to complete a job is random with an
exponential distribution, independent of each other, for both servers. The
rates are μ_1 = 3/hour and μ_2 = 2/hour for Servers 1 and 2 respectively.
Let X1 (t) and X2 (t) be the numbers of jobs in the queues for Servers
1 and 2 respectively at time t. The jobs in the queues do not switch
between servers even if the other machine is idle sometimes.
The professor submitted the same job to both servers at time t = 0.
His job is done as soon as one of two servers completes it.
Suppose X1 (0) = 1 and X2 (0) = 1.
[5] (1) What is the probability that Server 1 will start work on his job
before Server 2?
[5] (2) What is the probability that he has to wait for 0.5 hours or
longer before any server starts working on his job?
[5] (3) What is the probability that his job is completed by Server 1
before Server 2 starts working on this job?
[5] (4) What is the probability that he has to wait at least 2 hours
before the job is done?
5. A closed population has N individuals. Assume the number of flu
cases can be modeled by a birth and death process. Let X(t) be the
number of flu cases in this population at time t. The birth rate is
λ_k = λ(k + 1)(N − k) and the death rate (not the death of the individual,
but the death of the flu) is μ_k = μk² when X(t) = k, k = 0, 1, . . . , N.
(a) [5] Given X(0) = 0, what are the expected waiting times until X(t) =
1 and until X(t) = 2?
(b) [5] Given X(0) = k, for some 0 < k < N, what is the probability
that after the next transition there will be one extra case rather than
one fewer case?
(c) [5] Assume λ = 1, μ = 9 and N = 5. In the long run, what is the
proportion of time when there are no flu cases in the population?
(d) [5] Assume λ = 1, μ = 9 and N = 5. What is the average number
of flu cases at any moment in the long run?