
Stochastic Manufacturing & Service Systems

Jim Dai and Hyunwoo Park


School of Industrial and Systems Engineering
Georgia Institute of Technology

October 19, 2011


Contents

1 Newsvendor Problem 5
1.1 Profit Maximization . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Cost Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Initial Inventory . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2 Queueing Theory 25
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Lindley Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Traffic Intensity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 Kingman Approximation Formula . . . . . . . . . . . . . . . . . . 29
2.5 Little’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.6 Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3 Discrete Time Markov Chain 39


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.1.1 State Space . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.1.2 Transition Probability Matrix . . . . . . . . . . . . . . . . 40
3.1.3 Initial Distribution . . . . . . . . . . . . . . . . . . . . . . 42
3.1.4 Markov Property . . . . . . . . . . . . . . . . . . . . . . . 44
3.1.5 DTMC Models . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2 Stationary Distribution . . . . . . . . . . . . . . . . . . . . . . . 47
3.2.1 Interpretation of Stationary Distribution . . . . . . . . . . 48
3.2.2 Function of Stationary Distribution . . . . . . . . . . . . 49
3.3 Irreducibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.1 Transition Diagram . . . . . . . . . . . . . . . . . . . . . 51
3.3.2 Accessibility of States . . . . . . . . . . . . . . . . . . . . 51
3.4 Periodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Recurrence and Transience . . . . . . . . . . . . . . . . . . . . . 54
3.5.1 Geometric Random Variable . . . . . . . . . . . . . . . . 55
3.6 Absorption Probability . . . . . . . . . . . . . . . . . . . . . . . . 57


3.7 Computing Stationary Distribution Using Cut Method . . . . . . 59


3.8 Introduction to Binomial Stock Price Model . . . . . . . . . . . . 61
3.9 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.10 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4 Poisson Process 71
4.1 Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . 71
4.1.1 Memoryless Property . . . . . . . . . . . . . . . . . . . . 72
4.1.2 Comparing Two Exponentials . . . . . . . . . . . . . . . . 73
4.2 Homogeneous Poisson Process . . . . . . . . . . . . . . . . . . . . 75
4.3 Non-homogeneous Poisson Process . . . . . . . . . . . . . . . . . 78
4.4 Thinning and Merging . . . . . . . . . . . . . . . . . . . . . . . . 80
4.4.1 Merging Poisson Process . . . . . . . . . . . . . . . . . . . 80
4.4.2 Thinning Poisson Process . . . . . . . . . . . . . . . . . . 80
4.5 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.6 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5 Continuous Time Markov Chain 91


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1.1 Holding Times . . . . . . . . . . . . . . . . . . . . . . . . 96
5.1.2 Generator Matrix . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Stationary Distribution . . . . . . . . . . . . . . . . . . . . . . . 100
5.3 M/M/1 Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.4 Variations of M/M/1 Queue . . . . . . . . . . . . . . . . . . . . . 103
5.4.1 M/M/1/b Queue . . . . . . . . . . . . . . . . . . . . . . . 103
5.4.2 M/M/∞ Queue . . . . . . . . . . . . . . . . . . . . . . . . 104
5.4.3 M/M/k Queue . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 Open Jackson Network . . . . . . . . . . . . . . . . . . . . . . . . 107
5.5.1 M/M/1 Queue Review . . . . . . . . . . . . . . . . . . . . 107
5.5.2 Tandem Queue . . . . . . . . . . . . . . . . . . . . . . . . 108
5.5.3 Failure Inspection . . . . . . . . . . . . . . . . . . . . . . 109
5.6 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6 Exercise Answers 117


6.1 Newsvendor Problem . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2 Queueing Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3 Discrete Time Markov Chain . . . . . . . . . . . . . . . . . . . . 135
6.4 Poisson Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
6.5 Continuous Time Markov Process . . . . . . . . . . . . . . . . . . 159
Chapter 1

Newsvendor Problem

In this course, we will learn how to design, analyze, and manage a manufacturing
or service system with uncertainty. Our first step is to understand how to solve
a single period decision problem containing uncertainty or randomness.

1.1 Profit Maximization


We will start with the simplest case: selling perishable items. Suppose we are
running a business retailing newspapers to the Georgia Tech campus. We have to
order a specific number of copies from the publisher every evening and sell those
copies the next day. One day, if there is big news, the number of GT people
who want to buy and read a paper from you may be very high. Another day,
people may simply not be interested in reading a paper at all. Hence, you, as a
retailer, will encounter demand variability, and it is the primary uncertainty
you need to handle to keep your business sustainable. To do that, you want to
know the optimal number of copies to order every day. Intuitively, you know
that there are a few factors other than demand that you need to consider.

• Selling price (p): How much will you charge per paper?

• Buying price (cv ): How much will the publisher charge per paper? This
is a variable cost, meaning that this cost is proportional to how many you
order. That is why it is denoted by cv .

• Fixed ordering price (cf ): How much should you pay just to place an
order? Ordering cost is fixed regardless of how many you order.

• Salvage value (s) or holding cost (h): There are two cases for the
leftover items. They could carry some monetary value even if expired.
Otherwise, you have to pay to get rid of them or to store them. If they
have some value, it is called the salvage value. If you have to pay, it is called
the holding cost. Hence, the following relationship holds: s = −h. This is a
per-item value.

• Backorder cost (b): Whenever the actual demand is higher than how
many you prepared, you lose sales. Lost sales can cost you something:
you may book them as backorders, or your brand may be damaged. These
costs are represented by the backorder cost. This is a per-item cost.

• Your order quantity (y): You decide how many papers to order
before you start a day. That quantity is represented by y. This is your
decision variable.

As a business, you are assumed to want to maximize your profit. Expressing
your profit as a function of these variables is the first step toward obtaining the
optimal ordering policy. Profit can be interpreted in two ways: (1) revenue minus
cost, or (2) money you earn minus money you lose.
Let us adopt the first interpretation first. Revenue is the selling price (p)
multiplied by how many you actually sell. The actual sales are bounded by the
realized demand and by how many you prepared for the period. When you
order too many, you can sell at most as many as the number of people who
want to buy. When you order too few, you can only sell what you prepared.
Hence, your revenue is the minimum of D and y, i.e. min(D, y) or D ∧ y. Thinking
about the cost, first of all, you have to pay something to the publisher when
buying papers, i.e. cf + ycv. Two types of additional cost will be incurred
depending on whether your order is above or below the actual demand. When
it turns out you prepared less than the demand for the period, you pay the
backorder cost b for every missed sale. The amount of missed sales cannot be
negative, so it is represented by max(D − y, 0) or (D − y)+. When it turns
out you prepared more, the quantity of left-over items also cannot go negative,
so it is expressed as max(y − D, 0) or (y − D)+. In this way of thinking,
we have the following formula.

Profit = Revenue − Cost
       = Revenue − Ordering cost − Holding cost − Backorder cost
       = p(D ∧ y) − (cf + ycv) − h(y − D)+ − b(D − y)+     (1.1)

How about the second interpretation of profit? You earn p − cv dollars every
time you sell a paper. For left-over items, you lose the price you paid plus the
holding cost per paper, i.e. cv + h. When the demand is higher than what you
prepared, you lose the backorder cost b per missed sale. Of course, you also have
to pay the fixed ordering cost cf when you place an order. With this logic,
we have the following profit function.

Profit = Earning − Loss
       = (p − cv)(D ∧ y) − (cv + h)(y − D)+ − b(D − y)+ − cf     (1.2)

Since we used two different approaches to model the same profit function,
(1.1) and (1.2) should be equivalent. Comparing the two equations, you will
also notice that (D ∧ y) + (y − D)+ = y.
Now our quest boils down to maximizing the profit function. However, (1.1)
and (1.2) contain a random element, the demand D. We cannot maximize
a function of a random element if we allow the randomness to remain in our
objective function. One day the demand can be very high; another day it is
possible that nobody wants to buy a single paper. We have to figure out how to get
rid of this randomness from our objective function. Let us denote the profit for the
nth period by gn for further discussion.
Theorem 1.1 (Strong Law of Large Numbers).
 
Pr{ lim_{n→∞} (g1 + g2 + g3 + · · · + gn)/n = E[g1] } = 1
The long-run average profit converges to the expected profit for a single period
with probability 1.
Based on Theorem 1.1, we can change our objective function from just profit
to expected profit. In other words, by maximizing the expected profit, it is
guaranteed that the long-run average profit is maximized because of Theorem
1.1. Theorem 1.1 is the foundational assumption for the entire course. When we
will talk about the long-run average something, it involves Theorem 1.1 in most
cases. Taking expectations, we obtain the following equations corresponding to
(1.1) and (1.2).

E[g(D, y)] = pE[D ∧ y] − (cf + ycv) − hE[(y − D)+] − bE[(D − y)+]     (1.3)
           = (p − cv)E[D ∧ y] − (cv + h)E[(y − D)+] − bE[(D − y)+] − cf     (1.4)

Since (1.3) and (1.4) are equivalent, we can choose either one of them for
further discussion and (1.4) will be used.
Before moving on, it is important for you to understand what E[D∧y], E[(y−
D)+ ], E[(D − y)+ ] are and how to compute them.
Example 1.1. Compute E[D ∧ 18], E[(18 − D)+ ], E[(D − 18)+ ] for the demand
having the following distributions.
1. D is a discrete random variable. Probability mass function (pmf) is as
follows.

d          10    15    20    25    30
Pr{D = d}  1/4   1/8   1/8   1/4   1/4

Answer: For a discrete random variable, you first compute D ∧ 18, (18 − D)+,
and (D − 18)+ for each possible value of D.

d          10    15    20    25    30
Pr{D = d}  1/4   1/8   1/8   1/4   1/4
D ∧ 18     10    15    18    18    18
(18 − D)+   8     3     0     0     0
(D − 18)+   0     0     2     7    12

Then, you take the weighted average using corresponding Pr{D = d} for
each possible D.

E[D ∧ 18]    = (1/4)(10) + (1/8)(15) + (1/8)(18) + (1/4)(18) + (1/4)(18) = 125/8
E[(18 − D)+] = (1/4)(8) + (1/8)(3) + (1/8)(0) + (1/4)(0) + (1/4)(0) = 19/8
E[(D − 18)+] = (1/4)(0) + (1/8)(0) + (1/8)(2) + (1/4)(7) + (1/4)(12) = 5

2. D is a continuous random variable following the uniform distribution between
10 and 30, i.e. D ∼ Uniform(10, 30).
Answer: Computing the expectation of a continuous random variable involves
integration. A continuous random variable has a probability density function,
usually denoted by f, which is also needed to compute the expectation. In this case,

fD(x) = 1/20 if x ∈ [10, 30], and 0 otherwise.

Using this information, compute the expectations directly by integration.


E[D ∧ 18] = ∫_{−∞}^{∞} (x ∧ 18) fD(x) dx
          = ∫_{10}^{30} (x ∧ 18) (1/20) dx
          = ∫_{10}^{18} (x ∧ 18) (1/20) dx + ∫_{18}^{30} (x ∧ 18) (1/20) dx
          = ∫_{10}^{18} (x/20) dx + ∫_{18}^{30} (18/20) dx
          = [x²/40]_{x=10}^{x=18} + [18x/20]_{x=18}^{x=30} = 5.6 + 10.8 = 16.4

The key idea is to remove the ∧ operator, which we cannot handle directly, by
separating the integration interval into two. The other two expectations can
be computed in a similar way.

E[(18 − D)+] = ∫_{−∞}^{∞} (18 − x)+ fD(x) dx
             = ∫_{10}^{30} (18 − x)+ (1/20) dx
             = ∫_{10}^{18} (18 − x) (1/20) dx + ∫_{18}^{30} 0 · (1/20) dx
             = [(18x − x²/2)/20]_{x=10}^{x=18} + 0 = 1.6

E[(D − 18)+] = ∫_{−∞}^{∞} (x − 18)+ fD(x) dx
             = ∫_{10}^{30} (x − 18)+ (1/20) dx
             = ∫_{10}^{18} 0 · (1/20) dx + ∫_{18}^{30} (x − 18) (1/20) dx
             = 0 + [(x²/2 − 18x)/20]_{x=18}^{x=30} = 3.6
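These expectations can also be verified numerically in R, the language used in the
simulation sections of these notes. The following sketch reproduces the discrete
case exactly and approximates the continuous case by Monte Carlo; the values d,
probs, and Uniform(10, 30) are taken from Example 1.1.

# Discrete case: exact expectations as weighted averages over the pmf
d     = c(10, 15, 20, 25, 30)
probs = c(1/4, 1/8, 1/8, 1/4, 1/4)
sum(pmin(d, 18) * probs)        # E[min(D, 18)] = 125/8 = 15.625
sum(pmax(18 - d, 0) * probs)    # E[(18 - D)+]  = 19/8  = 2.375
sum(pmax(d - 18, 0) * probs)    # E[(D - 18)+]  = 5

# Continuous case: Monte Carlo approximation for D ~ Uniform(10, 30)
x = runif(100000, min = 10, max = 30)
mean(pmin(x, 18))       # close to the exact value 16.4
mean(pmax(18 - x, 0))   # close to 1.6
mean(pmax(x - 18, 0))   # close to 3.6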

Now that we have learned how to compute E[D∧y], E[(y−D)+ ], E[(D−y)+ ],


we have acquired the basic toolkit to obtain the order quantity that maximizes
the expected profit. First of all, we need to turn these expectations of the profit
function formula (1.4) into integration forms. For now, assume that the demand
is a nonnegative continuous random variable.
E[g(D, y)] = (p − cv)E[D ∧ y] − (cv + h)E[(y − D)+] − bE[(D − y)+] − cf
           = (p − cv) ∫_{0}^{∞} (x ∧ y) fD(x) dx
             − (cv + h) ∫_{0}^{∞} (y − x)+ fD(x) dx
             − b ∫_{0}^{∞} (x − y)+ fD(x) dx − cf
           = (p − cv) [ ∫_{0}^{y} x fD(x) dx + y ∫_{y}^{∞} fD(x) dx ]
             − (cv + h) ∫_{0}^{y} (y − x) fD(x) dx
             − b ∫_{y}^{∞} (x − y) fD(x) dx − cf
           = (p − cv) [ ∫_{0}^{y} x fD(x) dx + y ( 1 − ∫_{0}^{y} fD(x) dx ) ]
             − (cv + h) [ y ∫_{0}^{y} fD(x) dx − ∫_{0}^{y} x fD(x) dx ]
             − b [ E[D] − ∫_{0}^{y} x fD(x) dx − y ( 1 − ∫_{0}^{y} fD(x) dx ) ] − cf     (1.5)
There can be many ways to obtain the maximum point of a function. Here
we will take the derivative of (1.5) and set it to zero. y that makes the derivative
equal to zero will make E[g(D, y)] either maximized or minimized depending on
the second derivative. For now, assume that such y will maximize E[g(D, y)].
We will check this later. Taking the derivative of (1.5) will involve differentiating
an integral. Let us review an important result from Calculus.
Theorem 1.2 (Fundamental Theorem of Calculus). For a function
H(y) = ∫_{c}^{y} h(x) dx,

we have H′(y) = h(y), where c is a constant.
Theorem 1.2 can be translated as follows for our case.
d/dy [ ∫_{0}^{y} x fD(x) dx ] = y fD(y)     (1.6)

d/dy [ ∫_{0}^{y} fD(x) dx ] = fD(y)     (1.7)
Also remember the relationship between cdf and pdf of a continuous random
variable.
FD(y) = ∫_{−∞}^{y} fD(x) dx     (1.8)

Use (1.6), (1.7), (1.8) to take the derivative of (1.5).

d/dy E[g(D, y)] = (p − cv)(y fD(y) + 1 − FD(y) − y fD(y))
                  − (cv + h)(FD(y) + y fD(y) − y fD(y))
                  − b(−y fD(y) − 1 + FD(y) + y fD(y))
                = (p + b − cv)(1 − FD(y)) − (cv + h)FD(y)
                = (p + b − cv) − (p + b + h)FD(y) = 0     (1.9)

If we differentiate (1.9) one more time to obtain the second derivative,

d²/dy² E[g(D, y)] = −(p + b + h) fD(y),

which is always nonpositive because p, b, h, fD(y) ≥ 0. Hence, taking the
derivative and setting it to zero gives us the maximum point, not the minimum
point. Therefore, we obtain the following result.

Theorem 1.3 (Optimal Order Quantity). The optimal order quantity y ∗ is the
smallest y such that
 
FD(y) = (p + b − cv)/(p + b + h)   or   y = FD^{−1}( (p + b − cv)/(p + b + h) ),

for continuous demand D.

Looking at Theorem 1.3, it provides the following intuitions.

• The fixed cost cf does not affect the optimal quantity you need to order.

• If you can procure items for free and there is no holding cost, you will
prepare as many as you can.

• If b ≫ h and b ≫ cv, you will also prepare as many as you can.

• If the buying cost is almost the same as the selling price plus the backorder
cost, i.e. cv ≈ p + b, you will prepare nothing. You will prepare an item
only upon receiving an order.

Example 1.2. Suppose p = 10, cf = 100, cv = 5, h = 2, b = 3, and D ∼ Uniform(10, 30).
How many should you order every period to maximize your long-run average
profit?
Answer: First of all, we need to compute the criterion value.

(p + b − cv)/(p + b + h) = (10 + 3 − 5)/(10 + 3 + 2) = 8/15

Then, we look up the smallest y value that makes FD(y) = 8/15.
[Figure: cdf of D ∼ Uniform(10, 30), rising linearly from 0 at y = 10 to 1 at y = 30; the level 8/15 is reached at y = 62/3.]

Therefore, we can conclude that the optimal order quantity is

y∗ = 10 + (8/15)(20) = 62/3 units.
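In R, the same answer can be read off the quantile function of the demand
distribution; a small sketch using the parameters of Example 1.2:

p=10; cv=5; h=2; b=3
frac = (p + b - cv)/(p + b + h)   # critical ratio 8/15
qunif(frac, min=10, max=30)       # optimal order quantity, 62/3 = 20.667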
Although we derived the optimal order quantity for the continuous demand case,
Theorem 1.3 applies to the discrete demand case as well: y∗ is then the smallest y
such that FD(y) ≥ (p + b − cv)/(p + b + h). I will fill in the derivation for the
discrete case later.
Example 1.3. Suppose p = 10, cf = 100, cv = 5, h = 2, b = 3. Now, D is a
discrete random variable having the following pmf.

d          10    15    20    25    30
Pr{D = d}  1/4   1/8   1/8   1/4   1/4

What is the optimal order quantity for every period?


Answer: We use the same criterion value 8/15 from the previous example and
look up the smallest y that makes FD(y) ≥ 8/15. We start with y = 10.

FD(10) = 1/4 < 8/15
FD(15) = 1/4 + 1/8 = 3/8 < 8/15
FD(20) = 1/4 + 1/8 + 1/8 = 1/2 < 8/15
FD(25) = 1/4 + 1/8 + 1/8 + 1/4 = 3/4 ≥ 8/15

Hence, the optimal order quantity is y∗ = 25 units.
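For discrete demand, the same search can be done in R with a cumulative sum;
a minimal sketch with the pmf of Example 1.3:

d     = c(10, 15, 20, 25, 30)
probs = c(1/4, 1/8, 1/8, 1/4, 1/4)
FD    = cumsum(probs)             # cdf evaluated at each value of d
d[min(which(FD >= 8/15))]         # smallest d with FD(d) >= 8/15, returns 25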

1.2 Cost Minimization


Suppose you are a production manager of a large company in charge of operating
manufacturing lines. You are expected to run the factory so as to minimize the cost.
Revenue is another person's responsibility, so all you care about is the cost. To model
the cost of factory operation, let us set up the variables in a slightly different way.

• Understock cost (cu): It occurs when your production is not sufficient
to meet the market demand.

• Overstock cost (co): It occurs when you produce more than the market
demand. In this case, you may have to rent space to store the excess
items.

• Unit production cost (cv): It is the cost you pay whenever you
manufacture one unit of product. Material cost is one example in this category.

• Fixed operating cost (cf): It is the cost you pay whenever you
decide to start running the factory.
As in the profit maximization case, the formula for cost expressed in terms
of cu , co , cv , cf should be developed. Given random demand D, we have the
following equation.
Cost =Manufacturing Cost
+ Cost associated with Understock Risk
+ Cost associated with Overstock Risk
=(cf + ycv ) + cu (D − y)+ + co (y − D)+ (1.10)
(1.10) obviously also contains randomness from D. We cannot minimize
a random objective itself. Instead, based on Theorem 1.1, we will minimize the
expected cost; the long-run average cost is then also guaranteed to be
minimized. Hence, (1.10) is transformed into the following.
E[Cost] = (cf + ycv) + cu E[(D − y)+] + co E[(y − D)+]
        = (cf + ycv) + cu ∫_{0}^{∞} (x − y)+ fD(x) dx + co ∫_{0}^{∞} (y − x)+ fD(x) dx
        = (cf + ycv) + cu ∫_{y}^{∞} (x − y) fD(x) dx + co ∫_{0}^{y} (y − x) fD(x) dx     (1.11)

Again, we will take the derivative of (1.11) and set it to zero to obtain y
that makes E[Cost] minimized. We will verify the second derivative is positive
in this case. Let g here denote the cost function and use Theorem 1.2 to take
the derivative of integrals.
d/dy E[g(D, y)] = cv + cu(−y fD(y) − 1 + FD(y) + y fD(y)) + co(FD(y) + y fD(y) − y fD(y))
                = cv + cu(FD(y) − 1) + co FD(y)     (1.12)

The optimal production quantity y is obtained by setting (1.12) to be zero.
Theorem 1.4 (Optimal Production Quantity). The optimal production quan-
tity that minimizes the long-run average cost is the smallest y such that
 
FD(y) = (cu − cv)/(cu + co)   or   y = FD^{−1}( (cu − cv)/(cu + co) ).

Theorem 1.4 can also be applied to discrete demand. Several intuitions can
be obtained from Theorem 1.4.

• The fixed cost (cf) again does not affect the optimal production quantity.

• If the understock cost (cu) is equal to the unit production cost (cv), which makes
cu − cv = 0, then you will not produce anything.

• If the unit production cost and the overstock cost are negligible compared to the
understock cost, meaning cu ≫ cv, co, you will prepare as much as you can.
To verify the second derivative of (1.11) is indeed positive, take the derivative
of (1.12).
d²/dy² E[g(D, y)] = (cu + co) fD(y)     (1.13)

(1.13) is always nonnegative because cu, co ≥ 0. Hence, y∗ obtained from
Theorem 1.4 minimizes the cost instead of maximizing it.
Before moving on, let us compare criteria from Theorem 1.3 and Theorem
1.4.
(p + b − cv)/(p + b + h)   and   (cu − cv)/(cu + co)
Since the profit maximization problem solved previously and the cost minimization
problem solved now share the same logic, these two criteria should be somewhat
equivalent. We can see the connection by matching cu = p + b and co = h.
In the profit maximization problem, whenever you lose a sale due to under-preparation,
it costs you the opportunity cost, which is the selling price of an item plus the
backorder cost. Hence, cu = p + b makes sense. When you over-prepare, you have to
pay the holding cost for each left-over item, so co = h also makes sense. In sum,
Theorem 1.3 and Theorem 1.4 are indeed the same result in different forms.
Example 1.4. Suppose demand follows a Poisson distribution with parameter 3.
The cost parameters are cu = 10, cv = 5, co = 15. Note that e^{−3} ≈ 0.0498.
Answer: The criterion value is

(cu − cv)/(cu + co) = (10 − 5)/(10 + 15) = 0.2,

so we need to find the smallest y that makes FD(y) ≥ 0.2. Compute the
probabilities of the possible demand values.

Pr{D = 0} = (3^0/0!) e^{−3} = 0.0498
Pr{D = 1} = (3^1/1!) e^{−3} = 0.1494
Pr{D = 2} = (3^2/2!) e^{−3} = 0.2241

Interpret these values into FD(y).

FD(0) = Pr{D = 0} = 0.0498 < 0.2
FD(1) = Pr{D = 0} + Pr{D = 1} = 0.1992 < 0.2
FD(2) = Pr{D = 0} + Pr{D = 1} + Pr{D = 2} = 0.4233 ≥ 0.2

Hence, the optimal production quantity here is 2.
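The built-in Poisson functions in R give the same result directly; a small check
for Example 1.4:

cu=10; cv=5; co=15
frac = (cu - cv)/(cu + co)     # 0.2
ppois(0:2, lambda=3)           # 0.0498 0.1991 0.4232
qpois(frac, lambda=3)          # smallest y with FD(y) >= 0.2, returns 2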

1.3 Initial Inventory


Now let us extend our model a bit further. The solutions we have developed in the
previous sections assumed that we had no inventory when placing an order. Suppose
instead that we have m items on hand when we decide how many to order. In that
case, we should order y∗ − m items instead of y∗ items. In other words,
the optimal order or production quantity is in fact the optimal order-up-to or
production-up-to quantity.
We had another implicit assumption that we always place an order, so the fixed cost
did not matter in the previous model. However, if cf is very large, meaning
that starting off a production line or placing an order is very expensive, we may
want to consider not ordering. In that case, we have two scenarios: to order
or not to order. We will compare the expected costs of the two scenarios and
choose the option with the lower expected cost.
Example 1.5. Suppose the understock cost is $10, the overstock cost is $2, the unit
purchasing cost is $4, and the fixed ordering cost is $30. In other words, cu = 10, co =
2, cv = 4, cf = 30. Assume that D ∼ Uniform(10, 20) and we already possess 10
items. Should we order or not? If we should, how many items should we order?
Answer: First, we need to compute the optimal number of items we need to
prepare for each day. Since

(cu − cv)/(cu + co) = (10 − 4)/(10 + 2) = 1/2,

the optimal order-up-to quantity is y∗ = 15 units. Hence, if we need to order, we
should order 5 = y ∗ − m = 15 − 10 items. Let us examine whether we should
actually order or not.
1. Scenario 1: Not To Order
If we decide not to order, we will not have to pay cf and cv since we order
nothing actually. We just need to consider understock and overstock risks.
We will operate tomorrow with 10 items that we currently have if we decide
not to order.

E[Cost] = cu E[(D − 10)+] + co E[(10 − D)+]
        = 10(E[D] − 10) + 2(0) = $50

Note that in this case E[(10 − D)+ ] = 0 because D is always greater than
10.
2. Scenario 2: To Order
If we decide to order, we will order 5 items. We should pay cf and cv
accordingly. Understock and overstock risks also exist in this case. Since
we will order 5 items to lift up the inventory level to 15, we will run
tomorrow with 15 items instead of 10 items if we decide to order.
E[Cost] =cf + (15 − 10)cv + cu E[(D − 15)+ ] + co E[(15 − D)+ ]
=30 + 20 + 10(1.25) + 2(1.25) = $65

Since the expected cost of not ordering is lower than that of ordering, we
should not order if we already have 10 items.
It is obvious that if we have y∗ items at hand right now, we should order
nothing, since we already possess the optimal number of items for tomorrow's
operation. It is also obvious that if we currently have nothing, we should order
y∗ items to prepare y∗ items for tomorrow. There should be a point between 0
and y∗ where you are indifferent between ordering and not ordering.
Suppose that you, as a manager, need to give your assistant instructions on when
to place an order and when not to. Instead of providing instructions for every
possible current inventory level, it is easier to give your assistant just one number
that separates the decision. Let us call that number the critical level of current
inventory, m∗. If we have more than m∗ items at hand, the expected cost of not
ordering will be lower than the expected cost of ordering, so we should not order.
Conversely, if we have fewer than m∗ items, we should order. Therefore, when we
have exactly m∗ items at hand right now, the expected cost of ordering should be
equal to that of not ordering. We will use this intuition to obtain the value of m∗.
The decision process is summarized in the following figure.

[Figure: the inventory axis, marked with the critical level m∗ for placing an order and the optimal order-up-to quantity y∗. If your current inventory lies below m∗, you should order up to y∗; if it lies above m∗, you should not order.]

Example 1.6. Given the same settings as the previous example (cu =
10, cv = 4, co = 2, cf = 30), what is the critical level of current inventory
m∗ that determines whether you should order or not?
Answer: From the answer of the previous example, we can infer that the
critical value should be less than 10, i.e. 0 < m∗ < 10. Suppose we currently
own m∗ items. Now, evaluate the expected costs of the two scenarios: ordering
and not ordering.

1. Scenario 1: Not Ordering

E[Cost] = cu E[(D − m∗)+] + co E[(m∗ − D)+]
        = 10(E[D] − m∗) + 2(0) = 150 − 10m∗     (1.14)

2. Scenario 2: Ordering
In this case, we will order. Given that we will order, we will order y ∗ −m∗ =
15 − m∗ items. Therefore, we will start tomorrow with 15 items.

E[Cost] = cf + (15 − m∗)cv + cu E[(D − 15)+] + co E[(15 − D)+]
        = 30 + 4(15 − m∗) + 10(1.25) + 2(1.25) = 105 − 4m∗     (1.15)

At m∗ , (1.14) and (1.15) should be equal.

150 − 10m∗ = 105 − 4m∗ ⇒ m∗ = 7.5 units

The critical value is 7.5 units. If your current inventory is below 7.5, you should
order for tomorrow. If the current inventory is above 7.5, you should not order.
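The same break-even point can be found numerically in R by comparing the two
expected-cost expressions over a grid of initial inventory levels; a sketch assuming
D ∼ Uniform(10, 20) and the cost parameters of Example 1.6:

cu=10; co=2; cv=4; cf=30; ystar=15
m = seq(0, 10, by=0.5)
cost.no  = cu*(15 - m)                              # not ordering: 150 - 10m, since E[D] = 15 and m <= 10
cost.yes = cf + cv*(ystar - m) + cu*1.25 + co*1.25  # ordering up to 15: 105 - 4m
m[min(which(cost.no <= cost.yes))]                  # first m where not ordering is at least as cheap, 7.5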

1.4 Simulation
Generate 100 random demands from Uniform(10, 30).
p = 10, cf = 30, cv = 4, h = 5, b = 3

(p + b − cv)/(p + b + h) = (10 + 3 − 4)/(10 + 3 + 5) = 1/2
The optimal order-up-to quantity from Theorem 1.3 is 20. We will compare
the performance between the policies of y = 15, 20, 25.

Listing 1.1: Continuous Uniform Demand Simulation

# Set up parameters
p=10;cf=30;cv=4;h=5;b=3

# How many random demands will be generated?


n=100

# Generate n random demands from the uniform distribution



Dmd=runif(n,min=10,max=30)

# Test the policy where we order 15 items for every period


y=15
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 33.54218

# Test the policy where we order 20 items for every period


y=20
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 44.37095

# Test the policy where we order 25 items for every period


y=25
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 32.62382

You can see the policy with y = 20 maximizes the 100-period average profit
as promised by the theory. In fact, if n is relatively small, it is not guaranteed
that we have maximized profit even if we run based on the optimal policy
obtained from this section. The underlying assumption is that we should operate
with this policy for a long time. Then, Theorem 1.1 guarantees that the average
profit will be maximized when we use the optimal ordering policy.
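Instead of testing only three policies, we can sweep y over a grid and pick the
empirical maximizer; a short sketch that reuses p, cf, cv, h, b, and Dmd from
Listing 1.1:

avg.profit = function(y, Dmd) {
  mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
}
ys = seq(10, 30, by=0.5)
ys[which.max(sapply(ys, avg.profit, Dmd=Dmd))]   # typically close to the optimal 20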
The discrete demand case can also be simulated. Suppose the demand has the
following distribution; all other parameters remain the same.

d          10    15    20    25    30
Pr{D = d}  1/4   1/8   1/4   1/8   1/4

The theoretical optimal order-up-to quantity in this case is also 20. Let us test
three policies: y = 15, 20, 25.

Listing 1.2: Discrete Demand Simulation

# Set up parameters
p=10;cf=30;cv=4;h=5;b=3

# How many random demands will be generated?


n=100

# Generate n random demands from the discrete demand distribution


Dmd=sample(c(10,15,20,25,30),n,replace=TRUE,c(1/4,1/8,1/4,1/8,1/4))

# Test the policy where we order 15 items for every period


y=15
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 19.35

# Test the policy where we order 20 items for every period


y=20
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 31.05

# Test the policy where we order 25 items for every period



y=25
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))
> 26.55

Other distributions, such as the triangular, normal, Poisson, or binomial
distributions, are also available in R. When you do your senior project, for example,
you will observe the demand for a department or a factory. You first approximate
the demand using one of these theoretically established distributions. Then, you can
simulate the performance of possible operating policies.
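For instance, a Poisson demand stream can be plugged into the same profit evaluation;
a minimal sketch in which the Poisson mean of 20 is only an illustrative assumption,
not a distribution used elsewhere in this chapter:

p=10; cf=30; cv=4; h=5; b=3; n=100
Dmd = rpois(n, lambda=20)      # illustrative Poisson demand with mean 20
y = 20
mean(p*pmin(Dmd,y)-cf-y*cv-h*pmax(y-Dmd,0)-b*pmax(Dmd-y,0))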

1.5 Exercise
1. Show that (D ∧ y) + (y − D)+ = y.

2. Let D be a discrete random variable with the following pmf.

d          5     6     7     8     9
Pr{D = d}  1/10  3/10  4/10  1/10  1/10

Find

(a) E[min(D, 7)]


(b) E[(7 − D)+ ]

where x+ = max(x, 0).

3. Let D be a Poisson random variable with parameter 3. Find

(a) E[min(D, 2)]


(b) E[(3 − D)+ ].

Note that pmf of a Poisson random variable with parameter λ is

Pr{D = k} = (λ^k / k!) e^{−λ}.

4. Let D be a continuous random variable and uniformly distributed between


5 and 10. Find

(a) E[max(D, 8)]


(b) E[(D − 8)− ]

where x− = min(x, 0).

5. Let D be an exponential random variable with parameter 7. Find

(a) E[max(D, 3)]



(b) E[(D − 4)− ].


Note that pdf of an exponential random variable with parameter λ is

fD (x) = λe−λx for x ≥ 0.

6. David buys fruits and vegetables wholesale and retails them at Davids
Produce on La Vista Road. One of the more difficult decisions is the
amount of bananas to buy. Let us make some simplifying assumptions,
and assume that David purchases bananas once a week at 10 cents per
pound and retails them at 30 cents per pound during the week. Bananas
that are more than a week old are too ripe and are sold for 5 cents per
pound.
(a) Suppose the demand for the good bananas follows the same distribu-
tion as D given in Problem 2. What is the expected profit of David
in a week if he buys 7 pounds of banana?
(b) Now assume that the demand for the good bananas is uniformly
distributed between 5 and 10 like in Problem 4. What is the expected
profit of David in a week if he buys 7 pounds of banana?
(c) Find the expected profit if David’s demand for the good bananas
follows an exponential distribution with mean 7 and if he buys 7
pounds of banana.
7. Suppose we are selling lemonade during a football game. The lemonade
sells for $18 per gallon but only costs $3 per gallon to make. If we run
out of lemonade during the game, it will be impossible to get more. On
the other hand, leftover lemonade has a value of $1. Assume that we
believe the fans would buy 10 gallons with probability 0.1, 11 gallons with
probability 0.2, 12 gallons with probability 0.4, 13 gallons with probability
0.2, and 14 gallons with probability 0.1.
(a) What is the mean demand?
(b) If 11 gallons are prepared, what is the expected profit?
(c) What is the best amount of lemonade to order before the game?
(d) Instead, suppose that the demand was normally distributed with
mean 1000 gallons and variance 200 gallons2 . How much lemonade
should be ordered?
8. Suppose that a bakery specializes in chocolate cakes. Assume the cakes
retail at $20 per cake, but it takes $10 to prepare each cake. Cakes cannot
be sold after one week, and they have a negligible salvage value. It is
estimated that the weekly demand for cakes is: 15 cakes in 5% of the
weeks, 16 cakes in 20% of the weeks, 17 cakes in 30% of the weeks, 18
cakes in 25% of the weeks, 19 cakes in 10% of the weeks, and 20 cakes in
10% of the weeks. How many cakes should the bakery prepare each week?
What is the bakery’s expected optimal weekly profit?

9. A camera store specializes in a particular popular and fancy camera. As-


sume that these cameras become obsolete at the end of the month. They
guarantee that if they are out of stock, they will special-order the cam-
era and promise delivery the next day. In fact, what the store does is to
purchase the camera from an out of state retailer and have it delivered
through an express service. Thus, when the store is out of stock, they
actually lose the sales price of the camera and the shipping charge, but
they maintain their good reputation. The retail price of the camera is
$600, and the special delivery charge adds another $50 to the cost. At
the end of each month, there is an inventory holding cost of $25 for each
camera in stock (for doing inventory etc). Wholesale cost for the store to
purchase the cameras is $480 each. (Assume that the order can only be
made at the beginning of the month.)
(a) Assume that the demand has a discrete uniform distribution from 10
to 15 cameras a month (inclusive). If 12 cameras are ordered at the
beginning of a month, what are the expected overstock cost and the
expected understock or shortage cost? What is the expected total
cost?
(b) What is optimal number of cameras to order to minimize the expected
total cost?
(c) Assume that the demand can be approximated by a normal distribu-
tion with mean 1000 and standard deviation 100 cameras a month.
What is the optimal number of cameras to order to minimize the
expected total cost?
10. Next month’s production at a manufacturing company will use a certain
solvent for part of its production process. Assume that there is an ordering
cost of $1,000 incurred whenever an order for the solvent is placed and
the solvent costs $40 per liter. Due to short product life cycle, unused
solvent cannot be used in following months. There will be a $10 disposal
charge for each liter of solvent left over at the end of the month. If there
is a shortage of solvent, the production process is seriously disrupted at a
cost of $100 per liter short. Assume that the initial inventory level is m,
where m = 0, 100, 300, 500 and 700 liters.
(a) What is the optimal ordering quantity for each case when the demand
is discrete with Pr{D = 500} = Pr{D = 800} = 1/8, Pr{D =
600} = 1/2 and Pr{D = 700} = 1/4?
(b) What is the optimal ordering policy for arbitrary initial inventory
level m? (You need to specify the critical value m∗ in addition to
the optimal order-up-to quantity y ∗ . When m ≤ m∗ , you make an
order. Otherwise, do not order.)
(c) Assume optimal quantity will be ordered. What is the total expected
cost when the initial inventory m = 0? What is the total expected
cost when the initial inventory m = 700?

11. Redo Problem 10 for the case where the demand is governed by the con-
tinuous uniform distribution varying between 400 and 800 liters.

12. An automotive company will make one last production run of parts for
Part 947A and 947B, which are not interchangeable. These parts are no
longer used in new cars, but will be needed as replacements for warranty
work in existing cars. The demand during the warranty period for 947A is
approximately normally distributed with mean 1,500,000 parts and stan-
dard deviation 500,000 parts, while the mean and standard deviation for
947B is 500,000 parts and 100,000 parts. (Assume that two demands
are independent.) Ignoring the cost of setting up for producing the part,
each part costs only 10 cents to produce. However, if additional parts
are needed beyond what has been produced, they will be purchased at 90
cents per part (the same price for which the automotive company sells its
parts). Parts remaining at the end of the warranty period have a salvage
value of 8 cents per part. There has been a proposal to produce Part
947C, which can be used to replace either of the other two parts. The
unit cost of 947C jumps from 10 to 14 cents, but all other costs remain
the same.

(a) Assuming 947C is not produced, how many 947A should be pro-
duced?
(b) Assuming 947C is not produced, how many 947B should be pro-
duced?
(c) How many 947C should be produced in order to satisfy the same
fraction of demand from parts produced in-house as in the first two
parts of this problem.
(d) How much money would be saved or lost by producing 947C, but
meeting the same fraction of demand in-house?
(e) Is your answer to question (c), the optimal number of 947C to pro-
duce? If not, what would be the optimal number of 947C to produce?
(f) Should the more expensive part 947C be produced instead of the two
existing parts 947A and 947B. Why?

Hint: compare the expected total costs. Also, suppose that D ∼ Normal(µ, σ²). Then

∫_{0}^{q} x (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx
  = ∫_{0}^{q} (x − µ) (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx + µ ∫_{0}^{q} (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)} dx
  = ∫_{µ²}^{(q−µ)²} (1/(2√(2π)σ)) e^{−t/(2σ²)} dt + µ Pr{0 ≤ D ≤ q},

where, in the second step, we changed variables by letting t = (x − µ)².



13. A warranty department manages the after-sale service for a critical part
of a product. The department has an obligation to replace any damaged
parts in the next 6 months. The number of damaged parts X in the next
6 months is assumed to be a random variable that follows the following
distribution:

x 100 200 300 400


Pr{X = x} .1 .2 .5 .2

The department currently has 200 parts in stock. The department needs
to decide if it should make one last production run for the part to be used
for the next 6 months. To start the production run, the fixed cost is $2000.
The unit cost to produce a part is $50. During the warranty period of next
6 months, if a replacement request comes and the department does not
have a part available in house, it has to buy a part from the spot-market at
the cost of $100 per part. Any part left at the end of 6 month sells at $10.
(There is no holding cost.) Should the department make the production
run? If so, how many items should it produce?

14. A store sells a particular brand of fresh juice. By the end of the day, any
unsold juice is sold at a discounted price of $2 per gallon. The store gets
the juice daily from a local producer at the cost of $5 per gallon, and it
sells the juice at $10 per gallon. Assume that the daily demand for the
juice is uniformly distributed between 50 gallons to 150 gallons.

(a) What is the optimal number of gallons that the store should order
from the distributor each day in order to maximize the expected
profit each day?
(b) If 100 gallons are ordered, what is the expected profit per day?

15. An auto company is to make one final purchase of a rare engine oil to ful-
fill its warranty services for certain car models. The current price for the
engine oil is $1 per gallon. If the company runs out of the oil during the
warranty period, it will purchase the oil from a supplier at the market price of
$4 per gallon. Any leftover engine oil after the warranty period is useless,
and costs $1 per gallon to get rid of. Assume the engine oil demand during
the warranty is uniformly distributed (continuous distribution) between 1
million gallons to 2 million gallons, and that the company currently has
half a million gallons of engine oil in stock (free of charge).

(a) What is the optimal amount of engine oil the company should pur-
chase now in order to minimize the total expected cost?
(b) If 1 million gallons are purchased now, what is the total expected
cost?

16. A company is obligated to provide warranty service for Product A to its


customers next year. The warranty demand for the product follows the
following distribution.

d 100 200 300 400


Pr{D = d} .2 .4 .3 .1

The company needs to make one production run to satisfy the warranty
demand for entire next year. Each unit costs $100 to produce; the penalty
cost of a unit is $500. By the end of the year, the salvage value of each
unit is $50.

(a) Suppose that the company has currently 0 units. What is the optimal
quantity to produce in order to minimize the expected total cost?
Find the optimal expected total cost.
(b) Suppose that the company has currently 100 units at no cost and
there is $20000 fixed cost to start the production run. What is the
optimal quantity to produce in order to minimize the expected total
cost? Find the optimal expected total cost.

17. Suppose you are running a restaurant having only one menu item, lettuce salad,
in Tech Square. You should order lettuce every day at 10pm, after closing.
Then, your supplier delivers the ordered amount of lettuce at 5am the next
morning. Store hours are from 11am to 9pm every day. The demand for
the lettuce salad for a day (11am-9pm) has the following distribution.

d 20 25 30 35
Pr{D = d} 1/6 1/3 1/3 1/6

One lettuce salad requires two units of lettuce. The selling price of lettuce
salad is $6, the buying price of one unit of lettuce is $1. Of course, leftover
lettuce of a day cannot be used for future salad and you have to pay 50
cents per unit of lettuce for disposal.
(a) What is the optimal order-up-to quantity of lettuce for a day?
(b) If you ordered 50 units of lettuce today, what is the expected profit
of tomorrow? Include the purchasing cost of 50 units of lettuce in
your calculation.
Chapter 2

Queueing Theory

Before getting into discrete-time Markov chains, we will learn about general
issues in queueing theory. Queueing theory deals with systems that have
waiting space. It is a very powerful tool that can model a broad range
of issues. Starting from the analysis of a single queue, a set of queues connected
with each other will be covered as well in the end. This chapter will give you
the background knowledge you need when you read the required book, The Goal. We will
revisit queueing theory once we have more advanced modeling techniques
and knowledge.

2.1 Introduction
Think about a service system. All of you must have experienced waiting in a
service system. One example would be the Student Center or some restaurants.
This is a human system. A bit more automated service system that has a queue
would be a call center and automated answering machines. We can imagine a
manufacturing system instead of a service system.
These waiting systems can be generalized as a set of buffers and servers.
There are key factors when you try to model such a system. What would you
need to analyze your system?

• How frequently do customers come to your system? → Inter-arrival Times

• How fast can your servers serve the customers? → Service Times

• How many servers do you have? → Number of Servers

• How large is your waiting space? → Queue Size

If you can collect data about these metrics, you can characterize your queueing
system. In general, a queueing system can be denoted as follows.

G/G/s/k


The first letter characterizes the distribution of inter-arrival times. The second
letter characterizes the distribution of service times. The third number denotes
the number of servers of your queueing system. The fourth number denotes
the total capacity of your system. The fourth number can be omitted, and in
that case it means that your capacity is infinite, i.e. your system can contain
any number of people, up to infinity. The letter “G” represents a general
distribution. Other candidate letters for this position are “M” and “D”, and
their meanings are as follows.

• G: General Distribution

• M: Exponential Distribution

• D: Deterministic Distribution (or constant)

The number of servers can vary from one to many to infinity. The size of the
buffer can also be either finite or infinite. To simplify the model, assume that
there is only a single server and we have an infinite buffer. An infinite buffer
means that the waiting space is so large that it is as if no limit exists.
Now we set up the model for our queueing system. In terms of analysis,
what are we interested in? What would be the performance measures of such
systems that you as a manager should know?

• How long should your customer wait in line on average?

• How long is the waiting line on average?

There are two concepts of average. One is average over time. This applies
to the average number of customers in the system or in the queue. The other
is average over people. This applies to the average waiting time per customer.
You should be able to distinguish these two.

Example 2.1. Assume that the system is empty at t = 0. Assume that

u1 = 1, u2 = 3, u3 = 2, u4 = 3,
v1 = 4, v2 = 2, v3 = 1, v4 = 2.

(ui is ith customer’s inter-arrival time and vi is ith customer’s service time.)

1. What is the average number of customers in the system during the first
10 minutes?

2. What is the average queue size during the first 10 minutes?

3. What is the average waiting time per customer for the first 4 customers?

Answer:

1. If we draw the number of people in the system at time t with respect to


t, it will be as follows.

[Figure: Z(t), the number of customers in the system, plotted for t ∈ [0, 10].]

E[Z(t)]_{t∈[0,10]} = (1/10) ∫_{0}^{10} Z(t) dt = (1/10)(10) = 1

2. If we draw the number of people in the queue at time t with respect to t,


it will be as follows.

[Figure: Q(t), the number of customers waiting in the queue, plotted for t ∈ [0, 10].]

E[Q(t)]_{t∈[0,10]} = (1/10) ∫_{0}^{10} Q(t) dt = (1/10)(2) = 0.2

3. We first need to compute the waiting time of each of the 4 customers. The
first customer does not wait, so w1 = 0. The second customer arrives at time 4,
while the first customer's service ends at time 5, so the second customer has to
wait 1 minute: w2 = 1. Using similar logic, w3 = 1 and w4 = 0.

E[W] = (0 + 1 + 1 + 0)/4 = 0.5 min

2.2 Lindley Equation


From the previous example, we now should be able to compute each customer’s
waiting time given ui , vi . It requires too much effort if we have to draw graphs
every time we need to compute wi . Let us generalize the logic behind calculating
waiting times for each customer. Let us determine (i + 1)th customer’s waiting
time. If (i + 1)th customer arrives after all the time ith customer waited and
got served, (i + 1)th customer does not have to wait. Its waiting time is 0.
Otherwise, it has to wait wi + vi − ui+1. Figure 2.1 and Figure 2.2 explain the
two cases.

[Figure 2.1: timeline showing the ith arrival, ith service start, (i + 1)th arrival, and ith service end. The (i + 1)th arrival comes before the ith service completion, so the (i + 1)th waiting time is wi + vi − ui+1.]

[Figure 2.2: timeline showing the ith arrival, ith service start, ith service end, and (i + 1)th arrival. The (i + 1)th arrival comes after the ith service completion, so the (i + 1)th waiting time is 0.]

Simply put,

wi+1 = (wi + vi − ui+1 )+ .

This is called the Lindley Equation.

Example 2.2. Given the following inter-arrival times and service times of first
10 customers, compute waiting times and system times (time spent in the system
including waiting time and service time) for each customer.

ui = 3, 2, 5, 1, 2, 4, 1, 5, 3, 2
vi = 4, 3, 2, 5, 2, 2, 1, 4, 2, 3

Answer: Note that system time can be obtained by adding waiting time and
service time. Denote the system time of ith customer by zi .

ui 3 2 5 1 2 4 1 5 3 2
vi 4 3 2 5 2 2 1 4 2 3
wi 0 2 0 1 4 2 3 0 1 1
zi 4 5 2 6 6 4 4 4 3 4
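These waiting times can also be computed with a short loop implementing the
Lindley recursion; a compact version of the function given later in Listing 2.1:

u = c(3,2,5,1,2,4,1,5,3,2)
v = c(4,3,2,5,2,2,1,4,2,3)
w = numeric(length(u))      # w[1] = 0: the first customer never waits
for (i in 2:length(u)) w[i] = max(w[i-1] + v[i-1] - u[i], 0)
w         # 0 2 0 1 4 2 3 0 1 1
w + v     # system times z: 4 5 2 6 6 4 4 4 3 4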

2.3 Traffic Intensity


Suppose

E[ui ] =mean inter-arrival time = 2 min


E[vi ] =mean service time = 4 min.

Is this queueing system stable? By stable, we mean that the queue size should
not go to infinity. Intuitively, this queueing system will not last, because the
average service time is greater than the average inter-arrival time, so your system
will soon explode. What was the logic behind this judgement? It was basically
comparing the average inter-arrival time and the average service time. To simplify
the judgement, we come up with a new quantity called the traffic intensity.
Definition 2.1 (Traffic Intensity). Traffic intensity ρ is defined to be

ρ = λ/µ = (1/E[ui]) / (1/E[vi]),
where λ is the arrival rate and µ is the service rate.
Given a traffic intensity, it will fall into one of the following three categories.
• If ρ < 1, the system is stable.
• If ρ = 1, the system is unstable unless both inter-arrival times and service
times are deterministic (constant).
• If ρ > 1, the system is unstable.
Then, why don't we call ρ the utilization instead of the traffic intensity? Utilization
seems to be a more intuitive and user-friendly name. In fact, utilization just
happens to be the same as ρ if ρ < 1. However, a problem arises if ρ > 1, because
utilization cannot go over 100%. Utilization is bounded above by 1, and that is
why traffic intensity is regarded as the more general notion for comparing arrival
and service rates.
Definition 2.2 (Utilization). Utilization is defined as follows.
Utilization = ρ if ρ < 1, and 1 if ρ ≥ 1.

Utilization can also be interpreted as the long-run fraction of time the server
is utilized.

2.4 Kingman Approximation Formula


Theorem 2.1 (Kingman’s High-traffic Approximation Formula). Assume the
traffic intensity ρ < 1 and ρ is close to 1. The long-run average waiting time in
a queue is

E[W] ≈ E[vi] · (ρ/(1 − ρ)) · ((ca² + cs²)/2),

where ca², cs² are the squared coefficients of variation of the inter-arrival times and service
times, defined as follows.

ca² = Var[u1]/(E[u1])²,   cs² = Var[v1]/(E[v1])²
Example 2.3. 1. Suppose the inter-arrival time follows an exponential
distribution with mean 3 minutes and the service time follows an exponential
distribution with mean 2 minutes. What is the expected waiting time
per customer?

2. Suppose the inter-arrival time is a constant 3 minutes and the service time is also
a constant 2 minutes. What is the expected waiting time per customer?
Answer:

1. Traffic intensity is

ρ = λ/µ = (1/E[ui]) / (1/E[vi]) = (1/3)/(1/2) = 2/3.

Since both inter-arrival times and service times are exponentially distributed,

E[ui] = 3, Var[ui] = 3² = 9, E[vi] = 2, Var[vi] = 2² = 4.

Therefore, ca² = Var[ui]/(E[ui])² = 1 and cs² = 1. Hence,

E[W] = E[vi] (ρ/(1 − ρ)) ((ca² + cs²)/2)
     = 2 · ((2/3)/(1/3)) · ((1 + 1)/2) = 4 minutes.

2. The traffic intensity remains the same, 2/3. However, since both inter-arrival
times and service times are constant, their variances are 0. Thus, ca² = cs² = 0.

E[W] = 2 · ((2/3)/(1/3)) · ((0 + 0)/2) = 0 minutes

It means that none of the customers will wait upon arrival.
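A small helper function makes such computations easy to repeat; a minimal sketch
that simply codes the formula of Theorem 2.1:

# Kingman approximation of the mean waiting time in a G/G/1 queue
kingman.wait = function(mean.u, mean.v, var.u, var.v) {
  rho = mean.v/mean.u            # traffic intensity
  ca2 = var.u/mean.u^2           # squared coefficient of variation of inter-arrival times
  cs2 = var.v/mean.v^2           # squared coefficient of variation of service times
  mean.v*(rho/(1-rho))*(ca2+cs2)/2
}
kingman.wait(3, 2, 9, 4)   # exponential case of Example 2.3, returns 4
kingman.wait(3, 2, 0, 0)   # deterministic case, returns 0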
As shown in the previous example, when the distributions of both inter-arrival
times and service times are exponential, the squared coefficient of variation term
becomes 1 in Kingman's approximation formula, and the formula becomes exact
for computing the average waiting time per customer in the M/M/1 queue.

E[W] = E[vi] (ρ/(1 − ρ))

Also note that if the inter-arrival time or service time distribution is deterministic,
ca² or cs² becomes 0.
Example 2.4. You are running a highway, collecting money at the entering
toll gate. You reduced the utilization level of the highway from 90% to 80% by
adopting a car pool lane. How much does the average waiting time in front of the
toll gate decrease?
Answer:

0.9/(1 − 0.9) = 9,   0.8/(1 − 0.8) = 4

The average waiting time in front of the toll gate is reduced by more than half.
The Goal is about identifying bottlenecks in a plant. When you become
a manager of a company and are running an expensive machine, you usually
want to run it all the time at full utilization. However, the implication of the
Kingman formula is that as your utilization approaches 100%, the waiting time
skyrockets. It means that if there is any uncertainty or random fluctuation in
the input to your system, your system will suffer greatly. In the lower ρ region,
increasing ρ is not that bad, but if ρ is near 1, increasing utilization a little bit
can lead to a disaster.
Atlanta, 10 years ago, did not suffer that much from traffic problems. As its
traffic infrastructure capacity gets closer to the demand, it becomes more
and more fragile to uncertainty.
A lot of the strategies presented in The Goal are in fact about decreasing ρ. You can
do various things to reduce the ρ of your system, by outsourcing some processes, etc.
You can also strategically manage or balance the load on different parts of your
system. You may want to utilize your customer service organization 95% of the time,
while the utilization of sales people is 10%.

2.5 Little’s Law


L = λW
Little's Law is much more general than the G/G/1 queue. It can be applied
to any black box with a definite boundary. The Georgia Tech campus can be one
black box; the ISyE building itself can be another. In a G/G/1 queue, we can
easily get the average queue size, the average service time, or the average time in
system, depending on how we draw the box around the queueing system.
The following example shows that Little's law can be applied in a broader
context than queueing theory.

Example 2.5 (Merge of I-75 and I-85). Atlanta is the place where two inter-
state highways, I-75 and I-85, merge and cross each other. As a traffic manager
of Atlanta, you would like to estimate the average time it takes to drive from
the north confluence point to the south confluence point. On average, 100 cars
per minute enter the merged area from I-75 and 200 cars per minute enter the
same area from I-85. You also dispatched a chopper to take an aerial snapshot
of the merged area and counted how many cars are in the area. It turned out
that on average 3000 cars are within the merged area. What is the average time
between entering and exiting the area per vehicle?
Answer:

L = 3000 cars
λ = 100 + 200 = 300 cars/min
∴ W = L/λ = 3000/300 = 10 minutes

2.6 Throughput
Another focus of The Goal is set on the throughput of a system. Throughput
is defined as follows.

Definition 2.3 (Throughput). Throughput is the rate of output flow from a


system. If ρ ≤ 1, throughput= λ. If ρ > 1, throughput= µ.

The bounding constraint of throughput is either arrival rate or service rate


depending on the traffic intensity.

Example 2.6 (Tandem queue with two stations). Suppose your factory pro-
duction line has two stations linked in series. Every raw material coming into
your line should be processed by Station A first. Once it is processed by Station
A, it goes to Station B for finishing. Suppose raw material is coming into your
line at 15 units per minute. Station A can process 20 units per minute and
Station B can process 25 units per minute.

1. What is the throughput of the entire system?

2. If we double the arrival rate of raw material from 15 to 30 units per minute,
what is the throughput of the whole system?

Answer:

1. First, obtain the traffic intensity for Station A.

ρA = λ/µA = 15/20 = 0.75

Since ρA < 1, the throughput of Station A is λ = 15 units per minute.
Since Station A and Station B are linked in series, the throughput of Station
A becomes the arrival rate for Station B.

ρB = λ/µB = 15/25 = 0.6

Also, ρB < 1, so the throughput of Station B is λ = 15 units per minute.
Since Station B is the final stage of the entire system, the throughput of
the entire system is also λ = 15 units per minute.

2. Repeat the same steps.

ρA = λ/µA = 30/20 = 1.5

Since ρA > 1, the throughput of Station A is µA = 20 units per minute,
which in turn becomes the arrival rate for Station B.

ρB = µA/µB = 20/25 = 0.8

ρB < 1, so the throughput of Station B is µA = 20 units per minute,
which in turn is the throughput of the whole system.
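The same reasoning can be written as a one-line R computation: the throughput of
a tandem line with ample buffers is the minimum of the input rate and the station
rates. A small sketch:

# Throughput of a tandem line: min of the arrival rate and all service rates
tandem.throughput = function(lambda, mu) min(lambda, mu)
tandem.throughput(15, c(20, 25))   # 15 units per minute
tandem.throughput(30, c(20, 25))   # 20 units per minute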

2.7 Simulation

Listing 2.1: Simulation of a Simple Queue and Lindley Equation

N = 100

# Function for the Lindley equation
lindley = function(u, v) {
  for (i in 1:length(u)) {
    if (i == 1) w = 0
    else {
      w = append(w, max(w[i-1] + v[i-1] - u[i], 0))
    }
  }
  return(w)
}

# CASE 1: Discrete distribution
# Generate N inter-arrival times and service times
u = sample(c(2,3,4), N, replace=TRUE, c(1/3,1/3,1/3))
v = sample(c(1,2,3), N, replace=TRUE, c(1/3,1/3,1/3))

# Compute waiting time for each customer
w = lindley(u, v)
w

# CASE 2: Deterministic distribution
# All inter-arrival times are 3 minutes and all service times are 2 minutes.
# Observe that nobody waits in this case.
u = rep(3, 100)
v = rep(2, 100)
w = lindley(u, v)
w

Kingman's approximation formula is exact when the inter-arrival times and
service times follow iid exponential distributions:

   E[W] = (1/µ) (ρ / (1 − ρ)).

We can confirm this equation by simulating an M/M/1 queue.

Listing 2.2: Kingman Approximation

# lambda = arrival rate, mu = service rate
N = 10000; lambda = 1/10; mu = 1/7

# Generate N inter-arrival times and service times from exponential distribution
u = rexp(N, rate=lambda)
v = rexp(N, rate=mu)

# Compute the average waiting time of each customer
w = lindley(u, v)
mean(w)
> 16.20720

# Compare with Kingman approximation
rho = lambda/mu
(1/mu)*(rho/(1-rho))
> 16.33333

As N grows, the simulated average waiting time matches Kingman's approximation
formula more and more closely.
2.8 Exercise
1. Let Y be a random variable with p.d.f. ce^{−3s} for s ≥ 0, where c is a
constant.

(a) Determine c.
(b) What is the mean, variance, and squared coefficient of variation of
Y, where the squared coefficient of variation of Y is defined to be
Var[Y ]/(E[Y ])²?

2. Consider a single server queue. Initially, there is no customer in the sys-


tem. Suppose that the inter-arrival times of the first 15 customers are:

2, 5, 7, 3, 1, 4, 9, 3, 10, 8, 3, 2, 16, 1, 8

In other words, the first customer will arrive at t = 2 minutes, and the
second will arrive at t = 2 + 5 minutes, and so on. Also, suppose that the
service times of the first 15 customers are

1, 4, 2, 8, 3, 7, 5, 2, 6, 11, 9, 2, 1, 7, 6

(a) Compute the average waiting time (the time customer spend in buffer)
of the first 10 departed customers.
(b) Compute the average system time (waiting time plus service time) of
the first 10 departed customers.
(c) Compute the average queue size during the first 20 minutes.
(d) Compute the average server utilization during the first 20 minutes.
(e) Does Little's law hold for the average queue size in the first 20
minutes?
3. We want to decide whether to employ a human operator or buy a machine
to paint steel beams with a rust inhibitor. Steel beams are produced at a
constant rate of one every 14 minutes. A skilled human operator takes an
average time of 700 seconds to paint a steel beam, with a standard devi-
ation of 300 seconds. An automatic painter takes on average 40 seconds
more than the human painter to paint a beam, but with a standard devi-
ation of only 150 seconds. Estimate the expected waiting time in queue
of a steel beam for each of the operators, as well as the expected number
of steel beams waiting in queue in each of the two cases. Comment on the
effect of variability in service time.
4. The arrival rate of customers to an ATM machine is 30 per hour with
exponentially distributed inter-arrival times. The transaction times of the
customers are independent and identically distributed. Each transaction
time (in minutes) is distributed according to the following pdf:
f(s) = 4λ² s e^{−2λs} if s ≥ 0, and f(s) = 0 otherwise,

where λ = 2/3.
(a) What is the average waiting time for each customer?
(b) What is the average number of customers waiting in line?
(c) What is the average number of customers at the site?
5. A production line has two machines, Machine A and Machine B, that are
arranged in series. Each job needs to be processed by Machine A first. Once
it finishes the processing by Machine A, it moves to the next station, to
be processed by Machine B. Once it finishes the processing by Machine
B, it leaves the production line. Each machine can process one job at
a time. An arriving job that finds the machine busy waits in a buffer.

(The buffer sizes are assumed to be infinite.) The processing times for
Machine A are iid having exponential distribution with mean 4 minutes.
The processing times for Machine B are iid with mean 2 minutes. Assume
that the inter-arrival times of jobs arriving at the production line are iid,
having exponential distribution with mean of 5 minutes.
(a) What is the utilization of Machine A? What is the utilization of
Machine B?
(b) What is the throughput of the production system? (Throughput is
defined to be the rate of final output flow, i.e. how many items will
exit the system in a unit time.)
(c) What is the average waiting time at Machine A, excluding the service
time?
(d) It is known the average time in the entire production line is 30 min-
utes per job. What is the long-run average number of jobs in the
entire production line?
(e) Suppose that the mean inter-arrival time is changed to 1 minute.
What are the utilizations for Machine A and Machine B, respectively?
What is the throughput of the production system?
6. An auto collision shop has roughly 10 cars arriving per week for repairs.
A car waits outside until it is brought inside for bumping. After bumping,
the car is painted. On the average, there are 15 cars waiting outside in
the yard to be repaired, 10 cars inside in the bump area, and 5 cars inside
in the painting area. What is the average length of time a car is in the
yard, in the bump area, and in the painting area? What is the average
length of time from when a car arrives until it leaves?
7. A small bank is staffed by a single server. It has been observed that, during
a normal business day, the inter-arrival times of customers to the bank are
iid having exponential distribution with mean 3 minutes. Also, the
processing times of customers are iid having the following distribution (in
minutes):

x 1 2 3
Pr{X = x} 1/4 1/2 1/4

An arrival finding the server busy joins the queue. The waiting space is
infinite.
(a) What is the long-run fraction of time that the server is busy?
(b) What is the long-run average waiting time of each customer in the
queue, excluding the processing time?
(c) What is average number of customers in the bank, those in queue
plus those in service?

(d) What is the throughput of the bank?


(e) If the inter-arrival times have mean 1 minute, what is the throughput
of the bank?

8. You are the manager at the Student Center in charge of running the food
court. The food court is composed of two parts: cooking station and
cashier’s desk. Every person should go to the cooking station, place an
order, wait there and pick up first. Then, the person goes to the cashier’s
desk to check out. After checking out, the person leaves the food court.
The cook and the cashier can handle one person at a time. We have only
one cook and only one cashier. An arriving person who finds the cook or
the cashier busy waits in line. The waiting space is assumed to be infinite.
The processing times for the cook are iid having deterministic distribution
with mean 2 minutes, i.e. it takes exactly 2 minutes for the cook to
process one person. The processing times for the cashier are iid having
exponential distribution with mean 1 minute. Assume the inter-arrival
times of persons arriving at the food court are iid having exponential
distribution with mean 3 minutes.

(a) What is the utilization of the cook? What is the utilization of the
cashier? What is the throughput of the system?
(b) What is the long-run average system time at the cooking station,
including the service time?
(c) It is known that the average time spent in the food court is 15 minutes
per person. What is the long-run average number of people in the
food court?

Assume the mean inter-arrival time is changed to 30 seconds for the next
subquestions.

(d) What is the utilization of the cook? What is the utilization of the
cashier? What is the throughput of the system?
(e) What is the long-run average system time at the cooking station,
including the service time?

9. Suppose you are modeling the Georgia Tech (GT)’s undergraduate popu-
lation. Assume that the undergraduate population is in steady-state.

(a) Every year 2500 students come to GT as first-year students. It takes


4.5 years on average for each student to graduate. How many un-
dergraduate students does GT have on average at a certain point of
time?
(b) In addition to the first-year students, an unknown number of students
come to GT as transferred students every year. It takes 2.5 years
on average for a transferred student to graduate. Suppose you know
that the average size of GT undergraduate population at a certain
point of time is 13250 students. How many students come to GT as


transferred students every year?
Chapter 3

Discrete Time Markov Chain

In the first chapter, we learned how to manage uncertainty when selling
perishable items such as newspapers and milk. How should we model things differently
if we are running a business dealing with durable or non-perishable goods?
The discrete-time Markov chain is the right tool to model such a situation.

3.1 Introduction
Suppose that your company is selling home appliances to customers. You are an
inventory manager in charge of operating the company inventory efficiently. Sup-
pose that the weekly demand for the appliances is as follows.

        d        0    1    2    3
   Pr{Dn = d}   1/8  1/4  1/2  1/8

First of all, we need to define the general notion of inventory policy.

Definition 3.1 ((s, S) Policy). If the inventory policy is (s, S), then the order-
ing principle is as follows given that the current inventory level is Xn .

• If we have less than or equal to s items (Xn ≤ s), order up to S items,


i.e. order S − Xn .

• If we have more than s items (Xn > s), do not order for the next period.

It is a simple but widely adopted inventory policy in the real world. In our exam-
ple, assume our inventory policy is (s, S) = (1, 4). Let Xn be the inventory
level at the end of the nth week.


3.1.1 State Space


The state space is a set containing all possible values that Xn can take. In this case,
Xn can take any one of 0, 1, 2, 3, 4, so the state space is {0, 1, 2, 3, 4}.

S = {0, 1, 2, 3, 4}

3.1.2 Transition Probability Matrix


Now, we would like to compute the probability of a future event given the
information about our current state. First, think about Pr{Xn+1 = 3 | Xn = 1}.
In plain English, it means that if we have 1 item at the end of the nth week,
what is the probability that we finish the (n + 1)th week with 3 items. Due
to (s, S) = (1, 4), over the weekend between week n and n + 1, we will order
3 items to fill our inventory up to 4. Consequently, we will start the (n + 1)th
week with 4 items. The question boils down to the probability of us ending up
with 3 items given that we had 4 items at the beginning. It is equivalent to the
probability we sell only 1 item during the (n + 1)th week which is 1/4.

Pr{Xn+1 = 3 | Xn = 1}
  = Pr{Next week ends with 3 items | This week ended with 1 item}
  = Pr{Next week ends with 3 items | Next week begins with 4 items}
  = Pr{Selling 1 item during the next week}
  = Pr{D = 1} = 1/4

How about Pr{Xn+1 = 3 | Xn = 3}? In this case, we will not order because
our current inventory at the end of the nth week is 3, which is greater than s = 1.
Therefore, we will start the next week with 3 items. It boils down to the
probability of us ending up with 3 items at the end of the next week given that
we have 3 items at the beginning of the next week, meaning that we sell nothing
for the next week. The probability of selling nothing during a certain week is
1/8.

Pr{Xn+1 = 3 | Xn = 3}
  = Pr{Next week ends with 3 items | This week ended with 3 items}
  = Pr{Next week ends with 3 items | Next week begins with 3 items}
  = Pr{Selling 0 items during the next week}
  = Pr{D = 0} = 1/8

You should now be able to compute Pr{Xn+1 = j | Xn = i} for all possible
(i, j) pairs. There will be in this case 5 × 5 = 25 (i, j) pairs since we have 5
states in the state space. We use matrix notation to denote these probabilities.

           0    1    2    3    4
     0  (  0   1/8  1/2  1/4  1/8 )
     1  (  0   1/8  1/2  1/4  1/8 )
P =  2  ( 5/8  1/4  1/8   0    0  )
     3  ( 1/8  1/2  1/4  1/8   0  )
     4  (  0   1/8  1/2  1/4  1/8 )

Note that row numbers represent the current state and column numbers repre-
sent the next state. So, (i, j) element of this matrix is

Pi,j = Pr{Xn+1 = j | Xn = i}.

Another important feature of a transition probability matrix is that each row


sums to 1 because you will jump from a state to another in S for sure.
Now we know all possible probabilities for a one-period transition. How
can we compute transition probabilities over multiple periods? What is the
probability that we finish with 3 items two weeks later given that we ended this
week with 1 item? It will involve a property called the Markov property that
will be mentioned shortly. Let us proceed for now. In mathematical notation,
what is Pr{Xn+2 = 3 | Xn = 1}? At this point, let us recall the definition of
conditional probability.

   Pr{A | B} = Pr{A ∩ B} / Pr{B} = Pr{A, B} / Pr{B},   or   Pr{A, B} = Pr{A | B} Pr{B}
Using this definition,

Pr{Xn+2 = 3 | Xn = 1}
  = Σ_{k=0}^{4} Pr{Xn+2 = 3, Xn+1 = k | Xn = 1}
  = Σ_{k=0}^{4} Pr{Xn+2 = 3 | Xn+1 = k, Xn = 1} Pr{Xn+1 = k | Xn = 1}
  = Σ_{k=0}^{4} Pr{Xn+2 = 3 | Xn+1 = k} Pr{Xn+1 = k | Xn = 1}   (by the Markov property)
  = Pr{Xn+2 = 3 | Xn+1 = 0} Pr{Xn+1 = 0 | Xn = 1}
    + Pr{Xn+2 = 3 | Xn+1 = 1} Pr{Xn+1 = 1 | Xn = 1}
    + Pr{Xn+2 = 3 | Xn+1 = 2} Pr{Xn+1 = 2 | Xn = 1}
    + Pr{Xn+2 = 3 | Xn+1 = 3} Pr{Xn+1 = 3 | Xn = 1}
    + Pr{Xn+2 = 3 | Xn+1 = 4} Pr{Xn+1 = 4 | Xn = 1}.

This formula looks very complicated, and it will become more and more compli-
cated and error-prone as the state space grows. Here comes the power of matrix
notation. You can just square the transition probability matrix and look up the
corresponding number. In this case, P²_{1,3} is the probability we are looking for.
In general, the k-period transition probability is

   Pr{Xn+k = j | Xn = i} = P^k_{i,j}.
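As a quick numerical check, we can compute P² in R and read off the entry; note that the states 0–4 occupy rows and columns 1–5 of the R matrix, so state 1 corresponds to row 2 and state 3 to column 4.

library(expm)   # provides the matrix power operator %^%

# Transition matrix of the (s,S) = (1,4) inventory chain, states 0,1,2,3,4
P = matrix(c(  0, 1/8, 1/2, 1/4, 1/8,
               0, 1/8, 1/2, 1/4, 1/8,
             5/8, 1/4, 1/8,   0,   0,
             1/8, 1/2, 1/4, 1/8,   0,
               0, 1/8, 1/2, 1/4, 1/8), 5, 5, byrow=TRUE)

P2 = P %^% 2      # two-step transition matrix
P2[2, 4]          # Pr{X_{n+2} = 3 | X_n = 1}; state i sits in row/column i+1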

3.1.3 Initial Distribution


With transition probability matrix, we know what the probabilities are for jump-
ing from one state to another. However, we do not know from which state the
chain starts off at the beginning unless we are given the specific information.
The initial distribution is the information that determines where the chain starts at
time 0. The initial distribution plays a key role in computing unconditional proba-
bilities such as Pr{Xn = j}, as opposed to Pr{Xn = j | X0 = i}. In order to compute
Pr{Xn = j}, we need information about Pr{X0 = i} for all i. Then,

   Pr{Xn = j} = Σ_{i∈S} Pr{Xn = j | X0 = i} Pr{X0 = i}.

Example 3.1 (Computing Unconditioned Probability). Suppose that your


daily mood only depends on your mood on the previous day. Your mood has
three distinctive states: Happy, So-so, Gloomy. The transition probability ma-
trix is given as

             Happy  So-so  Gloomy
    Happy  (  .7     .3      0   )
P = So-so  (  .3     .5     .2   )
    Gloomy (   0     .9     .1   )
In addition, your mood on day 0 is determined by a coin toss. You are happy
with probability 0.5 and gloomy with probability 0.5. Then, what is the prob-
ability that you are gloomy on day 2?
Answer: Denote the states happy, so-so, gloomy by 0, 1, 2, respectively. Since
we need two-period transition probabilities from day 0 to day 2, we need to
compute P².

         ( .58  .36  .06 )
   P² =  ( .36  .52  .12 )
         ( .27  .54  .19 )

The probability that you are gloomy on day 2 is Pr{X2 = 2}.

Pr{X2 = 2} = Σ_{i=0}^{2} Pr{X2 = 2 | X0 = i} Pr{X0 = i}
           = Pr{X2 = 2 | X0 = 0} Pr{X0 = 0}
             + Pr{X2 = 2 | X0 = 1} Pr{X0 = 1}
             + Pr{X2 = 2 | X0 = 2} Pr{X0 = 2}
           = (.06)(.5) + (.12)(0) + (.19)(.5) = .125

Hence, you are gloomy on day 2 with probability 12.5%.
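The same calculation can be checked in R with a few lines (the %^% operator comes from the expm package, which is also used in Section 3.9):

library(expm)

# Mood chain: states 0 = Happy, 1 = So-so, 2 = Gloomy
P  = matrix(c(.7,.3,0, .3,.5,.2, 0,.9,.1), 3, 3, byrow=TRUE)
a0 = c(.5, 0, .5)          # initial distribution from the coin toss

a2 = a0 %*% (P %^% 2)      # distribution of the mood on day 2
a2[3]                      # Pr{X2 = 2}, the probability of being gloomy on day 2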

Handling initial distribution becomes easy when we use matrix notation.


An initial distribution can be denoted by a vector a0 , of which each element
represents the probability that the chain is in a state at time 0. For example,
if a0 = (.2, .5, .3), it is interpreted as the chain starts from state 1 with proba-
bility .2, from state 2 with probability .5, and from state 3 with probability .3.
This notation can be extended to any time n. For instance, an represents the
probabilities that the chain is at a certain state at time n. Then, the following
relation exists with this notation.

   a_{n+1} = a_n P,     a_n = a_0 P^n

where an = (Pr{Xn = 1}, Pr{Xn = 2}, . . . , Pr{Xn = k}) if the state space
S = {1, 2, . . . , k}.
Now that we learn all necessary elements of DTMC, let us try to model our
inventory system using DTMC.

Example 3.2 (Inventory Model for Non-perishable Items). A store sells a par-
ticular product which is classified as durable goods, i.e. left-over items can be
sold next period. The weekly demands for the product are iid random variables
with the following distribution:

d 0 1 2 3
Pr{D = d} .3 .4 .2 .1

The store is closed during the weekends. Unsold items at the end of a week
can be sold again the following week. Each Friday evening if the number of
remaining items is less than or equal to 2, the store orders enough items to
bring the total number of items up to 4 on Monday morning. Otherwise, do
not order. (The ordered items reach the store before the store opens on the
Monday morning.) Assume that any demand is lost when the product is out
of stock. Let Xn be the number of items in stock at the beginning of the nth
week, while Yn be the number of items at the end of the nth week. Xn , Yn
are DTMC. Then, what is the state space and transition probability matrix of
Xn , Yn respectively?
Answer: Our inventory policy, first of all, is (s, S) = (2, 4). Think about Xn
first. Since Xn is the inventory level at the beginning of each week, the state
space S = {3, 4} because if we had less than or equal to 2 items at the end of the
previous week, we would have ordered up to 4. Hence, the beginning inventory
is always greater than 2 which is our small s in the (s, S) policy. The transition
probability matrix is
 
          3   4
P =  3  ( .3  .7 )                                   (3.1)
     4  ( .4  .6 )

How about Yn? Since Yn is the end-of-week inventory level, the state space is
S = {0, 1, 2, 3, 4}. The transition probability matrix is

           0    1    2    3    4
     0  (  0   .1   .2   .4   .3 )
     1  (  0   .1   .2   .4   .3 )
P =  2  (  0   .1   .2   .4   .3 )
     3  ( .1   .2   .4   .3    0 )
     4  (  0   .1   .2   .4   .3 )

Note that rows 0, 1, 2, and 4 are identical: if we end a week with 0, 1, or 2 items
we order up to 4, and if we end with 4 items we already have 4, so in all four
cases the next week starts with 4 items.
Now let us define DTMC in a formal way. As shown above, a DTMC has
the following three elements.
Now let us define DTMC in a formal way. As shown above, a DTMC has
the following three elements.
• State space S: For example, S = {0, 1, 2} or S = {0, 1, 2, 3, . . . }. S does
not have to be finite.
• Transition probability matrix P : The sum of each row of P should
be equal to 1. The sum of each column does not have to be equal to 1.
• Initial distribution a: It gives you the information about where your
DTMC starts at time n = 0.
Definition 3.2 (Discrete Time Markov Chain (DTMC)). A discrete time stochas-
tic process X = {Xn : n = 0, 1, 2, . . . } is said to be a DTMC on state space S
with transition matrix P if for each n ≥ 1 and i0 , i1 , . . . , in−1 , i, j ∈ S,

Pr{Xn+1 = j | X0 = i0 , X1 = i1 , . . . , Xn−1 = in−1 , Xn = i}
   = Pr{Xn+1 = j | Xn = i} = Pi,j = (i, j)th entry of P.        (3.2)

3.1.4 Markov Property


The most important part of Definition 3.2 is (3.2), which is called the Markov
property. In plain English, it says that once you know today's state, tomorrow's
state has nothing to do with past information. No matter how you reached the
current state, your tomorrow will only depend on the current state. In (3.2),
• Past states: X0 = i0 , X1 = i1 , X2 = i2 , . . . , Xn−1 = in−1
• Current state: Xn = i
• Future state: Xn+1 = j.
Definition 3.3 (Markov Property). Given the current information (state), fu-
ture and past are independent.
From an information-gathering perspective, this is very appealing because you
only need to remember the current state. As an opposite example, Wikipedia keeps
track of the entire history of each article, which requires tremendous effort. This is
the beauty of the Markov property.

What if my situation depends not only on the current week but also on the week
before? Then you can define the state space properly so that each state contains two
weeks of information instead of one. I have to stress that you are the one who decides
how your model should be: what the state space is, and so on. You can add a bit more
structure to fit your situation to a Markov model.
Then, is our inventory model DTMC? Yes, because we know that we do not
have to know the past stock level to decide whether to order or not. Let us look
at the following example solely about the Markov property.
Example 3.3 (Weird Frog Jumping Around). Imagine a situation that a frog
is jumping around lily pads.
1. Suppose the frog is jumping around 3 pads. Label each pad as 1, 2, and 3.
The frog’s jumping has been observed extensively and behavioral scientists
concluded this frog’s jumping pattern to be as follows.

1 → 2 → 3 → 2 → 1 → 2 → 3 → 2 → ...

Let Xn denote the pad label that the frog is sitting on at time n. Is Xn
a DTMC? If so, what are the state space and the transition probability
matrix? If not, why?
2. A friend of the frog is also jumping around lily pads in the near pond.
This friend frog jumps around 4 pads instead of 3. Label each pad as
1, 2A, 2B, and 3. The friend frog’s jumping pattern is characterized as
follows.

1 → 2A → 3 → 2B → 1 → 2A → 3 → 2B → 1 → 2A → . . .

Let Yn denote the pad on which the friend frog is sitting at time n. Is Yn
a DTMC? If so, what are the state space and the transition probability
matrix? If not, why?
Answer:
1. It is not a DTMC because Xn does not have the Markov property. To
prove the point, let us try to complete the transition probability matrix
given that the state space S = {1, 2, 3}.
 
          1   2   3
     1  ( 0   1   0 )
P =  2  ( ?   ?   ? )
     3  ( 0   1   0 )

It is impossible to predict the next step of the frog by just looking at the
current location of it if the frog is sitting on pad 2. If we know additionally
where it came from to reach pad 2, it becomes possible to predict its next
step. That is, it requires past information to determine probability of
future. Hence the Markov property does not hold. Thus, Xn cannot be a
DTMC.

2. It is a DTMC. The state space is S = {1, 2A, 2B, 3} and the transition
   probability matrix is

            1   2A  2B   3
     1   (  0   1   0   0 )
P =  2A  (  0   0   0   1 )
     2B  (  1   0   0   0 )
     3   (  0   0   1   0 )

3.1.5 DTMC Models


DTMC is a very versatile tool which can simplify and model various situations
in the real world. Only inventory situation has been modeled using DTMC so
far. Let us explore several other possibilities.
Example 3.4 (Two State Model). S = {0, 1} and

     0  (  α    1−α )     ( 3/4  1/4 )
P =                    =
     1  ( 1−β    β  )     ( 1/2  1/2 )
What can be modeled using this type of DTMC? Give one example.
Answer: There can be many examples with a two-state DTMC. The following
are some examples.
• Consider modeling the weather of a day. State 0 means hot and
  state 1 means cold. Then, the probability of a hot day after a hot day is 3/4.
• Another example would be a machine repair process. State 0 means the ma-
  chine is up and running, and state 1 means the machine is under repair.
Example 3.5 (Simple Random Walk). Suppose you toss a coin at each time n
and you go up if you get a head, down if you get a tail. Then, the state space
S = {. . . , −2, −1, 0, 1, 2, . . . } and Xn is the position after nth toss of the coin.
What is the transition probability matrix given that the probability of getting
a head is p and that of getting a tail is q = 1 − p?


Answer:

           p,  if j = i + 1
   Pi,j =  q,  if j = i − 1
           0,  otherwise

Example 3.6 (Google PageRank Algorithm). Suppose you are analyzing brows-
ing behavior of users. There are three web pages of your interest (A, B, C) linked
to each other. Since transition from one page to another solely depends on the
current page you are viewing not on your browsing history, this case exhibits the
Markov property. Hence, we can model this browsing behavior using DTMC.
After collecting statistics regarding which link is clicked, you may come up with
the following transition probability matrix. It is just one possible example.
 
         A    B    C
    A  (  0   .7   .3 )
P = B  ( .2    0   .8 )
    C  ( .5   .5    0 )
In practice, there are a great number of web sites and pages, so the corre-
sponding state space and transition probability matrix will be extremely large.
However, in this way, you can predict users’ browsing behavior and estimate the
probability a user will visit a certain web site.
Theorem 3.1. Suppose there is a function

   f : S × (−∞, ∞) → S,   f (i, u) ∈ S,

such that Xn+1 = f (Xn , Un ), where {Ui : i = 1, 2, 3, . . . } is an iid sequence. Then
{Xn : n = 1, 2, 3, . . . } is a DTMC.
As in Example 3.3, it is usually easy to show that something does not possess
the Markov property. It is, however, hard to prove that something does possess
the Markov property. Theorem 3.1 is a useful tool in such cases. In plain
English, Theorem 3.1 means that if the next state is determined by the current
state together with an independent, identically distributed random input, then the
chain has the Markov property. Let us see how Theorem 3.1 works in our examples. Recall the simple
random walk case in Example 3.5. Where we will stand is determined by our
current position and a coin toss. The coin toss is independent from where we
are standing now. Hence, the simple random walk described in Example 3.5
does have the Markov property.
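To make Theorem 3.1 concrete, here is a minimal R sketch that simulates the simple random walk by repeatedly applying Xn+1 = f(Xn, Un), where the Un are iid Uniform(0,1) inputs and p = 2/3 is chosen just for illustration.

# f(i, u): move up with probability p, down with probability q = 1 - p
f = function(i, u, p = 2/3) {
  if (u < p) i + 1 else i - 1
}

set.seed(1)
N = 1000
U = runif(N)          # iid inputs U_1, ..., U_N
X = numeric(N + 1)
X[1] = 0              # X_0 = 0
for (n in 1:N) {
  X[n + 1] = f(X[n], U[n])
}
X[N + 1] / N          # roughly E[xi_1] = p - q = 1/3 by the strong law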

3.2 Stationary Distribution


Now that we have the basic elements of DTMC, the natural question will be what
we can do with this new tool. As in the newsvendor model, we are interested in
what will happen if we run this system for a long period of time. To understand
what will happen in the long run, we introduce a new concept called stationary
distribution.

Definition 3.4 (Stationary Distribution). Suppose π = (πi , i ∈ S) satisfies

• πi ≥ 0 for all i ∈ S and Σ_{i∈S} πi = 1,

• π = πP .

Then, π is said to be a stationary distribution of the DTMC.

Example 3.7. Assume the demand distribution as follows.

d 0 1 2 3
Pr(D = d) .1 .4 .3 .2

If our inventory policy is (s, S) = (1, 3) and Xn is the inventory at the end of
each week, then

         0    1    2    3
    0  ( .2   .3   .4   .1 )
    1  ( .2   .3   .4   .1 )
P = 2  ( .5   .4   .1    0 )
    3  ( .2   .3   .4   .1 )

What is the stationary distribution of this DTMC?


Answer: Let us apply the π = πP formula. Then, we have the following
system of linear equations.

π0 = .2π0 + .2π1 + .5π2 + .2π3
π1 = .3π0 + .3π1 + .4π2 + .3π3
π2 = .4π0 + .4π1 + .1π2 + .4π3
π3 = .1π0 + .1π1 + 0π2 + .1π3
 1 = π0 + π1 + π2 + π3

To solve this a bit more easily, you can fix one variable, in this case, π3 = 1.
After solving with this setting, you will obtain the relative ratio of π0 , π1 , π2 , π3 .
In the end, you need to scale it down or up to make its sum 1. As a result, you
will get the following result.

   π0 = 38/130,  π1 = 43/130,  π2 = 40/130,  π3 = 9/130 ≈ .0692.
In the test, you could be asked to give the stationary distribution for a sim-
pler transition probability matrix. Hence, you should practice computing the
stationary distribution from a given transition matrix.
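For larger matrices it is convenient to let the computer solve π = πP. The following R sketch does so for the example above by replacing one balance equation with the normalization condition Σ_i πi = 1.

P = matrix(c(.2,.3,.4,.1,
             .2,.3,.4,.1,
             .5,.4,.1, 0,
             .2,.3,.4,.1), 4, 4, byrow=TRUE)

# pi = pi P is equivalent to (I - t(P)) pi = 0; replace the last
# equation by sum(pi) = 1 so that the system has a unique solution.
A = diag(4) - t(P)
A[4, ] = rep(1, 4)
b = c(0, 0, 0, 1)
pi = solve(A, b)
pi                      # approximately (38, 43, 40, 9)/130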

3.2.1 Interpretation of Stationary Distribution


There are two ways to interpret the stationary distribution.

1. The stationary distribution represents the long-run fraction of time that the
   DTMC stays in each state.

2. The stationary distribution represents the probability that the DTMC will be
   in a certain state if you run it for a very long time.
Using these two interpretations, you can claim the following statements
based on the stationary distribution computed in the previous example.
• Example of Interpretation 1. If we run this inventory system for a
long time, π0 = 38/130 ≈ 29% of time we end a week with zero inventory
left.
• Example of Interpretation 2. If we run this inventory system for a
long time, the chance that we will end a week with three items is π3 =
9/130 ≈ 7%.

3.2.2 Function of Stationary Distribution


We want to compute long run average cost or profit of a DTMC. So far, we have
not considered any cost associated with inventory. Let us assume the following
cost structure in addition to the previous example.
1. Holding cost for each item left by the end of a Friday is $100 per item.
2. Variable cost for purchasing one item is $1000.
3. Fixed cost is $1500.
4. Each item sells $2000.
5. Unfulfilled demand is lost.
If you are a manager of a company, you will be interested in something like
this: What is the long-run average profit per week? You can lose money one week
and earn money another week. But you should be aware of the profitability of your
business in general.
Let f (i) be the expected profit of the following week, given that this week's
inventory ends with i items. Let us think about the case i = 0 first. You
need to order three items, and cost of doing it will be associated with both
variable cost and fixed cost. We should also count in the revenue you will earn
next week. Fortunately, we do not have to pay any holding cost because we do
not have any inventory at the end of this week.

f (0) = − Cost + Revenue


=[−3($1000) − $1500] + [3($2000)(.2) + 2($2000)(.3) + 1($2000)(.4) + 0(.1)]
= − $1300

This is not the best week you want. How about the case you are left with 2
items at the end of this week? First of all, you should pay the holding cost $100

per item. When calculating the expected revenue, you should add probabilities
for D is 2 or 3. This is because even if the demand is 3, you can only sell 2
items. Since you do not order, there will be no cost associated with ordering.
f (2) = − Cost + Revenue
=[−2($100)] + [($0)(.1) + ($2000)(.4) + ($4000)(.3 + .2)]
=$2600
This seems to be quite a good week. The rest can be computed as follows.
f (1) =[−1($100) − 2($1000) − $1500]
+[3($2000)(.2) + 2($2000)(.3) + 1($2000)(.4) + 0(.1)] = −$400
f (3) =[−3($100)] + [3($2000)(.2) + 2($2000)(.3) + 1($2000)(.4) + 0(.1)] = $2900
Based on this, how would you compute the long-run average profit? This is
where the stationary distribution comes into play.

Long-run avg profit = Σ_{i=0}^{3} f(i) πi
                    = f(0)π0 + f(1)π1 + f(2)π2 + f(3)π3
                    ≈ $488.46
It means that under the (s, S) = (1, 3) policy we are earning about 500 dollars
per week. So you can now decide whether or not to keep running the business.
It does not guarantee that you earn exactly $488.46 each week; it is the long-run
average profit. Finding the optimal policy that maximizes the long-run average
profit involves advanced knowledge of stochastic processes. In particular, this kind
of problem cannot be solved using the simplex method because it is a highly
nonlinear problem. You will be better off solving it numerically using Matlab, R,
or other computer aids.
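This computation is easy to reproduce in R; the sketch below simply combines the one-week profits f(i) derived above with the stationary distribution of Example 3.7.

# Expected profit next week given that this week ends with i items (i = 0,...,3)
f  = c(-1300, -400, 2600, 2900)
pi = c(38, 43, 40, 9) / 130     # stationary distribution from Example 3.7

sum(f * pi)                     # long-run average profit per week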
Example 3.8. Suppose your employment status for each week can be modeled
as a DTMC. You are either employed or unemployed in a given week, and your
employment status for the following week depends solely on this week's status.
The transition probability matrix is as follows.

                    Employed  Unemployed
     Employed   (      .9         .1     )
P =
     Unemployed (      .3         .7     )

When employed, your net income is $300 that week. Otherwise, you spend $100
for the week, i.e. your net income is −$100 for this week. What is your long-run
average net income per week?
Answer: First of all, you need to compute the stationary distribution of the
DTMC. Here π0 represents the employed state, while π1 represents the unemployed
state.

.9π0 + .3π1 = π0
.1π0 + .7π1 = π1
  π0 + π1 = 1

Solving this, you obtain π = (π0 , π1 ) = (.75, .25). Hence, the long-run
average net income per week = ($300)π0 + (−$100)π1 = $200.

3.3 Irreducibility
In the following three sections, we will learn a few important characteristics
of a DTMC. We will start with irreducibility of a chain. Before defining an
irreducible DTMC, it is necessary to understand accessibility of a state.

3.3.1 Transition Diagram


Suppose we have the following transition matrix.

          1    2    3
     1  (  0   .2   .8 )
P1 = 2  ( .5    0   .5 )
     3  ( .6   .4    0 )

State space S = {1, 2, 3}. Is the following diagram equivalent to the matrix?

[Transition diagram for P1: states 1, 2, 3 with arcs 1 → 2 (.2), 1 → 3 (.8),
2 → 1 (.5), 2 → 3 (.5), 3 → 1 (.6), 3 → 2 (.4).]

The diagram contains exactly the same amount of information as the transition
matrix has.

3.3.2 Accessibility of States


Let X = {Xn : n = 0, 1, 2, . . . } be a DTMC on S with transition matrix P .

Definition 3.5 (Accessibility and Irreducibility). 1. State i can reach state
   j if there exists an n such that Pr(Xn = j | X0 = i) > 0, i.e. P^n_{i,j} > 0. This
   is mathematically noted as i → j.

2. States i and j are said to communicate if i → j and j → i.

3. X is said to be irreducible if all states communicate. Otherwise, it is said
   to be reducible.

Why is irreducibility important? If a chain is reducible we may have more


than one stationary distribution.

Theorem 3.2. 1. If X is irreducible, there exists at most one stationary
   distribution.

2. If X has a finite state space, it has at least one stationary distribution.
Why is stationary distribution important? As seen in last section, when we
compute long-run average profit of a company, we need the stationary distribu-
tion. Combining the two cases in Theorem 3.2, we have the following unified
assertion.
Theorem 3.3. For a finite state, irreducible DTMC, there exists a unique sta-
tionary distribution.

3.4 Periodicity
Think of the following DTMC.

[Transition diagram of a four-state DTMC on states 1, 2, 3, 4.]

In this case, we have P^100 ≠ P^101. However, according to our corollary, there
should be a unique stationary distribution for this chain, too. How do we obtain
it given that P^100 ≠ P^101? How about taking the average of the two?

   (P^100 + P^101) / 2

Even in this case, we cannot obtain the limiting distribution, because this
chain oscillates as n → ∞. How do we formally classify such cases?
Definition 3.6 (Periodicity). For a state i ∈ S,

   d(i) = gcd{n : P^n_{i,i} > 0},

where gcd is the greatest common divisor.

For example, for the chain P1 in Section 3.3.1,

   d(1) = gcd{2, 4, 3, . . . } = 1,    d(2) = 1.

In fact, if i ↔ j, then d(i) = d(j). This is called the solidarity property. Since all
states in an irreducible DTMC communicate, the periods of all states are the same.
We will revisit it after covering recurrence and transience.
For the DTMC in the diagram above, d(4) = gcd{2, 4, 6, . . . } = 2, so that chain is
periodic with period d = 2.

Theorem 3.4. 1. If a DTMC is aperiodic, then limn→∞ P^n exists.

2. If a DTMC is periodic with period d ≥ 2, then

   lim_{n→∞} (P^n + P^{n+1} + P^{n+2} + · · · + P^{n+d−1}) / d

exists.
Since we are getting too abstract, let us look at another example.
Example 3.9.
 
     ( .2  .8 )
P =
     ( .5  .5 )

Is this DTMC irreducible? Yes. Now check that P_{ii} > 0 for each i, so d(i) = 1. This is an
easy way to check that a DTMC is aperiodic. Therefore, the limiting distribution
exists, and each of its rows is equal to the stationary distribution. What would
the stationary distribution be?

πP = π

(π1 , π2 ) ( .2  .8 ) = (π1 , π2 )
           ( .5  .5 )

∴ π = (5/13, 8/13)

Hence, even without the help of Matlab, we can say that

              ( 5/13  8/13 )
lim P^n =
n→∞           ( 5/13  8/13 )
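A quick check in R confirms that a high power of P has both rows (approximately) equal to the stationary distribution:

library(expm)

P = matrix(c(.2,.8, .5,.5), 2, 2, byrow=TRUE)
P %^% 100            # both rows are approximately (5/13, 8/13)
c(5/13, 8/13)        # 0.3846154  0.6153846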
Example 3.10.

     (  0  .5   0  .5 )
     ( .5   0  .5   0 )
P =  (  0  .5   0  .5 )
     ( .5   0  .5   0 )

According to our theorems, how many stationary distributions do we have? Just one.
Can you give the stationary distribution?

   π = (25%, 25%, 25%, 25%)

It means that the long-run average fraction of time you are in each state is a
quarter. Then, is the following true?

        ( 1/4  1/4  1/4  1/4 )
        ( 1/4  1/4  1/4  1/4 )
P^n =   ( 1/4  1/4  1/4  1/4 )
        ( 1/4  1/4  1/4  1/4 )

It is not true because P^n alternates between two matrices as n grows. Rather,
the following statement is true for this DTMC:

   lim_{n→∞} (P^n + P^{n+1}) / 2   equals the matrix of all 1/4's shown above.

3.5 Recurrence and Transience


Let X be a DTMC on state space S with transition matrix P . For each state
i ∈ S, let τi denote the first n ≥ 1 such that Xn = i.
Definition 3.7. 1. State i is said to be recurrent if Pr(τi < ∞|X0 = i) = 1.
2. State i is said to be positive recurrent if E(τi |X0 = i) < ∞.
3. State i is said to be transient if it is not recurrent.
Let us revisit the solidarity property to make it complete.
Definition 3.8 (Solidarity Property). If i ↔ j, then d(i) = d(j), where d(k) is
the period of state k. Also, i is recurrent or transient if and only if j is recurrent
or transient, respectively.
The solidarity property states that if two states communicate with each
other, it is not possible that one state i is recurrent and the other state j is
transient. Therefore, if we apply the solidarity property to an irreducible DTMC,
only one of the following statements is true.
1. All states are transient.
2. All states are (positive) recurrent.
Remember that all states communicate with each other in an irreducible DTMC.
Example 3.11.

[Transition diagram: state 1 has a self-loop with probability .5 and moves to state 2
with probability .5; state 2 returns to state 1 with probability 1.]

Given that X0 = 1, is it possible that τ1 = 1, meaning that the chain returns to state
1 at time 1? Yes.

Pr(τ1 = 1 | X0 = 1) = 1/2
Pr(τ1 = 2 | X0 = 1) = Pr(X1 ≠ 1, X2 = 1 | X0 = 1) = (0.5)(1) = 0.5
Pr(τ1 = 3 | X0 = 1) = 0

E(τ1 | X0 = 1) = 1(1/2) + 2(0.5) = 3/2 < ∞

Note that the second probability is not 0.25 + 0.5, because X1 should not be
equal to 1 for τ1 = 2. Since the last expectation is finite, this chain is positive
recurrent.

Example 3.12.

[Transition diagram of a three-state DTMC on states 1, 2, 3; the arcs are labeled
with probabilities .5, .5, 1, .5, and .5. State 2 is transient.]

Is state 1 positive recurrent? Yes. State 2 is transient, and how about state 3?

Pr(τ3 = 1 | X0 = 3) = 0
Pr(τ3 = 2 | X0 = 3) = 0.5
Pr(τ3 = 3 | X0 = 3) = (0.5)²
Pr(τ3 = 4 | X0 = 3) = (0.5)³
...

τ3 − 1 is a geometric random variable with success probability p = 1/2, so
E(τ3 | X0 = 3) = 1 + 1/p = 3 < ∞ and state 3 is positive recurrent.

3.5.1 Geometric Random Variable


Let us review the geometric random variable. Let X be the
number of tosses needed to get the first head; then X is a geometric random variable.
It could take millions of tosses for you to get the first head. But what is the
probability that X is finite?

Pr(X = 1) = p
Pr(X = 2) = pq
Pr(X = 3) = pq²
Pr(X = 4) = pq³
...

Pr(X < ∞) = Σ_{n=1}^{∞} Pr(X = n)
           = p + pq + pq² + pq³ + · · ·
           = p(1 + q + q² + · · · ) = p / (1 − q) = 1

How about the expectation?

E(X) = 1·p + 2pq + 3pq² + · · ·
     = p(1 + 2q + 3q² + · · · ) = p (q + q² + q³ + · · · )′
     = p (1/(1 − q))′ = p / (1 − q)² = 1/p
Example 3.13.

[Transition diagram of a three-state DTMC on states 1, 2, 3; the arcs are labeled
with probabilities .4, .8, .6, .3, .7, and .2.]

In this case, we know that state 1 is positive recurrent. We also know that
1 ↔ 2 and 1 ↔ 3. Thus, we can conclude that all states in this chain are positive
recurrent.
Example 3.14. Consider a simple random walk.

[Transition diagram of the simple random walk on {. . . , −2, −1, 0, 1, 2, . . . }: from
each state i, the chain moves to i + 1 with probability p and to i − 1 with
probability q.]

1. p = 1/3, q = 2/3: State i is transient. Say you started from 100; there is
   a positive probability that you never return to state 100. Hence, every state
   is transient.

2. p = 2/3 > q: By the strong law of large numbers, Pr(Sn /n → 1/3) = 1
   because

   Sn = Σ_{i=1}^{n} ξi ,    Sn /n = (Σ_{i=1}^{n} ξi)/n → E(ξ1 ) = (−1)q + (1)p = −1/3 + 2/3 = 1/3,

   so the chain drifts off to +∞ and every state is again transient.

3. p = q = 1/2: State i is recurrent, but not positive recurrent.

Note from the example above that if the chain is irreducible, every state is
either recurrent or transient.
Theorem 3.5. Assume X is irreducible. X is positive recurrent if and only if
X has a (unique) stationary distribution π = (πi ). Furthermore,
   E(τi | X0 = i) = 1 / πi .

Recall that one of the interpretations of the stationary distribution is the long-run
average fraction of time the chain spends in each state.

Theorem 3.6. Assume that X is irreducible on a finite state space. Then, X
is recurrent. Furthermore, X is positive recurrent.
In a finite state space DTMC, there is no difference between positive recurrence
and recurrence. These two theorems are important because they give us
a simple tool to check whether a stationary distribution exists. How about the
limiting distribution? Depending on the transition matrix, there may or may not
be a limiting distribution. When do we have a limiting distribution? When the
chain is aperiodic; if it is periodic, we use the average as above. Let us summarize.
Theorem 3.7. Let X be an irreducible finite state DTMC.
1. X is positive recurrent.
2. X has a unique stationary distribution π = (πi ).
3. limn→∞ P^n_{ij} = πj , independent of i, if the DTMC is aperiodic.
4. If the DTMC is periodic with period d,

   lim_{n→∞} (P^n_{ij} + P^{n+1}_{ij} + P^{n+2}_{ij} + · · · + P^{n+d−1}_{ij}) / d = πj .

3.6 Absorption Probability


Think of the following DTMC. You are asked to compute limn→∞ P^n .

[Transition diagram of a seven-state DTMC on states 1–7; states {1, 2} and {6, 7}
are closed sets, and states {3, 4, 5} are transient.]

When computing rows 1 and 2, you can just forget about all states except 1 and
2, because there is no arrow going out of {1, 2}. The same holds for rows 6 and 7.

                        1 ( 2/3  1/3  0  0  0   0    0  )
                        2 ( 2/3  1/3  0  0  0   0    0  )
                        3 (  ?    ?   0  0  0   ?    ?  )
P^100 ≈ lim_{n→∞} P^n = 4 (  ?    ?   0  0  0   ?    ?  )
                        5 (  ?    ?   0  0  0   ?    ?  )
                        6 (  0    0   0  0  0  2/3  1/3 )
                        7 (  0    0   0  0  0  2/3  1/3 )

What would P31 be? Before this, note that {1, 2} and {6, 7} are closed sets,
meaning no arrows go out of them. In contrast, {3, 4, 5} are transient states. It
means that if the DTMC starts from state 3, it may be absorbed into either {1, 2}
or {6, 7}.
Let us define a new notation. Let f3,{1,2} denote the probability that, starting
from state 3, the DTMC ends up in {1, 2}. Thus, f3,{1,2} + f3,{6,7} = 1. Let us
compute these numbers.

f3,{1,2} = (.25)(1) + (.5) f4,{1,2} + (.25)(0)
f4,{1,2} = (.5)(1) + (.5) f5,{1,2}
f5,{1,2} = (.5) f3,{1,2} + (.25) f4,{1,2} + (.25)(0)

We have three unknowns and three equations, so we can solve this system of
linear equations.

x = .25 + .5y          x = f3,{1,2} = 5/8
y = .5 + .5z     ⇒     y = f4,{1,2} = 3/4
z = .5x + .25y         z = f5,{1,2} = 1/2

We also now know that f3,{6,7} = 1 − 5/8 = 3/8, f4,{6,7} = 1 − 3/4 = 1/4, and
f5,{6,7} = 1 − 1/2 = 1/2.
However, to compute P31 , we consider not only f3,{1,2} but also the proba-
bility that the DTMC will be in state 1 not in state 2. Therefore,

P31 = f3,{1,2} π1 = (5/8)(2/3)     P41 = f4,{1,2} π1 = (3/4)(2/3)     P51 = f5,{1,2} π1 = (1/2)(2/3)
P32 = f3,{1,2} π2 = (5/8)(1/3)     P42 = f4,{1,2} π2 = (3/4)(1/3)     P52 = f5,{1,2} π2 = (1/2)(1/3)
P36 = f3,{6,7} π6 = (3/8)(2/3)     P46 = f4,{6,7} π6 = (1/4)(2/3)     P56 = f5,{6,7} π6 = (1/2)(2/3)
P37 = f3,{6,7} π7 = (3/8)(1/3)     P47 = f4,{6,7} π7 = (1/4)(1/3)     P57 = f5,{6,7} π7 = (1/2)(1/3)

Here (π1 , π2 ) = (2/3, 1/3) is the stationary distribution of the closed set {1, 2},
and (π6 , π7 ) = (2/3, 1/3) is that of {6, 7}.

Finally, we have the following complete limiting transition matrix.

                        1 ( 2/3   1/3   0  0  0   0     0   )
                        2 ( 2/3   1/3   0  0  0   0     0   )
                        3 ( 5/12  5/24  0  0  0  1/4   1/8  )
P^100 ≈ lim_{n→∞} P^n = 4 ( 1/2   1/4   0  0  0  1/6   1/12 )
                        5 ( 1/3   1/6   0  0  0  1/3   1/6  )
                        6 (  0     0    0  0  0  2/3   1/3  )
                        7 (  0     0    0  0  0  2/3   1/3  )
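The small linear system for x = f3,{1,2}, y = f4,{1,2}, z = f5,{1,2} can also be handed to R:

# x = .25 + .5 y,  y = .5 + .5 z,  z = .5 x + .25 y, written as A %*% c(x,y,z) = b
A = matrix(c( 1,  -.5,   0,
              0,    1, -.5,
            -.5, -.25,   1), 3, 3, byrow=TRUE)
b = c(.25, .5, 0)
solve(A, b)            # returns (5/8, 3/4, 1/2)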

3.7 Computing Stationary Distribution Using Cut


Method
Example 3.15. Assume that X is an irreducible DTMC that is positive recur-
rent. It means X has a unique stationary distribution. Transition matrix can
be described as in the following state diagram.

[Transition diagram: state 1 has a self-loop with probability .25 and moves to state 2
with probability .25 and to state 3 with probability .5; state 2 moves to state 1
with probability .5 and to state 3 with probability .5; state 3 moves to state 2
with probability 1.]

How do we find the stationary distribution? As far as we have learned, we can
solve the following system of linear equations:

   π = πP,    Σ_i πi = 1.

This is sometimes doable, but it easily becomes tedious as the number of states
increases.
We can instead use the flow balance equations. The idea is that for a state,
rate into the state should be equal to rate out of the state. Considering flow in
and out, we ignore the self-feedback loops. Utilizing this idea, we have the following
equations.

State 1 : rate in  = .5π2
          rate out = π1 (.5 + .25)
State 2 : rate in  = .25π1 + 1π3
          rate out = π2 (.5 + .5)
State 3 : rate in  = .5π1 + .5π2
          rate out = 1π3

Equating each pair of rate in and out, now we have three equations. These three
equations are equivalent to the equations we can get from π = πP .

Example 3.16 (Reflected Random Walk). Suppose X has the following tran-
sition diagram.

p p p
q 0 1 2 3
q q q

If the probability of the feedback loop at state 0 were 1, you would be stuck there
once you got in. In a business or gambling analogy, it would mean the game is over.
Here, somebody forcefully picks the DTMC out of state 0 to keep the game going.
Wall Street in 2008 resembles this model: firms expected to be bailed out. It is an
incentive problem, but we will not cover that issue now.
Suppose p = 1/3, q = 2/3. Then the chain is irreducible. I can boldly say
that there exists a unique stationary distribution. However, solving π = πP
gives us an infinite number of equations. Here the flow balance equations come
into play. Let us generalize the approach we used in the previous example.
For any subset of states A ⊂ S,

rate into A = rate out of A.

If A = {0, 1, 2}, we essentially look at the flow between state 2 and 3 because
this state diagram is linked like a simple thin chain. We have the following
equations.

 
π1 = (p/q) π0
π2 = (p/q) π1 = (p/q)² π0
π3 = (p/q) π2 = (p/q)³ π0
π4 = (p/q) π3 = (p/q)⁴ π0

Every element of the stationary distribution boils down to the value of π0 . Recall
that we always have one more condition: Σ_i πi = 1.

   Σ_{i=0}^{∞} πi = π0 (1 + (p/q) + (p/q)² + (p/q)³ + · · · ) = 1

   π0 = 1 / (1 + (p/q) + (p/q)² + (p/q)³ + · · · ) = 1 − p/q = 1/2

   πn = (p/q)^n π0

So far, we assumed that p < q. What if p = q? What would π0 be?


π0 = π1 = π2 = · · · = 0. Then, is π a probability distribution? No. Since we do
not have the stationary distribution, this chain is not positive recurrent. What
if p > q? It gets worse. The chain cannot be positive recurrent because you
have nonzero probability of not coming back. In fact, In this case, every state
is transient.

I call this method the cut method. You have probably realized that it is very
useful. Remember that positive recurrence means that the chain will eventually
come back to a state, so the chain has cycles of some sort. You will see the second
example often as you study further; it is usually called a birth-death process.
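As a numerical illustration of the reflected random walk, the following R sketch truncates the state space at an arbitrarily chosen level K and compares the geometric formula πn = (1 − p/q)(p/q)^n with a matrix-power computation; the truncation level and the number of steps are arbitrary choices.

library(expm)

p = 1/3; q = 2/3
K = 30                                 # truncate the state space at K (a choice)
P = matrix(0, K + 1, K + 1)
P[1, 1] = q; P[1, 2] = p               # state 0: stay with q, up with p
for (i in 2:K) {
  P[i, i - 1] = q; P[i, i + 1] = p
}
P[K + 1, K] = q; P[K + 1, K + 1] = p   # reflect at the truncation boundary

pi_exact = (1 - p/q) * (p/q)^(0:K)     # pi_n = (1 - p/q)(p/q)^n
pi_power = (P %^% 500)[1, ]            # any row of a high power approximates pi
round(cbind(pi_exact, pi_power)[1:5, ], 4)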

3.8 Introduction to Binomial Stock Price Model


Suppose you invest in stocks. Let Xn denote the stock price at the end of period
n. This period can be month, quarter, or year as you want. Investing in stocks
is risky. How would you define the return?

   Rn = (Xn − Xn−1) / Xn−1 ,    n ≥ 1

Rn is the return in period n. We can think that X0 is the initial money you
invest in. Say X0 = 100, X1 = 110.

   R1 = (X1 − X0) / X0 = (110 − 100) / 100 = 1/10 = 10%

Suppose you are working in a financial firm. You should have a model for stock
prices. No model is perfect, but each model has its own strengths. One way to
model the stock prices is using iid random variables. Assume that {Rn } is a
sequence of iid random variables. Then, Xn can be represented as follows.

   Xn = X0 (1 + R1 )(1 + R2 )(1 + R3 ) · · · (1 + Rn ) = X0 Π_{i=1}^{n} (1 + Ri )

Π is similar to Σ except that it represents multiplication instead of summation.
Assuming the Rn are iid, is X = {Xn : n = 0, 1, 2, . . . } a DTMC? Let us assume a
more concrete distribution for Rn .

   Pr(Rn = 0.1)  = p = 20%
   Pr(Rn = 0.05) = q = 1 − p = 80%

If you look at the formula for Xn again,

   Xn = Xn−1 (1 + Rn ) = f (Xn−1 , Rn ),

so you know that Xn is a DTMC because Xn can be expressed as a function of
Xn−1 and the iid Rn .
This type of model is called the binomial model. In this model, only two possi-
bilities for Rn exist: 0.1 or 0.05. Hence, we can rewrite Xn as follows.

   Xn = X0 (1 + 0.1)^{Zn} (1 + 0.05)^{n−Zn}

where Zn ∼ Binomial(n, 0.2).
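A short R sketch simulates one path of this binomial price model; the horizon of 52 periods and the random seed are arbitrary choices made just for illustration.

set.seed(42)
n  = 52                                  # number of periods (e.g., weeks)
X0 = 100
R  = sample(c(0.1, 0.05), n, replace = TRUE, prob = c(0.2, 0.8))
X  = X0 * cumprod(1 + R)                 # X_n = X_0 * prod_{i <= n} (1 + R_i)
head(X)
# Equivalent closed form using Z_n ~ Binomial(n, 0.2):
Z  = cumsum(R == 0.1)
all.equal(X, X0 * 1.1^Z * 1.05^(seq_len(n) - Z))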


Many Wall Street firms use computational implementations based on these
models. Practical cases are so complicated that they may not have analytic
solutions. The health care domain also uses this kind of Markov model, e.g. for gene
mutation. DTMC has a huge application area. Our school has a master's program
called Quantitative Computational Finance. It models stock prices not only in a
discrete manner but also in a continuous manner, using a concept called
geometric Brownian motion, which involves a term like e^{B(t)}. In the
binomial model, we also had a term with powers. If you look at stock prices in
newspapers, they look continuous depending on the resolution, but they are in
fact discrete. There are two mainstreams in academia and practice regarding
stock price modeling: discrete and continuous. The famous Black–Scholes formula
for option pricing is a continuous model. This is a huge area, so let us keep it
at this level.

3.9 Simulation
Let us start from the basics of DTMC. Suppose that a DTMC on S = {1, 2, 3}
has the following transition probability matrix.
 
     ( .3  .2  .5 )
P =  ( .6   0  .4 )
     (  0  .8  .2 )

Suppose we know the initial distribution is a0 = (.5, .2, .3). How can we compute
Pr{X10 = 1 | X0 = 3} and Pr{X10 = 1} using a computer?

Listing 3.1: Basics of DTMC

# R does not have a native operator for raising a matrix to a power.
# Install and load the library "expm", which provides %^%.
library(expm)

# Set up P and a_0
P = matrix(c(.3,.2,.5, .6,0,.4, 0,.8,.2), 3,3, byrow=TRUE)
a0 = c(.5,.2,.3)

# Compute P^10 and read the (3,1) element
P %^% 10
(P %^% 10)[3,1]

# Compute a_10 = a_0 * P^10 and read the first element
a0 %*% (P %^% 10)
(a0 %*% (P %^% 10))[1]

The different limiting behavior of periodic and aperiodic DTMCs can
also be easily observed using R. Suppose that we have two slightly different
transition probability matrices.

      (  0  .6   0  .4 )          (  0  .2  .4  .4 )
      ( .2   0  .8   0 )          ( .2   0  .8   0 )
P1 =  (  0  .1   0  .9 ) ,  P2 =  (  0  .1   0  .9 )
      ( .7   0  .3   0 )          ( .7   0  .3   0 )

P1 is periodic, while P2 is aperiodic. Observe that P1^100 differs from P1^101,
while P2^100 = P2^101.
Listing 3.2: Limiting Distribution P^100 of periodic and aperiodic DTMC

library(expm)

# Periodic DTMC
P1 = matrix(c(0,.6,0,.4, .2,0,.8,0, 0,.1,0,.9, .7,0,.3,0), 4,4, byrow=TRUE)
P1 %^% 100
P1 %^% 101
# Observe that P1^100 differs from P1^101.

# Aperiodic DTMC
P2 = matrix(c(0,.2,.4,.4, .2,0,.8,0, 0,.1,0,.9, .7,0,.3,0), 4,4, byrow=TRUE)
P2 %^% 100
P2 %^% 101
# Observe that P2^100 is equal to P2^101.

3.10 Exercise
1. A store stocks a particular item. The demand for the product each day
is 1 item with probability 1/6, 2 items with probability 3/6, and 3 items
with probability 2/6. Assume that the daily demands are independent

and identically distributed. Each evening if the remaining stock is less


than 3 items, the store orders enough to bring the total stock up to 6
items. These items reach the store before the beginning of the following
day. Assume that any demand is lost when the item is out of stock.

(a) Let Xn be the amount in stock at the beginning of day n; assume that
X0 = 5. If the process is a Markov chain, give the state space, initial
distribution, and transition matrix. If the process is not, explain why
it’s not.
(b) Let Yn be the amount in stock at the end of day n; assume that
Y0 = 2. If the process is a Markov chain, give the state space, initial
distribution, and transition matrix. If the process is not, explain why
it’s not.

Also compute the following.

(c) Pr{X2 = 6 | X0 = 5}.


(d) Pr{X2 = 5, X3 = 4, X5 = 6 | X0 = 3}.
(e) E[X2 | X0 = 6].
(f) Assume the initial distribution is (0, 0, .5, .5). Find Pr{X2 = 6}.
(g) With the same initial distribution, find Pr{X4 = 3, X1 = 5 | X2 = 6}

2. A six-sided die is rolled repeatedly. After each roll n = 1, 2, . . . , let Xn be


the largest number rolled in the first n rolls. Is {Xn , n ≥ 1} a discrete-
time Markov chain? If it’s not, show that it is not. If it is, answer the
following questions:

(a) What is the state space and the transition probabilities of the Markov
chain?
(b) What is the distribution of X1 ?

3. Redo the previous problem except replace Xn with Yn where Yn is the


number of sixes among the first n rolls. (So the first question will be, is
{Yn , n ≥ 1} a discrete-time Markov chain?)

4. Consider two stocks. Stock 1 always sells for $10 or $20. If stock 1 is
selling for $10 today, there is a .80 chance that it will sell for $10 for
tomorrow. If it is selling for $20 today, there is a .90 chance that it will
sell for $20 tomorrow.
Stock 2 always sells for $10 or $25. If stock 2 sells today for $10 there is
a .90 chance that it will sell tomorrow for $10. If it sells today for $25,
there is a .85 chance that it will sell tomorrow for $25. Let Xn denote the
price of the 1st stock and Yn denote the price of the 2nd stock during the
nth day. Assume that {Xn : n ≥ 0} and {Yn : n ≥ 0} are discrete time
Markov chains.

(a) What is the transition matrix for {Xn : n ≥ 0}? Is {Xn : n ≥ 0}


irreducible?
(b) What is the transition matrix for {Yn : n ≥ 0}? Is {Yn : n ≥ 0}
irreducible?
(c) What is the stationary distribution of {Xn : n ≥ 0}?
(d) What is the stationary distribution of {Yn : n ≥ 0}?
(e) On January 1st, your grandparents decide to give you a gift of 300
shares of either Stock 1 or Stock 2. You are to pick one stock. Once
you pick the stock you cannot change your mind. To take advantage
of a certain tax law, your grandparents dictate that one share of the
chosen stock is sold on each trading day. Which stock should you
pick to maximize your gift account by the end of the year? (Explain
your answer.)

5. Suppose each morning a factory posts the number of days worked in a row
without any injuries. Assume that each day is injury free with probability
98/100. Furthermore, assume that whether tomorrow is injury free or not
is independent of which of the preceding days were injury free. Let X0 = 0
be the morning the factory first opened. Let Xn be the number posted on
the morning after n full days of work.

(a) Is {Xn , n ≥ 0} a Markov chain? If so, give its state space, initial
distribution, and transition matrix P . If not, show that it is not a
Markov chain.
(b) Is the Markov chain irreducible? Explain.
(c) Is the Markov chain periodic or aperiodic? Explain and if it is peri-
odic, also give the period.
(d) Find the stationary distribution.
(e) Is the Markov chain positive recurrent? If so, why? If not, why not?

6. Consider the following transition matrix:

         (  0   0.5   0   0.5 )
         ( 0.6   0   0.4   0  )
    P =  (  0   0.7   0   0.3 )                          (3.3)
         ( 0.8   0   0.2   0  )

(a) Is the Markov chain periodic? Give the period of each state.
(b) Is (π1 , π2 , π3 , π4 ) = (33/96, 27/96, 15/96, 21/96) the stationary dis-
    tribution of the Markov Chain?
(c) Is P^100_{11} = π1 ? Is P^101_{11} = π1 ? Give an expression for π1 in terms of
    P^100_{11} and P^101_{11} .

7. For each of the following transition matrices, do the following: (1) Deter-
mine whether the Markov chain is irreducible; (2) Find a stationary distri-
bution π; is the stationary distribution unique; (3) Determine whether the
Markov chain is periodic; (4) Give the period of each state. (5) Without
using any software package, find P 100 approximately.
(a)
          (  0   1/3  2/3 )
     P =  ( 2/3   0   1/3 )
          ( 1/3  2/3   0  )

(b)
          ( .2  .8   0   0 )
     P =  ( .5  .5   0   0 )
          (  0   0   0   1 )
          (  0   0  .5  .5 )
8. A store sells a particular product. The weekly demands for the product
are iid random variables with the following distribution:

    d         0   1   2   3
Pr{D = d}    .3  .4  .2  .1

The store is closed during the weekends. Unsold items at the end of a
week can be sold again the following week. Each Friday evening if the
number of remaining items is at most 2, the store orders enough items to
bring the total number of items up to 4 on Monday morning. Otherwise,
do not order. (The ordered items reach the store before the store opens on
the Monday morning.) Assume that any demand is lost when the product
is out of stock. Let Xn be the number of items in stock at the beginning
of the nth week.

(a) If week 1 starts with 3 items, what is the probability that week 2
starts with 3 items?
(b) Is X = {Xn : n = 0, 1, . . .} a discrete time Markov chain? If so, give
its state space and the transition matrix, otherwise explain why it is
not a Markov chain.

9. Let X = {Xn : n = 0, 1, 2, . . .} be a discrete time Markov chain on state


space S = {1, 2, 3, 4} with transition matrix

         (  0   1/2   0   1/2 )
         ( 1/4   0   3/4   0  )
   P =   (  0   1/2   0   1/2 )
         ( 1/4   0   3/4   0  )

(a) Draw a transition diagram


(b) Find Pr{X2 = 4|X0 = 2}.

(c) Find Pr{X2 = 2, X4 = 4, X5 = 1|X0 = 2}.


(d) What is the period of each state?
(e) Let π = (1/8, 1/4, 3/8, 1/4). Is π the unique stationary distribution of X?
    Explain your answer.
(f) Let P^n be the nth power of P . Does limn→∞ P^n_{1,4} = 1/4 hold? Explain
    your answer.
(g) Let τ1 be the first n ≥ 1 such that Xn = 1. Compute E(τ1 |X0 = 1).
    (If it takes you a long time to compute it, you are likely on a wrong
    track.)
10. Let
         ( .2  .8    0    0   0 )
         ( .5  .5    0    0   0 )
    P =  (  0  .25   0  .75   0 )
         (  0   0   .5    0  .5 )
         (  0   0    0    0   1 )

    Find approximately P^100 .
11. Suppose that the weekly demand D of a non-perishable product is iid with
the following distribution.

d 10 20 30
Pr{D = d} .2 .5 .3

Unused items from one week can be used in the following week. The
management decides to use the following inventory policy: whenever the
inventory level in Friday evening is less than or equal to 10, an order is
made and it will arrive by Monday morning. The order-up-to quantity is
set to be 30.

(a) If this week (week 0) starts with inventory 20, what is the probability
that week 2 has starting inventory level 10?
(b) If this week (week 0) starts with inventory 20, what is the probability
that week 100 has starting inventory level 10?
(c) Suppose that each item sells at $200. Each item costs $100 to order,
and each leftover item by Friday evening has a holding cost of $20.
Suppose that each order has a fixed cost $500. Find the long-run
average profit per week.

12. Let X = {Xn : n = 0, 1, 2, 3, . . .} be a discrete time Markov chain on the


state space {1, 2, 3, 4} and transition matrix
 
        [ 0   1   0   0  ]
   P =  [ .2  0   .8  0  ]
        [ 0   .4  0   .6 ]
        [ 0   0   1   0  ]

(a) Draw the transition diagram of the Markov chain. Is the Markov
chain irreducible? What is the period of state 2?
(b) Find Pr{X1 = 2, X3 = 4|X0 = 3}.
(c) Find E(X2 |X0 = 3).
(d) Find the stationary distribution π = (π1 , π2 , π3 , π4 ) of the Markov
chain.
(e) Is the expression Pr{X100 = 2|X0 = 3} ≈ π2 correct? If not, correct
the expression.

13. Let
          [ .4  .6  0   0   0   0  ]
          [ .7  .3  0   0   0   0  ]
    P =   [ 0   0   0   1   0   0  ] .
          [ 0   0   0   0   .5  .5 ]
          [ .6  0   .4  0   0   0  ]
          [ 0   0   0   0   0   1  ]

(a) Compute P^100 approximately. (Leaving fractional forms as they are
is okay.)
(b) Approximately, is P^101 different from P^100?
14. Suppose you are working as an independent consultant for a company.
Your performance is evaluated at the end of each month and the evaluation
falls into one of three categories: 1, 2, and 3. Your evaluation can be
modeled as a discrete-time Markov chain and its transition diagram is as
follows.

[Transition diagram over states 1, 2, 3 with transition probabilities .25, .75, .5, .5, .25, .75 as labeled in the original figure.]

Denote your evaluation at the end of nth month by Xn and assume that
X0 = 2.
(a) What are state space, transition probability matrix and initial dis-
tribution of Xn ?
(b) What is the stationary distribution?
(c) What is the long-run fraction of time when your evaluation is either
2 or 3?
Your monthly salary is determined by the evaluation of each month in the
following way.

Salary when your evaluation is n = $5000 + n^2 × $5000, n = 1, 2, 3.



(d) What is the long-run average monthly salary?


15. Suppose you are modeling your economic status using a DTMC. There are
five possible economic statuses: Trillionaire (T), Billionaire (B), Millionaire
(M), Thousandaire (Th), Bankrupt (Bk). Your economic status changes
every month. Once you are bankrupt, there is no chance of coming back
to the other states. Also, once you become a billionaire or a trillionaire, you
will never be a millionaire, thousandaire, or bankrupt. The other possible
transitions are as follows. (Transitions from a state to itself are omitted.)

Transition Probability
T→B 0.9
B→T 0.3
M→B 0.1
M → Th 0.5
Th → M 0.2
Th → Bk 0.3

(a) Is this chain irreducible? Explain your answer to get credit for this
subquestion.
(b) What is the transition probability matrix of this DTMC model?
(c) Compute P^359.
(d) What is the probability that you will become a trillionaire (T) 30
years later given that you are a thousandaire (Th) now?
Chapter 4

Poisson Process

In Chapter 2, we learned general queueing theory and how the flow of incoming
customers can be modeled. We modeled the flow by denoting the inter-arrival
times between customers by random variables and assuming that these random
variables follow a specific probability distribution. In this chapter, we will study
a special case: the case where the inter-arrival times are iid and exponentially
distributed. We will see how the exponential distribution differs from other distributions.

4.1 Exponential Distribution


First of all, let us start from the basics of an exponential distribution.
Definition 4.1. A random variable X is said to have exponential distribution
with rate λ (with mean 1/λ) if it has c.d.f.

    F(x) = 1 − e^{−λx},  x ≥ 0,

and p.d.f.

    f(x) = λ e^{−λx}  if x ≥ 0,   and   f(x) = 0  if x < 0.

Be careful that rate λ implies that the mean of the exponential distribution
is 1/λ. Inter-arrival times are often modeled by a sequence of iid exponential
random variables. What do we know about the exponential distribution?
1. E[X] = 1/λ
2. c^2(X) = 1 (the squared coefficient of variation)
3. Var[X] = 1/λ^2
4. Memoryless property
What is the memoryless property?


4.1.1 Memoryless Property

Pr{X > t + s | X > s} = Pr{X > t}

Let me paraphrase this concept. Look at the light bulb on the ceiling. Let X
denote the lifetime of the bulb and assume that the lifetime follows an
exponential distribution. If the bulb is on now, its remaining lifetime is
distributed as the lifetime of a new bulb. Here s is the time elapsed until now
and t is the additional lifetime. Given that the light has been on for s units of
time, the probability that it will stay on for t more units of time is the same as
that for a new light bulb. How is that so?

    Pr{X > t + s | X > s} = Pr{X > t + s} / Pr{X > s} = e^{−λ(t+s)} / e^{−λs} = e^{−λt} = Pr{X > t}
In fact, this is the defining property of the exponential distribution: if a
non-negative continuous random variable has the memoryless property, it must
be exponentially distributed.
Theorem 4.1.

Pr{X > t + S | X > S} = Pr{X > t}

for any positive random variable S that is independent of X.


Mathematically, we just replaced s with S. Let me explain the change in
plain English. If we are saying, “if the light is on at noon today, its remaining
lifetime distribution is as if it is new,” we are referring to s. If we are saying, “if
the light is on when IBM stock price hits 100 dollars, its remaining lifetime is
as if it is new,” we are referring to S which is random. The important thing to
note is that S should be independent of X. Suppose S = X/2, i.e. S depends
on X.
    Pr{X > t + X/2 | X > X/2} = Pr{X > t + X/2} / Pr{X > X/2}
                              = Pr{X > t + X/2} = Pr{X/2 > t} = Pr{X > 2t}
                              ≠ Pr{X > t}

The memoryless property does not hold if X and S are not independent.
Let us come back to other derivations based on the exponential distribution. Let X1 and
X2 be independent exponential random variables with rates λ1 and λ2 respectively,
i.e. X1 ∼ Exp(λ1), X2 ∼ Exp(λ2). Let X = min(X1, X2); then X denotes
the time when the first of the two bulbs fails.

    Pr{X > t} = Pr{X1 > t, X2 > t} = Pr{X1 > t} Pr{X2 > t}    (since X1 and X2 are independent)
              = e^{−λ1 t} e^{−λ2 t} = e^{−(λ1+λ2)t}

Therefore, we can say that X = min(X1, X2) ∼ Exp(λ1 + λ2).



Example 4.1. Let X1 and X2 follow exponential distributions with means 2
hours and 6 hours respectively. Then, what would be the mean of min(X1, X2)?

    E[min(X1, X2)] = 1 / (1/2 + 1/6) = 1 / (4/6) = 1.5 hours.

How about the expectation of max(X1, X2)? We can use the fact that X1 + X2 =
min(X1, X2) + max(X1, X2).

    E[max(X1, X2)] = E[X1 + X2] − E[min(X1, X2)] = 8 − 1.5 = 6.5 hours.

We do not have an equally convenient way to compute the expectation of
max(X1, X2, . . . , Xn) for more than two exponential random variables.
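As a quick sanity check on Example 4.1, the following short R simulation (our own sketch, not part of the original text; the variable names are ours) estimates both expectations.

# Monte Carlo check of Example 4.1 (a sketch; not from the original text)
set.seed(1)
n  <- 1e6
x1 <- rexp(n, rate = 1/2)   # mean 2 hours
x2 <- rexp(n, rate = 1/6)   # mean 6 hours
mean(pmin(x1, x2))          # should be close to 1.5
mean(pmax(x1, x2))          # should be close to 6.5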

Theorem 4.2.

    Pr{X1 < X2} = λ1 / (λ1 + λ2),   Pr{X1 > X2} = λ2 / (λ1 + λ2),   Pr{X1 = X2} = 0

How can we compute this? One way is to guess the answer: as λ1 becomes large,
X1 tends to be shorter, so Pr{X1 < X2} approaches 1. Another way is to compute
a double integral, as shown in the next subsection.

4.1.2 Comparing Two Exponentials


Let X1 and X2 be two independent random variables having exponential distri-
bution with rates λ1 and λ2 .

Theorem 4.3.

    Pr{X1 < X2} = λ1 / (λ1 + λ2),   where the joint p.d.f. is  f(x1, x2) = λ1 e^{−λ1 x1} λ2 e^{−λ2 x2}.

How would you remember the correct formula was λ1 /(λ1 +λ2 ) not λ2 /(λ1 +
λ2 )? With λ1 increasing, the expected lifetime of the first lightbulb X1 becomes
shorter, hence the probability X1 breaks down before X2 gets close to 1. We
can also compute the probability using the joint pdf.
    Pr{X1 < X2} = ∫∫_D f(x1, x2) dx1 dx2 = ∫_0^∞ ∫_0^{x2} f(x1, x2) dx1 dx2

When computing double integral, it is helpful to draw the region where integra-
tion applies. The shaded region in the following figure is the integration range
in 2 dimensions.
[Figure: the integration region D = {(x1, x2) : 0 ≤ x1 ≤ x2} shaded in the (X1, X2) plane.]

Example 4.2. 1. A bank has two tellers, John and Mary. John's processing
times are iid exponential random variables X1 with mean 6 minutes. Mary's
processing times are iid exponential random variables X2 with mean 4 minutes.
A car with three customers A, B, C shows up at 12:00 noon when both tellers
are free. What is the expected time at which the car leaves the bank?

Using intuition, we can see that it should be between 8 and 12 minutes.
Suppose A and B start service first. Once one teller completes, C takes
the position at whichever teller finishes first. If C takes over A's teller
after A is done, B has already been in service while A was being served.
Let Y1, Y2 denote the remaining processing times of John and Mary,
respectively, at that moment. The expected time when the car leaves is

E[W] = E[min(X1, X2)] + E[max(Y1, Y2)]
     = E[min(X1, X2)] + E[max(X1, X2)]

because of the memoryless property of the exponential distribution. Even
if B has been in service for a while, due to the memoryless property, B's
remaining service time is as if B had just started service. Thus,

E[W] = E[min(X1, X2) + max(X1, X2)]
     = E[X1 + X2] = 10 minutes.

2. What is the probability that C finishes service before A?



First compute the probability that B finishes service before A.

    Pr{B before A} = (1/6) / (1/4 + 1/6) = 4 / (4 + 6) = 4/10

    ∴ Pr{C before A} = (4/10)^2

This is because, by the memoryless property, we can think of A as just starting
service at the moment C starts service.

3. What is the probability that C finishes last?

Pr{C finishes last} = 1 − Pr{C not last}
                    = 1 − (Pr{C finishes before A} + Pr{C finishes before B})
                    = 1 − (4/10)^2 − (6/10)^2 = 12/25
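The answers to Example 4.2 can be checked by a direct simulation. The sketch below is ours, not from the text; it assumes, as in the calculation above, that A is served by Mary and B by John.

# Simulation sketch for Example 4.2 (our own check, not from the original text)
sim_once <- function() {
  a <- rexp(1, rate = 1/4)              # A served by Mary (mean 4 minutes)
  b <- rexp(1, rate = 1/6)              # B served by John (mean 6 minutes)
  c_dep <- if (a < b) a + rexp(1, rate = 1/4) else b + rexp(1, rate = 1/6)
  c(max(a, b, c_dep), as.numeric(c_dep > max(a, b)))
}
set.seed(1)
res <- replicate(1e5, sim_once())
mean(res[1, ])   # expected time the car leaves; close to 10 minutes
mean(res[2, ])   # probability that C finishes last; close to 12/25 = 0.48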

4.2 Homogeneous Poisson Process


An arrival process whose inter-arrival times are iid exponential random variables
is called a Poisson process. We will first learn about the homogeneous Poisson
process. The homogeneous Poisson process differs from the non-homogeneous
Poisson process in that the exponential distribution associated with it has a
constant rate λ.

Definition 4.2 (Poisson Process). N = {N (t), t ≥ 0} is said to be a Poisson


process with rate λ if

1. N (t) − N (s) ∼ Poisson(λ × (t − s)) for any 0 ≤ s < t

2. N has independent increments, i.e., N (t) − N (s) is independent of N (u)


for 0 ≤ u ≤ s

3. N (0) = 0

N is said to be a counting process, e.g. N(t) is the number of arrivals
in [0, t]. First, whichever time interval you look at, say (s, t], the
number of arrivals in that interval follows a Poisson distribution. Second, how
many people arrive during a time interval in the past does not affect the number
of arrivals in a future interval.
Where do we see this in the real world? When there is a very large population
base, say 100 thousand or 1 million people, and each person independently makes
a decision with small probability, e.g. whether to place a call, the aggregate
arrival process is well approximated by a Poisson process.

Example 4.3. Assume N is a Poisson process with rate λ = 2/minutes.



1. Find the probability that there are exactly 4 arrivals in the first 3 minutes.

   Pr(N(3) − N(0) = 4) = ((2(3 − 0))^4 / 4!) e^{−2(3−0)} = (6^4 / 4!) e^{−6} ≈ 0.1339
2. What is the probability that there are exactly two arrivals in [0, 2] and at
   least 3 arrivals in [1, 3]?
Pr({N (2) =2} ∩ {N (3) − N (1) ≥ 3})
=Pr(N (1) = 0, N (2) = 2, N (3) − N (1) ≥ 3)
+ Pr(N (1) = 1, N (2) = 2, N (3) − N (1) ≥ 3)
+ Pr(N (1) = 2, N (2) = 2, N (3) − N (1) ≥ 3)
=Pr(N (1) = 0, N (2) − N (1) = 2, N (3) − N (2) ≥ 1)
+ Pr(N (1) = 1, N (2) − N (1) = 1, N (3) − N (2) ≥ 2)
+ Pr(N (1) = 2, N (2) − N (1) = 0, N (3) − N (2) ≥ 3)
=Pr(N (1) = 0)Pr(N (2) − N (1) = 2)Pr(N (3) − N (2) ≥ 1)
+ Pr(N (1) = 1)Pr(N (2) − N (1) = 1)Pr(N (3) − N (2) ≥ 2)
+ Pr(N (1) = 2)Pr(N (2) − N (1) = 0)Pr(N (3) − N (2) ≥ 3)
What are we doing here? Basically, we are decomposing the intervals
into non-overlapping ones. Then, we will be able to use the indepen-
dent increment property. We have learned how to compute Pr(N (1) =
0), Pr(N (2) − N (1) = 2). How about Pr(N (3) − N (2) ≥ 1)?
Pr{N(3) − N(2) ≥ 1} = 1 − Pr{N(3) − N(2) < 1} = 1 − Pr{N(3) − N(2) = 0}
                    = 1 − (2^0 / 0!) e^{−2} = 1 − e^{−2}
3. What is the probability that there is no arrival in [0, 4]?
Pr{N (4) − N (0) = 0} = e−8

4. What is the probability that the first arrival will take at least 4 minutes?
Let T1 be the arrival time of the first customer. Is T1 a continuous or
discrete random variable? Continuous.
Pr{T1 > 4} = Pr{N (4) = 0} = e−8
Can you understand the equality above? In plain English, "the first arrival
takes at least 4 minutes" is equivalent to "there is no arrival in the first
4 minutes." This is a very important duality. What if we change 4 minutes
to t minutes?
Pr{T1 > t} = Pr{N (t) = 0} = e−2t
Surprisingly, T1 is an exponential random variable. In fact, the times
between arrivals also follow the same iid exponential distribution. We
will cover this topic further after the Spring break.
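These probabilities are easy to check numerically; the following R lines (a sketch of ours, not from the text) reproduce parts 1-4 using the Poisson and exponential distribution functions.

# Numeric check of Example 4.3, lambda = 2 per minute (a sketch; not from the text)
lambda <- 2
dpois(4, lambda * 3)                                  # part 1: exactly 4 arrivals in [0, 3]
dpois(0, 2) * dpois(2, 2) * (1 - ppois(0, 2)) +       # part 2, summed term by term
  dpois(1, 2) * dpois(1, 2) * (1 - ppois(1, 2)) +
  dpois(2, 2) * dpois(0, 2) * (1 - ppois(2, 2))
dpois(0, lambda * 4)                                  # part 3: no arrival in [0, 4], i.e. exp(-8)
pexp(4, rate = lambda, lower.tail = FALSE)            # part 4: Pr{T1 > 4}, the same number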

Example 4.4. Let N (t) be the number of arrivals in [0, t]. Assume that N =
{N (t), t ≥ 0} is a Poisson process with rate λ = 2/min.
1.
   Pr{N(3, 7] = 5} = ((2(7 − 3))^5 / 5!) e^{−2(7−3)} = (8^5 / 5!) e^{−8}

2. Let T1 be the arrival time of the 1st customer.

   Pr{T1 > t} = Pr{N(0, t] = 0} = ((2t)^0 / 0!) e^{−2t} = e^{−2t},   t ≥ 0

   What is this distribution? It is the exponential distribution, meaning that
   T1 is exponentially distributed with mean 0.5 minutes. Therefore,

   Pr{T1 ≤ t} = 1 − e^{−2t}.

3. Let T3 be the arrival time of the 3rd customer. Which of the following is
correct?

Pr{T3 > t} = Pr{N(0, t] ≤ 2} = Pr{N(0, t] = 0} + Pr{N(0, t] = 1} + Pr{N(0, t] = 2}
           = e^{−2t} + (2t / 1!) e^{−2t} + ((2t)^2 / 2!) e^{−2t}
Pr{T3 > t} = Pr{N(0, t] = 2}

The first equation is correct. Now we can compute the cdf of T3.

Pr{T3 ≤ t} = 1 − Pr{T3 > t} = 1 − ( e^{−2t} + (2t / 1!) e^{−2t} + ((2t)^2 / 2!) e^{−2t} )

Can we compute the pdf of T3? We can take the derivative of the cdf to obtain
the pdf.

    f_{T3}(t) = (2 (2t)^2 / 2!) e^{−2t}

What random variable is this? Poisson? No. Poisson is a discrete r.v., while T3
can obviously take non-integral values. It is a gamma distribution. To help
you distinguish the different 2's here, let me use λ:

    f_{T3}(t) = (λ (λt)^2 / 2!) e^{−λt},   i.e. T3 ∼ Gamma(3, λ).
α = 3 is the shape parameter and λ is the rate parameter. For the gamma
distribution, it is easiest to understand the case where α is an integer. Even if α is
not an integer, the gamma distribution is still defined. However, it is tricky to
compute something like (2.3)!; that is where the gamma function Γ(α) comes
into play. You can look up a table to get the specific value of Γ(α) for a given α.
Now think about why we get a gamma distribution for T3. Each inter-arrival time
is an exponential r.v., and they are iid. Therefore, T3 is the sum of
three of them, and that is why we get a gamma distribution.
By the way, the Erlang distribution, if you have heard of it, is the gamma
distribution with integer α.
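A short simulation (our own sketch) confirms that the sum of three iid Exp(2) inter-arrival times has the Gamma(3, rate 2) distribution.

# Check that T3 = sum of three iid Exp(2) times is Gamma(3, rate 2) (a sketch)
set.seed(1)
t3 <- replicate(1e5, sum(rexp(3, rate = 2)))
mean(t3)                            # close to 3/2
mean(t3 <= 2)                       # empirical Pr{T3 <= 2}
pgamma(2, shape = 3, rate = 2)      # theoretical Pr{T3 <= 2}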

4.3 Non-homogeneous Poisson Process


We now move on to the non-homogeneous Poisson process. Here the rate λ
varies over time; that is, λ is a function of time t, written λ(t). Why do we ever
need non-homogeneity, which makes computation more complicated? Suppose that you
are observing a hospital's hourly arrival rate. The arrival rate will not be constant
over an entire day.
[Figure: hourly arrival rate over a 24-hour day — a constant (homogeneous) rate versus a time-varying (non-homogeneous) rate; horizontal axis: Time (Hour), vertical axis: Hourly Arrival Rate.]

Therefore, to model this type of real-world phenomenon, we need a more
sophisticated model. The figure above suggests that a non-homogeneous Poisson
process can capture reality better than a homogeneous Poisson process. In other
words, a non-homogeneous Poisson process allows us to model a rush-hour
phenomenon.

Definition 4.3 (Non-homogeneous Poisson Process). N = {N (t), t ≥ 0} is said


to be a time-inhomogeneous Poisson process with rate function λ = {λ(t), t ≥ 0}
if

1. N has independent increments,


2.
       Pr{N(s, t] = k} = ( (∫_s^t λ(u) du)^k / k! ) e^{−∫_s^t λ(u) du},

   where N(s, t] = N(t) − N(s) is the number of arrivals in (s, t].
Example 4.5. Suppose we modeled the arrival rate of a store as follows. One
month after launching, the arrival rate settles down and it jumps up at the end
of the second month due to discount coupon.
[Figure: the arrival rate function λ(t) — as read off from the computations below, λ(t) = t for 0 ≤ t ≤ 1, λ(t) = 1 for 1 ≤ t ≤ 2, and λ(t) = 2 afterwards; horizontal axis: Time (months), vertical axis: Arrival Rate λ(t).]
Assume N is a time-non-homogeneous Poisson process with rate function
λ = {λ(t), t ≥ 0} given in the figure above. Then the average number of arrivals in
(0, .5] is ∫_0^{.5} λ(t) dt = (1/2)(.5)^2 = .125.

1.
   Pr{N(.5) ≥ 2} = 1 − Pr{N(.5) < 2} = 1 − Pr{N(.5) = 0} − Pr{N(.5) = 1}
   Pr{N(.5) = 1} = (.125^1 / 1!) e^{−.125}
2.
   Pr{there is at least 1 arrival in (0, 1] and there are 4 arrivals in (1, 3]}
      = Pr{N(0, 1] ≥ 1} Pr{N(1, 3] = 4}

   by the independent increment property. Each of these can be computed
   as follows.

   Pr{N(0, 1] ≥ 1} = 1 − Pr{N(0, 1] = 0} = 1 − e^{−1/2}
   Pr{N(1, 3] = 4} = (3^4 / 4!) e^{−3}

It is important to note that you should first decompose the time intervals so
that they do not overlap. Then we can write the probability in product form
and compute each factor separately.
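As a numeric check of Example 4.5, the R lines below (our own sketch; the piecewise rate function is our reading of the figure, as noted above) reproduce the quantities just computed.

# Numeric check of Example 4.5 (a sketch; rate function reconstructed from the figure)
lambda <- function(t) ifelse(t <= 1, t, ifelse(t <= 2, 1, 2))
m05 <- integrate(lambda, 0, 0.5)$value    # mean number of arrivals in (0, .5]; 0.125
1 - dpois(0, m05) - dpois(1, m05)         # part 1: Pr{N(.5) >= 2}
m01 <- integrate(lambda, 0, 1)$value      # 0.5
m13 <- integrate(lambda, 1, 3)$value      # 3
(1 - dpois(0, m01)) * dpois(4, m13)       # part 2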

4.4 Thinning and Merging


We have covered most Poisson-process-related topics. The remaining topics are
merging and splitting of Poisson arrival processes. Let us further develop our
model of the incoming customer stream. What if two independent Poisson
streams are merged together and feed into a single queue? Will the
merged process also be Poisson? Conversely, if there are two types of customers
in a single flow and we separate them into two streams, will the two split
processes be Poisson? Understanding merging and thinning of Poisson processes
will greatly enhance the applicability of the Poisson process to reality.

4.4.1 Merging Poisson Process


Suppose that there are only two entrances in Georgia Tech. Denote the numbers
of arrivals through gate A and B by NA , NB .

NA = {NA (t), t ≥ 0}, NB = {NB (t), t ≥ 0}

Theorem 4.4. Assume NA is a Poisson process with rate λA . Assume NB is a


Poisson process with rate λB . Assume further that NA and NB are independent.
Then, N = {N (t), t ≥ 0} is a Poisson processes with rate λ = λA + λB where
N (t) = NA (t) + NB (t).

The independence assumption may or may not be true. In the
Georgia Tech case here, more people through gate A may mean fewer people
through gate B. Think about Apple's products: would sales of the iPad be
correlated with sales of the iPhone, or would they be independent? We need to be
careful about the independence condition when modeling the real world.

4.4.2 Thinning Poisson Process


Now think about splitting a Poisson process. N = {N(t), t ≥ 0} is a Poisson
process describing the arrival process. Each customer has to choose which of
two stores to shop at, A or B. To make the decision, each customer flips a biased
coin with probability p of landing heads. Let NA(t) be the number of arrivals
to store A in (0, t] and NB(t) the number of arrivals to store B in (0, t], so that
N(t) = NA(t) + NB(t).

Theorem 4.5. Suppose N is Poisson with rate λ. Then, NA is a Poisson


process with rate pλ and NB is a Poisson process with rate (1 − p)λ.

It may sound silly to choose where to shop by flipping a coin. However, from a
company's perspective, customer choice can be viewed as the outcome of a coin
flip whose bias is estimated by statistical inference. Suppose a company has two
products: it will model people's choices using a large amount of tracking data
and draw conclusions based on statistical inference.
The following is the sketch of the proof.
Proof.
We want to show that

    NA(t) ∼ Poisson(λpt),   NB(t) ∼ Poisson(λ(1 − p)t).

Define

    NA(t) = Σ_{i=1}^{N(t)} Yi,   where Yi = 1 if the ith toss is a head and Yi = 0 if it is a tail,

with {Yi} iid and independent of N(t). Then,

Pr(NA(t) = k) = Pr( Σ_{i=1}^{N(t)} Yi = k )
              = Σ_{n=k}^∞ Pr( Σ_{i=1}^{N(t)} Yi = k | N(t) = n ) Pr(N(t) = n)
              = Σ_{n=k}^∞ Pr( Σ_{i=1}^{n} Yi = k | N(t) = n ) ((λt)^n / n!) e^{−λt}
              = Σ_{n=k}^∞ Pr( Σ_{i=1}^{n} Yi = k ) ((λt)^n / n!) e^{−λt}
              = Σ_{n=k}^∞ C(n, k) p^k (1 − p)^{n−k} ((λt)^n / n!) e^{−λt}
              = Σ_{n=k}^∞ (n! / ((n − k)! k!)) p^k (1 − p)^{n−k} ((λt)^n / n!) e^{−λt}
              = e^{−λt} (p^k / k!) Σ_{n=k}^∞ (1 − p)^{n−k} (λt)^k (λt)^{n−k} / (n − k)!
              = e^{−λt} (p^k (λt)^k / k!) Σ_{n=k}^∞ ((1 − p)λt)^{n−k} / (n − k)!
              = e^{−λt} (p^k (λt)^k / k!) ( 1 + (1 − p)λt / 1! + ((1 − p)λt)^2 / 2! + ((1 − p)λt)^3 / 3! + . . . )
              = e^{−λt} (p^k (λt)^k / k!) e^{(1−p)λt}        (using the Taylor expansion)
              = ((λpt)^k / k!) e^{−λpt},

so NA(t) ∼ Poisson(λpt), as claimed.

The difficulty arising here is that, in the summation of Yi , Yi and N are en-
tangled. But, we have conditioning! In conditioning, we used the following
property of conditional probability, the law of total probability.
    Pr(A) = Σ_n Pr(A | Bn) Pr(Bn)

The Taylor expansion used above is

    e^x = 1 + x + x^2 / 2! + x^3 / 3! + . . . .
Although we obtained what we wanted, we still need to prove the independence
between NA and NB. To do that, we need to show

    Pr(NA(t) = k, NB(t) = l) = ((λpt)^k e^{−λpt} / k!) · ((λ(1 − p)t)^l e^{−λ(1−p)t} / l!),

which is the product form. We do not go over the computation, but it is
very similar.
For the split processes, independence of increments is another important property,
but we will not go over it in detail. FYI, as a contrasting example, if each customer
does not toss a coin and instead odd-numbered customers go to store A and
even-numbered customers go to store B, the split processes will not be Poisson.

4.5 Simulation
First, let us simulate a homogeneous Poisson process with a fixed rate 2 people
per minute.

Listing 4.1: Simulating Homogeneous Poisson Process

# Set up the arrival rate
lambda = 2

# How many arrivals will be generated?
N = 100

# Generate N random inter-arrival times from the exponential distribution
interarrivals = rexp(N, rate=lambda)
interarrivals

# Arrival timestamps (cumulative sum of inter-arrival times)
cumsum(interarrivals)

Simulating a non-homogeneous Poisson process is substantially more complicated
than simulating a homogeneous Poisson process. To simulate a non-homogeneous
Poisson process with intensity function λ(t), choose a constant λ large enough
that λ(t) = λp(t) with p(t) ≤ 1, simulate a homogeneous Poisson process with
rate λ, and accept an event occurring at time t with probability p(t).¹ Let us
simulate a non-homogeneous Poisson process with rate λ(t) = √t.

Listing 4.2: Simulating Non-homogeneous Poisson Process

# Set up the arrival rate as a function of time and the large constant c
lambda = function(t) {sqrt(t)}
c = 10000

# To simulate N non-homogeneous arrivals, we will need a much larger
# number (N_homo > N*c) of homogeneous arrivals
N = 100
N_homo = N*c*3

# Generate N_homo homogeneous Poisson arrivals with rate c
h_arrivals = cumsum(rexp(N_homo, rate=c))
head(h_arrivals)

# Accept each event at time t with probability lambda(t)/c
arrivals = c()
for (i in 1:length(h_arrivals)) {
  if (runif(1) < lambda(h_arrivals[i])/c) {
    arrivals = append(arrivals, h_arrivals[i])
    if (length(arrivals) >= N) break
  }
}
arrivals

Finally, we can also simulate merging and thinning of Poisson processes. Suppose
we are modeling the inflow of people onto the Georgia Tech Atlanta campus.
There are two entrances to the campus: Gates A and B. Through Gate A, people
come in following a homogeneous Poisson process with rate 3. Through Gate
B, arrivals form a Poisson process with rate 5. If we want to model the arrivals
to the campus as a whole, we know from the theory that the merged process is a
homogeneous Poisson process with rate 8. Still, we can programmatically merge
the two independent processes.

Listing 4.3: Merging Poisson Processes

# Set up the two arrival rates
l1 = 3
l2 = 5

# How many arrivals will be generated?
N = 100

# Simulate N arrivals from each of the two homogeneous Poisson processes
arrivals1 = cumsum(rexp(N, rate=l1))
arrivals2 = cumsum(rexp(N, rate=l2))

# Merge the two processes and keep the first N arrivals
merged = sort(c(arrivals1, arrivals2))[1:N]
merged

¹ Ross, Sheldon M. (2006). Simulation. Academic Press, p. 32.



# Compare with another independent homogeneous Poisson process with rate 8
comparison = cumsum(rexp(N, rate=l1+l2))
comparison
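The section is also about thinning, so here is a matching sketch (ours, not in the original listing) that splits the merged campus stream by independent coin flips; it continues from the variables l1, l2, and merged defined in Listing 4.3.

# Thin the merged stream by iid coin flips (a sketch; continues Listing 4.3)
p = l1 / (l1 + l2)                  # probability that a given arrival is of type A
coin = runif(length(merged)) < p
thinnedA = merged[coin]             # a Poisson process with rate (l1 + l2) * p = 3
thinnedB = merged[!coin]            # a Poisson process with rate (l1 + l2) * (1 - p) = 5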

4.6 Exercise
1. Suppose there are two tellers taking customers in a bank. Service times at
a teller are independent, exponentially distributed random variables, but
the first teller has a mean service time of 2 minutes while the second teller
has a mean of 5 minutes. There is a single queue for customers awaiting
service. Suppose at noon, 3 customers enter the system. Customer A goes
to the first teller, B to the second teller, and C queues. To standardize the
answers, let us assume that TA is the length of time in minutes starting
from noon until Customer A departs, and similarly define TB and TC .
(a) What is the probability that Customer A will still be in service at
time 12:05?
(b) What is the expected length of time that A is in the system?
(c) What is the expected length of time that A is in the system if A is
still in the system at 12:05?
(d) How likely is A to finish before B?
(e) What is the mean time from noon until a customer leaves the bank?
(f) What is the average time until C starts service?
(g) What is the average time that C is in the system?
(h) What is the average time until the system is empty?
(i) What is the probability that C leaves before A given that B leaves
before A?
(j) What are the probabilities that A leaves last, B leaves last, and C
leaves last?
(k) Suppose D enters the system at 12:10 and A, B, and C are still there.
Let WD be the time that D spends in the system. What is the mean
time that D is in the system?
2. Suppose we agree to deliver an order in one day. The contract states that
if we deliver the order within one day we receive $1000. However, if the
order is late, we lose money proportional to the tardiness until we receive
nothing if the order is two days late. The length of time for us to complete
the order is exponentially distributed with mean 0.7 days. For notation,
let T be the length of time until delivery.
(a) What are the probabilities that we will deliver the order within one
day and within two days?

(b) What is the expected tardiness?


(c) Notice that this contract is the one in the Littlefield simulation
game 1. We now add two more contracts (which similarly have the
rule of losing money proportional to tardiness):
• price = $750; quoted lead time = 7 days; maximum lead time
= 14 days.
• price = $1250; quoted lead time = 0.5 day; maximum lead time
= 1 day.
Notice that in the $1000 contract, quoted lead time = 1 day; maxi-
mum lead time = 2 days. What are the expected revenues for these
three contracts? Which contract would be the most lucrative assum-
ing the mean time to delivery is 0.7 days?
(d) Suppose the mean time to delivery is exponentially distributed with
mean δ days. For what values of δ is the shortest $1250 contract op-
timal, for what values is the medium length $1000 contract optimal,
and for what values is the longest $750 contract optimal?

3. Suppose passengers arrive at a MARTA station between 10am and 5pm fol-


lowing a Poisson process with rate λ = 60 per hour. For notation, let N (t)
be the number of passengers arrived in the first t hours, S0 = 0, Sn be the
arrival time of the nth passenger, Xn be the interarrival time between the
(n − 1)st and nth passenger.

(a) What is the expected number of passengers arrived in the first 2


hours?
(b) What is the probability that no passengers arrive in the first one
hour?
(c) What is the mean number of arrivals between noon and 1pm?
(d) What is the probability that exactly two passengers arrive between
2pm and 4pm?
(e) What is the probability that ten passengers arrive between 2pm and
4pm given that no passengers arrive in the first half hour?
(f) What is the average time of the first arrival?
(g) What is the expected time of the thirtieth arrival?
(h) What is the expected time of the first arrival given that no passenger
arrives in the first hour?
(i) What is the expected arrival time of the first passenger who arrives
after 2pm?
(j) Suppose that 40% of passengers are female. What is the expected
number of female passengers arriving in the first 2 hours? For notation,
let W(t) be the number of female passengers and M(t) the number of
male passengers arriving in the first t hours. Let S_n^W be the arrival
time of the nth female passenger and X_n^W the interarrival time between
the (n − 1)st and nth female passengers. Similarly define S_n^M and
X_n^M for the male passengers.
(k) What is the probability of at least one female passengers arrive in
the first half hour given that no male passengers arrive?
(l) What is the probability that the first three passengers are all female?
4. Nortel in Canada operates a call center for customer service. Assume that
each caller speaks either English or French, but not both. Suppose that the
arrival for each type of calls follows a Poisson process. The arrival rates
for English and French calls are 2 and 1 calls per minute, respectively.
Assume that call arrivals for different types are independent.
(a) What is the probability that the 2nd English call will arrive after
minute 5?
(b) Find the probability that, in first 2 minutes, there are at least 2
English calls and exactly 1 French call?
(c) What is the expected time for the 1st call that can be answered by a
bilingual operator (that speaks both English and French) to arrive?
(d) Suppose that the call center is staffed by bilingual operators that
speak both English and French. Let N (t) be the total number of
calls of both types that arrive during (0, t]. What kind of a process
is N (t)? What is the mean and variance of N (t)?
(e) What is the probability that no calls arrive in a 10 minute interval?
(f) What is the probability that at least two calls arrive in a 10 minute
interval?
(g) What is the probability that two calls arrive in the interval 0 to 10
minutes and three calls arrive in the interval from 5 to 15 minutes?
(h) Given that 4 calls arrived in 10 minutes, what is the probability that
all of these calls arrived in the first 5 minutes?
(i) Find the probability that, in the first 2 minutes, there are at most 1
call, and, in the first 4 minutes, there are exactly 3 calls?
5. Suppose customers arrive at a bank according to a Poisson process but the
arrival rate fluctuates over time. From the opening time at 9 a.m. until
11, customers arrive at a rate of 10 customers per hour. From 11 to noon,
the arrival rate increases linearly until it reaches 20 customers per hour.
From noon to 1pm, it decreases linearly to 15 customers per hour, and
remains at 15 customers per hour until the bank closes at 5 p.m. For
notation, let N (t) be the number of arrivals in the t hours since the bank
opened and λ(t) the arrival rate at t hours after opening.
(a) What is the arrival rate at 12:30pm?
(b) What is the average number of customers per day?

(c) What is the probability of k arrivals between 11:30 and 11:45?


(d) What is the probability of k arrivals between 11:30 and 11:45 given
that there were 7 arrivals between 11:00 and 11:30?
(e) Consider the first customer that arrives after noon and let T be the
length of time since noon to when that customer arrives. What is
the probability that this customer arrives after 12:10, after 12:20? Is
T exponentially distributed?

6. Calls to a call center follow a Poisson process with rate λ = 2 calls per
minute. The call center has 10 phone lines. There is a single agent whose
processing times are exponentially distributed with mean 4 minutes. Assume
that at 12 noon (time 0), there are two calls in the system; the first call
is being processed, the second call is waiting, and the third call is on its
way to arrive.

(a) What is the expected time that the second call leaves the system?
(b) What is the probability that the second call leaves an empty system?
(c) What is the expected time until the 3rd call leaves the system?
(d) At 12:05pm, what is the probability that the second call is the only
call in the system?

7. Calls to a center follow a Poisson process with rate 120 calls per hour. Each
call is from a male customer with probability 1/4. The call center opens at
9am each morning.

(a) What is the probability that there are no incoming calls in the first
2 minutes?
(b) What is the probability that there is exactly one call in the first 2
minutes and there are exactly three calls from minute 1 to minute 4?
(c) What is the probability that during the first 2 minutes there are 1
call from male customer and 2 calls from female customers?
(d) What is the expected elapsed time from 9am until the second female
customer arrives?
(e) What is the probability that the second call arrives after 9:06am?

8. Assume that passengers arrive at a MARTA station between 10am and 12 noon


following a non-homogeneous Poisson process with rate function λ(t) given
as follows. From 10am-11am, the arrival rate is constant, 60 passengers
per hour. From 11am to noon, it increases linearly to 120 passengers per
hour.

(a) What is the probability that there is at least 1 passenger arrival in


the first 2 minutes (after 10am)?

(b) What is the probability that there are 2 arrivals between 10:00am to
10:10am and 1 arrival between 10:05am to 10:15am?
(c) What is the expected number of arrivals between 10:50am and 11:10am?
(d) What is the probability that there are exactly 10 arrivals between
10:50am and 11:10am?
(e) What is the probability that the 1st passenger after 10am will take
at least 2 minutes to arrive?
(f) What is the probability that the 1st passenger after 11am will take
at least 2 minutes to arrive?

9. Assume that call arrival to a call center follows a non-homogeneous Poisson


process. The call center opens from 9am to 5pm. During the first hour,
the arrival rate increases linearly from 0 at 9am to 60 calls per hour at
10am. After 10am, the arrival rate is constant at 60 calls per hour.

(a) Plot the arrival rate function λ(t) as a function of time t; indicate
clearly the time unit used.
(b) Find the probability that exactly 5 calls have arrived by 9:10am.
(c) What is the probability that the 1st call arrives after 9:20am?
(d) What is the probability that there is exactly one call between 11:00am
and 11:05am and there are at least two calls between 11:03am and 11:06am?

10. A call center is staffed by two agents, Mary and John. Mary’s processing
times are iid, exponentially distributed with mean 2 minutes and John’s
processing times are iid, exponentially distributed with mean 4 minutes.

(a) Suppose that when you arrive, both Mary and John are busy, and you
are the only one waiting in the waiting line. What is the probability
that John will serve you? What is your expected time in system
(waiting plus service)?
(b) What is the probability that you will be answered in 2 minutes?
(c) Suppose that when you arrive, you are the 2nd one in the waiting
line. What is the probability that you will be answered in 2 minutes?

11. Suppose you are observing a one-way road at Georgia Tech. Also, suppose
there are only two types of vehicles on the road: sedans and trucks.
The intervals between consecutive sedans are iid exponential random variables
with mean 2 minutes. The arrivals of trucks form a Poisson process
with rate 60 per hour. The arrivals of the two types are independent of each
other.

(a) What is the probability that the next vehicle passing you is a truck?
(b) What is the probability that at least three sedans will pass you in
next 10 minutes?

(c) What is the probability that you will not be hit by a vehicle if you
cross the road blindly? Assume that it takes 1 minute for you to
cross the road with your eyes closed.
(d) What is the expected time until the next vehicle arrives regardless of
the type of the vehicle?

You realize that there is another type of vehicle on the road: motorcycles.
The arrivals of motorcycles also form a Poisson process, but the arrival rate
varies over the day as follows.

Time Arrival Rate (per hour)


12am-8am 20
8am-4pm 40
4pm-12am 30

The time is currently 7:30am. The arrivals of the three types of vehicles are
all independent of one another. (So, for parts (e)-(f), there are three types
of vehicles: sedan, truck, and motorcycle.)

(e) What is the probability that 30 motorcycles will pass you by 8:30am?
(f) What is the probability that 2 motorcycles will pass you between
7:30am and 8:10am and 3 motorcycles will pass you between 7:50am
and 8:30am?
Chapter 5

Continuous Time Markov Chain

In Chapter 3, we learned about the discrete time Markov chain. In a DTMC, time
is discretized so that it is indexed by integers. The DTMC itself has a wide
range of applications. However, in some real situations, discretizing time may
oversimplify the situation. In such cases, the continuous time Markov
chain comes into play. An outstanding application domain where the CTMC is
heavily used is queueing theory. We will cover various ways to model different
queueing systems.

5.1 Introduction
Let us start with some examples.
Example 5.1. A machine goes up and down over time. When it is down, it
takes d_i amount of time to be repaired, where d_i is exponentially distributed
with mean 1 hour and {d_i, i = 1, 2, 3, . . .} is an iid sequence. When it is up, it
stays up for u_i amount of time, which is exponentially distributed with mean 10
hours, and {u_i, i = 1, 2, 3, . . .} is an iid sequence. Assume that up times and
repair times are independent. Can this system be modeled by a CTMC?
The answer is yes. What are the components of a CTMC? First, there has
to be a state space, in this case S = {up, down}. Second, each state has to have
a corresponding holding time rate, in this case λ_up = 1/10, λ_down = 1. Last, we
need a roadmap matrix R from the underlying DTMC:

        [ 0  1 ]
   R =  [ 1  0 ]

Example 5.2. We have two machines each of which having the same charac-
teristics as before and there is only one repairperson. What is X(t)? Let X(t)
be the number of machines that are up at time t. State space S = {0, 1, 2}.


Is X = {X(t), t ≥ 0} a CTMC? What is the holding time of state 2? In
state 2, two machines are competing to fail first, so the holding time is
min(X1, X2) ∼ Exp(1/10 + 1/10) = Exp(1/5). How about the holding time
of state 1? In that state, the repairperson and the other operating machine
are competing to finish first. Due to the memoryless property, the machine that
is still up is as good as new. The holding time follows Exp(1/1 + 1/10) = Exp(11/10).
At the end of a visit to state 1, what would be the next state? This is where the
roadmap matrix comes into play:

    R_{1,2} = 1 / (1 + 1/10),   R_{1,0} = (1/10) / (1 + 1/10)
You can compute the entire R matrix based on similar logic.
Suppose we have two machines and one repairman. Up times and down
times follow the following distributions.

Up times ∼Exp(1/10)
Down times ∼Exp(1)

Let X(t) be the number of machines that are up at time t. Since X(t) is
a stochastic process, I am going to model it using CTMC. For CTMC, we
need three ingredients. First, state space S = {0, 1, 2}. Second, holding times
λ0 , λ1 , λ2 . λi can be interpreted as how frequently the chain jumps from state
i. Let us compute λi , i ∈ S = {0, 1, 2}.
• When X(t) = 0, the repairman is working on one of two machines both of
which are down at the moment, so λ0 = 1.
• When X(t) = 1, the holding time follows min(up time, down time) =
Exp((1/10) + 1), so λ1 = 11/10.
• When X(t) = 2, the holding time in this case follows min(up time, up time) =
Exp(0.1 + 0.1), so λ2 = 1/5.
Now we know when the chain will jump, but we don’t know to which state the
chain will jump. Roadmap matrix, the last ingredient of a CTMC, tells us the
probability.
 
        [ 0                   1    0             ]
   R =  [ (1/10)/(1 + 1/10)   0    1/(1 + 1/10)  ]    (rows and columns indexed by states 0, 1, 2)
        [ 0                   1    0             ]
We have specified the input data parameters we need to model a CTMC.
Let us introduce the concept of generator matrix which is somewhat more
convenient than having holding times and roadmap matrix separately.
   
        [ −λ0      λ0 R01   λ0 R02 ]     [ −1     1        0    ]
   G =  [ λ1 R10   −λ1      λ1 R12 ]  =  [ 1/10   −11/10   1    ]    (rows and columns indexed by states 0, 1, 2)
        [ λ2 R20   λ2 R21   −λ2    ]     [ 0      1/5      −1/5 ]

We can also draw a rate diagram showing these data graphically. If you are
given the following diagram, you should be able to construct G from it.

[Rate diagram: 0 →(1) 1 →(1) 2 and 2 →(1/5) 1 →(1/10) 0.]

If you look at the diagram, you may be able to interpret the problem more
intuitively.
Example 5.3. We have two operators and three phone lines. Call arrivals
follow a Poisson process with rate λ = 2 calls/minute. Each call's processing
time is exponentially distributed with mean 5 minutes. Let X(t) be the number
of calls in the system at time t. In this example, we assume that calls arriving
when no phone line is available are lost. What would the state space be? S =
{0, 1, 2, 3}.

[Rate diagram: 0 →(2) 1 →(2) 2 →(2) 3 and 3 →(2/5) 2 →(2/5) 1 →(1/5) 0.]

How do we compute the roadmap matrix based on this diagram? For ex-
ample, you are computing R21 , R23 . Simply just add up all rates leaving from
state 2 and divide the rate to the destination state by the sum.

    R_{23} = 2 / (2 + 2/5),   R_{21} = (2/5) / (2 + 2/5)

Now, let us add another condition: each customer has a patience time that
is exponentially distributed with mean 10 minutes. When the waiting time in
the queue of a customer exceeds his patience time, the customer abandons the
system without service. Then, the only change we need to make is the rate on
the arrow from state 3 to state 2.

[Rate diagram: the same as above, except that the rate from 3 to 2 becomes 2/5 + 1/10.]

Example 5.4. John and Mary are the two operators. John’s processing times
are exponentially distributed with mean 6 minutes. Mary’s processing times
are exponentially distributed with mean 4 minutes. Model this system by a
CTMC. What would the state space be? S = {0, 1J, 1M, 2, 3}. Why can't we
use S = {0, 1, 2, 3}? Let us leave that question open for now and draw
the diagram first assuming that S = {0, 1, 2, 3}.
[Rate diagram: 0 →(2) 1 →(2) 2 →(2) 3 and 3 →(1/6+1/4+1/10) 2 →(1/6+1/4) 1 →(?) 0.]

We cannot determine the rate from state 1 to state 0 because we don't know who
is processing the call, John or Mary. So, we cannot live with S = {0, 1, 2, 3}. For
a Markov chain, it is really an important concept that we don't have to memorize
all the history up to the present. It's like "Just tell me the state, and I will tell
you how it will evolve." So let's see whether S = {0, 1J, 1M, 2, 3} works.

[Rate diagram over states {0, 1J, 1M, 2, 3}: the split of the arrival rate 2 out of state 0 into 1J and 1M is undetermined (marked "?"); 1J →(1/6) 0, 1M →(1/4) 0, 1J →(2) 2, 1M →(2) 2, 2 →(1/4) 1J, 2 →(1/6) 1M, 2 →(2) 3, 3 →(1/6+1/4+1/10) 2.]

Even in this case, we cannot determine who takes a call that arrives
when both operators are free. It means that we do not have enough specification
to model the system completely. In tests or exams, you will see a more complete
description. This is part of the manager's policy: you may want John to take the
call when both are free, or you may toss a coin whenever such a case happens.
What would the optimal policy be? It depends on your objective. Probably John
and Mary are not paid the same; you may want to reduce total labor costs or the
average waiting time of customers.
Suppose now that we direct every call to John when both are free.

[Rate diagram over states {0, 1J, 1M, 2, 3}: now 0 →(2) 1J and 0 →(0) 1M; the remaining rates are as in the previous diagram.]

We now have complete information for modeling a CTMC. These are the inputs.
Then, what outputs would we be interested in? We want to know what fraction
of time the chain stays in a certain state in the long run. We also want to
know how many customers are lost. We may plan to install one more phone
line and want to evaluate the improvement from the new installation. We will
cover these topics next.
Example 5.5. Consider a small call center with 3 phone lines and two agents, Mary
and John. Call arrivals follow a Poisson process with rate λ = 2 calls per
minute. Mary's processing times are iid exponential with mean 4 minutes.
John's processing times are iid exponential with mean 6 minutes. Customers'
patience times are iid exponential with mean 10 minutes. An incoming call to an
empty system always goes to John.
The rate diagram will be as follows.

[Rate diagram over states {0, 1J, 1M, 2, 3}: 0 →(2) 1J; 1J →(1/6) 0, 1M →(1/4) 0; 1J →(2) 2, 1M →(2) 2; 2 →(1/4) 1J, 2 →(1/6) 1M; 2 →(2) 3; 3 →(1/6+1/4+1/10) 2.]

It is very tempting to model this problem as follows. It is wrong, and
you won't get any credit for it if you model the system like this on a test.

[Incorrect rate diagram over states {0, 1, 2, 3}: 0 →(2) 1 →(2) 2 →(2) 3 and 3 →(1/6+1/4+1/10) 2 →(1/6+1/4) 1 →(?) 0.]

This model is not detailed enough to capture all the situations described in the
question. You cannot determine what number should go into the rate from state 1
to state 0 because you have not kept track of who is handling the phone call when
there is only one.
Let us think about the formal definition of CTMC.
Definition 5.1 (Continuous-Time Markov Chain). Let S be a discrete space.
(It is called the state space.) For each state i ∈ S, let {ui (j), j = 1, 2, . . . }
be a sequence of iid r.v.’s having exponential distribution with rate λi and
{φi (j), j = 1, 2, . . . } be a sequence of iid random vectors.

Also, as in a DTMC, a CTMC has a similar Markov property, in a subtly
different mathematical form:

Pr{X(t + s) = j | X(t0) = i0, X(t1) = i1, . . . , X(t_{n−1}) = i_{n−1}, X(t) = i}
    = Pr{X(t + s) = j | X(t) = i} = Pr{X(s) = j | X(0) = i} = P_{ij}(s)

for any t0 < t1 < · · · < t_{n−1} < t, any i0, i1, . . . , i_{n−1} ∈ S and any i, j ∈ S.
This is the defining property of a CTMC. As in a DTMC, the Chapman-Kolmogorov
equation holds, where

    P_{ij}(s) = Pr{X(s) = j | X(0) = i}.

For example, let S = {1, 2, 3}. Then φi(j) takes values among the vectors (1, 0, 0), (0, 1, 0), (0, 0, 1).
Think of throwing a three-sided die to choose one of the three possible vectors
for the next value of φi.
[Figure: a sample path of the CTMC — the chain holds in state 1 for u1(1), then in state 2 for u2(1), in state 3 for u3(1), in state 2 for u2(2), and so on; time on the horizontal axis, state on the vertical axis.]
Think of φi(j) as the outcome of the jth toss of the ith coin. If we have the
roadmap matrix

        [ 0    1/2  1/2 ]
   R =  [ 1/4  0    3/4 ] ,
        [ 1/8  7/8  0   ]

each row i represents the ith coin. In this case, the first coin is a fair coin but the
other two are biased. Now we can generate a sequence of vectors φi(j).

5.1.1 Holding Times


Then, how can we generate iid exponentially distributed random variables
ui(j)? A computer is only able to "drop a needle" uniformly within an interval,
say [0, 1]: we can ask the computer to draw a uniform random number U between
0 and 1. How can we generate an exponentially distributed random variable from
this basic building block? Say we are trying to generate exponential random
variables with rate 4. Define

    X = −(1/4) ln(1 − U).

Then,

    Pr{X > t} = Pr{−(1/4) ln(1 − U) > t}
              = Pr{ln(1 − U) < −4t} = Pr{1 − U < e^{−4t}}
              = Pr{U > 1 − e^{−4t}}
              = 1 − (1 − e^{−4t})
              = e^{−4t}.

We got the exponential distribution we wanted by using just a uniform
random variable. You will learn how to generate other types of random variables
from a uniform distribution in a simulation course; that will be the main topic
there. Simulation is in fact an integral part of the IE curriculum.
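A short R check of this inverse-transform construction (our own sketch):

# Inverse-transform sketch: generate Exp(4) from Uniform(0,1) draws
set.seed(1)
u <- runif(1e5)
x <- -log(1 - u) / 4
mean(x)   # close to 1/4
var(x)    # close to 1/16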

5.1.2 Generator Matrix


Let us examine the relationship between the transition matrix at time t and the
generator matrix. The generator matrix is the derivative of the transition matrix
at time 0:

    P(t) = (P_{ij}(t))
    P'_{1,1}(0) = −λ1 = G_{1,1}
    P'_{i,j}(0) = G_{ij}

Remember the Chapman-Kolmogorov equation? Differentiate it with respect to s:

    P(t + s) = P(t) P(s) for any t, s ≥ 0
    (d/ds) P(t + s) |_{s=0} = P'(t) = P(t) P'(0) = P(t) G,   t ≥ 0
    P(0) = I   ⇒   P(t) = e^{tG} = expm(tG)

In other words, we can generate P(t) for any t using the generator matrix G.
That is why G is called the generator matrix.
In practice, we can compute expm by hand only in a few special cases. In such
cases, you can first obtain the eigenvalues and eigenvectors of G from

    G V1 = a1 V1,   G V2 = a2 V2,   G V3 = a3 V3.



If all eigenvalues are distinct, we can exponentiate the matrix rather easily.

    G V = V diag(a1, a2, a3)   ⇒   G = V diag(a1, a2, a3) V^{−1}
    ∴ G^n = V diag(a1^n, a2^n, a3^n) V^{−1},   n = 1, 2, . . .

Hence,

    expm(G) = Σ_{n=0}^∞ G^n / n!
            = V diag( Σ_{n=0}^∞ a1^n / n!,  Σ_{n=0}^∞ a2^n / n!,  Σ_{n=0}^∞ a3^n / n! ) V^{−1}
            = V diag( e^{a1}, e^{a2}, e^{a3} ) V^{−1}.

Let us run a Matlab experiment.

>> G=[-2 1 1; 2 -5 3; 2 2 -4]
>> expm(2*G)
>> expm(5*G)
>> exp(5*G)

When you compute e^{5G}, you will see that all rows are (nearly) identical: the chain
has essentially reached steady state. Also, if you look at the result of the exp(5*G)
command, it is simply wrong for our purpose, because exp applies element-wise.
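The same experiment can be reproduced without Matlab. The sketch below (ours) uses base R and the eigendecomposition derived above; it assumes G is diagonalizable, which holds for this example.

# Matrix exponential e^{tG} via eigendecomposition (a sketch in base R)
G <- matrix(c(-2, 1, 1,
               2,-5, 3,
               2, 2,-4), nrow = 3, byrow = TRUE)
expm_tG <- function(G, t) {
  e <- eigen(G)                          # eigenvalues/vectors may be complex
  V <- e$vectors
  # V diag(e^{t a_i}) V^{-1}, written without diag() so complex values are handled
  Re(V %*% (exp(t * e$values) * solve(V)))
}
expm_tG(G, 2)   # counterpart of expm(2*G)
expm_tG(G, 5)   # rows become nearly identical: close to steady state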
Now we can answer the questions raised at the beginning of the class.

    Pr{X(2) = 3 | X(0) = 1} = .2856
    Pr{X(5) = 2, X(7) = 3 | X(0) = 1} = (.2143)(.2858) = 0.06124694

Since you are not allowed to use Matlab in a test, you will be given a matrix
such as expm(G).
Example 5.6. Let X be a CTMC on state space S = {1, 2, 3} with generator
 
        [ −2   1   1  ]
   G =  [  2  −5   3  ] .
        [  2   2  −4  ]

1. Find Pr(X(2) = 3|X(0) = 1).

    Pr(X(2) = 3 | X(0) = 1) = [e^{2G}]_{1,3}

2. Find Pr(X(5) = 2, X(7) = 3|X(0) = 1).


    Pr(X(5) = 2, X(7) = 3 | X(0) = 1)
        = Pr(X(5) = 2 | X(0) = 1) Pr(X(7) = 3 | X(5) = 2)
        = Pr(X(5) = 2 | X(0) = 1) Pr(X(2) = 3 | X(0) = 2) = [e^{5G}]_{1,2} [e^{2G}]_{2,3}

Example 5.7. Let X = {X(t), t ≥ 0} be a CTMC on state space S = {1, 2, 3}


with the following rate diagram.
[Rate diagram: 1 →(2) 2 →(1) 3 and 3 →(4) 2 →(5) 1.]

The generator matrix is


 
        [ −2   2   0  ]
   G =  [  5  −6   1  ] .
        [  0   4  −4  ]
Suppose that you are asked to compute P_{1,3}(10), the probability of going
from state 1 to state 3 after 10 minutes. Using the Chapman-Kolmogorov equation,

    P_{1,3}(10) = Σ_k P_{1,k}(5) P_{k,3}(5) = [P(5)^2]_{1,3}.

So you can compute P_{1,3}(10) if you are given P(5). What if you are given P(1)
or P(1/10) instead? We can still compute P(10) by raising these matrices to the
10th or 100th power. For small t, we also have the following approximation:

    P(t) ≈ P'(0) t + P(0) = G t + I.
What we are really talking about is the derivative of the matrix function

    P(t) = e^{tG}.

What does it mean to exponentiate a matrix? There are two commands in Matlab
related to exponentials of a matrix: exp(A) computes the element-wise exponential
[e^{A11} e^{A12}; e^{A21} e^{A22}], whereas expm(A) computes the matrix exponential

    expm(A) = Σ_{n=0}^∞ A^n / n!,

which is what we want. Computing P(t) = e^{tG} for a finite t in this way is called
transient analysis.

5.2 Stationary Distribution


By definition, a stationary distribution of a CTMC is a distribution such that, if you
start with it, you never deviate from it. For example, suppose (a, b, 1 − a − b)
is the stationary distribution of a CTMC X(t) and the distribution of X(0) is
(a, b, 1 − a − b). Then the distributions of X(5) and X(10.2) are also (a, b, 1 − a − b).
How can we compute the stationary distribution without using a computer?
As in a DTMC, we might want to use πP = π. But in this case P(t) changes
with t. Which P(t) should we use? In addition, it is usually hard to compute
P(t) from the generator matrix without a computer. We have to come up with
a way to compute the stationary distribution using only the generator. Since
π = πP(t) should hold for all t ≥ 0, taking the derivative of both sides with
respect to t and evaluating it at t = 0 gives

    0 = π P'(0)   ⇒   π G = 0.

In our case,

                   [ −2   1   1  ]
    (π1, π2, π3)   [  2  −5   3  ]  = 0.
                   [  2   2  −4  ]
Two theoretical facts are involved here.
1. Because X is irreducible, it has at most one stationary distribution.
2. Because S is finite, it has at least one stationary distribution.
Therefore, we have a unique stationary distribution in our case.
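Numerically, πG = 0 together with the normalization Σπi = 1 is a small linear system; the sketch below (ours) solves it in base R for this generator.

# Solve pi %*% G = 0 with sum(pi) = 1 (a sketch)
G <- matrix(c(-2, 1, 1,
               2,-5, 3,
               2, 2,-4), nrow = 3, byrow = TRUE)
A <- rbind(t(G), rep(1, 3))     # stack G^T and the normalization row
b <- c(0, 0, 0, 1)
pi_stat <- qr.solve(A, b)       # least-squares solution of the overdetermined system
pi_stat
pi_stat %*% G                   # numerically zero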
Example 5.8. Consider a call center with two homogeneous agents and 3 phone
lines. The arrival process is Poisson with rate λ = 2 calls per minute. Processing
times are iid exponentially distributed with mean 4 minutes.
1. What is the long-run fraction of time that there are no customers in the
system? π0
2. What is the long-run fraction of time that both agents are busy? π2 + π3
3. What is the long-run fraction of time that all three lines are used? π3
Answer: Let X(t) be the number of calls in the system at time t, with S = {0, 1, 2, 3}.
We can draw the rate diagram based on this information; in fact, having the rate
diagram is equivalent to having the generator matrix. Solving πG = 0
amounts to solving the flow balance equations, flow in = flow out, for each state:

    2π0 = (1/4)π1,   2π1 = (1/2)π2,   2π2 = (1/2)π3,   π0 + π1 + π2 + π3 = 1.

Solving this by setting π0 = 1 and normalizing the result, we obtain

    π ∝ (1, 8, 32, 128)   ⇒   π = (1/169, 8/169, 32/169, 128/169).
Your manager may also be interested in other performance measures.

1. The number of calls lost per minute is λπ3 = 2(128/169), which seems
quite high.
2. The throughput of the system is λ(1 − π3).

5.3 M/M/1 Queue


Suppose we have an M/M/1 queue, meaning that we have a Poisson arrival process
with rate λ arrivals/minute and service times that are iid exponentially distributed
with rate µ. To illustrate the point, set λ = 1/2, µ = 1. Assume that the buffer
size is infinite. Let X(t) be the number of customers in the system at time t.
Then, X = {X(t), t ≥ 0} is a CTMC with state space S = {0, 1, 2, . . . }. What
is the easiest way to model this as a CTMC? Draw the rate diagram.

[Rate diagram: 0 →(λ) 1 →(λ) 2 →(λ) · · · and each state n ≥ 1 goes back to n − 1 at rate µ.]

Is this CTMC irreducible? Yes. Does it have a stationary distribution?
That depends on the relationship between λ and µ. What if λ > µ?
The queue will eventually grow without bound, and in that case we do not have a
stationary distribution. If λ < µ, we have a unique stationary distribution.
Even if λ = µ, we do not have a stationary distribution. We will look into this
later.
How can we determine the stationary distribution? We can get one by using
πG = 0, but let us first try the cut method. If we cut the states into two groups,
then in steady state the flows out of and into one group must balance. Therefore,

    π0 λ = π1 µ   ⇒   π1 = ρ π0
    π1 λ = π2 µ   ⇒   π2 = ρ π1
    π2 λ = π3 µ   ⇒   π3 = ρ π2
    . . .

where ρ = λ/µ = 0.5 in this case. Solving the system of equations, you get

    π1 = ρ π0,   π2 = ρ^2 π0,   π3 = ρ^3 π0,   . . .
The problem is that we do not yet know π0. Let us determine π0 intuitively
first. If the server utilization is ρ, then in the long run the server is not
busy for a fraction 1 − ρ of the time. Therefore, π0 = 1 − ρ, and we can get every
πi from πi = ρ^i π0. We can also solve for π0 analytically. Remember that the
stationary distribution must sum to 1:

    π0 + π1 + π2 + · · · = 1
    π0 + ρ π0 + ρ^2 π0 + · · · = 1
    π0 (1 + ρ + ρ^2 + . . .) = 1
    π0 · 1/(1 − ρ) = 1   ⇒   π0 = 1 − ρ

We expected this. More concretely,

    π2 = (1/2)^2 (1 − 1/2) = 1/8 = 0.125.

What does this π2 mean? It means that 12.5% of the time the system has two
customers.
In general, we can conclude that this CTMC has a stationary distribution if
and only if ρ < 1. Let us assume ρ < 1 in what follows. As a manager,
you will want to know more than the long-run fraction of time spent in each
state; you may want to know the long-run average number of customers in
the system:

    Σ_{n=0}^∞ n πn = 0·π0 + 1·π1 + 2·π2 + . . .
                   = 1·ρ(1 − ρ) + 2·ρ^2(1 − ρ) + 3·ρ^3(1 − ρ) + . . .
                   = ρ(1 − ρ)(1 + 2ρ + 3ρ^2 + . . .)
                   = ρ(1 − ρ) (1 + ρ + ρ^2 + ρ^3 + . . .)'
                   = ρ(1 − ρ) (1/(1 − ρ))'
                   = ρ(1 − ρ) · 1/(1 − ρ)^2 = ρ/(1 − ρ)

The next question you may wonder about is the average time in the system.
Can we use Little's Law, L = λW? Which of these three variables do we know?

    L = ρ/(1 − ρ) = λ W
    W = (1/λ) · ρ/(1 − ρ) = (1/λ) · (λ/µ)/(1 − ρ) = (1/µ) · 1/(1 − ρ) = m/(1 − ρ)

where m is the mean processing time. A slightly different question is: what is
the average waiting time in the queue? We can again use Little's Law as
long as we define the boundary of our system correctly; it becomes Lq = λWq.
How do we compute Lq and Wq?

    Lq = 0·π0 + 0·π1 + 1·π2 + 2·π3 + . . .
    Wq = W − m = m/(1 − ρ) − m = m ρ/(1 − ρ)

Is this Wq formula familiar? Recall Kingman's formula:

    Wq ≈ m · (ρ/(1 − ρ)) · ((c_a^2 + c_s^2)/2).

In our case, since both the inter-arrival and the processing times are iid
exponentially distributed, c_a^2 = c_s^2 = 1. That is why the c_a, c_s terms do
not appear in the formula we just derived.
You should be able to see the connections among the topics covered during
this semester.
If you look at Wq = m ρ/(1 − ρ), you will notice that Wq → ∞ as ρ ↑ 1. If
ρ = 1/2, then Wq = m, which means that the time you wait equals
the time you spend in service. If ρ = 0.95, then Wq = 19m: you
wait 19 times longer than your actual service time, which is extremely poor service.
You, as a manager, should aim for both high utilization and short waiting times.
Using a pool of servers, you can achieve both.
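The M/M/1 formulas are easy to tabulate; the sketch below (ours) computes the four standard performance measures for the running example λ = 1/2, µ = 1.

# M/M/1 performance measures for lambda = 1/2, mu = 1 (a sketch)
lambda <- 1/2; mu <- 1
rho <- lambda / mu; m <- 1 / mu
L  <- rho / (1 - rho)      # long-run average number in system
W  <- L / lambda           # Little's law: average time in system = m / (1 - rho)
Wq <- W - m                # average waiting time in queue = m * rho / (1 - rho)
Lq <- lambda * Wq          # average number waiting in queue
c(L = L, W = W, Wq = Wq, Lq = Lq)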

5.4 Variations of M/M/1 Queue


5.4.1 M/M/1/b Queue
Let us now limit the queue size to be finite. Let me explain the notation. The first
"M" means a Poisson arrival process. In fact, the Poisson arrival assumption is not
very restrictive in reality; many empirical studies confirm that a lot of human
arrival processes are well modeled by a Poisson process. The second "M" means
exponentially distributed service times. If we use "D" instead of "M", it means
deterministic service times; factory robots' service times are usually deterministic.
The "1" denotes the number of servers, in this case a single server. The last "b"
gives the number of spaces in the system. If you remember the "G/G/1" queue taught
earlier in the semester, "G" means a general distribution. If you think you have,
say, log-normal service times, we did not learn how to solve that queue analytically;
one thing we can do is computer simulation. By the way, log-normal means
that ln(X) ∼ N(µ, σ^2).
Take an example of a limited-size queueing system, say with b = 2, where
b is the maximum number of customers that can wait in the queue. The rate
diagram is as follows.
104 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

λ λ λ

0 1 2 3

µ µ µ

We still can use the detailed balance equations. Suppose λ = 1/2, µ = 1 as


in the previous example.
1 1 8
π0 = = =
1 + ρ + ρ2 + ρ3 1 + 1/2 + (1/2)2 + (1/2)3 15
4 2 1
π1 = , π2 = , π3 =
15 15 15
The questions you will be interested are
1. What is the average number of customers in the system?
8 4 2 1 4+4+3 11
0 +1 +2 +3 = =
15 15 15 15 15 15

2. What is the average number of customers in the queue?


8 4 2 1 4
0 +0 +1 +2 =
15 15 15 15 15

3. What is average waiting time in queue? Again, we will use the Little’s
Law. This formula is the one you remember 10 years from now like “There
was something called the Little’s Law in IE curriculum.”
4 1 8
Lq = λWq ⇒ = Wq ⇒ Wq =
15 2 15
Is this correct? What is suspicious? We just used λ = 1/2, but in limited
size queue case we lose some customers. We should use effective arrival
rate, which excludes blocked customers.

Lq = λeff Wq = λ(1 − π3 )Wq

This will generate correct Wq .

5.4.2 M/M/∞ Queue


Arrival process is Poisson with rate λ. Service times are iid exponentially dis-
tributed with rate µ. Let X(t) be the number of customers in system at time
t. X = {X(t), t ≥ 0} is a CTMC in S = {0, 1, 2, . . . }. Google’s servers may
be modeled using this model. In reality, you will never have infinite number of
servers. However, Google should have so many servers that we can assume they
have infinite number of servers. Model is just for answering a certain question.
What would be the rate diagram?
5.4. VARIATIONS OF M/M/1 QUEUE 105

λ λ λ

0 1 2 ···

1µ 2µ 3µ

How can we compute the stationary distribution? Use balance equations


and the cut method.

λ
λπ0 = µπ1 ⇒ π1 = π0
µ
λ2
λπ1 = 2µπ2 ⇒ π2 = 2 π0

λ3
λπ2 = 3µπ3 ⇒ π3 = π0
3!µ3
..
.
λn
πn = π0
n!µn
P∞
Since we have another condition, i=0 πi = 1,

λ2 λ3
 
X λ
πi =π0 1 + + 2 + + . . . =1
i=0
µ 2µ 3!µ3
1 1 λ
π0 = λ λ2 λ3
= = e− µ
1 + + 2µ
µ 2 + 3!µ3 + ... eλ/µ
 n
1 λ −µλ
πn = e , n = 0, 1, 2, . . . .
n! µ

Thinking of π = (π0 , π1 , π2 , . . . ), what is the distribution? It is clearly not


exponential because it is discrete. π ∼ a Poisson distribution. Compare it with
M/M/1 queue. M/M/1 queue’s stationary distribution is geometric.
You may be not fuzzy about Poisson process, Poisson random variable, etc.
Poisson process tracks a series of incidents. At any given time interval, the
number of incidents is a Poisson random variable following Poisson distribution.
Suppose λ = 10, µ = 1. Then, for example,

102 e−10
π2 = = 50e−10
2!
which is quite small. The long run average number of customers in system is

λ
0π0 + 1π1 + 2π2 + · · · = .
µ
106 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

Capacity provisioning: You may want to achieve a certain level of service


capacity. Suppose you have actually 100 servers. Then, you need to compute
Pr{X(∞) > 100}. If Pr{X(∞) > 100} = 10%, it may not be acceptable
because I will be losing 10% of customers. How can we actually compute this
number?

Pr{X(∞) > 100} =1 − Pr{X(∞) ≤ 100}


=1 − [Pr{X(∞) = 0} + Pr{X(∞) = 1} + Pr{X(∞) = 2} + . . . ]

You can also compute how many servers you will need to limit the loss proba-
bility to a certain level. Say you want to limit the loss probability to 2%. Solve
the following equation to obtain c.

Pr{X(∞) > c} = 0.02

This is an approximate way to model a many server situation. We will go


further into the model with finite number of servers.

5.4.3 M/M/k Queue


In this case, we have only k servers instead of infinite number. Assume that the
butter size is infinite. Then, the rate diagram will look like the following.

λ λ λ

k−2 k−1 k k+1

(k − 1)µ kµ kµ

Solving the following system of linear equations,

λ
π1 = π0
µ
 2
λ 1
π2 = π0
µ 2!
..
.
 k
λ 1
πk = π0 ,
µ k!

Let ρ = λ/(kµ). Then,

1
π0 =  2  3  k−1  k .
λ 1 λ 1 λ 1 λ 1 λ 1
1+ µ + 2! µ + 3! µ + ··· + (k−1)! µ + k! µ 1−ρ
5.5. OPEN JACKSON NETWORK 107

The probability that a customer will wait when they arrive is


 k
1 1 λ 1
Pr(X(∞) ≥ k) = πk = π0 .
1−ρ 1−ρ µ k!
As a manager managing this call center, you want to keep it reasonably low.
Remember that the average waiting time in M/M/1 queue is
 
ρ
E[W ] = m = mw.
1−ρ
Let us call w waiting time factor. This factor is highly nonlinear. Don’t try
to push up the utilization when you are already fed up. Conversely, a little
more carpooling can dramatically improve traffic condition. What is it for this
M/M/k case? Let us first compute the average queue size.
E[Q] =1πk+1 + 2πk+2 + 3πk+3 + . . .
=1ρπk + 2ρ2 πk + 3ρ3 πk + . . .
ρ
=πk
(1 − ρ)2
Using Little’s Law, we can compute average waiting time.
1 ρ
E[W ] = πk
λ (1 − ρ)2
If you use many servers parallel, I can achieve both quality and efficient
service. If the world is so deterministic that arrival times and service times are
both deterministic, you can achieve quality service of high utilization even with
a single server. For example, every two minutes a customer arrives and service
time is slightly less than 2 minutes. There will be no waiting even if you have
only one server. In reality, it is full of uncertainty. In this case, having a large
system will make your system more robust to uncertainty.
Sometimes, you will hear automated message when you call to a call center
saying that there are 10 customers are ahead of you. This information is very
misleading. Your actual waiting time will heavily depend on how many servers
the call center hired.
You can analyze M/M/k/b queue in similar way. The only difference is to
use 1 + ρ + ρ2 + · · · + ρb instead of 1 + ρ + ρ2 + . . . .

5.5 Open Jackson Network


5.5.1 M/M/1 Queue Review
Before going into Jackson Network, let us review M/M/1 queue first. In an
M/M/1 queue, suppose the arrival rate α = 2 jobs per minute. The mean
processing time m = 0.45 minutes. The traffic intensity is computed as follows.
ρ = αm = 2(0.45) = 0.9
108 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

Let X(t) be the number of jobs in system at time t. The stationary distri-
bution is

πn =Pr{X(∞) = n} = (1 − ρ)ρn , n = 0, 1, 2, 3, . . .
2 2
π2 =(1 − ρ)ρ = (1 − 0.9)0.9 = 0.081 = 8.1%.

5.5.2 Tandem Queue


Now let us extend our model. Suppose two queues are connected in tandem
meaning that the queues are in series. Jobs are still arriving at rate α = 2
jobs per minute. The first queue’s mean processing time is m1 = 0.45 minutes
and the second queue’s mean processing time is m2 = 0.40 minutes. Define
X(t) = (X1 (t), X2 (t)) where Xi (t) is the number of jobs at station i at time t.
This model is much closer to the situation described in the book, “The Goal”.
If you draw the rate diagram of this CTMC, one part of the diagram will look
like the following.

(3, 3)

µ2
(2, 3) (2, 2)

µ1

(1, 4)

This CTMC may have a stationary distribution. How can you compute the
following?

Pr{X(∞) = (2, 3)} =Pr{X1 (∞) = 2, X2 (∞) = 3}


=Pr{X1 (∞) = 2}Pr{X2 (∞) = 3}
∵ We can just assume the independence
because each queue does not seem to affect each other.
=(1 − ρ1 )ρ21 (1 − ρ2 )ρ32

where ρ1 = αm1 , ρ2 = αm2 . When determining ρ2 , you may be tempted to set


it as m2 /m1 . However, if you think about it, the output rate of the first station
is α not 1/m1 because ρ1 < 1. If ρ1 ≥ 1, we don’t have to care about the steady
state of the system since the first station will explode in the long run.
5.5. OPEN JACKSON NETWORK 109

Let us summarize. Let ρi = αmi for i = 1, 2 be the traffic intensity at station


i. Assume ρ1 < 1, ρ2 < 1. Then,
Pr{X(∞) = (n1 , n2 )} =(1 − ρ1 )ρn1 1 (1 − ρ2 )ρn2 2 , n1 , n2 = 0, 1, 2, . . .
π(n1 ,n2 ) =(1 − ρ1 )ρn1 1 (1 − ρ2 )ρn2 2 .
Suppose I want to claim π = (π(n1 ,n2 ) , n1 , n2 = 0, 1, 2, . . . ) is indeed the station-
ary distribution of the tandem queue. What should I do? I just need to verify
that this distribution satisfies the following balance equation.
(α + µ1 + µ2 )π(2, 3) = απ(1, 3) + µ2 π(2, 4) + µ1 π(3, 2)
Since this chain is irreducible, if one distribution satisfies the balance equation,
it is the only one satisfying it, which is my unique stationary distribution. You
may think this looks like cheating. There can be two ways to find the stationary
distribution: one is to solve the equation πP = π directly, the other is guess
something first and just verify it is the one we are looking for. If possible, we can
take the easy way. If you can remember M/M/1 queue’s stationary distribution,
you should be able to compute the stationary distribution of this tandem queue.

5.5.3 Failure Inspection


Returning to Open Jackson Network, “Open” means that arriving customers
will eventually leave the system.
Let us extend our model once again. Every setting remains same except
that there is inspection at the end of the second station. If a job is found to
be defective, the job will go back to the queue of station 1 and get reprocessed.
The chance of not being defective is 50%. Even in this case, the stationary
distribution has the same form to the previous case.
Pr{X1 (∞) = 2, X2 (∞) = 3} =Pr{X1 (∞) = 2}Pr{X2 (∞) = 3}
=(1 − ρ1 )ρ21 (1 − ρ2 )ρ32
The question is how to set ρ1 , ρ2 . Let us define a new set of variables. Denote
the throughput from station i by λi . It is reasonable to assume that λ1 = λ2
for a stable system. Then, because of the feedback,
λ1 = α + 0.5λ2 ⇒ λ1 = α + 0.5λ1 ⇒ λ1 = 2α = 2.

1 2
Station 1 Station 2 0.5 2

0.5 2
110 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

Therefore,

ρ1 =λ1 m1 = 2(0.45) = 90%


ρ2 =λ2 m2 = 2(0.40) = 80%.

This is called the traffic equation. How can we compute the average number of
jobs in the system? Recall that the average number of jobs in M/M/1 queue is
ρ/(1 − ρ). Then,

average number of jobs in the system


=average # of jobs in station 1 + average # of jobs in station 2
ρ1 ρ2 0.9 0.8
= + = + = 9 + 4 = 13 jobs.
1 − ρ1 1 − ρ2 1 − 0.9 1 − 0.8

Still, we did not prove that (1 − ρ1 )ρn1 1 (1 − ρ2 )ρn2 2 is indeed the stationary
distribution. The only thing we need to prove is that this distribution satisfies
the balance equation. At first glance, the two stations seem to be quite inter-
related, but in steady-state, the two are in fact independent. It was found by
Jackson at the end of 50s and published in Operations Research. In theory,
the independence holds for any number of stations and each station can have
multiple servers. It is very general result.
What is the average time in system per job? Use Little’s Law, L = λW .

L = λW ⇒ 13 = 1W ⇒ W = 13 minutes.

It could be 13 hours, days, weeks. It is lead time that you can quote to your
potential customer. How can we reduce the lead time? We can definitely hire or
buy more servers or machines. Another thing you can do is to lower the arrival
rate, in reality term, it means that you should reject some orders. It may be
painful or even impossible. Suppose we reduce the failure rate from 50% to
40%. Then,

5 3 5 2
ρ1 = (0.45) = (0.15)5 = 0.75 = , ρ2 = (0.4) =
3 4 3 3
ρ1 ρ2 3/4 2/3
W = + = + = 3 + 2 = 5 minutes.
1 − ρ1 1 − ρ2 1/4 1/3

Just with 10% point decrease, the lead time dropped more than a half. This
is because of the nonlinearity of ρ/(1 − ρ). What if we have 55% failure than
50%? It will become much worse system. W = 30 minutes. You must now be
convinced that you have to manage the bottleneck. You cannot load up your
bottleneck which is already full.
Before finishing up, let us try out more complicated model. There are two
inspections at the end of station 2 and 3. Failure rates are 30% and 20%
5.6. SIMULATION 111

respectively. Then,

λ2 =λ1 + 0.2λ3
λ1 =α + 0.3λ2
λ3 =0.7λ2 .

Now we can compute ρ1 = λ1 m1 , ρ2 = λ2 m2 , ρ3 = λ3 m3 . The stationary


distribution also has the same form.

π(n1 , n2 , n3 ) = (1 − ρ1 )ρn1 1 (1 − ρ2 )ρn2 2 (1 − ρ3 )ρn3 3

Remember that it has been infinite queue size.

5.6 Simulation
The key difference between DTMC and CTMC comes from the use of generator
matrix. Let us see how we can obtain transition probability matrix between
time 0 and t from the generator matrix. Suppose we have to following generator
matrix.
 
−2 1 1
G =  2 −5 3 
2 2 −4

Let us compute P (0.5) = e0.5G and P (5) = e5G from G where eG is the matrix
exponential of G.

Listing 5.1: Use of Generator Matrix

library(expm)

# Set up the generator matrix


G = matrix(c(-2,1,1, 2,-5,3, 2,2,-4), 3,3,byrow=TRUE)
G

# The following matrices are P(0.5) and P(5)


expm(0.5*G)
expm(5*G)

5.7 Exercise
1. Consider a CTMC X = {X(t), t ≥ 0} on S = {A, B, C} with generator G
given by  
−12 4 8
G= 5 −6 1 
2 0 −2
(a) Draw the rate diagram.
112 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

(b) Use a computer software like matlab or a good calculator to directly


compute the transition probability matrix P (t) at t = 0.20 minutes.
(In matlab, the command to exponentiate a matrix A is expm(A).)
(c) Do the previous part for t = 1.0 minute.
(d) Using the results from parts (b) and (c), but without using a software
package or calculator, find P {X(1.2) = C|X(0) = A} and P {X(3) =
A|X(1) = B}.
(e) Do part (b) for t = 5 minutes. What phenomenon have you observed?

2. Customers arrive at a two-server system according to a Poisson process


with rate λ = 10 per hour. An arrival finding server 1 free will begin his
service with him. An arrival finding server 1 busy, server 2 free will join
server 2. An arrival finding both servers busy goes away. Once a customer
is served by either server, he departs the system. The service times at both
servers are exponential random variables. Assume that the service rate of
the first server is 6 per hour and the service rate of the second sever is 4
per hour.

(a) Describe a continuous time Markov chain to model the system and
give the rate transition diagram.
(b) Find the stationary distribution of the continuous time Markov chain.
(c) What is the long-run fraction of time that server i is busy, i = 1, 2?

3. Suppose we have a small call center staffed by two operators A and B


handling three telephone lines. A only handles line 1, and B only handles
only line 2 and 3. Calls arrive according to a Poisson process with rate
λ = 100 calls per hour. All arrivals prefer line 1. Arrivals find all lines are
busy will go away. Service times are exponentially distributed with a mean
of 4 minutes. Customers are willing to wait an exponentially distributed
length of time with a mean of 8 minutes before reneging if service has not
begun.
Describe a continuous time Markov chain to model the system and give
the rate transition diagram and the generator G.

4. Consider a call center that is staffed by K agents with three phone lines.
Call arrivals follow a Poisson process with rate 2 per minute. An arrival
call that finds all lines busy is lost. Call processing times are exponentially
distributed with mean 1 minute. An arrival call that finds both agents
busy will wait in the third phone line untill get service.

(a) Find the throughput and average waiting time when K = 1.


(b) Find the throughput and average waiting time when K = 2.
(c) Find the throughput and average waiting time when K = 3.
5.7. EXERCISE 113

5. Consider a production system consisting of three single-server stations


in series. Customer orders arrive at the system according to a Poisson
process with rate 1 per hour. Each customer order immediate triggers a
job that is released to the production system to be processed at station
1 first, and then at station 2. After being processed at station 2, a job
has p1 = 10% probability going back to station 1 for rework and 1 − p1
probability continuing onto station 3. After being processed at station 3, a
job has p2 = 5% probability going back to station 1, p3 = 10% probability
going back to station 2, and 1 − p2 − p3 probability leaving the production
system as a finished product. Assume that the processing times of jobs
at each station are iid, having exponential distribution, regardless of the
history of the jobs. The average processing times at stations 1, 2 and 3
are m1 = 0.8, m2 = 0.70 and m3 = 0.8 hours, respectively.
(a) Find the long-run fraction of time that there are 2 jobs at station 1,
1 job at station 2 and 4 jobs at station 3.
(b) Find the long-run average (system) size at station 3.
(c) Find the long-run average time in system for each job.
(d) Reduce p1 to 5%. Answer 1(c) again. What story can you tell?
6. Consider the production system in the previous problem with following
modification: p1 = 10%, p2 = p3 = 0 (station 3 makes 100% reliable
operations.) Furthermore, the Management decides to adopt the make-to-
stock policy, using the CONWIP job release policy. Recall that when the
system is operating CONWIP policy only a job leaving station 3 triggers
a new job to be released to station 1. Let N be the CONWIP level.
(a) For N = 2, compute the throughput of the production system. What
is the average time in system per job?
(b) Is it possible to double the throughput? If so, what N is needed to
achieve it? What is the corresponding average time in system per
job?
7. Consider a production system that consists of three single-server stations.
Each station has an infinite size waiting buffer. When a job to a station
finds the server busy, the job waits in the buffer; when the server is free,
the server picks the next job from the buffer. Assume that jobs arrive
at station 1 following a Poisson process with rate α = 1 job per minute.
After being processed at station 1, a job is sent to station 2 without any
delay. After being processed at station 2, it will be sent to station 3 with
probability 50% or to station 1 with probability 50% to be reworked. After
being processed at station 3, it is shipped to a customer with probability
50% or is sent to station 2 with probability 50% to be reworked. The job
processing times at station i are iid exponentially distributed with mean
mi minutes, i = 1, 2, 3. Assume that
m1 = .3 minutes, m2 = .2 minutes, m3 = .45 minutes.
114 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

(a) What is long-run average utilization for station i, i = 1, 2, 3.


(b) What is the long-run fraction of time that station 1 has 2 jobs, station
2 has 3 jobs, and station 3 has one job? (Leaving your answer in a
numerical expression is OK. )
(c) What is the long-run average number of jobs at station 2, including
those in the buffer and possibly the one being served?
(d) What is the throughput of the system? Here, throughput is the rate
at which the jobs are being shipped to customers.
(e) What is the long-run average time in system per job?

8. Let X = {X(t) : t ≥ 0} be a continuous time Markov chain with state


space {1, 2, 3} with generator
 
−4 ? 1
G =  1 ? 2 .
1 ? −2

Using matlab, one can compute eG via command


 
0.20539 0.36211 0.43250
expm(G) = 0.19865 0.35727 0.44407 .
0.19865 0.33896 0.46239

(a) Fill in entries in G.


(b) Find Pr{X(2) = 3|X(1) = 2}.
(c) Find Pr{X(1) = 2, X(3) = 3|X(0) = 2} (Leaving answer in a nu-
merical expression is OK.)
(d) Find the stationary distribution of the Markov chain.
(e) Find Pr{X(200) = 3|X(1) = 2}.
9. A call center has 4 phone lines and two agents. Call arrival to the call
center follows a Poisson process with rate 2 calls per minute. Calls that
receive a busy signal are lost. The processing times are iid exponentially
distributed with mean 1 minute.

(a) Model the system by a continuous time Markov chain. Specify the
state space. Clearly describe the meaning of each state. Draw the
rate diagram.
(b) What is the long-run fraction of time that there are two calls in the
system?
(c) What is the long-run average number of waiting calls, excluding those
in service, in the system?
(d) What is the throughput (the rate at which completed calls leaves the
call center) of the call center?
5.7. EXERCISE 115

(e) What is the average waiting time of an accepted call?

10. A production line consists of two single-machine stations, working in serial.


Each machine has an infinite buffer waiting area. Jobs arrival at the
production line follows a Poisson process with rate 2 jobs per minute. Each
job is first processed at station 1; afterwards, it is processed at station 2.
After a job is processed at station 2, with 12% probability the job is sent
back to station 1 for rework, and with 88% probability the job is shipped to
customers, regardless of the job’s history. When the machine at a station
is busy, the arriving job at the station is waiting in its buffer. Assume
that processing times at each machine are iid, exponentially distributed
with mean .4 minutes.
(a) Find the long-run average utilization of machine 1 and machine 2.
(b) What is the long-run fraction of time that there 2 jobs at station 1
and 3 jobs at station 2? you may leave your answer in an expression.
(c) What is the long-run average number of jobs at station 2?
(d) What is the average time in system per job?
11. A production system has three machines working in parallel. The up times
of a machine is assumed to be iid, exponentially distributed with mean
3 hours. When a machine is down, its repair times are iid, exponentially
distributed with mean 1 hour. There are two repairmen, working at the
same speed.

(a) Using a CTMC to find the long-run fraction of time that all machines
are up and running; you need to specify the meaning of each state;
draw the rate diagram; spell out necessary equations that are used
in your calculations.
(b) When a machine is up and running, it processes customer orders at
rate 4 orders per minute. Assume the customer orders are backlogged
in the next month. What is the average production rate per minute
of the production system?

12. Suppose you are running a factory having three stations. Parts come
into Station 1 according to a Poisson process with rate 720 per hour.
After being processed by Station 1, the parts move to Station 2 with
probability 1/3. With 2/3 probability, the parts move to Station 3. After
being processed by Station 2, the parts go back to Station 1. After being
processed by Station 3, the parts immediately leave the system. The
average processing times of three stations are exponential with mean m1 =
2 seconds, m2 = 3 seconds, m3 = 4 seconds. Each station has a single
server and infinite waiting space.
(a) What is the utilization of Station 1?
(b) What is the long-run fraction of time that the entire factory is empty?
116 CHAPTER 5. CONTINUOUS TIME MARKOV CHAIN

(c) What is the total average number of parts in the entire factory?
(d) What is the total average “waiting” time spent in the factory per
part?
(e) What is the throughput of the entire factory?

13. Suppose you are running a laundry shop at the Tech Square. The shop is
open 24 hours a day 7 days a week. You have virtually infinite number of
laundry machines so that each student can start his/her laundry as soon
as he/she arrives at your shop. Each machine can finish any amount of
laundry in exponential time with mean 30 minutes. Students come to your
shop according to a Poisson process with rate 6 per hour. Each student
waits in your shop until his/her laundry is done. You are trying to model
this shop using CTMC. Let X(t) denote the number of students in your
shop.
(a) Which of the following is most likely the right model for this problem?
Choose one from “M/M/1”, “M/M/1/∞”, “M/M/∞”, “M/M/∞/1”.
(b) What is the state space of X(t)? Draw the rate diagram of X(t).
(c) Compute the stationary distribution of X(t).
(d) What is the long-run fraction of time that at least 1 student is in the
shop?
(e) If there is at least 1 student is in the shop, you make $100 per hour
on average. On the other hand, when the shop is empty, you lose $50
per hour on average. What is long-run average profit per day?
Chapter 6

Exercise Answers

6.1 Newsvendor Problem


1. If D > y, (D ∧ y) + (y − D)+ = y + 0 = y. If D < y, (D ∧ y) + (y − D)+ =
D + (y − D) = y.

2. (a) min(D, 7) has the pmf given by



 1/10 if k = 5
3/10 if k = 6

P (min(D, 7) = k) =

 6/10 if k = 7
0 otherwise

Hence,

1 3 6
E[min(D, 7)] = ∗5+ ∗6+ ∗ 7 = 6.5.
10 10 10

(b) (7 − D)+ has the pmf given by




 1/10 if k = 2
3/10 if k = 1

+
P ((7 − D) = k) =
 6/10
 if k = 0
0 otherwise

Hence,

1 3
E[(7 − D)+ ] = ∗2+ ∗ 1 = 0.5.
10 10

117
118 CHAPTER 6. EXERCISE ANSWERS

3. (a) Let X = min(D, 2). Note that X can only take the value of 0, 1 or 2.

Pr{X = 0} =Pr{D = 0} = e−3


Pr{X = 1} =Pr{D = 0} = 3e−3
Pr{X = 2} =Pr{D = 2} + Pr{D = 3} + Pr{D = 4} + · · ·
=1 − Pr{D = 0} − Pr{D = 1} = 1 − 4e−3

Therefore,

E[X] =0Pr{X = 0} + 1Pr{X = 1} + 2Pr{X = 2}


=3e−3 + 2(1 − 4e−3 ) = 2 − 5e−3 .

(b) Let X = (3 − D)+ . Note that X can only take the value of 0, 1, 2, 3.

Pr{X = 0} =Pr{D = 3} + Pr{D = 4} + Pr{D = 5} + · · ·


=1 − Pr{D = 0} − Pr{D = 1} − Pr{D = 2} = 1 − e−3 − 3e−3 − 4.5e−3
Pr{X = 1} =Pr{D = 2} = 4.5e−3
Pr{X = 2} =Pr{D = 1} = 3e−3
Pr{X = 3} =Pr{D = 0} = e−3

Therefore,

E[X] =0Pr{X = 0} + 1PrPr{X = 1} + 2Pr{X = 2} + 3Pr{X = 3}


=4.5e−3 + 6e−3 + 3e−3 = 13.5e−3 .

4. (a)

10
s s2 10 24 36
Z
E[max(D, 8)] = 8 ∗ P (D ≤ 8) + ds = 8 ∗ 3/5 + | = + = 8.4.
8 5 10 8 5 10

(b)

8
8−s 8∗3 s2 8 24 39
Z
E[(8 − D)+ ] = ds = − | = − = 0.9.
5 5 5 10 5 5 10
6.1. NEWSVENDOR PROBLEM 119

5. (a) Let X = max(D, 3).


Z ∞
E[X] = max(x, 3)fD (x)dx
0
Z 3 Z ∞
= max(x, 3)fD (x)dx + max(x, 3)fD (x)dx
0 3
Z 3 Z ∞
= 3fD (x)dx + xfD (x)dx
0 3
Z 3 Z ∞
= (3)7e−7x dx + (x)7e−7x fD (x)dx
0 3
 Z ∞ 
−7x 3 −7x ∞ −7x

= −3e 0
+ −xe 3
− −e dx
3
 ∞ 
 1
= −3e−21 − (−3) + 0 − (−3e−21 ) − e−7x

7 3
−21
 
1 −21 e
=3 − 0 − e =3+
7 7
(b) Let X = (D − 4)− .
Z ∞
E[X] = (x − 4)− fD (x)dx
0
Z 4 Z ∞
= (x − 4)− fD (x)dx + (x − 4)− fD (x)dx
0 4
Z 4 Z ∞
= (x − 4)fD (x)dx + 0fD (x)dx
0 4
Z 4
= (x − 4)7e−7x dx
0
4
Z 4
= −(x − 4)e−7x 0 − −e−7x dx
0
  1 4
−7(0) −7x

= 0 − (−(0 − 4)e ) − e
7
0
 
1 −28 1
=−4− e −
7 7
1 −28 1 27 1
=−4− e + = − − e−28
7 7 7 7
6. (a) p = 30, cv = 10, sv = 5. From the newsvendor class notes
Profit(7) = (p − cv ) min{D, 7} − (cv − sv )(7 − D)+ .
Hence
E[Profit(7)] = 20E[min{D, 7}] − 5E[(7 − D)+ ] = 127.5
using the values found in question 1.
120 CHAPTER 6. EXERCISE ANSWERS

(b) Similar to the previous question

E[Profit(7)] = 20E[min{D, 7}] − 5E[(7 − D)+ ] = 130

the way to calculate is the same as in question 2.


(c) If D has an exponential distribution with mean 7 then the pdf of D
is given by

1 −1x
fD (x) = e 7 .
7
As in the last 2 questions we need to find E[min{D, 7}] and E[(7 −
D)+ ].
Z 7
s −s/7
E[min(D, 7)] = 7 ∗ P (D ≥ 7) + e ds = 7e−1 + (7 − 14e−1 ) ≈ 4.43
0 7

7 7
7 − s −s/7 s −s/7
Z Z
+
E[(7 − D) ] = e ds = 7 − e ds = 7 − (7 − 7e−1 ) ≈ 2.575
0 7 0 7

E[Profit(7)] = 20E[min{D, 7}] − 5E[(7 − D)+ ] = 75.725.

7. Data: p = 18, cv = 3, sv = 1, and the distribution of demand is:


D=d 10 11 12 13 14
1 2 4 2 1
P {D = d} 10 10 10 10 10

(a) E(D)= 12
(b) Since E[min{D, 11}] = 10.9 and E[max{11 − D, 0}]= 0.1 it follows
that expected profit under the policy q = 11 is:
g(11)= 15E[min{D, 11}] - 2E[max{11 − D, 0}]= 163.
p−cv 15
(c) The smallest q satisfying F (q) ≥ p−sv = 17 is q = 13.
(d) From tables of the Gaussian (i.e., normal) distribution or by using a
15
calculator: F (q) = P {D < q} = 17 ⇒ q = 1016.78

8. Data: p = 20, cv = 10, and the demand distribution is:


b 15 16 17 18 19 20
1 4 6 5 2 2
P {D = d} 20 20 20 20 20 20
The appropriate decision rule is to choose the smallest q such that F (q) ≥
p−cv p−cv
p = 21 . So for q = 17 we have 11 1
20 > 2 . For q < 17 F (q) < p = 12 .

9. (a) Demand distribution is as follows.

d 10 11 12 13 14 15
1 1 1 1 1 1
Pr{D = d} 6 6 6 6 6 6
6.1. NEWSVENDOR PROBLEM 121

Unit understock cost cu is p+b = 650, unit overstock cost co is h = 25


and variable purchasing cost cv is 480.

E[Understock Cost] =cu E[(D − 12)+ ] = 650(1) = $650


E[Overstock Cost] =co E[(12 − D)+ ] = 25(0.5) = $12.5
E[Total Cost] =$662.5

(b) the optimal order-up-to quantity is the smallest y such that

p + b − cv 170
FD (y) ≥ = .
p+b+h 675
Test from the smallest possible D value.
1 170
FD (10) = <
6 650
2 170
FD (11) = ≥
6 650
Therefore, the optimal order-up-to quantity is 11.
(c) Look up the z-table to find z that makes the left-hand side tail area
equal to 170/650.
0.4

0.3
PDF

0.2

Area = 0.2518
0.1

0
-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3

The corresponding z ∗ value is −0.669 and we multiply the standard


deviation of the demand and add the mean of the demand to find
the corresponding y ∗ value in the demand curve.

y ∗ = σz ∗ + µ = (100)(−0.669) + 1000 = 933

Hence, the optimal order-up-to quantity in this case is 933 units.


10. (a) The optimal order-up-to quantity is the smallest y such that
cu − cv 100 − 40 6
FD (y) ≥ = = .
cu + co 100 + 10 11
122 CHAPTER 6. EXERCISE ANSWERS

Test from the smallest possible D value.

1 6
FD (500) = <
8 11
5 6
FD (600) = ≥
8 11
Therefore, the optimal order-up-to quantity y ∗ is 600 liters.
(b) If we decide to order, we will order up to 600 liters. Now, we need
to set up the guideline that tells us when to order. Suppose we have
m inventory currently. Evaluate the two scenarios: ordering and not
ordering
• Ordering: If we decide to order, first of all, we will place an
order of (600 − m) liters. We need to pay the fixed ordering cost
and the variable purchasing cost for the order. We also have
to expect the costs from shortage and left-over situations. Note
that we will start the next month with 600 liters because we will
order up to 600 liters. Hence,

E[Cost] =cf + cv (600 − m) + cu E[(D − 600)+ ] + co E[(600 − D)+ ]


=1000 + 40(600 − m) + 100(50) + 10(12.5) = 30125 − 40m

• Not Ordering: If we decide not to order, we don’t have to pay


the ordering cost. We do have to expect shortage and left-over
costs. Now assume that 500 ≤ m ≤ 600. (We will check if our
m∗ will be indeed in this range, [500, 600].) Note that we will
start the next month with m liters because we will not order in
this scenario.

E[Cost] =cu E[(D − m)+ ] + co E[(m − D)+ ]


   
600 − m 700 − m 800 − m m − 500
=100 + + + 10
2 4 8 8
   
4600 − 7m m − 500
=100 + 10
8 8
460000 − 700m + 10m − 5000 455000 − 690m
= =
8 8
When we have exactly m∗ liters, we should be indifferent between
ordering and not ordering. Hence,

455000 − 690m∗
30125 − 40m∗ = ⇒ m∗ = 578.38 liters.
8
m∗ is indeed in the [500, 600] range. It means that if our inventory
is below 578.38 liters, we will order up to 600 liters. If our inventory
is above 578.38 liters, we will not order for the next month.
6.1. NEWSVENDOR PROBLEM 123

(c) If m = 0, since m = 0 < m∗ , we will order 600 liters. Then,

E[Cost] =cf + cv (600) + cu E[(D − 600)+ ] + co E[(600 − D)+ ]


=1000 + 40(600) + 100(50) + 10(12.5) = $30125.

On the other hand, if we have m = 700 liters, we will not order


because m = 700 > m∗ . Thus,

E[Cost] =cu E[(D − 700)+ ] + co E[(700 − D)+ ]


   
100 200 100
=100 + 10 +
8 8 2
=100(12.5) + 10(75) = $2000.

11. (a) The optimal order-up-to quantity is the smallest y such that

cu − cv 100 − 40 6
FD (y) ≥ = = .
cu + co 100 + 10 11

In this case, look up the cdf graph of the continuous uniform distri-
bution of D.
1
0.9
0.8
0.7
0.6 6/11
CDF

0.5
0.4
0.3
0.2
0.1
618.18
0
0 100 200 300 400 500 600 700 800 900 1000

Hence, the optimal order-up-to quantity y ∗ is 618.18 liters.


(b) Suppose we have m liters of inventory currently. We have two sce-
narios again.
• Ordering:

E[Cost] =cf + cv (y ∗ − m) + cu E[(D − y ∗ )+ ] + co E[(y ∗ − D)+ ]


Z 800  Z 618.18 
x − 618.18 618.18 − x
=1000 + 40(618.18 − m) + 100 dx + 10 dx
618.18 400 400 400
=30454.5 − 40m
124 CHAPTER 6. EXERCISE ANSWERS

• Not Ordering: Assume that m is somewhere between 400 and


618.18 liters.

E[Cost] =cu E[(D − m)+ ] + co E[(m − D)+ ]


Z 800  Z m 
x−m m−x
=100 dx + 10 dx
m 400 400 400
2 2
m(m − 400) − 0.5(m2 − 4002 )
   
0.5(800 − m ) − m(800 − m)
=100 + 10
400 400
2 2
0.5m − 800m + 320000 0.5m − 400m + 80000
= +
4 40
5.5m2 − 8400m + 3280000
=
40
When we have exactly m∗ liters, we should be indifferent between
ordering and not ordering. Hence,

5.5m2 − 8400m + 3280000


30454.5 − 40m∗ = ⇒ m∗ = 703.46 or 532.90 liters.
40
Since we know that m∗ < y ∗ = 618.18, the true critical value is
m∗ = 532.90 liters. It means that if our inventory is below 532.90
liters, we will order up to 618.18 liters. If our inventory is above
532.90 liters, we will not order for the next month.
(c) If m = 0, since m = 0 < m∗ , we will order 618.18 liters. Then,

E[Cost] =cf + cv (618.18) + cu E[(D − 618.18)+ ] + co E[(618.18 − D)+ ] = $30454.5.

On the other hand, if we have m = 700 liters, we will not order


because m = 700 > m∗ . Thus,

E[Cost] =cu E[(D − 700)+ ] + co E[(700 − D)+ ]


Z 800  Z 700 
x − 700 700 − x
=100 dx + 10 dx
700 400 400 400
=100(12.5) + 10(112.5) = $2375.

12. (a) Using the notation in Section 1.4 of the newsvendor notes: cv = 10,
sv = 8, and p = 90. Then
p − cv 90 − 10
F (qA ) = = = 0.97561.
p − sv 90 − 8

The corresponding standard normal value is 1.97.


Hence,

qA = 1.97 ∗ 500, 000 + 1, 500, 000 = 2, 485, 000.


6.1. NEWSVENDOR PROBLEM 125

(b) As done above, the critical value is 1.97. Hence the optimal order up
to quantity is

qB = 1.97 ∗ 100, 000 + 500, 000 = 697, 000.

(c) Let FC (x) be the distribution function of the demand for part C.
To satisfy the same fraction of demand as A and B we need find the
quantity qc such that

FC (qC ) = FA (qA ) = FB (qB ) = 0.97561.

Now part C is normally distributed with mean as sum of means of A


and B and variance as sum of variance of A and B. This implies
p
qC = 1.97 ∗ ( 500, 0002 + 100, 0002 ) + 2, 000, 000 = 3, 004, 506.844.

(d) To compute the expected cost to produce part A and B we use the
formula given in the newsvendor notes.

E(Cost(A)) = cv qA − sv E((qA − DA )+ ) + pE((DA − qA )+ )


E(Cost(B)) = cv qB − sv E((qB − DB )+ ) + pE((DB − qB )+ ).

Recall that Fi (x) denotes the cumulative distribution function of the


demand for part i for i = A, B, C. Hence Fi (x) is the c.d.f. of a
normal distribution with the mean and variance given in the question
for each part type.
First we compute the expected overstock quantity for A:

qA
−(x − µA )2
 
q −x
Z
+
E((qA − DA ) ) = pA exp 2 dx
2
2πσA 2σA
0
Z qA
−(x − µA )2
 
x
= qA (FA (qA ) − FA (0)) − p
2
exp 2 dx
0 2πσA 2σA
Z qA
−(x − µA )2
 
x − µA
= (qA − µA )(FA (qA ) − FA (0)) − p
2
exp 2 dx
0 2πσA 2σA
Z (qA −µA )2  
1 t
= (qA − µA )(FA (qA ) − FA (0)) − p
2
exp − 2 dt
µ2A 2 2πσA 2σA
" r
2
 ! (qA −µA )2 #
σA t
= (qA − µA )(FA (qA ) − FA (0)) + exp − 2
2π 2σA 2
µ
A
126 CHAPTER 6. EXERCISE ANSWERS

Next we compute the expected understock quantity for A:


(x − µA )2
 
x − qA
Z
+
E((DA − qA ) ) = p
2
exp − 2 dx
qA 2πσA 2σA
Z ∞
(x − µA )2
 
x
= −qA (1 − FA (qA )) + p
2
exp − 2 dx
qA 2πσA 2σA
Z ∞
(x − µA )2
 
x − µA
= (µA − qA )(1 − FA (qA )) + p
2
exp − 2 dx
qA 2πσA 2σA
Z ∞  
1 t
= (µA − qA )(1 − FA (qA )) + p
2
exp − 2 dx
(qA −µA )2 2 2πσA 2σA
" r ! ∞ #
2

σA t
= (µA − qA )(1 − FA (qA )) − exp − 2
2π 2σA
(qA −µA )2

The same holds for B and C, i.e.;

2
" r  ! (qB −µB )2 #
σ t
E((qB − DB )+ ) = (qB − µB )(FB (qB ) − FB (0)) + B

exp − 2
2π 2σB 2
µB
" r ! 2#
2 C −µC )
(q
 
+ σC t
E((qC − DC ) ) = (qC − µC )(FC (qC ) − FC (0)) + exp − 2
2π 2σC 2
µC
" r ! ∞ #
2

+ σB t
E((DB − qB ) ) = (µB − qB )(1 − FB (qB )) − exp − 2
2π 2σB
(qB −µB )2
" r ! #
2
  ∞
+ σC t
E((DC − qC ) ) = (µC − qC )(1 − FC (qC )) − exp − 2 .
2π 2σC
(qC −µC )2

New we plug all the numbers. To illustrate (Below, Φ denotes the


c.d.f. of the standard normal distribution.)

E((qA − DA )+ ) = 985, 000 ∗ (0.97561 − Φ(−3))


(985, 000)2 (1, 500, 000)2
     
500, 000
+ √ exp − − exp −
2π 2 ∗ 500, 0002 2 ∗ 500, 0002
= 986, 065

(985, 000)2
   
500, 000
E((DA − qA )+ ) = −985, 000 ∗ (1 − 0.97561) + √ exp −
2π 2 ∗ 500, 0002
= 4, 640.47.
6.1. NEWSVENDOR PROBLEM 127

Similarly,

E((qB − DB )+ ) = 197, 911


E((DB − qB )+ ) = 927.363
E((qC − DC )+ ) = 1, 009, 238
E((DC − qC )+ ) = 4, 732.

Now,

E(Cost(A)) = 10 ∗ 2, 485, 000 − 8 ∗ 986, 065 + 90 ∗ 4640.47 = 173, 791$

Similarly

E(Cost(B)) = 54, 701$


E(Cost(C)) = 344, 150$

It costs less to produce A and B instead of C. And the company


would lose 344, 150 − 54, 701 − 173, 791 = 115, 658$.
(e) The optimal amount of C to produce is found by solving
90 − 14
F (qC ) = = 0.9268
90 − 8
The corresponding standard normal value is 1.45.


qC = 1.45 ∗ 509, 901 + 2, 000, 000 = 2, 739, 356.

Here mean and variance are found by adding the mean and variances
for A and B.
(f) This is exactly similar to part (d) above except that we have to
compute costs by plugging in the optimal quantity ordered for C.
E(Cost(C)) = 338, 352. It is more profitable to produce A and B.
13. Note that
100 − 50 5
= .
100 − 10 9
Therefore, the optimal order up to quantity is 300.
Assume the department makes a production run. The total expected cost
is

2000 + 100(50) + 100E(X − 300)+ − 10E(300 − X)+


= 2000 + 5000 + 100(100)(.2) − 10[200(.1) + 100(.2)]
= 2000 + 5000 + 2000 − 400
= 2000 + 5000 + 1600
= 8600.
128 CHAPTER 6. EXERCISE ANSWERS

Assume the department does not make the production. The expected
total cost is

100E(X − 200)+ − 10E(200 − X)+


= 100[100(.5) + 200(.2)] − 10[100(.1)]
= 8900.

Therefore, it is cheaper for the company to make the production run. It


should produce 100 parts.

14. (a) Note that


10 − 5 5
= .
10 − 2 8
Thus, one should order x, where x satisfies
x − 50 5
=
150 − 50 8
or x = 112.5 gallons.
(b)

E[Profit] =10E{min(100, D)} + 2E{(100 − D)+ } − 5(100)


Z 150 Z 150
1 1
=10 min(100, s) ds + 2 (100 − s)+ ds − 500
50 100 50 100
Z 100 Z 150  Z 100
10 2
= sds + 100ds + (100 − s)ds − 500
100 100 50
 50 100
  
1 1 2 2
 2 1 2 2

= 100 − 50 + 5000 + 5000 − 100 − 50 − 500
10 2 100 2
=875 + 25 − 500 = 400 dollars

15. Here p = 4, c = 1 and sv = −1.

(a) First, find the critical ratio:


p−c
= 3/5.
p − sv

Then we want to find q ∗ such that Pr{D ≤ q ∗ } = 3/5. Since D


is uniform between 1 million gallons and 2 million gallons, Pr{D ≤
q ∗ } = q ∗ − 1. Therefore, we have q ∗ = 1.6 million gallons, and the
optimal order quantity is 1.6 − 0.5 = 1.1 million gallons.
(b) One million gallons are purchased, so the order upto quantity is q =
1.5 million. There are three sources of cost: variable cost, shortage
cost, and overstock cost. The variable cost is Cv = 1 million dollars.
To compute the shortage and overstock costs, we first get E[(D −
6.1. NEWSVENDOR PROBLEM 129

R 0.5 R 0.5
q)+ ] = 0 xdx = 1/8 million gallons and E[(q − D)+ ] = 0 xdx =
1/8 million gallons. Therefore the shortage cost Cs = 4 × 1/8 = 1/2
million dollars and the overstock cost is Co = 1/8 million dollars.
Hence the total cost is 1 + 1/2 + 1/8 = 1.625 million dollars.

16. (a) The critical value is

p − cv 500 − 100 8
= = ≈ 0.89
p+h 500 − 50 9

where p is penalty cost, cv is variable cost, and h = −s is holding


cost or negative salvage value. The minimum y such that

p − cv
F (y) ≥ = 0.89
p+h

is y = 300. Therefore, the optimal quantity to produce is 300 units.


The optimal expected total cost is

E[C] =100(300) + 500E[(D − 300)+ ] − 50E[(300 − D)+ ]


=30000 + 500(10) − 50(80) = 31000 dollars.

(b) Fixed cost does not affect the optimal production quantity. The
optimal production up-to quantity is still 300, so the company has
to produce 200 units. The optimal expected total cost is

E[C] =20000 + 100(200) + 500E[(D − 300)+ ] − 50E[(300 − D)+ ]


=41000 dollars.

17. (a) Remember “1 salad = 2 lettuce”. In terms of salad, p = 6, cv =


1 × 2 = 2, h = 0.5 × 2 = 1. The optimal order-up-to quantity for
lettuce salad, y ∗ , is the smallest y such that

p − cv 6−2 4
F (y) ≥ = = .
p+h 6+1 7

Hence, y ∗ = 30 salad dishes. Therefore, the optimal order-up-to


quantity for lettuce is 60 units.
(b) You have 25 salad dishes for the day. There is no overstock cost. The
expected profit is

E[Profit] =6 × E[(D ∧ 25)] − 2 × 25 − 1 × E[(25 − D)+ ]


145 5 5 1
=6 × − 50 − 1 × = 145 − 50 − = 94 dollars.
6 6 6 6
130 CHAPTER 6. EXERCISE ANSWERS

6.2 Queueing Theory


1. (a)

c
Z
1= ce−3s ds = ∴c=3
0 3

(b) An exponential random variable having pdf in the form λe−λx has the
mean of 1/λ and the variance of 1/λ2 . Hence, E[Y ] = 1/3, Var[Y ] =
1/9 and

Var[Y ] 1/9
= = 1.
(E[Y ])2 (1/3)2

2. (a) The waiting time of the first 10 customers are

0, 0, 0, 0, 7, 6, 4, 6, 0, 0,

so the average waiting time is

7+6+4+6
E[W ] = = 2.3 minutes.
10

(b) The system time of the first 10 customers are

1, 4, 2, 8, 10, 13, 9, 8, 6, 11,

so the average system time is

1 + 4 + 2 + 8 + 10 + 13 + 9 + 8 + 6 + 11
= 7.2 minutes.
10

(c) There is a queue of size 1 (i.e. 1 customer is waiting in buffer) only


during time [18, 20].
2
# of People in Queue, Q(t)

0
0 2 4 6 8 10 12 14 16 18 20

Time, t
6.2. QUEUEING THEORY 131

So the average queue size in the first 20 minutes is


(1)(2) 1
= .
20 10
(d) During time intervals [2, 3), [7, 11), [14, 16), [17, 20], server is busy.

# of People in System, Z(t) 2

0
0 2 4 6 8 10 12 14 16 18 20

Time, t

So the average utilization in the first 20 minutes is


1+4+2+3 1
= .
20 2
(e) No. Little’s law characterize quantities in the long-run average, in
this problem, the given time is too short.
3. By the Kingman’s formula, for a G/G/1 queue, the expected waiting time
in the queue is equal to
 2
ca + c2s
 
1 ρ
E[W ] = ,
µ 1−ρ 2
where c2a and c2s are the squared coefficient of variations for the inter-
arrival times and the service times, respectively, λ is the arrival rate, µ
is the service rate and ρ is the utilization of the server. Since each inter-
arrival time is constant c2a = 0 and λ = 30/7 beams/hour. Let us consider
the two possible options: human operator and automatic painter.
• Human operator: µ = 36/7 beams/hour, c2s = 900/4900 =
0.1837, and ρ = 30/36 = 5/6.
• automatic painter: µ = 36/7.4 = 4.865 beams/hour, c2s =
2 2
150 /740 = 0.041, and ρ = 0.88.
Therefore, the average waiting time in the queue for the human operator
is
0.1837 5/6
E[W ] = = 5.358 minutes.
2 6/7
132 CHAPTER 6. EXERCISE ANSWERS

The average waiting time in the queue for the automatic painter is
0.041 0.88
E[W ] = = 1.86 minutes.
2 0.58
Even though the service rate of the human operator is greater than the
service rate of the automatic painter, because the service times of the
automatic painter are less variable, the waiting time in the queue of the
automatic painter is a lot smaller.
4.
5. (a) The arrival rate, λA to machine A is 12 jobs/hour. The service rate
of machine A µA is 15 jobs/hour and the service rate of machine B
µB is 30 jobs/hour. The utilization of machine A is

ρA = 12/15 = 0.8.

Since ρA < 1 the arrival rate to machine B is equal to λA = 12


jobs/hour. Hence the utilization of machine B is

ρB = 12/30 = 0.4.

(b) Since the utilizations of both machines are less than 1, the through-
out is equal to the arrival rate to the system which is equal to 12
jobs/hour.
(c) Using the Kingman’s formula again
 2
ca + c2s
 
1 ρ 0.8
E[W ] = = = 0.2666 hrs = 16 mins.
µ 1−ρ 2 3

(d) Using the Little’s Law

L = λW,

where λ is the arrival rate and is equal to 12 jobs/hour and W =


30 mins = 0.5 hrs. Hence, L = 6 jobs.
(e) If the arrival rate, λA to machine A is 60 jobs/hour then the utiliza-
tion of machine A is

ρA = min{60/15, 1} = 1.

Since ρA = 1 the arrival rate to machine B is equal to 15 jobs/hour.


Hence the utilization of machine B is

ρB = 15/30 = 0.5.

Since the utilization of the first machine is equal to 1, the throughput


is equal to the service rate of this machine which is equal to 15
jobs/hour.
6.2. QUEUEING THEORY 133

6. Let Lq denote the average number of cars in the yard, Lb denote the
average number of cars in the bump area, and Lp the average number of
cars in the paint area. Note that for all the parts of the system the arrival
rate is λ = 10 cars/week. The average time spent in the yard is

Lq 15
Wq = = = 1.5 weeks.
λ 10
The average time spent in the bumping area is

Lb 10
Wb = = = 1 week.
λ 10
The average time spent in the paint area is

Lb 5
Wp = = = 0.5 weeks.
λ 10
The average length of time from when a car arrives until it leaves is

W = Wq + Wb + Wp = 3 weeks.

7. (a) EX = 1/4 + 2/2 + 3/4 = 2 so µ = 1/2. Thus ρ = (1/3)/(1/2) = 2/3.


The long-run fraction that the system is busy is 2/3.
(b) VarX = 1/4 + 1/4 = 1/2, so cs = 1/8. For exponential(λ) random
variable, the variance is 1/λ2 and the mean is 1/λ. So ca = 1. Using
Kingman’s formula, the average waiting time in queue will be

ρ c2a + c2s 9/8


EWq = EX =2×2× = 9/4 = 2.25
1−ρ 2 2

(c) By Little’s law,


Lq = 1/3EWq = 3/4
where Lq is average number of customers in the queue. So the total
number of customers in the entire call center will be

L = Lq + ρ = 3/4 + 2/3 = 17/12

(d) Processing rate is faster than arrival rate. Hence, throughput is de-
termined by arrival rate.
1
Throughput =
3

(e) As arrival rate gets faster, service rate becomes the bottleneck.

1
Throughput =
2
134 CHAPTER 6. EXERCISE ANSWERS

8. (a) Since λ = 1/3, µCook = 1/2, µCashier = 1, the traffic intensity of the
cooking station is

λ 1/3 2
ρCook = = = .
µCook 1/2 3

Since ρCook < 1, the utilization of the cook is equal to the traffic
intensity which is 2/3. Since the utilization of the cook is less than
1, the throughput of the cooking station is λ = 1/3, which in turn
the arrival rate for the cashier’s desk. Hence, the traffic intensity of
the cashier is
λ 1/3 1
ρCashier = = = .
µCook 1 3

Since ρCashier < 1, the utilization of the cashier is equal to the traffic
intensity which is 1/3. Since the utilization of the cashier is less than
1, the throughput of the cashier’s desk is again λ = 1/3 people per
minute, which in turn the throughput of the system.
(b) First of all, the long-run average waiting time at the cooking station
is
 2
ca + c2s
 
ρCook
E[W ] = E[Cooking Service Time] .
1 − ρCook 2

Since inter-arrival times are exponential and cooking times are con-
stant, c2a = 1, c2s = 0. Hence,
    
2/3 1+0 1
E[W ] = 2 = (2)(2) = 2 minutes.
1/3 2 2

Since the average cooking service time is 2 minutes, the average sys-
tem time at the cooking station is E[W ]+E[Cooking Service Time] =
2 + 2 = 4 minutes.
(c) Using Little’s Law,

1
L = λW = (15) = 5 people.
3

(d) λ is changed to 1/0.5 = 2 people per minute. The traffic intensity of


the cooking station is

λ 2
ρCook = = = 4.
µCook 1/2

Since ρCook > 1, the utilization of the cook is 1. Since the utilization
of the cook is 1, the throughput of the cooking station is µCook = 1/2,
6.3. DISCRETE TIME MARKOV CHAIN 135

which in turn the arrival rate for the cashier’s desk. Hence, the traffic
intensity of the cashier is

µCook 1/2 1
ρCashier = = = .
µCashier 1 2
Since ρCashier < 1, the utilization of the cashier is equal to the traffic
intensity which is 1/2. Since the utilization of the cashier is less than
1, the throughput of the cashier’s desk is µCook = 1/2 people per
minute, which in turn the throughput of the system.
(e) Since the traffic intensity at the cooking station is greater than 1,
the long-run average waiting time at the cooking station is infinite.
Hence, the average system is also infinite at the cooking station.

9. (a) Using Little’s Law,

L1st = λ1st W1st = 2500(4.5) = 11250 students.

(b) Let λtf denote the number of transferred students entering GT each
year. Then,

13250 = L1st + Ltf = 11250 + λtf Wtf = 11250 + 2.5λtf ∴ λtf = 800 students per year.

6.3 Discrete Time Markov Chain


1. (a) Observe that the state space of Xn is S = {3, 4, 5, 6}, since if the
inventory drops below 3 we order up to 6. So at the beginning of a
day the minimum number of items is equal to 3. The initial state is
deterministic and the initial distribution is given by Pr{X0 = 5} = 1.
Next, we find the transition matrix. Note that

Pr{Xn+1 = 3 | Xn = 3} = Pr{Xn+1 = 4 | Xn = 3} = Pr{Xn+1 = 5 | Xn = 3} = 0,

since whenever the inventory level goes below 3 we order up to 6.


Therefore,

Pr{Xn+1 = 6 | Xn = 3} = 1.

Now,

Pr{Xn+1 = 3 | Xn = 4} = 1/6,

since the demand is equal to 1 with probability 1/6 and if the demand
during day n is 1 we end up with 3 items and do not order. Otherwise
we order, hence

Pr{Xn+1 = 6|Xn = 4} = 5/6.


136 CHAPTER 6. EXERCISE ANSWERS

Going in this fashion, the transition matrix can be shown to be


 
3 0 0 0 1
4 1/6 0 0 5/6
P =  .
5 3/6 1/6 0 2/6
6 2/6 3/6 1/6 0

(b) Let Yn be the amount in stock at the end of day n. The inventory at
the end of a day can be any value from 0 to 5. It cannot be 6 because
at the beginning of a day the maximum number of items we can have
in the inventory is 6 and the demand is strictly greater than zero with
probability 1. So the state space in this case is S = {0, 1, 2, 3, 4, 5}.
(If you think 6 must be included in the state space include 6 in
the transition matrix and see what happens.) The initial state is
deterministic and the initial distribution is given by Pr{Y0 = 2} = 1.
Observe that

Pr{Yn+1 = 5 | Yn = 0} = Pr{Dn+1 = 1} = 1/6,

where Dn+1 is the demand during day n + 1. Similarly

Pr{Yn+1 = 4 | Yn = 0} =Pr{Dn+1 = 2} = 3/6,


Pr{Yn+1 = 3 | Yn = 0} =Pr{Dn+1 = 3} = 2/6.

Going in this fashion, we come up with the following transition matrix


 
0 0 0 0 2/6 3/6 1/6
1 0 0 0 2/6 3/6 1/6 
2 0
 0 0 2/6 3/6 1/6
P =  .
3 2/6 3/6 1/6 0 0 0 
4  0 2/6 3/6 1/6 0 0 
5 0 0 2/6 3/6 1/6 0

In the solutions below we use the following formulas: for two events A and
B
T
Pr(A B)
Pr(A | B) =
Pr(B)
and for two random variables Y and Z and a constant k

X
E[Y | Z = k] = nPr{Y = n | Z = k}
n=0

and

X
E[Y ] = E[Y | Z = k]Pr{Z = k}.
k=0
6.3. DISCRETE TIME MARKOV CHAIN 137

First, as computed in Problem 2(a) of Assignment 7,


   
0 0 0 1 1/3 1/2 1/6 0
 1/6 0 0 5/6  2
 5/18 5/12 5/36 1/6 
P =   =⇒ P = .
3/6 1/6 0 2/6  5/36 1/6 1/18 23/36 
2/6 3/6 1/6 0 1/6 1/36 0 29/36
2
(c) Pr{X2 = 6 | X0 = 5} = P3,4 = 23/36,
2 2
(d) Pr{X2 = 5, X3 = 4, X5 = 6 | X0 = 3} = P1,3 P3,2 P2,4 = 1/216,
(e)

E[X2 | X0 = 6] = 3Pr{X2 = 3 | X0 = 6} + 4Pr{X2 = 4 | X0 = 6}


+5Pr{X2 = 5 | X0 = 6} + 6Pr{X2 = 6 | X0 = 6}
2 2 2 2
= 3P4,1 + 4P4,2 + 5P4,3 + 6P4,4 = 49/9,

(f) If we know that α = (0, 0, 0.5, 0.5), then


2 2 2 2
Pr{X2 = 6} = α1 P1,4 + α2 P2,4 + α3 P3,4 + α4 P4,4 = 52/72 = 13/18,

(g)

Pr{X4 = 3, X1 = 5, X2 = 6}
Pr{X4 = 3, X1 = 5 | X2 = 6} =
Pr{X2 = 6}
Pr{X4 = 3, X2 = 6 | X1 = 5}Pr{X1 = 5}
=
Pr{X2 = 6}
2
P3,4 P4,1 Pr{X1 = 5}
=
Pr{X2 = 6}

where

Pr{X1 = 5} = α1 P1,3 + α2 P2,3 + α3 P3,3 + α4 P4,3 = 1/12

an Pr{X2 = 6} has been calculated in the previous problem. The


finial ansewer is 1/156.

2. (a) In order to show that {Xn , n ≥ 1} is a discrete-time Markov chain it


is enough to show that one step transition probabilities describe the
evolution of the system.
First, the state space is S = {1, 2, . . . , 6}. (The state space has
nothing to do with the fact that process is Markov or not.)
We have for kn > kn+1

Pr{Xn+1 = kn+1 | Xn = kn } = 0

for all possible values of kn .


138 CHAPTER 6. EXERCISE ANSWERS

If kn = kn+1 ∈ S = {1, 2, . . . , 6}, then

Pr{Xn+1 = kn+1 | Xn = kn } = kn /6.

If kn < kn+1 ∈ S = {2, . . . , 5, 6}, then

Pr{Xn+1 = kn+1 | Xn = kn } = 1/6.

Since one step transition probabilities describe the evolution of the


system, {Xn : n ≥ 1} is a Markov chain with the transition matrix

1 1 1 1 1 1
 
6 6 6 6 6 6
2 1 1 1 1
 0 6 6 6 6 6

3 1 1 1
 
 0 0 6 6 6 6

P = 4 1 1

 0 0 0 6 6 6

5 1
 
 0 0 0 0 6 6

6
0 0 0 0 0 6

(b) Pr{X1 = i} = 1/6 for i ∈ {1, 2, . . . , 6}.

3. The state space is S = {0, 1, 2, . . .}. (The state space is unbounded, so


has infinitely many elements.)
For 0 ≤ kn ≤ n,

Pr{Xn+1 = kn + 1 | Xn = kn } = 1/6 and


Pr{Xn+1 = kn | Xn = kn } = 5/6.

Since one step transition probabilities describe the evolution of the system,
{Yn : n ≥ 1} is a Markov chain.
 
0.8 0.2
4. (a) The state space is {$10, $20}. The transition matrix is .
0.1 0.9
It is irreducible.
 
0.9 0.1
(b) The state space is {$10, $25}. The transition matrix is .
0.15 0.85
It is irreducible.
X X
(c) The stationary distribution is (π$10 , π$20 ) = (1/3, 2/3).
Y Y
(d) The stationary distribution is (π$10 , π$25 ) = (3/5, 2/5).
P300 P300
(e) What you need look at is E( i=1 Xi ) and E( i=1 Yi ). And choose
the one with larger expectation. By the stationary distribution ob-
tained in (b) and (c), we have
300
X 1 2
E( Xi ) = 300(10 × + 20 × ) = 5000,
i=1
3 3
6.3. DISCRETE TIME MARKOV CHAIN 139

300
X 3 2
E( Yi ) = 300(10 × + 25 × ) = 4800,
i=1
5 5

so you should pick stock 1.

5. (a) Yes, it is a Markov Chain. The state space is {0, 1, 2, 3, · · · }.



98/100 j = i + 1;
a = (1, 0, 0, · · · ), Pi,j =
2/100 j = 0.

(b) Yes, it is irreducible. Starting from any state i, the probability of


visiting state 0 in one step is 2/100. And starting from state 0, the
probability of visiting state i in i step is (98/100)i . Therefore, state 0
communicates with any other state. Thus any state commutes with
each other.
(c) Starting from state 0, we have positive probability to visit state 0
again in one step. So state 0 is aperiodic. And periodicity is a class
property, so the Markov chain is aperiodic.
(d) Let p = 98/100. You can write down the following balance equations.

πi = pπi−1 (6.1)
X
π0 = (1 − p) πi . (6.2)
i
P
From (6.2), you can write π0 = 1 − p since i πi = 1. From (6.1),
you will have πi = pi π0 = (1 − p)pi .
(e) Yes, because it is irreducible, and the stationary distribution exists.

6. Suppose the state space to be {0, 1, 2, 3, }, one can draw the transition
diagram

.4

0 3

.8

.6 .5 .3 .2

.4

1 2

.7

(a) Yes, it is. Each state has period 2.


140 CHAPTER 6. EXERCISE ANSWERS

P4
(b) Yes, it is. π satisfies the balance equation πP = π and i=1 πi = 1.
Since the Markov chain is irreducible, the stationary distribution is
unique.
100 101
(c) P11 = 11/16 6= π1 , P11 = 0 6= π1 . Since the Markov chain has
period 2,
100 101
P11 + P11
π1 = .
2

7. (a) Assume the state space is {0, 1, 2}. One can draw the transition
diagram as the following:

0
2/3 2/3

1/3 1/3
1/3
1 2

2/3

(1) Since any state commutes with each other, the Markov chain is
irreducible.
(2) The stationary distribution is π = (1/3, 1/3, 1/3), which is unique
because the state space is irreducible.
(3) The Markov chain is periodic.
(4) Each state can be visited again either in 2 steps (e.g. 0 → 1 → 0)
or 3 steps (e.g. 0 → 2 → 1 → 0). The period is the greatest
common factor of 2 and 3, which is 1.
(5) Since Markov chain is irreducible, aperiodic, and has a finite
state space, limn→∞ Pijn = πj , where π is the unique stationary
distribution obtained in (2). And n = 100 is long enough to
“reach” the steady state, so
 
1/3 1/3 1/3
P 100 ≈ 1/3 1/3 1/3 .
1/3 1/3 1/3

(b) Assume the state space is {0, 1, 2, 3}. One can draw the transition
diagram as the following:
6.3. DISCRETE TIME MARKOV CHAIN 141

.8
.2 0 1 .5
.5

1
2 3 .5
.5

(1) It is NOT irreducible since state 0 can only commute with state
1.
(2) π = {5/26, 8/26, 1/6, 2/6} is a stationary distribution. But it
is not unique sincePπ = {5/39, 8/39, 2/9, 4/9} is also one. Both
solve π = πP and π = 1.
(3) Periodicity is only defined for irreducible Markov chain.
(4) State 4 can come back to itself in either 2 steps or 3 steps, so the
period is 1. Any other state can come back to itself in one step,
so its period is 1.
(5) Markov chain can be decomposed into two Markov chains with
state space {0, 1} and {2, 3} respectively. And each of them is
irreducible and aperiodic, having finite state space,

lim Pijn = πj for i, j ∈ {0, 1} or i, j ∈ {2, 3}


n→∞

where {π0 , π1 } is the unique stationary distribution for Markov


chain with state space {0, 1} and {π2 , π3 } is the unique stationary
distribution for Markov chain with state space {2, 3}. Pijn = 0,
for all other (i, j) pairs, of which one is from {0, 1} and the other
is from j ∈ {2, 3}. So
 
5/13 8/13 0 0
5/13 8/13 0 0 
P 100 ≈  0
.
0 1/3 2/3
0 0 1/3 2/3

8. (a) The inventory policy of this store is (s, S) = (2, 4). Given that week
1 started with 3 items, in order for week 2 to start with 3 items, no
order can be made at the end of week 1. The latter is equivalent
to the fact that the demand for week 1 is zero. Thus, given week 1
starts with 3 items, the probability that week 2 starts with 3 items
is equal to the probability that week 1 has demand 0, which is equal
to .3.
(b) Yes. For n ≥ 0, let Dn+1 be the demand of week n + 1. Then
142 CHAPTER 6. EXERCISE ANSWERS

(
4 if Xn − Dn+1 ≤ 2,
Xn+1 =
Xn − Dn+1 if Xn − Dn+1 ≥ 3.

State space S = {3, 4, 5, 6, . . .} and transition matrix


 
3 .3 .7
P = .
4 .4 .6

9. (a)

1/2

1 2

1/4

1/4 1/2 1/2 3/4

3/4

4 3

1/2
1 1 3 1
(b) Pr{X2 = 4|X0 = 2} = P21 P14 + P23 P34 = 4 × 2 + 4 × 2 = 21 .
(c)

Pr{X2 = 2, X4 = 4, X5 = 1|X0 = 2}
= Pr{X5 = 1|X4 = 4}Pr{X4 = 4|X2 = 2}Pr{X2 = 2|X0 = 2}
1 1 1 3 1
= P41 × × ( × + × )
2 4 2 4 2
1 1 1
= × ×
4 2 2
1
= .
16

(d) Each state has period 2.


(e) YES. Since π = ( 18 , 14 , 38 , 14 ) satisfy π = πP and sum of the elements
are 1, then π must be the stationary distribution. It is unique since
this DTMC is irreducible.
(f) No. The limit does not exist since the DTMC has period 2.
1
(g) E(τ1 |X0 = 1) = π1 = 8.
6.3. DISCRETE TIME MARKOV CHAIN 143

10. Solving

    (π1, π2) = (π1, π2) [ .2  .8
                          .5  .5 ],

we have (π1, π2) = (5/13, 8/13). Let X be the DTMC with transition matrix P
on state space S = {1, 2, 3, 4, 5}. Starting X0 = 3, let f3 be the probability
that the DTMC is absorbed into {1, 2}. Similarly, starting X0 = 4, let f4
be the probability that the DTMC is absorbed into {1, 2}. Then, we have
    f3 = 1/4 + (3/4) f4,
    f4 = (1/2) f3,

which has solution f3 = 2/5 and f4 = 1/5. Therefore,

    P^100 ≈ [ 5/13          8/13          0  0  0
              5/13          8/13          0  0  0
              (5/13)(2/5)   (8/13)(2/5)   0  0  3/5
              (5/13)(1/5)   (8/13)(1/5)   0  0  4/5
              0             0             0  0  1  ].
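Since the full five-state matrix P is not reproduced in this answer, a small R sketch can at least verify the two linear systems used above (the stationary distribution of the closed class {1, 2} and the absorption probabilities f3, f4); this is only a numerical cross-check:

    P12 <- matrix(c(.2, .8,
                    .5, .5), 2, 2, byrow = TRUE)
    A <- t(P12) - diag(2); A[2, ] <- 1
    pi12 <- solve(A, c(0, 1))            # (5/13, 8/13)
    # f3 = 1/4 + (3/4) f4,  f4 = (1/2) f3
    M <- matrix(c(1, -3/4,
                  -1/2, 1), 2, 2, byrow = TRUE)
    f <- solve(M, c(1/4, 0))             # (f3, f4) = (2/5, 1/5)
    pi12; f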

11. (a) Let Xn denote the inventory level at the “beginning” of n-th week.
Using Xn to model the problem as DTMC, the state space would be
S = {20, 30}. The transition matrix for this DTMC and its square are

    P = [ 0    .2 + .5 + .3 ]   [ 0   1  ]           [ .2   .8  ]
        [ .2   .5 + .3      ] = [ .2  .8 ] ,   P² =  [ .16  .84 ] ,

where the rows and columns are indexed by the states 20 and 30.
Since we are looking at two-week transition, the probability that
week 2 starts with inventory level 20 given that week 0 started with
inventory level 20 is as follows.
    P(X2 = 20|X0 = 20) = (P²)20,20 = 0.2.

(b) Since the DTMC is irreducible, recurrent, and aperiodic, it has a


unique stationary distribution. Denote the stationary distribution
by π = (π20, π30).

    πP = π  ⇒  (π20, π30) [ 0   1
                            .2  .8 ] = (π20, π30)  ⇒  π20 = 1/6,  π30 = 5/6.
We are looking at the 100-week transition, so

    P^100 ≈ [ π20  π30      [ 1/6  5/6
              π20  π30 ]  =   1/6  5/6 ].
Therefore, the probability that week 100 starts with inventory level
20 given that week 0 started with inventory level 20 is as follows.
    P(X100 = 20|X0 = 20) = (P^100)20,20 = 1/6.

(c) Define f (y) to be the expected profit of a week given that the man-
agement started the week with inventory level y. Since the company
can start a week with inventory level of either 20 or 30, computing
f (20) and f (30) will suffice to compute the long run average profit.
(1) y = 20
• D = 10 : 200(10) − 500 − 20(10) − 100(20) = −700
• D = 20 : 200(20) − 500 − 100(30) = 500
• D = 30 : 200(20) − 500 − 100(30) = 500

f (20) = (−700)(.2) + 500(.8) = 260

(2) y = 30
• D = 10 : 200(10) − 20(20) = 1600
• D = 20 : 200(20) − 500 − 20(10) − 100(20) = 1300
• D = 30 : 200(30) − 500 − 100(30) = 2500

f (30) = 1600(.2) + 1300(.5) + 2500(.3) = 1720

From part (b), we know that the stationary distribution is π =


(1/6, 5/6). It means that, in the long run, the company starts a week
with inventory level 20 with probability 1/6 and 30 with probability
5/6. Therefore, the long run average profit is

    π20 f(20) + π30 f(30) = 260/6 + 5(1720)/6 = 1476 2/3 dollars per week.
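As a quick cross-check of parts (b) and (c), the following R sketch computes the stationary distribution of the weekly inventory chain and the long-run average profit:

    P <- matrix(c(0, 1,
                  .2, .8), 2, 2, byrow = TRUE)    # states 20, 30
    A <- t(P) - diag(2); A[2, ] <- 1
    pivec <- solve(A, c(0, 1))                    # (1/6, 5/6)
    f <- c(260, 1720)                             # one-week expected profits from part (c)
    sum(pivec * f)                                # about 1476.67 dollars per week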

12. (a) The transition diagram is as follows (diagram omitted here): state 1 moves to 2 with probability 1; state 2 moves to 1 with probability .2 and to 3 with probability .8; state 3 moves to 2 with probability .4 and to 4 with probability .6; state 4 moves to 3 with probability 1.

Since every state communicates with every other state, this chain is irreducible. The period of state 2 is 2 because there is no way to start from and come back to state 2 in an odd number of steps. (By the solidarity property, the period of every state is 2 because the chain is irreducible.)
(b) Due to Markov property,

    Pr{X1 = 2, X3 = 4|X0 = 3} = Pr{X3 = 4|X1 = 2} Pr{X1 = 2|X0 = 3}
                              = (P²)2,4 P3,2 = (.48)(.4) = 0.192

because

    P² = [ .2   0    .8   0
           0    .52  0    .48
           .08  0    .92  0
           0    .4   0    .6 ].

(Alternatively, you can compute the probability by looking at the


transition diagram. There is only one path for the chain to move
from state 3 to state 2 to state 4 in three steps. You can just multiply
all involved probabilities. (.4)(.8)(.6) = 0.192)

(c) Looking at the P² matrix, notice that there are only two possible states at time 2 given that the chain started from state 3: X2 = 1 with probability 0.08 and X2 = 3 with probability 0.92.

E(X2 |X0 = 3) =1Pr{X2 = 1|X0 = 3} + 3Pr{X2 = 3|X0 = 3}


=1(0.08) + 3(0.92) = 2.84

(d) Using the cut method, we have the following balance equations as well as Σi πi = 1:

    1·π1 = .2π2,   .8π2 = .4π3,   .6π3 = 1·π4.

Solving the linear system of equations, we obtain the following stationary distribution:

    π = (π1, π2, π3, π4) = (1/22)(1, 5, 10, 6).
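The transition probabilities described in part (a) determine P completely, so the following R sketch (a numerical cross-check only) reproduces the P² matrix of part (b) and the stationary distribution of part (d):

    P <- matrix(c(0, 1, 0, 0,
                  .2, 0, .8, 0,
                  0, .4, 0, .6,
                  0, 0, 1, 0), 4, 4, byrow = TRUE)
    P %*% P                                  # matches the P^2 matrix in part (b)
    A <- t(P) - diag(4); A[4, ] <- 1
    solve(A, c(0, 0, 0, 1))                  # (1, 5, 10, 6)/22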

(e) It is not correct. Since this chain is periodic with period 2, the
following expression holds instead. (The limit distribution does not
exist.)

    (Pr{X100 = 2|X0 = 3} + Pr{X101 = 2|X0 = 3}) / 2 ≈ π2

(It is obvious that Pr{X100 = 2|X0 = 3} = 0 because the number


of steps, 100, is even. Hence, it is easy to check that Pr{X100 = 2|X0 = 3} is not approximately π2 .)

13. Assuming that the state space is S = {1, 2, 3, 4, 5, 6}, the transition diagram is as follows (diagram omitted here): states 1 and 2 form a closed class with P(1→1) = .4, P(1→2) = .6, P(2→1) = .7, P(2→2) = .3; state 3 moves to 4 with probability 1; state 4 moves to 5 with probability .5 and to 6 with probability .5; state 5 moves to 3 with probability .4 and into the class {1, 2} with total probability .6; state 6 is absorbing.

(a) Since states 3, 4, 5 are transient, columns 3, 4, 5 in P^100 should be all zeroes. States 1 and 2 form an irreducible closed class, so solve πP = π for states 1 and 2:

    (π1, π2) = (π1, π2) [ .4  .6
                          .7  .3 ].

Hence, (π1, π2) = (7/13, 6/13). Now, compute the absorption probabilities f3,6, f4,6, f5,6. Denote each of these by x, y, z respectively.

x =y
y =.5 + .5z
z =.4x

Hence, x = f3,6 = 5/8, y = f4,6 = 5/8, z = f5,6 = 1/4, and f3,{1,2} = 3/8, f4,{1,2} = 3/8, f5,{1,2} = 3/4. Also, once you are absorbed by {1, 2}, you will be in state 1 or 2 following the probabilities π1, π2. In sum,

    P^100 ≈ [ 7/13         6/13         0 0 0  0
              7/13         6/13         0 0 0  0
              (7/13)(3/8)  (6/13)(3/8)  0 0 0  5/8
              (7/13)(3/8)  (6/13)(3/8)  0 0 0  5/8
              (7/13)(3/4)  (6/13)(3/4)  0 0 0  1/4
              0            0            0 0 0  1  ].
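A minimal R sketch (only a check of the two linear systems above) confirms the absorption probabilities and the stationary distribution of the class {1, 2}:

    # x = y,  y = .5 + .5 z,  z = .4 x
    A <- matrix(c(1, -1, 0,
                  0, 1, -.5,
                  -.4, 0, 1), 3, 3, byrow = TRUE)
    solve(A, c(0, .5, 0))                    # (x, y, z) = (5/8, 5/8, 1/4)
    P12 <- matrix(c(.4, .6,
                    .7, .3), 2, 2, byrow = TRUE)
    B <- t(P12) - diag(2); B[2, ] <- 1
    solve(B, c(0, 1))                        # (7/13, 6/13)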

(b) No; they are approximately the same, P^100 ≈ P^101.

14. (a) State space S = {1, 2, 3} and initial distribution a0 = (0, 1, 0). The
transition probability matrix is

    P = [ .25  .75  0
          .5   0    .5
          0    .25  .75 ],

where the rows and columns are indexed by the states 1, 2, 3.

(b) Solve πP = π.

.25π1 + .5π2 =π1


.75π1 + .25π3 =π2
.5π2 + .75π3 =π3
π1 + π2 + π3 =1

Hence,

    π = (2/11, 3/11, 6/11).

(c) π2 + π3 = 9/11
(d) Denote the monthly earning when your evaluation is n by g(n). Then,
g(n) is the following function.

g(1) = $10000, g(2) = $25000, g(3) = $50000

Hence, your long-run average monthly salary is

    Σ_{i=1}^{3} πi g(i) = π1 g(1) + π2 g(2) + π3 g(3) = $1000(20 + 75 + 300)/11 = $395000/11.
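As a numerical cross-check of parts (b) and (d), one can run the following R sketch:

    P <- matrix(c(.25, .75, 0,
                  .5, 0, .5,
                  0, .25, .75), 3, 3, byrow = TRUE)
    A <- t(P) - diag(3); A[3, ] <- 1
    pivec <- solve(A, c(0, 0, 1))            # (2/11, 3/11, 6/11)
    g <- c(10000, 25000, 50000)
    sum(pivec * g)                           # about 35909.09 = 395000/11 dollars per month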

15. (a) No. State T cannot reach State Bk and vice versa. (Other examples
that make sense are okay, too.)
(b) With the states ordered (T, B, M, Th, Bk),

    P = [ .1  .9  0   0   0
          .3  .7  0   0   0
          0   .1  .4  .5  0
          0   0   .2  .5  .3
          0   0   0   0   1  ].

(c)

    fM,Bk = .4 fM,Bk + .5 fTh,Bk
    fTh,Bk = .3 + .5 fTh,Bk + .2 fM,Bk

So fM,Bk = .75 and fTh,Bk = .9, and

    P^359 ≈ [ .25         .75         0 0 0
              .25         .75         0 0 0
              (.25)(.25)  (.25)(.75)  0 0 .75
              (.1)(.25)   (.1)(.75)   0 0 .9
              0           0           0 0 1  ].

(d) Approximately, P^359 ≈ P^360, so Pr{X360 = T | X0 = Th} = (.1)(.25) = .025.
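As a quick check of part (c), the following R sketch solves for the absorption probabilities and for the stationary distribution of the closed class {T, B}:

    # fM = .4 fM + .5 fTh,  fTh = .3 + .5 fTh + .2 fM
    A <- matrix(c(.6, -.5,
                  -.2, .5), 2, 2, byrow = TRUE)
    solve(A, c(0, .3))                       # (fM,Bk, fTh,Bk) = (.75, .90)
    PTB <- matrix(c(.1, .9,
                    .3, .7), 2, 2, byrow = TRUE)
    B <- t(PTB) - diag(2); B[2, ] <- 1
    solve(B, c(0, 1))                        # (.25, .75)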



6.4 Poisson Process


1. Let S1 and S2 be independent and exponentially distributed random variables with means 1/µ1 = 2 and 1/µ2 = 5. They correspond to the service times of the first and the second teller.
(a) Since A starts service immediately after arrival, Pr(TA ≥ 5) = Pr(S1 ≥ 5) = e^(−5/2) = 0.0821.
(b) By the same reason, E(TA ) = E(S1 ) = 2.
(c) This amount, in mathematical expression, will be
E(TA | TA > 5) = E(S1 | S1 > 5)
= 5 + E(S1 ) // by memoryless property of exponential random variables
= 7.
(d)
    Pr(TA < TB) = Pr(S1 < S2) = µ1/(µ1 + µ2) = 5/7.
(e) The time from noon till a customer leaves is min{TA, TB}. Its expectation will be the same as

    E(min{S1, S2}) = 1/(µ1 + µ2) = 10/7.
(f) If a customer leaves, one of the tellers will be able to serve C. So it
is the same as time till one departure, which is computed in (e).
(g) The total time C spends in the system will be the summation of the
waiting time min{TA , TB } and its service time. The service time will
depend on at which teller the customer gets service. So by condition-
ing, we have
E(service time of C ) = E(S1 | TA < TB )Pr(TA < TB ) + E(S2 | TB < TA )Pr(TB < TA )
= 2 × 5/7 + 5 × 2/7   // by part (d)
= 20/7.
Thus, the expected time C in the system is 10/7 + 20/7 = 30/7.
(h) The expected time from noon till C starts service is E(min{TA, TB}), which has already been studied in (e). Let us now study the expected time from when C enters service until the system is empty. No matter which teller C goes to, both tellers are busy at that time. By the memoryless property, the two tellers can be imagined to start service afresh together at this time. The expected time left for both of them to finish is still E(max{S1, S2}). So

    E(time until system is empty) = E(min{TA, TB}) + E(max{S1, S2})
                                  = E(min{S1, S2}) + E(S1) + E(S2) − E(min{S1, S2})
                                  = E(S1) + E(S2) = 7.

(i) B leaving before A means that C starts service at teller 2.

    Pr(TC < TA | TB < TA) = Pr(S2 < S1)   // again by the memoryless property
                          = µ2/(µ1 + µ2) = 2/7.
(j) The probability that A leaves last is
Pr(TB < TA , TC < TA ) = Pr(TC < TA | TB < TA )Pr(TB < TA )
= 2/7 × 2/7 = 4/49.
The probability that B leaves last is
Pr(TA < TB , TC < TB ) = Pr(TC < TB | TA < TB )Pr(TA < TB )
= Pr(TC < TB | TA < TB ) × 5/7
= 5/7 × 5/7 = 25/49. // computation of the first 5/7 is similar as in (i)
The probability that C leaves last is

    Pr(TA < TC, TB < TC) = Pr(TA < TC, TB < TA) + Pr(TB < TC, TA < TB)
                         = Pr(TA < TC | TB < TA) Pr(TB < TA) + Pr(TB < TC | TA < TB) Pr(TA < TB)
                         = 5/7 × 2/7 + 2/7 × 5/7 = 20/49.
(k) By memoryless property, we can imagine that everything just starts
at 12:10. In order that D enter service, two customers must leave.
The average time from 12:10 till one departure is same as in (e), which
is 10/7. The average time from then to next departure will be the
same, 10/7, due to memoryless property. Now, it is left to compute
the expected service time of D. Using a similar conditioning in (g)
E(service time of D ) = E(S1 | TA < TB , TC < TB )Pr(TA < TB , TC < TB )
+E(S1 | TB < TA , TA < TC )Pr(TB < TA , TA < TC )
+E(S2 | TB < TA , TC < TA )Pr(TB < TA , TC < TA )
+E(S2 | TA < TB , TB < TC )Pr(TA < TB , TB < TC )
= 2 × 5/7 × 5/7 + 2 × 2/7 × 5/7 + 5 × 2/7 × 2/7 + 5 × 5/7 × 2/7
= 20/7.
So the total time D spends in the system is 10/7+10/7+20/7 = 40/7.
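These expectations can also be confirmed by a small Monte Carlo simulation in R (an illustrative sketch using the service rates 1/2 and 1/5 from above):

    set.seed(1)
    n <- 1e5
    S1 <- rexp(n, rate = 1/2)                # teller 1 service times (mean 2)
    S2 <- rexp(n, rate = 1/5)                # teller 2 service times (mean 5)
    wait_C <- pmin(S1, S2)                   # time until C starts service
    SC <- ifelse(S1 < S2, rexp(n, rate = 1/2), rexp(n, rate = 1/5))  # C's service time
    mean(wait_C)                             # about 10/7 = 1.43, as in (e)
    mean(wait_C + SC)                        # about 30/7 = 4.29, as in (g)
    mean(pmax(wait_C + SC, pmax(S1, S2)))    # about 7, as in (h)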
2. (a) Pr(T ≤ 1) = 1 − e^(−10/7) ≈ 0.7603 and Pr(T ≤ 2) = 1 − e^(−20/7) ≈ 0.9426.
(b) The tardiness is T′ = max(T − 1, 0).

    E(T′) = ∫_1^∞ (t − 1) λ e^(−λt) dt
          = e^(−λ) ∫_0^∞ s λ e^(−λs) ds
          = e^(−λ)/λ = 0.7 e^(−10/7) ≈ 0.1678.

(c) Let g1 (t), g2 (t), g3 (t) denote the profit for the “$1250”, “$1000”, “$750”
contract with delivery time t respectively. Then they can be written
as follows:

    g1(t) = 1250            for 0 ≤ t ≤ 1/2,
          = 2500(1 − t)     for 1/2 < t ≤ 1,
          = 0               for t > 1;

    g2(t) = 1000            for 0 ≤ t ≤ 1,
          = 1000(2 − t)     for 1 < t ≤ 2,
          = 0               for t > 2;

    g3(t) = 750             for 0 ≤ t ≤ 7,
          = 750(14 − t)/7   for 7 < t ≤ 14,
          = 0               for t > 14.

Let T be an exponentially distributed delivery time with mean δ, and let us compute E[g1(T)]:

    f1(δ) = E[g1(T)] = ∫_0^0.5 1250 (1/δ) e^(−t/δ) dt + ∫_0.5^1 2500(1 − t) (1/δ) e^(−t/δ) dt
          = 1250 − 2500 δ (e^(−1/(2δ)) − e^(−1/δ)).

Similarly,

    f2(δ) = E[g2(T)] = 1000 − 1000 δ (e^(−1/δ) − e^(−2/δ)),
    f3(δ) = E[g3(T)] = 750 − (750/7) δ (e^(−7/δ) − e^(−14/δ)).

Plugging δ = 0.7 into each expected profit function gives f1(0.7) = 812.69, f2(0.7) = 872.45 and f3(0.7) = 749.99. Hence, choosing the “$1000” contract gives the maximum expected profit.
(d) Using a software package like Maple or Mathematica, we can plot fi(δ) for i = 1, 2, 3. f1(·) achieves the largest value, which means that the “$1250 contract” is optimal, if 0 ≤ δ ≤ 0.5375; f2(·) achieves the largest value, which means that the “$1000 contract” is optimal, if 0.5375 ≤ δ ≤ 1.055; and f3(·) achieves the largest value, which means that the “$750 contract” is optimal, if δ ≥ 1.055.
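As a numerical cross-check (a small R sketch), the closed-form profit functions can be evaluated at δ = 0.7, and f1 can also be checked by direct numerical integration:

    f1 <- function(d) 1250 - 2500 * d * (exp(-1/(2*d)) - exp(-1/d))
    f2 <- function(d) 1000 - 1000 * d * (exp(-1/d) - exp(-2/d))
    f3 <- function(d) 750 - (750/7) * d * (exp(-7/d) - exp(-14/d))
    c(f1(0.7), f2(0.7), f3(0.7))             # about 812.69, 872.45, 749.99
    g1 <- function(t) ifelse(t <= 0.5, 1250, ifelse(t <= 1, 2500 * (1 - t), 0))
    integrate(function(t) g1(t) * dexp(t, rate = 1/0.7), 0, 1)$value  # again about 812.69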

3. (a) The expected number of passengers that arrive in the first 2 hours is
just E[N (2)] = 60 × 2 = 120.
(b) The probability that no passengers arrive in the first hour is just

600 e−60
Pr{N (1) = 0} = = e−60 .
0!

(c) The mean number of arrivals between noon and 1pm is just E[N (3)−
N (2)] = 3 × 60 − 2 × 60 = 60.

(d) The probability that exactly two passengers arrive between 2pm and
4pm is just

    Pr{N(6) − N(4) = 2} = (60 × 2)^2 e^(−60×2)/2! = 120^2 e^(−120)/2.
(e) If you realize that a Poisson process has independent increments,
then
    Pr{N(6) − N(4) = 10 | N(.5) = 0} = Pr{N(6) − N(4) = 10} = 120^10 e^(−120)/10!.
(f) The average time (given in hours) of the first arrival is E[S1 ] = 1/λ =
1/60 because S1 follows an exponential distribution with rate λ = 60.
(g) The expected time of the thirtieth arrival is E[S30 ] = 30/60 = 1/2.
(h) The expected time of the first arrival given that no passengers arrive
in the first hour will be 1+1/60 = 61/60 hours due to the memoryless
property.
(i) The expected arrival time of the first passenger who arrives after
2pm will be 4 + 1/60 = 241/60 hours, again, due to the memoryless
property.
(j) Female passengers form another Poisson process with rate 40%×60 =
24 per hour. Hence, in 2 hours, the expected number of female
arrivals is 48.
(k) By splitting argument in Poisson process, the Poisson processes of
female and male customers are independent from each other. There-
fore,

Pr{W (0.5) > 0} = 1 − Pr{W (0.5) = 0} = 1 − e−12 .

(l) The probability that a certain customer is female is 40%. Since in-
terarrival times are all exponential, because of memoryless property,
the probability of three consecutive female arrivals is

.4 × .4 × .4 = 6.4%.
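For reference, a few of these values can be checked directly with R's Poisson functions (a quick sketch):

    dpois(0, lambda = 60)        # (b): e^(-60)
    dpois(2, lambda = 120)       # (d): 120^2 e^(-120) / 2!
    dpois(10, lambda = 120)      # (e): 120^10 e^(-120) / 10!
    1 - dpois(0, lambda = 12)    # (k): 1 - e^(-12)
    0.4^3                        # (l): 6.4%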

4. Consider the following notation: let NE (·) denote the Poisson process
corresponding to English calls, and let NF (·) denote the Poisson process
corresponding to French calls. Furthermore, let {SnE } denote the arrival
times of English calls (with similar notation for French arrival times).
(a)
Pr(S2E > 5) = Pr(NE (5) ≤ 1) = e−2(5) + 10e−10 = 11e−10 .
(b) Because the two processes given above are independent,

Pr(NE (2) ≥ 2, NF (2) = 1) = Pr(NE (2) ≥ 2)Pr(NF (2) = 1) = (1−5e−4 )(2e−2 )



(c) Recall that if X is an exponential random variable with rate λ and


Y is also exponential with rate µ, and both are independent, then
min(X, Y ) is also exponential with rate λ + µ (to see this, consider
Pr(min(X, Y ) > t) for some positive t). Notice that the arrival time
of the first English call is exponential with rate 2, and the arrival
time of the first French call is exponential with rate 1. Hence the
arrival time of the first call is just min(S1E, S1F), which we now know
is exponential with rate 3 (since both processes are independent).
Thus, the expected arrival time of the first call is just 1/3 minutes.
(d) Here N is a Poisson process with rate 3. To see this, notice that for
integers j, k ≥ 0 and real numbers a, b such that a < b, we have that

    N(0) = NE(0) + NF(0) = 0 + 0 = 0,

    Pr(N(b) − N(a) = k | N(a) = j)
    = Pr(NE(b) − NE(a) + NF(b) − NF(a) = k | NE(a) + NF(a) = j)
    = Pr(NE(b) − NE(a) + NF(b) − NF(a) = k) = Pr(N(b) − N(a) = k).

(above we used the facts that Ni (b) − Ni (a) is independent of Ni (a),


where i is E or F ; this follows from both processes being known to be
Poisson and independent of each other.) Finally, for any fixed t ≥ 0,
we need to show that N (t) is Poisson with some rate (3 in this case).
Let k be a nonnegative integer. Then
    Pr(N(t) = k) = Σ_{i=0}^{k} Pr(NF(t) = k − i) Pr(NE(t) = i)
                 = Σ_{i=0}^{k} [t^(k−i) e^(−t)/(k − i)!] [(2t)^i e^(−2t)/i!]
                 = (3t)^k e^(−3t)/k!.
Here we used the binomial theorem in the last equality. If you know what moment generating functions are, you can show that N(t) is Poisson using an argument that is quicker than the one given above. Now we know that N is a Poisson process with rate 3, so E[N(t)] = 3t and Var(N(t)) = 3t.
(e) The probability that no calls arrive in a 10 minute interval is just

Pr(N (10) = 0) = e−30 .

(f) The probability that at least two calls arrive in a 10 minute interval
is just
Pr(N (10) ≥ 2) = 1 − 31e−30 .

(g)

    Pr(N(10) = 2, N(15) − N(5) = 3)
    = Pr(N(5) = 2, N(10) − N(5) = 0, N(15) − N(10) = 3)
      + Pr(N(5) = 1, N(10) − N(5) = 1, N(15) − N(10) = 2)
      + Pr(N(5) = 0, N(10) − N(5) = 2, N(15) − N(10) = 1)
    = (15^2 e^(−15)/2!)(e^(−15)/0!)(15^3 e^(−15)/3!) + (15 e^(−15)/1!)(15 e^(−15)/1!)(15^2 e^(−15)/2!)
      + (e^(−15)/0!)(15^2 e^(−15)/2!)(15 e^(−15)/1!).

Using the fact that a Poisson process has independent increments to evaluate each probability gives the result.
(h)

Pr(N (5) = 4 | N (10) = 4) = Pr(N (5) = 4, N (10) = 4)/P (N (10) = 4)


= Pr(N (5) = 4, N (10) − N (5) = 0)/P (N (10) = 4) = (1/2)4 .

(i)

Pr(N (2) ≤ 1, N (4) = 3) = Pr(N (2) = 0, N (4) = 3) + Pr(N (2) = 1, N (4) = 3)


= Pr(N (2) = 0, N (4) − N (2) = 3) + P (N (2) = 1, N (4) − N (2) = 2)
= 6^2 e^(−12) + 6^3 e^(−12)/2.
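A short R sketch (only a numerical check) confirms the merging argument of part (d) and a couple of the probabilities above:

    t <- 2; k <- 0:10
    conv <- sapply(k, function(kk)
      sum(dpois(0:kk, t) * dpois(kk - 0:kk, 2 * t)))   # convolution of the two processes
    max(abs(conv - dpois(k, 3 * t)))                   # essentially 0: N(t) is Poisson(3t)
    ppois(1, 10)                                       # (a): 11 e^(-10)
    dpois(0, 30)                                       # (e): e^(-30)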

5. The arrival rate is

    λ(t) = 10              for 9 ≤ t ≤ 11,
         = 10(t − 10)      for 11 < t ≤ 12,
         = 15 + 5(13 − t)  for 12 < t ≤ 13,
         = 15              for 13 < t ≤ 17.

(a) λ(12.5) = 17.5.


(b) E[N(8)] = ∫_9^17 λ(s) ds = 225/2.
(c) The number of arrivals between 11:30 and 11:45 is Poisson distributed with mean ∫_11.5^11.75 λ(s) ds = ∫_11.5^11.75 10(s − 10) ds = 4.0625.
(d) The probability of k arrivals between 11:30 and 11:45 is therefore

    (∫_11.5^11.75 λ(s) ds)^k e^(−∫_11.5^11.75 λ(s) ds) / k! = (4.0625)^k e^(−4.0625) / k!.

(e) To do this, note that (here minutes need to be changed into hours, e.g. 10 min = 1/6 hour)

    Pr(T > 1/6) = P(N(12 + 1/6) − N(12) = 0)

and

    Pr(T > 1/3) = P(N(12 + 1/3) − N(12) = 0).

Then

    Pr(T > 1/6) = e^(−∫_12^(12+1/6) (15 + 5(13 − t)) dt) = e^(−235/72).

Here T is not an exponential random variable, since it is not memoryless (try comparing Pr(T > 15 | T > 5) with Pr(T > 10), with times in minutes).
6. Let S1 be the remaining service time of customer 1, S2 the service time of customer 2, A3 the arrival time of customer 3, and S3 the service time of customer 3. S1, S2, S3 are independent, identically distributed exponential random variables with mean 4, and A3 is an exponential random variable with mean 0.5.
(a) E[S1 + S2 ] = E[S1 ] + E[S2 ] = 8.
(b) The event that the second call leaves an empty system is E1 = {S1 + S2 < A3}, and

    P(A3 > S1 + S2) = P(A3 > S1 + S2 | A3 > S1) P(A3 > S1)
                    = P(A3 > S2) P(A3 > S1)
                    = (1/9)^2.

(c) The expected leaving time of the 3rd customer is

    E[max(S1 + S2, A3)] + E[S3]
    = E[S1 + S2] P(S1 + S2 > A3) + E[A3] P(S1 + S2 < A3) + E[S3]
    = 8 (1 − (1/9)^2) + 0.5 (1/9)^2
    = 7.907 minutes.

(d) The event that the second call is the only call in system after 5
minutes is denoted by the event E2 = {S1 < 5, S1 + S2 > 5, A3 > 5}.
Let λ1 = 1/4. Then we have

    P(S1 < 5, S1 + S2 > 5) = ∫_0^5 λ1 e^(−λ1 x) e^(−λ1 (5−x)) dx = 5 λ1 e^(−5 λ1) = (5/4) e^(−5/4),
    P(A3 > 5) = e^(−5λ) = e^(−10),
    P(E2) = (5/4) e^(−11.25).
7. (a) Let N (t) be the total number of calls by minute t, then N (t) has rate
λ = 2 calls per minute:
P {N (2) = 0} = e−2λ = e−4 .

(b)

    P{N(2) = 1 and N(4) − N(1) = 3}
    = P{N(1) = 1, N(2) − N(1) = 0, and N(4) − N(2) = 3}
      + P{N(1) = 0, N(2) − N(1) = 1, and N(4) − N(2) = 2}
    = λe^(−λ) × e^(−λ) × (2λ)^3 e^(−2λ)/6 + e^(−λ) × λe^(−λ) × (2λ)^2 e^(−2λ)/2
    = λ^3 e^(−4λ) ((4/3)λ + 2)
    = (112/3) e^(−8)
    = 0.012524
(c) Let NM(t), NF(t) be the total numbers of calls from male and female customers respectively by minute t. Then NM(t) and NF(t) are independent Poisson processes with rates 1/2 and 3/2 respectively:

    P{NM(2) = 1 and NF(2) = 2} = 1·e^(−1) × (3^2/2!) e^(−3) = (9/2) e^(−4).
(d) Let Xi be the interarrival time between the (i − 1)th and the ith female customer, which follows an exponential distribution with rate 3/2:

    E[X1 + X2] = E[X1] + E[X2] = 2 × 2/3 = 4/3 minutes.
(e)
P {N (6) < 2} = P {N (6) = 0}+P {N (6) = 1} = e−12 +12×e−12 = 13e−12

8. Let N (t) be the non-homogeneous Poisson process. Using one passenger


per minute as the unit (with clock time t measured in hours), the arrival rate is

    λ(t) = 1        for 10am ≤ t ≤ 11am,
    λ(t) = t − 10   for 11am < t ≤ noon.
(a) P (N (10 : 02) − N (10 : 00) ≥ 1) = 1 − P (N (10 : 02) − N (10 : 00) =
0) = 1 − e−2 .
(b) The probability that 2 arrivals between 10:00am to 10:10am and 1
arrival between 10:05am to 10:15am is
P (N (10 : 05) − N (10 : 00) = 2, N (10 : 10) − N (10 : 05) = 0,
N (10 : 15) − N (10 : 10) = 1)
+P (N (10 : 05) − N (10 : 00) = 1, N (10 : 10) − N (10 : 05) = 1,
N (10 : 15) − N (10 : 10) = 0)
    = (5^2 e^(−5)/2!)(5^0 e^(−5)/0!)(5^1 e^(−5)/1!) + (5^1 e^(−5)/1!)(5^1 e^(−5)/1!)(5^0 e^(−5)/0!)
    = (175/2) e^(−15).

(c) The expected number of arrivals between 10:50am and 11:10am is equal to

    ∫_10:50^11:10 λ(t) dt = 125/6.
(d) The probability that there are exactly 10 arrivals between 10:50am and 11:10am is

    (125/6)^10 e^(−125/6) / 10!.
(e) The probability that the 1st passenger after 10am takes at least 2 minutes to arrive is

    P(N(10:02) − N(10:00) = 0) = e^(−2).


(f) The probability that the 1st passenger after 11am takes at least 2 minutes to arrive is

    P(N(11:02) − N(11:00) = 0) = e^(−∫_11:00^11:02 λ(t) dt) = e^(−61/30).

9. (a) If we use time units of minutes:

    λ(t) = t/60   for 0 ≤ t ≤ 60 min,
    λ(t) = 1      for 60 ≤ t ≤ 480 min.

(b) The arrival process N(t) between 9am and 9:10am is a non-homogeneous Poisson process whose average rate over that interval is 1/12 call per minute, so we can view N(9am, 9:10am) as a Poisson random variable with mean 10 × 1/12 = 5/6 calls. The mean can also be calculated as

    ∫_0^10 λ(t) dt = ∫_0^10 t/60 dt = (1/60)(t^2/2) |_0^10 = (1/60)(1/2)(100) = 5/6.

So the probability of exactly 5 calls between 9am and 9:10am is

    P(N(9am, 9:10am) = 5) = (5/6)^5 e^(−5/6) / 5!.
(c) It is equivalent to computing the probability that there is no arrival between 9am and 9:20am. Similarly to part (b), the average rate is 1/6 call per minute, so we can view N(9am, 9:20am) as a Poisson random variable with mean 10/3 calls. Thus the required probability is

    P(N(9am, 9:20am) = 0) = e^(−10/3).

(d) The rate after 11am is always 1 call per minute. We are calculating the probability of the intersection of two events. There are two possibilities for the arrival time of the one call in 11:00am–11:05am: either it arrives between 11:00am and 11:03am, or it arrives between 11:03am and 11:05am.

P(exactly one call between 11:00am and 11:05am and at least two calls between 11:03am and 11:06am)
= P(exactly one call in 11:00–11:03 and no call in 11:03–11:05 and at least two calls in 11:05–11:06)
  + P(no call in 11:00–11:03 and one call in 11:03–11:05 and at least one call in 11:05–11:06).

Denote by N(x) a Poisson random variable with mean x. By the independent-increments property of the non-homogeneous Poisson process, the above is equal to

    P(N(3) = 1) P(N(2) = 0) P(N(1) ≥ 2) + P(N(3) = 0) P(N(2) = 1) P(N(1) ≥ 1)
    = 3e^(−3) · e^(−2) · (1 − 2e^(−1)) + e^(−3) (2e^(−2)) (1 − e^(−1))
    = 5e^(−5) − 8e^(−6).

10. (a) Let M, J denote Mary's and John's service times, and let µM, µJ denote Mary's and John's service rates, with µM = 1/2 and µJ = 1/4.

    Pr{M > J} = µJ/(µM + µJ) = (1/4)/(1/2 + 1/4) = 1/3.

Your expected time in the system is

    E[System] = E[W] + Pr{M > J} (1/µJ) + Pr{M < J} (1/µM)
              = 4/3 + (1/3)(4) + (2/3)(2) = 4 minutes.

(b) Since min(M, J) ∼ Exp(µM + µJ),

    Pr{min(M, J) < 2} = 1 − e^(−3/2).

Alternatively, you can think of the service completion process as a Poisson process with rate 3/4. We are looking for the probability that the number of service completions in 2 minutes is greater than 0. If we denote the number of service completions in time t by N(t),

    Pr{N(2) ≥ 1} = 1 − Pr{N(2) = 0} = 1 − e^(−3/2).

(c)

    Pr{N(2) ≥ 2} = 1 − Pr{N(2) = 0} − Pr{N(2) = 1} = 1 − e^(−3/2) − (3/2)e^(−3/2) = 1 − (5/2)e^(−3/2).

11. (a) Denote the time until next sedan arrives by TS and the time until
next truck arrives by TT .

    Pr{TT < TS} = λT/(λT + λS) = 60/(60 + 30) = 2/3.

(b) Let us use minute as our time unit here. Sedan’s arrival rate is .5
per minute.

    Pr{NS(10) ≥ 3} = 1 − (Pr{NS(10) = 0} + Pr{NS(10) = 1} + Pr{NS(10) = 2})
                   = 1 − (5^0 e^(−5)/0! + 5^1 e^(−5)/1! + 5^2 e^(−5)/2!)
                   = 1 − (e^(−5) + 5e^(−5) + 12.5e^(−5)) = 1 − 18.5 e^(−5).

(c) Truck’s arrival rate is 1 per minute, so the total arrival rate of any
type of vehicles is 1.5 per minute.

    Pr{N(1) = 0} = 1.5^0 e^(−1.5)/0! = e^(−1.5).

(d) min(TT , TS ) ∼ Exp(λT + λS ), so E[min(TT , TS )] = 1/(λT + λS ) =


1/(1 + .5) = 2/3 minutes.
(e) Let us use hour as our time unit here. The area under the arrival
rate graph between 7:30am and 8:30am is 30 vehicles.

    Pr{NM(7.5, 8.5] = 30} = 30^30 e^(−30) / 30!.

(f)

Pr{NM (7.5, 8.17] = 2, NM (7.83, 8.5] = 3}


=Pr{NM (7.5, 7.83] = 0, NM (7.83, 8.17] = 2, NM (8.17, 8.5] = 1}
+Pr{NM (7.5, 7.83] = 1, NM (7.83, 8.17] = 1, NM (8.17, 8.5] = 2}
+Pr{NM (7.5, 7.83] = 2, NM (7.83, 8.17] = 0, NM (8.17, 8.5] = 3}
=Pr{NM (7.5, 7.83] = 0}Pr{NM (7.83, 8.17] = 2}Pr{NM (8.17, 8.5] = 1}
+Pr{NM (7.5, 7.83] = 1}Pr{NM (7.83, 8.17] = 1}Pr{NM (8.17, 8.5] = 2}
+Pr{NM (7.5, 7.83] = 2}Pr{NM (7.83, 8.17] = 0}Pr{NM (8.17, 8.5] = 3}
    = [(20/3)^0 e^(−20/3)/0!] [10^2 e^(−10)/2!] [(40/3)^1 e^(−40/3)/1!]
      + [(20/3)^1 e^(−20/3)/1!] [10^1 e^(−10)/1!] [(40/3)^2 e^(−40/3)/2!]
      + [(20/3)^2 e^(−20/3)/2!] [10^0 e^(−10)/0!] [(40/3)^3 e^(−40/3)/3!].

6.5 Continuous Time Markov Process


1. (a) The rate transition diagram on {A, B, C} has arcs A → B with rate 4, A → C with rate 8, B → A with rate 5, B → C with rate 1, and C → A with rate 2 (diagram omitted here).

(b) In R, try running the following command.


library(Matrix);G=matrix(c(-12,4,8, 5,-6,1, 2,0,-2),
3,3,byrow=TRUE);expm(0.2*G)

 
.2299 .1805 .5896
P (0.2) = e0.2G = .2383 .3989 .3628
.1411 .0508 .8081

(c)
 
.1673 .1131 .7196
P (1.0) = eG = .1688 .1176 .7137
.1662 .1097 .7241

(d)
 
.1669 .1119 .7212
P (1.2) = P (1.0)P (0.2) = .1675 .1136 .7189
.1665 .1105 .7230

and so Pr{X(1.2) = C | X(0) = A} = 0.7212.


 
.1667 .1111 .7222
P (2.0) = P (1.0)P (1.0) = .1667 .1112 .7221
.1667 .1111 .7222

and so Pr{X(3.0) = A | X(1.0) = B} = 0.1667.


(e)
 
.1667 .1111 .7222
P (5.0) = e5G = .1667 .1111 .7222
.1667 .1111 .7222

The Markov Chain has reached steady-state by time 5.



2. (a) This process can be modeled as follows: here our state space is
S = {(0, 0), (1, 0), (1, 1), (0, 1)} = {0, 1, 2, 3} where a state (i, j) cor-
responds to i customers in server 1 and j customers in server 2. The
generator is as follows (the rate transition diagram can be drawn
immediately after knowledge of the generator):

 
−10 10 0 0
 6 −16 10 0 
G= 
 0 4 −10 6 
4 0 10 −14

(b) To find the stationary distribution, we need to again solve the system
of balance and normalizing equations. From the generator, we see
that

10p0 =6p1 + 4p3


16p1 =10p0 + 4p2
10p2 =10p1 + 10p3
14p3 =6p2
1 =p0 + p1 + p2 + p3

Solving this, we find that p0 = 18/88, p1 = 20/88, p2 = 35/88, and p3 = 15/88.

(c) The long-run fraction of time that server 1 is busy is p1 + p2 = 55/88. The long-run fraction of time that server 2 is busy is p2 + p3 = 50/88.
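As a numerical check (a minimal R sketch), the balance equations can be solved directly from the generator:

    G <- matrix(c(-10, 10, 0, 0,
                  6, -16, 10, 0,
                  0, 4, -10, 6,
                  4, 0, 10, -14), 4, 4, byrow = TRUE)
    A <- t(G); A[4, ] <- 1                  # replace one balance equation by the normalization
    p <- solve(A, c(0, 0, 0, 1))            # (18, 20, 35, 15)/88
    p
    c(server1 = p[2] + p[3], server2 = p[3] + p[4])   # 55/88 and 50/88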

3. The arrival rate is λ = 100/60 = 5/3 per min; the service rate is µ = 1/4
and the reneging rate is γ = 1/8. The state space is S = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}. The rate diagram (omitted here) connects these six states using the arrival rate λ, the service rate µ, and the combined rate µ + γ where a waiting customer may also renege.

4. For all the sub-problems below, the state space will always be {0, 1, 2, 3}.

(a) The rate diagram is a birth-death diagram with arrival (up) rate 2 between consecutive states and service (down) rate 1 from each of states 1, 2, 3 (diagram omitted here).

Solving for the stationary distribution using “cuts”, we get π = (1/15, 2/15, 4/15, 8/15). The throughput is λeff = 2(1 − π3) = 14/15. To find the average waiting time, we use Little's law. Note that L = 1·π2 + 2·π3 = 20/15, so W = L/λeff = 10/7.
(b) The rate diagram is as in (a) except that the service (down) rates from states 1, 2, 3 are 1, 1 + 1, 1 + 1 (diagram omitted here).

Solving for the stationary distribution using “cuts”, we get π = (1/7, 2/7, 2/7, 2/7). The throughput is λeff = 2(1 − π3) = 10/7. To find the average waiting time, we use Little's law. Note that L = 1·π3 = 2/7, so W = L/λeff = 1/5.
162 CHAPTER 6. EXERCISE ANSWERS

(c) The rate diagram is as in (a) except that the service (down) rates from states 1, 2, 3 are 1, 1 + 1, 1 + 1 + 1 (diagram omitted here).

Solving for the stationary distribution using “cuts”, we get π = (3/19, 6/19, 6/19, 4/19). The throughput is λeff = 2(1 − π3) = 30/19. To find the average waiting time, we use Little's law. Note that L = 0, so W = L/λeff = 0.
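The three cases can be checked with a short R sketch that applies the cut equations for a birth-death chain (a numerical cross-check only):

    bd <- function(down) {                   # down = service (death) rates from states 1, 2, 3
      ratio <- 2 / down                      # pi_{n+1} / pi_n from the cuts, birth rate 2
      piv <- cumprod(c(1, ratio))
      piv <- piv / sum(piv)
      list(pi = piv, lambda_eff = 2 * (1 - piv[4]))
    }
    a <- bd(c(1, 1, 1)); b <- bd(c(1, 2, 2)); d <- bd(c(1, 2, 3))
    (a$pi[3] + 2 * a$pi[4]) / a$lambda_eff   # case (a): Wq = 10/7
    b$pi[4] / b$lambda_eff                   # case (b): Wq = 1/5
    d$pi; d$lambda_eff                       # case (c): (3, 6, 6, 4)/19 and 30/19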

5. (a) Let λi be the total arrival rate to station i, i = 1, 2, 3. Then we have


the following traffic equations:

λ1 =1 + (0.1)λ2 + (0.05)λ3 ,
λ2 =λ1 + (0.1)λ3 ,
λ3 =(0.9)λ2 .

Solving the equations, one has λ1 = 1.1895, λ2 = 1.3072 and λ3 =


1.1765. Thus,

ρ1 = λ1 m1 = 0.9516, ρ2 = 0.9150, ρ3 = 0.9412.

Let Xi (t) be the number of jobs at station i at time t. Then the


long-run fraction of time that there are 2 jobs at station 1, 1 job at
station 2 and 4 jobs at station 3 is equal to

Pr{X1 (∞) = 2, X2 (∞) = 1, X3 (∞) = 4} =Pr{X1 (∞) = 2}Pr{X2 (∞) = 1}Pr{X3 (∞) = 4}
= (1 − ρ1)ρ1^2 (1 − ρ2)ρ2 (1 − ρ3)ρ3^4
=1.5718 × 10−4 .

(b) ρ3 /(1 − ρ3 ) = 16.0


(c) The total long-run average system size is

L = ρ1 /(1 − ρ1 ) + ρ2 /(1 − ρ2 ) + ρ3 /(1 − ρ3 ) = 46.4449 jobs.

Using Little’s law, L = λW , the average time in system per job is


46.4449 hours, since the arrival rate into the system is 1 job per hour.
(d) The average size of the system is L = 21.985 jobs. Thus, the average time in system per job is 21.985 hours. A 5% reduction of the rework cuts the average cycle time to less than half. This is consistent with Goldratt's theory of managing bottlenecks: a small improvement at a bottleneck machine yields a great reduction in cycle time.
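As a numerical cross-check, the traffic equations can be solved in R; the mean service times m below are the values implied by the utilizations quoted in part (a):

    R <- matrix(c(0, 1, 0,                   # routing probabilities read off the traffic equations
                  .1, 0, .9,
                  .05, .1, 0), 3, 3, byrow = TRUE)
    alpha <- c(1, 0, 0)
    lambda <- solve(diag(3) - t(R), alpha)   # (1.1895, 1.3072, 1.1765)
    m <- c(0.8, 0.7, 0.8)                    # mean service times implied by the rho's above
    rho <- lambda * m                        # (0.9516, 0.9150, 0.9412)
    sum(rho / (1 - rho))                     # about 46.44 jobs, so W is about 46.44 hours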

6. (a) Let γi be the throughput rate going through station i. Then we have
traffic equations:

γ1 = (0.1)γ2 + γ3 , (6.3)
γ2 = γ1 , (6.4)
γ3 = (0.9)γ2 . (6.5)

Equations (6.4) and (6.5) imply equation (6.3). Thus, equation (6.3)
is redundant. (In general, in a closed network, there is always a
redundant traffic equation.) Thus, one cannot solve γ from (6.3)-
(6.5). However, if we can find γ1 by using some method, then γ2 and
γ3 follow readily.
For the moment, we set
γ1 = 1. (6.6)
(Setting γ1 to any other positive number is OK.) Then γ2 = 1 and γ3 = 0.9. (Remember these are not true throughput rates.) With γ set, one can define

ρ1 = γ1 m1 = 0.8, ρ2 = 0.7, ρ3 = 0.72.

(Warning: these ρ's are not the true utilizations of the machines because the γ's are not the true throughputs. But at least we know machine 1 is the most heavily utilized machine.)
Let Xi (t) be the number of jobs at station i at time t. Then

    Pr{X1(∞) = 2, X2(∞) = 0, X3(∞) = 0} = C ρ1^2 ρ2^0 ρ3^0
    Pr{X1(∞) = 1, X2(∞) = 1, X3(∞) = 0} = C ρ1^1 ρ2^1 ρ3^0
    Pr{X1(∞) = 1, X2(∞) = 0, X3(∞) = 1} = C ρ1^1 ρ2^0 ρ3^1
    Pr{X1(∞) = 0, X2(∞) = 1, X3(∞) = 1} = C ρ1^0 ρ2^1 ρ3^1
    Pr{X1(∞) = 0, X2(∞) = 2, X3(∞) = 0} = C ρ1^0 ρ2^2 ρ3^0
    Pr{X1(∞) = 0, X2(∞) = 0, X3(∞) = 2} = C ρ1^0 ρ2^0 ρ3^2

These probabilities have to add up to 1. Thus, we have

C = 0.305. (6.7)

Hence,

Pr{X1 (∞) = 0, X2 (∞) = 1, X3 (∞) = 1} = 0.1537, (6.8)


Pr{X1 (∞) = 0, X2 (∞) = 2, X3 (∞) = 0} = 0.1495, (6.9)
Pr{X1 (∞) = 0, X2 (∞) = 0, X3 (∞) = 2} = 0.1581. (6.10)

Note that these probabilities do not depend on the choice of γ1 in (6.6). If one changes the choice of γ1, C in (6.7) will change accordingly. From (6.8)–(6.10), the long-run fraction of time that machine 1 is idle is given by 0.1495 + 0.1537 + 0.1581 = 0.461. Thus, the utilization of machine 1 is ρ∗1 = 1 − 0.461 = 0.539. (This is the true
utilization of machine 1.) Therefore, the true throughput at station
1 is γ1∗ = 0.539/m1 = 0.539/0.8 = .6737, and hence γ2∗ = 0.6737 and
γ3∗ = (0.9)γ2∗ = .6064. Thus, the production rate of the system is
0.6064 jobs per hour. That is also the rate at which new jobs are
introduced into the system. Using Little’s law, L = λW , we have
that the average time in system per job is 2/0.6064 = 3.2982 hours.
By the way, the true utilizations of machines are

ρ∗1 = 0.539,   ρ∗2 = γ2∗ m2 = 0.4716,   ρ∗3 = γ3∗ m3 = 0.4851.

(b) No. To double γ3∗ , one needs to double γ1∗ , and hence double ρ∗1 .
Currently, machine 1 is more than 50% utilized, and one cannot
double it.
7. (a) Solve the following equations for traffic density,

λ1 = α + 0.5λ2
λ2 = λ1 + 0.5λ3
λ3 = 0.5λ2

Then λ1 = 3, λ2 = 4, λ3 = 2, and the long-run average utilizations for the 3 stations are:

ρ1 = λ1 m1 = 0.9, ρ2 = λ2 m2 = 0.8, ρ3 = λ3 m3 = 0.9.

(b)
(1 − ρ1)ρ1^2 (1 − ρ2)ρ2^3 (1 − ρ3)ρ3^3 = 0.0006
(c) ρ2/(1 − ρ2) = 0.8/0.2 = 4.
(d) Throughput is equal to
0.5λ3 = 1.
(e) The long-run average number of customers in the system is equal to

    L = ρ1/(1 − ρ1) + ρ2/(1 − ρ2) + ρ3/(1 − ρ3) = 0.9/0.1 + 0.8/0.2 + 0.9/0.1 = 22.

By Little's law, the long-run average time in system is equal to L/α = 22 minutes.
8. (a) Each row of G sums to 0.
 
−4 3 1
G= 1 −3 2 .
1 1 −2

(b) Let P (t) be the transition probability matrix, with Pij (t) = P (Xt =
j|X0 = i). Then we have
P (t) = etG .
This implies
Pr{X(2) = 3|X(1) = 2} = P (1)23 = expm(G)23 = 0.44407.

(c) By the Markov property, we obtain


Pr{X(1) = 2, X(3) = 3|X(0) = 2} = P (1)22 P (2)23
= {eG }22 {e2G }23
= 0.35727 × 0.4499
= 0.1607.

(d) Solving (π1 , π2 , π3 )G = 0, we get stationary distribution π = (.2, .35, .45).


(e) Pr{X(200) = 3|X(1) = 2} = π3 = .45.
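The same expm command used in Problem 1 gives a quick numerical check of parts (b)–(e):

    library(Matrix)
    G <- matrix(c(-4, 3, 1,
                  1, -3, 2,
                  1, 1, -2), 3, 3, byrow = TRUE)
    P1 <- as.matrix(expm(G)); P2 <- as.matrix(expm(2 * G))
    P1[2, 3]                                 # about 0.444, part (b)
    P1[2, 2] * P2[2, 3]                      # about 0.161, part (c)
    A <- t(G); A[3, ] <- 1
    solve(A, c(0, 0, 1))                     # stationary distribution (.2, .35, .45)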
9. (a) Let X(t) denote the total number of customers in the system, includ-
ing those in service and waiting. The state space is {0, 1, 2, 3, 4} as
there are 4 phone lines in total.
State 0 means that there is no one in the system. State 1 and 2 mean
that there is (are) 1 and 2 customer(s) in service, respectively. State
3 and 4 mean that there are 2 customers in service, with 1 and 2
customer(s) waiting, respectively.
Let λ = 2 denote the arrival rate, and µ = 1 denote the service rate. The rate diagram is a birth-death diagram on {0, 1, 2, 3, 4} with birth rate λ out of each of states 0–3 and death rates µ, 2µ, 2µ, 2µ out of states 1, 2, 3, 4 (figure omitted here).
(b) Let πi denote the long-run fraction of time that there are i customers
in the system, i = 0, 1, . . . , 4. The system equation is as follow:
λπ0 = µπ1
λπ1 = 2µπ2
λπ2 = 2µπ3
λπ3 = 2µπ4
and Σ_{i=0}^4 πi = 1. Solving the above equations, we have

    (π0, π1, π2, π3, π4) = (1/9, 2/9, 2/9, 2/9, 2/9)
                         = (0.1111, 0.2222, 0.2222, 0.2222, 0.2222).
The desired probability is π2 = 2/9 = 0.2222.
(c) The long-run average number of waiting calls equals

    Lq = π3 + 2π4 = 2/3 = 0.6667.

(d) The throughput equals the unblocked arrival rate

    (1 − π4) λ = 14/9 = 1.5556.

It is also equal to the rate at which completed calls leave the call center, i.e.,

    µπ1 + 2µ(π2 + π3 + π4) = 14/9 = 1.5556.
9
(e) Using Little's law with the accepted (unblocked) arrival rate, the average waiting time of an accepted call equals

    Wq = Lq / ((1 − π4) λ) = (2/3)/(14/9) = 3/7 ≈ 0.4286.
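A short R sketch (a cross-check of parts (b)–(e)) computes the stationary distribution of this birth-death chain and the resulting performance measures:

    lambda <- 2; mu <- 1
    piv <- cumprod(c(1, lambda / c(mu, 2*mu, 2*mu, 2*mu)))
    piv <- piv / sum(piv)                    # (1, 2, 2, 2, 2)/9
    Lq <- piv[4] + 2 * piv[5]                # 2/3 waiting calls
    thru <- lambda * (1 - piv[5])            # 14/9 accepted calls per minute
    Lq / thru                                # Wq = 3/7 minutes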

10. (a) Solving the traffic equations

    λ1 = 2 + .12 λ2,
    λ2 = λ1,

we have λ2 = λ1 = 2/.88 = 25/11. Thus,

    ρ1 = ρ2 = .4 (25/11) = 10/11.

Therefore, the long-run average utilization of machine 1 is 10/11, and the long-run average utilization of machine 2 is 10/11.
(b) (1 − ρ1) ρ1^2 (1 − ρ2) ρ2^3.
(c) The long-run average number of jobs at station 2 is ρ2/(1 − ρ2) = 10.
(d) The long-run average number of jobs in the system is 10 + 10 = 20. By Little's law, the average time in system per job is 20/α = 20/2 = 10 minutes.

11. (a) Let X(t) be the number of machines that are up at time t. Then the state space is {0, 1, 2, 3}; state i means that i machines are up, i = 0, 1, 2, 3. The rate diagram is a birth-death diagram with repair (up) rates 2, 2, 1 from states 0, 1, 2 and failure (down) rates 1/3, 2/3, 1 from states 1, 2, 3 (diagram omitted here).
We have the following balance equations for the stationary distribu-
tion π:

2π0 = (1/3)π1 ,
2π1 = (2/3)π2 ,
π2 = π3 ,
π0 + π1 + π2 + π3 = 1.

Solving these equations, we have

π = (1/22, 3/22, 9/22, 9/22).

Thus, the long-run fraction of time that all machines that are up is
9/22.
(b)

0π0 +4π1 +8π2 +12π3 = 12/22+72/22+108/22 = 192/22 = 96/11 jobs per minute.

12. (a) Arrival rate from outside is 12 parts per minute. Denote the through-
puts of each station by λ1 , λ2 , λ3 , respectively. Assuming that the
traffic intensity of each station is less than 1, the arrival rate for
each station should be the same as the corresponding throughput
λ1 , λ2 , λ3 . At each confluence, set up an equation that balances in-
coming and outgoing rates.

12 + λ2 =λ1
(1/3)λ1 =λ2
(2/3)λ1 =λ3

Solving these equations, obtain λ1 = 18, λ2 = 6, λ3 = 12 parts per


minute. For station 1, the arrival rate is λ1 = 18 and the service rate
is µ1 = 1/m1 = .5 per second= 30 per minute. Hence, ρ1 = λ1 /µ1 =
18/30 = 0.6 and the utilization of station 1 is 60%.
(b) Each station is M/M/1, so the stationary distribution for each station
is (1 − ρk )ρik where k is the station number and i is the number of
people in the station. Hence, the answer is

(1 − ρ1 )ρ01 × (1 − ρ2 )ρ02 × (1 − ρ3 )ρ03


=(1 − ρ1 )(1 − ρ2 )(1 − ρ3 ) = (1 − .6)(1 − .3)(1 − .8) = (.4)(.7)(.2) = 0.056

(c)

    (.6/.4) + (.3/.7) + (.8/.2) = (3/2) + (3/7) + 4 = (21 + 6 + 56)/14 = 83/14 parts.

(d) Using Little’s Law, W = L/λ, compute the average waiting time for
each station. Average number of people in the queue for each station
is L1 = .62 /.4 = .9, L2 = .32 /.7 = 9/70, L3 = .82 /.2 = 3.2. Hence,
average waiting time for each station is W1 = .9/18 = .05, W2 =
(9/70)/6 = 3/140, W3 = 3.2/12 = 4/15.

    W = 1/20 + 3/140 + 4/15 = (21 + 9 + 112)/420 = 142/420 = 71/210.
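As a numerical check of parts (b)–(d) (a small R sketch; the service rates below are those implied by the utilizations .6, .3, .8):

    lambda <- c(18, 6, 12)                   # station throughputs, parts per minute
    mu <- c(30, 20, 15)                      # service rates implied by rho = (.6, .3, .8)
    rho <- lambda / mu
    prod(1 - rho)                            # (b): 0.056
    sum(rho / (1 - rho))                     # (c): 83/14 parts
    sum((rho^2 / (1 - rho)) / lambda)        # (d): 71/210 minutes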

(e) It should be the same as the arrival rate for the entire system, since no station's traffic intensity is greater than or equal to 1. The answer is that the throughput of the entire factory is 12 parts per minute.
13. (a) M/M/∞, because we have an infinite number of servers and infinite system capacity.
(b) The state space is S = {0, 1, 2, · · · }. The rate diagram is a birth-death diagram with arrival (up) rate 6 out of every state and departure (down) rate 2n out of state n, i.e., rates 2, 4, 6, 8, . . . out of states 1, 2, 3, 4, . . . (diagram omitted here).

(c) Set λ = 6, µ = 2, so ρ = 3.

    π1 = ρ π0,
    π2 = (ρ/2) π1 = (ρ^2/2!) π0,
    π3 = (ρ/3) π2 = (ρ^3/3!) π0,
    ...

Hence,

    Σ_{i=0}^∞ πi = π0 (1 + ρ + ρ^2/2! + ρ^3/3! + · · ·) = π0 e^ρ = 1,  ∴ π0 = e^(−ρ) = e^(−3).

In sum,

    πn = (3^n/n!) e^(−3)   for n = 0, 1, 2, · · · .

(d) 1 − π0 = 1 − e−3
(e) ($100)(1 − π0 ) + (−$50)π0 = ($100)(1 − e−3 ) − ($50)e−3 = (100 −
150e−3 ) dollars
