Probability Foundations for Engineers
Joel A. Nachlas

"… responds to a need that I felt some years ago, which is to provide a basic and direct presentation of probability to engineers."
—Enrico Zio, Politecnico di Milano, Dipartimento Energia, Milano, Italy

"… an excellent introductory book on probability for engineers …"
—Edward A. Pohl, University of Arkansas, Fayetteville, USA

"… most of the literature on probability. … introduces the reader in the field of randomness in a nice way. … creates a solid foundation to build up knowledge … The strength of the book is that it presents and translates the intuition concerning probability into mathematical structures using examples and explanations rather than the traditional approach of theorem and proof …"
—Prof. Uday Kumar, Luleå University of Technology, Sweden

ISBN: 978-1-4665-0299-4
www.crcpress.com
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made
to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all
materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in
any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, micro-
filming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.
copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-
8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that
have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identi-
fication and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface...................................................................................................................... ix
Author...................................................................................................................... xi
1. Introduction..................................................................................................... 1
1.1 Historical Perspectives......................................................................... 1
1.2 Formal Systems..................................................................................... 2
1.3 Intuition.................................................................................................. 3
Exercises............................................................................................................ 3
2. A Brief Review of Set Theory...................................................................... 5
2.1 Introduction........................................................................................... 5
2.2 Definitions.............................................................................................. 5
2.3 Set Operations....................................................................................... 7
2.4 Venn Diagrams...................................................................................... 8
2.5 Dimensionality..................................................................................... 10
2.6 Conclusion............................................................................................. 11
Exercises........................................................................................................... 11
3. Probability Basics.......................................................................................... 15
3.1 Random Experiments, Outcomes, and Events................................. 15
3.2 Probability............................................................................................. 17
3.3 Probability Axioms.............................................................................. 17
3.4 Conditional Probability....................................................................... 21
3.5 Independence....................................................................................... 25
Exercises........................................................................................................... 28
4. Random Variables and Distributions....................................................... 33
4.1 Random Variables................................................................................ 33
4.2 Distributions......................................................................................... 35
4.2.1 Probability Mass Functions................................................... 38
4.2.2 Probability Density Functions.............................................. 40
4.2.3 Survivor Functions................................................................. 41
4.3 Discrete Distribution Functions........................................................42
4.3.1 The Bernoulli Distribution....................................................43
4.3.2 The Binomial Distribution.....................................................44
4.3.3 The Multinomial Distribution.............................................. 47
4.3.4 The Poisson Distribution....................................................... 48
4.3.5 The Geometric Distribution.................................................. 49
4.3.6 The Negative Binomial Distribution.................................... 50
4.4 Continuous Distribution Functions.................................................. 52
4.4.1 The Exponential Distribution............................................... 53
4.4.2 The Gamma Distribution......................................................54
Joel Nachlas, Ph.D., has worked on the faculty of the Industrial and
Systems Engineering Department at Virginia Polytechnic Institute and State
University (Virginia Tech, Blacksburg) since 1974. He has served and con-
tinues to serve as the coordinator for the department’s Operations Research
faculty and curricula and is also the coordinator of the department’s inter-
national program. The foci of Dr. Nachlas’s research are the application of probability theory to reliability analysis and maintenance planning and the application of statistical methods to quality control. He earned a B.E.S. from Johns Hopkins
University (Baltimore, Maryland) in 1970 and an M.S. and Ph.D. from the
University of Pittsburgh (Pennsylvania) in 1974 and 1976, respectively. All
three of his degrees are in industrial engineering with a concentration in
operations research. Dr. Nachlas has received numerous awards for his
research including the 1991 P.K. McElroy Award and the 2004 Golomski
Award. He is also the editor of the Proceedings of the Annual Reliability and
Maintainability Symposium, a member of INFORMS, the Institute of Industrial
Engineers, and a fellow of both the American Society for Quality and the
Society of Reliability Engineers. He also serves as head coach of the Virginia
Tech men’s lacrosse team and was selected in 2001 as the U.S. Lacrosse MDIA
national coach of the year.
1
Introduction
Most people have an intuitive feel for probability. Many people play card
games—either for fun or for profit—and most start playing card games as
children. People also talk about weather in terms of probability. It is common
to speak of the chances of side effects associated with medications, and the
chances of automobile accidents, or of contracting communicable diseases.
These are just a few examples of the ways in which probability is a part of
our lives that we seem to understand well.
Paradoxically, most people confronted with the study of the mathemati-
cal representation and analysis of probability find this effort challenging
or worse. The question becomes one of translating our intuition concerning
probability into an understanding of the mathematical structure of the sub-
ject. The answer is far from clear. This text represents an attempt to support
the transition from intuition to mathematical rigor. The vehicle for promot-
ing the transition is explanation and example rather than theorem and proof.
As we proceed, readers are encouraged to reflect on the experiences they
have had with practical realizations of probability and the relationship of
those experiences to the topics described here.
The statements that parking lot P4 contains 240 places, that the United States
has had 44 presidents, and that a series of 8 coin tosses yielded 5 heads are
all statistics. They describe past experiences. Many people confuse the two
terms. We are studying probability.
1.3 Intuition
This chapter began with the comment that probability started as an intuitive
evaluation of future experiences. As you now undertake to study probability,
consider the following questions.
1. What do you think are the chances that you would see a blackjack
hand?
2. What is the probability that your car will survive until you graduate?
3. What is the probability that one of your classmates will die this year?
4. What is the probability that a tornado will damage your campus this
year?
5. For an arbitrary consumer product that you purchase this year, what
is the probability that it is defective?
Exercises
1.1 Describe an experience you have had with probability, possibly in
a game or betting context. Indicate how you analyzed the prob-
abilities involved.
1.2 How should we interpret the fact that a weather forecast indicates
a 60% chance of rain today and it does not rain?
1.3 Identify four events or activities that involve you today and are
subject to probability.
1.4 Suggest four engineering applications in which probability is an
important element.
2
A Brief Review of Set Theory
2.1 Introduction
The starting point for our study of probability is a review of the basic concepts of
the mathematical domain called set theory. The reason we start with set theory
is that it will provide a vehicle for organizing the elements of our probability
models. As implied in the name, set theory is a structured language for discuss-
ing “sets.” The initial formal definition of set theory was provided by George
Cantor in 1874. The objective of Cantor’s work and that of other mathemati-
cians working with set theory was to obtain an understanding of infinity. The
difficulty of this idea precipitated considerable debate among mathematicians
and ultimately led to the definition of the axiomatic system that we will use.
This chapter is called a review of set theory because many students who
undertake the study of probability have already encountered set theory in
earlier math courses. For those who are meeting set theory here for the first
time, the descriptions provided next should be sufficient. If not, many sup-
plementary resources are available in the library and on the Web.
A set is simply a collection of entities in which we are interested. The collec-
tion of interest might be all of the Ford sedans registered in Oregon this year,
the people in Pennsylvania receiving liver transplants this month, the red
face cards in a standard deck of poker cards, the engine bearings produced
in a particular plant today, the duration of Internet sessions, the hardness of
cutting tools, or the equity stocks included in your investment portfolio. This
list is intended to illustrate that the idea of a set is general. It can be applied to
any collection of things that we would like to discuss or analyze. The collec-
tion may include a finite number of members (elements) or an infinite num-
ber of members. The important aspect of a set is that it be clearly defined.
2.2 Definitions
It is conventional to represent a set by a capital letter. For example, the set of
Chevrolet Malibus registered in Florida could be represented as
M = {x | x is a Chevrolet Malibu with Florida tags}
Note that the capital M has been used to represent the set and that x has
been used to represent an element (or member) of the set. The vertical line
is read as “such that.” Thus, this set definition should be read as “M is the
set of members, x, such that x is a Chevrolet Malibu with Florida tags.” Note
further that braces “{ }” are used to specify the members of a set. If we wish
to analyze features of any group of items, the definition of the corresponding
set must make the identities of the elements clear.
For most applications, we anticipate that a set will have subsets. That is, sets
may contain groups of members that are subject to more specific identification
and can thus be organized into sets. For example, define the sets B and W as

B = {x | x ∈ M and x is blue}   and   W = {x | x ∈ M and x is white}

where the symbol ∈ is read as “is an element of” or “is in.” Thus the set B
is the set of elements of M that are blue (the set of blue Chevrolet Malibus
registered in Florida). We can see that the sets B and W are contained in the
set M and we represent this as B ⊂ M and W ⊂ M. In general, when every element of a set X is also an element of a set Y, we write

X ⊆ Y
This is read as “X is a subset of Y.” The distinction between this algebraic
statement and the ones provided for B and W is that it would be more correct
in those earlier cases to say B is a proper subset of M and W is a proper subset
of M. This means that the subset B does not exhaust M and similarly for W.
The conceptual parallel to the distinction in membership statements B ⊂ M
and X ⊆ Y is the numerical distinction we make between a < b and a ≤ b. In the
first case, equality is precluded while in the second case equality is possible.
In fact, observe that an implication of this notation is that

if X ⊆ Y and Y ⊆ X, then X = Y

Thus, if two sets simultaneously contain each other, they must be identical.
Regardless of the context within which we define sets, there are two sets that
are fundamental to our definitions and our analysis. These are the “universe”
of elements, the set of all of the possible elements we might discuss, and the
“null” set (or empty set), which contains no elements at all. In a probability
context, we will also refer to the universe as the “sample space” and will denote
it by an uppercase omega, Ω. We represent the empty set by the symbol ∅.
2.3 Set Operations

The first basic set operation is the union. Conceptually, a union is similar to arithmetic addition. The union of two sets A and B is defined as

A ∪ B = {x | x ∈ A or x ∈ B}   (2.1)
Thus, the union of two sets is a set of elements that is in at least one of the
sets. For the example of Chevrolet Malibus registered in Florida, B ∪ W is the
set of those cars each of which is either blue or white.
Conceptually, an intersection is similar to arithmetic multiplication. The
intersection of two sets is the set of elements that are in both sets. That is,

A ∩ B = {x | x ∈ A and x ∈ B}   (2.2)

For the Malibu example, let T denote the set of two-door Malibus and F the set of four-door Malibus registered in Florida. Then, B ∩ T represents the set of those cars each of which is blue and has two
doors. Notice that
T ∩ F = ∅
which is to say that the sets T and F have no common elements. Their intersec-
tion is the empty set. We say that these sets are disjoint or mutually exclusive.
The operators union and intersection permit us to describe sets and com-
binations of sets conveniently and efficiently. Fortunately, these operators
have the desirable properties that one often seeks in an algebraic operator.
A third basic set operation is complementation. The complement of a set A, denoted Ac, is defined as

Ac = {x | x ∈ Ω and x ∉ A}
That is, Ac is the set of elements of the universe that are not in the set A. Note
that the definition of the complement of a set permits us to state that
A ∩ A c = ∅ and A ∪ A c = Ω
Note also that the operations of union and intersection along with the defi-
nitions of the universe, the empty set, and the complement of a set are suf-
ficient for us to describe and analyze sets in any way we feel is informative.
This includes what we might call the difference in sets. Suppose we have
the sets
A ∩ Bc = {1,2,3}
FIGURE 2.1
Example of a Venn diagram showing the universe Ω = M and the subsets B, W, and T.
( A c )c = A (2.4)
Another is
E ∪ F = E ∪ (E c ∩ F ) (2.5)
Two other useful and widely used relationships are known as DeMorgan’s
laws:
( A ∪ B)c = A c ∩ Bc (2.6)
and
( A ∩ B)c = A c ∪ Bc (2.7)
FIGURE 2.2
Venn diagrams for DeMorgan’s laws.
A = ( A ∩ B) ∪ ( A ∩ Bc ) (2.8)
Observe that the two sets (A ∩ B) and (A ∩ Bc) are disjoint and the identity
applies even if one of the intersections is empty.
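The identities of this section are easy to check numerically for finite sets. The following sketch, written in Python with illustrative sets A, B, and a small universe of my own choosing (not taken from the text), verifies DeMorgan’s laws and the decomposition of Equation (2.8).

```python
# Verify DeMorgan's laws and A = (A ∩ B) ∪ (A ∩ B^c) with small finite sets.
universe = set(range(1, 11))          # an example universe Ω
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

def complement(s):
    """Complement of s relative to the universe."""
    return universe - s

# DeMorgan's laws, Equations (2.6) and (2.7)
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Decomposition of A, Equation (2.8), with the two pieces disjoint
assert A == (A & B) | (A & complement(B))
assert (A & B) & (A & complement(B)) == set()
print("All set identities hold for this example.")
```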
It is going to be useful to extend this idea to multiple sets so consider a col-
lection of sets say E1 , E2 , …, En that is defined so that the sets are pairwise dis-
joint and in total they exhaust the set, A, of which they are all subsets. That is,
Ei ∩ Ej = ∅,  for all i and all j ≠ i   (2.9)

and

E1 ∪ E2 ∪ … ∪ En = A   (2.10)

We say that the collection of sets E1, E2, …, En forms a partition of the set A. We can then use
the partition to state that
A = ( A ∩ E1 ) ∪ ( A ∩ E2 ) ∪ … ∪ ( A ∩ En ) (2.11)
If we think again of the set of Chevrolet Malibus registered in Florida, M,
the sets B and W along with the sets representing Florida-registered Malibus
of each other available color form a partition of M.
2.5 Dimensionality
The final aspect of sets we must consider is their size. The size of a set
is defined as the number of elements that are members of the set. Size is
referred to as the cardinality of the set. For a set, B, we represent the cardinal-
ity by ||B||, and generally we wish to know whether the cardinality of a set
is countable or uncountable. As the terms imply, a set is countable if one could
match the elements of the set with some or all of the natural numbers—one
could count them. If only some of the set of natural numbers is needed, the
set is a finite set. If all of the natural numbers are needed, then the set is
countably infinite. In both of these cases, the elements of the set are said to be
“discrete.” On the other hand, if a set has uncountably infinite cardinality,
the number of elements is infinite and is much greater than the number of
natural numbers.
The most common example of an uncountably infinite set is the set of real
numbers within any arbitrary interval, say [0, 1]. To see that this is the case, enu-
merate and count any sequence of values between zero and one, say (0.0, 0.01,
0.02, 0.03, …). As you do this, note that between any two of your enumerated
numbers, many additional values (infinitely many in fact) can be identified.
2.6 Conclusion
The definitions and relationships included in this chapter are the basic con-
stituents of set theory. There are more detailed and more extensive discus-
sions of set theory than the one provided here. However, the description in
this chapter has been formulated to support the study of probability in the
chapters that follow.
Although set theory has many domains of application, it is fundamental to
the construction of probability theory. The reader is encouraged to fully mas-
ter the concepts of this chapter prior to moving on to the study of probability.
Exercises
2.1 Identify at least two sets of states of the United States in at least two
different ways.
2.2 For the sets you identified in Exercise 2.1, identify subsets.
2.3 Select any physical entity and define sets of it.
2.4 Let E, F, and G be three sets. State expressions for:
a. Only F occurs
b. Exactly two of the sets occur
c. At least one of the sets occurs
d. E and G occur but F does not occur
e. None of the sets occur
2.5 Two 6-sided dice are tossed. Let A represent the set of tosses for
which the sum of the dice is even, B be the set of tosses for which at
least one die shows a 3, and C be the set of tosses for which the sum
of the dice is 7. Identify the elements of
a. A ∩ B
b. Bc ∩ C
c. A ∩ C
d. Ac ∩ Bc ∩ Cc
2.6 Let A, B, and C be three sets. Prove that C ∩ (A ∩ B)c = (C ∩ Ac) ∪ (C ∩ Bc).
2.7 Prove that R ∪ S = R ∪ (Rc ∩ S).
2.8 Draw a Venn diagram that shows the relationship in Exercise 2.7.
2.9 Two fair 6-sided dice are tossed. Let A be the set of tosses for which
the sum of the dice is less than 7, B be the set of tosses for which at
least one die shows a 3, and C be the set of tosses for which the sum
of the dice exceeds 4. Identify the elements of
a. A ∩ Cc
b. A ∩ Bc ∩ C
c. B ∩ Cc
d. Ac ∩ Cc
e. A ∪ (B ∩ C)
2.10 Explain why the set of all stars is countably infinite.
2.11 Let a universe, Ω, be the set of cards in a standard poker deck.
Identify a partition of Ω.
2.12 Suppose events A, B, and C form a partition of a sample space Ω.
Use a Venn diagram to show that an event E of the same sample
space can be stated as E = (A ∩ E) ∪ (B ∩ E) ∪ (C ∩ E).
2.13 Suppose a sample space is defined by Ω = {x|0 ≤ x ≤ 20}. If the
events A = {x|8 ≤ x ≤ 12}, B = {x|10 ≤ x < 15}, C = {x|7 < x ≤ 10}, and
D = {x|11 ≤ x ≤ 17}, describe the following events and draw them on
the real line.
A ∪ B, A ∩ B, A ∩ Dc
B ∪ C, Bc ∩ Cc, C ∪ D
Bc ∩ D, A ∩ C, A ∪ B ∪ C
2.14 In some communication circuits, a three-component voting routine
is used to determine if a message has been transmitted accurately.
For a particular message, let Yi represent the event that voter i indi-
cates accurate transmission. Express each of the following events in
words.
Y2 ∪ Y3,   Y1 ∪ Y2 ∪ Y3c,   (Y1 ∪ Y3)c,   (Y1 ∪ Y2 ∪ Y3)c
3
Probability Basics
events soon. For now, we should first recall from our discussion of set theory
that we call the set of all possible outcomes of an experiment the sample space
and will use Ω to represent it. The following are two examples.
Example 3.1
When an integrated circuit is manufactured, it can have three types of
flaws:
If we select a recently produced circuit at random and test it, the result of
this process will be random. The set of possible observations is
Example 3.2
The tread depth on certain newly manufactured, radial pattern automotive
tires varies in the range of 7.5 mm to 8.5 mm. If we select a recently pro-
duced tire and measure its tread depth, the set of possible o
bservations is
Ω = { x|7.5 ≤ x ≤ 8.5 }
Other definitions are also possible but these illustrate the fact that an event
may correspond to one or to more than one outcome. Note also that the
events need not be mutually exclusive.
In the case of Example 3.2, we would not—in fact cannot—define events
comprised of single outcomes. Instead, we define events as intervals, such as
The key point here is that events are sets. Thus, all of the things we said in Chapter
2 about sets apply to events. We may therefore model our random experiment
in terms of the events corresponding to sets of possible observations.
3.2 Probability
Keep in mind that our objective is to define a predictive mathematical
model of an experiment consisting of observing a phenomenon or process
of interest. The definition of the sample space and its events provides the
structural basis (the skeleton) for our model. We next attach our predictive
measure—probability—to our structure.
In its most general sense, probability is simply a single-valued mathematical
function that we define on a sample space. There are rules for how we make
the definition but these are reasonably unrestrictive. Two key rules are (1)
that we define our probability functions on the events of the sample space
rather than on outcomes, and (2) that the domain of the probability function
is the entire sample space and the range is the real interval [0,1].
Before proceeding with this idea further, we observe that people (prob-
ability specialists and philosophers most of all) have argued about how to
assign probabilities to events and how to interpret probability measures for
a long time. These debates continue and are often quite intense. Fortunately
for engineers who wish to apply probability, the formal mathematical system
we will create and study here is internally consistent and “correct” for any
(and all) of the different philosophical interpretations.
As previously discussed, just as geometry is our mathematical language
for describing spatial relationships, probability is our mathematical language
for describing randomness. The philosophical explanations of the origins of
randomness differ substantially but the usefulness of the mathematics tran-
scends those distinctions.
3.3 Probability Axioms

The probability measure must satisfy three axioms:

1. Pr[Ω] = 1.
2. For any event E, 0 ≤ Pr[E] ≤ 1.
3. For any collection of mutually exclusive events E1, E2, E3, …,
   Pr[E1 ∪ E2 ∪ E3 ∪ …] = Pr[E1] + Pr[E2] + Pr[E3] + ⋯

It may seem surprising, but these three axioms are all that we need to develop
all of probability theory. Starting with them, we can construct many useful
results that we can then use to model the physical phenomena that we wish
to study.
Following are a few examples of the results that we can construct.
Example 3.3
For any event E, Pr[Ec] = 1 − Pr[E]. Since E ∪ Ec = Ω and E ∩ Ec = ∅,
Pr[E ∪ Ec] = Pr[Ω] = 1 and Pr[E ∪ Ec] = Pr[E] + Pr[Ec].
Example 3.4
The result in Example 3.3 implies that Pr[∅] = 0.
Example 3.5
For two events E1 and E2 having E1 ⊆ E2, it must be the case that
Pr[E1] ≤ Pr[E2]. Suppose E1 ⊆ E2. Then, it must be the case that
E2 = E1 ∪ (E2 ∩ E1c ) and clearly E1 ∩ (E2 ∩ E1c ) = ∅ . Thus, Pr[E2] =
Pr[E1 ] + Pr[E2 ∩ E1c ] , so Pr[E2] ≥ Pr[E1].
Example 3.6
For any two events E1 and E2, Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2].
Note that E1 = (E1 ∩ E2) ∪ (E1 ∩ E2c) and (E1 ∩ E2c) ∩ (E1 ∩ E2) = ∅, so Pr[E1] = Pr[E1 ∩ E2] + Pr[E1 ∩ E2c]. Similarly, E1 ∪ E2 = E2 ∪ (E1 ∩ E2c) with the two sets disjoint, so Pr[E1 ∪ E2] = Pr[E2] + Pr[E1 ∩ E2c] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2].
Example 3.7
For three events E1, E2, and E3, we have Pr[E1 ∪ E2 ∪ E3] = Pr [E1] + Pr [E2] +
Pr [E3] − Pr[E1 ∩ E2] − Pr[E1 ∩ E3] − Pr[E2 ∩ E3] + Pr[E1 ∩ E2 ∩ E3].
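Because a finite sample space can be enumerated exhaustively, results such as those in Examples 3.6 and 3.7 can be checked by direct counting. The sketch below uses illustrative events on two fair six-sided dice (my own choices, not events from the text), treats each of the 36 outcomes as equally likely, and confirms the three-event inclusion–exclusion formula.

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(outcomes))

# Illustrative events: sum is even, at least one die shows 3, sum equals 7.
E1 = {o for o in outcomes if sum(o) % 2 == 0}
E2 = {o for o in outcomes if 3 in o}
E3 = {o for o in outcomes if sum(o) == 7}

lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))
assert lhs == rhs          # inclusion-exclusion holds exactly
print(lhs)                 # Pr[E1 ∪ E2 ∪ E3]
```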
Example 3.8
Consider that we have two fair four-sided dice (pyramids). For each die,
what are the possible outcomes?
Ω = { 1, 2, 3, 4 }
For the sample space corresponding to the tossing of only one die, con-
sider the events:
A = {2, 4},   B = {3, 4},   C = {1, 2, 3}
Example 3.9
Suppose we roll both of the four-sided dice and take the sum of their
outcomes. What is the sample space?
Ω = { 2, 3, 4, 5, 6, 7, 8 }
a. E—The sum is odd.
b. F—The sum is between 4 and 7.
c. G—The sum exceeds 5.
d. H—The sum is 4.
Example 3.10
Consider again the tire tread depths of Example 3.2, with Ω = {x | 7.5 ≤ x ≤ 8.5} and with the probability of an event taken to be proportional to the length of the corresponding interval. Then, for example,

E = {x | 7.80 ≤ x < 7.99},   Pr[E] = (7.99 − 7.80)/(8.5 − 7.5) = 0.19

F = {x | 7.90 ≤ x < 8.25},   Pr[F] = (8.25 − 7.90)/(8.5 − 7.5) = 0.35

G = {x | 8.10 ≤ x ≤ 8.40},   Pr[G] = (8.40 − 8.10)/(8.5 − 7.5) = 0.30

F ∩ G = {x | 8.10 ≤ x < 8.25},   Pr[F ∩ G] = (8.25 − 8.10)/(8.5 − 7.5) = 0.15

E ∪ F = {x | 7.80 ≤ x < 8.25},   Pr[E ∪ F] = (8.25 − 7.80)/(8.5 − 7.5) = 0.45
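In Example 3.10 the probability of an event is simply the length of the corresponding interval divided by the length of Ω. A minimal sketch of that computation, assuming the same uniform assignment on [7.5, 8.5] (the helper name pr_interval is illustrative):

```python
# Probability of an interval event under the uniform assignment on [7.5, 8.5].
LOW, HIGH = 7.5, 8.5

def pr_interval(a, b):
    """Pr[a <= X < b] when probability is proportional to interval length."""
    a, b = max(a, LOW), min(b, HIGH)
    return max(b - a, 0.0) / (HIGH - LOW)

print(round(pr_interval(7.80, 7.99), 2))   # Pr[E]     = 0.19
print(round(pr_interval(7.90, 8.25), 2))   # Pr[F]     = 0.35
print(round(pr_interval(8.10, 8.40), 2))   # Pr[G]     = 0.30
print(round(pr_interval(8.10, 8.25), 2))   # Pr[F ∩ G] = 0.15
print(round(pr_interval(7.80, 8.25), 2))   # Pr[E ∪ F] = 0.45
```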
However,
Similarly, if we are given the fact that the first die shows a 2, we have
Pr[A|B] = Pr[A ∩ B] / Pr[B]   (3.1)
provided Pr[B] ≠ 0.
An appropriate view of a conditional probability is that the available
knowledge reduces the set of possible outcomes from the full sample space
to a subset of it—the event B. As a result, that knowledge alters the probabili-
ties associated with our observations. Looking at a Venn diagram empha-
sizes the point (Figure 3.1). If we know that B has occurred, then we know
that elements of the sample space that are in the complement of B have not
occurred. Hence, the event B contains all of the possible observations and the
set A ∩ B contains the observations from event A that are possible.
A B
FIGURE 3.1
Venn diagram illustrating conditional probability.
Example 3.11
Consider the sum of the numbers showing on the four-sided dice again.
Enumerate the elements of the sample space as
X1\X2 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
Pr[A|B] = Pr[A ∩ B] / Pr[B]
Assume that both of the events A and B are nonempty. Then, it must also
be the case that
Pr[B|A] = Pr[A ∩ B] / Pr[A]

and therefore

Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]   (3.2)
This is the simplest realization of Bayes’ rule, which was initially formu-
lated by the Reverend Thomas Bayes during the 18th century and first published in
1763, two years after his death. Bayes was investigating incidence rates of
infectious diseases. To obtain the fully expanded expression of Bayes’ rule,
we first construct the Law of Total Probability.
Recall that for any two events, say C and D, C = (C ∩ D) ∪ (C ∩ Dc ) and
(C ∩ D) ∩ (C ∩ Dc) = Ø, so Pr[C] = Pr[C ∩ D] + Pr[C ∩ Dc].
Now, the “unconditioning” relationship allows us to state that
Pr[C ∩ D] = Pr[C|D] Pr[D] and Pr[C ∩ Dc] = Pr[C|Dc] Pr[Dc], so
Pr[C] = Pr[C|D] Pr[D] + Pr[C|Dc] Pr[Dc]   (3.3)

More generally, if the events D1, D2, …, Dn form a partition of the sample space, then

Pr[C] = ∑_{i=1}^{n} Pr[C ∩ Di] = ∑_{i=1}^{n} Pr[C|Di] Pr[Di]   (3.4)
This is the general statement of the law of total probability. Take a look at it
and use the example of the four-sided dice to try it out.
Example 3.12
Let C = {x|X1 + X2 = 4} and let D1 = {x|X1 = 1}, D2 = {x|X1 = 2}, D3 = {x|X1 = 3},
D4 = {x|X1 = 4}. Then
Pr[C] = ∑_{i=1}^{4} Pr[C|Di] Pr[Di] = (1/4)(1/4) + (1/4)(1/4) + (1/4)(1/4) + (0)(1/4) = 3/16 = 0.1875
To state the law of total probability in words, we might say that the prob-
ability of an event may be computed as the sum of the probabilities of its
intersections with the events that comprise a partition of the sample space.
Using the law of total probability, we can extend the earlier conditional
probability statement that
Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]
to the form
Pr[Aj|B] = (Pr[B|Aj] Pr[Aj]) / Pr[B] = (Pr[B|Aj] Pr[Aj]) / (∑_{i=1}^{n} Pr[B|Ai] Pr[Ai])   (3.5)
This is the general form of Bayes’ rule. It has many applications and forms
a basis for many of the questions and analyses we pursue in probability.
Example 3.13
An inventory system contains four types of products. Customers order
one unit of a product at a time: 20% of customers order the first type of
product, 30% the second type, 15% the third type, and 35% the fourth
type. Due to the policy used to manage the inventory system, the sup-
plier is out of the first type of product 6% of the time, the second 2% of
the time, the third 12% of the time, and the fourth 1% of the time. When
a customer orders a product that the inventory system does not have, the
order cannot be filled, and the customer takes his business elsewhere.
What is the probability that an order cannot be filled?
Let Ti denote the event that the customer orders product type i, i = 1,
2, 3, 4.
Let S denote the event that the order cannot be filled.
Pr[S] = ∑_{i=1}^{4} Pr[S|Ti] Pr[Ti] = (0.06)(0.20) + (0.02)(0.30) + (0.12)(0.15) + (0.01)(0.35) = 0.0395
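The arithmetic in Example 3.13 is a direct application of Equation (3.4), and it is easily scripted. A minimal sketch using the values stated in the example:

```python
# Law of total probability: Pr[S] = sum_i Pr[S | T_i] * Pr[T_i]
p_order = [0.20, 0.30, 0.15, 0.35]       # Pr[T_i]: customer orders product type i
p_stockout = [0.06, 0.02, 0.12, 0.01]    # Pr[S | T_i]: type i is out of stock

pr_unfilled = sum(ps * pt for ps, pt in zip(p_stockout, p_order))
print(round(pr_unfilled, 4))   # 0.0395
```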
Example 3.14
A cell phone manufacturer purchases display screens from two differ-
ent suppliers: 40% of screens are from Reliable Video and 60% are from
New Age Technology. Although both suppliers are trying to meet the
same longevity requirements, it has been found that the screens from
Reliable Video have a one-year survival rate of 82%, whereas those from
New Age Technology have a 94% one-year survival rate. (a) What frac-
tion of the company’s phones will have screens that survive one year?
(b) If a one-year-old phone is selected at random and found to have a
failed video display, what is the probability that the screen was pur-
chased from New Age Technology?
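Part (a) of Example 3.14 is an application of the law of total probability, and part (b) is an application of Bayes’ rule. A brief sketch of both calculations, using the supplier shares and survival rates stated in the example (the dictionary names are illustrative):

```python
# (a) Overall one-year survival via the law of total probability.
# (b) Pr[screen from New Age Technology | screen failed] via Bayes' rule.
p_supplier = {"Reliable Video": 0.40, "New Age Technology": 0.60}
p_survive = {"Reliable Video": 0.82, "New Age Technology": 0.94}

p_total_survive = sum(p_supplier[s] * p_survive[s] for s in p_supplier)
print(round(p_total_survive, 3))        # (a) fraction of screens surviving one year

p_fail = 1.0 - p_total_survive
p_newage_given_fail = (p_supplier["New Age Technology"]
                       * (1.0 - p_survive["New Age Technology"])) / p_fail
print(round(p_newage_given_fail, 3))    # (b) posterior probability
```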
3.5 Independence
The next topic of this chapter is that of independence. This is a difficult topic
that many people find confusing. The basic idea is that the probability of
occurrence of an event either is or is not influenced by the occurrence of
another event. If the chance of occurrence of an event is affected by the chance
of occurrence of another event, the two events are dependent, and if the prob-
ability of occurrence is not affected, the events are independent. Formally, two
events A and B defined on a sample space are said to be independent if

Pr[A ∩ B] = Pr[A] Pr[B]   or, equivalently,   Pr[A|B] = Pr[A]

The equivalence of the two statements follows from the conditional probability relationship Pr[A|B] = Pr[A ∩ B]/Pr[B] = Pr[B|A] Pr[A]/Pr[B].
Example 3.15
Two fair 6-sided dice are rolled. Define the events:
A = { x|X1 + X2 is odd }
B = { x|X1 is odd }
C = { x|X1+ X2 ≤ 5 }
D = { x|X1 = 3 }
Pr[A ∩ B] = 1/4 = Pr[A] Pr[B] = (1/2)(1/2) = 1/4

so A and B are independent. In contrast,

Pr[C] Pr[D] = (10/36)(1/6) = 5/108 ≠ Pr[C ∩ D] = 2/36

so C and D are not independent.
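The independence checks of Example 3.15 can also be carried out by enumerating the two-dice sample space. A minimal sketch (the events A, B, C, and D are those defined in the example):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # (X1, X2) for two fair dice

def pr(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if sum(o) % 2 == 1}   # X1 + X2 is odd
B = {o for o in outcomes if o[0] % 2 == 1}     # X1 is odd
C = {o for o in outcomes if sum(o) <= 5}       # X1 + X2 <= 5
D = {o for o in outcomes if o[0] == 3}         # X1 = 3

print(pr(A & B) == pr(A) * pr(B))   # True:  A and B are independent
print(pr(C & D) == pr(C) * pr(D))   # False: C and D are dependent
```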
A note of caution is that many people confuse independence and the con-
cept of mutually exclusive events. These are distinct ideas that should be dis-
tinguished and the difference must be recognized. One observation that may
help with keeping the ideas separate is that since independent events have
Pr[A ∩ B] = Pr[A] Pr[B] and mutually exclusive events have Pr[A ∩ B] = 0, two events that both have nonzero probability cannot be both independent and mutually exclusive.

A related result is that if A and B are independent, then so are A and Bc. To see this, note that

A = (A ∩ B) ∪ (A ∩ Bc)

so

Pr[A] = Pr[A ∩ B] + Pr[A ∩ Bc] = Pr[A] Pr[B] + Pr[A ∩ Bc]

and therefore

Pr[A ∩ Bc] = Pr[A](1 − Pr[B]) = Pr[A] Pr[Bc]

The definition of independence extends to collections of events. In particular, for mutually independent events E1, E2, …, En,

Pr[E1 ∩ E2 ∩ … ∩ En] = ∏_{i=1}^{n} Pr[Ei]
Exercises
3.1 A random experiment consists of measuring the weight of the car-
bon dioxide emitted by a coal fired power plant during a 4-hour
period. Identify the sample space for this experiment.
3.2 An experiment consists of selecting an acre of land in the Jefferson
National Forest at random and counting the number of cardinal
nests in that parcel of land. Identify the sample space for this
experiment.
3.3 An experiment consists of measuring the speed of randomly
selected southbound vehicles as they pass mile marker 118
on Interstate Highway 81. Identify the sample space for this
experiment.
3.4 Suppose two events A and B are mutually exclusive and that
Pr[ A ] = 0.3 and Pr[ B ] = 0.5. What are the probabilities that (a) either
event occurs, (b) A occurs but B does not, and (c) both A and B occur?
3.5 A university bookstore accepts MasterCard and Visa credit cards.
Forty-two percent of the store’s customers carry a MasterCard and
33% carry Visa. If 14% of the store’s customers carry both cards,
what percentage of the store’s customers carry a credit card the
store accepts?
3.6 Ninety-two percent of college students have laptop computers,
and 68% have MP3 players. If 20% of college students own neither
of these types of electronic devices, what is the probability that a
student selected at random will own both of the two devices?
3.7 For two events A and B of a sample space, what is the value of
Pr[ A ∪ B ] + Pr[ A ∩ B ]?
3.8 Eighty-eight percent of Virginia Tech ISE students have TI-89 cal-
culators, and 65% have tablet-type PCs. If 6% of VT ISE students
own neither of these types of electronic devices, what is the prob-
ability that a student selected at random will own both of the two
devices?
3.9 Suppose two events Y and Z are mutually exclusive and that
Pr[ Y ] = 0.25 and Pr[ Z ] = 0.4. What are the probabilities that (a) either
event occurs, (b) Y occurs but Z does not, and (c) both Y and Z occur?
3.10 Two fair 6-sided dice are rolled. Let A be the event that the sum of
the numbers on the dice is odd and let B be the event that the first
die shows an odd number. Compute Pr[ A ∪ B ].
3.11 A small community organization consists of 20 families, of which
4 have one child, 8 have two children, 5 have three children, 2 have
four children, and 1 has five children.
3.19 If two fair 6-sided dice are rolled, what is the conditional probability
that the first die landed on 6 given that the sum of their numbers is 9?
3.20 If two fair 6-sided dice are rolled, what is the conditional probability
that the first die lands on 4 given that the sum of their numbers is 8?
3.21 In reliability analysis, a parallel system functions successfully as
long as at least one of the identical parallel components is function-
ing. Suppose a parallel system of three components, each having a
reliability of 0.80, is functioning. What is the conditional probabil-
ity that component number one is functioning? What is the condi-
tional probability that it has failed?
3.22 Suppose we have four different unfair coins having a probability
of heads equal to 0.62, 0.56, 0.52, and 0.70, respectively. If one of the
coins is selected at random and flipped with the result that it shows
heads, what is the conditional probability that it was the third coin
that was used?
3.23 The machining center at a production facility has three lathes of
differing ages and thus precision. The oldest, machine A, produces
finished units of product of which 88% are good, 8% are blemished,
and 4% unusable. Machine B produces 92% good, 6% blemished,
and 2% unusable. The newest machine, machine C, turns out prod-
uct that is 96% good, 3% blemished, and 1% unusable. If machine
A produces 1/4 of the company’s output and machine B turns out
1/3 of the output, what fraction of the company’s product is good?
What percentage is blemished? If a unit of product is selected at
random and found to be blemished, what is the probability that it
was produced on machine B?
3.24 Sixty-four percent of the fire alarms in a building were manufac-
tured by Acme and the rest were manufactured by Emca. Fire alarms
are tested every 3 months and the test will give a false indication of
failure with probability 0.04. The test will also give a false indication
of proper function with probability 0.08. The Acme alarms have a
failure probability of 0.18, whereas those from Emca have a failure
probability of 0.15. If a test indicates that a particular alarm is failed,
what is the probability that it was manufactured by Acme?
3.25 The machining center at a production facility has three lathes of
differing ages and thus precision. The oldest, machine A, produces
finished units of product of which 92% are good, 5% are blemished,
and 3% unusable. Machine B produces 93% good, 5% blemished,
and 2% unusable, whereas the newest machine, machine C, turns
out product that is 96% good, 3% blemished, and 1% unusable. If
machine A produces 1/3 of the company’s output while machine B
turns out 1/3 of the output, what fraction of the company’s product
is good? What percentage is blemished?
4
Random Variables and Distributions

4.1 Random Variables
Since a random variable is a function, its domain is the sample space and a
set of real numbers is the range of the random variable.
It is as simple and as complicated as that. We define a mapping from each
element of the sample space to a real number. We may do this in nearly any
way we wish. Clearly, we will usually want to define well-behaved functions—
ones that are single valued, perhaps invertible, perhaps differentiable—but
while the form of the function may be important to the application, this is not
required by the theory.
Example 4.1
When we roll a six-sided die, the set of observations we may actually
make are the number of spots on the side of the cube that lands facing
up. A logical mapping is
one spot → 1
two spots → 2
three spots → 3
four spots → 4
five spots → 5
six spots → 6
Example 4.2
Referring back to the example of the depth of a tire tread in the previous
chapters, we had the sample space Ω = {x|7.5 ≤ x ≤ 8.5}. Clearly, an appro-
priate definition of the random variable is to map the depth readings to
the same numerical value. A reasonable alternative would be to map the
readings to the difference between the observed depth and 7.5 mm. That
is, y = x − 7.5.
Example 4.3
When we flip a coin, the outcomes are H and T. We can map H to 1 and
T to 0, or we could just as well map H to 10 and T to 5.
Example 4.4
If we measure the ambient temperature in New York City at 8:00 a.m., we
might map the thermometer reading to a Celsius scale, a Fahrenheit, or
even a Kelvin scale.
4.2 Distributions
Once we have defined a random variable, we would like to assign a prob-
ability measure to it. So the question is how do we obtain probabilities for
random variables?
The answer is that we associate with the random variables the same prob-
abilities that apply to the events to which they correspond. Suppose that for a
particular experiment, we have an event A comprised of outcomes ai as
A = {a1, a2, …, an}
For the experiment, we would presumably have been able to define Pr[A].
Now, suppose we define the random variable
xi = x(ai)
Example 4.5
For the six-sided die, let X be the number of spots showing. Then, for the
event B, corresponding to an even number of spots
Pr[X = 2 or 4 or 6] = Pr[B] = 0.5
and for the event C, corresponding to having the number of spots exceed 2,
Pr[X > 2] = Pr[C] = 0.667
Example 4.6
In Chapter 3 (Example 3.10), for the case of the depth of tire tread, we
computed Pr[F] = 0.35 where F = {x | 7.90 ≤ x < 8.25} It would be reasonable
to define the random variable corresponding to the depth of the tread to
be the same number. That is,
= d(x) = x
d
is transferred to the random variables. This implies that if we order the ran-
dom variable in an increasing sequence, then as we include more of the val-
ues of the random variable in a set, the probability is nondecreasing. This
assures that we can transfer the probability measure for both discrete and
continuous random variables.
Specifically, once the random variables are defined and are expressed as
real valued quantities, we organize their probabilities into distribution func-
tions, and this enables us to manipulate the probabilities efficiently. Formally,
the definition of a distribution function is
FX ( x) = Pr[X ≤ x] (4.2)
Notice the notation. The name of the random variable is an italicized capital
letter. Specific values of the random variable are represented by italicized,
lowercase letters.
A distribution function constructed in this way has the following properties:

1. FX(x) is a nondecreasing function of x.
2. 0 ≤ FX(x) ≤ 1, and lim_{x→∞} FX(x) = 1.
3. lim_{x→−∞} FX(x) = 0.
4. FX is right continuous. A decreasing sequence that converges to x has
   lim_{y↓x} FX(y) = FX(x).
These properties hold for both discrete and continuous random variables.
In both cases, the function is defined with respect to the entire real line. The
realizations of these conditions are slightly different for discrete and con-
tinuous variables so we treat the two cases a little differently. Consider first
an example of the discrete case.
Example 4.7
Suppose our experiment is comprised of tossing a coin three times and
counting the number of heads. For this experiment, the sample space is
X(TTT) = 0
X(HTT) = X(THT) = X(TTH) = 1
X(HHT) = X(HTH) = X(THH) = 2
X(HHH) = 3
FIGURE 4.1
Example discrete distribution function.
Pr[X = 0] = FX (0) = 0.125
Pr[X = 1] = FX (1) − FX (0) = 0.375
Pr[X = 2] = FX (2) − FX (1) = 0.375
Pr[X = 3] = FX (3) − FX (2) = 0.125
fX ( x) = Pr[X = x] = FX ( x) − FX ( x − 1) (4.3)
FX(x) = ∑_{j=0}^{x} fX(j)   (4.4)
Equation (4.3) and Equation (4.4) formally express the fact that the probabili-
ties for individual values of a discrete random variable are obtained from the
distribution function and that these probabilities can be added up to recover
accumulated values for the distribution function.
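Equations (4.3) and (4.4) can be illustrated with the three-coin-toss random variable of Example 4.7. The following sketch enumerates the eight equally likely outcomes, builds the pmf, and accumulates it into the distribution function:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# X = number of heads; build the pmf f_X(x) = Pr[X = x].
pmf = {x: Fraction(0) for x in range(4)}
for o in outcomes:
    pmf[o.count("H")] += Fraction(1, len(outcomes))

# Accumulate the pmf into the distribution function, Equation (4.4).
cdf, running = {}, Fraction(0)
for x in range(4):
    running += pmf[x]
    cdf[x] = running

print(pmf)   # pmf values: 1/8, 3/8, 3/8, 1/8
print(cdf)   # cdf values: 1/8, 1/2, 7/8, 1
```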
It is appropriate at this point to summarize the construction of a probabil-
ity measure for a discrete random variable. For a random experiment that
produces one of only a countable number of possible outcomes, we define a
mapping from each outcome to a real number. We call that number a realiza-
tion of a random variable because it represents the observation we make in
our random experiment. We then assign probabilities to the real numbers
by transferring the probabilities on events—sets of outcomes—to the cor-
responding sets of images of the outcomes that comprise those events. This
provides an analytical structure that we can use to study and describe pos-
sible observations of the experiment.
Example 4.8
In an automated warehouse, shelf space is allocated according to the
sizes of the products to be stored. For a particular realization of this sys-
tem, the stored items are cases of paint (in cans) and their daily arrival
sequences to the warehouse are determined by orders received. A case
of 1-pint cans contains 12 cans and requires 25% of a bin shelf, whereas
a case of 1-gallon cans contains 6 cans and requires 50% of a bin shelf. A
5-gallon can is handled as a single unit and requires 40% of a bin shelf,
and a 10-gallon can is handled individually and requires 60% of a bin
shelf. Our experiment consists of observing the next arriving product to
be stored. Let
Ω = {ω 1 , ω 2 , ω 3 , ω 4 }
where
FIGURE 4.2
Example of a continuous distribution function.
FX ( x) = Pr[X ≤ x] (4.5)
fX(x) = (d/dx) FX(x)   (4.6)
It is reasonable to think of the probability density function as the rate at
which the distribution function is accumulating probability. As with all
derivatives, it is a rate function.
Example 4.9
The ambient temperature at 8:00 a.m. in New York City falls in the inter-
val –40°F ≤ T ≤ 120°F. Define X(T) = T so that a plot of FX(x) might look like
the graph in Figure 4.2. Then the pdf for the temperature, evaluated at
75°F would be:
fX(75) = (d/dx) FX(x) |_{x=75}
Note that we can obtain the probabilities associated with any specific
event as
Pr[x(T1) ≤ X ≤ x(T2)] = FX(x(T2)) − FX(x(T1)) = ∫_{x(T1)}^{x(T2)} fX(u) du

so, for example,

Pr[x(75) ≤ X ≤ x(85)] = FX(x(85)) − FX(x(75)) = FX(85) − FX(75) = ∫_{75}^{85} fX(u) du
One very important point here is that we must always be careful to assure
that we define our probability functions so that
0 ≤ FX (x) ≤ 1
Usually, this means that we must be careful with our definition of the ran-
dom variable as we define the probabilities on events so that
Pr[Ω] = 1
4.2.3 Survivor Functions

It is often convenient to work with the complement of the distribution function, known as the survivor function:

F̄X(x) = Pr[X > x] = 1 − FX(x)   (4.7)

For a discrete random variable,

F̄X(x) = ∑_{j=x+1}^{xmax} fX(j) = 1 − ∑_{j=xmin}^{x} fX(j)   (4.8)

and for a continuous random variable,

F̄X(x) = ∫_{x}^{∞} fX(u) du = 1 − ∫_{−∞}^{x} fX(u) du   (4.9)
Example 4.10
In Example 4.8, the probability of large format containers is
Example 4.11
In Example 4.9, the probability that the morning temperature in
New York exceeds 95°F is
F̄X(95) = ∫_{95}^{120} fX(u) du
FY(y) = 0      for −∞ < y < 0
      = 0.20   for 0 ≤ y < 1
      = 0.45   for 1 ≤ y < 2
      = 0.80   for 2 ≤ y < 3
      = 1.00   for 3 ≤ y < ∞
4.3 Discrete Distribution Functions

The most commonly used discrete distribution models are the

• Bernoulli
• Binomial
• Multinomial
• Poisson
• Geometric
• Negative binomial
A point to keep in mind as we discuss these models is that they are actually
families of distributions. Each specific realization of one of the models is
obtained by selecting its parameter values.
4.3.1 The Bernoulli Distribution

The simplest discrete model is the Bernoulli distribution. It describes a single trial that yields one of two outcomes, conventionally mapped to x = 1 (success), with probability p, and x = 0 (failure), with probability q = 1 − p. The pmf is

fX(x) = p^x (1 − p)^{1−x} = p^x q^{1−x},  x = 0, 1

and the distribution function values are
FX (0) = (1 − p) = q
FX (1) = 1
Note that specifying the possible values of the random variable is part of
the process of defining the distribution or pmf. In many cases, the possible
values of the random variable are obvious and can be omitted, but as a gen-
eral rule one should be careful to assure that the definition is clear.
Example 4.12
Suppose we test a fire alarm every three months. If the probability that
the alarm is faulty is p = 0.08, we define the random variable as X = 1 if the alarm is found to be faulty and X = 0 otherwise, and we find

fX(0) = 0.92   fX(1) = 0.08
FX(0) = 0.92   FX(1) = 1
where the binomial coefficient C represents the number of ways the 9 heads can
be dispersed among the 20 tosses. That is, the two sequences {0, 0, 1, 0, 0, 1, 0, 0,
C(20, 9) = 20!/(9! 11!)

or in general

C(n, k) = n!/(k!(n − k)!)   (4.10)
It represents the number of distinct ways that n items can be arranged with
k being of one class and (n – k) being of the other class.
The origin of the binomial coefficient is the binomial theorem, which states that

(a + b)^n = ∑_{j=0}^{n} C(n, j) a^j b^{n−j}   (4.11)
fX(9) = C(20, 9) p^9 (1 − p)^{11}

and so for the general case, the pmf for the binomial distribution is

fX(x) = C(n, x) p^x (1 − p)^{n−x} = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x},  0 ≤ x ≤ n   (4.12)
The corresponding distribution function is

FX(x) = ∑_{j=0}^{x} C(n, j) p^j (1 − p)^{n−j}

A widely used notation for the binomial pmf is b(x, n, p) and for the binomial cdf is B(x, n, p), so that

FX(x) = B(x, n, p) = ∑_{j=0}^{x} b(j, n, p) = ∑_{j=0}^{x} fX(j)   (4.13)
For the coin-tossing example with p = 0.5,

fX(9) = C(20, 9) p^9 (1 − p)^{11} = 0.16
TABLE 4.1
Full Distribution and Mass Function Values for x Heads in 20 Coin Tosses
x    FX(x)    fX(x)      x    FX(x)    fX(x)      x    FX(x)    fX(x)
0 ~0 ~0 7 0.1316 0.0739 14 0.9793 0.0370
1 ~0 ~0 8 0.2517 0.1201 15 0.9941 0.0148
2 0.0002 0.0002 9 0.4119 0.1602 16 0.9987 0.0046
3 0.0013 0.0011 10 0.5881 0.1762 17 0.9998 0.0011
4 0.0059 0.0046 11 0.7483 0.1602 18 ~1.0000 0.0002
5 0.0207 0.0148 12 0.8684 0.1201 19 ~1 ~0
6 0.0577 0.0370 13 0.9423 0.0739 20 ~1 ~0
Example 4.13
Suppose that an injection molding process for automotive interior door
panels tends to generate defective panels at a rate of about 0.8%. What
is the probability that a sample of 125 panels will contain two or fewer
defective panels?
For this case, p = 0.008, so
FX(2) = ∑_{j=0}^{2} C(125, j) (0.008)^j (0.992)^{125−j} = 0.92
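The binomial values in this section follow directly from Equations (4.12) and (4.13), and scripting them avoids table lookups. A minimal sketch (the function names are illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x, n, p) of Equation (4.12)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def binom_cdf(x, n, p):
    """B(x, n, p) of Equation (4.13)."""
    return sum(binom_pmf(j, n, p) for j in range(x + 1))

# Example 4.13: n = 125 panels, p = 0.008, two or fewer defectives.
print(round(binom_cdf(2, 125, 0.008), 2))   # 0.92

# One of the 20-coin-toss values of Table 4.1: f_X(9) with p = 0.5.
print(round(binom_pmf(9, 20, 0.5), 4))      # 0.1602
```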
fX(x1, x2, x3, x4, x5, x6) = [n!/(x1! x2! x3! x4! x5! x6!)] p1^{x1} p2^{x2} p3^{x3} p4^{x4} p5^{x5} p6^{x6},
0 ≤ xi ≤ n for all i,  ∑_{i=1}^{6} xi = n   (4.14)
Example 4.14
Batches of integrated circuits that are produced for assembly in
cell phones and in portable computers are subject to two key types
of defects. They may have a short circuit due to material bridging
between circuit paths or a short circuit due to an absence of conductive
material in a circuit path. If we inspect a sample of these components,
we would classify each unit as ω1 = nondefective, ω2 = bridging short,
or ω 3 = material void short. Let xk represent the number of units in a
sample of n = 50 chips that belong in category k, then our probability
model is
Pr[X1 = x1, X2 = x2, X3 = x3] = fX(x1, x2, x3) = [n!/(x1! x2! x3!)] p1^{x1} p2^{x2} p3^{x3}

With p1 = 0.975, p2 = 0.015, and p3 = 0.010, the probability that the sample contains 48 nondefective chips and one unit of each defect type is

fX(48, 1, 1) = [50!/(48! 1! 1!)] (0.975)^{48} (0.015)(0.010) = 0.109
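The multinomial probability at the end of Example 4.14 follows directly from Equation (4.14). A short sketch of the computation (category counts and probabilities taken from the example; the function name is illustrative):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial pmf of Equation (4.14) for given category counts and probabilities."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)          # multinomial coefficient n!/(x1!...xk!)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p**x
    return coef * prob

# 48 nondefective, 1 bridging short, 1 material-void short in a sample of 50.
print(round(multinomial_pmf([48, 1, 1], [0.975, 0.015, 0.010]), 3))   # 0.109
```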
fX(x) = e^{−λt} (λt)^x / x!,  0 ≤ x < ∞   (4.15)

The associated distribution function is

FX(x) = ∑_{j=0}^{x} e^{−λt} (λt)^j / j!   (4.16)
For this distribution, we say that the parameter λ represents the rate for the
process. The Poisson is a one-parameter distribution and that parameter is λ.
Example 4.15
Incoming calls to a call center arrive at a rate of λ = 4/min. What is the
probability that the number of calls in any minute exceeds 6? What is the
probability that the number of calls in 5 minutes is fewer than 16?
To answer the first question, we take t = 1 and compute the survivor
function:
F̄X(6) = Pr[X > 6] = 1.0 − Pr[X ≤ 6] = 1.0 − FX(6) = 1.0 − ∑_{j=0}^{6} e^{−4} (4)^j / j! = 0.111

For the second question, the interval is t = 5 minutes, so λt = 20 and

FX(15) = ∑_{j=0}^{15} e^{−20} (20)^j / j! = 0.157
Note that in the general case, the quantity t need not necessarily repre-
sent time. It is often used to represent time but any meaningful measure
that is consistent with the definition of λ is acceptable. In addition, there are
cases in which a single “time” unit is assumed. In those cases, we have the
simplified forms
fX(x) = e^{−λ} λ^x / x!

and

FX(x) = ∑_{j=0}^{x} e^{−λ} λ^j / j!
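The Poisson calculations of Example 4.15 follow from Equations (4.15) and (4.16). A minimal sketch, with λt passed as a single rate argument:

```python
from math import exp, factorial

def poisson_cdf(x, rate):
    """F_X(x) of Equation (4.16), with rate = λt."""
    return sum(exp(-rate) * rate**j / factorial(j) for j in range(x + 1))

# Pr[more than 6 calls in one minute] with λ = 4/min.
print(round(1.0 - poisson_cdf(6, 4.0), 3))   # ≈ 0.111

# Pr[fewer than 16 calls in 5 minutes], λt = 20.
print(round(poisson_cdf(15, 20.0), 3))       # ≈ 0.157
```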
which is to say, the first k − 1 rolls must yield an observation other than a 4 and the final roll must yield a 4. Thus, we can compute fK(6) = (5/6)^5 (1/6) = 0.067.
Of course, we can also compute values of the distribution function. By the
usual definition
FK(k) = Pr[K ≤ k] = ∑_{j=1}^{k} (1 − p)^{j−1} p   (4.18)
FK(k) = ∑_{j=1}^{k} (1 − p)^{j−1} p = p ∑_{j=0}^{k−1} (1 − p)^j = p [ ∑_{r=0}^{∞} (1 − p)^r − ∑_{r=k}^{∞} (1 − p)^r ]

= p [ 1/(1 − (1 − p)) − (1 − p)^k ∑_{r=0}^{∞} (1 − p)^r ]

= p [ 1/p − (1 − p)^k · 1/(1 − (1 − p)) ] = p [ 1/p − (1 − p)^k · 1/p ] = 1 − (1 − p)^k
Example 4.16
In the development of single-shot weapons such as tactical missiles,
a “test analyze and fix (TAAF)” regime is used to find system faults.
A sequence of firings is performed until a missile fails to fire prop-
erly. The device is then subjected to diagnosis to determine and fix the
cause of the failure. If this procedure is implemented for a population
of missiles for which the failure probability is p = 0.035, what is
the probability that a failure occurs on the sixth firing and what is
the probability that the number of firings required to find a failure
exceeds 12?
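The two quantities requested in Example 4.16 follow from the geometric pmf and from the distribution function result FK(k) = 1 − (1 − p)^k derived above. A minimal sketch of the computation with p = 0.035 as stated:

```python
p = 0.035   # probability that a firing fails

# Pr[first failure occurs on the 6th firing]: f_K(6) = (1 - p)^5 * p
f6 = (1.0 - p)**5 * p
print(round(f6, 4))           # ≈ 0.0293

# Pr[more than 12 firings are needed]: 1 - F_K(12) = (1 - p)^12
survivor12 = (1.0 - p)**12
print(round(survivor12, 3))   # ≈ 0.652
```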
12th toss? The answer is that the first 11 tosses must include two 4’s in any
feasible way and then the final toss must yield a 4. Now the probability of
two 4’s in 11 tosses is given by the binomial as b(2, 11, p) and the probability
that the final toss yields a 4 is p. Therefore, the negative binomial probability
that the third success occurs on the 12th trial is
fK(12) = b^{−1}(12, 3, p) = p b(2, 11, p) = p C(11, 2) p^2 (1 − p)^9

= C(11, 2) p^3 (1 − p)^9 = [11!/(2! 9!)] (1/6)^3 (5/6)^9 = 0.049
In general, we will observe the xth success on the kth trial if we observe
x – 1 successes in k – 1 trials and then a success on the final trial. That is
fK(k) = b^{−1}(k, x, p) = p b(x − 1, k − 1, p) = C(k − 1, x − 1) p^x (1 − p)^{k−x},  x ≤ k < ∞   (4.19)
The corresponding distribution function is

FK(k) = B^{−1}(k, x, p) = ∑_{j=x}^{k} b^{−1}(j, x, p) = ∑_{j=x}^{k} C(j − 1, x − 1) p^x (1 − p)^{j−x}   (4.20)
and is interpreted as the probability that the xth success occurs on or before
the kth trial.
Note the correspondence and the difference between the negative binomial
distribution and the binomial distribution. For the binomial distribution, the
number of trials is fixed and the random variable is the number of successes,
and for the negative binomial distribution, the number of successes is fixed
and the random variable is the number of trials.
Because of their dual relationship, we can express the cdf for the negative
binomial distribution in terms of that for the binomial distribution. The rela-
tionship is
B^{−1}(k, x, p) = 1 − B(x − 1, k, p)   (4.21)
This equation reflects the fact that the xth success occurs on or before the kth
trial only if the number of successes in k trials is x or greater (not x – 1 or fewer).
Example 4.17
In testing a new software product, a manufacturer undertakes updating
the product when the third fault is found by a member of the population
fK(20) = b^{−1}(20, 3, 0.04) = C(19, 2)(0.04)^3 (0.96)^{17} = 0.0055
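The negative binomial value in Example 4.17 comes from Equation (4.19), and Equation (4.20) gives the corresponding cdf. A short sketch (the function names are illustrative):

```python
from math import comb

def neg_binom_pmf(k, x, p):
    """b^{-1}(k, x, p) of Equation (4.19): the xth success occurs on trial k."""
    return comb(k - 1, x - 1) * p**x * (1.0 - p)**(k - x)

def neg_binom_cdf(k, x, p):
    """B^{-1}(k, x, p) of Equation (4.20): the xth success occurs on or before trial k."""
    return sum(neg_binom_pmf(j, x, p) for j in range(x, k + 1))

# Example 4.17: third fault found exactly on the 20th use, p = 0.04.
print(round(neg_binom_pmf(20, 3, 0.04), 4))   # ≈ 0.0055
print(round(neg_binom_cdf(20, 3, 0.04), 4))   # third fault on or before the 20th use
```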
Example 4.18
Suppose the random variable X has the probability density function
f X ( x) = c(1 − x 2 ), 0≤x≤1
FX(x) = ∫_{0}^{x} fX(u) du = ∫_{0}^{x} c(1 − u^2) du = c(u − u^3/3) |_{0}^{x} = c(x − x^3/3)

Requiring that FX(1) = 1 gives

FX(1) = 1 = c(1 − 1/3) = (2/3)c,  so c = 3/2
Thus, the formal definition of the density is

fX(x) = 0            for −∞ < x < 0
      = c(1 − x^2)   for 0 ≤ x ≤ 1
      = 0            for 1 < x < ∞
Except where the simple definition is completely clear, the more formal defi-
nition should be used.
Again as in the case of the discrete random variables, observation of our
natural environment and of our manufacturing processes causes us to see
that the probabilities of occurrence for events in these domains often appear
to display a recognizable pattern. There are several of these commonly occur-
ring patterns that can be described in the form of a continuous distribution
function. The five most common models are the
• Exponential
• Gamma
• Weibull
• Normal
• Uniform
Figure 4.3 shows a plot of the exponential density for the case in which
λ = 0.01. Observe that the plot implies greater probabilities for small values
than for larger ones. One can verify this by noting that
FIGURE 4.3
Representative exponential density function (λ = 0.01).
does not occur prior to a particular time. This is often called the reliability
of the equipment.
Example 4.19
An insurance call center receives calls at a rate of four per minute. What is
the probability that the time between two calls is (a) less than 5 seconds,
(b) greater than 30 seconds, and (c) between 10 and 20 seconds? The rate
of 4/min means λ = 4/min, so

(a) Pr[T < 5 s] = 1 − e^{−4(5/60)} = 1 − e^{−1/3} = 0.283

(b) Pr[T > 30 s] = e^{−4(30/60)} = e^{−2} = 0.135

(c) Pr[10 s ≤ T ≤ 20 s] = e^{−4(10/60)} − e^{−4(20/60)} = e^{−2/3} − e^{−4/3} = 0.250
Example 4.20
FT(20,000) = 1 − e−2 = 0.865
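Both exponential examples use the distribution function FT(t) = 1 − e^{−λt}. A brief sketch of the calculations; the times in Example 4.19 are converted to minutes, and for Example 4.20 a rate of 10^{−4} per hour is assumed only so that λt = 2, consistent with the stated result (the original rate is not shown in the example).

```python
from math import exp

def expon_cdf(t, lam):
    """Exponential distribution function F_T(t) = 1 - exp(-lam * t)."""
    return 1.0 - exp(-lam * t)

lam = 4.0   # calls per minute (Example 4.19)
print(round(expon_cdf(5 / 60, lam), 3))                              # (a) Pr[T < 5 s]  ≈ 0.283
print(round(1.0 - expon_cdf(30 / 60, lam), 3))                       # (b) Pr[T > 30 s] ≈ 0.135
print(round(expon_cdf(20 / 60, lam) - expon_cdf(10 / 60, lam), 3))   # (c) ≈ 0.250

# Example 4.20: assumed rate 1e-4 per hour, so that lam * 20,000 = 2.
print(round(expon_cdf(20000, 1e-4), 3))                              # ≈ 0.865
```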
fT(t) = [λ^α t^{α−1} / Γ(α)] e^{−λt},  0 < t < ∞   (4.24)
where Г(α) is the gamma function evaluated at the value of the parameter
α. The reader is reminded that the gamma function is the definite integral
defined as
Γ(z) = ∫_{0}^{∞} t^{z−1} e^{−t} dt
and that for the special case in which the argument is an integer
Γ( z) = ( z − 1)!
When the parameter α is an integer, the distribution function can be written in closed form as

FT(t) = 1 − e^{−λt} ∑_{k=0}^{α−1} (λt)^k / Γ(k + 1) = e^{−λt} ∑_{k=α}^{∞} (λt)^k / Γ(k + 1)

or equivalently

FT(t) = 1 − e^{−λt} ∑_{k=0}^{α−1} (λt)^k / k! = e^{−λt} ∑_{k=α}^{∞} (λt)^k / k!   (4.25)
For cases in which the parameter α is not an integer, values of the dis-
tribution function are computed by numerical integration. The value of the
gamma function is then calculated using a numerical approximation. The
best available numerical approximation is given in Abramowitz and Stegun*
as follows. For 0 ≤ α ≤ 1,

Γ(1 + α) ≈ 1 + c1α + c2α^2 + c3α^3 + c4α^4 + c5α^5

where
c1 = −0.5748646
c2 = 0.9512363
c3 = −0.6998588
c4 = 0.4245549
c5 = −0.1010678
* Abramowitz, M., Stegun, I. A., 1965, Handbook of Mathematical Functions, New York, Dover
Publications.
FT(t = 125) = (1/Γ(α)) ∫_{0}^{125} λ^α t^{α−1} e^{−λt} dt = (1/8.396)(2.764) = 0.329
Example 4.21
Suppose the random variable T represents the time to failure for an auto-
matic guided vehicle, and that it has a gamma distribution with param-
eters α = 3, λ = 0.75/yr. Compute Pr[2.5 yr ≤ T ≤ 7.5 yr].
Pr[2.5 ≤ T ≤ 7.5] = FT(7.5) − FT(2.5) = e^{−7.5λ} ∑_{k=α}^{∞} (7.5λ)^k / k! − e^{−2.5λ} ∑_{k=α}^{∞} (2.5λ)^k / k!

= e^{−2.5λ} ∑_{k=0}^{α−1} (2.5λ)^k / k! − e^{−7.5λ} ∑_{k=0}^{α−1} (7.5λ)^k / k!

= e^{−(2.5)(0.75)} [1 + (2.5)(0.75)/1 + ((2.5)(0.75))^2/2] − e^{−(7.5)(0.75)} [1 + (7.5)(0.75)/1 + ((7.5)(0.75))^2/2]

= e^{−1.875} [1 + 1.875 + (1.875)^2/2] − e^{−5.625} [1 + 5.625 + (5.625)^2/2]

= 0.710 − 0.081 = 0.629
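Because α = 3 is an integer in Example 4.21, the gamma distribution function of Equation (4.25) reduces to a finite sum, which makes the computation easy to script. A minimal sketch (the function name is illustrative):

```python
from math import exp, factorial

def gamma_cdf_int_alpha(t, alpha, lam):
    """F_T(t) of Equation (4.25) for an integer shape parameter alpha."""
    return 1.0 - exp(-lam * t) * sum((lam * t)**k / factorial(k) for k in range(alpha))

alpha, lam = 3, 0.75   # per year (Example 4.21)
prob = gamma_cdf_int_alpha(7.5, alpha, lam) - gamma_cdf_int_alpha(2.5, alpha, lam)
print(round(prob, 3))   # Pr[2.5 yr <= T <= 7.5 yr] ≈ 0.63
```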
FT(t) = 1 − e^{−((t−δ)/(θ−δ))^β},  δ ≤ t < ∞   (4.26)
1.0
0.8
0.6
0.4
0.2
FIGURE 4.4
Weibull cdf when θ = 1000 and when θ = 1500.
In this form, the parameter β is called the shape parameter and θ is called
the location parameter. As the names imply, the shape and location param-
eters determine the shape and location of the distribution. In the case of the
Weibull, we see an informative example of the effect of the designation of
the location parameter. Notice that the value of the distribution function at
the value of the random variable t = θ is FT (t = θ) = 1 − e−1 = 0.632 regardless of
the value of the parameter β. Thus, the value of θ determines the range over
which the distribution varies. This is illustrated in Figure 4.4. Increasing
values of the location parameter move the 63.2% point of the distribution
to the right, which implies a wider range of values over which the random
variable may occur.
Example 4.22
Suppose the life lengths of certain memory chips are well modeled by
the Weibull distribution having β = 2.25 and θ = 18,000 hr. What fraction
of the chip population will fail by 10,000 hours, and what is the probabil-
ity that a chip survives more than 25,000 hours?
58 Probability Foundations for Engineers
2.25
10, 000
−
= 1 − e −( 0.556)
2.25
18, 000
FT (10, 000) = 1 − e = 1 − e −0.266 = 0.234
2.25
25, 000
−
= e −(1.389)
2.25
18, 000
FT (25, 000) = e = e −2.094 = 0.123
−
( x−µ )2
e 2σ2
f X ( x) = , − ∞ < x < ∞ (4.28)
2 πσ 2
and it is worth repeating that the range of the random variable is −∞ < x < ∞.
Unfortunately, the normal density cannot be integrated in closed form to
provide an algebraic statement of the distribution function. Consequently,
computation of probabilities for normal random variables is usually accom-
plished in one of four ways.
One way to obtain probabilities for normal random variables is by numeri-
cal integration. One may simply program a numerical integration algorithm
or else use one that is commercially available. A second approach is to use an
Random Variables and Distributions 59
0.10
0.08
0.06
0.04
0.02
10 10 20 30
FIGURE 4.5
Normal density when µ = 1000 and when σ = 4.
x−µ
z= (4.29)
σ
z
Φ( z) =
∫−∞
φ(w) dw (4.31)
for the distribution function. Observe that the algebraic form for the den-
sity function is simplified by the transformation of the variables in Equation
(4.29). This is because the distribution on the transform variable z has param-
eters µ = 0 and σ = 1. Values of the cdf of Equation (4.31) are included in a
widely used table that is reproduced here in Table 4.2. The use of the standard
normal table is illustrated in the following example.
60 Probability Foundations for Engineers
TAbLE 4.2
Cumulative Probabilities for the Standard Normal Distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8079 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8728 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9648 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9712 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9773 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9983 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Example 4.23
The diameters of baseball cores are well modeled by a normal distribu-
tion having µ = 25 mm and σ = 0.80 mm. In processing, the cores are
passed through a 26-mm screen and the cores trapped above the screen
are machined down provided they have a diameter that is no larger
Random Variables and Distributions 61
x − 25
z=
0.8
and the cores trapped by the screen are those that have diameters that
exceed 26 mm, so
26 − 25
Pr[X > 26] = Pr[Z > = 1.25] = 1 − Pr[Z ≤ 1.25] = 1 − Φ(1.25)
0.8
= 1.0 − 0.8944 = 0.1056
In this calculation, we first find that z = 1.25 is the value of the stan-
dard normal variate that corresponds to a core diameter of 26 mm.
We then go to Table 4.2 where the leftmost column gives the values
of z to a level of tenths. The column headings extend the values of z
to hundredths so we observe that in the row labeled 1.2, the column
headed 0.05 will correspond to z = 1.25 and the table entry of 0.8944 is
therefore the value of Φ(1.25). Subtracting that value from 1 yields the
probability we seek.
To determine the fraction of the population that is machined to meet
the spec, we compute
27 − 25
Pr[26 < X < 27 ] = Pr[1.25 < Z < = 2.5] = Φ(2.5) − Φ(1.25)
0.8
= 0.9938 − 0.8944 = 0.0994
The observant reader will have noted that all of the z values that index the
table are positive. However, negative values of z occur often. How is this han-
dled? The symmetry of the normal density allows us to use the identity that
Φ(− z) = 1 − Φ( z) (4.32)
Example 4.24
Suppose the baseball cores that pass through the 26-mm screen are sub-
sequently passed across a 24.5-mm screen and any cores that are not
trapped are discarded because they are too small. What fraction of the
population of cores is discarded because they are too small?
24.5 − 25
Pr[X < 24.5] = Pr[Z < = −0.625]
0.8
= Φ(−0.625) = 1.0 − Φ(0.625) = 1.0 − 0.7341 = 0.2659
62 Probability Foundations for Engineers
The rationale for Equation (4.32) and its application in Example 4.24 is that
the area under the normal density (the integral) to the right of any value z is
equal to the area under the normal density to the left of −z. Consequently,
including only the positive coordinates in the table is sufficient to specify the
entire distribution function.
A further comment is that the probability stated in Example 4.24 for Φ(0.625)
was computed by linear interpolation between Φ(0.62) and Φ(0.63). This is a
reasonable approach to obtaining values relative to a finer mesh.
Example 4.25
Specifications for the length of a machined component are 1.8 ± 0.16 mm.
Assuming that component length is well modeled by a normal distribu-
tion, what value of σ will assure that at least 98% of the population falls
within the specs?
0.16
Φ( ) = 0.99
σ
and
0.16
= z0.99 = 2.326
σ
so
0.16
σ= = 0.069
2.326
Example 4.25 is intended to illustrate that we can determine “quantiles”
for the normal distribution by reversing the process we used to obtain prob-
abilities. That is, the example is really asking for the normal variates that
have 1% of the population in the tails of the density. Essentially, for any tail
probability, γ, we can say that
zγ = Φ −1 ( γ ) (4.33)
and
xγ = µ + zγ σ (4.34)
Random Variables and Distributions 63
1
( )
−16
Φ( z) ≈ 1 − 1 + d1 z + d2 z 2 + d3 z 3 + d4 z 4 + d5 z 5 + d6 z6 + ε( z) (4.35)
2
where
d1 = 0.04986(73)
d2 = 0.02114(10)
d3 = 0.00327(76)
d4 = 0.00003(80)
d5 = 0.00004(89)
d6 = 0.00000(54)
c 0 + c1 t + c 2 t 2
z1− γ ≈ t − + ε( γ ) (4.36)
1 + e1 t + e 2 t 2 + e 3 t 3
where
c0 = 2.515517
c1 = 0.802853
c2 = 0.010328
e1 = 1.432788
e2 = 0.189269
e3 = 0.001308
and
1
t = ln 2
γ
x−a
FX ( x) = , a ≤ x ≤ b (4.37)
b−a
1
f X ( x) = (4.38)
b−a
so we can see that the proportion in Equation (4.37) is the fraction of the
range of the random variable in the set {a ≤ X ≤ x}.
Under the general definition of the uniform distribution, the parameters a
and b may be positive or negative as long as a < b. In practice, negative values
are rarely used and in fact, the most common use of the distribution model
has a = 0 and b = 1.
Example 4.26
For a uniformly distributed random variable on the range [0.5, 2.5], what
are Pr[1.3 ≤ X ≤ 1.9] and Pr[X > 2.1]?
Since a = 0.5 and b = 2.5
and
2.1 − 0.5
Pr[X > 2.1] = 1 − FX (2.1) = 1 − = 0.2
2.5 − 0.5
Pr[ A ∩ B]
Pr[ A|B] =
Pr[B]
Pr[X = x] f X ( x)
fX|X ≥ a ( x|X ≥ a) = Pr[X = x|X ≥ a] = = , a ≤ x ≤ xmax
Pr[X ≥ a] FX ( a − 1)
FT (t) − FT (τ)
FT|T ≥τ (t|T ≥ τ) = , τ≤t≤∞
1 − FT (τ)
Example 4.27
Suppose X is a Poisson random variable with λ = 12 and we know that for
a certain application X ≥ 8. We compute Pr[X = 11|X ≥ 8] as
f X (11)
f X|X ≥ 8 ( x|X ≥ 8) = = 0.126
FX (7)
and Pr[X ≤ 16|X ≥ 8] as
FX (16) − FX (7)
FX|X ≥ 8 (16|X ≥ 8) = = 0.889
FX (7)
Example 4.28
Suppose that we know that a continuous random variable T represents
the age at failure of a component and has an exponential distribution
66 Probability Foundations for Engineers
Example 4.29
If a population of microelectronic capacitors displays life length behav-
ior that is well modeled by a Weibull distribution having β = 1.8 and
θ = 15,000 hr, compute and interpret the value of the hazard function at
7800 and 15,600 hours.
For the Weibull distribution
βtβ− 1 − ( t θ)β
fT (t) = e
θβ
and
( )
β
− tθ
FT (t) = e
so
βtβ− 1
zT (t) =
θβ
Therefore
Example 4.30
Suppose the exponential distribution (with parameter λ = 0.04/day)
forms a representative model of the life lengths of a certain population of
flies. If 1000 flies hatch at a location today, what proportion of the origi-
nal population and what proportion of the survivors will die on days 2
and 3 of their lives?
For the exponential, zT (t) = λ and FT (t) = 1 − e−λt, so the proportion of the
survivors that start each day that die that day is 4%. However, the aver-
age number of flies that die on day 2 is 38 and on day 3 is 37. The reason
for the difference is that on day 2, 4% of the 960 day-1 survivors die while
on day 3, 4% of the 922 day-2 survivors die.
fT (t) fT (t)
zT (t) = =
FT (t) 1 − FT (t)
dFT (t)
fT (t) = = zT (t) ( 1 − FT (t))
dt
FT (t) = 1 − e ∫0 T
− z ( u) du
(4.40)
or equivalently
Pr[A|B] = Pr[A]
Now, if there are random variables that are defined as images of the event,
it must be the case that those random variables are independent as the proba-
bilities have been transferred directly. Suppose the set X contains the images
of the elements of A and the set Y contains the images of the set B. Then,
necessarily
Pr[X|Y] = Pr[X]
Random Variables and Distributions 69
and
Pr[Y|X] = Pr[Y]
Exercises
4.1 For the sample space described in Exercise 3.2 of Chapter 3, define
a random variable that can be used to represent observations.
4.2 For the sample space described in Exercise 3.1 of Chapter 3, define
a random variable that can be used to represent observations.
4.3 For the sample space described in Exercise 3.3 of Chapter 3, define
a random variable that can be used to represent observations.
4.4 At a local pizza restaurant, the hourly demand, D, for a particular
type of pizza has the following probability mass function:
0.08 d=0
0.12 d=1
0.18 d=2
f D (d) = 0.24 d=3
0.16 d=4
0.12 d=5
0.10 d=6
What are the values for FD (4), FD (2), Pr[1 ≤ D ≤ 4], and Pr[D ≥ 4|D >2]?
4.5 Consider a random variable for which the distribution function is
given by
Determine the values for fY (3), FY (3), Pr[2 ≤ Y ≤ 4], and Pr[Y ≥ 3|Y >1].
70 Probability Foundations for Engineers
4.6 Suppose two fair six-sided dice are rolled. What are the possible
values that could arise for
a. The larger of the two numbers observed.
b. The smaller of the two numbers observed.
c.
The magnitude of the difference between the two numbers
observed.
d. The sum of the two numbers observed.
What are the probabilities associated with each of these random
variables?
4.7 A discrete random variable, Y, has the following probability mass
function:
0.03 y=0
0.09 y=1
0.14 y=2
0.18 y=3
fY ( y ) = 0.23 y=4
0.15 y=5
0.09 y=6
0.07 y=7
0.02 y=8
What are the values for FY (3), FY (5), Pr[2 ≤ Y ≤ 6] and Pr[Y ≥ 5|Y ≥ 2]?
4.8 A random variable Y has the following pmf:
0 y=0
0.8c y=1
1.2 c y=2
fY ( y ) =
c y=3
0.6c y=4
0.4c y=5
Determine the values for c, FY (3), FY (2), and Pr[Y ≥ 3|Y >1].
4.9 Consider a random variable for which the distribution function is
given by
Random Variables and Distributions 71
Determine the values for fY (3), FY (4), Pr[3 ≤ Y ≤ 5], and Pr[Y ≥ 4|Y >2].
4.10 A random variable has the probability density defined as
f X (x) = c(1 − x2), 0 ≤ x ≤ 1
a. Determine the value of c.
b. What is Pr[0.5 ≤ X ≤ 0.85]?
4.11 A random variable has the probability density defined as
f X (x) = c(4x − 2x2), 0 ≤ x ≤ 2.
a. Determine the value of c.
b. What is Pr[0.5 ≤ X ≤ 1.2]?
4.12 Demand at a local microbrewery for its premium beer in gallons
per week is a random variable with density function f X (x) = 4(1 − x)3,
0 ≤ x ≤ 1. How much beer must the brewer produce per week in
order to have a stock out probability in any week of 0.05?
4.13 What is the probability that a binomial random variable, X, hav-
ing parameters n = 50 and p = 0.025, takes a value greater than
2? What is the probability that X exceeds 4 if it is known that X
exceeds 2?
4.14 A particular engine bearing plant has three manufacturing lines.
For line A, the proportion of output bearings that are defective is
0.01, whereas for line B the proportion is 0.016, and for line C it is
0.025. Line A produces 30% of the plant’s output, whereas line B
produces 45% and line C produces 25% of the output. If a sample
of n = 100 bearings is inspected and found to contain one defective
bearing, what is the probability that the sample was taken from the
output of line B?
4.15 What is the probability that a binomial random variable, Y, hav-
ing parameters n = 40 and p = 0.05, takes a value between 2 and 4
inclusive?
4.16 If 0.5% of the output wheel-well manifolds from an injection mold-
ing process are defective, what is the probability that a sample of 80
units will include 2 or more defective manifolds?
72 Probability Foundations for Engineers
4.17 The firm that manufactures patriot missiles purchases the guid-
ance circuits from three different suppliers. Supplier A provides
30% of the guidance circuits and those circuits have a fault prob-
ability of pA = 0.02, whereas the circuits from supplier B, which pro-
vides 25% of those purchased, have a fault probability of pB = 0.025.
The guidance circuits purchased from supplier C have pC = 0.01. If a
batch of 200 missiles is fired during a particular strategic offensive
and three of the missiles fail to track to target, what is the probabil-
ity that the batch of missiles contained guidance circuits obtained
from supplier B?
4.18 The number of patients arriving to a local pharmacy for flu shots
during November is a Poisson random variable with parameter
of λ = 2.8/hr. What is the probability that the number of arriving
patients in any hour exceeds three?
4.19 The number of incoming calls to a mail order call center is a
Poisson random variable with parameter of λ = 1.8/min. What is the
probability that the number of arriving calls in any minute exceeds
three? What is the probability that the number of arriving calls
exceeds three during any 2-minute interval?
4.20 Suppose it is known that the number of accidents occurring per
day on Main Street is a Poisson random variable with parameter
λ = 4/day.
a. What is the probability that the number of accidents on any day
is four or more?
b. Given that the number of accidents today is at least one, what is
the conditional probability of four or more accidents today?
4.21 The number of calls to a particular university Internet site is well
modeled by a Poisson distribution with λ = 1.2/min. What is the
probability that the number of calls during any minute will exceed
three if it is known that the number of calls exceeds one?
4.22 What is the probability that more than 12 tosses of a fair die are
required to obtain the first 6?
4.23 Bob and Joe have each purchased an unbalanced six-sided die
at a novelty shop. Bob’s die has Pr[X = 4] = 0.10, while Joe’s has
Pr[X = 4] = 0.20. If one of these dice is selected at random and rolled
until the first 4 occurs and that happens to be the seventh roll, what
is the probability that the die is Bob’s? How does this probability
change if the first 4 occurs on the fourth roll?
4.24 If an experiment consists of tossing a fair die, what is the probabil-
ity that a 3 occurs for the fourth time on the 20th toss?
4.25 A fair six-sided die is rolled until the third time a 5 is obtained.
What is the probability that the number of rolls exceeds 20? What
Random Variables and Distributions 73
4.36 Life lengths of certain automotive tires are well modeled by a gamma
distribution having α = 2.85 and λ = 7.14 × 10−5/mile. If the manufac-
turer of the tires offers a free replacement warranty of 18,000 miles on
the tires, what fraction of the tire population will have to be replaced?
4.37 If the life lengths of memory chips is well modeled by the Weibull
distribution having β = 1.8 and θ = 20,000 hr, what fraction of the
memory chip population will survive beyond 27,000 hours?
4.38 The life lengths of another population of memory chips is well mod-
eled by the Weibull distribution having β = 2.25 and θ = 18,000 hr.
a. What is the probability that a chip survives more than 25,000
hours?
b. What is the value of the hazard function at 25,000 hours?
4.39 The life length of a photocopier roller bearing displays a Weibull
distribution having β = 3.2 and θ = 12,000 cycles. What fraction of
the population will fail by 8000 cycles? By 18,000 cycles?
4.40 An electronics manufacturer purchases 30% of its cell phone bat-
teries from a supplier that claims those batteries have a Weibull
life length with parameters β = 1.40 and θ = 25,000 hr. The manu-
facturer produces the remaining 70% of its cell phone batteries in
its own plant, and those batteries have a Weibull life distribution
with parameters β = 1.80 and θ = 30,000 hr. If your cell phone has
required a one-year (8760 hours) warranty replacement because of
battery failure, what is the probability that your phone contained a
battery from the supplier?
4.41 The annual snowfall in Buffalo, New York, is normally distributed
with μ = 120" and σ = 10.4". What is the probability that Buffalo will
have more than 140" in any year? What is the probability that this
year’s total will be between 110" and 130"?
4.42 A normal random variable, X, with μ = 45 takes a value less than or
equal to 38.5 with probability 0.125. What is the value of σ for the
distribution?
4.43 A normal random variable, Y, with σ = 7.5 takes a value greater than
or equal to 264.4 with probability 0.230. What is the value of μ for
the distribution?
4.44 The thickness, T, of personal computer chassis spacers is well mod-
eled by a normal distribution having μ = 0.4 mm and σ = 0.04 mm If
a spacer is selected at random, what is
a. Pr[0.33 ≤ T ≤ 0.45]?
b. Pr[T ≥ 0.32]?
c. Pr[T ≥ 0.50]?
d. Pr[0.375 ≤ T ≤ 0.50 | T > 0.32]?
Random Variables and Distributions 75
Example 5.1
Suppose we toss two fair dice and map the number of spots facing up on
each die to the corresponding number. Then, our random vector would
be X = (X 1 , X 2 ), where Xi is the number observed on die i.
Example 5.2
At a regional credit card call center, customers call in either to request a
credit limit increase or to check on their existing balance. If Y1 represents
the number of customers who call to request a credit limit increase dur-
ing a 4-hour interval and Y2 represents the number of customers who
call to check their account balance, then the random vector Y = (Y1 , Y2 )
models the sample space of incoming calls.
77
78 Probability Foundations for Engineers
Example 5.3
If the location of a hole punched in a work piece varies in two dimensions
and is evaluated relative to its horizontal and vertical alignment, then the
random vector (X, Y) provides a representation of hole position quality.
Example 5.4
If the life of an automotive tire is defined in terms of distance traveled and
days of use, then the random vector (D, U) represents tire age accumulation.
Clearly, random vectors are common and can include any number of dimen-
sions. In addition, the quantities that comprise the random vectors may be
discrete or continuous.
As in the case of univariate random variables, we map the probabilities of
events of the sample space to the sets of random variables that are the images
of those events. We again form the mapping so that we have a distribution
function for the random vector. The general representation for the distribu-
tion function for a random vector is
and we call this function the “joint distribution function” (or joint cumulative
distribution function or joint cdf) on the random vector X = (X 1 , X 2 , … , X r ).
To examine the general form in detail, consider a two-dimensional random
vector (X, Y) for which the realization of Equation (5.1) is
FX ,Y ( x , y ) = Pr[X ≤ x , Y ≤ y ] (5.2)
fX ,Y ( x , y ) = 1 36 ∀x , y
TAbLE 5.1
Joint Distribution Function for Two Fair Dice
X\Y 1 2 3 4 5 6
1 1 1 1 1 5 1
36 18 12 9 36 6
2 1 1 1 2 5 1
18 9 6 9 18 3
3 1 1 1 1 5 1
12 6 4 3 12 2
4 1 2 1 4 5 2
9 9 3 9 9 3
5 5 5 5 5 25 5
36 18 12 9 36 6
6 1 1 1 2 5 1
6 3 2 3 6
x y
FX ,Y ( x , y ) = ∑∑ f
i= 0 j= 0
X ,Y (i, j) (5.3)
fX ,Y ( x , y ) ≠ FX ,Y ( x , y ) − FX ,Y ( x − 1, y − 1) (5.4)
x−1 y −1
fX ,Y ( x , y ) = FX ,Y ( x , y ) − FX ,Y ( x − 1, y − 1) − ∑i= 0
f X ,Y (i , y ) − ∑f
j= 0
X ,Y ( x , j) (5.5)
FX ,Y ( x , y ) − FX ,Y ( x − 1, y ) = Pr[X = x , Y ≤ y ] (5.6)
and
FX ,Y ( x , y ) − FX ,Y ( x , y − 1) = Pr[X ≤ x , Y = y ] (5.7)
80 Probability Foundations for Engineers
FX ( x) = FX ,Y ( x , y max ) (5.8)
and
FY ( y ) = FX ,Y ( xmax , y ) (5.9)
and the corresponding expression holds for the random variable Y. The cor-
ollary results for this expression are that the marginal probability mass func-
tions can be constructed as
f X ( x) = ∑fy
X ,Y ( x , y ) (5.10)
and
fY ( y ) = ∑fx
X ,Y ( x , y ) (5.11)
Thus, the joint and marginal probability measures are intertwined and one
can usually be obtained from the other. Consider an example.
Example 5.5
Suppose that the random vector Y = (Y1 , Y2 ) described in Example 5.2
has the joint probability mass function (pmf)
e −17 11y1 6 y2 − y1
fY1 ,Y2 ( y1 , y 2 ) = , 0 ≤ y1 ≤ y 2 , 0 ≤ y 2 < ∞
y1 !( y 2 − y1 )!
Applying Equation (5.10) to this joint pmf yields the marginals
∞ ∞ ∞
e −17 11y1 6 j − y1 e −17 11y1 6 j − y1
fY1 ( y1 ) = ∑f
j=0
Y1 , Y2 ( y 1 , j) = ∑
j = y1
y1 !( j − y1 )!
=
y1 ! ∑ ( j − y )!
j − y1 = 0 1
and
y2 y2 y2
i y2 − i
e −17 11i6 y2 − i e −17
fY2 ( y 2 ) = ∑
i=0
fY1 ,Y2 (i, y 2 ) = ∑
i=0
i !( y 2 − i)!
=
y2 ! ∑ yi !(!11
i=0
2 6
y − i)!
2
e −17
y2
y2 e −17 17 y2
=
y2 ! ∑ i
i=0
i y2 − i
11 6
=
y2 !
Pr[ A ∩ B]
Pr[ A|B] = (5.12)
Pr[B]
The definition of conditional probability functions on a random vector again
requires the application of the probabilities of the events of the sample space
to the images of those events. Once this mapping is defined, the probability
associated with the intersection will usually be a joint probability measure
and the probability of the condition will often be a marginal probability.
Example 5.6
For the two dice having the joint distribution enumerated in Table 5.1,
we can compute
Pr[(X ≤ 3, Y ≤ 2) ∩ (Y ≤ 4)]
Pr[X ≤ 3, Y ≤ 2|Y ≤ 4] =
Pr[Y ≤ 4]
Pr[X ≤ 3, Y ≤ 2] FX ,Y (3, 2) 1 6 1
= = = =
Pr[Y ≤ 4] FY ( 4) 2 4
3
82 Probability Foundations for Engineers
or
Pr[(X ≤ 3, Y ≤ 2) ∩ (X ≤ 4, Y ≤ 4)]
Pr[X ≤ 3, Y ≤ 2|X ≤ 4, Y ≤ 4] =
Pr[X ≤ 4, Y ≤ 4]
Pr[X ≤ 3, Y ≤ 2] FX ,Y (3, 2) 1 6 3
= = = =
Pr[X ≤ 4, Y ≤ 4] FX ,Y ( 4, 4) 4 8
9
or
Pr[(X ≤ 3) ∩ (X ≤ 4, Y ≤ 4)]
Pr[X ≤ 3|X ≤ 4, Y ≤ 4] =
Pr[X ≤ 4, Y ≤ 4]
Pr[X ≤ 3, Y ≤ 4] FX ,Y (3, 4) 1 3 3
= = = =
Pr[X ≤ 4, Y ≤ 4] FX ,Y ( 4, 4) 4 4
9
Example 5.7
For the random vector described in Example 5.2 with the joint probabil-
ity mass function stated in Example 5.5, we can construct the conditional
probability mass functions as
and
e −17 11y1 6 y2 − y1
f Y ,Y ( y 1 , y 2 ) y1 !( y 2 − y1 )! y2 ! 11y1 6 y2 − y1
fY1|Y2 ( y1 |y 2 ) = 1 2 = −17 y2
=
fY2 ( y 2 ) e 17 y1 !( y 2 − y1 )!! 17 y2
y2 !
y 2 11 y1 6 y2 − y1
=
y1 17 17
so
e −6 6 y2 −2
fY2|Y1 ( y 2 |y1 = 2) =
( y 2 − 2)!
Joint, Marginal, and Conditional Distributions 83
and
e −6 62
fY2|Y1 ( 4|y1 = 2) = = 0.0446
2!
and also
4 11 y1 6 4− y1
fY1|Y2 ( y1 |y 2 = 4) =
y1 17 17
and
4 11 2 6 2
fY1|Y2 (2|y 2 = 4) = = 0.313
2 17 17
Example 5.6 and Example 5.7 illustrate the fact that the conditional prob-
abilities may be analyzed using either the conditional distribution function
or the conditional probability mass function. The choice depends upon the
application. The examples are also intended to emphasize the use of the basic
conditioning relationship given in Equation (5.12). As with many of the anal-
yses treated in this text, it is very often worthwhile to base a computation on
a return to an initial definition.
x y
FX ,Y ( x , y ) = Pr[X ≤ x , Y ≤ y ] =
∫ ∫
−∞ −∞
fX ,Y (u, v)dvdu (5.13)
Here again, it is appropriate to note that the joint distribution may reason-
ably apply to a random vector having more than two dimensions.
One of the advantages of the continuous model is that the joint distribu-
tion function is often (not always) differentiable. In those cases in which the
joint distribution can be differentiated, the joint probability density function
is obtained as
84 Probability Foundations for Engineers
∂2
fXY ( x , y ) = FXY ( x , y ) (5.14)
∂x∂y
Thus, it is often possible to move between the distribution and density
functions as necessary. As noted in the case of the discrete joint distribution,
one should be cautious with difference computations. For the continuous
random vectors
b d
Pr[ a ≤ X ≤ b , c ≤ Y ≤ d] =
∫∫
a c
fX ,Y (u, v)dvdu (5.15)
− Pr[X ≤ a, c ≤ Y ≤ d] − Pr[ a ≤ X ≤ b , Y ≤ c]
Example 5.8
Suppose the following joint density function has been defined to model
the response of a new material to a magnetic field:
f XY ( x , y ) = 2 e − x −y , 0 ≤ x ≤ y, 0 ≤ y < ∞
x y x y x
∫∫ ∫e ∫ ∫ e (−e )
y
FXY ( x , y ) = 2 e − u− v dvdu = 2 −u
e − v dvdu = 2 −u −v
du
0 u 0 u 0 u
∫ ( ) ∫( )
x x
1
=2 e − u e − u − e − y du = 2 e −2 u − e − y − u du = 2 − e −2 u + e − y − u
0 0 2 0
= 1 − 2 e − y − e −2 x + 2 e − x − y
∂2 ∂ ∂
f XY ( x , y ) =
∂x∂y
FXY ( x , y ) =
∂y ∂x
(
1 − 2 e − y − e −2 x + 2ee − x − y
)
∂
=
∂y
( )
2 e −2 x − 2 e − x − y = 2 e − x − y
Joint, Marginal, and Conditional Distributions 85
Example 5.9
The diameter of an automotive side rail spot weld, X, and its compressive
strength, Y, have been modeled using a bivariate version of a uniform
density function as
1
f XY ( x , y ) = , 0.44 ≤ x ≤ 0.64 cm, 1000 ≤ y < 2000 N
200
( x − 0.44)( y − 1000)
FXY ( x , y ) =
200
so
0.64 2000
Pr[0.56 ≤ X ≤ 0.64, 1600 ≤ Y ≤ 2000] =
∫ ∫
0.56 1600
f X ,Y (u, v) dv du = 0.16
For this last computation, one might consider the probability to be the joint
distribution equivalent of the univariate survivor function. That is, we might
label this probability as
xmax ymax
FX ,Y ( x , y ) = Pr[ x ≤ X ≤ xmax , y ≤ Y ≤ y max ] =
∫ ∫
x y
fX ,Y (u, v) dv du
This convention is not universally agreed upon but is used in this text.
x ∞
FX ( x) = FX ,Y ( x , ∞) =
∫ ∫
−∞ −∞
fX ,Y (u, v) dv du (5.16)
and
86 Probability Foundations for Engineers
∞ y
FY ( y ) = FX ,Y (∞ , y ) =
∫ ∫ −∞ −∞
fX ,Y (u, v) dv du (5.17)
∞
d
f X ( x) =
dx
FX ( x) =
∫ −∞
fXY ( x , v) dv (5.18)
and
∞
d
fY ( y ) =
dy
FY ( y ) =
∫ −∞
fXY (u, y ) du (5.19)
Example 5.10
For the joint distribution function of Example 5.8, the marginal distribu-
tions may be constructed as
FX ( x) = FXY ( x , ∞) = 1 − e −2 x
FY (y ) = FXY ( y , y ) = 1 − 2e − y + e −2 y
d
f X ( x) = FX ( x) = 2 e −2 x
dx
d
fY ( y ) = FY ( y ) = 2 e − y − 2 e −2 y
dy
∞ ∞
∫ ∫ ( )
∞
f X ( x) = f XY ( x , y ) dy = 2 e − x − y dy = 2 − e − x − y = 2 e −2 x
x
x x
y y
∫ ∫ ( )
y
fY ( y ) = f XY ( x , y ) dx = 2 e − x − y dx = 2 − e − x − y = 2 e − y − 2 e −2 y
0
0 0
Example 5.11
For the joint distribution function of Example 5.8 having the marginal
distributions and densities constructed in Example 5.10, the conditional
density functions may be constructed as
f X ,Y ( x , y ) 2e − x − y e−x
f X|Y ( x|y) = = −y −2 y =
fY ( y ) 2e − 2e 1 − e−y
and
f X ,Y ( x , y ) 2 e − x − y
fY|X ( y|x) = = = e −( y − x)
f X ( x) 2 e −2 x
x
1 x
1 − e−x
FX|Y ( x|y ) =
∫
0
f X|Y (u|y ) du =
1 − e−y ∫ 0
e − u du =
1 − e−y
and
y y
FY|X ( y|x) =
∫ x
fY|X ( v|x) dv =
∫ x
e −( v − x ) dv = 1 − e −( y − x )
There are three important points that are illustrated by Example 5.11. The
first of these points is that a conditional probability function is really a func-
tion of the stated condition. For example, fX|Y ( x|y ) really is a function of Y.
If a value of 5 is specified for Y, the function is distinctly different than if the
specified value is 3.
The second of the important points is that the conditional density functions
are proper univariate density functions and the conditional distribution func-
tions are proper univariate distribution functions. These functions conform
to the descriptions provided for univariate probability functions in Chapter 4.
88 Probability Foundations for Engineers
The third important point is that the definitions of the conditional den-
sities in Example 5.11 are incomplete as the range of the random variables
should have been specified and was not given. Unless the range of the ran-
dom variable is very obvious, it should be stated. In the case of Example
5.11, it should have been noted that f X|Y (x|y) applies for 0 ≤ x ≤ y and f Y|X (y|x)
applies for x ≤ y < ∞.
The use of two-dimensional random vectors has been useful in illustrat-
ing the construction of marginal and conditional probability functions.
However, higher dimension random vectors are also possible and for such
vectors our definitions can be extended, but this should be demonstrated.
Consider an example:
Example 5.12
For the three-dimensional joint density function
f X ,Y , Z ( x , y , z) = 2( x + y )z, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1
1 1
f X ,Y ( x , y ) =
∫ 0
f X ,Y , Z ( x , y , z) dz =
∫ 2(x + y)z dz = (x + y)
0
1 1
f X , Z ( x , z) =
∫ 0
f X ,Y , Z ( x , y , z) dy =
∫ 2(x + y)z dy = (2x + 1)z
0
and
1 1
fY , Z ( y , z) =
∫ 0
f X ,Y , Z ( x , y , z) dx =
∫ 2(x + y)z dx = (2y + 1)z
0
1 1 1 1 1
1
f X ( x) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dz dy =
∫∫0 0
2( x + y )z dz dy =
∫ (x + y) dy = x + 2
0
1 1 1 1 1
1
fY ( y ) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dz dx =
∫ ∫ 2(x + y)z dz dx = ∫ (x + y) dx = y + 2
0 0 0
and
1 1 1 1 1
f Z ( z) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dx dy =
∫∫0 0
2( x + y )z dx =
∫ (2y + 1)z dy = 2z
0
x y x y
FX ,Y ( x , y ) =
∫∫ 0 0
f X ,Y ( x , y )dydx =
∫∫ 0 0
( x + y )dydx
x y2 x 2 y + xy 2
=
∫ 0
xy + 2 dx = 2
x z x z
FX , Z ( x , z) =
∫∫ 0 0
f X , Z ( x , z) dz dx =
∫ ∫ (2x + 1)z dz dx
0 0
x
z2 ( x 2 + x)z 2
=
∫ 0
(2 x + 1)
2
dx =
2
y z y z
FY , Z ( y , z) =
∫∫ 0 0
fY , Z ( y , z) dz dy =
∫ ∫ (2y + 1)z dz dy
0 0
y
z2
( y + y )z 2 2
=
∫ 0
(2 y + 1)
2
dy =
2
x x
1 x2 + x
FX ( x) =
∫ 0
f X ( x)dx =
∫ 0
x +
2
dx =
2
y y
1 y2 + y
FY ( y ) =
∫ 0
fY ( y )dy =
∫0
y + dy =
2 2
z z
FZ ( z) =
∫ 0
f Z ( x) dz =
∫ 0
2 z dz = z 2
and finally
x y z
FX ,Y ,Z ( x , y , z) =
∫∫∫
0 0 0
f X ,Y ,Z ( x , y , z)dzdydx
x y z x 2 y + xy 2 2
=
∫∫∫
0 0 0
2( x + y )zdzdydx =
2 z
Example 5.13
For the three-dimensional joint density function of Example 5.12, there
are many conditional probability functions that can be constructed.
Here is a partial list:
90 Probability Foundations for Engineers
f X ,Y , Z ( x, y , z) 2( x + y )z 4( x + y )z
f X , Z|Y ( x , z|y) = = =
fY ( y ) 1 2y + 1
y+
2
f X ,Y , Z ( x , y , z) 2( x + y )z 2( x + y )
f X|Y , Z ( x|y , z) = = =
fY , Z ( y , z) (2 y + 1)z (2 y + 1)
f X ,Y ( x , y) x + y 2( x + y )
fY|X ( y|x) = = =
f X ( x) 1 2x + 1
x+
2
( x 2 + 2 xy)z 2
FX , Z|Y ( x , z|y ) = , so FX , Z|Y ( x = 0.4, z = 0.6|y = 0.25) = 0.086
2y + 1
x 2 + 2 xy
FX|Y , Z ( x|y , z) = , so FX|Y , Z ( x = 0.4|y = 0.25, z = 0.6) = 0.240
(2 y + 1)
y 2 + 2 xy
FY|X ( y|x) = , so FY|X ( y = 0.6|x = 0.4) = 0.467
2x + 1
discussed next, the identities of the marginal probability functions are not
sufficient to identify the joint functions.
5.4 Independence
The concept of independence of random variables follows directly from
that of independence of events. The algebraic test is also the same. The
concept of independence is that knowledge of the occurrence of a random
event (or the random variable that is its image) does not restrict the chance
of occurrence of another event. In Chapter 3, we expressed this idea in
Equation (3.6) as
Pr[A ∩ B] = Pr[ A ]Pr[ B ]
and we used this condition to obtain the equivalent statement that for inde-
pendent events
Pr[A | B] = Pr[ A ]
FX ,Y ( x , y ) = FX ( x)FY ( y ) (5.20)
fX ,Y ( x , y ) = fX ( x) fY ( y ) (5.21)
The reason each of these conditions is sufficient is that they are equivalent.
Furthermore, the conditions apply equally to discrete and continuous ran-
dom variables. This is because the concept of independence applies equally
to discrete and continuous random variables.
Note that it is also possible for subsets of the constituents of a random
vector to be independent. For a random vector X = (X 1 , X 2 , X 3 , X 4 , X 5 ) it is
conceivable that X1 and X2 could be independent of X3, X4, and X5, while at
the same time X1 and X2 are dependent, and the set X3, X4, and X5 are also
dependent. In that case, we would find that
and
Example 5.14
For the distinguishable dice having the probabilities enumerated in
Table 5.1, X and Y are independent. We can see that for each of the table
entries FX,Y (x,y) = FX (x) FY (y). For example,
1 1 2
FX ,Y (3, 4) = = FX ( x)FY ( y ) =
3 2 3
and
2 2 1
FX ,Y ( 4, 2) = = FX ( x)FY ( y ) =
9 3 3
f X ,Y (3, 4) 1 36 1
f X|Y (3|4) = = =
fY (4) 1
6 6
Example 5.15
For the discrete joint density function in Example 5.5, we found that
e −6 6 y2 − y1
fY2|Y1 ( y 2 |y1 ) =
( y 2 − y1 )!
e −17 17 y2
fY2 ( y 2 ) =
y2 !
Joint, Marginal, and Conditional Distributions 93
Example 5.16
For the joint density function in Example 5.8, we found in Example 5.10
that
f X ( x) = 2 e −2 x and fY ( y ) = 2 e − y − 2 e −2 y
2 e −2 x (2 e − y − 2 e −2 y ) = 4(e −2 x − y − e −2( x + y ) ) ≠ 2 e − x − y = f XY ( x , y )
Example 5.17
For the three-dimensional joint density function of Example 5.12, the
random variables X and Y are dependent, but the pair is independent of
Z, and each is individually independent of Z. To see these relationships,
observe that
1 1
f X ( x ) f Y ( y ) = x + y + ≠ ( x + y ) = f X ,Y ( x , y )
2 2
f X ,Y , Z ( x , y , z) = 2( x + y )z = f X ,Y ( x , y ) f Z ( z) = ( x + y )(2 z)
1
f X ,Z ( x , z) = (2 x + 1)z = x + (2 z) = f X ( x) fZ ( z)
2
While Equation (5.21) is used in Example 5.17, note that Equations (5.20),
(5.22), and (5.23) yield the same conclusions.
The concept of independence is particularly important for two reasons.
As should be apparent from the construction, using multiplicative computa-
tions for independent random variables can simplify some calculations. The
second and less obvious reason is that dependence implies a reduction in
the size of the portion of the sample space that must be considered. For some
modeling and calculation situations, the use of conditioning can greatly sim-
plify the analysis of a phenomenon.
2
2
−
1 x −µ x − 2 ρ x − µ x y − µ y + y − µ y
1 2 ( 1− ρ2 ) σ x σx σy σy
fX ,Y ( x , y ) = e
(5.24)
2 πσ x σ y 1 − ρ2
5
10
15
20
0.06
0.04
0.02
10 15 0.00
20 25 30
FIGURE 5.1
Bivariate normal density with (µ x = 12, µ y = 18, σx = 1.5, σy = 2.2, ρ = 0.6).
Joint, Marginal, and Conditional Distributions 95
Example 5.18
For the bivariate normal distribution shown in Figure 5.1, the use of a
standard mathematical software package yields
FX,Y(10.4, 14.8) = 0.039
FX,Y(12.6, 16.5) = 0.227
FX,Y(13.8, 21.3) = 0.850
and
f X ( x) =
∫−∞
f X ,Y ( x , y ) dy
Starting with the form of the density given in Equation (5.24), we can sub-
stitute for y as
y − µy
v=
σy
so
dy
dv =
σy
1 x − µ x 2 x − µx 1 x − µx 2 1 x − µx
2
− 2ρv + 2
v = + 2(1 − ρ2 ) v − ρ σ
2(1 − ρ2 ) σ x σx 2 σ x x
Since x is not a variable of integration, we can factor out the first term of
the integrand to obtain
2 2
1 x−µ x 1 x−µ x
∞ − v − ρ σ
1 − 2 1
f X ( x) =
2 πσ x
e σ
x
∫−∞ 1 − ρ2
e 2 ( 1− ρ2 ) x
dv
1 x − µx
u= v − ρ σ
2
1− ρ x
implies that
dv
du =
1 − ρ2
∞ u2
1 −
∫−∞ 2π
e 2 du = 1
Therefore
2
1 x−µ x
1 −
2 σ x
f X ( x) = e (5.25)
σ x 2π
which we recognize as the normal density. The same analysis for the mar-
ginal on Y yields
2
1 y −µ y
−
1 2 σy
fY ( y ) = e (5.26)
σ y 2π
Joint, Marginal, and Conditional Distributions 97
2
2
−
1 x −µ x − 2 ρ x − µ x y − µ y + y −µ y
1 2 ( 1− ρ2 ) σ x
σx σy σy
e
f (x, y) 2 πσ x σ y 1 − ρ2
fY|X ( y|x) = X ,Y = 2
f X ( x) 1
1 x−µ x
−
2 σ x
e
σ x 2π
2
1 σy
1 − y −µ y −ρ ( x−µ x )
2 σ 2y ( 1− ρ2 ) σx
= e
σ y 2 π(1 − ρ2 )
2
σx
y −µ x −ρ ( y −µ y )
1
− 2
f (x, y) 1 2 σy
fX|Y ( x|y ) = X ,Y = e 2 σ x (1−ρ )
fY ( y ) 2
σ x 2 π(1 − ρ )
σ
which is a univariate normal density having µ x|y = µ x + ρ x y − µ y
σy
( ) and
σ 2x|y = σ 2x (1 − ρ2 ).
Having seen the bivariate model, we can advance to normal distributions
on random vectors of higher dimensionality. In order to do this, note that the
exponent in Equation (5.24) is actually a quadratic form. That is
1 x − µ 2 x − µx y − µy y − µy
2
x
− 2ρ +
2(1 − ρ2 ) σ x σx σy σ y
x − µx
= ( x − µ x , y − µ y )M −1
y − µ y
1 ρ
σ 2x (1 − ρ2 ) 2
σ x σ y (1 − ρ )
M −1 =
ρ 1
σ x σ y (1 − ρ2 ) σ 2y (1 − ρ2 )
98 Probability Foundations for Engineers
For this case, the matrix M–1 is obtained from the covariance matrix
σ 2x ρσ x σ y
M=
ρσ x σ y σ 2y
and as in the univariate case, the variance terms appear in the denominators.
To extend this form to the multivariate case, represent the random vector as
X = (X 1 , X 2 , … , X r ) for which the mean vector is µ = (µ 1 , µ 2 , … , µ r ) and the
covariance matrix, M, is symmetric and positive definite. The elements of the
covariance matrix are σ 2Xi , X j , which is the covariance of the two variables.
Then, the general form for the r-variate normal density is
1
M −1 2 1
− ( X − µ) M −1 ( X − µ)T
fX ( x) = r e 2 (5.27)
(2 π) 2
(X − µ ) = (X 1 − µ 1 , … , X s − µ s , X s+ 1 − µ s+ 1 , … , X r − µ r )
S R
Ms =
RT T
where the submatrix S has dimension s × s and the submatrix T has dimen-
sion (r – s) × (r –s). Then, the marginal density on the s-variate random vector
(X s ) = (X 1 , X 2 , … , X s ) is
Joint, Marginal, and Conditional Distributions 99
1
S−1 2 1
− ( X s − µ s )S−1 ( X s −µ s )T
f X 1 , X 2 , … , X s ( xs ) = s e 2
(5.28)
(2 π ) 2
Qs U1
Q = Ms−1 = T
U 1 V
With this definition, the inverse of the covariance matrix for the condi-
tional density is
Qs − U 1V −1U 1T
To see that this is the form of the matrix inverse, consider the multiplica-
tion of Q and Ms. Since
I s× s 0 S R Qs U1
Ms−1Q = I = =
0 I r − s× r − s RT T U 1T V
SQs + RU 1T SU 1 + RV
= T
R Qs + TU 1 RTU 1 + TV
T
SQs + RU 1T = I s × s
and
SU 1 + RV = 0
so
R = −SU 1V −1
100 Probability Foundations for Engineers
SQs − SU 1V −1U 1T = I s × s
S−1 = Qs − U 1V −1U 1T
1
Qss − U 1V −1U 1T 2 1
− ( X s − µ s )(Qss −U1V −1U1T )( X s − µ s )T
fX1 ,X2 , … ,X s|X s+1 , … ,Xr ( xs |xr − s ) = s e 2
(5.29)
(2 π ) 2
Example 5.19
Suppose we have a three-dimensional random vector (X ) = (X 1 , X 2 , X 3 )
for which the mean vector is (15, 9, 12) and the covariance matrix is
3 0 2
M = 0 2 1
2 1 2
3 2 0
S R
Ms = 2 2 1 = T
R T
0 1 2
with
3 2 1 −1
S= and S−1 =
−1 3
2 2
2
Joint, Marginal, and Conditional Distributions 101
1 1 3
(0.5) 2 − 2 ( x1 −µ1 )2 − 2( x1 −µ1 )( x3 −µ3 )+ 2 ( x3 −µ3 )2
f X1 ,X3 ( x1 , x3 ) = e
(2 π )
1 3
0.707 − 2 ( x1 −15)2 − 2( x1 −15)( x3 −12 )+ 2 ( x3 −12 )2
= e
(2 π )
1
1 − 4 ( x2 − 10)2
f X 2 ( x2 ) = e
4π
3 −4 2
Qss U1
Q = Ms−1 = T = −4 6 −3
U 1 V
2 −3 2
3 −4
Qs =
−4 6
3 −4 2
Qs − U 1V −1U 1T = − ( 1 ) ( 2 , −3 )
−4 6 −3 2
3 −4 2 −3 1 −1
= − 9 = −1 3
−4 6 −3
2 2
and
15 0 15
µTs = − ( 1 2)(−1) =
12 1 12.5
so
1 3
0.707 − 2 ( x1 −15)2 − 2( x1 −15)( x3 −12.5)+ 2 ( x3 −12.5)2
f X1 ,X3|X2 ( x1 , x3 |x2 = 8) = e
(2 π )
102 Probability Foundations for Engineers
Exercises
5.1 Two fair six-sided dice are rolled. Let X represent the sum of the
numbers observed on the two dice and let Y represent the mag-
nitude of the difference between the two numbers. Construct the
joint probability mass function f X,Y(x,y).
5.2 The joint pmf for the discrete random variables M and N is
e −7 4m 3n− m
f MN (m, n) = , 0 ≤ m ≤ n, 0 ≤ n < ∞
m !(n − m)!
Compute Pr[1 ≤ M ≤ 6, 4 ≤ N ≤ 6].
5.3 The joint density on the random variables X and Y is
fXY ( x , y ) = 2 e − x− y , 0 ≤ x ≤ y , 0 ≤ y < ∞
Compute Pr[1 ≤ X ≤ 6, 4 ≤ Y ≤ 10].
5.4 The joint probability density function for the random variables X
and Y is
x
fXY ( x , y ) = + cy , 0 ≤ x ≤ 1, 1 ≤ y ≤ 5
5
6 2 xy
fXY ( x , y ) = ( x + ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
7 2
Compute Pr[X ≤ 0.6, Y ≤ 0.8].
5.6 Let X be selected at random from the set {1,2,3,4,5} and let Y be
selected from the set {1, 2, …, X}. Identify the joint probability mass
function of (X, Y), compute the conditional pmf on Y given X = 4,
and the conditional pmf on X given Y = 3.
Joint, Marginal, and Conditional Distributions 103
5.7 For the joint pmf constructed in Exercise 5.1, identify the marginal
pmf on Y.
5.8 The joint pmf for the discrete random variables M and N is
e −7 4m3n− m
f MN (m, n) = , 0 ≤ m ≤ n, 0 ≤ n < ∞
m !(n − m)!
fXY ( x , y ) = ( x + y ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
1
fX ,Y ( x , y ) = , 0 < y < x, 0 < x < 1
x
e − x/y e − y
f ( x, y ) = ; 0 < x < ∞, 0 < y < ∞
y
0.02 xe −0.01y
fXY ( x , y ) = , 0 ≤ x ≤ y, y > 0
y2
6 2 xy
fXY ( x , y ) = ( x + ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
7 2
Compute Pr[0.75 < Y < 1.6 | X = 0.25].
104 Probability Foundations for Engineers
fXY ( x , y ) = 2 e − x − y , 0 ≤ x ≤ y, 0 ≤ y < ∞
Compute Pr[3.4 ≤ Y ≤ 6.2 | X = 2].
5.16 The joint probability density function for the random vari-
ables X and Y is f XY(xy) = xe–x(y + 1), 0 < y < ∞, 0 < x < ∞. Compute
Pr[Y ≥ 1.8 | Y > 1.2, X = 1.5].
5.17 For the joint density function given in Exercise 5.5, are the variables
X and Y independent?
5.18 For the joint density function given in Exercise 5.9, are X and Y
independent?
5.19 Show that f X|Y(x|y) = f X(x) implies independence of X and Y.
5.20 Consider the three-dimensional normal density specified in
Example 5.19. Determine the joint marginal density on X 2 = (X 1 , X 2 )
and indicate why these two variables are independent.
5.21 For a bivariate normal density, suppose the quadratic form is
1
3
(
6( x + 1)2 − 2 xy + 2 y − 4x − 4 + ( y − 2)2 )
2 0 1 0
0 5 0 −3
M=
1 0 1 0
0 −3 2
0
Identify the marginal joint density on the vector (X1 , X2) and the
conditional joint density (X1 , X4 | X2 , X3).
6
Expectation and Functions
of Random Variables
Once a random variable has been defined, there can be many reasons
for defining functions of that variable. One reasonable example would
be a conversion of a temperature measurement from a Celsius scale to a
Fahrenheit scale. Another would be the conversion from sales volume to
revenue. In general, since a random variable is a function for which the
range is the real line, the construction of another function—a function of
the random variable—should be a reasonable thing to do and should con-
form to usual algebraic behaviors. The corresponding distribution func-
tion for the functional variable can be constructed at the same time. This
is an important extension of the probability concepts treated in this text,
so it is included in this chapter. However, the analysis of general functions
of random variables and random vectors is placed later in the chapter
because there is a particular function—called expectation—that is central
to many probability analyses.
6.1 Expectation
The expectation of a random variable, X, is defined as a weighted sum of
the possible values that X may take. The weighting is the corresponding
probability measure. For a discrete random variable, X, the expectation or
expected value of X is denoted by E[X] and is computed as
E[Y ] =
∫ yf (y)dy (6.2)
y
Y
105
106 Probability Foundations for Engineers
Note that in both cases the possible values of the random variable are mul-
tiplied by their corresponding probability measure and that these products
are accumulated over the range of the random variable.
It is important to note that the expected value of a random variable is a
descriptor of the distribution on that variable. In fact, it is the first moment
about the origin of the probability mass function (pmf) or probability den-
sity function (pdf), whichever term applies, and it is often referred to as
the mean of the distribution. The expected value does, in fact, correspond
to the center of gravity of the probability measure. Thus, the expected
value of a random variable—its mean—is an indication of the center of the
distribution.
Keeping in mind that the pdf (or pmf) is a function, we realize that it has
higher-order moments than just the first. The moments jointly characterize
all of the features of the function. In the study of probability, we often con-
sider higher moments of a distribution, particularly the second moment. In
general, the kth moment of a distribution is defined as
E[X k ] = ∑xx
k
Pr[X = x] (6.3)
E[Y k ] =
∫y y
k
fY ( y )dy (6.4)
Note that the variance is also the first moment about the mean. That is, we
could define the variance as
Using the properties of expectation that are discussed later, this expression
can be shown to equal Equation (6.5). First, consider some realizations of the
mean and variance.
All distribution functions have moments and each of the commonly used
distributions that have been discussed in this text has a mean and a variance.
Usually, using Equation (6.3) or Equation (6.4), the determination of the val-
ues for the moments is not too difficult. Consider some examples.
Expectation and Functions of Random Variables 107
Example 6.1
For the binomial distribution, we have
n n
n
E[X ] = ∑ x=0
x p x q n− x =
x ∑ x x !(nn−! x)! p q
x=1
x n− x
n n− 1
= ∑ n!
( x − 1)!(n − x)!
p x q n− x = np ∑
( x −
(n − 1)!
1)!( n − x )!
p x −1 q n− x = np
x=1 x − 1= 0
where the final step depends on the fact that the summation represents
all of the probability for a binomial random variable having range zero
to n – 1 and is thus equal to one. In the case of the variance, we start with
n n
n
E[X 2 ] = ∑
x=0
x 2 p x q n− x =
x ∑x x=1
2 n!
x !(n − x)!
p x q n− x
n n− 1
= ∑
x=1
xn!
( x − 1)!(n − x)!
p x q n− x = np
x − 1= 0
∑
( x − 1 + 1)(n − 1)! x −1 n− x
( x − 1)!(n − x)!
p q
n− 1
= np ∑ (n − 1)!
x −1=1 ( x − 2)!(n − x)!
p x −1q n− x + 1
n− 2
= np (n − 1) p
x−2=0
∑ (n − 2)!
( x − 2)!(n − x)!
p x − 2 q n− x + 1 = np ((n − 1) p + 1)
= np ((n − 1) p + 1) = np ( np + q ) = n2 p 2 + npq
Example 6.2
For the geometric distribution, we have
∞ ∞ ∞
∞
E[ K ] = ∑ kpq
k =1
k −1
=p ∑ kq
k =0
k −1
=p ∑ dqd (q ) = p dqd ∑ q
k =0
k
k= 0
k
d 1 1 p 1
=p =p = =
dq 1 − q (1 − q)2 p 2 p
108 Probability Foundations for Engineers
∞ ∞ ∞ ∞
E[ K 2 ] = ∑
k =1
k 2 pq k −1 = p ∑k =0
( k 2 − k + k )q k −1 = pq ∑
k =0
k( k − 1)q k − 2 + p ∑ kq
k =0
k −1
∞
∞
∑ dqd (q ) + 1p = pq dqd ∑ q + 1p
2 2
= pq 2
k
2
k
k=0 k =0
d2 1 1 2 pq 1 2 p(1 − p) 1 2 − p
= pq + = + = + = 2
dq 2 1 − q p (1 − q)3 p p3 p p
2−p 1 q
Var[ K ] = E[ K 2 ] − E 2 [ K ] = − 2 = 2
p2 p p
Example 6.3
For the exponential distribution, we have
∞ ∞ ∞ ∞
1 1
E[T ] =
∫0
tfT (t)dt =
∫0
λte − λt dt = −te − λt +
∫
0
e − λt dt = −te − λt − e − λt =
λ 0
λ
and
∞ ∞
E[T 2 ] =
∫ 0
λt 2 e − λt dt = −t 2 e − λt + 2
∫ 0
te − λt dt
∞
2t − λt 2
= −t 2 e − λt −
λ
e +
λ ∫ 0
e − λt dt
∞
2t 2 2
= −t 2 e − λt − e − λt − 2 e − λt = 2
λ λ 0 λ
2 1 1
Var[T ] = E[T 2 ] − E 2 [T ] = − =
λ2 λ2 λ2
E[X + Y ] =
∫ ∫ (x + y) f
x y
X ,Y ( x , y )dydx =
∫ ∫ xf
x y
X ,Y ( x , y )dydx +
∫ ∫ yf
x y
X ,Y ( x , y )dydx
=
∫ xf
x
X ,Y ( x , y )dx +
∫ yf
y
X ,Y ( x , y )dydx = E[X ] + E[Y ]
This construction also confirms that the expected value of a sum of random
variables equals the sum of their expected values regardless of whether the
variables are independent. Thus, expectation has the property of being linear.
A further implication of the linearity of the expectation is that for con-
stants, say a and b,
E[ aX + b] = aE[X ] + b (6.7)
E[ aX k + b] = aE[X k ] + b
The assertion that the first moment about the mean equals Equation (6.5) is
based on the linearity of the expectation. That is, starting with
we can perform the squaring operation and distribute the expectation across
the resulting sum. Then
= E[X 2 ] − E 2 [X ]
This construction also illustrates the fact that an expected value, E[X], is a
constant rather than a random variable.
110 Probability Foundations for Engineers
Notice that a constant has no variance. It is constant and does not vary.
The third property of expectation is that it applies to functions of random
variables as well as to the random variables. We will examine functions of
random variables later in this chapter. For now, suppose that we have defined
a function, say g(X), on the random variable X. We can obtain the expected
value of that function by applying the definition of expectation directly. That
is, for a discrete random variable
E[ g(X )] =
∫ g( x) f
x
X ( x)dx (6.10)
Essentially, the expectation of the function is the weighted sum of the values
the function can assume where the weights are the associated probability
measures.
Example 6.4
Suppose we roll a fair six-sided die and receive a payment equal to three
times the number shown by the die. What is our expected gain? For this
experiment, the payout function is
g(x) = 3x
so the expected payout is
1 1 1 1 1 1
E[ g(X )] = (3) + (6) + (9) + (12) + (15) + (18) = 10.5
6 6 6 6 6 6
For the more advanced reader, can you show that Var[g(x)] = 26.25?
E[XY ] =
∫ ∫ xyf
x y
X ,Y ( xy )dydx (6.12)
These definitions conform to the pattern set for the single-dimensional case.
Consider two examples.
Example 6.5
For the discrete bivariate density defined in Example 5.5, we construct
the expected value of the random vector (Y1, Y2) as
∞ y2 ∞ y2 − 1
e −17 11y1 6 y2 − y1 11y1 −16 y2 − y1
E[Y1Y2 ] = ∑∑
y 2 = 0 y1 = 0
y1 y 2
y1 !( y 2 − y1 )!
= 11e −17
y
∑ ∑
2 =1
y2
y1 − 1= 0
(y1 − 1)!(y 2 − y1 )!!
∞ y2 − 1 y1 − 1 y 2 − y1
= 11e −17 ∑ (y 1− 1)! y ∑ (y(y −−11)!)!(11y
y2 = 1
2
2
y1 − 1= 0
2
1 2
6
− y1 )!
∞ y2 − 1
y 2 − 1
= 11e −17 ∑ y2 =1
( y
y2
2 − 1)!
y
∑ y − 1 11
1 − 1= 0
1
y1 − 1 y 2 − y1
6
∞ ∞
y 2 (17 y2 −1 ) − 1 + 1)(17 y2 −1 )
= 11e −17 ∑ y2 =1
( y 2 − 1)!!
= 11e −17
y
∑ (y
2 − 1= 0
2
( y 2 − 1)!
∞ ∞
( y 2 − 1)(17 y2 −1 ) y2 − 1
= 11e −17
y2
∑ − 1= 0
( y 2 − 1)!
+
y
∑ ((y17 − 1)!)
2 − 1= 0
2
∞ ∞
( y 2 − 1)(17 y2 −1 ) (17 y2 −1 )
= 11 ∑
y2 − 1= 0
e −17
( y 2 − 1)!
+ 11e −17
y
∑
2 − 1= 0
( y 2 − 1)!
Example 6.6
For the continuous random vector (X, Y) suppose
f XY ( x , y ) = 2 e − x − y , 0 ≤ x ≤ y, 0 ≤ y < ∞
∞ y ∞ y ∞ y
E[XY ] =
∫ ∫
0 0
xyf XY ( x , y )dxdy = 2
∫ ∫
0 0
xye − x − y dxdy = 2
∫0
ye − y
∫
0
xe − x dxdy
( ) dy = 2 ∫
∞ ∞
∫ ( ye )
y
=2 ye − y − xe − x − e − x −y
− ye −2 y − y 2 e −2 y dy
0 0 0
∞
1 1 1 1 1
= 2 − ye − y − e − y + ye −2 y + e −2 y + y 2 e −2 y + ye −2 y + e −2 y
2 4 2 2 4 0
1 1
= 21− − = 1
4 4
E[X mY n ] = ∑∑ x y
x y
m n
Pr[X = x , Y = y ] (6.13)
E[X mY n ] =
∫∫x y
x y
m n
fX ,Y ( xy )dydx (6.14)
Cov[X , Y ] σ2
ρXY = = XY (6.16)
Var[X ]Var[Y ] σ X σ Y
The correlation, ρXY between X and Y will lie in the interval (–1, 1) and its
sign will be determined by the numerator—the covariance—as the denomi-
nator is positive by definition.
Example 6.7
For the discrete bivariate density defined in Example 5.5, we found in
Example 6.5 that E[Y1Y2] = 198.0 and we found in Example 5.5 that
Cov[Y1 , Y2 ] 11.0
ρY1Y2 = = = 0.804
Var[Y1 ]Var[Y2 ] (11.0)(17.0)
Example 6.8
For the continuous density of Example 6.6, we obtained E[XY] = 1.0, and
in Example 5.10, we found that
so we can compute the expectations E[X] = 0.50 and E[Y] = 1.50 and
Cov[XY ] 0.25
ρXY = = = 0.378
Var[X ]Var[Y ] (0.25)(1.75)
Now that the relationships of covariance and correlation have been defined,
the reader is encouraged to review the discussion of the multivariate normal
114 Probability Foundations for Engineers
distribution in Chapter 5, Section 5.5 of this text. The covariance and corre-
lation terms presented there are exactly the same as those described in this
chapter.
A particularly important feature of covariance is that independent random
variables have a covariance of zero. This is reasonably apparent when we
recognize the fact that independent random variables, say X and Y, have
E[XY] = E[X]E[Y]
1.
Cov[X, Y] = Cov[Y, X]
2.
Cov[X, X] = Var[X]
3.
Cov[aX, bY] = abCov[X, Y]
n
Var[X ] = E
∏ ( X − E[X ]) (6.17)
i=1
i i
Example 6.9
For the discrete bivariate density defined in Example 5.5, we found in
Example 5.7 that
Expectation and Functions of Random Variables 115
e −6 6 y2 − y1
fY2|Y1 ( y 2 |y1 ) =
( y 2 − y1 )!
and
y 2 11 y1 6 y2 − y1
fY1|Y2 ( y1 |y 2 ) =
y1 17 17
y2
y2 y1 y 2 − y1
E[Y1 |Y2 ] = ∑ y y 1711
y1 = 0
1
1
6
17
11
= y2
17
11 6
Var[Y1 |Y2 ] = E[Y12 |Y2 ] − E 2 [Y1 |Y2 ] = y 2
17 17
∞ ∞
e −6 6 y2 − y1 e −6 6 y2 − y1
E[Y2 |Y1 ] = ∑
y 2 = y1
y2
( y 2 − y1 )! y
= ∑
2 − y1 = 0
( y 2 − y1 + y1 )
( y 2 − y1 )!
∞ ∞
e −6 6 y2 − y1 e −6 6 y2 − y1
= ∑
y 2 − y1 = 0
( y 2 − y1 )
( y 2 − y1 )!
+ y1
y
∑
2 − y1 =0
( y 2 − y1 )!
= 6 + y1
Example 6.10
For the continuous density of Example 6.6, we found the conditional
densities in Example 5.12 to be
e −x
f X|Y ( x|y ) =
1 − e −y
and
116 Probability Foundations for Engineers
fY|X ( y|x) = e −( y −x )
y
e−x 1 − e − y − ye − y
E[X |Y ] =
∫ 0
x
1− e −y
dx =
1 − e−y
1 − 2 e − y − y 2 e − y + e −2 y
Var[X|Y ] = E[X 2 |Y ] − E 2 [X|Y ] =
(1 − e − y )2
∞
E[Y|X ] =
∫ x ye −( y − x)
dy = x + 1
∑ ∑ x F (a − 1)
f X ( x)
E[X |X ≥ a] = xfX|X ≥ a ( x|X ≥ a) =
X
x= a x= a
Var[X |X ≥ a] = E[X 2 |X ≥ a] − E 2 [X |X ≥ a]
In the case of a continuous random variable, say T, the same logic yields
τ τ
fT (t)
E[T |T ≤ τ] =
∫ 0
tfT|T ≤τ (t|T ≤ τ)dt =
∫ t F (τ) dt
0 T
and
Var[T |T ≤ τ] = E[T 2 |T ≤ τ] − E 2 [T |T ≤ τ]
Example 6.11
For the Poisson random variable in Example 4.27,
∞
xe −λ λ x ∞
xe −λ λ x
7 −λ
∑ ∑ ∑ xe x !λ
x
1 1
E[X|X ≥ 8] = = −
FX (7) x=8
x! 1 − FX (7) x=0
x! x=0
7 −λ 6 −λ x−1
∑ (ex −λ1)! = 1 − F1 (7) λ − λ ∑ e(x −λ1)!
x
1
= λ −
1 − FX (7) x=1
X
x − 1= 0
e −λ λx e −λ 2 λ x −1
∞ ∞ ∞ ∞
λ x− 2
= ∑ ( x − 1)λ x
+ ∑= λ + λ
1 − FX (7) x−1=7 ( x − 1)! x−1=7 ( x − 1)! 1 − FX (7) x−2=6 ( x − 2)! x−1=7 ( x − 1)! ∑ ∑
e −λ 2 λ x− 2 ∞ λ x −1 λ x −1
∞ 5 6
λ x− 2
= λ ∑ − ∑
+λ
1 − FX (7) x−2=0 ( x − 2)! x−2=0 ( x − 2)!
− ∑
x−1=0 ( x − 1)! x−1=0 ( x − 1)!
∑
λ 2 e −λ e λ λ 2 FX (5) λ e −λ e λ λFX (6) λ 2 (1 − FX (5)) + λ(1 − FX (6))
= − + − =
1 − FX (7) 1 − FX (7) 1 − FX (7) 1 − FX (7) 1 − FX (7)
144(0.978) + 12(0.954)
= = 167.514
0.911
Example 6.12
For the exponential failure distribution of Example 5.15,
24 24
fT (t) 1
E[T |T ≤ 24] =
∫
0
t
FT (24)
dt =
1 − e −(0.025)(24.0) ∫ 0
λte −λt dt
1 1
24 − 24e −0.6 − e −0.6
1 −λt 1 −λt λ λ
= −te − e = = 10.807
1 − e −0.6 λ 1 − e −0.6
0
24 24
fT (t) 1
E[T 2 |T ≤ 24] =
∫ 0
t2
FT (24)
dt =
1 − e −0.60 ∫
0
λt 2 e −λt dt
24
1 2 −λt 2t −λt 2 −λt
= −t e − e − 2 e
1 − e −0.6 λ λ 0
118 Probability Foundations for Engineers
2 48 2
− e −0.6 242 + +
λ2 λ λ2
= = 163.942
1 − e −0.6
E[Y ] = E [ E[Y |X ]] =
∫ E[Y|x] f
x
X ( x)dx (6.18)
or
E[Y ] = E [ E[Y |X ]] = ∑ E[Y|X = x]Pr[X = x] (6.19)
x
Example 6.13
For the joint density

f_{XY}(x, y) = 2 e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞

f_X(x) = 2 e^{-2x}

and

f_{Y|X}(y|x) = e^{-(y-x)}

Using these,

E[Y|X] = \int_x^{\infty} y f_{Y|X}(y|x) dy = \int_x^{\infty} y e^{x-y} dy = e^x \int_x^{\infty} y e^{-y} dy
 = e^x \left[ -y e^{-y} - e^{-y} \right]_x^{\infty} = e^x ( x e^{-x} + e^{-x} ) = x + 1

and

E[Y] = \int_0^{\infty} (x + 1) f_X(x) dx = E[X] + 1 = 3/2

as E[X] = 1/2.
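A brief simulation (not in the original text) illustrates the same iterated-expectation result; it relies on the fact that, for this joint density, X is exponential with rate 2 and, given X = x, Y - x is exponential with rate 1.

import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=0.5, size=500_000)        # X with density 2e^{-2x}
y = x + rng.exponential(scale=1.0, size=500_000)    # given X = x, Y - x is exponential(1)
print(x.mean(), y.mean())                           # approximately 0.5 and 1.5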
Example 6.14
In Example 6.9, we constructed

E[Y_1|Y_2] = \sum_{y_1=0}^{y_2} y_1 \binom{y_2}{y_1} (11/17)^{y_1} (6/17)^{y_2-y_1} = (11/17) y_2

so that, using Equation 6.19, E[Y_1] = E[E[Y_1|Y_2]] = (11/17) E[Y_2] = (11/17)(17) = 11.
These examples illustrate the fact that it can be much easier to use conditioning
to perform probability calculations than to make the calculations directly. The
reason is that the conditional probabilities incorporate information that often
simplifies the computation.
This completes the enumeration of the various aspects of the functions of
random variables that correspond to expectation. It is now time to move on to
general functions of random variables and random vectors. This discussion,
in turn, will be followed by an examination of the special class of functions in
which we sum independent random variables.
Two simple examples of such functions of a random variable, say Y = g(X), are

Y = 1.8X + 32

and

Y = e^{aX}

For a monotone function, probabilities on Y follow directly from those on X. For
the second of these functions, we obtain probabilities on Y as

Pr[Y ≤ y] = Pr[e^{aX} ≤ y] = Pr[aX ≤ \ln y] = Pr[X ≤ (1/a) \ln y]

More generally, when g is monotone with inverse g^{-1}, the density on Y is

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y))   (6.20)
This is to say that we take the absolute value of the derivative of the inverse
function and multiply it by the density function on X evaluated at the value,
which is the inverse of the functional value for which the density is desired.
Consider some examples.
Example 6.15
Let F_X(x) = 1 - e^{-λx} so f_X(x) = λ e^{-λx}, and suppose Y = g(X) = cX. Then

g^{-1}(y) = y/c

and

\frac{d}{dy} g^{-1}(y) = \frac{d}{dy} \frac{y}{c} = \frac{1}{c}

Therefore,

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{λ}{c} e^{-(λ/c) y}

and

F_Y(y) = \int_0^{y} f_Y(y) dy = \int_0^{y} \frac{λ}{c} e^{-(λ/c) y} dy = 1 - e^{-(λ/c) y}

For example, with λ = 0.0001 and c = 15,

F_Y(y = 60000) = 1 - e^{-(0.0001/15)(60000)} = 1 - e^{-0.4} = 0.330
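A quick numerical check of this example (not part of the original text), assuming λ = 0.0001 and c = 15 as implied by the final computation:

from math import exp

lam, c = 0.0001, 15.0                 # rate of X and the scale factor in Y = cX
# Y = cX is exponential with rate lam/c, so
print(1 - exp(-(lam / c) * 60_000))   # F_Y(60000) = 1 - e^{-0.4}, about 0.330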
Example 6.16
Let F_X(x) = 1 - e^{-(x/θ)^β}, so f_X(x) = (β x^{β-1}/θ^β) e^{-(x/θ)^β}, and suppose again that
Y = g(X) = cX. Then the inverse function and its derivative are the same
as in the previous example, so

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{1}{c} \frac{β (y/c)^{β-1}}{θ^β} e^{-(y/(cθ))^β} = \frac{β y^{β-1}}{(cθ)^β} e^{-(y/(cθ))^β}

and

F_Y(y) = \int_0^{y} f_Y(y) dy = 1 - e^{-(y/(cθ))^β}

For example, with β = 1.6, θ = 1500, and c = 3.15,

F_X(x = 800) = 1 - e^{-(800/1500)^{1.6}} = 1 - e^{-0.366} = 0.306

and

F_Y(y = 2520) = 1 - e^{-(2520/((3.15)(1500)))^{1.6}} = 1 - e^{-0.366} = 0.306
Example 6.17
Let F_X(x) = (x - a)/(b - a), a ≤ x ≤ b, so f_X(x) = 1/(b - a), and suppose
Y = g(X) = ln X. Then

g^{-1}(y) = e^y

and

\frac{d}{dy} g^{-1}(y) = e^y

Therefore,

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{e^y}{b - a}

and

F_Y(y) = \int_{\ln a}^{y} f_Y(y) dy = \int_{\ln a}^{y} \frac{e^y}{b - a} dy = \frac{e^y - a}{b - a}

For example, with a = 1.0 and b = 5.0,

F_X(x = 2.6) = \frac{2.6 - 1.0}{5.0 - 1.0} = 0.40
Now consider a pair of random variables, X_1 and X_2, and a pair of functions
Y_1 = g_1(X_1, X_2) and Y_2 = g_2(X_1, X_2) having inverse functions X_1 = h_1(Y_1, Y_2)
and X_2 = h_2(Y_1, Y_2). When the functions are so defined, we can obtain the joint
density on the random vector Y using the Jacobian

J(x_1, x_2) = \begin{vmatrix} ∂g_1/∂x_1 & ∂g_1/∂x_2 \\ ∂g_2/∂x_1 & ∂g_2/∂x_2 \end{vmatrix}

as f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1, y_2), h_2(y_1, y_2)) |J(x_1, x_2)|^{-1}.
Example 6.18
Suppose the random variables X_1 and X_2 are independent and gamma
distributed with

f_{X_1}(x_1) = \frac{λ^{α_1} x_1^{α_1-1} e^{-λx_1}}{Γ(α_1)}

and

f_{X_2}(x_2) = \frac{λ^{α_2} x_2^{α_2-1} e^{-λx_2}}{Γ(α_2)}

and that

Y_1 = g_1(x_1, x_2) = X_1 + X_2

and

Y_2 = g_2(x_1, x_2) = X_1/(X_1 + X_2)

Then

J(x_1, x_2) = \begin{vmatrix} 1 & 1 \\ x_2/(x_1 + x_2)^2 & -x_1/(x_1 + x_2)^2 \end{vmatrix} = \frac{-1}{x_1 + x_2}

For the inverse functions, x_2 = y_1 - x_1 and y_2 = x_1/(x_1 + y_1 - x_1) = x_1/y_1,
so x_1 = y_1 y_2 and x_2 = y_1(1 - y_2).
Now, as we know,

f_{X_1,X_2}(x_1, x_2) = \frac{λ^{α_1+α_2} x_1^{α_1-1} x_2^{α_2-1} e^{-λ(x_1+x_2)}}{Γ(α_1)Γ(α_2)}

Therefore,

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1, y_2), h_2(y_1, y_2)) |J(x_1, x_2)|^{-1}
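Carrying out the substitution indicates that Y_1 behaves as a gamma variable with shape α_1 + α_2 and that Y_2 is beta distributed. The Monte Carlo sketch below is not from the text, and the parameter values chosen are arbitrary illustrations.

import numpy as np

a1, a2, lam = 2.0, 3.0, 1.5                        # illustrative parameters, not from the text
rng = np.random.default_rng(3)
x1 = rng.gamma(shape=a1, scale=1 / lam, size=200_000)
x2 = rng.gamma(shape=a2, scale=1 / lam, size=200_000)
y1, y2 = x1 + x2, x1 / (x1 + x2)

# Substituting x1 = y1*y2 and x2 = y1*(1 - y2) into f_{Y1,Y2} suggests that
# Y1 behaves like a gamma(a1 + a2, lam) variable and Y2 like a beta(a1, a2) variable.
print(y1.mean(), (a1 + a2) / lam)                  # both about 3.33
print(y1.var(), (a1 + a2) / lam**2)                # both about 2.22
print(y2.mean(), a1 / (a1 + a2))                   # both about 0.40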
Example 6.19
Suppose the random variables X_1 and X_2 have the joint density function

f_{X_1,X_2}(x_1, x_2) = \frac{1}{x_1^2 x_2^2}, x_1 ≥ 1, x_2 ≥ 1

and that y_1 = x_1 x_2 and y_2 = x_1/x_2. What is the density on Y? For these
functions,

J(x_1, x_2) = \begin{vmatrix} x_2 & x_1 \\ 1/x_2 & -x_1/x_2^2 \end{vmatrix} = \frac{-2x_1}{x_2}

and the inverse functions are x_1 = \sqrt{y_1 y_2} and x_2 = \sqrt{y_1/y_2}. Therefore,

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(\sqrt{y_1 y_2}, \sqrt{y_1/y_2}) |J(x_1, x_2)|^{-1}
 = \frac{1}{(y_1 y_2)(y_1/y_2)} \cdot \frac{\sqrt{y_1/y_2}}{2\sqrt{y_1 y_2}} = \frac{1}{2 y_1^2 y_2}
Expectations and variances of linear functions follow directly from the
definitions. For example,

E[aX + b] = aE[X] + b

and, for a linear combination of two random variables,

Var[aX + bY] = a^2 ( E[X^2] - E^2[X] ) + b^2 ( E[Y^2] - E^2[Y] ) + 2ab ( E[XY] - E[X]E[Y] )
 = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X, Y]
It is true that this type of analysis is not applicable to all functions. It usually
applies well to linear functions and many products. When it does not apply,
one must use the general definitions of mean and variance. Note particularly
that when it does apply, the variables in the function need not be independent.
Consider next the sum of two independent discrete random variables, Z = X + Y.
To obtain the distribution on Z, all combinations of values of X and Y that yield
a sum no greater than z must be included, so

F_Z(z) = Pr[Z ≤ z] = Pr[X + Y ≤ z] = \sum_{i=0}^{z} \sum_{j=0}^{i} f_X(i - j) f_Y(j)   (6.22)

This may be written more compactly as

F_Z(z) = \sum_{j=0}^{z} F_X(z - j) f_Y(j)   (6.23)

and

F_Z(z) = \sum_{i=0}^{z} f_X(i) F_Y(z - i)   (6.24)

Similarly, the probability mass function on the sum is

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j)   (6.25)

or equivalently

f_Z(z) = \sum_{i=0}^{z} f_X(i) f_Y(z - i)   (6.26)
Example 6.20
Suppose the two random variables have binomial distributions so that

f_X(x) = \binom{n}{x} p_1^x (1 - p_1)^{n-x}

and

f_Y(y) = \binom{m}{y} p_2^y (1 - p_2)^{m-y}

Then

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j) = \sum_{j=0}^{z} \binom{n}{z-j} p_1^{z-j} (1 - p_1)^{n-z+j} \binom{m}{j} p_2^{j} (1 - p_2)^{m-j}

In the special case in which the event probabilities are the same, so
p_1 = p_2 = p,

f_Z(z) = \sum_{j=0}^{z} \binom{n}{z-j} p^{z-j} (1 - p)^{n-z+j} \binom{m}{j} p^{j} (1 - p)^{m-j}
 = p^z q^{n+m-z} \sum_{j=0}^{z} \binom{n}{z-j} \binom{m}{j} = \binom{n+m}{z} p^z q^{n+m-z}
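The Vandermonde identity used in the last step can be confirmed numerically. The following sketch (not part of the text; n, m, and p are arbitrary) convolves the two binomial pmfs and compares the result with the binomial(n + m, p) pmf.

from math import comb

n, m, p = 5, 7, 0.3                       # arbitrary illustrative values
fx = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
fy = [comb(m, y) * p**y * (1 - p)**(m - y) for y in range(m + 1)]

for z in range(n + m + 1):
    fz = sum(fx[z - j] * fy[j] for j in range(max(0, z - n), min(m, z) + 1))
    direct = comb(n + m, z) * p**z * (1 - p)**(n + m - z)
    assert abs(fz - direct) < 1e-12       # convolution equals binomial(n + m, p)
print("convolution matches the binomial(n + m, p) pmf")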
Example 6.21
Suppose the two random variables have Poisson distributions so that

f_X(x) = \frac{e^{-λ_1} λ_1^x}{x!}

and

f_Y(y) = \frac{e^{-λ_2} λ_2^y}{y!}

Then

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j) = \sum_{j=0}^{z} \frac{e^{-λ_1} λ_1^{z-j}}{(z - j)!} \frac{e^{-λ_2} λ_2^{j}}{j!}
 = \frac{e^{-λ_1-λ_2}}{z!} \sum_{j=0}^{z} \frac{z!}{(z - j)! j!} λ_1^{z-j} λ_2^{j} = \frac{e^{-(λ_1+λ_2)}}{z!} (λ_1 + λ_2)^z
Example 6.22
Suppose the two random variables have geometric distributions so that

f_N(n) = q_1^{n-1} p_1

and

f_M(m) = q_2^{m-1} p_2

For K = N + M,

f_K(k) = \sum_{j=1}^{k-1} f_N(k - j) f_M(j) = \sum_{j=1}^{k-1} q_1^{k-j-1} p_1 q_2^{j-1} p_2 = p_1 p_2 \sum_{j=1}^{k-1} q_1^{k-j-1} q_2^{j-1}

In the special case in which the event probabilities are the same, so
p_1 = p_2 = p,

f_K(k) = p^2 \sum_{j=1}^{k-1} q^{k-2} = (k - 1) p^2 q^{k-2} = \binom{k-1}{1} p^2 q^{k-2}

which is a negative binomial pmf.
These examples illustrate the fact that the probabilities for sums can be computed
directly and that, in some cases, the identity of the distribution family is
preserved. Following is a more specific example.
Example 6.23
Suppose the one-day demand for a particular digital camera has the pmf

d            0      1      2      3
Pr(D = d)    0.1    0.4    0.3    0.2

What are the pmf and the cdf on the two-day demand?
The pmf is

d2             0      1      2      3      4      5      6
Pr(D2 = d2)    0.01   0.08   0.22   0.28   0.25   0.12   0.04

The cdf is

d2             0      1      2      3      4      5      6
Pr(D2 ≤ d2)    0.01   0.09   0.31   0.59   0.84   0.96   1.00
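The two-day pmf and cdf are simply the convolution of the one-day pmf with itself, so they are easy to reproduce; the following sketch is illustrative and not part of the original example.

import numpy as np

p1 = np.array([0.1, 0.4, 0.3, 0.2])       # one-day demand pmf, d = 0, 1, 2, 3
p2 = np.convolve(p1, p1)                  # two-day demand pmf, d2 = 0, ..., 6
print(np.round(p2, 2))                    # [0.01 0.08 0.22 0.28 0.25 0.12 0.04]
print(np.round(np.cumsum(p2), 2))         # [0.01 0.09 0.31 0.59 0.84 0.96 1.  ]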
The rules for the distribution on the sum of two continuous random variables
are similar to those for discrete random variables. Essentially, the sums are
replaced by integrals. The concept is the same: all combinations of the two
variables that yield a sum no greater than z must be included. Therefore,

F_Z(z) = Pr[Z ≤ z] = Pr[X + Y ≤ z] = \int_{-\infty}^{\infty} \int_{-\infty}^{z-y} f_X(x) f_Y(y) dx\, dy   (6.27)

which may be written as

F_Z(z) = \int_{-\infty}^{z} F_X(z - y) f_Y(y) dy   (6.28)

and

F_Z(z) = \int_{-\infty}^{z} f_X(x) F_Y(z - x) dx   (6.29)

The corresponding density functions are

f_Z(z) = \int_{-\infty}^{z} f_X(z - y) f_Y(y) dy   (6.30)

and

f_Z(z) = \int_{-\infty}^{z} f_X(x) f_Y(z - x) dx   (6.31)
Example 6.24
Suppose the two random variables have exponential distributions so that

f_X(x) = λ_1 e^{-λ_1 x}

and

f_Y(y) = λ_2 e^{-λ_2 y}

Then

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} λ_1 e^{-λ_1 (z-y)} λ_2 e^{-λ_2 y} dy
 = λ_1 λ_2 e^{-λ_1 z} \int_0^{z} e^{-(λ_2-λ_1) y} dy = λ_1 λ_2 e^{-λ_1 z} \left[ \frac{1 - e^{-(λ_2-λ_1) z}}{λ_2 - λ_1} \right]
 = \frac{λ_1 λ_2}{λ_2 - λ_1} \left( e^{-λ_1 z} - e^{-λ_2 z} \right)

For the special case in which both distributions have the same value
for the rate parameter, so that λ_1 = λ_2 = λ,

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} λ e^{-λ(z-y)} λ e^{-λ y} dy = λ^2 e^{-λ z} \int_0^{z} dy = λ^2 z e^{-λ z}
Example 6.25
Suppose the two random variables have gamma distributions so that

f_X(x) = \frac{λ_1^{α_1} x^{α_1-1} e^{-λ_1 x}}{Γ(α_1)}

and

f_Y(y) = \frac{λ_2^{α_2} y^{α_2-1} e^{-λ_2 y}}{Γ(α_2)}

Then

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} \frac{λ_1^{α_1} (z - y)^{α_1-1} e^{-λ_1(z-y)}}{Γ(α_1)} \frac{λ_2^{α_2} y^{α_2-1} e^{-λ_2 y}}{Γ(α_2)} dy
 = \frac{λ_1^{α_1} λ_2^{α_2}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} e^{-λ_1(z-y)-λ_2 y} dy
 = \frac{λ_1^{α_1} λ_2^{α_2} e^{-λ_1 z}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} e^{-(λ_2-λ_1) y} dy

For the special case in which both distributions have the same value
for the rate parameter, so that λ_1 = λ_2 = λ,

f_Z(z) = \frac{λ^{α_1+α_2} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} dy
 = \frac{λ^{α_1+α_2} z^{α_1+α_2-2} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{z} \left( \frac{z - y}{z} \right)^{α_1-1} \left( \frac{y}{z} \right)^{α_2-1} dy

Now, let w = y/z so dw = dy/z and

f_Z(z) = \frac{λ^{α_1+α_2} z^{α_1+α_2-1} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{1} (1 - w)^{α_1-1} w^{α_2-1} dw

The remaining integral is the beta function, Γ(α_1)Γ(α_2)/Γ(α_1 + α_2), so

f_Z(z) = \frac{λ^{α_1+α_2} z^{α_1+α_2-1} e^{-λ z}}{Γ(α_1 + α_2)}

which is a gamma density with parameters α_1 + α_2 and λ.
Example 6.26
Suppose a machining station at a manufacturing facility takes an expo-
nentially distributed time to process a single piece. The distribution has
parameter λ = 4/hr. Suppose further that the cost of holding in-process
inventory is proportional to the length of time a workpiece waits for pro-
cessing. If a particular workpiece arrives to the inventory and finds four
pieces ahead of it with the first one just entering the machining station,
what is the distribution on the time until processing is started on the
arriving piece?
For this problem, the total waiting time is T_T = T_1 + T_2 + T_3 + T_4. We know
that all of the times have exponential distributions with a common rate
parameter, so following Example 6.24, if X = T_1 + T_2 and Y = T_3 + T_4, then

f_X(x) = \int_0^{x} f_{T_1}(x - t_2) f_{T_2}(t_2) dt_2 = \int_0^{x} λ e^{-λ(x-t_2)} λ e^{-λ t_2} dt_2 = λ^2 e^{-λ x} \int_0^{x} dt_2 = λ^2 x e^{-λ x}

and

f_Y(y) = \int_0^{y} f_{T_3}(y - t_4) f_{T_4}(t_4) dt_4 = \int_0^{y} λ e^{-λ(y-t_4)} λ e^{-λ t_4} dt_4 = λ^2 e^{-λ y} \int_0^{y} dt_4 = λ^2 y e^{-λ y}
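Completing the convolution of X and Y as in Example 6.25 shows that the total waiting time T_T has a gamma (Erlang) distribution with shape 4 and rate λ = 4/hr. A small numerical sketch (not part of the original example) of how that distribution might be evaluated:

from scipy import stats

lam = 4.0                                    # processing rate per hour
wait = stats.gamma(a=4, scale=1 / lam)       # T1 + T2 + T3 + T4: Erlang with shape 4

print(wait.mean())                           # expected wait of 1.0 hour
print(wait.cdf(1.5))                         # probability the wait is at most 1.5 hours, about 0.85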
Exercises
6.1 A discrete random variable, X, has the following probability mass
function:
f_X(x) = 0.08 (x = 0), 0.24 (x = 1), 0.30 (x = 2), 0.20 (x = 3), 0.18 (x = 4)

f_D(d) = 0.08 (d = 0), 0.12 (d = 1), 0.18 (d = 2), 0.24 (d = 3), 0.16 (d = 4), 0.12 (d = 5), 0.10 (d = 6)

f_Y(y) = 0 (y = 0), 0.8c (y = 1), 1.2c (y = 2), c (y = 3), 0.6c (y = 4), 0.4c (y = 5)
the product, what are the mean and variance for the number of defec-
tive units found in the sample? Explain why it is reasonable to have a
noninteger expected value.
6.6 For the bearing inspection process described in Exercise 4.27
of Chapter 4, what are the mean and variance for the number of
inspections performed?
6.7 If a random variable has the probability density defined as
f X(x) = 1.5(1 – x2), 0 ≤ x ≤ 1, compute E[X] and Var[X].
6.8 Compute E[T] and Var[T] for the exponential variable having λ = 0.024.
6.9 A random variable X has E[X] = 2 and Var[X] = 4.2.
a. Compute E[(X + 1)2].
b. Compute Var[2X + 3].
6.10 Suppose a random variable has density function
f_X(x) = ax + bx^2 for 0 < x < 1, and 0 elsewhere
Determine Pr[X > 1.4|Y = 2.5], E[X|Y = 2.5] and Var[X|Y = 2.5].
6.20 For the pmf of Exercise 6.1, determine the values for E[X|X > 1] and
Var[X | X > 1].
6.21 The density function on a random variable X is f_X(x) = x/2, 0 ≤ x ≤ 2.
Compute E[X|X > 0.5].
6.22 The random variables X and Y have joint density f XY(x, y) = e – (x + y),
0 ≤ x ≤ ∞, 0 ≤ y ≤ ∞. Compute Pr[X < Y].
6.23 Suppose X and Y are independent exponential random variables.
Identify the distribution function on Z = X/Y and construct an
expression for Pr[X < Y].
6.24 The joint probability density function for the random variables X
and Y is
f_XY(x, y) = x/5 + cy, 0 ≤ x ≤ 1, 1 ≤ y ≤ 5
Compute Pr[X + Y ≥ 3.5].
6.25 The joint density function on X and Y is f XY(x, y) = (x + y), 0 ≤ x ≤ 1,
0 ≤ y ≤ 1. Compute Pr[X + Y ≤ 0.8].
6.26 Suppose X and Y are independent and identically distributed uni-
form random variables over the range (0, 1) and that we define the
variables U = X + Y and V = X/Y. Determine the joint density on U
and V.
6.27 Suppose X and Y are independent and identically distributed
exponential random variables having parameter λ = 1 and that we
define the variables U = X + Y and V = X/(X + Y). Determine the
joint density on U and V.
6.28 For the random variables analyzed in Exercise 6.13, compute
E[N – M] and Var[N–M].
6.29 For the random variables analyzed in Exercise 6.14, compute
E[X + Y] and Var[X + Y].
M_X(θ) = E[e^{θX}] = \int_x e^{θx} f_X(x) dx   (7.3)

E[X^k] = \frac{d^k}{dθ^k} M_X(θ) \Big|_{θ=0}   (7.4)
so the kth moment is computed as the kth derivative of the mgf evaluated at θ = 0.
This relationship applies to both discrete and continuous random variables.
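Equation 7.4 is easy to apply with a computer algebra system. The sketch below is not part of the text; it differentiates the exponential mgf λ/(λ - θ), which is derived in Example 7.2, to recover the mean and variance.

import sympy as sp

theta, lam = sp.symbols("theta lam", positive=True)

M = lam / (lam - theta)                        # exponential mgf (see Example 7.2)
EX  = sp.diff(M, theta, 1).subs(theta, 0)      # E[X]   = 1/lam
EX2 = sp.diff(M, theta, 2).subs(theta, 0)      # E[X^2] = 2/lam**2
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # mean 1/lam, variance 1/lam**2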
Example 7.1
A binomial distribution having parameters n and p has the moment-
generating function
M_X(θ) = \sum_{x=0}^{n} e^{θx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (p e^{θ})^x q^{n-x} = (q + p e^{θ})^n

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = n p e^{θ} (q + p e^{θ})^{n-1} \Big|_{θ=0} = np

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \frac{d}{dθ} \left[ n p e^{θ} (q + p e^{θ})^{n-1} \right] \Big|_{θ=0} = np + n(n - 1)p^2 = np + n^2 p^2 - np^2

so, as we know, Var[X] = E[X^2] - E^2[X] = np - np^2 = npq.
Example 7.2
An exponential distribution having parameter λ has the moment-
generating function

M_X(θ) = E[e^{θX}] = \int_0^{\infty} λ e^{θx} e^{-λx} dx = \int_0^{\infty} λ e^{-(λ-θ)x} dx = \frac{λ}{λ - θ}

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = \frac{λ}{(λ - θ)^2} \Big|_{θ=0} = \frac{1}{λ}

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \frac{2λ}{(λ - θ)^3} \Big|_{θ=0} = \frac{2}{λ^2}

so

Var[X] = E[X^2] - E^2[X] = \frac{2}{λ^2} - \frac{1}{λ^2} = \frac{1}{λ^2}
The moment-generating function can be constructed for empirical distributions
as well as for the standard distribution families. Consider the two-day demand
distribution constructed in Example 6.23 in Chapter 6. For that distribution,
the mgf is

M_X(θ) = \sum_x e^{θx} Pr[X = x] = 0.01 + 0.08e^{θ} + 0.22e^{2θ} + 0.28e^{3θ} + 0.25e^{4θ} + 0.12e^{5θ} + 0.04e^{6θ}

Then, the mean and variance are obtained in the same manner as for other
distributions:

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = \left( 0.08e^{θ} + 0.44e^{2θ} + 0.84e^{3θ} + 1.00e^{4θ} + 0.60e^{5θ} + 0.24e^{6θ} \right) \Big|_{θ=0} = 3.2

E[X^2] = \frac{d}{dθ} \left( 0.08e^{θ} + 0.44e^{2θ} + 0.84e^{3θ} + 1.00e^{4θ} + 0.60e^{5θ} + 0.24e^{6θ} \right) \Big|_{θ=0}
 = \left( 0.08e^{θ} + 0.88e^{2θ} + 2.52e^{3θ} + 4.00e^{4θ} + 3.00e^{5θ} + 1.44e^{6θ} \right) \Big|_{θ=0} = 11.92

so

Var[X] = E[X^2] - E^2[X] = 11.92 - (3.2)^2 = 1.68
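These moments can be confirmed directly from the pmf of Example 6.23; the short check below is not part of the original text.

import numpy as np

d2 = np.arange(7)
p  = np.array([0.01, 0.08, 0.22, 0.28, 0.25, 0.12, 0.04])

mean = (d2 * p).sum()
ex2  = (d2**2 * p).sum()
print(mean, ex2, ex2 - mean**2)   # 3.2, 11.92, 1.68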
Next, the fact that the normal distribution is so widely used makes it
worthwhile to include the construction of its moment-generating function
here. Recall that the density function for the normal distribution is
f_X(x) = \frac{1}{\sqrt{2πσ^2}} e^{-(x-µ)^2/(2σ^2)}

so

M_X(θ) = \int_{-\infty}^{\infty} e^{θx} \frac{1}{\sqrt{2πσ^2}} e^{-(x-µ)^2/(2σ^2)} dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2πσ^2}} e^{θx - (x-µ)^2/(2σ^2)} dx

Completing the square in the exponent gives

θx - \frac{(x - µ)^2}{2σ^2} = - \frac{(x - (µ + θσ^2))^2}{2σ^2} + \frac{2µθ + θ^2σ^2}{2}

so that the remaining integral is that of a normal density and

M_X(θ) = e^{µθ + θ^2σ^2/2}

Then

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = (µ + θσ^2) e^{µθ + θ^2σ^2/2} \Big|_{θ=0} = µ

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \left[ σ^2 e^{µθ + θ^2σ^2/2} + (µ + θσ^2)^2 e^{µθ + θ^2σ^2/2} \right] \Big|_{θ=0} = σ^2 + µ^2
7.2 Convolutions
There are numerous applications of moment-generating functions, but the
most widely implemented is the determination of the distribution on the sum
of independent random variables. Recall that some of these sums were
constructed directly in Chapter 6. Unfortunately, not all sums of independent
random variables can be analyzed directly as was done there. For example,
suppose X and Y are independent random variables with normal distributions.
The computation rules of Chapter 6 imply that for Z = X + Y,

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y) dy = \int_{-\infty}^{\infty} \frac{e^{-(z-y-µ_X)^2/(2σ_X^2)}}{\sqrt{2πσ_X^2}} \frac{e^{-(y-µ_Y)^2/(2σ_Y^2)}}{\sqrt{2πσ_Y^2}} dy
and evaluating this integral is very difficult. On the other hand, the dis-
tribution on the sum of independent random variables is known to have a
moment-generating function comprised of the product of the moment gener-
ating functions of the variables in the sum. In general, as well as for the case
of two normal variables
M_Z(θ) = e^{µ_Xθ + θ^2σ_X^2/2} e^{µ_Yθ + θ^2σ_Y^2/2} = e^{(µ_X+µ_Y)θ + θ^2(σ_X^2+σ_Y^2)/2} = e^{µ_Zθ + θ^2σ_Z^2/2}

which we recognize as the mgf for a normal random variable, in this case Z,
having µ_Z = µ_X + µ_Y and σ_Z^2 = σ_X^2 + σ_Y^2.
The process of constructing the distribution on the sum of independent
random variables is called taking the convolution of the variables. Thus, we
would say that Z is the convolution of X and Y.
As indicated in Chapter 6, sums of more than two random variables can
be accumulated pairwise. However, using the moment-generating func-
tions, the distribution on the sum of several independent random variables
can be directly identified. For the set of independent random variables, say
X1, X2, …, Xn, the random variable Y = X1 + X2 + … + Xn has a distribution for
which the mgf is
M_Y(θ) = \prod_{i=1}^{n} M_{X_i}(θ)   (7.6)
Because a moment-generating function uniquely identifies a distribution,
recognizing the form of an mgf identifies the distribution of the corresponding
random variable. For example, if we find a random variable Z with the
moment-generating function

M_Z(θ) = e^{µ_Zθ + θ^2σ_Z^2/2}

it must be the case that the random variable Z has a normal distribution with
the indicated parameters. Similarly, if we find a random variable with the
moment-generating function

M_X(θ) = (q + p e^{θ})^n
then it must be the case that the random variable X has a binomial distribu-
tion. Similar statements apply to each of the standard distributions described
in this text, so it is only in the analysis of other probability distributions
that we must actually perform the inversion operation on the mgf. For those
For a random vector, the joint moment-generating function is defined as

M_{X_1,X_2,...,X_n}(θ_1, θ_2, ..., θ_n) = E[e^{θ_1X_1 + θ_2X_2 + \cdots + θ_nX_n}]
 = \int_{x_1} \int_{x_2} \cdots \int_{x_n} e^{θ_1x_1 + θ_2x_2 + \cdots + θ_nx_n} f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) dx_n \cdots dx_1   (7.7)

Joint moments are obtained by partial differentiation. For two random variables,

\frac{∂^n}{∂θ_2^m ∂θ_1^{n-m}} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=0, θ_2=0} = E[X^{n-m} Y^m]   (7.8)

and in general,

E[X_1^{r_1} X_2^{r_2} \cdots X_n^{s-r_1-r_2-\cdots-r_{n-1}}] = \frac{d^s}{dθ_1^{r_1} dθ_2^{r_2} \cdots dθ_n^{s-r_1-r_2-\cdots-r_{n-1}}} M_{X_1,X_2,...,X_n}(θ_1, θ_2, ..., θ_n) \Big|_{θ=0}   (7.9)
Example 7.3
Suppose the following empirical pmf describes a process of interest to us:

f_{X,Y}(1,1) = 1/9    f_{X,Y}(1,2) = 1/6    f_{X,Y}(1,3) = 1/18
f_{X,Y}(2,1) = 1/18   f_{X,Y}(2,2) = 1/9    f_{X,Y}(2,3) = 1/9
f_{X,Y}(3,1) = 1/9    f_{X,Y}(3,2) = 1/9    f_{X,Y}(3,3) = 1/6
The joint moment-generating function is

M_{X,Y}(θ_1, θ_2) = \frac{1}{9}e^{θ_1+θ_2} + \frac{1}{6}e^{θ_1+2θ_2} + \frac{1}{18}e^{θ_1+3θ_2} + \frac{1}{18}e^{2θ_1+θ_2} + \frac{1}{9}e^{2θ_1+2θ_2} + \frac{1}{9}e^{2θ_1+3θ_2} + \frac{1}{9}e^{3θ_1+θ_2} + \frac{1}{9}e^{3θ_1+2θ_2} + \frac{1}{6}e^{3θ_1+3θ_2}

so

E[XY] = \frac{∂^2}{∂θ_2 ∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0}
 = \left( \frac{1}{9}e^{θ_1+θ_2} + \frac{2}{6}e^{θ_1+2θ_2} + \frac{3}{18}e^{θ_1+3θ_2} + \frac{2}{18}e^{2θ_1+θ_2} + \frac{4}{9}e^{2θ_1+2θ_2} + \frac{6}{9}e^{2θ_1+3θ_2} + \frac{3}{9}e^{3θ_1+θ_2} + \frac{6}{9}e^{3θ_1+2θ_2} + \frac{9}{6}e^{3θ_1+3θ_2} \right) \Big|_{θ_1=θ_2=0}
 = \frac{1}{9} + \frac{2}{6} + \frac{3}{18} + \frac{2}{18} + \frac{4}{9} + \frac{6}{9} + \frac{3}{9} + \frac{6}{9} + \frac{9}{6} = \frac{78}{18} = \frac{13}{3}
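The same result follows from direct enumeration of E[XY] over the nine support points; the following check is not part of the original example.

from fractions import Fraction as F

pmf = {(1, 1): F(1, 9),  (1, 2): F(1, 6),  (1, 3): F(1, 18),
       (2, 1): F(1, 18), (2, 2): F(1, 9),  (2, 3): F(1, 9),
       (3, 1): F(1, 9),  (3, 2): F(1, 9),  (3, 3): F(1, 6)}

exy = sum(x * y * p for (x, y), p in pmf.items())
print(exy)   # 13/3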
Example 7.4
For the joint density f_{XY}(x, y) = 2e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞, the joint
moment-generating function is

M_{X,Y}(θ_1, θ_2) = E[e^{θ_1X + θ_2Y}] = 2 \int_0^{\infty} \int_0^{y} e^{θ_1x + θ_2y} e^{-x-y} dx\, dy = 2 \int_0^{\infty} e^{-(1-θ_2)y} \int_0^{y} e^{-(1-θ_1)x} dx\, dy
 = 2 \int_0^{\infty} e^{-(1-θ_2)y} \frac{1 - e^{-(1-θ_1)y}}{1 - θ_1} dy = \frac{2}{1 - θ_1} \int_0^{\infty} \left( e^{-(1-θ_2)y} - e^{-(2-θ_1-θ_2)y} \right) dy
 = \frac{2}{1 - θ_1} \left( \frac{1}{1 - θ_2} - \frac{1}{2 - θ_1 - θ_2} \right) = \frac{2}{1 - θ_1} \cdot \frac{(1 - θ_1) + (1 - θ_2) - (1 - θ_2)}{(1 - θ_2)((1 - θ_1) + (1 - θ_2))}
 = \frac{2}{(1 - θ_2)((1 - θ_1) + (1 - θ_2))} = \frac{2}{(1 - θ_2)(1 - θ_1) + (1 - θ_2)^2}
Here again,

E[XY] = \frac{∂^2}{∂θ_2 ∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{∂}{∂θ_2} \frac{2(1 - θ_2)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0}
 = \frac{-2\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right) + 4(1 - θ_2)\left( 2(1 - θ_2) + (1 - θ_1) \right)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^3} \Big|_{θ_1=θ_2=0}
 = \frac{-2(1 + 1) + 4(1)(2 + 1)}{(1 + 1)^3} = \frac{-4 + 12}{8} = 1
Just as marginal distributions may be obtained from a joint distribution, for
example F_{X_1}(x_1) = F_{X_1,X_2}(x_1, ∞), marginal moment-generating functions are
obtained from a joint mgf by setting the other arguments to zero.
Example 7.5
For the empirical distribution presented in Example 7.3, the marginal
probability mass functions are

f_X(1) = 1/3     f_X(2) = 5/18    f_X(3) = 7/18
f_Y(1) = 5/18    f_Y(2) = 7/18    f_Y(3) = 1/3

so the marginal moment-generating functions are

M_X(θ_1) = M_{XY}(θ_1, 0) = \frac{1}{3}e^{θ_1} + \frac{5}{18}e^{2θ_1} + \frac{7}{18}e^{3θ_1}

and

M_Y(θ_2) = M_{XY}(0, θ_2) = \frac{5}{18}e^{θ_2} + \frac{7}{18}e^{2θ_2} + \frac{1}{3}e^{3θ_2}
Example 7.6
For the continuous distribution analyzed in Example 7.4, the marginal
probability density functions are

f_X(x) = \int_x^{\infty} f_{XY}(x, y) dy = 2 \int_x^{\infty} e^{-x-y} dy = 2e^{-2x}

and

f_Y(y) = \int_0^{y} f_{XY}(x, y) dx = 2 \int_0^{y} e^{-x-y} dx = 2e^{-y}(1 - e^{-y})

and the marginal moment-generating functions are

M_X(θ_1) = M_{XY}(θ_1, 0) = \frac{2}{2 - θ_1}

and

M_Y(θ_2) = M_{XY}(0, θ_2) = \frac{2}{(1 - θ_2) + (1 - θ_2)^2}

The moments follow directly:

E[X] = \frac{∂}{∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{2(1 - θ_2)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0} = \frac{1}{2}

E[X^2] = \frac{∂^2}{∂θ_1^2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{4(1 - θ_2)^2}{\left( (1 - θ_2)(2 - θ_1 - θ_2) \right)^3} \Big|_{θ_1=θ_2=0} = \frac{1}{2}

E[Y] = \frac{∂}{∂θ_2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{2\left( (1 - θ_1) + 2(1 - θ_2) \right)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0} = \frac{3}{2}

E[Y^2] = \frac{∂^2}{∂θ_2^2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{7}{2}
Example 7.7
For the joint probability mass function

f_{Y_1,Y_2}(y_1, y_2) = \frac{e^{-17} 11^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}, 0 ≤ y_1 ≤ y_2, 0 ≤ y_2 < ∞

we have the conditional mass functions

f_{Y_2|Y_1}(y_2|y_1) = \frac{e^{-6} 6^{y_2-y_1}}{(y_2 - y_1)!}

and

f_{Y_1|Y_2}(y_1|y_2) = \binom{y_2}{y_1} \left( \frac{11}{17} \right)^{y_1} \left( \frac{6}{17} \right)^{y_2-y_1}

The joint moment-generating function is

M_{Y_1,Y_2}(θ_1, θ_2) = \sum_{y_2=0}^{\infty} \sum_{y_1=0}^{y_2} e^{θ_1y_1 + θ_2y_2} \frac{e^{-17} 11^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}
 = e^{-17} \sum_{y_2=0}^{\infty} \frac{e^{θ_2y_2}}{y_2!} \sum_{y_1=0}^{y_2} \frac{y_2! (11e^{θ_1})^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}
 = e^{-17} \sum_{y_2=0}^{\infty} \frac{e^{θ_2y_2}}{y_2!} (6 + 11e^{θ_1})^{y_2} = e^{-17} \sum_{y_2=0}^{\infty} \frac{\left( e^{θ_2}(6 + 11e^{θ_1}) \right)^{y_2}}{y_2!} = e^{e^{θ_2}(6 + 11e^{θ_1}) - 17}

The conditional moment-generating functions are

M_{Y_1|Y_2}(θ_1|y_2) = \sum_{y_1=0}^{y_2} e^{θ_1y_1} \binom{y_2}{y_1} \left( \frac{11}{17} \right)^{y_1} \left( \frac{6}{17} \right)^{y_2-y_1} = \left( \frac{6}{17} + \frac{11}{17} e^{θ_1} \right)^{y_2}

and

M_{Y_2|Y_1}(θ_2|y_1) = \sum_{y_2=y_1}^{\infty} e^{θ_2y_2} \frac{e^{-6} 6^{y_2-y_1}}{(y_2 - y_1)!} = e^{-6+θ_2y_1} \sum_{y_2-y_1=0}^{\infty} \frac{(6e^{θ_2})^{y_2-y_1}}{(y_2 - y_1)!} = e^{6e^{θ_2} - 6 + θ_2y_1}

Therefore,

E[Y_1|Y_2] = \frac{d}{dθ_1} M_{Y_1|Y_2}(θ_1|y_2) \Big|_{θ_1=0} = y_2 \left( \frac{6}{17} + \frac{11}{17} e^{θ_1} \right)^{y_2-1} \frac{11}{17} e^{θ_1} \Big|_{θ_1=0} = \frac{11}{17} y_2

and

E[Y_2|Y_1] = \frac{d}{dθ_2} M_{Y_2|Y_1}(θ_2|y_1) \Big|_{θ_2=0} = (6e^{θ_2} + y_1) e^{6e^{θ_2} - 6 + θ_2y_1} \Big|_{θ_2=0} = 6 + y_1
Example 7.8
For the joint probability density function f_{XY}(x, y) = 2e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞,
we found that

f_{Y|X}(y|x) = e^{-(y-x)}

and

f_{X|Y}(x|y) = \frac{e^{-x}}{1 - e^{-y}}

Therefore, the conditional moment-generating functions are

M_{X|Y}(θ_1) = \int_0^{y} e^{xθ_1} f_{X|Y}(x|y) dx = \frac{1}{1 - e^{-y}} \int_0^{y} e^{-(1-θ_1)x} dx = \frac{1 - e^{-(1-θ_1)y}}{(1 - e^{-y})(1 - θ_1)}

and

M_{Y|X}(θ_2) = \int_x^{\infty} e^{yθ_2} f_{Y|X}(y|x) dy = e^{x} \int_x^{\infty} e^{-(1-θ_2)y} dy = \frac{e^{θ_2x}}{1 - θ_2}

so that

E[X|Y] = \frac{∂}{∂θ_1} M_{X|Y}(θ_1) \Big|_{θ_1=0} = \frac{\left( 1 - e^{-(1-θ_1)y} \right) - (1 - θ_1)ye^{-(1-θ_1)y}}{(1 - e^{-y})(1 - θ_1)^2} \Big|_{θ_1=0} = \frac{1 - e^{-y} - ye^{-y}}{1 - e^{-y}}

E[Y|X] = \frac{∂}{∂θ_2} M_{Y|X}(θ_2) \Big|_{θ_2=0} = \frac{xe^{θ_2x}(1 - θ_2) + e^{θ_2x}}{(1 - θ_2)^2} \Big|_{θ_2=0} = x + 1

E[X^2|Y] = \frac{∂^2}{∂θ_1^2} M_{X|Y}(θ_1) \Big|_{θ_1=0} = \frac{2 - 2e^{-y} - 2ye^{-y} - y^2e^{-y}}{1 - e^{-y}}

E[Y^2|X] = \frac{∂^2}{∂θ_2^2} M_{Y|X}(θ_2) \Big|_{θ_2=0} = \frac{x^2e^{θ_2x}(1 - θ_2)^2 + 2(1 + x - θ_2x)e^{θ_2x}}{(1 - θ_2)^3} \Big|_{θ_2=0} = x^2 + 2x + 2

Then

Var[X|Y] = \frac{2 - 2e^{-y} - 2ye^{-y} - y^2e^{-y}}{1 - e^{-y}} - \left( \frac{1 - e^{-y} - ye^{-y}}{1 - e^{-y}} \right)^2 = \frac{1 - 2e^{-y} - y^2e^{-y} + e^{-2y}}{(1 - e^{-y})^2}

and

Var[Y|X] = 2 + 2x + x^2 - (x + 1)^2 = 1
Exercises
7.1 Construct the moment-generating function for a Bernoulli
distribution.
7.2 Construct the moment-generating function for a Poisson
distribution.
8. Approximations and Limiting Behavior
There are situations in which the choice of probability model is not clear or,
even if we have an idea of which model to use, we simply wish to make an
estimate of a probability without performing the complete computations. In
addition, there are cases in which it is convenient to use one probability dis-
tribution to approximate probabilities for another distribution. Ultimately,
these methods lead us to two key limit theorems that have wide applicability
and important implications. These topics are examined in this chapter.
The simplest such estimate is provided by Markov's inequality, which states
that for a nonnegative random variable X and any constant a > 0,
Pr[X ≥ a] ≤ E[X]/a.

Example 8.1
Suppose a hospital emergency room has experienced patient arrival pat-
terns that are well modeled by a Poisson distribution having λ = 4 / hr
and is using this model to establish staffing policies. What are the
chances that more than seven patients will arrive during any one-hour
period? Using Markov’s inequality
Pr[X ≥ 7] ≤ 4/7 = 0.571
Example 8.2
Suppose the time between incoming calls to a telephone call center
is exponential with the parameter λ = 0.5/min. This means that the
mean time between calls is 1/λ = 2 minutes, so Markov's inequality bounds
the probability that 6 minutes pass without a call as

Pr[T ≥ 6] ≤ 2/6 = 0.333
A sharper bound is often provided by Chebyshev's inequality, which states
that for a random variable X having mean µ and variance σ^2, and for any k > 0,

Pr[|X - µ| ≥ k] ≤ σ^2/k^2

This relationship applies to all random variables and states that the
probability of observing a value of a random variable that is far from its mean
is bounded by a quantity inversely proportional to the square of the distance
from the mean.
Example 8.3
For the hospital emergency room of Example 8.1, σ^2 = λ = 4, so Chebyshev's
inequality indicates that

Pr[|X - µ| ≥ 7 - 4 = 3] ≤ σ^2/k^2 = 4/9 = 0.444

and the chance that nine or more patients arrive during any hour is bounded by

Pr[|X - µ| ≥ 9 - 4 = 5] ≤ σ^2/k^2 = 4/25 = 0.16
Example 8.4
For the call center of Example 8.2, µ = 2 and σ^2 = 4, so Chebyshev's inequality
indicates that

Pr[|X - µ| ≥ 6 - 2 = 4] ≤ σ^2/k^2 = 4/16 = 0.25

and

Pr[|X - µ| ≥ 8 - 2 = 6] ≤ σ^2/k^2 = 4/36 = 0.111
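It may help to compare these bounds with the exact probabilities. The sketch below (not part of the text) does so with scipy for the emergency room and call center examples.

from scipy import stats

# Emergency room of Examples 8.1 and 8.3: X ~ Poisson(4)
X = stats.poisson(4)
print(X.sf(6))          # exact Pr[X >= 7] = 1 - F(6), about 0.111
print(4 / 7)            # Markov bound, 0.571
print(4 / 3**2)         # Chebyshev bound for |X - 4| >= 3, 0.444

# Call center of Examples 8.2 and 8.4: T ~ exponential with mean 2 minutes
T = stats.expon(scale=2)
print(T.sf(6))          # exact Pr[T >= 6] = e^{-3}, about 0.050
print(2 / 6, 4 / 16)    # Markov and Chebyshev bounds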
Example 8.5
For a binomial random variable with n = 50, p = 0.08 so μ = 4.0, σ2 = npq = 3.68.
Then
Example 8.6
For a normal random variable having μ = 64.0 and σ = 1.50,
Computing cumulative binomial probabilities,

Pr[X ≤ x] = \sum_{i=0}^{x} \binom{n}{i} p^i q^{n-i}

can be tedious when n is large. When n is large and p is small, the Poisson
distribution with λ = np provides a convenient approximation to the binomial.
Example 8.7
For a binomial random variable with n = 80 and p = 0.04, so λ = np = 3.2, the
calculation of

Pr[X ≤ 6] = B(6, 80, 0.04) = 0.959

is tedious, while the Poisson approximation gives

Pr[X ≤ 6] = B(6, 80, 0.04) ≈ P(6, λ = 3.2) = 0.955
Example 8.8
For a binomial random variable with n = 100 and p = 0.06, so λ = np = 6.0,
calculation of both

Pr[X = 7] = b(7, 100, 0.06) = 0.141

and

Pr[X ≤ 7] = B(7, 100, 0.06) = 0.748

is tedious, while the corresponding Poisson approximations are
p(7, λ = 6.0) = 0.138 and P(7, λ = 6.0) = 0.744.
When n is large and p is not particularly small, the normal distribution
provides an alternate approximation to the binomial. Using µ = np and σ^2 = npq,

Pr[X ≤ x] ≈ Pr\left[ Z ≤ \frac{x - np}{\sqrt{npq}} \right]   (8.3)

where Z is a standard normal random variable. As the examples below
illustrate, a continuity correction of 0.5 is added to x when applying the
approximation.
Example 8.9
For a binomial random variable with n = 100 and p = 0.03, np = 3.0 and
npq = 2.91. We find

Pr[X ≤ 5] = B(5, 100, 0.03) = 0.919

and

Pr[X ≤ 5] ≈ Pr\left[ Z ≤ \frac{5.5 - 3}{\sqrt{2.91}} \right] = Pr[Z ≤ 1.466] = 0.928
Example 8.10
In an automated machining process, output workpieces are accumulated
into batches of 800 parts. If the machining process generates 2.5% defec-
tive pieces, what is the probability that a batch has fewer than 12 defec-
tive parts?
The number of defects in a batch should have a binomial distribution,
so the appropriate computation is
Pr[X ≤ 12] = B(12, 800, 0.025) = 0.037

Using the normal approximation with np = 20 and npq = 19.5,

Pr[X ≤ 12] ≈ Pr\left[ Z ≤ \frac{12.5 - 20}{\sqrt{19.5}} \right] = Pr[Z ≤ -1.698] = 0.045
Example 8.11
In a manufacturing assembly line, work stoppages occur due to machinery
jams according to a Poisson distribution having the parameter λ = 10/hr.
What is the probability that 15 or more stoppages occur during any one-
hour period?
The Poisson probability of this event is
Pr[X ≥ 15] = 1 − P(14, λ = 10) = 1.0 − 0.917 = 0.083
To close this discussion, it is noted that the normal distribution can often
be used to approximate most other distributions. In each case, the quality
of the approximation can be poor but under certain circumstances can be
quite good. For example, when the shape parameter of the gamma distribu-
tion is large, the normal distribution approximates the gamma distribution
reasonably well. When the shape parameter of the Weibull distribution
is between about 2.8 and 3.5, the normal distribution approximates the
Weibull distribution well.
Consider a sequence of independent and identically distributed random
variables X_1, X_2, ..., X_n with partial sum S_n = \sum_{i=1}^{n} X_i, and define the average

Y_n = \frac{S_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i
The first of our limiting results is known as the weak law of large numbers
where the term “law” may be taken to be synonymous with distribution.
Although it was developed before Chebyshev’s inequality, the result can be
shown to follow logically from that inequality and is that as long as the vari-
ables Xi have a finite expected value, μ = E[ Xi ], then
\lim_{n \to \infty} Pr\left[ |Y_n - µ| ≥ ε \right] = 0   (8.5)
In words, the probability that the average, Y_n, differs from the mean by more
than any fixed amount ε goes to zero.
The reader is encouraged to experiment with this behavior. It is sug-
gested that the reader take a fair six-sided die and roll it repeatedly while
recording the result of each roll. Compute Yn as you proceed and observe
its behavior.
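A simulated version of that experiment (not part of the text) can be run in a few lines; the running average of the simulated rolls settles near 3.5, the mean of a fair die.

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)                    # simulated rolls of a fair die
running_avg = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
print(running_avg[[9, 99, 999, 9999]])                     # averages after 10, 100, 1000, 10000 rolls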
The second of our limiting results is comparable but stronger because it is
made without recourse to a probability. The strong law of large numbers is that
as long as the random variables have a finite expected value μ = E[ Xi ], then
\lim_{n \to \infty} Y_n = µ   (8.6)
The third and most widely used of our limiting results is known as the central
limit theorem. This result is that as long as the random variables in the sequence
have both a finite expected value μ = E[ Xi ] and a finite variance σ2, then
\lim_{n \to \infty} Pr\left[ \frac{Y_n - µ}{σ/\sqrt{n}} ≤ y \right] = \frac{1}{\sqrt{2π}} \int_{-\infty}^{y} e^{-y^2/2} dy   (8.7)

which is to say that the standardized average converges in distribution to a
standard normal random variable; equivalently, for large n the average Y_n is
approximately normally distributed with expected value µ and variance σ^2/n. Note
that this result applies regardless of the identity of the distribution on the
observations, Xi. It is a very strong result that suggests why the descriptors
of so many natural phenomena display the bell shape. The result also pro-
vides an indication of why the normal distribution often yields reasonable
approximations to probabilities from other distributions.
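A small simulation (not part of the text) makes the theorem concrete; here the underlying observations are exponential with mean 1, an arbitrary choice, and averages of n = 50 of them are compared with the normal description the theorem provides.

import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100_000
y = rng.exponential(size=(reps, n)).mean(axis=1)   # 100,000 averages of n = 50 observations
print(y.mean(), y.var())                           # close to 1.0 and 1/n = 0.02
z = (y - 1.0) / np.sqrt(1.0 / n)
print((z <= 1.0).mean())                           # close to Phi(1) = 0.8413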
A final result is that the central limit theorem also applies to a sequence
of independent random vectors. If we observe a sequence of random vectors
X_i = (X_{i,1}, X_{i,2}, ..., X_{i,r}) that are mutually independent and have a common
distribution with mean vector µ = (µ_1, µ_2, ..., µ_r), all of whose elements are
finite, and covariance matrix Σ in which all elements are finite, then the
distribution on the vector of component-wise averages Y_n = (Y_{n,1}, Y_{n,2}, ..., Y_{n,r})
computed as

Y_{n,i} = \frac{S_{n,i}}{n} = \frac{1}{n} \sum_{j=1}^{n} X_{j,i}

converges to a multivariate normal distribution with mean vector µ and
covariance matrix Σ/n.
Exercises
8.1 For a manufacturing process that generates units of product of
which 1.25% are defective, use Markov’s inequality to compute a
limit on the probability of observing more than 4 defective units in
an inspection sample of 100 units.
8.2 The number of calls arriving to a credit card service call center is
Poisson with the parameter λ = 6 / min. Use Markov’s inequality to
compute a limit on the chance that more than 12 customer calls
arrive within a one-minute interval.
8.3 The number of computers sold per week by the university book
store is a random variable with an expected value of 12. Use
Markov’s inequality to compute a limit on the probability that the
sales volume in any week will be (a) 18 or more or (b) 24 or more.
8.4 The time to failure for a high-intensity lamp is a gamma random
variable having parameters α = 2.5, λ = 0.02. Use Markov’s inequal-
ity to compute a limit on the probability that a lamp will survive (a)
more than 150 hours or (b) more than 200 hours.
8.5 Repeat Exercise 8.1 using Chebyshev’s inequality.
8.6 Repeat Exercise 8.2 using Chebyshev’s inequality.
8.7 Assuming the variance in weekly computer sales is 4, repeat
Exercise 8.3 using Chebyshev’s inequality.
8.8 For male college students in the United States, the mean height is
180 cm and the variance is 25 cm². Use Chebyshev's inequality
to compute a limit on the probability of observing a student taller
than 205 cm.
8.9 Use the Poisson distribution to approximate the probability for
Exercise 8.1.
8.10 Use the normal distribution to approximate the probability for
Exercise 8.1.
8.11 Use the Poisson distribution to approximate the cumulative binomial
probability of observing four or fewer defective parts in a sample of
200 units from a production lot having a defect rate of 1.5%. Then,
use the normal distribution to approximate the same probability.
8.12 Use the normal distribution to compute an approximation to the
Poisson probability of Exercise 8.2.
8.13 A gambler playing roulette and betting simply on red or black
has a win probability of 0.474 because of the zero and double zero.
What do the weak law of large numbers and the strong law of large
numbers imply about the long-term winnings of the gambler?
Appendix: Cumulative Poisson Probabilities
np/x 0 1 2 3 4 5 6 7 8 9 10
0.02 0.980 1
0.04 0.961 0.999 1
0.06 0.942 0.998 1
0.08 0.923 0.997 1
0.10 0.905 0.995 1
0.12 0.887 0.993 1
0.14 0.869 0.991 1
0.16 0.852 0.988 0.999 1
0.18 0.835 0.986 0.999 1
0.20 0.819 0.982 0.999 1
0.25 0.779 0.974 0.998 1
0.30 0.741 0.963 0.996 1
0.35 0.705 0.951 0.994 1
0.40 0.670 0.938 0.992 0.999 1
0.45 0.638 0.925 0.989 0.999 1
0.50 0.607 0.910 0.986 0.998 1
0.55 0.577 0.894 0.982 0.998 1
0.60 0.549 0.878 0.977 0.997 1
0.65 0.522 0.861 0.972 0.996 0.999 1
0.70 0.497 0.844 0.966 0.994 0.999 1
0.75 0.472 0.827 0.959 0.993 0.999 1
0.80 0.449 0.809 0.953 0.991 0.999 1
0.85 0.427 0.791 0.945 0.989 0.998 1
0.90 0.407 0.772 0.937 0.987 0.998 1
0.95 0.387 0.754 0.929 0.984 0.997 1
1.00 0.368 0.736 0.920 0.981 0.996 0.999 1
1.10 0.333 0.699 0.900 0.974 0.995 0.999 1
1.20 0.301 0.663 0.879 0.966 0.992 0.998 1
1.30 0.273 0.627 0.857 0.957 0.989 0.998 1
1.40 0.247 0.592 0.833 0.946 0.986 0.997 0.999 1
1.50 0.223 0.558 0.809 0.934 0.981 0.996 0.999 1
1.60 0.202 0.525 0.783 0.921 0.976 0.994 0.999 1
1.70 0.183 0.493 0.757 0.907 0.970 0.992 0.998 1
1.80 0.165 0.463 0.731 0.891 0.964 0.99 0.997 0.999 1
1.90 0.150 0.434 0.704 0.875 0.956 0.987 0.997 0.999 1
2.00 0.135 0.406 0.677 0.857 0.947 0.983 0.995 0.999 1
2.20 0.111 0.355 0.623 0.819 0.928 0.975 0.993 0.998 1
2.40 0.091 0.308 0.570 0.779 0.904 0.964 0.988 0.997 0.999 1
2.60 0.074 0.267 0.518 0.736 0.877 0.951 0.983 0.995 0.999 1
2.80 0.061 0.231 0.469 0.692 0.848 0.935 0.976 0.992 0.998 0.999 1
3.00 0.050 0.199 0.423 0.647 0.815 0.916 0.966 0.988 0.996 0.999 1
3.20 0.041 0.171 0.380 0.603 0.781 0.895 0.955 0.983 0.994 0.998 1
3.40 0.033 0.147 0.340 0.558 0.744 0.871 0.942 0.977 0.992 0.997 0.999
3.60 0.027 0.126 0.303 0.515 0.706 0.844 0.927 0.969 0.988 0.996 0.999
3.80 0.022 0.107 0.269 0.473 0.668 0.816 0.909 0.960 0.984 0.994 0.998
4.00 0.018 0.092 0.238 0.433 0.629 0.785 0.889 0.949 0.979 0.992 0.997
4.20 0.015 0.078 0.210 0.395 0.590 0.753 0.867 0.936 0.972 0.989 0.996
4.40 0.012 0.066 0.185 0.359 0.551 0.720 0.844 0.921 0.964 0.985 0.994
4.60 0.010 0.056 0.163 0.326 0.513 0.686 0.818 0.905 0.955 0.980 0.992
4.80 0.008 0.048 0.143 0.294 0.476 0.651 0.791 0.887 0.944 0.975 0.990
5.00 0.007 0.040 0.125 0.265 0.440 0.616 0.762 0.867 0.932 0.968 0.986
5.20 0.006 0.034 0.109 0.238 0.406 0.581 0.732 0.845 0.918 0.96 0.982
5.40 0.005 0.029 0.095 0.213 0.373 0.546 0.702 0.822 0.903 0.951 0.977
5.60 0.004 0.024 0.082 0.191 0.342 0.512 0.670 0.797 0.886 0.941 0.972
5.80 0.003 0.021 0.072 0.170 0.313 0.478 0.638 0.771 0.867 0.929 0.965
6.00 0.002 0.017 0.062 0.151 0.285 0.446 0.606 0.744 0.847 0.916 0.957
6.20 0.002 0.015 0.054 0.134 0.259 0.414 0.574 0.716 0.826 0.902 0.949
6.40 0.002 0.012 0.046 0.119 0.235 0.384 0.542 0.687 0.803 0.886 0.939
6.60 0.001 0.010 0.040 0.105 0.213 0.355 0.511 0.658 0.780 0.869 0.927
6.80 0.001 0.009 0.034 0.093 0.192 0.327 0.480 0.628 0.755 0.850 0.915
7.00 0 0.007 0.030 0.082 0.173 0.301 0.450 0.599 0.729 0.830 0.901
7.20 0 0.006 0.025 0.072 0.156 0.276 0.420 0.569 0.703 0.810 0.887
7.40 0 0.005 0.022 0.063 0.140 0.253 0.392 0.539 0.676 0.788 0.871
7.60 0 0.004 0.019 0.055 0.125 0.231 0.365 0.510 0.648 0.765 0.854
7.80 0 0.004 0.016 0.048 0.112 0.210 0.338 0.481 0.620 0.741 0.835
8.00 0 0.003 0.014 0.042 0.100 0.191 0.313 0.453 0.593 0.717 0.816
8.50 0 0.002 0.009 0.030 0.074 0.150 0.256 0.386 0.523 0.653 0.763
9.00 0 0.001 0.006 0.021 0.055 0.116 0.207 0.324 0.456 0.587 0.706
9.50 0 0 0.004 0.015 0.040 0.089 0.165 0.269 0.392 0.522 0.645
10.00 0 0 0.003 0.010 0.029 0.067 0.130 0.220 0.333 0.458 0.583
np/x 11 12 13 14 15 16 17 18 19 20 21
3.4 1
3.6 1
3.8 0.999 1
4 0.999 1
4.2 0.999 1
4.4 0.998 0.999 1
4.6 0.997 0.999 1
4.8 0.996 0.999 1
5 0.995 0.998 0.999 1
5.2 0.993 0.997 0.999 1
5.4 0.990 0.996 0.999 1
5.6 0.988 0.995 0.998 0.999 1
5.8 0.984 0.993 0.997 0.999 1
6 0.980 0.991 0.996 0.999 0.999 1
6.2 0.975 0.989 0.995 0.998 0.999 1
6.4 0.969 0.986 0.994 0.997 0.999 1
6.6 0.963 0.982 0.992 0.997 0.999 0.999 1
6.8 0.955 0.978 0.990 0.996 0.998 0.999 1
7 0.947 0.973 0.987 0.994 0.998 0.999 1
7.2 0.937 0.967 0.984 0.993 0.997 0.999 1
7.4 0.926 0.961 0.980 0.991 0.996 0.998 0.999 1
7.6 0.915 0.954 0.976 0.989 0.995 0.998 0.999 1
7.8 0.902 0.945 0.971 0.986 0.993 0.997 0.999 1
8 0.888 0.936 0.966 0.983 0.992 0.996 0.998 0.999 1
8.5 0.849 0.909 0.949 0.973 0.986 0.993 0.997 0.999 0.999 1
9 0.803 0.876 0.926 0.959 0.978 0.989 0.995 0.998 0.999 1
9.5 0.752 0.836 0.898 0.940 0.967 0.982 0.991 0.996 0.998 0.999 1
10 0.697 0.792 0.864 0.917 0.951 0.973 0.986 0.993 0.997 0.998 0.999