Probability Foundations for Engineers
Joel A. Nachlas

"… responds to a need that I felt some years ago, which is to provide a basic and direct presentation of probability to engineers."
—Enrico Zio, Politecnico di Milano, Dipartimento Energia, Milano, Italy

"… an excellent introductory book on probability for engineers …"
—Edward A. Pohl, University of Arkansas, Fayetteville, USA

"… most of the literature on probability. … introduces the reader in the field of randomness in a nice way. … creates a solid foundation to build up knowledge … The strength of the book is that it presents and translates the intuition concerning probability into mathematical structures using examples and explanations rather than the traditional approach of theorem and proof …"
—Prof. Uday Kumar, Luleå University of Technology, Sweden

ISBN: 978-1-4665-0299-4
www.crcpress.com
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made
to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all
materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in
any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, micro-
filming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.
copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-
8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that
have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identi-
fication and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface...................................................................................................................... ix
Author...................................................................................................................... xi
1. Introduction..................................................................................................... 1
1.1 Historical Perspectives......................................................................... 1
1.2 Formal Systems..................................................................................... 2
1.3 Intuition.................................................................................................. 3
Exercises............................................................................................................ 3
2. A Brief Review of Set Theory...................................................................... 5
2.1 Introduction........................................................................................... 5
2.2 Definitions.............................................................................................. 5
2.3 Set Operations....................................................................................... 7
2.4 Venn Diagrams...................................................................................... 8
2.5 Dimensionality..................................................................................... 10
2.6 Conclusion............................................................................................. 11
Exercises........................................................................................................... 11
3. Probability Basics.......................................................................................... 15
3.1 Random Experiments, Outcomes, and Events................................. 15
3.2 Probability............................................................................................. 17
3.3 Probability Axioms.............................................................................. 17
3.4 Conditional Probability....................................................................... 21
3.5 Independence....................................................................................... 25
Exercises........................................................................................................... 28
4. Random Variables and Distributions....................................................... 33
4.1 Random Variables................................................................................ 33
4.2 Distributions......................................................................................... 35
4.2.1 Probability Mass Functions................................................... 38
4.2.2 Probability Density Functions.............................................. 40
4.2.3 Survivor Functions................................................................. 41
4.3 Discrete Distribution Functions........................................................42
4.3.1 The Bernoulli Distribution....................................................43
4.3.2 The Binomial Distribution.....................................................44
4.3.3 The Multinomial Distribution.............................................. 47
4.3.4 The Poisson Distribution....................................................... 48
4.3.5 The Geometric Distribution.................................................. 49
4.3.6 The Negative Binomial Distribution.................................... 50
4.4 Continuous Distribution Functions.................................................. 52
4.4.1 The Exponential Distribution............................................... 53
4.4.2 The Gamma Distribution......................................................54
Joel Nachlas, Ph.D., has worked on the faculty of the Industrial and
Systems Engineering Department at Virginia Polytechnic Institute and State
University (Virginia Tech, Blacksburg) since 1974. He has served and con-
tinues to serve as the coordinator for the department’s Operations Research
faculty and curricula and is also the coordinator of the department’s inter-
national program. The foci of Dr. Nachlas’s research are the application of probability theory to reliability analysis and maintenance planning and the application of statistical methods to quality control. He earned a B.E.S. from Johns Hopkins
University (Baltimore, Maryland) in 1970 and an M.S. and Ph.D. from the
University of Pittsburgh (Pennsylvania) in 1974 and 1976, respectively. All
three of his degrees are in industrial engineering with a concentration in
operations research. Dr. Nachlas has received numerous awards for his
research including the 1991 P.K. McElroy Award and the 2004 Golomski
Award. He is also the editor of the Proceedings of the Annual Reliability and
Maintainability Symposium, a member of INFORMS, the Institute of Industrial
Engineers, and a fellow of both the American Society for Quality and the
Society of Reliability Engineers. He also serves as head coach of the Virginia
Tech men’s lacrosse team and was selected in 2001 as the U.S. Lacrosse MDIA
national coach of the year.
1
Introduction
Most people have an intuitive feel for probability. Many people play card
games—either for fun or for profit—and most start playing card games as
children. People also talk about weather in terms of probability. It is common
to speak of the chances of side effects associated with medications, and the
chances of automobile accidents, or of contracting communicable diseases.
These are just a few examples of the ways in which probability is a part of
our lives that we seem to understand well.
Paradoxically, most people confronted with the study of the mathemati-
cal representation and analysis of probability find this effort challenging
or worse. The question becomes one of translating our intuition concerning
probability into an understanding of the mathematical structure of the sub-
ject. The answer is far from clear. This text represents an attempt to support
the transition from intuition to mathematical rigor. The vehicle for promot-
ing the transition is explanation and example rather than theorem and proof.
As we proceed, readers are encouraged to reflect on the experiences they
have had with practical realizations of probability and the relationship of
those experiences to the topics described here.
The statements that parking lot P4 contains 240 places, that the United States
has had 44 presidents, and that a series of 8 coin tosses yielded 5 heads are
all statistics. They describe past experiences. Many people confuse the two
terms. We are studying probability.
1.3 Intuition
This chapter began with the comment that probability started as an intuitive
evaluation of future experiences. As you now undertake to study probability,
consider the following questions.
1. What do you think are the chances that you would see a blackjack
hand?
2. What is the probability that your car will survive until you graduate?
3. What is the probability that one of your classmates will die this year?
4. What is the probability that a tornado will damage your campus this
year?
5. For an arbitrary consumer product that you purchase this year, what
is the probability that it is defective?
Exercises
1.1 Describe an experience you have had with probability, possibly in
a game or betting context. Indicate how you analyzed the prob-
abilities involved.
1.2 How should we interpret the fact that a weather forecast indicates
a 60% chance of rain today and it does not rain?
1.3 Identify four events or activities that involve you today and are
subject to probability.
1.4 Suggest four engineering applications in which probability is an
important element.
2
A Brief Review of Set Theory
2.1 Introduction
The starting point for our study of probability is a review of the basic concepts of
the mathematical domain called set theory. The reason we start with set theory
is that it will provide a vehicle for organizing the elements of our probability
models. As implied in the name, set theory is a structured language for discuss-
ing “sets.” The initial formal definition of set theory was provided by George
Cantor in 1874. The objective of Cantor’s work and that of other mathemati-
cians working with set theory was to obtain an understanding of infinity. The
difficulty of this idea precipitated considerable debate among mathematicians
and ultimately led to the definition of the axiomatic system that we will use.
This chapter is called a review of set theory because many students who
undertake the study of probability have already encountered set theory in
earlier math courses. For those who are meeting set theory here for the first
time, the descriptions provided next should be sufficient. If not, many sup-
plementary resources are available in the library and on the Web.
A set is simply a collection of entities in which we are interested. The collec-
tion of interest might be all of the Ford sedans registered in Oregon this year,
the people in Pennsylvania receiving liver transplants this month, the red
face cards in a standard deck of poker cards, the engine bearings produced
in a particular plant today, the duration of Internet sessions, the hardness of
cutting tools, or the equity stocks included in your investment portfolio. This
list is intended to illustrate that the idea of a set is general. It can be applied to
any collection of things that we would like to discuss or analyze. The collec-
tion may include a finite number of members (elements) or an infinite num-
ber of members. The important aspect of a set is that it be clearly defined.
2.2 Definitions
It is conventional to represent a set by a capital letter. For example, the set of
Chevrolet Malibus registered in Florida could be represented as
M = {x | x is a Chevrolet Malibu with Florida tags}
Note that the capital M has been used to represent the set and that x has
been used to represent an element (or member) of the set. The vertical line
is read as “such that.” Thus, this set definition should be read as “M is the
set of members, x, such that x is a Chevrolet Malibu with Florida tags.” Note
further that braces “{ }” are used to specify the members of a set. If we wish
to analyze features of any group of items, the definition of the corresponding
set must make the identities of the elements clear.
For most applications, we anticipate that a set will have subsets. That is, sets
may contain groups of members that are subject to more specific identification
and can thus be organized into sets. For example, define the sets B and W as

B = {x | x ∈ M and x is blue}   and   W = {x | x ∈ M and x is white}

where the symbol ∈ is read as “is an element of” or “is in.” Thus the set B
is the set of elements of M that are blue (the set of blue Chevrolet Malibus
registered in Florida). We can see that the sets B and W are contained in the
set M and we represent this as B ⊂ M and W ⊂ M. In general, when every element of a set X is also an element of a set Y, we write

X ⊆ Y
This is read as “X is a subset of Y.” The distinction between this algebraic
statement and the ones provided for B and W is that it would be more correct
in those earlier cases to say B is a proper subset of M and W is a proper subset
of M. This means that the subset B does not exhaust M and similarly for W.
The conceptual parallel to the distinction in membership statements B ⊂ M
and X ⊆ Y is the numerical distinction we make between a < b and a ≤ b. In the
first case, equality is precluded while in the second case equality is possible.
In fact, observe that an implication of this notation is that

if X ⊆ Y and Y ⊆ X, then X = Y

Thus, if two sets simultaneously contain each other, they must be identical.
Regardless of the context within which we define sets, there are two sets that
are fundamental to our definitions and our analysis. These are the “universe”
of elements, the set of all of the possible elements we might discuss, and the
“null” set (or empty set), which contains no elements at all. In a probability
context, we will also refer to the universe as the “sample space” and will denote
it by an uppercase omega, Ω. We represent the empty set by the symbol ∅.
2.3 Set Operations

The first basic set operation is the union. Conceptually, a union is similar to arithmetic addition. The union of two sets A and B is defined as

A ∪ B = {x | x ∈ A or x ∈ B}   (2.1)
Thus, the union of two sets is a set of elements that is in at least one of the
sets. For the example of Chevrolet Malibus registered in Florida, B ∪ W is the
set of those cars each of which is either blue or white.
Conceptually, an intersection is similar to arithmetic multiplication. The
intersection of two sets is the set of elements that are in both sets. That is,

A ∩ B = {x | x ∈ A and x ∈ B}   (2.2)

For the Malibu example, let T denote the set of two-door Malibus and F the set of four-door Malibus registered in Florida. Then, B ∩ T represents the set of those cars each of which is blue and has two
doors. Notice that
T ∩ F = ∅
which is to say that the sets T and F have no common elements. Their intersec-
tion is the empty set. We say that these sets are disjoint or mutually exclusive.
The operators union and intersection permit us to describe sets and com-
binations of sets conveniently and efficiently. Fortunately, these operators
have the desirable properties that one often seeks in an algebraic operator.
A third basic set operation is complementation. The complement of a set A, denoted Ac, is defined as

Ac = {x | x ∈ Ω and x ∉ A}
That is, Ac is the set of elements of the universe that are not in the set A. Note
that the definition of the complement of a set permits us to state that
A ∩ A c = ∅ and A ∪ A c = Ω
Note also that the operations of union and intersection along with the defi-
nitions of the universe, the empty set, and the complement of a set are suf-
ficient for us to describe and analyze sets in any way we feel is informative.
This includes what we might call the difference in sets. Suppose we have
the sets
A ∩ Bc = {1,2,3}
FIGURE 2.1
Example of a Venn diagram showing the universe Ω = M and the subsets B, W, and T.
( A c )c = A (2.4)
Another is
E ∪ F = E ∪ (E c ∩ F ) (2.5)
Two other useful and widely used relationships are known as DeMorgan’s
laws:
( A ∪ B)c = A c ∩ Bc (2.6)
and
( A ∩ B)c = A c ∪ Bc (2.7)
FIGURE 2.2
Venn diagrams for DeMorgan’s laws.
A = ( A ∩ B) ∪ ( A ∩ Bc ) (2.8)
Observe that the two sets (A ∩ B) and (A ∩ Bc) are disjoint and the identity
applies even if one of the intersections is empty.
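The identities of this section are easy to check numerically for finite sets. The following sketch, written in Python with illustrative sets A, B, and a small universe of my own choosing (not taken from the text), verifies DeMorgan’s laws and the decomposition of Equation (2.8).

```python
# Verify DeMorgan's laws and A = (A ∩ B) ∪ (A ∩ B^c) with small finite sets.
universe = set(range(1, 11))          # an example universe Ω
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

def complement(s):
    """Complement of s relative to the universe."""
    return universe - s

# DeMorgan's laws, Equations (2.6) and (2.7)
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Decomposition of A, Equation (2.8), with the two pieces disjoint
assert A == (A & B) | (A & complement(B))
assert (A & B) & (A & complement(B)) == set()
print("All set identities hold for this example.")
```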
It is going to be useful to extend this idea to multiple sets so consider a col-
lection of sets say E1 , E2 , …, En that is defined so that the sets are pairwise dis-
joint and in total they exhaust the set, A, of which they are all subsets. That is,
Ei ∩ Ej = ∅,  for all i and all j ≠ i   (2.9)

and

E1 ∪ E2 ∪ … ∪ En = A   (2.10)

We say that the collection of sets E1, E2, …, En forms a partition of the set A. We can then use
the partition to state that
A = ( A ∩ E1 ) ∪ ( A ∩ E2 ) ∪ … ∪ ( A ∩ En ) (2.11)
If we think again of the set of Chevrolet Malibus registered in Florida, M,
the sets B and W along with the sets representing Florida-registered Malibus
of each other available color form a partition of M.
2.5 Dimensionality
The final aspect of sets we must consider is their size. The size of a set
is defined as the number of elements that are members of the set. Size is
referred to as the cardinality of the set. For a set, B, we represent the cardinal-
ity by ||B||, and generally we wish to know whether the cardinality of a set
is countable or uncountable. As the terms imply, a set is countable if one could
match the elements of the set with some or all of the natural numbers—one
could count them. If only some of the set of natural numbers is needed, the
set is a finite set. If all of the natural numbers are needed, then the set is
countably infinite. In both of these cases, the elements of the set are said to be
“discrete.” On the other hand, if a set has uncountably infinite cardinality,
the number of elements is infinite and is much greater than the number of
natural numbers.
The most common example of an uncountably infinite set is the set of real
numbers within any arbitrary interval, say [0, 1]. To see that this is the case, enu-
merate and count any sequence of values between zero and one, say (0.0, 0.01,
0.02, 0.03, …). As you do this, note that between any two of your enumerated
numbers, many additional values (infinitely many in fact) can be identified.
2.6 Conclusion
The definitions and relationships included in this chapter are the basic con-
stituents of set theory. There are more detailed and more extensive discus-
sions of set theory than the one provided here. However, the description in
this chapter has been formulated to support the study of probability in the
chapters that follow.
Although set theory has many domains of application, it is fundamental to
the construction of probability theory. The reader is encouraged to fully mas-
ter the concepts of this chapter prior to moving on to the study of probability.
Exercises
2.1 Identify at least two sets of states of the United States in at least two
different ways.
2.2 For the sets you identified in Exercise 2.1, identify subsets.
2.3 Select any physical entity and define sets of it.
2.4 Let E, F, and G be three sets. State expressions for:
a. Only F occurs
b. Exactly two of the sets occur
c. At least one of the sets occurs
d. E and G occur but F does not occur
e. None of the sets occur
2.5 Two 6-sided dice are tossed. Let A represent the set of tosses for
which the sum of the dice is even, B be the set of tosses for which at
least one die shows a 3, and C be the set of tosses for which the sum
of the dice is 7. Identify the elements of
a. A ∩ B
b. Bc ∩ C
c. A ∩ C
d. Ac ∩ Bc ∩ Cc
2.6 Let A, B, and C be three sets. Prove that C ∩ (A ∩ B)c = (C ∩ Ac) ∪ (C ∩ Bc).
2.7 Prove that R ∪ S = R ∪ (Rc ∩ S).
2.8 Draw a Venn diagram that shows the relationship in Exercise 2.7.
2.9 Two fair 6-sided dice are tossed. Let A be the set of tosses for which
the sum of the dice is less than 7, B be the set of tosses for which at
least one die shows a 3, and C be the set of tosses for which the sum
of the dice exceeds 4. Identify the elements of
a. A ∩ Cc
b. A ∩ Bc ∩ C
c. B ∩ Cc
d. Ac ∩ Cc
e. A ∪ (B ∩ C)
2.10 Explain why the set of all stars is countably infinite.
2.11 Let a universe, Ω, be the set of cards in a standard poker deck.
Identify a partition of Ω.
2.12 Suppose events A, B, and C form a partition of a sample space Ω.
Use a Venn diagram to show that an event E of the same sample
space can be stated as E = (A ∩ E) ∪ (B ∩ E) ∪ (C ∩ E).
2.13 Suppose a sample space is defined by Ω = {x|0 ≤ x ≤ 20}. If the
events A = {x|8 ≤ x ≤ 12}, B = {x|10 ≤ x < 15}, C = {x|7 < x ≤ 10}, and
D = {x|11 ≤ x ≤ 17}, describe the following events and draw them on
the real line.
A ∪ B, A ∩ B, A ∩ Dc
B ∪ C, Bc ∩ Cc, C ∪ D
Bc ∩ D, A ∩ C, A ∪ B ∪ C
2.14 In some communication circuits, a three-component voting routine
is used to determine if a message has been transmitted accurately.
For a particular message, let Yi represent the event that voter i indi-
cates accurate transmission. Express each of the following events in
words.
Y2 ∪ Y3,   Y1 ∪ Y2 ∪ Y3c,   (Y1 ∪ Y3)c,   (Y1 ∪ Y2 ∪ Y3)c
3
Probability Basics
events soon. For now, we should first recall from our discussion of set theory
that we call the set of all possible outcomes of an experiment the sample space
and will use Ω to represent it. The following are two examples.
Example 3.1
When an integrated circuit is manufactured, it can have three types of
flaws:
If we select a recently produced circuit at random and test it, the result of
this process will be random. The set of possible observations is
Example 3.2
The tread depth on certain newly manufactured, radial pattern automotive
tires varies in the range of 7.5 mm to 8.5 mm. If we select a recently pro-
duced tire and measure its tread depth, the set of possible o
bservations is
Ω = { x|7.5 ≤ x ≤ 8.5 }
Other definitions are also possible but these illustrate the fact that an event
may correspond to one or to more than one outcome. Note also that the
events need not be mutually exclusive.
In the case of Example 3.2, we would not—in fact cannot—define events
comprised of single outcomes. Instead, we define events as intervals, such as
The key point here is that events are sets. Thus, all of the things we said in Chapter
2 about sets apply to events. We may therefore model our random experiment
in terms of the events corresponding to sets of possible observations.
3.2 Probability
Keep in mind that our objective is to define a predictive mathematical
model of an experiment consisting of observing a phenomenon or process
of interest. The definition of the sample space and its events provides the
structural basis (the skeleton) for our model. We next attach our predictive
measure—probability—to our structure.
In its most general sense, probability is simply a single-valued mathematical
function that we define on a sample space. There are rules for how we make
the definition but these are reasonably unrestrictive. Two key rules are (1)
that we define our probability functions on the events of the sample space
rather than on outcomes, and (2) that the domain of the probability function
is the entire sample space and the range is the real interval [0,1].
Before proceeding with this idea further, we observe that people (prob-
ability specialists and philosophers most of all) have argued about how to
assign probabilities to events and how to interpret probability measures for
a long time. These debates continue and are often quite intense. Fortunately
for engineers who wish to apply probability, the formal mathematical system
we will create and study here is internally consistent and “correct” for any
(and all) of the different philosophical interpretations.
As previously discussed, just as geometry is our mathematical language
for describing spatial relationships, probability is our mathematical language
for describing randomness. The philosophical explanations of the origins of
randomness differ substantially but the usefulness of the mathematics tran-
scends those distinctions.
3.3 Probability Axioms

The probability measure must satisfy three axioms:

1. Pr[Ω] = 1.
2. For any event E, 0 ≤ Pr[E] ≤ 1.
3. For any collection of mutually exclusive events E1, E2, E3, …,
   Pr[E1 ∪ E2 ∪ E3 ∪ …] = Pr[E1] + Pr[E2] + Pr[E3] + ⋯

It may seem surprising, but these three axioms are all that we need to develop
all of probability theory. Starting with them, we can construct many useful
results that we can then use to model the physical phenomena that we wish
to study.
Following are a few examples of the results that we can construct.
Example 3.3
For any event E, Pr[Ec] = 1 − Pr[E]. Since E ∪ Ec = Ω and E ∩ Ec = ∅,
Pr[E ∪ Ec] = Pr[Ω] = 1 and Pr[E ∪ Ec] = Pr[E] + Pr[Ec].
Example 3.4
The result in Example 3.3 implies that Pr[∅] = 0.
Example 3.5
For two events E1 and E2 having E1 ⊆ E2, it must be the case that
Pr[E1] ≤ Pr[E2]. Suppose E1 ⊆ E2. Then, it must be the case that
E2 = E1 ∪ (E2 ∩ E1c ) and clearly E1 ∩ (E2 ∩ E1c ) = ∅ . Thus, Pr[E2] =
Pr[E1 ] + Pr[E2 ∩ E1c ] , so Pr[E2] ≥ Pr[E1].
Example 3.6
For any two events E1 and E2, Pr[E1 ∪ E2] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2].
Note that E1 = (E1 ∩ E2) ∪ (E1 ∩ E2c) and (E1 ∩ E2c) ∩ (E1 ∩ E2) = ∅, so Pr[E1] = Pr[E1 ∩ E2] + Pr[E1 ∩ E2c]. Similarly, E1 ∪ E2 = E2 ∪ (E1 ∩ E2c) with the two sets disjoint, so Pr[E1 ∪ E2] = Pr[E2] + Pr[E1 ∩ E2c] = Pr[E1] + Pr[E2] − Pr[E1 ∩ E2].
Example 3.7
For three events E1, E2, and E3, we have Pr[E1 ∪ E2 ∪ E3] = Pr [E1] + Pr [E2] +
Pr [E3] − Pr[E1 ∩ E2] − Pr[E1 ∩ E3] − Pr[E2 ∩ E3] + Pr[E1 ∩ E2 ∩ E3].
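Because a finite sample space can be enumerated exhaustively, results such as those in Examples 3.6 and 3.7 can be checked by direct counting. The sketch below uses illustrative events on two fair six-sided dice (my own choices, not events from the text), treats each of the 36 outcomes as equally likely, and confirms the three-event inclusion–exclusion formula.

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes for two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(outcomes))

# Illustrative events: sum is even, at least one die shows 3, sum equals 7.
E1 = {o for o in outcomes if sum(o) % 2 == 0}
E2 = {o for o in outcomes if 3 in o}
E3 = {o for o in outcomes if sum(o) == 7}

lhs = pr(E1 | E2 | E3)
rhs = (pr(E1) + pr(E2) + pr(E3)
       - pr(E1 & E2) - pr(E1 & E3) - pr(E2 & E3)
       + pr(E1 & E2 & E3))
assert lhs == rhs          # inclusion-exclusion holds exactly
print(lhs)                 # Pr[E1 ∪ E2 ∪ E3]
```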
Example 3.8
Consider that we have two fair four-sided dice (pyramids). For each die,
what are the possible outcomes?
Ω = { 1, 2, 3, 4 }
For the sample space corresponding to the tossing of only one die, con-
sider the events:
A = {2, 4},   B = {3, 4},   C = {1, 2, 3}
Example 3.9
Suppose we roll both of the four-sided dice and take the sum of their
outcomes. What is the sample space?
Ω = { 2, 3, 4, 5, 6, 7, 8 }
a. E—The sum is odd.
b. F—The sum is between 4 and 7.
c. G—The sum exceeds 5.
d. H—The sum is 4.
Example 3.10
Consider again the tire tread depths of Example 3.2, with Ω = {x | 7.5 ≤ x ≤ 8.5} and with the probability of an event taken to be proportional to the length of the corresponding interval. Then, for example,

E = {x | 7.80 ≤ x < 7.99},   Pr[E] = (7.99 − 7.80)/(8.5 − 7.5) = 0.19

F = {x | 7.90 ≤ x < 8.25},   Pr[F] = (8.25 − 7.90)/(8.5 − 7.5) = 0.35

G = {x | 8.10 ≤ x ≤ 8.40},   Pr[G] = (8.40 − 8.10)/(8.5 − 7.5) = 0.30

F ∩ G = {x | 8.10 ≤ x < 8.25},   Pr[F ∩ G] = (8.25 − 8.10)/(8.5 − 7.5) = 0.15

E ∪ F = {x | 7.80 ≤ x < 8.25},   Pr[E ∪ F] = (8.25 − 7.80)/(8.5 − 7.5) = 0.45
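In Example 3.10 the probability of an event is simply the length of the corresponding interval divided by the length of Ω. A minimal sketch of that computation, assuming the same uniform assignment on [7.5, 8.5] (the helper name pr_interval is illustrative):

```python
# Probability of an interval event under the uniform assignment on [7.5, 8.5].
LOW, HIGH = 7.5, 8.5

def pr_interval(a, b):
    """Pr[a <= X < b] when probability is proportional to interval length."""
    a, b = max(a, LOW), min(b, HIGH)
    return max(b - a, 0.0) / (HIGH - LOW)

print(round(pr_interval(7.80, 7.99), 2))   # Pr[E]     = 0.19
print(round(pr_interval(7.90, 8.25), 2))   # Pr[F]     = 0.35
print(round(pr_interval(8.10, 8.40), 2))   # Pr[G]     = 0.30
print(round(pr_interval(8.10, 8.25), 2))   # Pr[F ∩ G] = 0.15
print(round(pr_interval(7.80, 8.25), 2))   # Pr[E ∪ F] = 0.45
```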
However,
Similarly, if we are given the fact that the first die shows a 2, we have
Pr[A|B] = Pr[A ∩ B] / Pr[B]   (3.1)
provided Pr[B] ≠ 0.
An appropriate view of a conditional probability is that the available
knowledge reduces the set of possible outcomes from the full sample space
to a subset of it—the event B. As a result, that knowledge alters the probabili-
ties associated with our observations. Looking at a Venn diagram empha-
sizes the point (Figure 3.1). If we know that B has occurred, then we know
that elements of the sample space that are in the complement of B have not
occurred. Hence, the event B contains all of the possible observations and the
set A ∩ B contains the observations from event A that are possible.
A B
FIGURE 3.1
Venn diagram illustrating conditional probability.
Example 3.11
Consider the sum of the numbers showing on the four-sided dice again.
Enumerate the elements of the sample space as
X1\X2 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
Pr[A|B] = Pr[A ∩ B] / Pr[B]
Assume that both of the events A and B are nonempty. Then, it must also
be the case that
Pr[B|A] = Pr[A ∩ B] / Pr[A]

and therefore

Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]   (3.2)
This is the simplest realization of Bayes’ rule, which was initially formu-
lated by the Reverend Thomas Bayes during the 18th century and first published in
1763, two years after his death. Bayes was investigating incidence rates of
infectious diseases. To obtain the fully expanded expression of Bayes’ rule,
we first construct the Law of Total Probability.
Recall that for any two events, say C and D, C = (C ∩ D) ∪ (C ∩ Dc ) and
(C ∩ D) ∩ (C ∩ Dc) = Ø, so Pr[C] = Pr[C ∩ D] + Pr[C ∩ Dc].
Now, the “unconditioning” relationship allows us to state that
Pr[C ∩ D] = Pr[C|D] Pr[D] and Pr[C ∩ Dc] = Pr[C|Dc] Pr[Dc], so
Pr[C] = Pr[C|D] Pr[D] + Pr[C|Dc] Pr[Dc]   (3.3)

More generally, if the events D1, D2, …, Dn form a partition of the sample space, then

Pr[C] = ∑_{i=1}^{n} Pr[C ∩ Di] = ∑_{i=1}^{n} Pr[C|Di] Pr[Di]   (3.4)
This is the general statement of the law of total probability. Take a look at it
and use the example of the four-sided dice to try it out.
Example 3.12
Let C = {x|X1 + X2 = 4} and let D1 = {x|X1 = 1}, D2 = {x|X1 = 2}, D3 = {x|X1 = 3},
D4 = {x|X1 = 4}. Then
Pr[C] = ∑_{i=1}^{4} Pr[C|Di] Pr[Di] = (1/4)(1/4) + (1/4)(1/4) + (1/4)(1/4) + (0)(1/4) = 3/16 = 0.1875
To state the law of total probability in words, we might say that the prob-
ability of an event may be computed as the sum of the probabilities of its
intersections with the events that comprise a partition of the sample space.
Using the law of total probability, we can extend the earlier conditional
probability statement that
Pr[A|B] = Pr[B|A] Pr[A] / Pr[B]
to the form
Pr[Aj|B] = (Pr[B|Aj] Pr[Aj]) / Pr[B] = (Pr[B|Aj] Pr[Aj]) / (∑_{i=1}^{n} Pr[B|Ai] Pr[Ai])   (3.5)
This is the general form of Bayes’ rule. It has many applications and forms
a basis for many of the questions and analyses we pursue in probability.
Example 3.13
An inventory system contains four types of products. Customers order
one unit of a product at a time: 20% of customers order the first type of
product, 30% the second type, 15% the third type, and 35% the fourth
type. Due to the policy used to manage the inventory system, the sup-
plier is out of the first type of product 6% of the time, the second 2% of
the time, the third 12% of the time, and the fourth 1% of the time. When
a customer orders a product that the inventory system does not have, the
order cannot be filled, and the customer takes his business elsewhere.
What is the probability that an order cannot be filled?
Let Ti denote the event that the customer orders product type i, i = 1,
2, 3, 4.
Let S denote the event that the order cannot be filled.
Pr[S] = ∑_{i=1}^{4} Pr[S|Ti] Pr[Ti] = (0.06)(0.20) + (0.02)(0.30) + (0.12)(0.15) + (0.01)(0.35) = 0.0395
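The arithmetic in Example 3.13 is a direct application of Equation (3.4), and it is easily scripted. A minimal sketch using the values stated in the example:

```python
# Law of total probability: Pr[S] = sum_i Pr[S | T_i] * Pr[T_i]
p_order = [0.20, 0.30, 0.15, 0.35]       # Pr[T_i]: customer orders product type i
p_stockout = [0.06, 0.02, 0.12, 0.01]    # Pr[S | T_i]: type i is out of stock

pr_unfilled = sum(ps * pt for ps, pt in zip(p_stockout, p_order))
print(round(pr_unfilled, 4))   # 0.0395
```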
Example 3.14
A cell phone manufacturer purchases display screens from two differ-
ent suppliers: 40% of screens are from Reliable Video and 60% are from
New Age Technology. Although both suppliers are trying to meet the
same longevity requirements, it has been found that the screens from
Reliable Video have a one-year survival rate of 82%, whereas those from
New Age Technology have a 94% one-year survival rate. (a) What frac-
tion of the company’s phones will have screens that survive one year?
(b) If a one-year-old phone is selected at random and found to have a
failed video display, what is the probability that the screen was pur-
chased from New Age Technology?
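Part (a) of Example 3.14 is an application of the law of total probability, and part (b) is an application of Bayes’ rule. A brief sketch of both calculations, using the supplier shares and survival rates stated in the example (the dictionary names are illustrative):

```python
# (a) Overall one-year survival via the law of total probability.
# (b) Pr[screen from New Age Technology | screen failed] via Bayes' rule.
p_supplier = {"Reliable Video": 0.40, "New Age Technology": 0.60}
p_survive = {"Reliable Video": 0.82, "New Age Technology": 0.94}

p_total_survive = sum(p_supplier[s] * p_survive[s] for s in p_supplier)
print(round(p_total_survive, 3))        # (a) fraction of screens surviving one year

p_fail = 1.0 - p_total_survive
p_newage_given_fail = (p_supplier["New Age Technology"]
                       * (1.0 - p_survive["New Age Technology"])) / p_fail
print(round(p_newage_given_fail, 3))    # (b) posterior probability
```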
3.5 Independence
The next topic of this chapter is that of independence. This is a difficult topic
that many people find confusing. The basic idea is that the probability of
occurrence of an event either is or is not influenced by the occurrence of
another event. If the chance of occurrence of an event is affected by the chance
of occurrence of another event, the two events are dependent, and if the prob-
ability of occurrence is not affected, the events are independent. Formally, two
events A and B defined on a sample space are said to be independent if

Pr[A ∩ B] = Pr[A] Pr[B]   or, equivalently,   Pr[A|B] = Pr[A]

The equivalence of the two statements follows from the conditional probability relationship Pr[A|B] = Pr[A ∩ B]/Pr[B] = Pr[B|A] Pr[A]/Pr[B].
Example 3.15
Two fair 6-sided dice are rolled. Define the events:
A = { x|X1 + X2 is odd }
B = { x|X1 is odd }
C = { x|X1+ X2 ≤ 5 }
D = { x|X1 = 3 }
Pr[A ∩ B] = 1/4 = Pr[A] Pr[B] = (1/2)(1/2) = 1/4

so A and B are independent. In contrast,

Pr[C] Pr[D] = (10/36)(1/6) = 5/108 ≠ Pr[C ∩ D] = 2/36

so C and D are not independent.
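The independence checks of Example 3.15 can also be carried out by enumerating the two-dice sample space. A minimal sketch (the events A, B, C, and D are those defined in the example):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))   # (X1, X2) for two fair dice

def pr(event):
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if sum(o) % 2 == 1}   # X1 + X2 is odd
B = {o for o in outcomes if o[0] % 2 == 1}     # X1 is odd
C = {o for o in outcomes if sum(o) <= 5}       # X1 + X2 <= 5
D = {o for o in outcomes if o[0] == 3}         # X1 = 3

print(pr(A & B) == pr(A) * pr(B))   # True:  A and B are independent
print(pr(C & D) == pr(C) * pr(D))   # False: C and D are dependent
```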
A note of caution is that many people confuse independence and the con-
cept of mutually exclusive events. These are distinct ideas that should be dis-
tinguished and the difference must be recognized. One observation that may
help with keeping the ideas separate is that since independent events have
Pr[A ∩ B] = Pr[A] Pr[B] and mutually exclusive events have Pr[A ∩ B] = 0, two events that both have nonzero probability cannot be both independent and mutually exclusive.

A related result is that if A and B are independent, then so are A and Bc. To see this, note that

A = (A ∩ B) ∪ (A ∩ Bc)

so

Pr[A] = Pr[A ∩ B] + Pr[A ∩ Bc] = Pr[A] Pr[B] + Pr[A ∩ Bc]

and therefore

Pr[A ∩ Bc] = Pr[A](1 − Pr[B]) = Pr[A] Pr[Bc]

The definition of independence extends to collections of events. In particular, for mutually independent events E1, E2, …, En,

Pr[E1 ∩ E2 ∩ … ∩ En] = ∏_{i=1}^{n} Pr[Ei]
Exercises
3.1 A random experiment consists of measuring the weight of the car-
bon dioxide emitted by a coal fired power plant during a 4-hour
period. Identify the sample space for this experiment.
3.2 An experiment consists of selecting an acre of land in the Jefferson
National Forest at random and counting the number of cardinal
nests in that parcel of land. Identify the sample space for this
experiment.
3.3 An experiment consists of measuring the speed of randomly
selected southbound vehicles as they pass mile marker 118
on Interstate Highway 81. Identify the sample space for this
experiment.
3.4 Suppose two events A and B are mutually exclusive and that
Pr[ A ] = 0.3 and Pr[ B ] = 0.5. What are the probabilities that (a) either
event occurs, (b) A occurs but B does not, and (c) both A and B occur?
3.5 A university bookstore accepts MasterCard and Visa credit cards.
Forty-two percent of the store’s customers carry a MasterCard and
33% carry Visa. If 14% of the store’s customers carry both cards,
what percentage of the store’s customers carry a credit card the
store accepts?
3.6 Ninety-two percent of college students have laptop computers,
and 68% have MP3 players. If 20% of college students own neither
of these types of electronic devices, what is the probability that a
student selected at random will own both of the two devices?
3.7 For two events A and B of a sample space, what is the value of
Pr[ A ∪ B ] + Pr[ A ∩ B ]?
3.8 Eighty-eight percent of Virginia Tech ISE students have TI-89 cal-
culators, and 65% have tablet-type PCs. If 6% of VT ISE students
own neither of these types of electronic devices, what is the prob-
ability that a student selected at random will own both of the two
devices?
3.9 Suppose two events Y and Z are mutually exclusive and that
Pr[ Y ] = 0.25 and Pr[ Z ] = 0.4. What are the probabilities that (a) either
event occurs, (b) Y occurs but Z does not, and (c) both Y and Z occur?
3.10 Two fair 6-sided dice are rolled. Let A be the event that the sum of
the numbers on the dice is odd and let B be the event that the first
die shows an odd number. Compute Pr[ A ∪ B ].
3.11 A small community organization consists of 20 families, of which
4 have one child, 8 have two children, 5 have three children, 2 have
four children, and 1 has five children.
3.19 If two fair 6-sided dice are rolled, what is the conditional probability
that the first die landed on 6 given that the sum of their numbers is 9?
3.20 If two fair 6-sided dice are rolled, what is the conditional probability
that the first die lands on 4 given that the sum of their numbers is 8?
3.21 In reliability analysis, a parallel system functions successfully as
long as at least one of the identical parallel components is function-
ing. Suppose a parallel system of three components, each having a
reliability of 0.80, is functioning. What is the conditional probabil-
ity that component number one is functioning? What is the condi-
tional probability that it has failed?
3.22 Suppose we have four different unfair coins having a probability
of heads equal to 0.62, 0.56, 0.52, and 0.70, respectively. If one of the
coins is selected at random and flipped with the result that it shows
heads, what is the conditional probability that it was the third coin
that was used?
3.23 The machining center at a production facility has three lathes of
differing ages and thus precision. The oldest, machine A, produces
finished units of product of which 88% are good, 8% are blemished,
and 4% unusable. Machine B produces 92% good, 6% blemished,
and 2% unusable. The newest machine, machine C, turns out prod-
uct that is 96% good, 3% blemished, and 1% unusable. If machine
A produces 1/4 of the company’s output and machine B turns out
1/3 of the output, what fraction of the company’s product is good?
What percentage is blemished? If a unit of product is selected at
random and found to be blemished, what is the probability that it
was produced on machine B?
3.24 Sixty-four percent of the fire alarms in a building were manufac-
tured by Acme and the rest were manufactured by Emca. Fire alarms
are tested every 3 months and the test will give a false indication of
failure with probability 0.04. The test will also give a false indication
of proper function with probability 0.08. The Acme alarms have a
failure probability of 0.18, whereas those from Emca have a failure
probability of 0.15. If a test indicates that a particular alarm is failed,
what is the probability that it was manufactured by Acme?
3.25 The machining center at a production facility has three lathes of
differing ages and thus precision. The oldest, machine A, produces
finished units of product of which 92% are good, 5% are blemished,
and 3% unusable. Machine B produces 93% good, 5% blemished,
and 2% unusable, whereas the newest machine, machine C, turns
out product that is 96% good, 3% blemished, and 1% unusable. If
machine A produces 1/3 of the company’s output while machine B
turns out 1/3 of the output, what fraction of the company’s product
is good? What percentage is blemished?
4
Random Variables and Distributions

4.1 Random Variables
Since a random variable is a function, its domain is the sample space and a
set of real numbers is the range of the random variable.
It is as simple and as complicated as that. We define a mapping from each
element of the sample space to a real number. We may do this in nearly any
way we wish. Clearly, we will usually want to define well-behaved functions—
ones that are single valued, perhaps invertible, perhaps differentiable—but
while the form of the function may be important to the application, this is not
required by the theory.
Example 4.1
When we roll a six-sided die, the set of observations we may actually
make are the number of spots on the side of the cube that lands facing
up. A logical mapping is
one spot → 1
two spots → 2
three spots → 3
four spots → 4
five spots → 5
six spots → 6
Example 4.2
Referring back to the example of the depth of a tire tread in the previous
chapters, we had the sample space Ω = {x|7.5 ≤ x ≤ 8.5}. Clearly, an appro-
priate definition of the random variable is to map the depth readings to
the same numerical value. A reasonable alternative would be to map the
readings to the difference between the observed depth and 7.5 mm. That
is, y = x − 7.5.
Example 4.3
When we flip a coin, the outcomes are H and T. We can map H to 1 and
T to 0, or we could just as well map H to 10 and T to 5.
Example 4.4
If we measure the ambient temperature in New York City at 8:00 a.m., we
might map the thermometer reading to a Celsius scale, a Fahrenheit, or
even a Kelvin scale.
4.2 Distributions
Once we have defined a random variable, we would like to assign a prob-
ability measure to it. So the question is how do we obtain probabilities for
random variables?
The answer is that we associate with the random variables the same prob-
abilities that apply to the events to which they correspond. Suppose that for a
particular experiment, we have an event A comprised of outcomes ai as
A = {a1, a2, …, an}
For the experiment, we would presumably have been able to define Pr[A].
Now, suppose we define the random variable
xi = x(ai)
Example 4.5
For the six-sided die, let X be the number of spots showing. Then, for the
event B, corresponding to an even number of spots
Pr[X = 2 or 4 or 6] = Pr[B] = 0.5
and for the event C, corresponding to having the number of spots exceed 2,
Pr[X > 2] = Pr[C] = 0.667
Example 4.6
In Chapter 3 (Example 3.10), for the case of the depth of tire tread, we
computed Pr[F] = 0.35 where F = {x | 7.90 ≤ x < 8.25} It would be reasonable
to define the random variable corresponding to the depth of the tread to
be the same number. That is,
= d(x) = x
d
is transferred to the random variables. This implies that if we order the ran-
dom variable in an increasing sequence, then as we include more of the val-
ues of the random variable in a set, the probability is nondecreasing. This
assures that we can transfer the probability measure for both discrete and
continuous random variables.
Specifically, once the random variables are defined and are expressed as
real valued quantities, we organize their probabilities into distribution func-
tions, and this enables us to manipulate the probabilities efficiently. Formally,
the definition of a distribution function is
FX ( x) = Pr[X ≤ x] (4.2)
Notice the notation. The name of the random variable is an italicized capital
letter. Specific values of the random variable are represented by italicized,
lowercase letters.
A distribution function constructed in this way has the following properties:

1. FX(x) is a nondecreasing function of x.
2. 0 ≤ FX(x) ≤ 1, and lim_{x→∞} FX(x) = 1.
3. lim_{x→−∞} FX(x) = 0.
4. FX is right continuous. A decreasing sequence that converges to x has
   lim_{y↓x} FX(y) = FX(x).
These properties hold for both discrete and continuous random variables.
In both cases, the function is defined with respect to the entire real line. The
realizations of these conditions are slightly different for discrete and con-
tinuous variables so we treat the two cases a little differently. Consider first
an example of the discrete case.
Example 4.7
Suppose our experiment is comprised of tossing a coin three times and
counting the number of heads. For this experiment, the sample space is
X(TTT) = 0
X(HTT) = X(THT) = X(TTH) = 1
X(HHT) = X(HTH) = X(THH) = 2
X(HHH) = 3
FIGURE 4.1
Example discrete distribution function.
Pr[X = 0] = FX (0) = 0.125
Pr[X = 1] = FX (1) − FX (0) = 0.375
Pr[X = 2] = FX (2) − FX (1) = 0.375
Pr[X = 3] = FX (3) − FX (2) = 0.125
fX ( x) = Pr[X = x] = FX ( x) − FX ( x − 1) (4.3)
FX(x) = ∑_{j=0}^{x} fX(j)   (4.4)
Equation (4.3) and Equation (4.4) formally express the fact that the probabili-
ties for individual values of a discrete random variable are obtained from the
distribution function and that these probabilities can be added up to recover
accumulated values for the distribution function.
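Equations (4.3) and (4.4) can be illustrated with the three-coin-toss random variable of Example 4.7. The following sketch enumerates the eight equally likely outcomes, builds the pmf, and accumulates it into the distribution function:

```python
from itertools import product
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three coin tosses.
outcomes = list(product("HT", repeat=3))

# X = number of heads; build the pmf f_X(x) = Pr[X = x].
pmf = {x: Fraction(0) for x in range(4)}
for o in outcomes:
    pmf[o.count("H")] += Fraction(1, len(outcomes))

# Accumulate the pmf into the distribution function, Equation (4.4).
cdf, running = {}, Fraction(0)
for x in range(4):
    running += pmf[x]
    cdf[x] = running

print(pmf)   # pmf values: 1/8, 3/8, 3/8, 1/8
print(cdf)   # cdf values: 1/8, 1/2, 7/8, 1
```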
It is appropriate at this point to summarize the construction of a probabil-
ity measure for a discrete random variable. For a random experiment that
produces one of only a countable number of possible outcomes, we define a
mapping from each outcome to a real number. We call that number a realiza-
tion of a random variable because it represents the observation we make in
our random experiment. We then assign probabilities to the real numbers
by transferring the probabilities on events—sets of outcomes—to the cor-
responding sets of images of the outcomes that comprise those events. This
provides an analytical structure that we can use to study and describe pos-
sible observations of the experiment.
Example 4.8
In an automated warehouse, shelf space is allocated according to the
sizes of the products to be stored. For a particular realization of this sys-
tem, the stored items are cases of paint (in cans) and their daily arrival
sequences to the warehouse are determined by orders received. A case
of 1-pint cans contains 12 cans and requires 25% of a bin shelf, whereas
a case of 1-gallon cans contains 6 cans and requires 50% of a bin shelf. A
5-gallon can is handled as a single unit and requires 40% of a bin shelf,
and a 10-gallon can is handled individually and requires 60% of a bin
shelf. Our experiment consists of observing the next arriving product to
be stored. Let
Ω = {ω 1 , ω 2 , ω 3 , ω 4 }
where
FIGURE 4.2
Example of a continuous distribution function.
FX ( x) = Pr[X ≤ x] (4.5)
fX(x) = (d/dx) FX(x)   (4.6)
It is reasonable to think of the probability density function as the rate at
which the distribution function is accumulating probability. As with all
derivatives, it is a rate function.
Example 4.9
The ambient temperature at 8:00 a.m. in New York City falls in the inter-
val –40°F ≤ T ≤ 120°F. Define X(T) = T so that a plot of FX(x) might look like
the graph in Figure 4.2. Then the pdf for the temperature, evaluated at
75°F would be:
fX(75) = (d/dx) FX(x) |_{x=75}
Note that we can obtain the probabilities associated with any specific
event as
Pr[x(T1) ≤ X ≤ x(T2)] = FX(x(T2)) − FX(x(T1)) = ∫_{x(T1)}^{x(T2)} fX(u) du

so, for example,

Pr[x(75) ≤ X ≤ x(85)] = FX(x(85)) − FX(x(75)) = FX(85) − FX(75) = ∫_{75}^{85} fX(u) du
One very important point here is that we must always be careful to assure
that we define our probability functions so that
0 ≤ FX (x) ≤ 1
Usually, this means that we must be careful with our definition of the ran-
dom variable as we define the probabilities on events so that
Pr[Ω] = 1
4.2.3 Survivor Functions

It is often convenient to work with the complement of the distribution function, known as the survivor function:

F̄X(x) = Pr[X > x] = 1 − FX(x)   (4.7)

For a discrete random variable,

F̄X(x) = ∑_{j=x+1}^{xmax} fX(j) = 1 − ∑_{j=xmin}^{x} fX(j)   (4.8)

and for a continuous random variable,

F̄X(x) = ∫_{x}^{∞} fX(u) du = 1 − ∫_{−∞}^{x} fX(u) du   (4.9)
Example 4.10
In Example 4.8, the probability of large format containers is
Example 4.11
In Example 4.9, the probability that the morning temperature in
New York exceeds 95°F is
F̄X(95) = ∫_{95}^{120} fX(u) du
FY(y) = 0      for −∞ < y < 0
      = 0.20   for 0 ≤ y < 1
      = 0.45   for 1 ≤ y < 2
      = 0.80   for 2 ≤ y < 3
      = 1.00   for 3 ≤ y < ∞
4.3 Discrete Distribution Functions

The most commonly used discrete distribution models are the

• Bernoulli
• Binomial
• Multinomial
• Poisson
• Geometric
• Negative binomial
A point to keep in mind as we discuss these models is that they are actually
families of distributions. Each specific realization of one of the models is
obtained by selecting its parameter values.
4.3.1 The Bernoulli Distribution

The simplest discrete model is the Bernoulli distribution. It describes a single trial that yields one of two outcomes, conventionally mapped to x = 1 (success), with probability p, and x = 0 (failure), with probability q = 1 − p. The pmf is

fX(x) = p^x (1 − p)^{1−x} = p^x q^{1−x},  x = 0, 1

and the distribution function values are
FX (0) = (1 − p) = q
FX (1) = 1
Note that specifying the possible values of the random variable is part of
the process of defining the distribution or pmf. In many cases, the possible
values of the random variable are obvious and can be omitted, but as a gen-
eral rule one should be careful to assure that the definition is clear.
Example 4.12
Suppose we test a fire alarm every three months. If the probability that
the alarm is faulty is p = 0.08, we define the random variable as X = 1 if the alarm is found to be faulty and X = 0 otherwise, and we find

fX(0) = 0.92   fX(1) = 0.08
FX(0) = 0.92   FX(1) = 1
where the binomial coefficient C represents the number of ways the 9 heads can
be dispersed among the 20 tosses. That is, the two sequences {0, 0, 1, 0, 0, 1, 0, 0,
C(20, 9) = 20!/(9! 11!)

or in general

C(n, k) = n!/(k!(n − k)!)   (4.10)
It represents the number of distinct ways that n items can be arranged with
k being of one class and (n – k) being of the other class.
The origin of the binomial coefficient is the binomial theorem, which states that

(a + b)^n = ∑_{j=0}^{n} C(n, j) a^j b^{n−j}   (4.11)
fX(9) = C(20, 9) p^9 (1 − p)^{11}

and so for the general case, the pmf for the binomial distribution is

fX(x) = C(n, x) p^x (1 − p)^{n−x} = [n!/(x!(n − x)!)] p^x (1 − p)^{n−x},  0 ≤ x ≤ n   (4.12)
The corresponding distribution function is

FX(x) = ∑_{j=0}^{x} C(n, j) p^j (1 − p)^{n−j}

A widely used notation for the binomial pmf is b(x, n, p) and for the binomial cdf is B(x, n, p), so that

FX(x) = B(x, n, p) = ∑_{j=0}^{x} b(j, n, p) = ∑_{j=0}^{x} fX(j)   (4.13)
For the coin-tossing example with p = 0.5,

fX(9) = C(20, 9) p^9 (1 − p)^{11} = 0.16
TABLE 4.1
Full Distribution and Mass Function Values for x Heads in 20 Coin Tosses
x    FX(x)    fX(x)      x    FX(x)    fX(x)      x    FX(x)    fX(x)
0 ~0 ~0 7 0.1316 0.0739 14 0.9793 0.0370
1 ~0 ~0 8 0.2517 0.1201 15 0.9941 0.0148
2 0.0002 0.0002 9 0.4119 0.1602 16 0.9987 0.0046
3 0.0013 0.0011 10 0.5881 0.1762 17 0.9998 0.0011
4 0.0059 0.0046 11 0.7483 0.1602 18 ~1.0000 0.0002
5 0.0207 0.0148 12 0.8684 0.1201 19 ~1 ~0
6 0.0577 0.0370 13 0.9423 0.0739 20 ~1 ~0
Example 4.13
Suppose that an injection molding process for automotive interior door
panels tends to generate defective panels at a rate of about 0.8%. What
is the probability that a sample of 125 panels will contain two or fewer
defective panels?
For this case, p = 0.008, so
FX(2) = ∑_{j=0}^{2} C(125, j) (0.008)^j (0.992)^{125−j} = 0.92
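The binomial values in this section follow directly from Equations (4.12) and (4.13), and scripting them avoids table lookups. A minimal sketch (the function names are illustrative):

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x, n, p) of Equation (4.12)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

def binom_cdf(x, n, p):
    """B(x, n, p) of Equation (4.13)."""
    return sum(binom_pmf(j, n, p) for j in range(x + 1))

# Example 4.13: n = 125 panels, p = 0.008, two or fewer defectives.
print(round(binom_cdf(2, 125, 0.008), 2))   # 0.92

# One of the 20-coin-toss values of Table 4.1: f_X(9) with p = 0.5.
print(round(binom_pmf(9, 20, 0.5), 4))      # 0.1602
```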
fX(x1, x2, x3, x4, x5, x6) = [n!/(x1! x2! x3! x4! x5! x6!)] p1^{x1} p2^{x2} p3^{x3} p4^{x4} p5^{x5} p6^{x6},
0 ≤ xi ≤ n for all i,  ∑_{i=1}^{6} xi = n   (4.14)
Example 4.14
Batches of integrated circuits that are produced for assembly in
cell phones and in portable computers are subject to two key types
of defects. They may have a short circuit due to material bridging
between circuit paths or a short circuit due to an absence of conductive
material in a circuit path. If we inspect a sample of these components,
we would classify each unit as ω1 = nondefective, ω2 = bridging short,
or ω 3 = material void short. Let xk represent the number of units in a
sample of n = 50 chips that belong in category k, then our probability
model is
Pr[X1 = x1, X2 = x2, X3 = x3] = fX(x1, x2, x3) = [n!/(x1! x2! x3!)] p1^{x1} p2^{x2} p3^{x3}

With p1 = 0.975, p2 = 0.015, and p3 = 0.010, the probability that the sample contains 48 nondefective chips and one unit of each defect type is

fX(48, 1, 1) = [50!/(48! 1! 1!)] (0.975)^{48} (0.015)(0.010) = 0.109
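The multinomial probability at the end of Example 4.14 follows directly from Equation (4.14). A short sketch of the computation (category counts and probabilities taken from the example; the function name is illustrative):

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial pmf of Equation (4.14) for given category counts and probabilities."""
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)          # multinomial coefficient n!/(x1!...xk!)
    prob = 1.0
    for x, p in zip(counts, probs):
        prob *= p**x
    return coef * prob

# 48 nondefective, 1 bridging short, 1 material-void short in a sample of 50.
print(round(multinomial_pmf([48, 1, 1], [0.975, 0.015, 0.010]), 3))   # 0.109
```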
fX(x) = e^{−λt} (λt)^x / x!,  0 ≤ x < ∞   (4.15)

The associated distribution function is

FX(x) = ∑_{j=0}^{x} e^{−λt} (λt)^j / j!   (4.16)
For this distribution, we say that the parameter λ represents the rate for the
process. The Poisson is a one-parameter distribution and that parameter is λ.
Example 4.15
Incoming calls to a call center arrive at a rate of λ = 4/min. What is the
probability that the number of calls in any minute exceeds 6? What is the
probability that the number of calls in 5 minutes is fewer than 16?
To answer the first question, we take t = 1 and compute the survivor
function:
F̄X(6) = Pr[X > 6] = 1.0 − Pr[X ≤ 6] = 1.0 − FX(6) = 1.0 − ∑_{j=0}^{6} e^{−4} (4)^j / j! = 0.111

For the second question, the interval is t = 5 minutes, so λt = 20 and

FX(15) = ∑_{j=0}^{15} e^{−20} (20)^j / j! = 0.157
Note that in the general case, the quantity t need not necessarily repre-
sent time. It is often used to represent time but any meaningful measure
that is consistent with the definition of λ is acceptable. In addition, there are
cases in which a single “time” unit is assumed. In those cases, we have the
simplified forms
fX(x) = e^{−λ} λ^x / x!

and

FX(x) = ∑_{j=0}^{x} e^{−λ} λ^j / j!
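The Poisson calculations of Example 4.15 follow from Equations (4.15) and (4.16). A minimal sketch, with λt passed as a single rate argument:

```python
from math import exp, factorial

def poisson_cdf(x, rate):
    """F_X(x) of Equation (4.16), with rate = λt."""
    return sum(exp(-rate) * rate**j / factorial(j) for j in range(x + 1))

# Pr[more than 6 calls in one minute] with λ = 4/min.
print(round(1.0 - poisson_cdf(6, 4.0), 3))   # ≈ 0.111

# Pr[fewer than 16 calls in 5 minutes], λt = 20.
print(round(poisson_cdf(15, 20.0), 3))       # ≈ 0.157
```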
which is to say, the first k − 1 rolls must yield an observation other than a 4 and the final roll must yield a 4. Thus, we can compute fK(6) = (5/6)^5 (1/6) = 0.067.
Of course, we can also compute values of the distribution function. By the
usual definition
FK(k) = Pr[K ≤ k] = ∑_{j=1}^{k} (1 − p)^{j−1} p   (4.18)
FK(k) = ∑_{j=1}^{k} (1 − p)^{j−1} p = p ∑_{j=0}^{k−1} (1 − p)^j = p [ ∑_{r=0}^{∞} (1 − p)^r − ∑_{r=k}^{∞} (1 − p)^r ]

= p [ 1/(1 − (1 − p)) − (1 − p)^k ∑_{r=0}^{∞} (1 − p)^r ]

= p [ 1/p − (1 − p)^k · 1/(1 − (1 − p)) ] = p [ 1/p − (1 − p)^k · 1/p ] = 1 − (1 − p)^k
Example 4.16
In the development of single-shot weapons such as tactical missiles,
a “test analyze and fix (TAAF)” regime is used to find system faults.
A sequence of firings is performed until a missile fails to fire prop-
erly. The device is then subjected to diagnosis to determine and fix the
cause of the failure. If this procedure is implemented for a population
of missiles for which the failure probability is p = 0.035, what is
the probability that a failure occurs on the sixth firing and what is
the probability that the number of firings required to find a failure
exceeds 12?
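The two quantities requested in Example 4.16 follow from the geometric pmf and from the distribution function result FK(k) = 1 − (1 − p)^k derived above. A minimal sketch of the computation with p = 0.035 as stated:

```python
p = 0.035   # probability that a firing fails

# Pr[first failure occurs on the 6th firing]: f_K(6) = (1 - p)^5 * p
f6 = (1.0 - p)**5 * p
print(round(f6, 4))           # ≈ 0.0293

# Pr[more than 12 firings are needed]: 1 - F_K(12) = (1 - p)^12
survivor12 = (1.0 - p)**12
print(round(survivor12, 3))   # ≈ 0.652
```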
12th toss? The answer is that the first 11 tosses must include two 4’s in any
feasible way and then the final toss must yield a 4. Now the probability of
two 4’s in 11 tosses is given by the binomial as b(2, 11, p) and the probability
that the final toss yields a 4 is p. Therefore, the negative binomial probability
that the third success occurs on the 12th trial is
fK(12) = b^{−1}(12, 3, p) = p b(2, 11, p) = p C(11, 2) p^2 (1 − p)^9

= C(11, 2) p^3 (1 − p)^9 = [11!/(2! 9!)] (1/6)^3 (5/6)^9 = 0.049
In general, we will observe the xth success on the kth trial if we observe
x – 1 successes in k – 1 trials and then a success on the final trial. That is
fK(k) = b^{−1}(k, x, p) = p b(x − 1, k − 1, p) = C(k − 1, x − 1) p^x (1 − p)^{k−x},  x ≤ k < ∞   (4.19)
The corresponding distribution function is

FK(k) = B^{−1}(k, x, p) = ∑_{j=x}^{k} b^{−1}(j, x, p) = ∑_{j=x}^{k} C(j − 1, x − 1) p^x (1 − p)^{j−x}   (4.20)
and is interpreted as the probability that the xth success occurs on or before
the kth trial.
Note the correspondence and the difference between the negative binomial
distribution and the binomial distribution. For the binomial distribution, the
number of trials is fixed and the random variable is the number of successes,
and for the negative binomial distribution, the number of successes is fixed
and the random variable is the number of trials.
Because of their dual relationship, we can express the cdf for the negative
binomial distribution in terms of that for the binomial distribution. The rela-
tionship is
B^{−1}(k, x, p) = 1 − B(x − 1, k, p)   (4.21)
This equation reflects the fact that the xth success occurs on or before the kth
trial only if the number of successes in k trials is x or greater (not x – 1 or fewer).
Example 4.17
In testing a new software product, a manufacturer undertakes updating
the product when the third fault is found by a member of the population
fK(20) = b^{−1}(20, 3, 0.04) = C(19, 2)(0.04)^3 (0.96)^{17} = 0.0055
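The negative binomial value in Example 4.17 comes from Equation (4.19), and Equation (4.20) gives the corresponding cdf. A short sketch (the function names are illustrative):

```python
from math import comb

def neg_binom_pmf(k, x, p):
    """b^{-1}(k, x, p) of Equation (4.19): the xth success occurs on trial k."""
    return comb(k - 1, x - 1) * p**x * (1.0 - p)**(k - x)

def neg_binom_cdf(k, x, p):
    """B^{-1}(k, x, p) of Equation (4.20): the xth success occurs on or before trial k."""
    return sum(neg_binom_pmf(j, x, p) for j in range(x, k + 1))

# Example 4.17: third fault found exactly on the 20th use, p = 0.04.
print(round(neg_binom_pmf(20, 3, 0.04), 4))   # ≈ 0.0055
print(round(neg_binom_cdf(20, 3, 0.04), 4))   # third fault on or before the 20th use
```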
Example 4.18
Suppose the random variable X has the probability density function
f X ( x) = c(1 − x 2 ), 0≤x≤1
FX(x) = ∫_{0}^{x} fX(u) du = ∫_{0}^{x} c(1 − u^2) du = c(u − u^3/3) |_{0}^{x} = c(x − x^3/3)

Requiring that FX(1) = 1 gives

FX(1) = 1 = c(1 − 1/3) = (2/3)c,  so c = 3/2
Thus, the formal definition of the density is

fX(x) = 0            for −∞ < x < 0
      = c(1 − x^2)   for 0 ≤ x ≤ 1
      = 0            for 1 < x < ∞
Except where the simple definition is completely clear, the more formal defi-
nition should be used.
Again as in the case of the discrete random variables, observation of our
natural environment and of our manufacturing processes causes us to see
that the probabilities of occurrence for events in these domains often appear
to display a recognizable pattern. There are several of these commonly occur-
ring patterns that can be described in the form of a continuous distribution
function. The five most common models are the
• Exponential
• Gamma
• Weibull
• Normal
• Uniform
Figure 4.3 shows a plot of the exponential density for the case in which
λ = 0.01. Observe that the plot implies greater probabilities for small values
than for larger ones. One can verify this by noting that
FIGURE 4.3
Representative exponential density function (λ = 0.01).
does not occur prior to a particular time. This is often called the reliability
of the equipment.
Example 4.19
An insurance call center receives calls at a rate of four per minute. What is
the probability that the time between two calls is (a) less than 5 seconds,
(b) greater than 30 seconds, and (c) between 10 and 20 seconds? The rate
of 4/min means λ = 4/min, so

(a) Pr[T < 5 s] = 1 − e^{−4(5/60)} = 1 − e^{−1/3} = 0.283

(b) Pr[T > 30 s] = e^{−4(30/60)} = e^{−2} = 0.135

(c) Pr[10 s ≤ T ≤ 20 s] = e^{−4(10/60)} − e^{−4(20/60)} = e^{−2/3} − e^{−4/3} = 0.250
Example 4.20
FT(20,000) = 1 − e−2 = 0.865
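Both exponential examples use the distribution function FT(t) = 1 − e^{−λt}. A brief sketch of the calculations; the times in Example 4.19 are converted to minutes, and for Example 4.20 a rate of 10^{−4} per hour is assumed only so that λt = 2, consistent with the stated result (the original rate is not shown in the example).

```python
from math import exp

def expon_cdf(t, lam):
    """Exponential distribution function F_T(t) = 1 - exp(-lam * t)."""
    return 1.0 - exp(-lam * t)

lam = 4.0   # calls per minute (Example 4.19)
print(round(expon_cdf(5 / 60, lam), 3))                              # (a) Pr[T < 5 s]  ≈ 0.283
print(round(1.0 - expon_cdf(30 / 60, lam), 3))                       # (b) Pr[T > 30 s] ≈ 0.135
print(round(expon_cdf(20 / 60, lam) - expon_cdf(10 / 60, lam), 3))   # (c) ≈ 0.250

# Example 4.20: assumed rate 1e-4 per hour, so that lam * 20,000 = 2.
print(round(expon_cdf(20000, 1e-4), 3))                              # ≈ 0.865
```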
fT(t) = [λ^α t^{α−1} / Γ(α)] e^{−λt},  0 < t < ∞   (4.24)
where Г(α) is the gamma function evaluated at the value of the parameter
α. The reader is reminded that the gamma function is the definite integral
defined as
Γ(z) = ∫_{0}^{∞} t^{z−1} e^{−t} dt
and that for the special case in which the argument is an integer
Γ( z) = ( z − 1)!
When the parameter α is an integer, the distribution function can be written in closed form as

FT(t) = 1 − e^{−λt} ∑_{k=0}^{α−1} (λt)^k / Γ(k + 1) = e^{−λt} ∑_{k=α}^{∞} (λt)^k / Γ(k + 1)

or equivalently

FT(t) = 1 − e^{−λt} ∑_{k=0}^{α−1} (λt)^k / k! = e^{−λt} ∑_{k=α}^{∞} (λt)^k / k!   (4.25)
For cases in which the parameter α is not an integer, values of the dis-
tribution function are computed by numerical integration. The value of the
gamma function is then calculated using a numerical approximation. The
best available numerical approximation is given in Abramowitz and Stegun*
as follows. For 0 ≤ α ≤ 1,

Γ(1 + α) ≈ 1 + c1α + c2α^2 + c3α^3 + c4α^4 + c5α^5

where
c1 = −0.5748646
c2 = 0.9512363
c3 = −0.6998588
c4 = 0.4245549
c5 = −0.1010678
* Abramowitz, M., Stegun, I. A., 1965, Handbook of Mathematical Functions, New York, Dover
Publications.
FT(t = 125) = (1/Γ(α)) ∫_{0}^{125} λ^α t^{α−1} e^{−λt} dt = (1/8.396)(2.764) = 0.329
Example 4.21
Suppose the random variable T represents the time to failure for an auto-
matic guided vehicle, and that it has a gamma distribution with param-
eters α = 3, λ = 0.75/yr. Compute Pr[2.5 yr ≤ T ≤ 7.5 yr].
Pr[2.5 ≤ T ≤ 7.5] = FT(7.5) − FT(2.5) = e^{−7.5λ} ∑_{k=α}^{∞} (7.5λ)^k / k! − e^{−2.5λ} ∑_{k=α}^{∞} (2.5λ)^k / k!

= e^{−2.5λ} ∑_{k=0}^{α−1} (2.5λ)^k / k! − e^{−7.5λ} ∑_{k=0}^{α−1} (7.5λ)^k / k!

= e^{−(2.5)(0.75)} [1 + (2.5)(0.75)/1 + ((2.5)(0.75))^2/2] − e^{−(7.5)(0.75)} [1 + (7.5)(0.75)/1 + ((7.5)(0.75))^2/2]

= e^{−1.875} [1 + 1.875 + (1.875)^2/2] − e^{−5.625} [1 + 5.625 + (5.625)^2/2]

= 0.710 − 0.081 = 0.629
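Because α = 3 is an integer in Example 4.21, the gamma distribution function of Equation (4.25) reduces to a finite sum, which makes the computation easy to script. A minimal sketch (the function name is illustrative):

```python
from math import exp, factorial

def gamma_cdf_int_alpha(t, alpha, lam):
    """F_T(t) of Equation (4.25) for an integer shape parameter alpha."""
    return 1.0 - exp(-lam * t) * sum((lam * t)**k / factorial(k) for k in range(alpha))

alpha, lam = 3, 0.75   # per year (Example 4.21)
prob = gamma_cdf_int_alpha(7.5, alpha, lam) - gamma_cdf_int_alpha(2.5, alpha, lam)
print(round(prob, 3))   # Pr[2.5 yr <= T <= 7.5 yr] ≈ 0.63
```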
FT(t) = 1 − e^{−((t−δ)/(θ−δ))^β},  δ ≤ t < ∞   (4.26)
1.0
0.8
0.6
0.4
0.2
FIGURE 4.4
Weibull cdf when θ = 1000 and when θ = 1500.
In this form, the parameter β is called the shape parameter and θ is called
the location parameter. As the names imply, the shape and location param-
eters determine the shape and location of the distribution. In the case of the
Weibull, we see an informative example of the effect of the designation of
the location parameter. Notice that the value of the distribution function at
the value of the random variable t = θ is FT (t = θ) = 1 − e−1 = 0.632 regardless of
the value of the parameter β. Thus, the value of θ determines the range over
which the distribution varies. This is illustrated in Figure 4.4. Increasing
values of the location parameter move the 63.2% point of the distribution
to the right, which implies a wider range of values over which the random
variable may occur.
Example 4.22
Suppose the life lengths of certain memory chips are well modeled by
the Weibull distribution having β = 2.25 and θ = 18,000 hr. What fraction
of the chip population will fail by 10,000 hours, and what is the probabil-
ity that a chip survives more than 25,000 hours?
58 Probability Foundations for Engineers
2.25
10, 000
−
= 1 − e −( 0.556)
2.25
18, 000
FT (10, 000) = 1 − e = 1 − e −0.266 = 0.234
2.25
25, 000
−
= e −(1.389)
2.25
18, 000
FT (25, 000) = e = e −2.094 = 0.123
−
( x−µ )2
e 2σ2
f X ( x) = , − ∞ < x < ∞ (4.28)
2 πσ 2
and it is worth repeating that the range of the random variable is −∞ < x < ∞.
Unfortunately, the normal density cannot be integrated in closed form to
provide an algebraic statement of the distribution function. Consequently,
computation of probabilities for normal random variables is usually accom-
plished in one of four ways.
One way to obtain probabilities for normal random variables is by numeri-
cal integration. One may simply program a numerical integration algorithm
or else use one that is commercially available. A second approach is to use an
Random Variables and Distributions 59
0.10
0.08
0.06
0.04
0.02
10 10 20 30
FIGURE 4.5
Normal density when µ = 1000 and when σ = 4.
x−µ
z= (4.29)
σ
z
Φ( z) =
∫−∞
φ(w) dw (4.31)
for the distribution function. Observe that the algebraic form for the den-
sity function is simplified by the transformation of the variables in Equation
(4.29). This is because the distribution on the transform variable z has param-
eters µ = 0 and σ = 1. Values of the cdf of Equation (4.31) are included in a
widely used table that is reproduced here in Table 4.2. The use of the standard
normal table is illustrated in the following example.
60 Probability Foundations for Engineers
TAbLE 4.2
Cumulative Probabilities for the Standard Normal Distribution
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8079 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8728 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9648 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9712 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9773 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9983 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
Example 4.23
The diameters of baseball cores are well modeled by a normal distribu-
tion having µ = 25 mm and σ = 0.80 mm. In processing, the cores are
passed through a 26-mm screen and the cores trapped above the screen
are machined down provided they have a diameter that is no larger
Random Variables and Distributions 61
x − 25
z=
0.8
and the cores trapped by the screen are those that have diameters that
exceed 26 mm, so
26 − 25
Pr[X > 26] = Pr[Z > = 1.25] = 1 − Pr[Z ≤ 1.25] = 1 − Φ(1.25)
0.8
= 1.0 − 0.8944 = 0.1056
In this calculation, we first find that z = 1.25 is the value of the stan-
dard normal variate that corresponds to a core diameter of 26 mm.
We then go to Table 4.2 where the leftmost column gives the values
of z to a level of tenths. The column headings extend the values of z
to hundredths so we observe that in the row labeled 1.2, the column
headed 0.05 will correspond to z = 1.25 and the table entry of 0.8944 is
therefore the value of Φ(1.25). Subtracting that value from 1 yields the
probability we seek.
To determine the fraction of the population that is machined to meet
the spec, we compute
27 − 25
Pr[26 < X < 27 ] = Pr[1.25 < Z < = 2.5] = Φ(2.5) − Φ(1.25)
0.8
= 0.9938 − 0.8944 = 0.0994
The observant reader will have noted that all of the z values that index the
table are positive. However, negative values of z occur often. How is this han-
dled? The symmetry of the normal density allows us to use the identity that
Φ(− z) = 1 − Φ( z) (4.32)
Example 4.24
Suppose the baseball cores that pass through the 26-mm screen are sub-
sequently passed across a 24.5-mm screen and any cores that are not
trapped are discarded because they are too small. What fraction of the
population of cores is discarded because they are too small?
24.5 − 25
Pr[X < 24.5] = Pr[Z < = −0.625]
0.8
= Φ(−0.625) = 1.0 − Φ(0.625) = 1.0 − 0.7341 = 0.2659
62 Probability Foundations for Engineers
The rationale for Equation (4.32) and its application in Example 4.24 is that
the area under the normal density (the integral) to the right of any value z is
equal to the area under the normal density to the left of −z. Consequently,
including only the positive coordinates in the table is sufficient to specify the
entire distribution function.
A further comment is that the probability stated in Example 4.24 for Φ(0.625)
was computed by linear interpolation between Φ(0.62) and Φ(0.63). This is a
reasonable approach to obtaining values relative to a finer mesh.
Example 4.25
Specifications for the length of a machined component are 1.8 ± 0.16 mm.
Assuming that component length is well modeled by a normal distribu-
tion, what value of σ will assure that at least 98% of the population falls
within the specs?
0.16
Φ( ) = 0.99
σ
and
0.16
= z0.99 = 2.326
σ
so
0.16
σ= = 0.069
2.326
Example 4.25 is intended to illustrate that we can determine “quantiles”
for the normal distribution by reversing the process we used to obtain prob-
abilities. That is, the example is really asking for the normal variates that
have 1% of the population in the tails of the density. Essentially, for any tail
probability, γ, we can say that
zγ = Φ −1 ( γ ) (4.33)
and
xγ = µ + zγ σ (4.34)
Random Variables and Distributions 63
1
( )
−16
Φ( z) ≈ 1 − 1 + d1 z + d2 z 2 + d3 z 3 + d4 z 4 + d5 z 5 + d6 z6 + ε( z) (4.35)
2
where
d1 = 0.04986(73)
d2 = 0.02114(10)
d3 = 0.00327(76)
d4 = 0.00003(80)
d5 = 0.00004(89)
d6 = 0.00000(54)
c 0 + c1 t + c 2 t 2
z1− γ ≈ t − + ε( γ ) (4.36)
1 + e1 t + e 2 t 2 + e 3 t 3
where
c0 = 2.515517
c1 = 0.802853
c2 = 0.010328
e1 = 1.432788
e2 = 0.189269
e3 = 0.001308
and
1
t = ln 2
γ
x−a
FX ( x) = , a ≤ x ≤ b (4.37)
b−a
1
f X ( x) = (4.38)
b−a
so we can see that the proportion in Equation (4.37) is the fraction of the
range of the random variable in the set {a ≤ X ≤ x}.
Under the general definition of the uniform distribution, the parameters a
and b may be positive or negative as long as a < b. In practice, negative values
are rarely used and in fact, the most common use of the distribution model
has a = 0 and b = 1.
Example 4.26
For a uniformly distributed random variable on the range [0.5, 2.5], what
are Pr[1.3 ≤ X ≤ 1.9] and Pr[X > 2.1]?
Since a = 0.5 and b = 2.5
and
2.1 − 0.5
Pr[X > 2.1] = 1 − FX (2.1) = 1 − = 0.2
2.5 − 0.5
Pr[ A ∩ B]
Pr[ A|B] =
Pr[B]
Pr[X = x] f X ( x)
fX|X ≥ a ( x|X ≥ a) = Pr[X = x|X ≥ a] = = , a ≤ x ≤ xmax
Pr[X ≥ a] FX ( a − 1)
FT (t) − FT (τ)
FT|T ≥τ (t|T ≥ τ) = , τ≤t≤∞
1 − FT (τ)
Example 4.27
Suppose X is a Poisson random variable with λ = 12 and we know that for
a certain application X ≥ 8. We compute Pr[X = 11|X ≥ 8] as
f X (11)
f X|X ≥ 8 ( x|X ≥ 8) = = 0.126
FX (7)
and Pr[X ≤ 16|X ≥ 8] as
FX (16) − FX (7)
FX|X ≥ 8 (16|X ≥ 8) = = 0.889
FX (7)
Example 4.28
Suppose that we know that a continuous random variable T represents
the age at failure of a component and has an exponential distribution
66 Probability Foundations for Engineers
Example 4.29
If a population of microelectronic capacitors displays life length behav-
ior that is well modeled by a Weibull distribution having β = 1.8 and
θ = 15,000 hr, compute and interpret the value of the hazard function at
7800 and 15,600 hours.
For the Weibull distribution
βtβ− 1 − ( t θ)β
fT (t) = e
θβ
and
( )
β
− tθ
FT (t) = e
so
βtβ− 1
zT (t) =
θβ
Therefore
Example 4.30
Suppose the exponential distribution (with parameter λ = 0.04/day)
forms a representative model of the life lengths of a certain population of
flies. If 1000 flies hatch at a location today, what proportion of the origi-
nal population and what proportion of the survivors will die on days 2
and 3 of their lives?
For the exponential, zT (t) = λ and FT (t) = 1 − e−λt, so the proportion of the
survivors that start each day that die that day is 4%. However, the aver-
age number of flies that die on day 2 is 38 and on day 3 is 37. The reason
for the difference is that on day 2, 4% of the 960 day-1 survivors die while
on day 3, 4% of the 922 day-2 survivors die.
fT (t) fT (t)
zT (t) = =
FT (t) 1 − FT (t)
dFT (t)
fT (t) = = zT (t) ( 1 − FT (t))
dt
FT (t) = 1 − e ∫0 T
− z ( u) du
(4.40)
or equivalently
Pr[A|B] = Pr[A]
Now, if there are random variables that are defined as images of the event,
it must be the case that those random variables are independent as the proba-
bilities have been transferred directly. Suppose the set X contains the images
of the elements of A and the set Y contains the images of the set B. Then,
necessarily
Pr[X|Y] = Pr[X]
Random Variables and Distributions 69
and
Pr[Y|X] = Pr[Y]
Exercises
4.1 For the sample space described in Exercise 3.2 of Chapter 3, define
a random variable that can be used to represent observations.
4.2 For the sample space described in Exercise 3.1 of Chapter 3, define
a random variable that can be used to represent observations.
4.3 For the sample space described in Exercise 3.3 of Chapter 3, define
a random variable that can be used to represent observations.
4.4 At a local pizza restaurant, the hourly demand, D, for a particular
type of pizza has the following probability mass function:
0.08 d=0
0.12 d=1
0.18 d=2
f D (d) = 0.24 d=3
0.16 d=4
0.12 d=5
0.10 d=6
What are the values for FD (4), FD (2), Pr[1 ≤ D ≤ 4], and Pr[D ≥ 4|D >2]?
4.5 Consider a random variable for which the distribution function is
given by
Determine the values for fY (3), FY (3), Pr[2 ≤ Y ≤ 4], and Pr[Y ≥ 3|Y >1].
70 Probability Foundations for Engineers
4.6 Suppose two fair six-sided dice are rolled. What are the possible
values that could arise for
a. The larger of the two numbers observed.
b. The smaller of the two numbers observed.
c.
The magnitude of the difference between the two numbers
observed.
d. The sum of the two numbers observed.
What are the probabilities associated with each of these random
variables?
4.7 A discrete random variable, Y, has the following probability mass
function:
0.03 y=0
0.09 y=1
0.14 y=2
0.18 y=3
fY ( y ) = 0.23 y=4
0.15 y=5
0.09 y=6
0.07 y=7
0.02 y=8
What are the values for FY (3), FY (5), Pr[2 ≤ Y ≤ 6] and Pr[Y ≥ 5|Y ≥ 2]?
4.8 A random variable Y has the following pmf:
0 y=0
0.8c y=1
1.2 c y=2
fY ( y ) =
c y=3
0.6c y=4
0.4c y=5
Determine the values for c, FY (3), FY (2), and Pr[Y ≥ 3|Y >1].
4.9 Consider a random variable for which the distribution function is
given by
Random Variables and Distributions 71
Determine the values for fY (3), FY (4), Pr[3 ≤ Y ≤ 5], and Pr[Y ≥ 4|Y >2].
4.10 A random variable has the probability density defined as
f X (x) = c(1 − x2), 0 ≤ x ≤ 1
a. Determine the value of c.
b. What is Pr[0.5 ≤ X ≤ 0.85]?
4.11 A random variable has the probability density defined as
f X (x) = c(4x − 2x2), 0 ≤ x ≤ 2.
a. Determine the value of c.
b. What is Pr[0.5 ≤ X ≤ 1.2]?
4.12 Demand at a local microbrewery for its premium beer in gallons
per week is a random variable with density function f X (x) = 4(1 − x)3,
0 ≤ x ≤ 1. How much beer must the brewer produce per week in
order to have a stock out probability in any week of 0.05?
4.13 What is the probability that a binomial random variable, X, hav-
ing parameters n = 50 and p = 0.025, takes a value greater than
2? What is the probability that X exceeds 4 if it is known that X
exceeds 2?
4.14 A particular engine bearing plant has three manufacturing lines.
For line A, the proportion of output bearings that are defective is
0.01, whereas for line B the proportion is 0.016, and for line C it is
0.025. Line A produces 30% of the plant’s output, whereas line B
produces 45% and line C produces 25% of the output. If a sample
of n = 100 bearings is inspected and found to contain one defective
bearing, what is the probability that the sample was taken from the
output of line B?
4.15 What is the probability that a binomial random variable, Y, hav-
ing parameters n = 40 and p = 0.05, takes a value between 2 and 4
inclusive?
4.16 If 0.5% of the output wheel-well manifolds from an injection mold-
ing process are defective, what is the probability that a sample of 80
units will include 2 or more defective manifolds?
72 Probability Foundations for Engineers
4.17 The firm that manufactures patriot missiles purchases the guid-
ance circuits from three different suppliers. Supplier A provides
30% of the guidance circuits and those circuits have a fault prob-
ability of pA = 0.02, whereas the circuits from supplier B, which pro-
vides 25% of those purchased, have a fault probability of pB = 0.025.
The guidance circuits purchased from supplier C have pC = 0.01. If a
batch of 200 missiles is fired during a particular strategic offensive
and three of the missiles fail to track to target, what is the probabil-
ity that the batch of missiles contained guidance circuits obtained
from supplier B?
4.18 The number of patients arriving to a local pharmacy for flu shots
during November is a Poisson random variable with parameter
of λ = 2.8/hr. What is the probability that the number of arriving
patients in any hour exceeds three?
4.19 The number of incoming calls to a mail order call center is a
Poisson random variable with parameter of λ = 1.8/min. What is the
probability that the number of arriving calls in any minute exceeds
three? What is the probability that the number of arriving calls
exceeds three during any 2-minute interval?
4.20 Suppose it is known that the number of accidents occurring per
day on Main Street is a Poisson random variable with parameter
λ = 4/day.
a. What is the probability that the number of accidents on any day
is four or more?
b. Given that the number of accidents today is at least one, what is
the conditional probability of four or more accidents today?
4.21 The number of calls to a particular university Internet site is well
modeled by a Poisson distribution with λ = 1.2/min. What is the
probability that the number of calls during any minute will exceed
three if it is known that the number of calls exceeds one?
4.22 What is the probability that more than 12 tosses of a fair die are
required to obtain the first 6?
4.23 Bob and Joe have each purchased an unbalanced six-sided die
at a novelty shop. Bob’s die has Pr[X = 4] = 0.10, while Joe’s has
Pr[X = 4] = 0.20. If one of these dice is selected at random and rolled
until the first 4 occurs and that happens to be the seventh roll, what
is the probability that the die is Bob’s? How does this probability
change if the first 4 occurs on the fourth roll?
4.24 If an experiment consists of tossing a fair die, what is the probabil-
ity that a 3 occurs for the fourth time on the 20th toss?
4.25 A fair six-sided die is rolled until the third time a 5 is obtained.
What is the probability that the number of rolls exceeds 20? What
Random Variables and Distributions 73
4.36 Life lengths of certain automotive tires are well modeled by a gamma
distribution having α = 2.85 and λ = 7.14 × 10−5/mile. If the manufac-
turer of the tires offers a free replacement warranty of 18,000 miles on
the tires, what fraction of the tire population will have to be replaced?
4.37 If the life lengths of memory chips is well modeled by the Weibull
distribution having β = 1.8 and θ = 20,000 hr, what fraction of the
memory chip population will survive beyond 27,000 hours?
4.38 The life lengths of another population of memory chips is well mod-
eled by the Weibull distribution having β = 2.25 and θ = 18,000 hr.
a. What is the probability that a chip survives more than 25,000
hours?
b. What is the value of the hazard function at 25,000 hours?
4.39 The life length of a photocopier roller bearing displays a Weibull
distribution having β = 3.2 and θ = 12,000 cycles. What fraction of
the population will fail by 8000 cycles? By 18,000 cycles?
4.40 An electronics manufacturer purchases 30% of its cell phone bat-
teries from a supplier that claims those batteries have a Weibull
life length with parameters β = 1.40 and θ = 25,000 hr. The manu-
facturer produces the remaining 70% of its cell phone batteries in
its own plant, and those batteries have a Weibull life distribution
with parameters β = 1.80 and θ = 30,000 hr. If your cell phone has
required a one-year (8760 hours) warranty replacement because of
battery failure, what is the probability that your phone contained a
battery from the supplier?
4.41 The annual snowfall in Buffalo, New York, is normally distributed
with μ = 120" and σ = 10.4". What is the probability that Buffalo will
have more than 140" in any year? What is the probability that this
year’s total will be between 110" and 130"?
4.42 A normal random variable, X, with μ = 45 takes a value less than or
equal to 38.5 with probability 0.125. What is the value of σ for the
distribution?
4.43 A normal random variable, Y, with σ = 7.5 takes a value greater than
or equal to 264.4 with probability 0.230. What is the value of μ for
the distribution?
4.44 The thickness, T, of personal computer chassis spacers is well mod-
eled by a normal distribution having μ = 0.4 mm and σ = 0.04 mm If
a spacer is selected at random, what is
a. Pr[0.33 ≤ T ≤ 0.45]?
b. Pr[T ≥ 0.32]?
c. Pr[T ≥ 0.50]?
d. Pr[0.375 ≤ T ≤ 0.50 | T > 0.32]?
Random Variables and Distributions 75
Example 5.1
Suppose we toss two fair dice and map the number of spots facing up on
each die to the corresponding number. Then, our random vector would
be X = (X 1 , X 2 ), where Xi is the number observed on die i.
Example 5.2
At a regional credit card call center, customers call in either to request a
credit limit increase or to check on their existing balance. If Y1 represents
the number of customers who call to request a credit limit increase dur-
ing a 4-hour interval and Y2 represents the number of customers who
call to check their account balance, then the random vector Y = (Y1 , Y2 )
models the sample space of incoming calls.
77
78 Probability Foundations for Engineers
Example 5.3
If the location of a hole punched in a work piece varies in two dimensions
and is evaluated relative to its horizontal and vertical alignment, then the
random vector (X, Y) provides a representation of hole position quality.
Example 5.4
If the life of an automotive tire is defined in terms of distance traveled and
days of use, then the random vector (D, U) represents tire age accumulation.
Clearly, random vectors are common and can include any number of dimen-
sions. In addition, the quantities that comprise the random vectors may be
discrete or continuous.
As in the case of univariate random variables, we map the probabilities of
events of the sample space to the sets of random variables that are the images
of those events. We again form the mapping so that we have a distribution
function for the random vector. The general representation for the distribu-
tion function for a random vector is
and we call this function the “joint distribution function” (or joint cumulative
distribution function or joint cdf) on the random vector X = (X 1 , X 2 , … , X r ).
To examine the general form in detail, consider a two-dimensional random
vector (X, Y) for which the realization of Equation (5.1) is
FX ,Y ( x , y ) = Pr[X ≤ x , Y ≤ y ] (5.2)
fX ,Y ( x , y ) = 1 36 ∀x , y
TAbLE 5.1
Joint Distribution Function for Two Fair Dice
X\Y 1 2 3 4 5 6
1 1 1 1 1 5 1
36 18 12 9 36 6
2 1 1 1 2 5 1
18 9 6 9 18 3
3 1 1 1 1 5 1
12 6 4 3 12 2
4 1 2 1 4 5 2
9 9 3 9 9 3
5 5 5 5 5 25 5
36 18 12 9 36 6
6 1 1 1 2 5 1
6 3 2 3 6
x y
FX ,Y ( x , y ) = ∑∑ f
i= 0 j= 0
X ,Y (i, j) (5.3)
fX ,Y ( x , y ) ≠ FX ,Y ( x , y ) − FX ,Y ( x − 1, y − 1) (5.4)
x−1 y −1
fX ,Y ( x , y ) = FX ,Y ( x , y ) − FX ,Y ( x − 1, y − 1) − ∑i= 0
f X ,Y (i , y ) − ∑f
j= 0
X ,Y ( x , j) (5.5)
FX ,Y ( x , y ) − FX ,Y ( x − 1, y ) = Pr[X = x , Y ≤ y ] (5.6)
and
FX ,Y ( x , y ) − FX ,Y ( x , y − 1) = Pr[X ≤ x , Y = y ] (5.7)
80 Probability Foundations for Engineers
FX ( x) = FX ,Y ( x , y max ) (5.8)
and
FY ( y ) = FX ,Y ( xmax , y ) (5.9)
and the corresponding expression holds for the random variable Y. The cor-
ollary results for this expression are that the marginal probability mass func-
tions can be constructed as
f X ( x) = ∑fy
X ,Y ( x , y ) (5.10)
and
fY ( y ) = ∑fx
X ,Y ( x , y ) (5.11)
Thus, the joint and marginal probability measures are intertwined and one
can usually be obtained from the other. Consider an example.
Example 5.5
Suppose that the random vector Y = (Y1 , Y2 ) described in Example 5.2
has the joint probability mass function (pmf)
e −17 11y1 6 y2 − y1
fY1 ,Y2 ( y1 , y 2 ) = , 0 ≤ y1 ≤ y 2 , 0 ≤ y 2 < ∞
y1 !( y 2 − y1 )!
Applying Equation (5.10) to this joint pmf yields the marginals
∞ ∞ ∞
e −17 11y1 6 j − y1 e −17 11y1 6 j − y1
fY1 ( y1 ) = ∑f
j=0
Y1 , Y2 ( y 1 , j) = ∑
j = y1
y1 !( j − y1 )!
=
y1 ! ∑ ( j − y )!
j − y1 = 0 1
and
y2 y2 y2
i y2 − i
e −17 11i6 y2 − i e −17
fY2 ( y 2 ) = ∑
i=0
fY1 ,Y2 (i, y 2 ) = ∑
i=0
i !( y 2 − i)!
=
y2 ! ∑ yi !(!11
i=0
2 6
y − i)!
2
e −17
y2
y2 e −17 17 y2
=
y2 ! ∑ i
i=0
i y2 − i
11 6
=
y2 !
Pr[ A ∩ B]
Pr[ A|B] = (5.12)
Pr[B]
The definition of conditional probability functions on a random vector again
requires the application of the probabilities of the events of the sample space
to the images of those events. Once this mapping is defined, the probability
associated with the intersection will usually be a joint probability measure
and the probability of the condition will often be a marginal probability.
Example 5.6
For the two dice having the joint distribution enumerated in Table 5.1,
we can compute
Pr[(X ≤ 3, Y ≤ 2) ∩ (Y ≤ 4)]
Pr[X ≤ 3, Y ≤ 2|Y ≤ 4] =
Pr[Y ≤ 4]
Pr[X ≤ 3, Y ≤ 2] FX ,Y (3, 2) 1 6 1
= = = =
Pr[Y ≤ 4] FY ( 4) 2 4
3
82 Probability Foundations for Engineers
or
Pr[(X ≤ 3, Y ≤ 2) ∩ (X ≤ 4, Y ≤ 4)]
Pr[X ≤ 3, Y ≤ 2|X ≤ 4, Y ≤ 4] =
Pr[X ≤ 4, Y ≤ 4]
Pr[X ≤ 3, Y ≤ 2] FX ,Y (3, 2) 1 6 3
= = = =
Pr[X ≤ 4, Y ≤ 4] FX ,Y ( 4, 4) 4 8
9
or
Pr[(X ≤ 3) ∩ (X ≤ 4, Y ≤ 4)]
Pr[X ≤ 3|X ≤ 4, Y ≤ 4] =
Pr[X ≤ 4, Y ≤ 4]
Pr[X ≤ 3, Y ≤ 4] FX ,Y (3, 4) 1 3 3
= = = =
Pr[X ≤ 4, Y ≤ 4] FX ,Y ( 4, 4) 4 4
9
Example 5.7
For the random vector described in Example 5.2 with the joint probabil-
ity mass function stated in Example 5.5, we can construct the conditional
probability mass functions as
and
e −17 11y1 6 y2 − y1
f Y ,Y ( y 1 , y 2 ) y1 !( y 2 − y1 )! y2 ! 11y1 6 y2 − y1
fY1|Y2 ( y1 |y 2 ) = 1 2 = −17 y2
=
fY2 ( y 2 ) e 17 y1 !( y 2 − y1 )!! 17 y2
y2 !
y 2 11 y1 6 y2 − y1
=
y1 17 17
so
e −6 6 y2 −2
fY2|Y1 ( y 2 |y1 = 2) =
( y 2 − 2)!
Joint, Marginal, and Conditional Distributions 83
and
e −6 62
fY2|Y1 ( 4|y1 = 2) = = 0.0446
2!
and also
4 11 y1 6 4− y1
fY1|Y2 ( y1 |y 2 = 4) =
y1 17 17
and
4 11 2 6 2
fY1|Y2 (2|y 2 = 4) = = 0.313
2 17 17
Example 5.6 and Example 5.7 illustrate the fact that the conditional prob-
abilities may be analyzed using either the conditional distribution function
or the conditional probability mass function. The choice depends upon the
application. The examples are also intended to emphasize the use of the basic
conditioning relationship given in Equation (5.12). As with many of the anal-
yses treated in this text, it is very often worthwhile to base a computation on
a return to an initial definition.
x y
FX ,Y ( x , y ) = Pr[X ≤ x , Y ≤ y ] =
∫ ∫
−∞ −∞
fX ,Y (u, v)dvdu (5.13)
Here again, it is appropriate to note that the joint distribution may reason-
ably apply to a random vector having more than two dimensions.
One of the advantages of the continuous model is that the joint distribu-
tion function is often (not always) differentiable. In those cases in which the
joint distribution can be differentiated, the joint probability density function
is obtained as
84 Probability Foundations for Engineers
∂2
fXY ( x , y ) = FXY ( x , y ) (5.14)
∂x∂y
Thus, it is often possible to move between the distribution and density
functions as necessary. As noted in the case of the discrete joint distribution,
one should be cautious with difference computations. For the continuous
random vectors
b d
Pr[ a ≤ X ≤ b , c ≤ Y ≤ d] =
∫∫
a c
fX ,Y (u, v)dvdu (5.15)
− Pr[X ≤ a, c ≤ Y ≤ d] − Pr[ a ≤ X ≤ b , Y ≤ c]
Example 5.8
Suppose the following joint density function has been defined to model
the response of a new material to a magnetic field:
f XY ( x , y ) = 2 e − x −y , 0 ≤ x ≤ y, 0 ≤ y < ∞
x y x y x
∫∫ ∫e ∫ ∫ e (−e )
y
FXY ( x , y ) = 2 e − u− v dvdu = 2 −u
e − v dvdu = 2 −u −v
du
0 u 0 u 0 u
∫ ( ) ∫( )
x x
1
=2 e − u e − u − e − y du = 2 e −2 u − e − y − u du = 2 − e −2 u + e − y − u
0 0 2 0
= 1 − 2 e − y − e −2 x + 2 e − x − y
∂2 ∂ ∂
f XY ( x , y ) =
∂x∂y
FXY ( x , y ) =
∂y ∂x
(
1 − 2 e − y − e −2 x + 2ee − x − y
)
∂
=
∂y
( )
2 e −2 x − 2 e − x − y = 2 e − x − y
Joint, Marginal, and Conditional Distributions 85
Example 5.9
The diameter of an automotive side rail spot weld, X, and its compressive
strength, Y, have been modeled using a bivariate version of a uniform
density function as
1
f XY ( x , y ) = , 0.44 ≤ x ≤ 0.64 cm, 1000 ≤ y < 2000 N
200
( x − 0.44)( y − 1000)
FXY ( x , y ) =
200
so
0.64 2000
Pr[0.56 ≤ X ≤ 0.64, 1600 ≤ Y ≤ 2000] =
∫ ∫
0.56 1600
f X ,Y (u, v) dv du = 0.16
For this last computation, one might consider the probability to be the joint
distribution equivalent of the univariate survivor function. That is, we might
label this probability as
xmax ymax
FX ,Y ( x , y ) = Pr[ x ≤ X ≤ xmax , y ≤ Y ≤ y max ] =
∫ ∫
x y
fX ,Y (u, v) dv du
This convention is not universally agreed upon but is used in this text.
x ∞
FX ( x) = FX ,Y ( x , ∞) =
∫ ∫
−∞ −∞
fX ,Y (u, v) dv du (5.16)
and
86 Probability Foundations for Engineers
∞ y
FY ( y ) = FX ,Y (∞ , y ) =
∫ ∫ −∞ −∞
fX ,Y (u, v) dv du (5.17)
∞
d
f X ( x) =
dx
FX ( x) =
∫ −∞
fXY ( x , v) dv (5.18)
and
∞
d
fY ( y ) =
dy
FY ( y ) =
∫ −∞
fXY (u, y ) du (5.19)
Example 5.10
For the joint distribution function of Example 5.8, the marginal distribu-
tions may be constructed as
FX ( x) = FXY ( x , ∞) = 1 − e −2 x
FY (y ) = FXY ( y , y ) = 1 − 2e − y + e −2 y
d
f X ( x) = FX ( x) = 2 e −2 x
dx
d
fY ( y ) = FY ( y ) = 2 e − y − 2 e −2 y
dy
∞ ∞
∫ ∫ ( )
∞
f X ( x) = f XY ( x , y ) dy = 2 e − x − y dy = 2 − e − x − y = 2 e −2 x
x
x x
y y
∫ ∫ ( )
y
fY ( y ) = f XY ( x , y ) dx = 2 e − x − y dx = 2 − e − x − y = 2 e − y − 2 e −2 y
0
0 0
Example 5.11
For the joint distribution function of Example 5.8 having the marginal
distributions and densities constructed in Example 5.10, the conditional
density functions may be constructed as
f X ,Y ( x , y ) 2e − x − y e−x
f X|Y ( x|y) = = −y −2 y =
fY ( y ) 2e − 2e 1 − e−y
and
f X ,Y ( x , y ) 2 e − x − y
fY|X ( y|x) = = = e −( y − x)
f X ( x) 2 e −2 x
x
1 x
1 − e−x
FX|Y ( x|y ) =
∫
0
f X|Y (u|y ) du =
1 − e−y ∫ 0
e − u du =
1 − e−y
and
y y
FY|X ( y|x) =
∫ x
fY|X ( v|x) dv =
∫ x
e −( v − x ) dv = 1 − e −( y − x )
There are three important points that are illustrated by Example 5.11. The
first of these points is that a conditional probability function is really a func-
tion of the stated condition. For example, fX|Y ( x|y ) really is a function of Y.
If a value of 5 is specified for Y, the function is distinctly different than if the
specified value is 3.
The second of the important points is that the conditional density functions
are proper univariate density functions and the conditional distribution func-
tions are proper univariate distribution functions. These functions conform
to the descriptions provided for univariate probability functions in Chapter 4.
88 Probability Foundations for Engineers
The third important point is that the definitions of the conditional den-
sities in Example 5.11 are incomplete as the range of the random variables
should have been specified and was not given. Unless the range of the ran-
dom variable is very obvious, it should be stated. In the case of Example
5.11, it should have been noted that f X|Y (x|y) applies for 0 ≤ x ≤ y and f Y|X (y|x)
applies for x ≤ y < ∞.
The use of two-dimensional random vectors has been useful in illustrat-
ing the construction of marginal and conditional probability functions.
However, higher dimension random vectors are also possible and for such
vectors our definitions can be extended, but this should be demonstrated.
Consider an example:
Example 5.12
For the three-dimensional joint density function
f X ,Y , Z ( x , y , z) = 2( x + y )z, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1
1 1
f X ,Y ( x , y ) =
∫ 0
f X ,Y , Z ( x , y , z) dz =
∫ 2(x + y)z dz = (x + y)
0
1 1
f X , Z ( x , z) =
∫ 0
f X ,Y , Z ( x , y , z) dy =
∫ 2(x + y)z dy = (2x + 1)z
0
and
1 1
fY , Z ( y , z) =
∫ 0
f X ,Y , Z ( x , y , z) dx =
∫ 2(x + y)z dx = (2y + 1)z
0
1 1 1 1 1
1
f X ( x) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dz dy =
∫∫0 0
2( x + y )z dz dy =
∫ (x + y) dy = x + 2
0
1 1 1 1 1
1
fY ( y ) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dz dx =
∫ ∫ 2(x + y)z dz dx = ∫ (x + y) dx = y + 2
0 0 0
and
1 1 1 1 1
f Z ( z) =
∫∫
0 0
f X ,Y , Z ( x , y , z) dx dy =
∫∫0 0
2( x + y )z dx =
∫ (2y + 1)z dy = 2z
0
x y x y
FX ,Y ( x , y ) =
∫∫ 0 0
f X ,Y ( x , y )dydx =
∫∫ 0 0
( x + y )dydx
x y2 x 2 y + xy 2
=
∫ 0
xy + 2 dx = 2
x z x z
FX , Z ( x , z) =
∫∫ 0 0
f X , Z ( x , z) dz dx =
∫ ∫ (2x + 1)z dz dx
0 0
x
z2 ( x 2 + x)z 2
=
∫ 0
(2 x + 1)
2
dx =
2
y z y z
FY , Z ( y , z) =
∫∫ 0 0
fY , Z ( y , z) dz dy =
∫ ∫ (2y + 1)z dz dy
0 0
y
z2
( y + y )z 2 2
=
∫ 0
(2 y + 1)
2
dy =
2
x x
1 x2 + x
FX ( x) =
∫ 0
f X ( x)dx =
∫ 0
x +
2
dx =
2
y y
1 y2 + y
FY ( y ) =
∫ 0
fY ( y )dy =
∫0
y + dy =
2 2
z z
FZ ( z) =
∫ 0
f Z ( x) dz =
∫ 0
2 z dz = z 2
and finally
x y z
FX ,Y ,Z ( x , y , z) =
∫∫∫
0 0 0
f X ,Y ,Z ( x , y , z)dzdydx
x y z x 2 y + xy 2 2
=
∫∫∫
0 0 0
2( x + y )zdzdydx =
2 z
Example 5.13
For the three-dimensional joint density function of Example 5.12, there
are many conditional probability functions that can be constructed.
Here is a partial list:
90 Probability Foundations for Engineers
f X ,Y , Z ( x, y , z) 2( x + y )z 4( x + y )z
f X , Z|Y ( x , z|y) = = =
fY ( y ) 1 2y + 1
y+
2
f X ,Y , Z ( x , y , z) 2( x + y )z 2( x + y )
f X|Y , Z ( x|y , z) = = =
fY , Z ( y , z) (2 y + 1)z (2 y + 1)
f X ,Y ( x , y) x + y 2( x + y )
fY|X ( y|x) = = =
f X ( x) 1 2x + 1
x+
2
( x 2 + 2 xy)z 2
FX , Z|Y ( x , z|y ) = , so FX , Z|Y ( x = 0.4, z = 0.6|y = 0.25) = 0.086
2y + 1
x 2 + 2 xy
FX|Y , Z ( x|y , z) = , so FX|Y , Z ( x = 0.4|y = 0.25, z = 0.6) = 0.240
(2 y + 1)
y 2 + 2 xy
FY|X ( y|x) = , so FY|X ( y = 0.6|x = 0.4) = 0.467
2x + 1
discussed next, the identities of the marginal probability functions are not
sufficient to identify the joint functions.
5.4 Independence
The concept of independence of random variables follows directly from
that of independence of events. The algebraic test is also the same. The
concept of independence is that knowledge of the occurrence of a random
event (or the random variable that is its image) does not restrict the chance
of occurrence of another event. In Chapter 3, we expressed this idea in
Equation (3.6) as
Pr[A ∩ B] = Pr[ A ]Pr[ B ]
and we used this condition to obtain the equivalent statement that for inde-
pendent events
Pr[A | B] = Pr[ A ]
FX ,Y ( x , y ) = FX ( x)FY ( y ) (5.20)
fX ,Y ( x , y ) = fX ( x) fY ( y ) (5.21)
The reason each of these conditions is sufficient is that they are equivalent.
Furthermore, the conditions apply equally to discrete and continuous ran-
dom variables. This is because the concept of independence applies equally
to discrete and continuous random variables.
Note that it is also possible for subsets of the constituents of a random
vector to be independent. For a random vector X = (X 1 , X 2 , X 3 , X 4 , X 5 ) it is
conceivable that X1 and X2 could be independent of X3, X4, and X5, while at
the same time X1 and X2 are dependent, and the set X3, X4, and X5 are also
dependent. In that case, we would find that
and
Example 5.14
For the distinguishable dice having the probabilities enumerated in
Table 5.1, X and Y are independent. We can see that for each of the table
entries FX,Y (x,y) = FX (x) FY (y). For example,
1 1 2
FX ,Y (3, 4) = = FX ( x)FY ( y ) =
3 2 3
and
2 2 1
FX ,Y ( 4, 2) = = FX ( x)FY ( y ) =
9 3 3
f X ,Y (3, 4) 1 36 1
f X|Y (3|4) = = =
fY (4) 1
6 6
Example 5.15
For the discrete joint density function in Example 5.5, we found that
e −6 6 y2 − y1
fY2|Y1 ( y 2 |y1 ) =
( y 2 − y1 )!
e −17 17 y2
fY2 ( y 2 ) =
y2 !
Joint, Marginal, and Conditional Distributions 93
Example 5.16
For the joint density function in Example 5.8, we found in Example 5.10
that
f X ( x) = 2 e −2 x and fY ( y ) = 2 e − y − 2 e −2 y
2 e −2 x (2 e − y − 2 e −2 y ) = 4(e −2 x − y − e −2( x + y ) ) ≠ 2 e − x − y = f XY ( x , y )
Example 5.17
For the three-dimensional joint density function of Example 5.12, the
random variables X and Y are dependent, but the pair is independent of
Z, and each is individually independent of Z. To see these relationships,
observe that
1 1
f X ( x ) f Y ( y ) = x + y + ≠ ( x + y ) = f X ,Y ( x , y )
2 2
f X ,Y , Z ( x , y , z) = 2( x + y )z = f X ,Y ( x , y ) f Z ( z) = ( x + y )(2 z)
1
f X ,Z ( x , z) = (2 x + 1)z = x + (2 z) = f X ( x) fZ ( z)
2
While Equation (5.21) is used in Example 5.17, note that Equations (5.20),
(5.22), and (5.23) yield the same conclusions.
The concept of independence is particularly important for two reasons.
As should be apparent from the construction, using multiplicative computa-
tions for independent random variables can simplify some calculations. The
second and less obvious reason is that dependence implies a reduction in
the size of the portion of the sample space that must be considered. For some
modeling and calculation situations, the use of conditioning can greatly sim-
plify the analysis of a phenomenon.
2
2
−
1 x −µ x − 2 ρ x − µ x y − µ y + y − µ y
1 2 ( 1− ρ2 ) σ x σx σy σy
fX ,Y ( x , y ) = e
(5.24)
2 πσ x σ y 1 − ρ2
5
10
15
20
0.06
0.04
0.02
10 15 0.00
20 25 30
FIGURE 5.1
Bivariate normal density with (µ x = 12, µ y = 18, σx = 1.5, σy = 2.2, ρ = 0.6).
Joint, Marginal, and Conditional Distributions 95
Example 5.18
For the bivariate normal distribution shown in Figure 5.1, the use of a
standard mathematical software package yields
FX,Y(10.4, 14.8) = 0.039
FX,Y(12.6, 16.5) = 0.227
FX,Y(13.8, 21.3) = 0.850
and
f X ( x) =
∫−∞
f X ,Y ( x , y ) dy
Starting with the form of the density given in Equation (5.24), we can sub-
stitute for y as
y − µy
v=
σy
so
dy
dv =
σy
1 x − µ x 2 x − µx 1 x − µx 2 1 x − µx
2
− 2ρv + 2
v = + 2(1 − ρ2 ) v − ρ σ
2(1 − ρ2 ) σ x σx 2 σ x x
Since x is not a variable of integration, we can factor out the first term of
the integrand to obtain
2 2
1 x−µ x 1 x−µ x
∞ − v − ρ σ
1 − 2 1
f X ( x) =
2 πσ x
e σ
x
∫−∞ 1 − ρ2
e 2 ( 1− ρ2 ) x
dv
1 x − µx
u= v − ρ σ
2
1− ρ x
implies that
dv
du =
1 − ρ2
∞ u2
1 −
∫−∞ 2π
e 2 du = 1
Therefore
2
1 x−µ x
1 −
2 σ x
f X ( x) = e (5.25)
σ x 2π
which we recognize as the normal density. The same analysis for the mar-
ginal on Y yields
2
1 y −µ y
−
1 2 σy
fY ( y ) = e (5.26)
σ y 2π
Joint, Marginal, and Conditional Distributions 97
2
2
−
1 x −µ x − 2 ρ x − µ x y − µ y + y −µ y
1 2 ( 1− ρ2 ) σ x
σx σy σy
e
f (x, y) 2 πσ x σ y 1 − ρ2
fY|X ( y|x) = X ,Y = 2
f X ( x) 1
1 x−µ x
−
2 σ x
e
σ x 2π
2
1 σy
1 − y −µ y −ρ ( x−µ x )
2 σ 2y ( 1− ρ2 ) σx
= e
σ y 2 π(1 − ρ2 )
2
σx
y −µ x −ρ ( y −µ y )
1
− 2
f (x, y) 1 2 σy
fX|Y ( x|y ) = X ,Y = e 2 σ x (1−ρ )
fY ( y ) 2
σ x 2 π(1 − ρ )
σ
which is a univariate normal density having µ x|y = µ x + ρ x y − µ y
σy
( ) and
σ 2x|y = σ 2x (1 − ρ2 ).
Having seen the bivariate model, we can advance to normal distributions
on random vectors of higher dimensionality. In order to do this, note that the
exponent in Equation (5.24) is actually a quadratic form. That is
1 x − µ 2 x − µx y − µy y − µy
2
x
− 2ρ +
2(1 − ρ2 ) σ x σx σy σ y
x − µx
= ( x − µ x , y − µ y )M −1
y − µ y
1 ρ
σ 2x (1 − ρ2 ) 2
σ x σ y (1 − ρ )
M −1 =
ρ 1
σ x σ y (1 − ρ2 ) σ 2y (1 − ρ2 )
98 Probability Foundations for Engineers
For this case, the matrix M–1 is obtained from the covariance matrix
σ 2x ρσ x σ y
M=
ρσ x σ y σ 2y
and as in the univariate case, the variance terms appear in the denominators.
To extend this form to the multivariate case, represent the random vector as
X = (X 1 , X 2 , … , X r ) for which the mean vector is µ = (µ 1 , µ 2 , … , µ r ) and the
covariance matrix, M, is symmetric and positive definite. The elements of the
covariance matrix are σ 2Xi , X j , which is the covariance of the two variables.
Then, the general form for the r-variate normal density is
1
M −1 2 1
− ( X − µ) M −1 ( X − µ)T
fX ( x) = r e 2 (5.27)
(2 π) 2
(X − µ ) = (X 1 − µ 1 , … , X s − µ s , X s+ 1 − µ s+ 1 , … , X r − µ r )
S R
Ms =
RT T
where the submatrix S has dimension s × s and the submatrix T has dimen-
sion (r – s) × (r –s). Then, the marginal density on the s-variate random vector
(X s ) = (X 1 , X 2 , … , X s ) is
Joint, Marginal, and Conditional Distributions 99
1
S−1 2 1
− ( X s − µ s )S−1 ( X s −µ s )T
f X 1 , X 2 , … , X s ( xs ) = s e 2
(5.28)
(2 π ) 2
Qs U1
Q = Ms−1 = T
U 1 V
With this definition, the inverse of the covariance matrix for the condi-
tional density is
Qs − U 1V −1U 1T
To see that this is the form of the matrix inverse, consider the multiplica-
tion of Q and Ms. Since
I s× s 0 S R Qs U1
Ms−1Q = I = =
0 I r − s× r − s RT T U 1T V
SQs + RU 1T SU 1 + RV
= T
R Qs + TU 1 RTU 1 + TV
T
SQs + RU 1T = I s × s
and
SU 1 + RV = 0
so
R = −SU 1V −1
100 Probability Foundations for Engineers
SQs − SU 1V −1U 1T = I s × s
S−1 = Qs − U 1V −1U 1T
1
Qss − U 1V −1U 1T 2 1
− ( X s − µ s )(Qss −U1V −1U1T )( X s − µ s )T
fX1 ,X2 , … ,X s|X s+1 , … ,Xr ( xs |xr − s ) = s e 2
(5.29)
(2 π ) 2
Example 5.19
Suppose we have a three-dimensional random vector (X ) = (X 1 , X 2 , X 3 )
for which the mean vector is (15, 9, 12) and the covariance matrix is
3 0 2
M = 0 2 1
2 1 2
3 2 0
S R
Ms = 2 2 1 = T
R T
0 1 2
with
3 2 1 −1
S= and S−1 =
−1 3
2 2
2
Joint, Marginal, and Conditional Distributions 101
1 1 3
(0.5) 2 − 2 ( x1 −µ1 )2 − 2( x1 −µ1 )( x3 −µ3 )+ 2 ( x3 −µ3 )2
f X1 ,X3 ( x1 , x3 ) = e
(2 π )
1 3
0.707 − 2 ( x1 −15)2 − 2( x1 −15)( x3 −12 )+ 2 ( x3 −12 )2
= e
(2 π )
1
1 − 4 ( x2 − 10)2
f X 2 ( x2 ) = e
4π
3 −4 2
Qss U1
Q = Ms−1 = T = −4 6 −3
U 1 V
2 −3 2
3 −4
Qs =
−4 6
3 −4 2
Qs − U 1V −1U 1T = − ( 1 ) ( 2 , −3 )
−4 6 −3 2
3 −4 2 −3 1 −1
= − 9 = −1 3
−4 6 −3
2 2
and
15 0 15
µTs = − ( 1 2)(−1) =
12 1 12.5
so
1 3
0.707 − 2 ( x1 −15)2 − 2( x1 −15)( x3 −12.5)+ 2 ( x3 −12.5)2
f X1 ,X3|X2 ( x1 , x3 |x2 = 8) = e
(2 π )
102 Probability Foundations for Engineers
Exercises
5.1 Two fair six-sided dice are rolled. Let X represent the sum of the
numbers observed on the two dice and let Y represent the mag-
nitude of the difference between the two numbers. Construct the
joint probability mass function f X,Y(x,y).
5.2 The joint pmf for the discrete random variables M and N is
e −7 4m 3n− m
f MN (m, n) = , 0 ≤ m ≤ n, 0 ≤ n < ∞
m !(n − m)!
Compute Pr[1 ≤ M ≤ 6, 4 ≤ N ≤ 6].
5.3 The joint density on the random variables X and Y is
fXY ( x , y ) = 2 e − x− y , 0 ≤ x ≤ y , 0 ≤ y < ∞
Compute Pr[1 ≤ X ≤ 6, 4 ≤ Y ≤ 10].
5.4 The joint probability density function for the random variables X
and Y is
x
fXY ( x , y ) = + cy , 0 ≤ x ≤ 1, 1 ≤ y ≤ 5
5
6 2 xy
fXY ( x , y ) = ( x + ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
7 2
Compute Pr[X ≤ 0.6, Y ≤ 0.8].
5.6 Let X be selected at random from the set {1,2,3,4,5} and let Y be
selected from the set {1, 2, …, X}. Identify the joint probability mass
function of (X, Y), compute the conditional pmf on Y given X = 4,
and the conditional pmf on X given Y = 3.
Joint, Marginal, and Conditional Distributions 103
5.7 For the joint pmf constructed in Exercise 5.1, identify the marginal
pmf on Y.
5.8 The joint pmf for the discrete random variables M and N is
e −7 4m3n− m
f MN (m, n) = , 0 ≤ m ≤ n, 0 ≤ n < ∞
m !(n − m)!
fXY ( x , y ) = ( x + y ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
1
fX ,Y ( x , y ) = , 0 < y < x, 0 < x < 1
x
e − x/y e − y
f ( x, y ) = ; 0 < x < ∞, 0 < y < ∞
y
0.02 xe −0.01y
fXY ( x , y ) = , 0 ≤ x ≤ y, y > 0
y2
6 2 xy
fXY ( x , y ) = ( x + ), 0 ≤ x ≤ 1, 0 ≤ y ≤ 2
7 2
Compute Pr[0.75 < Y < 1.6 | X = 0.25].
104 Probability Foundations for Engineers
fXY ( x , y ) = 2 e − x − y , 0 ≤ x ≤ y, 0 ≤ y < ∞
Compute Pr[3.4 ≤ Y ≤ 6.2 | X = 2].
5.16 The joint probability density function for the random vari-
ables X and Y is f XY(xy) = xe–x(y + 1), 0 < y < ∞, 0 < x < ∞. Compute
Pr[Y ≥ 1.8 | Y > 1.2, X = 1.5].
5.17 For the joint density function given in Exercise 5.5, are the variables
X and Y independent?
5.18 For the joint density function given in Exercise 5.9, are X and Y
independent?
5.19 Show that f X|Y(x|y) = f X(x) implies independence of X and Y.
5.20 Consider the three-dimensional normal density specified in
Example 5.19. Determine the joint marginal density on X 2 = (X 1 , X 2 )
and indicate why these two variables are independent.
5.21 For a bivariate normal density, suppose the quadratic form is
1
3
(
6( x + 1)2 − 2 xy + 2 y − 4x − 4 + ( y − 2)2 )
2 0 1 0
0 5 0 −3
M=
1 0 1 0
0 −3 2
0
Identify the marginal joint density on the vector (X1 , X2) and the
conditional joint density (X1 , X4 | X2 , X3).
6
Expectation and Functions
of Random Variables
Once a random variable has been defined, there can be many reasons
for defining functions of that variable. One reasonable example would
be a conversion of a temperature measurement from a Celsius scale to a
Fahrenheit scale. Another would be the conversion from sales volume to
revenue. In general, since a random variable is a function for which the
range is the real line, the construction of another function—a function of
the random variable—should be a reasonable thing to do and should con-
form to usual algebraic behaviors. The corresponding distribution func-
tion for the functional variable can be constructed at the same time. This
is an important extension of the probability concepts treated in this text,
so it is included in this chapter. However, the analysis of general functions
of random variables and random vectors is placed later in the chapter
because there is a particular function—called expectation—that is central
to many probability analyses.
6.1 Expectation
The expectation of a random variable, X, is defined as a weighted sum of
the possible values that X may take. The weighting is the corresponding
probability measure. For a discrete random variable, X, the expectation or
expected value of X is denoted by E[X] and is computed as
E[Y ] =
∫ yf (y)dy (6.2)
y
Y
105
106 Probability Foundations for Engineers
Note that in both cases the possible values of the random variable are mul-
tiplied by their corresponding probability measure and that these products
are accumulated over the range of the random variable.
It is important to note that the expected value of a random variable is a
descriptor of the distribution on that variable. In fact, it is the first moment
about the origin of the probability mass function (pmf) or probability den-
sity function (pdf), whichever term applies, and it is often referred to as
the mean of the distribution. The expected value does, in fact, correspond
to the center of gravity of the probability measure. Thus, the expected
value of a random variable—its mean—is an indication of the center of the
distribution.
Keeping in mind that the pdf (or pmf) is a function, we realize that it has
higher-order moments than just the first. The moments jointly characterize
all of the features of the function. In the study of probability, we often con-
sider higher moments of a distribution, particularly the second moment. In
general, the kth moment of a distribution is defined as
E[X k ] = ∑xx
k
Pr[X = x] (6.3)
E[Y k ] =
∫y y
k
fY ( y )dy (6.4)
Note that the variance is also the first moment about the mean. That is, we
could define the variance as
Using the properties of expectation that are discussed later, this expression
can be shown to equal Equation (6.5). First, consider some realizations of the
mean and variance.
All distribution functions have moments and each of the commonly used
distributions that have been discussed in this text has a mean and a variance.
Usually, using Equation (6.3) or Equation (6.4), the determination of the val-
ues for the moments is not too difficult. Consider some examples.
Expectation and Functions of Random Variables 107
Example 6.1
For the binomial distribution, we have
n n
n
E[X ] = ∑ x=0
x p x q n− x =
x ∑ x x !(nn−! x)! p q
x=1
x n− x
n n− 1
= ∑ n!
( x − 1)!(n − x)!
p x q n− x = np ∑
( x −
(n − 1)!
1)!( n − x )!
p x −1 q n− x = np
x=1 x − 1= 0
where the final step depends on the fact that the summation represents
all of the probability for a binomial random variable having range zero
to n – 1 and is thus equal to one. In the case of the variance, we start with
n n
n
E[X 2 ] = ∑
x=0
x 2 p x q n− x =
x ∑x x=1
2 n!
x !(n − x)!
p x q n− x
n n− 1
= ∑
x=1
xn!
( x − 1)!(n − x)!
p x q n− x = np
x − 1= 0
∑
( x − 1 + 1)(n − 1)! x −1 n− x
( x − 1)!(n − x)!
p q
n− 1
= np ∑ (n − 1)!
x −1=1 ( x − 2)!(n − x)!
p x −1q n− x + 1
n− 2
= np (n − 1) p
x−2=0
∑ (n − 2)!
( x − 2)!(n − x)!
p x − 2 q n− x + 1 = np ((n − 1) p + 1)
= np ((n − 1) p + 1) = np ( np + q ) = n2 p 2 + npq
Example 6.2
For the geometric distribution, we have
∞ ∞ ∞
∞
E[ K ] = ∑ kpq
k =1
k −1
=p ∑ kq
k =0
k −1
=p ∑ dqd (q ) = p dqd ∑ q
k =0
k
k= 0
k
d 1 1 p 1
=p =p = =
dq 1 − q (1 − q)2 p 2 p
108 Probability Foundations for Engineers
∞ ∞ ∞ ∞
E[ K 2 ] = ∑
k =1
k 2 pq k −1 = p ∑k =0
( k 2 − k + k )q k −1 = pq ∑
k =0
k( k − 1)q k − 2 + p ∑ kq
k =0
k −1
∞
∞
∑ dqd (q ) + 1p = pq dqd ∑ q + 1p
2 2
= pq 2
k
2
k
k=0 k =0
d2 1 1 2 pq 1 2 p(1 − p) 1 2 − p
= pq + = + = + = 2
dq 2 1 − q p (1 − q)3 p p3 p p
2−p 1 q
Var[ K ] = E[ K 2 ] − E 2 [ K ] = − 2 = 2
p2 p p
Example 6.3
For the exponential distribution, we have
∞ ∞ ∞ ∞
1 1
E[T ] =
∫0
tfT (t)dt =
∫0
λte − λt dt = −te − λt +
∫
0
e − λt dt = −te − λt − e − λt =
λ 0
λ
and
∞ ∞
E[T 2 ] =
∫ 0
λt 2 e − λt dt = −t 2 e − λt + 2
∫ 0
te − λt dt
∞
2t − λt 2
= −t 2 e − λt −
λ
e +
λ ∫ 0
e − λt dt
∞
2t 2 2
= −t 2 e − λt − e − λt − 2 e − λt = 2
λ λ 0 λ
2 1 1
Var[T ] = E[T 2 ] − E 2 [T ] = − =
λ2 λ2 λ2
E[X + Y ] =
∫ ∫ (x + y) f
x y
X ,Y ( x , y )dydx =
∫ ∫ xf
x y
X ,Y ( x , y )dydx +
∫ ∫ yf
x y
X ,Y ( x , y )dydx
=
∫ xf
x
X ,Y ( x , y )dx +
∫ yf
y
X ,Y ( x , y )dydx = E[X ] + E[Y ]
This construction also confirms that the expected value of a sum of random
variables equals the sum of their expected values regardless of whether the
variables are independent. Thus, expectation has the property of being linear.
A further implication of the linearity of the expectation is that for con-
stants, say a and b,
E[ aX + b] = aE[X ] + b (6.7)
E[ aX k + b] = aE[X k ] + b
The assertion that the first moment about the mean equals Equation (6.5) is
based on the linearity of the expectation. That is, starting with
we can perform the squaring operation and distribute the expectation across
the resulting sum. Then
= E[X 2 ] − E 2 [X ]
This construction also illustrates the fact that an expected value, E[X], is a
constant rather than a random variable.
110 Probability Foundations for Engineers
Notice that a constant has no variance. It is constant and does not vary.
The third property of expectation is that it applies to functions of random
variables as well as to the random variables. We will examine functions of
random variables later in this chapter. For now, suppose that we have defined
a function, say g(X), on the random variable X. We can obtain the expected
value of that function by applying the definition of expectation directly. That
is, for a discrete random variable
E[ g(X )] =
∫ g( x) f
x
X ( x)dx (6.10)
Essentially, the expectation of the function is the weighted sum of the values
the function can assume where the weights are the associated probability
measures.
Example 6.4
Suppose we roll a fair six-sided die and receive a payment equal to three
times the number shown by the die. What is our expected gain? For this
experiment, the payout function is
g(x) = 3x
so the expected payout is
1 1 1 1 1 1
E[ g(X )] = (3) + (6) + (9) + (12) + (15) + (18) = 10.5
6 6 6 6 6 6
For the more advanced reader, can you show that Var[g(x)] = 26.25?
E[XY ] =
∫ ∫ xyf
x y
X ,Y ( xy )dydx (6.12)
These definitions conform to the pattern set for the single-dimensional case.
Consider two examples.
Example 6.5
For the discrete bivariate density defined in Example 5.5, we construct
the expected value of the random vector (Y1, Y2) as
∞ y2 ∞ y2 − 1
e −17 11y1 6 y2 − y1 11y1 −16 y2 − y1
E[Y1Y2 ] = ∑∑
y 2 = 0 y1 = 0
y1 y 2
y1 !( y 2 − y1 )!
= 11e −17
y
∑ ∑
2 =1
y2
y1 − 1= 0
(y1 − 1)!(y 2 − y1 )!!
∞ y2 − 1 y1 − 1 y 2 − y1
= 11e −17 ∑ (y 1− 1)! y ∑ (y(y −−11)!)!(11y
y2 = 1
2
2
y1 − 1= 0
2
1 2
6
− y1 )!
∞ y2 − 1
y 2 − 1
= 11e −17 ∑ y2 =1
( y
y2
2 − 1)!
y
∑ y − 1 11
1 − 1= 0
1
y1 − 1 y 2 − y1
6
∞ ∞
y 2 (17 y2 −1 ) − 1 + 1)(17 y2 −1 )
= 11e −17 ∑ y2 =1
( y 2 − 1)!!
= 11e −17
y
∑ (y
2 − 1= 0
2
( y 2 − 1)!
∞ ∞
( y 2 − 1)(17 y2 −1 ) y2 − 1
= 11e −17
y2
∑ − 1= 0
( y 2 − 1)!
+
y
∑ ((y17 − 1)!)
2 − 1= 0
2
∞ ∞
( y 2 − 1)(17 y2 −1 ) (17 y2 −1 )
= 11 ∑
y2 − 1= 0
e −17
( y 2 − 1)!
+ 11e −17
y
∑
2 − 1= 0
( y 2 − 1)!
Example 6.6
For the continuous random vector (X, Y) suppose
f XY ( x , y ) = 2 e − x − y , 0 ≤ x ≤ y, 0 ≤ y < ∞
∞ y ∞ y ∞ y
E[XY ] =
∫ ∫
0 0
xyf XY ( x , y )dxdy = 2
∫ ∫
0 0
xye − x − y dxdy = 2
∫0
ye − y
∫
0
xe − x dxdy
( ) dy = 2 ∫
∞ ∞
∫ ( ye )
y
=2 ye − y − xe − x − e − x −y
− ye −2 y − y 2 e −2 y dy
0 0 0
∞
1 1 1 1 1
= 2 − ye − y − e − y + ye −2 y + e −2 y + y 2 e −2 y + ye −2 y + e −2 y
2 4 2 2 4 0
1 1
= 21− − = 1
4 4
E[X mY n ] = ∑∑ x y
x y
m n
Pr[X = x , Y = y ] (6.13)
E[X mY n ] =
∫∫x y
x y
m n
fX ,Y ( xy )dydx (6.14)
Cov[X , Y ] σ2
ρXY = = XY (6.16)
Var[X ]Var[Y ] σ X σ Y
The correlation, ρXY between X and Y will lie in the interval (–1, 1) and its
sign will be determined by the numerator—the covariance—as the denomi-
nator is positive by definition.
Example 6.7
For the discrete bivariate density defined in Example 5.5, we found in
Example 6.5 that E[Y1Y2] = 198.0 and we found in Example 5.5 that
Cov[Y1 , Y2 ] 11.0
ρY1Y2 = = = 0.804
Var[Y1 ]Var[Y2 ] (11.0)(17.0)
Example 6.8
For the continuous density of Example 6.6, we obtained E[XY] = 1.0, and
in Example 5.10, we found that
so we can compute the expectations E[X] = 0.50 and E[Y] = 1.50 and
Cov[XY ] 0.25
ρXY = = = 0.378
Var[X ]Var[Y ] (0.25)(1.75)
Now that the relationships of covariance and correlation have been defined,
the reader is encouraged to review the discussion of the multivariate normal
114 Probability Foundations for Engineers
distribution in Chapter 5, Section 5.5 of this text. The covariance and corre-
lation terms presented there are exactly the same as those described in this
chapter.
A particularly important feature of covariance is that independent random
variables have a covariance of zero. This is reasonably apparent when we
recognize the fact that independent random variables, say X and Y, have
E[XY] = E[X]E[Y]
1.
Cov[X, Y] = Cov[Y, X]
2.
Cov[X, X] = Var[X]
3.
Cov[aX, bY] = abCov[X, Y]
n
Var[X ] = E
∏ ( X − E[X ]) (6.17)
i=1
i i
Example 6.9
For the discrete bivariate density defined in Example 5.5, we found in
Example 5.7 that
Expectation and Functions of Random Variables 115
e −6 6 y2 − y1
fY2|Y1 ( y 2 |y1 ) =
( y 2 − y1 )!
and
y 2 11 y1 6 y2 − y1
fY1|Y2 ( y1 |y 2 ) =
y1 17 17
y2
y2 y1 y 2 − y1
E[Y1 |Y2 ] = ∑ y y 1711
y1 = 0
1
1
6
17
11
= y2
17
11 6
Var[Y1 |Y2 ] = E[Y12 |Y2 ] − E 2 [Y1 |Y2 ] = y 2
17 17
∞ ∞
e −6 6 y2 − y1 e −6 6 y2 − y1
E[Y2 |Y1 ] = ∑
y 2 = y1
y2
( y 2 − y1 )! y
= ∑
2 − y1 = 0
( y 2 − y1 + y1 )
( y 2 − y1 )!
∞ ∞
e −6 6 y2 − y1 e −6 6 y2 − y1
= ∑
y 2 − y1 = 0
( y 2 − y1 )
( y 2 − y1 )!
+ y1
y
∑
2 − y1 =0
( y 2 − y1 )!
= 6 + y1
Example 6.10
For the continuous density of Example 6.6, we found the conditional
densities in Example 5.12 to be
e −x
f X|Y ( x|y ) =
1 − e −y
and
116 Probability Foundations for Engineers
fY|X ( y|x) = e −( y −x )
y
e−x 1 − e − y − ye − y
E[X |Y ] =
∫ 0
x
1− e −y
dx =
1 − e−y
1 − 2 e − y − y 2 e − y + e −2 y
Var[X|Y ] = E[X 2 |Y ] − E 2 [X|Y ] =
(1 − e − y )2
∞
E[Y|X ] =
∫ x ye −( y − x)
dy = x + 1
∑ ∑ x F (a − 1)
f X ( x)
E[X |X ≥ a] = xfX|X ≥ a ( x|X ≥ a) =
X
x= a x= a
Var[X |X ≥ a] = E[X 2 |X ≥ a] − E 2 [X |X ≥ a]
In the case of a continuous random variable, say T, the same logic yields
τ τ
fT (t)
E[T |T ≤ τ] =
∫ 0
tfT|T ≤τ (t|T ≤ τ)dt =
∫ t F (τ) dt
0 T
and
Var[T |T ≤ τ] = E[T 2 |T ≤ τ] − E 2 [T |T ≤ τ]
Example 6.11
For the Poisson random variable in Example 4.27,
∞
xe −λ λ x ∞
xe −λ λ x
7 −λ
∑ ∑ ∑ xe x !λ
x
1 1
E[X|X ≥ 8] = = −
FX (7) x=8
x! 1 − FX (7) x=0
x! x=0
7 −λ 6 −λ x−1
∑ (ex −λ1)! = 1 − F1 (7) λ − λ ∑ e(x −λ1)!
x
1
= λ −
1 − FX (7) x=1
X
x − 1= 0
e −λ λx e −λ 2 λ x −1
∞ ∞ ∞ ∞
λ x− 2
= ∑ ( x − 1)λ x
+ ∑= λ + λ
1 − FX (7) x−1=7 ( x − 1)! x−1=7 ( x − 1)! 1 − FX (7) x−2=6 ( x − 2)! x−1=7 ( x − 1)! ∑ ∑
e −λ 2 λ x− 2 ∞ λ x −1 λ x −1
∞ 5 6
λ x− 2
= λ ∑ − ∑
+λ
1 − FX (7) x−2=0 ( x − 2)! x−2=0 ( x − 2)!
− ∑
x−1=0 ( x − 1)! x−1=0 ( x − 1)!
∑
λ 2 e −λ e λ λ 2 FX (5) λ e −λ e λ λFX (6) λ 2 (1 − FX (5)) + λ(1 − FX (6))
= − + − =
1 − FX (7) 1 − FX (7) 1 − FX (7) 1 − FX (7) 1 − FX (7)
144(0.978) + 12(0.954)
= = 167.514
0.911
Example 6.12
For the exponential failure distribution of Example 5.15,
24 24
fT (t) 1
E[T |T ≤ 24] =
∫
0
t
FT (24)
dt =
1 − e −(0.025)(24.0) ∫ 0
λte −λt dt
1 1
24 − 24e −0.6 − e −0.6
1 −λt 1 −λt λ λ
= −te − e = = 10.807
1 − e −0.6 λ 1 − e −0.6
0
24 24
fT (t) 1
E[T 2 |T ≤ 24] =
∫ 0
t2
FT (24)
dt =
1 − e −0.60 ∫
0
λt 2 e −λt dt
24
1 2 −λt 2t −λt 2 −λt
= −t e − e − 2 e
1 − e −0.6 λ λ 0
118 Probability Foundations for Engineers
2 48 2
− e −0.6 242 + +
λ2 λ λ2
= = 163.942
1 − e −0.6
E[Y ] = E [ E[Y |X ]] =
∫ E[Y|x] f
x
X ( x)dx (6.18)
or
E[Y ] = E [ E[Y |X ]] = ∑ E[Y|X = x]Pr[X = x] (6.19)
x
Example 6.13
For the joint density

f_{XY}(x, y) = 2 e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞

f_X(x) = 2 e^{-2x}

and

f_{Y|X}(y|x) = e^{-(y-x)}

Using these,

E[Y|X] = \int_x^{\infty} y f_{Y|X}(y|x) dy = \int_x^{\infty} y e^{x-y} dy = e^x \int_x^{\infty} y e^{-y} dy
 = e^x \left[ -y e^{-y} - e^{-y} \right]_x^{\infty} = e^x ( x e^{-x} + e^{-x} ) = x + 1

and

E[Y] = \int_0^{\infty} (x + 1) f_X(x) dx = E[X] + 1 = 3/2

as E[X] = 1/2.
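A brief simulation (not in the original text) illustrates the same iterated-expectation result; it relies on the fact that, for this joint density, X is exponential with rate 2 and, given X = x, Y - x is exponential with rate 1.

import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=0.5, size=500_000)        # X with density 2e^{-2x}
y = x + rng.exponential(scale=1.0, size=500_000)    # given X = x, Y - x is exponential(1)
print(x.mean(), y.mean())                           # approximately 0.5 and 1.5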
Example 6.14
In Example 6.9, we constructed

E[Y_1|Y_2] = \sum_{y_1=0}^{y_2} y_1 \binom{y_2}{y_1} (11/17)^{y_1} (6/17)^{y_2-y_1} = (11/17) y_2

so that, using Equation 6.19, E[Y_1] = E[E[Y_1|Y_2]] = (11/17) E[Y_2] = (11/17)(17) = 11.
These examples illustrate the fact that it can be much easier to use conditioning
to perform probability calculations than to make the calculations directly. The
reason is that the conditional probabilities incorporate information that often
simplifies the computation.
This completes the enumeration of the various aspects of the functions of
random variables that correspond to expectation. It is now time to move on to
general functions of random variables and random vectors. This discussion,
in turn, will be followed by an examination of the special class of functions in
which we sum independent random variables.
Two simple examples of such functions of a random variable, say Y = g(X), are

Y = 1.8X + 32

and

Y = e^{aX}

For a monotone function, probabilities on Y follow directly from those on X. For
the second of these functions, we obtain probabilities on Y as

Pr[Y ≤ y] = Pr[e^{aX} ≤ y] = Pr[aX ≤ \ln y] = Pr[X ≤ (1/a) \ln y]

More generally, when g is monotone with inverse g^{-1}, the density on Y is

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y))   (6.20)
This is to say that we take the absolute value of the derivative of the inverse
function and multiply it by the density function on X evaluated at the value,
which is the inverse of the functional value for which the density is desired.
Consider some examples.
Example 6.15
Let F_X(x) = 1 - e^{-λx} so f_X(x) = λ e^{-λx}, and suppose Y = g(X) = cX. Then

g^{-1}(y) = y/c

and

\frac{d}{dy} g^{-1}(y) = \frac{d}{dy} \frac{y}{c} = \frac{1}{c}

Therefore,

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{λ}{c} e^{-(λ/c) y}

and

F_Y(y) = \int_0^{y} f_Y(y) dy = \int_0^{y} \frac{λ}{c} e^{-(λ/c) y} dy = 1 - e^{-(λ/c) y}

For example, with λ = 0.0001 and c = 15,

F_Y(y = 60000) = 1 - e^{-(0.0001/15)(60000)} = 1 - e^{-0.4} = 0.330
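A quick numerical check of this example (not part of the original text), assuming λ = 0.0001 and c = 15 as implied by the final computation:

from math import exp

lam, c = 0.0001, 15.0                 # rate of X and the scale factor in Y = cX
# Y = cX is exponential with rate lam/c, so
print(1 - exp(-(lam / c) * 60_000))   # F_Y(60000) = 1 - e^{-0.4}, about 0.330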
Example 6.16
Let F_X(x) = 1 - e^{-(x/θ)^β}, so f_X(x) = (β x^{β-1}/θ^β) e^{-(x/θ)^β}, and suppose again that
Y = g(X) = cX. Then the inverse function and its derivative are the same
as in the previous example, so

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{1}{c} \frac{β (y/c)^{β-1}}{θ^β} e^{-(y/(cθ))^β} = \frac{β y^{β-1}}{(cθ)^β} e^{-(y/(cθ))^β}

and

F_Y(y) = \int_0^{y} f_Y(y) dy = 1 - e^{-(y/(cθ))^β}

For example, with β = 1.6, θ = 1500, and c = 3.15,

F_X(x = 800) = 1 - e^{-(800/1500)^{1.6}} = 1 - e^{-0.366} = 0.306

and

F_Y(y = 2520) = 1 - e^{-(2520/((3.15)(1500)))^{1.6}} = 1 - e^{-0.366} = 0.306
Example 6.17
Let F_X(x) = (x - a)/(b - a), a ≤ x ≤ b, so f_X(x) = 1/(b - a), and suppose
Y = g(X) = ln X. Then

g^{-1}(y) = e^y

and

\frac{d}{dy} g^{-1}(y) = e^y

Therefore,

f_Y(y) = \left| \frac{d}{dy} g^{-1}(y) \right| f_X(g^{-1}(y)) = \frac{e^y}{b - a}

and

F_Y(y) = \int_{\ln a}^{y} f_Y(y) dy = \int_{\ln a}^{y} \frac{e^y}{b - a} dy = \frac{e^y - a}{b - a}

For example, with a = 1.0 and b = 5.0,

F_X(x = 2.6) = \frac{2.6 - 1.0}{5.0 - 1.0} = 0.40
Now consider a pair of random variables, X_1 and X_2, and a pair of functions
Y_1 = g_1(X_1, X_2) and Y_2 = g_2(X_1, X_2) having inverse functions X_1 = h_1(Y_1, Y_2)
and X_2 = h_2(Y_1, Y_2). When the functions are so defined, we can obtain the joint
density on the random vector Y using the Jacobian

J(x_1, x_2) = \begin{vmatrix} ∂g_1/∂x_1 & ∂g_1/∂x_2 \\ ∂g_2/∂x_1 & ∂g_2/∂x_2 \end{vmatrix}

as f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1, y_2), h_2(y_1, y_2)) |J(x_1, x_2)|^{-1}.
Example 6.18
Suppose the random variables X_1 and X_2 are independent and gamma
distributed with

f_{X_1}(x_1) = \frac{λ^{α_1} x_1^{α_1-1} e^{-λx_1}}{Γ(α_1)}

and

f_{X_2}(x_2) = \frac{λ^{α_2} x_2^{α_2-1} e^{-λx_2}}{Γ(α_2)}

and that

Y_1 = g_1(x_1, x_2) = X_1 + X_2

and

Y_2 = g_2(x_1, x_2) = X_1/(X_1 + X_2)

Then

J(x_1, x_2) = \begin{vmatrix} 1 & 1 \\ x_2/(x_1 + x_2)^2 & -x_1/(x_1 + x_2)^2 \end{vmatrix} = \frac{-1}{x_1 + x_2}

For the inverse functions, x_2 = y_1 - x_1 and y_2 = x_1/(x_1 + y_1 - x_1) = x_1/y_1,
so x_1 = y_1 y_2 and x_2 = y_1(1 - y_2).
Now, as we know,

f_{X_1,X_2}(x_1, x_2) = \frac{λ^{α_1+α_2} x_1^{α_1-1} x_2^{α_2-1} e^{-λ(x_1+x_2)}}{Γ(α_1)Γ(α_2)}

Therefore,

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(h_1(y_1, y_2), h_2(y_1, y_2)) |J(x_1, x_2)|^{-1}
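Carrying out the substitution indicates that Y_1 behaves as a gamma variable with shape α_1 + α_2 and that Y_2 is beta distributed. The Monte Carlo sketch below is not from the text, and the parameter values chosen are arbitrary illustrations.

import numpy as np

a1, a2, lam = 2.0, 3.0, 1.5                        # illustrative parameters, not from the text
rng = np.random.default_rng(3)
x1 = rng.gamma(shape=a1, scale=1 / lam, size=200_000)
x2 = rng.gamma(shape=a2, scale=1 / lam, size=200_000)
y1, y2 = x1 + x2, x1 / (x1 + x2)

# Substituting x1 = y1*y2 and x2 = y1*(1 - y2) into f_{Y1,Y2} suggests that
# Y1 behaves like a gamma(a1 + a2, lam) variable and Y2 like a beta(a1, a2) variable.
print(y1.mean(), (a1 + a2) / lam)                  # both about 3.33
print(y1.var(), (a1 + a2) / lam**2)                # both about 2.22
print(y2.mean(), a1 / (a1 + a2))                   # both about 0.40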
Example 6.19
Suppose the random variables X_1 and X_2 have the joint density function

f_{X_1,X_2}(x_1, x_2) = \frac{1}{x_1^2 x_2^2}, x_1 ≥ 1, x_2 ≥ 1

and that y_1 = x_1 x_2 and y_2 = x_1/x_2. What is the density on Y? For these
functions,

J(x_1, x_2) = \begin{vmatrix} x_2 & x_1 \\ 1/x_2 & -x_1/x_2^2 \end{vmatrix} = \frac{-2x_1}{x_2}

and the inverse functions are x_1 = \sqrt{y_1 y_2} and x_2 = \sqrt{y_1/y_2}. Therefore,

f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(\sqrt{y_1 y_2}, \sqrt{y_1/y_2}) |J(x_1, x_2)|^{-1}
 = \frac{1}{(y_1 y_2)(y_1/y_2)} \cdot \frac{\sqrt{y_1/y_2}}{2\sqrt{y_1 y_2}} = \frac{1}{2 y_1^2 y_2}
Expectations and variances of linear functions follow directly from the
definitions. For example,

E[aX + b] = aE[X] + b

and, for a linear combination of two random variables,

Var[aX + bY] = a^2 ( E[X^2] - E^2[X] ) + b^2 ( E[Y^2] - E^2[Y] ) + 2ab ( E[XY] - E[X]E[Y] )
 = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X, Y]
It is true that this type of analysis is not applicable to all functions. It usually
applies well to linear functions and many products. When it does not apply,
one must use the general definitions of mean and variance. Note particularly
that when it does apply, the variables in the function need not be independent.
Consider next the sum of two independent discrete random variables, Z = X + Y.
To obtain the distribution on Z, all combinations of values of X and Y that yield
a sum no greater than z must be included, so

F_Z(z) = Pr[Z ≤ z] = Pr[X + Y ≤ z] = \sum_{i=0}^{z} \sum_{j=0}^{i} f_X(i - j) f_Y(j)   (6.22)

This may be written more compactly as

F_Z(z) = \sum_{j=0}^{z} F_X(z - j) f_Y(j)   (6.23)

and

F_Z(z) = \sum_{i=0}^{z} f_X(i) F_Y(z - i)   (6.24)

Similarly, the probability mass function on the sum is

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j)   (6.25)

or equivalently

f_Z(z) = \sum_{i=0}^{z} f_X(i) f_Y(z - i)   (6.26)
Example 6.20
Suppose the two random variables have binomial distributions so that

f_X(x) = \binom{n}{x} p_1^x (1 - p_1)^{n-x}

and

f_Y(y) = \binom{m}{y} p_2^y (1 - p_2)^{m-y}

Then

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j) = \sum_{j=0}^{z} \binom{n}{z-j} p_1^{z-j} (1 - p_1)^{n-z+j} \binom{m}{j} p_2^{j} (1 - p_2)^{m-j}

In the special case in which the event probabilities are the same, so
p_1 = p_2 = p,

f_Z(z) = \sum_{j=0}^{z} \binom{n}{z-j} p^{z-j} (1 - p)^{n-z+j} \binom{m}{j} p^{j} (1 - p)^{m-j}
 = p^z q^{n+m-z} \sum_{j=0}^{z} \binom{n}{z-j} \binom{m}{j} = \binom{n+m}{z} p^z q^{n+m-z}
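The Vandermonde identity used in the last step can be confirmed numerically. The following sketch (not part of the text; n, m, and p are arbitrary) convolves the two binomial pmfs and compares the result with the binomial(n + m, p) pmf.

from math import comb

n, m, p = 5, 7, 0.3                       # arbitrary illustrative values
fx = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
fy = [comb(m, y) * p**y * (1 - p)**(m - y) for y in range(m + 1)]

for z in range(n + m + 1):
    fz = sum(fx[z - j] * fy[j] for j in range(max(0, z - n), min(m, z) + 1))
    direct = comb(n + m, z) * p**z * (1 - p)**(n + m - z)
    assert abs(fz - direct) < 1e-12       # convolution equals binomial(n + m, p)
print("convolution matches the binomial(n + m, p) pmf")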
Example 6.21
Suppose the two random variables have Poisson distributions so that

f_X(x) = \frac{e^{-λ_1} λ_1^x}{x!}

and

f_Y(y) = \frac{e^{-λ_2} λ_2^y}{y!}

Then

f_Z(z) = \sum_{j=0}^{z} f_X(z - j) f_Y(j) = \sum_{j=0}^{z} \frac{e^{-λ_1} λ_1^{z-j}}{(z - j)!} \frac{e^{-λ_2} λ_2^{j}}{j!}
 = \frac{e^{-λ_1-λ_2}}{z!} \sum_{j=0}^{z} \frac{z!}{(z - j)! j!} λ_1^{z-j} λ_2^{j} = \frac{e^{-(λ_1+λ_2)}}{z!} (λ_1 + λ_2)^z
Example 6.22
Suppose the two random variables have geometric distributions so that

f_N(n) = q_1^{n-1} p_1

and

f_M(m) = q_2^{m-1} p_2

For K = N + M,

f_K(k) = \sum_{j=1}^{k-1} f_N(k - j) f_M(j) = \sum_{j=1}^{k-1} q_1^{k-j-1} p_1 q_2^{j-1} p_2 = p_1 p_2 \sum_{j=1}^{k-1} q_1^{k-j-1} q_2^{j-1}

In the special case in which the event probabilities are the same, so
p_1 = p_2 = p,

f_K(k) = p^2 \sum_{j=1}^{k-1} q^{k-2} = (k - 1) p^2 q^{k-2} = \binom{k-1}{1} p^2 q^{k-2}

which is a negative binomial pmf.
These examples illustrate the fact that the probabilities for sums can be computed
directly and that, in some cases, the identity of the distribution family is
preserved. Following is a more specific example.
Example 6.23
Suppose the one-day demand for a particular digital camera has the pmf

d            0      1      2      3
Pr(D = d)    0.1    0.4    0.3    0.2

What are the pmf and the cdf on the two-day demand?
The pmf is

d2             0      1      2      3      4      5      6
Pr(D2 = d2)    0.01   0.08   0.22   0.28   0.25   0.12   0.04

The cdf is

d2             0      1      2      3      4      5      6
Pr(D2 ≤ d2)    0.01   0.09   0.31   0.59   0.84   0.96   1.00
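The two-day pmf and cdf are simply the convolution of the one-day pmf with itself, so they are easy to reproduce; the following sketch is illustrative and not part of the original example.

import numpy as np

p1 = np.array([0.1, 0.4, 0.3, 0.2])       # one-day demand pmf, d = 0, 1, 2, 3
p2 = np.convolve(p1, p1)                  # two-day demand pmf, d2 = 0, ..., 6
print(np.round(p2, 2))                    # [0.01 0.08 0.22 0.28 0.25 0.12 0.04]
print(np.round(np.cumsum(p2), 2))         # [0.01 0.09 0.31 0.59 0.84 0.96 1.  ]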
The rules for the distribution on the sum of two continuous random variables
are similar to those for discrete random variables. Essentially, the sums are
replaced by integrals. The concept is the same: all combinations of the two
variables that yield a sum no greater than z must be included. Therefore,

F_Z(z) = Pr[Z ≤ z] = Pr[X + Y ≤ z] = \int_{-\infty}^{\infty} \int_{-\infty}^{z-y} f_X(x) f_Y(y) dx\, dy   (6.27)

which may be written as

F_Z(z) = \int_{-\infty}^{z} F_X(z - y) f_Y(y) dy   (6.28)

and

F_Z(z) = \int_{-\infty}^{z} f_X(x) F_Y(z - x) dx   (6.29)

The corresponding density functions are

f_Z(z) = \int_{-\infty}^{z} f_X(z - y) f_Y(y) dy   (6.30)

and

f_Z(z) = \int_{-\infty}^{z} f_X(x) f_Y(z - x) dx   (6.31)
Example 6.24
Suppose the two random variables have exponential distributions so that

f_X(x) = λ_1 e^{-λ_1 x}

and

f_Y(y) = λ_2 e^{-λ_2 y}

Then

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} λ_1 e^{-λ_1 (z-y)} λ_2 e^{-λ_2 y} dy
 = λ_1 λ_2 e^{-λ_1 z} \int_0^{z} e^{-(λ_2-λ_1) y} dy = λ_1 λ_2 e^{-λ_1 z} \left[ \frac{1 - e^{-(λ_2-λ_1) z}}{λ_2 - λ_1} \right]
 = \frac{λ_1 λ_2}{λ_2 - λ_1} \left( e^{-λ_1 z} - e^{-λ_2 z} \right)

For the special case in which both distributions have the same value
for the rate parameter, so that λ_1 = λ_2 = λ,

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} λ e^{-λ(z-y)} λ e^{-λ y} dy = λ^2 e^{-λ z} \int_0^{z} dy = λ^2 z e^{-λ z}
Example 6.25
Suppose the two random variables have gamma distributions so that

f_X(x) = \frac{λ_1^{α_1} x^{α_1-1} e^{-λ_1 x}}{Γ(α_1)}

and

f_Y(y) = \frac{λ_2^{α_2} y^{α_2-1} e^{-λ_2 y}}{Γ(α_2)}

Then

f_Z(z) = \int_0^{z} f_X(z - y) f_Y(y) dy = \int_0^{z} \frac{λ_1^{α_1} (z - y)^{α_1-1} e^{-λ_1(z-y)}}{Γ(α_1)} \frac{λ_2^{α_2} y^{α_2-1} e^{-λ_2 y}}{Γ(α_2)} dy
 = \frac{λ_1^{α_1} λ_2^{α_2}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} e^{-λ_1(z-y)-λ_2 y} dy
 = \frac{λ_1^{α_1} λ_2^{α_2} e^{-λ_1 z}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} e^{-(λ_2-λ_1) y} dy

For the special case in which both distributions have the same value
for the rate parameter, so that λ_1 = λ_2 = λ,

f_Z(z) = \frac{λ^{α_1+α_2} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{z} (z - y)^{α_1-1} y^{α_2-1} dy
 = \frac{λ^{α_1+α_2} z^{α_1+α_2-2} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{z} \left( \frac{z - y}{z} \right)^{α_1-1} \left( \frac{y}{z} \right)^{α_2-1} dy

Now, let w = y/z so dw = dy/z and

f_Z(z) = \frac{λ^{α_1+α_2} z^{α_1+α_2-1} e^{-λ z}}{Γ(α_1)Γ(α_2)} \int_0^{1} (1 - w)^{α_1-1} w^{α_2-1} dw

The remaining integral is the beta function, Γ(α_1)Γ(α_2)/Γ(α_1 + α_2), so

f_Z(z) = \frac{λ^{α_1+α_2} z^{α_1+α_2-1} e^{-λ z}}{Γ(α_1 + α_2)}

which is a gamma density with parameters α_1 + α_2 and λ.
Example 6.26
Suppose a machining station at a manufacturing facility takes an expo-
nentially distributed time to process a single piece. The distribution has
parameter λ = 4/hr. Suppose further that the cost of holding in-process
inventory is proportional to the length of time a workpiece waits for pro-
cessing. If a particular workpiece arrives to the inventory and finds four
pieces ahead of it with the first one just entering the machining station,
what is the distribution on the time until processing is started on the
arriving piece?
For this problem, the total waiting time is T_T = T_1 + T_2 + T_3 + T_4. We know
that all of the times have exponential distributions with a common rate
parameter, so following Example 6.24, if X = T_1 + T_2 and Y = T_3 + T_4, then

f_X(x) = \int_0^{x} f_{T_1}(x - t_2) f_{T_2}(t_2) dt_2 = \int_0^{x} λ e^{-λ(x-t_2)} λ e^{-λ t_2} dt_2 = λ^2 e^{-λ x} \int_0^{x} dt_2 = λ^2 x e^{-λ x}

and

f_Y(y) = \int_0^{y} f_{T_3}(y - t_4) f_{T_4}(t_4) dt_4 = \int_0^{y} λ e^{-λ(y-t_4)} λ e^{-λ t_4} dt_4 = λ^2 e^{-λ y} \int_0^{y} dt_4 = λ^2 y e^{-λ y}
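Completing the convolution of X and Y as in Example 6.25 shows that the total waiting time T_T has a gamma (Erlang) distribution with shape 4 and rate λ = 4/hr. A small numerical sketch (not part of the original example) of how that distribution might be evaluated:

from scipy import stats

lam = 4.0                                    # processing rate per hour
wait = stats.gamma(a=4, scale=1 / lam)       # T1 + T2 + T3 + T4: Erlang with shape 4

print(wait.mean())                           # expected wait of 1.0 hour
print(wait.cdf(1.5))                         # probability the wait is at most 1.5 hours, about 0.85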
Exercises
6.1 A discrete random variable, X, has the following probability mass
function:
f_X(x) = 0.08 (x = 0), 0.24 (x = 1), 0.30 (x = 2), 0.20 (x = 3), 0.18 (x = 4)

f_D(d) = 0.08 (d = 0), 0.12 (d = 1), 0.18 (d = 2), 0.24 (d = 3), 0.16 (d = 4), 0.12 (d = 5), 0.10 (d = 6)

f_Y(y) = 0 (y = 0), 0.8c (y = 1), 1.2c (y = 2), c (y = 3), 0.6c (y = 4), 0.4c (y = 5)
the product, what are the mean and variance for the number of defec-
tive units found in the sample? Explain why it is reasonable to have a
noninteger expected value.
6.6 For the bearing inspection process described in Exercise 4.27
of Chapter 4, what are the mean and variance for the number of
inspections performed?
6.7 If a random variable has the probability density defined as
f X(x) = 1.5(1 – x2), 0 ≤ x ≤ 1, compute E[X] and Var[X].
6.8 Compute E[T] and Var[T] for the exponential variable having λ = 0.024.
6.9 A random variable X has E[X] = 2 and Var[X] = 4.2.
a. Compute E[(X + 1)2].
b. Compute Var[2X + 3].
6.10 Suppose a random variable has density function
f_X(x) = ax + bx^2 for 0 < x < 1, and 0 elsewhere
Determine Pr[X > 1.4|Y = 2.5], E[X|Y = 2.5] and Var[X|Y = 2.5].
6.20 For the pmf of Exercise 6.1, determine the values for E[X|X > 1] and
Var[X | X > 1].
6.21 The density function on a random variable X is f_X(x) = x/2, 0 ≤ x ≤ 2.
Compute E[X|X > 0.5].
6.22 The random variables X and Y have joint density f XY(x, y) = e – (x + y),
0 ≤ x ≤ ∞, 0 ≤ y ≤ ∞. Compute Pr[X < Y].
6.23 Suppose X and Y are independent exponential random variables.
Identify the distribution function on Z = X/Y and construct an
expression for Pr[X < Y].
6.24 The joint probability density function for the random variables X
and Y is
f_XY(x, y) = x/5 + cy, 0 ≤ x ≤ 1, 1 ≤ y ≤ 5
Compute Pr[X + Y ≥ 3.5].
6.25 The joint density function on X and Y is f XY(x, y) = (x + y), 0 ≤ x ≤ 1,
0 ≤ y ≤ 1. Compute Pr[X + Y ≤ 0.8].
6.26 Suppose X and Y are independent and identically distributed uni-
form random variables over the range (0, 1) and that we define the
variables U = X + Y and V = X/Y. Determine the joint density on U
and V.
6.27 Suppose X and Y are independent and identically distributed
exponential random variables having parameter λ = 1 and that we
define the variables U = X + Y and V = X/(X + Y). Determine the
joint density on U and V.
6.28 For the random variables analyzed in Exercise 6.13, compute
E[N – M] and Var[N–M].
6.29 For the random variables analyzed in Exercise 6.14, compute
E[X + Y] and Var[X + Y].
M_X(θ) = E[e^{θX}] = \int_x e^{θx} f_X(x) dx   (7.3)

E[X^k] = \frac{d^k}{dθ^k} M_X(θ) \Big|_{θ=0}   (7.4)
so the kth moment is computed as the kth derivative of the mgf evaluated at θ = 0.
This relationship applies to both discrete and continuous random variables.
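Equation 7.4 is easy to apply with a computer algebra system. The sketch below is not part of the text; it differentiates the exponential mgf λ/(λ - θ), which is derived in Example 7.2, to recover the mean and variance.

import sympy as sp

theta, lam = sp.symbols("theta lam", positive=True)

M = lam / (lam - theta)                        # exponential mgf (see Example 7.2)
EX  = sp.diff(M, theta, 1).subs(theta, 0)      # E[X]   = 1/lam
EX2 = sp.diff(M, theta, 2).subs(theta, 0)      # E[X^2] = 2/lam**2
print(sp.simplify(EX), sp.simplify(EX2 - EX**2))   # mean 1/lam, variance 1/lam**2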
Example 7.1
A binomial distribution having parameters n and p has the moment-
generating function
M_X(θ) = \sum_{x=0}^{n} e^{θx} \binom{n}{x} p^x q^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (p e^{θ})^x q^{n-x} = (q + p e^{θ})^n

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = n p e^{θ} (q + p e^{θ})^{n-1} \Big|_{θ=0} = np

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \frac{d}{dθ} \left[ n p e^{θ} (q + p e^{θ})^{n-1} \right] \Big|_{θ=0} = np + n(n - 1)p^2 = np + n^2 p^2 - np^2

so, as we know, Var[X] = E[X^2] - E^2[X] = np - np^2 = npq.
Example 7.2
An exponential distribution having parameter λ has the moment-
generating function

M_X(θ) = E[e^{θX}] = \int_0^{\infty} λ e^{θx} e^{-λx} dx = \int_0^{\infty} λ e^{-(λ-θ)x} dx = \frac{λ}{λ - θ}

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = \frac{λ}{(λ - θ)^2} \Big|_{θ=0} = \frac{1}{λ}

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \frac{2λ}{(λ - θ)^3} \Big|_{θ=0} = \frac{2}{λ^2}

so

Var[X] = E[X^2] - E^2[X] = \frac{2}{λ^2} - \frac{1}{λ^2} = \frac{1}{λ^2}
The moment-generating function can be constructed for empirical distributions
as well as for the standard distribution families. Consider the two-day demand
distribution constructed in Example 6.23 in Chapter 6. For that distribution,
the mgf is

M_X(θ) = \sum_x e^{θx} Pr[X = x] = 0.01 + 0.08e^{θ} + 0.22e^{2θ} + 0.28e^{3θ} + 0.25e^{4θ} + 0.12e^{5θ} + 0.04e^{6θ}

Then, the mean and variance are obtained in the same manner as for other
distributions:

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = \left( 0.08e^{θ} + 0.44e^{2θ} + 0.84e^{3θ} + 1.00e^{4θ} + 0.60e^{5θ} + 0.24e^{6θ} \right) \Big|_{θ=0} = 3.2

E[X^2] = \frac{d}{dθ} \left( 0.08e^{θ} + 0.44e^{2θ} + 0.84e^{3θ} + 1.00e^{4θ} + 0.60e^{5θ} + 0.24e^{6θ} \right) \Big|_{θ=0}
 = \left( 0.08e^{θ} + 0.88e^{2θ} + 2.52e^{3θ} + 4.00e^{4θ} + 3.00e^{5θ} + 1.44e^{6θ} \right) \Big|_{θ=0} = 11.92

so

Var[X] = E[X^2] - E^2[X] = 11.92 - (3.2)^2 = 1.68
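These moments can be confirmed directly from the pmf of Example 6.23; the short check below is not part of the original text.

import numpy as np

d2 = np.arange(7)
p  = np.array([0.01, 0.08, 0.22, 0.28, 0.25, 0.12, 0.04])

mean = (d2 * p).sum()
ex2  = (d2**2 * p).sum()
print(mean, ex2, ex2 - mean**2)   # 3.2, 11.92, 1.68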
Next, the fact that the normal distribution is so widely used makes it
worthwhile to include the construction of its moment-generating function
here. Recall that the density function for the normal distribution is
f_X(x) = \frac{1}{\sqrt{2πσ^2}} e^{-(x-µ)^2/(2σ^2)}

so

M_X(θ) = \int_{-\infty}^{\infty} e^{θx} \frac{1}{\sqrt{2πσ^2}} e^{-(x-µ)^2/(2σ^2)} dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2πσ^2}} e^{θx - (x-µ)^2/(2σ^2)} dx

Completing the square in the exponent gives

θx - \frac{(x - µ)^2}{2σ^2} = - \frac{(x - (µ + θσ^2))^2}{2σ^2} + \frac{2µθ + θ^2σ^2}{2}

so that the remaining integral is that of a normal density and

M_X(θ) = e^{µθ + θ^2σ^2/2}

Then

E[X] = \frac{d}{dθ} M_X(θ) \Big|_{θ=0} = (µ + θσ^2) e^{µθ + θ^2σ^2/2} \Big|_{θ=0} = µ

E[X^2] = \frac{d^2}{dθ^2} M_X(θ) \Big|_{θ=0} = \left[ σ^2 e^{µθ + θ^2σ^2/2} + (µ + θσ^2)^2 e^{µθ + θ^2σ^2/2} \right] \Big|_{θ=0} = σ^2 + µ^2
7.2 Convolutions
There are numerous applications of moment-generating functions, but the
most widely implemented is the determination of the distribution on the sum
of independent random variables. Recall that some of these sums were
constructed directly in Chapter 6. Unfortunately, not all sums of independent
random variables can be analyzed directly as was done there. For example,
suppose X and Y are independent random variables with normal distributions.
The computation rules of Chapter 6 imply that for Z = X + Y,

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y) dy = \int_{-\infty}^{\infty} \frac{e^{-(z-y-µ_X)^2/(2σ_X^2)}}{\sqrt{2πσ_X^2}} \frac{e^{-(y-µ_Y)^2/(2σ_Y^2)}}{\sqrt{2πσ_Y^2}} dy
and evaluating this integral is very difficult. On the other hand, the dis-
tribution on the sum of independent random variables is known to have a
moment-generating function comprised of the product of the moment gener-
ating functions of the variables in the sum. In general, as well as for the case
of two normal variables
M_Z(θ) = e^{µ_Xθ + θ^2σ_X^2/2} e^{µ_Yθ + θ^2σ_Y^2/2} = e^{(µ_X+µ_Y)θ + θ^2(σ_X^2+σ_Y^2)/2} = e^{µ_Zθ + θ^2σ_Z^2/2}

which we recognize as the mgf for a normal random variable, in this case Z,
having µ_Z = µ_X + µ_Y and σ_Z^2 = σ_X^2 + σ_Y^2.
The process of constructing the distribution on the sum of independent
random variables is called taking the convolution of the variables. Thus, we
would say that Z is the convolution of X and Y.
As indicated in Chapter 6, sums of more than two random variables can
be accumulated pairwise. However, using the moment-generating func-
tions, the distribution on the sum of several independent random variables
can be directly identified. For the set of independent random variables, say
X1, X2, …, Xn, the random variable Y = X1 + X2 + … + Xn has a distribution for
which the mgf is
M_Y(θ) = \prod_{i=1}^{n} M_{X_i}(θ)   (7.6)
Because a moment-generating function uniquely identifies a distribution,
recognizing the form of an mgf identifies the distribution of the corresponding
random variable. For example, if we find a random variable Z with the
moment-generating function

M_Z(θ) = e^{µ_Zθ + θ^2σ_Z^2/2}

it must be the case that the random variable Z has a normal distribution with
the indicated parameters. Similarly, if we find a random variable with the
moment-generating function

M_X(θ) = (q + p e^{θ})^n
then it must be the case that the random variable X has a binomial distribu-
tion. Similar statements apply to each of the standard distributions described
in this text, so it is only in the analysis of other probability distributions
that we must actually perform the inversion operation on the mgf. For those
For a random vector, the joint moment-generating function is defined as

M_{X_1,X_2,...,X_n}(θ_1, θ_2, ..., θ_n) = E[e^{θ_1X_1 + θ_2X_2 + \cdots + θ_nX_n}]
 = \int_{x_1} \int_{x_2} \cdots \int_{x_n} e^{θ_1x_1 + θ_2x_2 + \cdots + θ_nx_n} f_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) dx_n \cdots dx_1   (7.7)

Joint moments are obtained by partial differentiation. For two random variables,

\frac{∂^n}{∂θ_2^m ∂θ_1^{n-m}} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=0, θ_2=0} = E[X^{n-m} Y^m]   (7.8)

and in general,

E[X_1^{r_1} X_2^{r_2} \cdots X_n^{s-r_1-r_2-\cdots-r_{n-1}}] = \frac{d^s}{dθ_1^{r_1} dθ_2^{r_2} \cdots dθ_n^{s-r_1-r_2-\cdots-r_{n-1}}} M_{X_1,X_2,...,X_n}(θ_1, θ_2, ..., θ_n) \Big|_{θ=0}   (7.9)
Example 7.3
Suppose the following empirical pmf describes a process of interest to us:

f_{X,Y}(1,1) = 1/9    f_{X,Y}(1,2) = 1/6    f_{X,Y}(1,3) = 1/18
f_{X,Y}(2,1) = 1/18   f_{X,Y}(2,2) = 1/9    f_{X,Y}(2,3) = 1/9
f_{X,Y}(3,1) = 1/9    f_{X,Y}(3,2) = 1/9    f_{X,Y}(3,3) = 1/6
The joint moment-generating function is

M_{X,Y}(θ_1, θ_2) = \frac{1}{9}e^{θ_1+θ_2} + \frac{1}{6}e^{θ_1+2θ_2} + \frac{1}{18}e^{θ_1+3θ_2} + \frac{1}{18}e^{2θ_1+θ_2} + \frac{1}{9}e^{2θ_1+2θ_2} + \frac{1}{9}e^{2θ_1+3θ_2} + \frac{1}{9}e^{3θ_1+θ_2} + \frac{1}{9}e^{3θ_1+2θ_2} + \frac{1}{6}e^{3θ_1+3θ_2}

so

E[XY] = \frac{∂^2}{∂θ_2 ∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0}
 = \left( \frac{1}{9}e^{θ_1+θ_2} + \frac{2}{6}e^{θ_1+2θ_2} + \frac{3}{18}e^{θ_1+3θ_2} + \frac{2}{18}e^{2θ_1+θ_2} + \frac{4}{9}e^{2θ_1+2θ_2} + \frac{6}{9}e^{2θ_1+3θ_2} + \frac{3}{9}e^{3θ_1+θ_2} + \frac{6}{9}e^{3θ_1+2θ_2} + \frac{9}{6}e^{3θ_1+3θ_2} \right) \Big|_{θ_1=θ_2=0}
 = \frac{1}{9} + \frac{2}{6} + \frac{3}{18} + \frac{2}{18} + \frac{4}{9} + \frac{6}{9} + \frac{3}{9} + \frac{6}{9} + \frac{9}{6} = \frac{78}{18} = \frac{13}{3}
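The same result follows from direct enumeration of E[XY] over the nine support points; the following check is not part of the original example.

from fractions import Fraction as F

pmf = {(1, 1): F(1, 9),  (1, 2): F(1, 6),  (1, 3): F(1, 18),
       (2, 1): F(1, 18), (2, 2): F(1, 9),  (2, 3): F(1, 9),
       (3, 1): F(1, 9),  (3, 2): F(1, 9),  (3, 3): F(1, 6)}

exy = sum(x * y * p for (x, y), p in pmf.items())
print(exy)   # 13/3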
Example 7.4
For the joint density f_{XY}(x, y) = 2e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞, the joint
moment-generating function is

M_{X,Y}(θ_1, θ_2) = E[e^{θ_1X + θ_2Y}] = 2 \int_0^{\infty} \int_0^{y} e^{θ_1x + θ_2y} e^{-x-y} dx\, dy = 2 \int_0^{\infty} e^{-(1-θ_2)y} \int_0^{y} e^{-(1-θ_1)x} dx\, dy
 = 2 \int_0^{\infty} e^{-(1-θ_2)y} \frac{1 - e^{-(1-θ_1)y}}{1 - θ_1} dy = \frac{2}{1 - θ_1} \int_0^{\infty} \left( e^{-(1-θ_2)y} - e^{-(2-θ_1-θ_2)y} \right) dy
 = \frac{2}{1 - θ_1} \left( \frac{1}{1 - θ_2} - \frac{1}{2 - θ_1 - θ_2} \right) = \frac{2}{1 - θ_1} \cdot \frac{(1 - θ_1) + (1 - θ_2) - (1 - θ_2)}{(1 - θ_2)((1 - θ_1) + (1 - θ_2))}
 = \frac{2}{(1 - θ_2)((1 - θ_1) + (1 - θ_2))} = \frac{2}{(1 - θ_2)(1 - θ_1) + (1 - θ_2)^2}
Here again,

E[XY] = \frac{∂^2}{∂θ_2 ∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{∂}{∂θ_2} \frac{2(1 - θ_2)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0}
 = \frac{-2\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right) + 4(1 - θ_2)\left( 2(1 - θ_2) + (1 - θ_1) \right)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^3} \Big|_{θ_1=θ_2=0}
 = \frac{-2(1 + 1) + 4(1)(2 + 1)}{(1 + 1)^3} = \frac{-4 + 12}{8} = 1
Just as marginal distributions may be obtained from a joint distribution, for
example F_{X_1}(x_1) = F_{X_1,X_2}(x_1, ∞), marginal moment-generating functions are
obtained from a joint mgf by setting the other arguments to zero.
Example 7.5
For the empirical distribution presented in Example 7.3, the marginal
probability mass functions are

f_X(1) = 1/3     f_X(2) = 5/18    f_X(3) = 7/18
f_Y(1) = 5/18    f_Y(2) = 7/18    f_Y(3) = 1/3

so the marginal moment-generating functions are

M_X(θ_1) = M_{XY}(θ_1, 0) = \frac{1}{3}e^{θ_1} + \frac{5}{18}e^{2θ_1} + \frac{7}{18}e^{3θ_1}

and

M_Y(θ_2) = M_{XY}(0, θ_2) = \frac{5}{18}e^{θ_2} + \frac{7}{18}e^{2θ_2} + \frac{1}{3}e^{3θ_2}
Example 7.6
For the continuous distribution analyzed in Example 7.4, the marginal
probability density functions are

f_X(x) = \int_x^{\infty} f_{XY}(x, y) dy = 2 \int_x^{\infty} e^{-x-y} dy = 2e^{-2x}

and

f_Y(y) = \int_0^{y} f_{XY}(x, y) dx = 2 \int_0^{y} e^{-x-y} dx = 2e^{-y}(1 - e^{-y})

and the marginal moment-generating functions are

M_X(θ_1) = M_{XY}(θ_1, 0) = \frac{2}{2 - θ_1}

and

M_Y(θ_2) = M_{XY}(0, θ_2) = \frac{2}{(1 - θ_2) + (1 - θ_2)^2}

The moments follow directly:

E[X] = \frac{∂}{∂θ_1} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{2(1 - θ_2)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0} = \frac{1}{2}

E[X^2] = \frac{∂^2}{∂θ_1^2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{4(1 - θ_2)^2}{\left( (1 - θ_2)(2 - θ_1 - θ_2) \right)^3} \Big|_{θ_1=θ_2=0} = \frac{1}{2}

E[Y] = \frac{∂}{∂θ_2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{2\left( (1 - θ_1) + 2(1 - θ_2) \right)}{\left( (1 - θ_2)(1 - θ_1) + (1 - θ_2)^2 \right)^2} \Big|_{θ_1=θ_2=0} = \frac{3}{2}

E[Y^2] = \frac{∂^2}{∂θ_2^2} M_{X,Y}(θ_1, θ_2) \Big|_{θ_1=θ_2=0} = \frac{7}{2}
Example 7.7
For the joint probability mass function

f_{Y_1,Y_2}(y_1, y_2) = \frac{e^{-17} 11^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}, 0 ≤ y_1 ≤ y_2, 0 ≤ y_2 < ∞

we have the conditional mass functions

f_{Y_2|Y_1}(y_2|y_1) = \frac{e^{-6} 6^{y_2-y_1}}{(y_2 - y_1)!}

and

f_{Y_1|Y_2}(y_1|y_2) = \binom{y_2}{y_1} \left( \frac{11}{17} \right)^{y_1} \left( \frac{6}{17} \right)^{y_2-y_1}

The joint moment-generating function is

M_{Y_1,Y_2}(θ_1, θ_2) = \sum_{y_2=0}^{\infty} \sum_{y_1=0}^{y_2} e^{θ_1y_1 + θ_2y_2} \frac{e^{-17} 11^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}
 = e^{-17} \sum_{y_2=0}^{\infty} \frac{e^{θ_2y_2}}{y_2!} \sum_{y_1=0}^{y_2} \frac{y_2! (11e^{θ_1})^{y_1} 6^{y_2-y_1}}{y_1!(y_2 - y_1)!}
 = e^{-17} \sum_{y_2=0}^{\infty} \frac{e^{θ_2y_2}}{y_2!} (6 + 11e^{θ_1})^{y_2} = e^{-17} \sum_{y_2=0}^{\infty} \frac{\left( e^{θ_2}(6 + 11e^{θ_1}) \right)^{y_2}}{y_2!} = e^{e^{θ_2}(6 + 11e^{θ_1}) - 17}

The conditional moment-generating functions are

M_{Y_1|Y_2}(θ_1|y_2) = \sum_{y_1=0}^{y_2} e^{θ_1y_1} \binom{y_2}{y_1} \left( \frac{11}{17} \right)^{y_1} \left( \frac{6}{17} \right)^{y_2-y_1} = \left( \frac{6}{17} + \frac{11}{17} e^{θ_1} \right)^{y_2}

and

M_{Y_2|Y_1}(θ_2|y_1) = \sum_{y_2=y_1}^{\infty} e^{θ_2y_2} \frac{e^{-6} 6^{y_2-y_1}}{(y_2 - y_1)!} = e^{-6+θ_2y_1} \sum_{y_2-y_1=0}^{\infty} \frac{(6e^{θ_2})^{y_2-y_1}}{(y_2 - y_1)!} = e^{6e^{θ_2} - 6 + θ_2y_1}

Therefore,

E[Y_1|Y_2] = \frac{d}{dθ_1} M_{Y_1|Y_2}(θ_1|y_2) \Big|_{θ_1=0} = y_2 \left( \frac{6}{17} + \frac{11}{17} e^{θ_1} \right)^{y_2-1} \frac{11}{17} e^{θ_1} \Big|_{θ_1=0} = \frac{11}{17} y_2

and

E[Y_2|Y_1] = \frac{d}{dθ_2} M_{Y_2|Y_1}(θ_2|y_1) \Big|_{θ_2=0} = (6e^{θ_2} + y_1) e^{6e^{θ_2} - 6 + θ_2y_1} \Big|_{θ_2=0} = 6 + y_1
Example 7.8
For the joint probability density function f_{XY}(x, y) = 2e^{-x-y}, 0 ≤ x ≤ y, 0 ≤ y < ∞,
we found that

f_{Y|X}(y|x) = e^{-(y-x)}

and

f_{X|Y}(x|y) = \frac{e^{-x}}{1 - e^{-y}}

Therefore, the conditional moment-generating functions are

M_{X|Y}(θ_1) = \int_0^{y} e^{xθ_1} f_{X|Y}(x|y) dx = \frac{1}{1 - e^{-y}} \int_0^{y} e^{-(1-θ_1)x} dx = \frac{1 - e^{-(1-θ_1)y}}{(1 - e^{-y})(1 - θ_1)}

and

M_{Y|X}(θ_2) = \int_x^{\infty} e^{yθ_2} f_{Y|X}(y|x) dy = e^{x} \int_x^{\infty} e^{-(1-θ_2)y} dy = \frac{e^{θ_2x}}{1 - θ_2}

so that

E[X|Y] = \frac{∂}{∂θ_1} M_{X|Y}(θ_1) \Big|_{θ_1=0} = \frac{\left( 1 - e^{-(1-θ_1)y} \right) - (1 - θ_1)ye^{-(1-θ_1)y}}{(1 - e^{-y})(1 - θ_1)^2} \Big|_{θ_1=0} = \frac{1 - e^{-y} - ye^{-y}}{1 - e^{-y}}

E[Y|X] = \frac{∂}{∂θ_2} M_{Y|X}(θ_2) \Big|_{θ_2=0} = \frac{xe^{θ_2x}(1 - θ_2) + e^{θ_2x}}{(1 - θ_2)^2} \Big|_{θ_2=0} = x + 1

E[X^2|Y] = \frac{∂^2}{∂θ_1^2} M_{X|Y}(θ_1) \Big|_{θ_1=0} = \frac{2 - 2e^{-y} - 2ye^{-y} - y^2e^{-y}}{1 - e^{-y}}

E[Y^2|X] = \frac{∂^2}{∂θ_2^2} M_{Y|X}(θ_2) \Big|_{θ_2=0} = \frac{x^2e^{θ_2x}(1 - θ_2)^2 + 2(1 + x - θ_2x)e^{θ_2x}}{(1 - θ_2)^3} \Big|_{θ_2=0} = x^2 + 2x + 2

Then

Var[X|Y] = \frac{2 - 2e^{-y} - 2ye^{-y} - y^2e^{-y}}{1 - e^{-y}} - \left( \frac{1 - e^{-y} - ye^{-y}}{1 - e^{-y}} \right)^2 = \frac{1 - 2e^{-y} - y^2e^{-y} + e^{-2y}}{(1 - e^{-y})^2}

and

Var[Y|X] = 2 + 2x + x^2 - (x + 1)^2 = 1
Exercises
7.1 Construct the moment-generating function for a Bernoulli
distribution.
7.2 Construct the moment-generating function for a Poisson
distribution.
8. Approximations and Limiting Behavior
There are situations in which the choice of probability model is not clear or,
even if we have an idea of which model to use, we simply wish to make an
estimate of a probability without performing the complete computations. In
addition, there are cases in which it is convenient to use one probability dis-
tribution to approximate probabilities for another distribution. Ultimately,
these methods lead us to two key limit theorems that have wide applicability
and important implications. These topics are examined in this chapter.
The simplest such estimate is provided by Markov's inequality, which states
that for a nonnegative random variable X and any constant a > 0,
Pr[X ≥ a] ≤ E[X]/a.

Example 8.1
Suppose a hospital emergency room has experienced patient arrival pat-
terns that are well modeled by a Poisson distribution having λ = 4 / hr
and is using this model to establish staffing policies. What are the
chances that more than seven patients will arrive during any one-hour
period? Using Markov’s inequality
Pr[X ≥ 7] ≤ 4/7 = 0.571
Example 8.2
Suppose the time between incoming calls to a telephone call center
is exponential with the parameter λ = 0.5/min. This means that the
mean time between calls is 1/λ = 2 minutes, so Markov's inequality bounds
the probability that 6 minutes pass without a call as

Pr[T ≥ 6] ≤ 2/6 = 0.333
A sharper bound is often provided by Chebyshev's inequality, which states
that for a random variable X having mean µ and variance σ^2, and for any k > 0,

Pr[|X - µ| ≥ k] ≤ σ^2/k^2

This relationship applies to all random variables and states that the
probability of observing a value of a random variable that is far from its mean
is bounded by a quantity inversely proportional to the square of the distance
from the mean.
Example 8.3
For the hospital emergency room of Example 8.1, σ^2 = λ = 4, so Chebyshev's
inequality indicates that

Pr[|X - µ| ≥ 7 - 4 = 3] ≤ σ^2/k^2 = 4/9 = 0.444

and the chance that nine or more patients arrive during any hour is bounded by

Pr[|X - µ| ≥ 9 - 4 = 5] ≤ σ^2/k^2 = 4/25 = 0.16
Example 8.4
For the call center of Example 8.2, µ = 2 and σ^2 = 4, so Chebyshev's inequality
indicates that

Pr[|X - µ| ≥ 6 - 2 = 4] ≤ σ^2/k^2 = 4/16 = 0.25

and

Pr[|X - µ| ≥ 8 - 2 = 6] ≤ σ^2/k^2 = 4/36 = 0.111
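It may help to compare these bounds with the exact probabilities. The sketch below (not part of the text) does so with scipy for the emergency room and call center examples.

from scipy import stats

# Emergency room of Examples 8.1 and 8.3: X ~ Poisson(4)
X = stats.poisson(4)
print(X.sf(6))          # exact Pr[X >= 7] = 1 - F(6), about 0.111
print(4 / 7)            # Markov bound, 0.571
print(4 / 3**2)         # Chebyshev bound for |X - 4| >= 3, 0.444

# Call center of Examples 8.2 and 8.4: T ~ exponential with mean 2 minutes
T = stats.expon(scale=2)
print(T.sf(6))          # exact Pr[T >= 6] = e^{-3}, about 0.050
print(2 / 6, 4 / 16)    # Markov and Chebyshev bounds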
Example 8.5
For a binomial random variable with n = 50, p = 0.08 so μ = 4.0, σ2 = npq = 3.68.
Then
Example 8.6
For a normal random variable having μ = 64.0 and σ = 1.50,
Computing cumulative binomial probabilities,

Pr[X ≤ x] = \sum_{i=0}^{x} \binom{n}{i} p^i q^{n-i}

can be tedious when n is large. When n is large and p is small, the Poisson
distribution with λ = np provides a convenient approximation to the binomial.
Example 8.7
For a binomial random variable with n = 80 and p = 0.04, so λ = np = 3.2, the
calculation of

Pr[X ≤ 6] = B(6, 80, 0.04) = 0.959

is tedious, while the Poisson approximation gives

Pr[X ≤ 6] = B(6, 80, 0.04) ≈ P(6, λ = 3.2) = 0.955
Example 8.8
For a binomial random variable with n = 100 and p = 0.06, so λ = np = 6.0,
calculation of both

Pr[X = 7] = b(7, 100, 0.06) = 0.141

and

Pr[X ≤ 7] = B(7, 100, 0.06) = 0.748

is tedious, while the corresponding Poisson approximations are
p(7, λ = 6.0) = 0.138 and P(7, λ = 6.0) = 0.744.
When n is large and p is not particularly small, the normal distribution
provides an alternate approximation to the binomial. Using µ = np and σ^2 = npq,

Pr[X ≤ x] ≈ Pr\left[ Z ≤ \frac{x - np}{\sqrt{npq}} \right]   (8.3)

where Z is a standard normal random variable. As the examples below
illustrate, a continuity correction of 0.5 is added to x when applying the
approximation.
Example 8.9
For a binomial random variable with n = 100 and p = 0.03, np = 3.0 and
npq = 2.91. We find

Pr[X ≤ 5] = B(5, 100, 0.03) = 0.919

and

Pr[X ≤ 5] ≈ Pr\left[ Z ≤ \frac{5.5 - 3}{\sqrt{2.91}} \right] = Pr[Z ≤ 1.466] = 0.928
Example 8.10
In an automated machining process, output workpieces are accumulated
into batches of 800 parts. If the machining process generates 2.5% defec-
tive pieces, what is the probability that a batch has fewer than 12 defec-
tive parts?
The number of defects in a batch should have a binomial distribution,
so the appropriate computation is
Pr[X ≤ 12] = B(12, 800, 0.025) = 0.037

Using the normal approximation with np = 20 and npq = 19.5,

Pr[X ≤ 12] ≈ Pr\left[ Z ≤ \frac{12.5 - 20}{\sqrt{19.5}} \right] = Pr[Z ≤ -1.698] = 0.045
Example 8.11
In a manufacturing assembly line, work stoppages occur due to machinery
jams according to a Poisson distribution having the parameter λ = 10/hr.
What is the probability that 15 or more stoppages occur during any one-
hour period?
The Poisson probability of this event is
Pr[X ≥ 15] = 1 − P(14, λ = 10) = 1.0 − 0.917 = 0.083
To close this discussion, it is noted that the normal distribution can often
be used to approximate most other distributions. In each case, the quality
of the approximation can be poor but under certain circumstances can be
quite good. For example, when the shape parameter of the gamma distribu-
tion is large, the normal distribution approximates the gamma distribution
reasonably well. When the shape parameter of the Weibull distribution
is between about 2.8 and 3.5, the normal distribution approximates the
Weibull distribution well.
Consider a sequence of independent and identically distributed random
variables X_1, X_2, ..., X_n with partial sum S_n = \sum_{i=1}^{n} X_i, and define the average

Y_n = \frac{S_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i
The first of our limiting results is known as the weak law of large numbers
where the term “law” may be taken to be synonymous with distribution.
Although it was developed before Chebyshev’s inequality, the result can be
shown to follow logically from that inequality and is that as long as the vari-
ables Xi have a finite expected value, μ = E[ Xi ], then
\lim_{n \to \infty} Pr\left[ |Y_n - µ| ≥ ε \right] = 0   (8.5)
In words, the probability that the average, Y_n, differs from the mean by more
than any fixed amount ε goes to zero.
The reader is encouraged to experiment with this behavior. It is sug-
gested that the reader take a fair six-sided die and roll it repeatedly while
recording the result of each roll. Compute Yn as you proceed and observe
its behavior.
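A simulated version of that experiment (not part of the text) can be run in a few lines; the running average of the simulated rolls settles near 3.5, the mean of a fair die.

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=10_000)                    # simulated rolls of a fair die
running_avg = np.cumsum(rolls) / np.arange(1, rolls.size + 1)
print(running_avg[[9, 99, 999, 9999]])                     # averages after 10, 100, 1000, 10000 rolls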
The second of our limiting results is comparable but stronger because it is
made without recourse to a probability. The strong law of large numbers is that
as long as the random variables have a finite expected value μ = E[ Xi ], then
\lim_{n \to \infty} Y_n = µ   (8.6)
The third and most widely used of our limiting results is known as the central
limit theorem. This result is that as long as the random variables in the sequence
have both a finite expected value μ = E[ Xi ] and a finite variance σ2, then
\lim_{n \to \infty} Pr\left[ \frac{Y_n - µ}{σ/\sqrt{n}} ≤ y \right] = \frac{1}{\sqrt{2π}} \int_{-\infty}^{y} e^{-y^2/2} dy   (8.7)

which is to say that the standardized average converges in distribution to a
standard normal random variable; equivalently, for large n the average Y_n is
approximately normally distributed with expected value µ and variance σ^2/n. Note
that this result applies regardless of the identity of the distribution on the
observations, Xi. It is a very strong result that suggests why the descriptors
of so many natural phenomena display the bell shape. The result also pro-
vides an indication of why the normal distribution often yields reasonable
approximations to probabilities from other distributions.
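A small simulation (not part of the text) makes the theorem concrete; here the underlying observations are exponential with mean 1, an arbitrary choice, and averages of n = 50 of them are compared with the normal description the theorem provides.

import numpy as np

rng = np.random.default_rng(2)
n, reps = 50, 100_000
y = rng.exponential(size=(reps, n)).mean(axis=1)   # 100,000 averages of n = 50 observations
print(y.mean(), y.var())                           # close to 1.0 and 1/n = 0.02
z = (y - 1.0) / np.sqrt(1.0 / n)
print((z <= 1.0).mean())                           # close to Phi(1) = 0.8413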
A final result is that the central limit theorem also applies to a sequence
of independent random vectors. If we observe a sequence of random vectors
X_i = (X_{i,1}, X_{i,2}, ..., X_{i,r}) that are mutually independent and have a common
distribution with mean vector µ = (µ_1, µ_2, ..., µ_r), all of whose elements are
finite, and covariance matrix Σ in which all elements are finite, then the
distribution on the vector of component-wise averages Y_n = (Y_{n,1}, Y_{n,2}, ..., Y_{n,r})
computed as

Y_{n,i} = \frac{S_{n,i}}{n} = \frac{1}{n} \sum_{j=1}^{n} X_{j,i}

converges to a multivariate normal distribution with mean vector µ and
covariance matrix Σ/n.
Exercises
8.1 For a manufacturing process that generates units of product of
which 1.25% are defective, use Markov’s inequality to compute a
limit on the probability of observing more than 4 defective units in
an inspection sample of 100 units.
8.2 The number of calls arriving to a credit card service call center is
Poisson with the parameter λ = 6 / min. Use Markov’s inequality to
compute a limit on the chance that more than 12 customer calls
arrive within a one-minute interval.
8.3 The number of computers sold per week by the university book
store is a random variable with an expected value of 12. Use
Markov’s inequality to compute a limit on the probability that the
sales volume in any week will be (a) 18 or more or (b) 24 or more.
8.4 The time to failure for a high-intensity lamp is a gamma random
variable having parameters α = 2.5, λ = 0.02. Use Markov’s inequal-
ity to compute a limit on the probability that a lamp will survive (a)
more than 150 hours or (b) more than 200 hours.
8.5 Repeat Exercise 8.1 using Chebyshev’s inequality.
8.6 Repeat Exercise 8.2 using Chebyshev’s inequality.
8.7 Assuming the variance in weekly computer sales is 4, repeat
Exercise 8.3 using Chebyshev’s inequality.
8.8 For male college students in the United States, the mean height is
180 cm and the variance is 25 cm². Use Chebyshev's inequality
to compute a limit on the probability of observing a student taller
than 205 cm.
8.9 Use the Poisson distribution to approximate the probability for
Exercise 8.1.
8.10 Use the normal distribution to approximate the probability for
Exercise 8.1.
8.11 Use the Poisson distribution to approximate the cumulative binomial
probability of observing four or fewer defective parts in a sample of
200 units from a production lot having a defect rate of 1.5%. Then,
use the normal distribution to approximate the same probability.
8.12 Use the normal distribution to compute an approximation to the
Poisson probability of Exercise 8.2.
8.13 A gambler playing roulette and betting simply on red or black
has a win probability of 0.474 because of the zero and double zero.
What do the weak law of large numbers and the strong law of large
numbers imply about the long-term winnings of the gambler?
Appendix: Cumulative Poisson Probabilities
np/x 0 1 2 3 4 5 6 7 8 9 10
0.02 0.980 1
0.04 0.961 0.999 1
0.06 0.942 0.998 1
0.08 0.923 0.997 1
0.10 0.905 0.995 1
0.12 0.887 0.993 1
0.14 0.869 0.991 1
0.16 0.852 0.988 0.999 1
0.18 0.835 0.986 0.999 1
0.20 0.819 0.982 0.999 1
0.25 0.779 0.974 0.998 1
0.30 0.741 0.963 0.996 1
0.35 0.705 0.951 0.994 1
0.40 0.670 0.938 0.992 0.999 1
0.45 0.638 0.925 0.989 0.999 1
0.50 0.607 0.910 0.986 0.998 1
0.55 0.577 0.894 0.982 0.998 1
0.60 0.549 0.878 0.977 0.997 1
0.65 0.522 0.861 0.972 0.996 0.999 1
0.70 0.497 0.844 0.966 0.994 0.999 1
0.75 0.472 0.827 0.959 0.993 0.999 1
0.80 0.449 0.809 0.953 0.991 0.999 1
0.85 0.427 0.791 0.945 0.989 0.998 1
0.90 0.407 0.772 0.937 0.987 0.998 1
0.95 0.387 0.754 0.929 0.984 0.997 1
1.00 0.368 0.736 0.920 0.981 0.996 0.999 1
1.10 0.333 0.699 0.900 0.974 0.995 0.999 1
1.20 0.301 0.663 0.879 0.966 0.992 0.998 1
1.30 0.273 0.627 0.857 0.957 0.989 0.998 1
1.40 0.247 0.592 0.833 0.946 0.986 0.997 0.999 1
1.50 0.223 0.558 0.809 0.934 0.981 0.996 0.999 1
1.60 0.202 0.525 0.783 0.921 0.976 0.994 0.999 1
1.70 0.183 0.493 0.757 0.907 0.970 0.992 0.998 1
1.80 0.165 0.463 0.731 0.891 0.964 0.99 0.997 0.999 1
1.90 0.150 0.434 0.704 0.875 0.956 0.987 0.997 0.999 1
2.00 0.135 0.406 0.677 0.857 0.947 0.983 0.995 0.999 1
2.20 0.111 0.355 0.623 0.819 0.928 0.975 0.993 0.998 1
2.40 0.091 0.308 0.570 0.779 0.904 0.964 0.988 0.997 0.999 1
2.60 0.074 0.267 0.518 0.736 0.877 0.951 0.983 0.995 0.999 1
2.80 0.061 0.231 0.469 0.692 0.848 0.935 0.976 0.992 0.998 0.999 1
3.00 0.050 0.199 0.423 0.647 0.815 0.916 0.966 0.988 0.996 0.999 1
3.20 0.041 0.171 0.380 0.603 0.781 0.895 0.955 0.983 0.994 0.998 1
3.40 0.033 0.147 0.340 0.558 0.744 0.871 0.942 0.977 0.992 0.997 0.999
3.60 0.027 0.126 0.303 0.515 0.706 0.844 0.927 0.969 0.988 0.996 0.999
3.80 0.022 0.107 0.269 0.473 0.668 0.816 0.909 0.960 0.984 0.994 0.998
4.00 0.018 0.092 0.238 0.433 0.629 0.785 0.889 0.949 0.979 0.992 0.997
4.20 0.015 0.078 0.210 0.395 0.590 0.753 0.867 0.936 0.972 0.989 0.996
4.40 0.012 0.066 0.185 0.359 0.551 0.720 0.844 0.921 0.964 0.985 0.994
4.60 0.010 0.056 0.163 0.326 0.513 0.686 0.818 0.905 0.955 0.980 0.992
4.80 0.008 0.048 0.143 0.294 0.476 0.651 0.791 0.887 0.944 0.975 0.990
5.00 0.007 0.040 0.125 0.265 0.440 0.616 0.762 0.867 0.932 0.968 0.986
5.20 0.006 0.034 0.109 0.238 0.406 0.581 0.732 0.845 0.918 0.96 0.982
5.40 0.005 0.029 0.095 0.213 0.373 0.546 0.702 0.822 0.903 0.951 0.977
5.60 0.004 0.024 0.082 0.191 0.342 0.512 0.670 0.797 0.886 0.941 0.972
5.80 0.003 0.021 0.072 0.170 0.313 0.478 0.638 0.771 0.867 0.929 0.965
6.00 0.002 0.017 0.062 0.151 0.285 0.446 0.606 0.744 0.847 0.916 0.957
6.20 0.002 0.015 0.054 0.134 0.259 0.414 0.574 0.716 0.826 0.902 0.949
6.40 0.002 0.012 0.046 0.119 0.235 0.384 0.542 0.687 0.803 0.886 0.939
6.60 0.001 0.010 0.040 0.105 0.213 0.355 0.511 0.658 0.780 0.869 0.927
6.80 0.001 0.009 0.034 0.093 0.192 0.327 0.480 0.628 0.755 0.850 0.915
7.00 0 0.007 0.030 0.082 0.173 0.301 0.450 0.599 0.729 0.830 0.901
7.20 0 0.006 0.025 0.072 0.156 0.276 0.420 0.569 0.703 0.810 0.887
7.40 0 0.005 0.022 0.063 0.140 0.253 0.392 0.539 0.676 0.788 0.871
7.60 0 0.004 0.019 0.055 0.125 0.231 0.365 0.510 0.648 0.765 0.854
7.80 0 0.004 0.016 0.048 0.112 0.210 0.338 0.481 0.620 0.741 0.835
8.00 0 0.003 0.014 0.042 0.100 0.191 0.313 0.453 0.593 0.717 0.816
8.50 0 0.002 0.009 0.030 0.074 0.150 0.256 0.386 0.523 0.653 0.763
9.00 0 0.001 0.006 0.021 0.055 0.116 0.207 0.324 0.456 0.587 0.706
9.50 0 0 0.004 0.015 0.040 0.089 0.165 0.269 0.392 0.522 0.645
10.00 0 0 0.003 0.010 0.029 0.067 0.130 0.220 0.333 0.458 0.583
np/x 11 12 13 14 15 16 17 18 19 20 21
3.4 1
3.6 1
3.8 0.999 1
4 0.999 1
4.2 0.999 1
4.4 0.998 0.999 1
4.6 0.997 0.999 1
4.8 0.996 0.999 1
5 0.995 0.998 0.999 1
5.2 0.993 0.997 0.999 1
5.4 0.990 0.996 0.999 1
5.6 0.988 0.995 0.998 0.999 1
5.8 0.984 0.993 0.997 0.999 1
6 0.980 0.991 0.996 0.999 0.999 1
6.2 0.975 0.989 0.995 0.998 0.999 1
6.4 0.969 0.986 0.994 0.997 0.999 1
6.6 0.963 0.982 0.992 0.997 0.999 0.999 1
6.8 0.955 0.978 0.990 0.996 0.998 0.999 1
7 0.947 0.973 0.987 0.994 0.998 0.999 1
7.2 0.937 0.967 0.984 0.993 0.997 0.999 1
7.4 0.926 0.961 0.980 0.991 0.996 0.998 0.999 1
7.6 0.915 0.954 0.976 0.989 0.995 0.998 0.999 1
7.8 0.902 0.945 0.971 0.986 0.993 0.997 0.999 1
8 0.888 0.936 0.966 0.983 0.992 0.996 0.998 0.999 1
8.5 0.849 0.909 0.949 0.973 0.986 0.993 0.997 0.999 0.999 1
9 0.803 0.876 0.926 0.959 0.978 0.989 0.995 0.998 0.999 1
9.5 0.752 0.836 0.898 0.940 0.967 0.982 0.991 0.996 0.998 0.999 1
10 0.697 0.792 0.864 0.917 0.951 0.973 0.986 0.993 0.997 0.998 0.999