
This document is authorized for personal use only by Shirsendu Nandi, of Indian Institute of Management Rohtak till 14th June, 2018. It shall not be reproduced or distributed without express written permission from Indian Institute of Management, Ahmedabad.


IIMA/QM0275TEC
Technical Note

About the Technical Note

This technical note can be used by an instructor over 15 one-and-a-half-hour sessions to provide a crisp introduction to essential ideas in probability for management students. This material has been used to teach an introductory probability course at IIM Ahmedabad for first-year management (PGP) students. While there are many classic textbooks in probability, it is often hard to adapt them to teach management students with diverse backgrounds in limited time while also ensuring adequate mathematical rigor. The aim of this note is to enable the instructor to introduce the subject through a set of carefully chosen problems, some purely engineered to understand the concept and some motivated by real applications. Hence, the style of presentation has been to provide a crisp description of ideas interspersed with carefully chosen exercises. Most exercises include an answer key and at times a more elaborate hint, depending on the level of difficulty. Some of the exercises have been created by the author, some adapted from books, and some borrowed from assignments or exams from past courses at IIM. The books referred to while preparing this note are listed at the end of this note. These books can also serve as useful references for the instructor using this note.

Prepared by Prof. Karthik Sriram, Indian Institute of Management, Ahmedabad.

© 2014 by the Indian Institute of Management, Ahmedabad

Contents

1 The Concept of Probability
2 Enumeration Principles
3 Conditional Probability, Bayes Theorem, Independence of Events
    3.1 Conditional Probability
    3.2 Bayes Theorem
    3.3 Independence of Events
4 Random Variable and Probability Distribution
    4.0.1 Summary Measures of a Distribution
5 Some Popular Discrete Distributions
    5.1 Binomial Distribution [Bin(n, p)]
    5.2 Hypergeometric Distribution
    5.3 Negative Binomial Distribution
    5.4 Poisson Distribution [Poi(λ)]
6 Joint Distributions of more than one Random Variable
7 Sums of Random Variables
8 Continuous Random Variables
    8.1 Uniform Distribution
    8.2 Exponential Distribution
    8.3 Normal or Gaussian Distribution
    8.4 Central Limit Theorem

Introduction

We live in a world of uncertainty, and we are often forced to make decisions with limited information. For example: Which company's shares should I invest in? Where in the ocean should we search for the missing flight? Should a person be convicted of a crime based on the evidence presented? We would agree that none of these questions has a single answer. Instead, there is a set of possibilities or outcomes, some of which may be more likely than others. Information such as this is captured by a "Probability Model": an approximate mathematical description of a phenomenon involving uncertainty. Simply speaking, a probability model specifies all possible outcomes and the probability of each outcome. In practice, such models are formulated partly based on our understanding of the context and partly based on analysis of historical data. Statistics provides a framework to systematically collect and analyze data to formulate and estimate such models.

Once a good model is known, it can be used to make decisions. For example, a commonly used model for the log of stock returns is the Normal Distribution. It turns out, as we will learn later, that for a Normal model it is enough to know two parameters, namely the Mean (or Expected Value) and the Variance. Therefore, past data can be analyzed to draw inferences about the mean and variance of log returns. Once we have a handle on these parameters, the probability model is known. Then we can compute various quantities of interest that could help in decision making, for example the probability that log returns exceed a certain value. Also, knowing this for various stocks, one can strategize to create a portfolio with a desired level of expected returns and uncertainty.

To understand and correctly apply statistics, a firm foundation in probability is essential. This note will focus on gaining a firm basic understanding of probability concepts. This will also serve as a foundation for the study of more advanced topics in statistics and data analysis.

Chapter 1

The Concept of Probability

Most of us already have a notion of probability. Consider the following questions:

1. If I toss a fair coin, what is the probability of heads?

The answer is of course 50% or 0.5. What do we mean by this, though? It simply means that if I keep tossing the coin a large number of times, then approximately 50% of the time I would get heads. If I could keep going indefinitely, the proportion of heads would approach exactly 50%.

This is the "Relative Frequency" notion of probability. It tacitly assumes that whatever experiment led to the outcome can be repeated infinitely many times under identical conditions.

2. What is the probability that AAP will lose the next Delhi assembly election?

The answer to this is less straightforward and may depend on each individual's viewpoint. Suppose I say there is an 80% chance. What does this mean? Clearly, this does not have the relative frequency interpretation, for we cannot repeat the elections under identical conditions infinitely many times! Instead, it can be interpreted as a "degree of belief".

Both the above interpretations are fine for our understanding whenever they make sense. Before formally defining probability, let us understand some associated terminology.
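
The relative frequency interpretation from question 1 can be illustrated with a small simulation. This is only a sketch: the function name `heads_fraction` and the fixed seed are illustrative choices, and a pseudo-random generator stands in for a physical coin.

```python
import random

def heads_fraction(num_tosses, seed=0):
    """Fraction of heads in num_tosses simulated fair-coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The fraction of heads should settle near 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```

With few tosses the fraction can be far from 0.5; with many tosses it stays close to 0.5, which is what the relative frequency notion describes.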

Random Experiment: An experiment that leads to a random outcome. Example 1: Tossing a coin twice. Example 2: Occurrence of an insurance claim.

Sample Space: The collection of all possible outcomes of a random experiment, usually denoted by S. In Example 1, S = {HH, HT, TH, TT}. In Example 2, S = the interval (0, policy limit], i.e. the claim value can be any value between zero and the policy limit. The former is a "discrete" sample space and the latter is a "continuous" sample space.

Event: A subset of the sample space. If the event consists of a single outcome, it is a "simple event"; in Example 1, A = {HH} is a simple event. If an event consists of multiple outcomes, it is a "compound event". In Example 1, A = {HH, HT, TH} is the compound event that at least one toss is a head. In Example 2, B = [10000, 50000] is the compound event that the claim amount is at least 10000 but not more than 50000.

We will often work with combinations of various events. Towards that end, let us recall a few notations and facts from set theory:

(i) $A \subseteq B$ means the set A is contained in B, i.e. every outcome in A is also in B. For instance, in Example 1, take A = first toss is a head, B = at least one toss is a head. Then $A \subseteq B$.

(ii) $A \cup B$ is the union of A and B, which contains all those elements that belong to either A or B (or both).

(iii) $A \cap B$ is the intersection of A and B, which contains all those elements that belong to both A and B.

(iv) Similarly, we can of course talk about unions and intersections of multiple (possibly infinitely many) events, denoted by $\cup_{i=1}^{N} A_i$ and $\cap_{i=1}^{N} A_i$ respectively. Note that N can potentially be infinity ($\infty$).

(v) $A^c$ is the complement of A, which contains all those elements of S that do not belong to A.

(vi) A and B are disjoint if $A \cap B = \emptyset$. Similarly, events $A_1, A_2, \ldots, A_n, \ldots$ are pairwise disjoint or mutually exclusive if any two distinct events $A_i$ and $A_j$ are disjoint.

(vii) $A_1, A_2, \ldots, A_n$ are exhaustive if $\cup_{i=1}^{n} A_i = S$.

(viii) As a matter of convention, the null set (denoted by { } or $\emptyset$) is considered an event.
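
The set notation above maps directly onto Python's built-in `set` type. The sketch below uses the sample space of Example 1 (tossing a coin twice); the variable names are illustrative only.

```python
# Sample space for tossing a coin twice (Example 1 in the text)
S = {"HH", "HT", "TH", "TT"}

A = {s for s in S if s[0] == "H"}   # first toss is a head
B = {s for s in S if "H" in s}      # at least one toss is a head

print(A <= B)                # containment: A is a subset of B
print(A | B)                 # union
print(A & B)                 # intersection
print(S - B)                 # complement of B within S
print(A.isdisjoint(S - B))   # A and the complement of B are disjoint
```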

Exercises

1. Two 6 faced dice are rolled sequentially.

a. Write down the sample space.

b. List the elements that make up the following events: A = sum of the two values is 5, B = value of first die is higher than second, C= the first value is

c. What are the elements of A C, B C , A B C ?

2. Show that $(A \cup B)^c = A^c \cap B^c$ and in general that $(\cup_{i=1}^{N} A_i)^c = \cap_{i=1}^{N} A_i^c$.

(Hint: Sets $S_1$ and $S_2$ are equal if $S_1 \subseteq S_2$ and $S_2 \subseteq S_1$. Let $x \in (A \cup B)^c$, then argue that $x \in A^c$ and $x \in B^c$. Similarly, let $y \in A^c$ and $y \in B^c$, then argue that $y \notin (A \cup B)$.)

We are now ready to formally define probability.

Probability is a real-valued function (P) on the sample space S (or, to be more precise, on the set of events based on S) with the following properties:

(i) $P(S) = 1$, $P(\emptyset) = 0$.

(ii) For any $A \subseteq S$, $0 \le P(A) \le 1$.

(iii) For mutually exclusive events $A_1, A_2, \ldots, A_n, \ldots$:

$$P\left(\cup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$

Exercises

3. Show that $P(A^c) = 1 - P(A)$.

4. Show that $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

(Hint: First argue that $A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B)$. Then use the axioms of probability.)

5. Show Boole's inequality: $P(A \cup B) \le P(A) + P(B)$.

6. Use the above result to show that $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.

Can you guess and write down the general formula for $P(\cup_{i=1}^{n} A_i)$?


7. Three distinguishable balls are randomly placed in three cells.

a. What is the sample space associated with the experiment?

(Hint: All possible combinations of (balls in cell 1, balls in cell 2, balls in cell 3). Let B1, B2, B3 denote the balls; then examples of elements of S would be ({B1,B2,B3}, { }, { }), ({B1,B2}, {B3}, { }), ({B1}, {B2}, {B3}), etc.)

b. Let $S_1$ be the event that cell 1 is singly occupied and $S_2$ be the event that cell 2 is singly occupied. Find $P(S_1)$, $P(S_2)$, $P(S_1 \cap S_2)$.

(Ans: $P(S_1) = P(S_2) = \frac{3 \cdot 2 \cdot 2}{3^3}$, $P(S_1 \cap S_2) = \frac{3 \cdot 2}{3^3}$)

8. A readership survey conducted among the adult population showed that 35% read Times, 15% read Express and 25% read Herald; 10% read both Times and Express, 8% read both Express and Herald, 5% read both Times and Herald; 4% read all three publications. If an adult in the city is chosen at random, what is the probability that:

(i) He does not read any newspaper. (ii) He reads only one of the newspapers. (iii) He reads exactly two of the newspapers. (iv) He reads all three papers.

(Hint: Define events A = Reads Times, B = Reads Herald, C = Reads Express. Ans: (i) 44%, (ii) 41%, (iii) 11%, (iv) 4%.)
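
The answers to Exercise 8 follow from the inclusion-exclusion formula of Exercise 6. The sketch below redoes the arithmetic; the variable names are illustrative, and the figures are those stated in the exercise.

```python
# Readership proportions from Exercise 8
p_t, p_e, p_h = 0.35, 0.15, 0.25        # Times, Express, Herald
p_te, p_eh, p_th = 0.10, 0.08, 0.05     # pairwise overlaps
p_teh = 0.04                            # all three

# Inclusion-exclusion for the union of three events
p_any = p_t + p_e + p_h - p_te - p_eh - p_th + p_teh
p_none = 1 - p_any
# Each pairwise overlap includes the readers of all three, so remove them
p_exactly_two = (p_te - p_teh) + (p_eh - p_teh) + (p_th - p_teh)
p_exactly_one = p_any - p_exactly_two - p_teh

print(round(p_none, 2), round(p_exactly_one, 2), round(p_exactly_two, 2), p_teh)
```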
Chapter 2

Enumeration Principles

Much of computing probability has to do with counting the number of outcomes that satisfy some criteria. Here are some basic counting principles.

If r experiments are performed such that experiment 1 may result in $n_1$ outcomes, experiment 2 in $n_2$, and so on up to experiment r, which can result in $n_r$ outcomes, then the total number of potential outcomes is $n_1 n_2 \cdots n_r$.

Combinations: ${}^{n}C_{r}$ = number of ways of choosing r objects out of n = $\frac{n!}{r!\,(n-r)!}$.

Exercises

1. Show by logical argument, and alternatively by just using algebra, that:

$${}^{n}C_{r} = {}^{n-1}C_{r-1} + {}^{n-1}C_{r}$$

2. Show that for any positive integer n and real numbers x, y:

$$(x + y)^n = \sum_{k=0}^{n} {}^{n}C_{k}\, x^k y^{n-k}.$$

(Hint: Use induction on n. This result is known as the "Binomial Theorem".)

3. As a special case of the above problem, and also by a direct argument, show that $2^n = \sum_{k=0}^{n} {}^{n}C_{k}$.

(Hint: Consider n cells which can either be filled or left empty. How many ways can we do this?)

Permutations: ${}^{n}P_{r}$ = number of ordered ways of choosing r objects out of n = $\frac{n!}{(n-r)!}$.

Exercise: There are 10 brands ($B_1, B_2, \ldots, B_{10}$) in the market which have been ranked by a market research firm. If you believe that this ranking is random, what is the probability that the first three brands are $B_1, B_2, B_3$ in that order?

(Ans: $\frac{1}{{}^{10}P_{3}}$)
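
The counting formulas above can be checked numerically. The sketch below uses `math.comb` and `math.perm` from the Python standard library (available from Python 3.8); the particular values of n and r are arbitrary illustrations.

```python
from math import comb, perm

n, r = 10, 3
print(comb(n, r))   # nCr = n! / (r! (n-r)!)
print(perm(n, r))   # nPr = n! / (n-r)!

# Exercise 1: nCr = (n-1)C(r-1) + (n-1)Cr
pascal_holds = comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r)

# Exercise 3: the binomial coefficients for a given n sum to 2^n
binomial_sum = sum(comb(n, k) for k in range(n + 1))

# Brand-ranking exercise: one favourable ordered triple out of 10P3
p_top_three_in_order = 1 / perm(10, 3)
print(pascal_holds, binomial_sum, p_top_three_in_order)
```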

Exercises

4. The numbers 1, 2, 3, ..., n are ordered randomly. What is the probability that (a) 1 and 2 are neighbors in that order, (b) 1, 2 and 3 are neighbors in that order?

(Hint: In (a), for counting the number of favorable cases, consider the string "12" as a single entity. Take a similar approach for (b). Ans: (a) $\frac{(n-1)!}{n!}$, (b) $\frac{(n-2)!}{n!}$.)

5. You own Brand A of a certain product and there are 9 other competing brands. What is the probability that your brand is among the top 3 in terms of consumer preference?

(Ans: $\frac{3 \cdot 9!}{10!}$)

6. How many ways can we put r distinguishable balls in n cells?

(Hint: Each ball has n potential places to take, hence $n^r$.)

7. How many ways can we put r indistinguishable balls in n cells?

(Answer: Imagine the r balls placed side by side, thus creating (r-1) gaps between them (* * * * ... *). Imagine the n cells as being created by n + 1 sticks (| | | | ... |), with the sticks at either extreme being fixed and the remaining (n - 1) sticks placed either in any of the (r-1) gaps or after the extreme * at either end (e.g. ***|**| | ****||). Now solving the problem amounts to counting the number of arrangements of (n - 1 + r) objects, where (n-1) are of one type (|) and the remaining r are of another type (*). Hence the answer is ${}^{n-1+r}C_{r} = \frac{(n-1+r)!}{(n-1)!\,r!}$.)
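
The answer to Exercise 7 can be sanity-checked by brute force for small cases. The sketch below lists every way of assigning a non-negative count of balls to each cell and compares the tally with the formula; `count_occupancy_patterns` is an illustrative helper, not something from the note.

```python
from itertools import product
from math import comb

def count_occupancy_patterns(r, n):
    """Count tuples (x1, ..., xn) of non-negative integers that sum to r."""
    return sum(1 for xs in product(range(r + 1), repeat=n) if sum(xs) == r)

# Compare the brute-force count with (n-1+r)Cr for a few small cases
for r, n in [(2, 2), (3, 3), (4, 3)]:
    print(r, n, count_occupancy_patterns(r, n), comb(n - 1 + r, r))
```

The brute-force search grows quickly with r and n, so it is only useful as a check on small cases; the formula is what scales.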

8. How many non-negative integer solutions are there to the equation $x_1 + x_2 + \cdots + x_n = r$?

(Hint: Same as the previous question.)

9. How many ways can we put r indistinguishable objects in n cells so that each cell contains at least 1 object?

(Hint: In the previous problem take (r - n) instead of r and $(x_1 - 1), (x_2 - 1), \ldots$ etc. Ans: ${}^{r-1}C_{n-1}$)

10. There are n objects such that $n_1$ are of type 1, $n_2$ of type 2, ..., $n_r$ of type r. Objects of each type are not distinguishable among themselves. How many distinguishable arrangements of these n objects are possible?

(Ans: $\frac{n!}{n_1!\, n_2! \cdots n_r!}$)

11. In a city, 7 accidents happened in a week. Choose the correct answer along with appropriate justification. The probability that all 7 happened on different days of the week if

(a) the accidents are distinguishable and each distinguishable distribution of accidents across 7 days has equal probability.

(b) the accidents are indistinguishable and each distinguishable distribution of accidents across 7 days has equal probability.

(Ans: (a) $\frac{7!}{7^7}$, (b) $\frac{1}{{}^{13}C_{6}}$)

(Hint: First consider a simpler example with 2 accidents on 2 days and work it out explicitly without using any formulas. Then think about the above problem.)

(Note: The assumptions under part (a) and part (b) relate to two important theories in Physics for the distribution of atomic particles into a set of energy states. The assumption under part (a) relates to "Maxwell-Boltzmann" statistics, which is the statistical behavior exhibited by some distinguishable particles, and (b) relates to "Bose-Einstein" statistics, which is the statistical behavior exhibited by indistinguishable particles.)

12. There are n flag poles and r flags. $r_1$ flags are red, $r_2$ are blue and $r_1 + r_2 = r$. What is the number of distinguishable ways in which you can hoist these flags? (Assume that a flag pole can hoist at most 1 flag and that you need to hoist all flags. Also, $n \ge r$.)

(Ans: ${}^{n}C_{r} \cdot \frac{r!}{r_1!\, r_2!}$)
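
The hint to Exercise 11 suggests working a small case explicitly. The sketch below does that by enumeration for k accidents in a k-day "week" under both assumptions. The helper `p_all_different` is illustrative only, and the enumeration is meant for small k such as 2 or 3.

```python
from itertools import product
from math import comb, factorial

def p_all_different(k):
    """P(k accidents fall on k different days of a k-day week), cases (a) and (b)."""
    # (a) distinguishable accidents: every assignment of a day to each accident is equally likely
    assignments = list(product(range(k), repeat=k))
    p_dist = sum(len(set(a)) == k for a in assignments) / len(assignments)
    # (b) indistinguishable accidents: only the count per day matters,
    # and every distinct count pattern is equally likely
    patterns = {tuple(a.count(d) for d in range(k)) for a in assignments}
    p_indist = sum(all(c == 1 for c in pat) for pat in patterns) / len(patterns)
    return p_dist, p_indist

print(p_all_different(2))   # the 2-accident, 2-day case from the hint
print(p_all_different(3))

# The closed forms quoted in the answer key for 7 accidents in 7 days
print(factorial(7) / 7 ** 7, 1 / comb(13, 6))
```

For k = 2 the two assumptions give 1/2 and 1/3 respectively, which shows that the choice of what counts as an "equally likely outcome" changes the answer.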

Here is a question to demonstrate how counting exercises like the above can be useful in statistical analysis. Such analysis is referred to as the theory of runs. This can be useful when building models to see whether the deviations of the actual data from the model are indeed random. Typically, if they are random then we have a good model, because whatever is not explained by the model is just random noise.

13. There are 14 seats, of which 6 are vacant/empty (E) and 8 are occupied (O), in the following pattern: EEEOOEOOOEEOOO. The question of interest is whether the seating is random. The approach taken is to ask what the probability of getting 6 runs is. (Each continuous occurrence of Es or of Os is termed a "run".) Compare this with the pattern EEEEEEOOOOOOOO, i.e. what is the probability of getting 2 runs?

(Ans: $P(6\ \text{runs}) = \frac{2 \cdot {}^{6-1}C_{3-1} \cdot {}^{8-1}C_{3-1}}{{}^{8+6}C_{6}} \approx 14\%$, $P(2\ \text{runs}) \approx 0.07\%$. The second pattern is highly unlikely if the seating is indeed random. Therefore, if such a pattern is actually observed, then we have evidence to believe that the seating was not done at random.)
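
The run probabilities in Exercise 13 can be checked by listing all arrangements of 6 empty and 8 occupied seats and counting runs in each. This is a brute-force sketch under the exercise's assumption that every arrangement is equally likely; the function name is illustrative.

```python
from itertools import combinations, groupby
from math import comb

def run_count_distribution(n_empty=6, n_occupied=8):
    """Tally the number of runs over all equally likely seat arrangements."""
    total = n_empty + n_occupied
    tally = {}
    for empty_positions in combinations(range(total), n_empty):
        empties = set(empty_positions)
        seats = ["E" if i in empties else "O" for i in range(total)]
        runs = sum(1 for _ in groupby(seats))  # each block of equal neighbours is one run
        tally[runs] = tally.get(runs, 0) + 1
    return tally

tally = run_count_distribution()
n_arrangements = comb(14, 6)
print(tally[6] / n_arrangements)   # probability of exactly 6 runs
print(tally[2] / n_arrangements)   # probability of exactly 2 runs
```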

Chapter 3

Conditional Probability, Bayes Theorem, Independence of Events

3.1 Conditional Probability


Consider the experiment where we throw a fair 6-faced die. The set of possible outcomes, or the sample space, is S = {1, 2, 3, 4, 5, 6}. Let A = {3}. Then P(A) = 1/6. Suppose I give you the additional information that the die throw resulted in an outcome that is divisible by 3. Now what is the probability that the outcome was 3? With the additional information, the set of possible outcomes is a restricted sample space, i.e. a subset of the original sample space: B = {3, 6}. Among the two, there is no reason why one outcome should be more likely than the other. Therefore, the new probability for event {3} is 1/2. What we have worked out here is the conditional probability of observing A = {3} "given" or "conditioned on" event B = {3, 6}. A formal definition is as follows.

Let P(B) > 0. The conditional probability of an event A conditioned on an event B is given by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
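
The die example can be reproduced directly from the definition. The sketch below assumes equally likely outcomes and uses exact fractions; `prob` and `cond_prob` are illustrative helper names.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}              # fair six-faced die
A = {3}
B = {s for s in S if s % 3 == 0}    # outcome divisible by 3

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

print(prob(A))           # 1/6
print(cond_prob(A, B))   # 1/2
```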

Exercises

1. (Conditional probability is also a probability, i.e. it is a function satisfying the axioms of probability.)

For a fixed event B with P(B) > 0, let $Q(A) = P(A \mid B)$.

a. Show that $Q(\cdot)$ is a probability on S, i.e. it satisfies the three axioms.

b. What is $Q(B^c)$?

2. Consider families with 2 children. If one child of a family is a boy, what is the probability that the other child is a girl?

(Ans: 2/3)

3. An insurance company sells a number of different policies; among these, 60% are for autos, 40% are for home owners, and 20% are for both. Suppose a person is picked at random from the population of policy holders. Let $A_1$ be the event that the person has only an auto policy, $A_2$ the event that the person has only a home owners policy, $A_3$ the event that the person has both auto and home owners policies, and $A_4$ the event that the person has neither.

a. Find $P(A_1)$, $P(A_2)$, $P(A_3)$, $P(A_4)$.

b. Let B be the event that the person renews at least one of the auto or home owners policies. From past experience it is known that $P(B \mid A_1) = 0.6$, $P(B \mid A_2) = 0.7$ and $P(B \mid A_3) = 0.8$. Given that a person selected at random has an auto or a home owners policy, what is the probability that the person will renew at least one of them?

(Hint: Find $P(B \mid A_1 \cup A_2 \cup A_3)$. Ans: 0.675)

4. A common test for AIDS is called the ELISA (Enzyme-Linked Immunosorbent Assay) test. Among 1,000,000 people who were given the test, we can expect results similar to those given in Table 3.1. If one of the 1,000,000 people is selected at random, what is (a) $P(B_1)$, (b) $P(A_1)$, (c) $P(A_1 \mid B_2)$, (d) $P(B_1 \mid A_1)$? What is the interpretation of each of these probabilities?

(Ans: (a) 0.005 (b) 0.0785 (c) 0.074 (d) 0.062)



Table 3.1: ELISA Test

                      B1: Has AIDS   B2: No AIDS      Totals
A1: Test Positive            4,885        73,630      78,515
A2: Test Negative              115       921,370     921,485
Total                        5,000       995,000   1,000,000
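
The four probabilities asked for in Exercise 4 can be read off Table 3.1 by dividing the appropriate counts. The sketch below stores the four cell counts in a dictionary; the key labels and the helper `marginal` are illustrative names.

```python
# Counts from Table 3.1, keyed by (test result, disease status)
counts = {
    ("positive", "aids"): 4_885,
    ("positive", "no_aids"): 73_630,
    ("negative", "aids"): 115,
    ("negative", "no_aids"): 921_370,
}
total = sum(counts.values())

def marginal(axis, label):
    """Total count with the given label on axis 0 (test result) or axis 1 (status)."""
    return sum(c for key, c in counts.items() if key[axis] == label)

p_b1 = marginal(1, "aids") / total                                          # P(B1)
p_a1 = marginal(0, "positive") / total                                      # P(A1)
p_a1_given_b2 = counts[("positive", "no_aids")] / marginal(1, "no_aids")    # P(A1 | B2)
p_b1_given_a1 = counts[("positive", "aids")] / marginal(0, "positive")      # P(B1 | A1)

print(round(p_b1, 3), round(p_a1, 4), round(p_a1_given_b2, 3), round(p_b1_given_a1, 3))
```

Note how different $P(A_1 \mid B_1)$ and $P(B_1 \mid A_1)$ are: most people with the disease test positive, yet only a small fraction of those who test positive have the disease.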

3.2 Bayes Theorem

Let A and B be two events. Bayes theorem gives a way to relate $P(A \mid B)$ and $P(B \mid A)$.

$$
\begin{aligned}
P(B \mid A) &= \frac{P(A \cap B)}{P(A)} \\
&= \frac{P(A \mid B)\,P(B)}{P(A)} \\
&= \frac{P(A \mid B)\,P(B)}{P(A \cap B) + P(A \cap B^c)} \\
&= \frac{P(A \mid B)\,P(B)}{P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)}
\end{aligned}
$$

General form of Bayes Theorem

Let $B_1, B_2, \ldots, B_N$ be events that are mutually exclusive and exhaustive (i.e. $B_i \cap B_j = \emptyset$ for $i \ne j$, and $\cup_{i=1}^{N} B_i = S$) and let A be any other event with $P(A) > 0$. Then

$$P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{N} P(A \mid B_i)\,P(B_i)}$$

Problems

1. In a bolt factory, machines A, B and C manufacture respectively 25, 35 and 40 percent of total production. Of their output 5, 4 and 2 percent are defective. A bolt is drawn at random from the produce and is found to be defective. What is the probability that it was manufactured by machine A?

(Ans: 0.3623)

Solution outline:

Define the events of interest. Let A, B and C denote the events that the bolt is produced by machine A, B and C respectively. Let event D = the bolt is defective.

Summarize known information. $P(A) = 0.25$, $P(B) = 0.35$, $P(C) = 0.4$, $P(D \mid A) = 0.05$, $P(D \mid B) = 0.04$, $P(D \mid C) = 0.02$.

Determine what you want to find: $P(A \mid D)$.

$$P(A \mid D) = \frac{P(D \mid A)\,P(A)}{P(D)}$$

$$P(D) = P(D \mid A)\,P(A) + P(D \mid B)\,P(B) + P(D \mid C)\,P(C)$$
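
The solution outline translates into a few lines of code. The sketch below implements the general form of Bayes theorem for a finite set of mutually exclusive and exhaustive causes and applies it to the bolt factory figures; the function name `posterior` and the dictionary layout are illustrative choices.

```python
def posterior(priors, likelihoods):
    """Bayes theorem over mutually exclusive, exhaustive causes.

    priors[k] is P(cause k); likelihoods[k] is P(evidence | cause k).
    Returns a dict of P(cause k | evidence).
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())   # P(D), by the law of total probability
    return {k: joint[k] / evidence for k in joint}

share = {"A": 0.25, "B": 0.35, "C": 0.40}          # P(A), P(B), P(C)
defect_rate = {"A": 0.05, "B": 0.04, "C": 0.02}    # P(D | machine)
post = posterior(share, defect_rate)
print(round(post["A"], 4))   # 0.3623
```

The same function returns the posterior for machines B and C as well, and the three values sum to 1.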

2. According to a research study, the incidence rate of HIV in India is 0.4% for a certain section of the population. A clinical test in India is 95% accurate in detecting HIV, i.e. if there is HIV, it will correctly detect it 95% of the time, and if there is no HIV, it will again be correct in 95% of cases. A person from this section of the population undergoes a test and the test says he has HIV. What is the probability that he really has the disease?
(a) 95% (b) 50% (c) Cannot be determined (d) ≈ 0.07.

(Hint: Let A = has disease, B = test is positive. What are P(A|B) and P(B|A)?)

3. A second independent test that has similar accuracy also comes out positive. Now, what is the probability that he has the disease?

(Such an updating of probability based on additional information is referred to as a "Bayesian Update".)
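
A Bayesian update can be sketched as a function that maps a prior probability of disease to a posterior after one positive test. The sketch assumes, as Problem 3 states, that the second test is independent of the first (given the person's true disease status) and equally accurate; the function and parameter names are illustrative.

```python
def update(prior, sensitivity=0.95, specificity=0.95):
    """Posterior P(disease | positive test) from a prior P(disease)."""
    true_pos = sensitivity * prior                  # P(positive and disease)
    false_pos = (1 - specificity) * (1 - prior)     # P(positive and no disease)
    return true_pos / (true_pos + false_pos)

after_first = update(0.004)          # prior is the 0.4% incidence rate
after_second = update(after_first)   # the first posterior becomes the new prior
print(round(after_first, 3), round(after_second, 3))
```

The first positive result moves the probability only to about 0.07 because the disease is rare; the second positive result, applied to that updated prior, moves it much further.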

4. An industrial raw material is graded and classified into three categories: 1 (Super quality), 2 (Medium quality) and 3 (Low quality). The grading is done visually and there is a high probability of misclassification. The misclassification, however, is limited to neighboring categories. Category 1 may be misclassified as 2. Category 2 may be misclassified as either 1 or 3. Category 3 may be misclassified as 2. Therefore, category 1 will never be misclassified as 3 and vice versa.

Suppose the probability of misclassification is p for all possible types of misclassification. Let $B_i$ be the event that a lot offered for grading is actually of category i (for i = 1, 2, 3). Let $C_j$ be the event that a lot offered for grading is classified into category j (for j = 1, 2, 3). Answer the following in terms of p.

a)

$P(C_1 \mid B_1) =$ ____

$P(C_2 \mid B_2) =$ ____

$P(C_3 \mid B_3) =$ ____

b)

$P(C_1) =$ ____ $P(B_1) +$ ____ $P(B_2) +$ ____ $P(B_3)$

$P(C_2) =$ ____ $P(B_1) +$ ____ $P(B_2) +$ ____ $P(B_3)$

$P(C_3) =$ ____ $P(B_1) +$ ____ $P(B_2) +$ ____ $P(B_3)$

c) Suppose p = 0.2, $P(C_1) = 0.56$, $P(C_2) = 0.32$, $P(C_3) = 0.12$.

$P(B_1) =$ ____

$P(B_2) =$ ____

$P(B_3) =$ ____

d) Suppose p = 1/3, $P(C_1) = 6/10$, $P(C_2) = 1/3$, $P(C_3) = 1/15$. Find $P(B_1)$, $P(B_2)$, $P(B_3)$.

5. In a factory, there are three machines 1, 2 and 3, producing 50%, 30% and 20% respectively of the total output. Out of the items produced by machine 2, four percent are defective. The corresponding figure for machine 3 is 6%. The following is known: "If an item is drawn at random from the production line and found to be defective, then the conditional probability for this item to have been produced by machine 1 is 0.50". What is the proportion of defective items among those produced by machine 1?

(Hint: Use a tree diagram with the three machines representing three branches emanating from the first node; then from the end of each machine branch, two branches, Defective and Non-defective, will come out. Answer: 0.048)

6. A production process involves three machines A, B and C, which produce 50%, 30% and
20% respectively of the total output. Out of the items produced by machine A, 10% fail
in a quality control test. The corresponding figures for machines B and C are 20% and
30% respectively. All items passing the quality control test are directly acceptable.
On the other hand, items failing the quality control test are further processed,
and 40%, 50% and 60% of them turn out to be marginally acceptable, depending on
whether they came from machines A, B and C respectively; e.g., out of the items that
are produced by machine A and that fail the quality control test, 40% eventually
turn out to be marginally acceptable, and so on.

(a) Find the probability that a randomly chosen item from the production process
is found to be directly acceptable.

(b) Find the probability that a randomly chosen item from the production process
turns out to be marginally acceptable.

(c) Given that a randomly chosen item from the production process has failed the
quality control test, what is the conditional probability that it turns out to be
marginally acceptable?

(d) Given that a randomly chosen item from the production process has turned
out to be marginally acceptable, what is the conditional probability that it was
produced by machine A?

(e) Given that a randomly chosen item was not produced by machine B, what is
the conditional probability that it turns out to be marginally acceptable?

(Ans: (a) 0.83, (b) 0.086, (c) 0.506, (d) 0.232, (e) 0.08)
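The answer key above can be checked by evaluating the tree numerically. This is a minimal sketch rather than a prescribed method; the dictionary names (`share`, `fail`, `marginal`) are chosen only for this illustration.

```python
# Output shares, failure rates in the quality control test, and the fraction
# of failed items that end up marginally acceptable, per machine.
share = {"A": 0.5, "B": 0.3, "C": 0.2}
fail = {"A": 0.1, "B": 0.2, "C": 0.3}
marginal = {"A": 0.4, "B": 0.5, "C": 0.6}

# (a) Directly acceptable = passes the test (law of total probability).
p_direct = sum(share[m] * (1 - fail[m]) for m in share)

# Joint probability: from machine m, fails the test, marginally acceptable.
joint = {m: share[m] * fail[m] * marginal[m] for m in share}

# (b) Marginally acceptable.
p_marginal = sum(joint.values())

# (c) P(marginally acceptable | failed the test).
p_fail = sum(share[m] * fail[m] for m in share)
p_marginal_given_fail = p_marginal / p_fail

# (d) Bayes: P(machine A | marginally acceptable).
p_A_given_marginal = joint["A"] / p_marginal

# (e) P(marginally acceptable | not produced by machine B).
p_marginal_given_not_B = (joint["A"] + joint["C"]) / (share["A"] + share["C"])

print(p_direct, p_marginal, p_marginal_given_fail,
      p_A_given_marginal, p_marginal_given_not_B)
```

Part (d) evaluates to about 0.2326, which the answer key reports as 0.232.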

3.3 Independence of Events

Pairwise independence: Two events A and B are said to be independent if

$P(A \cap B) = P(A)\,P(B).$

If P(B) > 0, this is equivalent to saying P(A|B) = P(A), i.e. knowing that event B has
occurred does not alter the likelihood of event A, and vice versa.

Exercises:

1. A 6-faced fair die is tossed and the outcome is noted. Are the events A = "outcome
is divisible by 2" and B = "outcome is divisible by 3" independent?

2. Show that if A and B are independent then so are (a) A and $B^c$, (b) $A^c$ and B,
(c) $A^c$ and $B^c$.

Independence of Three Events: Events A, B and C are said to be (mutually) independent
if all of the following hold:

a. $P(A \cap B) = P(A)P(B)$, $P(A \cap C) = P(A)P(C)$, $P(B \cap C) = P(B)P(C)$

b. $P(A \cap B \cap C) = P(A)P(B)P(C)$.

3. Exercise: Suppose A, B and C are mutually independent events and that P(A) = .5,
P(B) = .8 and P(C) = .9. Find the probabilities that

a. All three events occur.

b. Exactly two of the three events occur.

c. None of the events occur.

(Ans. a) 0.36 b) 0.49 c) 0.01)

Note: Both (a) and (b) in the definition need to be verified for independence of
three events, i.e. (a) alone does not ensure that (b) holds, and (b) alone does not
ensure that (a) holds. Examples of both situations are covered in the problems.

Below is an example to show that pairwise independence of events does not imply
mutual independence.


4. Exercise: Two 6-faced dice are thrown. Let event A = Die 1 shows 6, B = Die 2
shows 6, C = both dice show the same face. Show that A, B, C are pairwise
independent but not mutually independent.
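One way to check a claim like this is to enumerate the 36 equally likely outcomes and compare both sides of each product rule exactly. The sketch below assumes fair dice; the helper `prob` and the lambda names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die 1, die 2).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on (die1, die2)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


A = lambda o: o[0] == 6      # die 1 shows 6
B = lambda o: o[1] == 6      # die 2 shows 6
C = lambda o: o[0] == o[1]   # both dice show the same face

# Condition (a): every pair satisfies the product rule.
pairwise = all(
    prob(lambda o: E(o) and F(o)) == prob(E) * prob(F)
    for E, F in [(A, B), (A, C), (B, C)]
)

# Condition (b): the triple product rule.
p_abc = prob(lambda o: A(o) and B(o) and C(o))
triple = p_abc == prob(A) * prob(B) * prob(C)

print(pairwise, triple, p_abc)
```

Here condition (a) holds while condition (b) fails, since the triple intersection has probability 1/36 rather than 1/216.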

Below is an example to show that $P(A \cap B \cap C) = P(A)P(B)P(C)$ does not imply
pairwise independence.

Exercise: Consider families with 3 children. Let B denote boy and G denote girl.
Let the sample space be S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}. Assume
that all 8 possibilities for the three children in order have equal probabilities.

Define events A, B and C as follows: A = at least 2 boys, B = 1st child is a girl,
C = 2nd child is a boy.

Show that $P(A \cap B \cap C) = P(A)P(B)P(C)$ but A, B, C are not pairwise
independent.

(Solution: Show that P(A) = P(B) = P(C) = 1/2 and $P(A \cap B \cap C)$ = P({GBB}) = 1/8
= P(A)P(B)P(C). But now note that $P(A \cap B)$ is also equal to P({GBB}) = 1/8, which
is not equal to P(A)P(B) = 1/4.)

Independence of many Events: Events $A_1, A_2, \ldots, A_N$ are said to be independent
if $P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_k})$
for all $1 < k \le N$ and all k-tuples $(i_1, i_2, \ldots, i_k)$ of distinct indices
from $\{1, 2, \ldots, N\}$.

5. Exercise: A machine has n components. It functions only if at least one component
functions. All components function independently and the probability of a component
not functioning is p. What is the probability that all components are functioning
given that the machine is functioning?

(Ans. $\frac{(1-p)^n}{1-p^n}$)

What if events are not independent?

In general, if $A_1, A_2, \ldots, A_N$ are not necessarily independent, we can write

$P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_N) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_N \mid A_1 \cap A_2 \cap \cdots \cap A_{N-1})$

[Notation: Sometimes for convenience $P(A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_N)$ may just
be written as $P(A_1 A_2 \cdots A_N)$ (without explicitly writing the intersection
($\cap$) sign).]


6. Exercise: An urn has 3 black balls and 5 white balls. Each time I draw a ball. If
it is black, I add 1 black ball and 2 white balls. If it is white, I add 2 black balls
and 1 white ball. What is the probability of getting the sequence
Black-White-Black-White in four successive draws?

(Hint: Let $B_i$ denote the event that the ith draw results in a black ball and $W_j$
the event that the jth draw is white. Then
$P(B_1 W_2 B_3 W_4) = P(B_1)\,P(W_2 \mid B_1)\,P(B_3 \mid B_1 W_2)\,P(W_4 \mid B_1 W_2 B_3)$.
The answer below corresponds to returning the drawn ball to the urn along with the
added balls. Ans: 0.06016).
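The chain rule in the hint can be evaluated step by step by tracking the urn's contents. The sketch below is an illustration under one stated assumption: the drawn ball is returned to the urn before the extra balls are added, which is the reading that reproduces the answer 0.06016. The function name `sequence_probability` is made up for this example.

```python
from fractions import Fraction


def sequence_probability(sequence, black=3, white=5):
    """Chain-rule probability of a colour sequence such as "BWBW".

    Assumption: the drawn ball goes back into the urn, then balls are added.
    """
    prob = Fraction(1)
    for colour in sequence:
        total = black + white
        if colour == "B":
            prob *= Fraction(black, total)          # P(black | history so far)
            black, white = black + 1, white + 2     # add 1 black, 2 white
        else:
            prob *= Fraction(white, total)          # P(white | history so far)
            black, white = black + 2, white + 1     # add 2 black, 1 white
    return prob


p_bwbw = sequence_probability("BWBW")
print(p_bwbw, float(p_bwbw))
```

The four factors are 3/8, 7/11, 6/14 and 10/17.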

CHAPTER 4

Random Variable and Probability Distribution

Suppose we toss a coin 3 times. The sample space is given by

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

Assuming the coin is fair and the tosses are independent, the probability of each
outcome in the sample space is 1/8. Suppose we are not interested in the sample space
in all its detail, but only in the "number of heads". Then, the information of interest
can be summarized as follows.

There are 4 possible values for the number of heads depending on the outcomes in
the sample space, viz. {0, 1, 2, 3}. The corresponding probabilities for these values
to occur are determined as follows:

P(Number of heads = 0) = P({TTT}) = 1/8

P(Number of heads = 1) = P({HTT, THT, TTH}) = 3/8

P(Number of heads = 2) = P({HHT, HTH, THH}) = 3/8

P(Number of heads = 3) = P({HHH}) = 1/8.
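The same summary can be produced mechanically by listing the sample space and counting heads in each outcome. This is a small sketch assuming a fair coin and independent tosses; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 8 equally likely outcomes of three tosses, e.g. "HHT".
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Each outcome has probability 1/8, so divide the counts by 8.
pmf = {k: Fraction(c, len(sample_space)) for k, c in sorted(counts.items())}

print(pmf)
```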


A Random Variable (X) is a function that maps the sample space to the real line (i.e.
$X : S \to \mathbb{R}$). In the example above, "X = Number of Heads" is a random variable.

A Discrete Random Variable can only take countably many values. Note that countably
many does not necessarily mean finite. E.g. the set {1/7, 1/8, 1/9, 3} has countably
many elements because it has finitely many elements. However, the set {0, 1, 2, 3, ...}
is also countable although it has infinitely many elements. The set of rational numbers
(i.e. ratios of integers) is also countable. To be precise, a set of possible values is
said to be "countable" if one can define a one-one map from the set to the set of
natural numbers.


A Continuous random variable takes values in a continuum, e.g. X = a number picked
randomly from the interval [0,1].

The data we observe in real life can be thought of as realizations of a random
variable.

Example 1: $X_{31/Dec/2015}$ = Closing stock price of Infosys on 31 Dec 2015.

Example 2: X = Wind speed at hub height of a windmill.

Example 3: X = Waiting time for a machine to break down after repair.

Example 4: X = CAT score of a randomly chosen first year IIMA student.

Example 5: X = 1 if the treatment for a disease is effective, and X = 0 if the
treatment is not effective.

As in the examples above, a number of phenomena involving uncertainty can be
formulated in terms of random variables.

To understand any random variable, we need to ask two basic questions:

1. What possible values can the random variable take?

2. ... and with what probabilities?

To answer the latter question, one needs to be clear about the assumptions being made.

For example, in the case of three coin tosses:

The random variable was X = Number of heads.

Possible values: X = 0 or 1 or 2 or 3,

... with probabilities 1/8, 3/8, 3/8 and 1/8 respectively.

This is based on the assumptions that (i) the coin is fair and (ii) outcomes from
different tosses are independent.

Probability Mass Function ($f(\cdot)$) for a discrete random variable X taking values in
$\mathcal{X} = \{a_1, a_2, \ldots, a_k, \ldots\}$ is given by

$f(a) = P(X = a)$, where $a \in \mathcal{X}$.


Sometimes, the above may just be referred to as the distribution of the random variable.

Cumulative Distribution Function is defined as $F(a) = P(X \le a)$, $a \in \mathbb{R}$.

We will see later that for continuous random variables, the p.m.f is not meaningful.
However, an analogous concept is that of Probability Density Function (p.d.f): A
function $f(\cdot)$ is said to be the p.d.f of a continuous random variable X if
$P(a < X \le b) = \int_a^b f(x)\,dx$ for all a < b. Note that the cumulative distribution
function is meaningful for both discrete and continuous random variables.

Important properties of c.d.f $F(\cdot)$

(i) $F(-\infty) = 0$, $F(\infty) = 1$

(ii) F(x) is right continuous, i.e. $\lim_{h \downarrow 0} F(x + h) = F(x)$

(iii) $P(a < X \le b) = F(b) - F(a)$

1. Exercise

Classify the following variables into discrete and continuous random variables.

a. Number of patients arriving at a clinic in one hour.

b. Heart rate (number of beats per minute) of a student who has just seen the first
quiz question.

c. Time taken by a student to complete an examination.

d. Number of crimes committed in a month in a city.

e. Blood pressure of a patient.

f. Height of a student randomly chosen from the class.

2. Exercise: Let a random experiment be the cast of a pair of unbiased six-sided dice
and let X be equal to the smaller of the outcomes if they are different and the common
value if they are equal. Find the p.m.f. of X, explicitly stating the assumptions
you are making.

Solution outline

What is the sample space?

What are the possible values of X?

What is the probability of each value of X? (Each value of X corresponds to some
specific outcomes in the sample space. Add the probabilities of those outcomes.)
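The three steps of the solution outline can be followed literally in code. The sketch below assumes the two dice are fair and independent, so that all 36 ordered pairs are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: the sample space, 36 equally likely ordered pairs.
rolls = list(product(range(1, 7), repeat=2))

# Step 2: X is the smaller outcome (or the common value), so X takes values 1..6.
counts = Counter(min(a, b) for a, b in rolls)

# Step 3: add up the probabilities (1/36 each) of the outcomes behind each value.
pmf = {k: Fraction(counts[k], len(rolls)) for k in range(1, 7)}

for k, p in pmf.items():
    print(k, p)
```

Under these assumptions the counts are 11, 9, 7, 5, 3, 1 outcomes for X = 1, ..., 6.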


3. Exercise (Geometric Distribution):

A coin is independently tossed repeatedly. Let X = the number of tosses until and
including the toss in which a head appears for the first time. Let P(H) = p.

a. What is the distribution of X?

b. Show that $P(X > k + r \mid X > k) = P(X > r)$.

(Ans: a. $P(X = k) = (1-p)^{k-1} p$ for $k \in \{1, 2, 3, \ldots\}$.)

Geometric Distribution is a "discrete waiting time" distribution. It is the distribution
of the waiting time until some event happens, where time is measured in integer units.
An analogue of this in continuous time is the "Exponential Distribution", which we will
study later. The property under part b of the previous question is known as the
Memoryless property: knowing that the event has not happened in the first k time units
does not change the probability that you have to wait for more than r further time
units until the event happens. In other words, the past is irrelevant to determine
future waiting time. What we have seen above is that the Geometric distribution
satisfies this property. What is even more interesting is that Geometric is the only
discrete distribution on {1, 2, 3, ...} with this property.
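The memoryless property can be checked numerically from the p.m.f. alone. The sketch below is an illustration with arbitrarily chosen values p = 0.3, k = 4, r = 6; the tail probability is approximated by a long finite sum, and the function names are made up for this example.

```python
def geometric_pmf(k, p):
    """P(X = k) for the first head on toss k, with k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def tail(k, p, terms=2000):
    """Approximate P(X > k) by summing the p.m.f. over k+1, ..., k+terms."""
    return sum(geometric_pmf(j, p) for j in range(k + 1, k + 1 + terms))


p, k, r = 0.3, 4, 6

# Left side: P(X > k + r | X > k) = P(X > k + r) / P(X > k).
lhs = tail(k + r, p) / tail(k, p)

# Right side: P(X > r).
rhs = tail(r, p)

print(lhs, rhs)
```

Both sides come out as (1 - p)^r, because P(X > k) = (1 - p)^k means the first k tosses were all tails.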

4.0.1 Summary Measures of a Distribution


A random variable by definition takes random values (as governed by its p.m.f). For
example, I cannot say what exactly will be the stock price of a company tomorrow.
However, with some effort it is easier for me to comment on the distribution of the
stock price (i.e. what values and with what probabilities). Since I cannot precisely say
what exact value the random variable will take, it helps to define some summary
measures of its distribution. Given below are some summary measures.

Expectation of X (E[X]) or Mean of X:

For a discrete random variable with p.m.f. $p(\cdot)$: $E[X] = \sum_{x:\, p(x) > 0} x\, p(x)$.

Alternatively, if X takes values $a_1, a_2, \ldots$ then $E[X] = \sum_{i \ge 1} a_i\, P(X = a_i)$.

E[X] can be interpreted as the "average" value of X since it is the weighted average
of the possible values of X, weighted by the frequency of their occurrences.

The analogous definition for a continuous random variable is
$E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$.


Important Properties of Expectation

(i) For constants a and b, $E[a + bX] = a + b\,E[X]$.

(ii) For any two random variables X and Y, $E[X + Y] = E[X] + E[Y]$.

(iii) Let X take values $\{a_1, a_2, \ldots, a_k, \ldots\}$ and let $g(\cdot)$ be a function on
the real line. Then Y = g(X) is another random variable, and

$E[Y] = \sum_{y:\, P(Y = y) > 0} y\, P(Y = y)$ can be computed as $\sum_{i \ge 1} g(a_i)\, P(X = a_i)$.

Variance of X [V(X)]: This is a measure of spread of the values of X around the
mean value of X. It is computed as the expected value of squared deviations from
the mean.

Discrete: $V(X) = \sum_{x:\, p(x) > 0} (x - E[X])^2\, p(x) = \sum_{i \ge 1} (a_i - E[X])^2\, P(X = a_i)$.

Continuous: $V(X) = \int_{-\infty}^{\infty} (x - E[X])^2 f(x)\,dx$.

A simplified formula for V(X), which can sometimes be useful for computation, is
given by

$V(X) = E[X^2] - (E[X])^2$

This is obtained as below.

$V(X) = E[(X - E[X])^2] = E[X^2 - 2X\,E[X] + (E[X])^2]$
$= E[X^2] - E[2X\,E[X]] + (E[X])^2$
$= E[X^2] - 2E[X]\,E[X] + (E[X])^2$
$= E[X^2] - (E[X])^2$

Standard Deviation of X (SD(X)): $SD(X) = \sqrt{V(X)}$.

This is also a measure of spread but has the same units as X, unlike variance,
which is in squared units.

1. Exercise

Bernoulli Distribution [Bern(p)]: Find E[X] and V[X] for the Bernoulli random
variable

$X = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } (1-p) \end{cases}$


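2. Exercise

For $X \sim Geo(p)$, find E[X], V[X] and SD(X).

Solution idea:

Observe that $\sum_{k=1}^{\infty} (1-p)^{k-1} p = 1$.

So, $\sum_{k=1}^{\infty} (1-p)^{k-1} = \frac{1}{p}$. Now differentiate both sides with respect
to p; then the L.H.S. will give $-\frac{E[X]}{p}$ and the R.H.S. will give $-\frac{1}{p^2}$.

(Answer: $E[X] = \frac{1}{p}$, $V(X) = \frac{1-p}{p^2}$)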
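The closed-form answers can be sanity-checked by summing the series directly. This is a numerical sketch for one arbitrarily chosen value p = 0.25, with the infinite sums truncated at 2000 terms; the function name `geometric_moments` is illustrative.

```python
def geometric_moments(p, terms=2000):
    """Approximate E[X] and V(X) for X ~ Geo(p) by truncating the series."""
    ks = range(1, terms + 1)
    pmf = [(1 - p) ** (k - 1) * p for k in ks]
    mean = sum(k * f for k, f in zip(ks, pmf))
    second_moment = sum(k * k * f for k, f in zip(ks, pmf))
    # Simplified formula: V(X) = E[X^2] - (E[X])^2.
    return mean, second_moment - mean ** 2


p = 0.25
mean, var = geometric_moments(p)
sd = var ** 0.5

print(mean, 1 / p)              # series value against 1/p
print(var, (1 - p) / p ** 2)    # series value against (1-p)/p^2
```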


3. Exercise

An insurance company writes a policy to the effect that an amount of money A
must be paid if some event E occurs within a year. If the company estimates that
E will occur within a year with probability p, what should it charge the customer
in order that its expected profit will be 10% of A?

Percentiles or Quantiles: In many business problems it is common to consider
summary measures beyond mean or variance. For example, a financial lending
institution may want to know the "value at risk" (VAR), i.e. a value such that the
probability of the losses from the portfolio exceeding this number is very small,
e.g. 0.05. In probabilistic terms, "loss" is the random variable and VAR in the
example is the 95th percentile or quantile of the distribution of loss. To be more
precise, Q is said to be the 100p-th percentile (or quantile) of the distribution of X
if $P(X \le Q) = p$.

Table 4.1: Random Variable

x          -3    -1    0     1    2    3     5     8
P(X = x)   .1    .2    .15   .2   .1   .15   .05   .05

4. Exercise

Table 4.1 shows the p.m.f of a random variable X. Find

a. $P(X > 0)$

b. $P(X \text{ is even})$

c. $P(1 \le X \le 8)$

d. $P(X = -3 \mid X \le 0)$

e. $P(X \ge 3 \mid X > 0)$

f. E[X]

g. V[X]

h. Find the 30th percentile of X.

(Ans: a) .55, b) .3, c) .55, d) .2222, e) .4545, f) 1, g) 6.5, h) -1 (actually any number
q with $-1 \le q < 0$ is a 30th percentile))
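Each part of this exercise is a sum over the rows of Table 4.1, which makes it easy to verify in code. The sketch below uses exact fractions; the helper `prob` and the variable names are illustrative only.

```python
from fractions import Fraction

# Table 4.1 as a dictionary {value: probability}, kept exact with Fraction.
values = [-3, -1, 0, 1, 2, 3, 5, 8]
probs = [Fraction(s) for s in ("0.1", "0.2", "0.15", "0.2", "0.1", "0.15", "0.05", "0.05")]
pmf = dict(zip(values, probs))


def prob(event):
    """P(event) for an event given as a predicate on the value x."""
    return sum(p for x, p in pmf.items() if event(x))


a = prob(lambda x: x > 0)
b = prob(lambda x: x % 2 == 0)
c = prob(lambda x: 1 <= x <= 8)
d = prob(lambda x: x == -3) / prob(lambda x: x <= 0)   # conditional probability
e = prob(lambda x: x >= 3) / prob(lambda x: x > 0)     # conditional probability
mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean ** 2

# c.d.f. at -1: it equals 0.3, which is why -1 qualifies as a 30th percentile.
cdf_at_minus_1 = prob(lambda x: x <= -1)

print(float(a), float(b), float(c), float(d), float(e), mean, var, cdf_at_minus_1)
```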

5. Exercise

Suppose that a school has 20 classes: 16 with 25 students in each, three with 100
students in each and one with 300 students, for a total of 1000 students.

a. What is the average class size?

b. Suppose a student is picked at random from the 1000 students. Let X = size of
the class to which he belongs. What is the p.m.f of X?

c. What is E[X]?

d. Is it surprising that a. and c. are not equal? Can you define a random variable
Y such that E[Y] will give the answer in a.?

(Ans: a) 50 b) P(X=25)=.4, P(X=100)=.3, P(X=300)=.3 c) 130.)
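The gap between the two averages comes from what is being sampled: students in part c, classes in part a. The sketch below makes that explicit. The random variable Y used here (size of a class picked at random from the 20 classes) is one possible choice for part d, offered as an illustration rather than as the intended answer.

```python
from fractions import Fraction

class_sizes = [25] * 16 + [100] * 3 + [300]
n_classes = len(class_sizes)      # 20
n_students = sum(class_sizes)     # 1000

# (a) Plain average over classes.
average_class_size = Fraction(n_students, n_classes)

# (b) Pick a student at random: each class of size s contributes s / 1000.
pmf_X = {}
for s in class_sizes:
    pmf_X[s] = pmf_X.get(s, 0) + Fraction(s, n_students)

# (c) E[X]: large classes are over-represented because they hold more students.
E_X = sum(s * p for s, p in pmf_X.items())

# (d) One candidate: Y = size of a class picked at random (each class has weight 1/20).
E_Y = sum(Fraction(s, n_classes) for s in class_sizes)

print(average_class_size, pmf_X, E_X, E_Y)
```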

CHAPTER 5

Some Popular Discrete Distributions

5.1 Binomial Distribution [Bin(n, p)]

This can be thought of as a generalization of the Bernoulli distribution. Bern(p) is the
distribution of X = number of heads in one single toss. Binomial (Bin(n, p)) is the
distribution of X = number of heads in n independent tosses.

1. A machine produces n items in a day. The probability that an item is defective is
p. Assume that the quality of items are independent of each other. Let the random
variable X = Number of defective items produced in the day.

a. What is the distribution (p.m.f) of X?

b. Find E[X], V(X).

Solution Outline:

First basic question: List the possible values of X. It is {0, 1, 2, ..., n}.

What is P(X = 0)?

What is P(X = n)?

What is P(X = k) for 0 < k < n?

For part (b), note that $\sum_{k=0}^{n} {}^{n}C_{k}\, p^k (1-p)^{n-k} = 1$.


Note that the number of outcomes with X = k is the same as the number of ways of
arranging n objects, k of which are of one kind (i.e. H) and n - k are of another
kind (i.e. T), i.e. $\frac{n!}{k!(n-k)!} = {}^{n}C_{k}$.

Also, note that each outcome with k heads has probability $p^k (1-p)^{n-k}$.

Hence the p.m.f is given by $P(X = k) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}$, for $k \in \{0, 1, 2, \ldots, n\}$.

For b, note that

$\sum_{k=0}^{n} k \cdot \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!}\, p^{k-1} (1-p)^{n-k} = \text{simplify} \ldots = np$

A similar approach for the variance gives $V(X) = np(1-p)$.

(Alternative approach for (b): Express the Binomial random variable as a sum of n
independent Bernoulli random variables. Then use the facts that (i) the expectation
of a sum of any random variables is the sum of the expectations, and (ii) the variance
of a sum of independent random variables is the sum of the individual variances.)
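The p.m.f. and the two formulas E[X] = np and V(X) = np(1 - p) can be checked numerically for any particular n and p. The sketch below uses n = 10 and p = 0.2, values picked only for illustration, and relies on `math.comb` from the Python standard library (Python 3.8 or later).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): nCk * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.2
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(k * f for k, f in enumerate(pmf))       # should be n*p
var = sum(k * k * f for k, f in enumerate(pmf)) - mean ** 2   # should be n*p*(1-p)

print(total, mean, var)
```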

2. Exercise: The probability of hitting a target is .2 and 10 shots are fired
independently.

a. What is the probability of hitting the target at least twice?

b. What is the probability that the target was hit at least twice given that it had
been hit at least once?

(Ans: a) .6242 b) .6992)

3. Exercise: An airline always overbooks when possible. A particular plane has 95
seats and each ticket costs Rs. 4000. The airline sells 100 such tickets.

a. If the probability of an individual not showing up is 0.05, assuming independence,
what is the probability that the airline can accommodate all who show up?

(Hint: Define r.v. X = number of people who show up out of 100. Express the
probability in terms of X and compute.)

b. If the airline must return Rs. 4000 plus a penalty of Rs. 5000 to each person who
shows up but cannot be accommodated, what is the expected penalty the airline will pay?

(Hint: Create a new r.v. Y = penalty paid by the airline. Express Y in terms of X. Since
we know the distribution of X, find the distribution of Y, i.e. identify the possible
values of Y and their probabilities. Ans: (a) .5640 (b) Rs. 4275.)
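The airline exercise follows the hint directly: compute the Bin(100, 0.95) p.m.f. of X and then average a function of X. The sketch below is an illustration in which Y is taken to be Rs. 5000 for each passenger beyond the 95 seats; this reading (penalty only, excluding the Rs. 4000 refund) is an assumption, chosen because it reproduces the stated answer of Rs. 4275.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


tickets, seats, p_show = 100, 95, 0.95
pmf = [binomial_pmf(k, tickets, p_show) for k in range(tickets + 1)]

# (a) Everyone who shows up gets a seat: P(X <= 95).
p_all_seated = sum(pmf[: seats + 1])

# (b) Y = 5000 * max(X - 95, 0); E[Y] is the p.m.f.-weighted average of Y.
penalty_per_passenger = 5000  # assumption: penalty only, refund not counted
expected_penalty = sum(
    penalty_per_passenger * max(k - seats, 0) * f for k, f in enumerate(pmf)
)

print(round(p_all_seated, 4), round(expected_penalty))
```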


5.2 Hypergeometric Distribution

1. Exercise: An urn contains N balls out of which R are red and the rest white
in color. A sample of n balls is chosen randomly without replacement. Let the
random variable X = Number of red balls in the sample. Show that the p.m.f of
X is given by

$P(X = r) = \frac{{}^{R}C_{r} \cdot {}^{N-R}C_{n-r}}{{}^{N}C_{n}}$, $r \in \{0, 1, 2, \ldots, n\}$

X in the above exercise is said to follow a "Hyper-geometric" distribution.
Note that the formula above makes sense when we have the following conditions:

$\min(R, n) \ge r$ and $(N - R) \ge n - r$

For any r, if either of these conditions does not hold, the probability is taken to
be 0.

2. Exercise: A random sample of size 3 is drawn without replacement from a lot of
size 10, which contains 4 defective items. What is the probability that at least 1 of
the 3 items drawn is defective?

The expectation and variance for the above hypergeometric distribution are as
follows. (The derivation is a bit complex and the reader is referred to example 8j
chapter 4 of Sheldon Ross book for derivation)

$E[X] = \frac{nR}{N}$, $V(X) = \frac{nR}{N}\left[\frac{(n-1)(R-1)}{N-1} + 1 - \frac{nR}{N}\right]$

3. Exercise: A purchaser of electrical components buys them in lots of size 10. It is
his policy to inspect 3 components randomly chosen from the lot and to accept the
lot only if all 3 are non-defective. If 30% of the lots have 4 defective components
and 70% have only 1, what proportion of lots will the purchaser reject?

(Ans. 0.46. Hint: Define event A = lot has 4 defectives, random variable X = Number
of defectives in the sample. P(acceptance) = $P(X = 0) = P(X = 0 \mid A)\,P(A) +
P(X = 0 \mid A^c)\,P(A^c)$. Then use hyper-geometric for each term.)
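Exercises 2 and 3 both reduce to evaluating the hypergeometric p.m.f. at r = 0. The sketch below writes the p.m.f. as a small function (the name `hypergeom_pmf` and its argument order are choices made for this illustration) and then applies the law of total probability from the hint.

```python
from math import comb


def hypergeom_pmf(r, N, R, n):
    """P(X = r): r red balls among n drawn without replacement from N balls, R red."""
    if r < 0 or r > min(R, n) or n - r > N - R:
        return 0.0   # outside the range where the formula makes sense
    return comb(R, r) * comb(N - R, n - r) / comb(N, n)


# Exercise 2: lot of 10 with 4 defectives, sample of 3, at least one defective.
p_at_least_one = 1 - hypergeom_pmf(0, N=10, R=4, n=3)

# Exercise 3: accept only if the sample of 3 has no defectives.
p_accept = (0.3 * hypergeom_pmf(0, N=10, R=4, n=3)     # lots with 4 defectives
            + 0.7 * hypergeom_pmf(0, N=10, R=1, n=3))  # lots with 1 defective
p_reject = 1 - p_accept

print(p_at_least_one, p_reject)
```

For Exercise 2 this gives 1 - 20/120 = 5/6.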

Here is an exercise that will help you better understand one key assumption behind the
Hypergeometric distribution.

4. A sample of 3 items is selected at random from a box containing 20 items of which
4 are defective. Let X = number of defective items in the sample.


a. What is the distribution of X when the sampling is done with replacement?

b. What is the distribution of X when the sampling is done without replacement?

5. Exercise: (If you like a bit of algebra, derive this! Else focus on understanding
the result.) Suppose X follows a Hyper-geometric distribution, i.e.,

$P(X = r) = \frac{{}^{R}C_{r} \cdot {}^{N-R}C_{K-r}}{{}^{N}C_{K}}$, $r \in \{0, 1, 2, \ldots, \min(K, R)\}$

Show that as $N \to \infty$, if $\frac{R}{N} \to p$, then $P(X = r) \to {}^{K}C_{r}\, p^r (1-p)^{K-r}$.

Meaning of the above result: When the population is large, sampling with or
without replacement will not make much of a difference. Hence, Hypergeometric
will be close to Binomial.
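The convergence can be seen numerically by holding the sample size and the proportion R/N fixed while the population grows. The sketch below uses K = 3 and p = 0.2 (the setting of the box with 20 items and 4 defectives) and population sizes chosen only for illustration.

```python
from math import comb


def hypergeom_pmf(r, N, R, K):
    """P(X = r) when K items are drawn without replacement from N, R of them red."""
    return comb(R, r) * comb(N - R, K - r) / comb(N, K)


def binomial_pmf(r, K, p):
    """P(X = r) when K items are drawn with replacement, each red with probability p."""
    return comb(K, r) * p ** r * (1 - p) ** (K - r)


K, p, r = 3, 0.2, 1
gaps = {}
for N in (20, 200, 2000, 20000):
    R = round(N * p)   # keep the proportion R / N equal to p
    gaps[N] = abs(hypergeom_pmf(r, N, R, K) - binomial_pmf(r, K, p))

for N, gap in gaps.items():
    print(N, gap)
```

The gap shrinks roughly tenfold each time the population grows tenfold, in line with the meaning of the result stated above.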

6. Reading Exercise: An interesting application of the Hypergeometric distribution
is in estimating unknown population sizes. Read example 8h in section 4.8.3 in
Sheldon Ross book "A first course in probability theory".

5.3 Negative Binomial Distribution

Negative Binomial can be thought of as a generalization of the geometric(p)
distribution. Geometric is the distribution of X = number of tosses until and
including 1 head, whereas Negative binomial is the distribution of X = number of
tosses until and including r heads.

Exercise

1. A coin with P(H) = p is tossed independently. Let X = Number of tosses until r
heads occur. Show that the p.m.f of X is given by

$P(X = n) = {}^{n-1}C_{r-1}\, p^r (1-p)^{n-r}$, $n \in \{r, r+1, \ldots\}$

[Here X is said to follow Negative Binomial(r, p) distribution.]

2. If independent trials each resulting in a success with probability p are performed,
what is the probability of r successes before m failures?

(Hint: r successes occur before m failures if and only if the r-th success happens no
later than trial r + m - 1. Ans: $\sum_{n=r}^{m+r-1} {}^{n-1}C_{r-1}\, p^r (1-p)^{n-r}$)

3. You are interested in estimating the proportion of missing books in a library. You
take the catalogue and design two sampling plans:

a) Draw a random sample of 100 names from the catalogue. Then look them up in the
library to see how many of them are missing.

b) Keep drawing a random item sequentially from the catalogue and check whether
the item is missing or not. Once you encounter two missing books, you stop the
sampling.

If the true proportion missing = .01,

(i) What is the expected number of books you will check in the two schemes?

(ii) What is the probability that you encounter no missing item in the first
sampling scheme?

(iii) What is the probability that you will sample more than 500 items in the
second sampling scheme?

(Ans: (i) (100, 200) (ii) 0.366 (iii) .0397. Caution: If using Excel, check the syntax of
the command carefully to make sure you supply the correct inputs.)
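As an alternative to spreadsheet functions, the three answers can be computed from first principles. The sketch below is an illustration only: scheme (a) is treated as Bin(100, 0.01) and scheme (b) as Negative Binomial(2, 0.01), and part (iii) uses the fact that more than 500 draws are needed exactly when at most one missing book turns up among the first 500.

```python
from math import comb

p_missing = 0.01

# Scheme (a): a fixed sample of 100 books.
expected_checked_a = 100
p_no_missing = (1 - p_missing) ** 100          # (ii) no missing book in the sample

# Scheme (b): stop at the r-th missing book, with r = 2.
r = 2
expected_checked_b = r / p_missing             # (i) mean of Negative Binomial(r, p)

# (iii) P(X > 500) = P(at most r - 1 missing books among the first 500 draws).
p_more_than_500 = sum(
    comb(500, j) * p_missing ** j * (1 - p_missing) ** (500 - j) for j in range(r)
)

print(expected_checked_a, expected_checked_b,
      round(p_no_missing, 3), round(p_more_than_500, 4))
```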

The negative-binomial distribution is a discrete waiting time distribution. It is the
waiting time until a fixed number (r) of successes happens. Clearly, this can be viewed
as the sum of r independent geometric random variables. Using the properties of sums of
independent random variables, one can easily see that

$E[X] = \frac{r}{p}$, $V(X) = \frac{r(1-p)}{p^2}$

5.4 Poisson Distribution [Poi($\lambda$)]

A random variable X is said to follow a Poisson Distribution with rate or mean
$\lambda$ if it has the following p.m.f

$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$, for $k \in \{0, 1, 2, 3, \ldots\}$.

Typically, such a distribution is found useful in modelling the number of events
happening in a fixed time interval, e.g. the number of radioactive particles that
decayed in 1 hour. It is important that the number of occurrences be looked at against
some reference. The reference need not always be "time". For example, it can be used to
model the number of printing mistakes per 100 words in a book, or the number of
claims from a 100 million dollar auto insurance portfolio.


An Important Note about the Popular Distributions. So far, all the distributions that we have studied can be derived just from assumptions made about the experiment or underlying process. In fact, this is what makes these distributions important and useful. For example, our knowledge of the insurance context can help us decide whether claims from different policies are independent of each other, whether there can be at most 1 claim from each policy, and whether the policies in the portfolio are of similar risk. If these business conditions are met, then the Binomial naturally follows as the distribution for the number of claims. The same is true of the Poisson distribution. We will just state the underlying assumptions for the Poisson. The derivation of why they lead to the Poisson distribution is beyond the scope of this note.

Suppose X is a random variable that counts the number of events happening in a time interval of length 1. Then X will follow $\mathrm{Poisson}(\lambda)$ if the following conditions are met:

Assumptions underlying the Poisson Distribution

(i) In any very small interval of length $\Delta t$, there can be at most one occurrence.

(ii) In a very small interval of length $\Delta t$, $P(\text{occurrence within the interval } (t, t + \Delta t)) \approx \lambda \Delta t$.

(iii) Occurrence or non-occurrence of events in any two disjoint time intervals are independent of each other.

1. Exercise: Expectation and Variance of $X \sim \mathrm{Poi}(\lambda)$

Show that $E[X] = V(X) = \lambda$.

Note that the expectation and variance of a Poisson random variable are equal. Such properties help while doing statistical modeling. For example, when analyzing a dataset on event occurrence, you may observe that the mean and variance in the data are close. That may be an indication that the underlying process is Poisson.
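The equality of mean and variance can be checked numerically (this is not a proof). The sketch below uses an illustrative rate of 5 and truncates the infinite sums at k = 59, where the remaining tail is negligible; both choices are assumptions made for the example.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poi(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 5.0  # illustrative rate, not from the note
probs = [poisson_pmf(k, lam) for k in range(60)]  # truncated support

total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))

print(total, mean, var)  # all three should be close to 1, lam, lam
```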

2. Exercise: An executive makes an average of five telephone calls per hour at a cost of Rs. 2 per call. Determine the probability that in any hour the cost of calls (a) exceeds Rs. 6 and (b) remains less than Rs. 10.

(Ans: a) 0.7349 b) 0.4405)
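The key step is translating the cost conditions into conditions on the number of calls. A short sketch, assuming the number of calls per hour is Poisson with mean 5 (the helper name `poisson_cdf` is illustrative):

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poi(lam)."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))


lam = 5  # average calls per hour, at Rs. 2 per call

# (a) cost > Rs. 6  <=>  more than 3 calls
p_a = 1 - poisson_cdf(3, lam)

# (b) cost < Rs. 10  <=>  fewer than 5 calls, i.e. at most 4
p_b = poisson_cdf(4, lam)

print(p_a, p_b)
```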

3. The number of breakdowns of a computer system in a month is believed to follow a Poisson distribution. It has been consistently observed in the past that the average number of monthly breakdowns is 1. Find the probability that this computer will work for 3 months (a) without any breakdown (b) with exactly one breakdown.

(Ans: a) 0.0497 b) 0.1494)

Poisson can be viewed as a limiting case of, or an approximation to, a Binomial(n, p).

4. Exercise: (Try if you like a bit of algebra!) Suppose $X \sim \mathrm{Bin}(n, p)$, and suppose $n \to \infty$ and that $np \to \lambda$. Then $P(X = k) \to \frac{e^{-\lambda}\lambda^k}{k!}$.
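Before attempting the algebra, the limit can be observed numerically. The sketch below fixes illustrative values $\lambda = 2$ and $k = 3$ (assumptions for the example), sets $p = \lambda/n$, and compares the Binomial probability with the Poisson probability as n grows.

```python
from math import comb, exp, factorial

lam, k = 2.0, 3  # illustrative values, not from the note


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


poisson = exp(-lam) * lam**k / factorial(k)

sizes = (10, 100, 1000, 10000)
gaps = [abs(binom_pmf(k, n, lam / n) - poisson) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(n, binom_pmf(k, n, lam / n), poisson, gap)
```

The gap shrinks roughly in proportion to 1/n, which is why the Poisson is a good approximation for large n and small p.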

5. Color blindness appears in 1% of a population. How large a random sample (with replacement) should one draw from the population so that the probability of it containing at least 1 color blind person is 95% or more? Use both the Binomial and the Poisson to derive the required sample sizes.

(Ans: a. 299, i.e. approximately 300, b. 300 (hint: for Poisson take $\lambda = 0.01\,n$))
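Both sample sizes come from solving an inequality for n. A sketch of the two calculations (variable names are illustrative):

```python
from math import ceil, log

p, target = 0.01, 0.95

# Binomial: 1 - (1 - p)^n >= target  <=>  n >= log(1 - target) / log(1 - p)
n_binom = ceil(log(1 - target) / log(1 - p))

# Poisson with lam = p * n: 1 - exp(-p * n) >= target  <=>  n >= -log(1 - target) / p
n_pois = ceil(-log(1 - target) / p)

print(n_binom, n_pois)
```

The two answers differ by one, which illustrates how close the Poisson approximation is when p is small.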



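Table 5.1: Discrete Random Variables / Distributions

| Name | Parameters | Values taken | p.m.f. | E[X] | V[X] |
|---|---|---|---|---|---|
| Geometric | $p$ | $k \in \{1, 2, \ldots\}$ | $(1-p)^{k-1} p$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ |
| Bernoulli | $p$ | $k \in \{0, 1\}$ | $p^k (1-p)^{1-k}$ | $p$ | $p(1-p)$ |
| Binomial | $(n, p)$ | $k \in \{0, 1, \ldots, n\}$ | $\binom{n}{k} p^k (1-p)^{n-k}$ | $np$ | $np(1-p)$ |
| Hypergeometric | $(N, R, n)$ | $k \in \{0, 1, \ldots, n\}$ with $n - (N - R) \le k \le \min(R, n)$ | $\frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}$ | $\frac{nR}{N}$ | $\frac{nR}{N}\left[\frac{(n-1)(R-1)}{N-1} + 1 - \frac{nR}{N}\right]$ |
| Negative Binomial | $(r, p)$ | $k \in \{r, r+1, \ldots\}$ | $\frac{(k-1)!}{(r-1)!\,(k-r)!}\, p^r (1-p)^{k-r}$ | $\frac{r}{p}$ | $\frac{r(1-p)}{p^2}$ |
| Poisson | $\lambda$ | $k \in \{0, 1, \ldots\}$ | $\frac{e^{-\lambda}\lambda^k}{k!}$ | $\lambda$ | $\lambda$ |
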
Let us recap the various discrete distributions we learnt through the following exercise.

6. Exercise: Determine which discrete distribution best matches each random variable described below and write down the p.m.f. clearly where possible.

a. A manufacturer of computer chips randomly selects 100 chips in order to determine the percentage defective. X = number of defectives in the 100 sampled chips.

b. An MBA grad keeps appearing for recruitment interviews until he gets selected for two jobs. Then he selects the better of the two. The probability of his getting selected for any job is 0.6 and this event is independent of other jobs. X = number of interviews he takes.

c. Of five applicants for a job, two will be selected. Although all are equally qualified, only three of the applicants have the ability to fulfill the expectations of the company. Suppose two selections are made at random and X = number of selected applicants who can fulfill the company's expectations.

d. Let X = number of calls received at a call center on a given working day.

e. An opinion poll is conducted by taking a sample of 1000 randomly chosen voters out of one million total voters, to determine how many people support a particular party. X = number of people in the sample who support the party.

CHAPTER 6

Joint Distributions of more than one Random Variable

Consider a simple problem. Suppose 3 distinguishable balls are distributed randomly into 3 cells. Let $X_1$ = number of balls occupying cell 1 and $X_2$ = number of balls occupying cell 2. Then $X_1 \sim \mathrm{Bin}(3, \frac{1}{3})$. Also, $X_2 \sim \mathrm{Bin}(3, \frac{1}{3})$.

Question: Is $P(X_1 = n_1, X_2 = n_2) = P(X_1 = n_1)\,P(X_2 = n_2)$?
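One way to explore the question is to enumerate all 27 equally likely allocations and compare a joint probability with the product of the marginals. The sketch below does this with exact fractions; it is an illustration, not part of the original note.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each outcome records the cell (1, 2 or 3) chosen by each of the 3 balls.
outcomes = list(product([1, 2, 3], repeat=3))

joint = Counter()
for cells in outcomes:
    x1, x2 = cells.count(1), cells.count(2)
    joint[(x1, x2)] += Fraction(1, len(outcomes))

marg1, marg2 = Counter(), Counter()
for (x1, x2), pr in joint.items():
    marg1[x1] += pr
    marg2[x2] += pr

# All 3 balls cannot be in cell 1 and in cell 2 at the same time.
print(joint[(3, 3)], marg1[3] * marg2[3])
```

Since the joint probability of $(3, 3)$ is zero while the product of the marginals is not, the factorization fails for at least one pair of values.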

Knowing the joint distribution of discrete random variables $(X_1, X_2, \ldots, X_k)$ means knowing the "joint probability mass function" $P(X_1 = a_1, X_2 = a_2, \ldots, X_k = a_k)$ or equivalently the "joint cumulative distribution function" $P(X_1 \le a_1, X_2 \le a_2, \ldots, X_k \le a_k)$ for all possible values of $(a_1, a_2, \ldots, a_k)$.

Basically, we are again asking the same two basic questions:

1. What possible combinations of values can the random variables $(X_1, X_2, \ldots, X_k)$ take?

2. With what probabilities?

In principle, we can talk about joint distributions of many random variables, some or all of which may be continuous. For continuous random variables, it does not make sense to ask for the probability of a particular value, so we work with the joint cumulative distribution function instead.

So in general, knowing the joint distribution of any discrete or continuous random variables $(X_1, X_2, \ldots, X_k)$ means knowing $P(X_1 \le a_1, X_2 \le a_2, \ldots, X_k \le a_k)$ for all possible values of $(a_1, a_2, \ldots, a_k)$.


Multinomial Distribution: Suppose n distinguishable balls are distributed randomly into k cells such that the probability of a ball occupying cell i is $p_i$, for $i \in \{1, 2, \ldots, k\}$ and $p_1 + p_2 + \cdots + p_k = 1$. Let $X_i$ = number of balls occupying cell i. Then,

$$P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

for all $n_i \in \{0, 1, 2, \ldots, n\}$ such that $n_1 + n_2 + \cdots + n_k = n$.

This is a generalization of the Binomial distribution, which can be thought of as resulting from distributing n distinguishable balls into 2 cells (H and T). The first term in the product on the R.H.S. is the number of ways to allocate the occurrence of faces 1, 2, ..., k into n places so that $n_1$ of type 1, $n_2$ of type 2, etc. occur. Each such allocation has probability $p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$.
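The formula translates directly into code. The following is a minimal sketch (the function name `multinomial_pmf` is illustrative), with a check that the case k = 2 reproduces the Binomial p.m.f.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) for cell counts and cell probabilities."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # exact: each intermediate quotient is an integer
    return coef * prod(p**c for p, c in zip(probs, counts))


# 3 balls into 3 equally likely cells, one ball in each cell: 3!/27
one_each = multinomial_pmf([1, 1, 1], [1 / 3, 1 / 3, 1 / 3])

# With k = 2 cells the formula reduces to the Binomial p.m.f.
two_cells = multinomial_pmf([2, 3], [0.4, 0.6])
binomial = comb(5, 2) * 0.4**2 * 0.6**3

print(one_each, two_cells, binomial)
```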
1. Exercise: Suppose a 6-faced die has 2 faces numbered 1, 3 faces numbered 2 and 1 face numbered 3. Suppose the die is thrown independently 10 times and in each throw each face has an equal chance of showing up.

a. What is the joint distribution of $(X_1, X_2, X_3)$, where $X_i$ = number of times the number i shows up?

b. What is the distribution of $X_1 + X_2$?

Marginal Distribution: Suppose we know the joint distribution of $(X, Y)$. Suppose $a_1, a_2, \ldots$ are the values taken by X and $b_1, b_2, \ldots$ are the values taken by Y. Then,

$$P(X = a) = \sum_{j \ge 1} P(X = a, Y = b_j)$$

Note that this is not new. The event $\{X = a\}$ is the union of the disjoint events $\{X = a, Y = b_j\}$, $j \ge 1$, and the equality just follows by applying the axioms of probability. When we are studying jointly distributed random variables, the distributions of the individual random variables are referred to as "marginal distributions".


Table 6.1: Joint Distribution. Each entry in the cells (i.e. apart from the row and column headings) is the probability that X takes the value in the column heading and Y takes the value in the row heading.

| Y values \ X values | -1 | 0 | 2 | 6 |
|---|---|---|---|---|
| -2 | 1/9 | 1/27 | 1/27 | 1/9 |
| 1 | 2/9 | 0 | 1/9 | 1/9 |
| 3 | 0 | 0 | 1/9 | 4/27 |

2. Exercise: X and Y are random variables with the joint distribution as in table 6.1.

a. Find the marginal distribution of X.

b. Find the marginal distribution of Y.
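Marginals are obtained by summing the joint table along a row or a column. The sketch below assumes the entries of table 6.1 are read with X in the columns (values -1, 0, 2, 6) and Y in the rows (values -2, 1, 3); the dictionary layout and function name are illustrative.

```python
from fractions import Fraction as F

x_vals = [-1, 0, 2, 6]
rows = {  # key: value of Y; list entries follow the order of x_vals
    -2: [F(1, 9), F(1, 27), F(1, 27), F(1, 9)],
    1: [F(2, 9), F(0), F(1, 9), F(1, 9)],
    3: [F(0), F(0), F(1, 9), F(4, 27)],
}
joint = {(x, y): rows[y][i] for y in rows for i, x in enumerate(x_vals)}


def marginal(joint, axis):
    """Sum the joint p.m.f. over the other variable (axis 0 = X, 1 = Y)."""
    out = {}
    for key, pr in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + pr
    return out


marg_x = marginal(joint, 0)
marg_y = marginal(joint, 1)
print(marg_x)
print(marg_y)
```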


Independence of Random Variables

(i) Two random variables X and Y are said to be independent if

$$P(X = a, Y = b) = P(X = a)\,P(Y = b) \quad \text{for all possible values } (a, b) \text{ of } (X, Y),$$

or, in an equivalent definition that also covers continuous random variables,

$$P(X \le a, Y \le b) = P(X \le a)\,P(Y \le b) \quad \text{for all possible values } (a, b) \text{ of } (X, Y).$$

(ii) Multiple random variables $X_1, X_2, \ldots, X_k$ are independent if

$$P(X_1 = a_1, X_2 = a_2, \ldots, X_k = a_k) = P(X_1 = a_1)\,P(X_2 = a_2) \cdots P(X_k = a_k)$$

or, in an equivalent definition that also covers continuous random variables,

$$P(X_1 \le a_1, X_2 \le a_2, \ldots, X_k \le a_k) = P(X_1 \le a_1)\,P(X_2 \le a_2) \cdots P(X_k \le a_k)$$

for all possible $(a_1, a_2, \ldots, a_k)$.


Returning to Exercise 2 (table 6.1):

c. Are X and Y independent?

d. Find P(Y is even)

e. Find P(XY is odd)

f. Find P( X > 0, Y 0)

g. Find E[ X ] and E[Y ]

h. Find V [ X ] and V [Y ].

Covariance: In practical problems involving multiple random variables, it becomes important to study the association between them. One measure of linear association is "covariance". Loosely speaking, it measures how two random variables vary linearly with respect to each other. Mathematically, the covariance between random variables X and Y is denoted by Cov(X, Y) and defined as

$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

It is easy to see that, alternatively,

$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y]$$

Correlation Coefficient: Covariance depends on the units of the measured variables. A related, unit-free measure is the correlation coefficient, which is defined as follows:

$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}}$$

You will learn more about correlations when you study "Regression Modeling".

i. Find Cov(X, Y)

j. Find the correlation coefficient between X and Y.

Important Result: Covariance of independent random variables

If X, Y are independent, then Cov(X, Y) = 0, or equivalently E[XY] = E[X] E[Y].

k. Find $E[X \mid Y = i]$ for $i = -2, 1, 3$. (Hence note that $E[X \mid Y]$ is a function of Y.)


Conditional Distribution of X given Y = y (denoted X|Y = y) is nothing but the specification of $P(X = x \mid Y = y)$ for all possible values x of X, given the particular value y of Y.

l. Find $E[Y \mid X = j]$ for $j = -1, 0, 2, 6$.

m. Let g(Y ) = E[ X |Y ]. Verify that E[ g(Y )] = E[ E[ X |Y ]] = E[ X ]

Important Result: Expectation and Variance of Conditional Expectation

More generally, the following important equalities hold. These will be stated without proof.

For any two random variables X and Y,

$$E[X] = E\big[E[X \mid Y]\big]$$
$$V[X] = V\big(E[X \mid Y]\big) + E\big(V[X \mid Y]\big)$$
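Both identities can be verified exactly on a small joint table. The sketch below uses the entries of table 6.1, assuming X is read from the columns (-1, 0, 2, 6) and Y from the rows (-2, 1, 3); all names are illustrative.

```python
from fractions import Fraction as F

x_vals = [-1, 0, 2, 6]
rows = {  # key: value of Y; list entries follow the order of x_vals
    -2: [F(1, 9), F(1, 27), F(1, 27), F(1, 9)],
    1: [F(2, 9), F(0), F(1, 9), F(1, 9)],
    3: [F(0), F(0), F(1, 9), F(4, 27)],
}

# Marginal of Y, then conditional mean and variance of X given each y
p_y = {y: sum(rows[y]) for y in rows}
cond_mean = {
    y: sum(x * pr for x, pr in zip(x_vals, rows[y])) / p_y[y] for y in rows
}
cond_var = {
    y: sum((x - cond_mean[y]) ** 2 * pr for x, pr in zip(x_vals, rows[y])) / p_y[y]
    for y in rows
}

# Unconditional mean and variance of X
e_x = sum(x * rows[y][i] for y in rows for i, x in enumerate(x_vals))
v_x = sum((x - e_x) ** 2 * rows[y][i] for y in rows for i, x in enumerate(x_vals))

# E[E[X|Y]], V(E[X|Y]) and E(V[X|Y])
tower = sum(cond_mean[y] * p_y[y] for y in rows)
var_of_cond_mean = sum((cond_mean[y] - tower) ** 2 * p_y[y] for y in rows)
mean_of_cond_var = sum(cond_var[y] * p_y[y] for y in rows)

print(e_x, tower)
print(v_x, var_of_cond_mean + mean_of_cond_var)
```

Because exact fractions are used, the two sides of each identity agree exactly rather than only up to rounding.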
3. Exercise: Let $X \sim \mathrm{Poi}(2)$ and $Y \sim \mathrm{Poi}(3)$. Let X and Y be independent.

a. What is the distribution of X + Y?

b. What is the distribution of X | X + Y = 10?

4. Exercise: Customers arrive at a bank according to a Poisson distribution at a rate of 5 per hour. Each customer arrives independently of the others and is male with probability 0.3 and female with probability 0.7. Let X = number of male customers arriving in an hour and Y = number of female customers arriving in an hour.

a. What is the joint distribution of (X, Y)?

b. Are X and Y independent?

c. What are the marginal distributions of X and Y?

Caution: Independence of X and Y implies Cov(X, Y) = 0, but Cov(X, Y) = 0 does not necessarily imply independence.


5. Exercise X and Y are random variables with the joint distribution as in table 6.2.

a. Find the marginal distribution of X.

b. Find the marginal distribution of Y.


Table 6.2: Joint Distribution

| Y values \ X values | -1 | 0 | 1 |
|---|---|---|---|
| -1 | 1/6 | 1/6 | 0 |
| 0 | 0 | 1/6 | 1/6 |
| 1 | 1/6 | 1/6 | 0 |

c. What is Cov(X, Y)?

d. Based on c, can we conclude that X and Y are independent?

e. Are X and Y independent?
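The caution about zero covariance can be checked mechanically. The sketch below enters table 6.2 as a dictionary keyed by (x, y), assuming X is read from the columns and Y from the rows, then computes the covariance and tests the factorization condition cell by cell.

```python
from fractions import Fraction as F

joint = {  # key: (x, y)
    (-1, -1): F(1, 6), (0, -1): F(1, 6), (1, -1): F(0),
    (-1, 0): F(0), (0, 0): F(1, 6), (1, 0): F(1, 6),
    (-1, 1): F(1, 6), (0, 1): F(1, 6), (1, 1): F(0),
}

p_x, p_y = {}, {}
for (x, y), pr in joint.items():
    p_x[x] = p_x.get(x, 0) + pr
    p_y[y] = p_y.get(y, 0) + pr

e_x = sum(x * pr for x, pr in p_x.items())
e_y = sum(y * pr for y, pr in p_y.items())
e_xy = sum(x * y * pr for (x, y), pr in joint.items())
cov = e_xy - e_x * e_y

# Independence requires the joint to factorize in every cell.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)

print(cov, independent)
```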

CHAPTER 7

Sums of random variables

In many practical situations it is common to encounter sums of random variables. Here are a few examples:

The claim from an insurance policy is a random variable. The total claims from an insurance portfolio is the sum of many random variables.

Daily sales of a company is a random variable. Yearly sales is the sum of daily sales over many days.

The total return in rupees from one stock in an investment portfolio is a random variable. The total return from the portfolio is the sum of the returns from the individual stocks.

Some General Facts

(i) $E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$

(ii) $V\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} V(X_i) + 2 \sum_{1 \le i < j \le n} \mathrm{Cov}(X_i, X_j)$

Note that if the $X_i$'s are independent, then $\mathrm{Cov}(X_i, X_j) = 0$ for $i \ne j$. Hence,

(iii) If $X_1, X_2, \ldots, X_n$ are independent, then

$$V(X_1 + X_2 + \cdots + X_n) = V(X_1) + V(X_2) + \cdots + V(X_n)$$

Further suppose $X_1, X_2, \ldots$ are independent and identically distributed (iid). Since they are iid, they will all have the same mean and variance. As a consequence of the above observations, we have the following identities that you will come across when you study Survey Sampling.


(iv) If $X_1, X_2, \ldots, X_n$ are iid with $E[X_i] = \mu$ and $V(X_i) = \sigma^2$, then

$$E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \mu, \qquad V\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{\sigma^2}{n}.$$

This result is extremely useful in practice. It says that whatever the variability of the individual random variables, the variability of the average of many such random variables will be smaller and will decrease as n increases.

e.g. The average return from a portfolio of stocks is more predictable than the return from each individual stock.

e.g. The number of insurance claims from a single policy is less predictable than the average number of claims per policy across an insurance portfolio.
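The $\sigma^2/n$ behaviour can be seen in a small simulation. The sketch below is only illustrative: it assumes iid U[0, 1] observations (variance 1/12), a fixed random seed and 5000 replications, none of which come from the note.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is reproducible


def variance_of_mean(n, reps=5000):
    """Empirical variance of the average of n iid U[0, 1] draws."""
    means = [
        statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)
    ]
    return statistics.pvariance(means)


for n in (1, 5, 25):
    # simulated variance of the average next to the theoretical sigma^2 / n
    print(n, variance_of_mean(n), (1 / 12) / n)
```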

1. Exercise: In a constituency, there are 2 political candidates, A and B, standing for election. For an exit poll, a researcher wants to interview n people and estimate the percentage of people in support of A. How many people should she interview so that the standard deviation of the estimated proportion is not more than 0.01?
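One way to approach this: the estimated proportion is an average of n iid Bernoulli(p) variables, so its standard deviation is $\sqrt{p(1-p)/n}$. Since p is unknown, the sketch below assumes the worst case p = 1/2, which maximizes p(1 - p); that assumption, and the function name, are not from the note.

```python
from fractions import Fraction
from math import ceil


def min_sample_size(max_sd, p=Fraction(1, 2)):
    """Smallest n with sqrt(p(1-p)/n) <= max_sd, i.e. n >= p(1-p)/max_sd^2."""
    return ceil(p * (1 - p) / Fraction(max_sd) ** 2)


n = min_sample_size("0.01")                    # worst case p = 1/2
n_30 = min_sample_size("0.01", Fraction(3, 10))  # if support were known to be 30%
print(n, n_30)
```

Exact fractions are used so that the ceiling is not affected by floating-point rounding.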

Facts specific to some distributions

(i) If $X_1, X_2, \ldots, X_k$ are independent $\mathrm{Bin}(n_1, p), \mathrm{Bin}(n_2, p), \ldots, \mathrm{Bin}(n_k, p)$ respectively, then $X_1 + X_2 + \cdots + X_k \sim \mathrm{Bin}(n_1 + n_2 + \cdots + n_k, p)$.

Why? Because this is the same as the sum of $n_1 + n_2 + \cdots + n_k$ independent Bernoulli(p) random variables.

CAUTION: If $X_1 \sim \mathrm{Bin}(n, p_1)$ and $X_2 \sim \mathrm{Bin}(n, p_2)$, where $p_1 \ne p_2$, then $X_1 + X_2$ is NOT Binomial.

(ii) If $X_1, X_2, \ldots, X_k$ are independent $\mathrm{Poi}(\lambda_1), \mathrm{Poi}(\lambda_2), \ldots, \mathrm{Poi}(\lambda_k)$ respectively, then $X_1 + X_2 + \cdots + X_k \sim \mathrm{Poi}(\lambda_1 + \lambda_2 + \cdots + \lambda_k)$.

(iii) If $X_1, X_2, \ldots, X_k$ are independent Geometric(p), then $X = X_1 + X_2 + \cdots + X_k \sim \mathrm{NegBin}(k, p)$.

Why?

What are E[X] and V[X]?


2. Exercise: According to a study conducted on the eating habits of the adult population in a country, it is found that 25% of males and 20% of females never eat breakfast. Suppose a sample of 5 men and 5 women is chosen. What is the probability that

a. At least 2 of the 10 never eat breakfast?

b. The number of women who eat breakfast is at least as large as the number of men who eat breakfast?

c. What would have been the answer to (a) if the percentages of men and women who never eat breakfast were both equal to 20%?

(Ans. a) Hint: X = number of men not eating, Y = number of women not eating. $P(X + Y \ge 2) = 1 - P(X + Y = 0) - P(X + Y = 1) = 1 - 0.75^5 \cdot 0.8^5 - 5 \cdot 0.75^5 \cdot 0.2 \cdot 0.8^4 - 5 \cdot 0.25 \cdot 0.75^4 \cdot 0.8^5$. b) Hint: $P(X \ge Y) = \sum_{k=0}^{5} P(X \ge k, Y = k) = \sum_{k=0}^{5} P(X \ge k)\,P(Y = k) = \sum_{k=0}^{5} \sum_{r=k}^{5} P(X = r)\,P(Y = k)$, and use $X \sim \mathrm{Bin}(5, 0.25)$ and $Y \sim \mathrm{Bin}(5, 0.2)$. c) $P(Z \ge 2)$ where $Z \sim \mathrm{Bin}(10, 0.2)$.)
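The hints can be evaluated numerically as follows. This is a sketch that assumes, as the hints do, that X and Y are independent Binomials; the variable names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


px = [binom_pmf(k, 5, 0.25) for k in range(6)]  # men who never eat breakfast
py = [binom_pmf(k, 5, 0.20) for k in range(6)]  # women who never eat breakfast

# a) P(X + Y >= 2) = 1 - P(X + Y = 0) - P(X + Y = 1)
p_a = 1 - px[0] * py[0] - (px[1] * py[0] + px[0] * py[1])

# b) P(X >= Y) as the double sum from the hint
p_b = sum(px[r] * py[k] for k in range(6) for r in range(k, 6))

# c) equal percentages: X + Y ~ Bin(10, 0.2)
p_c = 1 - binom_pmf(0, 10, 0.2) - binom_pmf(1, 10, 0.2)

print(p_a, p_b, p_c)
```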
CHAPTER 8

Continuous Random Variables

A good way to understand the idea behind a continuous random variable is by drawing a parallel with discrete random variables; see table 8.1. We will mainly study the uniform, exponential, normal and gamma distributions. A quick summary is given in table 8.2.

Recall that for any random variable, be it discrete or continuous, the c.d.f. $F(\cdot)$ is right continuous. For a continuous random variable we need it to be continuous from both left and right.

Definition of "Continuous Random Variable": A random variable X with c.d.f. $F(\cdot)$ is said to be continuous if F(x) is a continuous function of x.

1. Exercise: If X has p.d.f. $f(x) = 4x^3$, $0 < x < 1$, then

a. Find the c.d.f. of X.

b. Find the c.d.f. of $Y = X^2$.

c. Find the p.d.f. of Y.

2. Exercise: If X is a continuous r.v. with c.d.f. F, then

a. What is the distribution of Y = F(X)?

Solution:

A simpler argument, but not fully rigorous, would be as follows:

$$\{Y \le y\} \Leftrightarrow \{F(X) \le y\} \Leftrightarrow \{X \le F^{-1}(y)\}$$

So, $P(Y \le y) = P(X \le F^{-1}(y)) = F(F^{-1}(y)) = y$.

Hence $Y = F(X) \sim U[0, 1]$.

Table 8.1: Discrete versus Continuous random variables

| | Discrete R.V. | Continuous R.V. |
|---|---|---|
| Values | Takes discrete values, e.g. $X \in \{0, 1, 2, \ldots\}$ or $X \in \{\frac{2}{3}, \frac{5}{3}\}$ etc. In general $X \in \{a_1, a_2, \ldots, a_k, \ldots\}$ | Takes values in a continuum, e.g. $X \in (0, \infty)$, $X \in (-\infty, \infty)$, $X \in [2, 7]$, $X \in [2, 7] \cup [10, 11]$. In general $X \in S \subseteq \mathbb{R}$ |
| Probability mass function (p.m.f.) | $P(X = a_k)$ | not applicable |
| Cumulative distribution function (c.d.f.) | $F(x) = P(X \le x) = \sum_{k:\, a_k \le x} P(X = a_k)$ | $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$ |
| Probability density function (p.d.f.) | not applicable | $f(x) = \frac{d}{dx} F(x)$ |
| Probability of a set is obtained by | adding probabilities of individual values | computing the area under the density curve |
| Example distributions | Geometric, Binomial, Hypergeometric, Negative Binomial, Poisson | Uniform, Exponential, Normal, Gamma |
| $E[X]$ | $\sum_k a_k\, P(X = a_k)$ | $\int_{-\infty}^{\infty} x f(x)\,dx$ |
| $V[X]$ | $\sum_k (a_k - E[X])^2\, P(X = a_k)$ | $\int_{-\infty}^{\infty} (x - E[X])^2 f(x)\,dx$ |


A rigorous argument would be

$$\{Y \le y\} \Leftrightarrow \{F(X) \le y\} \Leftrightarrow \{X \le x_0\}, \quad \text{where } x_0 = \sup\{x : F(x) = y\}$$

So, $P(Y \le y) = P(X \le x_0) = F(x_0) = y$ by continuity.

b. Suppose you know how to simulate a random number from U[0, 1]. How would you simulate from F?

Simulate first a number u from U[0, 1] and then compute $x = F^{-1}(u)$. The above result says that such a generated x will be a random number from the distribution F.
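As an illustration of this recipe, the sketch below simulates from an exponential distribution, whose c.d.f. $F(x) = 1 - e^{-\lambda x}$ has the explicit inverse $F^{-1}(u) = -\ln(1 - u)/\lambda$. The rate 2, the seed and the sample size are assumptions chosen for the example.

```python
import math
import random


def sample_exponential(lam):
    """Draw one exponential(lam) variate by inverting its c.d.f."""
    u = random.random()              # u ~ U[0, 1)
    return -math.log(1 - u) / lam    # x = F^{-1}(u)


random.seed(1)  # fixed seed so that the run is reproducible
lam = 2.0       # illustrative rate
draws = [sample_exponential(lam) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)

print(sample_mean, 1 / lam)  # the sample mean should be close to 1/lam
```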

8.1 Uniform Distribution

This is perhaps the simplest of the continuous distributions. For example, suppose a bank knows that the recovery on a loan it has given to a customer, in case there is a default, is anywhere between 60% and 80% of the outstanding loan amount. Suppose, in the absence of any other information, the bank believes that these values are equally likely. Then, we are essentially saying that X = recovery rate on the loan follows a uniform distribution on the interval [0.6, 0.8], i.e. $X \sim U[0.6, 0.8]$. In general, a uniform distribution can be on any interval $[\alpha, \beta]$, and we write $U[\alpha, \beta]$. The mean, variance, p.d.f., c.d.f. etc. are summarized in table 8.2.

Table 8.2: Continuous Distributions

| | Uniform | Exponential | Normal or Gaussian |
|---|---|---|---|
| Notation | $U[\alpha, \beta]$ | $\exp(\lambda)$, mean $= \frac{1}{\lambda}$ | $N(\text{mean} = \mu, \text{variance} = \sigma^2)$ |
| p.d.f. $f(x)$ | $\frac{1}{\beta - \alpha}$ for $\alpha \le x \le \beta$; 0 otherwise | $\lambda e^{-\lambda x}$ for $x > 0$; 0 for $x \le 0$ | $\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$ for $-\infty < x < \infty$ |
| c.d.f. $F(x)$ | 0 for $x < \alpha$; $\frac{x - \alpha}{\beta - \alpha}$ for $\alpha \le x \le \beta$; 1 for $x > \beta$ | 0 for $x \le 0$; $1 - e^{-\lambda x}$ for $x > 0$ | No closed form expression. Need to use tables or computer |
| Mean $E[X]$ | $\frac{\alpha + \beta}{2}$ | $\frac{1}{\lambda}$ | $\mu$ |
| Variance | $\frac{(\beta - \alpha)^2}{12}$ | $\frac{1}{\lambda^2}$ | $\sigma^2$ |


1. Exercise: A job takes anywhere between 0 and 1 hour to complete. Assume that the time to completion is uniformly distributed.

a. What is the probability that it will take at most 5 minutes to complete?

b. What is the probability that it will take 10 to 20 minutes to complete?

c. What are the expected time and the s.d. of the time to completion?

d. Given that a job took less than 20 minutes to complete, what is the expected time for its completion?

2. Exercise: Trains headed to destination A arrive at the station at 15-minute intervals starting at 7:10 A.M., whereas trains headed to destination B arrive at 15-minute intervals starting at 7 A.M. If a certain passenger arrives at the station at a time that is uniformly distributed between 7 and 8 A.M., and then gets on the first train that arrives, what is the probability that the passenger travels to A?

(Ans: 2/3)


8.2 Exponential Distribution

Recall the Geometric distribution, which was a discrete waiting time distribution. The Exponential is a continuous waiting time distribution. It is the only continuous distribution on $(0, \infty)$ that satisfies the "memoryless property", i.e. $P(X > t + s \mid X > t) = P(X > s)$. Basically, the fact that an event has not occurred until some time t has no bearing on the probability of it happening within a further time s. The past is irrelevant while determining future waiting times. See table 8.2 for details on the p.d.f., c.d.f., etc.

1. Exercise: If X is exponentially distributed, show that $P(X > t + s \mid X > t) = P(X > s)$.

This is known as the "memoryless" property.

2. Exercise: Jones figures that the total number of thousands of miles that an auto can be driven before it would need to be junked is exponential with parameter $\lambda = \frac{1}{20}$ (i.e. a mean of 20 thousand miles). Smith has a used car that he claims has been driven only 10,000 miles. If Jones purchases the car, what is the probability that she would get at least 20,000 additional miles out of it?
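The memoryless property makes the 10,000 miles already driven irrelevant. The sketch below checks this numerically, assuming the parameter 1/20 is the rate $\lambda$ (so that mileage is measured in thousands and the mean is 20).

```python
from math import exp

lam = 1 / 20  # assumed rate per thousand miles


def survival(x):
    """P(X > x) for X ~ exponential(lam)."""
    return exp(-lam * x)


# P(X > 10 + 20 | X > 10), computed from the definition of conditional probability
p_conditional = survival(30) / survival(10)

# P(X > 20) for a brand-new car
p_fresh = survival(20)

print(p_conditional, p_fresh)
```

Both quantities equal $e^{-1}$, as the memoryless property predicts.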

8.3 Normal or Gaussian Distribution

The Normal distribution is perhaps the most fascinating of all distributions. It makes an appearance in various interesting examples:

Sir Francis Galton, in his study of heredity, observed that the heights of parents and those of their children approximately followed a Normal distribution. His quest on this topic led to the "Regression Model" that is extensively used in practice today.

In physics, Maxwell's velocity distribution of gas particles in 3 dimensions is a tri-variate Normal distribution.

The buzzing noise that we often hear while tuning old radios is referred to as "white noise", and the various noise levels that are part of the buzz are approximately normally distributed.

While it is interesting that the normal distribution appears in various situations, there is no formal or mathematical reason why all data should be normally distributed. However, although all data need not be normally distributed, the most startling mathematical fact is that averages of a large number of observations will be approximately normally distributed. This celebrated mathematical result is known as the Central Limit Theorem, or CLT for short. The formal statement of the CLT is given later.

An important fact that is useful for computing with the Normal distribution is as follows:

$$X \sim N(\mu, \sigma^2) \implies \frac{X - \mu}{\sigma} \sim N(0, 1)$$

N(0, 1) is known as the Standard Normal Distribution.
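In practice this means that only the standard normal c.d.f. is needed, whether from tables or from a computer. A minimal sketch using the error function from Python's standard library (the function names and the numbers in the example are illustrative):

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via standardization."""
    return std_normal_cdf((x - mu) / sigma)


# Example: X ~ N(mean 3, sd 2); probability that X falls in (-3, 2]
mu, sigma = 3, 2
p_interval = normal_cdf(2, mu, sigma) - normal_cdf(-3, mu, sigma)

print(std_normal_cdf(1.96), p_interval)
```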

1. Exercise: Suppose $X \sim N(\mu = 3, \text{sd} = \sigma = 2)$. Compute

a. P( X 2).

b. P( X 3)

c. $P(-3 < X \le 2)$

d. 97.5 percentile of X.

e. 95 percentile of X.

2. Exercise: GMAT scores for a group of students are approximately normally distributed with mean 580 and SD 55. All students with a score above 650 are admitted to a business school.

a. What percentage of students are expected to be admitted to the business school?

b. What percentage of the admitted students are expected to have a score over 700?

c. What is the score of the student at the 95th percentile?


SP

Important Properties of the Normal Distribution

(i) if X N (, variance = 2 ), then for constants a and b, ( aX + b) N ( a +


IN

bb,, a2 2 ).

(ii) If X and Y are independent and are Normally distributed , then so is X + Y. In


fact, X + Y N ( E[ X + Y ], V [ X + Y ]), where as we know V [ X + Y ] = V [ X ] + V [Y ]


(iii) In general, if $X_1, X_2, \ldots, X_n$ are independent and normally distributed, then
for constants $a_1, a_2, \ldots, a_n$, $a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ is also normally distributed.

(iv) If $X_i$ are i.i.d. $N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$.

Recall that if X and Y are independent, then Cov(X, Y) = 0. However, we know
that Cov(X, Y) = 0 does not necessarily mean that X and Y are independent.
An interesting exception is when (X, Y) follow a bi-variate normal
distribution. In that case, independence and zero covariance turn out to be
equivalent. Discussion of the bi-variate normal is beyond the scope of this note.
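Properties (i), (ii) and (iv) can be checked with a short Python sketch. The distributions and constants below are hypothetical. The sketch uses the arithmetic supported by `statistics.NormalDist`: multiplying or shifting by a constant rescales the mean and standard deviation, and adding two `NormalDist` objects adds their means and variances, which is valid only when the two variables are independent.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical distributions, chosen only for illustration.
X = NormalDist(mu=3, sigma=2)      # X ~ N(3, 2^2)
Y = NormalDist(mu=-1, sigma=1.5)   # Y ~ N(-1, 1.5^2), assumed independent of X

# (i) Linear transformation: aX + b ~ N(a*mu + b, a^2 * sigma^2).
a, b = -2, 5
lin = X * a + b

# (ii) Sum of independent normals: the means add and the variances add.
total = X + Y

# (iv) Mean of n i.i.d. copies of X: N(mu, sigma^2 / n).
n = 25
xbar = NormalDist(mu=X.mean, sigma=X.stdev / sqrt(n))

print(f"(i)  aX + b : mean {lin.mean:.2f}, variance {lin.variance:.2f}")
print(f"(ii) X + Y  : mean {total.mean:.2f}, variance {total.variance:.2f}")
print(f"(iv) X-bar  : mean {xbar.mean:.2f}, variance {xbar.variance:.4f}")
```

Note that the variance of aX + b depends on a² and not on b or the sign of a, and that it is variances, not standard deviations, that add in (ii).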

3. Exercise: A machine produces bearings whose diameters are normally distributed
with mean 0.5 centimeters and SD 0.003 centimeters.

a. If the tolerance limits are 0.5 ± 0.004 centimeters, what percentage of products
would be unacceptable?

b. Suppose the contribution per bearing of acceptable size is 10 paise. Those exceeding
0.504 cm can be reworked at a cost of 6 paise per piece (resulting in a net
contribution of 4 paise) and those below 0.496 cm have to be scrapped, resulting in a
net loss of 20 paise per piece. What is the expected net contribution from a batch
of 10000 pieces?

(Ans: a. 18.24% b. Rs. 671)
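One way to arrive at these answers is sketched below in Python, using `statistics.NormalDist` from the standard library. The variable names are hypothetical. The expected contribution per piece is the probability-weighted average of the three outcomes (acceptable, reworked, scrapped).

```python
from statistics import NormalDist

diameter = NormalDist(mu=0.5, sigma=0.003)   # bearing diameter in centimeters
lower, upper = 0.496, 0.504                  # tolerance limits 0.5 +/- 0.004

p_scrap = diameter.cdf(lower)         # below 0.496 cm: scrapped
p_rework = 1 - diameter.cdf(upper)    # above 0.504 cm: reworked
p_ok = 1 - p_scrap - p_rework         # within tolerance

# a. Percentage of unacceptable pieces (either tail).
pct_unacceptable = 100 * (p_scrap + p_rework)

# b. Net contribution per piece in paise: +10 acceptable, +4 reworked, -20 scrapped.
expected_paise_per_piece = 10 * p_ok + 4 * p_rework - 20 * p_scrap
batch_size = 10_000
expected_rupees = batch_size * expected_paise_per_piece / 100   # 100 paise = 1 rupee

print(f"a. unacceptable: {pct_unacceptable:.2f}%")
print(f"b. expected net contribution: Rs. {expected_rupees:.2f}")
```

The result for (b) may differ from the answer key in the last digits depending on how the normal probabilities are rounded.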



4. A project has four phases, viz. 1, 2, 3 and 4. A phase cannot start until the previous
phase is completed. The time to completion for each phase is believed to be normally
distributed with means 6, 12, 4 and 8 weeks respectively and standard deviations
1, 3, 1 and 2 weeks respectively. Completion times of the different phases
are independent of each other.

a. What is the expected total time and SD of total time for completion of the project?

b. What is the probability that phase 3 can be started no later than 20 weeks from the start?

c. If the project is scheduled to be completed in 32 weeks, what is the probability
that it will be completed in time?

d. What should be the planned duration if a probability of 80% is specified for
in-time completion of the project?

(Ans: a. (30, 3.873) b. 0.7364 c. 69.73% d. 33.25 weeks)
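The answers above follow from the property that a sum of independent normal variables is normal, with means and variances adding. A minimal Python sketch, with a hypothetical helper `sum_of_independent_normals`, is:

```python
from math import sqrt
from statistics import NormalDist

means = [6, 12, 4, 8]   # mean duration of phases 1 to 4, in weeks
sds = [1, 3, 1, 2]      # standard deviation of each phase, in weeks

def sum_of_independent_normals(mus, sigmas):
    """Distribution of a sum of independent normals: means add, variances add."""
    return NormalDist(mu=sum(mus), sigma=sqrt(sum(s ** 2 for s in sigmas)))

total = sum_of_independent_normals(means, sds)               # whole project
first_two = sum_of_independent_normals(means[:2], sds[:2])   # phases 1 and 2 only

# b. Phase 3 can start as soon as phases 1 and 2 are both complete.
p_phase3_by_20 = first_two.cdf(20)
# c. Probability that the whole project finishes within 32 weeks.
p_done_by_32 = total.cdf(32)
# d. Duration that is met with probability 80% (the 80th percentile of total time).
planned_80 = total.inv_cdf(0.80)

print(f"a. mean {total.mean} weeks, SD {total.stdev:.3f} weeks")
print(f"b. {p_phase3_by_20:.4f}")
print(f"c. {100 * p_done_by_32:.2f}%")
print(f"d. {planned_80:.2f} weeks")
```

Small differences from the answer key in the last decimal place are expected when the key was computed from rounded table values.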


8.4 Central Limit Theorem

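The Central Limit Theorem: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed
(i.i.d.) random variables with mean $E[X_i] = \mu$ and variance $V[X_i] = \sigma^2$. Let
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
Then for large $n$, $\bar{X}$ approximately follows a Normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
Stated differently,

$$\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \to N(0, 1) \quad \text{as } n \to \infty.$$

More precisely,

$$\lim_{n \to \infty} P\left( \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \le t \right) = \Phi(t), \quad \text{where } \Phi(\cdot) \text{ is the c.d.f. of } N(0, 1).$$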

Caution!: Many people misinterpret the CLT as saying that all data should be
approximately normally distributed. That is incorrect.

The result only says that if we take averages of data points, the average, which
is also a random variable, is approximately normal for large n.

1. Simulation to understand the CLT. In Excel or R:

Step 1. Draw 50 random numbers from U[0,1] and compute their mean. (These
50 numbers are realizations of $X_1, X_2, \ldots, X_{50}$, which are i.i.d. U[0,1].)

Step 2. Repeat Step 1 1000 times and record the mean each time.

Step 3. Plot the histogram of the 1000 means.

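Step 4. Superimpose the p.d.f. of the Normal distribution with mean $\frac{1}{2}$ and variance
$\frac{1}{12 \times 50}$. (Each U[0,1] draw has mean $\frac{1}{2}$ and variance $\frac{1}{12}$, so by the CLT
the mean of 50 draws has variance $\frac{1}{12 \times 50}$.)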

What do you observe?

Repeat the above exercise with 100 samples instead of 50. Now, what do you
observe?
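For readers who prefer code to a spreadsheet, the simulation can be sketched in Python using only the standard library. This is an illustration rather than the note's own implementation: the fixed seed, the function names and the text-based comparison (observed shares in bands one standard deviation wide, in place of a plotted histogram) are all assumptions made here. The normal curve used for comparison has mean 1/2 and variance 1/(12n), which is what the CLT statement gives for the mean of n i.i.d. U[0,1] draws.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def simulate_means(sample_size, repetitions=1000, seed=0):
    """Steps 1-2: means of `repetitions` samples, each of `sample_size` U[0,1] draws."""
    rng = random.Random(seed)   # fixed seed (an assumption) so that runs are repeatable
    return [mean([rng.random() for _ in range(sample_size)])
            for _ in range(repetitions)]

def compare_with_clt(sample_size):
    """Steps 3-4 in text form: observed band shares versus normal band probabilities."""
    means = simulate_means(sample_size)
    # CLT approximation: U[0,1] has mean 1/2 and variance 1/12.
    clt = NormalDist(mu=0.5, sigma=sqrt(1 / (12 * sample_size)))
    rows = []
    for k in range(-3, 3):
        lo = clt.mean + k * clt.stdev
        hi = clt.mean + (k + 1) * clt.stdev
        observed = sum(lo <= m < hi for m in means) / len(means)
        predicted = clt.cdf(hi) - clt.cdf(lo)
        rows.append((k, observed, predicted))
    return means, clt, rows

for n in (50, 100):
    means, clt, rows = compare_with_clt(n)
    print(f"n = {n}: average of the means = {mean(means):.4f}, CLT sd = {clt.stdev:.4f}")
    for k, observed, predicted in rows:
        print(f"  [{k:+d} sd, {k + 1:+d} sd): observed {observed:.3f}, normal {predicted:.3f}")
```

The observed and normal columns should be close for both sample sizes, while the spread of the means (the CLT sd) shrinks as the sample size grows from 50 to 100.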

2. If $X \sim \mathrm{Bin}(n, p)$, then for large $n$, $X$ is approximately normally distributed with mean
$np$ and variance $np(1 - p)$.

Is the above statement true or false? Why?
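One way to explore the statement numerically, before reasoning about why, is to compare the exact binomial c.d.f. with the normal c.d.f. that has the same mean and variance. The sketch below is only an illustration: the values n = 100 and p = 0.3 are hypothetical, and the +0.5 continuity correction is a common refinement that is not discussed in this note.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

n, p = 100, 0.3   # hypothetical values, chosen only for illustration
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Largest gap between the exact c.d.f. and the normal c.d.f. over all k.
# The +0.5 is a continuity correction (an addition, not part of the note).
max_gap = max(abs(binomial_cdf(k, n, p) - approx.cdf(k + 0.5)) for k in range(n + 1))

for k in (20, 30, 40):
    print(f"P(X <= {k}): exact {binomial_cdf(k, n, p):.4f}, normal {approx.cdf(k + 0.5):.4f}")
print(f"largest gap over all k: {max_gap:.4f}")
```

Trying a smaller n, or a p close to 0 or 1, shows how the quality of the approximation depends on both parameters.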



3. An insurance portfolio has 10 policies. The first policy is expected to result in 1
claim, the 2nd policy is expected to result in 2 claims and, in general, the ith policy is
expected to result in i claims. Suppose that the number of claims from each policy
is distributed as Poisson and they are independent of each other.


a. What is the distribution of total claims from the portfolio?

b. Per CLT, what is the approximate distribution of the total number of claims
from the portfolio?
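A numerical comparison can help with this exercise. The sketch below relies on the standard fact that a sum of independent Poisson variables is again Poisson, with the rates added, and compares that exact distribution with a normal distribution having the same mean and variance. The helper `poisson_cdf`, the chosen evaluation points and the +0.5 continuity correction are assumptions made for this illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

rates = range(1, 11)      # policy i is expected to produce i claims
total_rate = sum(rates)   # rate of the total number of claims

def poisson_cdf(k, lam):
    """Exact P(N <= k) for N ~ Poisson(lam), building each term from the previous one."""
    term = exp(-lam)      # P(N = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j   # P(N = j) = P(N = j - 1) * lam / j
        total += term
    return total

# Normal approximation with the same mean and variance (both equal the Poisson rate).
approx = NormalDist(mu=total_rate, sigma=sqrt(total_rate))

print(f"total rate = {total_rate}")
for k in (45, 55, 65):
    exact = poisson_cdf(k, total_rate)
    normal = approx.cdf(k + 0.5)   # +0.5 is a continuity correction (an addition)
    print(f"P(total <= {k}): exact {exact:.4f}, normal {normal:.4f}")
```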

4. You manage a sales organization consisting of 100 people. Each salesperson is
capable of selling on average 4 items per month. Based on your experience,
you have observed that the standard deviation of sales made by each salesperson is 1.

a. What are the expected sales made by the organization in a month?

b. What is the probability that the total sales in the month exceed 410 items?
(State your assumptions clearly.)

(Ans: b. $P\left( Z > \frac{\frac{410}{100} - 4}{\frac{1}{\sqrt{100}}} \right)$)
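The expression in the answer can be evaluated with a few lines of Python. The sketch assumes, as the exercise asks the reader to state, that monthly sales of the 100 salespeople are i.i.d. with mean 4 and standard deviation 1, so that the CLT applies to their average. The variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n = 100      # number of salespeople
mu = 4       # mean monthly sales per person
sigma = 1    # standard deviation of monthly sales per person (assumed i.i.d.)

# a. Expected total sales in a month.
expected_total = n * mu

# b. Total sales > 410 is the same event as average sales > 410 / n.
#    By the CLT, the average is approximately N(mu, sigma^2 / n).
z = (410 / n - mu) / (sigma / sqrt(n))
p_exceeds_410 = 1 - NormalDist().cdf(z)

print(f"a. expected total sales: {expected_total}")
print(f"b. z = {z:.2f}, P(total > 410) is approximately {p_exceeds_410:.4f}")
```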

References

Ross, Sheldon. (2007). Introduction to Probability Models, Ninth Ed, Academic Press Elsevier.

Ross, Sheldon. (2014). A First Course in Probability, Ninth Ed, Pearson Education.

Feller, William. (1993). An Introduction to Probability Theory and Its Applications - Volume I, Ninth Wiley Eastern Reprint, Wiley Eastern Limited.

Feller, William. (1991). An Introduction to Probability Theory and Its Applications - Volume II, Sixth Wiley Eastern Reprint, Wiley Eastern Limited.

Hogg, Robert V. and Tanis, Elliot A. and Rao, Jagan Mohan. (2006). Probability and Statistical Inference, Seventh Ed, Pearson Education.

Stine, Robert E. and Foster, Dean. (2011). Statistics for Business, Pearson Education.