Final RSA With Question Papers Solution-new-May - 16

CHAPTER-1
Overview of Probability Theory and Basics of Random Variable
1. Motivation
The study of probability stems from the analysis of certain games of chance, and it has found
applications in most branches of science and engineering. In this chapter the basic concepts of probability
theory are presented.
2. Syllabus
Sr. No. of Self

Topic Fine Detailing
No. Hours Study
01 Introduction  Classical and relative- 1 hour 1 hour
to Probability frequency-based definitions of
probability;
 sets, fields, sample space and
events;
 axiomatic definition of
probability;
 joint and conditional 1 hour 1 hour
probabilities, independence,
total probability;
 Bayes’ Rule and applications. 1 hour 1 hour
 Functions of one random 1 hour 1 hour
variable
 their distribution and density 1 hour 1 hour
functions
Random variables:
Random Signal Analysis Page 1

 Definition of random variable, 1 hour 1 hour
 Cumulative Distribution Function 1 hour 1 hour

(CDF),
1 hour 1 hour
 Probability Mass Function (PMF),
Probability Density Functions
(PDF) and properties,
 Some special distributions:

1 hour 1 hour
Uniform, Gaussian and Rayleigh
distributions; Binomial, and
Poisson distributions; Mixed
Random Variables.
3. Books Recommended
1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic

Processes, 4th Edition, McGraw-Hill, 2002
2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,

4th edition, Mc-Graw Hill, 2000
3. H. Stark and J.W. Woods, Probability and Random Processes with

Applications to Signal Processing, 3e, Pearson edu
4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley
5. Miller, Probability and Random Processes-with applications to signal

processing and communication, first ed2007, Elsevier
4. Weight age in University Examination: 8-15 marks.

5. Objective
The objective of this module is to make the reader understand concepts of probability, different types of
probability & acquire an ability to compute probability in various cases.
6. Key Notation
N. Set of natural numbers
Set of integers
Q. Set of rational numbers
Set of real numbers
C. Set of complex numbers
CDF Cumulative Distribution Function
PMF Probability Mass Function
PDF Probability Density Function
RV Random Variable
Open interval on the real line
Closed interval on the real line
Semi open intervals
7. Key Definitions

1. Set: A set is a well defined collection of objects. These objects are called elements or
members of the set. Usually uppercase letters are used to denote sets.
2. Subset and Superset:
A set A is called a subset of B (or B is called the superset of A), denoted by , if all the
elements of A are also elements of B. Thus .
If A is a subset of B and there is at least one element in B which is not an element of A, then A is
called a proper subset of B. We write
3. Universal set:
We always consider all sets for the problem under consideration to be a subset of a (large) set called
the universal set. For the binary digital communication problem, the set may be considered as the
universal set. We shall denote the universal set by the symbol S.
4. Union: The union of two sets A and B, denoted by is defined as the set of
elements that are either in A or in B or both. In set builder notation, we write
5. Intersection: The intersection of two sets A and B, denoted by , is defined as the set
of elements that are common to both A and B. We can write,
6. Difference: The difference of two sets A and B, denoted by is the set of those
elements of A which do not belong to B. Thus,

Similarly,
7. Complement: The complement of a set A, denoted by , is defined as the set of all

elements which are not in A.
Clearly,
8. Disjoint: Two sets A and B are called disjoint if
In the above example, B and C are disjoint sets.
9. Venn diagram
The sets and set operations can be illustrated by means of the Venn diagrams. A rectangle is used to
represent the universal set and a circle is used to represent any set in it.
10. Random Variable:

A random variable associates the points in the sample space with real numbers.
Consider the probability space and function mapping the sample space
into the real line. Let us define the probability of a subset by
11. Conditional Probability:
For two events A and B with , the conditional probability was defined as

12. Conditional Distribution Function
Consider the event and any event B involving the random variable X . The conditional
distribution function of X given B is defined as
8. Key Relations
1. Relative-frequency based definition of probability (von Mises, 1919)
If an experiment is repeated times under similar conditions and the event occurs in times, then
2. Conditional probability
3. Axiomatic definition of probability (Kolmogorov, 1933)

4. Probability Using Counting Method
In many applications we have to deal with a finite sample space and the elementary events formed
by single elements of the set may be assumed equiprobable. In this case, we can define the probability of
the event A according to the classical definition discussed earlier:
where = number of elements favourable to A and n is the total number of elements in the sample
space .
5. Bernoulli trial
Suppose in an experiment, we are only concerned whether a particular event A has occurred or not.
We call this event as the ‘success' with probability and the complementary event as the
‘failure' with probability . Such a random experiment is called Bernoulli trial.
Probability of
Success :
Failure :
6. Binomial Law
We are interested in finding the probability of k ‘successes' in n independent Bernoulli trials.

This probability is given by
9. Theory
Basic Concepts of Set Theory
The modern approach to probability based on axiomatically defining probability as function of a set.
A background on the set theory is essential for understanding probability.
Some of the basic concepts of set theory are introduced here.
Set:
A set is a well defined collection of objects. These objects are called elements or members of the
set. Usually uppercase letters are used to denote sets.
Example 1
is a set and are its elements.
 The elements of a set are enumerated within a pair of curly brackets as shown in this example.
 Instead of listing all the elements, we can represent a set in the set-builder notation specifying some
common properties satisfied by the elements. Thus the set
represents the set . We read ' ' as 'all such that'. Particularly, if a set is infinite
having infinite number of elements or listing all the elements of the set is cumbersome, such a
representation is useful.

 If an element x is a member of the set A, we write . If x is not a member of A,
we write . In the above example, and .
 The null set or empty set is the set that does not contain any element. A null set is denoted by .
Subset and Superset
A set A is called a subset of B (or B is called the superset of A), denoted by , if all the
elements of A are also elements of B. Thus .
If A is a subset of B and there is at least one element in B which is not an element of A, then A is
called a proper subset of B. We write
Example 2 Let .
Then, .
o A set A is a subset of itself .
o Implies that .
o The null set is a subset of every set.
o If the set A is finite with n number of elements, then A has subsets.
For example, the set of binary digits has subsets.
These are: .
Universal set
We always consider all sets for the problem under consideration to be a subset of a (large) set called
the universal set. For the binary digital communication problem, the set may be considered as the
universal set. We shall denote the universal set by the symbol S.

Example 3
In discussion involving English letters, the alphabet of the English language may be considered as the
universal set.
Equality of two sets
Two sets A and B are equal if and only if they have the same elements. Thus,
o A = B if and only if and . In other words,
We take above definition of the equality of two sets to establish identities involving set theoretic operation.
Set operations
We can combine events by set operation to get other events. Following set operations are useful:
Union: The union of two sets A and B, denoted by is defined as the set of elements that
are either in A or in B or both. In set builder notation, we write
Intersection: The intersection of two sets A and B, denoted by , is defined as the set of elements
that are common to both A and B. We can write,
Difference: The difference of two sets A and B, denoted by is the set of those elements of A
which do not belong to B. Thus,

Similarly,
Complement: The complement of a set A, denoted by , is defined as the set of all elements which
are not in A.
Clearly,
Example 4
Disjoint: Two sets A and B are called disjoint if
In the above example, B and C are disjoint sets.
Venn diagram
The sets and set operations can be illustrated by means of the Venn diagrams. A rectangle is used to
represent the universal set and a circle is used to represent any set in it.
Following are a few illustrations of Venn diagram:

Properties of set operations
1. Identity properties

2. Commutative properties
3. Associative properties
4. Distributive properties
5. Complementary properties
6. De Morgan’s laws
These properties can be proved easily and verified using the Venn diagram. They can be used to
derive useful results.
For example, Putting in

Probability Concepts
Before we give a definition of probability, let us examine the following concepts:
1. Random Experiment: An experiment is a random experiment if its outcome cannot be predicted

precisely. One out of a number of outcomes is possible in a random experiment. A single
performance of the random experiment is called a trial.
2. Sample Space: The sample space is the collection of all possible outcomes of a random
experiment. The elements of are called sample points.

 A sample space may be finite, countably infinite or uncountable.
 A finite or countably infinite sample space is called a discrete sample space.
 An uncountable sample space is called a continuous sample space
3. Event: An event A is a subset of the sample space such that probability can be assigned to it. Thus

 For a discrete sample space, all subsets are events.
 is the certain event (sure to occur) and is the impossible event.
Figure 1

Consider the following examples.
Example 1. Tossing a fair coin
The possible outcomes are H (head) and T (tail). The associated sample space is It is a finite
sample space. The events associated with the sample space are: and .
Example 2. Throwing a fair die:
The possible 6 outcomes are:
The associated finite sample space is .Some events are
Example 3. Tossing a fair coin until a head is obtained
We may have to toss the coin any number of times before a head is obtained. Thus the possible outcomes
are :
H, TH,TTH,TTTH, …..
How many outcomes are there? The outcomes are countable but infinite in number. The countably infinite
sample space is .
Example 4. Picking a real number at random between -1 and +1
The associated sample space is
Clearly is a continuous sample space.
Example 5. Output of a radio receiver at any time

Suppose the output voltage of a radio receiver at any time t is a value lying between -5 V and 5V.
The associated sample space is continuous and given by
Clearly is a continuous sample space.
The probability of an event is a number assigned to the event. Let us see how we can define
probability.
Classical definition of probability (Laplace 1812)
Consider a random experiment with a finite number of outcomes If all the outcomes of the experiment
are equally likely, the probability of an event is defined by
where
Example 6 A fair die is rolled once. What is the probability of getting a ‘6’ ?
Here and
Example 7 A fair coins is tossed twice. What is the probability of getting two ‘heads'?
Here and .
Total number of outcomes is 4 and all four outcomes are equally likely.
Only outcome favourable to is {HH}

Remark
 The classical definition is limited to a random experiment which has only a finite number of
outcomes. In many experiments like that in the above examples, the sample space is finite and each
outcome may be assumed ‘equally likely.' In such cases, the counting method can be used to
compute probabilities of events.
 Consider the experiment of tossing a fair coin until a ‘head' appears. As we have discussed earlier,
there are countably infinite outcomes. Can you believe that all these outcomes are equally likely?
 The notion of equally likely is important here. Equally likely means equally probable. Thus this
definition presupposes that all events occur with equal probability. Thus the definition includes a
concept to be defined.
Relative-frequency based definition of probability (von Mises, 1919)
If an experiment is repeated times under similar conditions and the event occurs in times, then
Example 8 Suppose a die is rolled 500 times. The following table shows the frequency each face.

We see that the relative frequencies are close to . How do we ascertain that these relative frequencies
will approach to as we repeat the experiments infinite no of times?
Discussion this definition is also inadequate from the theoretical point of view.
 We cannot repeat an experiment infinite number of times.

 How do we ascertain that the above ratio will converge for all possible sequences of
outcomes of the experiment?
Axiomatic definition of probability (Kolmogorov, 1933)
We have earlier defined an event as a subset of the sample space. Does each subset of the sample
space forms an event?
The answer is yes for a finite sample space. However, we may not be able to assign probability
meaningfully to all the subsets of a continuous sample space. We have to eliminate those subsets.
The concept of the sigma algebra is meaningful now.
Definition Let be a sample space and a sigma field defined over it. Let be a mapping
from the sigma-algebra into the real line such that for each , there exists a unique
. Clearly is a set function and is called probability, if it satisfies the following three
axioms.

 The triplet is called the probability space.
 Any assignment of probability assignment must satisfy the above three axioms
 If ,
This is a special case of axiom 3 and for a discrete sample space , this simpler version may be
considered as the axiom 3. We shall give a proof of this result below.
 The events A and B are called mutually exclusive .

For any event , we can define the probability
In a special case, when the outcomes are equi-probable, we can assign equal probability p to each
elementary event.
Example 9 Consider the experiment of rolling a fair die considered in example 2.
Suppose represent the elementary events. Thus is the event of getting ‘1', is the
event of getting '2' and so on.
Since all six disjoint events are equiprobable and we get ,

Suppose is the event of getting an odd face. Then
Example 10 Consider the experiment of tossing a fair coin until a head is obtained discussed in Example 3.
Here . Let us call
and so on. If we assign, then Let is the event of obtaining

the head before the 4 th toss. Then
Probability assignment in a continuous space
Suppose the sample space S is continuous and un-countable. Such a sample space arises when the
outcomes of an experiment are numbers. For example, such sample space occurs when the experiment
consists in measuring the voltage, the current or the resistance. In such a case, the sigma algebra consists
of the Borel sets on the real line.
Suppose and is a non-negative integrable function such that,

For any Borel set ,
defines the probability on the Borel sigma-algebra B.
We can similarly define probability on the continuous space of etc.
Example 11 Suppose
Then for
Example 12 Consider the two-dimensional Euclidean space. Let and represents
the area under .
This example interprets the geometrical definition of probability.
Probability Using Counting Method
In many applications we have to deal with a finite sample space and the elementary events formed
by single elements of the set may be assumed equiprobable. In this case, we can define the probability of
the event A according to the classical definition discussed earlier:

where = number of elements favourable to A and n is the total number of elements in the sample
space .
Thus calculation of probability involves finding the number of elements in the sample space and the
event A. Combinatorial rules give us quick algebraic formulae to find the elements in .We briefly outline
some of these rules:
1. Product rule Suppose we have a set A with m distinct elements and the set B with n distinct
elements and . Then contains mn ordered pair of elements.

This is illustrated in Fig for m=5 and n=4 n other words if we can choose element a in m possible
ways and the element b in n possible ways then the ordered pair (a, b) can be chosen in mn possible
ways.
Figure 1 Illustration of the product rule
The above result can be generalized as follows: The number of distinct k -tupples in
is where represents the
number of distinct elements in .

Example 1 A fair die is thrown twice. What is the probability that a 3 will appear at least once.
Solution: The sample space corresponding to two throws of the die is illustrated in the following table.
Clearly, the sample space has elements by the product rule. The event corresponding to
getting at least one 3 is highlighted and contains 11 elements. Therefore, the required probability is .
2. Sampling with replacement and ordering
Suppose we have to choose k objects from a set of n objects. Since sampling is with ordering, each ordered
arrangements of k objects is to be considered. Further, after every choosing, the object is placed back in
the set. In this case, the number of distinct ordered k - tupples = . If a random
experiment has n outcomes, if the experiment is repeated k times, then
the total number of outcomes = .
3. Sampling without replacement and with ordering
Suppose we have to choose k objects from a set of n objects by picking one object after another at random.
In this case the first object can be chosen from n objects, the second object can be chosen from n-1 objects,
and so. Therefore, by applying the product rule, the number of distinct ordered k-tupples in this case is

The number is called the permutation of n objects taking k at a time and denoted by .Thus
Clearly,
Example 2 Birthday problem - Given a class of students, what is the probability of two students in the
class having the same birthday? Plot this probability vs. number of students and be surprised!.
Let be the number of students in the class.

The plot of probability vs number of students is shown in above table. Observe the steep rise in the
probability in the beginning. In fact this probability for a group of 25 students is greater than 0.5 and that
for 60 students onward is closed to 1. This probability for 366 or more number of students is exactly one.
Sampling without replacing and without ordering Suppose be the number of ways in which k objects
can be chosen out of a set of
n objects. In this case ordering of the objects in the set of k objects is not considered.
Note that k objects can be arranged among themselves in k ways. Therefore, if ordering of the k objects is
considered, the number of ways in which k objects can be chosen out of n objects is . This is the case
of sampling with ordering.
is also called the binomial co-efficient.
Example 3 An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls were picked at random from
the urn without replacement. What is the probability that out of the balls 4 are red, 3 are green and 2 are
blue?
Solution:
9 balls can be picked from a population of 15 balls in .
Therefore the required probability is
5. Arranging n objects into k specific groups Suppose we want to partition a set of n distinct elements
into k distinct subsets of sizes respectively so that .

Then the total number of distinct partitions is
This can be proved by noting that the resulting number of partitions is
Example 4 What is the probability that in a throw of 12 dice each face occurs twice?
Solution: The total number of elements in the sample space of the outcomes of a single throw of 12
dice is
The number of favourable outcomes is the number of ways in which 12 dice can be arranged in six
groups of size 2 each – group 1 consisting of two dice each showing 1, group 2 consisting of two dice each
showing 2 and so on.
Therefore, the total number distinct groups
Hence the required probability is
Conditional probability

Consider the probability space . Let A and B two events in . We ask the following question
–
Given that A has occurred, what is the probability of B?
The answer is the conditional probability of B given A denoted by . We shall develop the
concept of the conditional probability and explain under what condition this conditional probability is same
as .
Let us consider the case of equiprobable events discussed earlier. Let sample points be
favourable for the joint event .

This concept suggests us to define conditional probability. The probability of an event B under the condition
that another event A has occurred is called the conditional probability of B given A and defined by
We can similarly define the conditional probability of A given B , denoted by .
From the definition of conditional probability, we have the joint probability of two events
A and B as follows
Example 1 Consider the example tossing the fair die. Suppose
Example 2 A family has two children. It is known that at least one of the children is a girl. What is the
probability that both the children are girls?

A = event of at least one girl
B = event of two girls
Conditional probability and the axioms of probability
In the following we show that the conditional probability satisfies the axioms of probability.
By definition
Axiom 1:
Axiom 2:
We have,
Axiom 3:

Consider a sequence of disjoint events .
We have,
Note that the sequence is also sequence of disjoint events.
Properties of Conditional Probabilities
If , then
We have ,

Chain Rule of Probability
We have ,
We can generalize the above to get the chain rule of probability

3. Theorem of Total Probability Let be n events such that
Then for any event B,
Proof: We have and the sequence is disjoint.
Remark
(1) A decomposition of a set S into 2 or more disjoint nonempty subsets is called a partition of S. The
subsets form a partition of S if
(2) The theorem of total probability can be used to determine the probability of a complex event in terms
of related simpler events. This result will be used in Bays' theorem to be discussed to the end of the lecture.
Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at random without
replacement.
Let = event that the first ball is white and

Let = event that the first ball is black.
Clearly and form a partition of the sample space corresponding to picking two balls from the
box.
Let B = the event that the second ball is white. Then.
Independent events
Two events are called independent if the probability of occurrence of one event does not affect the
probability of occurrence of the other. Thus the events A and B are independent if
and
where and are assumed to be non-zero.
Equivalently if A and B are independent, we have
or --------------------

Two events A and B are called statistically dependent if they are not independent. Similarly, we can define
the independence of n events. The events are called independent if and only
Example 4 Consider the example of tossing a fair coin twice. The resulting sample space is given by
and all the outcomes are equiprobable.
Let be the event of getting ‘tail' in the first toss and be the event of
getting ‘head' in the second toss. Then
and
Again, so that
Hence the events A and B are independent.
Example 5 Consider the experiment of picking two balls at random discussed in example 3.
In this case, and .
Therefore, and and B are dependent.

Baye’s Theorem
This result is known as the Baye's theorem. The probability is called the a priori probability
and is called the a posteriori probability. Thus the Bays' theorem enables us to determine the a
posteriori probability from the observation that B has occurred. This result is of practical
importance and is the heart of Baysean classification, Baysean estimation etc.
Example 1
In a binary communication system a zero and a one is transmitted with probability 0.6 and 0.4
respectively. Due to error in the communication system a zero becomes a one with a probability 0.1 and a
one becomes a zero with a probability 0.08. Determine the probability (i) of receiving a one and (ii) that a
one was transmitted when the received message is one.
Let S be the sample space corresponding to binary communication. Suppose be event of
transmitting 0 and be the event of transmitting 1 and and be corresponding events of receiving 0
and 1 respectively.

Given and
Example 2 In an electronics laboratory, there are identically looking capacitors of three makes
in the ratio 2:3:4. It is known that 1% of ,1.5% of are defective. What

percentage of capacitors in the laboratory are defective? If a capacitor picked at defective is found to be
defective, what is the probability it is of make ?
Let D be the event that the item is defective. Here we have to find .
Here
The conditional probabilities are

Example 3
Box A contains 2 red chips; box B contains two white chips; and box C contains 1 red chip and 1 white chip.
A box is selected at random, and one chip is taken at random from that box. What is the probability of
selecting a white chip?
Solution. Let A be the event that Box A is randomly selected; let B be the event that Box B is randomly
selected; and let C be the event that Box C is randomly selected. Because there are three boxes that are
equally likely to be selected, P(A) = P(B) = P(C) = 1/3. Let W be the event that a white chip is randomly
selected. The probability of selecting a white chip from a box depends on the box from which the chip is
selected:
P(W | A) = 0
P(W | B) = 1
P(W | C) = 1/2
Now, a white chip could be selected in one of three ways: (1) Box A could be selected, and then a white
chip be selected from it; or (2) Box B could be selected, and then a white chip be selected from it; or (3) Box
C could be selected, and then a white chip be selected from it. That is, the probability that a white chip is
selected is:
(P(W) = P[(W/A) or (W/ B) or (W/ C)])
Then, recognizing that the events W ∩ A, W ∩ B, and W ∩ C are mutually exclusive, we get
(P(W) = P(W|A)P(A)+P(W|B)P(B)+P(W|C)P(C)
= 1/2

Example 4
In the above example, if the selected chip is white, what is the probability that the other chip in the box is
red?
Solution. The box that contains one white chip and one red chip is Box C. Therefore, we are interested in
finding P(C | W). From previous example P(W) = ½.
𝑃(𝐶∩𝑊)
P(C/ W) =
𝑃(𝑊)
P(W|C)P(C)
=
P(W|A)P(A)+P(W|B)P(B)+P(W|C)P(C)
= 1/3
Repeated Trials
In our discussions so far, we considered the probability defined over a sample space corresponding to
a random experiment. Often, we have to consider several random experiments in a sequence. For example,
the experiment corresponding to sequential transmission of bits through a communication system may be
considered as a sequence of experiments each representing transmission of single bit through the channel.
PRODUCT: Suppose two experiments and with the corresponding sample space and are
performed sequentially. Such a combined experiment is called the product of two experiments and .
Clearly, the outcome of this combined experiment consists of the ordered pair where
and . The sample space corresponding to the combined experiment is given by . The
events in S consist of all the Cartesian products of the form where is an event in and is an
event in . Our aim is to define the probability .

We can easily show that
where is the probability defined on the events of This is because, the event in
S occurs whenever in occurs, irrespective of the event in .
Also note that
Independent Experiments
In many experiments, the events and are independent for every selection of
and . Such experiments are called independent experiments.

In this case can write
Example 1
Consider the experiments of rolling a fair die and tossing a fair coin sequentially. What is the probability
that a '2' and a 'head' will occur?
Solution: Suppose is the sample space of the experiment of rolling of a six-faced fair die and is the
sample space of the experiment of tossing of a fair die.
Example 2
In a digital communication system transmitting 1 and 0, 1 is transmitted twice as often as 0. If two

bits are transmitted in a sequence, what is the probability that both the bits will be 1?
Solution:

We can similarly define the sample space corresponding to n experiments and
the Cartesian product of events
If the experiments are independent, we can write
where is probability defined on the event of .
Bernoulli trial
Suppose in an experiment, we are only concerned whether a particular event A has occurred or not.
We call this event as the ‘success' with probability and the complementary event as the
‘failure' with probability . Such a random experiment is called Bernoulli trial.
Probability of
Success :
Failure :

Binomial Law
We are interested in finding the probability of k ‘successes' in n independent Bernoulli trials.
This probability is given by
Consider n independent repetitions of the Bernoulli trial. Let S be the sample space associated with
each trial and we are interested in a particular event and its complement such that and
. If A occurs in a trial, then we have a ‘success' otherwise a 'failure'.
Thus the sample space corresponding to the n repeated trials is .
Any event in S is of the form where some s are A and remaining s are .
Using the property of independent experiment we have,
If k 's are A and the remaining n - k 's are , then
But there are number of events in S with k number of A's and n - k number of s.
For example, if , the possible events are

We also note that all the events are mutually exclusive.
Hence the probability of k successes in n independent repetitions of the Bernoulli trial is given by
The above law is known as the binomial probability law.
A typical plot of vs k for n=20 and a particluar value of p is shown in the figure.
Example 3 A fair dice is rolled 6 times. What is the probability that a 4 appears thrice?
Solution:
We have
with
And with
Example 4
A communication source emits binary symbols 1 and 0 with probability 0.6 and 0.4 respectively. What is the
probability that there will be 5 1's in a message of 20 symbols?
Solution:
Example 5 In a binary communication system, bit error occurs with a probability of . What is
the probability of getting at least one error bit in a message of 8 bits?
Solution:
Here we can consider the sample space
Approximations of the Binomial probabilities
Two interesting approximations of the binomial probabilities are very important .
Case 1: Suppose n is very large and p is very small and a constant.

This probability law is known as Poisson probability and widely used in engineering and other fields.
We shall discuss more about it in a later class.
Case 2: When n is sufficiently large and , in the neighbourhood of np(1 - p) may be

approximated as
The right hand side is an expression for normal probability law to be discussed in a later class.
Example 6 Consider the problem in Example 2.

Here,
More Problems
Question 1: A die is rolled, find the probability that an even number is obtained.
Solution to Question 1:
 Let us first write the sample space S of the experiment.
S = {1,2,3,4,5,6}
 Let E be the event "an even number is obtained" and write it down.
E = {2,4,6}
 We now use the formula of the classical probability.
P(E) = n(E) / n(S) = 3 / 6 = 1 / 2
Question 2: Two coins are tossed, find the probability that two heads are obtained.
Note: Each coin has two possible outcomes H (heads) and T (Tails).
 The sample space S is given by.
S = {(H,T),(H,H),(T,H),(T,T)}
 Let E be the event "two heads are obtained".
E = {(H,H)}
 We use the formula of the classical probability.
P(E) = n(E) / n(S) = 1 / 4
Question 3: Which of these numbers cannot be a probability?
a) -0.00001

b) 0.5
c) 1.001
d) 0
e) 1
f) 20%
 A probability is always greater than or equal to 0 and less than or equal to 1, hence only a) and c)
above cannot represent probabilities: -0.00010 is less than 0 and 1.001 is greater than 1.
Question 4: Two dice are rolled, find the probability that the sum is
a) equal to 1
b) equal to 4
c) less than 13
 a) The sample space S of two dice is shown below.
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
 Let E be the event "sum equal to 1". There are no outcomes which correspond to a sum equal to 1,
hence
P(E) = n(E) / n(S) = 0 / 36 = 0
 b) Three possible outcomes give a sum equal to 4: E = {(1,3),(2,2),(3,1)}, hence.
P(E) = n(E) / n(S) = 3 / 36 = 1 / 12
 c) All possible outcomes, E = S, give a sum less than 13, hence.
P(E) = n(E) / n(S) = 36 / 36 = 1

Question 5: A die is rolled and a coin is tossed, find the probability that the die shows an odd number and
the coin shows a head.
 The sample space S of the experiment described in question 5 is as follows
S = { (1,H),(2,H),(3,H),(4,H),(5,H),(6,H)
(1,T),(2,T),(3,T),(4,T),(5,T),(6,T)}
 Let E be the event "the die shows an odd number and the coin shows a head". Event E may be
described as follows
E={(1,H),(3,H),(5,H)}
 The probability P(E) is given by
P(E) = n(E) / n(S) = 3 / 12 = 1 / 4
Question 6: A card is drawn at random from a deck of cards. Find the probability of getting the 3 of
diamond.
 The sample space S of the experiment in question 6 is shown below

 Let E be the event "getting the 3 of diamond". An examination of the sample space shows that there
is one "3 of diamond" so that n(E) = 1 and n(S) = 52. Hence the probability of event E occurring is
given by
P(E) = 1 / 52
Question 7: A card is drawn at random from a deck of cards. Find the probability of getting a queen.
 The sample space S of the experiment in question 7 is shown above (see question 6)
 Let E be the event "getting a Queen". An examination of the sample space shows that there are 4
"Queens" so that n(E) = 4 and n(S) = 52. Hence the probability of event E occurring is given by
P(E) = 4 / 52 = 1 / 13
Question 8: A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is drawn from
the jar at random, what is the probability that this marble is white?
 We first construct a table of frequencies that gives the marbles color distributions as follows
color frequency
red 3
green 7
white 10
 We now use the empirical formula of the probability
Frequency for white color

P(E)= ________________________________________________
Total frequencies in the above table
= 10 / 20 = 1 / 2

Question 9: The blood groups of 200 people is distributed as follows: 50 have type A blood, 65 have B
blood type, 70 have O blood type and 15 have type AB blood. If a person from this group is selected at
random, what is the probability that this person has O blood type?
 We construct a table of frequencies for the the blood groups as follows
group frequency
a 50
B 65
O 70
AB 15
 We use the empirical formula of the probability
Frequency for O blood

P(E)= ________________________________________________
Total frequencies
= 70 / 200 = 0.35
Random Variable
In application of probabilities, we are often concerned with numerical values which are random in
nature. For example, we may consider the number of customers arriving at a service station at a particular
interval of time or the transmission time of a message in a communication system. These random
quantities may be considered as real-valued function on the sample space. Such a real-valued function is
called real random variable and plays an important role in describing random data. We shall introduce the
concept of random variables in the following sections.
Mathematical Preliminaries
Real-valued point function on a set

Recall that a real-valued function maps each element , a unique element .The
set is called the domain of and the set is called the range of . Clearly .
The range and domain of are shown in Figure below
A random variable associates the points in the sample space with real numbers.
Consider the probability space and function mapping the sample space into
the real line. Let us define the probability of a subset by
Such a definition will be valid if is a valid event. If is a discrete sample space, is
always a valid event, but the same may not be true if is infinite. The concept of sigma algebra is again
necessary to overcome this difficulty. We also need the Borel sigma algebra -the sigma algebra defined
on the real line.
The function is called a random variable if the inverse image of all Borel sets under is
an event. Thus, if is a random variable, then

Example 1 Consider the example of tossing a fair coin twice. The sample space is S={ HH,HT,TH,TT} and
all four outcomes are equally likely. Then we can define a random variable as follows
Here .
Example 2 Consider the sample space associated with the single toss of a fair die. The sample space is
given by .
If we define the random variable that associates a real number equal to the number on the face of
the die, then .
Axiom1
Axiom2
Axiom 3
Suppose are disjoint Borel sets. Then are distinct events in

.Therefore,

Thus the random variable X induces a probability space
Probability Distribution Function(PDF):
We have seen that the event and are equivalent and .The
underlying sample space is omitted in notation and we simply write and instead of
and respectively.
Consider the Borel set , where represents any real number. The equivalent event
is denoted as .The event can be taken as a

representative event in studying the probability description of a random variable . Any other event can
be represented in terms of this event.
For example,
and so on.
The probability is called the probability distribution function ( also
called the cumulative distribution function , abbreviated as CDF ) of and denoted by . Thus

Properties of the Distribution Function
1.
This follows from the fact that is a probability and its value should lie between 0 and 1.
2. is a non-decreasing function of . Thus, if
3. is right continuous.
4.
5.
6.
We have ,
7.

Example 1: Consider the random variable defined by
Find a) .
b) .
c) .
d) .
Solution:

Discrete, Continuous and Mixed-type Random Variables
A random variable is called a discrete random variable if is piece-wise constant. Thus is
flat except at the points of jump discontinuity. If the sample space is discrete the random variable
defined on it is always discrete.
• X is called a continuous random variable if is an absolutely continuous function of x . Thus
is continuous everywhere on and exists everywhere except at finite or countably infinite

points .
• X is called a mixed random variable if has jump discontinuity at countable number of points
and increases continuously at least in one interval of X. For a such type RV X,
where is the distribution function of a discrete RV, is the distribution function of a

continuous RV and o< p <1.
Typical plots of for discrete, continuous and mixed-random variables are shown in Figure 1,
Figure 2 and Figure 3 respectively.
Plot of Fx(x) vs. x for continuous random variable

Plot of vs. for a mixed-type random variable
Discrete Random Variables and Probability Mass functions
A random variable is said to be discrete if the number of elements in the range is finite or
accountably infinite.
First assume to be countably finite. Let be the elements of . Here the mapping
partitions into subsets .

The discrete random variable in this case is completely specified by the probability mass function
(pmf) .

Clearly,
• Suppose .Then
Figure illustrates a discrete random variable.
Figure shows Discrete Random Variable

Example 1 Consider the random variable with the distribution function
Continous Random Variables and Probability Density Functions
For a continuous random variable , is continuous everywhere. Therefore,
This implies that for

Therefore, the probability mass function of a continuous RV is zero for all .A continuous random
variable cannot be characterized by the probability mass function. A continuous random variable has a very
important chacterisation in terms of a function called the probability density function.
If is differentiable, the probability density function ( pdf) of denoted by is defined as
Interpretation of
so that
Thus the probability of lying in some interval is determined by . In that sense,
represents the concentration of probability just as the density represents the concentration of mass.

Properties of the Probability Density Function
 .
This follows from the fact that is a non-decreasing function
The pdf of the RV is given by
Remark: Using the Dirac delta function we can define the density function for a discrete random
variables.
Consider the random variable defined by the probability mass function (pmf)

The distribution function can be written as
where is the shifted unit-step function given by
Then the density function can be written in terms of the Dirac delta function as
Example 2
Consider the random variable defined with the distribution function given by,
Probability Density Function of Mixed Random Variable
Suppose is a mixed random variable with having jump discontinuity at .

As already stated, the CDF of a mixed random variable is given by
where is a discrete distribution function of and is a continuous distribution function of

. The corresponding pdf is given by
where
and is a continuous pdf. We can establish the above relations as follows.
Suppose denotes the countable subset of points on such that the random variable
is characterized by the probability mass function . Similarly, let be a continuous
subset of points on such that RV is characterized by the probability density function .
Clearly the subsets and partition the set If , then .
Thus the probability of the event can be expressed as
Taking the derivative with respect to x , we get

can be expressed as
Example 5
X is the random variable representing the life time of a device with the PDF for . Define the
following random variable
Find FY(y).

Solution:
Binomial Probability Distribution
To understand binomial distributions and binomial probability, it helps to understand binomial experiments
and some associated notation; so we cover those topics first.
Binomial Experiment
A binomial experiment is a statistical experiment that has the following properties:
 The experiment consists of n repeated trials.

 Each trial can result in just two possible outcomes. We call one of these outcomes a success and the
other, a failure.
 The probability of success, denoted by P, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the outcome on other
trials.
Consider the following statistical experiment. You flip a coin 2 times and count the number of times the
coin lands on heads. This is a binomial experiment because:
 The experiment consists of repeated trials. We flip a coin 2 times.

 Each trial can result in just two possible outcomes - heads or tails.
 The probability of success is constant - 0.5 on every trial.
 The trials are independent; that is, getting heads on one trial does not affect whether we get heads
on other trials.
Notation
The following notation is helpful, when we talk about binomial probability.
 x: The number of successes that result from the binomial experiment.

 n: The number of trials in the binomial experiment.
 P: The probability of success on an individual trial.
 Q: The probability of failure on an individual trial. (This is equal to 1 - P.)

 n!: The factorial of n (also known as n factorial).
 b(x; n, P): Binomial probability - the probability that an n-trial binomial experiment results in exactly
x successes, when the probability of success on an individual trial is P.
 nCr: The number of combinations of n things, taken r at a time.
Binomial Distribution
A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The
probability distribution of a binomial random variable is called a binomial distribution.
Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable
is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.
Number of heads Probability
0 0.25
1 0.50
2 0.25
The binomial distribution has the following properties:
 The mean of the distribution (μx) is equal to n * P .

 The variance (σ2x) is n * P * ( 1 - P ).
 The standard deviation (σx) is sqrt[ n * P * ( 1 - P ) ].
Binomial Formula and Binomial Probability
The binomial probability refers to the probability that a binomial experiment results in exactly x successes.
For example, in the above table, we see that the binomial probability of getting exactly one head in two
coin flips is 0.50.
Given x, n, and P, we can compute the binomial probability based on the binomial formula:
Binomial Formula. Suppose a binomial experiment consists of n trials and results in x successes. If the
probability of success on an individual trial is P, then the binomial probability is:
b(x; n, P) = nCx * Px * (1 - P)n - x

or
b(x; n, P) = { n! / [ x! (n - x)! ] } * Px * (1 - P)n - x

Example 1
Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?
Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of successes
is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167. Therefore, the binomial
probability is:
b(2; 5, 0.167) = 5C2 * (0.167)2 * (0.833)3

b(2; 5, 0.167) = 0.161
Cumulative Binomial Probability
A cumulative binomial probability refers to the probability that the binomial random variable falls within a
specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper
limit).
For example, we might be interested in the cumulative binomial probability of obtaining 45 or fewer heads
in 100 tosses of a coin (see Example 1 below). This would be the sum of all these individual binomial
probabilities.
b(x < 45; 100, 0.5) =

b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + ... + b(x = 44; 100, 0.5) + b(x = 45; 100, 0.5)
Example 1
What is the probability of obtaining 45 or fewer heads in 100 tosses of a coin?
Solution: To solve this problem, we compute 46 individual probabilities, using the binomial formula. The
sum of all these probabilities is the answer we seek. Thus,
b(x < 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + . . . + b(x = 45; 100, 0.5)
b(x < 45; 100, 0.5) = 0.184
Example 2
The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the same school
apply, what is the probability that at most 2 are accepted?
Solution: To solve this problem, we compute 3 individual probabilities, using the binomial formula. The sum
of all these probabilities is the answer we seek. Thus,

b(x < 2; 5, 0.3) = b(x = 0; 5, 0.3) + b(x = 1; 5, 0.3) + b(x = 2; 5, 0.3)
b(x < 2; 5, 0.3) = 0.1681 + 0.3601 + 0.3087
b(x < 2; 5, 0.3) = 0.8369
Example 3
What is the probability that the world series will last 4 games? 5 games? 6 games? 7 games? Assume that
the teams are evenly matched.
Solution: This is a very tricky application of the binomial distribution. If you can follow the logic of this
solution, you have a good understanding of the material covered in the tutorial, to this point.
In the world series, there are two baseball teams. The series ends when the winning team wins 4 games.
Therefore, we define a success as a win by the team that ultimately becomes the world series champion.
For the purpose of this analysis, we assume that the teams are evenly matched. Therefore, the probability
that a particular team wins a particular game is 0.5.
Let's look first at the simplest case. What is the probability that the series lasts only 4 games. This can occur
if one team wins the first 4 games. The probability of the National League team winning 4 games in a row
is:
b(4; 4, 0.5) = 4C4 * (0.5)4 * (0.5)0 = 0.0625
Similarly, when we compute the probability of the American League team winning 4 games in a row, we
find that it is also 0.0625. Therefore, probability that the series ends in four games would be 0.0625 +
0.0625 = 0.125; since the series would end if either the American or National League team won 4 games in
a row.
Now let's tackle the question of finding probability that the world series ends in 5 games. The trick in
finding this solution is to recognize that the series can only end in 5 games, if one team has won 3 out of
the first 4 games. So let's first find the probability that the American League team wins exactly 3 of the first
4 games.
b(3; 4, 0.5) = 4C3 * (0.5)3 * (0.5)1 = 0.25
Okay, here comes some more tricky stuff, so listen up. Given that the American League team has won 3 of
the first 4 games, the American League team has a 50/50 chance of winning the fifth game to end the
series. Therefore, the probability of the American League team winning the series in 5 games is 0.25 * 0.50
= 0.125. Since the National League team could also win the series in 5 games, the probability that the series
ends in 5 games would be 0.125 + 0.125 = 0.25.
The rest of the problem would be solved in the same way. You should find that the probability of the series
ending in 6 games is 0.3125; and the probability of the series ending in 7 games is also 0.3125.
Negative Binomial Distribution
In this lesson, we cover the negative binomial distribution and the geometric distribution. As we will see,
the geometric distribution is a special case of the negative binomial distribution.
Negative Binomial Experiment
A negative binomial experiment is a statistical experiment that has the following properties:
 The experiment consists of x repeated trials.

 Each trial can result in just two possible outcomes. We call one of these outcomes a success and the
other, a failure.
 The probability of success, denoted by P, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the outcome on other
trials.
 The experiment continues until r successes are observed, where r is specified in advance.
Consider the following statistical experiment. You flip a coin repeatedly and count the number of times the
coin lands on heads. You continue flipping the coin until it has landed 5 times on heads. This is a negative
binomial experiment because:
 The experiment consists of repeated trials. We flip a coin repeatedly until it has landed 5 times on
heads.
 Each trial can result in just two possible outcomes - heads or tails.
 The probability of success is constant - 0.5 on every trial.
 The trials are independent; that is, getting heads on one trial does not affect whether we get heads
on other trials.
 The experiment continues until a fixed number of successes have occurred; in this case, 5 heads.
Notation
The following notation is helpful, when we talk about negative binomial probability.
 x: The number of trials required to produce r successes in a negative binomial experiment.

 r: The number of successes in the negative binomial experiment.
 P: The probability of success on an individual trial.
 Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
 b*(x; r, P): Negative binomial probability - the probability that an x-trial negative binomial
experiment results in the rth success on the xth trial, when the probability of success on an
individual trial is P.
 nCr: The number of combinations of n things, taken r at a time.

Negative Binomial Distribution
A negative binomial random variable is the number X of repeated trials to produce r successes in a
negative binomial experiment. The probability distribution of a negative binomial random variable is called
a negative binomial distribution. The negative binomial distribution is also known as the Pascal
distribution.
Suppose we flip a coin repeatedly and count the number of heads (successes). If we continue flipping the
coin until it has landed 2 times on heads, we are conducting a negative binomial experiment. The negative
binomial random variable is the number of coin flips required to achieve 2 heads. In this example, the
number of coin flips is a random variable that can take on any integer value between 2 and plus infinity.
The negative binomial probability distribution for this example is presented below.
Number of coin flips Probability
2 0.25
3 0.25
4 0.1875
5 0.125
6 0.078125
7 or more 0.109375
Negative Binomial Probability
The negative binomial probability refers to the probability that a negative binomial experiment results in r
- 1 successes after trial x - 1 and r successes after trial x. For example, in the above table, we see that the
negative binomial probability of getting the second head on the sixth flip of the coin is 0.078125.
Given x, r, and P, we can compute the negative binomial probability based on the following formula:
Negative Binomial Formula. Suppose a negative binomial experiment consists of x trials and results in r
successes. If the probability of success on an individual trial is P, then the negative binomial probability is:
b*(x; r, P) = x-1Cr-1 * Pr * (1 - P)x - r

The Mean of the Negative Binomial Distribution
If we define the mean of the negative binomial distribution as the average number of trials required to
produce r successes, then the mean is equal to:
μ=r/P
where μ is the mean number of trials, r is the number of successes, and P is the probability of a success on
any given trial.
Alternative Views of the Negative Binomial Distribution
As if statistics weren't challenging enough, the above definition is not the only definition for the negative
binomial distribution. Two common alternative definitions are:
 The negative binomial random variable is R, the number of successes before the binomial
experiment results in k failures. The mean of R is:
μR = kP/Q
 The negative binomial random variable is K, the number of failures before the binomial experiment
results in r successes. The mean of K is:
μK = rQ/P
The moral: If someone talks about a negative binomial distribution, find out how they are defining the
negative binomial random variable.
On this web site, when we refer to the negative binomial distribution, we are talking about the definition
presented earlier. That is, we are defining the negative binomial random variable as X, the total number of
trials required for the binomial experiment to produce r successes.
Geometric Distribution
The geometric distribution is a special case of the negative binomial distribution. It deals with the number
of trials required for a single success. Thus, the geometric distribution is negative binomial distribution
where the number of successes (r) is equal to 1.
An example of a geometric distribution would be tossing a coin until it lands on heads. We might ask: What
is the probability that the first head occurs on the third flip? That probability is referred to as a geometric
probability and is denoted by g(x; P). The formula for geometric probability is given below.

Geometric Probability Formula. Suppose a negative binomial experiment consists of x trials and results in
one success. If the probability of success on an individual trial is P, then the geometric probability is:
g(x; P) = P * Qx - 1
Sample Problems
The problems below show how to apply your new-found knowledge of the negative binomial distribution
(see Example 1) and the geometric distribution (see Example 2).
Example 1
Bob is a high school basketball player. He is a 70% free throw shooter. That means his probability of making
a free throw is 0.70. During the season, what is the probability that Bob makes his third free throw on his
fifth shot?
Solution: This is an example of a negative binomial experiment. The probability of success (P) is 0.70, the
number of trials (x) is 5, and the number of successes (r) is 3.
To solve this problem, we enter these values into the negative binomial formula.
b*(x; r, P) = x-1Cr-1 * Pr * Qx - r
b*(5; 3, 0.7) = 4C2 * 0.73 * 0.32
b*(5; 3, 0.7) = 6 * 0.343 * 0.09 = 0.18522
Thus, the probability that Bob will make his third successful free throw on his fifth shot is 0.18522.
Example 2
Let's reconsider the above problem from Example 1. This time, we'll ask a slightly different question: What
is the probability that Bob makes his first free throw on his fifth shot?
Solution: This is an example of a geometric distribution, which is a special case of a negative binomial
distribution. Therefore, this problem can be solved using the negative binomial formula or the geometric
formula. We demonstrate each approach below, beginning with the negative binomial formula.
The probability of success (P) is 0.70, the number of trials (x) is 5, and the number of successes (r) is 1. We
enter these values into the negative binomial formula.
b*(x; r, P) = x-1Cr-1 * Pr * Qx - r
b*(5; 1, 0.7) = 4C0 * 0.71 * 0.34
b*(5; 3, 0.7) = 0.00567

Now, we demonstate a solution based on the geometric formula.
g(x; P) = P * Qx - 1
g(5; 0.7) = 0.7 * 0.34 = 0.00567
Notice that each approach yields the same answer.
Hypergeometric Distribution
The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.

This lesson describes how hypergeometric random variables, hypergeometric experiments, hypergeometric
probability, and the hypergeometric distribution are all related.
Notation
The following notation is helpful, when we talk about hypergeometric distributions and hypergeometric
probability.
 N: The number of items in the population.

 k: The number of items in the population that are classified as successes.
 n: The number of items in the sample.
 x: The number of items in the sample that are classified as successes.
 kCx: The number of combinations of k things, taken x at a time.
 h(x; N, n, k): hypergeometric probability - the probability that an n-trial hypergeometric experiment
results in exactly x successes, when the population consists of N items, k of which are classified as
successes.
Hypergeometric Experiments
A hypergeometric experiment is a statistical experiment that has the following properties:
 A sample of size n is randomly selected without replacement from a population of N items.

 In the population, k items can be classified as successes, and N - k items can be classified as failures.
Consider the following statistical experiment. You have an urn of 10 marbles - 5 red and 5 green. You
randomly select 2 marbles without replacement and count the number of red marbles you have selected.
This would be a hypergeometric experiment.
Note that it would not be a binomial experiment. A binomial experiment requires that the probability of
success be constant on every trial. With the above experiment, the probability of a success changes on
every trial. In the beginning, the probability of selecting a red marble is 5/10. If you select a red marble on
the first trial, the probability of selecting a red marble on the second trial is 4/9. And if you select a green
marble on the first trial, the probability of selecting a red marble on the second trial is 5/9.

Note further that if you selected the marbles with replacement, the probability of success would not
change. It would be 5/10 on every trial. Then, this would be a binomial experiment.
Hypergeometric Distribution
A hypergeometric random variable is the number of successes that result from a hypergeometric
experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric
distribution.
Given x, N, n, and k, we can compute the hypergeometric probability based on the following formula:
Hypergeometric Formula. Suppose a population consists of N items, k of which are successes. And a
random sample drawn from that population consists of n items, x of which are successes. Then the
hypergeometric probability is:
h(x; N, n, k) = [ kCx ] [ N-kCn-x ] / [ NCn ]
The hypergeometric distribution has the following properties:
 The mean of the distribution is equal to n * k / N .

 The variance is n * k * ( N - k ) * ( N - n ) / [ N2 * ( N - 1 ) ] .
Example 1
Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is
the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?
Solution: This is a hypergeometric experiment in which we know the following:
 N = 52; since there are 52 cards in a deck.

 k = 26; since there are 26 red cards in a deck.
 n = 5; since we randomly select 5 cards from the deck.
 x = 2; since 2 of the cards we select are red.
We plug these values into the hypergeometric formula as follows:
h(x; N, n, k) = [ kCx ] [ N-kCn-x ] / [ NCn ]

h(2; 52, 5, 26) = [ 26C2 ] [ 26C3 ] / [ 52C5 ]
h(2; 52, 5, 26) = [ 325 ] [ 2600 ] / [ 2,598,960 ] = 0.32513
Thus, the probability of randomly selecting 2 red cards is 0.32513.
Cumulative Hypergeometric Probability

A cumulative hypergeometric probability refers to the probability that the hypergeometric random
variable is greater than or equal to some specified lower limit and less than or equal to some specified
upper limit.
For example, suppose we randomly select five cards from an ordinary deck of playing cards. We might be
interested in the cumulative hypergeometric probability of obtaining 2 or fewer hearts. This would be the
probability of obtaining 0 hearts plus the probability of obtaining 1 heart plus the probability of obtaining 2
hearts, as shown in the example below.
Example 1
Suppose we select 5 cards from an ordinary deck of playing cards. What is the probability of obtaining 2 or
fewer hearts?
Solution: This is a hypergeometric experiment in which we know the following:
 N = 52; since there are 52 cards in a deck.

 k = 13; since there are 13 hearts in a deck.
 n = 5; since we randomly select 5 cards from the deck.
 x = 0 to 2; since our selection includes 0, 1, or 2 hearts.
We plug these values into the hypergeometric formula as follows:
h(x < x; N, n, k) = h(x < 2; 52, 5, 13)

h(x < 2; 52, 5, 13) = h(x = 0; 52, 5, 13) + h(x = 1; 52, 5, 13) + h(x = 2; 52, 5, 13)
h(x < 2; 52, 5, 13) = [ (13C0) (39C5) / (52C5) ] + [ (13C1) (39C4) / (52C5) ] + [ (13C2) (39C3) / (52C5) ]
h(x < 2; 52, 5, 13) = [ (1)(575,757)/(2,598,960) ] + [ (13)(82,251)/(2,598,960) ] + [ (78)(9139)/(2,598,960) ]
h(x < 2; 52, 5, 13) = [ 0.2215 ] + [ 0.4114 ] + [ 0.2743 ]
h(x < 2; 52, 5, 13) = 0.9072
Thus, the probability of randomly selecting at most 2 hearts is 0.9072.
Poisson Distribution
A Poisson distribution is the probability distribution that results from a Poisson experiment.
Attributes of a Poisson Experiment
A Poisson experiment is a statistical experiment that has the following properties:
 The experiment results in outcomes that can be classified as successes or failures.

 The average number of successes (μ) that occurs in a specified region is known.
 The probability that a success will occur is proportional to the size of the region.
 The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms. For instance, it could be a length, an area, a volume,
a period of time, etc.
Notation
The following notation is helpful, when we talk about the Poisson distribution.
 e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural logarithm
system.)
 μ: The mean number of successes that occur in a specified region.
 x: The actual number of successes that occur in a specified region.
 P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when the
mean number of successes is μ.
Poisson Distribution
A Poisson random variable is the number of successes that result from a Poisson experiment. The
probability distribution of a Poisson random variable is called a Poisson distribution.
Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson
probability based on the following formula:
Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of successes
within a given region is μ. Then, the Poisson probability is:
P(x; μ) = (e-μ) (μx) / x!
where x is the actual number of successes that result from the experiment, and e is approximately equal to
2.71828.
The Poisson distribution has the following properties:
 The mean of the distribution is equal to μ .

 The variance is also equal to μ .
Example 1
The average number of homes sold by the Acme Realty company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?
Solution: This is a Poisson experiment in which we know the following:
 μ = 2; since 2 homes are sold per day, on average.

 x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
 e = 2.71828; since e is a constant equal to approximately 2.71828.

We plug these values into the Poisson formula as follows:
P(x; μ) = (e-μ) (μx) / x!

P(3; 2) = (2.71828-2) (23) / 3!
P(3; 2) = (0.13534) (8) / 6
P(3; 2) = 0.180
Thus, the probability of selling 3 homes tomorrow is 0.180 .
Cumulative Poisson Probability
A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than
some specified lower limit and less than some specified upper limit.
Example 1
Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will
see fewer than four lions on the next 1-day safari?
Solution: This is a Poisson experiment in which we know the following:
 μ = 5; since 5 lions are seen per safari, on average.

 x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that is,
we want the probability that they will see 0, 1, 2, or 3 lions.
 e = 2.71828; since e is a constant equal to approximately 2.71828.
To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need
to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use
the Poisson formula:
P(x < 3, 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)

P(x < 3, 5) = [ (e-5)(50) / 0! ] + [ (e-5)(51) / 1! ] + [ (e-5)(52) / 2! ] + [ (e-5)(53) / 3! ]
P(x < 3, 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ] + [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x < 3, 5) = [ 0.0067 ] + [ 0.03369 ] + [ 0.084224 ] + [ 0.140375 ]
P(x < 3, 5) = 0.2650
Thus, the probability of seeing at no more than 3 lions is 0.2650.
What is the Normal Distribution?
The normal distribution refers to a family of continuous probability distributions described by the normal
equation.

The Normal Equation
The normal distribution is defined by the following equation:
Normal equation. The value of the random variable Y is:
Y = { 1/[ σ * sqrt(2π) ] } * e-(x - μ)2/2σ2
where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately
3.14159, and e is approximately 2.71828.
The random variable X in the normal equation is called the normal random variable. The normal equation
is the probability density function for the normal distribution.
The Normal Curve
The graph of the normal distribution depends on two factors - the mean and the standard deviation. The
mean of the distribution determines the location of the center of the graph, and the standard deviation
determines the height and width of the graph. When the standard deviation is large, the curve is short and
wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions look like a
symmetric, bell-shaped curve, as shown below.
The curve on the left is shorter and wider than the curve on the right, because the curve on the left has a
bigger standard deviation.
Probability and the Normal Curve
The normal distribution is a continuous probability distribution. This has several implications for probability.
 The total area under the normal curve is equal to 1.

 The probability that a normal random variable X equals any particular value is 0.
 The probability that X is greater than a equals the area under the normal curve bounded by a and
plus infinity (as indicated by the non-shaded area in the figure below).
 The probability that X is less than a equals the area under the normal curve bounded by a and
minus infinity (as indicated by the shaded area in the figure below).

Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following
"rule".
 About 68% of the area under the curve falls within 1 standard deviation of the mean.
 About 95% of the area under the curve falls within 2 standard deviations of the mean.
 About 99.7% of the area under the curve falls within 3 standard deviations of the mean.
Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal
distribution, most outcomes will be within 3 standard deviations of the mean.
To find the probability associated with a normal random variable, use a graphing calculator, an online
normal distribution calculator, or a normal distribution table. In the examples below, we illustrate the use
of Stat Trek's Normal Distribution Calculator, a free tool available on this site. In the next lesson, we
demonstrate the use of normal distribution tables.
Example 1
An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard deviation of
50 days. Assuming that bulb life is normally distributed, what is the probability that an Acme light bulb will
last at most 365 days?
Solution: Given a mean score of 300 days and a standard deviation of 50 days, we want to find the
cumulative probability that bulb life is less than or equal to 365 days. Thus, we know the following:
 The value of the normal random variable is 365 days.

 The mean is equal to 300 days.
 The standard deviation is equal to 50 days.
We enter these values into the Normal Distribution Calculator and compute the cumulative probability. The
answer is: P( X < 365) = 0.90. Hence, there is a 90% chance that a light bulb will burn out within 365 days.
Example 2
Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard
deviation of 10, what is the probability that a person who takes the test will score between 90 and 110?

Solution: Here, we want to know the probability that the test score falls between 90 and 110. The "trick" to
solving this problem is to realize the following:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )
We use the Normal Distribution Calculator to compute both probabilities on the right side of the above
equation.
 To compute P( X < 110 ), we enter the following inputs into the calculator: The value of the normal
random variable is 110, the mean is 100, and the standard deviation is 10. We find that P( X < 110 )
is 0.84.
 To compute P( X < 90 ), we enter the following inputs into the calculator: The value of the normal
random variable is 90, the mean is 100, and the standard deviation is 10. We find that P( X < 90 ) is
0.16.
We use these findings to compute our final answer as follows:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )

P( 90 < X < 110 ) = 0.84 - 0.16
P( 90 < X < 110 ) = 0.68
Thus, about 68% of the test scores will fall between 90 and 110.
Standard Normal Distribution
The standard normal distribution is a special case of the normal distribution. It is the distribution that
occurs when a normal random variable has a mean of zero and a standard deviation of one.
Standard Score (aka, z Score)
The normal random variable of a standard normal distribution is called a standard score or a z-score. Every
normal random variable X can be transformed into a z score via the following equation:
z = (X - μ) / σ
where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.
Standard Normal Distribution Table
A standard normal distribution table shows a cumulative probability associated with a particular z-score.
Table rows show the whole number and tenths place of the z-score. Table columns show the hundredths
place. The cumulative probability (often from minus infinity to the z-score) appears in the cell of the table.
For example, a section of the standard normal table is reproduced below. To find the cumulative
probability of a z-score equal to -1.31, cross-reference the row of the table containing -1.3 with the column

containing 0.01. The table shows that the probability that a standard normal random variable will be less
than -1.31 is 0.0951; that is, P(Z < -1.31) = 0.0951.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
... ... ... ... ... ... ... ... ... ... ...
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0722 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
... ... ... ... ... ... ... ... ... ... ...
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
Of course, you may not be interested in the probability that a standard normal random variable falls
between minus infinity and a given value. You may want to know the probability that it lies between a
given value and plus infinity. Or you may want to know the probability that a standard normal random
variable lies between two given values. These probabilities are easy to compute from a normal distribution
table. Here's how.
 Find P(Z > a). The probability that a standard normal random variable (z) is greater than a given
value (a) is easy to find. The table shows the P(Z < a). The P(Z > a) = 1 - P(Z < a).
Suppose, for example, that we want to know the probability that a z-score will be greater than 3.00.
From the table (see above), we find that P(Z < 3.00) = 0.9987. Therefore, P(Z > 3.00) = 1 - P(Z < 3.00)
= 1 - 0.9987 = 0.0013.
 Find P(a < Z < b). The probability that a standard normal random variables lies between two values is
also easy to find. The P(a < Z < b) = P(Z < b) - P(Z < a).
For example, suppose we want to know the probability that a z-score will be greater than -1.40 and
less than -1.20. From the table (see above), we find that P(Z < -1.20) = 0.1151; and P(Z < -1.40) =
0.0808. Therefore, P(-1.40 < Z < -1.20) = P(Z < -1.20) - P(Z < -1.40) = 0.1151 - 0.0808 = 0.0343.
In school or on the Advanced Placement Statistics Exam, you may be called upon to use or interpret
standard normal distribution tables. Standard normal tables are commonly found in appendices of most
statistics texts.

The Normal Distribution as a Model for Measurements
Often, phenomena in the real world follow a normal (or near-normal) distribution. This allows researchers
to use the normal distribution as a model for assessing probabilities associated with real-world
phenomena. Typically, the analysis involves two steps.
 Transform raw data. Usually, the raw data are not in the form of z-scores. They need to be
transformed into z-scores, using the transformation equation presented earlier: z = (X - μ) / σ.
 Find probability. Once the data have been transformed into z-scores, you can use standard normal
distribution tables, online calculators (e.g., Stat Trek's free normal distribution calculator), or handheld
graphing calculators to find probabilities associated with the z-scores.
The problem in the next section demonstrates the use of the normal distribution as a model for
measurement.
Chi-Square Distribution
The distribution of the chi-square statistic is called the chi-square distribution. In this lesson, we learn to
compute the chi-square statistic and find the probability associated with the statistic. Chi-square examples
illustrate key points.
The Chi-Square Statistic
Suppose we conduct the following statistical experiment. We select a random sample of size n from a
normal population, having a standard deviation equal to σ. We find that the standard deviation in our
sample is equal to s. Given these data, we can define a statistic, called chi-square, using the following
equation:
Χ2 = [ ( n - 1 ) * s2 ] / σ2
If we repeated this experiment an infinite number of times, we could obtain a sampling distribution for the
chi-square statistic. The chi-square distribution is defined by the following probability density function:
Y = Y0 * ( Χ2 ) ( v/2 - 1 ) * e-Χ2 / 2
where Y0 is a constant that depends on the number of degrees of freedom, Χ2 is the chi-square statistic, v =
n - 1 is the number of degrees of freedom, and e is a constant equal to the base of the natural logarithm
system (approximately 2.71828). Y0 is defined, so that the area under the chi-square curve is equal to one.
In the figure below, the red curve shows the distribution of chi-square values computed from all possible
samples of size 3, where degrees of freedom is n - 1 = 3 - 1 = 2. Similarly, the green curve shows the
distribution for samples of size 5 (degrees of freedom equal to 4); and the blue curve, for samples of size 11
(degrees of freedom equal to 10).

The chi-square distribution has the following properties:
 The mean of the distribution is equal to the number of degrees of freedom: μ = v.

 The variance is equal to two times the number of degrees of freedom: σ2 = 2 * v
 When the degrees of freedom are greater than or equal to 2, the maximum value for Y occurs when
Χ2 = v - 2.
 As the degrees of freedom increase, the chi-square curve approaches a normal distribution.
Cumulative Probability and the Chi-Square Distribution
The chi-square distribution is constructed so that the total area under the curve is equal to 1. The area
under the curve between 0 and a particular chi-square value is a cumulative probability associated with
that chi-square value. For example, in the figure below, the shaded area represents a cumulative
probability associated with a chi-square statistic equal to A; that is, it is the probability that the value of a
chi-square statistic will fall between 0 and A.
Fortunately, we don't have to compute the area under the curve to find the probability. The easiest way to
find the cumulative probability associated with a particular chi-square statistic is to use the Chi-Square
Distribution Calculator, a free tool provided by Stat Trek.
Chi-Square Distribution Calculator
The Chi-Square Distribution Calculator solves common statistics problems, based on the chi-square
distribution. The calculator computes cumulative probabilities, based on simple inputs. Clear instructions

guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and
sample problems provide straightforward explanations. The calculator is free. It can be found under the
Stat Tables tab, which appears in the header of every Stat Trek web page.
Chi-Square Distribution Calculator
Test Your Understanding: Chi-Square Examples
Problem 1
The Acme Battery Company has developed a new cell phone battery. On average, the battery lasts 60
minutes on a single charge. The standard deviation is 4 minutes.
Suppose the manufacturing department runs a quality control test. They randomly select 7 batteries. The
standard deviation of the selected batteries is 6 minutes. What would be the chi-square statistic
represented by this test?
Solution
We know the following:
 The standard deviation of the population is 4 minutes.

 The standard deviation of the sample is 6 minutes.
 The number of sample observations is 7.
To compute the chi-square statistic, we plug these data in the chi-square equation, as shown below.
Χ2 = [ ( n - 1 ) * s2 ] / σ2
Χ2 = [ ( 7 - 1 ) * 62 ] / 42 = 13.5
where Χ2 is the chi-square statistic, n is the sample size, s is the standard deviation of the sample, and σ is
the standard deviation of the population.
Problem 2
Let's revisit the problem presented above. The manufacturing department ran a quality control test, using 7
randomly selected batteries. In their test, the standard deviation was 6 minutes, which equated to a chi-
square statistic of 13.5.
Suppose they repeated the test with a new random sample of 7 batteries. What is the probability that the
standard deviation in the new test would be greater than 6 minutes?

Solution
We know the following:
 The sample size n is equal to 7.

 The degrees of freedom are equal to n - 1 = 7 - 1 = 6.
 The chi-square statistic is equal to 13.5 (see Example 1 above).
Given the degrees of freedom, we can determine the cumulative probability that the chi-square statistic
will fall between 0 and any positive value. To find the cumulative probability that a chi-square statistic falls
between 0 and 13.5, we enter the degrees of freedom (6) and the chi-square statistic (13.5) into the Chi-
Square Distribution Calculator. The calculator displays the cumulative probability: 0.96.
This tells us that the probability that a standard deviation would be less than or equal to 6 minutes is 0.96.
This means (by the subtraction rule) that the probability that the standard deviation would be greater than 6 minutes
is 1 - 0.96 or .04.
10. Questions
Objective questions
Question 1: In how many ways can the letters of the word ABACUS be rearranged such that the vowels
always appear together?
B. 3! * 3!
C.
D.
E.

Question 2: How many different four letter words can be formed (the words need not be meaningful) using
the letters of the word MEDITERRANEAN such that the first letter is E and the last letter is R?
A. 59
B.
C. 56
D. 23
E.
Question 3:What is the probability that the position in which the consonants appear remain unchanged
when the letters of the word "Math" are re-arranged?
A. 1/4
B. 1/6
C. 1/3
D. 1/24
E. 1/12
Question 4: There are 6 boxes numbered 1, 2, ... 6. Each box is to be filled up either with a red or a green
ball in such a way that at least 1 box contains a green ball and the boxes containing green balls are
consecutively numbered. The total number of ways in which this can be done is:
A. 5
B. 21
C. 33
D. 60
E. 6
Question 5: A man can hit a target once in 4 shots. If he fires 4 shots in succession, what is the probability
that he will hit his target?
man can hit a target once in 4 shots. If he fires 4 shots in succession, what is the probability that he will hit
his target?
A.1
B.
C.
D.
E.
Question 6: In how many ways can 5 letters be posted in 3 post boxes, if any number of letters can be
posted in all of the three post boxes?
In how many ways can 5 letters be posted in 3 post boxes, if any number of letters can be posted in all of
the three post boxes?
A. 5 C 3
B.5 P 3
C.53
D.35
E.25
Question 7: Ten coins are tossed simultaneously. In how many of the outcomes will the third coin turn up a
head?
Ten coins are tossed simultaneously. In how many of the outcomes will the third coin turn up a head?
A. 210
B. 29
C. 28
D. 29
E. None of these
Question 8: In how many ways can the letters of the word "PROBLEM" be rearranged to make seven letter
words such that none of the letters repeat?
In how many ways can the letters of the word "PROBLEM" be rearranged to make 7 letter words such that
none of the letters repeat?
A.7!
B.7C7
C.77
D.49
E.None of these
Short Questions
1. State A-priori or classical definition
2. State A-posteriori or relative frequency definition of probability

3. State Axiomatic definition of probability
4. State Baye’s theorem
5. State Total probability theorem
6. Define conditional probabilities of events
7. Define Joint probabilities of events
8. What is Bernoulli’s trial?
9. State Bernoulli’s theorem
10. State Binomial theorem
11. Define R.V. Give an example
12. State important Properties of a distribution function
13. What is a discrete RV and continuous RV? Give examples of each
14. Define probability distribution of i) discrete random variable ii) continuous random variable.
Long Questions
1. In a factory, 4 machines A1, A2, A3 and A4 produce 10%, 25%, 35% and 30% of the items respectively.
The percentage of the defective items produced by them is 5%, 4%, 3% and 2% respectively. An item
selected at random is found to be defective. What is the probability that it was produced by machine A2?
2. In a communication system a zero is transmitted with probability 0.55.In the channel a zero received as
zero with probability 0.9 and one received as one with probability 0.8. Find the probability that
(i) One is received

(ii) Zero is received
(i) One transmitted one received
(ii) Zero transmitted zero received

3. A mechanism consist of four paths A, B, C, D and probability of their failure are p,q,r,s respectively. The
mechanism works if their is no failure in any of these parts. Find the probability that
(i) Mechanism is working
(ii) Mechanism is not working
4. Suppose an urn contains ten white balls and five red balls. Two balls are withdrawn at random from the
urn without replacements:
(i) What is the probability that both balls are white?
(ii) What is the probability that the second ball is red?
5. Two balanced dices are being rolled simultaneously. If sum of the numbers shown at a time by the two
faces is 6. What is the probability that the number shown by one of the face to the dice in this case is 1?
6. Define R.V. Give an example
7. State important Properties of a distribution function
8. What is a discrete RV and continuous RV ? Give an examples of Each
9. Define probability distribution of i) a discrete random variable ii) a continuous random variable.
University Questions
Dec 2012
Q. State and prove Baye’s theorem
Q. State the three axioms of Probability
Q. If A and B are two events such that P(A)=0.3, P(B)=0.4, P(AB)=0.2 find P(A U B), P(A/B)
Q. State and prove properties of distribution function

May 12
Q. If A and B are two events, prove that P(A U B)= P(A) + P(B) – P(AB)
Q. Explain conditional probability with example
Q. State and prove Baye’s theorem and total probability theorem
Q. Suppose two million lottery tickets are issued with 100 wining tickets among them
(i) If aperson purchase 100 tickets what is the probability of wining
(ii) How many tickets should one buy to be 95% confident of having wining tickets ?
Q. What 'is a Random Variable? Explain continuous and discrete Random Variables with suitable examples.
Define expectation of continuous and discrete Random Variables
Dec 11
Q. If A and B are two independent events, prove that P(AB)= P(A). P(B)
Q. Suppose five cards to be drawn at random from a standard deck of cards. If all the drawn cards are red,
what is the probability that all of them hearts
Q. the random variable has exponential probability density function f(x)= K e-|x|. Determine value of K and
corrosponding distribution function
May 2011
1. (a) State and explain :
(i) Independent events
(ii) Joint and conditional probabilities of events.

Q. Suppose box I contains 5 white balls and 6 black balls and box II contains 6 white balls and 4 black balls.
A box is selected at random and then a ball is chosen at random from the selected box.
(i) What is the probability that the ball chosen will be a white ball?
(ii) Given that the ball chosen is white, what is the probability that it came from box I?
Dec 2010
In a communication system a zero is transmitted with probability 0.4 and one is transmitted with
probability 0.6. Due to noise in the channel a zero can be received as one with probability 0.1 and zero with
probability 0.9 similarly a one can be received as zero with probability 0.1 and one with probability 0.9.
Now-
(i) A one was observed what is the probability that zero was transmitted
(ii) A one was observed what is the probability that one was transmitted
May 2010
Q. (a) Give the following definitions of probability with the shortcomings if any:
(i) A-priori or classical definition
(ii) A-posteriori or relative frequency definition
(iii) Axiomatic definition
Q. A mechanism consist of three paths A, B, C and probability of their failure are p,q,r respectively. The
mechanism works if their is no failure in any of these parts. Find the probability that
(i) Mechanism is working
(ii) Mechanism is not working
Dec 2009
Q. State and explain Baye’s theorem & conditional probability.

Q. If A and B are independent events then prove that P(A∩B)= P(A). P(B)
if A and B are exclusive events then prove that P(A∩B)= P(A)
Q. In a communication system a zero is transmitted with probability 0.45.In the channel a zero received as
zero with probability 0.9 and one received as one with probability 0.8. Find the probability that
(i) One is received

(ii) Zero is received
(iii) One transmitted one received
(iv) Zero transmitted zero received
Q. State and prove any two properties of-
(i) Density functions
(ii) Distribution functions.
Dec 2008
Q. State the three axioms of Probability
Q. Explain the concept of Joint and Conditional Probability with one eg. each.
Q. In a factory, 4 machines A1, A2, A3 and A4 produce 10%, 25%, 35% and 30% of the items respectively.
The percentage of the defective items produced by them is 5%, 4%, 3% and 2% respectively. An item
selected at random is found to be defective. What is the probability that it was produced by machine A2?
Q. If X, Yare two independent exponentially distributed random variables with same parameter unity, find
the probability density function of U=X+Y, V = X/(X + V).
Q. A random variable takes values 9, 13, 17 (5 + 4n) each with probability 1/n, find mean and variance of X.
Q. What 'is a Random Variable? Explain continuous and discrete Random Variables

Q. Suppose X and Yare two random variables. Define covariance and correlation of X and Y. When do we
say that X and Y are (i) Orthogonal (ii) Independent and (iii) Uncorrelated? Are Uncorrelated variables
independent?
May 2007
Q. State and explain with example:-
(i) Conditional probability
(ii) Baye's Theorem.
If two events A and Bare independent, show that -
(i) A and [are independent
(ii) A and B are independent. .
Q. For a certain binary communication channel, the probability that a transmitted '0' is received as a '0'is
0.95 while the probability that a transmitted '1' is received as '1' is 0.90. If the probability of transmitting a
'0' is 0.4, find the probability that -
(i) a '1' is received
(ii) a '1' was transmitted given that '1' was received
(iii) the error has occurred.
Dec 2006
Q. (a) Give the following definitions of probability with the shortcomings if any:
(i) A-priori or classical definition
(ii) A-posteriori or relative frequency definition
(iii) Axiomatic definition
(b) State Total probability theorem and Baye’s' theorem.

Suppose box I contains 5 white balls and 6 black balls and box II contains 6 white balls and 4 black balls. A
box is selected at random and then a ball is chosen at random from the selected box.
(i) What is the probability that the ball chosen will be a white ball?
(ii) Given that the ball chosen is white, what is the probability that it came from box I?
June 2006
Q(a) (i) Define the conditional probability of an event A' given that another event B has occurred.
(ii) A biased coin is tossed till a head appears for the first time. What is the probability that the number of
tosses required is odd? [2+6]
(b) State Bayes' theorem
Q.A certain test for a particular cancer is known to be 95% accurate. A person submits to the test and the
results are positive. Suppose the person comes from a population of 100,000 where 2000 people suffer
from that disease. What can we conclude about the probability that the person under test has that
particular cancer?
Dec 2005
(a) Let B1, B2 . . . ., Bn be partitions of an event space Bi, i = 1, 2 . . . n, for the event B that has 10 occurred.
Suppose now an event A occurs. Find expression for P (BIA) in terms of B1,
(b) Two balanced dices are being rolled simultaneously. If sum of the numbers shown at a time by the 10
two faces is 7. What is the probability that the number shown by one of the face to the dice in this case is
1?
June 2005
Q (a) (i) with the help of a Venn diagram show that the conditional probability of occurrence of an event A
given that the event B has occurred, is given by -
p ( AIB) = P ( AB) / P (B)

(ii) An Urn contains two black balls and three white balls. Two balls are selected at random from the urn
without replacement and sequence of colours is noted. Find the probability that both balls are black.
(b) Suppose that 5 cards to be drawn at random from a standard deck of 52 cards. If all the cards drawn are
red, what is the probability that all of them are hearts.
Dec2004
(a) An experiment is performed N times. During the trial the event A occurs nA times and the event B only
occurs nAB time, during the occurrence of the event A. From the relative frequency approach define the
probability of occurrence of the event A, p(A), the joint probability of occurrence of the events A and B,
P(AB) and the conditional probability of the event B, P(B/A) given the event A has occurred in terms of the
frequency of occurrences nA, nAB and N. Show that P (B/A) = P(AB) / p(A) and P(B'A) = 1 - P(B/A). (b) In
throwing of fair die the probability of the event A = (The outcome is greater than 3). The Probability of the
event B. (the outcome is even numbers). Find P(A/B) and P(A'B).
May 2004
1. (a) State and explain :
(i) Apriori probability of the outcome of an experiment
(ii) Joint and conditional probabilities of events.
(iii) Baye’s Theorem. (Dec 2008, May 2009)
(b) Suppose an urn contains five white balls and seven red balls. Two balls are withdrawn at random from
the urn without replacements:
(i) What is the probability that both balls are white?
(ii) What is the probability that the second ball is red?

CHAPTER-2
Operations on One Random Variable
1. Motivation
When we have a random variable which is function of another random variable and we know the
statistics of one random variable then we can get the statistics of the unknown random variable in
terms of known random variable.
2. Syllabus
Sr. No. of Self

Topic Fine Detailing Week
No. Hours Study
01 Functions of  Functions of one random 6 1.5 12
one random variable
variable
 their distribution and density
functions
 Mean, variance and moments
 Chebyshev, Markov inequality
 Characteristic functions
 Moment theorem





5. Objective
In this chapter we study a few basic concepts of functions of random variables and investigate the expected
value of a certain function of a random variable. The techniques of moment generating functions and
characteristic functions, which are very useful in some applications, are presented.
6. Key Notation:
FX(x) Cumulative Distribution Function
fX(x) Probability Density Function

Semi open intervals
7. Key Definitions
1. Moment of random variable
2. Characteristic function of a discrete random variable
8. Key Relations
1. The function of RV
2. Expected value of RV

3. Expected value of function of RV
4. Standard Deviation
5. Variance
6. Chebyshev Inequality
7. Nth order moment of RV

9. Theory
Functions of Random Variables
Often we have to consider random variables which are functions of other random variables. Let
be a random variable and is a function. Then is a random variable. We are interested to

find the pdf of . For example, suppose represents the random voltage input to a full-wave rectifier.
Then the rectifier output is given by . We have to find the probability description of the random
variable . We consider the following cases:
(a) is a discrete random variable with probability mass function .

The probability mass function of is given by
b) is a continuous random variable with probability density function and is one-to-one

and
monotonically increasing.
The probability mass function of is given by

This is illustrated in the Figure 1

Example 1 Probability density function of a linear function of a random variable
Suppose,
Example 2 Probability density function of the distribution function of a random variable
Suppose the distribution function of a continuous random variable is monotonically

increasing and
one-to-one and define the random variable .Then
Remark
(1) The distribution given by is called a uniform distribution over the interval [0,1].
(2) The above result is particularly important in simulating a random variable with a particular distribution
function.
We assumed to be one-to-one function for invariability. However, the result is more general -
the random variable defined by the distribution function of any random variable is uniformly
distributed
over [0,1]
For example, if is a discrete RV,

(3) is a continuous random variable with probability density function and has multiple
solutions
for x.
Suppose, for has solutions . Then
Proof:
Consider the plot of . Suppose at a point , we have three distinct roots as

shown.
consider the event . This event will be equivalent to union of events

Where the negative sign in is used to account for positive probability.
Therefore, dividing by dy and taking the limit, we get
in the above, we assumed to have three roots. In general, if has n roots, then
Example 3 Probability density function of a linear function of a random variable
Suppose
Example 4 Probability density function of the output of a full-wave rectifier
has two solutions and and at each solution point.

Figure 3
Example 5 Probability density function of the output of a square-law device
and
so that
Expected Value of a Random Variable
 The expectation operation extracts a few parameters of a random variable and provides a summary
description of the random variable in terms of these parameters.
 It is far easier to estimate these parameters from data than to estimate the distribution or density
function of the random variable.
 Moments are some important parameters obtained through the expection operation.
Expected value or mean of a random variable
The expected value of a random variable is defined by
provided exists.
is also called the mean or statistical average of the random variable and is denoted by
Note that, for a discrete RV with the probability mass function (pmf) the pdf
is given by
Thus for a discrete random variable with
Interpretation of the mean
 The mean gives an idea about the average value of the random value. The values of the random
variable are spread about this value.

 Observe that
Therefore, the mean can be also interpreted as the centre of gravity of the pdf curve.
Figure1 Mean of a random variable
Example 1
Suppose is a random variable defined by the pdf
Then

Example 2
Consider the random variable with the pmf as tabulated below
Value of the random

0 1 2 3
variable x
pX(x)
Then
Example 3 Let X be a continuous random variable with
Then

=
Hence EX does not exist. This density function is known as the Cauchy density function.
Remark
If is an even function of , then Thus the mean of a RV with an even symmetric pdf
is 0.
Expected value of a function of a random variable
Suppose is a real-valued function of a random variable as discussed in the last class.

Then,
We shall illustrate the above result in the special case when is one-to-one and
monotonically increasing function of x In this case,
Figure 2

The following important properties of the expectation operation can be immediately derived:
a) If is a constant,
Clearly
(b) If are two functions of the random variable and are constants,
The above property means that is a linear operator.

Mean-square value
Variance
For a random variable with the pdf and mean the variance of is denoted by and
defined as
Thus for a discrete random variable with
The standard deviation of is defined as
Example 4
Find the variance of the random variable discussed in Example 1.
Example 5

Find the variance of the random variable discussed in Example 2. As already computed
Remark
 The variance is a central moment and measure of dispersion of the random variable about the
mean.
 is the average of the square deviation from the mean. It gives information about the
deviation of the values of the RV about the mean. A smaller implies that the random values are
more clustered about the mean. Similarly, a bigger means that the random values are more
scattered.
For example, consider two random variables with pmf as shown below. Note that each of
has zero mean.The variances are given by and implying that has more
spread about the mean.

Figure 3 The pdfs of two discrete random variables with same mean but different variances .
he pdfs of two continous random variables with the same mean and different variances are illustrated in
Figure 4 .

Figure 4
 We could have used the mean absolute deviation to know about the deviation of the
random values about the mean. But it is more difficult both for analysis and numerical calculation.
 Properties of variance
 (1)




 (2) If then
3) If is a constant,
nth moment of a random variable
We can define the nth moment and the nth central- moment of a random variable X by the following
relations
Note that
 The mean is the first moment and the mean-square value is the second moment
 The first central moment is 0 and the variance is the second central moment
 The third central moment measures lack of symmetry of the pdf of a random variable
is called the coefficient of skewness and If the pdf is symmetric this coefficient will be zero.

 The fourth central moment measures flatness or peakedness of the pdf of a random variable.
is called kurtosis. If the peak of the pdf is sharper, then the random variable has a
higher kurtosis.
Inequalities based on expectations
The mean and variance also give some quantitative information about the bounds of RVs. Following
inequalities are extremely useful in many practical problems.
Chebysev Inequality
Suppose a parameter of a manufactured item with known mean The quality
control department rejects the item if the absolute deviation of from is greater than What
fraction of the manufacturing item does the quality control department reject? Can you roughly guess it?
The standard deviation gives us an intuitive idea how the random variable is distributed about the mean.
This idea is more precisely expressed in the remarkable Chebysev Inequality stated below. For a random
variable with mean
Proof:

Markov Inequality
For a random variable which takes only nonnegative values
Remark
Example 6
Solution: A nonnegative RV has the mean Find an upper bound of the probability
By Markov's inequality
Hence the required upper bound
Just as the frequency-domain charcterisations of discrete-time and continuous-time signals, the probability
mass function and the probability density function can also be characterized in the frequency-domain by
means of the charcteristic function of a random variable . These functions are particularly important in

 calculating of moments of a random variable
 evaluating the PDF of combinations of multiple RVs.
Characteristic function
Consider a random variable with probability density function The characteristic function of
denoted by is defined as
Note the following:
 is a complex quantity, representing the Fourier transform of and traditionally using
instead of This implies that the properties of the Fourier transform applies to the
characteristic function.
 The interpretation that is the expectation of helps in calculating moments with the
help of the characteristics function. In a simple case ,
 As always non-negative and , always exists.

[Recall that the Fourier transform of a function g(t) exists if , i.e., g(t) is absolutely
integrable.]
We can get from by the inverse transform
Example 1
Consider the random variable X with pdf given by
= 0 otherwise. The characteristics function is given by
Solution:
Example 2
The characteristic function of the random variable with

Characteristic function of a discrete random variable
Suppose X is a random variable taking values from the discrete set with corresponding
probability mass function for the value
Then ,
If Rx is the set of integers ,we can write
In this case can be interpreted as the discrete-time Fourier transform with substituting
in the original discrete-time Fourier transform. The inverse relation is
Example 3
Suppose X is a random variable with the probability mass function

.
Then,
(Using the Binomial theorem)
Example 4
The characteristic function of the discrete random variable with
Moments and the characteristic function
Given the characteristics function the nth moment is given by
Moments and the characteristic function

Given the characteristics function the nth moment is given by
To prove this consider the power series expansion of
Taking expectation of both sides and assuming to exist, we get
Taking the first derivative of with respect to at we get
Similarly, taking the derivative of with respect to at we get

Example 5
First two moments of the random variable in Example 2
Probability Generating Function
If the random variable under consideration takes non- negative integer values only, it is convenient to
characterize the random variable in terms of the probability generating function G (z) defined by
Note that
is related to z-transform, in actual z-transform, is used instead of
The characteristic function of X is given by

Mean and Variance from the Probability Generating Function

Probability Mass Functions from Probability Generating Functions
Consider the derivative given by
Thus, given the probability generating function , we can get the probability mass function from the
derivatives of at Z = 0. Hence this transform is called the probability generating function.
More problems
Problem
In a recent little league softball game, each player went to bat 4 times. The number of hits made by each
player is described by the following probability distribution.
Number of hits, x 0 1 2 3 4
Probability, P(x) 0.10 0.20 0.30 0.25 0.15
What is the mean of the probability distribution?

Solution
The correct answer is E. The mean of the probability distribution is 2.15, as defined by the following
equation.
E(X) = Σ [ xi * P(xi) ]
E(X) = 0*0.10 + 1*0.20 + 2*0.30 + 3*0.25 + 4*0.15 = 2.15
Problem
The number of adults living in homes on a randomly selected city block is described by the following
probability distribution.
Number of adults, x 1 2 3 4
Probability, P(x) 0.25 0.50 0.15 0.10
What is the standard deviation of the probability distribution?
Solution
The correct answer is D. The solution has three parts. First, find the expected value; then, find the variance;
then, find the standard deviation. Computations are shown below, beginning with the expected value.
E(X) = Σ [ xi * P(xi) ]
E(X) = 1*0.25 + 2*0.50 + 3*0.15 + 4*0.10 = 2.10
Now that we know the expected value, we find the variance.
σ2 = Σ { [ xi - E(x) ]2 * P(xi) }
σ2 = (1 - 2.1)2 * 0.25 + (2 - 2.1)2 * 0.50 + (3 - 2.1)2 * 0.15 + (4 - 2.1)2 * 0.10
σ = (1.21 * 0.25) + (0.01 * 0.50) + (0.81) * 0.15) + (3.61 * 0.10) = 0.3025 + 0.0050 + 0.1215 + 0.3610 =
2
0.79
And finally, the standard deviation is equal to the square root of the variance; so the standard deviation is
sqrt(0.79) or 0.889.
Problem
The table on the left shows the joint probability distribution between two random variables - X and Y; and
the table on the right shows the joint probability distribution between two random variables - A and B.
X A
0 1 2 0 1 2
Y 3 0.1 0.2 0.2 B 3 0.1 0.2 0.2

4 0.1 0.2 0.2 4 0.2 0.2 0.1
Which of the following statements are true?
I. X and Y are independent random variables.

II. A and B are independent random variables.
Solution
The correct answer is A. The solution requires several computations to test the independence of random
variables. Those computations are shown below.
X and Y are independent if P(x|y) = P(x), for all values of X and Y. From the probability distribution table,
we know the following:
P(x=0) = 0.2; P(x=0 | y=3) = 0.2; P(x=0 | y = 4) = 0.2

P(x=1) = 0.4; P(x=1 | y=3) = 0.4; P(x=1 | y = 4) = 0.4
P(x=2) = 0.4; P(x=2 | y=3) = 0.4; P(x=2 | y = 4) = 0.4
Thus, P(x|y) = P(x), for all values of X and Y, which means that X and Y are independent. We repeat the
same analysis to test the independence of A and B.
P(a=0) = 0.3; P(a=0 | b=3) = 0.2; P(a=0 | b = 4) = 0.4

P(a=1) = 0.4; P(a=1 | b=3) = 0.4; P(a=1 | b = 4) = 0.4
P(a=2) = 0.3; P(a=2 | b=3) = 0.4; P(a=2 | b = 4) = 0.2
Thus, P(a|b) is not equal to P(a), for all values of A and B. For example, P(a=0) = 0.3; but P(a=0 | b=3) = 0.2.
This means that A and B are not independent.
Problem
X
0 1 2
3 0.1 0.2 0.2
Y
4 0.1 0.2 0.2
The table on the right shows the joint probability distribution between two random variables - X and Y. (In a
joint probability distribution table, numbers in the cells of the table represent the probability that particular
values of X and Y occur together.)
What is the mean of the sum of X and Y?
Solution

The correct answer is D. The solution requires three computations: (1) find the mean (expected value) of X,
(2) find the mean (expected value) of Y, and (3) find the sum of the means. Those computations are shown
below, beginning with the mean of X.
E(X) = Σ [ xi * P(xi) ]
E(X) = 0 * (0.1 + 0.1) + 1 * (0.2 + 0.2) + 2 * (0.2 + 0.2) = 0 + 0.4 + 0.8 = 1.2
Next, we find the mean of Y.
E(Y) = Σ [ yi * P(yi) ]
E(Y) = 3 * (0.1 + 0.2 + 0.2) + 4 * (0.1 + 0.2 + 0.2) = (3 * 0.5) + (4 * 0.5) = 1.5 + 2 = 3.5
And finally, the mean of the sum of X and Y is equal to the sum of the means. Therefore,
E(X + Y) = E(X) + E(Y) = 1.2 + 3.5 = 4.7
Problem
Suppose X and Y are independent random variables. The variance of X is equal to 16; and the variance of Y
is equal to 9. Let Z = X - Y.
What is the standard deviation of Z?
Solution
The correct answer is B. The solution requires us to recognize that Variable Z is a combination of two
independent random variables. As such, the variance of Z is equal to the variance of X plus the variance of
Y.
Var(Z) = Var(X) + Var(Y) = 16 + 9 = 25
The standard deviation of Z is equal to the square root of the variance. Therefore, the standard deviation is
equal to the square root of 25, which is 5.
Problem
The average salary for an employee at Acme Corporation is $30,000 per year. This year, management
awards the following bonuses to every employee.
 A Christmas bonus of $500.

 An incentive bonus equal to 10 percent of the employee's salary.
What is the mean bonus received by employees?
Solution
The correct answer is C. To compute the bonus, management applies the following linear transformation to
the each employee's salary.
Y = mX + b
Y = 0.10 * X + 500
where Y is the transformed variable (the bonus), X is the original variable (the salary), m is the multiplicative
constant 0.10, and b is the additive constant 500.
Since we know that the mean salary is $30,000, we can compute the mean bonus from the following
equation.
Y = mX + b
Y = 0.10 * $30,000 + $500 = $3,500
10. Questions
Objective Questions
1. Which operator is used to calculate average value
a. A[ ]
b. E[ ]
c. D[ ]
d. Z[ ]
2. Mean square value of RV is given as
a. E[X2]
b. E2[X]
c. [E[X2]]2
d. E[X2]
3. If X is uniformly distributed in [0,1] then P( X > 0.2) is
a. 0.2
b. 0.5
c. 0.7

d. 0.9
Short Questions
1. Write the formula to express the pdf of RV which is function of another RV.
2. Define expected value of RV
3. Write the formula to find the expected value of continuous & discrete RV
4. Write the formula to find the expected value of function of RV
5. Define variance of RV
6. Write the formula to find the variance of RV
7. Define Standard Deviation.
8. Write the formula to find the standard deviation of RV
9. What is Chebyshev inequality.
10. What is moment generating function
11. Define n-th order moment
12. Define n-th order characteristic function
Dec 2012
Q. Explain MGF of discrete random variable and continuous random variable in detail
Q. Find characteristic function of Binomial distribution and poisson distribution.
May 2012
Q. Find characteristic function of poisson distribution, it’s mean and variance

Q. If X is a continuous random variable and Y= aX + b then, find pdf of Y in terms of F X(x), CDF of X.
Q. Prove that if two random variables are independent, then density function of their sum is given by
convolution of their density functions.
Dec 2011
Q. Write short note on poisson distribution, Rayleigh distribution, Gaussian distribution
May 2011
Q. If X is a continuous random variable and Y= aX + b then, prove that -
1 Y b
f Y ( y)  fX ( )
|a| a
Q. Let X be a continuous random variable with uniform pdf in (0, 2π). Find probability density function of Y=
cos X
Q. Find characteristic function of poisson distribution, it’s mean and variance
Dec 2010
Q. In medical Imaging such as computer tomography the reaction between the detector reading Y and body
absorptivity X follows Y=eX law. Let X be N(µ,σ2). Compute the pdf of Y
Q. The characteristic function of the Laplace variable
f(x)=(m/2)e-m|x| -∞<x<∞
also find its mean and variance.
Dec 2009
Q.How the characteristic function Фx(w) of a random variable X is defined? Show that Фx(w) can be
expressed as
j n wn 1 dn
 ( w)   mn where mn  n [ n  X ( w)] w0 is the nth order moment of r.v X
n! j dw

Q. If X is Poisson distributed random variable find the moment generating function and characteristic
function
Q. Suppose, for has solutions . Then prove that
Q. If the probability density function of X is fX(x) = e-X for x>0; then find the probability density function of
Y= X3
Dec 2008
Q. If X is a continuous random variable and Y= aX + b then, prove that -
1 Y b
f Y ( y)  fX ( )
|a| a
May 2007
Q. (a) If X is a continuous random variable and Y= aX + b then, prove that -
1 Y b
f Y ( y)  fX ( )
|a| a
(b) If a random variable X has uniform distribution in (-2, 2), find the probability density function fy(y) of Y=
3X + 2.
Dec 2006

Q. Obtain the distribution function of Y =aX + b, where X is uniformly distributed in (c,d).
Q. How the characteristic function Фx(w) of a random variable X is defined? Show that Фx(w) can be
expressed as –
j n wn 1 dn
 ( w)   mn where mn  [  X ( w)] w0 is the nth order moment of r.v X
n! j n dw n
(b) Find the characteristic function of the geometric distribution given by
P(X=r)=qrp r=0,1,2,... ;p+q=1
Hence find the mean and the variance.
June 2005
Q. How the characteristic function Фx(w) of a random variable X is defined? Show that Фx(w) can be
expressed as –
j n wn 1 dn
 ( w)   mn where mn  [  X ( w)] w0 is the nth order moment of r.v X
n! j n dw n
An exponential distributed random variable X, with parameter A is given by X = λe-λx
Find E[X], Ф’(x))/j.
Dec 2005
Q. (i) Explain what is a moment generating function of a random variable.
(ii) If X is a random variable and f(x) is given by f(x) = f = 1/b. e-(x-a)/b, find the first and
second moments of X.

CHAPTER-3
Multiple of Random Variable and Convergence
1. Motivation
2. Syllabus

Sr. No. of Self
No. Hours Study
01 Multiple of  Vector random variables, 1 hour 1 hour
Pairs of random variables,
Random
Variable and  Joint CDF, Joint PDF 1 hour 1 hour
Convergence Independence, Conditional
CDF and PDF, Conditional
Expectation
 One function of two random
variable, two functions of two 1 hour 1 hour
random variables;
 Joint moments, joint 1 hour 1 hour

characteristic function,
covariance and correlation-
 independent, uncorrelated
and orthogonal random 1 hour 1 hour
variables.





5. Objective
In this chapter we study a few basic concepts of functions of random variables and investigate the expected
value of a certain function of a random variable. The techniques of moment generating functions and
characteristic functions, which are very useful in some applications, are presented.
6. Key Notation:
FX(x) Cumulative Distribution Function
fX(x) Probability Density Function
Semi open intervals
7. Key Definitions
1. Moment of random variable

2. Characteristic function of a discrete random variable
8. Key Relations
8. The function of RV
9. Expected value of RV
10. Expected value of function of RV

11. Standard Deviation
12. Variance
13. Chebyshev Inequality
14. Nth order moment of RV
Jointly Distributed Random Variables
We may define two or more random variables on the same sample space. Let and be two real
random variables defined on the same probability space The mapping such that for
is called a joint random variable.

Figure 1
• The above figure illustrates the mapping corresponding to a joint random variable. The joint random
variable in the above case is denoted by .
• We may represent a joint random variable as a two-dimensional vector .
• We can extend the above definition to define joint random variables of any dimension. The mapping
such that for is called an n-dimensional random variable and
denoted by the vector
Example1 Suppose we are interested in studying the height and weight of the students in a class. We
can define the joint RV where represents height and represents the weight.
Example 2 Suppose in a communication system is the transmitted signal and is the corresponding
noisy received signal. Then is a joint random variable.
Joint Probability Distribution Function
Recall the definition of the distribution of a single random variable. The event was used to
define the probability distribution function . Given , we can find the probability of any event
involving the random variable. Similarly, for two random variables and , the event

is considered as the representative event.
The probability is called the joint distribution function or the joint
cumulative distribution function (CDF) of the random variables and and denoted by .
Figure 2
satisfies the following properties:
Note that
•
• is right continuous in both the variables.
Figure 4
Given ,we have a complete description of the random variables and

.
To prove this
Similarly .
• Given , each of is called a marginal distribution

function or marginal cumulative distribution function (CDF).
Example 3

Consider two jointly distributed random variables and with the joint CDF
(a) Find the marginal CDFs
(b) Find the probability
(a)
(b)
Jointly Distributed Discrete Random Variables
If and are two discrete random variables defined on the same probability space such
that takes values from the countable subset and takes values from the countable subset .Then
the joint random variable can take values from the countable subset in . The joint random
variable is completely specified by their joint probability mass function
Given , we can determine other probabilities involving the random variables and

Remark
This is because
• Marginal Probability Mass Functions: The probability mass functions and are obtained
from the joint probability mass function as follows
and similarly
These probability mass functions and obtained from the joint probability mass functions
are called marginal probability mass functions .
Example 4 Consider the random variables and with the joint probability mass function as tabulated in
Table 1. The marginal probabilities and are as shown in the last column and the last row

respectively.
Table 1
Joint Probability Density Function
If and are two continuous random variables and their joint distribution function is continuous
in both and , then we can define joint probability density function by
provided it exists.
Clearly
Properties of Joint Probability Density Function
• is always a non-negative quantity. That is,

• The probability of any Borel set can be obtained by
Marginal density functions
The marginal density functions and of two joint RVs and are given by the
derivatives of the corresponding marginal distribution functions. Thus
Remark
• The marginal CDF and pdf are same as the CDF and pdf of the concerned single random variable. The
marginal term simply refers that it is derived from the corresponding joint distribution or density function
of two or more joint random variables.
• With the help of the two-dimensional Dirac Delta function, we can define the joint pdf of two discrete
jointly random variables. Thus for discrete jointly random variables and .

Example 5 The joint density function of the random variables in Example 3 is
Example 6 The joint pdf of two random variables and are given by
• Find .
• Find .
• Find and .
• What is the probability ?

Conditional Distributions
We discussed the conditional CDF and conditional PDF of a random variable conditioned on some events
defined in terms of the same random variable. We observed that
and

We can define these quantities for two random variables. We start with the conditional probability mass
functions for two random variables.
Conditional Probability Mass Functions
Suppose and are two discrete jointly random variable with the joint PMF The conditional
PMF of given is denoted by and defined as
Similarly we can define the conditional probability mass function
• The conditional PMF satisfies the properties of the probability mass functions.
• From the definition of conditional probability mass functions, we can define two independent random
variables. Two discrete random variables X and Y are said to be independent if and only if

so that
Bayes' Rule for Discrete Random Variables
Suppose and are two discrete jointly random variables. Given and we can
determine the a posteriori probability mass function by using

the Bayes' rule as follows:
Example 1 Consider the random variables and with the joint probability mass function as presented in
the
following table

The marginal probabilities are as shown in the last column and the last row
Conditional Probability Distribution Function
Consider two continuous jointly random variables and with the joint probability distribution
function We are interested to find the conditional distribution function of one of the random
variables on the condition of a particular value of the other random variable.
We cannot define the conditional distribution function of the random variable on the condition of the
event by the relation
as in the above expression. The conditional distribution function is defined in the limiting
sense as follows:

Conditional Probability Density Function
is called the conditional probability density function of given
Let us define the conditional distribution function .
The conditional density is defined in the limiting sense as follows
Because,
The right hand side of the highlighted equation is
Similarly we have

Two random variables are statistically independent if for all
• (4)
Example 2 X and Y are two jointly random variables with the joint pdf given by
find,
(a)
(b)
(c)
Solution:
Since
we get

Bayes’ Rule for Continuous Random Variables:
Given the marginal density function and the conditional density , we can apply the Bayes'
rule for two continuous joint random variables to get as follows. Recall that
In context of the above Bayes rule, is called the a priori density function and is called
the a posteriori density function.
Example 3 For random variables X and Y, the joint probability density function is given by

Find the marginal density Are independent?
and
Therefore,
and
Hence X and Y are not independent.
Bayes’ Rule for Mixed Random Variables

Let be a discrete random variable with probability mass function and be a continuous random
variable defined on the same sample space with the conditional probability density function In
practical problems we may have to estimate the conditional PMF of given the observed value We
can define this conditional PMF also in the limiting sense
Example 4
is a binary random variable with
is the Gaussian noise with mean

Then
Independent Random Variables Revisited
Let and be two random variables characterized by the joint distribution function
and the corresponding joint density function
Then and are independent if and are independent events. Thus,
and equivalently
Transformation of Two Random Variables
We are often interested in finding out the probability density function of a function of two or more RVs.
Following are a few examples.
• The received signal by a communication receiver is given by
where is received signal which is the superposition of the message signal and the noise .
The frequently applied operations on communication signals like modulation, demodulation, correlation
etc. involve multiplication of two signals in the form Z = XY.
We have to know about the probability distribution of in any analysis of . More formally, given two
random variables X and Y with joint probability density function and a function we
have to find .
In this lecture, we shall address this problem.
Probability Density of the Function of Two Random Variables
We consider the transformation
Consider the event corresponding to each z. We can find a variable subset such that

Probability density function of Z = X + Y .
Consider Figure 2

Figure 2
We have
Therefore, is the coloured region in the Figure 2.

If X and Y are independent
where * is the convolution operation.
Example 1
Suppose X and Y are independent random variables and each uniformly distributed over (a, b). and
are as shown in the figure below.

The PDF of is a triangular probability density function as shown in the figure.
Example 2 Erlang distribution
Suppose and are independent random variables and
We have
and
Note that is an Erlang distribution with
robability density function of Z = XY

Substituting u = xy du = xdy
Probability density function of
If X and Y are independent random variables, then

Example 3
Suppose X and Y are independent zero mean Gaussian random variable with unity standard deviation and
. Then
which is the Cauchy probability density function.
Probability density function of
Here
(changing from Cartesian into polar coordinates)

Example 4 Rayleigh random variable
Suppose X and Y are two independent Gaussian random variables each with mean 0 and variance and
.Then
The above is the Rayleigh density function which we discussed earlier.
Example 5 Rician random variable
Suppose X and Y are independent Gaussian variables with non-zero mean and respectively and
constant variance. We have to find the joint density function of the random variable .
Here

We have shown that
Suppose Then
where is the modified zeroth-order Bessel function of the first kind.
 When , and the Rician density reduces to the Rayleigh density
 When the Rician density approaches to the Gaussian density
 The Rician density is used to model the envelope of a sinusoid plus a narrow-band Gaussian noise.

 It is used to model the received noise in a multipath situation.
Joint Probability Density Function of Two Functions of Two Random Variables
We consider the transformation We have to find out the joint probability density
function where and .
Consider a differential region of area at point in the plane as shown in Figure 1.
Suppose the inverse mapping relation is and .
Let us see how the corners of the differential region are mapped to the plane. Observe that
Therefore,
The point is mapped to the point in the plane.

Figure 1
We can similarly find the points in the plane corresponding to and

The mapping is shown in Figure 2 . We notice that each differential region in the plane is a
parallelogram. It can be shown that the differential parallelogram at has a area
where is the Jacobian of the transformation defined as the determinant .
Further, it can be shown that the absolute values of the Jacobians of the forward and the inverse
transform are inverse of each other so that

where
Therefore, the differential parallelogram in Figure 2 has an area of
Suppose the transformation and has roots and let be the

roots. The inverse mapping of the differential region in the plane will be differential regions
corresponding to roots. The inverse mapping is illustrated in the following Figure 2 for As these
parallelograms are non- overlapping,
 If and does not have a root in , then

Example Pdf of Linear Transformation

Then
Example 2 Suppose X and Y are two independent Gaussian random variables each with mean 0 and
variance . Given and , find .
Solution:
We have and so that (1)
and (2)
From (1)
and
From (2)

Expected Values of Functions of Random Variables
Recall that
 If is a function of a continuous random variable then
 If is a function of a discrete random variable then

Suppose is a function of continuous random variables then the expected
value of is given by
Thus can be computed without explicitly determining .
We can establish the above result as follows.
Suppose has roots at . Then
where
is the differential region containing The mapping is illustrated in Figure 1 for .

Figure 1
Note that
As is varied over the entire axis, the corresponding (non-overlapping) differential regions in
plane cover the entire plane.
Thus,
If is a function of discrete random variables , we can similarly show that
Example 1 The joint pdf of two random variables is given by

Find the joint expectation of
Example 2 If
Proof:
Thus, expectation is a linear operator.
Example 3
Consider the discrete random variables discussed .The joint probability mass function of the
random variables are tabulated in Table . Find the joint expectation of .

Remark
(1) We have earlier shown that expectation is a linear operator. We can generally write
Thus
(2) If are independent random variables and ,then

Joint Moments of Random Variables
Just like the moments of a random variable provide a summary description of the random variable, so
also the joint moments provide summary description of two random variables. For two continuous random
variables , the joint moment of order is defined as
and the joint central moment of order is defined as
where and
Remark
(1) If are discrete random variables, the joint expectation of order and is defined as
(2) If and , we have the second-order moment of the random variables given by
(3) If are independent,

Covariance of two random variables
The covariance of two random variables is defined as
Cov(X,Y) is also denoted as .
Expanding the right-hand side, we get
The ratio is called the correlation coefficient. We will give an interpretation of
and later on.
We will also show that To establish the relation, we prove the following result:
For two random variables

Proof:
Consider the random variable
Non-negativity of the left-hand side implies that its minimum also must be nonnegative.
For the minimum value,

so the corresponding minimum is
Now
Thus
Uncorrelated random variables
Two random variables are called uncorrelated if

Recall that if are independent random variables, then
Then
Thus two independent random variables are always uncorrelated.
The converse is not always true.
• Two random variables may be dependent, but still they may be uncorrelated. If there exists
correlation between two random variables, one may be represented as a linear regression of the other. We
will discuss this point in the next section.
Linear Regression of on
Suppose is an approximation of Y in terms of X .This approximation is called the linear

regression of Y on X.
Therefore, is the regression error.
The mean square regression error is

Minimising the error will give optimal values of . Corresponding to the optimal solutions for
we have,
Solving for ,
so that
where is the correlation coefficient .
Thus is the linear regression of the random variable Y on the random variable
X.The linear regression approximates a random variable in terms of another random variable by means of a
straight- line fit.
The linear regression is illustrated in Figure 2.

Figure 2
Remark
If then are called positively correlated .
If then are called negatively correlated
If then are uncorrelated.
Note that independence implies uncorrelatedness. But uncorrelated generally does not imply
independence (except for jointly Gaussian random variables).
Example 4
are dependent, but they are uncorrelated.
Because
In fact for any zero- mean symmetric distribution of X, are uncorrelated.

Jointly Gaussian Random Variables
Many practically occuring random variables are modeled as jointly Gaussian random variables. For
example, noise samples at different instants in the communication system are modeled as jointly Gaussian
random variables.
Two random variables are called jointly Gaussian if their joint probability density function is
The joint pdf is determined by 5 parameters
 means
 variances
 correlation coefficient
We denote the jointly Gaussian random variables and with these parameters as
The joint pdf has a bell shape centred at as shown in the Figure 1 below. The variances
determine the spread of the pdf surface and determines the orientation of the surface in the

plane.
Figure 1 Jointly Gaussian PDF surface
Properties of jointly Gaussian random variables
(1) If and are jointly Gaussian, then and are both Gaussian.

We have
Similarly
(2) The converse of the above result is not true. If each of and is Gaussian, and are not
necessarily jointly Gaussian. Suppose
in this example is non-Gaussian and qualifies to be a joint pdf. Because,
and

The marginal density is given by
Similarly,
Thus and are both Gaussian, but not jointly Gaussian.
(3) If and are jointly Gaussian, then for any constants and ,the random variable given by
is Gaussian with mean and variance

(4) Two jointly Gaussian RVs and are independent if and only if and are uncorrelated
.Observe that if and are uncorrelated, then
Example 1 Suppose X and Y are two jointly-Gaussian 0-mean random variables with variances of 1 and 4
respectively and a covariance of 1. Find the joint PDF
We have
Joint Characteristic Functions of Two Random Variables
The joint characteristic function of two random variables X and Y is defined by
If and are jointly continuous random variables, then

Note that is same as the two-dimensional Fourier transform with the basis function
instead of
is related to the joint characteristic function by the Fourier inversion formula
If and are discrete random variables, we can define the joint characteristic function in terms of the
joint probability mass function as follows:
Properties of the Joint Characteristic Function
The joint characteristic function has properties similar to the properties of the chacteristic function of a
single random variable. We can easily establish the following properties:
1.
2.
3. If and are independent random variables, then
4. We have,

Hence,
In general, the order joint moment is given by
Example 2 The joint characteristic function of the jointly Gaussian random variables and with the
joint pdf
Let us recall the characteristic function of a Gaussian random variable

If and are jointly Gaussian,
we can similarly show that
We can use the joint characteristic functions to simplify the probabilistic analysis as illustrated on next
page:

Example 3 Linear transformation of two random variables
Suppose then
If and are jointly Gaussian, then
which is the characteristic function of a Gaussian random variable with
mean and variance
Thus the linear transformation of two Gaussian random variables is a Gaussian random variable.
Example 4 If Z = X + Y and X and Y are independent, then
Using the property of the Fourier transform, we get
Conditional Expectation
Recall that

 If are continuous random variables, then the conditional density function of
is given by
 If are discrete random variables, then the probability mass function of

is given by
The conditional expectation of is defined by
The conditional expectation of is also called the conditional mean of .
Clearly, denotes the centre of mass of the conditional pdf or the conditional pmf as shown in Figure
1 on next page.
Remark
 We can similarly define the conditional expectation of

 Higher-order conditional moments can be defined in a similar manner.
 Particularly, the conditional variance of is given by

Example 1
Consider the discrete random variables discussed in example 4 in lecture 18.The joint
probability mass function of the random variables are tabulated in Table . Find the joint expectation of
The conditional probability mass function is given by

Similarly, we can show that
We also note that
Example 2
Suppose are jointly uniform random variables with the joint probability density function
given by
Find

From the figure, in the shaded area.
Figure 2
We have
Example 3

Suppose are jointly Gaussian random variables with the joint probability density function
given by
Find .
We have,
which is a Gaussian distribution.

Therefore,
We can similarly show that
Conditional Expectation as a random variable
Using this function, we may define a random variable . Thus we may
consider as a function of the random variable and as the value of at .
Total expectation theorem
We establish the following results.
and
Proof :

Thus
and similarly
The above results simplify the calculation of the unconditional expectations .

We can also show that
and
Example 4 In example 1, we have

Baysean Estimation theory and conditional expectation
Consider two random variables with joint pdf . Suppose is observable and
some a priori information about is available in a sense that some values of are more likely. We can
represent this prior information in the form of a prior density function . We have to estimate for
a given value in some optimal sense.
The conditional density function is called likelihood function in estimation terminology.
Clearly
Also we have the Bayes rule

where is the a posteriori density function
Suppose the optimum estimator is a function of the random variable such that it minimizes the
mean-square estimation error . Such an estimator is known as the minimum mean-square

error(MMSE) estimator.
The estimation problem is
Minimize with respect to .
This is equivalent to minimizing
Since is always positive, the above integral will be minimum if the inner integral is minimum. This
results in the problem :
Minimize with respect to .

The minimum is given by

Example 5
Suppose are two jointly Gaussian random variables considered in the earlier example. We
have to estimate from a single observation .The MMSE estimator is given by
If and
then
Thus the MMSE estimator for two zero-mean jointly Gaussian random variables is linearly
related with the data . This result plays an important role in the optimal filtering of random signals.
Markov Inequality
Let us first take a look at the Markov Inequality. Even though the statement looks very simple, clever
application of the inequality is at the heart of more powerful inequalities like Chebyshev or Chernoff.
Initially, we will see the simplest version of the inequality and then we will discuss the more general
version. The basic Markov inequality states that given a random variable X that can only take non negative
values, then

There are some basic things to note here. First the term P(X >= k E(X)) estimates the probability that the
random variable will take the value that exceeds k times the expected value. The term P(X >= E(X)) is
related to the cumulative density function as 1 – P(X < E(X)). Since the variable is non negative, this
estimates the deviation on one side of the error.
Intuitive Explanation of Markov Inequality

Intuitively, what this means is that , given a non negative random variable and its expected value E(X)
(1) The probability that X takes a value that is greater than twice the expected value is atmost half. In other
words, if you consider the pmf curve, the area under the curve for values that are beyond 2*E(X) is atmost
half.
(2) The probability that X takes a value that is greater than thrice the expected value is atmost one third.
and so on.
Let us see why that makes sense. Let X be a random variable corresponding to the scores of 100 students in
an exam. The variable is clearly non negative as the lowest score is 0. Tentatively lets assume the highest
value is 100 (even though we will not need it). Let us see how we can derive the bounds given by Markov
inequality in this scenario. Let us also assume that the average score is 20 (must be a lousy class!). By
definition, we know that the combined score of all students is 2000 (20*100).
Let us take the first claim – The probability that X takes a value that is greater than twice the expected
value is atmost half. In this example, it means the fraction of students who have score greater than 40
(2*20) is atmost 0.5. In other words atmost 50 students could have scored 40 or more. It is very clear that it
must be the case. If 50 students got exactly 40 and the remaining students all got 0, then the average of
the whole class is 20. Now , if even one additional student got a score greater than 40, then the total score
of 100 students become 2040 and the average becomes 20.4 which is a contradiction to our original
information. Note that the scores of other students that we assumed to be 0 is an over simplification and
we can do without that. For eg, we can argue that if 50 students got 40 then the total score is atleast 2000
and hence the mean is atleast 20.
We can also see how the second claim is true. The probability that X takes a value that is greater than thrice
the expected value is atmost one third. If 33.3 students got 60 and others got 0, then we get the total score
as around 2000 and the average remains the same. Similarly, regardless of the scores of other 66.6
students, we know that the mean is atleast 20 now.
This also must have made clear why the variable must be non-negative. If some of the values are negative,
then we cannot claim that mean is atleast some constant C. The values that do not exceed the threshold
may well be negative and hence can pull the mean below the estimated value.
Let us look at it from the other perspective : Let p be the fraction of students who have a score of atleast a .
Then it is very clear to us that the mean is atleast a*p. What Markov inequality does is to turn this around.
It says, if the mean is a*p then the fraction of students with a score greater than a is atmost p. That is, we
know the mean here and hence use the threshold to estimate the fraction .
Generalized Markov Inequality

The probability that the random variable takes a value thats greater than k*E(X) is at most 1/k. The fraction
1/k act as some kind of a limit. Taking this further, you can observe that given an arbitrary constant a, the
probability that the random variable X takes a value >= a ie P(X >= a) is atmost 1/a times the expected
value. This gives the general version of Markov inequality.
In the equation above, I seperated the fraction 1/a because that is the only varying part. We will later see
that for Chebyshev we get a similar fraction. The proof of this inequality is straightforward. There are
multiple proofs even though we will use the follow proof as it allows us to show Markov
inequality graphically.This proof is partly taken from Mitzenmacher and Upfal’s exceptional book on
Randomized Algorithms.
Consider a constant a >= 0. Then define an indicator random variable I which takes value of 1 is X >=a . ie
Now we make a clever observation. We know that X is non negative. ie X >= 0. This means that the
fraction X/a is atleast 0 and atmost can be infinty. Also, if X < a, then X/a < 1. When X > a, X/a > 1. Using
these facts,
If we take expectation on both sides, we get
But we also know that the expectation of indicator random variable is also the probability that it takes the
value 1. This means E[I] = Pr(X>=a). Putting it all together, we get the Markov inequality.
Even more generalized Markov Inequality

Sometimes, it might happen that the random variable is not non-negative. In cases like this, a clever hack
helps. Design a function f(x) such that f(x) is non negative. Then we can apply Markov inequality on the
modified random variable f(X). The Markov inequality for this special case is :
This is a very powerful technique. Careful selection of f(X) allows you to derive more powerful bounds.
(1) One of the simplest examples is f(X) = |X| which guarantees f(X) to be non negative.
(2) Later we will show that Chebyshev inequality is nothing but Markov inequality that
uses
(3) Under some additional constraints, Chernoff inequality uses .
Simple Examples

Let us consider a simple example where it provides a decent bound and one where it does not. A typical
example where Markov inequality works well is when the expected value is small but the threshold to test
is very large.
Example 1:
Consider a coin that comes up with head with probability 0.2 . Let us toss it n times. Now we can use
Markov inequality to bound the probability that we got atleast 80% of heads.
Let X be the random variable indicating the number of heads we got in n tosses. Clearly, X is non negative.
Using linearity of expectation, we know that E[X] is 0.2n.We want to bound the probability P(X >= 0.8n).
Using Markov inequality , we get
Of course we can estimate a finer value using the Binomial distribution, but the core idea here is that we do
not need to know it !
Example 2:
For an example where Markov inequality gives a bad result, let us the example of a dice. Let X be the face
that shows up when we toss it. We know that E[X] is 7/2 = 3.5. Now lets say we want to find the probability
that X >= 5. By Markov inequality,
The actual answer of course is 2/6 and the answer is quite off. This becomes even more bizarre , for
example, if we find P(X >= 3) . By Markov inequality,
The upper bound is greater than 1 ! Of course using axioms of probability, we can set it to 1 while the
actual probability is closer to 0.66 . You can play around with the coin example or the score example to find
cases where Markov inequality provides really weak results.
Tightness of Markov
The last example might have made you think that the Markov inequality is useless. On the contrary, it
provided a weak bound because the amount of information we provided to it is limited. All we provided to
it were that the variable is non negative and that the expected value is known and finite. In this section, we
will show that it is indeed tight – that is Markov inequality is already doing as much as it can.
From the previous example, we can see an example where Markov inequality is tight. If the mean of 100
students is 20 and if 50 students got a score of exactly 0, then Markov implies that atmost 50 students can
get a score of atleast 40.
Consider a random variable X such that

We can estimate its expected value as
We can see that ,

This implies that the bound is actually tight ! Of course one of the reasons why it was tight is that the other
value is 0 and the value of the random variable is exactly k. This is consistent with the score example we
saw above.
Chebyshev Inequality
Chebyshev inequality is another powerful tool that we can use. In this inequality, we remove the restriction
that the random variable has to be non negative. As a price, we now need to know additional information
about the variable – (finite) expected value and (finite) variance. In contrast to Markov, Chebyshev allows
you to estimate the deviation of the random variable from its mean. A common use of it estimates the
probability of the deviation from its mean in terms of its standard deviation.
Similar to Markov inequality, we can state two variants of Chebyshev. Let us first take a look at the simplest
version. Given a random variable X and its finite mean and variance, we can bound the deviation as
There are few interesting things to observe here :

(1) In contrast to Markov inequality, Chebyshev inequality allows you to bound the deviation on both sides
of the mean.
(2) The length of the deviation is on both sides which is usually (but not always) tighter than the bound k
E[X]. Similarly, the fraction 1/k^2 is much more tighter than 1/k that we got from Markov inequality.
(3) Intuitively, if the variance of X is small, then Chebyshev inequality tells us that X is close to its expected
value with high probability.
(4) Using Chebyshev inequality, we can claim that atmost one fourth of the values that X can take is beyond
2 standard deviation of the mean.
Generalized Chebyshev Inequality

A more general Chebyshev inequality bounds the deviation from mean to any constant a . Given a positive
constant a ,
Proof of Chebyshev Inequality

The proof of this inequality is straightforward and comes from a clever application of Markov inequality. As
discussed above we select . Using it we get ,
We used the Markov inequality in the second line and used the fact that .

Common Pitfalls
It is important to notice that Chebyshev provides bound on both sides of the error. One common mistake
to do when applying Chebyshev is to divide the resulting probabilistic bound by 2 to get one sided error.
This is valid only if the distribution is symmetric. Else it will give incorrect results. You can refer Wikipedia to
see one sided Chebyshev inequalities.
Chebyshev Inequality for higher moments

One of the neat applications of Chebyshev inequality is to use it for higher moments. As you would have
observed, in Markov inequality, we used only the first moment. In the Chebyshev inequality, we use the
second moment (and first). We can use the proof above to adapt Chebyshev inequality for higher
moments. In this post, I will give a simple argument for even moments only. For general argument (odd and
even) look at this Math Overflow post.
The proof of Chebyshev for higher moments is almost exactly the same as the one above. The only
observation we make is that is always non negative for any k. Of course the next
observation is gives the 2k^th central moment . Using the statement from Mitzenmacher
and Upfal’s book we get ,
It should be intuitive to note that the more information we get the tighter the bound is. For Markov we got
1/t as the fraction. It was 1/a^2 for second order Chebyshev and 1/a^k for k^th order Chebyshev inequality.
Chebyshev Inequality and Confidence Interval

Using Chebyshev inequality, we previously claimed that atmost one fourth of the values that X can take is
beyond 2 standard deviation of the mean. It is possible to turn this statement around to get a confidence
interval.
If atmost 25% of the population are beyond 2 standard deviations away from mean, then we can be
confident that atleast 75% of the population lie in the interval . More generally,
we can claim that, percentage of the population lies in the
interval . We can similarly derive that 94% of the population lie within 4
standard deviations away from mean.
Applications of Chebyshev Inequality

We previously saw two applications of Chebyshev inequality – One to get tighter bounds using higher
moments without using complex inequalities. The other is to estimate confidence interval. There are some
other cool applications that we will state without providing the proof. For proofs refer the Wikipedia entry
onChebyshev inequality.
(1) Using Chebyshev inequality, we can prove that the median is atmost one standard deviation away from
the mean.
(2) Chebyshev inequality also provides the simplest proof for weak law of large numbers.
Tightness of Chebyshev Inequality

Similar to Markov inequality, we can prove the tightness of Chebyshev inequality. I had fun deriving this
proof and hopefully some one will find it useful. Define a random variable X as ,
[Note: I could not make the case statement work in WordPress Latex and hence the crude work around]
X={ + C with probability p
{ – C with probability p
{ with probability 1-2p
If we want to find the probability that the variable deviates from mean by constant C, the bound provided
by Chebyshev is ,
which is tight !
10. Questions
Objective Questions
1. Which operator is used to calculate average value
a. A[ ]
b. E[ ]
c. D[ ]
d. Z[ ]
2. Mean square value of RV is given as
a. E[X2]
b. E2[X]
c. [E[X2]]2
d. E[X2]

3. If X,Y are random variable, E[AX+BY] equals
a. A E[X] + B E[Y]
b. AX + B E[Y]
c. A E[X] + BY
d. A E[X + BY]
4. If X is uniformly distributed in [0,1] then P( X > 0.2) is
a. 0.2
b. 0.5
c. 0.7
d. 0.9
5. COV( X,Y) is defined as
a. E[(X-µX)(Y-µY)]
b. E[X-µY]
c. E[Y-µX]
d. E[µX-µY]
Short Questions
1. Write the formula to express the pdf of RV which is function of another RV.
2. Define expected value of RV
3. Write the formula to find the expected value of continuous & discrete RV
4. Write the formula to find the expected value of function of RV
5. Define variance of RV
6. Write the formula to find the variance of RV

7. Define Standard Deviation.
8. Write the formula to find the standard deviation of RV
9. What is Chebyshev inequality.
10. What is moment generating function
11. Define n-th order moment
12. Define n-th order characteristic function
Long Questions
1. The joint probability density function of (x,y) is given as
fX,Y(x,y)=20 e-4Xe-5Y x > 0, y > 0
=0 otherwise
Find the probability that
(i) 0 < x < 2 and 0.3 < y < 0.4

(ii) x < 1 and y > 0.3
2. Find the probability density function of Z = X + Y where X and Y are
(i) Any two random variables
(ii) Independent.
3. If X is a continuous random variable and Y= 4X + 2 then, find the pdf of Y.
4. Given: f(x,y) = k, 0<y<x<1
= 0 otherwise
Determine k and the conditional densities fx/y(x/ y) and Fy/x(y/ x).
5. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are
(i) Orthogonal
(ii) independent
(iii) Uncorrelated?
(iv) Are uncorrelated random variables independent?
Dec 2012
Q. If X and Y are two independent random variable with identical uniform distribution in (0,1) find
probability density function of (U,V) where U= X + Y and V=X-Y. Are U, V independent ?
May 2012
Q. The joint density function of two dimensional random variable (X,Y) is given by
f(x,y)= kxye-(x2 +y2)
Find (i) the value of K
(i) Marginal density function of X & Y

(ii) Conditional density function of Y given that X=x and the conditional density function of X
given Y=y
(iii) Check for independence of X and Y
Q. Suppose X and Y are continuous random variable with joint probability density function
f(x,y)= xe-y ; 0 < x < 2, y> 0
= 0 elsewhere

(i) Find joint cumulative function of X and Y
(ii) Find marginal probability function of X and Y
Dec 2011
Q. The joint probability density function of (x,y) is given as
fX,Y(x,y)=15 e-3Xe-5Y x > 0, y > 0
=0 otherwise
(i) x < 2 and y > 0.2

(ii) find marginal density of X and Y
(iii) Are X and Y independent
(iv) Find E[X/Y] and E[Y/X]
Q. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are (i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated
random variables independent?
May 2011
Q. The joint density function of two dimensional random variable (X,Y) is given by
f(x,y)= kxye-(x2 +y2)
Find (i) the value of K
(i) Marginal density function of X & Y

(ii) Conditional density function of Y given that X=x and the conditional density function of X
given Y=y
(iii) Check for independence of X and Y

Q. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are (i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated
random variables independent?
Dec 2010
Q. The joint probability density function of (x,y) is given as f X,Y(x,y)=C(1-x-y)
for the values of x and y for which (x,y) lies within the triangle as shown
outside the triangle fX,Y(x,y)= 0
Find (i) C
(ii) fX(x)
(iii) fY(y)
May 2010
The joint probability density function of (x,y) is given as
fX,Y(x,y)=CeXeY 0<X<Y<∞
=0 otherwise

Find (i) Normalization constant C
(ii) fX(x)
(iii) FY(y)
(iv) FX(x/y)
(v) FY(y/x)
(vi) EY(y/x)
(vii) EX(x/y)
Q. If fX,Y(x,y)=2eXeY 0<X<Y<∞
=0 otherwise
Find the correlation coefficient of X and Y. Are X and Y independent ?
Q. State and explain joint & conditional probabilities of the event.
Q. The characteristic function of the Laplace variable
f(x)=(m/2)e-m|x| -∞<x<∞
also find its mean and variance.
Dec 2009
Q. The joint probability density function of (x,y) is given as
fX,Y(x,y)=15 e-3Xe-5Y x > 0, y > 0
=0 otherwise
(v) 1 < x < 2 and 0.2 < y < 0.3

(vi) x < 2 and y > 0.2
Q. If X and Yare two independent exponential random variables with probability density functions given by
fx(x) = 2.e-2x , x > 0
= 0, x < 0 and
fy(Y) = 3.e-3y , y > 0
= 0, y<0
Find the probability density function of z =x + y.
June 2005
Q. We define conditional cdf of Y given X = x by
Fy(y Ix) = lim Fy(Y IX < X <.x + h) and applying Baye's rule it can be written as
h->0
FY(y| x <X< x+h)=lim P[Y<y, x<X<x+h] /P[x<X<x+h]
h->0
show that fY(y/x)= d/dy. [FY(y/x)]=fxy(x.y) / fx((x)
Find the characteristic function Фx(w) and hence determine the expected value of X.
May 2007
Q. The joint probability density function of two dimensional random variable (x, y) is given by - fxy(X, y) =
4xy e-(x2 +y2 ), x >0, y > 0
(i) Find the marginal density functions of x and y.
(ii) Find the conditional density function of Y given that X=x and the conditional density

function of X given that Y = y.
(iii) Check for independence of X and Y.
Q. If X and Yare two independent exponential random variables with probability density functions given by
fx(x) = 2.e-2x , x > 0
= 0, x < 0 and
fy(Y) = 3.e-3y , y > 0
= 0, y<0
Find the probability density function of z =x + y.
Q. The joint probability density function of (x, y) is f xy(x,y) =8.e-(2x+-4y); x, y > O.
If U= X/ Y and V= y, find the joint probability density function of (U, V) and hence find probability density
function of U.
Q. If X and Yare two random variables with standard 'deviations 6x and 6y and if C xy is the covariance
between them, then prove .
(i) Cxy(x, y) = Rxy (x, y) - E[X].E[Y]
(ii) | Cxy | < 6x.6y.
Also deduce that -1 < p < 1.
Q. If X = cos θ and Y = sin θ where θ is uniformly distributed over (0,2π).

Prove that -
(I) X and Yare uncorrelated.
(ii) X and Yare not independent.
Dec 2006
(a) Suppose X and Y are two random variables. Define covariance and
correlation coefficient of X and Y. When do we say that X and Y are
(i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated random variables independent?
(b) Suppose that X and Yare continuous random variables with joint
probability density function:
f xy (x, y) = 1/2 xe - Y , 0 < x < 2, Y > 0
= 0 elsewhere
find: i) the joint distribution function of X; Y and
ii) the marginal probability density functions of X and Y.
Q. (a) Find the probability density function of Z = X + Y where X and Yare
(i) any two random variables (ii) independent.
If X and Yare independent, Binomial random variables with parameters (m,p) and (n,p) respectively, obtain
the distribution of X+Y.

(b) Given: f(x,y) = k, 0<x<y<1
= 0 otherwise
Determine k and the conditional densities fx/y(x/ y) and Fy/x(y/ x).
June 2006
(a) Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Yare (i) orthogonal (ii) independent and (iii) uncorrelated ?
(b) Suppose X and Yare continuous random variables with joint probability density function.
f( x y) = (x + y)2 /40 ,-1 < x < 1 - 3 < Y < 3
= 0 elsewhere
Find (i) the marginal density functions of X and Y
(ii) means and variances of X and Y
(iii) correlation coefficient of X and Y [12]
(a) Suppose X and Y are independent random variables and each is exponentially distributed with common
parameter A. That is,
f x (x) = λe-λx, and f y (y) = λe-λy. Let the random variables u and v be given by u=X+ Y and v=X-Y, obtain the
joint density of u and v and the marginal density of u.
(b) Let f x y(x,y) = 1, 0< |Y| < x < 1
Determine E (XlY) and E (YIX).
Dec 2005
(b) If X and Yare independent random variables and z = x + y, find f(z) by the transform method.
June 2005

Q. fX,Y(x,y) = Cexey 0<y<x<∞
=0 otherwise
Find the value of constant C
Find (i) fX(x) (ii) FX(x) (iii) FY(y) (iv) FX(x/y) (v) fY(y/x) (vi) E(Y/x) (vii) E(X/y)
Q.(a) Let X and Y two continuous random variables. -
(i) Derive an expression for their joint moment at the origin. Why it is called correlation?
Explain its physical significance.
(ii) Derive an expression for their joint central moment. Why it is called covariance? Explain
its physical significance.
(iii) Derive an expression for 'their normalized covariance. Why it is called covariance
coefficient? Explain what is its physical significance? What is its range of values?
(iv) Explain when X and Yare orthogonal, when they are independent and they are uncorrelated
Q. fX,Y(x,y)= (x+y)2/40 -1<x<1, -3<y<3
=0 otherwise
(i) Find marginal densities of X and Y.
(ii) Find mean and variance of X and Y.
(iii) Find second order moment of X and Y.
(iv) Find correlation coefficient of X and Y.
Dec 2004
(a) Let X and Y two continuous random variables. What is meant by their correlation function R xy ? Derive
an expression for Rxy. given their joint density function, fxy (x, y). What happens to Rxy ?

(i) if X and Yare independent? (ii) if X and Yare orthogonal?
(b) What is meant by the covariance function Cxy of the random variables of X and Y. Write and expression
for the covariance. Given the mean value of X and Yare μx and μy respectively. Under which conditions Cxy is
positive? Under which conditions Cxy is negative? .What is normalized covariance or covariance coefficient?
Write the expression for ρ and its range of values.
(c) Let X and Y be two continuous random variables with means equal to 7/12 and 7/12 respectively and
variances equal to 11/44 and 11/44 respectively and their joint probability density function.
fX,Y(x,y)= x+y 0<x<1, 0<y<1
=0 otherwise
Determine correlation function, covariance function, and correlation coefficient of X and V.
Q. Define the characteristic functions Φx(w) and Φy(w) of the continuous random variables X and Y
respectively and find the probability density function fz(Z), given Z = X + Y.

CHAPTER-4
Sequence of Random Variables and Convergence
 Motivation
 Syllabus
Sr. No. of Self
No. Hours Study
01 Stochastic  Sequence of random variables 1 hour 1 hour
Convergence
and limit  Convergence everywhere,
theorems almost everywhere,
 MS, in probability, in 1 hour 1 hour

distribution a
 comparison of convergence
modes,

 strong law of large numbers 1 hour 1 hour
 Central Limit Theorem

and its significance.
 Books Recommended




5. Objective
In this chapter we study the laws of large numbers and the central limit theorem, which is one of the most
remarkable results in probability theory, are discussed.
6. Key Notation

E[ ], µ Expected value
σ Standard Deviation
σ 2, var[ ] Variance
E[Xn] n-th order moment of RV
ΦX(w) Characteristic function
7. Key Definitions & Relations
1. The weak law of large numbers states that the sample average converges in probability towards
the expected value
That is to say that for any positive number ε,
2. Central Limit Theorem
Consider independent random variables .The mean and variance of each
of the random variables are assumed to be known. Suppose and .

Form a random variable
The mean and variance of are given by

9. Theory
Convergence of a sequence of random variables
Let be a sequence n independent and identically distributed random variables. Suppose

we want to estimate the mean of the random variable on the basis of the observed data by means of the
relation
How closely does represent the true mean as n is increased? How do we measure the
closeness between and ?
Notice that is a random variable.What do we mean by the statement converges to ?
 Consider a deterministic sequence of real numbers The sequence converges to a
limit x if corresponding to every we can find a positive integer N such that
For example, the sequence converges to the number 0.
Because, for any we can choose a positive integer such that
 The Cauchy criterion gives the condition for convergence of a sequence without actually finding the
limit. The sequence converges if and only if, for every there exists a positive

integer N such that
Convergence of a random sequence cannot be defined as above. Note that for each
represent a sequence of numbers.
Thus represents a family of sequences of numbers. Convergence of a random sequence is

to be defined using different criteria. Five of these criteria are explained below.
1. Convergence Everywhere
A sequence of random variables is said to converge everywhere to X if
Note here that the sequence of numbers for each sample point is convergent.
2. Almost sure (a.s.) convergence or convergence with probability 1
A random sequence may not converge for every . Consider the event
The sequence is said to converge to X almost sure or with probability 1
We write in this case.
One important application is the Strong Law of Large Numbers (SLLN) :
If are independent and identically distributed random variables with a finite mean

, then
Remark:
 is called the sample mean.
 The strong law of large numbers states that the sample mean converges to the true mean as the
sample size increases.
 The SLLN is one of the fundamental theorems of probability. There is a weaker version of the law
that we will discuss later.
3. Convergence in the mean square sense
A random sequence is said to converge in the mean-square sense (m.s) to a random

variable X if
X is called the mean-square limit of the sequence and we write
where means limit in mean-square. We also write
 The following Cauchy criterion gives the condition for m.s. convergence of a random sequence
without actually finding the limit. The sequence converges in m.s. if and only if ,
for every there exists a positive integer N such that

Example 1
If are iid random variables, then
we have to show that
Now,
4. Convergence in probability
Associated with the sequence of random variables we can define a sequence of

probabilities
for every .
The sequence is said convergent to X in probability if this sequence of probability is
convergent that is for every .
We write to denote convergence in probability of the sequence of random variables
to the random variable X.
If a sequence is convergent in mean, then it is convergent in probability also, because
[ Markov Inequality ]

we have,
if ( mean square convergent ) then
Example 2
suppose be a sequence of random variables with
Clearly,
Therefore
Thus the above sequence converges to a constant in probability.
Remark:
Convergence in probability is also called stochastic convergence.
Weak Law of Large numbers
Suppose are independent and identically distributed random variables, with sample
mean

We have,
5. Convergence in distribution
Consider the random sequence and a random variable . Suppose and
are the distribution functions of and respectively. The sequence is said to converge to in
distribution if
for all x at which is continuous. Here the two distribution functions eventually coincide. We write
to denote convergence in distribution of the random sequence to the random variable

.
Example 3 Suppose is a sequence of RVs with each random variable having the
uniform density

define
We can show that
clearly,
Relation between Types of Convergence
Central Limit Theorem
Consider independent random variables .The mean and variance of each of the
random variables are assumed to be known. Suppose and . Form a random

variable
The mean and variance of are given by

and
Thus we can determine the mean and the variance of .
Can we guess about the probability distribution of ?

The central limit theorem (CLT) provides an answer to this question.
he CLT states that under very general conditions converges in distribution to
as . The conditions are:
1. The random variables are independent and identically distributed.
2. The random variables are independent with same mean and variance, but not
identically distributed.
3. The random variables are independent with different means and same variance and
not identically distributed.
4. The random variables are independent with different means and each variance
being neither too small nor too large.
We shall consider the first condition only. In this case, the central-limit theorem can be stated as follows:
Suppose is a sequence of independent and identically distributed random variables
each with mean and variance and Then, the sequence { } converges in
distribution to a Gaussian random variable with mean 0 and variance . That is,

Remarks
 The central limit theorem is really a property of convolution. Consider the sum of two statistically
independent random variables, say, . Then the pdf is the convolution of
. This can also be shown with the help of the characteristic functions as follows:
where * is the convolution operation.
We can illustrate the CLT by convolving two uniform distributions repeatedly. In Figure 1, the
convolution of two uniform distributions gives a triangular distribution. Further convolution gives a
parabolic distribution and so on.

Proof of the Central Limit Theorem :
We give a less rigorous proof of the theorem with the help of the characteristic function. Further we
consider each of to have zero mean. Thus,
Clearly,
The characteristic function of is given by
We will show that as the characteristic function is of the form of the characteristic function of a
Gaussian random variable.
Expanding in power series

Assume all the moments of to be finite. Then
Substituting
where is the average of terms involving and higher powers of .
Note also that each term in involves a ratio of a higher moment and a power of and therefore,
which is the characteristic function of a Gaussian random variable with 0 mean and variance .
Remark
1. Under the conditions of the CLT, the sample mean converges in distribution to
In other words, if samples are taken from any distribution with mean and variance
, as the sample size increases, the distribution function of the sample mean approaches to the
distribution function of a Gaussian random variable.
2. The CLT states that the distribution function converges to a Gaussian distribution function.
The theorem does not say that the pdf is a Gaussian pdf in the limit. For example, suppose
each has a Bernoulli distribution. Then the pdf of Y consists of impulses and can never approach

the Gaussian pdf.
3. The Cauchy distribution does not meet the conditions for the central limit theorem to hold. As we
have noted earlier, this distribution does not have a finite mean or a finite variance. Suppose a
random variable has the Cauchy distribution
The characteristic function of is given by
The sample mean will have the characteristic function
Thus the sum of large number of Cauchy random variables will not follow a Gaussian distribution.
1. The central-limit theorem is one of the most widely used results of probability. If a random variable
is result of several independent causes, then the random variable can be considered to be Gaussian.
For example,
1. the thermal noise in a resistor is result of the independent motion of billions of electrons
and is modelled as Gaussian.
2. the observation error/ measurement error of any process is modeled as a Gaussian.
2. The CLT can be used to simulate a Gaussian distribution given a routine to simulate a particular
random variable.
3. Normal approximation of the Binomial distribution
4. One of the applications of the CLT is in approximation of the Binomial coefficients. We have already
stated about this approximation.Suppose, .... is a sequence of Bernoulli(p) random

variables with
5. Then is a Binomial distribution with and
Thus,
or,
( assume the integrand interval = 1 )

This is normal approximation to the Binomial coefficients and is known as the De-Moirre-Laplace
approximation.
Law of large numbers (LLN)
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing
the same experiment a large number of times. According to the law, the average of the results obtained
from a large number of trials should be close to the expected value, and will tend to become closer as more
trials are performed.
The LLN is important because it "guarantees" stable long-term results for the averages of some random
events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will
tend towards a predictable percentage over a large number of spins. Any winning streak by a player will
eventually be overcome by the parameters of the game. It is important to remember that the LLN only
applies (as the name indicates) when a large number of observations are considered. There is no principle
that a small number of observations will coincide with the expected value or that a streak of one value will
immediately be "balanced" by the others
Examples
For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with
equal probability. Therefore, the expected value of a single die roll is

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their
values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more
dice are rolled.
It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli
trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the
theoretical probability of success, and the average of n such variables (assuming they are independent and
identically distributed (i.i.d.)) is precisely the relative frequency.
For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability
that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the
proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion
of heads after n flips will almost surely converge to 1/2 as n approaches infinity.
Though the proportion of heads (and tails) approaches 1/2, almost surely the absolute (nominal) difference
in the number of heads and tails will become large as the number of flips becomes large. That is, the
probability that the absolute difference is a small number, approaches zero as the number of flips becomes
large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero.
Intuitively, expected absolute difference grows, but at a slower rate than the number of flips, as the
number of flips grows.
Forms
Two different versions of the law of large numbers are described below; they are called the strong law of
large numbers, and the weak law of large numbers. Both versions of the law state that – with virtual
certainty – the sample average
converges to the expected value
where X1, X2, ... is an infinite sequence of i.i.d. Lebesgue integrable random variables with expected value
E(X1) = E(X2) = ...= µ. Lebesgue integrability of Xj means that the expected value E(Xj) exists according to
Lebesgue integration and is finite.
An assumption of finite variance Var(X1) = Var(X2) = ... = σ2 < ∞ is not necessary. Large or infinite variance
will make the convergence slower, but the LLN holds anyway. This assumption is often used because it
makes the proofs easier and shorter.
The difference between the strong and the weak version is concerned with the mode of convergence being
asserted. For interpretation of these modes, see Convergence of random variables.

Weak law
The weak law of large numbers (also called Khintchine's law) states that the sample average converges in
probability towards the expected value[6][proof]
That is to say that for any positive number ε,
Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter
how small, with a sufficiently large sample there will be a very high probability that the average of the
observations will be close to the expected value; that is, within the margin.
Convergence in probability is also called weak convergence of random variables. This version is called the
weak law because random variables may converge weakly (in probability) as above without converging
strongly (almost surely) as below.
Strong law
The strong law of large numbers states that the sample average converges almost surely to the expected
value[7]
That is,
The proof is more complex than that of the weak law.[8] This law justifies the intuitive interpretation of the
expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-
term average".
Almost sure convergence is also called strong convergence of random variables. This version is called the
strong law because random variables which converge strongly (almost surely) are guaranteed to converge
weakly (in probability). The strong law implies the weak law but not vice versa, when the strong law
conditions hold the variable converges both strongly (almost surely) and weakly (in probability) . However
the weak law may hold in conditions where the strong law does not hold and then the convergence is only
weak (in probability) .
There are different views among mathematicians whether the two laws could be unified to one law,
thereby replacing the weak law.[9]

However the strong law conditions could not be proven to hold same as the weak law to date.
The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem.
Moreover, if the summands are independent but not identically distributed, then
provided that each Xk has a finite second moment and
Differences between the weak law and the strong law
The weak law states that for a specified large n, the average is likely to be near μ. Thus, it leaves open
the possibility that happens an infinite number of times, although at infrequent intervals.
The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1,
we have that for any ε > 0 the inequality holds for all large enough n.
Central Limit Theorem Examples
A Central Limit Theorem word problem will most likely contain the phrase “assume the variable is normally
distributed”, or one like it. With these central limit theorem examples, you will be given:
A population (i.e. 29-year-old males, seniors between 72 and 76, all registered vehicles, all cat owners)
An average (i.e. 125 pounds, 24 hours, 15 years, $15.74)
A standard deviation (i.e. 14.4lbs, 3 hours, 120 months, $196.42)
A sample size (i.e. 15 males, 10 seniors, 79 cars, 100 households)
Central Limit Theorem Examples: Greater than
General Steps
Step 1:Identify the parts of the problem. Your question should state:
the mean (average or μ)
the standard deviation (σ)

population size
sample size (n)
a number associated with “greater than” ( ). Note: this is the sample mean. In other words, the problem
is asking you “What is the probability that a sample mean of x items will be greater than this number?
Step 2: Draw a graph. Label the center with the mean. Shade the area roughly above (i.e. the “greater
than” area). This step is optional, but it may help you see what you are looking for.
Step 3: Use the following formula to find the z-score. Plug in the numbers from step 1.
Click here if you want easy, step-by-step instructions for solving this formula.
Subtract the mean (μ in step 1) from the ‘greater than’ value ( in step 1). Set this number aside for a
moment.
Divide the standard deviation (σ in step 1) by the square root of your sample (n in step 1). For example, if
thirty six children are in your sample and your standard deviation is 3, then 3/√36=0.5
Divide your result from step 1 by your result from step 2 (i.e. step 1/step 2)
Step 4: Look up the z-score you calculated in step 3 in the z-table. If you don’t remember how to look up z-
scores, you can find an explanation in step 1 of this article: Area to the right of a z-score.
Step 5: Subtract your z-score from 0.5. For example, if your score is 0.1554, then 0.5 – 0.1554 = 0.3446.
Step 6: Convert the decimal in Step 5 to a percentage. In our example, 0.3446 = 34.46%.
That’s it!
2. Specific Example

Q. A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation
of $20. If a random sample of 25 people is taken, what is the probability their mean benefit will be greater
than $120 per week?
Step 1: Insert the information into the z-formula:

= (120-110)/20 √25 = 10/ (20/5) = 10/4 = 2.5.
Step 2: Look up the z-score in a table (or calculate it using technology). A z-score of 2.5 has an area of
roughly 49.38%. Adding 50% (for the left half of the curve), we get 99.38%.
Central Limit Theorem Examples: Less than
1. General Steps
Step 1: Identify the parts of the problem. Your question should state:
the mean (average or μ)
the standard deviation (σ)
population size
sample size (n)
a number associated with “less than” ( )
Step 2: Draw a graph. Label the center with the mean. Shade the area roughly below (i.e. the “less than”
area). This step is optional, but it may help you see what you are looking for.
Step 3: Use the following formula to find the z-score. Plug in the numbers from step 1.
Click here if you want simple, step-by-step instructions for using this formula.
If formulas confuse you, all this formula is asking you to do is:

Subtract the mean (μ in step 1) from the less than’ value ( in step 1). Set this number aside for a
moment.
Divide the standard deviation (σ in step 1) by the square root of your sample (n in step 1). For example, if
thirty six children are in your sample and your standard deviation is 2, then 3/√36=0.5
Divide your result from step 1 by your result from step 2 (i.e. step 1/step2)
Step 4: Look up the z-score you calculated in step 4 in the z-table. If you don’t remember how to look up z-
scores,you can find an explanation in step 1 of this article on area to the right of a z-score in a normal
distribution curve.
Step 5: Add your z-score to 0.5. For example, if your z-score is 0.1554, then 0.5 + 0.1554 is 0.6554.
Step 6:Convert the decimal in Step 6 to a percentage. In our example, 0.6554=65.54%.
That’s it!
2. Specific Example
A population of 29 year-old males has a mean salary of $29,321 with a standard deviation of $2,120. If a
sample of 100 men is taken, what is the probability their mean salaries will be less than $29,000?
Step 1: Insert the values into the z-formula:
=(29,000-29,321)/2,120/√100 = -321/212 = -1.51.
Step 2: Look up the z-score in the left-hand z-table (or use technology). -1.51 has an area of 93.45%.
However, this is not the answer, as the question is asking for LESS THAN, and 93.45% is the area “greater
than” so you need to subtract from 100%.
100%-93.45%=6.55% or about 0.07.
Central Limit Theorem Examples: Between
Sample problem: There are 250 dogs at a dog show who weigh an average of 12 pounds, with a standard
deviationof 8 pounds. If 4 dogs are chosen at random, what is the probability they have an average weight
of greater than 8 pounds and less than 25 pounds?
Step 1:Identify the parts of the problem. Your question should state:
mean (average or μ) standard deviation (σ) population size
sample size (n)
number associated with “less than” 1
number associated with “greater than” 2

Step 2: Draw a graph. Label the center with the mean. Shade the area between 1 and 2. This step is
optional, but it may help you see what you are looking for.
Step 3: Use the following formula to find the z-scores.
All this formula is asking you to do is:
a) Subtract the mean (μ in Step 1) from the greater than value (Xbar in Step 1): 25-12=13.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample (n in Step 1): 8/sqrt4=4
c) Divide your result from a by your result from b: 13/4=3.25
Step 4 Use the formula from Step 3 to find the z-values. This time, use Xbar2 from Step 1 (8).
a) Subtract the mean (μ in Step 1) from the greater than value (Xbar in Step 1): 8-12=-4.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample (n in Step 1): 8/sqrt4=4
c) Divide your result from a by your result from b: -4/4= -1
Step 5: Look up the value you calculated in Step 3 in the z-table.
Z value of 3.25 corresponds to .4994
Step 6: Look up the value you calculated in Step 4 in the z-table.
Z value of 1 corresponds to .3413
Note that the bell curve is symmetrical, so if you want to look up a negative value like -1, then just look up
the positive counterpart. The area will be the same.
Step 7: Add Step 5 and 6 together:
.4994 + .3413 = .8407

Step 8: Convert the decimal in Step 7 to a percentage:
.8407 = 84.07%
That’s it!
Back to top for more Central Limit Theorem Examples
Central Limit Theorem on the TI 89
Sample problem: A population of community college students includes inner city students (p = .33). What is
theprobability that a random sample of 45 students from the population will have from 20% to 40% inner
city students?
Step 1: Press APPS. Highlight the Stats/List Editor by using the scroll keys. Press ENTER.
If you don’t see the Stats/List editor you need to load the app. See instructions here.
Step 2: Press F5 and scroll down to C: BinomialCdf.
Step 3: Enter 45 in the Num Trials box.
Step 4: Scroll down and enter .33 in the Prob Success box.
Step 5: Scroll down and enter 9 in the Lower Value box (because 20% of 45 = 9).
Step 6: Scroll down and enter 18 in the Upper Value box (because 40% of 45 = 18). Press ENTER.
Step 7: Read the Result: Cdf=.857142. This means that the probability your random sample will have 20-
40% inner city students is 85.71%.
10. Questions
Objective questions
1. Central Limit theorem is applicable to
a. Weak variable
b. Large variables
c. Strong variable
d. Random variables

2. Weak law of large numbers is applicable to
a. Weak variable
b. Large variables
c. Strong variable
d. Random variables
3. Convergence Everywhere implies
a. | X(s) - Xn(s) | → 0
b. | X(s) - Xn(s) | → ∞
c. | X(s) - Xn(s) | → µ
d. | X(s) - Xn(s) | → σ
4. Almost sure convergence or convergence with probability 1 implies
a. P(s| Xn(s) → X(s) ) = 1
b. P(s| Xn(s) → X(s) ) = 0
c. P(Xn(s)) = 1
d. P( X(s) ) = 0
5. Convergence in the mean square sense implies
a. E [ Xn- X]2 → 0
b. E [ X]2 → 0
c. E [ Xn]2 → 0

d. E [ Xn- X] → 0
Short Questions
1. Explain when sequence of random variables is said to converge everywhere
2. Explain when sequence of random variables is said Almost sure (a.s.) to converge or converge with
probability 1
3. Explain when sequence of random variables is said to converge in the mean square sense
4. Explain when sequence of random variables is said to converge in probability
5. State Central limit theorem
6. State weak law of large numbers
Long Questions
1. The scores on a general test have mean 450 and standard deviation 50. It is highly desirable to score over
480 on this exam. A person can get into Smith's College prestigious MBA program if he/she scores over 480.
In one location 25 people sign up to take the exam. The average score of these 25 people exceeds 490. Is
this odd? Should the test center investigate? Answer on the basis of the CLT.
2. A machine fills cereal boxes at a factory. Due to an accumulation of small errors (different flakes sizes,
etc.) it is thought that the amount of cereal in a box is normally distributed with mean 22 oz. for a

supposedly 20 oz. box. Suppose the standard deviation of the amount filled is 1.3 oz. A federal regulatory
selects four of these boxes at random and finds that the average content of these boxes is less than 18 oz.
This official knows that the company claims the mean content to be 22 oz. He promptly fines the company.
Who is right? Use the CLT in your answer.
3. Sixteen adult males are in a pit which is 98 feet deep. They decide to stand on one another (feet to
head), hoping that the person on top can grip the top of the pit and get out, and, hence go for help. What's
the probability that their plan succeeds?
4.. (Weak law of large numbers) If are iid random variables each with mean and
Show that converges to in probability.
5. Suppose and . Examine if
6. Show that implies
Dec 2012
Q. State & explain Central Limit theorem.
Q. Define sequence of random variable
Q. explain and prove Chebyshev inequality
May 2012
Dec 2011

Dec 2010
1. State & explain Central Limit theorem.
2. Let X1, X2, X3,..... be the sequence of random variable
Define (i) convergence almost everywhere
(ii) Convergence in probability
(iii) Convergence in mean square sense
(iv) Convergence in distribution
For the above sequence of random variable
May 2010
1. State & explain Central Limit theorem.
2. Explain strong law of large numbers
3. Describe the sequence of random variable

CHAPTER-5
Random Process
1. Motivation:
This topic develops the fundamental understanding and analyzes the behavior of signals and
random phenomena.
2. Syllabus:
Sr. No Content Duration Self Study Time
1.  spectral representation of a real 1 hour 1 hour

WSS process
 power spectral density and 1 hour 1 hour
properties
 cross power spectral density and 1 hour 1 hour
properties
1 hour 1 hour
 autocorrelation function and power
spectral density of a WSS random
sequence
 linear time-invariant system with a 1 hour 1 hour

WSS process as an input
stationarity of the output,
 Autocorrelation and

power-spectral density of the 1 hour 1 hour
output
 examples of random processes:

white noise process and white
1 hour 1 hour
noise sequence; Gaussian process;
Poisson process
 spectral representation of a real 1 hour 1 hour
WSS process
 power spectral density and

properties
1 hour 1 hour
 cross power spectral density and

properties
1 hour 1 hour
 autocorrelation function and power

spectral density of a WSS random
1 hour 1 hour
sequence
 linear time-invariant systemwith a

WSS process as an input
1 hour 1 hour
stationarity of the output,
 Autocorrelation and
power-spectral density of the
output 1 hour 1 hour
 examples of random processes:

white noise process and white

noise sequence; Gaussian process;
Poisson process
1 hour 1 hour
3. References:
4. Weightage in University Examination: 20 to 25 Marks
5. Prerequisite:
Knowledge of signals and its behavior is required
6. Key Notations:

1. SSS : strictsense stationary
2. WSS wide-sense stationary
E[ ], µ Expected value
σ Standard Deviation
σ 2, var[ ] Variance
E[Xn] n-th order moment of RV
ΦX(w) Characteristic function
R( t) Autocorrelation
S(w ) Power Spectral Density
δ( t) Delta function
h( t) Impulse Response
H(z) System Function
T( ) Transformation
7. Key Definitions:
1. Random Process:
A random process maps each sample point to a waveform.
Thus a random process is a function of the sample point and index variable and may be written as
2 Conditional Probability:

For two events A and B with , the conditional probability was defined as
3. Conditional Distribution Function
Consider the event and any event B involving the random variable X . The conditional
distribution function of X given B is defined as
4. Linear system
The system is called linear if the principle of superposition applies: the weighted sum of inputs results
in the weighted sum of the corresponding outputs . Thus for a linear system
5. Linear time-invariant system
Consider a linear system with y ( t ) = T x ( t ). The system is called time-invariant if
6. Causal system
The system is called causal if the output of the system at depends only on the present and
past values of input. Thus for a causal system

7. Zero - the point in the where Consequently at such a point.
8. Pole - the point in the where Consequently at such a point.
8. Theory and Mathematical Representation
Introduction
1. Random Process
In practical problems, we deal with time varying waveforms whose value at a time is random in nature. For
example, the speech waveform recorded by a microphone, the signal received by communication receiver
or the daily record of stock-market data represents random variables that change with time. How do we
characterize such data? Such data are characterized as random or stochastic processes. This will covers the
fundamentals of random processes.
Recall that a random variable maps each sample point in the sample space to a point in the real line.
A random process maps each sample point to a waveform.
Consider a probability space . A random process can be defined on as an indexed
family of random variables where is an index set, which may be discrete or

continuous, usually denoting time. Thus a random process is a function of the sample point and index
variable and may be written as .
 For a fixed is a random variable.
 For a fixed , is a single realization of the random process and is a deterministic

function.
For a fixed and a fixed is a single number.
When both and are varying we have the random process .
 The random process is normally denoted by

 Example 1 Consider a sinusoidal signal where is a binary random variable with
probability mass functions and
 Clearly, is a random process with two possible realizations and
At a particular time is a random variable with two values and
Continuous-time vs. Discrete-time process
If the index set is continuous, is called a continuous-time process.
Example 2 Suppose, where and are constants and is uniformly
distributed between 0 and . is an example of a continuous-time process.
Figure below shows continuous-time process.

If the index set is a countable set, is called a discrete-time process. Such a random process
can be represented as and called a random sequence. Sometimes the notation is

used to describe a random sequence indexed by the set of positive integers.
We can define a discrete-time random process on discrete points of time. Particularly, we can get a
discrete-time random process by sampling a continuous-time process \ at a
uniform interval such that
The discrete-time random process is more important in practical implementations. Advanced

statistical signal processing techniques have been developed to process this type of signals.
Example 3 Suppose where is a constant and is a random variable uniformly

distributed between and .
is an example of a discrete-time process illustrated in Figure
2. Continuous-state vs. Discrete-state process
The value of a random process is at any time can be described from its probabilistic model.
The state is the value taken by at a time , and the set of all such states is called the state
space. A random process is discrete-state if the state-space is finite or countable. It also means that the
corresponding sample space is also finite or countable. Otherwise , the random process is called continuous
state.

Example 4 Consider the random sequence generated by repeated tossing of a fair coin where
we assign 1 to Head and 0 to Tail.
Clearly, can take only two values - 0 and 1. Hence is a discrete-time two-state process.
How to describe a random process?
As we have observed above that at a specific time is a random variable and can be described
by its probability distribution function This distribution function is called the first-
order probability distribution function.
We can similarly define the first-order probability density function
To describe , we have to use joint distribution function of the random variables at all possible
values of . For any positive integer , represents jointly distributed random
variables. Thus a random process can thus be described by specifying the joint
distribution function .
or th the joint probability density function
If is a discrete-state random process, then it can be also specified by the collection of
joint probability mass function
3. Stationary Random Process

The concept of stationarity plays an important role in solving practical problems involving random
processes. Just like time-invariance is an important characteristics of many deterministic systems,
stationarity describes certain time-invariant property of a class of random processes. Stationarity also leads
to frequency-domain description of a random process.
4.1 Strict-sense Stationary Process
A random process is called strict-sense stationary (SSS) if its probability structure is invariant with
time. In terms of the joint distribution function, is called SSS if
Thus, the joint distribution functions of any set of random variables does not depend
on the placement of the origin of the time axis. This requirement is a very strict. Less strict form of
stationarity may be defined.
Particularly,
if then is called
order stationary.
is called order stationary does not depend on the placement of the origin of the time axis. This
requirement is a very strict. Less strict form of stationarity may be defined.
 If is stationary up to order 1

Let us assume Then
As a consequence
 If is stationary up to order 2
Put
As a consequence, for such a process
Similarly,
Therefore, the autocorrelation function of a SSS process depends only on the time lag
We can also define the joint stationarity of two random processes. Two processes

and are called jointly strict-sense stationary if their joint probability distributions of any
order is invariant under the translation of time. A complex random process is called
SSS if and are jointly SSS.
4.2 Wide-sense stationary process
It is very difficult to test whether a process is SSS or not. A subclass of the SSS process called the wide sense
stationary process is extremely important from practical point of view.
A random process is called wide sense stationary process (WSS) if
(1) For a WSS process
(2) An SSS process is always WSS, but the converse is not always true.
Example 1 Sinusoid with random phase
Consider the random process given by
where are constants and are unifirmly distributed between

This is the model of the carrier wave (sinusoid of fixed frequency) used to analyse the noise performance
of many receivers.
Note that
By applying the rule for the transformation of a random variable, we get
which is independent of Hence is first-order stationary.
Note that
and
Hence is wide-sense stationary

4. Mean and Variance
For any t ,
Thus mean and variance of the process are constants.
5. Autocorrelation Function
When are not in the same pulse interval, and hence are independent.
To find the autocorrelation function for , let us consider the case .
Depending on the delay D , the points may lie on one or two pulse intervals.
6.1 Autocorrelation of a deterministic signal
Consider a deterministic signal such that
Such signals are called power signals. For a power signal the autocorrelation function is defined as

measures the similarity between a signal and its time-shifted version. Particularly,
is the mean-square value. If is a voltage waveform across a 1 ohm resistance,
then is the average power delivered to the resistance. In this sense, represents the average
power of the signal.
Example 1 Suppose The autocorrelation function of at lag is given by
We see that of the above periodic signal is also periodic and its maximum occurs when
The power of the signal is
The autocorrelation of the deterministic signal gives us insight into the properties of the autocorrelation
function of a WSS process. We shall discuss these properties next.
Poisson process
In probability theory, a Poisson process is a stochastic process that counts the number of events[note 1] and
the time points at which these events occur in a given time interval. The time between each pair of
consecutive events has an exponential distribution with parameter λ and each of these inter-arrival times is
assumed to be independent of other inter-arrival times. The process is named after the Poisson distribution
introduced by French mathematician Siméon Denis Poisson.[1] It describes the time of events in radioactive
decay,[2] telephone calls at a call center,[3] document requests on a web server,[4] and many other punctual
phenomena where events occur independently from each other.
The Poisson process is a continuous-time stochastic process; the sum of a Bernoulli process can be thought
of as its discrete-time counterpart. A Poisson process is a pure-birth process, the simplest example of a
birth-death process. It is also a point process on the real half-line
Definition

The basic form of Poisson process, often referred to simply as "the Poisson process", is a continuous-time
counting process {N(t), t ≥ 0} that possesses the following properties:[5]
 N(0) = 0
 Independent increments (the numbers of occurrences counted in disjoint intervals are independent
of each other)
 Stationary increments (the probability distribution of the number of occurrences counted in any
time interval only depends on the length of the interval)
 Proportionality (the probability of an occurrence in a time interval is proportional to the length of
the time interval)
 The probability of simultaneous occurrences equals zero.
Consequences of this definition include:
 The probability distribution of the waiting time until the next occurrence is an exponential
distribution.
 For each t≥0, the probability distribution of N(t) is a Poisson distribution with parameter λt. Here
λ>0 is called the rate of the Poisson process.
 The occurrences are distributed uniformly on any interval of time. (Note that N(t), the total number
of occurrences, has a Poisson distribution over the non-negative integers, whereas the location of
an individual occurrence on t ∈ (a, b] is uniform.)
Other types of Poisson process are described below.
There are a series of generalizations of the basic Poisson process defined above; these are also termed
Poisson processes. The first of them, called homogeneous, coincides with the basic Poisson process defined
above.
Homogeneous
Sample path of a counting Poisson process N(t)
A homogeneous Poisson process counts events that occur at a constant rate; it is one of the most well-
known Lévy processes. This process is characterized by a rate parameter λ, also known as intensity, such

that the number of events in time interval (t, t + τ] follows a Poisson distribution with associated parameter
λτ. This relation is given as
where N(t + τ) − N(t) = k is the number of events in time interval (t, t + τ].
Just as a Poisson random variable is characterized by its scalar parameter λ, a homogeneous Poisson
process is characterized by its rate parameter λ, which is the expected number of "events" or "arrivals" that
occur per unit time.
N(t) is a sample homogeneous Poisson process, not to be confused with a density or distribution function.
Inhomogeneous
Main article: Inhomogeneous Poisson process
An inhomogeneous Poisson process counts events that occur at a variable rate. In general, the rate
parameter may change over time; such a process is called a non-homogeneous Poisson process or
inhomogeneous Poisson process. In this case, the generalized rate function is given as λ(t). Now the
expected number of events between time a and time b is
Thus, the number of arrivals in the time interval [a, b], given as N(b) − N(a), follows a Poisson distribution
with associated parameter Na,b
A rate function λ(t) in a non-homogeneous Poisson process can be either a deterministic function of time or
an independent stochastic process, giving rise to a Cox process. A homogeneous Poisson process may be
viewed as a special case when λ(t) = λ, a constant rate.
Spatial
An important variation on the (notionally time-based) Poisson process is the spatial Poisson process. In the
case of a one-dimension space (a line) the theory differs from that of a time-based Poisson process only in
the interpretation of the index variable. For higher dimension spaces, where the index variable (now x) is in
some vector space V (e.g. R2 or R3), a spatial Poisson process can be defined by the requirement that the
random variables defined as the counts of the number of "events" inside each of a number of non-

overlapping finite sub-regions of V should each have a Poisson distribution and should be independent of
each other.
Space-time
A further variation on the Poisson process, the space-time Poisson process, allows for separately
distinguished space and time variables. Even though this can theoretically be treated as a pure spatial
process by treating "time" as just another component of a vector space, it is convenient in most
applications to treat space and time separately, both for modeling purposes in practical applications and
because of the types of properties of such processes that it is interesting to study.
In comparison to a time-based inhomogeneous Poisson process, the extension to a space-time Poisson

process can introduce a spatial dependence into the rate function, such that it is defined as , where
2 3
for some vector space V (e.g. R or R ). However a space-time Poisson process may have a rate
function that is constant with respect to either or both of x and t. For any set (e.g. a spatial region)
with finite measure , the number of events occurring inside this region can be modeled as a Poisson
process with associated rate function λS(t) such that
Separable space-time processes
In the special case that this generalized rate function is a separable function of time and space, we have:
for some function . Without loss of generality, let
(If this is not the case, λ(t) can be scaled appropriately.) Now, represents the spatial probability
density function of these random events in the following sense. The act of sampling this spatial Poisson
process is equivalent to sampling a Poisson process with rate function λ(t), and associating with each event
a random vector sampled from the probability density function . A similar result can be shown for
the general (non-separable) case.
Characterisation
In its most general form, the only two conditions for a counting process to be a Poisson process are:[citation
needed]
 Orderliness: which roughly means

which implies that arrivals don't occur simultaneously (but this is actually a mathematically stronger
statement).
 Memorylessness (also called evolution without after-effects): the number of arrivals occurring in
any bounded interval of time after time t is independent of the number of arrivals occurring before
time t.
These seemingly unrestrictive conditions actually impose a great deal of structure in the Poisson process. In
particular, they imply that the time between consecutive events (called interarrival times) are independent
random variables. For the homogeneous Poisson process, these inter-arrival times are exponentially
distributed with parameter λ (mean 1/λ).
Also, the memorylessness property entails that the number of events in any time interval is independent of
the number of events in any other interval that is disjoint from it. This latter property is known as the
independent increments property of the Poisson process.
Properties
As defined above, the stochastic process {N(t)} is a Markov process, or more specifically, a continuous-time
Markov process.[citation needed]
To illustrate the exponentially distributed inter-arrival times property, consider a homogeneous Poisson
process N(t) with rate parameter λ, and let Tk be the time of the kth arrival, for k = 1, 2, 3, ... . Clearly the
number of arrivals before some fixed time t is less than k if and only if the waiting time until the kth arrival
is more than t. In symbols, the event [N(t) < k] occurs if and only if the event [Tk > t] occurs. Consequently
the probabilities of these events are the same:
In particular, consider the waiting time until the first arrival. Clearly that time is more than t if and only if
the number of arrivals before time t is 0. Combining this latter property with the above probability
distribution for the number of homogeneous Poisson process events in a fixed interval gives:
And therefore:
(which is the CDF of the exponential distribution).
Consequently, the waiting time until the first arrival T1 has an exponential distribution, and is thus
memoryless. One can similarly show that the other interarrival times Tk − Tk−1 share the same distribution.
Hence, they are independent, identically distributed (i.i.d.) random variables with parameter λ > 0; and
expected value 1/λ. For example, if the average rate of arrivals is 5 per minute, then the average waiting
time between arrivals is 1/5 minute.
Applications
The classic example of phenomena well modelled by a Poisson process is deaths due to horse kick in the
Prussian army, as shown in 1898 by Ladislaus Bortkiewicz, a Polish economist and statistician who also
examined data of child suicides.[6][7] The following examples are also well-modeled by the Poisson process:
 Number of road crashes (or injuries/fatalities) at a site or in an area

 Goals scored in a association football match.[8]
 Requests for individual documents on a web server.[4]
 Particle emissions due to radioactive decay by an unstable substance. In this case the Poisson
process is non-homogeneous in a predictable manner—the emission rate declines as particles are
emitted.
 Action potentials emitted by a neuron.[9]
 L. F. Richardson showed that the outbreak of war followed a Poisson process from 1820 to 1950. [10]
 Photons landing on a photodiode, in particular in low light environments. This phenomenon is
related to shot noise.
 Opportunities for firms to adjust nominal prices.[11]
 Arrival of innovations from research and development.[12]
 Requests for telephone calls at a switchboard.[citation needed]
 In queueing theory, the times of customer/job arrivals at queues are often assumed to be a Poisson
process.
 The evolution (changes on pages) of Internet, in general (although not in the particular case of
Wikipedia)
Gaussian process
In probability theory and statistics, Gaussian processes are a family of stochastic processes. In a Gaussian
process, every point in some input space is associated with a normally distributed random variable.
Moreover, every finite collection of those random variables has a multivariate normal distribution. The
distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables,
and as such, it is a distribution over functions.
The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of
the normal distribution which is often called the Gaussian distribution. In fact, Gaussian processes can be
seen as an infinite-dimensional generalization of multivariate normal distributions.
Gaussian processes are important in statistical modelling because of properties inherited from the normal.
For example, if a random process is modeled as a Gaussian process, the distributions of various derived
quantities can be obtained explicitly. Such quantities include the average value of the process over a range
of times and the error in estimating the average using sample values at a small set of times.

Definition
A Gaussian process is a stochastic process Xt, t ∈ T, for which any finite linear combination of samples has a
joint Gaussian distribution. More accurately, any linear functional applied to the sample function Xt will give
a normally distributed result. Notation-wise, one can write X ~ GP(m,K), meaning the random function X is
distributed as a GP with mean function m and covariance function K.[1] When the input vector t is two- or
multi-dimensional, a Gaussian process might be also known as a Gaussian random field.[2]
Some authors[3] assume the random variables Xt have mean zero; this greatly simplifies calculations without
loss of generality and allows the mean square properties of the process to be entirely determined by the
covariance function K.[4]
Alternative definitions
Alternatively, a process is Gaussian if and only if for every finite set of indices in the index set
is a multivariate Gaussian random variable. Using characteristic functions of random variables, the Gaussian
property can be formulated as follows: is Gaussian if and only if, for every finite set of
indices , there are real valued , with such that
The numbers and can be shown to be the covariances and means of the variables in the process.[5]
Covariance functions
A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.[2]
Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely
defines the process' behaviour. The covariance matrix K between all the pair of points x and x' specifies a
distribution on functions and is known as the Gram matrix. Importantly, because every valid covariance
function is a scalar product of vectors, by construction the matrix K is a non-negative definite matrix.
Equivalently, the covariance function K is a non-negative definite function in the sense that for every pair x
and x', K(x,x') ≥ 0; if K(,) > 0 then K is called positive definite. Importantly the non-negative definiteness of K
enables its spectral decomposition using the Karhunen–Loeve expansion. Basic aspects that can be defined
through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.[6][7]
Stationarity refers to the process' behaviour regarding the separation of any two points x and x' . If the
process is stationary, it depends on their separation, x − x', while if non-stationary it depends on the actual
position of the points x and x'; an example of a stationary process is the Ornstein–Uhlenbeck process. On
the contrary, the special case of an Ornstein–Uhlenbeck process, a Brownian motion process, is non-
stationary.

If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x' then the
process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be
homogeneous;[8] in practice these properties reflect the differences (or rather the lack of them) in the
behaviour of the process given the location of the observer.
Ultimately Gaussian processes translate as taking priors on functions and the smoothness of these priors
can be induced by the covariance function.[6] If we expect that for "near-by" input points x and x' their
corresponding output points y and y' to be "near-by" also, then the assumption of smoothness is present. If
we wish to allow for significant displacement then we might choose a rougher covariance function. Extreme
examples of the behaviour is the Ornstein–Uhlenbeck covariance function and the squared exponential
where the former is never differentiable and the latter infinitely differentiable.
Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is
achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)).
Applications
A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference.[7][9]
Given any set of N points in the desired domain of your functions, take a multivariate Gaussian whose
covariance matrix parameter is the Gram matrix of your N points with some desired kernel, and sample
from that Gaussian.
Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or
kriging; extending Gaussian process regression to multiple target variables is known as cokriging.[10]
Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool. Additionally,
Gaussian process regression can be extended to address learning tasks in both supervised (e.g. probabilistic
classification[7]) and unsupervised (e.g. manifold learning[2]) learning frameworks.
6.2 Properties of the autocorrelation function of a real WSS process
Consider a real WSS process Since the autocorrelation function of such a process is a
function of the lag we can redefine a one-parameter autocorrelation function as
If is a complex WSS process, then

where is the complex conjugate of For a discrete random sequence, we can define the
autocorrelation sequence similarly.
The autocorrelation function is an important function charactersing a WSS random process. It possesses
some general properties. We briefly describe them below.
1. is the mean-square value of the process. Thus,
Remark If is a voltage signal applied across a 1 ohm resistance, then is the ensemble average
power delivered to the resistance.
2. For a real WSS process is an even function of the time Thus,
Because,
Remark For a complex process
3. This follows from the Schwartz inequality
We have

4. is a positive semi-definite function in the sense that for any positive integer and real
Proof
Define the random variable
It can be shown that the sufficient condition for a function to be the autocorrelation function of a real
WSS process is that be real, even and positive semidefinite.
5. If is MS periodic, then . is also periodic with the same period.
Proof: Note that a real WSS random process is called mean-square periodic ( MS periodic ) with a
period if for every

Again
6. Suppose
where is a zero-mean WSS process and Then
Interpretation of the autocorrelation function of a WSS process
The autocorrelation function measures the correlation between two random variables and
If drops quickly with respect to then the and will be less correlated for large This
in turn means that the signal has lot of changes with respect to time. Such a signal has high frequency
components. If drops slowly, the signal samples are highly correlated and such a signal has less high
frequency components. Later on we see that is directly related to the frequency -domain
representation of a WSS process.
7. Cross correlation function of jointly WSS processes
If and are two real jointly WSS random processes, their cross-correlation functions are
independent of and depends on the time-lag. We can write the cross-correlation function

The cross correlation function satisfies the following properties:
This property is illustrated in Figure below.
We Have
Further,
iii. If and Y (t) are uncorrelated,
iv. If X ( t ) and Y (t) are orthogonal processes,

Example 1
Consider a random process which is sum of two real jointly WSS random processes.
We have
If and are orthogonal processes,then
Example 2
Suppose
where X (t) is a WSS process and

Linear time-invariant systems
In many applications, physical systems are modeled as linear time-invariant (LTI) systems. The dynamic
behaviour of an LTI system to deterministic inputs is described by linear differential equations. We are
familiar with time and transform domain (such as Laplace transform and Fourier transform) techniques to
solve these differential equations. In this lecture, we develop the technique to analyze the response of an
LTI system to WSS random process.
The purpose of this study is two-folds:
 Analysis of the response of a system

 Finding an LTI system that can optimally estimate an unobserved random process from an observed
process. The observed random process is statistically related to the unobserved random process.
For example, we may have to find LTI system (also called a filter) to estimate the signal from the
noisy observations.
Basics of Linear Time Invariant Systems
A system is modelled by a transformation T that maps an input signa to an output signal y(t) as
shown in Figure 1. We can thus write,

Linear system
The system is called linear if the principle of superposition applies: the weighted sum of inputs results
in the weighted sum of the corresponding outputs . Thus for a linear system
Example 1 Consider the output of a differentiator, given by
Then,
Hence the linear differentiator is a linear system.
Linear time-invariant system
Consider a linear system with y ( t ) = T x ( t ). The system is called time-invariant if
It is easy to check that that the differentiator in the above example is a linear time-invariant system.

Causal system
The system is called causal if the output of the system at depends only on the present and
past values of input. Thus for a causal system
Response of a linear time-invariant system to deterministic input
As shown in Figure 2, a linear system can be characterised by its impulse response where
is the Dirac delta function .
Figure 2
Recall that any function x(t) can be represented in terms of the Dirac delta function as follows
If x(t) is input to the linear system y ( t ) = T x ( t ), then
Where is the response at time t due to the shifted impulse
If the system is time invariant,

Therefore for a linear-time invariant system,
where * denotes the convolution operation.
We also note that
Thus for a LTI System,
Taking the Fourier transform, we get
Figure 3 shows the input-output relationship of an LTI system in terms of the impulse response and the
frequency response.
Figure 3

Response of an LTI System to WSS input
Consider an LTI system with impulse response h(t). Suppose is a WSS process input to the
system. The output of the system is given by
where we have assumed that the integrals exist in the mean square (m.s.) sense.
Mean and autocorrelation of the output process
where is the frequency response at 0 frequency ( ) and given by
Therefore, the mean of the output process is a constant
iscrete-time Linear Shift Invariant System with Deterministic Inputs

We have seen that the Dirac delta function plays a very important role in the analysis of the
response of the continuous-time LTI systems to deterministic and random inputs. Similar role in the case of
the discrete-time LTI system is played by the unit sample sequence , defined by
Any discrete-time signal can be expressed in terms of as follows:
As illustrated in Figure 1, discrete-time linear shift-invariant system is characterized by the unit sample
response which is the output of the system to the unit sample sequence .
The DTFT of the unit sample response is the transfer function of the system and given by
The transfer function in terms of the is given by
where is a function of the complex variable It is defined on a region of convergence (ROC) on the
An analysis similar to that for the continuous-time LTI system can be applied to the discrete-time
LTI system. Such an analysis shows that the response of a the linear time-invariant system with
impulse response to a deterministic input is

By taking the DTFT of both sides, we get
More generally, we can take the of the input and the response and show that
Remark
 If the LTI system is causal, then
In this case, the ROC of is a region in the given by For example, suppose
Then,
 Similarly, if the LTI system is anti-causal , then
In this case, the ROC of is a region in the given by
 The contour is called the unit circle. Thus represents evaluated on the unit circle.

 can be expressed as the ratio of two polynomials in
The polynomials and helps us in analyzing the properties of a linear system in terms of the
zeros and poles of defined by -
Zero - the point in the where Consequently at such a point.
Pole - the point in the where Consequently at such a point. The ROC of
does not contain any pole. The poles and zeroes and unit circle on the complex plane are illustrated
in Figure 2.

 For the stability of the LTI system, the unit-sample response should decay to zero as A
necessary and sufficient condition for the stability of a discrete-time LTI system is that all its poles
lie strictly inside the unit circle.
 A discrete-time LTI system is called a minimum- phase system if all its poles and zeros lie inside the
unit circle. A minimum-phase system is always stable as its poles lie inside the unit circle. Because
the zeros of the system lie inside the unit circle, the inverse system with a transfer function
will have all its poles inside the unit circle and be stable.
• A discrete-time LTI system is called a maximum- phase system if all its poles and zeros lie outside
the unit circle.
Response of a discrete-time LTI system to WSS input
Consider a discrete-time linear time-invariant system with impulse response and input as
shown in Figure 3 below. Assume to be a WSS process with mean and autocorrelation function
Figure 3
The output random process is given by

Given the WSS input the output process is also WSS. We establish this result in the
following section.
Mean and Autocorrelation of the output
The mean of the output is given by
where
Thus the mean of the output process is constant. We write
The cross-correlation between the output and the input random processes is given by

Thus does not depend on , but on lag and we can write
The autocorrelation function of the out put is
is a function of lag m only and we write
The mean-square value of the output process is
Thus if is WSS then is also WSS.

Taking the DTFT of we get
This input-output relation in the frequency domain is illustrated in Figure 4.
Figure 4
In terms of the we get
which is illustrated in Figure 5 .
Notice that if is causal, then is anti-causal and vice versa.
Similarly if is minimum-phase then is maximum-phase.
Similarly if is minimum-phase then is maximum-phase.
Figure 5
Remark

Finding the probability density function of the output process is a difficult task. However, if
is a WSS Gaussian random process, then the output process is also Gaussian with the
probability density function determined by its mean and the autocorrelation function.
Example 1
Suppose and is a zero-mean white-noise sequence with variance . Then
By partial fraction expansion and inverse transform, we get
 Though the input is an uncorrelated process in the above example, the output is a correlated
process.
 For the same white noise input, we can generate random processes with different autocorrelation
functions or power spectral densities.
Spectral factorization theorem

Consider a discrete-time LTI system with the transfer function and the white noise sequence
as the input random process as shown in the Figure 6 below.
Figure 6
Then
We have seen that is the product of a constant and two transfer functions This
result is of fundamental importance in modeling a WSS process because of the spectral factorization
theorem stated below:
Thus a WSS random signal with continuous spectrum that satisfies the Paley Wiener
condition can be considered as an output of a linear filter fed by a white noise sequence
{w[n]}as shown in Figure 7(a). The sequence {w[n]} is called the innovation sequence.

Proof of the spectral factorization theorem
Since is analytic in an annular region that includes the unit circle
where is the order cepstral coefficient. For a real signal and

ote that and are both analytic.
Therefore, is a minimum phase filter .
The concepts of minimum-phase and maximum-phase filters are illustrated in Figure 8.

Remarks
 Note that of a real process is a function of . Therefore, is a
function of Consider rational spectrum so that where
and are polynomials in . If is a root of so is Thus the roots of are
symmetrical about the unit circle groups the poles and zeros inside the unit circle
and groups the poles and zeros outside the unit circle.
 can be factorized into a minimum-phase and a maximum-phase factors i.e. and
 In general spectral factorization is difficult, however for a signal with rational power spectrum,
spectral factorization can be easily done.

 Since is a minimum phase filter, the inverse filter exists and stable. Therefore we can
have a filter to filter the given signal to get the innovation sequence.
 and are related through an invertible transform; so they contain the same
information.
Example 2 Suppose the power spectral density of a discrete random sequence is given by
Then
Wold's Decomposition
Any WSS signal can be decomposed as a sum of two mutually orthogonal processes a regular process
 a regular process and a predictable process ,
 can be expressed as the output of linear filter using a white noise sequence as input.
 is a predictable process, that is, the process can be predicted from its own past with zero
prediction error.
Consider the problem of estimating a signal in presence of additive noise. We want to estimate the signal
by filtering the noisy signal.

As you might be knowing that the filter is a frequency selective device- it passes selectively a band of
frequency components and suppresses other components. Such a deterministic filter cannot be a solution
to the problem of suppressing random noise because random noise cannot be localized to a specific
frequency band.
We have to use the probabilistic properties of the noise to dissociate the noise from the signal. An
optimal filter performs this dissociation. We will consider the case when the signal to be estimated is of
known form (deterministic). For example, in radar application a signal of known form is reflected from a
distant target. The received signal is the sum of the scaled and shifted version of the original signal and the
noise.
That is,
Where X ( t ) is shifted and scaled version of the known transmitted signal and V ( t ) is a noise assumed
to be WSS with a power spectral density . We wish to decide whether X ( t ) is present and its value
by passing through a linear filter of impulse response
Instantaneous signal power
Average noise power

Signal to noise ratio
We have to determine such that is maximum.
Instantaneous signal power
Average noise power
Signal to noise ratio

We have to determine such that is maximum.
Case 1:White noise
Then
Equality holds if
Band pass Random Processes

A random process is called a band-pass process if its power spectrum is zero outside a
band . is called the band-width and is called the centre frequency of the
band-pass process . If is very small compared to the centre frequency , then is called a
narrow-band process.
We can similarly define a low-pass random process as a random process if its power spectral
density is zero outside the band
 In telecommunication, we often deal with random signals which have PSD concentrated in a small
frequency band and negligible outside this band. The information bearing signals like speech, image
and video are low-pass signals. These information-bearing signals modulate a sinusoidal carrier for
transmitting over the communication channel that acts as a bandpass filter. For example, the
amplitude- modulated waveform received by a communication receiver is modelled as an
amplitude-modulated random-phase sinusoid

where and M ( t ) is a WSS process independent of

.. Here the modulation process translates the spectrum of M ( t ) from base-band to a band centred around
 The noise associated with communication signal undergoes band-pass filtering in the
communication receiver and the band-pass filtered noise can be modeled as a band-pass process.
Figure 1 illustrates the power spectrum of a bandpass random process.

Figure 1 Power spectrum of a band-pass random process
We can do the correlation and power spectral analysis of such signals in the usual manner. However, for
analysis of nonlinear operations like the multiplication with a random process, the following trigonometric
representation is useful.
Rice's representation or quadrature representation of a WSS process
An arbitrary zero-mean WSS process can be represented in terms of the slowly varying
components and as follows:
(1)
where is a center frequency arbitrary chosen in the band . and
are respectively called the in-phase and the quadrature-phase components of
Let us choose a dual process such that
then ,

(2)
and
(3)
For such a representation, we require the processes and to be WSS.
Note that
As is zero mean, we require that
and
Again

Under the the above highlighted conditions
and
How to find satisfying the above two conditions?

For this, consider to be the Hilbert transform of , i.e.
where and the integral is defined in the mean-square sense. See the illustration in Figure 2.
The frequency response of the Hilbert transform is given by
and
The Hilbert transform of Y(t) satisfies the following spectral relations

From the above two relations, we get
The Hilbert transform of is generally denoted as Therefore, from (2) and (3) we establish
and
The realization for the in phase and the quadrature phase components is shown in Figure 3 below.

Figure 3
From the above analysis, we can summarise the following expressions for the autocorrelation functions
where
See the illustration in Figure 4
Figure 4
The variances and are given by
Taking the Fourier transform of we get
Similarly ,

Notice that the cross power spectral density is purely imaginary. Particularly, if is locally
symmetric about
implying that
Consequently, the zero-mean processes and are also uncorrelated.
Example 1
Suppose the band-limited white-noise process has the PSD as shown in Figure 5 below.

Figure 5
We have earlier shown that
The plot of is as shown in Figure 5. Therefore,
(1) The representation of the band-pass process in terms of the in-phase and the quadrature
phasecomponents is not unique. By selecting different we can have different representations.

(2) The band-pass process can be written also as
where
and
and are respectively called the envelope and the phase of the process .
(3) If is a Gaussian process, then (being linear transform of ) is also Gaussian.
Consequently, the processes and are also Gaussian.
4) Under the condition of local symmetry of about , and are uncorrelated.If
and are also Gaussian processes , then and will be independent. Using the
results on the PDF of functions of RVs, we get the following.
 The envelope will be Rayleigh-distributed. Thus
 The phase will be distributed.

Example 1 Consider patients coming to a doctor’s office at random points in time. Let Xn denote the time
(in hrs) that the n th patient has to wait before being admitted to see the doctor. Describe the random
process Xn, n ≥ 1.
Solution. The random process Xn is a discrete-time, continuous-valued random process. The sample space
is SX = {x : x ≥ 0}. The index parameter set (domain of time) is I = {1, 2, 3, · · ·}.
Example 2 The number of failures N(t), which occur in a computer network over the time interval [0, t), can
be described by a homogeneous Poisson process {N(t), t ≥ 0}. On an average, there is a failure after every 4
hours, i.e. the intensity of the process is equal to λ = 0.25[h −1 ]
(a) What is the probability of at most 1 failure in [0, 8), at least 2 failures in [8, 16), and at most 1 failure in
[16, 24) (time unit: hour)?
Solution (a) The probability p = P[N(8) − N(0) ≤ 1, N(16) − N(8) ≥ 2, N(24) − N(16) ≤ 1] is required. In view of
the independence and the homogeneity of the increments of a homogeneous Poisson process, it can be
determined as follows: p = P[N(8) − N(0) ≤ 1]P[N(16) − N(8) ≥ 2]P[N(24) − N(16) ≤ 1] = P[N(8) ≤ 1]P[N(8) ≥
2]P[N(8) ≤ 1]. Since P[N(8) ≤ 1] = P[N(8) = 0] + P[N(8) = 1] = e −0.25·8 + 0.25 · 8 · e −0.25·8 = 0.406 and
P[N(8) ≥ 2] = 1 − P[N(8) ≤ 1] = 0.594, the desired probability is p = 0.406 × 0.594 × 0.406 = 0.098.
Example 3 — Random Telegraph signal Let a random signal X(t) have the structure X(t) = (−1)N(t) Y, t ≥ 0, 3
where {N(t), t ≥ 0} is a homogeneous Poisson process with intensity λ and Y is a binary random variable with
P(Y = 1) = P(Y = −1) = 1/2 which is independent of N(t) for all t. Signals of this structure are called random
telegraph signals. Random telegraph signals are basic modules for generating signals with a more
complicated structure. Obviously, X(t) = 1 or X(t) = −1 and Y determines the sign of X(0).
Since |X(t)| 2 = 1 < ∞ for all t ≥ 0, the stochastic process {X(t), t ≥ 0} is a secondorder process. Letting I(t) =
(−1)N(t) , its trend function is m(t) = E[X(t)] = E[Y ]E[I(t)]. Since E[Y ] = 0, the trend function is identically
zero: m(t) ≡ 0. It remains to show that the covariance function C(s, t) of this process depends only on |t−s|.
This requires the determination of the probability distribution of I(t). A transition from I(t) = −1 to I(t) = +1
or, conversely, from I(t) = +1 to I(t) = −1 occurs at those time points where Poisson events occur, i.e. where
N(t) jumps. P(I(t) = 1) = P(even number of jumps in [0, t]) = e −λtX∞ i=0 (λt) 2i (2i)! = e −λt cosh λt, P(I(t) =
−1) = P(odd number of jumps in [0, t]) = e −λtX∞ i=0 (λt) 2i+1 (2i + 1)! = e −λt sinh λt. Hence the expected
value of I(t) is E[I(t)] = 1 · P(I(t) = 1) + (−1) · P(I(t) = −1) = e −λt[cosh λt − sinh λt] = e −2λt . Since C(s, t) =
COV[X(s), X(t)] = E[(X(s)X(t))] = E[Y I(s)Y I(t)] = E[Y 2 I(s)I(t)] = E(Y 2 )E[I(s)I(t)] and E(Y 2 ) = 1, C(s, t) =
E[I(s)I(t)]. Thus, in order to evaluate C(s, t), the joint distribution of the random vector (I(s), I(t)) must be
determined. In view of the homogeneity of the increments of {N(t), t ≥ 0}, for 4 s < t, p1,1 = P(I(s) = 1, I(t) =
1) = P(I(s) = 1)P(I(t) = 1|I(s) = 1) = e −λs cosh λs P(even number of jumps in (s, t]) = e −λs cosh λs e−λ(t−s)
cosh λ(t − s) = e −λt cosh λs cosh λ(t − s). Analogously, p1,−1 = P(I(s) = 1, I(t) = −1) = e −λt cosh λs sinh λ(t − s)
p−1,1 = P(I(s) = −1, I(t) = 1) = e −λt sinh λs sinh λ(t − s) p−1,−1 = P(I(s) = −1, I(t) = −1) = e −λt sinh λs cosh λ(t −
s). Since E[I(s)I(t)] = p1,1 + p−1,−1 − p1,−1 − p−1,1, we obtain C(s, t) = e −2λ(t−s) , s < t. Note that the order
of s and t can be changed so that C(s, t) = e −2λ|t−s| .
Questions
Objective Question
1. The point in Z plane which causes H(z) = 0
a. Zero
b. Pole
c. Master
d. Slave
2. The point in Z plane which causes H(z) = ∞
a. Zero
b. Pole
c. Master
d. Slave

3. Give relation between autocorrelation & power spectral density
a. power spectral density = FT [autocorrelation]
b. power spectral density = LT [autocorrelation]
c. power spectral density = ZT [autocorrelation]
d. power spectral density = KLT [autocorrelation]
4. LTI system implies system is
a. Linear
b. Time Invariant
c. Time Variant
d. Low Pass
5. Causal system are
a. Realizable
b. Imaginary
c. Ideal
d. Lossless
8. Short Question
1. What is Random Process ? Explain it with example.

2. State four classes of RP giving one example of each.
3. Define a SSS and a WSS. What is the difference between the two.
4. Define the first order and second order PDF of Random process
5. Define Linear system
6. Define Causal system

7. Define Linear Time Invariant system
8. Define Autocorrelation of RP
9. Define Power spectral density of RP
10. Give relation between autocorrelation & power spectral density
11. Give relation between power spectral density at input & output of LTI system.
9. Long Question
1. Explain in brief:
(i) WSS process
(ii) Poisson process
(iii) Queuing system
2. If the WSS Process X(t) is given by X(t) = 10 cos (100t+ θ)

Were θ is uniformly distributed over ( - π, π). Prove that the X(t) is. Correlation Ergodic.
3. What is a Random Process? State four classes of random processes giving one example each
4. 1. Consider a sinusoidal signal where A is random variable with probability mass
functions and
5. a) Sketch all the possible realizations of X(t).
(b) Sketch the marginal CDF's of the random variables at
6. Consider the random process {X(t)} given by
7. where are constants and is a discrete random variable. Examine if

{X(t)} is WSS in the following cases:
(i)

(ii)
Mumbai University Question:
Dec 2012
Q. A random process is given by X(t)= sin(Wt+Y) where Y is uniformly distributed in (0, 2π). Verify whether
{X(t)} is WSS process
Q. State and prove the properties of autocorrelation function and cross correlation function.
Q. If the WSS Process X(t) is given by X(t) = 10 cos (100t+ θ)
Were θ is uniformly distributed over ( - π, π). Prove that the X(t) is.correlation Ergodic.
Q. A WSS random process {X(t)} is applied to the input of LTI system whose impulse response is te-atu(t)
where a(>0) is a real constant
May 2012
Q. A random process is given by X(t)= Acos(Wt+Y) where Y is uniformly distributed in (0, 2π). Verify whether
{X(t)} is WSS process
Q. Explain power spectral density. State it’s properties and prove any two of them
Q. Prove that if input to the LTI is WSS then output is also WSS
Dec 2011
Q. What is random process ? state four classes of random process with example
Q. If the X(t) is given by X(t) = 10 cos (100t+ θ) where θ is uniformly distributed over ( - π, π). Prove that
the X(t) is WSS

Q. Prove that if input to the LTI is WSS then output is also WSS
May 2011
Q. Define poisson process and prove that it is markov process
Q. Find autocorrelation function and poer spectral density of random process is given by X(t)= Acos(Wt+Y)
where Y is uniformly distributed in (-π, π). Verify whether {X(t)} is WSS process
Q. Explain Gaussian process, ergodic process
Dec 2010
A WSS random process X(t)with autocorrelation- RXX(τ)= Ae-a|τ|
where A and a are real positive constants, is applied to the input of the LTI system with impulse response
h(t)= e-b|τ|u(t) where b is real positive constants. Find the autocorrelation of the output Y(t) of the system.
Dec 2009
Q. State and prove the properties of autocorrelation function and cross correlation function.
Q.The power spectral density of random process is given by :-
Find: - S(w)=10w2 + 35 / (w2 + 4 ) (w2 + 9 )
(i) Average Power.
(ii) R (t) the autocorrelation function.
Q.(a) If the WSS Process X(t) is given by X(t) = 10 cos (100t+ θ)
Were θ is uniformly distributed over ( - π, π). Prove that the X(t) is.correlation Ergodic.
Dec 2008
Q.What is a Random Process? State four classes of random processes giving one example each

Q. Explain in brief:
(i) WSS process
(ii) Poisson process
(iii) Queuing system.
CHAPTER-6
Markov Chains and Introduction to Queuing Theory

1. Motivation:
The objective of this course is to analyze the behavior of signals and random phenomena, with
special emphasis on its applications to communication engineering, signals and linear systems.
2. Syllabus:
Module Content Duration Self Study Time
1. Introduction
2 Introduction 1 lecture 1 hours
3 Homogeneous chain, stochastic 2 lecture 2 hours
matrix, Random walks,
4 higher transition probabilities and the 2 lectures 2 hours

Chapman-Kolmogorov equation,

5 classification of states 2 lectures 2 hours
6 Markovian models, Birth and death 2 lectures 2 hours

queuing models, Steady state results
7 Single and Multiple server Queuing 2 lectures 2 hours

models, Finite source models, Little’s
formula
3. References:
4. Weightage in University Examination: 10 to 12 Marks
5. Prerequisite:
Knowledge of signals and its behavior is required
6. Key Definitions:

1. Markov chains:
Markov chains are discrete state space processes that have the Markov property. Usually they are
defined to have also discrete time
1. Stochastic process
Dynamical system with stochastic (i.e. at least partially random) dynamics. At each
Time the system is in one state Xt, taken from a set S, the state space. One often
writes such a process as
3 Homogeneous (or stationary) Markov chains
A Markov chain with transition probabilities that depend only on the length m-n of the separating
time interval,
is called a homogeneous (or stationary) Markov chain.
4 stochastic matrix
The one-step transition probabilities WXY (1) in a homogeneous Markov chain are from
now on interpreted as entries of a matrix W = { WXY } , the so-called transition matrix of the chain, or
stochastic matrix.
7. Theory and Mathematical Representation
Introduction
7.1 Random walks
A drunk walks along a pavement of width 5. At each time step he/she moves one position forward, and one
position either to the left or to the right with equal probabilities.
Except: when in position 5 can only go to 4 (wall), when in position 1 and going to the

right the process ends .
The fair casino
You decide to take part in a roulette game, starting with a capital of C 0 pounds. At each round of the game
you gamble £10. You lose this money if the roulette gives an even number, and you double it (so receive
£20) if the roulette gives an odd number.
Suppose the roulette is fair, i.e. the probabilities of even and odd outcomes are exactly
1/2. What is the probability that you will leave the casino broke?
The gambling banker
Consider two urns A and B in a casino game. Initially A contains two white balls, and
B contains three black balls. The balls are then `shuffled' repeatedly at discrete time
steps according to the following rule: pick at random one ball from each urn, and swap them. The three
possible states of the system during this (discrete time and discrete
State space) stochastic process is shown below:

Many many other real-world processes...
Dynamical systems with stochastic (partially or fully random) dynamics. Some are really fundamentally
random; others are `practically' random.
E.g.
Physics: quantum mechanics, solids/liquids/gases at nonzero temperature, diffusion Biology: interacting

molecules, cell motion, predator-prey models,
Medicine: epidemiology, gene transmission, population dynamics,
Commerce: stock markets & exchange rates, insurance risk, derivative pricing,
Sociology: herding behavior, traffic, opinion dynamics,
Computer science: internet traffic, search algorithms,
Leisure: gambling, betting,
7.2. Definitions and properties of stochastic processes
We first define stochastic processes generally, and then show how one finds discrete time
Markov chains as probably the most intuitively simple class of stochastic processes.
Dynamical system with stochastic (i.e. at least partially random) dynamics. At each
Time the system is in one state Xt, taken from a set S, the state space. One often writes
such a process as
Consequences, conventions
(i)We can only speak about the probabilities to find the system in certain states at certain times: each X t is a
random variable.
(ii) To define a process fully: specify the probabilities (or probability densities) for the

Xt at all t, or give a recipe from which these can be calculated.
(iii) If time discrete: label time steps by integers n ≥0, write X = ( Xn : n ≥0) .
7.3. Definitions and properties of stochastic processes
Markov chains are discrete state space processes that have the Markov property. Usually they are defined
to have also discrete time
The Markov property
A discrete time and discrete state space stochastic process is Markovian if and only if the conditional
probabilities do not depend on (X0,……,Xn) in full, but only on the most recent state Xn:
The likelihood of going to any next state at time n + 1 depends only on the state we
Find ourselves in at time n. The system is said to have no memory.
Consequences, conventions
(i) For a Markovian chain one has
Proof:

(ii) Let us define the probability Pn(X) to find the system at time n ≥ 0 in state
X € S:
This defines a time dependent probability measure on the set S, with the usual
Properties
ΣxPn(X) = 1 and Pn(X) ≥ 0 for all X € S and all n
(iii) For any two times m > n ≥ 0 the measures Pn(X) and Pm(X) are related via
With -----------(I)
Defined: homogeneous (or stationary) Markov chains
A Markov chain with transition probabilities that depend only on the length m ¡ n of
the separating time interval,
is called a homogeneous (or stationary) Markov chain. Here the absolute time is
irrelevant: if we re-set our clocks by a uniform shift n -- n + K for fixed K, then all
probabilities to make certain transitions during given time intervals remain the same.
consequences, conventions
(i) The transition probabilities in a homogeneous Markov chain obey the Chapman-

Kolmogorov equation:
The likelihood to go from Y to X in m steps is the sum over all paths that go first
in m - 1 steps to any intermediate state X’, followed by one step from X’ to X.The Markovian property
guarantees that the last step is independent of how we got to X’. Stationary ensures that the likelihood to
go in m - 1 steps to X’ is not dependent on when various intermediate steps were made.
Proof:
Rewrite Pm(X) in two ways, first by choosing n = 0 in the right-hand side of (I), second by choosing n = m - 1
in the right-hand side of (I):
Next we use (I) once more, now to rewrite Pm-1(X’) by choosing n = 0:
Finally we choose P0(X) = δXY , and demand that the above is true for any Y € S:
Defined: Stochastic matrix
The one-step transition probabilities WXY (1) in a homogeneous Markov chain are from now on interpreted
as entries of a matrix W = (WXY), the so-called transition matrix of the chain, or stochastic matrix.
consequences, conventions:
(i) In a homogeneous Markov chain one has

Proof:
This follows directly from (8), in combination with our identification of W XY in Markov chains as the
probability to go from Y to X in one time step.
Examples
Some dice rolling examples:
Xn = number of sixes thrown after n rolls?
6 at stage n : Xn = Xn-1 + 1; probability 1/6
no 6 at stage n : Xn = Xn-1; probability 5=6
So P(Xn) depends only on Xn-1, not on earlier values: Markovian!
If Xn-1 had been known exactly:
If Xn-1 is not known exactly, average over all possible values of Xn-1:
Hence

7.4. Properties of homogeneous finite state space Markov chains
Simplification of notation & formal solution
Since the state space S is discrete, we can represent/label the states by integer numbers,
and write simply S = (1; 2; 3; : : :). Now the X are themselves integer random variables. To
exploit optimally the simple nature of Markov chains we change our notation
S =(1; 2; 3; : : :); X; Y = I, j Pn(X) = pi(n); WXY = pji
From now on we will limit ourselves for simplicity to Markov chains with finite state spaces
S = (1; : : : ; |S|). This is not essential but removes distracting technical complications.
Defined: homogeneous Markov chains in standard notation
In our new notation the dynamical eqns of the Markov chain becomes
Example Suppose a car rental agency has three locations in Ottawa: Downtown location (labelled A), East
end location (labelled B) and a West end location (labelled C). The agency has a group of delivery drivers to
serve all three locations. The agency's statistician has determined the following:
1. Of the calls to the Downtown location, 30% are delivered in Downtown area, 30% are delivered in
the East end, and 40% are delivered in the West end
2. Of the calls to the East end location, 40% are delivered in Downtown area, 40% are delivered in
the East end, and 20% are delivered in the West end
3. Of the calls to the West end location, 50% are delivered in Downtown area, 30% are delivered in
the East end, and 20% are delivered in the West end.
After making a delivery, a driver goes to the nearest location to make the next delivery. This way, the
location of a specific driver is determined only by his or her previous location.
We model this problem with the following matrix:

T is called the transition matrix of the above system. In our example, a state is the location of a particular
driver in the system at a particular time. The entry sji in the above matrix represents the probability of
transition from the state corresponding to i to the state corresponding to j. (e.g. the state corresponding to
2 is B)
To make matters simple, let's assume that it takes each delivery person the same amount of time (say 15
minutes) to make a delivery, and then to get to their next location. According to the statistician's data, after
15 minutes, of the drivers that began in A, 30% will again be in A, 30% will be in B, and 40% will be in C.
Since all drivers are in one of those three locations after their delivery, each column sums to 1. Because we
are dealing with probabilities, each entry must be between 0 and 1, inclusive. The most important fact that
lets us model this situation as a Markov chain is that the next location for delivery depends only on the
current location, not previous history. It is also true that our matrix of probabilities does not change during
the time we are observing.
Now, let’s start with a simple question. I f you begin at location C, what is the probability (say, P) that you
will be in area B after 2 deliveries? Think about how you can get to B in two steps. We can go from C to C,
then from C to B, we can go from C to B, then from B to B, or we can go from C to A, then from A to B. To
figure out P, let P(XY) denote the probability of going from X to Y in one delivery (where X,Y can be A,B or
C). Do you remember how probabilities work? If two (or more) independent events must both (all) happen,
to obtain the probability of them both (all) happening, we multiply their probabilities together. To obtain
the probability of either (any) happening, we add the probabilities of those events together.
This gives us P = P(CA)P(AB) + P(CB)P(BB) + P(CC)P(CB) for the probability that a delivery person goes from C
to B in 2 deliveries. Substituting into our formula using the statistician's data above gives P = (.5)(.3) +
(.3)(.4) + (.2)(.3) = .33.This tells us that if we begin at location C, we have a 33% chance of being in location
B after 2 deliveries.
Let's try this for another pair. If we begin at location B, what is the probability of being at location B after 2
deliveries? Try this yourself before you read further! The probability of going from location B to location B
in two deliveries is P(BA)P(AB) + P(BB)P(BB) + P(BC)P(CB) = (.4)(.3)+(.4)(.4) + (.2)(.3) = .34. Now it wasn't so
bad calculating where you would be after 2 deliveries, but what if you need to know where you will be after
5, or 15 deliveries? That could take a LONG time. There must be an easier way, right? Look carefully at
where these numbers come from. As you might suspect, they are the result of matrix multiplication..
Going from C to B in 2 deliveries is the same as taking the inner product of row 2 and column 3. Going from
B to B in 2 deliveries is the same as taking the inner product of row 2 and column 2. If you multiply T by T,
the (2, 3) and (2,2) entries are respectively, the same answers that you got for these two questions above.
The rest of T2 answers the same type of question for any other pair of locations X and Y.

You will notice that the elements on each column still add to 1 and each element is between 0 and 1,
inclusive. Since we are modeling our problem with a Markov chain, this is essential. This matrix indicates
the probabilities of going from location i to location j in exactly 2 deliveries.
Now that we have this matrix, it should be easier to find where we will be after 3 deliveries. We will let
p(AB) represent the probability of going from A to B in 2 deliveries. Let's find the probability of going from C
to B in 3 deliveries: it is p(CA)P(AB) + p(CB)P(BB) + p(CC)P(CB) = (.37)(.3) + (.33)(.4) + (.3)(.3) = .333. You will
see that this probability is the inner product of row 2 of T 2 and column 3 of T. Therefore, if we multiply T2 by
T, we will get the probability matrix for 3 deliveries.
By now, you probably know how we find the matrix of probabilities for 4, 5 or more deliveries. Notice that
the elements on each column still add to 1. Therefore, it is important that you do not round your answers.
Keep as many decimal places as possible to retain accuracy.

, ,
What do you notice about these matrices as we take into account more and more deliveries? The numbers
in each row seems to be converging to a particular number. Think about what this tells us about our long-
term probabilities. This tells us that after a large number of deliveries, it no longer matters which location
we were in when we started. At the end of the week, we have (approximately) a 38.9% Chance of being at
location A, a 33.3% chance of being at location B, and a 27.8% chance of being in location C. This
convergence will happen with most of the transition matrices that we consider.
Remark If all the entries of the transition matrix are between 0 and 1 EXCLUSIVELY, then convergence is
guaranteed to take place. Convergence may take place when 0 and 1 are in the transition matrix, but
convergence is no longer guaranteed.
For an example, look at the matrix
Think about the situation that this matrix represents in order to understand why Ak oscillates as k grows.

Sometimes, you will be given a vector of initial distributions to describe how many or what proportion of
the objects are in each state in the beginning. Using this vector, you can find out how many (or what
proportion) of the objects are in each state at any later time. If the initial distribution vector consists of
numbers between 0 and 1, it tells you what proportion of the total number of objects are in each state in
the beginning, and the elements in the column sum to one. Alternatively, the vector of initial distributions
could contain the actual number of objects or people in each state in the beginning. In this case, all the
elements will be nonnegative and the elements in each row will add to the total number of objects or
people in the entire system. In our example above, the vector of initial distributions tells us what
proportion of the drivers originally begins in each area. For example, if we start out with a uniform
distribution, we will have 1/3 of our drivers in each area. Thus, the vector
is the vector of initial distribution. After one delivery, the distribution will be (approximately) 40% of our
drivers in area A, 33.4% in area B, and 26.6% in area C. We found this by multiplying our initial distribution
matrix by our transition matrix, as follows:
After many deliveries, we saw that some convergence occurs, so that the area from which we start doesn't
matter. This will mean that we will obtain the same right-hand side no matter with which initial distribution
we start. For example,

Notice that each right-hand side is the same as one of the columns of our transition matrix after many
deliveries. This is exactly what we expected because we said that about 38.9% of the people will be in area
A after many deliveries regardless of what percentage of the people were in area A in the initial
distribution. Check this with several initial distributions to convince yourself of the truth of this statement.
If the initial distribution indicates the actual number of people in the system, the following can represent
our system after one delivery:
Did you notice that we now have a fractional number of people in areas A and C after one delivery? We
know that this cannot happen, but this gives us a good idea of approximately how many delivery people are
in each area. After many deliveries, the right-hand side of this equality will also be very close to a particular
vector. For example,
The particular vector that the product converges to is the total number of people in the system (54 in this
case) times any columnn of the matrix that Ak converges to as k grows,

Try some examples to convince yourself that the vector indicating the number of people in each area after
many deliveries will not change if people are moved from one state to another in the initial distribution.
Also notice that the number of people in the entire system never changes. People move from place to
place, but the system never loses or gains people. This can also be illustrated easily using block
multiplication: Since, for large N, TN = [w w w] has its 3 columns the same, then if v = (coulmn vector) (p 1,
p2, p3) is the initial distribution vector ,TN v = [w w w] (p1, p2, p3) (transpose)= p1w+ p2w+ p3w =( p1+ p2+
p3)w =(total number of people initially) w.
I hope the above example gave you a good idea about the process of Markov chains. Now here is the
general setting:
Definitions For a Markov chain with n states, the state vector is a column vector whose ith component
represents the probability that the system is in the ith state at that time. Note that the sum of the entries of
a state vector is 1. For example, vectors X0 and X1 in the above example are state vectors. If pij is the
probability of movement (transition) from one state j to state i, then the matrix T=[ pij] is called the
transition matrix of the Markov chain.
The following Theorem gives the relation between two consecutive state vectors:
If Xn+1 and Xn are two consecutive state vectors of a Markov chain with transition matrix T, then X n+1=T Xn
For a Markov chain, we are usually interested in the long-term behavior of a general state vector Xn. In
other words, we would like to find the limit of Xn as n→∞. It may happen that this limit does not exist, for
example let

then
Clearly Xn oscillates between the vectors (0, 1) and (1, 0) and therefore does not approach a fixed vector.
A question is: what makes Xn approach a limiting vector as n→∞. The next theorem will give an answer,
first we need a definition:
Definition A transition matrix T is called regular if, for some integer r, all entries of Tr are strictly positive. (0
is not strictly positive).
For example, the matrix
is regular since

A Markov chain process is called regular if its transition matrix is regular.
We state now the main theorem in Markov chain theory:
1. If T is a regular transition matrix, then as n approaches infinity, Tn→S where S is a matrix of the
form [v, v,…,v] with v being a constant vector.
2. If T is a regular transition matrix of a Markov chain process, and if X is any state vector, then as n
approaches infinity, TnX→p, where p is a fixed probability vector (the sum of its entries is 1), all of
whose entries are positive.
Consider a Markov chain with a regular transition matrix T, and let S denote the limit of Tn as n approaches
infinity, then TnX→SX=p, and therefore the system approaches a fixed state vector p called the steady-state
vector of the system. Now since Tn+1=TTn and that both Tn+1 and Tn approach S, we have S=TS. Note that
any column of this matrix equation gives Tp=p. Therefore, the steady-state vector of a regular Markov chain
with transition matrix T is the unique probability vector p satisfying Tp=p.
Is there a way to compute the steady-state vector of a regular Markov chain without using the limit? Well,
if we can solve Tp=p, for p, then yes! You might have seen this sort of thing before (and certainly will in
your first linear algebra course) Recall the definition of an eigenvector and an eigenvalue of a square
matrix:
Given a square matrix A, we say that the number λ is an eigenvalue of A if there exists a nonzero vector X
satisfying: AX=λX. In this case, we say that X is an eigenvector of A corresponding to the eigenvalue λ.

It is now clear that a steady-state vector of a regular Markov chain is an eigenvector for the transition
matrix corresponding to the eigenvalue 1.
Recall that the eigenvalues of a matrix A are the solutions to the equation det(A- λI)=0 where I is the
identity matrix of the same size as A. If λ is an eigenvalue of A, then an eigenvector corresponding to λ is a
non-zero solution to the homogeneous system (A- λI)X=0. Consequently, there are infinitely many
eigenvectors corresponding to a fixed eigenvalue.
Example If you have lived in Ottawa for a while, you must have realized that the weather is a main concern
of the population. An unofficial study of the weather in the city in early spring yields the following
observations:
1. It is almost impossible to have two nice days in a row
2. If we have a nice day, we just as likely to have snow or rain the next day
3. If we have snow or rain, then we have an even chance to have the same the next day
4. If there is a change from snow or rain, only half of the time is this a change to a nice day.
a. Write the transition matrix to model this system.
b. If it is nice today, what is the probability of being nice after one week?
c. Find the long time behaviour of the weather.
Solution 1) since the weather tomorrow depends only on today, this is a Markov chain process. The
transition matrix of this system is

where the letters N, R, S represent Nice, Rain, Snow respectively.
2) If it is nice today, then the initial state-vector is
After seven days (one week), the state-vector would be

So, there is about 20% chance of being nice in one week.
3) Notice first that we are dealing with a regular Markov chain since the transition matrix is regular,
so we are sure that the steady-state vector exists. To find it we solve the homogeneous system (T-
I)X=0 which has the following coefficient matrix:
Reducing to reduced echelon form gives
The general solution of this system is
So what solution do we choose? Remember that a steady-state vector is in particular a probability vector;
that is the sum of its components is 1: 0.5t+t+t=1 gives t=0.4. Thus, the steady-state vector is

In the long term, there is 20% chance of getting a nice day, 40% chance of having a rainy day and 40%
chance of having a snowy day.
Objective Question
1. Define Poisson Process

2. Define semi Markov chain
3. Define Gaussian Process
Short Question
1. State and prove Chapman-Kolmogorov Equation

2. Define Poisson Process and find its probability density function
3. What are the assumptions in Poisson Process
Long Question
1. Defined: Stochastic matrix
2. Properties of homogeneous finite state space Markov chains
3. State and prove Chapman-Kolmogorov Equation
Mumbai University Question Papers
Dec 2012
Q. State and prove Chapman-Kolmogorov Equation.
Q. The transition matrix of markov chain with three states 0,1,2 is given by
¾ ¼ 0
¼ ½ ¼

0 ¾ ¼
And the initial state distribution is
P(Xo=i)= 1/3, i=0,1,2
Find (i) P(X2=2) (ii) p(X3=1, X2=2, X1=1,X0=2)
May 2012
Dec 2011
May 2011
Q. Three boys A, B, C are throwing balls to each other. A alway throws ball to B. B is as likely to throw the
ball to C as to A. The probability that C will throw the ball to A is 2/3. Write transition probability matrix and
prove that process is markovian
Q. the transition probability of Markov chain
0.5 0.4 0.1
0.3 0.4 0.3
0.2 0.3 0.5
Find limiting probability
Q. Define Markov chain giving example
1.Define Markov Chain giving an example. 4 Marks Dec 2009
2. State and prove Chapman-Kolmogorov Equation. 10 Marks Dec 2009

3. State and prove Chapman-Kolmogorov Equation. 10 Marks Dec 2008

Final RSA With Question Papers Solution-new-May - 16

Enviado por

Dados do documento

Descrição original:

Título original

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Final RSA With Question Papers Solution-new-May - 16

Enviado por

Direitos autorais:

Formatos disponíveis

CHAPTER-1

Overview of Probability Theory and Basics of Random Variable

Sr. No. of Self

 Functions of one random 1 hour 1 hour

Random Signal Analysis Page 1

 Cumulative Distribution Function 1 hour 1 hour

 Some special distributions:

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,

3. H. Stark and J.W. Woods, Probability and Random Processes with

4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal

4. Weight age in University Examination: 8-15 marks.

Random Signal Analysis Page 2

N. Set of natural numbers

Q. Set of rational numbers

Set of real numbers

C. Set of complex numbers

CDF Cumulative Distribution Function

PMF Probability Mass Function

PDF Probability Density Function

Open interval on the real line

Closed interval on the real line

Semi open intervals

Random Signal Analysis Page 3

2. Subset and Superset:

elements of A are also elements of B. Thus .

called a proper subset of B. We write

Random Signal Analysis Page 4

7. Complement: The complement of a set A, denoted by , is defined as the set of all

8. Disjoint: Two sets A and B are called disjoint if

In the above example, B and C are disjoint sets.

10. Random Variable:

into the real line. Let us define the probability of a subset by

11. Conditional Probability:

Random Signal Analysis Page 5

1. Relative-frequency based definition of probability (von Mises, 1919)

3. Axiomatic definition of probability (Kolmogorov, 1933)

Random Signal Analysis Page 6

‘failure' with probability . Such a random experiment is called Bernoulli trial.

We are interested in finding the probability of k ‘successes' in n independent Bernoulli trials.

Random Signal Analysis Page 7

Basic Concepts of Set Theory

Some of the basic concepts of set theory are introduced here.

is a set and are its elements.

common properties satisfied by the elements. Thus the set

Random Signal Analysis Page 8

we write . In the above example, and .

Subset and Superset

elements of A are also elements of B. Thus .

called a proper subset of B. We write

o A set A is a subset of itself .

o The null set is a subset of every set.

o If the set A is finite with n number of elements, then A has subsets.

For example, the set of binary digits has subsets.

Random Signal Analysis Page 9

Equality of two sets

o A = B if and only if and . In other words,

Random Signal Analysis Page 10

Disjoint: Two sets A and B are called disjoint if

In the above example, B and C are disjoint sets.

Following are a few illustrations of Venn diagram: