
CHAPTER-1

Overview of Probability Theory and Basics of Random Variable

1. Motivation

The study of probability stems from the analysis of certain games of chance, and it has found
applications in most branches of science and engineering. In this chapter the basic concepts of probability
theory are presented.

2. Syllabus

Sr. No. | Topic | Fine Detailing | No. of Hours | Self Study

01 | Introduction to Probability
 Classical and relative-frequency-based definitions of probability; sets, fields, sample space and events; axiomatic definition of probability (1 hour / 1 hour)
 Joint and conditional probabilities, independence, total probability (1 hour / 1 hour)
 Bayes' Rule and applications (1 hour / 1 hour)

02 | Random variables
 Definition of random variable (1 hour / 1 hour)
 Cumulative Distribution Function (CDF) (1 hour / 1 hour)
 Probability Mass Function (PMF), Probability Density Functions (PDF) and properties (1 hour / 1 hour)
 Functions of one random variable (1 hour / 1 hour)
 Their distribution and density functions (1 hour / 1 hour)
 Some special distributions: Uniform, Gaussian and Rayleigh distributions; Binomial and Poisson distributions; Mixed Random Variables (1 hour / 1 hour)

Random Signal Analysis Page 1

3. Books Recommended

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic Processes, 4th Edition, McGraw-Hill, 2002.

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles, 4th Edition, McGraw-Hill, 2000.

3. H. Stark and J.W. Woods, Probability and Random Processes with Applications to Signal Processing, 3rd Edition, Pearson Education.

4. Wim C. Van Etten, Introduction to Random Signals and Noise, Wiley.

5. Miller, Probability and Random Processes with Applications to Signal Processing and Communication, 1st Edition, Elsevier, 2007.

4. Weightage in University Examination: 8-15 marks.



5. Objective

The objective of this module is to make the reader understand the concepts of probability and the different types of probability, and to acquire the ability to compute probabilities in various cases.

6. Key Notation

ℕ  Set of natural numbers

ℤ  Set of integers

ℚ  Set of rational numbers

ℝ  Set of real numbers

ℂ  Set of complex numbers

CDF  Cumulative Distribution Function

PMF  Probability Mass Function

PDF  Probability Density Function

RV  Random Variable

(a, b)  Open interval on the real line

[a, b]  Closed interval on the real line

(a, b], [a, b)  Semi-open intervals on the real line

7. Key Definitions



1. Set: A set is a well defined collection of objects. These objects are called elements or
members of the set. Usually uppercase letters are used to denote sets.

2. Subset and Superset:

A set A is called a subset of B (or B is called the superset of A), denoted by A ⊆ B, if all the elements of A are also elements of B. Thus A ⊆ B if and only if x ∈ A ⟹ x ∈ B.

If A is a subset of B and there is at least one element in B which is not an element of A, then A is called a proper subset of B. We write A ⊂ B.

3. Universal set:

We always consider all sets for the problem under consideration to be subsets of a (large) set called the universal set. For the binary digital communication problem, the set {0, 1} may be considered as the universal set. We shall denote the universal set by the symbol S.

4. Union: The union of two sets A and B, denoted by A ∪ B, is defined as the set of elements that are either in A or in B or both. In set-builder notation, we write A ∪ B = {x | x ∈ A or x ∈ B}.

5. Intersection: The intersection of two sets A and B, denoted by A ∩ B, is defined as the set of elements that are common to both A and B. We can write A ∩ B = {x | x ∈ A and x ∈ B}.

6. Difference: The difference of two sets A and B, denoted by A − B, is the set of those elements of A which do not belong to B. Thus A − B = {x | x ∈ A and x ∉ B}.



Similarly, B − A = {x | x ∈ B and x ∉ A}.

7. Complement: The complement of a set A, denoted by A^c, is defined as the set of all elements which are not in A. Thus A^c = {x ∈ S | x ∉ A}.

Clearly, A ∪ A^c = S and A ∩ A^c = ∅.

8. Disjoint: Two sets A and B are called disjoint if A ∩ B = ∅.

9. Venn diagram

The sets and set operations can be illustrated by means of the Venn diagrams. A rectangle is used to
represent the universal set and a circle is used to represent any set in it.

10. Random Variable:


A random variable associates the points in the sample space with real numbers.

Consider the probability space (S, F, P) and a function X : S → ℝ mapping the sample space into the real line. Let us define the probability of a subset B ⊆ ℝ by

P_X(B) = P({s ∈ S : X(s) ∈ B}).

11. Conditional Probability:

For two events A and B with P(A) > 0, the conditional probability of B given A is defined as

P(B | A) = P(A ∩ B) / P(A).



12. Conditional Distribution Function

Consider the event {X ≤ x} and any event B involving the random variable X. The conditional distribution function of X given B is defined as

F_X(x | B) = P({X ≤ x} ∩ B) / P(B), P(B) > 0.

8. Key Relations

1. Relative-frequency based definition of probability (von Mises, 1919)

If an experiment is repeated n times under similar conditions and the event A occurs n_A times, then

P(A) = lim_{n→∞} n_A / n

2. Conditional probability

P(B | A) = P(A ∩ B) / P(A), P(A) > 0

3. Axiomatic definition of probability (Kolmogorov, 1933)

A set function P defined on a sigma field F of subsets of S is a probability if

Axiom 1: P(A) ≥ 0 for every A ∈ F

Axiom 2: P(S) = 1

Axiom 3: If A_1, A_2, … are pairwise disjoint events, then P(A_1 ∪ A_2 ∪ ⋯) = P(A_1) + P(A_2) + ⋯



4. Probability Using Counting Method

In many applications we have to deal with a finite sample space S, and the elementary events formed by single elements of the set may be assumed equiprobable. In this case, we can define the probability of the event A according to the classical definition discussed earlier:

P(A) = n_A / n

where n_A = number of elements favourable to A and n is the total number of elements in the sample space S.

5. Bernoulli trial

Suppose in an experiment we are only concerned with whether a particular event A has occurred or not. We call this event the 'success', with probability p = P(A), and the complementary event A^c the 'failure', with probability q = 1 − p. Such a random experiment is called a Bernoulli trial.

Probability of

Success: P(A) = p

Failure: P(A^c) = q = 1 − p

6. Binomial Law

We are interested in finding the probability of k 'successes' in n independent Bernoulli trials.

This probability is given by

P_n(k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, …, n

9. Theory

Basic Concepts of Set Theory

The modern approach to probability is based on axiomatically defining probability as a function of a set. A background in set theory is therefore essential for understanding probability.

Some of the basic concepts of set theory are introduced here.

Set:

A set is a well defined collection of objects. These objects are called elements or members of the
set. Usually uppercase letters are used to denote sets.

Example 1

A = {1, 2, 3} is a set and 1, 2 and 3 are its elements.

 The elements of a set are enumerated within a pair of curly brackets as shown in this example.

 Instead of listing all the elements, we can represent a set in the set-builder notation, specifying some common properties satisfied by the elements. Thus the set {x | x is a natural number and x ≤ 3} represents the set {1, 2, 3}. We read '|' as 'such that'. Such a representation is particularly useful if a set is infinite, having an infinite number of elements, or if listing all the elements of the set is cumbersome.



 If an element x is a member of the set A, we write x ∈ A. If x is not a member of A, we write x ∉ A.

 The null set or empty set is the set that does not contain any element. A null set is denoted by ∅.

Subset and Superset

A set A is called a subset of B (or B is called the superset of A), denoted by A ⊆ B, if all the elements of A are also elements of B. Thus A ⊆ B if and only if x ∈ A ⟹ x ∈ B.

If A is a subset of B and there is at least one element in B which is not an element of A, then A is called a proper subset of B. We write A ⊂ B.

Example 2 Let A = {1, 2} and B = {1, 2, 3}.

Then, A ⊂ B.

o A set A is a subset of itself: A ⊆ A.

o A ⊆ B and B ⊆ A implies that A = B.

o The null set ∅ is a subset of every set.

o If the set A is finite with n elements, then A has 2^n subsets.

For example, the set of binary digits {0, 1} has 2^2 = 4 subsets.

These are: ∅, {0}, {1}, {0, 1}.

Universal set

We always consider all sets for the problem under consideration to be subsets of a (large) set called the universal set. For the binary digital communication problem, the set {0, 1} may be considered as the universal set. We shall denote the universal set by the symbol S.



Example 3

In discussion involving English letters, the alphabet of the English language may be considered as the
universal set.

Equality of two sets

Two sets A and B are equal if and only if they have the same elements. Thus,

o A = B if and only if A ⊆ B and B ⊆ A. In other words, A = B ⟺ (x ∈ A ⟺ x ∈ B).

We use the above definition of the equality of two sets to establish identities involving set-theoretic operations.

Set operations

We can combine events by set operations to get other events. The following set operations are useful:

Union: The union of two sets A and B, denoted by A ∪ B, is defined as the set of elements that are either in A or in B or both. In set-builder notation, we write A ∪ B = {x | x ∈ A or x ∈ B}.

Intersection: The intersection of two sets A and B, denoted by A ∩ B, is defined as the set of elements that are common to both A and B. We can write A ∩ B = {x | x ∈ A and x ∈ B}.

Difference: The difference of two sets A and B, denoted by A − B, is the set of those elements of A which do not belong to B. Thus A − B = {x | x ∈ A and x ∉ B}.



Similarly, B − A = {x | x ∈ B and x ∉ A}.

Complement: The complement of a set A, denoted by A^c, is defined as the set of all elements which are not in A. Thus A^c = {x ∈ S | x ∉ A}.

Clearly, A ∪ A^c = S and A ∩ A^c = ∅.
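These operations map directly onto Python's built-in `set` type, so they can be checked by computation; a small sketch (the particular sets S, A and B are chosen only for illustration):

```python
# Illustrative sets; any finite sets would do.
S = {1, 2, 3, 4, 5, 6}   # universal set
A = {1, 2, 3}
B = {2, 3, 4}

union = A | B            # elements in A or B or both
intersection = A & B     # elements common to A and B
difference = A - B       # elements of A not in B
complement_A = S - A     # elements of S not in A

print(union, intersection, difference, complement_A)
```

Note that `A | complement_A` returns S and `A & complement_A` returns the empty set, matching the complementary properties stated above.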

Example 4 Let S = {1, 2, 3, 4, 5, 6}, A = {1, 2, 3}, B = {2, 3} and C = {4, 5}.

Disjoint: Two sets A and B are called disjoint if A ∩ B = ∅.

In the above example, B and C are disjoint sets.

Venn diagram

The sets and set operations can be illustrated by means of the Venn diagrams. A rectangle is used to
represent the universal set and a circle is used to represent any set in it.

Following are a few illustrations of Venn diagram:



Properties of set operations

1. Identity properties: A ∪ ∅ = A, A ∩ S = A

2. Commutative properties: A ∪ B = B ∪ A, A ∩ B = B ∩ A

3. Associative properties: (A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)

4. Distributive properties: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

5. Complementary properties: A ∪ A^c = S, A ∩ A^c = ∅

6. De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c, (A ∩ B)^c = A^c ∪ B^c

These properties can be proved easily and verified using Venn diagrams. They can be used to derive further useful results.



Probability Concepts

Before we give a definition of probability, let us examine the following concepts:

1. Random Experiment: An experiment is a random experiment if its outcome cannot be predicted


precisely. One out of a number of outcomes is possible in a random experiment. A single
performance of the random experiment is called a trial.

2. Sample Space: The sample space S is the collection of all possible outcomes of a random

experiment. The elements of S are called sample points.

 A sample space may be finite, countably infinite or uncountable.
 A finite or countably infinite sample space is called a discrete sample space.
 An uncountable sample space is called a continuous sample space.

3. Event: An event A is a subset of the sample space such that a probability can be assigned to it. Thus A ⊆ S.

 For a discrete sample space, all subsets are events.

 S is the certain event (sure to occur) and ∅ is the impossible event.

Figure 1



Consider the following examples.

Example 1. Tossing a fair coin

The possible outcomes are H (head) and T (tail). The associated sample space is S = {H, T}. It is a finite sample space. The events associated with the sample space are: ∅, {H}, {T} and S.

Example 2. Throwing a fair die:

The possible 6 outcomes are: 1, 2, 3, 4, 5, 6.

The associated finite sample space is S = {1, 2, 3, 4, 5, 6}. Some events are the event of an even face {2, 4, 6} and the event of an odd face {1, 3, 5}.

Example 3. Tossing a fair coin until a head is obtained

We may have to toss the coin any number of times before a head is obtained. Thus the possible outcomes are:

H, TH, TTH, TTTH, …

How many outcomes are there? The outcomes are countable but infinite in number. The countably infinite sample space is S = {H, TH, TTH, TTTH, …}.

Example 4. Picking a real number at random between -1 and +1

The associated sample space is S = {x : −1 ≤ x ≤ 1}.

Clearly S is a continuous sample space.

Example 5. Output of a radio receiver at any time



Suppose the output voltage of a radio receiver at any time t is a value lying between −5 V and 5 V.

The associated sample space is continuous and given by S = {v : −5 ≤ v ≤ 5}.

Clearly S is a continuous sample space.

The probability of an event is a number assigned to the event. Let us see how we can define
probability.

Classical definition of probability (Laplace, 1812)

Consider a random experiment with a finite number N of outcomes. If all the outcomes of the experiment are equally likely, the probability of an event A is defined by

P(A) = N_A / N

where N_A = number of outcomes favourable to A.

Example 6 A fair die is rolled once. What is the probability of getting a '6'?

Here S = {1, 2, 3, 4, 5, 6} and A = {6}, so P(A) = 1/6.

Example 7 A fair coin is tossed twice. What is the probability of getting two 'heads'?

Here S = {HH, HT, TH, TT} and A = {HH}.
Total number of outcomes is 4 and all four outcomes are equally likely.

The only outcome favourable to A is HH, so P(A) = 1/4.
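Both examples can be verified by direct counting; a sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Example 6: rolling a fair die once, event A = {6}
die = range(1, 7)
p_six = Fraction(sum(1 for face in die if face == 6), 6)

# Example 7: tossing a fair coin twice, event A = {HH}
tosses = list(product("HT", repeat=2))          # 4 equally likely outcomes
p_two_heads = Fraction(sum(1 for t in tosses if t == ("H", "H")), len(tosses))

print(p_six, p_two_heads)
```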



Remark

 The classical definition is limited to a random experiment which has only a finite number of
outcomes. In many experiments like that in the above examples, the sample space is finite and each
outcome may be assumed ‘equally likely.' In such cases, the counting method can be used to
compute probabilities of events.

 Consider the experiment of tossing a fair coin until a ‘head' appears. As we have discussed earlier,
there are countably infinite outcomes. Can you believe that all these outcomes are equally likely?

 The notion of equally likely is important here. Equally likely means equally probable. Thus this definition presupposes that all outcomes occur with equal probability, so the definition is circular: it includes the very concept it sets out to define.

Relative-frequency based definition of probability (von Mises, 1919)

If an experiment is repeated n times under similar conditions and the event A occurs n_A times, then

P(A) = lim_{n→∞} n_A / n

Example 8 Suppose a die is rolled 500 times. The following table shows the frequency of each face.



We see that the relative frequencies are close to 1/6. How do we ascertain that these relative frequencies will approach 1/6 as we repeat the experiment an infinite number of times?
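One can watch the relative frequencies settle near 1/6 in a simulation; a sketch (the number of rolls and the seed are arbitrary choices, not from the original 500-roll table):

```python
import random
from collections import Counter

random.seed(1)                     # arbitrary seed, for reproducibility
n = 60000
counts = Counter(random.randint(1, 6) for _ in range(n))

rel_freq = {face: counts[face] / n for face in range(1, 7)}
print(rel_freq)
# Each relative frequency should be near 1/6 for large n.
```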

Discussion: This definition is also inadequate from the theoretical point of view.

 We cannot repeat an experiment an infinite number of times.

 How do we ascertain that the above ratio will converge for all possible sequences of outcomes of the experiment?

Axiomatic definition of probability (Kolmogorov, 1933)

We have earlier defined an event as a subset of the sample space. Does each subset of the sample space form an event?

The answer is yes for a finite sample space. However, we may not be able to assign probability meaningfully to all the subsets of a continuous sample space. We have to eliminate those subsets. The concept of the sigma algebra is meaningful here.

Definition Let S be a sample space and F a sigma field defined over it. Let P be a mapping from the sigma algebra F into the real line such that for each A ∈ F there exists a unique P(A) ∈ ℝ. Clearly P is a set function and is called probability if it satisfies the following three axioms:

Axiom 1: P(A) ≥ 0 for every A ∈ F

Axiom 2: P(S) = 1

Axiom 3: If A_1, A_2, … is a sequence of pairwise disjoint events in F, then P(A_1 ∪ A_2 ∪ ⋯) = P(A_1) + P(A_2) + ⋯



 The triplet (S, F, P) is called the probability space.
 Any assignment of probability must satisfy the above three axioms.

 If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

This is a special case of axiom 3 and, for a discrete sample space, this simpler version may be considered as axiom 3. We shall give a proof of this result below.

 Events A and B with A ∩ B = ∅ are called mutually exclusive.



For any event A of a discrete sample space, we can define the probability as the sum of the probabilities of its elementary events: P(A) = Σ_{s_i ∈ A} P({s_i}).

In the special case when the outcomes are equiprobable, we can assign an equal probability p to each elementary event.

Example 9 Consider the experiment of rolling a fair die considered in Example 2.

Suppose A_1, A_2, …, A_6 represent the elementary events. Thus A_1 is the event of getting '1', A_2 is the event of getting '2' and so on.

Since all six disjoint events are equiprobable and their union is S, we get 6p = 1, so P(A_i) = 1/6 for each i.



Suppose A is the event of getting an odd face. Then A = A_1 ∪ A_3 ∪ A_5 and P(A) = 1/6 + 1/6 + 1/6 = 1/2.

Example 10 Consider the experiment of tossing a fair coin until a head is obtained, discussed in Example 3.

Here S = {H, TH, TTH, …}. Let us call A_1 = {H}, A_2 = {TH}, A_3 = {TTH}

and so on. If we assign P(A_k) = (1/2)^k, then Σ_{k=1}^{∞} P(A_k) = 1. Let A be the event of obtaining the head before the 4th toss. Then

P(A) = P(A_1) + P(A_2) + P(A_3) = 1/2 + 1/4 + 1/8 = 7/8.

Probability assignment in a continuous space

Suppose the sample space S is continuous and uncountable. Such a sample space arises when the outcomes of an experiment are numbers; for example, when the experiment consists of measuring a voltage, a current or a resistance. In such a case, the sigma algebra consists of the Borel sets on the real line.

Suppose S = ℝ and f_X is a non-negative integrable function such that

∫_{−∞}^{∞} f_X(x) dx = 1.



For any Borel set A,

P(A) = ∫_A f_X(x) dx

defines the probability on the Borel sigma-algebra B.

We can similarly define probability on the continuous spaces ℝ², ℝ³, etc.

Example 11 Suppose f_X(x) = 1 for 0 ≤ x ≤ 1 and f_X(x) = 0 otherwise.

Then for any interval [a, b] ⊆ [0, 1], P([a, b]) = ∫_a^b f_X(x) dx = b − a.

Example 12 Consider the two-dimensional Euclidean space. Let S ⊆ ℝ² have finite area, and for A ⊆ S let P(A) be the area of A divided by the area of S.

This example interprets the geometrical definition of probability.

Probability Using Counting Method

In many applications we have to deal with a finite sample space S, and the elementary events formed by single elements of the set may be assumed equiprobable. In this case, we can define the probability of the event A according to the classical definition discussed earlier:

P(A) = n_A / n

where n_A = number of elements favourable to A and n is the total number of elements in the sample space S.

Thus the calculation of probability involves finding the number of elements in the sample space S and in the event A. Combinatorial rules give us quick algebraic formulae for counting these elements. We briefly outline some of these rules:

1. Product rule Suppose we have a set A with m distinct elements and a set B with n distinct elements, and let A × B = {(a, b) | a ∈ A, b ∈ B}. Then A × B contains mn ordered pairs of elements.

This is illustrated in Fig. 1 for m = 5 and n = 4. In other words, if we can choose element a in m possible ways and element b in n possible ways, then the ordered pair (a, b) can be chosen in mn possible ways.

Figure 1 Illustration of the product rule

The above result can be generalized as follows: the number of distinct k-tuples in

A_1 × A_2 × ⋯ × A_k is n_1 n_2 ⋯ n_k, where n_i represents the

number of distinct elements in A_i.



Example 1 A fair die is thrown twice. What is the probability that a 3 will appear at least once?

Solution: The sample space corresponding to two throws of the die is illustrated in the following table.

Clearly, the sample space has 6 × 6 = 36 elements by the product rule. The event corresponding to

getting at least one 3 is highlighted and contains 11 elements. Therefore, the required probability is 11/36.
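The 36-element sample space can be enumerated to confirm the count of 11 favourable outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 ordered pairs by the product rule
favourable = [o for o in outcomes if 3 in o]      # at least one 3

p = Fraction(len(favourable), len(outcomes))
print(len(outcomes), len(favourable), p)          # 36 11 11/36
```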

2. Sampling with replacement and ordering

Suppose we have to choose k objects from a set of n objects. Since sampling is with ordering, each ordered arrangement of k objects is to be considered. Further, after every choice, the object is placed back in the set. In this case, the number of distinct ordered k-tuples is n × n × ⋯ × n (k factors) = n^k. Equivalently, if a random experiment has n outcomes and the experiment is repeated k times, then

the total number of outcomes is n^k.

3. Sampling without replacement and with ordering

Suppose we have to choose k objects from a set of n objects by picking one object after another at random.

In this case the first object can be chosen from n objects, the second object from n − 1 objects, and so on. Therefore, by applying the product rule, the number of distinct ordered k-tuples in this case is n(n − 1)(n − 2) ⋯ (n − k + 1).



The number n(n − 1) ⋯ (n − k + 1) is called the permutation of n objects taking k at a time and is denoted by nPk. Thus

nPk = n! / (n − k)!

Clearly, nPn = n!.

Example 2 Birthday problem - Given a class of students, what is the probability of two students in the class having the same birthday? Plot this probability vs. number of students and be surprised!

Let n be the number of students in the class. Assuming 365 equally likely birthdays, the probability that all n birthdays are distinct is 365Pn / 365^n, so

P(at least two share a birthday) = 1 − 365Pn / 365^n = 1 − (365 × 364 × ⋯ × (365 − n + 1)) / 365^n.



The plot of probability vs. number of students is shown above. Observe the steep rise in the probability in the beginning. In fact, this probability for a group of 25 students is greater than 0.5, and from 60 students onward it is close to 1. For 366 or more students this probability is exactly one.
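The probabilities quoted above can be reproduced with a short function; a sketch (the function name `p_shared_birthday` is ours, not standard):

```python
def p_shared_birthday(n: int) -> float:
    """Probability that at least two of n students share a birthday (365-day year)."""
    if n > 365:
        return 1.0          # pigeonhole: a shared birthday is certain
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

for n in (10, 25, 60):
    print(n, round(p_shared_birthday(n), 4))
```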

4. Sampling without replacement and without ordering

Suppose nCk is the number of ways in which k objects can be chosen out of a set of n objects, where the ordering of the objects within the chosen set of k is not considered.

Note that k objects can be arranged among themselves in k! ways. Therefore, if ordering of the k objects is considered, the number of ways in which k objects can be chosen out of n objects is nCk × k!. This is the case of sampling with ordering, so

nCk × k! = nPk, i.e. nCk = n! / (k!(n − k)!)

nCk is also called the binomial coefficient.

Example 3 An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls are picked at random from the urn without replacement. What is the probability that of the picked balls 4 are red, 3 are green and 2 are blue?

Solution:

9 balls can be picked from a population of 15 balls in 15C9 ways.

Therefore the required probability is (6C4 × 5C3 × 4C2) / 15C9 = (15 × 10 × 6) / 5005 = 900/5005 ≈ 0.18.
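The same numbers can be computed with Python's `math.comb`; a sketch:

```python
from fractions import Fraction
from math import comb

# 4 red out of 6, 3 green out of 5, 2 blue out of 4, drawn from 15 balls in total
favourable = comb(6, 4) * comb(5, 3) * comb(4, 2)
total = comb(15, 9)

p = Fraction(favourable, total)
print(favourable, total, p, float(p))
```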

5. Arranging n objects into k specific groups Suppose we want to partition a set of n distinct elements

into k distinct subsets of sizes n_1, n_2, …, n_k respectively, so that n_1 + n_2 + ⋯ + n_k = n.



Then the total number of distinct partitions is

n! / (n_1! n_2! ⋯ n_k!)

This can be proved by noting that the resulting number of partitions is

nCn_1 × (n − n_1)Cn_2 × ⋯ × (n − n_1 − ⋯ − n_{k−1})Cn_k = n! / (n_1! n_2! ⋯ n_k!)

Example 4 What is the probability that in a throw of 12 dice each face occurs twice?

Solution: The total number of elements in the sample space of the outcomes of a single throw of 12 dice is 6^12.

The number of favourable outcomes is the number of ways in which 12 dice can be arranged in six groups of size 2 each: group 1 consisting of two dice each showing 1, group 2 consisting of two dice each showing 2, and so on.

Therefore, the total number of distinct groups is 12! / (2!)^6 = 7,484,400.

Hence the required probability is 12! / ((2!)^6 × 6^12) ≈ 0.0034.
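The count and the probability can be verified directly; a sketch:

```python
from fractions import Fraction
from math import factorial

# Partition 12 dice into 6 groups of size 2 (one group per face value).
favourable = factorial(12) // (factorial(2) ** 6)
total = 6 ** 12                                   # outcomes of throwing 12 dice

p = Fraction(favourable, total)
print(favourable, total, float(p))
```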

Conditional probability



Consider the probability space (S, F, P). Let A and B be two events in F. We ask the following question:

Given that A has occurred, what is the probability of B?

The answer is the conditional probability of B given A, denoted by P(B | A). We shall develop the concept of the conditional probability and explain under what condition this conditional probability is the same as P(B).

Let us consider the case of equiprobable events discussed earlier. Out of n sample points, let n_A be favourable for A and n_{AB} be favourable for the joint event A ∩ B. The proportion of outcomes in A that are also in B is then

n_{AB} / n_A = (n_{AB} / n) / (n_A / n) = P(A ∩ B) / P(A).



This suggests how to define conditional probability. The probability of an event B under the condition that another event A has occurred is called the conditional probability of B given A and is defined by

P(B | A) = P(A ∩ B) / P(A), P(A) > 0.

We can similarly define the conditional probability of A given B, denoted by P(A | B).

From the definition of conditional probability, we have the joint probability of two events A and B as follows:

P(A ∩ B) = P(B | A) P(A) = P(A | B) P(B).

Example 1 Consider the experiment of tossing the fair die. Suppose A is the event of getting an odd face and B is the event of getting a '1'. Then P(B | A) = P(A ∩ B) / P(A) = (1/6) / (1/2) = 1/3.

Example 2 A family has two children. It is known that at least one of the children is a girl. What is the
probability that both the children are girls?



A = event of at least one girl

B = event of two girls

Taking the equally likely sample space {GG, GB, BG, BB}, we have P(A) = 3/4 and P(A ∩ B) = P(B) = 1/4, so

P(B | A) = P(A ∩ B) / P(A) = (1/4) / (3/4) = 1/3.
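The conditional probability can be confirmed by enumerating the four equally likely gender patterns; a sketch (G and B are just labels for girl and boy):

```python
from fractions import Fraction
from itertools import product

# Equally likely gender patterns for two children.
sample_space = list(product("GB", repeat=2))
A = [s for s in sample_space if "G" in s]         # at least one girl
B = [s for s in sample_space if s == ("G", "G")]  # two girls
AB = [s for s in A if s in B]                     # here B is a subset of A

# P(B | A) = P(A and B) / P(A) = |A and B| / |A| for equiprobable outcomes
p_B_given_A = Fraction(len(AB), len(A))
print(p_B_given_A)                                # 1/3
```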

Conditional probability and the axioms of probability

In the following we show that the conditional probability satisfies the axioms of probability.

By definition, P(B | A) = P(A ∩ B) / P(A), P(A) > 0.

Axiom 1: Since P(A ∩ B) ≥ 0 and P(A) > 0, we have P(B | A) ≥ 0.

Axiom 2: We have

P(S | A) = P(S ∩ A) / P(A) = P(A) / P(A) = 1.

Axiom 3:

Consider a sequence of disjoint events B_1, B_2, ….

We have

P(B_1 ∪ B_2 ∪ ⋯ | A) = P((B_1 ∪ B_2 ∪ ⋯) ∩ A) / P(A) = Σ_i P(B_i ∩ A) / P(A) = Σ_i P(B_i | A).

Note that the sequence B_1 ∩ A, B_2 ∩ A, … is also a sequence of disjoint events.

Properties of Conditional Probabilities

If A ⊆ B, then P(B | A) = 1.

We have P(B | A) = P(A ∩ B) / P(A) = P(A) / P(A) = 1, since A ∩ B = A.



Chain Rule of Probability

We have P(A ∩ B) = P(A) P(B | A).

We can generalize the above to get the chain rule of probability:

P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n) = P(A_1) P(A_2 | A_1) P(A_3 | A_1 ∩ A_2) ⋯ P(A_n | A_1 ∩ A_2 ∩ ⋯ ∩ A_{n−1}).



3. Theorem of Total Probability Let A_1, A_2, …, A_n be n events such that A_i ∩ A_j = ∅ for i ≠ j, A_1 ∪ A_2 ∪ ⋯ ∪ A_n = S and P(A_i) > 0 for each i.

Then for any event B,

P(B) = Σ_{i=1}^{n} P(B | A_i) P(A_i).

Proof: We have B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ⋯ ∪ (B ∩ A_n) and the sequence B ∩ A_1, …, B ∩ A_n is disjoint, so P(B) = Σ_i P(B ∩ A_i) = Σ_i P(B | A_i) P(A_i).

Remark

(1) A decomposition of a set S into 2 or more disjoint nonempty subsets is called a partition of S. The subsets A_1, A_2, …, A_n form a partition of S if A_1 ∪ A_2 ∪ ⋯ ∪ A_n = S and A_i ∩ A_j = ∅ for i ≠ j.

(2) The theorem of total probability can be used to determine the probability of a complex event in terms of related simpler events. This result will be used in Bayes' theorem, to be discussed at the end of the lecture.

Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at random without
replacement.

Let A_1 = event that the first ball is white and

let A_2 = event that the first ball is black.

Clearly A_1 and A_2 form a partition of the sample space corresponding to picking two balls from the box.

Let B = the event that the second ball is white. Then

P(B) = P(B | A_1) P(A_1) + P(B | A_2) P(A_2) = (1/4)(2/5) + (2/4)(3/5) = 2/5.
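The total-probability computation for this example can be verified with exact fractions; a sketch:

```python
from fractions import Fraction

# Box with 2 white and 3 black balls; two balls drawn without replacement.
p_A1 = Fraction(2, 5)          # first ball white
p_A2 = Fraction(3, 5)          # first ball black
p_B_given_A1 = Fraction(1, 4)  # second white, given first was white
p_B_given_A2 = Fraction(2, 4)  # second white, given first was black

# Theorem of total probability
p_B = p_B_given_A1 * p_A1 + p_B_given_A2 * p_A2
print(p_B)                     # 2/5
```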

Independent events

Two events are called independent if the probability of occurrence of one event does not affect the probability of occurrence of the other. Thus the events A and B are independent if

P(B | A) = P(B) and P(A | B) = P(A),

where P(A) and P(B) are assumed to be non-zero.

Equivalently, if A and B are independent, we have

P(A ∩ B) / P(A) = P(B)

or

P(A ∩ B) = P(A) P(B).



Two events A and B are called statistically dependent if they are not independent. Similarly, we can define the independence of n events. The events A_1, A_2, …, A_n are called independent if and only if the probability of the intersection of every sub-collection equals the product of the individual probabilities; in particular, P(A_1 ∩ A_2 ∩ ⋯ ∩ A_n) = P(A_1) P(A_2) ⋯ P(A_n).

Example 4 Consider the example of tossing a fair coin twice. The resulting sample space is given by S = {HH, HT, TH, TT} and all the outcomes are equiprobable.

Let A = {TH, TT} be the event of getting 'tail' in the first toss and B = {HH, TH} be the event of getting 'head' in the second toss. Then

P(A) = 1/2 and P(B) = 1/2.

Again, A ∩ B = {TH}, so that

P(A ∩ B) = 1/4 = P(A) P(B).

Hence the events A and B are independent.
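Independence here can be checked by enumerating the four outcomes; a sketch:

```python
from fractions import Fraction
from itertools import product

sample_space = list(product("HT", repeat=2))      # two tosses of a fair coin
A = [s for s in sample_space if s[0] == "T"]      # tail on the first toss
B = [s for s in sample_space if s[1] == "H"]      # head on the second toss
AB = [s for s in A if s in B]                     # both events occur

n = len(sample_space)
p_A, p_B, p_AB = Fraction(len(A), n), Fraction(len(B), n), Fraction(len(AB), n)
print(p_A, p_B, p_AB, p_AB == p_A * p_B)          # independence holds
```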

Example 5 Consider the experiment of picking two balls at random discussed in Example 3.

In this case, P(B) = 2/5 and P(B | A_1) = 1/4.

Therefore, P(B | A_1) ≠ P(B), and A_1 and B are dependent.



Bayes' Theorem

Suppose A_1, A_2, …, A_n form a partition of S and B is any event with P(B) > 0. Then

P(A_i | B) = P(B | A_i) P(A_i) / Σ_{j=1}^{n} P(B | A_j) P(A_j).

This result is known as Bayes' theorem. The probability P(A_i) is called the a priori probability

and P(A_i | B) is called the a posteriori probability. Thus Bayes' theorem enables us to determine the a posteriori probability from the observation that B has occurred. This result is of practical importance and is the heart of Bayesian classification, Bayesian estimation, etc.

Example 1

In a binary communication system a zero and a one is transmitted with probability 0.6 and 0.4
respectively. Due to error in the communication system a zero becomes a one with a probability 0.1 and a
one becomes a zero with a probability 0.08. Determine the probability (i) of receiving a one and (ii) that a
one was transmitted when the received message is one.

Let S be the sample space corresponding to binary communication. Suppose A_0 is the event of transmitting 0, A_1 is the event of transmitting 1, and B_0 and B_1 are the corresponding events of receiving 0 and 1 respectively.



Given P(A_0) = 0.6, P(A_1) = 0.4, P(B_1 | A_0) = 0.1 and P(B_0 | A_1) = 0.08, so that P(B_1 | A_1) = 0.92.

(i) By the theorem of total probability, P(B_1) = P(B_1 | A_0) P(A_0) + P(B_1 | A_1) P(A_1) = 0.1 × 0.6 + 0.92 × 0.4 = 0.428.

(ii) By Bayes' theorem, P(A_1 | B_1) = P(B_1 | A_1) P(A_1) / P(B_1) = 0.368 / 0.428 ≈ 0.86.
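Both answers can be checked numerically with the total-probability and Bayes computations; a sketch (the variable names are illustrative):

```python
p0, p1 = 0.6, 0.4            # P(transmit 0), P(transmit 1)
p_1_given_0 = 0.1            # a zero becomes a one
p_0_given_1 = 0.08           # a one becomes a zero
p_1_given_1 = 1 - p_0_given_1

# (i) Total probability of receiving a one
p_recv_1 = p_1_given_0 * p0 + p_1_given_1 * p1

# (ii) Bayes' rule: a one was transmitted, given a one was received
p_sent1_given_recv1 = p_1_given_1 * p1 / p_recv_1
print(round(p_recv_1, 3), round(p_sent1_given_recv1, 4))
```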

Example 2 In an electronics laboratory, there are identically looking capacitors of three makes in the ratio 2 : 3 : 4. It is known that 1% of the first make and 1.5% of the second make are defective. What percentage of capacitors in the laboratory are defective? If a capacitor picked at random is found to be defective, what is the probability that it is of a particular make?

Let D be the event that the item is defective. Here we have to find P(D) and the a posteriori probability of each make given D.

Here the a priori probabilities of the three makes are 2/9, 3/9 and 4/9 respectively.

The conditional probabilities of a defect are 0.01 for the first make and 0.015 for the second make.



Example 3

Box A contains 2 red chips; box B contains two white chips; and box C contains 1 red chip and 1 white chip.
A box is selected at random, and one chip is taken at random from that box. What is the probability of
selecting a white chip?

Solution. Let A be the event that Box A is randomly selected; let B be the event that Box B is randomly
selected; and let C be the event that Box C is randomly selected. Because there are three boxes that are
equally likely to be selected, P(A) = P(B) = P(C) = 1/3. Let W be the event that a white chip is randomly
selected. The probability of selecting a white chip from a box depends on the box from which the chip is
selected:

P(W | A) = 0

P(W | B) = 1

P(W | C) = 1/2

Now, a white chip could be selected in one of three ways: (1) Box A could be selected, and then a white
chip be selected from it; or (2) Box B could be selected, and then a white chip be selected from it; or (3) Box
C could be selected, and then a white chip be selected from it. That is, the probability that a white chip is
selected is:

P(W) = P[(W ∩ A) ∪ (W ∩ B) ∪ (W ∩ C)]

Then, recognizing that the events W ∩ A, W ∩ B, and W ∩ C are mutually exclusive, we get

P(W) = P(W|A)P(A) + P(W|B)P(B) + P(W|C)P(C)

= 0 × (1/3) + 1 × (1/3) + (1/2) × (1/3)

= 1/2



Example 4

In the above example, if the selected chip is white, what is the probability that the other chip in the box is
red?

Solution. The box that contains one white chip and one red chip is Box C. Therefore, we are interested in finding P(C | W). From the previous example, P(W) = 1/2.

P(C | W) = P(C ∩ W) / P(W)

= P(W|C)P(C) / [P(W|A)P(A) + P(W|B)P(B) + P(W|C)P(C)]

= ((1/2) × (1/3)) / (1/2)

= 1/3
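Both box-and-chip answers, P(W) = 1/2 and P(C | W) = 1/3, can be checked with exact fractions; a sketch:

```python
from fractions import Fraction

# Priors: each box equally likely
p_box = Fraction(1, 3)
p_W_given = {"A": Fraction(0), "B": Fraction(1), "C": Fraction(1, 2)}

# Total probability of drawing a white chip
p_W = sum(p * p_box for p in p_W_given.values())

# Bayes' rule: box C, given a white chip was drawn
p_C_given_W = p_W_given["C"] * p_box / p_W
print(p_W, p_C_given_W)        # 1/2 1/3
```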

Repeated Trials

In our discussions so far, we considered the probability defined over a sample space corresponding to
a random experiment. Often, we have to consider several random experiments in a sequence. For example,
the experiment corresponding to sequential transmission of bits through a communication system may be
considered as a sequence of experiments each representing transmission of single bit through the channel.

PRODUCT: Suppose two experiments E_1 and E_2, with corresponding sample spaces S_1 and S_2, are

performed sequentially. Such a combined experiment is called the product of the two experiments E_1 and E_2.

Clearly, the outcome of this combined experiment consists of the ordered pair (s_1, s_2), where s_1 ∈ S_1

and s_2 ∈ S_2. The sample space corresponding to the combined experiment is given by S = S_1 × S_2. The

events in S consist of all the Cartesian products of the form A_1 × A_2, where A_1 is an event in S_1 and A_2 is an

event in S_2. Our aim is to define the probability P(A_1 × A_2).



We can easily show that

P(A_1 × S_2) = P_1(A_1),

where P_1 is the probability defined on the events of S_1. This is because the event A_1 × S_2 in

S occurs whenever A_1 in S_1 occurs, irrespective of the event in S_2.

Also note that P(S_1 × A_2) = P_2(A_2).

Independent Experiments

In many experiments, the events A_1 × S_2 and S_1 × A_2 are independent for every selection of A_1 ⊆ S_1

and A_2 ⊆ S_2. Such experiments are called independent experiments.



In this case we can write

P(A_1 × A_2) = P_1(A_1) P_2(A_2).

Example 1

Consider the experiments of rolling a fair die and tossing a fair coin sequentially. What is the probability
that a '2' and a 'head' will occur?

Solution: Suppose S_1 is the sample space of the experiment of rolling a six-faced fair die and S_2 is the sample space of the experiment of tossing a fair coin. Since the experiments are independent, the required probability is P({2} × {H}) = (1/6) × (1/2) = 1/12.

Example 2

In a digital communication system transmitting 1 and 0, 1 is transmitted twice as often as 0. If two


bits are transmitted in a sequence, what is the probability that both the bits will be 1?

Solution: Since 1 is transmitted twice as often as 0, P(1) = 2/3 and P(0) = 1/3. Assuming the two transmissions are independent, P(both bits are 1) = (2/3)² = 4/9.



We can similarly define the sample space S = S_1 × S_2 × ⋯ × S_n corresponding to n experiments and
the Cartesian product of events A_1 × A_2 × ⋯ × A_n.

If the experiments are independent, we can write

P(A_1 × A_2 × ⋯ × A_n) = P_1(A_1) P_2(A_2) ⋯ P_n(A_n),

where P_i is the probability defined on the events of S_i.

Bernoulli trial

Suppose in an experiment we are only concerned with whether a particular event A has occurred or not. We call this event the 'success', with probability p = P(A), and the complementary event A^c the 'failure', with probability q = 1 − p. Such a random experiment is called a Bernoulli trial.

Probability of

Success: P(A) = p

Failure: P(A^c) = q = 1 − p



Binomial Law

We are interested in finding the probability of k ‘successes' in n independent Bernoulli trials.

This probability is given by

P_n(k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, …, n.

Consider n independent repetitions of the Bernoulli trial. Let S be the sample space associated with

each trial, and suppose we are interested in a particular event A and its complement A^c such that P(A) = p and

P(A^c) = 1 − p. If A occurs in a trial, then we have a 'success'; otherwise, a 'failure'.

Thus the sample space corresponding to the n repeated trials is S^n = S × S × ⋯ × S (n times).

Any event in S^n is of the form A_1 × A_2 × ⋯ × A_n, where some of the A_i's are A and the remaining A_i's are A^c.

Using the property of independent experiments we have

P(A_1 × A_2 × ⋯ × A_n) = P(A_1) P(A_2) ⋯ P(A_n).

If k of the A_i's are A and the remaining n − k A_i's are A^c, then

P(A_1 × A_2 × ⋯ × A_n) = p^k (1 − p)^(n−k).

But there are C(n, k) such events in S^n with k A's and n − k A^c's.

For example, if n = 3 and k = 2, the possible events are A × A × A^c, A × A^c × A and A^c × A × A.



We also note that all these events are mutually exclusive.

Hence the probability of k successes in n independent repetitions of the Bernoulli trial is given by

P_n(k) = C(n, k) p^k (1 − p)^(n−k), k = 0, 1, …, n.

The above law is known as the binomial probability law.

A typical plot of P_n(k) vs. k for n = 20 and a particular value of p is shown in the figure.

Example 3 A fair die is rolled 6 times. What is the probability that a 4 appears exactly three times?

Solution:

We have n = 6 independent Bernoulli trials,

with success A = {a 4 appears}, p = P(A) = 1/6

and failure A' = {a 4 does not appear} with 1 - p = 5/6.

P(3 fours in 6 rolls) = 6C3 (1/6)^3 (5/6)^3 = 20 × (1/216) × (125/216) ≈ 0.0536
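The binomial law is easy to evaluate directly. The following Python sketch (the helper name `binomial_pmf` is our own) reproduces the result of Example 3:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 3: a fair die rolled 6 times; probability that a 4 appears thrice
print(round(binomial_pmf(3, 6, 1/6), 4))  # → 0.0536
```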

Example 4
A communication source emits binary symbols 1 and 0 with probability 0.6 and 0.4 respectively. What is the probability that there will be exactly five 1's in a message of 20 symbols?

Solution: Treating each symbol as an independent Bernoulli trial with p = 0.6,

P(five 1's in 20 symbols) = 20C5 (0.6)^5 (0.4)^15 ≈ 1.29 × 10^-3

Example 5 In a binary communication system, a bit error occurs with probability p. What is the probability of getting at least one error bit in a message of 8 bits?

Solution:

Here the 8 bits form 8 independent Bernoulli trials, so the number of error bits follows the binomial law. Hence

P(at least one error) = 1 - P(no error) = 1 - (1 - p)^8
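With the complement rule, the computation of Example 5 is one line. The bit-error probability used below, p = 0.01, is only an assumed illustrative value:

```python
p = 0.01   # assumed illustrative bit-error probability
n_bits = 8

# P(at least one error) = 1 - P(no error in all 8 bits)
p_at_least_one = 1 - (1 - p)**n_bits
print(round(p_at_least_one, 4))  # → 0.0773
```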

Approximations of the Binomial probabilities

Two approximations of the binomial probabilities are particularly important.

Case 1: Suppose n is very large and p is very small, with np = λ held constant. Then

P(k) ≈ e^-λ λ^k / k!, k = 0, 1, 2, …

This probability law is known as the Poisson probability law and is widely used in engineering and other fields. We shall discuss more about it in a later class.

Case 2: When n is sufficiently large and k lies in the neighbourhood of np, P(k) may be approximated as

P(k) ≈ (1 / √(2π np(1 - p))) e^-(k - np)² / (2np(1 - p))

The right hand side is an expression for the normal (Gaussian) probability law, to be discussed in a later class.
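The Poisson approximation of Case 1 can be checked numerically. For n = 1000 and p = 0.002 (so λ = np = 2; values chosen here only for illustration) the two laws agree to about three decimal places:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.002          # n large, p small, lam = n*p = 2 constant
for k in range(4):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, n * p)
    print(k, round(b, 4), round(q, 4))
```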

Example 6 Consider the problem in Example 2.



Here,

More Problems

Question 1: A die is rolled, find the probability that an even number is obtained.

Solution to Question 1:

 Let us first write the sample space S of the experiment.

S = {1,2,3,4,5,6}

 Let E be the event "an even number is obtained" and write it down.

E = {2,4,6}

 We now use the formula of the classical probability.

P(E) = n(E) / n(S) = 3 / 6 = 1 / 2

Question 2: Two coins are tossed, find the probability that two heads are obtained.

Note: Each coin has two possible outcomes H (heads) and T (Tails).

Solution to Question 2:

 The sample space S is given by.

S = {(H,T),(H,H),(T,H),(T,T)}

 Let E be the event "two heads are obtained".

E = {(H,H)}

 We use the formula of the classical probability.

P(E) = n(E) / n(S) = 1 / 4

Question 3: Which of these numbers cannot be a probability?

a) -0.00001



b) 0.5
c) 1.001
d) 0
e) 1
f) 20%

Solution to Question 3:

 A probability is always greater than or equal to 0 and less than or equal to 1, hence only a) and c)
above cannot represent probabilities: -0.00001 is less than 0 and 1.001 is greater than 1.

Question 4: Two dice are rolled, find the probability that the sum is

a) equal to 1

b) equal to 4

c) less than 13

Solution to Question 4:

 a) The sample space S of two dice is shown below.

S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }

 Let E be the event "sum equal to 1". There are no outcomes which correspond to a sum equal to 1,
hence

P(E) = n(E) / n(S) = 0 / 36 = 0

 b) Three possible outcomes give a sum equal to 4: E = {(1,3),(2,2),(3,1)}, hence.

P(E) = n(E) / n(S) = 3 / 36 = 1 / 12

 c) All possible outcomes, E = S, give a sum less than 13, hence.

P(E) = n(E) / n(S) = 36 / 36 = 1
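Answers like those of Question 4 can also be verified by brute-force enumeration of the sample space; a minimal Python sketch:

```python
from itertools import product

# All 36 equally likely outcomes of rolling two dice
sample_space = list(product(range(1, 7), repeat=2))

def prob_sum(target):
    """Classical probability that the two dice sum to `target`."""
    favorable = [o for o in sample_space if sum(o) == target]
    return len(favorable) / len(sample_space)

print(prob_sum(1))                                        # → 0.0 (impossible)
print(round(prob_sum(4), 4))                              # → 0.0833 (= 1/12)
print(sum(1 for o in sample_space if sum(o) < 13) / 36)   # → 1.0 (certain)
```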



Question 5: A die is rolled and a coin is tossed, find the probability that the die shows an odd number and
the coin shows a head.

Solution to Question 5:

 The sample space S of the experiment described in question 5 is as follows

S = { (1,H),(2,H),(3,H),(4,H),(5,H),(6,H)
(1,T),(2,T),(3,T),(4,T),(5,T),(6,T)}

 Let E be the event "the die shows an odd number and the coin shows a head". Event E may be
described as follows

E={(1,H),(3,H),(5,H)}

 The probability P(E) is given by

P(E) = n(E) / n(S) = 3 / 12 = 1 / 4

Question 6: A card is drawn at random from a deck of cards. Find the probability of getting the 3 of
diamond.

Solution to Question 6:

 The sample space S of the experiment in question 6 consists of all 52 cards of the deck.



 Let E be the event "getting the 3 of diamond". An examination of the sample space shows that there
is one "3 of diamond" so that n(E) = 1 and n(S) = 52. Hence the probability of event E occurring is
given by

P(E) = 1 / 52

Question 7: A card is drawn at random from a deck of cards. Find the probability of getting a queen.

Solution to Question 7:

 The sample space S of the experiment in question 7 is the same as in question 6: the 52 cards of the deck.

 Let E be the event "getting a Queen". An examination of the sample space shows that there are 4
"Queens" so that n(E) = 4 and n(S) = 52. Hence the probability of event E occurring is given by

P(E) = 4 / 52 = 1 / 13

Question 8: A jar contains 3 red marbles, 7 green marbles and 10 white marbles. If a marble is drawn from
the jar at random, what is the probability that this marble is white?

Solution to Question 8:

 We first construct a table of frequencies that gives the marble color distribution as follows

color frequency
red 3
green 7
white 10
total 20

 We now use the empirical formula of the probability

P(E) = Frequency for white color / Total frequencies in the above table = 10 / 20 = 1 / 2



Question 9: The blood groups of 200 people are distributed as follows: 50 have type A blood, 65 have type B, 70 have type O and 15 have type AB. If a person from this group is selected at random, what is the probability that this person has type O blood?

Solution to Question 9:

 We construct a table of frequencies for the blood groups as follows

group frequency
A 50
B 65
O 70
AB 15

 We use the empirical formula of the probability

P(E) = Frequency for O blood / Total frequencies = 70 / 200 = 0.35

Random Variable

In applications of probability, we are often concerned with numerical values which are random in
nature. For example, we may consider the number of customers arriving at a service station in a particular
interval of time, or the transmission time of a message in a communication system. These random
quantities may be considered as real-valued functions on the sample space. Such a real-valued function is
called a real random variable and plays an important role in describing random data. We shall introduce the
concept of random variables in the following sections.

Mathematical Preliminaries

Real-valued point function on a set


Recall that a real-valued function f : S → R maps each element s ∈ S to a unique element f(s) ∈ R. The set S is called the domain of f and the set f(S) = {f(s) : s ∈ S} is called the range of f. Clearly f(S) ⊆ R.

The range and domain of f are shown in the Figure below.

A random variable associates the points in the sample space with real numbers.

Consider the probability space (S, F, P) and a function X : S → R mapping the sample space S into the real line. Let us define the probability of a subset B ⊆ R by

P_X(B) = P(X^-1(B)) = P({s : X(s) ∈ B})

Such a definition will be valid if X^-1(B) is a valid event. If S is a discrete sample space, X^-1(B) is always a valid event, but the same may not be true if S is infinite. The concept of sigma algebra is again necessary to overcome this difficulty. We also need the Borel sigma algebra B, the sigma algebra defined on the real line.

The function X : S → R is called a random variable if the inverse image of every Borel set under X is an event. Thus, if X is a random variable, then

X^-1(B) = {s : X(s) ∈ B} ∈ F for every Borel set B.



Example 1 Consider the example of tossing a fair coin twice. The sample space is S = {HH, HT, TH, TT} and all four outcomes are equally likely. We can, for instance, define a random variable X as the number of heads obtained, so that

X(HH) = 2, X(HT) = X(TH) = 1, X(TT) = 0

Here the range of X is R_X = {0, 1, 2}.

Example 2 Consider the sample space associated with the single toss of a fair die. The sample space is

given by S = {1, 2, 3, 4, 5, 6}.

If we define the random variable X that associates a real number equal to the number on the face of

the die, then X(s) = s and R_X = {1, 2, 3, 4, 5, 6}.

The induced probability P_X satisfies the axioms of probability:

Axiom 1: P_X(B) = P(X^-1(B)) ≥ 0

Axiom 2: P_X(R) = P(X^-1(R)) = P(S) = 1

Axiom 3: Suppose B1, B2, … are disjoint Borel sets. Then X^-1(B1), X^-1(B2), … are disjoint events in F. Therefore,

P_X(∪i Bi) = P(X^-1(∪i Bi)) = P(∪i X^-1(Bi)) = Σi P(X^-1(Bi)) = Σi P_X(Bi)

Thus the random variable X induces a probability space (R, B, P_X).

Probability Distribution Function (CDF):

We have seen that the events {s : X(s) ∈ B} and B are equivalent and P_X(B) = P({s : X(s) ∈ B}). The underlying sample space is omitted in notation and we simply write {X ∈ B} and P(X ∈ B) instead of {s : X(s) ∈ B} and P({s : X(s) ∈ B}) respectively.

Consider the Borel set (-∞, x], where x represents any real number. The equivalent event X^-1((-∞, x]) = {s : X(s) ≤ x} is denoted as {X ≤ x}. The event {X ≤ x} can be taken as a representative event in studying the probability description of a random variable X. Any other event can be represented in terms of this event.

For example,

{X > x} = {X ≤ x}', {x1 < X ≤ x2} = {X ≤ x2} - {X ≤ x1}

and so on.

The probability P({X ≤ x}) = P(X ≤ x) is called the probability distribution function (also called the cumulative distribution function, abbreviated as CDF) of X and denoted by F_X(x). Thus

F_X(x) = P(X ≤ x)


Properties of the Distribution Function

1. 0 ≤ F_X(x) ≤ 1

This follows from the fact that F_X(x) is a probability and its value should lie between 0 and 1.

2. F_X(x) is a non-decreasing function of x. Thus, if x1 < x2, then F_X(x1) ≤ F_X(x2).

3. F_X(x) is right continuous: F_X(x+) = F_X(x).

4. F_X(-∞) = lim x→-∞ F_X(x) = 0

5. F_X(∞) = lim x→∞ F_X(x) = 1

6. P(x1 < X ≤ x2) = F_X(x2) - F_X(x1)

We have {X ≤ x2} = {X ≤ x1} ∪ {x1 < X ≤ x2}, and the two events on the right are disjoint.

7. P(X > x) = 1 - F_X(x)


Example 1: Consider the random variable defined by

Find a) .

b) .

c) .

d) .
Solution:



Discrete, Continuous and Mixed-type Random Variables

• X is called a discrete random variable if F_X(x) is piece-wise constant. Thus F_X(x) is flat except at the points of jump discontinuity. If the sample space S is discrete, the random variable X defined on it is always discrete.

• X is called a continuous random variable if F_X(x) is an absolutely continuous function of x. Thus F_X(x) is continuous everywhere on R, and its derivative F'_X(x) exists everywhere except possibly at a finite or countably infinite set of points.

• X is called a mixed random variable if F_X(x) has jump discontinuities at a countable number of points and increases continuously over at least one interval of x. For such a mixed RV X,

F_X(x) = p F_D(x) + (1 - p) F_C(x)

where F_D(x) is the distribution function of a discrete RV, F_C(x) is the distribution function of a continuous RV and 0 < p < 1.

Typical plots of F_X(x) for discrete, continuous and mixed random variables are shown in Figure 1, Figure 2 and Figure 3 respectively.

Plot of F_X(x) vs. x for a continuous random variable



Plot of F_X(x) vs. x for a mixed-type random variable

Discrete Random Variables and Probability Mass Functions

A random variable X is said to be discrete if the number of elements in the range R_X is finite or countably infinite.

First assume R_X to be finite. Let x1, x2, …, xN be the elements of R_X. Here the mapping X partitions S into the subsets {s : X(s) = xi}, i = 1, 2, …, N.

The discrete random variable in this case is completely specified by the probability mass function (pmf)

p_X(xi) = P({s : X(s) = xi}), i = 1, 2, …, N

Clearly,

p_X(xi) ≥ 0 and Σi p_X(xi) = 1

• Suppose B ⊆ R_X. Then

P(B) = Σ xi ∈ B p_X(xi)

The Figure below illustrates a discrete random variable.
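As an illustration, the pmf of X = number of heads in two tosses of a fair coin can be computed by enumerating the sample space:

```python
from itertools import product
from collections import Counter

# Sample space of two fair coin tosses; X(s) = number of heads in s
S = list(product("HT", repeat=2))
counts = Counter(s.count("H") for s in S)

# Each of the 4 outcomes is equally likely
pmf = {x: counts[x] / len(S) for x in sorted(counts)}
print(pmf)                # → {0: 0.25, 1: 0.5, 2: 0.25}
print(sum(pmf.values()))  # → 1.0
```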



Example 1 Consider the random variable with the distribution function

Continuous Random Variables and Probability Density Functions

For a continuous random variable X, F_X(x) is continuous everywhere. Therefore,

F_X(x) = F_X(x-)

This implies that for all x,

P(X = x) = F_X(x) - F_X(x-) = 0

Therefore, the probability mass function of a continuous RV X is zero for all x. A continuous random
variable cannot be characterized by a probability mass function. A continuous random variable has a very
important characterization in terms of a function called the probability density function.

If F_X(x) is differentiable, the probability density function (pdf) of X, denoted by f_X(x), is defined as

f_X(x) = d F_X(x) / dx

Interpretation of f_X(x):

f_X(x) Δx ≈ P(x < X ≤ x + Δx)

so that

P(x1 < X ≤ x2) = ∫ from x1 to x2 of f_X(x) dx

Thus the probability of X lying in some interval is determined by f_X(x). In that sense, f_X(x)
represents the concentration of probability just as the density represents the concentration of mass.



Properties of the Probability Density Function

 f_X(x) ≥ 0.

This follows from the fact that F_X(x) is a non-decreasing function of x.

 ∫ from -∞ to ∞ of f_X(x) dx = F_X(∞) - F_X(-∞) = 1.

 F_X(x) = ∫ from -∞ to x of f_X(u) du.
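These properties can be checked numerically for a specific density. The sketch below uses the exponential pdf f(x) = e^-x for x ≥ 0 (chosen only as an example) and a midpoint Riemann sum:

```python
from math import exp

def f(x):
    """Exponential pdf: f(x) = e^-x for x >= 0, else 0."""
    return exp(-x) if x >= 0 else 0.0

dx = 1e-4
# Total probability: integrate f over [0, 20] (the tail beyond 20 is ~2e-9)
total = sum(f((k + 0.5) * dx) * dx for k in range(200_000))
# CDF at x = 1: integrate f over [0, 1]
F_at_1 = sum(f((k + 0.5) * dx) * dx for k in range(10_000))

print(round(total, 4))   # → 1.0
print(round(F_at_1, 4))  # → 0.6321  (= 1 - e^-1)
```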

Example 1 Consider the random variable with the distribution function

The pdf of the RV is given by

Remark: Using the Dirac delta function we can define the density function for a discrete random variable.

Consider the random variable X defined by the probability mass function (pmf) p_X(xi) = P(X = xi), i = 1, 2, …, N.

The distribution function can be written as

F_X(x) = Σi p_X(xi) u(x - xi)

where u(x - xi) is the shifted unit-step function given by

u(x - xi) = 1 for x ≥ xi, and 0 otherwise

Then the density function can be written in terms of the Dirac delta function as

f_X(x) = Σi p_X(xi) δ(x - xi)

Example 2

Consider the random variable defined with the distribution function given by,

Probability Density Function of a Mixed Random Variable

Suppose X is a mixed random variable with F_X(x) having jump discontinuities at x = xi, i = 1, 2, …, N.

As already stated, the CDF of a mixed random variable is given by

F_X(x) = p F_D(x) + (1 - p) F_C(x)

where F_D(x) is the distribution function of a discrete RV and F_C(x) is the distribution function of a continuous RV. The corresponding pdf is given by

f_X(x) = p f_D(x) + (1 - p) f_C(x)

where

f_D(x) = Σi p_X(xi) δ(x - xi)

and f_C(x) is a continuous pdf. We can establish the above relations as follows.

Suppose S_D denotes the countable subset of points on R_X such that the random variable X is characterized by the probability mass function p_X(x). Similarly, let S_C be a continuous subset of points on R_X such that the RV is characterized by the probability density function f_C(x).

Clearly the subsets S_D and S_C partition the set R_X. If p = P(X ∈ S_D), then P(X ∈ S_C) = 1 - p.

Thus the probability of the event {X ≤ x} can be expressed as

F_X(x) = P(X ≤ x | X ∈ S_D) P(X ∈ S_D) + P(X ≤ x | X ∈ S_C) P(X ∈ S_C) = p F_D(x) + (1 - p) F_C(x)

Taking the derivative with respect to x, we get

f_X(x) = p f_D(x) + (1 - p) f_C(x)
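A concrete mixed random variable, assumed here purely for illustration: with probability p = 0.5 the value is exactly 0 (the discrete part), otherwise it is exponential with unit mean (the continuous part). Its CDF shows the jump of size p at 0:

```python
from math import exp

p = 0.5  # weight of the discrete part (illustrative choice)

def F_D(x):  # discrete part: unit step at x = 0
    return 1.0 if x >= 0 else 0.0

def F_C(x):  # continuous part: exponential CDF
    return 1 - exp(-x) if x >= 0 else 0.0

def F_X(x):  # mixture CDF: p*F_D + (1 - p)*F_C
    return p * F_D(x) + (1 - p) * F_C(x)

print(F_X(-1.0))           # → 0.0
print(F_X(0.0))            # → 0.5   (jump of size p at x = 0)
print(round(F_X(1.0), 4))  # → 0.8161
```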



Example 4 Consider the random variable with the distribution function

can be expressed as

Example 5

X is the random variable representing the life time of a device with the PDF for . Define the
following random variable

Find FY(y).



Solution:

Binomial Probability Distribution

To understand binomial distributions and binomial probability, it helps to understand binomial experiments
and some associated notation; so we cover those topics first.

Binomial Experiment

A binomial experiment is a statistical experiment that has the following properties:

 The experiment consists of n repeated trials.


 Each trial can result in just two possible outcomes. We call one of these outcomes a success and the
other, a failure.
 The probability of success, denoted by P, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the outcome on other
trials.

Consider the following statistical experiment. You flip a coin 2 times and count the number of times the
coin lands on heads. This is a binomial experiment because:

 The experiment consists of repeated trials. We flip a coin 2 times.


 Each trial can result in just two possible outcomes - heads or tails.
 The probability of success is constant - 0.5 on every trial.
 The trials are independent; that is, getting heads on one trial does not affect whether we get heads
on other trials.

Notation

The following notation is helpful, when we talk about binomial probability.

 x: The number of successes that result from the binomial experiment.


 n: The number of trials in the binomial experiment.
 P: The probability of success on an individual trial.
 Q: The probability of failure on an individual trial. (This is equal to 1 - P.)



 n!: The factorial of n (also known as n factorial).
 b(x; n, P): Binomial probability - the probability that an n-trial binomial experiment results in exactly
x successes, when the probability of success on an individual trial is P.
 nCr: The number of combinations of n things, taken r at a time.

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The
probability distribution of a binomial random variable is called a binomial distribution.

Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable
is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.

Number of heads Probability

0 0.25

1 0.50

2 0.25

The binomial distribution has the following properties:

 The mean of the distribution (μ_x) is equal to n * P.


 The variance (σ²_x) is n * P * (1 - P).
 The standard deviation (σ_x) is sqrt[ n * P * (1 - P) ].
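These moment formulas can be confirmed by summing over the distribution directly (Python sketch; n = 20 and P = 0.3 are arbitrary illustrative parameters):

```python
from math import comb

n, P = 20, 0.3  # arbitrary illustrative parameters
pmf = [comb(n, x) * P**x * (1 - P)**(n - x) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean)**2 * px for x, px in enumerate(pmf))

print(round(mean, 6))  # → 6.0  (= n * P)
print(round(var, 6))   # → 4.2  (= n * P * (1 - P))
```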

Binomial Formula and Binomial Probability

The binomial probability refers to the probability that a binomial experiment results in exactly x successes.
For example, in the above table, we see that the binomial probability of getting exactly one head in two
coin flips is 0.50.

Given x, n, and P, we can compute the binomial probability based on the binomial formula:

Binomial Formula. Suppose a binomial experiment consists of n trials and results in x successes. If the
probability of success on an individual trial is P, then the binomial probability is:

b(x; n, P) = nCx * P^x * (1 - P)^(n - x)


or
b(x; n, P) = { n! / [ x! (n - x)! ] } * P^x * (1 - P)^(n - x)



Example 1

Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Solution: This is a binomial experiment in which the number of trials is equal to 5, the number of successes
is equal to 2, and the probability of success on a single trial is 1/6 or about 0.167. Therefore, the binomial
probability is:

b(2; 5, 0.167) = 5C2 * (0.167)^2 * (0.833)^3


b(2; 5, 0.167) = 0.161
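Carrying the same computation with the exact single-trial probability p = 1/6 instead of the rounded 0.167 gives essentially the same answer:

```python
from math import comb

# Exact: 5C2 * (1/6)^2 * (5/6)^3
exact = comb(5, 2) * (1/6)**2 * (5/6)**3
print(round(exact, 3))  # → 0.161
```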

Cumulative Binomial Probability

A cumulative binomial probability refers to the probability that the binomial random variable falls within a
specified range (e.g., is greater than or equal to a stated lower limit and less than or equal to a stated upper
limit).

For example, we might be interested in the cumulative binomial probability of obtaining 45 or fewer heads
in 100 tosses of a coin (see Example 1 below). This would be the sum of all these individual binomial
probabilities.

b(x ≤ 45; 100, 0.5) =


b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + ... + b(x = 44; 100, 0.5) + b(x = 45; 100, 0.5)

Example 1

What is the probability of obtaining 45 or fewer heads in 100 tosses of a coin?

Solution: To solve this problem, we compute 46 individual probabilities, using the binomial formula. The
sum of all these probabilities is the answer we seek. Thus,

b(x ≤ 45; 100, 0.5) = b(x = 0; 100, 0.5) + b(x = 1; 100, 0.5) + . . . + b(x = 45; 100, 0.5)
b(x ≤ 45; 100, 0.5) = 0.184

Example 2

The probability that a student is accepted to a prestigious college is 0.3. If 5 students from the same school
apply, what is the probability that at most 2 are accepted?

Solution: To solve this problem, we compute 3 individual probabilities, using the binomial formula. The sum
of all these probabilities is the answer we seek. Thus,



b(x ≤ 2; 5, 0.3) = b(x = 0; 5, 0.3) + b(x = 1; 5, 0.3) + b(x = 2; 5, 0.3)
b(x ≤ 2; 5, 0.3) = 0.1681 + 0.3601 + 0.3087
b(x ≤ 2; 5, 0.3) = 0.8369
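The same cumulative sum in Python (the helper name `binom` is our own):

```python
from math import comb

def binom(x, n, P):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

# Example 2: at most 2 of 5 applicants accepted, P = 0.3
p_at_most_2 = sum(binom(x, 5, 0.3) for x in range(3))
print(round(p_at_most_2, 4))  # → 0.8369
```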

Example 3

What is the probability that the world series will last 4 games? 5 games? 6 games? 7 games? Assume that
the teams are evenly matched.

Solution: This is a very tricky application of the binomial distribution. If you can follow the logic of this
solution, you have a good understanding of the material covered in the tutorial, to this point.

In the world series, there are two baseball teams. The series ends when the winning team wins 4 games.
Therefore, we define a success as a win by the team that ultimately becomes the world series champion.

For the purpose of this analysis, we assume that the teams are evenly matched. Therefore, the probability
that a particular team wins a particular game is 0.5.

Let's look first at the simplest case. What is the probability that the series lasts only 4 games? This can occur
if one team wins the first 4 games. The probability of the National League team winning 4 games in a row
is:

b(4; 4, 0.5) = 4C4 * (0.5)^4 * (0.5)^0 = 0.0625

Similarly, when we compute the probability of the American League team winning 4 games in a row, we
find that it is also 0.0625. Therefore, probability that the series ends in four games would be 0.0625 +
0.0625 = 0.125; since the series would end if either the American or National League team won 4 games in
a row.

Now let's tackle the question of finding probability that the world series ends in 5 games. The trick in
finding this solution is to recognize that the series can only end in 5 games, if one team has won 3 out of
the first 4 games. So let's first find the probability that the American League team wins exactly 3 of the first
4 games.

b(3; 4, 0.5) = 4C3 * (0.5)^3 * (0.5)^1 = 0.25

Okay, here comes some more tricky stuff, so listen up. Given that the American League team has won 3 of
the first 4 games, the American League team has a 50/50 chance of winning the fifth game to end the
series. Therefore, the probability of the American League team winning the series in 5 games is 0.25 * 0.50
= 0.125. Since the National League team could also win the series in 5 games, the probability that the series
ends in 5 games would be 0.125 + 0.125 = 0.25.

The rest of the problem would be solved in the same way. You should find that the probability of the series
ending in 6 games is 0.3125; and the probability of the series ending in 7 games is also 0.3125.
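The whole series-length argument compresses to one formula: the eventual champion must win 3 of the first g - 1 games and then win game g, and the factor 2 covers either team being the champion. A sketch:

```python
from math import comb

def p_series_length(g):
    """P(series ends in exactly g games), evenly matched teams."""
    return 2 * comb(g - 1, 3) * 0.5**g

for g in (4, 5, 6, 7):
    print(g, p_series_length(g))  # → 0.125, 0.25, 0.3125, 0.3125

print(sum(p_series_length(g) for g in (4, 5, 6, 7)))  # → 1.0
```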
Negative Binomial Distribution

In this lesson, we cover the negative binomial distribution and the geometric distribution. As we will see,
the geometric distribution is a special case of the negative binomial distribution.

Negative Binomial Experiment

A negative binomial experiment is a statistical experiment that has the following properties:

 The experiment consists of x repeated trials.


 Each trial can result in just two possible outcomes. We call one of these outcomes a success and the
other, a failure.
 The probability of success, denoted by P, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the outcome on other
trials.
 The experiment continues until r successes are observed, where r is specified in advance.

Consider the following statistical experiment. You flip a coin repeatedly and count the number of times the
coin lands on heads. You continue flipping the coin until it has landed 5 times on heads. This is a negative
binomial experiment because:

 The experiment consists of repeated trials. We flip a coin repeatedly until it has landed 5 times on
heads.
 Each trial can result in just two possible outcomes - heads or tails.
 The probability of success is constant - 0.5 on every trial.
 The trials are independent; that is, getting heads on one trial does not affect whether we get heads
on other trials.
 The experiment continues until a fixed number of successes have occurred; in this case, 5 heads.

Notation

The following notation is helpful, when we talk about negative binomial probability.

 x: The number of trials required to produce r successes in a negative binomial experiment.


 r: The number of successes in the negative binomial experiment.
 P: The probability of success on an individual trial.
 Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
 b*(x; r, P): Negative binomial probability - the probability that an x-trial negative binomial
experiment results in the rth success on the xth trial, when the probability of success on an
individual trial is P.
 nCr: The number of combinations of n things, taken r at a time.



Negative Binomial Distribution

A negative binomial random variable is the number X of repeated trials to produce r successes in a
negative binomial experiment. The probability distribution of a negative binomial random variable is called
a negative binomial distribution. The negative binomial distribution is also known as the Pascal
distribution.

Suppose we flip a coin repeatedly and count the number of heads (successes). If we continue flipping the
coin until it has landed 2 times on heads, we are conducting a negative binomial experiment. The negative
binomial random variable is the number of coin flips required to achieve 2 heads. In this example, the
number of coin flips is a random variable that can take on any integer value between 2 and plus infinity.
The negative binomial probability distribution for this example is presented below.

Number of coin flips Probability

2 0.25

3 0.25

4 0.1875

5 0.125

6 0.078125

7 or more 0.109375

Negative Binomial Probability

The negative binomial probability refers to the probability that a negative binomial experiment results in r
- 1 successes after trial x - 1 and r successes after trial x. For example, in the above table, we see that the
negative binomial probability of getting the second head on the sixth flip of the coin is 0.078125.

Given x, r, and P, we can compute the negative binomial probability based on the following formula:

Negative Binomial Formula. Suppose a negative binomial experiment consists of x trials and results in r
successes. If the probability of success on an individual trial is P, then the negative binomial probability is:

b*(x; r, P) = (x-1)C(r-1) * P^r * (1 - P)^(x - r)



The Mean of the Negative Binomial Distribution

If we define the mean of the negative binomial distribution as the average number of trials required to
produce r successes, then the mean is equal to:

μ=r/P

where μ is the mean number of trials, r is the number of successes, and P is the probability of a success on
any given trial.
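The mean μ = r / P can be verified by summing x · b*(x; r, P) over x; for the illustrative values below the tail beyond x = 200 is negligible:

```python
from math import comb

def neg_binom(x, r, P):
    """b*(x; r, P): probability the r-th success occurs on trial x."""
    return comb(x - 1, r - 1) * P**r * (1 - P)**(x - r)

r, P = 3, 0.5  # illustrative values
mean = sum(x * neg_binom(x, r, P) for x in range(r, 200))
print(round(mean, 6))  # → 6.0  (= r / P)
```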

Alternative Views of the Negative Binomial Distribution

As if statistics weren't challenging enough, the above definition is not the only definition for the negative
binomial distribution. Two common alternative definitions are:

 The negative binomial random variable is R, the number of successes before the binomial
experiment results in k failures. The mean of R is:

μ_R = kP/Q

 The negative binomial random variable is K, the number of failures before the binomial experiment
results in r successes. The mean of K is:

μ_K = rQ/P

The moral: If someone talks about a negative binomial distribution, find out how they are defining the
negative binomial random variable.

On this web site, when we refer to the negative binomial distribution, we are talking about the definition
presented earlier. That is, we are defining the negative binomial random variable as X, the total number of
trials required for the binomial experiment to produce r successes.

Geometric Distribution

The geometric distribution is a special case of the negative binomial distribution. It deals with the number
of trials required for a single success. Thus, the geometric distribution is negative binomial distribution
where the number of successes (r) is equal to 1.

An example of a geometric distribution would be tossing a coin until it lands on heads. We might ask: What
is the probability that the first head occurs on the third flip? That probability is referred to as a geometric
probability and is denoted by g(x; P). The formula for geometric probability is given below.



Geometric Probability Formula. Suppose a negative binomial experiment consists of x trials and results in
one success. If the probability of success on an individual trial is P, then the geometric probability is:

g(x; P) = P * Q^(x - 1)

Sample Problems

The problems below show how to apply your new-found knowledge of the negative binomial distribution
(see Example 1) and the geometric distribution (see Example 2).

Example 1

Bob is a high school basketball player. He is a 70% free throw shooter. That means his probability of making
a free throw is 0.70. During the season, what is the probability that Bob makes his third free throw on his
fifth shot?

Solution: This is an example of a negative binomial experiment. The probability of success (P) is 0.70, the
number of trials (x) is 5, and the number of successes (r) is 3.

To solve this problem, we enter these values into the negative binomial formula.

b*(x; r, P) = (x-1)C(r-1) * P^r * Q^(x - r)
b*(5; 3, 0.7) = 4C2 * 0.7^3 * 0.3^2
b*(5; 3, 0.7) = 6 * 0.343 * 0.09 = 0.18522

Thus, the probability that Bob will make his third successful free throw on his fifth shot is 0.18522.
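The same computation in Python:

```python
from math import comb

# Third success (r = 3) on the fifth trial (x = 5), P = 0.7
p = comb(5 - 1, 3 - 1) * 0.7**3 * 0.3**2
print(round(p, 5))  # → 0.18522
```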

Example 2

Let's reconsider the above problem from Example 1. This time, we'll ask a slightly different question: What
is the probability that Bob makes his first free throw on his fifth shot?

Solution: This is an example of a geometric distribution, which is a special case of a negative binomial
distribution. Therefore, this problem can be solved using the negative binomial formula or the geometric
formula. We demonstrate each approach below, beginning with the negative binomial formula.

The probability of success (P) is 0.70, the number of trials (x) is 5, and the number of successes (r) is 1. We
enter these values into the negative binomial formula.

b*(x; r, P) = (x-1)C(r-1) * P^r * Q^(x - r)
b*(5; 1, 0.7) = 4C0 * 0.7^1 * 0.3^4
b*(5; 1, 0.7) = 0.00567



Now, we demonstrate a solution based on the geometric formula.

g(x; P) = P * Q^(x - 1)
g(5; 0.7) = 0.7 * 0.3^4 = 0.00567

Notice that each approach yields the same answer.
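Indeed, for r = 1 the negative binomial formula reduces to the geometric formula term by term, since (x-1)C0 = 1:

```python
from math import comb

P, Q, x = 0.7, 0.3, 5

geometric = P * Q**(x - 1)
neg_binomial = comb(x - 1, 0) * P**1 * Q**(x - 1)

print(round(geometric, 5))        # → 0.00567
print(geometric == neg_binomial)  # → True
```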

Hypergeometric Distribution

The probability distribution of a hypergeometric random variable is called a hypergeometric distribution.


This lesson describes how hypergeometric random variables, hypergeometric experiments, hypergeometric
probability, and the hypergeometric distribution are all related.

Notation

The following notation is helpful, when we talk about hypergeometric distributions and hypergeometric
probability.

 N: The number of items in the population.


 k: The number of items in the population that are classified as successes.
 n: The number of items in the sample.
 x: The number of items in the sample that are classified as successes.
 kCx: The number of combinations of k things, taken x at a time.
 h(x; N, n, k): hypergeometric probability - the probability that an n-trial hypergeometric experiment
results in exactly x successes, when the population consists of N items, k of which are classified as
successes.

Hypergeometric Experiments

A hypergeometric experiment is a statistical experiment that has the following properties:

 A sample of size n is randomly selected without replacement from a population of N items.


 In the population, k items can be classified as successes, and N - k items can be classified as failures.

Consider the following statistical experiment. You have an urn of 10 marbles - 5 red and 5 green. You
randomly select 2 marbles without replacement and count the number of red marbles you have selected.
This would be a hypergeometric experiment.

Note that it would not be a binomial experiment. A binomial experiment requires that the probability of
success be constant on every trial. With the above experiment, the probability of a success changes on
every trial. In the beginning, the probability of selecting a red marble is 5/10. If you select a red marble on
the first trial, the probability of selecting a red marble on the second trial is 4/9. And if you select a green
marble on the first trial, the probability of selecting a red marble on the second trial is 5/9.



Note further that if you selected the marbles with replacement, the probability of success would not
change. It would be 5/10 on every trial. Then, this would be a binomial experiment.

Hypergeometric Distribution

A hypergeometric random variable is the number of successes that result from a hypergeometric
experiment. The probability distribution of a hypergeometric random variable is called a hypergeometric
distribution.

Given x, N, n, and k, we can compute the hypergeometric probability based on the following formula:

Hypergeometric Formula. Suppose a population consists of N items, k of which are successes. And a
random sample drawn from that population consists of n items, x of which are successes. Then the
hypergeometric probability is:

h(x; N, n, k) = [ kCx ] [ N-kCn-x ] / [ NCn ]

The hypergeometric distribution has the following properties:

 The mean of the distribution is equal to n * k / N .


 The variance is n * k * ( N - k ) * ( N - n ) / [ N^2 * ( N - 1 ) ] .

Example 1

Suppose we randomly select 5 cards without replacement from an ordinary deck of playing cards. What is
the probability of getting exactly 2 red cards (i.e., hearts or diamonds)?

Solution: This is a hypergeometric experiment in which we know the following:

 N = 52; since there are 52 cards in a deck.


 k = 26; since there are 26 red cards in a deck.
 n = 5; since we randomly select 5 cards from the deck.
 x = 2; since 2 of the cards we select are red.

We plug these values into the hypergeometric formula as follows:

h(x; N, n, k) = [ C(k,x) * C(N-k,n-x) ] / C(N,n)


h(2; 52, 5, 26) = [ C(26,2) * C(26,3) ] / C(52,5)
h(2; 52, 5, 26) = [ 325 * 2600 ] / 2,598,960 = 0.32513

Thus, the probability of randomly selecting 2 red cards is 0.32513.
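As a quick check, the formula above can be evaluated directly in Python using only the standard library (a sketch; the helper name `hypergeom_pmf` is illustrative, not an established API):

```python
from math import comb  # comb(a, b) = C(a, b), combinations of a items taken b at a time

def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): probability of exactly x successes in a sample of
    size n drawn without replacement from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Example 1: exactly 2 red cards among 5 drawn from a 52-card deck
p = hypergeom_pmf(2, N=52, n=5, k=26)
print(round(p, 5))  # 0.32513
```

Summing the probabilities over all possible x (0 through 5 here) gives 1, as it must for any probability distribution.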

Cumulative Hypergeometric Probability



A cumulative hypergeometric probability refers to the probability that the hypergeometric random
variable is greater than or equal to some specified lower limit and less than or equal to some specified
upper limit.

For example, suppose we randomly select five cards from an ordinary deck of playing cards. We might be
interested in the cumulative hypergeometric probability of obtaining 2 or fewer hearts. This would be the
probability of obtaining 0 hearts plus the probability of obtaining 1 heart plus the probability of obtaining 2
hearts, as shown in the example below.

Example 1

Suppose we select 5 cards from an ordinary deck of playing cards. What is the probability of obtaining 2 or
fewer hearts?

Solution: This is a hypergeometric experiment in which we know the following:

 N = 52; since there are 52 cards in a deck.


 k = 13; since there are 13 hearts in a deck.
 n = 5; since we randomly select 5 cards from the deck.
 x = 0 to 2; since our selection includes 0, 1, or 2 hearts.

We plug these values into the hypergeometric formula as follows:

h(x ≤ 2; N, n, k) = h(x ≤ 2; 52, 5, 13)


h(x ≤ 2; 52, 5, 13) = h(x = 0; 52, 5, 13) + h(x = 1; 52, 5, 13) + h(x = 2; 52, 5, 13)
h(x ≤ 2; 52, 5, 13) = [ C(13,0) C(39,5) / C(52,5) ] + [ C(13,1) C(39,4) / C(52,5) ] + [ C(13,2) C(39,3) / C(52,5) ]
h(x ≤ 2; 52, 5, 13) = [ (1)(575,757)/(2,598,960) ] + [ (13)(82,251)/(2,598,960) ] + [ (78)(9,139)/(2,598,960) ]
h(x ≤ 2; 52, 5, 13) = [ 0.2215 ] + [ 0.4114 ] + [ 0.2743 ]
h(x ≤ 2; 52, 5, 13) = 0.9072

Thus, the probability of randomly selecting at most 2 hearts is 0.9072.
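The cumulative probability is just the sum of the individual terms, which can be sketched as follows (standard library only; `hypergeom_pmf` is an illustrative helper name):

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    # h(x; N, n, k) = C(k, x) * C(N-k, n-x) / C(N, n)
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# P(X <= 2): at most 2 hearts (k = 13) in a 5-card hand
p_le_2 = sum(hypergeom_pmf(x, 52, 5, 13) for x in range(3))
print(round(p_le_2, 4))  # 0.9072
```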

Poisson Distribution

A Poisson distribution is the probability distribution that results from a Poisson experiment.

Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

 The experiment results in outcomes that can be classified as successes or failures.


 The average number of successes (μ) that occurs in a specified region is known.
 The probability that a success will occur is proportional to the size of the region.
 The probability that a success will occur in an extremely small region is virtually zero.



Note that the specified region could take many forms. For instance, it could be a length, an area, a volume,
a period of time, etc.

Notation

The following notation is helpful, when we talk about the Poisson distribution.

 e: A constant equal to approximately 2.71828. (Actually, e is the base of the natural logarithm
system.)
 μ: The mean number of successes that occur in a specified region.
 x: The actual number of successes that occur in a specified region.
 P(x; μ): The Poisson probability that exactly x successes occur in a Poisson experiment, when the
mean number of successes is μ.

Poisson Distribution

A Poisson random variable is the number of successes that result from a Poisson experiment. The
probability distribution of a Poisson random variable is called a Poisson distribution.

Given the mean number of successes (μ) that occur in a specified region, we can compute the Poisson
probability based on the following formula:

Poisson Formula. Suppose we conduct a Poisson experiment, in which the average number of successes
within a given region is μ. Then, the Poisson probability is:

P(x; μ) = (e^-μ) (μ^x) / x!

where x is the actual number of successes that result from the experiment, and e is approximately equal to
2.71828.

The Poisson distribution has the following properties:

 The mean of the distribution is equal to μ .


 The variance is also equal to μ .

Example 1

The average number of homes sold by the Acme Realty company is 2 homes per day. What is the
probability that exactly 3 homes will be sold tomorrow?

Solution: This is a Poisson experiment in which we know the following:

 μ = 2; since 2 homes are sold per day, on average.


 x = 3; since we want to find the likelihood that 3 homes will be sold tomorrow.
 e = 2.71828; since e is a constant equal to approximately 2.71828.



We plug these values into the Poisson formula as follows:

P(x; μ) = (e^-μ) (μ^x) / x!


P(3; 2) = (2.71828^-2) (2^3) / 3!
P(3; 2) = (0.13534) (8) / 6
P(3; 2) = 0.180

Thus, the probability of selling 3 homes tomorrow is 0.180.
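The Poisson formula translates directly into code (a sketch using only the standard library; `poisson_pmf` is an illustrative name):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    # P(x; mu) = e^(-mu) * mu^x / x!
    return exp(-mu) * mu ** x / factorial(x)

# Example 1: exactly 3 homes sold when the daily mean is 2
p = poisson_pmf(3, 2)
print(round(p, 3))  # 0.18
```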

Cumulative Poisson Probability

A cumulative Poisson probability refers to the probability that the Poisson random variable is greater than or
equal to some specified lower limit and less than or equal to some specified upper limit.

Example 1

Suppose the average number of lions seen on a 1-day safari is 5. What is the probability that tourists will
see fewer than four lions on the next 1-day safari?

Solution: This is a Poisson experiment in which we know the following:

 μ = 5; since 5 lions are seen per safari, on average.


 x = 0, 1, 2, or 3; since we want to find the likelihood that tourists will see fewer than 4 lions; that is,
we want the probability that they will see 0, 1, 2, or 3 lions.
 e = 2.71828; since e is a constant equal to approximately 2.71828.

To solve this problem, we need to find the probability that tourists will see 0, 1, 2, or 3 lions. Thus, we need
to calculate the sum of four probabilities: P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5). To compute this sum, we use
the Poisson formula:

P(x ≤ 3; 5) = P(0; 5) + P(1; 5) + P(2; 5) + P(3; 5)


P(x ≤ 3; 5) = [ (e^-5)(5^0) / 0! ] + [ (e^-5)(5^1) / 1! ] + [ (e^-5)(5^2) / 2! ] + [ (e^-5)(5^3) / 3! ]
P(x ≤ 3; 5) = [ (0.006738)(1) / 1 ] + [ (0.006738)(5) / 1 ] + [ (0.006738)(25) / 2 ] + [ (0.006738)(125) / 6 ]
P(x ≤ 3; 5) = [ 0.0067 ] + [ 0.03369 ] + [ 0.084224 ] + [ 0.140375 ]
P(x ≤ 3; 5) = 0.2650

Thus, the probability of seeing no more than 3 lions is 0.2650.
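As with the hypergeometric case, the cumulative Poisson probability is a plain sum of the individual terms (sketch; helper name is illustrative):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    return exp(-mu) * mu ** x / factorial(x)

# P(X <= 3): fewer than 4 lions when the mean is 5 per safari
p_le_3 = sum(poisson_pmf(x, 5) for x in range(4))
print(round(p_le_3, 4))  # 0.265
```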

What is the Normal Distribution?

The normal distribution refers to a family of continuous probability distributions described by the normal
equation.



The Normal Equation

The normal distribution is defined by the following equation:

Normal equation. The probability density function of the normal distribution is:

f(x) = { 1/[ σ * sqrt(2π) ] } * e^[ -(x - μ)^2 / (2σ^2) ]

where X is a normal random variable, μ is the mean, σ is the standard deviation, π is approximately
3.14159, and e is approximately 2.71828.

The random variable X in the normal equation is called the normal random variable. The normal equation
is the probability density function for the normal distribution.
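The density can be coded directly. As a sanity check, a crude Riemann sum over μ ± 6σ recovers a total area of approximately 1, as required of any probability density function (a sketch under assumed example values of μ and σ):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * e^(-(x - mu)^2 / (2*sigma^2))
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Crude Riemann sum over mu +/- 6 sigma: total area under the curve should be ~1
mu, sigma, dx = 300.0, 50.0, 0.01
area = sum(normal_pdf(mu - 6 * sigma + i * dx, mu, sigma) * dx
           for i in range(int(12 * sigma / dx)))
print(round(area, 4))  # 1.0
```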

The Normal Curve

The graph of the normal distribution depends on two factors - the mean and the standard deviation. The
mean of the distribution determines the location of the center of the graph, and the standard deviation
determines the height and width of the graph. When the standard deviation is large, the curve is short and
wide; when the standard deviation is small, the curve is tall and narrow. All normal distributions look like a
symmetric, bell-shaped curve, as shown below.

The curve on the left is shorter and wider than the curve on the right, because the curve on the left has a
bigger standard deviation.

Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several implications for probability.

 The total area under the normal curve is equal to 1.


 The probability that a normal random variable X equals any particular value is 0.
 The probability that X is greater than a equals the area under the normal curve bounded by a and
plus infinity (as indicated by the non-shaded area in the figure below).
 The probability that X is less than a equals the area under the normal curve bounded by a and
minus infinity (as indicated by the shaded area in the figure below).



Additionally, every normal curve (regardless of its mean or standard deviation) conforms to the following
"rule".

 About 68% of the area under the curve falls within 1 standard deviation of the mean.
 About 95% of the area under the curve falls within 2 standard deviations of the mean.
 About 99.7% of the area under the curve falls within 3 standard deviations of the mean.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal
distribution, most outcomes will be within 3 standard deviations of the mean.

To find the probability associated with a normal random variable, use a graphing calculator, an online
normal distribution calculator, or a normal distribution table. The examples below use a normal distribution
calculator; the use of normal distribution tables is demonstrated afterwards.

Example 1

An average light bulb manufactured by the Acme Corporation lasts 300 days with a standard deviation of
50 days. Assuming that bulb life is normally distributed, what is the probability that an Acme light bulb will
last at most 365 days?

Solution: Given a mean score of 300 days and a standard deviation of 50 days, we want to find the
cumulative probability that bulb life is less than or equal to 365 days. Thus, we know the following:

 The value of the normal random variable is 365 days.


 The mean is equal to 300 days.
 The standard deviation is equal to 50 days.

We enter these values into the Normal Distribution Calculator and compute the cumulative probability. The
answer is: P( X < 365) = 0.90. Hence, there is a 90% chance that a light bulb will burn out within 365 days.
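The same cumulative probability can be computed without a calculator tool, because the normal CDF can be written in terms of the error function `math.erf` (a sketch; `normal_cdf` is an illustrative name):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ Normal(mu, sigma), via the error function
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Acme bulbs: mean life 300 days, standard deviation 50 days
p = normal_cdf(365, mu=300, sigma=50)
print(round(p, 2))  # 0.9
```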

Example 2

Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard
deviation of 10, what is the probability that a person who takes the test will score between 90 and 110?



Solution: Here, we want to know the probability that the test score falls between 90 and 110. The "trick" to
solving this problem is to realize the following:

P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )

We use the Normal Distribution Calculator to compute both probabilities on the right side of the above
equation.

 To compute P( X < 110 ), we enter the following inputs into the calculator: The value of the normal
random variable is 110, the mean is 100, and the standard deviation is 10. We find that P( X < 110 )
is 0.84.
 To compute P( X < 90 ), we enter the following inputs into the calculator: The value of the normal
random variable is 90, the mean is 100, and the standard deviation is 10. We find that P( X < 90 ) is
0.16.

We use these findings to compute our final answer as follows:

P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 )


P( 90 < X < 110 ) = 0.84 - 0.16
P( 90 < X < 110 ) = 0.68

Thus, about 68% of the test scores will fall between 90 and 110.
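The "trick" above, P(90 < X < 110) = P(X < 110) - P(X < 90), becomes a one-liner once a normal CDF is available (again via `math.erf`; the function name is illustrative):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ Normal(mu, sigma)
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# P(90 < X < 110) for IQ scores with mean 100 and standard deviation 10
p = normal_cdf(110, 100, 10) - normal_cdf(90, 100, 10)
print(round(p, 2))  # 0.68
```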

Standard Normal Distribution

The standard normal distribution is a special case of the normal distribution. It is the distribution that
occurs when a normal random variable has a mean of zero and a standard deviation of one.

Standard Score (aka, z Score)

The normal random variable of a standard normal distribution is called a standard score or a z-score. Every
normal random variable X can be transformed into a z score via the following equation:

z = (X - μ) / σ

where X is a normal random variable, μ is the mean of X, and σ is the standard deviation of X.

Standard Normal Distribution Table

A standard normal distribution table shows a cumulative probability associated with a particular z-score.
Table rows show the whole number and tenths place of the z-score. Table columns show the hundredths
place. The cumulative probability (often from minus infinity to the z-score) appears in the cell of the table.

For example, a section of the standard normal table is reproduced below. To find the cumulative
probability of a z-score equal to -1.31, cross-reference the row of the table containing -1.3 with the column



containing 0.01. The table shows that the probability that a standard normal random variable will be less
than -1.31 is 0.0951; that is, P(Z < -1.31) = 0.0951.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

... ... ... ... ... ... ... ... ... ... ...

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0722 0.0708 0.0694 0.0681

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

... ... ... ... ... ... ... ... ... ... ...

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990

Of course, you may not be interested in the probability that a standard normal random variable falls
between minus infinity and a given value. You may want to know the probability that it lies between a
given value and plus infinity. Or you may want to know the probability that a standard normal random
variable lies between two given values. These probabilities are easy to compute from a normal distribution
table. Here's how.

 Find P(Z > a). The probability that a standard normal random variable (z) is greater than a given
value (a) is easy to find. The table shows the P(Z < a). The P(Z > a) = 1 - P(Z < a).

Suppose, for example, that we want to know the probability that a z-score will be greater than 3.00.
From the table (see above), we find that P(Z < 3.00) = 0.9987. Therefore, P(Z > 3.00) = 1 - P(Z < 3.00)
= 1 - 0.9987 = 0.0013.

 Find P(a < Z < b). The probability that a standard normal random variable lies between two values is
also easy to find. The P(a < Z < b) = P(Z < b) - P(Z < a).

For example, suppose we want to know the probability that a z-score will be greater than -1.40 and
less than -1.20. From the table (see above), we find that P(Z < -1.20) = 0.1151; and P(Z < -1.40) =
0.0808. Therefore, P(-1.40 < Z < -1.20) = P(Z < -1.20) - P(Z < -1.40) = 0.1151 - 0.0808 = 0.0343.

In school or on the Advanced Placement Statistics Exam, you may be called upon to use or interpret
standard normal distribution tables. Standard normal tables are commonly found in appendices of most
statistics texts.
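The table lookups above can also be reproduced programmatically; a short sketch of the standard normal CDF via `math.erf` matches the tabulated values:

```python
from math import erf, sqrt

def phi(z):
    # Cumulative probability P(Z <= z) for the standard normal distribution
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(round(phi(-1.31), 4))               # 0.0951 -> table entry for z = -1.31
print(round(1.0 - phi(3.00), 4))          # 0.0013 -> P(Z > 3.00)
print(round(phi(-1.20) - phi(-1.40), 4))  # 0.0343 -> P(-1.40 < Z < -1.20)
```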



The Normal Distribution as a Model for Measurements

Often, phenomena in the real world follow a normal (or near-normal) distribution. This allows researchers
to use the normal distribution as a model for assessing probabilities associated with real-world
phenomena. Typically, the analysis involves two steps.

 Transform raw data. Usually, the raw data are not in the form of z-scores. They need to be
transformed into z-scores, using the transformation equation presented earlier: z = (X - μ) / σ.

 Find probability. Once the data have been transformed into z-scores, you can use standard normal
distribution tables, online calculators (e.g., Stat Trek's free normal distribution calculator), or handheld
graphing calculators to find probabilities associated with the z-scores.

The problem in the next section demonstrates the use of the normal distribution as a model for
measurement.

Chi-Square Distribution

The distribution of the chi-square statistic is called the chi-square distribution. In this lesson, we learn to
compute the chi-square statistic and find the probability associated with the statistic. Chi-square examples
illustrate key points.

The Chi-Square Statistic

Suppose we conduct the following statistical experiment. We select a random sample of size n from a
normal population, having a standard deviation equal to σ. We find that the standard deviation in our
sample is equal to s. Given these data, we can define a statistic, called chi-square, using the following
equation:

Χ² = [ ( n - 1 ) * s² ] / σ²

If we repeated this experiment an infinite number of times, we could obtain a sampling distribution for the
chi-square statistic. The chi-square distribution is defined by the following probability density function:

Y = Y0 * ( Χ² )^( v/2 - 1 ) * e^( -Χ² / 2 )

where Y0 is a constant that depends on the number of degrees of freedom, Χ² is the chi-square statistic, v =
n - 1 is the number of degrees of freedom, and e is a constant equal to the base of the natural logarithm
system (approximately 2.71828). Y0 is defined so that the area under the chi-square curve is equal to one.

In the figure below, the red curve shows the distribution of chi-square values computed from all possible
samples of size 3, where degrees of freedom is n - 1 = 3 - 1 = 2. Similarly, the green curve shows the
distribution for samples of size 5 (degrees of freedom equal to 4); and the blue curve, for samples of size 11
(degrees of freedom equal to 10).



The chi-square distribution has the following properties:

 The mean of the distribution is equal to the number of degrees of freedom: μ = v.


 The variance is equal to two times the number of degrees of freedom: σ² = 2 * v
 When the degrees of freedom are greater than or equal to 2, the maximum value for Y occurs when
Χ² = v - 2.
 As the degrees of freedom increase, the chi-square curve approaches a normal distribution.

Cumulative Probability and the Chi-Square Distribution

The chi-square distribution is constructed so that the total area under the curve is equal to 1. The area
under the curve between 0 and a particular chi-square value is a cumulative probability associated with
that chi-square value. For example, in the figure below, the shaded area represents a cumulative
probability associated with a chi-square statistic equal to A; that is, it is the probability that the value of a
chi-square statistic will fall between 0 and A.

Fortunately, we don't have to compute the area under the curve to find the probability. The easiest way to
find the cumulative probability associated with a particular chi-square statistic is to use a chi-square
distribution calculator.

Chi-Square Distribution Calculator

The Chi-Square Distribution Calculator solves common statistics problems based on the chi-square
distribution. The calculator computes cumulative probabilities from simple inputs.


Test Your Understanding: Chi-Square Examples

Problem 1

The Acme Battery Company has developed a new cell phone battery. On average, the battery lasts 60
minutes on a single charge. The standard deviation is 4 minutes.

Suppose the manufacturing department runs a quality control test. They randomly select 7 batteries. The
standard deviation of the selected batteries is 6 minutes. What would be the chi-square statistic
represented by this test?

Solution

We know the following:

 The standard deviation of the population is 4 minutes.


 The standard deviation of the sample is 6 minutes.
 The number of sample observations is 7.

To compute the chi-square statistic, we plug these data in the chi-square equation, as shown below.

Χ² = [ ( n - 1 ) * s² ] / σ²
Χ² = [ ( 7 - 1 ) * 6² ] / 4² = 13.5

where Χ² is the chi-square statistic, n is the sample size, s is the standard deviation of the sample, and σ is
the standard deviation of the population.
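The computation in Problem 1 is a single expression (sketch; the function name is illustrative):

```python
def chi_square_stat(n, s, sigma):
    # Chi-square statistic: (n - 1) * s^2 / sigma^2
    return (n - 1) * s ** 2 / sigma ** 2

# Problem 1: n = 7 batteries, sample sd 6 minutes, population sd 4 minutes
x2 = chi_square_stat(n=7, s=6, sigma=4)
print(x2)  # 13.5
```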

Problem 2

Let's revisit the problem presented above. The manufacturing department ran a quality control test, using 7
randomly selected batteries. In their test, the standard deviation was 6 minutes, which equated to a chi-
square statistic of 13.5.

Suppose they repeated the test with a new random sample of 7 batteries. What is the probability that the
standard deviation in the new test would be greater than 6 minutes?



Solution

We know the following:

 The sample size n is equal to 7.


 The degrees of freedom are equal to n - 1 = 7 - 1 = 6.
 The chi-square statistic is equal to 13.5 (see Problem 1 above).

Given the degrees of freedom, we can determine the cumulative probability that the chi-square statistic
will fall between 0 and any positive value. To find the cumulative probability that a chi-square statistic falls
between 0 and 13.5, we enter the degrees of freedom (6) and the chi-square statistic (13.5) into the Chi-
Square Distribution Calculator. The calculator displays the cumulative probability: 0.96.

This tells us that the probability that a standard deviation would be less than or equal to 6 minutes is 0.96.
This means (by the subtraction rule) that the probability that the standard deviation would be greater than 6 minutes
is 1 - 0.96 = 0.04.
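For even degrees of freedom the chi-square CDF has a closed form (it coincides with an Erlang distribution), so the calculator result in Problem 2 can be verified by hand. The sketch below assumes even degrees of freedom; for odd degrees of freedom a numerical routine or a statistics library would be needed:

```python
from math import exp, factorial

def chi2_cdf_even(x, v):
    # CDF of the chi-square distribution for EVEN degrees of freedom v:
    # F(x) = 1 - e^(-x/2) * sum_{k=0}^{v/2 - 1} (x/2)^k / k!
    assert v % 2 == 0, "closed form valid only for even degrees of freedom"
    half = x / 2.0
    return 1.0 - exp(-half) * sum(half ** k / factorial(k) for k in range(v // 2))

p = chi2_cdf_even(13.5, 6)  # P(chi-square <= 13.5) with 6 degrees of freedom
print(round(p, 2))          # 0.96
print(round(1.0 - p, 2))    # 0.04 -> probability the sample sd exceeds 6 minutes
```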

10. Questions

Objective questions

Question 1: In how many ways can the letters of the word ABACUS be rearranged such that the vowels
always appear together?

B. 3! * 3!

C.

D.

E.



Question 2: How many different four letter words can be formed (the words need not be meaningful) using
the letters of the word MEDITERRANEAN such that the first letter is E and the last letter is R?
A. 59

B.

C. 56

D. 23

E.

Question 3:What is the probability that the position in which the consonants appear remain unchanged
when the letters of the word "Math" are re-arranged?
A. 1/4
B. 1/6
C. 1/3
D. 1/24
E. 1/12

Question 4: There are 6 boxes numbered 1, 2, ... 6. Each box is to be filled up either with a red or a green
ball in such a way that at least 1 box contains a green ball and the boxes containing green balls are
consecutively numbered. The total number of ways in which this can be done is:
A. 5
B. 21
C. 33
D. 60
E. 6

Question 5: A man can hit a target once in 4 shots. If he fires 4 shots in succession, what is the probability
that he will hit his target?

A.1

B.

C.

D.

E.

Question 6: In how many ways can 5 letters be posted in 3 post boxes, if any number of letters can be
posted in all of the three post boxes?

A. 5C3

B. 5P3

C. 5^3

D. 3^5

E. 2^5
Question 7: Ten coins are tossed simultaneously. In how many of the outcomes will the third coin turn up a
head?

A. 2^10

B. 2^9

C. 2^8

D. 29

E. None of these

Question 8: In how many ways can the letters of the word "PROBLEM" be rearranged to make seven letter
words such that none of the letters repeat?

A.7!

B.7C7

C. 7^7

D.49

E.None of these

Short Questions

1. State A-priori or classical definition

2. State A-posteriori or relative frequency definition of probability


3. State Axiomatic definition of probability

4. State Baye’s theorem

5. State Total probability theorem

6. Define conditional probabilities of events

7. Define Joint probabilities of events

8. What is Bernoulli’s trial?

9. State Bernoulli’s theorem

10. State Binomial theorem

11. Define R.V. Give an example

12. State important Properties of a distribution function

13. What is a discrete RV and continuous RV? Give examples of each

14. Define probability distribution of i) discrete random variable ii) continuous random variable.

Long Questions

1. In a factory, 4 machines A1, A2, A3 and A4 produce 10%, 25%, 35% and 30% of the items respectively.
The percentage of the defective items produced by them is 5%, 4%, 3% and 2% respectively. An item
selected at random is found to be defective. What is the probability that it was produced by machine A2?

2. In a communication system a zero is transmitted with probability 0.55. In the channel a zero is received as
zero with probability 0.9 and a one is received as one with probability 0.8. Find the probability that

(i) One is received


(ii) Zero is received
(iii) One transmitted, one received
(iv) Zero transmitted, zero received



3. A mechanism consists of four parts A, B, C, D whose probabilities of failure are p, q, r, s respectively. The
mechanism works if there is no failure in any of these parts. Find the probability that
(i) the mechanism is working
(ii) the mechanism is not working

4. Suppose an urn contains ten white balls and five red balls. Two balls are withdrawn at random from the
urn without replacements:

(i) What is the probability that both balls are white?

(ii) What is the probability that the second ball is red?

5. Two balanced dice are rolled simultaneously. If the sum of the numbers shown by the two faces is 6,
what is the probability that the number shown by one of the dice is 1?

6. Define R.V. Give an example

7. State important Properties of a distribution function

8. What is a discrete RV and a continuous RV? Give an example of each

9. Define probability distribution of i) a discrete random variable ii) a continuous random variable.

University Questions

Dec 2012

Q. State and prove Baye’s theorem

Q. State the three axioms of Probability

Q. If A and B are two events such that P(A)=0.3, P(B)=0.4, P(AB)=0.2 find P(A U B), P(A/B)

Q. State and prove properties of distribution function



May 12

Q. If A and B are two events, prove that P(A U B)= P(A) + P(B) – P(AB)

Q. Explain conditional probability with example

Q. State and prove Baye’s theorem and total probability theorem

Q. Suppose two million lottery tickets are issued with 100 winning tickets among them

(i) If a person purchases 100 tickets, what is the probability of winning?

(ii) How many tickets should one buy to be 95% confident of having a winning ticket?

Q. What is a Random Variable? Explain continuous and discrete Random Variables with suitable examples.
Define expectation of continuous and discrete Random Variables

Dec 11

Q. State and prove Baye’s theorem

Q. If A and B are two independent events, prove that P(AB)= P(A). P(B)

Q. Suppose five cards are to be drawn at random from a standard deck of cards. If all the drawn cards are red,
what is the probability that all of them are hearts?

Q. What is a Random Variable? Explain continuous and discrete Random Variables with suitable examples.

Q. The random variable has exponential probability density function f(x) = K e^-|x|. Determine the value of K
and the corresponding distribution function

May 2011

1. (a) State and explain :

(i) Independent events

(ii) Joint and conditional probabilities of events.



Q. Suppose box I contains 5 white balls and 6 black balls and box II contains 6 white balls and 4 black balls.
A box is selected at random and then a ball is chosen at random from the selected box.

(i) What is the probability that the ball chosen will be a white ball?

(ii) Given that the ball chosen is white, what is the probability that it came from box I?

Dec 2010

In a communication system a zero is transmitted with probability 0.4 and a one is transmitted with
probability 0.6. Due to noise in the channel a zero can be received as one with probability 0.1 and as zero with
probability 0.9; similarly a one can be received as zero with probability 0.1 and as one with probability 0.9.
Now-

(i) A one was observed; what is the probability that a zero was transmitted?
(ii) A one was observed; what is the probability that a one was transmitted?

May 2010

Q. (a) Give the following definitions of probability with the shortcomings if any:

(i) A-priori or classical definition

(ii) A-posteriori or relative frequency definition

(iii) Axiomatic definition

Q. State and prove Baye’s theorem

Q. A mechanism consists of three parts A, B, C whose probabilities of failure are p, q, r respectively. The
mechanism works if there is no failure in any of these parts. Find the probability that
(i) the mechanism is working
(ii) the mechanism is not working

Dec 2009

Q. State and explain Baye’s theorem & conditional probability.



Q. If A and B are independent events then prove that P(A∩B)= P(A). P(B)

If A and B are mutually exclusive events then prove that P(A∩B) = 0

Q. In a communication system a zero is transmitted with probability 0.45. In the channel a zero is received as
zero with probability 0.9 and a one is received as one with probability 0.8. Find the probability that

(i) One is received


(ii) Zero is received
(iii) One transmitted one received
(iv) Zero transmitted zero received
Q. State and prove any two properties of-
(i) Density functions
(ii) Distribution functions.
Dec 2008

Q. State the three axioms of Probability

Q. Explain the concept of Joint and Conditional Probability with one eg. each.

Q. State and prove Baye’s theorem

Q. In a factory, 4 machines A1, A2, A3 and A4 produce 10%, 25%, 35% and 30% of the items respectively.
The percentage of the defective items produced by them is 5%, 4%, 3% and 2% respectively. An item
selected at random is found to be defective. What is the probability that it was produced by machine A2?

Q. If X, Y are two independent exponentially distributed random variables with the same parameter (unity), find
the probability density functions of U = X + Y and V = X/(X + Y).

Q. A random variable X takes values 9, 13, 17, ..., (5 + 4n), each with probability 1/n. Find the mean and variance of X.

Q. What 'is a Random Variable? Explain continuous and discrete Random Variables with suitable examples.



Q. Suppose X and Y are two random variables. Define covariance and correlation of X and Y. When do we
say that X and Y are (i) Orthogonal (ii) Independent and (iii) Uncorrelated? Are uncorrelated variables
independent?

May 2007

Q. State and explain with example:-

(i) Conditional probability

(ii) Bayes' Theorem.

If two events A and B are independent, show that -

(i) Ā and B are independent

(ii) A and B̄ are independent.

Q. For a certain binary communication channel, the probability that a transmitted '0' is received as a '0' is
0.95 while the probability that a transmitted '1' is received as '1' is 0.90. If the probability of transmitting a
'0' is 0.4, find the probability that -

(i) a '1' is received

(ii) a '1' was transmitted given that '1' was received

(iii) the error has occurred.

Dec 2006

Q. (a) Give the following definitions of probability with the shortcomings if any:

(i) A-priori or classical definition

(ii) A-posteriori or relative frequency definition

(iii) Axiomatic definition

(b) State the total probability theorem and Bayes' theorem.



Suppose box I contains 5 white balls and 6 black balls and box II contains 6 white balls and 4 black balls. A
box is selected at random and then a ball is chosen at random from the selected box.

(i) What is the probability that the ball chosen will be a white ball?

(ii) Given that the ball chosen is white, what is the probability that it came from box I?

June 2006

Q(a) (i) Define the conditional probability of an event A given that another event B has occurred.

(ii) A biased coin is tossed till a head appears for the first time. What is the probability that the number of
tosses required is odd? [2+6]

(b) State Bayes' theorem

Q.A certain test for a particular cancer is known to be 95% accurate. A person submits to the test and the
results are positive. Suppose the person comes from a population of 100,000 where 2000 people suffer
from that disease. What can we conclude about the probability that the person under test has that
particular cancer?

Dec 2005

(a) Let B1, B2, . . ., Bn be a partition of the sample space. Suppose now an event A occurs. Find an
expression for P(Bi|A), i = 1, 2, . . ., n, in terms of P(Bi) and P(A|Bi).

(b) Two balanced dice are rolled simultaneously. If the sum of the numbers shown by the
two faces is 7, what is the probability that the number shown by one of the faces is 1?

June 2005

Q (a) (i) With the help of a Venn diagram, show that the conditional probability of occurrence of an event A
given that the event B has occurred is given by -

P(A|B) = P(AB) / P(B)



(ii) An Urn contains two black balls and three white balls. Two balls are selected at random from the urn
without replacement and sequence of colours is noted. Find the probability that both balls are black.

(b) Suppose that 5 cards are to be drawn at random from a standard deck of 52 cards. If all the cards drawn are
red, what is the probability that all of them are hearts?

Dec2004

(a) An experiment is performed N times. During the trials the event A occurs nA times and the event B
occurs nAB times during the occurrences of the event A. From the relative frequency approach, define the
probability of occurrence of the event A, P(A), the joint probability of occurrence of the events A and B,
P(AB), and the conditional probability of the event B, P(B/A), given that the event A has occurred, in terms of the
frequencies of occurrence nA, nAB and N. Show that P(B/A) = P(AB) / P(A) and P(B'/A) = 1 - P(B/A).

(b) In a throw of a fair die, consider the event A = {the outcome is greater than 3} and the
event B = {the outcome is an even number}. Find P(A/B) and P(A'/B).

May 2004

1. (a) State and explain :

(i) Apriori probability of the outcome of an experiment

(ii) Joint and conditional probabilities of events.

(iii) Bayes' Theorem. (Dec 2008, May 2009)

(b) Suppose an urn contains five white balls and seven red balls. Two balls are withdrawn at random from
the urn without replacement:

(i) What is the probability that both balls are white?

(ii) What is the probability that the second ball is red?



CHAPTER-2

Operations on One Random Variable

1. Motivation
When we have a random variable which is a function of another random variable, and we know the
statistics of the latter, then we can obtain the statistics of the unknown random variable in
terms of those of the known random variable.

2. Syllabus

Sr. No. of Self


Topic Fine Detailing Week
No. Hours Study
01 Functions of  Functions of one random 6 1.5 12
one random variable
variable
 their distribution and density
functions

 Mean, variance and moments

 Chebyshev, Markov inequality

 Characteristic functions

 Moment theorem



3. Books Recommended

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic


Processes, 4th Edition, McGraw-Hill, 2002

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,


4th edition, Mc-Graw Hill, 2000

3. H. Stark and J.W. Woods, Probability and Random Processes with


Applications to Signal Processing, 3e, Pearson edu

4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal


processing and communication, first ed2007, Elsevier

4. Weight age in University Examination: 15-30 marks.

5. Objective

In this chapter we study a few basic concepts of functions of random variables and investigate the expected
value of a certain function of a random variable. The techniques of moment generating functions and
characteristic functions, which are very useful in some applications, are presented.

6. Key Notation:

FX(x) Cumulative Distribution Function

fX(x) Probability Density Function

Open interval on the real line



Closed interval on the real line

Semi open intervals

7. Key Definitions

1. Moment of random variable

2. Characteristic function of a discrete random variable

8. Key Relations

1. The function of RV

2. Expected value of RV



3. Expected value of function of RV

4. Standard Deviation

5. Variance

6. Chebyshev Inequality

7. Nth order moment of RV



9. Theory

Functions of Random Variables

Often we have to consider random variables which are functions of other random variables. Let

X be a random variable and g(.) a function defined on the real line. Then Y = g(X) is also a random variable. We are interested in

finding the pdf of Y. For example, suppose X represents the random voltage input to a full-wave rectifier.

Then the rectifier output is given by Y = |X|. We have to find the probability description of the random
variable Y. We consider the following cases:

(a) X is a discrete random variable with probability mass function pX(x).

The probability mass function of Y is given by

pY(y) = Σ pX(x), the sum being over all x such that g(x) = y

(b) X is a continuous random variable with probability density function fX(x), and y = g(x) is one-to-one and
monotonically increasing.

The probability density function of Y is then given by

fY(y) = fX(x) / |g'(x)|, evaluated at x = g^-1(y)



This is illustrated in Figure 1.
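As a quick numerical sketch of case (a), the pmf of Y = g(X) can be computed by summing pX(x) over every x that maps to the same y. The uniform pmf and the squaring function below are illustrative assumptions, not examples from the text:

```python
from collections import defaultdict

def pmf_of_function(p_x, g):
    """pmf of Y = g(X): sum pX(x) over every x with g(x) = y."""
    p_y = defaultdict(float)
    for x, p in p_x.items():
        p_y[g(x)] += p
    return dict(p_y)

# X uniform on {-2, -1, 0, 1, 2}; Y = X^2 (assumed example)
p_x = {x: 0.2 for x in (-2, -1, 0, 1, 2)}
p_y = pmf_of_function(p_x, lambda x: x * x)
print(p_y)  # {4: 0.4, 1: 0.4, 0: 0.2}
```

Note how the probabilities of x = -2 and x = 2 merge into the single value y = 4, exactly as the summation formula prescribes.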



Example 1 Probability density function of a linear function of a random variable

Suppose Y = aX + b, a ≠ 0. Then fY(y) = (1/|a|) fX((y - b)/a).

Example 2 Probability density function of the distribution function of a random variable

Suppose the distribution function FX(x) of a continuous random variable X is monotonically
increasing and
one-to-one, and define the random variable Y = FX(X). Then fY(y) = 1, 0 ≤ y ≤ 1.

Remark

(1) The distribution given by fY(y) = 1, 0 ≤ y ≤ 1, is called a uniform distribution over the interval [0,1].

(2) The above result is particularly important in simulating a random variable with a particular distribution
function.

We assumed FX(x) to be a one-to-one function for invertibility. However, the result is more general -
the random variable defined by the distribution function of any random variable is uniformly
distributed
over [0,1].
For example, the result can also be verified when X is a discrete RV.



(3) X is a continuous random variable with probability density function fX(x), and y = g(x) has multiple
solutions
for x.

Suppose, for a given y, the equation y = g(x) has solutions x1, x2, ..., xn. Then

fY(y) = Σ fX(xi) / |g'(xi)|, summed over the solutions xi

Proof:

Consider the plot of y = g(x). Suppose, at a point y, we have three distinct roots x1, x2 and x3 as
shown.

Consider the event {y < Y ≤ y + dy}. This event will be equivalent to the union of the events
{x1 < X ≤ x1 + dx1}, {x2 + dx2 < X ≤ x2} and {x3 < X ≤ x3 + dx3},



where the negative sign in dx2 is used to account for a positive probability.

Therefore, dividing by dy and taking the limit, we get the sum of fX(xi)/|g'(xi)| over the three roots.

In the above, we assumed y = g(x) to have three roots. In general, if y = g(x) has n roots, then

fY(y) = Σ fX(xi) / |g'(xi)|, i = 1, 2, ..., n

Example 3 Probability density function of a linear function of a random variable

Suppose Y = aX + b, a ≠ 0. The single root is x = (y - b)/a, so fY(y) = (1/|a|) fX((y - b)/a).

Example 4 Probability density function of the output of a full-wave rectifier

Y = |X| has two solutions x1 = y and x2 = -y for y > 0, and |g'(x)| = 1 at each solution point, so that
fY(y) = fX(y) + fX(-y), y > 0.



Figure 3

Example 5 Probability density function of the output of a square-law device

Y = X² has two solutions x1 = √y and x2 = -√y for y > 0, and |g'(x)| = 2√y at each solution point,

so that

fY(y) = [fX(√y) + fX(-√y)] / (2√y), y > 0
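As a numerical check of the square-law result, take a standard normal input (an assumed example): the two-root formula should then reproduce the known chi-square density with one degree of freedom.

```python
import math

def phi(x):
    """Standard normal pdf (assumed input distribution)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_y(y):
    """Density of Y = X^2 from the two-root formula."""
    r = math.sqrt(y)
    return (phi(r) + phi(-r)) / (2 * r)

def chi2_1_pdf(y):
    """Known chi-square(1) density, for comparison."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

for y in (0.5, 1.0, 2.0):
    assert abs(f_y(y) - chi2_1_pdf(y)) < 1e-12
print(round(f_y(1.0), 4))  # ≈ 0.242
```

The agreement at several points suggests the multiple-root formula is being applied consistently.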

Expected Value of a Random Variable

 The expectation operation extracts a few parameters of a random variable and provides a summary
description of the random variable in terms of these parameters.
 It is far easier to estimate these parameters from data than to estimate the distribution or density
function of the random variable.
 Moments are some important parameters obtained through the expectation operation.

Expected value or mean of a random variable

The expected value of a random variable X is defined by

E[X] = ∫ x fX(x) dx

provided the integral exists.

E[X] is also called the mean or statistical average of the random variable X and is denoted by μX.

Note that, for a discrete RV X with the probability mass function (pmf) pX(xi), the pdf

is given by fX(x) = Σ pX(xi) δ(x - xi)

Thus for a discrete random variable X,

E[X] = Σ xi pX(xi)

Interpretation of the mean

 The mean gives an idea about the average value of the random variable. The values of the random
variable are spread about this value.



 Observe that

Therefore, the mean can also be interpreted as the centre of gravity of the pdf curve.

Figure 1 Mean of a random variable

Example 1

Suppose is a random variable defined by the pdf

Then



Example 2

Consider the random variable with the pmf as tabulated below

Value of the random variable x:   0   1   2   3

pX(x):

Then

Example 3 Let X be a continuous random variable with

Then



The integral does not converge absolutely; hence E[X] does not exist. This density function is known as the Cauchy density function.

Remark

If fX(x) is an even function of x, then the integrand x fX(x) is odd and, provided the mean exists, E[X] = 0. Thus the mean of an RV with an even symmetric pdf
is 0.

Expected value of a function of a random variable

Suppose Y = g(X) is a real-valued function of a random variable X, as discussed above.

Then,

E[Y] = E[g(X)] = ∫ g(x) fX(x) dx

We shall illustrate the above result in the special case when y = g(x) is a one-to-one and
monotonically increasing function of x. In this case,

Figure 2



The following important properties of the expectation operation can be immediately derived:

a) If c is a constant, E[c] = c.

Clearly, the expectation of a non-random constant is the constant itself.

(b) If g1(X) and g2(X) are two functions of the random variable X and c1, c2 are constants,

E[c1 g1(X) + c2 g2(X)] = c1 E[g1(X)] + c2 E[g2(X)]

The above property means that E[.] is a linear operator.



Mean-square value

The mean-square value of X is E[X²] = ∫ x² fX(x) dx.

Variance

For a random variable X with the pdf fX(x) and mean μX, the variance of X is denoted by σX² and
defined as

σX² = E[(X - μX)²] = ∫ (x - μX)² fX(x) dx

Thus for a discrete random variable X with pmf pX(xi),

σX² = Σ (xi - μX)² pX(xi)

The standard deviation of X is defined as σX, the positive square root of the variance.
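A small numerical sketch of these definitions for a continuous RV, approximating the moment integrals by a midpoint Riemann sum (the uniform pdf on (0,1) is an assumed example):

```python
# Mean, mean-square value and variance of a continuous RV from its pdf,
# using a simple midpoint Riemann sum (uniform pdf on (0, 1) assumed).
def moment(pdf, n, a, b, steps=10000):
    h = (b - a) / steps
    return sum(((a + (k + 0.5) * h) ** n) * pdf(a + (k + 0.5) * h) * h
               for k in range(steps))

pdf = lambda x: 1.0            # uniform on (0, 1)
mean = moment(pdf, 1, 0.0, 1.0)
mean_sq = moment(pdf, 2, 0.0, 1.0)
var = mean_sq - mean ** 2      # variance = E[X^2] - (E[X])^2
print(round(mean, 4), round(var, 4))  # 0.5 and 1/12 ≈ 0.0833
```

The identity var(X) = E[X²] - μX², used on the last line, is derived as property (1) of the variance below is stated in standard texts; here it simply saves one integration pass.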

Example 4

Find the variance of the random variable discussed in Example 1.

Example 5



Find the variance of the random variable discussed in Example 2. As already computed

Remark

 The variance is a central moment and measure of dispersion of the random variable about the
mean.

 is the average of the square deviation from the mean. It gives information about the

deviation of the values of the RV about the mean. A smaller implies that the random values are

more clustered about the mean. Similarly, a bigger means that the random values are more
scattered.

For example, consider two random variables with pmfs as shown below. Note that each of
them has zero mean. Their variances differ, and the variable with the larger variance has more
spread about the mean.



Figure 3 The pdfs of two discrete random variables with the same mean but different variances.

The pdfs of two continuous random variables with the same mean and different variances are illustrated in
Figure 4.



Figure 4

 We could have used the mean absolute deviation E[|X - μX|] to know about the deviation of the
random values about the mean. But it is more difficult for both analysis and numerical calculation.
 Properties of variance

 (1)




 (2) If Y = cX + b, then var(Y) = c² var(X).

 (3) If c is a constant, var(c) = 0.

nth moment of a random variable

We can define the nth moment and the nth central moment of a random variable X by the following
relations:

nth moment: E[X^n] = ∫ x^n fX(x) dx,  nth central moment: E[(X - μX)^n]

Note that

 The mean μX = E[X] is the first moment and the mean-square value E[X²] is the second moment.

 The first central moment is 0 and the variance σX² is the second central moment.

 The third central moment measures the lack of symmetry of the pdf of a random variable.
E[(X - μX)³]/σX³ is called the coefficient of skewness; if the pdf is symmetric, this coefficient will be zero.



 The fourth central moment measures the flatness or peakedness of the pdf of a random variable.

E[(X - μX)⁴]/σX⁴ is called kurtosis. If the peak of the pdf is sharper, then the random variable has a
higher kurtosis.

Inequalities based on expectations

The mean and variance also give some quantitative information about the bounds of RVs. Following
inequalities are extremely useful in many practical problems.

Chebyshev Inequality

Suppose X is a parameter of a manufactured item with known mean μX and variance σX². The quality

control department rejects the item if the absolute deviation of X from μX is greater than some tolerance ε. What
fraction of the manufactured items does the quality control department reject? Can you roughly guess it?
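We can make a rough numerical guess with a sketch that compares the Chebyshev bound (stated just below) with the exact tail probability; the uniform parameter distribution here is an assumption made only for illustration:

```python
import math

# Chebyshev: P{|X - mu| >= eps} <= sigma^2 / eps^2 for ANY distribution.
# Assumed illustration: X ~ Uniform(0, 1), so mu = 1/2, sigma = 1/sqrt(12),
# and the exact tail can be computed in closed form.
mu, sigma = 0.5, 1 / math.sqrt(12)

for k in (1.5, 2.0, 3.0):
    eps = k * sigma
    exact = max(0.0, 1.0 - 2 * eps)   # P{|X - mu| >= eps}, interval clipped to (0, 1)
    bound = 1 / k ** 2                # sigma^2 / eps^2
    print(k, round(exact, 4), "<=", round(bound, 4))
```

The bound is loose (for a tolerance of two standard deviations it guarantees at most 25% rejected, while the uniform case rejects none), which is the price of holding for every distribution.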

The standard deviation gives us an intuitive idea of how the random variable is distributed about the mean.
This idea is expressed more precisely in the remarkable Chebyshev inequality stated below. For a random

variable X with mean μX and variance σX²,

P{|X - μX| ≥ ε} ≤ σX² / ε²  for any ε > 0

Proof:



Markov Inequality

For a random variable X which takes only nonnegative values,

P{X ≥ a} ≤ E[X] / a  for any a > 0

Remark

Example 6

A nonnegative RV X has a known mean E[X]. Find an upper bound on the probability P{X ≥ a}.

Solution: By Markov's inequality,

P{X ≥ a} ≤ E[X] / a

Hence the required upper bound is E[X]/a.
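A hedged numerical sketch of the Markov bound, using an exponential RV with unit mean as an assumed example (its tail P{X ≥ a} = e^(-a) is known exactly):

```python
import math

# Markov: for a nonnegative RV, P{X >= a} <= E[X] / a.
# Assumed illustration: X ~ Exponential(1), so E[X] = 1 and the exact
# tail P{X >= a} = exp(-a) can be compared with the bound.
mean = 1.0
for a in (1.0, 2.0, 4.0):
    exact = math.exp(-a)   # true tail probability
    bound = mean / a       # Markov upper bound
    print(a, round(exact, 4), "<=", round(bound, 4))
```

Markov uses only the mean, so its bound is weaker than Chebyshev's, but it applies whenever the RV is nonnegative.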

Just as with the frequency-domain characterizations of discrete-time and continuous-time signals, the probability
mass function and the probability density function can also be characterized in the frequency domain by
means of the characteristic function of a random variable. These functions are particularly important in



 calculating the moments of a random variable
 evaluating the PDF of combinations of multiple RVs.

Characteristic function

Consider a random variable X with probability density function fX(x). The characteristic function of X,

denoted by ΦX(ω), is defined as

ΦX(ω) = E[e^(jωX)] = ∫ e^(jωx) fX(x) dx

Note the following:

 ΦX(ω) is a complex quantity, representing the Fourier transform of fX(x), traditionally using

e^(jωx) instead of e^(-jωx). This implies that the properties of the Fourier transform apply to the
characteristic function.

 The interpretation that ΦX(ω) is the expectation of e^(jωX) helps in calculating moments with the
help of the characteristic function.

 As |e^(jωx)| = 1 and fX(x) is non-negative with ∫ fX(x) dx = 1, ΦX(ω) always exists.



[Recall that the Fourier transform of a function g(t) exists if ∫ |g(t)| dt < ∞, i.e., g(t) is absolutely
integrable.]

We can get fX(x) from ΦX(ω) by the inverse transform

fX(x) = (1/2π) ∫ ΦX(ω) e^(-jωx) dω

Example 1

Consider the random variable X with pdf given by

= 0 otherwise. The characteristic function is given by

Solution:

Example 2

The characteristic function of the random variable with



Characteristic function of a discrete random variable

Suppose X is a random variable taking values from the discrete set RX = {x1, x2, ...} with corresponding

probability mass function pX(xi) for the value xi.

Then,

ΦX(ω) = E[e^(jωX)] = Σ pX(xi) e^(jωxi)

If RX is the set of integers, we can write

ΦX(ω) = Σ pX(k) e^(jωk)

In this case ΦX(ω) can be interpreted as the discrete-time Fourier transform with e^(jωk) substituting
e^(-jωk) in the original discrete-time Fourier transform. The inverse relation is

pX(k) = (1/2π) ∫ over (-π, π) of ΦX(ω) e^(-jωk) dω

Example 3

Suppose X is a random variable with the probability mass function



.

Then,

(Using the Binomial theorem)

Example 4

The characteristic function of the discrete random variable with

Moments and the characteristic function

Given the characteristic function ΦX(ω), the nth moment is given by

E[X^n] = (1/j^n) [d^n ΦX(ω) / dω^n] evaluated at ω = 0

To prove this, consider the power series expansion of e^(jωX):

e^(jωX) = Σ (jωX)^n / n!

Taking the expectation of both sides and assuming all moments to exist, we get

ΦX(ω) = Σ (jω)^n E[X^n] / n!

Taking the first derivative of ΦX(ω) with respect to ω at ω = 0, we get Φ'X(0) = j E[X].

Similarly, taking the nth derivative of ΦX(ω) with respect to ω at ω = 0, we get ΦX^(n)(0) = j^n E[X^n].
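The moment theorem can be checked numerically by differentiating the characteristic function at ω = 0 with finite differences; the fair-die pmf below is an assumed example, for which E[X] = 3.5 and E[X²] = 91/6:

```python
import cmath

# Moments from the characteristic function Phi(w) = E[exp(jwX)], via
# E[X^n] = (1/j^n) d^n Phi / dw^n at w = 0 (central differences).
# Assumed example: a fair die, so E[X] = 3.5 and E[X^2] = 91/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

def char_fn(w):
    return sum(p * cmath.exp(1j * w * x) for x, p in pmf.items())

h = 1e-4
m1 = (char_fn(h) - char_fn(-h)) / (2j * h)                        # first moment
m2 = (char_fn(h) - 2 * char_fn(0) + char_fn(-h)) / (1j * h) ** 2  # second moment
print(round(m1.real, 3), round(m2.real, 3))  # 3.5 and 15.167
```

The imaginary parts of m1 and m2 are negligibly small, as expected for real-valued moments.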



Example 5

First two moments of the random variable in Example 2

Probability Generating Function

If the random variable under consideration takes non-negative integer values only, it is convenient to
characterize the random variable in terms of the probability generating function GX(z) defined by

GX(z) = E[z^X] = Σ pX(k) z^k

Note that

GX(z) is related to the z-transform; in the actual z-transform, z^(-1) is used instead of z.

The characteristic function of X is given by ΦX(ω) = GX(e^(jω)).


Mean and Variance from the Probability Generating Function



Probability Mass Functions from Probability Generating Functions

Consider the kth derivative of GX(z), given by

[d^k GX(z) / dz^k] at z = 0 equals k! pX(k)

Thus, given the probability generating function GX(z), we can get the probability mass function from the

derivatives of GX(z) at z = 0. Hence this transform is called the probability generating function.
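The PGF machinery can be sketched with plain coefficient lists: the coefficients are the pmf, G'(1) gives the mean, and the product of two PGFs is the PGF of the sum of two independent RVs. The fair-die pmf is an assumed example:

```python
# PGF G(z) = sum_k p_k z^k stored as a coefficient list [p_0, p_1, ...].
def pgf_mean(coeffs):
    # G'(1) = sum_k k * p_k
    return sum(k * p for k, p in enumerate(coeffs))

def pgf_product(a, b):
    # polynomial multiplication = PGF of the sum of independent RVs
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

die = [0.0] + [1 / 6] * 6          # fair die: p_1 = ... = p_6 = 1/6
two_dice = pgf_product(die, die)   # pmf of the sum of two fair dice
print(round(pgf_mean(die), 4))     # 3.5
print(round(two_dice[7], 4))       # P(sum = 7) = 6/36 ≈ 0.1667
```

Reading the pmf back off the coefficient list is the discrete counterpart of taking the derivatives of GX(z) at z = 0.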

More problems

Problem

In a recent little league softball game, each player went to bat 4 times. The number of hits made by each
player is described by the following probability distribution.

Number of hits, x 0 1 2 3 4
Probability, P(x) 0.10 0.20 0.30 0.25 0.15

What is the mean of the probability distribution?



Solution

The mean of the probability distribution is 2.15, as given by the following
equation.

E(X) = Σ [ xi * P(xi) ]
E(X) = 0*0.10 + 1*0.20 + 2*0.30 + 3*0.25 + 4*0.15 = 2.15

Problem

The number of adults living in homes on a randomly selected city block is described by the following
probability distribution.

Number of adults, x 1 2 3 4
Probability, P(x) 0.25 0.50 0.15 0.10

What is the standard deviation of the probability distribution?

Solution

The solution has three parts. First, find the expected value; then, find the variance;
then, find the standard deviation. Computations are shown below, beginning with the expected value.

E(X) = Σ [ xi * P(xi) ]
E(X) = 1*0.25 + 2*0.50 + 3*0.15 + 4*0.10 = 2.10

Now that we know the expected value, we find the variance.

σ2 = Σ { [ xi - E(x) ]2 * P(xi) }
σ2 = (1 - 2.1)2 * 0.25 + (2 - 2.1)2 * 0.50 + (3 - 2.1)2 * 0.15 + (4 - 2.1)2 * 0.10
σ2 = (1.21 * 0.25) + (0.01 * 0.50) + (0.81 * 0.15) + (3.61 * 0.10) = 0.3025 + 0.0050 + 0.1215 + 0.3610 = 0.79

And finally, the standard deviation is equal to the square root of the variance; so the standard deviation is
sqrt(0.79) or 0.889.
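The two worked computations above can be recomputed directly from their pmf tables, a minimal sketch using the definitions of mean and variance:

```python
import math

# Recomputing the two worked problems above from their pmf tables.
def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

hits = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}
adults = {1: 0.25, 2: 0.50, 3: 0.15, 4: 0.10}

print(round(mean(hits), 2))                                 # 2.15
print(round(mean(adults), 2), round(variance(adults), 2))   # 2.1 0.79
print(round(math.sqrt(variance(adults)), 3))                # 0.889
```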

Problem

The table on the left shows the joint probability distribution between two random variables - X and Y; and
the table on the right shows the joint probability distribution between two random variables - A and B.

        X                          A
     0    1    2               0    1    2
Y 3  0.1  0.2  0.2        B 3  0.1  0.2  0.2
  4  0.1  0.2  0.2          4  0.2  0.2  0.1

Which of the following statements are true?

I. X and Y are independent random variables.


II. A and B are independent random variables.

Solution

The solution requires several computations to test the independence of the random
variables. Those computations are shown below.

X and Y are independent if P(x|y) = P(x), for all values of X and Y. From the probability distribution table,
we know the following:

P(x=0) = 0.2; P(x=0 | y=3) = 0.2; P(x=0 | y = 4) = 0.2


P(x=1) = 0.4; P(x=1 | y=3) = 0.4; P(x=1 | y = 4) = 0.4
P(x=2) = 0.4; P(x=2 | y=3) = 0.4; P(x=2 | y = 4) = 0.4

Thus, P(x|y) = P(x), for all values of X and Y, which means that X and Y are independent. We repeat the
same analysis to test the independence of A and B.

P(a=0) = 0.3; P(a=0 | b=3) = 0.2; P(a=0 | b = 4) = 0.4


P(a=1) = 0.4; P(a=1 | b=3) = 0.4; P(a=1 | b = 4) = 0.4
P(a=2) = 0.3; P(a=2 | b=3) = 0.4; P(a=2 | b = 4) = 0.2

Thus, P(a|b) is not equal to P(a), for all values of A and B. For example, P(a=0) = 0.3; but P(a=0 | b=3) = 0.2.
This means that A and B are not independent.
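The cell-by-cell test just performed can be sketched in code: X and Y are independent iff every joint probability equals the product of its marginals. The dictionaries below encode the two tables from this problem:

```python
# Independence test for a joint pmf stored as {(x, y): p}.
def marginals(joint):
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def is_independent(joint, tol=1e-9):
    px, py = marginals(joint)
    return all(abs(p - px[x] * py[y]) < tol for (x, y), p in joint.items())

xy = {(0, 3): 0.1, (1, 3): 0.2, (2, 3): 0.2,
      (0, 4): 0.1, (1, 4): 0.2, (2, 4): 0.2}
ab = {(0, 3): 0.1, (1, 3): 0.2, (2, 3): 0.2,
      (0, 4): 0.2, (1, 4): 0.2, (2, 4): 0.1}
print(is_independent(xy), is_independent(ab))  # True False
```

Testing every cell rather than a sample is essential: a single cell where p(x, y) ≠ p(x) p(y) breaks independence, as happens for A and B.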

Problem

        X
     0    1    2
Y 3  0.1  0.2  0.2
  4  0.1  0.2  0.2

The table on the right shows the joint probability distribution between two random variables - X and Y. (In a
joint probability distribution table, numbers in the cells of the table represent the probability that particular
values of X and Y occur together.)

What is the mean of the sum of X and Y?

Solution



The solution requires three computations: (1) find the mean (expected value) of X,
(2) find the mean (expected value) of Y, and (3) find the sum of the means. Those computations are shown
below, beginning with the mean of X.

E(X) = Σ [ xi * P(xi) ]
E(X) = 0 * (0.1 + 0.1) + 1 * (0.2 + 0.2) + 2 * (0.2 + 0.2) = 0 + 0.4 + 0.8 = 1.2

Next, we find the mean of Y.

E(Y) = Σ [ yi * P(yi) ]
E(Y) = 3 * (0.1 + 0.2 + 0.2) + 4 * (0.1 + 0.2 + 0.2) = (3 * 0.5) + (4 * 0.5) = 1.5 + 2 = 3.5

And finally, the mean of the sum of X and Y is equal to the sum of the means. Therefore,

E(X + Y) = E(X) + E(Y) = 1.2 + 3.5 = 4.7

Problem

Suppose X and Y are independent random variables. The variance of X is equal to 16; and the variance of Y
is equal to 9. Let Z = X - Y.

What is the standard deviation of Z?

Solution

The solution requires us to recognize that the variable Z is a combination of two
independent random variables. As such, the variance of Z is equal to the variance of X plus the variance of
Y.

Var(Z) = Var(X) + Var(Y) = 16 + 9 = 25

The standard deviation of Z is equal to the square root of the variance. Therefore, the standard deviation is
equal to the square root of 25, which is 5.

Problem

The average salary for an employee at Acme Corporation is $30,000 per year. This year, management
awards the following bonuses to every employee.

 A Christmas bonus of $500.


 An incentive bonus equal to 10 percent of the employee's salary.

What is the mean bonus received by employees?

Solution

To compute the bonus, management applies the following linear transformation to
each employee's salary.
Y = mX + b
Y = 0.10 * X + 500

where Y is the transformed variable (the bonus), X is the original variable (the salary), m is the multiplicative
constant 0.10, and b is the additive constant 500.

Since we know that the mean salary is $30,000, we can compute the mean bonus from the following
equation.

Y = mX + b
Y = 0.10 * $30,000 + $500 = $3,500
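The rule used here, E[mX + b] = m E[X] + b, can be sketched on a small list of salaries; the five figures below are assumptions chosen so that the mean salary is $30,000:

```python
import math

# Linearity of expectation under Y = mX + b: E[Y] = m * E[X] + b.
# The individual salary figures are assumed for illustration.
salaries = [25000.0, 28000.0, 30000.0, 32000.0, 35000.0]
mean_salary = math.fsum(salaries) / len(salaries)
bonuses = [0.10 * s + 500 for s in salaries]      # Y = 0.10 X + 500
mean_bonus = math.fsum(bonuses) / len(bonuses)
print(round(mean_salary, 2), round(mean_bonus, 2))  # 30000.0 3500.0
```

Applying the transformation to every salary and then averaging gives exactly the same result as transforming the mean, which is the point of the worked problem.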

10. Questions

Objective Questions

1. Which operator is used to calculate the average value?

a. A[ ]

b. E[ ]

c. D[ ]

d. Z[ ]

2. Mean square value of RV is given as

a. E[X2]

b. E2[X]

c. [E[X2]]2

d. E[X2]

3. If X is uniformly distributed in [0,1] then P( X > 0.2) is

a. 0.2

b. 0.5

c. 0.7



d. 0.9

Short Questions

1. Write the formula to express the pdf of an RV which is a function of another RV.

2. Define expected value of RV

3. Write the formula to find the expected value of continuous & discrete RV

4. Write the formula to find the expected value of function of RV

5. Define variance of RV

6. Write the formula to find the variance of RV

7. Define Standard Deviation.

8. Write the formula to find the standard deviation of RV

9. What is the Chebyshev inequality?

10. What is a moment generating function?

11. Define n-th order moment

12. Define n-th order characteristic function

University Questions

Dec 2012

Q. Explain the MGF of a discrete random variable and of a continuous random variable in detail.

Q. Find the characteristic functions of the Binomial distribution and the Poisson distribution.

May 2012

Q. Find the characteristic function of the Poisson distribution, and its mean and variance.



Q. If X is a continuous random variable and Y = aX + b, then find the pdf of Y in terms of FX(x), the CDF of X.

Q. Prove that if two random variables are independent, then density function of their sum is given by
convolution of their density functions.

Dec 2011

Q. Write short notes on the Poisson distribution, Rayleigh distribution and Gaussian distribution.

May 2011

Q. If X is a continuous random variable and Y = aX + b, then prove that -

1 Y b
f Y ( y)  fX ( )
|a| a

Q. Let X be a continuous random variable with uniform pdf in (0, 2π). Find the probability density function of
Y = cos X.

Q. Find the characteristic function of the Poisson distribution, and its mean and variance.

Dec 2010

Q. In medical imaging such as computed tomography, the relation between the detector reading Y and the body
absorptivity X follows the law Y = e^X. Let X be N(µ, σ2). Compute the pdf of Y.

Q. Find the characteristic function of the Laplace random variable with pdf

f(x) = (m/2) e^(-m|x|), -∞ < x < ∞

and also find its mean and variance.

Dec 2009

Q. How is the characteristic function ΦX(ω) of a random variable X defined? Show that ΦX(ω) can be
expressed as

Φ(ω) = Σ (j^n ω^n / n!) mn, where mn = (1/j^n) [d^n ΦX(ω)/dω^n] at ω = 0 is the nth order moment of the r.v. X



Q. If X is a Poisson distributed random variable, find its moment generating function and characteristic
function.

Q. Suppose y = g(x) has solutions x1, x2, ..., xn. Then prove that fY(y) = Σ fX(xi)/|g'(xi)|.

Q. If the probability density function of X is fX(x) = e^(-x) for x > 0, then find the probability density function of
Y = X^3.

Dec 2008

Q. If X is a continuous random variable and Y = aX + b, then prove that -

1 Y b
f Y ( y)  fX ( )
|a| a

May 2007

Q. (a) If X is a continuous random variable and Y = aX + b, then prove that -

1 Y b
f Y ( y)  fX ( )
|a| a

(b) If a random variable X has a uniform distribution in (-2, 2), find the probability density function fY(y) of
Y = 3X + 2.

Dec 2006



Q. Obtain the distribution function of Y = aX + b, where X is uniformly distributed in (c, d).

Q. How is the characteristic function ΦX(ω) of a random variable X defined? Show that ΦX(ω) can be
expressed as -

Φ(ω) = Σ (j^n ω^n / n!) mn, where mn = (1/j^n) [d^n ΦX(ω)/dω^n] at ω = 0 is the nth order moment of the r.v. X

(b) Find the characteristic function of the geometric distribution given by

P(X = r) = q^r p, r = 0, 1, 2, ...; p + q = 1

Hence find the mean and the variance.

June 2005

Q. How is the characteristic function ΦX(ω) of a random variable X defined? Show that ΦX(ω) can be
expressed as -

Φ(ω) = Σ (j^n ω^n / n!) mn, where mn = (1/j^n) [d^n ΦX(ω)/dω^n] at ω = 0 is the nth order moment of the r.v. X

An exponentially distributed random variable X with parameter λ has pdf fX(x) = λ e^(-λx), x > 0.

Find E[X] as Φ'X(0)/j.

Dec 2005

Q. (i) Explain what is a moment generating function of a random variable.

(ii) If X is a random variable whose pdf is given by f(x) = (1/b) e^(-(x-a)/b), x ≥ a, find the first and
second moments of X.



CHAPTER-3

Multiple Random Variables and Convergence

1. Motivation
When we have a random variable which is a function of another random variable, and we know the
statistics of the latter, then we can obtain the statistics of the unknown random variable in
terms of those of the known random variable.

2. Syllabus



Sr. No. of Self
Topic Fine Detailing
No. Hours Study
01 Multiple of  Vector random variables, 1 hour 1 hour
Pairs of random variables,
Random
Variable and  Joint CDF, Joint PDF 1 hour 1 hour
Convergence Independence, Conditional
CDF and PDF, Conditional
Expectation
 One function of two random
variable, two functions of two 1 hour 1 hour
random variables;

 Joint moments, joint 1 hour 1 hour


characteristic function,
covariance and correlation-

 independent, uncorrelated
and orthogonal random 1 hour 1 hour
variables.

3. Books Recommended

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic


Processes, 4th Edition, McGraw-Hill, 2002

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,


4th edition, Mc-Graw Hill, 2000

3. H. Stark and J.W. Woods, Probability and Random Processes with


Applications to Signal Processing, 3e, Pearson edu



4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal


processing and communication, first ed2007, Elsevier

4. Weight age in University Examination: 15-30 marks.

5. Objective

In this chapter we study a few basic concepts of functions of random variables and investigate the expected
value of a certain function of a random variable. The techniques of moment generating functions and
characteristic functions, which are very useful in some applications, are presented.

6. Key Notation:

FX(x) Cumulative Distribution Function

fX(x) Probability Density Function

Open interval on the real line

Closed interval on the real line

Semi open intervals

7. Key Definitions

1. Moment of random variable



2. Characteristic function of a discrete random variable

8. Key Relations

1. The function of RV

2. Expected value of RV

3. Expected value of function of RV

4. Standard Deviation

5. Variance

6. Chebyshev Inequality

7. Nth order moment of RV

Jointly Distributed Random Variables

We may define two or more random variables on the same sample space. Let X and Y be two real

random variables defined on the same probability space (S, F, P). The mapping s → (X(s), Y(s)) such that for

every s in S, (X(s), Y(s)) is a point in R², is called a joint random variable.



Figure 1

• The above figure illustrates the mapping corresponding to a joint random variable. The joint random

variable in the above case is denoted by (X, Y).

• We may represent a joint random variable as a two-dimensional vector (X, Y).

• We can extend the above definition to define joint random variables of any dimension. The mapping

s → (X1(s), X2(s), ..., Xn(s)), such that for every s in S the image is a point in R^n, is called an n-dimensional random variable and

denoted by the vector (X1, X2, ..., Xn).

Example 1 Suppose we are interested in studying the height and weight of the students in a class. We

can define the joint RV (X, Y) where X represents the height and Y represents the weight.

Example 2 Suppose in a communication system X is the transmitted signal and Y is the corresponding

noisy received signal. Then (X, Y) is a joint random variable.

Joint Probability Distribution Function

Recall the definition of the distribution function of a single random variable. The event {X ≤ x} was used to

define the probability distribution function FX(x). Given FX(x), we can find the probability of any event
involving the random variable. Similarly, for two random variables X and Y, the event {X ≤ x, Y ≤ y}



is considered as the representative event.

The probability P{X ≤ x, Y ≤ y} is called the joint distribution function or the joint

cumulative distribution function (CDF) of the random variables X and Y, denoted by FX,Y(x, y).

Figure 2

FX,Y(x, y) satisfies the following properties:

Note that


• FX,Y(x, y) is right continuous in both the variables.

Figure 4

Given FX,Y(x, y), we have a complete description of the random variables X and Y.

To prove this, note that FX(x) = P{X ≤ x} = P{X ≤ x, Y < ∞} = FX,Y(x, ∞).

Similarly, FY(y) = FX,Y(∞, y).

• Given FX,Y(x, y), each of FX(x) and FY(y) is called a marginal distribution

function or marginal cumulative distribution function (CDF).

Example 3



Consider two jointly distributed random variables X and Y with the joint CDF

(a) Find the marginal CDFs

(b) Find the probability

(a)

(b)

Jointly Distributed Discrete Random Variables

If X and Y are two discrete random variables defined on the same probability space (S, F, P) such

that X takes values from the countable subset RX and Y takes values from the countable subset RY, then

the joint random variable (X, Y) can take values from the countable subset RX × RY in R². The joint random

variable (X, Y) is completely specified by the joint probability mass function

pX,Y(x, y) = P{X = x, Y = y}

Given pX,Y(x, y), we can determine other probabilities involving the random variables X and Y.



Remark

pX,Y(x, y) = 0 for (x, y) outside RX × RY, and the probabilities over RX × RY sum to 1. This is because
the events {(X, Y) = (x, y)}, (x, y) in RX × RY, partition the sample space.

• Marginal Probability Mass Functions: The probability mass functions pX(x) and pY(y) are obtained
from the joint probability mass function as follows:

pX(x) = Σy pX,Y(x, y)

and similarly

pY(y) = Σx pX,Y(x, y)

These probability mass functions pX(x) and pY(y), obtained from the joint probability mass function,
are called marginal probability mass functions.

Example 4 Consider the random variables X and Y with the joint probability mass function as tabulated in

Table 1. The marginal probabilities pX(x) and pY(y) are as shown in the last column and the last row



respectively.

Table 1

Joint Probability Density Function

If X and Y are two continuous random variables and their joint distribution function FX,Y(x, y) is continuous

in both x and y, then we can define the joint probability density function by

fX,Y(x, y) = ∂²FX,Y(x, y) / ∂x∂y

provided it exists.

Clearly,

FX,Y(x, y) = ∫ (from -∞ to x) ∫ (from -∞ to y) fX,Y(u, v) dv du

Properties of Joint Probability Density Function

• fX,Y(x, y) is always a non-negative quantity. That is, fX,Y(x, y) ≥ 0 for all (x, y).



• The probability of any Borel set can be obtained by

Marginal density functions

The marginal density functions and of two joint RVs and are given by the
derivatives of the corresponding marginal distribution functions. Thus

Remark

• The marginal CDF and pdf are the same as the CDF and pdf of the single random variable concerned. The term marginal simply indicates that they are derived from the corresponding joint distribution or density function of two or more jointly distributed random variables.

• With the help of the two-dimensional Dirac Delta function, we can define the joint pdf of two discrete
jointly random variables. Thus for discrete jointly random variables and .



Example 5 The joint density function of the random variables in Example 3 is

Example 6 The joint pdf of two random variables and are given by

• Find .

• Find .

• Find and .

• What is the probability ?



Conditional Distributions

We discussed the conditional CDF and conditional PDF of a random variable conditioned on some events
defined in terms of the same random variable. We observed that

and



We can define these quantities for two random variables. We start with the conditional probability mass
functions for two random variables.

Conditional Probability Mass Functions

Suppose and are two jointly distributed discrete random variables with the joint PMF The conditional

PMF of given is denoted by and defined as

Similarly we can define the conditional probability mass function

• The conditional PMF satisfies the properties of the probability mass functions.

• From the definition of conditional probability mass functions, we can characterize independent random variables. Two discrete random variables X and Y are said to be independent if and only if



so that

Bayes' Rule for Discrete Random Variables

Suppose and are two discrete jointly random variables. Given and we can

determine the a posteriori probability mass function by using


the Bayes' rule as follows:
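The rule itself is elided in these notes; in code, the posterior p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y) can be sketched as follows (the joint PMF values are hypothetical, and exact fractions are used to avoid rounding):

```python
from fractions import Fraction as F

# Hypothetical joint PMF p_{X,Y}(x, y).
p_xy = {(0, 0): F(1, 8), (0, 1): F(1, 4),
        (1, 0): F(1, 8), (1, 1): F(1, 2)}

def posterior_x_given_y(p_xy, y):
    """Bayes' rule for discrete RVs: p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)."""
    p_y = sum(p for (x, yy), p in p_xy.items() if yy == y)
    return {x: p / p_y for (x, yy), p in p_xy.items() if yy == y}

print(posterior_x_given_y(p_xy, 1))  # {0: Fraction(1, 3), 1: Fraction(2, 3)}
```

The posterior PMF sums to 1 over x, as any PMF must.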

Example 1 Consider the random variables and with the joint probability mass function as presented in
the

following table



The marginal probabilities are as shown in the last column and the last row

Conditional Probability Distribution Function

Consider two continuous jointly random variables and with the joint probability distribution

function We are interested in finding the conditional distribution function of one of the random variables conditioned on a particular value of the other random variable.

We cannot define the conditional distribution function of the random variable on the condition of the

event by the relation

as in the above expression, because the conditioning event generally has zero probability for a continuous random variable. The conditional distribution function is therefore defined in the limiting sense as follows:



Conditional Probability Density Function

is called the conditional probability density function of given

Let us define the conditional distribution function .

The conditional density is defined in the limiting sense as follows

Because,

The right hand side of the highlighted equation is

Similarly we have



Two random variables are statistically independent if for all

• (4)

Example 2 X and Y are two jointly random variables with the joint pdf given by

find,

(a)

(b)

(c)

Solution:

Since

we get



Bayes’ Rule for Continuous Random Variables:

Given the marginal density function and the conditional density , we can apply the Bayes'

rule for two continuous joint random variables to get as follows. Recall that

In the context of the above Bayes' rule, is called the a priori density function and is called the a posteriori density function.
Example 3 For random variables X and Y, the joint probability density function is given by



Find the marginal density Are independent?

and

Therefore,

and

Hence X and Y are not independent.

Bayes’ Rule for Mixed Random Variables



Let be a discrete random variable with probability mass function and be a continuous random

variable defined on the same sample space with the conditional probability density function In

practical problems we may have to estimate the conditional PMF of given the observed value . We can define this conditional PMF in the limiting sense as well.

Example 4

is a binary random variable with

is the Gaussian noise with mean



Then
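The exact signal levels and the noise variance are elided in these notes; assuming X takes the values +1 and -1 with equal probability and the noise is zero-mean Gaussian with variance sigma^2, the posterior PMF of X given the observation Y = y can be sketched as:

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Gaussian density N(mean, sigma^2) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior(y, sigma=1.0, p1=0.5):
    """P(X = +1 | Y = y) for Y = X + N, X in {+1, -1}, N ~ N(0, sigma^2).
    Mixed Bayes' rule: PMF prior on X, pdf likelihood on Y."""
    num = p1 * gaussian_pdf(y, +1.0, sigma)
    den = num + (1 - p1) * gaussian_pdf(y, -1.0, sigma)
    return num / den

print(posterior(0.0))  # 0.5 by symmetry
print(posterior(2.0))  # close to 1: the observation strongly favours X = +1
```

Note how the discrete prior is weighted by the continuous likelihood and renormalized, exactly as in the limiting definition above.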

Independent Random Variables Revisited

Let and be two random variables characterized by the joint distribution function

and the corresponding joint density function

Then and are independent if and are independent events. Thus,

and equivalently

Transformation of Two Random Variables

We are often interested in finding out the probability density function of a function of two or more RVs.
Following are a few examples.
• The received signal by a communication receiver is given by

where is the received signal, which is the superposition of the message signal and the noise .

The frequently applied operations on communication signals like modulation, demodulation, correlation
etc. involve multiplication of two signals in the form Z = XY.

We need to know the probability distribution of in any analysis of . More formally, given two

random variables X and Y with joint probability density function and a function we

have to find .

In this lecture, we shall address this problem.

Probability Density of the Function of Two Random Variables

We consider the transformation

Consider the event corresponding to each z. We can find a subset of the plane such that



Probability density function of Z = X + Y .

Consider Figure 2



Figure 2

We have

Therefore, is the coloured region in the Figure 2.



If X and Y are independent

where * is the convolution operation.
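For independent discrete random variables, the same convolution applies to PMFs; a minimal sketch with two fair dice shows the triangular shape of the sum's PMF:

```python
def convolve_pmf(p, q):
    """PMF of Z = X + Y for independent X, Y with PMFs p and q (dict: value -> prob)."""
    r = {}
    for x, px in p.items():
        for y, qy in q.items():
            r[x + y] = r.get(x + y, 0.0) + px * qy
    return r

die = {k: 1 / 6 for k in range(1, 7)}
p_sum = convolve_pmf(die, die)

# The PMF of the sum is triangular, peaking at 7 with probability 6/36.
print(max(p_sum, key=p_sum.get))  # 7
print(round(p_sum[7], 6))         # 0.166667
```

The same pattern, with integrals in place of sums, gives the continuous convolution formula above.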

Example 1

Suppose X and Y are independent random variables and each uniformly distributed over (a, b). and

are as shown in the figure below.



The PDF of is a triangular probability density function as shown in the figure.

Example 2 Erlang distribution

Suppose and are independent random variables and

We have

and

Note that is an Erlang distribution with

Probability density function of Z = XY



Substituting u = xy du = xdy

Probability density function of

If X and Y are independent random variables, then



Example 3

Suppose X and Y are independent zero-mean Gaussian random variables with unit standard deviation and

. Then

which is the Cauchy probability density function.

Probability density function of

Here

(changing from Cartesian into polar coordinates)



Example 4 Rayleigh random variable

Suppose X and Y are two independent Gaussian random variables each with mean 0 and variance and

.Then

The above is the Rayleigh density function which we discussed earlier.
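As a numerical sanity check (sigma = 1 assumed), the Rayleigh density f_R(r) = (r/sigma^2) exp(-r^2 / (2 sigma^2)), r >= 0, integrates to 1:

```python
import math

def rayleigh_pdf(r, sigma=1.0):
    """Rayleigh density: f(r) = (r / sigma^2) * exp(-r^2 / (2 sigma^2)), r >= 0."""
    return (r / sigma**2) * math.exp(-r**2 / (2 * sigma**2))

# Trapezoidal integration over [0, 10]; the tail beyond 10 is negligible.
n, a, b = 100_000, 0.0, 10.0
h = (b - a) / n
total = sum(rayleigh_pdf(a + i * h) for i in range(1, n)) * h \
        + 0.5 * h * (rayleigh_pdf(a) + rayleigh_pdf(b))
print(round(total, 6))  # 1.0
```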

Example 5 Rician random variable

Suppose X and Y are independent Gaussian variables with non-zero means and , respectively, and equal variance. We have to find the density function of the random variable .

Here



We have shown that

Suppose Then

where is the modified zeroth-order Bessel function of the first kind.

 When , and the Rician density reduces to the Rayleigh density

 When the Rician density approaches the Gaussian density

 The Rician density is used to model the envelope of a sinusoid plus a narrow-band Gaussian noise.



 It is used to model the envelope of the received signal in a multipath situation.

Joint Probability Density Function of Two Functions of Two Random Variables

We consider the transformation We have to find out the joint probability density

function where and .

Consider a differential region of area at point in the plane as shown in Figure 1.

Suppose the inverse mapping relation is and .

Let us see how the corners of the differential region are mapped to the plane. Observe that

Therefore,

The point is mapped to the point in the plane.



Figure 1

We can similarly find the points in the plane corresponding to and


The mapping is shown in Figure 2 . We notice that each differential region in the plane is a

parallelogram. It can be shown that the differential parallelogram at has an area

where is the Jacobian of the transformation defined as the determinant .

Further, it can be shown that the absolute values of the Jacobians of the forward and the inverse transforms are inverses of each other, so that



where

Therefore, the differential parallelogram in Figure 2 has an area of

Suppose the transformation and has roots and let be the


roots. The inverse mapping of the differential region in the plane will be differential regions

corresponding to the roots. The inverse mapping is illustrated in Figure 2 for . As these parallelograms are non-overlapping,

 If and does not have a root in , then



Example: PDF of a Linear Transformation



Then

Example 2 Suppose X and Y are two independent Gaussian random variables each with mean 0 and

variance . Given and , find .

Solution:

We have and so that (1)

and (2)

From (1)

and

From (2)



Expected Values of Functions of Random Variables

Recall that

 If is a function of a continuous random variable then

 If is a function of a discrete random variable then



Suppose is a function of continuous random variables then the expected
value of is given by

Thus can be computed without explicitly determining .

We can establish the above result as follows.

Suppose has roots at . Then

where

is the differential region containing The mapping is illustrated in Figure 1 for .



Figure 1

Note that

As is varied over the entire axis, the corresponding (non-overlapping) differential regions in
plane cover the entire plane.

Thus,

If is a function of discrete random variables , we can similarly show that

Example 1 The joint pdf of two random variables is given by



Find the joint expectation of

Example 2 If

Proof:

Thus, expectation is a linear operator.

Example 3

Consider the discrete random variables discussed earlier. The joint probability mass function of the

random variables are tabulated in Table . Find the joint expectation of .
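The table entries of Example 3 are elided in these notes; with a hypothetical joint PMF, the computation of E[XY] directly from the table looks like this (exact fractions avoid rounding):

```python
from fractions import Fraction as F

# Hypothetical joint PMF p_{X,Y}(x, y); here it happens to factorize,
# so X and Y are independent.
p_xy = {(0, 1): F(1, 4), (1, 1): F(1, 4),
        (0, 2): F(1, 4), (1, 2): F(1, 4)}

# E[g(X, Y)] = sum over (x, y) of g(x, y) * p_{X,Y}(x, y),
# computed without finding the PMF of g(X, Y) first.
e_xy = sum(x * y * p for (x, y), p in p_xy.items())
e_x = sum(x * p for (x, y), p in p_xy.items())
e_y = sum(y * p for (x, y), p in p_xy.items())

print(e_xy)             # 3/4
print(e_xy == e_x * e_y)  # True: for independent RVs, E[XY] = E[X] E[Y]
```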



Remark

(1) We have earlier shown that expectation is a linear operator. We can generally write

Thus

(2) If are independent random variables and ,then



Joint Moments of Random Variables

Just like the moments of a random variable provide a summary description of the random variable, so also the joint moments provide a summary description of two random variables. For two continuous random

variables , the joint moment of order is defined as

and the joint central moment of order is defined as

where and

Remark

(1) If are discrete random variables, the joint expectation of order and is defined as

(2) If and , we have the second-order moment of the random variables given by

(3) If are independent,



Covariance of two random variables

The covariance of two random variables is defined as

Cov(X,Y) is also denoted as .

Expanding the right-hand side, we get

The ratio is called the correlation coefficient. We will give an interpretation of

and later on.

We will also show that . To establish this relation, we prove the following result:

For two random variables


Proof:

Consider the random variable

Non-negativity of the left-hand side implies that its minimum also must be nonnegative.

For the minimum value,



so the corresponding minimum is

Now

Thus
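The covariance and correlation coefficient defined above can be computed numerically; the joint PMF here is hypothetical:

```python
from fractions import Fraction as F
import math

# Hypothetical joint PMF p_{X,Y}(x, y).
p_xy = {(0, 0): F(1, 2), (1, 1): F(1, 4), (2, 1): F(1, 4)}

def e(g):
    """E[g(X, Y)] over the joint PMF."""
    return sum(g(x, y) * p for (x, y), p in p_xy.items())

mu_x, mu_y = e(lambda x, y: x), e(lambda x, y: y)
cov = e(lambda x, y: x * y) - mu_x * mu_y      # Cov(X,Y) = E[XY] - E[X]E[Y]
var_x = e(lambda x, y: x * x) - mu_x ** 2
var_y = e(lambda x, y: y * y) - mu_y ** 2
rho = float(cov) / math.sqrt(float(var_x) * float(var_y))

print(cov)            # 3/8
print(round(rho, 4))  # 0.9045, and always in [-1, 1]
```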

Uncorrelated random variables

Two random variables are called uncorrelated if



Recall that if are independent random variables, then

Then

Thus two independent random variables are always uncorrelated.

The converse is not always true.

• Two random variables may be dependent, but still they may be uncorrelated. If there exists
correlation between two random variables, one may be represented as a linear regression of the other. We
will discuss this point in the next section.

Linear Regression of on

Suppose is an approximation of Y in terms of X. This approximation is called the linear


regression of Y on X.

Therefore, is the regression error.

The mean square regression error is



Minimising the error will give optimal values of . Corresponding to the optimal solutions for

we have,

Solving for ,

so that

where is the correlation coefficient .

Thus is the linear regression of the random variable Y on the random variable X. The linear regression approximates a random variable in terms of another random variable by means of a straight-line fit.

The linear regression is illustrated in Figure 2.



Figure 2
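The optimal coefficients above can be sketched numerically from sample moments (the data below are hypothetical; the slope is Cov(X,Y)/Var(X) and the intercept makes the line pass through the mean point):

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1, so the fit should recover it
n = len(xs)

mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n

b = cov / var_x   # slope = Cov(X, Y) / Var(X) = rho * sigma_y / sigma_x
a = my - b * mx   # intercept: the line passes through (E[X], E[Y])

print(a, b)  # 1.0 2.0
```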

Remark

If then are called positively correlated .

If then are called negatively correlated

If then are uncorrelated.

Note that independence implies uncorrelatedness, but uncorrelatedness generally does not imply independence (except for jointly Gaussian random variables).

Example 4

are dependent, but they are uncorrelated.

Because

In fact for any zero- mean symmetric distribution of X, are uncorrelated.
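A concrete check of this remark: take X uniform on {-1, 0, 1} and Y = X^2. The two are clearly dependent (Y is a function of X), yet their covariance is zero:

```python
from fractions import Fraction as F

p_x = {-1: F(1, 3), 0: F(1, 3), 1: F(1, 3)}   # zero-mean, symmetric
# Y = X^2, so the joint PMF puts mass only where y == x * x.
p_xy = {(x, x * x): p for x, p in p_x.items()}

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for (x, y), p in p_xy.items())
e_xy = sum(x * y * p for (x, y), p in p_xy.items())   # E[X^3] = 0 by symmetry

print(e_xy - e_x * e_y)  # 0 -> uncorrelated, although Y is a function of X
```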



Jointly Gaussian Random Variables

Many practically occurring random variables are modeled as jointly Gaussian. For example, noise samples at different instants in a communication system are modeled as jointly Gaussian random variables.

Two random variables are called jointly Gaussian if their joint probability density function is

The joint pdf is determined by 5 parameters

 means

 variances

 correlation coefficient

We denote the jointly Gaussian random variables and with these parameters as

The joint pdf has a bell shape centred at as shown in the Figure 1 below. The variances

determine the spread of the pdf surface and determines the orientation of the surface in the



plane.

Figure 1 Jointly Gaussian PDF surface

Properties of jointly Gaussian random variables

(1) If and are jointly Gaussian, then and are both Gaussian.



We have

Similarly

(2) The converse of the above result is not true. If each of and is Gaussian, and are not
necessarily jointly Gaussian. Suppose

in this example is non-Gaussian and qualifies to be a joint pdf. Because,

and



The marginal density is given by

Similarly,

Thus and are both Gaussian, but not jointly Gaussian.

(3) If and are jointly Gaussian, then for any constants and ,the random variable given by

is Gaussian with mean and variance



(4) Two jointly Gaussian RVs and are independent if and only if and are uncorrelated. Observe that if and are uncorrelated, then

Example 1 Suppose X and Y are two jointly Gaussian zero-mean random variables with variances of 1 and 4, respectively, and a covariance of 1. Find the joint PDF.

We have
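The computation in Example 1 can be checked numerically. With variances 1 and 4 and covariance 1, the correlation coefficient is rho = 1/(1*2) = 1/2, and the joint pdf can be evaluated directly (a sketch, not library code):

```python
import math

def bivariate_gaussian_pdf(x, y, mx=0.0, my=0.0, sx=1.0, sy=2.0, rho=0.5):
    """Jointly Gaussian pdf with means mx, my, std devs sx, sy, correlation rho."""
    zx, zy = (x - mx) / sx, (y - my) / sy
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    norm = 2 * math.pi * sx * sy * math.sqrt(1 - rho**2)
    return math.exp(-q / 2) / norm

# At the mean point the exponent vanishes, so the peak value is
# 1 / (2*pi*sx*sy*sqrt(1 - rho^2)).
peak = bivariate_gaussian_pdf(0.0, 0.0)
print(math.isclose(peak, 1 / (2 * math.pi * 2 * math.sqrt(0.75))))  # True
```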

Joint Characteristic Functions of Two Random Variables

The joint characteristic function of two random variables X and Y is defined by

If and are jointly continuous random variables, then



Note that is the same as the two-dimensional Fourier transform with the basis function instead of

is related to the joint characteristic function by the Fourier inversion formula

If and are discrete random variables, we can define the joint characteristic function in terms of the
joint probability mass function as follows:

Properties of the Joint Characteristic Function

The joint characteristic function has properties similar to those of the characteristic function of a single random variable. We can easily establish the following properties:

1.

2.
3. If and are independent random variables, then

4. We have,



Hence,

In general, the order joint moment is given by

Example 2 The joint characteristic function of the jointly Gaussian random variables and with the
joint pdf

Let us recall the characteristic function of a Gaussian random variable



If and are jointly Gaussian,

we can similarly show that

We can use the joint characteristic functions to simplify the probabilistic analysis, as illustrated in the examples that follow:



Example 3 Linear transformation of two random variables

Suppose then

If and are jointly Gaussian, then

which is the characteristic function of a Gaussian random variable with

mean and variance

Thus the linear transformation of two Gaussian random variables is a Gaussian random variable.

Example 4 If Z = X + Y and X and Y are independent, then

Using the property of the Fourier transform, we get

Conditional Expectation

Recall that



 If are continuous random variables, then the conditional density function of

is given by

 If are discrete random variables, then the probability mass function of


is given by

The conditional expectation of is defined by

The conditional expectation of is also called the conditional mean of .

Clearly, denotes the centre of mass of the conditional pdf or the conditional pmf as shown in Figure 1.

Remark

 We can similarly define the conditional expectation of


 Higher-order conditional moments can be defined in a similar manner.

 Particularly, the conditional variance of is given by



Example 1

Consider the discrete random variables discussed in Example 4 in Lecture 18. The joint probability mass function of the random variables is tabulated in Table . Find the conditional expectation of

The conditional probability mass function is given by



Similarly, we can show that

We also note that

Example 2

Suppose are jointly uniform random variables with the joint probability density function
given by

Find



From the figure, in the shaded area.

Figure 2

We have

Example 3



Suppose are jointly Gaussian random variables with the joint probability density function
given by

Find .

We have,

which is a Gaussian distribution.



Therefore,

We can similarly show that

Conditional Expectation as a random variable

Using this function, we may define a random variable . Thus we may

consider as a function of the random variable and as the value of at .

Total expectation theorem

We establish the following results.

and

Proof :



Thus

and similarly

The above results simplify the calculation of the unconditional expectations .


We can also show that

and

Example 4 In example 1, we have
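The total expectation theorem E[Y] = E[E[Y|X]] can be verified numerically on a small discrete example (the joint PMF below is hypothetical, since Example 1's table values are elided in these notes):

```python
from fractions import Fraction as F

p_xy = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 2)}

# Marginal PMF of X.
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, F(0)) + p

def cond_mean_y(x):
    """E[Y | X = x] = sum_y y * p_{X,Y}(x, y) / p_X(x)."""
    return sum(y * p for (xx, y), p in p_xy.items() if xx == x) / p_x[x]

# Total expectation: average the conditional means over p_X.
e_y_total = sum(cond_mean_y(x) * p for x, p in p_x.items())
e_y_direct = sum(y * p for (x, y), p in p_xy.items())

print(e_y_total == e_y_direct)  # True
```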



Bayesian estimation theory and conditional expectation

Consider two random variables with joint pdf . Suppose is observable and
some a priori information about is available in the sense that some values of are more likely. We can

represent this prior information in the form of a prior density function . We have to estimate for

a given value in some optimal sense.

The conditional density function is called the likelihood function in estimation terminology.

Clearly

Also we have the Bayes rule



where is the a posteriori density function

Suppose the optimum estimator is a function of the random variable such that it minimizes the

mean-square estimation error . Such an estimator is known as the minimum mean-square


error (MMSE) estimator.

The estimation problem is

Minimize with respect to .

This is equivalent to minimizing

Since is always positive, the above integral will be minimum if the inner integral is minimum. This
results in the problem :

Minimize with respect to .


The minimum is given by



Example 5

Suppose are two jointly Gaussian random variables considered in the earlier example. We

have to estimate from a single observation .The MMSE estimator is given by

If and

then

Thus the MMSE estimator for two zero-mean jointly Gaussian random variables is linearly related to the data . This result plays an important role in the optimal filtering of random signals.

Markov Inequality
Let us first take a look at the Markov inequality. Even though the statement looks very simple, clever application of the inequality is at the heart of more powerful inequalities like Chebyshev or Chernoff. Initially, we will see the simplest version of the inequality and then discuss the more general version. The basic Markov inequality states that given a random variable X that can only take non-negative values, then



There are some basic things to note here. First, the term P(X >= k E(X)) is the probability that the random variable takes a value that exceeds k times its expected value. The term P(X >= E(X)) is related to the cumulative distribution function as 1 - P(X < E(X)). Since the variable is non-negative, this estimates the deviation on one side of the mean only.

Intuitive Explanation of Markov Inequality


Intuitively, what this means is that, given a non-negative random variable and its expected value E(X):
(1) The probability that X takes a value greater than twice the expected value is at most half. In other words, if you consider the pmf curve, the area under the curve for values beyond 2*E(X) is at most half.
(2) The probability that X takes a value greater than thrice the expected value is at most one third, and so on.
Let us see why that makes sense. Let X be a random variable corresponding to the scores of 100 students in an exam. The variable is clearly non-negative, as the lowest score is 0. Tentatively let's assume the highest value is 100 (even though we will not need it). Let us see how we can derive the bounds given by Markov inequality in this scenario. Let us also assume that the average score is 20 (must be a lousy class!). By definition, we know that the combined score of all students is 2000 (20*100).
Let us take the first claim – the probability that X takes a value greater than twice the expected value is at most half. In this example, it means the fraction of students who have a score greater than 40 (2*20) is at most 0.5. In other words, at most 50 students could have scored 40 or more. It is very clear that this must be the case. If 50 students got exactly 40 and the remaining students all got 0, then the average of the whole class is 20. Now, if even one additional student got a score greater than 40, then the total score of the 100 students becomes more than 2040 and the average exceeds 20.4, which contradicts our original information. Note that setting the scores of the other students to 0 is an oversimplification, and we can do without it: for example, we can argue that if 50 students got 40, then the total score is at least 2000 and hence the mean is at least 20.
We can also see how the second claim is true: the probability that X takes a value greater than thrice the expected value is at most one third. If a third of the students got 60 and the others got 0, then the total score is around 2000 and the average remains the same. Similarly, regardless of the scores of the other two thirds of the students, we know that the mean is at least 20.
This also must have made clear why the variable has to be non-negative. If some of the values are negative, then we cannot claim that the mean is at least some constant C. The values that do not exceed the threshold may well be negative and hence can pull the mean below the estimated value.
Let us look at it from the other perspective: let p be the fraction of students who have a score of at least a. Then it is very clear that the mean is at least a*p. What Markov inequality does is to turn this around. It says: if the mean is a*p, then the fraction of students with a score greater than a is at most p. That is, we know the mean here and use the threshold to estimate the fraction.

Generalized Markov Inequality



The probability that the random variable takes a value greater than k*E(X) is at most 1/k. The fraction 1/k acts as some kind of limit. Taking this further, you can observe that, given an arbitrary constant a, the probability that the random variable X takes a value >= a, i.e. P(X >= a), is at most 1/a times the expected value. This gives the general version of Markov inequality.

In the equation above, the fraction 1/a is separated out because that is the only varying part. We will later see that for Chebyshev we get a similar fraction. The proof of this inequality is straightforward. There are multiple proofs, though we will use the following one as it allows us to show Markov inequality graphically. This proof is partly taken from Mitzenmacher and Upfal's exceptional book on randomized algorithms.
Consider a constant a >= 0. Then define an indicator random variable I which takes the value 1 if X >= a, i.e.

Now we make a clever observation. We know that X is non-negative, i.e. X >= 0. This means that the fraction X/a is at least 0 and can be arbitrarily large. Also, if X < a, then X/a < 1; when X >= a, X/a >= 1. Using these facts,

If we take expectation on both sides, we get

But we also know that the expectation of an indicator random variable is the probability that it takes the value 1. This means E[I] = Pr(X >= a). Putting it all together, we get the Markov inequality.

Even more generalized Markov Inequality


Sometimes the random variable is not non-negative. In cases like this, a clever hack helps: design a function f(x) such that f(X) is non-negative. Then we can apply Markov inequality to the modified random variable f(X). The Markov inequality for this special case is:

This is a very powerful technique. Careful selection of f(X) allows you to derive more powerful bounds.
(1) One of the simplest examples is f(X) = |X|, which guarantees f(X) to be non-negative.
(2) Later we will show that Chebyshev inequality is nothing but Markov inequality that
uses
(3) Under some additional constraints, Chernoff inequality uses .

Simple Examples



Let us consider a simple example where Markov inequality provides a decent bound and one where it does not. A typical example where it works well is when the expected value is small but the threshold to test is very large.
Example 1:
Consider a coin that comes up heads with probability 0.2. Let us toss it n times. Now we can use Markov inequality to bound the probability that we got at least 80% heads.
Let X be the random variable indicating the number of heads in n tosses. Clearly, X is non-negative. Using linearity of expectation, we know that E[X] is 0.2n. We want to bound the probability P(X >= 0.8n). Using Markov inequality, we get

Of course we can estimate a finer value using the binomial distribution, but the core idea here is that we do not need to know it!
Example 2:
For an example where Markov inequality gives a bad result, let us take the example of a die. Let X be the face that shows up when we toss it. We know that E[X] is 7/2 = 3.5. Now let's say we want to find the probability that X >= 5. By Markov inequality,

The actual answer of course is 2/6, and the bound is quite off. This becomes even more bizarre, for example, if we find P(X >= 3). By Markov inequality,

The upper bound is greater than 1! Of course, using the axioms of probability, we can cap it at 1, while the actual probability is closer to 0.66. You can play around with the coin example or the score example to find cases where Markov inequality provides really weak results.
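The die example is easy to check in code:

```python
def markov_bound(mean, a):
    """Markov: P(X >= a) <= E[X] / a, for non-negative X."""
    return mean / a

faces = range(1, 7)
mean = sum(faces) / 6                          # 3.5
actual = sum(1 for f in faces if f >= 5) / 6   # 2/6

bound = markov_bound(mean, 5)
print(round(actual, 4), round(bound, 4))  # 0.3333 0.7
```

The bound holds (0.3333 <= 0.7) but is loose, exactly as the discussion above predicts.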

Tightness of Markov
The last example might have made you think that the Markov inequality is useless. On the contrary, it provided a weak bound because the amount of information we gave it is limited. All we provided were that the variable is non-negative and that the expected value is known and finite. In this section, we will show that it is indeed tight – that is, Markov inequality is already doing as much as it can.
From the previous example, we can see a case where Markov inequality is tight. If the mean score of 100 students is 20 and 50 students got a score of exactly 0, then Markov implies that at most 50 students can get a score of at least 40.

Consider a random variable X such that



We can estimate its expected value as

We can see that ,


This implies that the bound is actually tight! Of course, one of the reasons it was tight is that the other value is 0 and the value of the random variable is exactly k. This is consistent with the score example we saw above.

Chebyshev Inequality
Chebyshev inequality is another powerful tool that we can use. In this inequality, we remove the restriction that the random variable has to be non-negative. As a price, we now need to know additional information about the variable – (finite) expected value and (finite) variance. In contrast to Markov, Chebyshev allows you to estimate the deviation of the random variable from its mean. A common use of it estimates the probability of the deviation from the mean in terms of the standard deviation.
Similar to Markov inequality, we can state two variants of Chebyshev. Let us first take a look at the simplest version. Given a random variable X and its finite mean and variance, we can bound the deviation as

There are few interesting things to observe here :


(1) In contrast to Markov inequality, Chebyshev inequality allows you to bound the deviation on both sides of the mean.
(2) The deviation is measured on both sides, which is usually (but not always) tighter than the bound k E[X]. Similarly, the fraction 1/k^2 is much tighter than the 1/k we got from Markov inequality.
(3) Intuitively, if the variance of X is small, then Chebyshev inequality tells us that X is close to its expected value with high probability.
(4) Using Chebyshev inequality, we can claim that at most one fourth of the values that X can take are beyond 2 standard deviations of the mean.
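A quick numeric check of the Chebyshev bound P(|X - mu| >= a) <= var/a^2 on a fair die:

```python
faces = range(1, 7)
mu = sum(faces) / 6                            # 3.5
var = sum((f - mu) ** 2 for f in faces) / 6    # 35/12, about 2.9167

a = 2.0
actual = sum(1 for f in faces if abs(f - mu) >= a) / 6  # faces 1 and 6 -> 2/6
bound = var / a**2

print(round(actual, 4), round(bound, 4))  # 0.3333 0.7292
```

The bound holds on both sides of the mean, as the discussion above notes, though it is again not tight for this distribution.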

Generalized Chebyshev Inequality


A more general Chebyshev inequality bounds the deviation from mean to any constant a . Given a positive
constant a ,

Proof of Chebyshev Inequality


The proof of this inequality is straightforward and comes from a clever application of Markov inequality. As discussed above, we select . Using it, we get

We used the Markov inequality in the second line and the fact that .



Common Pitfalls
It is important to notice that Chebyshev provides a bound on both sides of the mean. One common mistake when applying Chebyshev is to divide the resulting probabilistic bound by 2 to get a one-sided error. This is valid only if the distribution is symmetric; otherwise it will give incorrect results. You can refer to Wikipedia for one-sided Chebyshev inequalities.

Chebyshev Inequality for higher moments


One of the neat applications of Chebyshev inequality is to use it for higher moments. As you would have observed, in Markov inequality we used only the first moment; in Chebyshev inequality we use the second moment (and the first). We can use the proof above to adapt Chebyshev inequality for higher moments. Here we give a simple argument for even moments only; for the general argument (odd and even), see the relevant Math Overflow post.
The proof of Chebyshev for higher moments is almost exactly the same as the one above. The only observation we make is that is always non-negative for any k. The next observation is that gives the 2k-th central moment . Using the statement from Mitzenmacher and Upfal's book, we get

It should be intuitive that the more moment information we use, the tighter the bound becomes. For Markov we got the fraction 1/t; it was σ²/a² for the second-order Chebyshev inequality and E[(X − µ)^(2k)]/a^(2k) for the 2k-th-order version.

Chebyshev Inequality and Confidence Interval


Using the Chebyshev inequality, we previously claimed that at most one fourth of the values that X takes lie beyond 2 standard deviations of the mean. It is possible to turn this statement around to get a confidence interval.
If at most 25% of the population is beyond 2 standard deviations from the mean, then we can be confident that at least 75% of the population lies in the interval [µ − 2σ, µ + 2σ]. More generally, at least (1 − 1/k²) × 100 percent of the population lies in the interval [µ − kσ, µ + kσ]. We can similarly derive that about 94% of the population lies within 4
standard deviations of the mean.
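The coverage statement above can be packaged as a one-line helper (a sketch; the function name is mine):

```python
# Chebyshev guarantees that at least 1 - 1/k^2 of the probability mass lies
# within k standard deviations of the mean, for any distribution.
def chebyshev_coverage(k: float) -> float:
    """Lower bound on P(mu - k*sigma < X < mu + k*sigma)."""
    return 1 - 1 / k ** 2

assert chebyshev_coverage(2) == 0.75      # at least 75% within 2 sigma
assert chebyshev_coverage(4) == 0.9375    # at least ~94% within 4 sigma
```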

Applications of Chebyshev Inequality


We previously saw two applications of the Chebyshev inequality: one to get tighter bounds using higher moments without invoking more complex inequalities, the other to estimate a confidence interval. There are some other useful applications that we state without proof.
(1) Using the Chebyshev inequality, we can prove that the median is at most one standard deviation away from the mean.
(2) The Chebyshev inequality also provides the simplest proof of the weak law of large numbers.

Tightness of Chebyshev Inequality



Similar to the Markov inequality, we can prove the tightness of the Chebyshev inequality. Define a random variable X as

X = µ + C with probability p
  = µ − C with probability p
  = µ     with probability 1 − 2p

so that E[X] = µ and var(X) = 2pC². If we want the probability that the variable deviates from its mean by the constant C, the bound provided by Chebyshev is

P(|X − µ| ≥ C) ≤ var(X)/C² = 2pC²/C² = 2p,

which equals the exact probability P(X = µ + C) + P(X = µ − C) = 2p. The bound is tight!
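A quick numerical check of this tightness claim (a sketch; the particular values of µ, C and p are arbitrary choices):

```python
# The three-point distribution above attains Chebyshev's bound exactly.
mu, C, p = 0.0, 3.0, 0.1
pmf = [(mu + C, p), (mu - C, p), (mu, 1 - 2 * p)]

mean = sum(v * pr for v, pr in pmf)
var = sum((v - mean) ** 2 * pr for v, pr in pmf)             # equals 2*p*C^2

exact_tail = sum(pr for v, pr in pmf if abs(v - mean) >= C)  # equals 2p
chebyshev_bound = var / C ** 2                               # also 2p
assert abs(exact_tail - chebyshev_bound) < 1e-12             # bound is attained
```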

10. Questions

Objective Questions

1. Which operator is used to calculate the average value?

a. A[ ]

b. E[ ]

c. D[ ]

d. Z[ ]

2. Mean square value of RV is given as

a. E[X²]

b. E²[X]

c. [E[X²]]²

d. E[X²]



3. If X, Y are random variables, E[AX + BY] equals

a. A E[X] + B E[Y]

b. AX + B E[Y]

c. A E[X] + BY

d. A E[X + BY]

4. If X is uniformly distributed in [0,1] then P( X > 0.2) is

a. 0.2

b. 0.5

c. 0.7

d. 0.9

5. COV( X,Y) is defined as

a. E[(X-µX)(Y-µY)]

b. E[X-µY]

c. E[Y-µX]

d. E[µX-µY]

Short Questions

1. Write the formula to express the pdf of an RV which is a function of another RV.

2. Define expected value of RV

3. Write the formula to find the expected value of continuous & discrete RV

4. Write the formula to find the expected value of function of RV

5. Define variance of RV

6. Write the formula to find the variance of RV



7. Define Standard Deviation.

8. Write the formula to find the standard deviation of RV

9. What is the Chebyshev inequality?

10. What is a moment generating function?

11. Define n-th order moment

12. Define n-th order characteristic function

Long Questions

1. The joint probability density function of (x,y) is given as

fX,Y(x,y) = 20 e^(-4x) e^(-5y),  x > 0, y > 0

=0 otherwise

Find the probability that

(i) 0 < x < 2 and 0.3 < y < 0.4


(ii) x < 1 and y > 0.3

2. Find the probability density function of Z = X + Y where X and Y are

(i) Any two random variables

(ii) Independent.

3. If X is a continuous random variable and Y= 4X + 2 then, find the pdf of Y.

4. Given: f(x,y) = k, 0<y<x<1

= 0 otherwise

Determine k and the conditional densities f(x|y) and f(y|x).

5. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are
(i) Orthogonal
(ii) independent
(iii) Uncorrelated?
(iv) Are uncorrelated random variables independent?

University Questions

Dec 2012

Q. If X and Y are two independent random variable with identical uniform distribution in (0,1) find
probability density function of (U,V) where U= X + Y and V=X-Y. Are U, V independent ?

May 2012

Q. The joint density function of two dimensional random variable (X,Y) is given by

f(x,y) = kxy e^(-(x² + y²))

Find (i) the value of k

(ii) the marginal density functions of X and Y

(iii) the conditional density function of Y given that X = x and the conditional density function of X given Y = y

(iv) Check for independence of X and Y

Q. Suppose X and Y are continuous random variable with joint probability density function

f(x,y) = x e^(-y);  0 < x < 2, y > 0

= 0 elsewhere



(i) Find joint cumulative function of X and Y
(ii) Find marginal probability function of X and Y

Q. Prove that if two random variables are independent, then density function of their sum is given by
convolution of their density functions.

Dec 2011

Q. The joint probability density function of (x,y) is given as

fX,Y(x,y) = 15 e^(-3x) e^(-5y),  x > 0, y > 0

=0 otherwise

Find the probability that

(i) x < 2 and y > 0.2


(ii) find marginal density of X and Y
(iii) Are X and Y independent
(iv) Find E[X/Y] and E[Y/X]

Q. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are (i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated
random variables independent?

May 2011

Q. The joint density function of two dimensional random variable (X,Y) is given by

f(x,y) = kxy e^(-(x² + y²))

Find (i) the value of k

(ii) the marginal density functions of X and Y

(iii) the conditional density function of Y given that X = x and the conditional density function of X given Y = y

(iv) Check for independence of X and Y



Q. Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are (i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated
random variables independent?

Dec 2010

Q. The joint probability density function of (X, Y) is given as fX,Y(x,y) = C(1 − x − y)

for the values of x and y for which (x,y) lies within the triangle as shown

outside the triangle fX,Y(x,y)= 0

Find (i) C

(ii) fX(x)

(iii) fY(y)

May 2010

The joint probability density function of (x,y) is given as

fX,Y(x,y) = C e^(-x) e^(-y),  0 < x < y < ∞

=0 otherwise



Find (i) the normalization constant C

(ii) fX(x)

(iii) FY(y)

(iv) FX(x|y)

(v) FY(y|x)

(vi) E(Y|x)

(vii) E(X|y)

Q. If fX,Y(x,y) = 2 e^(-x) e^(-y),  0 < x < y < ∞

=0 otherwise

Find the correlation coefficient of X and Y. Are X and Y independent ?

Q. State and explain joint & conditional probabilities of the event.

Q. Find the characteristic function of the Laplace random variable with density

f(x) = (m/2) e^(-m|x|),  -∞ < x < ∞;

also find its mean and variance.

Dec 2009

Q. The joint probability density function of (x,y) is given as

fX,Y(x,y) = 15 e^(-3x) e^(-5y),  x > 0, y > 0

=0 otherwise

Find the probability that

(i) 1 < x < 2 and 0.2 < y < 0.3

(ii) x < 2 and y > 0.2

Q. If X and Y are two independent exponential random variables with probability density functions given by

fX(x) = 2 e^(-2x),  x > 0
      = 0,          x < 0   and

fY(y) = 3 e^(-3y),  y > 0
      = 0,          y < 0

Find the probability density function of Z = X + Y.

June 2005

Q. We define the conditional cdf of Y given X = x by

FY(y|x) = lim(h→0) FY(y | x < X ≤ x + h),

and applying Bayes' rule it can be written as

FY(y | x < X ≤ x + h) = P[Y ≤ y, x < X ≤ x + h] / P[x < X ≤ x + h].

Show that fY(y|x) = d/dy [FY(y|x)] = fXY(x, y) / fX(x).

Find the characteristic function Фx(w) and hence determine the expected value of X.

May 2007

Q. The joint probability density function of the two-dimensional random variable (X, Y) is given by fXY(x, y) = 4xy e^(-(x² + y²)),  x > 0, y > 0

(i) Find the marginal density functions of x and y.

(ii) Find the conditional density function of Y given that X=x and the conditional density



function of X given that Y = y.

(iii) Check for independence of X and Y.

Q. Prove that if two random variables are independent, then density function of their sum is given by
convolution of their density functions.

Q. If X and Yare two independent exponential random variables with probability density functions given by

fx(x) = 2.e-2x , x > 0

= 0, x < 0 and

fy(Y) = 3.e-3y , y > 0

= 0, y<0

Find the probability density function of z =x + y.

Q. The joint probability density function of (X, Y) is fXY(x,y) = 8 e^(-(2x + 4y));  x, y > 0.

If U = X/Y and V = Y, find the joint probability density function of (U, V) and hence find the probability density
function of U.

Q. If X and Y are two random variables with standard deviations σX and σY, and if CXY is the covariance
between them, then prove:

(i) CXY = RXY − E[X]·E[Y]

(ii) |CXY| ≤ σX·σY.

Also deduce that −1 ≤ ρ ≤ 1.

Q. If X = cos θ and Y = sin θ where θ is uniformly distributed over (0,2π).



Prove that -

(i) X and Y are uncorrelated.

(ii) X and Y are not independent.

Dec 2006

(a) Suppose X and Y are two random variables. Define covariance and

correlation coefficient of X and Y. When do we say that X and Y are

(i) Orthogonal (ii) independent and (iii) uncorrelated? Are uncorrelated random variables independent?

(b) Suppose that X and Y are continuous random variables with joint probability density function:

fXY(x, y) = (1/2) x e^(-y),  0 < x < 2, y > 0

= 0 elsewhere

find: (i) the joint distribution function of X and Y, and

(ii) the marginal probability density functions of X and Y.

Q. (a) Find the probability density function of Z = X + Y where X and Y are

(i) any two random variables (ii) independent.

If X and Y are independent Binomial random variables with parameters (m, p) and (n, p) respectively, obtain
the distribution of X + Y.



(b) Given: f(x,y) = k, 0<x<y<1

= 0 otherwise

Determine k and the conditional densities f(x|y) and f(y|x).

June 2006

(a) Suppose X and Y are two random variables. Define covariance and correlation coefficient of X and Y.
When do we say that X and Y are (i) orthogonal (ii) independent and (iii) uncorrelated?

(b) Suppose X and Y are continuous random variables with joint probability density function

f(x, y) = (x + y)²/40,  −1 < x < 1, −3 < y < 3

        = 0 elsewhere

Find (i) the marginal density functions of X and Y

(ii) means and variances of X and Y

(iii) correlation coefficient of X and Y [12]

(a) Suppose X and Y are independent random variables and each is exponentially distributed with common parameter λ. That is,

fX(x) = λ e^(-λx) and fY(y) = λ e^(-λy). Let the random variables U and V be given by U = X + Y and V = X − Y; obtain the
joint density of U and V and the marginal density of U.

(b) Let fXY(x,y) = 1,  0 < |y| < x < 1.

Determine E(X|Y) and E(Y|X).

Dec 2005

(b) If X and Y are independent random variables and Z = X + Y, find fZ(z) by the transform method.

June 2005



Q. fX,Y(x,y) = C e^(-x) e^(-y),  0 < y < x < ∞

=0 otherwise

Find the value of constant C

Find (i) fX(x) (ii) FX(x) (iii) FY(y) (iv) FX(x|y) (v) fY(y|x) (vi) E(Y|x) (vii) E(X|y)

Q. (a) Let X and Y be two continuous random variables.

(i) Derive an expression for their joint moment about the origin. Why is it called the correlation?
Explain its physical significance.

(ii) Derive an expression for their joint central moment. Why is it called the covariance? Explain
its physical significance.

(iii) Derive an expression for their normalized covariance. Why is it called the correlation
coefficient? Explain its physical significance. What is its range of values?

(iv) Explain when X and Y are orthogonal, when they are independent and when they are uncorrelated.

Q. fX,Y(x,y) = (x + y)²/40,  −1 < x < 1, −3 < y < 3

=0 otherwise

(i) Find marginal densities of X and Y.

(ii) Find mean and variance of X and Y.

(iii) Find second order moment of X and Y.

(iv) Find correlation coefficient of X and Y.

Dec 2004

(a) Let X and Y be two continuous random variables. What is meant by their correlation function RXY? Derive
an expression for RXY given their joint density function fXY(x, y). What happens to RXY

(i) if X and Y are independent? (ii) if X and Y are orthogonal?

(b) What is meant by the covariance function CXY of the random variables X and Y? Write an expression
for the covariance, given that the mean values of X and Y are µX and µY respectively. Under which conditions is CXY
positive? Under which conditions is CXY negative? What is the normalized covariance or correlation coefficient?
Write the expression for ρ and its range of values.

(c) Let X and Y be two continuous random variables with means equal to 7/12 and 7/12 respectively,
variances equal to 11/144 and 11/144 respectively, and joint probability density function

fX,Y(x,y) = x + y,  0 < x < 1, 0 < y < 1

          = 0 otherwise

Determine the correlation function, covariance function, and correlation coefficient of X and Y.

Q. Define the characteristic functions ΦX(ω) and ΦY(ω) of the continuous random variables X and Y
respectively, and find the probability density function fZ(z), given Z = X + Y.



CHAPTER-4

Sequence of Random Variables and Convergence

1. Motivation

When a random variable is a function of another random variable whose statistics are known, we can obtain
the statistics of the unknown random variable in terms of those of the known one. This chapter extends
these ideas to sequences of random variables and asks in what sense such a sequence can converge.

2. Syllabus

Sr. No. | Topic | Fine Detailing | No. of Hours | Self Study

01 | Stochastic Convergence and limit theorems | Sequence of random variables; convergence everywhere, almost everywhere | 1 hour | 1 hour

   |  | MS, in probability, in distribution; comparison of convergence modes | 1 hour | 1 hour

   |  | Strong law of large numbers; Central Limit Theorem and its significance | 1 hour | 1 hour

3. Books Recommended

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic


Processes, 4th Edition, McGraw-Hill, 2002

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,


4th edition, Mc-Graw Hill, 2000

3. H. Stark and J.W. Woods, Probability and Random Processes with


Applications to Signal Processing, 3e, Pearson edu

4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal


processing and communication, first ed2007, Elsevier

4. Weightage in University Examination: 8-16 marks.

5. Objective

In this chapter we study the laws of large numbers and the central limit theorem, which is one of the most
remarkable results in probability theory.

6. Key Notation



E[ ], µ Expected value

σ Standard deviation

σ², var[ ] Variance

E[X^n] n-th order moment of RV

ΦX(ω) Characteristic function

7. Key Definitions & Relations

1. The weak law of large numbers states that the sample average X̄n = (1/n)(X1 + X2 + … + Xn) converges in probability towards
the expected value µ.

That is to say, for any positive number ε,

lim(n→∞) P(|X̄n − µ| > ε) = 0.
2. Central Limit Theorem

Consider independent random variables X1, X2, …, Xn. The mean and variance of each
of the random variables are assumed to be known. Suppose E[Xi] = µi and var[Xi] = σi².

Form a random variable

Y = X1 + X2 + … + Xn.

The mean and variance of Y are given by

µY = µ1 + µ2 + … + µn  and  σY² = σ1² + σ2² + … + σn².


9. Theory

Convergence of a sequence of random variables

Let X1, X2, …, Xn be a sequence of n independent and identically distributed random variables. Suppose
we want to estimate the mean µX of the random variable on the basis of the observed data by means of the
relation

µ̂n = (1/n) Σ(i=1..n) Xi.

How closely does µ̂n represent the true mean µX as n is increased? How do we measure the
closeness between µ̂n and µX?

Notice that µ̂n is a random variable. What do we mean by the statement "µ̂n converges to µX"?

 Consider a deterministic sequence of real numbers x1, x2, …. The sequence converges to a
limit x if, corresponding to every ε > 0, we can find a positive integer N such that |xn − x| < ε for all n > N.

For example, the sequence {1/n} converges to the number 0,
because for any ε > 0 we can choose a positive integer N > 1/ε such that |1/n − 0| < ε for all n > N.

 The Cauchy criterion gives the condition for convergence of a sequence without actually finding the
limit. The sequence {xn} converges if and only if, for every ε > 0, there exists a positive
integer N such that |x(n+m) − xn| < ε for all n > N and all m ≥ 1.

Convergence of a random sequence cannot be defined as above. Note that for each sample point s,
{Xn(s)} represents a sequence of numbers.

Thus {Xn} represents a family of sequences of numbers. Convergence of a random sequence is
therefore defined using different criteria. Five of these criteria are explained below.

1. Convergence Everywhere

A sequence of random variables {Xn} is said to converge everywhere to X if Xn(s) → X(s) as n → ∞ for every sample point s.

Note here that the sequence of numbers for each sample point is convergent.

2. Almost sure (a.s.) convergence or convergence with probability 1

A random sequence {Xn} may not converge for every sample point s. Consider the event {s : Xn(s) → X(s)}.

The sequence {Xn} is said to converge to X almost surely, or with probability 1, if

P{s : Xn(s) → X(s) as n → ∞} = 1.

We write Xn → X (a.s.) in this case.

One important application is the Strong Law of Large Numbers (SLLN):

If X1, X2, …, Xn are independent and identically distributed random variables with a finite mean
µX, then

(1/n) Σ(i=1..n) Xi → µX almost surely as n → ∞.

Remark:

 X̄n = (1/n) Σ Xi is called the sample mean.

 The strong law of large numbers states that the sample mean converges to the true mean as the
sample size increases.
 The SLLN is one of the fundamental theorems of probability. There is a weaker version of the law
that we will discuss later.

3. Convergence in the mean-square sense

A random sequence {Xn} is said to converge in the mean-square sense (m.s.) to a random
variable X if

E[(Xn − X)²] → 0 as n → ∞.

X is called the mean-square limit of the sequence and we write

l.i.m. Xn = X,

where l.i.m. means limit in mean-square. We also write Xn → X (m.s.).

 The following Cauchy criterion gives the condition for m.s. convergence of a random sequence
without actually finding the limit. The sequence {Xn} converges in m.s. if and only if,
for every ε > 0, there exists a positive integer N such that

E[(X(n+m) − Xn)²] < ε for all n > N and all m ≥ 1.


Example 1

If X1, X2, …, Xn are iid random variables with mean µ and variance σ², then the sample mean
X̄n = (1/n) Σ Xi converges to µ in the mean-square sense:

we have to show that E[(X̄n − µ)²] → 0 as n → ∞.

Now,

E[(X̄n − µ)²] = var(X̄n) = σ²/n → 0 as n → ∞.
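A numerical illustration of this mean-square convergence (a sketch, not part of the notes; the uniform(0,1) distribution, trial count and seed are my choices): the empirical value of E[(X̄n − µ)²] shrinks like σ²/n as n grows.

```python
import random

random.seed(1)

def mse_of_sample_mean(n, trials=2000):
    """Monte Carlo estimate of E[(Xbar_n - mu)^2] for uniform(0,1) samples."""
    mu = 0.5                          # mean of uniform(0,1)
    total = 0.0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / trials

# sigma^2 = 1/12, so the MSE should fall roughly tenfold at each step
assert mse_of_sample_mean(100) < mse_of_sample_mean(10) < mse_of_sample_mean(1)
```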

4. Convergence in probability

Associated with the sequence of random variables {Xn} we can define a sequence of
probabilities

P{|Xn − X| > ε}

for every ε > 0.

The sequence {Xn} is said to be convergent to X in probability if this sequence of probabilities is
convergent to 0, that is,

P{|Xn − X| > ε} → 0 as n → ∞ for every ε > 0.

We write Xn → X (P) to denote convergence in probability of the sequence of random variables {Xn}
to the random variable X.

If a sequence is convergent in mean square, then it is convergent in probability also, because

P{|Xn − X| > ε} = P{(Xn − X)² > ε²} ≤ E[(Xn − X)²]/ε²   [Markov inequality]

Thus we have: if E[(Xn − X)²] → 0 (mean-square convergence), then P{|Xn − X| > ε} → 0.

Example 2

Suppose {Xn} is a sequence of random variables with

Clearly,

Therefore
Thus the above sequence converges to a constant in probability.

Remark:
Convergence in probability is also called stochastic convergence.

Weak Law of Large Numbers

Suppose X1, X2, …, Xn are independent and identically distributed random variables with mean µ and
variance σ², and sample mean

X̄n = (1/n) Σ(i=1..n) Xi.

We have, by the Chebyshev inequality,

P{|X̄n − µ| > ε} ≤ σ²/(nε²) → 0 as n → ∞,

so the sample mean converges to µ in probability.

5. Convergence in distribution

Consider the random sequence {Xn} and a random variable X. Suppose F(Xn)(x) and F(X)(x)
are the distribution functions of Xn and X respectively. The sequence is said to converge to X in
distribution if

F(Xn)(x) → F(X)(x) as n → ∞

for all x at which F(X)(x) is continuous. Here the two distribution functions eventually coincide. We write

Xn → X (d) to denote convergence in distribution of the random sequence {Xn} to the random variable
X.

Example 3. Suppose {Xn} is a sequence of RVs, with each random variable having the

uniform density



define

We can show that

clearly,

Relation between Types of Convergence

Central Limit Theorem

Consider independent random variables X1, X2, …, Xn. The mean and variance of each of the
random variables are assumed to be known. Suppose E[Xi] = µi and var[Xi] = σi². Form a random
variable

Yn = X1 + X2 + … + Xn.

The mean and variance of Yn are given by

µY = µ1 + µ2 + … + µn

and

σY² = σ1² + σ2² + … + σn².

Thus we can determine the mean and the variance of Yn.

Can we guess the probability distribution of Yn?

The central limit theorem (CLT) provides an answer to this question.

The CLT states that under very general conditions the suitably normalized sum converges in distribution to
a Gaussian random variable as n → ∞. The conditions are:

1. The random variables X1, X2, …, Xn are independent and identically distributed.

2. The random variables are independent with the same mean and variance, but not
identically distributed.

3. The random variables are independent with different means and the same variance, and
not identically distributed.

4. The random variables are independent with different means and each variance
being neither too small nor too large.

We shall consider the first condition only. In this case, the central limit theorem can be stated as follows:

Suppose {Xi} is a sequence of independent and identically distributed random variables,
each with mean µX and variance σX², and let

Zn = (X1 + X2 + … + Xn − nµX)/√n.

Then the sequence {Zn} converges in
distribution to a Gaussian random variable with mean 0 and variance σX². That is,



Remarks

 The central limit theorem is really a property of convolution. Consider the sum of two statistically
independent random variables, say Z = X + Y. Then the pdf fZ(z) is the convolution of fX(z) and fY(z):

fZ(z) = fX(z) * fY(z),

where * is the convolution operation. This can also be shown with the help of the characteristic
functions, since ΦZ(ω) = ΦX(ω) ΦY(ω).

We can illustrate the CLT by convolving two uniform distributions repeatedly. In Figure 1, the
convolution of two uniform distributions gives a triangular distribution. Further convolution gives a
parabolic distribution and so on.
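The first convolution step can be checked numerically. This sketch is not part of the original notes; the grid resolution N is an arbitrary choice.

```python
# Convolving two uniform(0,1) densities on a grid gives the triangular
# density on (0, 2) -- the first step of the repeated-convolution picture.
N = 400
h = 1.0 / N                      # grid spacing
u = [1.0] * N                    # uniform(0,1) density sampled on a grid

def convolve(f, g):
    """Discrete approximation of the density convolution (f * g)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * h
    return out

tri = convolve(u, u)             # density of U1 + U2
assert abs(max(tri) - 1.0) < 0.01       # triangular density peaks at height 1
assert abs(sum(tri) * h - 1.0) < 0.01   # total probability is still 1
```

Convolving `tri` with `u` again would give the parabolic (quadratic-spline) density mentioned above.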



Proof of the Central Limit Theorem :

We give a less rigorous proof of the theorem with the help of the characteristic function. Further, we
consider each of the Xi to have zero mean. Thus,

Zn = (X1 + X2 + … + Xn)/√n.

Clearly, E[Zn] = 0 and var(Zn) = σX².

The characteristic function of Zn is given by

Φ(Zn)(ω) = E[e^(jωZn)] = [ΦX(ω/√n)]^n.

We will show that as n → ∞ the characteristic function is of the form of the characteristic function of a
Gaussian random variable.

Expanding ΦX(ω/√n) in a power series,



Assume all the moments of X to be finite. Then

ΦX(ω/√n) = 1 − (σX² ω²)/(2n) + Rn(ω),

where Rn(ω) is the sum of terms involving 1/n^(3/2) and higher powers of 1/√n.

Note also that each term in Rn(ω) involves a ratio of a higher moment and a power of n, and therefore

Φ(Zn)(ω) = [1 − (σX² ω²)/(2n) + Rn(ω)]^n → e^(−σX² ω²/2) as n → ∞,

which is the characteristic function of a Gaussian random variable with 0 mean and variance σX².

Remark

1. Under the conditions of the CLT, the sample mean converges in distribution to a Gaussian random variable.

In other words, if samples are taken from any distribution with mean µ and variance
σ², then as the sample size increases, the distribution function of the sample mean approaches the
distribution function of a Gaussian random variable.

2. The CLT states that the distribution function converges to a Gaussian distribution function.

The theorem does not say that the pdf is a Gaussian pdf in the limit. For example, suppose

each Xi has a Bernoulli distribution. Then the pdf of Y consists of impulses and can never approach



the Gaussian pdf.

3. The Cauchy distribution does not meet the conditions for the central limit theorem to hold. As we
have noted earlier, this distribution does not have a finite mean or a finite variance. Suppose a

random variable X has the Cauchy density

fX(x) = (1/π) · 1/(1 + x²).

The characteristic function of X is given by

ΦX(ω) = e^(−|ω|).

The sample mean X̄n will then have the characteristic function

[ΦX(ω/n)]^n = e^(−|ω|),

i.e. X̄n is again a Cauchy random variable.
Thus the sum of a large number of Cauchy random variables will not follow a Gaussian distribution.
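This behaviour is easy to see by simulation. The sketch below is not in the notes; the tangent-based sampler, the trial counts and the seed are my choices. The spread of the sample mean of Cauchy variables does not shrink as n grows, unlike for finite-variance distributions.

```python
import math
import random

random.seed(3)

def cauchy():
    # inverse-CDF sampling of a standard Cauchy random variable
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_sample_means(n, trials=2000):
    """Interquartile range of the sample mean of n Cauchy variables."""
    means = sorted(sum(cauchy() for _ in range(n)) / n for _ in range(trials))
    return means[3 * trials // 4] - means[trials // 4]

# The IQR stays near 2 (that of a single standard Cauchy) for any n,
# because the sample mean is again standard Cauchy.
assert abs(iqr_of_sample_means(1) - 2.0) < 0.5
assert abs(iqr_of_sample_means(50) - 2.0) < 0.5
```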

4. The central limit theorem is one of the most widely used results of probability. If a random variable
is the result of several independent causes, then the random variable can be considered to be Gaussian.
For example,

(a) the thermal noise in a resistor is the result of the independent motion of billions of electrons
and is modelled as Gaussian;
(b) the observation error / measurement error of any process is modelled as Gaussian.

5. The CLT can be used to simulate a Gaussian distribution given a routine to simulate a particular
random variable.

6. Normal approximation of the Binomial distribution: one of the applications of the CLT is the approximation of the Binomial coefficients. We have already

stated this approximation. Suppose X1, X2, .... is a sequence of Bernoulli(p) random


variables with P(Xi = 1) = p and P(Xi = 0) = 1 − p.

Then Sn = X1 + X2 + … + Xn has a Binomial distribution with mean np and variance np(1 − p).

Thus,

P(Sn = k) = C(n, k) p^k (1 − p)^(n−k)

or,

P(Sn = k) ≈ (1/√(2πnp(1 − p))) e^(−(k − np)²/(2np(1 − p)))   (assuming the integrand interval = 1).

This is the normal approximation to the Binomial coefficients and is known as the De Moivre-Laplace
approximation.
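A quick numerical check of the De Moivre-Laplace approximation (a sketch; n, p and k are arbitrary choices):

```python
import math

# Compare the exact Binomial(n, p) probability at k with the Gaussian
# density of mean np and variance np(1 - p), evaluated at the same k.
n, p = 1000, 0.5
k = 500
exact = math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # Binomial pmf at k
mu, var = n * p, n * p * (1 - p)
approx = math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
assert abs(exact - approx) / exact < 0.01               # within 1% at n = 1000
```

The agreement improves as n grows, exactly as the CLT predicts.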

Law of large numbers (LLN)

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing
the same experiment a large number of times. According to the law, the average of the results obtained
from a large number of trials should be close to the expected value, and will tend to become closer as more
trials are performed.

The LLN is important because it "guarantees" stable long-term results for the averages of some random
events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will
tend towards a predictable percentage over a large number of spins. Any winning streak by a player will
eventually be overcome by the parameters of the game. It is important to remember that the LLN only
applies (as the name indicates) when a large number of observations is considered. There is no principle
that a small number of observations will coincide with the expected value or that a streak of one value will
immediately be "balanced" by the others.

Examples

For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5 or 6, each with
equal probability. Therefore, the expected value of a single die roll is

(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.


According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their
values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more
dice are rolled.
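A short simulation of this die-rolling example (a sketch; the number of rolls and the seed are my choices):

```python
import random

# The sample mean of many fair die rolls approaches E[X] = 3.5.
random.seed(7)
rolls = [random.randint(1, 6) for _ in range(200_000)]
avg = sum(rolls) / len(rolls)
assert abs(avg - 3.5) < 0.02   # close to the expected value 3.5
```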

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli
trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the
theoretical probability of success, and the average of n such variables (assuming they are independent and
identically distributed (i.i.d.)) is precisely the relative frequency.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability
that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the
proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion
of heads after n flips will almost surely converge to 1/2 as n approaches infinity.

Though the proportion of heads (and tails) approaches 1/2, almost surely the absolute (nominal) difference
in the number of heads and tails will become large as the number of flips becomes large. That is, the
probability that the absolute difference is a small number, approaches zero as the number of flips becomes
large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero.
Intuitively, expected absolute difference grows, but at a slower rate than the number of flips, as the
number of flips grows.
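The coin-flip statements above can likewise be checked by simulation (a sketch; the sample sizes and the seed are arbitrary choices):

```python
import random

# The proportion of heads approaches 1/2 as the number of flips grows,
# even though the absolute head-tail difference typically grows too.
random.seed(42)

def flips(n):
    heads = sum(random.randint(0, 1) for _ in range(n))
    return heads / n, abs(2 * heads - n)   # (proportion, |heads - tails|)

prop_small, diff_small = flips(100)
prop_large, diff_large = flips(100_000)
assert abs(prop_large - 0.5) < 0.02        # proportion is close to 1/2
```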

Forms

Two different versions of the law of large numbers are described below; they are called the strong law of
large numbers, and the weak law of large numbers. Both versions of the law state that – with virtual
certainty – the sample average

X̄n = (1/n)(X1 + X2 + … + Xn)

converges to the expected value:

X̄n → µ as n → ∞,

where X1, X2, ... is an infinite sequence of i.i.d. Lebesgue integrable random variables with expected value
E(X1) = E(X2) = ... = µ. Lebesgue integrability of Xj means that the expected value E(Xj) exists according to
Lebesgue integration and is finite.

An assumption of finite variance Var(X1) = Var(X2) = ... = σ² < ∞ is not necessary. Large or infinite variance
will make the convergence slower, but the LLN holds anyway. This assumption is often used because it
makes the proofs easier and shorter.

The difference between the strong and the weak version is concerned with the mode of convergence being
asserted. For interpretation of these modes, see Convergence of random variables.



Weak law

The weak law of large numbers (also called Khintchine's law) states that the sample average converges in
probability towards the expected value.

That is to say, for any positive number ε,

lim(n→∞) P(|X̄n − µ| > ε) = 0.

Interpreting this result, the weak law essentially states that for any nonzero margin specified, no matter
how small, with a sufficiently large sample there will be a very high probability that the average of the
observations will be close to the expected value; that is, within the margin.

Convergence in probability is also called weak convergence of random variables. This version is called the
weak law because random variables may converge weakly (in probability) as above without converging
strongly (almost surely) as below.

Strong law

The strong law of large numbers states that the sample average converges almost surely to the expected
value.

That is,

P(lim(n→∞) X̄n = µ) = 1.

The proof is more complex than that of the weak law. This law justifies the intuitive interpretation of the
expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-
term average".

Almost sure convergence is also called strong convergence of random variables. This version is called the
strong law because random variables which converge strongly (almost surely) are guaranteed to converge
weakly (in probability). The strong law implies the weak law but not vice versa, when the strong law
conditions hold the variable converges both strongly (almost surely) and weakly (in probability) . However
the weak law may hold in conditions where the strong law does not hold and then the convergence is only
weak (in probability) .

There are different views among mathematicians as to whether the two laws could be unified into one law,
thereby replacing the weak law.



To date, however, the strong law has not been proved to hold under exactly the same conditions as the weak law.

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem.

Moreover, if the summands are independent but not identically distributed, then

X̄n − E[X̄n] → 0 almost surely,

provided that each Xk has a finite second moment and

Σ(k=1..∞) Var(Xk)/k² < ∞.
Differences between the weak law and the strong law

The weak law states that for a specified large n, the average X̄n is likely to be near µ. Thus, it leaves open
the possibility that |X̄n − µ| > ε happens an infinite number of times, although at infrequent intervals.

The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1,
we have that for any ε > 0 the inequality |X̄n − µ| < ε holds for all large enough n.

Central Limit Theorem Examples

A Central Limit Theorem word problem will most likely contain the phrase “assume the variable is normally
distributed”, or one like it. With these central limit theorem examples, you will be given:

A population (i.e. 29-year-old males, seniors between 72 and 76, all registered vehicles, all cat owners)

An average (i.e. 125 pounds, 24 hours, 15 years, $15.74)

A standard deviation (i.e. 14.4lbs, 3 hours, 120 months, $196.42)

A sample size (i.e. 15 males, 10 seniors, 79 cars, 100 households)

Central Limit Theorem Examples: Greater than

General Steps
Step 1: Identify the parts of the problem. Your question should state:

the mean (average or μ)

the standard deviation (σ)


population size

sample size (n)

a number associated with “greater than” (X̄). Note: this is the sample mean. In other words, the problem
is asking you “What is the probability that a sample mean of n items will be greater than this number?”

Step 2: Draw a graph. Label the center with the mean. Shade the area roughly above X̄ (i.e. the “greater
than” area). This step is optional, but it may help you see what you are looking for.

Step 3: Use the z-score formula, plugging in the numbers from step 1:

z = (X̄ − µ) / (σ / √n)

In other words, all this formula is asking you to do is:

Subtract the mean (µ in step 1) from the “greater than” value (X̄ in step 1). Set this number aside for a
moment.

Divide the standard deviation (σ in step 1) by the square root of your sample size (n in step 1). For example, if
thirty-six children are in your sample and your standard deviation is 3, then 3/√36 = 0.5.

Divide your first result by your second result.

Step 4: Look up the z-score you calculated in step 3 in a z-table; this gives the area between the mean and
your z-score.

Step 5: Subtract that area from 0.5 to get the “greater than” tail. For example, if the table value is 0.1554, then 0.5 – 0.1554 = 0.3446.

Step 6: Convert the decimal in Step 5 to a percentage. In our example, 0.3446 = 34.46%.

That’s it!
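The steps above can be sketched in a few lines of code. This is an illustrative helper (the function names and the sample numbers µ = 100, σ = 15, n = 36, X̄ = 102.5 are our own, not from the text), using the standard normal CDF built from math.erf:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_sample_mean_greater(xbar, mu, sigma, n):
    """P(sample mean > xbar) for a sample of size n, by the CLT."""
    z = (xbar - mu) / (sigma / math.sqrt(n))   # Step 3: the z-score
    return 1.0 - normal_cdf(z)                 # Steps 4-6: the upper tail

# Hypothetical numbers: mu = 100, sigma = 15, n = 36, xbar = 102.5 -> z = 1.0
print(prob_sample_mean_greater(102.5, 100, 15, 36))  # ~0.1587
```

Subtracting the table area from 0.5 and taking `1 - normal_cdf(z)` are the same thing, since the table lists the area between the mean and z.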

2. Specific Example



Q. A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation
of $20. If a random sample of 25 people is taken, what is the probability their mean benefit will be greater
than $120 per week?

Step 1: Insert the information into the z-formula:

z = (120 − 110) / (20/√25) = 10 / (20/5) = 10/4 = 2.5

Step 2: Look up the z-score in a table (or calculate it using technology). A z-score of 2.5 has an area of
roughly 49.38% between the mean and z. Adding 50% (for the left half of the curve) gives 99.38%, the probability
of a mean less than $120. Since the question asks for greater than $120, subtract from 100%:
100% − 99.38% = 0.62%.
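As a numeric check of this example (a sketch; normal_cdf is our own helper built on math.erf, not from the text):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n, xbar = 110, 20, 25, 120
z = (xbar - mu) / (sigma / math.sqrt(n))  # (120-110)/(20/5) = 2.5
p_greater = 1.0 - normal_cdf(z)
print(z)          # 2.5
print(p_greater)  # ~0.0062, i.e. about 0.62%
```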

Central Limit Theorem Examples: Less than

1. General Steps
Step 1: Identify the parts of the problem. Your question should state:

the mean (average or μ)

the standard deviation (σ)

population size

sample size (n)

a number associated with “less than” (X̄)

Step 2: Draw a graph. Label the center with the mean. Shade the area roughly below X̄ (i.e. the “less than”
area). This step is optional, but it may help you see what you are looking for.

Step 3: Use the z-score formula, plugging in the numbers from step 1:

z = (X̄ − µ) / (σ / √n)

If formulas confuse you, all this formula is asking you to do is:

Subtract the mean (µ in step 1) from the “less than” value (X̄ in step 1). Set this number aside for a
moment.

Divide the standard deviation (σ in step 1) by the square root of your sample size (n in step 1). For example, if
thirty-six children are in your sample and your standard deviation is 3, then 3/√36 = 0.5.

Divide your first result by your second result.

Step 4: Look up the z-score you calculated in step 3 in a z-table; this gives the area between the mean and
your z-score.

Step 5: Add that area to 0.5. For example, if the table value is 0.1554, then 0.5 + 0.1554 = 0.6554.

Step 6: Convert the decimal in Step 5 to a percentage. In our example, 0.6554 = 65.54%.

That’s it!

2. Specific Example
A population of 29 year-old males has a mean salary of $29,321 with a standard deviation of $2,120. If a
sample of 100 men is taken, what is the probability their mean salaries will be less than $29,000?
Step 1: Insert the values into the z-formula:

z = (29,000 − 29,321) / (2,120/√100) = −321/212 = −1.51

Step 2: Look up the z-score. A z-score of −1.51 leaves an area of 93.45% to its right (“greater than”). Since
the question asks for LESS THAN, subtract from 100%:
100% − 93.45% = 6.55%, or about 0.07.
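A numeric check of this example (again using a normal_cdf helper of our own built on math.erf; note that the CDF gives the lower tail directly, so no table gymnastics are needed):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n, xbar = 29321, 2120, 100, 29000
z = (xbar - mu) / (sigma / math.sqrt(n))  # -321/212
p_less = normal_cdf(z)                    # lower tail = P(mean < 29,000)
print(round(z, 2))       # -1.51
print(round(p_less, 3))  # ~0.065
```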

Central Limit Theorem Examples: Between

Sample problem: There are 250 dogs at a dog show who weigh an average of 12 pounds, with a standard
deviationof 8 pounds. If 4 dogs are chosen at random, what is the probability they have an average weight
of greater than 8 pounds and less than 25 pounds?

Step 1: Identify the parts of the problem. Your question should state:

the mean (average or µ)

the standard deviation (σ)

population size

sample size (n)

a number associated with “less than” (X̄1 = 25)

a number associated with “greater than” (X̄2 = 8)

Step 2: Draw a graph. Label the center with the mean. Shade the area between X̄1 and X̄2. This step is
optional, but it may help you see what you are looking for.

Step 3: Use the z-score formula, z = (X̄ − µ)/(σ/√n), to find the z-score for X̄1 (25):

a) Subtract the mean (µ in Step 1) from X̄1: 25 − 12 = 13.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample size (n in Step 1): 8/√4 = 4.
c) Divide your result from a by your result from b: 13/4 = 3.25.

Step 4: Use the formula from Step 3 again, this time with X̄2 (8):

a) Subtract the mean (µ in Step 1) from X̄2: 8 − 12 = −4.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample size (n in Step 1): 8/√4 = 4.
c) Divide your result from a by your result from b: −4/4 = −1.

Step 5: Look up the value you calculated in Step 3 in the z-table.

A z-value of 3.25 corresponds to an area of 0.4994.

Step 6: Look up the value you calculated in Step 4 in the z-table.

A z-value of 1 corresponds to an area of 0.3413.

Note that the bell curve is symmetrical, so if you want to look up a negative value like -1, then just look up
the positive counterpart. The area will be the same.

Step 7: Add Step 5 and 6 together:

.4994 + .3413 = .8407



Step 8: Convert the decimal in Step 7 to a percentage:

.8407 = 84.07%

That’s it!
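The “between” calculation is just the difference of two standard normal CDF values. A sketch using the dog-show numbers (normal_cdf is our own helper, not part of the text):

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 12, 8, 4
se = sigma / math.sqrt(n)             # standard error: 8/2 = 4
z_hi = (25 - mu) / se                 # 3.25
z_lo = (8 - mu) / se                  # -1.0
p_between = normal_cdf(z_hi) - normal_cdf(z_lo)
print(round(p_between, 4))            # 0.8408 (tables give .4994 + .3413 = .8407)
```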

Central Limit Theorem on the TI 89

Sample problem: A population of community college students includes inner city students (p = .33). What is
theprobability that a random sample of 45 students from the population will have from 20% to 40% inner
city students?

Step 1: Press APPS. Highlight the Stats/List Editor by using the scroll keys. Press ENTER.
If you don’t see the Stats/List Editor, you need to load the app first.

Step 2: Press F5 and scroll down to C: BinomialCdf.

Step 3: Enter 45 in the Num Trials box.

Step 4: Scroll down and enter .33 in the Prob Success box.

Step 5: Scroll down and enter 9 in the Lower Value box (because 20% of 45 = 9).

Step 6: Scroll down and enter 18 in the Upper Value box (because 40% of 45 = 18). Press ENTER.

Step 7: Read the Result: Cdf=.857142. This means that the probability your random sample will have 20-
40% inner city students is 85.71%.
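The same number can be computed without a calculator. A sketch of the binomial CDF the TI-89 evaluates (math.comb requires Python 3.8+):

```python
from math import comb

def binomial_cdf_between(n, p, lo, hi):
    """P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

# 20% to 40% of 45 students, i.e. between 9 and 18 inner-city students:
print(binomial_cdf_between(45, 0.33, 9, 18))   # ~0.857, matching the TI-89 result
```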

10. Questions

Objective questions

1. Central Limit theorem is applicable to

a. Weak variable

b. Large variables

c. Strong variable

d. Random variables



2. Weak law of large numbers is applicable to

a. Weak variable

b. Large variables

c. Strong variable

d. Random variables

3. Convergence Everywhere implies

a. | X(s) - Xn(s) | → 0

b. | X(s) - Xn(s) | → ∞

c. | X(s) - Xn(s) | → µ

d. | X(s) - Xn(s) | → σ

4. Almost sure convergence or convergence with probability 1 implies

a. P(s| Xn(s) → X(s) ) = 1

b. P(s| Xn(s) → X(s) ) = 0

c. P(Xn(s)) = 1

d. P( X(s) ) = 0

5. Convergence in the mean square sense implies

a. E [ Xn- X]2 → 0

b. E [ X]2 → 0

c. E [ Xn]2 → 0



d. E [ Xn- X] → 0

Short Questions

1. Explain when a sequence of random variables is said to converge everywhere.

2. Explain when a sequence of random variables is said to converge almost surely (a.s.), i.e. with
probability 1.

3. Explain when a sequence of random variables is said to converge in the mean square sense.

4. Explain when a sequence of random variables is said to converge in probability.

5. State the Central Limit Theorem.

6. State the weak law of large numbers.

Long Questions

1. The scores on a general test have mean 450 and standard deviation 50. It is highly desirable to score over
480 on this exam: a person can get into Smith College's prestigious MBA program if he/she scores over 480.
In one location, 25 people sign up to take the exam, and the average score of these 25 people exceeds 490. Is
this odd? Should the test center investigate? Answer on the basis of the CLT.

2. A machine fills cereal boxes at a factory. Due to an accumulation of small errors (different flakes sizes,
etc.) it is thought that the amount of cereal in a box is normally distributed with mean 22 oz. for a



supposedly 20 oz. box. Suppose the standard deviation of the amount filled is 1.3 oz. A federal regulatory
official selects four of these boxes at random and finds that the average content of these boxes is less than 18 oz.
This official knows that the company claims the mean content to be 22 oz. He promptly fines the company.
Who is right? Use the CLT in your answer.

3. Sixteen adult males are in a pit which is 98 feet deep. They decide to stand on one another (feet to
head), hoping that the person on top can grip the top of the pit and get out, and, hence go for help. What's
the probability that their plan succeeds?

4. (Weak law of large numbers) If X1, X2, ... are i.i.d. random variables, each with mean µ and finite variance σ²,

show that the sample mean (X1 + X2 + ... + Xn)/n converges to µ in probability.

5. Suppose and . Examine if

6. Show that implies

University Questions

Dec 2012

Q. State & explain Central Limit theorem.

Q. Define a sequence of random variables

Q. Explain and prove the Chebyshev inequality

May 2012

Q. State & explain Central Limit theorem.

Q. Define a sequence of random variables

Dec 2011

Q. State & explain Central Limit theorem.



Q. Define a sequence of random variables

Dec 2010

1. State & explain Central Limit theorem.

2. Let X1, X2, X3, ... be a sequence of random variables.

Define (i) convergence almost everywhere

(ii) Convergence in probability

(iii) Convergence in mean square sense

(iv) Convergence in distribution

for the above sequence of random variables.

May 2010

1. State & explain Central Limit theorem.

2. Explain the strong law of large numbers.

3. Describe a sequence of random variables.



CHAPTER-5
Random Process

1. Motivation:

This topic develops the fundamental understanding and analyzes the behavior of signals and
random phenomena.

2. Syllabus:

Sr. No Content Duration Self Study Time

1. Spectral representation of a real WSS process 1 hour 1 hour

2. Power spectral density and properties 1 hour 1 hour

3. Cross power spectral density and properties 1 hour 1 hour

4. Autocorrelation function and power spectral density of a WSS random sequence 1 hour 1 hour

5. Linear time-invariant system with a WSS process as an input; stationarity of the output 1 hour 1 hour

6. Autocorrelation and power-spectral density of the output 1 hour 1 hour

7. Examples of random processes: white noise process and white noise sequence; Gaussian process; Poisson process 1 hour 1 hour

3. References:

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic

Processes, 4th Edition, McGraw-Hill, 2002

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,

4th edition, Mc-Graw Hill, 2000

3. H. Stark and J.W. Woods, Probability and Random Processes with

Applications to Signal Processing, 3e, Pearson edu

4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal

processing and communication, first ed2007, Elsevier

4. Weightage in University Examination: 20 to 25 Marks

5. Prerequisite:

Knowledge of signals and their behavior is required

6. Key Notations:



1. SSS: strict-sense stationary

2. WSS: wide-sense stationary

3. E[·], µ: expected value

4. σ: standard deviation

5. σ², var[·]: variance

6. E[X^n]: n-th order moment of a RV

7. Φ_X(ω): characteristic function

8. R(τ): autocorrelation

9. S(ω): power spectral density

10. δ(t): delta function

11. h(t): impulse response

12. H(z): system function

13. T(·): transformation

7. Key Definitions:

1. Random Process:

A random process maps each sample point to a waveform.

Thus a random process is a function of the sample point s and the index variable t, and may be written as X(t, s).
2. Conditional Probability:

For two events A and B with P(B) > 0, the conditional probability is defined as

P(A | B) = P(A ∩ B) / P(B)

3. Conditional Distribution Function

Consider the event {X ≤ x} and any event B involving the random variable X. The conditional
distribution function of X given B is defined as

F_X(x | B) = P({X ≤ x} ∩ B) / P(B), provided P(B) > 0

4. Linear system

The system is called linear if the principle of superposition applies: a weighted sum of inputs results
in the corresponding weighted sum of outputs. Thus, for a linear system with y(t) = T[x(t)],

T[ a1 x1(t) + a2 x2(t) ] = a1 T[x1(t)] + a2 T[x2(t)]

5. Linear time-invariant system

Consider a linear system with y(t) = T[x(t)]. The system is called time-invariant if

T[ x(t − t0) ] = y(t − t0) for every input x(t) and every time shift t0

6. Causal system

The system is called causal if the output of the system at t = t0 depends only on the present and
past values of the input. Thus, for a causal system, y(t0) is determined by { x(t), t ≤ t0 }.



7. Zero: a point in the z-plane where the numerator of H(z) is zero; consequently H(z) = 0 at such a point.

8. Pole: a point in the z-plane where the denominator of H(z) is zero; consequently H(z) → ∞ at such a point.

8. Theory and Mathematical Representation

Introduction

1. Random Process

In practical problems, we deal with time-varying waveforms whose value at any time is random in nature. For
example, the speech waveform recorded by a microphone, the signal received by a communication receiver,
or the daily record of stock-market data represent random variables that change with time. How do we
characterize such data? Such data are characterized as random or stochastic processes. This chapter covers the
fundamentals of random processes.

Recall that a random variable maps each sample point in the sample space to a point on the real line.
A random process maps each sample point to a waveform.

Consider a probability space (S, F, P). A random process can be defined on it as an indexed
family of random variables {X(t, s), t ∈ T}, where T is an index set, which may be discrete or
continuous, usually denoting time. Thus a random process is a function of the sample point and index
variable, and may be written as X(t, s).

 For a fixed t = t0, X(t0, s) is a random variable.

 For a fixed s = s0, X(t, s0) is a single realization of the random process and is a deterministic
function of t.

 For a fixed t = t0 and a fixed s = s0, X(t0, s0) is a single number.

When both t and s are varying, we have the random process X(t, s).

 The random process X(t, s) is normally denoted by {X(t)}.



 Example 1: Consider a sinusoidal signal X(t) = A sin(ωt), where A is a binary random variable with
probability mass functions p_A(1) = p and p_A(−1) = 1 − p.

 Clearly, {X(t)} is a random process with the two possible realizations sin(ωt) and −sin(ωt).

At a particular time t0, X(t0) is a random variable with the two values sin(ωt0) and −sin(ωt0).

Continuous-time vs. Discrete-time process

If the index set T is continuous, {X(t)} is called a continuous-time process.

Example 2: Suppose X(t) = A cos(ω0 t + Φ), where A and ω0 are constants and Φ is uniformly
distributed between 0 and 2π. {X(t)} is an example of a continuous-time process.

The figure below shows a sample realization of a continuous-time process.


If the index set T is a countable set, {X(t)} is called a discrete-time process. Such a random process

can be represented as {X[n], n ∈ Z} and is called a random sequence. Sometimes the notation {X_n, n ≥ 1} is

used to describe a random sequence indexed by the set of positive integers.

We can define a discrete-time random process on discrete points of time. In particular, we can get a

discrete-time random process {X[n]} by sampling a continuous-time process {X(t)} at a

uniform interval T_s such that X[n] = X(nT_s).

The discrete-time random process is more important in practical implementations. Advanced

statistical signal processing techniques have been developed to process this type of signal.

Example 3: Suppose X[n] = A cos(ω0 n + Φ), where ω0 is a constant and Φ is a random variable uniformly

distributed between −π and π.

{X[n]} is an example of a discrete-time process, illustrated in the figure.

2. Continuous-state vs. Discrete-state process

The value of a random process X(t) at any time t can be described from its probabilistic model.

The state is the value taken by X(t) at a time t, and the set of all such states is called the state
space. A random process is discrete-state if the state space is finite or countable; this also means that the
corresponding sample space is finite or countable. Otherwise, the random process is called continuous-state.



Example 4: Consider the random sequence {X[n]} generated by repeated tossing of a fair coin, where
we assign 1 to Head and 0 to Tail.

Clearly, X[n] can take only two values, 0 and 1. Hence {X[n]} is a discrete-time two-state process.

How to describe a random process?

As we have observed above, at a specific time t1, X(t1) is a random variable and can be described
by its probability distribution function

F_X(x1; t1) = P( X(t1) ≤ x1 )

This distribution function is called the first-order probability distribution function.

We can similarly define the first-order probability density function

f_X(x1; t1) = ∂F_X(x1; t1)/∂x1

To describe {X(t)}, we have to use the joint distribution function of the random variables at all possible
values of t. For any positive integer n, X(t1), X(t2), ..., X(tn) represent n jointly distributed random
variables. Thus a random process {X(t)} can be described by specifying the joint
distribution function

F_X(x1, x2, ..., xn; t1, t2, ..., tn) = P( X(t1) ≤ x1, X(t2) ≤ x2, ..., X(tn) ≤ xn )

or the joint probability density function

f_X(x1, x2, ..., xn; t1, t2, ..., tn) = ∂^n F_X(x1, ..., xn; t1, ..., tn) / ∂x1 ∂x2 ... ∂xn

If {X[n]} is a discrete-state random process, then it can also be specified by the collection of
joint probability mass functions

p_X(x1, x2, ..., xn; t1, t2, ..., tn) = P( X(t1) = x1, X(t2) = x2, ..., X(tn) = xn )

3. Stationary Random Process



The concept of stationarity plays an important role in solving practical problems involving random
processes. Just as time-invariance is an important characteristic of many deterministic systems,
stationarity describes a certain time-invariant property of a class of random processes. Stationarity also leads
to a frequency-domain description of a random process.

4.1 Strict-sense Stationary Process

A random process {X(t)} is called strict-sense stationary (SSS) if its probability structure is invariant with
time. In terms of the joint distribution function, {X(t)} is called SSS if

F_X(x1, ..., xn; t1, ..., tn) = F_X(x1, ..., xn; t1 + t0, ..., tn + t0) for all t0 and all n.

Thus, the joint distribution functions of any set of random variables X(t1), ..., X(tn) do not depend
on the placement of the origin of the time axis. This requirement is very strict, so less strict forms of
stationarity may be defined.

In particular, if the above relation holds for every k ≤ n but not necessarily for larger k, then {X(t)} is called
nth-order stationary.

 If {X(t)} is stationary up to order 1, then

F_X(x; t) = F_X(x; t + t0) for all t0.

Let us choose t0 = −t. Then

F_X(x; t) = F_X(x; 0),

which is independent of t. As a consequence, the mean µ_X(t) = E[X(t)] is a constant.

 If {X(t)} is stationary up to order 2, then

F_X(x1, x2; t1, t2) = F_X(x1, x2; t1 + t0, t2 + t0) for all t0.

Put t0 = −t1:

F_X(x1, x2; t1, t2) = F_X(x1, x2; 0, t2 − t1).

As a consequence, for such a process the autocorrelation R_X(t1, t2) = E[X(t1)X(t2)] depends only on t2 − t1.

Similarly, the autocovariance C_X(t1, t2) depends only on t2 − t1.

Therefore, the autocorrelation function of an SSS process depends only on the time lag τ = t2 − t1.

We can also define joint stationarity of two random processes. Two processes

{X(t)} and {Y(t)} are called jointly strict-sense stationary if their joint probability distributions of any

order are invariant under the translation of time. A complex random process {Z(t) = X(t) + jY(t)} is called

SSS if {X(t)} and {Y(t)} are jointly SSS.

4.2 Wide-sense stationary process

It is very difficult to test whether a process is SSS or not. A weaker form of stationarity, called wide-sense
stationarity, is extremely important from a practical point of view.

A random process {X(t)} is called a wide-sense stationary (WSS) process if

(1) the mean E[X(t)] = µ_X is a constant, and

(2) the autocorrelation R_X(t1, t2) = E[X(t1)X(t2)] depends only on the time lag t2 − t1.

An SSS process is always WSS, but the converse is not always true.

Example 1: Sinusoid with random phase

Consider the random process given by

X(t) = A cos(ω_c t + Φ)

where A and ω_c are constants and Φ is uniformly distributed between 0 and 2π.



This is the model of the carrier wave (a sinusoid of fixed frequency) used to analyse the noise performance
of many receivers.

Note that Φ has the density

f_Φ(φ) = 1/(2π), 0 ≤ φ < 2π

By applying the rule for the transformation of a random variable, we get the first-order density of X(t),

which is independent of t. Hence {X(t)} is first-order stationary.

Note that

E[X(t)] = ∫_0^{2π} A cos(ω_c t + φ) (1/(2π)) dφ = 0, a constant,

and

R_X(t1, t2) = E[ A cos(ω_c t1 + Φ) · A cos(ω_c t2 + Φ) ] = (A²/2) cos ω_c(t1 − t2),

a function of the time lag t1 − t2 only. Hence {X(t)} is wide-sense stationary.
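The zero mean and the (A²/2) cos ω_c(t1 − t2) autocorrelation can be checked by simulation. A rough Monte Carlo sketch (the constants A = 2, ω_c = 3 and the time instants are arbitrary choices for illustration, not from the text):

```python
import math
import random

random.seed(0)
A, wc = 2.0, 3.0
t1, t2 = 0.7, 1.9          # two arbitrary time instants
N = 200_000

samples_t1, samples_t2 = [], []
for _ in range(N):
    phi = random.uniform(0.0, 2.0 * math.pi)   # random phase of one realization
    samples_t1.append(A * math.cos(wc * t1 + phi))
    samples_t2.append(A * math.cos(wc * t2 + phi))

mean_est = sum(samples_t1) / N
acf_est = sum(x * y for x, y in zip(samples_t1, samples_t2)) / N
acf_theory = (A ** 2 / 2.0) * math.cos(wc * (t1 - t2))

print(abs(mean_est))          # close to 0, as derived above
print(acf_est, acf_theory)    # estimate matches (A^2/2) cos wc(t1 - t2)
```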


4. Mean and Variance

For any t ,

Thus mean and variance of the process are constants.

5. Autocorrelation Function

When t1 and t2 are not in the same pulse interval, X(t1) and X(t2) are independent.

To find the autocorrelation function R_X(t1, t2), let us consider the case t2 > t1.

Depending on the delay D, the points t1 and t2 may lie in one or two pulse intervals.

6.1 Autocorrelation of a deterministic signal

Consider a deterministic signal x(t) such that

lim_{T→∞} (1/2T) ∫_{−T}^{T} x²(t) dt < ∞

Such signals are called power signals. For a power signal, the autocorrelation function is defined as

R_x(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t) x(t + τ) dt

R_x(τ) measures the similarity between a signal and its time-shifted version. In particular,

R_x(0) is the mean-square value. If x(t) is a voltage waveform across a 1-ohm resistance,

then R_x(0) is the average power delivered to the resistance. In this sense, R_x(0) represents the average
power of the signal.

Example 1: Suppose x(t) = A cos ω0t. The autocorrelation function of x(t) at lag τ is given by

R_x(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} A cos ω0t · A cos ω0(t + τ) dt = (A²/2) cos ω0τ

We see that R_x(τ) of the above periodic signal is also periodic, and its maximum occurs when ω0τ = 0, ±2π, ±4π, ....

The power of the signal is R_x(0) = A²/2.

The autocorrelation of the deterministic signal gives us insight into the properties of the autocorrelation
function of a WSS process. We shall discuss these properties next.
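The limiting time average defining R_x(τ) can be approximated over a long but finite window. An illustrative sketch for x(t) = A cos ω0t (the window length T, step dt, and constants are arbitrary numerical choices):

```python
import math

def autocorr(x, tau, T=200.0, dt=0.005):
    """Approximate R_x(tau) = lim (1/2T) * integral_{-T}^{T} x(t) x(t+tau) dt."""
    n = int(2 * T / dt)
    total = 0.0
    for i in range(n):
        t = -T + i * dt
        total += x(t) * x(t + tau)
    return total * dt / (2 * T)

A, w0 = 2.0, 5.0
x = lambda t: A * math.cos(w0 * t)

tau = 0.3
print(autocorr(x, tau))                   # numeric estimate of R_x(0.3)
print((A ** 2 / 2) * math.cos(w0 * tau))  # theory: (A^2/2) cos(w0*tau) ~ 0.1415
```

The two printed values agree to about three decimals; the residual error comes from the finite window and the Riemann-sum step.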

Poisson process

In probability theory, a Poisson process is a stochastic process that counts the number of events[note 1] and
the time points at which these events occur in a given time interval. The time between each pair of
consecutive events has an exponential distribution with parameter λ and each of these inter-arrival times is
assumed to be independent of other inter-arrival times. The process is named after the Poisson distribution
introduced by French mathematician Siméon Denis Poisson.[1] It describes the time of events in radioactive
decay,[2] telephone calls at a call center,[3] document requests on a web server,[4] and many other punctual
phenomena where events occur independently from each other.

The Poisson process is a continuous-time stochastic process; the sum of a Bernoulli process can be thought
of as its discrete-time counterpart. A Poisson process is a pure-birth process, the simplest example of a
birth-death process. It is also a point process on the real half-line.

Definition



The basic form of Poisson process, often referred to simply as "the Poisson process", is a continuous-time
counting process {N(t), t ≥ 0} that possesses the following properties:[5]

 N(0) = 0
 Independent increments (the numbers of occurrences counted in disjoint intervals are independent
of each other)
 Stationary increments (the probability distribution of the number of occurrences counted in any
time interval only depends on the length of the interval)
 Proportionality (the probability of an occurrence in a time interval is proportional to the length of
the time interval)
 The probability of simultaneous occurrences equals zero.

Consequences of this definition include:

 The probability distribution of the waiting time until the next occurrence is an exponential
distribution.
 For each t≥0, the probability distribution of N(t) is a Poisson distribution with parameter λt. Here
λ>0 is called the rate of the Poisson process.
 The occurrences are distributed uniformly on any interval of time. (Note that N(t), the total number
of occurrences, has a Poisson distribution over the non-negative integers, whereas the location of
an individual occurrence on t ∈ (a, b] is uniform.)

Other types of Poisson process are described below.

There are a series of generalizations of the basic Poisson process defined above; these are also termed
Poisson processes. The first of them, called homogeneous, coincides with the basic Poisson process defined
above.

Homogeneous

(Figure: sample path of a counting Poisson process N(t).)

A homogeneous Poisson process counts events that occur at a constant rate; it is one of the most well-
known Lévy processes. This process is characterized by a rate parameter λ, also known as intensity, such



that the number of events in the time interval (t, t + τ] follows a Poisson distribution with associated parameter
λτ. This relation is given as

P[ N(t + τ) − N(t) = k ] = e^{−λτ} (λτ)^k / k!,  k = 0, 1, 2, ...,

where N(t + τ) − N(t) = k is the number of events in the time interval (t, t + τ].

Just as a Poisson random variable is characterized by its scalar parameter λ, a homogeneous Poisson
process is characterized by its rate parameter λ, which is the expected number of "events" or "arrivals" that
occur per unit time.

N(t) is a sample homogeneous Poisson process, not to be confused with a density or distribution function.
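The increment distribution above can be evaluated directly. A sketch (the rate λ = 2 per minute and interval τ = 1.5 minutes are made-up numbers for illustration):

```python
import math

def poisson_pmf(k, lam, tau):
    """P[N(t + tau) - N(t) = k] for a homogeneous Poisson process of rate lam."""
    return math.exp(-lam * tau) * (lam * tau) ** k / math.factorial(k)

# e.g. calls arriving at rate lam = 2 per minute; exactly 3 calls in 1.5 minutes:
print(round(poisson_pmf(3, 2.0, 1.5), 4))   # 0.224
```

Note that only the product λτ matters, consistent with the stationary-increments property.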

Inhomogeneous

Main article: Inhomogeneous Poisson process

An inhomogeneous Poisson process counts events that occur at a variable rate. In general, the rate
parameter may change over time; such a process is called a non-homogeneous Poisson process or
inhomogeneous Poisson process. In this case, the generalized rate function is given as λ(t), and the
expected number of events between time a and time b is

λ_{a,b} = ∫_a^b λ(t) dt

Thus, the number of arrivals in the time interval (a, b], given as N(b) − N(a), follows a Poisson distribution
with associated parameter λ_{a,b}.

A rate function λ(t) in a non-homogeneous Poisson process can be either a deterministic function of time or
an independent stochastic process, giving rise to a Cox process. A homogeneous Poisson process may be
viewed as a special case when λ(t) = λ, a constant rate.
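For an inhomogeneous process, the Poisson parameter is the integral of λ(t). A numeric sketch with a hypothetical rate function λ(t) = 3 + sin(t) (our own choice for illustration):

```python
import math

def expected_events(rate, a, b, steps=100_000):
    """Approximate lambda_{a,b} = integral_a^b rate(t) dt by the midpoint rule."""
    h = (b - a) / steps
    return sum(rate(a + (i + 0.5) * h) for i in range(steps)) * h

rate = lambda t: 3.0 + math.sin(t)     # hypothetical time-varying rate
lam_ab = expected_events(rate, 0.0, 2.0 * math.pi)
print(round(lam_ab, 4))                # 18.8496, i.e. 3 * 2*pi (the sine integrates to 0)

# N(b) - N(a) is then Poisson with parameter lam_ab; e.g. probability of no events:
p_zero = math.exp(-lam_ab)
```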

Spatial

An important variation on the (notionally time-based) Poisson process is the spatial Poisson process. In the
case of a one-dimension space (a line) the theory differs from that of a time-based Poisson process only in
the interpretation of the index variable. For higher dimension spaces, where the index variable (now x) is in
some vector space V (e.g. R2 or R3), a spatial Poisson process can be defined by the requirement that the
random variables defined as the counts of the number of "events" inside each of a number of
non-overlapping finite sub-regions of V should each have a Poisson distribution and should be independent of
each other.

Space-time

A further variation on the Poisson process, the space-time Poisson process, allows for separately
distinguished space and time variables. Even though this can theoretically be treated as a pure spatial
process by treating "time" as just another component of a vector space, it is convenient in most
applications to treat space and time separately, both for modeling purposes in practical applications and
because of the types of properties of such processes that it is interesting to study.

In comparison to a time-based inhomogeneous Poisson process, the extension to a space-time Poisson
process can introduce a spatial dependence into the rate function, such that it is defined as λ(x, t), where
x ∈ V for some vector space V (e.g. R² or R³). However, a space-time Poisson process may have a rate
function that is constant with respect to either or both of x and t. For any set S ⊆ V (e.g. a spatial region)
with finite measure, the number of events occurring inside this region can be modeled as a Poisson
process with associated rate function λ_S(t) such that

λ_S(t) = ∫_S λ(x, t) dx

Separable space-time processes

In the special case that this generalized rate function is a separable function of time and space, we have

λ(x, t) = f(x) λ(t)

for some function f(x). Without loss of generality, let

∫_V f(x) dx = 1.

(If this is not the case, λ(t) can be scaled appropriately.) Now f(x) represents the spatial probability
density function of these random events in the following sense: the act of sampling this spatial Poisson
process is equivalent to sampling a Poisson process with rate function λ(t), and associating with each event
a random vector x sampled from the probability density function f(x). A similar result can be shown for
the general (non-separable) case.

Characterisation

In its most general form, the only two conditions for a counting process to be a Poisson process are:

 Orderliness: roughly,

lim_{Δt→0} P( N(t + Δt) − N(t) > 1 | N(t + Δt) − N(t) ≥ 1 ) = 0,

which implies that arrivals don't occur simultaneously (but this is actually a mathematically stronger
statement).

 Memorylessness (also called evolution without after-effects): the number of arrivals occurring in
any bounded interval of time after time t is independent of the number of arrivals occurring before
time t.

These seemingly unrestrictive conditions actually impose a great deal of structure in the Poisson process. In
particular, they imply that the time between consecutive events (called interarrival times) are independent
random variables. For the homogeneous Poisson process, these inter-arrival times are exponentially
distributed with parameter λ (mean 1/λ).

Also, the memorylessness property entails that the number of events in any time interval is independent of
the number of events in any other interval that is disjoint from it. This latter property is known as the
independent increments property of the Poisson process.

Properties

As defined above, the stochastic process {N(t)} is a Markov process, or more specifically, a continuous-time
Markov process.

To illustrate the exponentially distributed inter-arrival times property, consider a homogeneous Poisson
process N(t) with rate parameter λ, and let Tk be the time of the kth arrival, for k = 1, 2, 3, ... . Clearly the
number of arrivals before some fixed time t is less than k if and only if the waiting time until the kth arrival
is more than t. In symbols, the event [N(t) < k] occurs if and only if the event [Tk > t] occurs. Consequently
the probabilities of these events are the same:

P( N(t) < k ) = P( T_k > t )

In particular, consider the waiting time until the first arrival. Clearly that time is more than t if and only if
the number of arrivals before time t is 0. Combining this latter property with the above probability
distribution for the number of homogeneous Poisson process events in a fixed interval gives

P( T_1 > t ) = P( N(t) = 0 ) = e^{−λt}

And therefore:

P( T_1 ≤ t ) = 1 − e^{−λt}

(which is the CDF of the exponential distribution).

Consequently, the waiting time until the first arrival T1 has an exponential distribution, and is thus
memoryless. One can similarly show that the other interarrival times Tk − Tk−1 share the same distribution.
Hence, they are independent, identically distributed (i.i.d.) exponential random variables with parameter λ > 0 and
expected value 1/λ. For example, if the average rate of arrivals is 5 per minute, then the average waiting
time between arrivals is 1/5 minute.
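The i.i.d. exponential interarrival property gives a direct way to simulate a Poisson process. A sketch (the rate and horizon are made-up illustration values):

```python
import random

random.seed(1)

def simulate_poisson(lam, t_end):
    """Arrival times of a homogeneous Poisson process with rate lam on (0, t_end]."""
    arrivals, t = [], 0.0
    while True:
        t += random.expovariate(lam)   # i.i.d. exponential interarrival, mean 1/lam
        if t > t_end:
            return arrivals
        arrivals.append(t)

lam = 5.0                              # e.g. 5 arrivals per minute
arrivals = simulate_poisson(lam, 10_000.0)
gaps = [b - a for a, b in zip([0.0] + arrivals[:-1], arrivals)]
print(len(arrivals) / 10_000.0)        # empirical rate, close to lam = 5
print(sum(gaps) / len(gaps))           # mean interarrival time, close to 1/lam = 0.2
```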

Applications

The classic example of phenomena well modelled by a Poisson process is deaths due to horse kicks in the
Prussian army, as shown in 1898 by Ladislaus Bortkiewicz, an economist and statistician of Polish ancestry who also
examined data on child suicides.[6][7] The following examples are also well modeled by the Poisson process:

 Number of road crashes (or injuries/fatalities) at a site or in an area


 Goals scored in an association football match.[8]
 Requests for individual documents on a web server.[4]
 Particle emissions due to radioactive decay by an unstable substance. In this case the Poisson
process is non-homogeneous in a predictable manner—the emission rate declines as particles are
emitted.
 Action potentials emitted by a neuron.[9]
 L. F. Richardson showed that the outbreak of war followed a Poisson process from 1820 to 1950. [10]
 Photons landing on a photodiode, in particular in low light environments. This phenomenon is
related to shot noise.
 Opportunities for firms to adjust nominal prices.[11]
 Arrival of innovations from research and development.[12]
 Requests for telephone calls at a switchboard.[citation needed]
 In queueing theory, the times of customer/job arrivals at queues are often assumed to be a Poisson
process.
 The evolution (changes to pages) of the Internet in general (although not in the particular case of
Wikipedia)

Gaussian process

In probability theory and statistics, Gaussian processes are a family of stochastic processes. In a Gaussian
process, every point in some input space is associated with a normally distributed random variable.
Moreover, every finite collection of those random variables has a multivariate normal distribution. The
distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables,
and as such, it is a distribution over functions.

The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of
the normal distribution which is often called the Gaussian distribution. In fact, Gaussian processes can be
seen as an infinite-dimensional generalization of multivariate normal distributions.

Gaussian processes are important in statistical modelling because of properties inherited from the normal.
For example, if a random process is modeled as a Gaussian process, the distributions of various derived
quantities can be obtained explicitly. Such quantities include the average value of the process over a range
of times and the error in estimating the average using sample values at a small set of times.



Definition

A Gaussian process is a stochastic process Xt, t ∈ T, for which any finite linear combination of samples has a
joint Gaussian distribution. More accurately, any linear functional applied to the sample function Xt will give
a normally distributed result. Notation-wise, one can write X ~ GP(m,K), meaning the random function X is
distributed as a GP with mean function m and covariance function K.[1] When the input vector t is two- or
multi-dimensional, a Gaussian process might be also known as a Gaussian random field.[2]

Some authors[3] assume the random variables Xt have mean zero; this greatly simplifies calculations without
loss of generality and allows the mean square properties of the process to be entirely determined by the
covariance function K.[4]

Alternative definitions

Alternatively, a process {Xt} is Gaussian if and only if for every finite set of indices t1, ..., tk in the index set, the
vector (Xt1, ..., Xtk) is a multivariate Gaussian random variable. Using characteristic functions of random variables, the Gaussian
property can be formulated as follows: {Xt} is Gaussian if and only if, for every finite set of
indices t1, ..., tk, there are real numbers σℓj and μℓ such that

E[exp(i Σℓ θℓ Xtℓ)] = exp(−(1/2) Σℓ,j σℓj θℓ θj + i Σℓ μℓ θℓ).

The numbers σℓj and μℓ can be shown to be the covariances and means of the variables in the process.[5]

Covariance functions

A key fact of Gaussian processes is that they can be completely defined by their second-order statistics.[2]
Thus, if a Gaussian process is assumed to have mean zero, defining the covariance function completely
defines the process' behaviour. The covariance matrix K between all pairs of points x and x' specifies a
distribution on functions and is known as the Gram matrix. Importantly, because every valid covariance
function is a scalar product of vectors, the matrix K is by construction non-negative definite.
Equivalently, the covariance function K is non-negative definite in the sense that the Gram matrix it generates
on any finite set of points is non-negative definite; if that matrix is always positive definite, K is called
positive definite. Importantly, the non-negative definiteness of K
enables its spectral decomposition using the Karhunen–Loève expansion. Basic aspects that can be defined
through the covariance function are the process' stationarity, isotropy, smoothness and periodicity.[6][7]

Stationarity refers to the process' behaviour regarding the separation of any two points x and x'. If the
process is stationary, the covariance depends only on their separation, x − x', while if it is non-stationary it depends on the actual
positions of the points x and x'; an example of a stationary process is the Ornstein–Uhlenbeck process. By
contrast, Brownian motion, a limiting case of the Ornstein–Uhlenbeck process, is non-stationary.



If the process depends only on |x − x'|, the Euclidean distance (not the direction) between x and x' then the
process is considered isotropic. A process that is concurrently stationary and isotropic is considered to be
homogeneous;[8] in practice these properties reflect the differences (or rather the lack of them) in the
behaviour of the process given the location of the observer.

Ultimately, Gaussian processes translate to taking priors on functions, and the smoothness of these priors
can be induced by the covariance function.[6] If we expect the outputs y and y' at nearby input points x and x'
to be nearby as well, then the assumption of smoothness is present. If
we wish to allow for significant displacement, we might choose a rougher covariance function. Extreme
examples of this behaviour are the Ornstein–Uhlenbeck covariance function, which is nowhere differentiable,
and the squared exponential, which is infinitely differentiable.

Periodicity refers to inducing periodic patterns within the behaviour of the process. Formally, this is
achieved by mapping the input x to a two dimensional vector u(x) = (cos(x), sin(x)).

Applications

A Gaussian process can be used as a prior probability distribution over functions in Bayesian inference.[7][9]
Given any set of N points in the desired domain of your functions, take a multivariate Gaussian whose
covariance matrix parameter is the Gram matrix of your N points with some desired kernel, and sample
from that Gaussian.
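The sampling recipe just described can be sketched in a few lines. The squared-exponential kernel and all parameter values below are assumptions chosen for illustration, not details from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_exp_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance k(x, x') = v exp(-(x - x')^2 / (2 l^2))."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

# Gram matrix of N = 50 input points; a tiny diagonal jitter keeps the
# covariance numerically positive definite for sampling.
x = np.linspace(0.0, 5.0, 50)
K = sq_exp_kernel(x, x)
sample = rng.multivariate_normal(np.zeros(len(x)), K + 1e-9 * np.eye(len(x)))
print(sample.shape)
```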

Inference of continuous values with a Gaussian process prior is known as Gaussian process regression, or
kriging; extending Gaussian process regression to multiple target variables is known as cokriging.[10]
Gaussian processes are thus useful as a powerful non-linear multivariate interpolation tool. Additionally,
Gaussian process regression can be extended to address learning tasks in both supervised (e.g. probabilistic
classification[7]) and unsupervised (e.g. manifold learning[2]) learning frameworks.

6.2 Properties of the autocorrelation function of a real WSS process

Consider a real WSS process {X(t)}. Since the autocorrelation function of such a process is a
function of the lag τ = t1 − t2 only, we can redefine a one-parameter autocorrelation function

R_X(τ) = E[X(t + τ) X(t)].

If {X(t)} is a complex WSS process, then

R_X(τ) = E[X(t + τ) X*(t)],

where X*(t) is the complex conjugate of X(t). For a discrete random sequence, we can define the
autocorrelation sequence similarly.

The autocorrelation function is an important function characterising a WSS random process. It possesses
some general properties. We briefly describe them below.

1. R_X(0) is the mean-square value of the process. Thus,

R_X(0) = E[X^2(t)] ≥ 0.

Remark If X(t) is a voltage signal applied across a 1 ohm resistance, then R_X(0) is the ensemble average
power delivered to the resistance.

2. For a real WSS process, R_X(τ) is an even function of the lag τ. Thus,

R_X(−τ) = R_X(τ),

because R_X(−τ) = E[X(t − τ) X(t)] = E[X(t) X(t + τ)] = R_X(τ).

Remark For a complex WSS process, R_X(−τ) = R_X*(τ).

3. R_X(τ) is maximum at τ = 0: |R_X(τ)| ≤ R_X(0). This follows from the Schwartz inequality:

R_X^2(τ) = (E[X(t + τ) X(t)])^2 ≤ E[X^2(t + τ)] E[X^2(t)] = R_X^2(0).



4. R_X(τ) is a positive semi-definite function in the sense that for any positive integer n and real
numbers a1, ..., an and t1, ..., tn,

Σi Σj ai aj R_X(ti − tj) ≥ 0.

Proof

Define the random variable Z = Σi ai X(ti). Then 0 ≤ E[Z^2] = Σi Σj ai aj R_X(ti − tj).

It can be shown that the sufficient condition for a function R(τ) to be the autocorrelation function of a real
WSS process is that R(τ) be real, even and positive semidefinite.

5. If {X(t)} is MS periodic, then R_X(τ) is also periodic with the same period.

Proof: Note that a real WSS random process {X(t)} is called mean-square periodic (MS periodic) with
period Tp if for every t,

E[(X(t + Tp) − X(t))^2] = 0.

It follows that R_X(τ + Tp) = R_X(τ) for all τ.

6. Suppose

X(t) = m_X + V(t),

where V(t) is a zero-mean WSS process and m_X is a constant. Then R_X(τ) = m_X^2 + R_V(τ).

Interpretation of the autocorrelation function of a WSS process

The autocorrelation function R_X(τ) measures the correlation between the two random variables X(t) and X(t + τ).

If R_X(τ) drops quickly with respect to τ, then X(t) and X(t + τ) will be less correlated for large τ. This
in turn means that the signal has a lot of changes with respect to time. Such a signal has high-frequency
components. If R_X(τ) drops slowly, the signal samples are highly correlated and such a signal has fewer high-frequency
components. Later on we see that R_X(τ) is directly related to the frequency-domain
representation of a WSS process.
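This fast-decay/slow-decay behaviour can be illustrated numerically. The AR(1) model below is an assumed example, not one from the text; its normalized autocorrelation at lag m is a^|m|, so a close to 1 gives a slowly varying (lowpass) signal and a small a gives a rapidly varying one:

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(a, size):
    """AR(1) process x[k] = a x[k-1] + w[k] driven by white Gaussian noise."""
    w = rng.standard_normal(size)
    x = np.empty(size)
    x[0] = w[0]
    for k in range(1, size):
        x[k] = a * x[k - 1] + w[k]
    return x

def acf(x, max_lag):
    """Normalized sample autocorrelation R[m]/R[0] for m = 0..max_lag-1."""
    x = x - x.mean()
    r0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[: len(x) - m], x[m:]) / len(x)
                     for m in range(max_lag)]) / r0

n = 200_000
slow = acf(ar1(0.9, n), 6)   # decays like 0.9^m: highly correlated samples
fast = acf(ar1(0.2, n), 6)   # decays like 0.2^m: rapidly changing signal
print(round(slow[1], 2), round(fast[1], 2))
```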

7. Cross correlation function of jointly WSS processes

If {X(t)} and {Y(t)} are two real jointly WSS random processes, their cross-correlation function is
independent of t and depends only on the time lag τ. We can write the cross-correlation function as

R_XY(τ) = E[X(t + τ) Y(t)].


The cross-correlation function satisfies the following properties:

i. R_XY(−τ) = R_YX(τ). This property is illustrated in the figure below.

ii. |R_XY(τ)|^2 ≤ R_X(0) R_Y(0), which follows from the Cauchy–Schwartz inequality.

iii. If X(t) and Y(t) are uncorrelated, R_XY(τ) = μ_X μ_Y.

iv. If X(t) and Y(t) are orthogonal processes, R_XY(τ) = 0.



Example 1

Consider a random process Z(t) which is the sum of two real jointly WSS random processes:

Z(t) = X(t) + Y(t).

We have

R_Z(τ) = R_X(τ) + R_Y(τ) + R_XY(τ) + R_YX(τ).

If X(t) and Y(t) are orthogonal processes, then R_Z(τ) = R_X(τ) + R_Y(τ).

Example 2

Suppose

Y(t) = X(t) cos(ω0 t + Φ),

where X(t) is a WSS process and Φ is uniformly distributed over (0, 2π), independent of X(t). Then

R_Y(τ) = (1/2) R_X(τ) cos ω0 τ.



Linear time-invariant systems

In many applications, physical systems are modeled as linear time-invariant (LTI) systems. The dynamic
behaviour of an LTI system to deterministic inputs is described by linear differential equations. We are
familiar with time and transform domain (such as Laplace transform and Fourier transform) techniques to
solve these differential equations. In this lecture, we develop the technique to analyze the response of an
LTI system to a WSS random process.

The purpose of this study is two-fold:

 Analysis of the response of a system


 Finding an LTI system that can optimally estimate an unobserved random process from an observed
process. The observed random process is statistically related to the unobserved random process.
For example, we may have to find an LTI system (also called a filter) to estimate the signal from the
noisy observations.

Basics of Linear Time Invariant Systems

A system is modelled by a transformation T that maps an input signal x(t) to an output signal y(t), as
shown in Figure 1. We can thus write y(t) = T x(t).



Linear system

The system is called linear if the principle of superposition applies: a weighted sum of inputs results
in the corresponding weighted sum of outputs. Thus, for a linear system,

T[a1 x1(t) + a2 x2(t)] = a1 T x1(t) + a2 T x2(t).

Example 1 Consider a differentiator, with output given by

y(t) = d x(t) / dt.

Then,

d/dt [a1 x1(t) + a2 x2(t)] = a1 d x1(t)/dt + a2 d x2(t)/dt.

Hence the differentiator is a linear system.

Linear time-invariant system

Consider a linear system with y(t) = T x(t). The system is called time-invariant if

T x(t − t0) = y(t − t0) for every shift t0.

It is easy to check that the differentiator in the above example is a linear time-invariant system.



Causal system

The system is called causal if the output of the system at any time t0 depends only on the present and
past values of the input. Thus, for a causal system,

y(t0) = T{x(t), t ≤ t0}.

Response of a linear time-invariant system to deterministic input

As shown in Figure 2, a linear system can be characterised by its impulse response h(t) = T δ(t), where
δ(t) is the Dirac delta function.

Figure 2

Recall that any function x(t) can be represented in terms of the Dirac delta function as follows:

x(t) = ∫ x(s) δ(t − s) ds.

If x(t) is input to the linear system y(t) = T x(t), then

y(t) = ∫ x(s) h(t, s) ds,

where h(t, s) is the response at time t due to the shifted impulse δ(t − s).

If the system is time-invariant, h(t, s) = h(t − s).



Therefore, for a linear time-invariant system,

y(t) = ∫ x(s) h(t − s) ds = h(t) * x(t),

where * denotes the convolution operation.

We also note that convolution is commutative: h(t) * x(t) = x(t) * h(t).

Thus for an LTI system,

y(t) = h(t) * x(t) = x(t) * h(t).

Taking the Fourier transform, we get

Y(ω) = H(ω) X(ω).

Figure 3 shows the input-output relationship of an LTI system in terms of the impulse response and the
frequency response.

Figure 3
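The relations y(t) = h(t) * x(t) and Y = HX have exact discrete analogues that can be verified directly; the sequences below are arbitrary illustrative choices:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # input signal (illustrative)
h = np.array([0.5, 0.25, 0.125])      # impulse response (illustrative)

# Time-domain output: convolution of the input with the impulse response.
y = np.convolve(x, h)

# Frequency domain: zero-pad both sequences to the output length so that
# the DFT product equals the DFT of the linear convolution.
N = len(y)
Y = np.fft.fft(x, N) * np.fft.fft(h, N)
y_from_freq = np.fft.ifft(Y).real
print(np.allclose(y_from_freq, y))
```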



Response of an LTI System to WSS input

Consider an LTI system with impulse response h(t). Suppose is a WSS process input to the

system. The output of the system is given by

where we have assumed that the integrals exist in the mean square (m.s.) sense.

Mean and autocorrelation of the output process

where is the frequency response at 0 frequency ( ) and given by

Therefore, the mean of the output process is a constant

Discrete-time Linear Shift Invariant System with Deterministic Inputs



We have seen that the Dirac delta function δ(t) plays a very important role in the analysis of the
response of continuous-time LTI systems to deterministic and random inputs. A similar role in the case of
discrete-time LTI systems is played by the unit sample sequence δ[n], defined by

δ[n] = 1 for n = 0, and δ[n] = 0 otherwise.

Any discrete-time signal x[n] can be expressed in terms of δ[n] as follows:

x[n] = Σk x[k] δ[n − k].

As illustrated in Figure 1, a discrete-time linear shift-invariant system is characterized by its unit sample
response h[n], which is the output of the system to the unit sample sequence δ[n].

The DTFT of the unit sample response is the frequency response of the system, given by

H(ω) = Σn h[n] e^(−jωn).

The transfer function in terms of the z-transform is given by

H(z) = Σn h[n] z^(−n),

where H(z) is a function of the complex variable z. It is defined on a region of convergence (ROC) in the
z-plane.

An analysis similar to that for the continuous-time LTI system can be applied to the discrete-time
LTI system. Such an analysis shows that the response y[n] of a linear time-invariant system with
impulse response h[n] to a deterministic input x[n] is

y[n] = h[n] * x[n] = Σk h[k] x[n − k].


By taking the DTFT of both sides, we get Y(ω) = H(ω) X(ω).

More generally, we can take the z-transform of the input and the response and show that Y(z) = H(z) X(z).

Remark

 If the LTI system is causal, then h[n] = 0 for n < 0.

In this case, the ROC of H(z) is a region in the z-plane given by |z| > r for some radius r. For example, suppose
h[n] = a^n u[n]. Then,

H(z) = 1 / (1 − a z^(−1)), with ROC |z| > |a|.

 Similarly, if the LTI system is anti-causal, then h[n] = 0 for n > 0.

In this case, the ROC of H(z) is a region in the z-plane given by |z| < r.

 The contour |z| = 1 is called the unit circle. Thus H(ω) represents H(z) evaluated on the unit circle.



 H(z) can be expressed as the ratio of two polynomials in z^(−1):

H(z) = B(z) / A(z).

The polynomials B(z) and A(z) help us in analyzing the properties of a linear system in terms of the
zeros and poles of H(z), defined by:

Zero: a point in the z-plane where B(z) = 0. Consequently H(z) = 0 at such a point.

Pole: a point in the z-plane where A(z) = 0. Consequently H(z) = ∞ at such a point. The ROC of H(z)
does not contain any pole. The poles, zeros and unit circle in the complex plane are illustrated
in Figure 2.



 For the stability of the LTI system, the unit-sample response should decay to zero as n → ∞. A
necessary and sufficient condition for the stability of a causal discrete-time LTI system is that all its poles
lie strictly inside the unit circle.
 A discrete-time LTI system is called a minimum- phase system if all its poles and zeros lie inside the
unit circle. A minimum-phase system is always stable as its poles lie inside the unit circle. Because

the zeros of the system lie inside the unit circle, the inverse system with a transfer function
will have all its poles inside the unit circle and be stable.

• A discrete-time LTI system is called a maximum- phase system if all its poles and zeros lie outside
the unit circle.
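These pole-zero conditions are easy to check numerically. The first-order transfer function below is an assumed example (coefficients of B(z) and A(z) in powers of z^(-1), not one from the text):

```python
import numpy as np

# Assumed example: H(z) = (1 - 0.5 z^-1) / (1 - 0.8 z^-1)
b = np.array([1.0, -0.5])   # numerator coefficients  -> zero at z = 0.5
a = np.array([1.0, -0.8])   # denominator coefficients -> pole at z = 0.8

zeros = np.roots(b)
poles = np.roots(a)

stable = bool(np.all(np.abs(poles) < 1.0))            # poles inside unit circle
minimum_phase = stable and bool(np.all(np.abs(zeros) < 1.0))
print(stable, minimum_phase)
```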

Response of a discrete-time LTI system to WSS input

Consider a discrete-time linear time-invariant system with impulse response h[n] and input X[n], as
shown in Figure 3 below. Assume X[n] to be a WSS process with mean μ_X and autocorrelation function R_X[m].

Figure 3

The output random process Y[n] is given by

Y[n] = Σk h[k] X[n − k].



Given the WSS input X[n], the output process Y[n] is also WSS. We establish this result in the
following section.

Mean and Autocorrelation of the output

The mean of the output is given by

μ_Y = E[Y[n]] = Σk h[k] E[X[n − k]] = μ_X Σk h[k] = μ_X H(0),

where H(0) is the frequency response at zero frequency (ω = 0). Therefore, the mean of the output process is a constant.

The cross-correlation between the output and the input random processes is given by

E[Y[n + m] X[n]] = Σk h[k] R_X[m − k] = h[m] * R_X[m].

Thus it does not depend on n, but only on the lag m, and we can write

R_YX[m] = h[m] * R_X[m].

The autocorrelation function of the output is

E[Y[n + m] Y[n]] = h[−m] * R_YX[m] = h[m] * h[−m] * R_X[m],

which is a function of the lag m only, and we write

R_Y[m] = h[m] * h[−m] * R_X[m].

The mean-square value of the output process is E[Y^2[n]] = R_Y[0].

Thus if X[n] is WSS then Y[n] is also WSS.



Taking the DTFT of R_Y[m] = h[m] * h[−m] * R_X[m], we get

S_Y(ω) = |H(ω)|^2 S_X(ω).

This input-output relation in the frequency domain is illustrated in Figure 4.

Figure 4

In terms of the z-transform, we get

S_Y(z) = H(z) H(z^(−1)) S_X(z),

which is illustrated in Figure 5.

Notice that if H(z) is causal, then H(z^(−1)) is anti-causal and vice versa.

Similarly, if H(z) is minimum-phase then H(z^(−1)) is maximum-phase.

Figure 5

Remark



Finding the probability density function of the output process Y[n] is, in general, a difficult task. However, if
X[n] is a WSS Gaussian random process, then the output process is also Gaussian, with its
probability density function determined by its mean and autocorrelation function.

Example 1

Suppose X[n] is a zero-mean white-noise sequence with variance σ_X^2, applied to an LTI system with transfer function H(z). Then S_Y(z) = H(z) H(z^(−1)) σ_X^2.

By partial fraction expansion and inverse transform, we get

 Though the input is an uncorrelated process in the above example, the output is a correlated
process.

 For the same white noise input, we can generate random processes with different autocorrelation
functions or power spectral densities.
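The first remark can be verified by simulation. The two-tap FIR filter below is an assumed example, not the filter in the text; for y[n] = w[n] + 0.5 w[n-1] with unit-variance white input, theory gives R_Y[0] = 1.25, R_Y[1] = 0.5 and R_Y[m] = 0 for |m| > 1:

```python
import numpy as np

rng = np.random.default_rng(3)
w = rng.standard_normal(500_000)      # white noise, variance 1 (uncorrelated)

# FIR filter h = [1, 0.5]: y[n] = w[n] + 0.5 w[n-1]
y = np.convolve(w, [1.0, 0.5], mode="valid")

# Sample autocorrelation at lags 0, 1, 2.
r0 = np.mean(y * y)            # theory: (1 + 0.25) * 1 = 1.25
r1 = np.mean(y[:-1] * y[1:])   # theory: 0.5
r2 = np.mean(y[:-2] * y[2:])   # theory: 0 (filter memory is only one sample)
print(round(r0, 2), round(r1, 2), round(r2, 2))
```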

Spectral factorization theorem



Consider a discrete-time LTI system with transfer function H_c(z) and a white noise sequence
{w[n]} with variance σ_v^2 as the input random process, as shown in Figure 6 below.

Then

S_Y(z) = σ_v^2 H_c(z) H_c(z^(−1)).

We have seen that S_Y(z) is the product of a constant and two transfer functions. This
result is of fundamental importance in modeling a WSS process because of the spectral factorization
theorem stated below: any WSS process whose power spectrum satisfies the Paley–Wiener condition
admits a factorization

S_Y(z) = σ_v^2 H_c(z) H_c(z^(−1)),

where H_c(z) is a minimum-phase transfer function.

Thus a WSS random signal y[n] with continuous spectrum that satisfies the Paley–Wiener
condition can be considered as the output of a linear filter fed by a white noise sequence
{w[n]}, as shown in Figure 7(a). The sequence {w[n]} is called the innovation sequence.



Proof of the spectral factorization theorem

Since log S_Y(z) is analytic in an annular region that includes the unit circle, it admits the expansion

log S_Y(z) = Σk c[k] z^(−k),

where c[k] is the k-th order cepstral coefficient. For a real signal, c[k] = c[−k].



Note that H_c(z) and 1/H_c(z) are both analytic in a region including the unit circle.

Therefore, H_c(z) is a minimum-phase filter.

The concepts of minimum-phase and maximum-phase filters are illustrated in Figure 8.



Remarks

 Note that S_Y(ω) of a real process is an even function of ω. Therefore, S_Y(z) is a
function of z + z^(−1). Consider a rational spectrum, so that S_Y(z) = B(z)/A(z), where
B(z) and A(z) are polynomials in z. If z1 is a root of S_Y(z), so is 1/z1. Thus the roots of S_Y(z) are
symmetrical about the unit circle: H_c(z) groups the poles and zeros inside the unit circle,
and H_c(z^(−1)) groups the poles and zeros outside the unit circle.

 S_Y(z) can therefore be factorized into a minimum-phase and a maximum-phase factor, i.e. H_c(z) and
H_c(z^(−1)).

 In general spectral factorization is difficult; however, for a signal with a rational power spectrum,
spectral factorization can be easily done.



 Since H_c(z) is a minimum-phase filter, the inverse filter 1/H_c(z) exists and is stable. Therefore we can
use the filter 1/H_c(z) to filter the given signal and obtain the innovation sequence.

 y[n] and w[n] are related through an invertible transform, so they contain the same
information.

Example 2 Suppose the power spectral density of a discrete random sequence is given by

Then

Wold's Decomposition

Any WSS signal x[n] can be decomposed as the sum of two mutually orthogonal processes:

 a regular process x_r[n], which can be expressed as the output of a linear filter driven by a white noise
sequence as input;

 a predictable process x_p[n], that is, a process that can be predicted from its own past with zero
prediction error.

Consider now the problem of estimating a signal in the presence of additive noise. We want to estimate the signal
by filtering the noisy signal.



A filter is a frequency-selective device: it passes a selected band of
frequency components and suppresses the other components. Such a deterministic filter cannot be a solution
to the problem of suppressing random noise, because random noise cannot be localized to a specific
frequency band.

We have to use the probabilistic properties of the noise to dissociate the noise from the signal. An
optimal filter performs this dissociation. We will consider the case when the signal to be estimated is of
known form (deterministic). For example, in radar applications a signal of known form is reflected from a
distant target. The received signal is the sum of a scaled and shifted version of the original signal and the
noise.

That is,

Z(t) = X(t) + V(t),

where X(t) is a shifted and scaled version of the known transmitted signal and V(t) is noise, assumed
to be WSS with power spectral density S_V(ω). We wish to decide whether X(t) is present, and to estimate its value,
by passing Z(t) through a linear filter of impulse response h(t).

Instantaneous signal power

Average noise power


Signal to noise ratio

We have to determine such that is maximum.


Case 1: White noise

Then

Equality holds when the filter is matched to the signal, i.e. h(t) is proportional to x(t0 − t).

Band pass Random Processes



A random process X(t) is called a band-pass process if its power spectrum S_X(ω) is zero outside a
band of width B centred around a frequency ω_c; B is called the bandwidth and ω_c is called the centre frequency of the
band-pass process. If B is very small compared to the centre frequency ω_c, then X(t) is called a
narrow-band process.

We can similarly define a low-pass random process as a random process whose power spectral
density is zero outside a band |ω| ≤ B around zero frequency.

 In telecommunication, we often deal with random signals which have PSD concentrated in a small
frequency band and negligible outside this band. The information bearing signals like speech, image
and video are low-pass signals. These information-bearing signals modulate a sinusoidal carrier for
transmitting over the communication channel that acts as a bandpass filter. For example, the
amplitude-modulated waveform received by a communication receiver is modelled as an
amplitude-modulated random-phase sinusoid

X(t) = M(t) cos(ω_c t + Φ),

where Φ is uniformly distributed over (0, 2π) and M(t) is a WSS process independent of Φ. Here the
modulation process translates the spectrum of M(t) from baseband to a band centred around the carrier
frequency ω_c.

 The noise associated with communication signal undergoes band-pass filtering in the
communication receiver and the band-pass filtered noise can be modeled as a band-pass process.

Figure 1 illustrates the power spectrum of a bandpass random process.



Figure 1 Power spectrum of a band-pass random process

We can do the correlation and power spectral analysis of such signals in the usual manner. However, for
analysis of nonlinear operations like the multiplication with a random process, the following trigonometric
representation is useful.

Rice's representation or quadrature representation of a WSS process

An arbitrary zero-mean WSS band-pass process X(t) can be represented in terms of the slowly varying
components X_c(t) and X_s(t) as follows:

X(t) = X_c(t) cos ω_c t − X_s(t) sin ω_c t        (1)

where ω_c is a centre frequency arbitrarily chosen in the band. X_c(t) and X_s(t)
are respectively called the in-phase and the quadrature-phase components of X(t).

Let us choose a dual process X̂(t) (specified below) such that

X_c(t) = X(t) cos ω_c t + X̂(t) sin ω_c t        (2)

and

X_s(t) = X̂(t) cos ω_c t − X(t) sin ω_c t.        (3)

For such a representation, we require the processes X_c(t) and X_s(t) to be WSS.

Note that, as X(t) is zero mean, we require the dual process to be zero mean as well, so that
E[X_c(t)] = E[X_s(t)] = 0.



Under the above conditions, X_c(t) and X_s(t) are jointly WSS, with

R_Xc(τ) = R_Xs(τ) and R_XcXs(τ) = −R_XsXc(τ).

How do we find a dual process satisfying the above two conditions?



For this, consider X̂(t) to be the Hilbert transform of X(t), i.e.

X̂(t) = (1/π) ∫ X(s) / (t − s) ds,

where the integral is defined in the mean-square sense. See the illustration in Figure 2.

The frequency response of the Hilbert transform is given by

H(ω) = −j for ω > 0 and H(ω) = j for ω < 0, i.e. H(ω) = −j sgn(ω).

The Hilbert transform X̂(t) satisfies the following spectral relations:

S_X̂(ω) = S_X(ω) and S_X̂X(ω) = −j sgn(ω) S_X(ω).



From the above two relations, we get S_X̂(ω) = S_X(ω) together with a purely imaginary cross-spectrum,
which is exactly what the WSS conditions on X_c(t) and X_s(t) require. The Hilbert transform of X(t) is
generally denoted as X̂(t). Therefore, from (2) and (3) we establish

X_c(t) = X(t) cos ω_c t + X̂(t) sin ω_c t

and

X_s(t) = X̂(t) cos ω_c t − X(t) sin ω_c t.

The realization of the in-phase and the quadrature-phase components is shown in Figure 3 below.



Figure 3
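A discrete-time sketch of this realization is given below. It forms the analytic signal X(t) + j X̂(t) with an FFT-based Hilbert transform and down-converts by the chosen centre frequency; the test signal, sampling rate and fc are illustrative assumptions:

```python
import numpy as np

def analytic_signal(x):
    """x + j x_hat, where x_hat is the Hilbert transform of x (N even)."""
    N = len(x)
    X = np.fft.fft(x)
    g = np.zeros(N)
    g[0] = 1.0
    g[1 : N // 2] = 2.0      # keep positive frequencies, doubled
    g[N // 2] = 1.0          # Nyquist bin
    return np.fft.ifft(X * g)

fs, fc = 1000.0, 100.0                      # sampling rate and centre frequency
t = np.arange(0.0, 1.0, 1.0 / fs)
m = np.cos(2 * np.pi * 5 * t)               # slowly varying amplitude
x = m * np.cos(2 * np.pi * fc * t)          # band-pass signal around fc

z = analytic_signal(x) * np.exp(-2j * np.pi * fc * t)   # down-convert
xc, xs = z.real, z.imag                                 # in-phase, quadrature
print(round(float(np.max(np.abs(xc - m))), 6))          # Xc recovers m(t)
```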

From the above analysis, we can summarise the following expressions for the autocorrelation functions:

R_Xc(τ) = R_Xs(τ) = R_X(τ) cos ω_c τ + R̂_X(τ) sin ω_c τ,

R_XcXs(τ) = R_X(τ) sin ω_c τ − R̂_X(τ) cos ω_c τ,

where R̂_X(τ) denotes the Hilbert transform of R_X(τ). See the illustration in Figure 4.

Figure 4

The variances E[X_c^2(t)] and E[X_s^2(t)] are equal, and both equal R_X(0).

Taking the Fourier transform of R_Xc(τ), we get S_Xc(ω) = S_Xs(ω), the base-band translate of S_X(ω).

Similarly, taking the Fourier transform of R_XcXs(τ) gives the cross power spectral density S_XcXs(ω).



Notice that the cross power spectral density S_XcXs(ω) is purely imaginary. In particular, if S_X(ω) is locally
symmetric about ω_c,

S_XcXs(ω) = 0,

implying that R_XcXs(τ) ≡ 0. Consequently, the zero-mean processes X_c(t) and X_s(t) are also uncorrelated.

Example 1

Suppose the band-limited white-noise process has the PSD as shown in Figure 5 below.



Figure 5

We have earlier shown that

The plot of R_X(τ) is as shown in Figure 5. Therefore:

(1) The representation of the band-pass process X(t) in terms of the in-phase and the quadrature-phase
components is not unique. By selecting different centre frequencies ω_c we can have different representations.



(2) The band-pass process X(t) can also be written as

X(t) = R(t) cos(ω_c t + Φ(t)),

where

R(t) = sqrt(X_c^2(t) + X_s^2(t))

and

Φ(t) = tan^(−1)(X_s(t) / X_c(t)).

R(t) and Φ(t) are respectively called the envelope and the phase of the process X(t).

(3) If X(t) is a Gaussian process, then X̂(t) (being a linear transform of X(t)) is also Gaussian.
Consequently, the processes X_c(t) and X_s(t) are also Gaussian.

(4) Under the condition of local symmetry of S_X(ω) about ω_c, X_c(t) and X_s(t) are uncorrelated. If
X_c(t) and X_s(t) are also Gaussian processes, then they will be independent. Using the
results on the PDF of functions of RVs, we get the following.

 The envelope R(t) will be Rayleigh-distributed. Thus

f_R(r) = (r / σ^2) e^(−r^2 / 2σ^2), r ≥ 0.

 The phase Φ(t) will be uniformly distributed over (−π, π].
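These distributional claims can be checked by direct simulation; modelling the in-phase and quadrature components at a fixed time instant as independent zero-mean Gaussians is an assumption made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n = 1.0, 200_000

xc = rng.normal(0.0, sigma, n)      # in-phase component samples
xs = rng.normal(0.0, sigma, n)      # quadrature component samples

env = np.hypot(xc, xs)              # envelope sqrt(Xc^2 + Xs^2)
phase = np.arctan2(xs, xc)          # phase in (-pi, pi]

# Rayleigh mean is sigma*sqrt(pi/2); a uniform phase on (-pi, pi] has mean 0.
print(round(float(env.mean()), 2), round(float(phase.mean()), 2))
```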



Example 1 Consider patients coming to a doctor's office at random points in time. Let Xn denote the time
(in hours) that the nth patient has to wait before being admitted to see the doctor. Describe the random
process {Xn, n ≥ 1}.

Solution. The random process Xn is a discrete-time, continuous-valued random process. The sample space
is SX = {x : x ≥ 0}. The index parameter set (domain of time) is I = {1, 2, 3, · · ·}.

Example 2 The number of failures N(t), which occur in a computer network over the time interval [0, t), can
be described by a homogeneous Poisson process {N(t), t ≥ 0}. On average, there is a failure every 4
hours, i.e. the intensity of the process is equal to λ = 0.25 h^(−1).

(a) What is the probability of at most 1 failure in [0, 8), at least 2 failures in [8, 16), and at most 1 failure in
[16, 24) (time unit: hour)?

Solution (a) The probability p = P[N(8) − N(0) ≤ 1, N(16) − N(8) ≥ 2, N(24) − N(16) ≤ 1] is required. In view of
the independence and the homogeneity of the increments of a homogeneous Poisson process, it can be
determined as follows:

p = P[N(8) − N(0) ≤ 1] P[N(16) − N(8) ≥ 2] P[N(24) − N(16) ≤ 1] = P[N(8) ≤ 1] P[N(8) ≥ 2] P[N(8) ≤ 1].

Since P[N(8) ≤ 1] = P[N(8) = 0] + P[N(8) = 1] = e^(−0.25·8) + 0.25·8·e^(−0.25·8) = 0.406 and
P[N(8) ≥ 2] = 1 − P[N(8) ≤ 1] = 0.594, the desired probability is p = 0.406 × 0.594 × 0.406 = 0.098.
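The arithmetic of this solution can be reproduced directly from the Poisson probability mass function:

```python
import math

mu = 0.25 * 8        # lambda * t for one 8-hour window

def poisson_pmf(k, mu):
    """P[N = k] for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

p_le_1 = poisson_pmf(0, mu) + poisson_pmf(1, mu)   # P[N(8) <= 1]
p_ge_2 = 1.0 - p_le_1                              # P[N(8) >= 2]
p = p_le_1 * p_ge_2 * p_le_1                       # independent increments
print(round(p_le_1, 3), round(p_ge_2, 3), round(p, 3))
```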

Example 3 — Random Telegraph signal Let a random signal X(t) have the structure

X(t) = (−1)^N(t) · Y, t ≥ 0,

where {N(t), t ≥ 0} is a homogeneous Poisson process with intensity λ and Y is a binary random variable with
P(Y = 1) = P(Y = −1) = 1/2 which is independent of N(t) for all t. Signals of this structure are called random
telegraph signals. Random telegraph signals are basic modules for generating signals with a more
complicated structure. Obviously, X(t) = 1 or X(t) = −1, and Y determines the sign of X(0).

Since E[|X(t)|^2] = 1 < ∞ for all t ≥ 0, the stochastic process {X(t), t ≥ 0} is a second-order process. Letting I(t) =
(−1)^N(t), its trend function is m(t) = E[X(t)] = E[Y] E[I(t)]. Since E[Y] = 0, the trend function is identically
zero: m(t) ≡ 0. It remains to show that the covariance function C(s, t) of this process depends only on |t − s|.
This requires the determination of the probability distribution of I(t). A transition from I(t) = −1 to I(t) = +1
or, conversely, from I(t) = +1 to I(t) = −1 occurs at those time points where Poisson events occur, i.e. where
N(t) jumps.

P(I(t) = 1) = P(even number of jumps in [0, t]) = e^(−λt) Σ_{i=0..∞} (λt)^(2i) / (2i)! = e^(−λt) cosh λt,

P(I(t) = −1) = P(odd number of jumps in [0, t]) = e^(−λt) Σ_{i=0..∞} (λt)^(2i+1) / (2i+1)! = e^(−λt) sinh λt.

Hence the expected value of I(t) is

E[I(t)] = 1 · P(I(t) = 1) + (−1) · P(I(t) = −1) = e^(−λt) [cosh λt − sinh λt] = e^(−2λt).

Since C(s, t) = COV[X(s), X(t)] = E[X(s) X(t)] = E[Y I(s) Y I(t)] = E[Y^2 I(s) I(t)] = E(Y^2) E[I(s) I(t)] and
E(Y^2) = 1, we have C(s, t) = E[I(s) I(t)]. Thus, in order to evaluate C(s, t), the joint distribution of the random
vector (I(s), I(t)) must be determined. In view of the homogeneity of the increments of {N(t), t ≥ 0}, for s < t,

p_{1,1} = P(I(s) = 1, I(t) = 1) = P(I(s) = 1) P(I(t) = 1 | I(s) = 1)
        = e^(−λs) cosh λs · P(even number of jumps in (s, t])
        = e^(−λs) cosh λs · e^(−λ(t−s)) cosh λ(t − s)
        = e^(−λt) cosh λs cosh λ(t − s).

Analogously,

p_{1,−1} = P(I(s) = 1, I(t) = −1) = e^(−λt) cosh λs sinh λ(t − s),
p_{−1,1} = P(I(s) = −1, I(t) = 1) = e^(−λt) sinh λs sinh λ(t − s),
p_{−1,−1} = P(I(s) = −1, I(t) = −1) = e^(−λt) sinh λs cosh λ(t − s).

Since E[I(s) I(t)] = p_{1,1} + p_{−1,−1} − p_{1,−1} − p_{−1,1}, we obtain C(s, t) = e^(−2λ(t−s)), s < t. Note that the order
of s and t can be changed, so that C(s, t) = e^(−2λ|t−s|).
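The covariance C(s, t) = e^(−2λ|t−s|) can be checked by Monte Carlo simulation; the parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n_paths = 1.0, 200_000
s, t = 0.3, 0.8

# X(t) = (-1)^N(t) * Y with Y = +/-1 equally likely, N a Poisson process.
y = rng.choice([-1.0, 1.0], size=n_paths)
n_s = rng.poisson(lam * s, size=n_paths)            # jumps in [0, s]
n_st = rng.poisson(lam * (t - s), size=n_paths)     # jumps in (s, t], independent

x_s = y * (-1.0) ** n_s
x_t = y * (-1.0) ** (n_s + n_st)

c_hat = float(np.mean(x_s * x_t))                   # estimate of C(s, t)
c_theory = float(np.exp(-2.0 * lam * (t - s)))      # e^{-2 lam (t-s)} = e^{-1}
print(round(c_hat, 2), round(c_theory, 2))
```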

Questions

Objective Question

1. The point in the z-plane at which H(z) = 0 is called a

a. Zero

b. Pole

c. Master

d. Slave

2. The point in the z-plane at which H(z) = ∞ is called a

a. Zero

b. Pole

c. Master

d. Slave



3. Give the relation between autocorrelation and power spectral density

a. power spectral density = FT [autocorrelation]

b. power spectral density = LT [autocorrelation]

c. power spectral density = ZT [autocorrelation]

d. power spectral density = KLT [autocorrelation]

4. An LTI system implies that the system is

a. Linear

b. Time Invariant

c. Time Variant

d. Low Pass

5. Causal systems are

a. Realizable

b. Imaginary

c. Ideal

d. Lossless

8. Short Question

1. What is a random process? Explain it with an example.

2. State four classes of RP, giving one example of each.
3. Define an SSS and a WSS process. What is the difference between the two?
4. Define the first-order and second-order PDF of a random process.
5. Define a linear system.
6. Define a causal system.



7. Define a linear time-invariant system.
8. Define the autocorrelation of an RP.
9. Define the power spectral density of an RP.
10. Give the relation between autocorrelation and power spectral density.
11. Give the relation between the power spectral density at the input and output of an LTI system.

9. Long Question

1. Explain in brief:

(i) WSS process

(ii) Poisson process

(iii) Queuing system

2. If the WSS process X(t) is given by X(t) = 10 cos(100t + θ), where θ is uniformly distributed over
(−π, π), prove that X(t) is correlation ergodic.

3. What is a Random Process? State four classes of random processes giving one example each

4. Consider a sinusoidal signal where A is a random variable with a given probability mass function.

(a) Sketch all the possible realizations of X(t).

(b) Sketch the marginal CDFs of the random variables at the given time instants.

5. Consider the random process {X(t)} given by X(t) = …, where … are constants and … is a discrete random variable. Examine if

{X(t)} is WSS in the following cases:

(i)

(ii)

Mumbai University Questions:

Dec 2012

Q. A random process is given by X(t) = sin(ωt + Y), where Y is uniformly distributed in (0, 2π). Verify whether
{X(t)} is a WSS process.

Q. State and prove the properties of autocorrelation function and cross correlation function.

Q. If the WSS process X(t) is given by X(t) = 10 cos(100t + θ),

where θ is uniformly distributed over (−π, π), prove that X(t) is correlation ergodic.

Q. A WSS random process {X(t)} is applied to the input of an LTI system whose impulse response is h(t) = t e^(−at) u(t),
where a (> 0) is a real constant.

May 2012

Q. A random process is given by X(t) = A cos(ωt + Y), where Y is uniformly distributed in (0, 2π). Verify whether
{X(t)} is a WSS process.

Q. Explain power spectral density. State its properties and prove any two of them.

Q. Prove that if the input to an LTI system is WSS, then the output is also WSS.

Dec 2011

Q. What is a random process? State four classes of random processes with examples.

Q. Explain power spectral density. State its properties and prove any two of them.

Q. If X(t) is given by X(t) = 10 cos(100t + θ), where θ is uniformly distributed over (−π, π), prove that
X(t) is WSS.

Q. Prove that if the input to an LTI system is WSS, then the output is also WSS.

May 2011

Q. Define the Poisson process and prove that it is a Markov process.

Q. Find the autocorrelation function and power spectral density of the random process given by X(t) = A cos(ωt + Y),
where Y is uniformly distributed in (−π, π). Verify whether {X(t)} is a WSS process.

Q. Explain power spectral density. State its properties and prove any two of them.

Q. Explain the Gaussian process and the ergodic process.

Dec 2010

Q. A WSS random process X(t) with autocorrelation R_XX(τ) = A e^(−a|τ|),

where A and a are real positive constants, is applied to the input of an LTI system with impulse response
h(t) = e^(−bt) u(t), where b is a real positive constant. Find the autocorrelation of the output Y(t) of the system.

Dec 2009

Q. State and prove the properties of autocorrelation function and cross correlation function.

Q. The power spectral density of a random process is given by

S(ω) = (10ω² + 35) / ((ω² + 4)(ω² + 9))

Find:

(i) the average power;

(ii) R(τ), the autocorrelation function.

Q. (a) If the WSS process X(t) is given by X(t) = 10 cos(100t + θ),

where θ is uniformly distributed over (−π, π), prove that X(t) is correlation ergodic.

Dec 2008

Q. What is a Random Process? State four classes of random processes, giving one example of each.

Q. Explain in brief:

(i) WSS process

(ii) Poisson process

(iii) Queuing system.

CHAPTER-6

Markov Chains and Introduction to Queuing Theory


1. Motivation:

The objective of this course is to analyze the behavior of signals and random phenomena, with
special emphasis on their applications to communication engineering, signals and linear systems.

2. Syllabus:

Module  Content  Duration  Self Study Time

1  Introduction — 1 lecture, 1 hour

2  Homogeneous chain, stochastic matrix, Random walks — 2 lectures, 2 hours

3  Higher transition probabilities and the Chapman-Kolmogorov equation — 2 lectures, 2 hours

4  Classification of states — 2 lectures, 2 hours

5  Markovian models, Birth and death queuing models, Steady state results — 2 lectures, 2 hours

6  Single and Multiple server Queuing models, Finite source models, Little's formula — 2 lectures, 2 hours

3. References:

1. A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochastic

Processes, 4th Edition, McGraw-Hill, 2002

2. P.Z. Peebles, Probability, Random Variables and Random Signal Principles,

4th edition, Mc-Graw Hill, 2000

3. H. Stark and J.W. Woods, Probability and Random Processes with

Applications to Signal Processing, 3e, Pearson edu

4. Wim C Van Etten, Introduction to Random Signals and Noise, Wiley

5. Miller, Probability and Random Processes-with applications to signal

processing and communication, first ed2007, Elsevier

4. Weightage in University Examination: 10 to 12 Marks

5. Prerequisite:

Knowledge of signals and their behavior is required.

6. Key Definitions:

1. Markov chain:
A Markov chain is a discrete state space process that has the Markov property. Usually it is also defined to have discrete time.

2. Stochastic process:
A dynamical system with stochastic (i.e. at least partially random) dynamics. At each
time the system is in one state X_t, taken from a set S, the state space. One often
writes such a process as X = (X_t : t ≥ 0).

3. Homogeneous (or stationary) Markov chain:

A Markov chain with transition probabilities that depend only on the length m − n of the separating
time interval is called a homogeneous (or stationary) Markov chain.

4. Stochastic matrix:

The one-step transition probabilities W_XY(1) in a homogeneous Markov chain are from
now on interpreted as entries of a matrix W = {W_XY}, the so-called transition matrix of the chain, or
stochastic matrix.

7. Theory and Mathematical Representation

Introduction
7.1 Random walks

A drunk walks along a pavement of width 5. At each time step he/she moves one position forward, and one
position either to the left or to the right with equal probabilities.

Exceptions: when in position 5 he/she can only go to position 4 (a wall); when in position 1 and stepping to the
right, the process ends.

The fair casino

You decide to take part in a roulette game, starting with a capital of C₀ pounds. At each round of the game
you gamble £10. You lose this money if the roulette gives an even number, and you double it (so receive
£20) if the roulette gives an odd number.

Suppose the roulette is fair, i.e. the probabilities of even and odd outcomes are exactly

1/2. What is the probability that you will leave the casino broke?

The gambling banker

Consider two urns A and B in a casino game. Initially A contains two white balls, and

B contains three black balls. The balls are then `shuffled' repeatedly at discrete time

steps according to the following rule: pick at random one ball from each urn, and swap them. The three
possible states of the system during this (discrete time and discrete

state space) stochastic process are shown below:

Many, many other real-world processes are of this type:

dynamical systems with stochastic (partially or fully random) dynamics. Some are really fundamentally
random; others are 'practically' random.

E.g.

Physics: quantum mechanics, solids/liquids/gases at nonzero temperature, diffusion

Biology: interacting molecules, cell motion, predator-prey models

Medicine: epidemiology, gene transmission, population dynamics

Commerce: stock markets & exchange rates, insurance risk, derivative pricing

Sociology: herding behavior, traffic, opinion dynamics

Computer science: internet traffic, search algorithms

Leisure: gambling, betting

7.2. Definitions and properties of stochastic processes

We first define stochastic processes generally, and then show how one finds discrete time

Markov chains as probably the most intuitively simple class of stochastic processes.

A stochastic process is a dynamical system with stochastic (i.e. at least partially random) dynamics. At each
time the system is in one state X_t, taken from a set S, the state space. One often writes
such a process as X = (X_t : t ≥ 0).

Consequences, conventions

(i) We can only speak about the probabilities to find the system in certain states at certain times: each X_t is a
random variable.

(ii) To define a process fully: specify the probabilities (or probability densities) for the
X_t at all t, or give a recipe from which these can be calculated.

(iii) If time is discrete: label time steps by integers n ≥ 0 and write X = (X_n : n ≥ 0).

7.3. Markov chains and the Markov property

Markov chains are discrete state space processes that have the Markov property. Usually they are also defined
to have discrete time.

The Markov property

A discrete time and discrete state space stochastic process is Markovian if and only if the conditional
probabilities do not depend on (X_0, …, X_n) in full, but only on the most recent state X_n:

Prob[X_{n+1} = X | X_0, X_1, …, X_n] = Prob[X_{n+1} = X | X_n]

The likelihood of going to any next state at time n + 1 depends only on the state we
find ourselves in at time n. The system is said to have no memory.

Consequences, conventions

(i) For a Markov chain the joint distribution factorizes over successive transitions:

Prob[X_n = x_n, …, X_1 = x_1, X_0 = x_0] = Prob[X_0 = x_0] ∏_{k=1}^{n} Prob[X_k = x_k | X_{k−1} = x_{k−1}]

Proof: apply the Markov property repeatedly to the chain rule of conditional probabilities.

(ii) Let us define the probability P_n(X) to find the system at time n ≥ 0 in state
X ∈ S:

P_n(X) = Prob[X_n = X]

This defines a time-dependent probability measure on the set S, with the usual
properties

Σ_X P_n(X) = 1 and P_n(X) ≥ 0 for all X ∈ S and all n

(iii) For any two times m > n ≥ 0 the measures P_n(X) and P_m(X) are related via

P_m(X) = Σ_{Y∈S} W_XY(n, m) P_n(Y),  with  W_XY(n, m) = Prob[X_m = X | X_n = Y] -----------(I)

Defined: homogeneous (or stationary) Markov chains

A Markov chain with transition probabilities that depend only on the length m − n of

the separating time interval,

W_XY(n, m) = W_XY(m − n),

is called a homogeneous (or stationary) Markov chain. Here the absolute time is

irrelevant: if we re-set our clocks by a uniform shift n → n + K for fixed K, then all

probabilities to make certain transitions during given time intervals remain the same.

consequences, conventions

(i) The transition probabilities in a homogeneous Markov chain obey the Chapman-Kolmogorov equation:

W_XY(m) = Σ_{X'∈S} W_XX'(1) W_X'Y(m − 1)

The likelihood to go from Y to X in m steps is the sum over all paths that go first
in m − 1 steps to any intermediate state X', followed by one step from X' to X. The Markov property
guarantees that the last step is independent of how we got to X'. Stationarity ensures that the likelihood to
go in m − 1 steps to X' does not depend on when the various intermediate steps were made.

Proof:

Rewrite P_m(X) in two ways, first by choosing n = 0 in the right-hand side of (I), second by choosing n = m − 1
in the right-hand side of (I):

Σ_Y W_XY(m) P_0(Y) = P_m(X) = Σ_{X'} W_XX'(1) P_{m−1}(X')

Next we use (I) once more, now to rewrite P_{m−1}(X') by choosing n = 0:

Σ_Y W_XY(m) P_0(Y) = Σ_Y [ Σ_{X'∈S} W_XX'(1) W_X'Y(m − 1) ] P_0(Y)

Finally we choose P_0(X) = δ_XY, and demand that the above is true for any Y ∈ S:

W_XY(m) = Σ_{X'∈S} W_XX'(1) W_X'Y(m − 1)
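The Chapman-Kolmogorov relation can also be checked numerically. The sketch below (plain Python; the 3-state matrix is an illustrative example chosen for this check, not one from the text) builds the m-step matrices W(m) = W^m and verifies that W(m) = W(k) W(m − k):

```python
# Numerical check of the Chapman-Kolmogorov equation
# W_XY(m) = sum_X' W_XX'(k) W_X'Y(m-k), i.e. W(m) = W(k) W(m-k).
# Convention as in the text: W[x][y] is the probability to go from
# state y to state x in one step, so every *column* of W sums to 1.

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(w, m):
    """m-step transition matrix W(m) = W^m."""
    result = [[float(i == j) for j in range(len(w))] for i in range(len(w))]
    for _ in range(m):
        result = mat_mul(w, result)
    return result

W = [[0.1, 0.4, 0.5],
     [0.3, 0.4, 0.3],
     [0.6, 0.2, 0.2]]   # columns sum to 1

m, k = 5, 2
lhs = mat_pow(W, m)                              # W(m)
rhs = mat_mul(mat_pow(W, k), mat_pow(W, m - k))  # W(k) W(m-k)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(3) for j in range(3))
```

The same check with k = 1 is exactly the one-step form of the equation used in the proof above.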

Defined: Stochastic matrix

The one-step transition probabilities W_XY(1) in a homogeneous Markov chain are from now on interpreted
as entries of a matrix W = (W_XY), the so-called transition matrix of the chain, or stochastic matrix.

consequences, conventions:

(i) In a homogeneous Markov chain one has

W_XY(m) = (W^m)_XY

Proof:

This follows directly from the Chapman-Kolmogorov equation by induction on m, in combination with our identification of W_XY in Markov chains as the
probability to go from Y to X in one time step.

Examples

Some dice rolling examples:

X_n = number of sixes thrown after n rolls.

A six at stage n: X_n = X_{n−1} + 1, with probability 1/6

No six at stage n: X_n = X_{n−1}, with probability 5/6

So P(X_n) depends only on X_{n−1}, not on earlier values: Markovian!

If X_{n−1} had been known exactly:

Prob[X_n = k | X_{n−1} = j] = (1/6) δ_{k, j+1} + (5/6) δ_{k, j}

If X_{n−1} is not known exactly, average over all possible values of X_{n−1}:

Prob[X_n = k] = Σ_j Prob[X_n = k | X_{n−1} = j] Prob[X_{n−1} = j]

Hence

Prob[X_n = k] = (1/6) Prob[X_{n−1} = k − 1] + (5/6) Prob[X_{n−1} = k]
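This recursion can be checked numerically (an illustrative sketch, not part of the original notes): iterating it from Prob[X_0 = 0] = 1 must reproduce the Binomial(n, 1/6) distribution for the number of sixes in n rolls.

```python
# Iterate the Markov-chain recursion for the number of sixes and
# compare the result with the closed-form binomial distribution.
from math import comb

def sixes_distribution(n):
    """Distribution of X_n obtained by iterating the recursion."""
    p = [1.0]                     # Prob[X_0 = 0] = 1
    for _ in range(n):
        prev = p + [0.0]          # pad so the count can grow by one
        p = [(prev[k - 1] / 6 if k > 0 else 0.0) + 5 * prev[k] / 6
             for k in range(len(prev))]
    return p

n = 10
p = sixes_distribution(n)
binom = [comb(n, k) * (1/6)**k * (5/6)**(n - k) for k in range(n + 1)]
assert all(abs(p[k] - binom[k]) < 1e-12 for k in range(n + 1))
```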

7.4. Properties of homogeneous finite state space Markov chains

Simplification of notation & formal solution

Since the state space S is discrete, we can represent/label the states by integer numbers,

and write simply S = {1, 2, 3, …}. Now the X are themselves integer random variables. To

exploit optimally the simple nature of Markov chains we change our notation:

S = {1, 2, 3, …};  X, Y → i, j;  P_n(X) → p_i(n);  W_XY → p_ij (the probability to go from state j to state i in one step)

From now on we will limit ourselves for simplicity to Markov chains with finite state spaces

S = {1, …, |S|}. This is not essential but removes distracting technical complications.

Defined: homogeneous Markov chains in standard notation

In our new notation the dynamical equation of the Markov chain becomes

p_i(n + 1) = Σ_j p_ij p_j(n)

Example Suppose a car rental agency has three locations in Ottawa: Downtown location (labelled A), East
end location (labelled B) and a West end location (labelled C). The agency has a group of delivery drivers to
serve all three locations. The agency's statistician has determined the following:

1. Of the calls to the Downtown location, 30% are delivered in Downtown area, 30% are delivered in
the East end, and 40% are delivered in the West end

2. Of the calls to the East end location, 40% are delivered in Downtown area, 40% are delivered in
the East end, and 20% are delivered in the West end

3. Of the calls to the West end location, 50% are delivered in Downtown area, 30% are delivered in
the East end, and 20% are delivered in the West end.

After making a delivery, a driver goes to the nearest location to make the next delivery. This way, the
location of a specific driver is determined only by his or her previous location.

We model this problem with the following matrix (entries read off from the statistician's data; columns correspond to the current location A, B, C and rows to the next location):

        A    B    C
  A [ 0.3  0.4  0.5 ]
T = B [ 0.3  0.4  0.3 ]
  C [ 0.4  0.2  0.2 ]

T is called the transition matrix of the above system. In our example, a state is the location of a particular
driver in the system at a particular time. The entry sji in the above matrix represents the probability of
transition from the state corresponding to i to the state corresponding to j. (e.g. the state corresponding to
2 is B)

To make matters simple, let's assume that it takes each delivery person the same amount of time (say 15
minutes) to make a delivery, and then to get to their next location. According to the statistician's data, after
15 minutes, of the drivers that began in A, 30% will again be in A, 30% will be in B, and 40% will be in C.
Since all drivers are in one of those three locations after their delivery, each column sums to 1. Because we
are dealing with probabilities, each entry must be between 0 and 1, inclusive. The most important fact that
lets us model this situation as a Markov chain is that the next location for delivery depends only on the
current location, not previous history. It is also true that our matrix of probabilities does not change during
the time we are observing.

Now, let’s start with a simple question. If you begin at location C, what is the probability (say, P) that you
will be in area B after 2 deliveries? Think about how you can get to B in two steps. We can go from C to C,
then from C to B, we can go from C to B, then from B to B, or we can go from C to A, then from A to B. To
figure out P, let P(XY) denote the probability of going from X to Y in one delivery (where X,Y can be A,B or
C). Do you remember how probabilities work? If two (or more) independent events must both (all) happen,
to obtain the probability of them both (all) happening, we multiply their probabilities together. To obtain
the probability of either (any) happening, we add the probabilities of those events together.

This gives us P = P(CA)P(AB) + P(CB)P(BB) + P(CC)P(CB) for the probability that a delivery person goes from C
to B in 2 deliveries. Substituting into our formula using the statistician's data above gives P = (.5)(.3) +
(.3)(.4) + (.2)(.3) = .33. This tells us that if we begin at location C, we have a 33% chance of being in location
B after 2 deliveries.
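The two-step computation can be sketched in code (plain Python; T is the column-stochastic matrix implied by the statistician's data, with A, B, C as indices 0, 1, 2):

```python
# Two-step transition probabilities for the delivery example.
# T[i][j] = probability of going from location j to location i.
T = [[0.3, 0.4, 0.5],   # to A (from A, B, C)
     [0.3, 0.4, 0.3],   # to B
     [0.4, 0.2, 0.2]]   # to C

# T2[i][j] = probability of going from j to i in exactly 2 deliveries
T2 = [[sum(T[i][k] * T[k][j] for k in range(3)) for j in range(3)]
      for i in range(3)]

print(round(T2[1][2], 2))   # C -> B in 2 deliveries: 0.33
print(round(T2[1][1], 2))   # B -> B in 2 deliveries: 0.34
```

The printed values match the hand calculations in the text.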

Let's try this for another pair. If we begin at location B, what is the probability of being at location B after 2
deliveries? Try this yourself before you read further! The probability of going from location B to location B
in two deliveries is P(BA)P(AB) + P(BB)P(BB) + P(BC)P(CB) = (.4)(.3)+(.4)(.4) + (.2)(.3) = .34. Now it wasn't so
bad calculating where you would be after 2 deliveries, but what if you need to know where you will be after
5, or 15 deliveries? That could take a LONG time. There must be an easier way, right? Look carefully at
where these numbers come from. As you might suspect, they are the result of matrix multiplication.
Going from C to B in 2 deliveries is the same as taking the inner product of row 2 and column 3. Going from
B to B in 2 deliveries is the same as taking the inner product of row 2 and column 2. If you multiply T by T,
the (2, 3) and (2,2) entries are respectively, the same answers that you got for these two questions above.
The rest of T2 answers the same type of question for any other pair of locations X and Y.

You will notice that the elements on each column still add to 1 and each element is between 0 and 1,
inclusive. Since we are modeling our problem with a Markov chain, this is essential. This matrix indicates
the probabilities of going from location i to location j in exactly 2 deliveries.

Now that we have this matrix, it should be easier to find where we will be after 3 deliveries. We will let
p(AB) represent the probability of going from A to B in 2 deliveries. Let's find the probability of going from C
to B in 3 deliveries: it is p(CA)P(AB) + p(CB)P(BB) + p(CC)P(CB) = (.37)(.3) + (.33)(.4) + (.3)(.3) = .333. You will
see that this probability is the inner product of row 2 of T² and column 3 of T. Therefore, if we multiply T² by
T, we will get the probability matrix for 3 deliveries.

By now, you probably know how we find the matrix of probabilities for 4, 5 or more deliveries. Notice that
the elements on each column still add to 1. Therefore, it is important that you do not round your answers.
Keep as many decimal places as possible to retain accuracy.


What do you notice about these matrices as we take into account more and more deliveries? The numbers
in each row seem to be converging to a particular number. Think about what this tells us about our long-
term probabilities: after a large number of deliveries, it no longer matters which location
we were in when we started. At the end of the week, we have (approximately) a 38.9% chance of being at
location A, a 33.3% chance of being at location B, and a 27.8% chance of being in location C. This
convergence will happen with most of the transition matrices that we consider.
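A short script (an illustrative sketch, not part of the original text) makes the convergence visible by raising T to a high power:

```python
# Raise the delivery transition matrix T to a high power and observe
# that every column converges to the same steady-state vector.
T = [[0.3, 0.4, 0.5],
     [0.3, 0.4, 0.3],
     [0.4, 0.2, 0.2]]

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Tn = T
for _ in range(49):          # compute T^50
    Tn = mat_mul(Tn, T)

for row in Tn:
    print([round(x, 3) for x in row])
# every entry in a row is the same: ~0.389, ~0.333, ~0.278
```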

Remark If all the entries of the transition matrix are between 0 and 1 EXCLUSIVELY, then convergence is
guaranteed to take place. Convergence may take place when 0 and 1 are in the transition matrix, but
convergence is no longer guaranteed.

For an example, look at the matrix

A = [ 0  1 ]
    [ 1  0 ]

Think about the situation that this matrix represents in order to understand why A^k oscillates as k grows.

Sometimes, you will be given a vector of initial distributions to describe how many or what proportion of
the objects are in each state in the beginning. Using this vector, you can find out how many (or what
proportion) of the objects are in each state at any later time. If the initial distribution vector consists of
numbers between 0 and 1, it tells you what proportion of the total number of objects are in each state in
the beginning, and the elements in the column sum to one. Alternatively, the vector of initial distributions
could contain the actual number of objects or people in each state in the beginning. In this case, all the
elements will be nonnegative and will add up to the total number of objects or
people in the entire system. In our example above, the vector of initial distributions tells us what
proportion of the drivers originally begins in each area. For example, if we start out with a uniform
distribution, we will have 1/3 of our drivers in each area. Thus, the vector

X0 = (1/3, 1/3, 1/3)^T

is the vector of initial distributions. After one delivery, the distribution will be (approximately) 40% of our
drivers in area A, 33.3% in area B, and 26.7% in area C. We found this by multiplying our initial distribution
vector by our transition matrix, as follows:

X1 = T X0 ≈ (0.400, 0.333, 0.267)^T

After many deliveries, we saw that some convergence occurs, so that the area from which we start doesn't
matter. This will mean that we will obtain the same right-hand side no matter with which initial distribution
we start. For example,

Notice that each right-hand side is the same as one of the columns of our transition matrix after many
deliveries. This is exactly what we expected because we said that about 38.9% of the people will be in area
A after many deliveries regardless of what percentage of the people were in area A in the initial
distribution. Check this with several initial distributions to convince yourself of the truth of this statement.

If the initial distribution indicates the actual number of people in the system, the following can represent
our system after one delivery (starting with 18 drivers in each area, 54 in total):

T (18, 18, 18)^T = (21.6, 18, 14.4)^T

Did you notice that we now have a fractional number of people in areas A and C after one delivery? We
know that this cannot happen, but this gives us a good idea of approximately how many delivery people are
in each area. After many deliveries, the right-hand side of this equality will also be very close to a particular
vector. For example,

T^N (18, 18, 18)^T ≈ (21, 18, 15)^T for large N.

The particular vector that the product converges to is the total number of people in the system (54 in this
case) times any column of the matrix that T^k converges to as k grows.

Try some examples to convince yourself that the vector indicating the number of people in each area after
many deliveries will not change if people are moved from one state to another in the initial distribution.
Also notice that the number of people in the entire system never changes. People move from place to
place, but the system never loses or gains people. This can also be illustrated easily using block
multiplication: since, for large N, T^N ≈ [w w w] has all 3 columns the same, then if v = (p1, p2, p3)^T
is the initial distribution vector,

T^N v ≈ [w w w] (p1, p2, p3)^T = p1 w + p2 w + p3 w = (p1 + p2 + p3) w = (total number of people initially) w.

I hope the above example gave you a good idea about the process of Markov chains. Now here is the
general setting:

Definitions For a Markov chain with n states, the state vector is a column vector whose ith component
represents the probability that the system is in the ith state at that time. Note that the sum of the entries of
a state vector is 1. For example, vectors X0 and X1 in the above example are state vectors. If p_ij is the
probability of movement (transition) from state j to state i, then the matrix T = [p_ij] is called the
transition matrix of the Markov chain.

The following Theorem gives the relation between two consecutive state vectors:

If X_{n+1} and X_n are two consecutive state vectors of a Markov chain with transition matrix T, then X_{n+1} = T X_n.

For a Markov chain, we are usually interested in the long-term behavior of a general state vector X_n. In
other words, we would like to find the limit of X_n as n → ∞. It may happen that this limit does not exist; for
example, let

T = [ 0  1 ]
    [ 1  0 ]

and X0 = (1, 0)^T; then

X1 = T X0 = (0, 1)^T, X2 = T X1 = (1, 0)^T, X3 = (0, 1)^T, …

Clearly X_n oscillates between the vectors (0, 1) and (1, 0) and therefore does not approach a fixed vector.

A question is: what makes X_n approach a limiting vector as n → ∞? The next theorem will give an answer;
first we need a definition:

Definition A transition matrix T is called regular if, for some integer r, all entries of Tr are strictly positive. (0
is not strictly positive).

For example, the matrix

is regular since

A Markov chain process is called regular if its transition matrix is regular.

We state now the main theorem in Markov chain theory:

1. If T is a regular transition matrix, then as n approaches infinity, Tn→S where S is a matrix of the
form [v, v,…,v] with v being a constant vector.

2. If T is a regular transition matrix of a Markov chain process, and if X is any state vector, then as n
approaches infinity, TnX→p, where p is a fixed probability vector (the sum of its entries is 1), all of
whose entries are positive.

Consider a Markov chain with a regular transition matrix T, and let S denote the limit of T^n as n approaches
infinity; then T^n X → SX = p, and therefore the system approaches a fixed state vector p called the steady-state
vector of the system. Now since T^{n+1} = T T^n and both T^{n+1} and T^n approach S, we have S = TS. Note that
any column of this matrix equation gives Tp = p. Therefore, the steady-state vector of a regular Markov chain
with transition matrix T is the unique probability vector p satisfying Tp = p.

Is there a way to compute the steady-state vector of a regular Markov chain without using the limit? Well,
if we can solve Tp = p for p, then yes! You might have seen this sort of thing before (and you certainly will in
your first linear algebra course). Recall the definition of an eigenvector and an eigenvalue of a square
matrix:

Given a square matrix A, we say that the number λ is an eigenvalue of A if there exists a nonzero vector X
satisfying: AX=λX. In this case, we say that X is an eigenvector of A corresponding to the eigenvalue λ.

It is now clear that a steady-state vector of a regular Markov chain is an eigenvector for the transition
matrix corresponding to the eigenvalue 1.

Recall that the eigenvalues of a matrix A are the solutions to the equation det(A- λI)=0 where I is the
identity matrix of the same size as A. If λ is an eigenvalue of A, then an eigenvector corresponding to λ is a
non-zero solution to the homogeneous system (A- λI)X=0. Consequently, there are infinitely many
eigenvectors corresponding to a fixed eigenvalue.
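As a sketch of how the λ = 1 eigenvector can be computed in practice (illustrative code, not from the original text): replace one equation of (T − I)p = 0 by the normalization p1 + p2 + p3 = 1 and solve the resulting linear system. T below is the delivery example from earlier in this chapter, and `solve` is a small Gaussian-elimination helper written for this sketch.

```python
# Steady-state vector of a regular Markov chain without taking limits:
# solve (T - I) p = 0 with the last equation replaced by sum(p) = 1.

def solve(aug):
    """Gauss-Jordan elimination with partial pivoting on [A | b]."""
    n = len(aug)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

T = [[0.3, 0.4, 0.5],
     [0.3, 0.4, 0.3],
     [0.4, 0.2, 0.2]]
n = 3
# first n-1 rows of (T - I) p = 0, then the normalization row
aug = [[T[i][j] - (i == j) for j in range(n)] + [0.0] for i in range(n - 1)]
aug.append([1.0] * n + [1.0])

p = solve(aug)
print([round(x, 3) for x in p])    # [0.389, 0.333, 0.278]
```

This agrees with the long-run percentages found earlier by raising T to a high power.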

Example If you have lived in Ottawa for a while, you must have realized that the weather is a main concern
of the population. An unofficial study of the weather in the city in early spring yields the following
observations:

1. It is almost impossible to have two nice days in a row

2. If we have a nice day, we are just as likely to have snow or rain the next day

3. If we have snow or rain, then we have an even chance to have the same the next day

4. If there is a change from snow or rain, only half of the time is this a change to a nice day.

a. Write the transition matrix to model this system.

b. If it is nice today, what is the probability of being nice after one week?

c. Find the long time behaviour of the weather.

Solution 1) Since the weather tomorrow depends only on today's weather, this is a Markov chain process. The
transition matrix of this system (columns: today's weather; rows: tomorrow's) is

        N     R     S
T = N [  0    1/4   1/4 ]
    R [ 1/2   1/2   1/4 ]
    S [ 1/2   1/4   1/2 ]

where the letters N, R, S represent Nice, Rain, Snow respectively.

2) If it is nice today, then the initial state-vector is

X0 = (1, 0, 0)^T

After seven days (one week), the state-vector would be

X7 = T⁷ X0 ≈ (0.200, 0.400, 0.400)^T

So, there is about 20% chance of being nice in one week.
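The seven-day computation can be sketched as repeated matrix-vector multiplication (plain Python, illustrative; the matrix is the one derived from the four observations above):

```python
# Weather chain: columns correspond to today's weather (N, R, S),
# rows to tomorrow's, as read off from the four observations.
T = [[0.0,  0.25, 0.25],   # to Nice
     [0.5,  0.5,  0.25],   # to Rain
     [0.5,  0.25, 0.5]]    # to Snow

x = [1.0, 0.0, 0.0]        # nice today
for _ in range(7):         # one week of transitions
    x = [sum(T[i][j] * x[j] for j in range(3)) for i in range(3)]

print([round(v, 3) for v in x])   # ~[0.2, 0.4, 0.4]
```

Iterating further leaves the vector essentially unchanged, which anticipates the steady-state result of part 3).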

3) Notice first that we are dealing with a regular Markov chain, since the transition matrix is regular,
so we are sure that the steady-state vector exists. To find it we solve the homogeneous system (T −
I)X = 0, which has the following coefficient matrix:

[ −1    1/4   1/4 ]
[ 1/2  −1/2   1/4 ]
[ 1/2   1/4  −1/2 ]

Reducing to reduced echelon form gives

[ 1  0  −1/2 ]
[ 0  1  −1   ]
[ 0  0   0   ]

The general solution of this system is X = t (0.5, 1, 1)^T.

So what solution do we choose? Remember that a steady-state vector is in particular a probability vector;
that is, the sum of its components is 1: 0.5t + t + t = 1 gives t = 0.4. Thus, the steady-state vector is

p = (0.2, 0.4, 0.4)^T

In the long term, there is a 20% chance of getting a nice day, a 40% chance of having a rainy day and a 40%
chance of having a snowy day.

Objective Question

1. Define the Poisson Process.


2. Define a semi-Markov chain.
3. Define a Gaussian Process.
Short Question

1. State and prove the Chapman-Kolmogorov Equation.


2. Define the Poisson Process and find its probability density function.
3. What are the assumptions in the Poisson Process?
Long Question

1. Define the stochastic matrix.

2. State the properties of homogeneous finite state space Markov chains.

3. State and prove the Chapman-Kolmogorov Equation.

Mumbai University Question Papers

Dec 2012

Q. State and prove Chapman-Kolmogorov Equation.

Q. The transition matrix of a Markov chain with three states 0, 1, 2 is given by

P = [ 3/4  1/4   0  ]
    [ 1/4  1/2  1/4 ]
    [  0   3/4  1/4 ]

and the initial state distribution is

P(X0 = i) = 1/3, i = 0, 1, 2

Find (i) P(X2 = 2) (ii) P(X3 = 1, X2 = 2, X1 = 1, X0 = 2)

May 2012

Q. State and prove Chapman-Kolmogorov Equation.

Dec 2011

Q. State and prove Chapman-Kolmogorov Equation.

May 2011

Q. State and prove Chapman-Kolmogorov Equation.

Q. Three boys A, B, C are throwing a ball to each other. A always throws the ball to B. B is as likely to throw the
ball to C as to A. The probability that C will throw the ball to A is 2/3. Write the transition probability matrix and
show that the process is Markovian.

Q. The transition probability matrix of a Markov chain is

[ 0.5  0.4  0.1 ]
[ 0.3  0.4  0.3 ]
[ 0.2  0.3  0.5 ]

Find the limiting probabilities.

Q. Define Markov chain, giving an example.

1. Define Markov Chain, giving an example. (4 Marks, Dec 2009)

2. State and prove the Chapman-Kolmogorov Equation. (10 Marks, Dec 2009)

3. State and prove the Chapman-Kolmogorov Equation. (10 Marks, Dec 2008)
