
ELEMENTARY PROBABILITY

AND STOCHASTIC PROCESSES

Nguyen Van Thu


Professor Dr. HCMIU VN

1 SET THEORY
This section treats some elementary ideas and concepts of set theory which are necessary for a modern introduction to probability theory. Any well-defined list or collection of objects is called a set; the objects comprising it are called its elements or members. We write p ∈ A if p is an element of the set A. We say that A is a subset of B, denoted A ⊂ B, if every element of A belongs to B.
Two sets are equal if each is contained in the other; that is,

A = B if and only if A ⊂ B and B ⊂ A

We specify a particular set either by listing its elements or by stating properties


which characterize the elements of the set. For example,

A = {1, 3, 5, 7, 9}

B = {x : x is a prime number , x < 15}.


All sets under investigation are assumed to be subsets of some fixed set called the universal set, denoted by Ω. We use ∅ to denote the empty or null set, i.e. the set which contains no elements; this set is regarded as a subset of every other set.
Ex: N = the set of positive integers: 1, 2, 3, ...
Z = the set of integers: · · · , −2, −1, 0, 1, 2, 3, · · ·
Thus we have N ⊂ Z ⊂ R, where R denotes the set of real numbers.
Theorem 1.1 Let A, B and C be any sets. Then (i) A ⊂ A; (ii) if A ⊂ B and B ⊂ A, then A = B; and (iii) if A ⊂ B and B ⊂ C, then A ⊂ C.
A. SET OPERATIONS

Let A and B be arbitrary sets. The union of A and B, denoted by A ∪ B, is the set of elements which belong to A or to B:

A ∪ B = {x : x ∈ A or x ∈ B}.

The intersection of A and B, denoted by A ∩ B, is the set of elements which belong


both to A and B:
A ∩ B = {x : x ∈ A and x ∈ B}.
If A ∩ B = ∅, then A and B are said to be disjoint. The difference of A and B, or the relative complement of B with respect to A, denoted by A \ B, is the set of elements which belong to A but not to B:

A \ B = {x : x ∈ A, x ∉ B}.

Observe that A \ B and B are disjoint, i.e. (A \ B) ∩ B = ∅.


The absolute complement or complement of A, denoted by A^c, is the set of elements which do not belong to A:

A^c = {x : x ∈ Ω, x ∉ A}.

That is, A^c is the difference of the universal set Ω and A.
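These set operations map directly onto Python's built-in `set` type. A small illustrative sketch, reusing the sets A and B from the example above with a universal set of my own choosing:

```python
# Illustrate union, intersection, difference and complement with Python sets.
universe = set(range(1, 16))    # plays the role of the universal set Omega (my choice)
A = {1, 3, 5, 7, 9}
B = {2, 3, 5, 7, 11, 13}        # the primes below 15

print(A | B)          # union A ∪ B
print(A & B)          # intersection A ∩ B
print(A - B)          # difference A \ B
print(universe - A)   # complement of A relative to Omega

# (A \ B) and B are disjoint, as observed in the text:
assert (A - B) & B == set()
```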


Venn diagrams

B. FINITE AND COUNTABLE SETS


A set is finite if it is empty or if it consists of exactly n elements, where n is a positive integer; otherwise it is infinite.
EX: Let M be the set of the days of the week,
M = {Monday, Tuesday, ..., Sunday}. Then M is finite.
EX: Let Y be the set of even integers, Y = {2, 4, 6, · · · }. Then Y is an infinite set.
A set is countable if it is finite or if its elements can be arranged in the form of a sequence.

The two examples above are countable sets.
EX: Let I be the unit interval of real numbers, i.e. I = {x : 0 ≤ x ≤ 1}. Then I is an uncountable set.

2 ELEMENTS OF COMBINATORIAL ANALYSIS
This section explains the basic notions of combinatorial analysis and develops the corresponding probabilistic background.
A. FACTORIAL NOTATION
The product of the positive integers from 1 to n inclusive occurs very often in mathematics and hence is denoted by the special symbol n! (read "n factorial"):

n! = 1 · 2 · 3 · · · (n − 2)(n − 1)n

B. PERMUTATIONS
An arrangement of a set of n objects in a given order is called a permutation of the objects (taken all at a time).
An arrangement of any r ≤ n of these objects in a given order is called an r-permutation or a permutation of the n objects taken r at a time.
The number of permutations of n objects taken r at a time will be denoted by P(n, r):

P(n, r) = n! / (n − r)!

In the special case r = n, we have

P(n, n) = n(n − 1)(n − 2) · · · 3 · 2 · 1 = n!

Corollary: There are n! permutations of n objects (taken all at a time)


EX: We have 5 seats and need to seat 3 people. There are 5 choices for the first person, 4 for the second and 3 for the third.
Therefore, we have

P(5, 3) = 5 · 4 · 3 = 60

or P(5, 3) = 5!/(5 − 3)! = 3 · 4 · 5 = 60
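The seating example can be checked directly; a minimal sketch using Python's standard library (the helper `permutations_count` is my own name):

```python
from math import factorial, perm

# P(n, r) = n! / (n - r)!
def permutations_count(n: int, r: int) -> int:
    return factorial(n) // factorial(n - r)

print(permutations_count(5, 3))  # 60, matching the seating example
print(perm(5, 3))                # the same value via math.perm (Python 3.8+)
```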
C. COMBINATIONS
Suppose we have a collection of n objects. A combination of these n objects taken r at a time, or an r-combination, is any subset of r elements. In other words, an r-combination is any selection of r of the n objects where order does not matter.

The number of combinations of n objects taken r at a time will be denoted by C(n, r):

C(n, r) = n! / (r! (n − r)!)
EX: The combinations of the letters a, b, c, d taken 3 at a time are {a,b,c}, {a,b,d}, {a,c,d}, {b,c,d}, or simply abc, abd, acd, bcd.
That means we have

C(4, 3) = 4!/(3! 1!) = 4 cases
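The letters example above can be reproduced with the standard library, which both counts and enumerates the combinations:

```python
from math import comb
from itertools import combinations

# C(n, r) = n! / (r! (n - r)!)
print(comb(4, 3))  # 4

# Enumerate the four 3-combinations of the letters a, b, c, d:
for c in combinations("abcd", 3):
    print("".join(c))
```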

3 INTRODUCTION TO PROBABILITY

Probability is the mathematical study of random or nondeterministic experiments. The mathematical theory arose from the study of games of chance, which is why so many textbook examples of elementary probability involve dice-tossing or coin-throwing. If a coin is thrown in the air, it is certain that the coin will come down, but it is not certain that a head will appear. Suppose, however, that we repeat this experiment of throwing a coin; let s be the number of successes, i.e. the number of times a head appears, and let n be the number of tosses. Then it has been empirically observed that the ratio f = s/n, called the relative frequency, becomes stable in the long run, i.e. approaches a limit. This stability is the basis of probability theory.
Probability may be regarded as quantifying the chance with which a stated outcome of an event will take place. Probability values are usually assumed to lie on a scale between 0 (impossibility) and 1 (certainty), but they are sometimes expressed as percentages.
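The stabilisation of the relative frequency f = s/n can be watched numerically. A minimal simulation for a fair coin; the seed and sample sizes are arbitrary choices of mine:

```python
import random

# Relative frequency f = s/n of "heads" for a simulated fair coin,
# for increasing numbers of tosses n.
random.seed(0)  # fixed seed so the run is reproducible

for n in (100, 10_000, 1_000_000):
    s = sum(random.random() < 0.5 for _ in range(n))  # number of heads
    print(n, s / n)
```

As n grows, the printed ratios cluster ever more tightly around 0.5.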

A. SAMPLE SPACE AND EVENTS


The set Ω of all possible outcomes of some given experiment is called the sample space. A particular outcome, i.e. an element of Ω, is called a sample point or sample. An event A is a set of outcomes; in other words, a subset of the sample space Ω.
The event {a} consisting of a single sample point a ∈ Ω is called an elementary event. The empty set ∅ and Ω itself are events; ∅ is called the impossible event and Ω the certain or sure event.
Two events A and B are called mutually exclusive if they are disjoint, i.e. if A ∩ B = ∅. In other words, A and B are mutually exclusive if they cannot occur simultaneously.
EX: Toss a die and observe the number that appears on top. The sample space consists of the six possible numbers
Ω = {1, 2, 3, 4, 5, 6}
Let A be the event that an even number occurs, B that an odd number occurs and C that a prime number occurs:
A = {2, 4, 6}, B = {1, 3, 5}, C = {2, 3, 5}

Then A ∪ C = {2, 3, 4, 5, 6}, B ∩ C = {3, 5}, C^c = {1, 4, 6}

B. AXIOMS OF PROBABILITY
Let Ω be a sample space, let T be the class of events, and let P be a real-valued function defined on T. Then P is called a probability function, and P(A) the probability of the event A, if the following axioms hold:
(i) For every event A, 0 ≤ P(A) ≤ 1;
(ii) P(Ω) = 1;
(iii) If A and B are mutually exclusive events, then

P(A ∪ B) = P(A) + P(B);

(iv) If A1, A2, · · · is a sequence of mutually exclusive events, then

P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·

Now we can prove a number of theorems which follow directly from our axioms.
Case 1: If ∅ is the empty set, then P(∅) = 0.
Proof. Let A be any set; then A and ∅ are disjoint and A ∪ ∅ = A. Thus

P(A) = P(A ∪ ∅) = P(A) + P(∅)

Subtracting P(A) from both sides gives our result.


Case 2: If A^c is the complement of an event A, then P(A^c) = 1 − P(A).
Proof. The sample space Ω can be decomposed into the mutually exclusive events A and A^c; that is, Ω = A ∪ A^c.
By (ii) and (iii), we obtain

1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c)

from which our result follows.


Case 3: If A ⊂ B, then P(A) ≤ P(B).
Proof. If A ⊂ B, then B can be decomposed into the mutually exclusive events A and B \ A:

P(B) = P(A) + P(B \ A)

The result follows from the fact that P(B \ A) ≥ 0.

EX: Let three coins be tossed and the number of heads observed; then the sample space is Ω = {0, 1, 2, 3}. We obtain the assignment

P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, P(3) = 1/8

since each probability is nonnegative and the sum of the probabilities is 1.
Let A be the event that at least one head appears and let B be the event that all tails or all heads appear:

A = {1, 2, 3} and B = {0, 3}.

By definition, we have

P(A) = P(1) + P(2) + P(3) = 3/8 + 3/8 + 1/8 = 7/8

and

P(B) = P(0) + P(3) = 1/8 + 1/8 = 1/4

D. FINITE EQUIPROBABLE SPACES


A finite probability space Ω where each sample point has the same probability will be called an equiprobable or uniform space. In particular, if Ω has n points, then the probability of each point is 1/n. Furthermore, if an event A has r points, then its probability is r · (1/n) = r/n. In other words,

P(A) = (number of elements in A) / (number of elements in Ω).

This formula for P(A) can only be used with respect to an equiprobable space; it cannot be used in general.
EX: Let a card be selected at random from an ordinary pack of 52 cards.
Let A = {the card is a spade} and B = {the card is a face card, i.e. a jack, queen or king}.
Since we have an equiprobable space,

P(A) = (number of spades) / (number of cards) = 13/52,
P(B) = (number of face cards) / (number of cards) = 12/52 = 3/13,
P(A ∩ B) = (number of spade face cards) / (number of cards) = 3/52.
E. INFINITE SAMPLE SPACES
Let Ω be a countably infinite sample space, Ω = {ω1, ω2, · · · }. As in the finite case, we obtain a probability space by assigning to each ωi ∈ Ω a nonnegative real number pi, called its probability, such that

p1 + p2 + · · · = Σ_{i=1}^{∞} pi = 1.

The probability P(A) of any event A is then the sum of the probabilities of its points, i.e.

P(A) = Σ_{ω ∈ A} P(ω).
EX: Consider the sample space Ω = {1, 2, 3, · · · , ∞} of the experiment of tossing a coin until a head appears; here n denotes the number of times the coin is tossed. A probability space is obtained by setting

P(1) = 1/2, P(2) = 1/4, · · · , P(n) = 1/2^n, · · · , P(∞) = 0

4 CONDITIONAL PROBABILITY AND INDEPENDENCE
A. CONDITIONAL PROBABILITY
Let E be an arbitrary event in a sample space Ω with P(E) > 0. The probability that an event A occurs once E has occurred or, in other words, the conditional probability of A given E, written P(A|E), is defined as follows:

P(A|E) = P(A ∩ E) / P(E)

In particular, if Ω is a finite equiprobable space and |A| denotes the number of elements in an event A, then

P(A ∩ E) = |A ∩ E| / |Ω|, P(E) = |E| / |Ω| and so P(A|E) = |A ∩ E| / |E|.

EX: Let a pair of fair dice be tossed. If the sum is 6, find the probability
that one of the dice is a 2. In other words, if

E = { sum is 6 } = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}

and
A = { a 2 appears on at least one die }
Find P(A|E).
E consists of 5 elements and two of them, (2,4) and (4,2), belong to A:

A ∩ E = {(2, 4), (4, 2)}

Then P(A|E) = 2/5.
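The two-dice example can be verified by brute-force enumeration of the equiprobable space, a sketch of the |A ∩ E| / |E| formula above:

```python
from fractions import Fraction
from itertools import product

# Enumerate the equiprobable space of two dice and compute P(A | E)
# for E = {sum is 6} and A = {a 2 appears on at least one die}.
omega = list(product(range(1, 7), repeat=2))

E = [(a, b) for a, b in omega if a + b == 6]
A_and_E = [(a, b) for a, b in E if 2 in (a, b)]

p_A_given_E = Fraction(len(A_and_E), len(E))
print(p_A_given_E)  # 2/5
```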

B. MULTIPLICATION THEOREM FOR CONDITIONAL PROBABILITY
If we cross multiply the above equation defining conditional probability
and use the fact that A ∩ E = E ∩ A, we obtain the following useful theorem.

Theorem 1. P (E ∩ A) = P (E)P (A|E)

This result can be extended by induction as follows: for any events A1, A2, · · · , An,

P(A1 ∩ A2 ∩ · · · ∩ An) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) · · · P(An|A1 ∩ A2 ∩ · · · ∩ An−1)

EX: We have a box with w white marbles and r red marbles. Select one marble from the box and then return it together with x additional marbles of the same color. Find the probability that in the first four selections, all of the selected marbles are white.
Solution: Let Wi be the event that a white marble is selected at the i-th draw, i = 1, 2, 3, 4. Then
Then

P (W1 W2 W3 W4 ) = P (W1 )P (W2 |W1 )P (W3 |W1 W2 )P (W4 |W1 W2 W3 )


= (w / (w + r)) · ((w + x) / (w + r + x)) · ((w + 2x) / (w + r + 2x)) · ((w + 3x) / (w + r + 3x))

C. PARTITIONS AND BAYES' THEOREM


Suppose the events A1 , A2 , · · · , An form a partition of a sample space Ω,
that is, the events Ai are mutually exclusive and their union is Ω. Now let
B be any other event. Then

B = Ω ∩ B = (A1 ∪ A2 · · · ∪ An ) ∩ B = (A1 ∩ B) ∪ (A2 ∩ B) ∪ · · · ∪ (An ∩ B)

where Ai ∩ B are also mutually exclusive. Accordingly,

P (B) = P (A1 ∩ B) + P (A2 ∩ B) + · · · + P (An ∩ B)

Thus, by the multiplication theorem,

P(B) = P(A1)P(B|A1) + P(A2)P(B|A2) + · · · + P(An)P(B|An) (1)

In the defining equation P(Ai|B) = P(Ai ∩ B)/P(B), we use (1) to replace P(B) and use P(Ai ∩ B) = P(Ai)P(B|Ai) to replace P(Ai ∩ B), thus obtaining
Bayes' Theorem: Suppose A1, A2, · · · , An is a partition of Ω and B is any event. Then, for any i,

P(Ai|B) = P(Ai)P(B|Ai) / [P(A1)P(B|A1) + · · · + P(An)P(B|An)]

EX: Box I has 2 white marbles and 3 yellow marbles. Box II has one white marble and 2 yellow marbles. Take one marble from box I and put it into box II; then draw one marble from box II.
a) Find the probability that the second draw yields a white marble.
b) Suppose the second draw yields a yellow marble. Find the probability that the first draw yielded a white one.
Solution. Let T1 be the event that the first draw yields a white marble, T2 the event that the second draw yields a white marble, V1 the event that the first draw yields a yellow marble, and V2 the event that the second draw yields a yellow marble.
a) By using the multiplication theorem for conditional probability
P(T2) = P(T1)P(T2|T1) + P(V1)P(T2|V1) = (2/5) · (2/4) + (3/5) · (1/4) = 7/20
b) By using Bayes' Theorem,

P(T1|V2) = P(T1)P(V2|T1) / P(V2) = P(T1)P(V2|T1) / [P(T1)P(V2|T1) + P(V1)P(V2|V1)]
         = [(2/5) · (1/2)] / [(2/5) · (1/2) + (3/5) · (3/4)] = 1 / (1 + 9/4) = 4/13
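The two-box computation can be reproduced exactly with rational arithmetic, following the total-probability and Bayes formulas step by step:

```python
from fractions import Fraction

# Probabilities from the two-box example: first draw from box I,
# then a second draw from box II (which now holds 4 marbles).
P_T1, P_V1 = Fraction(2, 5), Fraction(3, 5)        # white / yellow on the first draw
P_T2_T1, P_T2_V1 = Fraction(2, 4), Fraction(1, 4)  # P(white second | first draw)

# a) total probability:
P_T2 = P_T1 * P_T2_T1 + P_V1 * P_T2_V1
print(P_T2)  # 7/20

# b) Bayes' theorem for P(T1 | V2):
P_V2_T1, P_V2_V1 = 1 - P_T2_T1, 1 - P_T2_V1
P_T1_V2 = P_T1 * P_V2_T1 / (P_T1 * P_V2_T1 + P_V1 * P_V2_V1)
print(P_T1_V2)  # 4/13
```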
D. INDEPENDENCE
An event B is said to be independent of an event A if the probability that B occurs is not influenced by whether A has or has not occurred; in other words, if the probability of B equals the conditional probability of B given A: P(B) = P(B|A).
Now, substituting P(B) for P(B|A) in the multiplication theorem P(A ∩ B) = P(A)P(B|A), we obtain

P(A ∩ B) = P(A)P(B)

which leads to the following definition.

Definition 2. Events A and B are independent if P(A ∩ B) = P(A)P(B).

EX: Let a fair coin be tossed 3 times; we obtain the equiprobable space

Ω = {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T }

Consider the events A = {the first toss is a head}, B = {the second toss is a head}, and C = {exactly two heads in a row are tossed}.
Clearly A and B are independent events. We have
P(A) = P({HHH, HHT, HTH, HTT}) = 4/8 = 1/2
P(B) = P({HHH, HHT, THH, THT}) = 4/8 = 1/2
P(C) = P({HHT, THH}) = 2/8 = 1/4

Then P(A ∩ B) = P({HHH, HHT}) = 1/4, P(A ∩ C) = P({HHT}) = 1/8, P(B ∩ C) = P({HHT, THH}) = 1/4.
Accordingly,

P(A)P(B) = (1/2) · (1/2) = 1/4 = P(A ∩ B), so A and B are independent;
P(A)P(C) = (1/2) · (1/4) = 1/8 = P(A ∩ C), so A and C are independent;
P(B)P(C) = (1/2) · (1/4) = 1/8 ≠ P(B ∩ C), so B and C are not independent.
Three events A, B and C are independent if
(i) P(A ∩ B) = P(A)P(B), P(A ∩ C) = P(A)P(C) and P(B ∩ C) = P(B)P(C), i.e. the events are pairwise independent, and
(ii) P(A ∩ B ∩ C) = P(A)P(B)P(C).
Condition (ii) does not follow from condition (i); in other words, three events may be pairwise independent without being independent themselves.
EX: Let a pair of coins be tossed, here Ω = { HH,HT,TH,TT } is an
equiprobable space. Consider the events

A = { heads on the first coin } = { HH,HT }

B = { heads on the second coin } = { HH,TH }

C = { heads on exactly one coin } = { HT,TH }
Then P(A) = P(B) = P(C) = 2/4 = 1/2 and

P(A ∩ B) = P({HH}) = 1/4, P(A ∩ C) = P({HT}) = 1/4, P(B ∩ C) = P({TH}) = 1/4
Thus condition (i) is satisfied, i.e. the events are pairwise independent. However, A ∩ B ∩ C = ∅ and so

P(A ∩ B ∩ C) = P(∅) = 0 ≠ P(A)P(B)P(C)

In other words, condition (ii) is not satisfied, and so the three events are not independent.
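The two-coin counterexample can be checked by enumeration; a sketch where events are represented as predicates on outcomes:

```python
from fractions import Fraction
from itertools import product

# Pairwise independence without mutual independence: two fair coins.
omega = list(product("HT", repeat=2))

def P(event):
    """Probability of an event (a predicate on outcomes) in the equiprobable space."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] == "H"                     # heads on the first coin
B = lambda w: w[1] == "H"                     # heads on the second coin
C = lambda w: (w[0] == "H") != (w[1] == "H")  # heads on exactly one coin

# Condition (i): pairwise independent.
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)
assert P(lambda w: A(w) and C(w)) == P(A) * P(C)
assert P(lambda w: B(w) and C(w)) == P(B) * P(C)

# Condition (ii) fails: P(A ∩ B ∩ C) = 0 but P(A)P(B)P(C) = 1/8.
assert P(lambda w: A(w) and B(w) and C(w)) == 0
assert P(A) * P(B) * P(C) == Fraction(1, 8)
print("pairwise independent, but not mutually independent")
```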

5 RANDOM VARIABLE
A. INTRODUCTION
Definition 3. A random variable X on a sample space Ω is a function from Ω into
the set R of real numbers such that the preimage of every interval of R is an event of
Ω.
We emphasize that if Ω is a discrete space in which every subset is an
event, then every real-valued function on Ω is a random variable. On the
other hand, it can be shown that if Ω is uncountable then certain real-valued
functions on Ω are not random variables.
If X and Y are random variables on the sample space Ω, then X + Y, X + k, kX and XY (where k is a real number) are the functions on Ω defined by
(X + Y)(ω) = X(ω) + Y(ω)
(X + k)(ω) = X(ω) + k
(kX)(ω) = kX(ω)
(XY)(ω) = X(ω)Y(ω)
for every ω ∈ Ω. It can be shown that these are also random variables.
We denote
P(X = a) = P({ω ∈ Ω : X(ω) = a})
and P(a ≤ X ≤ b) = P({ω ∈ Ω : a ≤ X(ω) ≤ b}).
B. DISCRETE RANDOM VARIABLES IN GENERAL
Now suppose X is a random variable on Ω whose image X(Ω) = {x1, x2, · · · } is a countably infinite set. Such random variables, together with those whose image is a finite set, are called discrete random variables. As in the finite case, we make X(Ω) into a probability space by defining the probability of xi to be f(xi) = P(X = xi), and we call f the distribution of X.

The expectation EX (also denoted by µ) is defined by

EX = x1 f(x1) + x2 f(x2) + · · · = Σ_{i=1}^{∞} xi f(xi)

provided the series converges absolutely.

That is, EX is the weighted average of the possible values of X.
EX: A pair of fair dice is tossed. We obtain the finite equiprobable space Ω consisting of the 36 ordered pairs of numbers between 1 and 6:

Ω = {(1, 1), (1, 2), · · · , (6, 6)}.

Let X assign to each point (a, b) in Ω the maximum of its numbers, i.e. X(a, b) = max(a, b). Then the image set of the random variable X is X(Ω) = {1, 2, 3, 4, 5, 6}.
The distribution of X:

f(1) = P(X = 1) = P({(1, 1)}) = 1/36
f(2) = P(X = 2) = P({(2, 1), (2, 2), (1, 2)}) = 3/36

Similarly,

f(3) = P(X = 3) = 5/36, f(4) = P(X = 4) = 7/36, f(5) = P(X = 5) = 9/36, f(6) = P(X = 6) = 11/36.
This information is put in the form of a table as follows:

xi     |  1     2     3     4     5     6
f(xi)  | 1/36  3/36  5/36  7/36  9/36  11/36

Next, we compute the mean of X:

EX = Σ_{i=1}^{6} xi f(xi) = 1 · (1/36) + 2 · (3/36) + 3 · (5/36) + 4 · (7/36) + 5 · (9/36) + 6 · (11/36) = 161/36 ≈ 4.47

Some properties of expectation:

(i) EX = c if X takes the constant value c with probability 1;
(ii) E(αX + βY) = αEX + βEY, where α and β are constants;
(iii) if X ≥ 0 then EX ≥ 0;
(iv) if X ≥ Y then EX ≥ EY;
(v) |EX| ≤ E|X|.
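The dice-maximum expectation above can be recomputed by brute force over the 36 outcomes, keeping exact fractions:

```python
from fractions import Fraction
from itertools import product

# E[max of two fair dice], computed outcome by outcome:
# each of the 36 ordered pairs has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
EX = sum(Fraction(max(a, b), 36) for a, b in omega)
print(EX, float(EX))  # 161/36, about 4.47
```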

C. VARIANCE AND STANDARD DEVIATION


The mean of a random variable X measures the "average" value of X. The next concept, the variance of X, measures the "spread" or "dispersion" of X.

If X is a function from Ω into the set R of real numbers and g is a function from R into R, that is,

Ω --X--> R --g--> R

then g(X) is a new random variable.

If J = Σ_{k=1}^{∞} |g(xk)| f(xk) < ∞, then g(X) has expectation Eg(X) = Σ_{k=1}^{∞} g(xk) f(xk).
If J = +∞, g(X) has no expectation.
Some special cases of g:
Case 1: g(t) = t^n, n = 1, 2, · · ·

EX^n = Eg(X) = Σ_{k=1}^{∞} xk^n f(xk)

is called the n-th moment of X.


Case 2: g(t) = (t − µ)^n, n = 1, 2, · · ·

E(X − µX)^n = Eg(X) = Σ_{k=1}^{∞} (xk − µX)^n f(xk)

is called the n-th central moment of X.
In particular, for n = 2, E(X − µX)² is called the variance of X.
The variance of X, denoted by VarX, is

VarX = Σ_i (xi − µX)² f(xi) = E(X − µX)² = EX² − µX²

where µX is the mean of X. The standard deviation of X, denoted by σX, is the square root of VarX:

σX = √(VarX).
EX: Continue the above example. The expectation of X is 4.47, i.e. µX = 4.47. We compute the variance and standard deviation of X. First, we compute

EX² = Σ_{i=1}^{6} xi² f(xi) = 1² · (1/36) + 2² · (3/36) + 3² · (5/36) + 4² · (7/36) + 5² · (9/36) + 6² · (11/36) = 791/36 ≈ 21.97

Hence, VarX = EX² − µX² ≈ 21.97 − (4.47)² ≈ 1.99
and σX = √1.99 ≈ 1.41
Remark 1. There is a physical interpretation of mean and variance. Suppose that at each point xi on the x-axis a particle with mass f(xi) is placed. Then the mean is the center of gravity of the system, and the variance is the moment of inertia of the system.
Remark 2. Many random variables give rise to the same distribution; hence we frequently speak of the mean, variance and standard deviation of a distribution instead of the underlying random variable.
Remark 3. Let X be a random variable with mean µ and standard deviation σ > 0. The standardized random variable X* is defined as

X* = (X − µ) / σ.

Ex: Show that E(X*) = 0 and Var(X*) = 1.
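The variance formulas, and the claim about the standardized variable, can be checked numerically on the dice-maximum example from above:

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Variance and standard deviation of X = max of two dice,
# plus a numerical check that the standardized variable has mean 0, variance 1.
omega = list(product(range(1, 7), repeat=2))
values = [max(a, b) for a, b in omega]

mu = sum(Fraction(v, 36) for v in values)       # 161/36
EX2 = sum(Fraction(v * v, 36) for v in values)  # 791/36
var = EX2 - mu**2                               # exact: 2555/1296, about 1.97
sigma = sqrt(var)

print(float(var), sigma)

# Standardized variable X* = (X - mu)/sigma, evaluated over all outcomes:
std_mean = sum((v - float(mu)) / sigma for v in values) / 36
std_var = sum(((v - float(mu)) / sigma) ** 2 for v in values) / 36
print(std_mean, std_var)
```

Up to floating-point rounding, the last line prints a mean of 0 and a variance of 1, as the exercise asks you to prove in general. (Note the text's value 1.99 comes from rounding µX to 4.47; the exact variance is 2555/1296 ≈ 1.97.)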
D. CONTINUOUS RANDOM VARIABLES
Suppose that X is a random variable whose image set X(Ω) is a continuum of numbers, such as an interval. It follows from the definition of random variables that the set {ω ∈ Ω : a ≤ X(ω) ≤ b} is an event in Ω, and so the probability P(a ≤ X ≤ b) is well defined. We assume that there is a piecewise continuous function f : R → R such that P(a ≤ X ≤ b) is equal to the area under the graph of f between x = a and x = b (as shown below).

P (a 6 X 6 b) =area of shaded region.

In the language of calculus,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

In this case, X is said to be a continuous random variable. The function f is called the distribution or the continuous probability function (or density function) of X; it satisfies the conditions
(i) f(x) ≥ 0 and (ii) ∫_R f(x) dx = 1
The expectation EX is defined by

EX = ∫_R x f(x) dx

when it exists.
The variance is defined by

VarX = E(X − µX)² = ∫_R (x − µX)² f(x) dx

when it exists.

Just as in the discrete case, VarX exists if and only if µX = EX and EX² both exist, and then

VarX = EX² − µX² = ∫_R x² f(x) dx − µX²

The standard deviation σX is defined by

σX = √(VarX) when VarX exists.

EX: Let X be a continuous random variable with the following distribution:

f(x) = x/2 if 0 ≤ x ≤ 2, and f(x) = 0 elsewhere.

Then

P(1 ≤ X ≤ 1.5) = ∫_1^1.5 f(x) dx = ∫_1^1.5 (x/2) dx = [x²/4]_1^1.5 = (2.25 − 1)/4 = 1.25/4 = 5/16

Next, we compute the expectation, variance and standard deviation of X:

EX = ∫_R x f(x) dx = ∫_0^2 x · (x/2) dx = ∫_0^2 (x²/2) dx = [x³/6]_0^2 = 4/3

EX² = ∫_R x² f(x) dx = ∫_0^2 x² · (x/2) dx = ∫_0^2 (x³/2) dx = [x⁴/8]_0^2 = 2

VarX = EX² − µ² = 2 − 16/9 = 2/9 and σX = √(2/9) = √2/3
Remark
A finite number of continuous random variables, say X, Y, · · · , Z, are said to be independent if for any intervals [a, a′], [b, b′], · · · , [c, c′],

P(a ≤ X ≤ a′, b ≤ Y ≤ b′, · · · , c ≤ Z ≤ c′) = P(a ≤ X ≤ a′)P(b ≤ Y ≤ b′) · · · P(c ≤ Z ≤ c′)

E. CUMULATIVE DISTRIBUTION FUNCTION


The cumulative distribution function F is the function F : R → R defined by

F(a) = P(X ≤ a)

If X is a discrete random variable with distribution f, then F is the "step function" defined by

F(x) = Σ_{xi ≤ x} f(xi)

On the other hand, if X is a continuous random variable with density f, then

F(x) = ∫_{−∞}^x f(t) dt.

In either case, F is monotonic increasing, i.e.

F(a) ≤ F(b) whenever a ≤ b,

and the limit of F at −∞ is 0 and at +∞ is 1:

lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1

EX: Let X be a discrete random variable with the following distribution

The graph of the cumulative distribution function F of X follows


Graph of F

Observe that F is a ”step function” with a step at the xi with height f (xi )
EX: Let X be a continuous random variable with the following distribution:

f(x) = x/2 if 0 ≤ x ≤ 2, and f(x) = 0 elsewhere.
The cumulative distribution function F follows:

F(x) = 0 if x < 0;  F(x) = x²/4 if 0 ≤ x ≤ 2;  F(x) = 1 if x > 2.

Here we use the fact that for 0 ≤ x ≤ 2,

F(x) = ∫_0^x (t/2) dt = x²/4

F. JOINT DISTRIBUTION
Let X and Y be random variables on a sample space Ω with respective image sets
X(Ω) = {x1, x2, · · · , xn} and Y(Ω) = {y1, y2, · · · , ym}.
We make the product set
X(Ω) × Y(Ω) = {(x1, y1), (x1, y2), · · · , (xn, ym)}
into a probability space by defining the probability of the ordered pair (xi, yj) to be P(X = xi, Y = yj), which is written h(xi, yj). Then h is called the joint distribution or joint probability function of X and Y and is usually given in the form of a table.

The functions f and g above are defined by

f(xi) = Σ_{j=1}^{m} h(xi, yj) and g(yj) = Σ_{i=1}^{n} h(xi, yj)

Both f(xi) and g(yj) are called the marginal distributions; they are the distributions of X and Y respectively. If X and Y have joint distribution h and respective means µX and µY, the covariance of X and Y, denoted by Cov(X, Y), is defined by

Cov(X, Y) = Σ_{i,j} (xi − µX)(yj − µY) h(xi, yj) = E[(X − µX)(Y − µY)]

or equivalently by

Cov(X, Y) = Σ_{i,j} xi yj h(xi, yj) − µX µY

The correlation of X and Y, denoted by ρ(X, Y), is defined by

ρ(X, Y) = Cov(X, Y) / (σX σY)

Some properties of ρ:
(i) ρ(X, Y) = ρ(Y, X)
(ii) −1 ≤ ρ(X, Y) ≤ 1
(iii) ρ(X, X) = 1
(iv) ρ(aX + b, cY + d) = ρ(X, Y) if a, c > 0

EX: Let X and Y be random variables with the following joint distribution:

h(x, y)  |  y = 4   y = 10
x = 1    |    0      1/2
x = 3    |   1/2      0

We compute EXY, µX and µY:
EXY = 1 · 4 · 0 + 1 · 10 · (1/2) + 3 · 4 · (1/2) + 3 · 10 · 0 = 11
µX = EX = 1 · (1/2) + 3 · (1/2) = 2
µY = EY = 4 · (1/2) + 10 · (1/2) = 7
Then Cov(X, Y) = E(XY) − µX µY = 11 − 2 · 7 = −3
Remark. The notion of a joint distribution h is extended to any finite number of random variables X, Y, · · · , Z in the obvious way; that is, h(xi, yj, · · · , zk) = P(X = xi, Y = yj, · · · , Z = zk).
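The covariance computation above can be run exactly, storing the joint distribution as a dictionary keyed by (x, y) pairs:

```python
from fractions import Fraction

# The joint distribution from the example, as a dict {(x, y): h(x, y)}.
h = {(1, 4): Fraction(0), (1, 10): Fraction(1, 2),
     (3, 4): Fraction(1, 2), (3, 10): Fraction(0)}

EXY = sum(x * y * p for (x, y), p in h.items())
mu_x = sum(x * p for (x, y), p in h.items())   # marginal mean of X
mu_y = sum(y * p for (x, y), p in h.items())   # marginal mean of Y

cov = EXY - mu_x * mu_y
print(EXY, mu_x, mu_y, cov)  # 11 2 7 -3
```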
G. INDEPENDENT RANDOM VARIABLE
A finite number of random variables X, Y, Z, · · · on a sample space Ω are said to
be independent if

P (X = xi , Y = yj , · · · Z = zk ) = P (X = xi )P (Y = yj ) · · · P (Z = zk )

for any values xi , yj , · · · , zk . In particular, X and Y are independent if

P (X = xi , Y = yj ) = P (X = xi )P (Y = yj )

Now, if X and Y have respective distributions f and g, and joint distribution h, then the above equation can be written as

h(xi, yj) = f(xi) g(yj)

Theorem 4. Let X and Y be independent random variables. Then


(i) EXY = EXEY
(ii) V ar(X + Y ) = V arX + V arY
(iii) Cov(X, Y ) = 0

H. TCHEBYCHEFF'S INEQUALITY AND THE LAW OF LARGE NUMBERS


An intuitive idea of probability is the so-called "law of averages": if an event A occurs with probability p, then the "average number of occurrences of A" approaches p as the number of trials grows. This concept is made precise by the Law of Large Numbers stated below.

Theorem 5. Let X be a random variable with mean µ and standard deviation σ. Then for every ε > 0,

P(|X − µ| ≥ ε) ≤ σ²/ε²

Proof. We begin with the definition of variance:

σ² = VarX = Σ_i (xi − µ)² f(xi)

We delete all the terms in the above series for which |xi − µ| < ε. This does not increase the value of the series, since all its terms are nonnegative; that is,

σ² ≥ Σ*_i (xi − µ)² f(xi)

where the asterisk indicates that the summation extends only over those i for which |xi − µ| ≥ ε. This new summation does not increase in value if we replace each |xi − µ| by ε; that is,

σ² ≥ Σ*_i ε² f(xi) = ε² Σ*_i f(xi)

But Σ*_i f(xi) is equal to the probability that |X − µ| ≥ ε; hence

σ² ≥ ε² P(|X − µ| ≥ ε)

Dividing by ε² we get the desired inequality.

Theorem 6. Let X1, X2, · · · be a sequence of independent random variables with the same distribution, with mean µ and variance σ². Let

Sn = (X1 + X2 + · · · + Xn)/n be the sample mean.

Then for any ε > 0,

lim_{n→∞} P(|Sn − µ| ≥ ε) = 0, or equivalently lim_{n→∞} P(|Sn − µ| < ε) = 1

Proof. Note first that

E(Sn) = (EX1 + EX2 + · · · + EXn)/n = nµ/n = µ

Since X1, · · · , Xn are independent, it follows that

Var(Sn) = Var((X1 + · · · + Xn)/n) = (1/n²)(VarX1 + · · · + VarXn) = nσ²/n² = σ²/n

Thus by Tchebycheff's inequality,

P(|Sn − µ| ≥ ε) ≤ σ²/(nε²)

The theorem now follows from the fact that the right side tends to 0 as n → ∞.
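The convergence asserted by the theorem can be observed numerically. A minimal simulation with fair-die rolls, whose mean is µ = 3.5; the seed and sample sizes are arbitrary choices of mine:

```python
import random

# Sample means of fair-die rolls drift toward mu = 3.5 as n grows,
# illustrating the Law of Large Numbers.
random.seed(42)  # fixed seed for reproducibility
mu = 3.5

for n in (10, 1_000, 100_000):
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, mean, abs(mean - mu))
```

The printed deviation |Sn − µ| shrinks as n increases, in line with the σ²/(nε²) bound.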

6 SOME ORDINARY DISTRIBUTIONS

A. DISCRETE RANDOM VARIABLE

A1. BERNOULLI DISTRIBUTION
Consider an experiment which has 2 outcomes: one is success and the other is failure. Let p be the probability of success, so that q = 1 − p is the probability of failure. Let X be the number of successes when the experiment is performed once; then P(X = 1) = p and P(X = 0) = q.

A2. BINOMIAL DISTRIBUTION
Repeat the experiment of A1 n times, with the repetitions independent.
Let S be the number of successes in the n trials. Then

P_k = P(S = k) = C(n, k) p^k q^(n−k), k = 0, 1, · · · , n

We denote this by S ∼ b(n, p). One can calculate ES = np and VarS = npq.

EX: Let a die be tossed 20 times. Let S be the number of times face 1 appears in the 20 tosses.
a) Find the distribution of S.
b) Find the probability that face 1 appears exactly 3 times.
Solve.
a) S ∼ b(20, 1/6)
b) P(S = 3) = C(20, 3) (1/6)^3 (5/6)^17
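The binomial probability in the die example can be evaluated directly (`binom_pmf` is a helper name of my own):

```python
from math import comb

# P(S = k) = C(n, k) p^k q^(n-k) for the die example: n = 20, p = 1/6.
def binom_pmf(n: int, k: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print(binom_pmf(20, 3, 1 / 6))  # about 0.238

# Sanity check: the probabilities over k = 0..20 sum to 1 (up to rounding).
print(sum(binom_pmf(20, k, 1 / 6) for k in range(21)))
```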

A3. POISSON DISTRIBUTION


Given λ > 0, let the random variable X take the values 0, 1, 2, · · · with the respective probabilities P0, P1, P2, · · · , where

P_k = P(X = k) = λ^k e^(−λ) / k!

Then we say that X has the Poisson distribution, denoted by X ∼ Poisson(λ). One can calculate EX = λ and VarX = λ.

This countably infinite distribution appears in many natural phenomena, such as the number of telephone calls per minute at some switchboard or the number of α particles emitted by a radioactive substance.

A4. GEOMETRIC DISTRIBUTION
Let a coin be tossed until a head appears, and let p and q be the respective probabilities of success and failure on each toss. Let X be the number of tosses. Then X takes the values 1, 2, 3, · · · with
P(X = 1) = P(H) = p
P(X = 2) = P(TH) = P(T)P(H) = qp
P(X = 3) = P(TTH) = P(T)P(T)P(H) = q²p
and in general P(X = k) = q^(k−1) p.

A5.MULTINOMIAL DISTRIBUTION
The binomial distribution is generalized as follows. Suppose the sample space of an experiment is partitioned into s mutually exclusive events A1, A2, · · · , As with respective probabilities p1, p2, · · · , ps, so that p1 + p2 + · · · + ps = 1. Then

Theorem 7. In n repeated trials, the probability that A1 occurs k1 times, A2 occurs k2 times, · · · , As occurs ks times is equal to

n! / (k1! k2! · · · ks!) · p1^(k1) p2^(k2) · · · ps^(ks)

where k1 + k2 + · · · + ks = n.

The above numbers form the so-called multinomial distribution, since they are precisely the terms in the expansion of (p1 + p2 + · · · + ps)^n. If s = 2, we obtain the binomial distribution, discussed at the beginning of the section.
EX: A fair die is tossed 8 times. The probability of obtaining faces 5 and 6 twice each, and each of the other faces once, is

8!/(2! 2! 1! 1! 1! 1!) · (1/6)² (1/6)² (1/6)(1/6)(1/6)(1/6) = 35/5832 ≈ 0.006

A6. HYPERGEOMETRIC DISTRIBUTION

A box contains N marbles. Of these, M are drawn at random, marked and returned to the box. Next, n marbles are drawn at random from the box, and the marked marbles among them are counted. If X denotes the number of marked marbles, then

P(X = x) = C(M, x) C(N − M, n − x) / C(N, n)

It is easy to check that

EX = nM/N
VarX = nM(N − M)(N − n) / (N²(N − 1))
EX: A lot consisting of 50 bulbs is inspected by taking 10 bulbs at random and testing them. If the number of defective bulbs found is at most 1, the lot is accepted; otherwise it is rejected. If there are in fact 10 defective bulbs in the lot, the probability of accepting the lot is

[C(40, 10) + C(10, 1) C(40, 9)] / C(50, 10) ≈ 0.3487
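The acceptance probability in the bulb example can be checked with exact binomial coefficients:

```python
from math import comb

# Acceptance probability for the bulb lot: N = 50 bulbs, 10 defective,
# sample n = 10, accept when the sample contains at most one defective bulb.
N, defective, n = 50, 10, 10
good = N - defective

accept = sum(comb(defective, k) * comb(good, n - k) for k in (0, 1)) / comb(N, n)
print(round(accept, 4))  # 0.3487
```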
B. CONTINUOUS DISTRIBUTIONS
B1. UNIFORM DISTRIBUTION
A random variable X is said to have a uniform distribution on the interval [a, b] if its probability density function is given by

f(x) = 1/(b − a) if a ≤ x ≤ b, and f(x) = 0 elsewhere.

We write X ∼ U[a, b] if X has the uniform distribution on [a, b].

B2. EXPONENTIAL DISTRIBUTION
A random variable X is said to have the exponential distribution with positive parameter λ (λ > 0) if its p.d.f. is given by

f(x) = λ e^(−λx) if x ≥ 0, and f(x) = 0 if x < 0.

We write X ∼ ξ(λ) if X has the exponential distribution.


B3. NORMAL DISTRIBUTION
A random variable X is said to have a normal distribution if its p.d.f. is given by

f(x) = (1/(σ√(2π))) e^(−(1/2)((x−µ)/σ)²) for all x ∈ R

We write X ∼ N(µ, σ²) if X has the normal distribution. This function is one of the most important examples of a continuous probability distribution. One can calculate EX = µ and VarX = σ².

If µ = 0 and σ² = 1, we obtain the standard normal distribution or curve

f(x) = (1/√(2π)) e^(−x²/2) for all x ∈ R
Theorem 8. If Y has the normal distribution Y ∼ N(µ, σ²), then X = (Y − µ)/σ ∼ N(0, 1). Conversely, if X ∼ N(0, 1), then Y = µ + σX ∼ N(µ, σ²).
Theorem 9. Let X have the standard normal distribution and denote Φ(x) = (1/√(2π)) ∫_0^x e^(−t²/2) dt. Then we have the properties
(i) P(a < X < b) = P(a ≤ X ≤ b) = Φ(b) − Φ(a)
(ii) Φ(−a) = −Φ(a)
(iii) P(|X| < a) = 2Φ(a) for all a > 0

EX: Given a random variable Y with normal distribution, Y ∼ N(2, 9). Find
P(Y ≤ 5), P(Y > −1), P(−4 < Y < 8)
Solve. P(Y ≤ 5) = P((Y − 2)/3 ≤ (5 − 2)/3) = P(Z ≤ 1) = 0.5 + Φ(1) = 0.5 + 0.3413 = 0.8413
P(Y > −1) = P((Y − 2)/3 > (−1 − 2)/3) = P(Z > −1) = 1 − P(Z ≤ −1) = 1 − (0.5 + Φ(−1)) = 0.5 − Φ(−1) = 0.5 + Φ(1) = 0.5 + 0.3413 = 0.8413
P(−4 < Y < 8) = P(−2 < (Y − 2)/3 < 2) = P(|Z| < 2) = 2Φ(2) = 2 · 0.4772 = 0.9544.
Here, Z has the standard normal distribution.
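The probabilities in this example can be reproduced with Python's error function; here phi is the Laplace function P(0 < Z < x) used by the text's tables (a sketch, with the helper name phi chosen by us):

```python
from math import erf, sqrt

def phi(x):
    # Laplace function: P(0 < Z < x) for standard normal Z
    return 0.5 * erf(x / sqrt(2.0))

mu, sigma = 2.0, 3.0                 # Y ~ N(2, 9)
p1 = 0.5 + phi((5 - mu) / sigma)     # P(Y <= 5) = 0.5 + phi(1)
p2 = 0.5 + phi((mu - (-1)) / sigma)  # P(Y > -1) = 0.5 + phi(1), by symmetry
p3 = 2 * phi((8 - mu) / sigma)       # P(-4 < Y < 8) = 2*phi(2)
print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.8413 0.8413 0.9545
```

The last value differs from the four-digit table product 2 · 0.4772 = 0.9544 only by table rounding.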
The density function n(x; µ, σ²) of the normal distribution N(µ, σ²) is
n(x; µ, σ²) = (1/(√(2π)σ)) e^{−(1/2)[(x−µ)/σ]²}, −∞ < x < ∞.
Table A.3 indicates the area under the standard normal curve corresponding to P(Z < z), where Z is the standard normal random variable. To illustrate the use of Table A.3, let us find the probability that Z is less than 1.74. First, we locate the value z = 1.74 in the left column, then move across the row to the column under 0.04, where we read 0.9591. Therefore,
P(Z < 1.74) = 0.9591
B4.GAMMA DISTRIBUTION
The integral
Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx
converges or diverges according as α > 0 or α ≤ 0. For α > 0, this integral is called the Gamma function.
Some properties of the Gamma function:
(i) Γ(1) = 1; Γ(1/2) = √π
(ii) Γ(α + 1) = αΓ(α)
(iii) Γ(n + 1) = n! for every positive integer n
A random variable X is said to have the Gamma distribution if its p.d.f is given by
f(x) = (1/(Γ(α)β^α)) x^{α−1} e^{−x/β} for x > 0, and f(x) = 0 for x ≤ 0,
where α, β > 0.
We will write X ∼ Gamma(α, β) if X has the Gamma distribution. In particular, if α = r/2 and β = 2, we say that X has the Chi-square distribution, with p.d.f
f(x) = (1/(Γ(r/2) 2^{r/2})) x^{r/2−1} e^{−x/2} for x > 0, and f(x) = 0 for x ≤ 0.
In this case, X will be denoted X ∼ χ²(r).
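The Gamma-function properties (i)-(iii) can be spot-checked numerically with Python's math.gamma (a sketch; the test value α = 3.7 is an arbitrary choice of ours):

```python
from math import factorial, gamma, pi, sqrt

# (i) Gamma(1) = 1 and Gamma(1/2) = sqrt(pi)
assert gamma(1.0) == 1.0
assert abs(gamma(0.5) - sqrt(pi)) < 1e-12

# (ii) Gamma(a + 1) = a * Gamma(a), checked at an arbitrary a
a = 3.7
assert abs(gamma(a + 1) - a * gamma(a)) < 1e-9

# (iii) Gamma(n + 1) = n! for positive integers n
for n in range(1, 10):
    assert abs(gamma(n + 1) - factorial(n)) < 1e-6 * factorial(n)

print("Gamma properties verified")
```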
7 INTRODUCTION TO STATISTICS
What do we mean by statistics?
We have statistics in the form of annual reports, distribution surveys, museum records, to name just a few. It is impossible to imagine life without some form of statistical information being readily at hand.
The word statistics is used in two senses. It refers to collections of quantitative information, and to methods of handling that sort of data. A society's annual report, listing the number or whereabouts of interesting animal or plant sightings, is an example of the first sense in which the word is used. Statistics also refers to the drawing of inferences about large groups on the basis of observations made on smaller ones.
Statistics, then, is to do with ways of organizing, summarizing and describing quantifiable data, and with methods of drawing inferences and generalizing upon them.

SAMPLE MOMENTS AND THEIR DISTRIBUTION

Suppose that we seek information about some numerical characteristics of a collection of elements, called a population.
Biologists are familiar with the term population as meaning all the individuals of a species that interact with one another to maintain a homogeneous gene pool. In statistics, the term population is extended to mean any collection of individual items or units which are the subject of investigation. In other words, a population consists of the totality of the observations with which we are concerned. Characteristics of a population which differ from individual to individual are called variables. Length, mass, age, temperature, ... are examples of biological variables to which numbers or values can be assigned. Once numbers or values have been assigned to the variables, they can be measured.
The number of observations in the population is defined to be the size of the population. A sample is a subset of the population.
For reasons of time or cost, we may not be able to study each element of the population. Our object is to draw conclusions about the unknown population characteristics on the basis of information from a smaller group or subset of individuals which represents the group as a whole. This subset is called a sample. Each element of the sample provides a record, such as a measurement, which is called an observation.

B. RANDOM SAMPLING
The outcome of a statistical experiment may be recorded either as a numerical value or as a descriptive representation. When a pair of dice are tossed and the total is the outcome of interest, we record a numerical value. However, if the students of a certain school are given a blood test and the type of blood is of interest, then a descriptive representation might be the most useful. A person's blood can be classified in 8 ways. It must be AB, A, B or O, with a plus or minus sign, depending on the presence or absence of the Rh antigen.
Definition 10. Let X be a random variable with distribution function F and let X1, X2, · · · , Xn be independent, identically distributed random variables with common distribution F. Then the collection X1, X2, · · · , Xn is known as a random sample of size n from the distribution function F, or simply as n independent observations on X. If X1, X2, · · · , Xn is a random sample from F, their joint distribution function is given by
F(x1, x2, · · · , xn) = ∏_{i=1}^{n} F(xi)

Definition 11. Let X1, X2, · · · , Xn be a sample from a distribution function F. Then the statistic
X̄ = ∑_{i=1}^{n} Xi / n
is called the sample mean, and the statistic
S² = ∑_{i=1}^{n} (Xi − X̄)² / (n − 1) = (∑_{i=1}^{n} Xi² − n X̄²) / (n − 1)
is called the sample variance. Moreover, the sample standard deviation, denoted by S, is the positive square root of the sample variance.
It should be noted that the sample statistics X̄, S and S² (and others that will be defined later) are random variables, while the parameters µ, σ² and so on are fixed constants that may be unknown.
Example 12. (Coffee prices.) A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17 and 20 cents for a pound bag. Find the variance of this random sample of price increases.
Solution. Calculating the sample mean, we get
X̄ = (12 + 15 + 17 + 20)/4 = 16 cents.
Therefore,
s² = ∑_{i=1}^{4} (xi − 16)² / 3 = 34/3.
Whereas the expression for the sample variance in the above definition best illustrates that S² is a measure of variability, the alternative expression does have some merit, and thus we should be aware of it.

Definition 13. (STATISTIC) Any function of the random variables constituting a random sample is called a statistic.

C. SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS


Let X1, X2, · · · , Xn be a sample from a population with distribution function F. Then
(i) EX̄ = µ
(ii) VarX̄ = σ²/n
(iii) E(X̄ − µ)³ = E(X − µ)³ / n²
(iv) E(X̄ − µ)⁴ = E(X − µ)⁴ / n³ + 3(n − 1)[E(X − µ)²]² / n³
(v) ES² = σ². This is precisely the reason why we call S² the sample variance.
(vi) Var(S²) = E(X − µ)⁴ / n + ((3 − n)/(n(n − 1))) [E(X − µ)²]²

Theorem 14. Let X1, X2, · · · , Xn be a sample from N(µ, σ²). Then
X̄ ∼ N(µ, σ²/n)
Proof. To prove the above statement, we use the following properties:
(1) If X ∼ N(µ, σ²) then αX ∼ N(αµ, α²σ²)
(2) If X1, X2, · · · , Xn is a sample from N(µ, σ²) then X1 + X2 + · · · + Xn ∼ N(nµ, nσ²)
We have
X̄ = (X1 + X2 + · · · + Xn)/n
By property (2),
X1 + X2 + · · · + Xn ∼ N(nµ, nσ²)
and by property (1),
(1/n)(X1 + X2 + · · · + Xn) ∼ N(µ, σ²/n)
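Theorem 14 can also be illustrated by simulation: draw many samples of size n from N(µ, σ²) and look at the empirical mean and variance of the sample means. The parameters, seed and replication count below are arbitrary choices for the sketch:

```python
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 25, 20_000

# Each replication: a sample of size n from N(mu, sigma^2), reduced to its mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

print(statistics.fmean(means))     # close to mu = 5
print(statistics.variance(means))  # close to sigma^2 / n = 0.16
```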
Theorem 15. If X ∼ N (0, 1) then X 2 ∼ χ2(1)

Theorem 16. Let X1, X2, · · · , Xn be independent, identically distributed (i.i.d.) random variables and let Sn = ∑_{k=1}^{n} Xk. Then
a) Sn ∼ χ²(n) ⇔ X1 ∼ χ²(1)
b) X1 ∼ N(0, 1) ⇒ ∑_{k=1}^{n} Xk² ∼ χ²(n)

Definition 17. Let X ∼ N(0, 1) and Y ∼ χ²(n), and let X and Y be independent. Then the statistic
T = X / √(Y/n)
is said to have a t-distribution with n degrees of freedom (d.f.), and we write T ∼ t(n)

Theorem 18. The p.d.f of T is given by
fn(t) = (Γ((n+1)/2) / (Γ(n/2)√(nπ))) (1 + t²/n)^{−(n+1)/2}, −∞ < t < +∞

Definition 19. Let X and Y be independent χ² random variables with m and n d.f. respectively. The random variable
F = (X/m) / (Y/n)
is said to have an F-distribution with (m, n) d.f., and we write F ∼ F(m, n)

Theorem 20. The p.d.f of the F-statistic is given by
g(f) = (Γ((m+n)/2) / (Γ(m/2)Γ(n/2))) (m/n)^{m/2} f^{m/2−1} (1 + mf/n)^{−(m+n)/2} for f > 0, and g(f) = 0 for f ≤ 0
D.DISTRIBUTION OF (X̄, S²) IN A SAMPLE FROM A NORMAL POPULATION
By some calculations, we obtain the following statements:
(i) X̄ and S² are independent
(ii) (n − 1)S²/σ² ∼ χ²(n−1)
(iii) The distribution of √n(X̄ − µ)/S is t(n−1)
E. UNBIASED ESTIMATORS
Consider a random sample X1, · · · , Xn from a distribution that involves a parameter θ whose value is unknown and must be estimated. In a problem of this type, it is desirable to use an estimator θ̂(X1, · · · , Xn) that, with high probability, will be close to θ.
This leads to the following definition. An estimator θ̂(X1, · · · , Xn) is an unbiased estimator of a parameter θ if
Eθ̂ = θ
for every possible value of θ.
EX: Let (X1, · · · , Xn) be a random sample from a random variable X ∼ Poisson(θ), θ > 0.
We know that X̄ = (1/n) ∑_{i=1}^{n} Xi.
It is very easy to see that
EX̄ = E[(X1 + · · · + Xn)/n] = (1/n) ∑_{i=1}^{n} EXi = (1/n) ∑_{i=1}^{n} θ = nθ/n = θ
Therefore, X̄ is an unbiased estimator of θ.
8 CONFIDENCE INTERVALS
A. CONFIDENCE INTERVAL FOR THE PARAMETER p OF THE BINOMIAL DISTRIBUTION b(n, p)
Let S be a random variable which has the binomial distribution b(n, p). By a simple calculation, we have
ES = np
From this, we can infer that
E(S/n) = p
Therefore, S/n is an unbiased estimator of p, denoted by p̂. Moreover, by the De Moivre-Laplace theorem,
Z = (S − np)/√(npq) ∼ N(0, 1)
Given a positive number c,
P(|S − np|/√(npq) < c) = α
That means α is the confidence level.
The above equation can be written 2Φ(c) = α.
The inequality
|S − np|/√(npq) < c
is equivalent to
p̂ − c√(pq/n) < p < p̂ + c√(pq/n)
where p̂ = S/n.
Since p is unknown, one recommended solution is to replace p by its unbiased estimator p̂:
p̂ − c√(p̂(1 − p̂)/n) < p < p̂ + c√(p̂(1 − p̂)/n), where p̂ = S/n
EX: Construct 95% and 99% confidence intervals for p, where p is the proportion of families in Ho Chi Minh City having a washing machine. Interviewing 100 families at random, we find that 60 families have a washing machine.
Solve. Let S be the number of families having a washing machine among the 100 interviewed families, S ∼ b(100, p), where p is an unknown parameter.
We know that S = 60, and
p̂ = S/n = 60/100 = 0.6
We have
P(|S − np|/√(npq) < c) = P(|Z| < c), where Z ∼ N(0, 1)
From the fact that
P(|Z| < c) = α ⇔ 2Φ(c) = α ⇔ Φ(c) = α/2
we can get c = 1.96 from the Laplace table (for α = 0.95). Finally, we have the confidence interval for p:
0.6 − 1.96√(0.6 · 0.4/100) < p < 0.6 + 1.96√(0.6 · 0.4/100)
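The interval above is easy to compute programmatically; the sketch below reproduces the 95% case (wald_ci is our name for this approximate p̂ ± c√(p̂(1−p̂)/n) interval):

```python
from math import sqrt

def wald_ci(successes, n, c):
    # Approximate interval: p_hat +/- c * sqrt(p_hat * (1 - p_hat) / n)
    p_hat = successes / n
    half = c * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_ci(60, 100, 1.96)    # 95% confidence, c = 1.96
print(round(lo, 3), round(hi, 3))  # 0.504 0.696
```

The 99% interval follows by swapping in the corresponding table value of c.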
B. CONFIDENCE INTERVAL FOR THE EXPECTATION µ OF THE NORMAL DISTRIBUTION (where µ is unknown and σ0² is known)
Let X1, · · · , Xn be a sample from N(µ, σ0²), where σ0 is given. From this, we can infer that
√n(X̄ − µ)/σ0 ∼ N(0, 1).
We have P(|√n(X̄ − µ)/σ0| < c) = P(|Z| < c), where Z ∼ N(0, 1).
Given a positive number c,
P(|√n(X̄ − µ)/σ0| < c) = α
That means α is the confidence level.
The above equation can be written 2Φ(c) = α.
From the inequality
|√n(X̄ − µ)/σ0| < c
we can get
X̄ − c·σ0/√n < µ < X̄ + c·σ0/√n
EX: The height of a student of one university has normal distribution N(µ, σ0²), where σ0 = 10 cm. Construct a 95% confidence interval for µ, knowing that when measuring 100 students, we have
X̄ = (1/100) ∑_{i=1}^{100} Xi = 158.6 cm
Solve. We have P(|√n(X̄ − µ)/σ0| < c) = 0.95.
This equation is equivalent to
2Φ(c) = 0.95
By using the Laplace table, we can infer c = 1.96. The confidence interval for µ is
X̄ − c·σ0/√n < µ < X̄ + c·σ0/√n
158.6 − 1.96 · 10/√100 < µ < 158.6 + 1.96 · 10/√100
C. CONFIDENCE INTERVAL FOR THE EXPECTATION µ OF THE NORMAL DISTRIBUTION (where both µ and σ are unknown)
We know that the confidence interval for µ is
X̄ − c·σ0/√n < µ < X̄ + c·σ0/√n
Because σ0 is unknown, one recommended solution is to replace σ0² by the unbiased estimator S²:
X̄ − c·S/√n < µ < X̄ + c·S/√n
Here α is the confidence level, that means
P(|√n(X̄ − µ)/S| < c) = α ⇔ P(|t_{n−1}| < c) = α
where t_{n−1} is a Student random variable.
We can get the value of c from the Student table.
EX: The height of a student of one university has normal distribution N(µ, σ²), where µ and σ are unknown. Construct a 95% confidence interval for µ, knowing that when measuring 10 students, we have
X̄ = 158.6 and S² = (1/9) ∑_{i=1}^{10} (Xi − X̄)² = 100
We have
P(|√n(X̄ − µ)/S| < c) = α ⇔ P(|t9| < c) = α
By using the Student table, we can infer c = 2.2622. Therefore, we have the confidence interval for µ:
X̄ − c·S/√n < µ < X̄ + c·S/√n
158.6 − 2.2622 · 10/√10 < µ < 158.6 + 2.2622 · 10/√10

9 TESTING HYPOTHESES
A. THE PROBLEM OF TESTING HYPOTHESES
In this section, we shall consider statistical problems involving a parameter θ whose value is unknown but must lie in a certain parameter space Ω. We shall suppose that Ω can be partitioned into 2 disjoint subsets Ω0 and Ω1, and that the statistician must decide whether the unknown value of θ lies in Ω0 or in Ω1.
Let H0 denote the hypothesis that θ ∈ Ω0, and let H1 denote the hypothesis that θ ∈ Ω1. Since the subsets Ω0 and Ω1 are disjoint and Ω0 ∪ Ω1 = Ω, exactly one of the hypotheses H0 and H1 must be true. The statistician must decide whether to accept the hypothesis H0 or to accept the hypothesis H1. Accepting H0 is equivalent to rejecting H1, and accepting H1 is equivalent to rejecting H0. A problem of this type is called a problem of testing. H1 is called the alternative hypothesis.
B. TWO TYPES OF ERRORS
First, the test might result in the rejection of the null hypothesis H0 when, in fact, H0 is true. This result is called an error of type 1.
Second, the test might result in the acceptance of the null hypothesis H0 when, in fact, the alternative hypothesis H1 is true. This result is called an error of type 2. Let α denote the probability of an error of type 1 and β the probability of an error of type 2. Thus
α = P(rejecting H0 | H0 is true)
β = P(accepting H0 | H0 is false)
It is desirable to find a test procedure for which the probabilities α and β of the two types of error will both be small. Ordinarily, the smaller α is, the greater β is.
Therefore, the usual criterion is to fix α at a value called the level of significance, and to seek a test which minimizes β. This is called the Neyman-Pearson criterion.
C. TESTING PROBLEMS
C.1. TESTING p FOR A RANDOM VARIABLE S ∼ b(n, p)
Testing problem:
H0: p = p0
H1: p ≠ p0
where p0 is the given ratio. Denote by p̂ the unbiased estimator of p, namely p̂ = S/n.
We know that
Z = (S − np)/√(npq) = √n(p̂ − p)/√(pq) ∼ N(0, 1)
is a random variable which has the standard normal distribution.
By replacing p = p0 (H0 is true), we get
Z = √n(p̂ − p0)/√(p0 q0)
A good testing criterion φ is: reject H0 when |Z| ≥ c; accept H0 when |Z| < c.
Let α0 be a level of significance,
α0 = P(rejecting H0 | H0 is true) = P(|Z| ≥ c | p = p0) = 1 − P(|Z| < c) = 1 − 2Φ(c)
By using the Laplace table, we can infer c.
EX: A survey of 100 random families in Ho Chi Minh City gives the result that 38 families have a washing machine. Can the statement "30% of families have a washing machine" be accepted or not at the significance levels 10% and 5%?
Solve. Testing problem:
H0: p = p0 = 0.3
H1: p ≠ p0
The statistic
Z = √n(p̂ − p0)/√(p0 q0) = √100(38/100 − 0.3)/√(0.3 · 0.7) = 1.75
A good testing criterion φ: reject H0 if |Z| ≥ c; accept H0 if |Z| < c.
With the significance level α = 10%,
10% = α = P(|Z| ≥ c | p = p0) = 1 − P(|Z| < c | p = p0) = 1 − 2Φ(c)
We get c = 1.65.
We observe that |Z| > c. Therefore, we can reject the null hypothesis H0 at the level of significance α = 10%.
The other case is left as an exercise.
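The test statistic of this example can be recomputed as follows (a sketch; z_stat_proportion is our helper name, and 1.65 is the 10%-level critical value quoted above):

```python
from math import sqrt

def z_stat_proportion(successes, n, p0):
    # sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0)) under H0: p = p0
    p_hat = successes / n
    return sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))

z = z_stat_proportion(38, 100, 0.3)
print(round(abs(z), 2))  # 1.75
print(abs(z) >= 1.65)    # True: reject H0 at the 10% level
```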
C.2. TESTING THE EXPECTATION µ FOR A RANDOM VARIABLE X ∼ N(µ, σ0²) (where µ is an unknown parameter and σ0 is known).
Testing problem:
H0: µ = µ0
H1: µ ≠ µ0
As we know, Z = √n(X̄ − µ)/σ0 ∼ N(0, 1).
By replacing µ = µ0 (H0 is true), we get
Z = √n(X̄ − µ0)/σ0
A good testing criterion φ: reject H0 if |Z| ≥ c; accept H0 if |Z| < c.
Let α0 be a level of significance,
α0 = P(rejecting H0 | H0 is true) = P(|Z| ≥ c | µ = µ0)
EX: Let X be the yield of rice on 1 ha. Let X1, · · · , X25 be a sample from N(µ, 9) and X̄ = 4.3 ton/ha. Accept or reject the null hypothesis µ = 5 ton/ha at the level of significance α = 5%.
Solve. H0: µ = 5
H1: µ ≠ 5
|Z| = |√n(X̄ − µ0)/σ0| = |√25(4.3 − 5)/3| = 1.17
With the level of significance α = 5%, we have
α = 5% = P(reject H0 | H0 is true) = P(|Z| ≥ c | H0 is true) = 1 − 2Φ(c)
By using the Laplace table, we can infer c = 1.96. We observe that |Z| < c.
Therefore, the null hypothesis H0 is accepted at the level of significance α = 5%.
C.3. TESTING THE EXPECTATION µ FOR A RANDOM VARIABLE X ∼ N(µ, σ²) (where σ² is an unknown parameter).
Testing problem:
H0: µ = µ0
H1: µ ≠ µ0
Let X1, · · · , Xn be a sample from the normal distribution N(µ, σ²). Then
t = √n(X̄ − µ)/S ∼ t_{n−1}
By replacing µ = µ0 (H0 is true), we have
t = √n(X̄ − µ0)/S
A good testing criterion φ: reject H0 if |t| ≥ c; accept H0 if |t| < c.
With a level of significance
α0 = P(rejecting H0 | H0 is true) = P(|t| ≥ c | µ = µ0)
By using the Student table, we can imply c.
In case the d.f. is greater than or equal to 30, we can get c from the Laplace table.
EX: The weight X of rice bags produced by an automatic machine has normal distribution N(µ, σ²). The standard weight of a rice bag is 50 kg. Weighing 24 randomly chosen rice bags, we get X1, · · · , X24 with X̄ = 49.15 kg and S² = 2.82. Can the statement "the machine is working well" be accepted or not at the level of significance 5%?
Solve.
H0: "the machine is working well"
H1: "the machine is not working well"
These null and alternative hypotheses are equivalent to
H0: µ = µ0 = 50 kg
H1: µ ≠ µ0
First, we calculate t:
|t| = |√n(X̄ − µ0)/S| = |√24(49.15 − 50)/√2.82| = 2.48
t is a Student random variable with 23 d.f.
The level of significance is α0 = 0.05 = P(|t23| ≥ c).
By using the Student table, we can imply c = 2.069.
A good testing criterion φ: reject H0 if |t| ≥ c; accept H0 if |t| < c.
We observe that |t| > c.
Therefore, the statement "the machine is working well" is not acceptable, i.e. the machine is working badly.
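The t-statistic of the rice-bag example can be reproduced in a few lines (a sketch; t_stat is our helper name, and c = 2.069 is the table value quoted in the text):

```python
from math import sqrt

def t_stat(xbar, mu0, s2, n):
    # sqrt(n) * (xbar - mu0) / S under H0: mu = mu0
    return sqrt(n) * (xbar - mu0) / sqrt(s2)

t = t_stat(49.15, 50.0, 2.82, 24)
print(round(abs(t), 2))  # 2.48
print(abs(t) >= 2.069)   # True: reject H0 at the 5% level
```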

C.4. COMPARING THE VARIANCES
We shall now consider a problem of testing hypotheses which uses the F distribution. Suppose that the random variables X1, · · · , Xm form a random sample of m observations from a normal distribution for which both the mean µ1 and the variance σ1² are unknown, and the random variables Y1, · · · , Yn form a random sample of n observations from another normal distribution for which both the mean µ2 and the variance σ2² are unknown.
Suppose finally that the following hypotheses are to be tested at a specified level of significance α0 (0 < α0 < 1):
H0: σ1² ≤ σ2²
H1: σ1² > σ2²
We shall let the statistic V be defined by the following relation:
V = (S_X²/(m − 1)) / (S_Y²/(n − 1))
where S_X² = ∑_{i=1}^{m} (Xi − X̄)² and S_Y² = ∑_{i=1}^{n} (Yi − Ȳ)².
Let α0 be the level of significance, i.e. P(V ≥ c) = α0. By using the Fisher table, we can imply c.
The likelihood ratio test procedure which we have just described specifies that the hypothesis H0 should be rejected if V ≥ c.
EX: Suppose that 6 observations (X1, · · · , X6) are selected at random from a normal distribution for which both the mean µ1 and the variance σ1² are unknown. It is found that S_X² = ∑_{i=1}^{6} (Xi − X̄)² = 30. Then 21 observations (Y1, · · · , Y21) are selected at random from a normal distribution for which both the mean µ2 and the variance σ2² are unknown. It is found that S_Y² = ∑_{i=1}^{21} (Yi − Ȳ)² = 40.
Test the hypotheses
H0: σ1² ≤ σ2²
H1: σ1² > σ2²
In this example, m = 6 and n = 21, so
V = (S_X²/(m − 1)) / (S_Y²/(n − 1)) = (30/5)/(40/20) = 3
V has an F distribution with 5 and 20 degrees of freedom. With the level of significance α0 = 5%, i.e. P(V ≥ c) = 5%,
we can imply c = 2.71 by using the Fisher table. We note that V > c.
Therefore, the hypothesis H0 that σ1² ≤ σ2² should be rejected at the level of significance α0 = 5%.
If the level of significance is α0 = 2.5%, we get c = 3.29. Therefore, the hypothesis H0 that σ1² ≤ σ2² should be accepted at the level of significance α0 = 2.5%.
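The variance-ratio statistic V is a one-liner; the sketch below reproduces the example's decision at both significance levels (the critical values 2.71 and 3.29 are the table values quoted above, and f_stat is our helper name):

```python
def f_stat(ssx, m, ssy, n):
    # V = (S_X^2 / (m - 1)) / (S_Y^2 / (n - 1))
    return (ssx / (m - 1)) / (ssy / (n - 1))

V = f_stat(30, 6, 40, 21)
print(V)                     # 3.0
print(V >= 2.71, V >= 3.29)  # True False: reject at 5%, accept at 2.5%
```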

PROBLEMS FOR THE MIDTERM EXAM

[1] Given three sets A, B, C. Use the Venn diagram to illustrate the following sets: A ∪ B ∪ C, A ∩ B ∪ B ∩ (C \ (A ∪ B)). [2] Let a card be selected from two ordinary packs of 52 cards. Denote A = {the card is Diamonds or Clubs}, B = {the card is of a red suit}, C = {the card is not the Jack of Hearts}. Compute the following probabilities: P(A), P(B ∪ C), P(C|B).
[3] EX 1 (2 points): What are a sample space, events and their probabilities? Formulate the axioms of probability. Give an example.

EX 2 (2 points): What is the conditional probability of an event A given E? When are three events Ai, i = 1, 2, 3, independent? Give an example of a sequence of dependent (non-independent) events Ai, i ≥ 3, such that each pair of its members is independent.

EX 3 (2 points): A pair of fair dice is tossed. We obtain the finite equiprobable space Ω consisting of the 36 ordered pairs of numbers between 1 and 6:
Ω = {(1, 1), (1, 2), · · · , (6, 6)}.
Let X assign to each point (a, b) in Ω the minimum of its numbers, i.e. X(a, b) = min(a, b). What is the image set of the random variable X? Find the distribution of X and its expectation, variance and standard deviation.

EX 3 ( 2 points) Let a pair of fair dice be tossed. If the sum is 5, find


the probability that one of the dice is a 2.

EX 4 (2 points): Give a definition of a random variable X on a sample space Ω and its distribution function, and emphasize the above concepts for the case that Ω is a discrete space.

EX 5 (2 points): A drawer contains red socks and white socks. When two socks are drawn at random, the probability that both socks are red is 1/2. (i) How small can the number of socks in the drawer be? (ii) If the number of white socks is divisible by 3, how small can the number of socks be?

EX 6 (2 points): A pair of fair dice is tossed. We obtain the finite equiprobable space Ω consisting of the 36 ordered pairs of numbers between 1 and 6:
Ω = {(1, 1), (1, 2), · · · , (6, 6)}.
Let X assign to each point (a, b) in Ω the minimum of its numbers, i.e. X(a, b) = min(a, b). What is the image set of the random variable X? Find the distribution of X and its expectation, variance and standard deviation.

EX 7: (4.20) Box A contains nine cards numbered 1 through 9, and box B contains five cards numbered 1 through 5. A box is chosen at random and a card drawn. If the number is even, find the probability that the card came from box A.

EX 8:(4.23) Let A be the event that a family has children of both sexes,
and let B denote the event that a family has at most one boy. The events A
and B are independent. How small can the number of children in a family
be ?

EX 9: Three machines A, B and C produce respectively 50%, 30% and 20% of the total number of items of a factory. The percentages of defective output of these machines are respectively 2%, 3% and 4%. An item is selected at random and is found defective. Find the probability that the item was produced by machine C.

EX 10: (4.54) A box contains 5 radio tubes of which 2 are defective. The tubes are tested one after the other until the two defective tubes are discovered. What is the probability that the process stops on the (i) second test, (ii) third test?

EX 11: (Example 5.4) Let a die be tossed 20 times. Let S be the number of times face 1 appears in the 20 tosses.
a) Find the distribution of S.
b) Find the probability that in 20 tosses, face 1 appears 3 times.
Solve.
a) S ∼ b(20, 1/6)
b) P(S = 3) = C(20, 3)(1/6)^3 (5/6)^17

EX 12: A lot consisting of 50 bulbs is inspected by taking at random 10 bulbs and testing them. If the number of defective bulbs is at most 1, the lot is accepted; otherwise, it is rejected. If there are, in fact, 10 defective bulbs in the lot, find the probability of accepting the lot. (Hint:
[C(40, 10) + C(10, 1) C(40, 9)] / C(50, 10) = 0.3487)
10 Introduction to Stochastic processes
I. Review of basic terminology and properties of random variables and distribution functions.
The following concepts will be assumed familiar to the Reader:
i. A real random variable (r.v.) X;
ii. The distribution function F of X, defined by F(λ) = Pr{X ≤ λ}, and its elementary properties;
iii. An event pertaining to the r.v. X, and the probability thereof;
iv. E{X}, the expectation of X, and the higher moments E{Xⁿ};
v. The law of total probability and Bayes' rule for computing probabilities of events.

II. Joint distribution functions.
Given a pair (X, Y) of r.v.'s, their joint distribution function is the function F_{XY} of two real variables given by
F(λ1, λ2) = Pr{X ≤ λ1, Y ≤ λ2}.
Similarly, the joint distribution function of any finite collection X1, X2, ..., Xn of r.v.'s is defined as the function
F(λ1, ..., λn) = F_{X1,...,Xn}(λ1, ..., λn) = Pr{X1 ≤ λ1, ..., Xn ≤ λn}.
The distribution function
F(λ_{i1}, ..., λ_{ik}) = lim_{λi→∞, i∉{i1,...,ik}} F(λ1, ..., λn)
is called the marginal distribution of the r.v.'s X_{i1}, ..., X_{ik}.

III. Conditional distributions and conditional expectation
The conditional probability Pr{A|B} of the event A given the event B is defined by
Pr{A|B} = Pr{A ∩ B}/Pr{B}, if Pr{B} > 0,
and is left undefined, or assigned an arbitrary value, when Pr{B} = 0.
Let X and Y be r.v.'s which can attain only countably many different values, say 1, 2, .... The conditional distribution function of X given Y = y is
F_{X|Y}(x|y) = Pr{X ≤ x, Y = y}/Pr{Y = y}, if Pr{Y = y} > 0,
and any arbitrary discrete distribution function whenever Pr{Y = y} = 0. This last prescription is consistent with the subsequent calculations invoked on conditional distribution functions.

Suppose X and Y are jointly distributed continuous r.v.'s having the joint probability density function p_{XY}(x, y). Then the conditional distribution of X given Y = y is defined as
F_{X|Y}(x|y) = ∫_{u≤x} p_{XY}(u, y) du / p_Y(y),
whenever p_Y(y) > 0, and with an arbitrary specification whenever p_Y(y) = 0.


Note that F_{X|Y}(x|y) satisfies the following conditions:
(C.P.1) F_{X|Y}(x|y) is a probability distribution function in x for each fixed y;
(C.P.2) F_{X|Y}(x|y) is a function of y for each x; and
(C.P.3) For any values x, y
Pr{X ≤ x, Y ≤ y} = ∫_{u≤y} F_{X|Y}(x|u) dF_Y(u)
where F_Y(y) = Pr{Y ≤ y} is the marginal distribution of Y.


As noted earlier, in these lecture notes one needs only deal with the integral in (C.P.3) in its continuous and discrete versions. To wit, when Y is a continuous r.v. having the probability density function p_Y(y), the integral in (C.P.3) is computed as
Pr{X ≤ x, Y ≤ y} = ∫_{v≤y} F_{X|Y}(x|v) p_Y(v) dv.
And when Y is discrete the formula is
Pr{X ≤ x, Y ≤ y} = Σ_{u≤y} F_{X|Y}(x|u) Pr{Y = u}.
These three properties capture the essential features of conditional distributions. In fact, from (C.P.3) we obtain, in the discrete case,
Pr{X ≤ x, Y = y} = Pr{X ≤ x, Y ≤ y} − Pr{X ≤ x, Y < y}
= Σ_{u≤y} F_{X|Y}(x|u) Pr{Y = u} − Σ_{u<y} F_{X|Y}(x|u) Pr{Y = u} = F_{X|Y}(x|y) Pr{Y = y},
which then implies
F_{X|Y}(x|y) = Pr{X ≤ x, Y = y}/Pr{Y = y},
at least when Pr{Y = y} > 0. In advanced research, (C.P.1-3) are taken as the basis for the definition of conditional distributions. It can be established that such conditional distributions exist for arbitrary real r.v.'s X and Y, and even for real random vectors X = (X1, ..., Xn) and Y = (Y1, ..., Yn).
The application of (C.P.3) in the case y = ∞ produces the law of total probability
Pr{X ≤ x} = Pr{X ≤ x, Y ≤ ∞} = ∫_{−∞}^{+∞} F_{X|Y}(x|y) dF_Y(y),
which is one of the most fundamental formulas of probability analysis.
When Y is discrete, this relation becomes
Pr{X ≤ x} = Σ_y Pr{X ≤ x|Y = y} Pr{Y = y},
and when Y has the probability density function p_Y(y) we have
Pr{X ≤ x} = ∫_{−∞}^{+∞} Pr{X ≤ x|Y = y} p_Y(y) dy.

When X and Y are jointly distributed continuous r.v.'s, we may define the conditional density function
p_{X|Y}(x|y) = (d/dx) F_{X|Y}(x|y) = p_{XY}(x, y)/p_Y(y)
at values y for which p_Y(y) > 0, and as a fixed arbitrary probability density function when p_Y(y) = 0.
Let g be a function for which the expectation of g(X) exists. The conditional expectation of g(X) given Y = y can be expressed in the form
E[g(X)|Y = y] = ∫_x g(x) dF_{X|Y}(x|y).
When X and Y are jointly continuous r.v.'s, E[g(X)|Y = y] may be computed from
E[g(X)|Y = y] = ∫ g(x) p_{X|Y}(x|y) dx = ∫ g(x) p_{XY}(x, y) dx / p_Y(y), if p_Y(y) > 0,
and if X and Y are jointly distributed discrete r.v.'s taking the possible values x1, x2, · · ·, then the detailed formula reduces to
E[g(X)|Y = y] = Σ_{i=1}^{∞} g(xi) Pr[X = xi|Y = y]
= Σ_{i=1}^{∞} g(xi) Pr{X = xi, Y = y} / Pr{Y = y}, if Pr{Y = y} > 0.
In parallel with (C.P.1-3), we see that the conditional expectation of g(X) given Y = y satisfies:
(C.E.1) E[g(X)|Y = y] is a function of y for each function g for which E[|g(X)|] < ∞; and
(C.E.2) For any bounded function h we have
E[g(X)h(Y)] = ∫ E[g(X)|Y = y] h(y) dF_Y(y),
where F_Y is the marginal distribution function of Y.


In particular, for h(y) ≡ 1, (C.E.2) produces the formula expressing the law of total probability for expectations,
E[g(X)] = ∫ E[g(X)|Y = y] dF_Y(y),
which, when Y is discrete, becomes
E[g(X)] = Σ_{i=1}^{∞} E[g(X)|Y = yi] Pr{Y = yi},
and, when Y has a probability density function p_Y, becomes
E[g(X)] = ∫ E[g(X)|Y = y] p_Y(y) dy.
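For a discrete pair (X, Y), the law of total probability for expectations can be verified directly; the joint pmf below is an arbitrary toy example of ours:

```python
# Toy joint pmf of (X, Y); the numbers are illustrative assumptions.
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def g(x):
    return x * x

# Direct computation of E[g(X)]
lhs = sum(g(x) * p for (x, _y), p in pmf.items())

# Sum over y of E[g(X) | Y = y] * Pr{Y = y}
rhs = 0.0
for y in {y for (_x, y) in pmf}:
    p_y = sum(p for (_x, yy), p in pmf.items() if yy == y)
    cond_exp = sum(g(x) * p / p_y for (x, yy), p in pmf.items() if yy == y)
    rhs += cond_exp * p_y

print(round(lhs, 10), round(rhs, 10))  # both 0.7
```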

Since the conditional expectation of g(X) given Y = y is the expectation with respect to the conditional distribution F_{X|Y}, conditional expectations behave in many ways like ordinary expectations. In particular, if a and b are fixed numbers and g, h are given functions for which g(X) and h(X) are integrable, then
E[ag(X) + bh(X)|Y = y] = aE[g(X)|Y = y] + bE[h(X)|Y = y].

According to (C.E.1), E[g(X)|Y = y] is a function of the real variable y. If we evaluate this function at the random variable Y, we obtain a r.v. which we denote by E[g(X)|Y]. The basic property (C.E.2) is then stated, for any bounded function h of y, as
E[g(X)h(Y)] = E{E[g(X)|Y]h(Y)}
When h(y) = 1 for all y, we get the law of total probability in the form
E[g(X)] = E[E[g(X)|Y]]

The following list summarizes these and other properties of conditional expectations. Here X, X1, X2 and Y are random variables, a1, a2 and c are real numbers, g is a function for which E[|g(X)|] < ∞, f is a bounded function, and h is a function of two variables for which E[|h(X, Y)|] < ∞.
E[a1 g(X1) + a2 g(X2)|Y] = a1 E[g(X1)|Y] + a2 E[g(X2)|Y] (1)
g ≥ 0 implies E[g(X)|Y] ≥ 0 (2)
E[h(X, Y)|Y = y] = E[h(X, y)|Y = y] (3)
E[g(X)|Y] = E[g(X)] if X and Y are independent (4)
E[g(X)f(Y)|Y] = f(Y)E[g(X)|Y] (5)
E[g(X)f(Y)] = E[E[g(X)|Y]f(Y)] (6)
As consequences of these properties, with either g ≡ 1 or f ≡ 1 we obtain
E[c|Y] = c (7)
E[f(Y)|Y] = f(Y) (8)
E[g(X)f(Y)|Y] = f(Y)E[g(X)|Y] (9)
E[g(X)] = E[E[g(X)|Y]] (10)
IV. IMPORTANT STOCHASTIC PROCESSES.
Infinite families of random variables:
In dealing with an infinite family of random variables, a direct generalization of the preceding definitions involves substantial difficulties. We need to adopt a slightly modified approach.
Given a denumerably infinite family X1, X2, . . . of r.v.'s, their statistical properties are regarded as defined by prescribing, for each integer n ≥ 1 and every set i1, . . . , in of n distinct positive integers, the joint distribution function F_{Xi1,...,Xin} of the random variables Xi1, . . . , Xin. Of course, some consistency requirements must be imposed upon the infinite family of functions F_{Xi1,...,Xin}, namely, that
F_{Xi1,...,Xi(j−1),Xi(j+1),...,Xin}(λ1, . . . , λ_{j−1}, λ_{j+1}, . . . , λn) = lim_{λj→∞} F_{Xi1,...,Xin}(λ1, . . . , λn),
and that the distribution function obtained from
F_{Xi1,...,Xin}(λ1, . . . , λn)
by interchanging two of the indices iν, iµ and the corresponding variables λν, λµ should be invariant. This simply means that the manner of labelling the random variables X1, X2, . . . is not relevant.
The joint distributions {F_{Xi1,...,Xin}} are called the finite-dimensional distributions associated with {Xn}_{n=1}^{∞}. In principle, all important probabilistic quantities of the variables {Xn}_{n=1}^{∞} can be computed in terms of the finite-dimensional distributions.
The developments in this book are intended to serve as an introduction to
various aspects of stochastic processes. The theory of stochastic processes
is concerned with the investigation of the structure of families of random
variables Xt , where t is a parameter running over a suitable index set T.
Sometimes, when no ambiguity can arise we write X(t) instead of Xt .
A realization or sample function of a stochastic process {Xt , t ∈ T } is an assignment, to each t ∈ T , of a possible value of Xt . The index set T may correspond to discrete units of time T = {0, 1, 2, . . . }, and {Xt } could then represent the outcomes at successive trials, such as the results of successive tosses of a die or successive observations of some characteristic of a population.
The values of the {Xt } may be one-dimensional, two-dimensional, n-dimensional, or even more general. In the case where Xn is the outcome of the

n-th toss of a die, its possible values are contained in the set {1, 2, 3, 4, 5, 6}, and a typical realization of the process would be 5, 1, 3, 2, 2, 4, 1, 6, 3, 6, . . . . This is shown schematically in Fig. 1, where the ordinate for t = n is the value of Xn . In this example the random variables Xn are mutually independent, but in general the random variables Xn are dependent. Stochastic processes for which T = [0, ∞) are particularly important in applications. Here t can usually be interpreted as time. We will content ourselves, for the moment, with a very brief discussion of some concepts of stochastic processes and two examples thereof; a summary of various types of stochastic processes is presented at the end of the Chapter, while the examples themselves will be treated in greater detail in succeeding chapters.

Example 1 A very important example is the celebrated Brownian motion process. This process has the following characteristics:

(a) Suppose that t0 < t1 < · · · < tn ; then the increments Xt1 − Xt0 , · · · , Xtn − Xtn−1 are mutually independent r.v.'s. (A process with this property is said to be a process with independent increments; this expresses the fact that the changes of Xt over non-overlapping time periods are independent r.v.'s.)

(b) The probability distribution of Xt2 −Xt1 , t2 > t1 , depends only on t2 −t1
(and not, for example, on t1 ).

(c) For s < t,

Pr[Xt − Xs ≤ x] = [2πB(t − s)]^(−1/2) ∫_{−∞}^{x} exp[−u²/(2B(t − s))] du,

where B is a positive constant.


The Brownian motion process (also called the Wiener process) has proved to be fundamental in the study of numerous other types of stochastic processes.
The history of this process began with the observation by R. Brown in 1827 that small particles immersed in a liquid exhibit ceaseless irregular motions. In 1905 Einstein explained this motion by postulating that the particles under observation are subject to perpetual collisions with the molecules of the surrounding medium. The analytical results derived by

Einstein were later experimentally verified and extended by various physicists and mathematicians.
Let Xt denote the displacement (from its starting point, along some fixed axis) at time t of a Brownian particle. The displacement Xt − Xs over the time interval (s, t) can be regarded as the sum of a large number of small displacements. The central limit theorem is then essentially applicable, and it seems reasonable to assert that Xt − Xs is normally distributed. Similarly, it seems reasonable to assume that the distributions of Xt − Xs and Xt+h − Xs+h are the same for any h > 0, if we suppose the medium to be in equilibrium. Finally, it is intuitively clear that the displacement Xt − Xs should depend only on the length t − s and not on the time we begin observation.
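Properties (a)–(c) translate directly into a simulation scheme: on a discrete time grid, each increment is an independent N(0, B·Δt) variable. A minimal sketch, with B = 1 and function names of our choosing:

```python
import random

def brownian_path(n_steps, dt=0.01, B=1.0, seed=0):
    """Simulate X_t on the grid t = 0, dt, 2*dt, ... from independent,
    stationary Gaussian increments with Var(X_t - X_s) = B*(t - s)."""
    rng = random.Random(seed)
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += rng.gauss(0.0, (B * dt) ** 0.5)   # increment ~ N(0, B*dt)
        path.append(x)
    return path

path = brownian_path(1000)
assert len(path) == 1001 and path[0] == 0.0
```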

Example 2: Poisson processes.
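Although the Poisson process is only named here (it is treated in detail later), its standard construction from i.i.d. exponential inter-arrival times can be sketched as follows; the rate and the names are our choices:

```python
import random

def poisson_arrival_times(rate, horizon, seed=0):
    """Arrival times on [0, horizon] of a Poisson process of intensity `rate`,
    built from i.i.d. Exponential(rate) inter-arrival times."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return arrivals
        arrivals.append(t)

arrivals = poisson_arrival_times(rate=2.0, horizon=10.0)
# len(arrivals) is the count N_10, with E[N_10] = rate * horizon = 20.
assert arrivals == sorted(arrivals) and all(0 < s <= 10.0 for s in arrivals)
```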

V. CLASSIFICATION OF GENERAL STOCHASTIC PROCESSES

The main elements distinguishing stochastic processes are the nature of the state space, the index parameter T, and the dependence relations among the r.v.'s Xt , t ∈ T.

STATE SPACE S

This is the space in which the possible values of Xt lie. In the case that S = {0, 1, 2, · · · } we refer to the process at hand as a discrete state process. If S is the real line (−∞, +∞), then we call Xt a real-valued stochastic process. If S is a k-dimensional Euclidean space, then Xt is called a k-vector process, etc.

INDEX PARAMETER T

If T = {0, 1, 2, . . . }, then we shall always say that Xt is a discrete time process. Often when T is discrete we shall write Xn instead of Xt . If T = [0, ∞), then we say that Xt is a continuous time process.

CLASSICAL TYPES OF STOCHASTIC PROCESSES

(a) Process with Stationary Independent Increments;
(b) Martingales; (c) Markov processes; (d) Stationary processes.

11 STATIONARY PROCESSES
A stationary process is a stochastic process whose probabilistic laws remain unchanged through shifts in time (or sometimes in space). The concept captures the very natural notion of a physical system that lacks an inherent time (or space) origin. It is an appropriate assumption for a variety of processes in communication theory, astronomy, biology, ecology, and economics.

1. Definition and examples

Let T be an abstract set having the property that the sum of any two points in T is also in T. Often T will be the set Z+ = {0, 1, 2, ...} of nonnegative integers, but it could just as well be the positive half-line or the whole real line, the plane, a finite-dimensional space, the surface of a sphere, or perhaps even an infinite-dimensional space.

Definition 1.1. A strictly stationary process is a stochastic process {X(t), t ∈ T } with the property that for any positive integer k and any points t1 , ..., tk and h ∈ T , the joint distribution of

{X(t1 ), ..., X(tk )}

is the same as the joint distribution of

{X(t1 + h), ..., X(tk + h)}.

Definition 1.2. A stochastic process {X(t), t ∈ T } is called a second order process if for every t the second moment of X(t) exists, i.e. E[X(t)²] < ∞.

The most important example of a second order process is the Gaussian process, which is defined as follows.

Definition 1.3. A stochastic process {X(t), t ∈ T } is called a Gaussian process if for any t1 , ..., tk ∈ T the random vector (X(t1 ), ..., X(tk )) has a k-dimensional Gaussian distribution, i.e. for any real numbers λ1 , ..., λk the r.v. Σkj=1 λj X(tj ) is a Gaussian r.v.

Definition 1.4. If {X(t), t ∈ T } is a second order process, then the function R(t, s) := EX(t)X(s), t, s ∈ T , is called its correlation function.

Definition 1.5. A second order process {X(t), t ∈ T } is called a weakly stationary process if the mean m = EX(t) is constant, independent of time, and the covariance function

R(t, s) := E[X(t) − m][X(s) − m]

depends only on the difference t − s, t, s ∈ T. If we define the covariance function R(h) := R(h, 0), then R(t, s) = R(t − s).
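For a weakly stationary process observed at times 0, 1, . . . , N−1, the function R(h) can be estimated from a single realization by the sample autocovariance. A sketch (the estimator below is the standard biased one; the names are ours), tried on Gaussian white noise, for which R(0) = 1 and R(h) ≈ 0 for h ≠ 0:

```python
import random

def sample_autocovariance(xs, h):
    """Estimate R(h) = E[(X_t - m)(X_{t+h} - m)] from one realization xs."""
    n = len(xs)
    m = sum(xs) / n               # sample mean estimates the constant mean m
    return sum((xs[t] - m) * (xs[t + h] - m) for t in range(n - h)) / n

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(10000)]   # white noise: i.i.d. N(0, 1)
r0 = sample_autocovariance(xs, 0)
r5 = sample_autocovariance(xs, 5)
assert abs(r0 - 1.0) < 0.1 and abs(r5) < 0.1
```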

Examples of stationary processes

(a) Electric pulses in communication theory are often postulated to describe a stationary process. Of course, in any physical system there is a transient period at the beginning of the signal. Since typically this has a short duration compared to the signal length, a stationary model may be appropriate. In electrical communication theory, both the electrical potential and the current are often represented as complex variables, so here we may encounter complex-valued stationary processes.
(b) The spatial and/or planar distributions of stars or galaxies, plants and animals, are often stationary. Here T might be Euclidean space, the surface of a sphere, or the plane. A stationary distribution may be postulated for the height, again over a two-dimensional index set.
(c) Economic time series, such as unemployment, gross national product, national income, etc., are often assumed to correspond to a stationary process, at least after some correction for long-term growth has been made.

12 Markov chains
The following are definitions and elementary properties of vectors and ma-
trices which are required for this chapter.
By a vector u we simply mean an n-tuple of numbers:

u = (u1 , u2 , · · · , un )

The uj are called the components of u. If all the components are zero, the vector u is called the zero vector. The set of all n-vectors forms an n-dimensional Euclidean space.
By a matrix A we mean a rectangular array of numbers:

  A = (aij ) =
    [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      · · ·
      am1  am2  · · ·  amn ]
the m horizontal n-tuples

(a11 , a12 , · · · , a1n ), (a21 , a22 , · · · , a2n ), · · · (am1 , am2 , · · · , amn )

are called the rows of A, and the n vertical m-tuples


     
a11 a12 a1n
 a21   a22 
 , · · · ,  a2n 
 
 ,
 ···   ···   ··· 
am1 am2 amn
its columns.
If A is an n-square matrix and u is a vector with n components, then we can form the product uA, which is again a vector with n components.
We call u ≠ 0 a fixed vector or fixed point of A if uA = u.

Theorem 21. If u is a fixed vector of a matrix A, then every non-zero scalar multiple ku of u is also a fixed vector of A.
REGULAR STOCHASTIC MATRICES
Definition A stochastic matrix P is said to be regular if all the entries of some power P^m are positive.
Example The matrix
  A = [ 0    1   ]
      [ 1/2  1/2 ]
is regular, since
  A² = [ 1/2  1/2 ]
       [ 1/4  3/4 ]
has all entries positive.
FIXED POINTS AND REGULAR STOCHASTIC MATRICES
Theorem 22. Let P be a regular stochastic matrix. Then:
(i) P has a unique fixed probability vector t, and the components of t are positive;
(ii) the sequence P, P², P³, · · · of powers of P approaches the matrix T whose rows are each the fixed point t;
(iii) if p is any probability vector, then the sequence of vectors pP, pP², pP³, · · · approaches the fixed point t.

 
Example Consider the regular stochastic matrix
  P = [ 0    1   ]
      [ 1/2  1/2 ]
Its fixed probability vector is t = (1/3, 2/3), since tP = (1/3 · 0 + 2/3 · 1/2, 1/3 · 1 + 2/3 · 1/2) = (1/3, 2/3) = t.
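Theorem 22(iii) can be checked numerically for this matrix: iterating p ↦ pP from any probability vector drives p toward t = (1/3, 2/3). A small sketch in plain Python (function name is ours):

```python
def step(p, P):
    """One step of the chain: the row vector p times the matrix P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.0, 1.0], [0.5, 0.5]]
p = [1.0, 0.0]                 # start deterministically in the first state
for _ in range(50):
    p = step(p, P)

assert abs(p[0] - 1/3) < 1e-9 and abs(p[1] - 2/3) < 1e-9
```

The error shrinks like |−1/2|^n, the modulus of the second eigenvalue of P, which is why 50 iterations are far more than enough.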

MARKOV CHAIN
We now consider a sequence of trials whose outcomes, say, X1 , X2 , · · · , satisfy the following two properties:
(i) Each outcome belongs to a finite set of outcomes {a1 , a2 , · · · , am } called the state space of the system; if the outcome of the n-th trial is ai , then we say that the system is in state ai at the n-th step.
(ii) The outcome of any trial depends at most upon the outcome of the immediately preceding trial and not upon any other previous outcome; with each pair of states (ai , aj ) there is given the probability pij that aj occurs immediately after ai occurs. Alternatively, Xn+1 depends on Xn but not on the earlier values X0 , X1 , ..., Xn−1 of the sequence.
Such a stochastic process is called a (finite) Markov chain. The numbers pij , called the transition probabilities, can be arranged in a matrix
 
  P = (pij ) =
    [ p00  p01  · · ·  p0n
      p10  p11  · · ·  p1n
      · · ·
      pn0  pn1  · · ·  pnn ]
called the transition matrix.
One may represent the chain by a graph with nodes representing the state space of Xn and directed arcs (i, j) for all pairs of states i, j such that pij > 0.
Theorem 23. The transition matrix P of a Markov chain is a stochastic matrix.
Example A man either drives his car or catches a train to work each day. Suppose he never goes by train two days in a row; but if he drives to work, then the next day he is just as likely to drive again as he is to travel by train.
The state space of the system is {t (train), d (drive)}. This stochastic process is a Markov chain, since the outcome on any day depends only on what happened on the preceding day.
The transition matrix of the Markov chain is
  P = [ 0    1   ]
      [ 1/2  1/2 ]
The first row of the matrix corresponds to the fact that he never goes by train two days in a row, and so he definitely will drive the day after he travels by train. The

second row of the matrix corresponds to the fact that the day after he drives he will drive or go by train with equal probability.
HIGHER TRANSITION PROBABILITIES

The entry pij in the transition matrix P of a Markov chain is the probability that the system changes from state ai to state aj in one step. What is the probability, say p(n)ij , that the system changes from state ai to state aj in exactly n steps, along some path

ai → ai1 → ai2 → · · · → ain−1 → aj ?

Theorem 24. Let P be the transition matrix of a Markov chain process. Then the n-step transition matrix is equal to the n-th power of P. Moreover, if p = (pi ) is the probability distribution of the system at some arbitrary time, then pP is the probability distribution of the system one step later and pP^n is the probability distribution of the system n steps later. In particular, if we denote by P(n) := (p(n)ij ) the n-step transition matrix, then

p(1) = p(0) P, p(2) = p(1) P, · · · , p(n) = p(n−1) P = p(0) P^n .

Example 25. Consider the Markov chain of Example 20. We have

  P(4) = P²P² = [ 1/2  1/2 ] [ 1/2  1/2 ] = [ 3/8   5/8   ]
                [ 1/4  3/4 ] [ 1/4  3/4 ]   [ 5/16  11/16 ]

Thus the probability that the system changes from, say, state t to state d in exactly 4 steps is 5/8, i.e. p(4)td = 5/8. Similarly, p(4)tt = 3/8, p(4)dt = 5/16 and p(4)dd = 11/16.
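The four-step matrix in Example 25 can be reproduced exactly by squaring P twice with rational arithmetic; a short sketch (function name is ours):

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[F(0), F(1)], [F(1, 2), F(1, 2)]]     # states ordered (t, d)
P2 = matmul(P, P)
P4 = matmul(P2, P2)

assert P2 == [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
assert P4 == [[F(3, 8), F(5, 8)], [F(5, 16), F(11, 16)]]
print(P4[0][1])                            # p^(4)_td → 5/8
```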

STATIONARY DISTRIBUTION OF REGULAR MARKOV CHAINS


Suppose that a Markov chain is regular, i.e. that its transition matrix P is regular. Then the sequence of n-step transition matrices P^n approaches the matrix T whose rows are each the unique fixed probability vector t of P; hence the probability p(n)ij that aj occurs, for sufficiently large n, is independent of the original state ai and approaches the component tj of t. Thus we have

Theorem 26. Let the transition matrix P of a Markov chain be regular. Then, in the long run, the probability that any state aj occurs is approximately equal to the component tj of the unique fixed probability vector t of P.

Hence we see that the effect of the initial state or the initial probability distribution of the process wears off as the number of steps of the process increases. Furthermore, every sequence of probability distributions approaches the fixed probability vector t of P, called the stationary distribution of the Markov chain.
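The wearing-off of the initial distribution is easy to observe numerically: for the train/drive chain above, pP^n is essentially (1/3, 2/3) for every starting p once n is moderate. A short sketch (names are ours):

```python
def evolve(p, P, n):
    """Distribution after n steps: p P^n, by repeated vector-matrix products."""
    for _ in range(n):
        p = [sum(p[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return p

P = [[0.0, 1.0], [0.5, 0.5]]
starts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]   # three initial distributions
finals = [evolve(p, P, 40) for p in starts]

# All approach the stationary distribution t = (1/3, 2/3) of Theorem 26.
for q in finals:
    assert abs(q[0] - 1/3) < 1e-9 and abs(q[1] - 2/3) < 1e-9
```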
Review Exercises from the textbook Schaum's Outline of THEORY AND PROBLEMS

1. Examples 7.16, 7.17, 7.18, 7.19, 7.20.
2. Problems: 7.12, 7.13, 7.14, 7.18, 7.19, 7.20, 7.21, 7.22, 7.23, 7.26, 7.53.

13 MARKOV PROCESSES

