
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

November 5, 2017

LECTURES 26-27

Chapter 10: Limit Theorems


We describe two classes of limit theorems in this chapter, namely "laws of large numbers"
and "central limit theorems". Limit theorems describe the asymptotic behavior of random
dynamical systems. For example, the law of large numbers describes the asymptotic behavior
of the average 'position' Sn at the nth epoch (here we are looking at the time evolution at
discrete times), whereas the central limit theorem describes the fluctuations of Sn around its
'average'. Here we treat the situation where the 'displacements' are independent and
identically distributed, i.e. where Sn = X1 + · · · + Xn and {Xn | n ≥ 1} is a sequence of
i.i.d. random variables.

0.0.1 Some fundamental Inequalities


In this subsection, we discuss some fundamental inequalities which we will
be using.

Theorem 0.1 (Markov inequality) Let X be a non-negative random variable
with finite nth moment. Then for each ε > 0,

P{X ≥ ε} ≤ E[X^n] / ε^n.

Proof. Set

Y(ω) = 0 if X(ω) < ε,   Y(ω) = ε^n if X(ω) ≥ ε.

Then Y is a non-negative simple random variable such that Y ≤ X^n. Hence
from Theorem 6.0.29, we have EY ≤ E[X^n]. Therefore

ε^n P{X ≥ ε} = EY ≤ E[X^n].

This completes the proof.

Example 0.1 In the Markov inequality, non-negativity is important. For
example, take X = 1 or −1, each with probability 1/2. Then the Markov inequality
does not hold: take ε = 1/2 and n = 1 and note that EX = 0, so the bound is 0,
while P{X ≥ 1/2} = 1/2.

As a corollary, we have Chebyshev's inequality.



Chebyshev's inequality. Let X be a random variable with finite mean µ
and finite variance σ². Then for each ε > 0,

P{|X − µ| ≥ ε} ≤ σ²/ε².

The proof of Chebyshev's inequality follows by replacing X with |X − µ|
and taking n = 2 in the Markov inequality.
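As a quick illustration of how conservative this bound can be (a sketch that is not part of the original notes; the distribution, sample size and values of ε are arbitrary choices), one can compare the empirical tail probability with the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: Exponential(1), so mu = 1 and sigma^2 = 1.
mu, sigma2 = 1.0, 1.0
samples = rng.exponential(scale=1.0, size=1_000_000)

for eps in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(samples - mu) >= eps)  # estimate of P{|X - mu| >= eps}
    chebyshev = sigma2 / eps**2                       # Chebyshev upper bound
    print(f"eps={eps}: empirical {empirical:.4f}, Chebyshev bound {chebyshev:.4f}")
```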
The power of the Markov and Chebyshev inequalities lies in their generality, not in
their sharpness. But one can use the Markov inequality to get some sharper estimates,
called exponential inequalities; note that one then needs to use the structure of the
distribution. The following is an important manifestation of this.

Theorem 0.2 (Hoeffding's inequality, Binomial case) Let X be a Binomial(n, 1/2)
random variable. Then

P{|X − n/2| ≥ nt} ≤ 2e^{−2nt²},   t > 0.

Proof: Note X = X1 + · · · + Xn, where the Xi's are i.i.d. Bernoulli(1/2). Fix
λ > 0. Then

P{X ≥ n/2 + nt} = P{λ(X − n/2 − nt) ≥ 0}
                = P{e^{λ(X − n/2 − nt)} ≥ 1}
                = P{e^{λ(X − n/2)} ≥ e^{λnt}}
                ≤ e^{−λnt} E[e^{λ(X − n/2)}]                 (Markov inequality)
                = e^{−λnt} ∏_{i=1}^{n} E[e^{λ(Xi − 1/2)}]
                = e^{−λnt} ((e^{−λ/2} + e^{λ/2})/2)^n
                ≤ e^{−λnt} e^{nλ²/8}
                = e^{−λnt + λ²n/8}.

In the last inequality, we used the following:

((e^{−λ/2} + e^{λ/2})/2)² = (1/4)(e^{λ} + 2 + e^{−λ}) ≤ e^{λ²/4},

so that (e^{−λ/2} + e^{λ/2})/2 ≤ e^{λ²/8} (compare the series expansions of the two
sides term by term).

Hence

P{X ≥ n/2 + nt} ≤ inf_{λ≥0} e^{−λnt + λ²n/8} = e^{−2nt²},

the infimum being attained at λ = 4t.

A similar argument implies

P{X ≤ n/2 − nt} ≤ e^{−2nt²}.

Combining the two bounds gives the stated inequality.
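As a quick numerical sanity check (an illustrative sketch, not part of the notes; n, t and the number of samples are arbitrary), one can compare Monte Carlo estimates of P{|X − n/2| ≥ nt} with the bound 2e^{−2nt²}:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.binomial(n, 0.5, size=200_000)   # samples of X ~ Binomial(n, 1/2)

for t in (0.05, 0.10, 0.20):
    empirical = np.mean(np.abs(X - n / 2) >= n * t)  # estimate of P{|X - n/2| >= nt}
    hoeffding = 2 * np.exp(-2 * n * t**2)            # Hoeffding upper bound
    print(f"t={t}: empirical {empirical:.4f}, Hoeffding bound {hoeffding:.4f}")
```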
Note that the above proof contains a general procedure. We will make this
explicit in the following general 'exponential' inequality.

Theorem 0.3 For any random variable X, we have the following tail bound:

P{X ≥ t} ≤ inf_{λ≥0} E[e^{λ(X−t)}],   t > 0.

Proof: It follows easily from

P{X ≥ t} = P{e^{λ(X−t)} ≥ 1} ≤ E[e^{λ(X−t)}]   (Markov inequality),

which holds for each λ ≥ 0.

Theorem 0.4 (Chernoff's inequality) Let X1, X2, · · · , Xn be i.i.d. random
variables with mgf M(·), and set Sn = X1 + · · · + Xn. Then

P{Sn ≥ t} ≤ inf_{λ≥0} [e^{−λt} M(λ)^n],   P{Sn ≤ −t} ≤ inf_{λ≥0} [e^{−λt} M(−λ)^n].

The proof follows from Theorem 0.3 by taking X = Sn and X = −Sn respectively.

Example 0.2 Consider the case when Sn ∼ Binomial(n, p). Note that

M(λ) = 1 − p + pe^λ.

Set

f(λ) = e^{−λt}(1 − p + pe^λ)^n.

Now verify that f attains its minimum at

λ = log[ (1 − p)(t/n) / (p(1 − t/n)) ],

which is non-negative when t ≥ np. Hence, for np ≤ t < n,

inf_{λ≥0} f(λ) = [ ((t/n)/p)^{−t/n} ((1 − t/n)/(1 − p))^{−(1−t/n)} ]^n.

This gives the upper tail bound

P{Sn ≥ t} ≤ [ ((t/n)/p)^{−t/n} ((1 − t/n)/(1 − p))^{−(1−t/n)} ]^n.

When p = 1/2, the above bound becomes

P{Sn ≥ t} ≤ [ (1/2) (t/n)^{−t/n} (1 − t/n)^{−(1−t/n)} ]^n.

Note that this bound is indeed sharper than the corresponding Hoeffding inequality,
but a small compromise in sharpness gives the more convenient Hoeffding bound.
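For a concrete comparison (an illustrative sketch; the values of n and s below are arbitrary), one can evaluate the p = 1/2 Chernoff bound above and the one-sided Hoeffding bound e^{−2ns²} at the threshold t = n(1/2 + s):

```python
import numpy as np

n = 100
for s in (0.05, 0.10, 0.20):
    q = 0.5 + s                       # t/n, where t = n/2 + n*s is the threshold
    # Chernoff bound from Example 0.2 with p = 1/2
    chernoff = (0.5 * q**(-q) * (1 - q)**(-(1 - q)))**n
    # One-sided Hoeffding bound
    hoeffding = np.exp(-2 * n * s**2)
    print(f"s={s}: Chernoff {chernoff:.3e}, Hoeffding {hoeffding:.3e}")
```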

We end this section on inequalities with one more fundamental inequality,
the Cauchy-Schwarz inequality.

Lemma 0.1 Let X and Y be random variables with finite second moments.
Then XY has finite mean and

E|XY| ≤ [EX²]^{1/2} [EY²]^{1/2}.

Proof: If EX² = 0 or EY² = 0, then the proof is easy (exercise). So we assume
that EX², EY² ≠ 0. Consider, for λ = E[XY]/EY²,

0 ≤ E[X − λY]² = EX² − 2λE[XY] + λ²EY² = EX² − (E[XY])²/EY².

Hence (E[XY])² ≤ EX² EY²; applying this with X, Y replaced by |X|, |Y| gives the
stated inequality.
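A minimal numerical sanity check of the inequality (illustrative only; the distributions of X and Y are arbitrary and taken independent here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=1_000_000)
Y = rng.exponential(size=1_000_000)

lhs = np.mean(np.abs(X * Y))                            # E|XY|
rhs = np.sqrt(np.mean(X**2)) * np.sqrt(np.mean(Y**2))   # (EX^2)^{1/2} (EY^2)^{1/2}
print(f"E|XY| = {lhs:.4f} <= {rhs:.4f}")
```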

0.0.2 Types of convergences


To describe the asymptotic behavior, for example in the law of large numbers,
one should give a meaning to

lim_{n→∞} (X1 + · · · + Xn)/n,

i.e. one needs to talk about convergence of random variables. There are
multiple ways to define convergence of a sequence of random variables.

Definition 10.1. Let Xn, n ≥ 1, and X be random variables defined on a prob-
ability space. Then Xn is said to converge to X almost surely if

P{ lim_{n→∞} Xn = X } = 1.

If Xn converges to X almost surely, we write Xn → X a.s.

Definition 10.2. Let Xn, n ≥ 1, and X be random variables defined on a prob-
ability space. Then Xn is said to converge to X in probability if, for each
ε > 0,

lim_{n→∞} P{|Xn − X| > ε} = 0.
Definition 10.3. Let Xn, n ≥ 1, and X be random variables with distribution
functions Fn, n ≥ 1, and F respectively. We say that Xn converges to X in
distribution if

lim_{n→∞} Fn(x) = F(x) for all x ∉ D,

where D is the set of discontinuity points of F.
Definition 10.4. Let Xn, n ≥ 1, and X be random variables. Then Xn is said to
converge to X in the mth moment if Xn, n ≥ 1, and X have finite mth moments
and

lim_{n→∞} E|Xn − X|^m = 0.

Remark 0.1 Let Xn, n ≥ 1, and X be random variables. Then the following rela-
tions hold.

• Xn → X a.s. ⇒ Xn → X in probability.

Recall that

{ lim_{n→∞} Xn = X } = ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε}.

Now

P( ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1
   ⇒ P( ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1 for all ε > 0
   ⇒ lim_{n→∞} P( ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1 for all ε > 0
   ⇒ lim_{n→∞} P( {|Xn − X| ≤ ε} ) = 1 for all ε > 0.

This implies convergence in probability.



• The converse is not true, i.e. convergence in probability need not imply
convergence a.s.

For example, consider independent Xn, n ≥ 1, with Xn ∼ Bernoulli(1/n).

For 0 < ε < 1,

P(|Xn| > ε) = P{Xn = 1} = 1/n → 0 as n → ∞.

Hence Xn → 0 in probability.

Now note that

Σ_{n=1}^{∞} P{Xn = 1} = ∞.

Hence, using the (second) Borel-Cantelli lemma and the independence of the Xn's,
it follows that P({Xn = 1 i.o.}) = 1. Hence Xn doesn't converge to 0 a.s.

• Xn → X in probability ⇒ Xn → X a.s. along a subsequence.

• Xn → X in the mth moment ⇒ Xn → X in probability.

Using the Markov inequality we get

P{|Xn − X| > ε} ≤ E|Xn − X|^m / ε^m.

The above immediately implies that

lim_{n→∞} E|Xn − X|^m = 0 ⇒ lim_{n→∞} P{|Xn − X| > ε} = 0 for all ε > 0.

This completes the proof.

• Xn → X in probability ⇒ Xn → X in distribution.

The proof of this is again not difficult to see. For ε > 0, consider

|Φ_{Xn}(t) − Φ_X(t)| = |E[e^{itXn} − e^{itX}]|
                     ≤ E|e^{it(Xn − X)} − 1|
                     = E[ √(2(1 − cos t(Xn − X))) ]
                     = 2E| sin(t(Xn − X)/2) |
                     = 2E[ |sin(t(Xn − X)/2)| I_{{|Xn − X| ≤ ε}} ]
                       + 2E[ |sin(t(Xn − X)/2)| I_{{|Xn − X| > ε}} ]
                     ≤ 2|t| E[ |(Xn − X)/2| I_{{|Xn − X| ≤ ε}} ] + 2P{|Xn − X| > ε}
                     ≤ |t|ε + 2P{|Xn − X| > ε}.

Hence

limsup_{n→∞} |Φ_{Xn}(t) − Φ_X(t)| ≤ |t|ε for all ε > 0,

i.e., Xn → X in probability implies that Φ_{Xn}(t) → Φ_X(t) for each t.
Hence, by the continuity theorem, F_{Xn}(x) → F_X(x) at all continuity
points of F_X.

Following is a useful technical result for the rest of the chapter.

Lemma 0.2 Let Xn, n ≥ 1, and X be random variables such that P({|Xn − X| ≥
ε i.o.}) = 0 for all ε > 0. Then Xn → X a.s.

Proof: For ε > 0, consider

P( ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| < ε} ) = 1 − P( ⋂_{n=1}^{∞} ⋃_{m=n}^{∞} {|Xm − X| ≥ ε} )
                                            = 1 − P(|Xn − X| ≥ ε i.o.) = 1.

Therefore (it is enough to intersect over ε = 1/k, k ≥ 1)

P( ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| < ε} ) = 1.

This implies that Xn → X a.s.



0.0.3 Limit theorems


In this section, we look at limit theorems. As mentioned earlier, we only consider
special cases, but these are good enough for many well known situations such as
(simple symmetric) random walks. We first discuss the law of large numbers in
its weak form.

Theorem 0.5 (Weak law of large numbers) Let X1, X2, . . . be a sequence of
independent and identically distributed random variables, each having finite
mean µ and finite variance σ². Then Sn/n converges in probability to µ, i.e.,
for each ε > 0,

lim_{n→∞} P{ |Sn/n − µ| ≥ ε } = 0,

where Sn = X1 + · · · + Xn.

Proof: Note that

E[Sn/n] = µ,   E[Sn/n − µ]² = σ²/n.

Hence, applying Chebyshev's inequality to Sn/n, we get

P{ |Sn/n − µ| ≥ ε } ≤ σ²/(nε²).

Therefore

lim_{n→∞} P{ |Sn/n − µ| ≥ ε } = 0.
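An illustrative numerical check (a sketch with arbitrary parameter choices): estimate P{|Sn/n − µ| ≥ ε} over many independent runs and compare it with the Chebyshev bound σ²/(nε²) used in the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, eps = 1.0, 1.0, 0.1      # Exponential(1) summands (illustrative), eps arbitrary
n_runs = 10_000

for n in (10, 100, 1000):
    means = rng.exponential(1.0, size=(n_runs, n)).mean(axis=1)   # samples of S_n / n
    empirical = np.mean(np.abs(means - mu) >= eps)
    bound = min(1.0, sigma2 / (n * eps**2))    # Chebyshev bound, capped at 1
    print(f"n={n}: empirical {empirical:.4f}, Chebyshev bound {bound:.4f}")
```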
Jacob Bernoulli proved the weak law of large numbers when the Xn's are i.i.d.
Bernoulli(p) random variables. As we have seen earlier, convergence in probability
only guarantees a.s. convergence along a subsequence. So this form of concentration
of the average Sn/n around its mean is not completely satisfactory: what one needs
is a.s. convergence instead of convergence in probability. In the next theorem, we
prove this for the Bernoulli case.

Theorem 0.6 (Strong law, Bernoulli case) Let Xn, n ≥ 1, be a sequence of
i.i.d. Bernoulli(1/2) random variables. Then Sn/n → 1/2 a.s.

Proof: We use Lemma 0.2: if P(|Xn − X| ≥ ε i.o.) = 0 for all ε > 0, then
P(lim_{n→∞} Xn = X) = 1. Using Hoeffding's inequality, for each ε > 0 we have

P{ |Sn/n − 1/2| ≥ ε } ≤ 2e^{−2nε²}.

Hence

Σ_{n=1}^{∞} P{ |Sn/n − 1/2| ≥ ε } ≤ 2 Σ_{n=1}^{∞} (e^{−2ε²})^n < ∞.

Therefore, by the Borel-Cantelli lemma, it follows that

P{ |Sn/n − 1/2| ≥ ε i.o. } = 0.

Hence, using Lemma 0.2, we have Sn/n → 1/2 a.s.
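A single simulated path of fair coin flips illustrates the statement (an illustrative sketch only; the seed and horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n_max = 1_000_000
flips = rng.integers(0, 2, size=n_max)                     # X_i ~ Bernoulli(1/2)
running_avg = np.cumsum(flips) / np.arange(1, n_max + 1)   # S_n / n along one path

for n in (100, 10_000, 1_000_000):
    print(f"n={n}: S_n/n = {running_avg[n - 1]:.4f}")
```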
The above result was only intended to illustrate the use of inequalities like
Hoeffding's or Chernoff's. Now we will prove the strong law without any structural
condition on the distribution of the Xn's, beyond a moment assumption.

Theorem 0.7 (Strong law of large numbers) Let X1, X2, . . . be a sequence
of independent and identically distributed random variables, each having fi-
nite mean µ and finite fourth moment. Then

Sn/n → µ a.s. as n → ∞.
Proof: Expanding,

(Sn − nµ)^4 = ( Σ_{i=1}^{n} (Xi − µ) )^4
            = Σ_{i=1}^{n} (Xi − µ)^4 + 4 Σ_{i≠j} (Xi − µ)^3 (Xj − µ)
              + 6 Σ_{i<j} (Xi − µ)^2 (Xj − µ)^2
              + 12 Σ (Xi − µ)^2 (Xj − µ)(Xk − µ)
              + 24 Σ (Xi − µ)(Xj − µ)(Xk − µ)(Xl − µ),

where the fourth sum is over distinct i, j, k with j < k and the last sum is over
i < j < k < l.

[Note: the coefficients above are the numbers of permutations of four factors of the
corresponding type.]

Hence, using the fact that (Xi − µ), (Xj − µ), (Xk − µ), (Xl − µ) are independent
with mean zero for distinct i, j, k, l (so that every term containing a factor with
exponent one has zero expectation), we get

E(Sn − nµ)^4 = Σ_{i=1}^{n} E(Xi − µ)^4 + 6 Σ_{i<j} E(Xi − µ)^2 E(Xj − µ)^2.   (0.1)

Since X1, X2, . . . are identically distributed, we get

E(Sn − nµ)^4 = nE(X1 − µ)^4 + 3n(n − 1)E(X1 − µ)^2 E(X2 − µ)^2
             ≤ nK + 3n(n − 1)K,                                    (0.2)

where K = E(X1 − µ)^4; here E(X1 − µ)^2 E(X2 − µ)^2 = (E(X1 − µ)^2)^2 ≤ E(X1 − µ)^4 = K.
Hence

E[ (Sn/n − µ)^4 ] ≤ K/n³ + 3K/n².                                  (0.3)

Therefore, for each ε > 0,

P{ |Sn/n − µ| ≥ ε } ≤ (1/ε^4) E[ (Sn/n − µ)^4 ] ≤ (K/ε^4)( 1/n³ + 3/n² ).
Hence

Σ_{n=1}^{∞} P{ |Sn/n − µ| ≥ ε } < ∞.

Now, using the Borel-Cantelli lemma, it follows that for each ε > 0,

P{ |Sn/n − µ| ≥ ε i.o. } = 0.

Hence, by Lemma 0.2, P( lim_{n→∞} Sn/n = µ ) = 1.
This completes the proof.
As an application, we show that any continuous function on [0, 1] can be approx-
imated by Bernstein polynomials.

Example 0.3 Let f : [0, 1] → R be a continuous function. Consider the
Bernstein polynomials

Bn(x) = Σ_{k=0}^{n} f(k/n) (n choose k) x^k (1 − x)^{n−k},   0 ≤ x ≤ 1.

Fix x ∈ (0, 1). Let X1, X2, . . . be independent and identically distributed
Bernoulli(x) random variables. Then, using the strong law of large numbers, we
have

Sn/n → x a.s. as n → ∞.

Now note that Sn is a Binomial(n, x) random variable. Hence

Bn(x) = E[ f(Sn/n) ].

Set

Yn = f(Sn/n).

Then Yn → f(x) a.s. as n → ∞ and |Yn| ≤ K, where K is such that
−K ≤ f(y) ≤ K for all y ∈ [0, 1]; here we use the fact that every continuous
function defined on [0, 1] is bounded. Now, applying the dominated convergence
theorem (Theorem 6.0.31), we get

lim_{n→∞} EYn = f(x),

i.e.

lim_{n→∞} Bn(x) = f(x),   0 < x < 1.

For x = 0 and x = 1, the claim follows by observing that Bn(0) = f(0) and
Bn(1) = f(1).
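The following sketch (illustrative; the target function, evaluation point and degrees are arbitrary choices) evaluates Bn(x) directly from its definition and compares it with f(x):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial B_n(x) of f at a point x in [0, 1]."""
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    return float(np.sum(f(k / n) * binom * x**k * (1 - x)**(n - k)))

f = lambda y: np.abs(y - 0.5)        # continuous (but not smooth) on [0, 1]
x = 0.3
for n in (10, 100, 1000):
    print(f"n={n}: B_n({x}) = {bernstein(f, n, x):.4f}, f({x}) = {f(x):.4f}")
```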

Theorem 0.8 (Central limit theorem) Let X1, X2, . . . be a sequence of in-
dependent and identically distributed random variables, each having finite
mean µ and finite non-zero variance σ². Then

lim_{n→∞} P{ (Sn − nµ)/(σ√n) ≤ x } = N(x),   x ∈ R,

where N(·) is the standard normal distribution function.

Proof: Set

S̄n = (Sn − nµ)/(σ√n).

For t ∈ R,

Φ_{S̄n}(t) = e^{−inµt/(σ√n)} Φ_{Sn}( t/(σ√n) )
           = e^{−inµt/(σ√n)} ( Φ_{X1}( t/(σ√n) ) )^n,              (0.4)

where the second equality uses the fact that X1, X2, . . . are independent and
identically distributed.

For each fixed t, Φ_{X1}( t/(σ√n) ) is close to 1 for sufficiently large n.
Hence, for sufficiently large values of n, we have from (0.4),

Φ_{S̄n}(t) = e^{ n[ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] }.            (0.5)

Hence, for t ≠ 0,

lim_{n→∞} n[ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ]
   = (t²/σ²) lim_{n→∞} [ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] / ( t/(σ√n) )².   (0.6)

Consider

lim_{n→∞} [ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] / ( t/(σ√n) )²
   = lim_{x→0} [ ln(Φ_{X1}(x)) − iµx ] / x²
   = lim_{x→0} [ Φ'_{X1}(x) − iµ Φ_{X1}(x) ] / ( 2x Φ_{X1}(x) )
   = [ Φ''_{X1}(0) − iµ Φ'_{X1}(0) ] / 2
   = ( −E[X1²] + µ² ) / 2 = −σ²/2.                                   (0.7)

Here we used l'Hôpital's rule (twice) together with the properties of characteristic
functions: Φ_{X1}(0) = 1, Φ'_{X1}(0) = iµ and Φ''_{X1}(0) = −E[X1²]. Combining
(0.5), (0.6) and (0.7), we get, for each t ≠ 0,

lim_{n→∞} Φ_{S̄n}(t) = e^{−t²/2}.

For t = 0, the above limit holds trivially. Hence, for each t ∈ R,

lim_{n→∞} Φ_{S̄n}(t) = e^{−t²/2}.                                     (0.8)

Since e^{−t²/2} is the characteristic function of the standard normal distribution,
the continuity theorem for characteristic functions applied to (0.8) completes the
proof.
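A numerical illustration of the theorem (a sketch only; the summand distribution, n and the evaluation points are arbitrary choices) comparing the empirical distribution of (Sn − nµ)/(σ√n) with the standard normal distribution function:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, n_runs = 500, 20_000
mu, sigma = 1.0, 1.0                       # Exponential(1) summands (illustrative)

S = rng.exponential(1.0, size=(n_runs, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))    # standardized sums

std_normal_cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
for x in (-1.0, 0.0, 1.0):
    print(f"x={x}: P(Z <= x) ~ {np.mean(Z <= x):.4f}, N(x) = {std_normal_cdf(x):.4f}")
```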

With this I end this course. You will have a tutorial on 8th November
at the usual tutorial hour. Also, your end-semester examination
syllabus is based on all topics, with pre-midsem topics having
a weightage of (roughly) 10%.

BEST OF LUCK.
