
Introduction to Probability Theory

K. Suresh Kumar
Department of Mathematics
Indian Institute of Technology Bombay

November 5, 2017

LECTURES 26-27

Chapter 10: Limit Theorems


We describe two classes of limit theorems in this chapter, namely "laws of large numbers"
and "central limit theorems". Limit theorems describe the asymptotic behavior of random
dynamical systems. For example, the law of large numbers describes the asymptotic behavior
of the average 'position' Sn at the nth epoch (here we are looking at the time evolution at
discrete times), whereas the central limit theorem describes the fluctuations of Sn around its
'average'. Here we treat the situation where the 'displacements' are independent and
identically distributed, i.e. where Sn = X1 + · · · + Xn and {Xn | n ≥ 1} is a sequence of
i.i.d. random variables.

0.0.1 Some fundamental Inequalities


In this subsection, we discuss some fundamental inequalities which we will
be using.

Theorem 0.1 (Markov inequality) Let X be a non-negative random variable
with finite nth moment. Then for each ε > 0,

P{X ≥ ε} ≤ E[X^n] / ε^n.

Proof. Set

Y(ω) = 0 if X(ω) < ε,   Y(ω) = ε^n if X(ω) ≥ ε.

Then Y is a non-negative simple random variable such that Y ≤ X^n. Hence
from Theorem 6.0.29, we have EY ≤ E[X^n]. Therefore

ε^n P{X ≥ ε} = EY ≤ E[X^n].

This completes the proof.

Example 0.1 In the Markov inequality, non-negativity is important. For
example, take X = 1 or −1, each with probability 1/2. Then the Markov inequality
does not hold: take ε = 1/2 and n = 1 and note that EX = 0, so the bound is 0,
while P{X ≥ 1/2} = 1/2.

As a corollary, we have Chebyshev's inequality.



Chebyshev's inequality. Let X be a random variable with finite mean µ
and finite variance σ². Then for each ε > 0,

P{|X − µ| ≥ ε} ≤ σ²/ε².

The proof of Chebyshev's inequality follows by replacing X with |X − µ|
and taking n = 2 in the Markov inequality.
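As a quick illustration of how conservative this bound can be (a sketch that is not part of the original notes; the distribution, sample size and values of ε are arbitrary choices), one can compare the empirical tail probability with the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choice: Exponential(1), so mu = 1 and sigma^2 = 1.
mu, sigma2 = 1.0, 1.0
samples = rng.exponential(scale=1.0, size=1_000_000)

for eps in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(samples - mu) >= eps)  # estimate of P{|X - mu| >= eps}
    chebyshev = sigma2 / eps**2                       # Chebyshev upper bound
    print(f"eps={eps}: empirical {empirical:.4f}, Chebyshev bound {chebyshev:.4f}")
```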
The power of the Markov and Chebyshev inequalities lies in their generality, not in
their sharpness. But one can use the Markov inequality to get some sharper estimates,
called exponential inequalities; note that one then needs to use the structure of the
distribution. The following is an important manifestation of this.

Theorem 0.2 (Hoeffding's inequality, Binomial case) Let X be a Binomial(n, 1/2)
random variable. Then

P{|X − n/2| ≥ nt} ≤ 2e^{−2nt²},   t > 0.

Proof: Note X = X1 + · · · + Xn, where the Xi's are i.i.d. Bernoulli(1/2). Fix
λ > 0. Then

P{X ≥ n/2 + nt} = P{λ(X − n/2 − nt) ≥ 0}
                = P{e^{λ(X − n/2 − nt)} ≥ 1}
                = P{e^{λ(X − n/2)} ≥ e^{λnt}}
                ≤ e^{−λnt} E[e^{λ(X − n/2)}]                 (Markov inequality)
                = e^{−λnt} ∏_{i=1}^{n} E[e^{λ(Xi − 1/2)}]
                = e^{−λnt} ((e^{−λ/2} + e^{λ/2})/2)^n
                ≤ e^{−λnt} e^{nλ²/8}
                = e^{−λnt + λ²n/8}.

In the last inequality, we used the following:

((e^{−λ/2} + e^{λ/2})/2)² = (1/4)(e^{λ} + 2 + e^{−λ}) ≤ e^{λ²/4},

so that (e^{−λ/2} + e^{λ/2})/2 ≤ e^{λ²/8} (compare the series expansions of the two
sides term by term).

Hence

P{X ≥ n/2 + nt} ≤ inf_{λ≥0} e^{−λnt + λ²n/8} = e^{−2nt²},

the infimum being attained at λ = 4t.

A similar argument implies

P{X ≤ n/2 − nt} ≤ e^{−2nt²}.

Combining the two bounds gives the stated inequality.
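As a quick numerical sanity check (an illustrative sketch, not part of the notes; n, t and the number of samples are arbitrary), one can compare Monte Carlo estimates of P{|X − n/2| ≥ nt} with the bound 2e^{−2nt²}:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.binomial(n, 0.5, size=200_000)   # samples of X ~ Binomial(n, 1/2)

for t in (0.05, 0.10, 0.20):
    empirical = np.mean(np.abs(X - n / 2) >= n * t)  # estimate of P{|X - n/2| >= nt}
    hoeffding = 2 * np.exp(-2 * n * t**2)            # Hoeffding upper bound
    print(f"t={t}: empirical {empirical:.4f}, Hoeffding bound {hoeffding:.4f}")
```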
Note that the above proof contains a general procedure. We will make this
explicit in the following general 'exponential' inequality.

Theorem 0.3 For any random variable X, we have the following tail bound:

P{X ≥ t} ≤ inf_{λ≥0} E[e^{λ(X−t)}],   t > 0.

Proof: It follows easily from

P{X ≥ t} = P{e^{λ(X−t)} ≥ 1} ≤ E[e^{λ(X−t)}]   (Markov inequality),

which holds for each λ ≥ 0.

Theorem 0.4 (Chernoff's inequality) Let X1, X2, · · · , Xn be i.i.d. random
variables with mgf M(·), and set Sn = X1 + · · · + Xn. Then

P{Sn ≥ t} ≤ inf_{λ≥0} [e^{−λt} M(λ)^n],   P{Sn ≤ −t} ≤ inf_{λ≥0} [e^{−λt} M(−λ)^n].

The proof follows from Theorem 0.3 by taking X = Sn and X = −Sn respectively.

Example 0.2 Consider the case when Sn ∼ Binomial(n, p). Note that

M(λ) = 1 − p + pe^λ.

Set

f(λ) = e^{−λt}(1 − p + pe^λ)^n.

Now verify that f attains its minimum at

λ = log[ (1 − p)(t/n) / (p(1 − t/n)) ],

which is non-negative when t ≥ np. Hence, for np ≤ t < n,

inf_{λ≥0} f(λ) = [ ((t/n)/p)^{−t/n} ((1 − t/n)/(1 − p))^{−(1−t/n)} ]^n.

This gives the upper tail bound

P{Sn ≥ t} ≤ [ ((t/n)/p)^{−t/n} ((1 − t/n)/(1 − p))^{−(1−t/n)} ]^n.

When p = 1/2, the above bound becomes

P{Sn ≥ t} ≤ [ (1/2) (t/n)^{−t/n} (1 − t/n)^{−(1−t/n)} ]^n.

Note that this bound is indeed sharper than the corresponding Hoeffding inequality,
but a small compromise in sharpness gives the more convenient Hoeffding bound.
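For a concrete comparison (an illustrative sketch; the values of n and s below are arbitrary), one can evaluate the p = 1/2 Chernoff bound above and the one-sided Hoeffding bound e^{−2ns²} at the threshold t = n(1/2 + s):

```python
import numpy as np

n = 100
for s in (0.05, 0.10, 0.20):
    q = 0.5 + s                       # t/n, where t = n/2 + n*s is the threshold
    # Chernoff bound from Example 0.2 with p = 1/2
    chernoff = (0.5 * q**(-q) * (1 - q)**(-(1 - q)))**n
    # One-sided Hoeffding bound
    hoeffding = np.exp(-2 * n * s**2)
    print(f"s={s}: Chernoff {chernoff:.3e}, Hoeffding {hoeffding:.3e}")
```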

We end this section on inequalities with one more fundamental inequality,
the Cauchy-Schwarz inequality.

Lemma 0.1 Let X and Y be random variables with finite second moments.
Then XY has finite mean and

E|XY| ≤ [EX²]^{1/2} [EY²]^{1/2}.

Proof: If EX² = 0 or EY² = 0, then the proof is easy (exercise). So we assume
that EX², EY² ≠ 0. Consider, for λ = E[XY]/EY²,

0 ≤ E[X − λY]² = EX² − 2λE[XY] + λ²EY² = EX² − (E[XY])²/EY².

Hence (E[XY])² ≤ EX² EY²; applying this with X, Y replaced by |X|, |Y| gives the
stated inequality.
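A minimal numerical sanity check of the inequality (illustrative only; the distributions of X and Y are arbitrary and taken independent here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=1_000_000)
Y = rng.exponential(size=1_000_000)

lhs = np.mean(np.abs(X * Y))                            # E|XY|
rhs = np.sqrt(np.mean(X**2)) * np.sqrt(np.mean(Y**2))   # (EX^2)^{1/2} (EY^2)^{1/2}
print(f"E|XY| = {lhs:.4f} <= {rhs:.4f}")
```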

0.0.2 Types of convergences


To describe the asymptotic behavior, for example in the law of large numbers,
one should give a meaning to

lim_{n→∞} (X1 + · · · + Xn)/n,

i.e. one needs to talk about convergence of random variables. There are
multiple ways to define convergence of a sequence of random variables.

Definition 10.1. Let Xn, n ≥ 1, and X be random variables defined on a prob-
ability space. Then Xn is said to converge to X almost surely if

P{ lim_{n→∞} Xn = X } = 1.

If Xn converges to X almost surely, we write Xn → X a.s.

Definition 10.2. Let Xn, n ≥ 1, and X be random variables defined on a prob-
ability space. Then Xn is said to converge to X in probability if, for each
ε > 0,

lim_{n→∞} P{|Xn − X| > ε} = 0.
Definition 10.3. Let Xn, n ≥ 1, and X be random variables with distribution
functions Fn, n ≥ 1, and F respectively. We say that Xn converges to X in
distribution if

lim_{n→∞} Fn(x) = F(x) for all x ∉ D,

where D is the set of discontinuity points of F.
Definition 10.4. Let Xn, n ≥ 1, and X be random variables. Then Xn is said to
converge to X in the mth moment if Xn, n ≥ 1, and X have finite mth moments
and

lim_{n→∞} E|Xn − X|^m = 0.

Remark 0.1 Let Xn, n ≥ 1, and X be random variables. Then the following rela-
tions hold.

• Xn → X a.s. ⇒ Xn → X in probability.

Recall that

{ lim_{n→∞} Xn = X } = ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε}.

Now

P( ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1
   ⇒ P( ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1 for all ε > 0
   ⇒ lim_{n→∞} P( ⋂_{m=n}^{∞} {|Xm − X| ≤ ε} ) = 1 for all ε > 0
   ⇒ lim_{n→∞} P( {|Xn − X| ≤ ε} ) = 1 for all ε > 0.

This implies convergence in probability.



• The converse is not true, i.e. convergence in probability need not imply
convergence a.s.

For example, consider independent Xn, n ≥ 1, with Xn ∼ Bernoulli(1/n).

For 0 < ε < 1,

P(|Xn| > ε) = P{Xn = 1} = 1/n → 0 as n → ∞.

Hence Xn → 0 in probability.

Now note that

Σ_{n=1}^{∞} P{Xn = 1} = ∞.

Hence, using the (second) Borel-Cantelli lemma and the independence of the Xn's,
it follows that P({Xn = 1 i.o.}) = 1. Hence Xn doesn't converge to 0 a.s.

• Xn → X in probability ⇒ Xn → X a.s. along a subsequence.

• Xn → X in the mth moment ⇒ Xn → X in probability.

Using the Markov inequality we get

P{|Xn − X| > ε} ≤ E|Xn − X|^m / ε^m.

The above immediately implies that

lim_{n→∞} E|Xn − X|^m = 0 ⇒ lim_{n→∞} P{|Xn − X| > ε} = 0 for all ε > 0.

This completes the proof.

• Xn → X in probability ⇒ Xn → X in distribution.

The proof of this is again not difficult to see. For ε > 0, consider

|Φ_{Xn}(t) − Φ_X(t)| = |E[e^{itXn} − e^{itX}]|
                     ≤ E|e^{it(Xn − X)} − 1|
                     = E[ √(2(1 − cos t(Xn − X))) ]
                     = 2E| sin(t(Xn − X)/2) |
                     = 2E[ |sin(t(Xn − X)/2)| I_{{|Xn − X| ≤ ε}} ]
                       + 2E[ |sin(t(Xn − X)/2)| I_{{|Xn − X| > ε}} ]
                     ≤ 2|t| E[ |(Xn − X)/2| I_{{|Xn − X| ≤ ε}} ] + 2P{|Xn − X| > ε}
                     ≤ |t|ε + 2P{|Xn − X| > ε}.

Hence

limsup_{n→∞} |Φ_{Xn}(t) − Φ_X(t)| ≤ |t|ε for all ε > 0,

i.e., Xn → X in probability implies that Φ_{Xn}(t) → Φ_X(t) for each t.
Hence, by the continuity theorem, F_{Xn}(x) → F_X(x) at all continuity
points of F_X.

Following is a useful technical result for the rest of the chapter.

Lemma 0.2 Let Xn, n ≥ 1, and X be random variables such that P({|Xn − X| ≥
ε i.o.}) = 0 for all ε > 0. Then Xn → X a.s.

Proof: For ε > 0, consider

P( ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| < ε} ) = 1 − P( ⋂_{n=1}^{∞} ⋃_{m=n}^{∞} {|Xm − X| ≥ ε} )
                                            = 1 − P(|Xn − X| ≥ ε i.o.) = 1.

Therefore (it is enough to intersect over ε = 1/k, k ≥ 1)

P( ⋂_{ε>0} ⋃_{n=1}^{∞} ⋂_{m=n}^{∞} {|Xm − X| < ε} ) = 1.

This implies that Xn → X a.s.



0.0.3 Limit theorems


In this section, we look at limit theorems. As mentioned earlier, we only consider
special cases, but these are good enough for many well known situations such as
(simple symmetric) random walks. We first discuss the law of large numbers in
its weak form.

Theorem 0.5 (Weak law of large numbers) Let X1, X2, . . . be a sequence of
independent and identically distributed random variables, each having finite
mean µ and finite variance σ². Then Sn/n converges in probability to µ, i.e.,
for each ε > 0,

lim_{n→∞} P{ |Sn/n − µ| ≥ ε } = 0,

where Sn = X1 + · · · + Xn.

Proof: Note that

E[Sn/n] = µ,   E[Sn/n − µ]² = σ²/n.

Hence, applying Chebyshev's inequality to Sn/n, we get

P{ |Sn/n − µ| ≥ ε } ≤ σ²/(nε²).

Therefore

lim_{n→∞} P{ |Sn/n − µ| ≥ ε } = 0.
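An illustrative numerical check (a sketch with arbitrary parameter choices): estimate P{|Sn/n − µ| ≥ ε} over many independent runs and compare it with the Chebyshev bound σ²/(nε²) used in the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, eps = 1.0, 1.0, 0.1      # Exponential(1) summands (illustrative), eps arbitrary
n_runs = 10_000

for n in (10, 100, 1000):
    means = rng.exponential(1.0, size=(n_runs, n)).mean(axis=1)   # samples of S_n / n
    empirical = np.mean(np.abs(means - mu) >= eps)
    bound = min(1.0, sigma2 / (n * eps**2))    # Chebyshev bound, capped at 1
    print(f"n={n}: empirical {empirical:.4f}, Chebyshev bound {bound:.4f}")
```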
Jacob Bernoulli proved the weak law of large numbers when the Xn's are i.i.d.
Bernoulli(p) random variables. As we have seen earlier, convergence in probability
only guarantees a.s. convergence along a subsequence. So this form of concentration
of the average Sn/n around its mean is not completely satisfactory: what one needs
is a.s. convergence instead of convergence in probability. In the next theorem, we
prove this for the Bernoulli case.

Theorem 0.6 (Strong law, Bernoulli case) Let Xn, n ≥ 1, be a sequence of
i.i.d. Bernoulli(1/2) random variables. Then Sn/n → 1/2 a.s.

Proof: We use Lemma 0.2: if P(|Xn − X| ≥ ε i.o.) = 0 for all ε > 0, then
P(lim_{n→∞} Xn = X) = 1. Using Hoeffding's inequality, for each ε > 0 we have

P{ |Sn/n − 1/2| ≥ ε } ≤ 2e^{−2nε²}.

Hence

Σ_{n=1}^{∞} P{ |Sn/n − 1/2| ≥ ε } ≤ 2 Σ_{n=1}^{∞} (e^{−2ε²})^n < ∞.

Therefore, by the Borel-Cantelli lemma, it follows that

P{ |Sn/n − 1/2| ≥ ε i.o. } = 0.

Hence, using Lemma 0.2, we have Sn/n → 1/2 a.s.
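A single simulated path of fair coin flips illustrates the statement (an illustrative sketch only; the seed and horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n_max = 1_000_000
flips = rng.integers(0, 2, size=n_max)                     # X_i ~ Bernoulli(1/2)
running_avg = np.cumsum(flips) / np.arange(1, n_max + 1)   # S_n / n along one path

for n in (100, 10_000, 1_000_000):
    print(f"n={n}: S_n/n = {running_avg[n - 1]:.4f}")
```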
The above result was only intended to illustrate the use of inequalities like
Hoeffding's or Chernoff's. Now we will prove the strong law without any structural
condition on the distribution of the Xn's, beyond a moment assumption.

Theorem 0.7 (Strong law of large numbers) Let X1, X2, . . . be a sequence
of independent and identically distributed random variables, each having fi-
nite mean µ and finite fourth moment. Then

Sn/n → µ a.s. as n → ∞.
Proof: Expanding,

(Sn − nµ)^4 = ( Σ_{i=1}^{n} (Xi − µ) )^4
            = Σ_{i=1}^{n} (Xi − µ)^4 + 4 Σ_{i≠j} (Xi − µ)^3 (Xj − µ)
              + 6 Σ_{i<j} (Xi − µ)^2 (Xj − µ)^2
              + 12 Σ (Xi − µ)^2 (Xj − µ)(Xk − µ)
              + 24 Σ (Xi − µ)(Xj − µ)(Xk − µ)(Xl − µ),

where the fourth sum is over distinct i, j, k with j < k and the last sum is over
i < j < k < l.

[Note: the coefficients above are the numbers of permutations of four factors of the
corresponding type.]

Hence, using the fact that (Xi − µ), (Xj − µ), (Xk − µ), (Xl − µ) are independent
with mean zero for distinct i, j, k, l (so that every term containing a factor with
exponent one has zero expectation), we get

E(Sn − nµ)^4 = Σ_{i=1}^{n} E(Xi − µ)^4 + 6 Σ_{i<j} E(Xi − µ)^2 E(Xj − µ)^2.   (0.1)

Since X1, X2, . . . are identically distributed, we get

E(Sn − nµ)^4 = nE(X1 − µ)^4 + 3n(n − 1)E(X1 − µ)^2 E(X2 − µ)^2
             ≤ nK + 3n(n − 1)K,                                    (0.2)

where K = E(X1 − µ)^4; here E(X1 − µ)^2 E(X2 − µ)^2 = (E(X1 − µ)^2)^2 ≤ E(X1 − µ)^4 = K.
Hence

E[ (Sn/n − µ)^4 ] ≤ K/n³ + 3K/n².                                  (0.3)

Therefore, for each ε > 0,

P{ |Sn/n − µ| ≥ ε } ≤ (1/ε^4) E[ (Sn/n − µ)^4 ] ≤ (K/ε^4)( 1/n³ + 3/n² ).
Hence

Σ_{n=1}^{∞} P{ |Sn/n − µ| ≥ ε } < ∞.

Now, using the Borel-Cantelli lemma, it follows that for each ε > 0,

P{ |Sn/n − µ| ≥ ε i.o. } = 0.

Hence, by Lemma 0.2, P( lim_{n→∞} Sn/n = µ ) = 1.
This completes the proof.
As an application, we show that any continuous function on [0, 1] can be approx-
imated by Bernstein polynomials.

Example 0.3 Let f : [0, 1] → R be a continuous function. Consider the
Bernstein polynomials

Bn(x) = Σ_{k=0}^{n} f(k/n) (n choose k) x^k (1 − x)^{n−k},   0 ≤ x ≤ 1.

Fix x ∈ (0, 1). Let X1, X2, . . . be independent and identically distributed
Bernoulli(x) random variables. Then, using the strong law of large numbers, we
have

Sn/n → x a.s. as n → ∞.

Now note that Sn is a Binomial(n, x) random variable. Hence

Bn(x) = E[ f(Sn/n) ].

Set

Yn = f(Sn/n).

Then Yn → f(x) a.s. as n → ∞ and |Yn| ≤ K, where K is such that
−K ≤ f(y) ≤ K for all y ∈ [0, 1]; here we use the fact that every continuous
function defined on [0, 1] is bounded. Now, applying the dominated convergence
theorem (Theorem 6.0.31), we get

lim_{n→∞} EYn = f(x),

i.e.

lim_{n→∞} Bn(x) = f(x),   0 < x < 1.

For x = 0 and x = 1, the claim follows by observing that Bn(0) = f(0) and
Bn(1) = f(1).
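The following sketch (illustrative; the target function, evaluation point and degrees are arbitrary choices) evaluates Bn(x) directly from its definition and compares it with f(x):

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial B_n(x) of f at a point x in [0, 1]."""
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    return float(np.sum(f(k / n) * binom * x**k * (1 - x)**(n - k)))

f = lambda y: np.abs(y - 0.5)        # continuous (but not smooth) on [0, 1]
x = 0.3
for n in (10, 100, 1000):
    print(f"n={n}: B_n({x}) = {bernstein(f, n, x):.4f}, f({x}) = {f(x):.4f}")
```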

Theorem 0.8 (Central limit theorem) Let X1, X2, . . . be a sequence of in-
dependent and identically distributed random variables, each having finite
mean µ and finite non-zero variance σ². Then

lim_{n→∞} P{ (Sn − nµ)/(σ√n) ≤ x } = N(x),   x ∈ R,

where N(·) is the standard normal distribution function.

Proof: Set

S̄n = (Sn − nµ)/(σ√n).

For t ∈ R,

Φ_{S̄n}(t) = e^{−inµt/(σ√n)} Φ_{Sn}( t/(σ√n) )
           = e^{−inµt/(σ√n)} ( Φ_{X1}( t/(σ√n) ) )^n,              (0.4)

where the second equality uses the fact that X1, X2, . . . are independent and
identically distributed.

For each fixed t, Φ_{X1}( t/(σ√n) ) is close to 1 for sufficiently large n.
Hence, for sufficiently large values of n, we have from (0.4),

Φ_{S̄n}(t) = e^{ n[ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] }.            (0.5)

Hence, for t ≠ 0,

lim_{n→∞} n[ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ]
   = (t²/σ²) lim_{n→∞} [ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] / ( t/(σ√n) )².   (0.6)

Consider

lim_{n→∞} [ ln(Φ_{X1}( t/(σ√n) )) − iµ t/(σ√n) ] / ( t/(σ√n) )²
   = lim_{x→0} [ ln(Φ_{X1}(x)) − iµx ] / x²
   = lim_{x→0} [ Φ'_{X1}(x) − iµ Φ_{X1}(x) ] / ( 2x Φ_{X1}(x) )
   = [ Φ''_{X1}(0) − iµ Φ'_{X1}(0) ] / 2
   = ( −E[X1²] + µ² ) / 2 = −σ²/2.                                   (0.7)

Here we used l'Hôpital's rule (twice) together with the properties of characteristic
functions: Φ_{X1}(0) = 1, Φ'_{X1}(0) = iµ and Φ''_{X1}(0) = −E[X1²]. Combining
(0.5), (0.6) and (0.7), we get, for each t ≠ 0,

lim_{n→∞} Φ_{S̄n}(t) = e^{−t²/2}.

For t = 0, the above limit holds trivially. Hence, for each t ∈ R,

lim_{n→∞} Φ_{S̄n}(t) = e^{−t²/2}.                                     (0.8)

Since e^{−t²/2} is the characteristic function of the standard normal distribution,
the continuity theorem for characteristic functions applied to (0.8) completes the
proof.
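A numerical illustration of the theorem (a sketch only; the summand distribution, n and the evaluation points are arbitrary choices) comparing the empirical distribution of (Sn − nµ)/(σ√n) with the standard normal distribution function:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, n_runs = 500, 20_000
mu, sigma = 1.0, 1.0                       # Exponential(1) summands (illustrative)

S = rng.exponential(1.0, size=(n_runs, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))    # standardized sums

std_normal_cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
for x in (-1.0, 0.0, 1.0):
    print(f"x={x}: P(Z <= x) ~ {np.mean(Z <= x):.4f}, N(x) = {std_normal_cdf(x):.4f}")
```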

With this I end this course. You will have a tutorial on 8th November
at the usual tutorial hour. Also, your end-semester examination
syllabus is based on all topics, with pre-midsem topics having
a weightage of (roughly) 10%.

BEST OF LUCK.
