
Convergence in Distribution, and the Central Limit Theorem


Definition: A function $F : \mathbb{R} \to [0, 1]$ is called a distribution function if it has
the following properties:

i) $F$ is increasing: $x < y \implies F(x) \le F(y)$

ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$

iii) $F$ is continuous from the right.

Note: The set of discontinuities of $F$ is at most countable (this fact is true for
all increasing functions).

Example: The function
$$F(x) = \begin{cases} \frac{1}{3}e^{x-1} & x < 1 \\ \frac{x}{x+1} & x \ge 1 \end{cases}$$
is a distribution function.

It is known that:

Theorem: If $F : \mathbb{R} \to [0, 1]$ is a distribution function, then there exists a random
variable $X$ such that for all $x \in \mathbb{R}$ we have
$$F(x) = P(X \le x)$$
We denote this $F$ by $F_X$.
Definition: A sequence $\{X_n\}$ of random variables is said to converge in distribution (or, in law) to a r.v. $X$ provided that
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x) \qquad (1)$$
at every continuity point of $F_X$. We may use the notation $X_n \xrightarrow{d} X$ to describe
this situation.
Note that (1) is equivalent to
$$\lim_{n \to \infty} P(X_n \le x) = P(X \le x)$$

Note: When the $X_n$'s and $X$ are discrete with values from $\{u_1, u_2, \dots\}$, then the
above condition is equivalent to having
$$\lim_{n \to \infty} P(X_n = u_k) = P(X = u_k) \qquad k = 1, 2, \dots$$
Verify this fact!


Example: Let $X_n \sim N(0, \frac{1}{n})$. Show that $\{X_n\}$ converges in distribution, then
find its limit.
Solution: Let $\Phi(u)$ be the cumulative distribution function of the standard
normal r.v.:
$$\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{x^2}{2}}\, dx$$
Then
$$F_{X_n}(u) = \sqrt{\frac{n}{2\pi}} \int_{-\infty}^{u} e^{-\frac{n x^2}{2}}\, dx = \Phi(u\sqrt{n})$$
So,
$$\lim_{n \to \infty} F_{X_n}(u) = \begin{cases} \Phi(-\infty) = 0 & u < 0 \\ \Phi(0) = \frac{1}{2} & u = 0 \\ \Phi(+\infty) = 1 & u > 0 \end{cases}$$
So, by taking the random variable $X \equiv 0$ with distribution
$$F_X(u) = \begin{cases} 0 & u < 0 \\ 1 & u \ge 0 \end{cases}$$
$F_X$ is continuous except at $u = 0$, and $F_{X_n}(u) \to F_X(u)$ at each continuity
point of $F_X$. So, $X_n \to X$ in distribution.
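A quick numerical check of this convergence (a minimal sketch, assuming NumPy and SciPy are available) uses the closed form $F_{X_n}(u) = \Phi(u\sqrt{n})$ derived above:

```python
import numpy as np
from scipy.stats import norm

# F_{X_n}(u) = Phi(u * sqrt(n)) for X_n ~ N(0, 1/n)
for n in (1, 100, 10000):
    for u in (-0.5, 0.0, 0.5):
        print(n, u, round(norm.cdf(u * np.sqrt(n)), 4))
# As n grows, the values approach 0 for u < 0, stay at 0.5 for u = 0,
# and approach 1 for u > 0, matching the limiting step function F_X.
```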
Example: Let $X_n$ be the r.v. with density
$$f_n(x) = \begin{cases} n x^{n-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Does $\{X_n\}$ converge in law to any r.v.?
Solution: Let $F_n$ be the distribution function of $X_n$.
For $0 \le x \le 1$ we have
$$F_n(x) = \int_{-\infty}^{x} f_n(t)\, dt = \int_0^x n t^{n-1}\, dt = \Big[t^n\Big]_{t=0}^{t=x} = x^n$$
For $x \ge 1$ we have $F_n(x) \ge F_n(1) = 1$, so that $F_n(x) = 1$.
For $x < 0$ we have $F_n(x) \le F_n(0) = 0$, so that $F_n(x) = 0$.
So
$$\lim_{n \to \infty} F_n(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases}$$
Therefore, $X_n \xrightarrow{d} X$ where $X$ has the distribution function
$$F(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases}$$
(this $X$ is indeed a degenerate r.v.). Recall that a constant random variable
$X \equiv a$ is called degenerate; its distribution function is
$$F(x) = P(X \le x) = \begin{cases} 0 & x < a \\ 1 & x \ge a \end{cases}$$
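As a sanity check on this example (a hedged sketch, not part of the original notes), one can sample $X_n$ by the inverse-transform method $X_n = U^{1/n}$ with $U \sim \text{uniform}(0,1)$, since $F_n(x) = x^n$, and watch the samples pile up near 1:

```python
import numpy as np

rng = np.random.default_rng(0)
# X_n = U**(1/n) has CDF F_n(x) = x^n on (0, 1).
for n in (1, 10, 1000):
    x = rng.uniform(0, 1, size=100000) ** (1.0 / n)
    # P(X_n <= 0.9) = 0.9**n shrinks to 0, consistent with X_n -> 1 in law.
    print(n, (x <= 0.9).mean(), 0.9 ** n)
```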
Example: Let $X_n$ be the degenerate r.v. $X_n \equiv n$. So, $X_n$ has the distribution
function
$$F_n(x) = \begin{cases} 0 & x < n \\ 1 & x \ge n \end{cases}$$
Does $\{X_n\}$ converge in law to any r.v.?
Solution: For every $x$ we have $\lim_{n \to \infty} F_n(x) = 0$, and we know that there
cannot exist any distribution function that would agree with the zero function
on its continuity points. Therefore, $\{X_n\}$ does not converge in law to any r.v.

Note: We recall from Calculus that for every real number $c$ we have $\lim_{x \to +\infty} \left(1 + \frac{c}{x}\right)^x = e^c$. To see this, put $y = \left(1 + \frac{c}{x}\right)^x$, and note that
$$\lim_{x \to +\infty} \ln(y) = \lim_{x \to +\infty} x \ln\!\left(1 + \frac{c}{x}\right)
= \lim_{x \to +\infty} \frac{\ln\!\left(1 + \frac{c}{x}\right)}{\frac{1}{x}}
= \lim_{z \to 0^+} \frac{\ln(1 + cz)}{z}
= \lim_{z \to 0^+} \frac{c}{1 + cz}
= c$$
Then,
$$\lim_{x \to +\infty} \left(1 + \frac{c}{x}\right)^x = \lim_{x \to +\infty} y = \lim_{x \to +\infty} e^{\ln(y)} = e^{\lim_{x \to +\infty} \ln(y)} = e^c$$
As a result, for positive integers $n$ we have
$$\lim_{n \to \infty} \left(1 + \frac{c}{n}\right)^n = e^c$$
Here is a generalization:
Lemma: Let $\{c_n\}$ be a sequence of real or complex numbers with $\lim_{n \to \infty} c_n = c$.
Then $\lim_{n \to \infty} \left(1 + \frac{c_n}{n}\right)^n = e^c$.
Proof: Since we already know that $\lim_{n \to \infty} \left(1 + \frac{c}{n}\right)^n = e^c$, we equivalently need
to show that
$$\lim_{n \to \infty} \left[\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right] = 0$$
For this, let $r_n = c_n - c$. Then $r_n \to 0$, so for $\epsilon > 0$ there exists some $n_0$ such
that
$$n \ge n_0 \implies |r_n| \le \epsilon$$
So, if we let $s_n = \sup\{|r_n|, |r_{n+1}|, |r_{n+2}|, \dots\}$, then
$$n \ge n_0 \implies 0 \le s_n \le \epsilon$$
This shows that $\lim s_n = 0$. So, there exists some $n_1$ such that
$$n \ge n_1 \implies s_n \le 1$$
Therefore,
$$k < n \ \&\ n \ge n_1 \implies s_n^{\,n-k} \le s_n$$
Then for $n \ge n_1$ we have
$$\left|\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right|
= \left|\left(1 + \frac{c}{n} + \frac{r_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right|$$
$$= \left|\sum_{k=0}^{n} \binom{n}{k} \left(1 + \frac{c}{n}\right)^k \left(\frac{r_n}{n}\right)^{n-k} - \left(1 + \frac{c}{n}\right)^n\right|
= \left|\sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{c}{n}\right)^k \left(\frac{r_n}{n}\right)^{n-k}\right|
\qquad \text{after cancelling out } \left(1 + \tfrac{c}{n}\right)^n$$
$$\le \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{s_n}{n}\right)^{n-k}
= \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k s_n^{\,n-k} \left(\frac{1}{n}\right)^{n-k}$$
$$\le s_n \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{1}{n}\right)^{n-k}
\le s_n \sum_{k=0}^{n} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{1}{n}\right)^{n-k}
= s_n \left(1 + \frac{|c|}{n} + \frac{1}{n}\right)^n
= s_n \left(1 + \frac{|c| + 1}{n}\right)^n$$
But, the right-hand side tends to $0 \cdot e^{|c|+1} = 0$, so
$$\lim_{n \to \infty} \left[\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right] = 0$$
which proves the claim.
Note: One could have skipped the introduction of the sequence {sn } by using
the so-called upper-limit.
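A quick numerical illustration of the lemma (a sketch assuming NumPy is available; the choice $c_n = 2 + 1/n$ is just an example):

```python
import numpy as np

# c_n = 2 + 1/n -> c = 2, so (1 + c_n/n)^n should approach e^2.
for n in (10, 1000, 100000):
    cn = 2 + 1 / n
    print(n, (1 + cn / n) ** n, np.exp(2))
```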
Continuity Theorem (based on moment generating functions): Suppose
the moment generating functions of the random variables $X, X_1, X_2, \dots$ exist
on some neighbourhood $(-\delta, \delta)$ of the origin. If $\lim_{n \to \infty} M_{X_n}(t) = M_X(t)$ for all
$t \in (-\delta, \delta)$, then $X_n \xrightarrow{d} X$.
Continuity Theorem (based on characteristic functions): If $X, X_1, X_2, \dots$
are random variables such that $\lim_{n \to \infty} \varphi_{X_n}(t) = \varphi_X(t)$ for all $t$, then
$X_n \xrightarrow{d} X$, and vice versa.
Example (Poisson approximation to the binomial distribution): Let
$X_n \sim \text{Binomial}(n, p_n = \frac{\lambda}{n})$, or more generally, assume $\lim n p_n = \lambda$. Show that
$X_n \xrightarrow{d} X$, where $X \sim \text{Poisson}(\lambda)$.
Proof: One way of doing this exercise is by showing that
$$\lim_{n \to \infty} P(X_n = k) = P(X = k) \qquad k = 0, 1, 2, \dots$$
(Try it!). However, we may apply either of the continuity theorems:
$$\lim_{n \to \infty} \varphi_{X_n}(t) = \lim \left[p_n e^{it} + (1 - p_n)\right]^n
= \lim \left[1 + p_n(e^{it} - 1)\right]^n
= \lim \left[1 + \frac{n p_n (e^{it} - 1)}{n}\right]^n$$
$$= e^{\lambda(e^{it} - 1)} \qquad \text{as } n p_n (e^{it} - 1) \to \lambda(e^{it} - 1)$$
$$= \varphi_X(t)$$
So, from the continuity theorem, we have $X_n \xrightarrow{d} X$.
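To see the approximation numerically (a minimal sketch assuming SciPy; $\lambda = 3$ is an arbitrary choice), compare the binomial and Poisson probability mass functions:

```python
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10000):
    p = lam / n
    # P(X_n = k) vs. the Poisson(lambda) limit for a few values of k.
    for k in range(4):
        print(n, k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))
```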


Lemma: If $E|X^n| < \infty$, then for all $k = 1, \dots, n$ we have $E|X^k| < \infty$. In other
words, if the $n$-th moment of $X$ exists, then all lower order moments exist too.
Proof: Let $A$ be the event $A = \{|X| \ge 1\}$. Let $f(x)$ be the density function of
$X$. Then on $A$ we have
$$|X^k| = |X|^k \le |X|^n = |X^n|$$
Then
$$E|X^k| = E(|X|^k) = \int_{-\infty}^{\infty} |x|^k f(x)\, dx
= \int_{A} |x|^k f(x)\, dx + \int_{A^c} |x|^k f(x)\, dx$$
$$\le \int_{A} |x|^n f(x)\, dx + \int_{A^c} 1 \cdot f(x)\, dx
\le \int_{-\infty}^{\infty} |x|^n f(x)\, dx + \int_{-\infty}^{\infty} f(x)\, dx
= E(|X|^n) + 1 = E(|X^n|) + 1 < \infty$$
We recall the following fact from Calculus:

Proposition: Let $h(x, t)$ be a function defined for all $t$ in an interval $(a, b)$.
Suppose that an integrable function $g(x)$ exists such that
$$\left|\frac{d}{dt} h(x, t)\right| \le g(x) \qquad \forall t \in (a, b)\ \ \forall x$$
Then
$$\frac{d}{dt} \int h(x, t)\, dx = \int \frac{d}{dt} h(x, t)\, dx \qquad \forall t \in (a, b)$$

Now we use this fact to prove the following result:


Theorem: Let
$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx$$
be the characteristic function of $X$. If the $n$-th moment $E(X^n)$ exists (i.e., if
$E|X^n| < \infty$), then all derivatives $\varphi'(t), \varphi''(t), \dots, \varphi^{(n)}(t)$ exist and
$$\varphi^{(k)}(t) = \int \frac{d^k}{dt^k} e^{itx} f(x)\, dx = \int (ix)^k e^{itx} f(x)\, dx \qquad k = 1, 2, \dots, n \qquad (1)$$
In particular,
$$\varphi^{(k)}(0) = i^k E(X^k)$$
Proof:
$$\left|\frac{d}{dt} e^{itx} f(x)\right| = \left|(ix) e^{itx} f(x)\right| = |x| f(x)$$
But, from the preceding Lemma we know that the moments $E(X), E(X^2), \dots, E(X^n)$
all exist, and therefore $\int |x| f(x)\, dx = E|X| < \infty$; so, using the
preceding Proposition, we can write
$$\varphi'(t) = \frac{d}{dt} \int e^{itx} f(x)\, dx = \int \frac{d}{dt}\left(e^{itx} f(x)\right) dx = \int (ix) e^{itx} f(x)\, dx$$
So,
$$\varphi'(t) = \int (ix) e^{itx} f(x)\, dx \qquad (2)$$
Next step:
$$\left|\frac{d}{dt} (ix) e^{itx} f(x)\right| = \left|(ix)^2 e^{itx} f(x)\right| = |x^2| f(x)$$
and we know that
$$\int |x^2| f(x)\, dx = E|X^2| < \infty$$
therefore we may differentiate (2) to write
$$\varphi''(t) = \frac{d}{dt} \int (ix) e^{itx} f(x)\, dx = \int \frac{d}{dt}\left((ix) e^{itx} f(x)\right) dx = \int (ix)^2 e^{itx} f(x)\, dx$$
so
$$\varphi''(t) = \int (ix)^2 e^{itx} f(x)\, dx \qquad (3)$$
Continuing this way, we get the equality (1) for all $k = 1, 2, \dots, n$.
Remark: We have seen earlier that if the MGF of a random variable $X$ exists
on some neighbourhood $(-h, h)$ of the origin, then both $M_X(t)$ and $\varphi_X(t)$ have
power series expansions on that neighbourhood, and therefore have derivatives of all
orders in that neighbourhood, with
$$\varphi^{(n)}(0) = i^n E(X^n) \qquad\qquad M^{(n)}(0) = E(X^n)$$
Taylor's Theorem: Suppose $f : [a, b] \to \mathbb{C}$ is $n-1$ times differentiable on $[a, b]$.
If $f^{(n)}(x_0)$ exists, then for every $x \in [a, b]$ we have
$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R(x)$$
where $\lim_{x \to x_0} \dfrac{R(x)}{(x - x_0)^n} = 0$.

Definition: A sequence $\{X_n\}$ of random variables is called an i.i.d. (independent,
identically distributed) sequence if the $X_n$'s share the same distribution and
the $X_n$'s are independent.
A simple observation is the following, which shows that applying a transformation
to an i.i.d. sequence results in an i.i.d. sequence:
Proposition: If $\{X_n\}$ is an i.i.d. sequence, then for every continuous function $g : \mathbb{R} \to \mathbb{R}$,
the sequence $\{g(X_n)\}$ is an i.i.d. sequence too.
Proof: If $X_m$ and $X_n$ are any two members of the sequence and if $O_1 = [a, b]$
and $O_2 = [c, d]$ are two intervals in the real line, then
$$P(g(X_m) \in O_1,\ g(X_n) \in O_2) = P(X_m \in g^{-1}(O_1),\ X_n \in g^{-1}(O_2))$$
$$= P(X_m \in g^{-1}(O_1))\, P(X_n \in g^{-1}(O_2))
= P(g(X_m) \in O_1)\, P(g(X_n) \in O_2)$$
This shows that the $g(X_n)$'s are independent. Further, for every interval $O$ in the
real line:
$$P(g(X_m) \in O) = P(X_m \in g^{-1}(O)) = P(X_n \in g^{-1}(O)) = P(g(X_n) \in O)$$
which shows that the $g(X_n)$'s share the same distribution.
Central Limit Theorem: Let $\{X_1, X_2, \dots\}$ be an i.i.d. sequence with finite mean $\mu$
and finite variance $\sigma^2$ (equivalently, the first and second moments are finite).
Let $\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$ be the sequence of sample means. Then the sequence
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$$
converges in distribution to $N(0, 1)$.
Proof: Let $\varphi_n$ be the C.F. of the r.v. $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$. Let $\varphi(t) = e^{-\frac{t^2}{2}}$ be the C.F.
of the r.v. $N(0, 1)$. For every $t$ we will show $\varphi_n(t) \to \varphi(t)$; then, in light of the
continuity theorem, this will do.
Set
$$Z_i = \frac{X_i - \mu}{\sigma} \qquad i = 1, \dots, n$$
Then, from the last proposition, the sequence $\{Z_1, \dots, Z_n, \dots\}$ is i.i.d., since it
is found by applying a transformation to the i.i.d. sequence $\{X_1, \dots, X_n, \dots\}$. Since the $Z_i$'s
have the same distribution, their C.F.s are identical; let $\psi$ be their common
characteristic function. Note further that $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i$, as
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{n(\bar{X}_n - \mu)}{\sqrt{n}\,\sigma} = \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i$$
Then
$$\varphi_n(t) = \varphi_{\frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i}(t) = \varphi_{\sum_{i=1}^n Z_i}\!\left(\frac{t}{\sqrt{n}}\right) = \varphi_{Z_1}\!\left(\frac{t}{\sqrt{n}}\right) \cdots \varphi_{Z_n}\!\left(\frac{t}{\sqrt{n}}\right) = \left[\psi\!\left(\frac{t}{\sqrt{n}}\right)\right]^n \qquad (1)$$
Since the common second moment of the $X_i$'s is finite, the common second moment
of the $Z_i$'s is finite, and therefore the second derivative $\psi''(t)$ exists for all $t$. So then
we can apply Taylor's theorem to write:
$$\psi(s) = \psi(0) + \psi'(0)s + \frac{1}{2!}\psi''(0)s^2 + R(s)$$
where $\lim_{s \to 0} \frac{R(s)}{s^2} = 0$. By changing $s$ to $\frac{t}{\sqrt{n}}$ we will have
$$\psi\!\left(\frac{t}{\sqrt{n}}\right) = \psi(0) + \psi'(0)\frac{t}{\sqrt{n}} + \frac{1}{2!}\psi''(0)\frac{t^2}{n} + R\!\left(\frac{t}{\sqrt{n}}\right) \qquad (2)$$
But
$$\psi(0) = 1 \qquad \psi'(0) = i\,E(Z_1) = 0 \qquad \psi''(0) = i^2 E(Z_1^2) = -1$$
So, the identity (2) reduces to
$$\psi\!\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + R\!\left(\frac{t}{\sqrt{n}}\right)$$
By putting this back into (1):
$$\varphi_n(t) = \left[1 - \frac{t^2}{2n} + R\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 + \frac{-\frac{t^2}{2} + nR\!\left(\frac{t}{\sqrt{n}}\right)}{n}\right]^n \qquad (3)$$
But,
$$nR\!\left(\frac{t}{\sqrt{n}}\right) = n\left(\frac{t}{\sqrt{n}}\right)^2 \frac{R\!\left(\frac{t}{\sqrt{n}}\right)}{\left(\frac{t}{\sqrt{n}}\right)^2} = t^2\, \frac{R\!\left(\frac{t}{\sqrt{n}}\right)}{\left(\frac{t}{\sqrt{n}}\right)^2} \to 0$$
Therefore, applying the Lemma to (3) with $c_n = -\frac{t^2}{2} + nR\!\left(\frac{t}{\sqrt{n}}\right) \to -\frac{t^2}{2}$, we get
$$\lim_{n \to \infty} \varphi_n(t) = e^{-\frac{t^2}{2}}$$
as desired.
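A small simulation illustrating the theorem (a sketch assuming NumPy; the exponential(1) population, sample size, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 1000, 10000
# Standardized sample means of exponential(1) samples (mean 1, variance 1).
xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma
# Empirical P(Z <= 1) should be close to Phi(1) ~ 0.8413.
print((z <= 1).mean())
```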
Definition: Let $\{Y_n\}$ be a sequence of random variables. Suppose that sequences
$\{a_n\}$ and $\{b_n\}$ exist such that $\frac{Y_n - a_n}{b_n} \xrightarrow{d} N(0, 1)$. Then we say that $\{Y_n\}$
is asymptotically normal $N(a_n, b_n^2)$.

Note: Recall the central limit theorem:
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1) \qquad (1)$$
Since the $X_i$'s are assumed to form an i.i.d. sequence in the central limit theorem, we have
$$E(\bar{X}_n) = E\!\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n}\{E(X_1) + \cdots + E(X_n)\} = \frac{1}{n}\{\mu + \cdots + \mu\} = \mu$$
$$\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2}\{\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\} = \frac{\sigma^2}{n}$$
So, the relationship (1) is the same as
$$\frac{\bar{X}_n - E(\bar{X}_n)}{\operatorname{std}(\bar{X}_n)} \xrightarrow{d} N(0, 1) \qquad (2)$$
i.e., the standardized form of $\bar{X}_n$ tends to $N(0, 1)$ in distribution. Further,
the relationship (2) is the same as
$$\frac{\bar{X}_n - \mu}{\left(\frac{\sigma}{\sqrt{n}}\right)} \xrightarrow{d} N(0, 1)$$
So, we could further say that $\bar{X}_n$ is asymptotically normal $N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
Now, by defining the sums $S_n = X_1 + \cdots + X_n$, we have $\bar{X}_n = \frac{S_n}{n}$, so we actually
have
$$\frac{\sqrt{n}\left(\frac{S_n}{n} - \mu\right)}{\sigma} \xrightarrow{d} N(0, 1)
\qquad \text{equivalently} \qquad
\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1) \qquad (3)$$
So, $S_n$ is asymptotically normal $N(n\mu, n\sigma^2)$. On the other hand, note that
$$E(S_n) = E(X_1) + \cdots + E(X_n) = n\mu$$
$$\operatorname{Var}(S_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = n\sigma^2$$
So, the relationship (3) is the same as
$$\frac{S_n - E(S_n)}{\operatorname{std}(S_n)} \xrightarrow{d} N(0, 1)$$
i.e., the standardized form of $S_n$ tends to $N(0, 1)$ in distribution.
Note: If $\{Y_n\}$ is a sequence such that $Y_n \xrightarrow{d} N(0, 1)$, then for every two real numbers
$a < b$ we have
$$P(a < Y_n \le b) = P(Y_n \le b) - P(Y_n \le a) = F_{Y_n}(b) - F_{Y_n}(a) \to \Phi(b) - \Phi(a)$$
so we use $\Phi(b) - \Phi(a)$ to approximate $P(a < Y_n \le b)$ for large $n$. When $Y_n$ has
a continuous distribution, the probabilities $P(Y_n = a)$ and $P(Y_n = b)$ are zero,
therefore we may use $\Phi(b) - \Phi(a)$ to approximate all of
$$P(a \le Y_n \le b) \qquad P(a \le Y_n < b) \qquad P(a < Y_n < b)$$
But, when the r.v.s $Y_n$ are integer-valued, we apply a continuity correction.
So here is the discussion:

Discussion: Suppose $X$ is an integer-valued r.v. and suppose $X \approx N(\mu, \sigma^2) \sim Y$
(so, $X$ is approximately normal with mean $\mu$ and variance $\sigma^2$). How do we
approximate a probability $P(X = k)$ using this fact? Of course we cannot
approximate this probability by $P(Y = k)$, as this latter one is zero. One way
of properly approximating it is this:
$$P(X = k) = P(X \le k) - P(X \le k-1) = F_X(k) - F_X(k-1) \approx F_Y(k) - F_Y(k-1)$$
But it has become a common practice amongst statisticians to do this
instead:
$$P(X = k) = P(k - 0.5 < X \le k + 0.5) = F_X(k + 0.5) - F_X(k - 0.5)$$
$$\approx F_Y(k + 0.5) - F_Y(k - 0.5) = P(k - 0.5 < Y < k + 0.5)
= P\!\left(\frac{k - 0.5 - \mu}{\sigma} < \frac{Y - \mu}{\sigma} < \frac{k + 0.5 - \mu}{\sigma}\right)$$
So,
$$P(X = k) \approx \Phi\!\left(\frac{k + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{k - 0.5 - \mu}{\sigma}\right)$$

As a result of this, we will have:
$$P(m \le X \le n) = \sum_{k=m}^{n} P(X = k) \approx \sum_{k=m}^{n} \left[\Phi\!\left(\frac{k + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{k - 0.5 - \mu}{\sigma}\right)\right]$$
$$= \Phi\!\left(\frac{n + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{m - 0.5 - \mu}{\sigma}\right) = P(m - 0.5 < Y < n + 0.5)$$
So:
$$P(m \le X \le n) \approx P(m - 0.5 < Y < n + 0.5) \qquad \text{where } X \approx N(\mu, \sigma^2) \sim Y$$
This is the continuity correction when both end points are inclusive. But if
the left end point is exclusive, then we write
$$P(m < X \le n) = P(m + 1 \le X \le n) \approx P((m+1) - 0.5 < Y < n + 0.5) = P(m + 0.5 < Y < n + 0.5)$$
$$P(m < X \le n) \approx P(m + 0.5 < Y < n + 0.5)$$
And, as you may guess, we can prove the following:
$$P(m < X < n) \approx P(m + 0.5 < Y < n - 0.5)$$
$$P(m \le X < n) \approx P(m - 0.5 < Y < n - 0.5)$$
So this is the rule: if the left end point is inclusive, then subtract 0.5; otherwise
add 0.5. For the right end point, things are done in the opposite way.
Note: To evaluate a value $\Phi(c)$ of the normal cumulative distribution, use the
following function in a cell of Microsoft Excel:
= NORM.DIST(c, 0, 1, TRUE)
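Equivalently, $\Phi(c)$ can be evaluated in Python (a sketch assuming SciPy is installed):

```python
from scipy.stats import norm

# Phi(c): standard normal CDF, analogous to =NORM.DIST(c, 0, 1, TRUE) in Excel.
print(norm.cdf(1.96))  # about 0.975
```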

Example: Suppose $X$ is integer-valued and $X \approx N(2, 3.7) \sim Y$, and suppose
that we want $P(0.2 < X \le 4.7)$.
Solution:
$$P(0.2 < X \le 4.7) = P(1 \le X \le 4) \approx P(0.5 < Y < 4.5)$$
$$= P\!\left(\frac{0.5 - 2}{\sqrt{3.7}} < \frac{Y - 2}{\sqrt{3.7}} < \frac{4.5 - 2}{\sqrt{3.7}}\right)
\approx \Phi\!\left(\frac{4.5 - 2}{\sqrt{3.7}}\right) - \Phi\!\left(\frac{0.5 - 2}{\sqrt{3.7}}\right) = 0.6854$$

Example: A die is rolled thirty times. Find the (approximate) probability that
the total score will be between 90 and 120 inclusive.
Solution: Let $X_i$ be the outcome of the $i$-th roll. We want $P(90 \le S_{30} \le 120)$.
The mean value of each roll $X_i$ is
$$\mu = \frac{1 + 2 + \cdots + 6}{6} = 3.5$$
and the variance is
$$\sigma^2 = \frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \cdots + (6 - 3.5)^2}{6} = 2.9167$$
So then
$$E(S_{30}) = (30)(3.5) = 105 \qquad \operatorname{Var}(S_{30}) = (30)(2.9167) = 87.5$$
$$S_{30} \approx N(105, 87.5) \sim Y$$
$$P(90 \le S_{30} \le 120) \approx P(89.5 < Y < 120.5)$$
$$= P\!\left(\frac{89.5 - 105}{\sqrt{87.5}} < \frac{Y - 105}{\sqrt{87.5}} < \frac{120.5 - 105}{\sqrt{87.5}}\right)
\approx \Phi\!\left(\frac{120.5 - 105}{\sqrt{87.5}}\right) - \Phi\!\left(\frac{89.5 - 105}{\sqrt{87.5}}\right)
\approx 0.9512 - 0.0488 = 0.9025$$
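The same numbers can be reproduced in Python (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

mu, var = 30 * 3.5, 30 * 35 / 12          # E(S30) = 105, Var(S30) = 87.5
sd = np.sqrt(var)
# Continuity-corrected normal approximation of P(90 <= S30 <= 120).
print(norm.cdf((120.5 - mu) / sd) - norm.cdf((89.5 - mu) / sd))  # about 0.9025
```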

Example: For $X \sim \text{Binomial}(20, 0.3)$, find approximate values for
$P(X = 7)$ and $P(7 < X \le 10)$.
Solution: If $X_i \sim \text{Bernoulli}(0.3) = \text{Binomial}(1, 0.3)$, then
$$X = S_{20} = X_1 + \cdots + X_{20}$$
We have
$$E(X) = (20)(0.3) = 6 \qquad \operatorname{Var}(X) = (20)(0.3)(0.7) = 4.2$$
So,
$$X \approx N(6, 4.2) \sim Y$$
Then
$$P(X = 7) = P(7 \le X \le 7) \approx P(6.5 < Y < 7.5) = P\!\left(\frac{6.5 - 6}{\sqrt{4.2}} < \frac{Y - 6}{\sqrt{4.2}} < \frac{7.5 - 6}{\sqrt{4.2}}\right)$$
$$\approx \Phi\!\left(\frac{7.5 - 6}{\sqrt{4.2}}\right) - \Phi\!\left(\frac{6.5 - 6}{\sqrt{4.2}}\right) \approx 0.7679 - 0.5964 = 0.1715$$
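A sketch (assuming SciPy) that computes both requested probabilities, comparing the continuity-corrected normal approximation with the exact binomial values; the second probability follows the rule stated above (exclusive left end point, so add 0.5 on the left):

```python
from scipy.stats import binom, norm

n, p = 20, 0.3
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
# P(X = 7): normal approximation with continuity correction vs. exact pmf.
print(norm.cdf((7.5 - mu) / sd) - norm.cdf((6.5 - mu) / sd), binom.pmf(7, n, p))
# P(7 < X <= 10): left end point exclusive, so add 0.5 on the left.
print(norm.cdf((10.5 - mu) / sd) - norm.cdf((7.5 - mu) / sd),
      binom.cdf(10, n, p) - binom.cdf(7, n, p))
```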

Example: A random vector $(X_1, \dots, X_{100})$ (i.e. an i.i.d. sample) is picked from a
population with distribution uniform$(-1, 1)$. Find the approximate probability
that the square of the distance of the vector from the origin is less than or equal
to 40.
Solution: We are looking for $P\!\left(\sum_{i=1}^{100} X_i^2 \le 40\right)$. We have
$$E(X_i^2) = \int_{-1}^{1} x^2 \tfrac{1}{2}\, dx = \left[\tfrac{1}{6}x^3\right]_{-1}^{1} = \tfrac{1}{3}$$
$$E(X_i^4) = \int_{-1}^{1} x^4 \tfrac{1}{2}\, dx = \left[\tfrac{1}{10}x^5\right]_{-1}^{1} = \tfrac{1}{5}$$
$$\operatorname{Var}(X_i^2) = E(X_i^4) - \left[E(X_i^2)\right]^2 = \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45}$$
So then:
$$E\!\left(\textstyle\sum X_i^2\right) = (100)\!\left(\tfrac{1}{3}\right) = \tfrac{100}{3}
\qquad
\operatorname{Var}\!\left(\textstyle\sum X_i^2\right) = (100)\!\left(\tfrac{4}{45}\right) = \tfrac{80}{9}$$
Based on the CLT:
$$\sum X_i^2 \approx N\!\left(\tfrac{100}{3}, \tfrac{80}{9}\right) \sim Y$$
Since the $X_i$'s are continuously distributed, we don't apply the continuity correction. So,
$$P\!\left(\textstyle\sum X_i^2 \le 40\right) \approx P(Y \le 40) = P\!\left(\frac{Y - \frac{100}{3}}{\sqrt{\frac{80}{9}}} \le \frac{40 - \frac{100}{3}}{\sqrt{\frac{80}{9}}}\right)
= P(N(0, 1) \le 2.236) = \Phi(2.236) = 0.9873$$
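A quick check of this answer (a sketch assuming NumPy and SciPy; the simulation size is arbitrary):

```python
import numpy as np
from scipy.stats import norm

# CLT approximation of P(sum of 100 squared uniform(-1, 1) variables <= 40).
print(norm.cdf((40 - 100 / 3) / np.sqrt(80 / 9)))   # about 0.9873
rng = np.random.default_rng(0)
s = (rng.uniform(-1, 1, size=(100000, 100)) ** 2).sum(axis=1)
print((s <= 40).mean())                              # Monte Carlo estimate
```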

Example: Suppose a random sample (an i.i.d. sample) of 48 numbers is picked from
uniform$(0, 2)$. Find the approximate probability that the sample mean is between 0.8 and 1.1.
Solution: We know that if $X \sim \text{uniform}(\alpha, \beta)$, then
$$E(X) = \frac{\alpha + \beta}{2} \qquad \text{and} \qquad \operatorname{Var}(X) = \frac{(\beta - \alpha)^2}{12}$$
In the case of the sample $\{X_1, \dots, X_{48}\}$ we have
$$E(X_i) = 1 \qquad \operatorname{Var}(X_i) = \tfrac{1}{3}$$
$$E(\bar{X}_{48}) = 1 \qquad \operatorname{Var}(\bar{X}_{48}) = \frac{\operatorname{Var}(X_1)}{48} = \tfrac{1}{144}$$
So, from the CLT we have
$$\bar{X}_{48} \approx N\!\left(1, \tfrac{1}{144}\right) \sim Y$$
Since we are dealing with continuous random variables, no correction is needed. So,
$$P(0.8 < \bar{X}_{48} < 1.1) \approx P(0.8 < Y < 1.1) = P\!\left(\frac{0.8 - 1}{\frac{1}{12}} < \frac{Y - 1}{\frac{1}{12}} < \frac{1.1 - 1}{\frac{1}{12}}\right)$$
$$\approx \Phi(1.2) - \Phi(-2.4) = 0.8767$$
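And a final check (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

# Normal approximation of P(0.8 < sample mean < 1.1) for 48 uniform(0, 2) draws.
print(norm.cdf(1.2) - norm.cdf(-2.4))                 # about 0.8767
rng = np.random.default_rng(0)
means = rng.uniform(0, 2, size=(100000, 48)).mean(axis=1)
print(((means > 0.8) & (means < 1.1)).mean())         # Monte Carlo estimate
```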
