
Convergence in Distribution, and the Central Limit Theorem


Definition: A function $F : \mathbb{R} \to [0, 1]$ is called a distribution function if it has
the following properties:

i) $F$ is increasing: $x < y \implies F(x) \le F(y)$

ii) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$

iii) $F$ is continuous from the right.

Note: The set of discontinuities of $F$ is at most countable (this fact is true for
all increasing functions).

Example: The function
$$F(x) = \begin{cases} \frac{1}{3}e^{x-1} & x < 1 \\ \frac{x}{x+1} & x \ge 1 \end{cases}$$
is a distribution function.

It is known that:

Theorem: If $F : \mathbb{R} \to [0, 1]$ is a distribution function, then there exists a random
variable $X$ such that for all $x \in \mathbb{R}$ we have
$$F(x) = P(X \le x)$$
We denote this $F$ by $F_X$.
Definition: A sequence $\{X_n\}$ of random variables is said to converge in distribution (or, in law) to a r.v. $X$ provided that
$$\lim_{n \to \infty} F_{X_n}(x) = F_X(x) \qquad (1)$$
at every continuity point of $F_X$. We may use the notation $X_n \xrightarrow{d} X$ to describe
this situation.
Note that (1) is equivalent to
$$\lim_{n \to \infty} P(X_n \le x) = P(X \le x)$$

Note: When the $X_n$'s and $X$ are discrete with values from $\{u_1, u_2, \dots\}$, then the
above condition is equivalent to having
$$\lim_{n \to \infty} P(X_n = u_k) = P(X = u_k) \qquad k = 1, 2, \dots$$
Verify this fact!


Example: Let $X_n \sim N(0, \frac{1}{n})$. Show that $\{X_n\}$ converges in distribution, then
find its limit.
Solution: Let $\Phi(u)$ be the cumulative distribution function of the standard
normal r.v.:
$$\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{x^2}{2}}\, dx$$
Then
$$F_{X_n}(u) = \sqrt{\frac{n}{2\pi}} \int_{-\infty}^{u} e^{-\frac{n x^2}{2}}\, dx = \Phi(u\sqrt{n})$$
So,
$$\lim_{n \to \infty} F_{X_n}(u) = \begin{cases} \Phi(-\infty) = 0 & u < 0 \\ \Phi(0) = \frac{1}{2} & u = 0 \\ \Phi(+\infty) = 1 & u > 0 \end{cases}$$
So, by taking the random variable $X \equiv 0$ with distribution
$$F_X(u) = \begin{cases} 0 & u < 0 \\ 1 & u \ge 0 \end{cases}$$
$F_X$ is continuous except at $u = 0$, and $F_{X_n}(u) \to F_X(u)$ at each continuity
point of $F_X$. So, $X_n \to X$ in distribution.
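A quick numerical check of this convergence (a minimal sketch, assuming NumPy and SciPy are available) uses the closed form $F_{X_n}(u) = \Phi(u\sqrt{n})$ derived above:

```python
import numpy as np
from scipy.stats import norm

# F_{X_n}(u) = Phi(u * sqrt(n)) for X_n ~ N(0, 1/n)
for n in (1, 100, 10000):
    for u in (-0.5, 0.0, 0.5):
        print(n, u, round(norm.cdf(u * np.sqrt(n)), 4))
# As n grows, the values approach 0 for u < 0, stay at 0.5 for u = 0,
# and approach 1 for u > 0, matching the limiting step function F_X.
```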
Example: Let $X_n$ be the r.v. with density
$$f_n(x) = \begin{cases} n x^{n-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$
Does $\{X_n\}$ converge in law to any r.v.?
Solution: Let $F_n$ be the distribution function of $X_n$.
For $0 \le x \le 1$ we have
$$F_n(x) = \int_{-\infty}^{x} f_n(t)\, dt = \int_0^x n t^{n-1}\, dt = \Big[t^n\Big]_{t=0}^{t=x} = x^n$$
For $x \ge 1$ we have $F_n(x) \ge F_n(1) = 1$, so that $F_n(x) = 1$.
For $x < 0$ we have $F_n(x) \le F_n(0) = 0$, so that $F_n(x) = 0$.
So
$$\lim_{n \to \infty} F_n(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases}$$
Therefore, $X_n \xrightarrow{d} X$ where $X$ has the distribution function
$$F(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases}$$
(this $X$ is indeed a degenerate r.v.). Recall that a constant random variable
$X \equiv a$ is called degenerate; its distribution function is
$$F(x) = P(X \le x) = \begin{cases} 0 & x < a \\ 1 & x \ge a \end{cases}$$
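As a sanity check on this example (a hedged sketch, not part of the original notes), one can sample $X_n$ by the inverse-transform method $X_n = U^{1/n}$ with $U \sim \text{uniform}(0,1)$, since $F_n(x) = x^n$, and watch the samples pile up near 1:

```python
import numpy as np

rng = np.random.default_rng(0)
# X_n = U**(1/n) has CDF F_n(x) = x^n on (0, 1).
for n in (1, 10, 1000):
    x = rng.uniform(0, 1, size=100000) ** (1.0 / n)
    # P(X_n <= 0.9) = 0.9**n shrinks to 0, consistent with X_n -> 1 in law.
    print(n, (x <= 0.9).mean(), 0.9 ** n)
```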
Example: Let $X_n$ be the degenerate r.v. $X_n \equiv n$. So, $X_n$ has the distribution
function
$$F_n(x) = \begin{cases} 0 & x < n \\ 1 & x \ge n \end{cases}$$
Does $\{X_n\}$ converge in law to any r.v.?
Solution: For every $x$ we have $\lim_{n \to \infty} F_n(x) = 0$, and we know that there
cannot exist any distribution function that would agree with the zero function
on its continuity points. Therefore, $\{X_n\}$ does not converge in law to any r.v.

Note: We recall from Calculus that for every real number $c$ we have $\lim_{x \to +\infty} \left(1 + \frac{c}{x}\right)^x = e^c$. To see this, put $y = \left(1 + \frac{c}{x}\right)^x$, and note that
$$\lim_{x \to +\infty} \ln(y) = \lim_{x \to +\infty} x \ln\!\left(1 + \frac{c}{x}\right)
= \lim_{x \to +\infty} \frac{\ln\!\left(1 + \frac{c}{x}\right)}{\frac{1}{x}}
= \lim_{z \to 0^+} \frac{\ln(1 + cz)}{z}
= \lim_{z \to 0^+} \frac{c}{1 + cz}
= c$$
Then,
$$\lim_{x \to +\infty} \left(1 + \frac{c}{x}\right)^x = \lim_{x \to +\infty} y = \lim_{x \to +\infty} e^{\ln(y)} = e^{\lim_{x \to +\infty} \ln(y)} = e^c$$
As a result, for positive integers $n$ we have
$$\lim_{n \to \infty} \left(1 + \frac{c}{n}\right)^n = e^c$$
Here is a generalization:
Lemma: Let $\{c_n\}$ be a sequence of real or complex numbers with $\lim_{n \to \infty} c_n = c$.
Then $\lim_{n \to \infty} \left(1 + \frac{c_n}{n}\right)^n = e^c$.
Proof: Since we already know that $\lim_{n \to \infty} \left(1 + \frac{c}{n}\right)^n = e^c$, we equivalently need
to show that
$$\lim_{n \to \infty} \left[\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right] = 0$$
For this, let $r_n = c_n - c$. Then $r_n \to 0$, so for $\epsilon > 0$ there exists some $n_0$ such
that
$$n \ge n_0 \implies |r_n| \le \epsilon$$
So, if we let $s_n = \sup\{|r_n|, |r_{n+1}|, |r_{n+2}|, \dots\}$, then
$$n \ge n_0 \implies 0 \le s_n \le \epsilon$$
This shows that $\lim s_n = 0$. So, there exists some $n_1$ such that
$$n \ge n_1 \implies s_n \le 1$$
Therefore,
$$k < n \ \&\ n \ge n_1 \implies s_n^{\,n-k} \le s_n$$
Then for $n \ge n_1$ we have
$$\left|\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right|
= \left|\left(1 + \frac{c}{n} + \frac{r_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right|$$
$$= \left|\sum_{k=0}^{n} \binom{n}{k} \left(1 + \frac{c}{n}\right)^k \left(\frac{r_n}{n}\right)^{n-k} - \left(1 + \frac{c}{n}\right)^n\right|
= \left|\sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{c}{n}\right)^k \left(\frac{r_n}{n}\right)^{n-k}\right|
\qquad \text{after cancelling out } \left(1 + \tfrac{c}{n}\right)^n$$
$$\le \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{s_n}{n}\right)^{n-k}
= \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k s_n^{\,n-k} \left(\frac{1}{n}\right)^{n-k}$$
$$\le s_n \sum_{k=0}^{n-1} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{1}{n}\right)^{n-k}
\le s_n \sum_{k=0}^{n} \binom{n}{k} \left(1 + \frac{|c|}{n}\right)^k \left(\frac{1}{n}\right)^{n-k}
= s_n \left(1 + \frac{|c|}{n} + \frac{1}{n}\right)^n
= s_n \left(1 + \frac{|c| + 1}{n}\right)^n$$
But, the right-hand side tends to $0 \cdot e^{|c|+1} = 0$, so
$$\lim_{n \to \infty} \left[\left(1 + \frac{c_n}{n}\right)^n - \left(1 + \frac{c}{n}\right)^n\right] = 0$$
which proves the claim.
Note: One could have skipped the introduction of the sequence {sn } by using
the so-called upper-limit.
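A quick numerical illustration of the lemma (a sketch assuming NumPy is available; the choice $c_n = 2 + 1/n$ is just an example):

```python
import numpy as np

# c_n = 2 + 1/n -> c = 2, so (1 + c_n/n)^n should approach e^2.
for n in (10, 1000, 100000):
    cn = 2 + 1 / n
    print(n, (1 + cn / n) ** n, np.exp(2))
```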
Continuity Theorem (based on moment generating functions): Suppose
the moment generating functions of the random variables $X, X_1, X_2, \dots$ exist
on some neighbourhood $(-\delta, \delta)$ of the origin. If $\lim_{n \to \infty} M_{X_n}(t) = M_X(t)$ for all
$t \in (-\delta, \delta)$, then $X_n \xrightarrow{d} X$.
Continuity Theorem (based on characteristic functions): If $X, X_1, X_2, \dots$
are random variables such that $\lim_{n \to \infty} \varphi_{X_n}(t) = \varphi_X(t)$ for all $t$, then
$X_n \xrightarrow{d} X$, and vice versa.
Example (Poisson approximation to the binomial distribution): Let
$X_n \sim \text{Binomial}(n, p_n = \frac{\lambda}{n})$, or more generally, assume $\lim n p_n = \lambda$. Show that
$X_n \xrightarrow{d} X$, where $X \sim \text{Poisson}(\lambda)$.
Proof: One way of doing this exercise is by showing that
$$\lim_{n \to \infty} P(X_n = k) = P(X = k) \qquad k = 0, 1, 2, \dots$$
(Try it!). However, we may apply either of the continuity theorems:
$$\lim_{n \to \infty} \varphi_{X_n}(t) = \lim \left[p_n e^{it} + (1 - p_n)\right]^n
= \lim \left[1 + p_n(e^{it} - 1)\right]^n
= \lim \left[1 + \frac{n p_n (e^{it} - 1)}{n}\right]^n$$
$$= e^{\lambda(e^{it} - 1)} \qquad \text{as } n p_n (e^{it} - 1) \to \lambda(e^{it} - 1)$$
$$= \varphi_X(t)$$
So, from the continuity theorem, we have $X_n \xrightarrow{d} X$.
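To see the approximation numerically (a minimal sketch assuming SciPy; $\lambda = 3$ is an arbitrary choice), compare the binomial and Poisson probability mass functions:

```python
from scipy.stats import binom, poisson

lam = 3.0
for n in (10, 100, 10000):
    p = lam / n
    # P(X_n = k) vs. the Poisson(lambda) limit for a few values of k.
    for k in range(4):
        print(n, k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, lam), 5))
```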


Lemma: If $E|X^n| < \infty$, then for all $k = 1, \dots, n$ we have $E|X^k| < \infty$. In other
words, if the $n$-th moment of $X$ exists, then all lower order moments exist too.
Proof: Let $A$ be the event $A = \{|X| \ge 1\}$. Let $f(x)$ be the density function of
$X$. Then on $A$ we have
$$|X^k| = |X|^k \le |X|^n = |X^n|$$
Then
$$E|X^k| = E(|X|^k) = \int_{-\infty}^{\infty} |x|^k f(x)\, dx
= \int_{A} |x|^k f(x)\, dx + \int_{A^c} |x|^k f(x)\, dx$$
$$\le \int_{A} |x|^n f(x)\, dx + \int_{A^c} 1 \cdot f(x)\, dx
\le \int_{-\infty}^{\infty} |x|^n f(x)\, dx + \int_{-\infty}^{\infty} f(x)\, dx
= E(|X|^n) + 1 = E(|X^n|) + 1 < \infty$$
We recall the following fact from Calculus:

Proposition: Let $h(x, t)$ be a function defined for all $t$ in an interval $(a, b)$.
Suppose that an integrable function $g(x)$ exists such that
$$\left|\frac{d}{dt} h(x, t)\right| \le g(x) \qquad \forall t \in (a, b)\ \ \forall x$$
Then
$$\frac{d}{dt} \int h(x, t)\, dx = \int \frac{d}{dt} h(x, t)\, dx \qquad \forall t \in (a, b)$$

Now we use this fact to prove the following result:


Theorem: Let
$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx$$
be the characteristic function of $X$. If the $n$-th moment $E(X^n)$ exists (i.e., if
$E|X^n| < \infty$), then all derivatives $\varphi'(t), \varphi''(t), \dots, \varphi^{(n)}(t)$ exist and
$$\varphi^{(k)}(t) = \int \frac{d^k}{dt^k} e^{itx} f(x)\, dx = \int (ix)^k e^{itx} f(x)\, dx \qquad k = 1, 2, \dots, n \qquad (1)$$
In particular,
$$\varphi^{(k)}(0) = i^k E(X^k)$$
Proof:
$$\left|\frac{d}{dt} e^{itx} f(x)\right| = \left|(ix) e^{itx} f(x)\right| = |x| f(x)$$
But, from the preceding Lemma we know that the moments $E(X), E(X^2), \dots, E(X^n)$
all exist, and therefore $\int |x| f(x)\, dx = E|X| < \infty$; so, using the
preceding Proposition, we can write
$$\varphi'(t) = \frac{d}{dt} \int e^{itx} f(x)\, dx = \int \frac{d}{dt}\left(e^{itx} f(x)\right) dx = \int (ix) e^{itx} f(x)\, dx$$
So,
$$\varphi'(t) = \int (ix) e^{itx} f(x)\, dx \qquad (2)$$
Next step:
$$\left|\frac{d}{dt} (ix) e^{itx} f(x)\right| = \left|(ix)^2 e^{itx} f(x)\right| = |x^2| f(x)$$
and we know that
$$\int |x^2| f(x)\, dx = E|X^2| < \infty$$
therefore we may differentiate (2) to write
$$\varphi''(t) = \frac{d}{dt} \int (ix) e^{itx} f(x)\, dx = \int \frac{d}{dt}\left((ix) e^{itx} f(x)\right) dx = \int (ix)^2 e^{itx} f(x)\, dx$$
so
$$\varphi''(t) = \int (ix)^2 e^{itx} f(x)\, dx \qquad (3)$$
Continuing this way, we get the equality (1) for all $k = 1, 2, \dots, n$.
Remark: We have seen earlier that if the MGF of a random variable $X$ exists
on some neighbourhood $(-h, h)$ of the origin, then both $M_X(t)$ and $\varphi_X(t)$ have
power series expansions on that neighbourhood, and therefore have derivatives of all
orders in that neighbourhood, with
$$\varphi^{(n)}(0) = i^n E(X^n) \qquad\qquad M^{(n)}(0) = E(X^n)$$
Taylor's Theorem: Suppose $f : [a, b] \to \mathbb{C}$ is $n-1$ times differentiable on $[a, b]$.
If $f^{(n)}(x_0)$ exists, then for every $x \in [a, b]$ we have
$$f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R(x)$$
where $\lim_{x \to x_0} \dfrac{R(x)}{(x - x_0)^n} = 0$.

Definition: A sequence $\{X_n\}$ of random variables is called an i.i.d. (independent,
identically distributed) sequence if the $X_n$'s share the same distribution and
the $X_n$'s are independent.
A simple observation is the following, which shows that applying a transformation
to an i.i.d. sequence results in an i.i.d. sequence:
Proposition: If $\{X_n\}$ is an i.i.d. sequence, then for every continuous function $g : \mathbb{R} \to \mathbb{R}$,
the sequence $\{g(X_n)\}$ is an i.i.d. sequence too.
Proof: If $X_m$ and $X_n$ are any two members of the sequence and if $O_1 = [a, b]$
and $O_2 = [c, d]$ are two intervals in the real line, then
$$P(g(X_m) \in O_1,\ g(X_n) \in O_2) = P(X_m \in g^{-1}(O_1),\ X_n \in g^{-1}(O_2))$$
$$= P(X_m \in g^{-1}(O_1))\, P(X_n \in g^{-1}(O_2))
= P(g(X_m) \in O_1)\, P(g(X_n) \in O_2)$$
This shows that the $g(X_n)$'s are independent. Further, for every interval $O$ in the
real line:
$$P(g(X_m) \in O) = P(X_m \in g^{-1}(O)) = P(X_n \in g^{-1}(O)) = P(g(X_n) \in O)$$
which shows that the $g(X_n)$'s share the same distribution.
Central Limit Theorem: Let $\{X_1, X_2, \dots\}$ be an i.i.d. sequence with finite mean $\mu$
and finite variance $\sigma^2$ (equivalently, the first and second moments are finite).
Let $\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$ be the sequence of sample means. Then the sequence
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$$
converges in distribution to $N(0, 1)$.
Proof: Let $\varphi_n$ be the C.F. of the r.v. $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$. Let $\varphi(t) = e^{-\frac{t^2}{2}}$ be the C.F.
of the r.v. $N(0, 1)$. For every $t$ we will show $\varphi_n(t) \to \varphi(t)$; then, in light of the
continuity theorem, this will do.
Set
$$Z_i = \frac{X_i - \mu}{\sigma} \qquad i = 1, \dots, n$$
Then, from the last proposition, the sequence $\{Z_1, \dots, Z_n, \dots\}$ is i.i.d., since it
is found by applying a transformation to the i.i.d. sequence $\{X_1, \dots, X_n, \dots\}$. Since the $Z_i$'s
have the same distribution, their C.F.s are identical; let $\psi$ be their common
characteristic function. Note further that $\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i$, as
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{n(\bar{X}_n - \mu)}{\sqrt{n}\,\sigma} = \frac{\sum_{i=1}^n X_i - n\mu}{\sqrt{n}\,\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i$$
Then
$$\varphi_n(t) = \varphi_{\frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i}(t) = \varphi_{\sum_{i=1}^n Z_i}\!\left(\frac{t}{\sqrt{n}}\right) = \varphi_{Z_1}\!\left(\frac{t}{\sqrt{n}}\right) \cdots \varphi_{Z_n}\!\left(\frac{t}{\sqrt{n}}\right) = \left[\psi\!\left(\frac{t}{\sqrt{n}}\right)\right]^n \qquad (1)$$
Since the common second moment of the $X_i$'s is finite, the common second moment
of the $Z_i$'s is finite, and therefore the second derivative $\psi''(t)$ exists for all $t$. So then
we can apply Taylor's theorem to write:
$$\psi(s) = \psi(0) + \psi'(0)s + \frac{1}{2!}\psi''(0)s^2 + R(s)$$
where $\lim_{s \to 0} \frac{R(s)}{s^2} = 0$. By changing $s$ to $\frac{t}{\sqrt{n}}$ we will have
$$\psi\!\left(\frac{t}{\sqrt{n}}\right) = \psi(0) + \psi'(0)\frac{t}{\sqrt{n}} + \frac{1}{2!}\psi''(0)\frac{t^2}{n} + R\!\left(\frac{t}{\sqrt{n}}\right) \qquad (2)$$
But
$$\psi(0) = 1 \qquad \psi'(0) = i\,E(Z_1) = 0 \qquad \psi''(0) = i^2 E(Z_1^2) = -1$$
So, the identity (2) reduces to
$$\psi\!\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + R\!\left(\frac{t}{\sqrt{n}}\right)$$
By putting this back into (1):
$$\varphi_n(t) = \left[1 - \frac{t^2}{2n} + R\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 + \frac{-\frac{t^2}{2} + nR\!\left(\frac{t}{\sqrt{n}}\right)}{n}\right]^n \qquad (3)$$
But,
$$nR\!\left(\frac{t}{\sqrt{n}}\right) = n\left(\frac{t}{\sqrt{n}}\right)^2 \frac{R\!\left(\frac{t}{\sqrt{n}}\right)}{\left(\frac{t}{\sqrt{n}}\right)^2} = t^2\, \frac{R\!\left(\frac{t}{\sqrt{n}}\right)}{\left(\frac{t}{\sqrt{n}}\right)^2} \to 0$$
Therefore, applying the Lemma to (3) with $c_n = -\frac{t^2}{2} + nR\!\left(\frac{t}{\sqrt{n}}\right) \to -\frac{t^2}{2}$, we get
$$\lim_{n \to \infty} \varphi_n(t) = e^{-\frac{t^2}{2}}$$
as desired.
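A small simulation illustrating the theorem (a sketch assuming NumPy; the exponential(1) population, sample size, and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 1.0, 1000, 10000
# Standardized sample means of exponential(1) samples (mean 1, variance 1).
xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - mu) / sigma
# Empirical P(Z <= 1) should be close to Phi(1) ~ 0.8413.
print((z <= 1).mean())
```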
Definition: Let $\{Y_n\}$ be a sequence of random variables. Suppose that sequences
$\{a_n\}$ and $\{b_n\}$ exist such that $\frac{Y_n - a_n}{b_n} \xrightarrow{d} N(0, 1)$. Then we say that $\{Y_n\}$
is asymptotically normal $N(a_n, b_n^2)$.

Note: Recall the central limit theorem:
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \xrightarrow{d} N(0, 1) \qquad (1)$$
Since the $X_i$'s are assumed to form an i.i.d. sequence in the central limit theorem, we have
$$E(\bar{X}_n) = E\!\left(\frac{X_1 + \cdots + X_n}{n}\right) = \frac{1}{n}\{E(X_1) + \cdots + E(X_n)\} = \frac{1}{n}\{\mu + \cdots + \mu\} = \mu$$
$$\operatorname{Var}(\bar{X}_n) = \frac{1}{n^2}\{\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\} = \frac{\sigma^2}{n}$$
So, the relationship (1) is the same as
$$\frac{\bar{X}_n - E(\bar{X}_n)}{\operatorname{std}(\bar{X}_n)} \xrightarrow{d} N(0, 1) \qquad (2)$$
i.e., the standardized form of $\bar{X}_n$ tends to $N(0, 1)$ in distribution. Further,
the relationship (2) is the same as
$$\frac{\bar{X}_n - \mu}{\left(\frac{\sigma}{\sqrt{n}}\right)} \xrightarrow{d} N(0, 1)$$
So, we could further say that $\bar{X}_n$ is asymptotically normal $N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
Now, by defining the sums $S_n = X_1 + \cdots + X_n$, we have $\bar{X}_n = \frac{S_n}{n}$, so we actually
have
$$\frac{\sqrt{n}\left(\frac{S_n}{n} - \mu\right)}{\sigma} \xrightarrow{d} N(0, 1)
\qquad \text{equivalently} \qquad
\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{d} N(0, 1) \qquad (3)$$
So, $S_n$ is asymptotically normal $N(n\mu, n\sigma^2)$. On the other hand, note that
$$E(S_n) = E(X_1) + \cdots + E(X_n) = n\mu$$
$$\operatorname{Var}(S_n) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = n\sigma^2$$
So, the relationship (3) is the same as
$$\frac{S_n - E(S_n)}{\operatorname{std}(S_n)} \xrightarrow{d} N(0, 1)$$
i.e., the standardized form of $S_n$ tends to $N(0, 1)$ in distribution.
Note: If $\{Y_n\}$ is a sequence such that $Y_n \xrightarrow{d} N(0, 1)$, then for every two real numbers
$a < b$ we have
$$P(a < Y_n \le b) = P(Y_n \le b) - P(Y_n \le a) = F_{Y_n}(b) - F_{Y_n}(a) \to \Phi(b) - \Phi(a)$$
so we use $\Phi(b) - \Phi(a)$ to approximate $P(a < Y_n \le b)$ for large $n$. When $Y_n$ has
a continuous distribution, the probabilities $P(Y_n = a)$ and $P(Y_n = b)$ are zero,
therefore we may use $\Phi(b) - \Phi(a)$ to approximate all of
$$P(a \le Y_n \le b) \qquad P(a \le Y_n < b) \qquad P(a < Y_n < b)$$
But, when the r.v.s $Y_n$ are integer-valued, we apply a continuity correction.
So here is the discussion:

Discussion: Suppose $X$ is an integer-valued r.v. and suppose $X \approx N(\mu, \sigma^2) \sim Y$
(so, $X$ is approximately normal with mean $\mu$ and variance $\sigma^2$). How do we
approximate a probability $P(X = k)$ using this fact? Of course we cannot
approximate this probability by $P(Y = k)$, as this latter one is zero. One way
of properly approximating it is this:
$$P(X = k) = P(X \le k) - P(X \le k-1) = F_X(k) - F_X(k-1) \approx F_Y(k) - F_Y(k-1)$$
But it has become a common practice amongst statisticians to do this
instead:
$$P(X = k) = P(k - 0.5 < X \le k + 0.5) = F_X(k + 0.5) - F_X(k - 0.5)$$
$$\approx F_Y(k + 0.5) - F_Y(k - 0.5) = P(k - 0.5 < Y < k + 0.5)
= P\!\left(\frac{k - 0.5 - \mu}{\sigma} < \frac{Y - \mu}{\sigma} < \frac{k + 0.5 - \mu}{\sigma}\right)$$
So,
$$P(X = k) \approx \Phi\!\left(\frac{k + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{k - 0.5 - \mu}{\sigma}\right)$$

As a result of this, we will have:
$$P(m \le X \le n) = \sum_{k=m}^{n} P(X = k) \approx \sum_{k=m}^{n} \left[\Phi\!\left(\frac{k + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{k - 0.5 - \mu}{\sigma}\right)\right]$$
$$= \Phi\!\left(\frac{n + 0.5 - \mu}{\sigma}\right) - \Phi\!\left(\frac{m - 0.5 - \mu}{\sigma}\right) = P(m - 0.5 < Y < n + 0.5)$$
So:
$$P(m \le X \le n) \approx P(m - 0.5 < Y < n + 0.5) \qquad \text{where } X \approx N(\mu, \sigma^2) \sim Y$$
This is the continuity correction when both end points are inclusive. But if
the left end point is exclusive, then we write
$$P(m < X \le n) = P(m + 1 \le X \le n) \approx P((m+1) - 0.5 < Y < n + 0.5) = P(m + 0.5 < Y < n + 0.5)$$
$$P(m < X \le n) \approx P(m + 0.5 < Y < n + 0.5)$$
And, as you may guess, we can prove the following:
$$P(m < X < n) \approx P(m + 0.5 < Y < n - 0.5)$$
$$P(m \le X < n) \approx P(m - 0.5 < Y < n - 0.5)$$
So this is the rule: if the left end point is inclusive, then subtract 0.5; otherwise
add 0.5. For the right end point, things are done in the opposite way.
Note: To evaluate a value $\Phi(c)$ of the normal cumulative distribution, use the
following function in a cell of Microsoft Excel:
= NORM.DIST(c, 0, 1, TRUE)
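Equivalently, $\Phi(c)$ can be evaluated in Python (a sketch assuming SciPy is installed):

```python
from scipy.stats import norm

# Phi(c): standard normal CDF, analogous to =NORM.DIST(c, 0, 1, TRUE) in Excel.
print(norm.cdf(1.96))  # about 0.975
```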

Example: Suppose $X$ is integer-valued and $X \approx N(2, 3.7) \sim Y$, and suppose
that we want $P(0.2 < X \le 4.7)$.
Solution:
$$P(0.2 < X \le 4.7) = P(1 \le X \le 4) \approx P(0.5 < Y < 4.5)$$
$$= P\!\left(\frac{0.5 - 2}{\sqrt{3.7}} < \frac{Y - 2}{\sqrt{3.7}} < \frac{4.5 - 2}{\sqrt{3.7}}\right)
\approx \Phi\!\left(\frac{4.5 - 2}{\sqrt{3.7}}\right) - \Phi\!\left(\frac{0.5 - 2}{\sqrt{3.7}}\right) = 0.6854$$

Example: A die is rolled thirty times. Find the (approximate) probability that
the total score will be between 90 and 120 inclusive.
Solution: Let $X_i$ be the outcome of the $i$-th roll. We want $P(90 \le S_{30} \le 120)$.
The mean value of each roll $X_i$ is
$$\mu = \frac{1 + 2 + \cdots + 6}{6} = 3.5$$
and the variance is
$$\sigma^2 = \frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \cdots + (6 - 3.5)^2}{6} = 2.9167$$
So then
$$E(S_{30}) = (30)(3.5) = 105 \qquad \operatorname{Var}(S_{30}) = (30)(2.9167) = 87.5$$
$$S_{30} \approx N(105, 87.5) \sim Y$$
$$P(90 \le S_{30} \le 120) \approx P(89.5 < Y < 120.5)$$
$$= P\!\left(\frac{89.5 - 105}{\sqrt{87.5}} < \frac{Y - 105}{\sqrt{87.5}} < \frac{120.5 - 105}{\sqrt{87.5}}\right)
\approx \Phi\!\left(\frac{120.5 - 105}{\sqrt{87.5}}\right) - \Phi\!\left(\frac{89.5 - 105}{\sqrt{87.5}}\right)
\approx 0.9512 - 0.0488 = 0.9025$$
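The same numbers can be reproduced in Python (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

mu, var = 30 * 3.5, 30 * 35 / 12          # E(S30) = 105, Var(S30) = 87.5
sd = np.sqrt(var)
# Continuity-corrected normal approximation of P(90 <= S30 <= 120).
print(norm.cdf((120.5 - mu) / sd) - norm.cdf((89.5 - mu) / sd))  # about 0.9025
```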

Example: For $X \sim \text{Binomial}(20, 0.3)$, find approximate values for
$P(X = 7)$ and $P(7 < X \le 10)$.
Solution: If $X_i \sim \text{Bernoulli}(0.3) = \text{Binomial}(1, 0.3)$, then
$$X = S_{20} = X_1 + \cdots + X_{20}$$
We have
$$E(X) = (20)(0.3) = 6 \qquad \operatorname{Var}(X) = (20)(0.3)(0.7) = 4.2$$
So,
$$X \approx N(6, 4.2) \sim Y$$
Then
$$P(X = 7) = P(7 \le X \le 7) \approx P(6.5 < Y < 7.5) = P\!\left(\frac{6.5 - 6}{\sqrt{4.2}} < \frac{Y - 6}{\sqrt{4.2}} < \frac{7.5 - 6}{\sqrt{4.2}}\right)$$
$$\approx \Phi\!\left(\frac{7.5 - 6}{\sqrt{4.2}}\right) - \Phi\!\left(\frac{6.5 - 6}{\sqrt{4.2}}\right) \approx 0.7679 - 0.5964 = 0.1715$$
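A sketch (assuming SciPy) that computes both requested probabilities, comparing the continuity-corrected normal approximation with the exact binomial values; the second probability follows the rule stated above (exclusive left end point, so add 0.5 on the left):

```python
from scipy.stats import binom, norm

n, p = 20, 0.3
mu, sd = n * p, (n * p * (1 - p)) ** 0.5
# P(X = 7): normal approximation with continuity correction vs. exact pmf.
print(norm.cdf((7.5 - mu) / sd) - norm.cdf((6.5 - mu) / sd), binom.pmf(7, n, p))
# P(7 < X <= 10): left end point exclusive, so add 0.5 on the left.
print(norm.cdf((10.5 - mu) / sd) - norm.cdf((7.5 - mu) / sd),
      binom.cdf(10, n, p) - binom.cdf(7, n, p))
```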

Example: A random vector $(X_1, \dots, X_{100})$ (i.e. an i.i.d. sample) is picked from a
population with distribution uniform$(-1, 1)$. Find the approximate probability
that the square of the distance of the vector from the origin is less than or equal
to 40.
Solution: We are looking for $P\!\left(\sum_{i=1}^{100} X_i^2 \le 40\right)$. We have
$$E(X_i^2) = \int_{-1}^{1} x^2 \tfrac{1}{2}\, dx = \left[\tfrac{1}{6}x^3\right]_{-1}^{1} = \tfrac{1}{3}$$
$$E(X_i^4) = \int_{-1}^{1} x^4 \tfrac{1}{2}\, dx = \left[\tfrac{1}{10}x^5\right]_{-1}^{1} = \tfrac{1}{5}$$
$$\operatorname{Var}(X_i^2) = E(X_i^4) - \left[E(X_i^2)\right]^2 = \tfrac{1}{5} - \tfrac{1}{9} = \tfrac{4}{45}$$
So then:
$$E\!\left(\textstyle\sum X_i^2\right) = (100)\!\left(\tfrac{1}{3}\right) = \tfrac{100}{3}
\qquad
\operatorname{Var}\!\left(\textstyle\sum X_i^2\right) = (100)\!\left(\tfrac{4}{45}\right) = \tfrac{80}{9}$$
Based on the CLT:
$$\sum X_i^2 \approx N\!\left(\tfrac{100}{3}, \tfrac{80}{9}\right) \sim Y$$
Since the $X_i$'s are continuously distributed, we don't apply the continuity correction. So,
$$P\!\left(\textstyle\sum X_i^2 \le 40\right) \approx P(Y \le 40) = P\!\left(\frac{Y - \frac{100}{3}}{\sqrt{\frac{80}{9}}} \le \frac{40 - \frac{100}{3}}{\sqrt{\frac{80}{9}}}\right)
= P(N(0, 1) \le 2.236) = \Phi(2.236) = 0.9873$$
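A quick check of this answer (a sketch assuming NumPy and SciPy; the simulation size is arbitrary):

```python
import numpy as np
from scipy.stats import norm

# CLT approximation of P(sum of 100 squared uniform(-1, 1) variables <= 40).
print(norm.cdf((40 - 100 / 3) / np.sqrt(80 / 9)))   # about 0.9873
rng = np.random.default_rng(0)
s = (rng.uniform(-1, 1, size=(100000, 100)) ** 2).sum(axis=1)
print((s <= 40).mean())                              # Monte Carlo estimate
```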

Example: Suppose a random sample (an i.i.d. sample) of 48 numbers is picked from
uniform$(0, 2)$. Find the approximate probability that the sample mean is between 0.8 and 1.1.
Solution: We know that if $X \sim \text{uniform}(\alpha, \beta)$, then
$$E(X) = \frac{\alpha + \beta}{2} \qquad \text{and} \qquad \operatorname{Var}(X) = \frac{(\beta - \alpha)^2}{12}$$
In the case of the sample $\{X_1, \dots, X_{48}\}$ we have
$$E(X_i) = 1 \qquad \operatorname{Var}(X_i) = \tfrac{1}{3}$$
$$E(\bar{X}_{48}) = 1 \qquad \operatorname{Var}(\bar{X}_{48}) = \frac{\operatorname{Var}(X_1)}{48} = \tfrac{1}{144}$$
So, from the CLT we have
$$\bar{X}_{48} \approx N\!\left(1, \tfrac{1}{144}\right) \sim Y$$
Since we are dealing with continuous random variables, no correction is needed. So,
$$P(0.8 < \bar{X}_{48} < 1.1) \approx P(0.8 < Y < 1.1) = P\!\left(\frac{0.8 - 1}{\frac{1}{12}} < \frac{Y - 1}{\frac{1}{12}} < \frac{1.1 - 1}{\frac{1}{12}}\right)$$
$$\approx \Phi(1.2) - \Phi(-2.4) = 0.8767$$
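And a final check (a sketch assuming NumPy and SciPy):

```python
import numpy as np
from scipy.stats import norm

# Normal approximation of P(0.8 < sample mean < 1.1) for 48 uniform(0, 2) draws.
print(norm.cdf(1.2) - norm.cdf(-2.4))                 # about 0.8767
rng = np.random.default_rng(0)
means = rng.uniform(0, 2, size=(100000, 48)).mean(axis=1)
print(((means > 0.8) & (means < 1.1)).mean())         # Monte Carlo estimate
```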
