11/7/2011
$\sqrt{n}(\bar X_n - \mu)/\sigma$ is exactly a $N(0,1)$ random variable, and $\bar X_n$ is exactly a $N(\mu, \sigma^2/n)$ random variable for any $n$. Consequently, if $\sqrt{n}(\bar X_n - \mu)/\sigma$ is approximately a $N(0,1)$ random variable, then it makes sense to use $N(\mu, \sigma^2/n)$ as an approximating distribution for $\bar X_n$. Similarly, one can also use $N(n\mu, n\sigma^2)$ to approximate the sum of $n$ i.i.d. random variables: $S_n := \sum_{i=1}^n X_i$. For example, one could use $N(np, np(1-p))$ to approximate the binomial distribution $\mathrm{Bin}(n, p)$, because the binomial distribution is the sum of $n$ i.i.d. Bernoulli random variables. This approximation is not good unless $n$ is sufficiently large.
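As a quick numerical check of this claim, the sketch below (standard library only; the function names are mine, not from the notes) computes the exact $\mathrm{Bin}(n, p)$ cdf and the largest gap between it and the $N(np, np(1-p))$ cdf:

```python
import math

def binom_cdf(n, p):
    # Exact Bin(n, p) cdf evaluated at k = 0, 1, ..., n
    pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    cdf, total = [], 0.0
    for q in pmf:
        total += q
        cdf.append(total)
    return cdf

def norm_cdf(x):
    # Standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_error(n, p):
    # Largest gap between the Bin(n, p) cdf and its N(np, np(1-p)) approximation
    cdf = binom_cdf(n, p)
    s = math.sqrt(n * p * (1 - p))
    return max(abs(cdf[k] - norm_cdf((k - n * p) / s)) for k in range(n + 1))

for n in (10, 30, 100, 1000):
    print(n, round(max_error(n, 0.5), 4))
```

The error shrinks as $n$ grows, which is the sense in which the approximation requires a large $n$.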
How large $n$ needs to be is a good question. It depends on the underlying distribution from which the random sample $(X_1, \dots, X_n)$ is drawn. For $\mathrm{Bin}(n, p)$, a useful rule of thumb is that $n > 30$ is large enough for it to be approximated by $N(np, np(1-p))$ well enough. (Question: does this rule of thumb depend on $p$?)
How does one evaluate the approximation quality? One needs to have a measure of distance on the space of probability measures. A common choice is the Kolmogorov-Smirnov distance:

(1)  $d_{KS}(F_n, F) = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)|,$

where $F_n$ and $F$ are two cdf's. There are many other distances that one can use, but the KS distance is the most common, perhaps for its simplicity. Now that we have a measure of approximation quality, we can do the following computer exercise:

(1) For a fixed $n$, generate $S$ (say $S = 1000000$) random $n$-samples $\{(X_{1,s}, \dots, X_{n,s}) : s = 1, \dots, S\}$ from a population with mean $\mu$ and variance $\sigma^2$.
(2) Compute $\sqrt{n}(\bar X_{n,1} - \mu)/\sigma, \dots, \sqrt{n}(\bar X_{n,S} - \mu)/\sigma$.
(3) Estimate the cdf of $\sqrt{n}(\bar X_n - \mu)/\sigma$ by the empirical cdf
$\hat F_{\sqrt{n}(\bar X_n - \mu)/\sigma}(x) = S^{-1} \sum_{s=1}^S 1\{\sqrt{n}(\bar X_{n,s} - \mu)/\sigma \le x\}.$
(4) Estimate the KS distance between $\hat F_{\sqrt{n}(\bar X_n - \mu)/\sigma}$ and the $N(0,1)$ cdf, using points on a grid in $\mathbb{R}$.
(5) Change $n$ and repeat to see how large $n$ needs to be for the KS distance to be smaller than a given tolerance. Different population distributions will require different $n$. Try $\mathrm{Bern}(p)$ for different $p$. Try $t$-distributions with different degrees of freedom, and then try other familiar distributions. You will get a sense about the applicability of the central limit theorem.
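A compact version of this exercise, assuming a $\mathrm{Bern}(0.1)$ population (my choice of example) and numpy, might look like:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def ks_to_normal(n, p=0.1, S=100_000):
    # Steps (1)-(4): simulate S samples of size n from Bern(p), form the
    # standardized means, and measure the KS distance between their
    # empirical cdf and the N(0, 1) cdf on a grid of points.
    mu, sigma = p, math.sqrt(p * (1 - p))
    x = rng.binomial(1, p, size=(S, n))
    stats = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    grid = np.linspace(-4.0, 4.0, 801)
    emp = np.searchsorted(np.sort(stats), grid, side="right") / S
    phi = np.array([0.5 * (1 + math.erf(g / math.sqrt(2))) for g in grid])
    return np.abs(emp - phi).max()

# Step (5): increase n and watch the KS distance shrink.
ks5, ks200 = ks_to_normal(5), ks_to_normal(200)
print(ks5, ks200)
```

With a skewed population like $\mathrm{Bern}(0.1)$, $n = 5$ is visibly far from normal while $n = 200$ is much closer; rerunning with other populations shows how strongly the required $n$ depends on the underlying distribution.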
Convergence in Distribution
The CLT is a special case of a sequence of random variables converging in distribution to a random variable.
Definition 1. A sequence of random variables or vectors $\{Y_n\}_{n=1}^{\infty}$ converges in distribution to a random variable (or vector) $Y$, written $Y_n \to_d Y$, if

(2)  $F_{Y_n}(y) \to F_Y(y)$ as $n \to \infty$, for all $y$ at which $F_Y$ is continuous.

Remark. When $Y_n \to_d Y$, we say $Y_n$ converges in distribution (or in law) to $Y$.
(4) The concept of convergence in distribution involves the distributions of the random variables only, not the random variables themselves. e.g. suppose the CLT conditions hold; then $Y_n := \sqrt{n}(\bar X_n - \mu)/\sigma \to_d Z$ for $Z \sim N(0,1)$, even though $Y_n$ need not be close to $Z$ as a random variable.
(5) In the definition above, the convergence of the cdf is only required to hold at the continuity points of the cdf of the limiting random variable. This is important when the limiting random variable is not a continuous random variable. e.g. suppose $Y_n$ has pdf

$f_{Y_n}(y) = \begin{cases} n/2 & y \in (0, 1/n) \\ n/2 & y \in (1, 1 + 1/n) \\ 0 & \text{otherwise.} \end{cases}$

Then $Y_n \to_d Y$ for $Y \sim \mathrm{Bern}(1/2)$, and $F_{Y_n}(y) \to F_Y(y)$ for all $y \in \mathbb{R} \setminus \{0, 1\}$; the convergence fails exactly at the two discontinuity points $y = 0$ and $y = 1$ of $F_Y$.
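To see this example concretely, here is a small sketch of the two cdf's (the function names are mine):

```python
def F_Yn(y, n):
    # cdf of the density that puts mass 1/2 on (0, 1/n) and 1/2 on (1, 1 + 1/n)
    if y <= 0:
        return 0.0
    if y < 1 / n:
        return n * y / 2
    if y <= 1:
        return 0.5
    if y < 1 + 1 / n:
        return 0.5 + n * (y - 1) / 2
    return 1.0

def F_Y(y):
    # cdf of Bern(1/2)
    if y < 0:
        return 0.0
    if y < 1:
        return 0.5
    return 1.0

for y in (-0.2, 0.0, 0.5, 1.0, 1.3):
    print(y, F_Yn(y, n=10**6), F_Y(y))
```

Even with $n = 10^6$, $F_{Y_n}$ disagrees with $F_Y$ only at $y = 0$ and $y = 1$, which is exactly why the definition excludes the discontinuity points of $F_Y$.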
Continuous Mapping Theorem

Theorem 1 (CMT). If $h : \mathbb{R}^m \to \mathbb{R}^p$ is a continuous function, $Y_n, Y$ are $\mathbb{R}^m$-valued random vectors, and $Y_n \to_d Y$, then $h(Y_n) \to_d h(Y)$.

e.g. If $Y_n \to_d Z \sim N(0,1)$, then $Y_n^2 \to_d Z^2 \sim \chi^2(1)$. If $\sqrt{n}(\bar X_n - \mu)/\sigma \to_d Z \sim N(0,1)$, then $\sqrt{n}(\bar X_n - \mu) \to_d \sigma Z \sim N(0, \sigma^2)$. If $(Y_{1,n}, Y_{2,n})' \to_d (Z_1, Z_2)' \sim N(0, I_2)$, then $Y_{1,n}^2 + Y_{2,n}^2 \to_d Z_1^2 + Z_2^2 \sim \chi^2(2)$.
The CMT requires joint convergence (in distribution) of all elements of the vector $Y_n$. The following is NOT true: if $Y_{1,n} \to_d Y_1$ and $Y_{2,n} \to_d Y_2$, then $h(Y_{1,n}, Y_{2,n}) \to_d h(Y_1, Y_2)$. However, the following is true: if $Y_{1,n} \to_d Y_1$ and $Y_{2,n} \to_p r$, then $h(Y_{1,n}, Y_{2,n}) \to_d h(Y_1, r)$, where $r$ is a constant vector. This is due to Lemma 1 below. In Lemma 1, $Y_{2,n} \to_p r$ can be replaced by $Y_{2,n} \to_d r$, because for a constant vector/scalar $r$, $Y_{2,n} \to_p r$ is equivalent to $Y_{2,n} \to_d r$, as we proved in class.
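A simulation makes the failure of marginal convergence visible. In the construction below (mine, not from the notes), $Y_{1,n} = Z_n$ and $Y_{2,n} = -Z_n$ each converge in distribution to $N(0,1)$, yet their sum is identically $0$ rather than $N(0,2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(100_000)

# Each coordinate is exactly N(0, 1), so marginal convergence is trivial...
y1, y2 = z, -z
# ...but the sum is identically zero, not N(0, 2):
bad = y1 + y2
# With an independent second coordinate, the sum does look like N(0, 2):
w = rng.standard_normal(100_000)
good = z + w
print(bad.var(), good.var())
```

The joint distribution of $(Y_{1,n}, Y_{2,n})$, not the two marginals, determines the limit of $h(Y_{1,n}, Y_{2,n})$.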
Lemma 1. Suppose $\{Y_{1,n}\}_{n=1}^{\infty}$ and $\{Y_{2,n}\}_{n=1}^{\infty}$ are two sequences of random vectors/variables, and as $n \to \infty$, $Y_{1,n} \to_d Y_1$ and $Y_{2,n} \to_p r$ for a random variable $Y_1$ and a constant vector/scalar $r$. Then

$Y_n := (Y_{1,n}, Y_{2,n})' \to_d (Y_1, r)' =: Y.$

Proof. (scalar case only) The set of points of continuity of the cdf of $Y$ is $\{(y_1, y_2) \in \mathbb{R}^2 : y_1 \in C(F_{Y_1}), y_2 \ne r\}$, where $C(F_{Y_1})$ is the set of continuity points of the cdf of $Y_1$. Fix such a point $(y_1, y_2)$.

Consider first $y_2 < r$. Then

(3)  $F_{Y_n}(y_1, y_2) = \Pr(Y_{1,n} \le y_1, Y_{2,n} \le y_2) \le \Pr(Y_{2,n} \le y_2) = \Pr(Y_{2,n} - r \le y_2 - r) = \Pr(r - Y_{2,n} \ge r - y_2) \le \Pr(|Y_{2,n} - r| \ge r - y_2) \to 0 = F_Y(y_1, y_2).$

Consider next $y_2 > r$. Then

(4)  $F_{Y_n}(y_1, y_2) = \Pr(Y_{1,n} \le y_1) - \Pr(Y_{1,n} \le y_1, Y_{2,n} > y_2) \ge \Pr(Y_{1,n} \le y_1) - \Pr(Y_{2,n} > y_2) = \Pr(Y_{1,n} \le y_1) - \Pr(Y_{2,n} - r > y_2 - r) \ge \Pr(Y_{1,n} \le y_1) - \Pr(|Y_{2,n} - r| > y_2 - r) \to \Pr(Y_1 \le y_1) - 0 = F_Y(y_1, y_2).$

Since also $F_{Y_n}(y_1, y_2) \le \Pr(Y_{1,n} \le y_1) \to F_{Y_1}(y_1) = F_Y(y_1, y_2)$ in this case, the above two displays imply that

(5)  $F_{Y_n}(y_1, y_2) \to F_Y(y_1, y_2).$

Now that we have shown $F_{Y_n}(y_1, y_2) \to F_Y(y_1, y_2)$ for any $(y_1, y_2)$ with $y_1 \in C(F_{Y_1})$ and $y_2 \ne r$, by the definition of convergence in distribution, $Y_n \to_d Y$.
The vector case of the above lemma can be proved using the Cramér-Wold device, the CMT, and the scalar-case proof above. The Cramér-Wold device is a device to obtain the convergence in distribution of random vectors from that of real random variables. The theorem is given below without proof (the proof is straightforward using mgf's/characteristic functions).

Theorem 2 (Cramér-Wold Device). Suppose $\{Y_n\}_{n=1}^{\infty}$ is a sequence of random $k$-vectors that satisfies $c'Y_n \to_d c'Y$ as $n \to \infty$ for all $c \in \mathbb{R}^k$. Then $Y_n \to_d Y$.
The proof of Lemma 1 for the vector case is left as an exercise.
Example 1. Let $\{X_1, \dots, X_n\}$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 > 0$. The distribution of the following random variable is often of interest: $t_n = \sqrt{n}(\bar X_n - \mu)/S_X$, where $S_X^2 = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar X_n)^2$. Suppose that the mean $\mu$ is a known constant; then $t_n$ is a statistic, and it is often called the t-statistic. What is the limiting distribution of $t_n$?

We have learned from the CLT that $t_n S_X/\sigma = \sqrt{n}(\bar X_n - \mu)/\sigma \to_d Z \sim N(0,1)$. We have shown in previous lectures that $S_X^2 \to_p \sigma^2$. Then by Lemma 1 above, we have

(6)  $(t_n S_X/\sigma, S_X^2)' \to_d (Z, \sigma^2)'.$

Let $h(x, y) = x\sigma/\sqrt{y}$. Then $h$ is continuous at every point with $y > 0$, and

(7)  $t_n = h(t_n S_X/\sigma, S_X^2).$

By the CMT, $t_n \to_d h(Z, \sigma^2) = Z\sigma/\sigma = Z \sim N(0,1)$.
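A quick simulation of this result, with my own choice of population (Exp(1), so the data are far from normal): the t-statistic's sample mean and variance should approach the $N(0,1)$ values $0$ and $1$.

```python
import numpy as np

rng = np.random.default_rng(2)
S, n = 20_000, 200
mu = 1.0                       # mean of the Exp(1) population
x = rng.exponential(1.0, size=(S, n))

xbar = x.mean(axis=1)
s_x = x.std(axis=1, ddof=1)    # S_X, with the (n-1)^{-1} normalization
t = np.sqrt(n) * (xbar - mu) / s_x
print(t.mean(), t.var())       # both close to the N(0,1) values 0 and 1
```

Replacing $\sigma$ by $S_X$ costs nothing asymptotically, exactly as Lemma 1 and the CMT predict.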
Example 2. Consider a random sample $\{X_1, \dots, X_n\}$ drawn from a population distribution with mean $\mu$, variance $\sigma^2$, and finite fourth moment: $E|X|^4 < \infty$. We know that $\sigma^2$ is the probability limit of $S_X^2$. What is the limiting distribution of $\sqrt{n}(S_X^2 - \sigma^2)$?

Using $\sum_{i=1}^n (X_i - \bar X_n)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X_n - \mu)^2$, we can write

$\sqrt{n}(S_X^2 - \sigma^2) = \sqrt{n}\Big[\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 - \sigma^2\Big] = \frac{n}{n-1}\cdot\frac{1}{\sqrt{n}}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] - \frac{\sqrt{n}}{n-1}\big(\sqrt{n}(\bar X_n - \mu)\big)^2 + \frac{\sqrt{n}\,\sigma^2}{n-1}.$

Because $\sqrt{n}/(n-1) \to 0$ and $\sqrt{n}(\bar X_n - \mu) \to_d N(0, \sigma^2)$, by the CMT and Lemma 1,

(8)  $-\frac{\sqrt{n}}{n-1}\big(\sqrt{n}(\bar X_n - \mu)\big)^2 + \frac{\sqrt{n}\,\sigma^2}{n-1} \to_p 0.$

Clearly, $n/(n-1) \to 1$. Let $Y_i = (X_i - \mu)^2$. Then $(Y_1, \dots, Y_n)$ is an i.i.d. sample with mean $\sigma^2$ and variance $E[((X_i - \mu)^2 - \sigma^2)^2] = E(X_i - \mu)^4 - \sigma^4$. By the CLT,

(9)  $\frac{1}{\sqrt{n}}\sum_{i=1}^n [(X_i - \mu)^2 - \sigma^2] \to_d Z_2 \sim N(0, E(X_i - \mu)^4 - \sigma^4).$

Using (8) and (9), Lemma 1 and the CMT, we can conclude that

(10)  $\sqrt{n}(S_X^2 - \sigma^2) \to_d Z_2.$
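The variance of the limit $Z_2$ can be checked numerically. For a Uniform(0,1) population (my choice), $\sigma^2 = 1/12$ and $E(X-\mu)^4 = 1/80$, so the limiting variance is $1/80 - 1/144 = 1/180$:

```python
import numpy as np

rng = np.random.default_rng(4)
S, n = 10_000, 500
sigma2 = 1 / 12                        # variance of Uniform(0, 1)
x = rng.uniform(0.0, 1.0, size=(S, n))

# S draws of sqrt(n) * (S_X^2 - sigma^2), using the (n-1)^{-1} normalization
v = np.sqrt(n) * (x.var(axis=1, ddof=1) - sigma2)
limit_var = 1 / 80 - 1 / 144           # E(X - mu)^4 - sigma^4 = 1/180
print(v.var(), limit_var)
```

The simulated variance of $\sqrt{n}(S_X^2 - \sigma^2)$ matches $E(X-\mu)^4 - \sigma^4$ closely already at $n = 500$.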
Example 3. The Delta Method. Suppose $\hat\theta_n$ is an estimator of $\theta$ and satisfies

(11)  $\sqrt{n}(\hat\theta_n - \theta) \to_d N(0, \Sigma).$

Suppose $g : \mathbb{R}^d \to \mathbb{R}^k$ is continuously differentiable at $\theta$. The Delta method says:

(12)  $\sqrt{n}(g(\hat\theta_n) - g(\theta)) \to_d N(0, G\Sigma G'),$

where $G = \partial g(\theta)/\partial\theta'$.

Proving that the Delta method works is a simple application of Lemma 1 and the CMT. Because $g$ is continuously differentiable, we can do a mean-value expansion of $g(\hat\theta_n)$ around $\theta$:

(13)  $g(\hat\theta_n) = g(\theta) + \frac{\partial g(\tilde\theta_n)}{\partial\theta'}(\hat\theta_n - \theta),$

where $\tilde\theta_n$ lies on the line segment connecting $\hat\theta_n$ and $\theta$. Notice that $\sqrt{n}(g(\hat\theta_n) - g(\theta))$ is a product of two components: $\partial g(\tilde\theta_n)/\partial\theta'$ and $[\sqrt{n}(\hat\theta_n - \theta)]$. We already know that the second converges in distribution to $N(0, \Sigma)$. We need to show the limiting distribution (or probability limit) of the first component. The convergence (11) implies that

$(\hat\theta_n - \theta) = \frac{1}{\sqrt{n}}[\sqrt{n}(\hat\theta_n - \theta)] \to_d 0 \cdot N(0, \Sigma) = 0.$

Or equivalently:

(14)  $\hat\theta_n - \theta \to_p 0.$

Because $\tilde\theta_n$ lies in between $\hat\theta_n$ and $\theta$, it follows that

(15)  $\tilde\theta_n \to_p \theta.$

This and the Slutsky Theorem (you can use the CMT here, too) imply that

(16)  $\frac{\partial g(\tilde\theta_n)}{\partial\theta'} \to_p \frac{\partial g(\theta)}{\partial\theta'} = G.$

By (11) and (16) and Lemma 1, the convergences in the two equations hold jointly. Then by (13) and the CMT,

(17)  $\sqrt{n}(g(\hat\theta_n) - g(\theta)) = \frac{\partial g(\tilde\theta_n)}{\partial\theta'}[\sqrt{n}(\hat\theta_n - \theta)] \to_d G \cdot N(0, \Sigma) = N(0, G\Sigma G').$
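The sketch below checks the Delta method for the illustrative map $g(\theta) = \theta^2$ (my choice), with $X_i \sim N(2, 1)$, $\hat\theta_n = \bar X_n$: here $G = 2\theta = 4$ and $\Sigma = \sigma^2 = 1$, so $\sqrt{n}(\bar X_n^2 - \theta^2)$ should be approximately $N(0, 16)$.

```python
import numpy as np

rng = np.random.default_rng(5)
S, n = 20_000, 500
theta, sigma = 2.0, 1.0
x = rng.normal(theta, sigma, size=(S, n))

theta_hat = x.mean(axis=1)
d = np.sqrt(n) * (theta_hat**2 - theta**2)  # sqrt(n)(g(theta_hat) - g(theta))
G = 2 * theta                               # derivative of g at theta
print(d.var(), (G * sigma) ** 2)            # simulated variance vs G^2 * Sigma = 16
```

The simulated variance of the transformed, scaled estimator agrees with the $G\Sigma G'$ formula.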
Definition 2 (Bounded in probability). A sequence of random variables $\{Y_n\}$ is bounded in probability if

$\lim_{B\to\infty}\limsup_{n\to\infty} \Pr(|Y_n| > B) = 0.$

Remark. (2) A sequence of random variables that converges in distribution to a random variable is bounded in probability:

(18)  $\lim_{B\to\infty}\limsup_{n\to\infty} \Pr(|Y_n| > B) = \lim_{B\to\infty}\limsup_{n\to\infty} [\Pr(Y_n > B) + \Pr(Y_n < -B)] = \lim_{B\to\infty} [1 - F_Y(B) + F_Y(-B)] = 0,$

where $\pm B$ are taken to be continuity points of $F_Y$ and the last equality holds by the properties of cdf's. (3) A sequence of random variables that converges in probability to a constant is bounded in probability. (Exercise)
Definition 3 ($O_p$). We say $Y_n = O_p(X_n)$ if and only if $Y_n/X_n$ is bounded in probability as $n \to \infty$.

Remark. $Y_n = O_p(1)$ if and only if $Y_n$ is bounded in probability as $n \to \infty$.
Definition 4 ($o_p$). We say $Y_n = o_p(X_n)$ if and only if $Y_n/X_n \to_p 0$ as $n \to \infty$.

Remark. $Y_n \to_p r$ if and only if $Y_n - r = o_p(1)$, sometimes written as $Y_n = r + o_p(1)$.

Remark. $Y_n = O_p(X_n)$ when $Y_n = o_p(X_n)$.
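To connect these definitions back to the CLT: $\bar X_n - \mu$ is $o_p(1)$ but $O_p(n^{-1/2})$, since $\sqrt{n}(\bar X_n - \mu)$ converges in distribution and is therefore bounded in probability. A numerical sketch (my construction, standard normal population):

```python
import numpy as np

rng = np.random.default_rng(6)
S, n = 2_000, 10_000
x = rng.standard_normal((S, n))   # population with mu = 0, sigma = 1

dev = np.abs(x.mean(axis=1))      # |Xbar_n - mu|, which is o_p(1): tiny for large n
scaled = np.sqrt(n) * dev         # sqrt(n)|Xbar_n - mu|, which is O_p(1)

print(dev.mean(), scaled.mean())  # roughly 0.008 and 0.80 = E|N(0,1)|
```

The unscaled deviations collapse to zero, while the $\sqrt{n}$-scaled deviations settle at a stable, bounded level: exactly the distinction between $o_p(1)$ and $O_p(1)$.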