Matrix notation:
(3.33) y_i = X_i β + u_i, where y_i: G x 1, X_i: G x K.
Examples:
SUR: X_i = diag(x_i1, ..., x_iG) (block-diagonal), β' = (β_1', β_2', ..., β_G'), K = Σ_g K_g
Panel data: X_i = (x_i1', x_i2', ..., x_iT')', X_i is a T x K matrix => y_it = x_it β + u_it
3.3.2 Asymptotic Properties of System OLS
Assumption SOLS.1:
(3.34) E(X_i'u_i) = 0 (orthogonality condition)
Remarks:
If X_i has a sufficient number of elements equal to unity, then E(u_i) = 0.
Note the multi-equation nature of (3.34) compared to the orthogonality condition in the single-equation model:
SUR: (3.34) implies
(3.35) E(x_ig'u_ig) = 0 for all g.
Note, however, that (3.34) does not imply that the regressors of equation g are uncorrelated with the errors from equation h, i.e. (3.34) allows that E(x_ig'u_ih) ≠ 0 for g ≠ h.
Panel data: (3.34) implies
(3.36) E(x_it'u_it) = 0 for all t.
Thus, the orthogonality condition does not require the stronger assumption
(3.37) E(x_it'u_is) = 0 for all t, s.
Under strict exogeneity, it follows that
E(y_t | x_1, x_2, ..., x_T) = E(y_t | x_t)
Example:
y_t = β_0 + β_1 y_{t-1} + u_t
Note that x_t ≡ (1, y_{t-1}) and u_t = y_t - β_0 - β_1 y_{t-1}.
Then, assuming first-order dynamics in the conditional mean,
E(y_t | x_1, x_2, ..., x_t) = E(y_t | x_t) and E(u_t | x_t) = 0 hold, because
E(y_t | x_1, x_2, ..., x_t) = E(y_t | y_0, y_1, ..., y_{t-1}) = β_0 + β_1 y_{t-1} = E(y_t | x_t) and
E(u_t | x_1, x_2, ..., x_t) = E(u_t | y_0, y_1, ..., y_{t-1}) = 0.
This, however, is different from strict exogeneity. Strict exogeneity does not hold, as E(u_t | x_1, x_2, ..., x_{t+1}) = E(u_t | y_0, y_1, ..., y_t) = u_t ≠ 0, t = 1, ..., T-1.
Consistency of System OLS and Pooled OLS
Assumption SOLS.2:
(3.39) A ≡ E(X_i'X_i) is non-singular (has rank K)
Then, using E(X_i'u_i) = 0:
(3.40) β = [E(X_i'X_i)]^{-1} E(X_i'y_i)
and
(3.41) β̂ = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'y_i)
Matrix notation:
β̂ = (X'X)^{-1} X'y,
where X = (X_1', X_2', ..., X_N')' (NG x K matrix), and y = (y_1', y_2', ..., y_N')' (NG x 1 vector).
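The stacked formula β̂ = (X'X)^{-1}X'y can be checked numerically. The sketch below simulates a stacked system and computes the system OLS estimate; all data-generating values (dimensions, parameter vector) are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, G, K = 500, 2, 3                    # illustrative: N units, G equations, K parameters

beta = np.array([1.0, -2.0, 0.5])      # illustrative true parameter vector
X = rng.normal(size=(N * G, K))        # stacked NG x K regressor matrix
u = rng.normal(size=N * G)             # errors satisfying the orthogonality condition (3.34)
y = X @ beta + u                       # stacked NG x 1 vector

# system OLS, matrix form of (3.41): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With N = 500 units the estimate should sit close to the true vector, consistent with the theorem below.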
Application to SUR:
X_i = diag(x_i1, ..., x_iG) =>
β̂ = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'y_i)
(3.42) = [Σ_{i=1}^N diag(x_i1'x_i1, ..., x_iG'x_iG)]^{-1} [Σ_{i=1}^N (x_i1'y_i1, x_i2'y_i2, ..., x_iG'y_iG)']
Remarks:
If there are no cross-equation restrictions, estimating the SUR system is equivalent to estimating OLS equation by equation.
Proof: Obvious from equation (3.42)
Cross-equation restrictions need to be considered in X_i: See example
Application to Panel data:
X_i = (x_i1', x_i2', ..., x_iT')' =>
β̂ = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'y_i)
(3.43) = (Σ_{i=1}^N Σ_{t=1}^T x_it'x_it)^{-1} (Σ_{i=1}^N Σ_{t=1}^T x_it'y_it)
This estimator is called the POLS (pooled ordinary least squares) estimator, β̂ = β̂_POLS.
Theorem: Consistency of SOLS
Under assumptions SOLS.1 and SOLS.2, plim β̂ = β.
Proof: See consistency of OLS
Asymptotic distribution:
Proof follows along the lines of the one of OLS:
Write the SOLS estimator as β̂ = β + (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'u_i).
Then √N (β̂ - β) = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i).
(N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} = A^{-1} + o_p(1) (because of consistency),
As a consequence of Lemma 3.5 (WO, p. 39), we know because of the CLT, respectively convergence in distribution, that
N^{-1/2} Σ_{i=1}^N X_i'u_i = O_p(1).
Hence
√N (β̂ - β) = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i)
= (A^{-1} + o_p(1)) (N^{-1/2} Σ_{i=1}^N X_i'u_i) = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i) + o_p(1).
Asymptotic normality of A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i) + o_p(1) follows from the CLT:
Avar √N (β̂ - β) = lim Var(z_N) = lim E(z_N z_N'), where z_N = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i) + o_p(1):
lim E(z_N z_N') = lim E[ A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i + o_p(1)) (N^{-1/2} Σ_{i=1}^N X_i'u_i + o_p(1))' A^{-1} ]
= A^{-1} E(N^{-1} Σ_{i=1}^N X_i'u_i u_i'X_i) A^{-1} = A^{-1} E(X_i'u_i u_i'X_i) A^{-1} = A^{-1} B A^{-1}
q.e.d.
Estimated asymptotic variance of the SOLS estimator:
Avar̂(β̂) = V̂/N = Â^{-1} B̂ Â^{-1} / N,
where
(3.45) Â = N^{-1} Σ_{i=1}^N X_i'X_i = X'X / N
and
(3.46) B̂ = N^{-1} Σ_{i=1}^N X_i'û_i û_i'X_i, û_i = y_i - X_i β̂,
so that
(3.47) Avar̂(β̂) = (Σ_{i=1}^N X_i'X_i)^{-1} (Σ_{i=1}^N X_i'û_i û_i'X_i) (Σ_{i=1}^N X_i'X_i)^{-1}
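The robust variance estimator (3.45)-(3.47) can be sketched unit by unit. The simulation below uses illustrative dimensions and parameter values; the point is the sandwich construction from the per-unit residual blocks.

```python
import numpy as np

rng = np.random.default_rng(1)
N, G, K = 400, 2, 2                         # illustrative dimensions
beta = np.array([1.0, 0.5])                 # illustrative true parameters

Xi = rng.normal(size=(N, G, K))             # the N blocks X_i (each G x K)
ui = rng.normal(size=(N, G))                # errors (may be correlated within a unit)
yi = Xi @ beta + ui

# system OLS over all units
XtX = sum(Xi[i].T @ Xi[i] for i in range(N))
Xty = sum(Xi[i].T @ yi[i] for i in range(N))
beta_hat = np.linalg.solve(XtX, Xty)

# (3.46)-(3.47): middle term built from residuals u_hat_i = y_i - X_i beta_hat
res = yi - Xi @ beta_hat
meat = sum(Xi[i].T @ np.outer(res[i], res[i]) @ Xi[i] for i in range(N))
XtX_inv = np.linalg.inv(XtX)
V_hat = XtX_inv @ meat @ XtX_inv            # robust estimate of Avar(beta_hat), eq. (3.47)
se = np.sqrt(np.diag(V_hat))                # robust standard errors
```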
3.3.3. Pooled OLS, Revisited
Consider y_t = x_t β + u_t (skipping index i for reasons of convenience)
Assumption POLS.2:
(3.49) rank Σ_{t=1}^T E(x_t'x_t) = K
Want to apply the usual test statistics from POLS: a further assumption is required!
Assumption POLS.3: Homoskedasticity plus no correlation across time periods:
(3.50) E(u_t² x_t'x_t) = σ² E(x_t'x_t), t = 1, 2, ..., T, where σ² = E(u_t²)
(3.51) E(u_t u_s x_t'x_s) = 0, t, s = 1, 2, ..., T, t ≠ s
and Avar̂(β̂) = σ̂² (X'X)^{-1} = σ̂² (Σ_{i=1}^N Σ_{t=1}^T x_it'x_it)^{-1}.
Proof:
First part follows from consistency and asymptotic normality of SOLS.
Further, by POLS.3:
B = E(X_i'u_i u_i'X_i) = E(Σ_{t=1}^T Σ_{s=1}^T u_it u_is x_it'x_is)
= Σ_{t=1}^T E(u_it² x_it'x_it) = σ² Σ_{t=1}^T E(x_it'x_it) = σ² E(X_i'X_i) = σ² A,
and σ² is estimated consistently by σ̂² = (NT - K)^{-1} Σ_{i=1}^N Σ_{t=1}^T û_it².
Without POLS.3:
Heteroskedasticity- and autocorrelation-robust estimate of Avar(β̂):
(3.54) Avar̂(β̂_POLS) = (Σ_{i=1}^N Σ_{t=1}^T x_it'x_it)^{-1} (Σ_{i=1}^N Σ_{t=1}^T Σ_{s=1}^T û_it û_is x_it'x_is) (Σ_{i=1}^N Σ_{t=1}^T x_it'x_it)^{-1}
3.3.4. Feasible GLS (FGLS)
The OLS estimator does not use information on correlation across the elements of u_i => efficiency gain by using the variance-covariance matrix Ω ≠ σ² I_G of u_i, i.e.
(3.55) E(u_i u_i' | X_i) = E(u_i u_i') = Ω
(system homoskedasticity)
(3.56) Ω is a G x G positive definite matrix and E(X_i'Ω^{-1}X_i) is nonsingular
Usual motivation of the GLS estimator:
Premultiply the equation y_i = X_i β + u_i by Ω^{-1/2}:
Ω^{-1/2} y_i = (Ω^{-1/2} X_i) β + Ω^{-1/2} u_i, leading to y_i* = X_i* β + u_i*, where E(u_i* u_i*') = I_G holds (verify!). Then:
(3.56) β̂_GLS = (Σ_{i=1}^N X_i*'X_i*)^{-1} (Σ_{i=1}^N X_i*'y_i*) = (Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω^{-1}y_i)
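The equality of the two expressions in the GLS formula (transformed-data OLS versus the direct Ω^{-1} form) can be verified numerically. The sketch assumes a known, illustrative Ω; all dimensions and values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, G, K = 200, 3, 2                            # illustrative dimensions
beta = np.array([0.8, -1.2])                   # illustrative true parameters

A = rng.normal(size=(G, G))
Omega = A @ A.T + G * np.eye(G)                # an arbitrary positive definite Omega

# Omega^{-1} and the symmetric square root Omega^{-1/2} via eigendecomposition
w, P = np.linalg.eigh(Omega)
Omega_inv = P @ np.diag(1 / w) @ P.T
Omega_mhalf = P @ np.diag(w ** -0.5) @ P.T

L = np.linalg.cholesky(Omega)
Xi = rng.normal(size=(N, G, K))
yi = Xi @ beta + (L @ rng.normal(size=(N, G, 1))).squeeze(-1)  # errors with Var = Omega

# GLS via transformed data y* = Omega^{-1/2} y, X* = Omega^{-1/2} X
Xs = Omega_mhalf @ Xi                          # broadcasts over the unit index i
ys = (Omega_mhalf @ yi[..., None]).squeeze(-1)
b_transf = np.linalg.solve(
    sum(Xs[i].T @ Xs[i] for i in range(N)),
    sum(Xs[i].T @ ys[i] for i in range(N)),
)

# GLS via the direct formula (3.56)
b_direct = np.linalg.solve(
    sum(Xi[i].T @ Omega_inv @ Xi[i] for i in range(N)),
    sum(Xi[i].T @ Omega_inv @ yi[i] for i in range(N)),
)
```

Since Ω^{-1/2}'Ω^{-1/2} = Ω^{-1}, both routes give the same estimate up to floating-point error.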
Consistency and asymptotic normality of FGLS requires a somewhat stronger assumption than SOLS.1:
Assumption SGLS.3:
(3.57) E(X_i ⊗ u_i) = 0
(3.58) E(x_ig ⊗ u_i) = 0, g = 1, ..., G in SUR models (i.e. E(x_ig'u_ih) = 0 for all g, h), and
(3.59) E(x_it ⊗ u_i) = 0, t = 1, ..., T in panel data models (i.e. E(x_it'u_is) = 0 for all t, s).
Theorem: Asymptotic Normality of FGLS
(3.61) Avar̂(β̂_FGLS) = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1}
Remark: (Full) matrix notation of FGLS as an alternative to summation over units i:
(3.62) β̂_FGLS = (X'(I_N ⊗ Ω̂^{-1})X)^{-1} X'(I_N ⊗ Ω̂^{-1})y,
where X = (X_1', X_2', ..., X_N')', as well as y = (y_1', y_2', ..., y_N')'.
Panel data allows finding better solutions:
3.4. Basic Linear Unobserved Effects Panel Data Models
WO Chapter 10
3.4.2. Random Effects Models
Consider
(3.65) y_it = β_0 + x_it β + c_i + u_it, t = 1, ..., T
Remarks:
It follows from RE.1 that E(x_is'u_it) = 0 for all s, t = 1, ..., T
E(c_i) = 0 would follow without assumption RE.1 b) when an intercept is included
Evaluation takes place by comparing productivity (= wages) at some initial period t=1, when no one participated in the program, to wages at t=2, after a treatment group has participated in the program, whereas a control group has not.
Define
prog_i1 = 0 for all i
prog_i2 = 0 for all i of the control group
prog_i2 = 1 for all i of the treatment group
Strict exogeneity?
An adverse wage shock u_it might make people choose to participate in future training, i.e. Cov(u_it, prog_{i,t+1}) ≠ 0
The training program could have lasting effects
WO 10.3 Lagged Dependent Variable
Want to study wage persistence (i.e. the size of β_1) after controlling for unobserved heterogeneity:
Let y_it = log(wage_it), x_it = y_{i,t-1}; then a standard time series assumption would be that
E(u_it | y_{i,t-1}, y_{i,t-2}, ..., y_{i0}, c_i) = E(u_it | x_it, x_{i,t-1}, ..., x_i1, c_i) = 0.
Thus, u_it is uncorrelated with past x_is, s ≤ t, but u_it cannot be uncorrelated with future x_is, s > t:
E(y_it u_it) = β_1 E(y_{i,t-1} u_it) + E(c_i u_it) + E(u_it²) = E(u_it²) > 0
E(y_{i,t+1} u_it) = β_1 E(y_it u_it) + E(c_i u_it) + E(u_{i,t+1} u_it) = β_1 E(y_it u_it) = β_1 E(u_it²) ≠ 0
Note that E(v_it | x_i1, ..., x_iT) = 0 satisfies the strict exogeneity assumption SGLS.3 (3.57), such that we can apply GLS methods that account for the particular error structure in eq. (3.70).
Change back to system notation: Write y_it = x_it β + v_it, t = 1, ..., T, for all T periods as (see 3.3.1)
(3.72) y_i = X_i β + v_i,
where
(3.73) v_i = c_i j_T + u_i, j_T = (1, ..., 1)' (T x 1 vector).
Following along the lines of GLS, we define
(3.74) Ω ≡ E(v_i v_i')
and we assume it to be positive definite (note that it is the same for all i: random sampling!).
(3.75) Ω = E(v_i v_i') = σ_u² I_T + σ_c² j_T j_T' (diagonal elements σ_c² + σ_u², off-diagonal elements σ_c²)
Matrix notation:
(3.76) Var(v) = I_N ⊗ Ω (NT x NT), where v' = (v_1', ..., v_N').
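The random effects covariance structure (3.75) can be built directly; the variance components below are illustrative.

```python
import numpy as np

T = 4
sigma_c2, sigma_u2 = 0.5, 1.0          # illustrative variance components
jT = np.ones((T, 1))

# eq. (3.75): Omega = sigma_u^2 I_T + sigma_c^2 j_T j_T'
Omega = sigma_u2 * np.eye(T) + sigma_c2 * (jT @ jT.T)
```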
Assuming the slightly modified (plim Ω̂ instead of Ω) GLS rank condition, RE.2:
(3.77) rank E(X_i'(plim Ω̂)^{-1}X_i) = K,
Theorem: Consistency of the Random Effects Estimator
Under RE.1 and RE.2, β̂_RE is a √N-asymptotically normal and consistent estimator of β.
Estimation of Ω:
Inspection of σ_v² = σ_c² + σ_u² suggests:
i. estimation of σ_v²,
ii. estimation of σ_c² => estimation of σ_u²
First step: Consistent estimation of σ_v²:
σ̂_v² = (NT - K)^{-1} Σ_{i=1}^N Σ_{t=1}^T v̂_it²
Second step: Find a consistent estimator of σ_c²
Recall that σ_c² = E(v_it v_is) for t ≠ s
=> T(T-1)/2 pairs s > t, s, t = 1, 2, ..., T, all being equal to σ_c²
=> Σ_{t=1}^{T-1} Σ_{s=t+1}^T E(v_it v_is) = σ_c² T(T-1)/2
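The two-step variance-component estimation can be sketched on simulated composite errors. For simplicity the true v_it stand in for residuals and no degrees-of-freedom correction is applied (K = 0); all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5000, 4
sigma_c, sigma_u = 0.7, 1.0                      # illustrative: sigma_c^2 = 0.49

c = rng.normal(scale=sigma_c, size=(N, 1))
v = c + rng.normal(scale=sigma_u, size=(N, T))   # composite errors v_it = c_i + u_it

# first step: sigma_v^2 from squared (here: true) residuals
sigma_v2_hat = (v ** 2).sum() / (N * T)

# second step: sigma_c^2 from the T(T-1)/2 cross products E(v_it v_is), s > t
cross = sum(
    (v[:, t] * v[:, s]).sum()
    for t in range(T - 1) for s in range(t + 1, T)
)
sigma_c2_hat = cross / (N * T * (T - 1) / 2)
sigma_u2_hat = sigma_v2_hat - sigma_c2_hat
```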
Robust Estimation of Ω
Reasons:
E(v_i v_i' | x_i1, ..., x_iT) is not constant, i.e. E(v_i v_i' | x_i1, ..., x_iT) ≠ E(v_i v_i')
E(v_i v_i') does not have the random effects structure Ω = σ_u² I_T + σ_c² j_T j_T' based on σ_c² and σ_u²
Ω̂ then has T(T+1)/2 estimated elements (recall that the random effects structure requires only two parameters)
With very large N, general FGLS is an attractive alternative if Ω̂ appears to have a pattern different from the random effects pattern
Estimation:
(3.83) Ω̂ = N^{-1} Σ_{i=1}^N v̂_i v̂_i', where the v̂_i are POLS residuals
Testing for the presence of an Unobserved Effect
H_0: σ_c² = 0 <=> Cov(v_it, v_is) = 0, t ≠ s
Calculate a test statistic that is distributed asymptotically as standard normal:
(3.85) (Σ_{i=1}^N Σ_{t=1}^{T-1} Σ_{s=t+1}^T v̂_it v̂_is) / (Σ_{i=1}^N (Σ_{t=1}^{T-1} Σ_{s=t+1}^T v̂_it v̂_is)²)^{1/2}
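Statistic (3.85) can be sketched on simulated residuals; with a sizable σ_c² it should reject H_0 clearly. Dimensions and variance values are illustrative, and the simulated composite errors stand in for POLS residuals.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 2000, 4                              # illustrative dimensions
c = rng.normal(scale=0.8, size=(N, 1))      # unobserved effect, sigma_c^2 = 0.64
v = c + rng.normal(size=(N, T))             # stand-in for POLS residuals v_hat_it

# eq. (3.85): per-unit sums of cross products, studentized across units
zi = np.array([
    sum(v[i, t] * v[i, s] for t in range(T - 1) for s in range(t + 1, T))
    for i in range(N)
])
stat = zi.sum() / np.sqrt((zi ** 2).sum())  # asymptotically N(0,1) under H0: sigma_c^2 = 0
```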
Fixed Effects Methods
(Thus, RE.1 b) is not maintained, i.e. E(c_i | x_i1, ..., x_iT) is allowed to be any function of x_i1, ..., x_iT)
Flexibility comes at a price: x_i1, ..., x_iT cannot include time-constant factors
Estimation by way of the within transformation:
The fixed effects (FE) estimator or within estimator is defined as the POLS estimator applied to eq. (3.90)
The between estimator is defined as the OLS estimator applied to the cross-section equation ȳ_i = x̄_i β + c_i + ū_i (note the loss of efficiency)
Vector notation and time-demeaning
Proof: All we need is a tool that produces deviations from the time mean as, for example, the auxiliary regression y_it = α_i + ÿ_it, t = 1, ..., T, implying that α̂_i = ȳ_i => The residual maker matrix Q_T ≡ I_T - j_T (j_T'j_T)^{-1} j_T' does the job:
Q_T j_T = 0, Q_T y_i = ÿ_i, Q_T X_i = Ẍ_i, Q_T u_i = ü_i,
where ÿ_it = y_it - ȳ_i etc. denote deviations from unit-specific time means.
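The residual maker Q_T and its properties can be checked directly (the data vector is illustrative):

```python
import numpy as np

T = 5
jT = np.ones((T, 1))
# residual maker matrix Q_T = I_T - j_T (j_T'j_T)^{-1} j_T'  (= I_T - j_T j_T'/T)
QT = np.eye(T) - jT @ np.linalg.inv(jT.T @ jT) @ jT.T

yi = np.array([3.0, 5.0, 4.0, 6.0, 2.0])    # illustrative data for one unit
ydd = QT @ yi                               # time-demeaned data y_it - ybar_i
```

Since Q_T annihilates j_T, it also wipes out c_i j_T, which is exactly why the within transformation removes the fixed effect.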
Assumption FE.2:
(3.91) rank E(Ẍ_i'Ẍ_i) = rank Σ_{t=1}^T E(ẍ_it'ẍ_it) = K (FE.2)
Remark:
The rank would be less than K when any element of ẍ_it is identically zero for all i, i.e. the observations need to vary over time
Properties of the between estimator:
Inconsistent under FE.1, because E(x̄_i'c_i) is not necessarily zero
Consistency of the between estimator requires assumption RE.1 (b)
The variance of ü_it, t = 1, ..., T, is homoskedastic. Indeed:
(3.94) E(ü_it²) = σ_u² + σ_u²/T - 2σ_u²/T = σ_u² (T-1)/T
However, the demeaned errors are negatively correlated for t ≠ s:
(3.95) E(ü_it ü_is) = σ_u²/T - 2σ_u²/T = -σ_u²/T < 0
Nevertheless, the negatively correlated ü_it do not affect the efficiency of FE, because the asymptotics use a CLT based on u_it, not on ü_it:
Asymptotic variance of the FE estimator follows from
√N (β̂_FE - β) = (N^{-1} Σ_{i=1}^N Ẍ_i'Ẍ_i)^{-1} (N^{-1/2} Σ_{i=1}^N Ẍ_i'ü_i) = (N^{-1} Σ_{i=1}^N Ẍ_i'Ẍ_i)^{-1} (N^{-1/2} Σ_{i=1}^N Ẍ_i'u_i)
(3.96) √N (β̂_FE - β) →a N(0, σ_u² [E(Ẍ_i'Ẍ_i)]^{-1})
=> Avar(β̂_FE) = σ_u² [E(Ẍ_i'Ẍ_i)]^{-1} / N
and
(3.97) Avar̂(β̂_FE) = σ̂_u² (Σ_{i=1}^N Ẍ_i'Ẍ_i)^{-1} = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it'ẍ_it)^{-1}
Estimation of σ_u²:
Note that (3.97) requires σ_u², i.e. residuals û_it, not the residuals ü̂_it from the transformed equation ÿ_it = ẍ_it β + ü_it!
Hence, using sample moments, and correcting for degrees of freedom, we obtain
σ̂_u² = [N(T-1) - K]^{-1} Σ_{i=1}^N Σ_{t=1}^T ü̂_it²
Word of caution:
The denominator is N(T-1) - K, not NT - K as for standard regressions or σ̂_v² in the RE model!
The (downward) bias can be substantial when T is small!
Using standard computer packages and applying POLS after demeaning requires a correction of standard errors by the factor [(NT - K)/(N(T-1) - K)]^{1/2}
Example: N=500, T=3, K=10 => correction factor = 1.227
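The example's value is consistent with scaling standard errors by [(NT - K)/(N(T-1) - K)]^{1/2}, the square root of the ratio of the naive to the correct residual-variance denominator:

```python
import numpy as np

N, T, K = 500, 3, 10
# ratio of the naive denominator NT - K to the correct FE denominator N(T-1) - K;
# standard errors scale with the square root of this ratio
factor = np.sqrt((N * T - K) / (N * (T - 1) - K))
```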
Measure of the relative importance σ_c²/(σ_c² + σ_u²) of the fixed-effect component c_i:
Given β̂_FE, σ̂_v² = (NT - K)^{-1} Σ_{i=1}^N Σ_{t=1}^T (y_it - x_it β̂_FE)² is a consistent estimate of σ_v² = σ_c² + σ_u².
The Dummy Variable Approach to Fixed-Effects Modelling (LSDV)
View the c_i as parameters to be estimated along with β:
Matrix notation
(3.101) y = X* β* + u,
Remark: LSDV as defined in (3.101) reproduces the FE estimator β̂_FE of β, and also the residuals from the LSDV approach are identical to the residuals ü̂ of the regression ÿ_it = ẍ_it β + ü_it.
Treatment of the collinearity of the constant and the fixed effects in software packages:
The overall intercept is either the one for an arbitrary cross-section unit or, more commonly, for the average of the c_i across i.
Avar̂(β̂_FE) = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it'ẍ_it)^{-1} would be wrong when FE.3 does not hold.
Procedure:
(3.102) Avar̂(β̂_FE) = (Σ_{i=1}^N Ẍ_i'Ẍ_i)^{-1} (Σ_{i=1}^N Ẍ_i'ü̂_i ü̂_i'Ẍ_i) (Σ_{i=1}^N Ẍ_i'Ẍ_i)^{-1}
First Differencing Methods
Consider
Assumption FD.1
Assumption FD.2
(3.105) rank Σ_{t=2}^T E(Δx_it'Δx_it) = K
The first-difference (FD) estimator is the pooled OLS estimator from the regression of Δy_it on Δx_it, t = 2, ..., T; i = 1, ..., N
Example: Differencing ..., we obtain ...
Discussion of FD under FD.1, FD.2 (continued):
Efficiency:
Inference:
Under FD.1-FD.3 the usual OLS standard errors from the first-difference regression are asymptotically valid.
(3.107) σ̂_e² = [N(T-1) - K]^{-1} Σ_{i=1}^N Σ_{t=2}^T (Δy_it - Δx_it β̂_FD)²
(3.108) Avar̂(β̂_FD) = (ΔX'ΔX)^{-1} (Σ_{i=1}^N ΔX_i'ê_i ê_i'ΔX_i) (ΔX'ΔX)^{-1}
Practical hint: Avoid producing differences such as y_{i+1,1} - y_{i,T} when calculating Δy and Δx from stacked cross sections; the differences for observations 1, T+1, 2T+1, ..., (N-1)T+1 should be set to missing.
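The practical hint can be sketched with numpy on a stacked vector (small illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 3, 4                        # illustrative dimensions
y = rng.normal(size=N * T)         # stacked: unit 1, periods 1..T, then unit 2, ...

dy = np.diff(y, prepend=np.nan)    # naive first differences over the stacked vector
dy[::T] = np.nan                   # rows 1, T+1, 2T+1, ... would be cross-unit
                                   # differences y_{i+1,1} - y_{i,T}: set to missing
```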
From the first-difference ...
In general, more than two periods and further explanatory variables are included (see the chapter on Evaluation Methods)
Comparison of estimators
T=2:
T>2
Hinges upon the assumptions about the idiosyncratic errors u_it:
FE is more efficient when the u_it are serially uncorrelated
FD is more efficient when u_it follows a random walk
Recall from assumption FD.3:
E(e_it e_is | x_i, c_i) = 0 for t ≠ s => e_it = Δu_it is serially uncorrelated
=> u_it = u_{i,t-1} + e_it follows a random walk
Under FD.1-FD.3, the usual OLS standard errors from the first-difference regression are asymptotically valid, while under FE.1-FE.3 the estimated OLS standard errors based on the (within) FE regression need to be corrected (note, however, that this is not the case for the LSDV approach)
Formal test by using a Hausman test
Fixed Effects versus Random Effects
(3.112) λ = 1 - [1/(1 + T σ_c²/σ_u²)]^{1/2}
Thus, λ → 1 when T → ∞ or σ_c²/σ_u² → ∞
We conclude from λ = 1 - [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be close to fixed effects for large T
We conclude from λ = 1 - [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be close to pooled OLS if σ_c²/σ_u² → 0, i.e. ρ = σ_c²/(σ_c² + σ_u²) → 0
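The limiting behavior of λ in (3.112) can be checked directly:

```python
def lam(T, sigma_c2, sigma_u2):
    # eq. (3.112): quasi-demeaning parameter of the RE estimator
    return 1 - (1 / (1 + T * sigma_c2 / sigma_u2)) ** 0.5

# lam -> 1 (RE close to FE) for large T or large sigma_c2/sigma_u2;
# lam -> 0 (RE close to pooled OLS) as sigma_c2/sigma_u2 -> 0
```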
The Hausman Test: Comparing the RE and FE estimators
When (β̂_FE - β̂_RE) ≈ 0, then we observe consistency in both cases, but RE is more efficient => use of the efficient RE estimator
When (β̂_FE - β̂_RE) differs significantly from 0, then RE differs from the consistent alternative and RE must be inconsistent => use of the consistent FE estimator
(3.113) H = (β̂_FE - β̂_RE)' [Avar̂(β̂_FE) - Avar̂(β̂_RE)]^{-1} (β̂_FE - β̂_RE) →a χ²_M,
Proof:
H is based on the Wald statistic
H = (β̂_FE - β̂_RE)' [Avar̂(β̂_FE - β̂_RE)]^{-1} (β̂_FE - β̂_RE) →a χ²_M
Consider
Var(β̂_FE - β̂_RE) = Var(β̂_FE) + Var(β̂_RE) - Cov(β̂_FE, β̂_RE) - Cov(β̂_RE, β̂_FE)
Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero, which implies that
Cov(β̂_FE - β̂_RE, β̂_RE) = Cov(β̂_FE, β̂_RE) - Var(β̂_RE) = 0, or
Cov(β̂_FE, β̂_RE) = Var(β̂_RE)
Replacing Cov(β̂_FE, β̂_RE) = Var(β̂_RE), we obtain
Var(β̂_FE - β̂_RE) = Var(β̂_FE) + Var(β̂_RE) - Var(β̂_RE) - Var(β̂_RE) = Var(β̂_FE) - Var(β̂_RE)
q.e.d.
(3.114) t = (β̂_FE,k - β̂_RE,k) / [Avar̂(β̂_FE,k) - Avar̂(β̂_RE,k)]^{1/2},
where Avar̂(β̂_FE,k) and Avar̂(β̂_RE,k) are the corresponding main diagonal elements of Avar̂(β̂_FE) and Avar̂(β̂_RE), respectively.
Remarks:
The test assumes that RE.1-RE.3 hold; rejection does not allow any conclusion about the true causes of rejection
A reason for rejection might be that RE.3 is too strong an assumption
The Hausman test has no power against the alternative that Assumption RE.1 is true but Assumption RE.3 is false.
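A minimal sketch of the Hausman statistic (3.113); the estimates and covariance matrices below are hypothetical numbers invented for illustration, not taken from the text.

```python
import numpy as np

# hypothetical estimates and variance matrices for M = 2 coefficients
b_fe = np.array([1.10, -0.52])
b_re = np.array([1.02, -0.48])
avar_fe = np.array([[0.010, 0.001],
                    [0.001, 0.008]])
avar_re = np.array([[0.006, 0.000],
                    [0.000, 0.005]])

d = b_fe - b_re
# eq. (3.113): H is asymptotically chi^2 with M degrees of freedom under H0
H = d @ np.linalg.solve(avar_fe - avar_re, d)
# compare H with the chi^2_2 critical value (5.99 at the 5% level)
```

With these hypothetical numbers H stays below the 5% critical value, so the RE estimator would not be rejected.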