
3.3 System OLS and GLS

WO, Chapter 7

3.3.1. Motivation: Seemingly Unrelated Regressions (SUR)


Consider G equations that seem to be unrelated, but correlation across the
errors in different equations can be exploited in more efficient estimators:

(3.32)  y_1 = x_1 β_1 + u_1
        y_2 = x_2 β_2 + u_2
        ⋮
        y_G = x_G β_G + u_G

Gain in efficiency: use GLS instead of OLS


Remarks:
In many applications all x_g, g = 1,…,G, include the same set of
variables, but this is not necessarily required.
Each equation represents a generic individual (person, firm, city,
household, …) from the population; each equation g involves
K_g explanatory variables.

Denote a random draw as y_ig = x_ig β_g + u_ig, i = 1,…,N, for all g = 1,…,G.

Asymptotics are based on G fixed and N tending to infinity.

(Small T) Panel data might be considered as a special case: consider t =
1,…,T instead of g = 1,…,G.

Matrix notation:

(3.33)  y_i = X_i β + u_i,  where y_i : G×1, X_i : G×K.

Examples:

SUR: X_i = diag(x_i1, …, x_iG) (block-diagonal), β' = (β_1', β_2', …, β_G'), K = Σ_g K_g

Panel data: X_i = (x_i1', x_i2', …, x_iT')' (stacked rows), X_i is a T×K matrix
=> y_it = x_it β + u_it

3.3.2 Asymptotic Properties of System OLS

Assumption SOLS.1:
(3.34)  E(X_i' u_i) = 0   orthogonality condition

Remarks:
If X_i has a sufficient number of elements equal to unity, then E(u_i) = 0.
Note the multi-equation nature of (3.34) compared to the orthogonality
condition in the single-equation model:
SUR: (3.34) implies
(3.35)  E(x_ig' u_ig) = 0 for all g.
Note, however, that (3.34) does not imply that regressors of equation
g are uncorrelated with errors from equation h, i.e. (3.34) allows
E(x_ig' u_ih) ≠ 0 for g ≠ h.

Panel data: (3.34) implies
(3.36)  E(x_it' u_it) = 0 for all t.
Thus, the orthogonality condition does not require the stronger
assumption
(3.37)  E(x_it' u_is) = 0 for all t, s.

A stronger assumption than (3.34) would be

(3.38)  E(u_i | X_i) = 0   zero conditional mean assumption

Contemporaneous and strict exogeneity in panel data analysis (defined
for the population):
E(u_t | x_t) = 0                  (contemporaneous) exogeneity (see 3.36)
E(u_t | x_1, x_2, …, x_T) = 0     strict exogeneity (see 3.37)

Under strict exogeneity, it follows that
E(y_t | x_1, x_2, …, x_T) = E(y_t | x_t)

Example:

y_t = β_0 + β_1 y_{t−1} + u_t
Note that x_t ≡ (1, y_{t−1}) and u_t = y_t − β_0 − β_1 y_{t−1}.
Then, assuming first-order dynamics in the conditional mean,
E(y_t | x_1, x_2, …, x_t) = E(y_t | x_t) and E(u_t | x_t) = 0 hold, because
E(y_t | x_1, x_2, …, x_t) = E(y_t | y_0, y_1, …, y_{t−1}) = β_0 + β_1 y_{t−1} = E(y_t | x_t) and
E(u_t | x_1, x_2, …, x_t) = E(u_t | y_{t−1}) = E(u_t | x_t) = 0.
This, however, is different from strict exogeneity. Strict exogeneity does not
hold, as E(u_t | x_1, x_2, …, x_{t+1}) = E(u_t | y_0, y_1, …, y_t) = u_t ≠ 0, t = 1,…,T−1.

Consistency of System OLS and Pooled OLS

Assumption SOLS.2:
(3.39)  A ≡ E(X_i' X_i) is non-singular (has rank K)

Then, using E(X_i' u_i) = 0:

(3.40)  β = [E(X_i' X_i)]⁻¹ E(X_i' y_i)

and

(3.41)  β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

is the SOLS estimator; it is unbiased (follows from OLS properties).

Matrix notation:

β̂ = (X'X)⁻¹ X'y,

where X is the matrix of stacked X_i, i.e.

X = (X_1', X_2', …, X_N')'  (NG×K matrix),  and  y = (y_1', y_2', …, y_N')'  (NG×1 vector)
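The equivalence between the summation form (3.41) and OLS on the stacked NG×K system can be checked numerically. A minimal sketch with simulated data (all names and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, G, K = 500, 2, 3                      # N draws of a G-equation system
b_true = np.array([1.0, -0.5, 2.0])

# Simulate X_i (G x K) and y_i = X_i b + u_i for each draw i
X = rng.normal(size=(N, G, K))
u = rng.normal(size=(N, G))
y = X @ b_true + u

# SOLS estimator (3.41): (sum_i X_i'X_i)^{-1} (sum_i X_i'y_i)
A_hat = np.einsum('igk,igl->kl', X, X)   # sum_i X_i' X_i
Xy = np.einsum('igk,ig->k', X, y)        # sum_i X_i' y_i
b_sols = np.linalg.solve(A_hat, Xy)

# Equivalently, OLS on the stacked NG x K regression
b_stacked = np.linalg.lstsq(X.reshape(N * G, K), y.reshape(N * G), rcond=None)[0]
```

Both routes give the same coefficient vector up to floating-point noise.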

Application to SUR:

X_i = diag(x_i1, …, x_iG)  =>

β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

(3.42)  β̂ = [diag(Σ_i x_i1' x_i1, …, Σ_i x_iG' x_iG)]⁻¹ (Σ_i x_i1' y_i1, …, Σ_i x_iG' y_iG)'

Remarks:
If there are no cross-equation restrictions, estimating SUR is equivalent
to estimating OLS equation by equation.
Proof: obvious from equation (3.42), since the block-diagonal structure decouples the equations.
Cross-equation restrictions need to be imposed via X_i: see example.

Application to Panel data:

X_i = (x_i1', x_i2', …, x_iT')'  =>

β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

(3.43)  β̂ = (Σ_{i=1}^N Σ_{t=1}^T x_it' x_it)⁻¹ (Σ_{i=1}^N Σ_{t=1}^T x_it' y_it)

This estimator is called the POLS (pooled ordinary least squares) estimator,
β̂ = β̂_POLS

Theorem (Consistency of SOLS):
Under assumptions SOLS.1 and SOLS.2, plim β̂ = β.
Proof: see consistency of OLS.

Asymptotic distribution:

Theorem (Asymptotic Normality of SOLS):

Under assumptions SOLS.1 and SOLS.2,

(3.44)  √N (β̂ − β) →d N(0, A⁻¹ B A⁻¹),  where

A ≡ plim N⁻¹ Σ_{i=1}^N X_i' X_i  and  B ≡ E(X_i' u_i u_i' X_i) = Var(X_i' u_i).

Proof follows along the lines of the one for OLS:

Write the SOLS estimator as β̂ = β + (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' u_i).

Then  √N (β̂ − β) = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N^{−1/2} Σ_{i=1}^N X_i' u_i).

Inspecting both factors, we first note that

(N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ = A⁻¹ + o_p(1)  (because of consistency),

and by the CLT it follows that

N^{−1/2} Σ_{i=1}^N X_i' u_i →d N(0, B),  where B ≡ E(X_i' u_i u_i' X_i) = Var(X_i' u_i).

As a consequence of Lemma 3.5 (WO, p. 39), we know from the CLT,
respectively convergence in distribution, that

N^{−1/2} Σ_{i=1}^N X_i' u_i = O_p(1).

Hence

√N (β̂ − β) = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i
           = (A⁻¹ + o_p(1)) N^{−1/2} Σ_{i=1}^N X_i' u_i = A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1).

Asymptotic normality of A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1) follows from the
asymptotic equivalence lemma (Lemma 3.7, WO p. 39), saying:

If z_N →d z and x_N − z_N →p 0, then x_N →d z.

Define z_N ≡ A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i  and  x_N = A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1).

The asymptotic variance simplifies because of E(X_i' u_i) = 0:

Avâr √N (β̂ − β) = lim Var(x_N) = lim E(x_N x_N')
  = lim E[(A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1))(A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1))']
  = A⁻¹ E(N⁻¹ Σ_{i=1}^N X_i' u_i u_i' X_i) A⁻¹ = A⁻¹ E(X_i' u_i u_i' X_i) A⁻¹ = A⁻¹ B A⁻¹

q.e.d.

Estimated asymptotic variance of the SOLS estimator:

Avâr(β̂) = Â⁻¹ B̂ Â⁻¹ / N = V̂,

where
(3.45)  Â = N⁻¹ Σ_{i=1}^N X_i' X_i = X'X / N
and
(3.46)  B̂ = N⁻¹ Σ_{i=1}^N X_i' û_i û_i' X_i,  û_i = y_i − X_i β̂

It can be shown that plim B̂ = B (non-trivial proof, see note on p. 152 in
WO). Thus,

(3.47)  V̂ = (Σ_{i=1}^N X_i' X_i)⁻¹ (Σ_{i=1}^N X_i' û_i û_i' X_i) (Σ_{i=1}^N X_i' X_i)⁻¹
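The sandwich formula (3.47) translates directly into a few lines of linear algebra. A minimal sketch on simulated heteroskedastic data (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, G, K = 400, 2, 3
beta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(N, G, K))
# heteroskedastic errors: variance depends on the first regressor
u = rng.normal(size=(N, G)) * (1.0 + np.abs(X[:, :, 0]))
y = X @ beta + u

XtX = np.einsum('igk,igl->kl', X, X)
b_hat = np.linalg.solve(XtX, np.einsum('igk,ig->k', X, y))

# Scores s_i = X_i' u_hat_i, then V_hat = (X'X)^-1 (sum_i s_i s_i') (X'X)^-1
u_hat = y - X @ b_hat
s = np.einsum('igk,ig->ik', X, u_hat)    # N x K matrix of scores
meat = s.T @ s                           # sum_i X_i' u_i u_i' X_i
bread = np.linalg.inv(XtX)
V_hat = bread @ meat @ bread             # robust Avar estimate, eq. (3.47)
se = np.sqrt(np.diag(V_hat))             # robust standard errors
```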

3.3.3. Pooled OLS, Revisited
Consider y_t = x_t β + u_t (skipping index i for convenience)

Replicate assumptions SOLS.1 and SOLS.2:

Assumption POLS.1:
(3.48)  E(x_t' u_t) = 0
Note that (3.48) does not exclude correlation across observations t ≠ s.

Assumption POLS.2:
(3.49)  rank Σ_{t=1}^T E(x_t' x_t) = K

To apply the usual test statistics to POLS, a further assumption is
required.
Assumption POLS.3: homoskedasticity plus no correlation across time
periods:
(3.50)  E(u_t² x_t' x_t) = σ² E(x_t' x_t), t = 1, 2,…,T,  where σ² = E(u_t²)
(3.51)  E(u_t u_s x_t' x_s) = 0, t, s = 1, 2,…,T, t ≠ s

Theorem (Large Sample Properties of Pooled OLS):

Under POLS.1 and POLS.2, the pooled OLS estimator is consistent and
asymptotically normal. If POLS.3 holds in addition, then

Avar(β̂) = σ² [E(X_i' X_i)]⁻¹ / N

and  Avâr(β̂) = σ̂² (X'X)⁻¹ = σ̂² (Σ_{i=1}^N Σ_{t=1}^T x_it' x_it)⁻¹.

Proof:
The first part follows from consistency and asymptotic normality of SOLS.
Further, by POLS.3:

B = E(X_i' u_i u_i' X_i) = E(Σ_{t=1}^T Σ_{s=1}^T u_it u_is x_it' x_is)
  = E(Σ_{t=1}^T u_it² x_it' x_it) = σ² E(Σ_{t=1}^T x_it' x_it) = σ² E(X_i' X_i) = σ² A

Thus, under POLS.3:

(3.52)  Avar(β̂) = A⁻¹ B A⁻¹ / N = σ² A⁻¹ / N

Estimation of σ²:

(3.53)  σ̂² = (NT)⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it²

Without POLS.3:
Heteroskedasticity- and autocorrelation-robust estimate of Avar(β̂):

(3.54)  Avâr(β̂_POLS) = (Σ_i Σ_t x_it' x_it)⁻¹ (Σ_i Σ_t Σ_s û_it û_is x_it' x_is) (Σ_i Σ_t x_it' x_it)⁻¹

Sandwich formula; allows for serial correlation and time-varying
variances in the disturbances.

Note that random sampling is assumed: no correlation across micro
units i. Clustering of standard errors might be necessary (cluster on unit i).
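The sandwich (3.54) is exactly what standard software reports as standard errors clustered on unit i. A sketch under simulated data with a unit effect that induces serial correlation (hypothetical names and values):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 300, 5, 2
beta = np.array([1.0, 0.5])

# Regressors and errors both correlated within unit i
a = rng.normal(size=(N, 1, K))
x = a + 0.5 * rng.normal(size=(N, T, K))
c = rng.normal(size=(N, 1))
u = c + rng.normal(size=(N, T))
y = x @ beta + u

Xs, ys = x.reshape(N * T, K), y.reshape(N * T)
b_pols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
res = y - x @ b_pols                      # N x T residuals

bread = np.linalg.inv(Xs.T @ Xs)
scores = np.einsum('itk,it->ik', x, res)  # X_i' u_i per unit
meat = scores.T @ scores                  # sum_i X_i' u_i u_i' X_i
V_cluster = bread @ meat @ bread          # eq. (3.54)
se_cluster = np.sqrt(np.diag(V_cluster))

# Naive s.e. ignore the within-unit correlation and understate uncertainty
sigma2 = res.reshape(-1).var()
se_naive = np.sqrt(np.diag(sigma2 * bread))
```

With positive within-unit correlation in both x and u, the clustered standard errors exceed the naive ones.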

3.3.4. Feasible GLS (FGLS)
The OLS estimator does not use information on the correlation across the elements of u_i
=> efficiency gain by using the variance-covariance matrix Ω ≡ E(u_i u_i') of u_i, i.e.

Assumptions SGLS.1 and SGLS.2:

(3.55)  E(u_i u_i' | X_i) = E(u_i u_i') = Ω   system homoskedasticity
(3.56)  Ω is a G×G positive definite matrix and E(X_i' Ω⁻¹ X_i) is
nonsingular

Remark: As E(u_i | X_i) = 0, (3.55) is the same as Var(u_i | X_i) = Var(u_i)

Usual motivation of the GLS estimator:

Premultiply the equation y_i = X_i β + u_i by Ω^{−1/2}:
Ω^{−1/2} y_i = (Ω^{−1/2} X_i) β + Ω^{−1/2} u_i, leading to y_i* = X_i* β + u_i*, where
E(u_i* u_i*') = I_G holds (verify!). Then:

(3.56)  β̂_GLS = (Σ_{i=1}^N X_i*' X_i*)⁻¹ Σ_{i=1}^N X_i*' y_i* = (Σ_{i=1}^N X_i' Ω⁻¹ X_i)⁻¹ Σ_{i=1}^N X_i' Ω⁻¹ y_i

Infeasible estimator. Replace Ω by Ω̂: feasible GLS. Estimation of Ω:

(3.57)  Ω̂ = N⁻¹ Σ_{i=1}^N û_i û_i'
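The two-step FGLS recipe (OLS residuals → Ω̂ → GLS) can be sketched for a two-equation SUR system. Simulated data, hypothetical names; not the only way to organize the computation:

```python
import numpy as np

rng = np.random.default_rng(3)
N, G = 2000, 2
b1, b2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])

# Different regressors per equation; errors correlated across equations
x1 = np.column_stack([np.ones(N), rng.normal(size=N)])
x2 = np.column_stack([np.ones(N), rng.normal(size=N)])
L = np.array([[1.0, 0.0], [0.8, 0.6]])   # Cholesky factor of the true Omega
u = rng.normal(size=(N, G)) @ L.T
y1 = x1 @ b1 + u[:, 0]
y2 = x2 @ b2 + u[:, 1]

# Step 1: equation-by-equation OLS residuals -> Omega_hat, eq. (3.57)
r1 = y1 - x1 @ np.linalg.lstsq(x1, y1, rcond=None)[0]
r2 = y2 - x2 @ np.linalg.lstsq(x2, y2, rcond=None)[0]
R = np.column_stack([r1, r2])
Omega_hat = R.T @ R / N

# Step 2: FGLS, eq. (3.56), with block-diagonal X_i = diag(x_i1, x_i2)
Oinv = np.linalg.inv(Omega_hat)
K = x1.shape[1] + x2.shape[1]
XtOX, XtOy = np.zeros((K, K)), np.zeros(K)
for i in range(N):
    Xi = np.zeros((G, K))
    Xi[0, :2], Xi[1, 2:] = x1[i], x2[i]
    yi = np.array([y1[i], y2[i]])
    XtOX += Xi.T @ Oinv @ Xi
    XtOy += Xi.T @ Oinv @ yi
b_fgls = np.linalg.solve(XtOX, XtOy)
```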

Consistency and asymptotic normality of FGLS require a somewhat
stronger assumption than SOLS.1:

Assumption SGLS.3:

(3.57)  E(X_i ⊗ u_i) = 0

Remarks: (3.57) requires that each element of u_i is uncorrelated with each
element of X_i, i.e.

(3.58)  E(x_ig' u_ih) = 0 for all g, h = 1,…,G  in SUR models, and

(3.59)  E(x_it' u_is) = 0 for all t, s = 1,…,T  in panel data models.

Thus, (3.59) would be satisfied if E(x_s' u_t) = 0 for all t, s = 1,…,T
=> a stronger assumption than (weak) exogeneity is required.
Compare to POLS.1: E(x_t' u_t) = 0 (no exclusion of correlation across t ≠ s).

Reason for the stronger assumption: the proof of consistency requires

plim N⁻¹ Σ_{i=1}^N X_i' Ω⁻¹ u_i = 0  (instead of plim N⁻¹ Σ_{i=1}^N X_i' u_i = 0),

such that other equations or other time periods are involved.

Theorem (Asymptotic Normality of FGLS):

Under Assumptions SGLS.1 to SGLS.3:

(3.60)  √N (β̂_FGLS − β) ~a N(0, [E(X_i' Ω⁻¹ X_i)]⁻¹)

A consistent estimator of Avar(β̂) is obtained by using Ω̂ = N⁻¹ Σ_{i=1}^N û_i û_i':

(3.61)  Avâr(β̂) = (Σ_{i=1}^N X_i' Ω̂⁻¹ X_i)⁻¹

Proof: see WO, pp. 153–162.

Remark: (full) matrix notation of FGLS as an alternative to summation over
units i:

(3.62)  β̂_FGLS = [X' (I_N ⊗ Ω̂⁻¹) X]⁻¹ X' (I_N ⊗ Ω̂⁻¹) y

Verify by using Ω^{−1/2} y_i = (Ω^{−1/2} X_i) β + Ω^{−1/2} u_i for all i, and by using
the stacked matrices X = (X_1', X_2', …, X_N')' and y = (y_1', y_2', …, y_N')'.

3.4. Basic Linear Unobserved Effects Panel Data Models
WO Chapter 10

3.4.1. Motivation: The Omitted Variable Bias Reconsidered

Let c be an unobservable random variable; the vector
(y, x_1, x_2, …, x_K, c) represents the population of interest.
c captures unobserved features of an individual such as (cognitive)
ability, motivation, early family upbringing that do not change over
time; firm data: e.g. managerial quality, structure.

Dealing with the problem Cov(x, c) ≠ 0 in cross-sections:
Find a proxy variable for c
Use an IV approach (see Chapter 4)

Panel data allow finding better solutions:

a. Treat c as constant over time:
   take first differences,
   treat c as an individual-specific effect (fixed effect)
b. Treat c as a (random) part of the error term: use FGLS instead of OLS
   (random effect)

Preliminary discussion of solutions: consider the equation of interest

(3.63)  y_t = β_0 + x_t β + c + u_t,  t = 1, 2,
where E(u_t | x_t, c) = 0, t = 1, 2  =>  E(x_t' u_t) = 0, t = 1, 2.

Application of POLS? Consistent only if E(x_t' c) = 0!
Differencing equation (3.63) across the two time periods eliminates c:
(3.64)  Δy = Δx β + Δu.

Consistency of OLS requires

(i) orthogonality, E(Δx' Δu) = 0, i.e.
E[(x_2 − x_1)'(u_2 − u_1)] = E(x_2' u_2) + E(x_1' u_1) − E(x_1' u_2) − E(x_2' u_1) = 0
Strict exogeneity is required! But there is no restriction on the
correlation between x_t and c.
(ii) rank E(Δx' Δx) = K
This fails for the constant term and other variables that are
constant over time, as they are differenced out:
the effect of any variable that is constant over time cannot
be distinguished from c.
Further introductory remarks:

Restriction to balanced panels

Fixed T, large N assumption:
Cross-section asymptotics with N=50 and T=8? Difficult to know,
but more reasonable than T → ∞
N=60, T=55: assumptions about time-series dependence required
N=5, T=40: framework of multiple time series required
3.4.2. Random Effects Models

Consider
(3.65)  y_it = β_0 + x_it β + c_i + u_it,  t = 1,…,T

Random or Fixed Effects?

"Random effect" is synonymous with Cov(x_it, c_i) = 0, t = 1,…,T;
"fixed effect" (firm-specific effect, individual fixed effect, state-specific
effect, …) means that Cov(x_it, c_i) ≠ 0 is allowed for.

Strict exogeneity assumption RE.1:

(3.66) RE.1  a) E(u_it | x_i1, …, x_iT, c_i) = 0,  b) E(c_i | x_i1, …, x_iT) = E(c_i) = 0

Remarks:
It follows from RE.1 that E(x_is' u_it) = 0 for all s, t = 1,…,T.
E(c_i) = 0 would follow without assumption RE.1 b) when an
intercept is included.

Examples illustrating RE.1:

WO 10.1 Program Evaluation: estimating the effect of job training on
subsequent wages

(3.67)  log(wage_it) = θ_t + z_it γ + δ_1 prog_it + c_i + u_it,

where θ_t denotes a time-varying intercept, z_it is a vector of observable
variables, and c_i covers unobserved ability.

Evaluation takes place by comparing productivity (= wages) at some initial
period t=1, when no one participated in the program, to wages at t=2, after
a treatment group has participated in the program, whereas a control
group has not.

Define
prog_i1 = 0 for all i
prog_i2 = 0 for all i of the control group
prog_i2 = 1 for all i of the treatment group

Cov(prog_it, c_i) = 0? Depends on the (self-)selection problem!
High-ability workers might participate with higher probability.
Workers might be assigned based on characteristics unknown
to the econometrician.

Strict exogeneity?
After an adverse wage shock u_t, people might choose to participate in future
training, i.e. Cov(u_t, prog_i,t+1) ≠ 0.
The training program could have lasting effects.

WO 10.2 Distributed Lag Model

(3.68)  patents_it = θ_t + z_it γ + δ_0 RD_it + δ_1 RD_i,t−1 + … + δ_5 RD_i,t−5 + c_i + u_it

Does R&D spending depend on unobserved firm characteristics?
Do shocks to current patents (changes in u_it) affect future expenditures
on R&D?

WO 10.3 Lagged Dependent Variable

(3.69)  log(wage_it) = β_1 log(wage_i,t−1) + c_i + u_it,  t = 1,…,T

We want to study wage persistence (i.e. the size of β_1) after controlling for
unobserved heterogeneity:
Let y_it = log(wage_it), x_it = y_i,t−1; then a standard time-series assumption
would be that E(u_it | y_i,t−1, y_i,t−2, …, y_i0, c_i) = E(u_it | x_it, x_i,t−1, …, x_i1, c_i) = 0.
Thus, u_it is uncorrelated with past x_is, s ≤ t, but u_it cannot be uncorrelated
with future x_is, s > t:
E(y_it u_it) = β_1 E(y_i,t−1 u_it) + E(c_i u_it) + E(u_it²) = E(u_it²) > 0
E(y_i,t+1 u_it) = β_1 E(y_it u_it) + E(c_i u_it) + E(u_i,t+1 u_it) = β_1 E(y_it u_it) = β_1 E(u_it²) ≠ 0

=> The strict exogeneity assumption never holds in unobserved
effects models with lagged dependent variables.
Estimation and Inference under the Basic Random Effects
Assumptions

Consider the error term in the basic model y_it = x_it β + v_it, t = 1,…,T:

(3.70)  v_it ≡ c_i + u_it

Note that the joint dependence on c_i causes serial correlation of v_it.

POLS would lead to unbiased, consistent and asymptotically normal
estimation of β, but statistical inference would need robust estimation
of the variance-covariance matrix (sandwich formula 3.54).
FGLS is a straightforward and more efficient solution that exploits the serial
correlation of v_it.

Note that E(v_it | x_i1, …, x_iT) = 0 satisfies the strict exogeneity assumption
SGLS.3 (3.57), such that we can apply GLS methods that account for the
particular error structure in eq. (3.70).

Analysis of serial correlation and variance:

(3.71)  Var(v_it) = σ_c² + σ_u²,  Cov(v_it, v_is) = σ_c², t ≠ s

Change back to system notation: write y_it = x_it β + v_it, t = 1,…,T, for all T
periods as (see 3.3.1)
(3.72)  y_i = X_i β + v_i,
where
(3.73)  v_i = c_i j_T + u_i,  j_T = (1,…,1)'  (T×1 vector).

Following along the lines of GLS, we define
(3.74)  Ω ≡ E(v_i v_i')
and we assume it to be positive definite (note that it is the same for all i:
random sampling!).

From Var(v_it) = σ_c² + σ_u² and Cov(v_it, v_is) = σ_c², t ≠ s, it follows that

(3.75)  Ω = E(v_i v_i') = σ_u² I_T + σ_c² j_T j_T'
(a T×T matrix with σ_c² + σ_u² on the diagonal and σ_c² off the diagonal).

Matrix notation:
(3.76)  Var(v) = I_N ⊗ Ω,  where v' = (v_1', …, v_N').
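The RE covariance structure (3.75) is easy to build and inspect. A small sketch with hypothetical values for the two variance components:

```python
import numpy as np

T, sigma_c2, sigma_u2 = 4, 2.0, 1.0
jT = np.ones((T, 1))
Omega = sigma_u2 * np.eye(T) + sigma_c2 * (jT @ jT.T)   # eq. (3.75)

# Diagonal: sigma_c^2 + sigma_u^2; off-diagonal: sigma_c^2
assert np.allclose(np.diag(Omega), sigma_c2 + sigma_u2)
assert np.isclose(Omega[0, 1], sigma_c2)

# Implied serial correlation of the composite error (cf. eq. 3.82 below)
rho = sigma_c2 / (sigma_c2 + sigma_u2)
assert np.isclose(Omega[0, 1] / Omega[0, 0], rho)
```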

Assuming the slightly modified GLS rank condition RE.2 (with plim Ω̂ instead of Ω),

(3.77)  rank E[X_i' (plim Ω̂)⁻¹ X_i] = K,

and provided that we have consistent estimates of σ_c² and σ_u² at our disposal
(see below; this implies a consistent estimate Ω̂), we eventually obtain the random
effects estimator:

(3.78)  β̂_RE = (Σ_{i=1}^N X_i' Ω̂⁻¹ X_i)⁻¹ Σ_{i=1}^N X_i' Ω̂⁻¹ y_i.

Theorem (Consistency of the Random Effects Estimator):
Under RE.1 and RE.2, β̂_RE is a √N-asymptotically normal and consistent
estimator of β.

Proof: follows from consistency of FGLS.

For efficiency of β̂_RE, we rely on general conditional variances and
conditional covariances instead of the unconditional statement Ω = E(v_i v_i'):
Assumption RE.3:
(3.79)  a) E(u_i u_i' | x_i1, …, x_iT, c_i) = σ_u² I_T,  b) E(c_i² | x_i1, …, x_iT) = σ_c²
=> E(u_it²) = σ_u², t = 1,…,T,  Cov(v_it, v_is) = σ_c², t ≠ s,
Var(c_i) = σ_c², and hence Ω = E(v_i v_i')
(by an iterated expectations argument)
Theorem (Efficiency of the Random Effects Estimator):
Under RE.1, RE.2 and RE.3, β̂_RE is consistent, √N-asymptotically normal
and asymptotically efficient in the class of estimators under
E(v_i | x_i1, …, x_iT) = 0.

Proof: follows from the fact that β̂_RE is asymptotically equivalent to GLS
under RE.1–RE.3 (WO, p. 260).

Estimation of Ω:
Inspection of σ_v² = σ_c² + σ_u² suggests:
i. estimation of σ_v²,
ii. estimation of σ_c²  =>  estimation of σ_u²

First step: consistent estimation of σ_v²

Consider the basic panel regression equations y_it = x_it β + v_it, t = 1,…,T:

E(Σ_{t=1}^T v_it²) = Σ_{t=1}^T E(v_it²) = T σ_v²  =>  σ_v² = T⁻¹ Σ_{t=1}^T E(v_it²)

Hence, a consistent estimator of σ_v² requires consistent estimation of v_it.

Use the residuals of (consistent) pooled OLS, i.e. v̂_it = y_it − x_it β̂_POLS.

A consistent estimator is given by

(3.80)  σ̂_v² = [NT − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T v̂_it²

Second step: find a consistent estimator of σ_c²
Recall that σ_c² = E(v_it v_is), t ≠ s
=> T(T−1)/2 pairs s > t, s, t = 1, 2,…,T, all being equal to σ_c²
=> E(Σ_{t=1}^{T−1} Σ_{s=t+1}^T v_it v_is) = Σ_{t=1}^{T−1} Σ_{s=t+1}^T E(v_it v_is) = σ_c² T(T−1)/2

Solve for σ_c², replace the population moment by the sample moment, use
consistent residuals of POLS, and make a small-sample degrees-of-freedom
correction:

(3.81)  σ̂_c² = [NT(T−1)/2 − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is

The summation in (3.81) might lead to a negative value: an indication of negative
correlation in u => conclusion: RE.3 might be violated.
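The two-step variance-component procedure can be sketched end-to-end: POLS residuals, then (3.80), (3.81), and finally the RE estimator (3.78). Simulated data with σ_c² = 2 and σ_u² = 1; all names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 500, 4, 2
beta = np.array([1.0, -2.0])

x = rng.normal(size=(N, T, K))
c = np.sqrt(2.0) * rng.normal(size=(N, 1))   # sigma_c^2 = 2
y = x @ beta + c + rng.normal(size=(N, T))   # sigma_u^2 = 1

# Pooled OLS and its residuals v_hat
b_pols = np.linalg.lstsq(x.reshape(N * T, K), y.reshape(-1), rcond=None)[0]
v = y - x @ b_pols                           # N x T residual matrix

# Variance components, eqs. (3.80) and (3.81)
sigma_v2 = (v ** 2).sum() / (N * T - K)
pairs = sum(v[:, t] * v[:, s] for t in range(T) for s in range(t + 1, T))
sigma_c2 = pairs.sum() / (N * T * (T - 1) / 2 - K)
sigma_u2 = sigma_v2 - sigma_c2

# Omega_hat (3.75) and the RE estimator (3.78)
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
XtOy = np.einsum('itk,ts,is->k', x, Oinv, y)
b_re = np.linalg.solve(XtOX, XtOy)
```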
The correlation between the composite errors v_it and v_is does not depend on the
difference (t−s), and is also a useful measure of the relative importance of the
unobserved effect c_i:
(3.82)  Corr(v_it, v_is) = σ_c² / (σ_c² + σ_u²)

Robust Estimation of Avar(β̂_RE):

a) Conduct statistical inference without assumption RE.3 (β̂_RE is still a
consistent estimator):
use the robust variance estimator V̂ under RE.1 and RE.2 (see WO, p. 160).

Reasons:
E(v_i v_i' | x_i1, …, x_iT) is not constant, i.e. E(v_i v_i' | x_i1, …, x_iT) ≠ E(v_i v_i')
E(v_i v_i') does not have the random effect structure Ω = E(v_i v_i') based on
σ_c² and σ_u²

b) General FGLS Analysis

Applies when the idiosyncratic errors u_it, t = 1, 2,…,T, are generally
heteroskedastic and serially correlated.

Keep the assumption E(v_i v_i' | x_i1, …, x_iT) = Ω, but:
estimate Ω without restricting the random effect structure.

Ω has T(T+1)/2 estimated elements (recall that the random effect
structure requires only two parameters).
With very large N, general FGLS is an attractive alternative if Ω̂ appears
to have a pattern different from the random effects pattern.
Estimation:

(3.83)  Ω̂ = N⁻¹ Σ_{i=1}^N v̂_i v̂_i',  where the v̂_i are POLS residuals

Consistent estimator, and with E(v_i v_i' | x_i1, …, x_iT) = Ω just as efficient
as RE under assumption RE.3

Testing for the presence of an Unobserved Effect
H_0: σ_c² = 0  <=>  Cov(v_it, v_is) = 0, t ≠ s

Derive a test statistic based on asymptotic normality of

(3.84)  N^{−1/2} Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is →d N(0, E[(Σ_{t=1}^{T−1} Σ_{s=t+1}^T v_it v_is)²]),

which holds under H_0: σ_c² = 0, i.e. when the v_it are serially uncorrelated (WO, p.
264 and Problem 7.4).

Calculate the test statistic, which is distributed asymptotically as standard
normal:

(3.85)  [Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is] / [Σ_{i=1}^N (Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is)²]^{1/2}

Caution: the statistic detects many kinds of serial correlation;
rejection does not necessarily imply that the random effect structure is
true.
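Statistic (3.85) is a few lines of numpy given the N×T matrix of residuals. A sketch under simulated data that contains a genuine c_i (hypothetical names), so the test should clearly reject:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 400, 4
# composite errors v_it = c_i + u_it, with sigma_c^2 = sigma_u^2 = 1
v = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))

# Per-unit sum over the T(T-1)/2 products v_it v_is, s > t
iu = np.triu_indices(T, k=1)
prod_i = np.einsum('it,is->its', v, v)[:, iu[0], iu[1]].sum(axis=1)

z = prod_i.sum() / np.sqrt((prod_i ** 2).sum())   # eq. (3.85), ~ N(0,1) under H0
reject = abs(z) > 1.96
```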

Fixed Effect Methods

Key difference to RE models: c_i may be correlated with the regressors x_it;
write the linear unobserved effects model y_it = x_it β + c_i + u_it, t = 1,…,T, as
(3.86)  y_i = X_i β + c_i j_T + u_i  (T×1 vectors)

Strict exogeneity assumption FE.1:

(3.87) FE.1  a) E(u_it | x_i1, …, x_iT, c_i) = 0

(Thus, RE.1 b) is not maintained, i.e. E(c_i | x_i1, …, x_iT) is allowed to be any
function of x_i1, …, x_iT.)
Flexibility comes at a price: x_i1, …, x_iT cannot include time-constant factors.

However, we can estimate the time-varying impact of time-constant variables:
Consider
(3.88)  y_it = θ_1 + θ_2 d2_t + … + θ_T dT_t + z_i γ_1 + d2_t z_i γ_2 + … + dT_t z_i γ_T + w_it δ + c_i + u_it
for t = 1,…,T under FE.1, where
d2,…,dT denote time-period dummies,
z_i is a vector of time-constant observable variables,
w_it is a vector of time-varying variables.
Identification problem: θ_1 + z_i γ_1 cannot be distinguished from c_i;
however, γ_2,…,γ_T are identified: we can test whether the impact of
time-constant variables (such as gender bias) has changed over time!
Estimation by way of the within transformation:

Average y_it = x_it β + c_i + u_it over t = 1,…,T and get ȳ_i = x̄_i β + c_i + ū_i.

Subtracting ȳ_i = x̄_i β + c_i + ū_i from y_it = x_it β + c_i + u_it gives the transformed
equation
(3.89)  (y_it − ȳ_i) = (x_it − x̄_i) β + (u_it − ū_i)
or
(3.90)  ÿ_it = ẍ_it β + ü_it

=> elimination of the individual-specific effect
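The within transformation is two lines of numpy, and comparing it with POLS on data where c_i is correlated with a regressor shows why it matters. A sketch with simulated data (hypothetical names; the correlation is built in deliberately):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 400, 5, 2
beta = np.array([1.5, -0.7])

x = rng.normal(size=(N, T, K))
c = rng.normal(size=(N, 1))
x[:, :, 0] += c                          # regressor correlated with c_i
y = x @ beta + c + rng.normal(size=(N, T))

# Within transformation (3.89): subtract unit means, then pooled OLS (3.92)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = np.linalg.lstsq(xd.reshape(N * T, K), yd.reshape(-1), rcond=None)[0]

# POLS on the untransformed data suffers omitted-variable bias here
b_pols = np.linalg.lstsq(x.reshape(N * T, K), y.reshape(-1), rcond=None)[0]
```

The FE estimate recovers β; the POLS coefficient on the contaminated regressor is biased upward by roughly Cov(x, c)/Var(x).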

The fixed effects (FE) estimator, or within estimator, is defined as
the POLS estimator applied to eq. (3.90).

Consistent estimation by application of POLS to (3.90)? Yes, because
E(ẍ_it' ü_it) = 0 (immediately follows from FE.1 and E(x_it' u_it) = 0).

Moreover, the FE estimator is unbiased, because
E(ü_it | x_i1, …, x_iT) = 0 (which follows from the strict exogeneity
assumption FE.1, E(ü_it | x_i) = E(u_it | x_i) − E(ū_i | x_i) = 0, and from x̄_i
being a function of x_i = (x_i1, …, x_iT)).

The between estimator is defined as the OLS estimator applied to the cross-
section equation ȳ_i = x̄_i β + c_i + ū_i (note the loss of efficiency).
Vector notation and time-demeaning

Proposition: ÿ_it = ẍ_it β + ü_it follows from y_i = X_i β + c_i j_T + u_i by application
of the Frisch-Waugh theorem.

Proof: All we need is a tool that produces deviations from the time mean as,
for example, the auxiliary regression y_it = α_i + ÿ_it, t = 1,…,T, implying that
α̂_i = ȳ_i. => The residual maker matrix Q_T ≡ I_T − j_T (j_T' j_T)⁻¹ j_T' does the job:
Q_T j_T = 0,  Q_T y_i = ÿ_i,  Q_T X_i = Ẍ_i,  Q_T u_i = ü_i.
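The claimed properties of Q_T are easy to verify numerically (a quick sketch):

```python
import numpy as np

T = 5
jT = np.ones((T, 1))
QT = np.eye(T) - jT @ np.linalg.inv(jT.T @ jT) @ jT.T   # residual maker Q_T

rng = np.random.default_rng(7)
yi = rng.normal(size=T)

assert np.allclose(QT @ jT, 0)                # Q_T j_T = 0
assert np.allclose(QT @ yi, yi - yi.mean())   # Q_T demeans over time
assert np.allclose(QT @ QT, QT)               # idempotent: Q_T Q_T = Q_T
assert np.allclose(QT, QT.T)                  # symmetric
```

Idempotence and symmetry are what make Ẍ_i' ü_i = X_i' Q_T u_i in the variance derivation below.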

Assumption FE.2:
(3.91)  rank E(Ẍ_i' Ẍ_i) = rank Σ_{t=1}^T E(ẍ_it' ẍ_it) = K   (FE.2)

Remark:
The rank would be less than K if any element of ẍ_it were identically zero for
all i, i.e. observations need to vary over time.

Fixed effect (within) estimator:

(3.92)  β̂_FE = (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ Σ_{i=1}^N Ẍ_i' ÿ_i = (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹ Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ÿ_it
Properties of the between estimator:
Inconsistent under FE.1, because E(x̄_i' c_i) is not necessarily zero.
Consistency of the between estimator requires assumption RE.1 b).

Asymptotic Inference with Fixed Effects

Assumption FE.3 assures that FE is efficient (follows from efficiency
of POLS under the Gauss-Markov assumptions):
(3.93)  E(u_i u_i' | x_i, c_i) = σ_u² I_T   (FE.3)
Implications:
Var(u_i | x_i, c_i) = σ_u² I_T,  E(u_i u_i' | x_i, c_i) = E(u_i u_i'),  Var(u_i) = E(u_i u_i') = σ_u² I_T
The variance of ü_it, t = 1,…,T, is homoskedastic. Indeed:

Var(ü_it) = E(ü_it²) = E[(u_it − ū_i)²] = E(u_it²) + E(ū_i²) − 2 E(u_it ū_i)

(3.94)  = σ_u² + σ_u²/T − 2σ_u²/T = σ_u² (T−1)/T

However, time demeaning leads to negatively serially correlated
ü_it, t = 1,…,T:

E(ü_it ü_is) = E[(u_it − ū_i)(u_is − ū_i)] = E(ū_i²) − E(ū_i u_it) − E(ū_i u_is)

(3.95)  = σ_u²/T − 2σ_u²/T = −σ_u²/T < 0

Nevertheless, the negatively correlated ü_it do not affect efficiency of FE,
because the asymptotics use a CLT based on u_it, not on ü_it:
The asymptotic variance of the FE estimator follows from

√N (β̂_FE − β) = (N⁻¹ Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ N^{−1/2} Σ_{i=1}^N Ẍ_i' ü_i = (N⁻¹ Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ N^{−1/2} Σ_{i=1}^N Ẍ_i' u_i

(note that Ẍ_i' ü_i = X_i' Q_T Q_T u_i = X_i' Q_T u_i = Ẍ_i' u_i).

Moreover, under FE.3, E(u_i u_i' | Ẍ_i) = σ_u² I_T. Then, from system OLS it
follows that

(3.96)  √N (β̂_FE − β) ~a N(0, σ_u² [E(Ẍ_i' Ẍ_i)]⁻¹)
=> Avar(β̂_FE) = σ_u² [E(Ẍ_i' Ẍ_i)]⁻¹ / N
and

(3.97)  Avâr(β̂_FE) = σ̂_u² (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹

Estimation of σ_u²:

Note that (3.97) requires σ_u², i.e. the variance of u_it, not of the errors ü_it from the
transformed equation ÿ_it = ẍ_it β + ü_it!

Solution strategy: express σ_u² in terms of ü_it:

Use (3.94), i.e. Var(ü_it) = E(ü_it²) = σ_u² (T−1)/T  =>  E(Σ_{t=1}^T ü_it²) = σ_u² (T−1)
Hence, using sample moments and correcting for degrees of freedom, we
obtain σ̂_u² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it².

We avoid overly complicated notation, but recall that the definition might be
ambiguous:

(3.98)  σ̂_u² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it²,

where the fixed effects residuals are defined as

(3.99)  û_it = ÿ_it − ẍ_it β̂_FE,  t = 1,…,T; i = 1,…,N

=> σ̂_u² is an unbiased estimator of σ_u²
Word of caution:
The denominator is N(T−1) − K, not NT − K as for standard regressions or
for σ̂_v² in the RE model!
The (downward) bias can be substantial when T is small!
Using a standard computer package and running OLS after demeaning
requires a correction of the standard errors.
Example: N=500, T=3, K=10 => correction factor = 1.227
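The quoted correction factor is the square root of the ratio of the two degrees-of-freedom denominators:

```python
import math

# OLS on demeaned data divides by NT - K, but the correct denominator
# for the FE variance estimate (3.98) is N(T-1) - K.
N, T, K = 500, 3, 10
factor = math.sqrt((N * T - K) / (N * (T - 1) - K))
print(round(factor, 3))  # 1.227
```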

A measure of the relative importance σ_c²/(σ_c² + σ_u²) of the fixed-effect
component c_i:

Given β̂_FE, σ̂_v² = [NT − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T (y_it − x_it β̂_FE)² is a consistent estimate of
σ_v², such that σ̂_c² = σ̂_v² − σ̂_u² is a consistent estimate of σ_c².

Testing multiple restrictions: F-test

(3.100)  F = [(SSR_r − SSR_ur)/Q] / [SSR_ur / (N(T−1) − K)] ~a F(Q, N(T−1) − K),

where SSR_ur = Σ_{i=1}^N Σ_{t=1}^T û_it² with unrestricted residuals û_it = ÿ_it − ẍ_it β̂_FE,
and SSR_r = sum of squared restricted residuals from a similar regression, but
with Q restrictions imposed on β.

The Dummy Variable Approach to Fixed-Effect Modelling
(LSDV)
View the c_i as parameters to be estimated along with β:
Matrix notation
(3.101)  y = X* β* + u,

where X* = (X, d1, …, dN),  β*' = (β_1, …, β_K, c_1, …, c_N),

and dn_it = 1 if n = i, 0 if n ≠ i,  t = 1,…,T; i = 1,…,N

Consistency, efficiency of LSDV:

Under assumptions FE.1, rank(X*) = K + N, and FE.3, the LSDV
regression y = X* β* + u satisfies the Gauss-Markov assumptions.

Remark: LSDV as defined in (3.101) reproduces the FE estimator β̂_FE of β,
and the residuals from the LSDV approach are identical to the
residuals û of the regression ÿ_it = ẍ_it β + ü_it.

Proof: application of the Frisch-Waugh theorem.

Advantages of the LSDV approach:

y = X* β* + u produces σ̂_u² with the (usual) correct degrees of freedom:
NT − N − K = N(T−1) − K
Under the Gauss-Markov assumptions, y = X* β* + u provides best
linear unbiased estimates of c_1, …, c_N
We can test equality of c_1, …, c_N using an F-test

Treatment of the collinearity of the constant and the fixed effects in software
packages: the overall intercept is either for an arbitrary cross-section unit or,
more commonly, for the average of the c_i across i.

Statistical Inference in the Presence of Serial Correlation:
Application of the Robust Variance Matrix Estimator

The FE estimator is consistent and asymptotically normal under assumptions
FE.1 and FE.2. Note, however, that statistical inference based on
Avâr(β̂_FE) = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹ would be wrong when FE.3 does not hold.

Most severe problem: serial correlation in u_it, t = 1,…,T.

Procedure:

a) Detection of serial correlation (of various kinds):

Run the pooled OLS regression of û_it on û_i,t−1 (where û_it are the FE
residuals, û_it = ÿ_it − ẍ_it β̂_FE); make the t-values robust to serial
correlation (see 3.47 and 3.54).

b) If we find serial correlation, adjust the asymptotic variance estimator
and the test statistics. Applying (3.47), we obtain the robust estimate

(3.102)  Avâr(β̂_FE) = (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ (Σ_{i=1}^N Ẍ_i' û_i û_i' Ẍ_i) (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹

First Differencing Methods
Consider

(3.103)  Δy_it = Δx_it β + Δu_it,  t = 2,…,T

Assumption FD.1:

(3.104)  E(u_it | x_i1, …, x_iT, c_i) = 0  (same as FE.1)

Assumption FD.2:

(3.105)  rank Σ_{t=2}^T E(Δx_it' Δx_it) = K

The first-difference (FD) estimator is the pooled OLS estimator from the
regression of Δy_it on Δx_it, t = 2,…,T; i = 1,…,N.

Discussion of FD under FD.1, FD.2:

Elements of x_it must be time-varying
The constant and the unobserved effect c_i get differenced away
Example: differencing

y_it = θ_1 + θ_2 d2_t + … + θ_T dT_t + z_i γ_1 + d2_t z_i γ_2 + … + dT_t z_i γ_T + w_it δ + c_i + u_it

we obtain

Δy_it = θ_2 Δd2_t + … + θ_T ΔdT_t + (Δd2_t) z_i γ_2 + … + (ΔdT_t) z_i γ_T + Δw_it δ + Δu_it

Discussion of FD under FD.1, FD.2 (continued):

From FD.1 it follows that E(Δx_it' Δu_it) = 0  =>  FD is consistent
From FD.1 it follows that E(Δu_it | Δx_i2, …, Δx_iT) = 0 (strict exogeneity)
=> FD is unbiased conditional on X

Efficiency:

Consider the FD model in matrix notation Δy = ΔX β + e, where e = Δu.
Efficiency follows from applying the Gauss-Markov theorem under
assumptions FD.1, FD.2 and the additional assumption FD.3:

(3.106)  E(e_i e_i' | x_i1, …, x_iT, c_i) = σ_e² I_{T−1}

Inference:

Under FD.1–FD.3 the usual OLS standard errors from the first-
difference regression are asymptotically valid.

A consistent estimator of σ_e² is obtained from (note that the denominator is
the usual one from OLS standard errors, due to skipping the first period):

(3.107)  σ̂_e² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=2}^T (Δy_it − Δx_it β̂_FD)²

Robust estimate (without FD.3):

(3.108)  Avâr(β̂_FD) = (ΔX'ΔX)⁻¹ (Σ_{i=1}^N ΔX_i' ê_i ê_i' ΔX_i) (ΔX'ΔX)⁻¹

Practical hint: avoid producing differences such as y_{i+1,1} − y_{i,T} when
calculating Δy and Δx from stacked cross sections; the differences for
observations 1, T+1, 2T+1, …, (N−1)T+1 should be set to missing.

Application: Policy Analysis Using First Differences - The Difference-in-
Differences (DiD) Estimator

Recall Program Evaluation: estimating the effect of job training on
subsequent wages
log(wage_it) = θ_t + δ_1 prog_it + c_i + u_it,  t = 1, 2
We defined
prog_i1 = 0 for all i
prog_i2 = 0 for all i of the control group
prog_i2 = 1 for all i of the treatment group

From the first difference

(3.109)  Δlog(wage_i2) = θ_2 + δ_1 Δprog_i2 + Δu_i2,

where Δprog_i2 = prog_i2, we obtain the DiD estimator δ̂_1. It can be written as

(3.110)  δ̂_1 = [avg. Δlog(wage)]_treatment − [avg. Δlog(wage)]_control

(see WO, Problem 10.4)

In general, more than two periods and further explanatory variables are
included (see the chapter on evaluation methods).
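Formula (3.110) can be illustrated with a simulated two-period program evaluation in which high-ability workers select into treatment, so a naive cross-section comparison is biased while DiD is not (hypothetical names and values):

```python
import numpy as np

rng = np.random.default_rng(9)
N, delta1 = 1000, 0.10                 # true program effect: +10% wages
treated = rng.random(N) < 0.5
c = rng.normal(size=N)                 # worker ability
c[treated] += 0.5                      # selection on ability into treatment

lw1 = 1.0 + c + 0.1 * rng.normal(size=N)                     # t = 1, no one treated
lw2 = 1.2 + delta1 * treated + c + 0.1 * rng.normal(size=N)  # t = 2

# DiD (3.110): difference in average wage growth between the groups
d = lw2 - lw1
did = d[treated].mean() - d[~treated].mean()

# Naive cross-section comparison at t = 2 picks up the ability gap too
naive = lw2[treated].mean() - lw2[~treated].mean()
```

Differencing removes c_i, so `did` is close to 0.10 while `naive` also absorbs the 0.5 ability gap.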

Comparison of estimators

Fixed Effects versus First Differencing

Common feature: inconsistency in case of standard endogeneity
problems, including measurement error, time-varying omitted
variables, and simultaneity (see WO, Chapter 11, for details)

T=2:
FE and FD produce identical estimates (WO Problem 10.3)
FD is easier to implement because all procedures that can be
applied to single cross sections can be applied directly

T>2:
The choice hinges upon the assumptions about the idiosyncratic errors u_it:
FE is more efficient when the u_it are serially uncorrelated
FD is more efficient when u_it follows a random walk
Recall from assumption FD.3:
E(e_it e_is | x_i, c_i) = 0  =>  e_it = Δu_it is serially uncorrelated
=> u_it = u_i,t−1 + e_it follows a random walk
Under FD.1–FD.3, the usual OLS standard errors
from the first-difference regression are asymptotically valid,
while under FE.1–FE.3 the estimated OLS standard errors based
on the (within) FE regression need to be corrected (note, however,
that this is not the case for the LSDV approach)
Formal test by using a Hausman test

Fixed Effects versus Random Effects

Consider quasi-time demeaning:

(3.111)  y_it − λȳ_i = (x_it − λx̄_i) β + (v_it − λv̄_i)

Estimating (3.111) by POLS, we obtain the RE estimator when (WO, p. 286/287)

(3.112)  λ = 1 − [1 / (1 + T σ_c²/σ_u²)]^{1/2}

Thus, λ → 1 when T → ∞ or σ_c²/σ_u² → ∞
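That POLS on the quasi-demeaned data (3.111) reproduces GLS exactly can be verified numerically: with T = 4 and σ_c²/σ_u² = 2, formula (3.112) gives λ = 1 − (1/9)^{1/2} = 2/3. A sketch (hypothetical names, true variance components assumed known):

```python
import numpy as np

rng = np.random.default_rng(10)
N, T, K = 200, 4, 2
sigma_c2, sigma_u2 = 2.0, 1.0
x = rng.normal(size=(N, T, K))
y = x @ np.array([1.0, -1.0]) + np.sqrt(sigma_c2) * rng.normal(size=(N, 1)) \
    + np.sqrt(sigma_u2) * rng.normal(size=(N, T))

lam = 1.0 - np.sqrt(1.0 / (1.0 + T * sigma_c2 / sigma_u2))   # eq. (3.112)

# POLS on quasi-demeaned data (3.111)
xq = (x - lam * x.mean(axis=1, keepdims=True)).reshape(N * T, K)
yq = (y - lam * y.mean(axis=1, keepdims=True)).reshape(-1)
b_quasi = np.linalg.lstsq(xq, yq, rcond=None)[0]

# Direct GLS (3.78) with Omega = sigma_u2 I + sigma_c2 J
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
XtOy = np.einsum('itk,ts,is->k', x, Oinv, y)
b_gls = np.linalg.solve(XtOX, XtOy)

assert np.allclose(b_quasi, b_gls)   # quasi-demeaning reproduces GLS exactly
```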

We conclude from λ = 1 − [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be
close to fixed effects

for large T
if the estimated variance of the c_i is large relative to the estimated
variance of the u_it
=> ρ = σ_c²/(σ_c² + σ_u²) → 1

We conclude from λ = 1 − [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be
close to pooled OLS

if σ_c²/σ_u² → 0, i.e. ρ = σ_c²/(σ_c² + σ_u²) → 0

The Hausman Test Comparing the RE and FE Estimators

The test points at the crucial difference between both estimators: potential
correlation between c_i and x_it, i.e. validity of Assumption RE.1 b).

Idea of the test: consider the difference β̂_FE − β̂_RE under the assumption that
RE.3 holds true (i.e. existence of the random effect structure Ω = E(v_i v_i')).

Correlation = 0 => RE would be consistent and more efficient than FE
(proof: see WO, p. 289); FE is consistent as well.
Correlation ≠ 0 => RE would be inconsistent, FE consistent.

When (β̂_FE − β̂_RE) ≈ 0, we observe consistency in both cases,
but RE is more efficient => use the efficient RE estimator.

When (β̂_FE − β̂_RE) differs clearly from 0, RE differs from the consistent
alternative and RE must be inconsistent => use the consistent FE
estimator.

Hausman test statistic:

(3.113)  H = (δ̂_FE − δ̂_RE)' [Avâr(δ̂_FE) − Avâr(δ̂_RE)]⁻¹ (δ̂_FE − δ̂_RE) ~a χ²_M,

where δ̂_RE denotes the vector of random effects coefficient estimates without the
coefficients on time-constant variables, δ̂_FE is the vector of corresponding fixed
effects estimates, each being an M×1 vector (M = dim(δ̂_FE)).
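Statistic (3.113) is mechanical once both estimators and their variance matrices are in hand. A sketch on simulated data satisfying RE.1 (hypothetical names; the true Ω and σ_u² are used for the variance matrices to keep the example short):

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, K = 300, 4, 2
sigma_c2, sigma_u2 = 1.0, 1.0
beta = np.array([1.0, 0.5])
x = rng.normal(size=(N, T, K))
y = x @ beta + rng.normal(size=(N, 1)) + rng.normal(size=(N, T))  # RE.1 holds

# FE estimator and Avar (3.97)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
XdXd = np.einsum('itk,itl->kl', xd, xd)
b_fe = np.linalg.solve(XdXd, np.einsum('itk,it->k', xd, yd))
V_fe = sigma_u2 * np.linalg.inv(XdXd)

# RE estimator and Avar (3.61)
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
b_re = np.linalg.solve(XtOX, np.einsum('itk,ts,is->k', x, Oinv, y))
V_re = np.linalg.inv(XtOX)

# Hausman statistic (3.113); asymptotically chi^2 with K df under H0
d = b_fe - b_re
H = d @ np.linalg.solve(V_fe - V_re, d)
```

Because V_fe − V_re is positive definite here, H is guaranteed non-negative; in applied work with estimated variance components it can turn negative, which itself signals a violation of RE.3.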

Proof:
H is based on the Wald statistic

H = (δ̂_FE − δ̂_RE)' [Avâr(δ̂_FE − δ̂_RE)]⁻¹ (δ̂_FE − δ̂_RE) ~a χ²_M

Consider
Var(δ̂_FE − δ̂_RE) = Var(δ̂_FE) + Var(δ̂_RE) − Cov(δ̂_FE, δ̂_RE) − Cov(δ̂_RE, δ̂_FE)

Hausman's essential result is that the covariance of an efficient estimator
with its difference from an inefficient estimator is zero, which implies that

Cov(δ̂_FE − δ̂_RE, δ̂_RE) = Cov(δ̂_FE, δ̂_RE) − Var(δ̂_RE) = 0,  or
Cov(δ̂_FE, δ̂_RE) = Var(δ̂_RE)

Replacing Cov(δ̂_FE, δ̂_RE) = Var(δ̂_RE), we obtain

Var(δ̂_FE − δ̂_RE) = Var(δ̂_FE) + Var(δ̂_RE) − Var(δ̂_RE) − Var(δ̂_RE) = Var(δ̂_FE) − Var(δ̂_RE)

q.e.d.

If primary interest lies in a single parameter estimate δ_k (for example,
because of a single important explanatory policy variable), and the other
variables are control variables, then the Hausman test boils down to a t-test:

(3.114)  t = (δ̂_k,FE − δ̂_k,RE) / [Avâr(δ̂_k,FE) − Avâr(δ̂_k,RE)]^{1/2},

where Avâr(δ̂_k,FE) and Avâr(δ̂_k,RE) are the corresponding main diagonal
elements of Avâr(δ̂_FE) and Avâr(δ̂_RE), respectively.

Remarks:
The test assumes that RE.1–RE.3 hold; rejection does not allow any
conclusion about the true cause of rejection:
the reason for rejection might be that RE.3 is too strong an
assumption.
The Hausman test has no power against the alternative that
Assumption RE.1 is true but Assumption RE.3 is false.

If Assumption RE.3 fails, WO proposes a robust form of the Hausman
statistic based on the F statistic: see WO, p. 290.
