
3.3 System OLS and GLS

WO, Chapter 7

3.3.1. Motivation: Seemingly Unrelated Regressions (SUR)


Consider G equations that seem to be unrelated, but correlation across the
errors in different equations can be exploited in more efficient estimators:

(3.32)  y_1 = x_1 β_1 + u_1
        y_2 = x_2 β_2 + u_2
        ⋮
        y_G = x_G β_G + u_G

Gain in efficiency: use GLS instead of OLS


Remarks:
In many applications all x_g, g = 1,…,G, include the same set of
variables, but this is not necessarily required.
Each equation represents a generic individual (person, firm, city,
household, …) from the population; each equation g involves
K_g explanatory variables.

Denote a random draw as y_ig = x_ig β_g + u_ig, i = 1,…,N, for all g = 1,…,G.

Asymptotics are based on G fixed and N tending to infinity.

(Small T) Panel data might be considered as a special case: consider t =
1,…,T instead of g = 1,…,G.

Matrix notation:

(3.33)  y_i = X_i β + u_i,  where y_i : G×1, X_i : G×K.

Examples:

SUR: X_i = diag(x_i1, …, x_iG) (block-diagonal), β' = (β_1', β_2', …, β_G'), K = Σ_g K_g

Panel data: X_i = (x_i1', x_i2', …, x_iT')' (stacked rows), X_i is a T×K matrix
=> y_it = x_it β + u_it

3.3.2 Asymptotic Properties of System OLS

Assumption SOLS.1:
(3.34)  E(X_i' u_i) = 0   orthogonality condition

Remarks:
If X_i has a sufficient number of elements equal to unity, then E(u_i) = 0.
Note the multi-equation nature of (3.34) compared to the orthogonality
condition in the single-equation model:
SUR: (3.34) implies
(3.35)  E(x_ig' u_ig) = 0 for all g.
Note, however, that (3.34) does not imply that regressors of equation
g are uncorrelated with errors from equation h, i.e. (3.34) allows
E(x_ig' u_ih) ≠ 0 for g ≠ h.

Panel data: (3.34) implies
(3.36)  E(x_it' u_it) = 0 for all t.
Thus, the orthogonality condition does not require the stronger
assumption
(3.37)  E(x_it' u_is) = 0 for all t, s.

A stronger assumption than (3.34) would be

(3.38)  E(u_i | X_i) = 0   zero conditional mean assumption

Contemporaneous and strict exogeneity in panel data analysis (defined
for the population):
E(u_t | x_t) = 0                  (contemporaneous) exogeneity (see 3.36)
E(u_t | x_1, x_2, …, x_T) = 0     strict exogeneity (see 3.37)

Under strict exogeneity, it follows that
E(y_t | x_1, x_2, …, x_T) = E(y_t | x_t)

Example:

y_t = β_0 + β_1 y_{t−1} + u_t
Note that x_t ≡ (1, y_{t−1}) and u_t = y_t − β_0 − β_1 y_{t−1}.
Then, assuming first-order dynamics in the conditional mean,
E(y_t | x_1, x_2, …, x_t) = E(y_t | x_t) and E(u_t | x_t) = 0 hold, because
E(y_t | x_1, x_2, …, x_t) = E(y_t | y_0, y_1, …, y_{t−1}) = β_0 + β_1 y_{t−1} = E(y_t | x_t) and
E(u_t | x_1, x_2, …, x_t) = E(u_t | y_{t−1}) = E(u_t | x_t) = 0.
This, however, is different from strict exogeneity. Strict exogeneity does not
hold, as E(u_t | x_1, x_2, …, x_{t+1}) = E(u_t | y_0, y_1, …, y_t) = u_t ≠ 0, t = 1,…,T−1.

Consistency of System OLS and Pooled OLS

Assumption SOLS.2:
(3.39)  A ≡ E(X_i' X_i) is non-singular (has rank K)

Then, using E(X_i' u_i) = 0:

(3.40)  β = [E(X_i' X_i)]⁻¹ E(X_i' y_i)

and

(3.41)  β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

is the SOLS estimator; it is unbiased (follows from OLS properties).

Matrix notation:

β̂ = (X'X)⁻¹ X'y,

where X is the matrix of stacked X_i, i.e.

X = (X_1', X_2', …, X_N')'  (NG×K matrix),  and  y = (y_1', y_2', …, y_N')'  (NG×1 vector)
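The equivalence between the summation form (3.41) and OLS on the stacked NG×K system can be checked numerically. A minimal sketch with simulated data (all names and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, G, K = 500, 2, 3                      # N draws of a G-equation system
b_true = np.array([1.0, -0.5, 2.0])

# Simulate X_i (G x K) and y_i = X_i b + u_i for each draw i
X = rng.normal(size=(N, G, K))
u = rng.normal(size=(N, G))
y = X @ b_true + u

# SOLS estimator (3.41): (sum_i X_i'X_i)^{-1} (sum_i X_i'y_i)
A_hat = np.einsum('igk,igl->kl', X, X)   # sum_i X_i' X_i
Xy = np.einsum('igk,ig->k', X, y)        # sum_i X_i' y_i
b_sols = np.linalg.solve(A_hat, Xy)

# Equivalently, OLS on the stacked NG x K regression
b_stacked = np.linalg.lstsq(X.reshape(N * G, K), y.reshape(N * G), rcond=None)[0]
```

Both routes give the same coefficient vector up to floating-point noise.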

Application to SUR:

X_i = diag(x_i1, …, x_iG)  =>

β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

(3.42)  β̂ = [diag(Σ_i x_i1' x_i1, …, Σ_i x_iG' x_iG)]⁻¹ (Σ_i x_i1' y_i1, …, Σ_i x_iG' y_iG)'

Remarks:
If there are no cross-equation restrictions, estimating SUR is equivalent
to estimating OLS equation by equation.
Proof: obvious from equation (3.42), since the block-diagonal structure decouples the equations.
Cross-equation restrictions need to be imposed via X_i: see example.

Application to Panel data:

X_i = (x_i1', x_i2', …, x_iT')'  =>

β̂ = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' y_i)

(3.43)  β̂ = (Σ_{i=1}^N Σ_{t=1}^T x_it' x_it)⁻¹ (Σ_{i=1}^N Σ_{t=1}^T x_it' y_it)

This estimator is called the POLS (pooled ordinary least squares) estimator,
β̂ = β̂_POLS

Theorem (Consistency of SOLS):
Under assumptions SOLS.1 and SOLS.2, plim β̂ = β.
Proof: see consistency of OLS.

Asymptotic distribution:

Theorem (Asymptotic Normality of SOLS):

Under assumptions SOLS.1 and SOLS.2,

(3.44)  √N (β̂ − β) →d N(0, A⁻¹ B A⁻¹),  where

A ≡ plim N⁻¹ Σ_{i=1}^N X_i' X_i  and  B ≡ E(X_i' u_i u_i' X_i) = Var(X_i' u_i).

Proof follows along the lines of the one for OLS:

Write the SOLS estimator as β̂ = β + (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N⁻¹ Σ_{i=1}^N X_i' u_i).

Then  √N (β̂ − β) = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ (N^{−1/2} Σ_{i=1}^N X_i' u_i).

Inspecting both factors, we first note that

(N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ = A⁻¹ + o_p(1)  (because of consistency),

and by the CLT it follows that

N^{−1/2} Σ_{i=1}^N X_i' u_i →d N(0, B),  where B ≡ E(X_i' u_i u_i' X_i) = Var(X_i' u_i).

As a consequence of Lemma 3.5 (WO, p. 39), we know from the CLT,
respectively convergence in distribution, that

N^{−1/2} Σ_{i=1}^N X_i' u_i = O_p(1).

Hence

√N (β̂ − β) = (N⁻¹ Σ_{i=1}^N X_i' X_i)⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i
           = (A⁻¹ + o_p(1)) N^{−1/2} Σ_{i=1}^N X_i' u_i = A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1).

Asymptotic normality of A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1) follows from the
asymptotic equivalence lemma (Lemma 3.7, WO p. 39), saying:

If z_N →d z and x_N − z_N →p 0, then x_N →d z.

Define z_N ≡ A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i  and  x_N = A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1).

The asymptotic variance simplifies because of E(X_i' u_i) = 0:

Avâr √N (β̂ − β) = lim Var(x_N) = lim E(x_N x_N')
  = lim E[(A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1))(A⁻¹ N^{−1/2} Σ_{i=1}^N X_i' u_i + o_p(1))']
  = A⁻¹ E(N⁻¹ Σ_{i=1}^N X_i' u_i u_i' X_i) A⁻¹ = A⁻¹ E(X_i' u_i u_i' X_i) A⁻¹ = A⁻¹ B A⁻¹

q.e.d.

Estimated asymptotic variance of the SOLS estimator:

Avâr(β̂) = Â⁻¹ B̂ Â⁻¹ / N = V̂,

where
(3.45)  Â = N⁻¹ Σ_{i=1}^N X_i' X_i = X'X / N
and
(3.46)  B̂ = N⁻¹ Σ_{i=1}^N X_i' û_i û_i' X_i,  û_i = y_i − X_i β̂

It can be shown that plim B̂ = B (non-trivial proof, see note on p. 152 in
WO). Thus,

(3.47)  V̂ = (Σ_{i=1}^N X_i' X_i)⁻¹ (Σ_{i=1}^N X_i' û_i û_i' X_i) (Σ_{i=1}^N X_i' X_i)⁻¹
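The sandwich formula (3.47) translates directly into a few lines of linear algebra. A minimal sketch on simulated heteroskedastic data (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, G, K = 400, 2, 3
beta = np.array([1.0, -0.5, 2.0])
X = rng.normal(size=(N, G, K))
# heteroskedastic errors: variance depends on the first regressor
u = rng.normal(size=(N, G)) * (1.0 + np.abs(X[:, :, 0]))
y = X @ beta + u

XtX = np.einsum('igk,igl->kl', X, X)
b_hat = np.linalg.solve(XtX, np.einsum('igk,ig->k', X, y))

# Scores s_i = X_i' u_hat_i, then V_hat = (X'X)^-1 (sum_i s_i s_i') (X'X)^-1
u_hat = y - X @ b_hat
s = np.einsum('igk,ig->ik', X, u_hat)    # N x K matrix of scores
meat = s.T @ s                           # sum_i X_i' u_i u_i' X_i
bread = np.linalg.inv(XtX)
V_hat = bread @ meat @ bread             # robust Avar estimate, eq. (3.47)
se = np.sqrt(np.diag(V_hat))             # robust standard errors
```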

3.3.3. Pooled OLS, Revisited
Consider y_t = x_t β + u_t (skipping index i for convenience)

Replicate assumptions SOLS.1 and SOLS.2:

Assumption POLS.1:
(3.48)  E(x_t' u_t) = 0
Note that (3.48) does not exclude correlation across observations t ≠ s.

Assumption POLS.2:
(3.49)  rank Σ_{t=1}^T E(x_t' x_t) = K

To apply the usual test statistics to POLS, a further assumption is
required.
Assumption POLS.3: homoskedasticity plus no correlation across time
periods:
(3.50)  E(u_t² x_t' x_t) = σ² E(x_t' x_t), t = 1, 2,…,T,  where σ² = E(u_t²)
(3.51)  E(u_t u_s x_t' x_s) = 0, t, s = 1, 2,…,T, t ≠ s

Theorem (Large Sample Properties of Pooled OLS):

Under POLS.1 and POLS.2, the pooled OLS estimator is consistent and
asymptotically normal. If POLS.3 holds in addition, then

Avar(β̂) = σ² [E(X_i' X_i)]⁻¹ / N

and  Avâr(β̂) = σ̂² (X'X)⁻¹ = σ̂² (Σ_{i=1}^N Σ_{t=1}^T x_it' x_it)⁻¹.

Proof:
The first part follows from consistency and asymptotic normality of SOLS.
Further, by POLS.3:

B = E(X_i' u_i u_i' X_i) = E(Σ_{t=1}^T Σ_{s=1}^T u_it u_is x_it' x_is)
  = E(Σ_{t=1}^T u_it² x_it' x_it) = σ² E(Σ_{t=1}^T x_it' x_it) = σ² E(X_i' X_i) = σ² A

Thus, under POLS.3:

(3.52)  Avar(β̂) = A⁻¹ B A⁻¹ / N = σ² A⁻¹ / N

Estimation of σ²:

(3.53)  σ̂² = (NT)⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it²

Without POLS.3:
Heteroskedasticity- and autocorrelation-robust estimate of Avar(β̂):

(3.54)  Avâr(β̂_POLS) = (Σ_i Σ_t x_it' x_it)⁻¹ (Σ_i Σ_t Σ_s û_it û_is x_it' x_is) (Σ_i Σ_t x_it' x_it)⁻¹

Sandwich formula; allows for serial correlation and time-varying
variances in the disturbances.

Note that random sampling is assumed: no correlation across micro
units i. Clustering of standard errors might be necessary (cluster on unit i).
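The sandwich (3.54) is exactly what standard software reports as standard errors clustered on unit i. A sketch under simulated data with a unit effect that induces serial correlation (hypothetical names and values):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 300, 5, 2
beta = np.array([1.0, 0.5])

# Regressors and errors both correlated within unit i
a = rng.normal(size=(N, 1, K))
x = a + 0.5 * rng.normal(size=(N, T, K))
c = rng.normal(size=(N, 1))
u = c + rng.normal(size=(N, T))
y = x @ beta + u

Xs, ys = x.reshape(N * T, K), y.reshape(N * T)
b_pols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
res = y - x @ b_pols                      # N x T residuals

bread = np.linalg.inv(Xs.T @ Xs)
scores = np.einsum('itk,it->ik', x, res)  # X_i' u_i per unit
meat = scores.T @ scores                  # sum_i X_i' u_i u_i' X_i
V_cluster = bread @ meat @ bread          # eq. (3.54)
se_cluster = np.sqrt(np.diag(V_cluster))

# Naive s.e. ignore the within-unit correlation and understate uncertainty
sigma2 = res.reshape(-1).var()
se_naive = np.sqrt(np.diag(sigma2 * bread))
```

With positive within-unit correlation in both x and u, the clustered standard errors exceed the naive ones.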

3.3.4. Feasible GLS (FGLS)
The OLS estimator does not use information on the correlation across the elements of u_i
=> efficiency gain by using the variance-covariance matrix Ω ≡ E(u_i u_i') of u_i, i.e.

Assumptions SGLS.1 and SGLS.2:

(3.55)  E(u_i u_i' | X_i) = E(u_i u_i') = Ω   system homoskedasticity
(3.56)  Ω is a G×G positive definite matrix and E(X_i' Ω⁻¹ X_i) is
nonsingular

Remark: As E(u_i | X_i) = 0, (3.55) is the same as Var(u_i | X_i) = Var(u_i)

Usual motivation of the GLS estimator:

Premultiply the equation y_i = X_i β + u_i by Ω^{−1/2}:
Ω^{−1/2} y_i = (Ω^{−1/2} X_i) β + Ω^{−1/2} u_i, leading to y_i* = X_i* β + u_i*, where
E(u_i* u_i*') = I_G holds (verify!). Then:

(3.56)  β̂_GLS = (Σ_{i=1}^N X_i*' X_i*)⁻¹ Σ_{i=1}^N X_i*' y_i* = (Σ_{i=1}^N X_i' Ω⁻¹ X_i)⁻¹ Σ_{i=1}^N X_i' Ω⁻¹ y_i

Infeasible estimator. Replace Ω by Ω̂: feasible GLS. Estimation of Ω:

(3.57)  Ω̂ = N⁻¹ Σ_{i=1}^N û_i û_i'
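The two-step FGLS recipe (OLS residuals → Ω̂ → GLS) can be sketched for a two-equation SUR system. Simulated data, hypothetical names; not the only way to organize the computation:

```python
import numpy as np

rng = np.random.default_rng(3)
N, G = 2000, 2
b1, b2 = np.array([1.0, 2.0]), np.array([-1.0, 0.5])

# Different regressors per equation; errors correlated across equations
x1 = np.column_stack([np.ones(N), rng.normal(size=N)])
x2 = np.column_stack([np.ones(N), rng.normal(size=N)])
L = np.array([[1.0, 0.0], [0.8, 0.6]])   # Cholesky factor of the true Omega
u = rng.normal(size=(N, G)) @ L.T
y1 = x1 @ b1 + u[:, 0]
y2 = x2 @ b2 + u[:, 1]

# Step 1: equation-by-equation OLS residuals -> Omega_hat, eq. (3.57)
r1 = y1 - x1 @ np.linalg.lstsq(x1, y1, rcond=None)[0]
r2 = y2 - x2 @ np.linalg.lstsq(x2, y2, rcond=None)[0]
R = np.column_stack([r1, r2])
Omega_hat = R.T @ R / N

# Step 2: FGLS, eq. (3.56), with block-diagonal X_i = diag(x_i1, x_i2)
Oinv = np.linalg.inv(Omega_hat)
K = x1.shape[1] + x2.shape[1]
XtOX, XtOy = np.zeros((K, K)), np.zeros(K)
for i in range(N):
    Xi = np.zeros((G, K))
    Xi[0, :2], Xi[1, 2:] = x1[i], x2[i]
    yi = np.array([y1[i], y2[i]])
    XtOX += Xi.T @ Oinv @ Xi
    XtOy += Xi.T @ Oinv @ yi
b_fgls = np.linalg.solve(XtOX, XtOy)
```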

Consistency and asymptotic normality of FGLS require a somewhat
stronger assumption than SOLS.1:

Assumption SGLS.3:

(3.57)  E(X_i ⊗ u_i) = 0

Remarks: (3.57) requires that each element of u_i is uncorrelated with each
element of X_i, i.e.

(3.58)  E(x_ig' u_ih) = 0 for all g, h = 1,…,G  in SUR models, and

(3.59)  E(x_it' u_is) = 0 for all t, s = 1,…,T  in panel data models.

Thus, (3.59) would be satisfied if E(x_s' u_t) = 0 for all t, s = 1,…,T
=> a stronger assumption than (weak) exogeneity is required.
Compare to POLS.1: E(x_t' u_t) = 0 (no exclusion of correlation across t ≠ s).

Reason for the stronger assumption: the proof of consistency requires

plim N⁻¹ Σ_{i=1}^N X_i' Ω⁻¹ u_i = 0  (instead of plim N⁻¹ Σ_{i=1}^N X_i' u_i = 0),

such that other equations or other time periods are involved.

Theorem (Asymptotic Normality of FGLS):

Under Assumptions SGLS.1 to SGLS.3:

(3.60)  √N (β̂_FGLS − β) ~a N(0, [E(X_i' Ω⁻¹ X_i)]⁻¹)

A consistent estimator of Avar(β̂) is obtained by using Ω̂ = N⁻¹ Σ_{i=1}^N û_i û_i':

(3.61)  Avâr(β̂) = (Σ_{i=1}^N X_i' Ω̂⁻¹ X_i)⁻¹

Proof: see WO, pp. 153–162.

Remark: (full) matrix notation of FGLS as an alternative to summation over
units i:

(3.62)  β̂_FGLS = [X' (I_N ⊗ Ω̂⁻¹) X]⁻¹ X' (I_N ⊗ Ω̂⁻¹) y

Verify by using Ω^{−1/2} y_i = (Ω^{−1/2} X_i) β + Ω^{−1/2} u_i for all i, and by using
the stacked matrices X = (X_1', X_2', …, X_N')' and y = (y_1', y_2', …, y_N')'.

3.4. Basic Linear Unobserved Effects Panel Data Models
WO Chapter 10

3.4.1. Motivation: The Omitted Variable Bias Reconsidered

Let c be an unobservable random variable; the vector
(y, x_1, x_2, …, x_K, c) represents the population of interest.
c captures unobserved features of an individual such as (cognitive)
ability, motivation, early family upbringing that do not change over
time; firm data: e.g. managerial quality, structure.

Dealing with the problem Cov(x, c) ≠ 0 in cross-sections:
Find a proxy variable for c
Use an IV approach (see Chapter 4)

Panel data allow finding better solutions:

a. Treat c as constant over time:
   take first differences,
   treat c as an individual-specific effect (fixed effect)
b. Treat c as a (random) part of the error term: use FGLS instead of OLS
   (random effect)

Preliminary discussion of solutions: consider the equation of interest

(3.63)  y_t = β_0 + x_t β + c + u_t,  t = 1, 2,
where E(u_t | x_t, c) = 0, t = 1, 2  =>  E(x_t' u_t) = 0, t = 1, 2.

Application of POLS? Consistent only if E(x_t' c) = 0!
Differencing equation (3.63) across the two time periods eliminates c:
(3.64)  Δy = Δx β + Δu.

Consistency of OLS requires

(i) orthogonality, E(Δx' Δu) = 0, i.e.
E[(x_2 − x_1)'(u_2 − u_1)] = E(x_2' u_2) + E(x_1' u_1) − E(x_1' u_2) − E(x_2' u_1) = 0
Strict exogeneity is required! But there is no restriction on the
correlation between x_t and c.
(ii) rank E(Δx' Δx) = K
This fails for the constant term and other variables that are
constant over time, as they are differenced out:
the effect of any variable that is constant over time cannot
be distinguished from c.
Further introductory remarks:

Restriction to balanced panels

Fixed T, large N assumption:
Cross-section asymptotics with N=50 and T=8? Difficult to know,
but more reasonable than T → ∞
N=60, T=55: assumptions about time-series dependence required
N=5, T=40: framework of multiple time series required
3.4.2. Random Effects Models

Consider
(3.65)  y_it = β_0 + x_it β + c_i + u_it,  t = 1,…,T

Random or Fixed Effects?

"Random effect" is synonymous with Cov(x_it, c_i) = 0, t = 1,…,T;
"fixed effect" (firm-specific effect, individual fixed effect, state-specific
effect, …) means that Cov(x_it, c_i) ≠ 0 is allowed for.

Strict exogeneity assumption RE.1:

(3.66) RE.1  a) E(u_it | x_i1, …, x_iT, c_i) = 0,  b) E(c_i | x_i1, …, x_iT) = E(c_i) = 0

Remarks:
It follows from RE.1 that E(x_is' u_it) = 0 for all s, t = 1,…,T.
E(c_i) = 0 would follow without assumption RE.1 b) when an
intercept is included.

Examples illustrating RE.1:

WO 10.1 Program Evaluation: estimating the effect of job training on
subsequent wages

(3.67)  log(wage_it) = θ_t + z_it γ + δ_1 prog_it + c_i + u_it,

where θ_t denotes a time-varying intercept, z_it is a vector of observable
variables, and c_i covers unobserved ability.

Evaluation takes place by comparing productivity (= wages) at some initial
period t=1, when no one participated in the program, to wages at t=2, after
a treatment group has participated in the program, whereas a control
group has not.

Define
prog_i1 = 0 for all i
prog_i2 = 0 for all i of the control group
prog_i2 = 1 for all i of the treatment group

Cov(prog_it, c_i) = 0? Depends on the (self-)selection problem!
High-ability workers might participate with higher probability.
Workers might be assigned based on characteristics unknown
to the econometrician.

Strict exogeneity?
After an adverse wage shock u_t, people might choose to participate in future
training, i.e. Cov(u_t, prog_i,t+1) ≠ 0.
The training program could have lasting effects.

WO 10.2 Distributed Lag Model

(3.68)  patents_it = θ_t + z_it γ + δ_0 RD_it + δ_1 RD_i,t−1 + … + δ_5 RD_i,t−5 + c_i + u_it

Does R&D spending depend on unobserved firm characteristics?
Do shocks to current patents (changes in u_it) affect future expenditures
on R&D?

WO 10.3 Lagged Dependent Variable

(3.69)  log(wage_it) = β_1 log(wage_i,t−1) + c_i + u_it,  t = 1,…,T

We want to study wage persistence (i.e. the size of β_1) after controlling for
unobserved heterogeneity:
Let y_it = log(wage_it), x_it = y_i,t−1; then a standard time-series assumption
would be that E(u_it | y_i,t−1, y_i,t−2, …, y_i0, c_i) = E(u_it | x_it, x_i,t−1, …, x_i1, c_i) = 0.
Thus, u_it is uncorrelated with past x_is, s ≤ t, but u_it cannot be uncorrelated
with future x_is, s > t:
E(y_it u_it) = β_1 E(y_i,t−1 u_it) + E(c_i u_it) + E(u_it²) = E(u_it²) > 0
E(y_i,t+1 u_it) = β_1 E(y_it u_it) + E(c_i u_it) + E(u_i,t+1 u_it) = β_1 E(y_it u_it) = β_1 E(u_it²) ≠ 0

=> The strict exogeneity assumption never holds in unobserved
effects models with lagged dependent variables.
Estimation and Inference under the Basic Random Effects
Assumptions

Consider the error term in the basic model y_it = x_it β + v_it, t = 1,…,T:

(3.70)  v_it ≡ c_i + u_it

Note that the joint dependence on c_i causes serial correlation of v_it.

POLS would lead to unbiased, consistent and asymptotically normal
estimation of β, but statistical inference would need robust estimation
of the variance-covariance matrix (sandwich formula 3.54).
FGLS is a straightforward and more efficient solution that exploits the serial
correlation of v_it.

Note that E(v_it | x_i1, …, x_iT) = 0 satisfies the strict exogeneity assumption
SGLS.3 (3.57), such that we can apply GLS methods that account for the
particular error structure in eq. (3.70).

Analysis of serial correlation and variance:

(3.71)  Var(v_it) = σ_c² + σ_u²,  Cov(v_it, v_is) = σ_c², t ≠ s

Change back to system notation: write y_it = x_it β + v_it, t = 1,…,T, for all T
periods as (see 3.3.1)
(3.72)  y_i = X_i β + v_i,
where
(3.73)  v_i = c_i j_T + u_i,  j_T = (1,…,1)'  (T×1 vector).

Following along the lines of GLS, we define
(3.74)  Ω ≡ E(v_i v_i')
and we assume it to be positive definite (note that it is the same for all i:
random sampling!).

From Var(v_it) = σ_c² + σ_u² and Cov(v_it, v_is) = σ_c², t ≠ s, it follows that

(3.75)  Ω = E(v_i v_i') = σ_u² I_T + σ_c² j_T j_T'
(a T×T matrix with σ_c² + σ_u² on the diagonal and σ_c² off the diagonal).

Matrix notation:
(3.76)  Var(v) = I_N ⊗ Ω,  where v' = (v_1', …, v_N').
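The RE covariance structure (3.75) is easy to build and inspect. A small sketch with hypothetical values for the two variance components:

```python
import numpy as np

T, sigma_c2, sigma_u2 = 4, 2.0, 1.0
jT = np.ones((T, 1))
Omega = sigma_u2 * np.eye(T) + sigma_c2 * (jT @ jT.T)   # eq. (3.75)

# Diagonal: sigma_c^2 + sigma_u^2; off-diagonal: sigma_c^2
assert np.allclose(np.diag(Omega), sigma_c2 + sigma_u2)
assert np.isclose(Omega[0, 1], sigma_c2)

# Implied serial correlation of the composite error (cf. eq. 3.82 below)
rho = sigma_c2 / (sigma_c2 + sigma_u2)
assert np.isclose(Omega[0, 1] / Omega[0, 0], rho)
```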

Assuming the slightly modified GLS rank condition RE.2 (with plim Ω̂ instead of Ω),

(3.77)  rank E[X_i' (plim Ω̂)⁻¹ X_i] = K,

and provided that we have consistent estimates of σ_c² and σ_u² at our disposal
(see below; this implies a consistent estimate Ω̂), we eventually obtain the random
effects estimator:

(3.78)  β̂_RE = (Σ_{i=1}^N X_i' Ω̂⁻¹ X_i)⁻¹ Σ_{i=1}^N X_i' Ω̂⁻¹ y_i.

Theorem (Consistency of the Random Effects Estimator):
Under RE.1 and RE.2, β̂_RE is a √N-asymptotically normal and consistent
estimator of β.

Proof: follows from consistency of FGLS.

For efficiency of β̂_RE, we rely on general conditional variances and
conditional covariances instead of the unconditional statement Ω = E(v_i v_i'):
Assumption RE.3:
(3.79)  a) E(u_i u_i' | x_i1, …, x_iT, c_i) = σ_u² I_T,  b) E(c_i² | x_i1, …, x_iT) = σ_c²
=> E(u_it²) = σ_u², t = 1,…,T,  Cov(v_it, v_is) = σ_c², t ≠ s,
Var(c_i) = σ_c², and hence Ω = E(v_i v_i')
(by an iterated expectations argument)
Theorem (Efficiency of the Random Effects Estimator):
Under RE.1, RE.2 and RE.3, β̂_RE is consistent, √N-asymptotically normal
and asymptotically efficient in the class of estimators under
E(v_i | x_i1, …, x_iT) = 0.

Proof: follows from the fact that β̂_RE is asymptotically equivalent to GLS
under RE.1–RE.3 (WO, p. 260).

Estimation of Ω:
Inspection of σ_v² = σ_c² + σ_u² suggests:
i. estimation of σ_v²,
ii. estimation of σ_c²  =>  estimation of σ_u²

First step: consistent estimation of σ_v²

Consider the basic panel regression equations y_it = x_it β + v_it, t = 1,…,T:

E(Σ_{t=1}^T v_it²) = Σ_{t=1}^T E(v_it²) = T σ_v²  =>  σ_v² = T⁻¹ Σ_{t=1}^T E(v_it²)

Hence, a consistent estimator of σ_v² requires consistent estimation of v_it.

Use the residuals of (consistent) pooled OLS, i.e. v̂_it = y_it − x_it β̂_POLS.

A consistent estimator is given by

(3.80)  σ̂_v² = [NT − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T v̂_it²

Second step: find a consistent estimator of σ_c²
Recall that σ_c² = E(v_it v_is), t ≠ s
=> T(T−1)/2 pairs s > t, s, t = 1, 2,…,T, all being equal to σ_c²
=> E(Σ_{t=1}^{T−1} Σ_{s=t+1}^T v_it v_is) = Σ_{t=1}^{T−1} Σ_{s=t+1}^T E(v_it v_is) = σ_c² T(T−1)/2

Solve for σ_c², replace the population moment by the sample moment, use
consistent residuals of POLS, and make a small-sample degrees-of-freedom
correction:

(3.81)  σ̂_c² = [NT(T−1)/2 − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is

The summation in (3.81) might lead to a negative value: an indication of negative
correlation in u => conclusion: RE.3 might be violated.
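The two-step variance-component procedure can be sketched end-to-end: POLS residuals, then (3.80), (3.81), and finally the RE estimator (3.78). Simulated data with σ_c² = 2 and σ_u² = 1; all names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K = 500, 4, 2
beta = np.array([1.0, -2.0])

x = rng.normal(size=(N, T, K))
c = np.sqrt(2.0) * rng.normal(size=(N, 1))   # sigma_c^2 = 2
y = x @ beta + c + rng.normal(size=(N, T))   # sigma_u^2 = 1

# Pooled OLS and its residuals v_hat
b_pols = np.linalg.lstsq(x.reshape(N * T, K), y.reshape(-1), rcond=None)[0]
v = y - x @ b_pols                           # N x T residual matrix

# Variance components, eqs. (3.80) and (3.81)
sigma_v2 = (v ** 2).sum() / (N * T - K)
pairs = sum(v[:, t] * v[:, s] for t in range(T) for s in range(t + 1, T))
sigma_c2 = pairs.sum() / (N * T * (T - 1) / 2 - K)
sigma_u2 = sigma_v2 - sigma_c2

# Omega_hat (3.75) and the RE estimator (3.78)
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
XtOy = np.einsum('itk,ts,is->k', x, Oinv, y)
b_re = np.linalg.solve(XtOX, XtOy)
```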
The correlation between the composite errors v_it and v_is does not depend on the
difference (t−s), and is also a useful measure of the relative importance of the
unobserved effect c_i:
(3.82)  Corr(v_it, v_is) = σ_c² / (σ_c² + σ_u²)

Robust Estimation of Avar(β̂_RE):

a) Conduct statistical inference without assumption RE.3 (β̂_RE is still a
consistent estimator):
use the robust variance estimator V̂ under RE.1 and RE.2 (see WO, p. 160).

Reasons:
E(v_i v_i' | x_i1, …, x_iT) is not constant, i.e. E(v_i v_i' | x_i1, …, x_iT) ≠ E(v_i v_i')
E(v_i v_i') does not have the random effect structure Ω = E(v_i v_i') based on
σ_c² and σ_u²

b) General FGLS Analysis

Applies when the idiosyncratic errors u_it, t = 1, 2,…,T, are generally
heteroskedastic and serially correlated.

Keep the assumption E(v_i v_i' | x_i1, …, x_iT) = Ω, but:
estimate Ω without restricting the random effect structure.

Ω has T(T+1)/2 estimated elements (recall that the random effect
structure requires only two parameters).
With very large N, general FGLS is an attractive alternative if Ω̂ appears
to have a pattern different from the random effects pattern.
Estimation:

(3.83)  Ω̂ = N⁻¹ Σ_{i=1}^N v̂_i v̂_i',  where the v̂_i are POLS residuals

Consistent estimator, and with E(v_i v_i' | x_i1, …, x_iT) = Ω just as efficient
as RE under assumption RE.3

Testing for the presence of an Unobserved Effect
H_0: σ_c² = 0  <=>  Cov(v_it, v_is) = 0, t ≠ s

Derive a test statistic based on asymptotic normality of

(3.84)  N^{−1/2} Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is →d N(0, E[(Σ_{t=1}^{T−1} Σ_{s=t+1}^T v_it v_is)²]),

which holds under H_0: σ_c² = 0, i.e. when the v_it are serially uncorrelated (WO, p.
264 and Problem 7.4).

Calculate the test statistic, which is distributed asymptotically as standard
normal:

(3.85)  [Σ_{i=1}^N Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is] / [Σ_{i=1}^N (Σ_{t=1}^{T−1} Σ_{s=t+1}^T v̂_it v̂_is)²]^{1/2}

Caution: the statistic detects many kinds of serial correlation;
rejection does not necessarily imply that the random effect structure is
true.
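Statistic (3.85) is a few lines of numpy given the N×T matrix of residuals. A sketch under simulated data that contains a genuine c_i (hypothetical names), so the test should clearly reject:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 400, 4
# composite errors v_it = c_i + u_it, with sigma_c^2 = sigma_u^2 = 1
v = rng.normal(size=(N, T)) + rng.normal(size=(N, 1))

# Per-unit sum over the T(T-1)/2 products v_it v_is, s > t
iu = np.triu_indices(T, k=1)
prod_i = np.einsum('it,is->its', v, v)[:, iu[0], iu[1]].sum(axis=1)

z = prod_i.sum() / np.sqrt((prod_i ** 2).sum())   # eq. (3.85), ~ N(0,1) under H0
reject = abs(z) > 1.96
```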

Fixed Effect Methods

Key difference to RE models: c_i may be correlated with the regressors x_it;
write the linear unobserved effects model y_it = x_it β + c_i + u_it, t = 1,…,T, as
(3.86)  y_i = X_i β + c_i j_T + u_i  (T×1 vectors)

Strict exogeneity assumption FE.1:

(3.87) FE.1  a) E(u_it | x_i1, …, x_iT, c_i) = 0

(Thus, RE.1 b) is not maintained, i.e. E(c_i | x_i1, …, x_iT) is allowed to be any
function of x_i1, …, x_iT.)
Flexibility comes at a price: x_i1, …, x_iT cannot include time-constant factors.

However, we can estimate the time-varying impact of time-constant variables:
Consider
(3.88)  y_it = θ_1 + θ_2 d2_t + … + θ_T dT_t + z_i γ_1 + d2_t z_i γ_2 + … + dT_t z_i γ_T + w_it δ + c_i + u_it
for t = 1,…,T under FE.1, where
d2,…,dT denote time-period dummies,
z_i is a vector of time-constant observable variables,
w_it is a vector of time-varying variables.
Identification problem: θ_1 + z_i γ_1 cannot be distinguished from c_i;
however, γ_2,…,γ_T are identified: we can test whether the impact of
time-constant variables (such as gender bias) has changed over time!
Estimation by way of the within transformation:

Average y_it = x_it β + c_i + u_it over t = 1,…,T and get ȳ_i = x̄_i β + c_i + ū_i.

Subtracting ȳ_i = x̄_i β + c_i + ū_i from y_it = x_it β + c_i + u_it gives the transformed
equation
(3.89)  (y_it − ȳ_i) = (x_it − x̄_i) β + (u_it − ū_i)
or
(3.90)  ÿ_it = ẍ_it β + ü_it

=> elimination of the individual-specific effect
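The within transformation is two lines of numpy, and comparing it with POLS on data where c_i is correlated with a regressor shows why it matters. A sketch with simulated data (hypothetical names; the correlation is built in deliberately):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, K = 400, 5, 2
beta = np.array([1.5, -0.7])

x = rng.normal(size=(N, T, K))
c = rng.normal(size=(N, 1))
x[:, :, 0] += c                          # regressor correlated with c_i
y = x @ beta + c + rng.normal(size=(N, T))

# Within transformation (3.89): subtract unit means, then pooled OLS (3.92)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
b_fe = np.linalg.lstsq(xd.reshape(N * T, K), yd.reshape(-1), rcond=None)[0]

# POLS on the untransformed data suffers omitted-variable bias here
b_pols = np.linalg.lstsq(x.reshape(N * T, K), y.reshape(-1), rcond=None)[0]
```

The FE estimate recovers β; the POLS coefficient on the contaminated regressor is biased upward by roughly Cov(x, c)/Var(x).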

The fixed effects (FE) estimator, or within estimator, is defined as
the POLS estimator applied to eq. (3.90).

Consistent estimation by application of POLS to (3.90)? Yes, because
E(ẍ_it' ü_it) = 0 (immediately follows from FE.1 and E(x_it' u_it) = 0).

Moreover, the FE estimator is unbiased, because
E(ü_it | x_i1, …, x_iT) = 0 (which follows from the strict exogeneity
assumption FE.1, E(ü_it | x_i) = E(u_it | x_i) − E(ū_i | x_i) = 0, and from x̄_i
being a function of x_i = (x_i1, …, x_iT)).

The between estimator is defined as the OLS estimator applied to the cross-
section equation ȳ_i = x̄_i β + c_i + ū_i (note the loss of efficiency).
Vector notation and time-demeaning

Proposition: ÿ_it = ẍ_it β + ü_it follows from y_i = X_i β + c_i j_T + u_i by application
of the Frisch-Waugh theorem.

Proof: All we need is a tool that produces deviations from the time mean as,
for example, the auxiliary regression y_it = α_i + ÿ_it, t = 1,…,T, implying that
α̂_i = ȳ_i. => The residual maker matrix Q_T ≡ I_T − j_T (j_T' j_T)⁻¹ j_T' does the job:
Q_T j_T = 0,  Q_T y_i = ÿ_i,  Q_T X_i = Ẍ_i,  Q_T u_i = ü_i.
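The claimed properties of Q_T are easy to verify numerically (a quick sketch):

```python
import numpy as np

T = 5
jT = np.ones((T, 1))
QT = np.eye(T) - jT @ np.linalg.inv(jT.T @ jT) @ jT.T   # residual maker Q_T

rng = np.random.default_rng(7)
yi = rng.normal(size=T)

assert np.allclose(QT @ jT, 0)                # Q_T j_T = 0
assert np.allclose(QT @ yi, yi - yi.mean())   # Q_T demeans over time
assert np.allclose(QT @ QT, QT)               # idempotent: Q_T Q_T = Q_T
assert np.allclose(QT, QT.T)                  # symmetric
```

Idempotence and symmetry are what make Ẍ_i' ü_i = X_i' Q_T u_i in the variance derivation below.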

Assumption FE.2:
(3.91)  rank E(Ẍ_i' Ẍ_i) = rank Σ_{t=1}^T E(ẍ_it' ẍ_it) = K   (FE.2)

Remark:
The rank would be less than K if any element of ẍ_it were identically zero for
all i, i.e. observations need to vary over time.

Fixed effect (within) estimator:

(3.92)  β̂_FE = (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ Σ_{i=1}^N Ẍ_i' ÿ_i = (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹ Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ÿ_it
Properties of the between estimator:
Inconsistent under FE.1, because E(x̄_i' c_i) is not necessarily zero.
Consistency of the between estimator requires assumption RE.1 b).

Asymptotic Inference with Fixed Effects

Assumption FE.3 assures that FE is efficient (follows from efficiency
of POLS under the Gauss-Markov assumptions):
(3.93)  E(u_i u_i' | x_i, c_i) = σ_u² I_T   (FE.3)
Implications:
Var(u_i | x_i, c_i) = σ_u² I_T,  E(u_i u_i' | x_i, c_i) = E(u_i u_i'),  Var(u_i) = E(u_i u_i') = σ_u² I_T
The variance of ü_it, t = 1,…,T, is homoskedastic. Indeed:

Var(ü_it) = E(ü_it²) = E[(u_it − ū_i)²] = E(u_it²) + E(ū_i²) − 2 E(u_it ū_i)

(3.94)  = σ_u² + σ_u²/T − 2σ_u²/T = σ_u² (T−1)/T

However, time demeaning leads to negatively serially correlated
ü_it, t = 1,…,T:

E(ü_it ü_is) = E[(u_it − ū_i)(u_is − ū_i)] = E(ū_i²) − E(ū_i u_it) − E(ū_i u_is)

(3.95)  = σ_u²/T − 2σ_u²/T = −σ_u²/T < 0

Nevertheless, the negatively correlated ü_it do not affect efficiency of FE,
because the asymptotics use a CLT based on u_it, not on ü_it:
The asymptotic variance of the FE estimator follows from

√N (β̂_FE − β) = (N⁻¹ Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ N^{−1/2} Σ_{i=1}^N Ẍ_i' ü_i = (N⁻¹ Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ N^{−1/2} Σ_{i=1}^N Ẍ_i' u_i

(note that Ẍ_i' ü_i = X_i' Q_T Q_T u_i = X_i' Q_T u_i = Ẍ_i' u_i).

Moreover, under FE.3, E(u_i u_i' | Ẍ_i) = σ_u² I_T. Then, from system OLS it
follows that

(3.96)  √N (β̂_FE − β) ~a N(0, σ_u² [E(Ẍ_i' Ẍ_i)]⁻¹)
=> Avar(β̂_FE) = σ_u² [E(Ẍ_i' Ẍ_i)]⁻¹ / N
and

(3.97)  Avâr(β̂_FE) = σ̂_u² (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹

Estimation of σ_u²:

Note that (3.97) requires σ_u², i.e. the variance of u_it, not of the errors ü_it from the
transformed equation ÿ_it = ẍ_it β + ü_it!

Solution strategy: express σ_u² in terms of ü_it:

Use (3.94), i.e. Var(ü_it) = E(ü_it²) = σ_u² (T−1)/T  =>  E(Σ_{t=1}^T ü_it²) = σ_u² (T−1)
Hence, using sample moments and correcting for degrees of freedom, we
obtain σ̂_u² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it².

We avoid overly complicated notation, but recall that the definition might be
ambiguous:

(3.98)  σ̂_u² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T û_it²,

where the fixed effects residuals are defined as

(3.99)  û_it = ÿ_it − ẍ_it β̂_FE,  t = 1,…,T; i = 1,…,N

=> σ̂_u² is an unbiased estimator of σ_u²
Word of caution:
The denominator is N(T−1) − K, not NT − K as for standard regressions or
for σ̂_v² in the RE model!
The (downward) bias can be substantial when T is small!
Using a standard computer package and running OLS after demeaning
requires a correction of the standard errors.
Example: N=500, T=3, K=10 => correction factor = 1.227
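The quoted correction factor is the square root of the ratio of the two degrees-of-freedom denominators:

```python
import math

# OLS on demeaned data divides by NT - K, but the correct denominator
# for the FE variance estimate (3.98) is N(T-1) - K.
N, T, K = 500, 3, 10
factor = math.sqrt((N * T - K) / (N * (T - 1) - K))
print(round(factor, 3))  # 1.227
```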

A measure of the relative importance σ_c²/(σ_c² + σ_u²) of the fixed-effect
component c_i:

Given β̂_FE, σ̂_v² = [NT − K]⁻¹ Σ_{i=1}^N Σ_{t=1}^T (y_it − x_it β̂_FE)² is a consistent estimate of
σ_v², such that σ̂_c² = σ̂_v² − σ̂_u² is a consistent estimate of σ_c².

Testing multiple restrictions: F-test

(3.100)  F = [(SSR_r − SSR_ur)/Q] / [SSR_ur / (N(T−1) − K)] ~a F(Q, N(T−1) − K),

where SSR_ur = Σ_{i=1}^N Σ_{t=1}^T û_it² with unrestricted residuals û_it = ÿ_it − ẍ_it β̂_FE,
and SSR_r = sum of squared restricted residuals from a similar regression, but
with Q restrictions imposed on β.

The Dummy Variable Approach to Fixed-Effect Modelling
(LSDV)
View the c_i as parameters to be estimated along with β:
Matrix notation
(3.101)  y = X* β* + u,

where X* = (X, d1, …, dN),  β*' = (β_1, …, β_K, c_1, …, c_N),

and dn_it = 1 if n = i, 0 if n ≠ i,  t = 1,…,T; i = 1,…,N

Consistency, efficiency of LSDV:

Under assumptions FE.1, rank(X*) = K + N, and FE.3, the LSDV
regression y = X* β* + u satisfies the Gauss-Markov assumptions.

Remark: LSDV as defined in (3.101) reproduces the FE estimator β̂_FE of β,
and the residuals from the LSDV approach are identical to the
residuals û of the regression ÿ_it = ẍ_it β + ü_it.

Proof: application of the Frisch-Waugh theorem.

Advantages of the LSDV approach:

y = X* β* + u produces σ̂_u² with the (usual) correct degrees of freedom:
NT − N − K = N(T−1) − K
Under the Gauss-Markov assumptions, y = X* β* + u provides best
linear unbiased estimates of c_1, …, c_N
We can test equality of c_1, …, c_N using an F-test

Treatment of the collinearity of the constant and the fixed effects in software
packages: the overall intercept is either for an arbitrary cross-section unit or,
more commonly, for the average of the c_i across i.

Statistical Inference in the Presence of Serial Correlation:
Application of the Robust Variance Matrix Estimator

The FE estimator is consistent and asymptotically normal under assumptions
FE.1 and FE.2. Note, however, that statistical inference based on
Avâr(β̂_FE) = σ̂_u² (Σ_{i=1}^N Σ_{t=1}^T ẍ_it' ẍ_it)⁻¹ would be wrong when FE.3 does not hold.

Most severe problem: serial correlation in u_it, t = 1,…,T.

Procedure:

a) Detection of serial correlation (of various kinds):

Run the pooled OLS regression of û_it on û_i,t−1 (where û_it are the FE
residuals, û_it = ÿ_it − ẍ_it β̂_FE); make the t-values robust to serial
correlation (see 3.47 and 3.54).

b) If we find serial correlation, adjust the asymptotic variance estimator
and the test statistics. Applying (3.47), we obtain the robust estimate

(3.102)  Avâr(β̂_FE) = (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹ (Σ_{i=1}^N Ẍ_i' û_i û_i' Ẍ_i) (Σ_{i=1}^N Ẍ_i' Ẍ_i)⁻¹

First Differencing Methods
Consider

(3.103)  Δy_it = Δx_it β + Δu_it,  t = 2,…,T

Assumption FD.1:

(3.104)  E(u_it | x_i1, …, x_iT, c_i) = 0  (same as FE.1)

Assumption FD.2:

(3.105)  rank Σ_{t=2}^T E(Δx_it' Δx_it) = K

The first-difference (FD) estimator is the pooled OLS estimator from the
regression of Δy_it on Δx_it, t = 2,…,T; i = 1,…,N.

Discussion of FD under FD.1, FD.2:

Elements of x_it must be time-varying
The constant and the unobserved effect c_i get differenced away
Example: differencing

y_it = θ_1 + θ_2 d2_t + … + θ_T dT_t + z_i γ_1 + d2_t z_i γ_2 + … + dT_t z_i γ_T + w_it δ + c_i + u_it

we obtain

Δy_it = θ_2 Δd2_t + … + θ_T ΔdT_t + (Δd2_t) z_i γ_2 + … + (ΔdT_t) z_i γ_T + Δw_it δ + Δu_it

Discussion of FD under FD.1, FD.2 (continued):

From FD.1 it follows that E(Δx_it' Δu_it) = 0  =>  FD is consistent
From FD.1 it follows that E(Δu_it | Δx_i2, …, Δx_iT) = 0 (strict exogeneity)
=> FD is unbiased conditional on X

Efficiency:

Consider the FD model in matrix notation Δy = ΔX β + e, where e = Δu.
Efficiency follows from applying the Gauss-Markov theorem under
assumptions FD.1, FD.2 and the additional assumption FD.3:

(3.106)  E(e_i e_i' | x_i1, …, x_iT, c_i) = σ_e² I_{T−1}

Inference:

Under FD.1–FD.3 the usual OLS standard errors from the first-
difference regression are asymptotically valid.

A consistent estimator of σ_e² is obtained from (note that the denominator is
the usual one from OLS standard errors, due to skipping the first period):

(3.107)  σ̂_e² = [N(T−1) − K]⁻¹ Σ_{i=1}^N Σ_{t=2}^T (Δy_it − Δx_it β̂_FD)²

Robust estimate (without FD.3):

(3.108)  Avâr(β̂_FD) = (ΔX'ΔX)⁻¹ (Σ_{i=1}^N ΔX_i' ê_i ê_i' ΔX_i) (ΔX'ΔX)⁻¹

Practical hint: avoid producing differences such as y_{i+1,1} − y_{i,T} when
calculating Δy and Δx from stacked cross sections; the differences for
observations 1, T+1, 2T+1, …, (N−1)T+1 should be set to missing.

Application: Policy Analysis Using First Differences - The Difference-in-
Differences (DiD) Estimator

Recall Program Evaluation: estimating the effect of job training on
subsequent wages
log(wage_it) = θ_t + δ_1 prog_it + c_i + u_it,  t = 1, 2
We defined
prog_i1 = 0 for all i
prog_i2 = 0 for all i of the control group
prog_i2 = 1 for all i of the treatment group

From the first difference

(3.109)  Δlog(wage_i2) = θ_2 + δ_1 Δprog_i2 + Δu_i2,

where Δprog_i2 = prog_i2, we obtain the DiD estimator δ̂_1. It can be written as

(3.110)  δ̂_1 = [avg. Δlog(wage)]_treatment − [avg. Δlog(wage)]_control

(see WO, Problem 10.4)

In general, more than two periods and further explanatory variables are
included (see the chapter on evaluation methods).
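Formula (3.110) can be illustrated with a simulated two-period program evaluation in which high-ability workers select into treatment, so a naive cross-section comparison is biased while DiD is not (hypothetical names and values):

```python
import numpy as np

rng = np.random.default_rng(9)
N, delta1 = 1000, 0.10                 # true program effect: +10% wages
treated = rng.random(N) < 0.5
c = rng.normal(size=N)                 # worker ability
c[treated] += 0.5                      # selection on ability into treatment

lw1 = 1.0 + c + 0.1 * rng.normal(size=N)                     # t = 1, no one treated
lw2 = 1.2 + delta1 * treated + c + 0.1 * rng.normal(size=N)  # t = 2

# DiD (3.110): difference in average wage growth between the groups
d = lw2 - lw1
did = d[treated].mean() - d[~treated].mean()

# Naive cross-section comparison at t = 2 picks up the ability gap too
naive = lw2[treated].mean() - lw2[~treated].mean()
```

Differencing removes c_i, so `did` is close to 0.10 while `naive` also absorbs the 0.5 ability gap.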

Comparison of estimators

Fixed Effects versus First Differencing

Common feature: inconsistency in case of standard endogeneity
problems, including measurement error, time-varying omitted
variables, and simultaneity (see WO, Chapter 11, for details)

T=2:
FE and FD produce identical estimates (WO Problem 10.3)
FD is easier to implement because all procedures that can be
applied to single cross sections can be applied directly

T>2:
The choice hinges upon the assumptions about the idiosyncratic errors u_it:
FE is more efficient when the u_it are serially uncorrelated
FD is more efficient when u_it follows a random walk
Recall from assumption FD.3:
E(e_it e_is | x_i, c_i) = 0  =>  e_it = Δu_it is serially uncorrelated
=> u_it = u_i,t−1 + e_it follows a random walk
Under FD.1–FD.3, the usual OLS standard errors
from the first-difference regression are asymptotically valid,
while under FE.1–FE.3 the estimated OLS standard errors based
on the (within) FE regression need to be corrected (note, however,
that this is not the case for the LSDV approach)
Formal test by using a Hausman test

Fixed Effects versus Random Effects

Consider quasi-time demeaning:

(3.111)  y_it − λȳ_i = (x_it − λx̄_i) β + (v_it − λv̄_i)

Estimating (3.111) by POLS, we obtain the RE estimator when (WO, p. 286/287)

(3.112)  λ = 1 − [1 / (1 + T σ_c²/σ_u²)]^{1/2}

Thus, λ → 1 when T → ∞ or σ_c²/σ_u² → ∞
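That POLS on the quasi-demeaned data (3.111) reproduces GLS exactly can be verified numerically: with T = 4 and σ_c²/σ_u² = 2, formula (3.112) gives λ = 1 − (1/9)^{1/2} = 2/3. A sketch (hypothetical names, true variance components assumed known):

```python
import numpy as np

rng = np.random.default_rng(10)
N, T, K = 200, 4, 2
sigma_c2, sigma_u2 = 2.0, 1.0
x = rng.normal(size=(N, T, K))
y = x @ np.array([1.0, -1.0]) + np.sqrt(sigma_c2) * rng.normal(size=(N, 1)) \
    + np.sqrt(sigma_u2) * rng.normal(size=(N, T))

lam = 1.0 - np.sqrt(1.0 / (1.0 + T * sigma_c2 / sigma_u2))   # eq. (3.112)

# POLS on quasi-demeaned data (3.111)
xq = (x - lam * x.mean(axis=1, keepdims=True)).reshape(N * T, K)
yq = (y - lam * y.mean(axis=1, keepdims=True)).reshape(-1)
b_quasi = np.linalg.lstsq(xq, yq, rcond=None)[0]

# Direct GLS (3.78) with Omega = sigma_u2 I + sigma_c2 J
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
XtOy = np.einsum('itk,ts,is->k', x, Oinv, y)
b_gls = np.linalg.solve(XtOX, XtOy)

assert np.allclose(b_quasi, b_gls)   # quasi-demeaning reproduces GLS exactly
```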

We conclude from λ = 1 − [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be
close to fixed effects

for large T
if the estimated variance of the c_i is large relative to the estimated
variance of the u_it
=> ρ = σ_c²/(σ_c² + σ_u²) → 1

We conclude from λ = 1 − [1/(1 + T σ_c²/σ_u²)]^{1/2} that random effects can be
close to pooled OLS

if σ_c²/σ_u² → 0, i.e. ρ = σ_c²/(σ_c² + σ_u²) → 0

The Hausman Test Comparing the RE and FE Estimators

The test points at the crucial difference between both estimators: potential
correlation between c_i and x_it, i.e. validity of Assumption RE.1 b).

Idea of the test: consider the difference β̂_FE − β̂_RE under the assumption that
RE.3 holds true (i.e. existence of the random effect structure Ω = E(v_i v_i')).

Correlation = 0 => RE would be consistent and more efficient than FE
(proof: see WO, p. 289); FE is consistent as well.
Correlation ≠ 0 => RE would be inconsistent, FE consistent.

When (β̂_FE − β̂_RE) ≈ 0, we observe consistency in both cases,
but RE is more efficient => use the efficient RE estimator.

When (β̂_FE − β̂_RE) differs clearly from 0, RE differs from the consistent
alternative and RE must be inconsistent => use the consistent FE
estimator.

Hausman test statistic:

(3.113)  H = (δ̂_FE − δ̂_RE)' [Avâr(δ̂_FE) − Avâr(δ̂_RE)]⁻¹ (δ̂_FE − δ̂_RE) ~a χ²_M,

where δ̂_RE denotes the vector of random effects coefficient estimates without the
coefficients on time-constant variables, δ̂_FE is the vector of corresponding fixed
effects estimates, each being an M×1 vector (M = dim(δ̂_FE)).
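Statistic (3.113) is mechanical once both estimators and their variance matrices are in hand. A sketch on simulated data satisfying RE.1 (hypothetical names; the true Ω and σ_u² are used for the variance matrices to keep the example short):

```python
import numpy as np

rng = np.random.default_rng(11)
N, T, K = 300, 4, 2
sigma_c2, sigma_u2 = 1.0, 1.0
beta = np.array([1.0, 0.5])
x = rng.normal(size=(N, T, K))
y = x @ beta + rng.normal(size=(N, 1)) + rng.normal(size=(N, T))  # RE.1 holds

# FE estimator and Avar (3.97)
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
XdXd = np.einsum('itk,itl->kl', xd, xd)
b_fe = np.linalg.solve(XdXd, np.einsum('itk,it->k', xd, yd))
V_fe = sigma_u2 * np.linalg.inv(XdXd)

# RE estimator and Avar (3.61)
Oinv = np.linalg.inv(sigma_u2 * np.eye(T) + sigma_c2 * np.ones((T, T)))
XtOX = np.einsum('itk,ts,isl->kl', x, Oinv, x)
b_re = np.linalg.solve(XtOX, np.einsum('itk,ts,is->k', x, Oinv, y))
V_re = np.linalg.inv(XtOX)

# Hausman statistic (3.113); asymptotically chi^2 with K df under H0
d = b_fe - b_re
H = d @ np.linalg.solve(V_fe - V_re, d)
```

Because V_fe − V_re is positive definite here, H is guaranteed non-negative; in applied work with estimated variance components it can turn negative, which itself signals a violation of RE.3.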

Proof:
H is based on the Wald statistic

H = (δ̂_FE − δ̂_RE)' [Avâr(δ̂_FE − δ̂_RE)]⁻¹ (δ̂_FE − δ̂_RE) ~a χ²_M

Consider
Var(δ̂_FE − δ̂_RE) = Var(δ̂_FE) + Var(δ̂_RE) − Cov(δ̂_FE, δ̂_RE) − Cov(δ̂_RE, δ̂_FE)

Hausman's essential result is that the covariance of an efficient estimator
with its difference from an inefficient estimator is zero, which implies that

Cov(δ̂_FE − δ̂_RE, δ̂_RE) = Cov(δ̂_FE, δ̂_RE) − Var(δ̂_RE) = 0,  or
Cov(δ̂_FE, δ̂_RE) = Var(δ̂_RE)

Replacing Cov(δ̂_FE, δ̂_RE) = Var(δ̂_RE), we obtain

Var(δ̂_FE − δ̂_RE) = Var(δ̂_FE) + Var(δ̂_RE) − Var(δ̂_RE) − Var(δ̂_RE) = Var(δ̂_FE) − Var(δ̂_RE)

q.e.d.

If primary interest lies in a single parameter estimate δ_k (for example,
because of a single important explanatory policy variable), and the other
variables are control variables, then the Hausman test boils down to a t-test:

(3.114)  t = (δ̂_k,FE − δ̂_k,RE) / [Avâr(δ̂_k,FE) − Avâr(δ̂_k,RE)]^{1/2},

where Avâr(δ̂_k,FE) and Avâr(δ̂_k,RE) are the corresponding main diagonal
elements of Avâr(δ̂_FE) and Avâr(δ̂_RE), respectively.

Remarks:
The test assumes that RE.1–RE.3 hold; rejection does not allow any
conclusion about the true cause of rejection:
the reason for rejection might be that RE.3 is too strong an
assumption.
The Hausman test has no power against the alternative that
Assumption RE.1 is true but Assumption RE.3 is false.

If Assumption RE.3 fails, WO proposes a robust form of the Hausman
statistic based on the F statistic: see WO, p. 290.
