
PANEL DATA (Ch. 10)

The recommended exercise questions from the textbook:

Chapter 10: All except (10.6), (10.10).

[1] What are panel data?

Panel data consists of the observations on the same $n$ entities at two or more time periods $T$. If the data set contains observations on the variables $X$ and $Y$, then the data are denoted

$(X_{it}, Y_{it})$, $i = 1, \ldots, n$ and $t = 1, \ldots, T$,

where the first subscript, $i$, refers to the entity being observed, and the second subscript, $t$, refers to the date at which it is observed.

Balanced panel vs. unbalanced panel.

• Balanced panel: Variables are observed for each entity and each time period.

• Unbalanced panel: Some missing data for at least one time period.

We consider the analysis of balanced panels, but the extension to unbalanced panels is straightforward.
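The distinction between balanced and unbalanced panels can be checked mechanically. The sketch below is only an illustration: the function name `is_balanced` and the input layout (a list of `(entity, period)` pairs) are assumptions, not part of the course material.

```python
from collections import Counter

def is_balanced(observations, n_periods):
    """Return True if every entity is observed once in each of n_periods periods.

    `observations` is an iterable of (entity, period) pairs (hypothetical layout).
    """
    pairs = list(observations)
    # A duplicated (entity, period) row means the data are not a clean panel.
    if len(set(pairs)) != len(pairs):
        return False
    # The panel must span exactly n_periods distinct periods overall.
    periods = {t for _, t in pairs}
    if len(periods) != n_periods:
        return False
    # Every entity must appear in all of those periods.
    per_entity = Counter(entity for entity, _ in pairs)
    return all(count == n_periods for count in per_entity.values())

balanced = [(s, t) for s in ("A", "B", "C") for t in (2001, 2002, 2003, 2004)]
unbalanced = balanced[:-1]  # drop the last observation of entity "C"

print(is_balanced(balanced, 4))    # True
print(is_balanced(unbalanced, 4))  # False
```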

Panel-1

[2] Revisiting Omitted Variables Biases

• Issue: Do alcohol taxes help decrease traffic deaths?

• Data: fatality.wf1

• 48 U.S. states (excluding Alaska and Hawaii): N = 48.

• 1982-1988: T = 7.

• fatality rate = # of traffic accident deaths per 10,000 people.

• beertax = tax per case of beer (\$).

• Estimation results for the 1982 data:

$\mathit{FatalityRate} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\,\mathit{BeerTax}$

• Estimation results for the 1988 data:

$\mathit{FatalityRate} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\,\mathit{BeerTax}$

What is going on here?

Consider a simple multiple regression model (for a given time $t$):

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, $i = 1, \ldots, N$,

where $Z_i$ is a time-invariant regressor.

What do $\beta_1$ and $\beta_2$ measure? $\beta_1$ measures the partial effect of $X_{it}$ on $Y_{it}$ with $Z_i$ held constant. Similarly, $\beta_2$ measures the partial effect of $Z_i$ on $Y_{it}$ with $X_{it}$ held constant.

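What if you estimate $Y_{it} = \alpha_0 + \alpha_1 X_{it} + \mathit{error}_{it}$ instead?

$\hat{\alpha}_1 \xrightarrow{p} \beta_1 + \beta_2 \dfrac{\operatorname{cov}(X_{it}, Z_i)}{\operatorname{var}(X_{it})}$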

Each state would have a different level of preference for alcohol (say, $Z_i$ = Pal).

Pal ($Z$) and BeerTax ($X$) could be positively related: $\operatorname{cov}(X_{it}, Z_i) > 0$.

Pal ($Z$) would have a positive partial effect on FatalityRate ($\beta_2 > 0$).

Thus, $\hat{\alpha}_1$ could be positive even if the true $\beta_1$ is negative.
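A small simulation can make the omitted variable bias formula concrete. This is a sketch on made-up data: the parameter values, the sample size, and the way $X$ is tied to $Z$ are all assumptions chosen only so that the true slope is negative while the short-regression slope turns out positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 2.0, -0.5, 1.5    # hypothetical "true" coefficients

z = rng.normal(size=n)                   # omitted variable (plays the role of Pal)
x = 0.8 * z + rng.normal(size=n)         # regressor, positively related to z
u = rng.normal(size=n)
y = beta0 + beta1 * x + beta2 * z + u

# Short regression of y on x alone: slope = cov(x, y) / var(x).
var_x = np.var(x, ddof=1)
alpha1_hat = np.cov(x, y)[0, 1] / var_x

# Limit implied by the formula: beta1 + beta2 * cov(x, z) / var(x).
alpha1_limit = beta1 + beta2 * np.cov(x, z)[0, 1] / var_x

print(f"true beta1      = {beta1:.3f}")
print(f"short-reg slope = {alpha1_hat:.3f}")
print(f"formula limit   = {alpha1_limit:.3f}")
```

With these settings the population bias is $1.5 \times 0.8 / 1.64 \approx 0.73$, which is large enough to flip the sign of the estimated slope.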

How could we control for Pal using panel data?

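[3] Panel Data with Two Time Periods

Two equations for 1982 and 1988:

$\mathit{FatalityRate}_{i,1988} = \beta_0 + \beta_1 \mathit{BeerTax}_{i,1988} + \beta_2 Z_i + u_{i,1988}$.

$\mathit{FatalityRate}_{i,1982} = \beta_0 + \beta_1 \mathit{BeerTax}_{i,1982} + \beta_2 Z_i + u_{i,1982}$.

Subtracting the 1982 equation from the 1988 equation removes both $\beta_0$ and $\beta_2 Z_i$:

$\mathit{FatalityRate}_{i,1988} - \mathit{FatalityRate}_{i,1982} = \beta_1 (\mathit{BeerTax}_{i,1988} - \mathit{BeerTax}_{i,1982}) + (u_{i,1988} - u_{i,1982})$. (1)

No $Z_i$ in (1)! OLS on (1) will yield a consistent estimator of $\beta_1$.

Actual estimation results for (1):

$\mathit{FatalityRate}_{1988} - \mathit{FatalityRate}_{1982} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(\mathit{BeerTax}_{1988} - \mathit{BeerTax}_{1982})$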
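The before-and-after idea can be sketched on simulated data. Everything here is hypothetical (coefficients, sample size, and the link between the regressor and the unobserved $Z_i$); the point is only that differencing removes the time-invariant term, while a single-year regression does not. As in the estimated equation above, the differenced regression is run with an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000                                 # hypothetical number of entities
beta0, beta1, beta2 = 2.0, -1.0, 1.5      # hypothetical "true" coefficients

z = rng.normal(size=n)                    # time-invariant and unobserved
x_a = 0.8 * z + rng.normal(size=n)        # regressor in the first period
x_b = 0.8 * z + rng.normal(size=n)        # regressor in the second period
y_a = beta0 + beta1 * x_a + beta2 * z + rng.normal(size=n)
y_b = beta0 + beta1 * x_b + beta2 * z + rng.normal(size=n)

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_levels = ols_slope(x_b, y_b)              # one cross-section: biased by z
slope_diff = ols_slope(x_b - x_a, y_b - y_a)    # differenced data: z drops out

print(f"levels slope     = {slope_levels:.3f}")
print(f"difference slope = {slope_diff:.3f} (true beta1 = {beta1})")
```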


Comments on the before-and-after estimation results.

• As real beer tax increases by \$1 per case, the traffic fatality rate falls by 1.04 deaths per 10,000 people. This is a big effect, because mean traffic fatality rate is approximately two.

• This before-and-after approach works well if T = 2. What should we do if T > 2?


[4] Fixed Effects Regression

(A) A simple regression model:

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$. (1)

• Set $\alpha_i = \beta_0 + \beta_2 Z_i$. Then, we have

$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$, (2)

which is called the “fixed effects regression model.”

• For the $i$'th cross-sectional entity, the regression line is (2). The slope coefficient $\beta_1$ is the same for all $i$, but the intercept terms $\alpha_i$ are different across different $i$ (but constant over time).

• Set:

$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}$, (3)

where $i = 1, \ldots, n$, $t = 1, \ldots, T$ ($nT$ observations),

$D2_i = \begin{cases} 1 & \text{if } i \text{ is the 2nd entity;} \\ 0 & \text{otherwise,} \end{cases}$

and other dummy variables $D3, \ldots, Dn$ are similarly defined.

• In (3), $\alpha_1 = \beta_0$, $\alpha_2 = \beta_0 + \gamma_2$, $\ldots$, $\alpha_n = \beta_0 + \gamma_n$.

• The slope coefficient $\beta_1$ and $n$ other parameters ($\beta_0, \gamma_2, \ldots, \gamma_n$) can be estimated by OLS on model (3).

• “Entity-demeaned” OLS algorithm

• $Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i$, where $\bar{Y}_i = \dfrac{1}{T} \sum_{t=1}^{T} Y_{it}$.

------------------------------------

$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$. (4)

• OLS estimator of $\beta_1$ from (4) = OLS estimator of $\beta_1$ from (3).
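The equality between the dummy-variable estimator from (3) and the entity-demeaned estimator from (4) can be verified numerically. The sketch below uses a tiny simulated balanced panel; the sizes, the true slope of 0.5, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 5, 4
entity = np.repeat(np.arange(n), T)           # entity index for each of the nT rows
alpha = rng.normal(size=n)                    # entity-specific intercepts
x = alpha[entity] + rng.normal(size=n * T)    # regressor correlated with the intercepts
y = 0.5 * x + alpha[entity] + rng.normal(size=n * T)

# Model (3): OLS with a constant, x, and the dummies D2, ..., Dn.
D = (entity[:, None] == np.arange(1, n)).astype(float)   # shape (nT, n - 1)
W = np.column_stack([np.ones(n * T), x, D])
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
beta1_lsdv = coef[1]

# Model (4): OLS on entity-demeaned data, no intercept.
def demean(v):
    means = np.bincount(entity, weights=v) / T    # entity means over time
    return v - means[entity]

x_dm, y_dm = demean(x), demean(y)
beta1_within = (x_dm @ y_dm) / (x_dm @ x_dm)

print(beta1_lsdv, beta1_within)   # the two slopes agree up to rounding error
```

The demeaned regression needs only one regressor instead of $n + 1$ parameters, which is why it is the practical algorithm when $n$ is large.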

• Least Squares Assumptions for the fixed effects model:

(FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.

(FEA.2) The data, $(X_{i1}, \ldots, X_{iT}, Y_{i1}, \ldots, Y_{iT})$, $i = 1, \ldots, n$, are a random sample.

(FEA.3) $(X_{it}, \alpha_i)$ have nonzero finite fourth moments: Large outliers are unlikely.

(FEA.4) There is no perfect multicollinearity.

(FEA.5) No autocorrelation: $\operatorname{cov}(u_{it}, u_{is} \mid X_{i1}, \ldots, X_{iT}, \alpha_i) = 0$ for all $t \neq s$.

For multiple regressions, $X_{it}$ should be replaced by the full list of $X_{1,it}, \ldots, X_{k,it}$.

• What happens if (FEA.5) is violated?


(B) Extension to multiple X’s.

The fixed effects regression model is

$Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it}$, (5)

where $i = 1, \ldots, n$, and $t = 1, \ldots, T$.

Equivalently, the fixed effects model can be written as

$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$. (6)

“Entity-demeaned” algorithm

$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \cdots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i)$. (7)

OLS estimators of $\beta_1, \ldots, \beta_k$ from (7) = OLS estimators of $\beta_1, \ldots, \beta_k$ from (6).

(C) Application to Traffic Deaths.

Fixed effects regression results:

$\mathit{FatalityRate} = \underset{(0.20)}{-0.66}\,\mathit{BeerTax} + \mathit{StateFixedEffects}$.

[5] Time and Entity Fixed Effects Model

(1) Motivation.

$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$,

where $Y_{it}$ = FatalityRate; $X_{it}$ = BeerTax;

$Z_i$ = time-invariant preferences for alcohol or driving of the people in State $i$;

$S_t$ = time-specific effects (common to all states), such as overall automobile safety improvements.

(2) Time and Entity Fixed Effects Model:

• Let

$B1_t = \begin{cases} 1 & \text{if } t \text{ is the first time period;} \\ 0 & \text{otherwise.} \end{cases}$

Define dummy variables $B2_t, \ldots, BT_t$ similarly. Then,

$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$.

• Too many regressors. We can still get reasonably accurate estimates of $\beta_1, \ldots, \beta_k$, but the estimates of $\gamma_2, \ldots, \gamma_n$ and $\delta_2, \ldots, \delta_T$ are inaccurate.

(3) Application to traffic deaths

$\mathit{FatalityRate} = \underset{(0.25)}{-0.64}\,\mathit{BeerTax} + \mathit{StateFixedEffects} + \mathit{TimeFixedEffects}$.
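For a balanced panel, the regression with both entity and time dummies gives the same slope as OLS on data from which entity means and period means have been removed (with the grand mean added back). The sketch below checks this on simulated data; the panel dimensions, the true slope of -0.6, and the helper name `two_way_demean` are assumptions for illustration, and the demeaning shortcut as written relies on the panel being balanced.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 6, 5
entity = np.repeat(np.arange(n), T)     # rows are ordered entity by entity
period = np.tile(np.arange(T), n)
x = rng.normal(size=n * T) + 0.5 * entity + 0.3 * period
entity_fx = rng.normal(size=n)[entity]  # state-type effects
period_fx = rng.normal(size=T)[period]  # year-type effects
y = -0.6 * x + entity_fx + period_fx + rng.normal(size=n * T)

# Dummy-variable regression: constant, x, D2..Dn, B2..BT.
D = (entity[:, None] == np.arange(1, n)).astype(float)
B = (period[:, None] == np.arange(1, T)).astype(float)
W = np.column_stack([np.ones(n * T), x, D, B])
coef, *_ = np.linalg.lstsq(W, y, rcond=None)
beta1_dummies = coef[1]

def two_way_demean(v):
    """Subtract entity and period means and add back the grand mean."""
    m = v.reshape(n, T)                 # row i holds entity i, column t holds period t
    out = m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    return out.ravel()

x_dd, y_dd = two_way_demean(x), two_way_demean(y)
beta1_within = (x_dd @ y_dd) / (x_dd @ x_dd)

print(beta1_dummies, beta1_within)      # identical up to rounding error
```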

[6] Drunk Driving Laws and Traffic Deaths

• Would driving laws and economic conditions matter?

• Drinking and drunk driving laws do not matter very much.

• Economic factors are important.

• Specification (4) is the base model.

• Average tax = \$0.5/case, and average fatality rate = 2 per 10,000 people.

• As tax increases by \$0.5, fatality rate drops by 0.45×0.5 = 0.225 (per 10,000).

But this result is somewhat imprecise: the 95% confidence interval for the effect of BeerTax is

$-0.45 \pm 1.96 \times 0.22 = (-0.88, -0.02)$,

which is quite wide.
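The interval arithmetic can be reproduced in a few lines. The helper name `confidence_interval` is an assumption; the inputs are the rounded coefficient and standard error quoted above, and 1.96 is the usual large-sample critical value.

```python
def confidence_interval(estimate, std_error, z=1.96):
    """Large-sample confidence interval: estimate +/- z * std_error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

lo, hi = confidence_interval(-0.45, 0.22)
print(round(lo, 2), round(hi, 2))       # -0.88 -0.02

# Effect of a $0.50 tax increase on deaths per 10,000 people.
print(round(0.5 * -0.45, 3))            # -0.225
```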


[7] Eviews Exercise

(1) Exercise with an artificial panel data set named “artificial_panel.xls.”

There are four variables in the Excel file: “country”, “year”, “y”, and “x”. Each variable has 12 observations, from the 3rd row to the 14th row. The data are artificial numbers for three countries: US, Japan and Korea. Notice that the variable “country” is alphabetic, not numeric.

STEP 1: Open artificial_panel.xls using Excel. Then, using your mouse, select the data and copy them.

STEP 2: Open Eviews. Then, type the following in the Eviews command window (the narrow white window below the File, Edit, Object buttons):

create u 12 (enter)

Then, a workfile window will pop up.

Type the following in the Eviews command window:

alpha country (enter)

data year y x (enter)

The command “alpha” is used to create alphabetic variables, while “data” is for numeric variables.

Then, a spreadsheet will pop up.

Close the window by clicking on the X in its upper-right corner. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.

STEP 3:

On the workfile, click on the Show button. Then, a SHOW window will pop up. Type in the window:

country year y x

Click on OK. Then, a spreadsheet will pop up.

Click on the Edit+/- button and place your cursor on the cell for country in row 1. Then click the right mouse button to paste the copied data.

Then, you will see that the data from the Excel file are pasted into the spreadsheet.

Close the spreadsheet by clicking on the X in its upper-right corner. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.

STEP 4:

On the workfile, click the Save button. Choose the drive and folder where you want to save the file. Choose the file name “artificial_panel.wf1”.

Click on the Save button. Then, a “Workfile Save” window will pop up. Just click on the OK button.

Then, you will be back at the workfile.

STEP 5:

On the workfile, click the Proc button. Choose Structure/Resize Current Page…

Then you will see the Workfile Structure window. Choose Dated Panel.

Type 2001 for Start date, 2004 for End date, country for Cross-section ID series, and year for Date series. Then, click on OK.

Then, you will be back at the workfile. Save it!

STEP 6:

Click the Objects/New Object button. Choose Equation and choose art_pan as the name of the object. Then, an Equation Estimation window will pop up. Type “y x” in the Equation specification box.

Then click on Panel Options.

Choose “Fixed” for Cross-section, “Fixed” for Period, and “White (diagonal)” for Coef covariance method.

By choosing “Fixed” for Cross-section, you are running the regression with dummy variables for individual entities. By choosing “Fixed” for Period, you are adding time dummy variables to the regression.

STEP 7:

Choose View/Fixed/Random Effects/Cross-section Effects to see the estimated cross-section effects.

Choose View/Fixed/Random Effects/Period Effects to see the estimated period effects.

Choose View/Fixed/Random Effects Testing/Redundant Fixed Effects to test the joint significance of the fixed effects.

I found that the F and $\chi^2$ statistics for the individual dummy variables and the time dummy variables are computed assuming the error terms in the regression model are homoskedastic over $i$ and $t$. So, the results are not reliable if the error terms are in fact heteroskedastic. If you would like to test whether time effects are statistically significant, I suggest estimating your model choosing None for Period but including the time dummy variables as regressors.

(2) Exercise with fatality.wf1.

-----------------------------------------------------------------------------------
variable name    variable label
-----------------------------------------------------------------------------------
state            State ID (FIPS) Code
year             Year
spircons         Spirits Consumption
unrate           Unemployment Rate
perinc           Per Capita Personal Income
emppop           Employment/Population Ratio
beertax          Tax on Case of Beer
sobapt           % Southern Baptist
mormon           % Mormon
mlda             Minimum Legal Drinking Age
dry              % Residing in Dry Counties
yngdrv           % of Drivers Aged 15-24
vmiles           Ave. Mile per Driver
vmilespd         Ave. Mile per 1,000 Driver
breath           Prelim. Breath Test Law
jaild            Mandatory Jail Sentence
comserd          Mandatory Community Service
jailcom          jaild + comserd
allmort          # of Vehicle Fatalities (#VF)
mrall            Vehicle Fatality Rate (VFR) = #VF/Population
vfrall           10,000*mrall = VFR per 10,000 people
allnite          # of Night-time VF (#NVF)
mralln           Night-time VFR (NVFR)
allsvn           # of Single VF (#SVF)
a1517            #NVF, 15-17 year olds
mra1517n         NVFR, 15-17 year olds
a1829            #VF, 18-20 year olds
a1820n           #NVF, 18-20 year olds
mra1820          VFR, 18-20 year olds
mra1820n         NVFR, 18-20 year olds
a2124            #VF, 21-24 year olds
mra2124          VFR, 21-24 year olds
a2124n           #NVF, 21-24 year olds
mra2124n         NVFR, 21-24 year olds
aidall           # of alcohol-involved VF
da18             Dummy variable for drinking age = 18
da19             Dummy variable for drinking age = 19
da20             Dummy variable for drinking age = 20
lincperc         Log of per capita real income
mraidall         Alcohol-Involved VFR
pop              Population
pop1517          Population, 15-17 year olds
pop1820          Population, 18-20 year olds
pop2124          Population, 21-24 year olds
miles            total vehicle miles (millions)
unus             U.S. unemployment rate
epopus           U.S. Emp/Pop Ratio
gspch            GSP Rate of Change
Dum1982
Dum1983
Dum1984
:
DUM1988
-----------------------------------------------------------------------------------

Estimation of specification (4) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            -2.327171     1.316419     -1.767804     0.0782
BEERTAX      -0.450272     0.222005     -2.028203     0.0435
DA18          0.027509     0.065473      0.420158     0.6747
DA19         -0.019096     0.039510     -0.483315     0.6293
DA20          0.030875     0.045689      0.675767     0.4998
JAILD         0.012644     0.031940      0.395866     0.6925
COMSERD       0.034135     0.114820      0.297289     0.7665
VMILESPD      0.008226     0.008368      0.983073     0.3264
LINCPERC      1.814889     0.472220      3.843312     0.0002
UNRATE       -0.063043     0.011616     -5.427345     0.0000
DUM1982       0.533926     0.075931      7.031706     0.0000
DUM1983       0.435841     0.070418      6.189300     0.0000
DUM1984       0.246723     0.050392      4.896067     0.0000
DUM1985       0.155325     0.043688      3.555327     0.0004
DUM1986       0.189843     0.040808      4.652090     0.0000
DUM1987       0.087532     0.032452      2.697246     0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Log likelihood       183.8646    F-statistic          68.42532
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000

Testing significance of the individual and time dummy variables:

[Estimation choosing “Fixed” for Period and not using dummy variables as regressors.]

Redundant Fixed Effects Tests
Equation: MIN
Test cross-section and period fixed effects

Effects Test                       Statistic     d.f.       Prob.
Cross-section F                    44.772106     (47,273)   0.0000
Cross-section Chi-square           727.186063    47         0.0000
Period F                           19.685127     (6,273)    0.0000
Period Chi-square                  120.798386    6          0.0000
Cross-Section/Period F             40.398468     (53,273)   0.0000
Cross-Section/Period Chi-square    732.351587    53         0.0000
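The degrees of freedom in the test output follow from counting parameters. The short check below is a sketch that assumes the equation contains a constant, the nine slope regressors listed in the estimation output, 47 state dummies, and 6 year dummies; the variable names in the code are illustrative.

```python
n_states, n_years = 48, 7
n_obs = n_states * n_years      # 336 observations in the balanced panel
n_slopes = 9                    # BEERTAX, DA18, DA19, DA20, JAILD, COMSERD,
                                # VMILESPD, LINCPERC, UNRATE

q_cross = n_states - 1          # restrictions tested by the cross-section test
q_period = n_years - 1          # restrictions tested by the period test
n_params = 1 + n_slopes + q_cross + q_period   # constant + slopes + dummies
df_resid = n_obs - n_params     # denominator degrees of freedom of the F tests

print(q_cross, q_period, q_cross + q_period, df_resid)   # 47 6 53 273
```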


Testing significance of the time dummy variables:

[Estimation choosing “None” for Period and using dummy variables as regressors.]

Wald Test:
Equation: MIN

Test Statistic   Value       df         Probability
F-statistic      11.46715    (6, 273)   0.0000
Chi-square       68.80287    6          0.0000

What if Assumption #5 fails, so that $\operatorname{corr}(u_{it}, u_{is} \mid X_{it}, X_{is}, \alpha_i) \neq 0$?

• OLS panel data estimators of $\beta_1$ are unbiased, consistent.

• The OLS standard errors will be wrong.

• Use “heteroskedasticity and autocorrelation-consistent standard errors” (clustered standard errors).

• The clustered SE formula is NOT the usual (hetero-robust) SE formula! [Appendix 10.2 (pp. 379 – 381)].

• The clustered SE might not be very accurate if N is small.

• Eviews can compute these!

In Eviews, choose “White period” instead of “White (diagonal)”.
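To see why the two standard error formulas differ, the sketch below computes both for a one-regressor fixed effects model on simulated data whose errors are autocorrelated within each entity. It is an illustration under stated assumptions: the AR(1) design, the sample sizes, and the helper name `ar1` are made up, and no degrees-of-freedom correction is applied, so the numbers would not match Eviews output exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, rho = 300, 10, 0.8        # hypothetical panel size and AR(1) coefficient

def ar1(size_n, size_t):
    """Simulate size_n independent stationary AR(1) series of length size_t."""
    out = np.empty((size_n, size_t))
    out[:, 0] = rng.normal(size=size_n)
    for t in range(1, size_t):
        shock = rng.normal(size=size_n)
        out[:, t] = rho * out[:, t - 1] + np.sqrt(1 - rho**2) * shock
    return out

x = ar1(n, T)
u = ar1(n, T)                   # errors correlated over time within an entity
alpha = rng.normal(size=(n, 1)) # entity fixed effects
y = 1.0 * x + alpha + u

# Entity-demeaned (fixed effects) estimator of the slope.
x_dm = x - x.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
sxx = (x_dm**2).sum()
beta_hat = (x_dm * y_dm).sum() / sxx
resid = y_dm - beta_hat * x_dm
score = x_dm * resid            # one term per (i, t)

# Heteroskedasticity-robust: square each term, then sum over i and t.
se_robust = np.sqrt((score**2).sum()) / sxx
# Clustered by entity: sum over t within each entity first, then square.
se_cluster = np.sqrt((score.sum(axis=1) ** 2).sum()) / sxx

print(f"beta_hat = {beta_hat:.3f}")
print(f"robust SE = {se_robust:.4f}, clustered SE = {se_cluster:.4f}")
```

The only difference between the two formulas is where the squaring happens; clustering keeps the cross-products between periods of the same entity, which is what picks up the autocorrelation.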


Estimation of specification (7) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            -2.327171     1.915400     -1.214979     0.2254
BEERTAX      -0.450272     0.319805     -1.407961     0.1603
DA18          0.027509     0.075267      0.365483     0.7150
DA19         -0.019096     0.053288     -0.358351     0.7204
DA20          0.030875     0.054076      0.570957     0.5685
JAILD         0.012644     0.017699      0.714386     0.4756
COMSERD       0.034135     0.142797      0.239043     0.8113
VMILESPD      0.008226     0.007355      1.118432     0.2644
LINCPERC      1.814889     0.683535      2.655150     0.0084
UNRATE       -0.063043     0.013984     -4.508168     0.0000
DUM1982       0.533926     0.098541      5.418291     0.0000
DUM1983       0.435841     0.091540      4.761205     0.0000
DUM1984       0.246723     0.064103      3.848852     0.0001
DUM1985       0.155325     0.054832      2.832774     0.0050
DUM1986       0.189843     0.042774      4.438265     0.0000
DUM1987       0.087532     0.032445      2.697841     0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared            0.939540    Mean dependent var   2.040444
Adjusted R-squared   0.925809    S.D. dependent var   0.570194
Durbin-Watson stat   1.733929    Prob(F-statistic)    0.000000

• Average tax = \$0.5/case, and average fatality rate = 2 per 10,000 people.

• As tax increases by \$0.5, fatality rate drops by 0.45×0.5 = 0.225 (per 10,000).

The 95% confidence interval for the effect of BeerTax is

$-0.45 \pm 1.96 \times 0.32 = (-1.08, 0.18)$,

which is wider than $(-0.88, -0.02)$.
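The change in significance can be checked from the two reported standard errors for BEERTAX. The sketch below uses a standard normal approximation for the p-values, which is an assumption; Eviews reports p-values from a t distribution, so the values are close to the tables but not identical. The helper name `t_stat_and_p` is illustrative.

```python
import math

def t_stat_and_p(estimate, std_error):
    """t-statistic and two-sided p-value under a standard normal approximation."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2))    # equals 2 * (1 - Phi(|t|))
    return t, p

coef = -0.450272                             # BEERTAX coefficient in both tables
t_diag, p_diag = t_stat_and_p(coef, 0.222005)      # White (diagonal) SE
t_period, p_period = t_stat_and_p(coef, 0.319805)  # White period SE

print(f"White (diagonal): t = {t_diag:.2f}, p = {p_diag:.3f}")
print(f"White period:     t = {t_period:.2f}, p = {p_period:.3f}")
```

With the larger standard error the coefficient is no longer significant at the 5% level, which matches the interval $(-1.08, 0.18)$ containing zero.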
