
PANEL DATA (Ch. 10)

The recommended exercise questions from the textbook: Chapter 10: All except (10.6), (10.10).

[1]

What are panel data?

Panel data consist of observations on the same n entities at two or more time periods T. If the data set contains observations on the variables X and Y, then the data are denoted

$$(X_{it}, Y_{it}), \quad i = 1, \ldots, n \quad \text{and} \quad t = 1, \ldots, T,$$

where the first subscript, i, refers to the entity being observed, and the second subscript, t, refers to the date at which it is observed.

Balanced panel vs. unbalanced panel.
Balanced panel: Variables are observed for each entity and each time period.
Unbalanced panel: Some data are missing for at least one entity in at least one time period.
We consider the analysis of balanced panels, but the extension to unbalanced panels is straightforward.
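The balanced/unbalanced distinction can be checked mechanically. The following is a minimal sketch, assuming the panel is available as a list of (entity, period) pairs; the function name `is_balanced` and the example data are hypothetical.

```python
def is_balanced(observations):
    """Return True if every entity is observed in every time period.

    `observations` is an iterable of (entity, period) pairs.
    """
    pairs = set(observations)
    entities = {i for i, _ in pairs}
    periods = {t for _, t in pairs}
    # A balanced panel has exactly n * T distinct (entity, period) cells.
    return len(pairs) == len(entities) * len(periods)


# Hypothetical example: 2 entities observed over 3 periods.
balanced = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3)]
unbalanced = balanced[:-1]  # entity B is missing period 3

print(is_balanced(balanced))    # True
print(is_balanced(unbalanced))  # False
```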


[2]

Revisiting Omitted Variables Biases

Issue: Do alcohol taxes help decrease traffic deaths?

Data: fatality.wf1
48 U.S. states (excluding Alaska and Hawaii): n = 48.
1982-1988: T = 7.
fatality rate = # of traffic accident deaths per 10,000 people.
beertax = tax per case of beer ($).

Estimation results for the 1982 data:

$$\widehat{FatalityRate} = \underset{(0.15)}{2.01} + \underset{(0.13)}{0.15}\,BeerTax$$

Estimation results for the 1988 data:

$$\widehat{FatalityRate} = \underset{(0.11)}{1.86} + \underset{(0.13)}{0.44}\,BeerTax$$


What is going on here?

Consider a simple multiple regression model (for a given time t):

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \ldots, n,$$

where $Z_i$ is a time-invariant regressor.

What do $\beta_1$ and $\beta_2$ measure? $\beta_1$ measures the partial effect of $X_{it}$ on $Y_{it}$ with $Z_i$ held constant. Similarly, $\beta_2$ measures the partial effect of $Z_i$ on $Y_{it}$ with $X_{it}$ held constant.

If you estimate $Y_{it} = \beta_0 + \beta_1 X_{it} + error_{it}$ instead?

$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2 \frac{\mathrm{cov}(X_{it}, Z_i)}{\mathrm{var}(X_{it})}$$

Each state would have a different level of preference for alcohol (say, $Z_i$ = Pal). Pal (Z) and Beertax (X) could be positively related: $\mathrm{cov}(X_{it}, Z_i) > 0$. Pal (Z) would have a positive partial effect on FatalityRate ($\beta_2 > 0$).

Thus, $\hat{\beta}_1$ could be positive even if the true $\beta_1$ is negative.
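The omitted-variable formula can be illustrated numerically. This is only a sketch on made-up numbers (the values of X, Z and the coefficients are hypothetical, and the error term is set to zero so that the sample identity holds exactly): the slope from the short regression of Y on X equals $\beta_1 + \beta_2\,\mathrm{cov}(X, Z)/\mathrm{var}(X)$ and comes out positive although the true $\beta_1$ is negative.

```python
def cov(a, b):
    """Sample covariance (dividing by n; the divisor cancels in the ratios below)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Hypothetical values; the true beta1 is negative.
beta0, beta1, beta2 = 2.0, -1.0, 3.0
X = [0.1, 0.2, 0.4, 0.5, 0.8, 1.0]
Z = [0.0, 0.2, 0.3, 0.6, 0.7, 1.1]  # omitted variable, positively related to X
Y = [beta0 + beta1 * x + beta2 * z for x, z in zip(X, Z)]  # u = 0 for clarity

# Slope from regressing Y on X alone (Z omitted).
short_slope = cov(X, Y) / cov(X, X)
# What the omitted-variable formula predicts for that slope.
predicted = beta1 + beta2 * cov(X, Z) / cov(X, X)

print(short_slope, predicted)  # both positive, although beta1 = -1
```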


How could we control Pal using panel data?


[3]

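Panel Data with Two Time Periods

Two equations for 1982 and 1988:

$$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + \beta_2 Z_i + u_{i,1988}.$$

$$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + \beta_2 Z_i + u_{i,1982}.$$

Subtracting the 1982 equation from the 1988 equation:

$$FatalityRate_{i,1988} - FatalityRate_{i,1982} = \beta_1 (BeerTax_{i,1988} - BeerTax_{i,1982}) + (u_{i,1988} - u_{i,1982}). \quad (1)$$

No $Z_i$ in (1)! OLS on (1) will yield a consistent estimator of $\beta_1$. Actual estimation results for (1):

$$\widehat{Fatality_{1988} - Fatality_{1982}} = \underset{(0.065)}{-0.072} - \underset{(0.36)}{1.04}\,(BeerTax_{1988} - BeerTax_{1982})$$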


Comments on the before-and-after estimation results.


As real beer tax increases by $1 per case, the traffic fatality rate falls by 1.04 deaths per 10,000 people. This is a big effect, because mean traffic fatality rate is approximately two. This before-and-after approach works well if T = 2. What should we do if T > 2?
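The before-and-after idea can be sketched in a few lines. The numbers below are hypothetical and noise-free (not the fatality data): the cross-sectional slope for one year is contaminated by the unobserved $Z_i$, while the slope from the differenced regression recovers $\beta_1$ because $Z_i$ drops out.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data for 4 entities; Z is time-invariant and unobserved.
beta0, beta1, beta2 = 2.0, -1.0, 3.0
Z = [0.2, 0.5, 0.9, 1.4]
X_1982 = [0.3, 0.5, 0.8, 1.2]
X_1988 = [0.4, 0.4, 1.0, 1.1]
Y_1982 = [beta0 + beta1 * x + beta2 * z for x, z in zip(X_1982, Z)]
Y_1988 = [beta0 + beta1 * x + beta2 * z for x, z in zip(X_1988, Z)]

# Cross-sectional regression for 1988 alone: biased by the omitted Z.
slope_1988 = ols_slope(X_1988, Y_1988)

# Before-and-after regression: Z cancels in the differences.
dX = [b - a for a, b in zip(X_1982, X_1988)]
dY = [b - a for a, b in zip(Y_1982, Y_1988)]
slope_diff = ols_slope(dX, dY)

print(slope_1988)  # positive, the wrong sign
print(slope_diff)  # approximately -1.0, the true beta1
```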


[4]

Fixed Effects Regression

(A) A simple regression model:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \quad i = 1, \ldots, n, \quad t = 1, \ldots, T. \quad (1)$$

Set $\alpha_i = \beta_0 + \beta_2 Z_i$. Then, we have

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}, \quad (2)$$

which is called the fixed effects regression model. For the ith cross-sectional entity, (2) is a regression line with intercept $\alpha_i$. The slope coefficient $\beta_1$ is the same for all i, but the intercept terms $\alpha_i$ are different across different i (but constant over time).

Set:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}, \quad (3)$$

where i = 1, ..., n, t = 1, ..., T (nT observations),

$$D2_i = \begin{cases} 1 & \text{if } i \text{ is the 2nd entity;} \\ 0 & \text{otherwise,} \end{cases}$$

and other dummy variables D3, ..., Dn are similarly defined. In (3), $\alpha_1 = \beta_0$, $\alpha_2 = \beta_0 + \gamma_2$, ..., $\alpha_n = \beta_0 + \gamma_n$. The slope coefficient $\beta_1$ and n other parameters ($\beta_0, \gamma_2, \ldots, \gamma_n$) can be estimated by OLS on model (3).


Entity-demeaned OLS algorithm

$$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$$

$$\bar{Y}_i = \beta_1 \bar{X}_i + \alpha_i + \bar{u}_i, \quad \text{where } \bar{Y}_i = \frac{1}{T} \sum_{t=1}^{T} Y_{it}.$$

------------------------------------

$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i). \quad (4)$$

OLS estimator of $\beta_1$ from (4) = OLS estimator of $\beta_1$ from (3).

Least Squares Assumptions for the fixed effects model:
(FEA.1) $E(u_{it} \mid X_{i1}, X_{i2}, \ldots, X_{iT}, \alpha_i) = 0$.
(FEA.2) The data, $(X_{i1}, \ldots, X_{iT}, Y_{i1}, \ldots, Y_{iT})$, i = 1, ..., n, are a random sample.
(FEA.3) $(X_{it}, \alpha_i)$ have nonzero finite fourth moments: Large outliers are unlikely.
(FEA.4) There is no perfect multicollinearity.
(FEA.5) No autocorrelation: $\mathrm{cov}(u_{it}, u_{is} \mid X_{i1}, \ldots, X_{iT}, \alpha_i) = 0$ for all $t \neq s$.

For multiple regressions, $X_{it}$ should be replaced by the full list of $X_{1,it}, \ldots, X_{k,it}$. What happens if (FEA.5) is violated?
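The equivalence between the dummy-variable regression (3) and the entity-demeaned regression (4) can be checked on a toy panel. The sketch below uses hypothetical data and a small hand-written OLS routine (`solve` and `ols` are illustrative helpers, not part of any package) so that it runs with the standard library only.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical panel: rows are (entity, x, y); 3 entities, T = 3 each.
data = [(0, 1.0, 2.1), (0, 2.0, 2.9), (0, 3.0, 4.2),
        (1, 2.0, 5.0), (1, 3.0, 6.1), (1, 5.0, 7.8),
        (2, 1.0, 0.9), (2, 4.0, 4.1), (2, 6.0, 5.8)]

# Model (3): intercept, x, and dummies D2, D3 (entity 0 is the base).
X_lsdv = [[1.0, x, float(i == 1), float(i == 2)] for i, x, _ in data]
y = [yi for _, _, yi in data]
b_lsdv = ols(X_lsdv, y)[1]  # coefficient on x


# Model (4): subtract entity means, then regress without an intercept.
def entity_mean(col, i):
    vals = [row[col] for row in data if row[0] == i]
    return sum(vals) / len(vals)


xd = [x - entity_mean(1, i) for i, x, _ in data]
yd = [yi - entity_mean(2, i) for i, _, yi in data]
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(b_lsdv, b_within)  # the two slope estimates agree up to rounding error
```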


(B) Extension to multiple Xs.

The fixed effects regression model is

$$Y_{it} = \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \alpha_i + u_{it}, \quad (5)$$

where i = 1, ..., n, and t = 1, ..., T. Equivalently, the fixed effects model can be written as

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}. \quad (6)$$

Entity-demeaned algorithm:

$$(Y_{it} - \bar{Y}_i) = \beta_1 (X_{1,it} - \bar{X}_{1,i}) + \cdots + \beta_k (X_{k,it} - \bar{X}_{k,i}) + (u_{it} - \bar{u}_i). \quad (7)$$

OLS estimators of $\beta_1, \ldots, \beta_k$ from (7) = OLS estimators of $\beta_1, \ldots, \beta_k$ from (6).

(C) Application to Traffic Deaths.

Fixed effects regression results:

$$\widehat{FatalityRate} = \underset{(0.20)}{-0.66}\,BeerTax + StateFixedEffects.$$


[5]

Time and Entity Fixed Effects Model

(1) Motivation.

Return to our FatalityRate example:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it},$$

where $Y_{it}$ = FatalityRate; $X_{it}$ = BeerTax; $Z_i$ = time-invariant preferences for alcohol or driving of the people in State i; $S_t$ = time-specific effects (common to all states) such as overall automobile safety improvements.

$$\text{Let } B1_t = \begin{cases} 1 & \text{if } t \text{ is the first time period;} \\ 0 & \text{otherwise.} \end{cases}$$

Define dummy variables $B2_t, \ldots, BT_t$ similarly.

(2) Time and Entity Fixed Effects Model:

$$Y_{it} = \beta_0 + \beta_1 X_{1,it} + \cdots + \beta_k X_{k,it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}.$$

Too many regressors. But we can get reasonably accurate estimates of $\beta_1, \ldots, \beta_k$. The estimates of $\gamma_2, \ldots, \gamma_n$ and $\delta_2, \ldots, \delta_T$, however, are inaccurate.

(3) Application to traffic deaths

$$\widehat{FatalityRate} = \underset{(0.25)}{-0.64}\,BeerTax + StateFixedEffects + TimeFixedEffects.$$
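For a balanced panel, entity and time effects can both be removed by subtracting the entity mean and the period mean and adding back the grand mean. The sketch below applies this two-way demeaning to hypothetical, noise-free data (all names and numbers are made up), so the slope is recovered exactly; it is an illustration of the idea, not a description of how Eviews computes its estimates.

```python
entities = ["A", "B", "C"]
periods = [1, 2, 3]

# Hypothetical parameters and regressor values.
beta1 = -0.5
alpha = {"A": 1.0, "B": 2.5, "C": 0.3}  # entity effects
lam = {1: 0.0, 2: 0.4, 3: -0.2}          # time effects
x = {("A", 1): 0.2, ("A", 2): 0.5, ("A", 3): 0.9,
     ("B", 1): 1.0, ("B", 2): 0.7, ("B", 3): 1.5,
     ("C", 1): 0.3, ("C", 2): 0.8, ("C", 3): 0.4}
y = {k: beta1 * x[k] + alpha[k[0]] + lam[k[1]] for k in x}  # u = 0 for clarity


def two_way_demean(v):
    """v_it - (entity mean) - (period mean) + (grand mean), balanced panel only."""
    grand = sum(v.values()) / len(v)
    e_mean = {i: sum(v[i, t] for t in periods) / len(periods) for i in entities}
    t_mean = {t: sum(v[i, t] for i in entities) / len(entities) for t in periods}
    return {(i, t): v[i, t] - e_mean[i] - t_mean[t] + grand for (i, t) in v}


xd, yd = two_way_demean(x), two_way_demean(y)
slope = sum(xd[k] * yd[k] for k in xd) / sum(xd[k] ** 2 for k in xd)
print(slope)  # approximately -0.5
```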


[6]

Drunk Driving Laws and Traffic Death

Would driving laws and economic conditions matter?


Drinking-age and drunk-driving laws do not matter very much. Economic factors are important.

(4) is the base model. Average tax = $0.5/case, and average fatality rate = 2 per 10,000 people. As tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per 10,000).

But this result is somewhat imprecise: the 95% confidence interval for the effect of BeerTax is

$$-0.45 \pm 1.96 \times 0.22 \;\Rightarrow\; (-0.88, -0.02),$$

which is quite wide.
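The arithmetic behind the interval and the implied effect can be reproduced as follows. This is a small sketch using the rounded coefficient and standard error quoted above; the helper name `conf_int_95` is hypothetical.

```python
def conf_int_95(estimate, std_error, z=1.96):
    """Large-sample 95% confidence interval: estimate +/- z * SE."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


lo, hi = conf_int_95(-0.45, 0.22)
print(round(lo, 2), round(hi, 2))  # -0.88 -0.02

# Implied change in the fatality rate for a $0.50 tax increase.
effect = -0.45 * 0.5
print(effect)  # -0.225
```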


[7]
Eviews Exercise

(1) Exercise with an artificial panel data set named artificial_panel.xls.

There are four variables in the Excel file: country, year, y, and x. Each variable has 12 observations, from the 3rd row to the 14th row. The data are artificial numbers for three countries: US, Japan and Korea. Notice that the variable country is alphabetic, not numeric.

STEP 1: Open artificial_panel.xls using Excel. Then, using your mouse, block the data and copy them.

STEP 2: Open Eviews. Then, type the following in the Eviews command window (the narrow white window below the File, Edit, Object buttons):

    create u 12 (enter)

Then, a workfile window will pop up.


Type the following in the Eviews command window:

    alpha country (enter)
    data year y x (enter)

The command alpha is used to create alphabetic variables, while data is for numeric variables. Then, a spreadsheet will pop up.


Close the window by clicking on the X in the top-right corner of the window. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.


STEP 3:

On the workfile, click on the Show button. Then, a SHOW window will pop up. Type in the window: country year y x


Click on OK. Then, a spreadsheet will pop up.


Click on the Edit+/- button and place your cursor on the 1-country cell. Then click the right button on your mouse.


Then, you will see that the data from the excel file are pasted to the spreadsheet.


Close the spreadsheet by clicking on the X in the top-right corner. Eviews will ask you whether you want to delete Untitled Group. Click on the Yes button.

STEP 4: On the workfile, push the Save button. Choose the drive and file folder where you want to save the file. Choose the file name artificial_panel.wf1.


Click on the save button. Then, a Workfile Save window will pop up. Just click on the ok button.


Then, you will be back to the workfile.


STEP 5:

On the workfile, push the Proc button. Choose Structure/Resize Current Page


Then you will have the Workfile Structure window. Choose Dated Panel. Then, you will have the following screen.


Type 2001 for Start date, 2004 for End date, country for Cross section ID series, and year for Date series. Then, click on OK.


Then, you will be back to the workfile. Save it!

STEP 6: Push the Objects/New Object... button. Choose Equation and choose art_pan as the name of the object. Then, an Equation Estimation window will pop up. Type y x in the Equation specification box.


And click on Panel Options.


Choose Fixed for Cross-section, Fixed for Period, and White (diagonal) for Coef covariance method. By choosing Fixed for Cross-section, you are doing regression with dummy variables for individual entities. By choosing Fixed for Period, you are adding time dummy variables into regression.
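One way to picture what the two Fixed options add is to write out a row of the dummy-variable regression for the artificial panel (3 countries, years 2001 to 2004). The sketch below is an assumption-laden illustration: the choice of base categories, the ordering of countries, and the value of x are hypothetical, and Eviews may parameterize and report the effects differently.

```python
countries = ["US", "Japan", "Korea"]
years = [2001, 2002, 2003, 2004]


def design_row(country, year, x):
    """Intercept, x, entity dummies (base: first country), time dummies (base: first year)."""
    entity_dummies = [float(country == c) for c in countries[1:]]
    time_dummies = [float(year == t) for t in years[1:]]
    return [1.0, x] + entity_dummies + time_dummies


# Hypothetical observation: Korea in 2003 with x = 0.7.
row = design_row("Korea", 2003, x=0.7)
print(row)  # [1.0, 0.7, 0.0, 1.0, 0.0, 1.0, 0.0]

n_obs = len(countries) * len(years)  # 12 observations
n_cols = len(row)                    # 7 coefficients to estimate
print(n_obs - n_cols)                # 5 residual degrees of freedom
```

With only 12 observations and 7 coefficients, very few degrees of freedom remain, which is why the artificial data set is useful for practice only.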


STEP 7:

Choose view/Fixed/Random Effects/Cross-section Effects. Then you will have:


Choose view/Fixed/Random Effects/Period Effects.


Choose view/Fixed/Random Effects Testing/Redundant Fixed Effects.


I found that the F and $\chi^2$ statistics for the individual dummy variables and the time dummy variables are computed assuming that the error terms in the regression model are homoskedastic over i and t. So, the results are not reliable if the error terms are in fact heteroskedastic. If you would like to test whether time effects are statistically significant, I suggest estimating your model choosing None for Period but including time dummy variables as regressors.


(2) Exercise with fatality.wf1.

-----------------------------------------------------------------
variable name   variable label
-----------------------------------------------------------------
state           State ID (FIPS) Code
year            Year
spircons        Spirits Consumption
unrate          Unemployment Rate
perinc          Per Capita Personal Income
emppop          Employment/Population Ratio
beertax         Tax on Case of Beer
sobapt          % Southern Baptist
mormon          % Mormon
mlda            Minimum Legal Drinking Age
dry             % Residing in Dry Counties
yngdrv          % of Drivers Aged 15-24
vmiles          Ave. Mile per Driver
vmilespd        Ave. Mile per 1,000 Driver
breath          Prelim. Breath Test Law
jaild           Mandatory Jail Sentence
comserd         Mandatory Community Service
jailcom         jaild + comserd
allmort         # of Vehicle Fatalities (#VF)
mrall           Vehicle Fatality Rate (VFR) = #VF/Population
vfrall          10,000*mrall = VFR per 10,000 people
allnite         # of Night-time VF (#NVF)
mralln          Night-time VFR (NVFR)
allsvn          # of Single VF (#SVF)
a1517           #NVF, 15-17 year olds
mra1517n        NVFR, 15-17 year olds
a1829           #VF, 18-20 year olds
a1820n          #NVF, 18-20 year olds
mra1820         VFR, 18-20 year olds
mra1820n        NVFR, 18-20 year olds
a2124           #VF, 21-24 year olds
mra2124         VFR, 21-24 year olds
a2124n          #NVF, 21-24 year olds
mra2124n        NVFR, 21-24 year olds
aidall          # of alcohol-involved VF
da18            Dummy variable for drinking age = 18
da19            Dummy variable for drinking age = 19
da20            Dummy variable for drinking age = 20
lincperc        Log of per capita real income
mraidall        Alcohol-Involved VFR
pop             Population
pop1517         Population, 15-17 year olds
pop1820         Population, 18-20 year olds
pop2124         Population, 21-24 year olds
miles           total vehicle miles (millions)
unus            U.S. unemployment rate
epopus          U.S. Emp/Pop Ratio
gspch           GSP Rate of Change
Dum1982
Dum1983
Dum1984
:
DUM1988
-----------------------------------------------------------------


Estimation of the specification (4) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White diagonal standard errors & covariance (d.f. corrected)

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              -2.327171     1.316419     -1.767804   0.0782
BEERTAX        -0.450272     0.222005     -2.028203   0.0435
DA18            0.027509     0.065473      0.420158   0.6747
DA19           -0.019096     0.039510     -0.483315   0.6293
DA20            0.030875     0.045689      0.675767   0.4998
JAILD           0.012644     0.031940      0.395866   0.6925
COMSERD         0.034135     0.114820      0.297289   0.7665
VMILESPD        0.008226     0.008368      0.983073   0.3264
LINCPERC        1.814889     0.472220      3.843312   0.0002
UNRATE         -0.063043     0.011616     -5.427345   0.0000
DUM1982         0.533926     0.075931      7.031706   0.0000
DUM1983         0.435841     0.070418      6.189300   0.0000
DUM1984         0.246723     0.050392      4.896067   0.0000
DUM1985         0.155325     0.043688      3.555327   0.0004
DUM1986         0.189843     0.040808      4.652090   0.0000
DUM1987         0.087532     0.032452      2.697246   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared             0.939540    Mean dependent var    2.040444
Adjusted R-squared    0.925809    S.D. dependent var    0.570194
Log likelihood        183.8646    F-statistic           68.42532
Durbin-Watson stat    1.733929    Prob(F-statistic)     0.000000


Testing significance of the individual and time dummy variables: [Estimation choosing Fixed for period and not using dummy variables as regressor.]

Redundant Fixed Effects Tests
Equation: MIN
Test cross-section and period fixed effects

Effects Test                        Statistic     d.f.       Prob.
Cross-section F                     44.772106     (47,273)   0.0000
Cross-section Chi-square            727.186063    47         0.0000
Period F                            19.685127     (6,273)    0.0000
Period Chi-square                   120.798386    6          0.0000
Cross-Section/Period F              40.398468     (53,273)   0.0000
Cross-Section/Period Chi-square     732.351587    53         0.0000
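The degrees of freedom in this output can be reconstructed by counting parameters. The sketch below assumes the nine slope regressors listed in the estimation output (BEERTAX through UNRATE), with the intercept absorbed into the 48 entity effects; the variable names are illustrative.

```python
n_entities, n_periods, n_slopes = 48, 7, 9

n_obs = n_entities * n_periods                      # 336 observations
n_params = n_slopes + n_entities + (n_periods - 1)  # 9 + 48 + 6 = 63
df_resid = n_obs - n_params                         # denominator d.f.

df_cross = n_entities - 1         # restrictions tested for entity effects
df_period = n_periods - 1         # restrictions tested for period effects
df_joint = df_cross + df_period   # both sets of effects together

print(df_cross, df_period, df_joint, df_resid)  # 47 6 53 273
```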


Testing significance of the time dummy variables: [Estimation choosing None for period and using dummy variables as regressor.]

Wald Test:
Equation: MIN

Test Statistic    Value       df          Probability
F-statistic       11.46715    (6, 273)    0.0000
Chi-square        68.80287    6           0.0000
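The two statistics in the Wald output are tied together: for q linear restrictions, the chi-square statistic equals q times the F-statistic. A quick check with the reported numbers (small differences come from rounding in the printed output):

```python
q = 6               # number of time dummies tested
f_stat = 11.46715   # reported F-statistic
chi_sq_reported = 68.80287

chi_sq = q * f_stat
print(round(chi_sq, 4))  # 68.8029
```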


Comments on (FEA.5):
What if Assumption #5 fails, so that $\mathrm{corr}(u_{it}, u_{is} \mid X_{it}, X_{is}, \alpha_i) \neq 0$?

OLS panel data estimators of $\beta_1$ are unbiased and consistent.
The OLS standard errors will be wrong.
Use heteroskedasticity- and autocorrelation-consistent standard errors (clustered standard errors).
The clustered SE formula is NOT the usual (hetero-robust) SE formula! [Appendix 10.2 (pp. 379-381)].
The clustered SE might not be very accurate if n is small.
Eviews can compute these! In Eviews, choose White period instead of White (diagonal).
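The difference between the two formulas can be sketched for a single-regressor fixed effects model. The code below is a simplified illustration on hypothetical data, without any degrees-of-freedom correction, so it will not reproduce the Eviews numbers: the hetero-robust version squares each product of demeaned x and residual separately, while the clustered version first sums those products within an entity and then squares, which lets errors be correlated over time within an entity.

```python
from math import sqrt


def within_fit(panel):
    """panel: dict entity -> list of (x, y). Returns slope, demeaned x, residuals, sum of squares."""
    xd, yd = {}, {}
    for i, obs in panel.items():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xd[i] = [x - mx for x, _ in obs]
        yd[i] = [y - my for _, y in obs]
    sxx = sum(a * a for i in panel for a in xd[i])
    b = sum(a * c for i in panel for a, c in zip(xd[i], yd[i])) / sxx
    resid = {i: [c - b * a for a, c in zip(xd[i], yd[i])] for i in panel}
    return b, xd, resid, sxx


def robust_se(xd, resid, sxx):
    """Hetero-robust SE: every (i, t) term is treated as uncorrelated with the others."""
    meat = sum((a * u) ** 2 for i in xd for a, u in zip(xd[i], resid[i]))
    return sqrt(meat) / sxx


def clustered_se(xd, resid, sxx):
    """Clustered SE: sum x*u within each entity first, then square."""
    meat = sum(sum(a * u for a, u in zip(xd[i], resid[i])) ** 2 for i in xd)
    return sqrt(meat) / sxx


# Hypothetical panel: 3 entities, 4 periods each.
panel = {
    "s1": [(0.2, 2.3), (0.3, 2.4), (0.5, 2.0), (0.6, 1.9)],
    "s2": [(1.0, 3.1), (1.1, 2.8), (1.3, 2.9), (1.2, 3.0)],
    "s3": [(0.4, 1.5), (0.4, 1.7), (0.7, 1.2), (0.9, 1.3)],
}
b, xd, resid, sxx = within_fit(panel)
print(b, robust_se(xd, resid, sxx), clustered_se(xd, resid, sxx))
```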


Estimation of the specification (7) in Table 10.1 on p. 368.

Dependent Variable: VFRALL
Sample: 1982 1988
Cross-sections included: 48
Total panel (balanced) observations: 336
White period standard errors & covariance (d.f. corrected)

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C              -2.327171     1.915400     -1.214979   0.2254
BEERTAX        -0.450272     0.319805     -1.407961   0.1603
DA18            0.027509     0.075267      0.365483   0.7150
DA19           -0.019096     0.053288     -0.358351   0.7204
DA20            0.030875     0.054076      0.570957   0.5685
JAILD           0.012644     0.017699      0.714386   0.4756
COMSERD         0.034135     0.142797      0.239043   0.8113
VMILESPD        0.008226     0.007355      1.118432   0.2644
LINCPERC        1.814889     0.683535      2.655150   0.0084
UNRATE         -0.063043     0.013984     -4.508168   0.0000
DUM1982         0.533926     0.098541      5.418291   0.0000
DUM1983         0.435841     0.091540      4.761205   0.0000
DUM1984         0.246723     0.064103      3.848852   0.0001
DUM1985         0.155325     0.054832      2.832774   0.0050
DUM1986         0.189843     0.042774      4.438265   0.0000
DUM1987         0.087532     0.032445      2.697841   0.0074

Effects Specification
Cross-section fixed (dummy variables)

R-squared             0.939540    Mean dependent var    2.040444
Adjusted R-squared    0.925809    S.D. dependent var    0.570194
Durbin-Watson stat    1.733929    Prob(F-statistic)     0.000000


Average tax = $0.5/case, and average fatality rate = 2 per 10,000 people. As tax increases by $0.5, the fatality rate drops by 0.45 × 0.5 = 0.225 (per 10,000).

The 95% confidence interval for the effect of BeerTax is

$$-0.45 \pm 1.96 \times 0.32 \;\Rightarrow\; (-1.08, 0.18),$$

which is wider than (-0.88, -0.02).
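The same coefficient leads to different conclusions under the two standard errors. The sketch below recomputes the BEERTAX t-statistics from the two estimation outputs and compares them with the 5% critical value of 1.96; the dictionary name `t_stats` is illustrative.

```python
coef = -0.450272  # BEERTAX coefficient, identical in both outputs
std_errors = {
    "White diagonal": 0.222005,  # hetero-robust
    "White period": 0.319805,    # clustered by entity
}

t_stats = {}
for label, se in std_errors.items():
    t = coef / se
    t_stats[label] = t
    verdict = "reject" if abs(t) > 1.96 else "do not reject"
    print(f"{label}: t = {t:.3f}, {verdict} H0: beta = 0 at the 5% level")

# Ratio of the clustered SE to the hetero-robust SE.
print(std_errors["White period"] / std_errors["White diagonal"])
```

With the clustered standard error, the BeerTax effect is no longer significant at the 5% level, which matches the interval above containing zero.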
