
REGRESSION DIAGNOSTICS

Detecting problems in regression models and treating them to obtain unbiased results

Assumptions of OLS Estimator


1) E(e_i) = 0 (needed for unbiasedness)
2) Var(e_i) = σ² is constant (homoscedasticity)
3) Cov(e_i, e_j) = 0 for i ≠ j (independent error terms)
4) Cov(e_i, X_i) = 0 (error terms unrelated to the Xs)

In short: e_i ~ iid(0, σ²).

Gauss-Markov Theorem: if these conditions hold, OLS is the best linear unbiased estimator (BLUE).
Additional assumption: the e_i are normally distributed.
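As a quick illustration, here is a minimal sketch that fits OLS on simulated data satisfying these assumptions and confirms the residuals average to roughly zero. The coefficient values and the use of statsmodels are assumptions made for illustration, not part of the original slides.

```python
# A minimal sketch: fit OLS on simulated data and inspect the residuals.
# The data and coefficients here are made up for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # two explanatory variables
e = rng.normal(scale=1.0, size=200)    # iid errors satisfying the assumptions
y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + e

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
print("Mean of residuals (should be ~0):", model.resid.mean())
```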

Three Illnesses in Regression

1) Multicollinearity: strong relationship among the explanatory variables.
2) Heteroscedasticity: changing error variance.
3) Autocorrelated error terms: a symptom of specification error.


Multicollinearity (strong relationship among explanatory variables themselves)



- Variances of the regression coefficients are inflated.
- Regression coefficients may differ from their true values, even in sign.
- Adding or removing variables produces large changes in the coefficients.
- Removing a single data point may cause large changes in coefficient estimates or signs.
- In some cases the F ratio may be significant and R² very high even though all t ratios are insignificant (each suggesting no significant relationship).


Solutions to the Multicollinearity Problem

- Drop a collinear variable from the regression.
- Combine collinear variables (e.g., use their sum as one variable).
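Before choosing a remedy, a standard diagnostic is the variance inflation factor (VIF). The sketch below is a minimal illustration, assuming statsmodels and simulated data (the slides name no software): two nearly collinear regressors are built, and their VIFs are printed; values far above 10 are a conventional warning sign.

```python
# A minimal sketch of a multicollinearity check via variance inflation factors.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # nearly collinear with x1
df = pd.DataFrame({"x1": x1, "x2": x2})

X = sm.add_constant(df)
for i, col in enumerate(X.columns):
    if col != "const":
        # VIF >> 10 signals severe multicollinearity
        print(col, variance_inflation_factor(X.values, i))
```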

Heteroscedasticity
The variance of the error terms is used in computing the t-tests of the coefficients. If this variance is not constant, the t-tests are not healthy (OLS is no longer efficient, so the probability of a Type II error is higher). The coefficient estimates remain unbiased, however, so heteroscedasticity is not a fatal illness. Check with the White test or similar tests, and use heteroscedasticity-adjusted t-statistics and p-values.
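A minimal sketch of this check in statsmodels (the library choice and the simulated data are assumptions): simulate errors whose variance grows with x, run the White test, then reprint the results with heteroscedasticity-robust (HC1) standard errors.

```python
# A minimal sketch: White test, then heteroscedasticity-robust t-statistics.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
x = rng.uniform(1, 5, size=200)
y = 2 + 0.5 * x + rng.normal(scale=x)   # error variance grows with x
X = sm.add_constant(x)
results = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(results.resid, results.model.exog)
print("White test p-value:", lm_pvalue)   # small p-value => heteroscedasticity

robust = results.get_robustcov_results(cov_type="HC1")
print(robust.summary())                   # robust t-stats and p-values
```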

Autocorrelation in Error Terms


This is a fatal illness, because it indicates a specification error (a missing variable, variables used in an inappropriate form, etc.). Under the current, incorrect specification you cannot see the true coefficients, which you would see if you were estimating the correct model. Hence, this is a serious problem. Check: Durbin-Watson test, graph of the error terms. DW ≈ 2(1 − ρ), where e_t = ρ e_{t-1} + v_t.

Limitation of the DW test statistic: it only checks for first-order serial correlation in the residuals.

Breusch-Godfrey Test: checks for higher-order autocorrelation, AR(q), in the residuals.
H0: no serial correlation
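A minimal sketch of both tests, assuming statsmodels and data simulated with AR(1) errors (ρ = 0.7 is an arbitrary illustrative value):

```python
# A minimal sketch: Durbin-Watson and Breusch-Godfrey tests on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors: e_t = 0.7 e_{t-1} + v_t
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1 + 0.5 * x + e

results = sm.OLS(y, sm.add_constant(x)).fit()
print("DW statistic (<<2 => positive autocorrelation):",
      durbin_watson(results.resid))
lm, lm_pvalue, f, f_pvalue = acorr_breusch_godfrey(results, nlags=4)
print("Breusch-Godfrey p-value (H0: no serial correlation):", lm_pvalue)
```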

Solution to the problem of serial correlation in e_t: find the correct specification. In time series, use first differences.

Time Series Regressions


Lagged variable: Y_t = β0 + β1 X_t + β2 X_{t-1} + u_t
Autoregressive model, AR(2): X_t = φ1 X_{t-1} + φ2 X_{t-2} + u_t
Time trend: Y_t = β0 + β1 X_t + β2 T_t + u_t
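A minimal sketch of building such regressors with pandas and statsmodels (the coefficient values are arbitrary and the data simulated):

```python
# A minimal sketch: Y_t = b0 + b1*X_t + b2*X_{t-1} + b3*T_t + u_t.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
df = pd.DataFrame({"x": rng.normal(size=n)})
df["x_lag1"] = df["x"].shift(1)          # X_{t-1}
df["trend"] = np.arange(n)               # deterministic time trend T_t
df["y"] = (1 + 0.5 * df["x"] + 0.3 * df["x_lag1"]
           + 0.01 * df["trend"] + rng.normal(size=n))
df = df.dropna()                         # first row lost to the lag

results = sm.OLS(df["y"], sm.add_constant(df[["x", "x_lag1", "trend"]])).fit()
print(results.params)
```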

Spurious Regressions
As a general and very strict rule: All variables in a time-series regression must be stationary. Never run a regression with nonstationary variables!
The DW statistic will warn you: usually DW << 2. Since most economic time series grow over time, a regression with nonstationary variables will typically find spurious positive relationships.
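A minimal simulation of the phenomenon, assuming statsmodels: two independent random walks regressed on each other typically produce a large t-statistic, a sizeable R², and DW far below 2.

```python
# A minimal sketch of a spurious regression between independent random walks.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 500
y = np.cumsum(rng.normal(size=n))   # random walk (nonstationary)
x = np.cumsum(rng.normal(size=n))   # independent random walk

results = sm.OLS(y, sm.add_constant(x)).fit()
print("t-stat on x:", results.tvalues[1])        # often spuriously large
print("R-squared:", results.rsquared)
print("DW:", durbin_watson(results.resid))       # far below 2 -- a warning
```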

STATIONARITY
A variable is called stationary if it displays mean-reverting behavior (i.e., its mean remains constant over time). Any regression with nonstationary variables is invalid. Hence, any time-series application must start with two preliminary steps:
1) Test the stationarity of the variables.
2) If they are not stationary, convert them into a stationary form.

A regression with nonstationary variables will typically reveal the problem through a Durbin-Watson (DW) statistic significantly smaller than 2. The DW statistic measures the first-order autocorrelation in the error term; DW << 2 implies positive autocorrelation in the error term.

Financial markets application: all price series are typically nonstationary. Therefore, we use returns: R_t = ln(P_t / P_{t-1})

TESTING STATIONARITY: UNIT ROOT TESTS

ADF Test: H0: the series is non-stationary (i.e., it has a unit root)
The ADF test statistic is compared to the MacKinnon critical values. If H0 can be rejected (the test statistic is more negative than the critical value), the variable can be used in regression.

ADF (Augmented Dickey-Fuller) Test

Δy_t = α + βt + γ y_{t-1} + δ1 Δy_{t-1} + ... + δp Δy_{t-p} + ε_t

H0: γ = 0    HA: γ < 0

(The PP test makes a nonparametric adjustment for the lagged changes.)

The test equation is derived from the primitive form Y_t = ρ Y_{t-1} + e_t:
|ρ| < 1 stationary; ρ = 1 non-stationary; ρ > 1 explosive.
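A minimal sketch of the ADF test with statsmodels (an assumption; the slides name no software). The regression="ct" option matches the constant-plus-trend test equation above, and the reported critical values are the MacKinnon values.

```python
# A minimal sketch of the ADF unit root test on a simulated random walk.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(6)
random_walk = np.cumsum(rng.normal(size=500))    # nonstationary: has a unit root

stat, pvalue, usedlag, nobs, crit_values, icbest = adfuller(
    random_walk, regression="ct")                # constant + trend, as above
print("ADF statistic:", stat)
print("MacKinnon critical values:", crit_values)
print("p-value:", pvalue)   # large p => cannot reject H0 (unit root)
```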

KPSS Test:
H0: the series is stationary
HA: the series is non-stationary
(use when the sample size is small)
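A minimal KPSS sketch under the same assumptions (statsmodels, simulated white noise); note that the null hypothesis is reversed relative to the ADF test.

```python
# A minimal sketch of the KPSS test (H0 here is stationarity).
import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(7)
stationary = rng.normal(size=500)                # white noise: stationary

stat, pvalue, nlags, crit_values = kpss(stationary, regression="c", nlags="auto")
print("KPSS statistic:", stat)
print("p-value:", pvalue)   # large p => cannot reject H0 (stationarity)
```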

Treating Non-stationary variables


Before using a non-stationary series in any regression, we have to first treat it.
Possible Remedies:

1) First-difference it: Δy_t = y_t − y_{t-1}

A series is:
- I(0) if it is stationary
- I(1) if it becomes stationary when differenced once
- I(2) if it becomes stationary when differenced twice

2) Adjust for trend: sometimes a series becomes stationary after de-trending; it is then called trend-stationary.

3) Field-specific treatments: use inflation-adjusted series; in financial time series, use log returns.
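A minimal sketch of remedies 1) and 3) with pandas (the price series is simulated for illustration):

```python
# A minimal sketch: first differences and log returns with pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=250))))

diff1 = prices.diff()                            # Δy_t = y_t - y_{t-1}
log_returns = np.log(prices / prices.shift(1))   # R_t = ln(P_t / P_{t-1})
print(log_returns.dropna().head())
```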

Sometimes a variable may be stationary but still have strong persistence. To check this, obtain the ACF and PACF.
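A minimal sketch, assuming statsmodels and a simulated AR(1) series with φ = 0.9: the ACF decays slowly (persistence), while the PACF spikes only at lag 1.

```python
# A minimal sketch: ACF and PACF of a stationary but persistent AR(1) series.
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(9)
n = 500
x = np.zeros(n)
for t in range(1, n):            # stationary but persistent: x_t = 0.9 x_{t-1} + v_t
    x[t] = 0.9 * x[t - 1] + rng.normal()

print("ACF  (slow decay => persistence):", acf(x, nlags=5).round(2))
print("PACF (spike at lag 1 for AR(1)): ", pacf(x, nlags=5).round(2))
```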

A variable y_t is called white noise if: y_t ~ i.i.d.(0, σ²)


When Y_t and X_t are both white noise, a regression of the form Y_t = β0 + β1 X_t + ε_t is adequate; otherwise the problem of serial correlation in the residuals will arise.

Portmanteau Test:
H0: X_t is white noise
HA: X_t is not white noise

Solution: include lags of the persistent dependent variable.
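A minimal sketch of a portmanteau check using the Ljung-Box statistic (one common portmanteau test; statsmodels is assumed, and recent versions return a DataFrame):

```python
# A minimal sketch: Ljung-Box portmanteau test for white noise.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(10)
white = rng.normal(size=500)

lb = acorr_ljungbox(white, lags=[10])   # DataFrame with lb_stat, lb_pvalue
print(lb)   # large p-value => cannot reject H0: the series is white noise
```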

Summary
Serial correlation in ε_t may result from various causes, each signaling that the econometrician is doing something wrong:
1) a missing variable (omitted-variable bias)
2) an incorrect functional form
3) using nonstationary variables
4) persistence in the variables
5) a (linear deterministic) trend
These causes are interrelated (e.g., to correct for persistence in X_t, you add X_{t-1}, which was a missing variable in the original specification).

Diagnosis
1) Always look at the time-series plots of all variables before you run any regression. Check for stationarity, persistence, trend, seasonality, and outliers. Any unexpected result should remind you to follow all the steps on this page.
2) Formal tests. Before using any variable in a regression: always perform unit root tests and check persistence (autocorrelations).
3) Post-regression: always perform the Durbin-Watson and Breusch-Godfrey tests for serial correlation in the residuals.

Treatment
1) Try to find the cause. If it is due to a distortion (e.g., a missing variable, an inaccurate model specification, the use of nominal variables, a trend term, persistence), the first-best solution is to remove the distortion: find all relevant variables and the most appropriate model specification (by reviewing the theory), use real variables, adjust for trend, add lags.
2) First-differencing is the ultimate solution, especially in the case of nonstationary variables. Take the first difference of logs: Δln(y_t) = ln(y_t) − ln(y_{t-1})
