
Chapter 7

MODEL SPECIFICATION AND DIAGNOSTIC TESTING
(Lec 13)

Nguyen Thu Hang, BMNV, FTU CS2 1


Outline
1. Model selection criteria
2. Types of specification errors
3. Tests of specification errors
3.1. Detecting the presence of unnecessary variables
3.2. Tests for omitted variables and incorrect functional
form
3.3. Tests for Normality (in residuals)
4. Ten Commandments of applied Econometrics



1. Model selection criteria
• Be data admissible: predictions made from the model must be
logically possible. → The modeled and observed y should have the
same properties.
• Be consistent with theory: it must make good economic sense.
→ Our model should "make sense".
• Have weakly exogenous regressors: the explanatory variables must
be uncorrelated with the error term.
• Exhibit parameter constancy: the values of the parameters should
be stable. Otherwise, forecasting will be difficult. → We should
expect out-of-sample validation.

1. Model selection criteria
• Exhibit data coherency: the residuals estimated from the model
must be purely random. In other words, if the regression model is
adequate, the residuals from the model must be white noise. If that
is not the case, there is some specification error in the model.
→ All information should be in the model; nothing should be left in
the errors.
• Be encompassing: the model should include all the rival models in
the sense that it is capable of explaining their results. → Our
model should explain earlier models.

2. Types of specification errors
• Omission of a relevant variable(s)
• Inclusion of an unnecessary variable(s)
• Adopting the wrong functional form



Omission of a relevant variable(s)
The true model: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
One fits the model: $Y_i = \alpha_1 + \alpha_2 X_{2i} + v_i$
The consequences of omitting the variable X3:
• If the omitted variable X3 is correlated with the included
variable X2, i.e. the correlation $r_{23}$ between the two variables
is nonzero → $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are biased as well
as inconsistent. The bias does not disappear as the sample size gets
larger.
• Even if X2 and X3 are not correlated, $\hat{\alpha}_1$ is biased,
although $\hat{\alpha}_2$ is now unbiased.
• The disturbance variance $\sigma^2$ is incorrectly estimated.
• The conventionally measured variance of $\hat{\alpha}_2$ is a
biased estimator of the variance of the true estimator $\hat{\beta}_2$.
• The confidence interval and hypothesis-testing procedures are
likely to give misleading conclusions.
• The forecasts based on the incorrect model and the forecast
intervals will be unreliable.
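
The first two consequences can be read off the expected values of the estimators in the fitted model. The block below restates a standard textbook result; it is not taken from the slide. The notation is an assumption introduced here: β denotes the true-model coefficients, α the fitted-model coefficients, $b_{32}$ is the slope from regressing X3 on X2, and lowercase x denotes deviations from sample means.

```latex
E(\hat{\alpha}_2) = \beta_2 + \beta_3 \, b_{32},
\qquad
E(\hat{\alpha}_1) = \beta_1 + \beta_3 \left( \bar{X}_3 - b_{32} \bar{X}_2 \right),
\qquad
b_{32} = \frac{\sum x_{2i} x_{3i}}{\sum x_{2i}^2}
```

If X2 and X3 are uncorrelated, $b_{32} = 0$, so the slope estimator is unbiased, while the intercept still has expectation $\beta_1 + \beta_3 \bar{X}_3$ and is biased unless $\bar{X}_3 = 0$.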
Inclusion of an irrelevant variable(s)
The true model: $Y_i = \beta_1 + \beta_2 X_{2i} + u_i$
One fits the model: $Y_i = \alpha_1 + \alpha_2 X_{2i} + \alpha_3 X_{3i} + v_i$
The consequences of this specification error:
• The OLS estimators of the parameters of the incorrect model are
all unbiased and consistent:
$E(\hat{\alpha}_1) = \beta_1$, $E(\hat{\alpha}_2) = \beta_2$, $E(\hat{\alpha}_3) = \beta_3 = 0$
• The error variance is correctly estimated.
• The usual confidence interval and hypothesis-testing procedures
remain valid.
• The estimated $\alpha$'s will generally be inefficient: their
variances will generally be larger than those of the true model.


Functional Form Mis-specification
• For example, if we estimate a linear model
$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + U_i$
• But the true model is a log-linear model
$\ln Y_i = b_1 + b_2 \ln X_{2i} + b_3 \ln X_{3i} + U_i$
Then the mis-specification arises because we estimate the "wrong"
functional form.

Mis-specification tests
Mis-specification generally occurs when:
• We omit a relevant variable, or
• We include an irrelevant variable, or
• We use an incorrect functional form
In most circumstances we do not know what the "true" model is. How
can we determine, therefore, whether the model we estimate is
correctly specified?

Preliminary Analysis (informal tests)
• Choose variables based on economic theory (if possible).
• Observe the sign and significance of the coefficients: what
happens when an additional variable is added or deleted?
• Does the adjusted R² increase when more variables are added?
• Look at the pattern of the residuals (if there are noticeable
patterns, then it is possible that the model has been mis-specified).


3. Tests of specification errors
• Detecting the presence of unnecessary
variables
• Tests for omitted variables and incorrect
functional form
- Examination of residuals.
- The Durbin-Watson d statistic
- Ramsey’s RESET test



Detecting the presence of unnecessary variables
• A k-variable model: $Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$
• To test whether Xk really belongs in the model → test the
significance of the estimated $\hat{\beta}_k$ → t test
• To test whether X3 and X4 legitimately belong in the model
→ F test → See Chap. 2
• Note carefully that we should not use the t and F tests to build
a model iteratively. That is, we should not say that initially Y is
related to X2 only because $\hat{\beta}_2$ is statistically
significant, then expand the model to include X3 and keep that
variable if $\hat{\beta}_3$ turns out to be statistically
significant, and so on. Searching for the "best" model in this way
is called data mining, regression fishing or data snooping.
Tests for omitted variables
and incorrect functional form
• Examination of residuals
• The Durbin-Watson d statistic
• Ramsey’s RESET test



Example - Residual examination
• The true model: (wage1.dta)
$\log(wage) = \beta_1 + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \beta_5 female + \beta_6 female \times educ + u$
• One fits the model:
$\log(wage) = \beta_1 + \beta_2 educ + \beta_3 exper + \beta_5 female + \beta_6 female \times educ + u$

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0927793   .0089777    10.33   0.000     .0751423    .1104163
       exper |   .0094302   .0014518     6.50   0.000     .0065781    .0122823
      female |  -.2958839   .1787929    -1.65   0.099    -.6471275    .0553597
 femaleXeduc |  -.0038152    .013975    -0.27   0.785    -.0312694    .0236391
       _cons |   .4614994   .1267468     3.64   0.000     .2125017    .7104971


Example - Residual examination
[Figure: residuals (uhat2, vertical axis) plotted against log(wage) (horizontal axis)]
There exists some pattern in the residuals.


Example - Residual examination
• The true model: (wage1.dta)
$\log(wage) = \beta_1 + \beta_2 educ + \beta_3 exper + \beta_4 exper^2 + \beta_5 female + \beta_6 female \times educ + u$

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0843801    .008754     9.64   0.000     .0671825    .1015777
       exper |   .0389047   .0048295     8.06   0.000      .029417    .0483925
     expersq |  -.0006858   .0001076    -6.38   0.000    -.0008971   -.0004745
      female |  -.3294333   .1724333    -1.91   0.057    -.6681849    .0093182
 femaleXeduc |  -.0006201   .0134809    -0.05   0.963    -.0271039    .0258637
       _cons |   .3873644   .1227335     3.16   0.002       .14625    .6284788


Example - Residual examination
[Figure: residuals (uhat, vertical axis) plotted against log(wage) (horizontal axis)]
There still exists some pattern in the residuals.


The Durbin-Watson d statistic
Step 1: From the assumed model, obtain the OLS residuals.
Step 2: If it is believed that the assumed model is mis-specified
because it excludes a relevant explanatory variable, say Z, order
the residuals obtained in Step 1 according to increasing values of
Z. The Z variable could be one of the X variables included in the
assumed model, or it could be some function of that variable.
Step 3: Compute the d statistic from the residuals thus ordered,
using the usual d formula:
$d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}$
Step 4: If, according to the Durbin-Watson tables, the estimated d
value is significant, then one can accept the hypothesis of model
mis-specification.
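
Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original lecture: the function name `durbin_watson_ordered` and the toy residuals and Z values are hypothetical, and only the standard library is used.

```python
def durbin_watson_ordered(residuals, z):
    """Durbin-Watson d computed on residuals sorted by increasing z."""
    if len(residuals) != len(z):
        raise ValueError("residuals and z must have the same length")
    # Step 2: order the residuals according to increasing values of Z.
    pairs = sorted(zip(z, residuals), key=lambda pair: pair[0])
    ordered = [u for _, u in pairs]
    # Step 3: d = sum_{t=2}^{n} (u_t - u_{t-1})^2 / sum_{t=1}^{n} u_t^2
    numerator = sum(
        (ordered[t] - ordered[t - 1]) ** 2 for t in range(1, len(ordered))
    )
    denominator = sum(u ** 2 for u in ordered)
    return numerator / denominator


# Hypothetical residuals and Z values, for illustration only.
u_hat = [0.5, -0.5, 0.5, -0.5]
z_values = [3.0, 1.0, 4.0, 2.0]
d = durbin_watson_ordered(u_hat, z_values)
print(round(d, 4))
```

In this toy case the ordering groups the negative residuals before the positive ones, which pulls d well below 2. Step 4 (comparing d with the tabulated bounds) still has to be done against the Durbin-Watson tables.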
Ramsey's RESET TEST
A more formal test of mis-specification
Proxy variables
RESET test: proxies based on the predicted value of Y


Ramsey's RESET TEST
Suppose we estimate the following model
$Y_i = b_1 + b_2 X_{2i} + V_i$
and want to test for mis-specification.
The RESET test uses the predicted values
$\hat{Y}_i = \hat{b}_1 + \hat{b}_2 X_{2i}$
and creates various powers of $\hat{Y}_i$.
Adding these powers to the original model, we then estimate a new
model:
$Y_i = b_1 + b_2 X_{2i} + b_3 \hat{Y}_i^2 + b_4 \hat{Y}_i^3 + b_5 \hat{Y}_i^4 + U_i$

Ramsey's RESET TEST
Perform an F-test on the significance of the additional variables:
$F = \frac{(R^2_{new} - R^2_{old}) / r}{(1 - R^2_{new}) / (n - k)}$
where r = number of additional variables,
k = number of parameters in the new model.
If the additional variables are significant: evidence of
mis-specification.
Cautionary Note
RESET is easy to apply but cannot tell us the reason for the
mis-specification (i.e. omitted variable or functional form).

Example
. reg lwage educ exper female femaleXeduc

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  4,   521) =   70.95
       Model |  52.3076455     4  13.0769114           Prob > F      =  0.0000
    Residual |   96.022106   521  .184303466           R-squared     =  0.3526
-------------+------------------------------           Adj R-squared =  0.3477
       Total |  148.329751   525   .28253286           Root MSE      =  .42931

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0927793   .0089777    10.33   0.000     .0751423    .1104163
       exper |   .0094302   .0014518     6.50   0.000     .0065781    .0122823
      female |  -.2958839   .1787929    -1.65   0.099    -.6471275    .0553597
 femaleXeduc |  -.0038152    .013975    -0.27   0.785    -.0312694    .0236391
       _cons |   .4614994   .1267468     3.64   0.000     .2125017    .7104971


Example

• predict Yhat, xb
• gen Y2=Yhat^2
• gen Y3=Yhat^3
• gen Y4=Yhat^4



Example
. reg lwage educ exper female femaleXeduc Y2 Y3 Y4

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  7,   518) =   45.18
       Model |  56.2334642     7  8.03335203           Prob > F      =  0.0000
    Residual |  92.0962872   518   .17779206           R-squared     =  0.3791
-------------+------------------------------           Adj R-squared =  0.3707
       Total |  148.329751   525   .28253286           Root MSE      =  .42165

       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |  -.2795518   .4160554    -0.67   0.502    -1.096915    .5378115
       exper |  -.0268507   .0425503    -0.63   0.528    -.1104432    .0567417
      female |   .1758307    1.35953     0.13   0.897    -2.495039      2.8467
 femaleXeduc |   .0654232   .0245688     2.66   0.008     .0171565      .11369
          Y2 |   .9318348   5.155182     0.18   0.857      -9.1958    11.05947
          Y3 |   .4476047   2.492681     0.18   0.858    -4.449401    5.344611
          Y4 |  -.1509599   .4300247    -0.35   0.726    -.9957668    .6938469
       _cons |   1.732046   .8400993     2.06   0.040     .0816254    3.382467

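Example
• F = 7.36 > F(0.05, 3, 518), where
$F = \frac{(R^2_{new} - R^2_{old}) / r}{(1 - R^2_{new}) / (n - k)}$
→ The added powers of the fitted values are jointly significant:
there is evidence of mis-specification (e.g. omitted variables).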
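
The F value can be checked by hand from the two regression outputs. The sketch below is only an illustration: the helper names `reset_f_from_r2` and `reset_f_from_rss` are hypothetical, and the inputs are the R-squared and residual sum of squares figures printed in the Stata output for the original and the augmented regression.

```python
def reset_f_from_r2(r2_new, r2_old, r, df_resid):
    """F = ((R2_new - R2_old) / r) / ((1 - R2_new) / (n - k))."""
    return ((r2_new - r2_old) / r) / ((1.0 - r2_new) / df_resid)


def reset_f_from_rss(rss_old, rss_new, r, df_resid):
    """Algebraically equivalent form using residual sums of squares."""
    return ((rss_old - rss_new) / r) / (rss_new / df_resid)


# n = observations, k = parameters in the new model, r = added regressors.
n, k, r = 526, 8, 3
f_r2 = reset_f_from_r2(0.3791, 0.3526, r, n - k)
f_rss = reset_f_from_rss(96.022106, 92.0962872, r, n - k)
print(round(f_r2, 2), round(f_rss, 2))
```

The two forms are the same statistic. The R-squared version gives about 7.37 here only because the printed R-squared values are rounded to four decimals; the residual-sum-of-squares version reproduces 7.36.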



Example
. estat ovtest

Ramsey RESET test using powers of the fitted values of lwage
       Ho:  model has no omitted variables
                 F(3, 518) =      7.36
                  Prob > F =      0.0001
→ Reject H0


Tests for Normality (in residuals)
• There are both graphical and statistical methods for evaluating
normality.
• Graphical methods include the histogram and the normality plot.
• Statistical methods include diagnostic hypothesis tests for
normality, and a rule of thumb that says a variable is reasonably
close to normal if its skewness and its kurtosis (measured as excess
over the normal value of 3) both lie between –1.0 and +1.0.
• None of the methods is absolutely definitive.
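
The rule of thumb can be sketched as follows. This is a hypothetical helper, not the lecture's own code: it uses the plain moment definitions S = m3 / m2^1.5 and K = m4 / m2^2, and it assumes the rule is meant for excess kurtosis (K − 3), which the slide does not state explicitly.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness S = m3 / m2**1.5 and kurtosis K = m4 / m2**2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def passes_rule_of_thumb(values):
    """True if skewness and excess kurtosis both lie in [-1, +1]."""
    s, k = skewness_and_kurtosis(values)
    # Assumption: the rule refers to excess kurtosis, K - 3.
    return -1.0 <= s <= 1.0 and -1.0 <= k - 3.0 <= 1.0


# Toy data, for illustration only: symmetric but very flat.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
s, k = skewness_and_kurtosis(sample)
print(round(s, 4), round(k, 4), passes_rule_of_thumb(sample))
```

The toy sample is perfectly symmetric, so its skewness is zero, but its kurtosis of 1.7 is far below 3, so it fails the rule on the kurtosis side.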



Example
[Figure: histogram of the residuals (horizontal axis: Residuals; vertical axis: Density)]


Jarque-Bera test
$JB = n \left[ \frac{S^2}{6} + \frac{(K - 3)^2}{24} \right]$
where S = skewness, K = kurtosis; JB follows Chi-square(2).
H0: the residuals are normally distributed.
If JB > Chi-square(0.05, 2) = 5.99 → Reject H0


Example - Jarque-Bera test
$JB = n \left[ \frac{S^2}{6} + \frac{(K - 3)^2}{24} \right] = 11.126$
JB = 11.126 > Chi2(2) → Reject H0
. sum res, de

                          Residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.9413162      -1.896354
 5%    -.6726342      -1.090606
10%    -.4950989      -1.065244       Obs                 526
25%    -.2650768      -1.056344       Sum of Wgt.         526

50%    -.0384252                      Mean          -1.55e-09
                        Largest       Std. Dev.      .4276672
75%     .2703081       1.213416
90%     .5756027       1.231669       Variance       .1828992
95%     .7610849       1.232905       Skewness       .1106091
99%     1.032537       1.283482       Kurtosis       3.677287
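
The JB value can be reproduced from the Obs, Skewness and Kurtosis entries of the summary output. The following is a minimal sketch with hypothetical function names. It relies on the fact that the upper-tail probability of a chi-square variable with 2 degrees of freedom is exp(−x/2), so no statistics library is needed.

```python
import math


def jarque_bera(n, skewness, kurtosis):
    """JB = n * (S^2 / 6 + (K - 3)^2 / 24)."""
    return n * (skewness ** 2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)


def chi2_2df_pvalue(statistic):
    """Upper-tail probability of a chi-square(2) variable: exp(-x / 2)."""
    return math.exp(-statistic / 2.0)


# Obs, Skewness and Kurtosis as reported in the summary output.
jb = jarque_bera(526, 0.1106091, 3.677287)
critical_5pct = -2.0 * math.log(0.05)  # 5% critical value of chi-square(2)
p_value = chi2_2df_pvalue(jb)
print(round(jb, 3), round(critical_5pct, 2), jb > critical_5pct)
```

The statistic exceeds the 5% critical value of about 5.99, which matches the slide's conclusion to reject H0.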
“Ten Commandments of applied Econometrics”
by Peter Kennedy of Simon Fraser University
1. Thou shalt use common sense and economic theory.
2. Thou shalt ask the right questions (i.e. put relevance before
mathematical elegance).
3. Thou shalt know the context (do not perform ignorant statistical
analysis).
4. Thou shalt inspect the data.
5. Thou shalt not worship complexity. Use the KISS principle, that is,
keep it stochastically simple.
6. Thou shalt look long and hard at thy results.
7. Thou shalt beware the costs of data mining.
8. Thou shalt be willing to compromise (do not worship textbook
prescriptions).
9. Thou shalt not confuse significance with substance (do not
confuse statistical significance with practical significance).
10. Thou shalt confess in the presence of sensitivity (that is, anticipate
criticism).
