
Econometric Modeling


Econometric Models
Econometric models are statistical models used in econometrics. They specify the statistical relationship that is believed to hold between the various economic quantities pertaining to a particular economic phenomenon under study.
An econometric model can be derived from a deterministic economic model by allowing for uncertainty, or from an economic model which itself is stochastic. However, it is also possible to use econometric models that are not tied to any specific economic theory.
Example
Suppose that monthly spending by consumers is linearly dependent on consumers' income in the previous month. Then the model will consist of the equation

C_t = a + b Y_{t-1} + e_t

where C_t is consumer spending in month t, Y_{t-1} is income during the previous month, and e_t is an error term measuring the extent to which the model cannot fully explain consumption. One objective of the econometrician is then to obtain estimates of the parameters a and b; these estimated parameter values, when used in the model's equation, enable predictions for future values of consumption to be made contingent on the prior month's income.
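As an illustrative sketch, such an equation can be estimated by ordinary least squares. The data below are entirely synthetic (the true values 200 and 0.8 are chosen for illustration only):

```python
import numpy as np

# Hypothetical monthly data: consumption generated as
# C_t = 200 + 0.8 * Y_{t-1} + noise (synthetic, for illustration).
rng = np.random.default_rng(0)
income = 1000 + rng.normal(0, 50, size=25)                         # Y_1 ... Y_25
consumption = 200 + 0.8 * income[:-1] + rng.normal(0, 5, size=24)  # C_2 ... C_25

# Regress C_t on a constant and Y_{t-1} by ordinary least squares.
X = np.column_stack([np.ones(24), income[:-1]])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, consumption, rcond=None)

print(a_hat, b_hat)  # estimates of the intercept a and slope b
```

With the parameters estimated, a forecast of next month's consumption is simply `a_hat + b_hat * current_income`.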
Model selection criteria
In its most basic forms, model selection is one of the fundamental tasks of scientific inquiry.
Be data admissible: that is, predictions made from the model must be logically possible.
Be consistent with theory: that is, it must make good economic sense. For example, if Milton Friedman's permanent income hypothesis holds, the intercept value in the regression of permanent consumption on permanent income is expected to be zero.
Model selection criteria
Model selection techniques can be considered
as estimators of some physical quantity, such as the
probability of the model producing the given data.
The bias and variance are both important measures of
the quality of this estimator; efficiency is also often
considered.
A standard example of model selection is that of curve
fitting, where, given a set of points and other
background knowledge (e.g. points are a result
of i.i.d. samples), we must select a curve that describes
the function that generated the points.
Model selection criteria
Be encompassing: that is, the model should encompass or include all the rival models in the sense that it is capable of explaining their results. In short, other models cannot be an improvement over the chosen model.
The values of the parameters should be stable; otherwise, forecasting would be difficult.
Exploratory Data Analysis (EDA)
It is an approach to analyzing data sets to summarize their
main characteristics, often with visual methods. A statistical
model can be used or not, but primarily EDA is for seeing
what the data can tell us beyond the formal modeling or
hypothesis testing task. Exploratory data analysis was
promoted by John Tukey to encourage statisticians to
explore the data, and possibly formulate hypotheses that
could lead to new data collection and experiments. EDA is
different from initial data analysis (IDA),[1] which focuses
more narrowly on checking assumptions required for
model fitting and hypothesis testing, and handling missing
values and making transformations of variables as needed.
EDA encompasses IDA.
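As a minimal sketch of an EDA pass in the spirit described above (the dataset and its column names are entirely hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic dataset standing in for real survey data (illustrative only).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "income": rng.normal(1000, 50, 100),
    "spending": rng.normal(900, 60, 100),
})

# Summary statistics: a first look before any formal model is fitted.
print(df.describe())

# A simple correlation check can suggest hypotheses worth testing
# later on a different data set.
print(df.corr())
```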
Specifying an Econometric Equation and
Specification Error
Before any equation can be estimated, it must be completely specified
Specifying an econometric equation consists of three parts, namely
choosing the correct:
independent variables
functional form
form of the stochastic error term
Again, this is part of the first classical assumption from Chapter 4
A specification error results when one of these choices is made
incorrectly
This chapter will deal with the first of these choices (the two other
choices will be discussed in subsequent chapters)

© 2011 Pearson Addison-Wesley. All rights reserved.
Types of Specification Problems
Omitted Variables
Two reasons why an important explanatory variable might have been left out:
we forgot
it is not available in the dataset we are examining
Either way, this may lead to omitted variable bias (or, more generally, specification bias).
The reason for this is that when a variable is not included, it cannot be held constant.
Omitting a relevant variable usually is evidence that the entire equation is suspect, because of the likely bias of the coefficients.

The Consequences of an Omitted Variable
Suppose the true regression model is:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i    (6.1)

where \epsilon_i is a classical error term.
If X2 is omitted, the equation becomes instead:

Y_i = \beta_0^* + \beta_1^* X_{1i} + \epsilon_i^*    (6.2)

where:

\epsilon_i^* = \epsilon_i + \beta_2 X_{2i}    (6.3)

Hence, the explanatory variables in the estimated regression (6.2) are not independent of the error term (unless the omitted variable is uncorrelated with all the included variables, something which is very unlikely).
But this violates Classical Assumption III!

The Consequences of an Omitted Variable (cont.)
What happens if we estimate Equation 6.2 when Equation 6.1 is the truth?
We get bias! What this means is that:

E(\hat{\beta}_1^*) \neq \beta_1    (6.4)

The amount of bias is a function of the impact of the omitted variable on the dependent variable times a function of the correlation between the included and the omitted variable.
Or, more formally:

Bias = E(\hat{\beta}_1^*) - \beta_1 = \beta_2 \alpha_1    (6.7)

where \alpha_1 is the slope coefficient from a regression of X2 on X1.
So, the bias exists unless:
1. the true coefficient equals zero, or
2. the included and omitted variables are uncorrelated
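The bias formula can be illustrated by simulation. All numbers below are synthetic and chosen purely for illustration (true coefficients 2 and 3, and a slope of 0.5 linking X2 to X1):

```python
import numpy as np

# Monte Carlo sketch of omitted-variable bias (synthetic data).
# True model: Y = 1 + 2*X1 + 3*X2 + eps, with X2 correlated with X1.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)       # alpha_1 = 0.5 links X2 to X1
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n)

# Correct regression: includes both X1 and X2.
X_full = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

# Misspecified regression: X2 omitted.
X_short = np.column_stack([np.ones(n), x1])
b_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# Expected bias on the X1 coefficient: beta2 * alpha1 = 3 * 0.5 = 1.5
print(b_full[1])   # close to the true value 2
print(b_short[1])  # close to 2 + 1.5 = 3.5
```

The estimate from the misspecified regression lands near 3.5, exactly the true coefficient plus the bias term beta2 * alpha1.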

Correcting for an Omitted Variable
In theory, the solution to a problem of specification bias seems easy: add the omitted variable to the equation!
Unfortunately, that's easier said than done, for a couple of reasons:
1. Omitted variable bias is hard to detect: the amount of bias introduced can be small and not immediately detectable
2. Even if it has been decided that a given equation is suffering from omitted variable bias, how do you decide exactly which variable to include?
Note here that dropping a variable is not a viable strategy to help cure omitted variable bias: if anything, you'll just generate even more omitted variable bias on the remaining coefficients!

Correcting for an Omitted Variable (cont.)
What if:
You have an unexpected result, which leads you to believe that you have an omitted variable
You have two or more theoretically sound explanatory variables as potential candidates for inclusion in the equation as the omitted variable
How do you choose between these variables?
One possibility is expected bias analysis.
Expected bias: the likely bias that omitting a particular variable would have caused in the estimated coefficient of one of the included variables

Correcting for an Omitted Variable (cont.)
Expected bias can be estimated with Equation 6.7:

Expected bias = \beta_2 \alpha_1    (6.7)

When do we have a viable candidate? When the sign of the expected bias is the same as the sign of the unexpected result.
Similarly, when these signs differ, the variable is extremely unlikely to have caused the unexpected result.

Irrelevant Variables
This refers to the case of including a variable in an equation when it does not belong there.
This is the opposite of the omitted variables case, and so the impact can be illustrated using the same model.
Assume that the true regression specification is:

Y_i = \beta_0 + \beta_1 X_{1i} + \epsilon_i    (6.10)

But the researcher for some reason includes an extra variable:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i^{**}    (6.11)

The misspecified equation's error term then becomes:

\epsilon_i^{**} = \epsilon_i - \beta_2 X_{2i}    (6.12)

Irrelevant Variables (cont.)
So, the inclusion of an irrelevant variable will not cause bias (since the true coefficient of the irrelevant variable is zero, and so the second term will drop out of Equation 6.12).
However, the inclusion of an irrelevant variable will:
Increase the variance of the estimated coefficients, and this increased variance will tend to decrease the absolute magnitude of their t-scores
Decrease the adjusted R² (but not the R²)
Table 6.1 summarizes the consequences of the omitted variable and the included irrelevant variable cases (unless r12 = 0).
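These two consequences (no bias, but inflated variance) can be checked by Monte Carlo simulation. The setup below is synthetic and illustrative: X2 has a true coefficient of zero but is strongly correlated with X1, which is what inflates the variance:

```python
import numpy as np

# Monte Carlo sketch: adding an irrelevant variable X2 (true coefficient
# zero, correlated with X1) leaves the X1 estimate unbiased but inflates
# its sampling variance. Synthetic data, illustrative values.
rng = np.random.default_rng(0)
n, reps = 50, 2000
slopes_correct, slopes_extra = [], []
for _ in range(reps):
    x1 = rng.normal(0, 1, n)
    x2 = 0.9 * x1 + rng.normal(0, 0.3, n)   # irrelevant but highly correlated
    y = 1 + 2 * x1 + rng.normal(0, 1, n)    # true model uses only X1
    X1m = np.column_stack([np.ones(n), x1])
    X2m = np.column_stack([np.ones(n), x1, x2])
    slopes_correct.append(np.linalg.lstsq(X1m, y, rcond=None)[0][1])
    slopes_extra.append(np.linalg.lstsq(X2m, y, rcond=None)[0][1])

print(np.mean(slopes_correct), np.mean(slopes_extra))  # both near 2: no bias
print(np.var(slopes_correct), np.var(slopes_extra))    # second is much larger
```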

Table 6.1 Effect of Omitted Variables and Irrelevant
Variables on the Coefficient Estimates

Four Important Specification Criteria
We can summarize the previous discussion into four criteria to help decide whether a given variable belongs in the equation:
1. Theory: Is the variable's place in the equation unambiguous and theoretically sound?
2. t-Test: Is the variable's estimated coefficient significant in the expected direction?
3. Adjusted R²: Does the overall fit of the equation (adjusted for degrees of freedom) improve when the variable is added to the equation?
4. Bias: Do other variables' coefficients change significantly when the variable is added to the equation?
If all these conditions hold, the variable belongs in the equation; if none of them hold, it does not belong.
The tricky part is the intermediate cases: use sound judgment!

Specification Searches
Almost any result can be obtained from a given dataset by simply specifying different regressions until estimates with the desired properties are obtained.
Hence, the integrity of all empirical work is open to question.
To counter this, the following three points of Best Practices in Specification Searches are suggested:
1. Rely on theory rather than statistical fit as much as possible when choosing variables, functional forms, and the like
2. Minimize the number of equations estimated (except for sensitivity analysis, to be discussed later in this section)
3. Reveal, in a footnote or appendix, all alternative specifications estimated

Sequential Specification Searches
The sequential specification search technique allows a researcher to:
Estimate an undisclosed number of regressions
Subsequently present a final choice (which is based upon an unspecified set of expectations about the signs and significance of the coefficients) as if it were the only specification estimated
Such a method misstates the statistical validity of the regression results for two reasons:
1. The statistical significance of the results is overestimated because the estimations of the previous regressions are ignored
2. The expectations used by the researcher to choose between various regression results rarely, if ever, are disclosed

Bias Caused by Relying on the
t-Test to Choose Variables

Dropping variables solely based on low t-statistics may lead to two different types
of errors:
1. An irrelevant explanatory variable may sometimes be included in the equation
(i.e., when it does not belong there)
2. A relevant explanatory variable may sometimes be dropped from the equation
(i.e., when it does belong)
In the first case, there is no bias but in the second case there is bias
Hence, the estimated coefficients will be biased every time an excluded variable
belongs in the equation, and that excluded variable will be left out every time its
estimated coefficient is not statistically significantly different from zero
So, we will have systematic bias in our equation!

Sensitivity Analysis
Contrary to the advice of estimating as few equations as possible (and based on theory, rather than fit!), sometimes we see journal article authors listing results from five or more specifications.
What's going on here?
In almost every case, these authors have employed a technique called sensitivity analysis.
This essentially consists of purposely running a number of alternative specifications to determine whether particular results are robust (not statistical flukes) to a change in specification.
Why is this useful? Because the true specification isn't known!

Data Mining
Data mining involves exploring a data set to try to uncover empirical regularities that can inform economic theory.
That is, the role of data mining is opposite that of traditional econometrics, which instead tests the economic theory on a data set.
Be careful, however! A hypothesis developed using data mining techniques must be tested on a different data set (or in a different context) than the one used to develop the hypothesis.
Not doing so would be highly unethical: after all, the researcher already knows ahead of time what the results will be!

Test of Specification Error
6.2 Durbin-Watson Test

d = \frac{\sum_{t=2}^{n} (\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n} \hat{u}_t^2}
  = \frac{\sum \hat{u}_t^2 + \sum \hat{u}_{t-1}^2 - 2 \sum \hat{u}_t \hat{u}_{t-1}}{\sum \hat{u}_t^2}
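As a quick sanity check, the statistic above can be computed directly from a vector of OLS residuals (the residual patterns below are stylized, not real data):

```python
import numpy as np

# A minimal implementation of the Durbin-Watson statistic.
def durbin_watson(residuals):
    u = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

# Stylized residual patterns at the two extremes:
alternating = np.array([1.0, -1.0] * 10)  # strong negative autocorrelation
dw_alt = durbin_watson(alternating)       # near the upper bound of 4
smooth = np.ones(20)                      # residuals never change sign
dw_smooth = durbin_watson(smooth)         # 0, the lower bound

print(dw_alt, dw_smooth)
```

Residuals with no serial correlation give a value near 2, which is why d ≈ 2 is the benchmark of "no autocorrelation."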
6.2 Durbin-Watson Test
The sampling distribution of d depends on the values of the explanatory variables, and hence Durbin and Watson derived upper limits (d_U) and lower limits (d_L) for the significance level for d.
There are tables to test the hypothesis of zero autocorrelation against the hypothesis of first-order positive autocorrelation. (For negative autocorrelation we interchange d_L and d_U.)
6.2 Durbin-Watson Test
If d < d_L, we reject the null hypothesis of no autocorrelation.
If d > d_U, we do not reject the null hypothesis.
If d_L < d < d_U, the test is inconclusive.
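The three-way decision rule can be written as a small helper. The cutoffs d_L and d_U must come from published Durbin-Watson tables; the d_U value in the example call below is a hypothetical placeholder, not a table value:

```python
# Sketch of the three-way DW decision rule; d_L and d_U come from
# published Durbin-Watson tables for the given n, k, and level.
def dw_decision(d, d_L, d_U):
    if d < d_L:
        return "reject H0: positive autocorrelation"
    if d > d_U:
        return "do not reject H0"
    return "inconclusive"

# Production-function example from this chapter: d = 0.858, d_L = 1.38
# (n = 39, k = 2, 5% level). The d_U of 1.60 is a placeholder.
print(dw_decision(0.858, 1.38, 1.60))
```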
6.2 Durbin-Watson Test
Illustrative Example
Consider the data in Table 3.11. The estimated production function is

\log X = 3.938 + 1.451 \log L_1 + 0.384 \log K_1
        (0.237)  (0.083)          (0.048)

R^2 = 0.9946,  DW = 0.858

Referring to the DW table for k = 2 and n = 39 at the 5% significance level, we see that d_L = 1.38. Since the observed d = 0.858 < d_L, we reject the hypothesis of zero autocorrelation at the 5% level.
6.2 Limitations of the D-W Test
1. It tests for only first-order serial correlation.
2. The test is inconclusive if the computed value lies between d_L and d_U.
3. The test cannot be applied in models with lagged dependent variables.
6.3 Estimation in Levels Versus First Differences
Simple solutions to the serial correlation problem: first differences.
If the DW test rejects the hypothesis of zero serial correlation, what is the next step?
In such cases one estimates a regression by transforming all the variables by \rho-differencing (quasi-first differencing) or by first-differencing.
6.3 Estimation in Levels Versus First Differences

y_t = \alpha + \beta x_t + u_t
y_{t-1} = \alpha + \beta x_{t-1} + u_{t-1}
(y_t - y_{t-1}) = \beta (x_t - x_{t-1}) + (u_t - u_{t-1})
6.3 Estimation in Levels Versus First Differences

y_t = \alpha + \gamma t + \beta x_t + u_t
y_{t-1} = \alpha + \gamma (t-1) + \beta x_{t-1} + u_{t-1}
(y_t - y_{t-1}) = \gamma + \beta (x_t - x_{t-1}) + (u_t - u_{t-1})
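The effect of first-differencing can be verified numerically: differencing a trending series removes the intercept and turns the linear trend into a constant. The data and coefficient values (5.0, 0.3, 2.0) below are synthetic and illustrative:

```python
import numpy as np

# First-differencing y_t = a + c*t + b*x_t + u_t turns the trend term
# into a constant c and removes the intercept a (synthetic data).
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
x = rng.normal(0, 1, n)
y = 5.0 + 0.3 * t + 2.0 * x + rng.normal(0, 0.1, n)

dy, dx = np.diff(y), np.diff(x)

# Regress dy on a constant and dx: the intercept now estimates the
# trend coefficient c = 0.3, and the slope recovers b = 2.
X = np.column_stack([np.ones(n - 1), dx])
c_hat, b_hat = np.linalg.lstsq(X, dy, rcond=None)[0]
print(c_hat, b_hat)
```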
6.3 Estimation in Levels Versus First Differences
When comparing equations in levels and first differences, one cannot compare the R² values because the explained variables are different.
One can compare the residual sums of squares, but only after making a rough adjustment. (Please refer to p. 231.)
6.3 Estimation in Levels Versus First Differences
Let
R^2_1 = R² from the first difference equation
RSS_0 = residual sum of squares from the levels equation
RSS_1 = residual sum of squares from the first difference equation
R^2_D = comparable R² from the levels equation

Then

\frac{1 - R^2_D}{1 - R^2_1} = \frac{RSS_0}{RSS_1} \cdot \frac{n-k-1}{n-k} \cdot d

that is,

R^2_D = 1 - \frac{RSS_0}{RSS_1} \cdot \frac{n-k-1}{n-k} \cdot d \, (1 - R^2_1)
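The adjustment can be wrapped in a small helper. The function name and signature are my own (the two degrees-of-freedom terms are passed directly); the numbers in the calls are the ones used in the worked examples of this section:

```python
# Comparable R-squared for the levels equation, following the adjustment
# above. df_num / df_den is the (n-k-1)/(n-k) correction factor.
def comparable_r2(rss0, rss1, d, r2_1, df_num, df_den):
    return 1 - (rss0 / rss1) * (df_num / df_den) * d * (1 - r2_1)

# Friedman-Meiselman example: RSS0 = 11,943, RSS1 = 8,387 (both x 10^4),
# d = 0.89, R2_1 = 0.8096, correction 9/10.
print(round(comparable_r2(11.943, 8.387, 0.89, 0.8096, 9, 10), 4))

# Production-function example: RSS0 = 0.0434, RSS1 = 0.0278,
# d = 0.858, R2_1 = 0.8405, correction 36/37.
print(round(comparable_r2(0.0434, 0.0278, 0.858, 0.8405, 36, 37), 4))
```

Both calls reproduce the figures derived in the slides that follow (0.7828 and 0.7921).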
6.3 Estimation in Levels Versus First Differences
Illustrative Examples
Consider the simple Keynesian model discussed by Friedman and Meiselman. The equation estimated in levels is

C_t = \alpha + \beta A_t,   t = 1, 2, \ldots, T

where C_t = personal consumption expenditure (current dollars)
      A_t = autonomous expenditure (current dollars)
6.3 Estimation in Levels Versus First Differences
The model fitted for the 1929-1939 period gave (figures in parentheses are standard errors):

1. C_t = 58,335.0 + 2.4984 A_t
                    (0.312)
   R^2 = 0.8771,  DW = 0.89,  RSS_0 = 11,943 \times 10^4

2. \Delta C_t = 1.993 \Delta A_t
               (0.324)
   R^2_1 = 0.8096,  DW = 1.51,  RSS_1 = 8,387 \times 10^4
6.3 Estimation in Levels Versus First Differences

R^2_D = 1 - \frac{RSS_0}{RSS_1} \cdot \frac{n-k-1}{n-k} \cdot d \,(1 - R^2_1)
      = 1 - \frac{11.943}{8.387} \cdot \frac{9}{10} \cdot (0.89)(1 - 0.8096)
      = 1 - 0.2172 = 0.7828

This is to be compared with R^2_1 = 0.8096 from the equation in first differences.
6.3 Estimation in Levels Versus First Differences
For the production function data in Table 3.11 the first difference equation is

\Delta \log X = 0.987 \Delta \log L_1 + 0.502 \Delta \log K_1
               (0.158)                 (0.134)

R^2_1 = 0.8405,  DW = 1.177,  RSS_1 = 0.0278

The comparable figures for the levels equation reported earlier in Chapter 4, equation (4.24), are

R^2_0 = 0.9946,  DW = 0.858,  RSS_0 = 0.0434
6.3 Estimation in Levels Versus First Differences

R^2_D = 1 - \frac{0.0434}{0.0278} \cdot \frac{36}{37} \cdot (0.858)(1 - 0.8405)
      = 1 - 0.2079 = 0.7921

This is to be compared with R^2_1 = 0.8405 from the equation in first differences.
Errors of Measurement
It is assumed in the Keynesian model that the data on consumption and income are accurate. Unfortunately, this assumption is not met in practice for a variety of reasons, such as nonresponse errors, reporting errors, and computing errors.
Errors of measurement in the dependent variable still give unbiased estimates of the parameters; however, the estimated variances are now larger than in the case where there are no such errors of measurement.
In the case of measurement error in the explanatory variable X, there is no satisfactory solution. That is why it is so crucial to measure the data as accurately as possible.
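The asymmetry between the two cases can be illustrated by simulation (synthetic data; the true slope of 2 and the unit error variances are chosen for illustration). Classical measurement error in X attenuates the OLS slope toward zero, while noise in the dependent variable leaves it unbiased:

```python
import numpy as np

# Simulation sketch: measurement error in y vs. measurement error in x.
rng = np.random.default_rng(0)
n = 20000
x_true = rng.normal(0, 1, n)
y = 1 + 2 * x_true + rng.normal(0, 1, n)   # true model: slope = 2

def ols_slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Error in the dependent variable only: slope stays near 2 (unbiased),
# though its standard error grows.
y_noisy = y + rng.normal(0, 1, n)
slope_y_err = ols_slope(x_true, y_noisy)

# Error in the explanatory variable: slope shrinks toward
# 2 * var(x) / (var(x) + var(error)) = 2 * 1 / 2 = 1 (attenuation bias).
x_noisy = x_true + rng.normal(0, 1, n)
slope_x_err = ols_slope(x_noisy, y)

print(slope_y_err, slope_x_err)
```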
