
ODETTE LEH V. CARAGOS

APPLIED ECONOMETRICS

MA in Economics

Summer SY 2014-2015

FINAL EXAM
PART I: THEORETICAL

PROBLEM 1 (10 points)


Explain the following concepts:
(a) Multicollinearity (2.5pts)
(b) Omitted variable bias (2.5pts)
(c) Unbiasedness (2.5pts)
(d) p-value (2.5pts)

MULTICOLLINEARITY
exists when two or more of the predictor variables are correlated among themselves,
i.e. there is a linear (or nearly linear) relation between the predictors. Predictors
are usually related to some extent, so it is a matter of degree.
Examples of such predictors include a person's gender, race, grade point average, math
SAT score, IQ, and starting salary. For each of these, the researcher does not control
the values but just observes them as they occur for the people in the random sample.
Some researchers observed the following data on 20 individuals with high blood
pressure:
o blood pressure (y = BP, in mm Hg)
o age (x1 = Age, in years)
o weight (x2 = Weight, in kg)
o body surface area (x3 = BSA, in sq m)
o duration of hypertension (x4 = Dur, in years)
o basal pulse (x5 = Pulse, in beats per minute)
o stress index (x6 = Stress)

The researchers were interested in determining if a relationship exists between blood
pressure and age, weight, body surface area, duration, pulse rate and/or stress level.
The matrix plot of BP, Age, Weight, and BSA and the matrix plot of BP, Dur, Pulse, and
Stress (plots not reproduced here) allow us to investigate the various marginal
relationships between the response BP and the predictors. Blood pressure appears to be
related fairly strongly to Weight and BSA, and hardly related at all to Stress level.


The matrix plots also allow us to investigate whether or not relationships exist among
the predictors. For example, Weight and BSA appear to be strongly related, while
Stress and BSA appear to be hardly related at all.
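One common way to put a number on the degree of multicollinearity is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where VIF = 1 / (1 - r^2) and r is the sample correlation between the two predictors. The function names and the sample values are illustrative assumptions, not the blood pressure data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical values: pred_b tracks pred_a closely, pred_c does not.
pred_a = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_b = [1.1, 1.9, 3.2, 3.9, 5.1]
pred_c = [2.0, -1.0, 0.5, -1.0, 2.0]

print(vif_two_predictors(pred_a, pred_b))  # large: strongly related predictors
print(vif_two_predictors(pred_a, pred_c))  # near 1: weakly related predictors
```

A VIF near 1 means the predictor's coefficient variance is hardly inflated by the other predictor; large values signal that the two predictors carry mostly the same information.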
OMITTED VARIABLE BIAS
is the bias that appears in the estimates of parameters in a regression analysis, when
the assumed specification is incorrect in that it omits an independent variable that is
correlated with both the dependent variable and one or more included independent
variables.
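A small simulation can illustrate the definition. The sketch below assumes a hypothetical data-generating process y = beta1*x1 + beta2*x2 + u in which x2 is correlated with x1 (x2 = delta*x1 + noise), and then regresses y on x1 alone. All names and parameter values are assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(n=20000, beta1=1.0, beta2=2.0, delta=0.5, seed=0):
    """Simulate y = beta1*x1 + beta2*x2 + u, then regress y on x1 only."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # x2 is correlated with x1 through delta; it is the omitted variable.
    x2 = [delta * a + rng.gauss(0.0, 1.0) for a in x1]
    y = [beta1 * a + beta2 * b + rng.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    return ols_slope(x1, y)

# The slope should land near beta1 + beta2*delta = 2.0 rather than beta1 = 1.0.
print(round(short_regression_slope(), 2))
```

With beta2 = 0 (the omitted variable does not affect y) or delta = 0 (it is unrelated to x1), the same function returns a slope near beta1, which matches the two conditions in the definition.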
UNBIASEDNESS
is the property of an estimator or decision rule having zero bias, i.e. its expected
value equals the true value of the parameter. All else equal, an unbiased estimator
is preferable to a biased estimator, but in practice all else is not equal, and biased
estimators are frequently used, generally with small bias. When a biased estimator is
used, the bias is also estimated. A biased estimator may be used for various reasons:
because an unbiased estimator does not exist without further assumptions about a
population or is difficult to compute (as in unbiased estimation of standard
deviation); because an estimator is median-unbiased but not mean-unbiased (or the
reverse); because a biased estimator reduces some loss function (particularly mean
squared error) compared with unbiased estimators (notably in shrinkage
estimators); or because in some cases being unbiased is too strong a condition, and
the only unbiased estimators are not useful.
P-VALUE
is the probability, computed assuming the null hypothesis is true, of obtaining a test
statistic at least as extreme as the one actually observed. The p-value is used as an
alternative to rejection points: it is the smallest level of significance at which
the null hypothesis would be rejected. The smaller the p-value, the stronger the
evidence against the null hypothesis and in favor of the alternative hypothesis.
For example, if two studies of returns from two particular assets were done using two
different significance levels, a reader could not compare the probability of returns for
the two assets easily. For ease of comparison, researchers will often feature the
p-value in the hypothesis test and allow the reader to interpret the statistical
significance themselves. This is called a p-value approach to hypothesis testing.
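As a minimal sketch of the p-value approach, assume a test statistic that is standard normal under the null hypothesis (a z-test; regression output would normally report t statistics instead). The two-sided p-value can then be computed with the Python standard library. The function names and the example statistic 1.96 are illustrative assumptions.

```python
import math

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def reject_null(z, alpha=0.05):
    """Reject the null when the p-value falls below the chosen level alpha."""
    return two_sided_p_value(z) < alpha

z_stat = 1.96  # hypothetical test statistic
print(two_sided_p_value(z_stat))   # just under 0.05
print(reject_null(z_stat, 0.05))   # reader using a 5% level
print(reject_null(z_stat, 0.01))   # reader using a 1% level
```

Reporting the p-value once lets each reader apply their own significance level, which is the point made in the paragraph above.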

PROBLEM 2 (20 points)


A linear multiple regression model is given as:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$

We obtain a random sample from the population and confirm that each of
X1 and X2 has variation within the sample.
i) What are the parameters in this model? (5pts)
ii) State when the explanatory variable X1 is said to be endogenous. (5pts)
iii) Suppose that X1 and X2 are both exogenous and linearly independent, but the two
variables are highly correlated. What problem does this cause for the estimation
of $\beta_1$? (10pts)

The parameter $\beta_0$ is the constant, representing the expected response when
both x1 and x2 are zero. (As before, this value may not be directly
interpretable if zero is not in the range of the predictors.) The parameter $\beta_1$
is the slope along the x1 axis and represents the expected change in the
response per unit change in x1 at constant values of x2. Similarly, $\beta_2$ is the
slope along the x2 axis and represents the expected change in the response
per unit change in x2 while holding x1 constant. Then, u is the error term (or
disturbance).
If for some reason such as omission of relevant variables, measurement
errors, simultaneity, etc., X1 is correlated with u, we say that X1 is an
endogenous explanatory variable.
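Because X1 and X2 are both exogenous, the OLS estimator of $\beta_1$ remains unbiased.
The problem is multicollinearity: the high correlation between X1 and X2 inflates
the variance of the estimator, so $\beta_1$ is estimated imprecisely (large standard
error, wide confidence interval, low t statistic).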
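For reference, the effect of correlation between the regressors on the precision of the slope estimate can be written down directly. The expression below is the standard textbook result under the Gauss-Markov assumptions (including homoskedasticity, which is an assumption added here); $SST_1$ and $R_1^2$ are notation introduced for this sketch.

```latex
\operatorname{Var}(\hat{\beta}_1 \mid X)
  = \frac{\sigma^2}{SST_1 \,\bigl(1 - R_1^2\bigr)},
\qquad
SST_1 = \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 ,
```

where $\sigma^2$ is the variance of u and $R_1^2$ is the R-squared from regressing X1 on X2. As the correlation between X1 and X2 approaches one, $R_1^2 \to 1$ and the variance grows without bound, while the expected value of $\hat{\beta}_1$ is unaffected.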

PART II: APPLIED

PROBLEM 3 (30 points)


Refer to the same data used in the midterm exam (WAGE_IQ.DTA).
i) Estimate a model where each one-point increase in IQ has the same
percentage effect on wage. If IQ increases by 15 points, what is the
approximate percentage increase in predicted wage? Hint: Use the
variable lwage instead of wage. Further, examine the relationship
between the two variables lwage and educ by a) creating a two-way
graph b) getting their correlation. [15 points]

ii) Estimate the following model.

$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,experience + u$

Is there any gender wage gap? How big is the wage gap?

i)
Model: $\log(wage) = \beta_0 + \beta_1\,IQ + u$
Estimated: $\widehat{\log(wage)} = 5.887 + 0.009\,IQ$
Because the dependent variable is in logs, a one-point increase in IQ score is
predicted to increase monthly earnings by approximately 0.9% (100 x 0.009), not by
a dollar amount.
An increase of 15 points in IQ score changes predicted log(wage) by
0.009 x 15 = 0.135, i.e. an approximate 13.5% increase in predicted monthly earnings.
R-squared = 0.0991; variation in IQ explains 9.91% of the variation in log(wage).
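The arithmetic behind a log-level interpretation can be checked with a short Python sketch. It uses the reported slope of 0.009 and compares the usual approximation (100 x coefficient x change in IQ) with the exact percentage change implied by a log model. The function names are illustrative assumptions.

```python
import math

def approx_pct_change(beta, delta_x):
    """Approximate % change in the level of y in a log-level model."""
    return 100.0 * beta * delta_x

def exact_pct_change(beta, delta_x):
    """Exact % change: 100 * (exp(beta * delta_x) - 1)."""
    return 100.0 * math.expm1(beta * delta_x)

beta_iq = 0.009  # slope on IQ reported in the answer
print(approx_pct_change(beta_iq, 15))  # about 13.5
print(exact_pct_change(beta_iq, 15))   # somewhat larger, about 14.5
```

The approximation is what the question asks for; the gap between the two figures widens as the change in log(wage) gets larger.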

Two-way graph of the two variables lwage and educ


Correlation of the two variables lwage and educ

In general, r > 0 indicates a positive linear relationship, r < 0 indicates a negative
linear relationship, while r = 0 indicates no linear relationship (this does not by
itself imply that the variables are independent). A value of r = +1.0 describes a
perfect positive correlation and r = -1.0 describes a perfect negative correlation.
Here, the data show a positive but moderate correlation between lwage and educ, with
a correlation coefficient of 0.3121.
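For reference, the sample correlation coefficient reported above is conventionally defined as follows, writing $x_i$ for educ and $y_i$ for lwage (this notation is introduced here and is not in the original):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1 .
```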

ii)
Given the following model:
$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,experience + u$

There is no female variable in Wage_IQ.dta; thus, I used mother's education (meduc)
instead.


$\widehat{\log(wage)} = 6.35 + 0.033\,(\text{mother's educ}) + 0.008\,(\text{experience})$

An extra year of the mother's education is associated with an approximately 3.3%
increase in predicted wage, while an extra year of experience is associated with
only about 0.8%.


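$\widehat{\log(wage)} = 6.35 + 0.032\,(\text{father's educ}) + 0.011\,(\text{experience})$

An extra year of the father's education is associated with an approximately 3.2%
increase in predicted wage, and an extra year of experience with about 1.1%.
The two specifications differ by only about 0.1 percentage point for parental
education (3.3% vs. 3.2%) and 0.3 percentage point for experience (0.8% vs. 1.1%).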


PROBLEM 4 (40 points)

Using 1388 observations from a survey on mothers, a study was conducted to
examine the relationship between birth weight of newborn babies and cigarette
consumption of mothers. Please load the dataset bwght.dta into Stata and get a
general idea about the data. Hint: Use the describe and summarize Stata commands.
i) Regress the log of birth weight of the newborn baby on the number of cigarettes
smoked during the mother's pregnancy and the log of family income.
Interpret the estimation results.
$\log(bwght) = \beta_0 + \beta_1\,cigs + \beta_2 \log(faminc) + u$
ii) What is the estimated birth weight when cigs = 0? How does it change
when cigs = 20 pieces? Discuss the difference.
iii) Now we use the number of packs of cigarettes as an explanatory
variable. Interpret the parameters and compare with part (i) results.
$\log(bwght) = \beta_0 + \beta_1\,packs + \beta_2 \log(faminc) + u$
iv) Is it enough to run such regression models to explain the causal relationship
between the birth weight, and smoking behavior of the mother and
family income? Please justify your answer.

i) $\widehat{\log(bwght)} = 4.718 - 0.004\,cigs + 0.016 \log(faminc)$
Each additional cigarette consumed is predicted to decrease birth weight by about
0.4%, and a 1% increase in family income is predicted to increase birth weight by
about 0.016%.
R-squared = 0.0258; cigs and log(faminc) together explain 2.58% of the
variation in log(bwght).

ii) If cigs = 0, the predicted log birth weight is 4.734 (this figure implies that
log(faminc) was set to 1, since 4.718 + 0.016 = 4.734).
If 20 cigarettes are consumed, predicted log birth weight falls by
0.004 x 20 = 0.08, an approximate 8% decrease in birth weight:
predicted log(bwght) = 4.718 - 0.004 (20) + 0.016 (1) = 4.654

iii) If a pregnant mother consumes 0 packs of cigarettes, the predicted log birth
weight is again 4.734:
predicted log(bwght) = 4.718 - 0.082 (0) + 0.016 (1) = 4.734
The coefficient on packs (-0.082) means that one pack is predicted to decrease birth
weight by about 8.2%. This is roughly 20 times the per-cigarette coefficient in
part (i), as expected if one pack contains 20 cigarettes.
If 20 packs are consumed, predicted log birth weight falls by 0.082 x 20 = 1.64:
predicted log(bwght) = 4.718 - 0.082 (20) + 0.016 (1) = 3.094
A change this large in logs cannot be read as a 164% decrease; it corresponds to a
decrease of about 80.6% in birth weight (100 x (1 - exp(-1.64))).
The estimates show a negative relationship between the number of cigarettes
consumed and birth weight: the more cigarettes consumed, the lower the predicted
birth weight.
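The comparison between the cigs and packs regressions comes down to a change of units. The Python sketch below assumes one pack contains 20 cigarettes (an assumption, not stated in the exam) and uses the reported coefficients -0.004 and -0.082. It rescales the per-cigarette coefficient to a per-pack one and converts log changes into exact percentage changes. The function and constant names are illustrative.

```python
import math

CIGS_PER_PACK = 20  # assumption: one pack contains 20 cigarettes

def per_pack_coefficient(beta_per_cig, cigs_per_pack=CIGS_PER_PACK):
    """Rescale a per-cigarette coefficient to a per-pack coefficient."""
    return beta_per_cig * cigs_per_pack

def pct_change_in_level(delta_log):
    """Exact % change in birth weight implied by a change in log(bwght)."""
    return 100.0 * math.expm1(delta_log)

beta_cigs = -0.004   # reported coefficient on cigs
beta_packs = -0.082  # reported coefficient on packs

# Implied per-pack effect from the cigs regression vs. the reported one;
# any small gap is plausibly due to rounding of the reported coefficients.
print(per_pack_coefficient(beta_cigs), beta_packs)

# 20 cigarettes: log change of -0.08, a bit under an 8% decrease.
print(pct_change_in_level(20 * beta_cigs))

# 20 packs: log change of -1.64, roughly an 81% decrease (not 164%).
print(pct_change_in_level(20 * beta_packs))
```

The sketch shows why a log change should only be read directly as a percentage when it is small.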
iv) Regression ASSUMES, rather than demonstrates, a causal relationship. If there is
no basis for causality as a result of physical/intellectual/scientific analysis of
the issue, there is no basis for a causal interpretation of the regression. Thus,
regressing birth weight on the smoking behaviour of the mother and family income
alone is not enough, because omitted variables may bias the estimates. No matter how
many variables we control for, there are always factors that are unobserved or
cannot be included, but the bias would be reduced if we include more relevant
variables, such as, in this case, education, cigarette price, and cigarette tax.
