
Applied Econometrics for Managers: Assignment #3

Group 1
PGP18143 SHOUNAK DUTTA
PGP18173 ADITYA KUKKILLAYA
PGP18200 JOY CHATTERJEE
PGP18221 PRASUN KUMAR ROY
PGP18249 UDIT BIBHUTI

(a) Once you import the data, tell R to treat it as panel data and check that the data has been correctly
imported. How do you verify that n=595 and T=7? (Note that the data is a balanced panel: each
individual is observed for 7 years (7 rows) before the next individual’s data is recorded.)
Sol.
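The output of pdim(mydata) is "Balanced Panel: n = 595, T = 7, N = 4165".
This confirms that R treats the data as a balanced panel of n = 595 individuals, each observed
for T = 7 periods, which gives N = 595 × 7 = 4165 rows.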
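The same check can be reproduced by hand by counting rows per individual. The sketch below is plain Python rather than R, and the `ids` column is a hypothetical stand-in for the individual identifier in the dataset (the source does not name that column).

```python
from collections import Counter

def panel_dimensions(ids):
    """Return (n, T, N, balanced) for a sequence of individual IDs, one per row."""
    counts = Counter(ids)            # rows observed per individual
    n = len(counts)                  # number of distinct individuals
    N = len(ids)                     # total number of rows
    periods = set(counts.values())   # distinct row counts across individuals
    balanced = len(periods) == 1     # balanced if everyone has the same count
    T = periods.pop() if balanced else None
    return n, T, N, balanced

# Hypothetical ID column: 595 individuals, 7 consecutive rows each.
ids = [i for i in range(1, 596) for _ in range(7)]
n, T, N, balanced = panel_dimensions(ids)
print(n, T, N, balanced)
```

If any individual had fewer than 7 rows, `balanced` would be False and `T` would be left undefined, which is the situation pdim would report as an unbalanced panel.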

(b) Provide a table with summary statistics for all the variables in the dataset.
Sol.
SUMMARY STATISTICS FOR ALL THE VARIABLES IN THE DATASET

(c) What is the gender-wise distribution in the data? Provide a pie chart showing the distribution
of males and females in the dataset.
Sol.
(d) In what percentage of instances are wages for men covered by a union contract, and in what
percentage are wages for women covered? Create a bar plot showing the instances of male
and female wages being covered and not covered by union contracts.
Sol.

                      Male      Female
No Union Contract     61.66%    78.89%
Union Contract        38.34%    21.11%
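Percentages like these are column shares of a gender-by-union cross-tabulation. A minimal sketch in plain Python follows; the counts are hypothetical placeholders, since the underlying counts are not reported in the source.

```python
def column_percentages(counts):
    """counts: {column: {row: count}} -> {column: {row: percent of column total}}."""
    result = {}
    for column, rows in counts.items():
        total = sum(rows.values())
        result[column] = {row: 100.0 * c / total for row, c in rows.items()}
    return result

# HYPOTHETICAL counts, for illustration only (not taken from the dataset).
counts = {
    "Male": {"No Union Contract": 60, "Union Contract": 40},
    "Female": {"No Union Contract": 80, "Union Contract": 20},
}
shares = column_percentages(counts)
for gender, rows in shares.items():
    print(gender, rows)
```

Each column sums to 100%, which is also true of the reported table (61.66% + 38.34% and 78.89% + 21.11%).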

(e) Let’s say that we are interested in the resident location choice of individuals. The variable
‘South’ represents an individual’s choice to live in the South or not. We want to know if we can
predict this location choice from certain demographic characteristics of an individual. In order to
do this, let’s start with a linear probability model where ‘South’ is the dependent variable and
‘EXP’, ‘WKS’, ‘OCC’, ‘IND’, ‘FEM’, ‘ED’, ‘BLK’ are the explanatory variables. Estimate the
coefficients, and provide the heteroscedasticity corrected standard errors. Explain the results.
(Note that you do not need to incorporate any panel data modelling technique yet)
Sol.
INTERPRETATION OF RESULTS:

• A one-unit increase in EXP decreases the probability of living in SOUTH by 0.00313019,
keeping other factors constant.
• If weeks worked increases by 1 week, the probability of living in SOUTH increases by
0.0028889, keeping other factors constant.
• Compared to an individual with a white-collar job, an individual with a blue-collar job
is 0.04584822 less likely to live in SOUTH, keeping other factors constant.
• Compared to an individual working in a non-manufacturing industry, an individual
working in a manufacturing industry is 0.0846087 less likely to live in SOUTH, keeping
other factors constant.
• Compared to males, females are 0.00734815 more likely to live in SOUTH, keeping other
factors constant.
• One additional year of education (ED) decreases the probability of living in SOUTH by
0.02904409, keeping other factors constant.
• Compared to white people, black people are 0.18127964 more likely to live in SOUTH,
keeping other factors constant.
• All variables except FEM are statistically significant at the 5% level of significance.
• The adjusted R-squared of the model is 0.04282.
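For reference, the model behind these estimates and the reason robust standard errors are needed can be written out. The first line is the linear probability model from the question. The second shows that the error variance depends on the regressors by construction. The third is the White (HC0) sandwich estimator, shown here as an assumption because the source does not state which heteroscedasticity-consistent variant was used.

```latex
\begin{aligned}
P(\text{SOUTH}=1 \mid x) &= \beta_0 + \beta_1\,\text{EXP} + \beta_2\,\text{WKS} + \beta_3\,\text{OCC}
  + \beta_4\,\text{IND} + \beta_5\,\text{FEM} + \beta_6\,\text{ED} + \beta_7\,\text{BLK} \\
\operatorname{Var}(\text{SOUTH} \mid x) &= p(x)\,[1 - p(x)], \qquad p(x) = P(\text{SOUTH}=1 \mid x) \\
\widehat{\operatorname{Var}}(\hat\beta) &= (X'X)^{-1}\Big(\sum_{i} \hat u_i^{2}\, x_i x_i'\Big)(X'X)^{-1}
\end{aligned}
```

Because p(x) changes with x, the variance cannot be constant unless all slopes are zero, so the usual OLS standard errors are not valid for the LPM.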

(f) Provide the distribution or summary statistics of the predicted dependent variable ($\widehat{\text{South}}$).
What do you see?
Sol.

The predicted probabilities for SOUTH lie between 0.05 and 0.69. In this sample all fitted
values therefore fall inside the [0, 1] interval, although the linear probability model does not
guarantee this in general.

(g) Suppose we want to know the likelihood of living in the South for two people with different
demographic characteristics. The first person is a black male having a blue-collar job in the
manufacturing industry with experience=31, who has worked for 52 weeks in a given year and has
10 years of education. The second person is a non-black female having a white-collar job in the
non-manufacturing industry with experience=9, who has worked for 42 weeks in a given year and
has 16 years of education. Calculate the estimated probability of these two individuals living in
the South.
Sol.
First Person - 0.4468046
Second Person - 0.2690407
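The two predictions can be cross-checked against the LPM slopes reported under (e). The intercept is not reported, but it cancels in the difference between the two individuals, so the gap between the predictions should equal the slope-weighted difference in their characteristics. The sketch below is plain Python used as a calculator; the 0/1 coding (OCC = 1 for blue-collar, IND = 1 for manufacturing, FEM = 1 for female, BLK = 1 for black) follows the interpretation given under (e).

```python
# LPM slope estimates as reported in the interpretation under (e).
coef = {
    "EXP": -0.00313019, "WKS": 0.0028889, "OCC": -0.04584822, "IND": -0.0846087,
    "FEM": 0.00734815, "ED": -0.02904409, "BLK": 0.18127964,
}

# Characteristics of the two individuals described in the question.
person1 = {"EXP": 31, "WKS": 52, "OCC": 1, "IND": 1, "FEM": 0, "ED": 10, "BLK": 1}
person2 = {"EXP": 9, "WKS": 42, "OCC": 0, "IND": 0, "FEM": 1, "ED": 16, "BLK": 0}

def slope_part(x):
    """Sum of coefficient * value over all regressors (intercept excluded)."""
    return sum(coef[name] * x[name] for name in coef)

gap_from_coefficients = slope_part(person1) - slope_part(person2)
gap_reported = 0.4468046 - 0.2690407

print(round(gap_from_coefficients, 7), round(gap_reported, 7))
```

The two gaps agree to the reported precision, so the predictions are consistent with the coefficient estimates.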

(h) Now, estimate a logit model instead of LPM from question 7, and show the results. Calculate
the log-likelihood. Why is it negative? Calculate McFadden’s pseudo R-squared.
Sol.
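Log-likelihood = -2417.195
The contribution of observation i to the log-likelihood is
$$\ell_i(\beta) = y_i \log[G(x_i\beta)] + (1 - y_i)\log[1 - G(x_i\beta)]$$
The log-likelihood is always negative. Because $y_i$ is either 0 or 1, each observation contributes
either $\log[G(x_i\beta)]$ or $\log[1 - G(x_i\beta)]$. Both $G(x_i\beta)$ and $1 - G(x_i\beta)$ lie strictly
between 0 and 1, so their natural logs are negative, and so is their sum over all observations.
McFadden’s pseudo R-squared = 0.03658657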
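A short numerical sketch of both points, in plain Python. The per-observation contribution uses the logistic function for G. McFadden's pseudo R-squared is 1 minus the ratio of the model log-likelihood to the intercept-only (null) log-likelihood. The null log-likelihood is not reported in the source, so the value below is only the one implied by the two reported figures.

```python
import math

def logistic(z):
    """Logistic CDF G(z), strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def loglik_contribution(y, xb):
    """Log-likelihood contribution of one observation with outcome y in {0, 1}."""
    p = logistic(xb)
    return y * math.log(p) + (1 - y) * math.log(1.0 - p)

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(intercept only)."""
    return 1.0 - ll_model / ll_null

ll_model = -2417.195        # reported log-likelihood
r2_reported = 0.03658657    # reported pseudo R-squared

# Null log-likelihood IMPLIED by the two reported numbers (not reported itself).
ll_null_implied = ll_model / (1.0 - r2_reported)

print(loglik_contribution(1, 0.5), loglik_contribution(0, 0.5))
print(round(ll_null_implied, 2))
```

Every contribution is negative whatever the outcome and index value, which is why the total is negative as well.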

(i) Now use the logit results to make the same predictions as in question 7. What are the estimated
probabilities of living in the South for the two individuals?
Sol.
First Person - 0.4346457
Second Person - 0.2618315

(j) Now, add the variable ‘MS’ to the logit model in question 8. Use likelihood ratio test to compare
between the two models (one with ‘MS’, the other without ‘MS’).
Sol.

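Since the p-value is greater than 0.05, we fail to reject the null hypothesis that the coefficient
of MS is zero. Adding MS does not significantly improve the fit of the logit model.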
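The mechanics of the likelihood ratio test can be sketched as follows, in plain Python. The statistic is twice the gain in log-likelihood from adding MS, and with one added coefficient it is compared with a chi-square distribution with 1 degree of freedom. The log-likelihood of the model with MS is not reported in the source, so the value used here is a hypothetical placeholder chosen only to illustrate the calculation.

```python
import math

def lr_test_1df(ll_restricted, ll_unrestricted):
    """Likelihood ratio statistic and p-value for one added coefficient."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    if stat < 0:
        raise ValueError("unrestricted model cannot fit worse than the restricted one")
    # For a chi-square with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

ll_without_ms = -2417.195   # reported log-likelihood of the model without MS
ll_with_ms = -2416.500      # HYPOTHETICAL value, not taken from the source

stat, p_value = lr_test_1df(ll_without_ms, ll_with_ms)
print(stat, p_value)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The null hypothesis is rejected at the 5% level only when the statistic exceeds about 3.84, the 95th percentile of the chi-square distribution with 1 degree of freedom.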

(k) Calculate the average partial effects for the model in question 8. Explain the results.
Sol.

INTERPRETATION OF RESULTS (average partial effects):
• On average, a one-unit increase in EXP decreases the probability of living in SOUTH by
0.0031885, keeping other factors constant.
• On average, if weeks worked increases by 1 week, the probability of living in SOUTH
increases by 0.0029508, keeping other factors constant.
• Compared to an individual with a white-collar job, an individual with a blue-collar job
is on average 0.0416492 less likely to live in SOUTH, keeping other factors constant.
• Compared to an individual working in a non-manufacturing industry, an individual
working in a manufacturing industry is on average 0.084724 less likely to live in SOUTH,
keeping other factors constant.
• Compared to males, females are on average 0.0062586 more likely to live in SOUTH,
keeping other factors constant.
• On average, one additional year of education (ED) decreases the probability of living in
SOUTH by 0.0285102, keeping other factors constant.
• Compared to white people, black people are on average 0.174716 more likely to live in
SOUTH, keeping other factors constant.
• All variables except FEM are statistically significant at the 5% level of significance.
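Average partial effects from a logit are on the same scale as LPM coefficients, so the two sets of estimates can be compared directly. The sketch below, in plain Python, puts the LPM slopes reported under (e) next to the average partial effects reported here and finds the largest gap.

```python
# LPM slopes reported under (e) and logit average partial effects reported under (k).
lpm = {
    "EXP": -0.00313019, "WKS": 0.0028889, "OCC": -0.04584822, "IND": -0.0846087,
    "FEM": 0.00734815, "ED": -0.02904409, "BLK": 0.18127964,
}
ape = {
    "EXP": -0.0031885, "WKS": 0.0029508, "OCC": -0.0416492, "IND": -0.084724,
    "FEM": 0.0062586, "ED": -0.0285102, "BLK": 0.174716,
}

# Absolute gap between the two estimates for each variable.
gaps = {name: abs(lpm[name] - ape[name]) for name in lpm}
largest = max(gaps, key=gaps.get)
same_sign = all(lpm[name] * ape[name] > 0 for name in lpm)

for name in lpm:
    print(f"{name:4s} LPM={lpm[name]: .7f}  APE={ape[name]: .7f}  gap={gaps[name]:.7f}")
print("largest gap:", largest, "| all signs agree:", same_sign)
```

All signs agree and the largest gap is below one percentage point, so the LPM and the logit tell essentially the same story for these variables.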

(l) Now we will start using the panel features of the data. To begin with, which of the variables are
showing no time variation and which of the variables are showing no individual variation?
Sol.

(m) Create time dummies. How many time dummies will be there?
Sol.
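Time dummies are created from the time variable. With T = 7 periods there are six time dummies
(T − 1 = 6): one period serves as the base category, and including all seven together with an
intercept would cause perfect collinearity.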
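A minimal sketch of the dummy construction, in plain Python. The names time_dummy2 to time_dummy7 match the regression tables in this assignment; treating period 1 as the base category is inferred from those names, and the period values below are a hypothetical time column for two individuals.

```python
def time_dummies(periods, base):
    """Build 0/1 dummies for every period except the base period."""
    levels = sorted(set(periods))
    kept = [t for t in levels if t != base]   # T - 1 periods get a dummy
    rows = [{f"time_dummy{t}": int(p == t) for t in kept} for p in periods]
    return kept, rows

# Hypothetical time column: two individuals observed in periods 1..7.
periods = list(range(1, 8)) * 2
kept, rows = time_dummies(periods, base=1)

print(len(kept), "dummies:", [f"time_dummy{t}" for t in kept])
print(rows[0])   # period 1: all dummies are 0 (base category)
print(rows[1])   # period 2: only time_dummy2 is 1
```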

(n) Now, we want to know if we can explain the variation in individual earnings by certain
demographic characteristics. In order to do this, we will build a model where ‘LWAGE’ is the
dependent variable and ‘EXP’, ‘WKS’, ‘OCC’, ‘IND’, ‘FEM’, ‘ED’, ‘BLK’, and the time
dummies are the explanatory variables. Estimate the coefficients using pooled OLS, first
difference, and fixed effects. Show the results from the three models in one nice table. Explain the
results.
Sol.
=====================================================
                  Dependent variable: LWAGE
               --------------------------------------
               Pooled OLS   FD           FE
               (1)          (2)          (3)
-----------------------------------------------------
EXP            0.008***                  0.096***
               (0.0005)                  (0.001)

WKS            0.003***     -0.0002      0.001
               (0.001)      (0.001)      (0.001)

OCC            -0.130***    -0.024*      -0.019
               (0.013)      (0.014)      (0.014)

IND            0.070***     0.021        0.023
               (0.011)      (0.016)      (0.016)

FEM            -0.417***
               (0.017)

ED             0.059***
               (0.002)

BLK            -0.140***
               (0.020)

time_dummy2    0.078***     -0.006       -0.007
               (0.019)      (0.007)      (0.008)

time_dummy3    0.202***     0.030***     0.029***
               (0.019)      (0.009)      (0.008)

time_dummy4    0.291***     0.033***     0.032***
               (0.019)      (0.009)      (0.008)

time_dummy5    0.372***     0.027***     0.027***
               (0.019)      (0.009)      (0.008)

time_dummy6    0.443***     0.009        0.009
               (0.019)      (0.007)      (0.008)

time_dummy7    0.523***
               (0.019)

Constant       5.418***     0.096***
               (0.063)      (0.003)

-----------------------------------------------------
Observations   4,165        3,570        4,165
R2             0.501        0.010        0.655
=====================================================
Note: *p<0.1; **p<0.05; ***p<0.01

Interpretation of the pooled OLS estimates (column 1). The dependent variable is LWAGE (log
wage), so a coefficient b is a change in log wage; 100·b approximates the percentage change in
wage only when b is small, and the exact change is 100·(exp(b) − 1)%.

• A one-unit increase in EXP increases the wage by about 0.8%, keeping other factors
constant.
• If weeks worked increases by 1 week, the wage increases by about 0.3%, keeping other
factors constant.
• Compared to an individual with a white-collar job, an individual with a blue-collar job
earns roughly 13% less, keeping other factors constant.
• Compared to an individual working in a non-manufacturing industry, an individual
working in a manufacturing industry earns roughly 7% more, keeping other factors
constant.
• Compared to males, females earn 0.417 log points less (about 34% less in exact terms),
keeping other factors constant.
• One additional year of education (ED) increases the wage by about 5.9%, keeping other
factors constant.
• Compared to white people, black people earn roughly 14% less, keeping other factors
constant.
• Compared to year 1, log wages are higher by 0.078 in year 2, 0.202 in year 3, 0.291 in
year 4, 0.372 in year 5, 0.443 in year 6 and 0.523 in year 7, keeping other factors
constant.
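Because the dependent variable is the log of the wage, reading a coefficient b as "100·b percent" is only an approximation, and it deteriorates as b grows. The exact percentage change implied by b is 100·(exp(b) − 1). The sketch below, in plain Python, applies this to a few pooled OLS coefficients from column (1) of the table.

```python
import math

def exact_pct(b):
    """Exact percentage change in the wage implied by a log-wage coefficient b."""
    return 100.0 * (math.exp(b) - 1.0)

# Selected pooled OLS coefficients from column (1) of the table.
pooled = {
    "EXP": 0.008, "OCC": -0.130, "IND": 0.070, "FEM": -0.417,
    "BLK": -0.140, "time_dummy2": 0.078, "time_dummy7": 0.523,
}

for name, b in pooled.items():
    print(f"{name:12s} approx = {100 * b:6.1f}%   exact = {exact_pct(b):6.1f}%")
```

For small coefficients such as EXP the two readings coincide, but for FEM and the later time dummies the gap is several percentage points, so the exact figure is the safer one to report.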

(o) Can you provide a justification for using one model over the other (pooled OLS, FD, or FE)?
Sol.
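Between the FD and FE models, FE has the higher R2 value (0.655 vs 0.010). Hence the FE model
is selected over FD. Note that the two R2 values are computed on differently transformed
dependent variables (first-differenced LWAGE for FD, demeaned LWAGE for FE), so this
comparison is only indicative.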
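The blank cells in the FD and FE columns follow from how the two estimators transform the data. FE subtracts each individual's own mean and FD subtracts the previous period's value, so anything constant within an individual (FEM, ED, BLK) becomes zero and cannot be estimated. The sketch below, in plain Python, shows this for one hypothetical individual; the values of ED and EXP are made up for illustration.

```python
def demean(values):
    """Within (FE) transformation: subtract the individual's own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def first_difference(values):
    """FD transformation: each value minus the previous period's value."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# HYPOTHETICAL individual observed for 7 years.
ed = [12] * 7                       # education does not change over time
exp_years = [5, 6, 7, 8, 9, 10, 11]  # experience grows by one each year

print(demean(ed))                    # all zeros: ED drops out of FE
print(first_difference(ed))          # all zeros: ED drops out of FD
print(first_difference(exp_years))   # all ones: same as the FD intercept
print(595 * len(first_difference(ed)))  # rows left after differencing
```

Differencing also costs one row per individual, which is why the FD column has 595 × 6 = 3,570 observations, and the differenced EXP is identical to a constant, which is consistent with FD reporting a constant but no separate EXP coefficient.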

(p) Can you build a better model than what is given in question 15 to answer the same question?
Build the better model, show the results, and explain how your model is better.
Sol.
====================================================================
                           Dependent variable: LWAGE
                 ---------------------------------------------------
                 Pooled OLS   FD           FE           FE
                 (1)          (2)          (3)          (4)
--------------------------------------------------------------------
EXP              0.008***                  0.096***     0.060***
                 (0.0005)                  (0.001)      (0.007)

WKS              0.003***     -0.0002      0.001        0.001*
                 (0.001)      (0.001)      (0.001)      (0.001)

OCC              -0.130***    -0.024*      -0.019       -0.016
                 (0.013)      (0.014)      (0.014)      (0.014)

IND              0.070***     0.021        0.023        0.027*
                 (0.011)      (0.016)      (0.016)      (0.015)

FEM              -0.417***
                 (0.017)

ED               0.059***
                 (0.002)

BLK              -0.140***
                 (0.020)

time_dummy2      0.078***     -0.006       -0.007       0.038
                 (0.019)      (0.007)      (0.008)      (0.038)

time_dummy3      0.202***     0.030***     0.029***     -0.064*
                 (0.019)      (0.009)      (0.008)      (0.037)

time_dummy4      0.291***     0.033***     0.032***     -0.067*
                 (0.019)      (0.009)      (0.008)      (0.036)

time_dummy5      0.372***     0.027***     0.027***     -0.018
                 (0.019)      (0.009)      (0.008)      (0.037)

time_dummy6      0.443***     0.009        0.009        0.029
                 (0.019)      (0.007)      (0.008)      (0.038)

time_dummy7      0.523***
                 (0.019)

ED:time_dummy2                                          -0.001
                                                        (0.003)

ED:time_dummy3                                          0.013***
                                                        (0.003)

ED:time_dummy4                                          0.016***
                                                        (0.003)

ED:time_dummy5                                          0.015***
                                                        (0.003)

ED:time_dummy6                                          0.012***
                                                        (0.003)

ED:time_dummy7                                          0.017***
                                                        (0.003)

Constant         5.418***     0.096***
                 (0.063)      (0.003)

--------------------------------------------------------------------
Observations     4,165        3,570        4,165        4,165
R2               0.501        0.010        0.655        0.661
====================================================================
Note: *p<0.1; **p<0.05; ***p<0.01

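Model (4) is the FE model with ED interacted with the time dummies:
LWAGE~EXP+WKS+OCC+IND+FEM+ED+BLK+time_dummy+time_dummy*ED
ED has no time variation, so its main effect is absorbed by the individual fixed effects.
Interacting ED with the time dummies gives terms that do vary over time, which lets the FE
model estimate how the return to education changes by year relative to year 1. With these
interactions the R2 value of the model increases from 0.655 to 0.661.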
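Why the interaction terms survive the fixed-effects transformation while ED alone does not can be seen on one hypothetical individual. In the plain Python sketch below, ED is constant over the 7 years, so demeaning it gives zeros, but ED multiplied by a year dummy is non-zero in only one year and therefore keeps within-individual variation. The education value is made up for illustration.

```python
def demean(values):
    """Within (FE) transformation: subtract the individual's own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# HYPOTHETICAL individual: 12 years of education in every one of 7 years.
periods = [1, 2, 3, 4, 5, 6, 7]
ed = [12] * len(periods)

# ED interacted with the year-3 dummy: equals ED in year 3, zero otherwise.
ed_x_dummy3 = [e * int(t == 3) for e, t in zip(ed, periods)]

ed_within = demean(ed)
interaction_within = demean(ed_x_dummy3)

print(ed_within)            # all zeros: no variation left to estimate from
print(interaction_within)   # non-zero: the interaction is identified under FE
```

Read this way, the coefficient 0.017 on ED:time_dummy7 in column (4) says that one more year of education is associated with a log wage that is 0.017 higher in year 7 than in year 1.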
