William Greene
Stern School of Business
New York University
Part 3
Inference in Binary
Choice Models
Agenda
Measuring the Fit of the Model to the Data
Predicting the Dependent Variable
Hypothesis Tests
Linear Restrictions
Structural Change
Heteroscedasticity
Model Specification (Logit vs. Probit)
Aggregate Prediction and Model Simulation
Scaling and Heteroscedasticity
Choice Based Sampling
How Well Does the Model Fit?
There is no R squared
There are no residuals or sums of squares
The model is not computed to optimize the fit
of the model to the data
Fit measures computed from log L
Pseudo R squared = 1 - logL/logL0
Also called the likelihood ratio index
Others - these do not measure fit.
Direct assessment of the effectiveness of
the model at predicting the outcome
Fit Measures for Binary Choice
Likelihood Ratio Index
Bounded by 0 and 1
Rises when the model is expanded
Can be strikingly low; .038 in our model.
To Compare Models
Use logL
Use information criteria to compare
nonnested models
Fit Measures Based on LogL
----------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -2085.92452 Full model LogL
Restricted log likelihood -2169.26982 Constant term only LogL0
Chi squared [ 5 d.f.] 166.69058
Significance level .00000
McFadden Pseudo R-squared .0384209 1 - LogL/LogL0
Estimation based on N = 3377, K = 6
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.23892 4183.84905 -2LogL + 2K
Fin.Smpl.AIC 1.23893 4183.87398 -2LogL + 2K + 2K(K+1)/(N-K-1)
Bayes IC 1.24981 4220.59751 -2LogL + KlnN
Hannan Quinn 1.24282 4196.98802 -2LogL + 2Kln(lnN)
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| 1.86428*** .67793 2.750 .0060
AGE| -.10209*** .03056 -3.341 .0008 42.6266
AGESQ| .00154*** .00034 4.556 .0000 1951.22
INCOME| .51206 .74600 .686 .4925 .44476
AGE_INC| -.01843 .01691 -1.090 .2756 19.0288
FEMALE| .65366*** .07588 8.615 .0000 .46343
--------+-------------------------------------------------------------
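Every fit statistic in the header above is a simple function of LogL, LogL0, N and K. The minimal Python sketch below reproduces them from the printed log likelihoods. The helper name `fit_measures` is purely illustrative and not part of any package.

```python
import math

def fit_measures(logl, logl0, n, k):
    """Likelihood-based fit statistics for a binary choice model (illustrative helper)."""
    lr = 2.0 * (logl - logl0)                        # chi-squared with K-1 d.f.
    mcfadden = 1.0 - logl / logl0                    # likelihood ratio index
    aic = -2.0 * logl + 2.0 * k
    fin_aic = aic + 2.0 * k * (k + 1) / (n - k - 1)  # finite sample correction
    bic = -2.0 * logl + k * math.log(n)
    hq = -2.0 * logl + 2.0 * k * math.log(math.log(n))
    return {"LR": lr, "McFadden": mcfadden, "AIC": aic, "FinSmplAIC": fin_aic,
            "BIC": bic, "HQ": hq, "AIC/N": aic / n}

m = fit_measures(-2085.92452, -2169.26982, n=3377, k=6)
for name, value in m.items():
    print(f"{name:10s} {value:12.5f}")
```

The normalized column in the output is simply each criterion divided by N, which is shown here only for AIC.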
Fit Measures Based on Predictions
Computation
Use the model to compute
predicted probabilities
Use the model and a rule to
compute predicted y = 0 or 1
Fit measure, compare
predictions to actuals
Fit Measures
+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Logit model for variable DOCTOR |
+----------------------------------------+
| Y=0 Y=1 Total|
| Proportions .34202 .65798 1.00000|
| Sample Size 1155 2222 3377|
+----------------------------------------+
| Log Likelihood Functions for BC Model |
| P=0.50 P=N1/N P=Model| P=.5 => No Model. P=N1/N => Constant only
| LogL = -2340.76 -2169.27 -2085.92| Log likelihood values used in LRI
+----------------------------------------+
| Fit Measures based on Log Likelihood |
| McFadden = 1-(L/L0) = .03842|
| Estrella = 1-(L/L0)^(-2L0/n) = .04909|
| R-squared (ML) = .04816|
| Akaike Information Crit. = 1.23892| Multiplied by 1/N
| Schwartz Information Crit. = 1.24981| Multiplied by 1/N
+----------------------------------------+
| Fit Measures Based on Model Predictions|
| Efron = .04825| Note huge variation. This severely limits
| Ben Akiva and Lerman = .57139| the usefulness of these measures.
| Veall and Zimmerman = .08365|
| Cramer = .04771|
+----------------------------------------+
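The benchmark log likelihoods and the Estrella and ML R-squared values in this table can be checked directly from the counts and the two log likelihoods. The sketch below is a minimal illustration; the function name is hypothetical. The R-squared (ML) line uses 1 - exp[2(LogL0 - LogL)/N], which reproduces the printed .04816.

```python
import math

def loglik_fit_measures(logl, logl0, n, n1):
    """Benchmark log likelihoods plus Estrella and ML R-squared (illustrative helper)."""
    n0 = n - n1
    logl_half = n * math.log(0.5)                                # P = 0.5: no model
    logl_share = n1 * math.log(n1 / n) + n0 * math.log(n0 / n)   # P = N1/N: constant only
    estrella = 1.0 - (logl / logl0) ** (-2.0 * logl0 / n)
    r2_ml = 1.0 - math.exp(2.0 * (logl0 - logl) / n)
    return logl_half, logl_share, estrella, r2_ml

half, share, estrella, r2_ml = loglik_fit_measures(-2085.92452, -2169.26982, n=3377, n1=2222)
print(round(half, 2), round(share, 2), round(estrella, 5), round(r2_ml, 5))
```

Note that the constant-only log likelihood depends only on the sample shares, which is why it equals the "Restricted log likelihood" in the estimation output.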
Cramer Fit Measure
F = Predicted Probability
$\lambda = \dfrac{\sum_{i=1}^{N} y_i F_i}{N_1} - \dfrac{\sum_{i=1}^{N} (1 - y_i) F_i}{N_0}$
= (Mean F | y = 1) - (Mean F | y = 0)
= reward for correct predictions minus
penalty for incorrect predictions
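A minimal sketch of Cramer's measure as the difference of two conditional means of the fitted probabilities. The data are a hypothetical toy example, not the DOCTOR sample, and `cramer_lambda` is an illustrative name.

```python
def cramer_lambda(y, p):
    """Mean fitted P among y = 1 observations minus mean fitted P among y = 0 observations."""
    ones = [pi for yi, pi in zip(y, p) if yi == 1]
    zeros = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Hypothetical toy data
y = [1, 1, 0, 0]
p = [0.9, 0.7, 0.2, 0.4]
lam = cramer_lambda(y, p)
print(lam)
```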
Predicting the Outcome
Predicted probabilities
P = F(a + b1 Age + b2 Income + b3 Female + ...)
Predicting outcomes
Predict y=1 if P is large
Use 0.5 for large (more likely than not)
Generally, predict y = 1 if P > P*
Count successes and failures
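The counting rule can be sketched in a few lines of Python. The sketch uses the Observed Y and Pr[Y=1] columns of the 19 observations in the individual predictions listing; `prediction_table` is a hypothetical helper name.

```python
def prediction_table(y, p, threshold=0.5):
    """2x2 counts table[actual][predicted], predicting y = 1 when p > threshold."""
    table = [[0, 0], [0, 0]]
    for yi, pi in zip(y, p):
        predicted = 1 if pi > threshold else 0
        table[yi][predicted] += 1
    return table

# Observed Y and Pr[Y=1] for the 19 listed observations
y = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0]
p = [.5189097, .6679822, .7149111, .7547710, .5225137, .4522410, .6748805,
     .7336476, .5761684, .7334423, .5936992, .2926339, .7113294, .8282903,
     .5918449, .6829461, .5365487, .6879664, .6726072]

t = prediction_table(y, p)
hits = t[0][0] + t[1][1]
print(t, f"{hits} of {len(y)} correct")
```

Passing a different `threshold`, for example the sample share of ones, is one way to implement a general cutoff P*.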
Individual Predictions from a Logit Model
Predicted Values (* => observation was not in estimating sample.)
Observation Observed Y Predicted Y Residual x(i)b Pr[Y=1]
29 .000000 1.0000000 -1.0000000 .0756747 .5189097
31 .000000 1.0000000 -1.0000000 .6990731 .6679822
34 1.0000000 1.0000000 .000000 .9193573 .7149111
38 1.0000000 1.0000000 .000000 1.1242221 .7547710
42 1.0000000 1.0000000 .000000 .0901157 .5225137
49 .000000 .0000000 .000000 -.1916202 .4522410
52 1.0000000 1.0000000 .000000 .7303428 .6748805
58 .000000 1.0000000 -1.0000000 1.0132084 .7336476
83 .000000 1.0000000 -1.0000000 .3070637 .5761684
90 .000000 1.0000000 -1.0000000 1.0121583 .7334423
109 .000000 1.0000000 -1.0000000 .3792791 .5936992
116 1.0000000 .0000000 1.0000000 -.3408756 .2926339
125 .000000 1.0000000 -1.0000000 .9018494 .7113294
132 1.0000000 1.0000000 .000000 1.5735582 .8282903
154 1.0000000 1.0000000 .000000 .3715972 .5918449
158 1.0000000 1.0000000 .000000 .7673442 .6829461
177 .000000 1.0000000 -1.0000000 .1464560 .5365487
184 1.0000000 1.0000000 .000000 .7906293 .6879664
191 .000000 1.0000000 -1.0000000 .7200008 .6726072
+---------------------------------------------------------+
|Predictions for Binary Choice Model. Predicted value is |
|1 when probability is greater than .500000, 0 otherwise.|
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Value | |
|Value | 0 1 | Total Actual |
+------+----------------+----------------+----------------+
| 0 | 3 ( .1%)| 1152 ( 34.1%)| 1155 ( 34.2%)|
| 1 | 3 ( .1%)| 2219 ( 65.7%)| 2222 ( 65.8%)|
+------+----------------+----------------+----------------+
|Total | 6 ( .2%)| 3371 ( 99.8%)| 3377 (100.0%)|
+------+----------------+----------------+----------------+
Aggregate Predictions
Prediction table is based on predicting aggregate shares.
+---------------------------------------------------------+
|Crosstab for Binary Choice Model. Predicted probability |
|vs. actual outcome. Entry = Sum[Y(i,j)*Prob(i,m)] 0,1. |
|Note, column or row total percentages may not sum to |
|100% because of rounding. Percentages are of full sample.|
+------+---------------------------------+----------------+
|Actual| Predicted Probability | |
|Value | Prob(y=0) Prob(y=1) | Total Actual |
+------+----------------+----------------+----------------+
| y=0 | 431 ( 12.8%)| 723 ( 21.4%)| 1155 ( 34.2%)|
| y=1 | 723 ( 21.4%)| 1498 ( 44.4%)| 2222 ( 65.8%)|
+------+----------------+----------------+----------------+
|Total | 1155 ( 34.2%)| 2221 ( 65.8%)| 3377 ( 99.9%)|
+------+----------------+----------------+----------------+
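The aggregate table replaces 0/1 predictions with sums of predicted probabilities. A minimal sketch with hypothetical toy data follows. Each row sums to the actual count for that outcome. For a logit model with a constant term estimated by ML, the first-order condition for the constant forces the sum of the fitted probabilities to equal the sum of the y values. The two off-diagonal cells are then equal, which is why both show 723 above.

```python
def aggregate_crosstab(y, p):
    """Rows: actual y = 0, 1. Columns: sums of Prob(y=0) and Prob(y=1) over those rows."""
    pairs = list(zip(y, p))
    return [
        [sum((1 - yi) * (1 - pi) for yi, pi in pairs), sum((1 - yi) * pi for yi, pi in pairs)],
        [sum(yi * (1 - pi) for yi, pi in pairs), sum(yi * pi for yi, pi in pairs)],
    ]

y = [1, 1, 0, 0]            # hypothetical toy data
p = [0.9, 0.7, 0.2, 0.4]
tab = aggregate_crosstab(y, p)
print(tab)
```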
Simulating the Model to Examine
Changes in Market Shares
There is no F statistic
Comparisons of Likelihood Functions:
Likelihood Ratio Tests
Distance Measures: Wald Statistics
Lagrange Multiplier Tests
Base Model
H0: Age is not a significant determinant of Prob(Doctor = 1)
H0: β2 = β3 = β5 = 0 (the coefficients on AGE, AGESQ and AGE_INC)
----------------------------------------------------------------------
Binary Logit Model for Binary Choice
Dependent variable DOCTOR
Log likelihood function -2085.92452
Restricted log likelihood -2169.26982
Chi squared [ 5 d.f.] 166.69058
Significance level .00000
McFadden Pseudo R-squared .0384209
Estimation based on N = 3377, K = 6
Information Criteria: Normalization=1/N
Normalized Unnormalized
AIC 1.23892 4183.84905
Fin.Smpl.AIC 1.23893 4183.87398
Bayes IC 1.24981 4220.59751
Hannan Quinn 1.24282 4196.98802
Hosmer-Lemeshow chi-squared = 13.68724
P-value= .09029 with deg.fr. = 8
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
|Characteristics in numerator of Prob[Y = 1]
Constant| 1.86428*** .67793 2.750 .0060
AGE| -.10209*** .03056 -3.341 .0008 42.6266
AGESQ| .00154*** .00034 4.556 .0000 1951.22
INCOME| .51206 .74600 .686 .4925 .44476
AGE_INC| -.01843 .01691 -1.090 .2756 19.0288
FEMALE| .65366*** .07588 8.615 .0000 .46343
--------+-------------------------------------------------------------
Likelihood Ratio Tests
Null hypothesis restricts the parameter vector
Alternative releases the restriction
Test statistic: Chi-squared = 2(LogL|Unrestricted model - LogL|Restrictions) > 0
Degrees of freedom = number of restrictions
LR Test of H0
UNRESTRICTED MODEL RESTRICTED MODEL
Binary Logit Model for Binary Choice Binary Logit Model for Binary Choice
Dependent variable DOCTOR Dependent variable DOCTOR
Log likelihood function -2085.92452 Log likelihood function -2124.06568
Restricted log likelihood -2169.26982 Restricted log likelihood -2169.26982
Chi squared [ 5 d.f.] 166.69058 Chi squared [ 2 d.f.] 90.40827
Significance level .00000 Significance level .00000
McFadden Pseudo R-squared .0384209 McFadden Pseudo R-squared .0208384
Estimation based on N = 3377, K = 6 Estimation based on N = 3377, K = 3
Information Criteria: Normalization=1/N Information Criteria: Normalization=1/N
Normalized Unnormalized Normalized Unnormalized
AIC 1.23892 4183.84905 AIC 1.25974 4254.13136
Fin.Smpl.AIC 1.23893 4183.87398 Fin.Smpl.AIC 1.25974 4254.13848
Bayes IC 1.24981 4220.59751 Bayes IC 1.26518 4272.50559
Hannan Quinn 1.24282 4196.98802 Hannan Quinn 1.26168 4260.70085
Hosmer-Lemeshow chi-squared = 13.68724 Hosmer-Lemeshow chi-squared = 7.88023
P-value= .09029 with deg.fr. = 8 P-value= .44526 with deg.fr. = 8
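The LR statistic for H0 follows from the two log likelihoods above, with 3 degrees of freedom for the three zero restrictions. The sketch below uses the closed-form upper tail of a chi-squared(3) distribution so it needs only the standard library. `chi2_sf_3df` is an illustrative name, not a library function.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

logl_u = -2085.92452   # unrestricted model, K = 6
logl_r = -2124.06568   # restricted model: AGE, AGESQ, AGE_INC coefficients set to zero, K = 3
lr = 2.0 * (logl_u - logl_r)
print(f"LR = {lr:.5f}, p-value = {chi2_sf_3df(lr):.3g}")
```

The statistic is far above the 5% critical value of about 7.81, so H0 is rejected.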
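$LM = \left[\sum_i \mathbf{g}_i(\mathbf{b}_0)\right]' \left[\sum_i \mathbf{g}_i(\mathbf{b}_0)\,\mathbf{g}_i(\mathbf{b}_0)'\right]^{-1} \left[\sum_i \mathbf{g}_i(\mathbf{b}_0)\right]$, where $\mathbf{g}_i(\mathbf{b}_0)$ is the score of observation i in the full model, evaluated at the restricted estimates $\mathbf{b}_0$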
Test Results
Matrix DERIV has 6 rows and 1 columns.
+-------------+
1| .2393443D-05 zero from FOC
2| 2268.60186
3| .2122049D+06
4| .9683957D-06 zero from FOC
5| 849.70485
6| .2380413D-05 zero from FOC
+-------------+
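The LM computation can be sketched for a logit model on simulated data, assuming NumPy is available. Only the restricted model is estimated. Its estimates are padded with zeros for the excluded variable, and the full-model scores are evaluated there. As in the DERIV matrix above, the score sums for the freely estimated parameters are zero from the first-order conditions. The data-generating process and helper names are hypothetical.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson MLE for a logit model (illustrative, no convergence safeguards)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)
        hess = -(X * (p * (1.0 - p))[:, None]).T @ X
        b = b - np.linalg.solve(hess, grad)
    return b

def lm_test(X_full, y, b0):
    """OPG form of the LM statistic, scores evaluated at restricted estimates b0."""
    p = 1.0 / (1.0 + np.exp(-X_full @ b0))
    G = X_full * (y - p)[:, None]      # row i is g_i(b0)'
    g = G.sum(axis=0)                  # sum of the scores
    return float(g @ np.linalg.solve(G.T @ G, g)), g

# Hypothetical simulated data: x2 has no effect on y
rng = np.random.default_rng(0)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + x1)))).astype(float)

X_full = np.column_stack([np.ones(n), x1, x2])
b_r = fit_logit(X_full[:, :2], y)      # restricted model omits x2
b0 = np.append(b_r, 0.0)               # restricted estimates in the full parameter space
lm, score = lm_test(X_full, y, b0)
print("LM =", round(lm, 4), "score sums =", score)
```

Under H0 the statistic is compared to a chi-squared distribution with degrees of freedom equal to the number of restrictions (one here).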
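LogL under specification A is $\sum_{i=1}^{N} \log f_{i,A}(\mathbf{x}_i, \boldsymbol\theta_A)$
Model B specifies density $f_{i,B}(\mathbf{z}_i, \boldsymbol\theta_B)$
LogL under specification B is $\sum_{i=1}^{N} \log f_{i,B}(\mathbf{z}_i, \boldsymbol\theta_B)$
Let $v_i = \log\left[\dfrac{f_{i,A}(\mathbf{x}_i, \boldsymbol\theta_A)}{f_{i,B}(\mathbf{z}_i, \boldsymbol\theta_B)}\right]$
Under some assumptions, $V = \dfrac{\sqrt{N}\,\bar{v}}{s_v} \xrightarrow{d} N[0,1]$, where $\bar{v}$ and $s_v$ are the sample mean and standard deviation of the $v_i$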
Large positive values of V favor model A (greater than 1.96)
Large negative values favor B (less than -1.96)
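Given each model's per-observation log likelihood contributions, Vuong's statistic is a standardized mean of their differences. The minimal sketch below uses hypothetical contributions and the sample standard deviation with an N-1 divisor. Both function names are illustrative.

```python
import math
import statistics

def vuong_statistic(loglik_a, loglik_b):
    """V = sqrt(N) * mean(v) / sd(v), with v_i = log f_A(i) - log f_B(i)."""
    v = [la - lb for la, lb in zip(loglik_a, loglik_b)]
    return math.sqrt(len(v)) * statistics.mean(v) / statistics.stdev(v)

def vuong_decision(V, crit=1.96):
    """Large positive V favors model A, large negative V favors model B."""
    if V > crit:
        return "favors A"
    if V < -crit:
        return "favors B"
    return "inconclusive"

# Hypothetical per-observation log likelihood contributions
la = [-0.5, -0.4, -0.9, -0.3, -0.6]
lb = [-0.6, -0.7, -0.8, -0.5, -0.7]
V = vuong_statistic(la, lb)
print(round(V, 4), vuong_decision(V))
```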
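$\dfrac{\partial E[y \mid \mathbf{x}, \mathbf{z}]}{\partial \mathbf{x}} = E_v\left[\dfrac{\partial \Phi(\cdot)}{\partial \mathbf{x}}\right] = \int_{-\infty}^{\infty} \dfrac{\partial \Phi[\mathbf{x}'\boldsymbol\beta(\mathbf{z}, u, v)]}{\partial \mathbf{x}}\,\phi(v)\,dv$
The integral does not have a closed form, but it can easily be simulated:
$\text{Est.}\,\dfrac{\partial E[y \mid \mathbf{x}, \mathbf{z}]}{\partial \mathbf{x}} = \dfrac{1}{R}\sum_{r=1}^{R} \dfrac{\partial \Phi[\mathbf{x}'\boldsymbol\beta(\mathbf{z}, u, v_r)]}{\partial \mathbf{x}}$, with $v_r$ drawn from $N[0,1]$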
The scale factor is computed using the model coefficients, means of the
variables and 35,000 draws from the standard normal population.
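The simulation idea can be illustrated on a case that has a known answer. The sketch below assumes a hypothetical index a + σv with v ~ N[0,1], which is not the model behind the slide. For that index the expected probability is Φ(a/s) and the expected density, the scale factor, is φ(a/s)/s, where s = √(1 + σ²). Averaging over R = 35,000 draws should reproduce both closely.

```python
import math
import random

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def simulate(a, sigma, R=35000, seed=1):
    """Average Phi and phi of the hypothetical index a + sigma*v_r over R standard normal draws."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(R)]
    prob = sum(Phi(a + sigma * v) for v in draws) / R
    scale = sum(phi(a + sigma * v) for v in draws) / R   # simulated scale factor
    return prob, scale

a, sigma = 0.5, 1.0   # hypothetical index value and heterogeneity scale
prob, scale = simulate(a, sigma)
s = math.sqrt(1.0 + sigma ** 2)
print(prob, Phi(a / s))           # simulated vs. closed form E[y]
print(scale, phi(a / s) / s)      # simulated vs. closed form scale factor
```

Multiplying the simulated scale factor by a coefficient gives the simulated partial effect for that variable in this simple setup.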
Endogenous Binary Variable
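$U^* = \mathbf{x}'\boldsymbol\beta + \gamma h + \varepsilon$
$y = \mathbf{1}[U^* > 0]$
$h^* = \mathbf{z}'\boldsymbol\alpha + u$
$h = \mathbf{1}[h^* > 0]$
$E[\varepsilon \mid h^*] \neq 0, \quad \mathrm{Cov}[u, \varepsilon] \neq 0$
Additional Assumptions:
$(u, \varepsilon) \sim N[(0,0), (\sigma_u^2, \rho\sigma_u, 1)]$
z = a valid set of instrumental variables, uncorrelated with $(u, \varepsilon)$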
Endogenous Binary Variable
P(Y = y,H = h) = P(Y = y|H =h) x P(H=h)
This is a simple bivariate probit model.
Not a simultaneous equations model - the estimator
is FIML, not any kind of least squares.
Application: Doctor, Public
Doctor = F(age, age², income, female, Public)
Public = F(age, educ, income, married, kids, female)
+-----------------------------------------------------+
| Joint Frequency Table for Bivariate Probit Model |
| Predicted cell is the one with highest probability |
+-----------------------------------------------------+
| PUBLIC |
+-------------+---------------------------------------+
| DOCTOR | 0 1 Total |
|-------------+-------------+------------+------------+
| 0 | 1403 | 8732 | 10135 |
| Fitted | ( 127) | ( 2715) | ( 2842) |
|-------------+-------------+------------+------------+
| 1 | 1720 | 15471 | 17191 |
| Fitted | ( 645) | ( 23839) | ( 24484) |
|-------------+-------------+------------+------------+
| Total | 3123 | 24203 | 27326 |
| Fitted | ( 772) | ( 26554) | ( 27326) |
|-------------+-------------+------------+------------+
FIML Estimates
----------------------------------------------------------------------
FIML Estimates of Bivariate Probit Model
Dependent variable DOCPUB
Log likelihood function -25671.43905
Estimation based on N = 27326, K = 14
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
|Index equation for DOCTOR
Constant| .59049*** .14473 4.080 .0000
AGE| -.05740*** .00601 -9.559 .0000 43.5257
AGESQ| .00082*** .681660D-04 12.100 .0000 2022.86
INCOME| .08883* .05094 1.744 .0812 .35208
FEMALE| .34583*** .01629 21.225 .0000 .47877
PUBLIC| .43533*** .07357 5.917 .0000 .88571
|Index equation for PUBLIC
Constant| 3.55054*** .07446 47.681 .0000
AGE| .00067 .00115 .581 .5612 43.5257
EDUC| -.16839*** .00416 -40.499 .0000 11.3206
INCOME| -.98656*** .05171 -19.077 .0000 .35208
MARRIED| -.00985 .02922 -.337 .7361 .75862
HHKIDS| -.08095*** .02510 -3.225 .0013 .40273
FEMALE| .12139*** .02231 5.442 .0000 .47877
|Disturbance correlation
RHO(1,2)| -.17280*** .04074 -4.241 .0000
--------+-------------------------------------------------------------
Model Predictions
+--------------------------------------------------------+
| Bivariate Probit Predictions for DOCTOR and PUBLIC |
| Predicted cell (i,j) is cell with largest probability |
| Neither DOCTOR nor PUBLIC predicted correctly |
| 1599 of 27326 observations |
| Only DOCTOR correctly predicted |
| DOCTOR = 0: 1062 of 10135 observations |
| DOCTOR = 1: 632 of 17191 observations |
| Only PUBLIC correctly predicted |
| PUBLIC = 0: 140 of 3123 observations |
| PUBLIC = 1: 632 of 24203 observations |
| Both DOCTOR and PUBLIC correctly predicted |
| DOCTOR = 0 PUBLIC = 0: 69 of 1403 |
| DOCTOR = 1 PUBLIC = 0: 92 of 1720 |
| DOCTOR = 0 PUBLIC = 1: 252 of 8732 |
| DOCTOR = 1 PUBLIC = 1: 15008 of 15471 |
+--------------------------------------------------------+
Partial Effects
Conditional Mean
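$E[y \mid \mathbf{x}, h] = \Phi(\mathbf{x}'\boldsymbol\beta + \gamma h)$
$E[y \mid \mathbf{x}, \mathbf{z}] = E_h\{E[y \mid \mathbf{x}, h]\}$
$\quad = \mathrm{Prob}(h = 0 \mid \mathbf{z})\,E[y \mid \mathbf{x}, h = 0] + \mathrm{Prob}(h = 1 \mid \mathbf{z})\,E[y \mid \mathbf{x}, h = 1]$
$\quad = \Phi(-\mathbf{z}'\boldsymbol\alpha)\Phi(\mathbf{x}'\boldsymbol\beta) + \Phi(\mathbf{z}'\boldsymbol\alpha)\Phi(\mathbf{x}'\boldsymbol\beta + \gamma)$
Partial Effects
Direct Effects
$\dfrac{\partial E[y \mid \mathbf{x}, \mathbf{z}]}{\partial \mathbf{x}} = [\Phi(-\mathbf{z}'\boldsymbol\alpha)\phi(\mathbf{x}'\boldsymbol\beta) + \Phi(\mathbf{z}'\boldsymbol\alpha)\phi(\mathbf{x}'\boldsymbol\beta + \gamma)]\,\boldsymbol\beta$
Indirect Effects
$\dfrac{\partial E[y \mid \mathbf{x}, \mathbf{z}]}{\partial \mathbf{z}} = [-\phi(-\mathbf{z}'\boldsymbol\alpha)\Phi(\mathbf{x}'\boldsymbol\beta) + \phi(\mathbf{z}'\boldsymbol\alpha)\Phi(\mathbf{x}'\boldsymbol\beta + \gamma)]\,\boldsymbol\alpha$
$\quad = \phi(\mathbf{z}'\boldsymbol\alpha)[\Phi(\mathbf{x}'\boldsymbol\beta + \gamma) - \Phi(\mathbf{x}'\boldsymbol\beta)]\,\boldsymbol\alpha$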
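The direct and indirect effects are scalar factors multiplying β and α. This minimal sketch computes them and follows the conditional mean formula of this slide, which averages over h using its marginal probability. γ is taken from the FIML table; the index values x′β and z′α are hypothetical. A finite-difference check confirms that both factors are the derivatives of the conditional mean.

```python
import math

def Phi(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cond_mean(xb, za, gamma):
    """E[y|x,z] = Phi(-z'a)Phi(x'b) + Phi(z'a)Phi(x'b + gamma)."""
    return Phi(-za) * Phi(xb) + Phi(za) * Phi(xb + gamma)

def direct_factor(xb, za, gamma):
    """Multiply by beta to get dE[y|x,z]/dx."""
    return Phi(-za) * phi(xb) + Phi(za) * phi(xb + gamma)

def indirect_factor(xb, za, gamma):
    """Multiply by alpha to get dE[y|x,z]/dz."""
    return phi(za) * (Phi(xb + gamma) - Phi(xb))

gamma = 0.43533        # PUBLIC coefficient in the DOCTOR equation (FIML table)
xb, za = 0.3, 1.2      # hypothetical index values x'b and z'a
print(direct_factor(xb, za, gamma), indirect_factor(xb, za, gamma))
```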
Identification Issues
Exclusions are not needed for estimation
Identification is, in principle, by functional form
Researchers usually have a variable in the
treatment equation that is not in the main probit
equation to improve identification
A fully simultaneous model
y1 = f(x1,y2), y2 = f(x2,y1)
Not identified even with exclusion restrictions
A Sample Selection Model
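$U^* = \mathbf{x}'\boldsymbol\beta + \varepsilon$
$y = \mathbf{1}[U^* > 0]$
$h^* = \mathbf{z}'\boldsymbol\alpha + u$
$h = \mathbf{1}[h^* > 0]$
$E[\varepsilon \mid h] \neq 0, \quad \mathrm{Cov}[u, \varepsilon] \neq 0$
(y, x) are observed only when h = 1
Additional Assumptions:
$(u, \varepsilon) \sim N[(0,0), (\sigma_u^2, \rho\sigma_u, 1)]$
z = a valid set of instrumental variables, uncorrelated with $(u, \varepsilon)$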
Application: Doctor,Public
3 Groups of observations: (Public=0), (Doctor=1|Public=1), (Doctor=0|Public=1)
Sample Selection
Doctor = F(age, age², income, female), observed only when Public = 1
Public = F(age, educ, income, married, kids, female)
Selected Sample
+-----------------------------------------------------+
| Joint Frequency Table for Bivariate Probit Model |
| Predicted cell is the one with highest probability |
+-----------------------------------------------------+
| PUBLIC |
+-------------+---------------------------------------+
| DOCTOR | 0 1 Total |
|-------------+-------------+------------+------------+
| 0 | 0 | 8732 | 8732 |
| Fitted | ( 0) | ( 511) | ( 511) |
|-------------+-------------+------------+------------+
| 1 | 0 | 15471 | 15471 |
| Fitted | ( 477) | ( 23215) | ( 23692) |
|-------------+-------------+------------+------------+
| Total | 0 | 24203 | 24203 |
| Fitted | ( 477) | ( 23726) | ( 24203) |
|-------------+-------------+------------+------------+
| Counts based on 24203 selected of 27326 in sample |
+-----------------------------------------------------+
ML Estimates
----------------------------------------------------------------------
FIML Estimates of Bivariate Probit Model
Dependent variable DOCPUB
Log likelihood function -23581.80697
Estimation based on N = 27326, K = 13
Selection model based on PUBLIC
Means for vars. 1- 5 are after selection.
--------+-------------------------------------------------------------
Variable| Coefficient Standard Error b/St.Er. P[|Z|>z] Mean of X
--------+-------------------------------------------------------------
|Index equation for DOCTOR
Constant| 1.09027*** .13112 8.315 .0000
AGE| -.06030*** .00633 -9.532 .0000 43.6996
AGESQ| .00086*** .718153D-04 11.967 .0000 2041.87
INCOME| .07820 .05779 1.353 .1760 .33976
FEMALE| .34357*** .01756 19.561 .0000 .49329
|Index equation for PUBLIC
Constant| 3.54736*** .07456 47.580 .0000
AGE| .00080 .00116 .690 .4899 43.5257
EDUC| -.16832*** .00416 -40.490 .0000 11.3206
INCOME| -.98747*** .05162 -19.128 .0000 .35208
MARRIED| -.01508 .02934 -.514 .6072 .75862
HHKIDS| -.07777*** .02514 -3.093 .0020 .40273
FEMALE| .12154*** .02231 5.447 .0000 .47877
|Disturbance correlation
RHO(1,2)| -.19303*** .06763 -2.854 .0043
--------+-------------------------------------------------------------
Estimation Issues
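True (population) proportion of y = 1 is $\pi$
Sample proportion of y = 1 is $p$
Weight $w_i$ = a function of $y_i$ (two values): $w_i = \pi/p$ if $y_i = 1$, $w_i = (1-\pi)/(1-p)$ if $y_i = 0$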
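One standard way to use such two-valued weights is the Manski-Lerman weighted exogenous sample maximum likelihood (WESML) estimator. It weights each observation's log likelihood contribution by its outcome's population share divided by its sample share. The sketch below computes the weights and the weighted log likelihood for a hypothetical choice-based sample in which y = 1 is oversampled. The function names and the assumed true share of 0.25 are illustrative.

```python
import math

def wesml_weights(y, true_share):
    """Weight = population share / sample share of the observed outcome."""
    n = len(y)
    sample_share = sum(y) / n
    w1 = true_share / sample_share
    w0 = (1.0 - true_share) / (1.0 - sample_share)
    return [w1 if yi == 1 else w0 for yi in y]

def weighted_loglik(y, p, w):
    """Weighted binary choice log likelihood: sum of w_i * log Prob(y_i)."""
    return sum(wi * math.log(pi if yi == 1 else 1.0 - pi) for yi, pi, wi in zip(y, p, w))

y = [1, 1, 1, 0]                       # hypothetical sample, y = 1 oversampled
w = wesml_weights(y, true_share=0.25)  # assumed population share of y = 1
print(w, weighted_loglik(y, [0.6, 0.7, 0.5, 0.3], w))
```

With these weights the weighted sample reproduces the population share of y = 1, and the weights sum to N.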
Redefined Multinomial Choice: Fly vs. Ground
Choice Based Sample