Multinomial 2

POLS570 - Limited Dependent Variables
October 25, 2006

Interpreting Multinomial Logit
1 Introduction
Today we'll talk about interpreting MNL models. We'll start with general
issues of model fit, and then get to variable effects.
2 Model Fit in the MNL Model
Consider again a general model for $i \in \{1, \ldots, N\}$ observations where we have $J$
possible outcomes $j \in \{1, \ldots, J\}$ on the dependent variable $Y_i$, and $K$ independent
variables $X_{ik} \in \{X_{i1}, \ldots, X_{iK}\}$ with associated $K \times 1$ vectors of parameters
$\beta_j$ for each of the alternatives, so that we have $\beta_{jk} \in \{\beta_{j1}, \ldots, \beta_{jK}\}$.
After running a MNL, we want to know how well the model fits the data. To
figure this out, there are several alternatives.
2.1 Likelihood ratio tests
This is a test against the global null (that is, a test of whether all the slope
coefficients, other than the intercepts, are zero: $\beta_{jk} = 0 \; \forall j, k$).
This is reported automatically by most software, including Stata.
It tells you whether you can reject this null, which is often not very useful.
2.2 Likelihood ratio tests on specific parameters/variables
How to do it...
1. Estimate models including (the unrestricted model) and excluding
(the restricted model) the m variable(s) in question (m < K, naturally).
2. Calculate the LR statistic, $-2(\ln L_{restricted} - \ln L_{unrestricted})$ (that is,
-2 times the difference of the lnLs).
3. This is distributed as $\chi^2_{m(J-1)}$; that is, with $J-1$ degrees of freedom
for each of the m excluded variables in the restricted model.
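As a rough illustration of the three steps above, here is a minimal Python sketch using only the standard library. The function names are hypothetical (not part of Stata or any package). It computes the LR statistic, its m(J-1) degrees of freedom, and the upper-tail chi-square p-value. The usage line plugs in the null and full-model log likelihoods from the example in Section 2.5, treating the intercept-only model as the restricted one, so m = 2 and J = 3.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for an integer df >= 1.

    Starts from the closed forms for df = 1 or df = 2 and steps up by 2 using
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def mnl_lr_test(lnl_restricted: float, lnl_unrestricted: float, m: int, J: int):
    """LR test for dropping m variables from a J-outcome MNL model."""
    stat = -2.0 * (lnl_restricted - lnl_unrestricted)
    df = m * (J - 1)  # J - 1 coefficients per excluded variable
    return stat, df, chi2_sf(stat, df)


# Intercept-only (null) model vs. full model from the 1996 voting example
stat, df, p = mnl_lr_test(-354.32706, -259.96055, m=2, J=3)
print(f"LR chi2({df}) = {stat:.2f}, p = {p:.4f}")
```

This reproduces the chi2(4) = 188.73 reported by -mlogit- in Section 2.5.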
The results of this test tell you whether the effect of the variable(s) is/are
jointly significant across the various outcomes.
I.e., whether inclusion of those independent variable(s) helps you predict the
dependent variable to a statistically significant degree; note that this
is different from the individual-choice effects of the variable (i.e., the
z-tests on each coefficient).
We can also use this method to test the hypothesis that all the variables
have no joint effect on one of the outcomes.
Use the -test- command after estimation in Stata. (Strictly speaking, -test-
computes a Wald test; the LR version is available through -lrtest-.)
More on this in a bit...
A Wald test (described in Long, p. 161-2) is asymptotically equivalent
to this...
2.3 Pseudo-R-squared...
These are usually calculated based on some function of the null and
model lnLs.
Stata uses $1 - \frac{\ln L_{model}}{\ln L_{null}}$.
Maddala (1983) explains why pseudo-$R^2$ measures usually aren't very good measures of fit...
Basically, for some of them (e.g., the likelihood-ratio-based $R^2$), even a perfectly
fitting model will have a pseudo-$R^2$ less than 1.0; sometimes, a lot less...
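To make the Stata formula concrete, and to illustrate the kind of problem Maddala raises, the Python sketch below computes Stata's (McFadden's) measure alongside the maximum-likelihood (Cox-Snell) $R^2$, $1 - (L_{null}/L_{model})^{2/N}$. It also computes that measure's ceiling, $1 - L_{null}^{2/N}$, which only a perfectly fitting model ($\ln L_{model} = 0$) would reach. The inputs are the log likelihoods and N from the Section 2.5 example; the function names are illustrative.

```python
import math


def mcfadden_r2(lnl_model: float, lnl_null: float) -> float:
    """Stata's pseudo-R2: 1 - lnL_model / lnL_null."""
    return 1.0 - lnl_model / lnl_null


def ml_r2(lnl_model: float, lnl_null: float, n: int) -> tuple[float, float]:
    """ML (Cox-Snell) R2 and its upper bound, reached only by a perfect fit."""
    r2 = 1.0 - math.exp(2.0 * (lnl_null - lnl_model) / n)
    upper = 1.0 - math.exp(2.0 * lnl_null / n)  # perfect fit: lnL_model = 0
    return r2, upper


lnl_null, lnl_model, n = -354.32706, -259.96055, 382
print(round(mcfadden_r2(lnl_model, lnl_null), 4))
r2, upper = ml_r2(lnl_model, lnl_null, n)
print(round(r2, 3), round(upper, 3))
```

For these data the ML measure's ceiling is only about 0.84, so even a perfect model could not reach 1.0 on that scale.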
2.4 Predictive ability of the model
The idea is to generate predicted probabilities, and see how well the
model replicates the observed distribution of outcomes on the depen-
dent variable.
How to do it:
1. Recall that the basic probability statement for the MNL model is:
$$\Pr(Y_i = j) = \frac{\exp(X_i \beta_j)}{\sum_{l=1}^{J} \exp(X_i \beta_l)} \qquad (1)$$
The MNL model will therefore generate J predicted probabilities
for each observation; i.e., the probability that each observation
will fall into each of the J categories (as a function of its values on
the independent variables).
2. You can then classify each observation into one of the J categories,
based on the highest of these probabilities, and then calculate a
reduction in error statistic (a la binary logit/probit)...
This can have its problems as well...
There is a tendency for the model to predict most/all observations
in one category, even if the variables seem to have signicant ef-
fects.
This is especially true if the dependent variable is very skewed
into one category.
One way around this:
Calculate the probabilities, then see if, for data with that par-
ticular pattern of independent variable values, how closely
the proportion of cases in each category matches those prob-
abilities.
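The classification rule in step 2, and the pattern-by-pattern check suggested as a way around its problems, can be sketched in Python. This is a minimal illustration under a few assumptions. Outcomes are coded 0 to J - 1. Each observation carries a hashable label for its covariate pattern. The five observations are invented to mimic a skewed outcome and are not taken from the voting data.

```python
from collections import defaultdict


def classify(probs: list[float]) -> int:
    """Predicted category: the index of the largest predicted probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


def calibration_by_pattern(rows, J: int) -> dict:
    """Observed outcome shares vs. mean predicted probabilities, by covariate pattern.

    rows: iterable of (pattern, outcome, probs); outcome is coded 0..J-1 and
    probs is the model's list of J predicted probabilities.
    """
    groups = defaultdict(list)
    for pattern, outcome, probs in rows:
        groups[pattern].append((outcome, probs))
    result = {}
    for pattern, obs in groups.items():
        n = len(obs)
        observed = [sum(1 for y, _ in obs if y == j) / n for j in range(J)]
        predicted = [sum(p[j] for _, p in obs) / n for j in range(J)]
        result[pattern] = (observed, predicted)
    return result


# Invented data: category 0 always has the largest predicted probability
rows = [
    ("low", 0, [0.6, 0.1, 0.3]),
    ("low", 2, [0.6, 0.1, 0.3]),
    ("low", 0, [0.6, 0.1, 0.3]),
    ("high", 1, [0.5, 0.4, 0.1]),
    ("high", 0, [0.5, 0.4, 0.1]),
]
print([classify(p) for _, _, p in rows])  # every case lands in category 0
for pattern, (obs, pred) in calibration_by_pattern(rows, J=3).items():
    print(pattern, [round(v, 2) for v in obs], [round(v, 2) for v in pred])
```

Here the argmax rule never predicts category 1, yet within the "high" pattern half the cases fall into category 1 against a predicted 0.4. That is the kind of information the pattern-by-pattern comparison recovers.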
2.5 A Goodness-of-fit Example
Consider the model of presidential voting in 1996, slightly modified from last
time (because we've included ideosq, the ideology variable squared, to test
for a quadratic effect...):
. mlogit prezvote selfideo ideosq
Iteration 0: Log Likelihood =-354.32706...
Iteration 6: Log Likelihood =-259.96055
Multinomial regression Number of obs = 382
chi2(4) = 188.73
Prob > chi2 = 0.0000
Log Likelihood = -259.96055 Pseudo R2 = 0.2663
----------------------------------------------------------------------
prezvote | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+------------------------------------------------------------
2 |
selfideo | 3.550536 1.423881 2.494 0.013 .7597812 6.341291
ideosq |-.2163453 .1465542 -1.476 0.140 -.5035863 .0708957
_cons |-11.71739 3.388688 -3.458 0.001 -18.35909 -5.075679
---------+----------------------------------------------------------
3 |
selfideo | 1.659943 1.045926 1.587 0.113 -.3900346 3.70992
ideosq |-.1288385 .1214894 -1.060 0.289 -.3669534 .1092764
_cons |-6.118547 2.199407 -2.782 0.005 -10.42931 -1.807788
----------------------------------------------------------------------
(Outcome prezvote==1 is the comparison group)
Consider a couple of things:
1. The reported chi-square value is the LR test of the null that the coefficients
on all the variables are jointly zero; we can easily reject this (null) hypothesis.
2. Similarly, the pseudo-$R^2$ is respectable, but doesn't tell you anything
that different from the $-2\ln L$ test, since it's just equal to
$1 - \frac{-259.96}{-354.33} = 0.2663$.
We may also want to test some coefficient-specific or equation-specific
hypotheses, using the -test- command. The general syntax is:
. test [equation]variable(s)
where [equation] is the coding for the outcome you're interested in, and
variable(s) is the variable(s) whose effects you're interested in testing. Note
that you need not specify both the equation and the variable: if you don't
specify an equation, it tests the joint effects of the variable(s) indicated across
all possible outcome categories, while if you don't specify a variable, it tests
the joint significance of all the variables in that outcome category. We can
also include equality statements, to test hypotheses about restrictions on
coefficients across outcomes or variables.
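Behind the scenes, -test- computes a Wald statistic, $W = (R\hat{\beta})'(R\hat{V}R')^{-1}(R\hat{\beta})$ for restrictions of the form $R\beta = 0$. Each row of $R$ encodes one restriction, and $\hat{V}$ is the estimated covariance matrix of the coefficients. The Python sketch below uses the standard library only; the names and the coefficient ordering are illustrative assumptions, not Stata internals. It shows how the kinds of tests just described map onto rows of $R$. Reproducing the joint tests that follow would require the full covariance matrix. Stata keeps it in e(V) after estimation, but these notes do not report it, so $\hat{V}$ is left as an input.

```python
# Stacked slope coefficients in an assumed order; the constants are omitted
# because they carry zero weight in every restriction below.
NAMES = ["[2]selfideo", "[2]ideosq", "[3]selfideo", "[3]ideosq"]


def row(weights: dict) -> list[float]:
    """One restriction (one row of R): coefficient name -> weight."""
    return [float(weights.get(name, 0.0)) for name in NAMES]


def solve(A: list[list[float]], y: list[float]) -> list[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def wald(b: list[float], V: list[list[float]], R: list[list[float]]) -> tuple[float, int]:
    """W = (Rb)' (R V R')^{-1} (Rb) for H0: R b = 0; df = number of rows of R."""
    K = len(b)
    Rb = [sum(r[k] * b[k] for k in range(K)) for r in R]
    RV = [[sum(r[k] * V[k][m] for k in range(K)) for m in range(K)] for r in R]
    RVR = [[sum(rv[m] * r2[m] for m in range(K)) for r2 in R] for rv in RV]
    return sum(u * v for u, v in zip(Rb, solve(RVR, Rb))), len(R)


# The three kinds of -test- used in these notes, written as restriction matrices:
R_ideosq = [row({"[2]ideosq": 1}), row({"[3]ideosq": 1})]    # . test ideosq
R_perot = [row({"[3]selfideo": 1}), row({"[3]ideosq": 1})]   # . test [3]
R_equal = [row({"[2]selfideo": 1, "[3]selfideo": -1})]       # . test [2]selfideo=[3]selfideo
```

Given the estimated coefficients and their covariance matrix, `wald(b, V, R_ideosq)` would return the chi2(2) statistic for the first example below.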
Some examples:
Test whether ideosq is jointly significant:
. test ideosq
( 1) [2]ideosq = 0.0
( 2) [3]ideosq = 0.0
chi2( 2) = 2.76
Prob > chi2 = 0.2515
So we cannot confidently reject the null that ideosq has no effect,
overall.
Test whether both of the variables have a significant effect on voting
for Perot vs. Clinton:
. test [3]
( 1) [3]selfideo = 0.0
( 2) [3]ideosq = 0.0
chi2( 2) = 10.93
Prob > chi2 = 0.0042
We can quite confidently say that the two variables have a significant
joint impact on the probability of voting for Perot, as against Clinton.
(Note that, if they had not, and our categories were aggregates rather
than discrete choices, we might have wanted to consider combining the two
outcomes.)
Test whether the direct effect of ideology has the same impact on voting
for Perot as it does on voting for Dole (as against Clinton):
. test [2]selfideo=[3]selfideo
( 1) [2]selfideo - [3]selfideo = 0.0
chi2( 1) = 1.37
Prob > chi2 = 0.2410
We cannot confidently reject the hypothesis that the impact of ideology
on Perot and Dole voters is equal. Now, suppose we wanted to test the
same hypothesis for Clinton vs. Dole:
. test [1]selfideo=[2]selfideo
( 1) - [2]selfideo = 0.0
chi2( 1) = 6.22
Prob > chi2 = 0.0126
Note here that:
Since we assume that $\beta_{\text{Clinton}} = 0$ for selfideo (b/c he is the baseline category),
this is the same as testing whether $(0 - \beta_{\text{Dole}}) = 0$.
This ought to be the same as testing whether $\beta_{\text{Dole}} = 0$, right?
It is: this test gives a chi-square value that's simply equal to the square
of the z-statistic for $\beta_{\text{Dole}} = 0$ (that is, $2.494^2 = 6.22$).
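The arithmetic is easy to verify from the Section 2.5 output. Below is a minimal Python check using the reported coefficient and standard error for selfideo in the Dole equation. For one degree of freedom, the chi-square upper tail equals erfc(sqrt(W/2)), the two-sided normal p-value, so only the standard library is needed.

```python
import math

b_dole, se_dole = 3.550536, 1.423881  # [2]selfideo from the mlogit output

z = b_dole / se_dole
wald = z ** 2                          # chi2(1) for [1]selfideo = [2]selfideo
p = math.erfc(math.sqrt(wald / 2))     # P(chi2_1 > wald) = P(|Z| > |z|)
print(f"z = {z:.3f}, chi2(1) = {wald:.2f}, p = {p:.4f}")
```

This reproduces the chi2(1) = 6.22 and Prob > chi2 = 0.0126 reported by -test-, up to rounding.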
There are other tests that one might do; they're pretty straightforward to implement
using the -test- command. We'll discuss predicted values in a bit, after we
discuss interpretation of the coefficients.
3 Interpretation of MNL Variable Effects
Typically, interpreting MNL coefficients is not especially easy (there are lots
of parameters, a nonlinear form, etc.). A few key things to remember:
1. Always bear in mind what your baseline, comparison category is; all
reported results are relative to this.
2. It's generally easier to talk in terms of probabilities (either predictions
or changes), or even odds ratios, than as partials/derivatives.
3.1 Partial changes
The partial change in $\Pr(Y_i = j)$ for a particular variable $X_k$ is:
$$\frac{\partial \Pr(Y_i = j)}{\partial X_k} = \Pr(Y_i = j \mid X)\left[\beta_{jk} - \sum_{l=1}^{J} \beta_{lk} \Pr(Y_i = l \mid X)\right] \qquad (2)$$
For several reasons (outlined in Long and Maddala), this is a bad indicator
of a variable's influence on $\Pr(Y_i = j)$...
It depends entirely on the values at which the other variables are
set when the derivative is taken.
It may or may not have the same sign as the coefficient itself.
Its sign may change with the value of the variable in question (not
unlike in the ordered models...).
IMO, in general, you're better off not using it.
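The last two points are easy to see numerically. The Python sketch below evaluates equation (2) for a model with one regressor and no constant. It uses hypothetical slopes of 0 (baseline), 2.0, and 1.0, invented purely for illustration, and checks the result against a central finite difference.

```python
import math


def probs(x: float, slopes: list[float]) -> list[float]:
    """MNL probabilities for one regressor and no constant; slopes[0] = 0 is the baseline."""
    weights = [math.exp(b * x) for b in slopes]
    total = sum(weights)
    return [w / total for w in weights]


def partial_effect(x: float, slopes: list[float], j: int) -> float:
    """Equation (2): dPr(Y = j)/dx = Pr_j * (beta_j - sum_l beta_l * Pr_l)."""
    p = probs(x, slopes)
    avg_beta = sum(b * pl for b, pl in zip(slopes, p))
    return p[j] * (slopes[j] - avg_beta)


slopes = [0.0, 2.0, 1.0]  # hypothetical: baseline, category 2, category 3
h = 1e-6
for x in (-2.0, 0.0, 2.0):
    analytic = partial_effect(x, slopes, j=2)  # index 2 is the third category
    numeric = (probs(x + h, slopes)[2] - probs(x - h, slopes)[2]) / (2 * h)
    print(f"x = {x:+.1f}: analytic {analytic:+.4f}, finite difference {numeric:+.4f}")
```

The third category's coefficient is positive. Even so, its probability rises with x at x = -2, is flat at x = 0, and falls at x = 2. This happens because equation (2) subtracts the probability-weighted average of the coefficients, which approaches 2.0 as category 2 comes to dominate.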
3.2 Odds Ratios
The MNL can be thought of as a log-odds model, where the log of the ratio
of two probabilities is a function of the independent variables:
$$\ln\left[\frac{\Pr(Y_i = j \mid X)}{\Pr(Y_i = j' \mid X)}\right] = X(\beta_j - \beta_{j'}) \qquad (3)$$
If (as is always the case, as a practical matter) we set the coefficients of one
category (say, $\beta_{j'}$) to zero, then we just get:
$$\ln\left[\frac{\Pr(Y_i = j \mid X)}{\Pr(Y_i = j' \mid X)}\right] = X\beta_j$$
One nice thing about this approach is that it is linear in the variables; this
in turn means that we can get the change in the odds for category $j$
associated with a particular variable $X_k$ by just examining $\exp(\beta_{jk})$...
So for a one-unit change in $X_k$, the odds of observing the relevant
category $j$ (versus the baseline category) will change by a factor of $\exp(\beta_{jk})$.
And for a change of some value $\delta$ in $X_k$, the relative odds of the selected
outcome $j$, relative to the baseline, will change by a factor of $\exp(\beta_{jk}\delta)$.
3.3 Odds Ratios: An Example
Consider again the results from the simple two-variable model, above...
A one-unit increase in selfideo corresponds to:
An increase in the log-odds of a Dole vote, versus a vote for Clinton,
of 3.55; that is, the odds are multiplied by exp(3.55) = 34.8.
An increase in the log-odds of a Perot vote, versus a vote for
Clinton, of 1.66; that is, the odds are multiplied by exp(1.66) = 5.26.
These are LARGE increases in the odds; not surprisingly, more
conservative voters are much more likely to vote for Dole (or Perot) than
for Clinton.
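These factor changes are just exponentiated coefficients. The minimal Python sketch below reproduces the RRR column of the output that follows from the coefficient table above. The standard error uses the delta method (exp(b) times the SE of b), which matches the reported values. The two-unit change (delta = 2) is an arbitrary illustrative choice. Like the text, the sketch ignores the accompanying change in ideosq.

```python
import math

# selfideo coefficients and standard errors (vs. Clinton) from the mlogit output
coefs = {"Dole": 3.550536, "Perot": 1.659943}
ses = {"Dole": 1.423881, "Perot": 1.045926}

for outcome, b in coefs.items():
    rrr = math.exp(b)              # factor change in the odds for a one-unit change
    rrr_se = rrr * ses[outcome]    # delta-method standard error of exp(b)
    rrr_delta = math.exp(2.0 * b)  # factor change for a change of delta = 2 units
    print(f"{outcome}: exp(b) = {rrr:.2f} (SE {rrr_se:.2f}); delta = 2 gives {rrr_delta:.1f}")
```

The one-unit values match the RRR and Std. Err. columns (34.83 and 49.60 for Dole; 5.26 and 5.50 for Perot).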
Stata will automatically convert coefficients to odds ratios, with the -rrr-
(for relative risk ratio) option. (Note that retyping the command after
estimation, without a variable list, just redisplays the last results estimated):
. mlogit, rrr
Multinomial regression Number of obs = 382
chi2(4) = 188.73
Prob > chi2 = 0.0000
Log Likelihood = -259.96055 Pseudo R2 = 0.2663
---------------------------------------------------------------------
prezvote | RRR Std. Err. z P>|z| [95% Conf. Interval]
---------+-----------------------------------------------------------
2 |
selfideo | 34.83199 49.5966 2.494 0.013 2.137808 567.5287
ideosq | .8054571 .1180432 -1.476 0.140 .6043593 1.073469
---------+-----------------------------------------------------------
3 |
selfideo | 5.25901 5.500535 1.587 0.113 .6770334 40.85054
ideosq | .8791159 .1068033 -1.060 0.289 .6928419 1.115471
---------------------------------------------------------------------
(Outcome prezvote==1 is the comparison group)
Stata now reports $\exp(\beta_{jk})$ rather than $\beta_{jk}$. (Note that neither of these
accounts for the quadratic effects: a one-unit change in selfideo also changes
ideosq, so it's not a very good example.)
4 Predicted Probabilities and Probability Changes
We can use the basic probability statement of the MNL to generate predictions
for the probability of each category, à la the binomial logit/probit. As
in those models, we're required to select and set the values of the other
independent variables (typically means or medians). We can then do the usual
stuff:
Examine predictions across ranges of independent variables.
Examine changes in predictions with unit/std. dev./min-max changes
in independent variables.
Plot any/all of the above.
This can be an excellent way to show results (e.g., the Sellers paper).
4.1 Example: Predicted Probabilities
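Stata will generate J predicted values for each observation, one for
each category of the dependent variable. (By default, -predict- computes these
for every observation with nonmissing values on the independent variables, not
just the estimation sample, which is why the summary below shows 456 observations
rather than 382.) However, when using -predict-, you need to tell Stata which
equation (outcome) you want the predicted values for, using the -outcome- option: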
. predict clinprob, outcome(1)
(95 missing values generated)
This creates a new variable clinprob, which is equal to the model-predicted
probability that each voter in the sample would vote for Clinton, based on
his or her ideology. Similarly:
. predict doleprob, outcome(2)
. predict peroprob, outcome(3)
. su clinprob doleprob peroprob
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
clinprob | 456 .5244956 .2992437 .0710932 .9896997
doleprob | 456 .3829107 .3034397 .0002263 .8973446
peroprob | 456 .0925937 .0380756 .010074 .1347315
What do we do with these? One possibility is to plot these predicted
probabilities as a function of ideology:
. sort selfideo
. gra clinprob selfideo, c(l)
. gra doleprob selfideo, c(l)
. gra peroprob selfideo, c(l)
(See Figures 1, 2, and 3.)
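The same curves can be computed directly from equation (1) and the estimates in Section 2.5. The Python sketch below assumes selfideo runs from 1 to 7. The notes do not state the scale, though these endpoints reproduce the minimum and maximum predicted probabilities in the -su- output above. It sets ideosq to selfideo squared.

```python
import math

# (constant, selfideo, ideosq) estimates; Clinton (outcome 1) is the baseline
EST = {
    "Dole": (-11.71739, 3.550536, -0.2163453),
    "Perot": (-6.118547, 1.659943, -0.1288385),
}


def vote_probs(ideo: float) -> dict:
    """Predicted Pr(Clinton), Pr(Dole), Pr(Perot) at a given selfideo value."""
    weights = {"Clinton": 1.0}  # exp(0) for the baseline category
    for name, (b0, b1, b2) in EST.items():
        weights[name] = math.exp(b0 + b1 * ideo + b2 * ideo ** 2)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


for ideo in range(1, 8):  # assumed 1-7 ideology scale
    p = vote_probs(ideo)
    print(ideo, " ".join(f"{name}={p[name]:.3f}" for name in ("Clinton", "Dole", "Perot")))
```

At the endpoints this gives the extreme values in the -su- output (Pr(Clinton) = 0.990 at selfideo = 1, Pr(Dole) = 0.897 at selfideo = 7), and Pr(Perot) peaks at 0.135 at selfideo = 4.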
Figure 1: Predicted Probabilities for a Clinton Vote
The results conform with what we might expect: liberals voting for Clinton,
conservatives for Dole, and moderates/independents for Perot. Another
thing to do is to generate a predicted vote, based on the maximum of the
probabilities, and compare it to the actual vote:
. gen votehat=.
. replace votehat=1 if clinprob>doleprob & clinprob>peroprob
(266 real changes made)
. replace votehat=2 if clinprob<doleprob & doleprob>peroprob
. replace votehat=3 if peroprob>doleprob & clinprob<peroprob
Figure 2: Predicted Probabilities for a Dole Vote
. tab2 prezvote votehat, col
-> tabulation of prezvote by votehat
1=WC,2=RD,3| votehat
=HRP | 1 2 | Total
-----------+----------------------+----------
1 | 163 34 | 197
| 76.17 20.24 | 51.57
-----------+----------------------+----------
2 | 29 121 | 150
| 13.55 72.02 | 39.27
-----------+----------------------+----------
3 | 22 13 | 35
| 10.28 7.74 | 9.16
-----------+----------------------+----------
Total | 214 168 | 382
| 100.00 100.00 | 100.00
Figure 3: Predicted Probabilities for a Perot Vote
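A couple things about this approach...
1. A null model would pick all Clinton (the modal category), and so would get
$\frac{197}{382} = 51.6\%$ correct. This model gets
$\frac{163+121}{382} = \frac{284}{382} = 74.3\%$ correct, so one could say this is a
proportional reduction in error of $\frac{284-197}{382-197} = \frac{87}{185} = 47\%$.
2. Of the observations predicted to be Clinton votes, 76% actually were (and 72%
of predicted Dole votes actually were Dole votes); put the other way, the model
correctly classifies 163/197 = 83% of actual Clinton voters and 121/150 = 81% of
actual Dole voters, but 0% of the Perot voters, since it never predicts a Perot
vote. This is common in MNL models when some categories have very few positive
outcomes overall.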
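The classification arithmetic above can be reproduced from the cross-tabulation counts. Below is a minimal Python sketch. The nested list simply re-keys the -tab2- output: rows are actual votes, columns are predicted votes, and the Perot column is all zeros because no one is predicted to vote for Perot.

```python
# Rows: actual vote (Clinton, Dole, Perot); columns: predicted vote (same order).
table = [
    [163, 34, 0],
    [29, 121, 0],
    [22, 13, 0],
]

n = sum(sum(row) for row in table)
correct = sum(table[j][j] for j in range(3))
modal = max(sum(row) for row in table)       # null model: predict the modal outcome
pct_correct = correct / n
pct_null = modal / n
pre = (correct - modal) / (n - modal)        # proportional reduction in error
recall = [table[j][j] / sum(table[j]) for j in range(3)]  # share of each actual outcome
print(f"{pct_correct:.1%} correct vs. {pct_null:.1%} for the null; PRE = {pre:.0%}")
print("correctly classified by actual vote:", [f"{r:.0%}" for r in recall])
```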
Finally, you can calculate differences in predicted values associated with
changes in the independent variables.
One way is to simply hand- or computer-calculate the values, based
on the basic MNL probability equation, and then subtract to get the
changes.
In doing this, you may want to create a dummy dataset, along the
lines of what we did for logit/probit, from which to generate graphs.
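A minimal sketch of the dummy-dataset approach in Python, using the Section 2.5 estimates, computes the predicted probabilities at two settings of selfideo and subtracts. The settings here are its assumed minimum and maximum of 1 and 7, with ideosq moving along with it. The choice of endpoints is illustrative; any pair of values, such as the mean plus or minus one standard deviation, works the same way.

```python
import math

# (constant, selfideo, ideosq) estimates from the mlogit output; Clinton is the baseline
EST = {"Dole": (-11.71739, 3.550536, -0.2163453), "Perot": (-6.118547, 1.659943, -0.1288385)}


def vote_probs(ideo: float) -> dict:
    """Equation (1) at a given selfideo value, with ideosq = selfideo ** 2."""
    weights = {"Clinton": 1.0}
    for name, (b0, b1, b2) in EST.items():
        weights[name] = math.exp(b0 + b1 * ideo + b2 * ideo ** 2)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def discrete_change(lo: float, hi: float) -> dict:
    """Change in each predicted probability when selfideo moves from lo to hi."""
    p_lo, p_hi = vote_probs(lo), vote_probs(hi)
    return {name: p_hi[name] - p_lo[name] for name in p_lo}


changes = discrete_change(1, 7)  # assumed minimum and maximum of selfideo
print({name: round(d, 3) for name, d in changes.items()})
```

Because the probabilities always sum to one, the changes across the three outcomes sum to zero. Here, moving from 1 to 7 lowers Pr(Clinton) by about 0.92 and raises Pr(Dole) by about 0.90.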
Long's -change- and -predval- routines in Stata will automatically
calculate predicted values and changes in predicted values associated
with changes in the independent variables (see the routines).
In addition, Clarify also works with the -mlogit- command...
MNL is far from the only model for unordered, categorical dependent variables.
Next week, we'll discuss some others, and do some comparisons...