Notation
The following notation is used throughout this chapter unless otherwise stated:
Y Dichotomous dependent variable
Xi The ith independent variable
l Likelihood function
L Log likelihood function
I Information matrix
Model
The linear logistic model assumes a dichotomous dependent variable $Y$ with mean $\mu$, whose dependence on the $k$ explanatory variables $X = (X_1, \ldots, X_k)$ takes the form

$$\mu = \frac{\exp(\eta)}{1 + \exp(\eta)},$$

with $\eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$.
That is,

$$P(Y = 1) = \frac{\exp(\eta)}{1 + \exp(\eta)},$$

or

$$\ln\left(\frac{\mu}{1 - \mu}\right) = \eta = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k.$$
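As a quick sanity check on the link function, the following Python sketch (illustrative, not part of the original algorithm description) computes $\mu$ from $\eta$ and recovers $\eta$ via the log-odds:

```python
import math

def logistic(eta):
    """Mean mu = exp(eta) / (1 + exp(eta)), written in a numerically
    stable form that avoids overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

def logit(mu):
    """Inverse link: ln(mu / (1 - mu)) recovers eta."""
    return math.log(mu / (1.0 - mu))
```

Applying `logit` after `logistic` returns the original linear predictor, which is exactly the log-odds identity above.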
LOGISTIC REGRESSION 291
Maximum Likelihood Estimates (MLE)
The likelihood function for the $n$ observations, where $c_i$ denotes the case weight for the $i$th case, is

$$l = \prod_{i=1}^{n} \mu_i^{c_i y_i} \left(1 - \mu_i\right)^{c_i (1 - y_i)}$$
where

$$\mu_i = \frac{\exp\left(\eta_i\right)}{1 + \exp\left(\eta_i\right)}$$

and

$$\eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}.$$
The log likelihood function is

$$L = \ln l = \sum_{i=1}^{n} \left[ c_i y_i \ln\left(\mu_i\right) + c_i \left(1 - y_i\right) \ln\left(1 - \mu_i\right) \right]$$

and its derivative with respect to $\beta_j$ is

$$L^*_{X_j} = \frac{\partial L}{\partial \beta_j} = \sum_{i=1}^{n} c_i \left(y_i - \mu_i\right) x_{ij}.$$
The MLEs of the parameters are obtained by solving the equations

$$\sum_{i=1}^{n} c_i \left(y_i - \mu_i\right) x_{ij} = 0, \quad \text{for } j = 0, \ldots, k.$$
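The log likelihood and its gradient can be sketched directly from the two formulas above. This is a minimal, unoptimized Python illustration (rows of `X` are assumed to include the constant 1 when an intercept is present):

```python
import math

def log_likelihood(beta, X, y, c):
    """L = sum_i c_i [ y_i ln(mu_i) + (1 - y_i) ln(1 - mu_i) ],
    with mu_i = logistic(beta . x_i). Not numerically hardened."""
    L = 0.0
    for xi, yi, ci in zip(X, y, c):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        L += ci * (yi * math.log(mu) + (1 - yi) * math.log(1 - mu))
    return L

def score(beta, X, y, c):
    """Gradient: dL/dbeta_j = sum_i c_i (y_i - mu_i) x_ij."""
    g = [0.0] * len(beta)
    for xi, yi, ci in zip(X, y, c):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        for j in range(len(beta)):
            g[j] += ci * (yi - mu) * xi[j]
    return g
```

Setting each component of `score` to zero gives exactly the estimating equations above.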
The information matrix is

$$\mathbf{I} = -E\left[\frac{\partial^2 L}{\partial \beta_i \, \partial \beta_j}\right] = X' C \hat{V} X,$$

where

$$\hat{V} = \mathrm{Diag}\left\{\hat\mu_1\left(1 - \hat\mu_1\right), \ldots, \hat\mu_n\left(1 - \hat\mu_n\right)\right\},$$

$$C = \mathrm{Diag}\left\{c_1, \ldots, c_n\right\},$$

$$\hat\mu_i = \frac{\exp\left(\hat\eta_i\right)}{1 + \exp\left(\hat\eta_i\right)},$$
and
$$\hat\eta_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}.$$
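Newton-Raphson with the information matrix above is the standard way to solve the score equations. For the intercept-only model the matrices collapse to scalars, which gives a compact sketch (an illustration of the update, not the full multivariate fitting routine):

```python
import math

def fit_intercept_only(y, c, tol=1e-10, max_iter=50):
    """Newton-Raphson for the intercept-only logistic model:
    beta0 <- beta0 + U / I, where U = sum_i c_i (y_i - mu) is the score
    and I = mu (1 - mu) sum_i c_i is the (scalar) information."""
    beta0 = 0.0
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + math.exp(-beta0))
        U = sum(ci * (yi - mu) for yi, ci in zip(y, c))
        I = mu * (1.0 - mu) * sum(c)
        step = U / I
        beta0 += step
        if abs(step) < tol:
            break
    return beta0
```

With equal weights the solution is the log-odds of the observed event proportion, matching the closed form given later for the initial model.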
Score Statistic
The score statistic is calculated for each variable not in the model to determine whether the variable should enter the model. Assume that there are $r_1$ variables, namely, $x_1, \ldots, x_{r_1}$, in the model and $r_2$ variables, $w_1, \ldots, w_{r_2}$, not in the model. Based on the MLEs for the parameters in the model, $V$ is estimated by $\hat{V} = \mathrm{Diag}\left\{\hat\mu_1\left(1 - \hat\mu_1\right), \ldots, \hat\mu_n\left(1 - \hat\mu_n\right)\right\}$. If $w_i$ is not categorical, the score statistic for $w_i$ is defined as

$$S_i = \left(L^*_{w_i}\right)^2 B_{22,i}.$$

If $w_i$ is a categorical variable with $m$ categories, it is converted to an $(m-1)$-dimension dummy vector. Denote these new $m-1$ variables as $\tilde{w}_i, \ldots, \tilde{w}_{i+m-2}$. The score statistic for $w_i$ is then

$$S_i = \left(L^*_{\tilde{w}}\right)' B_{22,i} \, L^*_{\tilde{w}}$$

where $L^*_{\tilde{w}} = \left(L^*_{\tilde{w}_i}, \ldots, L^*_{\tilde{w}_{i+m-2}}\right)'$ and the $(m-1) \times (m-1)$ matrix $B_{22,i}$ is

$$B_{22,i} = \left(A_{22,i} - A_{21,i} A_{11}^{-1} A_{12,i}\right)^{-1}$$
with

$$A_{11} = \tilde{u}' \hat{V} \tilde{u}, \qquad A_{12,i} = \tilde{u}' \hat{V} \tilde{w}_i, \qquad A_{22,i} = \tilde{w}_i' \hat{V} \tilde{w}_i,$$

and $A_{21,i} = A_{12,i}'$. If $B_{22,i}$ is not positive definite, the score statistic and residual chi-square statistic are set to zero.
Wald Statistic
The Wald statistic is calculated for the variables in the model to determine whether a variable should be removed. If the variable $X_i$ is not categorical, the Wald statistic is defined by

$$\mathrm{Wald}_i = \frac{\hat\beta_i^2}{\hat\sigma^2_{\hat\beta_i}}.$$

If $X_i$ is a categorical variable, the Wald statistic is

$$\mathrm{Wald}_i = \hat\beta_i' \left[\mathrm{ACOV}\left(\hat\beta_i\right)\right]^{-1} \hat\beta_i$$

where $\hat\beta_i$ is the vector of maximum likelihood estimates associated with $X_i$ and $\mathrm{ACOV}\left(\hat\beta_i\right)$ is its asymptotic covariance matrix.
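Both forms of the statistic are easy to sketch in Python. The function names are illustrative, and the categorical version is written out for a 2-by-2 covariance matrix only, to keep the inverse explicit:

```python
def wald_statistic(beta_hat, se):
    """Wald_i = beta_i^2 / se_i^2 for a single (non-categorical) coefficient."""
    return (beta_hat ** 2) / (se ** 2)

def wald_categorical(beta, acov):
    """Wald = beta' * inv(ACOV) * beta for a 2-vector of dummy coefficients,
    with the 2x2 inverse written out explicitly for illustration."""
    (a, b), (c, d) = acov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    t = [inv[0][0] * beta[0] + inv[0][1] * beta[1],
         inv[1][0] * beta[0] + inv[1][1] * beta[1]]
    return beta[0] * t[0] + beta[1] * t[1]
```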
LR Statistic
The LR statistic is calculated for each variable in the model. It is defined as

$$\mathrm{LR} = -2 \ln\left(\frac{l(\text{reduced})}{l(\text{full})}\right) = -2\left[L(\text{reduced}) - L(\text{full})\right].$$
LR is asymptotically chi-square distributed with degrees of freedom equal to the
difference between the numbers of parameters estimated in the two models.
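In terms of log likelihoods the computation is a one-liner; a small sketch (illustrative naming):

```python
def lr_statistic(L_reduced, L_full):
    """LR = -2 * (L(reduced) - L(full)). Nonnegative, since the full
    model's log likelihood is at least that of the reduced model."""
    return -2.0 * (L_reduced - L_full)
```

The result is referred to a chi-square distribution whose degrees of freedom equal the difference in the number of estimated parameters.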
Conditional Statistic
The conditional statistic is also computed for every variable in the model. The
formula for the conditional statistic is the same as the LR statistic except that the
parameter estimates for each reduced model are conditional estimates, not MLEs.
The conditional estimates are defined as follows. Let $\hat\beta = \left(\hat\beta_1, \ldots, \hat\beta_{r_1}\right)'$ be the MLE for the $r_1$ variables in the model and $C$ be the asymptotic covariance matrix for $\hat\beta$. If variable $x_i$ is removed from the model, the conditional estimate for the parameters left in the model given $\hat\beta$ is

$$\tilde\beta_{(i)} = \hat\beta_{(i)} - c_{12}^{(i)} \left(c_{22}^{(i)}\right)^{-1} \hat\beta_i$$

where $\hat\beta_i$ is the MLE for the parameter(s) associated with $x_i$ and $\hat\beta_{(i)}$ is $\hat\beta$ with $\hat\beta_i$ removed, $c_{12}^{(i)}$ is the covariance between $\hat\beta_{(i)}$ and $\hat\beta_i$, and $c_{22}^{(i)}$ is the covariance of $\hat\beta_i$. Then the conditional statistic is computed by

$$-2\left[L\left(\tilde\beta_{(i)}\right) - L(\text{full})\right]$$

where $L\left(\tilde\beta_{(i)}\right)$ is the log likelihood function evaluated at $\tilde\beta_{(i)}$.
Stepwise Algorithms
Forward Stepwise (FSTEP)
(1) If FSTEP is the first method requested, estimate the parameters and likelihood
function for the initial model. Otherwise, the final model from the previous
method is the initial model for FSTEP. Obtain the necessary information:
MLEs of the parameters for the current model, predicted probabilities $\hat\mu_i$,
likelihood function for the current model, and so on.
(2) Based on the MLEs of the current model, calculate the score statistic for every
variable eligible for inclusion and find its significance.
(3) Choose the variable with the smallest significance. If that significance is less
than the probability for a variable to enter, then go to step 4; otherwise, stop
FSTEP.
(4) Update the current model by adding a new variable. If this results in a model
which has already been evaluated, stop FSTEP.
(5) Calculate LR or Wald statistic or conditional statistic for each variable in the
current model. Then calculate its corresponding significance.
(6) Choose the variable with the largest significance. If that significance is less
than the probability for variable removal, then go back to step 2; otherwise, if
the current model with the variable deleted is the same as a previous model,
stop FSTEP; otherwise, go to the next step.
(7) Modify the current model by removing the variable with the largest
significance from the previous model. Estimate the parameters for the
modified model and go back to step 5.
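The entry/removal loop of steps 1 through 7 can be sketched as follows. Here `entry_sig` and `removal_sig` are hypothetical stand-ins for the score-statistic and removal-statistic significance computations described above; only the control flow of FSTEP is illustrated:

```python
def forward_stepwise(candidates, entry_sig, removal_sig, pin=0.05, pout=0.10):
    """Skeleton of FSTEP. entry_sig(model, var) returns the significance of
    the score statistic for adding var; removal_sig(model, var) returns the
    significance of the removal statistic for var already in the model."""
    model = []
    seen = {frozenset(model)}          # models already evaluated
    while True:
        outside = [v for v in candidates if v not in model]
        if not outside:
            return model
        best = min(outside, key=lambda v: entry_sig(model, v))   # step 3
        if entry_sig(model, best) >= pin:
            return model               # nothing qualifies for entry: stop
        model = model + [best]         # step 4: add the variable
        if frozenset(model) in seen:
            return model               # model already evaluated: stop
        seen.add(frozenset(model))
        while True:                    # steps 5-7: removal loop
            worst = max(model, key=lambda v: removal_sig(model, v))
            if removal_sig(model, worst) < pout:
                break                  # step 6: nothing removable, back to entry
            reduced = [v for v in model if v != worst]
            if frozenset(reduced) in seen:
                return model           # reduced model seen before: stop
            model = reduced            # step 7: remove and re-estimate
            seen.add(frozenset(model))
```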
Backward Stepwise (BSTEP)
(1) Estimate the parameters for the full model, which includes the final model
from the previous method and all eligible variables. Only variables listed on the
BSTEP variable list are eligible for entry and removal. Let the current model
be the full model.
(2) Based on the MLEs of the current model, calculate the LR or Wald statistic or
conditional statistic for every variable in the model and find its significance.
(3) Choose the variable with the largest significance. If that significance is less
than the probability for a variable removal, then go to step 5; otherwise, if the
current model without the variable with the largest significance is the same as
the previous model, stop BSTEP; otherwise, go to the next step.
(4) Modify the current model by removing the variable with the largest
significance from the model. Estimate the parameters for the modified model
and go back to step 2.
(5) Check to see whether any eligible variable is not in the model. If there is none, stop
BSTEP; otherwise, go to the next step.
(6) Based on the MLEs of the current model, calculate the score statistic for every
variable not in the model and find its significance.
(7) Choose the variable with the smallest significance. If that significance is less
than the probability for variable entry, then go to the next step; otherwise, stop
BSTEP.
(8) Add the variable with the smallest significance to the current model. If the
model is not the same as any previous models, estimate the parameters for the
new model and go back to step 2; otherwise, stop BSTEP.
Statistics
Initial Model Information
If $\beta_0$ is included in the model, the predicted probability is estimated as

$$\hat\mu_0 = \frac{\sum_{i=1}^{n} c_i y_i}{W}$$

with $W = \sum_{i=1}^{n} c_i$.
and $\beta_0$ is estimated by

$$\hat\beta_0 = \ln\left(\frac{\hat\mu_0}{1 - \hat\mu_0}\right)$$

with asymptotic variance

$$\hat\sigma^2_{\hat\beta_0} = \frac{1}{W \hat\mu_0 \left(1 - \hat\mu_0\right)}.$$
The log likelihood function is

$$L_0 = W\left[\hat\mu_0 \ln\left(\frac{\hat\mu_0}{1 - \hat\mu_0}\right) + \ln\left(1 - \hat\mu_0\right)\right].$$
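The initial-model quantities above reduce to a few weighted sums; a minimal Python sketch (illustrative naming):

```python
import math

def initial_model(y, c):
    """Weighted initial estimates for the constant-only model:
    mu0 = sum(c*y)/W, beta0 = logit(mu0), var(beta0) = 1/(W*mu0*(1-mu0)),
    and log likelihood L0 = W*[mu0*logit(mu0) + ln(1-mu0)]."""
    W = sum(c)
    mu0 = sum(ci * yi for yi, ci in zip(y, c)) / W
    beta0 = math.log(mu0 / (1 - mu0))
    var_beta0 = 1.0 / (W * mu0 * (1 - mu0))
    L0 = W * (mu0 * math.log(mu0 / (1 - mu0)) + math.log(1 - mu0))
    return mu0, beta0, var_beta0, L0
```

Note that $L_0$ equals the usual log likelihood $\sum_i c_i [y_i \ln \hat\mu_0 + (1-y_i)\ln(1-\hat\mu_0)]$ evaluated at the constant fit.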
Model Information
The −2 log likelihood for the current model is

$$-2 \sum_{i=1}^{n} \left[ c_i y_i \ln\left(\hat\mu_i\right) + c_i \left(1 - y_i\right) \ln\left(1 - \hat\mu_i\right) \right]$$

where

$$\hat\mu_i = \frac{\exp\left(\hat\eta_i\right)}{1 + \exp\left(\hat\eta_i\right)}$$
and
$$\hat\eta_i = \begin{cases} \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} & \text{if } \beta_0 \text{ is included} \\ \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik} & \text{otherwise.} \end{cases}$$
The model chi-square statistic is 2(log likelihood function for the current model − log likelihood function for the initial model). The initial model contains the constant if it is in the current model; otherwise, the initial model has no terms. The degrees of freedom for the model chi-square statistic is equal to the difference between the numbers of parameters estimated in the two models. If the degrees of freedom is zero, the model chi-square is not computed.
The block chi-square statistic is 2(log likelihood function for the current model − log likelihood function for the final model from the previous method). The degrees of freedom for the block chi-square statistic is equal to the difference between the numbers of parameters estimated in the two models.
The improvement chi-square statistic is 2(log likelihood function for the current model − log likelihood function for the model from the last step). The degrees of freedom for the improvement chi-square statistic is equal to the difference between the numbers of parameters estimated in the two models.
The goodness-of-fit statistic is

$$\sum_{i=1}^{n} \frac{c_i \left(y_i - \hat\mu_i\right)^2}{\hat\mu_i \left(1 - \hat\mu_i\right)}$$

where $\hat\mu_i$ is the predicted probability for case $i$ from the current model.
Cox and Snell's $R^2$ (Cox and Snell, 1989; Nagelkerke, 1991) is

$$R^2_{\mathrm{CS}} = 1 - \left(\frac{l(0)}{l(\hat\beta)}\right)^{2/W}$$

where $l(0)$ and $l(\hat\beta)$ are the likelihoods of the initial and current models, respectively. Nagelkerke's adjusted $R^2$ is

$$R^2_N = R^2_{\mathrm{CS}} \,/\, \max\left(R^2_{\mathrm{CS}}\right)$$

where $\max\left(R^2_{\mathrm{CS}}\right) = 1 - \left\{l(0)\right\}^{2/W}$.
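Since the ratio of likelihoods is most safely computed from log likelihoods, the two $R^2$ values can be sketched as follows (illustrative naming; `L0` and `L_model` are log likelihoods):

```python
import math

def cox_snell_nagelkerke(L0, L_model, W):
    """R2_CS = 1 - (l(0)/l(beta))^(2/W) = 1 - exp(2*(L0 - L_model)/W),
    using log likelihoods; R2_N divides by max(R2_CS) = 1 - exp(2*L0/W)."""
    r2_cs = 1.0 - math.exp(2.0 * (L0 - L_model) / W)
    r2_max = 1.0 - math.exp(2.0 * L0 / W)
    return r2_cs, r2_cs / r2_max
```

Working on the log scale avoids underflow of the raw likelihoods for large samples.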
Hosmer-Lemeshow Goodness-of-Fit Statistic
Suppose that there are $Q$ blocks, and the $q$th block has $m_q$ observations, $q = 1, \ldots, Q$. Moreover, suppose that the $k$th group ($k = 1, \ldots, g$) is composed of the $q_1$th, …, $q_k$th blocks of observations. Then the total number of observations in the $k$th group is $s_k = \sum_{j=q_1}^{q_k} m_j$. The total observed frequency of events (that is, $Y = 1$) in the $k$th group, call it $O_{1k}$, is the total number of observations in the $k$th group with $Y = 1$. Let $E_{1k}$ be the total expected frequency of the event (that is, $Y = 1$) in the $k$th group; then $E_{1k}$ is given by $E_{1k} = s_k \xi_k$, where $\xi_k$ is the average predicted event probability for the $k$th group,

$$\xi_k = \sum_{j=q_1}^{q_k} m_j \hat\pi_j \,/\, s_k.$$

The Hosmer-Lemeshow goodness-of-fit statistic is

$$\chi^2_{\mathrm{HL}} = \sum_{k=1}^{g} \frac{\left(O_{1k} - E_{1k}\right)^2}{E_{1k}\left(1 - \xi_k\right)}.$$
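Given the per-group observed counts, expected counts, and average predicted probabilities, the statistic is a single sum; a minimal sketch:

```python
def hosmer_lemeshow(obs, expected, xi):
    """chi2_HL = sum_k (O1k - E1k)^2 / (E1k * (1 - xi_k)), where obs,
    expected, and xi are parallel per-group lists."""
    return sum((o - e) ** 2 / (e * (1.0 - x))
               for o, e, x in zip(obs, expected, xi))
```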
For each of the variables not in the equation, the score statistic is calculated along
with the associated degrees of freedom, significance and partial R. Let X i be a
variable not currently in the model and Si the score statistic. The partial R is
defined by
$$\text{Partial } R = \begin{cases} \sqrt{\dfrac{S_i - 2 \times df}{-2 L(\text{initial})}} & \text{if } S_i > 2 \times df \\[2ex] 0 & \text{otherwise} \end{cases}$$

where $df$ is the degrees of freedom associated with $S_i$, and $L(\text{initial})$ is the log likelihood function for the initial model.
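A direct sketch of this piecewise formula (illustrative naming; `L_initial` is the initial-model log likelihood, a negative number):

```python
import math

def partial_r_score(S, df, L_initial):
    """Partial R for a variable not in the equation:
    sqrt((S - 2*df) / (-2*L(initial))) when S > 2*df, else 0."""
    if S > 2 * df:
        return math.sqrt((S - 2 * df) / (-2.0 * L_initial))
    return 0.0
```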
The residual chi-square printed for the variables not in the equation is defined as

$$\mathrm{RCS} = \left(L^*_w\right)' B_{22} \, L^*_w$$

where $L^*_w = \left(L^*_{w_1}, \ldots, L^*_{w_{r_2}}\right)'$.
For each of the variables in the equation, the MLE of the Beta coefficients is
calculated along with the standard errors, Wald statistics, degrees of freedom,
significances, and partial R. If X i is not a categorical variable currently in the
equation, the partial R is computed as
$$\text{Partial } R = \begin{cases} \mathrm{sign}\left(\hat\beta_i\right) \sqrt{\dfrac{\mathrm{Wald}_i - 2}{-2 L(\text{initial})}} & \text{if } \mathrm{Wald}_i > 2 \\[2ex] 0 & \text{otherwise.} \end{cases}$$
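The signed variant for a coefficient in the equation can be sketched the same way (illustrative naming):

```python
import math

def partial_r_wald(beta_hat, wald, L_initial):
    """sign(beta_i) * sqrt((Wald_i - 2) / (-2*L(initial)))
    when Wald_i > 2, else 0."""
    if wald > 2:
        r = math.sqrt((wald - 2) / (-2.0 * L_initial))
        return r if beta_hat >= 0 else -r
    return 0.0
```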
If $X_i$ is a categorical variable with $m$ categories, the partial $R$ is then

$$\text{Partial } R = \begin{cases} \sqrt{\dfrac{\mathrm{Wald}_i - 2 \times df}{-2 L(\text{initial})}} & \text{if } \mathrm{Wald}_i > 2 \times df \\[2ex] 0 & \text{otherwise} \end{cases}$$

where $df$ is the degrees of freedom associated with $X_i$ (that is, $m - 1$).

Casewise Statistics
(a) The deviance of the $i$th case is

$$G_i = \begin{cases} \sqrt{-2\left[y_i \ln\left(\hat\mu_i\right) + \left(1 - y_i\right) \ln\left(1 - \hat\mu_i\right)\right]} & \text{if } y_i > \hat\mu_i \\[1ex] -\sqrt{-2\left[y_i \ln\left(\hat\mu_i\right) + \left(1 - y_i\right) \ln\left(1 - \hat\mu_i\right)\right]} & \text{otherwise.} \end{cases}$$
(b) The leverage $h_i$ is the $i$th diagonal element of the matrix

$$\hat{V}^{1/2} X \left(X' C \hat{V} X\right)^{-1} X' \hat{V}^{1/2}$$

where

$$\hat{V} = \mathrm{Diag}\left\{\hat\mu_1\left(1 - \hat\mu_1\right), \ldots, \hat\mu_n\left(1 - \hat\mu_n\right)\right\}.$$
The studentized deviance is

$$G_i^* = \frac{G_i}{\sqrt{1 - h_i}}$$

and the normalized residual is

$$z_i = \frac{e_i}{\sqrt{\hat\mu_i \left(1 - \hat\mu_i\right)}}$$

where $e_i = y_i - \hat\mu_i$ is the residual for case $i$.
(g) DFBETA
Let $\Delta\beta$ be the change of the coefficient estimates from the deletion of case $i$. It is computed as

$$\Delta\beta = \frac{\left(X' C \hat{V} X\right)^{-1} X_i' \, e_i}{1 - h_i}.$$

For the unselected cases, the leverage is adjusted to

$$\tilde{h}_i = h_i - \frac{\hat{V}_i h_i^2}{1 + \hat{V}_i h_i}$$

where

$$h_i = \hat{V}_i X_i \left(X' C \hat{V} X\right)^{-1} X_i'.$$

For the unselected cases, the Cook's distance and DFBETA are calculated based on $\tilde{h}_i$.
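For the intercept-only model the matrices again collapse to scalars, which makes the DFBETA formula easy to illustrate. This sketch specializes the general expression under that assumption (it is not the full multivariate computation):

```python
import math

def dfbeta_intercept_only(y, c, beta0):
    """DFBETA specialized to the intercept-only model: the information
    collapses to M = sum_i c_i v (v = mu*(1-mu)), the leverage to
    h_i = v / M, and delta_beta_i = (e_i / M) / (1 - h_i)."""
    mu = 1.0 / (1.0 + math.exp(-beta0))
    v = mu * (1.0 - mu)
    M = sum(ci * v for ci in c)
    out = []
    for yi, ci in zip(y, c):
        h = v / M                  # leverage, equal across cases here
        e = yi - mu                # residual e_i = y_i - mu_i
        out.append((e / M) / (1.0 - h))
    return out
```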