
Maximum likelihood estimation of Heckman's sample selection model

Herman J. Bierens
October 2007
1 Heckman's sample selection model

1.1 Introduction

Heckman's sample selection model$^1$ is based on two latent dependent variables models:
$$Y_1^* = \beta'X + U_1, \qquad (1)$$
$$Y_2^* = \gamma'Z + U_2, \qquad (2)$$
where $X$ and $Z$ are vectors of regressors, possibly containing common components, including intercepts, and the errors $U_1$ and $U_2$ are, conditional on $X$ and $Z$, jointly bivariate normally distributed with zero mean vector and variance matrix $\Sigma$.

The model for $Y_1^*$ is the one we are interested in, but $Y_1^*$ is only observable if $Y_2^* > 0$. Thus the observed dependent variable $Y$ is
$$Y = Y_1^* \quad\text{if } Y_2^* > 0,$$
$$Y = \text{missing value} \quad\text{if } Y_2^* \le 0.$$
However, the $Z$'s are observable if $Y$ is a missing value, and the $X$'s are observable if the $Y$'s are.

$^1$Heckman, James J. (1979): "Sample Selection Bias as a Specification Error," Econometrica 47, 153-161. (Heckman got the Nobel prize for this article.)
The variance matrix $\Sigma$ can be written as
$$\Sigma = \Lambda\Lambda',$$
where $\Lambda$ is an upper-triangular matrix:
$$\Lambda = \begin{pmatrix} \sigma_1 & \sigma_2 \\ 0 & \sigma_3 \end{pmatrix}.$$
Consequently, we can write
$$U_1 = \sigma_1 e_1 + \sigma_2 e_2,$$
$$U_2 = \sigma_3 e_2,$$
where $e_1$ and $e_2$ are independent standard normally distributed. Thus the latent dependent variables models (1) and (2) can be written as
$$Y_1^* = \beta'X + \sigma_1 e_1 + \sigma_2 e_2, \qquad (3)$$
$$Y_2^* = \gamma'Z + \sigma_3 e_2. \qquad (4)$$
Without loss of generality we may assume that $\sigma_1 > 0$, and since only the sign of $Y_2^*$ plays a role, we may set $\sigma_3 = 1$. Then the conditional probability of a missing value of $Y$ is:
$$P[Y_2^* \le 0 \mid Z, X] = P[e_2 \le -\gamma'Z] = 1 - P[e_2 > -\gamma'Z] = 1 - P[e_2 \le \gamma'Z] = 1 - F(\gamma'Z),$$
where $F$ is the distribution function of the standard normal distribution, i.e.,
$$F(x) = \int_{-\infty}^{x} f(u)\,du, \qquad (5)$$
with
$$f(x) = \frac{\exp[-x^2/2]}{\sqrt{2\pi}}. \qquad (6)$$
Thus, from now on I will assume that
$$\sigma_1 > 0, \qquad \sigma_3 = 1.$$
Let $D$ be a dummy variable taking the value 1 if $Y$ is observed, and 0 if not. Then
$$P[D = 1 \mid Z, X] = F(\gamma'Z). \qquad (7)$$
The distribution function of $Y$ conditional on the event $D = 1$ and $X$ and $Z$ is now given by
$$H(y|X,Z) = P[Y \le y \mid D = 1, X, Z] \qquad (8)$$
$$= \frac{P[Y \le y \text{ and } D = 1 \mid X, Z]}{P[D = 1 \mid X, Z]} = \frac{P[Y_1^* \le y \text{ and } Y_2^* > 0 \mid X, Z]}{F(\gamma'Z)}$$
$$= \frac{P[\sigma_2 e_2 \le y - \beta'X - \sigma_1 e_1 \text{ and } -\gamma'Z < e_2 \mid X, Z]}{F(\gamma'Z)}.$$
1.2 The case $\sigma_2 > 0$

In order to evaluate expression (8) further, and derive the corresponding conditional density, assume first that $\sigma_2 > 0$. Then (8) times $F(\gamma'Z)$ becomes
$$F(\gamma'Z)\,H(y|X,Z) = P\left[-\gamma'Z < e_2 \le (y - \beta'X - \sigma_1 e_1)/\sigma_2 \mid X, Z\right]$$
$$= \int_{-\infty}^{\infty} P\left[-\gamma'Z < e_2 \le (y - \beta'X - \sigma_1 u)/\sigma_2 \mid X, Z\right] f(u)\,du$$
$$= \int_{-\infty}^{(y-\beta'X+\sigma_2\gamma'Z)/\sigma_1} \left[F\left((y - \beta'X - \sigma_1 u)/\sigma_2\right) - F(-\gamma'Z)\right] f(u)\,du$$
$$= \int_{-\infty}^{(y-\beta'X+\sigma_2\gamma'Z)/\sigma_1} F\left((y - \beta'X - \sigma_1 u)/\sigma_2\right) f(u)\,du - F(-\gamma'Z)\,F\left((y - \beta'X + \sigma_2\gamma'Z)/\sigma_1\right)$$
$$= \frac{\sigma_2}{\sigma_1}\int_{-\gamma'Z}^{\infty} F(v)\,f\left((y - \beta'X - \sigma_2 v)/\sigma_1\right) dv - F(-\gamma'Z)\,F\left((y - \beta'X + \sigma_2\gamma'Z)/\sigma_1\right)$$
$$= -\int_{-\gamma'Z}^{\infty} F(v)\,\frac{\partial F\left((y - \beta'X - \sigma_2 v)/\sigma_1\right)}{\partial v}\,dv - F(-\gamma'Z)\,F\left((y - \beta'X + \sigma_2\gamma'Z)/\sigma_1\right)$$
$$= -F(v)\,F\left((y - \beta'X - \sigma_2 v)/\sigma_1\right)\Big|_{-\gamma'Z}^{\infty} + \int_{-\gamma'Z}^{\infty} F\left((y - \beta'X - \sigma_2 v)/\sigma_1\right) f(v)\,dv - F(-\gamma'Z)\,F\left((y - \beta'X + \sigma_2\gamma'Z)/\sigma_1\right)$$
$$= \int_{-\gamma'Z}^{\infty} F\left((y - \beta'X - \sigma_2 v)/\sigma_1\right) f(v)\,dv.$$
The fifth equality follows by substituting
$$u = (y - \beta'X - \sigma_2 v)/\sigma_1,$$
and the last two equalities follow from integration by parts.
The corresponding conditional density is now
$$h(y|X,Z) = \frac{\partial H(y|X,Z)}{\partial y} = \frac{1}{\sigma_1 F(\gamma'Z)}\int_{-\gamma'Z}^{\infty} f\left((y - \beta'X - \sigma_2 v)/\sigma_1\right) f(v)\,dv.$$
It can be shown (see Appendix 1) that for the standard normal density $f$,
$$\int_{c}^{\infty} f(a + bx)f(x)\,dx = \frac{f\left(a/\sqrt{b^2+1}\right)}{\sqrt{b^2+1}}\left[1 - F\left(c\sqrt{b^2+1} + ab\big/\sqrt{b^2+1}\right)\right]. \qquad (9)$$
Substituting $c = -\gamma'Z$, $a = (y - \beta'X)/\sigma_1$ and $b = -\sigma_2/\sigma_1$, i.e.,
$$\frac{1}{\sqrt{b^2+1}} = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \qquad \frac{a}{\sqrt{b^2+1}} = \frac{y - \beta'X}{\sqrt{\sigma_1^2 + \sigma_2^2}},$$
$$c\sqrt{b^2+1} + \frac{ab}{\sqrt{b^2+1}} = -\frac{\sigma_2(y - \beta'X) + (\sigma_1^2 + \sigma_2^2)\gamma'Z}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}},$$
it follows therefore that
$$h(y|X,Z) = \frac{f\left((y - \beta'X)/\sqrt{\sigma_1^2 + \sigma_2^2}\right)}{\sqrt{\sigma_1^2 + \sigma_2^2}\,F(\gamma'Z)}\left(1 - F\left(-\frac{\sigma_2(y - \beta'X) + (\sigma_1^2 + \sigma_2^2)\gamma'Z}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}}\right)\right) \qquad (10)$$
$$= \frac{f\left((y - \beta'X)/\sqrt{\sigma_1^2 + \sigma_2^2}\right)}{\sqrt{\sigma_1^2 + \sigma_2^2}\,F(\gamma'Z)}\,F\left(\frac{\sigma_2(y - \beta'X) + (\sigma_1^2 + \sigma_2^2)\gamma'Z}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}}\right).$$
1.3 The case $\sigma_2 < 0$

If $\sigma_2 < 0$ then (8) times $F(\gamma'Z)$ becomes
$$F(\gamma'Z)\,H(y|X,Z) = P\left[\sigma_2 e_2 \le y - \beta'X - \sigma_1 e_1 \text{ and } -\sigma_2\gamma'Z > \sigma_2 e_2 \mid X, Z\right]$$
$$= P\left[\sigma_2 e_2 \le \min\left(y - \beta'X - \sigma_1 e_1,\ |\sigma_2|\gamma'Z\right)\right]$$
$$= P\left[-|\sigma_2| e_2 \le \min\left(y - \beta'X - \sigma_1 e_1,\ |\sigma_2|\gamma'Z\right)\right]$$
$$= P\left[e_2 \ge -\min\left((y - \beta'X - \sigma_1 e_1)/|\sigma_2|,\ \gamma'Z\right)\right]$$
$$= \int_{-\infty}^{\infty} F\left(\min\left((y - \beta'X - \sigma_1 u)/|\sigma_2|,\ \gamma'Z\right)\right) f(u)\,du$$
$$= F(\gamma'Z)\int_{-\infty}^{(y-\beta'X-|\sigma_2|\gamma'Z)/\sigma_1} f(u)\,du + \int_{(y-\beta'X-|\sigma_2|\gamma'Z)/\sigma_1}^{\infty} F\left((y - \beta'X - \sigma_1 u)/|\sigma_2|\right) f(u)\,du$$
$$= F(\gamma'Z)\,F\left((y - \beta'X - |\sigma_2|\gamma'Z)/\sigma_1\right) + \frac{|\sigma_2|}{\sigma_1}\int_{-\infty}^{\gamma'Z} F(v)\,f\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right) dv$$
$$= F(\gamma'Z)\,F\left((y - \beta'X - |\sigma_2|\gamma'Z)/\sigma_1\right) - \int_{-\infty}^{\gamma'Z} F(v)\,\frac{\partial F\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right)}{\partial v}\,dv$$
$$= F(\gamma'Z)\,F\left((y - \beta'X - |\sigma_2|\gamma'Z)/\sigma_1\right) - F(v)\,F\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right)\Big|_{-\infty}^{\gamma'Z} + \int_{-\infty}^{\gamma'Z} F\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right) f(v)\,dv$$
$$= \int_{-\infty}^{\gamma'Z} F\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right) f(v)\,dv.$$
The corresponding conditional density is now
$$h(y|X,Z) = \frac{\partial H(y|X,Z)}{\partial y} = \frac{1}{\sigma_1 F(\gamma'Z)}\int_{-\infty}^{\gamma'Z} f\left((y - \beta'X - |\sigma_2| v)/\sigma_1\right) f(v)\,dv.$$
It can be shown (see Appendix 1) that for the standard normal density $f$,
$$\int_{-\infty}^{c} f(a + bx)f(x)\,dx = \frac{f\left(a/\sqrt{b^2+1}\right)}{\sqrt{b^2+1}}\,F\left(c\sqrt{b^2+1} + ab\big/\sqrt{b^2+1}\right). \qquad (11)$$
Substituting $c = \gamma'Z$, $a = (y - \beta'X)/\sigma_1$ and $b = -|\sigma_2|/\sigma_1$, i.e.,
$$\frac{1}{\sqrt{b^2+1}} = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}}, \qquad \frac{a}{\sqrt{b^2+1}} = \frac{y - \beta'X}{\sqrt{\sigma_1^2 + \sigma_2^2}},$$
$$c\sqrt{b^2+1} + \frac{ab}{\sqrt{b^2+1}} = \frac{\sqrt{\sigma_1^2 + \sigma_2^2}}{\sigma_1}\,\gamma'Z - \frac{|\sigma_2|(y - \beta'X)}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}} = \frac{\sigma_2(y - \beta'X) + (\sigma_1^2 + \sigma_2^2)\gamma'Z}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}},$$
it follows that
$$h(y|X,Z) = \frac{f\left((y - \beta'X)/\sqrt{\sigma_1^2 + \sigma_2^2}\right)}{\sqrt{\sigma_1^2 + \sigma_2^2}\,F(\gamma'Z)}\,F\left(\frac{\sigma_2(y - \beta'X) + (\sigma_1^2 + \sigma_2^2)\gamma'Z}{\sigma_1\sqrt{\sigma_1^2 + \sigma_2^2}}\right),$$
which is the same as in the case $\sigma_2 > 0$.
1.4 The conditional density of the observed Y

Next, substitute
$$\sigma_1 = \sigma\sqrt{1 - \rho^2}, \qquad \sigma_2 = \rho\sigma,$$
which correspond to
$$\Sigma = \Lambda\Lambda' = \begin{pmatrix} \sigma_1^2 + \sigma_2^2 & \sigma_2 \\ \sigma_2 & 1 \end{pmatrix} = \begin{pmatrix} \sigma^2 & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix},$$
where $\sigma^2$ is the variance of $U_1$ and $\rho \in (-1, 1)$ is the correlation between $U_1$ and $U_2$. Then (10) simplifies to:
$$h(y|X, Z, \beta, \gamma, \sigma, \rho) = \frac{f\left((y - \beta'X)/\sigma\right)}{\sigma F(\gamma'Z)}\,F\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right). \qquad (12)$$
The case $\sigma_2 = 0$ corresponds to $\rho = 0$:
$$h(y|X, Z, \beta, \gamma, \sigma, 0) = f\left((y - \beta'X)/\sigma\right)/\sigma,$$
which is just the conditional density of $Y_1^*$.
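The density (12) is easy to evaluate numerically. The sketch below is my own illustration (with $f$ and $F$ taken to be `norm.pdf` and `norm.cdf` from scipy, and arbitrary test values); as a sanity check it verifies that (12) integrates to one in $y$.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def h(y, bX, gZ, sigma, rho):
    """Conditional density (12) of Y given D = 1; bX = beta'X, gZ = gamma'Z."""
    u = (y - bX) / sigma
    return (norm.pdf(u) / (sigma * norm.cdf(gZ))
            * norm.cdf((rho * u + gZ) / np.sqrt(1.0 - rho**2)))

# h(.|X,Z) should integrate to 1 for any admissible parameter values.
total, _ = quad(h, -np.inf, np.inf, args=(0.5, 0.3, 1.2, 0.6))
print(total)  # ~ 1.0
```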
2 Sample selection bias

The conditional expectation corresponding to (12) is
$$E[Y \mid D = 1, X, Z] = \beta'X + \rho\sigma\,\frac{f(\gamma'Z)}{F(\gamma'Z)} \qquad (13)$$
and the conditional variance involved is
$$Var[Y \mid D = 1, X, Z] = \sigma^2 - \rho^2\sigma^2\left(\gamma'Z + \frac{f(\gamma'Z)}{F(\gamma'Z)}\right)\frac{f(\gamma'Z)}{F(\gamma'Z)}. \qquad (14)$$
See Appendix 2. Thus
$$E[Y \mid D = 1, X] = \beta'X + \rho\sigma\,E\left[f(\gamma'Z)/F(\gamma'Z) \mid X\right]. \qquad (15)$$
The second term is the cause of the sample selection bias of the OLS estimator of $\beta$ if $Y$ is regressed on $X$ using the valid observations on $Y$ only.

Note that if $X$ and $Z$ are independent then
$$E\left[f(\gamma'Z)/F(\gamma'Z) \mid X\right] = E\left[f(\gamma'Z)/F(\gamma'Z)\right]$$
is constant, and therefore only affects the intercept.
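The bias in (15) can be exhibited by simulation. In the sketch below (my own; parameter values are arbitrary, and $X$ and $Z$ share a non-constant regressor so the bias is not absorbed by the intercept), OLS of the observed $Y$ on $X$ misses $\beta$, while adding the correction term $f(\gamma'Z)/F(\gamma'Z)$ as an extra regressor recovers it. The true $\gamma$ is used here; a feasible version would use a probit estimate as in Section 4.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 200_000
beta, gamma = np.array([1.0, 0.5]), np.array([0.2, 1.0])
sigma, rho = 1.5, 0.7

W = rng.normal(size=n)                      # regressor common to X and Z
X = np.column_stack([np.ones(n), W])
Z = np.column_stack([np.ones(n), W])
e1, e2 = rng.normal(size=n), rng.normal(size=n)
Y1 = X @ beta + sigma*np.sqrt(1 - rho**2)*e1 + sigma*rho*e2  # sigma1, sigma2
D = (Z @ gamma + e2) > 0                    # selection: Y2* > 0

Xo, Yo, Zo = X[D], Y1[D], Z[D]              # valid observations only
print(np.linalg.lstsq(Xo, Yo, rcond=None)[0])   # biased for beta = (1.0, 0.5)

mills = norm.pdf(Zo @ gamma) / norm.cdf(Zo @ gamma)  # f(g'Z)/F(g'Z)
Xc = np.column_stack([Xo, mills])
print(np.linalg.lstsq(Xc, Yo, rcond=None)[0])   # ~ (1.0, 0.5, rho*sigma)
```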
3 The log-likelihood function and score vector

Let for $j = 1, \ldots, n$, $D_j = 1$ if $Y_j$ is observed, and $D_j = 0$ if not. The regressors $X_j \in \mathbb{R}^k$ are observable if the corresponding $Y_j$ are observable, and the $Z_j \in \mathbb{R}^\ell$ are observable for all $j$. It will be assumed that the data involved is a random sample with non-response for $Y_j$ if $D_j = 0$.

Without loss of generality we may assume that $Y_j = 0$ if $D_j = 0$. The actual dependent variable is now the pair $(D_j, D_j Y_j)$, with joint conditional distribution given by
$$\frac{d}{dy}P[D_j = 1, D_j Y_j \le y \mid X_j, Z_j] = \frac{d}{dy}P[Y_j \le y \mid D_j = 1, X_j, Z_j]\,P[D_j = 1 \mid X_j, Z_j] = h(y|X_j, Z_j, \beta, \gamma, \sigma, \rho)\,F(\gamma'Z_j)$$
and
$$P[D_j = 0, D_j Y_j = 0 \mid X_j, Z_j] = P[D_j = 0 \mid X_j, Z_j] = 1 - F(\gamma'Z_j).$$
Then the log-likelihood takes the form
$$\ln L(\theta) = \sum_{j=1}^{n}(1 - D_j)\ln\left(1 - F(\gamma'Z_j)\right) + \sum_{j=1}^{n} D_j\ln\left(F(\gamma'Z_j)\right) + \sum_{j=1}^{n} D_j\ln\left(h(Y_j|X_j, Z_j, \beta, \gamma, \sigma, \rho)\right), \qquad (16)$$
where
$$\theta = (\beta', \gamma', \sigma, \rho)'. \qquad (17)$$
The corresponding score vector $\partial\ln L(\theta)/\partial\theta$ is
$$\frac{\partial\ln L(\theta)}{\partial\theta} = \sum_{j=1}^{n}\delta_j(\theta), \qquad (18)$$
where
$$\delta_j(\theta) = \left(D_j\,\frac{f(\gamma'Z_j)}{F(\gamma'Z_j)} - (1 - D_j)\,\frac{f(\gamma'Z_j)}{1 - F(\gamma'Z_j)}\right)\begin{pmatrix} 0_k \\ Z_j \\ 0 \\ 0 \end{pmatrix} + D_j\,\frac{\partial\ln\left(h(Y_j|X_j, Z_j, \beta, \gamma, \sigma, \rho)\right)}{\partial\theta}, \qquad (19)$$
with $0_k$ a $k$-vector of zeros. The partial derivative vector in (19) is derived in Appendix 3.

Moreover, recall from maximum likelihood theory that for the true parameter vector $\theta_0$,
$$\bar{H} = \lim_{n\to\infty} E\left[-\frac{1}{n}\,\frac{\partial^2\ln L(\theta)}{\partial\theta\,\partial\theta'}\bigg|_{\theta=\theta_0}\right] = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}E\left[\delta_j(\theta_0)\,\delta_j(\theta_0)'\right],$$
and that under some regularity conditions the maximum likelihood estimator $\hat{\theta}$ of $\theta_0$ satisfies
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \to N\left(0, \bar{H}^{-1}\right)$$
in distribution, where $\bar{H}$ can be consistently estimated by
$$\hat{H} = \frac{1}{n}\sum_{j=1}^{n}\delta_j(\hat{\theta})\,\delta_j(\hat{\theta})'.$$
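The log-likelihood (16), with $h$ given by (12), is a direct transcription into code. The following sketch is my own (the packing of $\theta$ follows (17); `D`, `Y`, `X`, `Z` are numpy arrays, with $Y_j = 0$ where $D_j = 0$ as assumed above). Note that for the observed terms the $\ln F(\gamma'Z_j)$ in (16) cancels against the denominator of (12).

```python
import numpy as np
from scipy.stats import norm

def loglik(theta, D, Y, X, Z):
    """Log-likelihood (16); theta = (beta', gamma', sigma, rho)' as in (17)."""
    k, l = X.shape[1], Z.shape[1]
    beta, gamma = theta[:k], theta[k:k + l]
    sigma, rho = theta[k + l], theta[k + l + 1]
    if sigma <= 0.0 or abs(rho) >= 1.0:
        return -np.inf                        # outside the parameter space
    gZ = Z @ gamma
    ll = np.sum((1 - D) * norm.logcdf(-gZ))   # (1-D_j) ln(1 - F(gamma'Z_j))
    sel = D == 1
    u = (Y[sel] - X[sel] @ beta) / sigma      # (Y_j - beta'X_j)/sigma
    # D_j ln F(gamma'Z_j) + D_j ln h(...) collapses to the three terms below.
    ll += np.sum(norm.logpdf(u) - np.log(sigma)
                 + norm.logcdf((rho * u + gZ[sel]) / np.sqrt(1 - rho**2)))
    return ll
```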
4 Initial parameter estimates

The log-likelihood function (16) is highly nonlinear in the parameters, and even more so are the components of the score vector $\partial\ln L(\theta)/\partial\theta$ (see Appendix 3) and the elements of the Hessian matrix $\partial^2\ln L(\theta)/\partial\theta\,\partial\theta'$. Moreover, the latter matrix may not be negative definite for all values of $\theta$ (at least I could not verify this). Therefore, EasyReg maximizes the log-likelihood function (16) by using the simplex method of Nelder and Mead, which only requires evaluations of (16) itself. However, this method is rather slow, and if the Hessian is not negative definite for all values of $\theta$ one may get stuck in a local optimum. Therefore, it is important to start the simplex iteration from a starting value of $\theta$ already close to the true parameter value $\theta_0$. Such a starting value $\tilde{\theta}$, say, can be derived as follows.

The parameter vector $\gamma$ can be estimated by Probit analysis. Given the Probit estimator $\tilde{\gamma}$, say, the parameter vector $\beta$ and the parameter $\lambda = \rho\sigma$ can be estimated by regressing $Y_j$ on $X_j$ and $f(\tilde{\gamma}'Z_j)/F(\tilde{\gamma}'Z_j)$ for the observations $j$ for which $D_j = 1$, with OLS estimators $\tilde{\beta}$ and $\tilde{\lambda}$, and residuals $\tilde{v}_j$:
$$Y_j = \tilde{\beta}'X_j + \tilde{\lambda}\,f(\tilde{\gamma}'Z_j)/F(\tilde{\gamma}'Z_j) + \tilde{v}_j.$$
Now (14) suggests to estimate $\sigma^2$ by
$$\tilde{\sigma}^2 = \frac{1}{m}\sum_{j=1}^{n} D_j\tilde{v}_j^2 + \tilde{\lambda}^2\left(\frac{1}{m}\sum_{j=1}^{n} D_j\,\frac{\tilde{\gamma}'Z_j\,f(\tilde{\gamma}'Z_j)}{F(\tilde{\gamma}'Z_j)} + \frac{1}{m}\sum_{j=1}^{n} D_j\,\frac{f(\tilde{\gamma}'Z_j)^2}{F(\tilde{\gamma}'Z_j)^2}\right)$$
$$= \frac{1}{m}\sum_{j=1}^{n} D_j\tilde{v}_j^2 + \tilde{\lambda}^2\,\frac{1}{m}\sum_{j=1}^{n} D_j\,\frac{\left(\tilde{\gamma}'Z_j\,F(\tilde{\gamma}'Z_j) + f(\tilde{\gamma}'Z_j)\right)f(\tilde{\gamma}'Z_j)}{F(\tilde{\gamma}'Z_j)^2},$$
where
$$m = \sum_{j=1}^{n} D_j.$$
Note that $\tilde{\sigma}^2 \ge \frac{1}{m}\sum_{j=1}^{n} D_j\tilde{v}_j^2 \ge 0$, because
$$\inf_u\left[uF(u) + f(u)\right] = \lim_{u\to-\infty}\left[uF(u) + f(u)\right] = 0.$$
Finally, $\rho$ can be estimated by
$$\tilde{\rho} = \tilde{\lambda}\big/\sqrt{\tilde{\sigma}^2}.$$
Let $\tilde{\theta} = (\tilde{\beta}', \tilde{\gamma}', \tilde{\sigma}, \tilde{\rho})'$. Under some regularity conditions (one of them is that $m/n \to p \in (0,1)$ as $n \to \infty$) it can be shown that
$$\tilde{\theta} - \theta_0 = O_p\left(1/\sqrt{n}\right),$$
where $\theta_0$ is the true parameter vector.
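This whole recipe can be sketched in a few lines (my own illustration, not EasyReg's actual code; it reuses the `loglik` function sketched in Section 3): a probit step for $\tilde{\gamma}$, the Mills-ratio regression for $\tilde{\beta}$ and $\tilde{\lambda}$, the variance correction for $\tilde{\sigma}^2$, and finally Nelder-Mead on (16) starting from $\tilde{\theta}$.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def probit_loglik(g, D, Z):
    gZ = Z @ g
    return np.sum(D * norm.logcdf(gZ) + (1 - D) * norm.logcdf(-gZ))

def start_values(D, Y, X, Z):
    sel = D == 1
    # Step 1: probit estimate of gamma.
    g0 = np.zeros(Z.shape[1])
    gt = minimize(lambda g: -probit_loglik(g, D, Z), g0, method="BFGS").x
    # Step 2: OLS of Y_j on X_j and f(g'Z_j)/F(g'Z_j) over D_j = 1.
    mills = norm.pdf(Z[sel] @ gt) / norm.cdf(Z[sel] @ gt)
    A = np.column_stack([X[sel], mills])
    coef = np.linalg.lstsq(A, Y[sel], rcond=None)[0]
    bt, lt = coef[:-1], coef[-1]
    v = Y[sel] - A @ coef                     # residuals v_j
    # Step 3: the variance correction suggested by (14).
    gZ = Z[sel] @ gt
    s2 = np.mean(v**2) + lt**2 * np.mean((gZ + mills) * mills)
    return np.concatenate([bt, gt, [np.sqrt(s2), lt / np.sqrt(s2)]])

# Step 4: simplex (Nelder-Mead) maximization of (16) from theta-tilde:
# theta0 = start_values(D, Y, X, Z)
# fit = minimize(lambda th: -loglik(th, D, Y, X, Z), theta0,
#                method="Nelder-Mead")
```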
5 Appendix 1: Products of normal densities

Let $f(x)$ be the standard normal density. Then
$$f(a + bx)f(x) = \frac{\exp\left[-\frac{1}{2}(a + bx)^2 - \frac{1}{2}x^2\right]}{2\pi} = \frac{\exp\left[-\frac{1}{2}\left(a^2 + 2abx + (1 + b^2)x^2\right)\right]}{2\pi}$$
$$= \frac{\exp\left[-\frac{1}{2}\left(\frac{a^2}{1+b^2} + 2\frac{ab}{1+b^2}x + x^2\right)(1 + b^2)\right]}{2\pi}$$
$$= \frac{\exp\left[-\frac{1}{2}\left(\left(\frac{ab}{1+b^2}\right)^2 + 2\frac{ab}{1+b^2}x + x^2\right)(1 + b^2)\right]}{2\pi}\exp\left[-\frac{1}{2}\left(\frac{a^2}{1+b^2} - \left(\frac{ab}{1+b^2}\right)^2\right)(1 + b^2)\right]$$
$$= \frac{\exp\left[-\frac{1}{2}\left(x + \frac{ab}{1+b^2}\right)^2\Big/\frac{1}{1+b^2}\right]}{\frac{1}{\sqrt{1+b^2}}\sqrt{2\pi}}\cdot\frac{\exp\left[-\frac{1}{2}\,\frac{a^2}{1+b^2}\right]}{\sqrt{1+b^2}\,\sqrt{2\pi}}$$
$$= f\left(\left(x + \frac{ab}{1+b^2}\right)\Big/\frac{1}{\sqrt{1+b^2}}\right) f\left(a\big/\sqrt{1+b^2}\right).$$
Hence:
$$\int_{-\infty}^{c} f(a + bx)f(x)\,dx \qquad (20)$$
$$= \int_{-\infty}^{c} f\left(\left(x + \frac{ab}{1+b^2}\right)\Big/\frac{1}{\sqrt{1+b^2}}\right) dx\ f\left(a\big/\sqrt{1+b^2}\right)$$
$$= \int_{-\infty}^{c + ab/(1+b^2)} f\left(u\sqrt{1+b^2}\right) du\ f\left(a\big/\sqrt{1+b^2}\right)$$
$$= \int_{-\infty}^{c + ab/(1+b^2)} f\left(u\sqrt{1+b^2}\right)\sqrt{1+b^2}\,du\ \frac{f\left(a/\sqrt{1+b^2}\right)}{\sqrt{1+b^2}}$$
$$= F\left(c\sqrt{1+b^2} + \frac{ab}{\sqrt{1+b^2}}\right)\frac{f\left(a/\sqrt{1+b^2}\right)}{\sqrt{1+b^2}}.$$
This result proves (11).

Setting $c = \infty$ in (20) it follows that
$$\int_{-\infty}^{\infty} f(a + bx)f(x)\,dx = \frac{f\left(a/\sqrt{1+b^2}\right)}{\sqrt{1+b^2}}, \qquad (21)$$
hence
$$\int_{c}^{\infty} f(a + bx)f(x)\,dx = \int_{-\infty}^{\infty} f(a + bx)f(x)\,dx - \int_{-\infty}^{c} f(a + bx)f(x)\,dx$$
$$= \left[1 - F\left(c\sqrt{1+b^2} + \frac{ab}{\sqrt{1+b^2}}\right)\right]\frac{f\left(a/\sqrt{1+b^2}\right)}{\sqrt{1+b^2}}.$$
This result proves (9).
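Both closed forms are easy to confirm by numerical quadrature; a small check of my own, with arbitrary test values of $a$, $b$, $c$:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

a, b, c = 0.7, -1.3, 0.4
s = np.sqrt(1 + b**2)

lhs, _ = quad(lambda x: norm.pdf(a + b*x) * norm.pdf(x), -np.inf, c)
rhs = norm.cdf(c*s + a*b/s) * norm.pdf(a/s) / s
print(lhs, rhs)    # equal up to quadrature error: results (20)/(11)

lhs2, _ = quad(lambda x: norm.pdf(a + b*x) * norm.pdf(x), c, np.inf)
rhs2 = (1 - norm.cdf(c*s + a*b/s)) * norm.pdf(a/s) / s
print(lhs2, rhs2)  # result (9)
```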
6 Appendix 2: The conditional moment generating function and its derivatives

In order to derive the conditional expectation $E[Y|D=1,X,Z]$ and the conditional variance $Var[Y|D=1,X,Z]$ we now compute the moment generating function of the conditional density $h(y|X,Z,\beta,\gamma,\sigma,\rho)$:
$$m(t|X,Z,\beta,\gamma,\sigma,\rho) = \int_{-\infty}^{\infty}\exp(ty)\,h(y|X,Z,\beta,\gamma,\sigma,\rho)\,dy$$
$$= \int_{-\infty}^{\infty}\exp(ty)\,\frac{f\left((y - \beta'X)/\sigma\right)}{\sigma F(\gamma'Z)}\,F\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)dy$$
$$= \frac{\exp(t\beta'X)}{F(\gamma'Z)}\int_{-\infty}^{\infty}\exp(t\sigma u)\,f(u)\,F\left(\frac{\rho u + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du$$
$$= \frac{\exp(t\beta'X + \sigma^2 t^2/2)}{F(\gamma'Z)}\int_{-\infty}^{\infty} f(u - t\sigma)\,F\left(\frac{\rho u + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du$$
$$= \frac{\exp(t\beta'X + \sigma^2 t^2/2)}{F(\gamma'Z)}\int_{-\infty}^{\infty} f(u)\,F\left(\frac{\rho u + \rho t\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du.$$
The fourth equality follows from
$$\exp(t\sigma u)\,f(u) = \frac{\exp\left[-\left(t^2\sigma^2 - 2t\sigma u + u^2\right)/2\right]}{\sqrt{2\pi}}\exp\left(t^2\sigma^2/2\right) = \exp\left(t^2\sigma^2/2\right)f(u - t\sigma).$$
Thus,
$$\frac{\partial m(t|X,Z,\beta,\gamma,\sigma,\rho)}{\partial t} \qquad (22)$$
$$= (\beta'X + t\sigma^2)\,\frac{\exp(t\beta'X + \sigma^2 t^2/2)}{F(\gamma'Z)}\int_{-\infty}^{\infty} f(u)\,F\left(\frac{\rho u + \rho t\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du$$
$$+ \frac{\exp(t\beta'X + \sigma^2 t^2/2)}{F(\gamma'Z)}\,\frac{\rho\sigma}{\sqrt{1 - \rho^2}}\int_{-\infty}^{\infty} f(u)\,f\left(\frac{\rho u + \rho t\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du$$
$$= (\beta'X + t\sigma^2)\,m(t|X,Z,\beta,\gamma,\sigma,\rho) + \rho\sigma\exp(t\beta'X + \sigma^2 t^2/2)\,\frac{f(\rho t\sigma + \gamma'Z)}{F(\gamma'Z)}.$$
The last equality follows from (20) with $c = \infty$, $a = (\rho t\sigma + \gamma'Z)/\sqrt{1 - \rho^2}$, $b = \rho/\sqrt{1 - \rho^2}$:
$$\int_{-\infty}^{\infty} f(u)\,f\left(\frac{\rho u + \rho t\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)du = \sqrt{1 - \rho^2}\,f(\rho t\sigma + \gamma'Z).$$
Moreover, it follows from (22) and the easy equality $f'(u) = -uf(u)$ that
$$\frac{\partial^2 m(t|X,Z,\beta,\gamma,\sigma,\rho)}{(\partial t)^2} \qquad (23)$$
$$= \sigma^2 m(t|X,Z,\beta,\gamma,\sigma,\rho) + (\beta'X + t\sigma^2)\,\frac{\partial m(t|X,Z,\beta,\gamma,\sigma,\rho)}{\partial t}$$
$$+ (\beta'X + t\sigma^2)\,\rho\sigma\exp(t\beta'X + \sigma^2 t^2/2)\,\frac{f(\rho t\sigma + \gamma'Z)}{F(\gamma'Z)}$$
$$- \rho^2\sigma^2(\rho t\sigma + \gamma'Z)\exp(t\beta'X + \sigma^2 t^2/2)\,\frac{f(\rho t\sigma + \gamma'Z)}{F(\gamma'Z)}.$$
Hence
$$E[Y|D=1,X,Z] = \frac{\partial m(t|X,Z,\beta,\gamma,\sigma,\rho)}{\partial t}\bigg|_{t=0} = \beta'X + \rho\sigma\,\frac{f(\gamma'Z)}{F(\gamma'Z)},$$
$$E\left[Y^2|D=1,X,Z\right] = \frac{\partial^2 m(t|X,Z,\beta,\gamma,\sigma,\rho)}{(\partial t)^2}\bigg|_{t=0}$$
$$= \sigma^2 + \left(\beta'X + \rho\sigma\,\frac{f(\gamma'Z)}{F(\gamma'Z)}\right)\beta'X + \rho\sigma\beta'X\,\frac{f(\gamma'Z)}{F(\gamma'Z)} - \rho^2\sigma^2(\gamma'Z)\,\frac{f(\gamma'Z)}{F(\gamma'Z)}$$
$$= \sigma^2 + (\beta'X)^2 + 2\rho\sigma\beta'X\,\frac{f(\gamma'Z)}{F(\gamma'Z)} - \rho^2\sigma^2(\gamma'Z)\,\frac{f(\gamma'Z)}{F(\gamma'Z)}$$
$$= \sigma^2 + \left(\beta'X + \rho\sigma\,\frac{f(\gamma'Z)}{F(\gamma'Z)}\right)^2 - \rho^2\sigma^2(\gamma'Z)\,\frac{f(\gamma'Z)}{F(\gamma'Z)} - \rho^2\sigma^2\,\frac{f(\gamma'Z)^2}{F(\gamma'Z)^2},$$
and thus
$$Var[Y|D=1,X,Z] = E\left[Y^2|D=1,X,Z\right] - \left(E[Y|D=1,X,Z]\right)^2 = \sigma^2 - \rho^2\sigma^2(\gamma'Z)\,\frac{f(\gamma'Z)}{F(\gamma'Z)} - \rho^2\sigma^2\,\frac{f(\gamma'Z)^2}{F(\gamma'Z)^2}.$$
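These two moments can be checked against a brute-force simulation of the selected subpopulation; a sketch of my own with arbitrary values of $\beta'X$, $\gamma'Z$, $\sigma$ and $\rho$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
bX, gZ, sigma, rho = 0.5, 0.3, 1.5, 0.6

e1, e2 = rng.normal(size=(2, 2_000_000))
Y = bX + sigma*np.sqrt(1 - rho**2)*e1 + sigma*rho*e2  # Y1* via sigma1, sigma2
Ysel = Y[e2 > -gZ]                                    # condition on D = 1

mills = norm.pdf(gZ) / norm.cdf(gZ)
print(Ysel.mean(), bX + rho*sigma*mills)                          # (13)
print(Ysel.var(), sigma**2 - rho**2*sigma**2*(gZ + mills)*mills)  # (14)
```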
7 Appendix 3: The score vector

In order to derive the score vector $\partial\ln L(\theta)/\partial\theta$, I will derive the first-order partial derivatives of
$$\ln h(y|X,Z,\beta,\gamma,\sigma,\rho) = \ln f\left((y - \beta'X)/\sigma\right) + \ln F\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right) - \ln F(\gamma'Z) - \ln\sigma. \qquad (24)$$
Using the easy equality $f'(u) = -uf(u)$, it follows from (24) that
$$\frac{\partial\ln h(y|X,Z,\beta,\gamma,\sigma,\rho)}{\partial\beta} = -\left((y - \beta'X)/\sigma\right)\frac{\partial\left((y - \beta'X)/\sigma\right)}{\partial\beta} + \frac{f\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)}{F\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)}\,\frac{\partial}{\partial\beta}\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)$$
$$= \frac{1}{\sigma}\left(\frac{y - \beta'X}{\sigma} - \frac{f\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)}{F\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)}\,\frac{\rho}{\sqrt{1 - \rho^2}}\right)X$$
$$= \frac{1}{\sigma}\left(\frac{y - \beta'X}{\sigma} - g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{\rho}{\sqrt{1 - \rho^2}}\right)X,$$
where
$$g(u) = \frac{f(u)}{F(u)}. \qquad (25)$$
Moreover, using the notation (25) it is easy to verify from (24) that
$$\frac{\partial\ln h(y|X,Z,\beta,\gamma,\sigma,\rho)}{\partial\gamma} = \left[\frac{1}{\sqrt{1 - \rho^2}}\,g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right) - g(\gamma'Z)\right]Z,$$
$$\frac{\partial\ln h(y|X,Z,\beta,\gamma,\sigma,\rho)}{\partial\rho} = g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{\partial}{\partial\rho}\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)$$
$$= g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{(y - \beta'X)/\sigma}{\sqrt{1 - \rho^2}} + g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{\rho\left(\rho(y - \beta'X)/\sigma + \gamma'Z\right)}{(1 - \rho^2)\sqrt{1 - \rho^2}}$$
$$= g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{(y - \beta'X)/\sigma + \rho\gamma'Z}{(1 - \rho^2)\sqrt{1 - \rho^2}}$$
and
$$\frac{\partial\ln h(y|X,Z,\beta,\gamma,\sigma,\rho)}{\partial\sigma} = -\left((y - \beta'X)/\sigma\right)\frac{\partial\left((y - \beta'X)/\sigma\right)}{\partial\sigma} + g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\frac{\partial}{\partial\sigma}\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right) - \frac{1}{\sigma}$$
$$= \frac{1}{\sigma}\left[\left((y - \beta'X)/\sigma\right)^2 - 1 - \frac{\rho}{\sqrt{1 - \rho^2}}\left((y - \beta'X)/\sigma\right) g\left(\frac{\rho(y - \beta'X)/\sigma + \gamma'Z}{\sqrt{1 - \rho^2}}\right)\right].$$
Given these partial derivatives, the results (18) and (19) follow.
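As a final check, the analytic derivatives above can be compared with numerical differentiation of $\ln h$; a sketch of my own for the two scalar parameters, at arbitrary point values:

```python
import numpy as np
from scipy.stats import norm

def lnh(y, bX, gZ, sigma, rho):
    """ln h from (24); bX = beta'X, gZ = gamma'Z."""
    u = (y - bX) / sigma
    A = (rho * u + gZ) / np.sqrt(1 - rho**2)
    return norm.logpdf(u) + norm.logcdf(A) - norm.logcdf(gZ) - np.log(sigma)

def g(u):                        # g(u) = f(u)/F(u), notation (25)
    return norm.pdf(u) / norm.cdf(u)

y, bX, gZ, sigma, rho = 1.1, 0.4, 0.3, 1.5, 0.6
u = (y - bX) / sigma
A = (rho * u + gZ) / np.sqrt(1 - rho**2)

d_sigma = (u**2 - 1 - rho / np.sqrt(1 - rho**2) * u * g(A)) / sigma
d_rho = g(A) * (u + rho * gZ) / (1 - rho**2)**1.5

eps = 1e-6
num_s = (lnh(y, bX, gZ, sigma + eps, rho)
         - lnh(y, bX, gZ, sigma - eps, rho)) / (2 * eps)
num_r = (lnh(y, bX, gZ, sigma, rho + eps)
         - lnh(y, bX, gZ, sigma, rho - eps)) / (2 * eps)
print(d_sigma, num_s)   # should agree to ~1e-8
print(d_rho, num_r)
```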