Christophe Hurlin
University of Orléans
December 9, 2013
Christophe Hurlin (University of Orléans), Advanced Econometrics, HEC Lausanne, December 9, 2013
Section 1
Introduction
Section 2
2. The Principle of Maximum Likelihood
Objectives
In this section, we present a simple example in order
Example
Suppose that $X_1, X_2, \ldots, X_N$ are i.i.d. discrete random variables, such that $X_i \sim \mathrm{Pois}(\theta)$ with a pmf (probability mass function) defined as:
$$\Pr(X_i = x_i) = \frac{\exp(-\theta)\,\theta^{x_i}}{x_i!}$$
where $\theta$ is an unknown parameter to estimate.
Since the variables $X_i$ are i.i.d., the joint probability of observing the sample is equal to the product of the marginal probabilities:
$$\Pr\left((X_1 = x_1) \cap \ldots \cap (X_N = x_N)\right) = \prod_{i=1}^{N} \Pr(X_i = x_i)$$
Definition
This joint probability is a function of $\theta$ (the unknown parameter) and corresponds to the likelihood of the sample $\{x_1, .., x_N\}$, denoted by $L_N(\theta; x_1, .., x_N)$, with
$$L_N(\theta; x_1, .., x_N) = \exp(-\theta N)\,\theta^{\sum_{i=1}^{N} x_i}\,\frac{1}{\prod_{i=1}^{N} x_i!}$$
Example
Let us assume that for $N = 10$, we have a realization of the sample equal to $\{5, 0, 1, 1, 0, 3, 2, 3, 4, 1\}$, then:
$$L_N(\theta; x_1, .., x_N) = \frac{e^{-10\theta}\,\theta^{20}}{207{,}360}$$
This figure plots the function $L_N(\theta; x)$ for various values of $\theta$. It has a single mode at $\theta = 2$, which would be the maximum likelihood estimate, or MLE, of $\theta$.
[Figure: likelihood $L_N(\theta; x)$ against $\theta \in [0, 4]$; vertical axis in units of $10^{-8}$.]
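As a quick numerical check of this example, the following Python sketch evaluates the Poisson likelihood of the sample on a grid and locates its mode. The function name `poisson_likelihood` and the grid step of 0.01 are illustrative choices, not part of the original slides.

```python
import math

sample = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]

def poisson_likelihood(theta, xs):
    """L_N(theta; x) = exp(-theta * N) * theta**sum(x) / prod(x_i!)."""
    denom = math.prod(math.factorial(x) for x in xs)
    return math.exp(-theta * len(xs)) * theta ** sum(xs) / denom

# Evaluate the likelihood on a grid of theta values in (0, 4] (step 0.01).
grid = [k / 100 for k in range(1, 401)]
theta_mode = max(grid, key=lambda t: poisson_likelihood(t, sample))
print(theta_mode, poisson_likelihood(theta_mode, sample))
```

The grid maximiser coincides with the mode at $\theta = 2$ read off the figure.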
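$$\frac{\partial \ln L_N(\theta; x_1, .., x_N)}{\partial \theta} = -N + \frac{1}{\theta}\sum_{i=1}^{N} x_i$$
$$\frac{\partial^2 \ln L_N(\theta; x_1, .., x_N)}{\partial \theta^2} = -\frac{1}{\theta^2}\sum_{i=1}^{N} x_i < 0$$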
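$$\hat{\theta} = \arg\max_{\theta \in \mathbb{R}^+} \ln L_N(\theta; x_1, .., x_N)$$
$$\text{FOC}: \left.\frac{\partial \ln L_N(\theta; x_1, .., x_N)}{\partial \theta}\right|_{\hat{\theta}} = -N + \frac{1}{\hat{\theta}}\sum_{i=1}^{N} x_i = 0 \iff \hat{\theta} = (1/N)\sum_{i=1}^{N} x_i$$
$$\text{SOC}: \left.\frac{\partial^2 \ln L_N(\theta; x_1, .., x_N)}{\partial \theta^2}\right|_{\hat{\theta}} = -\frac{1}{\hat{\theta}^2}\sum_{i=1}^{N} x_i < 0$$
$\hat{\theta}$ is a maximum.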
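The first and second order conditions can be checked on the sample used in the Poisson example. This is a minimal sketch; the helper names `score` and `second_derivative` are hypothetical.

```python
sample = [5, 0, 1, 1, 0, 3, 2, 3, 4, 1]
N = len(sample)

def score(theta):
    """First derivative of ln L_N: -N + (1/theta) * sum(x_i)."""
    return -N + sum(sample) / theta

def second_derivative(theta):
    """Second derivative of ln L_N: -(1/theta**2) * sum(x_i)."""
    return -sum(sample) / theta ** 2

theta_hat = sum(sample) / N  # closed-form solution of the FOC (sample mean)
print(theta_hat, score(theta_hat), second_derivative(theta_hat))
```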
Continuous variables
The reference to the probability of observing the given sample is not
exact in a continuous distribution, since a particular sample has
probability zero. Nonetheless, the principle is the same.
If the random variables $\{X_1, X_2, .., X_N\}$ are i.i.d. then we have:
$$L_N(\theta; x_1, .., x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$
Section 3
3. The Likelihood Function
Objectives
Notations
The realisation of $\{X_1, .., X_N\}$ (the data set) is denoted $\{x_1, .., x_N\}$ or $x$ for simplicity.
with $K = 2$ and
$$\theta = \begin{pmatrix} m \\ \sigma^2 \end{pmatrix}$$
$$L_N : \Theta \times \mathbb{R}^N \to \mathbb{R}^+$$
$$(\theta; x_1, .., x_N) \mapsto L_N(\theta; x_1, .., x_N) = \prod_{i=1}^{N} f_X(x_i; \theta)$$
$$\ell_N : \Theta \times \mathbb{R}^N \to \mathbb{R}$$
$$(\theta; x_1, .., x_N) \mapsto \ell_N(\theta; x_1, .., x_N) = \sum_{i=1}^{N} \ln f_X(x_i; \theta)$$
Notations: In the rest of the chapter, I will use the following alternative
notations:
$$L_N(\theta; x) \equiv L(\theta; x_1, .., x_N) \equiv L_N(\theta)$$
$$\ell_N(\theta; x) \equiv \ln L_N(\theta; x) \equiv \ln L(\theta; x_1, .., x_N) \equiv \ln L_N(\theta)$$
$$\ell_N(\theta; y) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - m)^2$$
$$\ell_i(\theta; x) = \ln f_X(x_i; \theta) \quad \text{with} \quad \ell_N(\theta; x) = \sum_{i=1}^{N} \ell_i(\theta; x)$$
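$$L_i(\theta; d_i) = f_D(d_i; \theta) = \frac{1}{\theta}\exp\left(-\frac{d_i}{\theta}\right)$$
$$\ell_i(\theta; d_i) = \ln\left(f_D(d_i; \theta)\right) = -\ln(\theta) - \frac{d_i}{\theta}$$
Then we have:
$$L_N(\theta; d) = \theta^{-N}\exp\left(-\frac{1}{\theta}\sum_{i=1}^{N} d_i\right)$$
$$\ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^{N} d_i$$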
$$y = g(x; \theta) + \varepsilon$$
$$Y \mid X \sim \mathcal{D}(\theta)$$
Notations (model)
Example (Linear Regression Model)
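Consider the following linear regression model:
$$y_i = X_i^\top \beta + \varepsilon_i$$
$$Y_i \mid x_i \sim \mathcal{N}\left(x_i^\top \beta, \sigma^2\right)$$
$$L_i(\theta; y \mid x) = f_{Y|x}(y_i \mid x_i; \theta) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{\left(y_i - x_i^\top \beta\right)^2}{2\sigma^2}\right)$$
where $\theta = (\beta^\top \ \sigma^2)^\top$ is a $(K + 1) \times 1$ vector.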
$$\ell_N(\theta; y \mid x) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - x_i^\top \beta\right)^2$$
$$\Pr(Y_i = 1 \mid X_i = x_i) = F\left(x_i^\top \beta\right)$$
Remark: Given the choice of the link function F (.) we get a probit or a
logit model.
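$$\ell_N(\theta; y \mid x) = \sum_{i=1}^{N} y_i \ln\left[F\left(x_i^\top \beta\right)\right] + \sum_{i=1}^{N} (1 - y_i)\ln\left[1 - F\left(x_i^\top \beta\right)\right]$$
$$= \sum_{i:\, y_i = 1} \ln F\left(x_i^\top \beta\right) + \sum_{i:\, y_i = 0} \ln\left[1 - F\left(x_i^\top \beta\right)\right]$$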
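The binary-choice log-likelihood can be sketched in Python with the link function $F$ passed as an argument, so that the same code covers the logit and probit cases. The names `binary_loglik`, `logit_cdf` and `probit_cdf`, and the toy data, are assumptions made for illustration.

```python
import math

def logit_cdf(z):
    """Logistic cdf, the link function of the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    """Standard normal cdf, the link function of the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binary_loglik(beta, ys, xs, F):
    """Sum over i of y_i ln F(x_i' beta) + (1 - y_i) ln[1 - F(x_i' beta)]."""
    total = 0.0
    for y, x in zip(ys, xs):
        p = F(sum(b * v for b, v in zip(beta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy data: x_i = (1, z_i), i.e. a constant and one regressor.
ys = [1, 0, 1]
xs = [(1.0, 0.2), (1.0, -0.5), (1.0, 1.3)]
ll_logit = binary_loglik((0.0, 0.0), ys, xs, logit_cdf)
ll_probit = binary_loglik((0.0, 0.0), ys, xs, probit_cdf)
```

At $\beta = 0$ both links give $F(0) = 1/2$, so the two log-likelihoods equal $N \ln(1/2)$.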
Key Concepts
Section 4
4. Maximum Likelihood Estimator
Objectives
Definition (Identification)
The parameter vector $\theta$ is identified (estimable) if for any other parameter vector, $\theta^* \neq \theta$, for some data $y$, we have
$$L_N(\theta; y) \neq L_N(\theta^*; y)$$
Example
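Let us consider a latent (continuous and unobservable) variable $Y_i^*$ such that:
$$Y_i^* = X_i^\top \beta + \varepsilon_i$$
with $\beta = (\beta_1 .. \beta_K)^\top$, $X_i = (X_{i1} ... X_{iK})^\top$ and where the error term $\varepsilon_i$ is i.i.d. such that $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$. The distribution of $\varepsilon_i$ is symmetric around 0 and we denote by $G(.)$ the cdf of the standardized error term $\varepsilon_i / \sigma$. We assume that this cdf does not depend on $\sigma$ or $\beta$. Example: $\varepsilon_i / \sigma \sim \mathcal{N}(0, 1)$.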
Example (cont'd)
We observe a dichotomous variable $Y_i$ such that:
$$Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$
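Solution:
To answer this question we have to compute the (log-)likelihood of the sample of observed data $\{y_i, x_i\}_{i=1}^{N}$. We have:
$$\Pr(Y_i = 1 \mid X_i = x_i) = \Pr(Y_i^* > 0 \mid X_i = x_i)$$
$$= \Pr\left(\varepsilon_i > -x_i^\top \beta\right)$$
$$= 1 - \Pr\left(\varepsilon_i \leq -x_i^\top \beta\right)$$
$$= 1 - \Pr\left(\frac{\varepsilon_i}{\sigma} \leq -x_i^\top \frac{\beta}{\sigma}\right)$$
If we denote by $G(.)$ the cdf associated to the distribution of $\varepsilon_i / \sigma$, since this distribution is symmetric around 0, then we have:
$$\Pr(Y_i = 1 \mid X_i = x_i) = G\left(x_i^\top \frac{\beta}{\sigma}\right)$$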
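Solution (cont'd):
For $\theta = (\beta^\top \ \sigma^2)^\top$, we have
$$\ell_N(\theta; y \mid x) = \sum_{i=1}^{N} y_i \ln\left[G\left(x_i^\top \frac{\beta}{\sigma}\right)\right] + \sum_{i=1}^{N} (1 - y_i)\ln\left[1 - G\left(x_i^\top \frac{\beta}{\sigma}\right)\right]$$
This log-likelihood depends only on the ratio $\beta / \sigma$. So, if $\beta$ and $\sigma$ are both scaled by the same factor $k$, that is for $\theta = (\beta^\top \ \sigma^2)^\top$ and $\theta^* = (k\beta^\top \ k^2\sigma^2)^\top$, with $k > 0$ and $k \neq 1$:
$$\ell_N(\theta; y \mid x) = \ell_N(\theta^*; y \mid x)$$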
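The lack of identification can be illustrated numerically: scaling $\beta$ and $\sigma$ by the same factor $k$ leaves the log-likelihood unchanged. The sketch below assumes a standard normal $G$ (probit case); the function names, data and parameter values are made up for the illustration.

```python
import math

def std_normal_cdf(z):
    """cdf G(.) of the standardized error term, assumed N(0, 1) here."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def latent_loglik(beta, sigma, ys, xs):
    """Log-likelihood of the latent model as a function of (beta, sigma)."""
    total = 0.0
    for y, x in zip(ys, xs):
        index = sum(b * v for b, v in zip(beta, x)) / sigma  # x_i' beta / sigma
        p = std_normal_cdf(index)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

ys = [1, 0, 1, 1, 0]
xs = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0), (1.0, 0.1), (1.0, -0.4)]
beta, sigma, k = (0.3, 0.8), 1.5, 4.0

ll = latent_loglik(beta, sigma, ys, xs)
ll_scaled = latent_loglik(tuple(k * b for b in beta), k * sigma, ys, xs)
print(ll, ll_scaled)
```

Both calls return the same value up to floating-point error, so the data cannot distinguish $(\beta, \sigma)$ from $(k\beta, k\sigma)$.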
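Remark:
In this latent model, only the ratio $\beta / \sigma$ can be identified since
$$\Pr(Y_i = 1 \mid X_i = x_i) = \Pr\left(\frac{\varepsilon_i}{\sigma} < x_i^\top \frac{\beta}{\sigma}\right) = G\left(x_i^\top \frac{\beta}{\sigma}\right)$$
$$\text{probit}: \Pr(Y_i = 1 \mid X_i = x_i) = \Phi\left(x_i^\top \tilde{\beta}\right) \quad \text{with } \tilde{\beta} = \beta / \sigma, \ V\left(\frac{\varepsilon_i}{\sigma}\right) = 1$$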
$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell_N(\theta; y \mid x)$$
or equivalently
$$\hat{\theta} = \arg\max_{\theta \in \Theta} L_N(\theta; y \mid x)$$
Remarks
1 Do not confuse the maximum likelihood estimator $\hat{\theta}$ (which is a random variable) and the maximum likelihood estimate $\hat{\theta}(x)$ which corresponds to the realisation of $\hat{\theta}$ on the sample $x$.
2 Generally, it is easier to maximise the log-likelihood than the likelihood (especially for the distributions that belong to the exponential family).
3 When we consider an unconditional likelihood, the MLE is defined by:
$$\hat{\theta} = \arg\max_{\theta \in \Theta} \ell_N(\theta; x)$$
Notations
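The first derivative (gradient) of the (conditional) log-likelihood evaluated at the point $\hat{\theta}$ satisfies:
$$\left.\frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta}\right|_{\hat{\theta}} \equiv \frac{\partial \ell_N(\hat{\theta}; y \mid x)}{\partial \theta} = g(\hat{\theta}; y \mid x) = 0$$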
Remark
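The log-likelihood equations correspond to a linear/nonlinear system of $K$ equations with $K$ unknown parameters $\theta_1, .., \theta_K$:
$$\left.\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}\right|_{\hat{\theta}} = \begin{pmatrix} \left.\dfrac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_1}\right|_{\hat{\theta}} \\ \vdots \\ \left.\dfrac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta_K}\right|_{\hat{\theta}} \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}$$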
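$$\left.\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} \text{ is negative definite}$$
or
$$\left.\frac{\partial^2 L_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} \text{ is negative definite}$$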
Remark:
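The Hessian matrix (realisation) is a $K \times K$ matrix:
$$\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top} = \begin{pmatrix} \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_1^2} & \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_1 \partial \theta_2} & .. & \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_1 \partial \theta_K} \\ \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_2^2} & .. & .. \\ .. & .. & .. & .. \\ \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_K \partial \theta_1} & .. & .. & \dfrac{\partial^2 \ell_N(\theta; y|x)}{\partial \theta_K^2} \end{pmatrix}$$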
Reminders
$$x^\top M x < 0$$
$$f_X(x; \sigma^2) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right) \quad \forall x \in [0, +\infty[$$
Solution:
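We have:
$$\ln f_X(x; \sigma^2) = -\frac{x^2}{2\sigma^2} + \ln(x) - \ln(\sigma^2)$$
So, the log-likelihood of the sample $\{x_1, .., x_N\}$ is:
$$\ell_N(\sigma^2; x) = \sum_{i=1}^{N} \ln f_X(x_i; \sigma^2) = -\frac{1}{2\sigma^2}\sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} \ln(x_i) - N\ln(\sigma^2)$$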
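Solution (cont'd):
The maximum likelihood estimator $\hat{\sigma}^2$ of $\sigma^2 \in \mathbb{R}^+$ is a solution to the maximization problem:
$$\hat{\sigma}^2 = \arg\max_{\sigma^2 \in \mathbb{R}^+} \ell_N(\sigma^2; x) = \arg\max_{\sigma^2 \in \mathbb{R}^+} \ -\frac{1}{2\sigma^2}\sum_{i=1}^{N} x_i^2 + \sum_{i=1}^{N} \ln(x_i) - N\ln(\sigma^2)$$
$$\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2} = \frac{1}{2\sigma^4}\sum_{i=1}^{N} x_i^2 - \frac{N}{\sigma^2}$$
$$\left.\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2}\right|_{\hat{\sigma}^2} = \frac{1}{2\hat{\sigma}^4}\sum_{i=1}^{N} x_i^2 - \frac{N}{\hat{\sigma}^2} = 0 \iff \hat{\sigma}^2 = \frac{1}{2N}\sum_{i=1}^{N} x_i^2$$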
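Solution (cont'd):
Check that $\hat{\sigma}^2$ is a maximum:
$$\frac{\partial \ell_N(\sigma^2; x)}{\partial \sigma^2} = \frac{1}{2\sigma^4}\sum_{i=1}^{N} x_i^2 - \frac{N}{\sigma^2} \qquad \frac{\partial^2 \ell_N(\sigma^2; x)}{\partial \sigma^4} = -\frac{1}{\sigma^6}\sum_{i=1}^{N} x_i^2 + \frac{N}{\sigma^4}$$
SOC:
$$\left.\frac{\partial^2 \ell_N(\sigma^2; x)}{\partial \sigma^4}\right|_{\hat{\sigma}^2} = -\frac{1}{\hat{\sigma}^6}\sum_{i=1}^{N} x_i^2 + \frac{N}{\hat{\sigma}^4}$$
$$= -\frac{2N\hat{\sigma}^2}{\hat{\sigma}^6} + \frac{N}{\hat{\sigma}^4} \quad \text{since } \hat{\sigma}^2 = \frac{1}{2N}\sum_{i=1}^{N} x_i^2$$
$$= -\frac{N}{\hat{\sigma}^4} < 0$$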
Conclusion:
The maximum likelihood estimator (MLE) of the parameter $\sigma^2$ is defined by:
$$\hat{\sigma}^2 = \frac{1}{2N}\sum_{i=1}^{N} X_i^2$$
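A small simulation can illustrate this estimator. The sketch below draws from the density $f_X(x; \sigma^2) = (x/\sigma^2)\exp(-x^2/(2\sigma^2))$ by inverting its cdf, then applies the closed-form MLE; the true value $\sigma^2 = 4$, the sample size and the seed are arbitrary assumptions.

```python
import math
import random

rng = random.Random(0)
sigma2_true = 4.0
N = 20000

# Inverse-cdf draws: F(x) = 1 - exp(-x**2 / (2 * sigma2)) on [0, +inf).
xs = [math.sqrt(-2.0 * sigma2_true * math.log(1.0 - rng.random()))
      for _ in range(N)]

# Closed-form MLE: (1 / 2N) * sum of x_i squared.
sigma2_hat = sum(x * x for x in xs) / (2 * N)

def score(s2):
    """Derivative of the log-likelihood with respect to sigma^2."""
    return sum(x * x for x in xs) / (2 * s2 ** 2) - N / s2

print(sigma2_hat, score(sigma2_hat))
```

The estimate should be close to the value used in the simulation, and the score evaluated at the estimate is zero up to rounding.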
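$$\hat{\theta} = \arg\max_{\sigma^2 \in \mathbb{R}^+,\, m \in \mathbb{R}} \ell_N(\theta; y)$$
with
$$\ell_N(\theta; y) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - m)^2$$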
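Solution (cont'd):
$$\ell_N(\theta; y) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - m)^2$$
$$\frac{\partial \ell_N(\theta; y)}{\partial m} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(y_i - m) \qquad \frac{\partial \ell_N(\theta; y)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}(y_i - m)^2$$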
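Solution (cont'd):
FOC (log-likelihood equations)
$$\left.\frac{\partial \ell_N(\theta; y)}{\partial \theta}\right|_{\hat{\theta}} = \begin{pmatrix} \dfrac{1}{\hat{\sigma}^2}\sum_{i=1}^{N}(y_i - \hat{m}) \\ -\dfrac{N}{2\hat{\sigma}^2} + \dfrac{1}{2\hat{\sigma}^4}\sum_{i=1}^{N}(y_i - \hat{m})^2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$
$$\hat{\theta} = \begin{pmatrix} \hat{m} \\ \hat{\sigma}^2 \end{pmatrix}$$
with
$$\hat{m} = \frac{1}{N}\sum_{i=1}^{N} Y_i \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}_N\right)^2$$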
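The log-likelihood equations of the normal model can be verified on a small data set. This is a sketch with made-up observations; `gradient` is a hypothetical helper returning the two partial derivatives given above.

```python
ys = [1.2, 0.7, 2.5, 1.9, 3.1, 2.2]  # illustrative data only
N = len(ys)

m_hat = sum(ys) / N                                # sample mean
sigma2_hat = sum((y - m_hat) ** 2 for y in ys) / N  # variance with divisor N

def gradient(m, s2):
    """Partial derivatives of the normal log-likelihood w.r.t. m and sigma^2."""
    d_m = sum(y - m for y in ys) / s2
    d_s2 = -N / (2 * s2) + sum((y - m) ** 2 for y in ys) / (2 * s2 ** 2)
    return d_m, d_s2

print(m_hat, sigma2_hat, gradient(m_hat, sigma2_hat))
```

Note that the ML estimator of the variance divides by $N$, not $N - 1$.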
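$$\left.\frac{\partial^2 \ell_N(\theta; y)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} = \begin{pmatrix} -\dfrac{N}{\hat{\sigma}^2} & 0 \\ 0 & \dfrac{N}{2\hat{\sigma}^4} - \dfrac{N\hat{\sigma}^2}{\hat{\sigma}^6} \end{pmatrix}$$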
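$$y_i = x_i^\top \beta + \varepsilon_i$$
where $x_i = (x_{i1} ... x_{iK})^\top$ and $\beta = (\beta_1 .. \beta_K)^\top$ are $K \times 1$ vectors. We assume that the $\varepsilon_i$ are $\mathcal{N}.i.d.\left(0, \sigma^2\right)$. Then, the (conditional) log-likelihood of the observations $(x_i, y_i)$ is given by
$$\ell_N(\theta; y \mid x) = -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - x_i^\top \beta\right)^2$$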
$$\frac{\partial\, x^\top \beta}{\partial \beta} = \underset{(K,1)}{x}$$
Solution
$$\hat{\theta} = \arg\max_{\beta \in \mathbb{R}^K,\, \sigma^2 \in \mathbb{R}^+} \ -\frac{N}{2}\ln(\sigma^2) - \frac{N}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i - x_i^\top \beta\right)^2$$
$$\underbrace{\frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2}}_{(1,1)} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\underbrace{\left(y_i - x_i^\top \beta\right)^2}_{(1,1)}$$
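Solution (cont'd):
FOC (log-likelihood equations)
$$\left.\frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta}\right|_{\hat{\theta}} = \begin{pmatrix} \dfrac{1}{\hat{\sigma}^2}\sum_{i=1}^{N} x_i\left(y_i - x_i^\top \hat{\beta}\right) \\ -\dfrac{N}{2\hat{\sigma}^2} + \dfrac{1}{2\hat{\sigma}^4}\sum_{i=1}^{N}\left(y_i - x_i^\top \hat{\beta}\right)^2 \end{pmatrix} = \begin{pmatrix} 0_K \\ 0 \end{pmatrix}$$
$$\hat{\theta} = \begin{pmatrix} \hat{\beta} \\ \hat{\sigma}^2 \end{pmatrix}$$
$$\hat{\beta} = \left(\sum_{i=1}^{N} X_i X_i^\top\right)^{-1}\left(\sum_{i=1}^{N} X_i Y_i\right) \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - X_i^\top \hat{\beta}\right)^2$$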
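For a regression with a constant and a single regressor ($K = 2$), the closed-form estimators can be computed by hand and plugged back into the log-likelihood equations. The data below are invented for illustration, and the $2 \times 2$ inverse is written out explicitly to keep the sketch free of external libraries.

```python
# x_i = (1, z_i): a constant and one regressor (illustrative data only).
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
N = len(ys)

# Entries of sum(x_i x_i') and sum(x_i y_i).
s_z = sum(zs)
s_zz = sum(z * z for z in zs)
s_y = sum(ys)
s_zy = sum(z * y for z, y in zip(zs, ys))

# beta_hat = (sum x_i x_i')^{-1} (sum x_i y_i), using the 2x2 inverse formula.
det = N * s_zz - s_z ** 2
b0 = (s_zz * s_y - s_z * s_zy) / det
b1 = (N * s_zy - s_z * s_y) / det

resid = [y - (b0 + b1 * z) for z, y in zip(zs, ys)]
sigma2_hat = sum(e * e for e in resid) / N

# Log-likelihood equations evaluated at (beta_hat, sigma2_hat).
score_beta = (sum(resid) / sigma2_hat,
              sum(z * e for z, e in zip(zs, resid)) / sigma2_hat)
score_sigma2 = (-N / (2 * sigma2_hat)
                + sum(e * e for e in resid) / (2 * sigma2_hat ** 2))
print(b0, b1, sigma2_hat, score_beta, score_sigma2)
```

The estimator of $\beta$ coincides with ordinary least squares, and all three score components vanish at the estimates up to rounding.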
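Solution (cont'd):
The Hessian is a $(K + 1) \times (K + 1)$ matrix:
$$\underbrace{\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}}_{(K+1) \times (K+1)} = \begin{pmatrix} \underbrace{\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \beta \, \partial \beta^\top}}_{K \times K} & \underbrace{\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \beta \, \partial \sigma^2}}_{K \times 1} \\ \underbrace{\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \sigma^2 \, \partial \beta^\top}}_{1 \times K} & \underbrace{\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \sigma^4}}_{1 \times 1} \end{pmatrix}$$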
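Solution (cont'd):
$$\frac{\partial \ell_N(\theta; y \mid x)}{\partial \beta} = \frac{1}{\sigma^2}\sum_{i=1}^{N} x_i\left(y_i - x_i^\top \beta\right)$$
$$\frac{\partial \ell_N(\theta; y \mid x)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\left(y_i - x_i^\top \beta\right)^2$$
$$\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top} = \begin{pmatrix} -\dfrac{1}{\sigma^2}\sum_{i=1}^{N} \underbrace{x_i x_i^\top}_{K \times K} & -\dfrac{1}{\sigma^4}\sum_{i=1}^{N} \underbrace{x_i\left(y_i - x_i^\top \beta\right)}_{K \times 1} \\ -\dfrac{1}{\sigma^4}\sum_{i=1}^{N} \underbrace{x_i^\top\left(y_i - x_i^\top \beta\right)}_{1 \times K} & \underbrace{\dfrac{N}{2\sigma^4} - \dfrac{1}{\sigma^6}\sum_{i=1}^{N}\left(y_i - x_i^\top \beta\right)^2}_{1 \times 1} \end{pmatrix}$$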
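Solution (cont'd):
Second Order Conditions (SOC)
$$\left.\frac{\partial^2 \ell_N(\theta)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} = \begin{pmatrix} -\dfrac{1}{\hat{\sigma}^2}\sum_{i=1}^{N} x_i x_i^\top & -\dfrac{1}{\hat{\sigma}^4}\sum_{i=1}^{N} x_i\left(y_i - x_i^\top \hat{\beta}\right) \\ -\dfrac{1}{\hat{\sigma}^4}\sum_{i=1}^{N} x_i^\top\left(y_i - x_i^\top \hat{\beta}\right) & \dfrac{N}{2\hat{\sigma}^4} - \dfrac{1}{\hat{\sigma}^6}\sum_{i=1}^{N}\left(y_i - x_i^\top \hat{\beta}\right)^2 \end{pmatrix}$$
Since $\sum_{i=1}^{N} x_i^\top\left(y_i - x_i^\top \hat{\beta}\right) = 0$ (FOC) and $N\hat{\sigma}^2 = \sum_{i=1}^{N}\left(y_i - x_i^\top \hat{\beta}\right)^2$:
$$\left.\frac{\partial^2 \ell_N(\theta)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} = \begin{pmatrix} -\dfrac{1}{\hat{\sigma}^2}\sum_{i=1}^{N} x_i x_i^\top & 0 \\ 0 & \dfrac{N}{2\hat{\sigma}^4} - \dfrac{N\hat{\sigma}^2}{\hat{\sigma}^6} \end{pmatrix}$$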
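Solution (cont'd):
Second Order Conditions (SOC).
$$\left.\frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}\right|_{\hat{\theta}} = \begin{pmatrix} -\dfrac{1}{\hat{\sigma}^2}\sum_{i=1}^{N} x_i x_i^\top & 0 \\ 0 & -\dfrac{N}{2\hat{\sigma}^4} \end{pmatrix} \text{ is negative definite}$$
Since $\sum_{i=1}^{N} x_i x_i^\top$ is positive definite (assumption), the Hessian matrix is negative definite and $\hat{\theta}$ is the MLE of the parameters $\theta$.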
Invariance Principle
For the practitioner, this result is extremely useful. For example, when a parameter appears in a likelihood function in the form $1/\theta_j$, it is usually worthwhile to reparameterize the model in terms of $\gamma_j = 1/\theta_j$.
becomes (with $\gamma^2 = 1/\sigma^2$)
$$\ell_N(m, \gamma^2; y) = \frac{N}{2}\ln(\gamma^2) - \frac{N}{2}\ln(2\pi) - \frac{\gamma^2}{2}\sum_{i=1}^{N}(y_i - m)^2$$
$$\hat{\gamma}^2 = \frac{N}{\sum_{i=1}^{N}\left(Y_i - \hat{m}\right)^2} = \frac{1}{\hat{\sigma}^2}$$
as expected.
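The invariance property can be checked numerically for the normal model: maximising over the precision $\gamma^2 = 1/\sigma^2$ gives the inverse of the ML estimate of $\sigma^2$. The data and the helper name `d_loglik_d_gamma2` below are illustrative assumptions.

```python
ys = [1.2, 0.7, 2.5, 1.9, 3.1, 2.2]  # illustrative data only
N = len(ys)
m_hat = sum(ys) / N
ssr = sum((y - m_hat) ** 2 for y in ys)

sigma2_hat = ssr / N   # MLE of sigma^2 in the original parameterization
gamma2_hat = N / ssr   # MLE of gamma^2 = 1 / sigma^2 after reparameterization

def d_loglik_d_gamma2(g2):
    """Derivative of the reparameterized log-likelihood w.r.t. gamma^2 at m = m_hat."""
    return N / (2 * g2) - ssr / 2

print(gamma2_hat, 1 / sigma2_hat, d_loglik_d_gamma2(gamma2_hat))
```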
Key Concepts
1 Identification.
2 Maximum likelihood estimator.
3 Maximum likelihood estimate.
4 Log-likelihood equations.
5 Equivariance or invariance principle.
6 Gradient Vector and Hessian Matrix (deterministic elements).
Section 5
5. Score, Hessian and Fisher Information
Objectives
We aim at introducing the following concepts:
$$s_N(\theta; Y \mid x) \equiv s(\theta) = \underset{(K,1)}{\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}}$$
Remarks:
$$s_N(\theta; X) = \partial \ell_N(\theta; X) / \partial \theta$$
Corollary
By definition, the score vector satisfies
$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = 0_K$$
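$$\mathbb{E}_\theta\left(s_N(\theta; X)\right) = \int s_N(\theta; x)\, f_X(x; \theta)\, dx = 0$$
$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = \int s_N(\theta; y \mid x)\, f_{Y|x}(y; \theta)\, dy = 0$$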
Proof.
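If we consider a variable $X$ with a pdf $f_X(x; \theta)$, $\forall x \in \mathbb{R}$, then:
$$\mathbb{E}_\theta\left(s_N(\theta; X)\right) = \int s_N(\theta; x)\, f_X(x; \theta)\, dx$$
$$= N \int \frac{\partial \ln f_X(x; \theta)}{\partial \theta}\, f_X(x; \theta)\, dx$$
$$= N \int \frac{1}{f_X(x; \theta)}\frac{\partial f_X(x; \theta)}{\partial \theta}\, f_X(x; \theta)\, dx$$
$$= N \frac{\partial}{\partial \theta}\int f_X(x; \theta)\, dx$$
$$= N \frac{\partial 1}{\partial \theta} = 0$$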
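$$f_D(d; \theta) = \frac{1}{\theta}\exp\left(-\frac{d}{\theta}\right), \quad \forall d \in \mathbb{R}^+$$
$$\ell_N(\theta; d) = -N\ln(\theta) - \frac{1}{\theta}\sum_{i=1}^{N} d_i$$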
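$$\mathbb{E}_\theta\left(s_N(\theta; D)\right) = -\frac{N}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{N}\mathbb{E}_\theta(D_i)$$
$$= -\frac{N}{\theta} + \frac{N\theta}{\theta^2}$$
$$= 0$$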
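The zero-mean property of the score can also be illustrated by simulation in the exponential example. In this sketch the true value $\theta = 2$, the number of draws and the seed are arbitrary assumptions; `random.expovariate` takes the rate $1/\theta$ as its argument.

```python
import random

rng = random.Random(1)
theta = 2.0
draws = [rng.expovariate(1.0 / theta) for _ in range(100000)]  # E(D_i) = theta

# Individual scores s_i(theta; D_i) = -1/theta + D_i / theta**2 at the true theta.
scores = [-1.0 / theta + d / theta ** 2 for d in draws]
mean_score = sum(scores) / len(scores)
print(mean_score)
```

The sample average of the scores is close to zero when they are evaluated at the true parameter value.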
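Then, we have
$$\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = \mathbb{E}_\theta\begin{pmatrix} \dfrac{1}{\sigma^2}\sum_{i=1}^{N} x_i\left(Y_i - x_i^\top \beta\right) \\ -\dfrac{N}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{N}\left(Y_i - x_i^\top \beta\right)^2 \end{pmatrix}$$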
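$$\mathbb{E}_\theta\left(\frac{1}{\sigma^2}\sum_{i=1}^{N} x_i\left(Y_i - x_i^\top \beta\right)\right) = \frac{1}{\sigma^2}\sum_{i=1}^{N} x_i\left(\mathbb{E}_\theta(Y_i \mid x) - x_i^\top \beta\right)$$
$$= \frac{1}{\sigma^2}\sum_{i=1}^{N} x_i\left(x_i^\top \beta - x_i^\top \beta\right)$$
$$= 0_K$$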
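$$\mathbb{E}_\theta\left(-\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\left(Y_i - x_i^\top \beta\right)^2\right)$$
$$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\mathbb{E}_\theta\left(\left(Y_i - x_i^\top \beta\right)^2\right)$$
$$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\mathbb{E}_\theta\left(\left(Y_i - \mathbb{E}_\theta(Y_i \mid x)\right)^2\right)$$
$$= -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}\mathbb{V}_\theta(Y_i \mid x)$$
$$= -\frac{N}{2\sigma^2} + \frac{N\sigma^2}{2\sigma^4}$$
$$= 0$$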
Definition (Gradient)
The gradient vector associated to the log-likelihood function is a $K \times 1$ vector defined by:
$$g_N(\theta; y \mid x) \equiv g(\theta) = \underset{(K,1)}{\frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta}}$$
Remarks
$$g_N(\theta; x) = \partial \ell_N(\theta; x) / \partial \theta$$
Corollary
By definition of the FOC, the gradient vector satisfies
$$g_N(\hat{\theta}; y \mid x) = 0_K$$
where $\hat{\theta} = \hat{\theta}(x)$ is the maximum likelihood estimate of $\theta$.
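$$H_N(\theta; y \mid x) = \frac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}$$
Remarks: The matrix $\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}$ is also called the Hessian matrix, but do not confuse the two matrices $\dfrac{\partial^2 \ell_N(\theta; Y \mid x)}{\partial \theta \, \partial \theta^\top}$ (a random matrix, function of $Y$) and $\dfrac{\partial^2 \ell_N(\theta; y \mid x)}{\partial \theta \, \partial \theta^\top}$ (its realisation on the sample $y$).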
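$$\underbrace{I_N(\theta)}_{K \times K} = \mathbb{V}_\theta\left(s_N(\theta; Y \mid x)\right)$$
or equivalently:
$$I_N(\theta) = \mathbb{V}_\theta\left(\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}\right)$$
where $\mathbb{V}_\theta$ means the variance with respect to the conditional distribution $Y \mid X$.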
Corollary
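Since by definition $\mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\right) = 0$, then an alternative definition of the Fisher information matrix of the sample $\{Y_1, .., Y_N\}$ is:
$$\underbrace{I_N(\theta)}_{K \times K} = \mathbb{E}_\theta\Big(\underbrace{s_N(\theta; Y \mid x)}_{K \times 1}\ \underbrace{s_N(\theta; Y \mid x)^\top}_{1 \times K}\Big)$$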
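$$I_N(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_N(\theta; Y \mid x)}{\partial \theta \, \partial \theta^\top}\right) = \mathbb{E}_\theta\left(-H_N(\theta; Y \mid x)\right)$$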
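$$I_N(\theta) = \mathbb{V}_\theta\left(s_N(\theta; Y \mid x)\right)$$
$$I_N(\theta) = \mathbb{E}_\theta\left(s_N(\theta; Y \mid x)\, s_N(\theta; Y \mid x)^\top\right)$$
$$I_N(\theta) = \mathbb{E}_\theta\left(-H_N(\theta; Y \mid x)\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the mean and the variance with respect to the conditional distribution $Y \mid X$, and where $s_N(\theta; Y \mid x)$ denotes the score vector and $H_N(\theta; Y \mid x)$ the Hessian matrix.
Written in terms of the derivatives of the log-likelihood:
$$I_N(\theta) = \mathbb{V}_\theta\left(\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}\right)$$
$$I_N(\theta) = \mathbb{E}_\theta\left(\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}\,\frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta}^\top\right)$$
$$I_N(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_N(\theta; Y \mid x)}{\partial \theta \, \partial \theta^\top}\right)$$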
Remarks
1 Three equivalent definitions of the Fisher information matrix, and as a consequence three different consistent estimates of the Fisher information matrix (see later).
2 The Fisher information matrix associated to the sample $\{Y_1, .., Y_N\}$ can also be defined from the Fisher information matrix for the observation $i$.
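$$I_i(\theta) = \mathbb{V}_\theta\left(\frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}\right)$$
$$I_i(\theta) = \mathbb{E}_\theta\left(\frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}\,\frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}^\top\right)$$
$$I_i(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_i(\theta; Y_i \mid x_i)}{\partial \theta \, \partial \theta^\top}\right)$$
or, in terms of the score vector and the Hessian matrix of observation $i$:
$$I_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; Y_i \mid x_i)\right)$$
$$I_i(\theta) = \mathbb{E}_\theta\left(s_i(\theta; Y_i \mid x_i)\, s_i(\theta; Y_i \mid x_i)^\top\right)$$
$$I_i(\theta) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$
where $\mathbb{E}_\theta$ and $\mathbb{V}_\theta$ denote the expectation and variance with respect to the true conditional distribution $Y_i \mid X_i$.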
Theorem
The Fisher information matrix associated to the sample $\{Y_1, .., Y_N\}$ is equal to the sum of individual Fisher information matrices:
$$I_N(\theta) = \sum_{i=1}^{N} I_i(\theta)$$
Remark:
1 In the case of a marginal log-likelihood, the Fisher information matrix
associated to the variable Xi is the same for the observations i :
$$I_i(\theta) = I(\theta) \quad \forall i = 1, .., N$$
Solution
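$$\ell_i(\theta; d_i) = -\ln(\theta) - \frac{d_i}{\theta}$$
The score of the observation $D_i$ is defined by:
$$s_i(\theta; D_i) = \frac{\partial \ell_i(\theta; D_i)}{\partial \theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$
Let us use the three definitions of the information quantity $I_i(\theta)$:
$$I_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; D_i)\right)$$
$$= \mathbb{E}_\theta\left(s_i(\theta; D_i)^2\right)$$
$$= \mathbb{E}_\theta\left(-H_i(\theta; D_i)\right)$$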
Solution (cont'd)
$$s_i(\theta; D_i) = \frac{\partial \ell_i(\theta; D_i)}{\partial \theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$
First definition:
$$I_i(\theta) = \mathbb{V}_\theta\left(s_i(\theta; D_i)\right) = \mathbb{V}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = \frac{1}{\theta^4}\mathbb{V}_\theta(D_i) = \frac{1}{\theta^2}$$
Second definition:
$$I_i(\theta) = \mathbb{E}_\theta\left(s_i(\theta; D_i)^2\right) = \mathbb{E}_\theta\left(\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right)^2\right)$$
$$= \mathbb{V}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) \quad \text{since } \mathbb{E}_\theta\left(-\frac{1}{\theta} + \frac{D_i}{\theta^2}\right) = 0$$
$$= \frac{1}{\theta^2}$$
Third definition:
$$H_i(\theta; D_i) = \frac{\partial^2 \ell_i(\theta; D_i)}{\partial \theta^2} = \frac{1}{\theta^2} - \frac{2D_i}{\theta^3}$$
$$I_i(\theta) = \mathbb{E}_\theta\left(-H_i(\theta; D_i)\right) = \mathbb{E}_\theta\left(-\frac{1}{\theta^2} + \frac{2D_i}{\theta^3}\right)$$
$$= -\frac{1}{\theta^2} + \frac{2}{\theta^3}\mathbb{E}_\theta(D_i)$$
$$= -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3} = \frac{1}{\theta^2}$$
Conclusion: $I_i(\theta) = I(\theta)$ does not depend on $i$.
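The equivalence of the three definitions can be illustrated by Monte Carlo in the exponential example. The sketch below assumes $\theta = 2$, so that $I(\theta) = 1/\theta^2 = 0.25$; the number of draws and the seed are arbitrary.

```python
import random

rng = random.Random(2)
theta = 2.0
n = 200000
draws = [rng.expovariate(1.0 / theta) for _ in range(n)]  # E(D_i) = theta

scores = [-1.0 / theta + d / theta ** 2 for d in draws]          # s_i(theta; D_i)
hessians = [1.0 / theta ** 2 - 2.0 * d / theta ** 3 for d in draws]  # H_i(theta; D_i)

mean_s = sum(scores) / n
info_var = sum((s - mean_s) ** 2 for s in scores) / n  # variance of the score
info_outer = sum(s * s for s in scores) / n            # mean of the squared score
info_hess = -sum(hessians) / n                         # mean of minus the Hessian
print(info_var, info_outer, info_hess, 1.0 / theta ** 2)
```

All three simulated quantities should be close to $1/\theta^2$.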
Solution
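The information matrix is then defined by:
$$\underbrace{I_i(\theta)}_{(K+1) \times (K+1)} = \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_i(\theta; Y_i \mid x_i)}{\partial \theta \, \partial \theta^\top}\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$
$$I_i(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2} x_i x_i^\top & \dfrac{1}{\sigma^4} x_i\left(\mathbb{E}_\theta(Y_i) - x_i^\top \beta\right) \\ \dfrac{1}{\sigma^4} x_i^\top\left(\mathbb{E}_\theta(Y_i) - x_i^\top \beta\right) & -\dfrac{1}{2\sigma^4} + \dfrac{1}{\sigma^6}\mathbb{E}_\theta\left(\left(Y_i - x_i^\top \beta\right)^2\right) \end{pmatrix}$$
Given that $\mathbb{E}_\theta(Y_i) = x_i^\top \beta$ and $\mathbb{E}_\theta\left(\left(Y_i - x_i^\top \beta\right)^2\right) = \sigma^2$, then we have:
$$I_i(\theta) = \begin{pmatrix} \dfrac{1}{\sigma^2} x_i x_i^\top & 0 \\ 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}$$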
$$I(\theta) = \mathbb{E}_X\left(I_i(\theta)\right)$$
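$$I(\theta) = \mathbb{E}_X \mathbb{E}_\theta\left(\frac{\partial \ell_i(\theta; Y_i \mid X_i)}{\partial \theta}\,\frac{\partial \ell_i(\theta; Y_i \mid X_i)}{\partial \theta}^\top\right) = \mathbb{E}_X \mathbb{E}_\theta\left(s_i(\theta; Y_i \mid X_i)\, s_i(\theta; Y_i \mid X_i)^\top\right)$$
$$I(\theta) = \mathbb{E}_X \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_i(\theta; Y_i \mid X_i)}{\partial \theta \, \partial \theta^\top}\right) = \mathbb{E}_X \mathbb{E}_\theta\left(-H_i(\theta; Y_i \mid X_i)\right)$$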
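$$I(\theta) = \mathbb{V}_\theta\left(\frac{\partial \ell_i(\theta; Y_i)}{\partial \theta}\right) = \mathbb{V}_\theta\left(s_i(\theta; Y_i)\right)$$
$$I(\theta) = \mathbb{E}_\theta\left(\frac{\partial \ell_i(\theta; Y_i)}{\partial \theta}\,\frac{\partial \ell_i(\theta; Y_i)}{\partial \theta}^\top\right) = \mathbb{E}_\theta\left(s_i(\theta; Y_i)\, s_i(\theta; Y_i)^\top\right)$$
$$I(\theta) = \mathbb{E}_\theta\left(-\frac{\partial^2 \ell_i(\theta; Y_i)}{\partial \theta \, \partial \theta^\top}\right) = \mathbb{E}_\theta\left(-H_i(\theta; Y_i)\right)$$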
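and the average Fisher information matrix for one observation is defined by:
$$I(\theta) = E_X\left(I_i(\theta)\right) = \begin{pmatrix} \frac{1}{\sigma^2} E_X\left(X_i X_i^\top\right) & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$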
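Summary: in order to compute the average information matrix $I(\theta)$ for one observation:
Step 1: Compute the Hessian matrix or the score vector for one observation
$$H_i(\theta; Y_i \mid x_i) = \frac{\partial^2 \ell_i(\theta; Y_i \mid x_i)}{\partial \theta \, \partial \theta^\top} \qquad s_i(\theta; Y_i \mid x_i) = \frac{\partial \ell_i(\theta; Y_i \mid x_i)}{\partial \theta}$$
Step 2: Take the expectation (or the variance) with respect to the conditional distribution $Y_i \mid X_i = x_i$
$$I_i(\theta) = V_\theta\left(s_i(\theta; Y_i \mid x_i)\right) = E_\theta\left(-H_i(\theta; Y_i \mid x_i)\right)$$
Step 3: Take the expectation with respect to the conditioning variables $X$
$$I(\theta) = E_X\left(I_i(\theta)\right)$$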
Theorem
In a sampling model (with i.i.d. observations), one has:
$$I_N(\theta) = N \, I(\theta)$$
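| | Marginal distribution | Conditional distribution |
| Score vector | $s_i(\theta; X_i)$ | $s_i(\theta; Y_i \mid x_i)$ |
| Hessian matrix | $H_i(\theta; X_i)$ | $H_i(\theta; Y_i \mid x_i)$ |
| Information matrix | $I_i(\theta) = I(\theta)$ | $I_i(\theta)$ |
| Av. infor. matrix | $I(\theta) = I_i(\theta)$ | $I(\theta) = E_X\left(I_i(\theta)\right)$ |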
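$$\hat{I}(\hat\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( \left. \frac{\partial \ell_i(\theta; y_i \mid x_i)}{\partial \theta} \right|_{\hat\theta} \left. \frac{\partial \ell_i(\theta; y_i \mid x_i)}{\partial \theta} \right|_{\hat\theta}^\top \right)$$
$$\hat{I}(\hat\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( - \left. \frac{\partial^2 \ell_i(\theta; y_i \mid x_i)}{\partial \theta \, \partial \theta^\top} \right|_{\hat\theta} \right)$$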
Problem
These three estimators are asymptotically equivalent, but they could give
different results in finite samples. Available evidence suggests that in small
or moderate sized samples, the Hessian is preferable (Greene, 2007).
However, in most cases, the BHHH estimator will be the easiest to
compute.
Example (CAPM)
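The empirical analogue of the CAPM is given by:
$$\tilde{r}_{it} = \alpha_i + \beta_i \tilde{r}_{mt} + \varepsilon_t$$
$$\underbrace{\tilde{r}_{it} = r_{it} - r_{ft}}_{\text{excess return of security } i \text{ at time } t} \qquad \underbrace{\tilde{r}_{mt} = (r_{mt} - r_{ft})}_{\text{market excess return at time } t}$$
$$E(\varepsilon_t) = 0 \qquad V(\varepsilon_t) = \sigma^2 \qquad E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$$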
Example (CAPM, cont'd)
Data (data file: capm.xls): Microsoft, SP500 and Tbill (closing prices)
from 11/1/1993 to 04/03/2003
[Figure: two plots of the series RMSFT; only axis tick values survived the text extraction.]
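$$\tilde{r}_{it} = x_t^\top \beta + \varepsilon_t \qquad t = 1, \ldots, T$$
where $x_t = (1 \;\; \tilde{r}_{mt})^\top$ is a $2 \times 1$ vector of random variables,
$\theta = (\alpha_i : \beta_i : \sigma^2)^\top = (\beta^\top : \sigma^2)^\top$ is a $3 \times 1$ vector of parameters, and
where the error term $\varepsilon_t$ satisfies $E(\varepsilon_t) = 0$, $V(\varepsilon_t) = \sigma^2$ and
$E(\varepsilon_t \mid \tilde{r}_{mt}) = 0$.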
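$$\hat\beta = \begin{pmatrix} \hat\alpha_i \\ \hat\beta_i \end{pmatrix} = \left( \sum_{t=1}^{T} x_t x_t^\top \right)^{-1} \left( \sum_{t=1}^{T} x_t \tilde{r}_{it} \right)$$
$$\hat\sigma^2 = \frac{1}{T} \sum_{t=1}^{T} \left( \tilde{r}_{it} - x_t^\top \hat\beta \right)^2$$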
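These closed-form estimators are straightforward to compute. The sketch below is a hypothetical illustration on simulated excess returns (it does not read capm.xls, and the true values 0.0005, 1.2 and the standard deviations are arbitrary assumptions). The function name `capm_mle` is also an assumption; the 2 x 2 system is solved by hand to keep the example free of external libraries.

```python
import random

def capm_mle(r_i, r_m):
    """ML estimates of alpha, beta, sigma^2 in r_i = alpha + beta * r_m + eps."""
    T = len(r_i)
    # Elements of sum_t x_t x_t' with x_t = (1, r_mt)'
    s1, sm, smm = float(T), sum(r_m), sum(v * v for v in r_m)
    # Elements of sum_t x_t r_it
    sy, smy = sum(r_i), sum(m * y for m, y in zip(r_m, r_i))
    det = s1 * smm - sm * sm
    alpha = (smm * sy - sm * smy) / det
    beta = (s1 * smy - sm * sy) / det
    resid = [y - alpha - beta * m for m, y in zip(r_m, r_i)]
    sigma2 = sum(e * e for e in resid) / T   # ML estimator divides by T
    return alpha, beta, sigma2, resid

# Simulated data standing in for the excess returns (assumed values)
rng = random.Random(0)
r_m = [rng.gauss(0.0, 0.01) for _ in range(2000)]
r_i = [0.0005 + 1.2 * m + rng.gauss(0.0, 0.02) for m in r_m]

alpha_hat, beta_hat, sigma2_hat, resid = capm_mle(r_i, r_m)
print(alpha_hat, beta_hat, sigma2_hat)
```

Note that the ML estimator of $\sigma^2$ divides by $T$, not by $T - 2$ as the usual unbiased OLS variance estimator does.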
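Solution
The ML estimator is defined by:
$$\hat\theta = \underset{\beta \in \mathbb{R}^2, \; \sigma^2 \in \mathbb{R}^+}{\arg\max} \; -\frac{T}{2} \ln \sigma^2 - \frac{T}{2} \ln(2\pi) - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \left( \tilde{r}_{it} - x_t^\top \beta \right)^2$$
Under regularity conditions, $\sqrt{T} (\hat\theta - \theta_0) \overset{d}{\to} N\left(0, I^{-1}(\theta_0)\right)$, or equivalently
$$\hat\theta \overset{asy}{\approx} N\left( \theta_0, \frac{1}{T} I^{-1}(\theta_0) \right)$$
The asymptotic variance covariance matrix of $\hat\theta$ is
$$V_{asy}(\hat\theta) = \frac{1}{T} I^{-1}(\theta_0)$$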
Solution (cont'd)
First estimator: The information matrix at time $t$ is defined by (third
definition):
$$I_t(\theta) = E_\theta\left( -\frac{\partial^2 \ell_t(\theta; \tilde{R}_{it} \mid x_t)}{\partial \theta \, \partial \theta^\top} \right) = E_\theta\left( -H_t(\theta; \tilde{R}_{it} \mid x_t) \right)$$
$$I_t(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} x_t x_t^\top & \frac{1}{\sigma^4} x_t \left( E_\theta(\tilde{R}_{it}) - x_t^\top \beta \right) \\ \frac{1}{\sigma^4} x_t^\top \left( E_\theta(\tilde{R}_{it}) - x_t^\top \beta \right) & -\frac{1}{2\sigma^4} + \frac{1}{\sigma^6} E_\theta\left( (\tilde{R}_{it} - x_t^\top \beta)^2 \right) \end{pmatrix}$$
Given that $E_\theta(\tilde{R}_{it}) = x_t^\top \beta$ and $E_\theta\left( (\tilde{R}_{it} - x_t^\top \beta)^2 \right) = \sigma^2$, then we have:
$$I_t(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} x_t x_t^\top & 0_{2 \times 1} \\ 0_{1 \times 2} & \frac{1}{2\sigma^4} \end{pmatrix}$$
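Solution (cont'd)
Second definition (BHHH):
$$\hat{V}_{asy}(\hat\theta) = \frac{1}{T} \hat{I}^{-1}(\hat\theta)$$
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} \left( \left. \frac{\partial \ell_t(\theta; \tilde{r}_{it} \mid x_t)}{\partial \theta} \right|_{\hat\theta} \left. \frac{\partial \ell_t(\theta; \tilde{r}_{it} \mid x_t)}{\partial \theta} \right|_{\hat\theta}^\top \right)$$
with
$$\left. \frac{\partial \ell_t(\theta; \tilde{r}_{it} \mid x_t)}{\partial \theta} \right|_{\hat\theta} = \begin{pmatrix} \frac{1}{\hat\sigma^2} x_t \left( \tilde{r}_{it} - x_t^\top \hat\beta \right) \\ -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \left( \tilde{r}_{it} - x_t^\top \hat\beta \right)^2 \end{pmatrix} = \begin{pmatrix} \frac{1}{\hat\sigma^2} x_t \hat\varepsilon_t \\ -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \end{pmatrix}$$
The outer product of the score for observation $t$ is:
$$\left. \frac{\partial \ell_t}{\partial \theta} \right|_{\hat\theta} \left. \frac{\partial \ell_t}{\partial \theta} \right|_{\hat\theta}^\top = \begin{pmatrix} \frac{1}{\hat\sigma^4} x_t x_t^\top \hat\varepsilon_t^2 & \frac{1}{\hat\sigma^2} x_t \hat\varepsilon_t \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right) \\ \frac{1}{\hat\sigma^2} x_t^\top \hat\varepsilon_t \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right) & \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right)^2 \end{pmatrix}$$
so we have
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} \begin{pmatrix} \frac{1}{\hat\sigma^4} x_t x_t^\top \hat\varepsilon_t^2 & \frac{1}{\hat\sigma^2} x_t \hat\varepsilon_t \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right) \\ \frac{1}{\hat\sigma^2} x_t^\top \hat\varepsilon_t \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right) & \left( -\frac{1}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} \hat\varepsilon_t^2 \right)^2 \end{pmatrix}$$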
Solution (cont'd)
Third definition (inverse of the Hessian): we know that
$$\hat{V}_{asy}(\hat\theta) = \frac{1}{T} \hat{I}^{-1}(\hat\theta)$$
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} -H_t(\hat\theta; \tilde{r}_{it} \mid x_t)$$
$$H_t(\hat\theta; \tilde{r}_{it} \mid x_t) = \begin{pmatrix} -\frac{1}{\hat\sigma^2} x_t x_t^\top & -\frac{1}{\hat\sigma^4} x_t \left( \tilde{r}_{it} - x_t^\top \hat\beta \right) \\ -\frac{1}{\hat\sigma^4} x_t^\top \left( \tilde{r}_{it} - x_t^\top \hat\beta \right) & \frac{1}{2\hat\sigma^4} - \frac{1}{\hat\sigma^6} \left( \tilde{r}_{it} - x_t^\top \hat\beta \right)^2 \end{pmatrix}$$
Since $\sum_{t=1}^{T} x_t (\tilde{r}_{it} - x_t^\top \hat\beta) = 0$ (first order conditions) and $\frac{1}{T} \sum_{t=1}^{T} (\tilde{r}_{it} - x_t^\top \hat\beta)^2 = \hat\sigma^2$, the off-diagonal blocks vanish in the average. So, in this case, the third estimator of $\hat{I}(\hat\theta)$ coincides with the first one:
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} -H_t(\hat\theta; \tilde{r}_{it} \mid x_t) = \begin{pmatrix} \frac{1}{\hat\sigma^2} \frac{1}{T} \sum_{t=1}^{T} x_t x_t^\top & 0_{2 \times 1} \\ 0_{1 \times 2} & \frac{1}{2\hat\sigma^4} \end{pmatrix}$$
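Solution (cont'd)
These three estimates of the asymptotic variance covariance matrix
are asymptotically equivalent, but can be largely different in finite
samples.
$$\hat{V}_{asy}(\hat\theta) = \frac{1}{T} \hat{I}^{-1}(\hat\theta)$$
with
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} \hat{I}_t(\hat\theta)$$
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} \left( \left. \frac{\partial \ell_t(\theta; \tilde{r}_{it} \mid x_t)}{\partial \theta} \right|_{\hat\theta} \left. \frac{\partial \ell_t(\theta; \tilde{r}_{it} \mid x_t)}{\partial \theta} \right|_{\hat\theta}^\top \right)$$
$$\hat{I}(\hat\theta) = \frac{1}{T} \sum_{t=1}^{T} \left( -H_t(\hat\theta; \tilde{r}_{it} \mid x_t) \right)$$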
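To make the comparison concrete, here is a minimal sketch that computes the Hessian-based estimate (which coincides with the analytic one in this model) and the BHHH estimate of the average information matrix on simulated CAPM-type data. The data-generating values, the parameter ordering $(\alpha_i, \beta_i, \sigma^2)$ and the function names are assumptions for illustration; the real capm.xls data are not used.

```python
import random

def fit(y, m):
    """ML estimates (alpha, beta, sigma^2) and residuals for y_t = alpha + beta*m_t + eps_t."""
    T = len(y)
    sm, smm = sum(m), sum(v * v for v in m)
    sy, smy = sum(y), sum(a * b for a, b in zip(m, y))
    det = T * smm - sm * sm
    alpha = (smm * sy - sm * smy) / det
    beta = (T * smy - sm * sy) / det
    e = [yt - alpha - beta * mt for yt, mt in zip(y, m)]
    return alpha, beta, sum(v * v for v in e) / T, e

def info_hessian(m, s2):
    """Average of -H_t at the MLE (equal to the averaged analytic I_t here)."""
    T = len(m)
    mbar, m2bar = sum(m) / T, sum(v * v for v in m) / T
    return [[1 / s2, mbar / s2, 0.0],
            [mbar / s2, m2bar / s2, 0.0],
            [0.0, 0.0, 1 / (2 * s2 * s2)]]

def info_bhhh(m, e, s2):
    """Average outer product of the per-observation scores at the MLE."""
    T = len(m)
    out = [[0.0] * 3 for _ in range(3)]
    for mt, et in zip(m, e):
        # Score with respect to (alpha, beta, sigma^2), with x_t = (1, m_t)'
        g = [et / s2, mt * et / s2, -1 / (2 * s2) + et * et / (2 * s2 * s2)]
        for i in range(3):
            for j in range(3):
                out[i][j] += g[i] * g[j] / T
    return out

rng = random.Random(0)
m = [rng.gauss(0.0, 0.01) for _ in range(5000)]
y = [1.2 * mt + rng.gauss(0.0, 0.02) for mt in m]
alpha, beta, s2, e = fit(y, m)
I_h = info_hessian(m, s2)
I_b = info_bhhh(m, e, s2)
for row_h, row_b in zip(I_h, I_b):
    print(row_h, row_b)
```

In this sketch the top-left elements of the two matrices agree exactly (because the average squared residual equals $\hat\sigma^2$), while the other elements agree only approximately, which is the finite-sample discrepancy mentioned above.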
Key Concepts
Section 6
6. Properties of Maximum Likelihood Estimators
Objectives
Is the MLE a good estimator? Under which conditions is the MLE
unbiased, consistent and equal to the BUE (Best Unbiased
Estimator)? => regularity conditions
Theorem (Consistency)
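Under regularity conditions, the maximum likelihood estimator is
consistent
$$\hat\theta \underset{N \to \infty}{\overset{p}{\longrightarrow}} \theta_0$$
or equivalently:
$$\underset{N \to \infty}{\text{plim}} \; \hat\theta = \theta_0$$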
Sketch of the proof
By definition of the maximum likelihood estimator, for any $\theta$:
$$\ln L_N(\hat\theta; y \mid x) \geq \ln L_N(\theta; y \mid x)$$
By Jensen's inequality (the logarithm is concave):
$$E_{\theta_0}\left( \ln \frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)} \right) \leq \ln E_{\theta_0}\left( \frac{L_N(\theta; Y \mid x)}{L_N(\theta_0; Y \mid x)} \right)$$
The expectation on the right-hand side equals 1, so the right-hand side is zero and:
$$E_{\theta_0}\left( \frac{1}{N} \ln L_N(\theta; Y \mid x) \right) \leq E_{\theta_0}\left( \frac{1}{N} \ln L_N(\theta_0; Y \mid x) \right)$$
This is the likelihood inequality:
$$E_{\theta_0}\left( \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \right) \geq E_{\theta_0}\left( \frac{1}{N} \ell_N(\theta; Y_i \mid x_i) \right)$$
By the law of large numbers:
$$\frac{1}{N} \ell_N(\theta; Y_i \mid x_i) \underset{N \to \infty}{\overset{p}{\longrightarrow}} E_{\theta_0}\left( \frac{1}{N} \ell_N(\theta; Y_i \mid x_i) \right)$$
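Sketch of the proof (cont'd)
The likelihood inequality for $\theta = \hat\theta$ implies
$$E_{\theta_0}\left( \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \right) \geq E_{\theta_0}\left( \frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \right)$$
with
$$\frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \underset{N \to \infty}{\overset{p}{\longrightarrow}} E_{\theta_0}\left( \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \right)$$
$$\frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \underset{N \to \infty}{\overset{p}{\longrightarrow}} E_{\theta_0}\left( \frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \right)$$
and thus
$$\lim_{N \to \infty} \Pr\left( \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \geq \frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \right) = 1$$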
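Sketch of the proof (cont'd)
So we have two results:
$$\lim_{N \to \infty} \Pr\left( \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \geq \frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \right) = 1$$
$$\frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \geq \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i) \quad \forall N$$
It necessarily implies that
$$\frac{1}{N} \ell_N(\hat\theta; Y_i \mid x_i) \underset{N \to \infty}{\overset{p}{\longrightarrow}} \frac{1}{N} \ell_N(\theta_0; Y_i \mid x_i)$$
If $\theta$ is a scalar, we have immediately:
$$\hat\theta \underset{N \to \infty}{\overset{p}{\longrightarrow}} \theta_0$$
For a more general case with $\dim(\theta) = K$, see a formal proof in Amemiya
(1985).
Amemiya T., (1985) Advanced Econometrics. Harvard University Press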
Remark
The proof of the consistency of the MLE is much easier when we have an
explicit (closed-form) expression for the maximum likelihood estimator $\hat\theta$:
$$\hat\theta = \hat\theta(X_1, \ldots, X_N)$$
Example
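Suppose that $D_1, D_2, \ldots, D_N$ are i.i.d., positive random variables with
$D_i \sim Exp(\theta_0)$, with
$$f_D(d; \theta) = \frac{1}{\theta} \exp\left( -\frac{d}{\theta} \right), \quad \forall d \in \mathbb{R}^+$$
$$E_\theta(D_i) = \theta_0 \qquad V_\theta(D_i) = \theta_0^2$$
where $\theta_0$ is the true value of $\theta$. Question: show that the MLE is
consistent.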
Solution
The log-likelihood function associated with the sample $\{d_1, \ldots, d_N\}$ is defined
by:
$$\ell_N(\theta; d) = -N \ln(\theta) - \frac{1}{\theta} \sum_{i=1}^{N} d_i$$
We take as given that the maximum likelihood estimator corresponds to the sample
mean:
$$\hat\theta = \frac{1}{N} \sum_{i=1}^{N} D_i$$
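Solution (cont'd)
Then, we have:
$$E_\theta(\hat\theta) = \frac{1}{N} \sum_{i=1}^{N} E_\theta(D_i) = \theta \quad \Longrightarrow \quad \hat\theta \text{ is unbiased}$$
$$V_\theta(\hat\theta) = \frac{1}{N^2} \sum_{i=1}^{N} V_\theta(D_i) = \frac{\theta^2}{N}$$
As a consequence
$$E_\theta(\hat\theta) = \theta \qquad \lim_{N \to \infty} V_\theta(\hat\theta) = 0$$
and
$$\hat\theta \underset{N \to \infty}{\overset{p}{\longrightarrow}} \theta$$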
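The convergence can be visualised with a small simulation. The sketch below is illustrative only: the true value $\theta_0 = 3$, the seed and the sample sizes are arbitrary assumptions, and `mle_exponential` is a hypothetical helper name. It draws exponential samples of increasing size and reports the absolute estimation error of the sample mean.

```python
import random

def mle_exponential(sample):
    """MLE of theta in the Exp(theta) model with mean theta: the sample mean."""
    return sum(sample) / len(sample)

rng = random.Random(42)
theta0 = 3.0
errors = {}
for n in (10, 1_000, 100_000):
    # random.expovariate takes the rate 1/theta0, so the mean is theta0
    sample = [rng.expovariate(1.0 / theta0) for _ in range(n)]
    errors[n] = abs(mle_exponential(sample) - theta0)
    print(n, errors[n])
```

For a single simulated path the error need not decrease monotonically, but its typical size shrinks at the rate $\theta_0 / \sqrt{N}$ implied by the variance formula above.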
Lemma
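Under stronger conditions, the maximum likelihood estimator converges
almost surely to $\theta_0$
$$\hat\theta \underset{N \to \infty}{\overset{a.s.}{\longrightarrow}} \theta_0 \quad \Longrightarrow \quad \hat\theta \underset{N \to \infty}{\overset{p}{\longrightarrow}} \theta_0$$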
$$V(\hat\theta) \geq I_N^{-1}(\theta_0) \qquad \text{FDCR or Cramer-Rao bound}$$
Remarks
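1. Hence, the Cramer-Rao bound is the inverse of the information matrix
associated with the sample. Reminder: three definitions for $I_N(\theta_0)$.
$$I_N(\theta_0) = V_{\theta_0}\left( \left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta} \right|_{\theta_0} \right)$$
$$I_N(\theta_0) = E_{\theta_0}\left( \left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta} \right|_{\theta_0} \left. \frac{\partial \ell_N(\theta; Y \mid x)}{\partial \theta} \right|_{\theta_0}^\top \right)$$
$$I_N(\theta_0) = E_{\theta_0}\left( - \left. \frac{\partial^2 \ell_N(\theta; Y \mid x)}{\partial \theta \, \partial \theta^\top} \right|_{\theta_0} \right)$$
2. If $\theta$ is a vector then $V(\hat\theta) \geq I_N^{-1}(\theta_0)$ means that $V(\hat\theta) - I_N^{-1}(\theta_0)$
is positive semi-definite.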
Theorem (Efficiency)
Under regularity conditions, the maximum likelihood estimator is
asymptotically efficient and attains the FDCR (Fréchet - Darmois -
Cramér - Rao) or Cramér-Rao bound:
$$V(\hat\theta) = I_N^{-1}(\theta_0)$$
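$$f_D(d; \theta) = \frac{1}{\theta} \exp\left( -\frac{d}{\theta} \right), \quad \forall d \in \mathbb{R}^+$$
$$E_\theta(D_i) = \theta_0 \qquad V_\theta(D_i) = \theta_0^2$$
where $\theta_0$ is the true value of $\theta$. Question: show that the MLE is efficient.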
Solution
We have shown that the maximum likelihood estimator corresponds to the
sample mean,
$$\hat\theta = \frac{1}{N} \sum_{i=1}^{N} D_i$$
$$V_\theta(\hat\theta) = \frac{\theta_0^2}{N}$$
$$E_\theta(\hat\theta) = \theta_0$$
Solution (cont'd)
The log-likelihood function is
$$\ell_N(\theta; d) = -N \ln(\theta) - \frac{1}{\theta} \sum_{i=1}^{N} d_i$$
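Solution (cont'd)
Let us use one of the three definitions of the information quantity $I_N(\theta)$:
$$I_N(\theta) = V_\theta\left( \frac{\partial \ell_N(\theta; D)}{\partial \theta} \right) = V_\theta\left( -\frac{N}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{N} D_i \right) = \frac{1}{\theta^4} \sum_{i=1}^{N} V_\theta(D_i) = \frac{N \theta^2}{\theta^4} = \frac{N}{\theta^2}$$
Then, $\hat\theta$ is efficient and attains the Cramer-Rao bound.
$$V_\theta(\hat\theta) = I_N^{-1}(\theta_0) = \frac{\theta_0^2}{N}$$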
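A Monte Carlo experiment gives a rough numerical check of this equality. In the sketch below, which is an illustration under assumed values ($\theta_0 = 2$, $N = 25$, 20,000 replications; the function name is hypothetical), the sampling variance of the sample mean across replications is compared with the bound $\theta_0^2 / N$.

```python
import random
import statistics

def mc_variance_of_mle(theta0, n, n_rep=20_000, seed=1):
    """Variance of the MLE (sample mean) across simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        sample_sum = sum(rng.expovariate(1.0 / theta0) for _ in range(n))
        estimates.append(sample_sum / n)
    return statistics.pvariance(estimates)

theta0, n = 2.0, 25
mc_var = mc_variance_of_mle(theta0, n)
cr_bound = theta0**2 / n   # inverse of I_N(theta0) = N / theta0^2
print(mc_var, cr_bound)
```

In this particular model the bound is attained for every $N$, not only asymptotically, so the two numbers should agree up to simulation noise even for a small sample size.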
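Under regularity conditions, the maximum likelihood estimator is asymptotically normally distributed:
$$\sqrt{N} (\hat\theta - \theta_0) \overset{d}{\to} N\left( 0, I^{-1}(\theta_0) \right)$$
where $\theta_0$ denotes the true value of the parameter and $I(\theta_0)$ the (average)
Fisher information matrix for one observation.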
Corollary
Another way to write this result is to say that, for a large sample size $N$, the
MLE $\hat\theta$ is approximately distributed according to a normal distribution
$$\hat\theta \overset{asy}{\approx} N\left( \theta_0, N^{-1} I^{-1}(\theta_0) \right)$$
or equivalently
$$\hat\theta \overset{asy}{\approx} N\left( \theta_0, I_N^{-1}(\theta_0) \right)$$
where $I_N(\theta_0) = N \, I(\theta_0)$ denotes the Fisher information matrix
associated with the sample.
$$V_{asy}(\hat\theta) = I_N^{-1}(\theta_0)$$
Proof (MLE convergence)
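At the maximum likelihood estimator, the gradient of the log-likelihood
equals zero (FOC):
$$g_N(\hat\theta) \equiv g_N(\hat\theta; y \mid x) = \left. \frac{\partial \ell_N(\theta; y \mid x)}{\partial \theta} \right|_{\hat\theta} = \underset{(K,1)}{0_K}$$
where $\hat\theta = \hat\theta(x)$ denotes here the ML estimate. Expand this set of
equations in a Taylor series around the true parameters $\theta_0$. We will use the
mean value theorem to truncate the Taylor series at the second term:
$$g_N(\hat\theta) = g_N(\theta_0) + H_N(\bar\theta) (\hat\theta - \theta_0) = 0$$
where $\bar\theta$ lies between $\hat\theta$ and $\theta_0$. Rearranging and scaling by $\sqrt{N}$:
$$\sqrt{N} (\hat\theta - \theta_0) = \left( -\frac{1}{N} H_N(\bar\theta) \right)^{-1} \sqrt{N} \left( \frac{1}{N} g_N(\theta_0) \right) = \left( -\frac{1}{N} H_N(\bar\theta) \right)^{-1} \sqrt{N} \, \bar{g}(\theta_0)$$
where $\bar{g}(\theta_0)$ denotes the sample mean of the individual gradient vectors
$$\bar{g}(\theta_0) = \frac{1}{N} g_N(\theta_0) = \frac{1}{N} \sum_{i=1}^{N} g_i(\theta_0; y_i \mid x_i)$$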
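$$E_\theta\left( s_i(\theta_0; Y_i \mid x_i) \right) = 0$$
$$E_x V_\theta\left( s_i(\theta_0; Y_i \mid x_i) \right) = E_x\left( I_i(\theta_0) \right) = I(\theta_0)$$
By using the Lindeberg-Levy Central Limit Theorem, we have:
$$\sqrt{N} \, \bar{s}(\theta_0) \overset{d}{\to} N\left( 0, I(\theta_0) \right)$$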
Proof (MLE convergence, contd)
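We know that:
$$\frac{1}{N} H_N(\bar\theta; Y \mid x) = \frac{1}{N} \sum_{i=1}^{N} H_i(\bar\theta; Y_i \mid x_i)$$
Reminder:
If $X_N$ and $Y_N$ verify
$$\underset{(K,K)}{X_N} \overset{p}{\to} \underset{(K,K)}{X}$$
$$\underset{(K,1)}{Y_N} \overset{d}{\to} N\left( \underset{(K,1)}{0}, \underset{(K,K)}{\Sigma} \right)$$
then
$$\underset{(K,K)}{X_N} \underset{(K,1)}{Y_N} \overset{d}{\to} N\left( \underset{(K,1)}{0}, \underset{(K,K)}{X} \underset{(K,K)}{\Sigma} \underset{(K,K)}{X^\top} \right)$$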
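$$f_D(d; \theta) = \frac{1}{\theta} \exp\left( -\frac{d}{\theta} \right), \quad \forall d \in \mathbb{R}^+$$
$$E_\theta(D_i) = \theta_0 \qquad V_\theta(D_i) = \theta_0^2$$
where $\theta_0$ is the true value of $\theta$. Question: what is the asymptotic
distribution of the MLE? Propose a consistent estimator of the asymptotic
variance of $\hat\theta$.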
Solution
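We have shown that $\hat\theta = (1/N) \sum_{i=1}^{N} D_i$ and:
$$s_i(\theta; D_i) = \frac{\partial \ell_i(\theta; D_i)}{\partial \theta} = -\frac{1}{\theta} + \frac{D_i}{\theta^2}$$
The (average) Fisher information matrix associated with $D_i$ is:
$$I(\theta) = V_\theta\left( -\frac{1}{\theta} + \frac{D_i}{\theta^2} \right) = \frac{1}{\theta^4} V_\theta(D_i) = \frac{1}{\theta^2}$$
Then, the asymptotic distribution of $\hat\theta$ is:
$$\sqrt{N} (\hat\theta - \theta_0) \overset{d}{\to} N\left( 0, \theta^2 \right)$$
or equivalently
$$\hat\theta \overset{asy}{\approx} N\left( \theta_0, \frac{\theta^2}{N} \right)$$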
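Solution (cont'd)
The asymptotic variance of $\hat\theta$ is:
$$V_{asy}(\hat\theta) = \frac{\theta^2}{N}$$
A consistent estimator of this asymptotic variance is:
$$\hat{V}_{asy}(\hat\theta) = \frac{\hat\theta^2}{N}$$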
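In practice this estimator is used to build standard errors and approximate confidence intervals. The following sketch assumes a simulated sample of 400 durations with true mean 1.5 and uses the usual 1.96 normal quantile for a 95% interval; these choices and the function name `mle_with_se` are illustrative assumptions.

```python
import math
import random

def mle_with_se(sample):
    """MLE of theta and its estimated asymptotic standard error sqrt(theta_hat^2 / N)."""
    n = len(sample)
    theta_hat = sum(sample) / n
    var_hat = theta_hat**2 / n
    return theta_hat, math.sqrt(var_hat)

rng = random.Random(7)
theta0 = 1.5
sample = [rng.expovariate(1.0 / theta0) for _ in range(400)]

theta_hat, se = mle_with_se(sample)
# Approximate 95% confidence interval based on asymptotic normality
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(theta_hat, se, ci)
```

The interval relies on the normal approximation, so its actual coverage in small samples may differ from the nominal 95%.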
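$$\hat\theta = \begin{pmatrix} \hat\beta \\ \hat\sigma^2 \end{pmatrix}$$
$$\hat\beta = \left( \sum_{i=1}^{N} X_i X_i^\top \right)^{-1} \left( \sum_{i=1}^{N} X_i Y_i \right) \qquad \hat\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i - X_i^\top \hat\beta \right)^2$$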
Solution
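This model satisfies the regularity conditions. We have shown that the average
Fisher information matrix is equal to:
$$I(\theta) = \begin{pmatrix} \frac{1}{\sigma^2} E_X\left( X_i X_i^\top \right) & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$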
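Solution (cont'd)
The asymptotic variance covariance matrix of $\hat\theta$ is equal to:
$$V_{asy}(\hat\theta) = N^{-1} I^{-1}(\theta_0) = I_N^{-1}(\theta_0)$$
with
$$I_N(\theta) = \begin{pmatrix} \frac{N}{\sigma^2} E_X\left( X_i X_i^\top \right) & 0 \\ 0 & \frac{N}{2\sigma^4} \end{pmatrix}$$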
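Solution (cont'd)
A consistent estimate of $I_N(\theta)$ is:
$$\hat{I}_N(\hat\theta) = \hat{V}_{asy}^{-1}(\hat\theta) = \begin{pmatrix} \frac{N}{\hat\sigma^2} \hat{Q}_X & 0 \\ 0 & \frac{N}{2\hat\sigma^4} \end{pmatrix}$$
with
$$\hat{Q}_X = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top$$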
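Solution (cont'd)
Thus we get:
$$\hat\beta \overset{asy}{\approx} N\left( \beta_0, \; \hat\sigma^2 \left( \sum_{i=1}^{N} x_i x_i^\top \right)^{-1} \right)$$
$$\hat\sigma^2 \overset{asy}{\approx} N\left( \sigma_0^2, \; \frac{2\hat\sigma^4}{N} \right)$$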
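These two approximations translate directly into standard errors. The sketch below is a hypothetical illustration for a model with an intercept and one regressor, on simulated data with assumed true values $(1, 2, 0.25)$; it computes the ML estimates and the square roots of the diagonal of $\hat\sigma^2 (\sum_i x_i x_i^\top)^{-1}$ and of $2\hat\sigma^4 / N$. The function name and the data-generating process are assumptions.

```python
import math
import random

def ml_standard_errors(y, x):
    """ML estimates and asymptotic standard errors for y_i = b0 + b1 * x_i + eps_i."""
    N = len(y)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = N * sxx - sx * sx            # determinant of sum_i x_i x_i'
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (N * sxy - sx * sy) / det
    s2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / N
    # Diagonal of s2 * (sum_i x_i x_i')^{-1}, then the variance of s2
    se_b0 = math.sqrt(s2 * sxx / det)
    se_b1 = math.sqrt(s2 * N / det)
    se_s2 = math.sqrt(2 * s2 * s2 / N)
    return (b0, b1, s2), (se_b0, se_b1, se_s2)

rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
est, se = ml_standard_errors(y, x)
print(est)
print(se)
```

Because the information matrix is block-diagonal, the standard errors of $\hat\beta$ and $\hat\sigma^2$ can be computed separately, as done here.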
Summary
Under regular conditions
But finite sample properties can be very different from large sample
properties:
Theorem (Equivariance)
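Under regularity conditions and if $g(.)$ is a continuously differentiable
function of $\theta$ and is defined from $\mathbb{R}^K$ to $\mathbb{R}^P$, then:
$$g(\hat\theta) \overset{p}{\to} g(\theta_0)$$
$$\sqrt{N} \left( g(\hat\theta) - g(\theta_0) \right) \overset{d}{\to} N\left( 0, \; G(\theta_0) \, I^{-1}(\theta_0) \, G(\theta_0)^\top \right)$$
where $G(\theta) = \partial g(\theta) / \partial \theta^\top$ denotes the $P \times K$ Jacobian matrix of $g$.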
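As an illustration of this result, consider the exponential example with the transformation $g(\theta) = 1/\theta$ (the rate parameter), which is an assumption chosen for this sketch and not an example from the slides. Then $G(\theta) = -1/\theta^2$ and, with $I^{-1}(\theta) = \theta^2$, the asymptotic variance is $1/\theta_0^2$. The code compares this value with the Monte Carlo variance of $\sqrt{N}(g(\hat\theta) - g(\theta_0))$; sample size, number of replications and function names are arbitrary.

```python
import math
import random
import statistics

def rate_estimates(theta0, n, n_rep, seed=0):
    """Simulated draws of sqrt(N) * (g(theta_hat) - g(theta0)) with g(theta) = 1/theta."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_rep):
        theta_hat = sum(rng.expovariate(1.0 / theta0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (1.0 / theta_hat - 1.0 / theta0))
    return out

theta0, n = 2.0, 200
z = rate_estimates(theta0, n, n_rep=5000)
mc_var = statistics.pvariance(z)

G = -1.0 / theta0**2               # derivative of g at theta0
delta_var = G * theta0**2 * G      # G * I^{-1}(theta0) * G' = 1 / theta0^2
print(mc_var, delta_var)
```

The agreement is only approximate for finite $N$, since the theorem is an asymptotic statement.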
Key Concepts of the Chapter 2
End of Chapter 2