
A reconsideration of continuous time

one-factor spot rate models.


Master Thesis for the Cand.Scient.Oecon degree in
Mathematics and Economics at the University of
Copenhagen.
Christoffer Kanstrup
6th July 2004
Thesis counsellor: Anders Rahbek.
Institute for Mathematical Sciences.
University of Copenhagen.
Preface
This is my Master Thesis. It represents the conclusion of five years of studying
mathematics and economics at the University of Copenhagen. More specifically,
this thesis is the product of six months of work during the spring and early
summer of 2004.
The work presented here builds on the topics I have studied during the previous
five years, specifically finance theory, statistics and econometrics. However,
the actual methods and results in the thesis are all new to me and have not been
included in any of the courses I have followed earlier.
I would like to thank my thesis counsellor Anders Rahbek for his help, patience
and ability to read through my sometimes long ramblings.
Finally I would like to thank the people who helped me during the writing process,
and also Mr. Yacine Aït-Sahalia for kindly sharing his research data.
University of Copenhagen
July 2004
Christoffer Kanstrup
Contents
List of Tables
List of Figures
1 Introduction
2 Stochastic Differential Equations
2.1 Preliminaries
2.2 Definition and existence of solutions
2.3 Transition Probabilities
2.4 Ergodicity of the Solution
2.4.1 Scale functions and speed measures
2.4.2 Bounded Solutions
2.4.3 Conditions for an Ergodic Solution
2.4.4 Exponential Ergodicity
3 Parameter estimation in diffusion models
3.1 Approximating the likelihood function
3.2 Martingale estimating functions
3.2.1 Existence of optimal estimating functions
3.2.2 Asymptotic behavior of martingale estimating functions
3.2.3 Linear estimating functions
3.2.4 Quadratic estimating functions
3.2.5 Estimating the standard deviation
3.2.6 Estimators based on eigenfunctions
3.3 Model misspecification analysis
3.4 Empirical data example
3.5 The Aït-Sahalia method
4 Modelling the short rate
4.1 The term structure of interest rates
4.2 Characteristics of the short rate
4.2.1 Some standard models
4.3 Examining parametric models for the short rate
4.3.1 Specification of the general model
4.3.2 The estimation approach
4.3.3 The identification problem
4.3.4 Using proxies
4.4 Empirical Results
4.4.1 The 7-day Eurodollar data
4.4.2 Conclusion on the empirical analysis
5 A semi-parametric approach
5.1 The Estimation Approach
5.1.1 Using transition probabilities to estimate the drift function
5.1.2 Kernel estimation of the diffusion function
5.1.3 Semi-parametric diffusion estimation
5.2 A simulation study
5.3 Empirical results
5.4 Misspecification analysis
5.5 Conclusion on the semi-parametric estimation
6 Conclusion
6.1 Discussion of the methods used
6.2 Possible extensions of the work
A Broyden's Method
B Source Codes
B.1 Broyden's Method
B.2 Estimation Program
Bibliography
List of Tables
3.1 Results of simulation study of estimators based on the Euler approximation of the likelihood function.
3.2 Results of simulation study of maximum likelihood estimators in the Ornstein-Uhlenbeck process.
3.3 Estimators based on an approximation to the optimal quadratic estimating function.
3.4 Standard deviation calculated by asymptotic estimator and bootstrap.
3.5 Estimation results of applying the CIR model to the one month Eurodollar rate, full sample, i.e. 1971-2004. The standard deviations are calculated using (3.32)-(3.33).
3.6 Estimation results of applying the CIR model to the one month Eurodollar rate, sub-sample, i.e. 1984-2001.
4.1 Selection of parametric short rate models.
4.2 Some models nested within the general parametric model.
4.3 Details about the 7-day Eurodollar data.
4.4 The estimated parameters in the CKLS model.
4.5 The estimated parameters in the general-drift, CEV-diffusion model.
4.6 The estimated parameters in the linear-drift, general-diffusion model.
4.7 The estimated parameters in the unconstrained model.
4.8 Results of the model specification analysis.
5.1 Results of simulation study of parametric estimators in semi-parametric model.
5.2 The estimated parameters in semi-parametric model.
List of Figures
2.1 A simulated sample-path of the CIR-process.
3.1 A simulated sample-path of the Ornstein-Uhlenbeck-process.
3.2 Histogram and empirical density for estimators based on the optimal quadratic estimating function.
3.3 Estimators based on an approximation to the optimal quadratic estimating function.
3.4 The estimated asymptotical standard deviation of the estimators.
3.5 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 2.5$.
3.6 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 1$.
3.7 Simulation study of the uniform residuals, $t_n = 250$, $\Delta = 0.1$.
3.8 Daily observations of the one month Eurodollar rate.
3.9 The uniform residuals $u_i$ based on the CIR model estimated on the one month Eurodollar rate from 1971 to 2004.
3.10 The uniform residuals $u_i$ based on the CIR model estimated on the subsample of the one month Eurodollar rate from 1984 to 2001.
4.1 A simulated sample-path of the full model.
4.2 The function $r \mapsto \beta_2 r^{\beta_3}$ for various sets of parameter values.
4.3 Daily observations of the 7-day Eurodollar rate.
4.4 Cross-plot of $r_{t_i}$ and $r_{t_{i-1}}$ for the 7-day Eurodollar data.
4.5 Roots of the companion matrix.
4.6 The uniform residuals $u_i$ based on the CKLS model.
4.7 The estimated parametric invariant density in the CKLS model.
4.8 Uniform residuals and drift function in the general-drift, CEV-diffusion model.
4.9 The estimated parametric invariant density in the general drift, CEV diffusion model.
4.10 Uniform residuals and diffusion function in the linear drift, general diffusion model.
4.11 The estimated parametric invariant density in the linear drift, general diffusion model.
4.12 Uniform residuals, drift and diffusion function in the full model.
4.13 The estimated parametric invariant density in the full model.
5.1 Histogram and empirical density for the parametric estimators in the semi-parametric model.
5.2 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 2.5$.
5.3 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 1$.
5.4 Model estimation and misspecification for a simulated CIR process with $t_n = 500$ and $\Delta = 0.1$.
5.5 Nonparametric density estimate of the invariant distribution and estimated parametric drift.
5.6 Estimated semi-parametric diffusion estimator.
5.7 Estimated semi-parametric diffusion estimator compared with a linear function for lower values of the spot rate process.
5.8 The uniform residuals $u_i$ based on the semi-parametric model.
A.1 An example of an object function for which the Broyden procedure might diverge.
Chapter 1
Introduction
One of the basic assumptions of models dealing with financial derivative pricing
is that one or more assets and/or non-traded factors are given a priori. That is,
sophisticated tools are developed to price derivative securities when the
underlying object, $X = (X^1, \ldots, X^k)^T$, is an empirically observable
multi-dimensional stochastic process whose continuous time dynamics are assumed
to be given by the stochastic differential equation
$$dX_t = \underbrace{\mu(t, X_t)}_{k \times 1}\,dt + \underbrace{\sigma(t, X_t)}_{k \times n}\,\underbrace{dW_t}_{n \times 1}, \tag{1.1}$$
where $W$ is an $n$-dimensional Brownian motion and $\mu$ and $\sigma$ are deterministic
matrix functions whose functional forms are known except for the values of certain
parameters.
For a range of popular derivative pricing models the dynamics of the underlying
asset are specified such that elegant solutions for e.g. stock option prices or
bond prices are available. Examples are the Black-Scholes model for stock prices
and the Vasicek model for the spot rate.
Specifying and estimating an appropriate stochastic differential equation requires
some quite advanced statistical and econometric tools. For instance, the short term
risk free interest rate (the spot rate) is one of the most important subjects
considered in the financial markets. A wide range of models has been suggested to
explain the dynamics of this financial object. In fact some would claim that more
models have been put forward to describe the dynamics of the spot rate than
for any other issue in finance, see Chan et al. [1992] page 1209. However, the
matter of how to specify a stochastic differential equation capable of capturing
the behavior of the spot rate is still an open question.
A main focus of the thesis is a critical examination of the widely used spot rate
models. We present and discuss a wide range of the classical interest rate models.
From this, we continue the work of Chan et al. [1992] and show how the
classical linear drift models (Vasicek and CIR) fail to capture the dynamics of
observed short rate data. The literature proposes ways of extending these models
to compensate for the poor fit to data. In particular Aït-Sahalia [1996b] suggests
that nonlinear functional forms should be included. We show that this extension
only resolves part of the misspecification problems: the nonlinear forms describe
the observed invariant density better, but there are still clear indications of
misspecification even for the extended models.
Focus will be almost entirely on modelling the underlying financial assets. A
natural extension, which is outside the scope of this thesis, would be to derive
expressions for various options or bond prices based on the suggested dynamics
of the financial assets. However, for more general models, which fit the observed
data better than the classical models, we would not expect to find analytical
expressions for the price of derivatives, and Monte Carlo simulation would have
to be implemented. As such, a major part of the analytical work in asset pricing
lies in specifying an empirically acceptable a priori model.
The contribution of this thesis to the existing literature is thus threefold:
Firstly we collect and present the main theory of stochastic differential equations.
We introduce some known results and illustrate them using examples. We also
present a continuous time version of the drift criterion, which appears to be new to
the theory of stochastic differential equations in this setting.
Secondly we introduce some relatively new results (from within the last decade)
concerning parameter estimation in continuous time. We discuss the following
methods:
• Estimation using the likelihood function when possible.
• Estimation using an approximation to the true likelihood function.
• Estimation using linear estimating functions.
• Estimation using quadratic estimating functions.
• Estimation using estimating functions based on eigenfunctions.
• Semi-parametric estimation using the invariant density.
Implementation of these methods is discussed and the performance of the
individual methods is compared using simulation studies.
Thirdly we analyse the various spot rate models, including the relatively
unexplored nonlinear model proposed by Aït-Sahalia [1996b], which is presented in
detail here. The identification and proxy problems that arise when attempting
to estimate the parameters in this model (and other short rate models) are
illustrated. Along these lines is also the examination of a semi-parametric model
which only imposes parametric structure on the drift function.
Although the thesis is based on existing results regarding stochastic differential
equations, there are new results. One new result regards topic three above: a
critical examination of the various spot rate models, including the new nonlinear and
semi-parametric models, has not been performed in detail in any of the literature
that the author is aware of. The conclusions are also new and relevant in light
of the fact that classical no-arbitrage theory is based heavily on assumptions
regarding the spot rate dynamics. These assumptions are shown to be inconsistent with
the empirical findings.
The thesis is constructed in the following manner:
Chapter 2 formally defines the concept of a stochastic differential equation. It is
commonly known that the general form (1.1) does not guarantee that a solution
even exists. We define what we mean by a solution to (1.1) and consider
restrictions on the functions $\mu$ and $\sigma$ such that a solution actually exists. Although
Chapter 2 does briefly present some introductory theory, a certain amount of
basic knowledge of stochastic processes (such as the construction of the Ito
integral) and stochastic calculus (the Ito formula) is assumed known and not
discussed in any detail. The probabilistic properties of a solution to an SDE are
discussed and conditions ensuring stationarity and ergodicity are derived.
A recurring example in this thesis will be the one-dimensional ($k = n = 1$)
Cox-Ingersoll-Ross (CIR) model
$$dX_t = a(b - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t. \tag{1.2}$$
As an illustration of the theorems of Chapter 2, the existence and statistical
properties of a process satisfying (1.2) are derived in detail.
In (1.2) $a$, $b$ and $\sigma$ are parameters, and the question of how to estimate the values
of these parameters from a set of observed values of the process is the topic of
Chapter 3. Although we assume that the actual process is continuous, empirical
observations will be of a discrete nature. The fact that an explicit solution
to a general stochastic differential equation can seldom be derived analytically
implies that methods such as maximum likelihood estimation will prove to be
impossible for the majority of the models considered. Instead we consider simply
replacing the likelihood function with the likelihood function of an approximation
to the true process. We also consider more sophisticated methods for parameter
estimation based on discrete observations; see the list presented above for the
main estimation methods discussed in this chapter. The literature
on this topic has grown somewhat within the last decade and some of the main
references are given in Chapter 3.
The asymptotic properties of the proposed estimation methods are discussed and
the finite sample quality of the estimators is studied using simulation. Again
the CIR model is used as an example alongside the slightly simpler
Ornstein-Uhlenbeck process, which does in fact allow for explicit maximum likelihood
estimation.
Apart from parameter estimation, Chapter 3 also presents a method for
determining how well a given model describes the data. This misspecification analysis
is vital in light of the comment above that the question of finding an acceptable
spot rate model is still unresolved.
Having derived the necessary theoretical statistical methods in the previous chapters,
Chapter 4 turns focus to more financial topics. Basic concepts of term structure
modelling are recapitulated, an overview of some of the main spot rate
models from the literature is presented, and the pros and cons of the models are
discussed. The estimation and misspecification analysis methods will be
implemented on a new and previously unexamined class of spot rate models.
In Chapter 5 we consider a different approach to the task of estimating a
stochastic differential equation on the basis of discrete observations. Instead of specifying
functional parametric forms of both $\mu(\cdot)$ and $\sigma(\cdot)$ in (1.1), we consider a method
for which it is sufficient to impose a parametric shape of $\mu(\cdot)$ only. A nonparametric
estimator of $\sigma(\cdot)$ is derived, thereby loosening the restrictive ties of imposing
a parametric form of this function. The qualities of this method are examined
using simulation studies and an empirical implementation on interest rate data
is conducted.
Finally Chapter 6 concludes the thesis with a discussion of the results and some
suggestions for future work including extensions of the financial models used and
improvement of the estimation methods.
Chapter 2
Stochastic Differential Equations
This chapter recapitulates the theory of stochastic integrals, stochastic
differentials and stochastic differential equations. The content of this chapter is based
on basic knowledge of Brownian motions and Ito integrals, and it covers theorems on
properties of solutions of stochastic differential equations. Conditions for
stationarity are presented, something which is seldom discussed in great detail in
finance theory. In general the purpose of this chapter is to provide a broad
understanding of random processes that are constructed as solutions to stochastic
differential equations and to prepare for the work ahead.
2.1 Preliminaries: Ito integrals, diffusions and general stochastic integrals
We start off with some preliminary notes on the notation used in the following.
Consider a given filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, P)$ and a stochastic
process $X : [0, \infty) \times \Omega \to \mathbb{R}^n$. Hence for each $t \geq 0$ and for each $\omega \in \Omega$,
$X(t, \omega)$ will be a vector in $\mathbb{R}^n$. For a fixed $t$ the notation $X_t$ will be used for the
stochastic variable defined by $X_t = X(t, \cdot) : \Omega \to \mathbb{R}^n$ on the probability space
$(\Omega, \mathcal{F}_t, P_t)$, where $P_t$ is the restriction of $P$ to $\mathcal{F}_t$.
Assume that $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, P)$ is a filtered probability space and let $W_t$ be a
standard (one-dimensional) Brownian motion (SBM) defined on this space.¹
We briefly state a few essential results concerning stochastic integrals of the type
$$\int_a^b f_t\,dW_t \tag{2.1}$$
for a large class of random functions (processes) $f$.
The only type of stochastic integrals of the form (2.1) considered in the following
will be Ito integrals. Though other types of stochastic integrals exist, the Ito
integral is the essential one in financial applications, partly because of the martingale
property, see Theorem 2.1. Whenever a stochastic integral such as (2.1) is used
we implicitly assume that this integral is well defined. This is for instance the
case for any process $f$ that belongs to the class $L^2$. A stochastic process $f$
belongs to the class $L^2[a, b]$ if $f$ is adapted to the filtration $\{\mathcal{F}_t\}_{t \geq 0}$, i.e. $f_t$ is
$\mathcal{F}_t$-measurable for all $t$, and $\int_a^b E[f^2(t)]\,dt < \infty$. A stochastic process $f$ is said to
belong to the class $L^2$ if $f$ belongs to $L^2[0, t]$ for all $t > 0$. It is possible to define the
stochastic integral for the larger class of processes $f$ satisfying $\int_a^b f_t^2\,dt < \infty$ with
probability one. Restricting attention to $L^2$ is simply convenient in the sense that
a range of properties follow directly for Ito integrals defined for $L^2[a, b]$. Although
several of these properties are essential, we merely state the main property of
the Ito integral.

¹ That is, let $W_t$ be a standard Brownian motion defined on the probability space $(\Omega, \mathcal{F}, P)$
and adapted to the filtration $\{\mathcal{F}_t\}_{t \geq 0}$. It is common for textbooks on finance theory to define
the Brownian motion first and then define the filtration based on the Brownian motion, see
e.g. Bjork [1998] page 78. Either view can of course be adopted here. The important thing is
that we have a filtered probability space and a Brownian motion on this space adapted to the
filtration.
Theorem 2.1 (The Ito integral is a martingale)
For any process $f \in L^2[a, b]$ it holds that for $a \leq s < t \leq b$
$$E\left[\int_a^t f_u\,dW_u \,\middle|\, \mathcal{F}_s\right] = \int_a^s f_u\,dW_u. \tag{2.2}$$
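A small numerical sketch may help make Theorem 2.1 concrete. It approximates $\int_0^T W_u\,dW_u$ by left-endpoint sums and checks two things: an exact algebraic identity satisfied by the discrete sum on every path, and that the sample mean over many paths is close to zero, which is what the martingale property implies for an integral starting at $a = 0$. The function name, grid size, number of paths and seed are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np

def ito_sums_w_dw(dW):
    """Left-endpoint sums approximating int_0^T W_u dW_u, one per row (path) of dW."""
    n_paths = dW.shape[0]
    # Brownian paths on the grid: W_0 = 0 followed by cumulative sums of increments.
    W = np.concatenate((np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)), axis=1)
    # The integrand is evaluated at the left endpoint of each interval (Ito convention).
    return np.sum(W[:, :-1] * dW, axis=1), W

rng = np.random.default_rng(0)           # fixed seed, purely illustrative
T, n_steps, n_paths = 1.0, 200, 10_000   # hypothetical simulation settings
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

ito_sums, W = ito_sums_w_dw(dW)

# Exact identity for the discrete sum on every path:
# sum_i W_i (W_{i+1} - W_i) = (W_T^2 - sum_i (dW_i)^2) / 2
identity_rhs = 0.5 * (W[:, -1] ** 2 - np.sum(dW ** 2, axis=1))

# Monte Carlo estimate of E[int_0^T W dW], expected to be close to 0.
sample_mean = ito_sums.mean()
```

The identity is the discrete counterpart of $\int_0^T W_u\,dW_u = (W_T^2 - T)/2$, since the sum of squared increments approximates $T$. The near-zero sample mean is only a sanity check of the zero-mean consequence of (2.2), not a verification of the full conditional statement.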
The definition of the Ito integral does not guarantee that the process
$t \mapsto \int_a^t f_s\,dW_s$ has continuous sample paths. However, it is possible to prove
that there exists a stochastic process $X$ with continuous sample paths such that
$X_t = \int_a^t f_s\,dW_s$, $t \in [a, b]$, with probability one, see e.g. Theorem 3.11 in
Øksendal [1989]. In the following we will use this fact to assume that all Ito
integrals of the form $\int_a^t f_s\,dW_s$ are continuous as a function of $t$.
In the following the term diffusion refers to any stochastic process that has
continuous sample paths and has the strong Markov property: for any sequence
of stopping times² $T_0 < T_1 < \ldots < T_n$, a process $X_t$ on the filtered probability
space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, P)$ satisfies the strong Markov property if for any $s > 0$,
any $x_0, x_1, \ldots, x_n$, and any measurable set $A$,
$$P(X_{T_n+s} \in A \mid X_{T_0} = x_0, X_{T_1} = x_1, \ldots, X_{T_n} = x_n) = P(X_{T_n+s} \in A \mid X_{T_n} = x_n). \tag{2.3}$$
That is, the future and the past are independent given the present value of the
process.
The main area of interest in the following will be one-dimensional diffusions where
the state space is an interval of the form $(l, r)$ where we may have $l = -\infty$ and/or
$r = \infty$. In some cases we may want to consider closed intervals instead. When
this is the case it will be clearly stated. For now let $I$ be any interval of $\mathbb{R}$. As
usual we let $I^\circ$ denote the interior of $I$.

² A stochastic variable $T : \Omega \to [0, \infty]$ is called a stopping time with respect to the filtration
$\{\mathcal{F}_t\}_{t \geq 0}$ if $\{T \leq t\} \in \mathcal{F}_t$ for any $t \geq 0$.
We are going to focus on diffusions modelling certain financial data. For that to
be reasonable we need the process to behave nicely. Let $T_y$ be the stopping time
defined by
$$T_y = \inf\{t \geq 0 \mid X_t = y\}. \tag{2.4}$$
We then define the following important property for the diffusion $X_t$ on state
space $I$.
Definition 2.1 (Regular diffusion)
A diffusion process, $X_t$, with state space $I$ is said to be regular if
$$P(T_y < \infty \mid X_0 = x) > 0 \quad \text{for all } x \in I^\circ,\ y \in I. \tag{2.5}$$
This condition rules out that $I$ can be divided into non-communicating subsets,
since all points in $I$ can be reached with positive probability in finite time.
The fact that any diffusion has continuous sample paths implies that the
probability of a large change in the value of the process over a short period of time can
be made arbitrarily small by looking at a sufficiently small time period. That is,
define
$$h(t, s) = X_{t+s} - X_t$$
so that $h(t, s)$ is the change during a period of length $s$ after $t$. It can be shown
that any diffusion has the property that
$$\forall \varepsilon > 0 : \lim_{s \downarrow 0} P(|h(t, s)| > \varepsilon \mid X_t = x) = 0, \quad \forall x, t.$$
This gives a clear indication as to why diffusions are well suited for modelling
financial data. We would not expect any given financial asset, such as a stock,
to have a large change in price when observed over an arbitrarily short period of
time.
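Any diffusion process, $X$, can be characterized by its drift coefficient (vector) $\mu$
and diffusion coefficient (matrix) $\sigma^2$, that is, the functions $\mu(x, t)$ and $\sigma^2(x, t)$
defined by
$$\lim_{s \downarrow 0} \frac{1}{s} E[h(t, s) \mid X_t = x] = \mu(x, t) \tag{2.6}$$
$$\lim_{s \downarrow 0} \frac{1}{s} E\left[h(t, s)h(t, s)^T \mid X_t = x\right] = \sigma^2(x, t) \tag{2.7}$$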
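As a quick illustration of (2.6) and (2.7), consider the scalar Brownian motion with drift $X_t = x_0 + \mu_0 t + \sigma_0 W_t$. The constants $\mu_0$ and $\sigma_0$ are introduced here purely for illustration and do not appear elsewhere in the text. Since $W_{t+s} - W_t$ is independent of $X_t$ with mean $0$ and variance $s$, the two limits can be computed directly:

```latex
\begin{align*}
h(t,s) &= X_{t+s} - X_t = \mu_0 s + \sigma_0 (W_{t+s} - W_t), \\
\frac{1}{s} E[h(t,s) \mid X_t = x] &= \frac{\mu_0 s}{s} = \mu_0, \\
\frac{1}{s} E[h(t,s)^2 \mid X_t = x] &= \frac{\mu_0^2 s^2 + \sigma_0^2 s}{s}
  = \mu_0^2 s + \sigma_0^2 \;\longrightarrow\; \sigma_0^2 \quad \text{as } s \downarrow 0 .
\end{align*}
```

So in this simple case the drift coefficient is $\mu_0$ and the diffusion coefficient is $\sigma_0^2$. The squared drift term vanishes in the limit because it is of order $s^2$ before dividing by $s$.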
The processes we shall work with later on will be solutions to stochastic
differential equations. Note that it is not true that all such processes will be diffusions,
nor is it true that any diffusion can be obtained as a solution to some stochastic
differential equation.
When the term stochastic integral is used it will refer to a larger class of
processes than simply Ito integrals. We recall the formal definition of a stochastic
integral.
Definition 2.2 (Stochastic integral)
Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, P)$ be a filtered probability space and assume that
$W = \{W_t\}_{t \geq 0}$ is a standard Brownian motion defined on this probability space. Let
$X = \{X_t\}_{t \geq 0}$ be a one-dimensional stochastic process on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, P)$.
We say that $X$ is a stochastic integral if it is of the form
$$X_t = x_0 + \int_0^t a_s\,ds + \int_0^t b_s\,dW_s, \tag{2.8}$$
where $a$ and $b$ are adapted to the filtration $\{\mathcal{F}_t\}_{t \geq 0}$, $x_0$ is a constant and $b \in L^2$,
or $b$ satisfies any weaker condition such that the Ito integral $\int_0^t b_s\,dW_s$ is
well defined.
From this point on we will no longer explicitly state the assumption that the
processes satisfy the required integrability conditions such that the Ito integral
is well defined. Whenever we present a stochastic integral we implicitly assume
that the conditions are satisfied.
To avoid any confusion we note that we use the term Ito integral to refer to terms
of the type $\int_a^b f_t\,dW_t$, whereas the term stochastic integral refers to the more general type
of processes given by (2.8).
Following this definition we immediately introduce a less complicated
notation, such that the integral equation (2.8) will be written in the shorter
differential form
$$dX_t = a_t\,dt + b_t\,dW_t, \tag{2.9}$$
$$X_0 = x_0. \tag{2.10}$$
Using this notation we say that $X$ has the stochastic differential given by
(2.9) with the initial condition given by (2.10).
It is important to note that the expression (2.9) has no independent meaning. It
is simply shorthand for the expression in Definition 2.2.
2.2 Stochastic Differential Equations - Definition and Existence of Solutions
This section uses the stochastic integral introduced above to define stochastic
differential equations. We are going to look at conditions ensuring that a given
stochastic differential equation (SDE) has a unique solution. We also consider
conditions ensuring that this solution is a diffusion, and conditions ensuring other nice
qualities of the solution to an SDE, such as stationarity.
The results in this section are based to some extent
on Arnold [1972] and also loosely on Øksendal [1989].
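We now formally define the topic of interest. Consider
• A $k$-dimensional Brownian motion $W$
• A function $\mu : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^n$
• A function $\sigma : [0, \infty) \times \mathbb{R}^n \to \mathbb{R}^{n \times k}$
• A vector $x_0 \in \mathbb{R}^n$
where $\mathbb{R}^{n \times k}$ denotes the class of $n \times k$ matrices.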
Definition 2.3 (Stochastic Differential Equation)
Let $X$ be an $n$-dimensional stochastic process. We say that $X$ satisfies the
stochastic differential equation with initial condition $x_0$
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t \tag{2.11}$$
$$X_0 = x_0 \tag{2.12}$$
if $X$ satisfies the integral equation
$$X_t = x_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s, \quad t \geq 0. \tag{2.13}$$
Note that (2.13) is a system of integral equations for the vector
$X = (X^1, \ldots, X^n)^T$, that is, for each $X^i$ we have the equation
$$X^i_t = x^i_0 + \int_0^t \mu_i(s, X_s)\,ds + \sum_{j=1}^{k} \int_0^t \sigma_{ij}(s, X_s)\,dW^j_s, \quad t \geq 0. \tag{2.14}$$
We will often replace the initial condition (2.12) with $X_0 \overset{D}{=} Y$ for some
$\mathcal{F}_0$-measurable stochastic variable $Y$.
Following the notation from diffusion processes we call the function $\mu(\cdot, \cdot)$
the drift coefficient (function), and $\sigma(\cdot, \cdot)$ (or sometimes $\sigma^2(\cdot, \cdot)$) is called the
diffusion coefficient (function). If $\sigma(\cdot, \cdot)$ is identically equal to zero we note that
(2.11) reduces to a system of ordinary differential equations.
In some cases we can determine an explicit solution to a given SDE but in most
cases this will not be possible. However, it is still possible to state conditions
ensuring existence and uniqueness of a solution.
If, for any two processes $X$ and $Y$ both solving (2.11), we have that
$$P\left(\sup_{t \geq 0} \|X_t - Y_t\| > 0\right) = 0$$
then we say the solution to the SDE is unique or pathwise unique.
When the Brownian motion is given and we find a process $X$ adapted to the
filtration $\{\mathcal{F}_t\}_{t \geq 0}$ satisfying the SDE we say that we have a strong solution. On
the other hand, if we are given the two functions $\mu$ and $\sigma$ and are free to construct
some probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, \{\tilde{\mathcal{F}}_t\}_{t \geq 0}, \tilde{P})$, a Brownian motion $\tilde{W}$ and a process
$X$ such that
$$X_t = x_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,d\tilde{W}_s, \quad t \geq 0,$$
we then refer to $X$ as a weak solution to the SDE.
Clearly the concept of having a Brownian motion given is purely theoretical.
The phrase is simply used to indicate that two strong solutions will almost surely
have the same sample paths whereas two weak solutions are the same in the
sense that their probability laws coincide. Evidently any strong solution is also
a weak solution. However, a weak solution can be thought of as a strong solution
for an identical SDE defined with the weak solution's Brownian motion. Since
the probabilistic properties of the diffusion processes considered are the main
focus, weak solutions are sufficient in most applications. It is clear though that
the distinction between weak and strong solutions is quite subtle and, from an
empirical viewpoint, less important.
A number of different existence and uniqueness theorems for SDEs exist, all
imposing certain sufficient (but seldom necessary) Lipschitz conditions on the
functions $\mu$ and $\sigma$. We can now state the following from Arnold [1972]:
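Theorem 2.2 (Existence and uniqueness of solutions to SDEs)
Assume that there exists a constant $K$ such that for all $x, y \in I$, $t \geq 0$
$$\|\mu(t, x)\|_n^2 + \|\sigma(t, x)\|_{n \times k}^2 \leq K^2 (1 + \|x\|_n^2) \tag{2.15}$$
$$\|\mu(t, x) - \mu(t, y)\|_n + \|\sigma(t, x) - \sigma(t, y)\|_{n \times k} \leq K \|x - y\|_n \tag{2.16}$$
where $\|\cdot\|_n$ is any norm on $\mathbb{R}^n$ and $\|\cdot\|_{n \times k}$ is a matrix norm on $\mathbb{R}^{n \times k}$, e.g.
$\|\sigma\|_{n \times k} = \operatorname{tr}(\sigma\sigma^T)^{1/2}$. Let $Y$ be a stochastic variable independent of the
Brownian motion and such that
$$E\left[\|Y\|_n^2\right] < \infty. \tag{2.17}$$
Then the SDE
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$$
with initial condition $X_0 \overset{D}{=} Y$ has a unique strong solution $X$ with continuous
sample paths.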
Proof The proof is based on constructing a sequence of processes $X^0, X^1, X^2, \ldots$
from the recursive definition
$$X^{n+1}_t = X_0 + \int_0^t \mu(s, X^n_s)\,ds + \int_0^t \sigma(s, X^n_s)\,dW_s.$$
It is clear that if the limit $\lim_{n \to \infty} X^n$ exists in $L^2(\Omega, P)$ then this process $X$
would satisfy the SDE. The formal proof, which can be found on pages 42-44 in
Øksendal [1989], is based on showing that the limit does indeed exist, and the
uniqueness then follows from the Ito isometry and the Lipschitz condition (2.16).
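The recursive construction in the proof can be imitated numerically on a fixed time grid. The sketch below is only an illustration under stated assumptions: the integrals are replaced by left-endpoint sums on one simulated Brownian path, and the coefficient functions `mu` and `sigma`, the grid and the seed are hypothetical choices that satisfy (2.15) and (2.16). On such a grid the $m$-th iterate agrees with the Euler-Maruyama path up to grid point $m$, so after as many iterations as there are grid steps the two coincide.

```python
import numpy as np

def euler_path(mu, sigma, x0, dt, dW):
    """Euler-Maruyama path driven by the given Brownian increments dW."""
    x = np.empty(len(dW) + 1)
    x[0] = x0
    for k, dw in enumerate(dW):
        x[k + 1] = x[k] + mu(x[k]) * dt + sigma(x[k]) * dw
    return x

def picard_step(mu, sigma, x0, dt, dW, x_prev):
    """One discretized Picard update: insert the previous iterate into both integrals."""
    increments = mu(x_prev[:-1]) * dt + sigma(x_prev[:-1]) * dW
    return np.concatenate(([x0], x0 + np.cumsum(increments)))

# Hypothetical Lipschitz coefficients with linear growth (illustrative only).
mu = lambda x: 0.5 * (1.0 - x)
sigma = lambda x: 0.2 * x

rng = np.random.default_rng(1)            # fixed seed, purely illustrative
T, n_steps, x0 = 1.0, 100, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)

x_euler = euler_path(mu, sigma, x0, dt, dW)

x_iter = np.full(n_steps + 1, x0)         # X^0: the constant initial process
gaps = []                                 # sup-distance to the Euler path per iteration
for _ in range(n_steps):
    x_iter = picard_step(mu, sigma, x0, dt, dW, x_iter)
    gaps.append(np.max(np.abs(x_iter - x_euler)))
```

Inspecting `gaps` shows how quickly the iterates settle on this path. This finite-grid behaviour is a property of the discretization and should not be read as a proof of the $L^2$ convergence used in the theorem.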

Although the conditions in Theorem 2.2 are sufficient to ensure a unique solution, they
turn out to be too restrictive for many of the financial applications we will
use later. Consider the following example.
Example 2.1 (The CIR process)
One of the first diffusion models one meets when modelling the short interest rate in courses in continuous time financial theory is the Cox-Ingersoll-Ross (CIR) model. Let $n = k = 1$, let $r$ be a short rate and let $W$ be a one-dimensional Brownian motion. The CIR specification is then that $r$ has the following dynamics under an appropriate measure:
\[ dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad a, b, \sigma \in \mathbb{R}_+ \tag{2.18} \]
In the notation used earlier we have
\[ \mu(t, r) = a(b - r), \qquad \sigma(t, r) = \sigma\sqrt{r}. \]
In particular we note that this is a process where the drift and diffusion coefficients are independent of the time parameter $t$. This will be true for the majority of the financial models we are going to work with.
Note the slight abuse of notation, both here and in the subsequent chapters, where we allow $\sigma$ to represent both a function and a parameter. In general $\sigma(\cdot)$, i.e. with brackets, will represent a function, whereas $\sigma$ without the brackets will be a parameter.
The topic of later chapters will be how to estimate values of the parameters $a$, $b$ and $\sigma$; for now we just consider them to be fixed positive real numbers.
We also note that (2.18) is well defined for $r_t \in [0, \infty)$. We shall later state conditions on the parameters ensuring that the process $r$ satisfies
\[ P(r_t > 0 \mid r_0 > 0) = 1. \]
Clearly $r \mapsto \sigma\sqrt{r}$ does not satisfy a Lipschitz condition, since its slope $\sigma / (2\sqrt{r})$ is unbounded as $r \to 0$, so the theorem above does not guarantee that a unique solution to the CIR model exists.
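The failure of the Lipschitz condition at zero can be illustrated numerically. The following is a minimal sketch, assuming $\sigma = 1$ and comparing the diffusion coefficient at $x$ and at $0$; the function name is hypothetical and chosen for illustration only.

```python
import math

def sqrt_difference_quotient(x, sigma=1.0):
    """Return |sigma*sqrt(x) - sigma*sqrt(0)| / |x - 0| for x > 0."""
    return abs(sigma * math.sqrt(x)) / abs(x)

# The quotient equals sigma / sqrt(x), so no single Lipschitz constant K
# can bound it for all x close to zero.
quotients = [sqrt_difference_quotient(10.0 ** (-k)) for k in range(0, 9, 2)]
for k, q in zip(range(0, 9, 2), quotients):
    print(f"x = 1e-{k}: quotient = {q:.1f}")
```

The quotients grow like $1/\sqrt{x}$, which is exactly why a global (or local, near zero) Lipschitz bound cannot hold.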
However, the CIR model would hardly be an interesting financial model if there were no unique solution to (2.18). Although no explicit formula for $r_t$ in terms of $W_t$ can be found, we shall see below that we can relax the conditions in Theorem 2.2 enough to include the CIR SDE in the class of stochastic differential equations for which a solution exists.
It is common to see the CIR process parameterized by
\[ dr_t = (\alpha + \beta r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad \alpha > 0,\ \beta < 0,\ \sigma > 0 \]
which is clearly just a different way of stating the same model. This alternative parametrization has the advantage, when discussing estimation, that by the separation of $\alpha$ and $\beta$ in a sum we avoid working with the product of two parameters, in the sense of the term $ab$ in the first formulation. However, in many financial applications it is natural to use the first parametrization as this clearly states the coefficient of mean reversion $a$ as well as the long term mean $b$; see more about this below. We therefore maintain the original formulation of the model in the following (also in matters of parameter estimation). We can of course easily get from one parametrization to the other by $a = -\beta$ and $ab = \alpha$.
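The change of parametrization can be written as a pair of small helper functions. This is only a sketch: the names `alpha` and `beta` for the intercept and slope of the linear drift are assumptions made for illustration, as are the function names.

```python
def to_linear_drift(a, b):
    """Map (a, b) to (alpha, beta) so that a*(b - r) == alpha + beta*r."""
    return a * b, -a

def to_mean_reverting(alpha, beta):
    """Map (alpha, beta) back to (a, b); requires beta != 0."""
    return -beta, -alpha / beta

alpha, beta = to_linear_drift(2.0, 3.0)
a, b = to_mean_reverting(alpha, beta)
print(alpha, beta, a, b)
```

The round trip recovers the original pair, which reflects that the two formulations describe the same model.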
We start our quest for weaker conditions by noting that the global Lipschitz condition in Theorem 2.2 can be replaced by a local Lipschitz condition, see Arnold [1972], page 112. Whenever there is no doubt about the dimensions of a given variable we will omit the subscripts $n$ or $n \times k$ on the norms used in the following.
Theorem 2.3 (Existence and uniqueness, weaker conditions)
The results of Theorem 2.2 are still valid if we replace the Lipschitz condition with the weaker condition: for every $N > 0$ there exists a constant $K_N$ such that for all $|x| \leq N$, $|y| \leq N$ and all $t$:
\[ |\mu(t,x) - \mu(t,y)| + |\sigma(t,x) - \sigma(t,y)| \leq K_N |x - y| \tag{2.19} \]
Even these weaker conditions do not cover the CIR model, or other square root diffusion models for that matter. However, if we limit ourselves to the one-dimensional case $n = k = 1$ we can quote a much weaker condition due to Yamada and Watanabe [1971]. This result has been reported in e.g. Duffie [1992], pages 240-241.
Theorem 2.4 (Yamada and Watanabe)
Assume $n = k = 1$. Sufficient conditions for the existence and uniqueness of a strong solution to the SDE are that $\mu$ is continuous and satisfies a Lipschitz condition in $x$, and that $\sigma$ is continuous with the property that
\[ |\sigma(t,x) - \sigma(t,y)| \leq \rho(|x - y|), \quad \forall x, y, t. \tag{2.20} \]
Here $\rho : [0, \infty) \to [0, \infty)$ is a strictly increasing function with $\rho(0) = 0$ such that:
\[ \int_0^z \rho(x)^{-2}\,dx = +\infty, \quad \forall z > 0. \tag{2.21} \]
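Example 2.2 (The CIR process continued)
There is a unique solution to the CIR model for an appropriate initial condition: we can use Theorem 2.4 with $\rho(x) = \sigma\sqrt{x}$. This clearly satisfies the conditions from Theorem 2.4, since $|\sqrt{x} - \sqrt{y}| \leq \sqrt{|x - y|}$ for $x, y \geq 0$ and $\int_0^z \frac{1}{\sigma^2 x}\,dx = +\infty$ for every $z > 0$.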
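The two properties used in this example can be checked numerically. The sketch below assumes $\rho(x) = \sigma\sqrt{x}$ with $\sigma = 1$; it tests the square root bound on randomly drawn pairs and evaluates the integral of $\rho(x)^{-2}$ over $[\varepsilon, z]$ in closed form. The function names and the sampling range are illustrative assumptions.

```python
import math
import random

def holder_gap(x, y):
    """sqrt(|x - y|) - |sqrt(x) - sqrt(y)|; non-negative when the bound holds."""
    return math.sqrt(abs(x - y)) - abs(math.sqrt(x) - math.sqrt(y))

rng = random.Random(0)
worst_gap = min(
    holder_gap(rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
    for _ in range(10000)
)

def rho_integral(eps, z=1.0, sigma=1.0):
    """Integral of 1 / (sigma**2 * x) over [eps, z], i.e. log(z/eps) / sigma**2."""
    return math.log(z / eps) / sigma ** 2

print("smallest gap found:", worst_gap)
for eps in (1e-3, 1e-6, 1e-12):
    print(f"eps = {eps:g}: integral = {rho_integral(eps):.2f}")
```

The integral grows without bound as the lower limit shrinks, which is the divergence required in (2.21).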
It should be noted that, as stated in Duffie [1992], even though the conditions above can be weakened even further, there is a limit to the class of SDEs for which we can guarantee the existence of a unique strong solution. There exist counterexamples to the uniqueness of the solution to a CIR-like process where we replace the diffusion coefficient with $\sigma(x) = |x|^{\gamma}$ for $\gamma < \frac{1}{2}$. However, this does not imply that all SDEs with this diffusion function permit no strong solution. In particular, if we restrict our attention to a class of SDEs containing most interest rate models, including the CIR model, we can state slightly different conditions for existence and uniqueness. In fact, we note from Aït-Sahalia [1996a], pages 550-551, that for the special case of positive, time-invariant, one-dimensional SDEs, local Lipschitz and growth conditions on compact subsets not containing zero are sufficient for the pathwise uniqueness of the solution.
Theorem 2.5 (Local Lipschitz and growth conditions)
Let $n = k = 1$ and let the state space of the process be $I = (0, \infty)$. Let the SDE be time-invariant such that
\[ \mu(t, x) = \mu(x), \qquad \sigma(t, x) = \sigma(x). \]
Sufficient conditions for pathwise uniqueness of the solution are that for each compact subset of $I$ of the form $K = [1/R, R]$, $R > 0$, there exist constants $N_{R,1}$ and $N_{R,2}$ such that for all $x, y \in K$
\[ |\mu(x)| + |\sigma(x)| \leq N_{R,1}(1 + |x|) \tag{2.22} \]
\[ |\mu(x) - \mu(y)| + |\sigma(x) - \sigma(y)| \leq N_{R,2}|x - y|. \tag{2.23} \]
This theorem can be used to show that sufficient conditions for the existence of a unique strong solution (up to possibly an explosion, see below) are that the drift and diffusion functions have $s \geq 2$ continuous derivatives on $I = (0, \infty)$ and $\sigma(x) > 0$, $\forall x \in I$. See Aït-Sahalia [1996a], pages 550-551, for a discussion of these weaker existence conditions. Clearly these results apply to a much smaller class of processes than those above (one-dimensional, positive and time-invariant), but for practical purposes this class is sufficient when modelling, for instance, interest rate processes.
It is now natural to consider the properties of the solution to an SDE in the cases where one exists.
From Arnold [1972] Theorem 9.3.1 or from Øksendal [1989] Theorem 7.6 we immediately note that the conditions guaranteeing the existence and uniqueness of the solution also guarantee that the solution will be a diffusion in the sense that it has the strong Markov property. We have already seen that the solution will have continuous sample paths. We summarize in the following
Theorem 2.6 (The solution as diffusion processes)
Assume that the conditions from the existence and uniqueness theorems are satisfied. If $\mu(t,x) \in \mathbb{R}^n$ and $\sigma(t,x) \in \mathbb{R}^{n \times k}$ are continuous in $t$ then the solution to the stochastic differential equation is an $n$-dimensional diffusion process with drift vector $\mu(t,x)$ and diffusion matrix $\sigma^2(t,x) = \sigma(t,x)\sigma(t,x)^T$.
In the case of time-invariant drift and diffusion coefficients (like those of the CIR model) where we have
\[ \mu(t, x) = \mu(x), \qquad \sigma(t, x) = \sigma(x) \]
the solution to the SDE will always be a homogeneous diffusion process in the sense that the distribution of $X_t$ given $X_s$ for $s < t$ only depends on $(t, s)$ through the difference $t - s$. Again, as mentioned above, the majority of the economic models used to describe financial assets will have this quality.
2.3 Transition Probabilities
We let $X$ be the solution to the SDE
\[ dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \quad X_0 \overset{D}{=} Y. \]
That is, $X$ is an $n$-dimensional diffusion with values in the space $(\mathbb{R}^n, \mathcal{B}^n)$ where $\mathcal{B}^n$ is the Borel algebra on $\mathbb{R}^n$. As discussed earlier $X$ might only take values on a subset $I \subset \mathbb{R}^n$ with positive probability³. If this is the case we could just work with the restriction of $\mathcal{B}^n$ to $I$.
For $0 < s < t$ we have, by the Markov property, that the conditional distribution of $X_t$ given that $X_s = x$ is independent of the initial condition (that is, independent of the initial distribution of $Y$). We define the functions $p_{t,s}(\cdot, \cdot) : \mathbb{R}^n \times \mathcal{B}^n \to [0, 1]$ by
\[ p_{t,s}(x, A) = P(X_t \in A \mid X_s = x), \quad A \in \mathcal{B}^n,\ x \in \mathbb{R}^n \tag{2.24} \]
such that
• $x \mapsto p_{t,s}(x, A)$ is measurable for all $A \in \mathcal{B}^n$
• $A \mapsto p_{t,s}(x, A)$ is a probability measure on $(\mathbb{R}^n, \mathcal{B}^n)$ for all $x \in \mathbb{R}^n$.
When the diffusion process is homogeneous, which for instance would be the case for time-invariant drift and diffusion parameters in the SDE, we have that $p_{t,s}$ only depends on the values $(t, s)$ through the difference $t - s$.
In some simple cases it is possible to solve an SDE explicitly and derive an expression for the transition probabilities $p_{t,s}$. Even if an explicit solution for a given SDE cannot be obtained it is still possible to derive certain properties of the transition probabilities. In particular, the Kolmogorov differential equations can sometimes provide means of solving for explicit expressions for the transition densities, see for instance Björk [1998] section 4.6. We mentioned above that the CIR model does not have an explicit formula for $r_t$ in terms of $W_t$, but a formula for the transition probabilities can in fact be found, see formula 18 in the original paper Cox et al. [1985].
³ This will be the case for most interest rate models, for instance, where we almost always have $P(r_t < 0 \mid r_s > 0) = 0$, $s < t$.
However, for the majority of cases this is not possible. Even though we know that $p_{t,s}$ exists we can find no analytic expression. Clearly this will cause problems, since when we attempt to model any process by using stochastic differential equations we will do so based on discretely made observations. That is, we have observed $X_{t_0}, X_{t_1}, \ldots, X_{t_N}$ and from these observations we wish to estimate some parameter $\theta$, where $\mu$ and $\sigma$ depend on $\theta$. If the transition probabilities were known, we could consider the density $f_{s,t}(x, y, \theta)$, say, of $p_{t,s}(x, dy)$ for the parameter $\theta$. The likelihood function in this case is
\[ L(\theta) = \prod_{i=1}^{N} f_{t_{i-1}, t_i}\left(X_{t_{i-1}}, X_{t_i}, \theta\right). \]
Inference about the parameter $\theta$ could then be based on maximum likelihood estimation. These issues will be discussed in greater detail in the subsequent chapters.
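To make the product structure of the likelihood concrete, the sketch below evaluates a log-likelihood for the CIR model on equidistant observations. It is not the exact likelihood: the unknown transition density is replaced by a Gaussian (Euler) approximation, which is an assumption introduced here purely for illustration, and the observations are made-up numbers.

```python
import math

def euler_log_likelihood(obs, dt, a, b, sigma):
    """Gaussian (Euler) approximation to the CIR log-likelihood.

    Each transition from x_prev to x_next over a step dt is approximated by a
    normal law with mean x_prev + a*(b - x_prev)*dt and variance
    sigma**2 * x_prev * dt. The log-likelihood is the sum over transitions.
    """
    total = 0.0
    for x_prev, x_next in zip(obs[:-1], obs[1:]):
        mean = x_prev + a * (b - x_prev) * dt
        var = sigma ** 2 * x_prev * dt
        total += -0.5 * math.log(2.0 * math.pi * var)
        total -= (x_next - mean) ** 2 / (2.0 * var)
    return total

# Hypothetical observations hovering around 3, sampled with dt = 0.01.
observations = [3.0, 3.1, 2.9, 3.0]
ll_near = euler_log_likelihood(observations, 0.01, a=2.0, b=3.0, sigma=1.0)
ll_far = euler_log_likelihood(observations, 0.01, a=2.0, b=10.0, sigma=1.0)
print(ll_near, ll_far)
```

On this toy sample the approximate log-likelihood is larger for a long term mean close to the data than for one far from it, which is the behaviour a likelihood-based estimator exploits.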
For later use we introduce the transition operator based on the transition probabilities.
Let $B(\mathbb{R}^n)$ be the space of bounded measurable functions defined on $\mathbb{R}^n$. For any $f \in B(\mathbb{R}^n)$ we define the transition operator $T_{s,t} : B(\mathbb{R}^n) \to B(\mathbb{R}^n)$ by
\[ T_{s,t}(f)(x) = \int_{\mathbb{R}^n} f(y)\,p_{s,t}(x, dy) \tag{2.25} \]
A natural relationship between $p_{s,t}$ and $T_{s,t}$ is the obvious formula
\[ p_{s,t}(x, A) = T_{s,t}(1_A)(x). \]
Again we note that for a homogeneous process $T_{s,t}$ depends on $(s, t)$ only through $t - s$.
2.4 Ergodicity of the Solution
A natural question at this time would be to ask what we can say about the distribution of a solution to a given SDE. We have already seen sufficient conditions for a solution to exist and we have established conditions ensuring that this solution will be a diffusion process. Before we can answer the question of when a solution can be given an initial distribution such that it is stationary, we need to define a number of concepts associated with stochastic differential equations and diffusion processes.
It is clear that for any Markov process to be stationary it must be homogeneous; this will be the case for the solution to an SDE with time-invariant drift and diffusion functions. We will restrict the following to the one-dimensional SDE
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t \tag{2.26} \]
We will work in the following setting:
• $\mu(\cdot)$ and $\sigma(\cdot)$ only depend on time through the state variable $x$.
• $\mu$ and $\sigma$ are continuous and satisfy conditions such that a unique strong solution exists.
• $\sigma(x) > 0$, $\forall x$.
From Itô's formula we note a close connection between SDEs and partial differential equations. We therefore introduce the partial differential operator $\mathcal{A}$.
Definition 2.4 (The Infinitesimal Operator)
For any function $f : \mathbb{R} \to \mathbb{R}$ where $f$ is a $C^2$ function the infinitesimal operator for the SDE in (2.26) is defined by
\[ \mathcal{A}f(x) = \mu(x)\frac{\partial f}{\partial x}(x) + \frac{1}{2}\sigma^2(x)\frac{\partial^2 f}{\partial x^2}(x) \tag{2.27} \]
If $X_t$ is a solution to the time-invariant SDE (2.26), we note that Itô's formula for the dynamics of a smooth transformation of $X_t$ is simply expressed using the operator $\mathcal{A}$:
Let $f : \mathbb{R} \to \mathbb{R}$ be $C^2$ (that is, twice differentiable with continuous second derivative) and define $Z_t = f(X_t)$. The process $Z_t$ has the stochastic differential given by
\[ dZ_t = \mathcal{A}f(X_t)\,dt + \sigma(X_t)\frac{\partial f}{\partial x}(X_t)\,dW_t. \tag{2.28} \]
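As an illustration of Definition 2.4, the operator can be evaluated numerically with central finite differences. The sketch below assumes CIR coefficients with the illustrative values $a = 2$, $b = 3$, $\sigma = 1$ and the test function $f(x) = x^2$, for which (2.27) gives $2a(b - x)x + \sigma^2 x$ in closed form. The function names and the step size are assumptions.

```python
import math

def generator(f, mu, sigma, x, h=1e-4):
    """Approximate mu(x) f'(x) + 0.5 sigma(x)^2 f''(x) by central differences."""
    first = (f(x + h) - f(x - h)) / (2.0 * h)
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2
    return mu(x) * first + 0.5 * sigma(x) ** 2 * second

# Illustrative CIR coefficients (parameter values are assumptions).
a, b, s = 2.0, 3.0, 1.0
mu = lambda x: a * (b - x)
sigma = lambda x: s * math.sqrt(x)

x = 1.5
numeric = generator(lambda y: y ** 2, mu, sigma, x)
closed_form = 2.0 * a * (b - x) * x + s ** 2 * x
print(numeric, closed_form)
```

The finite difference value agrees with the closed form up to discretisation and rounding error.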
For the transition operator $T_{s,t}$ defined above we can derive $\mathcal{A}$ by the following theorem.
Theorem 2.7 (Relationship between $\mathcal{A}$ and $T_{s,t}$)
For any bounded function $f : \mathbb{R} \to \mathbb{R}$ where $f$ is a $C^2$ function and such that $T_{s,t}(f)$ is also $C^2$ it holds that
\[ \mathcal{A}f = \lim_{t \downarrow s} \frac{1}{t - s}\left(T_{s,t}(f) - f\right) \tag{2.29} \]
Proof This theorem is a special case of Theorem 3, page 293, in Gihman and Skorohod [1972] where the more general case of a multi-dimensional SDE with time varying drift and diffusion functions is treated.
It should be noted that many presentations define the infinitesimal operator by (2.29) and then derive the expression (2.27) for the stochastic differential equation in question. This is the case in Arnold [1972]. The results are of course true whichever way we choose to define $\mathcal{A}$.
2.4.1 Scale functions and speed measures
We know from Theorem 2.1 that, given an integrability condition, any Itô stochastic integral is a martingale. This result can be strengthened to give that a solution to an SDE is a martingale (assuming enough integrability) if and only if the drift function is identically equal to zero, see Björk [1998] Lemma 3.9.
If we wish to find a function $s : \mathbb{R} \to \mathbb{R}$, $s \in C^2$, such that $Y_t = s(X_t)$ is a martingale, Itô's lemma gives
\[ dY_t = \left( \mu(X_t)\frac{\partial s}{\partial x}(X_t) + \frac{1}{2}\sigma^2(X_t)\frac{\partial^2 s}{\partial x^2}(X_t) \right) dt + \sigma(X_t)\frac{\partial s}{\partial x}(X_t)\,dW_t \]
This means that $Y_t$ is a martingale if and only if $\mathcal{A}s = 0$.
We have assumed that $\sigma(x) > 0$. If $\frac{\mu(\cdot)}{\sigma^2(\cdot)}$ is integrable, the solution to the differential equation $\mathcal{A}s = 0$ with respect to $s' = \frac{\partial s}{\partial x}$ is
\[ s'(x) = K \exp\left( -2 \int_{x_0}^{x} \frac{\mu(z)}{\sigma^2(z)}\,dz \right) \tag{2.30} \]
where $x_0$ is a fixed value in the interior of the range of $X$, $I^\circ$, and $K$ is a constant. We need two conditions $s(x_1) = s_1$ and $s(x_2) = s_2$ to determine a unique expression for $s$.
Functions $s$ determined by (2.30) are called scale functions, and measures with density proportional to $\exp\left( -2 \int_{x_0}^{x} \frac{\mu(z)}{\sigma^2(z)}\,dz \right)$ with respect to the Lebesgue measure are called scale measures. That is, with slight risk of confusion, we shall refer to $s$ as a scale function and to $s' = \frac{\partial s}{\partial x}$ as the density of the scale measure.
Since $s$ is clearly monotone, it is invertible and for $J = s(I^\circ)$ we have that $s^{-1} : J \to I^\circ$ satisfies
\begin{align*}
dY_t &= ds(X_t) \\
&= \sigma(X_t)\,s'(X_t)\,dW_t \\
&= \sigma\left(s^{-1}(Y_t)\right) s'\left(s^{-1}(Y_t)\right) dW_t \\
&= a(Y_t)\,dW_t
\end{align*}
where $a : J \to \mathbb{R}$ is given by $a(y) = \sigma\left(s^{-1}(y)\right) s'\left(s^{-1}(y)\right)$.
If we let $a, b, x \in I^\circ$ such that $a < x < b$ and define the first time the process $X$ reaches either end-point of the interval $[a, b]$ by
\[ \tau_{ab} = \inf\{ t \geq 0 \mid X_t \in \{a, b\} \} \]
then if we assume that $X_t$ is regular it follows that
\[ P(\tau_{ab} < \infty \mid X_0 = x) = 1. \]
By the fact that $Y$ is a martingale, so that $s(x) = E[s(X_{\tau_{ab}}) \mid X_0 = x]$, it follows quite easily that the probability of $X$ reaching $b$ before $a$ is given by
\[ P(X_{\tau_{ab}} = b \mid X_0 = x) = \frac{s(x) - s(a)}{s(b) - s(a)}. \]
For the scale function $s$ given by $s'(x) = \exp\left( -2 \int_{x_0}^{x} \frac{\mu(z)}{\sigma^2(z)}\,dz \right)$ we introduce the density of the speed measure by the function $m$
\[ m(x) = \frac{1}{\sigma^2(x)\,s'(x)}. \tag{2.31} \]
The scale function and the speed measure turn out to be important in deriving conditions ensuring that a solution to a given SDE will be both bounded and ergodic.
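The scale density, the speed density and the exit probability above can be approximated by numerical quadrature. The following is a rough sketch with $K = 1$ and $x_0 = 1$; the composite Simpson rule, the function names and the CIR parameter values $a = 2$, $b = 3$, $\sigma = 1$ are all assumptions made for illustration.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def scale_density(mu, sigma, x, x0=1.0):
    """s'(x) = exp(-2 * integral from x0 to x of mu(z)/sigma(z)^2 dz), K = 1."""
    return math.exp(-2.0 * simpson(lambda z: mu(z) / sigma(z) ** 2, x0, x))

def speed_density(mu, sigma, x):
    """m(x) = 1 / (sigma(x)^2 * s'(x))."""
    return 1.0 / (sigma(x) ** 2 * scale_density(mu, sigma, x))

def hit_upper_first(mu, sigma, x, lo, hi):
    """P(X reaches hi before lo | X_0 = x) = (s(x) - s(lo)) / (s(hi) - s(lo))."""
    sp = lambda y: scale_density(mu, sigma, y)
    return simpson(sp, lo, x) / simpson(sp, lo, hi)

# Driftless case: s is linear, so the probability is (x - lo) / (hi - lo).
p_driftless = hit_upper_first(lambda z: 0.0, lambda z: 1.0, 1.5, 1.0, 3.0)

# Illustrative CIR coefficients (parameter values are assumptions).
cir_mu = lambda z: 2.0 * (3.0 - z)
cir_sigma = lambda z: math.sqrt(z)
p_cir = hit_upper_first(cir_mu, cir_sigma, 2.0, 1.0, 4.0)
print(p_driftless, p_cir)
```

In the driftless case the scale function is linear and the familiar gambler's ruin probability is recovered, which gives a simple sanity check of the quadrature.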
2.4.2 Bounded Solutions
Assume that the solution to the SDE, $X$, is defined on the open interval $I = (l, r)$ where we may have $l = -\infty$ and/or $r = +\infty$. We define the first time the process reaches either boundary by
\[ \tau = \inf\{ t \geq 0 \mid X_t \in \{l, r\} \} = \inf\{ t \geq 0 \mid X_t = l \text{ or } X_t = r \} \tag{2.32} \]
We define the following two integrals, which may not be finite
\[ I_1(x) = \int_{x_0}^{x} s'(y)\,dy = \int_{x_0}^{x} \exp\left( -2 \int_{x_0}^{y} \frac{\mu(z)}{\sigma^2(z)}\,dz \right) dy \]
\[ I_2(x) = \int_{x}^{x_0} s'(y)\,dy = \int_{x}^{x_0} \exp\left( -2 \int_{x_0}^{y} \frac{\mu(z)}{\sigma^2(z)}\,dz \right) dy \]
where $x_0 \in (l, r)$ is fixed.
It now follows directly from Theorem 1, Chapter 4, section 16 in Gihman and Skorohod [1972] that we can state conditions ensuring that $X$ does not hit the boundary in finite time.
Theorem 2.8 (Bounded Solutions)
Assume that the coefficients of the SDE are time-invariant and such that a unique strong solution exists. If $I_1(r) = I_2(l) = +\infty$ then
\[ P(\tau = +\infty \mid X_0 = x) = 1, \quad \forall x \in I \]
2.4.3 Conditions for an Ergodic Solution
As noted above a necessary condition for the existence of an invariant distribution is that the process is homogeneous, which is guaranteed by the time-invariant drift and diffusion functions.
An equally important condition is that the process stays finite with probability one, where finite is meant in the sense of Theorem 2.8.
If we impose the restriction that the drift function should be identically zero we know from above that the solution (when one exists) is a martingale. We first restrict ourselves to this case and then use the scale function as defined above to transfer the results to the general case; the details are given below.
Consider the one-dimensional stochastic differential equation
\[ dY_t = a(Y_t)\,dW_t. \]
It turns out that working with this stochastic differential equation will simplify things considerably. In fact we can now quote from Chapter 4, section 18 of Gihman and Skorohod [1972]. Assume that
• $a(\cdot)$ satisfies a Lipschitz condition such that a unique strong solution $Y$ exists.
• the condition in Theorem 2.8 is satisfied.
• $\int_l^r \frac{1}{a^2(s)}\,ds < \infty$.
• $\pi_Y$ is a probability density proportional to $\frac{1}{a^2(y)}$.
Assume also that the process $Y$ satisfies the initial condition that $Y_0$ is distributed according to the density $\pi_Y$. Then $Y$ is stationary and ergodic and the invariant distribution has density $\pi_Y$ with respect to the Lebesgue measure.
For any initial condition $Y_0 = y_0$ where $l < y_0 < r$ the distribution given by the density $\pi_Y$ is a limiting distribution in the sense that
\[ \lim_{t \to \infty} P(Y_t \leq y \mid Y_0 = y_0) = \int_l^y \pi_Y(z)\,dz. \]
We are now ready to state the general case.
Theorem 2.9 (Ergodic Solution)
Consider the stochastic differential equation
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t \]
• Assume that a unique strong solution exists.
• Assume that the conditions in Theorem 2.8 hold, that is
\[ \int_{x_0}^{r} \exp\left( -2 \int_{x_0}^{y} \frac{\mu(z)}{\sigma^2(z)}\,dz \right) dy = \int_{l}^{x_0} \exp\left( -2 \int_{x_0}^{y} \frac{\mu(z)}{\sigma^2(z)}\,dz \right) dy = \infty \]
• Assume that $\int_l^r m(x)\,dx < \infty$ where $m(\cdot)$ is the speed density defined above.
Then a probability distribution is given by the density (with respect to the Lebesgue measure) proportional to the speed measure
\[ \pi(x) = \frac{K}{\sigma^2(x)} \exp\left( \int_{x_0}^{x} \frac{2\mu(z)}{\sigma^2(z)}\,dz \right) \tag{2.33} \]
where $K$ is a constant such that $\int_l^r \pi(x)\,dx = 1$.
Let $X_0$ be distributed according to the density $\pi$. Then $X$ is stationary and ergodic with an invariant measure with density $\pi$.
It holds for any $x_0 \in (l, r)$ that
\[ \lim_{t \to \infty} P(X_t \leq x \mid X_0 = x_0) = \int_l^x \pi(z)\,dz. \]
Proof All the work has been done in Chapter 4, section 18 in Gihman and Skorohod [1972]. Define $Y_t = s(X_t)$ where $s$ is the scale function for the SDE. Since $s$ is continuous and monotone the process $Y$ is well defined and the distributional properties of interest exist simultaneously for $X$ and $Y$, see page 135 in Gihman and Skorohod [1972].
We already know that $Y$ follows the SDE
\[ dY_t = a(Y_t)\,dW_t \]
with $a(Y_t) = \sigma\left(s^{-1}(Y_t)\right) s'\left(s^{-1}(Y_t)\right)$.
We have by the substitution $y = s(x)$, $dy = s'(x)\,dx$, that
\begin{align*}
\int_{s(l)}^{s(r)} \frac{1}{a^2(y)}\,dy &= \int_{s(l)}^{s(r)} \frac{1}{\left( \sigma\left(s^{-1}(y)\right) s'\left(s^{-1}(y)\right) \right)^2}\,dy \\
&= \int_l^r \frac{1}{\sigma^2(x)\,s'(x)}\,dx = \int_l^r m(x)\,dx < \infty.
\end{align*}
Therefore $Y$ is ergodic with stationary distribution given by the density function
\[ \pi_Y(y) = \frac{K}{a^2(y)}, \quad K \text{ constant} \]
by the theorem in Gihman and Skorohod [1972] mentioned above.
Now all that is left is to determine an expression for the density of the invariant distribution for $X_t = s^{-1}(Y_t)$. By the rule of transformation of densities, see Theorem 10.3 in Hansen [2001], we have the invariant density of $X$
\begin{align*}
\pi(x) &= \pi_Y\left( (s^{-1})^{-1}(x) \right) \left| \left( (s^{-1})^{-1} \right)'(x) \right| \\
&= \pi_Y(s(x))\,|s'(x)| \\
&= \frac{K}{a^2(s(x))}\,s'(x) \\
&= \frac{K}{\sigma^2(x)\,(s'(x))^2}\,s'(x) \\
&= \frac{K}{\sigma^2(x)} \exp\left( \int_{x_0}^{x} \frac{2\mu(z)}{\sigma^2(z)}\,dz \right)
\end{align*}
where we have used the definition of $a(\cdot)$ and the functional form of the derivative of $s$. We note that the invariant density is proportional to the density of the speed measure, hence the condition that the function $m(\cdot)$ is integrable is natural as it ensures that it can be scaled to be a probability density.
2.4.4 Exponential Ergodicity
When working with discrete time homogeneous Markov chains a useful and well known tool to prove stationarity and ergodicity is the drift criterion. Using the drift criterion in discrete time it is possible to derive conditions under which the Markov chain is not only ergodic but geometrically ergodic. That is, let $X_n$ be a discrete time Markov chain with state space $\mathcal{X}$, let $P_n(A|x) = P(X_n \in A \mid X_0 = x)$ be the $n$ step transition probability and $\pi$ the invariant measure. The discrete time drift criterion then provides conditions for which it holds that
\[ \lim_{n \to \infty} \rho^{-n}\,\| P_n(\cdot|x) - \pi \| = 0, \quad \forall x \in \mathcal{X} \tag{2.34} \]
where $|\rho| < 1$ and $\|f\| = \sup\left\{ \left| \int_{\mathcal{X}} g(x)\,df(x) \right| : |g(x)| \leq 1 \right\}$.
Similar results can be obtained for continuous time Markov chains. These results may seem somewhat more complicated to work with than their discrete time counterparts. In this section we attempt to use the results from the general continuous time Markov chain theory to examine the case where the Markov process in question is known to be a solution to a stochastic differential equation. The author is not aware of any literature where the continuous time Markov chain results are used to derive conditions for the class of Markov chains that are solutions to stochastic differential equations.
Necessary continuous time Markov chain theory
We start by introducing the necessary continuous time Markov chain theory. For simplicity, and because we aim to use the results on real valued stochastic differential equations, we consider the case of a Markov process with values in $\mathbb{R}$.
Let $X_t$ be a continuous time non-explosive Markov chain with state space $\mathcal{X} \subseteq \mathbb{R}$. As in discrete time we let $P_t(A|x) = p_{t,0}(x, A) = P(X_t \in A \mid X_0 = x)$. We seek conditions implying that $X_t$ is exponentially ergodic, that is, an invariant measure $\pi$ exists such that
\[ \| P_t(\cdot|x) - \pi \| \leq M(x)\rho^t, \quad \forall t \geq 0,\ x \in \mathcal{X} \tag{2.35} \]
where $M(x)$ is finite and $|\rho| < 1$.
As shown in Down et al. [1995] the process may converge exponentially quickly in the strong sense of V-uniform ergodicity:
For any measurable function $V : \mathcal{X} \to [1, \infty)$ Down et al. [1995] define exponential ergodicity by V-uniform ergodicity, which is given by
\[ \| P_t(\cdot|x) - \pi \|_V \leq V(x) D \rho^t, \quad \forall t \geq 0,\ x \in \mathcal{X} \tag{2.36} \]
where $D < \infty$ is a constant, $|\rho| < 1$ and the V-norm $\| \cdot \|_V$ is defined for a measure $\nu$ by
\[ \|\nu\|_V = \sup\left\{ \left| \int_{\mathcal{X}} g(x)\,d\nu(x) \right| : |g| \leq V \right\} \tag{2.37} \]
We thus note that the V-norm is equal to the total variation norm when $V \equiv 1$.
A final definition needed before we are able to state and prove the main result of this section is the extended generator. Let $f : \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$ and assume that a measurable function $g : \mathcal{X} \times \mathbb{R}_+ \to \mathbb{R}$ exists such that
\[ E[f(X_t, t) \mid X_0 = x] = f(x, 0) + E\left[ \int_0^t g(X_s, s)\,ds \,\Big|\, X_0 = x \right] \tag{2.38} \]
\[ \int_0^t E\left[ |g(X_s, s)| \,\Big|\, X_0 = x \right] ds < \infty. \tag{2.39} \]
In this case we write $\tilde{\mathcal{A}}f = g$ and $\tilde{\mathcal{A}}$ is called the extended generator of the Markov process $X_t$. For a given Markov chain we define $D(\tilde{\mathcal{A}})$ to be the set of all functions $f$ for which a function $g$ as above exists. $D(\tilde{\mathcal{A}})$ is referred to as the domain of $\tilde{\mathcal{A}}$. Clearly the name extended generator hints at the fact that this definition is an extension of the definition of an infinitesimal generator for general Markov chains. As we shall see below the same is true when we turn our attention back to stochastic differential equations. Here we have a well defined expression for the infinitesimal generator, and the relationship between this generator and the extended generator will be explored below. We note that the extended generator is only defined for non-explosive (i.e. bounded) processes.
Assume that the Markov chain satisfies the regularity conditions: $\psi$-irreducibility and aperiodicity (see pages 1674-1675 in Down et al. [1995] for a recapitulation of these issues).
We can now state from Down et al. [1995], Theorem 5.2 (c): Let $b > 0$, $c > 0$ be constants, let $V : \mathcal{X} \to [1, \infty)$ be a real valued function and let $C$ be a petite⁴ Borel set on $\mathcal{X}$.
If the drift condition
\[ \tilde{\mathcal{A}}V(x) \leq -cV(x) + b1_C(x) \tag{2.40} \]
is satisfied then $X_t$ is V-uniformly ergodic.
Implementation on SDE setting
We now attempt to use the theory presented above on the particular case where the Markov process is a weak solution to the stochastic differential equation with state space $\mathcal{X} \subseteq \mathbb{R}$
\[ dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t. \tag{2.41} \]
To simplify the presentation we maintain the following assumptions for the rest of this section.
⁴ The definition of a petite set is similar to that of a small set, see page 1674 of Down et al. [1995]. Given continuity conditions on $P(\cdot|x)$ it follows that all compact sets are petite, Tweedie and Pollard [1976]. When this is the case we refer to the Markov chain as a T-chain.
Assumption 2.1
• Assume that a weak solution, $X_t$, to (2.41) exists.
• Assume that $X_t$ is bounded.
• Assume also that $X_t$ is an irreducible T-chain.
These assumptions thus impose restrictions on the processes for which the following is true. In particular we note that the T-chain assumption is similar to the usual regularity conditions, including continuous transition densities, from the discrete time drift criterion.
We can now show the following important result, which simply states that for sufficiently smooth functions we can state an explicit expression satisfying the conditions for the extended generator.
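Lemma 2.1 (The extended generator)
Let $\mathcal{A}$ be the infinitesimal operator for the process $X_t$ as given by Definition 2.4. Let $f : \mathbb{R} \to \mathbb{R}$ be a $C^2$ function, such that
\[ \int_0^t E\left[ \left| \mu(X_s)\frac{\partial f}{\partial x}(X_s) \right| \,\Big|\, X_0 = x \right] ds < \infty, \qquad \int_0^t E\left[ \left| \sigma^2(X_s)\frac{\partial^2 f}{\partial x^2}(X_s) \right| \,\Big|\, X_0 = x \right] ds < \infty \tag{2.42} \]
then
\[ \mathcal{A}f = \tilde{\mathcal{A}}f. \tag{2.43} \]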
Proof We need to show that $\mathcal{A}f$ satisfies (2.38); condition (2.39) follows directly from the assumptions in the lemma.
From Itô's lemma we have
\[ df(X_t) = \mathcal{A}f(X_t)\,dt + \sigma(X_t)\frac{\partial f}{\partial x}(X_t)\,dW_t \]
This gives
\[ f(X_t) = f(x_0) + \int_0^t \mathcal{A}f(X_s)\,ds + \int_0^t \sigma(X_s)\frac{\partial f}{\partial x}(X_s)\,dW_s. \]
Taking the conditional mean, and using that the Itô integral has mean zero, gives us
\[ E[f(X_t) \mid X_0 = x] = f(x) + E\left[ \int_0^t \mathcal{A}f(X_s)\,ds \,\Big|\, X_0 = x \right]. \]
$\square$
The lemma thus explains the name extended generator in this case.
We are now ready to prove the main result of this section.
Theorem 2.10 (Exponential ergodicity)
Let $V : \mathcal{X} \to [1, \infty)$ be a deterministic function such that (2.42) is satisfied. Assume that there exist constants $c > 0$ and $b > 0$ and a compact set $C \subseteq \mathcal{X}$ such that
\[ \mathcal{A}V(x) \leq -cV(x) + b1_C(x) \tag{2.44} \]
then the solution to the SDE (2.41) is V-uniformly ergodic.
Proof From Lemma 2.1 we note that the infinitesimal operator is also an extended generator in this setting. This means that all we need to prove the theorem is to verify the conditions from (2.40) necessary to imply uniform ergodicity in the general case. However, this follows directly from the assumptions made in regards to the existence of the solution to the SDE.
The conditions on the drift and diffusion functions needed for existence and properties of this solution are discussed earlier in this chapter. All we need is to require that any one of the existence and uniqueness theorems is satisfied, as well as regularity conditions on the drift and diffusion functions ensuring that the process is irreducible and aperiodic and that the process does not explode in finite time, see for instance Theorem 2.8.
The theorem is similar to the drift criterion known from discrete time: any drift function can be used. The only requirements on $V$ are that it is smooth enough to satisfy Itô's lemma and sufficiently integrable that (2.42) is satisfied.
We end this section with a particular choice of drift function.
Corollary 2.1 (Explicit drift function)
Consider the drift function $V(x) = 1 + x^2$.
Sufficient conditions for V-uniform ergodicity are that there exist constants $c > 0$ and $b > 0$ and a compact set $C$ such that
\[ 2\mu(x)x + \sigma^2(x) \leq -c(1 + x^2) + b1_C(x). \]
Proof For the drift function $V(x) = 1 + x^2$ we have $\frac{\partial V}{\partial x} = 2x$ and $\frac{\partial^2 V}{\partial x^2} = 2$. This gives
\[ \mathcal{A}V(x) = \mu(x)\frac{\partial V}{\partial x}(x) + \frac{1}{2}\sigma^2(x)\frac{\partial^2 V}{\partial x^2}(x) = 2\mu(x)x + \sigma^2(x) \]
and the result follows.
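The inequality in the corollary can be explored numerically on a grid. The sketch below uses CIR coefficients with the illustrative values $a = 2$, $b = 3$, $\sigma = 1$, for which $\mathcal{A}V(x) = 2a(b - x)x + \sigma^2 x$. The constants $c = 2$, $b = 24$ and the exceptional set $(0, 7]$ are assumptions chosen by hand; whether such a set is petite for the CIR process is not checked here, so this only illustrates the algebraic inequality and not the full theorem.

```python
def generator_of_v(x, a, b, sigma):
    """A V(x) for V(x) = 1 + x^2 with CIR drift a*(b - x) and diffusion sigma*sqrt(x)."""
    return 2.0 * a * (b - x) * x + sigma ** 2 * x

def drift_inequality_holds(a, b, sigma, c, b_const, upper, grid):
    """Check A V(x) <= -c*(1 + x^2) + b_const * 1{x <= upper} on every grid point."""
    for x in grid:
        bound = -c * (1.0 + x * x) + (b_const if x <= upper else 0.0)
        if generator_of_v(x, a, b, sigma) > bound:
            return False
    return True

grid = [0.01 * k for k in range(1, 5001)]  # grid on (0, 50]
ok = drift_inequality_holds(2.0, 3.0, 1.0, c=2.0, b_const=24.0, upper=7.0, grid=grid)
too_small = drift_inequality_holds(2.0, 3.0, 1.0, c=2.0, b_const=1.0, upper=7.0, grid=grid)
print(ok, too_small)
```

With these constants $\mathcal{A}V(x) + cV(x) = -2x^2 + 13x + 2$, which is at most $23.125$ and negative for $x > 7$, so the inequality holds on the grid, while a constant $b$ that is too small makes it fail.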
[Figure 2.1: A simulated sample path of the CIR process with parameter values $a = 2$, $b = 3$ and $\sigma = 1$ for time index 0 to 100, simulated according to the Milstein scheme with $\Delta t = 0.001$ and 100000 observations, see Seydel [2002] page 86.]
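A path like the one in Figure 2.1 could be generated along the following lines. This is a minimal sketch of the Milstein scheme for the CIR model, not the code used for the figure: the starting value $r_0 = 3$, the random seed and the truncation at zero are assumptions. For $\sigma(r) = \sigma\sqrt{r}$ the Milstein correction term $\frac{1}{2}\sigma(r)\sigma'(r)(\Delta W^2 - \Delta t)$ reduces to $\frac{\sigma^2}{4}(\Delta W^2 - \Delta t)$.

```python
import math
import random

def simulate_cir_milstein(a, b, sigma, r0, dt, n_steps, seed=0):
    """Simulate a CIR path on an equidistant grid with the Milstein scheme."""
    rng = random.Random(seed)
    path = [r0]
    r = r0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        r = (r + a * (b - r) * dt
             + sigma * math.sqrt(r) * dw
             + 0.25 * sigma ** 2 * (dw * dw - dt))
        r = max(r, 0.0)  # guard against discretisation pushing r below zero
        path.append(r)
    return path

# Parameter values from the figure caption; r0 and the seed are assumptions.
path = simulate_cir_milstein(2.0, 3.0, 1.0, r0=3.0, dt=0.001, n_steps=100000)
time_average = sum(path) / len(path)
print(min(path), max(path), time_average)
```

Over a long horizon the time average of the simulated path should settle near the long term mean $b$, in line with the mean reversion discussed in the text.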
The corollary presents a particularly simple example of conditions sufficient for the results to follow. However, no results guarantee that this particular choice is the best for a given stochastic differential equation. As in the discrete case, the drift function should be chosen on a case by case basis.
Example 2.3 (The CIR process continued)
We complete this chapter by returning to the CIR process mentioned above, that is the process given by the SDE defined on $I = (0, \infty)$:
\[ dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t, \quad a, b, \sigma \in \mathbb{R}_+ \]
We have already seen that a strong solution exists and, as stated above, no closed form expression for the solution can be found. Now we discuss the CIR process in more detail; this will illustrate the use of the theorems above.
We start by noting that this process exhibits mean reversion towards $b$ with rate $a$: when $|b - r_t|$ is large the $dt$-term will dominate the Itô term and push the process towards $b$.
We start the analysis by determining the first derivative of the scale function and solving for the speed measure; for now we omit the constant in the expression for
\[ s'(x) = \exp\left( -2 \int_1^x \frac{\mu(z)}{\sigma^2(z)}\,dz \right) = x^{-\frac{2ab}{\sigma^2}} \exp\left( \frac{2a(x - 1)}{\sigma^2} \right) \]
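Theorem 2.8 gives that if $r_0 \in I^\circ = I = (0, \infty)$ then $r_t > 0$ and $r_t$ is finite for all $t$ with probability one if
\[ \int_0^1 s'(x)\,dx = \int_1^\infty s'(x)\,dx = \infty \]
where $x_0 = 1$ in the definition of $s'$ and the limits of the integrals have been chosen arbitrarily. The only requirement according to Theorem 2.8 is that $x_0 \in I^\circ$, so $x_0 = 1$ is as good a choice as any. This choice will help a bit when evaluating the integrals. We see that in the first integral the important term is $x^{-\frac{2ab}{\sigma^2}}$, since the exponential part is clearly increasing. In other words we have that
\[ \exp\left( -\frac{2a}{\sigma^2} \right) \int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx \leq \int_0^1 x^{-\frac{2ab}{\sigma^2}} \exp\left( \frac{2a(x - 1)}{\sigma^2} \right) dx \leq \int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx \]
and we know that $\int_0^1 x^{-\frac{2ab}{\sigma^2}}\,dx < \infty$ if and only if $\frac{2ab}{\sigma^2} < 1$, so the first integral diverges exactly when $\frac{2ab}{\sigma^2} \geq 1$. So a necessary condition for Theorem 2.8 to hold is $2ab \geq \sigma^2$. Likewise
\[ \int_1^\infty x^{-\frac{2ab}{\sigma^2}}\,dx \leq \int_1^\infty x^{-\frac{2ab}{\sigma^2}} \exp\left( \frac{2a(x - 1)}{\sigma^2} \right) dx \]
and the right-hand side is infinite for all parameter values since the integrand grows exponentially. So we see that a necessary and sufficient condition for Theorem 2.8 to hold for the CIR process is $2ab \geq \sigma^2$.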
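The behaviour of the integral near zero can be illustrated with the closed form of the bounding integral $\int_\varepsilon^1 x^{-p}\,dx$, where $p = 2ab/\sigma^2$. The sketch below is only an illustration of the sandwich argument above; the function names and the two parameter sets are assumptions.

```python
import math

def boundary_condition_holds(a, b, sigma):
    """The parameter condition 2ab >= sigma^2 derived in the text."""
    return 2.0 * a * b >= sigma ** 2

def lower_integral_bound(eps, a, b, sigma):
    """Closed form of the integral of x**(-p) over [eps, 1], p = 2ab/sigma^2."""
    p = 2.0 * a * b / sigma ** 2
    if abs(p - 1.0) < 1e-12:
        return -math.log(eps)
    return (1.0 - eps ** (1.0 - p)) / (1.0 - p)

for eps in (1e-2, 1e-4, 1e-6):
    diverging = lower_integral_bound(eps, 2.0, 3.0, 1.0)    # p = 12
    converging = lower_integral_bound(eps, 0.25, 1.0, 1.0)  # p = 0.5
    print(f"eps = {eps:g}: p=12 gives {diverging:.3e}, p=0.5 gives {converging:.6f}")
```

For $p = 12$ the value explodes as $\varepsilon \to 0$, while for $p = 0.5$ it approaches the finite limit $2$, matching the condition $2ab \geq \sigma^2$.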
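We now turn to the integral of the speed function
\begin{align*}
\int_0^\infty m(x)\,dx &= \int_0^\infty \frac{1}{\sigma^2(x)\,s'(x)}\,dx \\
&= \int_0^\infty x^{\frac{2ab}{\sigma^2} - 1} \exp\left( -\frac{2a}{\sigma^2}x + \frac{2a}{\sigma^2} \right) \frac{1}{\sigma^2}\,dx \\
&= e^{\frac{2a}{\sigma^2}} \frac{1}{\sigma^2} \int_0^\infty x^{\frac{2ab}{\sigma^2} - 1} e^{-\frac{2a}{\sigma^2}x}\,dx
\end{align*}
Define $\omega = \frac{2ab}{\sigma^2} > 0$ and $\nu = \frac{2a}{\sigma^2} > 0$ and we have
\[ \int_0^\infty m(x)\,dx = e^{\nu}\sigma^{-2} \int_0^\infty x^{\omega - 1} e^{-\nu x}\,dx < \infty \]
where we recognize the gamma integral (or Euler's second integral).
So now we have that the CIR process satisfies Theorem 2.9 and hence $r_t$ is ergodic. We also know that the invariant distribution has density proportional to the speed function, which means that we can determine the exact density by
\begin{align*}
1 = \int_0^\infty K m(x)\,dx &= K e^{\nu}\sigma^{-2} \int_0^\infty x^{\omega - 1} e^{-\nu x}\,dx \\
&= K e^{\nu}\sigma^{-2}\,\Gamma(\omega) \left( \frac{1}{\nu} \right)^{\omega}
\end{align*}
so we have
\[ K = e^{-\nu}\sigma^{2}\,\frac{\nu^{\omega}}{\Gamma(\omega)} \]
We conclude that the invariant measure has the following density
\[ \pi(x) = K m(x) = \frac{\nu^{\omega}}{\Gamma(\omega)}\,x^{\omega - 1} e^{-\nu x}. \]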
Finally we note that we cannot use the drift criterion to derive weaker conditions for ergodicity. Since the condition $2ab \geq \sigma^2$ is necessary for the process to be bounded, it will also be necessary for the drift criterion whichever drift function we may use.
We can thus conclude this example with the following result: Consider the CIR-
process
dr
t
= a(b r
t
)dt +

r
t
dW
t
, a, b, R
+
If 2ab
2
then the solution of the SDE satises the following
P(r
t
> 0[r
0
> 0) = 1
r
t
is nite with probability one.
r is ergodic with stationary measure a gamma-distribution with parameters
(,
1

), where =
2ab

2
> 0 and =
2a

2
> 0.
In particular we have that the stationary mean is given by

= b which we already
suspected based on the comments about mean reversion above.
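The stationary density can be checked numerically. The following is only a minimal sketch: the function name `cir_stationary_density`, the parameter values and the truncated integration grid are illustrative assumptions and not part of the derivation above. It evaluates the gamma density with shape $2ab/\sigma^2$ and rate $2a/\sigma^2$ and verifies by a hand-written trapezoidal rule that it integrates to one and has mean $b$.

```python
import math
import numpy as np

def cir_stationary_density(x, a, b, sigma):
    """Gamma density with shape alpha = 2ab/sigma^2 and rate beta = 2a/sigma^2."""
    alpha = 2.0 * a * b / sigma**2
    beta = 2.0 * a / sigma**2
    # Work on the log scale to avoid overflow for large shape parameters.
    log_pdf = (alpha * math.log(beta) - math.lgamma(alpha)
               + (alpha - 1.0) * np.log(x) - beta * x)
    return np.exp(log_pdf)

# Illustrative parameter values (assumption); they satisfy 2ab >= sigma^2.
a, b, sigma = 1.0, 2.0, 1.0

# Truncated grid on (0, infinity); the tail beyond 60 is negligible here.
x = np.linspace(1e-8, 60.0, 600_001)
dx = x[1] - x[0]
pdf = cir_stationary_density(x, a, b, sigma)

# Trapezoidal rule written out by hand.
total_mass = float(np.sum(0.5 * (pdf[1:] + pdf[:-1])) * dx)
xpdf = x * pdf
stationary_mean = float(np.sum(0.5 * (xpdf[1:] + xpdf[:-1])) * dx)

print(total_mass, stationary_mean)
```

For these values the total mass should be numerically close to one and the mean close to $b = 2$.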
Chapter 3
Parameter estimation in diffusion models
Consider the following one-dimensional stochastic differential equation
\[
dX_t = \mu(X_t;\theta)\,dt + \sigma(X_t;\theta)\,dW_t, \qquad X_0 = x_0 \tag{3.1}
\]
where we assume that the functions $\mu$ and $\sigma$ are known except for the value of the $d$-dimensional parameter $\theta$, which belongs to some subset $\Theta \subseteq \mathbb{R}^d$. To illustrate, one can think of the Cox-Ingersoll-Ross model given by the SDE (2.18); in this case the parameter would be $\theta = (a, b, \sigma)^T$ and $\Theta = \mathbb{R}^3_{++} = \{x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, \dots, 3\}$. We will mostly restrict ourselves to the case where the functions $\mu$ and $\sigma$ do not depend on time; however, many of the results can be modified to cover the more general case of diffusions that are not time-homogeneous.
We will implicitly assume throughout the remainder that the functions $\mu$ and $\sigma$ satisfy the conditions of one of the existence and uniqueness theorems stated previously.
Under the regularity conditions discussed earlier the solution to the stochastic differential equation will be a homogeneous diffusion process. In particular it will satisfy the Markov property, which will be a help when considering the likelihood function for $\theta$. In many empirical implementations of diffusion models we will not be able to maintain a continuous record of the process of interest; the available data will consist of discretely sampled observations. In all of the following the main task is to draw inference about the parameter $\theta$ based on observations of the process $X$, where $X$ is assumed to satisfy the stochastic differential equation above. We assume that we have a finite number of observations of $X$ at discrete time points, that is
\[
X_{t_0}, X_{t_1}, \dots, X_{t_n}, \qquad \text{where } t_0 < t_1 < \dots < t_n .
\]
As mentioned in Chapter 2, if $f_{s,t}(X_s, X_t;\theta)$ is the density of $X_t$ given $X_s$, $s < t$, when the true parameter value is $\theta$, then the likelihood function for $\theta$ based on the observations is
\[
L_n(\theta) = \prod_{i=1}^n f_{t_{i-1},t_i}\left(X_{t_{i-1}}, X_{t_i};\theta\right).
\]
In some cases we can use e.g. the Kolmogorov (forward and backward) differential equations to derive explicit expressions for the transition densities; this is for instance true for the CIR model. However, the CIR model and other models simple enough to guarantee explicit solutions are seldom in good agreement with empirical observations. In the following we will to some extent continue to use the CIR process as an illustrative example for the procedures considered. This may seem a bit tedious, as we have just stated that we may be able to exploit the analytical expressions for the transition densities to perform maximum likelihood estimation. It is clear, though, that using this approach will severely limit the number of models we can work with. To work around this problem we introduce inference based on estimating functions. One might consider the maximum likelihood method mentioned above to be a special case of this method, as the estimating function in this case is the score function.
For completeness of the presentation we start with an attempt at estimating $\theta$ by using an Euler discretisation of the process $X$ solving (3.1).
3.1 Approximating the likelihood function
Assume for simplicity in the following section that we have equidistant observations, i.e. $t_i - t_{i-1} = \Delta$, $i = 1, \dots, n$. Consider the Euler scheme for approximating the solution to (3.1), where $\Delta W_i = W_{t_i} - W_{t_{i-1}}$,
\[
\tilde X_{t_i} = \tilde X_{t_{i-1}} + \mu\left(\tilde X_{t_{i-1}};\theta\right)\Delta + \sigma\left(\tilde X_{t_{i-1}};\theta\right)\Delta W_i . \tag{3.2}
\]
This scheme converges strongly to the solution of the SDE when $\Delta \to 0$ with order 1/2 in the sense of Seydel [2002] Definition 3.3.
We say that a discrete time approximation, $\tilde X_t$, converges strongly with order $\gamma > 0$ to the true solution of an SDE, $X_t$, on the closed interval $[0, T]$ if
\[
E\left(\left|X_T - \tilde X_T\right|\right) = O(\Delta^{\gamma}). \tag{3.3}
\]
Using the Euler scheme (3.2) we can approximate the transition probabilities by the Gaussian distribution, that is, for a process that does indeed satisfy (3.2),
\[
\tilde X_{t_i} \mid \tilde X_{t_{i-1}} = x \;\sim\; N\left(x + \mu(x;\theta)\Delta,\ \sigma^2(x;\theta)\Delta\right).
\]
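For this approximation to the real process we can easily derive an approximation to the log-likelihood function (up to an additive constant)
\[
\ell_n(\theta) = \log\left(L_n(\theta)\right)
= -\sum_{i=1}^n \left[ \frac{1}{2}\log\left(\sigma^2\left(X_{t_{i-1}};\theta\right)\Delta\right)
+ \frac{\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}};\theta)\Delta\right)^2}{2\sigma^2\left(X_{t_{i-1}};\theta\right)\Delta} \right].
\]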
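The approximate log-likelihood above is straightforward to evaluate for any model once the drift and the squared diffusion function can be computed. The following is a minimal sketch, not a definitive implementation: the function names `euler_loglik`, `cir_drift` and `cir_diffusion_sq`, and the convention of passing the parameter as a tuple, are assumptions made for illustration. The additive constant is dropped, as in the expression above.

```python
import numpy as np

def euler_loglik(x, delta, drift, diffusion_sq, theta):
    """Gaussian (Euler) approximation to the log-likelihood, constant dropped.

    x            : observations X_{t_0}, ..., X_{t_n} (equidistant)
    delta        : time between observations
    drift        : function (x, theta) -> mu(x; theta)
    diffusion_sq : function (x, theta) -> sigma^2(x; theta)
    """
    x = np.asarray(x, dtype=float)
    x_prev, x_next = x[:-1], x[1:]
    var = diffusion_sq(x_prev, theta) * delta            # conditional variance
    resid = x_next - x_prev - drift(x_prev, theta) * delta
    return float(-np.sum(0.5 * np.log(var) + resid**2 / (2.0 * var)))

# CIR model used as the illustration: mu = a(b - x), sigma^2(x) = sigma^2 x.
def cir_drift(x, theta):
    a, b, _sigma = theta
    return a * (b - x)

def cir_diffusion_sq(x, theta):
    _a, _b, sigma = theta
    return sigma**2 * x

# Tiny hand-checkable case: one transition from 1.0 to 1.5 with delta = 0.5.
# The Euler mean is 1.0 + 1*(2 - 1)*0.5 = 1.5, so the residual is zero and
# only the term -0.5*log(sigma^2 * x * delta) = -0.5*log(0.5) remains.
value = euler_loglik([1.0, 1.5], 0.5, cir_drift, cir_diffusion_sq, (1.0, 2.0, 1.0))
print(value)
```

A numerical optimiser could be applied to the negative of this function when no closed form solution of the score equations is available.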
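Assuming all partial derivatives exist we get the score function corresponding to the Euler approximation, where we let $\partial_\theta$ denote the column vector of partial derivatives with respect to $\theta$,
\[
\begin{aligned}
S_n(\theta) &= \sum_{i=1}^n \Bigg[ -\frac{1}{2}\,\frac{\partial_\theta\sigma^2\left(X_{t_{i-1}};\theta\right)}{\sigma^2\left(X_{t_{i-1}};\theta\right)}
+ \left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}};\theta)\Delta\right)\frac{\partial_\theta\mu\left(X_{t_{i-1}};\theta\right)}{\sigma^2\left(X_{t_{i-1}};\theta\right)} \\
&\qquad\quad + \left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}};\theta)\Delta\right)^2 \frac{\partial_\theta\sigma^2\left(X_{t_{i-1}};\theta\right)}{2\sigma^4\left(X_{t_{i-1}};\theta\right)\Delta} \Bigg] \\
&= \sum_{i=1}^n \Bigg[ \frac{\partial_\theta\sigma^2\left(X_{t_{i-1}};\theta\right)}{2\sigma^4\left(X_{t_{i-1}};\theta\right)\Delta}
\left( \left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}};\theta)\Delta\right)^2 - \sigma^2\left(X_{t_{i-1}};\theta\right)\Delta \right) \\
&\qquad\quad + \frac{\partial_\theta\mu\left(X_{t_{i-1}};\theta\right)}{\sigma^2\left(X_{t_{i-1}};\theta\right)}\left(X_{t_i} - X_{t_{i-1}} - \mu(X_{t_{i-1}};\theta)\Delta\right) \Bigg].
\end{aligned}
\tag{3.4}
\]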
The estimator of $\theta$ is then found by solving, possibly numerically, the equations $S_n(\theta) = 0$ for $\theta$. However, we can in general not expect that the approximated score is an unbiased estimating function. That is, it will in most cases be true that
\[
E_\theta\left[S_n(\theta)\right] \neq 0 .
\]
To see the importance of having an unbiased estimating function let $\theta_0$ be the true unknown parameter and let $\hat\theta_n$ be the estimator derived by the condition $S_n(\hat\theta_n) = 0$. Consider the first order Taylor formula
\[
0 = S_n(\hat\theta_n) = S_n(\theta_0) + \partial_\theta S_n(\theta^*)(\hat\theta_n - \theta_0),
\]
where $\partial_\theta S_n(\theta^*)$ is the matrix of derivatives evaluated at some convex combination $\theta^*$ of $\hat\theta_n$ and $\theta_0$.
If $E_{\theta_0}\left[S_n(\theta_0)\right] \neq 0$ we would thus, in general, expect the estimator $\hat\theta_n$ to be non-central, that is, not centred at $\theta_0$. Below we present conditions ensuring that estimators derived from unbiased estimating functions are consistent. For now we examine further the properties of estimators derived from the approximation to the true score (3.4).
Based on the Euler scheme we should expect the function $S_n$ to provide good estimators only when $\Delta$ is small. We would also expect that estimators derived from this scheme are consistent and asymptotically normal only when $t_n \to \infty$ and $\Delta \to 0$. It can be shown that in the case where the diffusion function $\sigma(\cdot)$ depends on the parameter the bias of the estimating function given by (3.4) will be of order n, see Sørensen [1997].
To explore these issues further we turn to simulation and consider the CIR process once again. This process has the property that we can solve $S_n(\theta) = 0$ explicitly and thus derive closed form expressions for the estimators.
Example 3.1 (Simulation study of the Euler approximation)
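Consider the CIR process
\[
dX_t = a(b - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \qquad a, b, \sigma \in \mathbb{R}_+ .
\]
The estimators based on (3.4) are given by
\[
\hat a_n = \frac{n^2 - n\sum_{i=1}^n X_{t_i}X_{t_{i-1}}^{-1} + (X_{t_n} - X_{t_0})\sum_{i=1}^n X_{t_{i-1}}^{-1}}
{\Delta\left[ n^2 - \left(\sum_{i=1}^n X_{t_{i-1}}\right)\left(\sum_{i=1}^n X_{t_{i-1}}^{-1}\right) \right]}
\]
\[
\hat b_n = \frac{n\sum_{i=1}^n X_{t_i} - \left(\sum_{i=1}^n X_{t_i}X_{t_{i-1}}^{-1}\right)\left(\sum_{i=1}^n X_{t_{i-1}}\right)}
{n^2 - n\sum_{i=1}^n X_{t_i}X_{t_{i-1}}^{-1} + (X_{t_n} - X_{t_0})\sum_{i=1}^n X_{t_{i-1}}^{-1}}
\]
\[
\hat\sigma^2_n = \frac{1}{n\Delta}\sum_{i=1}^n X_{t_{i-1}}^{-1}\left( X_{t_i} - X_{t_{i-1}} - \hat a_n(\hat b_n - X_{t_{i-1}})\Delta \right)^2 .
\]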
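The closed form estimators translate directly into code. The following is a minimal sketch in which the function name `cir_euler_estimators` and the short Euler-type test path (kept positive by reflection) are assumptions made for illustration only. As a check, the estimates are verified to solve the two weighted least squares equations behind $\hat a_n$ and $\hat b_n$.

```python
import numpy as np

def cir_euler_estimators(x, delta):
    """Closed form estimators of (a, b, sigma^2) from the Euler score (3.4)."""
    x = np.asarray(x, dtype=float)
    x_prev, x_next = x[:-1], x[1:]
    n = x_prev.size
    s_prev = x_prev.sum()                 # sum of X_{t_{i-1}}
    s_inv = (1.0 / x_prev).sum()          # sum of 1 / X_{t_{i-1}}
    s_ratio = (x_next / x_prev).sum()     # sum of X_{t_i} / X_{t_{i-1}}
    increment = x[-1] - x[0]              # X_{t_n} - X_{t_0}

    numer = n**2 - n * s_ratio + increment * s_inv
    a_hat = numer / (delta * (n**2 - s_prev * s_inv))
    b_hat = (n * x_next.sum() - s_ratio * s_prev) / numer
    resid = x_next - x_prev - a_hat * (b_hat - x_prev) * delta
    sigma2_hat = np.sum(resid**2 / x_prev) / (n * delta)
    return a_hat, b_hat, sigma2_hat

# Illustrative data (assumption): an Euler-type path with (a, b, sigma) = (1, 2, 1),
# reflected at zero so that all observations stay positive.
rng = np.random.default_rng(0)
delta = 0.1
x_obs = np.empty(501)
x_obs[0] = 2.0
for i in range(500):
    step = 1.0 * (2.0 - x_obs[i]) * delta + np.sqrt(x_obs[i] * delta) * rng.normal()
    x_obs[i + 1] = abs(x_obs[i] + step)

a_hat, b_hat, sigma2_hat = cir_euler_estimators(x_obs, delta)

# The drift estimates should solve the two score equations for (a, b).
resid = x_obs[1:] - x_obs[:-1] - a_hat * (b_hat - x_obs[:-1]) * delta
eq_weighted = np.sum(resid / x_obs[:-1])
eq_plain = np.sum(resid)
print(a_hat, b_hat, sigma2_hat)
```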
Consider now the situation where we simulate a sample path of the CIR process by using e.g. a Milstein scheme as mentioned briefly in Example 2.3 (see also Seydel [2002] page 86). We choose the values $(a, b, \sigma) = (1, 2, 1)$, for which the solution is known to be ergodic, simulate the process, and calculate the value of the estimators for different values of $\Delta$ and $t_n$. For each fixed choice of $(\Delta, t_n)$ the process is simulated 1000 times. The process itself is simulated on a much finer discretization than the one used in the estimations: we choose the time between each simulation point equal to 0.0001, thus giving a good approximation to the true process. We then sample from this process such that we have observations that are $\Delta$ apart, thereby ensuring that the majority of the inaccuracy in the estimation results stems from the quality of the estimating function and not from the simulation of the process. The results are summarized in Table 3.1. This table gives an opportunity to examine the effects of both an increase in the number of observations and a decrease in the interval between observations.
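The simulation design described above (fine simulation grid, coarse observation grid) can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the study: the function names `simulate_cir_milstein` and `subsample`, the truncation at zero used as a positivity safeguard, and the coarser step $h = 0.001$ and shorter horizon (chosen only to keep the run time small) are not taken from the text.

```python
import numpy as np

def simulate_cir_milstein(a, b, sigma, x0, h, n_steps, rng):
    """Milstein scheme for dX = a(b - X)dt + sigma*sqrt(X)dW with step h."""
    dw = rng.normal(0.0, np.sqrt(h), size=n_steps)
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        x = path[k]
        # Euler part plus the Milstein correction 0.5*s(x)*s'(x)*(dW^2 - h),
        # which for s(x) = sigma*sqrt(x) equals 0.25*sigma^2*(dW^2 - h).
        x_new = (x + a * (b - x) * h + sigma * np.sqrt(x) * dw[k]
                 + 0.25 * sigma**2 * (dw[k]**2 - h))
        path[k + 1] = max(x_new, 0.0)  # safeguard (assumption): truncate at zero
    return path

def subsample(path, h, delta):
    """Keep every (delta/h)-th point of the fine path as an observation."""
    step = int(round(delta / h))
    return path[::step]

rng = np.random.default_rng(1)
h, t_n, delta = 0.001, 50.0, 0.1      # illustrative values (assumption)
fine_path = simulate_cir_milstein(1.0, 2.0, 1.0, 2.0, h, int(round(t_n / h)), rng)
observations = subsample(fine_path, h, delta)
print(observations.size)
```

Repeating this for each choice of $(\Delta, t_n)$ and applying the estimators to `observations` would give one replication of the experiment.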
We see that all three estimators are quite inaccurate even for smaller intervals and higher values of $t_n$. In particular we note that both $\hat a_n$ and $\hat\sigma^2_n$ are systematically too small for all values, whereas $\hat b_n$ seems to be a more precise estimator for the same values.
There is some ambiguity about the results of increasing the number of observations without decreasing the size of the observation interval. However, there is notable improvement in some of the estimators when $t_n$ increases and $\Delta$ decreases simultaneously. This is in line with the result mentioned above that estimators based on the Euler approximation of the likelihood function are consistent under regularity conditions when $\Delta \to 0$ and $t_n \to \infty$.
Based on the simulation results above, there should be no doubt that the smaller $\Delta$ is, the closer the mean of the estimators is to the true values. This is somewhat less clear regarding the estimator for $b$, but both $\hat a_n$ and $\hat\sigma^2_n$ improve dramatically when we decrease $\Delta$. We also note that the standard deviation of the estimators improves significantly when we increase $t_n$ even for fixed $\Delta$.

  t_n   Δ      â_n               b̂_n               σ̂²_n
               mean     sde      mean     sde      mean     sde
  100   2.5    0.3821   0.0811   2.0010   0.1699   0.2312   0.0796
  100   1      0.6530   0.1132   2.0020   0.1509   0.4804   0.0937
  100   0.1    0.9898   0.1427   2.0039   0.1434   0.9177   0.0449
  250   2.5    0.3711   0.0530   1.9984   0.1063   0.2435   0.0519
  250   1      0.6399   0.0684   1.9992   0.0939   0.4916   0.0620
  250   0.1    0.9668   0.0866   2.0002   0.0885   0.9203   0.0282
  500   2.5    0.3697   0.0380   2.0015   0.0766   0.2503   0.0385
  500   1      0.6352   0.0490   2.0010   0.0673   0.4947   0.0438
  500   0.1    0.9588   0.0618   2.0023   0.0635   0.9210   0.0200

Table 3.1: Results of simulation study of estimators based on the Euler approximation of the likelihood function, data is a simulated Cox-Ingersoll-Ross process with true values $(a, b, \sigma^2) = (1, 2, 1)$.
For a brief review of asymptotic results for estimators based on approximations to the likelihood function see Sørensen [1997] page 4. In general we conclude that this method only works reasonably well when we have both a great number of observations and a very short time period between each observation. As mentioned earlier, when working with financial data we will seldom be able to exercise great control over the time between each observation, so we clearly cannot rely solely on estimators based on a Gaussian approximation of the likelihood function.
As a way of attempting to deal with the bias problems indicated by Example 3.1, Kessler [1995] suggested that the approximate Gaussian likelihood function be improved by using better estimates of the mean and variance. This results in a method that gives more complicated calculations than the one suggested above, but also (given a number of regularity conditions) provides more control over the order of the bias, see Kessler [1995].
3.2 Martingale estimating functions
Another way of dealing with the bias problems is to restrict our attention to martingale estimating functions, that is, estimating functions of the form
\[
G_n(\theta) = \sum_{i=1}^n g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) \tag{3.5}
\]
where $\Delta_i = t_i - t_{i-1}$ and the function $g$ satisfies
\[
E_\theta\left[ g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) \mid X_{t_{i-1}} \right] = 0 . \tag{3.6}
\]
Here, as usual, $E_\theta[X]$ indicates that the mean value is taken under the parameter value $\theta$. That is, when $f_{t_{i-1},t_i}$ is the transition density discussed previously we have
\[
E_\theta\left[ g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) \mid X_{t_{i-1}} = x \right]
= \int g(\Delta_i, x, z;\theta)\, f_{t_{i-1},t_i}(x, z;\theta)\,dz .
\]
Equation (3.6) simply states that $g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right)$ should be a martingale difference under the appropriate measure. It is clear that (3.6) guarantees that $G_n(\theta_0)$ is a martingale with respect to the usual filtration when $\theta_0$ is the true parameter value. In this connection we also note that $G_n(\theta_0)$ has mean zero under the probability measure corresponding to $\theta_0$, such that
\[
E_{\theta_0}\left[G_n(\theta_0)\right] = 0 .
\]
Therefore the problems mentioned in the earlier section on the bias of the estimating functions are avoided by working with martingale estimating functions. For any given function $G_n(\theta)$ satisfying (3.5) and (3.6) we get an estimator for $\theta$ if we can solve $G_n(\theta) = 0$ either analytically or numerically.
It could seem that working with martingale estimating functions is little improvement, since we still need the transition density for the process $X$. The fact that this density is seldom known was the very reason for turning away from maximum likelihood. However, even when we do not have a closed expression for the transition density, we can find functions satisfying (3.6).
It is often useful to think of any given martingale estimating function as an approximation to the true (often unknown) score function, which is a martingale under weak regularity conditions.
From this point of view it is natural to attempt to choose estimating functions that closely resemble the true score function. Consider a given class of estimating functions, $\mathcal{G}$. We work with two notions of what it means for an element in $\mathcal{G}$ to be an optimal estimating function.
• $G^*_n \in \mathcal{G}$ is said to be Fixed Sample Optimal if it is the element in $\mathcal{G}$ closest to the true score function, measured by an appropriate measure.
• $G^*_n \in \mathcal{G}$ is said to be Asymptotic Optimal if it is the element in $\mathcal{G}$ with the smallest asymptotic variance as $n \to \infty$.
See Godambe and Heyde [1987] for a general definition and discussion of these terms.
3.2.1 Existence of optimal estimating functions
We now present results concerning how to choose the best estimating function from a particular class of estimating functions. This presentation is largely based on Sørensen [1997].
First we make a few notes on the notation used in both this and the subsequent sections. For homogeneous diffusions we have already noted that the transition probabilities only depend on $(s, t)$ through $t - s$, and for notational simplicity we will use the notation $f_{\Delta_i}(x, y;\theta)$ instead of $f_{t_{i-1},t_i}(x, y;\theta)$ for the transition probabilities whenever there is no doubt about the interval $\Delta_i = t_i - t_{i-1}$. Also for notational simplicity we will use the symbol $\Delta$ in the following to indicate how the interval between two subsequent observations enters the various functions. This does not indicate that we are working with equidistant observations; it is simply a notational simplification, like writing $g(\Delta, x, y;\theta)$ instead of $g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right)$ when working with certain properties of the function $g$.
We will in particular deal with the case where the estimating function has the same dimension as $\theta$, thus providing the same number of equations as the number of unknown parameters. Consider now the function $g$ given by
\[
g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) = \sum_{j=1}^N \alpha_j(\Delta_i, X_{t_{i-1}};\theta)\, h_j\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) \tag{3.7}
\]
where $h_j$ is one-dimensional and satisfies (3.6) for $j = 1, \dots, N$, and the $\alpha_j$ are arbitrary functions of the same dimension as $\theta$ (if possible). Evidently any function $g$ defined this way satisfies (3.6) and it is thus possible to define an estimating function $G_n$ by (3.5) using $g$. Assume in the following that the parameter $\theta$ is $d$-dimensional and let $\mathcal{G}$ denote the class of all $d$-dimensional martingale estimating functions given by (3.5) and (3.7).
We need a number of definitions. Consider the following
\[
c_{kl}(\Delta, x;\theta) = \int h_k(\Delta, x, y;\theta)\, h_l(\Delta, x, y;\theta)\, f_\Delta(x, y;\theta)\,dy, \qquad 1 \le k, l \le N \tag{3.8}
\]
\[
b^{(i)}_j(\Delta, x;\theta) = \int h_j(\Delta, x, y;\theta)\, \partial_{\theta_i} f_\Delta(x, y;\theta)\,dy, \qquad 1 \le i \le d,\ 1 \le j \le N \tag{3.9}
\]
and let the matrix function $C$ be given by $[C]_{kl} = c_{kl}$, $k = 1, \dots, N$, $l = 1, \dots, N$, and the vector $B_i$ be given by $B_i = \left(b^{(i)}_1, \dots, b^{(i)}_N\right)^T$, $i = 1, \dots, d$.
Let the functions $\alpha^*_{ji}(\Delta, x;\theta)$ be defined by the following equations
\[
C(\Delta, x;\theta)
\begin{pmatrix} \alpha^*_{1i}(\Delta, x;\theta) \\ \vdots \\ \alpha^*_{Ni}(\Delta, x;\theta) \end{pmatrix}
= B_i(\Delta, x;\theta), \qquad i = 1, \dots, d \tag{3.10}
\]
where $C$ and $B_i$ are defined above.
Define the $d$-dimensional vector function $\alpha^*_j = \left(\alpha^*_{j1}, \dots, \alpha^*_{jd}\right)^T$.
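For fixed $(\Delta, x, \theta)$ the equations (3.10) are $d$ linear systems sharing the same $N \times N$ matrix $C$, so the optimal weights can be obtained with a single linear solve. The sketch below only illustrates this step: the function name `optimal_weights` and the numerical values of `C` and `B` are made up for the illustration, and in practice these quantities would have to be computed from (3.8) and (3.9).

```python
import numpy as np

def optimal_weights(C, B):
    """Solve C a_i = B_i for all i at once.

    C : (N, N) matrix with entries c_kl from (3.8)
    B : (N, d) matrix whose i-th column is the vector B_i from (3.9)
    Returns the (N, d) matrix whose entry (j, i) is the optimal weight alpha*_{ji}.
    """
    return np.linalg.solve(C, B)

# Hypothetical values for N = 2 functions h_j and d = 2 parameters.
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # symmetric and positive definite
B = np.array([[1.0, 0.0],
              [0.3, 2.0]])

A_star = optimal_weights(C, B)
# Row j of A_star is the d-dimensional weight vector alpha*_j used in (3.7).
print(A_star)
```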
We can now state the following theorem (Theorem 3.1 in Sørensen [1997]).
Theorem 3.1 (Optimal estimating function)
• Assume that the functions $h_j$ are square integrable with respect to the transition distribution for $j = 1, \dots, N$.
• Assume that the transition density is differentiable with respect to the parameter $\theta$.
• Assume that the partial derivatives $\partial_{\theta_i} \log f_{t,s}$ all belong to the space $L^2\left(f_{t,s}(x, y;\theta)\,dy\right)$ for all fixed values of $t - s$, $x$, and $\theta$.
• Assume that the functions defined by
\[
g^*_i(\Delta, x, y;\theta) = \sum_{j=1}^N \alpha^*_{ji}(\Delta, x;\theta)\, h_j(\Delta, x, y;\theta)
\]
are continuously differentiable with respect to $\theta$ for all values of $\Delta$, $x$ and $y$.
Consider the estimating function
\[
G^*_n(\theta) = \sum_{i=1}^n \sum_{j=1}^N \alpha^*_j(\Delta_i, X_{t_{i-1}};\theta)\, h_j\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right)
\]
where $\alpha^*_j$ is given by (3.10); these can be determined analytically in some cases.
Under the conditions stated the estimating function $G^*_n(\theta)$ is the optimal estimating function in the class $\mathcal{G}$ in both the Fixed Sample and the Asymptotic sense.
This theorem guarantees that there exist weights such that estimating functions of the form (3.7) are optimal. It is not obvious why this is relevant at this point, but shortly we shall discuss approaches to generating estimating functions (e.g. polynomial estimating functions and estimating functions based on eigenfunctions) that lead to estimating functions in the class $\mathcal{G}$. It might seem that the expression for the optimal weight functions $\alpha^*_j$ can only be determined in the cases where the transition density can be found explicitly, since this density is used in the definition. It turns out, however, that we can find optimal estimating functions even in cases where the transition densities are indeterminable; this will be illustrated below.
Another important thing to note concerning Theorem 3.1 is that we are free to choose the value of $N \ge 1$ in (3.7); one might expect that the more functions $h_j$ we include the better the estimator. This will be discussed briefly in the section on estimating functions based on eigenfunctions.
3.2.2 Asymptotic behavior of martingale estimating functions
In this section we let $G_n$ be a general martingale estimating function as given by (3.5) with $g$ of the form (3.7) and the $h_j$ functions satisfying (3.6), and let $\alpha_j$ be a $d$-dimensional function for all $j = 1, \dots, N$, where $d$ is the dimension of $\theta$. Note in particular that we do not demand that the $\alpha_j$ functions satisfy (3.10).
By Theorem 2.9 we can impose conditions such that the solution to (3.1) is an ergodic diffusion with an invariant measure with density (with respect to the Lebesgue measure) proportional to the speed function; denote this density $\pi_\theta$. Using the invariant measure and the transition probabilities of the diffusion we can define a probability measure on $\mathbb{R}^2$ by
\[
Q^\theta_\Delta(x, y) = \pi_\theta(x)\, f_\Delta(x, y;\theta) . \tag{3.11}
\]
Let the $N$ by $d$ matrix function $A$ be given by $A(\Delta, x;\theta)_{ij} = \alpha_{ij}$, where $\alpha_{ij}$ is the $j$th element of the $d$-dimensional vector $\alpha_i$, and let $\theta_0$ denote the true value of $\theta$.
Assumption 3.1
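• $g$ is continuously differentiable with respect to $\theta$ for all $\Delta$, $x$, $y$.
• The functions $\partial_{\theta_i} g_j(\Delta, x, y;\theta)$ are locally dominated square integrable with respect to $Q^{\theta_0}_\Delta$, that is, there is a neighbourhood $U(\theta)$ around $\theta$ such that $\sup_{\tilde\theta \in U(\theta)} \partial_{\theta_i} g_j(\Delta, \cdot, \cdot;\tilde\theta)$ is square integrable with respect to $Q^{\theta_0}_\Delta$. Here $g_j$ is the $j$th coordinate of the $d$-dimensional function $g$.
• The $d$ by $d$ matrix $D(\theta_0)$ given by
\[
D(\theta_0)_{i,j} = \int \partial_{\theta_j} g_i(\Delta, x, y;\theta_0)\,dQ^{\theta_0}_\Delta(x, y)
= E_{Q^{\theta_0}_\Delta}\left[ \partial_{\theta_j} g_i(\Delta, X_0, X_1;\theta_0) \right]
\]
is invertible.
• The functions $(x, y) \mapsto g_j(\Delta, x, y;\theta)$ are in $L^2(Q^{\theta_0}_\Delta)$ for all $j = 1, \dots, d$.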
We can now state a result about the asymptotic properties of the estimator derived by solving $G_n(\theta) = 0$ for $\theta$. The theorem below is stated e.g. in Sørensen [1997], or for a slightly less general class of estimating functions in Bibby and Sørensen [1995].
Theorem 3.2 (Asymptotic properties)
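Suppose that
• $\theta_0 \in \Theta$.
• The conditions in Theorem 2.9 ensuring existence of an ergodic solution are satisfied.
• Assumption 3.1 is satisfied.
Then, with probability converging to one as $n \to \infty$, there exists an estimator $\hat\theta_n$ satisfying the equation $G_n(\hat\theta_n) = 0$. This estimator satisfies the following:
\[
\hat\theta_n \xrightarrow{P} \theta_0, \qquad \text{for } n \to \infty
\]
under $P_{\theta_0}$, and
\[
\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} N\left(0,\ D(\theta_0)^{-1}\, V(\theta_0) \left(D(\theta_0)^{-1}\right)^T\right)
\]
under $P_{\theta_0}$, where $D(\theta_0)$ is defined in Assumption 3.1 and
\[
V(\theta_0) = E_{\theta_0}\left[ A(\Delta, X_0;\theta_0)^T\, C(\Delta, X_0;\theta_0)\, A(\Delta, X_0;\theta_0) \right].
\]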
Proof. Although this theorem is heavy in notation, it is a consequence of the central limit theorem for martingales and the ergodic theorem (law of large numbers). To better understand this we outline the proof, which has been omitted in Sørensen [1997].
We first turn our attention to the $V$ matrix given above. Recall that
\[
g\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right) = \sum_{j=1}^N \alpha_j(\Delta_i, X_{t_{i-1}};\theta)\, h_j\left(\Delta_i, X_{t_{i-1}}, X_{t_i};\theta\right)
\]
and recall also the definition of the $C$ matrix as given by the conditional mean of the product of $h_k$ and $h_l$. Combining these definitions and using the product measure $Q^{\theta_0}_\Delta$ we can insert the definitions of $A$ and $C$ in the $V$ matrix, and we note that
\[
V(\theta_0) = E_{Q^{\theta_0}_\Delta}\left[ g(\Delta, X_0, X_1;\theta_0)\, g(\Delta, X_0, X_1;\theta_0)^T \right]. \tag{3.12}
\]
From the central limit theorem for martingales (in a multivariate version), the conditions stated in Assumption 3.1 are sufficient to ensure that (see Sørensen [1999] Theorem 3.2)
\[
\frac{1}{\sqrt{n}}\, G_n(\theta_0) \xrightarrow{D} N\left(0, V(\theta_0)\right).
\]
By Taylor's formula as above we have that
\[
G_n(\theta_0) = \underbrace{G_n(\hat\theta_n)}_{=0} + \partial_\theta G_n(\theta^*)(\theta_0 - \hat\theta_n).
\]
If $\partial_\theta G_n(\theta^*)$ is invertible we have
\[
\sqrt{n}\left(\hat\theta_n - \theta_0\right) = -\left( \frac{1}{n}\,\partial_\theta G_n(\theta^*) \right)^{-1} \frac{1}{\sqrt{n}}\, G_n(\theta_0) .
\]
What remains to be shown is that $\frac{1}{n}\,\partial_\theta G_n(\theta^*) \to D(\theta_0)$ in probability under $P_{\theta_0}$, and the asymptotic distribution follows. This convergence seems natural in light of the definition of $D(\theta_0)$ as the mean of the derivative (compare with the information matrix in a maximum likelihood setting). See e.g. Bibby and Sørensen [1995] for further details.
We immediately note that this theorem only requires the number of observations $n$ to go to infinity. We do not require that the time between observations decreases, as was the case for the estimators based on the approximate likelihood function discussed above. We also note that this theorem does not require equidistant observations, with reference to the comment made above about the usage of the symbol $\Delta$.
Whereas the theorem covers general estimating functions of the form (3.5) with $g$ of the form (3.7), we note the following when we choose the optimal estimating function as stated in Theorem 3.1.
Corollary 3.1 (Properties of optimal estimating function)
Suppose that the estimating function $G^*_n(\theta)$ is optimal in the sense of Theorem 3.1 and assume that the conditions in Theorem 3.2 are satisfied.
The asymptotic distribution of the estimator found by solving $G^*_n(\theta) = 0$ is in this case given by
\[
\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} N\left(0,\ V(\theta_0)^{-1}\right).
\]
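Proof. We use that
\[
\int h_j(\Delta, x, y;\theta)\, f_\Delta(x, y;\theta)\,dy = 0 .
\]
The conditions in Assumption 3.1 ensure that we can interchange differentiation and integration such that
\[
0 = \int \left[ \partial_{\theta_i} h_j(\Delta, x, y;\theta)\, f_\Delta(x, y;\theta) + h_j(\Delta, x, y;\theta)\, \partial_{\theta_i} f_\Delta(x, y;\theta) \right] dy
\]
which in turn gives
\[
b^{(i)}_j(\Delta, x;\theta) = -\int \partial_{\theta_i} h_j(\Delta, x, y;\theta)\, f_\Delta(x, y;\theta)\,dy .
\]
Using the definition of the optimal $\alpha^*$'s (3.10) gives the following expression for the matrix $D(\theta_0)$, where the terms containing derivatives of the weights $\alpha^*_{ki}$ vanish because $h_k$ has conditional mean zero by (3.6):
\[
\begin{aligned}
D(\theta_0)_{ij} &= \int \partial_{\theta_j} g^*_i(\Delta, x, y;\theta_0)\,dQ^{\theta_0}_\Delta(x, y) \\
&= \int \partial_{\theta_j}\left( \sum_{k=1}^N \alpha^*_{ki}(\Delta, x;\theta_0)\, h_k(\Delta, x, y;\theta_0) \right) dQ^{\theta_0}_\Delta(x, y) \\
&= \int\!\!\int \sum_{k=1}^N \alpha^*_{ki}(\Delta, x;\theta_0)\, \partial_{\theta_j} h_k(\Delta, x, y;\theta_0)\, f_\Delta(x, y;\theta_0)\, \pi_{\theta_0}(x)\,dy\,dx \\
&= \int \sum_{k=1}^N \alpha^*_{ki}(\Delta, x;\theta_0) \left( \int \partial_{\theta_j} h_k(\Delta, x, y;\theta_0)\, f_\Delta(x, y;\theta_0)\,dy \right) \pi_{\theta_0}(x)\,dx \\
&= \int \sum_{k=1}^N \alpha^*_{ki}(\Delta, x;\theta_0) \left( -b^{(j)}_k(\Delta, x;\theta_0) \right) \pi_{\theta_0}(x)\,dx \\
&= -\int \left( \alpha^*_{1i}(\Delta, x;\theta_0), \dots, \alpha^*_{Ni}(\Delta, x;\theta_0) \right) B_j(\Delta, x;\theta_0)\, \pi_{\theta_0}(x)\,dx .
\end{aligned}
\]
Using that the $N$-dimensional vector $B_j$ satisfies (3.10) we note that
\[
D(\theta_0) = -E_{\theta_0}\left[ A(\Delta, X_0;\theta_0)^T\, C(\Delta, X_0;\theta_0)\, A(\Delta, X_0;\theta_0) \right] = -V(\theta_0),
\]
so that $D(\theta_0)^{-1} V(\theta_0) \left(D(\theta_0)^{-1}\right)^T = V(\theta_0)^{-1}$, and the result follows.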
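The asymptotic covariance in Theorem 3.2 has the familiar sandwich form, and the simplification in Corollary 3.1 can be illustrated numerically. The following is a minimal sketch: the function name `sandwich_covariance` and the matrix `V` are hypothetical, chosen only to show that the sandwich collapses to the inverse of $V$ when $D$ equals $V$ up to sign.

```python
import numpy as np

def sandwich_covariance(D, V):
    """Asymptotic covariance D^{-1} V (D^{-1})^T from Theorem 3.2."""
    D_inv = np.linalg.inv(D)
    return D_inv @ V @ D_inv.T

# Hypothetical symmetric positive definite V for d = 2 parameters.
V = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# For the optimal estimating function D equals V up to sign; the sign is
# irrelevant because D enters the sandwich twice.
cov_optimal = sandwich_covariance(-V, V)
print(cov_optimal)
print(np.linalg.inv(V))
```

Both printed matrices should agree, in line with Corollary 3.1.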
The two main theorems above from Sørensen [1997] are quite general in nature and cover a large class of estimating functions. We now consider a class of slightly simpler estimating functions.
3.2.3 Linear estimating functions
As above we let $X$ be one-dimensional and $\theta$ be a $d$-vector of unknown parameters. Consider now an estimating function of the form
\[
G_n(\theta) = \sum_{i=1}^n g(\Delta_i, X_{t_{i-1}};\theta)\left( X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right) \tag{3.13}
\]
where $g$ is a $d$-dimensional function and
\[
F(\Delta_i, X_{t_{i-1}};\theta) = E_\theta\left[ X_{t_i} \mid X_{t_{i-1}} \right].
\]
Equation (3.13) thus gives us $d$ equations with $d$ unknown parameters ($\theta$). For this class of estimating functions it follows from Bibby and Sørensen [1995] that the optimal function in both the Fixed Sample and the Asymptotic sense discussed above is given by
\[
g(\Delta, x;\theta) = g^*(\Delta, x;\theta) = \partial_\theta F(\Delta, x;\theta)\left( \phi(\Delta, x;\theta) \right)^{-1} \tag{3.14}
\]
where $\phi$ is the conditional variance
\[
\phi(\Delta, x;\theta) = E_\theta\left[ \left( X_{t_i} - F(\Delta, x;\theta) \right)^2 \mid X_{t_{i-1}} = x \right].
\]
A point of criticism of this simpler approach to generating useful estimating functions could be that the conditional mean and variance are seldom known. They are, however, known in several situations where the transition probabilities themselves are not determinable. In cases where the mean and variance are not known it is still possible to determine them using simulation. However, it is seldom easy to calculate the derivative of a function based on simulation. It can be advisable to replace the optimal estimating function with an estimating function closely resembling the optimal choice. It is important to note, however, that we can only replace the expressions in the $g$ function with approximations. If we replace the expression for the last factor in (3.13), this will ruin the martingale property. If we approximate the optimal weight function $g^*$ with the first term of a Taylor-like expansion we end up with the following estimating function, in which the weight is evaluated at the previous observation $X_{t_{i-1}}$:
\[
G_n(\theta) = \sum_{i=1}^n \partial_\theta\mu(X_{t_{i-1}};\theta)\left( \sigma^2(X_{t_{i-1}};\theta) \right)^{-1} \left( X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right). \tag{3.15}
\]
By keeping the exact mean in the last term we ensure that the function is still a martingale and thus still an unbiased estimating function. If we had replaced this mean with an approximation as well, we would have reintroduced the bias problems from the previous section. Note that the approximation to the optimal $g^*$ saves us the problems with simulating the derivative of the mean. By simulation studies Bibby and Sørensen [1995] demonstrate that the approximation to the optimal estimating function performs quite well compared to the true optimal function. In the case of the CIR process the linear estimating function will not be a good choice. Linear functions are clearly a good choice when only the drift function depends on the unknown parameter; a better choice for processes where both drift and diffusion functions contain unknown parameters will be quadratic estimating functions.
3.2.4 Quadratic estimating functions
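Consider an extension of the function (3.13) with a second order term
\[
\begin{aligned}
G_n(\theta) = \sum_{i=1}^n \Big[\, & g(\Delta_i, X_{t_{i-1}};\theta)\left( X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right) \\
& + h(\Delta_i, X_{t_{i-1}};\theta)\left( \left( X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right)^2 - \phi(\Delta_i, X_{t_{i-1}};\theta) \right) \Big].
\end{aligned}
\tag{3.16}
\]
Introduce the third and fourth order moments by
\[
\eta(\Delta, x;\theta) = E_\theta\left[ \left( X_{t_i} - F(\Delta, x;\theta) \right)^3 \mid X_{t_{i-1}} = x \right]
\]
\[
\psi(\Delta, x;\theta) = E_\theta\left[ \left( X_{t_i} - F(\Delta, x;\theta) \right)^4 \mid X_{t_{i-1}} = x \right] - \phi(\Delta, x;\theta)^2 .
\]
The optimal estimating function in this case is given by the following two optimal weight functions
\[
g^*(\Delta, x;\theta) = \frac{\partial_\theta\phi(\Delta, x;\theta)\,\eta(\Delta, x;\theta) - \partial_\theta F(\Delta, x;\theta)\,\psi(\Delta, x;\theta)}{\phi(\Delta, x;\theta)\,\psi(\Delta, x;\theta) - \eta(\Delta, x;\theta)^2} \tag{3.17}
\]
\[
h^*(\Delta, x;\theta) = \frac{\partial_\theta F(\Delta, x;\theta)\,\eta(\Delta, x;\theta) - \partial_\theta\phi(\Delta, x;\theta)\,\phi(\Delta, x;\theta)}{\phi(\Delta, x;\theta)\,\psi(\Delta, x;\theta) - \eta(\Delta, x;\theta)^2} \tag{3.18}
\]
Clearly we cannot hope to derive analytical expressions for $g^*$ and $h^*$ for general models; even when the conditional moments can be determined the estimating equation may be impossible to solve explicitly. We now consider a model simple enough to allow closed form expressions for the estimators derived from (3.16) and (3.17)-(3.18).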
Example 3.2 (The Ornstein-Uhlenbeck process)
We now present an example for which both the maximum likelihood estimators and the estimators based on the optimal quadratic estimating function exist and both sets of estimators can be found explicitly.
Consider the Ornstein-Uhlenbeck process
\[
dX_t = \beta(\alpha - X_t)\,dt + \sigma\,dW_t, \qquad X_0 = x_0, \qquad \theta = (\beta, \alpha, \sigma)^T \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}_+ . \tag{3.19}
\]
The model is well defined for all $X_t \in \mathbb{R}$.
The fact that the diffusion function is a constant results in a number of appealing Gaussian properties for this process. Using the same theorems as when working with the CIR process it follows simply that a unique strong solution exists and, if $\beta > 0$, the process is ergodic with invariant distribution
\[
N\left( \alpha,\ \frac{\sigma^2}{2\beta} \right). \tag{3.20}
\]
The transition densities in this simple model can also be found explicitly and are given by
\[
p_{\Delta_i}(x, \cdot) \overset{D}{=} N\left( e^{-\beta\Delta_i}(x - \alpha) + \alpha,\ \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta\Delta_i}\right) \right). \tag{3.21}
\]
Figure 3.1: A simulated sample path of the Ornstein-Uhlenbeck process with parameter values $\beta = 1$, $\alpha = 2$ and $\sigma^2 = 1$. Note the mean reversion towards $\alpha$, similar to that of the CIR model, but also negative values, unlike the CIR case.
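Maximum likelihood estimation in the OU model:
Assume for simplicity that we have equidistant observations $X_0, X_1, \dots, X_n$ with a discrete observation interval of size $\Delta$. We can now find the maximum likelihood estimators explicitly. First we introduce the bijective transformation of the parameters:
\[
(\alpha, \rho, \tau) = \left( \alpha,\ e^{-\beta\Delta},\ \frac{\sigma^2}{2\beta}\left(1 - e^{-2\beta\Delta}\right) \right). \tag{3.22}
\]
Using these parameters the (true) log-likelihood function for the observations $X_0, X_1, \dots, X_n$ is (up to an additive constant)
\[
\ell_n(\alpha, \rho, \tau) = \log\left(L_n(\theta)\right)
= -\sum_{i=1}^n \left[ \frac{1}{2}\log(\tau) + \frac{1}{2\tau}\left( X_i - \rho(X_{i-1} - \alpha) - \alpha \right)^2 \right]. \tag{3.23}
\]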
Consider the following expressions
\[
S_0 = \sum_{i=1}^n X_{i-1}, \qquad S_1 = \sum_{i=1}^n X_i, \qquad S_{00} = \sum_{i=1}^n X_{i-1}^2, \qquad S_{01} = \sum_{i=1}^n X_i X_{i-1} .
\]
Maximizing the likelihood function now yields the following estimators
\[
\hat\alpha_n = \frac{S_{01}S_0 - S_{00}S_1}{n(S_{01} - S_{00}) + S_0(S_0 - S_1)} \tag{3.24}
\]
\[
\hat\rho_n = \frac{nS_{01} - S_0S_1}{nS_{00} - S_0^2} \tag{3.25}
\]
\[
\hat\tau_n = \frac{1}{n}\sum_{i=1}^n \left( X_i - \hat\rho_n X_{i-1} + \hat\alpha_n(\hat\rho_n - 1) \right)^2 . \tag{3.26}
\]
The maximum likelihood estimator is well defined when $\hat\rho_n \in (0, 1)$, which is the case with a probability tending to one as $n \to \infty$.
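The explicit estimators are easy to try out because the Gaussian transition distribution (3.21) allows the process to be simulated exactly on the observation grid. The sketch below is an illustration under assumptions: the function names `simulate_ou_exact` and `ou_mle`, the symbols `alpha`, `beta`, `rho` and `tau` for the mean level, the speed of mean reversion and the transformed parameters, and the seed and sample size are all choices made here. The estimators are transformed back to the original parametrization by inverting (3.22).

```python
import numpy as np

def simulate_ou_exact(beta, alpha, sigma, x0, delta, n, rng):
    """Exact simulation of n transitions using the Gaussian law (3.21)."""
    rho = np.exp(-beta * delta)
    tau = sigma**2 * (1.0 - rho**2) / (2.0 * beta)   # conditional variance
    eps = rng.normal(0.0, np.sqrt(tau), size=n)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = alpha + rho * (x[i] - alpha) + eps[i]
    return x

def ou_mle(x, delta):
    """Closed form maximum likelihood estimators of (beta, alpha, sigma^2)."""
    x = np.asarray(x, dtype=float)
    x_prev, x_next = x[:-1], x[1:]
    n = x_prev.size
    s0, s1 = x_prev.sum(), x_next.sum()
    s00, s01 = (x_prev**2).sum(), (x_prev * x_next).sum()

    alpha_hat = (s01 * s0 - s00 * s1) / (n * (s01 - s00) + s0 * (s0 - s1))
    rho_hat = (n * s01 - s0 * s1) / (n * s00 - s0**2)
    tau_hat = np.mean((x_next - rho_hat * x_prev + alpha_hat * (rho_hat - 1.0))**2)

    if not 0.0 < rho_hat < 1.0:
        raise ValueError("estimator not defined: rho_hat outside (0, 1)")
    # Invert the transformation (3.22).
    beta_hat = -np.log(rho_hat) / delta
    sigma2_hat = 2.0 * beta_hat * tau_hat / (1.0 - rho_hat**2)
    return beta_hat, alpha_hat, sigma2_hat

rng = np.random.default_rng(0)
delta = 0.1
x_ou = simulate_ou_exact(1.0, 2.0, 1.0, 2.0, delta, 5000, rng)
beta_hat, alpha_hat, sigma2_hat = ou_mle(x_ou, delta)
print(beta_hat, alpha_hat, sigma2_hat)
```

The estimators of the mean level and of the autoregressive coefficient coincide with an ordinary least squares regression of $X_i$ on $X_{i-1}$, which gives a simple cross-check of the formulas.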
These estimators are asymptotically normally distributed with the true values as mean and the inverse of the (Fisher) information matrix as variance. It turns out that this matrix is diagonal and given by
\[
I_n(\alpha, \rho, \tau) = n
\begin{pmatrix}
\frac{1}{\tau}(1 - \rho)^2 & 0 & 0 \\
0 & \frac{1}{1 - \rho^2} & 0 \\
0 & 0 & \frac{1}{2\tau^2}
\end{pmatrix}.
\]
The estimators $(\hat\alpha_n, \hat\rho_n, \hat\tau_n)$ are thus asymptotically independent. This is not the case for the estimators of the original parameters $(\beta, \alpha, \sigma)$, which are easily found by transforming back according to (3.22). The results of a simulation study examining the performance of these estimators are given in Table 3.2. As above we have simulated 1000 sample paths of the process with equidistant discretization points for $\Delta = 0.0001$, and then sampled the observations according to the values given in the table.
The optimal quadratic estimating function in the OU model:
We now consider the estimators based on the optimal quadratic estimating function (3.16) with weight functions (3.17) and (3.18). From the Gaussian properties of the Ornstein-Uhlenbeck process we can easily determine all the moments needed to work with this estimating function. Solving the estimating equation (3.16) yields the following estimators (where we still use the parameters $(\alpha, \rho, \tau)$)
\[
\hat\alpha_n = \frac{S_{01}S_0 - S_{00}S_1}{n(S_{01} - S_{00}) + S_0(S_0 - S_1)} \tag{3.27}
\]
\[
\hat\rho_n = \frac{nS_{01} - S_0S_1}{nS_{00} - S_0^2} \tag{3.28}
\]
\[
\hat\tau_n = \frac{1}{n}\sum_{i=1}^n \left( X_i - F(\Delta, X_{i-1};\hat\alpha_n, \hat\rho_n) \right)^2 . \tag{3.29}
\]
  t_n   Δ      β̂_n               α̂_n               σ̂²_n
               mean     sde      mean     sde      mean     sde
  100   2.5    0.6640   0.3261   1.9548   0.1191   0.6616   0.3449
  100   1      0.9447   0.2354   1.9821   0.1065   0.9336   0.2075
  100   0.1    1.0334   0.1522   2.0010   0.1015   0.9983   0.0479
  250   2.5    0.8335   0.3299   1.9815   0.0752   0.8212   0.3323
  250   1      0.9773   0.1487   1.9918   0.0669   0.9707   0.1317
  250   0.1    1.0131   0.0925   1.9994   0.0631   0.9997   0.0303
  500   2.5    0.9205   0.2894   1.9919   0.0543   0.9188   0.2977
  500   1      0.9898   0.1092   1.9970   0.0479   0.9864   0.0966
  500   0.1    1.0065   0.0657   2.0013   0.0453   0.9998   0.0217

Table 3.2: Results of simulation study of maximum likelihood estimators in the Ornstein-Uhlenbeck process, data is a simulated O-U process with true values $(\beta, \alpha, \sigma^2) = (1, 2, 1)$. A total of 1000 simulations with interval $\Delta = 0.0001$ was made and the observations were then sampled according to the values of $\Delta$ and $t_n$. One source of numerical imprecision in the estimates stems from the fact that the estimators only exist when $\hat\rho = e^{-\hat\beta\Delta} \in (0, 1)$. We may thus encounter numerical problems when $\Delta$ is close to zero or large, as is illustrated by the fact that some estimations fail for $\Delta = 2.5$, which is equivalent to $e^{-\beta\Delta} = 0.082$ for the true value $\beta = 1$. For intermediate and small values of $\Delta$ the amount of observations is large and the precision of the estimators improves considerably. Even for large values of $\Delta$ the estimator improves significantly when the number of observations increases.
If we insert the expression for the conditional mean we realize that these estima-
tors are identical to the maximum likelihood estimators. This result relies strongly
on the normality and is explained by Theorem 3.3 below.
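The closed-form estimators above are easy to evaluate numerically. The following Python sketch is an illustration only, not the code used for Table 3.2. It assumes that the sums are defined as $S_0 = \sum X_{i-1}$, $S_1 = \sum X_i$, $S_{00} = \sum X_{i-1}^2$ and $S_{01} = \sum X_{i-1} X_i$, that the process is $dX_t = \alpha(\beta - X_t)dt + \sigma dW_t$, and that the back-transformation uses $e^{-\alpha\Delta}$ for the autoregressive coefficient and $\sigma^2(1 - e^{-2\alpha\Delta})/(2\alpha)$ for the conditional variance. The path is drawn from the exact Gaussian transition instead of a fine Euler grid, and the function names `simulate_ou` and `ou_estimates` are hypothetical.

```python
import numpy as np


def simulate_ou(alpha, beta, sigma2, delta, n, x0, rng):
    """Draw X_0, ..., X_n from the exact Gaussian transition of the OU process."""
    ar = np.exp(-alpha * delta)                        # e^{-alpha * delta}
    innov_sd = np.sqrt(sigma2 * (1.0 - ar**2) / (2.0 * alpha))
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = beta + ar * (x[i] - beta) + innov_sd * rng.standard_normal()
    return x


def ou_estimates(x, delta):
    """Closed-form estimates of (alpha, beta, sigma^2) from equidistant data."""
    x_prev, x_next = x[:-1], x[1:]
    n = x_prev.size
    s0, s1 = x_prev.sum(), x_next.sum()
    s00, s01 = np.sum(x_prev**2), np.sum(x_prev * x_next)

    # Long-run level and autoregressive coefficient, cf. (3.27) and (3.28).
    level = (s01 * s0 - s00 * s1) / (n * (s01 - s00) + s0 * (s0 - s1))
    ar = (n * s01 - s0 * s1) / (n * s00 - s0**2)
    if not 0.0 < ar < 1.0:
        raise ValueError("estimate of exp(-alpha*delta) is outside (0, 1)")
    # Mean squared one-step prediction error, cf. (3.29).
    innov_var = np.mean((x_next - level - ar * (x_prev - level)) ** 2)

    # Transform back to the original parameters (assumed form of (3.22)).
    alpha = -np.log(ar) / delta
    sigma2 = 2.0 * alpha * innov_var / (1.0 - ar**2)
    return alpha, level, sigma2


rng = np.random.default_rng(0)
path = simulate_ou(alpha=1.0, beta=2.0, sigma2=1.0, delta=0.1, n=5000, x0=2.0, rng=rng)
alpha_hat, beta_hat, sigma2_hat = ou_estimates(path, delta=0.1)
print(alpha_hat, beta_hat, sigma2_hat)
```

The explicit check on the autoregressive coefficient mirrors the existence condition discussed in the caption of Table 3.2: when the estimate falls outside $(0, 1)$ the logarithm is undefined and no estimate is returned.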
Figure 3.2 depicts the distribution (histograms and estimated densities) of the estimators for the parameters $(\alpha, \beta, \sigma^2)$ for the values $t_n = 500$ and $\Delta = 0.1$. We see that (apart from the numerical issues that may influence the existence of the estimators) maximum likelihood, or equivalently the optimal quadratic estimating function, provides good estimates of the true values. As noted, the numerical problems vanish with a probability tending to one as $n$ increases to infinity. Indeed, the problems are not present for the values depicted in the figure. However, the class of models for which we can explicitly calculate either the likelihood function or the estimator based on optimal estimating functions proves to be quite restrictive.
The above calculations are facilitated greatly by the normality of the transition densities. Even in models where the transition density is known, deriving an analytical expression for the maximum likelihood estimator may be infeasible. In such cases estimating functions may also provide a reasonable alternative.
Since the likelihood function is seldom known, we have devoted little attention to the maximum likelihood estimators. However, inspired by Example 3.2, we examine further the relationship between the optimal quadratic estimating function and the likelihood function. It turns out that the findings in Example 3.2 can be generalized to a (slightly) larger class of processes than the Ornstein-Uhlenbeck model alone.
[Figure 3.2, three panels, each showing density against the value of the estimator: (a) Estimator for the parameter $\alpha$, true value $\alpha_0 = 1$. (b) Estimator for the parameter $\beta$, true value $\beta_0 = 2$. (c) Estimator for the parameter $\sigma^2$, true value $\sigma_0^2 = 1$.]
Figure 3.2: Histogram and empirical density for estimators based on the optimal quadratic estimating function (and thus equal to the maximum likelihood estimator) for the simulated Ornstein-Uhlenbeck process with true values $(\alpha, \beta, \sigma^2) = (1, 2, 1)$, estimated for $t_n = 500$ and $\Delta = 0.1$.
Theorem 3.3 (Quadratic estimating function and the MLE)
Let $\hat\theta_n$ be the estimators derived from the optimal quadratic estimating function, that is (3.16) with weight functions (3.17) and (3.18).
Let $\tilde\theta_n$ be the maximum likelihood estimators derived from the true likelihood function.
If the transition probabilities of the diffusion process are Gaussian then
\[ \hat\theta_n = \tilde\theta_n . \]
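Proof. Consider the case with equidistant observations $X_0, X_1, \ldots, X_n$. By the Gaussian assumption, $X_i \mid X_{i-1}$ is normally distributed with mean $F(\Delta, X_{i-1}; \theta)$ and variance $\phi(\Delta, X_{i-1}; \theta)$. Up to an additive constant, the log-likelihood function is
\[
\ell_n(\theta) = \log L_n(\theta)
= -\sum_{i=1}^{n} \left\{ \frac{1}{2} \log \phi(\Delta, X_{i-1}; \theta) + \frac{1}{2\phi(\Delta, X_{i-1}; \theta)} \bigl(X_i - F(\Delta, X_{i-1}; \theta)\bigr)^2 \right\}.
\]
The score function $s_n(\theta)$ is given by
\[
s_n(\theta) = \partial_\theta \ell_n(\theta)
= \sum_{i=1}^{n} \left\{ -\frac{1}{2} \frac{\partial_\theta \phi(\Delta, X_{i-1}; \theta)}{\phi(\Delta, X_{i-1}; \theta)}
+ \frac{\partial_\theta \phi(\Delta, X_{i-1}; \theta)}{2\phi^2(\Delta, X_{i-1}; \theta)} \bigl(X_i - F(\Delta, X_{i-1}; \theta)\bigr)^2
+ \frac{\partial_\theta F(\Delta, X_{i-1}; \theta)}{\phi(\Delta, X_{i-1}; \theta)} \bigl(X_i - F(\Delta, X_{i-1}; \theta)\bigr) \right\}.
\]
From the normality assumption we get that $\eta(\Delta, x; \theta) = 0$ and $\psi(\Delta, x; \theta) = 2\phi^2(\Delta, x; \theta)$, such that the optimal weight functions (3.17)-(3.18) simplify to
\[
g^*(\Delta, x; \theta) = \frac{\partial_\theta F(\Delta, x; \theta)}{\phi(\Delta, x; \theta)}, \qquad
h^*(\Delta, x; \theta) = \frac{\partial_\theta \phi(\Delta, x; \theta)}{2\phi^2(\Delta, x; \theta)} .
\]
The optimal quadratic estimating function thus reduces to (with $\Delta_i = \Delta$ and $X_{t_i} = X_i$ in the equidistant case)
\[
G_n^*(\theta) = \sum_{i=1}^{n} \left\{ \frac{\partial_\theta F(\Delta_i, X_{t_{i-1}}; \theta)}{\phi(\Delta_i, X_{t_{i-1}}; \theta)} \bigl[ X_{t_i} - F(\Delta_i, X_{t_{i-1}}; \theta) \bigr]
+ \frac{\partial_\theta \phi(\Delta_i, X_{t_{i-1}}; \theta)}{2\phi^2(\Delta_i, X_{t_{i-1}}; \theta)} \bigl[ (X_{t_i} - F(\Delta_i, X_{t_{i-1}}; \theta))^2 - \phi(\Delta_i, X_{t_{i-1}}; \theta) \bigr] \right\}
= s_n(\theta). \qquad \Box
\]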
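The identity in Theorem 3.3 can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the Ornstein-Uhlenbeck conditional mean and variance in the parametrization $(\alpha, \beta, \sigma^2)$, approximates all derivatives by central finite differences, and compares the score with the quadratic estimating function built from the Gaussian weights $\partial_\theta F/\phi$ and $\partial_\theta \phi/(2\phi^2)$. All function names are hypothetical.

```python
import numpy as np


def cond_mean(theta, x, delta):
    alpha, beta, _ = theta
    return beta + np.exp(-alpha * delta) * (x - beta)


def cond_var(theta, x, delta):
    alpha, _, sigma2 = theta
    return sigma2 * (1.0 - np.exp(-2.0 * alpha * delta)) / (2.0 * alpha) * np.ones_like(x)


def num_grad(fun, theta, h=1e-6):
    """Central finite-difference derivative of fun with respect to each parameter."""
    theta = np.asarray(theta, dtype=float)
    rows = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        rows.append((fun(theta + step) - fun(theta - step)) / (2.0 * h))
    return np.array(rows)


def log_likelihood(theta, x, delta):
    """Gaussian log-likelihood up to an additive constant."""
    f, v = cond_mean(theta, x[:-1], delta), cond_var(theta, x[:-1], delta)
    return -np.sum(0.5 * np.log(v) + (x[1:] - f) ** 2 / (2.0 * v))


def quadratic_ef(theta, x, delta):
    """Quadratic estimating function with the Gaussian weights of Theorem 3.3."""
    x_prev, x_next = x[:-1], x[1:]
    f, v = cond_mean(theta, x_prev, delta), cond_var(theta, x_prev, delta)
    d_f = num_grad(lambda t: cond_mean(t, x_prev, delta), theta)   # shape (3, n)
    d_v = num_grad(lambda t: cond_var(t, x_prev, delta), theta)    # shape (3, n)
    resid = x_next - f
    return np.sum(d_f / v * resid + d_v / (2.0 * v**2) * (resid**2 - v), axis=1)


rng = np.random.default_rng(1)
data = 2.0 + rng.standard_normal(200)     # the identity holds for any data set
theta = np.array([0.8, 1.5, 1.2])
score = num_grad(lambda t: log_likelihood(t, data, 0.5), theta)
max_gap = np.max(np.abs(score - quadratic_ef(theta, data, 0.5)))
print(max_gap)
```

The two vectors agree up to finite-difference error at every parameter value, not only at the estimate, which is exactly the statement $G_n^*(\theta) = s_n(\theta)$.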


Theorem 3.3 is valid for a class of relatively simple processes for which we could just as well use maximum likelihood estimation. However, the theorem also provides intuition for choosing an approximation to the optimal estimating function. When dealing with processes where we cannot determine the conditional moments, we encounter the problems mentioned earlier about calculating the derivatives of various moments using simulations. By including the same type of approximations to the derivatives as for the linear function, and inspired by Theorem 3.3, we get the following approximation to the optimal quadratic estimating function
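\[
G_n(\theta) = \sum_{i=1}^{n} \left\{ \frac{\partial_\theta \mu(X_{t_{i-1}}; \theta)}{\sigma^2(X_{t_{i-1}}; \theta)} \bigl[ X_{t_i} - F(\Delta_i, X_{t_{i-1}}; \theta) \bigr]
+ \frac{\partial_\theta \sigma^2(X_{t_{i-1}}; \theta)}{2\sigma^4(X_{t_{i-1}}; \theta)\,\Delta_i} \bigl[ (X_{t_i} - F(\Delta_i, X_{t_{i-1}}; \theta))^2 - \phi(\Delta_i, X_{t_{i-1}}; \theta) \bigr] \right\}. \tag{3.30}
\]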
This estimating function resembles the approximated score function (3.4) derived earlier from the Euler scheme for approximating the solution to the SDE. However, this estimating function is unbiased, which was clearly not the case for the approximated score, as illustrated by the simulation example above. In general, this approximated optimal estimating function can be viewed as a mixture of the approximated score and the optimal quadratic estimating function. It has the same quite simple structure as the Euler approximation but has some of the desirable properties of the optimal estimating function.
Note that we cannot use approximations for the mean and variance inside the square brackets, since this would result in an estimating function that would in general not be a martingale and thus not an unbiased estimating function. We still need to derive expressions for the conditional mean and variance to use this approximation; when this is not possible we must use simulation schemes. However, as mentioned earlier, the approximation to the true optimal quadratic estimating function has the clear advantage that we need not simulate derivatives of various conditional moments of the process $X$.
From the linear and quadratic estimating functions it seems natural to move on to general polynomial estimating functions. However, no literature suggests that the improvement in behavior will be significant enough to justify the extra computational time needed to calculate the higher-order moments that such functions require. Instead we will consider, later on, a different class of estimating functions that are more closely linked to the individual stochastic differential equation studied, namely estimating functions based on eigenfunctions.
Example 3.3 (Martingale estimating function for the CIR process)
Consider observations $X_0, X_1, \ldots, X_n$ from the CIR diffusion process described in several examples above,
\[ dX_t = a(b - X_t)\,dt + \sigma \sqrt{X_t}\,dW_t, \qquad a, b, \sigma \in \mathbb{R}_+ . \]
For this process we can derive closed-form expressions for the conditional mean and variance, and it is therefore possible to use the optimal quadratic estimating function (3.16) with weights given by (3.17) and (3.18).
To determine the conditional mean and variance, note that the process $X$ satisfies
\[ X_{t_i} = X_{t_{i-1}} + \int_{t_{i-1}}^{t_i} a(b - X_s)\,ds + \int_{t_{i-1}}^{t_i} \sigma \sqrt{X_s}\,dW_s , \]
so by the fact that, modulo an integrability condition, the mean of the Itô integral is zero, we have
\[ E_\theta\bigl[ X_{t_i} \mid X_{t_{i-1}} \bigr] = X_{t_{i-1}} + E_\theta\left[ \int_{t_{i-1}}^{t_i} a(b - X_s)\,ds \,\Big|\, X_{t_{i-1}} \right] + 0 . \]
A sufficient condition for $E_\theta\bigl[ \int_{t_{i-1}}^{t_i} \sigma \sqrt{X_s}\,dW_s \bigr] = 0$ is that the diffusion function is an $L^2$ process¹. In other words, the conditional mean satisfies the following ordinary differential equation, where we let $f(t) = E_\theta[X_t \mid X_s]$ for $s < t$:
\[ f'(t) = ab - a f(t), \qquad f(s) = X_s . \]
This gives
\[ f(t) = b + e^{-a(t - s)} (X_s - b) \]
or, in the notation used previously,
\[ F(\Delta_i, X_{t_{i-1}}; \theta) = b + e^{-a\Delta_i} (X_{t_{i-1}} - b) . \]
¹ This follows simply from the construction of the Itô integral as the limit of a sum of simple processes, see Øksendal [1989]. As a simple consequence of the zero mean property we derive the martingale property of the Itô integral introduced in Theorem 2.1.
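Similar arguments and calculations yield
\[
\phi(\Delta_i, X_{t_{i-1}}; \theta) = \frac{\sigma^2}{2a^2} \Bigl\{ a(b - 2X_{t_{i-1}}) e^{-2a\Delta_i} - 2a(b - X_{t_{i-1}}) e^{-a\Delta_i} + ab \Bigr\}
\]
\[
\eta(\Delta_i, X_{t_{i-1}}; \theta) = -\frac{\sigma^4}{2a^3} \Bigl\{ a(b - 3X_{t_{i-1}}) e^{-3a\Delta_i} - 3a(b - 2X_{t_{i-1}}) e^{-2a\Delta_i} + 3a(b - X_{t_{i-1}}) e^{-a\Delta_i} - ab \Bigr\}
\]
\[
\psi(\Delta_i, X_{t_{i-1}}; \theta) = \frac{3\sigma^6}{4a^4} \Bigl\{ a(b - 4X_{t_{i-1}}) e^{-4a\Delta_i} - 4a(b - 3X_{t_{i-1}}) e^{-3a\Delta_i} + 6a(b - 2X_{t_{i-1}}) e^{-2a\Delta_i} - 4a(b - X_{t_{i-1}}) e^{-a\Delta_i} + ab \Bigr\} + 2\phi(\Delta_i, X_{t_{i-1}}; \theta)^2
\]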
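These conditional moments can be cross-checked against the transition law of the CIR process. The sketch below is an illustration and rests on two assumptions not stated in this section: that the transition distribution is a scaled noncentral chi-square with scale $c = \sigma^2(1 - e^{-a\Delta})/(4a)$, $4ab/\sigma^2$ degrees of freedom and noncentrality $x e^{-a\Delta}/c$, and that $\eta$ denotes the third central moment while $\psi$ denotes the fourth central moment minus $\phi^2$. The function names are hypothetical.

```python
from math import factorial

import numpy as np


def cir_moments(x, delta, a, b, sigma2):
    """Closed-form F, phi, eta and psi for the CIR process, as stated above."""
    e1, e2, e3, e4 = (np.exp(-k * a * delta) for k in (1, 2, 3, 4))
    F = b + e1 * (x - b)
    phi = sigma2 / (2 * a**2) * (a * (b - 2 * x) * e2 - 2 * a * (b - x) * e1 + a * b)
    eta = -sigma2**2 / (2 * a**3) * (
        a * (b - 3 * x) * e3 - 3 * a * (b - 2 * x) * e2 + 3 * a * (b - x) * e1 - a * b
    )
    psi = 3 * sigma2**3 / (4 * a**4) * (
        a * (b - 4 * x) * e4 - 4 * a * (b - 3 * x) * e3
        + 6 * a * (b - 2 * x) * e2 - 4 * a * (b - x) * e1 + a * b
    ) + 2 * phi**2
    return F, phi, eta, psi


def cir_cumulants(x, delta, a, b, sigma2):
    """First four cumulants from the scaled noncentral chi-square transition law."""
    c = sigma2 * (1 - np.exp(-a * delta)) / (4 * a)     # scale factor
    k = 4 * a * b / sigma2                               # degrees of freedom
    lam = x * np.exp(-a * delta) / c                     # noncentrality
    # n-th cumulant of a noncentral chi-square: 2^(n-1) (n-1)! (k + n * lam)
    return [c**n * 2 ** (n - 1) * factorial(n - 1) * (k + n * lam) for n in (1, 2, 3, 4)]


x = np.linspace(0.2, 5.0, 25)
F, phi, eta, psi = cir_moments(x, delta=0.7, a=1.3, b=2.0, sigma2=0.9)
k1, k2, k3, k4 = cir_cumulants(x, delta=0.7, a=1.3, b=2.0, sigma2=0.9)
print(np.allclose(F, k1), np.allclose(phi, k2), np.allclose(eta, k3),
      np.allclose(psi, k4 + 2 * k2**2))
```

The last comparison uses the fact that the fourth central moment equals the fourth cumulant plus three times the squared variance, so subtracting $\phi^2$ leaves the fourth cumulant plus $2\phi^2$.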
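We are now in a position to state the optimal quadratic estimating function (3.16) with the weights (3.17)-(3.18). However, the estimating equation $G_n^*(\theta) = 0$ cannot be solved analytically, and numerical methods must be used. If we turn to the approximated optimal estimating function (3.30) and restrict attention to equidistant observations, we can solve the estimating equation explicitly, and after some tedious calculations we obtain the following estimators
\[
\hat a_n = \frac{1}{\Delta} \ln \left( \frac{ n^2 - \bigl( \sum_{i=1}^n X_{t_{i-1}} \bigr) \bigl( \sum_{i=1}^n X_{t_{i-1}}^{-1} \bigr) }{ n \sum_{i=1}^n X_{t_i} X_{t_{i-1}}^{-1} - \bigl( \sum_{i=1}^n X_{t_i} \bigr) \bigl( \sum_{i=1}^n X_{t_{i-1}}^{-1} \bigr) } \right)
\]
\[
\hat b_n = \frac{ \sum_{i=1}^n X_{t_i} X_{t_{i-1}}^{-1} - n e^{-\hat a_n \Delta} }{ \bigl( 1 - e^{-\hat a_n \Delta} \bigr) \sum_{i=1}^n X_{t_{i-1}}^{-1} }
\]
\[
\hat\sigma_n^2 = \frac{ \sum_{i=1}^n X_{t_{i-1}}^{-1} \bigl( X_{t_i} - F(\Delta, X_{t_{i-1}}; \hat a_n, \hat b_n) \bigr)^2 }{ \sum_{i=1}^n X_{t_{i-1}}^{-1}\, Y(\Delta, X_{t_{i-1}}; \hat a_n, \hat b_n) }
\]
where
\[
Y(\Delta, X_{t_{i-1}}; a, b) = \frac{1}{2a^2} \Bigl\{ a(b - 2X_{t_{i-1}}) e^{-2a\Delta} - 2a(b - X_{t_{i-1}}) e^{-a\Delta} + ab \Bigr\}
\]
and $F$ is defined above.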
These estimators exist when the expression for $\hat a_n$ is well defined, in the sense that the value inside the logarithm is positive. Note also that, as we have seen above, $b$ is the long-term mean of the CIR process. However, estimating $b$ by the empirical mean of the observations would fail to take into account the information about the speed of mean reversion, i.e. $a$. Using the empirical mean as an estimator for $b$ would only be valid if we somehow knew that the process was only observed while in its long-term stationary state and not, for example, while recovering from an externally enforced shock to the financial variable in question.
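The explicit estimators can be sketched in a few lines of Python. This is an illustration, not the thesis implementation: it assumes equidistant observations, draws the sample path from the scaled noncentral chi-square transition law of the CIR process (an exact scheme, used here instead of a fine Euler grid), and uses hypothetical function names `simulate_cir` and `cir_estimates`.

```python
import numpy as np


def simulate_cir(a, b, sigma2, delta, n, x0, rng):
    """Exact CIR draws: the transition law is a scaled noncentral chi-square."""
    decay = np.exp(-a * delta)
    scale = sigma2 * (1.0 - decay) / (4.0 * a)
    df = 4.0 * a * b / sigma2
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = scale * rng.noncentral_chisquare(df, x[i] * decay / scale)
    return x


def cir_estimates(x, delta):
    """Explicit estimators of (a, b, sigma^2) solving the approximation (3.30)."""
    x_prev, x_next = x[:-1], x[1:]
    n = x_prev.size
    inv = 1.0 / x_prev
    s_prev, s_next = x_prev.sum(), x_next.sum()
    s_inv, s_ratio = inv.sum(), np.sum(x_next * inv)

    log_arg = (n**2 - s_prev * s_inv) / (n * s_ratio - s_next * s_inv)
    if not log_arg > 0.0:
        raise ValueError("argument of the logarithm is not positive")
    a = np.log(log_arg) / delta
    decay = np.exp(-a * delta)
    b = (s_ratio - n * decay) / ((1.0 - decay) * s_inv)

    F = b + decay * (x_prev - b)                       # conditional mean
    Y = (a * (b - 2 * x_prev) * decay**2
         - 2 * a * (b - x_prev) * decay + a * b) / (2 * a**2)
    sigma2 = np.sum(inv * (x_next - F) ** 2) / np.sum(inv * Y)
    return a, b, sigma2


rng = np.random.default_rng(0)
path = simulate_cir(a=1.0, b=2.0, sigma2=1.0, delta=0.1, n=5000, x0=2.0, rng=rng)
a_hat, b_hat, sigma2_hat = cir_estimates(path, delta=0.1)
print(a_hat, b_hat, sigma2_hat)
```

The guard on the logarithm corresponds to the existence condition stated above; for short samples with large $\Delta$ it can fail, in which case no estimate is returned.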
                 \hat a_n              \hat b_n              \hat\sigma^2_n
t_n   \Delta     mean      sde         mean      sde         mean      sde
100   2.5        0.8290    0.4326      2.0100    0.1719      0.7888    0.4770
100   1          1.1209    0.4096      2.0024    0.1510      1.0498    0.3453
100   0.1        1.0424    0.1589      2.0039    0.1434      1.0001    0.0497
250   2.5        0.9225    0.4006      2.0031    0.1090      0.8922    0.4036
250   1          1.0419    0.2145      1.9992    0.0939      1.0156    0.1913
250   0.1        1.0173    0.0959      2.0002    0.0885      1.0004    0.0315
500   2.5        0.9915    0.3921      2.0043    0.0770      0.9792    0.3981
500   1          1.0177    0.1386      2.0010    0.0673      1.0047    0.1259
500   0.1        1.0081    0.0684      2.0023    0.0635      1.0003    0.0224
Table 3.3: Estimators based on (3.30). The data are simulated Cox-Ingersoll-Ross processes with true values $(a, b, \sigma^2) = (1, 2, 1)$.
In the same way as above we now examine the estimator by a simulation study. We can experiment with altering the parameters $(a, b, \sigma)$, but more interesting is the effect of varying the values of $\Delta$ and the number of observations, measured by $t_n$ or $n$ (for fixed $\Delta$ we have $n = t_n/\Delta$, so working with simulations for different values of $(\Delta, n)$ is equivalent to working with $(\Delta, t_n)$). The first thing we note from the simulation studies is that the estimator is well defined for the large majority of the simulated sample paths. Only when we choose $n$ small and $\Delta$ large do we experience cases where the estimator for $a$ is undefined; in particular, this hardly ever occurs for the values worked with below.
For the same simulated data as used above for the Euler approximation we now calculate the estimator $\hat\theta_n$ based on the approximation to the optimal quadratic estimating function. That is, we simulate 1000 sample paths of the CIR process with equidistant discretization points for $\Delta = 0.0001$. We then calculate the estimator for each sample path for various values of the time between observations $\Delta$ and the observation period $[0, t_n]$, and consider the empirical mean and standard deviation of the estimators calculated for each sample path.
From Table 3.3 we see indications that this estimator is a considerable improvement on the estimator discussed in Table 3.1. We do, however, note that the standard deviations are quite large for larger $\Delta$ in comparison with the estimator based on (3.4), but for reasonable values of the observation interval this new estimator behaves quite well and, unlike the estimator based on (3.4), it appears to be unbiased.
Figure 3.3 shows histograms and estimated densities for the estimators in the case $t_n = 100$ and $\Delta = 0.1$. We know from Theorem 3.2 that the distribution will converge to a normal distribution as the number of observations increases. We note that even for the relatively low number $t_n = 100$ the estimators are quite precise. Equally important, the asymptotic results do not require that the interval between these observations decreases at the same time. In general we conclude that this is a far better estimator than the one based on the approximation to the true score function (3.4).
It is clear that part of the appeal of estimators based on the estimating function in (3.30) is weakened somewhat in the case of more general diffusion processes where we need simulations to calculate the conditional mean and variance. In these cases we have no hope of deriving analytical expressions for the estimator, as was the case in the above example. Still, even though we need to calculate the estimators numerically, much of the appeal of the approximate optimal quadratic estimating function lies in the fact that it is quite simple and can be implemented for a great many processes. Implementation may require some computational effort, both in calculating conditional moments and in solving the estimating equations, but inference based on this function is clearly a solid and relatively simple alternative to maximum likelihood estimation, which will quite often not be possible.
[Figure 3.3, three panels, each showing density against the value of the estimator: (a) Estimator for the parameter $a$, true value $a_0 = 1$. (b) Estimator for the parameter $b$, true value $b_0 = 2$. (c) Estimator for the parameter $\sigma^2$, true value $\sigma_0^2 = 1$.]
Figure 3.3: Histogram and empirical density for estimators based on the estimating function (3.30) for the simulated Cox-Ingersoll-Ross process with true values $(a, b, \sigma^2) = (1, 2, 1)$, estimated for $t_n = 100$ and $\Delta = 0.1$.
3.2.5 Estimating the standard deviation
We now address the issue of estimating the standard deviation of an estimator based on a martingale estimating function.
Consider the case where an estimator $\hat\theta_n$ is derived from a given dataset by the usual estimating equations
\[ G_n(\hat\theta_n) = 0 , \]
where, as above, $G_n$ is a martingale estimating function of the form
\[ G_n(\theta) = \sum_{i=1}^{n} g\bigl( \Delta_i, X_{t_{i-1}}, X_{t_i}; \theta \bigr) . \]
Recall that $g$ is assumed to be a martingale difference.
We also assume that the conditions in Theorem 3.2 are satisfied such that $\hat\theta_n$ is consistent and asymptotically normal:
\[ \sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{D} \mathcal{N}\Bigl( 0,\; D(\theta_0)^{-1} V(\theta_0) \bigl( D(\theta_0)^{-1} \bigr)^T \Bigr) . \tag{3.31} \]
We will discuss two different methods to estimate the standard deviation of $\hat\theta_n$.
The first method is based on estimating the asymptotic variance given by the $d$ by $d$ matrices $D(\theta_0)$ and $V(\theta_0)$ (recall that $d$ is the dimension of the parameter vector and also the dimension of $g$).
Given the regularity assumptions from Theorem 3.2 and the fact that
\[ D(\theta_0)_{i,j} = E_{Q_{\theta_0}}\bigl( \partial_{\theta_j} g_i(\Delta, X_0, X_1; \theta_0) \bigr) \]
and
\[ V(\theta_0) = E_{Q_{\theta_0}}\bigl( g(\Delta, X_0, X_1; \theta_0)\, g(\Delta, X_0, X_1; \theta_0)^T \bigr) , \]
Theorem 3.2, Theorem 3.6 and Lemma 3.7 in Sørensen [1999] give that we can estimate the variance according to
\[ \hat D(\theta_0) = \frac{1}{n} \frac{\partial G_n(\hat\theta_n)}{\partial \theta^T} \tag{3.32} \]
\[ \hat V(\theta_0) = \frac{1}{n} \sum_{i=1}^{n} g\bigl( \Delta_i, X_{t_{i-1}}, X_{t_i}; \hat\theta_n \bigr)\, g\bigl( \Delta_i, X_{t_{i-1}}, X_{t_i}; \hat\theta_n \bigr)^T , \tag{3.33} \]
where $\partial G_n(\theta) / \partial \theta^T$ is the $d$ by $d$ Jacobian matrix of derivatives of $G_n$ with respect to $\theta$.
Given the regularity conditions imposed, we would expect this method to estimate the variance of $\hat\theta_n$ well only when $n$ is large. This means that this variance estimator may not work well in finite samples. The fact that we need to calculate the Jacobian matrix of $G_n$ implies that this method will be computationally expensive for models with a high number of parameters. Often we will need to use numerical differentiation, which implies that the function $G_n$ will be evaluated several times for each parameter $\theta_i$; see also Appendix A for a discussion of the computations needed to calculate numerical derivatives.
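The estimator (3.32)-(3.33) translates directly into code. The sketch below is an illustration under stated assumptions: the user supplies a function `g(theta)` returning an array with one row per observation, the Jacobian is approximated by central finite differences with a hypothetical step size `h`, and the example uses a simple linear martingale estimating function for a Gaussian AR(1) process (a discretely observed OU process) rather than a general diffusion.

```python
import numpy as np


def sandwich_sd(g, theta_hat, h=1e-5):
    """Standard errors from D^{-1} V (D^{-1})^T / n, cf. (3.31)-(3.33).

    g(theta) must return an (n, d) array whose i-th row is the i-th term of G_n.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    terms = g(theta_hat)
    n, d = terms.shape
    # (3.32): (1/n) times the Jacobian of G_n, by central differences.
    D = np.empty((d, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = h
        D[:, j] = (g(theta_hat + step).sum(axis=0)
                   - g(theta_hat - step).sum(axis=0)) / (2 * h * n)
    # (3.33): (1/n) times the sum of outer products g_i g_i^T.
    V = terms.T @ terms / n
    D_inv = np.linalg.inv(D)
    cov = D_inv @ V @ D_inv.T
    return np.sqrt(np.diag(cov) / n)


# Toy data: X_i = c + rho * X_{i-1} + noise, with unit innovation variance.
rng = np.random.default_rng(2)
n, c0, rho0 = 2000, 0.5, 0.6
x = np.empty(n + 1)
x[0] = c0 / (1 - rho0)
for i in range(n):
    x[i + 1] = c0 + rho0 * x[i] + rng.standard_normal()


def g(theta):
    resid = x[1:] - theta[0] - theta[1] * x[:-1]
    return np.column_stack([resid, resid * x[:-1]])


# For this linear estimating function the root of G_n is the least squares fit.
design = np.column_stack([np.ones(n), x[:-1]])
theta_hat = np.linalg.lstsq(design, x[1:], rcond=None)[0]
sd = sandwich_sd(g, theta_hat)
print(theta_hat, sd)
```

Each column of the Jacobian costs two evaluations of $G_n$, which illustrates why the method becomes expensive when the number of parameters is large or when $g$ itself has to be simulated.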
To avoid the finite sample issues, a bootstrap-like method of estimating the standard deviation is suggested. Consider the case where an estimator $\hat\theta_n$ is derived from the estimating function as above. For these parameter values a large number of sample paths are simulated, and for each simulation the parameters are estimated. In this way standard errors can be calculated from the empirical standard deviation of the estimated parameters. To clarify the method, let $\{X_i\}_{i=1}^n$ denote the observed data and, for a given parameter vector $\theta$, let $\{Y_i(\theta)\}_{i=1}^n$ denote a simulated dataset from the model in question.
The bootstrap method suggested can then be illustrated by:
1. Estimate $\hat\theta_n$ from $\{X_i\}_{i=1}^n$.
2. Simulate $\{Y_i(\hat\theta_n)\}_{i=1}^n$ from $\hat\theta_n$.
3. Estimate $\tilde\theta$ from $\{Y_i(\hat\theta_n)\}_{i=1}^n$.
4. Record the value of $\tilde\theta$ and go to 2.
Repeating this algorithm $K$ times gives $K + 1$ (including $\hat\theta_n$) observations of the estimator, and the standard deviations can be calculated. The method has the advantage that it does not require the number of observations $n$ to increase. However, its main disadvantage is the computational time needed for the method to be precise. If the estimating equations have to be solved using numerical methods, this bootstrap method could be extremely costly in terms of computational time.
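The four steps above can be sketched as a small generic routine. This is an illustration only: the callables `estimate` and `simulate` are hypothetical placeholders for the estimator and the simulation scheme of the model at hand, and the example uses a Gaussian AR(1) process with unit innovation variance instead of a diffusion so that the block stays self-contained.

```python
import numpy as np


def bootstrap_sd(data, estimate, simulate, K, rng):
    """Parametric bootstrap standard deviations following steps 1-4."""
    theta_hat = np.asarray(estimate(data), dtype=float)               # step 1
    draws = [theta_hat]
    for _ in range(K):
        synthetic = simulate(theta_hat, len(data), rng)                # step 2
        draws.append(np.asarray(estimate(synthetic), dtype=float))    # steps 3-4
    # Empirical standard deviation over the K + 1 recorded estimates.
    return theta_hat, np.std(np.array(draws), axis=0, ddof=1)


def simulate_ar1(theta, size, rng):
    c, rho = theta
    x = np.empty(size)
    x[0] = c / (1 - rho)
    for i in range(1, size):
        x[i] = c + rho * x[i - 1] + rng.standard_normal()
    return x


def estimate_ar1(x):
    design = np.column_stack([np.ones(x.size - 1), x[:-1]])
    return np.linalg.lstsq(design, x[1:], rcond=None)[0]


rng = np.random.default_rng(4)
observed = simulate_ar1((0.5, 0.6), 500, rng)
theta_hat, sd = bootstrap_sd(observed, estimate_ar1, simulate_ar1, K=200, rng=rng)
print(theta_hat, sd)
```

The cost is $K$ simulations plus $K + 1$ estimations, which makes the trade-off described above explicit: every increase in $K$ or in the cost of solving the estimating equations multiplies the running time.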
A simulation study has been performed to compare the values calculated using (3.32)-(3.33) to the values calculated using the bootstrap method.
We have, as usual, simulated a CIR process with true values $(a, b, \sigma^2) = (1, 2, 1)$, and we use the approximation to the optimal quadratic estimating function, (3.30). We keep the time between observations fixed at $\Delta = 1$ and consider the effects of increasing the number of observations; in the bootstrap method we use $K = 1000$ simulations. The results are given in Table 3.4. Note that the results from the bootstrap method are similar to those of Table 3.3. If we had started the bootstrap method in the true parameters instead of the estimated parameters, we would have replicated exactly the process of the simulation study reported in Table 3.3.
                      sde
t_n   Estimator       \hat a_n    \hat b_n    \hat\sigma^2_n
100   Asymptotic      0.5432      0.2416      0.3310
      Bootstrap       0.4142      0.1702      0.3542
250   Asymptotic      0.3108      0.1331      0.1258
      Bootstrap       0.2712      0.1192      0.1893
500   Asymptotic      0.2047      0.0867      0.0830
      Bootstrap       0.1901      0.0795      0.0921
Table 3.4: Standard deviation calculated by asymptotic estimator and bootstrap.
It is difficult to conclude from Table 3.4 that one method is preferable to the other. The bootstrap method has the advantage of letting us increase $K$. However, a major disadvantage is that we need to use a discrete scheme to simulate the processes $\{Y_i(\theta)\}_{i=1}^n$. This means that some of the variation detected may be due to imprecision in the approximation scheme used. This can of course be counteracted somewhat by simulating with a very small $\Delta$. It is important to note, though, that the standard deviations calculated by the bootstrap method will always depend on both the simulation scheme used and the value of $K$. There is thus something a little arbitrary about the bootstrap method. For large datasets the method based on (3.32)-(3.33) may be preferable.
We now explore further the behavior of the estimator of the asymptotic standard deviation as the number of observations increases. We have, based on (3.32)-(3.33), calculated the estimated standard deviation of the estimators from the estimating function (3.30) for a series of simulated CIR processes where we increase the number of observations. To draw a similar graph for the bootstrap method would require a considerable number of simulations and an equally large amount of computational time. This illustrates that for large datasets the bootstrap method will not be a reasonable method. Figure 3.4 depicts the estimated asymptotic standard deviations as a function of the number of observations.
We note that the estimated asymptotic standard deviation decreases as the number of observations increases. However, for values of $n$ above 1300 observations the values appear to stabilize somewhat. This indicates that we should use these estimators with care when the datasets consist of few observations. In this case "few" means fewer than 1300, but in general this may of course vary greatly. If we suspect that a given data sample consists of too few observations to utilize (3.32)-(3.33), the bootstrap represents a practical alternative.
[Figure 3.4: line plot of sde($\tilde a$), sde($\tilde b$) and sde($\tilde\sigma^2$) against the number of observations, from 500 to 1800.]
Figure 3.4: The estimated asymptotic standard deviation of the estimators from the approximation to the optimal quadratic estimating function. Data is a series of simulated CIR processes with $(a, b, \sigma^2) = (1, 2, 1)$, $\Delta = 1$ and $t_n$ varying from 500 to 1800.
3.2.6 Estimators based on eigenfunctions
From the above examples of polynomial estimating functions we now present a method of finding an estimating function that is closely related to the exact type of stochastic differential equation in question. The concepts and results in this section follow to some degree from Kessler and Sørensen [1995].
Consider the time-invariant one-dimensional stochastic differential equation worked with throughout this chapter,
\[ dX_t = \mu(X_t; \theta)\,dt + \sigma(X_t; \theta)\,dW_t . \]
From Chapter 2 we have the infinitesimal operator (or generator), $\mathcal{A}_\theta$, for the SDE given by
\[ \mathcal{A}_\theta f(x) = \mu(x; \theta) \frac{\partial f}{\partial x}(x) + \frac{1}{2} \sigma^2(x; \theta) \frac{\partial^2 f}{\partial x^2}(x) \]
for any $C^2$ function $f : \mathbb{R} \to \mathbb{R}$.
We now define an eigenfunction for this operator in much the same way as an eigenvector for a matrix.
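Definition 3.1 (Eigenfunction for an infinitesimal operator)
A $C^2$ function $\varphi(\cdot\,; \theta)$ is called an eigenfunction for $\mathcal{A}_\theta$ with eigenvalue $\lambda(\theta)$ if
\[ \mathcal{A}_\theta \varphi(x; \theta) = -\lambda(\theta) \varphi(x; \theta) \]
for all $x$ under $P_\theta$, where $\lambda(\theta) \in \mathbb{R}$.
For a given operator $\mathcal{A}_\theta$ we call the set of all eigenvalues, $\Lambda_\theta$, the spectrum of $\mathcal{A}_\theta$, and it is possible to show that $\Lambda_\theta \subset [0, \infty)$.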
We wish to find an expression for the conditional mean of the process $X$ transformed by an eigenfunction for the infinitesimal operator. Recall that the derivative of the scale function (sometimes referred to as the density of the scale measure) $s'(x; \theta)$ is defined by
\[ s'(x; \theta) = \exp\left( -2 \int_{x_0}^{x} \frac{\mu(z; \theta)}{\sigma^2(z; \theta)}\,dz \right) . \]
We note the following from Kessler and Sørensen [1995], section 5.
Lemma 3.1 (Conditional mean)
If X is ergodic with invariant measure and X
0
then the condition
_
I

(x; )(s

(x; ))
1
dx < (3.34)
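is sufficient for the following to hold:
\[ E_\theta\bigl[ \varphi(X_{t_i}; \theta) \mid X_{t_{i-1}} = x \bigr] = e^{-\lambda(\theta)\Delta_i} \varphi(x; \theta) \tag{3.35} \]
where as usual $\Delta_i = t_i - t_{i-1}$.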
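Given a diffusion process with infinitesimal operator $\mathcal{A}_\theta$ and $N$ eigenfunctions $\varphi_j(x; \theta)$, $j = 1, \ldots, N$, with eigenvalues $\lambda_j(\theta)$, $j = 1, \ldots, N$, we can thus define a martingale estimating function by
\[
g\bigl( \Delta_i, X_{t_{i-1}}, X_{t_i}; \theta \bigr)
= \sum_{j=1}^{N} \alpha_j(\Delta_i, X_{t_{i-1}}; \theta) \Bigl[ \varphi_j(X_{t_i}; \theta) - E_\theta\bigl( \varphi_j(X_{t_i}; \theta) \mid X_{t_{i-1}} \bigr) \Bigr]
= \sum_{j=1}^{N} \alpha_j(\Delta_i, X_{t_{i-1}}; \theta) \Bigl[ \varphi_j(X_{t_i}; \theta) - e^{-\lambda_j(\theta)\Delta_i} \varphi_j(X_{t_{i-1}}; \theta) \Bigr] . \tag{3.36}
\]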
In accordance with Theorem 3.1, and under the regularity conditions stated there, there exist optimal estimating functions in both the fixed sample and the asymptotic sense within the class of functions of the form above. These optimal functions satisfy (3.10) and can in some cases be found explicitly. The fact that $E_\theta\bigl[ \varphi(X_{t_i}; \theta) \mid X_{t_{i-1}} = x \bigr]$ is explicitly known means that we do not have to fear the numerical problems that arose when simulations were needed. For a more thorough discussion of Theorem 3.1 in the special case of estimating functions based on eigenfunctions, see Proposition 3.1 in Kessler and Sørensen [1995].
A major advantage of working with an optimal estimating function based on eigenfunctions is that it is invariant under bijective data transformations. That is, if $u$ is a one-to-one and onto $C^2$ function, then the diffusion process derived from the process $X$ by the transformation $Y_t = u(X_t)$ has the same eigenvalues as the original process $X$ and eigenfunctions given by $\varphi_j(u^{-1}(Y_t); \theta)$. It follows that the optimal estimating function based on eigenfunctions for the transformed data $Y$ equals the optimal estimating function for $X$.
In the case of the CIR process, used as a recurring example above, the generator is
\[ \mathcal{A}_\theta f(x) = a(b - x) \frac{\partial f}{\partial x}(x) + \frac{1}{2} \sigma^2 x \frac{\partial^2 f}{\partial x^2}(x), \qquad a, b, \sigma \in \mathbb{R}_+ , \]
the spectrum is discrete and given by
\[ \Lambda_\theta = \{ \lambda_j(\theta) \mid j = 1, 2, \ldots \} = \{ 0, a, 2a, 3a, \ldots \} \]
and the eigenfunctions can be shown to be given by the Laguerre polynomials. We are thus back in the class of polynomial estimating functions discussed previously; see Example 2.1 in Kessler and Sørensen [1995] and section 30 in Spiegel and Liu [1999] for the Laguerre polynomials.
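As a worked illustration, the two lowest non-constant eigenfunctions of the CIR generator can be verified by direct differentiation. The block below is a sketch under stated assumptions: the symbols $\varphi_1$, $\varphi_2$ and $\mathcal{A}_\theta$ are notation chosen for this illustration, the polynomials are left unnormalised, and the sign convention $\mathcal{A}_\theta \varphi = -\lambda \varphi$ is assumed so that the eigenvalues are $a$ and $2a$.

```latex
\begin{align*}
\varphi_1(x) &= x - b,
  & \mathcal{A}_\theta \varphi_1(x) &= a(b - x) = -a\,\varphi_1(x), \\
\varphi_2(x) &= x^2 - \Bigl(2b + \tfrac{\sigma^2}{a}\Bigr)x + b\Bigl(b + \tfrac{\sigma^2}{2a}\Bigr),
  & \mathcal{A}_\theta \varphi_2(x) &= a(b - x)\Bigl(2x - 2b - \tfrac{\sigma^2}{a}\Bigr) + \sigma^2 x \\
  &&&= -2a x^2 + (4ab + 2\sigma^2)x - 2ab^2 - b\sigma^2 = -2a\,\varphi_2(x).
\end{align*}
```

Applying the conditional mean property of eigenfunctions to $\varphi_1$ gives $E_\theta[X_{t_i} - b \mid X_{t_{i-1}} = x] = e^{-a\Delta_i}(x - b)$, which agrees with the conditional mean of the CIR process derived in Example 3.3.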
In the case where the spectrum is discrete, as for the CIR process, we can to some degree choose the number of eigenfunctions to include in the estimating function, that is, we can choose the number $N \geq 1$ in (3.7). This holds in general for both a discrete and a continuous spectrum. When we are able to find $N$ different eigenfunctions, one might expect a better estimator from using all of them instead of $K < N$ eigenfunctions. From Kessler and Sørensen [1995], section 4, we have that the asymptotic variance of the optimal estimating function based on $n + 1$ eigenfunctions is lower than the asymptotic variance of the optimal estimating function based on $n$ eigenfunctions. This positive effect has to be weighed against the additional complexity of the estimating function when including more eigenfunctions.
As mentioned, the advantages of working with eigenfunctions are mostly based on the following:
• The conditional mean $E_\theta\bigl[ \varphi(X_{t_i}; \theta) \mid X_{t_{i-1}} = x \bigr]$ is explicitly known and thus gives a natural way of defining martingale estimating functions.
• The optimal estimating function in the sense of Theorem 3.1 can be explicitly calculated in a number of cases, see for instance Larsen and Sørensen [2003] for an example.
• The optimal estimating functions are invariant under smooth bijective transformations of the data.
However, one major disadvantage of working with eigenfunctions lies in the way the estimating function is derived. This approach is closely connected to the specific diffusion process considered. In some of the classical models we can rely on the literature to provide eigenfunctions, but when faced with a given diffusion model we may not be able to find any eigenfunctions at all and are thus forced to use other methods for parameter estimation, such as polynomial estimating functions. For instance, the approximate optimal quadratic estimating function introduced above can be implemented for practically any given diffusion model. We may have to simulate the conditional moments, and thus accept some numerical inaccuracy, but in return we are given a method of parameter estimation that is fairly easy to implement, consistent and asymptotically normal under mild regularity conditions, and works for a large class of models.
In conclusion, both the strengths and the weaknesses of estimating functions based on eigenfunctions stem from the fact that such an estimating function is closely tailored to the exact type of diffusion considered. In general we can thus conclude that whenever an estimating function based on eigenfunctions is available it will be a very good choice, and otherwise a quadratic estimating function will be an acceptable alternative.
3.3 Model misspecification analysis
We now turn to the problem of assessing how well a proposed model fits the given data. We therefore consider the case of $n + 1$ observations, e.g. of the price of a financial asset,
\[ X_{t_0}, X_{t_1}, \ldots, X_{t_n} \]
with a postulated model
\[ dX_t = \mu(X_t; \theta)\,dt + \sigma(X_t; \theta)\,dW_t . \]
Assume that we have obtained an estimator $\hat\theta_n$ by use of e.g. maximum likelihood, an estimating function, the generalized method of moments or some other method. In this section the exact method used to calculate the estimators is not important; the question is how to determine whether the postulated model with the estimated parameter actually fits the observations.
When working within a discrete time frame, a fundamental model assumption is the distribution of the residuals, or at least that these residuals have mean zero, are independent and identically distributed, or satisfy some other essential verifiable distributional assumptions. The theory of model misspecification in discrete time is largely based on testing these assumptions on the estimated residuals. In a continuous time setting we have a number of methods for testing the imposed model against the observed data.
From Theorem 2.9 we have conditions ensuring the existence of an invariant (stationary) distribution of the diffusion. If such a marginal distribution exists, we can test the model fit by examining whether the observations appear to be drawn from this distribution. Clearly this method is mostly useful for rejecting a postulated model. For instance, we cannot conclude that a set of observations has been generated by a Cox-Ingersoll-Ross type process merely because the observations appear to be drawn from a Gamma distribution as stated in Example 2.3. We can, however, justify ruling out the CIR process if there are no indications that $X_{t_0}, X_{t_1}, \ldots, X_{t_n}$ are gamma-distributed; at least if $n$ is large we would expect from Theorem 2.9 that the process approaches the invariant (stationary) distribution.
In the cases where the transition probabilities are known we can of course use these to test the model specification. In some cases, such as the CIR model, we can calculate the autocorrelation function analytically and see how well it fits the observations. In the case of the CIR process the autocorrelation function turns out to be given by the simple expression $\rho(t) = \mathrm{corr}(X_s, X_{s+t}) = e^{-at}$. This is again a useful method for testing whether a suggested model is incorrect, and in the case of the CIR process it is quite simple to implement in a graphical analysis, as a decreasing exponential function is easily recognizable.
Another approach to model misspecification analysis is the method of uniform
residuals. Consider the case of equidistant observations (although the method
easily generalizes to cover the case of non-equidistant observations). Let $F(x \mid y; \theta)$
denote the conditional distribution function of $X_{t_1}$ given $X_{t_0} = y$ in the postulated
model, and consider the transformation of data, where $\theta_0$ is the true unknown value
of $\theta$,

$$u_i = F(X_{t_i} \mid X_{t_{i-1}}; \theta_0).$$

If the observations $X_{t_0}, X_{t_1}, \ldots, X_{t_n}$ are truly generated by the model in question
then it follows that the $u_i$'s are independent and uniformly distributed on $[0, 1]$.
However, in the majority of models $F$ must be calculated by simulations and of
course we don't know the true value of $\theta$. For the estimated values $\hat\theta$ we can easily
simulate a large number of sample paths of $X_{t_1}$ given $X_{t_0}$ and thus derive a close
approximation, $\hat F$, to the distribution function $F$. Define the estimated uniform
residuals $\hat u$ by

$$\hat u_i = \hat F(X_{t_i} \mid X_{t_{i-1}}; \hat\theta).$$

Then, if the suggested model is a good description of the observed data, the $\hat u_i$'s
are approximately independent and uniformly distributed on $[0, 1]$.
This method thus provides us with a series of data which, if the model is correctly
specified, follows a specific distribution, and we are in a situation where we can
perform both graphical and more formal goodness-of-fit tests.
In particular we can derive a test statistic from the uniform residuals by

$$\hat U_n = -2 \sum_{i=1}^{n} \log(\hat u_i). \qquad (3.37)$$

If $\hat u_1, \ldots, \hat u_n$ are independent and uniformly distributed on $[0, 1]$, which would be
the case when the model is true, then $\hat U_n \sim \chi^2(2n)$.
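The statistic (3.37) and its comparison with the $\chi^2(2n)$ distribution can be sketched as follows. This is a minimal illustration using only the Python standard library. The Wilson-Hilferty approximation to the chi-square quantile and the two-sided rejection rule are assumptions made here for the sketch, not necessarily the computation used in the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_statistic(u):
    """U = -2 * sum(log(u_i)); chi-square with 2n degrees of freedom for iid uniforms."""
    return -2.0 * sum(math.log(v) for v in u)

def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the p-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3

def two_sided_reject(u, level=0.05):
    """Reject when U falls outside the central (1 - level) region of chi-square(2n)."""
    stat = fisher_statistic(u)
    df = 2 * len(u)
    lower = chi2_quantile_wh(level / 2.0, df)
    upper = chi2_quantile_wh(1.0 - level / 2.0, df)
    return stat < lower or stat > upper
```

Residuals that are exactly 0 make the logarithm undefined, so such values have to be handled before the statistic is computed.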
Example 3.4 (The uniform residuals)
[Figure 3.5, four panels: (a) raw plot of the uniform residuals; (b) cross-plot of $\hat u_i$ and $\hat u_{i-1}$; (c) QQ-plot of the uniform residuals against a uniform distribution; (d) autocorrelation function for the uniform residuals, lags 1 to 12.]
Figure 3.5: The uniform residuals $\hat u_i$ based on a simulated CIR process with true parameter values
(1, 2, 1) for the two drift parameters and the squared diffusion parameter. Data sampled from simulated data for $\Delta = 2.5$ and $t_n = 250$, thus giving
a total of 100 observations.
A number of approximations (estimated parameters and simulated distribution
function) are used to derive $\hat u_i$. Clearly, even for a correctly specified model the
uniform residuals will deviate somewhat from a true uniform distribution. We
therefore perform a simulation study of how well we can expect the estimated
uniform residuals to approximate an iid sequence of uniformly distributed variables.
[Figure 3.6, four panels: (a) raw plot of the uniform residuals; (b) cross-plot of $\hat u_i$ and $\hat u_{i-1}$; (c) QQ-plot of the uniform residuals against a uniform distribution; (d) autocorrelation function for the uniform residuals, lags 1 to 12.]
Figure 3.6: The uniform residuals $\hat u_i$ based on a simulated CIR process with true parameter values
(1, 2, 1) for the two drift parameters and the squared diffusion parameter. Data sampled from simulated data for $\Delta = 1$ and $t_n = 250$, thus giving
a total of 250 observations.
We consider again the case where the data stems from a CIR model. As above
we simulate the data with equidistant discretization points with step size 0.0001. From
this data we then sample observations according to the various values of the
observation interval $\Delta$ given below.
For each sample path the parameters are estimated using the quadratic estimating
function (3.30) also examined in Example 3.3. For the estimated parameters
the uniform residuals are calculated using a simulated conditional distribution
function. For the simulated distribution function we use 1000 simulated sample
paths with 1000 discretization points between $X_i$ and $X_{i+1}$.
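The simulated conditional distribution function can be sketched as below. This is a minimal illustration under stated assumptions: an Euler scheme with the square root truncated at zero is used for the CIR dynamics $dX_t = a(b - X_t)dt + \sigma\sqrt{X_t}dW_t$, which is a choice made here and not necessarily the scheme used for the results reported; all function and argument names are hypothetical.

```python
import math
import random

def simulate_cir_endpoint(x0, a, b, sigma, delta, steps, rng):
    """One Euler path of dX = a(b - X)dt + sigma*sqrt(X)dW over a time span delta."""
    h = delta / steps
    x = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(h))
        # max(x, 0) keeps the square root defined if the scheme dips below zero
        x = x + a * (b - x) * h + sigma * math.sqrt(max(x, 0.0)) * dw
    return x

def simulated_cdf(x, y, a, b, sigma, delta, paths=1000, steps=1000, seed=0):
    """Monte Carlo estimate of P(X_{t+delta} <= x | X_t = y)."""
    rng = random.Random(seed)
    hits = sum(simulate_cir_endpoint(y, a, b, sigma, delta, steps, rng) <= x
               for _ in range(paths))
    return hits / paths

def uniform_residuals(obs, a, b, sigma, delta, **kwargs):
    """Estimated uniform residuals for consecutive pairs of observations."""
    return [simulated_cdf(obs[i], obs[i - 1], a, b, sigma, delta, **kwargs)
            for i in range(1, len(obs))]
```

Because the estimate is a fraction of a finite number of paths, it can be exactly 0 or 1 when an observation lies outside the range of all simulated endpoints, in which case $\log(\hat u_i)$ or $\log(1 - \hat u_i)$ is not finite.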
As in the previous examples we consider the three values of the observation
interval $\Delta \in \{2.5, 1, 0.1\}$ and $t_n \in \{100, 250, 500\}$. Figure 3.5 - Figure 3.7 present the
results for $t_n = 250$ and the three different values of $\Delta$. The results for $t_n = 100$
and $t_n = 500$ are quite similar and the respective graphs are, for reasons of space,
omitted in the following.
For each case below the parameter estimates are naturally similar to those reported
in Table 3.3 and thus improve somewhat when either the number of observations
increases or the observation interval decreases (or of course both). See Example
3.3 for an examination of the parameter estimators used in the following.
Figure 3.5 presents the results for $\Delta = 2.5$. We note no particular system in the
raw uniform residuals (Figure 3.5(a)) or in the cross-plot of $\hat u_i$ against $\hat u_{i-1}$
(Figure 3.5(b)). This corresponds quite well with independence of the residuals.
However, the QQ-plot in Figure 3.5(c) indicates some deviation from a true uniform
distribution. Also the autocorrelation graph in Figure 3.5(d) indicates some deviation
from a sequence of iid uniformly distributed variables. It is worth keeping in mind
that we know that the structural model is correctly specified. This means that the
deviations in Figure 3.5 stem solely from the parameter estimates, from the simulated
approximation to the distribution function, or from both.
[Figure 3.7, four panels: (a) raw plot of the uniform residuals; (b) cross-plot of $\hat u_i$ and $\hat u_{i-1}$; (c) QQ-plot of the uniform residuals against a uniform distribution; (d) autocorrelation function for the uniform residuals, lags 1 to 12.]
Figure 3.7: The uniform residuals $\hat u_i$ based on a simulated CIR process with true parameter values
(1, 2, 1) for the two drift parameters and the squared diffusion parameter. Data sampled from simulated data for $\Delta = 0.1$ and $t_n = 250$, thus giving
a total of 2500 observations.
We note that Figure 3.5 is made on the basis of $t_n = 250$ and $\Delta = 2.5$, thus giving
only 100 observations and quite a large interval between each observation.
Figure 3.6 shows the results for a smaller observation interval, that is $\Delta = 1$.
We note considerable improvements compared to the results for $\Delta = 2.5$. Part of
this improvement stems from the fact that we have more observations, but equally
important is the decrease in $\Delta$. We note some signs of correlation but the QQ-plot
is greatly improved from the previous graph.
Finally Figure 3.7 displays the results for $\Delta = 0.1$ and $t_n = 250$, which means a
total of 2500 observations. For this number of observations the QQ-plot reveals
an almost perfect fit, although some correlation can still be found. However, compared to
the previous results (for $\Delta = 2.5$ and $\Delta = 1$) the improvement is significant.
When studying the respective graphs for fixed $\Delta$ and $t_n$ increasing we note results
similar to the improvements depicted in Figure 3.5 - Figure 3.7. We are thus
quite confident that, for datasets with observations measured in the thousands,
the method of uniform residuals provides an accurate graphical method of model
misspecification analysis.
To conclude this example we note that the test statistic $\hat U$ given by (3.37) cannot
reject the model in any of the cases studied above. This also indicates that if the
model is indeed correctly specified, this method should point towards that fact.
3.4 Empirical data example
Having already derived and examined an estimator that works quite well in the
case of the CIR process, we now finish the row of CIR examples by using the
methods introduced above to apply this model to a data series containing
observations of a short rate process. The data considered here is the one-month
Eurodollar deposit spot rate, and consists of daily observations from January 4,
1971 to February 27, 2004, in total T = 8450 observations [2] (weekends have been
ignored; that is, Monday is treated as the day following Friday). We will
present a far more detailed data analysis for the datasets used in the empirical
analysis in the following chapter; for now we simply consider how well the CIR
model can be said to be a true model describing this data.
Based on Figure 3.8 we note that it might not be reasonable to treat the
diffusion process as a time-homogeneous process. In particular the early 1980s
are characterized by considerably higher rates than the remainder of the period
covered in this sample. One way of dealing with this is to restrict our attention
to a sub-sample of the data, thus treating the higher observations as belonging
to a different regime. A different approach, maintaining time-invariant drift and
diffusion functions, is to consider a more general drift function instead of the
linear drift of the CIR model. This will be the topic of later chapters; for now we
maintain the model specification given by

$$dX_t = a(b - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t.$$

We estimate the parameters using the approximation to the optimal quadratic
estimating function (3.30), and for the entire sample we find the estimates given
in Table 3.5.

[Figure 3.8: Daily observations of the one month Eurodollar rate (series 1_month; horizontal axis years, vertical axis rate from 2.5 to 25.0).]

Parameter   Estimate (standard deviation)
a           0.001159 (0.0009818)
b           6.5816 (2.6162)
sigma^2     0.009805 (0.00174)
Table 3.5: Estimation results of applying the CIR model to the one month Eurodollar rate, full
sample i.e. 1971-2004. The standard deviations are calculated using (3.32)-(3.33).

[2] Source: Federal Reserve Statistical Release H.15, http://www.federalreserve.gov/releases/h15/data.htm
The uniform residuals are calculated as stated above. Due to rounding errors
caused by a generally poor model fit we are left with some residuals equal to
0 and 1; if we ignore these we are left with 8375 observations that should stem
from a sequence of iid uniformly distributed variables. The test statistic is found
to be $\hat U = 15629.5$, which, if the model were true, should be an observation from
a $\chi^2(16750)$-distribution. This distribution has a 2.5% quantile of 16393.2, and
based on the test statistic we thus reject the hypothesis. It should be noted, though,
that the fact that we ignore a few observations corresponding to high values of
$-2\log(\hat u_i)$ means that this test statistic should be viewed with some caution, and
inference should be based on methods such as QQ-plots and autocorrelation, where
we do not face the same problems.

Parameter   Estimate (standard deviation)
a           0.0010697 (0.0006461)
b           5.0679 (1.1890)
sigma^2     0.0011610 (0.0001306)
Table 3.6: Estimation results of applying the CIR model to the one month Eurodollar rate,
sub-sample i.e. 1984-2001.
A QQ-plot of the uniform residuals, as well as a plot of the autocorrelation
function, also clearly rejects that these residuals are either uniform or independent.
[Figure 3.9, two panels: (a) QQ-plot of the uniform residuals against a true uniform distribution; (b) autocorrelation function for the uniform residuals $\hat u_i$, lags 1 to 12.]
Figure 3.9: The uniform residuals $\hat u_i$ based on the CIR model
estimated on the one month Eurodollar rate from 1971 to 2004
From the above analysis it is clear that a simple model such as the CIR model
cannot be used to describe interest rate data in the long run without allowing
the values of the parameters to adjust to different regimes of the spot rate. This,
however, would violate the assumption of time-invariant drift and diffusion
parameters. It may still be true, though, that the CIR model is sufficient to describe
the short rate between regime changes, that is, for periods where there is no
substantial political/financial pressure to raise or lower the interest rate.
We restrict our attention to a sub-sample of the data, thus avoiding the
extraordinarily high rates of the early 1980s and the low rates of recent years; we
thus consider the data from January 3, 1984 to July 2, 2001.
The estimates are given in Table 3.6.
Calculating the uniform residuals again gives a test statistic $\hat U = 7267.58$, which
should follow a $\chi^2(8888)$ distribution, again clearly rejecting the proposed model
specification. This is also seen from Figure 3.10. We note some improvement in
the correlation, but the residuals are clearly not observations from an iid sequence
of uniformly distributed variables.
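As a rough check on the size of the two rejections, one can standardize the reported statistics using the normal approximation to the $\chi^2(k)$ distribution, which has mean $k$ and variance $2k$. This approximation is an assumption made here for illustration and is not the computation behind the quantile quoted for the full sample:

```latex
z_{\text{full}} = \frac{15629.5 - 16750}{\sqrt{2 \cdot 16750}}
                \approx \frac{-1120.5}{183.0} \approx -6.1,
\qquad
z_{\text{sub}} = \frac{7267.58 - 8888}{\sqrt{2 \cdot 8888}}
               \approx \frac{-1620.4}{133.3} \approx -12.2 .
```

Both values lie far in the lower tail of a standard normal distribution, in line with the rejections stated in the text.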
[Figure 3.10, two panels: (a) QQ-plot of the uniform residuals against a true uniform distribution; (b) autocorrelation function for the uniform residuals $\hat u_i$, lags 1 to 12.]
Figure 3.10: The uniform residuals $\hat u_i$ based on the CIR model
estimated on the subsample of the one month Eurodollar rate
from 1984 to 2001
From the analysis of both datasets we see that the CIR model is hardly a satisfying
description of the short rate process, at least not for the data discussed in this
example. It may still be possible to get a reasonable model fit by choosing
even smaller periods of time for our observations, such as the second half
of the 1990s where the interest rate appears to be quite stable, but that does
not remove the need for a more general model of the short rate than the CIR
model. A wide range of models has been suggested, the majority focusing on the
diffusion function while maintaining a linear mean-reverting drift function, but
non-linear drift functions have also been suggested. We shall return to these topics
below.
3.5 The Aït-Sahalia method
We conclude this chapter by discussing an alternative method for both model
specification and parameter estimation suggested by Aït-Sahalia [1996b]. This
method was suggested as a test of models for the spot interest rate that are
similar to the diffusions discussed above, that is, univariate diffusions that are
stationary Markov processes. The main purpose of the following method is to
conduct model misspecification analysis; the actual parameter estimation can be
thought of as a byproduct.
The main focus of the method is on the marginal density as given by Theorem
2.9. For the remainder of this section we denote this density by $\pi(x; \theta)$. Consider
the following setup.
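Denote the true drift and diffusion functions by $\mu_0(x)$ and $\sigma_0(x)$.
Denote the possible drift and diffusion functions under the suggested model
by $P$, that is

$$P = \{(\mu(\cdot; \theta), \sigma(\cdot; \theta)) \mid \theta \in \Theta\}.$$

For instance in the CIR model $P$ would be the space of all functions of the
form $x \mapsto a(b - x)$ and $x \mapsto \sigma\sqrt{x}$ where $\theta = (a, b, \sigma) \in \mathbb{R}^3_{++}$.
Consider the null hypothesis

$$H_0 : \exists\, \theta_0 \in \Theta \mid \mu(\cdot; \theta_0) = \mu_0(\cdot) \text{ and } \sigma(\cdot; \theta_0) = \sigma_0(\cdot)$$

and the alternative hypothesis

$$H_1 : (\mu_0(\cdot), \sigma_0(\cdot)) \notin P.$$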
Simply put, in the case of the CIR model the hypothesis $H_0$ is the question: can
we find $(\hat a, \hat b, \hat\sigma)$ such that $\hat a(\hat b - x) = \mu_0(x)$ and $\hat\sigma\sqrt{x} = \sigma_0(x)$? Recall that
$\mu_0$ and $\sigma_0$ denote the true unknown drift and diffusion functions.
Clearly $H_0$ is another way of formulating the question asked above: do any
parameter values exist for which our suggested model equals the true process?
Above we considered the case where we already had a parameter estimate; here
we do it the other way around.
Since we do not know the true functions $(\mu_0(\cdot), \sigma_0(\cdot))$ we need an alternative way
of testing the hypothesis $H_0$. Denote by $\pi_0(x)$ the true marginal density, let $\Pi_M$
be the space of all density functions given by Theorem 2.9 for the functions $\mu(\cdot; \theta)$
and $\sigma(\cdot; \theta)$, and consider the following null hypothesis and alternative

$$H_{M0} : \exists\, \theta_0 \in \Theta \mid \pi(\cdot; \theta_0) = \pi_0(\cdot)$$
$$H_{M1} : \pi_0(\cdot) \notin \Pi_M.$$

It follows that $H_0$ cannot be true if $H_{M0}$ is not true. It may seem that working
with $H_{M0}$ is little improvement as $\pi_0(\cdot)$ is unknown as well, but $\pi_0$ can often be
estimated by a nonparametric estimator $\hat\pi_0$ which does not assume that the null
hypothesis is correct and thus converges to the true density whether or not the
parametric model given by $\mu(\cdot; \theta)$ and $\sigma(\cdot; \theta)$ is a correct description of the true
process. Aït-Sahalia [1996b] introduces the following test statistic

$$\hat M = n b_n \min_{\theta \in \Theta} \sum_{i=1}^{n} \left(\pi(X_i; \theta) - \hat\pi_0(X_i)\right)^2$$

where $b_n$ is the bandwidth of the nonparametric estimator $\hat\pi_0$.
The main idea behind this test statistic is that when the suggested model is true
there will exist a parameter $\theta_0$ such that $H_{M0}$ holds and thus $\hat M$ should be small.
When the parameterized model is incorrect the nonparametric estimator will still
converge to the true density whereas the parametric density $\pi(\cdot, \theta)$ will not be
near the true density and thus $\hat M$ will be large. By the same arguments an
estimator is derived from the difference between the parametric density and the
nonparametric estimator by

$$\hat\theta_M = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n} \left(\pi(X_i; \theta) - \hat\pi_0(X_i)\right)^2.$$
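The minimum-distance idea can be sketched for the CIR case, where the stationary density of $dX_t = a(b - X_t)dt + \sigma\sqrt{X_t}dW_t$ is a Gamma density with shape $2ab/\sigma^2$ and rate $2a/\sigma^2$. The sketch below is an illustration under stated assumptions only: the Gaussian kernel, the fixed bandwidth, the grid search and all function names are choices made here and are not the implementation of Aït-Sahalia [1996b].

```python
import math

def gamma_density(x, shape, rate):
    """Gamma density; for the CIR model shape = 2ab/sigma^2 and rate = 2a/sigma^2."""
    if x <= 0.0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))

def kernel_density(x, data, bandwidth):
    """Gaussian kernel density estimate at x, standing in for the nonparametric estimator."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
    return total / (n * bandwidth * math.sqrt(2.0 * math.pi))

def distance(data, shape, rate, bandwidth):
    """Sum over observations of the squared gap between parametric and kernel density."""
    return sum((gamma_density(x, shape, rate) - kernel_density(x, data, bandwidth)) ** 2
               for x in data)

def min_distance_fit(data, shapes, rates, bandwidth):
    """Grid search returning (distance, shape, rate) with the smallest distance."""
    return min((distance(data, s, r, bandwidth), s, r)
               for s in shapes for r in rates)
```

Note that the fit is carried out over (shape, rate) only: the marginal density depends on $a$ and $\sigma^2$ through their ratio, so the three CIR parameters cannot all be recovered from it.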
In Aït-Sahalia [1996b] the asymptotic normal distributions of both $\hat M$ and $\hat\theta_M$
are stated and we are thus able to test the null hypothesis at any given level,
remembering that small values of $\hat M$ are favorable for the hypothesis.
Aït-Sahalia [1996b] considers a series of regularity conditions including existence of
an invariant measure and sufficient differentiability of the invariant density, see
Assumptions A1-A5 in Aït-Sahalia [1996b]. Given these conditions it follows that

$$n^{1/2}\left(\hat\theta_M - \theta_0\right) \xrightarrow{D} \mathcal{N}(0, \Omega_M)$$
$$b_n^{-1/2}\left(\hat M - E_M\right) \xrightarrow{D} \mathcal{N}(0, V_M)$$

where $\Omega_M$, $E_M$ and $V_M$ are known expressions derived in the paper.
As mentioned when we introduced model specification tests based on the invariant
distribution, $\pi$, it is obvious that any procedure based solely on the marginal
density does not utilize all the available information in the data. Although the
issues of exploiting the information contained in the transition probabilities of the
process are discussed in Aït-Sahalia [1996b], the empirical model testing is based
on the test statistic $\hat M$. The suggested procedure can thus be used primarily to
reject a given model specification. We can, however, imagine a situation where
a given incorrect model cannot be rejected based on the marginal density alone.
However, it turns out that acceptance of a false model is not the only problem with
this method; indeed, in a later study Pritsker [1998] shows indications that the
procedure suggested above rejects a true model too often. Chapman and Pearson
[2000] perform Monte Carlo studies of the nonparametric estimators used by
Aït-Sahalia and conclude that these estimators appear to have poor finite sample
properties.
Another problem when restricting attention to the marginal distribution concerns
the identification of the parameter vector $\theta$. Consider for instance the simple
Gaussian AR(1) model

$$y_t = \rho y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \text{iid } \mathcal{N}(0, \sigma^2),$$

which has the marginal distribution $\mathcal{N}\left(0, \frac{\sigma^2}{1 - \rho^2}\right)$ when $|\rho| < 1$; thus we cannot identify
the parameters $\rho$ and $\sigma^2$ from the marginal distribution alone. Clearly these
problems become more pronounced as we introduce more parameters into the
model. These issues are illustrated by the fact that it is evident from the
expression for the invariant distribution given in Theorem 2.9 that the relative scale of
the drift and diffusion parameters can vary arbitrarily. Sufficient conditions for
identification are discussed briefly in Aït-Sahalia [1996b] and are closely related
to the specific model considered.
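The identification problem can be made concrete with two small calculations. The sketch below is illustrative only; the function names are hypothetical, and the Gamma parameters are those of the stationary law of the CIR model $dX_t = a(b - X_t)dt + \sigma\sqrt{X_t}dW_t$.

```python
def ar1_marginal_variance(rho, sigma2):
    """Stationary variance sigma^2 / (1 - rho^2) of a Gaussian AR(1) with |rho| < 1."""
    if abs(rho) >= 1.0:
        raise ValueError("stationarity requires |rho| < 1")
    return sigma2 / (1.0 - rho * rho)

def cir_gamma_parameters(a, b, sigma2):
    """(shape, rate) of the Gamma stationary law of the CIR model."""
    return 2.0 * a * b / sigma2, 2.0 * a / sigma2

# Two different AR(1) parameter pairs sharing the marginal law N(0, 1)
same_ar1 = ar1_marginal_variance(0.0, 1.0) == ar1_marginal_variance(0.5, 0.75)

# Scaling a and sigma^2 by the same factor leaves the CIR marginal law unchanged
same_cir = cir_gamma_parameters(1.0, 2.0, 1.0) == cir_gamma_parameters(2.0, 2.0, 2.0)
```

In both cases the marginal distribution is identical although the dynamics differ, which is why information from the transition probabilities is needed to separate the parameters.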
Aït-Sahalia implements the procedure and tests a number of nested models for
the 7-day Eurodollar deposit spot rate and argues that a major source of
misspecification in the classical financial models is the linearity in the drift function.
Linearity of drift has been a persistent part of interest rate modelling in the
majority of the models used in the financial literature. Many models, such as the CIR
or the more general Constant Elasticity of Variance (CEV) model, allow for
nonlinearities in the diffusion function, but the norm has been to model the drift by
a linear mean-reverting function.
In particular Aït-Sahalia tests a number of nested models including the Vasicek,
CIR, CKLS and even more general models, and concludes that the only model
that cannot be rejected at the 95% level is a very general model with nonlinearities
in both drift and diffusion functions. We do however refer to the result of the study
in Pritsker [1998] that the $\hat M$ statistic has a tendency to reject the true model too
often.
We shall return to the non-linear drift model suggested by Aït-Sahalia in the
subsequent chapter, where we shall attempt to estimate the model and perform
model diagnostics by using estimating functions and uniform residuals instead of
the procedure suggested in Aït-Sahalia [1996b].
Chapter 4
Modelling the short rate
The emphasis in the previous chapters has been on introducing the classical
results from the theory of stochastic differential equations and presenting some
relatively new methods for parameter estimation. We have only vaguely touched
upon the main reason for discussing these subjects, which is implementing the
methods on financial data and determining whether or not a given model
specification can actually be considered a reasonable description of the dynamics of the
financial asset in question.
We will, in this chapter, limit our attention to one-dimensional parametric
diffusion processes and attempt to use the techniques from the previous chapters
to examine the properties of a new group of nested interest rate models. The
properties of one-factor short rate models have proven to be elusive; not only
has the common assumption of linear drift been questioned in recent literature,
the entire concept of using parametric models has been challenged by the
introduction of sophisticated nonparametric methods. Also the use of one-factor
models has come under scrutiny; in particular it has been proposed to introduce
a second stochastic process to describe the dynamics of the volatility, to capture
the volatility clustering often seen in one-factor models. We thus note that
the field of short rate modelling has a wide range of open issues where consensus
amongst researchers is sparse.
We begin with a short overview of the theory of term structure modelling.
4.1 The term structure of interest rates
A zero coupon bond, ZCB, (also known as a pure discount bond) is a financial
asset which from the point of view of the holder gives a claim to one certain unit
of account (e.g. a dollar) at time $T$ (the time of maturity for the bond). We will
denote the price of a ZCB at time $t < T$ by $P(t, T)$; for a fixed value of $t$ the
graph of $T \mapsto P(t, T)$ is called the term structure at time $t$.
The continuously compounded interest rate $R(t, T)$ is the yield to maturity at
time $t$ for investment in the bond maturing at time $T$, thus we have

$$P(t, T) = e^{-R(t,T)(T-t)} \qquad (4.1)$$

which gives an expression for $R$

$$R(t, T) = -\frac{1}{T - t}\ln\left(P(t, T)\right). \qquad (4.2)$$

For a fixed value of $t$ we refer to $T \mapsto R(t, T)$ as the yield curve at time $t$.
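The relations (4.1) and (4.2) are inverse to each other, which the following minimal sketch illustrates. The function names are hypothetical and times are measured in years by assumption.

```python
import math

def zcb_price(yield_rate, t, T):
    """Price P(t, T) = exp(-R(t, T) * (T - t)) as in (4.1)."""
    return math.exp(-yield_rate * (T - t))

def zcb_yield(price, t, T):
    """Yield R(t, T) = -ln(P(t, T)) / (T - t) as in (4.2); requires T > t."""
    if T <= t:
        raise ValueError("maturity T must be later than t")
    return -math.log(price) / (T - t)
```

Converting a yield to a price and back recovers the original yield up to floating-point error.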
From $R(t, T)$ we obtain a formal definition of the short rate (or instantaneous or
spot rate) at time $t$ as the rate obtained in the limit as $T$ approaches $t$ in (4.2),
that is

$$r_t = R(t, t) = \lim_{T \to t} R(t, T). \qquad (4.3)$$

The spot rate is often referred to as the risk-free rate since it is the guaranteed
return for placements between time $t$ and $t + dt$.
Since the object of interest in the following will be specifying the dynamics of $r_t$
it is useful to recall the term structure equation. Assume that the short rate
dynamics are given by a usual type SDE

$$dr_t = \mu(r_t; \theta)\,dt + \sigma(r_t; \theta)\,dW_t. \qquad (4.4)$$

Introduce the notation $P(t, T) = F(t, r_t, T)$ to indicate the term structure's
dependency on the short rate. Then the term structure $P(t, T)$ (and other
derivatives) will be determined by the partial differential equation (function arguments
have been omitted)

$$\frac{\partial F}{\partial t} + (\mu - \lambda\sigma)\frac{\partial F}{\partial r} + \frac{1}{2}\sigma^2\frac{\partial^2 F}{\partial r^2} - rF = 0 \qquad (4.5)$$
$$F(T, r_T, T) = 1 \qquad (4.6)$$

where $\lambda = \lambda(t, r_t; \theta)$ is the market price of risk. See for instance Björk [1998]
or Duffie [1992] for an explicit derivation and discussion of the term structure
equation (4.5)-(4.6) and the market price of risk. This means that apart from
specifying the dynamics of the short rate we need to determine the market price of
risk before we are able to price derivatives. From financial theory it is well known
that the term $\mu - \lambda\sigma$ is the drift of the short rate under the risk-neutral martingale
measure, often called $Q$, whereas the dynamics of $r$ have been specified under the
objective (or observed) probability measure, referred to as $\mathcal{P}$. Specifying the
market price of risk is equivalent to specifying the martingale measure, $Q$, which
in this one-factor model is equivalent to specifying the dynamics of the short rate
under the probability measure $Q$. From Girsanov's Theorem we know that the
diffusion term of the stochastic differential equation will be unaffected by the
measure change but the drift will not. However, to obtain good estimates of the
diffusion parameters we need a realistic $\mathcal{P}$-model for the short rate. A careful
modelling under the objective measure is thus vital even if we use the $Q$-measure
to price derivatives. We also note that for general asset pricing models, no
arbitrage requires that the $\mathcal{P}$ and $Q$ measures are equivalent. The $\mathcal{P}$ measure
thus imposes restrictions on the $Q$ specification and again we note the importance
of modelling under the $\mathcal{P}$ measure, see Larsen and Sørensen [2003] for further
details.
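The link between the market price of risk and the drift under $Q$ can be written out in one line. The following sketch assumes the standard Girsanov change of measure with kernel $\lambda$, using the sign convention of (4.5); it is included for illustration and is not reproduced from the references cited above:

```latex
dW^{Q}_t = dW_t + \lambda\,dt
\quad\Longrightarrow\quad
dr_t = \mu\,dt + \sigma\,\bigl(dW^{Q}_t - \lambda\,dt\bigr)
     = (\mu - \lambda\sigma)\,dt + \sigma\,dW^{Q}_t .
```

The diffusion coefficient $\sigma$ is unchanged while the drift becomes $\mu - \lambda\sigma$, which is the coefficient on $\partial F / \partial r$ in the term structure equation.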
Since the main object of the following does not concern derivative pricing we
will only briefly mention the procedure of specifying the $Q$ dynamics of the short
rate, known as martingale modelling. A common procedure to obtain
inference on the $Q$-parameters is known as inversion of the yield curve. This
method works by specifying a stochastic differential equation for the dynamics of
the short rate under the martingale measure,

$$dr_t = \mu^Q(r_t; \theta^Q)\,dt + \sigma^Q(r_t; \theta^Q)\,dW^Q_t.$$

By solving the term structure equation it is possible to determine a theoretical
term structure which depends on the parameters $\theta^Q$ from the stochastic
differential equation. From observed bond market prices an empirical term structure
can be derived and a parameter estimate, $\hat\theta^Q$, can be obtained by fitting the
theoretical term structure to the empirical term structure.
Again it is important to note that the estimator $\hat\theta^Q$ does not correspond to the
observed values of any proxy for the short rate; using martingale modelling we
completely omit modelling the short rate under the observed probability measure.
From an econometric point of view this method yields only partial information
about the actual short rate process. The fact that the term structure partial
differential equation (4.5)-(4.6) must be solved to determine the theoretical term
structure also limits the implementation of this method to short rate models for
which the PDEs involved are somewhat easy to solve. This is the case for the
class of models that permit an affine term structure, that is when the bond prices
are of the form

$$P(t, T) = F(t, r_t, T) = e^{A(t,T) - B(t,T) r_t}$$

where $A$ and $B$ are deterministic functions.
This particularly simple term structure follows for instance from the class of interest
rate models with drift and diffusion functions of the following form

$$\mu(r_t; \theta) = \alpha_0 + \alpha_1 r_t, \qquad \sigma(r_t; \theta) = \sqrt{\beta_0 + \beta_1 r_t}.$$

These conditions are clearly satisfied for the CIR model used as a continuing
example in previous chapters and this thus explains some of the popularity of
this model. From a simplified example in Chapter 3 we already know that under
the objective measure $\mathcal{P}$ this model is not a satisfying description of the short
rate.
It should be noted that the instantaneous short rate itself is a purely theoretical
object; it makes little sense in actual investment decisions to discuss the
guaranteed return for investments between time $t$ and $t + dt$. In general we use
various observed interest rate data as a proxy for the short rate.
An equivalent way of characterizing the term structure is by using forward rates.
The (continuously compounded) instantaneous forward rate $f(t, T)$ is defined by

$$f(t, T) = -\frac{\partial \ln(P(t, T))}{\partial T}.$$

The forward rate is the guaranteed return at time $t$ for investments between time
$T$ and $T + dt$. It can be achieved by selling one bond maturing at time $T$ and
using the earnings to buy bonds maturing at time $T + dt$. It is easily seen that
the term structure can be expressed in terms of forward rates by

$$P(t, T) = \exp\left(-\int_t^T f(t, s)\,ds\right).$$

From the definition of the forward rate we must obtain the spot rate if we let $T$
tend to $t$, that is

$$r_t = f(t, t) = \lim_{T \to t} f(t, T).$$
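The two relations between bond prices and forward rates can be checked numerically. The sketch below is a minimal illustration; the trapezoidal rule, the central difference and the function names are choices made here for the example.

```python
import math

def price_from_forwards(forward, t, T, steps=1000):
    """P(t, T) = exp(-integral of f(t, s) over [t, T]), using the trapezoidal rule."""
    h = (T - t) / steps
    total = 0.5 * (forward(t) + forward(T))
    total += sum(forward(t + i * h) for i in range(1, steps))
    return math.exp(-total * h)

def forward_from_prices(price, T, eps=1e-5):
    """f(t, T) = -d ln P(t, T) / dT, approximated by a central difference in T."""
    return -(math.log(price(T + eps)) - math.log(price(T - eps))) / (2.0 * eps)
```

For a flat forward curve at level $r$ the first function returns approximately $e^{-r(T-t)}$, and applying the second function to that price curve recovers $r$.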
Most of the classical arbitrage-free models of the term structure are based on
modelling the short rate, whereas some more recent arbitrage-free models are
based on forward rates or zero coupon bonds.
We now return to the subject of modelling the short rate under the objective
measure $\mathcal{P}$, based on discrete observations of short term interest rates which will
then be used as a proxy for the theoretical instantaneous short rate.
4.2 Characteristics of the short rate
An essential difference between interest rates and other financial data such as
stock prices is that interest rates appear to revert to some long-run average
over time, see Hull [1997] Chapter 17. As discussed in the CIR example of Chapter
3 one could argue that the level of the mean to which the process reverts may
differ between time periods due to shocks to the economy, but there are
compelling economic arguments supporting mean reversion even if this mean may
change in the long run. When rates are high, macroeconomic theory predicts
that investment is low and the economy as a whole tends to slow down. This
reduces the demand for funds by borrowers and in turn results in declining interest
rates. When rates are low, investment rises and so does the demand for funds; as a
result rates tend to rise. As recent years have shown this mean reversion effect
may be diminished by other factors in the economy; however, it is reasonable to
argue that any viable model for interest rates should exhibit some degree of mean
reversion.
Even when we limit ourselves to studying representations of the short interest rate
dynamics for which the short rate solves a one-dimensional stochastic differential
equation as studied earlier in this text, we are faced with an extremely general
specification. Apart from mean reversion, as mentioned above, economic theory
offers few restrictions on the particular choice of drift and diffusion functions.
Based on empirical evidence and economic theory, we can specify that drift and
diffusion should be chosen such that the model rules out negative rates and
induces a form of conditional heteroskedasticity in the model: volatility appears to
increase when the level of the interest rate increases. These properties are
satisfied in the CIR model and thus provide a further number of attractive features of
this model apart from the affine term structure. Nonetheless, these properties are
clearly satisfied for many choices of drift and diffusion functions, and for instance
the use of linear drift may prove to be far too restrictive.
One approach to the problem of correctly specifying a parametric diusion
model is to argue that parametric specication of the drift and volatility func-
tions altogether is an unreliable method. Instead turning to nonparametric tech-
niques for diusion estimation that allow for exible, possibly non-linear func-
tional forms. Main contributions include the already mentioned parametric and
semi-parametric methods by At-Sahalia [1996a,b] and also Stanton [1997], both
report strong nonlinearities in the drift function. As already mentioned above the
opinions are split on this question, especially Chapman and Pearson [2000] report
inconsistencies in the nonparametric estimators used to derive the nonlinear drift
functions.
Most of the classic work on interest rate modelling, including the important
article Chan et al. [1992], estimates the parameters and tests the specification of
the continuous-time models using a discrete-time econometric specification and,
for instance, the Generalized Method of Moments. As indicated by the simulation
studies in Chapter 3, using a discrete approximation to the continuous-time model of
interest may not provide satisfactory inference about the true model without the
unrealistic assumption that the time between observations tends to zero.
In conclusion of the above discussion we note that, apart from mean reversion,
economic theory provides very little guidance about how to specify parametric
drift and diffusion functions; the financial literature cannot even reach a consensus
on whether the drift is reasonably specified by a linear function or not. In
this context the general parametric model proposed in Aït-Sahalia [1996b] is
important, as it is a parametric model which is consistent with the results from the
nonparametric work. Thus this model provides a link between the two approaches
in the sense that we can use it to implement parametric methods to
examine the results of the nonparametric methods.
4.2.1 Some standard models
The literature on the modelling of the instantaneous short rate yields a large
number of suggestions on how to specify a parametric stochastic differential
equation determining the dynamics of the rate. These various models illustrate well
the discussion above on the problem of choosing drift and diffusion functions
when we are faced with so few restrictions from economic theory. Some models
have been used extensively to model the martingale measure dynamics of the
short rate, but as mentioned the focus from this point on will be on modelling
under the objective measure.
Name      Specification                                          Reference
CEV       $dr_t = a r_t\,dt + \sigma r_t^{\gamma}\,dW_t$         Cox [1975]
Vasicek   $dr_t = a(b - r_t)\,dt + \sigma\,dW_t$                 Vasicek [1977]
Dothan    $dr_t = a r_t\,dt + \sigma r_t\,dW_t$                  Dothan [1978]
CIR VR    $dr_t = \sigma r_t^{3/2}\,dW_t$                        Cox et al. [1980]
CIR       $dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$       Cox et al. [1985]
CKLS      $dr_t = a(b - r_t)\,dt + \sigma r_t^{\gamma}\,dW_t$    Chan et al. [1992]
Table 4.1: A small selection of the many parametric short rate models used in the literature in
the past 20-30 years.
The vast majority of the models used previously have been based on linear drift
functions. As mentioned above, mean reversion is a reasonable restriction for a
given model, but no theoretical results dictate that the drift should be linear.
Table 4.1 summarizes a few of the many models used in the literature.
The CIR model has already been discussed extensively above and possesses
several appealing qualities, such as guaranteeing positive interest rates. This is not
the case for the Vasicek model (also known as the Ornstein-Uhlenbeck process,
see Example 3.2). Due to the constant diffusion function, the
resulting process $r_t$ is Gaussian and thus, according to the transition
probabilities (3.21), has a positive probability of reaching zero no matter the
parameter choice. This model has been used extensively in the financial literature
because the Gaussian properties simplify calculations related to the
model, as illustrated in Example 3.2. The problem of non-positive interest rates
disqualifies this process from being a reasonable description of the short rate.
The solution to the Dothan [1978] model is simply a geometric Brownian motion;
this model is similar to the Black-Scholes model for stock prices and results in
log-normally distributed interest rates. These are clearly positive with probability one
but do not exhibit mean reversion and are not well suited for spot rate models.
The CIR VR model (Cox et al. [1980]) was introduced by Cox, Ingersoll and Ross
in a study of variable-rate securities (hence the VR) and lacks a drift function
altogether. The constant elasticity of variance (CEV) model allows for a more
general diffusion specification than the rest of the models discussed, but the drift
function may turn out to be far too restrictive.
Finally, the CKLS model (Chan et al. [1992]) considered by Chan, Karolyi,
Longstaff and Sanders was introduced as a model nesting the majority of the
classical models.
In Chan et al. [1992] the unrestricted CKLS model is estimated using a
discrete-time model specification and the model is implemented on Treasury bill yield
data. The resulting estimator for $\gamma$ is clearly larger than one, which in turn
speaks against the majority of the classical models such as the CIR or Vasicek
specifications. Indeed, apart from the CIR VR model all the specifications in
Table 4.1 imply $0 \le \gamma \le 1$. Consequently the work by Chan, Karolyi, Longstaff
and Sanders represents a significant step away from the standard models usually
studied, as the results indicate that the volatility depends much more strongly on
the current level of the short rate.
When we consider a model with linear drift and CEV diffusion function, such as the
CKLS model, that is $\sigma^2(r;\theta) = \sigma^2 r^{2\gamma}$, two opposing effects influence the
value of $\gamma$. When the process is close to zero there should be low volatility to ensure
that zero is unattainable. When the mean reversion is linear this indicates that
$\gamma$ should be large, such that the pull away from zero dominates. For large
values of the spot rate a high value of $\gamma$ would imply high volatility and thus
a non-negligible probability of reaching even higher values that cannot be compensated
for by the relatively weak linear mean reversion. This means that $\gamma$ must not be too
large. Clearly these two effects provide some (simplified) arguments towards the
problems that arise when fitting a linear drift, CEV diffusion model to observed
data. Below we shall consider a model that allows for stronger mean reversion
than linear for both high and low values of the spot rate, which to some degree
nullifies the arguments above. In addition we shall extend the CEV diffusion
function such that a compromise between the two opposing effects is reached.
However, at this point it is natural to mention a hypothesis which we will
also implicitly consider during the following analysis: a prevailing view
in some of the literature is that a one-factor model is simply not a sufficient tool for
describing the dynamics of the spot rate. To model the dynamics of the short
rate we need stochastic volatility models that better describe the fluctuations of
the volatility. Empirical evidence can be found in Andersen and Lund [1997]. If
this is indeed the case, then adding complexity to the one-factor models will only
improve some aspects of the model fit. There will still be aspects of the dynamics
of the volatility, such as volatility clustering, that are not described well by the
extended one-factor models. This issue is partly explored in the following by
considering a class of one-factor models that extend both the drift and diffusion
functions of the classical models discussed above.
4.3 Examining parametric models for the short
rate
As mentioned, Aït-Sahalia [1996b] and Stanton [1997] use semi-nonparametric
and nonparametric methods to model the short rate. The Aït-Sahalia approach
is unique in the sense that a very flexible parametric model is proposed which
to some degree is consistent with the results of the nonparametric work of both
Aït-Sahalia [1996b] and Stanton [1997]. However, general examinations of the
type of models suggested by the work of Aït-Sahalia using parametric estimation
methods seem to be quite rare in the literature. Chapman and Pearson [2000]
consider a discrete model with similar drift and report little evidence against
linearity; however, as already mentioned several times, discretization bias may be
a problem in this approach. Elerian et al. [2001] use Markov Chain Monte Carlo
(MCMC) methods to introduce auxiliary data in a likelihood inference setting and
estimate a model that closely resembles that of Aït-Sahalia (same drift function,
slightly different diffusion function). Elerian et al. [2001] report overwhelming
support for the general parametric model (with nonlinear drift) compared to the
Vasicek and CIR models; however, none of the models considered in that paper
provide a really good fit to the data.
The proposed method below uses martingale estimating functions to examine a
parametric model which allows for nonlinear drift. Using estimators of this nature
avoids some of the discretization problems encountered in other models as we do
not need to turn our attention to discrete versions of a continuous model simply
because of the discrete nature of the data.
4.3.1 Specification of the general model
The empirical testing is based on the following stochastic differential equation
$$dr_t = \left(\alpha_0 + \alpha_1 r_t + \alpha_2 r_t^2 + \alpha_3 / r_t\right)dt + \sqrt{\beta_0 + \beta_1 r_t + \beta_2 r_t^{\beta_3}}\,dW_t \tag{4.7}$$
The class of models nested within the SDE given in (4.7) includes all of the
various interest rate models discussed above, see Table 4.2. This model allows
for nonlinearities in the drift function and a more general diffusion function than
the CEV diffusion suggested in Chan et al. [1992].
As we have already noted, even the simple square root diffusion of the
CIR model does not satisfy the Lipschitz conditions of the classic existence and
uniqueness results as stated in Theorem 2.2; however, as demonstrated in Chapter
2, these conditions are far from necessary.
Figure 4.1: A simulated sample path of the full model with parameter values
$\theta = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_0, \beta_1, \beta_2, \beta_3)^T = (0.004, 0.03, 0.09, 0.0002, 0.0002, 0.002, 0.06, 2.1)^T$.
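We define as usual
$$\theta = (\alpha_0, \alpha_1, \alpha_2, \alpha_3, \beta_0, \beta_1, \beta_2, \beta_3)^T \tag{4.8}$$
$$\mu(x;\theta) = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 / x \tag{4.9}$$
$$\sigma^2(x;\theta) = \beta_0 + \beta_1 x + \beta_2 x^{\beta_3}. \tag{4.10}$$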
We first consider conditions ensuring that the process exhibits a form of mean
reversion in the sense that
$$\lim_{x \to 0} \mu(x;\theta) > 0 \tag{4.11}$$
and
$$\lim_{x \to \infty} \mu(x;\theta) < 0 \tag{4.12}$$
Model     Drift restriction                               Diffusion restrictions
CEV       $\alpha_0 = \alpha_2 = \alpha_3 = 0$            $\beta_0 = \beta_1 = 0$
Vasicek   $\alpha_2 = \alpha_3 = 0$                       $\beta_1 = \beta_2 = 0$
Dothan    $\alpha_0 = \alpha_2 = \alpha_3 = 0$            $\beta_0 = \beta_1 = 0$, $\beta_3 = 2$
CIR VR    $\alpha_0 = \alpha_1 = \alpha_2 = \alpha_3 = 0$ $\beta_0 = \beta_1 = 0$, $\beta_3 = 3$
CIR       $\alpha_2 = \alpha_3 = 0$                       $\beta_0 = \beta_2 = 0$
CKLS      $\alpha_2 = \alpha_3 = 0$                       $\beta_0 = \beta_1 = 0$
Table 4.2: The models from Table 4.1 are all nested within the model defined by (4.7). Note,
when comparing the model formulations in Table 4.1 with the restricted models in this
table, that for instance in the CKLS model $\gamma = \frac{1}{2}\beta_3$ because of the square root in (4.7).
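The nesting in Table 4.2 can be made concrete with a short sketch. The Python below is illustrative only: the helper names `drift`, `diffusion_sq` and `ckls_to_general` are hypothetical and do not appear in the thesis (whose own computations are carried out in Ox), the parameter ordering follows (4.8), and the numerical values are made up. It embeds a CKLS parameter set $(a, b, \sigma, \gamma)$ from Table 4.1 in the general model, using $\beta_2 = \sigma^2$ and $\beta_3 = 2\gamma$.

```python
def drift(x, theta):
    """Drift (4.9): alpha0 + alpha1*x + alpha2*x^2 + alpha3/x."""
    a0, a1, a2, a3 = theta[:4]
    value = a0 + a1 * x + a2 * x * x
    if a3 != 0.0:  # skip the 1/x term when it is restricted to zero
        value += a3 / x
    return value


def diffusion_sq(x, theta):
    """Squared diffusion (4.10): beta0 + beta1*x + beta2*x^beta3."""
    b0, b1, b2, b3 = theta[4:]
    value = b0 + b1 * x
    if b2 != 0.0:  # skip the power term when it is restricted to zero
        value += b2 * x ** b3
    return value


def ckls_to_general(a, b, sigma, gamma):
    """Embed dr = a(b - r)dt + sigma * r^gamma dW in the general model (4.7)."""
    return (a * b, -a, 0.0, 0.0, 0.0, 0.0, sigma ** 2, 2.0 * gamma)


# Hypothetical CKLS parameter values, chosen only for illustration.
theta_ckls = ckls_to_general(a=0.2, b=0.08, sigma=1.2, gamma=1.5)
r = 0.05
print(drift(r, theta_ckls), diffusion_sq(r, theta_ckls) ** 0.5)
```

Any other row of Table 4.2 can be encoded the same way by setting the restricted entries of the parameter tuple to zero (or to the fixed exponent).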
We also seek constraints such that
$$\sigma^2(x;\theta) > 0, \quad \forall x > 0. \tag{4.13}$$
For (4.11)-(4.13) to be satisfied, necessary parameter constraints are given in the
following condition.
Condition 4.1
1. $\alpha_2 \le 0$ and if $\alpha_2 = 0$ then $\alpha_1 < 0$.
2. $\alpha_3 \ge 0$ and if $\alpha_3 = 0$ then $\alpha_0 > 0$.
3. $\beta_0 \ge 0$ and if $\beta_0 = 0$ and $0 < \beta_3 < 1$ (and $\beta_1$ otherwise unrestricted) then
$\beta_2 > 0$, or if $\beta_0 = 0$ and $\beta_3 > 1$ (and $\beta_2$ otherwise unrestricted) then $\beta_1 > 0$.
4. $\beta_2 > 0$ if either $\beta_3 > 1$ or $\beta_1 = 0$.
5. $\beta_1 > 0$ if either $0 < \beta_3 < 1$ or $\beta_2 = 0$.
Remark 4.1 (Remarks to Condition 4.1)
Condition 4.1(1) ensures that the dominating terms (the quadratic or linear
terms) in the drift function will be negative when the process reaches sufficiently
large values. When the process reaches values close to zero, the linear and
quadratic terms in the drift function will be negligible and Condition 4.1(2) will
ensure that the drift becomes positive.
For values of the process close to zero we need $\beta_0$ to be nonnegative, or else the
diffusion would be negative or zero. If $0 < \beta_3 < 1$ then the term $\beta_2 x^{\beta_3}$ of the
diffusion function will be numerically larger than $\beta_1 x$ for $x$ near zero, and we
need $\beta_2 > 0$ to ensure positivity if $\beta_0 = 0$ and $\beta_1$ could be negative. Conversely,
if $\beta_3 > 1$ the linear term will dominate near zero, and if $\beta_0 = 0$ then $\beta_1 > 0$ is
necessary to keep the diffusion function positive if $\beta_2$ is allowed to be negative;
thus we have Condition 4.1(3).
If $\beta_3 > 1$ the last term of the diffusion function will dominate for large values
and thus we need $\beta_2 > 0$ to keep the diffusion function positive. On the other
hand, if $0 < \beta_3 < 1$ then the linear term will dominate the diffusion function for
large values and we need $\beta_1 > 0$ to ensure positivity. This reasoning explains
Condition 4.1(4,5).
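As a minimal sketch, the five parts of Condition 4.1 can be written as a parameter check. The function name `satisfies_condition_4_1` and the two parameter vectors are hypothetical and serve only to illustrate how the condition reads when applied literally to a vector ordered as in (4.8); the check covers Condition 4.1 alone, not Conditions 4.2 or 4.3.

```python
def satisfies_condition_4_1(theta):
    """Return True when theta meets the five parts of Condition 4.1."""
    a0, a1, a2, a3, b0, b1, b2, b3 = theta
    # (1) alpha2 <= 0, and alpha1 < 0 when alpha2 = 0
    part1 = a2 < 0 or (a2 == 0 and a1 < 0)
    # (2) alpha3 >= 0, and alpha0 > 0 when alpha3 = 0
    part2 = a3 > 0 or (a3 == 0 and a0 > 0)
    # (3) beta0 >= 0, with extra sign requirements when beta0 = 0
    part3 = b0 >= 0
    if b0 == 0 and 0 < b3 < 1:
        part3 = part3 and b2 > 0
    if b0 == 0 and b3 > 1:
        part3 = part3 and b1 > 0
    # (4) beta2 > 0 if beta3 > 1 or beta1 = 0
    part4 = b2 > 0 if (b3 > 1 or b1 == 0) else True
    # (5) beta1 > 0 if 0 < beta3 < 1 or beta2 = 0
    part5 = b1 > 0 if (0 < b3 < 1 or b2 == 0) else True
    return part1 and part2 and part3 and part4 and part5


# Hypothetical parameter vectors, for illustration only.
theta_ok = (0.01, -0.1, -0.5, 0.001, 0.001, 0.01, 0.5, 2.0)
theta_bad = (0.01, -0.1, 0.5, 0.001, 0.001, 0.01, 0.5, 2.0)  # alpha2 > 0
print(satisfies_condition_4_1(theta_ok), satisfies_condition_4_1(theta_bad))
```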
Note that the parametrization (4.7) will give rise to identification problems if e.g.
$\beta_3 = 0$ or $\beta_3 = 1$, so we need to restrict $\beta_3$ to the set $(0, 1) \cup (1, \infty)$. Clearly this does
not solve all the problems with identification in this model and we shall return
to this issue below.
The restrictions on the diffusion function given in Condition 4.1 are necessary but
not sufficient to guarantee that $\sigma^2(x;\theta)$ is positive for all $x > 0$. Those conditions
only cover the behavior of $\sigma^2(\cdot;\theta)$ near the boundaries.
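When neither $\beta_1$ nor $\beta_2$ is restricted to be zero we need the following extra
conditions.
Condition 4.2
If $\beta_1 \neq 0$ and $\beta_2 \neq 0$ then
1. If $\beta_3 > 1$ then
$$\beta_1 > -\beta_2 \beta_3 \left( \frac{\beta_0}{\beta_2(\beta_3 - 1)} \right)^{\frac{\beta_3 - 1}{\beta_3}}.$$
2. If $0 < \beta_3 < 1$ then
$$\beta_2 > \frac{\beta_0}{\beta_3 - 1} \left( \frac{\beta_0 \beta_3}{\beta_1(1 - \beta_3)} \right)^{-\beta_3}.$$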
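To explain these rather unintuitive conditions consider the following lemma.
Lemma 4.1 (Sufficient conditions for positive diffusion)
Assume that Condition 4.1 and Condition 4.2 are satisfied. Then
$$\sigma^2(x;\theta) = \beta_0 + \beta_1 x + \beta_2 x^{\beta_3} > 0, \quad \forall x > 0.$$
Proof Condition 4.1 ensures that $\sigma^2(x;\theta)$ is positive near both boundaries of
$(0, \infty)$; we therefore explore extreme value points on the open interval. The only
root of the first derivative is found at
$$x_0 = \left( -\frac{\beta_1}{\beta_2 \beta_3} \right)^{\frac{1}{\beta_3 - 1}}. \tag{4.14}$$
It is evident that the shape of $\sigma^2(\cdot;\theta)$ guarantees that $x_0$ is the global minimum
on the interval $(0, \infty)$, so to prove the lemma we need only examine the sign of
$\sigma^2(x_0;\theta)$.
If $\beta_3 > 1$ then $\beta_2 > 0$ from Condition 4.1 and it follows that
$\sigma^2(x_0;\theta) > 0$ if and only if
$$\beta_1 > \underbrace{-\beta_2 \beta_3 \left( \frac{\beta_0}{\beta_2(\beta_3 - 1)} \right)^{\frac{\beta_3 - 1}{\beta_3}}}_{<0}.$$
If $0 < \beta_3 < 1$ then $\beta_1 > 0$ from Condition 4.1 and $\sigma^2(x_0;\theta) > 0$ if and only if
$$\beta_2 > \underbrace{\frac{\beta_0}{\beta_3 - 1} \left( \frac{\beta_0 \beta_3}{\beta_1(1 - \beta_3)} \right)^{-\beta_3}}_{<0}.$$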
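The argument of the proof can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the thesis: the function names are hypothetical, the diffusion parameters are made-up values with $\beta_0 > 0$, and the bounds are the ones obtained by substituting the stationary point (4.14) into $\sigma^2(x;\theta) = \beta_0 + \beta_1 x + \beta_2 x^{\beta_3}$. It evaluates $\sigma^2$ at $x_0$ just inside and just outside the bound.

```python
def diffusion_sq(x, beta):
    """sigma^2(x) = beta0 + beta1*x + beta2*x^beta3 for beta = (b0, b1, b2, b3)."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * x + b2 * x ** b3


def interior_minimum(beta):
    """Return (x0, sigma^2(x0)) at the stationary point (4.14), or None if none exists."""
    b0, b1, b2, b3 = beta
    ratio = -b1 / (b2 * b3)
    if ratio <= 0.0:
        return None  # the first derivative has no root on (0, infinity)
    x0 = ratio ** (1.0 / (b3 - 1.0))
    return x0, diffusion_sq(x0, beta)


def positivity_bound(beta):
    """Lower bound on beta1 (if beta3 > 1) or on beta2 (if 0 < beta3 < 1); assumes beta0 > 0."""
    b0, b1, b2, b3 = beta
    if b3 > 1.0:
        return -b2 * b3 * (b0 / (b2 * (b3 - 1.0))) ** ((b3 - 1.0) / b3)
    return b0 / (b3 - 1.0) * (b0 * b3 / (b1 * (1.0 - b3))) ** (-b3)


# Hypothetical values: with beta3 = 0.5, sigma^2 = 1 + x + beta2*sqrt(x).
inside = (1.0, 1.0, -1.9, 0.5)   # beta2 slightly above the bound
outside = (1.0, 1.0, -2.1, 0.5)  # beta2 slightly below the bound
print(positivity_bound(inside), interior_minimum(inside), interior_minimum(outside))
```

With $\beta_3 = 1/2$ the substitution $u = \sqrt{x}$ turns $\sigma^2$ into the quadratic $u^2 + \beta_2 u + 1$, which gives an independent way of checking the bound by hand.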
Whereas Condition 4.1 is intuitively necessary for the diffusion coefficient to be
positive, Condition 4.2 is slightly less obvious. If we allow some of the parameters
in the diffusion function to be negative then we must make sure that they are not
too negative. This is simply what is guaranteed by Condition 4.2. It should be
noted that although this condition is not explicitly stated in Aït-Sahalia [1996b],
where this model is first introduced, the parameters estimated by the method
discussed in the final section of Chapter 3 do satisfy the restriction, which is also
clear from the fact that the estimated diffusion functions are well defined.
To ensure that the assumptions of Theorem 2.8 and Theorem 2.9 are satisfied, rather
tedious calculations with the speed and scale measure show that, in addition to
Condition 4.1, we need the following conditions.
Condition 4.3
1. If $\alpha_3 > 0$ and $\beta_0 > 0$ then $2\alpha_3 \ge \beta_0$.
2. If $\alpha_3 = 0$ then $\alpha_0 > 0$, $\beta_0 = 0$, $\beta_3 > 1$ and $2\alpha_0 \ge \beta_1 > 0$.
Combined, Conditions 4.1, 4.2 and 4.3 guarantee that the process defined by (4.7)
is well defined, does not hit zero, does not go to infinity in finite time and permits
the existence of an invariant distribution given by the expression in Theorem 2.9.
We summarize in the following theorem.
Theorem 4.1 (Properties of the model)
Suppose that the process $r_t$ satisfies the general parametric model given by
(4.7) and that Conditions 4.1, 4.2 and 4.3 are satisfied, and define $D = (0, \infty)$
and $\tau = \inf\{t \ge 0 \mid r_t \notin D\}$. Then
• A unique strong solution to (4.7) exists.
• $P(\tau = +\infty \mid r_0 = x) = 1$ for all $x \in D$.
• The process $r_t$ is ergodic with invariant measure given by the density
$$\pi(x;\theta) = \frac{K(\theta)}{\sigma^2(x;\theta)} \exp\left( \int_{x_0}^{x} \frac{2\mu(z;\theta)}{\sigma^2(z;\theta)}\,dz \right). \tag{4.15}$$
Proof The only issue that has not been discussed above is the existence of a
unique strong solution; however, this follows directly from noting that
Assumption A1, page 415 in Aït-Sahalia [1996b], is clearly satisfied.
Remark 4.2 (Remarks to Conditions 4.1 and 4.3)
If we compare with Example 2.3, the CIR process is derived from the general model
by the restriction $\alpha_2 = \alpha_3 = 0$ and $\beta_0 = \beta_2 = 0$. In this case the conditions are
satisfied when $\alpha_1 < 0$, $\alpha_0 > 0$ (from Condition 4.1(1) and (2)), $\beta_1 > 0$ (from Condition
4.1(5)) and finally $2\alpha_0 \ge \beta_1$ (from Condition 4.3(2)). These conditions are
equivalent to those derived in Example 2.3, where we used the alternative
parametrization of the process, that is $\mu(x;\theta) = a(b - x)$ instead of $\mu(x;\theta) = \alpha_0 + \alpha_1 x$
and $\sigma^2(x;\theta) = \sigma^2 x$ instead of $\sigma^2(x;\theta) = \beta_1 x$, and found the conditions to be
$a > 0$, $b > 0$, $\sigma > 0$ and $2ab > \sigma^2$. Clearly these conditions are equal to those
derived from Condition 4.1 and Condition 4.3 when we change parametrization.
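The invariant density (4.15) is easy to evaluate numerically up to its normalising constant. The sketch below is an illustration, not the thesis's Ox code: the function names, the reference point `x_ref` and the CIR parameter values are assumptions made for the example. It integrates $2\mu/\sigma^2$ with the composite Simpson rule and compares the result with the CIR restriction, for which (4.15) reduces to a Gamma density with shape $2\alpha_0/\beta_1$ and rate $-2\alpha_1/\beta_1$.

```python
import math


def drift(x, theta):
    """Drift (4.9); the 1/x term is skipped when alpha3 = 0."""
    a0, a1, a2, a3 = theta[:4]
    value = a0 + a1 * x + a2 * x * x
    if a3 != 0.0:
        value += a3 / x
    return value


def diffusion_sq(x, theta):
    """Squared diffusion (4.10); the power term is skipped when beta2 = 0."""
    b0, b1, b2, b3 = theta[4:]
    value = b0 + b1 * x
    if b2 != 0.0:
        value += b2 * x ** b3
    return value


def log_invariant_density(x, theta, x_ref=0.05, steps=2000):
    """Log of (4.15) up to the additive constant log K(theta); steps must be even."""
    def integrand(z):
        return 2.0 * drift(z, theta) / diffusion_sq(z, theta)

    h = (x - x_ref) / steps  # signed step, so x < x_ref also works
    total = integrand(x_ref) + integrand(x)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(x_ref + k * h)
    return -math.log(diffusion_sq(x, theta)) + total * h / 3.0


# Hypothetical CIR restriction: alpha2 = alpha3 = 0 and beta0 = beta2 = 0.
theta_cir = (0.02, -0.25, 0.0, 0.0, 0.0, 0.01, 0.0, 0.5)
shape = 2.0 * theta_cir[0] / theta_cir[5]
rate = -2.0 * theta_cir[1] / theta_cir[5]
numeric = log_invariant_density(0.10, theta_cir) - log_invariant_density(0.06, theta_cir)
closed = (shape - 1.0) * math.log(0.10 / 0.06) - rate * (0.10 - 0.06)
print(numeric, closed)
```

Comparing log-density differences avoids computing $K(\theta)$, which cancels in the ratio of two density values.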
We now turn to the issue of estimating the parameters of (4.7), both in some of
the nested models and in the general model.
4.3.2 The estimation approach
Apart from the work of Aït-Sahalia it seems that no econometric analysis of
the full model (4.7) has been made in the literature. As mentioned, a
continuous-time model very similar to (4.7) is estimated in Elerian et al. [2001] using MCMC
methods; also a discrete model with drift similar to that of (4.7) is examined in
Chapman and Pearson [2000]. Of course a variety of papers exist examining the
form of the diffusion function, which still remains an unresolved question.
Estimation in the full model is complicated by the fact that we have eight parameters
to be determined. Thus, if the main area of interest is either drift or diffusion (but
not both), it can be justified to consider a simpler version of the other function,
thereby facilitating easier estimation; this is part of the reasoning behind the
model considered in Chapman and Pearson [2000].
No closed form expression for the transition density of the process exists thus
ruling out maximum likelihood estimation. We would prefer to stay clear of
discretization problems and propose using the method of unbiased estimating
functions discussed in Chapter 3.
Since no analytical expressions for the conditional moments of a process with
dynamics following (4.7) can be derived, the optimal quadratic estimating
function given by (3.16) and (3.17)-(3.18) will prove computationally inefficient, since
it relies heavily on moments of order one through four and also derivatives of
these moments. We therefore use the approximation to the optimal quadratic
estimating function given by (3.30), that is
$$G_n(\theta) = \sum_{i=1}^{n} \left\{ \frac{\partial_\theta \mu(X_{t_{i-1}};\theta)}{\sigma^2(X_{t_{i-1}};\theta)} \left[ X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right] + \frac{\partial_\theta \sigma^2(X_{t_{i-1}};\theta)}{2\sigma^4(X_{t_{i-1}};\theta)\,\Delta_i} \left[ \left( X_{t_i} - F(\Delta_i, X_{t_{i-1}};\theta) \right)^2 - \phi(\Delta_i, X_{t_{i-1}};\theta) \right] \right\}. \tag{4.16}$$
We still need the conditional mean $F(\Delta_i, X_{t_{i-1}};\theta)$ and variance $\phi(\Delta_i, X_{t_{i-1}};\theta)$,
which have to be calculated using simulations. Specifically $F(\Delta_i, X_{t_{i-1}};\theta)$ is
approximated by
$$\hat{F}(\Delta_i, X_{t_{i-1}};\theta) = \frac{1}{N} \sum_{j=1}^{N} Y^{(j)}_{m} \tag{4.17}$$
and $\phi(\Delta_i, X_{t_{i-1}};\theta)$ is replaced by
$$\hat{\phi}(\Delta_i, X_{t_{i-1}};\theta) = \frac{1}{N} \sum_{j=1}^{N} \left( Y^{(j)}_{m} - \frac{1}{N} \sum_{l=1}^{N} Y^{(l)}_{m} \right)^2. \tag{4.18}$$
Here $N$ is chosen large and $\Delta_i = m\delta_i$. $Y^{(j)}$ is a simulated path of the process
with values $Y^{(j)}_k$ calculated at times $k\delta_i$ for $k = 0, \ldots, m$ for the parameter
$\theta$ and $Y^{(j)}_0 = X_{t_{i-1}}$. Thus by choosing $N$ and $m$ we can, according to the law of
large numbers, approximate the true moments in as much detail as desired; this
of course comes at the cost of slowing the estimation algorithm considerably for
large values of $N$ and $m$.
In the following we implement the Milstein scheme, which is of order one, see
Seydel [2002]. Other schemes of higher order exist, such as the strong Taylor
scheme, which is of order 1.5. However, this scheme requires that two independent
normally distributed variables be drawn for each step $(Y^{(j)}_k, Y^{(j)}_{k+1})$, whereas the
Milstein scheme only requires one such variable.
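The simulation of the conditional moments (4.17)-(4.18) can be sketched as follows. This is a minimal illustration in Python rather than the Ox implementation used in the thesis: the function names, the seed and the Vasicek-type parameter values are assumptions made for the example. For the diffusion in (4.7) the Milstein correction $\frac{1}{2}\sigma\sigma'\delta(Z^2 - 1)$ equals $\frac{1}{4}(\sigma^2)'\delta(Z^2 - 1)$, since $\sigma\sigma' = \frac{1}{2}(\sigma^2)'$. The Vasicek restriction is used only because its exact conditional moments are known and give a reference value.

```python
import math
import random


def drift(x, theta):
    a0, a1, a2, a3 = theta[:4]
    value = a0 + a1 * x + a2 * x * x
    if a3 != 0.0:
        value += a3 / x
    return value


def diffusion_sq(x, theta):
    b0, b1, b2, b3 = theta[4:]
    value = b0 + b1 * x
    if b2 != 0.0:
        value += b2 * x ** b3
    return value


def diffusion_sq_prime(x, theta):
    """Derivative of sigma^2(x) with respect to x."""
    b0, b1, b2, b3 = theta[4:]
    value = b1
    if b2 != 0.0:
        value += b2 * b3 * x ** (b3 - 1.0)
    return value


def simulate_moments(x_start, theta, delta, n_paths, n_steps, rng):
    """Monte Carlo versions of (4.17) and (4.18) using Milstein steps of size delta/n_steps."""
    h = delta / n_steps
    endpoints = []
    for _ in range(n_paths):
        y = x_start
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            s2 = max(diffusion_sq(y, theta), 0.0)  # guard against tiny negative values
            y += (drift(y, theta) * h + math.sqrt(s2 * h) * z
                  + 0.25 * diffusion_sq_prime(y, theta) * h * (z * z - 1.0))
        endpoints.append(y)
    mean = sum(endpoints) / n_paths
    var = sum((v - mean) ** 2 for v in endpoints) / n_paths
    return mean, var


# Hypothetical Vasicek restriction: dr = 0.5(0.06 - r)dt + 0.02 dW.
theta_vasicek = (0.03, -0.5, 0.0, 0.0, 0.0004, 0.0, 0.0, 0.5)
mean_hat, var_hat = simulate_moments(0.10, theta_vasicek, 1.0, 2000, 50, random.Random(1))
exact_mean = 0.06 + 0.04 * math.exp(-0.5)
exact_var = 0.0004 * (1.0 - math.exp(-1.0)) / (2.0 * 0.5)
print(mean_hat, exact_mean, var_hat, exact_var)
```

For the full model a simulated path can leave the positive half-line when the step size is coarse; a production implementation would need an explicit rule for that case, which this sketch does not include.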
To solve the equation system $G_n(\theta) = 0$ for $\theta$ we must implement a
numerical method. For each iteration in this method, and for each observation of the
dataset, the Milstein scheme must be implemented to calculate the conditional
moments; this in turn means that large values of $N$ and $m$ will result in a quite
slow estimation procedure. A small simulation study indicates that for values as
low as $N = m = 250$ the results are fairly precise, and for values in this
neighborhood the time spent on the estimation procedure for various datasets is still
acceptable. Clearly we can achieve better estimators by increasing the values
of $N$ and $m$, but a considerable time penalty follows each increase. The exact
values of these two parameters must be chosen individually for each estimation,
and the choice should be based on the model in question, the size of the
dataset and the approximation scheme used.
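Evaluating the estimating function (4.16) for a given parameter value can be sketched as below. The code is an illustration under assumptions: the function names are hypothetical, the partial derivatives are those of (4.9) and (4.10) with respect to the eight parameters, and the conditional moments are passed in as callables so that simulated moments can be plugged in. In the example, Euler-type moments are used purely as a cheap stand-in for the simulated ones, and the parameter values are made up.

```python
import math


def drift(x, theta):
    a0, a1, a2, a3 = theta[:4]
    return a0 + a1 * x + a2 * x * x + (a3 / x if a3 != 0.0 else 0.0)


def diffusion_sq(x, theta):
    b0, b1, b2, b3 = theta[4:]
    return b0 + b1 * x + (b2 * x ** b3 if b2 != 0.0 else 0.0)


def estimating_function(theta, xs, deltas, cond_mean, cond_var):
    """Evaluate G_n(theta) of (4.16); requires positive observations xs."""
    b2, b3 = theta[6], theta[7]
    g = [0.0] * 8
    for i in range(1, len(xs)):
        x_prev, x_now, d = xs[i - 1], xs[i], deltas[i - 1]
        s2 = diffusion_sq(x_prev, theta)
        resid = x_now - cond_mean(d, x_prev, theta)
        sq_dev = resid ** 2 - cond_var(d, x_prev, theta)
        power = x_prev ** b3
        # Gradients of mu and sigma^2 with respect to (alpha0..alpha3, beta0..beta3).
        d_mu = (1.0, x_prev, x_prev ** 2, 1.0 / x_prev, 0.0, 0.0, 0.0, 0.0)
        d_s2 = (0.0, 0.0, 0.0, 0.0, 1.0, x_prev, power, b2 * power * math.log(x_prev))
        for k in range(8):
            g[k] += d_mu[k] / s2 * resid + d_s2[k] / (2.0 * s2 ** 2 * d) * sq_dev
    return g


def euler_mean(d, x, theta):
    """Stand-in for F: one-step Euler conditional mean."""
    return x + drift(x, theta) * d


def euler_var(d, x, theta):
    """Stand-in for phi: one-step Euler conditional variance."""
    return diffusion_sq(x, theta) * d


# Hypothetical parameters and a two-point sample whose residual is exactly zero.
theta_demo = (0.01, -0.1, -0.5, 0.001, 0.001, 0.01, 0.5, 2.0)
xs_demo = [0.05, euler_mean(1.0, 0.05, theta_demo)]
g_demo = estimating_function(theta_demo, xs_demo, [1.0], euler_mean, euler_var)
print(g_demo)
```

Passing the moment functions as arguments keeps the estimating function independent of how the moments are obtained, so the same code serves both a quick Euler check and the simulation-based version.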
In the full model (4.7) the equation $G_n(\theta) = 0$ is a system of eight nonlinear
equations in the eight unknown components of $\theta$. We suggest implementing an adapted
version of Broyden's iterative procedure. This method does not require that
the inverse Jacobian matrix be explicitly calculated and it is relatively simple
to program; see Appendix A for further details. All calculations are carried out
using the Ox language by Doornik [2002].
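For orientation, a textbook version of Broyden's method is sketched below. It is not the adapted procedure of Appendix A: the function names are hypothetical, the initial Jacobian is obtained by forward differences, each step solves a small linear system by Gaussian elimination, and the two-equation test system is made up for the example.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - tail) / a[r][r]
    return x


def broyden(f, x0, tol=1e-10, max_iter=100, h=1e-6):
    """Find a root of f using rank-one updates of an approximate Jacobian."""
    n = len(x0)
    x = list(x0)
    fx = f(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):  # forward-difference Jacobian at the starting point
        shifted = x[:]
        shifted[j] += h
        f_shifted = f(shifted)
        for i in range(n):
            jac[i][j] = (f_shifted[i] - fx[i]) / h
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            break
        step = solve_linear(jac, [-v for v in fx])
        x_new = [xi + si for xi, si in zip(x, step)]
        f_new = f(x_new)
        norm_sq = sum(s * s for s in step)
        if norm_sq == 0.0:
            break
        # Broyden update: J <- J + (y - J s) s^T / (s^T s), with y = f_new - fx.
        for i in range(n):
            js_i = sum(jac[i][j] * step[j] for j in range(n))
            correction = (f_new[i] - fx[i] - js_i) / norm_sq
            for j in range(n):
                jac[i][j] += correction * step[j]
        x, fx = x_new, f_new
    return x


def demo_system(v):
    """Hypothetical test system with a root at (sqrt(2), sqrt(2))."""
    return [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]


root = broyden(demo_system, [1.0, 1.5])
print(root)
```

After the initial Jacobian, each iteration costs one evaluation of the system, which matters when every evaluation of $G_n$ requires simulating conditional moments for all observations.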
As mentioned, for each iteration of the equation solving algorithm we need to
simulate the conditional moments for each observation. This means that the
implementation might result in a time-consuming program even for state-of-the-art
computers. We therefore propose to first implement a simple maximum likelihood
estimation based on the Euler approximation, as discussed in Chapter 3. The
estimators derived by this method are only precise when we have both a large
number of observations and a small time interval between observations. We
then use the estimators from this procedure as starting values for solving the
estimating equations $G_n(\theta) = 0$.
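The first step of this procedure can be sketched as a Gaussian pseudo log-likelihood in which each transition is treated as normal with mean $x + \mu(x;\theta)\Delta$ and variance $\sigma^2(x;\theta)\Delta$. The code below is an illustration only: the function names and the numbers are hypothetical, and the maximisation itself (done with a built-in optimiser in practice) is left out.

```python
import math


def drift(x, theta):
    a0, a1, a2, a3 = theta[:4]
    return a0 + a1 * x + a2 * x * x + (a3 / x if a3 != 0.0 else 0.0)


def diffusion_sq(x, theta):
    b0, b1, b2, b3 = theta[4:]
    return b0 + b1 * x + (b2 * x ** b3 if b2 != 0.0 else 0.0)


def euler_log_likelihood(theta, xs, deltas):
    """Sum of Gaussian log densities implied by the Euler approximation of (4.7)."""
    total = 0.0
    for i in range(1, len(xs)):
        x_prev, d = xs[i - 1], deltas[i - 1]
        mean = x_prev + drift(x_prev, theta) * d
        var = diffusion_sq(x_prev, theta) * d
        total += -0.5 * (math.log(2.0 * math.pi * var) + (xs[i] - mean) ** 2 / var)
    return total


# Hypothetical Vasicek-type parameters and a single transition.
theta_demo = (0.01, -0.2, 0.0, 0.0, 0.0001, 0.0, 0.0, 0.5)
ll = euler_log_likelihood(theta_demo, [0.05, 0.06], [1.0])
print(ll)
```

Maximising this function over the parameters gives the starting values referred to above; no simulation is involved, which is why this step is fast.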
This two-step method has a number of advantages. First, nearly all
programming languages have a built-in method for maximization of a function. These
maximization methods are usually reliable, efficient and easy to implement.
Secondly, no simulations are needed for this method, and the Euler maximum
likelihood estimators thus provide a relatively fast way to derive decent initial
values for the general algorithm.
Finally, by using maximum likelihood on the Euler approximation to derive initial
values, we get an empirical example of how well the coarse Euler approximation
performs compared with an unbiased estimating function for observed financial
data. In general we observed decent results from the Euler approximation in
the CKLS model. For the more complicated nonlinear models considered here,
the task of calculating the maximum of the Euler likelihood function becomes
far more complicated and the advantage of using these values as starting values
becomes negligible.
The bootstrap method to determine the standard deviations of the estimators
proves far too time-consuming in the following; instead we report the standard
deviations calculated using the estimator of the asymptotic variance, see equations
(3.32)-(3.33).
Once the method above has been implemented for the general model (4.7) it
provides a simple tool for estimating any of the nested models for instance by
restricting the parameters according to Table 4.2.
We shall in particular consider the following models:
• The CKLS model:
$$dr_t = (\alpha_0 + \alpha_1 r_t)\,dt + \sqrt{\beta_2 r_t^{\beta_3}}\,dW_t \tag{4.19}$$
• A general drift, CEV diffusion model:
$$dr_t = \left(\alpha_0 + \alpha_1 r_t + \alpha_2 r_t^2 + \alpha_3 / r_t\right)dt + \sqrt{\beta_2 r_t^{\beta_3}}\,dW_t \tag{4.20}$$
• A linear drift, general diffusion model:
$$dr_t = (\alpha_0 + \alpha_1 r_t)\,dt + \sqrt{\beta_0 + \beta_1 r_t + \beta_2 r_t^{\beta_3}}\,dW_t \tag{4.21}$$
• The full model:
$$dr_t = \left(\alpha_0 + \alpha_1 r_t + \alpha_2 r_t^2 + \alpha_3 / r_t\right)dt + \sqrt{\beta_0 + \beta_1 r_t + \beta_2 r_t^{\beta_3}}\,dW_t \tag{4.22}$$
The reasoning behind these choices is as follows. The consensus in the literature
since Chan et al. [1992] is that the CKLS model (4.19) is a necessary extension
of the classical Vasicek and CIR type models; we therefore use this model as a
benchmark. The various extensions (4.20)-(4.22) will hopefully give insight into
the question of the nonlinearity in the drift. Also, by comparing the first and third
specifications we can see the effect of a more general diffusion function than those
usually considered in the literature. Finally, the fourth specification is simply
the unrestricted model combining both nonlinear drift and a general diffusion
function.
4.3.3 The identification problem
As mentioned above, the parameters are not identified, no matter what method
of estimation is used, if either $\beta_3 = 0$ or $\beta_3 = 1$; this simply follows from the
functional form of the diffusion function. Aït-Sahalia [1996b] notes that the
relative scale of $\mu(\cdot;\theta)$ and $\sigma^2(\cdot;\theta)$ is not determined when inference is based solely
on the invariant density (4.15). The basic idea behind the type of estimating
functions we use is that the distance between $X_{t_i}$ and $E(X_{t_i} \mid X_{t_{i-1}};\theta)$ (and the
similar expression for the variance) should be small for the true parameter. The
estimating function chosen provides as many equations as there are unknown parameters,
enough equations for identification but hopefully not so many that the equations
cannot be solved. This method thus has a tendency to fix the scale of the drift
and diffusion and we avoid the problem mentioned above. However, as indicated
by Elerian et al. [2001], restricting $\beta_3$ away from zero and one does not solve all
the identification problems with the model in question. Indeed the values of $\beta_2$
and $\beta_3$ can be difficult to identify separately, as $\beta_2 r^{\beta_3}$ produces virtually the same
values for a range of different choices of $(\beta_2, \beta_3)$. In fact this problem is the
reason why Elerian et al. [2001] turn their attention to a model with a different
diffusion function than (4.7). This identification problem is illustrated in Figure
4.2, which indicates that some numerical problems can occur when estimating
the parameters of the diffusion function.
It is a fact, though, that most one-factor short rate models since Chan
et al. [1992] have included a CEV term of the form $\beta_2 r^{\beta_3}$, so for the sake of
consistency, and keeping the possibility of numerical identification problems in
mind, we will continue to work with the general model.
Figure 4.2: The graph shows the function $r \mapsto \beta_2 r^{\beta_3}$ for the following sets of values $(\beta_2, \beta_3)$:
(0.009, 2), (0.008, 1.9), (0.01, 2.1). The graph indicates that there may be numerical problems
in identifying these two parameters, as different pairs $(\beta_2, \beta_3)$ may produce values of $\beta_2 r^{\beta_3}$
that are very similar to each other. The values used are chosen to be close to those determined
by the estimation procedure for the data discussed below. The values in this figure are such
that it is possible to clearly distinguish the individual graphs, but they do illustrate the problem
that the graphs may be quite similar for different parameters.
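The similarity of the three curves can be quantified with a few lines of code. The sketch below uses the three $(\beta_2, \beta_3)$ pairs from the caption of Figure 4.2; the grid over $r \in [0.02, 0.20]$ and the variable names are assumptions made for the illustration.

```python
# The three (beta2, beta3) pairs quoted in the caption of Figure 4.2.
pairs = [(0.009, 2.0), (0.008, 1.9), (0.01, 2.1)]

# Grid of short rate levels from 0.02 to 0.20 (an assumed plotting range).
grid = [0.02 + 0.001 * k for k in range(181)]

# One curve r -> beta2 * r^beta3 per parameter pair.
curves = [[b2 * r ** b3 for r in grid] for b2, b3 in pairs]

# Largest vertical distance between the highest and lowest curve on the grid.
max_gap = max(
    max(curve[i] for curve in curves) - min(curve[i] for curve in curves)
    for i in range(len(grid))
)
print(max_gap)
```

A gap of this size has to be judged against the sampling variability of the squared increments of the data, which is what makes the two parameters hard to separate numerically.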
4.3.4 Using proxies
Before turning to the empirical implementation of the methods introduced above we
briefly turn to the question of specifying a proxy for the unobserved short rate.
The papers discussed previously use different observed data series to estimate
the models for the short rate: Chan et al. [1992] use the one-month Treasury bill
yield, Stanton [1997] uses the three-month Treasury bill and Aït-Sahalia [1996a,b]
works with the 7-day Eurodollar rate.
The choice of data must weigh two different issues. On the one hand, very
short-term interest rates such as overnight rates are sensitive to microstructure
problems. On the other hand, longer rates such as three-month interest rate data may
deviate substantially from the true short rate process.
To illustrate this topic consider for a moment a general SDE for the true
unobserved short rate
$$dr_t = \mu(r_t)\,dt + \sigma(r_t)\,dW_t. \tag{4.23}$$
Let $F(t, r_t, T)$ be the time $t$ price of a discount bond of the type discussed above.
The potential bias problems from using the yield of a finite-maturity bond as
proxy for the short rate can be illustrated from the fact that the yield-to-maturity
is given by
$$y(r_t, t) = -\frac{\ln(F(t, r_t, T))}{T - t}. \tag{4.24}$$
Assuming differentiability and keeping the time to maturity $T - t$ fixed, the yield
has, according to Itô's lemma, the dynamics
$$dy_t = \left( \mu(r_t) \frac{\partial y(r_t)}{\partial r} + \frac{1}{2}\sigma^2(r_t) \frac{\partial^2 y(r_t)}{\partial r^2} \right)dt + \sigma(r_t) \frac{\partial y(r_t)}{\partial r}\,dW_t.$$
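If we misuse notation slightly and let $r : \mathbb{R} \to \mathbb{R}$ denote the inverse function of
(4.24) we have the dynamics of the yield
$$dy_t = \left( \mu(r(y_t)) \frac{\partial y(r(y_t))}{\partial r} + \frac{1}{2}\sigma^2(r(y_t)) \frac{\partial^2 y(r(y_t))}{\partial r^2} \right)dt + \sigma(r(y_t)) \frac{\partial y(r(y_t))}{\partial r}\,dW_t$$
$$\phantom{dy_t} = \mu_y(y_t)\,dt + \sigma_y(y_t)\,dW_t. \tag{4.25}$$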
It follows that the dynamics of the proxy process (4.25) are generally not equal to
the dynamics of the true short rate (4.23). For the models which permit an affine
term structure, as mentioned above, bond prices can be explicitly calculated and
the bias problem can be examined by comparing $\mu_y$ and $\sigma_y$ with $\mu$ and $\sigma$ from
(4.23).
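As a worked illustration of this comparison, suppose the yield for the maturity considered is affine in the short rate, $y(r) = A + Br$, where the constants $A$ and $B \neq 0$ are placeholders introduced here and not taken from a particular model. The second derivative in (4.25) then vanishes and the proxy dynamics follow directly:

```latex
\begin{align*}
  \frac{\partial y}{\partial r} &= B, \qquad
  \frac{\partial^2 y}{\partial r^2} = 0, \qquad
  r(y) = \frac{y - A}{B}, \\
  \mu_y(y) &= B\,\mu\!\left(\frac{y - A}{B}\right), \qquad
  \sigma_y(y) = B\,\sigma\!\left(\frac{y - A}{B}\right).
\end{align*}
% Example: with linear drift \mu(r) = a(b - r) this gives
% \mu_y(y) = a\,(A + Bb - y),
% so the yield mean-reverts at the same speed a towards the level A + Bb.
```

In this affine case the proxy inherits the functional form of the drift and diffusion up to an affine change of variable; when $y(r)$ is nonlinear the second-derivative term no longer vanishes and the drift of the proxy picks up a contribution from $\sigma^2$.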
Chapman, Long and Pearson (Chapman et al. [1999]) analyse the effect of using
1- or 3-month yield data as proxies for the short rate. For one-factor models that
allow an affine term structure the conclusion is that there are only negligible issues
with the use of a longer rate as a proxy. Unfortunately there are indications of
significant problems when the proxies in question are used with nonlinear
models of the form considered here.
These findings support the choice of data used in Aït-Sahalia [1996a,b], as 7-day
data is significantly shorter than 1- or 3-month data, and still hopefully long
enough to avoid some of the microstructure issues that may occur when using
overnight interest rates.
An alternative way of avoiding proxy problems, which we shall only briefly mention
but not consider here, is the use of a state space estimation type of method, which
uses the Kalman filter to filter out the short rate from various financial data,
thus completely avoiding the use of proxies.
Finally we could simply view the various models under consideration as attempts
to specify the dynamics of the observed data and completely disregard the fact
that we actually model the unobservable spot rate. This would of course severely
limit the usefulness of the results and we must in general keep the proxy issues
in mind whenever models for interest rates are evaluated.
Number of Obs. Mean Standard Deviation Highest Lowest
5505 0.083621 0.035911 0.24333 0.029150
Table 4.3: Details about the 7-day Eurodollar data.
4.4 Empirical Results
In this section we present the results of the estimation and model misspecification
analysis. In particular we are going to implement the estimation procedure
introduced above on the nested models (4.19)-(4.20). Based on these parameter
estimates we will consider model specification using the method of uniform
residuals presented in Chapter 3.
4.4.1 The 7-day Eurodollar data
The data used as a proxy for the short rate in this section is the 7-day Eurodollar
deposit rate (measured at the midpoint between the bid and ask rates). The data
consists of daily observations from the 1st of June, 1973 to the 25th of February,
1995, and is thus the same as considered in Aït-Sahalia [1996a,b]. This data,
which has kindly been provided by Mr. Yacine Aït-Sahalia, enables a direct
comparison between the results reported below and those reported in Aït-Sahalia
[1996b]. Also, in light of the proxy discussion above, using 7-day data seems to be
a compromise between the various problems with e.g. overnight data and the 1- or
3-month data often used as proxy. The empirical results presented below are quite
stable with regard to the data chosen as a proxy for the short rate. Only if we
restrict our attention to data observed during a relatively short time period do we
get significantly different results; for various interest rate data spanning 10 years
or more we find results similar to those indicated below.
When it comes to interest rate data there is little evidence of the weekend
effects that may be observed in stock price data. We thus ignore weekends
and holidays in the analysis; in other words we let $t_i$ denote business days and
let $\Delta_i = t_i - t_{i-1} = 1$, $\forall i$. According to (4.16) the estimating function and the
general estimation method used can, without further alterations, be implemented
to deal with non-equidistant observations.
Figure 4.3(a) indicates that a regime shift model might be needed to sufficiently
describe the excessively high rates of the early eighties. As the empirical example
in Chapter 3 demonstrates (for a different dataset but the same overall picture),
the mean reversion level in a linear drift model is clearly higher when we include
this period. However, there is a possibility that the signs of changes in the
parameters due to different regimes merely indicate a misspecified model.
Part of the reasoning behind the general model considered in this chapter is
that a nonlinear drift function may better capture the dynamics of the data in
samples with what linear drift models will interpret as different regimes, thereby
preserving the time-invariant model specification and the homogeneous properties
of the solution to the SDE.
Figure 4.4 shows a cross-plot of $r_{t_i}$ against $r_{t_{i-1}}$ and indicates a linear relationship
between the observed data. However, one must be careful not to draw too many
conclusions from Figure 4.4. To investigate the time-series properties indicated by
Figure 4.4, we attempt to model the data using a discrete autoregressive model.
A time series analysis based on e.g. the AR($k$) model
$$r_t = \alpha + \sum_{i=1}^{k} \rho_i r_{t-i} + \varepsilon_t$$
yields a poorly specified model. We detect autocorrelation, ARCH and normality
problems no matter how many lags, $k$, we add to the model. We note that there
are indications of a unit root in the data. However, any specific test for the
presence of a unit root in this setting will be done on the basis of a misspecified
model and should as such be treated with care. A likelihood-ratio test of the
hypothesis $\alpha = \delta = 0$ in e.g. a rewritten AR(5) model
$$\Delta r_t = \alpha + \delta r_{t-1} + \sum_{i=1}^{4} \gamma_i \Delta r_{t-i} + \varepsilon_t$$
yields the test statistic 6.4556 which, when compared to the appropriate
Dickey-Fuller distribution, means that we cannot reject the unit-root hypothesis;
see also Figures 4.3 and 4.5.
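To see how an AR(5) model in levels can be rewritten in the differenced form used for
the unit-root test, one may subtract $r_{t-1}$ from both sides and collect lagged
differences. The following is a sketch of that standard reparametrization; the
coefficient names $\alpha$, $\rho_i$, $\delta$ and $\gamma_i$ are labels chosen for this
illustration and are not necessarily the notation of the original model.

```latex
\begin{align*}
r_t &= \alpha + \sum_{i=1}^{5} \rho_i r_{t-i} + \varepsilon_t \\
\Delta r_t &= \alpha + \delta r_{t-1} + \sum_{i=1}^{4} \gamma_i \Delta r_{t-i} + \varepsilon_t,
\qquad
\delta = \sum_{i=1}^{5} \rho_i - 1,
\quad
\gamma_i = -\sum_{j=i+1}^{5} \rho_j .
\end{align*}
```

In this form a unit root corresponds to $\delta = 0$, that is, to the autoregressive
coefficients summing to one.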
Figure 4.3: Daily observations of the 7-day Eurodollar rate, 5505 observations, sample period:
June 1, 1973 to February 25, 1995. Panels: (a) Daily observations; (b) First difference.
Figure 4.4: Cross-plot of $r_{t_i}$ and $r_{t_{i-1}}$ for the 7-day Eurodollar data (axis labels:
observation, lag).
Roots of the companion matrix
Real       Imaginary   Modulus
0.9976     0.0000      0.9976
0.09399    0.4326      0.4427
0.09399    -0.4326     0.4427
-0.3867    0.0000      0.3867
-0.1103    0.0000      0.1103
Figure 4.5: Roots of the companion matrix for an AR(5) model fitted to the Eurodollar data.
We see some indications of a unit root; however, results of unit-root tests are inconclusive
due to the clearly misspecified time series model.
The first results are from the CKLS model, which we have estimated using
the method described above. The results are given in Table 4.4, where we present
the parameter estimates based on the estimating function (4.16).
            $\alpha_0$   $\alpha_1$   $\alpha_2$   $\alpha_3$   $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$
Estimate    1.41e-4      1.31e-3      := 0         := 0         := 0        := 0        7.86e-3     2.580
sde         3.07e-5      1.99e-4      := 0         := 0         := 0        := 0        2.82e-4     4.52e-5
Table 4.4: The estimated parameters in the CKLS model specification (4.19). The notation := 0
indicates that the relevant parameter in (4.7) is restricted to zero to derive the
given model. The first row contains estimates based on the estimating function (4.16) and the
second row contains the standard deviations calculated by (3.32)-(3.33).
Using the uniform residuals we examine how well the CKLS model fits the data.
A first crude measure of the model fit is to observe how many of the uniform
residuals are numerically equal to zero or one. Recall that $\hat{F}$ is the simulated
distribution function. If we use a sufficiently high number of Monte Carlo
simulations to calculate the distribution function, then the case where
$$u_i = \hat{F}(X_{t_i} \mid X_{t_{i-1}}; \hat{\theta})$$
is equal to zero or one means that $|X_{t_i} - X_{t_{i-1}}|$ is too large to be consistent with
the proposed model. In other words: if $u_i = 0$ then the current structural model
and the estimated parameters predict that there is zero probability that $X_{t_i}$ is
as small as what we actually observed. Similarly, if $u_i = 1$ then there should be
zero probability of observing a value as high as we actually did.
We already noted this when implementing the CIR model in the empirical example
in Chapter 3 (for a different dataset merely meant for illustrative purposes):
if the model is to be considered a satisfactory description of the data then we
should expect all the uniform residuals to satisfy $0 < u_i < 1$.
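The boundary count described above can be illustrated with a small simulation. The
sketch below is not the implementation used for the results in this chapter: it assumes
that the transition distribution is approximated by an Euler scheme with a few sub-steps
per observation interval, and the function names, the drift and diffusion callables and
all default values are hypothetical choices made for the illustration.

```python
import math
import random


def simulate_step(x0, drift, diffusion, delta, n_sub, rng):
    """Simulate the process over one observation interval with an Euler scheme."""
    h = delta / n_sub
    x = x0
    for _ in range(n_sub):
        x += drift(x) * h + diffusion(x) * math.sqrt(h) * rng.gauss(0.0, 1.0)
        x = max(x, 1e-12)  # assumption: keep the simulated rate positive
    return x


def uniform_residual(x_prev, x_next, drift, diffusion, rng,
                     delta=1.0, n_sim=500, n_sub=10):
    """Simulated conditional distribution function evaluated at the observed value."""
    below = sum(
        1 for _ in range(n_sim)
        if simulate_step(x_prev, drift, diffusion, delta, n_sub, rng) <= x_next
    )
    return below / n_sim


def boundary_count(observations, drift, diffusion, seed=0, **kwargs):
    """Return the uniform residuals and the number that equal exactly 0 or 1."""
    rng = random.Random(seed)
    residuals = [
        uniform_residual(a, b, drift, diffusion, rng, **kwargs)
        for a, b in zip(observations, observations[1:])
    ]
    n_boundary = sum(1 for u in residuals if u in (0.0, 1.0))
    return residuals, n_boundary
```

A residual of exactly zero or one here means that none of the simulated paths reached
as far as the observed value, which is the situation the boundary count is meant to flag.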
A QQ-plot of the uniform residuals and their autocorrelation function are depicted
in Figure 4.6. Further results of the specification analysis are given in Table 4.8, where
we compare with the results of the estimation in the other model specifications.
Figure 4.6: The uniform residuals $u_i$ based on the CKLS model. Panels: (a) QQ-plot, uniform
residuals against a true uniform distribution; (b) Autocorrelation function for the
uniform residuals $u_i$.
In general we note that the analysis of the CKLS model confirms the findings in
the literature. The CEV parameter $\beta_3$ is clearly larger than that of the CIR model,
which specifies that $\beta_3 := 1$, and we conclude in a similar fashion as Chan et al.
[1992]. However, even though this particular model may have better support than
e.g. square root models, the CKLS model is clearly misspecified and we conclude
that simply extending a linear drift model to include a CEV diffusion function
does not provide a satisfactory description of the data.
Figure 4.7 depicts the invariant density $\pi(r; \hat{\theta})$ given in (4.15) for the parameters
estimated in the CKLS model. We compare this density with the observed data by
means of a histogram (Figure 4.7(a)) and the empirical (nonparametric) density
of the observations (Figure 4.7(b)). As discussed earlier, a CKLS process would
converge to this limiting distribution, and we can thus use this figure as a further
source of model misspecification analysis. Neither the histogram nor the empirical
density resembles the invariant density much, which indicates that the model is not
an acceptable specification. It should be noted that we would not expect the
observed data to be perfectly distributed according to the invariant distribution.
If the model were well specified, however, a good deal of concordance between
the invariant density and the observed data should be expected. This is also the
basic idea behind the test statistic derived in Aït-Sahalia [1996b].
Figure 4.7: The implied parametric invariant density (4.15) in the CKLS model. If the model
was a reasonable description of the data, the process would converge to the invariant parametric
density. We would thus expect that the parametric density and the observed data, as given by the
histogram and the empirical density, would be somewhat similar. Panels: (a) Parametric density
and a histogram of the observations; (b) Parametric density and empirical (nonparametric) density.
The results from the general drift, CEV diffusion model (4.20) are given in Table
4.5 and Figure 4.8. The interesting aspect of this model is clearly the drift
function. In Figure 4.8(c) we see that this model exhibits strong mean reversion
when the spot rate is far away from the mean, and fairly small drift when the
process is close to the empirical mean (= 0.083621). We shall discuss the implications
of a drift function of this form further below.
            $\alpha_0$   $\alpha_1$   $\alpha_2$   $\alpha_3$   $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$
Estimate    1.01e-3      2.40e-2      1.43e-1      1.52e-5      := 0        := 0        8.24e-3     2.618
sde         5.41e-4      4.72e-4      4.74e-5      3.34e-5      := 0        := 0        6.33e-5     2.51e-5
Table 4.5: The estimated parameters in the general-drift, CEV-diffusion model specification
(4.20).
The analysis of the general drift, CEV diffusion model indicates some improvement
in model fit compared to the CKLS model. We note that the test statistic $\hat{U}$
indicates a better model fit, though there is little improvement to be found in
Figure 4.8. In fact, there are slightly more observations of the uniform residuals
equal to one or zero, which indicates that the very strong mean reversion for
extreme values in this model, indicated by Figure 4.8(c), might actually be a
disadvantage in describing the extraordinarily high interest rates of 1980-1982.
This indicates that if we wish to expand to this functional form of the drift
function then we may also have to extend to a more general diffusion function. It
should also be noted that the test in Aït-Sahalia [1996b] rejects this model on a
95% level. This means that, although a linear drift may be too restrictive in short
rate models, extending the drift to the nonlinear type considered here is (according
to the semi-parametric test developed by Aït-Sahalia) not sufficient to describe
the data well.
Figure 4.8: The uniform residuals $u_i$ based on the general-drift, CEV-diffusion model. Also
depicted is the drift function. Panels: (a) QQ-plot, uniform residuals against a true uniform
distribution; (b) Autocorrelation function for the uniform residuals $u_i$; (c) The drift
function in the model (4.20) for the estimated parameters in Table 4.5, plotted against $r$.
These results are confirmed by the results presented here. We note some
improvement in the misspecification analysis. However, these results are far from
satisfactory: the extremes of the data are still not captured by this specification,
which points towards also extending the diffusion function. This discussion is also
somewhat in line with the results in Chapman and Pearson [2000], where little
evidence of nonlinearities is found in a model with linear drift and CEV diffusion.
However, when considering the invariant distribution (Figure 4.9) we note
considerable improvement in comparison to the CKLS model; both the histogram
and the empirical density resemble the parametric density much better than in
the previous model.
We conclude that there is some indication of a need for nonlinear drift. This is
most evident when considering the invariant distribution and less evident from
the uniform residual approach, but there is a possibility that even clearer signs
of nonlinearity are muffled by a misspecified diffusion function.
Figure 4.9: The implied parametric invariant density (4.15) in the general drift, CEV diffusion
model. Panels: (a) Parametric density and a histogram of the observations; (b) Parametric
density and empirical density.
The second natural way of extending the CKLS model in the setting above is to
consider the linear mean reverting drift and instead examine the general diffusion
function. Before we present the results of estimation in the linear drift, general
diffusion model (4.21), a few words on the estimating procedure are in order.
Once we introduce the general diffusion term in the models (4.21) and (4.22) we
are faced with a far more difficult estimation task than in the other models. As
discussed above, this particular diffusion function is virtually unidentified for a
range of parameter values; especially $\beta_3 \approx 0$ or $\beta_3 \approx 1$ will cause problems. The
identification problem manifests itself in the fact that the estimating equations
are considerably more difficult to solve numerically in the models (4.21) and (4.22)
than in the models (4.19)-(4.20). To counteract this problem we make a slight
alteration to the algorithm used to solve the estimating equations. In particular,
it turns out to be necessary to force the implemented Broyden procedure
to take smaller steps during each iteration than suggested by the approximation
to the inverse Jacobian matrix, see Appendix A. Doing this increases the time
span of the estimation but only implies a small change in the original program.
These issues illustrate well the problems of gaining a good model fit by adding
more complex parametric drift and diffusion functions to the model: for each
new parameter added, the estimation is made far more difficult and new
inaccuracy problems arise. This provides a further argument for the nonparametric
approaches suggested in the literature and discussed above.
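The idea of forcing a quasi-Newton solver to take shorter steps can be sketched as
follows. This is a generic damped version of Broyden's method with an update of the
approximate inverse Jacobian, not the program referred to in Appendix A; the function
name `damped_broyden`, the fixed `damping` factor and the identity matrix as starting
approximation are assumptions made for the illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def damped_broyden(func, x0, damping=0.5, tol=1e-10, max_iter=500):
    """Solve func(x) = 0, scaling each quasi-Newton step by `damping` (0 < damping <= 1)."""
    n = len(x0)
    x = list(x0)
    fx = func(x)
    # Assumption: start from the identity as approximate inverse Jacobian.
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for iteration in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x, iteration
        # Step suggested by the inverse Jacobian approximation, then shortened.
        s = [-damping * v for v in mat_vec(h, fx)]
        x_new = [xi + si for xi, si in zip(x, s)]
        fx_new = func(x_new)
        y = [a - b for a, b in zip(fx_new, fx)]
        hy = mat_vec(h, y)
        s_h = [sum(s[i] * h[i][j] for i in range(n)) for j in range(n)]  # s^T H
        denom = sum(si * hyi for si, hyi in zip(s, hy))
        scale = math.sqrt(sum(v * v for v in s)) * math.sqrt(sum(v * v for v in hy))
        if abs(denom) > 1e-12 * scale:
            # Rank-one (Sherman-Morrison) update so that H y = s afterwards.
            for i in range(n):
                coef = (s[i] - hy[i]) / denom
                for j in range(n):
                    h[i][j] += coef * s_h[j]
        x, fx = x_new, fx_new
    raise RuntimeError("damped Broyden iteration did not converge")
```

With `damping=1.0` this reduces to the undamped iteration; smaller values trade more
iterations for more cautious steps, which matches the longer estimation time noted above.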
The results of the estimation in the model (4.21) are presented in Table 4.6 and
Figure 4.10.
            $\alpha_0$   $\alpha_1$   $\alpha_2$   $\alpha_3$   $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$
Estimate    1.33e-3      1.60e-2      := 0         := 0         1.24e-4     1.85e-3     1.87e-2     2.057
sde         6.33e-5      1.77e-4      := 0         := 0         8.08e-5     7.82e-6     8.26e-6     3.62e-5
Table 4.6: The estimated parameters in the linear-drift, general-diffusion model specification
(4.21).
We note that the general diffusion function,
$\sigma(r_t) = \sqrt{\beta_0 + \beta_1 r_t + \beta_2 r_t^{\beta_3}}$, enables
the model to better capture the extremes in the data; clearly the extended diffusion
term improves the CKLS model in terms of data fit. It is worth noting that
Figure 4.10(c) indicates that the process exhibits heteroscedasticity in a way that
is not captured in the CEV diffusion type models like CKLS or CIR. That is,
normally the parametric models associate higher spot rates with higher volatility,
whereas this model allows for higher volatility whenever the process departs from the
medium range values. However, as indicated by for instance Figure 4.10(a), there
are still many aspects of the data that are not modelled well in this specification.
Evidently, from the analysis of the general drift model above, part of these data
characteristics point towards a nonlinear drift function, and the linear drift
imposed by model (4.21) is the cause of some misspecification for which the diffusion
function cannot compensate.
Figure 4.11 indicates that the general diffusion model provides a far better fit
than the CKLS model but not quite as good as the nonlinear drift model. Of
the three models considered so far ((4.19), (4.20) and (4.21)) the nonlinear drift
model, (4.20), provides the best model fit, but none of the models provides a clearly
adequate description of the data. We therefore move on to the full model where
we consider both nonlinear drift and general diffusion at once.
As mentioned, the estimation procedure in the general model is a somewhat more
difficult numerical exercise than in the simpler models. From a consideration
of the Euler approximation we note that, should we truly wish to work in detail
with a discrete approximation to the true model, maximum likelihood based
on an approximation to the true score function is not an appealing choice. In fact,
numerical maximization of the Euler likelihood function is virtually impossible in
the full model. Instead a GMM procedure would perhaps prove to be a superior
method of estimating the eight parameters based on a discrete version of the
true model. However, a discretization bias would remain no matter the discrete
estimation method used.
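For reference, the Euler approximation replaces the unknown transition density by a
Gaussian density whose mean and variance come from a single Euler step. The sketch
below shows the resulting pseudo log-likelihood in a generic form; the function name
and the way drift and diffusion are passed as callables are assumptions for the
illustration, and no claim is made that this matches the exact setup used in the thesis.

```python
import math


def euler_log_likelihood(observations, drift, diffusion, delta=1.0):
    """Gaussian pseudo log-likelihood implied by a one-step Euler approximation.

    Each transition is treated as
        r_next | r_prev ~ N(r_prev + drift(r_prev) * delta, diffusion(r_prev)**2 * delta).
    """
    total = 0.0
    for r_prev, r_next in zip(observations, observations[1:]):
        mean = r_prev + drift(r_prev) * delta
        var = diffusion(r_prev) ** 2 * delta
        total += -0.5 * math.log(2.0 * math.pi * var) - (r_next - mean) ** 2 / (2.0 * var)
    return total
```

Because the Gaussian form is only exact in the limit of a vanishing observation
interval, maximizing this function carries the discretization bias mentioned above,
whatever optimizer is used.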
Figure 4.10: The uniform residuals $u_i$ based on the linear drift, general diffusion model.
Also depicted is the diffusion function. Panels: (a) QQ-plot, uniform residuals against a true
uniform distribution; (b) Autocorrelation function for the uniform residuals $u_i$; (c) The
diffusion function in the model (4.21) for the estimated parameters in Table 4.6, plotted
against $r$. This graph indicates that the volatility of the short rate is low for rates near
the mean values and increases for both lower and higher spot rate values.
The parameter estimates are given in Table 4.7.
The combination of nonlinear drift and general diffusion provides a model that
describes the data far better than the standard CKLS model. However, there are
still problems with the residuals in the same manner as in the previous models, which
indicates some excess volatility issues that are still not caught by the model.
The uniform residuals are given in Figure 4.12; we note the same picture as in
the previous models. However, the autocorrelation in the full model is smaller
than above, indicating a better model fit. See Table 4.8 for a comparison of all
four models.
An interesting observation is that the nonlinear drift in both Figure 4.8(c) and
Figure 4.12(c) indicates regions around the mean of the observations where the drift
is essentially zero. Once the process departs from this region there is strong mean
reversion; near the historical mean the process resembles a model with zero drift
and nonzero diffusion. In a time series setting this would be similar to a model
that behaved like a unit-root process for central values and as a stationary
process for more extreme values. A common finding in the interest rate literature
is that discrete models often report the data to be I(1). This could partly be
explained by the above analysis, which dictates that the process is indeed I(1) in
the center region and that strong mean reversion only takes place for extraordinarily
high or low values. This also supports the shape of the diffusion function: when
values are low or high there is stronger drift towards the mean than in the linear
models, thus matching the influence of the increasing volatility. This means that
it is possible for the process to exhibit high volatility for low spot rate values
without implying a positive probability of reaching zero. When values are near
the mean, both the drift and the volatility are relatively low.
Figure 4.11: The implied parametric invariant density (4.15) in the linear drift, general
diffusion model. Panels: (a) Parametric density and a histogram of the observations;
(b) Parametric density and empirical density.
            $\alpha_0$   $\alpha_1$   $\alpha_2$   $\alpha_3$   $\beta_0$   $\beta_1$   $\beta_2$   $\beta_3$
Estimate    -1.52e-3     1.44e-2      -4.01e-2     4.29e-5      3.74e-5     -6.34e-4    3.24e-3     2.065
sde         8.97e-5      1.06e-4      4.14e-5      1.64e-5      1.03e-7     1.34e-7     2.09e-6     2.65e-5
Table 4.7: The estimated parameters in the unconstrained model (4.22). From an Euler maximum
likelihood point of view this model is, for all practical purposes, unidentified. Numerical
maximization of the Euler approximation to the log-likelihood function is highly dependent on
the initial values and the function has many virtually flat areas, making the Euler
approximation approach to parameter estimation in continuous-time models even less appealing
than the discretization bias alone would suggest.
In terms of the invariant density, Figure 4.13 supports that the full model provides
a somewhat better description of the data than the general drift, CEV diffusion
model and a far better fit than the two models with linear drift. This clearly
supports the findings in Aït-Sahalia [1996b] that the general model fits the
observed density much better, though, as we already know, the model was originally
introduced with a test based on invariant densities in mind, so this should hardly
be surprising.
When we turn to the test statistic derived from the uniform residuals we see that
all models are rejected on a 95% level. The models with the best test statistic are
the two models with nonlinear drift, but clearly there are some aspects of the data
that are not picked up by the models considered.
Figure 4.12: The uniform residuals $u_i$ based on the full model. Also depicted are the drift
and diffusion functions for the estimated parameters in Table 4.7. Panels: (a) QQ-plot, uniform
residuals against a true uniform distribution; (b) Autocorrelation function for the uniform
residuals $u_i$; (c) Drift function for model (4.22) with parameters from Table 4.7;
(d) Diffusion function for model (4.22) with parameters from Table 4.7.
Model                              Number of boundary observations   Value of test statistic, $\hat{U}$
CKLS (4.19)                        139                               8687.16
General drift model (4.20)         167                               8711
General diffusion model (4.21)     104                               8466.45
Unrestricted model (4.22)          96                                8964.24
Table 4.8: Results of the model specification analysis using the method of uniform residuals for
the models (4.19)-(4.22).
An important aspect of this analysis is that we have derived a model which
implies a parametric density that corresponds well to the observations. However,
the fact that the invariant density is well specified is not a sufficient requirement
to conclude that the model cannot be rejected by other test statistics. If we compare
this to the conclusions in Aït-Sahalia [1996b] we note that the method of testing
models by use of the invariant density alone provides ambiguous results. The full
model above provides a good fit in regard to how well the density compares to
the nonparametric density, though when considering an alternative method of
model misspecification analysis these results are less convincing. The conclusion
of the analysis is thus somewhat less clear than in Aït-Sahalia [1996b]: we do
find clear signs of nonlinearity in the drift, but nonlinear drift is not the final
extension needed to provide a well specified spot rate model.
Figure 4.13: The implied parametric invariant density (4.15) in the full model. Panels:
(a) Parametric density and a histogram of the observations; (b) Parametric density and
empirical density.
4.4.2 Conclusion on the empirical analysis
One-factor models are an appealing tool for modelling the spot rate; they allow
for a simple method to describe the dynamics of an important financial object.
A great part of the success of the one-dimensional SDE approach to term
structure modelling has been based on the fact that many of the models used are
simple enough to imply well defined expressions for various spot rate derivatives,
or are in other ways structured such that both empirical and theoretical
implementations are feasible and tractable. The analysis above indicates that, if the
parametric one-factor model is to have a decent model fit, we need to work within a
far more complex set of models than those presented in the literature. We should
allow for nonlinearities in both drift and diffusion, and doing so means that we
need to include a greater number of parameters than in the appealing affine term
structure models used to price bonds and other derivatives. Adding these extra
parameters greatly complicates the empirical and theoretical properties of the
models. It is questionable whether a parametric one-factor model that fits the data
really well is tractable to both estimate and work with in a theoretical setting.
A natural extension of the models would be to consider two-factor models in a
stochastic volatility setting. This implies specifying a second stochastic differential
equation for the volatility, thus introducing a second source of randomness.
These models imply that the estimating function approach becomes a doubtful
estimation method, as we have no natural proxy for the volatility.
In this chapter we have considered the basic theory of term structure modelling
and we have discussed the classical (one-factor, stochastic differential equation)
models for the instantaneous short rate. From here we have briefly discussed the
issues of using a proxy for the unobserved spot rate to estimate the parameters
of the model. We have introduced a general parametric model capable of both
capturing the effects of the classical models and allowing for more complex
structure than the models previously used. One important aspect of the model
suggested is that it explains the presence of a unit root in time series analysis of
short rate data. This model allows for (and explains why) data that proves to
be non-stationary in a discrete model may in fact be modelled by a stationary
continuous model. This proved to be the case for the data used in the analysis: a
unit root test could not reject that the model was I(1), whereas the corresponding
SDE model was indeed ergodic.
By use of an estimating function approach we have analyzed the model, and the
parameter estimates are indeed such that both drift and diffusion functions differ
greatly in form from the benchmark model. The final conclusion to this chapter
is thus that a reconsideration of term structure models is needed. An important
first step is to move away from the confining assumption of linearity.
Chapter 5
A semi-parametric approach
From the empirical analysis of a series of nested parametric diffusion models in
Chapter 4 we concluded, amongst other things, the following: although the full
nonlinear specification (4.7) provided a satisfactory model fit based on the invariant
distribution alone, model misspecification analysis based on the uniform residuals
showed that some extremes in the volatility were not caught by the parametric
model.
Faced with the infinite choice of possible parametric model specifications, it proves
a complicated task to introduce new models based on reasonable economic theory
to model the short rate. Adding to this the fact that the more complicated
the parametric model, the more difficult the identification and estimation of the
parameters, we are led to the conclusion that simply adding further terms to
parametric models may prove futile.
In an attempt to gain a different perspective on the question of whether the short
rate drift is nonlinear we turn to a semi-parametric estimation approach.
Specifically, we return to the linear mean reverting parametric drift of the CIR model
and attempt to estimate the diffusion function using nonparametric methods.
This should provide some knowledge as to whether a reasonable model fit can
possibly be reached with a linear drift function.
A major disadvantage of such nonparametric methods lies in the fact that we can
say much less about the properties of the model than in the case of parametric
models. For instance, in the parametric models we could derive analytical
conditions on the parameters ensuring stationarity and ergodicity and even derive an
expression for the invariant density. Clearly many such analytical properties of
the models are not readily available when using nonparametric estimation.
5.1 The Estimation Approach
We consider the usual time invariant stochastic differential equation
$$dr_t = \mu(r_t)dt + \sigma(r_t)dW_t, \tag{5.1}$$
where we temporarily suppress the parameter vector $\theta$.
The idea behind the nonlinear estimation of $\sigma(\cdot)$ is suggested in Aït-Sahalia
[1996a] and lies in using nonparametric estimates of the invariant distribution
and a simple parametric specification of $\mu(\cdot)$ to reconstruct the diffusion
function. This is thus a slightly complementary approach to the one used in the
previous chapters. We previously introduced functional forms of both drift and
diffusion functions and then estimated the invariant distribution based on these
choices.
In the following we assume that we have equidistant discrete observations
$r_{t_0}, r_{t_1}, \dots, r_{t_n}$ with observation interval $\Delta$, and we let $f_\Delta(x, y)$ be the
transition density of the process (5.1) as defined in Chapter 2. Let $\pi(\cdot)$ be the
invariant density of the process if indeed such a density exists, that is, if
Theorem 2.9 is satisfied.
Specifically we need the following assumptions.
Assumption 5.1
1. The drift and diffusion functions satisfy Theorem 2.9.
2. The stationary density $\pi$ is positive on $I = (0, \infty)$ and $r_{t_0} \sim \pi$, that is, the
initial condition is such that the process is stationary.
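From the Kolmogorov forward equation we have
$$\frac{\partial f_\Delta(r_{t_{i-1}}, r_{t_i})}{\partial \Delta}
= -\frac{\partial}{\partial r_{t_i}} \left( \mu(r_{t_i}) f_\Delta(r_{t_{i-1}}, r_{t_i}) \right)
+ \frac{1}{2} \frac{\partial^2}{\partial r_{t_i}^2} \left( \sigma^2(r_{t_i}) f_\Delta(r_{t_{i-1}}, r_{t_i}) \right). \tag{5.2}$$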
By Assumption 5.1 stationarity implies
$$\int_I f_\Delta(r_{t_{i-1}}, r_{t_i}) \pi(r_{t_{i-1}}) \, dr_{t_{i-1}} = \pi(r_{t_i}) \tag{5.3}$$
and
$$\frac{\partial}{\partial \Delta} \pi(r_{t_i}) = 0. \tag{5.4}$$
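Using (5.3)-(5.4) (and under the condition that we can interchange integration
and differentiation) we note that multiplying (5.2) with $\pi(r_{t_{i-1}})$ and integrating
with respect to $r_{t_{i-1}}$ gives the following.
From the left hand side of (5.2):
$$\int_I \pi(r_{t_{i-1}}) \frac{\partial f_\Delta(r_{t_{i-1}}, r_{t_i})}{\partial \Delta} \, dr_{t_{i-1}}
= \int_I \frac{\partial}{\partial \Delta} \left( \pi(r_{t_{i-1}}) f_\Delta(r_{t_{i-1}}, r_{t_i}) \right) dr_{t_{i-1}}
= \frac{\partial}{\partial \Delta} \pi(r_{t_i}) = 0.$$
The first term of the right hand side of (5.2) yields:
$$\int_I \pi(r_{t_{i-1}}) \frac{\partial}{\partial r_{t_i}} \left( \mu(r_{t_i}) f_\Delta(r_{t_{i-1}}, r_{t_i}) \right) dr_{t_{i-1}}
= \mu(r_{t_i}) \frac{\partial \pi(r_{t_i})}{\partial r_{t_i}} + \pi(r_{t_i}) \frac{\partial \mu(r_{t_i})}{\partial r_{t_i}}
= \frac{\partial}{\partial r_{t_i}} \left( \mu(r_{t_i}) \pi(r_{t_i}) \right).$$
Equivalently the second term of the right hand side of (5.2) gives:
$$\int_I \pi(r_{t_{i-1}}) \frac{\partial^2}{\partial r_{t_i}^2} \left( \sigma^2(r_{t_i}) f_\Delta(r_{t_{i-1}}, r_{t_i}) \right) dr_{t_{i-1}}
= \frac{\partial^2}{\partial r_{t_i}^2} \left( \sigma^2(r_{t_i}) \pi(r_{t_i}) \right).$$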
Combining these equations gives us the ordinary differential equation:
$$\frac{1}{2} \frac{\partial^2}{\partial r_{t_i}^2} \left( \sigma^2(r_{t_i}) \pi(r_{t_i}) \right)
= \frac{\partial}{\partial r_{t_i}} \left( \mu(r_{t_i}) \pi(r_{t_i}) \right) \tag{5.5}$$
with initial condition $\pi(0) = 0$.
Solving this differential equation yields the central relationship between the
diffusion function and an expression containing the drift function and invariant
density:
$$\sigma^2(r) = \frac{2}{\pi(r)} \int_0^r \mu(s) \pi(s) \, ds. \tag{5.6}$$
Thus, if we have estimated the drift function and invariant measure then under
Assumption 5.1 the diffusion function is completely identified.
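The relationship between drift, invariant density and diffusion can be checked
numerically. The sketch below evaluates the integral with a trapezoidal rule on a grid;
the function name `reconstruct_sigma2` is hypothetical, and the check uses an assumed
CIR-type example (linear drift, squared diffusion proportional to the rate, gamma
invariant density) with made-up parameter values, chosen only because the answer is
known in closed form.

```python
import math


def reconstruct_sigma2(grid, drift, density):
    """Evaluate sigma^2(r) = 2 / pi(r) * int_0^r mu(s) pi(s) ds on an increasing grid.

    The integral is accumulated with the trapezoidal rule; grid[0] is taken as
    the lower limit of integration.
    """
    sigma2 = []
    integral = 0.0
    prev_r = grid[0]
    prev_val = drift(prev_r) * density(prev_r)
    for r in grid:
        val = drift(r) * density(r)
        integral += 0.5 * (val + prev_val) * (r - prev_r)
        prev_r, prev_val = r, val
        dens = density(r)
        sigma2.append(2.0 * integral / dens if dens > 0.0 else float("nan"))
    return sigma2


# Check on a case with a known answer (hypothetical CIR-type parameters):
# drift a(b - r), sigma^2(r) = s^2 r, invariant density Gamma(2ab/s^2, rate 2a/s^2).
a, b, s = 0.5, 0.08, 0.1
shape, rate = 2.0 * a * b / s**2, 2.0 * a / s**2


def cir_density(r):
    """Gamma density with the shape and rate defined above; zero for r <= 0."""
    if r <= 0.0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(r)
                    - rate * r - math.lgamma(shape))


grid = [i * 1e-4 for i in range(2001)]
sigma2 = reconstruct_sigma2(grid, lambda r: a * (b - r), cir_density)
```

In practice the density would be replaced by a nonparametric estimate, in which case the
ratio becomes numerically fragile wherever the estimated density is close to zero.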
5.1.1 Using transition probabilities to estimate the drift function
Based on the relationship given by (5.6) we need to specify a functional form
of the drift function simple enough to identify without specifying the diffusion.
This rules out using the methods examined in Chapter 3. If we were to use any
of the estimated drift functions from Chapter 4 this would imply restrictions on
the diffusion function and thus render the object of the nonparametric diffusion
estimation redundant.
We note that if we specify a linear drift function
$$\mu(r; \theta) = a(b - r) \tag{5.7}$$
the model will be consistent with the classical models. More importantly, from
Example 3.3 this drift specification enables us to completely identify the
parameters $\theta = (a, b)^T$ using the transition probabilities regardless of the exact
diffusion specification. In fact, we note that the calculations in Example 3.3 show
that any model of the form
$$dr_t = a(b - r_t)dt + \sigma(r_t)dW_t$$
satisfies
$$E_\theta \left[ r_{t_i} \mid r_{t_{i-1}} \right] = b + e^{-a\Delta}(r_{t_{i-1}} - b). \tag{5.8}$$
Sufficient conditions for (5.8) are also discussed in Example 3.3 and simply consist
of integrability conditions on the diffusion function, such as restricting $\sigma$ to $L^2$.
Dene
= b(1 e
a
)
= e
a
and consider the regression
r
t
i
= +r
t
i1
+
t
i
. (5.9)
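Inverting the definitions of the intercept $\alpha$ and the slope $\beta$ in (5.9) gives
$$a = -\frac{\ln \beta}{\Delta} \tag{5.10}$$
$$b = \frac{\alpha}{1 - \beta} \tag{5.11}$$
By implementing e.g. OLS on the discrete model (5.9) we can estimate $\alpha$ and
$\beta$ and thus transform back to the continuous model
$$\hat{a} = -\frac{\ln \hat{\beta}}{\Delta} \tag{5.12}$$
$$\hat{b} = \frac{\hat{\alpha}}{1 - \hat{\beta}} \tag{5.13}$$
and have thereby identified the drift function. Since the solution to a
time-invariant SDE is a Markov process, the residuals in (5.9) will be
serially uncorrelated.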
Evidently this method relies heavily on the fact that we can derive an analytical
expression for the conditional mean and thereby identify the parameters of the
drift function. These considerations, alongside the wish to explore further the
possibility of a well-specified spot rate model with linear drift, are the main
reasons for using this particular drift specification.
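The regression step (5.9) and the back-transformation (5.12)-(5.13) can be sketched in a few lines. The following is a minimal illustration, assuming NumPy is available; the function name `estimate_linear_drift` and the noise-free test series are hypothetical and not part of the thesis implementation.

```python
import numpy as np

def estimate_linear_drift(r, delta):
    """OLS fit of r[i] = alpha + beta * r[i-1] + eps, mapped back to (a, b).

    r     : observations r_{t_0}, ..., r_{t_n} with equidistant spacing
    delta : observation interval
    """
    r = np.asarray(r, dtype=float)
    x, y = r[:-1], r[1:]
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    if not 0.0 < beta < 1.0:
        # The log transform (5.12) needs beta > 0; beta < 1 gives mean reversion.
        raise ValueError("beta outside (0, 1): no mean-reverting (a, b) exists")
    a = -np.log(beta) / delta      # (5.12)
    b = alpha / (1.0 - beta)       # (5.13)
    return a, b, alpha, beta

# Noise-free check: a series generated exactly from the conditional mean (5.8).
a_true, b_true, delta = 1.0, 2.0, 0.1
series = [3.0]
for _ in range(50):
    series.append(b_true + np.exp(-a_true * delta) * (series[-1] - b_true))
a_hat, b_hat, alpha_hat, beta_hat = estimate_linear_drift(series, delta)
```

On real data the residuals are not zero, so the estimates carry sampling error; the check above only confirms that the mapping between $(\alpha, \beta)$ and $(a, b)$ is inverted correctly.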
5.1.2 Kernel estimation of the diffusion function
Based on the discrete observations $r_{t_0}, r_{t_1}, \ldots, r_{t_n}$ we use a Gaussian
kernel $K$ with bandwidth $h_n$ to estimate the invariant density
$$\hat{\pi}(r) = \frac{1}{n} \sum_{i=0}^{n} \frac{1}{h_n} K\left( \frac{r - r_{t_i}}{h_n} \right), \tag{5.14}$$
where
$$K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}. \tag{5.15}$$
Since $r_{t_0}, r_{t_1}, \ldots, r_{t_n}$ are stochastic variables then so is $\hat{\pi}(r)$.
Let $\pi$ represent the true density of the observations given Assumption 5.1. We
now discuss some statistical properties of kernel density estimation. At a single
point $r$ we define the mean square error
$$\mathrm{MSE}_r(\hat{\pi}) = E\left[ (\hat{\pi}(r) - \pi(r))^2 \right]. \tag{5.16}$$
From this term we define a measure of the difference between the estimator and
the true density, that is the mean integrated square error
$$\mathrm{MISE}(\hat{\pi}) = E\left[ \int (\hat{\pi}(r) - \pi(r))^2 \, dr \right]. \tag{5.17}$$
The optimal choice of bandwidth $h_n$ should be a value that minimizes the MISE
of the estimator. We note two opposing issues when deciding which bandwidth
to use. First, for a fixed point $r$ we consider the bias of the estimator
$$\mathrm{Bias}_{h_n}(r) = E[\hat{\pi}(r)] - \pi(r) \tag{5.18}$$
and the variance of the estimator
$$\mathrm{Var}_{h_n}(r) = E\left[ (\hat{\pi}(r) - E[\hat{\pi}(r)])^2 \right]. \tag{5.19}$$
It follows that
$$\mathrm{MISE}(\hat{\pi}) = \int \mathrm{Bias}_{h_n}(r)^2 \, dr + \int \mathrm{Var}_{h_n}(r) \, dr.$$
It is easily shown that a small bandwidth leads to small bias and large variance,
while a large bandwidth reduces the variance at the cost of increased bias. The
optimal choice of $h_n$ should therefore balance these two effects and is based on
the unknown true density. See for instance Section 12.3.2 in Campel et al. [1997]
for more on bandwidth choice. For the sake of consistency we shall, for the
empirical implementation below, use the same rule for bandwidth choice as that
used in Aït-Sahalia [1996a]. However, we still briefly discuss the bandwidth choice
using asymptotic arguments. If we choose the bandwidth $h_n$ such that $h_n \to 0$
and $n h_n \to \infty$, the mean integrated square error can be approximated by
$$\mathrm{MISE}(\hat{\pi}) \approx \frac{1}{n h_n} \|K\|_1 + \frac{h_n^4}{4} \|K\|_2 \|\pi''\|, \tag{5.20}$$
where $\|K\|_1$ and $\|K\|_2$ are constants depending on the choice of kernel and
$\|\pi''\|$ is a constant depending on the second derivative of the unknown true
density.
That is, the interpretation of (5.20) is that when the number of observations is
large and the bandwidth close to zero the mean integrated square error tends to
a deterministic expression. This expression depends only on the choice of kernel
function and the second derivative of the unknown true density (and of course
the number of observations and the bandwidth).
If we use the Gaussian kernel the two constants $\|K\|_1$ and $\|K\|_2$ can be
explicitly calculated and a common approximation is to compute $\|\pi''\|$ as if the
density was normal $\mathcal{N}(\mu, \sigma^2)$. Thus we can derive an explicit approximation
to the mean integrated square error and thereby find an approximation to the
true optimal bandwidth choice.
Minimizing (5.20) with respect to $h_n$ and inserting the Gaussian kernel and the
normality approximation yields the following
$$h_n^* = \left( \frac{\|K\|_1}{\|\pi''\| \, \|K\|_2 \, n} \right)^{1/5} = \left( \frac{4 \hat{\sigma}^5}{3n} \right)^{1/5} \approx 1.06 \, \hat{\sigma} \, n^{-1/5}. \tag{5.21}$$
The first equality is derived from minimizing (5.20) and the second equality is
based on the Gaussian kernel assumption and the normality assumption regarding
the true density; finally $\hat{\sigma}$ is the estimated standard deviation of the data.
This provides a relatively simple rule of thumb regarding optimal bandwidth
choice. The choice of a Gaussian kernel is not crucial; the density estimation
could equivalently be implemented with various alternative kernel functions
satisfying
$$\int K(x) \, dx = 1, \qquad K(x) \geq 0.$$
If a different kernel is used the approximation to the optimal bandwidth $h_n^*$
must naturally be rescaled. If the true density is not Gaussian, $h_n^*$ may provide
too crude an approximation; however, for large datasets we would expect this
simple choice to provide decent results. In particular we note that $h_n^*$ is the
default bandwidth used by, for instance, the econometrics programs PcGive/PcFiml.
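The kernel estimator (5.14)-(5.15) with the rule-of-thumb bandwidth (5.21) can be sketched as follows. This is an illustration only, assuming NumPy; the function names are hypothetical, the density is normalized by the number of observations actually supplied, and the standard normal test sample is an assumption made purely for the check.

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Bandwidth 1.06 * sigma_hat * n^(-1/5) as in (5.21)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-1.0 / 5.0)

def kernel_density(grid, x, h):
    """Gaussian kernel density estimate evaluated at every point of grid."""
    grid = np.asarray(grid, dtype=float)[:, None]   # shape (m, 1)
    x = np.asarray(x, dtype=float)[None, :]         # shape (1, n)
    u = (grid - x) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernel weights over the observations, then rescale by h.
    return k.mean(axis=1) / h

# Illustrative data: a seeded standard normal sample (not the thesis data).
rng = np.random.default_rng(0)
sample = rng.normal(size=2000)
h = rule_of_thumb_bandwidth(sample)
grid = np.linspace(-8.0, 8.0, 1601)
density = kernel_density(grid, sample, h)
total_mass = density.sum() * (grid[1] - grid[0])   # Riemann sum, close to 1
```

Because every kernel integrates to one, the estimate integrates to one as well; the Riemann sum in the last line is a quick sanity check on the implementation.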
5.1.3 Semi-parametric diffusion estimation
Denote by $\mu(r; \hat{\theta})$ the linear parametric drift with parameters estimated
according to (5.9).
According to (5.6) we now derive an estimator for the diffusion function
$$\hat{\sigma}^2(r) = \frac{2}{\hat{\pi}(r)} \int_0^r \mu(s; \hat{\theta}) \hat{\pi}(s) \, ds. \tag{5.22}$$
Under a series of regularity conditions regarding the kernel, the invariant measure
and the bandwidth, Aït-Sahalia [1996a] shows that this estimator is pointwise
consistent and asymptotically normal, see Theorem 2 in Aït-Sahalia [1996a].
The main weakness of this method lies in Assumption 5.1. From Theorem 2.9 we
know that the process can be expected to approach the invariant density. This
method, however, assumes that the observations of the process all stem from a
time period when the process did indeed follow this invariant distribution exactly.
The results from the parametric estimation methods made no such assumptions.
We conclude that this method should only be implemented on large datasets
for which the observations can reasonably be considered close to the invariant
density.
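A minimal numerical sketch of the estimator (5.22) is given below, assuming NumPy and a grid that starts at zero. The function name `diffusion_squared` is hypothetical. To check the formula in isolation, the kernel estimate is replaced by the exact stationary density of a CIR process with $(a, b, \sigma) = (1, 2, 1)$, which is a gamma density with shape 4 and rate 2; in an application the density values would come from (5.14).

```python
import numpy as np

def diffusion_squared(grid, density, drift):
    """Evaluate 2 / pi(r) * int_0^r mu(s) pi(s) ds on an increasing grid.

    grid    : increasing points, grid[0] == 0 (lower integration limit)
    density : invariant density values on the grid
    drift   : drift values mu(s) on the grid
    """
    integrand = drift * density
    steps = np.diff(grid)
    # Cumulative trapezoidal rule for the integral from 0 to each grid point.
    cumulative = np.concatenate(
        [[0.0], np.cumsum(0.5 * steps * (integrand[1:] + integrand[:-1]))]
    )
    # Where the density is zero the ratio is undefined and returned as nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return 2.0 * cumulative / density

# Check against the CIR case: gamma(shape=4, rate=2) density, drift 1 * (2 - r).
grid = np.linspace(0.0, 5.0, 5001)
density = (16.0 / 6.0) * grid ** 3 * np.exp(-2.0 * grid)
drift = 1.0 * (2.0 - grid)
sigma2 = diffusion_squared(grid, density, drift)
inner = (grid > 0.5) & (grid < 4.0)
max_error = np.max(np.abs(sigma2[inner] - grid[inner]))  # true value is sigma^2 * r = r
```

The division by the density is what makes the estimate fragile where few observations fall: small errors in the numerator are amplified when the estimated density is close to zero.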
5.2 A simulation study
In this section we examine the finite sample properties of the semi-parametric
estimation procedure.
As in the previous chapters the processes are simulated using a Milstein scheme
with a simulation interval of 0.0001; the estimations are then based on
observations sampled from these simulations for various observation intervals, $\Delta$.
We simulate the observations for the time interval $[t_0, t_n]$, thus having
$n = t_n / \Delta$ observations in each sample.
The first results are for a simulated CIR process
$$dr_t = a(b - r_t) \, dt + \sigma \sqrt{r_t} \, dW_t$$
with true parameters $(a, b, \sigma) = (1, 2, 1)$.
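The simulation set-up can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the study: it assumes NumPy, the function name and the `sample_every` argument (the ratio of observation interval to simulation step) are hypothetical, and the square root is evaluated at $\max(r, 0)$ as a safeguard against negative values caused by discretization. For the CIR diffusion $\sigma\sqrt{r}$ the Milstein correction term is $\frac{1}{4}\sigma^2(\Delta W^2 - dt)$.

```python
import numpy as np

def simulate_cir_milstein(a, b, sigma, r0, t_end, dt, sample_every, seed=0):
    """Simulate dr = a(b - r)dt + sigma*sqrt(r)dW with a Milstein scheme.

    Returns the path sampled every `sample_every` simulation steps,
    i.e. with observation interval sample_every * dt.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)   # Brownian increments
    r = float(r0)
    observations = [r]
    for k in range(n_steps):
        r_pos = max(r, 0.0)   # safeguard, an assumption of this sketch
        r = (r
             + a * (b - r) * dt
             + sigma * np.sqrt(r_pos) * dw[k]
             + 0.25 * sigma ** 2 * (dw[k] ** 2 - dt))   # Milstein correction
        if (k + 1) % sample_every == 0:
            observations.append(r)
    return np.array(observations)

# Coarser step than the 0.0001 used in the text, to keep the example fast.
path = simulate_cir_milstein(a=1.0, b=2.0, sigma=1.0, r0=2.0,
                             t_end=200.0, dt=0.01, sample_every=10)
```

With these settings the observation interval is 0.1 and the sample contains the initial value plus 2000 observations; the sample mean should lie near the long-run level $b = 2$.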
Using the parametric method above we obtain estimates $(\hat{a}, \hat{b})$ for $a$ and $b$
without assuming any specific structure of the diffusion function. A nonparametric
function $r \mapsto \hat{\sigma}^2(r)$, which estimates the true diffusion function (in the CIR
case $r \mapsto \sigma^2 r$), can then be calculated. Based on these estimates we can evaluate
the quality of the method by comparing $(\hat{a}, \hat{b})$ with the true values and also by
comparing the estimated diffusion function to the true function. Subsequently we
can also use the method of uniform residuals to evaluate how well the
semi-parametrically estimated model describes the simulated data.
To investigate the parametric estimators 1000 sample paths are simulated for
various values of $t_n$ and $\Delta$ and the estimators are calculated for each
simulation; the results are given in Table 5.1 and Figure 5.1. The two drift
parameters are quite close to the true parameters for all values of $t_n$ and
$\Delta$, and the standard deviations decrease significantly as the observation
interval $\Delta$ decreases. The estimation procedure considered here hinges on the
fact that we can derive estimators for the parametric drift without imposing
structure on the diffusion. In this case of linear drift this follows easily and we
conclude that the parametric estimators are reliable and easily calculated.

                    $\hat{a}_n$               $\hat{b}_n$
  $t_n$   $\Delta$    mean      sde       mean      sde
  100     2.5       0.9078    0.4159    2.0087    0.1694
  100     1         1.1217    0.3584    2.0023    0.1504
  100     0.1       1.0549    0.1794    2.0039    0.1433
  250     2.5       0.9961    0.3924    1.9993    0.1119
  250     1         1.0491    0.1860    2.0029    0.0963
  250     0.1       1.0248    0.1141    2.0024    0.0912
  500     2.5       1.0714    0.4056    2.0050    0.0767
  500     1         1.0224    0.1290    2.0010    0.0673
  500     0.1       1.0106    0.0780    2.0023    0.0636

Table 5.1: Results of the parametric part of the simulation study of the
semi-parametric estimation procedure. Data is a simulated CIR process with true
values $(a, b, \sigma^2) = (1, 2, 1)$. A total of 1000 simulations with simulation
interval 0.0001 were made and the observations were then sampled according to
the values of $\Delta$ and $t_n$.
The diffusion estimates are compared to the true diffusion which in the CIR case
is simply a linear function. Figures 5.2(a), 5.3(a) and 5.4(a) illustrate this for
different values of $\Delta$ and fixed $t_n$. We note significantly improved estimates in
Figure 5.4(a) compared to Figure 5.2(a). It is noteworthy though that the
improvements in the estimation of the diffusion function turn out to stem more from
the fact that smaller $\Delta$ and fixed $t_n$ imply a higher number of observations than
from the decrease in $\Delta$ alone. In fact, we note that an increase in the number of
observations, $n$, with fixed $\Delta$ results in a far greater improvement in the
nonparametrically estimated function than a decrease in $\Delta$ for fixed $n$. This effect
is partly based on the stationarity arguments used in the construction of the
estimators. That is, if the data is truly distributed according to the stationary
distribution we would expect the interval between the observations to be less
important since we do not rely on any discrete approximation to the true dynamics
of the process.
We thus note that this estimation procedure is well suited for the way new
observations are usually added to a dataset, that is subsequent observations are
added to the original data thereby increasing $n$ but not altering $\Delta$. For the
asymptotic properties of the parametric estimators based on the Euler approximation
we relied on the unrealistic assumption that $\Delta \to 0$ as $n \to \infty$. The fact that
the semi-parametric estimator shares the appealing property of parametric
estimators based on unbiased estimating functions, namely that we need not assume
the observation interval tends to zero for consistency and asymptotic normality,
gives further credit to this method. However, we still need the assumption of
stationarity, an assumption that is not needed in the case of estimating functions.
Apart from the diffusion estimates Figures 5.2-5.4 also depict the results of model
misspecification analysis based on uniform residuals. These figures indicate that
the semi-parametric procedure describes the dynamics of the simulated data well.
For all choices of $\Delta$ the QQ-plots are close to the diagonal and there are few
signs of autocorrelation. In particular for $\Delta = 0.1$ we have a large number of
observations and the QQ-plot is almost indistinguishable from the diagonal.
The model test statistic $U$ also gives no reason to reject the semi-parametric model.
Figure 5.1: Histogram and empirical density for parametric estimators for the
drift function based on (5.9) for the simulated CIR process with true values
$(a, b, \sigma) = (1, 2, 1)$, estimated for $t_n = 250$ and $\Delta = 0.1$. Panels: (a)
estimator for the parameter $a$, true value $a = 1$; (b) estimator for the parameter
$b$, true value $b = 2$.
Figure 5.2: Model estimation and misspecification for a simulated CIR process
with $t_n = 500$ and $\Delta = 2.5$. Panels: (a) the estimated nonparametric diffusion
function $\hat{\sigma}^2$ compared with the true diffusion function; (b) QQ-plot of the
uniform residuals calculated for the model with linear drift and nonparametric
diffusion function; (c) ACF for the uniform residuals.
A similar analysis for other linear drift models such as the Vasicek
(Ornstein-Uhlenbeck) model has been carried out. The results are similar to those
of the CIR model. However, once we implement the method for simulated data where
the true drift is not linear the results tend to reject the semi-parametric
specification, which is also what should be expected since the only parametric
restriction in the method is in fact the linear drift.
Based on the simulation studies we are quite confident that if a given dataset
does in fact allow for a specification with linear drift and some general diffusion
function, the semi-parametric method should provide a reasonable description of
the data.
Figure 5.3: Model estimation and misspecification for a simulated CIR process
with $t_n = 500$ and $\Delta = 1$. Panels: (a) estimated nonparametric diffusion function
compared with the true diffusion; (b) QQ-plot of the uniform residuals; (c) ACF
for the uniform residuals.
Figure 5.4: Model estimation and misspecification for a simulated CIR process
with $t_n = 500$ and $\Delta = 0.1$. Panels: (a) estimated nonparametric diffusion
function compared with the true diffusion; (b) QQ-plot of the uniform residuals;
(c) ACF for the uniform residuals.
             $\hat{\alpha}_n$      $\hat{\beta}_n$
  Value      0.0005287     0.993636
  sde        0.0001386     0.000529
             $\hat{a}_n$           $\hat{b}_n$
  Value      0.0063843     0.083082

Table 5.2: Results of maximum likelihood estimation in the discrete system (5.9),
together with the estimates of the drift parameters in the continuous model
derived according to (5.12)-(5.13).
5.3 Empirical results
The data used in this section is the 7 day Eurodollar data also used in Chapter
4, that is daily observations ($\Delta = 1$) from the 1st of June, 1973 to the 25th of
February, 1995, a total of 5505 observations. For this number of observations
the optimal bandwidth, in the sense of Aït-Sahalia [1996a] Appendix 1, is
$h_n = 0.016033$.
We first estimate the indirect drift parameters according to (5.9) and transform
to the continuous time model parameters according to (5.12)-(5.13). We use
maximum likelihood estimation in the discrete model; under the assumption of
normally distributed residuals this is equivalent to estimation by OLS. The
results are given in Table 5.2.
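As a check on the transformation, the continuous-time values in Table 5.2 can be reproduced, up to rounding of the reported regression coefficients, by inserting the discrete estimates into (5.12)-(5.13) with $\Delta = 1$:

$$\hat{a} = -\frac{\ln(0.993636)}{1} \approx 0.006384, \qquad \hat{b} = \frac{0.0005287}{1 - 0.993636} \approx 0.0831.$$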
The nonparametric density estimate is depicted in Figure 5.5(b). We note a low
density for values higher than 0.2, corresponding to the few observations in this
range. This also means that the diffusion estimator becomes increasingly
unreliable for spot rate values higher than 0.2. This is another issue that is
avoided by parametric methods: the estimated parametric functions are defined
for all values, although they are of course less accurate for spot rate values
outside the estimation sample.
Given the drift function and density estimate in Figure 5.5(c) and Figure 5.5(b)
we estimate the diffusion function according to the estimator (5.22). The result of
the estimation is given in Figure 5.6, and we note a particularly interesting shape
of the diffusion function. For lower spot rate values the diffusion function grows
almost linearly, similar to a CIR model diffusion (the graph shows $\hat{\sigma}^2(r)$, so
that a square root diffusion would be depicted as linear). For intermediate values,
0.05 to 0.15, the function grows at a higher rate, almost exponentially. Finally,
for higher values the diffusion function starts declining. This means that for high
values of the spot rate process the volatility of the process is smaller, thereby
allowing the mean reversion of the linear drift to dominate and pull the process
back towards the mean. We should also note that the higher the spot rate, the more
uncertain the diffusion estimate, since we have few observations among the
higher values.
The behavior of the diffusion for lower to medium values, i.e. 0.02 to 0.11, can
indicate that the CIR process might be a reasonable description of the spot rate
in periods when the rate is low. For these values the nonparametric estimate is
quite similar to the linear function implied by the square root process. However,
once the process exceeds this range the diffusion function becomes strongly
nonlinear, thus implying that the CIR process can no longer be used to describe
the data. Figure 5.7 shows the estimated diffusion function compared to an affine
function, thus illustrating that the diffusion function grows almost linearly for
the values of the spot rate in the lower part of the observed range.
Figure 5.5: Nonparametric density estimate of the invariant distribution and
estimated parametric drift. Panels: (a) the 7-day Eurodollar rate data, the same
data as was used in Chapter 4; (b) estimated nonparametric density of the 7 day
Eurodollar data, based on the Gaussian kernel estimator; (c) the estimated
parametric linear drift function.
Figure 5.6: Estimated semi-parametric diffusion estimator $\hat{\sigma}^2$.
Figure 5.7: Estimated semi-parametric diffusion estimator $\hat{\sigma}^2$ compared with a
linear function for lower values of the spot rate process.
5.4 Misspecification analysis
We now turn to the question of how well the semi-parametric model describes
the observed data. As in the previous chapters we rely heavily on the method
of uniform residuals. In particular it should be noted that it would make little
sense to compare the implied density from Theorem 2.9 to the empirical density, as
the method used to estimate the model uses this empirical density to fit the
diffusion function to the data.
We note again that the diffusion function is quite poorly estimated for high values
of the observations in the dataset. Since the number of observations decreases
rapidly once the spot rate exceeds 0.2, see Figure 5.5(b), the uniform residuals
computed for these values are less informative than those for which the diffusion
is better estimated. However, naturally we also have fewer uniform residuals for
values in this range so this problem is not severe.
The misspecification analysis is also important for the question of
whether or not the short rate drift is actually nonlinear. If we can obtain a
decent model fit using a linear drift and nonparametric diffusion this indicates
that the findings in Chapter 4 are incorrect. We concluded there that the models
with linear drift all exhibited clear signs of misspecification, and that these
problems with model fit carried on into the nonlinear parametric models, though
they were less pronounced. Especially in the case of density matching the
nonlinear drift models provided a far better fit than the linear models.
If, on the other hand, the model with linear drift and nonparametric diffusion
also fails to provide a satisfactory model description this reinforces the finding
of the previous chapters, namely that one-factor models with linear drift are
insufficient to model the short rate.
The results of the uniform residuals analysis are depicted in Figure 5.8; we note
a similar picture to that of the parametric models in the preceding chapters.
Figure 5.8: The uniform residuals $u_i$ based on the semi-parametric model.
Panels: (a) QQ-plot, uniform residuals against a true uniform distribution; (b)
autocorrelation function for the uniform residuals $u_i$.
The test statistic $U$ derived in Chapter 3 also indicates that this model is still
misspecified. We find 94 observations equal to one or zero, indicating that the
extremes of the data are captured as well as by the fully parametric model
considered earlier. The test statistic is found to be $U = 8668.41$, which is too low
to accept the model specification.
5.5 Conclusion on the semi-parametric estimation
From the finance point of view the diffusion function is the main area of interest
when pricing derivatives based on a financial asset with dynamics given by a
stochastic differential equation. We have here considered a model proposed to
estimate the diffusion function without parametric restrictions. This should
enable a more precise derivative security pricing method even though we still need
to impose parametric restrictions on the drift function. However, in light of the
discussion in Chapter 4 we suspect that a misspecified diffusion function is not
the only source of problems with regard to model fit.
The results of the semi-parametric estimation support the findings of the
parametric estimation. A one-factor model with linear drift does not capture all the
dynamics of the short rate. Using a nonparametric estimator of the diffusion
function shifts the focus to the drift function, and we conclude that the linear
mean reverting drift is a major source of the misspecification. However, we also
note that simply adding a nonlinear term to the drift function would complicate
the estimation method used in this chapter greatly. In fact, we relied on the
simple linear drift to estimate the parameters without imposing restrictions on
the diffusion function.
The method used to estimate the model also suffers from the assumption that all
observations are drawn from the stationary distribution of the process.
To sum up we find further evidence that simple linear drift functions are
insufficient in describing the data. Some volatility clustering in the uniform
residuals indicates that the usual one-factor models should be extended to include
a second source of randomness in the form of a stochastic volatility process.
To further the analysis of one-factor models a nonparametric method is needed
which does not impose parametric restrictions on either the drift or the diffusion
function. This, however, is outside the scope of the simple method used in
this chapter since according to (5.6) we need one to identify the other.
Chapter 6
Conclusion
Justifying the choice of any given stochastic differential equation in a financial
model is a difficult and often ignored subject. Many models rely heavily on
assumptions about the dynamics of underlying financial assets but those dynamics
are seldom based on empirical grounds. This thesis provides a step in the direction
of a more thorough empirical analysis of the subjects often taken for granted in
the financial literature.
We have presented an extensive examination of the theoretical background of
stochastic differential equations and presented important results regarding the
behavior of the instantaneous short rate modelled as a solution to any number of
given SDEs.
We provide methods to estimate and test, on empirical grounds, how well a given
continuous time model actually fits the data it supposedly describes.
An important new addition to the existing literature is the extensive analysis of
a series of nested spot rate models. These models include extensions of all the
classical interest rate models used in the financial literature. We have shown that
a reconsideration of those models is needed; in fact none of the one-factor models
considered are acceptable based on the empirical tests suggested in this thesis.
This is a vital conclusion with regard to financial modelling: if one truly wishes
to base the modelling assumptions on the underlying assets, a thorough empirical
analysis should be performed to justify the choices. As we have shown here, this
is often far from trivial and our empirical analysis shows that the results are far
from negligible.
With regard to the existing literature our results can be viewed both as
complementary and in opposition. Clearly we are in opposition to most financial spot
rate models as they are rejected based on our test analysis. However, with regard
to the empirical analysis, our work here complements the work of Chan et al. [1992]
where it is also shown that a rethinking of the classical spot rate models is
needed. We also find results very similar to those of Elerian et al. [2001]. In
fact, also by using the method of uniform residuals Elerian et al. [2001] display
QQ-plots with a structure identical to those presented in Chapter 4.
With regard to the work of Aït-Sahalia [1996b] our conclusions can be viewed as
twofold: We estimate models and derive a structure of our SDEs such that the
implied parametric invariant density closely matches the empirical density based
on kernel estimation. This is similar to the findings of Aït-Sahalia [1996b]. If
we were to base our analysis solely on invariant density matching we would thus
reach a conclusion similar to that of Aït-Sahalia: that the classical models should
still be rejected but the general nonlinear model provides an acceptable
description of the data.
However, we do not only consider the invariant density. Our misspecification
analysis relies on the information contained in the transition probabilities of the
data. Utilizing this information we conclude that the newly proposed model still
fails to capture all aspects of the structure of the observed data.
One of the issues we set out to explore further with this thesis was whether or
not an acceptable model with linear drift was possible. To explore this, we
considered a model with linear drift and nonparametric diffusion. If the majority
of the cause for rejecting the previous models came from a misspecified diffusion
function, we would expect to see improvement when we are free from making any
model assumptions on the diffusion function at all. Although the model is still
rejected, with a structure of the uniform residuals similar to what was found in
the parametric analysis, we note some important conclusions from this analysis: The
shape of the nonparametrically estimated diffusion differs greatly from any used in
the literature previously. The diffusion function appears to be an almost linear
function for lower and intermediate values; it increases almost exponentially for
higher values only to level out and decrease for even higher values. It is clear
that part of the reason behind this structure is the estimation method. We do not
have many observations in the highest range for which the diffusion function
appears to decrease, therefore this decrease should be viewed with caution. However,
for the middle part of the range, for which we have the majority of the
observations, the partly linear, partly nonlinear structure of the diffusion function
provides a significant break from the CEV type diffusion functions used previously.
We thus conclude from the empirical analysis in both Chapter 4 and Chapter 5
that it is vital that financial models consider empirical results when making model
assumptions about the dynamics of the underlying assets. The usual assumptions
prove dubious at best.
6.1 Discussion of the methods used
Regarding the methods used for parameter estimation we have shown, using
simulation studies, that for models simple enough to permit analytical expressions
for the conditional moments, the estimation procedures all work reasonably well.
The approximation to the likelihood function requires a small time period between
observations but the methods based on martingale estimating functions work well
without this assumption.
When we turn our attention to more complex models, we can no longer rely on
closed expressions for the needed moments and the only way to proceed is to use
simulation. This has some unfortunate consequences. Firstly this means that
we can have no hope of deriving analytical expressions for the estimators
themselves. Secondly any algorithm used to solve the estimation equations is slowed
considerably by the large number of simulations needed to credibly approximate
the needed conditional moments. This means that the more precise the estimates
we require, the longer the estimation procedure takes. The simulations thus imply
some imprecision in the estimated parameters compared to the situation where no
simulation is needed.
This reliance on simulations removes some of the appeal of the estimating
procedures examined in Chapter 3 but it is the only way to proceed for the models
considered in Chapter 4 if we want to maintain focus on estimation using martingale
estimating functions. The method in Chapter 5 does not suffer from the same
problems as the semi-parametric method does not rely on simulations.
6.2 Possible extensions of the work
We concluded that the models considered all failed to capture all the aspects of
the dynamics of the observed data.
A natural extension of the interest rate models would be to move from one-factor
models to include additional sources of randomness. In particular stochastic
volatility models may better describe the dynamic structure than the models in
this thesis.
Other possible improvements would be to loosen the restriction of time-invariant
drift and diffusion functions. Regime switching models may also better justify the
linear drift assumption for certain datasets. Finally the shape of the estimated
nonlinear drift function points towards the introduction of non-smooth
(non-differentiable) drift functions.
With regard to the estimation procedures implemented in this thesis, further work
clearly includes finding ways to minimize the dependence on simulations when
working with martingale estimating functions. The method based on eigenfunctions
provides a good step in that direction.
Although the basic theory of stochastic differential equations is well described in
the literature, we considered the implementation of general Markov chain theory
to derive conditions for stronger forms of ergodicity than what is guaranteed by
the well known conditions regarding the speed and scale measures. Even though
we only briefly introduced these methods they appear to represent a possibility for
useful future results.
Appendix A
Broyden's Method
Broyden's method for solving a system of nonlinear equations is based on Newton's
iterative procedure with the added advantage that we need not calculate or
invert the Jacobian matrix for each iteration.
Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be an $n$-dimensional nonlinear function, that is for a
vector $x \in \mathbb{R}^n$, $f(x)$ is an $n$-dimensional vector. All vectors in the following
are column vectors.
Newton's method for solving the system of nonlinear equations
$$f(x) = 0 \tag{A.1}$$
is defined by the sequence $x_0, x_1, \ldots$ where
$$x_{n+1} = x_n - J_n^{-1} f(x_n). \tag{A.2}$$
Here $J_n$ is the $n$ by $n$ Jacobian matrix of partial derivatives of $f$ with respect
to $x$ evaluated at $x_n$.
From a computational viewpoint, the two major disadvantages of Newton's method
are, firstly, computing the Jacobian matrix (thereby implicitly assuming that the
function f is differentiable) and, secondly, inverting it. Both of these operations
can be quite time-consuming. However, calculating the derivatives is by far the
main disadvantage of the method; matrix inversion proves less of a problem in
most applications. When the derivatives are approximated numerically, each
iteration requires, depending on the precision, at least n + 1 function evaluations:
one evaluation to calculate $f(x_n)$ and n function evaluations to calculate the
Jacobian matrix. The convergence properties of Newton's method are well described
in the computational literature, and the method is known to work well for smooth
functions when the initial vector $x_0$ is chosen reasonably close to the true root.
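The evaluation count can be made concrete with a small sketch. The Python code below is an illustration only and is not part of the programs used in the thesis: it assumes a forward-difference approximation of the Jacobian, and the helper names (`solve`, `newton_fd`) and the two-dimensional test system are hypothetical choices made for the example.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def newton_fd(f, x0, tol=1e-10, h=1e-7, max_iter=50):
    """Newton iteration (A.2) with a forward-difference Jacobian.

    Returns (x, iterations, function evaluations)."""
    x = list(x0)
    n = len(x)
    evals = 0
    for it in range(max_iter):
        fx = f(x)                      # one evaluation for f(x_n)
        evals += 1
        if norm(fx) < tol:
            return x, it, evals
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):             # n further evaluations, one per column of J_n
            xh = x[:]
            xh[j] += h
            fh = f(xh)
            evals += 1
            for i in range(n):
                J[i][j] = (fh[i] - fx[i]) / h
        step = solve(J, fx)            # solves J_n step = f(x_n) instead of inverting J_n
        x = [xi - si for xi, si in zip(x, step)]
    raise RuntimeError("no convergence within max_iter iterations")

def example(x):
    # Hypothetical test system with root x[0] = x[1] = sqrt(2).
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]

root, iterations, evaluations = newton_fd(example, [1.0, 2.0])
print(root, iterations, evaluations)
```

In this sketch every completed iteration costs n + 1 = 3 evaluations of f, plus one final evaluation for the convergence check.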
Broyden's method, see Broyden [1965], modifies Newton's method by approximating
$J_{n+1}$ from the information contained in $f(x_{n+1})$, $f(x_n)$ and $J_n$. The
method is based on a first-order Taylor series expansion of f, that is, the Jacobian
matrix satisfies the approximate formula
$$J_{n+1}(x_{n+1} - x_n) \approx f(x_{n+1}) - f(x_n). \tag{A.3}$$
Instead of using the Jacobian matrix, the Broyden algorithm uses a matrix $A_{n+1}$
satisfying the same formula with equality rather than approximately. However,
(A.3) provides only n equations to determine the $n^2$ elements of an n by n matrix
and therefore cannot determine the matrix uniquely. The best possible choice of a
matrix satisfying (A.3) is therefore obtained by updating the previously calculated
approximation to the Jacobian matrix, thus utilizing the information previously
obtained.
The Broyden updating method works in the following manner. Let
$y_k = f(x_{k+1}) - f(x_k)$ and $s_k = x_{k+1} - x_k$. Given an approximation $A_n$
to $J_n$, calculate $A_{n+1}$ by
$$A_{n+1} = A_n + (y_n - A_n s_n)\,\frac{s_n^T}{\|s_n\|^2} \tag{A.4}$$
where $\|s_n\|^2 = s_n^T s_n$.
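As a short check, which is added here for illustration and is not part of Broyden's presentation, one can verify directly that the update (A.4) satisfies (A.3) with equality, and that it leaves the action of the previous approximation unchanged on every direction orthogonal to the latest step:

```latex
\begin{align*}
A_{n+1} s_n &= A_n s_n + (y_n - A_n s_n)\,\frac{s_n^T s_n}{\|s_n\|^2}
             = A_n s_n + (y_n - A_n s_n) = y_n, \\
A_{n+1} z   &= A_n z + (y_n - A_n s_n)\,\frac{s_n^T z}{\|s_n\|^2} = A_n z
             \qquad \text{whenever } s_n^T z = 0 .
\end{align*}
```

The second line is the sense in which the update reuses the information previously obtained: only the action along the latest step is modified.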
Given (A.4), Broyden's method solves the system of equations (A.1) by iterating
$$x_{n+1} = x_n - A_n^{-1} f(x_n). \tag{A.5}$$
For this method to work two critical choices must be made. First, as in all numerical
algorithms, a starting point $x_0$ must be selected. Secondly, an initial matrix
$A_0$ must be chosen. As always with iterative procedures, the initial
values are crucial for the convergence of the process. In principle any nonsingular
initial matrix $A_0$ could be used, but the closer the initial matrix is to the true
Jacobian, the faster the convergence.
The update formula (A.4) calculates approximations to the true Jacobian matrix.
However, what is needed for the iterative procedure (A.5) is the inverse of the
Jacobian. Instead of working with the matrix $A_n$ we can derive an updating
procedure that works directly on the inverse, thereby eliminating the numerical
calculations needed to invert $A_n$ during each iteration.
This update method, also suggested by Broyden, utilizes what is sometimes
known as the Sherman-Morrison formula to derive the following update procedure
$$A_{n+1}^{-1} = A_n^{-1} + \frac{\left(s_n - A_n^{-1} y_n\right) s_n^T A_n^{-1}}{s_n^T A_n^{-1} y_n} \tag{A.6}$$
which clearly reduces the number of calculations needed in each iteration. In
each iteration the approximate Jacobian will be less accurate than if Newton's
method had been implemented. This implies that more iterations are needed to
reach a given level of convergence. However, the function f is evaluated only once
per iteration, compared with at least n + 1 times in Newton's method. This will
often give Broyden's method a computational advantage over Newton's method.
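The iteration (A.5) combined with the inverse update (A.6) can be sketched in a few lines of Python. The sketch is illustrative and is not the Ox program used in the thesis: the helper names, the identity matrix as the initial inverse approximation and the mildly nonlinear test system are assumptions made for the example.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

def matvec(H, v):
    """Matrix-vector product H v for a matrix stored as a list of rows."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in H]

def broyden(f, x0, tol=1e-10, max_iter=100):
    """Iterate (A.5), updating the inverse approximation H = A^{-1} by (A.6).

    Returns (x, iterations, function evaluations)."""
    n = len(x0)
    x = list(x0)
    # Assumed starting matrix: the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    fx = f(x)
    evals = 1
    for it in range(max_iter):
        if norm(fx) < tol:
            return x, it, evals
        s = [-hi for hi in matvec(H, fx)]              # step of (A.5)
        x_new = [xi + si for xi, si in zip(x, s)]
        f_new = f(x_new)                               # the only evaluation per iteration
        evals += 1
        y = [a - b for a, b in zip(f_new, fx)]
        Hy = matvec(H, y)
        sTH = [sum(s[i] * H[i][j] for i in range(n)) for j in range(n)]  # s^T H
        denom = sum(si * hyi for si, hyi in zip(s, Hy))                  # s^T H y
        if denom == 0.0:
            raise ZeroDivisionError("update (A.6) is undefined: s^T H y = 0")
        for i in range(n):
            ui = s[i] - Hy[i]
            for j in range(n):
                H[i][j] += ui * sTH[j] / denom
        x, fx = x_new, f_new
    raise RuntimeError("no convergence within max_iter iterations")

def example(x):
    # Hypothetical, mildly nonlinear test system (not taken from the thesis).
    return [x[0] + 0.2 * math.sin(x[1]) - 1.0,
            x[1] + 0.2 * math.cos(x[0]) - 2.0]

root, iterations, evaluations = broyden(example, [0.0, 0.0])
print(root, iterations, evaluations)
```

Only matrix-vector products appear inside the loop, so no matrix is inverted and f is evaluated once per iteration, at the price of a less accurate Jacobian approximation.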
In order to prevent divergence of the iterative procedure, Broyden [1965] suggests
that a simple modification be implemented, namely to modify the Broyden
iterations in the following way
$$x_{n+1} = x_n - t_n A_n^{-1} f(x_n) \tag{A.7}$$
where $t_n > 0$ is a scalar chosen in each iteration. Once the values of $x_n$,
$A_n^{-1}$ and $f(x_n)$ are calculated, $t_n$ should be chosen to ensure that the norm
of $f(x_{n+1})$ is decreased. That is, optimally we would like to choose $t_n$ from
$$t_n = \arg\min_{t_n} \|f(x_{n+1})\|, \quad \text{s.t. } x_{n+1} = x_n - t_n A_n^{-1} f(x_n); \tag{A.8}$$
however, this would greatly increase the number of numerical computations needed
during each iteration. The alternative strategy suggested by Broyden is simply
to reduce the norm instead of minimizing it, that is, to choose the value of $t_n$
such that $x_{n+1}$ satisfies
$$\|f(x_{n+1})\| < \|f(x_n)\|. \tag{A.9}$$
In particular we should search for values of $t_n$ close to unity that also satisfy
(A.9), thus hopefully avoiding divergence while still remaining close to the step
suggested by the Broyden iterations (A.5).
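A minimal sketch of the norm-reducing choice of $t_n$ is given below. The rule used to search for $t_n$ (start at unity and halve until the norm decreases) is an assumption made for the illustration and is not claimed to be the rule used by Broyden or in the thesis; the function name `damped_step` and the one-dimensional test function are likewise hypothetical.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

def damped_step(f, x, direction, shrink=0.5, max_tries=20):
    """Return (t, x_new) such that ||f(x_new)|| < ||f(x)||, trying t = 1 first.

    `direction` plays the role of -A_n^{-1} f(x_n) in (A.7)."""
    base = norm(f(x))
    t = 1.0
    for _ in range(max_tries):
        x_new = [xi + t * di for xi, di in zip(x, direction)]
        if norm(f(x_new)) < base:      # norm reduction condition (A.9)
            return t, x_new
        t *= shrink                    # assumed rule: halve t and try again
    raise RuntimeError("no norm-reducing step found")

def f(x):
    # Hypothetical one-dimensional example on which a full Newton step overshoots.
    return [math.atan(x[0])]

x0 = [1.5]
# Newton direction -f(x0)/f'(x0), using f'(x) = 1 / (1 + x^2).
newton_direction = [-(1.0 + x0[0] ** 2) * math.atan(x0[0])]
t, x1 = damped_step(f, x0, newton_direction)
print(t, x1)
```

In this example the full step t = 1 increases the norm and is rejected, while t = 0.5 is accepted, so the iterate stays on the path suggested by (A.5) but with a shorter step.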
The numerical difficulties in solving the various systems of nonlinear equations in
Chapter 4 are quite severe. Not only are the initial values crucially important for
the convergence of the procedure, but it also turns out to be necessary to prevent
too large steps. Since we need simulations during each iteration to calculate the
conditional mean and variance, it is problematic if the Broyden procedure suggests
parameter values for which the conditional moments are not well defined. If this
is the case, the following steps are based on numerically inaccurate function values.
[Figure A.1: plot of the object function (horizontal axis from 0 to 1000) with the iterates x_0, x_1, x_2, x_3 marked.]
Figure A.1: An example of an object function for which the Broyden procedure might diverge for
poorly selected initial values. Choosing the step length according to (A.9) would not prevent this
divergence. However, placing a suitably chosen upper limit on the step length would prevent
the divergence at the cost of additional iterations.
Both the problems of divergence and too large steps are illustrated in a simplified
one-dimensional graph in Figure A.1. Even though the initial value $x_0$ is chosen
quite close to the true root, a Newton/Broyden type algorithm will diverge. This
problem will not be resolved by merely choosing $t_n$ such that (A.9) is satisfied;
in fact, we note that condition (A.9) is indeed satisfied for all values $x_0, x_1, \ldots$
depicted in the figure. However, choosing the step length according to (A.8)
would, in this simple one-dimensional case, find the root after one iteration. We
note, though, that if (A.8) had been implemented in the one-dimensional case
there would have been no need to implement the Broyden algorithm in the first
place.
We note that the object functions (the estimating functions) in Chapter 4 do have
a shape similar to that depicted in Figure A.1. This stems from the fact that we
divide by the diffusion function; thus, for large values of the parameters entering
the diffusion function, the norm of the estimating function values will decrease
without approaching the root. To avoid a case similar to Figure A.1 we insert a
condition in the algorithm that, if the step length is too large, a smaller step in
the same direction should be taken. As mentioned in Chapter 4 this increases the
number of iterations needed to reach acceptable values. A second complication of
this approach is deciding what a "too large" step is, and by how much a
"too large" step should be reduced. No general rule can be derived for this, and in
the empirical implementations trial and error was used on a case-by-case basis.
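The effect of an upper limit on the step length can be illustrated with a one-dimensional Python sketch. The test function $f(x) = x e^{-x^2/2}$ is a hypothetical stand-in with the qualitative shape described above (its norm decays away from the root at zero); it is not one of the estimating functions of Chapter 4, and the limit 0.5 is an arbitrary choice for the example.

```python
import math

def f(x):
    # Hypothetical object function: root at 0, |f| decays for large |x|.
    return x * math.exp(-0.5 * x * x)

def df(x):
    return (1.0 - x * x) * math.exp(-0.5 * x * x)

def newton_1d(x0, max_step=None, tol=1e-12, max_iter=10):
    """Newton iterations; steps longer than max_step are shortened, keeping the direction."""
    x = x0
    path = [x]
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            break
        step = -f(x) / df(x)
        if max_step is not None and abs(step) > max_step:
            step = math.copysign(max_step, step)   # same direction, smaller step
        x += step
        path.append(x)
    return path

free = newton_1d(0.9)                  # no limit on the step length
capped = newton_1d(0.9, max_step=0.5)  # assumed upper limit of 0.5
print(free[-1], capped[-1])
```

Without the limit the first step jumps past the hump of the function, after which the iterates move away from the root even though the norm of the function value decreases in every step, so (A.9) never intervenes. With the limit the iterates remain near the root and converge to it, at the cost of an extra shortened step.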
Thus, in conclusion, Broyden's modification may often turn out to be slightly
less precise and to require a higher number of iterations than the original Newton
method. However, in return we obtain a method which can be implemented
even when calculating the Jacobian matrix numerically is difficult or
computationally costly, which is clearly the case in the empirical analysis in
Chapter 4.
Appendix B
Source Codes
A large number of Ox programs have been used in this thesis; various simulation
and estimation programs were used to test the quality of the estimators considered.
A complete listing of the source code of these programs is not included
in this appendix. Instead we include two of the main programs used, namely
the program performing the Broyden iterations and a program using this
algorithm for parameter estimation. The program used for the semi-parametric
estimation method in Chapter 5 is described in some detail within that
chapter and has therefore also been excluded from this appendix.
B.1 Broyden's Method
This Ox program performs the Broyden iterations to solve a system of n nonlinear
equations in n unknowns. This implementation, which differs from Broyden's original
method according to the changes described in Appendix A, is inspired by the
sample programs included as documentation in the Ox program package. The
program can be used for any system of equations with the same number of
parameters and equations and is thus not restricted to the case of estimating equations.
Broyden Code
#include <oxstd.h>

// Broy: solve fun(x) = 0 by Broyden's method, updating the approximation to
// the inverse Jacobian directly, cf. (A.6) in Appendix A.
// Optional arguments, in order: Tol, FTol, Itmax, maxnorm, damp, length, maxstep.
Broy(const fun, const x0, ...){

    decl args=va_arglist();
    decl n,nargin,ut,v,w,y,z,x,s,k,Tol,FTol,Itmax,A;
    decl maxnorm,damp,length,maxstep;
    nargin=sizeof(args);
    n=sizer(x0);

    // Set default parameters, if needed
    if(nargin < 1) Tol=1e-4; else Tol=args[0];          // tolerance on the step norm
    if(nargin < 2) FTol=ones(n,1)*1e-4; else FTol=args[1]*ones(n,1); // tolerance on function values
    if(nargin < 3) Itmax=100; else Itmax=args[2];       // maximum number of iterations
    if(nargin < 4) maxnorm=1; else maxnorm=args[3];     // longest step accepted without rescaling
    if(nargin < 5) damp=1; else damp=args[4];           // too long steps are rescaled to norm 1/damp
    if(nargin < 6) length=1; else length=args[5];       // decrement of the step factor
    if(nargin < 7) maxstep=1; else maxstep=args[6];     // initial step factor

    if(sizer(FTol) != n){
        println("Broy: FTol should be of same size as x");
        return 0;
    }
    println("");
    println("");
    println("Starting procedure!");
    println("Initial Values: ", x0);
    println("Tolerance levels: ",Tol, " , ",FTol[0]);
    println("Maximum number of iterations: ", Itmax);
    println("");
    println("");
    // Evaluate initial function value
    x=x0; v=fun(x);

    if(sizer(x) != sizer(v)){
        println("Broy: fun must return a column vector of the same size as x");
        return 0;
    }
    // Evaluate function values against tolerance levels
    if(fabs(v) < FTol){return x;}

    println("Initial function values: ", v);
    println("Norm of function values: ", norm(v));

    // Initialize inverse of Jacobian (to identity)
    A=unit(n,n);
    // Calculate first step, rescaled if it is too long
    s=-A*v;
    if(norm(s)>maxnorm){ s=s/(damp*norm(s));}
    x+=s; k=2;
    // w holds the previous function value, v the current one
    w=v; v=fun(x);
    if(norm(s) < Tol || fabs(v) < FTol){return x;}
    println("Initializing iteration procedure!");
    // Do Broyden iterations
    while(k++ <= Itmax){
        // Update the inverse Jacobian from the last step s, cf. (A.6)
        y=v-w;
        z=-A*y; ut=s'*A; A=(A-(s+z)*ut/(s'*z));
        // New Broyden direction, rescaled if it is too long
        s=-A*v;
        if(norm(s)>maxnorm){ s=s/(damp*norm(s));}
        decl step=maxstep;
        w=v; v=fun(x+step*s);
        // Norm reduction condition (A.9): shorten the step while the norm
        // does not decrease and the step factor remains positive
        while(norm(v)>=norm(w) && step-length>10e-3)
        {
            println(step);
            step=step-length;
            v=fun(x+step*s);
        }
        // Keep the step actually taken, so that s = x(k+1) - x(k) in the next update
        s=step*s; x+=s;
        println("------------------------------------");
        println("Iteration number:", " " , k);
        println("Parameters: ", x);
        println("Function values: ",v);
        println("Norm of function values: ", norm(v));
        println("------------------------------------");

        // Convergence criterion
        if(norm(s) < Tol || fabs(v) < FTol || norm(v)<FTol)
        {
            // No improvement
            if(norm(s)<Tol){println("No convergence within tolerance levels!");}

            // Convergence completed
            else
            {
                println("==========================================");
                println("Iteration procedure completed!");
                println("Parameters: ", x);
                println("Function values: ",v);
                println("Norm of function values: ", norm(v));
                println("==========================================");
            }
            return x;
        }

        // In case of numerical problems
        if(v==.NaN.*ones(n,1) || s==.NaN.*ones(n,1))
        {
            println("Function evaluation failed after ",k," iterations!");
            return x;
        }
    }

    println("Maximum number of iterations exceeded in broy");
    exit(1);

}
B.2 Estimation Program
The following Ox program utilizes the Broyden program above to estimate the
parameters in the parametric diffusion model of Chapter 4. For the simulation
of the conditional moments, the parameter NS in line 14 indicates the number of
simulated sample paths, that is N in (4.17)-(4.18), and accordingly M in line 21
corresponds to m. That is, we divide the time between two consecutive observations
into M subintervals for the purpose of simulating the continuous process
between these two points. According to the Law of Large Numbers, the simulated
moments will converge in probability to the true moments only as $NS \to \infty$
and $M \to \infty$. That is, the larger these values are chosen the better the estimates;
however, some care must be taken when assigning numerical values to them.
For small values the simulated conditional moments have a
large variance, which means that the algorithm may not function properly and
may not converge. Clearly, for large values of NS and M each function evaluation
takes a considerable amount of time, and the values must be chosen based
on the number of parameters to be estimated and, more significantly, the number
of observations in the dataset.
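The role of the two simulation parameters can be sketched in Python as follows. The sketch is an illustration and not a translation of the Ox program: it assumes a Milstein discretization, and it uses a square-root (CIR-type) model with hypothetical parameter values chosen only because the conditional mean of that model is known in closed form. The arguments `n_paths` and `n_steps` play the roles of NS and M.

```python
import math
import random

def simulate_moments(x0, T, drift, diffusion, d_diffusion, n_paths, n_steps, seed=0):
    """Monte Carlo estimate of the conditional mean and variance of X_T given X_0 = x0,
    using the Milstein scheme with n_steps subintervals on each of n_paths paths."""
    rng = random.Random(seed)
    delta = T / n_steps
    sqrt_delta = math.sqrt(delta)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(n_steps):
            dw = sqrt_delta * rng.gauss(0.0, 1.0)
            sig = diffusion(x)
            x += (drift(x) * delta + sig * dw
                  + 0.5 * sig * d_diffusion(x) * (dw * dw - delta))
        finals.append(x)
    mean = sum(finals) / n_paths
    var = sum((v - mean) ** 2 for v in finals) / (n_paths - 1)
    return mean, var

# Hypothetical parameter values, not estimates from the thesis.
kappa, theta, sigma = 0.5, 0.06, 0.1
x0, T = 0.10, 1.0

def drift(x):
    return kappa * (theta - x)

def diffusion(x):
    return sigma * math.sqrt(max(x, 1e-12))        # floor guards against negative values

def d_diffusion(x):
    return 0.5 * sigma / math.sqrt(max(x, 1e-12))  # derivative of the diffusion function

mc_mean, mc_var = simulate_moments(x0, T, drift, diffusion, d_diffusion,
                                   n_paths=2000, n_steps=50)
exact_mean = theta + (x0 - theta) * math.exp(-kappa * T)
print(mc_mean, exact_mean, mc_var)
```

Increasing `n_paths` reduces the Monte Carlo variance of the simulated moments, while increasing `n_steps` reduces the discretization error; both increase the cost of every function evaluation in the Broyden iterations.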
Estimation Code
1 #include <oxstd.h>
2 #include <oxdraw.h>
3 #import "Broy"
4 /////////////////////////////////////////////////////////////////////
5 //Global data
6 decl alpha0,alpha1,alpha2,alpha3;
7 decl beta0,beta1,beta2,beta3;
8 decl condmoments=<0;0>;
9 decl data;
10 decl nrow;
11 decl datamean,datavar;
12 //////////////////////////////////////////////////////////////////////
13 condmean(alpha0,alpha1,alpha2,alpha3,beta0,beta1,beta2,beta3,x0,T,number){
14 decl NS=1000; //Critical value,
15 //number of simulated sample paths,
16 //higher for higher precision.
17 decl simresults=zeros(NS,1);
18 decl i=0;
19 while(i<NS)
20 {
21 decl M=1000;//Critical value,
22 //number of steps per Delta,
23 //higher for higher precision.
24
25 decl eps=rann(M,1),dW=zeros(M,1);
26 decl Y=zeros(M+1,1); //M steps of length delta need M+1 grid points
27 decl delta=T/M;
28 dW=sqrt(delta)*eps;
29 Y[0][0]=x0;
30 decl j=0;
31 decl step=0;
32 while(j<M)
33 {
34 step=(alpha0+alpha1*Y[j][0]+alpha2*Y[j][0]^2+alpha3/Y[j][0])*delta+
35 sqrt(fabs(beta0+beta1*Y[j][0]+beta2*Y[j][0]^beta3))*dW[j][0]+
36 0.5*(beta1*0.5+(beta2*(Y[j][0]^beta3)*beta3)
37 /(2*Y[j][0]))*((dW[j][0])^2-delta);
38
39 Y[j+1][0]=Y[j][0]+step;
40
41 if(Y[j+1][0]==.NaN || Y[j+1][0]<=0)
42 {
43 Y[j+1][0]=x0;
44 }
45 j++;
46 }
47
48 simresults[i][0]=Y[M][0];
49 i++;
50
51 //Draw one simulated sample-path
52 if(i==NS && number==(nrow-2))
53 {
54 decl xx=<10:150>/450;
55 decl tt=<1:150>/400;
56
57 decl yy=alpha0+alpha1.*xx+alpha2.*xx.^2+alpha3./xx;
58 decl zz=beta0+beta1.*tt+beta2.*tt.^beta3;
59
60 DrawXMatrix(0,yy,"Drift",xx,1,0,1);
61 DrawXMatrix(1,zz,"Diffusion",tt,1,0,1);
62 Draw(2,Y[][0]);
63 DrawLine(2, 1, data[number][0], M ,data[number][0],1);
64 DrawLine(2, 1, data[number+1][0], M ,data[number+1][0],1);
65 ShowDrawWindow();
66 }
67 }
68
69
70 condmoments[0]=meanc(simresults[][0]);
71 //In case of numerical problems
72 if(varc(simresults[][0])==.NaN){println("Warning!");condmoments[1]=999;}
73 else {condmoments[1]=varc(simresults[][0]);}
74 //Return simulated moments
75 return(condmoments);
76 }
77 //////////////////////////////////////////////////////////////////////
78 estfunction(x)
79 {
80 decl i;
81 decl s1=0,s2=0,s3=0,s4=0,s5=0,s6=0,s7=0,s8=0;
82 decl delta=1;
83
84 alpha0=x[0];
85 alpha1=x[1];
86 alpha2=x[2];
87 alpha3=x[3];
88 beta0=x[4];
89 beta1=x[5];
90 beta2=x[6];
91 beta3=x[7];
92
93 decl meanvar=zeros(nrow,2);
94 decl f=0;
95 for(f=0;f<nrow;f++)
96 {
97 meanvar[f][]=condmean(alpha0,alpha1,alpha2,alpha3,
98 beta0,beta1,beta2,beta3,data[f][0],delta,f);
99 }
100 for(i=1;i<nrow;++i)
101 {
102 decl ssq=beta0+beta1*data[i-1][0]+beta2*data[i-1][0]^beta3;
103 decl meandif=data[i][0]-meanvar[i-1][0];
104 decl vardif=((data[i][0]-meanvar[i-1][0])^2-meanvar[i-1][1]);
105
106 s1=s1+(1/ssq)*meandif;
107 s2=s2+(data[i-1][0]/ssq)*meandif;
108 s3=s3+((data[i-1][0]^2)/ssq)*meandif;
109 s4=s4+((data[i-1][0]^(-1))/ssq)*meandif;
110 s5=s5+(1/(2*ssq^2*delta))*vardif;
111 s6=s6+(data[i-1][0]/(2*ssq^2*delta))*vardif;
112 s7=s7+((data[i-1][0]^beta3)/(2*ssq^2*delta))*vardif;
113 s8=s8+((beta2*((data[i-1][0])^beta3)*
114 log(data[i-1][0]))/(2*ssq^2*delta))*vardif;
115 }
116
117 return((s1|s2|s3|s4|s5|s6|s7|s8));
118 }
119
120
121 main()
122 {
123 data=loadmat("C...ait-sahalia.in7");
124
125 nrow=rows(data);
126 Draw(0,data[][0]);
127 ShowDrawWindow();
128
129 //Initial Values
130 decl x=<-4.41e-3,4.333e-2,-1.143e-1,1.304e-4,1.108e-4,-1.883e-3,9.681e-3,2.073>';
131 datamean=meanc(data[][0]);
132 datavar=varc(data[][0]);
133
134 decl y;
135 y=Broy(estfunction,x,0.1,1,999,1,1,1,1);
136 println("Roots are ","%7.3f",y);
137
138 }
Bibliography
T.G. Andersen and J. Lund. Estimating continuous-time stochastic volatility models of the short-term interest rate. Journal of Econometrics, 77(2):343–377, 1997.
L. Arnold. Stochastic Differential Equations: Theory and Applications. Springer-Verlag, 1972.
Y. Aït-Sahalia. Nonparametric pricing of interest rate derivative securities. Econometrica, 64(3):527–560, 1996a.
Y. Aït-Sahalia. Testing continuous-time models of the spot interest rate. The Review of Financial Studies, 9(2):385–426, 1996b.
B.M. Bibby and M. Sørensen. Martingale estimating functions for discretely observed diffusion processes. Bernoulli, 1:17–39, 1995.
T. Björk. Arbitrage Theory in Continuous Time. Oxford University Press, 1998.
C.G. Broyden. A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation, 19(92):577–593, 1965.
J.Y. Campbell, A.W. Lo, and A.C. MacKinlay. The Econometrics of Financial Markets. Princeton University Press, 1997.
K.C. Chan, G.A. Karolyi, F.A. Longstaff, and A.B. Sanders. An empirical comparison of alternative models of the short-term interest rate. The Journal of Finance, 47:1209–1227, 1992.
D.A. Chapman, J.B. Long Jr., and N.D. Pearson. Using proxies for the short rate: When are three months like an instant? The Review of Financial Studies, 12(4):763–806, 1999.
D.A. Chapman and N.D. Pearson. Is the short rate drift actually nonlinear? The Journal of Finance, 55:355–388, 2000.
J.H. Cochrane. Asset Pricing. Princeton University Press, 2001.
J.C. Cox. Notes on option pricing I: Constant elasticity of variance diffusions. Working Paper, Stanford University, 1975.
J.C. Cox, J.E. Ingersoll, and S.A. Ross. An analysis of variable rate loan contracts. The Journal of Finance, 35:389–403, 1980.
J.C. Cox, J.E. Ingersoll, and S.A. Ross. A theory of the term structure of interest rates. Econometrica, 53(2):385–408, 1985.
J.A. Doornik. Object-Oriented Matrix Programming Using Ox. London: Timberlake Consultants Press, 2002.
M. Dothan. On the term structure of interest rates. The Journal of Financial Economics, 6:59–69, 1978.
D. Down, S.P. Meyn, and R.L. Tweedie. Exponential and uniform ergodicity of Markov processes. The Annals of Probability, 23(4):1671–1691, 1995.
D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, 1992.
O. Elerian, S. Chib, and N. Shephard. Likelihood inference for discretely observed nonlinear diffusions. Econometrica, 69(4):959–993, 2001.
Gihman and Skorohod. Stochastic Differential Equations. Springer-Verlag, 1972.
V.P. Godambe and C.C. Heyde. Quasi-likelihood and optimal estimation. International Statistical Review, 55:231–244, 1987.
E. Hansen. Sandsynlighedsregning på et målteoretisk grundlag. Afdeling for teoretisk statistik, Københavns Universitet, 2001.
C.C. Heyde and R. Morton. Multiple roots in general estimating equations. Biometrika, 85(4):954–959, 1998.
J.C. Hull. Options, Futures, and Other Derivatives. Prentice Hall, 1997.
M. Kessler. Estimation of an ergodic diffusion from discrete observations. Preprint, Laboratoire de Probabilités, Université Paris VI, 1995.
M. Kessler and M. Sørensen. Estimating equations based on eigenfunctions for a discretely observed diffusion process. Research Report No. 322, Department of Theoretical Statistics, University of Århus, 1995.
B. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer-Verlag, 1989.
K.S. Larsen and M. Sørensen. Diffusion models for exchange rates in a target zone. Preprint No. 6, Department of Applied Mathematics and Statistics, pages 1–24, 2003.
M. Pritsker. Nonparametric density estimation and tests of continuous time interest rate models. The Review of Financial Studies, 11:449–487, 1998.
R. Rendleman and B. Bartter. The pricing of options on debt securities. The Journal of Financial and Quantitative Analysis, 15:11–24, 1980.
R. Seydel. Tools for Computational Finance. Springer, 2002.
M.R. Spiegel and J. Liu. Schaum's Mathematical Handbook of Formulas and Tables, second edition. McGraw-Hill, 1999.
M. Sørensen. Estimating functions for discretely observed diffusions: A review. In I.V. Basawa, V.P. Godambe, and R.L. Taylor (eds.): Selected Proceedings of the Symposium on Estimating Functions. IMS Lecture Notes, 32:305–325, 1997.
M. Sørensen. On asymptotics of estimating functions. Brazilian Journal of Probability and Statistics, 13:111–136, 1999.
R. Stanton. A nonparametric model of term structure dynamics and the market price of interest rate risk. The Journal of Finance, 52:1973–2003, 1997.
R. Tweedie and D.B. Pollard. R-theory for Markov chains on a topological space II. Z. Wahrscheinlichkeitstheorie verw. Geb., 34(4):269–278, 1976.
O. Vasicek. An equilibrium characterization of the term structure. The Journal of Financial Economics, 5:177–188, 1977.
Yamada and Watanabe. On the uniqueness of solutions of stochastic differential equations. Journal of Mathematics of Kyoto University, 1971.
