Márcio Poletti Laurini

CAMPINAS
2009
To Louis Bachelier.
Acknowledgments

I thank, above all, my advisor Luiz Koodi Hotta for the fundamental support given throughout the doctoral program, and especially for his example as an academic, researcher, and human being. This companionship is a gift I will treasure for life. I also thank the other professors of the Graduate Program in Statistics at Imecc for their dedication and for everything I was able to learn during this period.

I thank the members of the examining committee, professors Benjamin Tabak and Pedro Morettin, for the comments and suggestions given during the thesis qualification, and also for the countless articles and books that have been fundamental so far. I thank professor Flávio Ziegellman for the comments at the thesis defense and for the many comments at conferences. I thank professor Caio Ibsen de Almeida for the inspiration for several of the articles that make up this thesis, and professor Mauricio Zevallos for all the companionship during the doctoral program and for all the comments given at the seminars I presented during this period.

I thank all my classmates for their support, friendship, and all the discussions about statistics, and my colleagues at Ibmec for their support and friendship.

I thank all my students, for allowing me to work on my research topics.

I thank Lucinéia, for my happiness and for all her support and patience.

And my parents, for everything.
“I’m not interested in doing research and I never have been. I’m interested in understanding, which is quite a different thing.”
David Blackwell

“Rather than love, than money, than faith, than fame, than fairness... give me truth.”
Christopher McCandless
Resumo

The thesis comprises seven articles on Econometrics applied to problems in Finance. Two articles address the estimation of latent factor models for fitting and forecasting the Term Structure of Interest Rates, using Bayesian estimation methods based on Markov Chain Monte Carlo; the first introduces a structure of time-varying parameters, and the second a generalization to Term Structures of Interest Rates in multiple markets with the imposition of no-arbitrage conditions.

Two articles discuss the application of Empirical Likelihood and Generalized Minimum Contrast to the estimation of stochastic differential equations and stochastic volatility models.

The next article addresses the estimation of stochastic differential equations with an innovation structure driven by a Fractional Brownian Motion, through an Indirect Inference methodology.

The following article discusses the use of nonparametric methods for the interpolation of yield curves with the imposition of no-arbitrage conditions, using smoothing splines subject to shape restrictions.

The modeling of microstructures in the foreign exchange market is addressed in the next article of the thesis, through the use of parametric and semi-parametric methodologies and tests for the presence of asymmetric information.
Abstract
The thesis consists of seven articles on Financial Econometrics. Two articles focus on the
estimation of latent factor models to fit and forecast the term structure of interest rates using
Bayesian estimation methods through Markov Chain Monte Carlo, with the first article intro-
ducing a structure of time-varying parameters and the second article a generalization for the
term structure of interest rates in multiple markets with the imposition of no-arbitrage condi-
tions.
Two articles discuss the use of Empirical Likelihood and Generalized Minimum Contrast in the
estimation of stochastic differential equations and stochastic volatility models.
The next article discusses the estimation of stochastic differential equations with a structure of
innovations driven by a Fractional Brownian motion, through a method of Indirect Inference.
The following article discusses the use of nonparametric methods for interpolation of yield
curves with the imposition of no-arbitrage conditions, using smoothing splines with shape re-
strictions.
The modeling of microstructures in the exchange rate market is discussed in the next article of the
thesis, through the use of semi-parametric and non-parametric methodologies and tests for the
presence of asymmetric information.
General Introduction

This thesis consists of a collection of articles on applications of statistical methods to problems related to the modeling of data from financial markets.

The second article, “Generalized Latent Factor Models For Yield Curves In Multiple Markets”, co-authored with Luiz Koodi Hotta, also addresses the estimation of latent factor models for the Term Structure of Interest Rates. In this article we propose a general latent factor structure for the joint modeling of multiple yield curves. Starting from the Bayesian estimation methodology using Markov Chain Monte Carlo discussed in the previous article, we propose a general structure that generalizes several existing models in the term structure literature.
This generalization allows the use of functional forms more general than those employed in the literature, with time-varying decay parameters and volatilities, as well as the direct incorporation of possible interactions between movements in the yield curves across markets. The article also presents a way to incorporate No-Arbitrage restrictions in the modeling of multiple interest rate markets. It also discusses identification problems and the use of Bayesian Shrinkage methods to reduce the large number of parameters involved in the estimation of models for multiple markets, and the proposed inference methodology yields the exact distributions of parameters, latent factors, and model forecasts. The proposed methodologies are applied to the joint modeling of the Cupom Cambial curve and the Eurodollar curve, and the article contains a detailed discussion of specification, model comparison, and forecasting, as well as a discussion of the validity of no-arbitrage conditions in these markets.
The next two articles, co-authored with Luiz Koodi Hotta, concern applications of semi-parametric methods based on Empirical Likelihood and Generalized Minimum Contrast to problems in finance. The first article, “Generalized Empirical Likelihood/Minimum Contrast Estimation of Stochastic Differential Equations”, deals with the estimation of stochastic differential equations, and the second, “Estimation of Stochastic Volatility Models Using Methods of Generalized Empirical Likelihood/Minimum Contrast”, with the estimation of stochastic volatility models. The common point between these two articles is the difficulty of evaluating the likelihood function in these two problems.
carried out in this study. We also discuss how the proposed methods deal with the misspecification problems introduced by the use of discretizations. The article also contains an empirical application of the proposed methodologies to a series of short-term interest rates with one-month maturity (T-Bills).
of yield curve interpolation and smoothing with the imposition of necessary no-arbitrage conditions. In this article we show, through simulation studies and empirical applications to the market of DIxPRÉ Swap instruments traded at the BM&F and to STRIPS (Separate Trading of Registered Interest and Principal of Securities) of US Treasury bonds, that the proposed methodology has advantages over some usual forms of yield curve interpolation.
The articles are formatted according to the standards of the journals to which they were submitted or in which they were published, and are therefore written in English following the standards of these journals. The article “Constrained Smoothing B-Splines For The Term Structure Of Interest Rates” has been accepted for publication in Insurance: Mathematics and Economics, and the article “Empirical market microstructure: An analysis of the R$/US$ exchange rate market” was published in Emerging Markets Review, v. 9, p. 247-265, 2008.
BAYESIAN EXTENSIONS TO DIEBOLD-LI TERM STRUCTURE MODEL
Abstract. This paper proposes a statistical model to adjust, interpolate, and forecast the
term structure of interest rates. This model is based on the extensions for the term structure
model of interest rates proposed by Diebold and Li (2006), through a Bayesian estimation using
Markov Chain Monte Carlo (MCMC). The proposed extensions involve the use of a more flexible
parametric form for the yield curve, allowing all the parameters to vary in time using a structure
of latent factors, and the addition of a stochastic volatility structure to control the presence of conditional heteroskedasticity.
The Bayesian estimation yields the exact distribution of the estimators in finite samples,
and as a by-product, the estimation enables obtaining the distribution of forecasts of the term
structure of interest rates. Unlike some econometric models of term structure, the methodology
developed does not require a pre-interpolation of the yield curve. The model is fitted to the
daily data of the term structure of interest rates implicit in SWAP DI-PRÉ contracts traded in
the Mercantile and Futures Exchange (BM&F) in Brazil. The results are compared with the
1. Introduction
The term structure of interest rates may be defined as a collection of interest rates, indexed
in two dimensions: maturity and time. The first index shows the relation between the rates with
different maturities for contracts of the same nature in a determined period. The second index
shows the time evolution of the rates of contracts with the same maturities. The term structure of
interest rates shows the dynamics of the yield curve, linking a functional structure of observations
in cross-section (evolution of the rates over maturity) and the evolution of the yield curve over
time. As such, the term structure may be represented by a multivariate stochastic process. There
is a wealth of literature regarding the models of the term structure. To simplify, we can classify
this literature into three classes of models.
The first class encompasses the equilibrium models, such as Brennan and Schwartz (1979), Cox
et al. (1985) and Duffie and Kan (1996). The second classification is based on the arbitrage-free
models, of which Heath et al. (1992) is the representative framework.
The third class comprises statistical models without a structural interpretation, that is, models that synthesize data patterns and allow for forecasting of the curve without necessarily representing theoretical models that hold under equilibrium and arbitrage-free conditions. Examples of this class include the methodology of principal components (Litterman and Scheinkman (1991)), curve interpolation models such as splines (McCulloch (1971)), smoothing splines (Shea (1984)), kernel regression (Linton et al. (2001)), and parametric models for curve fitting such as Nelson and Siegel (1987) and Svensson (1994).
The dynamic extension of the Nelson-Siegel model, presented in Diebold and Li (2006) and the basis of the procedure studied in this article, is an example of a statistical model that can successfully forecast the term structure of interest rates.
Despite their basis in theoretical models for interest rates, structural models based on equilibrium conditions have low forecasting power for the term structure. Calibration models based on no-arbitrage do not permit direct forecasting of the yield curve. Statistical models are generally used for fitting and forecasting the term structure of interest rates because of their superior fit relative to equilibrium-based econometric models and their greater simplicity.
2. Diebold-Li Model
Among the statistical models for interest rate, the influential model designed by Diebold-Li
(Diebold and Li (2006)) is widely used in market applications. This model is a dynamic extension
of the Nelson-Siegel model (Nelson and Siegel (1987)) for the cross-section fit for the yield curve.
The Nelson-Siegel model corresponds to fitting the following equation for the yield curve observed
in the market on a specific date:
(2.1)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_t}}{m_{it}/\tau_t} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_t}}{m_{it}/\tau_t} - e^{-m_{it}/\tau_t}\right) + \epsilon_{it}$
where yit (mit ) is the rate observed on date t for maturity mit , β1t , β2t , β3t are time-varying parameters, and τt is a decay parameter.
The Nelson-Siegel model is a parsimonious way of fitting the yield curve while managing to
capture a part of the stylized facts in interest rate process, such as the exponential formats present
in the yield curves.
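Eq. 2.1 can be sketched as a short function; the parameter values in the example below are purely illustrative, not estimates from this paper.

```python
import numpy as np

def nelson_siegel(m, beta1, beta2, beta3, tau):
    """Nelson-Siegel yield for maturity m (Eq. 2.1, without the error term)."""
    x = m / tau
    slope = (1.0 - np.exp(-x)) / x      # loading of the short-term component
    curvature = slope - np.exp(-x)      # loading of the medium-term component
    return beta1 + beta2 * slope + beta3 * curvature

# As maturity grows, both loadings vanish and the yield approaches the
# long-term level beta1; at very short maturities it approaches beta1 + beta2.
maturities = np.array([30.0, 360.0, 3600.0])
y = nelson_siegel(maturities, beta1=0.16, beta2=-0.02, beta3=0.01, tau=120.0)
```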
The parameters βit have economic interpretations: β1t represents the long-term level; β2t the
short-term component; and β3t the medium-term component. They may also be interpreted as
the level, slope, and curvature decompositions of the yield curve, respectively, according to the
terminology developed by Litterman and Scheinkman (1991). An
extension of this model is to use the formulation proposed by Svensson (1994) to fit the interest
cross-sections. This formulation considers the inclusion of an additional term to the formulation
proposed by Nelson and Siegel (1987), thus corresponding to:
(2.2)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} - e^{-m_{it}/\tau_{1t}}\right) + \beta_{4t}\left(\frac{1 - e^{-m_{it}/\tau_{2t}}}{m_{it}/\tau_{2t}} - e^{-m_{it}/\tau_{2t}}\right) + \epsilon_{it}$
allowing a more flexible fit for the yield curve and enabling the capture of multiple changes in the
yield-curve slope. The purpose of these models is to allow fitting and subsequent interpolation
and extrapolation of the yield curve based on a parametric structure, competing with
nonparametric fitting methods such as smoothing splines. Besides the parsimonious estimation, the
Nelson and Siegel (1987) model has two additional advantages over the nonparametric models.
The first advantage is that the extrapolation of the curve has a better performance because of
the exponential nature of this model. The second advantage is that this formulation avoids the
problems in the construction of the forward curve, related to the absence of convexity adjustments,
which occur in non-parametric methods.
The extension formulated by Diebold and Li (2006) makes the Nelson and Siegel (1987) model
dynamic (fitting the yield curve across the several observed days) by means of a three-stage
procedure:
(1) The Nelson-Siegel model (with τ fixed, thus, making the model linear in the parameters)
is fitted by ordinary least squares for each date, estimating the parameters β1t , β2t , β3t .
(2) The dynamics of the system is modeled by a vector autoregressive (VAR) model for the
parameters β1t , β2t and β3t estimated at the first stage.
(3) Forecasts for these parameters are made through the VAR model estimated for vectors
β1t , β2t and β3t . By substituting the forecasted parameters in the Nelson-Siegel model given
by Eq. 2.1, it is possible to forecast future interest rate curves.
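The three stages can be sketched as follows. This is a minimal illustration on a (T, n) panel of yields observed at common maturities, with an arbitrary fixed τ and a plain least-squares VAR(1); it is a sketch of the procedure, not the authors' implementation.

```python
import numpy as np

def ns_loadings(m, tau):
    """Design matrix of Nelson-Siegel loadings for fixed tau (stage 1 is then OLS)."""
    x = m / tau
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(m), slope, slope - np.exp(-x)])

def diebold_li(yields, maturities, tau=120.0, h=1):
    """Two-step Diebold-Li sketch: per-date OLS betas, then a VAR(1) forecast.
    `yields` is a (T, n) array of rates observed at the same n maturities."""
    X = ns_loadings(maturities, tau)
    # Stage 1: cross-section OLS on each date.
    betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y in yields])
    # Stage 2: VAR(1) with intercept, fitted by least squares.
    Z = np.column_stack([np.ones(len(betas) - 1), betas[:-1]])
    coef = np.linalg.lstsq(Z, betas[1:], rcond=None)[0]   # shape (4, 3)
    # Stage 3: iterate the VAR h steps ahead and map back to yields.
    b = betas[-1]
    for _ in range(h):
        b = coef[0] + b @ coef[1:]
    return X @ b   # forecast of the yield curve
```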
According to Diebold and Li (2006), this dynamic formulation has the purpose of capturing the
set of the existing stylized facts in the term structure of interest rates, such as the fact that while
the yield curve is typically increasing and concave, it may also assume inverted shapes, such as
decreasing curves and slope changes. Other stylized facts captured by the Diebold and Li (2006) model are the high
persistence in the time dynamics (rates with same maturity are highly dependent on the past)
and the fact that persistence in the long-term rates is higher than that in the short-term rates.
Though the Diebold-Li model is simple to implement and has a superior predictive potential when
compared with other related models in the literature, some problems still arise when it is used.
The three main limitations to this model are as follows:
(1) To consider τ as fixed (linearization imposed in the model) may be troublesome for the
more unstable yield curves, such as those of the emerging countries.
(2) The functional form adapted from the Nelson and Siegel (1987) model does not allow for
capturing more complicated yield curves, such as when there are multiple changes in the
slope and curvature.
(3) No econometric properties of the estimation method have been presented. Note that it
is a two-step estimation, where the VAR is estimated on the basis of an estimated vector
of beta parameters. The main problem is the construction of the confidence intervals in
the finite samples for the forecasts obtained from this model. These intervals should take
into account the uncertainty in the estimation of the vectors of the hyperparameters β1t ,
β2t and β3t .
There are some proposed solutions to these problems. Problem 1 may be addressed by estimating the full Nelson and Siegel (1987) model without fixing the parameter τ , generally using
nonlinear least squares. Yet, considering the limited number of observations in the yield curve,
the problem of minimizing the nonlinear least squares may be complicated, presenting more than
one local minimum, a possibility that may lead to an inappropriate fit of the yield curve. This is
one of the justifications for keeping the parameter τ fixed, avoiding the numerical optimization
problems involved in the estimation of nonlinear models with a restricted number of observations.
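A common practical middle ground, sketched here under the assumption that τ is restricted to a coarse grid, is to profile τ out: for each candidate τ the model is linear in the betas, so OLS delivers the exact conditional minimizer, and the grid search sidesteps local minima in a joint nonlinear search.

```python
import numpy as np

def fit_ns_profile(m, y, tau_grid):
    """Profile out tau: for each candidate tau the Nelson-Siegel model is
    linear in the betas, so OLS gives the exact conditional minimizer."""
    best = (np.inf, None, None)
    for tau in tau_grid:
        x = m / tau
        slope = (1.0 - np.exp(-x)) / x
        X = np.column_stack([np.ones_like(m), slope, slope - np.exp(-x)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ beta) ** 2))
        if sse < best[0]:
            best = (sse, tau, beta)
    return best  # (sse, tau, beta)
```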
The simultaneous estimation of betas may be performed through the state-space formulation
using the Kalman filter, but τ is kept fixed in the sample because of the need for linearity in the use
of the linear Kalman filter. Some statistical properties of a model obtained from the Diebold and
Li (2006) formulation were derived in Huse (2007), in which a form similar to the Nelson-Siegel
model is used with the incorporation of spatial dependence and macroeconomic variables. The
estimation is performed in two steps, but certain properties of the estimation method in the finite
samples are studied using the Monte Carlo simulation. There are several works generalizing the
Nelson-Siegel model in which the no-arbitrage condition is imposed (Christensen et al. (2008)),
but they will not be considered here.
3. Proposed Extensions
To overcome these problems, we have proposed an extended version of the Diebold and Li
(2006) model using Bayesian methods. Bayesian methods based on Markov Chain Monte
Carlo (MCMC) are proposed as alternatives to maximum likelihood estimation in cases
where maximum likelihood methods are complicated or unfeasible to apply. Examples of
estimation procedures using MCMC include: estimation of continuous-time diffusion processes for
term structure of interest rates, option pricing, stochastic volatility, and regime switching models,
as summarized in Johannes and Polson (2007).
The advantages of the Bayesian formulation are that it enables us to treat both the parame-
ters and state vectors as latent variables. This is carried out through the dynamic linear model
formulation for the time evolution of those parameters. In the Bayesian formulation, it is not
necessary to assume linearity, and hence, it is not necessary to fix the parameter τ , as it is done in
the Diebold-Li method. It must be noted that the construction of the posterior distribution of the
parameters is performed by simulation; hence, the various local minima that affect the estimation
based on nonlinear least squares of Eqs 2.1 and 2.2 do not constitute a problem.
The first Bayesian formulation of the Diebold and Li (2006) model, proposed by Migon
and Abanto-Valle (2007), corresponds to a specification analogous to the original model, using
the Nelson-Siegel Eq. 2.1, with the parameter τ kept fixed in time but estimated simultaneously
with the other parameters of the model.
We have proposed some extensions to the Bayesian formulation of the Diebold and Li (2006)
model proposed by Migon and Abanto-Valle (2007). The first is to use the Svensson model (Eq.
2.2), rather than the original Nelson-Siegel formula (Eq. 2.1), which makes the curve format more
flexible. The second extension is to make the parameters τ1 and τ2 time-varying, adding two latent
factors to these components. The third extension is that the formulation of our model allows for
a different number of observations on each day, which avoids the preliminary stage of curve
interpolation used to obtain a set of observations at the same maturities, as originally performed in
Diebold and Li (2006); that stage may introduce distortions in the yield curves used in the estimation.
The last extension introduced is to add a stochastic volatility structure to the model. This
addition is of fundamental importance because one of the stylized facts in the interest rates is
the presence of conditional heteroskedasticity, generally captured in no-arbitrage and equilibrium
models by the addition of factors that specifically control the stochastic evolution of the variance.
Examples of this kind of formulation include the Hull and White (1990) and Scott (1996) models,
and a detailed discussion may be found in Fouque et al. (2000). The advantages of the Bayesian
formulation are that the properties of the estimators are obtained in the exact form for finite
samples, which allows calculating the confidence intervals for the hyperparameters and forecasting
the term structure for interest rates, considering the uncertainty in the parameter estimation.
4. Model Description
We can describe the extensions proposed in this article by the following set of equations:
(4.1)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} - e^{-m_{it}/\tau_{1t}}\right) + \beta_{4t}\left(\frac{1 - e^{-m_{it}/\tau_{2t}}}{m_{it}/\tau_{2t}} - e^{-m_{it}/\tau_{2t}}\right) + e^{\sigma_t}\eta_t$

(4.2)   $\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ \beta_{4t} \\ \tau_{1t} \\ \tau_{2t} \end{pmatrix} = \begin{pmatrix} \mu_{\beta_1} \\ \mu_{\beta_2} \\ \mu_{\beta_3} \\ \mu_{\beta_4} \\ \mu_{\tau_1} \\ \mu_{\tau_2} \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1t-1} \\ \beta_{2t-1} \\ \beta_{3t-1} \\ \beta_{4t-1} \\ \tau_{1t-1} \\ \tau_{2t-1} \end{pmatrix} + \epsilon_t$
10
BAYESIAN EXTENSIONS TO DIEBOLD-LI TERM STRUCTURE MODEL 7
(4.3)   $\log \sigma_t^2 = \phi_0 + \phi_1 \log \sigma_{t-1}^2 + \upsilon_t$

(4.4)   $\eta_t \sim IID(0,1) \quad \text{and} \quad \eta_t \perp \eta_s \ \forall\, t \neq s$

$\Sigma_{\eta,\epsilon,\upsilon} = \begin{pmatrix} \sigma_{\eta}^2 & 0 & 0 \\ 0 & \Omega_{\epsilon} & 0 \\ 0 & 0 & \sigma_{\upsilon}^2 \end{pmatrix}$
In this specification, which may be considered a nonlinear state-space model, Eq. 4.1 corresponds to the measurement equation, connecting the observed rates yit that describe the interest
rate as functions of the maturities i at time t.
The formulation of this equation follows the specification of the Svensson model, with the
difference that the latent factors βjt and τht , j = 1, 2, 3, 4 and h = 1, 2, are time-varying rather than fixed
in time. The matrix Σ η,ǫ,υ denotes the expanded variance-covariance matrix, where ση2 is the scalar
variance in the measurement equation, Ωǫ is the variance-covariance matrix of the latent
factors, and συ2 is the scalar variance in the stochastic volatility equation. We assume that the matrix
is diagonal, except for the submatrix Ωǫ , whose components may be correlated.
The evolution of the latent factors is given by Eq. 4.2, which describes a first-order autoregres-
sive model for these components with a parameter matrix given by Φ, containing the coefficients of
autoregressive estimation. We adopted a first-order specification for the autoregressive model,
though noting that there is no theoretical limitation to a higher order. A possibility is to
implement a restricted vector autoregressive structure, by working with only one autoregressive
structure for each parameter. Although this may be imposed a priori, a possible alternative is the
use of informative priors in the estimation of vector autoregressive models, as advocated by Doan
et al. (1984).
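The state equation (Eq. 4.2) can be illustrated by a small simulation; the values of μ, Φ, and the innovation scale below are illustrative assumptions, not estimates, and a diagonal Φ corresponds to the restricted one-AR(1)-per-factor structure mentioned above. In practice, positivity of τ1t and τ2t would also have to be enforced, e.g. by modeling their logarithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stationary means for (beta1..beta4, tau1, tau2) and dynamics.
levels = np.array([0.15, -0.02, 0.01, 0.005, 120.0, 600.0])
phi = np.diag([0.95, 0.90, 0.90, 0.90, 0.98, 0.98])
mu = (np.eye(6) - phi) @ levels                      # VAR(1) intercept
chol = np.diag([1e-3, 1e-3, 1e-3, 1e-3, 0.5, 1.0])   # Cholesky factor of Omega_eps

def simulate_factors(T, mu, phi, chol, rng):
    """Simulate f_t = mu + Phi f_{t-1} + eps_t, starting at the stationary mean."""
    f = np.empty((T, 6))
    state = np.linalg.solve(np.eye(6) - phi, mu)
    for t in range(T):
        state = mu + phi @ state + chol @ rng.standard_normal(6)
        f[t] = state
    return f

path = simulate_factors(500, mu, phi, chol, rng)
```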
Finally, Eq. 4.3 describes the stochastic volatility components for the errors in the measurement
equation. The formulation used is that of an autoregressive model for the unobserved stochastic
volatility component, according to the original specification of the stochastic volatility model
introduced by Taylor (1986). The addition of the stochastic volatility model represents a relevant
extension, because the presence of conditional heteroskedasticity is a stylized fact in modeling the
series of interest rates. We noted that the addition of stochastic volatility components is especially
important at moments of changes in the shape of the yield curve, especially because these
moments are linked to greater uncertainty about future interest rates and about the paths assumed
by monetary and fiscal policy. A relevant stylized fact is that the volatility of
the interest rates is greater in the emerging economies; thus, the component of stochastic volatility
is especially relevant to the set of data used in this study.
It must be noted that the model specification given by Eqs 4.1,4.2 and 4.3 corresponds to a
nonlinear state-space model and thus cannot be treated by methods such as the linear Kalman
filter. A way to perform the simultaneous estimation is through Bayesian inference methods
using MCMC. The idea of the MCMC method is to simulate a Markov chain whose stationary
distribution is the posterior distribution p(Θ|y). The MCMC methodology simplifies the calculation,
by factoring this distribution into a set of conditional distributions of lower dimension that can
make the simulation easier. The Hammersley-Clifford theorem (see Robert and Casella (2004) for
a derivation of this result) ascertains that under certain conditions, this set of conditional distri-
butions will uniquely characterize the posterior distribution p(Θ|y), and the MCMC methodology
is based on obtaining random samples of the conditional distributions, where a Markov Chain
structure is used. An evident advantage of this method is that it does not involve any method-
ology of numerical maximization, thus avoiding the numerical problems involved in the nonlinear
maximization of the functions such as those found in our problem. The validity of the method-
ology can be verified through methods that check the convergence of the Markov chains for its
stationary distribution.
The methodology of hierarchical Bayes estimators is a convenient way to address the problem
when the model to be estimated can be placed in a state-space formulation. Following the example
given in Lehmann and Casella (1998), a form to represent these models is:
X|θ ∼ f (x|θ)
Θ|γ ∼ π(θ|γ)
Γ ∼ ψ(γ)
Thus, we place a hierarchical structure on the prior distributions. This formulation is especially useful in state-space models, because the hierarchical specification allows for the estimation
of the hyperparameters related to the latent factors using the available data, specifying the dynamics of the latent factors. For example, the local level model is formulated as follows:

(5.1)   $y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \nu_t$
where we use as prior distribution of the latent factor µt , the value of µt−1 , and then µt ∼
p(µt−1 ), which corresponds to the idea of state equation in the state-space formulation. The
specification of the latent factors uses a generalized formulation ξt ∼ p(ξt−1 ), where ξ denotes the
set of latent factors in our model given by βit , τit and σi2 . This methodology is also known as
empirical Bayes estimators (Lehmann and Casella (1998)).
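For the linear, Gaussian local level model of Eq. 5.1, the filtering recursions are available in closed form; the sketch below is the standard Kalman filter for this toy case (the full model of Section 4 is nonlinear, which is why MCMC is needed there instead).

```python
import numpy as np

def local_level_filter(y, var_eps, var_nu, m0=0.0, p0=1e6):
    """Kalman filter for the local level model (Eq. 5.1):
    y_t = mu_t + eps_t,  mu_t = mu_{t-1} + nu_t.
    Returns the filtered means E[mu_t | y_1..y_t]."""
    m, p = m0, p0
    means = []
    for obs in y:
        p = p + var_nu                 # predict mu_t given y_{1:t-1}
        k = p / (p + var_eps)          # Kalman gain
        m = m + k * (obs - m)          # update with the new observation
        p = (1.0 - k) * p
        means.append(m)
    return np.array(means)
```

With a diffuse initial variance `p0`, the first filtered mean is pulled almost entirely to the first observation, mirroring the idea that the prior for each μt is centered at μt−1.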
In our problem, we cannot directly sample from all the conditional distributions, owing to the
nonlinear forms involved. Thus, a hybrid MCMC is used, where we combine the Gibbs sampler
with the Metropolis-Hastings algorithm, a methodology initially proposed in Tierney (1994).
A hybrid MCMC algorithm (Robert and Casella (2004)) may be described as iterating the
following stages:

$\theta_i^{(t+1)} = \begin{cases} \theta_i^{(t)} & \text{with probability } 1-\rho \\ \tilde{\theta}_i & \text{with probability } \rho \end{cases}$

where

$\rho = 1 \wedge \dfrac{g_i\big(\tilde{\theta}_i \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)}\big)\; q_i\big(\theta_i^{(t)} \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\tilde{\theta}_i,\ldots,\theta_p^{(t)}\big)}{g_i\big(\theta_i^{(t)} \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)}\big)\; q_i\big(\tilde{\theta}_i \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\ldots,\theta_p^{(t)}\big)}$
where q is the proposal (tentative) distribution, assumed here to be multivariate Gaussian, and
g is the conditional distribution.
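The accept/reject mechanism can be illustrated on a toy target, a bivariate normal with correlation ρ, where one coordinate is drawn exactly from its full conditional (a Gibbs step) and the other by random-walk Metropolis-Hastings with a Gaussian proposal. This is a sketch of the general mechanism only, not the sampler used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def hybrid_mcmc(n, rho=0.8, step=1.0):
    """Toy hybrid MCMC on a standard bivariate normal with correlation rho:
    x | y is drawn exactly (Gibbs step); y | x by a random-walk MH step."""
    x, y = 0.0, 0.0
    draws = np.empty((n, 2))
    for t in range(n):
        # Gibbs step: x | y ~ N(rho * y, 1 - rho^2), sampled directly.
        x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal()
        # MH step for y | x ~ N(rho * x, 1 - rho^2), symmetric proposal.
        prop = y + step * rng.standard_normal()
        logr = ((y - rho * x) ** 2 - (prop - rho * x) ** 2) / (2 * (1 - rho**2))
        if np.log(rng.random()) < logr:   # accept with probability 1 ∧ exp(logr)
            y = prop
        draws[t] = (x, y)
    return draws
```

Because the proposal is symmetric, the q terms cancel and ρ reduces to the ratio of the full conditional g at the proposed and current points.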
To completely characterize our model, the prior distributions are the normal/inverse-gamma
pair for βit and τit , using the hierarchical characterization with the mean given by the vector
[Figure 6.1: Evolution of the yield curves over time (axes: Time, Maturity, Interest Rate).]
autoregressive structure. For the parameters Φ of the autoregressive vector, we assume a
multivariate normal structure with the variance matrix given by a Wishart distribution; for the latent
factor of stochastic volatility, we assume $\sigma_t^2 \sim LogNormal(\phi_0 + \phi_1 \log \sigma_{t-1}^2, \tau_\sigma^2)$, with a gamma distribution
for $\tau_\sigma^2$, a normal for $\phi_0$, and finally $\phi_1 \sim Beta$.
For the parameters βit and φ0 , the parameters of the Wishart distribution, and those of the gamma
distributions, we used a Gibbs sampling step; for τit , we used Metropolis-Hastings; and for the
parameter φ1 , we used the algorithm known as the Slice Sampler (Neal (2003)).
6. Application
In this section, we present an application of the model to the fitting of the term structure
implicit in the SWAP DI-PRÉ curves provided by the BM&F (Mercantile and Futures Exchange)
in Brazil. These instruments are swap contracts between floating and fixed interest rates, and
constitute the most liquid fixed income market in Brazil. This yield curve is notoriously difficult
to fit by conventional methods. We used the BM&F data on the yield curves implicit in
SWAP operations for the interval from January 12, 2004 to December 12, 2006, a sample of 722
yield-curve days. Figure 6.1 shows the evolution of the yield curves over time.
An interesting fact is that the curves in our study present several slope and curvature changes,
going from the usual increasing shape to inverted curves several times throughout the period. On
several days in the same interval, the yield curves also present two slope changes.
This fact cannot be adequately captured by the model of Diebold and Li (2006), because the
Nelson and Siegel (1987) formulation does not allow more than one slope and curvature change.
Another point of importance is that the yield curve in Brazil has intense oscillation, both in terms
of curve level and format, which reinforces the necessity to make the parameters time-varying and
challenges the maintenance of parameter τ as fixed, as assumed by the Diebold and Li (2006)
model.
Another important point is that the yield curve lengthens and retracts in the mentioned period,
that is, the maximal maturities observed in the SWAP contracts change in the analyzed sample,
varying between 1800 and 2400 days.
It must be noted that our study does not carry out a pre-interpolation or extrapolation of
the data; the methodology permits working with distinct maturities on each day. The average
number of distinct maturities is 24, with a minimum of 20 and a maximum of 29. This fact must be
highlighted because the interpolation stage may distort the data, and the estimated model may
be used to interpolate and extrapolate the curve if necessary.
To estimate the model, we used 10,000 iterations of the MCMC algorithm described in Section 5,
discarding the first 5,000 iterations as burn-in and using the remaining 5,000 to construct
the posterior distributions. The Gelman-Rubin convergence diagnostics indicate that the Markov
chains converge to their stationary distributions, validating the estimation methodology used.
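As a rough illustration of how this diagnostic works, the potential scale reduction factor compares between-chain and within-chain variance, and values near 1 indicate convergence. The following is a minimal sketch in Python (not the code used here), assuming equal-length scalar chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a set of
    equal-length MCMC chains of a scalar parameter."""
    chains = np.asarray(chains, dtype=float)   # shape (m chains, n draws)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
# Two chains sampling the same stationary distribution -> R-hat close to 1
chains = rng.normal(0.0, 1.0, size=(2, 5000))
print(gelman_rubin(chains))
```

In practice the diagnostic is computed for each parameter and latent factor of the model separately.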
Figure 6.2 shows the in-sample fit of the model and the residuals relative to the observed
curves. Figures 6.3, 6.4, 6.5 and 6.6 show the evolution of the latent factors β1t , β2t , β3t and β4t ,
obtained as medians of the posterior distributions. The evolution of β1t clearly supports the level
interpretation of this parameter, following the evolution of the mean yield curve over time. The
evolution of the other latent factors also adequately captures the evolution of the slope and
curvature components of the term structure observed in the interest rates.
Figures 6.7 and 6.8 are of special importance because they show that the prior fixing of
parameter τ assumed in the Diebold and Li (2006) model is not a valid restriction, as becomes
evident from the great temporal variation observed in parameters τ1 and τ2 . This indicates the
need to incorporate variation in these parameters for yield curves with great variation in shape,
as observed in emerging countries.
The estimated stochastic volatility component (Figure 6.9) shows the capacity of the model
to capture the stylized fact of conditional heteroskedasticity in interest rates.
The conditional volatility structure captures the uncertainty existing in periods of change
Figure 6.1. Evolution of the yield curves (interest rate × maturity × time).
Figure 6.2. Fit residuals (interest rate × maturity × time).
Figure 6.3. β1
in the yield curves' shapes, because we can notice the correlation between increases in volatility
and periods of inversion of the curve shape.
Figure 6.10 shows another fact captured by the stochastic volatility structure: the high persistence
of shocks in the volatility. This is noticeable because parameter φ1 is concentrated on values
close to 1.
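To see why a φ1 close to 1 implies persistent volatility, note that in a log-volatility AR(1), h_t = φ0 + φ1 h_{t-1} + η_t, the effect of a shock decays as φ1^k. A small sketch follows; the numerical values are hypothetical illustrations, not the posterior estimates of the model:

```python
import numpy as np

# AR(1) log-volatility: h_t = phi0 + phi1 * h_{t-1} + sigma_eta * eta_t.
# The values below are hypothetical, chosen only to illustrate persistence.
phi0, phi1, sigma_eta = 0.001, 0.999, 0.1

# Half-life of a shock: number of periods until its effect on h_t is halved
half_life = np.log(0.5) / np.log(phi1)
print(round(half_life))  # -> 693: shocks take roughly 693 periods to halve

# Simulated path starting at the unconditional mean of the process
rng = np.random.default_rng(42)
h = np.empty(1000)
h[0] = phi0 / (1.0 - phi1)
for t in range(1, 1000):
    h[t] = phi0 + phi1 * h[t - 1] + sigma_eta * rng.standard_normal()
```

A posterior for φ1 concentrated this close to 1 therefore implies volatility shocks that decay very slowly.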
Figure 6.4. β2
Figure 6.5. β3
Figure 6.6. β4
Table 1 shows the credibility intervals calculated for the matrix of coefficients Φ. To verify
the stationarity of the process, we calculated the eigenvalues of the matrix Φ at the upper and lower
limits of these intervals. The highest eigenvalue for the upper limit was 1.0029, and that for the
Figure 6.7. τ1
Figure 6.8. τ2
Figure 6.9. Stochastic Volatility
lower limit was 0.9783, indicating that the region of nonstationarity is included in the credibility
intervals.
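This stationarity check can be sketched as follows: a VAR(1) is stationary when every eigenvalue of Φ lies strictly inside the unit circle, so evaluating the largest eigenvalue modulus at the limits of the credibility interval shows whether a unit root is ruled out. The matrices below are hypothetical stand-ins, not the estimated limits:

```python
import numpy as np

def max_eigen_modulus(Phi):
    """Largest eigenvalue modulus of a VAR(1) coefficient matrix;
    values below 1 imply a stationary process."""
    return np.abs(np.linalg.eigvals(np.asarray(Phi, dtype=float))).max()

# Hypothetical coefficient matrices standing in for the credibility-interval
# limits of Phi (not the estimates reported in Table 1).
Phi_lower = np.diag([0.97, 0.95, 0.90, 0.85])
Phi_upper = np.diag([1.002, 0.99, 0.95, 0.90])

print(max_eigen_modulus(Phi_lower) < 1.0)   # True: stationary at the lower limit
print(max_eigen_modulus(Phi_upper) < 1.0)   # False: a unit root is not ruled out
```

When the interval spans the unit circle, as in the estimates above, the nonstationary region cannot be excluded.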
Figure 6.10. Posterior distributions of φ0 and φ1 .
To demonstrate the predictive potential of the model, we show the forecasts for some specific
days, characterized by distinct shapes of the yield curve. We also show the forecasts and one-
step-ahead forecast errors for all the days observed in the sample.
Figure 6.11 shows the one-step-ahead forecasts obtained by the extended Diebold-Li model,
with confidence intervals at the 2.5% and 97.5% limits, for 4 observed days of the yield curve.
The first subfigure shows the prediction for July 20, 2004, with the shape generally observed in
interest rates, increasing with maturity. The second curve, predicted for February 01,
2005, shows a slope change, normally associated with expected changes in long-term
interest rates. The curve predicted for June 27, 2006 shows the opposite situation, with a decreasing
Figure 6.11. One-step-ahead forecasts (interest rate forecast × maturity) for four selected days.
curve at the medium-term maturities and an increasing curve at the long-term maturities.
Subfigure (d) shows the one-step-ahead forecast for the last observation in the sample,
referring to December 6, 2006.
The one-step-ahead forecasts for the whole sample, including extrapolations for the unobserved
maturities, and the associated prediction errors are shown in Figure 6.12. The forecast
errors have relatively low magnitude, and the larger errors are concentrated at
the moments of change in the shape of the yield curve.
We also carried out a comparative forecast analysis between the extended Diebold-Li model and
the original Diebold-Li formulation with fixed τ , the time-varying specification of the Diebold-Li
model, and modifications of the Diebold-Li model using the Svensson specification in Eq. 2.2 in place
of the original Nelson-Siegel formulation, with parameters τ1 and τ2 fixed and
time-varying, respectively. The reference models with time-varying τ were estimated
by nonlinear least squares, whereas the linearized forms were estimated by
Figure 6.12. One-step-ahead forecasts and forecast errors for the whole sample (interest rate × maturity × time).
ordinary least squares. Table 2 presents the root mean square error of the one-step-ahead
forecast errors for the five compared models. Parameters τ , τ1 and τ2 were fixed at the mean values
of the corresponding time-varying parameters.
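The comparison criterion can be sketched as a pooled root mean square error over the one-step-ahead forecast errors. The yields and model forecasts below are toy numbers, not the values behind Table 2:

```python
import numpy as np

def rmse(observed, forecast):
    """Root mean square error over pooled one-step-ahead forecast errors."""
    e = np.asarray(observed, dtype=float) - np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean(e ** 2))

# Toy example: observed yields versus forecasts from two hypothetical models
obs = np.array([0.16, 0.17, 0.18, 0.19])
model_a = np.array([0.159, 0.171, 0.182, 0.188])
model_b = np.array([0.15, 0.18, 0.17, 0.20])
print(rmse(obs, model_a) < rmse(obs, model_b))  # True: model A forecasts better
```

In the comparison of Table 2 the errors are pooled across all days and maturities of the out-of-sample exercise.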
The results of this comparative analysis show that the Diebold-Li model with the proposed
extensions has superior forecast performance compared with the other models, as shown
in Table 2.
The original Diebold-Li model with a fixed parameter τ , using the Nelson-Siegel specification,
is not a valid specification because it substantially reduces the predictive power of the model
when compared with the varying-parameter version of the same model. In the case of the Diebold-
Li model using the Svensson specification, the fixed parameters result in better predictive power than
the estimation with free parameters. This result may be explained by the difficulty in estimating
the Svensson specification: on many days the estimation does not converge because of nonlinearity,
which makes the model fitting inadequate and raises the mean square error through large
forecasting errors for all the maturities observed on those days. This problem also contaminates
the estimation of the autoregressive vector, which compromises the curve forecast for the following
day. However, the use of Bayesian estimation with informative priors allows us to employ the
more flexible Svensson specification without being affected by the instability problems of the
nonlinear estimation that occur in the classical estimation by nonlinear least squares.
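For reference, the Svensson (1994) extension adds to the Nelson-Siegel curve a second curvature term with its own decay parameter τ2, which is what allows a second hump and hence two slope changes. A sketch of the curve evaluation, with purely hypothetical parameter values:

```python
import numpy as np

def svensson(m, b1, b2, b3, b4, tau1, tau2):
    """Svensson (1994) yield curve: Nelson-Siegel plus a second
    curvature term with its own decay parameter tau2."""
    m = np.asarray(m, dtype=float)
    x1, x2 = m / tau1, m / tau2
    f1 = (1 - np.exp(-x1)) / x1                  # slope loading
    f2 = f1 - np.exp(-x1)                        # first curvature loading
    f3 = (1 - np.exp(-x2)) / x2 - np.exp(-x2)    # second curvature loading
    return b1 + b2 * f1 + b3 * f2 + b4 * f3

# Hypothetical parameters producing a curve with two humps
maturities = np.array([0.25, 0.5, 1, 2, 5, 10])
y = svensson(maturities, 0.17, -0.02, 0.03, -0.04, 0.3, 2.5)
```

The curve tends to b1 at long maturities and to b1 + b2 at the short end, exactly as in the Nelson-Siegel case; the extra factor only affects intermediate maturities.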
7. Conclusions
In this article, we implemented some extensions to the Diebold and Li (2006) model, including
Bayesian estimation via MCMC for the parameters and latent factors of this model. The proposed
extensions were: changes in the functional form, making it more flexible through the Svensson
(1994) specification; the inclusion of latent factors to make the model parameters time-varying;
the possibility of using different sets of observations on each day; and the inclusion of a stochastic
volatility structure.
The more flexible form adopted allows capturing the changing shapes of the yield curve in
emerging countries. This flexibility is reflected in the low forecasting and fitting errors observed
for this model. The Bayesian estimation combined with informative priors on the latent factors
avoids the linearized two-stage estimation procedure employed in the Diebold-Li model, which
results in more precise fits and forecasts. The specification of the parameters as latent factors
modeled through a Bayesian hierarchical structure allows obtaining the finite-sample distribution
of the parameters and model predictions, thus enabling the quantification of the uncertainty
present in the estimation of the term structure of interest rates.
We demonstrated that it is important to make the τ parameters time-varying to fit the term
structure, in particular for yield curves in emerging countries with constant modifications in
shape, as observed in the behavior of the latent factors τ1t and τ2t .
The latent factors, owing to their informative prior structure, allow us to overcome the common
problem of numerical instability associated with the nonlinear estimation of the Nelson-Siegel
and Svensson models in the presence of a restricted number of observations. Moreover,
the Bayesian estimation methodology, through MCMC algorithms, carries out the estimation
simultaneously and allows us to avoid the linearization of the model and the two-stage
estimation used in Diebold and Li (2006). The specification of the model is based on a standard set
of priors, and the estimation algorithm, based on a mixture of Gibbs and Metropolis-Hastings steps, is
widely used and its properties are extensively studied, thus making the estimation of the model
simple and trustworthy.
References
Brennan, M. J. and Schwartz, E. J.: 1979, A continuous time approach to the pricing of bonds,
Journal of Banking and Finance 3, 133–155.
Christensen, J. H., Diebold, F. X. and Rudebusch, G. D.: 2008, An arbitrage-free generalized
Nelson-Siegel term structure model, Econometrics Journal, forthcoming.
Cox, J. C., Ingersoll, J. E. and Ross, S. A.: 1985, A theory of the term structure of interest rates,
Econometrica 53, 385–408.
Diebold, F. and Li, C.: 2006, Forecasting the term structure of government bond yields, Journal
of Econometrics 130, 337–364.
Doan, T., Litterman, R. and Sims, C.: 1984, Forecasting and conditional projection using realistic
prior distributions, Econometric Reviews 3, 1–100.
Duffie, D. and Kan, R.: 1996, A yield-factor model of interest rates, Mathematical Finance pp. 379–
406.
Fouque, J.-P., Papanicolaou, G. and Sircar, K. R.: 2000, Derivatives in Financial Markets with
Stochastic Volatility, Cambridge University Press.
Heath, D., Jarrow, R. and Morton, A.: 1992, Bond pricing and the term structure of interest
rates: A new methodology for contingent claims valuation, Econometrica 60(1).
Hull, J. and White, A.: 1990, Pricing interest rate derivative securities, Review of Financial Studies
3(4), 573–94.
Huse, C.: 2007, Term structure modelling with observable state variables. Unpublished Working
Paper - FMG - LSE.
Johannes, M. and Polson, N.: 2007, Handbook of Financial Econometrics, chapter MCMC Methods
for Continuous Time Financial Econometrics.
Lehmann, E. and Casella, G.: 1998, Theory of Point Estimation (2nd Edition), Springer.
Linton, O., Mammen, E., Nielsen, J. and Tanggaard, C.: 2001, Estimating yield curves by kernel
smoothing methods, Journal of Econometrics 105, 185–223.
Migon, H. and Abanto-Valle, C.: 2007, A Bayesian term structure modelling, in C. Fernandes,
H. Schimidli and N. Kolev (eds), Proceedings of the Third Brazilian Conference on Statistical
Modelling in Insurance and Finance, IME-USP, pp. 200–203.
Neal, R.: 2003, Slice sampling (with discussions), Annals of Statistics 31, 705–767.
Nelson, C. R. and Siegel, A. F.: 1987, Parsimonious modelling of yield curves, Journal of Business
60(4), 473–489.
Robert, C. and Casella, G.: 2004, Monte Carlo Statistical Methods, Springer.
Scott, L. O.: 1996, Simulating a multi-factor term structure model over relatively long discrete
time periods, Proceedings of the IAFE First Annual Computational Finance Conference.
Shea, G.: 1984, Pitfalls in smoothing interest rate structure data: Equilibrium models and spline
approximation, Journal of Financial and Quantitative Analysis 19, 253–269.
Svensson, L. E. O.: 1994, Estimating and interpreting forward interest rates: Sweden 1992–1994,
NBER Working Paper 4871.
Taylor, S. J.: 1986, Modelling Financial Time Series, John Wiley & Sons.
Tierney, L.: 1994, Markov chains for exploring posterior distributions (with discussion), Annals
of Statistics 22, 1701–1786.
GENERALIZED LATENT FACTOR MODELS FOR YIELD CURVES IN MULTIPLE
MARKETS
Abstract. In this article we propose latent factor models to simultaneously model yield curves in multiple
markets, generalizing several models found in the literature on the estimation of the term structure of interest
rates. The proposed models do not use some of the usual restrictions adopted for estimation and identification,
thus enabling more flexible structures incorporating additional latent factors, stochastic volatility
and the imposition of no-arbitrage consistency. The elimination of these restrictions is made possible by
the Bayesian estimation methodology using Markov Chain Monte Carlo (MCMC). This methodology
makes it possible to obtain exact confidence intervals for the parameters, latent factors and forecasts, and
also to address identification and dimensionality problems in the estimation of multimarket models. The
models are applied to jointly model the Cupom Cambial (USD interest rate in Brazil) and Eurodollar curves,
carrying out an extensive model comparison and demonstrating the forecasting and practical
potential of the proposed models.
Keywords: Term Structure, Latent Factors, No-arbitrage, Forecasting.
JEL Codes: C11, G12, G17.
Address - Insper Institute - Rua Quatá 300, 04546-042, São Paulo, SP, Brasil. email - Márcio Laurini - marciopl@isp.edu.br -
Luiz Koodi Hotta - hotta@ime.unicamp.br.
1. Introduction
Modeling the term structure of interest rates is a fundamental point in the management of capital
assets. A considerably large literature has been developed to obtain more precise forms for the modelling,
forecasting and pricing of financial instruments based on the yield curve. Among these approaches,
an important part of the literature is based on the idea that the dynamic evolution of the yield curve may
be described using a set of dynamic factors that determine the evolution of risk premiums for the various
maturities observed. The most common way of considering these factors is through a representation using
latent state variables, that is, variables that are not directly observed.¹
The purpose of these latent factors is to summarize the whole set of relevant variables determining the yield
curves’ movements. The methodologies for the extraction of these latent factors may arise from purely
statistical mechanisms, such as the decomposition of principal components introduced in Litterman and
Scheinkman (1991), where the latent factors are interpreted as components of level, slope and curvature.
These latent factors may also be identified by equilibrium pricing methodologies, such as the short-rate
models of Vasicek (1977) and Cox et al. (1985), which belong to the class of affine models (affine
diffusions, e.g. Cox et al. (1985)). These equilibrium models may also be placed in a general framework
based on no-arbitrage conditions through the Heath-Jarrow-Morton (Heath et al. (1992)) formulation, which
determines the evolution of forward rates as an infinite-dimensional stochastic process.
All these approaches, however, have had only partial success in the empirical modeling of the dynamic evolution of
the term structure of interest rates. The equilibrium and affine models, though having important
analytical properties such as closed formulas for asset pricing, are characterized
by a rather unsatisfactory fit of the observed rates and of the forecasts derived from these models.
An additional difficulty is that, in general, the econometric estimation of these models suffers from
problems of local maxima and identification, as pointed out by Duffee (2002). The no-arbitrage models
¹For references about modeling the term structure of interest rates see, for example, Brigo and Mercurio (2006) for aspects
related to the pricing of financial instruments, and Singleton (2006) about the estimation of models of the term structure of
interest rates.
are calibrated to replicate perfectly the yield curve observed in the market by matching observed
bond prices, but this calibration is cross-sectional and does not allow forecasting future curves;
it only allows the pricing of derivative instruments. Moreover, these models are recalibrated daily using
instruments observed in the yield curve.
Diebold and Li (2006), whose main objective is to forecast the term structure of interest rates, propose a
dynamic model using the parametric form for the yield curve proposed by Nelson and Siegel (1987), and
interpret it as a latent factor model. In this generalization each parameter of the cross-section
fit of the Nelson-Siegel model is treated as a latent factor, and through the modeling and forecasting
of these latent factors it is possible to obtain forecasts for the whole term structure of interest rates. The
results obtained by Diebold and Li (2006) indicate that this formulation presents fit and forecasting power
superior to other yield curve modeling methodologies, making this model the standard reference for
term structure forecasting.
The model proposed in Diebold and Li (2006) is also attractive because of its ease of implementation.
With some restrictions on the parameter space, this model can be estimated using only
Ordinary Least Squares, while other models require more complex estimation tools such as
the Kalman filter (e.g. Duffee (2002)) or estimation methods such as the Simulated Method of Moments,
employed in the estimation of affine models in Dai and Singleton (2000). Apart from simplifying its
implementation, the restrictions imposed in the Diebold and Li (2006) model were necessary to avoid the
usual problems in the estimation of term structure models, such as the above-mentioned
problems of local maxima and non-identification.
Based on the success of the dynamic extension of the Nelson-Siegel curve, Diebold et al.
(2008) proposed a generalization of this model to fit multiple yield curves simultaneously, building
latent factors connected to a global yield curve that is not directly observed. In the Diebold
et al. (2008) model, the yield curve of each market is obtained, by means of these latent factors, as a linear
displacement of the global yield curve plus an idiosyncratic factor. It is
important to note that the Diebold et al. (2008) formulation is the first attempt at creating a model that
makes it possible to capture simultaneously the dynamics of several term structures. This formulation
has also been adopted to model the yield curves of emerging countries in Morita and Bueno (2008), thus
demonstrating the general applicability of this model.
However, the model proposed by Diebold et al. (2008) employs a series of restrictions in its formulation.
Given the high number of parameters involved in the estimation of the global model, Diebold et al. (2008)
employ a rather limited specification for the general shape of the yield curve in each market. Instead of
using the complete formulation of the Nelson-Siegel model with level, slope and curvature factors, Diebold
et al. (2008) use only the level and slope components, which makes the fit to the observed yield curves
rather limited, although it is important to note that the primary purpose of that model was not fit or
forecasting, but rather to verify the existence of a global factor influencing the movements of the
term structure in the most important markets.
An additional restriction is that the parameter defining the slope of the yield curve is kept
constant, which significantly impairs the model's fit. Other problems in this formulation
concern the estimation procedures: the two-stage procedure does not make
it possible to obtain measures such as exact confidence intervals for the model's parameters and for the
yield curve forecasts. Further problems relate to the model's identification, that is, to obtaining conditions
for a unique vector of parameters defining the maximum of the likelihood function employed in the
model's estimation. Finally, the formulation presupposes constant conditional volatility,
which contradicts one of the stylized facts in the modeling of yield curves.
Furthermore, the formulation proposed in Diebold et al. (2008) does not overcome one of the fundamental
criticisms of the original model of Diebold and Li (2006), namely its inconsistency with no-arbitrage
conditions. This limitation was resolved in Christensen et al. (2007, 2008), who demonstrate that,
although the original formulation is incompatible with no-arbitrage conditions, it is nevertheless
possible to work with an approximate, arbitrage-free form of this model, reparameterizing it
as an affine term structure model and obtaining a correction term that enables the incorporation
of the no-arbitrage
multimarket models through a mechanism known as Bayesian shrinkage, which enables the automatic
elimination of the model's redundant parameters. Finally, we implement, for the multiple-market model,
the no-arbitrage conditions formulated by Christensen et al. (2008), generalizing these conditions to
the multiple-market case. Thus the proposed generalizations deal with all the problems pointed out in
the original formulations of the Diebold and Li (2006) and Diebold et al. (2008) models.
This article is structured as follows: Sections 2 and 3 review the original models of Diebold and Li (2006)
and Diebold et al. (2008) and discuss the problems in their formulations. Section 4 presents the extensions
proposed to get around those problems. Section 5 discusses the implementation of no-arbitrage conditions,
and Section 6 presents the Bayesian estimation procedure by MCMC. Sections 7 and 8 discuss how
the Bayesian estimation provides a way of addressing the problems of identification and of the dimensionality
of the parameter vector. Section 9 presents an empirical application of the proposed models, fitting joint
models for the curves of the Cupom Cambial (USD interest rates in Brazil) and the Eurodollar curve. This
section carries out an extensive comparison of all the models proposed in this study, and we also implement
a procedure that is new in the literature, making it possible to verify the validity of the imposition of no-arbitrage
conditions on these term structure models. Final considerations follow in Section 10.
Among the models used for the term structure of interest rates, the model proposed by Diebold
and Li (2006) is widely used in the market because of its simplicity of implementation and its superior
forecasting performance. This model is based on the formulation proposed by Nelson and Siegel (1987)
for the cross-section (day by day) fit of the yield curve. The Nelson and Siegel (1987) curve is represented
as follows:
(2.1) \quad y_t(m) = \beta_1 + \beta_2 \, \frac{1 - e^{-m/\tau}}{m/\tau} + \beta_3 \left( \frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau} \right) + \epsilon_t(m)
where yt (m) are the rates observed on a given date t for the maturity vector m, and β1 , β2 , β3 and τ are
parameters. The parameters are interpretable: β1 represents the long-term component, β2 a short-term
component, β3 a medium-term component, and τ is a parameter that controls the slope of the yield curve.
Parameters β1 , β2 , β3 may also be interpreted as level, slope and curvature components, in accordance
with the terminology developed by Litterman and Scheinkman (1991). This model is a parsimonious way
of fitting the yield curve, and is capable of reproducing several stylized facts about the shape of the yield
curve over time.
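Equation (2.1) can be evaluated directly: the three loadings act as a constant level, an exponentially decaying slope term, and a medium-maturity hump. A minimal sketch, with hypothetical parameter values:

```python
import numpy as np

def nelson_siegel(m, b1, b2, b3, tau):
    """Nelson-Siegel (1987) curve of eq. (2.1): level, slope and
    curvature components with decay parameter tau."""
    x = np.asarray(m, dtype=float) / tau
    slope = (1 - np.exp(-x)) / x      # -> 1 at the short end, 0 at the long end
    curv = slope - np.exp(-x)         # hump peaking at medium maturities
    return b1 + b2 * slope + b3 * curv

# Hypothetical parameters: b1 is the long-run level, b1 + b2 the short rate
maturities = np.array([0.5, 1, 2, 5, 10])
y = nelson_siegel(maturities, 0.16, -0.03, 0.02, 1.5)
```

Note that for fixed τ the curve is linear in β1, β2, β3, which is what makes the per-day fit an ordinary least squares problem.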
Diebold and Li (2006) propose a dynamized version of the Nelson-Siegel model, interpreting the parameters
as dynamic factors. This model can be formulated through an observation equation for the yield curve
given by:
(2.2) \quad y_t(m) = \beta_{1t} + \beta_{2t} \, \frac{1 - e^{-m/\tau}}{m/\tau} + \beta_{3t} \left( \frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau} \right) + \epsilon_t(m)
and a system determining the evolution of latent factors as a first-order vector autoregression:
(2.3) \quad \begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1t-1} \\ \beta_{2t-1} \\ \beta_{3t-1} \end{pmatrix} + \epsilon^{\beta}_t
where Φ is the parameter matrix of this vector autoregressive process. The model estimation is
generally performed through a two-stage procedure. The first stage is the estimation of equation 2.2 for
each observed day. This estimation is performed by Ordinary Least Squares, assuming that the slope
parameter τ is fixed and known, yielding the latent factors β1t , β2t , β3t for each period t. The
second stage is the Ordinary Least Squares estimation of the parameter matrix Φ of the vector
autoregression using the factors β1t , β2t and β3t estimated in the first stage. Forecasts for the model
are obtained by inserting the forecasts for the latent factors into the Nelson-Siegel equation (2.2).
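The two-stage procedure can be sketched compactly: with τ fixed, equation (2.2) is linear in the β's, so stage one is an ordinary least squares regression per day on the Nelson-Siegel loadings, and stage two fits the VAR(1) of equation (2.3) by OLS on the extracted factors. A sketch on simulated data (τ and the factor values below are hypothetical):

```python
import numpy as np

def ns_loadings(m, tau):
    """Regressor matrix [1, slope, curvature] of eq. (2.2) for a fixed tau."""
    x = np.asarray(m, dtype=float) / tau
    slope = (1 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def two_stage_diebold_li(yields, maturities, tau):
    """Stage 1: per-day OLS of the observed yields on the loadings.
    Stage 2: OLS fit of the VAR(1) of eq. (2.3) to the extracted factors."""
    X = ns_loadings(maturities, tau)
    betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y in yields])
    # VAR(1): beta_t = mu + Phi beta_{t-1} + e_t, fitted equation by equation
    Z = np.column_stack([np.ones(len(betas) - 1), betas[:-1]])
    coef = np.linalg.lstsq(Z, betas[1:], rcond=None)[0]
    mu, Phi = coef[0], coef[1:].T
    return betas, mu, Phi

# Simulated data with hypothetical constant factors plus small noise
rng = np.random.default_rng(1)
m = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
true_b = np.array([0.16, -0.03, 0.02])
yields = ns_loadings(m, 1.5) @ true_b + 0.0005 * rng.standard_normal((200, 5))
betas, mu, Phi = two_stage_diebold_li(yields, m, tau=1.5)
```

This sketch makes the efficiency loss discussed below visible in the code: the factors extracted in stage one carry no information from the dynamics imposed in stage two.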
As can be noted, estimation and forecasting in the Diebold and Li (2006) model are extremely
simple, making implementation possible in any standard econometric software. However, this simplified
formulation can be criticized on various grounds. It may be too restrictive to consider parameter τ
constant for unstable curves, as is the case for the curves of emerging countries. This parameter
captures the average slope of the yield curve, and it can change with alterations in the
curve shape. Another important point is that the adopted parametric specification, derived from the
functional form of the Nelson-Siegel model, does not make it possible to capture curves with more
complicated shapes, such as curves with more than one change in slope and/or curvature.
Other relevant points refer to the properties of the estimators in this two-stage estimation procedure.
The first point is that the estimation is consistent only if the correct parameter τ is chosen. It is also
important to note that the distribution of the estimators in this context is unusual, since the estimation in
the second stage is based on a series constructed in the first stage. This also affects the construction
of confidence intervals for the yield curve forecasts derived from this model. Furthermore, there
is a loss of efficiency in the two-stage estimation since the estimation of the latent factors is performed
day by day, and hence is disconnected from the vector autoregressive structure adopted in equation 2.3. An
alternative way of performing this estimation would be maximum likelihood through the Kalman
filter, since the system formed by equations 2.2 and 2.3 is already in state-space form, but this
estimation continues to suffer from local maxima and non-identification problems, as commonly happens
in the estimation of term structure models that use the Kalman filter (e.g. Duffee (2002)). Another
fundamental problem is that the original formulation of the Diebold and Li (2006) model is not consistent
with the no-arbitrage principle. The Nelson-Siegel curve used in Diebold and Li (2006) does not admit
an arbitrage-free representation, as shown, for example, in Björk and Christensen (1999).
The Diebold-Li model is a dynamic model for the curve of a single market, but it is possible to
generalize this formulation to model several yield curves simultaneously. This generalization was proposed in
Diebold et al. (2008). Denoting the observed curve for market i as a function of the maturity vector τ by
yit (τ ), the yield dynamics in this model is given by a restricted version of the Nelson-Siegel
curve, with level and slope factors only:²
(3.1) \quad y_{it}(\tau) = l_{it} + s_{it} \, \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + v_{it}(\tau)
where, in the Diebold et al. (2008) notation, lit represents the level component in period t for
country i, sit represents the slope component for the same country in each period t, and vit is a
shock component in the rate observation equation. In order to specify the complete dynamics of the
model it is necessary to specify the evolution of the latent level and slope factors for each country. In
the specification proposed in Diebold et al. (2008) the idea is that there are so-called global factors
determined by an unobserved curve ygt of the form:
(3.2) \quad y_{gt}(\tau) = L_t + S_t \, \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + V_{gt}(\tau)
and the dynamics of the global latent factors Lt and St is given by the following autoregression:
²In this exposition of the model we follow the original notation of the Diebold et al. (2008) study, which denotes the
maturity vector as τ and the slope parameter as λ, whereas the notation used in the other models presented here uses m to
denote the maturity vector and τ for the slope parameters. We also use the original specification of Nelson and Siegel (1987)
for parameter τ , whereas Diebold and Li (2006) and Diebold et al. (2008) use the factor λ = 1/τ .
(3.3) \quad \begin{pmatrix} L_t \\ S_t \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} L_{t-1} \\ S_{t-1} \end{pmatrix} + \begin{pmatrix} U^l_t \\ U^s_t \end{pmatrix}
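The global-factor dynamics of equations (3.2)-(3.3) can be sketched by simulating the VAR(1) for (L_t, S_t) and evaluating the implied global curve; the coefficient values below are hypothetical, not estimates from any model in this thesis:

```python
import numpy as np

# Hypothetical, stationary coefficient matrix for the global-factor VAR(1)
Phi_g = np.array([[0.98, 0.02],
                  [0.01, 0.95]])
rng = np.random.default_rng(7)

F = np.empty((500, 2))               # columns: global level L_t and slope S_t
F[0] = [0.05, -0.01]
for t in range(1, 500):
    F[t] = Phi_g @ F[t - 1] + 0.001 * rng.standard_normal(2)

def global_curve(tau_grid, L, S, lam):
    """Global yield curve of eq. (3.2) with level and slope factors only."""
    x = lam * np.asarray(tau_grid, dtype=float)
    return L + S * (1 - np.exp(-x)) / x

y_g = global_curve(np.array([0.5, 1, 2, 5, 10]), F[-1, 0], F[-1, 1], lam=0.6)
```

Each market's curve is then obtained as a displacement of this global curve, as described next.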
In order to determine the level and slope components it is assumed that each country's curve is a
linear modification of the global curve plus an idiosyncratic component, whose dynamics are given by:
(3.5) \quad \begin{pmatrix} \varepsilon^l_{it} \\ \varepsilon^s_{it} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} \varepsilon^l_{it-1} \\ \varepsilon^s_{it-1} \end{pmatrix} + \begin{pmatrix} u^l_t \\ u^s_t \end{pmatrix}
The estimation of this model could be performed in principle employing maximum likelihood through
the decomposition of the forecast error using the Kalman filter, noting that in this case we have additional
latent variables representing the global factors. However, due to the dimension of the problem for the
multimarket case and the usual estimation problems, such as the identification problems and the possibil-
ity of local maxima, the estimation of the Diebold et al. (2008) model is performed in two stages.
In the first stage the curve for each country is obtained by Ordinary Least Squares, assuming again
that the parameter which controls the slope curve is kept constant and not estimated. A second stage
is performed with the factors obtained for each country, using MCMC to obtain the other parameters
and latent factors. This estimation is also performed with the imposition of some restrictions such as
the assumption that the parameter matrix in the autoregressive processes of the local factors is diagonal.
Even though this procedure has an operational purpose, it is difficult to obtain a statistical interpretation of the results, because the estimation of the model is in part frequentist and in part Bayesian. Once again we face the problem of how to build confidence intervals for parameters and forecasts under this two-stage procedure; moreover, the MCMC estimation uses only the joint distributions of the extracted factors and linear specifications, and thus does not exploit all the information in the yield curve.
This procedure presents similar limitations to those of the original estimation of the Diebold-Li model,
but aggravated by the dimensionality and heterogeneity of the model and of the shapes of the yield curves
in different markets. The first important point to note is that fixing the slope parameter of the curve can
considerably limit the model fit. Different markets may have very different slope factors, and, as already
mentioned, it may turn out to be extremely limiting to assume that these parameters are fixed in time.
Another important point is that the restriction of assuming only level and slope factors also limits the possible fit of the model. There is ample literature documenting the fit gains obtained by the inclusion of additional curvature factors, such as the original Svensson (1994) model and the Björk and Christensen (1999) model, which add more slope and curvature factors, thus considerably increasing the fit. It is also fundamental to note that in this specification it is not possible to use the arbitrage-free specifications proposed in Christensen et al. (2007, 2008), because in those formulations each slope component has to be coupled with a curvature component with the same mean-reversion rate; therefore this formulation without curvature components cannot be made arbitrage-free.
A further fundamental criticism is that, in the dynamic specification adopted in equation 3.5, each
country’s curve is a displacement of the global curve plus an idiosyncratic factor. Note that in this formu-
lation there is no direct interdependence between the yield curves, thus the model does not allow direct
identification of the possible interactions between the latent factors of different markets. A natural exercise would be to verify whether, for example, displacements in the level of one particular market
do affect the level of the other markets. Note that in this formulation this is performed only indirectly by
modifications in the global factor, and it is not possible to observe this direct effect.
4. Proposed Models
In order to address the existing problems in the original formulations of the Diebold and Li (2006);
Diebold et al. (2008) models, we use the Bayesian framework for latent factors proposed in Laurini and
Hotta (2008), though generalized for the case of more than one yield curve and also with the addition of
no-arbitrage correction proposed in Christensen et al. (2008). The proposed models can be classified into three classes: the first is a generalization of the latent factor structure, increasing the state vector so as to include interactions with the other latent factors, in particular the latent factors of the other countries; the second is a generalization of the global factor structure of Diebold et al. (2008), with the inclusion of components of curvature, double curvature and additional slopes; and the third contains the modifications necessary to make the previous two classes arbitrage-free, using the approximation of an affine model proposed in Christensen et al. (2007, 2008).
The common structure between the first two classes is given by the more flexible formulation of the
observation equation. We adopt as basic structure the dynamic generalization of the parametric form
proposed by Svensson (1994), which consists in an equation with a level factor, a slope factor and two
curvature factors in the form:
(4.1) $y_t(m) = \beta_{1t} + \beta_{2t}\dfrac{1-e^{-m/\tau_{1t}}}{m/\tau_{1t}} + \beta_{3t}\left[\dfrac{1-e^{-m/\tau_{1t}}}{m/\tau_{1t}} - e^{-m/\tau_{1t}}\right] + \beta_{4t}\left[\dfrac{1-e^{-m/\tau_{2t}}}{m/\tau_{2t}} - e^{-m/\tau_{2t}}\right] + \sigma_t \eta_t(m)$
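The Svensson form in equation (4.1), without the noise term, can be sketched numerically as follows. This is an illustrative reconstruction; the function name and parameter values are ours, not from the study.

```python
import numpy as np

def svensson_yield(m, beta, tau1, tau2):
    """Svensson (1994) yield at maturity m (equation 4.1, noise term omitted).

    beta = (beta1, beta2, beta3, beta4): level, slope, curvature and
    double curvature factors; tau1, tau2 are the slope (decay) parameters."""
    x1, x2 = m / tau1, m / tau2
    slope = (1 - np.exp(-x1)) / x1
    curv1 = slope - np.exp(-x1)
    curv2 = (1 - np.exp(-x2)) / x2 - np.exp(-x2)
    return beta[0] + beta[1] * slope + beta[2] * curv1 + beta[3] * curv2

# For very long maturities all loadings except the level vanish,
# so the yield approaches the level factor beta1.
m = np.array([1.0, 12, 60, 120, 1200])          # maturities in months
y = svensson_yield(m, (6.0, -2.0, 1.0, 0.5), tau1=18.0, tau2=60.0)
```

The slope loading decays from one to zero with maturity, while the two curvature loadings are hump-shaped, which is what gives the four factors their usual interpretation.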
where we assume that $\eta_t(m)$ is a Gaussian measurement error.
In the arbitrage-free models we use the specification with an additional slope factor and another curvature factor on the representation given by equation 4.1, as detailed in section 5. On this basic model we also adopt the generalization proposed in Laurini and Hotta (2008) to render the slope factors $\tau_1$ and $\tau_2$ time-varying, treating these parameters as additional latent factors and using a first-order autoregressive structure for all the latent factors, given by:
(4.3) $\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ \beta_{4t} \\ \tau_{1t} \\ \tau_{2t} \end{pmatrix} = \begin{pmatrix} \mu_{\beta_1} \\ \mu_{\beta_2} \\ \mu_{\beta_3} \\ \mu_{\beta_4} \\ \mu_{\tau_1} \\ \mu_{\tau_2} \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1,t-1} \\ \beta_{2,t-1} \\ \beta_{3,t-1} \\ \beta_{4,t-1} \\ \tau_{1,t-1} \\ \tau_{2,t-1} \end{pmatrix} + \epsilon_t,$
where, in principle, the matrix $\Phi$ is a full matrix, so that each latent factor in period t depends on all the latent factors in period t-1, plus an intercept $\mu$. Another generalization in this model is the possibility of a stochastic volatility factor $\sigma_t$, whose log-variance follows a first-order autoregressive dynamics; its multimarket version is given in equation (4.8).
This factor makes it possible to capture the conditional volatility structure present in interest rates, a well-documented stylized fact (e.g. Chan et al. (1992), Lund and Andersen (1997)). This stochastic volatility component has an additional function: it helps to avoid excessive variation in the latent factors of the model. A known result in the Bayesian literature is that it is possible
to rewrite a regression model with random coefficients as a regression model with fixed coefficients by the inclusion of a conditional heteroscedasticity component (e.g. Bauwens et al. (1999)). In relation to this
point, it is also interesting to note the criticism made by Sims (2001) of the time-varying parameter model proposed by Cogley and Sargent (2001) to detect changes in monetary policy. Sims points out that the variation observed in the parameters of the Cogley and Sargent (2001) model could be generated by an uncontrolled conditional volatility structure in the model. Thus this conditional volatility component tries to avoid the problem of excessive variation in the latent factors of the model.
An important point is that, in the class of models with no-arbitrage corrections, it is necessary to keep both the volatility and the slope parameters constant in time, and therefore these two extensions cannot be used: they would render the model incompatible with the class of affine models, and consequently the approximation proposed in Christensen et al. (2007, 2008) would not apply.
In the following sections we define the particular characteristics of the three classes of models proposed in this study.
4.1. Models of Generalized Latent Factors. The first class of models uses a generalization of the
Diebold-Li model, expanding the latent vectors to include interactions between the latent factors defining
the curves of the different markets. In this class we define the observed curve $y_t^i(m)$ for country $i$ through Svensson's representation:
(4.5) $y_t^i(m) = \beta_{1t}^i + \beta_{2t}^i\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} + \beta_{3t}^i\left[\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} - e^{-m/\tau_{1t}^i}\right] + \beta_{4t}^i\left[\dfrac{1-e^{-m/\tau_{2t}^i}}{m/\tau_{2t}^i} - e^{-m/\tau_{2t}^i}\right] + \sigma_t^i \eta_t^i(m)$
The generalization of the vector of latent factors for the multimarket case is given by the following
representation:
(4.6) $\beta_{kt}^i = \Phi_i \beta_{k,t-1}^i + \Phi_j \beta_{k,t-1}^j + \epsilon_{kt}$

(4.7) $\tau_{kt}^i = \theta_i \tau_{k,t-1}^i + \theta_j \tau_{k,t-1}^j + \nu_{kt}$

(4.8) $\ln\sigma_t^{2i} = \gamma_i \ln\sigma_{t-1}^{2i} + \gamma_j \ln\sigma_{t-1}^{2j} + \xi_t$
where $k = 1, 2, 3, 4$ and $\beta_{kt}^i$ represents the latent factors of level (k=1), slope (k=2), curvature (k=3) and double curvature (k=4) for market $i$, with a structure analogous to equation 4.3 but with the inclusion of $\beta_{k,t-1}^j$, which represents the factors for market $j$, which have an equivalent representation. Likewise, we have the factors $\tau_{kt}^i$ for the different countries and the stochastic volatility structure $\sigma_t^{2i}$ for each country $i$. Note that in this representation each latent factor of a market is influenced by the other countries' factors, allowing us to introduce an interaction between the different yield curves, as discussed in section 3. In order to complete the model we adopt the following covariance structure for each market's parameters:
$\Sigma_{\eta,\epsilon,\upsilon}^i = \begin{pmatrix} \sigma_\eta^{2i} & 0 & 0 \\ 0 & \Omega_\epsilon^i & 0 \\ 0 & 0 & \sigma_\upsilon^{2i} \end{pmatrix}$

The matrix $\Sigma_{\eta,\epsilon,\upsilon}^i$ is the expanded variance-covariance matrix of the parameters of the model for each country; $\sigma_\eta^{2i}$ is the variance of the measurement equation; $\Omega_\epsilon^i$ is the covariance matrix between the latent factors; and $\sigma_\upsilon^{2i}$ is the variance of the stochastic volatility process. This matrix is block-diagonal, except for the block $\Omega_\epsilon^i$, which allows covariances between the latent factors.
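The cross-market factor dynamics of equation (4.6) can be illustrated by simulation. This is a sketch under our own assumptions: two markets, scalar own- and cross-market coefficients, and arbitrary parameter values chosen only so that the joint system is stationary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each latent factor of market i depends on its own lag and on the lagged
# factor of market j (equation 4.6). Coefficients are illustrative.
phi_i, phi_j = 0.90, 0.05        # own- and cross-market autoregressive weights
T, K = 500, 4                    # sample size; 4 factors per market
beta_i = np.zeros((T, K))
beta_j = np.zeros((T, K))
for t in range(1, T):
    shock_i = 0.1 * rng.standard_normal(K)
    shock_j = 0.1 * rng.standard_normal(K)
    beta_i[t] = phi_i * beta_i[t - 1] + phi_j * beta_j[t - 1] + shock_i
    beta_j[t] = phi_i * beta_j[t - 1] + phi_j * beta_i[t - 1] + shock_j
```

Stationarity of the joint system requires the eigenvalues of the stacked autoregressive matrix to lie inside the unit circle; here they are $\phi_i \pm \phi_j = 0.95$ and $0.85$, so the simulated factors remain bounded.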
4.2. Generalized Global Model. The second class of models is a generalization of the Diebold et al. (2008) global factor model. In this case we do not adopt the restrictions imposed in that study, and we use a complete representation for the parametric structure of the yield curve observed in each country, employing a representation analogous to Svensson's curve. Following the Diebold et al. (2008) notation, the curve for each country is given by:
(4.9) $y_t^i(m) = l_{it} + s_{it}\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} + c1_{it}\left[\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} - e^{-m/\tau_{1t}^i}\right] + c2_{it}\left[\dfrac{1-e^{-m/\tau_{2t}^i}}{m/\tau_{2t}^i} - e^{-m/\tau_{2t}^i}\right] + \sigma_t^i \eta_t^i(m)$
where $l_{it}$ is the level of country $i$, $s_{it}$ is the slope and $c1_{it}$ and $c2_{it}$ are the two curvature factors, all evolving in $t$. In this representation $\tau_{1t}^i$ and $\tau_{2t}^i$ are the slope parameters for each country $i$, and they are also time-varying.
To complete the specification of the model, we generalize the structure of global factors used by Diebold et al. (2008). In this structure each latent factor of level, slope and curvature is a linear function of the equivalent global factor. This representation is written as:

$l_{it} = \alpha_i^l + \beta_i^l L_t + \varepsilon_{it}^l, \quad s_{it} = \alpha_i^s + \beta_i^s S_t + \varepsilon_{it}^s, \quad c1_{it} = \alpha_i^{c1} + \beta_i^{c1} C1_t + \varepsilon_{it}^{c1}, \quad c2_{it} = \alpha_i^{c2} + \beta_i^{c2} C2_t + \varepsilon_{it}^{c2}$
where the $\alpha$ and $\beta$ represent parameters (loadings) to be estimated, and the vector of global latent factors $(L_t, S_t, C1_t, C2_t, \tau_{g1,t}, \tau_{g2,t})$ evolves as a first-order autoregressive vector, generalizing the structure of equation
(3.3). We also assume that the idiosyncratic components for the latent factors of each market follow a
first-order autoregressive structure, according to the general specification given by equation (3.5), but
applied to this generalized vector of latent factors.
5. No-arbitrage
The specifications discussed so far consist basically of statistical representations, i.e., although the latent factors are interpreted as components of level, slope and curvature, this interpretation, even in affine models, is an approximation, as demonstrated in Almeida (2005). These representations are
merely tools for fitting and forecasting the yield curve, lacking a complete theoretical or structural justification. In this regard, the main shortcoming of these models is their lack of compatibility with the principle of no-arbitrage pricing. The fundamental result of no-arbitrage pricing, known as the fundamental theorem of asset pricing, establishes that a market is arbitrage-free if, and only if, there exists (at least) one probability measure Q, equivalent to the physical measure P, under which the sequence of discounted asset prices is a martingale (e.g. Harrison and Kreps (1979); Harrison and Pliska (1981); Delbaen and Schachermayer (1994)).
Consistency with no-arbitrage is a fundamental principle in Finance, since it establishes that an asset's return must be consistent with its level of risk, so that systematic risk-free profits cannot arise. In large, highly liquid markets the no-arbitrage principle should be enforced by the actions of rational traders. In the modeling of the term structure of interest rates, the general principle of no-arbitrage can be cast within the general framework proposed by Heath et al. (1992). A curve is consistent with no-arbitrage if it can be projected onto the space of all arbitrage-free curves under the equivalent martingale measure, and it must generally be contained in a stochastic manifold generated by the Heath-Jarrow-Morton structure, as demonstrated in Filipovic (2001).
The problem arising here is that the curves generated by the Nelson-Siegel models are never consistent with no-arbitrage, and only one restricted version of the Svensson model is consistent with no-arbitrage, but its structure is too limited for practical use, as proven by Filipovic (1999). Therefore, although models of the Nelson-Siegel and Svensson classes and their dynamic extensions present a good empirical fit to the observed data of the term structure of interest rates, they are not valid in terms of no-arbitrage consistent pricing. On the other hand, the opposite situation also occurs: the majority of the no-arbitrage models in use have a poor fit to the observed data, as demonstrated by Duffee (2002), suggesting an apparent trade-off between consistency with no-arbitrage on the one hand, and fit and forecasting power on the other. Nevertheless, recent evidence indicates that, with adequate modifications in the structure of arbitrage-free models, it is possible to obtain adequate predictive power in these models, as, for example, in Almeida and Vicente (2008).
Although there is no arbitrage-free form in the Nelson-Siegel-Svensson class, with the introduction of some modifications it is possible to produce a similar class of models with the no-arbitrage property, as shown in Christensen et al. (2007) for the Nelson-Siegel family and Christensen et al. (2008) for the Svensson family3. To make this correction for no-arbitrage, Christensen et al. (2007, 2008) employ affine term structure models. These models are quite convenient because they present interesting analytical properties, such as the existence of closed formulae for asset pricing, and are characterized by a common structure that makes it possible to encompass several models studied in the literature, as demonstrated by Dai and Singleton (2000).
In order to characterize the structure of the affine term structure models we start from the definition of the price of a zero coupon bond at time t with maturity T under the equivalent martingale measure Q, which must be given by:

(5.1) $P(t,T) = E_t^Q\left[e^{-\int_t^T r_s\, ds}\right]$,

where $r(t)$ represents the instantaneous interest rate (short rate). In this class of models $r(t)$ is an affine function of an unobserved vector of state variables (latent factors) $Y(t)$:

(5.2) $r(t) = \delta_0 + \sum_{i=1}^N \delta_y^i\, Y_i(t)$,
where the $\delta$'s represent parameters and $Y(t)$ is a so-called affine diffusion with the following structure:

(5.3) $dY(t) = \kappa(\theta - Y(t))dt + \Sigma\sqrt{S(t)}\, dW(t)$

3The derivation for the Nelson-Siegel family is a special case, using only three latent factors, as can be seen in Christensen et al. (2008).
with parameters $\kappa$ and $\theta$, where $dW(t)$ is a standard Brownian motion and $S(t)$ is a diagonal matrix with i-th element given by:

(5.4) $S(t)_{ii} = \alpha_i + \beta_i' Y(t)$.

Duffie and Kan (1996) demonstrate that, in this setting, the bond price may be written as:

(5.5) $P(t,\tau) = e^{A(\tau) - B(\tau)' Y(t)}$,
where $A(\tau)$ and $B(\tau)$ are given by the solution of the following system of ordinary differential equations:

(5.6)
$\dfrac{dA(\tau)}{d\tau} = -\theta'\kappa' B(\tau) + \dfrac{1}{2}\sum_{i=1}^N \left[\Sigma' B(\tau)\right]_i^2 \alpha_i - \delta_0$

$\dfrac{dB(\tau)}{d\tau} = -\kappa' B(\tau) + \dfrac{1}{2}\sum_{i=1}^N \left[\Sigma' B(\tau)\right]_i^2 \beta_i - \delta_y$.
The great advantage of this class of affine term structure models is that it is quite flexible, allowing the generalization of a wide range of term structure models featured in the literature, particularly in the definition of the latent factors, which can be fairly general, as indicated in Dai and Singleton (2000) and Diebold et al. (2005).
In order to obtain this arbitrage-free representation for the family of term structure models defined by the Svensson curve, Christensen et al. (2008) employ an affine structure model, assuming that the short rate is given by the sum of latent factors:

(5.7) $r_t = X_t^1 + X_t^2 + X_t^3$
and these latent factors $X_t^1, X_t^2, X_t^3, X_t^4, X_t^5$ evolve through the following system of stochastic differential equations:

(5.8) $d\begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \\ X_t^4 \\ X_t^5 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & -\lambda_1 & 0 \\ 0 & 0 & \lambda_2 & 0 & -\lambda_2 \\ 0 & 0 & 0 & \lambda_1 & 0 \\ 0 & 0 & 0 & 0 & \lambda_2 \end{pmatrix}\left[\begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \\ \theta_4^Q \\ \theta_5^Q \end{pmatrix} - \begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \\ X_t^4 \\ X_t^5 \end{pmatrix}\right]dt + \Sigma\, dW_t^Q.$
In this model, according to equation (5.1), prices of zero-coupon bonds are obtained by the following expression:

(5.9) $P(t,T) = E_t^Q\left[e^{-\int_t^T r_u\, du}\right] = \exp\left(B^1(t,T)X_t^1 + B^2(t,T)X_t^2 + B^3(t,T)X_t^3 + B^4(t,T)X_t^4 + B^5(t,T)X_t^5 + C(t,T)\right)$,

where the terms $B^i(t,T)$ and $C(t,T)$ are the unique solutions of the following systems of ordinary differential equations:
(5.10) $\begin{pmatrix} dB^1(t,T)/dt \\ dB^2(t,T)/dt \\ dB^3(t,T)/dt \\ dB^4(t,T)/dt \\ dB^5(t,T)/dt \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 & 0 \\ 0 & 0 & \lambda_2 & 0 & 0 \\ 0 & -\lambda_1 & 0 & \lambda_1 & 0 \\ 0 & 0 & -\lambda_2 & 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} B^1(t,T) \\ B^2(t,T) \\ B^3(t,T) \\ B^4(t,T) \\ B^5(t,T) \end{pmatrix}$

(5.11) $\dfrac{dC(t,T)}{dt} = -B(t,T)'\kappa^Q\theta^Q - \dfrac{1}{2}\sum_{j=1}^5 \left(\Sigma' B(t,T)B(t,T)'\Sigma\right)_{j,j}$
Solving these systems yields the term structure implied by the model:

(5.12) $y(t,T) = X_t^1 + \dfrac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)}X_t^2 + \dfrac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)}X_t^3 + \left[\dfrac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)} - e^{-\lambda_1(T-t)}\right]X_t^4 + \left[\dfrac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)} - e^{-\lambda_2(T-t)}\right]X_t^5 - \dfrac{C(t,T)}{T-t}.$
This result can be interpreted as a reparameterization of the Björk and Christensen (1999) curve with the addition of a no-arbitrage correction factor given by the term $-\frac{C(t,T)}{T-t}$, which is given by the following expression4:
(5.13) $-\dfrac{C(t,T)}{T-t} = -\dfrac{1}{2}\,\dfrac{1}{T-t}\int_t^T \sum_{j=1}^5 \left(\Sigma' B(s,T)B(s,T)'\Sigma\right)_{j,j}\, ds$,
where $\Sigma$ is the matrix of latent factor covariances. This correction factor is a function of the variances of the latent factors and also of the model's slope parameters, which, in this formulation, are assumed to be constant.
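The loadings of the five latent factors in the arbitrage-free yield equation (the Björk-Christensen loadings, excluding the yield-adjustment term) can be computed numerically as follows. This is an illustrative sketch; the function name, maturities and decay values are ours.

```python
import numpy as np

def afgns_loadings(ttm, lam1, lam2):
    """Loadings of the five latent factors in the yield equation,
    excluding the yield-adjustment term -C(t,T)/(T-t).

    ttm: time to maturity T - t (array of positive values)."""
    s1 = (1 - np.exp(-lam1 * ttm)) / (lam1 * ttm)   # slope loading, decay lam1
    s2 = (1 - np.exp(-lam2 * ttm)) / (lam2 * ttm)   # slope loading, decay lam2
    c1 = s1 - np.exp(-lam1 * ttm)                   # curvature loading, lam1
    c2 = s2 - np.exp(-lam2 * ttm)                   # curvature loading, lam2
    level = np.ones_like(ttm)
    return np.column_stack([level, s1, s2, c1, c2])

ttm = np.array([0.25, 1.0, 5.0, 10.0, 30.0])        # years to maturity
L = afgns_loadings(ttm, lam1=0.5, lam2=0.1)
```

The level loading is constant at one, the slope loadings decay monotonically from one toward zero, and the two curvature loadings are hump-shaped; this is the sense in which each slope factor must be paired with a curvature factor sharing the same decay rate.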
The specification of the Christensen et al. (2008) model is very useful because it allows any affine form for the latent factors to be used while remaining consistent with no-arbitrage. It thus makes it possible, for example, to add macroeconomic variables to the vector of latent factors, or to enrich the dependence structure of these factors. The latter is the form used in this study, including the interaction with the latent factors of the other markets so as to generalize the factor structure employed.
An important aspect to note here is that, in order to make the no-arbitrage correction possible, it is necessary to employ a structure with five latent factors, which implies additional slope and curvature factors relative to Svensson's model. Thus the original representation of the Diebold et al. (2008) model, which
4The analytical expression for this correction term is found in the Appendix of Christensen et al. (2008), but it has been omitted here for reasons of space.
displays factors of level and slope exclusively, cannot be made arbitrage-free by the methodology of Christensen et al. (2007, 2008). In order to obtain arbitrage-free representations for the generalized latent factor models proposed in section 4, we have augmented the dynamics of the latent factors by including crossed factors, i.e., each latent factor in each market depends on the latent factors of its own market plus the latent factors of the other markets, in the form:
(5.14) $\beta_{kt}^i = \Phi_i \beta_{k,t-1}^i + \Phi_j \beta_{k,t-1}^j + \epsilon_{kt}$,
where now $k = 1, 2, 3, 4, 5$ represents the five factors needed for the arbitrage-free correction, and the equation describing the yields of each market is given by:

(5.15) $y_t^i(T) = \beta_{1t}^i + \dfrac{1-e^{-\lambda_1^i(T-t)}}{\lambda_1^i(T-t)}\beta_{2t}^i + \dfrac{1-e^{-\lambda_2^i(T-t)}}{\lambda_2^i(T-t)}\beta_{3t}^i + \left[\dfrac{1-e^{-\lambda_1^i(T-t)}}{\lambda_1^i(T-t)} - e^{-\lambda_1^i(T-t)}\right]\beta_{4t}^i + \left[\dfrac{1-e^{-\lambda_2^i(T-t)}}{\lambda_2^i(T-t)} - e^{-\lambda_2^i(T-t)}\right]\beta_{5t}^i - \dfrac{C_i(t,T)}{T-t}.$
In this representation we do not adopt the stochastic volatility factor, and we keep the slope parameters $\lambda$ fixed in time, maintaining consistency with the affine specification of the model; these slope parameters, however, are estimated jointly with the other parameters of the model.
6. Bayesian Estimation
In all the specifications presented so far, we have models that can be represented by a non-linear
state-space model, where we have a non-linear observation equation for rates, and a set of state equa-
tions representing the latent factors. In some of the models we also consider the slope parameters and
the volatility as additional latent factors. Whereas the basic representation can be estimated using the Kalman filter, the non-linear forms cannot be estimated by this methodology, and even in its simplest representations this procedure suffers from several estimation problems.
Given the computational difficulties involved in the estimation of these models, ad hoc restrictions are generally put in place to facilitate the estimation, such as taking the decay parameter as fixed, or performing the estimation by means of two-stage procedures, as discussed in sections 2 and 3.
In this context, one way of performing the estimation, while using all the available information in the term structure of interest rates and avoiding the imposition of ad hoc restrictions, is to employ Bayesian estimation methods based on MCMC algorithms. As we will demonstrate next, this methodology makes it possible to address the problems that afflict the usual estimation mechanisms, such as non-linearity, identification and dimensionality. In estimation via MCMC, linear and non-linear models are approached in the same way, and one of the advantages of the Bayesian methodology is that it allows the latent factors to be treated as additional parameters to be estimated.
In Bayesian inference, the aim is to find the posterior distribution of the parameters of interest conditional on the observed sample, denoted by $p(\Theta|y)$. This posterior distribution is the result of updating a prior distribution assumed for the parameters with the information available in the sample, represented by the likelihood function.
In order to find the distribution of the parameters conditional on the sample, the following relation derived from Bayes' rule is used:

(6.1) $p(\Theta|y) = \dfrac{p(y|\Theta)\, p(\Theta)}{p(y)}$
where $p(y|\Theta)$ is the model's likelihood, $p(\Theta)$ denotes the prior distribution assumed for the parameters and $p(y)$ is the marginal distribution of the sample, which needs to be known only up to a constant of integration. Thus the posterior distribution is proportional to the product of the likelihood and the prior distribution.
After obtaining the posterior distribution, the results can be summarized by calculating, for example, the expected value and the variance of the posterior distribution of each parameter, as well as its marginal posterior distribution:
(6.2) $E(\theta_k|y) = \int \theta_k\, p(\Theta|y)\, d\Theta$

(6.3) $Var(\theta_k|y) = \int \theta_k^2\, p(\Theta|y)\, d\Theta - \left[E(\theta_k|y)\right]^2$

(6.4) $p(\theta_j|y) = \int p(\Theta|y)\, d\theta_1 \cdots d\theta_{j-1}\, d\theta_{j+1} \cdots d\theta_d$.
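In practice these integrals are approximated by averages over posterior draws. A minimal sketch, with draws simulated directly from a known Normal posterior standing in for real MCMC output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated posterior draws of a parameter theta_k; in a real application
# these would come from the MCMC sampler.
draws = rng.normal(loc=2.0, scale=0.5, size=100_000)

post_mean = draws.mean()                        # Monte Carlo estimate of (6.2)
post_var = (draws ** 2).mean() - post_mean ** 2  # Monte Carlo estimate of (6.3)
```

With a large number of draws the Monte Carlo averages converge to the posterior moments, here the mean 2.0 and variance 0.25 of the simulated posterior.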
Thus the main objective of Bayesian estimation is to obtain posterior distributions, containing prior information updated by the information in the sample, given by the likelihood function. Except for some specific cases, generally involving the use of conjugate distributions (where the prior distribution belongs to the same family as the posterior distribution), we do not have analytical forms for the posterior distributions. Nevertheless, such distributions can be obtained by techniques of numerical integration based on Monte Carlo methods. The fundamental Monte Carlo methodology in Bayesian estimation is the use of the so-called MCMC algorithms (e.g. Robert and Casella (2005), Gamerman and Lopes (2006)).
The idea of MCMC methods is to simulate a Markov chain whose stationary distribution is the posterior distribution $p(\Theta|y)$. A fundamental result is that sampling from $p(\Theta|y)$ can be factored into sampling from the full conditional distributions of the parameters, a procedure known as Componentwise Metropolis-Hastings (e.g. Ntzoufras (2009)). These conditionals are of lower dimension and can be simulated more easily. The procedure can be summarized in the following iterations:
The Clifford-Hammersley theorem (see Robert and Casella (2005) for a derivation of this result) ensures that, under certain regularity conditions, this set of conditional distributions uniquely characterizes the joint distribution $p(\Theta|y)$. An obvious advantage of this method is that it does not involve any numerical maximization, thereby avoiding the numerical problems involved in the maximization of non-linear functions, such as those found in our problem. The empirical validity of this methodology is verified by methods that check the convergence of the Markov chains to their stationary distribution.
Another important point to be mentioned in relation to the use of Bayesian inference methods is that the
use of prior information helps to solve some of the existing problems in the classic estimation, such as, for
example, the estimation of non-identified models. This point is discussed in detail in section 8.
When all the conditional distributions are known, the MCMC algorithm becomes the so-called Gibbs sampler5, where the estimation is done by sampling directly from the conditional distributions. However, when it is not possible to sample from the analytical conditional distribution, the sampling of
5For a detailed discussion of this topic see Robert and Casella (2005), Gamerman and Lopes (2006) and Ntzoufras (2009).
these conditionals can be performed by applying the Metropolis-Hastings algorithm, which is a generalization of the acceptance-rejection method for simulating random variables to the sampling of conditional distributions.
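A minimal random-walk Metropolis-Hastings sketch for a single parameter (illustrative only; the target density, step size and names are ours, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta):
    """Unnormalized log-density of a standard Normal target."""
    return -0.5 * theta ** 2

def random_walk_mh(n_iter, step=1.0):
    """Random-walk Metropolis-Hastings: propose from a symmetric Normal
    around the current state and accept with probability min(1, ratio)."""
    theta = 0.0
    draws = np.empty(n_iter)
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        log_ratio = log_target(proposal) - log_target(theta)
        if np.log(rng.uniform()) < log_ratio:   # accept the proposal
            theta = proposal
        draws[t] = theta                        # rejection keeps current state
    return draws

draws = random_walk_mh(20_000)
```

For a symmetric proposal the $q$ terms in the acceptance ratio cancel, so the acceptance probability reduces to the ratio of target densities; this is the special case behind the general componentwise expression below.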
In our problem, we cannot sample directly from all the conditional distributions, because of the non-linear specifications adopted and the use of non-conjugate distributions. Thus we will use a hybrid MCMC algorithm, combining the Gibbs algorithm and the Metropolis-Hastings algorithm, a methodology originally proposed in Tierney (1994). In this case, when we have a known conditional we use Gibbs sampling, and for the other conditionals we use Metropolis-Hastings. A hybrid MCMC algorithm (Robert and Casella (2005)) can be described by iterating the following stages:
1 - Sample a candidate value $\tilde\theta_i$ from the proposal distribution $q_i(\cdot\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\ldots,\theta_p^{(t)})$;

2 - Accept

$\theta_i^{(t+1)} = \begin{cases} \tilde\theta_i & \text{with probability } \rho \\ \theta_i^{(t)} & \text{with probability } 1-\rho \end{cases}$

where

$\rho = 1 \wedge \dfrac{g_i(\tilde\theta_i\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})\; q_i(\theta_i^{(t)}\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\tilde\theta_i,\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})}{g_i(\theta_i^{(t)}\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})\; q_i(\tilde\theta_i\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})}$

where $q_i$ is the so-called tentative or auxiliary (proposal) distribution and $g_i$ denotes the full conditional distribution of $\theta_i$.

When the model to be estimated can be placed in
a state-space formulation, a convenient way of addressing this problem consists in adopting a hierarchical
formulation. In this structure, the representation of the priors is based on a hierarchy. This formulation is particularly useful in state-space models because the hierarchical specification makes it possible to recover the distribution of the latent factors by using, as the prior distribution of the latent factor at date t, a function of the posterior of the latent factor at time t-1. A simple example of this is the so-called local level model:
(6.6) $y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \nu_t$
In this example we can use as the prior distribution of the latent factor $\mu_t$ a distribution centered on $\mu_{t-1}$, so that $\mu_t \sim \pi(\mu_{t-1})$, making direct use of the state equation specification6.
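For the local level model (6.6) with known variances, the full conditional of each state is Normal, so the latent states can be sampled one at a time. A single-site Gibbs sketch (an illustration of the hierarchical idea, not the samplers actually used in this study; all names and values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from the local level model (6.6) with known variances.
T, sig_eps, sig_nu = 200, 0.3, 0.1
mu_true = np.cumsum(sig_nu * rng.standard_normal(T))
y = mu_true + sig_eps * rng.standard_normal(T)

def gibbs_states(y, sig_eps, sig_nu, n_iter=500, burn=100):
    """Single-site Gibbs for mu_1..mu_T: each Normal full conditional
    combines the observation y_t with the neighbouring states."""
    T = len(y)
    mu = y.copy()                       # initialize states at the data
    store = np.zeros(T)
    for it in range(n_iter):
        for t in range(T):
            prec = 1 / sig_eps ** 2     # information from the observation
            mean = y[t] / sig_eps ** 2
            if t > 0:                   # prior from the state equation
                prec += 1 / sig_nu ** 2
                mean += mu[t - 1] / sig_nu ** 2
            if t < T - 1:               # information from the next state
                prec += 1 / sig_nu ** 2
                mean += mu[t + 1] / sig_nu ** 2
            mu[t] = mean / prec + rng.standard_normal() / np.sqrt(prec)
        if it >= burn:                  # average the post burn-in draws
            store += mu
    return store / (n_iter - burn)

mu_hat = gibbs_states(y, sig_eps, sig_nu)
```

The posterior mean of the states is a smoothed version of the data and tracks the true latent path more closely than the raw observations do.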
In order to achieve a complete characterization of our model, we must discuss the prior distributions employed. For the latent factors $\beta_{it}$ and $\tau_{it}$ we use the Normal-Inverse Gamma pair as priors, by means of the hierarchical characterization, where the mean is given by the autoregressive vector structure. For the parameters of the autoregressive vector $\Phi$, we assume a multivariate normal structure with variance matrix given by an inverse Wishart distribution, and for the latent stochastic volatility factor we assume $\sigma_t^2 \sim LogNormal(\phi_0 + \phi_1 \ln\sigma_{t-1}^2,\, \tau_\sigma^2)$, with a Gamma distribution for $\tau_\sigma^2$, a Normal for $\phi_0$ and finally $\phi_1 \sim Beta$. For the other parameters in the autoregressive processes, and also for the parameters which identify the factors of each market in the generalized and global latent factor models, we use a multivariate normal structure for the mean of these parameters, and an inverse Wishart for their variance matrix7. Alternative specifications implementing shrinkage procedures are discussed in section 7.
The sampling procedure employs the Gibbs algorithm for the $\beta_{it}$ factors, the $\phi_0$ parameter, the parameters of the autoregressive processes, the loading parameters in the global factor model, the Wishart distribution parameters and the hyperparameters of the Gamma distributions. For the other parameters, which do not have a known conditional distribution and are linked with the non-linear and non-conjugate specifications, we use the Metropolis-Hastings algorithm; finally, for the parameter $\phi_1$ in the stochastic volatility processes we use the algorithm known as the Slice Sampler (Neal (2003)). The model specification is then complete, assuming a
6See Koop (2003) for the estimation of state-space models using the hierarchical formulation.
7For a discussion about these specifications see Bernardo and Smith (1994).
multivariate Normal likelihood for the observed term structure, allowing us to recover the posterior distribution of the parameters through equation (6.1) with the use of estimation algorithms via MCMC.
In order to obtain the one-step-ahead predictive distribution of the model we use the relation:

(6.7) $p(\hat{y}_{t+1}|y_t) = \int p(\hat{y}_{t+1}|\Theta)\, p(\Theta|y_t)\, d\Theta$

which is the future likelihood weighted by the posterior distribution of the parameters, where $y_t$ denotes the observations up to period t. We summarize the one-step-ahead forecasts using the mean and the percentiles of the predictive distribution given by equation (6.7).
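Given MCMC output, the predictive distribution (6.7) is approximated by simulating one step ahead from each posterior draw. A sketch under our own assumptions (a toy AR(1) observation model, with simulated draws standing in for real MCMC output):

```python
import numpy as np

rng = np.random.default_rng(7)

# Posterior draws of the parameters of a toy AR(1) model y_{t+1} = phi*y_t + e,
# simulated here in place of real MCMC output.
n_draws = 50_000
phi = rng.normal(0.8, 0.02, n_draws)              # posterior draws of phi
sigma = np.abs(rng.normal(0.5, 0.05, n_draws))    # posterior draws of sigma
y_t = 1.0                                         # last observed value

# One simulated y_{t+1} per posterior draw: a sample from the predictive (6.7).
y_next = phi * y_t + sigma * rng.standard_normal(n_draws)

pred_mean = y_next.mean()
lo, hi = np.percentile(y_next, [2.5, 97.5])
```

The mean and percentiles of the simulated values are exactly the forecast summaries described in the text; parameter uncertainty widens the interval relative to a plug-in forecast.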
7. Bayesian Shrinkage
The models specified for multiple yield curves involve a large number of parameters to be estimated, mainly in the specification of the dynamics of the latent factors. The high dimensionality, together with the identification problems discussed in section 8, renders the estimation of models of the term structure of interest rates a very complicated econometric problem.
The usual solutions for the dimensionality of the parameter vector in multimarket models involve the imposition of ad hoc restrictions, which are also connected to the identification problem. In the global curve model, Diebold et al. (2008) reduce the number of parameters by considering only the level and slope components, discarding the curvature and double curvature factors. However, this procedure considerably reduces the model fit, particularly for longer maturities, where the curvature and double curvature components significantly increase the fit (e.g. Björk and Christensen (1999), Christensen et al. (2008)).
Many of those parameters, however, are expected to be non-significant and can therefore be eliminated from the model. In the specification of the generalized latent factor model, factors with distinct interpretations are not expected to be important in explaining the other factors; for example, a double-curvature factor is very unlikely to affect the movements of the slope factor. This interpretation is supported by the principal component decomposition of Litterman and Scheinkman (1991), in which the components are orthogonal by construction.
In Bayesian estimation, this problem is implicitly addressed by the prior structure used. A parameter whose posterior is expected to be zero is generally given a prior distribution concentrated at zero. This is the interpretation of the so-called Minnesota prior (e.g. Doan et al. (1984)) employed in time series models: it centers vector autoregressive models on a random walk process, imposing a zero-centered prior on the coefficients at lags greater than one and a prior centered at one on each variable's own first lag.
There is an alternative way of approaching this problem, which consists in using techniques known as Bayesian shrinkage: priors which place a greater weight on zero than the standard priors. In this study we employ two forms of shrinkage priors. The first uses the Laplace (double exponential) prior, and the second uses the generalized Minnesota prior.
Estimation using the Laplace (double exponential) prior is related to the estimation method known as LASSO (Least Absolute Shrinkage and Selection Operator), proposed by Tibshirani (1996). The LASSO estimator is obtained as the solution of an estimation problem with an ℓ1 penalty in the minimization problem:
(7.1)    \|Y - X\beta\|^2 + \lambda \sum_{j=2}^{q} |\beta_j|.
One advantage of the LASSO estimator is that, instead of merely shrinking the estimates toward zero, as happens with techniques such as ridge regression, it effectively allows some estimates to be identically equal to zero, simultaneously performing shrinkage and model selection.
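Under an orthonormal design, the LASSO solution is the soft-thresholding of the least-squares estimates, which is what produces exact zeros; ridge regression, in contrast, only rescales. A small illustration with hypothetical coefficient values:

```python
def soft_threshold(b_ols, lam):
    """LASSO solution under an orthonormal design: soft-thresholding of
    least-squares estimates, setting small coefficients exactly to zero."""
    out = []
    for b in b_ols:
        mag = abs(b) - lam
        out.append(0.0 if mag <= 0 else (mag if b > 0 else -mag))
    return out

def ridge_shrink(b_ols, lam):
    """Ridge solution under an orthonormal design: proportional shrinkage,
    which never produces exact zeros."""
    return [b / (1.0 + lam) for b in b_ols]

b_ols = [2.0, 0.3, -0.05, -1.2]   # hypothetical least-squares estimates
lam = 0.5

b_lasso = soft_threshold(b_ols, lam)   # -> [1.5, 0.0, 0.0, -0.7]
b_ridge = ridge_shrink(b_ols, lam)     # all entries remain nonzero
```

The two small coefficients are eliminated by the ℓ1 penalty, while ridge keeps every coefficient in the model.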
In a Bayesian context, the LASSO estimator can be interpreted as a posterior mode estimate under a Laplace (double exponential) prior distribution, as pointed out by Tibshirani (1996) himself and by Park and Casella (2008). The Laplace distribution is a function of two hyperparameters (µ, b) in the form:
(7.2)    \pi(\beta) = \frac{1}{2b}\, e^{-|\beta - \mu|/b},
where (µ, b) can be interpreted as location and scale parameters. Figure 7.1 presents different specifications of Laplace priors, showing that the weight placed on zero is much greater than that of, for example, Gaussian priors with the same scale factor. The only difficulty associated with the use of the Laplace prior is that this distribution is not conjugate, so an additional Metropolis-Hastings step becomes necessary in the estimation procedure. We adopted this independent Laplace prior structure for the parameters of the vector autoregressions which define the latent factors, using the values 0 and .1 for µ and b.
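A Metropolis-Hastings step for a non-conjugate Laplace prior only requires evaluating the log prior and log likelihood at the current and proposed values. A sketch under a hypothetical scalar Gaussian likelihood (the paper's actual conditionals are multivariate), with µ = 0 and b = .1 as in the text:

```python
import math
import random

random.seed(1)

def log_laplace_prior(beta, mu=0.0, b=0.1):
    """Log of the Laplace(mu, b) density used as the shrinkage prior."""
    return -math.log(2.0 * b) - abs(beta - mu) / b

def log_lik(beta, data):
    """Hypothetical Gaussian likelihood y ~ N(x * beta, 1), for illustration."""
    return -0.5 * sum((y - x * beta) ** 2 for x, y in data)

# Toy data generated with true beta = 0, so the prior should keep the
# posterior concentrated near zero.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

beta, chain = 0.5, []
for _ in range(5000):
    prop = beta + random.gauss(0, 0.1)            # random-walk proposal
    log_ratio = (log_lik(prop, data) + log_laplace_prior(prop)
                 - log_lik(beta, data) - log_laplace_prior(beta))
    if math.log(random.random()) < log_ratio:     # Metropolis-Hastings accept
        beta = prop
    chain.append(beta)

post_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

Because the Laplace log density is available in closed form, the non-conjugacy costs only this extra accept/reject step per parameter.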
The generalized Minnesota prior, proposed in Robertson (1999) and Kadiyala and Karlsson (2007) and advocated by Banbura et al. (2008) for the estimation of high-dimensional Bayesian vector autoregressive models, is a generalization of the prior proposed by Doan et al. (1984). In this formulation, the prior for the parameter matrices Φi and Φj in the generalized factor models is given by:
Figure 7.1. Laplace prior densities for scale parameters b = .5, 1 and 2.
(7.3)    vec[\Phi_i] \sim N\big(vec(\Phi_{i0}),\, \psi \otimes \Omega_0\big),
where vec is the operator which stacks the columns of a matrix into a single column vector, and
(7.4)    \psi \sim iW(S_0, \alpha_0),
where the elements of E(S_0) are \delta_i if i = k and 0 otherwise, the elements of \alpha_0 are \lambda^2 if i = k and \lambda^2 \sigma_i/\sigma_j otherwise, and iW denotes an Inverse Wishart distribution.
In this case, δi is the expected variance of each latent factor, the parameter λ controls the chosen degree of shrinkage, and σi and σj are the variabilities of the latent factors. For the models considered in this study we use a shrinkage factor λ = .1.
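One way to read this prior is that own coefficients receive prior standard deviation λ while cross coefficients receive λσi/σj, so cross-factor effects are shrunk more heavily when the explanatory factor is less variable. A sketch of this Minnesota-style scaling, noting that the exact hyperparameterization in (7.3)-(7.4) may differ:

```python
def minnesota_prior_sd(sigmas, lam=0.1):
    """Prior standard deviations for the coefficients of a factor VAR:
    lam on own coefficients (i == j) and lam * sigma_i / sigma_j on cross
    coefficients. A sketch of Minnesota-style scaling; the paper's exact
    hyperparameters may differ."""
    k = len(sigmas)
    return [[lam if i == j else lam * sigmas[i] / sigmas[j]
             for j in range(k)]
            for i in range(k)]

# Hypothetical latent-factor variabilities (level, slope, curvature).
sd = minnesota_prior_sd([1.0, 0.5, 0.25], lam=0.1)
```

With λ = .1, the prior standard deviation of the own coefficient is 0.1, while the coefficient of the less variable curvature factor on the level equation is shrunk by a factor of four relative to that.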
8. Identification
Note that, in the definition of the possible specifications for the dynamic models for the term structure
of interest rates, there is a trade-off between a richer specification and the difficulty in the computational
estimation. Dai and Singleton (2000) point out that the problems in the specification of affine models of the term structure of interest rates involve admissibility conditions, i.e., that the model leads to well-defined bond prices, and econometric identification conditions.
The identification concept in econometric models can be summarized as follows: a model is non-identified if there is more than one parameter vector defining an equivalent likelihood function. This perspective is valid both in classical models (e.g. Rothenberg (1971)) and in Bayesian models (e.g. Kadane (1974), Poirier (1998) and Aldrich (2002)).
In formal terms, consider a regular likelihood function L(θ; y), where θ ∈ Θ is a K×1 vector and Θ ⊆ R^K. The parameter θ is not identified if for some θ^(1) ∈ Θ there is another θ^(2) ∈ Θ such that L(θ^(1); y) = L(θ^(2); y) for all y. In this way, in non-identified models there is more than one parameter vector in the parameter space satisfying the estimation criterion, the maximization of the likelihood function.
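A minimal illustration of this definition: if y ~ N(θ1 + θ2, 1), only the sum θ1 + θ2 enters the likelihood, so distinct pairs with the same sum are observationally equivalent:

```python
def log_lik(theta1, theta2, y):
    """Gaussian log-likelihood in which only the sum theta1 + theta2 enters,
    so the pair (theta1, theta2) is not identified."""
    return -0.5 * sum((yi - (theta1 + theta2)) ** 2 for yi in y)

y = [0.9, 1.1, 1.0, 0.8, 1.2]   # hypothetical sample

# Two distinct parameter vectors with identical likelihood for every sample:
# the likelihood surface has a flat ridge along theta1 + theta2 = const.
l_a = log_lik(0.2, 0.8, y)
l_b = log_lik(-1.0, 2.0, y)
```

Any estimator based only on this likelihood cannot distinguish the two parameter vectors; extra information (restrictions or priors) is needed.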
According to Kadane (1974), identification is a property of the likelihood function, and thus it is the same under the classical and the Bayesian perspectives. As stated in Poirier (1998), the solution for the estimation of non-identified models is also the same under both perspectives, namely, to use more information in the model, information generally not contained in the sample. In the classical perspective, the solution to identification problems is generally the imposition of restrictions on the parameter space, usually eliminating redundant parameters from the model. The Bayesian perspective, in which identification is generally obtained through prior information, is less dogmatic. Quoting Poirier (1998):
“A Bayesian analysis of a non-identified model is always possible if a proper prior on all the parameters
is specified.”
The estimation of non-identified models through the prior rests on the observation that an adequate prior can reduce the effective sampling space of the posterior distribution, thus reducing the probability of the posterior being placed in a non-identification region. As formally put in Florens et al. (1990), the choice of an appropriate prior makes it possible to obtain an identified posterior by reducing the sigma-algebra generating the posterior distribution of the parameters of interest. Nevertheless, some caveats apply. First, there may be situations where the data are non-informative and the prior and posterior distributions are equivalent; in addition, problematic situations can arise if inadequate priors are employed. A detailed discussion of these problems can be found in Poirier (1998), which also discusses two situations directly related to the problem of estimating the term structure models proposed in this study.
The first discussion relates to the multicollinearity problem. Note that, in the general form of the Nelson-Siegel-Svensson models (eq. 4.1), there is a potential multicollinearity problem that results in non-identification of the model. The terms related to the components β2 and β4 are potentially non-identified when the slope (decay) parameters of these two factors take values close to each other. It is usually assumed that τ2 > τ1 for the identification of the model, as in Christensen et al. (2008). In the case of multicollinearity, identification can be obtained by means of an adequate prior for the relevant parameters; for the slope parameters, identification is obtained by assuming priors that lead the posterior of these parameters to distributions with maximum posterior probability of observing τ2 > τ1.
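The near-collinearity can be checked numerically: curvature loadings built from two close decay values are almost perfectly correlated as functions of maturity. A sketch using one common parametrization of the Nelson-Siegel-Svensson curvature loading (hypothetical maturities and decay values):

```python
import math

def curvature_loading(m, tau):
    """One common parametrization of the Nelson-Siegel-Svensson curvature
    loading for maturity m and decay parameter tau."""
    x = m / tau
    return (1.0 - math.exp(-x)) / x - math.exp(-x)

def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

maturities = [0.5 + 0.25 * i for i in range(40)]   # years, hypothetical

# Two close decay parameters yield nearly collinear curvature regressors,
# the multicollinearity that motivates the tau2 > tau1 restriction.
f1 = [curvature_loading(m, 2.0) for m in maturities]
f2 = [curvature_loading(m, 2.1) for m in maturities]
rho = corr(f1, f2)
```

The correlation is essentially one, so the likelihood is nearly flat along trade-offs between β2 and β4; only the prior (or the ordering restriction) separates them.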
The second relevant discussion is related to the estimation of hierarchical models, where it is possible
to show that identification can be obtained by means of an appropriate choice for the prior distribution
in the parameters involved in each hierarchy of the model. In the formulation proposed for the model,
we use a state-space representation for the evolution of the model’s latent factors, and the estimation of
this representation is given by the hierarchical formulation in which the prior for the latent factor in t+1
is the estimated posterior for this latent factor in time t. In this case we have that the informativeness
condition of the data is always respected and, with an adequate choice of priors, the model can always be estimated, thus avoiding the non-identification problems generally found in classical estimation using the Kalman filter (e.g. Duffee (2002)). This problem and its economic implications are discussed in Kim and Orphanides (2005), who point out that affine models of the term structure can be characterized by estimates that are observationally equivalent but very different in their economic interpretations.
Note that a term structure of interest rates has sufficient statistics to identify the necessary parameters, given by the past observations of the yield curve. This is precisely one of the problems of the two-stage estimation employed in Diebold and Li (2006) and Diebold et al. (2008): the first-stage estimation ignores the entire time-dependence structure between the latent factors. The simultaneous Bayesian estimation uses the parameter estimated in period t-1 as the prior for the parameter at time t, and since this prior is generally informative, we achieve the reduction discussed in Florens et al. (1990) in the sigma-algebra generating the posterior distribution of the latent factors, solving the identification problem for the models analyzed.
9. Empirical application
9.1. Database. In order to perform the empirical analysis of the proposed models, we employ yield curves from two different markets. The first curve is built from data on the term structure of the "Cupom Cambial", which can be summed up as a term structure of instruments negotiated in Brazil but with yields in dollars. Other studies modeling the Cupom Cambial curve are Pinheiro et al. (2007), which models this curve using a polynomial structure with latent variables, and Pereira (2009), which uses a simplified form of the Diebold and Li (2006) model.
The Cupom Cambial curve was built by means of a synthetic instrument calculated from the assets
transacted at BM&F ("Bolsa de Mercadorias e Futuros") (Brazil’s Commodities and Futures Exchange).
The Cupom Cambial was calculated by no-arbitrage equalizing the returns of the DDI (Contrato de
Cupom Cambial), which is a fixed-interest instrument whose remuneration accrues with the accumulated returns of the CDI (Interbank Deposit Certificate). The formula used for
calculating the Cupom Cambial is given by:
(9.1)    C_t = \left( \frac{\prod_{t=1}^{T} (1 + i_t)^{1/252}}{1 + \Delta e_t} - 1 \right) \times \frac{360}{T},
where T is the number of days between the negotiation and expiration dates of the contract, i_t is the CDI rate negotiated in the interbank market on day t, and Δe_t is the exchange rate variation, measured by the Real/dollar exchange rate (PTAX800) observed between the trading day before the date of the operation in the futures market and the last day of the month before the expiration of the contract.
Because of a distortion in this contract caused by the use of the previous day's PTAX, we replicate this bond with more liquid market instruments, employing the spot dollar, dollar futures, DI futures and Forward Rate Agreements. This methodology is used in Pereira (2009), which provides a detailed discussion of its advantages.
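The calculation described above can be sketched as follows, assuming the usual linear 360-day annualization of the accumulated CDI return deflated by the exchange-rate variation; the exact market convention may differ in details:

```python
def cupom_cambial(cdi_daily, delta_e, n_days):
    """Annualized Cupom Cambial rate: accumulated daily CDI returns deflated
    by the exchange-rate variation, annualized on a linear 360-day basis.
    A sketch consistent with the description of equation (9.1); the exact
    market convention (e.g. clean vs. dirty coupon) may differ."""
    acc_cdi = 1.0
    for it in cdi_daily:
        acc_cdi *= (1.0 + it) ** (1.0 / 252.0)
    return (acc_cdi / (1.0 + delta_e) - 1.0) * 360.0 / n_days

# Hypothetical inputs: 21 business days at a 13% p.a. CDI rate and a 0.5%
# exchange-rate depreciation over a 30-calendar-day contract.
rate = cupom_cambial([0.13] * 21, 0.005, 30)
```

The dollar-denominated rate is low here because most of the CDI accrual is offset by the assumed depreciation of the Real.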
The other curve used in this study is a yield curve built from the remunerations obtained in the Eurodollar market, the market for dollar-denominated financial deposits negotiated outside the USA. This external curve is constructed using Eurodollar term contracts traded on the Chicago Mercantile Exchange. Both curves in this study are constructed using the methodology suggested by Burghardt (2003). Note that these two instruments are chosen so as to have remuneration in the same currency, thus eliminating the influence of exchange rate variations on the returns of the different markets.
For both curves, we work with fixed maturities of 6, 9, 12, 24, 36, 48, 60, 72, 84, 96, 108 and 120 months, with the sample period ranging from March 6, 2007 to November 26, 2008, containing 402 observed curves. The descriptive statistics for each maturity of both curves are displayed in Table 1, whereas the figure below shows the evolution of the two curves.
Some characteristics of these curves are worth noting. In both curves there is, as pointed out in Pereira (2009), an increase in the average volatility and in the spreads. The final period of the Eurodollar yield curve reflects the interest rates set by the Federal Reserve.
These patterns make the yield curves interesting objects of study for the proposed models. The first point to be mentioned is the great variability in the shape of the yield curves over time, indicating that the latent factors must show great variability in these two curves. Another point is that the slope pattern of the curves evidently changes considerably over time, which justifies the use of time-varying parameters, as opposed to keeping them fixed as in the Diebold et al. (2005) model. A further point concerns the volatility structure, which is not constant in time and thus justifies the use of the stochastic volatility component.
[Figure: evolution of the (a) Cupom Cambial and (b) Eurodollar yield curves over the sample period, for maturities of 180 to 3600 days; axes Time vs. Interest Rate.]
9.2. Comparative analysis. In order to perform a complete analysis of the three classes of models proposed, we estimated the complete model of each class as well as restricted sub-models of each class. These different specifications make it possible to analyze how the different model characteristics affect the fit of the model and the results obtained. Thirteen different specifications were estimated, detailed as follows:
(1) Independent Curves - in this specification the curves are independent: the latent factors of each curve depend only on the other factors of the same curve, ignoring the interdependence with the other market. This specification corresponds to the generalized latent factor model with the restriction that the parameters Φ, θ and γ corresponding to the curve of the other market are eliminated from the specification in equations (4.6), (4.7) and (4.8).
(2) Complete Generalized Latent Factor Model - this model corresponds to equations (4.5), (4.6), (4.7)
and (4.8) with all the parameters being estimated.
(3) Generalized Latent Factor Model with restricted crossed factor - In this model we assume that
matrix Φ in equation 4.6 has complete rank for the factors of the same curve, and it is a diagonal
matrix for the factors of the other yield curve.
(4) Diagonal Generalized Latent Factor Model - in this specification, we assume the matrices Φi and Φj are diagonal, so that each factor depends only on itself at time t-1 and on the equivalent factor of the other curve at time t-1. For example, the level of the Cupom Cambial curve at time t depends only on the level of the Cupom Cambial curve at time t-1 and on the level of the Eurodollar curve at t-1, and does not depend on the other factors.
(5) Triangular Generalized Latent Factor Model - the Cupom Cambial curve depends on itself in t-1
and on the Eurodollar curve, but the Eurodollar curve depends only on itself. In this model we
assume that the local curve is influenced by the foreign curve, but the foreign curve is independent
of the Cupom Cambial curve, thus assuming a triangular structure.
(6) Generalized Global Factor Model Identified with the Eurodollar Curve. In this model we assume
that the global factor is given by the latent factors of the Eurodollar curve. In this way, the Cupom
Cambial curve in this model is a direct displacement of the Eurodollar curve with the addition
of one idiosyncratic factor. This structure is much simpler than the complete generalized global
factor because we estimate the Eurodollar latent factors directly and we obtain the factors of the
Cupom Cambial curve by estimating only the corresponding loadings.
(7) Complete Global Factor Model - In this specification we assume the complete structure of gener-
alized global factor, where the factors of both the Eurodollar curve and of the Cupom Cambial
curve are displacements of the global latent factor with the addition of idiosyncratic factors, cor-
responding to equations (4.9)- (4.15) of the generalized global factor model.
(8) Generalized Latent Factor Model with Bayesian Shrinkage via Laplace Prior - In this specification
we estimate the complete model of generalized latent factors, but employing the Laplace Prior
structure (equations (7.1), (7.2)) for the autoregressive parameters of the model. In this case we
assume that (µ, b) are given by vector (0,.1).
(9) Generalized Latent Factor Model with Bayesian shrinkage via Generalized Minnesota Prior - Once
again the complete generalized latent factor model, but now employing the Generalized Minnesota
Prior structure described in equations (7.3) and (7.4).
(10) Generalized Model with 5 factors - We use the specification of 5 factors given by equation (5.12),
but without the no-arbitrage correction and assuming the generalized latent factor structure with
interactions between the yield curves. This model is analogous to the model proposed in Björk
and Christensen (1999). We assume in this specification that the slope parameters and stochastic
volatility are constant. This model aims at verifying the gain obtained by these two additional
factors in the fit of the yield curves.
(11) Generalized Model with 5 factors and No-Arbitrage conditions - We use a complete specification of
the Christensen et al. (2008) model, but assuming the structure of generalized latent factors which
provides the interaction between latent factors of different curves, and employing the no-arbitrage
correction. This model generalizes the Christensen et al. (2008) model for more than one market.
(12) Generalized Model with 5 factors and Bayesian Shrinkage. This model is similar to model 10, but
employs the shrinkage structure via Laplace Prior.
(13) Generalized Model with 5 factors, No-Arbitrage and Bayesian Shrinkage. This specification corre-
sponds to model 11 with the no-arbitrage correction with the addition of the use of shrinkage via
Laplace Prior.
The estimation of all these models employs a burn-in period (number of discarded samples) of 5,000 iterations, and another 10,000 iterations for the construction of the posterior distributions. The convergence of the chains is verified using the Geweke and Gelman-Rubin procedures (e.g. Ntzoufras (2009)), which indicate no problems in the convergence of the simulated Markov chains.
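The Geweke diagnostic mentioned above compares the means of an early and a late segment of each chain. A simplified sketch (the full procedure uses a spectral estimate of the variance, which we omit here):

```python
import math
import random

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke convergence diagnostic: z-score comparing the mean
    of the first 10% of the chain with the mean of the last 50% (ignoring
    autocorrelation in the variance estimate, unlike the full procedure)."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

random.seed(2)
# A hypothetical stationary chain: both segments share the same mean, so
# the z-score should look like a standard normal draw.
stationary_chain = [random.gauss(0.0, 1.0) for _ in range(10_000)]
z = geweke_z(stationary_chain)    # |z| below about 2 suggests convergence
```

A large |z| signals that the early part of the chain has not yet reached the same distribution as the later part, i.e., insufficient burn-in.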
The first mechanism of comparison between models uses the DIC (Deviance Information Criterion) proposed by Spiegelhalter et al. (2002). The DIC is a Bayesian information criterion that enables model selection analogously to the commonly employed BIC and AIC criteria. It is of particular interest for the comparison of complex models with a high number of parameters, because its penalty is applied to the effective number of parameters, as defined in Spiegelhalter et al. (2002). The DIC also has the characteristic of yielding results equivalent to the robust version of the AIC criterion (e.g. Claeskens and Hjort (2008)), and thus it is also valid as a selection criterion from a classical inference perspective.
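The DIC decomposes as the posterior mean deviance plus the effective number of parameters. A minimal sketch with hypothetical log-likelihood values:

```python
def dic(loglik_draws, loglik_at_post_mean):
    """Deviance Information Criterion: DIC = Dbar + pD, where Dbar is the
    posterior mean of the deviance D = -2 log L and pD = Dbar - D(theta_bar)
    is the effective number of parameters (Spiegelhalter et al. (2002))."""
    deviances = [-2.0 * ll for ll in loglik_draws]
    d_bar = sum(deviances) / len(deviances)
    p_d = d_bar - (-2.0 * loglik_at_post_mean)
    return d_bar + p_d, p_d

# Hypothetical log-likelihood values over MCMC draws, and the log-likelihood
# evaluated at the posterior mean of the parameters.
dic_value, p_d = dic([-120.5, -121.0, -119.8, -120.2, -120.7], -119.5)
```

Since every quantity is computed from the MCMC output itself, the DIC comes almost for free once the chains are available, which is one reason it is convenient for complex hierarchical models.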
Table 2 shows the estimated DIC for each model. By this criterion, the best models are models 7 and 8, corresponding to the complete Generalized Global Factor Model and to the Generalized Latent Factor Model with shrinkage through the Laplace prior. The global factor model has considerably fewer parameters than the generalized factor model, but a more complex structure due to the global latent factors. The fact that the DIC of these two models is equivalent indicates that their in-sample fit is equivalent after the penalty for the effective number of parameters.
It is also important to note that the use of the Laplace prior significantly reduces the complexity of the model: comparing the DIC of model 2 with that of model 8, we notice a quite significant reduction. Also to be noted is that the worst model by DIC is the independent factor model, which shows that the interaction between curves increases the fitting power of these term structure models, even with the penalty for the greater complexity of the model.
Table 2. DIC
Model DIC Rank
Model 1 -41240.46 10
Model 2 -90263.49 6
Model 3 -42076.37 9
Model 4 -73328.45 8
Model 5 -96747.54 5
Model 6 -5774.245 11
Model 7 -115687.7 2
Model 8 -115814.7 1
Model 9 -74041.7 7
Model 10 -97506.19 4
Model 11 -97506.19 4
Model 12 -101149.4 3
Model 13 -101149.4 3
The result obtained by the DIC indicates that the more general structure of models 7 and 8 effectively improves the in-sample fit, even after those models are penalized for their greater complexity. Another important point is that the DIC corresponds to the global fit of the model, and thus does not differentiate the relative fit for each yield curve or for each particular maturity. It is relevant, however, to verify whether the greater complexity of the models selected by the DIC criterion also leads to better forecast power.
To undertake this analysis, we compared the predictive power of the estimated models by means of several forecast accuracy criteria. For one-step-ahead forecasts, we calculated the following criteria: ME (Mean Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MPE (Mean Percentage Error), MAPE (Mean Absolute Percentage Error) and Theil's U. The properties of these measures of forecast accuracy can be found in Hyndman and Koehler (2006). Tables 3 and 4 show these measures for the Cupom Cambial and Eurodollar curves8.
8These measures correspond to an aggregation of forecast errors over all the maturities of the yield curves; measures for each maturity were also calculated separately. These results are not shown here for reasons of space, but they are available upon request.
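These criteria can be computed directly from the forecast errors. A sketch with hypothetical yields, where Theil's U is taken as the ratio of the model RMSE to the random-walk RMSE (one common definition):

```python
import math

def forecast_measures(y, yhat, y_naive):
    """Forecast accuracy criteria used in the comparison (see Hyndman and
    Koehler (2006)). Theil's U is computed here as the ratio of the model
    RMSE to the RMSE of the naive random-walk forecast, one common form."""
    e = [a - f for a, f in zip(y, yhat)]
    n = len(e)
    rmse = math.sqrt(sum(x * x for x in e) / n)
    rmse_naive = math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_naive)) / n)
    return {
        "ME": sum(e) / n,
        "RMSE": rmse,
        "MAE": sum(abs(x) for x in e) / n,
        "MPE": 100.0 * sum(x / a for x, a in zip(e, y)) / n,
        "MAPE": 100.0 * sum(abs(x / a) for x, a in zip(e, y)) / n,
        "TheilU": rmse / rmse_naive,
    }

# Hypothetical yields (% p.a.): actual, model forecast, random-walk forecast.
m = forecast_measures([10.0, 10.2, 10.1, 10.4],
                      [10.1, 10.1, 10.2, 10.3],
                      [9.9, 10.0, 10.2, 10.1])
```

A Theil's U below one means the model beats the random walk; values well below one, as reported for the 5-factor models, indicate a substantial relative gain.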
The results obtained for the Cupom Cambial curve show that the best model by all forecast criteria is model 12, the 5-factor model with Bayesian shrinkage, whose forecast power is quite superior to that of the other models. We can interpret this result as due to the two additional slope and curvature factors providing a better fit and forecast for this yield curve, which was expected given the greater variation in the shape of the Cupom Cambial curve over time. This conclusion can also be observed through Theil's U, which shows the relative gain in forecast accuracy compared with a naive random walk forecast. We observe that the 5-factor models, with or without the no-arbitrage correction, systematically outperform the random walk on the Cupom Cambial curve. Although the samples differ, it is possible to compare these results with those obtained for the Cupom Cambial curve in Pinheiro et al. (2007): the lowest one-day-ahead Theil's U reported there is .88, whereas we obtain 0.3065 using model 12. We note, however, that the samples come from different periods and the maturities studied are also different, so this is an informal comparison.
Another important comment is that the no-arbitrage correction does not significantly reduce the predictive
power of these models, as can be observed by comparing models 10-11 and 12-13. By the mean absolute percentage error (MAPE) criterion, the no-arbitrage correction actually improves the forecast power when we compare model 10 with model 11. Thus the no-arbitrage correction, when placed in a model with sufficient flexibility - such as the 5-factor model - does not represent a loss of forecast power. In this way, we can have the best of both worlds: no-arbitrage and accuracy in the forecasts.
For the Eurodollar curve, the general results show that all the models have an adequate forecast performance. This can be verified, on the one hand, by Theil's U, which shows that all the models perform far better than the naive random walk forecast, and, on the other hand, by the MAPE criterion, which shows a low mean absolute percentage error. In this case, it is worth noting that, by both the MAE and MAPE criteria, the best model turns out to be model 7, the complete generalized global factor model, whereas by the other criteria the best model is the independent model.
These results can be interpreted by observing that the Eurodollar curve should a priori be much less sensitive to influences from the other yield curve, so the superior forecast result of the independent curves model makes sense. Changes in the Cupom Cambial yield curve should not have significant forecast power for the Eurodollar curve, and, given the lower complexity of this model, this characteristic is reflected in a smaller mean forecast error. However, we note that in general the models are characterized by a negative bias in the forecasts, which does not happen in model 7, the best model by the MAE and MAPE criteria. The result obtained by model 7 can be explained by the shape of the estimated global factors, which are closer to the Eurodollar curve than to the Cupom Cambial curve, as shown in Figure 9.3.
In the case of the Eurodollar curve, the addition of the extra slope and curvature factors does not yield better forecast power, and models 10-13 perform rather worse than the other models. This result is consistent with the stylized fact that the shapes of the yield curves of developed countries are simpler than those of emerging countries; thus the forecast power of the simpler models is greater for this curve than for the Cupom Cambial curve, which needs a more flexible specification.
9.3. The Importance of No-Arbitrage Correction. Although consistency with no-arbitrage is a fundamental condition in the specification of models of the term structure of interest rates, an interesting question is whether the observed yield curves are consistent with no-arbitrage. In the context of the Christensen et al. (2008) formulation, it is possible to measure this effect by looking at the correction factor Ci(t, T) in equation 5.15. Note that this model is basically a yield curve based on the Björk and Christensen (1999) model, with the correction factor added to ensure consistency with no-arbitrage. If the magnitude of this factor is low and not significant, we have evidence that the fitted curve itself is already arbitrage-free, and this correction factor is therefore not necessary. One way of verifying this effect is to observe the posterior distribution of the no-arbitrage correction factor estimated by the
MCMC algorithm.
Table 5 shows the 2.5%, 50% and 97.5% percentiles of the posterior distribution estimated for the no-arbitrage correction factor, expressed by the term Ci(t, T) in equation 5.13, for each maturity estimated in the Cupom Cambial and Eurodollar curves. These estimated percentiles can be interpreted as a credibility interval for the no-arbitrage correction factor, and can be used for hypothesis testing (e.g. Bernardo and Smith (1994)). The null hypothesis of interest is that the no-arbitrage correction at each maturity is equal to zero, against the alternative hypothesis that this correction is different from zero. The validity of this null hypothesis can be tested by verifying whether the credibility intervals obtained from the posterior distribution of the no-arbitrage correction factor include the point value zero.
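The interval-based test described above amounts to checking whether zero falls inside the equal-tailed interval formed by the posterior percentiles. A sketch with hypothetical posterior draws:

```python
import random

def interval_excludes_zero(draws, lo=0.025, hi=0.975):
    """True when the equal-tailed credibility interval built from posterior
    draws of the no-arbitrage correction factor excludes zero, i.e., when
    the null hypothesis of a zero correction would be rejected."""
    s = sorted(draws)
    a = s[int(lo * len(s))]
    b = s[int(hi * len(s))]
    return not (a <= 0.0 <= b)

random.seed(3)
# Hypothetical posterior draws: one correction factor centered at zero
# (interval covers zero) and one shifted away from zero (interval excludes it).
centered = [random.gauss(0.0, 1e-4) for _ in range(5000)]
shifted = [random.gauss(5e-4, 1e-4) for _ in range(5000)]
```

Applied maturity by maturity, this check reproduces the reading of Table 5: the correction is needed only where the interval lies entirely on one side of zero.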
Table 5 shows that, in general, the values estimated for the no-arbitrage correction factor have a reduced magnitude, and that only the 1440- and 1800-day maturities of the Cupom Cambial curve exclude zero from their intervals. The no-arbitrage correction would thus be needed only for these two maturities of the Cupom Cambial curve, whereas for the Eurodollar curve we cannot reject, by the estimated credibility intervals, that all the correction factors are statistically equal to zero.
These results suggest that the greater liquidity of the Eurodollar market already ensures that the data observed in the yield curve are free from systematic arbitrage opportunities, while in the Cupom Cambial curve such opportunities may still be present. These results are consistent with those of the forecast analysis, which show that, for the Eurodollar curve, the no-arbitrage correction does not significantly alter the forecast results. Nevertheless, it must be pointed out that these results are conditional on the structure assumed in these no-arbitrage verification procedures, which take the Christensen et al. (2008) model as the adequate model for arbitrage-free modeling of the term structure; the results obtained are therefore conditional on this model. Tests with other forms of no-arbitrage restriction would be needed to generalize this conclusion.
Table 5. Posterior percentiles (2.5%, 50%, 97.5%) of the no-arbitrage correction factor by maturity (months).
Maturity | Cupom Cambial 2.5% | 50% | 97.5% | Maturity | Eurodollar 2.5% | 50% | 97.5%
6 -2.505592e-04 5.627831e-05 0.0003210676 6 -2.923742e-04 -1.010111e-04 1.133912e-05
9 -2.220485e-04 5.973929e-05 0.0003037833 9 -2.099763e-04 -6.273529e-05 1.253434e-05
12 -1.978511e-04 6.004418e-05 0.0002859084 12 -1.570117e-04 -3.717365e-05 2.503620e-05
24 -1.148664e-04 5.428410e-05 0.0002036099 24 -2.398569e-05 2.081719e-05 5.001232e-05
36 -3.272735e-05 5.953053e-05 0.0001443693 36 -1.781659e-05 5.687899e-05 9.922963e-05
48 4.570273e-05 7.687623e-05 0.0001133094 48 -1.364977e-05 8.434156e-05 1.812485e-04
60 2.077555e-05 1.017708e-04 0.0001943380 60 -7.023459e-05 1.013277e-04 2.686080e-04
72 -3.571103e-05 1.293691e-04 0.0003155333 72 -1.510072e-04 1.114461e-04 3.622645e-04
84 -1.008644e-04 1.601904e-04 0.0004644550 84 -2.554487e-04 1.097855e-04 4.644888e-04
96 -1.747916e-04 1.919751e-04 0.0006244936 96 -3.797314e-04 1.102472e-04 5.769284e-04
108 -2.651789e-04 2.223300e-04 0.0007991271 108 -5.257737e-04 9.985080e-05 6.997016e-04
120 -3.731035e-04 2.527459e-04 0.0009915839 120 -6.933878e-04 8.184704e-05 8.314890e-04
9.4. Estimated Latent Factors. In order to briefly illustrate some characteristics of the estimated models, we present some graphic comparisons between results of distinct specifications. As there are several factors and distinct models, we only present some selected results. All the figures show the mean and the 2.5% and 97.5% percentiles of the posterior distributions of each latent factor, representing a 95% credibility interval. Figure 9.2 shows, for the Cupom Cambial and Eurodollar curves, the evolution of the level factor (β1) for model 1 (independent curve model), model 2 (generalized latent factors model), and model 13 (no-arbitrage model with Bayesian Shrinkage). As can be observed, the results are quite similar across the various models, which is in line with the estimation of this factor via distinct models for the term structure (e.g. Almeida (2005)), showing that the results are similar whether or not the models employ the no-arbitrage structure and different specifications.
The results for the level factor estimated by the second class of models, based on the structure of global factors, are displayed in Figure 9.3, which presents the estimated components of level and slope. Sub-figures a) and d) show the estimation of the level and slope global factors, and the other sub-figures show the
[Figure 9.2. Evolution of the level factor (β1), posterior mean with 2.5% and 97.5% percentiles: (a) Model 1 - Cupom Cambial; (b) Model 2 - Cupom Cambial; (c) Model 13 - Cupom Cambial; bottom row: the corresponding Eurodollar panels.]
transformations obtained for the curves of each market via equations 4.10 and 4.11. It can be observed that the global factors are more similar to the factors obtained for the Eurodollar curve, but it must also be observed that the idiosyncratic components are important to all the curves. Another important point to note is that the local factors obtained by the generalized global factor model are quite similar to those obtained by the other estimated models, showing the consistency in the estimation of all the models proposed, and indicating also that the Bayesian methodology proposed does not suffer identification
[Figure 9.3. Posterior mean and 95% credibility bands of the global factors and their local transformations: (a) Global Factor - Level; (b) Level Factor - Cupom Cambial Curve; (c) Level Factor - Eurodollar Curve; (d) Global Factor - Slope; (e) Slope Factor - Cupom Cambial Curve; (f) Slope Factor - Eurodollar Curve.]
problems. An identification problem would be graphically evident if the same factor displayed very distinct trajectories with similar fitting power, which does not occur with the estimated models, since in all the models the estimated factors are similar.
The importance of making the decay parameters τ1i and τ2i time variant can be observed in Figure
9.4, which shows the dynamic evolution of these parameters for the two modelled yield curves, by the
[Figure 9.4. Dynamic evolution of the decay parameters: (a) Tau 1 - Cupom Cambial Curve; (b) Tau 2 - Cupom Cambial Curve; bottom row: τ1 and τ2 for the Eurodollar curve.]
estimation in model 2. It can be noted that there is a significant time variation in these parameters, particularly in parameter τ1 for the two curves, whereas parameter τ2 has a noisier behavior and a smaller variation interval. This variation pattern shows that this modification gives the estimated models greater adaptation to the changes in the term structure of interest rates observed in Figure 9.1, and it also avoids the need for an ad hoc specification of the decay parameters as done in the studies of Diebold and Li (2006) and Diebold et al. (2008).
[Figure 9.5. Estimated stochastic volatility factors for the two curves (model 2).]
The validity of employing the stochastic volatility factors can be visualized in Figure 9.5, which shows the evolution of these two factors as estimated by model 2. The dynamics of these two factors is consistent with the volatility pattern observed in the yield curves (Figure 9.1), accompanying the periods of volatility increase and reduction in the two curves, and it also shows that these additional latent factors are important for the correct identification of the variation in the other latent factors of the model. The same behavior can be observed in all the models that include stochastic volatility.
Figure 9.6 shows some examples of forecasts produced by the proposed models. Sub-figures a) and b) present a comparison of the one-day-ahead forecasts for the Cupom Cambial and Eurodollar curves performed for all the models, obtained as the posterior mean of the one-step-ahead forecasts. Sub-figure c) shows an example of the construction of a 95% credibility interval for one-day-ahead forecasts on a given day for the Eurodollar curve, in this case employing model 2 of generalized latent factors; and finally, sub-figure d) shows a comparison between the forecasts employing the 5-factor model (model 12, continuous line) and the equivalent model with the no-arbitrage correction (model 13, dashed line) for the Cupom Cambial
Figure 9.6. One-day Ahead Forecasting: a,b) Mean of the Posterior Predictive Distribu-
tion; c) Example of Model 2 Predictive Interval; d) Effect of the Arbitrage-Free Correction:
Not Corrected (Model 2 - Continuous line) and Corrected (Model 12 - Dashed Line)
[Figure 9.6 panels: (a), (b) one-day-ahead forecasts for models 1-13 for the two curves; (c) Credibility Intervals 6/06/2008; (d) With and without no-arbitrage corrections 6/06/2008.]
curve, showing that the effects of the no-arbitrage correction have a reduced magnitude, consistent with the general result shown in Table 5.
In all these examples we make direct use of a property derived from the MCMC estimation procedure, which is the ability to build exact finite-sample credibility intervals for the latent factors and for
the model's forecasts. Note that in the original estimation procedures of the Diebold and Li (2006) and Diebold et al. (2008) models, the confidence intervals are built without taking into consideration the two-stage estimation performed; thus they only have asymptotic validity and can therefore be considerably biased in finite samples.
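Operationally, the exact intervals amount to reading empirical quantiles off the retained posterior predictive draws, with no asymptotic approximation; a minimal sketch, in which the draw matrix and maturities are simulated placeholders rather than the sampler's actual output:

```python
import numpy as np

# Hypothetical matrix of one-day-ahead posterior predictive draws:
# each row is one retained MCMC iteration, each column one maturity (days).
rng = np.random.default_rng(1)
maturities = np.array([180, 360, 720, 1440, 1800])
draws = 0.05 + 0.002 * rng.standard_normal((5000, maturities.size))

# Pointwise 95% band: exact in finite samples, read directly off the draws.
lower, median, upper = np.quantile(draws, [0.025, 0.5, 0.975], axis=0)
for m, lo, md, hi in zip(maturities, lower, median, upper):
    print(f"{m:5d} days: {lo:.4f} [{md:.4f}] {hi:.4f}")
```

Because the quantiles are taken over the full joint simulation, parameter uncertainty from every stage propagates into the band automatically.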
10. Conclusions
In this study, a series of innovations is proposed in relation to the procedures usually employed in the estimation of models for the term structure of interest rates, particularly models based on the specifications of Diebold and Li (2006), Diebold et al. (2008) and Christensen et al. (2008). These innovations make it possible to overcome various limitations and restrictions found in these methods, such as the choice of the functional form, limited to restricted versions with only level and slope factors, as in the model adopted in Diebold et al. (2008), or else the use of fixed slope parameters chosen ad hoc, as employed in Diebold and Li (2006)'s estimation. The results obtained demonstrate that there is clear evidence not only that the latent factors evolve in time, but also that other parameters, such as the slope and volatility parameters, must be addressed as additional latent factors, providing more precise fit and forecast procedures for the term structure of interest rates, especially for emerging countries' yield curves, which are characterized by richer shapes and more frequent shape changes.
The estimation procedures based on Bayesian inference employing MCMC algorithms make it possible to address the problems that generally affect the estimation of the latent factor models used in the modeling of interest rates, such as the existence of local maxima and identification problems. Estimation via MCMC does not employ numerical maximization, and the structure of prior information and the hierarchical formulation make it possible to circumvent the identification problems found in the estimation of models of the term structure of interest rates. This same estimation structure makes it possible to reduce the dimensionality of the model through the use of Bayesian Shrinkage, a procedure which proves to be quite effective, as demonstrated by the DIC information criterion in model comparison; thus the
estimation of these models does not need ad hoc restrictions, such as the exclusion of latent factors or the fixing of parameters. The proposed Bayesian Shrinkage procedures are quite effective in reducing the dimensionality and complexity of the proposed models, a problem particularly important in the context of the joint modeling of more than one market.
Bayesian inference is particularly useful in tackling problems related to the complexity of models of the term structure of interest rates characterized by non-linear structures, which are difficult to estimate by classical methodologies such as likelihood estimation via the Kalman filter. This procedure enables the construction of exact credibility intervals, and estimation in several stages is not necessary. The
estimation procedure by MCMC is of interest because all the information in the sample is directly used in
the estimation, as the hierarchical structure in state-space employs all the information in cross-section and
in time. Bayesian estimation makes it possible to estimate more complex and flexible models for the term
structure of interest rates, facilitating not only a better fit but also the use of no-arbitrage corrections,
which require a more complex structure of latent factors, as demonstrated by Filipovic (1999), Björk
and Christensen (1999) and Christensen et al. (2008). The results indicate that, by using the estimation
mechanisms proposed, it is possible to achieve, on the one hand, flexibility in the estimation, and on the
other, consistency with no-arbitrage, thus making it possible to generalize these arbitrage-free formulations
for the simultaneous fit of multiple yield curves.
This estimation methodology makes it possible to obtain the posterior distribution of all the non-observed components, parameters and latent factors, and these distributions can be employed to verify other important characteristics, such as, for example, the validity of the no-arbitrage correction by means of the posterior distribution of the no-arbitrage correction factor. Note that this parameter is a non-linear function of the slope parameters, and thus its distribution is not a standard distribution, so the use of classical inference procedures is not trivial, whereas in the Bayesian estimation this information is a standard by-product of the estimation procedure.
The results obtained in the empirical application with joint modeling of the Cupom Cambial and Eu-
rodollar curves are of great interest. These results demonstrate that the innovations proposed, such as
the use of additional latent factors for the conditional volatility and the slope parameters, are effective for
the fit and forecast of the term structure of these two markets, which are characterized by rich dynamics
in the shape of their curves. Another significant result is that the interdependence structure adopted demonstrates that there is a gain from using the information in the Eurodollar curve for the fit of the Cupom Cambial curve, but the opposite is not as important, and this accords with the size and relative importance of those two markets. Such results are confirmed by the predictive analysis performed, which establishes the validity of the proposed specifications. Moreover, a further confirmation is that the greater liquidity of the Eurodollar market impedes the occurrence of systematic opportunities for arbitrage, which is not the case for some maturities in the Cupom Cambial market.
References
Aldrich, J. (2002). How likelihood and identification went Bayesian. International Statistical Review 70, 79–89.
Almeida, C. I. R. (2005). A note on the relation between principal components and dynamic factors in
affine term structure models. Revista de Econometria 25(1), 89–114.
Almeida, C. I. R. and J. V. M. Vicente (2008). The role of no-arbitrage on forecasting: Lessons from a
parametric term structure model. Journal of Banking and Finance 32, 2695–2705.
Banbura, M., D. Giannone, and L. Reichlin (2008). Large Bayesian VARs. European Central Bank Working Paper.
Bauwens, L., M. Lubrano, and J.-F. Richard (1999). Bayesian Inference in Dynamic Econometric Models.
Cambridge University Press.
Bernardo, J. and A. Smith (1994). Bayesian Theory. Wiley.
Björk, T. and B. J. Christensen (1999). Interest rate dynamics and consistent forward rate dynamics.
Mathematical Finance 9, 323–348.
Brigo, D. and F. Mercurio (2006). Interest Rates Models - Theory and Practice (2nd Edition). Springer.
Burghardt, G. (2003). The Eurodollar futures and Options Handbook. McGrawHill.
Chan, K. C., G. A. Karolyi, F. A. Longstaff, and A. B. Sanders (1992). An empirical comparison of alternative models of the short-term interest rate. Journal of Finance 47, 1209–1227.
Christensen, J. H., F. Diebold, and G. Rudebusch (2007). The affine arbitrage-free class of Nelson-Siegel
term structure models. NBER Working Paper No. 13611.
Christensen, J. H., F. X. Diebold, and G. D. Rudebusch (2008). An arbitrage-free generalized Nelson-Siegel term structure model. Econometrics Journal, forthcoming.
Claeskens, G. and N. L. Hjort (2008). Model Selection and Model Averaging. Cambridge University Press.
Cogley, T. and T. Sargent (2001). Evolving post-World War II U.S. inflation dynamics. NBER Macroeconomics Annual 16, 331–373.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385–408.
Dai, Q. and K. Singleton (2000). Specification analysis of affine term structure models. Journal of
Finance 55, 1943–1978.
Delbaen, F. and W. Schachermayer (1994). A general version of the fundamental theorem of asset pricing. Mathematische Annalen 300, 463–520.
Diebold, F. and C. Li (2006). Forecasting the term structure of government bond yields. Journal of
Econometrics 130, 337–364.
Diebold, F., C. Li, and V. Yue (2008). Global yield curve dynamics and interactions: A generalized
Nelson-Siegel approach. Journal of Econometrics 146, 351–363.
Diebold, F. X., M. Piazzesi, and G. Rudebusch (2005). Modeling bond yields in finance and macroeconomics. American Economic Review 95(2), 415–420.
Doan, T., R. Litterman, and C. Sims (1984). Forecasting and conditional projection using realistic prior
distributions. Econometric Reviews 3, 1–100.
Duffee, G. R. (2002). Term premia and interest rate forecasts in affine models. Journal of Finance 57, 405–443.
Duffie, D. and R. Kan (1996). A yield-factor model of interest rates. Mathematical Finance 6, 379–406.
Filipovic, D. (1999). A note on the Nelson-Siegel family. Mathematical Finance 9(4), 349–359.
Filipovic, D. (2001). Consistency Problems for Heath-Jarrow-Morton Interest Rate Models. Springer-
Verlag.
Florens, J. P., M. Mouchart, and J.-M. Rolin (1990). Elements of Bayesian Statistics. CRC.
Gamerman, D. and H. Lopes (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian
Inference, Second Edition. Chapman & Hall/CRC.
Harrison, J. M. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. Journal
of Economic Theory 20, 381–408.
Harrison, J. M. and S. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215–260.
Heath, D., R. Jarrow, and A. Morton (1992). Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica 60(1), 77–105.
Hyndman, R. J. and A. B. Koehler (2006). Another look at measures of forecast accuracy. International
Journal of Forecasting 22, 679–688.
Kadane, J. B. (1974). Bayesian Analysis in Econometrics and Statistics, Chapter The role of identification
in Bayesian Theory, pp. 175–191. North-Holland.
Kadiyala, K. R. and S. Karlsson (2007). Forecasting with generalized Bayesian vector auto-regressions. Journal of Forecasting 12, 365–378.
Kim, D. H. and A. Orphanides (2005). Term structure estimation with survey data on interest rate
forecasts. Finance and Economics Discussion Series, 2005-08, Board of Directors of Federal Reserve
System.
Koop, G. (2003). Bayesian Econometrics. Wiley.
Laurini, M. P. and L. K. Hotta (2008). Bayesian extensions to Diebold-Li term structure model. In
Forecasting in Rio, FGV-RJ, Rio de Janeiro.
Litterman, R. and J. Scheinkman (1991). Common factors affecting bond returns. Journal of Fixed
Income 1, 54–61.
Lund, J. and T. Andersen (1997). Estimating continuous-time stochastic volatility models of the short-
term interest rate. Journal Of Econometrics 77, 343–377.
Morita, R. and R. Bueno (2008). Investment grade countries yield curve dynamics. In Annals of the 63rd European Meeting of the Econometric Society, Milan, 2008.
Neal, R. (2003). Slice sampling (with discussions). Annals of Statistics 31, 705–767.
Nelson, C. R. and A. F. Siegel (1987). Parsimonious modelling of yield curves. Journal of Business 60(4), 473–489.
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. Wiley.
Park, T. and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686.
Pereira, F. T. G. (2009). Curva a termo para o risco de convertibilidade: Uma abordagem utilizando o
diferencial de juros. Unpublished Working Paper.
Pinheiro, F., C. I. Almeida, and J. Vicente (2007). Um modelo de fatores latentes com variáveis macroe-
conômicas para a curva de cupom cambial. Revista Brasileira de Finanças 5(1), 79–92.
Poirier, D. J. (1998). Revising beliefs in nonidentified models. Econometric Theory 14, 483–509.
Robert, C. P. and G. Casella (2005). Monte Carlo Statistical Methods. Springer.
Robertson, J. C. and E. W. Tallman (1999). Vector autoregressions: forecasting and reality. Economic Review Q1, 4–18.
Rothenberg, T. (1971). Identification in parametric models. Econometrica 39, 577–591.
Sims, C. (2001). Comment on Sargent and Cogley’s evolving post world war ii u.s. inflation dynamics.
NBER Macroeconomics Annual 16, 373–379.
GENERALIZED EMPIRICAL LIKELIHOOD/MINIMUM CONTRAST
ESTIMATION OF STOCHASTIC DIFFERENTIAL EQUATIONS
Abstract. In this study we approach the semi-parametric estimation of Stochastic Differential Equations employing methods of generalized empirical likelihood and generalized minimum contrast. The results obtained demonstrate that the proposed estimators, particularly the exponentially tilted empirical likelihood estimator (Schennach (2007)), obtain better results than those of the Generalized Method of Moments generally used in the estimation of stochastic differential equations. These results are derived from the robustness properties of this method in the presence of problems of incorrect specification, which, in the context of the estimation of stochastic differential equations, occur through the use of approximate discretizations of the process in the construction of the moment conditions. The analyses are carried out by means of Monte Carlo experiments and of an empirical application estimating several models of short-term interest rates for a series of Treasury Bills with a one-month maturity.

Key Words: Stochastic Differential Equations, Empirical Likelihood, Generalized Minimum Contrast.

JEL Codes: C14, C22.
1. Introduction

The use of stochastic processes in continuous time in the modeling and pricing of financial instruments is one of the bases of the modern theory of Finance, and its origin can be traced back to Bachelier (1900)'s seminal study. The use of stochastic processes in continuous time is justified by the mathematical convenience in relation to the use of processes in discrete time, and by the possibility of employing the mathematical theory developed for the general class of processes known as continuous semi-martingales, making it possible to apply the whole theory of pricing by no-arbitrage (Harrison and Kreps (1979), Harrison and Pliska (1981) and Delbaen and Schachermayer (1994)) in this context. The basic objects of
the modeling of stochastic processes in continuous time are the so-called Stochastic Differential Equations, which are objects represented in the general form:

(1.1) dXt = µ(t, Xt)dt + σ(t, Xt)dWt,

where µ(t, Xt) represents the deterministic part of the process (instantaneous drift), σ(t, Xt) represents the stochastic component (volatility) of the process, and Wt is the so-called Wiener process or Brownian Motion. This representation is useful because it makes it possible to define the evolution of the process trajectories Xt by means of a representation given by a stochastic integration (e.g. Rogers and Williams (2000), Karatzas and Shreve (1987), Kloeden and Platen (1992)):

(1.2) Xt = X0 + ∫_{t0}^{t} µ(t, Xt)dt + ∫_{t0}^{t} σ(t, Xt)dWt.
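Since the integrals in (1.2) have no closed form for general µ and σ, approximate trajectories are usually simulated; a minimal Euler-type sketch, in which the mean-reverting drift and constant volatility are illustrative choices rather than specifications from the text:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, t0, t1, n, rng):
    """Approximate one trajectory of dXt = mu(t, Xt)dt + sigma(t, Xt)dWt
    on [t0, t1] using n Euler steps."""
    dt = (t1 - t0) / n
    x = np.empty(n + 1)
    x[0] = x0
    t = t0
    for i in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))              # Brownian increment
        x[i + 1] = x[i] + mu(t, x[i]) * dt + sigma(t, x[i]) * dW
        t += dt
    return x

# Illustrative mean-reverting short-rate dynamics (not taken from the text):
rng = np.random.default_rng(42)
path = euler_maruyama(mu=lambda t, x: 0.5 * (0.06 - x),   # pull toward 6%
                      sigma=lambda t, x: 0.02,            # constant volatility
                      x0=0.04, t0=0.0, t1=1.0, n=252, rng=rng)
print(path.shape)
```

Refinements such as the Milstein scheme, mentioned later in the text, add a correction term involving the derivative of σ.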
Different specifications of the drift µ(t, Xt) and volatility σ(t, Xt) processes in the stochastic differential equation give rise to processes with distinct properties. These properties enable the representation of a wide class of processes used in finance. Focusing on the modeling of short-term interest rates, a series of alternative specifications have been employed. Table 1 presents some formulations used in the literature, comprising the models of Merton (1973), Vasicek (1977), Cox et al. (1985), Dothan (1978), Black and Scholes (1973), Brennan and Schwartz (1980), Cox et al. (1980) and Cox (1975). Notably, its last line defines the model called Generalized Cox-Ingersoll-Ross (CIR), containing all the previous models as particular cases, as demonstrated in Chan et al. (1992), which includes a general discussion of the properties of these models.
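The Generalized CIR specification of Chan et al. (1992) takes the form dr = (α + βr)dt + σ r^γ dW, and the nesting can be sketched as parameter restrictions; since Table 1 itself is not reproduced here, the entries below follow the usual Chan et al. (1992) presentation:

```python
# Generalized CIR (CKLS): dr = (alpha + beta*r) dt + sigma * r**gamma dW.
# Restricting (alpha, beta, gamma) recovers the nested short-rate models.
def ckls_drift(r, alpha, beta):
    return alpha + beta * r

def ckls_diffusion(r, sigma, gamma):
    return sigma * r ** gamma

# Standard restrictions, following Chan et al. (1992):
restrictions = {
    "Merton":        dict(beta=0.0, gamma=0.0),           # dr = alpha dt + sigma dW
    "Vasicek":       dict(gamma=0.0),                     # mean reversion, constant vol
    "CIR":           dict(gamma=0.5),                     # square-root diffusion
    "Dothan":        dict(alpha=0.0, beta=0.0, gamma=1.0),
    "Black-Scholes": dict(alpha=0.0, gamma=1.0),          # geometric Brownian motion
}
# Example: CIR diffusion at r = 4% with sigma = 0.1.
print(ckls_diffusion(0.04, sigma=0.1, gamma=0.5))
```

Under the full model all three parameters are estimated jointly, so the nested specifications can be examined as testable restrictions.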
Parameter estimation in stochastic differential equations is a well developed theme in the econometric literature¹, and there is a very wide range of techniques available. This range of techniques is related to the difficulties inherent to the estimation of stochastic differential

¹For a review of the literature on stochastic differential equations see Gourieroux and Monfort (1996), Prakasa Rao (1999), Bishwal (2007) and Zivot and Wang (2006).
equations, particularly the non-existence of closed solutions for stochastic differential equations in general cases and the problem of using discretely observed data in the estimation of a process formulated in continuous time. As examples of estimation methods in this context, we have maximum likelihood, the generalized method of moments (GMM), methods of simulated moments, Martingale estimating equations, Markov chain Monte Carlo and indirect inference, and non-parametric methods. In principle, the most recommended form of estimation consists in employing the likelihood function because, under regularity conditions, the maximum likelihood estimators are consistent, efficient and asymptotically normal. However, in the context of the estimation of stochastic differential equations, the non-existence of general solutions is a general difficulty found in the use of methods based on the likelihood of the process, which is formulated by employing the transition density resulting from the solution of the stochastic differential equation.
In the absence of analytical solutions, it is necessary to use approximations in the construction of the likelihood function, such as the use of quasi-maximum likelihood methods, which generate estimators with minimum mean square error, or the use of simulated maximum likelihood, which uses trajectories simulated by Euler or Milstein discretizations in the likelihood evaluation (Pedersen (1995)), or else approximations using the Hermite expansions obtained by Ait-Sahalia (2002). Note that, given the employment of approximations in the evaluation of the likelihood function, the optimality properties of this estimator may not remain, and thus other estimators could become competitive. Estimators using moment conditions are also often employed in the estimation of stochastic differential equations. The estimation using
the generalized method of moments (GMM) of Hansen (1982), employing a simple discretization of the process, may be the most widely employed form (e.g. Chan et al. (1992)). Although the generalized method of moments is characterized by properties of consistency and asymptotic efficiency, its properties in finite samples and in the presence of specification problems may not be optimal. In order to tackle these problems we discuss the use of two classes of estimators in the estimation of stochastic differential equations employing discrete data - estimators of generalized empirical likelihood and estimators of generalized minimum contrast - comparing their performance with that of the estimators based on the Generalized Method of Moments. These estimators are semi-parametric in the sense that the parametric form of the stochastic differential equation is used through the moment conditions, but the non-observed density of the process is evaluated in a non-parametric form.
The estimators of generalized empirical likelihood (empirical likelihood, exponential tilting and exponentially tilted empirical likelihood) possess the same properties of consistency and first-order asymptotic efficiency (e.g. Smith (2001), Schennach (2007)) as the compared GMM estimators (two-stage GMM, iterative GMM, continuous updating GMM). However, theoretical results demonstrate that these estimators may have superior properties in terms of bias in finite samples, and asymptotic properties of superior order (e.g. Kitamura (2006)). Furthermore, these estimators are asymptotically efficient in the class of semi-parametric estimators (in Bickel et al. (1993)'s sense), and have optimal properties in terms of hypothesis tests: minimax optimality, optimality in the sense of large deviations, and these tests are uniformly more powerful in the generalized Neyman-Pearson sense. The class of estimators of generalized minimum contrast (exponential tilting and exponentially tilted empirical likelihood) has characteristics of robustness in the presence of specification problems. These robustness characteristics of the estimators based on generalized minimum contrast are of the utmost importance in the estimation of stochastic differential equations because, given the non-existence of exact discretizations, all the estimators of continuous processes employing discretely observed data are characterized by a problem of incorrect specification.
This study discusses the use of these methods in the estimation of stochastic differential equations, and the results obtained demonstrate that these estimators obtain superior results when compared with the generally employed techniques of the generalized method of moments. One result of particular interest is that the exponentially tilted empirical likelihood estimator (Schennach (2007)) obtains results that are much superior in terms of finite sample bias, a result derived from the robustness properties of this method in the presence of incorrect specification (e.g. Smith (2001), Schennach (2007)).
This article is structured as follows: section 2 presents a brief review of the estimation of stochastic differential equations employing the GMM. Section 3 presents the estimators based on generalized empirical likelihood and generalized minimum contrast, discussing their properties, similarities and potential advantages in the estimation of stochastic differential equations. A series of Monte Carlo experiments is performed in section 4, aiming at stressing some properties of the estimators discussed in this study. In section 5, we perform the estimation of the models in Table 1 employing GMM, generalized empirical likelihood and generalized minimum contrast estimators for a series of interest rates of Treasury Bills with a one-month maturity, and section 6 presents the results of the specification tests based on over-identification conditions for the estimated models. The final conclusions are in section 7, showing concisely that the proposed estimators, which are unprecedented in the context of the estimation of stochastic differential equations, obtain results superior to those of the Generalized Method of Moments techniques generally employed in the estimation of stochastic differential equations.
(2.1) g(θ, Xt),

where these conditions are evaluated by employing sample moments such as:

(2.2) ḡ(θ) = (1/T) Σ_{t=1}^{T} g(θ, xt).

(2.3) θ̂ = arg_θ { (1/T) Σ_{t=1}^{T} g(θ, xt) = 0 }.
Note that, except in the case where the number of parameters is equal to the number of moment conditions (exactly identified system), the problem described in 2.2 generally has no solution. In order to obtain a single solution, define the following criterion function:

(2.4) J(θ) = ḡ(θ)′ W ḡ(θ),

and the minimization of this function defines the optimum solution of the problem, where W is a positive definite weighting matrix. Hansen (1982) demonstrates that the asymptotically efficient solution of the GMM estimation is obtained when this matrix is given by:

(2.5) W* = { lim_{T→∞} Var[ √T ḡ(θ) ] }^{-1} = Ω(θ)^{-1},

and thus the optimal weight is obtained by employing the inverse of the sample variance-covariance matrix of the moment conditions. This matrix is usually estimated employing the class of HAC estimators of Newey and West (1987), given by:
(2.6) \hat{\Omega} = \sum_{s=-(T-1)}^{T-1} k_h(s)\, \hat{\Gamma}_s(\theta^{*}),

where k is a kernel function dependent on the choice of a bandwidth h, which can be chosen using the procedures of Newey and West (1987) or Andrews (1991):

(2.7) \hat{\Gamma}_s(\theta^{*}) = \frac{1}{T}\sum_{t=1}^{T} g(\theta^{*}, x_t)\, g(\theta^{*}, x_{t+s})'.
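As an illustration, the HAC estimate in (2.6)-(2.7) can be sketched with a Bartlett kernel (a minimal Python sketch; the function name and the fixed-bandwidth interface are our illustrative assumptions, not part of the original text):

```python
import numpy as np

def hac_covariance(g, bandwidth):
    """Bartlett-kernel HAC estimate of the long-run covariance of a
    T x m array of moment conditions g, in the spirit of (2.6)-(2.7)."""
    T, m = g.shape
    g = g - g.mean(axis=0)                        # center the moments
    omega = np.zeros((m, m))
    for s in range(-bandwidth, bandwidth + 1):
        w = 1.0 - abs(s) / (bandwidth + 1)        # Bartlett weights k_h(s)
        if s >= 0:
            gamma = g[:T - s].T @ g[s:] / T       # Gamma_s = (1/T) sum g_t g_{t+s}'
        else:
            gamma = g[-s:].T @ g[:T + s] / T      # Gamma_{-s} = Gamma_s'
        omega += w * gamma
    return omega
```

With bandwidth 0 this reduces to the plain sample covariance of the moments; positive bandwidths add the weighted autocovariance terms of (2.6).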
The efficient GMM estimator is then obtained as the solution of the problem:

(2.8) \hat{\theta} = \arg\min_{\theta}\; \bar{g}(\theta)'\, \hat{\Omega}^{-1}\, \bar{g}(\theta).
There are several ways of implementing the GMM estimator. The initial form, proposed by Hansen (1982), is the estimator known as two-stage GMM. This estimator is obtained by performing a first stage that produces an initial estimate \hat{\theta}^{*} = \arg\min_{\theta} \bar{g}(\theta)'\, \Omega\, \bar{g}(\theta), where \Omega is an initial weight matrix, normally an identity matrix. From this first stage, a HAC matrix \hat{\Omega}(\theta^{*}) is calculated as a function of this initial estimate, and the final GMM estimate is obtained as \hat{\theta} = \arg\min_{\theta} \bar{g}(\theta)'\, \hat{\Omega}(\theta^{*})^{-1}\, \bar{g}(\theta), with the HAC matrix obtained in the first stage.
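A minimal sketch of the two-stage procedure for a toy over-identified moment model (the mean/variance moments, function names and simulated data are illustrative assumptions, not the interest rate models of this study):

```python
import numpy as np
from scipy.optimize import minimize

def moments(theta, x):
    """Moment conditions g(theta, x_t): mean/variance model with an
    over-identifying third-moment condition (illustrative choice)."""
    mu, sig2 = theta
    e = x - mu
    return np.column_stack([e, e**2 - sig2, e**3])

def gmm_two_stage(x, theta0):
    """Two-stage GMM of Hansen (1982): identity weight in the first
    stage, inverse moment covariance in the second stage."""
    def objective(theta, W):
        gbar = moments(theta, x).mean(axis=0)
        return gbar @ W @ gbar
    # stage 1: identity weight matrix
    th1 = minimize(objective, theta0, args=(np.eye(3),), method="Nelder-Mead").x
    # stage 2: weight = inverse covariance of the moments at the stage-1 estimate
    g = moments(th1, x)
    W = np.linalg.inv(g.T @ g / len(x))
    return minimize(objective, th1, args=(W,), method="Nelder-Mead").x
```

For a Gaussian sample with mean 1 and variance 4, the two-stage estimate recovers both parameters up to sampling error.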
Note that, in this case, the results of the second stage depend on the initial estimate from the first stage, and thus this procedure can create a first-order bias impairing the performance of the estimator in finite samples (Hansen et al. (1996)). In order to solve this problem, two alternative procedures are proposed. The first procedure is known as Iterative GMM, which is a modification of the two-stage procedure. In this procedure, the estimation of the first stage is reinitialized with the result of the second-stage estimation, and this iteration continues until the variation in the vector of parameters becomes smaller than a pre-defined tolerance.
Another possible estimator is known as GMM with continuous updating (Hansen et al. (1996)). In this case the estimation of the parameter \hat{\theta} is not performed in stages, but simultaneously, by employing a numerical optimization algorithm. Starting from an initial vector \theta_0 (generally chosen employing the two-stage GMM method), the estimation is performed as \hat{\theta} = \arg\min_{\theta} \bar{g}(\theta)'\, \hat{\Omega}(\theta)^{-1}\, \bar{g}(\theta), but now \theta and \hat{\Omega}(\theta) are determined simultaneously. This procedure has the same first-order properties as the Iterative GMM estimator but, according to Hansen et al. (1996), better properties in terms of bias in finite samples. According to Newey and Smith (2004) and Anatolyev (2005), the three methods are asymptotically equivalent, but the second-order bias of the continuous updating estimator is smaller, and the iterations increase the estimator's efficiency. However, the numerical procedure can be subject to multiple modes in the objective function, which renders this estimator numerically unstable.
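The continuously updated criterion can be sketched by recomputing the weight matrix inside the objective at every trial value of \theta (a Python sketch for a toy mean/variance moment model; the moments, names and data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def cue_estimate(x, theta0):
    """Continuously updated GMM (Hansen et al. 1996): the weight matrix
    is recomputed from the moments at every trial theta, so theta and
    the weight are determined simultaneously."""
    def g(theta):
        mu, sig2 = theta
        e = x - mu
        return np.column_stack([e, e**2 - sig2, e**3])

    def objective(theta):
        G = g(theta)
        gbar = G.mean(axis=0)
        W = np.linalg.pinv(G.T @ G / len(x))  # weight updated with theta
        return gbar @ W @ gbar

    return minimize(objective, theta0, method="Nelder-Mead").x
```

A reasonable starting value matters here: as noted above, the continuously updated objective can have multiple modes, so in practice it is initialized from a two-stage GMM estimate.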
In order to estimate stochastic differential equations by GMM, it is necessary to formulate the moment conditions in terms of some discretized form of the model. The first approach employed is the simple discretization adopted in Chan et al. (1992) for the Generalized CIR model (Table 1), given by:

(2.9) X_{t+1} - X_t = \alpha_0 + \beta_0 X_t + \varepsilon_{t+1},

with the conditions E(\varepsilon_{t+1}) = 0 and E(\varepsilon_{t+1}^2) = \sigma_0^2 X_t^{2\gamma}. In this case, we can formulate the moment conditions necessary for the estimation of the parameters (\alpha, \beta, \gamma, \sigma^2) by defining \varepsilon_{t+1} = X_{t+1} - X_t - \alpha_0 - \beta_0 X_t, and defining four moment conditions in this form:
(2.10) g(\theta) = \begin{pmatrix} \varepsilon_{t+1} \\ \varepsilon_{t+1} X_t \\ \varepsilon_{t+1}^2 - \sigma_0^2 X_t^{2\gamma} \\ (\varepsilon_{t+1}^2 - \sigma_0^2 X_t^{2\gamma}) X_t \end{pmatrix},
and applying the GMM estimation defined by equation 2.8. Moment conditions for the other submodels of the Generalized CIR family can be obtained by imposing the necessary restrictions, according to Table I in Chan et al. (1992). Note that this simple discretization is not consistent: the discretization does not converge to the true solution of the process, since it ignores the time interval between observations. A simple way of obtaining a consistent discretization for this process is to employ a first-order Euler discretization, which defines moment conditions given by a residual vector of the form \varepsilon_{t+\triangle t} = X_{t+\triangle t} - X_t - (\alpha_0 + \beta_0 X_t)\triangle t, thus constructing the vector of moment conditions as:
(2.11) g(\theta) = \begin{pmatrix} \varepsilon_{t+\triangle t} \\ \varepsilon_{t+\triangle t} X_t \\ \varepsilon_{t+\triangle t}^2 - \sigma_0^2 X_t^{2\gamma} \triangle t \\ (\varepsilon_{t+\triangle t}^2 - \sigma_0^2 X_t^{2\gamma} \triangle t) X_t \end{pmatrix}.
This is the form employed in this study. Note that the use of a discretization always represents a specification problem in the inference procedure, since, even employing consistent discretizations, the bias term caused by the discretization only tends to zero when \triangle t \to 0. Note also that the time interval \triangle t employed in the discretization depends on the frequency of data observation, and thus it is not under the researcher's control. Therefore, there are two sources of bias in the estimation of stochastic differential equations: the first derived from the use of Generalized Method of Moments estimators, and an additional one generated by the incorrect specification given by the use of a non-exact discretization of the process. Note that in Chan et al. (1992)'s original study, the estimation employs a simple discretization of the model rather than the Euler discretization, and this
represents a bias increase in the estimation due to a specification with a larger approximation error. Consequences of this problem can be seen in Prakasa Rao (1999), and a supplementary discussion is presented in Section 4, which demonstrates that this discretization problem leads to an incorrect specification problem in the estimation of stochastic differential equations.
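The Euler moment vector (2.11) can be computed directly from a sampled trajectory; a minimal Python sketch (the function name `euler_moments` is ours):

```python
import numpy as np

def euler_moments(theta, x, dt):
    """T x 4 array of moment conditions (2.11) from the first-order
    Euler discretization of the generalized CIR (CKLS) model."""
    alpha, beta, sigma2, gamma = theta
    eps = x[1:] - x[:-1] - (alpha + beta * x[:-1]) * dt    # Euler residual
    v = eps**2 - sigma2 * x[:-1]**(2 * gamma) * dt         # variance condition
    return np.column_stack([eps, eps * x[:-1], v, v * x[:-1]])
```

Evaluated at the data-generating parameters of an Euler-simulated path, the sample means of the four conditions are close to zero, as the moment restrictions require.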
The opposite situation would be estimation by the maximum likelihood method, which employs not only the conditional moments of the process but all the information in the conditional densities. If the process is correctly specified and meets the regularity conditions, then it is the best asymptotically normal estimator, and it also reaches optimality in measures such as Bahadur efficiency (Kitamura (2006), DasGupta (2008)). Nevertheless, employing maximum likelihood in the estimation of stochastic differential equations is made difficult by the non-existence of closed forms for the solution of stochastic differential equations, and thus it is not possible to employ parametric forms for the maximum likelihood estimation.
An alternative, not yet explored in the literature on inference in continuous-time processes, is the use of a form of non-parametric maximum likelihood estimation known as empirical likelihood (EL). According to Kitamura (2006), assuming a sequence of IID data \{x_i\}_{i=1}^{T} from an unknown density, and defining \triangle as the simplex \left\{ (p_1, \ldots, p_T) : \sum_{t=1}^{T} p_t = 1,\; 0 \le p_t \le 1,\; t = 1, \ldots, T \right\}, the non-parametric log-likelihood function is defined as:
(3.1) \ell_{NP}(p_1, \ldots, p_T) = \sum_{t=1}^{T} \log p_t, \quad (p_1, \ldots, p_T) \in \triangle.

Assuming the validity of the moment conditions

(3.2) E\left[g(\theta, X_t)\right] = \int g(\theta, X)\, d\mu = 0, \quad \theta \in \Theta \subset \mathbb{R}^k,

the estimation problem is the maximization of the non-parametric log-likelihood subject to these conditions:

\max\; \ell_{NP}(p_1, \ldots, p_T) = \sum_{t=1}^{T} \log p_t, \quad \text{s.t.} \quad \sum_{t=1}^{T} g(\theta, X_t)\, p_t = 0.
The estimator that maximizes this expression is the maximum empirical likelihood estimate. The implicit probabilities are related to the validity of the moment conditions: they give more weight to the observations where the moment conditions are closer to zero. Note the similarity with the GMM estimation, which is a simplified form that assumes that all weights are equal, i.e., p_t = 1/T.
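For a fixed \theta and a scalar moment condition, the constrained maximization above has the well-known dual solution p_t = 1/(T(1 + \lambda g_t)), with \lambda solving \sum_t g_t/(1 + \lambda g_t) = 0; a minimal sketch for the one-dimensional case (function name and Newton scheme are our illustrative choices):

```python
import numpy as np

def el_weights(g, tol=1e-10, max_iter=100):
    """Empirical likelihood weights for a scalar moment condition:
    p_t = 1 / (T (1 + lam g_t)), lam found by Newton iterations on the
    dual first-order condition sum g_t / (1 + lam g_t) = 0."""
    T = len(g)
    lam = 0.0
    for _ in range(max_iter):
        d = 1.0 + lam * g
        f = np.sum(g / d)             # first-order condition in lam
        fp = -np.sum(g**2 / d**2)     # its derivative
        step = f / fp
        lam -= step
        if abs(step) < tol:
            break
    p = 1.0 / (T * (1.0 + lam * g))
    return p, lam
```

At the solution the weights sum to one and set the weighted moment condition exactly to zero, which is precisely the contrast with the equal-weight GMM case.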
The use of empirical likelihood is particularly important in the estimation of stochastic differential equations because, except in a few particular cases, there are no exact solutions for the stochastic differential equations, and thus it is not possible to construct analytically the transition densities of the process, which makes it impossible to construct an exact likelihood function. Empirical likelihood methods allow us to assess the likelihood of the process in a
non-parametric form, and thus they do not depend on the existence of analytical solutions for the stochastic differential equations. This non-parametric evaluation of the likelihood function is efficient in the semi-parametric sense (e.g. Bickel et al. (1993)), and, at the same time, it employs the parametric specification given by the stochastic differential equation to construct moment conditions. A difference from the GMM is that, in the generalized empirical likelihood methodology, the moment condition can be a weakly dependent and heteroskedastic process. In order to tackle this situation, Anatolyev (2005) proposes replacing g(\theta, x_t) with a smoothed version defined as:

(3.3) g^{w}(\theta, x_t) = \sum_{s=-m}^{m} w(s)\, g(\theta, x_{t-s}),
where the w(s) are weights obtained from a kernel function and adding up to one, in the spirit of a HAC estimator (Andrews (1991)). This modification makes it possible to obtain the same first-order asymptotic efficiency conditions existing in the GMM methods. In this way, the estimate given by the moment conditions is given by:

(3.4) \hat{\theta} = \arg_{\theta}\; \sum_{t=1}^{T} p_t\, g^{w}(\theta, x_t) = 0.
An interpretation of equation 3.4 in relation to the GMM estimator is that, while in over-identified models estimated by GMM the moment conditions are not exactly equal to zero, in the estimators defined by this equation the moment conditions are exactly equal to zero through the weighting by the empirical probabilities p_t. Note that, in exactly identified models, all the proposed estimators obtain similar results, because in all of them the moment conditions are always valid. In over-identified models with valid moment conditions, all these estimators produce the same asymptotic variance.
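The smoothing in (3.3) can be sketched with a simple uniform kernel whose weights sum to one (the kernel choice, the edge padding and the function name are illustrative assumptions, not the exact scheme of the original text):

```python
import numpy as np

def smooth_moments(g, m):
    """Kernel-smoothed moment conditions in the spirit of (3.3): a
    moving average of g(theta, x_t) with uniform weights
    w(s) = 1/(2m+1), s = -m, ..., m, which sum to one."""
    T, k = g.shape
    w = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    gw = np.empty_like(g, dtype=float)
    for j in range(k):
        # pad with edge values so the smoothed series keeps length T
        padded = np.concatenate([np.repeat(g[0, j], m),
                                 g[:, j],
                                 np.repeat(g[-1, j], m)])
        gw[:, j] = np.convolve(padded, w, mode="valid")
    return gw
```

Because the weights sum to one, a constant moment series is left unchanged, and for linear series the interior values are reproduced exactly; only serial dependence is averaged out.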
et al. (1993)), similar to the interpretation of the GMM estimator as a minimum \chi^2 estimator, or the interpretation of Quasi-Maximum Likelihood estimators as Minimum Contrast estimators. Define a general divergence function between two probability measures P and Q as:

(3.5) D(P, Q) = \int \phi\!\left(\frac{dP}{dQ}\right) dQ,

where \phi is a convex function. Define M as the set of all probability measures on \mathbb{R}^p, and

(3.6) P(\theta) = \left\{ P \in M : \int g(\theta, x)\, dP = 0 \right\},

and P the statistical model of all probability measures compatible with 3.6. The minimum contrast optimization problem is given by

(3.8) \hat{\theta}_n = \arg\min_{\theta, p_t}\; \sum_{t=1}^{T} h_T(p_t).
An important result is that an adequate choice of the discrepancy function can lead to a unified representation of Empirical Maximum Likelihood and Minimum Contrast estimators. This representation can be obtained when the function h_T(p_t) belongs to the Cressie-Read family of discrepancies; with restrictions on the definition of the Cressie-Read discrepancy, particular cases of several classes of estimators are obtained. The empirical likelihood method is obtained with the restriction \gamma \to 0 in the discrepancy function h_T(p_t); the generalized minimum contrast method known as exponential tilting (ET), of Kitamura and Stutzer (1997) and Imbens et al. (1998), is obtained with \gamma \to -1; and the Continuous Updating estimator employing the empirical likelihood formulation is obtained with \gamma \to 1.
Smith (2001) demonstrated that it is possible to define another estimator that also includes these estimators as particular cases. The generalized empirical likelihood (GEL) method of Smith (2001) is obtained as the solution of the following saddlepoint problem:

(3.10) \hat{\theta}_n = \arg\min_{\theta} \max_{\lambda}\; \left[ \frac{1}{T} \sum_{t=1}^{T} \rho\!\left(\lambda'\, g^{w}(\theta, x_t)\right) \right],

(3.11) \sum_{t=1}^{T} p_t\, g^{w}(\theta, x_t) = 0.

Estimators are obtained by solving the previous equation with the first-order condition:

(3.12) \sum_{t=1}^{T} p_t\, \lambda'\, \frac{\partial g^{w}(\theta, x_t)}{\partial \theta} = 0,

where:
(3.13) p_t = \frac{1}{T}\, \rho'\!\left(\lambda'\, g^{w}(\theta, x_t)\right).
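For example, with the exponential tilting choice \rho(\xi) = -\exp(\xi), the implicit probabilities in (3.13) are proportional to \exp(\lambda' g_t) after normalization; a minimal sketch (function name ours):

```python
import numpy as np

def et_probabilities(g, lam):
    """Implicit probabilities in the exponential-tilting case
    rho(xi) = -exp(xi): p_t proportional to exp(lam' g_t), normalized
    to sum to one over the T observations."""
    xi = g @ lam
    w = np.exp(xi - xi.max())   # subtract the max for numerical stability
    return w / w.sum()
```

With \lambda = 0 the probabilities reduce to the uniform weights 1/T of the GMM case; a non-zero \lambda tilts weight toward observations that help satisfy the moment conditions.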
This generalized empirical likelihood estimator includes the empirical likelihood estimator, assuming the same conditions on the \gamma of the Cressie-Read divergence function, and modifying the functions h and \rho. The empirical likelihood estimator is obtained with h(p) = -\ln np and \rho(\xi) = \ln(1-\xi); the exponential tilting estimator with h(p) = np \ln np and \rho(\xi) = -\exp(\xi); and the continuous updating estimator with h(p) = (np)^2 and \rho(\xi) = -(1+\xi)^2/2. The parameter estimates and the Lagrange multiplier estimates can be obtained by numerical optimization methods or via quasi-Newton iterative methods; the solution can also be formulated as a problem of smaller dimension by means of a dual formulation (Kitamura (2006)), and this is the general form employed in the estimation in this study. An additional class of estimators can be obtained by combining the empirical likelihood estimator and the exponential tilting estimator, generating the estimator known as exponentially tilted empirical likelihood (ETEL), proposed by Schennach (2007). This estimator is defined as:

(3.14) \hat{\theta} = \arg\min_{\theta}\; n^{-1} \sum_{t=1}^{n} \tilde{h}(p_t(\theta)),

where the probabilities p_t(\theta) solve

(3.15) \min_{\{p_t\}_{t=1}^{n}}\; n^{-1} \sum_{t=1}^{n} h(p_t)

subject to \sum_{t=1}^{n} p_t\, g(\theta, x_t) = 0 and \sum_{t=1}^{n} p_t = 1, with \tilde{h}(p_t) = -\ln(n p_t) and h(p_t) = n p_t \ln(n p_t).
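A sketch of the ETEL criterion for the scalar mean model g(\theta, x) = x - \theta (an illustrative just-identified case; the function names and toy moment are ours): the inner step computes exponential tilting probabilities, and the empirical likelihood criterion is then evaluated at them.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def etel_objective(theta, x):
    """ETEL criterion for the mean model g = x - theta: probabilities
    from exponential tilting, criterion from empirical likelihood."""
    g = x - theta
    # ET step: lam minimizes (1/n) sum exp(lam * g_t), a convex problem
    lam = minimize_scalar(lambda l: np.mean(np.exp(l * g)),
                          bounds=(-5.0, 5.0), method="bounded").x
    w = np.exp(lam * g)
    p = w / w.sum()                         # exponential tilting probabilities
    return -np.mean(np.log(len(x) * p))     # EL log-likelihood, sign-flipped
```

In this just-identified case the criterion is minimized at the sample mean, where the tilting parameter vanishes and the probabilities are uniform, consistent with the remark above that all the estimators coincide in exactly identified models.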
Note that the ETEL (exponentially tilted empirical likelihood) estimator employs the exponential tilting method to find the probabilities \hat{p}_t(\theta) and the empirical likelihood method to
estimate the parameter vector \hat{\theta}. These probabilities are related to the multipliers \lambda through the relation:

(3.16) \hat{p}_t(\theta) = \frac{\exp\!\left(\hat{\lambda}(\theta)'\, g(\theta, x_t)\right)}{\sum_{i=1}^{n} \exp\!\left(\hat{\lambda}(\theta)'\, g(\theta, x_i)\right)}.
An important property of the ETEL class of estimators is their behavior in the presence of incorrect specification. Imbens et al. (1998) point out that the empirical likelihood estimator can behave inadequately under incorrect specification, due to the presence of a singularity in its influence function; and theorem 1 in Smith (2001) demonstrates that the asymptotic properties of the empirical likelihood estimator can be severely degraded in the presence of minimal specification problems. This effect also affects the estimates of the implicit probabilities \hat{p}_t because, in the presence of specification problems, the implicit probabilities in likelihood problems tend to concentrate on the extreme observations, in opposition to what is expected from a robust estimator.
The result obtained by Smith (2001) is that, in the class of minimum discrepancy estimators, the only estimator with adequate behavior in the presence of specification problems is the exponential tilting estimator, because its influence function does not present singularities. As the ETEL estimator is a combination of the empirical likelihood and exponential tilting estimators, it maintains the characteristics of asymptotic efficiency and minimum bias of the EL estimator and, additionally, it is robust in the presence of specification problems, due to the use of the exponential tilting estimator to estimate the implicit probabilities, as shown in theorems 8-10 in Smith (2001), indicating that this estimator is \sqrt{n}-consistent even in the presence of specification problems.
We can now sum up some common properties of the estimators discussed in this study. The first property is that all the estimators presented (two-stage GMM, Iterative GMM, continuous updating GMM, generalized empirical likelihood, exponential tilting and exponentially tilted empirical likelihood) have the same consistency and first-order asymptotic efficiency properties (e.g. Smith (2001), Schennach (2007)) and are efficient in the semi-parametric sense of
Bickel et al. (1993), under the validity of the specified moment conditions. All the estimators have the same asymptotic variance, but the superior results in terms of bias and higher-order asymptotic properties are valid for the estimators based on generalized empirical likelihood, exponential tilting and exponentially tilted empirical likelihood (e.g. Kitamura (2006)). The class of estimators based on empirical likelihood also presents optimal properties in terms of hypothesis tests: these tests are optimal in the minimax and large deviation criteria and are uniformly more powerful in the generalized Neyman-Pearson sense, as demonstrated in Kitamura (2006).
However, the performance in finite samples can be rather different. The two-stage GMM estimator can be severely biased in the sample sizes employed in economics and finance, and continuous updating estimators are numerically unstable due to the existence of multiple modes in the objective function (for example, Hansen et al. (1996)). Newey and Smith (2004) demonstrate that the empirical likelihood estimator should have a smaller bias in finite samples than the bias of estimators of the exponential tilting and continuous updating classes. In empirical likelihood and exponential tilting estimators, the bias does not grow with the number of moment conditions, as happens with the GMM estimator. Newey and Smith (2004) also demonstrate that estimators based on GEL have good properties in terms of second-order bias. Another interesting property is that estimators based on GMC and GEL are invariant to linear transformations of the vector of moment conditions, which does not occur with the two-stage GMM estimator.
SR3 models, performing the estimation with the proposed estimation methods and, based on these estimations, evaluating the bias, mean square error (MSE) and mean absolute error (MAE) for each estimated parameter. Figures 4.1, 4.2, 4.3 and 4.4 show MSE and MAE sequentially for each parameter and each method, for an easier visualization of the results.

The simulation procedure employed for the Generalized CIR process uses a Milstein discretization (e.g. Milstein (1974), Kloeden and Platen (1992)) to generate process trajectories, since for this process there is no exact analytical solution for the transition density. For the Vasicek and CIR SR processes, we employed the exact transition density to generate simulated trajectories (e.g. Ait-Sahalia (2002)).
Note that this detail is of fundamental importance. Before discussing this point, we introduce the notion of strong convergence of discretizations. Suppose that we want to generate a trajectory of the stochastic differential equation dX_t = \mu(t, X_t)dt + \sigma(t, X_t)dW_t employing a discretization that generates trajectories Y_t^{\triangle} of this process, and that the trajectories of this approximation converge to the true trajectory. An approximation Y_t^{\triangle} is said to be strongly convergent of order \gamma > 0 if there is a positive constant K such that, for each \triangle,

E\left|X_t - Y_t^{\triangle}\right| \le K \triangle^{\gamma},

in which K does not depend on the discretization interval \triangle. Under the usual Lipschitz and growth conditions, it is possible to demonstrate (e.g. Kloeden and Platen (1992), Prakasa Rao (1999)) that the Euler discretization converges with strong order \gamma = 0.5 and the Milstein discretization (Milstein (1974)) is strongly convergent of order \gamma = 1.
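The difference between the two strong orders can be illustrated by simulating a process whose exact solution is available. The sketch below uses geometric Brownian motion (our illustrative choice, not one of the models in Table 1) and compares the average absolute terminal errors of the Euler and Milstein schemes driven by the same Brownian increments:

```python
import numpy as np

def strong_errors(n_paths=2000, n_steps=64, mu=0.05, sigma=0.4,
                  x0=1.0, T=1.0, seed=0):
    """Strong errors E|X_T - Y_T| of Euler and Milstein schemes for
    dX = mu X dt + sigma X dW, against the exact GBM solution."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
    euler = np.full(n_paths, x0)
    milstein = np.full(n_paths, x0)
    for k in range(n_steps):
        dw = dW[:, k]
        euler = euler + mu * euler * dt + sigma * euler * dw
        # Milstein adds the correction (1/2) sigma^2 X (dW^2 - dt)
        milstein = (milstein + mu * milstein * dt + sigma * milstein * dw
                    + 0.5 * sigma**2 * milstein * (dw**2 - dt))
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(exact - euler)), np.mean(np.abs(exact - milstein))
```

On the same increments, the Milstein terminal error is markedly smaller, and halving the step size reduces the Euler error at roughly the rate \triangle^{0.5} predicted by its strong order.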
As the discretization employed in the moment conditions is of strong order inferior to that employed in the process simulation, an incorrect specification problem arises, generated by the discretization employed. This problem occurs in a more intense form when the exact solution of the stochastic differential equation can be used to generate the process trajectory. The fundamental point is that, in the estimation based on approximated discretizations, there is

3These experiments were performed for the other models as well and produce similar results, but are not reported here.
always a bias generated by the process discretization, and one of the objectives of the Monte Carlo study is to verify whether any method manages to produce a reduction in the bias related to this effect, which can be interpreted as a specification problem. Note that in Chan et al. (1992)'s original article the discretization employed is still simpler than Euler's, and thus the existing bias in the estimators must be even greater.
The first Monte Carlo experiment corresponds to the simulation of 1,000 trajectories of size 474 of a Generalized CIR process with parameter vector given by \alpha = 0.0408, \beta = -0.5921, \sigma^2 = 1.6704 and \gamma = 1.4999. The results of this experiment are displayed in Table 2 and Figure 4.1. Each figure shows, respectively, the bias and MSE obtained by each estimator. The results obtained demonstrate that there is a relevant bias in the estimation of all the parameters, and particularly of the parameter \sigma^2. The results in terms of the size of the bias and of the mean square error are quite similar for almost all the estimators, except for the ETEL and SETEL estimators, which present far superior results in terms of bias, MSE and MAE in relation to the other methods for all the estimated parameters, which is evident in Figure 4.1.

In the Monte Carlo experiment for the Vasicek process (Table 3 and Figure 4.2), we again simulated one thousand trajectories, with parameter vector given by \alpha = 0.0154, \beta = -0.1779, \sigma^2 = 0.0004 and \gamma = 0. The results demonstrate again that the ETEL estimators' performance is superior, and it is also noticeable that, in this experiment, the estimator with the worst performance was the GMMCUE estimator. For the CIR SR process (Table 4 and Figure 4.3), we simulated one thousand trajectories of the process with \alpha = 0.0189, \beta = -0.2339, \sigma^2 = 0.0073 and \gamma = 0.5. The same pattern of better performance of the ETEL class of estimators was observed, as well as a similar performance of the other estimators.
Note that, so far, the incorrect specification problem was caused only by the use of an approximated discretization in the construction of the process' moment conditions. In order to verify whether the better performance properties of the ETEL class of estimators are valid in more general situations of incorrect specification, we employed, as data generating
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.058507    0.058507    0.058507    0.055458    0.059144    0.059415    0.042725    0.052872    0.058399    0.059222    0.04188
bias α    0.017707    0.017707    0.017707    0.014658    0.018344    0.018615    0.001925    0.012072    0.017599    0.018422    0.0010799
mse α     0.00056964  0.00056964  0.00056964  0.00047632  0.0006153   0.00061202  0.00013504  0.00036246  0.00058505  0.00060074  0.00011089
mae α     0.01897     0.01897     0.01897     0.017089    0.019782    0.019879    0.0054645   0.014678    0.019406    0.019842    0.0062548
mean β    -0.8595     -0.8595     -0.8595     -0.80945    -0.86955    -0.87356    -0.55882    -0.7687     -0.8596     -0.87196    -0.57188
bias β    -0.2674     -0.2674     -0.2674     -0.21735    -0.27745    -0.28146    0.033282    -0.1766     -0.2675     -0.27986    0.020224
mse β     0.13466     0.13466     0.13466     0.11417     0.14782     0.14621     0.0060387   0.087359    0.14515     0.14649     0.005762
mae β     0.28842     0.28842     0.28842     0.25989     0.30496     0.30573     0.050072    0.22293     0.30165     0.30589     0.053256
mean σ²   2.0247      2.0247      2.0247      1.5815      1.3286      1.344       1.7024      1.6174      1.3095      1.3495      1.709
bias σ²   0.35432     0.35432     0.35432     -0.088891   -0.34183    -0.32643    0.031976    -0.053      -0.36091    -0.32087    0.038576
mse σ²    2.9527      2.9527      2.9527      1.4709      1.936       2.1132      0.0044602   0.92555     1.4653      2.316       0.014295
mae σ²    0.77683     0.77683     0.77683     0.75317     0.30496     1.0843      0.044706    0.60878     0.93484     1.0786      0.059661
mean γ    1.4939      1.4939      1.4939      1.4426      1.3792      1.3749      1.545       1.4612      1.388       1.379       1.545
bias γ    -0.0060482  -0.0060482  -0.0060482  -0.057291   -0.12068    -0.12502    0.04513     -0.038714   -0.11192    -0.12093    0.045113
mse γ     0.026274    0.026274    0.026274    0.020593    0.041267    0.044455    0.0049082   0.014641    0.036723    0.047736    0.0050356
mae γ     0.099573    0.099573    0.099573    0.10787     0.16667     0.17463     0.047923    0.087617    0.15467     0.17471     0.050718
Table 2. Monte Carlo - Generalized CIR Model: α = 0.0408, β = -0.5921, σ² = 1.6704, γ = 1.4999
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL         SGEL        SET         SGELCUE     SETEL
mean α    0.026282    0.02631     0.032988    0.023512    0.026155    0.02619     0.018149     0.021964    0.026201    0.026319    0.017407
bias α    0.010882    0.01091     0.017588    0.0081121   0.010755    0.01079     0.0027494    0.0065644   0.010801    0.010919    0.0020066
mse α     0.00033681  0.00034073  0.027141    0.00021774  0.00033557  0.00034088  2.94e-05     0.00016122  0.00033437  0.00035097  3.1323e-05
mae α     0.012892    0.012926    0.01954     0.010099    0.012822    0.012858    0.003452     0.0085718   0.01283     0.01297     0.0035887
mean β    -0.30312    -0.30342    -0.30962    -0.26678    -0.30146    -0.30183    -0.17052     -0.24704    -0.30219    -0.30346    -0.17013
bias β    -0.12522    -0.12552    -0.13172    -0.088877   -0.12356    -0.12393    0.0073759    -0.069138   -0.12429    -0.12556    0.0077726
mse β     0.040826    0.041215    0.085012    0.025724    0.040504    0.040987    9.4271e-05   0.018829    0.040542    0.042202    0.00010285
mae β     0.14424     0.14463     0.15174     0.1087      0.14293     0.14371     0.0075038    0.089738    0.14341     0.14512     0.0079496
mean σ²   0.00039439  0.00039426  0.0041781   0.00024483  0.00039568  0.00039424  -0.00073209  0.00016984  0.00039582  0.00039419  -0.00043392
bias σ²   -5.6134e-06 -5.7428e-06 0.0037781   -0.00015517 -4.3172e-06 -5.7626e-06 -0.0011321   -0.00023016 -4.1753e-06 -5.8097e-06 -0.00083392
mse σ²    7.278e-10   7.3255e-10  0.0088959   1.4396e-06  7.0684e-10  7.3271e-10  1.3833e-05   2.6605e-06  7.0188e-10  7.3365e-10  1.8212e-05
mae σ²    2.1654e-05  2.1722e-05  0.0038054   0.00021042  0.14293     2.1735e-05  0.0016887    0.00033096  2.1213e-05  2.1738e-05  0.002153
Table 3. Monte Carlo - Vasicek Model: α = 0.0154, β = -0.1779, σ² = 0.0004
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.027282    0.027373    0.027345    0.026274    0.026822    0.02717     0.035421    0.024065    0.026855    0.027303    0.032218
bias α    -0.011618   -0.011527   -0.011555   -0.012626   -0.012078   -0.01173    -0.003479   -0.014835   -0.012045   -0.011597   -0.0066823
mse α     0.00025651  0.00025636  0.00025737  0.00025445  0.00026778  0.0002604   0.00017471  0.00031365  0.00027164  0.00025814  0.00022287
mae α     0.013893    0.013872    0.013915    0.014255    0.014273    0.01401     0.0095164   0.016122    0.01429     0.013928    0.011389
mean β    -0.35358    -0.35494    -0.35469    -0.25389    -0.34377    -0.35199    -0.22432    -0.25006    -0.34409    -0.35426    -0.22223
bias β    -0.11968    -0.12104    -0.12079    -0.019993   -0.10987    -0.11809    0.009581    -0.016158   -0.11019    -0.12036    0.011673
mse β     0.037413    0.038193    0.038221    0.0085822   0.03374     0.037286    0.00032757  0.007739    0.034579    0.038137    0.00033459
mae β     0.14679     0.14817     0.14801     0.049654    0.13771     0.14582     0.011027    0.0457      0.13865     0.14779     0.012825
mean σ²   0.007215    0.007212    0.0072157   0.0077384   0.0072538   0.0072123   0.01034     0.0087777   0.0072597   0.0072157   0.011519
bias σ²   -8.5032e-05 -8.796e-05  -8.4257e-05 0.00043839  -4.6153e-05 -8.7746e-05 0.0030401   0.0014777   -4.0337e-05 -8.4268e-05 0.0042193
mse σ²    2.9662e-07  2.972e-07   2.9673e-07  6.2162e-06  5.1309e-07  2.9714e-07  7.3385e-05  3.0028e-05  4.3382e-07  2.9759e-07  9.4528e-05
mae σ²    0.00040581  0.00040617  0.00040628  0.00090954  0.13771     0.00040711  0.0041124   0.0018955   0.00042241  0.00040691  0.0054431
Table 4. Monte Carlo - CIR SR Model: α = 0.0189, β = -0.2339, σ² = 0.0073, γ = 0.5
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.0 to 2.5]
process, trajectories of the Generalized CIR process with parameter vector \alpha = 0.0408, \beta = -0.5921, \sigma^2 = 1.6704 and \gamma = 1.4999. However, as specification of the estimated model, we now employed a CIR SR model, assuming that \gamma = 0.5.

The results of this experiment (Table 5 and Figure 4.4) demonstrate that, in this general case, a better performance of the ETEL and ET estimators also occurs, but the other estimators have a much worse performance in the estimation of the parameter \sigma^2. Note that the incorrect specification problem is expected, in this situation, to affect mainly the estimation of the process variance because, in the classes of CIR models, the process volatility is a function of the level of the process through the parameter \gamma.

In order to perform the model comparison procedure with real data, we followed the basic structure of Chan et al. (1992)'s study, estimating the generalized CIR model and the eight submodels (Merton, Vasicek, CIR SR, Dothan, Brennan-Schwartz, GBM, CIR VR and CEV, in Chan et al. (1992)'s notation), with an expanded sample of one-month maturity Treasury Bill yields. The sample consists of monthly data from July 1964 to December 2003, totalling 475 observations. The data employed are extracted from the database of the Center
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.058858    0.068777    0.046083    0.0198      0.031206    0.042609    0.01872     0.016732    0.03132     0.054847    0.021016
bias α    0.018058    0.027977    0.0052827   -0.021      -0.0095939  0.0018092   -0.02208    -0.024068   -0.0094795  0.014047    -0.019784
mse α     0.00094603  0.0015446   0.0019998   0.00068434  0.0023605   0.0033731   0.0032493   0.00066272  0.011352    0.0025003   0.011872
mae α     0.021676    0.033352    0.036552    0.023599    0.022699    0.050106    0.023033    0.024875    0.025205    0.043429    0.027102
mean β    -0.88012    -1.0498     -0.686      -0.25302    -0.35149    -0.63736    -0.22108    -0.23068    -0.3347     -0.83664    -0.22787
bias β    -0.28802    -0.45772    -0.093905   0.33908     0.24061     -0.045256   0.37102     0.36142     0.2574      -0.24454    0.36423
mse β     0.1998      0.40413     0.52967     0.14488     0.18393     0.8179      0.13867     0.13683     0.18573     0.67003     0.16751
mae β     0.34075     0.54251     0.5972      0.37389     0.39164     0.80459     0.37102     0.3685      0.38477     0.71483     0.37528
mean σ²   0.0081117   0.0080927   0.0082476   0.0093165   0.026925    0.010682    0.014684    0.013577    0.014008    0.0085045   0.018849
bias σ²   -1.6623     -1.6623     -1.6622     -1.6611     -1.6435     -1.6597     -1.6557     -1.6568     -1.6564     -1.6619     -1.6516
mse σ²    2.7632      2.7633      2.7628      2.7593      2.9352      2.7588      2.7417      2.7452      2.7466      2.7621      2.7358
mae σ²    1.6623      1.6623      1.6622      1.6611      0.39164     1.6604      1.6557      1.6568      1.6565      1.6619      1.6539
Table 5. Monte Carlo - Misspecified Model: α = 0.0408, β = -0.5921, σ² = 1.6704, γ = 1.4999
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.00 to 0.14]
for Research and Security Prices (CRSP DATA). Figure 5.1 displays the series employed, and the descriptive statistics are placed in Table 6. It is noticeable that their general behavior is
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.0 to 2.5]
[Figure 5.1: one-month Treasury Bill yield series r (approximately 0.05 to 0.15) by observation index]
in the parameter estimates (c. par) and in the estimation of the Lagrange multipliers (c. λ) for the estimators outside the GMM class, where the value 1 means convergence.
The following tables (8, 9, 10, 11, 12, 13 and 14) display the results of the estimation of the over-identified systems. In the estimation of Vasicek's model (Table 8), there are three parameters and four moment conditions. In these estimations, a greater variability in the estimates of the parameters \alpha and \beta is noticeable; however, given the pattern of standard deviations obtained, these estimates are not statistically different. In the estimation of the CIR SR model (Table 9), the results are quite similar across all the estimation methods, except for the estimate of \beta for the GELCUE estimator, which can be related to a local maximum problem.

The results of the Merton model estimation (Table 10) show two behavior patterns, with values of \alpha close to 0.0003 for the GMM2S, GMMITER, GELCUE and SGELCUE methods, and values close to 0.0065 for the other estimators, noting, however, that in these cases there was no convergence in the estimation of \lambda.
         GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α        0.025680   0.025680   0.025680   0.030350   0.025650   0.025680   0.032970   0.025630   0.025640   0.031770   0.032090
se α     0.013150   0.013150   0.013150   0.012730   0.012680   0.012680   0.012640   0.015990   0.016000   0.016920   0.016860
β        -0.460280  -0.460280  -0.460280  -0.549710  -0.459500  -0.460360  -0.568730  -0.459610  -0.459780  -0.591160  -0.592850
se β     0.272150   0.272150   0.272150   0.263890   0.262940   0.262940   0.260930   0.301850   0.301880   0.320180   0.317150
σ²       1.673600   1.673600   1.673600   1.673600   0.920420   0.924040   1.332080   0.923730   0.925020   1.449700   1.323670
se σ²    2.295810   2.295810   2.295810   1.826640   1.314290   1.318720   2.350850   2.376070   2.379200   3.683310   3.412470
γ        1.461410   1.461410   1.461410   1.413150   1.339400   1.340160   1.472900   1.340100   1.340350   1.340880   1.509510
se γ     0.262060   0.262060   0.262060   0.256570   0.270330   0.270200   0.340430   0.469460   0.469440   0.467070   0.472950
c. par   1          1          1          1          1          1          1          1          1          1          1
c. λ     -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.012770   0.009700   0.005720   0.015400   0.009320  -0.002170   0.008820   0.001950   0.005000   0.008650   0.005000
se α      0.011490   0.011420   0.011380   0.010800   0.012040   0.010740   0.010750   0.013690   0.012510   0.010770   0.012510
β        -0.209910  -0.145250  -0.058160  -0.177900  -0.176290   0.100590  -0.168670  -0.168920  -0.180300  -0.125850  -0.180300
se β      0.242210   0.241110   0.240380   0.225640   0.249630   0.225830   0.225710   0.221900   0.229360   0.204440   0.229360
σ²        0.000390   0.000380   0.000380   0.000400   0.003620   0.000400   0.000710   0.007560   0.005170   0.000370   0.005170
se σ²     0.000060   0.000060   0.000060   0.000060   0.000620   0.000060   0.000070   0.003450   0.001870   0.000080   0.001870
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          1          1          0          0          0          1          0
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.014700   0.013510   0.011000   0.012730   0.048840   0.005680   0.012190   0.012370   0.017930   0.013130   0.012060
se α      0.011540   0.011520   0.011490   0.010850   0.011860   0.010850   0.010810   0.011110   0.012610   0.011270   0.011030
β        -0.245170  -0.219500  -0.164570  -0.220600  -0.626310  -0.058160  -0.218900  -0.224410  -0.230920  -0.214410  -0.216110
se β      0.242760   0.242370   0.241820   0.226930   0.237360   0.227960   0.226900   0.210930   0.238380   0.214710   0.208060
σ²        0.008100   0.008050   0.008000   0.008520   0.010370   0.008160   0.004210   0.005910   0.016620   0.007830   0.002990
se σ²     0.001110   0.001110   0.001110   0.001690   0.001230   0.001200   0.001250   0.001490   0.002410   0.001490   0.001660
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          1          1          0          0          1          1          0
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.002940   0.002960   0.003050   0.000820   0.000720   0.002610   0.000720   0.000850   0.000690   0.002800   0.000690
se α      0.002570   0.002570   0.002570   0.002370   0.002370   0.002410   0.002370   0.002490   0.002540   0.003130   0.002540
σ²        0.000390   0.000380   0.000380   0.001030   0.000930   0.000400   0.000930   0.001100   0.000940   0.000370   0.000940
se σ²     0.000060   0.000060   0.000060   0.000190   0.000180   0.000060   0.000180   0.000320   0.000280   0.000080   0.000280
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          0          1          0          0          0          1          0
Similar results are obtained for the parameter σ² for all methods in the estimation of the Dothan model (Table 11), with convergence in all estimations.
In the estimation of the Brennan-Schwartz (Table 12), CIR VR (Table 13) and CEV (Table 14) models, we obtained success in the convergence of the parameter vector and the vector of Lagrange multipliers in all the methods, and, as expected, the results obtained are quite similar in all the estimators used, except for the estimators based on exponentially tilted empirical likelihood.
One way of undertaking the specification tests in the context of GMM estimation in overidentified models is through the distance between the moment conditions and zero. In the overidentified case, the greater proximity of the moment conditions evaluated at θ̂ to zero would be evidence of the validity of the specification employed. A way of defining a test statistic is through the criterion function itself, which originates the so-called J-test, whose statistic is given by:
(6.1)    J = T g(θ̂)′ [Ω̂(θ̂)]⁻¹ g(θ̂).
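In practice the J statistic of eq. (6.1) is a quadratic form in the averaged moment conditions. The following minimal sketch computes it on purely hypothetical moment data (the function name and the iid covariance estimate are assumptions of this illustration, not part of the paper):

```python
import numpy as np

def j_statistic(g_bar, omega_hat, T):
    """Hansen's J statistic: J = T * g_bar' Omega^{-1} g_bar (eq. 6.1)."""
    return float(T * g_bar @ np.linalg.solve(omega_hat, g_bar))

# Toy data: 500 observations of 4 hypothetical per-observation moment conditions.
rng = np.random.default_rng(0)
G = rng.standard_normal((500, 4))
g_bar = G.mean(axis=0)
omega_hat = np.cov(G, rowvar=False)   # iid covariance estimate of the moments
J = j_statistic(g_bar, omega_hat, T=500)
```

Under a correct specification, J would be compared with a χ²(m − k) critical value.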
The asymptotic distribution of the test statistic under the null hypothesis of correct specification is a χ²(m − k) distribution, whose degrees of freedom are given by the number of moments in excess of the number of parameters estimated. In the estimation by the methods of generalized minimum contrast/generalized empirical likelihood, it is also possible to construct two alternative specification tests, the Lagrange multiplier (LM) and the likelihood ratio (LR) tests, as discussed in Smith (2001), employing the Lagrange multipliers of equation 3.10. The intuition of the LM test is similar to that of the J test - if the moment conditions are valid, the Lagrange multipliers must not be far from zero - and thus it is not necessary to impose restrictions on the model to make the moment conditions equal to zero. The form of the LM test in this context is given by:
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
σ²        0.144400   0.143860   0.143500   0.151650   0.152360   0.143570   0.141730   0.156020   0.147600   0.142890   0.137570
se σ²     0.018830   0.018840   0.018840   0.020320   0.020320   0.020310   0.020310   0.027350   0.027120   0.027040   0.026990
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.019800   0.019680   0.018880   0.020820   0.019830   0.017060   0.032160   0.025000   0.022600   0.019920   0.024570
se α      0.011810   0.011810   0.011800   0.011320   0.011240   0.011240   0.011760   0.012850   0.012650   0.012490   0.013460
β        -0.343020  -0.340200  -0.323030  -0.322980  -0.338490  -0.287340  -0.298100  -0.446630  -0.400510  -0.347120  -0.315520
se β      0.246860   0.246820   0.246660   0.236550   0.235330   0.235330   0.240360   0.244610   0.241050   0.238700   0.250080
σ²        0.144450   0.144100   0.143670   0.158580   0.146250   0.143280   0.127940   0.152150   0.146510   0.142520   0.134780
se σ²     0.018530   0.018530   0.018540   0.019680   0.019650   0.019650   0.019790   0.025920   0.025780   0.025690   0.025900
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
σ²        1.947930   1.935290   1.973020   2.134040   2.084420   2.031730   1.930910   2.089810   1.991970   1.949750   1.894050
se σ²     0.260370   0.260230   0.260670   0.265510   0.264360   0.263270   0.273340   0.341910   0.337720   0.336220   0.334510
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
β         0.061900   0.062120   0.065920   0.054140   0.052030   0.051540   0.011800   0.029150   0.028310   0.060810  -0.006150
se β      0.054340   0.054340   0.054340   0.050480   0.050480   0.050460   0.050610   0.054950   0.055490   0.056250   0.056190
σ²        0.238460   0.149560   0.149790   0.729960   0.500060   0.213700   0.732500   0.606200   0.593460   0.143730   0.560070
se σ²     0.437080   0.299670   0.299890   1.067110   0.773820   0.377510   1.432650   1.653030   1.748400   0.446840   2.093630
γ         1.098550   1.016000   1.016110   1.296680   1.229400   1.075610   1.365170   1.254660   1.290140   1.014000   1.355840
se γ      0.337350   0.365020   0.364740   0.278770   0.292490   0.327000   0.377480   0.503040   0.546550   0.560770   0.700410
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
(6.2)    LM = T λ̂′ Ω̂(θ̂) λ̂.
Note that it is also possible to perform specification analyses by considering the distributions estimated individually for each λ̂, instead of the joint LM test⁴. The likelihood ratio (LR) test is obtained by comparing the objective function ρ under the unrestricted model with the estimation of a restricted model, formulated in the context of generalized minimum contrast models, as defined in section 3. This form is given by:
" #
T
X
(6.3) LR = 2 ρ λb′ gw (θ,
b xt ) − ρ(0)
t=1
The LM and LR tests are also distributed asymptotically as χ²(m − k). Although the J, LM and LR tests are asymptotically equivalent, the optimality results in models of empirical likelihood/generalized minimum contrast (e.g. Kitamura (2006)) - as well as the better properties in point estimation in finite samples - indicate that the LR and LM tests must have better properties in finite samples than the J-test obtained by GMM, which can be severely downward biased in finite samples, as pointed out by Zhou (2000), leading to a greater probability of incorrect acceptance of a null hypothesis of correct model specification. For the GMM method only the J-test is defined. Note that the validity of LM and LR depends on the convergence in the estimation of λ.
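Given a converged multiplier vector λ̂ and the per-observation moment conditions, the LM and LR statistics are direct to evaluate. A schematic sketch with hypothetical data, using the exponential-tilting carrier ρ(v) = −exp(v) as one possible choice of ρ (the variable values are illustrative, not estimates from the paper):

```python
import numpy as np

def lm_statistic(lam, omega_hat, T):
    """LM = T * lambda' Omega(theta_hat) lambda (eq. 6.2)."""
    return float(T * lam @ omega_hat @ lam)

def lr_statistic(lam, G, rho):
    """LR statistic: twice the centered carrier summed over observations
    (one common normalization of eq. 6.3).
    G: (T, m) array of per-observation moments g_w(theta_hat, x_t)."""
    return float(2.0 * np.sum(rho(G @ lam) - rho(0.0)))

def rho_et(v):
    # carrier function of the exponential-tilting member of the GEL family
    return -np.exp(v)

rng = np.random.default_rng(1)
G = 0.01 * rng.standard_normal((500, 4))   # hypothetical moment conditions
lam = np.full(4, 1e-3)                     # hypothetical multiplier estimate
LM = lm_statistic(lam, np.cov(G, rowvar=False), T=500)
LR = lr_statistic(lam, G, rho_et)
```

Both statistics would then be compared with χ²(m − k) critical values.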
A summary of the results of all the tests for the seven fitted models is presented in Table 15. A trace (-) indicates that there was no convergence in the estimation of λ. There is no convergence problem for the Brennan-Schwartz, CIR-VR, Dothan and CEV models. Also, only the GMM and (S)GELCUE methods presented convergence for all models. The results for these methods are similar. Considering only these five methods, there is evidence against the specification of the Vasicek, CIR-SR and Merton models (all p-values less than or equal to 7% in all tests); some evidence against the CEV model (p-values less than or equal to 7% for the GMM and
⁴These tests were calculated for all the models estimated, but they are not reported here for reasons of space.
equal to 11% for the GELCUE method); and little evidence against the Dothan, Brennan-Schwartz and CIR-VR models. In the other methods we can find strong evidence against all the models whenever convergence occurred, but we must consider this result with care.
Observe that, except for the GEL and ET methods, even when we have convergence, the p-values are very small. This could be because these methods are more powerful than the other tests, or an indication that the chi-square distribution is not a good approximation for finite sample size series. This question could be answered using simulation, but it is out of the scope of this article.
In general, the results of specification tests employing conditions of overidentification estimated by empirical likelihood/generalized minimum contrast, summarized in Table 15, point towards the rejection of the null hypothesis of correct specification, whereas, in general, the J-tests by GMM are more favorable to the validity of the null hypothesis. The results of the empirical likelihood/generalized minimum contrast tests are consistent with the perception that single-factor models, such as the models estimated in this study, can be excessively simple to model interest rate processes or the pricing of fixed-income instruments (e.g. Stambaugh (1988), Stanton (1997), Litterman and Scheinkman (1991), Longstaff and Schwartz (1992) and Lund and Andersen (1997)).
Nevertheless, there are alternative forms of verifying problems of incorrect specification. One possibility is by analysing the estimated implicit probabilities. It is also possible to construct alternative specification and structural break tests employing implicit probabilities obtained by the estimation of generalized empirical likelihood models, as shown by Antoine et al. (2007), Ramalho and Smith (2005) and Guay and Lamarche (2008), using Pearson-type statistics to measure the quadratic distance between the implicit probabilities of restricted and unrestricted models. These statistics are asymptotically equivalent to the specification tests employing the moment conditions presented in this section. None of these alternatives is considered here.
H0        GMM2S  GMMITER  GMMCUE  GEL       ET        GELCUE    ETEL     SGEL    SET     SGELCUE   SETEL
Vasicek   1      2        2       -         <<<       2-2-2     -        -       -       <<<       -
CIR-SR    5      6        6       -         <<<       5-5-5     -        -       <<<     <<<       -
Merton    6      7        7       -         -         4-4-4     -        -       <<<     <<<       <<<
Dothan    27     28       28      47-25-<   46-34-7   53-53-53  53-24-3  7-3-<   13-5-<  15-15-15  13-<-<
BS        28     30       30      12-9-9    26-18-4   28-28-28  <<<      3-<-<   4-2-<   5-5-5     <<<
CIR-VR    11     10       10      17-15-10  18-15-10  18-18-18  <<<      <<<     <<<     <<<       <<<
CEV       6      7        7       8-7-3     10-8-4    11-11-11  <<<      <<<     <<<     <<<       <<<
Table 15. Summary of specification tests. Each cell has p-values (in %) for the J, LR and LM tests. In the GMM methods only the J-test was applied. -: no convergence; <: p-value smaller than 1%.
7. Conclusions
In this article we discussed the use of estimators of generalized empirical likelihood/generalized minimum contrast for the estimation of stochastic differential equations. These estimators are characterized by properties of asymptotic efficiency of superior order, and by robustness in the presence of specification problems for the estimators based on exponential tilting. These properties are particularly important in this context of estimation of stochastic differential equations, since, in general, it is not possible to construct the exact likelihood function of the process due to the non-existence of analytical solutions (and consequently of exact discretizations) for stochastic differential equations. In this context, the use of a non-parametric approximation for the process density employing these methods is particularly advantageous because it facilitates the efficient evaluation of the process density, at the same time as the parametric specification given by the stochastic differential equation is being used by means of moment conditions.
The results obtained demonstrate that the exponentially tilted empirical likelihood estimator in particular, proposed by Schennach (2007), obtains a performance which is superior to the other techniques proposed, due to its properties of robustness in the presence of specification problems. As it is possible to interpret the estimation of stochastic differential equations employing discrete data as an incorrect specification problem, due to the use of an approximated discretization of the model, the results of the Monte Carlo experiments demonstrate that the performance of this estimator is quite superior to the other estimation methods employing moment conditions, and, in general, the estimators based on empirical likelihood/generalized minimum contrast have a better performance in terms of bias and mean squared error.
References
Ait-Sahalia, Y.: 2002, Maximum-likelihood estimation of discretely-sampled diffusions: A closed-form approximation approach, Econometrica 70, 223-262.
Anatolyev, S.: 2005, GMM, GEL, serial correlation and asymptotic bias, Econometrica 73, 983-1002.
Andrews, D. W. K.: 1991, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica 59, 817-858.
Antoine, B., Bonnal, H. and Renault, E.: 2007, On the efficient use of the informational content of estimating equations: implied probabilities and euclidean empirical likelihood, Journal of Econometrics 138, 461-487.
Bachelier, L.: 1900, Théorie de la spéculation. English translation by A. J. Boness in The Random Character of Stock Market Prices, ed. Paul H. Cootner, pp. 17-78, Cambridge, Mass., MIT Press, 1967.
Bickel, P., Klaassen, C., Ritov, Y. and Wellner, J.: 1993, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins Press.
Bishwal, J. P. N.: 2007, Parameter Estimation in Stochastic Differential Equations, Springer.
Black, F. and Scholes, M. S.: 1973, The pricing of options and corporate liabilities, Journal of Political Economy 81, 637-654.
Brennan, M. and Schwartz, E. S.: 1980, Analyzing convertible bonds, Journal of Financial and Quantitative Analysis 15, 907-929.
Chan, K. G., Karolyi, G., Longstaff, F. and Sanders, A. B.: 1992, An empirical comparison of alternative models of the term structure of interest rates, Journal of Finance 47, 1209-1227.
Chausse, P.: 2009, gmm: Generalized Method of Moments and Generalized Empirical Likelihood, 6, 59-59.
Gourieroux, C. and Monfort, A.: 1996, Simulation-Based Econometric Methods, Oxford University Press.
Guay, A. and Lamarche, J.-F.: 2008, The information content of implied probabilities to detect structural change. Brock University Working Paper 08-33.
Hansen, L. P.: 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, 1029-1054.
Hansen, L. P., Heaton, J. and Yaron, A.: 1996, Finite sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics 14, 262-280.
Harrison, J. M. and Kreps, D.: 1979, Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381-408.
Harrison, J. M. and Pliska, S.: 1981, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215-260.
Imbens, G. W., Spady, R. H. and Johnson, P.: 1998, Information theoretic approaches to inference in moment condition models, Econometrica 66, 333-357.
Karatzas, I. and Shreve, S. E.: 1987, Brownian Motion and Stochastic Calculus, Springer-Verlag.
Kitamura, Y.: 2006, Empirical likelihood methods in econometrics: Theory and practice. Unpublished Working Paper.
Kitamura, Y. and Stutzer, M.: 1997, An information-theoretic alternative to generalized method of moments estimation, Econometrica 65(5), 861-874.
Kloeden, P. and Platen, E.: 1992, Numerical Solution of Stochastic Differential Equations, Springer-Verlag.
Litterman, R. and Scheinkman, J.: 1991, Common factors affecting bond returns, Journal of Fixed Income 1, 54-61.
Longstaff, F. and Schwartz, E.: 1992, Interest rate volatility and the term structure: A two-factor general equilibrium model, Journal of Finance 47, 1259-1282.
Lund, J. and Andersen, T.: 1997, Estimating continuous-time stochastic volatility models of the short-term interest rate, Journal of Econometrics 77, 343-377.
Merton, R. C.: 1973, The theory of rational option pricing, Bell Journal 4, 141-183.
Milstein, G. N.: 1974, Approximate integration of stochastic differential equations, Theory of Probability and Applications 19, 557-562.
Newey, W. K. and West, K. D.: 1987, A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703-708.
Newey, W. and McFadden, D.: 1994, Large sample estimation and hypothesis testing, in Handbook of Econometrics, Vol. 4, Elsevier.
Newey, W. and Smith, R. J.: 2004, Higher order properties of GMM and generalized empirical likelihood estimators, Econometrica 72, 219-255.
Owen, A.: 1991, Empirical likelihood for linear models, The Annals of Statistics 19(4), 1725-1747.
Pedersen, A. R.: 1995, A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scandinavian Journal of Statistics.
Prakasa Rao, B. L. S.: 1999, Statistical Inference for Diffusion Type Processes, Arnold.
Qin, J. and Lawless, J.: 1994, Empirical likelihood and general estimating equations, The Annals of Statistics 22(1), 300-325.
Ramalho, J. J. S. and Smith, R. J.: 2005, Goodness of fit tests for moment condition models. Working Paper 2005/05.
Rogers, L. C. G. and Williams, D.: 2000, Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus, Cambridge.
Schennach, S.: 2007, Point estimation with exponentially tilted empirical likelihood, Annals of Statistics 35(2), 634-672.
Smith, R. J.: 2001, GEL criteria for moment condition models. Working Paper, University of Bristol.
Stambaugh, R.: 1988, The information in forward rates: Implications for models of the term structure, Journal of Financial Economics 21, 41-70.
Stanton, R.: 1997, A nonparametric model of term structure dynamics and the market price of interest rate risk, Journal of Finance 52, 1973-2002.
Vasicek, O.: 1977, An equilibrium characterization of the term structure, Journal of Financial Economics 5, 177-188.
Zhou, H.: 2000, A study of the finite sample properties of EMM, GMM, QMLE, and MLE for a square-root interest rate diffusion model. Federal Reserve System Finance and Economics Discussion Series 2000-45.
Zivot, E. and Wang, J.: 2006, Modeling Financial Time Series with S-PLUS, second edition, Springer-Verlag.
ESTIMATION OF STOCHASTIC VOLATILITY MODELS USING
METHODS OF GENERALIZED EMPIRICAL LIKELIHOOD/MINIMUM
CONTRAST
Abstract. In this article we discuss the estimation of Stochastic Volatility (SV) models using generalized empirical likelihood/minimum contrast methods. We show via Monte Carlo simulations that the proposed methods have a performance superior or equivalent to the other estimation methods proposed in the literature to estimate SV models, and, additionally, they offer robustness properties in the presence of specification problems such as heavy-tailed distributions and the presence of outliers.
1. Introduction
Measurement of asset volatility is a fundamental aspect of finance. Precise volatility measurements of financial asset returns are necessary in areas such as risk management (McNeil et al. (2005)) and asset pricing (Singleton (2006)). Among the available forms for modeling volatility, the class of models known as SV models stands out¹. In this class of models, volatility is treated as a non-observed latent factor. One of the main reasons for its popularity is that SV models can be derived from continuous time diffusions (e.g. Barndorff-Nielsen et al. (2002)), and thus they become closer to the pricing literature using non-arbitrage/martingale methods. These models are also attractive because, as empirical evidence shows, they are better at capturing stylized facts of financial series, and their predictive performance is superior in comparison to other classes of volatility models (e.g. Koopman et al. (2005)), such as, for example, the class of GARCH models (Engle (1982), Bollerslev (1986)). However, as volatility is treated as a non-observable latent process, the estimation of volatility models is more complicated than the estimation of concurrent models, such as the GARCH class, in which volatility is a deterministic function of the past, which makes the evaluation of the likelihood function a simple procedure.
In SV models, the exact evaluation of the likelihood function, due to the presence of the latent volatility factor, requires the calculation of an integral with a dimension equivalent to the sample size. The numerical evaluation of this problem requires methods based on simulation, such as importance sampling methods (e.g. Geweke (1994), Liesenfeld and Richard (2003)) or Markov Chain Monte Carlo (MCMC) (Shephard (1993), Jacquier et al. (1994)). Although these methods are efficient and, with the currently available computational power, quite feasible, some problems still remain, such as the determination of an appropriate importance function or the problem of correlation in the chains in MCMC sampling. It is also possible to work with likelihood function approximations, such as the estimation by quasi-maximum likelihood (Harvey et al. (1994), Jungbacker and Koopman (2009)), based on a linearization of the SV
¹For a review of methods for estimating SV models see, for example, Broto and Ruiz (2004), Ghysels et al. (1996), Shephard and Andersen (2009) and Jungbacker and Koopman (2009).
model. In this methodology, the evaluation of the likelihood function is made by means of a decomposition of the prediction error using the Kalman filter, which renders a consistent estimator which is asymptotically Gaussian, though inefficient and biased in finite samples. Other ways of evaluating this model employ estimation by simulation using the methods of indirect inference and the efficient method of moments (Gourieroux et al. (1993), Gallant and Tauchen (1996)). These two methods are asymptotically efficient, and have good properties in finite samples (Monfardini (1998)), but they are less efficient than the MCMC methods of Shephard (1993) and Jacquier et al. (1994). The simplest estimation form for volatility models is the method of moments, the original form of estimation employed in the estimation of the seminal log-normal SV model proposed by Taylor (1986). This methodology was later refined by Melino and Turnbull (1990) through the use of the generalized method of moments (GMM) of Hansen (1982), which generates consistent and asymptotically efficient estimators. These estimators are computationally simple, but their properties in finite samples can be poor and they are inefficient when compared with estimators based on MCMC. A comprehensive study of these estimators' properties can be found in Andersen and Sorensen (1996), and a complete survey of the estimation of SV models using the method of moments can be found in Renault (2009).
The performance of SV model estimators employing GMM is weakened by the fact that the GMM estimator's bias grows with the number of moment conditions (e.g. Newey and Smith (2004)), and the efficiency of this method depends on an adequate choice of the moment conditions. The GMM estimator manages to reach the efficiency of the maximum likelihood estimator if one of the moments is the score function of the maximum likelihood estimator, or if the moments employed project this function. In practice, efficient estimation by GMM involves the use of a large number of moment conditions. As the finite-sample bias of the GMM estimator is proportional to the number of moments employed, there is a trade-off between bias and variance in the estimation by GMM when a high number of moment conditions is used. Another problem in the estimation of SV models by GMM is the lack of robustness in the moment conditions employed. The estimation of the log-normal SV model is based on conditions that employ moments of higher orders, and this can be a serious problem in the presence of outliers or heavy-tailed innovation processes. In this situation, the effects of outliers in the sample are raised to powers of third or fourth order, which significantly affects the estimation in finite samples.
A further problem lies in the formulation of moment conditions. Although the GMM estimator is semi-parametric, and thus it is not necessary to specify the distribution function of the process, the formulation of moment conditions for SV models generally employs moments derived from the specification of a distribution function for the innovations, as in the case of the so-called log-normal SV model of Taylor (1986). If this assumption is not valid, the properties of the GMM estimator may be degraded.
In this way, the computationally simplest implementation of the generalized method of moments leads to an estimator with poor properties in finite samples (Andersen and Sorensen (1996)), and, on the other hand, the implementation of efficient estimators, such as the methods based on MCMC, is computationally intensive and subject to convergence problems. In this study we propose an alternative form of estimation employing semi-parametric methods of generalized empirical likelihood and generalized minimum contrast. These methods, as will be demonstrated, represent a computationally simpler way of implementation because they can be based on the same moment conditions as the estimators of generalized moment methods, and they produce efficient estimators with good properties in finite samples, as will be
minimum contrast; section 5 shows Monte Carlo experiments; and the final conclusions are in section 6.
The so-called log-normal volatility model introduced by Taylor (1986) can be described by the following structure:
(2.1)    y_t = σ_t ε_t,
(2.2)    log σ_t² = α + β log σ_{t−1}² + σ u_t,
where equation 2.1 describes the behavior of the process mean, and equation 2.2 contains the volatility dynamics. It is usually assumed that the innovation processes in the mean and in the volatility are given by independent normal distributions, that is, (ε_t, u_t) ∼ iid N(0, I₂), and in this model the parameter vector is given by θ = (α, β, σ). Note that it is possible to interpret this model in a semi-parametric form, as pointed out by Renault (2009), without an a priori specification of the innovation process distributions. Renault (2009) denotes this model as Exponential-SARV because the variance is the exponential of an autoregressive process.
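For concreteness, the model in equations 2.1-2.2 can be simulated directly; a minimal sketch (the parameter values below are arbitrary illustrations, not estimates from the paper):

```python
import numpy as np

def simulate_sv(alpha, beta, sigma, T, seed=0):
    """Simulate the log-normal SV model:
       y_t = sigma_t * eps_t,
       log sigma_t^2 = alpha + beta * log sigma_{t-1}^2 + sigma * u_t,
       with (eps_t, u_t) ~ iid N(0, I_2)."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)                      # h_t = log sigma_t^2
    h[0] = alpha / (1.0 - beta)          # start at the unconditional mean
    for t in range(1, T):
        h[t] = alpha + beta * h[t - 1] + sigma * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h

y, h = simulate_sv(alpha=-0.3, beta=0.95, sigma=0.25, T=2000)
```

With |β| < 1 the log-variance process is stationary around α/(1 − β).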
As demonstrated by Francq and Zakoïan (2006), it is not necessary to assume a distribution for this model's estimation, since, as previously noted by Ruiz (1994), log y_t² = log σ_t² + log ε_t², and this corresponds to an ARMA(1,1) model for the log of the square of the observed process y_t, which makes it possible to derive the representation employed by Francq and Zakoïan (2006) to obtain a consistent least squares estimator for this model. Francq and Zakoïan (2006) also demonstrate that there is an ARMA(m,m) model for any power log y_t^m of this process, although it is important to note that the log-normal representation is quite realistic, as indicated by Andersen (1994).
This log-normal specification makes it possible to construct moment conditions of any order, as demonstrated by Taylor (1986) and Melino and Turnbull (1990). The moment conditions of the log-normal SV model can be obtained by initially defining the unconditional mean and variance of the log-variance equation:

μ = E[log σ_t²] = α/(1 − β),    σ_y² = Var[log σ_t²] = σ²/(1 − β²),
and the remaining moments as:

E[y_t²] = E[σ_t²],
E[|y_t|³] = 2√(2/π) E[σ_t³],
E[y_t⁴] = 3 E[σ_t⁴],
E[y_t² y_{t−j}²] = E[σ_t² σ_{t−j}²].
Moments of superior order can be written as:

E[σ_t^r] = exp( rμ/2 + r²σ_y²/8 )

for any positive integer j and constants r and s, and in the same way covariances can be obtained by:

E[σ_t^r σ_{t−j}^s] = E[σ_t^r] E[σ_t^s] exp( rsβ^j σ_y²/4 ).
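The closed-form moment expressions above follow from the log-normality of σ_t²; they can be sanity-checked by simulation (the parameter values below are arbitrary):

```python
import numpy as np

# Monte Carlo check of E[sigma_t^r] = exp(r*mu/2 + r^2*sigma_y^2/8)
# when log sigma_t^2 ~ N(mu, sigma_y^2).
rng = np.random.default_rng(5)
mu, s2y, r = -1.0, 0.5, 3                       # arbitrary illustration values
log_var = rng.normal(mu, np.sqrt(s2y), size=200_000)
mc_moment = np.mean(np.exp(r * log_var / 2.0))  # sample mean of sigma_t^r
theory = np.exp(r * mu / 2.0 + r**2 * s2y / 8.0)
```

The two quantities agree up to Monte Carlo error.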
The moment conditions employed by Andersen and Sorensen (1996) and in our study comprise a set of 24 moment conditions using absolute moments of second to fourth order and lags of first to tenth orders:

(2.3)    g^24_t(θ) = ( |y_t|, y_t², |y_t|³, y_t⁴, |y_t y_{t−1}|, ..., |y_t y_{t−10}|, y_t² y_{t−1}², ..., y_t² y_{t−10}² ).
We also employed a second vector of moment conditions with 14 moment conditions given by:

(2.4)    g^14_t(θ) = ( |y_t|, y_t², |y_t|³, y_t⁴, |y_t y_{t−2}|, |y_t y_{t−4}|, |y_t y_{t−6}|, |y_t y_{t−8}|, |y_t y_{t−10}|, y_t² y_{t−1}², y_t² y_{t−3}², y_t² y_{t−5}², y_t² y_{t−7}², y_t² y_{t−9}² ).
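As an illustration, the sample analogues of the 24 moments in eq. 2.3 can be computed as follows (a sketch only: a full moment-condition vector g_t(θ) would subtract the model-implied expectation of each entry):

```python
import numpy as np

def moment_vector_24(y):
    """Sample analogues of the 24 moments in eq. (2.3): absolute moments
    |y_t|, y_t^2, |y_t|^3, y_t^4 plus the cross moments |y_t y_{t-j}| and
    y_t^2 y_{t-j}^2 for lags j = 1, ..., 10."""
    m = [np.mean(np.abs(y)), np.mean(y**2),
         np.mean(np.abs(y)**3), np.mean(y**4)]
    for j in range(1, 11):                        # |y_t y_{t-j}|
        m.append(np.mean(np.abs(y[j:] * y[:-j])))
    for j in range(1, 11):                        # y_t^2 y_{t-j}^2
        m.append(np.mean(y[j:]**2 * y[:-j]**2))
    return np.array(m)

rng = np.random.default_rng(2)
m24 = moment_vector_24(rng.standard_normal(1000))
```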
With these two vectors of moment conditions we can perform the estimation using the generalized method of moments defined in section 3 and the generalized empirical likelihood and generalized minimum contrast methods in section 4.
(3.1)    ḡ(θ) = (1/T) ∑_{t=1}^{T} g(θ, y_t) = 0.
This system is generally over-identified (there are more moment conditions than parameters), and so in general there are no solutions. In order to obtain a solution, a criterion function must be employed:
(3.3)    W* = { lim_{T→∞} Var[ √T ḡ(θ) ] }⁻¹ = Ω(θ)⁻¹,
where Ω(θ) denotes the long-run variance-covariance matrix of the moment conditions. In this way, the asymptotically efficient weighting is obtained by employing the inverse of this variance-covariance matrix. This matrix is generally unknown, and is usually estimated using the HAC class of estimators of Newey and West (1987):
(3.4)    Ω̂ = ∑_{s=−(T−1)}^{T−1} k_h(s) Γ̂_s(θ*),
where k denotes a kernel function with a certain bandwidth parameter h, chosen by means of the procedures of Newey and West (1987) or Andrews (1991):
(3.5)    Γ̂_s(θ*) = (1/T) ∑_{t=1}^{T} g(θ*, y_t) g(θ*, y_{t+s})′.
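A minimal implementation of the HAC estimator in eqs. 3.4-3.5; the Bartlett kernel and the centering of the moments are assumptions of this sketch (the paper leaves the kernel and bandwidth choices to the Newey-West or Andrews procedures):

```python
import numpy as np

def hac_newey_west(G, h):
    """HAC estimate of eq. (3.4), truncated at lag h with Bartlett weights.
    G: (T, m) array of per-observation moment conditions g(theta*, y_t);
    Gamma_s is the sample autocovariance of the moments (eq. 3.5)."""
    T, m = G.shape
    Gc = G - G.mean(axis=0)              # centered moments
    omega = Gc.T @ Gc / T                # Gamma_0
    for s in range(1, h + 1):
        w = 1.0 - s / (h + 1.0)          # Bartlett kernel weight k_h(s)
        gamma_s = Gc[s:].T @ Gc[:-s] / T
        omega += w * (gamma_s + gamma_s.T)
    return omega

rng = np.random.default_rng(3)
Omega = hac_newey_west(rng.standard_normal((500, 4)), h=5)
```

The Bartlett weights guarantee a positive semi-definite estimate, which is why this kernel is the usual default.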
The efficient estimator of the generalized method of moments is then obtained as the solution to the problem:
There are several ways to implement the GMM estimator. The initial form proposed by Hansen (1982) is the estimator known as two-stage GMM. This estimator is obtained by performing a first stage, finding an initial estimator θ̂* = arg min ḡ(θ)′ Ω₀ ḡ(θ), where Ω₀ is an initial weighting matrix, usually an identity matrix. Following this first stage, a HAC matrix Ω̂(θ*) is calculated as a function of that initial estimate, and the final GMM estimate is obtained as θ̂ = arg min ḡ(θ)′ [Ω̂(θ*)]⁻¹ ḡ(θ), with the HAC matrix obtained in the first stage.
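The two-stage procedure can be sketched on a toy overidentified model; everything here - the model y_t ∼ N(θ, 1), its two moment conditions, and the grid-search minimization - is a hypothetical illustration rather than the estimator used in this paper:

```python
import numpy as np

def gmm_two_stage(y, grid):
    """Two-stage GMM sketch on y_t ~ N(theta, 1) with the overidentified
    moments g_t(theta) = (y_t - theta, y_t^2 - theta^2 - 1); the minimization
    is a grid search to keep the sketch dependency-free."""
    def per_obs(theta):
        return np.column_stack([y - theta, y**2 - theta**2 - 1.0])

    def criterion(theta, W):
        g = per_obs(theta).mean(axis=0)
        return g @ W @ g

    # stage 1: identity weighting matrix
    q1 = np.array([criterion(t, np.eye(2)) for t in grid])
    th1 = grid[np.argmin(q1)]
    # stage 2: efficient weight = inverse covariance of the stage-1 moments
    W2 = np.linalg.inv(np.cov(per_obs(th1), rowvar=False))
    q2 = np.array([criterion(t, W2) for t in grid])
    return grid[np.argmin(q2)]

rng = np.random.default_rng(4)
y = rng.normal(1.5, 1.0, size=2000)
theta_hat = gmm_two_stage(y, np.linspace(0.0, 3.0, 601))
```

In practice a gradient-based optimizer replaces the grid search, but the two-stage logic is the same.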
A point to be noted is that, in this case, the second stage results depend on the initial estimation in the first stage, and thus this procedure can create a first order bias, weakening the estimator's performance in finite samples (Hansen et al. (1996)). In order to solve this problem, two alternative procedures were proposed. The first procedure is known as iterative GMM, in which the first stage estimation is reinitialized with the result of the second stage estimation, and this iteration continues until the variation in the parameter vector or in the criterion function becomes smaller than an established tolerance.
Another possible estimator is known as continuously updated GMM (Hansen et al. (1996)). In this case, the estimation of $\hat{\theta}$ is not performed in stages, but by simultaneously employing a numerical optimization algorithm. Starting from an initial vector $\theta_0$ (usually chosen by the two-stage GMM method), the estimation is performed as $\hat{\theta} = \arg\min_{\theta} g(\theta)'\, \hat{\Omega}(\theta)^{-1} g(\theta)$, but now $\theta$ and $\hat{\Omega}(\theta)$ are simultaneously determined by the numerical optimization procedure. This procedure has the same first-order properties as the iterative GMM estimator, but, according to Hansen et al. (1996), it has better bias properties in finite samples, and the estimator is invariant under model reparameterization.
According to Newey and Smith (2004) and Anatolyev (2005), the three methods are asymptotically equivalent, but the second-order bias in finite samples of the continuous updating estimator is smaller. However, the numerical procedure may be subject to multiple modes in the objective function, which renders this estimator numerically unstable.
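The continuously updated objective can be sketched in a toy setting. In the sketch below, the weighting matrix is recomputed at every candidate value inside the criterion, so the parameter and its weighting matrix are determined together; the scalar model, moment conditions, and sample-covariance weighting are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = 1.5 + rng.standard_normal(3000)
moments = lambda mu: np.column_stack([y - mu, (y - mu) ** 3])

def cue_objective(mu):
    """CUE criterion: the weighting matrix is re-evaluated at every mu,
    so theta and Omega(theta) are determined simultaneously."""
    g = moments(mu)
    gbar = g.mean(axis=0)
    omega = np.cov(g, rowvar=False)          # Omega(theta) at this mu
    return gbar @ np.linalg.solve(omega, gbar)

mu_hat = minimize_scalar(cue_objective, bounds=(0.0, 3.0), method="bounded").x
```

A bounded scalar search is used here for stability; in higher dimensions the multiple modes mentioned above make the choice of starting values important.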
The estimation of the SV model by GMM is performed by employing the moment conditions defined by the vector given by Eq. 2.3. There are, however, some specific points in the estimation of SV models. As discussed in Melino and Turnbull (1990) and Hall (2005), the numerical procedure in this problem becomes more difficult due to the presence of non-differentiable moment conditions arising from the use of absolute moments. Although these functions are differentiable at almost all points, and the use of absolute moments does not affect the asymptotic properties of the estimators (e.g. Hall (2005)), it is important to discuss how to deal with this problem. Melino and Turnbull (1990) assume that the value of the function is 0 at the non-differentiable points, but this procedure can be problematic because it leads to a discontinuity in the determination of the step size in the numerical optimization algorithm. An alternative form consists in performing a numerical interpolation at the non-differentiability point, which is the procedure carried out in this study. The properties of this approximation can be seen in Hall (2005).
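One simple way to remove the kink of an absolute-moment condition, in the spirit of the interpolation mentioned above, is to replace $|x|$ on a small neighbourhood of zero by a quadratic that matches the function and its derivative at the edges. The specific quadratic below is an illustrative choice, not necessarily the interpolation used in the thesis.

```python
import numpy as np

def smooth_abs(x, eps=1e-4):
    """|x| outside (-eps, eps); the quadratic x^2/(2 eps) + eps/2 inside.
    The two pieces match in value and first derivative at x = +-eps, so
    the result is continuously differentiable everywhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= eps, np.abs(x), x * x / (2 * eps) + eps / 2)
```

Gradient-based optimizers then see a smooth objective, avoiding the step-size discontinuity described above, at the cost of a bias of order eps near zero.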
Properties of the GMM estimator in the estimation of SV models can be found in Andersen and Sorensen (1996), and a complete review of the use of methods of moments, including simulated methods of moments, can be found in Renault (2009). The results demonstrate that this estimator, despite being computationally simple, has poor properties in finite samples due to bias and inefficiency problems, although the results are better than those obtained by the quasi-maximum likelihood estimator (e.g. Jacquier et al. (1994)). The finite-sample problem of the GMM estimator is related to the need to use a large number of moments to secure the estimator's efficiency, while the finite-sample bias of the GMM estimator is proportional to the number of moment conditions used. Thus, in finite samples there is a trade-off between bias and efficiency. Note that, although the principal advantage of the GMM estimator lies in its semi-parametric formulation, which does not require assumptions about the sample distribution, the estimator employs only the moments of the process, and does not employ all the information contained in the sample.
In Andersen and Sorensen (1996), several details of the specification of the GMM estimator for SV models are discussed, such as the choice of the kernel function and the bandwidth employed, convergence problems, and the use of subgroups of moment conditions. In this study we employ the quadratic spectral function as the kernel function, with the optimal bandwidth chosen by Andrews (1991)'s procedure.
GMM is a method particularly useful for estimating non-linear models when the moments are known. However, there is a trade-off between, on the one hand, the weaker assumptions needed for its use and, on the other, the method's efficiency in finite samples, as discussed in the previous section. The regularity conditions for GMM estimators (Hansen (1982), Newey and McFadden (1994), Hall (2005)) involve only conditions for the asymptotic validity of the moment conditions; they do not assume stronger conditions such as knowledge of the process distribution, which represents an underutilization of the information present in the sample.
The opposite situation would be estimation by the method of maximum likelihood, which uses not only the conditional moments of the process but all the information present in the conditional densities. If the process is correctly specified and meets the regularity conditions, it is the best asymptotically Gaussian estimator, besides reaching optimality in measures such as Bahadur efficiency (Kitamura (2006), DasGupta (2008)). Note that maximum likelihood estimation in the context of SV models is more complex because the volatility is a latent variable, and the evaluation of the exact likelihood function usually requires simulation methods such as importance sampling or MCMC. Approximations using the quasi-maximum likelihood principle carry a cost in terms of their inferior performance in finite samples.
In this context, an alternative way of formulating estimators that do not need the parametric specification of the process distribution consists in employing semi-parametric estimation methods based on a non-parametric estimation of the likelihood function of the process. These semi-parametric estimators are known as Empirical Likelihood (EL) methods, formulated as generalizations of the non-parametric likelihood methods of Kiefer and Wolfowitz (1956).
Following Kitamura (2006)'s presentation, the non-parametric log-likelihood function of a sequence of IID data $\{x_i\}_{i=1}^{n}$ of unknown density is defined as:
(4.1) $\ell_{NP}(p_1, \ldots, p_n) = \sum_{i=1}^{n} \log p_i, \quad (p_1, \ldots, p_n) \in \triangle,$
defining $\triangle$ as the simplex $\{(p_1, \ldots, p_n) : \sum_{i=1}^{n} p_i = 1,\ 0 \le p_i \le 1,\ i = 1, \ldots, n\}$.
This definition is equivalent to treating each point of the sample as originating from a multinomial distribution with support given by the sample observations $\{x_i\}_{i=1}^{n}$, even though the density of $x_i$ is not multinomial. As this formulation does not involve any model and does not contain a parametric structure, it is essentially unrestrictive when employed in inference problems involving a parametric part with a finite number of parameters. The semi-parametric specification of this process was obtained by Owen (1991), who established the concept of empirical likelihood.
This formulation is important because it allows connections between the non-parametric estimation of the likelihood function and estimation using moment conditions, formulated through the principle of estimating equations and M-estimators, as shown by Qin and Lawless (1994); these estimating equations can be formulated by using moment conditions in the same way as GMM estimators.
Assuming moment conditions given by:

(4.2) $E[g(\theta, Y)] = \int g(\theta, y)\, d\mu_Y(y) = 0, \quad \theta \in \Theta \subset \mathbb{R}^k,$
where $\mu_Y$ is the distribution of the random variable $Y$, the estimation problem using moment conditions can be transformed into a non-parametric likelihood estimation through the construction of implicit probabilities $p_i$, and thus the log-likelihood function to be maximized becomes:
(4.3) $\ell_{NP}(p_1, \ldots, p_n) = \sum_{i=1}^{n} \log p_i, \quad \text{s.t.} \quad \sum_{i=1}^{n} g(\theta, y_i)\, p_i = 0.$
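For a fixed $\theta$ and a single scalar moment condition, the constrained problem (4.3) has the well-known dual solution $p_i = 1/(n(1 + \lambda g_i))$, with $\lambda$ chosen so that the weighted moment is exactly zero. The sketch below solves that dual by scalar root-finding; the moment condition and the data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def el_probabilities(g):
    """Implicit EL probabilities for scalar moment values g_i (the values
    must take both signs): p_i = 1 / (n (1 + lam * g_i)), with lam the
    root of sum_i g_i / (1 + lam * g_i) = 0."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    # lam must keep every 1 + lam * g_i strictly positive
    lo = -1.0 / g.max() + 1e-8
    hi = -1.0 / g.min() - 1e-8
    lam = brentq(lambda l: np.sum(g / (1.0 + l * g)), lo, hi)
    return 1.0 / (n * (1.0 + lam * g))

rng = np.random.default_rng(2)
y = rng.standard_normal(200)
g = y - 0.2                  # moment values g_i = y_i - theta at theta = 0.2
p = el_probabilities(g)      # sums to one, sets the weighted moment to zero
```

As the text notes, observations whose moment values are closer to zero receive weights closer to uniform, while the constraint downweights the rest.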
The value that maximizes this expression is the maximum empirical likelihood estimate; it maximizes the empirical likelihood function of the process and simultaneously imposes the validity of the moment conditions. These implicit probabilities give more weight to observations where the moment conditions are closer to zero, and less weight to the other observations. Note that the generalized method of moments can be obtained as a particular case by assuming all weights to be $p_t = 1/n$.
This empirical likelihood formulation is particularly useful in the estimation of models with latent variables, where there is no way of evaluating the exact likelihood function of the process. Whereas the GMM estimator does not require assuming knowledge of the process likelihood, the empirical likelihood estimators use the information in the process distribution by means of its non-parametric estimation. This construction makes it possible to obtain efficiency properties in the semi-parametric sense defined by Bickel et al. (1993).
Note that, when the sample is not an IID process, it is necessary to modify the treatment given to the moment conditions. In this situation, the method is modified by assuming that the moment conditions originate from a process that is weakly dependent and possibly heteroskedastic. Anatolyev (2005) proposes to substitute $g(\theta, y_t)$ with a smoothed version defined as:
(4.4) $g^{w}(\theta, y_t) = \sum_{s=-m}^{m} w(s)\, g(\theta, y_{t-s}),$
where the $w(s)$ are weights obtained from a kernel function and adding to one, in the spirit of a HAC estimator (Andrews (1991)). This modification makes it possible to obtain the same first-order asymptotic efficiency conditions present in the GMM methods. The moment conditions are then as follows:
(4.5) $\sum_{t=1}^{T} p_t\, g^{w}(\theta, y_t) = 0.$
The GMM estimator is generally defined by the minimization of the quadratic form 3.6, and in the over-identified case not all the moment conditions are necessarily equal to zero at the estimated parameter value. In the empirical likelihood estimators formulated through moment conditions, these conditions are set exactly equal to zero using the weighting given by the empirical probabilities $p_t$. Note that in exactly identified models all the proposed estimators obtain similar results, because in all these estimators the moment conditions are always valid. An important result is that in over-identified models with valid moment conditions all these estimators attain the same asymptotic variance (e.g. Kitamura (2006)).
It is possible to formulate these empirical likelihood estimators as particular cases of the semi-parametric class of estimators based on the minimization of distances or, as defined by Bickel et al. (1993), estimators of generalized minimum contrast (GMC)². This formulation makes it possible to obtain the semi-parametric efficiency properties of this class of estimators. Note that we can also draw a parallel with the interpretation of the GMM estimator as
²See Bickel et al. (1993), chap. 7, for a general discussion of the regularity, existence and efficiency conditions of generalized minimum contrast estimators.
(4.9) $\hat{\theta}_n = \arg\min_{\theta, p_t} \sum_{t=1}^{T} h_T(p_t).$
In the case of empirical likelihood estimators, the point estimate $\hat{\theta}$ is the value which minimizes the discrepancy between $\hat{p}_t$ and uniform weights. An important result is that an adequate choice of the discrepancy function can lead to a unified representation of empirical likelihood and minimum contrast estimators. This representation can be obtained when the function $h_T(p_t)$ belongs to the Cressie-Read family of discrepancies given by:
Note that the estimation problem involves obtaining estimators not only for the implicit probabilities but also for the parameters of the parametric part of the model, which is, in principle, a high-dimensional optimization problem. Smith (2001) demonstrated that it is possible to define another estimator that also has these estimators as particular cases, and that makes possible a dual formulation of lower dimension.
The Smith (2001) Generalized Empirical Likelihood (GEL) estimate is obtained as the solution of the following saddlepoint problem:
(4.11) $\hat{\theta}_n = \arg\min_{\theta} \max_{\lambda} \left[ \frac{1}{T} \sum_{t=1}^{T} \rho\!\left( \lambda' g^{w}(\theta, y_t) \right) \right],$
where $\lambda$ defines Lagrange multipliers imposing the restriction:
(4.12) $\sum_{t=1}^{T} p_t\, g^{w}(\theta, y_t) = 0.$
Estimators are obtained by solving the previous equation with the first-order condition:
(4.13) $\sum_{t=1}^{T} p_t\, \lambda' \frac{\partial g^{w}(\theta, y_t)}{\partial \theta} = 0,$
with:

(4.14) $p_t = \frac{1}{T}\, \rho'\!\left( \lambda' g^{w}(\theta, y_t) \right).$
This generalized empirical likelihood estimator contains the empirical likelihood estimators, under the corresponding conditions on the Cressie-Read divergence function over $\gamma$, through modifications of the functions $h$ and $\rho$. The EL estimator is obtained with $h(p) = -\ln np$ and $\rho(\xi) = \ln(1 - \xi)$; the ET estimator (Kitamura and Stutzer (1997), Imbens et al. (1998)) with $h(p) = np \ln np$ and $\rho(\xi) = -\exp(\xi)$; and the continuous updating estimator with $h(p) = (np)^2$ and $\rho(\xi) = -(1 + \xi)^2/2$.³
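For the ET choice $\rho(\xi) = -\exp(\xi)$, the inner maximization over $\lambda$ in (4.11) reduces, for fixed $\theta$, to minimizing the sample mean of $\exp(\lambda' g_t)$, and the implicit probabilities are exponentially tilted weights. The following is a minimal sketch with a single scalar moment condition, an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def et_weights(g):
    """ET inner problem for scalar moment values g: minimize the sample
    mean of exp(lam * g) over lam; the implicit probabilities are the
    exponentially tilted weights, proportional to exp(lam * g_t)."""
    lam = minimize_scalar(lambda l: np.mean(np.exp(l * g))).x
    p = np.exp(lam * g)
    return p / p.sum(), lam

rng = np.random.default_rng(3)
g = rng.standard_normal(500) + 0.3   # moment values slightly off zero
p, lam = et_weights(g)               # tilting pulls the weighted mean to 0
```

The first-order condition of this convex scalar problem is exactly that the tilted weights set the weighted moment to zero, mirroring restriction (4.12).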
An additional class of estimators, which does not belong directly to the class of EL or minimum contrast estimators, but which is obtained by combining the empirical likelihood estimator and the ET estimator, is the ETEL estimator proposed by Schennach (2007). This estimator is defined as:
(4.15) $\hat{\theta} = \arg\min_{\theta}\; n^{-1} \sum_{i=1}^{n} \tilde{h}\!\left( \hat{p}_i(\theta) \right),$
where $\hat{p}_i(\theta)$ is the solution of:

(4.16) $\min_{\{p_i\}_{i=1}^{n}}\; n^{-1} \sum_{i=1}^{n} h(p_i)$
subject to $\sum_{i=1}^{n} p_i\, g(\theta, y_i) = 0$ and $\sum_{i=1}^{n} p_i = 1$, with $\tilde{h}(\hat{p}_i) = -\ln(n p_i)$ and $h(p_i) = n p_i \ln(n p_i)$.
Note that the ETEL estimator employs the ET method to find the probabilities $\hat{p}_i(\theta)$, and the EL method to estimate the parameter vector $\hat{\theta}$. These probabilities are related to the multipliers $\lambda$ by the relation:
(4.17) $\hat{p}_t(\theta) = \dfrac{\exp\!\left( \hat{\lambda}(\theta)' g(\theta, y_t) \right)}{\sum_{i=1}^{n} \exp\!\left( \hat{\lambda}(\theta)' g(\theta, y_i) \right)}.$
An important property of the estimators of the ETEL class is their behavior in the presence of incorrect specification. Imbens et al. (1998) point out that the EL estimator can display inadequate behavior in the presence of incorrect specification due to a singularity in its influence function and, according to Theorem 1 in Smith (2001), the asymptotic properties of the EL estimator can be severely weakened even in the presence of minor specification problems. This also affects the estimation of the implicit probabilities because, in the presence of specification problems, the implicit probabilities in likelihood problems tend to concentrate on the extreme observations, in opposition to what is expected of a robust estimator in the sense of Huber (1981) and Hampel et al. (1986), which should minimize the importance of extreme observations in the construction of an estimator.
We will now summarize some common properties of the estimators discussed in this study. The first property is that all the estimators employed (two-stage GMM, iterative GMM, continuously updated GMM, GEL, ET, and ETEL) have the same consistency and first-order asymptotic efficiency properties (e.g. Smith (2001), Schennach (2007)), and under valid moment conditions all the estimators have the same asymptotic variance. However, their performance in finite samples can be quite different. The two-stage GMM estimator can be severely biased at the sample sizes employed in economics and finance, and continuous updating estimators are numerically unstable due to the presence of multiple modes in the objective function (e.g. Hansen et al. (1996)). Another interesting property is that estimators based on GMC and GEL are invariant to linear transformations of the vector of moment conditions, which does not occur for the two-stage GMM estimator. Estimators based on generalized empirical likelihood/minimum contrast are efficient in the semi-parametric sense of Bickel et al. (1993), and have superior properties in terms of higher-order asymptotic bias. These estimators also present optimal properties in hypothesis testing. As demonstrated by Kitamura (2006), these tests are optimal under the minimax and large-deviations criteria, and are uniformly more powerful in the generalized Neyman-Pearson sense.
A fundamental point is that, for the EL and minimum contrast estimators based on the Cressie-Read discrepancy, the bias in finite samples does not grow with the number of moment conditions used. This property makes it possible to obtain the efficiency of the estimators through the use of a large number of moment conditions, without implying an increase in finite-sample bias, as occurs with the GMM estimator; that growth in bias is what leads to the inferior finite-sample performance of the GMM in comparison with other forms of estimation.
The result obtained by Smith (2001) is that, within the class of minimum contrast/empirical likelihood estimators, the only estimator with adequate behavior in the presence of specification problems is the ET estimator, because its influence function does not present singularities. The ETEL estimator is a combination of the ET estimator and the EL estimator, and it maintains
the EL estimator's characteristics of asymptotic efficiency and minimum bias. Additionally, it inherits robustness in the presence of specification problems, due to the use of the ET estimator to estimate the implicit probabilities, as demonstrated by Theorems 8-10 in Smith (2001), which prove that this estimator is $\sqrt{n}$-convergent even in the presence of specification problems.
Estimators for the parameters of the parametric part of the model and for the implicit probabilities can be obtained by numerical optimization or via quasi-Newton iterative methods. These methods can be formulated as a problem of smaller dimension using a dual formulation (Kitamura (2006)), through numerical optimization employing the Lagrange multipliers defined by equations 4.11 and 4.17, which is the general form used in this study.
Note that in the estimation of SV models we are subject to the same problem of non-differentiable moment conditions due to the use of absolute moments. This problem impedes the simple use of the iterative methods for the estimation of Lagrange multipliers proposed by Kitamura (2006), and thus, in these cases, we need to use the same techniques of numerical optimization with interpolation in the vicinity of the discontinuity points discussed for the estimation by GMM.
Table 1. Reference SV Model, Sample Size 500 - α = -0.736, β = .9, σ = .3629, T = 500
it is possible to compare the results obtained with other estimation methodologies. The results obtained are directly comparable with those analyzed in Takada (2009), who proposed an estimator for SV models employing simulated Minimum Hellinger Distances, comparing this method with other methodologies, such as the efficient method of moments (EMM), MCMC, and Monte Carlo maximum likelihood.
Table 1 in Takada (2009) shows the MSE results of these estimators for the first parameter vector studied, for a sample of size 500. A direct comparison with the results presented in that table indicates that the estimators based on GEL/GMC are superior, in terms of MSE, to the following methods: SMHD (Simulated Minimum Hellinger Distance), EMM (Efficient Method of Moments) and MCMC. They also have a performance superior or equivalent to the MCML (Monte Carlo Maximum Likelihood) estimators by the mean squared error criterion. In comparison with the results of that article, we notice that the results of all the estimators based on GEL/GMC are superior to all these methods, except for
Figure 5.1. MSE and MAE of the estimation of the reference models with sample size 500 and 24 moment conditions. [Panels: Experiment 1 with α = -.736, γ = .9, σ² = .3629 and with α = -.1472, γ = .98, σ² = .1657; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
the estimation of α, where the estimators obtain a mean squared error equal to that of the MCML estimator.
In this comparison it is important to notice that the GEL/GMC estimators do not require a Monte Carlo simulation procedure, and are computationally simpler than these methods, indicating that the use of EL and MC makes it possible to obtain superior properties in finite samples when compared with the methods so far considered the most efficient in SV model estimation, with a noticeably smaller computational and implementation cost.
5.1. Effect of Sample Size and Set of Instruments. In order to verify the effect of the sample size on the estimators' performance, we carried out an analysis with the estimation of the parameter vectors studied with samples of size 250 (Tables 4, 5 and 6) and 1,000 (Tables 7, 8 and 9), employing the 24 moment conditions defined by equation 2.3. As expected, the increase in the sample size decreases the MSE and MAE of all the estimators, but with different effects for each parameter configuration and estimation method. Summarizing these results, we show in Figure 5.2 the relative efficiency, defined as the ratio between the MSE for the sample of size 250 and the MSE for the sample of size 1,000, for each configuration.
Except for the GEL estimator in parameter configuration 2, with an efficiency ratio below one, there is a real gain in terms of MSE for all the parameters. This particular result for the GEL estimator in this configuration can be explained by the greater convergence difficulty
noted in this particular configuration, but it is important to note that, in the version with smoothed moments, this estimator behaves as expected.
As can be seen in Figure 5.2, the sample size has heterogeneous effects on each estimator, depending on the parameter configuration. The estimators based on GEL/GMC with smoothed moments have greater gains in the configuration with smaller persistence, while those based on GMM behave in the opposite way. This result can be interpreted by the fact that the smoothing of moments is more efficient when the volatility persistence is smaller.
As previously discussed, the main theoretical motivation for the use of estimators based on GEL/GMC lies in the possibility of using a larger number of moment conditions to achieve a more efficient estimation, since the finite-sample bias of these methods does not grow with the number of moment conditions, as occurs with GMM estimators. In order to verify this property, we employ a new estimation with a subset of the moment conditions vector, now working with only 14 moment conditions, according to Eq. 2.4, instead of the original 24 moment conditions given by Eq. 2.3.
The results of this comparison are displayed in Tables 10, 11 and 12, and the comparisons between estimators employing MSE and MAE with the use of 14 moment conditions are placed in Figure 5.3. We can note that in this configuration the GEL/GMC estimators still display a performance superior to those based on GMM, but the advantage is not as large as in the configuration with 24 moment conditions, which supports the conjecture of a superior use of the moment conditions, in terms of bias and variance, by the estimators of the GEL/GMC class.
Figure 5.4 presents the relative efficiency between the MSE using 14 moments and the estimator with 24 moments. For the GMM estimators, the efficiency presents modest increases or reductions when increasing the number of instruments, similarly to the results obtained in the studies by Andersen and Sorensen (1996). However, there are, in general, very significant efficiency gains in MSE for the estimators based on GEL/GMC, reaching values over 200 times in the
[Figure 5.4: relative efficiency of the MSE with 14 versus 24 moment conditions, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
5.2. Student-t Distribution (4) in the mean innovations. As previously discussed, although the log-normal SV model is defined by moments of a log-normal distribution, it can be interpreted in a semi-parametric form as an autoregressive model for the exponential of
the volatility process, without the need for a distributional specification of the innovation processes (e.g. Francq and Zakoïan (2006), Renault (2009)). However, as we are employing theoretical moments that assume a distributional specification of the innovations in the construction of the moment conditions, it is important to verify whether alternative specifications significantly alter the finite-sample properties of the estimators. It is particularly interesting to verify whether, consistently with what is observed for financial series, heavy-tailed processes affect these estimators.
The first analysis undertaken was to replace the standard Gaussian distribution in the innovations of the mean process with a Student-t distribution with 4 degrees of freedom. This choice was purposely made with the aim of verifying the effect of a distribution with heavier tails on the estimation of SV models. Note that, as we are employing higher moments, the heavy-tail effect can be magnified in the estimation, since each observation is now raised to powers of second, third and fourth order. We use 4 degrees of freedom in particular so as to have a distribution with non-finite kurtosis and, consequently, a robustness test under extreme conditions.
Tables 13, 14 and 15 show the results of this experiment using 24 moment conditions, and Tables 16, 17 and 18 the results using 14 moment conditions. It can be seen that in this situation the estimators based on GMC/GEL clearly maintain their dominance over the estimators based on GMM, as becomes more evident in Figures 5.5 and 5.6, which show the MSE and MAE of each estimator; once again the GEL/GMC-based estimators have the best performance in this situation.
In order to verify whether in this case it is still advantageous to work with a larger set of instruments, Figure 5.7 shows the ratio of the MSEs between the estimators with 14 and 24 moment conditions. The results show that in this situation the increase in the number of
Figure 5.3. MSE and MAE of the estimation of the reference models with sample size 500 and 14 moment conditions. [Panels: Experiment 1 - Subset Instruments, with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
instruments can impair the performance of the estimators, an effect that occurs both for the GMM estimators and for the GEL/GMC estimators, although it is heterogeneous across configurations and parameters. In the situation of lower persistence, it is advantageous to work with the larger number of instruments for the GEL/GMC estimators, but this result is not maintained in the other parameter configurations; particularly in the configuration with high persistence, the use of the larger set of instruments causes an almost general degradation in the performance of all the methods.
[Figure 5.7: ratio of the MSEs between the estimators with 14 and 24 moment conditions, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
5.3. Student-t Distribution (4) in the volatility innovations. In the next experiment, we modified the data-generating process, assuming now that the innovation process in the volatility equation is given by a Student-t process with 4 degrees of freedom, keeping in this case the usual assumption of Gaussian innovations in the mean equation. Note that, in this configuration, the effects are expected to be more harmful, since now the effect of heavier tails is directly spread by the volatility equation's autoregressive structure, unlike the previous case
Figure 5.5. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovations in the mean equation. Sample size 500 and 24 moment conditions. [Panels: Experiment 2 with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
where the heavy-tailed innovations affected the mean equation, which was a process without correlation.
Tables 19, 20 and 21 show the results obtained with 24 moment conditions, and Tables 22, 23 and 24 show the results obtained with 14 moment conditions. These results are summarized in Figures 5.8 and 5.9. We note that these heavier-tailed innovations effectively damage the performance of the GMM-based estimators, and moderately damage the GEL-based estimators. In this experiment, the robustness properties of the methods based on ET and ETEL become evident, and these methods generally have a performance superior to the other methods. For example, the ratio between the MSE for α estimated by iterative GMM and by the smoothed ETEL method is 5102.984 for the first parameter configuration, showing the dominance of these methods in this situation of incorrect specification. As previously discussed, this robustness property derives from the bounded influence function of the estimators based on ET, and it proves to be quite important in this situation. As financial data are characterized by heavy tails, we have an additional justification for the use of the estimators proposed in this study.
Likewise, we can verify the effects of the number of moment conditions used in this configuration. Figure 5.10 shows the relative efficiency effects for the estimators obtained with the increase in the number of instruments from 14 to 24. In this configuration, however, we have mixed results, because for the first parameter configuration there is a general gain in the
Figure 5.6. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovations in the mean equation. Sample size 500 and 14 moment conditions. [Panels: Experiment 2 - Subset Instruments, with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
estimators (more noticeably for the estimators based on GEL/GMC), but for the other configurations there are losses, particularly in the estimation of the volatility parameter σ in the second configuration.
5.4. Experiment 4 - Level Outlier. In order to verify the effects of aberrant observations (outliers) on the estimation of the stochastic volatility process, we undertook two classes of experiments. In this part of our study we verify the effect of the so-called level outliers (in Hotta and
[Figure 5.10: relative efficiency of the estimators with the increase in the number of instruments from 14 to 24, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
Tsay (1998)'s nomenclature) in the estimation of SV models. In this experiment the generating process is given by:
(5.1) y_t = σ_t ε_t + LO_t
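To make the design concrete, the experiment in equation (5.1) can be reproduced with a short simulation. The sketch below is illustrative, not the authors' code: it uses the log-normal SV specification h_t = α + γ h_{t−1} + σ_η η_t with σ_t = exp(h_t/2) and the first parameter configuration from the figures; the outlier position and magnitude (`outlier_pos`, `outlier_size`) are hypothetical choices.

```python
import numpy as np

def simulate_sv_level_outlier(n=500, alpha=-0.736, gamma=0.9, sigma2=0.3629,
                              outlier_pos=250, outlier_size=10.0, seed=0):
    """Simulate y_t = exp(h_t / 2) * eps_t + LO_t with log-volatility
    h_t = alpha + gamma * h_{t-1} + sigma_eta * eta_t and a single
    additive level outlier LO at position outlier_pos."""
    rng = np.random.default_rng(seed)
    sigma_eta = np.sqrt(sigma2)
    h = np.empty(n)
    h[0] = alpha / (1.0 - gamma)          # stationary mean of h_t
    for t in range(1, n):
        h[t] = alpha + gamma * h[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(n)
    lo = np.zeros(n)
    lo[outlier_pos] = outlier_size        # the level outlier LO_t
    return y + lo, h

y, h = simulate_sv_level_outlier()
print(y.shape)
```

Replications of this generator, with the estimators applied to each simulated path, produce the kind of MSE/MAE comparison reported in the figures.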
Figure 5.8. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovation in the volatility equation. Sample size 500 and 24 moment conditions.
[Figure 5.8 panels: Experiment 3, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Figure 5.9. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovation in the volatility equation. Sample size 500 and 14 moment conditions.
[Figure 5.9 panels: Experiment 3 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
instruments represent a loss in performance in most cases, particularly for the estimation of parameter σ.
5.5. Experiment 5 - Volatility Outlier. In the last specification tested, we verified the effect of a so-called volatility outlier (as named by Hotta and Tsay (1998)) in the estimation. In this experiment, the data generating process is given by:
(5.3) y_t = σ_t ε_t
[Figure panels: efficiency gain in the MSE of σ for each estimator.]
directly transmitted by the autoregressive structure in the volatility equation, whereas the effect was indirect in the case of an innovation outlier.
Figure 5.11. MSE and MAE of the estimation of the reference models, modified with Level Outlier. Sample size 500 and 24 moment conditions.
[Figure 5.11 panels: Experiment 4, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Tables 31, 32 and 33 (estimation with 24 moments) and 34, 35 and 36 (estimation with 14 moments) show the results of the estimations, which can be summarized by Figures 5.14 and 5.15 with the MSE and MAE results. As in previous experiments, the GEL/GMC-based estimators have in general a superior performance in comparison with the GMM-based methods, and show that the same robustness properties remain valid in this volatility outlier situation, which would be potentially more serious for the estimation of volatility parameters.
The effect of the larger number of instruments in this situation can be seen in Figure 5.16, which indicates that there is an efficiency gain with a higher number of instruments in the
Figure 5.12. MSE and MAE of the estimation of the reference models, modified with Level Outlier. Sample size 500 and 14 moment conditions.
[Figure 5.12 panels: Experiment 4 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
situation with low persistence; however, for situations with higher volatility persistence, the additional instruments generally present a noticeable deterioration in the estimators' MSE.
6. Conclusions
In this study we discussed the estimation of SV models using estimators based on generalizations of the empirical likelihood and minimum contrast methods. The performance of these estimators, as shown by a set of Monte Carlo experiments, proved to be superior to the estimation methods based on the generalized method of moments, and also superior to the
Figure 5.13. Relative efficiency in the reference models with level outlier - effect of the number of moment conditions (MSE 14 moment conditions / MSE 24 moment conditions). Sample size 500.
methods based on simulation such as MCMC and Monte Carlo maximum likelihood as studied in Takada (2009).
The results obtained in this study are consistent with those obtained by other studies (e.g. Newey and Smith (2004)), which demonstrate that alternative estimators based on moments, formulated as GEL/GMC-based estimators, display superior performance, nullifying the bias problems occurring in the usual GMM estimators. The proposed estimators manage to obtain superior properties in finite samples by a better use of the informational content present in the moment conditions, since the higher efficiency is obtained not only by means of weighting by
the estimators' variance - as in the case of GMM estimators - but also by the non-parametric estimation of the likelihood function of the process, as discussed in Antoine et al. (2007). Another related property lies in the fact that the bias of these estimators does not grow with the number of moment conditions, as happens in the case of GMM estimators. Thus, it is possible to obtain efficiency properties by using an adequate number of moment conditions.
This characteristic can be particularly important in the estimation of multivariate SV models, in which the number of moment conditions is proportional to the number of series studied. As the estimation of multivariate SV models still represents a great computational challenge (e.g. Chib et al. (2009)), estimation by methods based on empirical likelihood/minimum contrast can be an efficient alternative to be explored.
These results are particularly interesting because the implementation of the methods discussed in this study is computationally simpler than the implementation of methods based on simulation, requiring only the specification of the moment conditions of stochastic volatility
Figure 5.14. MSE and MAE of the estimation of the reference models modified with volatility outlier. Sample size 500 and 24 moment conditions.
processes. Although this study is based on the specification of the log-normal SV model, it is important to note that this procedure can be generalized by using the methodology proposed by Meddahi (2001), which makes possible the automatic generation of moment conditions for processes that belong to the so-called SV-eigenfunctions family.
Another important characteristic is related to robustness properties and specification problems, particularly of the methods based on ET, which, due to properties of their influence function, manage to be √n-consistent even in the presence of specification problems. This property is particularly important in the presence of processes with heavy-tailed innovations, as verified in this study by the use of a Student-t distribution with non-finite kurtosis, or else in the presence of level or volatility outliers.
References
Anatolyev, S.: 2005, GMM, GEL, serial correlation and asymptotic bias, Econometrica 73, 983-1002.
Andersen, T.: 1994, Stochastic autoregressive volatility: A framework for volatility modelling, Mathematical Finance 4, 75-102.
Andersen, T. and Sorensen, B.: 1996, GMM estimation of a stochastic volatility model: A Monte Carlo study, Journal of Business and Economic Statistics 14(3), 328-352.
Figure 5.15. MSE and MAE of the estimation of the reference models modified with volatility outlier. Sample size 500 and 14 moment conditions.
[Figure 5.15 panels: Experiment 5 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Andrews, D. W. K.: 1991, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica 59, 817-858.
Antoine, B., Bonnal, H. and Renault, E.: 2007, On the efficient use of the informational content of estimating equations: implied probabilities and euclidean empirical likelihood, Journal of Econometrics 138, 461-487.
Barndorff-Nielsen, O. E., Nicolato, E. and Shephard, N. G.: 2002, Some recent developments in stochastic volatility modelling, Quantitative Finance 2, 11-23.
Bickel, P., Klaassen, C., Ritov, Y. and Wellner, J.: 1993, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins Press.
Bollerslev, T.: 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307-327.
Broto, C. and Ruiz, E.: 2004, Estimation methods for stochastic volatility models: A survey, Journal of Economic Surveys 18(5), 613-649.
Chib, S., Omori, Y. and Asai, M.: 2009, Handbook of Financial Time Series, Springer, chapter Multivariate Stochastic Volatility Models, pp. 365-402.
DasGupta, A.: 2008, Asymptotic Theory of Statistics and Probability, Springer.
Engle, R. F.: 1982, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50, 987-1007.
Francq, C. and Zakoïan, J. M.: 2006, Linear-representation based estimation of stochastic volatility models, Scandinavian Journal of Statistics 33, 785-806.
Figure 5.16. Relative efficiency in the reference models modified with volatility outlier - effect of the number of moment conditions (MSE 14 moment conditions / MSE 24 moment conditions). Sample size 500.
[Figure 5.16 panels: efficiency gain in the MSE of α for each estimator.]
Gallant, R. A. and Tauchen, G.: 1996, Which moments to match, Econometric Theory 12(4), 657-681.
Geweke, J.: 1994, Bayesian comparison of econometric models. Federal Reserve of Minneapolis Working Paper.
Ghysels, E., Harvey, A. C. and Renault, E.: 1996, Statistical Methods in Finance, North Holland, chapter Stochastic Volatility, pp. 221-238.
Gourieroux, C. A., Monfort, A. and Renault, E.: 1993, Indirect inference, Journal of Applied Econometrics 8, 85-118.
Hall, A.: 2005, Generalized Method of Moments, Oxford University Press.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. and Stahel, W. A.: 1986, Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons.
Hansen, L. P.: 1982, Large sample properties of Generalized Method of Moments estimators, Econometrica 50, 1029-1054.
Hansen, L. P., Heaton, J. and Yaron, A.: 1996, Finite sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics 14, 262-280.
Harvey, A. C., Ruiz, E. and Shephard, N. G.: 1994, Multivariate stochastic variance models, Review of Economic Studies 61, 247-264.
Hotta, L. and Tsay, R.: 1998, Outliers in GARCH processes. Working Paper, Graduate School of Business, University of Chicago.
Singleton, K. J.: 2006, Empirical Dynamic Asset Pricing, Princeton University Press.
Smith, R. J.: 2001, GEL criteria for moment condition models. Working Paper, University of Bristol.
Takada, T.: 2009, Simulated minimum Hellinger distance estimation of stochastic volatility models, Computational Statistics and Data Analysis 53, 2390-2403.
Taylor, S. J.: 1986, Modelling Financial Time Series, John Wiley & Sons.
White, H.: 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
INDIRECT INFERENCE IN FRACTIONAL SHORT-TERM INTEREST RATE
DIFFUSIONS
IMECC-UNICAMP
Abstract. In this article we discuss the estimation of continuous time interest rate models
driven by fractional Brownian motion (fBm) using discrete data. In the presence of a fractional
Brownian motion, usual estimation methods for continuous time models using discrete data are
not appropriate since in general fBm is neither a semimartingale nor a Markov process. In this
This version - October 2009. Address - Insper Institute - Rua Quatá 300, 04546-042, São Paulo, SP, Brasil. email - Márcio Laurini - marciopl@isp.edu.br - Luiz Koodi Hotta - hotta@ime.unicamp.br.
1. Introduction
The use of continuous time models in finance, which started with the seminal work of Bachelier (1900), allows the use of probability and stochastic process theory for the determination of asset prices. The principle of no-arbitrage pricing, introduced in Harrison and Kreps (1979) and Harrison and Pliska (1981), can be summarized as the imposition of a set of restrictions on stochastic processes measured in continuous time which does not allow the existence of risk free profits.
Delbaen and Schachermayer (1994) show that no-arbitrage pricing is only possible for processes known as semimartingales, and processes excluded from this class cannot be used as innovation processes in financial asset modeling. However, recent articles (discussed in Section 2) show that in the presence of transaction costs and restrictions on the set of admissible strategies for the agents, processes more general than semimartingales can be used as price processes in finance while remaining consistent with the no-arbitrage principle.
A particular result in these articles is that a special stochastic process would be consistent with no-arbitrage under these conditions - the process known as fractional Brownian motion (fBm in the text). This is a generalization of Brownian motion which allows the possibility of dependent increments and long memory. Since the increments of this process are dependent, it is not a Markov process; and, apart from a particular case where the process reduces to the standard Brownian case, this process is not a semimartingale.
In this article we discuss the implications of fractional Brownian motions for the estimation of stochastic differential equations using discretely sampled data. In this situation, most of the estimators proposed for continuous time models using discrete data cannot be applied, since the violation of the Markov property prevents the construction of closed form likelihood functions. However, we can use the principle of indirect inference (Gourieroux et al. (1993)) to build an estimator for stochastic differential equations based on fractional Brownian motion. The principle of indirect inference uses an auxiliary model based on an approximate and analytically tractable specification of the model. The correction of the inconsistency generated by the incorrect specification is made by Monte Carlo simulations. The principle of indirect inference can be used in this context of non-Markovian/non-semimartingale stochastic differential equations since it does not demand the exact likelihood function of the process, which is of infinite dimension in the fBm case.
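The indirect inference loop just described can be sketched in a few lines. This is a stylized toy example, not the estimator developed in this article: the "structural" model is a simple AR(1) standing in for a discretized diffusion, the auxiliary estimator is the OLS slope, and the binding function is approximated by averaging the auxiliary estimate over simulated paths.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_model(theta, n, rng):
    """Structural model (a stand-in for a discretized diffusion):
    AR(1), x_t = theta * x_{t-1} + e_t."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = theta * x[t - 1] + rng.standard_normal()
    return x

def auxiliary_estimate(x):
    """Auxiliary model: OLS slope of x_t on x_{t-1} (analytically
    tractable, possibly misspecified for the true model)."""
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# "observed" data generated from the true parameter
theta_true = 0.8
beta_data = auxiliary_estimate(simulate_model(theta_true, 1000, rng))

def binding(theta, S=20):
    """Binding function approximated by simulation: the mean auxiliary
    estimate over S paths, with common random numbers across thetas."""
    r = np.random.default_rng(123)
    return np.mean([auxiliary_estimate(simulate_model(theta, 1000, r))
                    for _ in range(S)])

# indirect inference: theta whose binding value is closest to beta_data
grid = np.linspace(0.5, 0.95, 46)
theta_hat = grid[np.argmin([(binding(th) - beta_data) ** 2 for th in grid])]
print(round(theta_hat, 2))
```

The grid search would in practice be replaced by a numerical optimizer, and the auxiliary model by the GMM specification discussed later; the mechanics of matching auxiliary estimates by simulation are the same.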
In this article, we show how to implement the indirect inference principle for stochastic dif-
ferential equations driven by fractional Brownian motion, discussing with special attention the
170
INDIRECT INFERENCE IN FRACTIONAL SHORT-TERM INTEREST RATE DIFFUSIONS 3
computational difficulties in the implementation of this estimator. As a practical application, we estimate Cox-Ingersoll-Ross models (Cox et al. (1985)) driven by a fractional Brownian motion for a set of interest rate series studied in the literature.
This article is organized as follows: in Section 2 we present a short review of the connections between the principle of no-arbitrage and semimartingales; in Section 3, we give a brief description of some properties of the fractional Brownian motion. In Section 4 we review the literature on the estimation of stochastic differential equations, and discuss the limitations of the existing estimators in the presence of an fBm. Section 5 shows evidence, obtained by Monte Carlo simulation, of the effects on usual estimators for diffusions under the influence of a fractional Brownian motion. In Section 6 we describe the proposed indirect inference estimator and the computational difficulties involved. Section 7 shows the properties of the indirect inference estimator and the GMM auxiliary model by Monte Carlo simulations. Section 8 shows the real data applications using the US short rate data, a Eurodollar rate and a series of Canadian short term interest rates. In Section 9 we present our final remarks. Sections 2 and 3 can be skipped by readers more experienced with the ideas of no-arbitrage pricing and the fractional Brownian motion.
The principle of pricing by no-arbitrage (Harrison and Kreps (1979), Harrison and Pliska (1981)) states that the price of an asset can be calculated as the price of a replicating portfolio that reproduces the discounted payoff of the stochastic process of interest. We can formalize1 this principle by defining a probability space (Ω, F, (F_t)_{t≥0}, P) with d + 1 assets: a particular asset B = (B_t)_{t≥0}, a bank account representing the risk free asset, F_{t−1}-measurable, and a vector S = (S^1, ..., S^d) of risky asset prices with dimension d, with each S^i = (S^i_t)_{t≥0} F_t-measurable. We can define the portfolio value (X^π_t)_{t≥0} at time t by the expression:
(1) X^π_t = β_t B_t + γ_t S_t.
We define a strategy π = (β, γ) as the choice, at every moment in time, of holdings in the risk free and risky assets. A strategy π is a self-financing strategy if it can be written as:
1This explanation follows Shiryaev (1999); for further mathematical details see Delbaen and Schachermayer (2006).
(2) X^π_t = X^π_0 + Σ_{k=1}^{t} (β_k ΔB_k + γ_k ΔS_k).
In no-arbitrage pricing we usually choose one of the assets as numeraire, and the discounted portfolio value X̃^π_t = X^π_t / B_t satisfies the relationship:
(3) Δ(X^π_t / B_t) = γ_t Δ(S_t / B_t).
A self-financing strategy represents an arbitrage opportunity if, for an initial capital X^π_0 = 0, we have X^π_t ≥ 0 (P-a.s.) and P(X^π_t > 0) > 0, so that the expected portfolio value satisfies EX^π_t > 0; this is equivalent to saying that the investment is a risk free profit.
Defining SF_arb as the class of self-financing strategies with arbitrage opportunities, we say that a market is arbitrage free if SF_arb = ∅, that is, if P(X^π_N = 0) = 1 whenever X^π_0 = 0. The main result of no-arbitrage pricing, known as the Fundamental Theorem of Asset Pricing2, states that a market is free of arbitrage if and only if there exists (at least one) probability measure Q equivalent to the measure P such that the discounted sequence:
(4) S̃_t = S_t / B_t
satisfies
(5) E_Q [S_t / B_t] < ∞
and
(6) E_Q [S_t / B_t | F_{t−1}] = S_{t−1} / B_{t−1}.
Equation 6 shows the main property, which is related to the concept of Market Efficiency - the portfolio expected value in a future period n, given the information in F_{n−1}, is the observed
2This definition is mathematically informal. In the rigorous statement of the Fundamental Theorem of Asset Pricing (Delbaen and Schachermayer (1994)), the existence condition for an Equivalent Martingale Measure is the validity of the condition known as No Free Lunch with Vanishing Risk. No Arbitrage and No Free Lunch with Vanishing Risk are equivalent when the sample space Ω is finite.
portfolio value in period n−1, and thus variations in the portfolio value cannot be predicted in a systematic way. If this principle is not valid, there is an arbitrage opportunity, since it is possible to create an arbitrage strategy π = (β, γ) using the predictability of the risky asset S_n.
General asset pricing by martingale methods follows this general procedure - the idea is to verify whether an Equivalent Martingale Measure exists, in other words, whether there is a change of measure that generates a martingale process, usually using the Girsanov Theorem. The most common way of obtaining an equivalent martingale measure is changing the drift of the diffusion process. Given a diffusion process:
(7) dX_t(ω) = f(X_t(ω)) dt + σ(X_t(ω)) dW_t(ω)
under the measure P, we can set a new measure Q using the Girsanov Theorem:
(8) dQ/dP (ω) = exp( − (1/2) ∫_0^t [ (f*(X_s(ω)) − f(X_s(ω))) / σ(X_s(ω)) ]^2 ds + ∫_0^t [ (f*(X_s(ω)) − f(X_s(ω))) / σ(X_s(ω)) ] dW_s(ω) ).
So Q is equivalent to P (they share the same sets of null measure). Furthermore:
(9) dW*_t(ω) = − [ (f*(X_t(ω)) − f(X_t(ω))) / σ(X_t(ω)) ] dt + dW_t(ω),
where f*(X_t(ω)) and W*_t(ω) denote the drift and the Brownian motion under the measure Q.
No-arbitrage pricing tries to find restrictions on the diffusion process that make the drift term f*(X_t(ω)) under the measure Q equal to zero. If such a measure exists and is unique, then there is only one no-arbitrage price and the market is complete3.
3This is an informal definition of the Second Fundamental Asset Pricing Theorem - see Shiryaev (1999) and Delbaen
and Schachermayer (2006).
The formal statement for infinite dimensional spaces Ω is known as the First Fundamental Asset Pricing Theorem, which says that no-arbitrage pricing is possible if and only if the discounted strategies under the measure Q are semimartingale processes (e.g. Delbaen and Schachermayer (1994)). Semimartingales are stochastic processes that can be decomposed as:
X_t = M_t + A_t,
where M_t is a local martingale and A_t is a càdlàg predictable process with finite variation, conditioned on a filtration F_{t−1}4.
The result in Delbaen and Schachermayer (1994) is clear - no-arbitrage pricing is only possible for semimartingale processes. Thus, more general classes of processes that are not semimartingales cannot, in principle, be used as price processes in finance. This limitation can be very restrictive, since there are processes that are not semimartingales with interesting features (for example, some form of dependence in the increments of the process) which could be used as price processes in finance.
A process with interesting characteristics for representing prices is the fractional Brownian motion (fBm)5. The fractional Brownian motion, introduced in Kolmogorov (1940) and formalized by Mandelbrot and van Ness (1968), is the simplest stochastic process in continuous time with long memory.
In this process the increments are stationary but not independent, and thus it is not a Markov process. To define an fBm, consider a probability space (Ω, F, P); a normalized fBm B_H(t) is characterized by its covariance structure:
(11) E(B_H(t) B_H(s)) = (1/2) ( |s|^{2H} + |t|^{2H} − |t − s|^{2H} ),  0 < H < 1.
In (11) the coefficient H is known as the Hurst coefficient, in homage to the British climatologist H. E. Hurst, one of the first researchers to study the long dependence phenomenon.
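A direct (if O(n³)) way to simulate an fBm path is to draw from the Gaussian vector with the covariance in (11) via a Cholesky factorization. The sketch below is illustrative only; the grid size and Hurst value are arbitrary choices.

```python
import numpy as np

def fbm_cholesky(n=200, H=0.7, T=1.0, seed=0):
    """Simulate fractional Brownian motion on n grid points in (0, T] by
    factorizing the covariance of equation (11):
    E[B_H(t) B_H(s)] = (|t|^{2H} + |s|^{2H} - |t - s|^{2H}) / 2."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H)
                 - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov)   # cov is positive definite for 0 < H < 1
    return t, L @ rng.standard_normal(n)

t, b = fbm_cholesky()
print(b.shape)
```

For H = 1/2 this reduces to a standard Brownian motion; for H > 1/2 the sampled increments are positively correlated, which is the long memory feature exploited later in the article.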
The fBm process has the property called self-similarity, which means that for a given a > 0 we have:
(12) F(B_H(at)) = F(a^H B_H(t)),
where F represents the distribution of the process; we can read this property as changes in the time scale having the same effect as changes in the spatial scale6.
The representation of fBm defined in Mandelbrot and van Ness (1968) uses stochastic integrals with respect to the Brownian motion W = (W_t)_{t∈R}, with W_0 = 0. For 0 < H < 1 the fBm process B_H(t) is given by:
(13) B_H(t) = c_H ( ∫_{−∞}^{0} [ (t − s)^{H−1/2} − (−s)^{H−1/2} ] dW_s + ∫_{0}^{t} (t − s)^{H−1/2} dW_s ),
with normalization constant c_H = sqrt( 2H Γ(3/2 − H) / ( Γ(H + 1/2) Γ(2 − 2H) ) ), where Γ is the Gamma function.
Recall some important fBm properties:
0 < sup_Π Σ_{t_i ∈ Π} E [ |B_H(t_{i+1}) − B_H(t_i)|^{1/H} ] < ∞.
In financial applications and in the estimation of diffusion processes, property 9 is the most important. Excluding the case H = 1/2, the Equivalent Martingale Measure construction does not work for fBm, since the process is not a semimartingale and, as seen in Delbaen and Schachermayer (1994), its presence should represent an arbitrage situation.
The existence of arbitrage in the presence of fBm was initially shown by Rogers (1997). The intuition behind the demonstration in Rogers (1997) is that we can create an arbitrage strategy by setting a long (short) position when the dependence structure in the increments indicates an increasing (decreasing) motion in the asset price; and this strategy would be exploited by transacting continuously in time.7 But recent studies show that it is possible to use fBm in financial applications, i.e., it is possible to price by no-arbitrage through the imposition of some restrictions. In the same article Rogers (1997) presents a modification of the kernel of the fBm process which avoids the existence of arbitrage strategies. Another way to build no-arbitrage processes with the fBm was introduced by Hu and Oksendal (2000), using an alternative fBm representation based on Wick-Ito-Skorohod stochastic integrals (e.g. Duncan et al. (2000)). However, these representations do not have any economic meaning.
More intuitive ways to obtain no-arbitrage with the fractional Brownian motion involve placing restrictions on the set of possible strategies (Cheridito (2003) and Jarrow et al. (2007)); and, as proved by Guasoni (2006), in the presence of transaction costs the fBm is free of arbitrage. In Cheridito (2003) and Jarrow et al. (2007) a simple way to obtain no-arbitrage is placing the restriction that agents cannot trade continuously; in particular, this restriction rules out the arbitrage strategy in Rogers (1997). In the presence of transaction costs (Guasoni (2006)), arbitrage strategies in fBm would incur infinite costs. Another important result is obtained in Kluppelberg and Kuhn (2004), who show that asset prices formulated as Poisson shot noise processes converge weakly to fractional Brownian motion processes, and who obtain an arbitrage free representation for this model.
In our particular problem of interest rate modeling, the main results are obtained by Ohashi (2009). In that article, no-arbitrage representations for fBm processes are obtained in the Heath-Jarrow-Morton framework (Heath et al. (1992)), using the proportional transaction costs methodology of Guasoni (2006). As the Heath-Jarrow-Morton class contains the short rate models as particular cases (in particular, the Cox-Ingersoll-Ross models used in this article), we have compatibility between the stochastic differential equation processes used in interest rate modeling and the fractional Brownian motion.
7A detailed discussion about the arbitrage problem with fBm can be found in Mishura (2008).
We can define the short interest rate process, measured in continuous time, using a stochastic differential equation in the general form:
(14) dX_t = a(t, X_t) dt + b(t, X_t) dW(t),
where a(t, X_t) is the drift process, b(t, X_t) is the diffusion (volatility) process and W(t) is the standard Brownian motion. Usually, we work with expressions for the drift and the volatility which depend on a set of parameters. When the parameters are constant in time, the process is time homogeneous, and with time varying parameters we have a heterogeneous process. In heterogeneous interest rate models we usually do not use statistical estimation of parameters, but rather calibration, where the model parameters are adjusted to reproduce variables observed in the market, like discount curves or derivative prices.8
Estimation of stochastic differential equations refers to the procedure of estimating the process parameters based on observed paths of the process. Note that for financial processes we have a fundamental problem - although the process is set in continuous time, financial data come in discretely observed samples. For example, interest rate data are usually presented at daily or monthly frequencies.
The discrete sampling is a fundamental problem, since using only discrete data for model estimation when the model is constructed in continuous time creates an incorrect specification problem, which usually leads to estimator inconsistency. The construction of econometric estimators for continuous time processes is the building of consistent estimators using discrete data, and in the literature we have a wide range of methodologies that deal with this problem. Here is a list of some methodologies proposed in the literature:
8See Brigo and Mercurio (2006), chapters 4, 6 and 7 for more information about these procedures.
Estimators can be clustered in many categories. For example, we could group estimators into methods that use exact or approximate likelihood functions (1, 2, 4, 5, 6, 12, 13); methods that use moment conditions generated by the diffusion processes (3, 7, 8, 9); non parametric approaches (8, 11); and estimators based on Monte Carlo simulations (5, 7, 8, 9, 12, 13) as opposed to estimators based on analytical formulas (1, 2, 14).
Estimators based on the likelihood function can be formulated based on an exact discretization of the process, i.e. the distribution of the discrete process X_t is known and coincides with the distribution of the continuous time process. This is the case of the Geometric Brownian motion (e.g. Campbell et al. (1997)). Another way is using analytical forms for the transition density of the process f(X_{t+Δt}|X_t), like the analytical forms obtained by Aït-Sahalia (1999) or the approach using Hermite expansions obtained by Aït-Sahalia (2002). In these cases the estimator is based on closed formulas, but there are techniques that evaluate the likelihood function using simulation, like Simulated Maximum Likelihood, which uses simulated paths generated by Euler or Milstein discretizations (Pedersen (1995)), or Bayesian estimation using Markov Chain Monte Carlo or Particle Filters (e.g. Johannes and Polson (2005)). In diffusion processes with stochastic volatility, a common technique is Quasi-maximum Likelihood (Lund and Andersen (1997)) using the Kalman Filter, where the likelihood function used in the estimation does not match the true likelihood and the estimator, although biased, has the minimum mean squared error property.
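In the Geometric Brownian motion case mentioned above, the exact discretization yields i.i.d. Gaussian log-returns, so the maximum likelihood estimator is available in closed form. The sketch below is illustrative, with hypothetical parameter values:

```python
import numpy as np

def gbm_mle(prices, dt):
    """Exact MLE for geometric Brownian motion dS = mu*S dt + sigma*S dW:
    log-returns are i.i.d. N((mu - sigma^2 / 2) * dt, sigma^2 * dt)."""
    x = np.diff(np.log(prices))
    sigma2 = x.var() / dt                 # MLE of sigma^2
    mu = x.mean() / dt + 0.5 * sigma2     # MLE of mu
    return mu, sigma2

# simulate a GBM path with its exact transition and recover the parameters
rng = np.random.default_rng(3)
mu0, sig0, dt, n = 0.1, 0.2, 1 / 252, 252 * 40
z = rng.standard_normal(n)
logS = np.cumsum((mu0 - 0.5 * sig0 ** 2) * dt + sig0 * np.sqrt(dt) * z)
S = 100 * np.exp(np.concatenate([[0.0], logS]))
mu_hat, sig2_hat = gbm_mle(S, dt)
print(round(sig2_hat, 3))
```

The volatility is recovered precisely, while the drift estimate remains noisy even over long samples; this asymmetry is a standard feature of discretely sampled diffusions.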
Estimators using moment conditions are widely used in the estimation of stochastic differential equations. Estimation by the Generalized Method of Moments using a discretization of the process is perhaps the most common methodology in the econometric estimation of stochastic differential equations (e.g. Chan et al. (1992)). We also have moment conditions derived from the infinitesimal generator of the Markov process (Hansen and Scheinkman (1995)). In cases where the theoretical moments are not known, a very useful methodology is the Simulated Method of Moments (McFadden (1989)), where estimators are defined as solutions of minimum distance
between simulated moments and sample moments. In this category of moment-based estimators we can also place estimators based on estimating equations (Kessler (1997), Kessler (2000)) and the methodology of martingale estimating equations (Bibby and Sorensen (1995), Bibby et al. (2007)).
Nonparametric estimators for diffusion processes were discussed in Aït-Sahalia (1996), where drift and volatility are estimated through kernel regressions and the transition density of the process is estimated by nonparametric density estimators. Another application of nonparametric methods is the construction of an auxiliary model for estimation by the Efficient Method of Moments (Gallant and Tauchen (1989) and Gallant and Tauchen (2001)), using series expansions in Hermite polynomials.
Another way to define estimators is through approximate discretizations of the stochastic integrals that define the solution of the stochastic differential equation. An application to the estimation of a generalized Cox-Ingersoll-Ross process can be found in Bishwal (2007).
For processes driven by fBm, the estimators just discussed are not directly applicable, because the fBm process is neither a semimartingale nor a Markov process. To make these problems clearer, note that the Markov property states that, conditional on the filtration F_t, future and past realizations of the process are independent, so that the distribution of the process satisfies

P(X_{t+s} \in A \mid F_t) = P(X_{t+s} \in A \mid X_t)

for any s, t > 0. A necessary condition for the process to be Markovian is that its transition density p satisfies the Chapman-Kolmogorov equation:
(16) p(y, t_3 \mid x, t_1) = \int_{z \in \Omega} p(y, t_3 \mid z, t_2)\, p(z, t_2 \mid x, t_1)\, dz,
for all t_3 > t_2 > t_1 and states x, y \in \Omega; equivalently, the transition density should obey the forward and backward Kolmogorov equations (e.g. Definition 5.1.1 in Karatzas and Shreve (1987)).
If the Markov property is satisfied, we can write the likelihood function of the process as:

(17) L_N(X \mid \theta, \Delta t) \equiv \sum_{t=1}^{T} \log p(X_t \mid X_{t-\Delta t}, \theta, \Delta t),
where θ is the parameter vector. When the Markov property is valid, we can write the likelihood function as the product of the transition densities of the process, each depending only on the immediate past; we are thus assuming that the increments of the process carry no dependence beyond the most recent observation.
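When the driving noise is a standard Brownian motion (H = 1/2), the CIR model introduced in Section 4.2 has a known transition density (a scaled noncentral chi-square), so the Markov log-likelihood (17) can be evaluated term by term. A minimal Python sketch, with an illustrative function name:

```python
import numpy as np
from scipy.stats import ncx2

def cir_loglik(x, kappa, mu, sigma, dt):
    """Markov log-likelihood (17) for the standard CIR process: each term
    conditions only on the immediately preceding observation, using the
    exact noncentral chi-square transition density."""
    c = 2 * kappa / (sigma**2 * (1 - np.exp(-kappa * dt)))
    df = 4 * kappa * mu / sigma**2                 # degrees of freedom
    nc = 2 * c * x[:-1] * np.exp(-kappa * dt)      # noncentrality, one per step
    # X_{t+dt} given X_t equals an ncx2 variable divided by 2c (Jacobian log(2c))
    return np.sum(ncx2.logpdf(2 * c * x[1:], df, nc) + np.log(2 * c))
```

This factorization is exactly what breaks down once the innovations are fBm with H ≠ 1/2: each term would then have to condition on the whole past of the process.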
The link between the Markov property and semimartingales can be seen in Theorem 5.4.20 of Karatzas and Shreve (1987), which states that the strong Markov property is obtained when the drift and volatility coefficients are bounded on compact subsets of R^d and the solution of the time-homogeneous martingale problem (Definition 5.4.15 of Karatzas and Shreve (1987)) is well defined. In the presence of fBm instead of a standard Brownian motion, however, the time-homogeneous martingale problem is not well formulated (the candidate martingale has expected value different from zero in the presence of fBm), preventing the Markov construction of the likelihood function.
In processes with dependent increments, the density of the process depends not only on the most recent past but on the whole history of the process, and thus each X_t in the likelihood function must, in principle, be conditioned on every earlier observation. For short-range dependent processes we have the asymptotic independence property associated with exponential mixing (exponential α-mixing and ρ-mixing, e.g. Genon-Catalot et al. (1992)), but for long-memory processes, as in the case of fBm with H > 1/2, this property cannot be used.
The violation of the Markov property makes maximum likelihood estimation of continuous-time processes a difficult issue, given the complexity of evaluating the likelihood function. This problem is analogous to the estimation of long-memory processes from discrete samples (e.g. estimation of fractionally integrated ARFIMA(p,d,q) processes using an exact formula via the Durbin-Levinson algorithm, e.g. Palma (2007)), since that algorithm would
depend on an infinite past of discrete observations; for continuous-time long-memory processes, the transition density must be conditioned on the entire continuous past of the process.
The violation of the Markov property does not affect only likelihood-based estimators. The estimator of Hansen and Scheinkman (1995), for example, is built on the infinitesimal generator of a Markov process, and even nonparametric estimators such as the kernel regression estimator of Aït-Sahalia (1996) would have to be modified to account for the influence of all past observations of the process.
The fact that fBm is not a semimartingale has serious consequences for the properties of most estimators. Estimators for continuous-time processes usually rely on asymptotic results based on convergence and central limit theorems for semimartingales (e.g. Jacod and Shiryaev (2002)), and some estimators are directly based on the martingale property, such as those of Bibby and Sorensen (1995) and Bibby et al. (2007). Likewise, there are no known results for estimators based on discrete approximations of the stochastic integral with respect to fBm; in this case one cannot use Itô stochastic calculus, but must instead resort to the tools of fractional stochastic calculus (e.g. Bishwal (2007) and Mishura (2008)).
In this context, estimation for non-Markovian processes is most easily implemented with simulation-based methodologies, such as the principle of Indirect Inference, the Simulated Method of Moments and the Efficient Method of Moments. In this article we discuss the Indirect Inference implementation (Section 6), but a possible alternative, briefly discussed below, is the Simulated Method of Moments.
The Simulated Method of Moments defines estimators for the parameter θ through the minimization of the following distance9:

(18) \hat{\theta} = \arg\min_{\theta} \left[\frac{1}{T}\sum_{t=1}^{T} X_t - \frac{1}{T}\sum_{t=1}^{T} X_t^{s}(\theta)\right]^{2},

where X_t are the observed trajectories of the process and X_t^s(θ) are trajectories simulated under the parameter vector θ; the estimator converges to the true solution as the number of simulated observations grows to infinity.
9 See Gourieroux and Monfort (1996) and Singleton (2006) for more complete references on the Simulated Method of Moments.
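A minimal sketch of this estimator, matching the first sample moment as in (18); the path simulator is passed in as a user-supplied function, and its interface (`simulate(theta, T, rng)`) is an assumption of this illustration:

```python
import numpy as np
from scipy.optimize import minimize

def smm_estimate(x_obs, simulate, theta0, n_sim=10, seed=0):
    """Simulated Method of Moments: choose theta to minimize the squared
    distance (18) between the sample mean of the observed series and the
    mean of paths simulated under theta."""
    T = len(x_obs)
    m_obs = np.mean(x_obs)

    def criterion(theta):
        rng = np.random.default_rng(seed)  # common random numbers across theta
        m_sim = np.mean([np.mean(simulate(theta, T, rng)) for _ in range(n_sim)])
        return (m_obs - m_sim) ** 2

    return minimize(criterion, theta0, method="Nelder-Mead").x
```

Re-seeding the generator at each evaluation (common random numbers) keeps the criterion a smooth function of θ, which matters for the numerical minimizer.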
In this case, we can define estimators for stochastic differential equations driven by fBm through conditions derived from sample moments (especially moments of the covariance function of the fBm) and simulated moments from stochastic differential equations driven by fBm processes. This method is yet to be explored and, since the estimation by Indirect Inference involves some computational difficulties, it can be an interesting alternative. The Efficient Method of Moments is a refinement of the principle of Indirect Inference, and its use is discussed in Section 6.
4.2. Fractional Cox-Ingersoll-Ross. To anchor our discussion, we work with a specific short-term interest rate model, although the discussion is, in principle, valid for any model with directly observed components10. A simple model with interesting properties is the Cox-Ingersoll-Ross model (Cox et al. (1985)), abbreviated CIR in this text. The CIR model is a single-factor model with analytical expressions for the transition density of the process (Aït-Sahalia (1999)) and closed formulas for bond and option pricing. The process is given by the following stochastic differential equation:
(19) dX_t = \kappa(\mu - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t,
where µ, κ and σ are interpreted as the long-run mean, the mean-reversion speed and the volatility, respectively, and W_t is a standard Brownian motion. The positivity condition of the process is 2\kappa\mu > \sigma^2. Note that in this process the volatility changes over time, with quadratic variation d\langle X \rangle_t = \sigma^2 X_t\, dt.
We define the fractional Cox-Ingersoll-Ross process (CIR-fBm) as the diffusion

(20) dX_t = \kappa(\mu - X_t)\,dt + \sigma\sqrt{X_t}\,dB_H(t),

where B_H(t) is an fBm with Hurst coefficient H, as defined by equation (3.4). In this process we combine the dependence structure of the CIR process with increments generated by an fBm process, allowing for long memory when H > 1/2 alongside the short-memory dynamics given by the mean-reversion component in the drift. Until the present date, this model had not been studied
10 Models with unobservable components, such as stochastic volatility, could be treated with some changes concerning the evaluation of the moments of the unobservable components (e.g. Gourieroux and Monfort (1996)).
in the literature (e.g. Bishwal (2007)); the closest model in the literature is the fractional Ornstein-Uhlenbeck model, which can be represented as

(21) dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dB_H(t),

a restricted form of the CIR-fBm without the square-root component in the volatility. The properties of this model are compiled in Bishwal (2007). It is important to note that inference methodologies for this Ornstein-Uhlenbeck process are also based on continuous sampling.
The first step is to study the behavior of estimators of stochastic differential equations driven by fBm when those estimators are designed for standard Brownian innovations. Since we are then working with misspecified models, we conduct a Monte Carlo study of three estimators commonly used in the estimation of diffusion processes.
The Monte Carlo experiment has the following structure: for a grid of values of the Hurst coefficient H ranging from .5 to .90, in increments of .05, we simulate 1000 replications11 of a CIR process given by equation (19), with parameters µ = .1, κ = .5 and σ = .3, and sample sizes 200 and 500. Since no transition density is known for a CIR process with increments given by fBm, we simulate each trajectory using an Euler-Maruyama discretization with [t_{i+1} - t_i] = .1 for the stochastic differential equation given by equation (20).
For a diffusion of this form, the Euler discretization is:

(23) \hat{X}_{t+\Delta t} = \hat{X}_t + a(\hat{X}_t)\,\Delta t + b(\hat{X}_t)\sqrt{\Delta t}\,Y_t,
where Y_t is the increment of the error process over the interval ∆t. From these simulations, we estimate the parameter vector (µ, κ, σ) for each simulated trajectory using three estimators: exact
11 The simulation of the diffusion process driven by fractional Brownian motion is detailed in Section 6.
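A minimal Euler-Maruyama sketch of the discretization (23) for the CIR-fBm case, taking the driving-noise increments as an input (their generation is discussed in Section 6.2.1); the positivity clip is an ad-hoc numerical safeguard, not part of the model:

```python
import numpy as np

def euler_cir(x0, kappa, mu, sigma, dt, dB):
    """Euler discretization of dX = kappa*(mu - X)dt + sigma*sqrt(X)dB,
    with dB a precomputed array of driving-noise increments (Gaussian for
    standard Brownian motion, fBm increments otherwise)."""
    x = np.empty(len(dB) + 1)
    x[0] = x0
    for t in range(len(dB)):
        drift = kappa * (mu - x[t]) * dt
        diffusion = sigma * np.sqrt(max(x[t], 0.0)) * dB[t]
        x[t + 1] = max(x[t] + drift + diffusion, 1e-8)  # ad-hoc positivity clip
    return x
```

With standard Gaussian increments this reproduces the CIR simulation used in the Monte Carlo study; feeding in fBm increments gives the CIR-fBm trajectories.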
Table 1. Mean of the CIR-fBm model parameter estimates based on 1000 replications - Sample size 200, simulated values µ = 0.1, κ = 0.5 and σ = 0.3.
Table 2. Mean of the CIR-fBm model parameter estimates based on 1000 replications - Sample size 500, simulated values µ = 0.1, κ = 0.5 and σ = 0.3.
maximum likelihood using the transition density of the Cox-Ingersoll-Ross process, maximum likelihood based on the Euler discretization of the process, and GMM applied to the discretized process.
Tables 1 and 2 show the means of the estimates over the 1000 simulations for each sample size and estimation method. Note that it is hard to identify a systematic bias caused by the fBm in these estimators. The common result is that the volatility parameter is always overestimated, while the persistence parameter and the long-term mean are overestimated by some methods and underestimated by others. For example, the κ parameter is overestimated by the transition-density estimation, while it is underestimated by the GMM methods. The effect on the long-term mean µ is analogous to the behavior observed for κ.
It is also interesting to note that for larger values of the Hurst coefficient the estimation becomes more difficult: for values larger than .90 it is very difficult to obtain a convergent estimation with any method; in samples of size 500, most of the estimations for this parameter value do not converge, and so we do not report results for this value.
Nonetheless, the main result of the Monte Carlo study is to show that the usual estimators are severely affected by the presence of fBm. For the reasons discussed in Section 4.1, the fact that fBm is neither a Markov process nor a semimartingale rules out the most common estimators for fBm-driven processes, and this is evident in the simulation study, even though the study is limited (the bias can differ for other parameter sets).
As discussed in Section 4.1, there are situations where the likelihood function has no analytical form or cannot easily be evaluated. Examples are models with latent variables, such as stochastic volatility models, endogenous regime switching, or discrete choice models with serial dependence. In all of these cases, likelihood evaluation involves integrating out all the latent factors14, an integral whose dimension grows with the number of latent observations and which is infeasible to evaluate directly.
14See Gourieroux and Monfort (1996) for discussions about possible applications.
The Indirect Inference method (Gourieroux et al. (1993), Smith (1993) and Gallant and Tauchen (1996)) is based on the construction of a consistent estimator that corrects the bias of the instrumental model through Monte Carlo simulation. The Indirect Inference procedure, in the notation of maximum likelihood estimation, can be formulated in three steps:
(1) Estimation of the parameters of the instrumental (auxiliary) model using the sample observations:

(24) \hat{\theta}_{aux} = \arg\max_{\theta} \sum_{t=1}^{T} \log f_t(X_t \mid \theta).
(2) Simulation of a trajectory of the true model conditioned on the parameter vector estimated with the auxiliary model, creating an artificial series X_t^s of size T, and estimation of the auxiliary model on the artificial series:

(25) \hat{\theta}_{sim} = \arg\max_{\theta} \sum_{t=1}^{T} \log f_t(X_t^s \mid X_{t-\Delta t}^s, \theta).
(3) Estimation of the consistent parameter vector θ through calibration of the bias of the estimated vector \hat{\theta}_{aux}, by minimizing the criterion function:

(26) \hat{\theta}_{II} = \arg\min_{\theta}\, (\hat{\theta}_{aux} - \hat{\theta}_{sim}(\theta))^{\top}\, \Omega\, (\hat{\theta}_{aux} - \hat{\theta}_{sim}(\theta)),

where Ω is a positive definite weight matrix. This step is usually performed with a numerical minimization algorithm computing \hat{\theta}_{II} = \lim_{p\to\infty} \hat{\theta}_{II}^{p}, where \hat{\theta}_{II}^{p} is the p-th iteration given by \hat{\theta}_{II}^{p} = h(\hat{\theta}_{aux}, \hat{\theta}_{sim}^{p}(\theta)), h being the updating algorithm for the criterion function.
Note that we can replace the maximum likelihood estimators by other methods. In this article
we use GMM estimators in steps (1) and (2) of the Indirect Inference procedure.
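The three steps above can be sketched generically; `fit_aux` and `simulate` are illustrative interfaces for the auxiliary-model estimator and the structural-model simulator, and the identity matrix stands in for the weighting matrix Ω of (26):

```python
import numpy as np
from scipy.optimize import minimize

def indirect_inference(x_obs, fit_aux, simulate, theta0, n_sim=20, seed=0):
    """Steps (1)-(3): fit the auxiliary model on the data, then search for
    structural parameters whose simulated paths reproduce the auxiliary
    estimates; identity matrix used for Omega for simplicity."""
    beta_obs = fit_aux(x_obs)                        # step (1)
    T = len(x_obs)

    def criterion(theta):
        rng = np.random.default_rng(seed)            # common random numbers
        beta_sim = np.mean([fit_aux(simulate(theta, T, rng))
                            for _ in range(n_sim)], axis=0)  # step (2)
        d = beta_obs - beta_sim
        return float(d @ d)                          # step (3) with Omega = I

    return minimize(criterion, theta0, method="Nelder-Mead").x
```

Averaging the auxiliary estimates over `n_sim` simulated paths per evaluation mirrors the use of 20 simulations per criterion evaluation described in Section 6.3.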
The asymptotic distribution of the Indirect Inference estimator is given by:

(27) \sqrt{T}\,(\hat{\theta}_{II} - \theta_0) \xrightarrow{d} N\!\left[0,\, W(S,\Omega)\right],

with

(28) W(S,\Omega) = \left(1+\frac{1}{S}\right) \left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1} \frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\Omega^{*-1}\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0) \left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1},
where S is a scale factor linked to the number of moment conditions, the binding function b(\theta_0) = E_{\theta_0} k(X_t) sets the distance between the marginal moment of X_t and its expected value, with the function k defining the parameter function to be estimated and the distance metric in equation (24), and \Omega^{*} = J_0 I_0^{-1} J_0, with J_0 given by:
(29) J_0 = \operatorname{plim}_{T} \left(-\frac{\partial^2 \psi(X_t;\theta)}{\partial\beta\,\partial\beta^{\top}}\left[X_t;\, b(\theta_0)\right]\right),

(30) I_0 = \lim_{T\to\infty} V\!\left(\sqrt{T}\,\frac{\partial \psi(X_t;\theta)}{\partial\beta}\left[X_t;\, b(\theta_0)\right] - E_{\theta_0}\!\left[\sqrt{T}\,\frac{\partial \psi(X_t;\theta)}{\partial\beta}\right]\right),
where β is the pseudo-true value of k(X_t) and the criterion function ψ(X_t; θ) is given by:

(31) \psi(X_t;\theta) = -\left(\frac{1}{T}\sum_{t=1}^{T} k(X_t) - \beta\right)^{\!\top} \left(\frac{1}{T}\sum_{t=1}^{T} k(X_t) - \beta\right).
This result shows that the estimators are consistent and asymptotically Gaussian15. In the case of a just-identified model (the number of auxiliary model parameters equals the number of parameters of the structural model), W(S, Ω) reduces to:

(32) W(S,\Omega) = \left(1+\frac{1}{S}\right)\left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega^{*-1}\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1}.
In our problem, Indirect Inference is used to correct the inconsistency generated not only by the dependent increments of the fBm process, but also by the use of an approximate discretization of the process, as is usual in inference for continuous-time processes from discrete data.
Note that the fundamental requirement for the consistency of the Indirect Inference estimator is the use of simulated trajectories from the true model (e.g. Gourieroux and Monfort (1996)). In our problem this condition is met through the use of consistent discretizations of the fBm diffusion process. As discussed in Section 6.1, the Euler discretization used in the simulation step is consistent, and thus the consistency of the estimator is obtained as the discretization interval converges to zero.
15For the proof of this property see Gourieroux and Monfort (1996), appendix 4A.
6.1.1. Auxiliary Model Choice. The choice of the auxiliary model rests on three main points. First, the auxiliary model must satisfy an identification restriction, i.e., it must be possible to recover the structural model parameters from the auxiliary model. The question is analogous to the estimation of structural parameters through a reduced-form model: the number of parameters (and therefore of moment conditions) of the auxiliary model must be equal to or higher than the number of parameters of the structural model.
Second, the efficiency of the procedure depends on the quality of fit of the auxiliary model. The Efficient Method of Moments (Gallant and Nychka (1987), Gallant (2007)) is a refinement of the Indirect Inference method that obtains efficient estimators through the use of a nonparametric auxiliary model, employing a structure with a larger number of parameters and thus a very good fit to the observed sample. However, there is evidence that the small-sample behavior of the Efficient Method of Moments can be worse than that of simpler Indirect Inference estimators, as the studies of Chumacero (1997), Michaelides and Ng (2000), Ghysels et al. (2003) and Zivot and Czellar (2008) show.
Finally, the third point is that estimation of the auxiliary model must be fast, since each new step of the calibration process involves a re-estimation for each simulated sample, and computational cost can limit the application of the procedure. Given the complexity of implementing the Efficient Method of Moments, in this study we work only with the principle of Indirect Inference.
In our problem, a natural candidate for the auxiliary model for the estimation of (µ, κ, σ) is GMM based on the Euler discretization of the CIR process, given by equation (23). Estimation by GMM is convenient because the optimal weighting matrix can be set using methods robust to heteroscedasticity and serial correlation, increasing the efficiency of the auxiliary model estimation. We could instead use as auxiliary model the likelihood based on the transition density of the CIR process (Aït-Sahalia (1999)), but this procedure proved quite unstable in the estimation phase with simulated data, especially for larger Hurst coefficients; the GMM method was chosen for the asymptotic efficiency of the procedure and because it is more stable in estimation.
To build the GMM estimator, we reparameterize the drift κ(µ − x_t)dt as (α + βx_t)dt, with α = κµ and β = −κ. We can then formulate the conditions needed for the estimation of the parameters (α, β, σ²) by defining \varepsilon_{t+\Delta t} = (r_{t+\Delta t} - r_t) - (\alpha + \beta r_t)\Delta t, which gives the following four moment conditions:

(33) g(\theta) = \begin{bmatrix} \varepsilon_{t+\Delta t} \\ \varepsilon_{t+\Delta t}\, r_t \\ \varepsilon_{t+\Delta t}^2 - \sigma_0^2\, r_t\, \Delta t \\ (\varepsilon_{t+\Delta t}^2 - \sigma_0^2\, r_t\, \Delta t)\, r_t \end{bmatrix}.
The GMM estimates are obtained with the Iterated GMM estimator (Hansen et al. (1996)). This procedure estimates only the parameters (µ, κ, σ). To obtain an estimator for the parameter H, we use an estimation procedure based on wavelet decomposition (e.g. Percival and Walden (2000), Palma (2007)), applied to the residual series of the CIR model estimated by GMM.
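A minimal sketch of the sample analogues of the moment conditions (33); for estimation, these averages would be stacked into the usual GMM quadratic form with a weighting matrix:

```python
import numpy as np

def cir_moments(r, alpha, beta, sigma2, dt):
    """Sample averages of the four moment conditions (33) for the Euler
    discretization of the CIR process, with the drift reparameterized as
    (alpha + beta*r)dt, alpha = kappa*mu, beta = -kappa."""
    eps = (r[1:] - r[:-1]) - (alpha + beta * r[:-1]) * dt
    v = eps**2 - sigma2 * r[:-1] * dt   # E[eps^2 | r_t] = sigma^2 r_t dt for CIR
    return np.array([eps.mean(),
                     (eps * r[:-1]).mean(),
                     v.mean(),
                     (v * r[:-1]).mean()])
```

At the true parameter values each entry should be close to zero in large samples.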
To define this estimator, note that a wavelet is a function ψ(t) with the property:

(34) \int \psi(t)\,dt = 0.

We define a family of dilations and translations of the wavelet function ψ by the expression:

(35) \psi_{jk}(t) = 2^{-j/2}\,\psi(2^{-j}t - k),

with the wavelet coefficients of a function y(t) given by:

(36) d_{jk} = \int y(t)\,\psi_{jk}(t)\,dt,

and the orthogonality property:

(37) \int \psi_{ij}(t)\,\psi_{kl}(t)\,dt = 0 \quad \forall\,(i,j) \neq (k,l).
The main advantage of using an orthogonal basis is that any real function can be expressed as:
(38) y(t) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} d_{jk}\,\psi_{jk}(t).
To obtain an estimator for the H parameter, we define \hat{u}_j from the discrete transform coefficients d_{jk}:

(39) \hat{u}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} d_{jk}^2,

which is approximately distributed as

(40) \hat{u}_j \sim \frac{2^{2dj}}{n_j}\, \chi^2_{n_j},
where n_j is the number of wavelet coefficients at level j of the discrete transform and d = H − 1/2. Taking logarithms of (40) gives

(41) \log_2 \hat{u}_j = \log_2 c + 2dj + \log_2\!\left(\chi^2_{n_j}/n_j\right),

with E(\log \chi^2_n) = \psi(n/2) + \log 2 and Var(\log \chi^2_n) = \zeta(2, n/2), where ψ(z) = d/dz \log Γ(z) is the digamma function and ζ(·,·) is the generalized Riemann (Hurwitz) zeta function. Setting E(ε_j) = 0 and Var(ε_j) = \zeta(2, n/2)/\log(2)^2, the H parameter can be estimated through the following linear regression:

(42) y_j = \alpha + \beta x_j + \varepsilon_j,

where y_j = \log_2 \hat{u}_j − g_j, x_j = j, α = \log_2 c and β = 2(H − .5). The variance of \hat{H} is given by Var(\hat{\beta})/4 (Palma (2007), pg. 92). In this estimation, and in the fBm simulation procedure, we use the wavelet decomposition given by the family of Daubechies-10 wavelets. With the estimate of H we complete the auxiliary model parameter vector \hat{\theta}_{aux} = (\hat{\mu}, \hat{\kappa}, \hat{\sigma}^2, \hat{H}).
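The estimator can be sketched with a plain Haar transform standing in for the Daubechies-10 filter used in the text (the filter choice affects the bias of the estimator, not the structure of the log-variance regression):

```python
import numpy as np

def hurst_wavelet(y, j_min=1, j_max=None):
    """Log-variance regression estimator of H: regress log2 of the wavelet
    variance u_j (39) on the level j; for fGn-like series the slope is
    approximately 2H - 1, as in regression (42)."""
    if j_max is None:
        j_max = int(np.log2(len(y))) - 3
    d = np.asarray(y, dtype=float)
    levels, logvar = [], []
    for j in range(1, j_max + 1):
        m = len(d) - (len(d) % 2)                      # trim to even length
        detail = (d[1:m:2] - d[0:m:2]) / np.sqrt(2.0)  # Haar detail coefficients
        d = (d[1:m:2] + d[0:m:2]) / np.sqrt(2.0)       # Haar approximation
        if j >= j_min:
            levels.append(j)
            logvar.append(np.log2(np.mean(detail**2)))
    slope = np.polyfit(levels, logvar, 1)[0]           # estimate of 2H - 1
    return (slope + 1.0) / 2.0
```

In the auxiliary model this regression would be applied to the residual series of the GMM-estimated CIR model, as described above.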
6.2. Simulation step. In the simulation step we must simulate trajectories of the CIR-fBm process with the parameters estimated in the first step. As discussed in Section 5, there is neither a transition density nor an exact discretization for this process, so we simulate through the Euler discretization given by equation (22). In this step we detail the simulation of the fBm trajectory.
6.2.1. fBm simulation using wavelets. The simulation of fractional Brownian motion is nontrivial, and there is a range of possible methods16. Some methods are exact (Hosking; Cholesky; the Davies and Harte method), and some are approximate (such as methods based on the stochastic integral representation of fBm or on wavelets). Exact methods are extremely slow and memory intensive, and since in our problem we must run repeated simulations at each step of the optimization procedure, simulation speed is a relevant issue. The most computationally efficient way of simulating fBm trajectories is through a process known as wavelet synthesis, which is the form chosen for this article. Recalling the definitions of the wavelet function (Equations 34-38), the discrete wavelet transform of an fBm is given by the following expression:
(43) d_{B_H}(j,k) = \int B_H(t)\,\psi_{jk}(t)\,dt.
Our interest is not to decompose an fBm, but to synthesize a trajectory of the process. A possible way (Dieker (2004)) is through the following expression for the value of the fBm process at time t:

B_H(t) = \lim_{\mathcal{J}\to\infty} \sum_{j=-\infty}^{\mathcal{J}} \sum_{k\in\mathbb{N}} \hat{d}_{B_H}(j,k)\, 2^{-j/2}\, \psi(2^{-j}t - k),
where \hat{d}_{B_H}(j,k) are Gaussian random variables with variance \sigma^2 2^{j(2H+1)}. Using this method we obtain trajectories of the fBm process, and can then create simulated trajectories of the CIR-fBm process using the Euler discretization. Note that the discretization is an approximation to the solution of the diffusion process, and therefore the consistency of our estimators is conditional on the validity of this approximation as ∆t converges to zero; see Kloeden and Platen (1992) for a detailed discussion. The properties of the Euler approximation for stochastic differential equations driven by fBm were studied in Nourdin (2005) and Neuenkirch and Nourdin (2006), who show that Euler approximations are consistent and that the approximation error is almost surely equivalent to a process of the form \delta^{2\alpha}\xi_t, with the analytical form of ξ_t given explicitly; this consistency of the Euler discretization ensures the consistency of the Indirect Inference estimator. In Mishura (2008) the convergence study is extended to more general approximations of fBm processes.
16 A detailed discussion of other methodologies for the estimation and simulation of fBm processes can be found in Dieker (2004); for a reference on the simulation of long-memory processes using wavelet synthesis, see Percival and Walden (2000).
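For comparison with wavelet synthesis, the exact Davies-Harte method mentioned above is compact to implement via circulant embedding of the fGn autocovariance with the FFT; a sketch:

```python
import numpy as np

def fgn_davies_harte(n, H, rng):
    """Exact simulation of fractional Gaussian noise (unit-spaced fBm
    increments) by circulant embedding of the fGn autocovariance
    (Davies-Harte); np.cumsum of the output gives an fBm trajectory."""
    k = np.arange(n + 1)
    # autocovariance gamma(k) of fGn with Hurst index H
    gamma = 0.5 * (np.abs(k - 1)**(2 * H) - 2 * k**(2.0 * H) + (k + 1)**(2 * H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])  # circulant first row, len 2n
    lam = np.maximum(np.fft.fft(row).real, 0.0)    # eigenvalues; clip round-off
    m = len(row)
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    return (np.fft.fft(np.sqrt(lam) * z).real / np.sqrt(m))[:n]
```

With the FFT this exact method runs in O(n log n), so the speed gap relative to approximate methods is much smaller than for the Hosking or Cholesky recursions.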
To complete this step, for each simulation performed we re-estimate the model parameters using the auxiliary model defined previously.
6.3. Optimization Step. The Indirect Inference estimator is obtained by minimizing the distance, defined in Equation (26), between the auxiliary model estimates obtained from the observed data and from the simulated data.
The computational implementation of steps 2 and 3 requires some restrictions. The first restriction we must impose is that the simulated trajectory corresponds to a series of positive interest rates: if a simulation produces negative rates at some points, we draw a new fBm trajectory until all rates are positive, in a rejection sampling procedure. We also carry out the minimization under restrictions, imposing the positivity condition of the CIR model and, in addition, constraining the Hurst coefficient to the interval between 0 and 1.
An additional detail, as discussed previously, is that convergence problems are frequent for higher values of the Hurst coefficient, and the procedure is quite fragile in those regions. Another point is the choice of the number of simulations for each evaluation of the criterion function; we found that 20 simulations per evaluation generate satisfactory results. Note that the estimator is computationally intensive.
The numerical minimization algorithm chosen is the Nelder-Mead algorithm, which proved more efficient and more robust to initial-value problems than alternatives such as BFGS or DFP.
An important step in the simulation is to perform a burn-in, discarding initial observations to cancel their influence. Discarding a large number of initial observations is especially important because, due to the long-memory structure of the simulated fBm, the influence of the starting values persists over a long stretch of the trajectory.
Given the difficulties already discussed in the implementation of the Indirect Inference estimator, we performed a series of Monte Carlo experiments to study its properties. In the first experiment we used as configuration the parameter vector (µ=.03, κ=.7, σ²=.05, H=.6) and performed the estimation by Iterated GMM (the auxiliary model) and by Indirect Inference.
In this configuration we performed the estimation under 4 settings, using discretization sizes ∆t=1/12 and 1/365 and sample sizes 500 and 1000. For each setting we simulated 500 replications of the procedure, using 20 simulations within the Indirect Inference simulation step.
Results are shown in Table 3 for the four cases studied (case 1: sample size 500, ∆t=1/12; case 2: sample size 500, ∆t=1/365; case 3: sample size 1000, ∆t=1/12; case 4: sample size 1000, ∆t=1/365), reporting in each case the estimator mean, bias, mean squared error and mean absolute error for each parameter.
Overall, the results show that the GMM estimator performs better in the estimation of the long-term mean, but it underestimates the mean-reversion parameter for discretization ∆t=1/12 and overestimates it for ∆t=1/365, and it overestimates σ² and underestimates the Hurst coefficient. With a finer discretization, the greater proximity in time implies a higher covariance between innovations, which affects the estimation of the parameter κ more intensely, revealing the inconsistency of this method in the presence of fBm innovations. For µ we observe the same effect.
Except for the estimation of the long-term mean, the estimation by Indirect Inference is superior to GMM for all parameters, and with a finer discretization and an increased sample size the Indirect Inference estimator shows convergence to the true parameters, which is not true for the GMM estimator in this situation.
Table 3. Monte Carlo - GMM and Indirect Inference for the fBm-CIR Model - µ=.03, κ=.7, σ²=.05, H=.6
Figure 1 shows the distribution of the estimates obtained by Monte Carlo for each parameter in case 4, confirming that the Indirect Inference procedure is considerably better in the estimation of the parameters κ and H, being more concentrated around the true parameter value than the GMM estimator, with significantly lower bias and variability.
A second experiment (Table 4) was performed with the parameter setting κ=.95 and H=.9, using a discretization ∆t=1/365 and sample size 500. In this configuration the performance of the Indirect Inference estimator is significantly better than that of the GMM estimator, except for the parameter µ. The GMM estimator presents a large negative bias in the estimation of κ and H, emphasizing the need for the correction proposed by the Indirect Inference estimator.
8. Applications
In this section we apply the proposed methodology to three interest rate series. The first series studied is a monthly interest rate on Treasury Bills from 03/1964 to 12/1989 (305 observations), similar to the interest rate series studied in Chan et al. (1992). The second series analyzed is a one-day Eurolibor interest rate, with sample dates from 01/03/2000 to 27/03/2008 (2610 observations). The last series studied is monthly Canadian interest rate data with one-month maturity, with sample from 1956:11 to
Table 4. Monte Carlo - GMM and Indirect Inference for the fBm-CIR Model - µ=.03, κ=.95, σ²=.05, H=.9
          GMM        II
mean µ    0.02333    0.04804
bias µ   -0.00667    0.01804
mse µ     0.00075    0.00143
mae µ     0.01870    0.02176
mean κ    0.73852    0.93412
bias κ   -0.21148   -0.01588
mse κ     0.75566    0.00333
mae κ     0.58912    0.02833
mean σ²   0.10040    0.07152
bias σ²   0.05040    0.02152
mse σ²    0.00254    0.00046
mae σ²    0.06324    0.02810
mean H    0.80240    0.92557
bias H   -0.09760    0.02557
mse H     0.04362    0.00534
mae H     0.15547    0.05305
[Figure 1. Densities of the Monte Carlo estimates for each parameter (panels: µ, κ, σ², H; legends: GMM, II).]
1996:6 (512 observations). These data are the same analyzed in Tkacz (2001). Figures 2, 3 and 4 show the plots of these series.
Tables 5, 6 and 7 show the estimation results of the auxiliary model estimated by Generalized Method of Moments (GMM) and of the Indirect Inference estimator (II) for the CIR-fBm process, for the series studied. For the Treasury Bills series, the GMM and Indirect Inference estimates are close, and the estimated H parameter is statistically different from 1/2, with H < 1/2, pointing to short memory and negative correlation in the increments of the process. It is also interesting to note that the persistence parameter estimated by the Indirect Inference method is smaller than that obtained by GMM, showing that part of the persistence dynamics lies not in the mean-reversion component but in the correlation structure of the increments of the process. The standard error estimates of the two methods are very close; this is expected because, although the null hypothesis H = 0.5 is rejected, the estimate is close to 0.5.
Figure 3. Eurolibor
Table 5. Treasury Bills

            µ            κ            σ            H
GMM       0.04546093   0.81964484   0.05349082   0.44146423
GMM s.e.  0.090080400  0.029319553  0.014459598  0.005245541
II        0.03238156   0.69379688   0.05813648   0.48755111
II s.e.   0.09030874   0.029371080  0.014485505  0.005261965
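As a quick arithmetic check of the claim that H lies statistically below 1/2 for the Treasury Bills series, a z statistic can be formed from the II estimate and standard error reported above (this Wald-type form is our sketch, not necessarily the exact test used in the article):

```python
# II estimate of H and its standard error for the Treasury Bills series,
# values taken from the table above.
H_hat = 0.48755111
se = 0.005261965
z = (H_hat - 0.5) / se
print(round(z, 2))  # -2.37, so H = 1/2 is rejected at the 5% level
```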
For the Eurolibor rate series (Table 6), the estimated parameter H is not statistically different from 1/2, showing that the standard Brownian assumption cannot be rejected for this rate. Favorable evidence for this result is that the estimated parameters and their standard errors are similar for the two methods, showing that the correction for the possible presence of an fBm component is not necessary.
The Canadian interest rate series, following the Tkacz (2001) study, has previously shown evidence of a long memory process. In that study, a wavelet-OLS estimator was used for the long memory parameter, but there was no control for the structure of the short memory in continuous
Table 6. Eurolibor

            µ           κ           σ           H
GMM       0.3355136   0.1167345   0.2066294   0.5653880
GMM s.e.  0.03030097  0.12100467  0.01191316  0.04937245
II        0.3000884   0.1196627   0.194989    0.5478063
II s.e.   0.03030312  0.1210105   0.01191479  0.04938462
time. Using the Indirect Inference estimator suggested in this article, we find support for the long memory evidence of Tkacz (2001), while controlling for the mean reversion structure of the CIR process and assuming a continuous time process: as shown in Table 7, H is significantly higher than 1/2, supporting the evidence of long memory. For instance, the estimate of (κ, σ) is equal to (.34, .049) by GMM, which does not take the fractional process into account, and equal to (.28, .19) by the II method, which does. In this case the differences between the GMM and Indirect Inference estimates of κ and σ are important. Note that for this series the standard errors of the estimates are also very different. The Tkacz (2001) example illustrates the importance of
Table 7. Canada

            µ            κ            σ           H
GMM       0.01949002   0.33922906   0.04861706  0.63476452
GMM s.e.  0.079312567  0.027278700  0.0116461   0.004259617
II        0.01444128   0.2848528    0.1868208   0.6107441
II s.e.   0.10894435   0.03331045   0.03897676  0.01146291
taking a possible fBm process into account. In practice the GMM estimates would lead to a process with quick reversion to the mean and to an overestimate of the volatility parameters.
9. Conclusions
In this article we have discussed the estimation of diffusion processes driven by a fractional Brownian motion in a discrete sampling context. Recent theoretical results (e.g. Cheridito (2003), Guasoni (2006) and Jarrow et al. (2007)) point to the compatibility between no-arbitrage pricing and processes that are not semimartingales, and the results obtained by Ohashi (2009) for the HJM model with fractional Brownian motion show the need for methodologies to estimate non-Markovian stochastic differential equations and processes more general than semimartingales. Given the practical and theoretical difficulties involved in deriving such estimators, like the complexity of the likelihood function, we show that, in this context, simulation-based estimation methodologies such as Indirect Inference are a first step towards addressing these problems.

The results of the article show that the method of Indirect Inference allows the construction of estimators with good finite sample properties in the presence of a fractional Brownian motion. These estimators retain the desired properties as the discretization interval decreases, a situation in which the GMM estimator deteriorates dramatically, as shown in the Monte Carlo experiments. A possible generalization is to work with a larger class of models. A simple and very interesting modification is to work with the generalized CIR model studied by Chan et al. (1992), where the diffusion process driven by fBm would be given by dx_t = κ(x_t − µ)dt + σ x_t^γ dB_H(t). Since this model includes many sub-models of diffusion processes used in finance, it would be possible to check the methodology more widely.
The estimation of the integrated volatility of the process (e.g. Barndorff-Nielsen and Shephard (2002), Barndorff-Nielsen and Shephard (2004)), which is related to the quadratic variation, is valid only when the sampling interval tends to zero. But it is exactly in this situation that market microstructure noise dominates the quadratic variation process and the estimation becomes inconsistent. This problem has been studied and treated by many authors in the realized volatility literature (e.g. Bandi and Russell (2005), Hansen and Lunde (2006)).
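The Indirect Inference estimator discussed here rests on being able to simulate paths of the fractional diffusion. A minimal sketch of that building block, written as our own illustration (not the thesis's code): exact fractional Gaussian noise generated by Cholesky factorization of its covariance (Dieker (2004) surveys faster schemes), feeding an Euler discretization with the mean-reverting drift κ(µ − x_t) and the CIR square-root volatility, using σ² = .05 and H = .9 as in the Monte Carlo design:

```python
import numpy as np

def fgn_cholesky(n, H, dt, rng):
    # Exact covariance of fractional Gaussian noise (increments of B_H on a
    # grid with step dt); Cholesky factorization is O(n^3), fine for a sketch.
    k = np.arange(n)
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2.0 * H)
                   + np.abs(k - 1.0) ** (2 * H)) * dt ** (2 * H)
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def simulate_fcir(x0, mu, kappa, sigma, H, n, dt, rng):
    # Euler scheme for dx_t = kappa*(mu - x_t) dt + sigma*sqrt(x_t) dB_H(t),
    # with the square root truncated at zero to keep the scheme well defined.
    dB = fgn_cholesky(n, H, dt, rng)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + kappa * (mu - x[i]) * dt \
                   + sigma * np.sqrt(max(x[i], 0.0)) * dB[i]
    return x

rng = np.random.default_rng(0)
path = simulate_fcir(x0=0.03, mu=0.03, kappa=0.95, sigma=0.05 ** 0.5,
                     H=0.9, n=500, dt=1.0 / 252.0, rng=rng)
```

Indirect Inference then repeatedly fits the auxiliary discretized model to simulated paths like this one and matches the resulting auxiliary estimates to those obtained on the data.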
In general, however, all the previous studies assume a short memory structure for the market microstructure effects, and the inference procedures usually rely on asymptotic results derived for semimartingale processes, as discussed in Mykland and Zhang (2005). If the market microstructure effects were actually consistent with a long memory process, the estimation of the diffusion process with market microstructure effects could be carried out with the Indirect Inference estimator discussed in this article, given the difficulty of obtaining likelihood-based or nonparametric estimators in this long memory context. A related result can be found in Bayraktar et al. (2006). That article studies the effect of investor inertia on stock price fluctuations with a market microstructure model comprising many small investors who are inactive most of the time. In this setup the log price process converges to a process with long-range dependence and non-Gaussian return distributions, driven by a fractional Brownian motion.
In our article we treated only stochastic differential equations driven by a pure fBm; in other words, all the uncertainty is generated by the fBm alone. But there are results on conditions for the existence of no-arbitrage for mixed processes, where the process is driven by a mixture of standard and fractional Brownian motions; a detailed discussion of these results can be found in Mishura (2008). That work contains some results for the estimation of mixed Brownian-fractional Brownian processes, but they all refer to the continuous sampling case; again, there are no results for the case of discrete sampling, and the Indirect Inference method could be studied in this context.
References
Aït-Sahalia, Y., 1996. Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560.
Bachelier, L., 1900. Théorie de la spéculation. English translation by Boness, A. J., in: The Random Character of Stock Market Prices, ed. Paul H. Cootner, pp. 17–78. Cambridge, Mass.: MIT Press, 1967.
Bandi, F. M., Russell, J. R., 2005. Microstructure noise, realized volatility, and optimal sampling. Working paper, Graduate School of Business, The University of Chicago.
Barndorff-Nielsen, O., Shephard, N., 2002. Estimating quadratic variation using realized variance. Journal of Applied Econometrics 17, 457–477.
Barndorff-Nielsen, O. E., Shephard, N., 2004. Econometric analysis of realised covariation: High
frequency based covariance, regression and correlation in financial economics. Econometrica 72,
885–925.
Bayraktar, E., Horst, U., Sircar, R., 2006. Limit theorem for financial markets with inert investors.
533–553.
Chumacero, R., 1997. Finite sample properties of the efficient method of moments. Studies in Nonlinear Dynamics and Econometrics 2, 35–51.
Cox, J. C., Ingersoll, J. E., Ross, S. A., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–408.
Decreusefond, L., Ustunel, A. S., 1999. Stochastic analysis of the fractional Brownian motion. Potential Analysis 10, 177–214.
Delbaen, F., Schachermayer, W., 1994. A general version of the fundamental theorem of asset pricing. Mathematische Annalen 300, 463–520.
Delbaen, F., Schachermayer, W., 2006. The Mathematics of Arbitrage. Springer.
Dieker, T., 2004. Simulation of fractional Brownian motion. Tech. rep., CWI and University of Twente.
Duncan, T., Hu, Y., Pasik-Duncan, B., 2000. Stochastic calculus for fractional Brownian motion. SIAM Journal on Control and Optimization.
Elerian, O., Chib, S., Shephard, N., 2001. Likelihood inference for discretely observed nonlinear diffusions. Econometrica 69, 959–993.
Gallant, A. R., Tauchen, G., 2001. Efficient method of moments. Unpublished manuscript.
Gallant, A. R., Tauchen, G., 2007. Simulated score methods and indirect inference for continuous-time models. In: Handbook of Financial Econometrics. Elsevier.
Gourieroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8, S85–S118.
Guasoni, P., 2006. No arbitrage with transaction costs, with fractional brownian motion and
beyond. Mathematical Finance 16, 569–582.
Hansen, L. P., Heaton, J., Yaron, A., 1996. Finite sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262–280.
Hansen, L. P., Scheinkman, J. A., 1995. Back to the future: generating moment implications for continuous time Markov processes. Econometrica 63 (4), 767–804.
Hansen, P., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Busi-
ness and Economic Statistics 24, 127–218.
Harrison, J. M., Kreps, D., 1979. Martingales and arbitrage in multiperiod securities markets.
Journal of Economic Theory 20, 381–408.
Harrison, J. M., Pliska, S., 1981. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215–260.
Heath, D., Jarrow, R., Morton, A., Jan. 1992. Bond pricing and the term structure of interest
rates: A new methodology for contingent claims valuation. Econometrica 60 (1).
Hu, Y., Oksendal, B., 2000. Fractional white noise calculus and application to finance. Preprint, University of Oslo.
Jacod, J., Shiryaev, A., 2002. Limit Theorems for Stochastic Processes (2nd Edition). Springer.
Jarrow, R., Protter, P., Sayit, H., 2007. No-arbitrage without semimartingales. Unpublished working paper.
Johannes, M., Polson, N., 2005. Handbook of Financial Econometrics. Elsevier-North-Holland, Ch. MCMC for Financial Econometrics.
Karatzas, I., Shreve, S. E., 1987. Brownian Motion and Stochastic Calculus. Springer–Verlag.
Kessler, M., 1997. Estimation of an ergodic diffusion from discrete observations. Scandinavian Journal of Statistics 24, 211–229.
Kessler, M., 2000. Simple and explicit estimating functions for a discretely observed diffusion
process. Scandinavian Journal of Statistics 27, 65–82.
Kloeden, P., Platen, E., 1992. Numerical Solution of Stochastic Differential Equations. Springer–
Verlag.
Kluppelberg, C., Kuhn, C., 2004. Fractional Brownian motion as a weak limit of Poisson shot noise processes, with applications to finance. Stochastic Processes and their Applications 113 (2), 333–351.
Kolmogorov, A. N., 1940. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. Comptes Rendus (Doklady) de l'Académie des Sciences de l'URSS (N.S.) 26, 115–118.
Le Breton, A., 1998. Filtering and parameter estimation in a simple linear system driven by fractional Brownian motion. Statistics and Probability Letters 38, 263–274.
Lund, J., Andersen, T., 1997. Estimating continuous-time stochastic volatility models of the short-term interest rate. Journal of Econometrics 77, 343–377.
Mandelbrot, B. B., van Ness, J. W., 1968. Fractional Brownian motions, fractional noises and applications. SIAM Review 10, 422–437.
McFadden, D., 1989. A method of simulated moments for estimation of discrete response models
without numerical integration. Econometrica 57, 995–1026.
Michaelides, A., Ng, S., 2000. Estimating the rational expectations model of speculative storage: A Monte Carlo comparison of three simulation estimators. Journal of Econometrics 96, 231–266.
Mishura, Y., 2008. Stochastic Calculus for Fractional Brownian Motion and Related Processes. Lecture Notes in Mathematics, Springer.
Mykland, P. A., Zhang, L., 2005. Comment: A selective overview of nonparametric methods in
financial economics. Statistical Science 20(4), 347–350.
Neunkirch, A., Nourdin, I., 2006. Exact rate of convergence of some approximation schemes associated to SDEs driven by a fractional Brownian motion.
Rogers, L. C. G., 1997. Arbitrage with fractional Brownian motion. Mathematical Finance 7 (1), 95–105.
Shiryaev, A. N., 1999. Essentials of Stochastic Finance. World Scientific.
Singleton, K. J., 2006. Empirical Dynamic Asset Pricing. Princeton University Press.
Smith, A., 1993. Estimating nonlinear time series models using simulated vector autoregressions.
Journal of Applied Econometrics 8, 63–84.
Tkacz, G., 2001. Estimating the fractional order of integration of interest rates using a wavelet OLS estimator. Studies in Nonlinear Dynamics and Econometrics 5 (1), 19–32.
Zivot, E., Czellar, V., 2008. Improved small sample inference for efficient method of moments and indirect inference estimators. Available at http://faculty.washington.edu/ezivot/research/CZ2007Latex2.pdf.
ARTICLE IN PRESS
Insurance: Mathematics and Economics ( ) –
Keywords: Term structure; No-arbitrage; Interpolation; Smoothing splines
A discussion of adjustment procedures for yield curves for pure discount bonds with very long maturities, and their application in the evaluation of life annuities, can be found in Carriere (1999). A major problem raised by Corradi (1996) is that the usual interpolation/extrapolation procedures for yield curves using nonparametric methods can generate negative prices for very long times to maturity of the yield curve. This problem is addressed by the techniques introduced in this article.

Available data do not provide us with a complete term structure curve; rather, what we observe is a set of discrete points relating yields to different maturities. This is not helpful in practice. As an illustration, it is unlikely that the observed maturities will be regularly spaced. Moreover, it could be necessary to have quotes for non-standard maturities to price bond derivatives or other interest rate securities. A perfect fit could be obtained through interpolation; nevertheless, it would produce a very jagged curve, since bond prices are subject to many sources of disturbance. This noise is in general due to differences in liquidity, tax treatments, bid-ask spreads and other effects. Therefore, it is imperative to have a method to estimate a continuous term structure curve.

The literature on term structure estimation follows two distinct lines. The first takes a statistical approach, using data smoothing techniques without considering the driving factors behind asset prices. The second approach makes use of theory to identify state variables or includes no-arbitrage arguments with the purpose of constructing equilibrium models. However, since models need to be calibrated to a constructed curve, even when the researcher or practitioner will use an equilibrium model it is common to first estimate the term structure curve by some data smoothing technique. The converse is also the case. For instance, to run a simulation experiment of our method, we first estimate the term structure using an equilibrium model, and then we use our statistical method.

In particular, statistical methods for estimating the term structure of interest rates can be divided into parametric and nonparametric methods. Parametric methods have some advantages. First, they assume functional forms that are parsimonious enough to provide economic interpretation for their parameters. Second, parameter restrictions and constraints can be added in such a way as to obey the relationships imposed by economic theory and no-arbitrage principles. Third, parametric methods can be tested against nested models to examine whether the restrictions imposed by the theory are valid. Some typical examples of parametric interpolation can be found in Nelson and Siegel (1987) and Svensson (1994).

However, as pointed out by Choudry (2005), parametric methods are not immune to problems. First, they are less flexible than nonparametric methods in fitting observed data, especially if the fitted curve requires more than one hump and trough. Second, because they are only reasonable approximations to observed data and in general have a lower fit than nonparametric methods, they are not appropriate for pricing and no-arbitrage applications. Finally, they are subject to a misspecification bias; a preselected parametric model might be too restricted or low-dimensional to fit unexpected features; see Hardle (1990).

As pointed out by Ait-Sahalia and Duarte (2003), nonparametric methods tackle the potential problem of misspecification. First, since they do not assume a particular functional form, they are robust to misspecification errors. Second, nonparametric methods can be used as a first step in the analysis of data to guide the specification effort. Third, nonparametric estimation can be quite feasible when the sample size is small and appropriate shape restrictions are imposed.

For nonparametric estimation, the first methods employed were regression splines with the use of quadratic and cubic piecewise approximation functions, introduced by McCulloch (1971, 1975). See Hagan and West (2006) and Anderson et al. (1996) for extensive reviews of the methods used in nonparametric regressions. Following this approach, Shaefer (1981) used Bernstein polynomials and Pham (1998) used Chebyshev polynomials. Other examples are: Vasicek and Fong (1982), exponential splines; Barzanti and Corradi (1998), tension splines; and Li and Yu (2005), a Bayesian formulation of spline methods. The interest in smoothing rather than interpolating the data leads to the adoption of smoothing splines instead of regression splines, as in Fisher et al. (1995). The advantage of smoothing splines is the adoption of a penalty parameter that can control excess roughness. However, the nonparametric methods cited above share some operational problems: the choice of knot locations and of the number of knot points, instability when fitting the estimated curve at the extremes of the maturity line, and great sensitivity to outliers, which make the fitted curves very unstable.

This study applies the method of constrained smoothing B-splines (hereinafter COBS), as formulated by He and Shi (1998) and He and Ng (1999), to confront those problems in term structure estimation. In particular, Barzanti and Corradi (1999, 2001) have already applied constrained B-spline estimation to the problem of direct term structure estimation using Italian bond data. The authors include constraints on the monotonicity and nonnegativity of the discount function by using a linear programming formulation of the B-spline estimation method.

Although we are not the first to apply the constrained B-spline technique to the estimation of the term structure of interest rates, our approach differs somewhat from the Barzanti and Corradi (1999, 2001) methodology. First, the COBS methodology estimates a conditional median function and is consequently robust to outliers. In technical terms, COBS formulates the B-spline by an L1 projection and shares the properties of the quantile regression methods of Koenker and Basset (1978). Second, it provides full automation for selecting the smoothing parameter or the knot mesh, using information criteria instead of ad hoc rules or cross validation procedures based on convergence rates. In a nonparametric setting, the knot mesh can be interpreted as the selected functions used to approximate the term structure. Last but not least, similarly to Barzanti and Corradi (1999, 2001), the COBS representation in a linear programming formulation allows us to include constraints without a substantial increase in computational costs. Recall that, in the particular case of term structure estimation, those constraints will rule out some arbitrage opportunities. More specifically, COBS imposes boundary conditions and a monotonically decreasing property on the discount function in an attempt to maintain positive spot and forward rates.

Comparison of estimation techniques can rely on many criteria. This study will take the following. First, accuracy against smoothness: is the method flexible enough to accommodate the data without generating very jagged curves? Second, no-arbitrage fulfillment: do the fitted and implied curves generate no-arbitrage violations? Third, model consistency: is the estimation method consistent with a theoretical equilibrium model? This paper compares the COBS methodology against some usual methods in statistical term structure fitting. Namely, COBS is compared with smoothing splines and the parametric Nelson and Siegel (1987) and Svensson (1994) models.

The remainder of the paper is structured as follows. The next section describes the relationship among spot interest rates, forward rates and discount functions, and points out the restrictions imposed by the assumption of no-arbitrage. Section 3 succinctly describes the methodology of COBS. Section 4 compares COBS with alternative methodologies using the set of criteria described above. Section 5 concludes the paper.
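The L1 (median) projection behind COBS can be illustrated with a toy linear program: minimize the sum of absolute residuals by splitting them into positive and negative parts. This is our own sketch with a trivial two-column basis and scipy, not the COBS implementation; the real method uses a B-spline basis, and roughness and shape constraints enter as extra rows and columns of the same LP:

```python
import numpy as np
from scipy.optimize import linprog

# Toy L1 (median) projection: choose coefficients a to minimize
# sum_i |y_i - (B a)_i|, rewritten as the linear program
#   min sum(u + v)  s.t.  B a + u - v = y,  u, v >= 0,  a free.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = 0.05 + 0.01 * x + rng.normal(0.0, 0.01, x.size)
y[5] += 0.5  # one gross outlier; the median fit should shrug it off

B = np.column_stack([np.ones_like(x), x])  # trivial basis; COBS uses B-splines
n, p = B.shape
# decision variables: [a_plus, a_minus, u, v], with a = a_plus - a_minus
c = np.concatenate([np.zeros(2 * p), np.ones(2 * n)])
A_eq = np.hstack([B, -B, np.eye(n), -np.eye(n)])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p + 2 * n))
a = res.x[:p] - res.x[p:2 * p]
print(a)  # roughly (0.05, 0.01): the outlier barely moves the median fit
```

Monotonicity and positivity restrictions on the fitted curve are additional linear inequality rows in this program, which is why COBS can impose them at little extra computational cost.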
Please cite this article in press as: Poletti Laurini, M., Moura, M., Constrained smoothing B-splines for the term structure of interest rates. Insurance: Mathematics and
Economics (2009), doi:10.1016/j.insmatheco.2009.11.008
2. Term structure definitions

The spot interest rate, y(m), is the rate of return applied to maturity of a bond or a contract expiring in m periods. Today's price of receiving $1.00 in m periods is given by the discount function, d(m). Under continuous compounding, spot interest rates and the discount function are related by the following formula:

d(m) = e^(-y(m) m).    (1)

As a result, from the discount function the spot rate or yield curve can be recovered by:

y(m) = -ln d(m) / m.    (2)

[...] above (below) spot rates if the term structure of interest rates is increasing (decreasing).

Because spot rates are an average of forward rates (see Eq. (4)), forward rates will be much more volatile. This last result has important implications for the comparison exercise of Section 4. Since forward rates tend to be very volatile, any small error in the estimated discount curve is magnified many times in the forward rate. Experience shows that, although plots of the discount or spot rates may differ little across estimation methodologies, very jagged implied forward rate curves signal much more clearly whether or not a technique has a good fit.
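The relation between the discount function and the spot rate can be checked in a few lines (an illustration of Eq. (1) and its inverse, not code from the original paper):

```python
import numpy as np

# Eq. (1): d(m) = exp(-y(m) * m), and its inverse recovering the spot rate.
def discount(y, m):
    return np.exp(-y * m)

def spot_from_discount(d, m):
    return -np.log(d) / m

m = 5.0          # maturity in years
y = 0.06         # 6% continuously compounded spot rate
d = discount(y, m)
print(spot_from_discount(d, m))  # recovers 0.06
```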
See Koenker (2005) for a discussion of Lp fitting. These problems can be solved using standard linear programming methods; again, see Koenker (2005) for a discussion of the computational aspects of these problems. In particular, the L1 roughness and L-infinity roughness measures are defined by:

L1 roughness = V(g') = Σ_{i=1}^{n-2} |g'(x⁺_{i+1}) − g'(x⁺_i)|,    (8)

L∞ roughness = V(g') = ||g''||_∞ = max_x [g''(x)].    (9)

The estimation of the term structure of interest rates will specialize to conditional median estimation, which implies τ = 0.5 in Eq. (7), and the concept of fidelity will be given by:

fidelity = Σ_{i=1}^{n} |y_i − g(x_i)|.    (10)

Now, based on the fact that any mth order smoothing spline has a corresponding B-spline representation on the same knot mesh, the function g can be replaced by its B-spline representation:

g(x_i) = Σ_{j=1}^{N+m} a_j B_j(x).

In the B-spline representation above a more general knot mesh is used, T = {t_i}_{i=1}^{N+2m}, with N = n − 2 internal knot points. For simplicity, assume that all x_i are distinct from one another; then t_1 = ··· = t_m = x_1, t_{m+1} = x_2, ..., t_{N+m} = x_{n−1}, t_{N+m+1} = ··· = t_{N+2m} = x_n. For the estimation of linear (median) smoothing B-splines, for m = 2 and L1 roughness, He and Ng (1999) show that the objective function of the estimation problem can be equivalently described as a linear programming problem.

In particular, for the case of (median) smoothing B-splines with m = 2 and L1 roughness, the problem is formulated as an L1 projection; it shares the robustness properties of the quantile regression methods of Koenker and Basset (1978), and in small samples is less sensitive to outliers than smoothing splines and other interpolation schemes. As pointed out by Koenker et al. (1994), because these methods estimate conditional quantile functions, they possess an inherent robustness to extreme observations in the y_i's.

Given the linear programming representation, it is easy to incorporate qualitative, monotonicity, convexity (concavity) and pointwise restrictions on the fitted equation by adding equality or inequality constraints; see He and Ng (1999) and Ng and Maechler (2007) for details. As noted in Section 2, constrained estimation is very important in the term structure fitting problem in order to respect no-arbitrage principles. These restrictions will be especially useful for estimating the discount function in Section 4.

One recurrent problem in regression splines is the choice of the number and location of knot points. In general, the choice follows some ad hoc set of criteria for linear, quadratic, cubic and exponential regression splines; for details on these criteria, see Anderson et al. (1996, pg. 35–36). Some methods, such as the penalized smoothing splines of Jarrow et al. (2004), use generalized cross validation (GCV). Other approaches use economic interpretations of short, intermediate and long-term money, like the one suggested by Litzenberger and Rolfo (1984) and employed by Barzanti and Corradi (1999). The knot selection and the smoothing parameter λ in the constrained smoothing method of He and Ng (1999) can be obtained by using the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). The Akaike information criterion is equivalent to using generalized cross validation, while the Schwarz information criterion is a version of the AIC which penalizes the number of parameters in the model more heavily. The AIC and SIC in the constrained smoothing splines of He and Ng (1999) are given by:

SIC(λ) = log( (1/n) Σ_{i=1}^{n} ρ_τ(y_i − m̂_{λ,L1}(x_i)) ) + (1/2) p_λ log(n)/n

and

AIC(λ) = log( (1/n) Σ_{i=1}^{n} ρ_τ(y_i − m̂_{λ,L1}(x_i)) ) + 2 (N + m)/n,

where m̂_{λ,L1}(x) = Σ_{j=1}^{N+m} â_j B_j(x) is the optimal linear (median) smoothing B-spline that solves (6) for τ = 0.5, m = 2 and the L1 roughness measure.

This makes the knot and smoothing parameter choice a fully automatic procedure, removing the ad hoc steps from the model specification. It is worth noting that, if necessary, the number of knots and their locations can be imposed by the user. This analysis will be further explored in the next section, where we use the automatic selection based on the SIC and compare it with alternatives such as the McCulloch (1975) equally sparse knots and the Litzenberger and Rolfo (1984) criterion used in the Barzanti and Corradi (1999) monotonic spline estimation.

4. Applications

The choice of a method of yield curve estimation superior to all others is a hard, if not impossible, task. As Anderson et al. (1996) pointed out, authors have attempted to highlight the weaknesses and strengths of each method, but it is impossible to select the best model of all.

Nevertheless, we can narrow the options down to a few models that are commonly used by practitioners and academics. Deacon and Derry (1994) concluded that the B-spline is preferred by practitioners, and the BIS (Bank for International Settlements) survey ''Zero Coupon Yield Curves: Technical Documentation'' (1999) reports that the methods most used by central banks are the nonparametric smoothing splines and the parametric methods of Nelson and Siegel (1987) and Svensson (1994).

We use the following techniques as comparison benchmarks against COBS: smoothing splines, Nelson and Siegel (1987) and Svensson (1994). We skip the details of those models for simplicity, as a detailed survey can be found in Anderson et al. (1996) or James and Webber (2000). Our strategy is to compare each method in terms of the three criteria defined in the introduction: accuracy against smoothness, no-arbitrage fulfillment and model consistency.

Some regression spline interpolation methods could also have been included, like cubic B-splines or exponential splines. However, they were not, because those methods interpolate instead of providing smooth approximations to the yield curve. Consequently, they overfit the data, producing very jagged curves and no trade-off between accuracy and smoothness of fit, which is incoherent with our first comparison criterion.

The remainder of this section is structured as follows. First, we describe the raw data used for estimating yield and implied curves. Second, we fit the three curves for each data set and method in order to look at the first two criteria. Third, we test model consistency by simulating the Cox et al. (1985) (CIR) model of the yield curve.

4.1. Data description

Two different markets are brought into play for our comparison exercise: the US market, using Treasury STRIPS (Separate Trading of Registered Interest and Principal of Securities), and the Brazilian market, using the DIxPRE swap contracts. While the first market has
high liquidity, longer maturities, and high-density data, the second has the opposite characteristics. Thus, it is possible to see how the methods work in two very different market settings.

US data come from the Treasury STRIPS program, started in 1985, which trades US Treasury Bond principal and coupon components as separate synthetic zero coupon bonds. As noted by Carmona and Tehranchi (2006), this program was created to give zero coupon reference rates to the market. The Treasury selects which bonds are eligible for the program and the stripping of these issues is done by government securities dealers and brokers. US STRIPS data are collected daily from January 1st, 1997 to January 1st, 2008, a total of 2874 observations with an average of 208 different quotes each day.

Since the Brazilian government zero coupon bonds do not present longer maturities and have very low liquidity, we use swaps to obtain the Brazilian zero coupon curve. Those are future swap contracts between floating interbank rates and fixed predetermined rates, DIxPRE. Those DIxPRE swap contracts are from the stock exchange futures market in Brazil, the BM&F (Bolsa de Mercadorias e Futuros). The swap curve is extracted from observed contracts in the market and does not have fixed maturities. In particular, note that the price at time t of a contract maturing at time T is determined by the formula:

p_{t,T} = e^(-y(T-t)) 100,000.

Thus, the continuous spot rate, y, is obtained by solving the above equation for y. Data were collected daily from January 3rd, 2000 to December 5th, 2006, a total of 1721 observations with an average of 19 different quotes each day.

4.2. Term structure of interest rates estimation

Given observed rates y_i and maturities m_i, the smoothing splines approach estimates a smooth function g(m_i) that minimizes the objective function:

Σ_{i=1}^{n} (y_i − g(m_i))² + λ ∫ (g''(u))² du,

where the parameter λ controls the trade-off between accuracy, represented by the residual sum of squares, and ''smoothness'' of the solution, represented by the integral of the squared second derivative. This parameter is automatically selected by using the generalized cross validation (GCV) method of Fisher et al. (1995).

[...] extra hump to the curves. The respective spot and forward curves are also independently estimated using the following equations:

y(m) = β₀ + β₁ (1 − e^(−m/τ₁))/(m/τ₁) + β₂ [(1 − e^(−m/τ₁))/(m/τ₁) − e^(−m/τ₁)] + β₃ [(1 − e^(−m/τ₂))/(m/τ₂) − e^(−m/τ₂)]

and

f(m) = β₀ + β₁ e^(−m/τ₁) + β₂ (m/τ₁) e^(−m/τ₁) + β₃ (m/τ₂) e^(−m/τ₂).

In both Nelson and Siegel (1987) and Svensson (1994), our approach to obtain the discount function was to use Eq. (1).

The COBS formulation uses the approach described in Section 3. For comparison purposes we have two possible formulations, with and without constraints. We call the first specification restricted COBS. This formulation estimates the discount function using the COBS algorithm described in Section 3 with the additional linear constraints:

d(0) = 1,    (11)
d(m) > 0,    (12)
d'(m) < 0.   (13)

Eq. (11) implies that the discount curve starts at one, Eq. (12) restricts the discount curve to be positive, and Eq. (13) imposes the discount curve to be negatively inclined. Since our data set is by definition finite, we did not include the constraint lim_{m→∞} d(m) = 0. Spot and forward rates are obtained from the implied relationship with the discount rates, Eqs. (2) and (5).

The unrestricted COBS uses the same algorithm, however with two main differences. First, we estimate the spot instead of the discount curve, and forward and discount functions are obtained using Eqs. (1) and (4). Second, the estimation is made without the constraints (11)–(13). The idea of this specification is to isolate the robustness aspect of COBS in the analysis, since it estimates the conditional median function, from the inclusion of the constraints. In this aspect, the unrestricted COBS can be more directly compared to the also nonparametric method of smoothing splines. Figs. 1–4 plot estimated yield and forward curves for all the estimation techniques at some selected dates. US STRIPS market fits for yield and forward curves are shown in Figs. 1 and 2, while Figs. 3 and 4 show estimated yield and forward curves for the Brazilian swap market. In general, for the spot rates, we see that COBS gives a better adjustment than Nelson-Siegel family curves but without implying very jagged curves like the smoothing spline.
For each data point in our sample, we estimate a spot rate curve Looking at the graphs we also see that forward curves are much
directly by minimizing the above objective function. Forward and smoother when we use COBS rather than smoothing splines.
discount functions are found by their implied relationships with However, COBS suffers from similar problems to those found in
the discount curve, Eqs. (1) and (4). nonparametric methods in providing more volatile forward curves
The parameterized Nelson and Siegel (1987) and Svensson than parametric methods do. In particular, for the US STRIPS,
(1994) curves construct a parsimonious formulation for the spot we see that forward rates for restricted and unrestricted COBS
and forward curves using heuristic arguments based on the has the tendency to drop after 20–25 years maturities, which is
expectation theory of the term structure of interest rates; see Zivot not the case for Nelson and Siegel (1987) and Svensson (1994)
and Wang (2006). For the Nelson and Siegel (1987) specification, specifications.
we estimate spot and forward curves independently by estimating This drop in the forward curves after 20–25 years is justified
the following equations: by the convexity bias effect.2 As discussed in Phoa (1997) and
further explored by others, in general but not always, the yield on
1 − e−m/τ 1 − e−m/τ
y(m) = β0 + β1 + β2 − e−m/τ
−m/τ −m/τ
and 1 It is worth noting that the original algorithm for implementing COBS,
programmed by He and Ng (1999) was further developed by Ng and Maechler
f (m) = β0 + β1 e−m/τ + β2 m/τ e−m/τ (2007) providing faster computation, Specifically for our estimation exercise,
compared to the original COBS estimation package of He and Ng (1999),
where τ is a given adjustment parameter. The method of Svensson computations were on average 6.5 times faster using the Ng and Maechler (2007)
(1994) basically adds an extra term to the spot and forward Nelson algorithm.
and Siegel (1987) equations. The extra term intends to allow an 2 We thank an anonymous referee for this point.
Please cite this article in press as: Poletti Laurini, M., Moura, M., Constrained smoothing B-splines for the term structure of interest rates. Insurance: Mathematics and
Economics (2009), doi:10.1016/j.insmatheco.2009.11.008
ARTICLE IN PRESS
6 M. Poletti Laurini, M. Moura / Insurance: Mathematics and Economics ( ) –
a thirty-year Treasury bond is lower than the yield on a twenty-year bond. This is due to the higher convexity of thirty-year bonds when compared to twenty-year bonds. The market pays a premium for this effect, which Phoa (1997) calls a convexity adjustment. As we can see from Eq. (4), spot rates are an average of forward rates; therefore, a small difference between thirty- and twenty-year bonds implies sensible changes in the forward rates from twenty- to thirty-year maturities. This is exactly what is captured by nonparametric methods. However, it is worth noting that restricted COBS implies much less volatile changes than other nonparametric methods, a result already pointed out by the Barzanti and Corradi (1999) constrained B-spline estimation. The convexity bias effect is not observed for the Brazilian swap rates because, for this market, maturities are much shorter.

In a nonparametric estimation, the selection of the number and position of knots plays a crucial role in determining the shape near the boundary.³ As described at the end of Section 3, the COBS methodology makes use of a fully automatic procedure. More specifically, in our exercise, the value of lambda and the number of knots were selected using the SIC, and knot locations are defined uniformly in their percentile levels.⁴ For instance, if the number of knots is three, the median maturity will be the single internal point; see He and Ng (1999) for details. Since the procedure is fully automatic, for each cross-section term structure curve a different number of knots could be chosen. That is exactly the case for the US STRIPS and the Brazilian Swaps curves. Fig. 5 shows how this number changes for each daily curve. In general, the number of optimal knots varies from two to six and is between four and five most of the time.

In order to further investigate the effect of the knot selection procedure, we compare the COBS selection procedure with two other alternatives: the McCulloch (1975) and Litzenberger and Rolfo (1984) criteria. We use the McCulloch (1975) procedure to place

³ We thank an anonymous referee for this comment and for the idea of comparing the COBS knot selection procedure with other alternatives.
⁴ This is made by selecting the 'quantile' option in the COBS algorithm; it is also possible to choose locations uniformly distributed or to select knot positions manually.
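The percentile-based placement just described can be sketched in a few lines. This is a simplified stand-in for the 'quantile' option of the COBS package, not its actual code; the maturity vector is hypothetical.

```python
import numpy as np

def quantile_knots(maturities, n_knots):
    """Knots at equally spaced percentiles of the observed maturities.

    n_knots counts the two boundary knots, so n_knots=3 places a single
    internal knot at the median maturity (cf. He and Ng, 1999).
    """
    probs = np.linspace(0.0, 1.0, n_knots)   # 0 and 1 give the boundary knots
    return np.quantile(maturities, probs)

# hypothetical cross-section of maturities (years)
mats = np.array([0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 15.0, 20.0, 25.0, 30.0])
knots3 = quantile_knots(mats, 3)   # boundary knots plus the median maturity
knots5 = quantile_knots(mats, 5)   # three internal knots at the quartiles
```

Because the knots track the percentiles of each day's quoted maturities, their positions adapt automatically to cross sections with irregular maturity coverage.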
internal knots equally spaced across maturities. Notice that this changes the locations; however, the number of knots is still defined following the COBS automatic selection method in accordance with the SIC. Alternatively, using the procedure of Litzenberger and Rolfo (1984), we change both the locations and the number of knots. This procedure uses three internal knots at one-, five- and ten-year maturities, as short-, median- and long-term money.⁵ We also used different values of the penalizing parameter lambda, by picking values lower and higher than the optimal choice according to the SIC.

Fig. 6 illustrates the COBS restricted estimation with the default automatic selection procedure and using different lambda values and knot procedures. We can see that, by decreasing the smoothing parameter λ, the forward curves become more jagged, as expected. If we increase the value instead, the forward curve becomes smoother. Using the McCulloch (1975) knot selection or that of Litzenberger and Rolfo (1984), we in fact obtain smoother forward curves.

Table 1 exhibits the root mean square error (RMSE) results for each model in the two markets studied. In order to evaluate the alternative knot selection methods, we also include COBS with the McCulloch (1975) and Litzenberger and Rolfo (1984) methods, hereafter named COBS with Mc and COBS with LR. As we expected, the Brazilian Swaps have a lower fit due to fewer data points and lower liquidity. From Table 1 we can conclude that COBS is in an intermediate position between the nonparametric smoothing spline method and the parametric Nelson–Siegel family curves in terms of in-sample goodness of fit. Note that restricted and unrestricted COBS have very close values for RMSE; this suggests that the added constraints had little impact on in-sample fit. Using alternative knot selections increases the RMSE, but not by much; COBS with Mc and COBS with LR are still in an intermediate position between smoothing splines and parametric methods. One striking fact is that the Svensson (1994) specification gave the worst fit in all analyses. This result can be explained in part by the difficulties to attain convergence in the nonlinear equation

⁵ This choice corresponds to the US STRIPS; for the Brazilian Swaps, with shorter and irregular maturities, we placed knots at 1 month and at 25% and 75% of the longest maturity.
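The effect of the smoothing parameter on jaggedness can be reproduced with a generic smoothing spline. The sketch below uses SciPy's `UnivariateSpline`, whose parameter `s` bounds the residual sum of squares and thus plays the role of λ (larger `s` means a smoother curve); the data are simulated, not the paper's quotes.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# noisy synthetic "rates" over maturities, a stand-in for observed quotes
rng = np.random.default_rng(0)
m = np.sort(rng.uniform(0.1, 30.0, 60))
y = 0.05 + 0.01 * np.log1p(m) + rng.normal(0.0, 5e-4, m.size)

rough = UnivariateSpline(m, y, s=0.0)                    # interpolating: jagged
smooth = UnivariateSpline(m, y, s=m.size * (5e-4) ** 2)  # noise-level smoothing

grid = np.linspace(0.5, 29.5, 400)

def roughness(spline, grid):
    """Proxy for the integral of the squared second derivative."""
    d2 = spline.derivative(2)(grid)
    return float(np.sum(d2[:-1] ** 2) * (grid[1] - grid[0]))
```

Comparing `roughness(rough, grid)` with `roughness(smooth, grid)` makes the trade-off of the penalized objective explicit: the interpolating fit has near-zero residuals but a far larger integrated squared second derivative.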
Table 2
Violations of no-arbitrage conditions (%).

Method                      US STRIPS   Brazilian Swaps
COBS restricted             0           0
COBS restricted with Mc     0           0
COBS restricted with LR     0           0
COBS unrestricted           9.45        0.17
COBS unrestricted with Mc   9.56        0.19
COBS unrestricted with LR   9.69        0.22
Smoothing spline            17.15       0.93
Nelson–Siegel               0           0
Nelson–Siegel–Svensson      0           0

…the dynamic of the instantaneous short-term interest rate r(t) obeys the CIR square-root diffusion dr(t) = κ(θ − r(t)) dt + σ √r(t) dW(t). This model generates an affine term structure, and its solution for the bond price is:

P(t, T) = A(t, T) e^{−B(t,T) r(t)}, (15)

where P(t, T) is the price of a zero coupon bond at time t that matures at time T > t, and

A(t, T) = [ 2h e^{(κ+h)(T−t)/2} / (2h + (κ + h)(e^{h(T−t)} − 1)) ]^{2κθ/σ²}, (16)
B(t, T) = 2 (e^{h(T−t)} − 1) / (2h + (κ + h)(e^{h(T−t)} − 1)),
h = √(κ² + 2σ²).

[Fig. 5. Optimal number of knots selected for each daily curve; panels: Strips and Swaps; axes: optimal number of knots (2 to 6) against day.]

Fig. 6. Comparison of alternative knot selection methods for COBS—US STRIPS forward rates.

The theoretical discount curve is generated on a grid of maturities starting from 0.1 years and achieving a maximum maturity of 29 years. This particular size was chosen to match the number of observations we have for the US STRIPS market.

From this theoretical discrete discount function curve, we estimate a continuous discount function using the different methods we have employed so far: restricted and unrestricted COBS, smoothing spline, Nelson and Siegel (1987) and Svensson (1994). The continuous estimated curves are generated for maturities up to 30 years and compared to the theoretical discrete curve generated by the CIR model. The goal here is twofold. First, we want to test model consistency by comparing the CIR discrete theoretical discount curve with the estimated continuous curve. Second, we also estimate no-arbitrage violations for each method. In this setting, "model consistency" stands for being able to generate estimated continuous curves with good in-sample fit and fulfillment of the no-arbitrage conditions. These continuous curves are estimated from a set of theoretical data points generated by the CIR model. This simulation is just an illustrative exercise, although the CIR model can be considered a representative model for the shape of interest rate curves. The choice of the CIR model was motivated by the fact that this model, in particular, provides an analytical formula for generating the discount function, which helps the simulation process.

The results of the experiment are in Tables 3 and 4.⁶ We can see that they have the same pattern observed for the US STRIPS market. Absolute RMSE values and relative values across methods are about the same. For the frequency of no-arbitrage violations, however, we see an increase in the percentage. One possible explanation is that the parameter set implies a very fast decay of the discount rate and a large number of no-arbitrage violations.

The experiment illustrates that COBS is able to get a better fit than that provided by the parametric Nelson and Siegel (1987) and Svensson (1994) models, while also avoiding the overfitting

⁶ For the simulation exercise of the CIR model, we did not use alternative methods for the number and location of knots. The reasons for this are twofold: in the simulated data the knot location has no direct economic interpretation, and the results of Section 4.2 demonstrated that alternative knot selection did not bring significant differences in the estimation results.
Table 3
Root mean squared errors—CIR simulated models.

Method                   US STRIPS
COBS restricted          0.015470
COBS unrestricted        0.014193
Smoothing spline         0.008038
Nelson–Siegel            0.023210
Nelson–Siegel–Svensson   0.1281103

Table 4
Violations of no-arbitrage conditions (%)—CIR simulated models.

Method                   US STRIPS
COBS restricted          0
COBS unrestricted        16.5
Smoothing spline         23.5
Nelson–Siegel            0
Nelson–Siegel–Svensson   0

problems of nonparametric methods. We show that, even for the interest rate curve obtained from a theoretical model, there is a significant number of no-arbitrage violations if we try fitting by unconstrained nonparametric models. COBS will eliminate those no-arbitrage violations without losing the flexibility that is peculiar to nonparametric methods. Similar results are observed for direct estimation from observed data, looking at root mean square errors and the frequency of no-arbitrage violations.

5. Conclusions

COBS innovates the literature of term structure estimation by introducing qualitative no-arbitrage constraints and by providing an estimator robust to outliers. In fact, the results of Section 4 show that unrestricted nonparametric methods produce a significant number of no-arbitrage violations. Violations of those no-arbitrage conditions are not captured by usual fitting criteria like RMSE and can bear very large costs, especially for hedging operations.

The proposed methodology also produces more meaningful curves, in the sense that the spot curve shows reasonable, not very jagged, shapes, as expected by theoretical arguments as well as by practice. This robustness is due to the fact that it estimates a conditional median function using smoothing splines, instead of a conditional mean function. COBS also showed consistency with the CIR model. The experiment illustrated in Section 4 gave evidence that the other competing methods presented the same sort of problems shown in direct estimation using observed data.

In conclusion, we put the COBS method in an intermediate category between nonparametric and parametric methods. COBS combines the best aspects of both: it captures flexibility from nonparametric methods, and sensible shapes and fulfillment of no-arbitrage conditions from parametric methods. COBS' in-sample fit is also intermediate between smoothing splines and the Nelson–Siegel family curves. This is not particularly surprising, since COBS penalizes roughness less in exchange for observing the no-arbitrage constraints; however, it keeps a better fit than the Nelson and Siegel (1987) and Svensson (1994) curves.

The final point in the conclusion is that the COBS technique of He and Ng (1999) is a very competitive method to fit the term structure of interest rates. It combines the flexibility of nonparametric methods with the more sensible shapes of parametric methods. It ranks very well in the criteria of accuracy against smoothness, no-arbitrage fulfillment and model consistency, and avoids the problems of negative prices for long maturities discussed in Carriere (1999), as confirmed by the results shown in Table 2. In that sense, it is very appealing when compared to usual methodologies in term structure estimation.

Acknowledgements

We would like to thank the anonymous referees for valuable comments. We are grateful to participants at the 12th Time Series School Meeting, Gramado, Brazil, the VI Brazilian Finance Meeting, Vitória, Brazil, the XXVIII Brazilian Econometric Meeting, Salvador, Brazil, and the LACEA–LAMES 2006 meeting, Mexico City, Mexico, for their helpful comments on an earlier draft of this paper.

References

Ait-Sahalia, Y., Duarte, J., 2003. Nonparametric option pricing under shape restrictions. Journal of Econometrics 116, 9–47.
Albrecht, P., 1985. A note on immunization under a general stochastic equilibrium model of the term structure. Insurance: Mathematics and Economics 4, 239–244.
Anderson, N., Breedon, F., Deacon, M., Derry, A., Murphy, G., 1996. Estimating and Interpreting the Yield Curve. Wiley.
Ang, A., Sherris, M., 1997. Interest rate risk management: Developments in interest rate term structure modeling for risk management and valuation of interest-rate-dependent cash flows. North American Actuarial Journal 1 (2), 1–26.
Barzanti, L., Corradi, C., 1998. A note on interest rate term structure estimation using tension splines. Insurance: Mathematics and Economics 22, 139–143.
Barzanti, L., Corradi, C., 1999. A note on direct term structure estimation using monotonic splines. Rivista di Matematica per le Scienze Economiche e Sociali 22, 101–108.
Barzanti, L., Corradi, C., 2001. A note on interest term structure estimation by monotonic smoothing splines. Statistica LXI, 205–212.
Bosch, R.J.Y., Ye, Y., Woodworth, G.G., 1995. A convergent algorithm for the quantile regression with smoothing splines. Computational Statistics & Data Analysis 19, 613–630.
Boyle, P.P., 1978. Immunization under stochastic models of the term structure. Journal of the Institute of Actuaries 105, 177–187.
Boyle, P.P., 1980. Recent models of the term structure of interest rates with actuarial applications. In: Transactions of the 21st International Congress of Actuaries, vol. 4, pp. 95–104.
Bühlmann, H., 1995. Life insurance with stochastic interest rates. In: Ottaviani, G. (Ed.), Financial Risk in Insurance. Springer, Berlin, pp. 1–24.
Carmona, R., Tehranchi, M., 2006. Interest Rate Models: An Infinite Dimensional Stochastic Analysis Perspective. Springer.
Carriere, J.F., 1999. Long-term yield rates for actuarial valuations. North American Actuarial Journal 3, 13–22.
Carriere, J.F., 2000. Non-parametric confidence intervals of instantaneous forward rates. Insurance: Mathematics and Economics 26, 193–202.
Chan, K.G., Karolyi, G., Longstaff, F., Sanders, A., 1992. An empirical comparison of alternative models of short term interest rate. Journal of Finance 47, 1209–1297.
Choudry, M., 2005. The Handbook of Fixed Income Securities. McGraw-Hill.
Corradi, C., 1996. On the estimation of smooth forward rate curves from a finite number of observations: A comment. Insurance: Mathematics and Economics 18, 115–117.
Cox, J.C., Ingersoll, J.E., Ross, S.A., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.
Deacon, M., Derry, A., 1994. Estimating the term structure of interest rate. Working Paper 24. The Bank of England.
Delbaen, F., Lorimier, S., 1992. Estimation of the yield curve and the forward rate curve starting from a finite number of observations. Insurance: Mathematics and Economics 11, 259–269.
De Schepper, A., Goovaerts, M., Delbaen, F., 1992. The Laplace transform of annuities certain with exponential time distribution. Insurance: Mathematics and Economics 11, 291–294.
Duffie, D., 1996. Dynamic Asset Pricing Theory, 2nd ed. Princeton University Press.
Fisher, M., Nychka, D., Zervos, D., 1995. Fitting the Term Structure of Interest Rates Using Smoothing Splines. Finance and Economics Discussion Series. Board of Governors of the Federal Reserve System.
Hagan, P., West, G., 2006. Interpolation schemes for curve construction. Applied Mathematical Finance 13, 89–129.
Hardle, W., 1990. Applied Nonparametric Regression. Cambridge University Press.
He, X., Ng, P., 1999. COBS: Qualitatively constrained smoothing via linear programming. Computational Statistics 14, 315–337.
He, X., Shi, P., 1998. Monotone B-spline smoothing. Journal of the American Statistical Association 93, 643–650.
James, J., Webber, N., 2000. Interest Rate Modeling. John Wiley & Sons.
Jarrow, R., Ruppert, D., Yu, Y., 2004. Estimating the term structure of corporate debt with a semiparametric penalized spline model. Journal of the American Statistical Association 99, 57–66.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Koenker, R., Basset, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Ng, P., Portnoy, S., 1994. Quantile smoothing splines. Biometrika 81, 673–680.
Li, M., Yu, Y., 2005. Estimating the interest rate term structure of treasury and corporate debt with Bayesian splines. Journal of Data Science 3, 233–240.
Linton, O., Mammen, E., Nielsen, J., Tanggard, C., 2001. Estimating yield curves by kernel smoothing methods. Journal of Econometrics 105, 185–223.
Litzenberger, R.H., Rolfo, R., 1984. An international study of tax effects on government bonds. Journal of Finance 39, 1–22.
McCulloch, J., 1971. Measuring the term structure of interest rates. Journal of Business 44, 19–31.
McCulloch, J., 1975. The tax-adjusted yield curve. Journal of Finance 30, 811–830.
Nelson, C.R., Siegel, A.F., 1987. Parsimonious modeling of yield curves. Journal of Business 60, 473–489.
Ng, P., Maechler, M., 2007. A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling 7 (4), 315–328.
Panjer, H.H. (Ed.), 1998. Financial Economics: With Applications to Investments, Insurance and Pensions. The Actuarial Foundation, Schaumburg, IL.
Pedersen, H.W., Shiu, E.S.W., 1994. Evaluation of the GIC rollover option. Insurance: Mathematics and Economics 14, 117–127.
Pham, T.M., 1998. Estimation of the term structure of interest rates: An international perspective. Journal of Multinational Financial Management 8, 265–283.
Phoa, W., 1997. Can you derive market volatility forecasts from the observed yield curve convexity bias? The Journal of Fixed Income 6, 43–54.
Renshaw, A.E., Haberman, S., 2003. Lee–Carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics 33, 255–272.
Rolski, T., Schmidli, H., Schmidt, V., Teugels, J., 1999. Stochastic Processes for Insurance and Finance. Wiley, Chichester.
Shaefer, S.M., 1981. Measuring a tax-specific term structure of interest rates in the market for British government securities. Economic Journal 91, 415–438.
Shea, G., 1984. Pitfalls in smoothing interest rate structure data: Equilibrium models and spline approximation. Journal of Financial and Quantitative Analysis 19, 253–269.
Shiu, E.S.W., 1987. On the Fisher–Weil immunization theorem. Insurance: Mathematics and Economics 6, 259–266.
Shiu, E.S.W., 1988. Immunization of multiple liabilities. Insurance: Mathematics and Economics 7, 219–224.
Shiu, E.S.W., 1990. On Redington's theory of immunization. Insurance: Mathematics and Economics 9, 171–175.
Steeley, J.M., 1991. Estimating the gilt-edged term structure: Basis splines and confidence intervals. Journal of Business Finance and Accounting 18, 513–529.
Stevens, R., Waegenaere, A., Melenberg, B., 2009. Longevity risk in pension annuities with exchange options: The effect of product design. Insurance: Mathematics and Economics, in press (doi:10.1016/j.insmatheco.2009.09.005).
Svensson, L.E.O., 1994. Estimating and interpreting forward interest rates: Sweden 1992–1994. NBER Working Paper 4871. NBER.
Vasicek, O., Fong, H.G., 1982. Term structure modeling using exponential splines. Journal of Finance 37, 339–356.
Zaglauer, K., Bauer, D., 2008. Risk-neutral valuation of participating life insurance contracts in a stochastic interest rate environment. Insurance: Mathematics and Economics 43, 29–40.
Zivot, E., Wang, J., 2006. Modeling Financial Time Series with S-PLUS, 2nd ed. Springer-Verlag.
Emerging Markets Review 9 (2008) 247–265
Article history: Received 8 October 2008; accepted 18 October 2008; available online 31 October 2008.

JEL classification: G14; C22; C14.

Keywords: Market microstructure; Emerging market; Spread; Markov property; Asymmetric response; Quantile regression.

Abstract. This article provides an analysis of empirical microstructure for the BRL/US$ exchange rate market using high-frequency bid and ask quote data. The aims of the article are to verify the importance of the presence of asymmetric information in price dynamics, to build a model for the price discovery process, and to analyze the empirical determinants of the spread between bid and ask through a conditional model that captures an asymmetric response of the spread to past information. The asymmetric information hypothesis is tested through a nonparametric test of conditional independence for the Markov property. A model for price discovery is built using a vector error correction model between bid and ask, controlling for duration and volatility. From this vector, we build an equilibrium spread deviation series, and we show that the conditional distribution of equilibrium spread deviations responds asymmetrically to spread changes and to expected conditional volatilities and durations. This is done by using the quantilogram and a quantile autoregression as tools for modeling the asymmetry effects. We relate the findings to some facts presented in the theoretical literature on market microstructure.

© 2008 Elsevier B.V. All rights reserved.
1. Introduction
The analysis of empirical microstructure effects on exchange rate markets has gained great momentum
in recent years.¹ It is well recognized that, in the short run, asset pricing may be more closely related to market
¹ Reference works in the literature of exchange market microstructure are Frenkel et al. (1996) and Sarno and Taylor (2002). The monograph of Lyons (2001) is an extensive study of the microstructure of the exchange market based on order flow.
doi:10.1016/j.ememar.2008.10.003
248 M.P. Laurini et al. / Emerging Markets Review 9 (2008) 247–265
structures than the factors related to asset fundamentals, as pointed out by Flood and Taylor (1996). The
literature on market microstructure indicates that factors such as transaction costs, stock balance and
liquidity premia may play a more crucial role in prices in the short run than factors associated with
macroeconomic fundamentals.
The literature on exchange rates encourages the analysis of market microstructure effects by providing
evidence that the conventional macroeconomic approach to exchange rate determination can only explain
long-run movements and extreme situations, such as in hyperinflation events and exchange rate crises. In
normal exchange rate market situations, exchange rate movements are defined by the market
microstructure (e.g. Flood and Taylor (1996), Taylor (1995) and Frankel and Rose (1995)).
Another factor that allows assessing exchange rate market microstructure effects is the large availability
of information about intraday exchange rate operations, provided by proprietary trading systems, such as
Reuters 2000–2 Dealing System, the Electronic Broking System (EBS), Spot Dealing System and, in Brazil,
the Sisbex system. Information is collected systematically and made publicly available through systems
such as Reuters and Bloomberg Data License (used herein), allowing for comprehensive studies on market
microstructure using high frequency bid and ask quote data (tick by tick operations).
The availability of this information allows assessing some exchange rate market characteristics that cannot be
systematically explained by usual macroeconomic models. Among unexplained effects we have the persistence
of exchange rate returns in intraday data, related to deviations of the returns from the martingale property, which translate into violations of market efficiency and of the no-arbitrage principle. Other effects that are
not accounted for by macroeconomic analysis are the determinants of the spread between bid and ask; the
importance of the information captured by order flows and its predictive power over future rates; the impacts of
chartist analysis on the exchange rate market; influence of trading volume, spatial location of agents and
volatility in price setting; and the importance of private information for the determination of prices and spreads.
A remarkable difference between market microstructure models and macroeconomic models concerns
assumed theoretical restrictions. Macroeconomic models are often based on representative agent
structures, symmetric information, rational expectations and absence of transaction costs. Market
microstructure models, on the other hand, are often characterized by asymmetric and heterogeneous
structures.2 There are several types of agents in this market, such as traders, market makers and customers
with distinct strategic goals and information sets.
Since the exchange rate market is decentralized3 and its operators are physically distant, the
information sets are distinct among agents, rendering private information relevant to the price setting
process. These different sets of information can give rise to arbitrage situations, which are indeed quite common
and could affect the degree of market efficiency, as reported in Flood (1994).
The wealth of information obtained from intraday data allows for the assessment of issues that would
not be accounted for by lower frequency data analysis. In addition to prices, intraday transactions include
other interesting sources of information, such as the time elapsed between two operations in the market
(order durations). The time elapsed between two orders is linked to the arrival of new information in the
market and is also an inherent liquidity measure.4 This information is relevant in market microstructure
models since prices are likely to be affected by recent transactions (e.g. Hasbrouck (1991) and Dufour and
Engle (2000)), that is, prices and the spread in the subsequent transaction will be affected by previous
prices and also by trading volume, spread, and time of previous transactions.
The aim of the present article is to assess the empirical effects of market microstructure based on intraday
bid and ask quotes in the R$/US$ exchange rate market. We evaluate the importance of private information in
the market by testing the Markov property (Section 5). The result of this test motivates the development of an
empirical price discovery model: a vector error correction model for bid and ask prices, with previous price
durations and quote volatility as explanatory variables, allowing us to check the impact of recent operations
on prices.
2. See O'Hara (1995) for a review of the theoretical models of asymmetry of information in the context of market microstructure, and Hasbrouck (2007) for empirical implications.
3. Sarno and Taylor (2002) contains a description of the structures and agents in exchange markets. For a detailed description of the BRL/US$ exchange market, see Garcia and Urban (2004).
4. For a review of the informational content of durations and econometric models for conditional durations, see Engle and Russell (1998) and Engle (2000). See Fernandes and Grammig (2005a,b) and references therein for the specification and testing of conditional duration models.
220
M.P. Laurini et al. / Emerging Markets Review 9 (2008) 247–265 249
Using the results of this model, we assess the determinants of equilibrium spread deviations between
bid and ask by developing a model that enables the asymmetric response of the spread to the previous
information set, by means of the quantilogram (Linton and Whang (2007)) and quantile autoregressions
(Koenker and Xiao (2006)).
The paper is organized as follows: Sections 2 and 4 describe the data used, show some characteristics of
these series and comment on the relationship with previous studies using exchange market data; Section 5
checks for the presence of asymmetric information by testing the Markov property; Section 7 describes the
vector error correction model used for price discovery and analyzes the effects of asymmetry on the
conditional distribution of the spread. Section 8 concludes.
Despite the extensive literature on exchange rate market microstructure, there has been a scarcity of
research into the BRL/US$ exchange rate microstructure, one of the most significant emerging markets. The
most important studies on the BRL/US$ exchange rate market microstructure are those by Garcia and Urban
(2004) and Wu (2007). The former takes an in-depth look at the institutions and the operation of the
interbank currency exchange market in Brazil, describing the agents, institutions and the existing trading
mechanisms. In addition, the paper provides econometric evidence of a shift of Granger causality from the
futures market to the spot market, using daily data.
The article by Wu (2007) is a comprehensive study of Brazilian exchange market microstructures based
on daily data collected from the Sisbacen system, the Central Bank database containing all the consolidated
currency transactions in Brazil. Wu (2007) is the only study in the international literature whose order flow
database covers 100% of a country's official currency operations; this makes it possible to sort out the
effects of exchange rate movements related to trading and financial operations and to Central Bank
interventions in the currency exchange market, and the consequences of these movements on the
exchange rate.
There are, however, criticisms of the database used by Wu (2007). The first concerns the fact that the data
set does not correspond to the information publicly available at the time of the agents' decisions. Some of
the information used is not disclosed by the Central Bank, and some is released only with a long delay; it is
therefore not the same data set used by agents in the intraday decision-making process.
Our analysis differs from previous ones because we use high-frequency bid and ask quotes. This method
is analogous to those used by Goodhart (1989) and Bollerslev and Domowitz (1993) and allows us to assess
the effects of microstructure on the operation of the currency exchange market5 as observed in intraday
quotes.
Our data are based on the spot market quotes provided by the Bloomberg Data License database. This
database format is known as FXFX DATA. This system collects information about the operations carried out
in several markets, including Sisbex (Trading system of the Brazilian Mercantile & Futures Exchange) and
over-the-counter operations, based on information gathered from several market participants. The sample
used in this study contains all orders fed into the system, starting on May 28, 2006 and ending on November
30, 2006. The data set format is shown in Table 1.
The data file contains eight columns. The first column specifies the traded asset (currency). The second
column contains the time at which the operation was fed into the system, with accuracy in seconds. The
third and fourth columns show the bid identification and the bid price; the fifth column presents the agent
who provided the bid value. The sixth, seventh and eighth columns show the ask identification, the ask
price, and the operator who provided the ask value. Note that the data do not always specify the operator in
charge of the bid and ask, since anonymous orders are allowed.
The high-frequency exchange rate market data have several limitations compared to other databases
used in empirical microstructure. The Trade and Quote (TAQ) database for stocks traded on the New York
Stock Exchange (NYSE), for instance, contains additional information such as the price and volume of
transactions, instead of only indicative quotes. Exchange rate data show only the behavior of quotes, but do
5. Lyons (2001) contains a full description of the various databases used in the microstructure of foreign exchange markets.
[Table 1: Data format]
not reveal the traded value. Other limitations include the absence of information on the trading volume and
the impossibility to find out whether the order was initiated by a buyer or seller, which is provided by the
Transaction Orders and Quotes (TORQ) database compiled by Hasbrouck (1992). The lack of this
information on actual transactions (transaction prices, volumes, and whether the transaction was initiated
by a buyer or a seller) results from the absence of a disclosure requirement for exchange rate operations.
Although this lack of disclosure limits the scope of the market microstructure analysis, one should note
that this is the data set publicly available to spot market operators, and it is the one analyzed in other
studies such as Goodhart (1989) and Bollerslev and Domowitz (1993).
There is some evidence that the omission of transactions does not affect the estimation results, as
pointed out by Goodhart (1989), but the literature has demonstrated some problems with the use of quotes
because they are just indications and not transactions. Lyons (1996) shows that interdealer spreads are
lower than those of indicative quotes.6 In response to such criticisms, note that these quotes correspond
to the information made publicly available in the spot market; studies such as Lyons (1996), which use data
from a private dealer, have two shortcomings: the time interval is too short (weeks at most), and they
capture a private dealer's trading behavior, which does not necessarily represent the behavior of the
market, given the large heterogeneity of agents in the exchange rate market.
Note that the high-frequency data used in this paper contain two important pieces of information that
are not provided by other studies involving the exchange rate market, such as that conducted by Wu
(2007). The information on trading time allows us to build a variable for order duration, which is given by
the time elapsed between the arrival of two orders to the market. This variable provides information about
market liquidity and volatility, and its behavior reflects the arrival of new information to the market (e.g.
Engle and Russell (1998), Engle (2000) and Fernandes and Grammig (2005a)). Another variable is the
conditional volatility derived from a GARCH model for the mid-quote between the bid and ask, used as a
proxy for the traded price. The use of a mid-quote as trading price is justified in the microstructure
literature, since in some models, the mid-quote is related to the fundamental asset price.7
Time series containing information on bid and ask prices are used in this paper, as shown in Table 1. Two
additional variables are built: the duration variable, given by the number of seconds between the arrival of
two orders, and the mid-quote variable, given by the average between the bid and ask prices, which will be
used to build the volatility proxy using a GARCH model.
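The construction of these two derived variables can be sketched as follows; this is a minimal numpy illustration using a handful of hypothetical ticks (the times and quotes below are invented, not taken from our database):

```python
import numpy as np

# Hypothetical ticks: arrival times (seconds since midnight) and bid/ask quotes.
times = np.array([36000.0, 36002.0, 36002.5, 36010.0, 36011.0])
bid = np.array([2.1600, 2.1602, 2.1601, 2.1605, 2.1604])
ask = np.array([2.1625, 2.1628, 2.1626, 2.1631, 2.1629])

# Duration: number of seconds between the arrival of two consecutive orders.
duration = np.diff(times)                 # 2.0, 0.5, 7.5, 1.0

# Mid-quote: average of bid and ask, used as a proxy for the traded price.
mid_quote = (bid + ask) / 2.0             # first value is 2.16125

# Log returns of the mid-quote feed the GARCH volatility proxy.
mid_returns = np.diff(np.log(mid_quote))
print(duration, mid_returns)
```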
Real-time storage of data yields a relatively large number of operations with incorrect information.8 The
correction was made by eliminating clearly discrepant observations due to mistypings (e.g.: a bid recorded
as 0.219 instead of 2.19), observations with negative spreads or spreads that are not compatible with the
6. For this conclusion Lyons (1996) uses the order flow from a particular dealer.
7. See Hasbrouck (2007) for a review of procedures and models used in the empirical modeling of market microstructure.
8. See Falkenberry (2002) for details on the problems in the processing of high-frequency data.
local spread behavior. These outliers were removed by a filtering rule that regards as outliers operations
whose bid-ask spread lies more than 10 standard deviations from the mean spread. This
rule captures mainly mistypings in the database. After filtering out this information, our database
comprises 279,737 tick-by-tick observations between May 28, 2006 and November 30, 2006.
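This filtering rule can be sketched as follows; the quotes are simulated for illustration, and only the 10-standard-deviation threshold comes from the text:

```python
import numpy as np

def filter_spread_outliers(bid, ask, n_sd=10.0):
    """Keep quotes whose bid-ask spread is non-negative and lies within
    n_sd standard deviations of the mean spread; mistypings fall outside."""
    spread = ask - bid
    mu, sd = spread.mean(), spread.std()
    return (spread >= 0) & (np.abs(spread - mu) <= n_sd * sd)

# Illustrative quotes with one decimal-point mistyping (0.219 instead of ~2.19).
rng = np.random.default_rng(0)
bid = 2.16 + 0.001 * rng.standard_normal(1000)
ask = bid + 0.0025
bid[500] = 0.219                     # mistyped bid produces a huge spread
keep = filter_spread_outliers(bid, ask)
print(keep.sum())                    # 999 observations survive the filter
```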
Another consistency check concerns the time of each transaction: whether it was recorded correctly and
whether it follows the correct sequence. Note that, in line with the microstructure
literature, we did not restrict trading hours because exchange rate market quotes are negotiated around the
clock, since this market operates nonstop in the three major trading locations (United States, Europe, and
Japan). Restricting the sample would exclude data that serve as a benchmark for pricing other financial
instruments, as information from other markets outside Brazilian trading hours may affect exchange rate
determination in the domestic market. Around 4% of the quotes are negotiated outside the normal business
hours of the Sisbex system at the BM&F, the most important market for the BRL/US$ exchange rate.
The volatility variable was constructed using a GARCH(1,1) model for the mid-quote price, so that it can
serve as a proxy for the volatility of the traded price, which is not directly observed. The GARCH(1,1)9
model was estimated by maximum likelihood, with estimated parameters (ω = 1.51E−09, α = 0.210275,
β = 0.7773732). The duration series is given by the number of seconds between two orders.
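The variance recursion behind this proxy can be sketched as follows; the returns below are simulated, and only the (ω, α, β) values come from the text:

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()        # initialize at the sample variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Simulated mid-quote log returns; (omega, alpha, beta) as estimated in the text.
rng = np.random.default_rng(1)
r = 0.0002 * rng.standard_normal(500)
sigma2 = garch11_variance(r, omega=1.51e-9, alpha=0.210275, beta=0.7773732)
print(sigma2[:3])                    # conditional variance proxy, tick by tick
```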
Figs. 1 and 2 show the graphs for the bid–ask series, spread, duration and volatility variables. The figures
reveal that the periods with an increase in spreads correspond to the highest values obtained for the bid–
ask series. Table 2 presents the descriptive statistics for these variables, as well as Phillips–Perron unit root
tests for the bid–ask series and bid–ask log returns. The test results indicate that the unit root null is not
rejected for the bid–ask series at the 1% significance level, while the log return series are stationary at the
1% level. As usual in financial time series, the Gaussian distribution is rejected for the bid and ask series.
9. The complete model is omitted for reasons of space, but can be obtained from the authors upon request.
A stylized fact in high-frequency financial series is the presence of intraday periodic patterns, e.g. Zivot
and Wang (2003). To analyze these patterns, we fitted a nonparametric smoothing spline to the spread,
duration and volatility series. A smoothing spline (Green and Silverman (1994)) can be defined as the
solution to the minimization of the following function:
S_λ(g) = Σ_{i=1}^{n} (Y_i − g(X_i))² + λ ∫ (g″(x))² dx        (1)
where g can be any twice-differentiable curve and λ controls the smoothness of the fit, governing the
trade-off between residual minimization and the roughness of the fitted curve. To obtain the intraday
patterns, we fitted the smoothing splines using 24 knots for each series. No significant
periodic patterns were found for the bid–ask log returns. Fig. 3 shows the patterns obtained for the spread,
duration and volatility.
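The fit of Eq. (1) can be sketched with scipy's smoothing spline, where the parameter s plays the role of the penalty λ; a U-shaped intraday spread pattern is simulated here for illustration (none of the numbers are from our data):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Simulated intraday spread with a "U" shape: higher at open and close.
rng = np.random.default_rng(2)
hour = np.linspace(0.0, 24.0, 500)
spread = (0.0025 + 0.001 * ((hour - 12.0) / 12.0) ** 2
          + 0.0002 * rng.standard_normal(500))

# Smoothing spline; s controls the smoothness/fidelity trade-off of Eq. (1)
# (here set to the expected sum of squared noise, n * noise_sd**2).
fit = UnivariateSpline(hour, spread, s=500 * 0.0002 ** 2)
pattern = fit(hour)
print(float(fit(0.0)) > float(fit(12.0)))   # True: spread higher at the open
```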
The observed patterns show that the spread is often higher outside Brazilian market trading hours (the
2–6 a.m. interval). The figure shows that the spread tends to increase at opening and closing hours
Table 2
Descriptive statistics

                        Ask        Bid        Ask log returns  Bid log returns  Duration   Variance
Mean                    2.163719   2.162170   −2.67E−07        −2.07E−07        12.75741   4.17E−08
Median                  2.162000   2.160500    0.000000         0.000000         1.999999   2.47E−08
Maximum                 2.236900   2.355400    0.010915         0.0100925        3563.100   0.000509
Minimum                 2.122600   2.120000   −0.009199        −0.008751         0.000000   1.54E−08
Standard deviation      0.022946   0.028870    0.000187         0.000203         72.92399   1.76E−06
Skewness                0.502882   0.494570    1.111605        −0.136050         24.96540   54.49727
Kurtosis                2.515426   2.490172    247.0023         176.0372         814.3413   4827.558
Jarque–Bera             14518.00   14433.53    6.93E+08         2.20E+11         7.96E+09   7.46E+14
Prob.                   0.000000   0.000000    0.000000         0.000000         0.000000   0.000000
p-value, PP unit root   0.0372     0.0348      0.0001           0.0001           –          –
Sample size             279,737    279,737     279,736          279,736          279,736    279,736
in the Brazilian currency exchange market, which is analogous to the “U” pattern obtained by Bollerslev and
Domowitz (1993) and consistent with their theoretical model for spread determination. A relevant effect is
that the spread tends to increase around 5 p.m., which is the time
limit for currency operations at Sisbacen. This effect can be rationalized by the fact that unrecorded
operations must be canceled, producing adjustment costs, inventory imbalance, and problems in risk
margins.
The periodic pattern for the duration series shows that the time between two quotes is quite long
outside Brazilian trading hours and shorter within them, with a slight increase around 5 p.m. The periodic patterns for
the volatility series show large volatility outside trading hours (due to the smaller number of quotes, price
jumps are higher), and that volatility tends to increase during the opening and the closure of trading hours
in Brazilian market, which is possibly associated with the adjustment of trading positions. Note that these
figures indicate a possible correlation between spread, duration and volatility. This association will be
tested in Sections 6 and 7.
An additional hypothesis about the distribution of spreads concerns the existence of clusters in the
spread, which may be consistent with price collusion (e.g. Christie et al. (1994), Hasbrouck (1999)). The
hypothesis of price collusion can be summarized as the tendency of quoted spreads to cluster at multiples
of the minimum allowed variation. In the case of exchange rate series, for instance, the minimum spread is
0.001, but the spread distribution concentrates on certain multiples of this minimum value. An easy way to
test this effect is by checking the distribution of the last digit of the spread.
Under the null of no price clustering, the last digit of the spread should be uniformly distributed and the
proportion of values in each digit should be statistically equal. This is the approach followed by McGroarty
et al. (2006), who provide a comprehensive analysis of the clustering effect on exchange rates.
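This last-digit test can be sketched with a χ² goodness-of-fit test against the uniform distribution; the counts below are invented to mimic the clustering pattern described, they are not the Table 3 figures:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical counts of the spread's last digit (0-9), clustered on 1-3.
counts = np.array([5000, 90000, 80000, 60000, 12000, 9000,
                   8000, 6000, 5000, 4737])

# Under the null of no price clustering, each digit has probability 1/10;
# chisquare tests the observed counts against this uniform benchmark.
stat, pvalue = chisquare(counts)
print(pvalue)                        # essentially zero: uniformity rejected
```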
Table 3 shows the distribution of the last digit of the spread. Note that the last digit concentrates on the
values 1, 2 and 3, indicating that spread values tend to range from 0.001 to 0.003. The test is carried out
using the χ² test for equality of proportions, which rejects the null hypothesis of equal proportions. Even
though the test rejects the null of no clustering, it does not indicate the possible cause of the price clusters.
Hasbrouck (1999) discusses some possible causes for this effect and complements the analysis with a
dynamic model, in which price clustering may be related to a stochastic cost of liquidity provision incurred
by the market maker. In Section 6 we show that the equilibrium spread has a value around 0.0025, and that the
[Table 3: Distribution of the spread last digit]
concentration of values immediately below 0.001 and 0.003 units is related to short-run deviations from
the equilibrium spread.
5. Markovian property
The market efficiency property, obtained from the assumption of agents' rationality and efficient
processing of the available data, implies that asset prices should be compatible with a first-order Markov
process. Thus, the price at time t should depend only on the most recent information at t − 1 plus an
innovation process:10

P_t = P_{t−1} + e_t        (2)
If the information is efficiently processed, all the information available up to period t − 1 should be
contained in price Pt − 1, and thus price variation should correspond to a nonsystematic error process.11
Another characteristic, related to no-arbitrage conditions (e.g. Harrison and Kreps (1979) and Harrison and
Pliska (1981)), is that the conditional expectation satisfies E^Q[P_{t+k} | F_t] = P_t; that is, under the
risk-neutral measure the price should be a martingale, which leads to the concept of an equivalent
martingale measure.
However, microstructure models based on asymmetric information, such as those by Glosten and
Milgrom (1985) and Easley and O'Hara (1987), predict that the existence of different information sets
between agents affects the Markov property of bid and ask prices. In these models, asymmetric information
causes prices at t to depend upon the whole trading history and not only upon the most recent information,
invalidating the Markov property for prices and indirectly characterizing some type of market inefficiency.
A discussion of this issue can be found in Flood (1994), who shows that the decentralization of agents makes
the currency exchange market informationally less efficient than stock markets. Decentralization slows
down the dissemination of information, and therefore prices are correlated not only with the most recent
price, but also with a long set of past prices.
Thus, we can check for the presence of asymmetric information using Markov property tests. The test
proposed by Fernandes and Amaro de Matos (2007) takes into account the irregular pattern of price quotes
over time in high-frequency financial data. This is a nonparametric test for conditional independence,
based on the null hypothesis that if the Markov property holds, the length of time between both operations
should be independent of the realization of the variable related to asset prices, that is, the spread between
bid and ask.
10. Formally, the information process is a filtration F_t given by an increasing sequence of sub-sigma-algebras B_t ⊂ B_u ⊂ B for 0 ≤ t ≤ u, defined on a probability space (Ω, B, P). We assume the usual filtration throughout and suppress it from the notation.
11. Different types of market efficiency correspond to different assumptions about the process εt. Type III efficiency is associated with an uncorrelated process εt; type II efficiency imposes the more restrictive assumption that εt is independent; and the most restrictive, type I efficiency, assumes that εt is independent and identically distributed. See Campbell et al. (1997).
[Table 4: Nonparametric test of Markov property]
The null hypothesis of the test derived by Fernandes and Amaro de Matos (2007) is given by:

H_0: f_{iXj}(d_i, x, d_j) = f_{i|X}(d_i | x) f_{Xj}(x, d_j)        (3)

where f_{iXj}(d_i, x, d_j) is the joint density of duration d_i, spread x and duration d_j; f_{i|X}(d_i | x)
is the conditional density of duration d_i given the spread; and f_{Xj}(x, d_j) is the joint density of
spread x and duration d_j, for i > j. If this conditional independence property holds, the null hypothesis in
Eq. (3) is equivalent to the validity of the Markov property.
The test derived by Fernandes and Amaro de Matos (2007) is based on the weighted quadratic distance
between f_{iXj}(d_i, x, d_j) and f_{i|X}(d_i | x) f_{Xj}(x, d_j), with the densities replaced by nonparametric
density estimators. The test statistic is given by:

Λ̂ = (1/n) Σ_{k=1}^{n} w(d_{k+j}, X_k, d_k) [ f̂_{iXj}(d_{k+j}, X_k, d_k) − (f̂_{iX}(d_{k+j}, X_k) / f̂_X(X_k)) f̂_{Xj}(X_k, d_k) ]²        (4)

where the estimators f̂(•) are kernel density estimators of the joint, marginal and conditional densities and
w is a weighting function. Fernandes and Amaro de Matos (2007) show that the standardized statistic in
Eq. (5) has a standard normal asymptotic distribution:

λ̂_n = (n b_n^{3/2} Λ̂ − b_n^{−3/2} δ̂_Λ) / σ̂_Λ  →_d  N(0, 1)        (5)

with

δ̂_Λ = (e_k/n) Σ_{i=1}^{n} w(d_{k+j}, X_k, d_k) f̂_{iXj}(d_{k+j}, X_k, d_k),
σ̂²_Λ = (υ_k/n) Σ_{i=1}^{n} w²(d_{k+j}, X_k, d_k) f̂³_{iXj}(d_{k+j}, X_k, d_k)        (6)

where e_k and υ_k are constants that depend on the selected kernel function.
To test the Markov property on prices, we followed the method proposed by Fernandes and Amaro de
Matos (2007), using the spread series at time t and the duration series at times t + 1 and t, in log form
(log-spreads and log-durations), in line with the same methodology. We computed the marginal, conditional
and joint density estimators using a quartic kernel and the same bandwidth-selection rules applied by
Fernandes and Amaro de Matos (2007). Table 4 shows the Markov property test results both for raw spreads
and durations and for spreads and durations adjusted by removing the periodic pattern found in Section 4.
The results indicate rejection of the null hypothesis that
the first-order Markov property holds at any significance level in both data sets. This evidence supports
the conclusion reached by Richard Lyons: “Contrary to the asset approach, exchange rate determination
is not wholly a function of public news,”12 where we take the public information to be the past
information on spreads and durations.
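The intuition of the test can be illustrated with a much cruder, discretized check: bin spreads and subsequent durations into quantile classes and test independence of the resulting contingency table. This is not the Fernandes and Amaro de Matos (2007) statistic, only a simplified stand-in on simulated data in which the spread does, by construction, carry information about the next duration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 5000
spread = rng.exponential(0.0025, n)
# Next-period durations depend on the current spread (dependence by design).
duration_next = rng.exponential(10.0, n) * (1.0 + 5.0 * spread / 0.0025)

def quantile_bins(x, nbins=4):
    """Assign each observation to one of nbins quantile bins."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, nbins + 1)[1:-1])
    return np.searchsorted(edges, x)

table = np.zeros((4, 4))
for i, j in zip(quantile_bins(spread), quantile_bins(duration_next)):
    table[i, j] += 1.0
stat, pvalue, dof, expected = chi2_contingency(table)
print(pvalue)                        # very small: independence rejected
```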
The rejection of the null hypothesis of the Markov property confirms the presumed existence of
asymmetric information effects pointed out by Glosten and Milgrom (1985) and Easley and O'Hara (1987)
for the exchange rate market. Our results are analogous to those obtained by Flood (1994), showing that the
12. Lyons (2001), p. 9.
differential information between agents is key to the determination of spreads and that the existence of
different information sets affects the agents' price discovery process. Note that it is possible to explain the
violation of the Markov property by the operational structure of the currency exchange market based on a
framework that includes several dealers with different locations.13
The rejection of the Markov property indicates that the price discovery process is based not only on the
price established in the immediately preceding period, but on the whole set of past information. Note that
this property directly affects the model used for price discovery, which requires a larger number of lags to
fit the short-run dynamics.
6. Price discovery

One of the predictions of the market microstructure literature in the presence of asymmetric
information is that agents should discover the actual equilibrium price of the asset in a process known as
price discovery.14 In this process, agents seek to determine the fundamental asset price, which is contained
in current quotes but contaminated with microstructure noise. When the price process is based on the
existence of several prices for the same asset, e.g. stocks traded on several stock exchanges or existence of
bid and ask prices, the discovery of the fundamental asset price is related to a mechanism of search for an
equilibrium price between several quotes for the same asset. This is equivalent to the existence of
correction mechanisms for deviations of prices in each quote from equilibrium prices.
This multivariate price discovery process can be represented by a vector error correction model in event
time.15 By assuming a bivariate vector of prices Pt = [p1t p2t], the vector error correction model is represented
as follows:
ΔP_t = μ_0 + A_1 ΔP_{t−1} + A_2 ΔP_{t−2} + … + A_k ΔP_{t−k} + γ(Z_{t−1} − μ_1) + λ_1 X_{t−1} + … + λ_j X_{t−k},   Z_{t−1} = p_{1,t−1} − B_1 p_{2,t−1}        (7)
In this model, vector Zt − 1 represents the deviations of the long-run equilibrium values between p1 and
p2 and B1 are the coefficient vectors in the equilibrium relationship; γ is the coefficient vector that controls
the error correction mechanism; coefficients Ai represent the short-run coefficients; Xt − k is a vector of
explanatory variables that are not cointegrated with p1 and p2 and λ is a coefficient vector that captures the
influence of Xt − k on the short-run dynamics of ΔPt.
The VECM (vector error correction model) seeks to decompose the price adjustment dynamics into two
components — one that is linked to the short-run dynamics, given by components AiΔPt − k and λjXt − k and
the other one linked to the dynamics of the long-run equilibrium deviations given by cointegration vector
Zt − 1. In the context of the price discovery model, the cointegration vector represents the equilibrium
between two asset price measures, and this equilibrium includes a measure of the fundamental asset price,
given by Zt − 1 in the VECM.
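A stripped-down numeric sketch of the error-correction mechanism in Eq. (7), on simulated cointegrated log bid/ask series with a single lag and two-step OLS (the model in the text uses 120 lags, exogenous regressors and Johansen's procedure; all series and parameters below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
# Common efficient-price component (random walk) plus stationary half-spreads.
p = np.log(2.16) + np.cumsum(0.0002 * rng.standard_normal(n))
log_bid = p - 0.0006 + 0.0001 * rng.standard_normal(n)
log_ask = p + 0.0006 + 0.0001 * rng.standard_normal(n)

# Step 1: equilibrium deviation Z_t = log_ask - B * log_bid, with B from OLS.
B = np.polyfit(log_bid, log_ask, 1)[0]
Z = log_ask - B * log_bid

# Step 2: loading coefficients gamma, from regressing each price change
# on the lagged (demeaned) equilibrium deviation.
dz = Z[:-1] - Z.mean()
gamma_ask = np.polyfit(dz, np.diff(log_ask), 1)[0]
gamma_bid = np.polyfit(dz, np.diff(log_bid), 1)[0]
print(gamma_ask < 0 < gamma_bid)     # True: ask falls and bid rises when the
                                     # spread is above its equilibrium value
```

The opposite signs of the two loadings reproduce, in miniature, the adjustment mechanism estimated in Table 6.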
In our study, the price vector is given by the bid and ask prices Pt = [bidt askt], and thus the cointegration
vector captures the equilibrium spread relationship given by ask − B·bid. The aim of the VECM
proposed in this study is to decompose bid and ask variations at time t into two components: a short-run
component given by past bid and ask variations and explanatory variables and another component linked
to the long-run equilibrium, given by the correction of the deviations of the long-run relationship between
bid and ask.
The violation of the Markov property shows that the price discovery process cannot be based only on
the set of information about the immediately previous observation (ΔPt − 1 and Xt − 1), since quotes at time
t − 1 do not contain all the information available in the market (the private information shown in previous
transactions is not instantly incorporated into the prices, causing the violation of the Markov property
presented in Section 5). The price discovery process does not depend only on the immediately previous
13. See Lyons (2001) for a discussion of the effects of a multiple-dealer structure on the foreign exchange market.
14. For further references and econometric models of price discovery, see Hasbrouck (1988, 1991, 1996, 2007).
15. For a review of multivariate models of price discovery see Hasbrouck (2007), and also Engle (2000) for models in event time. Event time uses the order of operations as the time index, replacing calendar time as the index of the stochastic process and generating irregularly spaced time series. Hasbrouck (2007) discusses the advantages of event time in microstructure studies, which are related to time deformation processes.
[Table 5: Cointegration tests]
price, but on the whole trading history, a property that is compatible with some asymmetric information
models such as Glosten and Milgrom (1985) and Easley and O'Hara (1987).
In our VECM model, we also included the values of past durations and past conditional volatilities of the
mid-quote price as explanatory variables. These two variables are included as a way to check whether the
bids and asks respond to the effects of liquidity and uncertainty, using durations and volatilities as proxies
for these effects.
The specification of the VECM is valid in the presence of an equilibrium vector between the bid and ask,
a hypothesis that can be verified using a cointegration test. To test for the existence of an equilibrium
relationship, we used Johansen's cointegration test, whose results for the bid and ask log series are shown
in Table 5. The specification of the cointegration test is an error correction model with 120 lags for the bid
and ask log variations, 24 lags for past durations and 20 lags for past volatilities, with the specification
determined by the Schwarz information criterion. As discussed in Section 5, this large number of lags is
related to the violation of the Markov property. The test results show that the null hypothesis of no
cointegration vector is rejected with a p-value of 0.001 by both tests (maximum eigenvalue and trace)
obtained from Johansen's procedure, indicating the existence of a long-run equilibrium mechanism between
the bid and ask logs and the validity of the vector error correction model.
Note that Johansen's cointegration test is based on the assumptions of normally distributed residuals
and the absence of structural breaks. The normality hypothesis is not valid for our data set, since the
kurtosis and skewness indicate non-Gaussian distributions; the critical values used by the test may
therefore be affected by this violation. Note, however, that there is an a priori economic reason for the
existence of a cointegration vector between the bid and ask: an imbalance between them, representing a
non-stationary spread, would lead to systematic arbitrage opportunities. Thus, we did not reject the
evidence in favor of the cointegration hypothesis, even with possible distortions in the power of the
test.
The vector error correction model is partially shown in Table 6, where we present the estimated
cointegration vector (cointegration equation), the loading matrix for the correction of long-run deviations
(error correction) and the first two lags of the short-run mechanism. The cointegrating equation shows that
the normalized cointegration vector for the bid–ask logs is [1 − 1.000949], which represents an equilibrium
spread of 0.00259.
This equilibrium spread value can be explained by three basic factors: costs related to the market dealers'
functions, costs of carrying currency inventory, and a factor related to asymmetric information, given by
adverse selection. The dealers' costs are linked to the provision of immediate liquidity. The inventory costs
are related to the provision of liquidity and to the possibility that the dealers may be trading with agents
who have privileged information (insiders). The adverse selection problem arises because dealers cannot
distinguish agents with liquidity and hedging demands from insiders, and therefore increase the spread for
both classes of agents.16
The estimated loading matrix γ is given by the value of −0.015364 for the ask log variation and 0.014691
for the bid log variation. We can interpret these signs as follows: positive deviations from the equilibrium
16. See Frenkel et al. (1996) and Sarno and Taylor (2002) for detailed references on this explanation of the determinants of the spread.
Table 6
Error correction model

Cointegrating equation:
  log(ask(−1))    1.000000
  log(bid(−1))   −1.000949   (s.e. 1.7E−05)   [−575570.]

Error correction:          D(log(ask))    D(log(bid))
  CointEq(1)               −0.015364       0.014691
  (standard errors)        (0.00089)      (0.00094)
  [t-statistics]           [−17.2781]     [15.9325]
spread are adjusted by reducing the ask price and increasing the bid price, but the mechanism is the
opposite for spreads below the equilibrium value, being characterized by an increase in the ask price and a
decrease in the bid price. The short-run mechanisms are harder to analyze, due to the large number of lags
and the change in the sign of coefficients. There is evidence that 120 lags correspond on average to 20 min
in calendar time, showing that the mean time for the incorporation of information can be approximated
by this value.
Fig. 4 shows the generalized impulse response functions obtained from the VECM estimation in Table 6.
The figure indicates that the shocks converge to their permanent values after around 50 observations,
showing stable convergence to the long-run values.
7. Spread determination
The VECM estimated in Section 6 allows determining empirical models for the equilibrium spread value
and the correction mechanism for equilibrium spread deviations, but it does not allow the direct
identification of the factors that influence and impact the spread. To assess spread determinants, we begin
by investigating the empirical characteristics of the spread deviation series, created by the cointegrating
equation in the VECM. Based on the observed characteristics, we formulated an asymmetric response model
for the conditional distribution of the spread based on quantile autoregression (Koenker and Xiao, 2006)
and on the quantilogram estimates (Linton and Whang, 2007).
Note that the existence of a cointegrating vector between the log bid and the log ask is equivalent to the
existence of a stationary process for spread deviations. To describe this process, we first formulated a linear
model for spread deviations. This method is based on regressions on the spread such as those described in
Jorion (1996). Following that methodology, we developed an autoregressive linear model for the
spread by adding variables that represent the expected values of volatility and duration, measuring the
expected effects of uncertainty and liquidity.
The formulation of this model seeks to control the stochastic impacts of the dealer's costs and the impact
of expected risk and liquidity values on the determination of equilibrium spread deviations (which can be
seen as deviations on the average costs embedded in the equilibrium spread) discussed in Section 6.17
The model is based on a third-order autoregressive process for equilibrium spread deviations, with the
addition of one-step ahead predictions for volatility and duration. Volatility forecasts are obtained by the
same GARCH model used for the construction of the volatility variable. Duration forecasts are generated
by means of the Autoregressive Conditional Duration model of Engle and Russel (1998).18 The aim of
incorporating these predictions is to include in spread determination the effect of agents' expectations
about the volatility and duration of the next transaction.
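The duration forecast step can be sketched with the ACD(1,1) recursion ψ_t = ω + a·x_{t−1} + b·ψ_{t−1}. This is a minimal illustration with made-up stationary parameters (the paper's own estimates are reported in footnote 18), and `acd_filter` is a hypothetical helper, not the estimation code used in the paper:

```python
import numpy as np

def acd_filter(x, omega, a, b, psi0=None):
    """Filter conditional expected durations through the ACD(1,1) recursion
    psi_t = omega + a * x_{t-1} + b * psi_{t-1}; the last entry of the
    returned array is the one-step-ahead forecast E[x_{T+1} | F_T]."""
    psi = np.empty(len(x) + 1)
    psi[0] = np.mean(x) if psi0 is None else psi0  # start at the sample mean
    for t in range(len(x)):
        psi[t + 1] = omega + a * x[t] + b * psi[t]
    return psi

# Illustrative stationary parameters (omega, a, b), not the paper's estimates.
omega, a, b = 0.1, 0.15, 0.8

rng = np.random.default_rng(1)
# Simulated durations whose mean matches the ACD fixed point omega / (1 - a - b).
x = rng.exponential(scale=omega / (1 - a - b), size=500)
psi = acd_filter(x, omega, a, b)
print("one-step-ahead expected duration:", psi[-1])
```

With a constant duration series the recursion converges to the fixed point (ω + a·x)/(1 − b), which is a quick sanity check on the filter.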
17
An alternative way of looking at the determination of bids and asks is the use of permanent-transitory decompositions for time
series (e.g. Hasbrouck (2007)). The permanent component would be linked to fundamentals of the asset and the transitory
components to microstructure effects.
18 The estimated model is an Autoregressive Conditional Duration (ACD) model, with estimated parameters $\psi_t = 1.37\times 10^{-5} + 0.196830\,x_{t-1} + 0.884851\,\psi_{t-1}$, where $x_t$ are the price durations and $\psi_t$ the conditional durations. More general ACD models could be used; see Bauwens and Giot (2000), Bauwens et al. (2002) and Fernandes and Grammig (2005b) for generalizations of the ACD model.
The results obtained in this spread decomposition (7) show that there is a high dependence of the
spread on past spreads (the persistence measured by the sum of the autoregressive coefficients is
approximately 0.93). Other important effects are the signs obtained for the coefficients related to volatility
and duration forecasts. We obtained a positive sign for volatility and a negative but nonsignificant sign for
duration. The positive volatility sign may be interpreted as an additional premium in the spread for
uncertainty, while the nonsignificance of duration can be interpreted as a sign of high liquidity in this
market, where agents do not have to pay a premium for urgent transactions. These effects can be
interpreted as the dealers' protection against uncertainty (volatility can be related to the arrival of insiders
with privileged information, and to protection against a higher inventory loading cost when volatility
increases). Similar effects are obtained in Glassman (1987), who found a positive correlation between
spread and volatility.
A possible shortcoming of this model is the symmetric treatment of spread deviations: spreads below
the equilibrium value are treated just like those above it. Note that these two cases are intuitively
associated with different market conditions, so imposing the same response in both situations can be an
invalid restriction.
7.1. Quantilogram
To check for a possible asymmetric response in spread deviations, we used a tool known as
quantilogram, derived by Linton and Whang (2007). The quantilogram is a generalization of the
correlogram to the modeling of the dependence in conditional quantiles of the time series distribution. The
quantilogram is also a measure of directional predictability, as discussed in Linton and Whang (2007) and is
part of the general literature on tests for market efficiency.
Let y1, y2, …19 be a stationary process whose marginal distribution has quantiles μα for α ∈ (0,1). The
null hypothesis of no directional predictability conditional on quantile α is

E[\psi_\alpha(y_t - \mu_\alpha) \mid \mathcal{F}_{t-1}] = 0

where ψα(x) = α − 1(x < 0) is an indicator function that measures whether the variable at time t hit the
quantile α, and \mathcal{F}_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\} is the usual filtration. Under the null
hypothesis of no directional predictability, if the variable at time t − 1 is below quantile α, the chance is no
more than α that series y hits this quantile again at time t. Violations of this hypothesis are evidence of
predictability in this conditional quantile. Note that the traditional test of weak-form market efficiency fits
this context when one uses the conditional mean:

E[y_t - \mu \mid \mathcal{F}_{t-1}] = 0
The quantilogram has two advantages over the usual directional predictability measures: the estimation
of conditional quantiles is robust to the presence of outliers and some quantiles of the distribution of asset
returns have a straightforward interpretation in risk management, being related to measures such as Value
at Risk and Expected Shortfall. Note that there is a similar interpretation in the analysis of the spread: if
the higher quantiles of the spread distribution show persistence, this effect affects the dealers' incomes
and the transaction costs in this market.
To measure the dependence in conditional quantiles, the quantilogram derived by Linton and Whang
(2007) is given by the following expression:
19 The analysed series yt can be a directly observed process or the residual of a model estimated in a first stage, as in our case.
Linton and Whang (2007) derived asymptotic distributions valid in both situations. Inference based on the quantilogram also
remains valid in the presence of general heteroskedastic components, such as stationary GARCH processes.
Fig. 5. Quantilogram.
\hat\rho_\alpha(k) = \frac{\sum_{t=1}^{T-k} \psi_\alpha(y_t - \hat\mu_\alpha)\,\psi_\alpha(y_{t+k} - \hat\mu_\alpha)}{\sqrt{\sum_{t=1}^{T-k} \psi_\alpha^2(y_t - \hat\mu_\alpha)}\,\sqrt{\sum_{t=1}^{T-k} \psi_\alpha^2(y_{t+k} - \hat\mu_\alpha)}} \qquad (11)
where the estimator \hat\mu_\alpha is the sample quantile, obtained as the minimizer

\hat\mu_\alpha = \arg\min_{\mu \in \mathbb{R}} \sum_{t=1}^{T} \rho_\alpha(y_t - \mu) \qquad (12)

for each quantile α. Fig. 5 shows the quantilogram estimated for quantiles (0.01, 0.05, 0.10, 0.25, 0.50, 0.75,
0.90, 0.95, 0.99). The quantilogram estimated for these quantiles shows that the same shape of autoregressive
dependence found in the conditional mean occurs in the quantiles, but the dependence intensity differs
across quantiles, with a correlation in the first lags close to 0.4 for quantile 0.01 and an upward trend in the
persistence of higher quantiles; in quantile 0.99 (highest spreads), persistence is close to 0.95.
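The quantilogram estimator in Eq. (11) translates directly into a few lines of code. This is our own minimal implementation, not Linton and Whang's code, applied to a simulated persistent series and to i.i.d. noise:

```python
import numpy as np

def quantilogram(y, alpha, max_lag):
    """Sample quantilogram rho_alpha(k), k = 1..max_lag: the autocorrelation of
    the quantile-hit process psi_alpha(y_t - mu_alpha) = alpha - 1(y_t < mu_alpha)."""
    y = np.asarray(y, dtype=float)
    mu = np.quantile(y, alpha)       # sample alpha-quantile, as in Eq. (12)
    psi = alpha - (y < mu)           # quantile-hit indicator series
    rho = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        u, v = psi[:-k], psi[k:]
        rho[k - 1] = np.sum(u * v) / np.sqrt(np.sum(u**2) * np.sum(v**2))
    return rho

# Persistent AR(1) series: positive quantilogram at short lags;
# i.i.d. noise: values near zero at all lags.
rng = np.random.default_rng(2)
e = rng.standard_normal(5000)
ar = np.empty(5000)
ar[0] = e[0]
for t in range(1, 5000):
    ar[t] = 0.8 * ar[t - 1] + e[t]
print("AR(1), alpha=0.9, lags 1-3:", quantilogram(ar, 0.9, 3))
print("i.i.d., alpha=0.9, lags 1-3:", quantilogram(e, 0.9, 3))
```

Under the no-directional-predictability null the quantilogram should be close to zero at every lag, which is what the i.i.d. series delivers.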
This asymmetric effect demonstrates that the lowest percentiles are characterized by low persistence
and fast reversion to the unconditional quantile, whereas for the points where the spread is well above the
equilibrium values (percentiles greater than 0.90), persistence is large. Note that this asymmetry effect is
of major financial importance, since it shows that high spreads tend to be more persistent than low ones.
Again, we can interpret this effect as a response of dealers to unanticipated shocks, such as increases in
uncertainty and higher currency inventory maintenance costs.
The quantilogram reveals different time dependence patterns for each conditional quantile, but it does
not represent a complete parametric model. This indicates the necessity to build a model for the conditional
distribution of the spread for each quantile using an autoregressive structure, but also controlling for
volatility effects and expected durations, analogous to the linear model estimated for the spread.
A possible tool for this type of analysis is the quantile autoregression model (Koenker and Xiao, 2006),
which consists of formulating a quantile regression model using the lags of the dependent variable as
Table 7
Linear model for spread
Table 8
Quantile autoregressions
explanatory variables. In a quantile regression (Koenker and Basset (1978)), the objective function is
formulated directly in terms of the conditional quantile, minimizing the function

\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\alpha\left(y_i - x_i'\beta(\alpha)\right) \qquad (14)

which corresponds to a loss function ρα conditional on quantile α, where α ∈ (0,1). We define the loss
function as ρα(u) = u(α − I(u < 0)), where I(·) is an indicator function and u is the difference between the
observed value yi and the value predicted by x_i'\beta(\alpha). Estimators \hat\beta(\alpha) are obtained by
minimizing the loss function given by (14), the sample analogue of the expected value of
\rho_\alpha(y_i - x_i'\beta(\alpha)), with respect to each β(α).
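The link between this loss function and sample quantiles can be checked numerically: in the intercept-only case, minimizing the pinball loss ρα over constants recovers a sample α-quantile. A minimal sketch:

```python
import numpy as np

def pinball(u, alpha):
    """Quantile-regression loss rho_alpha(u) = u * (alpha - I(u < 0))."""
    return u * (alpha - (u < 0))

rng = np.random.default_rng(3)
y = rng.standard_normal(501)
alpha = 0.75

# The minimizer over constants is attained at a kink of the piecewise-linear
# objective, i.e. at a data point, so a search over the observations suffices.
losses = [np.sum(pinball(y - c, alpha)) for c in y]
c_star = y[np.argmin(losses)]
print("pinball minimizer:", c_star)
print("fraction of data below it:", np.mean(y <= c_star))
```

The minimizer satisfies the usual quantile characterization: the fraction of observations strictly below it is at most α, and the fraction at or below it is at least α.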
Note that the quantile regression model can be extended to autoregressive structures, using a quantile
regression for the autoregressive process20:

Q_{y_t}\left(\alpha \mid y_{t-1}, \ldots, y_{t-p}\right) = \beta_0(\alpha) + \beta_1(\alpha)\,y_{t-1} + \cdots + \beta_p(\alpha)\,y_{t-p} \qquad (15)

Note that we can represent this model as y_t = \beta_0(U_t) + \beta_1(U_t)\,y_{t-1}, where U_t is uniformly
distributed on (0,1). In this formulation, the traditional autoregressive AR(1) model is obtained when
\beta_0(u) = \sigma\Phi^{-1}(u) and \beta_1(u) = \beta_1. To model the possible asymmetry structure in the
response of spread deviations, we built the following model:

Q_{y_t}\left(\alpha \mid y_{t-1}, \ldots, y_{t-p}\right) = \beta_0(\alpha) + \gamma(\alpha)\,E[\mathrm{Vol} \mid \mathcal{F}_{t-1}] + \delta(\alpha)\,E[\mathrm{Dur} \mid \mathcal{F}_{t-1}] + \beta_1(\alpha)\,y_{t-1} + \cdots + \beta_p(\alpha)\,y_{t-p} \qquad (17)
20
Engle and Manganelli (2004) use a similar formulation for the estimation of conditional value at risk.
We estimated21 these models for the same quantiles (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)
used in the quantilogram (8). The quantile regression model (Eq. (17)) reveals the asymmetry effect already
shown by the quantilogram — the persistence of shocks is lower for the quantiles below the conditional
median and increases for quantiles above the conditional median, being close to 1 in these quantiles, and
confirming the asymmetric response of spread deviations.
The estimated coefficients related to volatility and duration show another interesting effect. The
coefficients of volatility always have negative signs for quantiles below the median and positive signs for
percentiles above the median. The effect of duration is clearer for quantiles above the median, where the
coefficients are always positive and statistically significant (Table 8).
This asymmetry effect indicates that volatility and duration increase the spreads for spread deviations
above the conditional median, and this effect is enhanced in higher quantiles. We may interpret this as an
asymmetric relationship of the spread with volatility and duration: spreads above the equilibrium spread
show high persistence and are positively influenced by volatility and by expected durations, whereas
spreads below the equilibrium spread are weakly persistent and negatively correlated with expected
volatility and durations, indicating an asymmetric mechanism of reversion to the equilibrium spread.
8. Conclusions
In this paper, we assessed some empirical properties related to market microstructure using high-
frequency bid and ask quote data for the BRL/US$ exchange rate market. The paper shows that some
stylized facts observed in the international literature on exchange rate market microstructure are valid for
the BRL/US$ series and introduces new tools for the analysis of empirical microstructure effects.
Among the effects analyzed herein, we observed that the violation of the Markov property implies that
there is no immediate incorporation of new information into prices in this market, resulting in a structure
with long-range dependence in terms of bid and ask returns. To capture this long-range dependence
structure, we built a price discovery model using a vector error correction model, parameterizing this
process of information incorporation and obtaining an estimate for the equilibrium spread, which is
interpreted in the microstructure literature as a measure of the average costs of liquidity provision and
stock loading by dealers who operate in this market.
The modeling of the spread shows that there is a mechanism of asymmetric response for this variable,
where spread values above and below the equilibrium value react differently to the previous information
about the spread, volatility and durations. Spread values above the equilibrium value show high persistence
and react positively, and proportionally to the quantile, to the expected volatility and conditional duration,
whereas we found an inverse relationship for quantiles below the median of the spread distribution, with a
negative correlation of the spreads with the expected volatilities and durations and low persistence, which
characterizes a nonlinear mean reversion in the spreads.
This analysis of asymmetry using tools such as quantilogram for the identification of asymmetry in
conditional quantiles and the modeling of this structure using quantile regression models is original in the
literature on currency exchange market microstructure. Such empirical evidence points to new stylized
facts that should be added to theoretical models of market microstructure.
References
Bauwens, L., Giot, P., 2000. The logarithmic ACD model: an application to the bid–ask quote process of three NYSE stocks. Annales
d'Economie et de Statistique 60, 117–149.
Bauwens, L., Giot, P., Grammig, J., Veredas, D., 2002. A comparison of financial duration models through density forecasts.
International Journal of Forecasting 20, 589–609.
Bollerslev, T., Domowitz, I., 1993. Trading patterns and prices in the interbank foreign exchange market. Journal of Finance 48,
1421–1424.
Campbell, J., Lo, A., MacKinlay, A.C., 1997. The Econometrics of Financial Markets. Princeton University Press.
Christie, W., Harris, J.H., Schultz, P., 1994. Why did NASDAQ market makers stop avoiding odd-eighth quotes? Journal of Finance 49,
1841–1890.
21
For a complete reference about quantile regression see Koenker (2005). We use the method of rank inversion for calculating the
variance–covariance matrix of the parameters.
Dufour, A., Engle, R., 2000. Time and the price impact of a trade. Journal of Finance 55 (6), 2467–2498.
Easley, D., O'Hara, M., 1987. Price, trade size, and information in securities markets. Journal of Financial Economics 19, 69–90.
Engle, R., 2000. The econometrics of ultra high frequency data. Econometrica 68, 1–22.
Engle, R., Manganelli, S., 2004. CAViaR: conditional autoregressive value at risk by regression quantiles. Journal of Business and Economic
Statistics 22 (4), 367–381.
Engle, R., Russel, J., 1998. Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66,
1127–1162.
Falkenberry, T. N., 2002. High Frequency Data Filtering. Tech. rept. Tick Data.
Fernandes, M., Amaro de Matos, J., 2007. Testing the Markov property with high frequency data. Journal of Econometrics 141 (1),
44–64.
Fernandes, M., Grammig, J., 2005a. A family of autoregressive conditional duration models. Journal of Econometrics 127 (1), 1–23.
Fernandes, M., Grammig, J., 2005b. Nonparametric specification tests for conditional duration models. Journal of Econometrics 127
(1), 35–68.
Flood, M.D., 1994. Market structure and inefficiency in the foreign exchange market. Journal of International Money and Finance 13,
52–70.
Flood, R.P., Taylor, M.P., 1996. Exchange rate economics: what's wrong with the conventional macro approach? In: The Microstructure
of Foreign Exchange Markets. National Bureau of Economic Research, pp. 261–294.
Frankel, J.A., Rose, A.K., 1995. Empirical research in nominal exchange rates. In: Handbook of International Economics. North-Holland,
pp. 1698–1729.
Frenkel, J.A., Galli, G., Giovannini, A. (Eds.), 1996. The Microstructure of Foreign Exchange Market. National Bureau of Economic
Research.
Garcia, M., Urban, F., 2004. O Mercado Interbancário de Câmbio no Brasil. Unpublished Working Paper.
Glassman, D., 1987. Exchange rate risk and transactions costs: evidence from bid–ask spreads. Journal of International Money and
Finance 6, 481–490.
Glosten, L., Milgrom, P., 1985. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of
Financial Economics 14, 71–100 (Mar.).
Goodhart, C., 1989. News and the Foreign Exchange Market. In: Manchester Statistical Society.
Green, P.J., Silverman, B.W. (Eds.), 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach.
Chapman and Hall.
Harrison, J.M., Kreps, D., 1979. Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory 20, 381–408.
Harrison, J.M., Pliska, S., 1981. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their
Applications 11, 215–260.
Hasbrouck, J., 1991. Measuring the information content of stock trades. Journal of Finance 46, 179–207.
Hasbrouck, J., 1992. Using the TORQ Database. Tech. rept. NYSE Working Paper 92-05.
Hasbrouck, J., 1999. Security bid/ask dynamics with discreteness and clustering. Journal of Financial Markets 2, 1–28.
Hasbrouck, J., 2007. Empirical Market Microstructure. Oxford University Press.
Hasbrouck, J., 1988. Trades, quotes, inventories and information. Journal of Financial Economics 22, 229–252.
Hasbrouck, J., 1996. Modelling market microstructure time series. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 14:
Statistical Methods in Finance. North-Holland (Chap. 22).
Jorion, J., 1996. Risk and turnover in the foreign exchange market. In: The Microstructure of Foreign Exchange Markets. National
Bureau of Economic Research, pp. 19–40.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Koenker, R., Basset, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Xiao, Z., 2006. Quantile autoregression. Journal of the American Statistical Association 475, 980–1006.
Linton, O., Whang, J., 2007. A quantilogram approach to evaluating directional predictability. Journal of Econometrics 141 (1),
250–282.
Lyons, R., 1996. Foreign exchange volume: sound and fury signifying nothing? In: The Microstructure of Foreign Exchange Markets.
National Bureau of Economic Research, pp. 183–208.
Lyons, R.K., 2001. The Microstructure Approach to Exchange Rates. MIT Press.
McGroarty, F., ap Gwilym, O., Thomas, S.H., 2006. Microstructure effects, bid–ask spreads and volatility in the spot foreign exchange
market pre and post-EMU. Global Finance Journal 17 (1), 23–49.
O'Hara, M., 1995. Market Microstructure Theory. Blackwell.
Sarno, L., Taylor, M., 2002. The Economics of Exchange Rates. Cambridge University Press.
Taylor, M.P., 1995. The economics of exchange rates. Journal of Economic Literature 83, 19–47.
Wu, T., 2007. Order Flow in the South: Anatomy of the Brazilian FX Market. University of California, Santa Cruz, Unpublished Working
Paper.
Zivot, E., Yan, B., 2003. Analysis of High-Frequency Financial Data with S-PLUS.
General Conclusion
In the second article of the thesis, "Generalized Latent Factor Models For Yield Curves In
Multiple Markets", we show the potential for generalization of this Bayesian latent factor
formulation by jointly modeling multiple yield curves. The proposed structures adequately
capture the more complex shapes observed in yield curves and also make it possible to identify
the interactions between curve movements across markets. This article also proposes a
methodology for reducing the number of effective parameters through Bayesian shrinkage
methods. The validity of no-arbitrage conditions and the existing identification problems are
also addressed in this article.
two following articles of the thesis. In the paper "Generalized Empirical Likelihood/Minimum
Contrast Estimation of Stochastic Differential Equations" we show that the use of
semiparametric methods yields estimators with good bias and efficiency properties for the
estimation of a series of continuous-time models used in interest rate modeling. These
properties are of interest because bias and inefficient estimation can cause problems in asset
pricing and in risk management procedures. The semiparametric inference procedures based
on Generalized Empirical Likelihood and Generalized Minimum Contrast discussed here also
deliver better properties in hypothesis tests on parameters and in specification testing
procedures.
problems existing in these methodologies, such as the inadequate fit for observations with
very long maturities or for segments of the curve with few observations.