Márcio Poletti Laurini

CAMPINAS
2009
To Louis Bachelier.
Acknowledgments

I thank, above all, my advisor Luiz Koodi Hotta for the fundamental support given throughout the doctoral program, and especially for his example as an academic, researcher, and human being. This companionship is a gift I will treasure for life. I also thank the other professors of the Graduate Program in Statistics at Imecc for their dedication and for everything I was able to learn during this period.

I thank the members of the examining committee, professors Benjamin Tabak and Pedro Morettin, for the comments and suggestions given during the thesis qualification, and also for the countless articles and books that have been fundamental so far. I thank professor Flávio Ziegellman for the comments at the thesis defense and for the many comments at conferences. I thank professor Caio Ibsen de Almeida for the inspiration for several of the articles that make up this thesis, and professor Mauricio Zevallos for all the companionship during the doctoral program and for all the comments given at the seminars I presented during this period.

I thank all my classmates for their support, friendship, and all the discussions about statistics, and my colleagues at Ibmec for their support and friendship.

I thank all my students, for allowing me to work on my research topics.

I thank Lucinéia, for my happiness and for all her support and patience.

And my parents, for everything.
“I’m not interested in doing research and I never have been. I’m interested in understanding, which is quite a different thing.”
David Blackwell

“Rather than love, than money, than faith, than fame, than fairness... give me truth.”
Christopher McCandless
Resumo

The thesis comprises seven articles on Econometrics applied to problems in Finance. Two articles address the estimation of latent factor models for fitting and forecasting the Term Structure of Interest Rates, using Bayesian estimation methods based on Markov Chain Monte Carlo; the first introduces a structure of time-varying parameters, and the second a generalization to Term Structures of Interest Rates in multiple markets with the imposition of no-arbitrage conditions.

Two articles discuss the application of Empirical Likelihood and Generalized Minimum Contrast to the estimation of stochastic differential equations and stochastic volatility models.

The next article addresses the estimation of stochastic differential equations with an innovation structure driven by a Fractional Brownian Motion, through an Indirect Inference methodology.

The following article discusses the use of nonparametric methods for the interpolation of yield curves with the imposition of no-arbitrage conditions, using smoothing splines subject to shape restrictions.

The modeling of microstructures in the foreign exchange market is addressed in the next article of the thesis, through the use of parametric and semi-parametric methodologies and tests for the presence of asymmetric information.
Abstract
The thesis consists of seven articles on Financial Econometrics. Two articles focus on the
estimation of latent factor models to fit and forecast the term structure of interest rates using
Bayesian estimation methods through Markov Chain Monte Carlo, with the first article intro-
ducing a structure of time-varying parameters and the second article a generalization for the
term structure of interest rates in multiple markets with the imposition of no-arbitrage condi-
tions.
Two articles discuss the use of Empirical Likelihood and Generalized Minimum Contrast in the
estimation of stochastic differential equations and stochastic volatility models.
The next article discusses the estimation of stochastic differential equations with a structure of
innovations driven by a Fractional Brownian motion, through a method of Indirect Inference.
The following article discusses the use of nonparametric methods for interpolation of yield
curves with the imposition of no-arbitrage conditions, using smoothing splines with shape re-
strictions.
The modeling of microstructures in the exchange rate market is discussed in the next article of the
thesis, through the use of semi-parametric and non-parametric methodologies and tests for the
presence of asymmetric information.
General Introduction

This thesis consists of a collection of articles on applications of statistical methods to problems related to the modeling of data from financial markets.

The second article, “Generalized Latent Factor Models For Yield Curves In Multiple Markets”, co-authored with Luiz Koodi Hotta, also addresses the estimation of latent factor models for the Term Structure of Interest Rates. In this article we propose a general latent factor structure for the joint modeling of multiple yield curves. Starting from the Bayesian estimation methodology using Markov Chain Monte Carlo discussed in the previous article, we propose a general structure that generalizes several existing models in the term structure literature.
This generalization allows the use of functional forms more general than those employed in the literature, with time-varying decay parameters and volatilities, as well as the direct incorporation of possible interactions between movements in the yield curves across markets. The article also presents a way to incorporate No-Arbitrage restrictions in the modeling of multiple interest rate markets. It also discusses identification problems and the use of Bayesian Shrinkage methods to reduce the large number of parameters involved in the estimation of models for multiple markets, and the proposed inference methodology yields the exact distributions of parameters, latent factors, and model forecasts. The proposed methodologies are applied to the joint modeling of the Cupom Cambial curve and the Eurodollar curve, and the article contains a detailed discussion of specification, model comparison, and forecasting, as well as a discussion of the validity of no-arbitrage conditions in these markets.
The next two articles, co-authored with Luiz Koodi Hotta, concern applications of semi-parametric methods based on Empirical Likelihood and Generalized Minimum Contrast to problems in finance. The first article, “Generalized Empirical Likelihood/Minimum Contrast Estimation of Stochastic Differential Equations”, deals with the estimation of stochastic differential equations, and the second, “Estimation of Stochastic Volatility Models Using Methods of Generalized Empirical Likelihood/Minimum Contrast”, with the estimation of stochastic volatility models. The common point between these two articles is the difficulty of evaluating the likelihood function in these two problems.
carried out in this study. We also discuss how the proposed methods deal with the misspecification problems introduced by the use of discretizations. The article also contains an empirical application of the proposed methodologies to a series of short-term interest rates with one-month maturity (T-Bills).
of yield curve interpolation and smoothing with the imposition of necessary no-arbitrage conditions. In this article we show, through simulation studies and empirical applications to the market of DIxPRÉ Swap instruments traded at the BM&F and to STRIPS (Separate Trading of Registered Interest and Principal of Securities) of US Treasury bonds, that the proposed methodology has advantages over some usual forms of yield curve interpolation.
The articles are formatted according to the standards of the journals to which they were submitted or in which they were published, and are therefore written in English following the standards of these journals. The article “Constrained Smoothing B-Splines For The Term Structure Of Interest Rates” has been accepted for publication in Insurance: Mathematics and Economics, and the article “Empirical market microstructure: An analysis of the R$/US$ exchange rate market” was published in Emerging Markets Review, v. 9, p. 247-265, 2008.
BAYESIAN EXTENSIONS TO DIEBOLD-LI TERM STRUCTURE MODEL
Abstract. This paper proposes a statistical model to adjust, interpolate, and forecast the
term structure of interest rates. This model is based on the extensions for the term structure
model of interest rates proposed by Diebold and Li (2006), through a Bayesian estimation using
Markov Chain Monte Carlo (MCMC). The proposed extensions involve the use of a more flexible
parametric form for the yield curve, allowing all the parameters to vary in time using a structure
of latent factors, and the addition of a stochastic volatility structure to control the presence of conditional heteroskedasticity.
The Bayesian estimation yields the exact distribution of the estimators in finite samples,
and as a by-product, the estimation enables obtaining the distribution of forecasts of the term
structure of interest rates. Unlike some econometric models of term structure, the methodology
developed does not require a pre-interpolation of the yield curve. The model is fitted to the
daily data of the term structure of interest rates implicit in SWAP DI-PRÉ contracts traded in
the Mercantile and Futures Exchange (BM&F) in Brazil. The results are compared with the
1. Introduction
The term structure of interest rates may be defined as a collection of interest rates, indexed
in two dimensions: maturity and time. The first index shows the relation between the rates with
different maturities for contracts of the same nature in a determined period. The second index
shows the time evolution of the rates of contracts with the same maturities. The term structure of
interest rates shows the dynamics of the yield curve, linking a functional structure of observations
in cross-section (evolution of the rates over maturity) and the evolution of the yield curve over
time. As such, the term structure may be represented by a multivariate stochastic process. There
is a wealth of literature regarding the models of the term structure. To simplify, we can classify
this literature into three classes of models.
The first class encompasses the equilibrium models, such as Brennan and Schwartz (1979), Cox
et al. (1985) and Duffie and Kan (1996). The second classification is based on the arbitrage-free
models, of which Heath et al. (1992) is the representative framework.
The third class comprises statistical models without a structural interpretation, that is, models that synthesize data patterns and allow for forecasting of the curve without necessarily representing theoretical models that hold under equilibrium and arbitrage-free conditions. Examples of this class include the methodology of principal components (Litterman and Scheinkman (1991)), curve interpolation models such as splines (McCulloch (1971)), smoothing splines (Shea (1984)), kernel regression (Linton et al. (2001)), and parametric models for curve fitting such as Nelson and Siegel (1987) and Svensson (1994).
The dynamic extension of the Nelson-Siegel model, presented in Diebold and Li (2006) and the basis of the procedure studied in this article, is an example of a statistical model that can successfully forecast the term structure of interest rates.
Despite their basis in theoretical models for interest rates, structural models based on equilibrium conditions have low forecasting power for the term structure. Calibration models based on no-arbitrage do not permit direct forecasting of the yield curve. Statistical models are generally used for fitting and forecasting the term structure of interest rates because of their superior fit relative to equilibrium-based econometric models and their greater simplicity.
2. Diebold-Li Model
Among the statistical models for interest rate, the influential model designed by Diebold-Li
(Diebold and Li (2006)) is widely used in market applications. This model is a dynamic extension
of the Nelson-Siegel model (Nelson and Siegel (1987)) for the cross-section fit for the yield curve.
The Nelson-Siegel model corresponds to fitting the following equation for the yield curve observed
in the market on a specific date:
(2.1)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_t}}{m_{it}/\tau_t} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_t}}{m_{it}/\tau_t} - e^{-m_{it}/\tau_t}\right) + \epsilon_{it}$
where yit (mit ) is the rate observed on date t for maturity mit , β1t , β2t , β3t are time-varying parameters, and τt is a decay parameter.
The Nelson-Siegel model is a parsimonious way of fitting the yield curve while managing to
capture a part of the stylized facts in interest rate process, such as the exponential formats present
in the yield curves.
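Eq. 2.1 can be sketched as a short function; the parameter values in the example below are purely illustrative, not estimates from this paper.

```python
import numpy as np

def nelson_siegel(m, beta1, beta2, beta3, tau):
    """Nelson-Siegel yield for maturity m (Eq. 2.1, without the error term)."""
    x = m / tau
    slope = (1.0 - np.exp(-x)) / x      # loading of the short-term component
    curvature = slope - np.exp(-x)      # loading of the medium-term component
    return beta1 + beta2 * slope + beta3 * curvature

# As maturity grows, both loadings vanish and the yield approaches the
# long-term level beta1; at very short maturities it approaches beta1 + beta2.
maturities = np.array([30.0, 360.0, 3600.0])
y = nelson_siegel(maturities, beta1=0.16, beta2=-0.02, beta3=0.01, tau=120.0)
```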
The parameters βit have economic interpretations: β1t represents the long-term level; β2t the
short-term component; and β3t the medium-term component. They may also be interpreted as
the level, slope, and curvature decompositions of the yield curve, respectively, according to the
terminology developed by Litterman and Scheinkman (1991). An
extension of this model is to use the formulation proposed by Svensson (1994) to fit the interest
cross-sections. This formulation considers the inclusion of an additional term to the formulation
proposed by Nelson and Siegel (1987), thus corresponding to:
(2.2)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} - e^{-m_{it}/\tau_{1t}}\right) + \beta_{4t}\left(\frac{1 - e^{-m_{it}/\tau_{2t}}}{m_{it}/\tau_{2t}} - e^{-m_{it}/\tau_{2t}}\right) + \epsilon_{it}$
allowing a more flexible fit for the yield curve and enabling the capture of multiple changes in the
yield-curve slope. The purpose of these models is to allow fitting and subsequent interpolation
and extrapolation of the yield curve based on a parametric structure, competing with
nonparametric fitting methods such as smoothing splines. Besides the parsimonious estimation, the
Nelson and Siegel (1987) model has two additional advantages over the nonparametric models.
The first advantage is that the extrapolation of the curve has a better performance because of
the exponential nature of this model. The second advantage is that this formulation avoids the
problems in the construction of the forward curve, related to the absence of convexity adjustments,
which occur in non-parametric methods.
The extension formulated by Diebold and Li (2006) makes the Nelson and Siegel (1987) model
dynamic (fitting the yield curve across the several observed days) by means of a three-stage
procedure:
(1) The Nelson-Siegel model (with τ fixed, thus, making the model linear in the parameters)
is fitted by ordinary least squares for each date, estimating the parameters β1t , β2t , β3t .
(2) The dynamics of the system is modeled by a vector autoregressive (VAR) model for the
parameters β1t , β2t and β3t estimated at the first stage.
(3) Forecasts for these parameters are made through the VAR model estimated for vectors
β1t , β2t and β3t . By substituting the forecasted parameters in the Nelson-Siegel model given
by Eq. 2.1, it is possible to forecast future interest rate curves.
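The three stages can be sketched as follows. This is a minimal illustration on a (T, n) panel of yields observed at common maturities, with an arbitrary fixed τ and a plain least-squares VAR(1); it is a sketch of the procedure, not the authors' implementation.

```python
import numpy as np

def ns_loadings(m, tau):
    """Design matrix of Nelson-Siegel loadings for fixed tau (stage 1 is then OLS)."""
    x = m / tau
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(m), slope, slope - np.exp(-x)])

def diebold_li(yields, maturities, tau=120.0, h=1):
    """Two-step Diebold-Li sketch: per-date OLS betas, then a VAR(1) forecast.
    `yields` is a (T, n) array of rates observed at the same n maturities."""
    X = ns_loadings(maturities, tau)
    # Stage 1: cross-section OLS on each date.
    betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y in yields])
    # Stage 2: VAR(1) with intercept, fitted by least squares.
    Z = np.column_stack([np.ones(len(betas) - 1), betas[:-1]])
    coef = np.linalg.lstsq(Z, betas[1:], rcond=None)[0]   # shape (4, 3)
    # Stage 3: iterate the VAR h steps ahead and map back to yields.
    b = betas[-1]
    for _ in range(h):
        b = coef[0] + b @ coef[1:]
    return X @ b   # forecast of the yield curve
```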
According to Diebold and Li (2006), this dynamic formulation has the purpose of capturing the
set of the existing stylized facts in the term structure of interest rates, such as the fact that while
the yield curve is typically increasing and concave, it may also assume inverted shapes, such as
decreasing curves and slope changes. Other stylized facts captured by the Diebold and Li (2006) model are the high
persistence in the time dynamics (rates with same maturity are highly dependent on the past)
and the fact that persistence in the long-term rates is higher than that in the short-term rates.
Though the Diebold-Li model is simple to implement and has a superior predictive potential when
compared with other related models in the literature, some problems still arise when it is used.
The three main limitations to this model are as follows:
(1) To consider τ as fixed (linearization imposed in the model) may be troublesome for the
more unstable yield curves, such as those of the emerging countries.
(2) The functional form adapted from the Nelson and Siegel (1987) model does not allow for
capturing more complicated yield curves, such as when there are multiple changes in the
slope and curvature.
(3) No econometric properties of the estimation method have been presented. Note that it
is a two-step estimation, where the VAR is estimated on the basis of an estimated vector
of beta parameters. The main problem is the construction of the confidence intervals in
the finite samples for the forecasts obtained from this model. These intervals should take
into account the uncertainty in the estimation of the vectors of the hyperparameters β1t ,
β2t and β3t .
There are some proposed solutions to these problems. Problem 1 may be addressed by estimating the full Nelson and Siegel (1987) model without fixing the parameter τ , generally using
nonlinear least squares. Yet, considering the limited number of observations in the yield curve,
the problem of minimizing the nonlinear least squares may be complicated, presenting more than
one local minimum, a possibility that may lead to an inappropriate fit of the yield curve. This is
one of the justifications for keeping the parameter τ fixed, avoiding the numerical optimization
problems involved in the estimation of nonlinear models with a restricted number of observations.
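A common practical middle ground, sketched here under the assumption that τ is restricted to a coarse grid, is to profile τ out: for each candidate τ the model is linear in the betas, so OLS delivers the exact conditional minimizer, and the grid search sidesteps local minima in a joint nonlinear search.

```python
import numpy as np

def fit_ns_profile(m, y, tau_grid):
    """Profile out tau: for each candidate tau the Nelson-Siegel model is
    linear in the betas, so OLS gives the exact conditional minimizer."""
    best = (np.inf, None, None)
    for tau in tau_grid:
        x = m / tau
        slope = (1.0 - np.exp(-x)) / x
        X = np.column_stack([np.ones_like(m), slope, slope - np.exp(-x)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ beta) ** 2))
        if sse < best[0]:
            best = (sse, tau, beta)
    return best  # (sse, tau, beta)
```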
The simultaneous estimation of betas may be performed through the state-space formulation
using the Kalman filter, but τ is kept fixed in the sample because of the need for linearity in the use
of the linear Kalman filter. Some statistical properties of a model obtained from the Diebold and
Li (2006) formulation were derived in Huse (2007), in which a form similar to the Nelson-Siegel
model is used with the incorporation of spatial dependence and macroeconomic variables. The
estimation is performed in two steps, but certain properties of the estimation method in the finite
samples are studied using the Monte Carlo simulation. There are several works generalizing the
Nelson-Siegel model in which the no-arbitrage condition is imposed (Christensen et al. (2008)),
but they will not be considered here.
3. Proposed Extensions
To overcome these problems, we have proposed an extended version of the Diebold and Li
(2006) model using Bayesian methods. Bayesian methods based on Markov Chain Monte
Carlo (MCMC) are proposed as alternatives to maximum likelihood estimation in cases
where maximum likelihood methods are complicated or unfeasible to apply. Examples of
estimation procedures using MCMC include: estimation of continuous-time diffusion processes for
term structure of interest rates, option pricing, stochastic volatility, and regime switching models,
as summarized in Johannes and Polson (2007).
The advantages of the Bayesian formulation are that it enables us to treat both the parame-
ters and state vectors as latent variables. This is carried out through the dynamic linear model
formulation for the time evolution of those parameters. In the Bayesian formulation, it is not
necessary to assume linearity, and hence, it is not necessary to fix the parameter τ , as it is done in
the Diebold-Li method. It must be noted that the construction of the posterior distribution of the
parameters is performed by simulation; hence, the various local minima that affect the estimation
based on nonlinear least squares of Eqs 2.1 and 2.2 do not constitute a problem.
The first Bayesian formulation of the Diebold and Li (2006) model, proposed by Migon
and Abanto-Valle (2007), corresponds to a specification analogous to the original model, using
the Nelson-Siegel Eq. 2.1, with the parameter τ kept fixed in time but estimated simultaneously
with the other parameters of the model.
We have proposed some extensions to the Bayesian formulation of the Diebold and Li (2006)
model proposed by Migon and Abanto-Valle (2007). The first is to use the Svensson model (Eq.
2.2), rather than the original Nelson-Siegel formula (Eq. 2.1), which makes the curve format more
flexible. The second extension is to make the parameters τ1 and τ2 time-varying, adding two latent
factors to these components. The third extension is that the formulation of our model allows for
a different number of observations on each day, which avoids the preliminary stage of curve
interpolation used to obtain a set of observations at the same maturities, as originally performed in
Diebold and Li (2006); that stage may introduce distortions in the yield curves used in the estimation.
The last extension introduced is to add a stochastic volatility structure to the model. This
addition is of fundamental importance because one of the stylized facts in the interest rates is
the presence of conditional heteroskedasticity, generally captured in no-arbitrage and equilibrium
models by the addition of factors that specifically control the stochastic evolution of the variance.
Examples of this kind of formulation include the Hull and White (1990) and Scott (1996) models,
and a detailed discussion may be found in Fouque et al. (2000). The advantages of the Bayesian
formulation are that the properties of the estimators are obtained in the exact form for finite
samples, which allows calculating the confidence intervals for the hyperparameters and forecasting
the term structure for interest rates, considering the uncertainty in the parameter estimation.
4. Model Description
We can describe the extensions proposed in this article by the following set of equations:
(4.1)   $y_{it}(m_{it}) = \beta_{1t} + \beta_{2t}\,\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} + \beta_{3t}\left(\frac{1 - e^{-m_{it}/\tau_{1t}}}{m_{it}/\tau_{1t}} - e^{-m_{it}/\tau_{1t}}\right) + \beta_{4t}\left(\frac{1 - e^{-m_{it}/\tau_{2t}}}{m_{it}/\tau_{2t}} - e^{-m_{it}/\tau_{2t}}\right) + e^{\sigma_t}\eta_t$

(4.2)   $\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ \beta_{4t} \\ \tau_{1t} \\ \tau_{2t} \end{pmatrix} = \begin{pmatrix} \mu_{\beta_1} \\ \mu_{\beta_2} \\ \mu_{\beta_3} \\ \mu_{\beta_4} \\ \mu_{\tau_1} \\ \mu_{\tau_2} \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1t-1} \\ \beta_{2t-1} \\ \beta_{3t-1} \\ \beta_{4t-1} \\ \tau_{1t-1} \\ \tau_{2t-1} \end{pmatrix} + \epsilon_t$
10
BAYESIAN EXTENSIONS TO DIEBOLD-LI TERM STRUCTURE MODEL 7
(4.3)   $\log \sigma_t^2 = \phi_0 + \phi_1 \log \sigma_{t-1}^2 + \upsilon_t$

(4.4)   $\eta_t \sim IID(0,1) \quad \text{and} \quad \eta_t \perp \eta_s \ \forall\, t \neq s$

$\Sigma_{\eta,\epsilon,\upsilon} = \begin{pmatrix} \sigma_{\eta}^2 & 0 & 0 \\ 0 & \Omega_{\epsilon} & 0 \\ 0 & 0 & \sigma_{\upsilon}^2 \end{pmatrix}$
In this specification, which may be considered a nonlinear state-space model, Eq. 4.1 corresponds to the measurement equation, connecting the observed rates yit that describe the interest
rate as functions of the maturities i at time t.
The formulation of this equation follows the specification of the Svensson model, with the
difference that the latent factors βjt and τht , j = 1, 2, 3, 4 and h = 1, 2, are time-varying rather than fixed
in time. The matrix Σ η,ǫ,υ denotes the expanded variance-covariance matrix, where ση2 is the scalar
variance in the measurement equation, Ωǫ is the variance-covariance matrix of the latent
factors, and συ2 is the scalar variance in the stochastic volatility equation. We assume that the matrix
is diagonal, except for the submatrix Ωǫ , whose components may be correlated.
The evolution of the latent factors is given by Eq. 4.2, which describes a first-order autoregres-
sive model for these components with a parameter matrix given by Φ, containing the coefficients of
autoregressive estimation. We adopted a first-order specification for the autoregressive model,
though noting that there is no theoretical limitation to a higher order. A possibility is to
implement a restricted vector autoregressive structure, by working with only one autoregressive
structure for each parameter. Although this may be imposed a priori, a possible alternative is the
use of informative priors in the estimation of vector autoregressive models, as advocated by Doan
et al. (1984).
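The state equation (Eq. 4.2) can be illustrated by a small simulation; the values of μ, Φ, and the innovation scale below are illustrative assumptions, not estimates, and a diagonal Φ corresponds to the restricted one-AR(1)-per-factor structure mentioned above. In practice, positivity of τ1t and τ2t would also have to be enforced, e.g. by modeling their logarithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stationary means for (beta1..beta4, tau1, tau2) and dynamics.
levels = np.array([0.15, -0.02, 0.01, 0.005, 120.0, 600.0])
phi = np.diag([0.95, 0.90, 0.90, 0.90, 0.98, 0.98])
mu = (np.eye(6) - phi) @ levels                      # VAR(1) intercept
chol = np.diag([1e-3, 1e-3, 1e-3, 1e-3, 0.5, 1.0])   # Cholesky factor of Omega_eps

def simulate_factors(T, mu, phi, chol, rng):
    """Simulate f_t = mu + Phi f_{t-1} + eps_t, starting at the stationary mean."""
    f = np.empty((T, 6))
    state = np.linalg.solve(np.eye(6) - phi, mu)
    for t in range(T):
        state = mu + phi @ state + chol @ rng.standard_normal(6)
        f[t] = state
    return f

path = simulate_factors(500, mu, phi, chol, rng)
```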
Finally, Eq. 4.3 describes the stochastic volatility components for the errors in the measurement
equation. The formulation used is that of an autoregressive model for the unobserved stochastic
volatility component, according to the original specification of the stochastic volatility model
introduced by Taylor (1986). The addition of the stochastic volatility model represents a relevant
extension, because the presence of conditional heteroskedasticity is a stylized fact in modeling the
series of interest rates. We noted that the addition of stochastic volatility components is especially
important at moments of changes in the shape of the yield curve, especially because these
moments are linked to greater uncertainty about future interest rates and about the paths assumed
by monetary and fiscal policy. A relevant stylized fact is that the volatility of
the interest rates is greater in the emerging economies; thus, the component of stochastic volatility
is especially relevant to the set of data used in this study.
It must be noted that the model specification given by Eqs 4.1,4.2 and 4.3 corresponds to a
nonlinear state-space model and thus cannot be treated by methods such as the linear Kalman
filter. A way to perform the simultaneous estimation is through Bayesian inference methods
using MCMC. The idea of the MCMC method is to simulate a Markov chain whose stationary
distribution is the posterior distribution p(Θ|y). The MCMC methodology simplifies the calculation,
by factoring this distribution into a set of conditional distributions of lower dimension that can
make the simulation easier. The Hammersley-Clifford theorem (see Robert and Casella (2004) for
a derivation of this result) ascertains that under certain conditions, this set of conditional distri-
butions will uniquely characterize the posterior distribution p(Θ|y), and the MCMC methodology
is based on obtaining random samples of the conditional distributions, where a Markov Chain
structure is used. An evident advantage of this method is that it does not involve any method-
ology of numerical maximization, thus avoiding the numerical problems involved in the nonlinear
maximization of the functions such as those found in our problem. The validity of the method-
ology can be verified through methods that check the convergence of the Markov chains for its
stationary distribution.
The methodology of hierarchical Bayes estimators is a convenient way to address the problem
when the model to be estimated can be placed in a state-space formulation. Following the example
given in Lehmann and Casella (1998), a form to represent these models is:
X|θ ∼ f (x|θ)
Θ|γ ∼ π(θ|γ)
Γ ∼ ψ(γ)
Thus, we place a hierarchical structure on the prior distributions. This formulation is especially useful in state-space models, because the hierarchical specification allows for the estimation
of the hyperparameters related to the latent factors using the available data, specifying the dynamics of the latent factors. For example, the local level model is formulated as follows:

(5.1)   $y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \nu_t$
where we use as prior distribution of the latent factor µt , the value of µt−1 , and then µt ∼
p(µt−1 ), which corresponds to the idea of state equation in the state-space formulation. The
specification of the latent factors uses a generalized formulation ξt ∼ p(ξt−1 ), where ξ denotes the
set of latent factors in our model given by βit , τit and σi2 . This methodology is also known as
empirical Bayes estimators (Lehmann and Casella (1998)).
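For the linear, Gaussian local level model of Eq. 5.1, the filtering recursions are available in closed form; the sketch below is the standard Kalman filter for this toy case (the full model of Section 4 is nonlinear, which is why MCMC is needed there instead).

```python
import numpy as np

def local_level_filter(y, var_eps, var_nu, m0=0.0, p0=1e6):
    """Kalman filter for the local level model (Eq. 5.1):
    y_t = mu_t + eps_t,  mu_t = mu_{t-1} + nu_t.
    Returns the filtered means E[mu_t | y_1..y_t]."""
    m, p = m0, p0
    means = []
    for obs in y:
        p = p + var_nu                 # predict mu_t given y_{1:t-1}
        k = p / (p + var_eps)          # Kalman gain
        m = m + k * (obs - m)          # update with the new observation
        p = (1.0 - k) * p
        means.append(m)
    return np.array(means)
```

With a diffuse initial variance `p0`, the first filtered mean is pulled almost entirely to the first observation, mirroring the idea that the prior for each μt is centered at μt−1.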
In our problem, we cannot directly sample from all the conditional distributions, owing to the
nonlinear forms involved. Thus, a hybrid MCMC is used, where we combine the Gibbs sampler
with the Metropolis-Hastings algorithm, a methodology initially proposed in Tierney (1994).
A hybrid MCMC algorithm (Robert and Casella (2004)) may be described as iterating the
following stages:

$\theta_i^{(t+1)} = \begin{cases} \theta_i^{(t)} & \text{with probability } 1-\rho \\ \tilde{\theta}_i & \text{with probability } \rho \end{cases}$

where

$\rho = 1 \wedge \dfrac{g_i\big(\tilde{\theta}_i \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)}\big)\; q_i\big(\theta_i^{(t)} \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\tilde{\theta}_i,\ldots,\theta_p^{(t)}\big)}{g_i\big(\theta_i^{(t)} \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)}\big)\; q_i\big(\tilde{\theta}_i \mid \theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\ldots,\theta_p^{(t)}\big)}$
where q is the proposal (tentative) distribution, assumed here to be multivariate Gaussian, and
g is the conditional distribution.
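The accept/reject mechanism can be illustrated on a toy target, a bivariate normal with correlation ρ, where one coordinate is drawn exactly from its full conditional (a Gibbs step) and the other by random-walk Metropolis-Hastings with a Gaussian proposal. This is a sketch of the general mechanism only, not the sampler used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def hybrid_mcmc(n, rho=0.8, step=1.0):
    """Toy hybrid MCMC on a standard bivariate normal with correlation rho:
    x | y is drawn exactly (Gibbs step); y | x by a random-walk MH step."""
    x, y = 0.0, 0.0
    draws = np.empty((n, 2))
    for t in range(n):
        # Gibbs step: x | y ~ N(rho * y, 1 - rho^2), sampled directly.
        x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal()
        # MH step for y | x ~ N(rho * x, 1 - rho^2), symmetric proposal.
        prop = y + step * rng.standard_normal()
        logr = ((y - rho * x) ** 2 - (prop - rho * x) ** 2) / (2 * (1 - rho**2))
        if np.log(rng.random()) < logr:   # accept with probability 1 ∧ exp(logr)
            y = prop
        draws[t] = (x, y)
    return draws
```

Because the proposal is symmetric, the q terms cancel and ρ reduces to the ratio of the full conditional g at the proposed and current points.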
To completely characterize our model, the prior distributions are the normal/inverse-gamma
pair for βit and τit , using the hierarchical characterization with the mean given by the vector
[Figure 6.1: Evolution of the yield curves over time (axes: Time, Maturity, Interest Rate).]
autoregressive structure. For the parameters Φ of the autoregressive vector, we assume a
multivariate normal structure with the variance matrix given by a Wishart distribution; for the latent
factor of stochastic volatility, we assume $\sigma_t^2 \sim LogNormal(\phi_0 + \phi_1 \log \sigma_{t-1}^2, \tau_\sigma^2)$, with a gamma distribution
for $\tau_\sigma^2$, a normal for $\phi_0$, and finally $\phi_1 \sim Beta$.
For the parameters βit and φ0 , the parameters of the Wishart distribution, and those of the gamma
distributions, we used a Gibbs sampling step; for τit , we used Metropolis-Hastings; and for the
parameter φ1 , we used the algorithm known as the Slice Sampler (Neal (2003)).
6. Application
In this section, we present an application of the model to the fitting of the term structure
implicit in the SWAP DI-PRÉ curves provided by the BM&F (Mercantile and Futures Exchange)
in Brazil. These instruments are swap contracts between floating and fixed interest rates, and
constitute the most liquid fixed income market in Brazil. This yield curve is notoriously difficult
to fit by conventional methods. We used the BM&F data on the yield curves implicit in
SWAP operations for the interval from January 12, 2004 to December 12, 2006, a sample of 722
yield-curve days. Figure 6.1 shows the evolution of the yield curves over time.
An interesting fact is that the curves in our study present several slope and curvature changes,
going from the usual increasing shape to inverted curves several times throughout the period. On
several days in the same interval, the yield curves also present two slope changes.
This fact cannot be adequately captured by the model of Diebold and Li (2006), because the
Nelson and Siegel (1987) formulation does not allow more than one slope and curvature change.
Another point of importance is that the yield curve in Brazil has intense oscillation, both in terms
of curve level and format, which reinforces the necessity to make the parameters time-varying and
challenges the maintenance of parameter τ as fixed, as assumed by the Diebold and Li (2006)
model.
Another important point is that the yield curve lengthens and retracts in the mentioned period,
that is, the maximal maturities observed in the SWAP contracts change in the analyzed sample,
varying between 1800 and 2400 days.
It must be noted that our study does not carry out a pre-interpolation or extrapolation of
the data; the methodology permits working with distinct maturities on each day. The average
number of distinct maturities is 24, with a minimum of 20 and a maximum of 29. This fact must be
highlighted because the interpolation stage may distort the data, and the estimated model may
be used to interpolate and extrapolate the curve if necessary.
To estimate the model, we used 10,000 iterations of the MCMC algorithm described in Section 5,
discarding the first 5,000 iterations as burn-in and using the remaining 5,000 to construct
the posterior distributions. The Gelman-Rubin convergence diagnostics indicate that the Markov
chains converge to their stationary distributions, validating the estimation methodology used.
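As a rough illustration of how this diagnostic works, the potential scale reduction factor compares between-chain and within-chain variance, and values near 1 indicate convergence. The following is a minimal sketch in Python (not the code used here), assuming equal-length scalar chains:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for a set of
    equal-length MCMC chains of a scalar parameter."""
    chains = np.asarray(chains, dtype=float)   # shape (m chains, n draws)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)            # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
# Two chains sampling the same stationary distribution -> R-hat close to 1
chains = rng.normal(0.0, 1.0, size=(2, 5000))
print(gelman_rubin(chains))
```

In practice the diagnostic is computed for each parameter and latent factor of the model separately.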
Figure 6.2 shows the in-sample fit of the model and the residuals relative to the observed
curves. Figures 6.3, 6.4, 6.5 and 6.6 show the evolution of the latent factors β1t , β2t , β3t and β4t ,
obtained as medians of the posterior distributions. The evolution of β1t clearly supports the level
interpretation of this parameter, following the evolution of the mean yield curve over time. The
evolution of the other latent factors also adequately captures the evolution of the slope and
curvature components of the term structure observed in the interest rates.
Figures 6.7 and 6.8 are of special importance because they show that the prior fixing of
parameter τ assumed in the Diebold and Li (2006) model is not a valid restriction, as becomes
evident from the great temporal variation observed in parameters τ1 and τ2 . This indicates the
need to incorporate variation in these parameters for yield curves with great variation in shape,
as observed in emerging countries.
The estimated stochastic volatility component (Figure 6.9) shows the capacity of the model
to capture the stylized fact of conditional heteroskedasticity in interest rates.
The conditional volatility structure captures the uncertainty existing in periods of change
Figure 6.1. Evolution of the yield curves (interest rate × maturity × time).
Figure 6.2. Fit residuals (interest rate × maturity × time).
Figure 6.3. β1
in the yield curves' shapes, because we can notice the correlation between increases in volatility
and periods of inversion of the curve shape.
Figure 6.10 shows another fact captured by the stochastic volatility structure: the high persistence
of shocks in the volatility. This is noticeable because parameter φ1 is concentrated on values
close to 1.
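To see why a φ1 close to 1 implies persistent volatility, note that in a log-volatility AR(1), h_t = φ0 + φ1 h_{t-1} + η_t, the effect of a shock decays as φ1^k. A small sketch follows; the numerical values are hypothetical illustrations, not the posterior estimates of the model:

```python
import numpy as np

# AR(1) log-volatility: h_t = phi0 + phi1 * h_{t-1} + sigma_eta * eta_t.
# The values below are hypothetical, chosen only to illustrate persistence.
phi0, phi1, sigma_eta = 0.001, 0.999, 0.1

# Half-life of a shock: number of periods until its effect on h_t is halved
half_life = np.log(0.5) / np.log(phi1)
print(round(half_life))  # -> 693: shocks take roughly 693 periods to halve

# Simulated path starting at the unconditional mean of the process
rng = np.random.default_rng(42)
h = np.empty(1000)
h[0] = phi0 / (1.0 - phi1)
for t in range(1, 1000):
    h[t] = phi0 + phi1 * h[t - 1] + sigma_eta * rng.standard_normal()
```

A posterior for φ1 concentrated this close to 1 therefore implies volatility shocks that decay very slowly.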
Figure 6.4. β2
Figure 6.5. β3
Figure 6.6. β4
Table 1 shows the credibility intervals calculated for the matrix of coefficients Φ. To verify
the stationarity of the process, we calculated the eigenvalues of the matrix Φ at the upper and lower
limits of these intervals. The highest eigenvalue for the upper limit was 1.0029, and that for the
Figure 6.7. τ1
Figure 6.8. τ2
Figure 6.9. Stochastic Volatility
lower limit was 0.9783, indicating that the region of nonstationarity is included in the credibility
intervals.
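This stationarity check can be sketched as follows: a VAR(1) is stationary when every eigenvalue of Φ lies strictly inside the unit circle, so evaluating the largest eigenvalue modulus at the limits of the credibility interval shows whether a unit root is ruled out. The matrices below are hypothetical stand-ins, not the estimated limits:

```python
import numpy as np

def max_eigen_modulus(Phi):
    """Largest eigenvalue modulus of a VAR(1) coefficient matrix;
    values below 1 imply a stationary process."""
    return np.abs(np.linalg.eigvals(np.asarray(Phi, dtype=float))).max()

# Hypothetical coefficient matrices standing in for the credibility-interval
# limits of Phi (not the estimates reported in Table 1).
Phi_lower = np.diag([0.97, 0.95, 0.90, 0.85])
Phi_upper = np.diag([1.002, 0.99, 0.95, 0.90])

print(max_eigen_modulus(Phi_lower) < 1.0)   # True: stationary at the lower limit
print(max_eigen_modulus(Phi_upper) < 1.0)   # False: a unit root is not ruled out
```

When the interval spans the unit circle, as in the estimates above, the nonstationary region cannot be excluded.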
Figure 6.10. Posterior distributions of φ0 and φ1 .
To demonstrate the predictive potential of the model, we show the forecasts for some specific
days, characterized by distinct shapes of the yield curve. We also show the forecasts and one-
step-ahead forecast errors for all the days observed in the sample.
Figure 6.11 shows the one-step-ahead forecasts obtained by the extended Diebold-Li model,
with confidence intervals at the 2.5% and 97.5% limits, for 4 observed days of the yield curve.
The first subfigure shows the prediction for July 20, 2004, with the shape generally observed in
interest rates, increasing with maturity. The second curve, predicted for February 01,
2005, shows a slope change, normally associated with expected changes in long-term
interest rates. The curve predicted for June 27, 2006 shows the opposite situation, with a decreasing
Figure 6.11. One-step-ahead forecasts (interest rate forecast × maturity) for four selected days.
curve at the medium-term maturities and an increasing curve at the long-term maturities.
Subfigure (d) shows the one-step-ahead forecast for the last observation in the sample,
referring to December 6, 2006.
The one-step-ahead forecasts for the whole sample, including extrapolations for the unobserved
maturities, and the associated prediction errors are shown in Figure 6.12. The forecast
errors have relatively low magnitude, and the larger errors are concentrated at
the moments of change in the shape of the yield curve.
We also carried out a comparative forecast analysis between the extended Diebold-Li model and
the original Diebold-Li formulation with fixed τ , the time-varying specification of the Diebold-Li
model, and modifications of the Diebold-Li model using the Svensson specification in Eq. 2.2 in place
of the original Nelson-Siegel formulation, with parameters τ1 and τ2 fixed and
time-varying, respectively. The reference models with time-varying τ were estimated
by nonlinear least squares, whereas the linearized forms were estimated by
Figure 6.12. One-step-ahead forecasts and forecast errors for the whole sample (interest rate × maturity × time).
ordinary least squares. Table 2 presents the root mean square error of the one-step-ahead
forecast errors for the five compared models. Parameters τ , τ1 and τ2 were fixed at the mean values
of the corresponding time-varying parameters.
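The comparison criterion can be sketched as a pooled root mean square error over the one-step-ahead forecast errors. The yields and model forecasts below are toy numbers, not the values behind Table 2:

```python
import numpy as np

def rmse(observed, forecast):
    """Root mean square error over pooled one-step-ahead forecast errors."""
    e = np.asarray(observed, dtype=float) - np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean(e ** 2))

# Toy example: observed yields versus forecasts from two hypothetical models
obs = np.array([0.16, 0.17, 0.18, 0.19])
model_a = np.array([0.159, 0.171, 0.182, 0.188])
model_b = np.array([0.15, 0.18, 0.17, 0.20])
print(rmse(obs, model_a) < rmse(obs, model_b))  # True: model A forecasts better
```

In the comparison of Table 2 the errors are pooled across all days and maturities of the out-of-sample exercise.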
The results of this comparative analysis show that the Diebold-Li model with the proposed
extensions has superior forecast performance compared with the other models, as shown
in Table 2.
The original Diebold-Li model with a fixed parameter τ , using the Nelson-Siegel specification,
is not a valid specification because it substantially reduces the predictive power of the model
when compared with the varying-parameter version of the same model. In the case of the Diebold-
Li model using the Svensson specification, the fixed parameters result in better predictive power than
the estimation with free parameters. This result may be explained by the difficulty in estimating
the Svensson specification: on many days the estimation does not converge because of nonlinearity,
which makes the model fitting inadequate and raises the mean square error through large
forecasting errors for all the maturities observed on those days. This problem also contaminates
the estimation of the autoregressive vector, which compromises the curve forecast for the following
day. However, the use of Bayesian estimation with informative priors allows us to employ the
more flexible Svensson specification without being affected by the instability problems of the
nonlinear estimation that occur in the classical estimation by nonlinear least squares.
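For reference, the Svensson (1994) extension adds to the Nelson-Siegel curve a second curvature term with its own decay parameter τ2, which is what allows a second hump and hence two slope changes. A sketch of the curve evaluation, with purely hypothetical parameter values:

```python
import numpy as np

def svensson(m, b1, b2, b3, b4, tau1, tau2):
    """Svensson (1994) yield curve: Nelson-Siegel plus a second
    curvature term with its own decay parameter tau2."""
    m = np.asarray(m, dtype=float)
    x1, x2 = m / tau1, m / tau2
    f1 = (1 - np.exp(-x1)) / x1                  # slope loading
    f2 = f1 - np.exp(-x1)                        # first curvature loading
    f3 = (1 - np.exp(-x2)) / x2 - np.exp(-x2)    # second curvature loading
    return b1 + b2 * f1 + b3 * f2 + b4 * f3

# Hypothetical parameters producing a curve with two humps
maturities = np.array([0.25, 0.5, 1, 2, 5, 10])
y = svensson(maturities, 0.17, -0.02, 0.03, -0.04, 0.3, 2.5)
```

The curve tends to b1 at long maturities and to b1 + b2 at the short end, exactly as in the Nelson-Siegel case; the extra factor only affects intermediate maturities.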
7. Conclusions
In this article, we implemented some extensions to the Diebold and Li (2006) model, including
Bayesian estimation via MCMC for the parameters and latent factors of this model. The proposed
extensions were: changes in the functional form, making it more flexible through the Svensson
(1994) specification; the inclusion of latent factors to make the model parameters time-varying;
the possibility of using different sets of observations on each day; and the inclusion of a stochastic
volatility structure.
The more flexible form adopted allows capturing the changing shapes of the yield curve in
emerging countries. This flexibility is reflected in the low forecasting and fitting errors observed
for this model. The Bayesian estimation combined with informative priors on the latent factors
avoids the linearized two-stage estimation procedure employed in the Diebold-Li model, which
results in more precise fits and forecasts. The specification of the parameters as latent factors
modeled through a Bayesian hierarchical structure allows obtaining the finite-sample distribution
of the parameters and model predictions, thus enabling the quantification of the uncertainty
present in the estimation of the term structure of interest rates.
We demonstrated that it is important to make the τ parameters time-varying to fit the term
structure, in particular for yield curves in emerging countries with constant modifications in
shape, as observed in the behavior of the latent factors τ1t and τ2t .
The latent factors, owing to their informative prior structure, allow us to overcome the common
problem of numerical instability associated with the nonlinear estimation of the Nelson-Siegel
and Svensson models in the presence of a restricted number of observations. Moreover,
the Bayesian estimation methodology, through MCMC algorithms, carries out the estimation
simultaneously and allows us to avoid the linearization of the model and the two-stage
estimation used in Diebold and Li (2006). The specification of the model is based on a standard set
of priors, and the estimation algorithm, based on a mixture of Gibbs and Metropolis-Hastings steps, is
widely used and its properties are extensively studied, thus making the estimation of the model
simple and trustworthy.
References
Brennan, M. J. and Schwartz, E. J.: 1979, A continuous time approach to the pricing of bonds,
Journal of Banking and Finance 3, 133–155.
Christensen, J. H., Diebold, F. X. and Rudebusch, G. D.: 2008, An arbitrage-free generalized
Nelson-Siegel term structure model, Econometrics Journal, forthcoming.
Cox, J. C., Ingersoll, J. E. and Ross, S. A.: 1985, A theory of the term structure of interest rates,
Econometrica 53, 385–408.
Diebold, F. and Li, C.: 2006, Forecasting the term structure of government bond yields, Journal
of Econometrics 130, 337–364.
Doan, T., Litterman, R. and Sims, C.: 1984, Forecasting and conditional projection using realistic
prior distributions, Econometric Reviews 3, 1–100.
Duffie, D. and Kan, R.: 1996, A yield-factor model of interest rates, Mathematical Finance pp. 379–
406.
Fouque, J.-P., Papanicolaou, G. and Sircar, K. R.: 2000, Derivatives in Financial Markets with
Stochastic Volatility, Cambridge University Press.
Heath, D., Jarrow, R. and Morton, A.: 1992, Bond pricing and the term structure of interest
rates: A new methodology for contingent claims valuation, Econometrica 60(1).
Hull, J. and White, A.: 1990, Pricing interest rate derivative securities, Review of Financial Studies
3(4), 573–94.
Huse, C.: 2007, Term structure modelling with observable state variables. Unpublished Working
Paper - FMG - LSE.
Johannes, M. and Polson, N.: 2007, Handbook of Financial Econometrics, chapter MCMC Methods
for Continuous Time Financial Econometrics.
Lehmann, E. and Casella, G.: 1998, Theory of Point Estimation (2nd Edition), Springer.
Linton, O., Mammen, E., Nielsen, J. and Tanggaard, C.: 2001, Estimating yield curves by kernel
smoothing methods, Journal of Econometrics 105, 185–223.
Migon, H. and Abanto-Valle, C.: 2007, A Bayesian term structure modelling, in C. Fernandes,
H. Schimidli and N. Kolev (eds), Proceedings of the Third Brazilian Conference on Statistical
Modelling in Insurance and Finance, IME-USP, pp. 200–203.
Neal, R.: 2003, Slice sampling (with discussions), Annals of Statistics 31, 705–767.
Nelson, C. R. and Siegel, A. F.: 1987, Parsimonious modelling of yield curves, Journal of Business
60(4), 473–489.
Robert, C. and Casella, G.: 2004, Monte Carlo Statistical Methods, Springer.
Scott, L. O.: 1996, Simulating a multi-factor term structure model over relatively long discrete
time periods, Proceedings of the IAFE First Annual Computational Finance Conference.
Shea, G.: 1984, Pitfalls in smoothing interest rate structure data: Equilibrium models and spline
approximation, Journal of Financial and Quantitative Analysis 19, 253–269.
Svensson, L. E. O.: 1994, Estimating and interpreting forward interest rates: Sweden 1992–1994,
NBER Working Paper 4871.
Taylor, S. J.: 1986, Modelling Financial Time Series, John Wiley & Sons.
Tierney, L.: 1994, Markov chains for exploring posterior distributions (with discussion), Annals
of Statistics 22, 1701–1786.
GENERALIZED LATENT FACTOR MODELS FOR YIELD CURVES IN MULTIPLE
MARKETS
Abstract. In this article we propose latent factor models to simultaneously model yield curves in multiple
markets, generalizing several models found in the literature on the estimation of the term structure of interest
rates. The proposed models do not use some of the usual restrictions adopted for estimation and identification,
thus enabling more flexible structures incorporating additional latent factors, stochastic volatility
and the imposition of no-arbitrage consistency. The elimination of these restrictions is made possible by
the Bayesian estimation methodology using Markov Chain Monte Carlo (MCMC). This methodology
makes it possible to obtain exact confidence intervals for the parameters, latent factors and forecasts, and
also to address identification and dimensionality problems in the estimation of multimarket models. The
models are applied to jointly model the Cupom Cambial (USD interest rate in Brazil) and Eurodollar curves,
carrying out an extensive model comparison and demonstrating the forecasting and practical
potential of the proposed models.
Keywords: Term Structure, Latent Factors, No-arbitrage, Forecasting.
JEL Codes: C11, G12, G17.
Address - Insper Institute - Rua Quatá 300, 04546-042, São Paulo, SP, Brasil. email - Márcio Laurini - marciopl@isp.edu.br -
Luiz Koodi Hotta - hotta@ime.unicamp.br.
1. Introduction
Modeling the term structure of interest rates is a fundamental point in the management of capital
assets. A considerably large literature has been developed to obtain more precise forms for the modelling,
forecasting and pricing of financial instruments based on the yield curve. Among these approaches,
an important part of the literature is based on the idea that the dynamic evolution of the yield curve may
be described using a set of dynamic factors that determine the evolution of risk premiums for the various
maturities observed. The most common way of considering these factors is through a representation using
latent state variables, that is, variables that are not directly observed.¹
The purpose of these latent factors is to summarize the whole set of relevant variables determining the yield
curves’ movements. The methodologies for the extraction of these latent factors may arise from purely
statistical mechanisms, such as the decomposition of principal components introduced in Litterman and
Scheinkman (1991), where the latent factors are interpreted as components of level, slope and curvature.
These latent factors may also be identified by equilibrium pricing methodologies, such as the short-rate
models of Vasicek (1977) and Cox et al. (1985), which belong to the class of affine models (affine
diffusions, e.g. Cox et al. (1985)). These equilibrium models may also be placed in a general framework
based on no-arbitrage conditions through the Heath-Jarrow-Morton (Heath et al. (1992)) formulation, which
determines the evolution of forward rates as an infinite-dimensional stochastic process.
All these approaches, however, have had only partial success in the empirical modeling of the dynamic evolution of
the term structure of interest rates. The equilibrium and affine models, though having important
analytical properties such as closed formulas for asset pricing, are characterized
by a rather unsatisfactory fit of the observed rates and of the forecasts derived from these models.
An additional difficulty is that, in general, the econometric estimation of these models suffers from
problems of local maxima and identification, as pointed out by Duffee (2002). The no-arbitrage models
¹For references about modeling the term structure of interest rates see, for example, Brigo and Mercurio (2006) for aspects
related to the pricing of financial instruments, and Singleton (2006) about the estimation of models of the term structure of
interest rates.
are calibrated to replicate perfectly the yield curve observed in the market by matching observed
bond prices, but this calibration is cross-sectional and does not allow forecasting future curves;
it only allows the pricing of derivative instruments. Moreover, these models are recalibrated daily using
instruments observed in the yield curve.
Diebold and Li (2006), whose main objective is to forecast the term structure of interest rates, propose a
dynamic model using the parametric form for the yield curve proposed by Nelson and Siegel (1987), and
interpret it as a latent factor model. In this generalization each parameter of the cross-section
fit of the Nelson-Siegel model is treated as a latent factor, and through the modeling and forecasting
of these latent factors it is possible to obtain forecasts for the whole term structure of interest rates. The
results obtained by Diebold and Li (2006) indicate that this formulation presents fit and forecasting power
superior to other yield curve modeling methodologies, making this model the standard reference for
term structure forecasting.
The model proposed in Diebold and Li (2006) is also attractive because of its ease of implementation.
With some restrictions on the parameter space, this model can be estimated using only
Ordinary Least Squares, while other models require more complex estimation tools such as
the Kalman filter (e.g. Duffee (2002)) or estimation methods such as the Simulated Method of Moments,
employed in the estimation of affine models in Dai and Singleton (2000). Apart from simplifying its
implementation, the restrictions imposed in the Diebold and Li (2006) model were necessary to avoid the
usual problems in the estimation of term structure models, such as the above-mentioned
problems of local maxima and non-identification.
Based on the success of the dynamic extension of the Nelson-Siegel curve, Diebold et al.
(2008) proposed a generalization of this model to fit multiple yield curves simultaneously, building
latent factors connected to a global yield curve that is not directly observed. In the Diebold
et al. (2008) model, the yield curve of each market is obtained, by means of these latent factors, as a linear
displacement of the global yield curve plus an idiosyncratic factor. It is
important to note that the Diebold et al. (2008) formulation is the first attempt at creating a model that
makes it possible to capture simultaneously the dynamics of several term structures. This formulation
has also been adopted to model the yield curves of emerging countries in Morita and Bueno (2008), thus
demonstrating the general applicability of this model.
However, the model proposed by Diebold et al. (2008) employs a series of restrictions in its formulation.
Given the high number of parameters involved in the estimation of the global model, Diebold et al. (2008)
employ a rather limited specification for the general shape of the yield curve in each market. Instead of
using the complete formulation of the Nelson-Siegel model with level, slope and curvature factors, Diebold
et al. (2008) use only the level and slope components, which makes the fit to the observed yield curves
rather limited, although it is important to note that the primary purpose of that model was not fit or
forecasting, but rather to verify the existence of a global factor influencing the movements of the
term structure in the most important markets.
An additional restriction is that the parameter defining the slope of the yield curve is kept
constant, which significantly impairs the model's fit. Other problems in this formulation
concern the estimation procedures: the two-stage procedure does not make
it possible to obtain measures such as exact confidence intervals for the model's parameters and for the
yield curve forecasts. Further problems relate to the model's identification, that is, to obtaining conditions
for a unique vector of parameters defining the maximum of the likelihood function employed in the
model's estimation. Finally, the formulation presupposes constant conditional volatility,
which contradicts one of the stylized facts in the modeling of yield curves.
Furthermore, the formulation proposed in Diebold et al. (2008) does not overcome one of the fundamental
criticisms of the original model of Diebold and Li (2006), namely its inconsistency with no-arbitrage
conditions. This limitation was resolved in Christensen et al. (2007, 2008), who demonstrate that,
although the original formulation is incompatible with no-arbitrage conditions, it is nevertheless
possible to work with an approximate, arbitrage-free form of this model, reparameterizing it
as an affine term structure model and obtaining a correction term that enables the incorporation
of the no-arbitrage
multimarket models through a mechanism known as Bayesian shrinkage, which enables the automatic
elimination of the model's redundant parameters. Finally, we implement, for the multiple-market model,
the no-arbitrage conditions formulated by Christensen et al. (2008), generalizing these conditions to
the multiple-market case. Thus the proposed generalizations deal with all the problems pointed out in
the original formulations of the Diebold and Li (2006) and Diebold et al. (2008) models.
This article is structured as follows: Sections 2 and 3 review the original models of Diebold and Li (2006)
and Diebold et al. (2008) and discuss the problems in their formulations. Section 4 presents the extensions
proposed to get around those problems. Section 5 discusses the implementation of no-arbitrage conditions,
and Section 6 presents the Bayesian estimation procedure by MCMC. Sections 7 and 8 discuss how
the Bayesian estimation provides a way of addressing the problems of identification and of the dimensionality
of the parameter vector. Section 9 presents an empirical application of the proposed models, fitting joint
models for the curves of the Cupom Cambial (USD interest rates in Brazil) and the Eurodollar curve. This
section carries out an extensive comparison of all the models proposed in this study, and we also implement
a procedure that is new in the literature, making it possible to verify the validity of the imposition of no-arbitrage
conditions on these term structure models. Final considerations follow in Section 10.
Among the models used for the term structure of interest rates, the model proposed by Diebold
and Li (2006) is widely used in the market because of its simplicity of implementation and its superior
forecasting performance. This model is based on the formulation proposed by Nelson and Siegel (1987)
for the cross-section (day by day) fit of the yield curve. The Nelson and Siegel (1987) curve is represented
as follows:
(2.1) \quad y_t(m) = \beta_1 + \beta_2 \, \frac{1 - e^{-m/\tau}}{m/\tau} + \beta_3 \left( \frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau} \right) + \epsilon_t(m)
where yt (m) are the rates observed on a given date t for the maturity vector m, and β1 , β2 , β3 and τ are
parameters. The parameters are interpretable: β1 represents the long-term component, β2 a short-term
component, β3 a medium-term component, and τ is a parameter that controls the slope of the yield curve.
Parameters β1 , β2 , β3 may also be interpreted as level, slope and curvature components, in accordance
with the terminology developed by Litterman and Scheinkman (1991). This model is a parsimonious way
of fitting the yield curve, and is capable of reproducing several stylized facts about the shape of the yield
curve over time.
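Equation (2.1) can be evaluated directly: the three loadings act as a constant level, an exponentially decaying slope term, and a medium-maturity hump. A minimal sketch, with hypothetical parameter values:

```python
import numpy as np

def nelson_siegel(m, b1, b2, b3, tau):
    """Nelson-Siegel (1987) curve of eq. (2.1): level, slope and
    curvature components with decay parameter tau."""
    x = np.asarray(m, dtype=float) / tau
    slope = (1 - np.exp(-x)) / x      # -> 1 at the short end, 0 at the long end
    curv = slope - np.exp(-x)         # hump peaking at medium maturities
    return b1 + b2 * slope + b3 * curv

# Hypothetical parameters: b1 is the long-run level, b1 + b2 the short rate
maturities = np.array([0.5, 1, 2, 5, 10])
y = nelson_siegel(maturities, 0.16, -0.03, 0.02, 1.5)
```

Note that for fixed τ the curve is linear in β1, β2, β3, which is what makes the per-day fit an ordinary least squares problem.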
Diebold and Li (2006) propose a dynamized version of the Nelson-Siegel model, interpreting the parameters
as dynamic factors. This model can be formulated through an observation equation for the yield curve
given by:
(2.2) \quad y_t(m) = \beta_{1t} + \beta_{2t} \, \frac{1 - e^{-m/\tau}}{m/\tau} + \beta_{3t} \left( \frac{1 - e^{-m/\tau}}{m/\tau} - e^{-m/\tau} \right) + \epsilon_t(m)
and a system determining the evolution of latent factors as a first-order vector autoregression:
(2.3) \quad \begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1t-1} \\ \beta_{2t-1} \\ \beta_{3t-1} \end{pmatrix} + \epsilon^{\beta}_t
where Φ is the parameter matrix of this vector autoregressive process. The model estimation is
generally performed through a two-stage procedure. The first stage is the estimation of equation 2.2 for
each observed day. This estimation is performed by Ordinary Least Squares, assuming that the slope
parameter τ is fixed and known, yielding the latent factors β1t , β2t , β3t for each period t. The
second stage is the Ordinary Least Squares estimation of the parameter matrix Φ of the vector
autoregression using the factors β1t , β2t and β3t estimated in the first stage. Forecasts for the model
are obtained by inserting the forecasts for the latent factors into the Nelson-Siegel equation (2.2).
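The two-stage procedure can be sketched compactly: with τ fixed, equation (2.2) is linear in the β's, so stage one is an ordinary least squares regression per day on the Nelson-Siegel loadings, and stage two fits the VAR(1) of equation (2.3) by OLS on the extracted factors. A sketch on simulated data (τ and the factor values below are hypothetical):

```python
import numpy as np

def ns_loadings(m, tau):
    """Regressor matrix [1, slope, curvature] of eq. (2.2) for a fixed tau."""
    x = np.asarray(m, dtype=float) / tau
    slope = (1 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def two_stage_diebold_li(yields, maturities, tau):
    """Stage 1: per-day OLS of the observed yields on the loadings.
    Stage 2: OLS fit of the VAR(1) of eq. (2.3) to the extracted factors."""
    X = ns_loadings(maturities, tau)
    betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0] for y in yields])
    # VAR(1): beta_t = mu + Phi beta_{t-1} + e_t, fitted equation by equation
    Z = np.column_stack([np.ones(len(betas) - 1), betas[:-1]])
    coef = np.linalg.lstsq(Z, betas[1:], rcond=None)[0]
    mu, Phi = coef[0], coef[1:].T
    return betas, mu, Phi

# Simulated data with hypothetical constant factors plus small noise
rng = np.random.default_rng(1)
m = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
true_b = np.array([0.16, -0.03, 0.02])
yields = ns_loadings(m, 1.5) @ true_b + 0.0005 * rng.standard_normal((200, 5))
betas, mu, Phi = two_stage_diebold_li(yields, m, tau=1.5)
```

This sketch makes the efficiency loss discussed below visible in the code: the factors extracted in stage one carry no information from the dynamics imposed in stage two.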
As can be noted, estimation and forecasting in the Diebold and Li (2006) model are extremely
simple, making implementation possible in any standard econometric software. However, this simplified
formulation can be criticized on various grounds. It may be too restrictive to consider parameter τ
constant for unstable curves, as is the case for the curves of emerging countries. This parameter
captures the average slope of the yield curve, and it can change with alterations in the
curve shape. Another important point is that the adopted parametric specification, derived from the
functional form of the Nelson-Siegel model, does not make it possible to capture curves with more
complicated shapes, such as curves with more than one change in slope and/or curvature.
Other relevant points refer to the properties of the estimators in this two-stage estimation procedure.
The first point is that the estimation is consistent only if the correct parameter τ is chosen. It is also
important to note that the distribution of the estimators in this context is unusual, since the estimation in
the second stage is based on a series constructed in the first stage. This also affects the construction
of confidence intervals for the yield curve forecasts derived from this model. Furthermore, there
is a loss of efficiency in the two-stage estimation since the estimation of the latent factors is performed
day by day, and hence is disconnected from the vector autoregressive structure adopted in equation 2.3. An
alternative way of performing this estimation would be maximum likelihood through the Kalman
filter, since the system formed by equations 2.2 and 2.3 is already in state-space form, but this
estimation continues to suffer from local maxima and non-identification problems, as commonly happens
in the estimation of term structure models that use the Kalman filter (e.g. Duffee (2002)). Another
fundamental problem is that the original formulation of the Diebold and Li (2006) model is not consistent
with the no-arbitrage principle. The Nelson-Siegel curve used in Diebold and Li (2006) does not admit
an arbitrage-free representation, as shown, for example, in Björk and Christensen (1999).
The Diebold-Li model is a dynamic model for the curve of a single market, but it is possible to
generalize this formulation to model several yield curves simultaneously. This generalization was proposed in
Diebold et al. (2008). Denoting the observed curve for market i as a function of the maturity vector τ by
yit (τ ), the yield dynamics in this model is given by a restricted version of the Nelson-Siegel
curve, with level and slope factors only:²
(3.1) \quad y_{it}(\tau) = l_{it} + s_{it} \, \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + v_{it}(\tau)
where, in the Diebold et al. (2008) notation, lit represents the level component in period t for
country i, sit represents the slope component for the same country in each period t, and vit is a
shock component in the rate observation equation. In order to specify the complete dynamics of the
model it is necessary to specify the evolution of the latent level and slope factors for each country. In
the specification proposed in Diebold et al. (2008) the idea is that there are so-called global factors
determined by an unobserved curve ygt of the form:
(3.2) \quad y_{gt}(\tau) = L_t + S_t \, \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + V_{gt}(\tau)
and the dynamics of the global latent factors Lt and St is given by the following autoregression:
²In this exposition of the model we follow the original notation of the Diebold et al. (2008) study, which denotes the
maturity vector as τ and the slope parameter as λ, whereas the notation used in the other models presented here uses m to
denote the maturity vector and τ for the slope parameters. We also use the original specification of Nelson and Siegel (1987)
for parameter τ , whereas Diebold and Li (2006) and Diebold et al. (2008) use the factor λ = 1/τ .
(3.3) \quad \begin{pmatrix} L_t \\ S_t \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} L_{t-1} \\ S_{t-1} \end{pmatrix} + \begin{pmatrix} U^l_t \\ U^s_t \end{pmatrix}
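The global-factor dynamics of equations (3.2)-(3.3) can be sketched by simulating the VAR(1) for (L_t, S_t) and evaluating the implied global curve; the coefficient values below are hypothetical, not estimates from any model in this thesis:

```python
import numpy as np

# Hypothetical, stationary coefficient matrix for the global-factor VAR(1)
Phi_g = np.array([[0.98, 0.02],
                  [0.01, 0.95]])
rng = np.random.default_rng(7)

F = np.empty((500, 2))               # columns: global level L_t and slope S_t
F[0] = [0.05, -0.01]
for t in range(1, 500):
    F[t] = Phi_g @ F[t - 1] + 0.001 * rng.standard_normal(2)

def global_curve(tau_grid, L, S, lam):
    """Global yield curve of eq. (3.2) with level and slope factors only."""
    x = lam * np.asarray(tau_grid, dtype=float)
    return L + S * (1 - np.exp(-x)) / x

y_g = global_curve(np.array([0.5, 1, 2, 5, 10]), F[-1, 0], F[-1, 1], lam=0.6)
```

Each market's curve is then obtained as a displacement of this global curve, as described next.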
In order to determine the level and slope components it is assumed that each country's curve is a
linear modification of the global curve plus an idiosyncratic component, whose dynamics are given by:
(3.5) \quad \begin{pmatrix} \varepsilon^l_{it} \\ \varepsilon^s_{it} \end{pmatrix} = \begin{pmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{pmatrix} \begin{pmatrix} \varepsilon^l_{it-1} \\ \varepsilon^s_{it-1} \end{pmatrix} + \begin{pmatrix} u^l_t \\ u^s_t \end{pmatrix}
The estimation of this model could be performed in principle employing maximum likelihood through
the decomposition of the forecast error using the Kalman filter, noting that in this case we have additional
latent variables representing the global factors. However, due to the dimension of the problem for the
multimarket case and the usual estimation problems, such as the identification problems and the possibil-
ity of local maxima, the estimation of the Diebold et al. (2008) model is performed in two stages.
In the first stage the curve for each country is obtained by Ordinary Least Squares, assuming again
that the parameter which controls the slope curve is kept constant and not estimated. A second stage
is performed with the factors obtained for each country, using MCMC to obtain the other parameters
and latent factors. This estimation is also performed with the imposition of some restrictions such as
the assumption that the parameter matrix in the autoregressive processes of the local factors is diagonal.
Even though this procedure has an operational purpose, it is difficult to obtain a statistical interpretation of the results, because the estimation of the model is in part frequentist and in part Bayesian. Once again we face the problem of how to build confidence intervals for parameters and forecasts under this two-stage procedure; moreover, the MCMC estimation uses only the joint distributions of the extracted factors and linear specifications, and thus does not exploit all the information in the yield curve.
This procedure presents similar limitations to those of the original estimation of the Diebold-Li model,
but aggravated by the dimensionality and heterogeneity of the model and of the shapes of the yield curves
in different markets. The first important point to note is that fixing the slope parameter of the curve can
considerably limit the model fit. Different markets may have very different slope factors, and, as already
mentioned, it may turn out to be extremely limiting to assume that these parameters are fixed in time.
Another important point is that the restriction of assuming only level and slope factors also limits the possible fit of the model. There is ample literature documenting the fit gains obtained by the inclusion of additional curvature factors, such as the original Svensson (1994) model and the Björk and Christensen (1999) model, which add more slope and curvature factors, thus considerably increasing the fit. It is also fundamental to note that in this specification it is not possible to use the arbitrage-free specifications proposed in Christensen et al. (2007, 2008), because in those formulations each slope component has to be coupled with a curvature component with the same mean-reversion rate; therefore this formulation without curvature components cannot be made arbitrage-free.
A further fundamental criticism is that, in the dynamic specification adopted in equation 3.5, each
country’s curve is a displacement of the global curve plus an idiosyncratic factor. Note that in this formu-
lation there is no direct interdependence between the yield curves, thus the model does not allow direct
identification of the possible interactions between the latent factors of different markets. A natural exercise would be to verify whether, for example, displacements in the level of one particular market
do affect the level of the other markets. Note that in this formulation this is performed only indirectly by
modifications in the global factor, and it is not possible to observe this direct effect.
4. Proposed Models
In order to address the existing problems in the original formulations of the Diebold and Li (2006);
Diebold et al. (2008) models, we use the Bayesian framework for latent factors proposed in Laurini and
Hotta (2008), though generalized for the case of more than one yield curve and also with the addition of
no-arbitrage correction proposed in Christensen et al. (2008). The proposed models can be classified into three classes: the first is a generalization of the latent factor structure, increasing the state vector so as to include interactions with the other latent factors, in particular the latent factors of the other countries; the second is a generalization of the global factor structure of Diebold et al. (2008), with the inclusion of components of curvature, double curvature and additional slopes; and the third contains the modifications necessary to make the previous two classes arbitrage-free, using the approximation of an affine model proposed in Christensen et al. (2007, 2008).
The common structure between the first two classes is given by the more flexible formulation of the
observation equation. We adopt as basic structure the dynamic generalization of the parametric form
proposed by Svensson (1994), which consists in an equation with a level factor, a slope factor and two
curvature factors in the form:
(4.1) $y_t(m) = \beta_{1t} + \beta_{2t}\dfrac{1-e^{-m/\tau_{1t}}}{m/\tau_{1t}} + \beta_{3t}\left[\dfrac{1-e^{-m/\tau_{1t}}}{m/\tau_{1t}} - e^{-m/\tau_{1t}}\right] + \beta_{4t}\left[\dfrac{1-e^{-m/\tau_{2t}}}{m/\tau_{2t}} - e^{-m/\tau_{2t}}\right] + \sigma_t \eta_t(m)$
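The Svensson form in equation (4.1), without the noise term, can be sketched numerically as follows. This is an illustrative reconstruction; the function name and parameter values are ours, not from the study.

```python
import numpy as np

def svensson_yield(m, beta, tau1, tau2):
    """Svensson (1994) yield at maturity m (equation 4.1, noise term omitted).

    beta = (beta1, beta2, beta3, beta4): level, slope, curvature and
    double curvature factors; tau1, tau2 are the slope (decay) parameters."""
    x1, x2 = m / tau1, m / tau2
    slope = (1 - np.exp(-x1)) / x1
    curv1 = slope - np.exp(-x1)
    curv2 = (1 - np.exp(-x2)) / x2 - np.exp(-x2)
    return beta[0] + beta[1] * slope + beta[2] * curv1 + beta[3] * curv2

# For very long maturities all loadings except the level vanish,
# so the yield approaches the level factor beta1.
m = np.array([1.0, 12, 60, 120, 1200])          # maturities in months
y = svensson_yield(m, (6.0, -2.0, 1.0, 0.5), tau1=18.0, tau2=60.0)
```

The slope loading decays from one to zero with maturity, while the two curvature loadings are hump-shaped, which is what gives the four factors their usual interpretation.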
where we assume that $\eta_t(m)$ is a Gaussian measurement error.
In the arbitrage-free models we use the specification with an additional slope factor and another curvature factor on the representation given by equation 4.1, as detailed in section 5. On this basic model we also adopt the generalization proposed in Laurini and Hotta (2008) to render the slope factors $\tau_1$ and $\tau_2$ time-varying, treating these parameters as additional latent factors and using a first-order autoregressive structure for all the latent factors, given by:
(4.3) $\begin{pmatrix} \beta_{1t} \\ \beta_{2t} \\ \beta_{3t} \\ \beta_{4t} \\ \tau_{1t} \\ \tau_{2t} \end{pmatrix} = \begin{pmatrix} \mu_{\beta_1} \\ \mu_{\beta_2} \\ \mu_{\beta_3} \\ \mu_{\beta_4} \\ \mu_{\tau_1} \\ \mu_{\tau_2} \end{pmatrix} + \Phi \begin{pmatrix} \beta_{1,t-1} \\ \beta_{2,t-1} \\ \beta_{3,t-1} \\ \beta_{4,t-1} \\ \tau_{1,t-1} \\ \tau_{2,t-1} \end{pmatrix} + \epsilon_t,$
where, in principle, the matrix $\Phi$ is a full matrix, so that each latent factor in period t depends on all the latent factors in period t-1, plus an intercept $\mu$. Another generalization in this model is the possibility of a stochastic volatility factor $\sigma_t$, whose log-variance follows a first-order autoregressive dynamics; its multimarket version is given in equation (4.8).
This factor makes it possible to capture the conditional volatility structure present in interest rates, a well-documented stylized fact (e.g. Chan et al. (1992), Lund and Andersen (1997)). This stochastic volatility component has an additional function: it helps to avoid excessive variation in the latent factors of the model. A known result in the Bayesian literature is that it is possible
to rewrite a regression model with random coefficients as a regression model with fixed coefficients by the inclusion of a conditional heteroscedasticity component (e.g. Bauwens et al. (1999)). In relation to this
point, it is also interesting to note the criticism made by Sims (2001) of the time-varying parameter model proposed by Cogley and Sargent (2001) to detect changes in monetary policy. Sims points out that the variation observed in the parameters of the Cogley and Sargent (2001) model could be generated by an uncontrolled conditional volatility structure in the model. Thus this conditional volatility component tries to avoid the problem of excessive variation in the latent factors of the model.
An important point is that, in the class of models with no-arbitrage corrections, it is necessary to keep both the volatility and the slope parameters constant in time, and therefore these two extensions cannot be used: they would render the model incompatible with the class of affine models, and consequently the approximation proposed in Christensen et al. (2007, 2008) would not apply.
In the following sections we define the particular characteristics of the three classes of models proposed in this study.
4.1. Models of Generalized Latent Factors. The first class of models uses a generalization of the
Diebold-Li model, expanding the latent vectors to include interactions between the latent factors defining
the curves of the different markets. In this class we define the observed curve $y_t^i(m)$ for country $i$ through Svensson's representation:
(4.5) $y_t^i(m) = \beta_{1t}^i + \beta_{2t}^i\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} + \beta_{3t}^i\left[\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} - e^{-m/\tau_{1t}^i}\right] + \beta_{4t}^i\left[\dfrac{1-e^{-m/\tau_{2t}^i}}{m/\tau_{2t}^i} - e^{-m/\tau_{2t}^i}\right] + \sigma_t^i \eta_t^i(m)$
The generalization of the vector of latent factors for the multimarket case is given by the following
representation:
(4.6) $\beta_{kt}^i = \Phi_i \beta_{k,t-1}^i + \Phi_j \beta_{k,t-1}^j + \epsilon_{kt}$

(4.7) $\tau_{kt}^i = \theta_i \tau_{k,t-1}^i + \theta_j \tau_{k,t-1}^j + \nu_{kt}$

(4.8) $\ln\sigma_t^{2i} = \gamma_i \ln\sigma_{t-1}^{2i} + \gamma_j \ln\sigma_{t-1}^{2j} + \xi_t$
where $k = 1, 2, 3, 4$ and $\beta_{kt}^i$ represents the latent factors of level (k=1), slope (k=2), curvature (k=3) and double curvature (k=4) for market $i$, with a structure analogous to equation 4.3 but with the inclusion of $\beta_{k,t-1}^j$, which represents the factors for market $j$, which have an equivalent representation. Likewise, we have the factors $\tau_{kt}^i$ for the different countries and the stochastic volatility structure $\sigma_t^{2i}$ for each country $i$. Note that in this representation each latent factor of a market is influenced by the other countries' factors, allowing us to introduce an interaction between the different yield curves, as discussed in section 3. In order to complete the model we adopt the following covariance structure for each market's parameters:
$\Sigma_{\eta,\epsilon,\upsilon}^i = \begin{pmatrix} \sigma_\eta^{2i} & 0 & 0 \\ 0 & \Omega_\epsilon^i & 0 \\ 0 & 0 & \sigma_\upsilon^{2i} \end{pmatrix}$

The matrix $\Sigma_{\eta,\epsilon,\upsilon}^i$ is the expanded variance-covariance matrix of the parameters of the model for each country; $\sigma_\eta^{2i}$ is the variance of the measurement equation; $\Omega_\epsilon^i$ is the covariance matrix between the latent factors; and $\sigma_\upsilon^{2i}$ is the variance of the stochastic volatility process. This matrix is block-diagonal, except for the block $\Omega_\epsilon^i$, which allows covariances between the latent factors.
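The cross-market factor dynamics of equation (4.6) can be illustrated by simulation. This is a sketch under our own assumptions: two markets, scalar own- and cross-market coefficients, and arbitrary parameter values chosen only so that the joint system is stationary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each latent factor of market i depends on its own lag and on the lagged
# factor of market j (equation 4.6). Coefficients are illustrative.
phi_i, phi_j = 0.90, 0.05        # own- and cross-market autoregressive weights
T, K = 500, 4                    # sample size; 4 factors per market
beta_i = np.zeros((T, K))
beta_j = np.zeros((T, K))
for t in range(1, T):
    shock_i = 0.1 * rng.standard_normal(K)
    shock_j = 0.1 * rng.standard_normal(K)
    beta_i[t] = phi_i * beta_i[t - 1] + phi_j * beta_j[t - 1] + shock_i
    beta_j[t] = phi_i * beta_j[t - 1] + phi_j * beta_i[t - 1] + shock_j
```

Stationarity of the joint system requires the eigenvalues of the stacked autoregressive matrix to lie inside the unit circle; here they are $\phi_i \pm \phi_j = 0.95$ and $0.85$, so the simulated factors remain bounded.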
4.2. Generalized Global Model. The second class of models is a generalization of the Diebold et al. (2008) global factor model. In this case we do not adopt the restrictions imposed in that study, and we use a complete representation for the parametric structure of the yield curve observed in each country, employing a representation analogous to Svensson's curve. Following the Diebold et al. (2008) notation, the curve for each country is given by:
(4.9) $y_t^i(m) = l_{it} + s_{it}\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} + c1_{it}\left[\dfrac{1-e^{-m/\tau_{1t}^i}}{m/\tau_{1t}^i} - e^{-m/\tau_{1t}^i}\right] + c2_{it}\left[\dfrac{1-e^{-m/\tau_{2t}^i}}{m/\tau_{2t}^i} - e^{-m/\tau_{2t}^i}\right] + \sigma_t^i \eta_t^i(m)$
where $l_{it}$ is the level of country $i$, $s_{it}$ is the slope and $c1_{it}$ and $c2_{it}$ are the two curvature factors, all evolving in $t$. In this representation $\tau_{1t}^i$ and $\tau_{2t}^i$ are the slope parameters for each country $i$, and they are also time-varying.
To complete the specification of the model, we generalize the structure of global factors used by Diebold et al. (2008). In this structure each latent factor of level, slope and curvature is a linear function of the equivalent global factor. This representation is written as:

$l_{it} = \alpha_i^l + \beta_i^l L_t + \varepsilon_{it}^l, \quad s_{it} = \alpha_i^s + \beta_i^s S_t + \varepsilon_{it}^s, \quad c1_{it} = \alpha_i^{c1} + \beta_i^{c1} C1_t + \varepsilon_{it}^{c1}, \quad c2_{it} = \alpha_i^{c2} + \beta_i^{c2} C2_t + \varepsilon_{it}^{c2}$
where the $\alpha$ and $\beta$ represent parameters (loadings) to be estimated, and the vector of global latent factors $(L_t, S_t, C1_t, C2_t, \tau_{g1,t}, \tau_{g2,t})$ evolves as a first-order autoregressive vector, generalizing the structure of equation
(3.3). We also assume that the idiosyncratic components for the latent factors of each market follow a
first-order autoregressive structure, according to the general specification given by equation (3.5), but
applied to this generalized vector of latent factors.
5. No-arbitrage
The specifications discussed so far consist basically of statistical representations, i.e., although the latent factors are interpreted as components of level, slope and curvature, this interpretation, even in affine models, is an approximation, as demonstrated in Almeida (2005). These representations are
merely tools for fitting and forecasting the yield curve, lacking a complete theoretical or structural justification. In this regard, the main shortcoming of these models is their lack of compatibility with the principle of no-arbitrage pricing. The fundamental result of no-arbitrage pricing, known as the fundamental theorem of asset pricing, establishes that a market is arbitrage-free if, and only if, there exists (at least) one probability measure Q, equivalent to the physical measure P, under which the sequence of discounted asset prices is a martingale (e.g. Harrison and Kreps (1979); Harrison and Pliska (1981); Delbaen and Schachermayer (1994)).
Consistency with no-arbitrage is a fundamental principle in Finance, since it establishes that an asset's return must be consistent with its level of risk, so that systematic risk-free profits cannot arise. In large, highly liquid markets the no-arbitrage principle should be enforced by the actions of rational traders. In the modeling of the term structure of interest rates, the general principle of no-arbitrage can be cast within the general framework proposed by Heath et al. (1992). A curve is consistent with no-arbitrage if it can be projected onto the space of all arbitrage-free curves under the equivalent martingale measure, and it must generally be contained in a stochastic manifold generated by the Heath-Jarrow-Morton structure, as demonstrated in Filipovic (2001).
The problem arising here is that the curves generated by the Nelson-Siegel models are never consistent with no-arbitrage, and only one restricted version of the Svensson model is consistent with no-arbitrage, but its structure is too limited for practical use, as proven by Filipovic (1999). Therefore, although models of the Nelson-Siegel and Svensson classes and their dynamic extensions present a good empirical fit to the observed data of the term structure of interest rates, they are not valid in terms of no-arbitrage consistent pricing. On the other hand, the opposite situation also occurs: the majority of the no-arbitrage models in use have a poor fit to the observed data, as demonstrated by Duffee (2002), suggesting an apparent trade-off between consistency with no-arbitrage on the one hand, and fit and forecasting power on the other. Nevertheless, recent evidence indicates that, with adequate modifications in the structure of arbitrage-free models, it is possible to obtain adequate predictive power in these models, as, for example, in Almeida and Vicente (2008).
Although there is no arbitrage-free form in the Nelson-Siegel-Svensson class, with the introduction of some modifications it is possible to produce a similar class of models with the no-arbitrage property, as shown in Christensen et al. (2007) for the Nelson-Siegel family and Christensen et al. (2008) for the Svensson family3. To make this correction for no-arbitrage, Christensen et al. (2007, 2008) employ affine term structure models. These models are quite convenient because they present interesting analytical properties, such as the existence of closed formulae for asset pricing, and are characterized by a common structure that makes it possible to encompass several models studied in the literature, as demonstrated by Dai and Singleton (2000).
In order to characterize the structure of the affine term structure models we start from the definition of the price of a zero coupon bond at time t with maturity T under the equivalent martingale measure Q, which must be given by:

(5.1) $P(t,T) = E_t^Q\left[e^{-\int_t^T r_s\, ds}\right]$,

where $r(t)$ represents the instantaneous interest rate (short rate). In this class of models $r(t)$ is an affine function of an unobserved vector of state variables (latent factors) $Y(t)$:

(5.2) $r(t) = \delta_0 + \sum_{i=1}^N \delta_y^i\, Y_i(t)$,
where the $\delta$'s represent parameters and $Y(t)$ is a so-called affine diffusion with the following structure:

(5.3) $dY(t) = \kappa(\theta - Y(t))dt + \Sigma\sqrt{S(t)}\, dW(t)$

3The derivation for the Nelson-Siegel family is a special case, using only three latent factors, as can be seen in Christensen et al. (2008).
with parameters $\kappa$ and $\theta$, where $dW(t)$ is a standard Brownian motion and $S(t)$ is a diagonal matrix with i-th element given by:

(5.4) $S(t)_{ii} = \alpha_i + \beta_i' Y(t)$.

Duffie and Kan (1996) demonstrate that, in this setting, the bond price may be written as:

(5.5) $P(t,\tau) = e^{A(\tau) - B(\tau)' Y(t)}$,
where $A(\tau)$ and $B(\tau)$ are given by the solution of the following system of ordinary differential equations:

(5.6)
$\dfrac{dA(\tau)}{d\tau} = -\theta'\kappa' B(\tau) + \dfrac{1}{2}\sum_{i=1}^N \left[\Sigma' B(\tau)\right]_i^2 \alpha_i - \delta_0$

$\dfrac{dB(\tau)}{d\tau} = -\kappa' B(\tau) + \dfrac{1}{2}\sum_{i=1}^N \left[\Sigma' B(\tau)\right]_i^2 \beta_i - \delta_y$.
The great advantage of this class of affine term structure models is that it is quite flexible, allowing the generalization of a wide range of term structure models featured in the literature, particularly in the definition of the latent factors, which can be fairly general, as indicated in Dai and Singleton (2000) and Diebold et al. (2005).
In order to obtain this arbitrage-free representation for the family of term structure models defined by the Svensson curve, Christensen et al. (2008) employ an affine structure model, assuming that the short rate is given by the sum of latent factors:

(5.7) $r_t = X_t^1 + X_t^2 + X_t^3$
and these latent factors $X_t^1, X_t^2, X_t^3, X_t^4, X_t^5$ evolve through the following system of stochastic differential equations:

(5.8) $d\begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \\ X_t^4 \\ X_t^5 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & -\lambda_1 & 0 \\ 0 & 0 & \lambda_2 & 0 & -\lambda_2 \\ 0 & 0 & 0 & \lambda_1 & 0 \\ 0 & 0 & 0 & 0 & \lambda_2 \end{pmatrix}\left[\begin{pmatrix} \theta_1^Q \\ \theta_2^Q \\ \theta_3^Q \\ \theta_4^Q \\ \theta_5^Q \end{pmatrix} - \begin{pmatrix} X_t^1 \\ X_t^2 \\ X_t^3 \\ X_t^4 \\ X_t^5 \end{pmatrix}\right]dt + \Sigma\, dW_t^Q.$
In this model, according to equation (5.1), prices of zero-coupon bonds are obtained by the following expression:

(5.9) $P(t,T) = E_t^Q\left[e^{-\int_t^T r_u\, du}\right] = \exp\left(B^1(t,T)X_t^1 + B^2(t,T)X_t^2 + B^3(t,T)X_t^3 + B^4(t,T)X_t^4 + B^5(t,T)X_t^5 + C(t,T)\right)$,

where the terms $B^i(t,T)$ and $C(t,T)$ are the unique solutions of the following systems of ordinary differential equations:
(5.10) $\begin{pmatrix} dB^1(t,T)/dt \\ dB^2(t,T)/dt \\ dB^3(t,T)/dt \\ dB^4(t,T)/dt \\ dB^5(t,T)/dt \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 & 0 & 0 & 0 \\ 0 & 0 & \lambda_2 & 0 & 0 \\ 0 & -\lambda_1 & 0 & \lambda_1 & 0 \\ 0 & 0 & -\lambda_2 & 0 & \lambda_2 \end{pmatrix}\begin{pmatrix} B^1(t,T) \\ B^2(t,T) \\ B^3(t,T) \\ B^4(t,T) \\ B^5(t,T) \end{pmatrix}$

(5.11) $\dfrac{dC(t,T)}{dt} = -B(t,T)'\kappa^Q\theta^Q - \dfrac{1}{2}\sum_{j=1}^5 \left(\Sigma' B(t,T)B(t,T)'\Sigma\right)_{j,j}$
Solving these systems yields the term structure implied by the model:

(5.12) $y(t,T) = X_t^1 + \dfrac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)}X_t^2 + \dfrac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)}X_t^3 + \left[\dfrac{1-e^{-\lambda_1(T-t)}}{\lambda_1(T-t)} - e^{-\lambda_1(T-t)}\right]X_t^4 + \left[\dfrac{1-e^{-\lambda_2(T-t)}}{\lambda_2(T-t)} - e^{-\lambda_2(T-t)}\right]X_t^5 - \dfrac{C(t,T)}{T-t}.$
This result can be interpreted as a reparameterization of the Björk and Christensen (1999) curve with the addition of a no-arbitrage correction factor given by the term $-\frac{C(t,T)}{T-t}$, which is given by the following expression4:
(5.13) $-\dfrac{C(t,T)}{T-t} = -\dfrac{1}{2}\,\dfrac{1}{T-t}\int_t^T \sum_{j=1}^5 \left(\Sigma' B(s,T)B(s,T)'\Sigma\right)_{j,j}\, ds$,
where $\Sigma$ is the matrix of latent factor covariances. This correction factor is a function of the variances of the latent factors and also of the model's slope parameters, which, in this formulation, are assumed to be constant.
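The loadings of the five latent factors in the arbitrage-free yield equation (the Björk-Christensen loadings, excluding the yield-adjustment term) can be computed numerically as follows. This is an illustrative sketch; the function name, maturities and decay values are ours.

```python
import numpy as np

def afgns_loadings(ttm, lam1, lam2):
    """Loadings of the five latent factors in the yield equation,
    excluding the yield-adjustment term -C(t,T)/(T-t).

    ttm: time to maturity T - t (array of positive values)."""
    s1 = (1 - np.exp(-lam1 * ttm)) / (lam1 * ttm)   # slope loading, decay lam1
    s2 = (1 - np.exp(-lam2 * ttm)) / (lam2 * ttm)   # slope loading, decay lam2
    c1 = s1 - np.exp(-lam1 * ttm)                   # curvature loading, lam1
    c2 = s2 - np.exp(-lam2 * ttm)                   # curvature loading, lam2
    level = np.ones_like(ttm)
    return np.column_stack([level, s1, s2, c1, c2])

ttm = np.array([0.25, 1.0, 5.0, 10.0, 30.0])        # years to maturity
L = afgns_loadings(ttm, lam1=0.5, lam2=0.1)
```

The level loading is constant at one, the slope loadings decay monotonically from one toward zero, and the two curvature loadings are hump-shaped; this is the sense in which each slope factor must be paired with a curvature factor sharing the same decay rate.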
The specification of the Christensen et al. (2008) model is very useful because it allows any affine form for the latent factors to be used while remaining consistent with no-arbitrage. It thus makes it possible, for example, to add macroeconomic variables to the vector of latent factors, or to enrich the dependence structure of these factors. The latter is the form used in this study, including the interaction with the latent factors of the other markets so as to generalize the factor structure employed.
An important aspect to note here is that, in order to make the no-arbitrage correction possible, it is necessary to employ a structure with five latent factors, which implies additional slope and curvature factors relative to Svensson's model. Thus the original representation of the Diebold et al. (2008) model, which
4The analytical expression for this correction term is found in the Appendix of Christensen et al. (2008), but it has been omitted here for reasons of space.
displays factors of level and slope exclusively, cannot be made arbitrage-free by the methodology of Christensen et al. (2007, 2008). In order to obtain arbitrage-free representations for the generalized latent factor models proposed in section 4, we have augmented the dynamics of the latent factors by including crossed factors, i.e., each latent factor in each market depends on the latent factors of its own market plus the latent factors of the other markets, in the form:
(5.14) $\beta_{kt}^i = \Phi_i \beta_{k,t-1}^i + \Phi_j \beta_{k,t-1}^j + \epsilon_{kt}$,
where now $k = 1, 2, 3, 4, 5$ represents the five factors needed for the arbitrage-free correction, and the equation describing the yields of each market is given by:

(5.15) $y_t^i(T) = \beta_{1t}^i + \dfrac{1-e^{-\lambda_1^i(T-t)}}{\lambda_1^i(T-t)}\beta_{2t}^i + \dfrac{1-e^{-\lambda_2^i(T-t)}}{\lambda_2^i(T-t)}\beta_{3t}^i + \left[\dfrac{1-e^{-\lambda_1^i(T-t)}}{\lambda_1^i(T-t)} - e^{-\lambda_1^i(T-t)}\right]\beta_{4t}^i + \left[\dfrac{1-e^{-\lambda_2^i(T-t)}}{\lambda_2^i(T-t)} - e^{-\lambda_2^i(T-t)}\right]\beta_{5t}^i - \dfrac{C_i(t,T)}{T-t}.$
In this representation we do not adopt the stochastic volatility factor, and we keep the slope parameters $\lambda$ fixed in time, maintaining consistency with the affine specification of the model; these slope parameters, however, are estimated jointly with the other parameters of the model.
6. Bayesian Estimation
In all the specifications presented so far, we have models that can be represented by a non-linear
state-space model, where we have a non-linear observation equation for rates, and a set of state equa-
tions representing the latent factors. In some of the models we also consider the slope parameters and
the volatility as additional latent factors. Whereas the basic representation can be estimated using the Kalman filter, the non-linear forms cannot be estimated by this methodology, and even in its simplest representations this procedure suffers from several estimation problems.
Given the computational difficulties involved in the estimation of these models, ad hoc restrictions are generally put in place to facilitate the estimation, such as taking the decay parameter as fixed, or performing the estimation by means of two-stage procedures, as discussed in sections 2 and 3.
In this context, one way of performing the estimation, while using all the available information in the term structure of interest rates and avoiding the imposition of ad hoc restrictions, is to employ Bayesian estimation methods based on MCMC algorithms. As we will demonstrate next, this methodology makes it possible to address the problems that afflict the usual estimation mechanisms, such as non-linearity, identification and dimensionality. In estimation via MCMC, linear and non-linear models are approached in the same way, and one of the advantages of the Bayesian methodology is that it allows the latent factors to be treated as additional parameters to be estimated.
In Bayesian inference, the aim is to find the posterior distribution of the parameters of interest conditional on the observed sample, denoted by $p(\Theta|y)$. This posterior distribution is the result of updating a prior distribution assumed for the parameters with the information available in the sample, represented by the likelihood function.
In order to find the distribution of the parameters conditional on the sample, the following relation derived from Bayes' rule is used:

(6.1) $p(\Theta|y) = \dfrac{p(y|\Theta)\, p(\Theta)}{p(y)}$
where $p(y|\Theta)$ is the model's likelihood, $p(\Theta)$ denotes the prior distribution assumed for the parameters and $p(y)$ is the marginal distribution of the sample, which needs to be known only up to a constant of integration. Thus the posterior distribution is proportional to the product of the likelihood and the prior distribution.
After obtaining the posterior distribution, the results can be summarized by calculating, for example, the expected value and the variance of the posterior distribution of each parameter, as well as its marginal posterior distribution:
(6.2) $E(\theta_k|y) = \int \theta_k\, p(\Theta|y)\, d\Theta$

(6.3) $Var(\theta_k|y) = \int \theta_k^2\, p(\Theta|y)\, d\Theta - \left[E(\theta_k|y)\right]^2$

(6.4) $p(\theta_j|y) = \int p(\Theta|y)\, d\theta_1 \cdots d\theta_{j-1}\, d\theta_{j+1} \cdots d\theta_d$.
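In practice these integrals are approximated by averages over posterior draws. A minimal sketch, with draws simulated directly from a known Normal posterior standing in for real MCMC output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated posterior draws of a parameter theta_k; in a real application
# these would come from the MCMC sampler.
draws = rng.normal(loc=2.0, scale=0.5, size=100_000)

post_mean = draws.mean()                        # Monte Carlo estimate of (6.2)
post_var = (draws ** 2).mean() - post_mean ** 2  # Monte Carlo estimate of (6.3)
```

With a large number of draws the Monte Carlo averages converge to the posterior moments, here the mean 2.0 and variance 0.25 of the simulated posterior.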
Thus the main objective of Bayesian estimation is to obtain posterior distributions, containing prior information updated by the information in the sample, given by the likelihood function. Except for some specific cases, generally involving the use of conjugate distributions (where the prior distribution belongs to the same family as the posterior distribution), we do not have analytical forms for the posterior distributions. Nevertheless, such distributions can be obtained by techniques of numerical integration based on Monte Carlo methods. The fundamental Monte Carlo methodology in Bayesian estimation is the use of the so-called MCMC algorithms (e.g. Robert and Casella (2005), Gamerman and Lopes (2006)).
The idea of MCMC methods is to simulate a Markov chain whose stationary distribution is the posterior distribution $p(\Theta|y)$. A fundamental result is that sampling from $p(\Theta|y)$ can be factored into sampling from the full conditional distributions of the parameters, a procedure known as Componentwise Metropolis-Hastings (e.g. Ntzoufras (2009)). These conditionals are of lower dimension and can be simulated more easily. The procedure can be summarized in the following iterations:
The Clifford-Hammersley theorem (see Robert and Casella (2005) for a derivation of this result) ensures that, under certain regularity conditions, this set of conditional distributions uniquely characterizes the joint distribution $p(\Theta|y)$. An obvious advantage of this method is that it does not involve any numerical maximization, thereby avoiding the numerical problems involved in the maximization of non-linear functions, such as those found in our problem. The empirical validity of this methodology is verified by methods that check the convergence of the Markov chains to their stationary distribution.
Another important point to be mentioned in relation to the use of Bayesian inference methods is that the
use of prior information helps to solve some of the existing problems in the classic estimation, such as, for
example, the estimation of non-identified models. This point is discussed in detail in section 8.
When all the conditional distributions are known, the MCMC algorithm becomes the so-called Gibbs sampler5, where the estimation is done by sampling directly from the conditional distributions. However, when it is not possible to sample from the analytical conditional distribution, the sampling of
5For a detailed discussion of this topic see Robert and Casella (2005), Gamerman and Lopes (2006) and Ntzoufras (2009).
these conditionals can be performed by applying the Metropolis-Hastings algorithm, which is a generalization of the acceptance-rejection method for simulating random variables to the sampling of conditional distributions.
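A minimal random-walk Metropolis-Hastings sketch for a single parameter (illustrative only; the target density, step size and names are ours, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta):
    """Unnormalized log-density of a standard Normal target."""
    return -0.5 * theta ** 2

def random_walk_mh(n_iter, step=1.0):
    """Random-walk Metropolis-Hastings: propose from a symmetric Normal
    around the current state and accept with probability min(1, ratio)."""
    theta = 0.0
    draws = np.empty(n_iter)
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal()
        log_ratio = log_target(proposal) - log_target(theta)
        if np.log(rng.uniform()) < log_ratio:   # accept the proposal
            theta = proposal
        draws[t] = theta                        # rejection keeps current state
    return draws

draws = random_walk_mh(20_000)
```

For a symmetric proposal the $q$ terms in the acceptance ratio cancel, so the acceptance probability reduces to the ratio of target densities; this is the special case behind the general componentwise expression below.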
In our problem, we cannot sample directly from all the conditional distributions, because of the non-linear specifications adopted and the use of non-conjugate distributions. Thus we will use a hybrid MCMC algorithm, combining the Gibbs algorithm and the Metropolis-Hastings algorithm, a methodology originally proposed in Tierney (1994). In this case, when we have a known conditional we use Gibbs sampling, and for the other conditionals we use Metropolis-Hastings. A hybrid MCMC algorithm (Robert and Casella (2005)) can be described by iterating the following stages:
1 - Sample a candidate value $\tilde\theta_i$ from the proposal distribution $q_i(\cdot\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\ldots,\theta_p^{(t)})$;

2 - Accept

$\theta_i^{(t+1)} = \begin{cases} \tilde\theta_i & \text{with probability } \rho \\ \theta_i^{(t)} & \text{with probability } 1-\rho \end{cases}$

where

$\rho = 1 \wedge \dfrac{g_i(\tilde\theta_i\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})\; q_i(\theta_i^{(t)}\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\tilde\theta_i,\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})}{g_i(\theta_i^{(t)}\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})\; q_i(\tilde\theta_i\,|\,\theta_1^{(t+1)},\ldots,\theta_{i-1}^{(t+1)},\theta_i^{(t)},\theta_{i+1}^{(t)},\ldots,\theta_p^{(t)})}$

where $q_i$ is the so-called tentative or auxiliary (proposal) distribution and $g_i$ denotes the full conditional distribution of $\theta_i$.

When the model to be estimated can be placed in
a state-space formulation, a convenient way of addressing this problem consists in adopting a hierarchical
formulation. In this structure, the representation of the priors is based on a hierarchy. This formulation is particularly useful in state-space models because the hierarchical specification makes it possible to recover the distribution of the latent factors by using, as the prior distribution of the latent factor at date t, a function of the posterior of the latent factor at time t-1. A simple example of this is the so-called local level model:
(6.6) $y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \nu_t$
In this example we can use as the prior distribution of the latent factor $\mu_t$ a distribution centered on $\mu_{t-1}$, so that $\mu_t \sim \pi(\mu_{t-1})$, making direct use of the state equation specification6.
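For the local level model (6.6) with known variances, the full conditional of each state is Normal, so the latent states can be sampled one at a time. A single-site Gibbs sketch (an illustration of the hierarchical idea, not the samplers actually used in this study; all names and values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate data from the local level model (6.6) with known variances.
T, sig_eps, sig_nu = 200, 0.3, 0.1
mu_true = np.cumsum(sig_nu * rng.standard_normal(T))
y = mu_true + sig_eps * rng.standard_normal(T)

def gibbs_states(y, sig_eps, sig_nu, n_iter=500, burn=100):
    """Single-site Gibbs for mu_1..mu_T: each Normal full conditional
    combines the observation y_t with the neighbouring states."""
    T = len(y)
    mu = y.copy()                       # initialize states at the data
    store = np.zeros(T)
    for it in range(n_iter):
        for t in range(T):
            prec = 1 / sig_eps ** 2     # information from the observation
            mean = y[t] / sig_eps ** 2
            if t > 0:                   # prior from the state equation
                prec += 1 / sig_nu ** 2
                mean += mu[t - 1] / sig_nu ** 2
            if t < T - 1:               # information from the next state
                prec += 1 / sig_nu ** 2
                mean += mu[t + 1] / sig_nu ** 2
            mu[t] = mean / prec + rng.standard_normal() / np.sqrt(prec)
        if it >= burn:                  # average the post burn-in draws
            store += mu
    return store / (n_iter - burn)

mu_hat = gibbs_states(y, sig_eps, sig_nu)
```

The posterior mean of the states is a smoothed version of the data and tracks the true latent path more closely than the raw observations do.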
In order to achieve a complete characterization of our model, we must discuss the prior distributions employed. For the latent factors $\beta_{it}$ and $\tau_{it}$ we use the Normal-Inverse Gamma pair as priors, by means of the hierarchical characterization, where the mean is given by the autoregressive vector structure. For the parameters of the autoregressive vector $\Phi$, we assume a multivariate normal structure with variance matrix given by an inverse Wishart distribution, and for the latent stochastic volatility factor we assume $\sigma_t^2 \sim LogNormal(\phi_0 + \phi_1 \ln\sigma_{t-1}^2,\, \tau_\sigma^2)$, with a Gamma distribution for $\tau_\sigma^2$, a Normal for $\phi_0$ and finally $\phi_1 \sim Beta$. For the other parameters in the autoregressive processes, and also for the parameters which identify the factors of each market in the generalized and global latent factor models, we use a multivariate normal structure for the mean of these parameters, and an inverse Wishart for their variance matrix7. Alternative specifications implementing shrinkage procedures are discussed in section 7.
The sampling procedure employs the Gibbs algorithm for the $\beta_{it}$ factors, the $\phi_0$ parameter, the parameters of the autoregressive processes, the loading parameters in the global factor model, the Wishart distribution parameters and the hyperparameters of the Gamma distributions. For the other parameters, which do not have a known conditional distribution and are linked with the non-linear and non-conjugate specifications, we use the Metropolis-Hastings algorithm; finally, for the parameter $\phi_1$ in the stochastic volatility processes we use the algorithm known as the Slice Sampler (Neal (2003)). The model specification is then complete, assuming a
6See Koop (2003) for the estimation of state-space models using the hierarchical formulation.
7For a discussion about these specifications see Bernardo and Smith (1994).
multivariate Normal likelihood for the observed term structure, allowing us to recover the posterior distribution of the parameters through equation (6.1) with the use of estimation algorithms via MCMC.
In order to obtain the one-step-ahead predictive distribution of the model we use the relation:

(6.7) $p(\hat{y}_{t+1}|y_t) = \int p(\hat{y}_{t+1}|\Theta)\, p(\Theta|y_t)\, d\Theta$

which is the future likelihood weighted by the posterior distribution of the parameters, where $y_t$ denotes the observations up to period t. We summarize the one-step-ahead forecasts using the mean and the percentiles of the predictive distribution given by equation (6.7).
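Given MCMC output, the predictive distribution (6.7) is approximated by simulating one step ahead from each posterior draw. A sketch under our own assumptions (a toy AR(1) observation model, with simulated draws standing in for real MCMC output):

```python
import numpy as np

rng = np.random.default_rng(7)

# Posterior draws of the parameters of a toy AR(1) model y_{t+1} = phi*y_t + e,
# simulated here in place of real MCMC output.
n_draws = 50_000
phi = rng.normal(0.8, 0.02, n_draws)              # posterior draws of phi
sigma = np.abs(rng.normal(0.5, 0.05, n_draws))    # posterior draws of sigma
y_t = 1.0                                         # last observed value

# One simulated y_{t+1} per posterior draw: a sample from the predictive (6.7).
y_next = phi * y_t + sigma * rng.standard_normal(n_draws)

pred_mean = y_next.mean()
lo, hi = np.percentile(y_next, [2.5, 97.5])
```

The mean and percentiles of the simulated values are exactly the forecast summaries described in the text; parameter uncertainty widens the interval relative to a plug-in forecast.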
7. Bayesian Shrinkage
The models specified for multiple yield curves involve a large number of parameters to be estimated, mainly in the specification of the dynamics of the latent factors. The high dimensionality, together with the identification problems discussed in section 8, renders the estimation of models of the term structure of interest rates a very complicated econometric problem.
The usual solutions for the dimensionality of the parameter vector in multimarket models involve the imposition of ad hoc restrictions, which are also connected to the identification problem. In the global curve model, Diebold et al. (2008) reduce the number of parameters by considering only the level and slope components, discarding the curvature and double curvature factors. However, this procedure considerably reduces the model fit, particularly for longer maturities, where the curvature and double curvature components significantly increase the fit (e.g. Björk and Christensen (1999), Christensen et al. (2008)).
Many of those parameters, however, are expected to be non-significant and can therefore be eliminated from the model. In the specification of the generalized latent factor model, factors with distinct interpretations are not expected to be important in explaining the other factors; for example, a double-curvature factor is very unlikely to affect the movements of the slope factor. This interpretation is supported by the principal component decomposition of Litterman and Scheinkman (1991), in which the components are orthogonal by construction.
In Bayesian estimation, this problem is implicitly addressed by the prior structure used. A parameter whose posterior is expected to be zero is generally given a prior distribution concentrated at zero. This is the interpretation of the so-called Minnesota prior (e.g. Doan et al. (1984)) employed in time series models: it centers vector autoregressive models on a random walk process, imposing a zero-centered prior on the coefficients at lags greater than one and a prior centered at one on each variable's own first lag.
There is an alternative way of approaching this problem, which consists in using techniques known as Bayesian shrinkage: priors which place a greater weight on zero than the standard priors. In this study we employ two forms of shrinkage priors. The first uses the Laplace (double exponential) prior, and the second uses the generalized Minnesota prior.
Estimation using the Laplace (double exponential) prior is related to the estimation method known as LASSO (Least Absolute Shrinkage and Selection Operator), proposed by Tibshirani (1996). The LASSO estimator is obtained as the solution of an estimation problem with an ℓ1 penalty in the minimization problem:
(7.1)    \|Y - X\beta\|^2 + \lambda \sum_{j=2}^{q} |\beta_j|.
One advantage of the LASSO estimator is that, instead of merely shrinking the estimates toward zero, as happens with techniques such as ridge regression, it effectively allows some estimates to be identically equal to zero, simultaneously performing shrinkage and model selection.
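Under an orthonormal design, the LASSO solution is the soft-thresholding of the least-squares estimates, which is what produces exact zeros; ridge regression, in contrast, only rescales. A small illustration with hypothetical coefficient values:

```python
def soft_threshold(b_ols, lam):
    """LASSO solution under an orthonormal design: soft-thresholding of
    least-squares estimates, setting small coefficients exactly to zero."""
    out = []
    for b in b_ols:
        mag = abs(b) - lam
        out.append(0.0 if mag <= 0 else (mag if b > 0 else -mag))
    return out

def ridge_shrink(b_ols, lam):
    """Ridge solution under an orthonormal design: proportional shrinkage,
    which never produces exact zeros."""
    return [b / (1.0 + lam) for b in b_ols]

b_ols = [2.0, 0.3, -0.05, -1.2]   # hypothetical least-squares estimates
lam = 0.5

b_lasso = soft_threshold(b_ols, lam)   # -> [1.5, 0.0, 0.0, -0.7]
b_ridge = ridge_shrink(b_ols, lam)     # all entries remain nonzero
```

The two small coefficients are eliminated by the ℓ1 penalty, while ridge keeps every coefficient in the model.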
In a Bayesian context, the LASSO estimator can be interpreted as a posterior mode estimate under a Laplace (double exponential) prior distribution, as pointed out by Tibshirani (1996) himself and by Park and Casella (2008). The Laplace distribution is a function of two hyperparameters (µ, b) in the form:
(7.2)    \pi(\beta) = \frac{1}{2b}\, e^{-|\beta - \mu|/b},
where (µ, b) can be interpreted as location and scale parameters. Figure 7.1 presents different specifications of Laplace priors, showing that the weight placed on zero is much greater than that of, for example, Gaussian priors with the same scale factor. The only difficulty associated with the use of the Laplace prior is that this distribution is not conjugate, so an additional Metropolis-Hastings step becomes necessary in the estimation procedure. We adopted this independent Laplace prior structure for the parameters of the vector autoregressions which define the latent factors, using the values 0 and .1 for µ and b.
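A Metropolis-Hastings step for a non-conjugate Laplace prior only requires evaluating the log prior and log likelihood at the current and proposed values. A sketch under a hypothetical scalar Gaussian likelihood (the paper's actual conditionals are multivariate), with µ = 0 and b = .1 as in the text:

```python
import math
import random

random.seed(1)

def log_laplace_prior(beta, mu=0.0, b=0.1):
    """Log of the Laplace(mu, b) density used as the shrinkage prior."""
    return -math.log(2.0 * b) - abs(beta - mu) / b

def log_lik(beta, data):
    """Hypothetical Gaussian likelihood y ~ N(x * beta, 1), for illustration."""
    return -0.5 * sum((y - x * beta) ** 2 for x, y in data)

# Toy data generated with true beta = 0, so the prior should keep the
# posterior concentrated near zero.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]

beta, chain = 0.5, []
for _ in range(5000):
    prop = beta + random.gauss(0, 0.1)            # random-walk proposal
    log_ratio = (log_lik(prop, data) + log_laplace_prior(prop)
                 - log_lik(beta, data) - log_laplace_prior(beta))
    if math.log(random.random()) < log_ratio:     # Metropolis-Hastings accept
        beta = prop
    chain.append(beta)

post_mean = sum(chain[1000:]) / len(chain[1000:])  # discard burn-in
```

Because the Laplace log density is available in closed form, the non-conjugacy costs only this extra accept/reject step per parameter.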
The generalized Minnesota prior, proposed in Robertson (1999) and Kadiyala and Karlsson (2007) and advocated by Banbura et al. (2008) for the estimation of high-dimensional Bayesian vector autoregressive models, is a generalization of the prior proposed by Doan et al. (1984). In this formulation, the prior for the parameter matrices Φi and Φj in the generalized factor models is given by:
Figure 7.1. Laplace prior densities for scale parameters b = .5, 1 and 2.
(7.3)    vec[\Phi_i] \sim N\big(vec(\Phi_{i0}),\, \psi \otimes \Omega_0\big),
where vec is the operator which stacks the columns of a matrix into a single column vector, and
(7.4)    \psi \sim iW(S_0, \alpha_0),
where the elements of E(S_0) are \delta_i if i = k and 0 otherwise, the elements of \alpha_0 are \lambda^2 if i = k and \lambda^2 \sigma_i/\sigma_j otherwise, and iW denotes an Inverse Wishart distribution.
In this case, δi is the expected variance of each latent factor, the parameter λ controls the chosen degree of shrinkage, and σi and σj are the variabilities of the latent factors. For the models considered in this study we use a shrinkage factor λ = .1.
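One way to read this prior is that own coefficients receive prior standard deviation λ while cross coefficients receive λσi/σj, so cross-factor effects are shrunk more heavily when the explanatory factor is less variable. A sketch of this Minnesota-style scaling, noting that the exact hyperparameterization in (7.3)-(7.4) may differ:

```python
def minnesota_prior_sd(sigmas, lam=0.1):
    """Prior standard deviations for the coefficients of a factor VAR:
    lam on own coefficients (i == j) and lam * sigma_i / sigma_j on cross
    coefficients. A sketch of Minnesota-style scaling; the paper's exact
    hyperparameters may differ."""
    k = len(sigmas)
    return [[lam if i == j else lam * sigmas[i] / sigmas[j]
             for j in range(k)]
            for i in range(k)]

# Hypothetical latent-factor variabilities (level, slope, curvature).
sd = minnesota_prior_sd([1.0, 0.5, 0.25], lam=0.1)
```

With λ = .1, the prior standard deviation of the own coefficient is 0.1, while the coefficient of the less variable curvature factor on the level equation is shrunk by a factor of four relative to that.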
8. Identification
Note that, in the definition of the possible specifications for the dynamic models for the term structure
of interest rates, there is a trade-off between a richer specification and the difficulty in the computational
estimation. Dai and Singleton (2000) point out that the problems in the specification of affine models of the term structure of interest rates involve admissibility conditions, i.e., that the model leads to well-defined bond prices, and econometric identification conditions.
The identification concept in econometric models can be summarized as follows: a model is non-identified if there is more than one parameter vector defining an equivalent likelihood function. This perspective is valid both in classical models (e.g. Rothenberg (1971)) and in Bayesian models (e.g. Kadane (1974), Poirier (1998) and Aldrich (2002)).
In formal terms, consider a regular likelihood function L(θ; y), where θ ∈ Θ is a K×1 vector and Θ ⊆ R^K. The parameter θ is not identified if for some θ^(1) ∈ Θ there is another θ^(2) ∈ Θ such that L(θ^(1); y) = L(θ^(2); y) for all y. In this way, in non-identified models there is more than one parameter vector in the parameter space satisfying the estimation criterion, the maximization of the likelihood function.
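A minimal illustration of this definition: if y ~ N(θ1 + θ2, 1), only the sum θ1 + θ2 enters the likelihood, so distinct pairs with the same sum are observationally equivalent:

```python
def log_lik(theta1, theta2, y):
    """Gaussian log-likelihood in which only the sum theta1 + theta2 enters,
    so the pair (theta1, theta2) is not identified."""
    return -0.5 * sum((yi - (theta1 + theta2)) ** 2 for yi in y)

y = [0.9, 1.1, 1.0, 0.8, 1.2]   # hypothetical sample

# Two distinct parameter vectors with identical likelihood for every sample:
# the likelihood surface has a flat ridge along theta1 + theta2 = const.
l_a = log_lik(0.2, 0.8, y)
l_b = log_lik(-1.0, 2.0, y)
```

Any estimator based only on this likelihood cannot distinguish the two parameter vectors; extra information (restrictions or priors) is needed.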
According to Kadane (1974), identification is a property of the likelihood function, and thus it is the same under the classical and the Bayesian perspectives. As stated in Poirier (1998), the solution for the estimation of non-identified models is also the same under both perspectives, namely, to use more information in the model, information generally not contained in the sample. In the classical perspective, the solution to identification problems is generally the imposition of restrictions on the parameter space, usually eliminating redundant parameters from the model. The Bayesian perspective, in which identification is generally obtained through prior information, is less dogmatic. Quoting Poirier (1998):
“A Bayesian analysis of a non-identified model is always possible if a proper prior on all the parameters
is specified.”
The estimation of non-identified models through the prior rests on the observation that an adequate prior can reduce the effective sampling space of the posterior distribution, thus reducing the probability of the posterior being placed in a non-identification region. As formally put in Florens et al. (1990), the choice of an appropriate prior makes it possible to obtain an identified posterior by reducing the sigma-algebra generating the posterior distribution of the parameters of interest. Nevertheless, some caveats apply. First, there may be situations where the data are non-informative and the prior and posterior distributions are equivalent; in addition, problematic situations can arise if inadequate priors are employed. A detailed discussion of these problems can be found in Poirier (1998), which also discusses two situations directly related to the problem of estimating the term structure models proposed in this study.
The first discussion relates to the multicollinearity problem. Note that, in the general form of the Nelson-Siegel-Svensson models (eq. 4.1), there is a potential multicollinearity problem that results in non-identification of the model. The terms related to the components β2 and β4 are potentially non-identified when the slope (decay) parameters of these two factors take values close to each other. It is usually assumed that τ2 > τ1 for the identification of the model, as in Christensen et al. (2008). In the case of multicollinearity, identification can be obtained by means of an adequate prior for the relevant parameters; for the slope parameters, identification is obtained by assuming priors that lead the posterior of these parameters to distributions with maximum posterior probability of observing τ2 > τ1.
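The near-collinearity can be checked numerically: curvature loadings built from two close decay values are almost perfectly correlated as functions of maturity. A sketch using one common parametrization of the Nelson-Siegel-Svensson curvature loading (hypothetical maturities and decay values):

```python
import math

def curvature_loading(m, tau):
    """One common parametrization of the Nelson-Siegel-Svensson curvature
    loading for maturity m and decay parameter tau."""
    x = m / tau
    return (1.0 - math.exp(-x)) / x - math.exp(-x)

def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

maturities = [0.5 + 0.25 * i for i in range(40)]   # years, hypothetical

# Two close decay parameters yield nearly collinear curvature regressors,
# the multicollinearity that motivates the tau2 > tau1 restriction.
f1 = [curvature_loading(m, 2.0) for m in maturities]
f2 = [curvature_loading(m, 2.1) for m in maturities]
rho = corr(f1, f2)
```

The correlation is essentially one, so the likelihood is nearly flat along trade-offs between β2 and β4; only the prior (or the ordering restriction) separates them.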
The second relevant discussion is related to the estimation of hierarchical models, where it is possible
to show that identification can be obtained by means of an appropriate choice for the prior distribution
in the parameters involved in each hierarchy of the model. In the formulation proposed for the model,
we use a state-space representation for the evolution of the model’s latent factors, and the estimation of
this representation is given by the hierarchical formulation in which the prior for the latent factor in t+1
is the estimated posterior for this latent factor in time t. In this case we have that the informativeness
condition of the data is always respected and, with an adequate choice of priors, the model can always be estimated, thus avoiding the non-identification problems generally found in classical estimation using the Kalman filter (e.g. Duffee (2002)). This problem and its economic implications are discussed in Kim and Orphanides (2005), who point out that affine models of the term structure can be characterized by estimates that are observationally equivalent but very different in their economic interpretations.
Note that a term structure of interest rates has sufficient statistics to identify the necessary parameters, given by the past observations of the yield curve. This is precisely one of the problems of the two-stage estimation employed in Diebold and Li (2006) and Diebold et al. (2008): the first-stage estimation ignores the entire time-dependence structure between the latent factors. The simultaneous Bayesian estimation uses the parameter estimated in period t-1 as the prior for the parameter at time t, and since this prior is generally informative, we achieve the reduction discussed in Florens et al. (1990) in the sigma-algebra generating the posterior distribution of the latent factors, solving the identification problem for the models analyzed.
9. Empirical application
9.1. Database. In order to perform the empirical analysis of the proposed models, we employ yield curves from two different markets. The first curve is built from data on the term structure of the "Cupom Cambial", which can be summed up as a term structure of instruments negotiated in Brazil but with yields in dollars. Other studies modeling the Cupom Cambial curve are Pinheiro et al. (2007), which models this curve using a polynomial structure with latent variables, and Pereira (2009), which uses a simplified form of the Diebold and Li (2006) model.
The Cupom Cambial curve was built by means of a synthetic instrument calculated from the assets
transacted at BM&F ("Bolsa de Mercadorias e Futuros") (Brazil’s Commodities and Futures Exchange).
The Cupom Cambial was calculated by no-arbitrage equalizing the returns of the DDI (Contrato de
Cupom Cambial), which is a fixed-interest instrument whose remuneration accrues with the accumulated returns of the CDI (Interbank Deposit Certificate). The formula used for
calculating the Cupom Cambial is given by:
(9.1)    C_t = \left( \frac{\prod_{t=1}^{T} (1 + i_t)^{1/252}}{1 + \Delta e_t} - 1 \right) \times \frac{360}{T},
where T is the number of days between the negotiation and expiration dates of the contract, i_t is the CDI rate negotiated in the interbank market on day t, and Δe_t is the exchange rate variation, measured by the Real/dollar exchange rate (PTAX800) observed between the trading day before the date of the operation in the futures market and the last day of the month before the expiration of the contract.
Because of a distortion in this contract caused by the use of the previous day's PTAX, we replicate this bond with more liquid market instruments, employing the spot dollar, dollar futures, DI futures and Forward Rate Agreements. This methodology is used in Pereira (2009), which provides a detailed discussion of its advantages.
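The calculation described above can be sketched as follows, assuming the usual linear 360-day annualization of the accumulated CDI return deflated by the exchange-rate variation; the exact market convention may differ in details:

```python
def cupom_cambial(cdi_daily, delta_e, n_days):
    """Annualized Cupom Cambial rate: accumulated daily CDI returns deflated
    by the exchange-rate variation, annualized on a linear 360-day basis.
    A sketch consistent with the description of equation (9.1); the exact
    market convention (e.g. clean vs. dirty coupon) may differ."""
    acc_cdi = 1.0
    for it in cdi_daily:
        acc_cdi *= (1.0 + it) ** (1.0 / 252.0)
    return (acc_cdi / (1.0 + delta_e) - 1.0) * 360.0 / n_days

# Hypothetical inputs: 21 business days at a 13% p.a. CDI rate and a 0.5%
# exchange-rate depreciation over a 30-calendar-day contract.
rate = cupom_cambial([0.13] * 21, 0.005, 30)
```

The dollar-denominated rate is low here because most of the CDI accrual is offset by the assumed depreciation of the Real.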
The other curve used in this study is a yield curve built from the remunerations obtained in the Eurodollar market, the market for dollar-denominated financial deposits negotiated outside the USA. This external curve is constructed using Eurodollar term contracts traded on the Chicago Mercantile Exchange. Both curves in this study are constructed using the methodology suggested by Burghardt (2003). Note that these two instruments are chosen so as to have remuneration in the same currency, thus eliminating the influence of exchange rate variations on the returns of the different markets.
For both curves, we work with fixed maturities of 6, 9, 12, 24, 36, 48, 60, 72, 84, 96, 108 and 120 months, with the sample period ranging from March 6, 2007 to November 26, 2008, containing 402 observed curves. The descriptive statistics for each maturity of both curves are displayed in Table 1, whereas the figure below shows the evolution of the two curves.
Some characteristics of these curves are worth noting. In both curves there is, as pointed out in Pereira (2009), an increase in the average volatility and in the spreads. The final period of the Eurodollar yield curve reflects the interest rates set by the Federal Reserve.
These patterns make the yield curves interesting objects of study for the proposed models. The first point to be mentioned is the great variability in the shape of the yield curves over time, indicating that the latent factors must show great variability in these two curves. Another point is that the slope pattern of the curves evidently changes considerably over time, which justifies the use of time-varying parameters, as opposed to keeping them fixed as in the Diebold et al. (2005) model. A further point concerns the volatility structure, which is not constant in time and thus justifies the use of the stochastic volatility component.
[Figure: evolution of the (a) Cupom Cambial and (b) Eurodollar yield curves over the sample period, for maturities of 180 to 3600 days; axes Time vs. Interest Rate.]
9.2. Comparative analysis. In order to perform a complete analysis of the three classes of models proposed, we estimated the complete model of each class as well as restricted sub-models of each class. These different specifications make it possible to analyze how the different model characteristics affect the fit of the model and the results obtained. Thirteen different specifications were estimated, detailed as follows:
(1) Independent Curves - in this specification the curves are independent: the latent factors of each curve depend only on the other factors of the same curve, ignoring the interdependence with the other market. This specification corresponds to the generalized latent factor model with the restriction that the parameters Φ, θ and γ corresponding to the curve of the other market are eliminated from the specification in equations (4.6), (4.7) and (4.8).
(2) Complete Generalized Latent Factor Model - this model corresponds to equations (4.5), (4.6), (4.7)
and (4.8) with all the parameters being estimated.
(3) Generalized Latent Factor Model with restricted crossed factor - In this model we assume that
matrix Φ in equation 4.6 has complete rank for the factors of the same curve, and it is a diagonal
matrix for the factors of the other yield curve.
(4) Diagonal Generalized Latent Factor Model - in this specification, we assume the matrices Φi and Φj are diagonal, so that each factor depends only on itself at time t-1 and on the equivalent factor of the other curve at time t-1. For example, the level of the Cupom Cambial curve at time t depends only on the level of the Cupom Cambial curve at time t-1 and on the level of the Eurodollar curve at t-1, and does not depend on the other factors.
(5) Triangular Generalized Latent Factor Model - the Cupom Cambial curve depends on itself in t-1
and on the Eurodollar curve, but the Eurodollar curve depends only on itself. In this model we
assume that the local curve is influenced by the foreign curve, but the foreign curve is independent
of the Cupom Cambial curve, thus assuming a triangular structure.
(6) Generalized Global Factor Model Identified with the Eurodollar Curve. In this model we assume
that the global factor is given by the latent factors of the Eurodollar curve. In this way, the Cupom
Cambial curve in this model is a direct displacement of the Eurodollar curve with the addition
of one idiosyncratic factor. This structure is much simpler than the complete generalized global
factor because we estimate the Eurodollar latent factors directly and we obtain the factors of the
Cupom Cambial curve by estimating only the corresponding loadings.
(7) Complete Global Factor Model - In this specification we assume the complete structure of gener-
alized global factor, where the factors of both the Eurodollar curve and of the Cupom Cambial
curve are displacements of the global latent factor with the addition of idiosyncratic factors, cor-
responding to equations (4.9)- (4.15) of the generalized global factor model.
(8) Generalized Latent Factor Model with Bayesian Shrinkage via Laplace Prior - In this specification
we estimate the complete model of generalized latent factors, but employing the Laplace Prior
structure (equations (7.1), (7.2)) for the autoregressive parameters of the model. In this case we
assume that (µ, b) are given by vector (0,.1).
(9) Generalized Latent Factor Model with Bayesian shrinkage via Generalized Minnesota Prior - Once
again the complete generalized latent factor model, but now employing the Generalized Minnesota
Prior structure described in equations (7.3) and (7.4).
(10) Generalized Model with 5 factors - We use the specification of 5 factors given by equation (5.12),
but without the no-arbitrage correction and assuming the generalized latent factor structure with
interactions between the yield curves. This model is analogous to the model proposed in Björk
and Christensen (1999). We assume in this specification that the slope parameters and stochastic
volatility are constant. This model aims at verifying the gain obtained by these two additional
factors in the fit of the yield curves.
(11) Generalized Model with 5 factors and No-Arbitrage conditions - We use a complete specification of
the Christensen et al. (2008) model, but assuming the structure of generalized latent factors which
provides the interaction between latent factors of different curves, and employing the no-arbitrage
correction. This model generalizes the Christensen et al. (2008) model for more than one market.
(12) Generalized Model with 5 factors and Bayesian Shrinkage. This model is similar to model 10, but
employs the shrinkage structure via Laplace Prior.
(13) Generalized Model with 5 factors, No-Arbitrage and Bayesian Shrinkage. This specification corre-
sponds to model 11 with the no-arbitrage correction with the addition of the use of shrinkage via
Laplace Prior.
The estimation of all these models employs a burn-in period (number of discarded samples) of 5,000 iterations, and another 10,000 iterations for the construction of the posterior distributions. The convergence of the chains is verified using the Geweke and Gelman-Rubin procedures (e.g. Ntzoufras (2009)), which indicate no problems in the convergence of the simulated Markov chains.
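The Geweke diagnostic mentioned above compares the means of an early and a late segment of each chain. A simplified sketch (the full procedure uses a spectral estimate of the variance, which we omit here):

```python
import math
import random

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke convergence diagnostic: z-score comparing the mean
    of the first 10% of the chain with the mean of the last 50% (ignoring
    autocorrelation in the variance estimate, unlike the full procedure)."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

random.seed(2)
# A hypothetical stationary chain: both segments share the same mean, so
# the z-score should look like a standard normal draw.
stationary_chain = [random.gauss(0.0, 1.0) for _ in range(10_000)]
z = geweke_z(stationary_chain)    # |z| below about 2 suggests convergence
```

A large |z| signals that the early part of the chain has not yet reached the same distribution as the later part, i.e., insufficient burn-in.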
The first mechanism of comparison between models uses the DIC (Deviance Information Criterion) proposed by Spiegelhalter et al. (2002). The DIC is a Bayesian information criterion that enables model selection analogously to the commonly employed BIC and AIC criteria. It is of particular interest for the comparison of complex models with a high number of parameters, because its penalty is applied to the effective number of parameters, as defined in Spiegelhalter et al. (2002). The DIC also has the characteristic of yielding results equivalent to the robust version of the AIC criterion (e.g. Claeskens and Hjort (2008)), and thus it is also valid as a selection criterion from a classical inference perspective.
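The DIC decomposes as the posterior mean deviance plus the effective number of parameters. A minimal sketch with hypothetical log-likelihood values:

```python
def dic(loglik_draws, loglik_at_post_mean):
    """Deviance Information Criterion: DIC = Dbar + pD, where Dbar is the
    posterior mean of the deviance D = -2 log L and pD = Dbar - D(theta_bar)
    is the effective number of parameters (Spiegelhalter et al. (2002))."""
    deviances = [-2.0 * ll for ll in loglik_draws]
    d_bar = sum(deviances) / len(deviances)
    p_d = d_bar - (-2.0 * loglik_at_post_mean)
    return d_bar + p_d, p_d

# Hypothetical log-likelihood values over MCMC draws, and the log-likelihood
# evaluated at the posterior mean of the parameters.
dic_value, p_d = dic([-120.5, -121.0, -119.8, -120.2, -120.7], -119.5)
```

Since every quantity is computed from the MCMC output itself, the DIC comes almost for free once the chains are available, which is one reason it is convenient for complex hierarchical models.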
Table 2 shows the estimated DIC for each model. By this criterion, the best models are models 7 and 8, corresponding to the complete Generalized Global Factor Model and to the Generalized Latent Factor Model with shrinkage through the Laplace prior. The global factor model has considerably fewer parameters than the generalized factor model, but a more complex structure due to the global latent factors. The fact that the DIC of these two models is equivalent indicates that their in-sample fit is equivalent after the penalty for the effective number of parameters.
It is also important to note that the use of the Laplace prior significantly reduces the complexity of the model: comparing the DIC of model 2 with that of model 8, we notice a quite significant reduction. Also to be noted is that the worst model by DIC is the independent factor model, which shows that the interaction between curves increases the fitting power of these term structure models, even with the penalty for the greater complexity of the model.
Table 2. DIC
Model DIC Rank
Model 1 -41240.46 10
Model 2 -90263.49 6
Model 3 -42076.37 9
Model 4 -73328.45 8
Model 5 -96747.54 5
Model 6 -5774.245 11
Model 7 -115687.7 2
Model 8 -115814.7 1
Model 9 -74041.7 7
Model 10 -97506.19 4
Model 11 -97506.19 4
Model 12 -101149.4 3
Model 13 -101149.4 3
The result obtained by the DIC indicates that the more general structure of models 7 and 8 effectively improves the in-sample fit, even after those models are penalized for their greater complexity. Another important point is that the DIC corresponds to the global fit of the model, and thus does not differentiate the relative fit for each yield curve or for each particular maturity. It is relevant, however, to verify whether the greater complexity of the models selected by the DIC criterion also leads to better forecast power.
To undertake this analysis, we compared the predictive power of the estimated models by means of several forecast accuracy criteria. For one-step-ahead forecasts, we calculated the following criteria: ME (Mean Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MPE (Mean Percentage Error), MAPE (Mean Absolute Percentage Error) and Theil's U. The properties of these measures of forecast accuracy can be found in Hyndman and Koehler (2006). Tables 3 and 4 show these measures for the Cupom Cambial and Eurodollar curves8.
8These measures correspond to an aggregation of forecast errors over all the maturities of the yield curves; measures for each maturity were also calculated separately. These results are not shown here for reasons of space, but they are available upon request.
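These criteria can be computed directly from the forecast errors. A sketch with hypothetical yields, where Theil's U is taken as the ratio of the model RMSE to the random-walk RMSE (one common definition):

```python
import math

def forecast_measures(y, yhat, y_naive):
    """Forecast accuracy criteria used in the comparison (see Hyndman and
    Koehler (2006)). Theil's U is computed here as the ratio of the model
    RMSE to the RMSE of the naive random-walk forecast, one common form."""
    e = [a - f for a, f in zip(y, yhat)]
    n = len(e)
    rmse = math.sqrt(sum(x * x for x in e) / n)
    rmse_naive = math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_naive)) / n)
    return {
        "ME": sum(e) / n,
        "RMSE": rmse,
        "MAE": sum(abs(x) for x in e) / n,
        "MPE": 100.0 * sum(x / a for x, a in zip(e, y)) / n,
        "MAPE": 100.0 * sum(abs(x / a) for x, a in zip(e, y)) / n,
        "TheilU": rmse / rmse_naive,
    }

# Hypothetical yields (% p.a.): actual, model forecast, random-walk forecast.
m = forecast_measures([10.0, 10.2, 10.1, 10.4],
                      [10.1, 10.1, 10.2, 10.3],
                      [9.9, 10.0, 10.2, 10.1])
```

A Theil's U below one means the model beats the random walk; values well below one, as reported for the 5-factor models, indicate a substantial relative gain.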
The results obtained for the Cupom Cambial curve show that the best model by all forecast criteria is model 12, the 5-factor model with Bayesian shrinkage, whose forecast power is quite superior to that of the other models. We can interpret this result as due to the two additional slope and curvature factors providing a better fit and forecast for this yield curve, which was expected given the greater variation in the shape of the Cupom Cambial curve over time. This conclusion can also be observed through Theil's U, which shows the relative gain in forecast accuracy compared with a naive random walk forecast. We observe that the 5-factor models, with or without the no-arbitrage correction, systematically outperform the random walk on the Cupom Cambial curve. Although the samples differ, it is possible to compare these results with those obtained for the Cupom Cambial curve in Pinheiro et al. (2007): the lowest one-day-ahead Theil's U reported there is .88, whereas we obtain 0.3065 using model 12. We note, however, that the samples come from different periods and the maturities studied are also different, so this is an informal comparison.
Another important comment is that the no-arbitrage correction does not significantly reduce the predictive
power of these models, as can be observed by comparing models 10-11 and 12-13. By the mean absolute percentage error (MAPE) criterion, the no-arbitrage correction actually improves the forecast power when we compare model 10 with model 11. Thus the no-arbitrage correction, when placed in a model with sufficient flexibility - such as the 5-factor model - does not represent a loss of forecast power. In this way, we can have the best of both worlds: no-arbitrage and accuracy in the forecasts.
For the Eurodollar curve, the general results show that all the models have an adequate forecast performance. This can be verified, on the one hand, by Theil's U, which shows that all the models perform far better than the naive random walk forecast, and, on the other hand, by the MAPE criterion, which shows a low mean absolute percentage error. In this case, it is worth noting that, by both the MAE and MAPE criteria, the best model turns out to be model 7, the complete generalized global factor model, whereas by the other criteria the best model is the independent model.
These results can be interpreted by observing that the Eurodollar curve should a priori be much less sensitive to influences from the other yield curve, so the superior forecast result of the independent curves model makes sense. Changes in the Cupom Cambial yield curve should not have significant forecast power for the Eurodollar curve, and, given the lower complexity of this model, this characteristic is reflected in a smaller mean forecast error. However, we note that in general the models are characterized by a negative bias in the forecasts, which does not happen in model 7, the best model by the MAE and MAPE criteria. The result obtained by model 7 can be explained by the shape of the estimated global factors, which are closer to the Eurodollar curve than to the Cupom Cambial curve, as shown in Figure 9.3.
In the case of the Eurodollar curve, the addition of the extra slope and curvature factors does not yield better forecast power, and models 10-13 perform rather worse than the other models. This result is consistent with the stylized fact that the shapes of the yield curves of developed countries are simpler than those of emerging countries; thus the forecast power of the simpler models is greater for this curve than for the Cupom Cambial curve, which needs a more flexible specification.
9.3. The Importance of No-Arbitrage Correction. Although consistency with no-arbitrage is a fundamental condition in the specification of models of the term structure of interest rates, an interesting question is whether the observed yield curves are consistent with no-arbitrage. In the context of the Christensen et al. (2008) formulation, it is possible to measure this effect by looking at the correction factor Ci(t, T) in equation 5.15. Note that this model is basically a yield curve based on the Björk and Christensen (1999) model, with the correction factor added to ensure consistency with no-arbitrage. If the magnitude of this factor is low and not significant, we have evidence that the fitted curve itself is already arbitrage-free, and this correction factor is therefore not necessary. One way of verifying this effect is to observe the posterior distribution of the no-arbitrage correction factor estimated by the
MCMC algorithm.
Table 5 shows the 2.5%, 50% and 97.5% percentiles of the posterior distribution estimated for the no-arbitrage correction factor, expressed by the term Ci(t, T) in equation 5.13, for each maturity estimated in the Cupom Cambial and Eurodollar curves. These estimated percentiles can be interpreted as a credibility interval for the no-arbitrage correction factor, and can be used for hypothesis testing (e.g. Bernardo and Smith (1994)). The null hypothesis of interest is that the no-arbitrage correction at each maturity is equal to zero, against the alternative hypothesis that this correction is different from zero. The validity of this null hypothesis can be tested by verifying whether the credibility intervals obtained from the posterior distribution of the no-arbitrage correction factor include the point value zero.
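The interval-based test described above amounts to checking whether zero falls inside the equal-tailed interval formed by the posterior percentiles. A sketch with hypothetical posterior draws:

```python
import random

def interval_excludes_zero(draws, lo=0.025, hi=0.975):
    """True when the equal-tailed credibility interval built from posterior
    draws of the no-arbitrage correction factor excludes zero, i.e., when
    the null hypothesis of a zero correction would be rejected."""
    s = sorted(draws)
    a = s[int(lo * len(s))]
    b = s[int(hi * len(s))]
    return not (a <= 0.0 <= b)

random.seed(3)
# Hypothetical posterior draws: one correction factor centered at zero
# (interval covers zero) and one shifted away from zero (interval excludes it).
centered = [random.gauss(0.0, 1e-4) for _ in range(5000)]
shifted = [random.gauss(5e-4, 1e-4) for _ in range(5000)]
```

Applied maturity by maturity, this check reproduces the reading of Table 5: the correction is needed only where the interval lies entirely on one side of zero.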
Table 5 shows that, in general, the values estimated for the no-arbitrage correction factor have a reduced magnitude, and that only the 1440- and 1800-day maturities of the Cupom Cambial curve exclude zero from their intervals. The no-arbitrage correction would thus be needed only for these two maturities of the Cupom Cambial curve, whereas for the Eurodollar curve we cannot reject, by the estimated credibility intervals, that all the correction factors are statistically equal to zero.
These results suggest that the greater liquidity of the Eurodollar market already ensures that the data observed in the yield curve are free from systematic arbitrage opportunities, while in the Cupom Cambial curve such opportunities may still be present. These results are consistent with those of the forecast analysis, which show that, for the Eurodollar curve, the no-arbitrage correction does not significantly alter the forecast results. Nevertheless, it must be pointed out that these results are conditional on the structure assumed in these no-arbitrage verification procedures, which take the Christensen et al. (2008) model as the adequate model for arbitrage-free modeling of the term structure; the results obtained are therefore conditional on this model. Tests with other forms of no-arbitrage restriction would be needed to generalize this conclusion.
Table 5. Posterior percentiles (2.5%, 50%, 97.5%) of the no-arbitrage correction factor by maturity (months).
Maturity | Cupom Cambial 2.5% | 50% | 97.5% | Maturity | Eurodollar 2.5% | 50% | 97.5%
6 -2.505592e-04 5.627831e-05 0.0003210676 6 -2.923742e-04 -1.010111e-04 1.133912e-05
9 -2.220485e-04 5.973929e-05 0.0003037833 9 -2.099763e-04 -6.273529e-05 1.253434e-05
12 -1.978511e-04 6.004418e-05 0.0002859084 12 -1.570117e-04 -3.717365e-05 2.503620e-05
24 -1.148664e-04 5.428410e-05 0.0002036099 24 -2.398569e-05 2.081719e-05 5.001232e-05
36 -3.272735e-05 5.953053e-05 0.0001443693 36 -1.781659e-05 5.687899e-05 9.922963e-05
48 4.570273e-05 7.687623e-05 0.0001133094 48 -1.364977e-05 8.434156e-05 1.812485e-04
60 2.077555e-05 1.017708e-04 0.0001943380 60 -7.023459e-05 1.013277e-04 2.686080e-04
72 -3.571103e-05 1.293691e-04 0.0003155333 72 -1.510072e-04 1.114461e-04 3.622645e-04
84 -1.008644e-04 1.601904e-04 0.0004644550 84 -2.554487e-04 1.097855e-04 4.644888e-04
96 -1.747916e-04 1.919751e-04 0.0006244936 96 -3.797314e-04 1.102472e-04 5.769284e-04
108 -2.651789e-04 2.223300e-04 0.0007991271 108 -5.257737e-04 9.985080e-05 6.997016e-04
120 -3.731035e-04 2.527459e-04 0.0009915839 120 -6.933878e-04 8.184704e-05 8.314890e-04
9.4. Estimated Latent Factors. In order to briefly illustrate some characteristics of the estimated models, we present some graphic comparisons between results of distinct specifications. As there are several factors and distinct models, we only present some selected results. All the figures show the mean and the 2.5% and 97.5% percentiles of the posterior distributions of each latent factor, representing a 95% credibility interval. Figure 9.2 shows, for the Cupom Cambial and Eurodollar curves, the evolution of the level factor (β1) for model 1 (independent curve model), model 2 (generalized latent factors model), and model 13 (no-arbitrage model with Bayesian Shrinkage). As can be observed, the results are quite similar across the various models, which is in line with the estimation of this factor via distinct models for the term structure (e.g. Almeida (2005)), showing that the results are similar whether or not the models employ the no-arbitrage structure and different specifications.
The results for the level factor estimated by the second class of models, based on the structure of global factors, are displayed in Figure 9.3, which presents the estimated components of level and slope. Sub-figures a) and d) show the estimation of the level and slope global factors, and the other sub-figures show the
[Figure 9.2. Evolution of the level factor (β1), posterior mean with 2.5% and 97.5% percentiles: (a) Model 1 - Cupom Cambial; (b) Model 2 - Cupom Cambial; (c) Model 13 - Cupom Cambial; bottom row: the corresponding Eurodollar panels.]
transformations obtained for the curves of each market via equations 4.10 and 4.11. It can be observed that the global factors are more similar to the factors obtained for the Eurodollar curve, but it must also be observed that the idiosyncratic components are important to all the curves. Another important point to note is that the local factors obtained by the generalized global factor model are quite similar to those obtained by the other estimated models, showing the consistency in the estimation of all the models proposed, and indicating also that the Bayesian methodology proposed does not suffer identification
[Figure 9.3. Posterior mean and 95% credibility bands of the global factors and their local transformations: (a) Global Factor - Level; (b) Level Factor - Cupom Cambial Curve; (c) Level Factor - Eurodollar Curve; (d) Global Factor - Slope; (e) Slope Factor - Cupom Cambial Curve; (f) Slope Factor - Eurodollar Curve.]
problems. An identification problem would be graphically evident if the same factor displayed very distinct trajectories with similar fitting power, which does not occur with the estimated models, since in all the models the estimated factors are similar.
The importance of making the decay parameters τ1i and τ2i time variant can be observed in Figure
9.4, which shows the dynamic evolution of these parameters for the two modelled yield curves, by the
[Figure 9.4. Dynamic evolution of the decay parameters: (a) Tau 1 - Cupom Cambial Curve; (b) Tau 2 - Cupom Cambial Curve; bottom row: τ1 and τ2 for the Eurodollar curve.]
estimation in model 2. It can be noted that there is a significant time variation in these parameters, particularly in parameter τ1 for the two curves, whereas parameter τ2 has a noisier behavior and a smaller variation interval. This variation pattern shows that this modification gives the estimated models greater adaptation to the changes in the term structure of interest rates observed in Figure 9.1, and it also avoids the need for an ad hoc specification of the decay parameters as done in the studies of Diebold and Li (2006) and Diebold et al. (2008).
[Figure 9.5. Estimated stochastic volatility factors for the two curves (model 2).]
The validity of employing the stochastic volatility factors can be visualized in Figure 9.5, which shows the evolution of these two factors as estimated by model 2. The dynamics of these two factors is consistent with the volatility pattern observed in the yield curves (Figure 9.1), accompanying the periods of volatility increase and reduction in the two curves, and it also shows that these additional latent factors are important for the correct identification of the variation in the other latent factors of the model. The same behavior can be observed in all the models that include stochastic volatility.
Figure 9.6 shows some examples of forecasts produced by the proposed models. Sub-figures a) and b) present a comparison of the one-day-ahead forecasts for the Cupom Cambial and Eurodollar curves performed for all the models, obtained as the posterior mean of the one-step-ahead forecasts. Sub-figure c) shows an example of the construction of a 95% credibility interval for one-day-ahead forecasts on a given day for the Eurodollar curve, in this case employing model 2 of generalized latent factors; and finally, sub-figure d) shows a comparison between the forecasts employing the 5-factor model (model 12, continuous line) and the equivalent model with the no-arbitrage correction (model 13, dashed line) for the Cupom Cambial
Figure 9.6. One-day Ahead Forecasting: a,b) Mean of the Posterior Predictive Distribu-
tion; c) Example of Model 2 Predictive Interval; d) Effect of the Arbitrage-Free Correction:
Not Corrected (Model 2 - Continuous line) and Corrected (Model 12 - Dashed Line)
[Figure 9.6 panels: (a), (b) one-day-ahead forecasts for models 1-13 for the two curves; (c) Credibility Intervals 6/06/2008; (d) With and without no-arbitrage corrections 6/06/2008.]
curve, showing that the effects of the no-arbitrage correction have a reduced magnitude, consistent with the general result shown in Table 5.
In all these examples we make direct use of a property derived from the MCMC estimation procedure, which is the ability to build exact finite-sample credibility intervals for the latent factors and for
the model's forecasts. Note that in the original estimation procedures of the Diebold and Li (2006) and Diebold et al. (2008) models, the confidence intervals are built without taking into consideration the two-stage estimation performed; thus they only have asymptotic validity and can therefore be considerably biased in finite samples.
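Operationally, the exact intervals amount to reading empirical quantiles off the retained posterior predictive draws, with no asymptotic approximation; a minimal sketch, in which the draw matrix and maturities are simulated placeholders rather than the sampler's actual output:

```python
import numpy as np

# Hypothetical matrix of one-day-ahead posterior predictive draws:
# each row is one retained MCMC iteration, each column one maturity (days).
rng = np.random.default_rng(1)
maturities = np.array([180, 360, 720, 1440, 1800])
draws = 0.05 + 0.002 * rng.standard_normal((5000, maturities.size))

# Pointwise 95% band: exact in finite samples, read directly off the draws.
lower, median, upper = np.quantile(draws, [0.025, 0.5, 0.975], axis=0)
for m, lo, md, hi in zip(maturities, lower, median, upper):
    print(f"{m:5d} days: {lo:.4f} [{md:.4f}] {hi:.4f}")
```

Because the quantiles are taken over the full joint simulation, parameter uncertainty from every stage propagates into the band automatically.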
10. Conclusions
In this study, a series of innovations is proposed in relation to the procedures usually employed in the estimation of models for the term structure of interest rates, particularly models based on the specifications of Diebold and Li (2006), Diebold et al. (2008) and Christensen et al. (2008). These innovations make it possible to overcome various limitations and restrictions found in these methods, such as the choice of the functional form, limited to restricted versions with only level and slope factors, as in the model adopted in Diebold et al. (2008), or else the use of fixed slope parameters chosen ad hoc, as employed in Diebold and Li (2006)'s estimation. The results obtained demonstrate that there is clear evidence not only that the latent factors evolve in time, but also that other parameters, such as the slope and volatility parameters, must be addressed as additional latent factors, providing more precise fit and forecast procedures for the term structure of interest rates, especially for emerging countries' yield curves, which are characterized by richer shapes and more frequent shape changes.
The estimation procedures based on Bayesian inference employing MCMC algorithms make it possible to address the problems that generally affect the estimation of the latent factor models used in the modeling of interest rates, such as the existence of local maxima and identification problems. Estimation via MCMC does not employ numerical maximization, and the structure of prior information and the hierarchical formulation make it possible to circumvent the identification problems found in the estimation of models of the term structure of interest rates. This same estimation structure makes it possible to reduce the dimensionality of the model through the use of Bayesian Shrinkage, a procedure which proves to be quite effective, as demonstrated by the DIC information criterion in model comparison; thus the
estimation of these models does not need ad hoc restrictions, such as the exclusion of latent factors or the fixing of parameters. The proposed Bayesian Shrinkage procedures are quite effective in reducing the dimensionality and complexity of the proposed models, a problem particularly important in the context of the joint modeling of more than one market.
Bayesian inference is particularly useful in tackling problems related to the complexity of models of the term structure of interest rates characterized by non-linear structures, which are difficult to estimate by classical methodologies such as likelihood estimation via the Kalman filter. This procedure enables the construction of exact credibility intervals, and estimation in several stages is not necessary. The
estimation procedure by MCMC is of interest because all the information in the sample is directly used in
the estimation, as the hierarchical structure in state-space employs all the information in cross-section and
in time. Bayesian estimation makes it possible to estimate more complex and flexible models for the term
structure of interest rates, facilitating not only a better fit but also the use of no-arbitrage corrections,
which require a more complex structure of latent factors, as demonstrated by Filipovic (1999), Björk
and Christensen (1999) and Christensen et al. (2008). The results indicate that, by using the estimation
mechanisms proposed, it is possible to achieve, on the one hand, flexibility in the estimation, and on the
other, consistency with no-arbitrage, thus making it possible to generalize these arbitrage-free formulations
for the simultaneous fit of multiple yield curves.
This estimation methodology makes it possible to obtain the posterior distribution of all the non-observed components, parameters and latent factors, and these distributions can be employed to verify other important characteristics, such as, for example, the validity of the no-arbitrage correction by means of the posterior distribution of the no-arbitrage correction factor. Note that this parameter is a non-linear function of the slope parameters, and thus its distribution is not a standard distribution, so the use of classical inference procedures is not trivial, whereas in the Bayesian estimation this information is a standard by-product of the estimation procedure.
The results obtained in the empirical application with joint modeling of the Cupom Cambial and Eu-
rodollar curves are of great interest. These results demonstrate that the innovations proposed, such as
the use of additional latent factors for the conditional volatility and the slope parameters, are effective for
the fit and forecast of the term structure of these two markets, which are characterized by rich dynamics
in the shape of their curves. Another significant result is that the interdependence structure adopted demonstrates that there is a gain from using the information in the Eurodollar curve for the fit of the Cupom Cambial curve, but the opposite is not as important, and this accords with the size and relative importance of those two markets. Such results are confirmed by the predictive analysis performed, which establishes the validity of the proposed specifications. Moreover, a further confirmation is that the greater liquidity of the Eurodollar market impedes the occurrence of systematic opportunities for arbitrage, which is not the case for some maturities in the Cupom Cambial market.
References
Aldrich, J. (2002). How likelihood and identification went Bayesian. International Statistical Review 70, 79–89.
Almeida, C. I. R. (2005). A note on the relation between principal components and dynamic factors in
affine term structure models. Revista de Econometria 25(1), 89–114.
Almeida, C. I. R. and J. V. M. Vicente (2008). The role of no-arbitrage on forecasting: Lessons from a
parametric term structure model. Journal of Banking and Finance 32, 2695–2705.
Banbura, M., D. Giannone, and L. Reichlin (2008). Large Bayesian VARs. European Central Bank Working Paper.
Bauwens, L., M. Lubrano, and J.-F. Richard (1999). Bayesian Inference in Dynamic Econometric Models.
Cambridge University Press.
Bernardo, J. and A. Smith (1994). Bayesian Theory. Wiley.
Björk, T. and B. J. Christensen (1999). Interest rate dynamics and consistent forward rate dynamics.
Mathematical Finance 9, 323–348.
Brigo, D. and F. Mercurio (2006). Interest Rates Models - Theory and Practice (2nd Edition). Springer.
Burghardt, G. (2003). The Eurodollar futures and Options Handbook. McGrawHill.
Chan, K. C., G. A. Karolyi, F. A. Longstaff, and A. B. Sanders (1992). An empirical comparison of alternative models of the short-term interest rate. Journal of Finance 47, 1209–1227.
Christensen, J. H., F. Diebold, and G. Rudebusch (2007). The affine arbitrage-free class of Nelson-Siegel
term structure models. NBER Working Paper No. 13611.
Christensen, J. H., F. X. Diebold, and G. D. Rudebusch (2008). An arbitrage-free generalized Nelson-Siegel term structure model. Econometrics Journal, forthcoming.
Claeskens, G. and N. L. Hjort (2008). Model Selection and Model Averaging. Cambridge University Press.
Cogley, T. and T. Sargent (2001). Evolving post-World War II U.S. inflation dynamics. NBER Macroeconomics Annual 16, 331–373.
Cox, J. C., J. E. Ingersoll, and S. A. Ross (1985). A theory of the term structure of interest rates. Econometrica 53, 385–408.
Dai, Q. and K. Singleton (2000). Specification analysis of affine term structure models. Journal of
Finance 55, 1943–1978.
Delbaen, F. and W. Schachermayer (1994). A general version of the fundamental theorem of asset pricing. Mathematische Annalen 300, 463–520.
Diebold, F. and C. Li (2006). Forecasting the term structure of government bond yields. Journal of
Econometrics 130, 337–364.
Diebold, F., C. Li, and V. Yue (2008). Global yield curve dynamics and interactions: A generalized
Nelson-Siegel approach. Journal of Econometrics 146, 351–363.
Diebold, F. X., M. Piazzesi, and G. Rudebusch (2005). Modeling bond yields in finance and macroeconomics. American Economic Review 95(2), 415–420.
Doan, T., R. Litterman, and C. Sims (1984). Forecasting and conditional projection using realistic prior
distributions. Econometric Reviews 3, 1–100.
Duffee, G. R. (2002). Term premia and interest rate forecasts in affine models. Journal of Finance 57, 405–443.
Duffie, D. and R. Kan (1996). A yield-factor model of interest rates. Mathematical Finance 6, 379–406.
Filipovic, D. (1999). A note on the Nelson-Siegel family. Mathematical Finance 9(4), 349–359.
Filipovic, D. (2001). Consistency Problems for Heath-Jarrow-Morton Interest Rate Models. Springer-
Verlag.
Florens, J. P., M. Mouchart, and J.-M. Rolin (1990). Elements of Bayesian Statistics. CRC.
Gamerman, D. and H. Lopes (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian
Inference, Second Edition. Chapman & Hall/CRC.
Harrison, J. M. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. Journal
of Economic Theory 20, 381–408.
Harrison, J. M. and S. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215–260.
Heath, D., R. Jarrow, and A. Morton (1992). Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica 60(1), 77–105.
Hyndman, R. J. and A. B. Koehler (2006). Another look at measures of forecast accuracy. International
Journal of Forecasting 22, 679–688.
Kadane, J. B. (1974). Bayesian Analysis in Econometrics and Statistics, Chapter The role of identification
in Bayesian Theory, pp. 175–191. North-Holland.
Kadiyala, K. R. and S. Karlsson (2007). Forecasting with generalized Bayesian vector auto-regressions. Journal of Forecasting 12, 365–378.
Kim, D. H. and A. Orphanides (2005). Term structure estimation with survey data on interest rate
forecasts. Finance and Economics Discussion Series, 2005-08, Board of Directors of Federal Reserve
System.
Koop, G. (2003). Bayesian Econometrics. Wiley.
Laurini, M. P. and L. K. Hotta (2008). Bayesian extensions to Diebold-Li term structure model. In
Forecasting in Rio, FGV-RJ, Rio de Janeiro.
Litterman, R. and J. Scheinkman (1991). Common factors affecting bond returns. Journal of Fixed
Income 1, 54–61.
Lund, J. and T. Andersen (1997). Estimating continuous-time stochastic volatility models of the short-
term interest rate. Journal Of Econometrics 77, 343–377.
Morita, R. and R. Bueno (2008). Investment grade countries yield curve dynamics. In Annals of the 63rd European Meeting of the Econometric Society, Milan, 2008.
Neal, R. (2003). Slice sampling (with discussions). Annals of Statistics 31, 705–767.
Nelson, C. R. and A. F. Siegel (1987). Parsimonious modelling of yield curves. Journal of Business 60(4), 473–489.
Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. Wiley.
Park, T. and G. Casella (2008). The Bayesian lasso. Journal of the American Statistical Association 103, 681–686.
Pereira, F. T. G. (2009). Curva a termo para o risco de convertibilidade: Uma abordagem utilizando o
diferencial de juros. Unpublished Working Paper.
Pinheiro, F., C. I. Almeida, and J. Vicente (2007). Um modelo de fatores latentes com variáveis macroe-
conômicas para a curva de cupom cambial. Revista Brasileira de Finanças 5(1), 79–92.
Poirier, D. J. (1998). Revising beliefs in nonidentified models. Econometric Theory 14, 483–509.
Robert, C. P. and G. Casella (2005). Monte Carlo Statistical Methods. Springer.
Robertson, J. C. and E. W. Tallman (1999). Vector autoregressions: forecasting and reality. Economic Review Q1, 4–18.
Rothenberg, T. (1971). Identification in parametric models. Econometrica 39, 577–591.
Sims, C. (2001). Comment on Sargent and Cogley’s evolving post world war ii u.s. inflation dynamics.
NBER Macroeconomics Annual 16, 373–379.
GENERALIZED EMPIRICAL LIKELIHOOD/MINIMUM CONTRAST
ESTIMATION OF STOCHASTIC DIFFERENTIAL EQUATIONS
Abstract. In this study we approach the semi-parametric estimation of Stochastic Differential Equations employing methods of generalized empirical likelihood and generalized minimum contrast. The results obtained demonstrate that the proposed estimators, particularly the exponentially tilted empirical likelihood estimator (Schennach (2007)), obtain better results than those of the Generalized Method of Moments generally used in the estimation of stochastic differential equations. These results are derived from the robustness properties of this method in the presence of problems of incorrect specification, which, in the context of the estimation of stochastic differential equations, occur through the use of approximate discretizations of the process in the construction of the moment conditions. The analyses are carried out by means of Monte Carlo experiments and of an empirical application estimating several models of short-term interest rates for a series of Treasury Bills with a one-month maturity.

Key Words: Stochastic Differential Equations, Empirical Likelihood, Generalized Minimum Contrast.

JEL Codes: C14, C22.
1. Introduction

The use of stochastic processes in continuous time in the modeling and pricing of financial instruments is one of the bases of the modern theory of Finance, and its origin can be traced back to Bachelier (1900)'s seminal study. The use of stochastic processes in continuous time is justified by the mathematical convenience in relation to the use of processes in discrete time, and by the possibility of employing the mathematical theory developed for the general class of processes known as continuous semi-martingales, making it possible to apply the whole theory of pricing by no-arbitrage (Harrison and Kreps (1979), Harrison and Pliska (1981) and Delbaen and Schachermayer (1994)) in this context. The basic objects of
the modeling of stochastic processes in continuous time are the so-called Stochastic Differential Equations, which are objects represented in the general form:

(1.1) dXt = µ(t, Xt)dt + σ(t, Xt)dWt,

where µ(t, Xt) represents the deterministic part of the process (instantaneous drift), σ(t, Xt) represents the stochastic component (volatility) of the process, and Wt is the so-called Wiener process or Brownian Motion. This representation is useful because it makes it possible to define the evolution of the process trajectories Xt by means of a representation given by a stochastic integration (e.g. Rogers and Williams (2000), Karatzas and Shreve (1987), Kloeden and Platen (1992)):

(1.2) Xt = X0 + ∫_{t0}^{t} µ(t, Xt)dt + ∫_{t0}^{t} σ(t, Xt)dWt.
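Since the integrals in (1.2) have no closed form for general µ and σ, approximate trajectories are usually simulated; a minimal Euler-type sketch, in which the mean-reverting drift and constant volatility are illustrative choices rather than specifications from the text:

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, t0, t1, n, rng):
    """Approximate one trajectory of dXt = mu(t, Xt)dt + sigma(t, Xt)dWt
    on [t0, t1] using n Euler steps."""
    dt = (t1 - t0) / n
    x = np.empty(n + 1)
    x[0] = x0
    t = t0
    for i in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))              # Brownian increment
        x[i + 1] = x[i] + mu(t, x[i]) * dt + sigma(t, x[i]) * dW
        t += dt
    return x

# Illustrative mean-reverting short-rate dynamics (not taken from the text):
rng = np.random.default_rng(42)
path = euler_maruyama(mu=lambda t, x: 0.5 * (0.06 - x),   # pull toward 6%
                      sigma=lambda t, x: 0.02,            # constant volatility
                      x0=0.04, t0=0.0, t1=1.0, n=252, rng=rng)
print(path.shape)
```

Refinements such as the Milstein scheme, mentioned later in the text, add a correction term involving the derivative of σ.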
Different specifications of the drift µ(t, Xt) and volatility σ(t, Xt) processes in the stochastic differential equation give rise to processes with distinct properties. These properties enable the representation of a wide class of processes used in finance. Focusing on the modeling of short-term interest rates, a series of alternative specifications have been employed. Table 1 presents some formulations used in the literature, comprising the models of Merton (1973), Vasicek (1977), Cox et al. (1985), Dothan (1978), Black and Scholes (1973), Brennan and Schwartz (1980), Cox et al. (1980) and Cox (1975). Notably, its last line defines the model called Generalized Cox-Ingersoll-Ross (CIR), containing all the previous models as particular cases, as demonstrated in Chan et al. (1992), which includes a general discussion of the properties of these models.
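The Generalized CIR specification of Chan et al. (1992) takes the form dr = (α + βr)dt + σ r^γ dW, and the nesting can be sketched as parameter restrictions; since Table 1 itself is not reproduced here, the entries below follow the usual Chan et al. (1992) presentation:

```python
# Generalized CIR (CKLS): dr = (alpha + beta*r) dt + sigma * r**gamma dW.
# Restricting (alpha, beta, gamma) recovers the nested short-rate models.
def ckls_drift(r, alpha, beta):
    return alpha + beta * r

def ckls_diffusion(r, sigma, gamma):
    return sigma * r ** gamma

# Standard restrictions, following Chan et al. (1992):
restrictions = {
    "Merton":        dict(beta=0.0, gamma=0.0),           # dr = alpha dt + sigma dW
    "Vasicek":       dict(gamma=0.0),                     # mean reversion, constant vol
    "CIR":           dict(gamma=0.5),                     # square-root diffusion
    "Dothan":        dict(alpha=0.0, beta=0.0, gamma=1.0),
    "Black-Scholes": dict(alpha=0.0, gamma=1.0),          # geometric Brownian motion
}
# Example: CIR diffusion at r = 4% with sigma = 0.1.
print(ckls_diffusion(0.04, sigma=0.1, gamma=0.5))
```

Under the full model all three parameters are estimated jointly, so the nested specifications can be examined as testable restrictions.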
Parameter estimation in stochastic differential equations is a well developed theme in the econometric literature¹, and there is a very wide range of techniques available. This range of techniques is related to the difficulties inherent to the estimation of stochastic differential

¹For a review of the literature on stochastic differential equations see Gourieroux and Monfort (1996), Prakasa Rao (1999), Bishwal (2007) and Zivot and Wang (2006).
equations, particularly the non-existence of closed solutions for stochastic differential equations in general cases and the problem of using discretely observed data in the estimation of a process formulated in continuous time. As examples of estimation methods in this context, we have maximum likelihood, the generalized method of moments (GMM), methods of simulated moments, Martingale estimating equations, Markov chain Monte Carlo and indirect inference, and non-parametric methods. In principle, the most recommended form of estimation consists in employing the likelihood function because, under regularity conditions, the maximum likelihood estimators are consistent, efficient and asymptotically normal. However, in the context of the estimation of stochastic differential equations, the non-existence of general solutions is a general difficulty found in the use of methods based on the likelihood of the process, which is formulated by employing the transition density resulting from the solution of the stochastic differential equation.
In the absence of analytical solutions, it is necessary to use approximations in the construction of the likelihood function, such as the use of quasi-maximum likelihood methods, which generate estimators with minimum mean square error, or the use of simulated maximum likelihood, which uses trajectories simulated by Euler or Milstein discretizations in the likelihood evaluation (Pedersen (1995)), or else approximations using the Hermite expansions obtained by Ait-Sahalia (2002). Note that, given the employment of approximations in the evaluation of the likelihood function, the optimality properties of this estimator may not remain, and thus other estimators could become competitive. Estimators using moment conditions are also often employed in the estimation of stochastic differential equations. The estimation using
the generalized method of moments (GMM) of Hansen (1982), employing a simple discretization of the process, may be the most widely employed form (e.g. Chan et al. (1992)). Although the generalized method of moments is characterized by properties of consistency and asymptotic efficiency, its properties in finite samples and in the presence of specification problems may not be optimal. In order to tackle these problems we discuss the use of two classes of estimators in the estimation of stochastic differential equations employing discrete data - estimators of generalized empirical likelihood and estimators of generalized minimum contrast - comparing their performance with that of the estimators based on the Generalized Method of Moments. These estimators are semi-parametric in the sense that the parametric form of the stochastic differential equation is used through the moment conditions, but the non-observed density of the process is evaluated in a non-parametric form.
The estimators of generalized empirical likelihood (empirical likelihood, exponential tilting and exponentially tilted empirical likelihood) possess the same properties of consistency and first-order asymptotic efficiency (e.g. Smith (2001), Schennach (2007)) as the compared GMM estimators (two-stage GMM, iterative GMM, continuous updating GMM). However, theoretical results demonstrate that these estimators may have superior properties in terms of bias in finite samples, and asymptotic properties of superior order (e.g. Kitamura (2006)). Furthermore, these estimators are asymptotically efficient in the class of semi-parametric estimators (in Bickel et al. (1993)'s sense), and have optimal properties in terms of hypothesis tests: minimax optimality, optimality in the sense of large deviations, and these tests are uniformly more powerful in the generalized Neyman-Pearson sense. The class of estimators of generalized minimum contrast (exponential tilting and exponentially tilted empirical likelihood) has characteristics of robustness in the presence of specification problems. These robustness characteristics of the estimators based on generalized minimum contrast are of the utmost importance in the estimation of stochastic differential equations because, given the non-existence of exact discretizations, all the estimators of continuous processes employing discretely observed data are characterized by a problem of incorrect specification.
This study discusses the use of these methods in the estimation of stochastic differential equations, and the results obtained demonstrate that these estimators obtain superior results when compared with the generally employed techniques of the generalized method of moments. One result of particular interest is that the exponentially tilted empirical likelihood estimator (Schennach (2007)) obtains results that are much superior in terms of finite sample bias, a result derived from the robustness properties of this method in the presence of incorrect specification (e.g. Smith (2001), Schennach (2007)).
This article is structured as follows: section 2 presents a brief review of the estimation of stochastic differential equations employing the GMM. Section 3 presents the estimators based on generalized empirical likelihood and generalized minimum contrast, discussing their properties, similarities and potential advantages in the estimation of stochastic differential equations. A series of Monte Carlo experiments is performed in section 4, aiming at stressing some properties of the estimators discussed in this study. In section 5, we perform the estimation of the models in Table 1 employing GMM, generalized empirical likelihood and generalized minimum contrast estimators for a series of interest rates of Treasury Bills with a one-month maturity, and section 6 presents the results of the specification tests based on over-identification conditions for the estimated models. The final conclusions are in section 7, showing concisely that the proposed estimators, which are unprecedented in the context of the estimation of stochastic differential equations, obtain results superior to those of the Generalized Method of Moments techniques generally employed in the estimation of stochastic differential equations.
(2.1) g(θ, Xt),

where these conditions are evaluated by employing sample moments such as:

(2.2) ḡ(θ) = (1/T) Σ_{t=1}^{T} g(θ, xt).

(2.3) θ̂ = arg_θ { (1/T) Σ_{t=1}^{T} g(θ, xt) = 0 }.
Note that, except in the case where the number of parameters is equal to the number of moment conditions (exactly identified system), the problem described in 2.2 generally has no solution. In order to obtain a single solution, define the following criterion function:

(2.4) J(θ) = ḡ(θ)′ W ḡ(θ),

and the minimization of this function defines the optimum solution of the problem, where W is a positive definite weighting matrix. Hansen (1982) demonstrates that the asymptotically efficient solution of the GMM estimation is obtained when this matrix is given by:

(2.5) W* = { lim_{T→∞} Var[ √T ḡ(θ) ] }^{-1} = Ω(θ)^{-1},

and thus the optimal weight is obtained by employing the inverse of the sample variance-covariance matrix of the moment conditions. This matrix is usually estimated employing the class of HAC estimators of Newey and West (1987), given by:
(2.6) \hat{\Omega} = \sum_{s=-(T-1)}^{T-1} k_h(s)\, \hat{\Gamma}_s(\theta^{*}),

where k is a kernel function dependent on the choice of a bandwidth h, which can be chosen using the procedures of Newey and West (1987) or Andrews (1991):

(2.7) \hat{\Gamma}_s(\theta^{*}) = \frac{1}{T}\sum_{t=1}^{T} g(\theta^{*}, x_t)\, g(\theta^{*}, x_{t+s})'.
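As an illustration, the HAC estimate in (2.6)-(2.7) can be sketched with a Bartlett kernel (a minimal Python sketch; the function name and the fixed-bandwidth interface are our illustrative assumptions, not part of the original text):

```python
import numpy as np

def hac_covariance(g, bandwidth):
    """Bartlett-kernel HAC estimate of the long-run covariance of a
    T x m array of moment conditions g, in the spirit of (2.6)-(2.7)."""
    T, m = g.shape
    g = g - g.mean(axis=0)                        # center the moments
    omega = np.zeros((m, m))
    for s in range(-bandwidth, bandwidth + 1):
        w = 1.0 - abs(s) / (bandwidth + 1)        # Bartlett weights k_h(s)
        if s >= 0:
            gamma = g[:T - s].T @ g[s:] / T       # Gamma_s = (1/T) sum g_t g_{t+s}'
        else:
            gamma = g[-s:].T @ g[:T + s] / T      # Gamma_{-s} = Gamma_s'
        omega += w * gamma
    return omega
```

With bandwidth 0 this reduces to the plain sample covariance of the moments; positive bandwidths add the weighted autocovariance terms of (2.6).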
The efficient GMM estimator is then obtained as the solution of the problem:

(2.8) \hat{\theta} = \arg\min_{\theta}\; \bar{g}(\theta)'\, \hat{\Omega}^{-1}\, \bar{g}(\theta).
There are several ways of implementing the GMM estimator. The initial form, proposed by Hansen (1982), is the estimator known as two-stage GMM. This estimator is obtained by performing a first stage that produces an initial estimate \hat{\theta}^{*} = \arg\min_{\theta} \bar{g}(\theta)'\, \Omega\, \bar{g}(\theta), where \Omega is an initial weight matrix, normally an identity matrix. From this first stage, a HAC matrix \hat{\Omega}(\theta^{*}) is calculated as a function of this initial estimate, and the final GMM estimate is obtained as \hat{\theta} = \arg\min_{\theta} \bar{g}(\theta)'\, \hat{\Omega}(\theta^{*})^{-1}\, \bar{g}(\theta), with the HAC matrix obtained in the first stage.
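A minimal sketch of the two-stage procedure for a toy over-identified moment model (the mean/variance moments, function names and simulated data are illustrative assumptions, not the interest rate models of this study):

```python
import numpy as np
from scipy.optimize import minimize

def moments(theta, x):
    """Moment conditions g(theta, x_t): mean/variance model with an
    over-identifying third-moment condition (illustrative choice)."""
    mu, sig2 = theta
    e = x - mu
    return np.column_stack([e, e**2 - sig2, e**3])

def gmm_two_stage(x, theta0):
    """Two-stage GMM of Hansen (1982): identity weight in the first
    stage, inverse moment covariance in the second stage."""
    def objective(theta, W):
        gbar = moments(theta, x).mean(axis=0)
        return gbar @ W @ gbar
    # stage 1: identity weight matrix
    th1 = minimize(objective, theta0, args=(np.eye(3),), method="Nelder-Mead").x
    # stage 2: weight = inverse covariance of the moments at the stage-1 estimate
    g = moments(th1, x)
    W = np.linalg.inv(g.T @ g / len(x))
    return minimize(objective, th1, args=(W,), method="Nelder-Mead").x
```

For a Gaussian sample with mean 1 and variance 4, the two-stage estimate recovers both parameters up to sampling error.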
Note that, in this case, the results of the second stage depend on the initial estimate from the first stage, and thus this procedure can create a first-order bias impairing the performance of the estimator in finite samples (Hansen et al. (1996)). In order to solve this problem, two alternative procedures are proposed. The first procedure is known as Iterative GMM, which is a modification of the two-stage procedure. In this procedure, the estimation of the first stage is reinitialized with the result of the second-stage estimation, and this iteration continues until the variation in the vector of parameters becomes smaller than a pre-defined tolerance.
Another possible estimator is known as GMM with continuous updating (Hansen et al. (1996)). In this case the estimation of the parameter \hat{\theta} is not performed in stages, but simultaneously, by employing a numerical optimization algorithm. Starting from an initial vector \theta_0 (generally chosen employing the two-stage GMM method), the estimation is performed as \hat{\theta} = \arg\min_{\theta} \bar{g}(\theta)'\, \hat{\Omega}(\theta)^{-1}\, \bar{g}(\theta), but now \theta and \hat{\Omega}(\theta) are determined simultaneously. This procedure has the same first-order properties as the Iterative GMM estimator but, according to Hansen et al. (1996), better properties in terms of bias in finite samples. According to Newey and Smith (2004) and Anatolyev (2005), the three methods are asymptotically equivalent, but the second-order bias of the continuous updating estimator is smaller, and the iterations increase the estimator's efficiency. However, the numerical procedure can be subject to multiple modes in the objective function, which renders this estimator numerically unstable.
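The continuously updated criterion can be sketched by recomputing the weight matrix inside the objective at every trial value of \theta (a Python sketch for a toy mean/variance moment model; the moments, names and data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def cue_estimate(x, theta0):
    """Continuously updated GMM (Hansen et al. 1996): the weight matrix
    is recomputed from the moments at every trial theta, so theta and
    the weight are determined simultaneously."""
    def g(theta):
        mu, sig2 = theta
        e = x - mu
        return np.column_stack([e, e**2 - sig2, e**3])

    def objective(theta):
        G = g(theta)
        gbar = G.mean(axis=0)
        W = np.linalg.pinv(G.T @ G / len(x))  # weight updated with theta
        return gbar @ W @ gbar

    return minimize(objective, theta0, method="Nelder-Mead").x
```

A reasonable starting value matters here: as noted above, the continuously updated objective can have multiple modes, so in practice it is initialized from a two-stage GMM estimate.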
In order to estimate stochastic differential equations by GMM, it is necessary to formulate the moment conditions in terms of some discretized form of the model. The first approach employed is the simple discretization adopted in Chan et al. (1992) for the Generalized CIR model (Table 1), given by:

(2.9) X_{t+1} - X_t = \alpha_0 + \beta_0 X_t + \varepsilon_{t+1},

with the conditions E(\varepsilon_{t+1}) = 0 and E(\varepsilon_{t+1}^2) = \sigma_0^2 X_t^{2\gamma}. In this case, we can formulate the moment conditions necessary for the estimation of the parameters (\alpha, \beta, \gamma, \sigma^2) by defining \varepsilon_{t+1} = X_{t+1} - X_t - \alpha_0 - \beta_0 X_t, and defining four moment conditions in this form:
(2.10) g(\theta) = \begin{pmatrix} \varepsilon_{t+1} \\ \varepsilon_{t+1} X_t \\ \varepsilon_{t+1}^2 - \sigma_0^2 X_t^{2\gamma} \\ (\varepsilon_{t+1}^2 - \sigma_0^2 X_t^{2\gamma}) X_t \end{pmatrix},
and applying the GMM estimation defined by equation 2.8. Moment conditions for the other submodels of the Generalized CIR family can be obtained by imposing the necessary restrictions, according to Table I in Chan et al. (1992). Note that this simple discretization is not consistent: the discretization does not converge to the true solution of the process, since it ignores the time interval between observations. A simple way of obtaining a consistent discretization for this process is to employ a first-order Euler discretization, which defines moment conditions given by a residual vector of the form \varepsilon_{t+\triangle t} = X_{t+\triangle t} - X_t - (\alpha_0 + \beta_0 X_t)\triangle t, thus constructing the vector of moment conditions as:
(2.11) g(\theta) = \begin{pmatrix} \varepsilon_{t+\triangle t} \\ \varepsilon_{t+\triangle t} X_t \\ \varepsilon_{t+\triangle t}^2 - \sigma_0^2 X_t^{2\gamma} \triangle t \\ (\varepsilon_{t+\triangle t}^2 - \sigma_0^2 X_t^{2\gamma} \triangle t) X_t \end{pmatrix}.
This is the form employed in this study. Note that the use of a discretization always represents a specification problem in the inference procedure, since, even employing consistent discretizations, the bias term caused by the discretization only tends to zero when \triangle t \to 0. Note also that the time interval \triangle t employed in the discretization depends on the frequency of data observation, and thus it is not under the researcher's control. Therefore, there are two sources of bias in the estimation of stochastic differential equations: the first derived from the use of Generalized Method of Moments estimators, and an additional one generated by the incorrect specification given by the use of a non-exact discretization of the process. Note that in Chan et al. (1992)'s original study, the estimation employs a simple discretization of the model rather than the Euler discretization, and this
represents a bias increase in the estimation due to a specification with a larger approximation error. Consequences of this problem can be seen in Prakasa Rao (1999), and a supplementary discussion is presented in Section 4, which demonstrates that this discretization problem leads to an incorrect specification problem in the estimation of stochastic differential equations.
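The Euler moment vector (2.11) can be computed directly from a sampled trajectory; a minimal Python sketch (the function name `euler_moments` is ours):

```python
import numpy as np

def euler_moments(theta, x, dt):
    """T x 4 array of moment conditions (2.11) from the first-order
    Euler discretization of the generalized CIR (CKLS) model."""
    alpha, beta, sigma2, gamma = theta
    eps = x[1:] - x[:-1] - (alpha + beta * x[:-1]) * dt    # Euler residual
    v = eps**2 - sigma2 * x[:-1]**(2 * gamma) * dt         # variance condition
    return np.column_stack([eps, eps * x[:-1], v, v * x[:-1]])
```

Evaluated at the data-generating parameters of an Euler-simulated path, the sample means of the four conditions are close to zero, as the moment restrictions require.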
The opposite situation would be estimation by the maximum likelihood method, which employs not only the conditional moments of the process but all the information in the conditional densities. If the process is correctly specified and meets the regularity conditions, then it is the best asymptotically normal estimator, and it also reaches optimality in measures such as Bahadur efficiency (Kitamura (2006), DasGupta (2008)). Nevertheless, employing maximum likelihood in the estimation of stochastic differential equations is made difficult by the non-existence of closed forms for the solution of stochastic differential equations, and thus it is not possible to employ parametric forms for the maximum likelihood estimation.
An alternative, not yet explored in the literature on inference in continuous-time processes, is the use of a form of non-parametric maximum likelihood estimation known as empirical likelihood (EL). According to Kitamura (2006), assuming a sequence of IID data \{x_i\}_{i=1}^{T} from an unknown density, and defining \triangle as the simplex \left\{ (p_1, \ldots, p_T) : \sum_{t=1}^{T} p_t = 1,\; 0 \le p_t \le 1,\; t = 1, \ldots, T \right\}, the non-parametric log-likelihood function is defined as:
(3.1) \ell_{NP}(p_1, \ldots, p_T) = \sum_{t=1}^{T} \log p_t, \quad (p_1, \ldots, p_T) \in \triangle.

Assuming the validity of the moment conditions

(3.2) E\left[g(\theta, X_t)\right] = \int g(\theta, X)\, d\mu = 0, \quad \theta \in \Theta \subset \mathbb{R}^k,

the estimation problem is the maximization of the non-parametric log-likelihood subject to these conditions:

\max\; \ell_{NP}(p_1, \ldots, p_T) = \sum_{t=1}^{T} \log p_t, \quad \text{s.t.} \quad \sum_{t=1}^{T} g(\theta, X_t)\, p_t = 0.
The estimator that maximizes this expression is the maximum empirical likelihood estimate. The implicit probabilities are related to the validity of the moment conditions: they give more weight to the observations where the moment conditions are closer to zero. Note the similarity with the GMM estimation, which is a simplified form that assumes that all weights are equal, i.e., p_t = 1/T.
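For a fixed \theta and a scalar moment condition, the constrained maximization above has the well-known dual solution p_t = 1/(T(1 + \lambda g_t)), with \lambda solving \sum_t g_t/(1 + \lambda g_t) = 0; a minimal sketch for the one-dimensional case (function name and Newton scheme are our illustrative choices):

```python
import numpy as np

def el_weights(g, tol=1e-10, max_iter=100):
    """Empirical likelihood weights for a scalar moment condition:
    p_t = 1 / (T (1 + lam g_t)), lam found by Newton iterations on the
    dual first-order condition sum g_t / (1 + lam g_t) = 0."""
    T = len(g)
    lam = 0.0
    for _ in range(max_iter):
        d = 1.0 + lam * g
        f = np.sum(g / d)             # first-order condition in lam
        fp = -np.sum(g**2 / d**2)     # its derivative
        step = f / fp
        lam -= step
        if abs(step) < tol:
            break
    p = 1.0 / (T * (1.0 + lam * g))
    return p, lam
```

At the solution the weights sum to one and set the weighted moment condition exactly to zero, which is precisely the contrast with the equal-weight GMM case.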
The use of empirical likelihood is particularly important in the estimation of stochastic differential equations because, except in a few particular cases, there are no exact solutions for the stochastic differential equations, and thus it is not possible to construct analytically the transition densities of the process, which makes it impossible to construct an exact likelihood function. Empirical likelihood methods allow us to assess the likelihood of the process in a
non-parametric form, and thus they do not depend on the existence of analytical solutions for the stochastic differential equations. This non-parametric evaluation of the likelihood function is efficient in the semi-parametric sense (e.g. Bickel et al. (1993)), and, at the same time, it employs the parametric specification given by the stochastic differential equation to construct moment conditions. A difference from the GMM is that, in the generalized empirical likelihood methodology, the moment condition can be a weakly dependent and heteroskedastic process. In order to tackle this situation, Anatolyev (2005) proposes replacing g(\theta, x_t) with a smoothed version defined as:

(3.3) g^{w}(\theta, x_t) = \sum_{s=-m}^{m} w(s)\, g(\theta, x_{t-s}),
where the w(s) are weights obtained from a kernel function and adding up to one, in the spirit of a HAC estimator (Andrews (1991)). This modification makes it possible to obtain the same first-order asymptotic efficiency conditions existing in the GMM methods. In this way, the estimate given by the moment conditions is given by:

(3.4) \hat{\theta} = \arg_{\theta}\; \sum_{t=1}^{T} p_t\, g^{w}(\theta, x_t) = 0.
An interpretation of equation 3.4 in relation to the GMM estimator is that, while in over-identified models estimated by GMM the moment conditions are not exactly equal to zero, in the estimators defined by this equation the moment conditions are exactly equal to zero through the weighting by the empirical probabilities p_t. Note that, in exactly identified models, all the proposed estimators obtain similar results, because in all of them the moment conditions are always valid. In over-identified models with valid moment conditions, all these estimators produce the same asymptotic variance.
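The smoothing in (3.3) can be sketched with a simple uniform kernel whose weights sum to one (the kernel choice, the edge padding and the function name are illustrative assumptions, not the exact scheme of the original text):

```python
import numpy as np

def smooth_moments(g, m):
    """Kernel-smoothed moment conditions in the spirit of (3.3): a
    moving average of g(theta, x_t) with uniform weights
    w(s) = 1/(2m+1), s = -m, ..., m, which sum to one."""
    T, k = g.shape
    w = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    gw = np.empty_like(g, dtype=float)
    for j in range(k):
        # pad with edge values so the smoothed series keeps length T
        padded = np.concatenate([np.repeat(g[0, j], m),
                                 g[:, j],
                                 np.repeat(g[-1, j], m)])
        gw[:, j] = np.convolve(padded, w, mode="valid")
    return gw
```

Because the weights sum to one, a constant moment series is left unchanged, and for linear series the interior values are reproduced exactly; only serial dependence is averaged out.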
et al. (1993)), similar to the interpretation of the GMM estimator as a minimum \chi^2 estimator, or the interpretation of Quasi-Maximum Likelihood estimators as Minimum Contrast estimators. Define a general divergence function between two probability measures P and Q as:

(3.5) D(P, Q) = \int \phi\!\left(\frac{dP}{dQ}\right) dQ,

where \phi is a convex function. Define M as the set of all probability measures on \mathbb{R}^p, and

(3.6) P(\theta) = \left\{ P \in M : \int g(\theta, x)\, dP = 0 \right\},

and P the statistical model of all probability measures compatible with 3.6. The minimum contrast optimization problem is given by

(3.8) \hat{\theta}_n = \arg\min_{\theta, p_t}\; \sum_{t=1}^{T} h_T(p_t).
An important result is that an adequate choice of the discrepancy function can lead to a unified representation of Empirical Maximum Likelihood and Minimum Contrast estimators. This representation can be obtained when the function h_T(p_t) belongs to the Cressie-Read family of discrepancies; with restrictions on the definition of the Cressie-Read discrepancy, particular cases of several classes of estimators are obtained. The empirical likelihood method is obtained with the restriction \gamma \to 0 in the discrepancy function h_T(p_t); the generalized minimum contrast method known as exponential tilting (ET), of Kitamura and Stutzer (1997) and Imbens et al. (1998), is obtained with \gamma \to -1; and the Continuous Updating estimator employing the empirical likelihood formulation is obtained with \gamma \to 1.
Smith (2001) demonstrated that it is possible to define another estimator that also includes these estimators as particular cases. The generalized empirical likelihood (GEL) method of Smith (2001) is obtained as the solution of the following saddlepoint problem:

(3.10) \hat{\theta}_n = \arg\min_{\theta} \max_{\lambda}\; \left[ \frac{1}{T} \sum_{t=1}^{T} \rho\!\left(\lambda'\, g^{w}(\theta, x_t)\right) \right],

(3.11) \sum_{t=1}^{T} p_t\, g^{w}(\theta, x_t) = 0.

Estimators are obtained by solving the previous equation with the first-order condition:

(3.12) \sum_{t=1}^{T} p_t\, \lambda'\, \frac{\partial g^{w}(\theta, x_t)}{\partial \theta} = 0,

where:
(3.13) p_t = \frac{1}{T}\, \rho'\!\left(\lambda'\, g^{w}(\theta, x_t)\right).
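For example, with the exponential tilting choice \rho(\xi) = -\exp(\xi), the implicit probabilities in (3.13) are proportional to \exp(\lambda' g_t) after normalization; a minimal sketch (function name ours):

```python
import numpy as np

def et_probabilities(g, lam):
    """Implicit probabilities in the exponential-tilting case
    rho(xi) = -exp(xi): p_t proportional to exp(lam' g_t), normalized
    to sum to one over the T observations."""
    xi = g @ lam
    w = np.exp(xi - xi.max())   # subtract the max for numerical stability
    return w / w.sum()
```

With \lambda = 0 the probabilities reduce to the uniform weights 1/T of the GMM case; a non-zero \lambda tilts weight toward observations that help satisfy the moment conditions.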
This generalized empirical likelihood estimator includes the empirical likelihood estimator, assuming the same conditions on the \gamma of the Cressie-Read divergence function, and modifying the functions h and \rho. The empirical likelihood estimator is obtained with h(p) = -\ln np and \rho(\xi) = \ln(1-\xi); the exponential tilting estimator with h(p) = np \ln np and \rho(\xi) = -\exp(\xi); and the continuous updating estimator with h(p) = (np)^2 and \rho(\xi) = -(1+\xi)^2/2. The parameter estimates and the Lagrange multiplier estimates can be obtained by numerical optimization methods or via quasi-Newton iterative methods; the solution can also be formulated as a problem of smaller dimension by means of a dual formulation (Kitamura (2006)), and this is the general form employed in the estimation in this study. An additional class of estimators can be obtained by combining the empirical likelihood estimator and the exponential tilting estimator, generating the estimator known as exponentially tilted empirical likelihood (ETEL), proposed by Schennach (2007). This estimator is defined as:

(3.14) \hat{\theta} = \arg\min_{\theta}\; n^{-1} \sum_{t=1}^{n} \tilde{h}(p_t(\theta)),

where the probabilities p_t(\theta) solve

(3.15) \min_{\{p_t\}_{t=1}^{n}}\; n^{-1} \sum_{t=1}^{n} h(p_t)

subject to \sum_{t=1}^{n} p_t\, g(\theta, x_t) = 0 and \sum_{t=1}^{n} p_t = 1, with \tilde{h}(p_t) = -\ln(n p_t) and h(p_t) = n p_t \ln(n p_t).
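A sketch of the ETEL criterion for the scalar mean model g(\theta, x) = x - \theta (an illustrative just-identified case; the function names and toy moment are ours): the inner step computes exponential tilting probabilities, and the empirical likelihood criterion is then evaluated at them.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def etel_objective(theta, x):
    """ETEL criterion for the mean model g = x - theta: probabilities
    from exponential tilting, criterion from empirical likelihood."""
    g = x - theta
    # ET step: lam minimizes (1/n) sum exp(lam * g_t), a convex problem
    lam = minimize_scalar(lambda l: np.mean(np.exp(l * g)),
                          bounds=(-5.0, 5.0), method="bounded").x
    w = np.exp(lam * g)
    p = w / w.sum()                         # exponential tilting probabilities
    return -np.mean(np.log(len(x) * p))     # EL log-likelihood, sign-flipped
```

In this just-identified case the criterion is minimized at the sample mean, where the tilting parameter vanishes and the probabilities are uniform, consistent with the remark above that all the estimators coincide in exactly identified models.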
Note that the ETEL (exponentially tilted empirical likelihood) estimator employs the exponential tilting method to find the probabilities \hat{p}_t(\theta) and the empirical likelihood method to
estimate the parameter vector \hat{\theta}. These probabilities are related to the multipliers \lambda through the relation:

(3.16) \hat{p}_t(\theta) = \frac{\exp\!\left(\hat{\lambda}(\theta)'\, g(\theta, x_t)\right)}{\sum_{i=1}^{n} \exp\!\left(\hat{\lambda}(\theta)'\, g(\theta, x_i)\right)}.
An important property of the ETEL class of estimators is their behavior in the presence of incorrect specification. Imbens et al. (1998) point out that the empirical likelihood estimator can behave inadequately under incorrect specification, due to the presence of a singularity in its influence function; and theorem 1 in Smith (2001) demonstrates that the asymptotic properties of the empirical likelihood estimator can be severely degraded in the presence of minimal specification problems. This effect also affects the estimates of the implicit probabilities \hat{p}_t because, in the presence of specification problems, the implicit probabilities in likelihood problems tend to concentrate on the extreme observations, in opposition to what is expected from a robust estimator.
The result obtained by Smith (2001) is that, in the class of minimum discrepancy estimators, the only estimator with adequate behavior in the presence of specification problems is the exponential tilting estimator, because its influence function does not present singularities. As the ETEL estimator is a combination of the empirical likelihood and exponential tilting estimators, it maintains the characteristics of asymptotic efficiency and minimum bias of the EL estimator and, additionally, it is robust in the presence of specification problems, due to the use of the exponential tilting estimator to estimate the implicit probabilities, as shown in theorems 8-10 in Smith (2001), indicating that this estimator is \sqrt{n}-consistent even in the presence of specification problems.
We can now sum up some common properties of the estimators discussed in this study. The first property is that all the estimators presented (two-stage GMM, Iterative GMM, continuous updating GMM, generalized empirical likelihood, exponential tilting and exponentially tilted empirical likelihood) have the same consistency and first-order asymptotic efficiency properties (e.g. Smith (2001), Schennach (2007)) and are efficient in the semi-parametric sense of
Bickel et al. (1993), under the validity of the specified moment conditions. All the estimators have the same asymptotic variance, but the superior results in terms of bias and higher-order asymptotic properties are valid for the estimators based on generalized empirical likelihood, exponential tilting and exponentially tilted empirical likelihood (e.g. Kitamura (2006)). The class of estimators based on empirical likelihood also presents optimal properties in terms of hypothesis tests: these tests are optimal in the minimax and large deviation criteria and are uniformly more powerful in the generalized Neyman-Pearson sense, as demonstrated in Kitamura (2006).
However, the performance in finite samples can be rather different. The two-stage GMM estimator can be severely biased in the sample sizes employed in economics and finance, and continuous updating estimators are numerically unstable due to the existence of multiple modes in the objective function (for example, Hansen et al. (1996)). Newey and Smith (2004) demonstrate that the empirical likelihood estimator should have a smaller bias in finite samples than the bias of estimators of the exponential tilting and continuous updating classes. In empirical likelihood and exponential tilting estimators, the bias does not grow with the number of moment conditions, as happens with the GMM estimator. Newey and Smith (2004) also demonstrate that estimators based on GEL have good properties in terms of second-order bias. Another interesting property is that estimators based on GMC and GEL are invariant to linear transformations of the vector of moment conditions, which does not occur with the two-stage GMM estimator.
SR3 models, performing the estimation with the proposed estimation methods and, based on these estimations, evaluating the bias, mean square error (MSE) and mean absolute error (MAE) for each estimated parameter. Figures 4.1, 4.2, 4.3 and 4.4 show MSE and MAE sequentially for each parameter and each method, for an easier visualization of the results.

The simulation procedure employed for the Generalized CIR process uses a Milstein discretization (e.g. Milstein (1974), Kloeden and Platen (1992)) to generate process trajectories, since for this process there is no exact analytical solution for the transition density. For the Vasicek and CIR SR processes, we employed the exact transition density to generate simulated trajectories (e.g. Ait-Sahalia (2002)).
Note that this detail is of fundamental importance. Before discussing this point, we introduce the notion of strong convergence of discretizations. Suppose that we want to generate a trajectory of the stochastic differential equation dX_t = \mu(t, X_t)dt + \sigma(t, X_t)dW_t employing a discretization that generates trajectories Y_t^{\triangle} of this process, and that the trajectories of this approximation converge to the true trajectory. An approximation Y_t^{\triangle} is said to be strongly convergent of order \gamma > 0 if there is a positive constant K such that, for each \triangle,

E\left|X_t - Y_t^{\triangle}\right| \le K \triangle^{\gamma},

in which K does not depend on the discretization interval \triangle. Under the usual Lipschitz and growth conditions, it is possible to demonstrate (e.g. Kloeden and Platen (1992), Prakasa Rao (1999)) that the Euler discretization converges with strong order \gamma = 0.5 and the Milstein discretization (Milstein (1974)) is strongly convergent of order \gamma = 1.
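The difference between the two strong orders can be illustrated by simulating a process whose exact solution is available. The sketch below uses geometric Brownian motion (our illustrative choice, not one of the models in Table 1) and compares the average absolute terminal errors of the Euler and Milstein schemes driven by the same Brownian increments:

```python
import numpy as np

def strong_errors(n_paths=2000, n_steps=64, mu=0.05, sigma=0.4,
                  x0=1.0, T=1.0, seed=0):
    """Strong errors E|X_T - Y_T| of Euler and Milstein schemes for
    dX = mu X dt + sigma X dW, against the exact GBM solution."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
    euler = np.full(n_paths, x0)
    milstein = np.full(n_paths, x0)
    for k in range(n_steps):
        dw = dW[:, k]
        euler = euler + mu * euler * dt + sigma * euler * dw
        # Milstein adds the correction (1/2) sigma^2 X (dW^2 - dt)
        milstein = (milstein + mu * milstein * dt + sigma * milstein * dw
                    + 0.5 * sigma**2 * milstein * (dw**2 - dt))
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.mean(np.abs(exact - euler)), np.mean(np.abs(exact - milstein))
```

On the same increments, the Milstein terminal error is markedly smaller, and halving the step size reduces the Euler error at roughly the rate \triangle^{0.5} predicted by its strong order.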
As the discretization employed in the moment conditions is of strong order inferior to that employed in the process simulation, an incorrect specification problem arises, generated by the discretization employed. This problem occurs in a more intense form when the exact solution of the stochastic differential equation can be used to generate the process trajectory. The fundamental point is that, in the estimation based on approximated discretizations, there is

3These experiments were performed for the other models as well and produce similar results, but are not reported here.
always a bias generated by the process discretization, and one of the objectives of the Monte Carlo study is to verify whether any method manages to produce a reduction in the bias related to this effect, which can be interpreted as a specification problem. Note that in Chan et al. (1992)'s original article the discretization employed is still simpler than Euler's, and thus the existing bias in the estimators must be even greater.
The first Monte Carlo experiment corresponds to the simulation of 1,000 trajectories of size 474 of a Generalized CIR process with parameter vector given by \alpha = 0.0408, \beta = -0.5921, \sigma^2 = 1.6704 and \gamma = 1.4999. The results of this experiment are displayed in Table 2 and Figure 4.1. Each figure shows, respectively, the bias and MSE obtained by each estimator. The results obtained demonstrate that there is a relevant bias in the estimation of all the parameters, and particularly of the parameter \sigma^2. The results in terms of the size of the bias and of the mean square error are quite similar for almost all the estimators, except for the ETEL and SETEL estimators, which present far superior results in terms of bias, MSE and MAE in relation to the other methods for all the estimated parameters, which is evident in Figure 4.1.

In the Monte Carlo experiment for the Vasicek process (Table 3 and Figure 4.2), we again simulated one thousand trajectories, with parameter vector given by \alpha = 0.0154, \beta = -0.1779, \sigma^2 = 0.0004 and \gamma = 0. The results demonstrate again that the ETEL estimators' performance is superior, and it is also noticeable that, in this experiment, the estimator with the worst performance was the GMMCUE estimator. For the CIR SR process (Table 4 and Figure 4.3), we simulated one thousand trajectories of the process with \alpha = 0.0189, \beta = -0.2339, \sigma^2 = 0.0073 and \gamma = 0.5. The same pattern of better performance of the ETEL class of estimators was observed, as well as a similar performance of the other estimators.
Note that, so far, the incorrect specification problem was caused only by the use of an approximated discretization in the construction of the process' moment conditions. In order to verify whether the better performance properties of the ETEL class of estimators are valid in more general situations of incorrect specification, we employed, as data generating
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.058507    0.058507    0.058507    0.055458    0.059144    0.059415    0.042725    0.052872    0.058399    0.059222    0.04188
bias α    0.017707    0.017707    0.017707    0.014658    0.018344    0.018615    0.001925    0.012072    0.017599    0.018422    0.0010799
mse α     0.00056964  0.00056964  0.00056964  0.00047632  0.0006153   0.00061202  0.00013504  0.00036246  0.00058505  0.00060074  0.00011089
mae α     0.01897     0.01897     0.01897     0.017089    0.019782    0.019879    0.0054645   0.014678    0.019406    0.019842    0.0062548
mean β    -0.8595     -0.8595     -0.8595     -0.80945    -0.86955    -0.87356    -0.55882    -0.7687     -0.8596     -0.87196    -0.57188
bias β    -0.2674     -0.2674     -0.2674     -0.21735    -0.27745    -0.28146    0.033282    -0.1766     -0.2675     -0.27986    0.020224
mse β     0.13466     0.13466     0.13466     0.11417     0.14782     0.14621     0.0060387   0.087359    0.14515     0.14649     0.005762
mae β     0.28842     0.28842     0.28842     0.25989     0.30496     0.30573     0.050072    0.22293     0.30165     0.30589     0.053256
mean σ²   2.0247      2.0247      2.0247      1.5815      1.3286      1.344       1.7024      1.6174      1.3095      1.3495      1.709
bias σ²   0.35432     0.35432     0.35432     -0.088891   -0.34183    -0.32643    0.031976    -0.053      -0.36091    -0.32087    0.038576
mse σ²    2.9527      2.9527      2.9527      1.4709      1.936       2.1132      0.0044602   0.92555     1.4653      2.316       0.014295
mae σ²    0.77683     0.77683     0.77683     0.75317     0.30496     1.0843      0.044706    0.60878     0.93484     1.0786      0.059661
mean γ    1.4939      1.4939      1.4939      1.4426      1.3792      1.3749      1.545       1.4612      1.388       1.379       1.545
bias γ    -0.0060482  -0.0060482  -0.0060482  -0.057291   -0.12068    -0.12502    0.04513     -0.038714   -0.11192    -0.12093    0.045113
mse γ     0.026274    0.026274    0.026274    0.020593    0.041267    0.044455    0.0049082   0.014641    0.036723    0.047736    0.0050356
mae γ     0.099573    0.099573    0.099573    0.10787     0.16667     0.17463     0.047923    0.087617    0.15467     0.17471     0.050718
Table 2. Monte Carlo - Generalized CIR Model: α = 0.0408, β = -0.5921, σ² = 1.6704, γ = 1.4999
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL         SGEL        SET         SGELCUE     SETEL
mean α    0.026282    0.02631     0.032988    0.023512    0.026155    0.02619     0.018149     0.021964    0.026201    0.026319    0.017407
bias α    0.010882    0.01091     0.017588    0.0081121   0.010755    0.01079     0.0027494    0.0065644   0.010801    0.010919    0.0020066
mse α     0.00033681  0.00034073  0.027141    0.00021774  0.00033557  0.00034088  2.94e-05     0.00016122  0.00033437  0.00035097  3.1323e-05
mae α     0.012892    0.012926    0.01954     0.010099    0.012822    0.012858    0.003452     0.0085718   0.01283     0.01297     0.0035887
mean β    -0.30312    -0.30342    -0.30962    -0.26678    -0.30146    -0.30183    -0.17052     -0.24704    -0.30219    -0.30346    -0.17013
bias β    -0.12522    -0.12552    -0.13172    -0.088877   -0.12356    -0.12393    0.0073759    -0.069138   -0.12429    -0.12556    0.0077726
mse β     0.040826    0.041215    0.085012    0.025724    0.040504    0.040987    9.4271e-05   0.018829    0.040542    0.042202    0.00010285
mae β     0.14424     0.14463     0.15174     0.1087      0.14293     0.14371     0.0075038    0.089738    0.14341     0.14512     0.0079496
mean σ²   0.00039439  0.00039426  0.0041781   0.00024483  0.00039568  0.00039424  -0.00073209  0.00016984  0.00039582  0.00039419  -0.00043392
bias σ²   -5.6134e-06 -5.7428e-06 0.0037781   -0.00015517 -4.3172e-06 -5.7626e-06 -0.0011321   -0.00023016 -4.1753e-06 -5.8097e-06 -0.00083392
mse σ²    7.278e-10   7.3255e-10  0.0088959   1.4396e-06  7.0684e-10  7.3271e-10  1.3833e-05   2.6605e-06  7.0188e-10  7.3365e-10  1.8212e-05
mae σ²    2.1654e-05  2.1722e-05  0.0038054   0.00021042  0.14293     2.1735e-05  0.0016887    0.00033096  2.1213e-05  2.1738e-05  0.002153
Table 3. Monte Carlo - Vasicek Model: α = 0.0154, β = -0.1779, σ² = 0.0004
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.027282    0.027373    0.027345    0.026274    0.026822    0.02717     0.035421    0.024065    0.026855    0.027303    0.032218
bias α    -0.011618   -0.011527   -0.011555   -0.012626   -0.012078   -0.01173    -0.003479   -0.014835   -0.012045   -0.011597   -0.0066823
mse α     0.00025651  0.00025636  0.00025737  0.00025445  0.00026778  0.0002604   0.00017471  0.00031365  0.00027164  0.00025814  0.00022287
mae α     0.013893    0.013872    0.013915    0.014255    0.014273    0.01401     0.0095164   0.016122    0.01429     0.013928    0.011389
mean β    -0.35358    -0.35494    -0.35469    -0.25389    -0.34377    -0.35199    -0.22432    -0.25006    -0.34409    -0.35426    -0.22223
bias β    -0.11968    -0.12104    -0.12079    -0.019993   -0.10987    -0.11809    0.009581    -0.016158   -0.11019    -0.12036    0.011673
mse β     0.037413    0.038193    0.038221    0.0085822   0.03374     0.037286    0.00032757  0.007739    0.034579    0.038137    0.00033459
mae β     0.14679     0.14817     0.14801     0.049654    0.13771     0.14582     0.011027    0.0457      0.13865     0.14779     0.012825
mean σ²   0.007215    0.007212    0.0072157   0.0077384   0.0072538   0.0072123   0.01034     0.0087777   0.0072597   0.0072157   0.011519
bias σ²   -8.5032e-05 -8.796e-05  -8.4257e-05 0.00043839  -4.6153e-05 -8.7746e-05 0.0030401   0.0014777   -4.0337e-05 -8.4268e-05 0.0042193
mse σ²    2.9662e-07  2.972e-07   2.9673e-07  6.2162e-06  5.1309e-07  2.9714e-07  7.3385e-05  3.0028e-05  4.3382e-07  2.9759e-07  9.4528e-05
mae σ²    0.00040581  0.00040617  0.00040628  0.00090954  0.13771     0.00040711  0.0041124   0.0018955   0.00042241  0.00040691  0.0054431
Table 4. Monte Carlo - CIR SR Model: α = 0.0189, β = -0.2339, σ² = 0.0073, γ = 0.5
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.0 to 2.5]
process, trajectories of the Generalized CIR process with parameter vector \alpha = 0.0408, \beta = -0.5921, \sigma^2 = 1.6704 and \gamma = 1.4999. However, as specification of the estimated model, we now employed a CIR SR model, assuming that \gamma = 0.5.

The results of this experiment (Table 5 and Figure 4.4) demonstrate that, in this general case, a better performance of the ETEL and ET estimators also occurs, but the other estimators have a much worse performance in the estimation of the parameter \sigma^2. Note that the incorrect specification problem is expected, in this situation, to affect mainly the estimation of the process variance because, in the classes of CIR models, the process volatility is a function of the level of the process through the parameter \gamma.

In order to perform the model comparison procedure with real data, we followed the basic structure of Chan et al. (1992)'s study, estimating the generalized CIR model and the eight submodels (Merton, Vasicek, CIR SR, Dothan, Brennan-Schwartz, GBM, CIR VR and CEV, in Chan et al. (1992)'s notation), with an expanded sample of one-month maturity Treasury Bill yields. The sample consists of monthly data from July 1964 to December 2003, totalling 475 observations. The data employed are extracted from the database of the Center
          GMM2S       GMMITER     GMMCUE      GEL         ET          GELCUE      ETEL        SGEL        SET         SGELCUE     SETEL
mean α    0.058858    0.068777    0.046083    0.0198      0.031206    0.042609    0.01872     0.016732    0.03132     0.054847    0.021016
bias α    0.018058    0.027977    0.0052827   -0.021      -0.0095939  0.0018092   -0.02208    -0.024068   -0.0094795  0.014047    -0.019784
mse α     0.00094603  0.0015446   0.0019998   0.00068434  0.0023605   0.0033731   0.0032493   0.00066272  0.011352    0.0025003   0.011872
mae α     0.021676    0.033352    0.036552    0.023599    0.022699    0.050106    0.023033    0.024875    0.025205    0.043429    0.027102
mean β    -0.88012    -1.0498     -0.686      -0.25302    -0.35149    -0.63736    -0.22108    -0.23068    -0.3347     -0.83664    -0.22787
bias β    -0.28802    -0.45772    -0.093905   0.33908     0.24061     -0.045256   0.37102     0.36142     0.2574      -0.24454    0.36423
mse β     0.1998      0.40413     0.52967     0.14488     0.18393     0.8179      0.13867     0.13683     0.18573     0.67003     0.16751
mae β     0.34075     0.54251     0.5972      0.37389     0.39164     0.80459     0.37102     0.3685      0.38477     0.71483     0.37528
mean σ²   0.0081117   0.0080927   0.0082476   0.0093165   0.026925    0.010682    0.014684    0.013577    0.014008    0.0085045   0.018849
bias σ²   -1.6623     -1.6623     -1.6622     -1.6611     -1.6435     -1.6597     -1.6557     -1.6568     -1.6564     -1.6619     -1.6516
mse σ²    2.7632      2.7633      2.7628      2.7593      2.9352      2.7588      2.7417      2.7452      2.7466      2.7621      2.7358
mae σ²    1.6623      1.6623      1.6622      1.6611      0.39164     1.6604      1.6557      1.6568      1.6565      1.6619      1.6539
Table 5. Monte Carlo - Misspecified Model: α = 0.0408, β = -0.5921, σ² = 1.6704, γ = 1.4999
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.00 to 0.14]
for Research and Security Prices (CRSP DATA). Figure 5.1 displays the series employed, and the descriptive statistics are placed in Table 6. It is noticeable that their general behavior is
[Figure: estimator comparison by method (GMM2S, GMMITER, GMMCUE, GEL, ET, GELCUE, ETEL, SGEL, SET, SGELCUE, SETEL); vertical axis from 0.0 to 2.5]
[Figure 5.1: one-month Treasury Bill yield series r (approximately 0.05 to 0.15) by observation index]
in the parameter estimates (c. par) and in the estimation of the Lagrange multipliers (c. λ) for the estimators outside the GMM class, where the value 1 means convergence.
The following tables (8, 9, 10, 11, 12, 13 and 14) display the results of the estimation of the over-identified systems. In the estimation of Vasicek's model (Table 8), there are three parameters and four moment conditions. In these estimations, a greater variability in the estimates of the parameters \alpha and \beta is noticeable; however, given the pattern of standard deviations obtained, these estimates are not statistically different. In the estimation of the CIR SR model (Table 9), the results are quite similar across all the estimation methods, except for the estimate of \beta for the GELCUE estimator, which can be related to a local maximum problem.

The results of the Merton model estimation (Table 10) show two behavior patterns, with values of \alpha close to 0.0003 for the GMM2S, GMMITER, GELCUE and SGELCUE methods, and values close to 0.0065 for the other estimators, noting, however, that in these cases there was no convergence in the estimation of \lambda.
         GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α        0.025680   0.025680   0.025680   0.030350   0.025650   0.025680   0.032970   0.025630   0.025640   0.031770   0.032090
se α     0.013150   0.013150   0.013150   0.012730   0.012680   0.012680   0.012640   0.015990   0.016000   0.016920   0.016860
β        -0.460280  -0.460280  -0.460280  -0.549710  -0.459500  -0.460360  -0.568730  -0.459610  -0.459780  -0.591160  -0.592850
se β     0.272150   0.272150   0.272150   0.263890   0.262940   0.262940   0.260930   0.301850   0.301880   0.320180   0.317150
σ²       1.673600   1.673600   1.673600   1.673600   0.920420   0.924040   1.332080   0.923730   0.925020   1.449700   1.323670
se σ²    2.295810   2.295810   2.295810   1.826640   1.314290   1.318720   2.350850   2.376070   2.379200   3.683310   3.412470
γ        1.461410   1.461410   1.461410   1.413150   1.339400   1.340160   1.472900   1.340100   1.340350   1.340880   1.509510
se γ     0.262060   0.262060   0.262060   0.256570   0.270330   0.270200   0.340430   0.469460   0.469440   0.467070   0.472950
c. par   1          1          1          1          1          1          1          1          1          1          1
c. λ     -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.012770   0.009700   0.005720   0.015400   0.009320  -0.002170   0.008820   0.001950   0.005000   0.008650   0.005000
se α      0.011490   0.011420   0.011380   0.010800   0.012040   0.010740   0.010750   0.013690   0.012510   0.010770   0.012510
β        -0.209910  -0.145250  -0.058160  -0.177900  -0.176290   0.100590  -0.168670  -0.168920  -0.180300  -0.125850  -0.180300
se β      0.242210   0.241110   0.240380   0.225640   0.249630   0.225830   0.225710   0.221900   0.229360   0.204440   0.229360
σ²        0.000390   0.000380   0.000380   0.000400   0.003620   0.000400   0.000710   0.007560   0.005170   0.000370   0.005170
se σ²     0.000060   0.000060   0.000060   0.000060   0.000620   0.000060   0.000070   0.003450   0.001870   0.000080   0.001870
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          1          1          0          0          0          1          0
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.014700   0.013510   0.011000   0.012730   0.048840   0.005680   0.012190   0.012370   0.017930   0.013130   0.012060
se α      0.011540   0.011520   0.011490   0.010850   0.011860   0.010850   0.010810   0.011110   0.012610   0.011270   0.011030
β        -0.245170  -0.219500  -0.164570  -0.220600  -0.626310  -0.058160  -0.218900  -0.224410  -0.230920  -0.214410  -0.216110
se β      0.242760   0.242370   0.241820   0.226930   0.237360   0.227960   0.226900   0.210930   0.238380   0.214710   0.208060
σ²        0.008100   0.008050   0.008000   0.008520   0.010370   0.008160   0.004210   0.005910   0.016620   0.007830   0.002990
se σ²     0.001110   0.001110   0.001110   0.001690   0.001230   0.001200   0.001250   0.001490   0.002410   0.001490   0.001660
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          1          1          0          0          1          1          0
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.002940   0.002960   0.003050   0.000820   0.000720   0.002610   0.000720   0.000850   0.000690   0.002800   0.000690
se α      0.002570   0.002570   0.002570   0.002370   0.002370   0.002410   0.002370   0.002490   0.002540   0.003130   0.002540
σ²        0.000390   0.000380   0.000380   0.001030   0.000930   0.000400   0.000930   0.001100   0.000940   0.000370   0.000940
se σ²     0.000060   0.000060   0.000060   0.000190   0.000180   0.000060   0.000180   0.000320   0.000280   0.000080   0.000280
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          0          0          1          0          0          0          1          0
Similar results are obtained for the parameter σ² for all methods in the estimation of the Dothan model (Table 11), with convergence in all estimations.
In the estimation of the Brennan-Schwartz (Table 12), CIR VR (Table 13) and CEV (Table 14) models, we obtained success in the convergence of the parameter vector and the vector of Lagrange multipliers in all the methods, and, as expected, the results obtained are quite similar in all the estimators used, except for the estimators based on exponentially tilted empirical likelihood.
One way of undertaking the specification tests in the context of GMM estimation in overidentified models is through the distance between the moment conditions and zero. In the overidentified case, the greater proximity of the moment conditions evaluated at θ̂ to zero would be evidence of the validity of the specification employed. A way of defining a test statistic is through the criterion function itself, which originates the so-called J-test, whose statistic is given by:
(6.1)    J = T g(θ̂)′ [Ω̂(θ̂)]⁻¹ g(θ̂).
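In practice the J statistic of eq. (6.1) is a quadratic form in the averaged moment conditions. The following minimal sketch computes it on purely hypothetical moment data (the function name and the iid covariance estimate are assumptions of this illustration, not part of the paper):

```python
import numpy as np

def j_statistic(g_bar, omega_hat, T):
    """Hansen's J statistic: J = T * g_bar' Omega^{-1} g_bar (eq. 6.1)."""
    return float(T * g_bar @ np.linalg.solve(omega_hat, g_bar))

# Toy data: 500 observations of 4 hypothetical per-observation moment conditions.
rng = np.random.default_rng(0)
G = rng.standard_normal((500, 4))
g_bar = G.mean(axis=0)
omega_hat = np.cov(G, rowvar=False)   # iid covariance estimate of the moments
J = j_statistic(g_bar, omega_hat, T=500)
```

Under a correct specification, J would be compared with a χ²(m − k) critical value.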
The asymptotic distribution of the test statistic under the null hypothesis of correct specification is a χ²(m − k) distribution, whose degrees of freedom are given by the number of moments in excess of the number of parameters estimated. In the estimation by the methods of generalized minimum contrast/generalized empirical likelihood, it is also possible to construct two alternative specification tests, the Lagrange multiplier (LM) and the likelihood ratio (LR) tests, as discussed in Smith (2001), employing the Lagrange multipliers of equation 3.10. The intuition of the LM test is similar to that of the J test - if the moment conditions are valid, the Lagrange multipliers must not be far from zero - and thus it is not necessary to impose restrictions on the model to make the moment conditions equal to zero. The form of the LM test in this context is given by:
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
σ²        0.144400   0.143860   0.143500   0.151650   0.152360   0.143570   0.141730   0.156020   0.147600   0.142890   0.137570
se σ²     0.018830   0.018840   0.018840   0.020320   0.020320   0.020310   0.020310   0.027350   0.027120   0.027040   0.026990
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
α         0.019800   0.019680   0.018880   0.020820   0.019830   0.017060   0.032160   0.025000   0.022600   0.019920   0.024570
se α      0.011810   0.011810   0.011800   0.011320   0.011240   0.011240   0.011760   0.012850   0.012650   0.012490   0.013460
β        -0.343020  -0.340200  -0.323030  -0.322980  -0.338490  -0.287340  -0.298100  -0.446630  -0.400510  -0.347120  -0.315520
se β      0.246860   0.246820   0.246660   0.236550   0.235330   0.235330   0.240360   0.244610   0.241050   0.238700   0.250080
σ²        0.144450   0.144100   0.143670   0.158580   0.146250   0.143280   0.127940   0.152150   0.146510   0.142520   0.134780
se σ²     0.018530   0.018530   0.018540   0.019680   0.019650   0.019650   0.019790   0.025920   0.025780   0.025690   0.025900
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
σ²        1.947930   1.935290   1.973020   2.134040   2.084420   2.031730   1.930910   2.089810   1.991970   1.949750   1.894050
se σ²     0.260370   0.260230   0.260670   0.265510   0.264360   0.263270   0.273340   0.341910   0.337720   0.336220   0.334510
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
          GMM2S      GMMITER    GMMCUE     GEL        ET         GELCUE     ETEL       SGEL       SET        SGELCUE    SETEL
β         0.061900   0.062120   0.065920   0.054140   0.052030   0.051540   0.011800   0.029150   0.028310   0.060810  -0.006150
se β      0.054340   0.054340   0.054340   0.050480   0.050480   0.050460   0.050610   0.054950   0.055490   0.056250   0.056190
σ²        0.238460   0.149560   0.149790   0.729960   0.500060   0.213700   0.732500   0.606200   0.593460   0.143730   0.560070
se σ²     0.437080   0.299670   0.299890   1.067110   0.773820   0.377510   1.432650   1.653030   1.748400   0.446840   2.093630
γ         1.098550   1.016000   1.016110   1.296680   1.229400   1.075610   1.365170   1.254660   1.290140   1.014000   1.355840
se γ      0.337350   0.365020   0.364740   0.278770   0.292490   0.327000   0.377480   0.503040   0.546550   0.560770   0.700410
conv. par 1          1          1          1          1          1          1          1          1          1          1
conv. λ   -          -          -          1          1          1          1          1          1          1          1
(6.2)    LM = T λ̂′ Ω̂(θ̂) λ̂.
Note that it is also possible to perform specification analyses by considering the distributions estimated individually for each λ̂, instead of the joint LM test⁴. The likelihood ratio (LR) test is obtained by comparing the objective function ρ under the unrestricted model with the estimation of a restricted model, formulated in the context of generalized minimum contrast models, as defined in section 3. This form is given by:
" #
T
X
(6.3) LR = 2 ρ λb′ gw (θ,
b xt ) − ρ(0)
t=1
The LM and LR tests are also distributed asymptotically as χ²(m − k). Although the J, LM and LR tests are asymptotically equivalent, the optimality results in models of empirical likelihood/generalized minimum contrast (e.g. Kitamura (2006)) - as well as the better properties in point estimation in finite samples - indicate that the LR and LM tests must have better properties in finite samples than the J-test obtained by GMM, which can be severely downward biased in finite samples, as pointed out by Zhou (2000), leading to a greater probability of incorrect acceptance of a null hypothesis of correct model specification. For the GMM method only the J-test is defined. Note that the validity of LM and LR depends on the convergence in the estimation of λ.
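Given a converged multiplier vector λ̂ and the per-observation moment conditions, the LM and LR statistics are direct to evaluate. A schematic sketch with hypothetical data, using the exponential-tilting carrier ρ(v) = −exp(v) as one possible choice of ρ (the variable values are illustrative, not estimates from the paper):

```python
import numpy as np

def lm_statistic(lam, omega_hat, T):
    """LM = T * lambda' Omega(theta_hat) lambda (eq. 6.2)."""
    return float(T * lam @ omega_hat @ lam)

def lr_statistic(lam, G, rho):
    """LR statistic: twice the centered carrier summed over observations
    (one common normalization of eq. 6.3).
    G: (T, m) array of per-observation moments g_w(theta_hat, x_t)."""
    return float(2.0 * np.sum(rho(G @ lam) - rho(0.0)))

def rho_et(v):
    # carrier function of the exponential-tilting member of the GEL family
    return -np.exp(v)

rng = np.random.default_rng(1)
G = 0.01 * rng.standard_normal((500, 4))   # hypothetical moment conditions
lam = np.full(4, 1e-3)                     # hypothetical multiplier estimate
LM = lm_statistic(lam, np.cov(G, rowvar=False), T=500)
LR = lr_statistic(lam, G, rho_et)
```

Both statistics would then be compared with χ²(m − k) critical values.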
A summary of the results of all the tests for the seven fitted models is presented in Table 15. A trace (-) indicates that there was no convergence in the estimation of λ. There is no convergence problem for the Brennan-Schwartz, CIR-VR, Dothan and CEV models. Also, only the GMM and (S)GELCUE methods presented convergence for all models. The results for these methods are similar. Considering only these five methods, there is evidence against the specification of the Vasicek, CIR-SR and Merton models (all p-values less than or equal to 7% in all tests); some evidence against the CEV model (p-values less than or equal to 7% for the GMM and
⁴These tests were calculated for all the models estimated, but they are not reported here for reasons of space.
equal to 11% for the GELCUE method); and little evidence against the Dothan, Brennan-Schwartz and CIR-VR models. In the other methods we can find strong evidence against all the models whenever convergence occurred, but we must consider this result with care.
Observe that, except for the GEL and ET methods, even when we have convergence, the p-values are very small. This could be because these methods are more powerful than the other tests, or an indication that the chi-square distribution is not a good approximation for finite sample size series. This question could be answered using simulation, but it is out of the scope of this article.
In general, the results of specification tests employing conditions of overidentification estimated by empirical likelihood/generalized minimum contrast, summarized in Table 15, point towards the rejection of the null hypothesis of correct specification, whereas, in general, the J-tests by GMM are more favorable to the validity of the null hypothesis. The results of the empirical likelihood/generalized minimum contrast tests are consistent with the perception that single-factor models, such as the models estimated in this study, can be excessively simple to model interest rate processes or the pricing of fixed-income instruments (e.g. Stambaugh (1988), Stanton (1997), Litterman and Scheinkman (1991), Longstaff and Schwartz (1992) and Lund and Andersen (1997)).
Nevertheless, there are alternative forms of verifying problems of incorrect specification. One possibility is by analysing the estimated implicit probabilities. It is also possible to construct alternative specification and structural break tests employing implicit probabilities obtained by the estimation of generalized empirical likelihood models, as shown by Antoine et al. (2007), Ramalho and Smith (2005) and Guay and Lamarche (2008), using Pearson-type statistics to measure the quadratic distance between the implicit probabilities of restricted and unrestricted models. These statistics are asymptotically equivalent to the specification tests employing the moment conditions presented in this section. None of these alternatives is considered here.
H0        GMM2S  GMMITER  GMMCUE  GEL       ET        GELCUE    ETEL     SGEL    SET     SGELCUE   SETEL
Vasicek   1      2        2       -         <<<       2-2-2     -        -       -       <<<       -
CIR-SR    5      6        6       -         <<<       5-5-5     -        -       <<<     <<<       -
Merton    6      7        7       -         -         4-4-4     -        -       <<<     <<<       <<<
Dothan    27     28       28      47-25-<   46-34-7   53-53-53  53-24-3  7-3-<   13-5-<  15-15-15  13-<-<
BS        28     30       30      12-9-9    26-18-4   28-28-28  <<<      3-<-<   4-2-<   5-5-5     <<<
CIR-VR    11     10       10      17-15-10  18-15-10  18-18-18  <<<      <<<     <<<     <<<       <<<
CEV       6      7        7       8-7-3     10-8-4    11-11-11  <<<      <<<     <<<     <<<       <<<
Table 15. Summary of specification tests. Each cell has p-values (in %) for the J, LR and LM tests. In the GMM methods only the J-test was applied. -: no convergence; <: p-value smaller than 1%.
7. Conclusions
In this article we discussed the use of estimators of generalized empirical likelihood/generalized minimum contrast for the estimation of stochastic differential equations. These estimators are characterized by properties of asymptotic efficiency of superior order, and by robustness in the presence of specification problems for the estimators based on exponential tilting. These properties are particularly important in this context of estimation of stochastic differential equations, since, in general, it is not possible to construct the exact likelihood function of the process due to the non-existence of analytical solutions (and consequently of exact discretizations) for stochastic differential equations. In this context, the use of a non-parametric approximation for the process density employing these methods is particularly advantageous because it facilitates the efficient evaluation of the process density, at the same time as the parametric specification given by the stochastic differential equation is being used by means of moment conditions.
The results obtained demonstrate that the exponentially tilted empirical likelihood estimator in particular, proposed by Schennach (2007), obtains a performance which is superior to the other techniques proposed, due to its properties of robustness in the presence of specification problems. As it is possible to interpret the estimation of stochastic differential equations employing discrete data as an incorrect specification problem, due to the use of an approximated discretization of the model, the results of the Monte Carlo experiments demonstrate that the performance of this estimator is quite superior to the other estimation methods employing moment conditions, and, in general, the estimators based on empirical likelihood/generalized minimum contrast have a better performance in terms of bias and mean squared error.
References
Ait-Sahalia, Y.: 2002, Maximum-likelihood estimation of discretely-sampled diffusions: A closed-form approximation approach, Econometrica 70, 223-262.
Anatolyev, S.: 2005, GMM, GEL, serial correlation and asymptotic bias, Econometrica 73, 983-1002.
Andrews, D. W. K.: 1991, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica 59, 817-858.
Antoine, B., Bonnal, H. and Renault, E.: 2007, On the efficient use of the informational content of estimating equations: implied probabilities and euclidean empirical likelihood, Journal of Econometrics 138, 461-487.
Bachelier, L.: 1900, Théorie de la spéculation. English translation by A. J. Boness in The Random Character of Stock Market Prices, ed. Paul H. Cootner, pp. 17-78, Cambridge, Mass., MIT Press, 1967.
Bickel, P., Klaassen, C., Ritov, Y. and Wellner, J.: 1993, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins Press.
Bishwal, J. P. N.: 2007, Parameter Estimation in Stochastic Differential Equations, Springer.
Black, F. and Scholes, M. S.: 1973, The pricing of options and corporate liabilities, Journal of Political Economy 81, 637-654.
Brennan, M. and Schwartz, E. S.: 1980, Analyzing convertible bonds, Journal of Financial and Quantitative Analysis 15, 907-929.
Chan, K. G., Karolyi, G., Longstaff, F. and Sanders, A. B.: 1992, An empirical comparison of alternative models of the term structure of interest rates, Journal of Finance 47, 1209-1227.
Chausse, P.: 2009, gmm: Generalized Method of Moments and Generalized Empirical Likelihood, 6, 59-59.
Gourieroux, C. and Monfort, A.: 1996, Simulation-Based Econometric Methods, Oxford University Press.
Guay, A. and Lamarche, J.-F.: 2008, The information content of implied probabilities to detect structural change. Brock University Working Paper 08-33.
Hansen, L. P.: 1982, Large sample properties of generalized method of moments estimators, Econometrica 50, 1029-1054.
Hansen, L. P., Heaton, J. and Yaron, A.: 1996, Finite sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics 14, 262-280.
Harrison, J. M. and Kreps, D.: 1979, Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory 20, 381-408.
Harrison, J. M. and Pliska, S.: 1981, Martingales and stochastic integrals in the theory of continuous trading, Stochastic Processes and Their Applications 11, 215-260.
Imbens, G. W., Spady, R. H. and Johnson, P.: 1998, Information theoretic approaches to inference in moment condition models, Econometrica 66, 333-357.
Karatzas, I. and Shreve, S. E.: 1987, Brownian Motion and Stochastic Calculus, Springer-Verlag.
Kitamura, Y.: 2006, Empirical likelihood methods in econometrics: Theory and practice. Unpublished Working Paper.
Kitamura, Y. and Stutzer, M.: 1997, An information-theoretic alternative to generalized method of moments estimation, Econometrica 65(5), 861-874.
Kloeden, P. and Platen, E.: 1992, Numerical Solution of Stochastic Differential Equations, Springer-Verlag.
Litterman, R. and Scheinkman, J.: 1991, Common factors affecting bond returns, Journal of Fixed Income 1, 54-61.
Longstaff, F. and Schwartz, E.: 1992, Interest rate volatility and the term structure: A two-factor general equilibrium model, Journal of Finance 47, 1259-1282.
Lund, J. and Andersen, T.: 1997, Estimating continuous-time stochastic volatility models of the short-term interest rate, Journal of Econometrics 77, 343-377.
Merton, R. C.: 1973, The theory of rational option pricing, Bell Journal 4, 141-183.
Milstein, G. N.: 1974, Approximate integration of stochastic differential equations, Theory of Probability and Applications 19, 557-562.
Newey, W. K. and West, K. D.: 1987, A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica 55, 703-708.
Newey, W. and McFadden, D.: 1994, Large sample estimation and hypothesis testing, in Handbook of Econometrics, Vol. 4, Elsevier.
Newey, W. and Smith, R. J.: 2004, Higher order properties of GMM and generalized empirical likelihood estimators, Econometrica 72, 219-255.
Owen, A.: 1991, Empirical likelihood for linear models, The Annals of Statistics 19(4), 1725-1747.
Pedersen, A. R.: 1995, A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scandinavian Journal of Statistics.
Prakasa Rao, B. L. S.: 1999, Statistical Inference for Diffusion Type Processes, Arnold.
Qin, J. and Lawless, J.: 1994, Empirical likelihood and general estimating equations, The Annals of Statistics 22(1), 300-325.
Ramalho, J. J. S. and Smith, R. J.: 2005, Goodness of fit tests for moment condition models. Working Paper 2005/05.
Rogers, L. C. G. and Williams, D.: 2000, Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus, Cambridge.
Schennach, S.: 2007, Point estimation with exponentially tilted empirical likelihood, Annals of Statistics 35(2), 634-672.
Smith, R. J.: 2001, GEL criteria for moment condition models. Working Paper, University of Bristol.
Stambaugh, R.: 1988, The information in forward rates: Implications for models of the term structure, Journal of Financial Economics 21, 41-70.
Stanton, R.: 1997, A nonparametric model of term structure dynamics and the market price of interest rate risk, Journal of Finance 52, 1973-2002.
Vasicek, O.: 1977, An equilibrium characterization of the term structure, Journal of Financial Economics 5, 177-188.
Zhou, H.: 2000, A study of the finite sample properties of EMM, GMM, QMLE, and MLE for a square-root interest rate diffusion model. Federal Reserve System Finance and Economics Discussion Series 2000-45.
Zivot, E. and Wang, J.: 2006, Modeling Financial Time Series with S-PLUS, second edition, Springer-Verlag.
ESTIMATION OF STOCHASTIC VOLATILITY MODELS USING
METHODS OF GENERALIZED EMPIRICAL LIKELIHOOD/MINIMUM
CONTRAST
Abstract. In this article we discuss the estimation of Stochastic Volatility (SV) models using generalized empirical likelihood/minimum contrast methods. We show via Monte Carlo simulations that the proposed methods have a performance superior or equivalent to the other estimation methods proposed in the literature to estimate SV models, and, additionally, they offer robustness properties in the presence of specification problems such as heavy-tailed distributions and the presence of outliers.
1. Introduction
Measurement of asset volatility is a fundamental aspect of finance. Precise volatility measurements of financial asset returns are necessary in areas such as risk management (McNeil et al. (2005)) and asset pricing (Singleton (2006)). Among the available forms for modeling volatility, the class of models known as SV models stands out¹. In this class of models, volatility is treated as a non-observed latent factor. One of the main reasons for its popularity is that SV models can be derived from continuous time diffusions (e.g. Barndorff-Nielsen et al. (2002)), and thus they become closer to the pricing literature using non-arbitrage/martingale methods. These models are also attractive because, as empirical evidence shows, they are better at capturing stylized facts of financial series, and their predictive performance is superior in comparison to other classes of volatility models (e.g. Koopman et al. (2005)), such as, for example, the class of GARCH models (Engle (1982), Bollerslev (1986)). However, as volatility is treated as a non-observable latent process, the estimation of volatility models is more complicated than the estimation of concurrent models, such as the GARCH class, in which volatility is a deterministic function of the past, which makes the evaluation of the likelihood function a simple procedure.
In SV models, the exact evaluation of the likelihood function, due to the presence of the latent volatility factor, requires the calculation of an integral with a dimension equivalent to the sample size. The numerical evaluation of this problem requires methods based on simulation, such as importance sampling methods (e.g. Geweke (1994), Liesenfeld and Richard (2003)) or Markov Chain Monte Carlo (MCMC) (Shephard (1993), Jacquier et al. (1994)). Although these methods are efficient and, with the currently available computational power, quite feasible, some problems still remain, such as the determination of an appropriate importance function or the problem of correlation in the chains in MCMC sampling. It is also possible to work with likelihood function approximations, such as the estimation by quasi-maximum likelihood (Harvey et al. (1994), Jungbacker and Koopman (2009)), based on a linearization of the SV
¹For a review of methods for estimating SV models see, for example, Broto and Ruiz (2004), Ghysels et al. (1996), Shephard and Andersen (2009) and Jungbacker and Koopman (2009).
model. In this methodology, the evaluation of the likelihood function is made by means of a decomposition of the prediction error using the Kalman filter, which renders a consistent estimator which is asymptotically Gaussian, though inefficient and biased in finite samples. Other ways of evaluating this model employ estimation by simulation using the methods of indirect inference and the efficient method of moments (Gourieroux et al. (1993), Gallant and Tauchen (1996)). These two methods are asymptotically efficient, and have good properties in finite samples (Monfardini (1998)), but they are less efficient than the MCMC methods of Shephard (1993) and Jacquier et al. (1994). The simplest estimation form for volatility models is the method of moments, the original form of estimation employed in the estimation of the seminal log-normal SV model proposed by Taylor (1986). This methodology was later refined by Melino and Turnbull (1990) through the use of the generalized method of moments (GMM) of Hansen (1982), which generates consistent and asymptotically efficient estimators. These estimators are computationally simple, but their properties in finite samples can be poor and they are inefficient when compared with estimators based on MCMC. A comprehensive study of these estimators' properties can be found in Andersen and Sorensen (1996), and a complete survey of the estimation of SV models using the method of moments can be found in Renault (2009).
The performance of SV model estimators employing GMM is weakened by the fact that the GMM estimator's bias grows with the number of moment conditions (e.g. Newey and Smith (2004)), and the efficiency of this method depends on an adequate choice of the moment conditions. The GMM estimator manages to reach the efficiency of the maximum likelihood estimator if one of the moments is the score function of the maximum likelihood estimator, or if the moments employed project this function. In practice, efficient estimation by GMM involves the use of a large number of moment conditions. As the finite-sample bias of the GMM estimator is proportional to the number of moments employed, there is a trade-off between bias and variance in the estimation by GMM when a high number of moment conditions is used. Another problem in the estimation of SV models by GMM is the lack of robustness in the moment conditions employed. The estimation of the log-normal SV model is based on conditions that employ moments of higher orders, and this can be a serious problem in the presence of outliers or heavy-tailed innovation processes. In this situation, the effects of outliers in the sample are raised to powers of third or fourth order, which significantly affects the estimation in finite samples.
A further problem lies in the formulation of moment conditions. Although the GMM estimator is semi-parametric, and thus it is not necessary to specify the distribution function of the process, the formulation of moment conditions for SV models generally employs moments derived from the specification of a distribution function for the innovations, as in the case of the so-called log-normal SV model of Taylor (1986). If this assumption is not valid, the properties of the GMM estimator may be degraded.
In this way, the computationally simplest implementation of the generalized method of moments leads to an estimator with poor properties in finite samples (Andersen and Sorensen (1996)), and, on the other hand, the implementation of efficient estimators, such as the methods based on MCMC, is computationally intensive and subject to convergence problems. In this study we propose an alternative form of estimation employing semi-parametric methods of generalized empirical likelihood and generalized minimum contrast. These methods, as will be demonstrated, represent a computationally simpler way of implementation because they can be based on the same moment conditions as the estimators of generalized moment methods, and they produce efficient estimators with good properties in finite samples, as will be
minimum contrast; section 5 shows Monte Carlo experiments; and the final conclusions are in section 6.
The so-called log-normal volatility model introduced by Taylor (1986) can be described by the following structure:
(2.1)    y_t = σ_t ε_t,
(2.2)    log σ_t² = α + β log σ_{t−1}² + σ u_t,
where equation 2.1 describes the behavior of the process mean, and equation 2.2 contains the volatility dynamics. It is usually assumed that the innovation processes in the mean and in the volatility are given by independent normal distributions, that is, (ε_t, u_t) ∼ iid N(0, I₂), and in this model the parameter vector is given by θ = (α, β, σ). Note that it is possible to interpret this model in a semi-parametric form, as pointed out by Renault (2009), without an a priori specification of the innovation process distributions. Renault (2009) denotes this model as Exponential-SARV because the variance is the exponential of an autoregressive process.
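For concreteness, the model in equations 2.1-2.2 can be simulated directly; a minimal sketch (the parameter values below are arbitrary illustrations, not estimates from the paper):

```python
import numpy as np

def simulate_sv(alpha, beta, sigma, T, seed=0):
    """Simulate the log-normal SV model:
       y_t = sigma_t * eps_t,
       log sigma_t^2 = alpha + beta * log sigma_{t-1}^2 + sigma * u_t,
       with (eps_t, u_t) ~ iid N(0, I_2)."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)                      # h_t = log sigma_t^2
    h[0] = alpha / (1.0 - beta)          # start at the unconditional mean
    for t in range(1, T):
        h[t] = alpha + beta * h[t - 1] + sigma * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h

y, h = simulate_sv(alpha=-0.3, beta=0.95, sigma=0.25, T=2000)
```

With |β| < 1 the log-variance process is stationary around α/(1 − β).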
As demonstrated by Francq and Zakoïan (2006), it is not necessary to assume a distribution for this model's estimation, since, as previously noted by Ruiz (1994), log y_t² = log σ_t² + log ε_t², and this corresponds to an ARMA(1,1) model for the log of the square of the observed process y_t, which makes it possible to derive the representation employed by Francq and Zakoïan (2006) to obtain a consistent least squares estimator for this model. Francq and Zakoïan (2006) also demonstrate that there is an ARMA(m,m) model for any power log y_t^m of this process, although it is important to note that the log-normal representation is quite realistic, as indicated by Andersen (1994).
This log-normal specification makes it possible to construct moment conditions of any order, as demonstrated by Taylor (1986) and Melino and Turnbull (1990). The moment conditions of the log-normal SV model can be obtained by initially defining the unconditional mean and variance of the log-variance equation:

μ = E[log σ_t²] = α/(1 − β),    σ_y² = Var[log σ_t²] = σ²/(1 − β²),
and the remaining moments as:

E[y_t²] = E[σ_t²],
E[|y_t|³] = 2√(2/π) E[σ_t³],
E[y_t⁴] = 3 E[σ_t⁴],
E[y_t² y_{t−j}²] = E[σ_t² σ_{t−j}²].
Moments of superior order can be written as:

E[σ_t^r] = exp( rμ/2 + r²σ_y²/8 )

for any positive integer j and constants r and s, and in the same way covariances can be obtained by:

E[σ_t^r σ_{t−j}^s] = E[σ_t^r] E[σ_t^s] exp( rsβ^j σ_y²/4 ).
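The closed-form moment expressions above follow from the log-normality of σ_t²; they can be sanity-checked by simulation (the parameter values below are arbitrary):

```python
import numpy as np

# Monte Carlo check of E[sigma_t^r] = exp(r*mu/2 + r^2*sigma_y^2/8)
# when log sigma_t^2 ~ N(mu, sigma_y^2).
rng = np.random.default_rng(5)
mu, s2y, r = -1.0, 0.5, 3                       # arbitrary illustration values
log_var = rng.normal(mu, np.sqrt(s2y), size=200_000)
mc_moment = np.mean(np.exp(r * log_var / 2.0))  # sample mean of sigma_t^r
theory = np.exp(r * mu / 2.0 + r**2 * s2y / 8.0)
```

The two quantities agree up to Monte Carlo error.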
The moment conditions employed by Andersen and Sorensen (1996) and in our study comprise a set of 24 moment conditions using absolute moments of second to fourth order and lags of first to tenth orders:

(2.3)    g^24_t(θ) = ( |y_t|, y_t², |y_t|³, y_t⁴, |y_t y_{t−1}|, ..., |y_t y_{t−10}|, y_t² y_{t−1}², ..., y_t² y_{t−10}² ).
We also employed a second vector of moment conditions with 14 moment conditions given by:

(2.4)    g^14_t(θ) = ( |y_t|, y_t², |y_t|³, y_t⁴, |y_t y_{t−2}|, |y_t y_{t−4}|, |y_t y_{t−6}|, |y_t y_{t−8}|, |y_t y_{t−10}|, y_t² y_{t−1}², y_t² y_{t−3}², y_t² y_{t−5}², y_t² y_{t−7}², y_t² y_{t−9}² ).
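As an illustration, the sample analogues of the 24 moments in eq. 2.3 can be computed as follows (a sketch only: a full moment-condition vector g_t(θ) would subtract the model-implied expectation of each entry):

```python
import numpy as np

def moment_vector_24(y):
    """Sample analogues of the 24 moments in eq. (2.3): absolute moments
    |y_t|, y_t^2, |y_t|^3, y_t^4 plus the cross moments |y_t y_{t-j}| and
    y_t^2 y_{t-j}^2 for lags j = 1, ..., 10."""
    m = [np.mean(np.abs(y)), np.mean(y**2),
         np.mean(np.abs(y)**3), np.mean(y**4)]
    for j in range(1, 11):                        # |y_t y_{t-j}|
        m.append(np.mean(np.abs(y[j:] * y[:-j])))
    for j in range(1, 11):                        # y_t^2 y_{t-j}^2
        m.append(np.mean(y[j:]**2 * y[:-j]**2))
    return np.array(m)

rng = np.random.default_rng(2)
m24 = moment_vector_24(rng.standard_normal(1000))
```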
With these two vectors of moment conditions we can perform the estimation using the generalized method of moments defined in section 3 and the generalized empirical likelihood and generalized minimum contrast methods in section 4.
(3.1)    ḡ(θ) = (1/T) ∑_{t=1}^{T} g(θ, y_t) = 0.
This system is generally over-identified (there are more moment conditions than parameters), and so in general there are no solutions. In order to obtain a solution, a criterion function must be employed:
(3.3)    W* = { lim_{T→∞} Var[ √T ḡ(θ) ] }⁻¹ = Ω(θ)⁻¹,
where Ω(θ) denotes the long-run variance-covariance matrix of the moment conditions. In this way, the asymptotically efficient weighting is obtained by employing the inverse of this variance-covariance matrix. This matrix is generally unknown, and is usually estimated using the HAC class of estimators of Newey and West (1987):
(3.4)    Ω̂ = ∑_{s=−(T−1)}^{T−1} k_h(s) Γ̂_s(θ*),
where k denotes a kernel function with a certain bandwidth parameter h, chosen by means of the procedures of Newey and West (1987) or Andrews (1991):
(3.5)    Γ̂_s(θ*) = (1/T) ∑_{t=1}^{T} g(θ*, y_t) g(θ*, y_{t+s})′.
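A minimal implementation of the HAC estimator in eqs. 3.4-3.5; the Bartlett kernel and the centering of the moments are assumptions of this sketch (the paper leaves the kernel and bandwidth choices to the Newey-West or Andrews procedures):

```python
import numpy as np

def hac_newey_west(G, h):
    """HAC estimate of eq. (3.4), truncated at lag h with Bartlett weights.
    G: (T, m) array of per-observation moment conditions g(theta*, y_t);
    Gamma_s is the sample autocovariance of the moments (eq. 3.5)."""
    T, m = G.shape
    Gc = G - G.mean(axis=0)              # centered moments
    omega = Gc.T @ Gc / T                # Gamma_0
    for s in range(1, h + 1):
        w = 1.0 - s / (h + 1.0)          # Bartlett kernel weight k_h(s)
        gamma_s = Gc[s:].T @ Gc[:-s] / T
        omega += w * (gamma_s + gamma_s.T)
    return omega

rng = np.random.default_rng(3)
Omega = hac_newey_west(rng.standard_normal((500, 4)), h=5)
```

The Bartlett weights guarantee a positive semi-definite estimate, which is why this kernel is the usual default.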
The efficient estimator of the generalized method of moments is then obtained as the solution to the problem:
There are several ways to implement the GMM estimator. The initial form proposed by Hansen (1982) is the estimator known as two-stage GMM. This estimator is obtained by performing a first stage, finding an initial estimator θ̂* = arg min ḡ(θ)′ Ω₀ ḡ(θ), where Ω₀ is an initial weighting matrix, usually an identity matrix. Following this first stage, a HAC matrix Ω̂(θ*) is calculated as a function of that initial estimate, and the final GMM estimate is obtained as θ̂ = arg min ḡ(θ)′ [Ω̂(θ*)]⁻¹ ḡ(θ), with the HAC matrix obtained in the first stage.
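The two-stage procedure can be sketched on a toy overidentified model; everything here - the model y_t ∼ N(θ, 1), its two moment conditions, and the grid-search minimization - is a hypothetical illustration rather than the estimator used in this paper:

```python
import numpy as np

def gmm_two_stage(y, grid):
    """Two-stage GMM sketch on y_t ~ N(theta, 1) with the overidentified
    moments g_t(theta) = (y_t - theta, y_t^2 - theta^2 - 1); the minimization
    is a grid search to keep the sketch dependency-free."""
    def per_obs(theta):
        return np.column_stack([y - theta, y**2 - theta**2 - 1.0])

    def criterion(theta, W):
        g = per_obs(theta).mean(axis=0)
        return g @ W @ g

    # stage 1: identity weighting matrix
    q1 = np.array([criterion(t, np.eye(2)) for t in grid])
    th1 = grid[np.argmin(q1)]
    # stage 2: efficient weight = inverse covariance of the stage-1 moments
    W2 = np.linalg.inv(np.cov(per_obs(th1), rowvar=False))
    q2 = np.array([criterion(t, W2) for t in grid])
    return grid[np.argmin(q2)]

rng = np.random.default_rng(4)
y = rng.normal(1.5, 1.0, size=2000)
theta_hat = gmm_two_stage(y, np.linspace(0.0, 3.0, 601))
```

In practice a gradient-based optimizer replaces the grid search, but the two-stage logic is the same.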
A point to be noted is that, in this case, the second stage results depend on the initial estimation in the first stage, and thus this procedure can create a first order bias, weakening the estimator's performance in finite samples (Hansen et al. (1996)). In order to solve this problem, two alternative procedures were proposed. The first procedure is known as iterative GMM, in which the first stage estimation is reinitialized with the result of the second stage estimation, and this iteration continues until the variation in the parameter vector or in the criterion function becomes smaller than an established tolerance.
Another possible estimator is known as continuously updated GMM (Hansen et al. (1996)). In this case, the estimation of $\hat{\theta}$ is not performed in stages, but by simultaneously employing a numerical optimization algorithm. Starting from an initial vector $\theta_0$ (usually chosen by the two-stage GMM method), the estimation is performed as $\hat{\theta} = \arg\min_{\theta} g(\theta)'\, \hat{\Omega}(\theta)^{-1} g(\theta)$, but now $\theta$ and $\hat{\Omega}(\theta)$ are simultaneously determined by the numerical optimization procedure. This procedure has the same first-order properties as the iterative GMM estimator, but, according to Hansen et al. (1996), it has better bias properties in finite samples, and the estimator is invariant under model reparameterization.
According to Newey and Smith (2004) and Anatolyev (2005), the three methods are asymptotically equivalent, but the second-order bias in finite samples of the continuous updating estimator is smaller. However, the numerical procedure may be subject to multiple modes in the objective function, which renders this estimator numerically unstable.
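The continuously updated objective can be sketched in a toy setting. In the sketch below, the weighting matrix is recomputed at every candidate value inside the criterion, so the parameter and its weighting matrix are determined together; the scalar model, moment conditions, and sample-covariance weighting are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
y = 1.5 + rng.standard_normal(3000)
moments = lambda mu: np.column_stack([y - mu, (y - mu) ** 3])

def cue_objective(mu):
    """CUE criterion: the weighting matrix is re-evaluated at every mu,
    so theta and Omega(theta) are determined simultaneously."""
    g = moments(mu)
    gbar = g.mean(axis=0)
    omega = np.cov(g, rowvar=False)          # Omega(theta) at this mu
    return gbar @ np.linalg.solve(omega, gbar)

mu_hat = minimize_scalar(cue_objective, bounds=(0.0, 3.0), method="bounded").x
```

A bounded scalar search is used here for stability; in higher dimensions the multiple modes mentioned above make the choice of starting values important.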
The estimation of the SV model by GMM is performed by employing the moment conditions defined by the vector given by Eq. 2.3. There are, however, some specific points in the estimation of SV models. As discussed in Melino and Turnbull (1990) and Hall (2005), the numerical procedure in this problem becomes more difficult due to the presence of non-differentiable moment conditions arising from the use of absolute moments. Although these functions are differentiable at almost all points, and the use of absolute moments does not affect the asymptotic properties of the estimators (e.g. Hall (2005)), it is important to discuss how to deal with this problem. Melino and Turnbull (1990) assume that the value of the function is 0 at the non-differentiable points, but this procedure can be problematic because it leads to a discontinuity in the determination of the step size in the numerical optimization algorithm. An alternative form consists in performing a numerical interpolation at the non-differentiability point, which is the procedure carried out in this study. The properties of this approximation can be seen in Hall (2005).
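One simple way to remove the kink of an absolute-moment condition, in the spirit of the interpolation mentioned above, is to replace $|x|$ on a small neighbourhood of zero by a quadratic that matches the function and its derivative at the edges. The specific quadratic below is an illustrative choice, not necessarily the interpolation used in the thesis.

```python
import numpy as np

def smooth_abs(x, eps=1e-4):
    """|x| outside (-eps, eps); the quadratic x^2/(2 eps) + eps/2 inside.
    The two pieces match in value and first derivative at x = +-eps, so
    the result is continuously differentiable everywhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) >= eps, np.abs(x), x * x / (2 * eps) + eps / 2)
```

Gradient-based optimizers then see a smooth objective, avoiding the step-size discontinuity described above, at the cost of a bias of order eps near zero.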
Properties of the GMM estimator in the estimation of SV models can be found in Andersen and Sorensen (1996), and a complete review of the use of methods of moments, including simulated methods of moments, can be found in Renault (2009). The results demonstrate that this estimator, despite being computationally simple, has poor properties in finite samples due to bias and inefficiency problems, although the results are better than those obtained by the quasi-maximum likelihood estimator (e.g. Jacquier et al. (1994)). The finite-sample problem of the GMM estimator is related to the need to use a large number of moments to secure the estimator's efficiency, while the finite-sample bias of the GMM estimator is proportional to the number of moment conditions used. Thus, in finite samples there is a trade-off between bias and efficiency. Note that, although the principal advantage of the GMM estimator lies in its semi-parametric formulation, which does not require assumptions about the sample distribution, the estimator employs only the moments of the process, and does not employ all the information contained in the sample.
In Andersen and Sorensen (1996), several details of the specification of the GMM estimator for SV models are discussed, such as the choice of the kernel function and the bandwidth employed, convergence problems, and the use of subgroups of moment conditions. In this study we employ the quadratic spectral function as the kernel function, with the optimal bandwidth chosen by Andrews (1991)'s procedure.
GMM is a method particularly useful for estimating non-linear models when the moments are known. However, there is a trade-off between, on the one hand, the weaker assumptions needed for its use and, on the other, the method's efficiency in finite samples, as discussed in the previous section. The regularity conditions for GMM estimators (Hansen (1982), Newey and McFadden (1994), Hall (2005)) involve only conditions for the asymptotic validity of the moment conditions; they do not assume stronger conditions such as knowledge of the process distribution, which represents an underutilization of the information present in the sample.
The opposite situation would be estimation by the method of maximum likelihood, which uses not only the conditional moments of the process but all the information present in the conditional densities. If the process is correctly specified and meets the regularity conditions, it is the best asymptotically Gaussian estimator, besides reaching optimality in measures such as Bahadur efficiency (Kitamura (2006), DasGupta (2008)). Note that maximum likelihood estimation in the context of SV models is more complex because the volatility is a latent variable, and the evaluation of the exact likelihood function usually requires simulation methods such as importance sampling or MCMC. Approximations using the quasi-maximum likelihood principle carry a cost in terms of their inferior performance in finite samples.
In this context, an alternative way of formulating estimators that do not need the parametric specification of the process distribution consists in employing semi-parametric estimation methods based on a non-parametric estimation of the likelihood function of the process. These semi-parametric estimators are known as Empirical Likelihood (EL) methods, formulated as generalizations of the non-parametric likelihood methods of Kiefer and Wolfowitz (1956).
Following Kitamura (2006)'s presentation, the non-parametric log-likelihood function of a sequence of IID data $\{x_i\}_{i=1}^{n}$ of unknown density is defined as:
(4.1) $\ell_{NP}(p_1, \ldots, p_n) = \sum_{i=1}^{n} \log p_i, \quad (p_1, \ldots, p_n) \in \triangle,$
defining $\triangle$ as the simplex $\{(p_1, \ldots, p_n) : \sum_{i=1}^{n} p_i = 1,\ 0 \le p_i \le 1,\ i = 1, \ldots, n\}$.
This definition is equivalent to treating each point of the sample as originating from a multinomial distribution with support given by the sample observations $\{x_i\}_{i=1}^{n}$, even though the density of $x_i$ is not multinomial. As this formulation does not involve any model and does not contain a parametric structure, it is essentially unrestrictive when employed in inference problems involving a parametric part with a finite number of parameters. The semi-parametric specification of this process was obtained by Owen (1991), who established the concept of empirical likelihood.
This formulation is important because it allows connections between the non-parametric estimation of the likelihood function and estimation using moment conditions, formulated through the principle of estimating equations and M-estimators, as shown by Qin and Lawless (1994); these estimating equations can be formulated by using moment conditions in the same way as GMM estimators.
Assuming moment conditions given by:

(4.2) $E[g(\theta, Y)] = \int g(\theta, y)\, d\mu_Y(y) = 0, \quad \theta \in \Theta \subset \mathbb{R}^k,$
where $\mu_Y$ is the distribution of the random variable $Y$, the estimation problem using moment conditions can be transformed into a non-parametric likelihood estimation through the construction of implicit probabilities $p_i$, and thus the log-likelihood function to be maximized becomes:
(4.3) $\ell_{NP}(p_1, \ldots, p_n) = \sum_{i=1}^{n} \log p_i, \quad \text{s.t.} \quad \sum_{i=1}^{n} g(\theta, y_i)\, p_i = 0.$
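For a fixed $\theta$ and a single scalar moment condition, the constrained problem (4.3) has the well-known dual solution $p_i = 1/(n(1 + \lambda g_i))$, with $\lambda$ chosen so that the weighted moment is exactly zero. The sketch below solves that dual by scalar root-finding; the moment condition and the data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def el_probabilities(g):
    """Implicit EL probabilities for scalar moment values g_i (the values
    must take both signs): p_i = 1 / (n (1 + lam * g_i)), with lam the
    root of sum_i g_i / (1 + lam * g_i) = 0."""
    g = np.asarray(g, dtype=float)
    n = len(g)
    # lam must keep every 1 + lam * g_i strictly positive
    lo = -1.0 / g.max() + 1e-8
    hi = -1.0 / g.min() - 1e-8
    lam = brentq(lambda l: np.sum(g / (1.0 + l * g)), lo, hi)
    return 1.0 / (n * (1.0 + lam * g))

rng = np.random.default_rng(2)
y = rng.standard_normal(200)
g = y - 0.2                  # moment values g_i = y_i - theta at theta = 0.2
p = el_probabilities(g)      # sums to one, sets the weighted moment to zero
```

As the text notes, observations whose moment values are closer to zero receive weights closer to uniform, while the constraint downweights the rest.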
The value that maximizes this expression is the maximum empirical likelihood estimate; it maximizes the empirical likelihood function of the process and simultaneously imposes the validity of the moment conditions. These implicit probabilities give more weight to observations where the moment conditions are closer to zero, and less weight to the other observations. Note that the generalized method of moments can be obtained as a particular case by assuming all weights to be $p_t = 1/n$.
This empirical likelihood formulation is particularly useful in the estimation of models with latent variables, where there is no way of evaluating the exact likelihood function of the process. Whereas the GMM estimator does not require assuming knowledge of the process likelihood, the empirical likelihood estimators use the information in the process distribution by means of its non-parametric estimation. This construction makes it possible to obtain efficiency properties in the semi-parametric sense defined by Bickel et al. (1993).
Note that, when the sample is not an IID process, it is necessary to modify the treatment given to the moment conditions. In this situation, the method is modified by assuming that the moment conditions originate from a process that is weakly dependent and possibly heteroskedastic. Anatolyev (2005) proposes to substitute $g(\theta, y_t)$ with a smoothed version defined as:
(4.4) $g^{w}(\theta, y_t) = \sum_{s=-m}^{m} w(s)\, g(\theta, y_{t-s}),$
where the $w(s)$ are weights obtained from a kernel function and adding to one, in the spirit of a HAC estimator (Andrews (1991)). This modification makes it possible to obtain the same first-order asymptotic efficiency conditions present in the GMM methods. The moment conditions are then as follows:
(4.5) $\sum_{t=1}^{T} p_t\, g^{w}(\theta, y_t) = 0.$
The GMM estimator is generally defined by the minimization of the quadratic form 3.6, and in the over-identified case not all the moment conditions are necessarily equal to zero at the estimated parameter value. In the empirical likelihood estimators formulated through moment conditions, these conditions are set exactly equal to zero using the weighting given by the empirical probabilities $p_t$. Note that in exactly identified models all the proposed estimators obtain similar results, because in all these estimators the moment conditions are always valid. An important result is that in over-identified models with valid moment conditions all these estimators attain the same asymptotic variance (e.g. Kitamura (2006)).
It is possible to formulate these empirical likelihood estimators as particular cases of the semi-parametric class of estimators based on the minimization of distances or, as defined by Bickel et al. (1993), estimators of generalized minimum contrast (GMC)². This formulation makes it possible to obtain the semi-parametric efficiency properties of this class of estimators. Note that we can also draw a parallel with the interpretation of the GMM estimator as
²See Bickel et al. (1993), chap. 7, for a general discussion of the regularity, existence and efficiency conditions of generalized minimum contrast estimators.
(4.9) $\hat{\theta}_n = \arg\min_{\theta, p_t} \sum_{t=1}^{T} h_T(p_t).$
In the case of empirical likelihood estimators, the point estimate $\hat{\theta}$ is the value which minimizes the discrepancy between $\hat{p}_t$ and uniform weights. An important result is that an adequate choice of the discrepancy function can lead to a unified representation of empirical likelihood and minimum contrast estimators. This representation can be obtained when the function $h_T(p_t)$ belongs to the Cressie-Read family of discrepancies given by:
Note that the estimation problem involves obtaining estimators not only for the implicit probabilities but also for the parameters of the parametric part of the model, which is, in principle, a high-dimensional optimization problem. Smith (2001) demonstrated that it is possible to define another estimator that also has these estimators as particular cases, and that makes possible a dual formulation of lower dimension.
The Smith (2001) Generalized Empirical Likelihood (GEL) estimate is obtained as the solution of the following saddlepoint problem:
(4.11) $\hat{\theta}_n = \arg\min_{\theta} \max_{\lambda} \left[ \frac{1}{T} \sum_{t=1}^{T} \rho\!\left( \lambda' g^{w}(\theta, y_t) \right) \right],$
where $\lambda$ defines Lagrange multipliers imposing the restriction:
(4.12) $\sum_{t=1}^{T} p_t\, g^{w}(\theta, y_t) = 0.$
Estimators are obtained by solving the previous equation with the first-order condition:
(4.13) $\sum_{t=1}^{T} p_t\, \lambda' \frac{\partial g^{w}(\theta, y_t)}{\partial \theta} = 0,$
with:

(4.14) $p_t = \frac{1}{T}\, \rho'\!\left( \lambda' g^{w}(\theta, y_t) \right).$
This generalized empirical likelihood estimator contains the empirical likelihood estimators, under the corresponding conditions on the Cressie-Read divergence function over $\gamma$, through modifications of the functions $h$ and $\rho$. The EL estimator is obtained with $h(p) = -\ln np$ and $\rho(\xi) = \ln(1 - \xi)$; the ET estimator (Kitamura and Stutzer (1997), Imbens et al. (1998)) with $h(p) = np \ln np$ and $\rho(\xi) = -\exp(\xi)$; and the continuous updating estimator with $h(p) = (np)^2$ and $\rho(\xi) = -(1 + \xi)^2/2$.³
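For the ET choice $\rho(\xi) = -\exp(\xi)$, the inner maximization over $\lambda$ in (4.11) reduces, for fixed $\theta$, to minimizing the sample mean of $\exp(\lambda' g_t)$, and the implicit probabilities are exponentially tilted weights. The following is a minimal sketch with a single scalar moment condition, an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def et_weights(g):
    """ET inner problem for scalar moment values g: minimize the sample
    mean of exp(lam * g) over lam; the implicit probabilities are the
    exponentially tilted weights, proportional to exp(lam * g_t)."""
    lam = minimize_scalar(lambda l: np.mean(np.exp(l * g))).x
    p = np.exp(lam * g)
    return p / p.sum(), lam

rng = np.random.default_rng(3)
g = rng.standard_normal(500) + 0.3   # moment values slightly off zero
p, lam = et_weights(g)               # tilting pulls the weighted mean to 0
```

The first-order condition of this convex scalar problem is exactly that the tilted weights set the weighted moment to zero, mirroring restriction (4.12).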
An additional class of estimators, which does not belong directly to the class of EL or minimum contrast estimators, but which is obtained by combining the empirical likelihood estimator and the ET estimator, is the ETEL estimator proposed by Schennach (2007). This estimator is defined as:
(4.15) $\hat{\theta} = \arg\min_{\theta}\; n^{-1} \sum_{i=1}^{n} \tilde{h}\!\left( \hat{p}_i(\theta) \right),$
where $\hat{p}_i(\theta)$ is the solution of:

(4.16) $\min_{\{p_i\}_{i=1}^{n}}\; n^{-1} \sum_{i=1}^{n} h(p_i)$
subject to $\sum_{i=1}^{n} p_i\, g(\theta, y_i) = 0$ and $\sum_{i=1}^{n} p_i = 1$, with $\tilde{h}(\hat{p}_i) = -\ln(n p_i)$ and $h(p_i) = n p_i \ln(n p_i)$.
Note that the ETEL estimator employs the ET method to find the probabilities $\hat{p}_i(\theta)$, and the EL method to estimate the parameter vector $\hat{\theta}$. These probabilities are related to the multipliers $\lambda$ by the relation:
(4.17) $\hat{p}_t(\theta) = \dfrac{\exp\!\left( \hat{\lambda}(\theta)' g(\theta, y_t) \right)}{\sum_{i=1}^{n} \exp\!\left( \hat{\lambda}(\theta)' g(\theta, y_i) \right)}.$
An important property of the estimators of the ETEL class is their behavior in the presence of incorrect specification. Imbens et al. (1998) point out that the EL estimator can display inadequate behavior in the presence of incorrect specification due to a singularity in its influence function and, according to Theorem 1 in Smith (2001), the asymptotic properties of the EL estimator can be severely weakened even in the presence of minor specification problems. This also affects the estimation of the implicit probabilities because, in the presence of specification problems, the implicit probabilities in likelihood problems tend to concentrate on the extreme observations, in opposition to what is expected of a robust estimator in the sense of Huber (1981) and Hampel et al. (1986), which should minimize the importance of extreme observations in the construction of an estimator.
We will now summarize some common properties of the estimators discussed in this study. The first property is that all the estimators employed (two-stage GMM, iterative GMM, continuously updated GMM, GEL, ET, and ETEL) have the same consistency and first-order asymptotic efficiency properties (e.g. Smith (2001), Schennach (2007)), and under valid moment conditions all the estimators have the same asymptotic variance. However, their performance in finite samples can be quite different. The two-stage GMM estimator can be severely biased at the sample sizes employed in economics and finance, and continuous updating estimators are numerically unstable due to the presence of multiple modes in the objective function (e.g. Hansen et al. (1996)). Another interesting property is that estimators based on GMC and GEL are invariant to linear transformations of the vector of moment conditions, which does not occur for the two-stage GMM estimator. Estimators based on generalized empirical likelihood/minimum contrast are efficient in the semi-parametric sense of Bickel et al. (1993), and have superior properties in terms of higher-order asymptotic bias. These estimators also present optimal properties in hypothesis testing. As demonstrated by Kitamura (2006), these tests are optimal under the minimax and large-deviations criteria, and are uniformly more powerful in the generalized Neyman-Pearson sense.
A fundamental point is that, for the EL and minimum contrast estimators based on the Cressie-Read discrepancy, the bias in finite samples does not grow with the number of moment conditions used. This property makes it possible to obtain the efficiency of the estimators through the use of a large number of moment conditions, without implying an increase in finite-sample bias, as occurs with the GMM estimator; that growth in bias is what leads to the inferior finite-sample performance of the GMM in comparison with other forms of estimation.
The result obtained by Smith (2001) is that, within the class of minimum contrast/empirical likelihood estimators, the only estimator with adequate behavior in the presence of specification problems is the ET estimator, because its influence function does not present singularities. The ETEL estimator is a combination of the ET estimator and the EL estimator, and it maintains
the EL estimator's characteristics of asymptotic efficiency and minimum bias. Additionally, it inherits robustness in the presence of specification problems, due to the use of the ET estimator to estimate the implicit probabilities, as demonstrated by Theorems 8-10 in Smith (2001), which prove that this estimator is $\sqrt{n}$-convergent even in the presence of specification problems.
Estimators for the parameters of the parametric part of the model and for the implicit probabilities can be obtained by numerical optimization or via quasi-Newton iterative methods. These methods can be formulated as a problem of smaller dimension using a dual formulation (Kitamura (2006)), through numerical optimization employing the Lagrange multipliers defined by equations 4.11 and 4.17, which is the general form used in this study.
Note that in the estimation of SV models we are subject to the same problem of non-differentiable moment conditions due to the use of absolute moments. This problem impedes the simple use of the iterative methods for the estimation of Lagrange multipliers proposed by Kitamura (2006), and thus, in these cases, we need to use the same techniques of numerical optimization with interpolation in the vicinity of the discontinuity points discussed for the estimation by GMM.
Table 1. Reference SV Model, Sample Size 500 - α = -0.736, β = .9, σ = .3629, T = 500
it is possible to compare the results obtained with other estimation methodologies. The results obtained are directly comparable with those analyzed in Takada (2009), who proposed an estimator for SV models employing simulated Minimum Hellinger Distances, comparing this method with other methodologies, such as the efficient method of moments (EMM), MCMC, and Monte Carlo maximum likelihood.
Table 1 in Takada (2009) shows the MSE results of these estimators for the first parameter vector studied, for a sample of size 500. A direct comparison with the results presented in that table indicates that the estimators based on GEL/GMC are superior, in terms of MSE, to the following methods: SMHD (Simulated Minimum Hellinger Distance), EMM (Efficient Method of Moments) and MCMC. They also have a performance superior or equivalent to the MCML (Monte Carlo Maximum Likelihood) estimators by the mean squared error criterion. In comparison with the results of that article, we notice that the results of all the estimators based on GEL/GMC are superior to all these methods, except for
Figure 5.1. MSE and MAE of the estimation of the reference models with sample size 500 and 24 moment conditions. [Panels: Experiment 1 with α = -.736, γ = .9, σ² = .3629 and with α = -.1472, γ = .98, σ² = .1657; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
the estimation of α, where the estimators obtain a mean squared error equal to that of the MCML estimator.
In this comparison it is important to notice that the GEL/GMC estimators do not require a Monte Carlo simulation procedure, and are computationally simpler than these methods, indicating that the use of EL and MC makes it possible to obtain superior properties in finite samples when compared with the methods so far considered the most efficient in SV model estimation, with a noticeably smaller computational and implementation cost.
5.1. Effect of Sample Size and Set of Instruments. In order to verify the effect of the sample size on the estimators' performance, we carried out an analysis with the estimation of the parameter vectors studied with samples of size 250 (Tables 4, 5 and 6) and 1,000 (Tables 7, 8 and 9), employing the 24 moment conditions defined by equation 2.3. As expected, the increase in the sample size decreases the MSE and MAE of all the estimators, but with different effects for each parameter configuration and estimation method. Summarizing these results, we show in Figure 5.2 the relative efficiency, defined as the ratio between the MSE for the sample of size 250 and the MSE for the sample of size 1,000, for each configuration.
Except for the GEL estimator in parameter configuration 2, with an efficiency ratio below one, there is a real gain in terms of MSE for all the parameters. This particular result for the GEL estimator in this configuration can be explained by the greater convergence difficulty
noted in this particular configuration, but it is important to note that, in the version with smoothed moments, this estimator behaves as expected.
As can be seen in Figure 5.2, the sample size has heterogeneous effects on each estimator, depending on the parameter configuration. The estimators based on GEL/GMC with smoothed moments have greater gains in the configuration with smaller persistence, while those based on GMM behave in the opposite way. This result can be interpreted by the fact that the smoothing of moments is more efficient when the volatility persistence is smaller.
As previously discussed, the main theoretical motivation for the use of estimators based on GEL/GMC lies in the possibility of using a larger number of moment conditions to achieve a more efficient estimation, since the finite-sample bias of these methods does not grow with the number of moment conditions, as occurs with GMM estimators. In order to verify this property, we employ a new estimation with a subset of the moment conditions vector, now working with only 14 moment conditions, according to Eq. 2.4, instead of the original 24 moment conditions given by Eq. 2.3.
The results of this comparison are displayed in Tables 10, 11 and 12, and the comparisons between estimators employing MSE and MAE with the use of 14 moment conditions are placed in Figure 5.3. We can note that in this configuration the GEL/GMC estimators still display a performance superior to those based on GMM, but the advantage is not as large as in the configuration with 24 moment conditions, which supports the conjecture of a superior use of the moment conditions, in terms of bias and variance, by the estimators of the GEL/GMC class.
Figure 5.4 presents the relative efficiency between the MSE using 14 moments and the estimator with 24 moments. For the GMM estimators, the efficiency presents modest increases or reductions when increasing the number of instruments, similarly to the results obtained in the studies by Andersen and Sorensen (1996). However, there are, in general, very significant efficiency gains in MSE for the estimators based on GEL/GMC, reaching values over 200 times in the
[Figure 5.4: relative efficiency of the MSE with 14 versus 24 moment conditions, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
5.2. Student-t Distribution (4) in the mean innovations. As previously discussed, although the log-normal SV model is defined by moments of a log-normal distribution, it can be interpreted in a semi-parametric form as an autoregressive model for the exponential of
the volatility process, without the need for a distributional specification of the innovation processes (e.g. Francq and Zakoïan (2006), Renault (2009)). However, as we are employing theoretical moments that assume a distributional specification of the innovations in the construction of the moment conditions, it is important to verify whether alternative specifications significantly alter the finite-sample properties of the estimators. It is particularly interesting to verify whether, consistently with what is observed for financial series, heavy-tailed processes affect these estimators.
The first analysis undertaken was to replace the standard Gaussian distribution in the innovations of the mean process with a Student-t distribution with 4 degrees of freedom. This choice was purposely made with the aim of verifying the effect of a distribution with heavier tails on the estimation of SV models. Note that, as we are employing higher moments, the heavy-tail effect can be magnified in the estimation, since each observation is now raised to powers of second, third and fourth order. We use 4 degrees of freedom in particular so as to have a distribution with non-finite kurtosis and, consequently, a robustness test under extreme conditions.
Tables 13, 14 and 15 show the results of this experiment using 24 moment conditions, and Tables 16, 17 and 18 the results using 14 moment conditions. It can be seen that in this situation the estimators based on GMC/GEL clearly maintain their dominance over the estimators based on GMM, as becomes more evident in Figures 5.5 and 5.6, which show the MSE and MAE of each estimator; once again the GEL/GMC-based estimators have the best performance in this situation.
In order to verify whether in this case it is still advantageous to work with a larger set of instruments, Figure 5.7 shows the ratio of the MSEs between the estimators with 14 and 24 moment conditions. The results show that in this situation the increase in the number of
Figure 5.3. MSE and MAE of the estimation of the reference models with sample size 500 and 14 moment conditions. [Panels: Experiment 1 - Subset Instruments, with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
instruments can impair the performance of the estimators, an effect that occurs both for the GMM estimators and for the GEL/GMC estimators, although it is heterogeneous across configurations and parameters. In the situation of lower persistence, it is advantageous to work with the larger number of instruments for the GEL/GMC estimators, but this result is not maintained in the other parameter configurations; particularly in the configuration with high persistence, the use of the larger set of instruments causes an almost general degradation in the performance of all the methods.
[Figure 5.7: ratio of the MSEs between the estimators with 14 and 24 moment conditions, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
5.3. Student-t Distribution (4) in the volatility innovations. In the next experiment, we modified the data-generating process, assuming now that the innovation process in the volatility equation is given by a Student-t process with 4 degrees of freedom, keeping in this case the usual assumption of Gaussian innovations in the mean equation. Note that, in this configuration, the effects are expected to be more harmful, since now the effect of heavier tails is directly spread by the volatility equation's autoregressive structure, unlike the previous case
Figure 5.5. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovations in the mean equation. Sample size 500 and 24 moment conditions. [Panels: Experiment 2 with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
where the heavy-tailed innovations affected the mean equation, which was a process without correlation.
Tables 19, 20 and 21 show the results obtained with 24 moment conditions, and Tables 22, 23 and 24 show the results obtained with 14 moment conditions. These results are summarized in Figures 5.8 and 5.9. We note that these heavier-tailed innovations effectively damage the performance of the GMM-based estimators, and moderately damage the GEL-based estimators. In this experiment, the robustness properties of the methods based on ET and ETEL become evident, and these methods generally have a performance superior to the other methods. For example, the ratio between the MSE for α estimated by iterative GMM and by the smoothed ETEL method is 5102.984 for the first parameter configuration, showing the dominance of these methods in this situation of incorrect specification. As previously discussed, this robustness property derives from the bounded influence function of the estimators based on ET, and it proves to be quite important in this situation. As financial data are characterized by heavy tails, we have an additional justification for the use of the estimators proposed in this study.
Likewise, we can verify the effects of the number of moment conditions used in this configuration. Figure 5.10 shows the relative efficiency effects for the estimators obtained with the increase in the number of instruments from 14 to 24. In this configuration, however, we have mixed results, because for the first parameter configuration there is a general gain in the
Figure 5.6. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovations in the mean equation. Sample size 500 and 14 moment conditions. [Panels: Experiment 2 - Subset Instruments, with α = -.736, γ = .9, σ² = .3629 and with α = -0.368, γ = .95, σ² = .26; bars show MSE and MAE of α, γ and σ² for GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL.]
estimators (more noticeably for the estimators based on GEL/GMC), but for the other configurations there are losses, particularly in the estimation of the volatility parameter σ in the second configuration.
5.4. Experiment 4 - Level Outlier. In order to verify the effects of aberrant observations (outliers) on the estimation of the stochastic volatility process, we undertook two classes of experiments. In this part of our study we verify the effect of the so-called level outliers (in Hotta and
[Figure 5.10: relative efficiency of the estimators with the increase in the number of instruments from 14 to 24, by estimator (GMM2S, GMMITER, GMMCUE, GEL, ET, ETEL, SGEL, SET, SETEL).]
Tsay (1998)'s nomenclature) in the estimation of SV models. In this experiment the generating process is given by:
(5.1) y_t = σ_t ε_t + LO_t
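To make the design concrete, the experiment in equation (5.1) can be reproduced with a short simulation. The sketch below is illustrative, not the authors' code: it uses the log-normal SV specification h_t = α + γ h_{t−1} + σ_η η_t with σ_t = exp(h_t/2) and the first parameter configuration from the figures; the outlier position and magnitude (`outlier_pos`, `outlier_size`) are hypothetical choices.

```python
import numpy as np

def simulate_sv_level_outlier(n=500, alpha=-0.736, gamma=0.9, sigma2=0.3629,
                              outlier_pos=250, outlier_size=10.0, seed=0):
    """Simulate y_t = exp(h_t / 2) * eps_t + LO_t with log-volatility
    h_t = alpha + gamma * h_{t-1} + sigma_eta * eta_t and a single
    additive level outlier LO at position outlier_pos."""
    rng = np.random.default_rng(seed)
    sigma_eta = np.sqrt(sigma2)
    h = np.empty(n)
    h[0] = alpha / (1.0 - gamma)          # stationary mean of h_t
    for t in range(1, n):
        h[t] = alpha + gamma * h[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(h / 2.0) * rng.standard_normal(n)
    lo = np.zeros(n)
    lo[outlier_pos] = outlier_size        # the level outlier LO_t
    return y + lo, h

y, h = simulate_sv_level_outlier()
print(y.shape)
```

Replications of this generator, with the estimators applied to each simulated path, produce the kind of MSE/MAE comparison reported in the figures.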
Figure 5.8. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovation in the volatility equation. Sample size 500 and 24 moment conditions.
[Figure 5.8 panels: Experiment 3, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Figure 5.9. MSE and MAE of the estimation of the reference models, modified with Student-t with 4 d.f. innovation in the volatility equation. Sample size 500 and 14 moment conditions.
[Figure 5.9 panels: Experiment 3 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
instruments represent a loss in performance in most cases, particularly for the estimation of parameter σ.
5.5. Experiment 5 - Volatility Outlier. In the last specification tested, we verified the effect of a so-called volatility outlier (as named by Hotta and Tsay (1998)) in the estimation. In this experiment, the data generating process is given by:
(5.3) y_t = σ_t ε_t
[Figure panels: efficiency gain in the MSE of σ for each estimator.]
directly transmitted by the autoregressive structure in the volatility equation, whereas the effect was indirect in the case of an innovation outlier.
Figure 5.11. MSE and MAE of the estimation of the reference models, modified with Level Outlier. Sample size 500 and 24 moment conditions.
[Figure 5.11 panels: Experiment 4, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Tables 31, 32 and 33 (estimation with 24 moments) and 34, 35 and 36 (estimation with 14 moments) show the results of the estimations, which can be summarized by Figures 5.14 and 5.15 with the MSE and MAE results. As in previous experiments, the GEL/GMC-based estimators have in general a superior performance in comparison with the GMM-based methods, and show that the same robustness properties remain valid in this volatility outlier situation, which would be potentially more serious for the estimation of volatility parameters.
The effect of the larger number of instruments in this situation can be seen in Figure 5.16, which indicates that there is an efficiency gain with a higher number of instruments in the
Figure 5.12. MSE and MAE of the estimation of the reference models, modified with Level Outlier. Sample size 500 and 14 moment conditions.
[Figure 5.12 panels: Experiment 4 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
situation with low persistence; however, for situations with higher volatility persistence, the additional instruments generally present a noticeable deterioration in the estimators' MSE.
6. Conclusions
In this study we discussed the estimation of SV models using estimators based on generalizations of the empirical likelihood and minimum contrast methods. The performance of these estimators, as shown by a set of Monte Carlo experiments, proved to be superior to the estimation methods based on the generalized method of moments, and also superior to the
Figure 5.13. Relative efficiency in the reference models with level outlier - effect of the number of moment conditions (MSE 14 moment conditions / MSE 24 moment conditions). Sample size 500.
methods based on simulation such as MCMC and Monte Carlo maximum likelihood as studied in Takada (2009).
The results obtained in this study are consistent with those obtained by other studies (e.g. Newey and Smith (2004)), which demonstrate that alternative estimators based on moments, formulated as GEL/GMC-based estimators, display superior performance, nullifying the bias problems occurring in the usual GMM estimators. The proposed estimators manage to obtain superior properties in finite samples by a better use of the informational content present in the moment conditions, since the higher efficiency is obtained not only by means of weighting by
the estimators' variance - as in the case of GMM estimators - but also by the non-parametric estimation of the likelihood function of the process, as discussed in Antoine et al. (2007). Another related property lies in the fact that the bias of these estimators does not grow with the number of moment conditions, as happens in the case of GMM estimators. Thus, it is possible to obtain efficiency properties by using an adequate number of moment conditions.
This characteristic can be particularly important in the estimation of multivariate SV models, in which the number of moment conditions is proportional to the number of series studied. As the estimation of multivariate SV models still represents a great computational challenge (e.g. Chib et al. (2009)), estimation by methods based on empirical likelihood/minimum contrast can be an efficient alternative to be explored.
These results are particularly interesting because the implementation of the methods discussed in this study is computationally simpler than the implementation of methods based on simulation, requiring only the specification of the moment conditions of stochastic volatility
Figure 5.14. MSE and MAE of the estimation of the reference models modified with volatility outlier. Sample size 500 and 24 moment conditions.
processes. Although this study is based on the specification of the log-normal SV model, it is important to note that this procedure can be generalized by using the methodology proposed by Meddahi (2001), which makes possible the automatic generation of moment conditions for processes that belong to the so-called SV-eigenfunctions family.
Another important characteristic is related to robustness properties and specification problems, particularly of the methods based on ET, which, due to properties of their influence function, manage to be √n-consistent even in the presence of specification problems. This property is particularly important in the presence of processes with heavy-tailed innovations, as verified in this study by the use of a Student-t distribution with non-finite kurtosis, or else in the presence of level or volatility outliers.
References
Anatolyev, S.: 2005, GMM, GEL, serial correlation and asymptotic bias, Econometrica 73, 983-1002.
Andersen, T.: 1994, Stochastic autoregressive volatility: A framework for volatility modelling, Mathematical Finance 4, 75-102.
Andersen, T. and Sorensen, B.: 1996, GMM estimation of a stochastic volatility model: A Monte Carlo study, Journal of Business and Economic Statistics 14(3), 328-352.
Figure 5.15. MSE and MAE of the estimation of the reference models modified with volatility outlier. Sample size 500 and 14 moment conditions.
[Figure 5.15 panels: Experiment 5 - Subset Instruments, α = −0.736, γ = 0.9, σ² = 0.3629 (left) and α = −0.368, γ = 0.95, σ² = 0.26 (right); bars show the MSE and MAE of α, γ and σ² for each estimator.]
Andrews, D. W. K.: 1991, Heteroskedasticity and autocorrelation consistent covariance matrix estimation, Econometrica 59, 817-858.
Antoine, B., Bonnal, H. and Renault, E.: 2007, On the efficient use of the informational content of estimating equations: implied probabilities and euclidean empirical likelihood, Journal of Econometrics 138, 461-487.
Barndorff-Nielsen, O. E., Nicolato, E. and Shephard, N. G.: 2002, Some recent developments in stochastic volatility modelling, Quantitative Finance 2, 11-23.
Bickel, P., Klaassen, C., Ritov, Y. and Wellner, J.: 1993, Efficient and Adaptive Estimation for Semiparametric Models, Johns Hopkins Press.
Bollerslev, T.: 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307-327.
Broto, C. and Ruiz, E.: 2004, Estimation methods for stochastic volatility models: A survey, Journal of Economic Surveys 18(5), 613-649.
Chib, S., Omori, Y. and Asai, M.: 2009, Handbook of Financial Time Series, Springer, chapter Multivariate Stochastic Volatility Models, pp. 365-402.
DasGupta, A.: 2008, Asymptotic Theory of Statistics and Probability, Springer.
Engle, R. F.: 1982, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50, 987-1007.
Francq, C. and Zakoïan, J. M.: 2006, Linear-representation based estimation of stochastic volatility models, Scandinavian Journal of Statistics 33, 785-806.
Figure 5.16. Relative efficiency in the reference models modified with volatility outlier - effect of the number of moment conditions (MSE 14 moment conditions / MSE 24 moment conditions). Sample size 500.
[Figure 5.16 panels: efficiency gain in the MSE of α for each estimator.]
Gallant, R. A. and Tauchen, G.: 1996, Which moments to match, Econometric Theory 12(4), 657-681.
Geweke, J.: 1994, Bayesian comparison of econometric models. Federal Reserve of Minneapolis Working Paper.
Ghysels, E., Harvey, A. C. and Renault, E.: 1996, Statistical Methods in Finance, North Holland, chapter Stochastic Volatility, pp. 221-238.
Gourieroux, C. A., Monfort, A. and Renault, E.: 1993, Indirect inference, Journal of Applied Econometrics 8, 85-118.
Hall, A.: 2005, Generalized Method of Moments, Oxford University Press.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. and Stahel, W. A.: 1986, Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons.
Hansen, L. P.: 1982, Large sample properties of Generalized Method of Moments estimators, Econometrica 50, 1029-1054.
Hansen, L. P., Heaton, J. and Yaron, A.: 1996, Finite sample properties of some alternative GMM estimators, Journal of Business and Economic Statistics 14, 262-280.
Harvey, A. C., Ruiz, E. and Shephard, N. G.: 1994, Multivariate stochastic variance models, Review of Economic Studies 61, 247-264.
Hotta, L. and Tsay, R.: 1998, Outliers in GARCH processes. Working Paper, Graduate School of Business, University of Chicago.
Singleton, K. J.: 2006, Empirical Dynamic Asset Pricing, Princeton University Press.
Smith, R. J.: 2001, GEL criteria for moment condition models. Working Paper, University of Bristol.
Takada, T.: 2009, Simulated minimum Hellinger distance estimation of stochastic volatility models, Computational Statistics and Data Analysis 53, 2390-2403.
Taylor, S. J.: 1986, Modelling Financial Time Series, John Wiley & Sons.
White, H.: 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
INDIRECT INFERENCE IN FRACTIONAL SHORT-TERM INTEREST RATE
DIFFUSIONS
IMECC-UNICAMP
Abstract. In this article we discuss the estimation of continuous time interest rate models
driven by fractional Brownian motion (fBm) using discrete data. In the presence of a fractional
Brownian motion, usual estimation methods for continuous time models using discrete data are
not appropriate since in general fBm is neither a semimartingale nor a Markov process. In this
This version - October 2009. Address - Insper Institute - Rua Quatá 300, 04546-042, São Paulo, SP, Brasil. email - Márcio Laurini - marciopl@isp.edu.br - Luiz Koodi Hotta - hotta@ime.unicamp.br.
1. Introduction
The use of continuous time models in finance, which started with the seminal work of Bachelier (1900), allows the use of probability and stochastic process theory for the determination of asset prices. The principle of no-arbitrage pricing, introduced in Harrison and Kreps (1979) and Harrison and Pliska (1981), can be summarized as the imposition of a set of restrictions on stochastic processes measured in continuous time which does not allow the existence of risk free profits.
Delbaen and Schachermayer (1994) show that no-arbitrage pricing is only possible for processes known as semimartingales, and processes excluded from this class cannot be used as innovation processes in financial asset modeling. However, recent articles (discussed in Section 2) show that in the presence of transaction costs and restrictions on the set of admissible strategies for the agents, processes more general than semimartingales can be used as price processes in finance while remaining consistent with the no-arbitrage principle.
A particular result in these articles is that a special stochastic process would be consistent with no-arbitrage under these conditions - the process known as fractional Brownian motion (fBm in the text). This is a generalization of Brownian motion which allows the possibility of dependent increments and long memory. Since the increments of this process are dependent, it is not a Markov process; and, apart from a particular case where the process reduces to the standard Brownian case, this process is not a semimartingale.
In this article we discuss the implications of fractional Brownian motions for the estimation of stochastic differential equations using discretely sampled data. In this situation, most of the estimators proposed for continuous time models using discrete data cannot be applied, since the violation of the Markov property prevents the construction of closed form likelihood functions. However, we can use the principle of indirect inference (Gourieroux et al. (1993)) to build an estimator for stochastic differential equations based on fractional Brownian motion. The principle of indirect inference uses an auxiliary model based on an approximate and analytically tractable specification of the model. The correction of the inconsistency generated by the incorrect specification is made by Monte Carlo simulations. The principle of indirect inference can be used in this context of non-Markovian/non-semimartingale stochastic differential equations since it does not demand the exact likelihood function of the process, which is of infinite dimension in the fBm case.
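The indirect inference loop just described can be sketched in a few lines. This is a stylized toy example, not the estimator developed in this article: the "structural" model is a simple AR(1) standing in for a discretized diffusion, the auxiliary estimator is the OLS slope, and the binding function is approximated by averaging the auxiliary estimate over simulated paths.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_model(theta, n, rng):
    """Structural model (a stand-in for a discretized diffusion):
    AR(1), x_t = theta * x_{t-1} + e_t."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = theta * x[t - 1] + rng.standard_normal()
    return x

def auxiliary_estimate(x):
    """Auxiliary model: OLS slope of x_t on x_{t-1} (analytically
    tractable, possibly misspecified for the true model)."""
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# "observed" data generated from the true parameter
theta_true = 0.8
beta_data = auxiliary_estimate(simulate_model(theta_true, 1000, rng))

def binding(theta, S=20):
    """Binding function approximated by simulation: the mean auxiliary
    estimate over S paths, with common random numbers across thetas."""
    r = np.random.default_rng(123)
    return np.mean([auxiliary_estimate(simulate_model(theta, 1000, r))
                    for _ in range(S)])

# indirect inference: theta whose binding value is closest to beta_data
grid = np.linspace(0.5, 0.95, 46)
theta_hat = grid[np.argmin([(binding(th) - beta_data) ** 2 for th in grid])]
print(round(theta_hat, 2))
```

The grid search would in practice be replaced by a numerical optimizer, and the auxiliary model by the GMM specification discussed later; the mechanics of matching auxiliary estimates by simulation are the same.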
In this article, we show how to implement the indirect inference principle for stochastic dif-
ferential equations driven by fractional Brownian motion, discussing with special attention the
170
INDIRECT INFERENCE IN FRACTIONAL SHORT-TERM INTEREST RATE DIFFUSIONS 3
computational difficulties in the implementation of this estimator. As a practical application, we estimate Cox-Ingersoll-Ross models (Cox et al. (1985)) driven by a fractional Brownian motion for a set of interest rate series studied in the literature.
This article is organized as follows: in Section 2 we present a short review of the connections between the principle of no-arbitrage and semimartingales; in Section 3, we give a brief description of some properties of the fractional Brownian motion. In Section 4 we review the literature on the estimation of stochastic differential equations, and discuss the limitations of the existing estimators in the presence of an fBm. Section 5 shows evidence, obtained by Monte Carlo simulation, of the effects on usual estimators for diffusions under the influence of a fractional Brownian motion. In Section 6 we describe the proposed indirect inference estimator and the computational difficulties involved. Section 7 shows the properties of the indirect inference estimator and the GMM auxiliary model by Monte Carlo simulations. Section 8 shows the real data applications using the US short rate data, a Eurodollar rate and a series of Canadian short term interest rates. In Section 9 we present our final remarks. Sections 2 and 3 can be skipped by readers more experienced with the ideas of no-arbitrage pricing and the fractional Brownian motion.
The principle of pricing by no-arbitrage (Harrison and Kreps (1979), Harrison and Pliska (1981)) states that the price of an asset can be calculated as the price of a replicating portfolio that reproduces the discounted payoff of the stochastic process of interest. We can formalize1 this principle by defining a probability space (Ω, F, (F_t)_{t≥0}, P) with d + 1 assets: a particular asset B = (B_t)_{t≥0}, a bank account representing the risk free asset, F_{t−1}-measurable, and a vector S = (S^1, ..., S^d) of risky asset prices with dimension d, with each S^i = (S^i_t)_{t≥0} F_t-measurable. We can define the portfolio value (X^π_t)_{t≥0} at time t by the expression:
(1) X^π_t = β_t B_t + γ_t S_t.
We define a strategy π = (β, γ) as the choice, at every moment in time, of holdings in the risk free and risky assets. A strategy π is a self-financing strategy if it can be written as:
1This explanation follows Shiryaev (1999); for further mathematical details see Delbaen and Schachermayer (2006).
(2) X^π_t = X^π_0 + Σ_{k=1}^{t} (β_k ΔB_k + γ_k ΔS_k).
In no-arbitrage pricing we usually choose one of the assets as numeraire, and the discounted portfolio value X̃^π_t = X^π_t / B_t satisfies the relationship:
(3) Δ(X^π_t / B_t) = γ_t Δ(S_t / B_t).
A self-financing strategy represents an arbitrage opportunity if, for an initial capital X^π_0 = 0, we have X^π_t ≥ 0 (P-a.s.) and P(X^π_t > 0) > 0, so that the expected portfolio value satisfies EX^π_t > 0; this is equivalent to saying that the investment is a risk free profit.
Defining SF_arb as the class of self-financing strategies with arbitrage opportunities, we say that a market is arbitrage free if SF_arb = ∅, that is, if P(X^π_N = 0) = 1 whenever X^π_0 = 0. The main result of no-arbitrage pricing, known as the Fundamental Theorem of Asset Pricing2, states that a market is free of arbitrage if and only if there exists (at least one) probability measure Q equivalent to the measure P such that the discounted sequence:
(4) S̃_t = S_t / B_t
satisfies
(5) E_Q [S_t / B_t] < ∞
and
(6) E_Q [S_t / B_t | F_{t−1}] = S_{t−1} / B_{t−1}.
Equation 6 shows the main property, which is related to the concept of Market Efficiency - the portfolio expected value in a future period n, given the information in F_{n−1}, is the observed
2This definition is mathematically informal. In the rigorous statement of the Fundamental Theorem of Asset Pricing (Delbaen and Schachermayer (1994)), the existence condition for an Equivalent Martingale Measure is the validity of the condition known as No Free Lunch with Vanishing Risk. No Arbitrage and No Free Lunch with Vanishing Risk are equivalent when the sample space Ω is finite.
portfolio value in period n−1, and thus variations in the portfolio value cannot be predicted in a systematic way. If this principle is not valid, there is an arbitrage opportunity, since it is possible to create an arbitrage strategy π = (β, γ) using the predictability of the risky asset S_n.
General asset pricing by martingale methods follows this general procedure - the idea is to verify whether an Equivalent Martingale Measure exists, in other words, whether there is a change of measure that generates a martingale process, usually using the Girsanov Theorem. The most common way of obtaining an equivalent martingale measure is changing the drift of the diffusion process. Given a diffusion process:
(7) dX_t(ω) = f(X_t(ω)) dt + σ(X_t(ω)) dW_t(ω)
under the measure P, we can set a new measure Q using the Girsanov Theorem:
(8) dQ/dP (ω) = exp( − (1/2) ∫_0^t [ (f*(X_s(ω)) − f(X_s(ω))) / σ(X_s(ω)) ]^2 ds + ∫_0^t [ (f*(X_s(ω)) − f(X_s(ω))) / σ(X_s(ω)) ] dW_s(ω) ).
So Q is equivalent to P (they share the same sets of null measure). Furthermore:
(9) dW*_t(ω) = − [ (f*(X_t(ω)) − f(X_t(ω))) / σ(X_t(ω)) ] dt + dW_t(ω),
where f*(X_t(ω)) and W*_t(ω) denote the drift and the Brownian motion under the measure Q.
No-arbitrage pricing tries to find restrictions on the diffusion process that make the drift term f*(X_t(ω)) under the measure Q equal to zero. If such a measure exists and is unique, then there is only one no-arbitrage price and the market is complete3.
3This is an informal definition of the Second Fundamental Asset Pricing Theorem - see Shiryaev (1999) and Delbaen
and Schachermayer (2006).
The formal statement for infinite dimensional spaces Ω is known as the First Fundamental Asset Pricing Theorem, which says that no-arbitrage pricing is possible if and only if the discounted strategies under the measure Q are semimartingale processes (e.g. Delbaen and Schachermayer (1994)). Semimartingales are stochastic processes that can be decomposed as:
X_t = M_t + A_t,
where M_t is a local martingale and A_t is a càdlàg predictable process with finite variation, conditioned on a filtration F_{t−1}4.
The result in Delbaen and Schachermayer (1994) is clear - no-arbitrage pricing is only possible for semimartingale processes. Thus, more general classes of processes that are not semimartingales cannot, in principle, be used as price processes in finance. This limitation can be very restrictive, since there are processes that are not semimartingales with interesting features (for example, some form of dependence in the increments of the process) which could be used as price processes in finance.
A process with interesting characteristics for representing prices is the fractional Brownian motion (fBm)5. The fractional Brownian motion, introduced in Kolmogorov (1940) and formalized by Mandelbrot and van Ness (1968), is the simplest stochastic process in continuous time with long memory.
In this process the increments are stationary but not independent, and thus it is not a Markov process. To define an fBm, consider a probability space (Ω, F, P); a normalized fBm B_H(t) is characterized by its covariance structure:
(11) E(B_H(t) B_H(s)) = (1/2) ( |s|^{2H} + |t|^{2H} − |t − s|^{2H} ),  0 < H < 1.
In (11) the coefficient H is known as the Hurst coefficient, in homage to the British climatologist H. E. Hurst, one of the first researchers to study the long dependence phenomenon.
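A direct (if O(n³)) way to simulate an fBm path is to draw from the Gaussian vector with the covariance in (11) via a Cholesky factorization. The sketch below is illustrative only; the grid size and Hurst value are arbitrary choices.

```python
import numpy as np

def fbm_cholesky(n=200, H=0.7, T=1.0, seed=0):
    """Simulate fractional Brownian motion on n grid points in (0, T] by
    factorizing the covariance of equation (11):
    E[B_H(t) B_H(s)] = (|t|^{2H} + |s|^{2H} - |t - s|^{2H}) / 2."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H)
                 - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov)   # cov is positive definite for 0 < H < 1
    return t, L @ rng.standard_normal(n)

t, b = fbm_cholesky()
print(b.shape)
```

For H = 1/2 this reduces to a standard Brownian motion; for H > 1/2 the sampled increments are positively correlated, which is the long memory feature exploited later in the article.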
The fBm process has the property called self-similarity, which means that for a given a > 0 we have:
(12) F(B_H(at)) = F(a^H B_H(t)),
where F represents the distribution of the process; we can read this property as changes in the time scale having the same effect as changes in the spatial scale6.
The representation of fBm defined in Mandelbrot and van Ness (1968) uses stochastic integrals with respect to the Brownian motion W = (W_t)_{t∈R}, with W_0 = 0. For 0 < H < 1 the fBm process B_H(t) is given by:
(13) B_H(t) = c_H ( ∫_{−∞}^{0} [ (t − s)^{H−1/2} − (−s)^{H−1/2} ] dW_s + ∫_{0}^{t} (t − s)^{H−1/2} dW_s ),
with normalization constant c_H = sqrt( 2H Γ(3/2 − H) / ( Γ(H + 1/2) Γ(2 − 2H) ) ), where Γ is the Gamma function.
Recall some important fBm properties:
0 < sup_Π Σ_{t_i ∈ Π} E [ |B_H(t_{i+1}) − B_H(t_i)|^{1/H} ] < ∞.
In financial applications and in the estimation of diffusion processes, property 9 is the most important. Excluding the case H = 1/2, the Equivalent Martingale Measure construction does not work for fBm, since the process is not a semimartingale and, as seen in Delbaen and Schachermayer (1994), its presence should represent an arbitrage situation.
The existence of arbitrage in the presence of fBm was initially shown by Rogers (1997). The intuition behind the demonstration in Rogers (1997) is that we can create an arbitrage strategy by setting a long (short) position when the dependence structure in the increments indicates an increasing (decreasing) motion in the asset price; and this strategy would be exploited by transacting continuously in time.7 But recent studies show that it is possible to use fBm in financial applications, i.e., it is possible to price by no-arbitrage through the imposition of some restrictions. In the same article Rogers (1997) presents a modification of the kernel of the fBm process which avoids the existence of arbitrage strategies. Another way to build no-arbitrage processes with the fBm was introduced by Hu and Oksendal (2000), using an alternative fBm representation based on Wick-Ito-Skorohod stochastic integrals (e.g. Duncan et al. (2000)). However, these representations do not have any economic meaning.
More intuitive ways to obtain no-arbitrage with the fractional Brownian motion involve placing restrictions on the set of possible strategies (Cheridito (2003) and Jarrow et al. (2007)); and, as proved by Guasoni (2006), in the presence of transaction costs the fBm is free of arbitrage. In Cheridito (2003) and Jarrow et al. (2007) a simple way to obtain no-arbitrage is placing the restriction that agents cannot trade continuously; in particular, this restriction rules out the arbitrage strategy in Rogers (1997). In the presence of transaction costs (Guasoni (2006)), arbitrage strategies in fBm would incur infinite costs. Another important result is obtained in Kluppelberg and Kuhn (2004), who show that asset prices formulated as Poisson shot noise processes converge weakly to fractional Brownian motion processes, and who obtain an arbitrage free representation for this model.
In our particular problem of interest rate modeling, the main results are obtained by Ohashi (2009). In that article, no-arbitrage representations for fBm processes are obtained in the Heath-Jarrow-Morton framework (Heath et al. (1992)), using the proportional transaction costs methodology of Guasoni (2006). As the Heath-Jarrow-Morton class contains the short rate models as particular cases (in particular, the Cox-Ingersoll-Ross models used in this article), we have compatibility between the stochastic differential equation processes used in interest rate modeling and the fractional Brownian motion.
7A detailed discussion about the arbitrage problem with fBm can be found in Mishura (2008).
We can define the short interest rate process, measured in continuous time, using a stochastic differential equation in the general form:
(14) dX_t = a(t, X_t) dt + b(t, X_t) dW(t),
where a(t, X_t) is the drift process, b(t, X_t) is the diffusion (volatility) process and W(t) is the standard Brownian motion. Usually, we work with expressions for the drift and the volatility which depend on a set of parameters. When the parameters are constant in time, the process is time homogeneous, and with time varying parameters we have a heterogeneous process. In heterogeneous interest rate models we usually do not use statistical estimation of parameters, but rather calibration, where the model parameters are adjusted to reproduce variables observed in the market, like discount curves or derivative prices.8
Estimation of stochastic differential equations refers to the procedure of estimating the process parameters based on observed paths of the process. Note that for financial processes we have a fundamental problem - although the process is set in continuous time, financial data come in discretely observed samples. For example, interest rate data are usually presented at daily or monthly frequencies.
The discrete sampling is a fundamental problem, since using only discrete data for model estimation when the model is constructed in continuous time creates an incorrect specification problem, which usually leads to estimator inconsistency. The construction of econometric estimators for continuous time processes is the building of consistent estimators using discrete data, and in the literature we have a wide range of methodologies that deal with this problem. Here is a list of some methodologies proposed in the literature:
8See Brigo and Mercurio (2006), chapters 4, 6 and 7 for more information about these procedures.
Estimators can be clustered in many categories. For example, we could group estimators into methods that use exact or approximate likelihood functions (1, 2, 4, 5, 6, 12, 13); methods that use moment conditions generated by the diffusion processes (3, 7, 8, 9); non parametric approaches (8, 11); and estimators based on Monte Carlo simulations (5, 7, 8, 9, 12, 13) as opposed to estimators based on analytical formulas (1, 2, 14).
Estimators based on the likelihood function can be formulated based on an exact discretization of the process, i.e. the distribution of the discrete process X_t is known and coincides with the distribution of the continuous time process. This is the case of the Geometric Brownian motion (e.g. Campbell et al. (1997)). Another way is using analytical forms for the transition density of the process f(X_{t+Δt}|X_t), like the analytical forms obtained by Aït-Sahalia (1999) or the approach using Hermite expansions obtained by Aït-Sahalia (2002). In these cases the estimator is based on closed formulas, but there are techniques that evaluate the likelihood function using simulation, like Simulated Maximum Likelihood, which uses simulated paths generated by Euler or Milstein discretizations (Pedersen (1995)), or Bayesian estimation using Markov Chain Monte Carlo or Particle Filters (e.g. Johannes and Polson (2005)). In diffusion processes with stochastic volatility, a common technique is Quasi-maximum Likelihood (Lund and Andersen (1997)) using the Kalman Filter, where the likelihood function used in the estimation does not match the true likelihood and the estimator, although biased, has the minimum mean squared error property.
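In the Geometric Brownian motion case mentioned above, the exact discretization yields i.i.d. Gaussian log-returns, so the maximum likelihood estimator is available in closed form. The sketch below is illustrative, with hypothetical parameter values:

```python
import numpy as np

def gbm_mle(prices, dt):
    """Exact MLE for geometric Brownian motion dS = mu*S dt + sigma*S dW:
    log-returns are i.i.d. N((mu - sigma^2 / 2) * dt, sigma^2 * dt)."""
    x = np.diff(np.log(prices))
    sigma2 = x.var() / dt                 # MLE of sigma^2
    mu = x.mean() / dt + 0.5 * sigma2     # MLE of mu
    return mu, sigma2

# simulate a GBM path with its exact transition and recover the parameters
rng = np.random.default_rng(3)
mu0, sig0, dt, n = 0.1, 0.2, 1 / 252, 252 * 40
z = rng.standard_normal(n)
logS = np.cumsum((mu0 - 0.5 * sig0 ** 2) * dt + sig0 * np.sqrt(dt) * z)
S = 100 * np.exp(np.concatenate([[0.0], logS]))
mu_hat, sig2_hat = gbm_mle(S, dt)
print(round(sig2_hat, 3))
```

The volatility is recovered precisely, while the drift estimate remains noisy even over long samples; this asymmetry is a standard feature of discretely sampled diffusions.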
Estimators using moment conditions are widely used in the estimation of stochastic differential equations. Estimation by the Generalized Method of Moments using a discretization of the process is perhaps the most common methodology in the econometric estimation of stochastic differential equations (e.g. Chan et al. (1992)). We also have moment conditions derived from the infinitesimal generator of the Markov process (Hansen and Scheinkman (1995)). In cases where the theoretical moments are not known, a very useful methodology is the Simulated Method of Moments (McFadden (1989)), where estimators are defined as solutions of minimum distance
between simulated moments and sample moments. In this category of moment-based estimators we can also place estimators based on estimating equations (Kessler (1997), Kessler (2000)) and the methodology of martingale estimating equations (Bibby and Sorensen (1995), Bibby et al. (2007)).
Nonparametric estimators for diffusion processes were discussed in Aït-Sahalia (1996), where drift and volatility are estimated through kernel regressions and the transition density of the process is estimated by nonparametric density estimators. Another application of nonparametric methods is the construction of an auxiliary model for estimation by the Efficient Method of Moments (Gallant and Tauchen (1989) and Gallant and Tauchen (2001)), using series expansions in Hermite polynomials.
Another way to define estimators is through approximate discretizations of the stochastic integrals that define the solution of the stochastic differential equation. An application to the estimation of a generalized Cox-Ingersoll-Ross process can be found in Bishwal (2007).
For processes driven by fBm, the estimators just discussed are not directly applicable, because the fBm process is neither a semimartingale nor a Markov process. To make these problems clearer, note that the Markov property states that, conditional on the filtration F_t, future and past realizations of the process are independent, so that the distribution of the process satisfies

P(X_{t+s} \in A \mid F_t) = P(X_{t+s} \in A \mid X_t)

for any s, t > 0. A necessary condition for the process to be Markovian is that its transition density p satisfies the Chapman-Kolmogorov equation:
(16) p(y, t_3 \mid x, t_1) = \int_{z \in \Omega} p(y, t_3 \mid z, t_2)\, p(z, t_2 \mid x, t_1)\, dz,
for all t_3 > t_2 > t_1 and states x, y \in \Omega; equivalently, the transition density should obey the forward and backward Kolmogorov equations (e.g. Definition 5.1.1 in Karatzas and Shreve (1987)).
If the Markov property is satisfied, we can write the likelihood function of the process as:

(17) L_N(X \mid \theta, \Delta t) \equiv \sum_{t=1}^{T} \log p(X_t \mid X_{t-\Delta t}, \theta, \Delta t),
where θ is the parameter vector. When the Markov property is valid, we can write the likelihood function as the product of the transition densities of the process, each depending only on the immediate past; we are thus assuming that the increments of the process carry no dependence beyond the most recent observation.
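When the driving noise is a standard Brownian motion (H = 1/2), the CIR model introduced in Section 4.2 has a known transition density (a scaled noncentral chi-square), so the Markov log-likelihood (17) can be evaluated term by term. A minimal Python sketch, with an illustrative function name:

```python
import numpy as np
from scipy.stats import ncx2

def cir_loglik(x, kappa, mu, sigma, dt):
    """Markov log-likelihood (17) for the standard CIR process: each term
    conditions only on the immediately preceding observation, using the
    exact noncentral chi-square transition density."""
    c = 2 * kappa / (sigma**2 * (1 - np.exp(-kappa * dt)))
    df = 4 * kappa * mu / sigma**2                 # degrees of freedom
    nc = 2 * c * x[:-1] * np.exp(-kappa * dt)      # noncentrality, one per step
    # X_{t+dt} given X_t equals an ncx2 variable divided by 2c (Jacobian log(2c))
    return np.sum(ncx2.logpdf(2 * c * x[1:], df, nc) + np.log(2 * c))
```

This factorization is exactly what breaks down once the innovations are fBm with H ≠ 1/2: each term would then have to condition on the whole past of the process.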
The link between the Markov property and semimartingales can be seen in Theorem 5.4.20 of Karatzas and Shreve (1987), which states that the strong Markov property is obtained when the drift and volatility coefficients are bounded on compact subsets of R^d and the solution of the time-homogeneous martingale problem (Definition 5.4.15 of Karatzas and Shreve (1987)) is well defined. In the presence of fBm instead of a standard Brownian motion, however, the time-homogeneous martingale problem is not well formulated (the candidate martingale has expected value different from zero in the presence of fBm), preventing the Markov construction of the likelihood function.
In processes with dependent increments, the density of the process depends not only on the most recent past but on the whole history of the process, and thus each X_t in the likelihood function must, in principle, be conditioned on every earlier observation. For short-range dependent processes we have the asymptotic independence property associated with exponential mixing (exponential α-mixing and ρ-mixing, e.g. Genon-Catalot et al. (1992)), but for long-memory processes, as in the case of fBm with H > 1/2, this property cannot be used.
The violation of the Markov property makes maximum likelihood estimation of continuous-time processes a difficult issue, given the complexity of evaluating the likelihood function. This problem is analogous to the estimation of long-memory processes from discrete samples (e.g. estimation of fractionally integrated ARFIMA(p,d,q) processes using an exact formula via the Durbin-Levinson algorithm, e.g. Palma (2007)), since that algorithm would
depend on an infinite past of discrete observations; for continuous-time long-memory processes, the transition density must be conditioned on the entire continuous past of the process.
The violation of the Markov property does not affect only likelihood-based estimators. The estimator of Hansen and Scheinkman (1995), for example, is built on the infinitesimal generator of a Markov process, and even nonparametric estimators such as the kernel regression estimator of Aït-Sahalia (1996) would have to be modified to account for the influence of all past observations of the process.
The fact that fBm is not a semimartingale has serious consequences for the properties of most estimators. Estimators for continuous-time processes usually rely on asymptotic results based on convergence and central limit theorems for semimartingales (e.g. Jacod and Shiryaev (2002)), and some estimators are directly based on the martingale property, such as those of Bibby and Sorensen (1995) and Bibby et al. (2007). Likewise, there are no known results for estimators based on discrete approximations of the stochastic integral with respect to fBm; in this case one cannot use Itô stochastic calculus, but must instead resort to the tools of fractional stochastic calculus (e.g. Bishwal (2007) and Mishura (2008)).
In this context, estimation for non-Markovian processes is most easily implemented with simulation-based methodologies, such as the principle of Indirect Inference, the Simulated Method of Moments and the Efficient Method of Moments. In this article we discuss the Indirect Inference implementation (Section 6), but a possible alternative, briefly discussed below, is the Simulated Method of Moments.
The Simulated Method of Moments defines estimators for the parameter θ through the minimization of the following distance9:

(18) \hat{\theta} = \arg\min_{\theta} \left[\frac{1}{T}\sum_{t=1}^{T} X_t - \frac{1}{T}\sum_{t=1}^{T} X_t^{s}(\theta)\right]^{2},

where X_t are the observed trajectories of the process and X_t^s(θ) are trajectories simulated under the parameter vector θ; the estimator converges to the true solution as the number of simulated observations grows to infinity.
9 See Gourieroux and Monfort (1996) and Singleton (2006) for more complete references on the Simulated Method of Moments.
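A minimal sketch of this estimator, matching the first sample moment as in (18); the path simulator is passed in as a user-supplied function, and its interface (`simulate(theta, T, rng)`) is an assumption of this illustration:

```python
import numpy as np
from scipy.optimize import minimize

def smm_estimate(x_obs, simulate, theta0, n_sim=10, seed=0):
    """Simulated Method of Moments: choose theta to minimize the squared
    distance (18) between the sample mean of the observed series and the
    mean of paths simulated under theta."""
    T = len(x_obs)
    m_obs = np.mean(x_obs)

    def criterion(theta):
        rng = np.random.default_rng(seed)  # common random numbers across theta
        m_sim = np.mean([np.mean(simulate(theta, T, rng)) for _ in range(n_sim)])
        return (m_obs - m_sim) ** 2

    return minimize(criterion, theta0, method="Nelder-Mead").x
```

Re-seeding the generator at each evaluation (common random numbers) keeps the criterion a smooth function of θ, which matters for the numerical minimizer.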
In this case, we can define estimators for stochastic differential equations driven by fBm through conditions derived from sample moments (especially moments of the covariance function of the fBm) and simulated moments from stochastic differential equations driven by fBm processes. This method is yet to be explored and, since the estimation by Indirect Inference involves some computational difficulties, it can be an interesting alternative. The Efficient Method of Moments is a refinement of the principle of Indirect Inference, and its use is discussed in Section 6.
4.2. Fractional Cox-Ingersoll-Ross. To anchor our discussion, we work with a specific short-term interest rate model, although the discussion is, in principle, valid for any model with directly observed components10. A simple model with interesting properties is the Cox-Ingersoll-Ross model (Cox et al. (1985)), abbreviated CIR in this text. The CIR model is a single-factor model with analytical expressions for the transition density of the process (Aït-Sahalia (1999)) and closed formulas for bond and option pricing. The process is given by the following stochastic differential equation:
(19) dX_t = \kappa(\mu - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t,
where µ, κ and σ are interpreted as the long-run mean, the mean-reversion speed and the volatility, respectively, and W_t is a standard Brownian motion. The positivity condition of the process is 2\kappa\mu > \sigma^2. Note that in this process the volatility changes over time, with quadratic variation d\langle X \rangle_t = \sigma^2 X_t\, dt.
We define the fractional Cox-Ingersoll-Ross process (CIR-fBm) as the diffusion

(20) dX_t = \kappa(\mu - X_t)\,dt + \sigma\sqrt{X_t}\,dB_H(t),

where B_H(t) is an fBm with Hurst coefficient H, as defined by equation (3.4). In this process we combine the dependence structure of the CIR process with increments generated by an fBm process, allowing for long memory when H > 1/2 alongside the short-memory dynamics given by the mean-reversion component in the drift. Until the present date, this model had not been studied
10 Models with unobservable components, such as stochastic volatility, could be treated with some changes concerning the evaluation of the moments of the unobservable components (e.g. Gourieroux and Monfort (1996)).
in the literature (e.g. Bishwal (2007)); the closest model in the literature is the fractional Ornstein-Uhlenbeck model, which can be represented as

(21) dX_t = \kappa(\mu - X_t)\,dt + \sigma\,dB_H(t),

a restricted form of the CIR-fBm without the square-root component in the volatility. The properties of this model are compiled in Bishwal (2007). It is important to note that inference methodologies for this Ornstein-Uhlenbeck process are also based on continuous sampling.
The first step is to study the behavior of estimators of stochastic differential equations driven by fBm when those estimators are designed for standard Brownian innovations. Since we are then working with misspecified models, we conduct a Monte Carlo study of three estimators commonly used in the estimation of diffusion processes.
The Monte Carlo experiment has the following structure: for a grid of values of the Hurst coefficient H ranging from .5 to .90, in increments of .05, we simulate 1000 replications11 of a CIR process given by equation (19), with parameters µ = .1, κ = .5 and σ = .3, and sample sizes 200 and 500. Since no transition density is known for a CIR process with increments given by fBm, we simulate each trajectory using an Euler-Maruyama discretization with [t_{i+1} - t_i] = .1 for the stochastic differential equation given by equation (20).
For a diffusion of this form, the Euler discretization is:

(23) \hat{X}_{t+\Delta t} = \hat{X}_t + a(\hat{X}_t)\,\Delta t + b(\hat{X}_t)\sqrt{\Delta t}\,Y_t,
where Y_t is the increment of the error process over the interval ∆t. From these simulations, we estimate the parameter vector (µ, κ, σ) for each simulated trajectory using three estimators: exact
11 The simulation of the diffusion process driven by fractional Brownian motion is detailed in Section 6.
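A minimal Euler-Maruyama sketch of the discretization (23) for the CIR-fBm case, taking the driving-noise increments as an input (their generation is discussed in Section 6.2.1); the positivity clip is an ad-hoc numerical safeguard, not part of the model:

```python
import numpy as np

def euler_cir(x0, kappa, mu, sigma, dt, dB):
    """Euler discretization of dX = kappa*(mu - X)dt + sigma*sqrt(X)dB,
    with dB a precomputed array of driving-noise increments (Gaussian for
    standard Brownian motion, fBm increments otherwise)."""
    x = np.empty(len(dB) + 1)
    x[0] = x0
    for t in range(len(dB)):
        drift = kappa * (mu - x[t]) * dt
        diffusion = sigma * np.sqrt(max(x[t], 0.0)) * dB[t]
        x[t + 1] = max(x[t] + drift + diffusion, 1e-8)  # ad-hoc positivity clip
    return x
```

With standard Gaussian increments this reproduces the CIR simulation used in the Monte Carlo study; feeding in fBm increments gives the CIR-fBm trajectories.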
Table 1. Mean of the CIR-fBm model parameter estimates based on 1000 replications - Sample size 200, simulated values µ = 0.1, κ = 0.5 and σ = 0.3.
Table 2. Mean of the CIR-fBm model parameter estimates based on 1000 replications - Sample size 500, simulated values µ = 0.1, κ = 0.5 and σ = 0.3.
maximum likelihood using the transition density of the Cox-Ingersoll-Ross process, maximum likelihood based on the Euler discretization of the process, and GMM applied to the discretized process.
Tables 1 and 2 show the means of the estimates over the 1000 simulations for each sample size and estimation method. Note that it is hard to identify a systematic bias caused by the fBm in these estimators. The common result is that the volatility parameter is always overestimated, while the persistence parameter and the long-term mean are overestimated by some methods and underestimated by others. For example, the κ parameter is overestimated by the transition-density estimation, while it is underestimated by the GMM methods. The effect on the long-term mean µ is analogous to the behavior observed for κ.
It is also interesting to note that for larger values of the Hurst coefficient the estimation becomes more difficult: for values larger than .90 it is very difficult to obtain a convergent estimation with any method; in samples of size 500, most of the estimations for this parameter value do not converge, and so we do not report results for this value.
Nonetheless, the main result of the Monte Carlo study is to show that the usual estimators are severely affected by the presence of fBm. For the reasons discussed in Section 4.1, the fact that fBm is neither a Markov process nor a semimartingale rules out the most common estimators for fBm-driven processes, and this is evident in the simulation study, even though the study is limited (the bias can differ for other parameter sets).
As discussed in Section 4.1, there are situations where the likelihood function has no analytical form or cannot easily be evaluated. Examples are models with latent variables, such as stochastic volatility models, endogenous regime switching, or discrete choice models with serial dependence. In all of these cases, likelihood evaluation involves integrating out all the latent factors14, an integral whose dimension grows with the number of latent observations and which is infeasible to evaluate directly.
14See Gourieroux and Monfort (1996) for discussions about possible applications.
The Indirect Inference method (Gourieroux et al. (1993), Smith (1993) and Gallant and Tauchen (1996)) is based on the construction of a consistent estimator that corrects the bias of the instrumental model through Monte Carlo simulation. The Indirect Inference procedure, in the notation of maximum likelihood estimation, can be formulated in three steps:
(1) Estimation of the parameters of the instrumental (auxiliary) model using the sample observations:

(24) \hat{\theta}_{aux} = \arg\max_{\theta} \sum_{t=1}^{T} \log f_t(X_t \mid \theta).
(2) Simulation of a trajectory of the true model conditioned on the parameter vector estimated with the auxiliary model, creating an artificial series X_t^s of size T, and estimation of the auxiliary model on the artificial series:

(25) \hat{\theta}_{sim} = \arg\max_{\theta} \sum_{t=1}^{T} \log f_t(X_t^s \mid X_{t-\Delta t}^s, \theta).
(3) Estimation of the consistent parameter vector θ through calibration of the bias of the estimated vector \hat{\theta}_{aux}, by minimizing the criterion function:

(26) \hat{\theta}_{II} = \arg\min_{\theta}\, (\hat{\theta}_{aux} - \hat{\theta}_{sim}(\theta))^{\top}\, \Omega\, (\hat{\theta}_{aux} - \hat{\theta}_{sim}(\theta)),

where Ω is a positive definite weight matrix. This step is usually performed with a numerical minimization algorithm computing \hat{\theta}_{II} = \lim_{p\to\infty} \hat{\theta}_{II}^{p}, where \hat{\theta}_{II}^{p} is the p-th iteration given by \hat{\theta}_{II}^{p} = h(\hat{\theta}_{aux}, \hat{\theta}_{sim}^{p}(\theta)), h being the updating algorithm for the criterion function.
Note that we can replace the maximum likelihood estimators by other methods. In this article
we use GMM estimators in steps (1) and (2) of the Indirect Inference procedure.
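The three steps above can be sketched generically; `fit_aux` and `simulate` are illustrative interfaces for the auxiliary-model estimator and the structural-model simulator, and the identity matrix stands in for the weighting matrix Ω of (26):

```python
import numpy as np
from scipy.optimize import minimize

def indirect_inference(x_obs, fit_aux, simulate, theta0, n_sim=20, seed=0):
    """Steps (1)-(3): fit the auxiliary model on the data, then search for
    structural parameters whose simulated paths reproduce the auxiliary
    estimates; identity matrix used for Omega for simplicity."""
    beta_obs = fit_aux(x_obs)                        # step (1)
    T = len(x_obs)

    def criterion(theta):
        rng = np.random.default_rng(seed)            # common random numbers
        beta_sim = np.mean([fit_aux(simulate(theta, T, rng))
                            for _ in range(n_sim)], axis=0)  # step (2)
        d = beta_obs - beta_sim
        return float(d @ d)                          # step (3) with Omega = I

    return minimize(criterion, theta0, method="Nelder-Mead").x
```

Averaging the auxiliary estimates over `n_sim` simulated paths per evaluation mirrors the use of 20 simulations per criterion evaluation described in Section 6.3.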
The asymptotic distribution of the Indirect Inference estimator is given by:

(27) \sqrt{T}\,(\hat{\theta}_{II} - \theta_0) \xrightarrow{d} N\!\left[0,\, W(S,\Omega)\right],

with

(28) W(S,\Omega) = \left(1+\frac{1}{S}\right) \left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1} \frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\Omega^{*-1}\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0) \left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1},
where S is a scale factor linked to the number of moment conditions, the binding function b(\theta_0) = E_{\theta_0} k(X_t) sets the distance between the marginal moment of X_t and its expected value, with the function k defining the parameter function to be estimated and the distance metric in equation (24), and \Omega^{*} = J_0 I_0^{-1} J_0, with J_0 given by:
(29) J_0 = \operatorname{plim}_{T} \left(-\frac{\partial^2 \psi(X_t;\theta)}{\partial\beta\,\partial\beta^{\top}}\left[X_t;\, b(\theta_0)\right]\right),

(30) I_0 = \lim_{T\to\infty} V\!\left(\sqrt{T}\,\frac{\partial \psi(X_t;\theta)}{\partial\beta}\left[X_t;\, b(\theta_0)\right] - E_{\theta_0}\!\left[\sqrt{T}\,\frac{\partial \psi(X_t;\theta)}{\partial\beta}\right]\right),
where β is the pseudo-true value of k(X_t) and the criterion function ψ(X_t; θ) is given by:

(31) \psi(X_t;\theta) = -\left(\frac{1}{T}\sum_{t=1}^{T} k(X_t) - \beta\right)^{\!\top} \left(\frac{1}{T}\sum_{t=1}^{T} k(X_t) - \beta\right).
This result shows that the estimators are consistent and asymptotically Gaussian15. In the case of a just-identified model (the number of auxiliary model parameters equals the number of parameters of the structural model), W(S, Ω) reduces to:

(32) W(S,\Omega) = \left(1+\frac{1}{S}\right)\left[\frac{\partial b^{\top}}{\partial \theta}(\theta_0)\,\Omega^{*-1}\,\frac{\partial b}{\partial \theta^{\top}}(\theta_0)\right]^{-1}.
In our problem, Indirect Inference is used to correct the inconsistency generated not only by the dependent increments of the fBm process, but also by the use of an approximate discretization of the process, as is usual in inference for continuous-time processes from discrete data.
Note that the fundamental requirement for the consistency of the Indirect Inference estimator is the use of simulated trajectories from the true model (e.g. Gourieroux and Monfort (1996)). In our problem this condition is met through the use of consistent discretizations of the fBm diffusion process. As discussed in Section 6.1, the Euler discretization used in the simulation step is consistent, and thus the consistency of the estimator is obtained as the discretization interval converges to zero.
15For the proof of this property see Gourieroux and Monfort (1996), appendix 4A.
6.1.1. Auxiliary Model Choice. The choice of the auxiliary model rests on three main points. First, the auxiliary model must satisfy an identification restriction, i.e., it must be possible to recover the structural model parameters from the auxiliary model. The question is analogous to the estimation of structural parameters through a reduced-form model: the number of parameters (and therefore of moment conditions) of the auxiliary model must be equal to or higher than the number of parameters of the structural model.
Second, the efficiency of the procedure depends on the quality of fit of the auxiliary model. The Efficient Method of Moments (Gallant and Nychka (1987), Gallant (2007)) is a refinement of the Indirect Inference method that obtains efficient estimators through the use of a nonparametric auxiliary model, employing a structure with a larger number of parameters and thus a very good fit to the observed sample. However, there is evidence that the small-sample behavior of the Efficient Method of Moments can be worse than that of simpler Indirect Inference estimators, as the studies of Chumacero (1997), Michaelides and Ng (2000), Ghysels et al. (2003) and Zivot and Czellar (2008) show.
Finally, the third point is that estimation of the auxiliary model must be fast, since each new step of the calibration process involves a re-estimation for each simulated sample, and computational cost can limit the application of the procedure. Given the complexity of implementing the Efficient Method of Moments, in this study we work only with the principle of Indirect Inference.
In our problem, a natural candidate for the auxiliary model for the estimation of (µ, κ, σ) is GMM based on the Euler discretization of the CIR process, given by equation (23). Estimation by GMM is convenient because the optimal weighting matrix can be set using methods robust to heteroscedasticity and serial correlation, increasing the efficiency of the auxiliary model estimation. We could instead use as auxiliary model the likelihood based on the transition density of the CIR process (Aït-Sahalia (1999)), but this procedure proved quite unstable in the estimation phase with simulated data, especially for larger Hurst coefficients; the GMM method was chosen for the asymptotic efficiency of the procedure and because it is more stable in estimation.
To build the GMM estimator, we reparameterize the drift κ(µ − x_t)dt as (α + βx_t)dt, with α = κµ and β = −κ. We can then formulate the conditions needed for the estimation of the parameters (α, β, σ²) by defining \varepsilon_{t+\Delta t} = (r_{t+\Delta t} - r_t) - (\alpha + \beta r_t)\Delta t, which gives the following four moment conditions:

(33) g(\theta) = \begin{bmatrix} \varepsilon_{t+\Delta t} \\ \varepsilon_{t+\Delta t}\, r_t \\ \varepsilon_{t+\Delta t}^2 - \sigma_0^2\, r_t\, \Delta t \\ (\varepsilon_{t+\Delta t}^2 - \sigma_0^2\, r_t\, \Delta t)\, r_t \end{bmatrix}.
The GMM estimates are obtained with the Iterated GMM estimator (Hansen et al. (1996)). This procedure estimates only the parameters (µ, κ, σ). To obtain an estimator for the parameter H, we use an estimation procedure based on wavelet decomposition (e.g. Percival and Walden (2000), Palma (2007)), applied to the residual series of the CIR model estimated by GMM.
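A minimal sketch of the sample analogues of the moment conditions (33); for estimation, these averages would be stacked into the usual GMM quadratic form with a weighting matrix:

```python
import numpy as np

def cir_moments(r, alpha, beta, sigma2, dt):
    """Sample averages of the four moment conditions (33) for the Euler
    discretization of the CIR process, with the drift reparameterized as
    (alpha + beta*r)dt, alpha = kappa*mu, beta = -kappa."""
    eps = (r[1:] - r[:-1]) - (alpha + beta * r[:-1]) * dt
    v = eps**2 - sigma2 * r[:-1] * dt   # E[eps^2 | r_t] = sigma^2 r_t dt for CIR
    return np.array([eps.mean(),
                     (eps * r[:-1]).mean(),
                     v.mean(),
                     (v * r[:-1]).mean()])
```

At the true parameter values each entry should be close to zero in large samples.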
To define this estimator, note that a wavelet is a function ψ(t) with the property:

(34) \int \psi(t)\,dt = 0.

We define a family of dilations and translations of the wavelet function ψ by the expression:

(35) \psi_{jk}(t) = 2^{-j/2}\,\psi(2^{-j}t - k),

with the wavelet coefficients of a function y(t) given by:

(36) d_{jk} = \int y(t)\,\psi_{jk}(t)\,dt,

and the orthogonality property:

(37) \int \psi_{ij}(t)\,\psi_{kl}(t)\,dt = 0 \quad \forall\,(i,j) \neq (k,l).
The main advantage of using an orthogonal basis is that any real function can be expressed as:
(38) y(t) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} d_{jk}\,\psi_{jk}(t).
To obtain an estimator for the H parameter, we define \hat{u}_j from the discrete transform coefficients d_{jk}:

(39) \hat{u}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} d_{jk}^2,

which is approximately distributed as

(40) \hat{u}_j \sim \frac{2^{2dj}}{n_j}\, \chi^2_{n_j},
where n_j is the number of wavelet coefficients at level j of the discrete transform and d = H − 1/2. Taking logarithms of (40) gives

(41) \log_2 \hat{u}_j = \log_2 c + 2dj + \log_2\!\left(\chi^2_{n_j}/n_j\right),

with E(\log \chi^2_n) = \psi(n/2) + \log 2 and Var(\log \chi^2_n) = \zeta(2, n/2), where ψ(z) = d/dz \log Γ(z) is the digamma function and ζ(·,·) is the generalized Riemann (Hurwitz) zeta function. Setting E(ε_j) = 0 and Var(ε_j) = \zeta(2, n/2)/\log(2)^2, the H parameter can be estimated through the following linear regression:

(42) y_j = \alpha + \beta x_j + \varepsilon_j,

where y_j = \log_2 \hat{u}_j − g_j, x_j = j, α = \log_2 c and β = 2(H − .5). The variance of \hat{H} is given by Var(\hat{\beta})/4 (Palma (2007), pg. 92). In this estimation, and in the fBm simulation procedure, we use the wavelet decomposition given by the family of Daubechies-10 wavelets. With the estimate of H we complete the auxiliary model parameter vector \hat{\theta}_{aux} = (\hat{\mu}, \hat{\kappa}, \hat{\sigma}^2, \hat{H}).
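The estimator can be sketched with a plain Haar transform standing in for the Daubechies-10 filter used in the text (the filter choice affects the bias of the estimator, not the structure of the log-variance regression):

```python
import numpy as np

def hurst_wavelet(y, j_min=1, j_max=None):
    """Log-variance regression estimator of H: regress log2 of the wavelet
    variance u_j (39) on the level j; for fGn-like series the slope is
    approximately 2H - 1, as in regression (42)."""
    if j_max is None:
        j_max = int(np.log2(len(y))) - 3
    d = np.asarray(y, dtype=float)
    levels, logvar = [], []
    for j in range(1, j_max + 1):
        m = len(d) - (len(d) % 2)                      # trim to even length
        detail = (d[1:m:2] - d[0:m:2]) / np.sqrt(2.0)  # Haar detail coefficients
        d = (d[1:m:2] + d[0:m:2]) / np.sqrt(2.0)       # Haar approximation
        if j >= j_min:
            levels.append(j)
            logvar.append(np.log2(np.mean(detail**2)))
    slope = np.polyfit(levels, logvar, 1)[0]           # estimate of 2H - 1
    return (slope + 1.0) / 2.0
```

In the auxiliary model this regression would be applied to the residual series of the GMM-estimated CIR model, as described above.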
6.2. Simulation step. In the simulation step we must simulate trajectories of the CIR-fBm process with the parameters estimated in the first step. As discussed in Section 5, there is neither a transition density nor an exact discretization for this process, so we simulate through the Euler discretization given by equation (22). In this step we detail the simulation of the fBm trajectory.
6.2.1. fBm simulation using wavelets. The simulation of fractional Brownian motion is nontrivial, and there is a range of possible methods16. Some methods are exact (Hosking; Cholesky; the Davies and Harte method), and some are approximate (such as methods based on the stochastic integral representation of fBm or on wavelets). Exact methods are extremely slow and memory intensive, and since in our problem we must run repeated simulations at each step of the optimization procedure, simulation speed is a relevant issue. The most computationally efficient way of simulating fBm trajectories is through a process known as wavelet synthesis, which is the form chosen for this article. Recalling the definitions of the wavelet function (Equations 34-38), the discrete wavelet transform of an fBm is given by the following expression:
(43) d_{B_H}(j,k) = \int B_H(t)\,\psi_{jk}(t)\,dt.
Our interest is not to decompose an fBm, but to synthesize a trajectory of the process. A possible way (Dieker (2004)) is through the following expression for the value of the fBm process at time t:

B_H(t) = \lim_{\mathcal{J}\to\infty} \sum_{j=-\infty}^{\mathcal{J}} \sum_{k\in\mathbb{N}} \hat{d}_{B_H}(j,k)\, 2^{-j/2}\, \psi(2^{-j}t - k),
where \hat{d}_{B_H}(j,k) are Gaussian random variables with variance \sigma^2 2^{j(2H+1)}. Using this method we obtain trajectories of the fBm process, and can then create simulated trajectories of the CIR-fBm process using the Euler discretization. Note that the discretization is an approximation to the solution of the diffusion process, and therefore the consistency of our estimators is conditional on the validity of this approximation as ∆t converges to zero; see Kloeden and Platen (1992) for a detailed discussion. The properties of the Euler approximation for stochastic differential equations driven by fBm were studied in Nourdin (2005) and Neuenkirch and Nourdin (2006), who show that Euler approximations are consistent and that the approximation error is almost surely equivalent to a process of the form \delta^{2\alpha}\xi_t, with the analytical form of ξ_t given explicitly; this consistency of the Euler discretization ensures the consistency of the Indirect Inference estimator. In Mishura (2008) the convergence study is extended to more general approximations of fBm processes.
16 A detailed discussion of other methodologies for the estimation and simulation of fBm processes can be found in Dieker (2004); for a reference on the simulation of long-memory processes using wavelet synthesis, see Percival and Walden (2000).
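For comparison with wavelet synthesis, the exact Davies-Harte method mentioned above is compact to implement via circulant embedding of the fGn autocovariance with the FFT; a sketch:

```python
import numpy as np

def fgn_davies_harte(n, H, rng):
    """Exact simulation of fractional Gaussian noise (unit-spaced fBm
    increments) by circulant embedding of the fGn autocovariance
    (Davies-Harte); np.cumsum of the output gives an fBm trajectory."""
    k = np.arange(n + 1)
    # autocovariance gamma(k) of fGn with Hurst index H
    gamma = 0.5 * (np.abs(k - 1)**(2 * H) - 2 * k**(2.0 * H) + (k + 1)**(2 * H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])  # circulant first row, len 2n
    lam = np.maximum(np.fft.fft(row).real, 0.0)    # eigenvalues; clip round-off
    m = len(row)
    z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    return (np.fft.fft(np.sqrt(lam) * z).real / np.sqrt(m))[:n]
```

With the FFT this exact method runs in O(n log n), so the speed gap relative to approximate methods is much smaller than for the Hosking or Cholesky recursions.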
To complete this step, for each simulation performed we re-estimate the model parameters using the auxiliary model defined previously.
6.3. Optimization Step. The Indirect Inference estimator is obtained by minimizing the distance, defined in Equation (26), between the auxiliary model estimates obtained from the observed data and from the simulated data.
The computational implementation of steps 2 and 3 requires some restrictions. The first restriction we must impose is that the simulated trajectory corresponds to a series of positive interest rates: if a simulation produces negative rates at some points, we draw a new fBm trajectory until all rates are positive, in a rejection sampling procedure. We also carry out the minimization under restrictions, imposing the positivity condition of the CIR model and, in addition, constraining the Hurst coefficient to the interval between 0 and 1.
An additional detail, as discussed previously, is that convergence problems are frequent for higher values of the Hurst coefficient, and the procedure is quite fragile in those regions. Another point is the choice of the number of simulations for each evaluation of the criterion function; we found that 20 simulations per evaluation generate satisfactory results. Note that the estimator is computationally intensive.
The numerical minimization algorithm chosen is the Nelder-Mead algorithm, which proved more efficient and more robust to initial-value problems than alternatives such as BFGS or DFP.
An important step in the simulation is to perform a burn-in, discarding initial observations to cancel their influence. Discarding a large number of initial observations is especially important because, due to the long-memory structure of the simulated fBm, the influence of the starting values persists over a long stretch of the trajectory.
Given the difficulties already discussed in the implementation of the Indirect Inference estimator, we performed a series of Monte Carlo experiments to study its properties. In the first experiment we used as configuration the parameter vector (µ=.03, κ=.7, σ²=.05, H=.6) and performed the estimation by Iterated GMM (the auxiliary model) and by Indirect Inference.
In this configuration we performed the estimation under 4 settings, using discretization sizes ∆t=1/12 and 1/365 and sample sizes 500 and 1000. For each setting we simulated 500 replications of the procedure, using 20 simulations within the Indirect Inference simulation step.
Results are shown in Table 3 for the four cases studied (case 1: sample size 500, ∆t=1/12; case 2: sample size 500, ∆t=1/365; case 3: sample size 1000, ∆t=1/12; case 4: sample size 1000, ∆t=1/365), reporting in each case the estimator mean, bias, mean squared error and mean absolute error for each parameter.
Overall, the results show that the GMM estimator performs better in the estimation of the long-term mean, but it underestimates the mean-reversion parameter for discretization ∆t=1/12 and overestimates it for ∆t=1/365, and it overestimates σ² and underestimates the Hurst coefficient. With a finer discretization, the greater proximity in time implies a higher covariance between innovations, which affects the estimation of the parameter κ more intensely, revealing the inconsistency of this method in the presence of fBm innovations. For µ we observe the same effect.
Except for the estimation of the long-term mean, the estimation by Indirect Inference is superior to GMM for all parameters, and with a finer discretization and an increased sample size the Indirect Inference estimator shows convergence to the true parameters, which is not true for the GMM estimator in this situation.
Table 3. Monte Carlo - GMM and Indirect Inference for the fBm-CIR Model - µ=.03, κ=.7, σ²=.05, H=.6
Figure 1 shows the distribution of the estimates obtained by Monte Carlo for each parameter in case 4, confirming that the Indirect Inference procedure is considerably better in the estimation of the parameters κ and H, being more concentrated around the true parameter value than the GMM estimator, with significantly lower bias and variability.
A second experiment (Table 4) was performed with the parameter setting κ=.95 and H=.9, using a discretization ∆t=1/365 and sample size 500. In this configuration the performance of the Indirect Inference estimator is significantly better than that of the GMM estimator, except for the parameter µ. The GMM estimator presents a large negative bias in the estimation of κ and H, emphasizing the need for the correction proposed by the Indirect Inference estimator.
8. Applications
In this section we apply the proposed methodology to three interest rate series. The first series studied is a monthly interest rate on Treasury Bills from 03/1964 to 12/1989 (305 observations), similar to the interest rate series studied in Chan et al. (1992). The second series analyzed is a one-day Eurolibor interest rate, with sample dates from 01/03/2000 to 27/03/2008 (2610 observations). The last series studied is monthly Canadian interest rate data with one-month maturity, with sample from 1956:11 to
Table 4. Monte Carlo - GMM and Indirect Inference for the fBm-CIR Model - µ=.03, κ=.95, σ²=.05, H=.9
          GMM        II
mean µ    0.02333    0.04804
bias µ   -0.00667    0.01804
mse µ     0.00075    0.00143
mae µ     0.01870    0.02176
mean κ    0.73852    0.93412
bias κ   -0.21148   -0.01588
mse κ     0.75566    0.00333
mae κ     0.58912    0.02833
mean σ²   0.10040    0.07152
bias σ²   0.05040    0.02152
mse σ²    0.00254    0.00046
mae σ²    0.06324    0.02810
mean H    0.80240    0.92557
bias H   -0.09760    0.02557
mse H     0.04362    0.00534
mae H     0.15547    0.05305
[Figure 1. Densities of the Monte Carlo estimates for each parameter (panels: µ, κ, σ², H; legends: GMM, II).]
1996:6 (512 observations). These data are the same analyzed in Tkacz (2001). Figures 2, 3 and 4 show the plots of these series.
Tables 5, 6 and 7 show the estimation results of the auxiliary model estimated by Generalized Method of Moments (GMM) and of the Indirect Inference estimator (II) for the CIR-fBm process, for the series studied. For the Treasury Bills series, the GMM and Indirect Inference estimates are close, and the estimated H parameter is statistically different from 1/2, with H < 1/2, pointing to short memory and negative correlation in the increments of the process. It is also interesting to note that the persistence parameter estimated by the Indirect Inference method is smaller than that obtained by GMM, showing that part of the persistence dynamics lies not in the mean-reversion component but in the correlation structure of the increments of the process. The standard error estimates of the two methods are very close; this is expected because, although the null hypothesis H = 0.5 is rejected, the estimate is close to 0.5.
Figure 3. Eurolibor
Table 5. Treasury Bills

            µ            κ            σ            H
GMM       0.04546093   0.81964484   0.05349082   0.44146423
GMM s.e.  0.090080400  0.029319553  0.014459598  0.005245541
II        0.03238156   0.69379688   0.05813648   0.48755111
II s.e.   0.09030874   0.029371080  0.014485505  0.005261965
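As a quick arithmetic check of the claim that H lies statistically below 1/2 for the Treasury Bills series, a z statistic can be formed from the II estimate and standard error reported above (this Wald-type form is our sketch, not necessarily the exact test used in the article):

```python
# II estimate of H and its standard error for the Treasury Bills series,
# values taken from the table above.
H_hat = 0.48755111
se = 0.005261965
z = (H_hat - 0.5) / se
print(round(z, 2))  # -2.37, so H = 1/2 is rejected at the 5% level
```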
For the Eurolibor rate series (Table 6), the estimated parameter H is not statistically different from 1/2, showing that the standard Brownian assumption cannot be rejected for this rate. Favorable evidence for this result is that the estimated parameters and their standard errors are similar for the two methods, showing that the correction for the possible presence of an fBm component is not necessary.
The Canadian interest rate series, following the Tkacz (2001) study, has previously shown evidence of a long memory process. In that study, a wavelet-OLS estimator was used for the long memory parameter, but there was no control for the structure of the short memory in continuous
Table 6. Eurolibor

            µ           κ           σ           H
GMM       0.3355136   0.1167345   0.2066294   0.5653880
GMM s.e.  0.03030097  0.12100467  0.01191316  0.04937245
II        0.3000884   0.1196627   0.194989    0.5478063
II s.e.   0.03030312  0.1210105   0.01191479  0.04938462
time. Using the Indirect Inference estimator suggested in this article, we find support for the long memory evidence of Tkacz (2001), while controlling for the mean reversion structure of the CIR process and assuming a continuous time process: as shown in Table 7, H is significantly higher than 1/2, supporting the evidence of long memory. For instance, the estimate of (κ, σ) is equal to (.34, .049) by GMM, which does not take the fractional process into account, and equal to (.28, .19) by the II method, which does. In this case the differences between the GMM and Indirect Inference estimates of κ and σ are important. Note that for this series the standard errors of the estimates are also very different. The Tkacz (2001) example illustrates the importance of
Table 7. Canada

            µ            κ            σ           H
GMM       0.01949002   0.33922906   0.04861706  0.63476452
GMM s.e.  0.079312567  0.027278700  0.0116461   0.004259617
II        0.01444128   0.2848528    0.1868208   0.6107441
II s.e.   0.10894435   0.03331045   0.03897676  0.01146291
taking a possible fBm process into account. In practice the GMM estimates would lead to a process with quick reversion to the mean and to an overestimate of the volatility parameters.
9. Conclusions
In this article we have discussed the estimation of diffusion processes driven by a fractional Brownian motion in a discrete sampling context. Recent theoretical results (e.g. Cheridito (2003), Guasoni (2006) and Jarrow et al. (2007)) point to the compatibility between no-arbitrage pricing and processes that are not semimartingales, and the results obtained by Ohashi (2009) for the HJM model with fractional Brownian motion show the need for methodologies to estimate non-Markovian stochastic differential equations and processes more general than semimartingales. Given the practical and theoretical difficulties involved in deriving such estimators, like the complexity of the likelihood function, we show that, in this context, simulation-based estimation methodologies such as Indirect Inference are a first step towards addressing these problems.

The results of the article show that the method of Indirect Inference allows the construction of estimators with good finite sample properties in the presence of a fractional Brownian motion. These estimators retain the desired properties as the discretization interval decreases, a situation in which the GMM estimator deteriorates dramatically, as shown in the Monte Carlo experiments. A possible generalization is to work with a larger class of models. A simple and very interesting modification is to work with the generalized CIR model studied by Chan et al. (1992), where the diffusion process driven by fBm would be given by dx_t = κ(x_t − µ)dt + σ x_t^γ dB_H(t). Since this model includes many sub-models of diffusion processes used in finance, it would be possible to check the methodology more widely.
The estimation of the integrated volatility of the process (e.g. Barndorff-Nielsen and Shephard (2002), Barndorff-Nielsen and Shephard (2004)), which is related to the quadratic variation, is valid only when the sampling interval tends to zero. But it is exactly in this situation that market microstructure noise dominates the quadratic variation process and the estimation becomes inconsistent. This problem has been studied and treated by many authors in the realized volatility literature (e.g. Bandi and Russell (2005), Hansen and Lunde (2006)).
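The Indirect Inference estimator discussed here rests on being able to simulate paths of the fractional diffusion. A minimal sketch of that building block, written as our own illustration (not the thesis's code): exact fractional Gaussian noise generated by Cholesky factorization of its covariance (Dieker (2004) surveys faster schemes), feeding an Euler discretization with the mean-reverting drift κ(µ − x_t) and the CIR square-root volatility, using σ² = .05 and H = .9 as in the Monte Carlo design:

```python
import numpy as np

def fgn_cholesky(n, H, dt, rng):
    # Exact covariance of fractional Gaussian noise (increments of B_H on a
    # grid with step dt); Cholesky factorization is O(n^3), fine for a sketch.
    k = np.arange(n)
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2.0 * k ** (2.0 * H)
                   + np.abs(k - 1.0) ** (2 * H)) * dt ** (2 * H)
    cov = gamma[np.abs(k[:, None] - k[None, :])]
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)

def simulate_fcir(x0, mu, kappa, sigma, H, n, dt, rng):
    # Euler scheme for dx_t = kappa*(mu - x_t) dt + sigma*sqrt(x_t) dB_H(t),
    # with the square root truncated at zero to keep the scheme well defined.
    dB = fgn_cholesky(n, H, dt, rng)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        x[i + 1] = x[i] + kappa * (mu - x[i]) * dt \
                   + sigma * np.sqrt(max(x[i], 0.0)) * dB[i]
    return x

rng = np.random.default_rng(0)
path = simulate_fcir(x0=0.03, mu=0.03, kappa=0.95, sigma=0.05 ** 0.5,
                     H=0.9, n=500, dt=1.0 / 252.0, rng=rng)
```

Indirect Inference then repeatedly fits the auxiliary discretized model to simulated paths like this one and matches the resulting auxiliary estimates to those obtained on the data.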
In general, however, all the previous studies assume a short memory structure for the market microstructure effects, and the inference procedures usually rely on asymptotic results derived for semimartingale processes, as discussed in Mykland and Zhang (2005). If the market microstructure effects were actually consistent with a long memory process, the estimation of the diffusion process with market microstructure effects could be carried out with the Indirect Inference estimator discussed in this article, given the difficulty of obtaining likelihood-based or nonparametric estimators in this long memory context. A related result can be found in Bayraktar et al. (2006). That article studies the effect of investor inertia on stock price fluctuations with a market microstructure model comprising many small investors who are inactive most of the time. In this setup the log price process converges to a process with long-range dependence and non-Gaussian return distributions, driven by a fractional Brownian motion.
In our article we treated only stochastic differential equations driven by a pure fBm; in other words, all the uncertainty is generated by the fBm alone. But there are results on conditions for the existence of no-arbitrage for mixed processes, where the process is driven by a mixture of standard and fractional Brownian motions; a detailed discussion of these results can be found in Mishura (2008). That work contains some results for the estimation of mixed Brownian-fractional Brownian processes, but they all refer to the continuous sampling case; again, there are no results for the case of discrete sampling, and the Indirect Inference method could be studied in this context.
References
Aït-Sahalia, Y., 1996. Nonparametric pricing of interest rate derivative securities. Econometrica 64, 527–560.
Bachelier, L., 1900. Théorie de la spéculation. English translation by Boness, A. J., in: The Random Character of Stock Market Prices, ed. Paul H. Cootner, pp. 17–78. Cambridge, Mass.: MIT Press, 1967.
Bandi, F. M., Russell, J. R., 2005. Microstructure noise, realized volatility, and optimal sampling. Working paper, Graduate School of Business, The University of Chicago.
Barndorff-Nielsen, O., Shephard, N., 2002. Estimating quadratic variation using realized variance. Journal of Applied Econometrics 17, 457–477.
Barndorff-Nielsen, O. E., Shephard, N., 2004. Econometric analysis of realised covariation: High
frequency based covariance, regression and correlation in financial economics. Econometrica 72,
885–925.
Bayraktar, E., Horst, U., Sircar, R., 2006. Limit theorem for financial markets with inert investors.
533–553.
Chumacero, R., 1997. Finite sample properties of the efficient method of moments. Studies in Nonlinear Dynamics and Econometrics 2, 35–51.
Cox, J. C., Ingersoll, J. E., Ross, S. A., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–408.
Decreusefond, L., Ustunel, A. S., 1999. Stochastic analysis of the fractional Brownian motion. Potential Analysis 10, 177–214.
Delbaen, F., Schachermayer, W., 1994. A general version of the fundamental theorem of asset pricing. Mathematische Annalen 300, 463–520.
Delbaen, F., Schachermayer, W., 2006. The Mathematics of Arbitrage. Springer.
Dieker, T., 2004. Simulation of fractional Brownian motion. Tech. rep., CWI and University of Twente.
Duncan, T., Hu, Y., Pasik-Duncan, B., 2000. Stochastic calculus for fractional Brownian motion. SIAM Journal on Control and Optimization.
Elerian, O., Chib, S., Shephard, N., 2001. Likelihood inference for discretely observed nonlinear diffusions. Econometrica 69, 959–993.
Gallant, A. R., Tauchen, G., 2001. Efficient method of moments. Unpublished manuscript.
Gallant, A. R., Tauchen, G., 2007. Simulated score methods and indirect inference for continuous-time models. In: Handbook of Financial Econometrics. Elsevier.
Gourieroux, C., Monfort, A., Renault, E., 1993. Indirect inference. Journal of Applied Econometrics 8, S85–S118.
Guasoni, P., 2006. No arbitrage with transaction costs, with fractional brownian motion and
beyond. Mathematical Finance 16, 569–582.
Hansen, L. P., Heaton, J., Yaron, A., 1996. Finite sample properties of some alternative GMM estimators. Journal of Business and Economic Statistics 14, 262–280.
Hansen, L. P., Scheinkman, J. A., 1995. Back to the future: generating moment implications for continuous time Markov processes. Econometrica 63 (4), 767–804.
Hansen, P., Lunde, A., 2006. Realized variance and market microstructure noise. Journal of Busi-
ness and Economic Statistics 24, 127–218.
Harrison, J. M., Kreps, D., 1979. Martingales and arbitrage in multiperiod securities markets.
Journal of Economic Theory 20, 381–408.
Harrison, J. M., Pliska, S., 1981. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their Applications 11, 215–260.
Heath, D., Jarrow, R., Morton, A., Jan. 1992. Bond pricing and the term structure of interest
rates: A new methodology for contingent claims valuation. Econometrica 60 (1).
Hu, Y., Oksendal, B., 2000. Fractional white noise calculus and application to finance. Preprint, University of Oslo.
Jacod, J., Shiryaev, A., 2002. Limit Theorems for Stochastic Processes (2nd Edition). Springer.
Jarrow, R., Protter, P., Sayit, H., 2007. No-arbitrage without semimartingales. Unpublished working paper.
Johannes, M., Polson, N., 2005. Handbook of Financial Econometrics. Elsevier-North-Holland, Ch. MCMC for Financial Econometrics.
Karatzas, I., Shreve, S. E., 1987. Brownian Motion and Stochastic Calculus. Springer–Verlag.
Kessler, M., 1997. Estimation of an ergodic diffusion from discrete observations. Scandinavian Journal of Statistics 24, 211–229.
Kessler, M., 2000. Simple and explicit estimating functions for a discretely observed diffusion
process. Scandinavian Journal of Statistics 27, 65–82.
Kloeden, P., Platen, E., 1992. Numerical Solution of Stochastic Differential Equations. Springer–
Verlag.
Kluppelberg, C., Kuhn, C., 2004. Fractional Brownian motion as a weak limit of Poisson shot noise processes, with applications to finance. Stochastic Processes and their Applications 113 (2), 333–351.
Kolmogorov, A. N., 1940. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. Comptes Rendus (Doklady) de l'Académie des Sciences de l'URSS (N.S.) 26, 115–118.
Le Breton, A., 1998. Filtering and parameter estimation in a simple linear system driven by fractional Brownian motion. Statistics and Probability Letters 38, 263–274.
Lund, J., Andersen, T., 1997. Estimating continuous-time stochastic volatility models of the short-term interest rate. Journal of Econometrics 77, 343–377.
Mandelbrot, B. B., van Ness, J. W., 1968. Fractional Brownian motions, fractional noises and applications. SIAM Review 10, 422–437.
McFadden, D., 1989. A method of simulated moments for estimation of discrete response models
without numerical integration. Econometrica 57, 995–1026.
Michaelides, A., Ng, S., 2000. Estimating the rational expectations model of speculative storage: A Monte Carlo comparison of three simulation estimators. Journal of Econometrics 96, 231–266.
Mishura, Y., 2008. Stochastic Calculus for Fractional Brownian Motion and Related Processes. Lecture Notes in Mathematics, Springer.
Mykland, P. A., Zhang, L., 2005. Comment: A selective overview of nonparametric methods in
financial economics. Statistical Science 20(4), 347–350.
Neunkirch, A., Nourdin, I., 2006. Exact rate of convergence of some approximation schemes associated to SDEs driven by a fractional Brownian motion.
Rogers, L. C. G., 1997. Arbitrage with fractional Brownian motion. Mathematical Finance 7 (1), 95–105.
Shiryaev, A. N., 1999. Essentials of Stochastic Finance. World Scientific.
Singleton, K. J., 2006. Empirical Dynamic Asset Pricing. Princeton University Press.
Smith, A., 1993. Estimating nonlinear time series models using simulated vector autoregressions.
Journal of Applied Econometrics 8, 63–84.
Tkacz, G., 2001. Estimating the fractional order of integration of interest rates using a wavelet OLS estimator. Studies in Nonlinear Dynamics and Econometrics 5 (1), 19–32.
Zivot, E., Czellar, V., 2008. Improved small sample inference for efficient method of moments and indirect inference estimators. Available at http://faculty.washington.edu/ezivot/research/CZ2007Latex2.pdf.
ARTICLE IN PRESS
Insurance: Mathematics and Economics ( ) –
Keywords: Term structure; No-arbitrage; Interpolation; Smoothing splines
A discussion of adjustment procedures for yield curves for pure discount bonds with very long maturities, and their application in the evaluation of life annuities, can be found in Carriere (1999). A major problem raised by Corradi (1996) is that the usual interpolation/extrapolation procedures for yield curves using nonparametric methods can generate negative prices for very long times to maturity of the yield curve. This problem is addressed by the techniques introduced in this article.

Available data do not provide us with a complete term structure curve; rather, what we observe is a set of discrete points relating yields to different maturities. This is not helpful in practice. As an illustration, it is unlikely that the observed maturities will be regularly spaced. Moreover, it could be necessary to have quotes for non-standard maturities to price bond derivatives or other interest rate securities. A perfect fit could be obtained through interpolation; nevertheless, it would produce a very jagged curve, since bond prices are subject to many sources of disturbance. This noise is in general due to differences in liquidity, tax treatments, bid-ask spreads and other effects. Therefore, it is imperative to have a method to estimate a continuous term structure curve.

The literature on term structure estimation follows two distinct lines. The first takes a statistical approach, using data smoothing techniques without considering the driving factors behind asset prices. The second approach makes use of theory to identify state variables or includes no-arbitrage arguments with the purpose of constructing equilibrium models. However, since models need to be calibrated to a constructed curve, even when the researcher or practitioner will use an equilibrium model it is common to first estimate the term structure curve by some data smoothing technique. The converse is also the case. For instance, to run a simulation experiment of our method, we first estimate the term structure using an equilibrium model, and then we use our statistical method.

In particular, statistical methods for estimating the term structure of interest rates can be divided into parametric and nonparametric methods. Parametric methods have some advantages. First, they assume functional forms that are parsimonious enough to provide economic interpretation for their parameters. Second, parameter restrictions and constraints can be added in such a way as to obey the relationships imposed by economic theory and no-arbitrage principles. Third, parametric methods can be tested against nested models to examine whether the restrictions imposed by the theory are valid. Some typical examples of parametric interpolation can be found in Nelson and Siegel (1987) and Svensson (1994).

However, as pointed out by Choudry (2005), parametric methods are not immune to problems. First, they are less flexible than nonparametric methods in fitting observed data, especially if the fitted curve requires more than one hump and trough. Second, because they are only reasonable approximations to observed data and in general have a lower fit than nonparametric methods, they are not appropriate for pricing and no-arbitrage applications. Finally, they are subject to a misspecification bias; a preselected parametric model might be too restricted or low-dimensional to fit unexpected features; see Hardle (1990).

As pointed out by Ait-Sahalia and Duarte (2003), nonparametric methods tackle the potential problem of misspecification. First, since they do not assume a particular functional form, they are robust to misspecification errors. Second, nonparametric methods can be used as a first step in the analysis of data to guide the specification effort. Third, nonparametric estimation can be quite feasible when the sample size is small and appropriate shape restrictions are imposed.

For nonparametric estimation, the first methods employed were regression splines with the use of quadratic and cubic piecewise approximation functions, introduced by McCulloch (1971, 1975). See Hagan and West (2006) and Anderson et al. (1996) for extensive reviews of the methods used in nonparametric regressions. Following this approach, Shaefer (1981) used Bernstein polynomials and Pham (1998) used Chebyshev polynomials. Other examples are: Vasicek and Fong (1982), exponential splines; Barzanti and Corradi (1998), tension splines; and Li and Yu (2005), a Bayesian formulation of spline methods. The interest in smoothing rather than interpolating the data leads to the adoption of smoothing splines instead of regression splines, as in Fisher et al. (1995). The advantage of smoothing splines is the adoption of a penalty parameter that can control excess roughness. However, the nonparametric methods cited above share some operational problems: the choice of knot locations and of the number of knot points, instability when fitting the estimated curve at the extremes of the maturity line, and great sensitivity to outliers, which make the fitted curves very unstable.

This study applies the method of constrained smoothing B-splines (hereinafter COBS), as formulated by He and Shi (1998) and He and Ng (1999), to confront those problems in term structure estimation. In particular, Barzanti and Corradi (1999, 2001) have already applied constrained B-spline estimation to the problem of direct term structure estimation using Italian bond data. The authors include constraints on the monotonicity and nonnegativity of the discount function by using a linear programming formulation of the B-spline estimation method.

Although we are not the first to apply the constrained B-spline technique to the estimation of the term structure of interest rates, our approach differs somewhat from the Barzanti and Corradi (1999, 2001) methodology. First, the COBS methodology estimates a conditional median function and is consequently robust to outliers. In technical terms, COBS formulates the B-spline by an L1 projection and shares the properties of the quantile regression methods of Koenker and Basset (1978). Second, it provides full automation for selecting the smoothing parameter or the knot mesh, using information criteria instead of ad hoc rules or cross validation procedures based on convergence rates. In a nonparametric setting, the knot mesh can be interpreted as the selected functions used to approximate the term structure. Last but not least, similarly to Barzanti and Corradi (1999, 2001), the COBS representation in a linear programming formulation allows us to include constraints without a substantial increase in computational costs. Recall that, in the particular case of term structure estimation, those constraints will rule out some arbitrage opportunities. More specifically, COBS imposes boundary conditions and a monotonically decreasing property on the discount function in an attempt to maintain positive spot and forward rates.

Comparison of estimation techniques can rely on many criteria. This study will take the following. First, accuracy against smoothness: is the method flexible enough to accommodate the data without generating very jagged curves? Second, no-arbitrage fulfillment: do the fitted and implied curves generate no-arbitrage violations? Third, model consistency: is the estimation method consistent with a theoretical equilibrium model? This paper compares the COBS methodology against some usual methods in statistical term structure fitting. Namely, COBS is compared with smoothing splines and the parametric Nelson and Siegel (1987) and Svensson (1994) models.

The remainder of the paper is structured as follows. The next section describes the relationship among spot interest rates, forward rates and discount functions, and points out the restrictions imposed by the assumption of no-arbitrage. Section 3 succinctly describes the methodology of COBS. Section 4 compares COBS with alternative methodologies using the set of criteria described above. Section 5 concludes the paper.
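The L1 (median) projection behind COBS can be illustrated with a toy linear program: minimize the sum of absolute residuals by splitting them into positive and negative parts. This is our own sketch with a trivial two-column basis and scipy, not the COBS implementation; the real method uses a B-spline basis, and roughness and shape constraints enter as extra rows and columns of the same LP:

```python
import numpy as np
from scipy.optimize import linprog

# Toy L1 (median) projection: choose coefficients a to minimize
# sum_i |y_i - (B a)_i|, rewritten as the linear program
#   min sum(u + v)  s.t.  B a + u - v = y,  u, v >= 0,  a free.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = 0.05 + 0.01 * x + rng.normal(0.0, 0.01, x.size)
y[5] += 0.5  # one gross outlier; the median fit should shrug it off

B = np.column_stack([np.ones_like(x), x])  # trivial basis; COBS uses B-splines
n, p = B.shape
# decision variables: [a_plus, a_minus, u, v], with a = a_plus - a_minus
c = np.concatenate([np.zeros(2 * p), np.ones(2 * n)])
A_eq = np.hstack([B, -B, np.eye(n), -np.eye(n)])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p + 2 * n))
a = res.x[:p] - res.x[p:2 * p]
print(a)  # roughly (0.05, 0.01): the outlier barely moves the median fit
```

Monotonicity and positivity restrictions on the fitted curve are additional linear inequality rows in this program, which is why COBS can impose them at little extra computational cost.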
Please cite this article in press as: Poletti Laurini, M., Moura, M., Constrained smoothing B-splines for the term structure of interest rates. Insurance: Mathematics and
Economics (2009), doi:10.1016/j.insmatheco.2009.11.008
2. Term structure definitions

The spot interest rate, y(m), is the rate of return applied to maturity of a bond or a contract expiring in m periods. Today's price of receiving $1.00 in m periods is given by the discount function, d(m). Under continuous compounding, spot interest rates and the discount function are related by the following formula:

d(m) = e^(-y(m) m).    (1)

As a result, from the discount function the spot rate or yield curve can be recovered by:

y(m) = -ln d(m) / m.    (2)

[...] above (below) spot rates if the term structure of interest rates is increasing (decreasing).

Because spot rates are an average of forward rates (see Eq. (4)), forward rates will be much more volatile. This last result has important implications for the comparison exercise of Section 4. Since forward rates tend to be very volatile, any small error in the estimated discount curve is magnified many times in the forward rate. Experience shows that, although plots of the discount or spot rates may differ little across estimation methodologies, very jagged implied forward rate curves signal much more clearly whether or not a technique has a good fit.
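The relation between the discount function and the spot rate can be checked in a few lines (an illustration of Eq. (1) and its inverse, not code from the original paper):

```python
import numpy as np

# Eq. (1): d(m) = exp(-y(m) * m), and its inverse recovering the spot rate.
def discount(y, m):
    return np.exp(-y * m)

def spot_from_discount(d, m):
    return -np.log(d) / m

m = 5.0          # maturity in years
y = 0.06         # 6% continuously compounded spot rate
d = discount(y, m)
print(spot_from_discount(d, m))  # recovers 0.06
```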
See Koenker (2005) for a discussion of Lp fitting. These problems can be solved using standard linear programming methods; again, see Koenker (2005) for a discussion of the computational aspects of these problems. In particular, the L1 roughness and L-infinity roughness measures are defined by:

L1 roughness = V(g') = Σ_{i=1}^{n-2} |g'(x⁺_{i+1}) − g'(x⁺_i)|,    (8)

L∞ roughness = V(g') = ||g''||_∞ = max_x [g''(x)].    (9)

The estimation of the term structure of interest rates will specialize to conditional median estimation, which implies τ = 0.5 in Eq. (7), and the concept of fidelity will be given by:

fidelity = Σ_{i=1}^{n} |y_i − g(x_i)|.    (10)

Now, based on the fact that any mth order smoothing spline has a corresponding B-spline representation on the same knot mesh, the function g can be replaced by its B-spline representation:

g(x_i) = Σ_{j=1}^{N+m} a_j B_j(x).

In the B-spline representation above a more general knot mesh is used, T = {t_i}_{i=1}^{N+2m}, with N = n − 2 internal knot points. For simplicity, assume that all x_i are distinct from one another; then t_1 = ··· = t_m = x_1, t_{m+1} = x_2, ..., t_{N+m} = x_{n−1}, t_{N+m+1} = ··· = t_{N+2m} = x_n. For the estimation of linear (median) smoothing B-splines, for m = 2 and L1 roughness, He and Ng (1999) show that the objective function of the estimation problem can be equivalently described as a linear programming problem.

In particular, for the case of (median) smoothing B-splines with m = 2 and L1 roughness, the problem is formulated as an L1 projection; it shares the robustness properties of the quantile regression methods of Koenker and Basset (1978), and in small samples is less sensitive to outliers than smoothing splines and other interpolation schemes. As pointed out by Koenker et al. (1994), because these methods estimate conditional quantile functions, they possess an inherent robustness to extreme observations in the y_i's.

Given the linear programming representation, it is easy to incorporate qualitative, monotonicity, convexity (concavity) and pointwise restrictions on the fitted equation by adding equality or inequality constraints; see He and Ng (1999) and Ng and Maechler (2007) for details. As noted in Section 2, constrained estimation is very important in the term structure fitting problem in order to respect no-arbitrage principles. These restrictions will be especially useful for estimating the discount function in Section 4.

One recurrent problem in regression splines is the choice of the number and location of knot points. In general, the choice follows some ad hoc set of criteria for linear, quadratic, cubic and exponential regression splines; for details on these criteria, see Anderson et al. (1996, pg. 35–36). Some methods, such as the penalized smoothing splines of Jarrow et al. (2004), use generalized cross validation (GCV). Other approaches use economic interpretations of short, intermediate and long-term money, like the one suggested by Litzenberger and Rolfo (1984) and employed by Barzanti and Corradi (1999). The knot selection and the smoothing parameter λ in the constrained smoothing method of He and Ng (1999) can be obtained by using the Akaike information criterion (AIC) and the Schwarz information criterion (SIC). The Akaike information criterion is equivalent to using generalized cross validation, while the Schwarz information criterion is a version of the AIC which penalizes the number of parameters in the model more heavily. The AIC and SIC in the constrained smoothing splines of He and Ng (1999) are given by:

SIC(λ) = log( (1/n) Σ_{i=1}^{n} ρ_τ(y_i − m̂_{λ,L1}(x_i)) ) + (1/2) p_λ log(n)/n

and

AIC(λ) = log( (1/n) Σ_{i=1}^{n} ρ_τ(y_i − m̂_{λ,L1}(x_i)) ) + 2 (N + m)/n,

where m̂_{λ,L1}(x) = Σ_{j=1}^{N+m} â_j B_j(x) is the optimal linear (median) smoothing B-spline that solves (6) for τ = 0.5, m = 2 and the L1 roughness measure.

This makes the knot and smoothing parameter choice a fully automatic procedure, removing the ad hoc steps from the model specification. It is worth noting that, if necessary, the number of knots and their locations can be imposed by the user. This analysis will be further explored in the next section, where we use the automatic selection based on the SIC and compare it with alternatives such as the McCulloch (1975) equally sparse knots and the Litzenberger and Rolfo (1984) criterion used in the Barzanti and Corradi (1999) monotonic spline estimation.

4. Applications

The choice of a method of yield curve estimation superior to all others is a hard, if not impossible, task. As Anderson et al. (1996) pointed out, authors have attempted to highlight the weaknesses and strengths of each method, but it is impossible to select the best model of all.

Nevertheless, we can narrow the options down to a few models that are commonly used by practitioners and academics. Deacon and Derry (1994) concluded that the B-spline is preferred by practitioners, and the BIS (Bank for International Settlements) survey ''Zero Coupon Yield Curves: Technical Documentation'' (1999) reports that the methods most used by central banks are the nonparametric smoothing splines and the parametric methods of Nelson and Siegel (1987) and Svensson (1994).

We use the following techniques as comparison benchmarks against COBS: smoothing splines, Nelson and Siegel (1987) and Svensson (1994). We skip the details of those models for simplicity, as a detailed survey can be found in Anderson et al. (1996) or James and Webber (2000). Our strategy is to compare each method in terms of the three criteria defined in the introduction: accuracy against smoothness, no-arbitrage fulfillment and model consistency.

Some regression spline interpolation methods could also have been included, like cubic B-splines or exponential splines. However, they were not, because those methods interpolate instead of providing smooth approximations to the yield curve. Consequently, they overfit the data, producing very jagged curves and no trade-off between accuracy and smoothness of fit, which is incoherent with our first comparison criterion.

The remainder of this section is structured as follows. First, we describe the raw data used for estimating yield and implied curves. Second, we fit the three curves for each data set and method in order to look at the first two criteria. Third, we test model consistency by simulating the Cox et al. (1985) (CIR) model of the yield curve.

4.1. Data description

Two different markets are brought into play for our comparison exercise: the US market, using Treasury STRIPS (Separate Trading of Registered Interest and Principal of Securities), and the Brazilian market, using the DIxPRE swap contracts. While the first market has
high liquidity, longer maturities, and high-density data, the second has the opposite characteristics. Thus, it is possible to see how the methods work in two very different market settings.

US data come from the Treasury STRIPS program, started in 1985, which trades US Treasury Bond principal and coupon components as separate synthetic zero coupon bonds. As noted by Carmona and Tehranchi (2006), this program was created to give zero coupon reference rates to the market. The Treasury selects which bonds are eligible for the program and the stripping of these issues is done by government securities dealers and brokers. US STRIPS data are collected daily from January 1st, 1997 to January 1st, 2008, a total of 2874 observations with an average of 208 different quotes each day.

Since the Brazilian government zero coupon bonds do not present longer maturities and have very low liquidity, we use swaps to obtain the Brazilian zero coupon curve. Those are future swap contracts between floating interbank rates and fixed predetermined rates, DIxPRE. Those DIxPRE swap contracts are from the stock exchange futures market in Brazil, the BM&F (Bolsa de Mercadorias e Futuros). The swap curve is extracted from observed contracts in the market and does not have fixed maturities. In particular, note that the price at time t of a contract maturing at time T is determined by the formula:

p_{t,T} = e^(-y(T-t)) 100,000.

Thus, the continuous spot rate, y, is obtained by solving the above equation for y. Data were collected daily from January 3rd, 2000 to December 5th, 2006, a total of 1721 observations with an average of 19 different quotes each day.

4.2. Term structure of interest rates estimation

Given observed rates y_i and maturities m_i, the smoothing splines approach estimates a smooth function g(m_i) that minimizes the objective function:

Σ_{i=1}^{n} (y_i − g(m_i))² + λ ∫ (g''(u))² du,

where the parameter λ controls the trade-off between accuracy, represented by the residual sum of squares, and ''smoothness'' of the solution, represented by the integral of the squared second derivative. This parameter is automatically selected by using the generalized cross validation (GCV) method of Fisher et al. (1995).

[...] extra hump to the curves. The respective spot and forward curves are also independently estimated using the following equations:

y(m) = β₀ + β₁ (1 − e^(−m/τ₁))/(m/τ₁) + β₂ [(1 − e^(−m/τ₁))/(m/τ₁) − e^(−m/τ₁)] + β₃ [(1 − e^(−m/τ₂))/(m/τ₂) − e^(−m/τ₂)]

and

f(m) = β₀ + β₁ e^(−m/τ₁) + β₂ (m/τ₁) e^(−m/τ₁) + β₃ (m/τ₂) e^(−m/τ₂).

In both Nelson and Siegel (1987) and Svensson (1994), our approach to obtain the discount function was to use Eq. (1).

The COBS formulation uses the approach described in Section 3. For comparison purposes we have two possible formulations, with and without constraints. We call the first specification restricted COBS. This formulation estimates the discount function using the COBS algorithm described in Section 3 with the additional linear constraints:

d(0) = 1,    (11)
d(m) > 0,    (12)
d'(m) < 0.   (13)

Eq. (11) implies that the discount curve starts at one, Eq. (12) restricts the discount curve to be positive, and Eq. (13) imposes the discount curve to be negatively inclined. Since our data set is by definition finite, we did not include the constraint lim_{m→∞} d(m) = 0. Spot and forward rates are obtained from the implied relationship with the discount rates, Eqs. (2) and (5).

The unrestricted COBS uses the same algorithm, however with two main differences. First, we estimate the spot instead of the discount curve, and forward and discount functions are obtained using Eqs. (1) and (4). Second, the estimation is made without the constraints (11)–(13). The idea of this specification is to isolate the robustness aspect of COBS in the analysis, since it estimates the conditional median function, from the inclusion of the constraints. In this aspect, the unrestricted COBS can be more directly compared to the also nonparametric method of smoothing splines. Figs. 1–4 plot estimated yield and forward curves for all the estimation techniques at some selected dates. US STRIPS market fits for yield and forward curves are shown in Figs. 1 and 2, while Figs. 3 and 4 show estimated yield and forward curves for the Brazilian swap market. In general, for the spot rates, we see that COBS gives a better adjustment than Nelson-Siegel family curves but without implying very jagged curves like the smoothing spline.
For each data point in our sample, we estimate a spot rate curve Looking at the graphs we also see that forward curves are much
directly by minimizing the above objective function. Forward and smoother when we use COBS rather than smoothing splines.
discount functions are found by their implied relationships with However, COBS suffers from similar problems to those found in
the discount curve, Eqs. (1) and (4). nonparametric methods in providing more volatile forward curves
The parameterized Nelson and Siegel (1987) and Svensson than parametric methods do. In particular, for the US STRIPS,
(1994) curves construct a parsimonious formulation for the spot we see that forward rates for restricted and unrestricted COBS
and forward curves using heuristic arguments based on the has the tendency to drop after 20–25 years maturities, which is
expectation theory of the term structure of interest rates; see Zivot not the case for Nelson and Siegel (1987) and Svensson (1994)
and Wang (2006). For the Nelson and Siegel (1987) specification, specifications.
we estimate spot and forward curves independently by estimating This drop in the forward curves after 20–25 years is justified
the following equations: by the convexity bias effect.2 As discussed in Phoa (1997) and
further explored by others, in general but not always, the yield on
1 − e−m/τ 1 − e−m/τ
y(m) = β0 + β1 + β2 − e−m/τ
−m/τ −m/τ
and 1 It is worth noting that the original algorithm for implementing COBS,
programmed by He and Ng (1999) was further developed by Ng and Maechler
f (m) = β0 + β1 e−m/τ + β2 m/τ e−m/τ (2007) providing faster computation, Specifically for our estimation exercise,
compared to the original COBS estimation package of He and Ng (1999),
where τ is a given adjustment parameter. The method of Svensson computations were on average 6.5 times faster using the Ng and Maechler (2007)
(1994) basically adds an extra term to the spot and forward Nelson algorithm.
and Siegel (1987) equations. The extra term intends to allow an 2 We thank an anonymous referee for this point.
Please cite this article in press as: Poletti Laurini, M., Moura, M., Constrained smoothing B-splines for the term structure of interest rates. Insurance: Mathematics and
Economics (2009), doi:10.1016/j.insmatheco.2009.11.008
ARTICLE IN PRESS
6 M. Poletti Laurini, M. Moura / Insurance: Mathematics and Economics ( ) –
a thirty-year Treasury bond is lower than the yield on a twenty-year bond. This is due to the higher convexity of thirty-year bonds when compared to twenty-year bonds. The market pays a premium for this effect, which Phoa (1997) calls a convexity adjustment. As we can see from Eq. (4), spot rates are an average of forward rates; therefore, a small difference between thirty- and twenty-year bonds implies sensible changes in the forward rates from twenty- to thirty-year maturities. This is exactly what is captured by nonparametric methods. However, it is worth noting that restricted COBS implies much less volatile changes than other nonparametric methods, a result already pointed out by the Barzanti and Corradi (1999) constrained B-spline estimation. The convexity bias effect is not observed for the Brazilian swap rates because, for this market, maturities are much shorter.

In a nonparametric estimation, the selection of the number and position of knots plays a crucial role in determining the shape near the boundary.³ As described at the end of Section 3, the COBS methodology makes use of a fully automatic procedure. More specifically, in our exercise, the value of lambda and the number of knots were selected using the SIC, and knot locations are defined uniformly in their percentile levels.⁴ For instance, if the number of knots is three, the median maturity will be the single internal point; see He and Ng (1999) for details. Since the procedure is fully automatic, for each cross-section term structure curve a different number of knots could be chosen. That is exactly the case for the US STRIPS and the Brazilian Swaps curves. Fig. 5 shows how this number changes for each daily curve. In general, the number of optimal knots varies from two to six and is between four and five most of the time.

In order to further investigate the effect of the knot selection procedure, we compare the COBS selection procedure with two other alternatives: the McCulloch (1975) and Litzenberger and Rolfo (1984) criteria. We use the McCulloch (1975) procedure to place

³ We thank an anonymous referee for this comment and for the idea of comparing the COBS knot selection procedure with other alternatives.
⁴ This is made by selecting the 'quantile' option in the COBS algorithm; it is also possible to choose locations uniformly distributed or to select knot positions manually.
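The percentile-based placement just described can be sketched in a few lines. This is a simplified stand-in for the 'quantile' option of the COBS package, not its actual code; the maturity vector is hypothetical.

```python
import numpy as np

def quantile_knots(maturities, n_knots):
    """Knots at equally spaced percentiles of the observed maturities.

    n_knots counts the two boundary knots, so n_knots=3 places a single
    internal knot at the median maturity (cf. He and Ng, 1999).
    """
    probs = np.linspace(0.0, 1.0, n_knots)   # 0 and 1 give the boundary knots
    return np.quantile(maturities, probs)

# hypothetical cross-section of maturities (years)
mats = np.array([0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 7.0, 10.0, 15.0, 20.0, 25.0, 30.0])
knots3 = quantile_knots(mats, 3)   # boundary knots plus the median maturity
knots5 = quantile_knots(mats, 5)   # three internal knots at the quartiles
```

Because the knots track the percentiles of each day's quoted maturities, their positions adapt automatically to cross sections with irregular maturity coverage.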
internal knots equally spaced across maturities. Notice that this changes the locations; however, the number of knots is still defined following the COBS automatic selection method in accordance with the SIC. Alternatively, using the procedure of Litzenberger and Rolfo (1984), we change both the locations and the number of knots. This procedure uses three internal knots at one-, five- and ten-year maturities, as short-, median- and long-term money.⁵ We also used different values of the penalizing parameter lambda, by picking values lower and higher than the optimal choice according to the SIC.

Fig. 6 illustrates the COBS restricted estimation with the default automatic selection procedure and using different lambda values and knot procedures. We can see that, by decreasing the smoothing parameter λ, the forward curves become more jagged, as expected. If we increase the value instead, the forward curve becomes smoother. Using the McCulloch (1975) knot selection or that of Litzenberger and Rolfo (1984), we in fact obtain smoother forward curves.

Table 1 exhibits the root mean square error (RMSE) results for each model in the two markets studied. In order to evaluate the alternative knot selection methods, we also include COBS with the McCulloch (1975) and Litzenberger and Rolfo (1984) methods, hereafter named COBS with Mc and COBS with LR. As we expected, the Brazilian Swaps have a lower fit due to fewer data points and lower liquidity. From Table 1 we can conclude that COBS is in an intermediate position between the nonparametric smoothing spline method and the parametric Nelson–Siegel family curves in terms of in-sample goodness of fit. Note that restricted and unrestricted COBS have very close values for RMSE; this suggests that the added constraints had little impact on in-sample fit. Using alternative knot selections increases the RMSE, but not by much; COBS with Mc and COBS with LR are still in an intermediate position between smoothing splines and parametric methods. One striking fact is that the Svensson (1994) specification gave the worst fit in all analyses. This result can be explained in part by the difficulties to attain convergence in the nonlinear equation

⁵ This choice corresponds to the US STRIPS; for the Brazilian Swaps, with shorter and irregular maturities, we placed knots at 1 month and at 25% and 75% of the longest maturity.
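The effect of the smoothing parameter on jaggedness can be reproduced with a generic smoothing spline. The sketch below uses SciPy's `UnivariateSpline`, whose parameter `s` bounds the residual sum of squares and thus plays the role of λ (larger `s` means a smoother curve); the data are simulated, not the paper's quotes.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# noisy synthetic "rates" over maturities, a stand-in for observed quotes
rng = np.random.default_rng(0)
m = np.sort(rng.uniform(0.1, 30.0, 60))
y = 0.05 + 0.01 * np.log1p(m) + rng.normal(0.0, 5e-4, m.size)

rough = UnivariateSpline(m, y, s=0.0)                    # interpolating: jagged
smooth = UnivariateSpline(m, y, s=m.size * (5e-4) ** 2)  # noise-level smoothing

grid = np.linspace(0.5, 29.5, 400)

def roughness(spline, grid):
    """Proxy for the integral of the squared second derivative."""
    d2 = spline.derivative(2)(grid)
    return float(np.sum(d2[:-1] ** 2) * (grid[1] - grid[0]))
```

Comparing `roughness(rough, grid)` with `roughness(smooth, grid)` makes the trade-off of the penalized objective explicit: the interpolating fit has near-zero residuals but a far larger integrated squared second derivative.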
Table 2
Violations of no-arbitrage conditions (%).

Method                      US STRIPS   Brazilian Swaps
COBS restricted             0           0
COBS restricted with Mc     0           0
COBS restricted with LR     0           0
COBS unrestricted           9.45        0.17
COBS unrestricted with Mc   9.56        0.19
COBS unrestricted with LR   9.69        0.22
Smoothing spline            17.15       0.93
Nelson–Siegel               0           0
Nelson–Siegel–Svensson      0           0

…the dynamic of the instantaneous short-term interest rate r(t) obeys the CIR square-root diffusion dr(t) = κ(θ − r(t)) dt + σ √r(t) dW(t). This model generates an affine term structure, and its solution for the bond price is:

P(t, T) = A(t, T) e^{−B(t,T) r(t)}, (15)

where P(t, T) is the price of a zero coupon bond at time t that matures at time T > t, and

A(t, T) = [ 2h e^{(κ+h)(T−t)/2} / (2h + (κ + h)(e^{h(T−t)} − 1)) ]^{2κθ/σ²}, (16)
B(t, T) = 2 (e^{h(T−t)} − 1) / (2h + (κ + h)(e^{h(T−t)} − 1)),
h = √(κ² + 2σ²).

[Fig. 5. Optimal number of knots selected for each daily curve; panels: Strips and Swaps; axes: optimal number of knots (2 to 6) against day.]

Fig. 6. Comparison of alternative knot selection methods for COBS—US STRIPS forward rates.

The theoretical discount curve is generated on a grid of maturities starting from 0.1 years and achieving a maximum maturity of 29 years. This particular size was chosen to match the number of observations we have for the US STRIPS market.

From this theoretical discrete discount function curve, we estimate a continuous discount function using the different methods we have employed so far: restricted and unrestricted COBS, smoothing spline, Nelson and Siegel (1987) and Svensson (1994). The continuous estimated curves are generated for maturities up to 30 years and compared to the theoretical discrete curve generated by the CIR model. The goal here is twofold. First, we want to test model consistency by comparing the CIR discrete theoretical discount curve with the estimated continuous curve. Second, we also estimate no-arbitrage violations for each method. In this setting, "model consistency" stands for being able to generate estimated continuous curves with good in-sample fit and fulfillment of the no-arbitrage conditions. These continuous curves are estimated from a set of theoretical data points generated by the CIR model. This simulation is just an illustrative exercise, although the CIR model can be considered a representative model for the shape of interest rate curves. The choice of the CIR model was motivated by the fact that this model, in particular, provides an analytical formula for generating the discount function, which helps the simulation process.

The results of the experiment are in Tables 3 and 4.⁶ We can see that they have the same pattern observed for the US STRIPS market. Absolute RMSE values and relative values across methods are about the same. For the frequency of no-arbitrage violations, however, we see an increase in the percentage. One possible explanation is that the parameter set implies a very fast decay of the discount rate and a large number of no-arbitrage violations.

The experiment illustrates that COBS is able to get a better fit than that provided by the parametric Nelson and Siegel (1987) and Svensson (1994) models, while also avoiding the overfitting

⁶ For the simulation exercise of the CIR model, we did not use alternative methods for the number and location of knots. The reasons for this are twofold: in the simulated data the knot location has no direct economic interpretation, and the results of Section 4.2 demonstrated that alternative knot selection did not bring significant differences in the estimation results.
Table 3
Root mean squared errors—CIR simulated models.

Method                   US STRIPS
COBS restricted          0.015470
COBS unrestricted        0.014193
Smoothing spline         0.008038
Nelson–Siegel            0.023210
Nelson–Siegel–Svensson   0.1281103

Table 4
Violations of no-arbitrage conditions (%)—CIR simulated models.

Method                   US STRIPS
COBS restricted          0
COBS unrestricted        16.5
Smoothing spline         23.5
Nelson–Siegel            0
Nelson–Siegel–Svensson   0

problems of nonparametric methods. We show that, even for the interest rate curve obtained from a theoretical model, there is a significant number of no-arbitrage violations if we try fitting by unconstrained nonparametric models. COBS will eliminate those no-arbitrage violations without losing the flexibility that is peculiar to nonparametric methods. Similar results are observed for direct estimation from observed data, looking at root mean square errors and the frequency of no-arbitrage violations.

5. Conclusions

COBS innovates the literature of term structure estimation by introducing qualitative no-arbitrage constraints and by providing an estimator robust to outliers. In fact, the results of Section 4 show that unrestricted nonparametric methods produce a significant number of no-arbitrage violations. Violations of those no-arbitrage conditions are not captured by usual fitting criteria like RMSE and can bear very large costs, especially for hedging operations.

The proposed methodology also produces more meaningful curves, in the sense that the spot curve shows reasonable, not very jagged, shapes, as expected by theoretical arguments as well as by practice. This robustness is due to the fact that it estimates a conditional median function using smoothing splines, instead of a conditional mean function. COBS also showed consistency with the CIR model. The experiment illustrated in Section 4 gave evidence that the other competing methods presented the same sort of problems shown in direct estimation using observed data.

In conclusion, we put the COBS method in an intermediate category between nonparametric and parametric methods. COBS combines the best aspects of both: it captures flexibility from nonparametric methods, and sensible shapes and fulfillment of no-arbitrage conditions from parametric methods. COBS' in-sample fit is also intermediate between smoothing splines and the Nelson–Siegel family curves. This is not particularly surprising, since COBS penalizes roughness less in exchange for observing the no-arbitrage constraints; however, it keeps a better fit than the Nelson and Siegel (1987) and Svensson (1994) curves.

The final point in the conclusion is that the COBS technique of He and Ng (1999) is a very competitive method to fit the term structure of interest rates. It combines the flexibility of nonparametric methods with the more sensible shapes of parametric methods. It ranks very well in the criteria of accuracy against smoothness, no-arbitrage fulfillment and model consistency, and avoids the problems of negative prices for long maturities discussed in Carriere (1999), as confirmed by the results shown in Table 2. In that sense, it is very appealing when compared to usual methodologies in term structure estimation.

Acknowledgements

We would like to thank the anonymous referees for valuable comments. We are grateful to participants at the 12th Time Series School Meeting, Gramado, Brazil, the VI Brazilian Finance Meeting, Vitória, Brazil, the XXVIII Brazilian Econometric Meeting, Salvador, Brazil, and the LACEA–LAMES 2006 meeting, Mexico City, Mexico, for their helpful comments on an earlier draft of this paper.

References

Ait-Sahalia, Y., Duarte, J., 2003. Nonparametric option pricing under shape restrictions. Journal of Econometrics 116, 9–47.
Albrecht, P., 1985. A note on immunization under a general stochastic equilibrium model of the term structure. Insurance: Mathematics and Economics 4, 239–244.
Anderson, N., Breedon, F., Deacon, M., Derry, A., Murphy, G., 1996. Estimating and Interpreting the Yield Curve. Wiley.
Ang, A., Sherris, M., 1997. Interest rate risk management: Developments in interest rate term structure modeling for risk management and valuation of interest-rate-dependent cash flows. North American Actuarial Journal 1 (2), 1–26.
Barzanti, L., Corradi, C., 1998. A note on interest rate term structure estimation using tension splines. Insurance: Mathematics and Economics 22, 139–143.
Barzanti, L., Corradi, C., 1999. A note on direct term structure estimation using monotonic splines. Rivista di Matematica per le Scienze Economiche e Sociali 22, 101–108.
Barzanti, L., Corradi, C., 2001. A note on interest term structure estimation by monotonic smoothing splines. Statistica LXI, 205–212.
Bosch, R.J.Y., Ye, Y., Woodworth, G.G., 1995. A convergent algorithm for the quantile regression with smoothing splines. Computational Statistics & Data Analysis 19, 613–630.
Boyle, P.P., 1978. Immunization under stochastic models of the term structure. Journal of the Institute of Actuaries 105, 177–187.
Boyle, P.P., 1980. Recent models of the term structure of interest rates with actuarial applications. In: Transactions of the 21st International Congress of Actuaries, vol. 4, pp. 95–104.
Bühlmann, H., 1995. Life insurance with stochastic interest rates. In: Ottaviani, G. (Ed.), Financial Risk in Insurance. Springer, Berlin, pp. 1–24.
Carmona, R., Tehranchi, M., 2006. Interest Rate Models: An Infinite Dimensional Stochastic Analysis Perspective. Springer.
Carriere, J.F., 1999. Long-term yield rates for actuarial valuations. North American Actuarial Journal 3, 13–22.
Carriere, J.F., 2000. Non-parametric confidence intervals of instantaneous forward rates. Insurance: Mathematics and Economics 26, 193–202.
Chan, K.G., Karolyi, G., Longstaff, F., Sanders, A., 1992. An empirical comparison of alternative models of short term interest rate. Journal of Finance 47, 1209–1297.
Choudry, M., 2005. The Handbook of Fixed Income Securities. McGraw-Hill.
Corradi, C., 1996. On the estimation of smooth forward rate curves from a finite number of observations: A comment. Insurance: Mathematics and Economics 18, 115–117.
Cox, J.C., Ingersoll, J.E., Ross, S.A., 1985. An intertemporal general equilibrium model of asset prices. Econometrica 53, 363–384.
Deacon, M., Derry, A., 1994. Estimating the term structure of interest rate. Working Paper 24. The Bank of England.
Delbaen, F., Lorimier, S., 1992. Estimation of the yield curve and the forward rate curve starting from a finite number of observations. Insurance: Mathematics and Economics 11, 259–269.
De Schepper, A., Goovaerts, M., Delbaen, F., 1992. The Laplace transform of annuities certain with exponential time distribution. Insurance: Mathematics and Economics 11, 291–294.
Duffie, D., 1996. Dynamic Asset Pricing Theory, 2nd ed. Princeton University Press.
Fisher, M., Nychka, D., Zervos, D., 1995. Fitting the Term Structure of Interest Rates Using Smoothing Splines. Finance and Economics Discussion Series. Board of Governors of the Federal Reserve System.
Hagan, P., West, G., 2006. Interpolation schemes for curve construction. Applied Mathematical Finance 13, 89–129.
Hardle, W., 1990. Applied Nonparametric Regression. Cambridge University Press.
He, X., Ng, P., 1999. COBS: Qualitatively constrained smoothing via linear programming. Computational Statistics 14, 315–337.
He, X., Shi, P., 1998. Monotone B-spline smoothing. Journal of the American Statistical Association 93, 643–650.
James, J., Webber, N., 2000. Interest Rate Modeling. John Wiley & Sons.
Jarrow, R., Ruppert, D., Yu, Y., 2004. Estimating the term structure of corporate debt with a semiparametric penalized spline model. Journal of the American Statistical Association 99, 57–66.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Koenker, R., Basset, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Ng, P., Portnoy, S., 1994. Quantile smoothing splines. Biometrika 81, 673–680.
Li, M., Yu, Y., 2005. Estimating the interest rate term structure of treasury and corporate debt with Bayesian splines. Journal of Data Science 3, 233–240.
Linton, O., Mammen, E., Nielsen, J., Tanggard, C., 2001. Estimating yield curves by kernel smoothing methods. Journal of Econometrics 105, 185–223.
Litzenberger, R.H., Rolfo, R., 1984. An international study of tax effects on government bonds. Journal of Finance 39, 1–22.
McCulloch, J., 1971. Measuring the term structure of interest rates. Journal of Business 44, 19–31.
McCulloch, J., 1975. The tax-adjusted yield curve. Journal of Finance 30, 811–830.
Nelson, C.R., Siegel, A.F., 1987. Parsimonious modeling of yield curves. Journal of Business 60, 473–489.
Ng, P., Maechler, M., 2007. A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling 7 (4), 315–328.
Panjer, H.H. (Ed.), 1998. Financial Economics: With Applications to Investments, Insurance and Pensions. The Actuarial Foundation, Schaumburg, IL.
Pedersen, H.W., Shiu, E.S.W., 1994. Evaluation of the GIC rollover option. Insurance: Mathematics and Economics 14, 117–127.
Pham, T.M., 1998. Estimation of the term structure of interest rates: An international perspective. Journal of Multinational Financial Management 8, 265–283.
Phoa, W., 1997. Can you derive market volatility forecasts from the observed yield curve convexity bias? The Journal of Fixed Income 6, 43–54.
Renshaw, A.E., Haberman, S., 2003. Lee–Carter mortality forecasting with age-specific enhancement. Insurance: Mathematics and Economics 33, 255–272.
Rolski, T., Schmidli, H., Schmidt, V., Teugels, J., 1999. Stochastic Processes for Insurance and Finance. Wiley, Chichester.
Shaefer, S.M., 1981. Measuring a tax-specific term structure of interest rates in the market for British government securities. Economic Journal 91, 415–438.
Shea, G., 1984. Pitfalls in smoothing interest rate structure data: Equilibrium models and spline approximation. Journal of Financial and Quantitative Analysis 19, 253–269.
Shiu, E.S.W., 1987. On the Fisher–Weil immunization theorem. Insurance: Mathematics and Economics 6, 259–266.
Shiu, E.S.W., 1988. Immunization of multiple liabilities. Insurance: Mathematics and Economics 7, 219–224.
Shiu, E.S.W., 1990. On Redington's theory of immunization. Insurance: Mathematics and Economics 9, 171–175.
Steeley, J.M., 1991. Estimating the gilt-edged term structure: Basis splines and confidence intervals. Journal of Business Finance and Accounting 18, 513–529.
Stevens, R., Waegenaere, A., Melenberg, B., 2009. Longevity risk in pension annuities with exchange options: The effect of product design. Insurance: Mathematics and Economics, in press (doi:10.1016/j.insmatheco.2009.09.005).
Svensson, L.E.O., 1994. Estimating and interpreting forward interest rates: Sweden 1992–1994. NBER Working Paper 4871. NBER.
Vasicek, O., Fong, H.G., 1982. Term structure modeling using exponential splines. Journal of Finance 37, 339–356.
Zaglauer, K., Bauer, D., 2008. Risk-neutral valuation of participating life insurance contracts in a stochastic interest rate environment. Insurance: Mathematics and Economics 43, 29–40.
Zivot, E., Wang, J., 2006. Modeling Financial Time Series with S-PLUS, 2nd ed. Springer-Verlag.
Emerging Markets Review 9 (2008) 247–265
Article history: Received 8 October 2008; accepted 18 October 2008; available online 31 October 2008.

JEL classification: G14; C22; C14.

Keywords: Market microstructure; Emerging market; Spread; Markov property; Asymmetric response; Quantile regression.

Abstract. This article provides an analysis of empirical microstructure for the BRL/US$ exchange rate market using high-frequency bid and ask quote data. The aims of the article are to verify the importance of the presence of asymmetric information in price dynamics, to build a model for the price discovery process, and to analyze the empirical determinants of the spread between bid and ask through a conditional model that captures an asymmetric response of the spread to past information. The asymmetric information hypothesis is tested through a nonparametric test of conditional independence for the Markov property. A model for price discovery is built using a vector error correction model between bid and ask, controlling for duration and volatility. From this vector, we build an equilibrium spread deviation series, and we show that the conditional distribution of equilibrium spread deviations responds asymmetrically to spread changes and to expected conditional volatilities and durations. This is done by using the quantilogram and a quantile autoregression as tools for modeling the asymmetry effects. We relate the findings to some facts presented in the theoretical literature on market microstructure.

© 2008 Elsevier B.V. All rights reserved.
1. Introduction
The analysis of empirical microstructure effects on exchange rate markets has gained great momentum
in recent years.¹ It is well recognized that, in the short run, asset pricing may be more closely related to market
¹ Reference works in the literature of exchange market microstructure are Frenkel et al. (1996) and Sarno and Taylor (2002). The monograph of Lyons (2001) is an extensive study of the microstructure of the exchange market based on order flow.
doi:10.1016/j.ememar.2008.10.003
248 M.P. Laurini et al. / Emerging Markets Review 9 (2008) 247–265
structures than the factors related to asset fundamentals, as pointed out by Flood and Taylor (1996). The
literature on market microstructure indicates that factors such as transaction costs, stock balance and
liquidity premia may play a more crucial role in prices in the short run than factors associated with
macroeconomic fundamentals.
The literature on exchange rates encourages the analysis of market microstructure effects by providing
evidence that the conventional macroeconomic approach to exchange rate determination can only explain
long-run movements and extreme situations, such as in hyperinflation events and exchange rate crises. In
normal exchange rate market situations, exchange rate movements are defined by the market
microstructure (e.g. Flood and Taylor (1996), Taylor (1995) and Frankel and Rose (1995)).
Another factor that allows assessing exchange rate market microstructure effects is the large availability
of information about intraday exchange rate operations, provided by proprietary trading systems, such as
Reuters 2000–2 Dealing System, the Electronic Broking System (EBS), Spot Dealing System and, in Brazil,
the Sisbex system. Information is collected systematically and made publicly available through systems
such as Reuters and Bloomberg Data License (used herein), allowing for comprehensive studies on market
microstructure using high frequency bid and ask quote data (tick by tick operations).
The availability of this information allows assessing some exchange rate market characteristics that cannot be
systematically explained by usual macroeconomic models. Among unexplained effects we have the persistence
of exchange rate returns in intraday data, related to deviations of the returns from the martingale property, which translate into violations of market efficiency and of the no-arbitrage principle. Other effects that are
not accounted for by macroeconomic analysis are the determinants of the spread between bid and ask; the
importance of the information captured by order flows and its predictive power over future rates; the impacts of
chartist analysis on the exchange rate market; influence of trading volume, spatial location of agents and
volatility in price setting; and the importance of private information for the determination of prices and spreads.
A remarkable difference between market microstructure models and macroeconomic models concerns
assumed theoretical restrictions. Macroeconomic models are often based on representative agent
structures, symmetric information, rational expectations and absence of transaction costs. Market
microstructure models, on the other hand, are often characterized by asymmetric and heterogeneous
structures.2 There are several types of agents in this market, such as traders, market makers and customers
with distinct strategic goals and information sets.
Since the exchange rate market is decentralized3 and its operators are physically distant, the
information sets are distinct among agents, rendering private information relevant to the price setting
process. These different sets of information can give rise to arbitrage situations, which are indeed quite common
and could affect the degree of market efficiency, as reported in Flood (1994).
The wealth of information obtained from intraday data allows for the assessment of issues that would
not be accounted for by lower frequency data analysis. In addition to prices, intraday transactions include
other interesting sources of information, such as the time elapsed between two operations in the market
(order durations). The time elapsed between two orders is linked to the arrival of new information in the
market and is also an inherent liquidity measure.4 This information is relevant in market microstructure
models since prices are likely to be affected by recent transactions (e.g. Hasbrouck (1991) and Dufour and
Engle (2000)), that is, prices and the spread in the subsequent transaction will be affected by previous
prices and also by trading volume, spread, and time of previous transactions.
The aim of the present article is to assess the empirical effects of market microstructure based on intraday
bid and ask quotes in the R$/US$ exchange rate market. We evaluate the importance of private information in
the market by testing the Markov property (Section 5). The result of this test motivates the development of an
empirical price discovery model: a vector error correction model for bid and ask prices, with previous price
durations and quote volatility as explanatory variables, allowing us to check the impact of recent operations
on prices.
2. See O'Hara (1995) for a review of the theoretical models of asymmetry of information in the context of market microstructure, and Hasbrouck (2007) for empirical implications.
3. Sarno and Taylor (2002) contains a description of the structures and agents in exchange markets. For a detailed description of the BRL/US$ exchange market, see Garcia and Urban (2004).
4. For a review of the informational content of durations and econometric models for conditional durations, see Engle and Russell (1998) and Engle (2000). See Fernandes and Grammig (2005a,b) and references therein for the specification and testing of conditional duration models.
220
M.P. Laurini et al. / Emerging Markets Review 9 (2008) 247–265 249
Using the results of this model, we assess the determinants of equilibrium spread deviations between
bid and ask by developing a model that enables the asymmetric response of the spread to the previous
information set, by means of the quantilogram (Linton and Whang (2007)) and quantile autoregressions
(Koenker and Xiao (2006)).
The paper is organized as follows: Sections 2 and 4 describe the data used, show some characteristics of
these series and comment on the relationship with previous studies using exchange market data; Section 5
checks for the presence of asymmetric information by testing the Markov property; Section 7 describes the
vector error correction model used for price discovery and analyzes the effects of asymmetry on the
conditional distribution of the spread. Section 8 concludes.
Despite the extensive literature on exchange rate market microstructure, there has been a scarcity of
research into the BRL/US$ exchange rate microstructure, one of the most significant emerging markets. The
most important studies on the BRL/US$ exchange rate market microstructure are those by Garcia and Urban
(2004) and Wu (2007). The former takes an in-depth look at the institutions and the operation of the
interbank currency exchange market in Brazil, describing the agents, institutions and the existing trading
mechanisms. In addition, the paper provides econometric evidence of a shift of Granger causality from the
futures market to the spot market, using daily data.
The article by Wu (2007) is a comprehensive study of Brazilian exchange market microstructures based
on daily data collected from the Sisbacen system, the Central Bank database containing all the consolidated
currency transactions in Brazil. Wu (2007) is the only study in the international literature whose order flow
database covers 100% of a country's official currency operations; this makes it possible to sort out the
effects of exchange rate movements related to trading and financial operations and to Central Bank
interventions in the currency exchange market, and the consequences of these movements on the
exchange rate.
There are, however, criticisms of the database used by Wu (2007). The first concerns the fact that the data
set does not correspond to the information publicly available at the time of the agents' decisions. Some of
the information used is not disclosed by the Central Bank, and some is released only with a long delay; it is
therefore not the same data set used by agents in the intraday decision-making process.
Our analysis differs from previous ones because we use high-frequency bid and ask quotes. This method
is analogous to those used by Goodhart (1989) and Bollerslev and Domowitz (1993) and allows us to assess
the effects of microstructure on the operation of the currency exchange market5 as observed in intraday
quotes.
Our data are based on the spot market quotes provided by the Bloomberg Data License database. This
database format is known as FXFX DATA. This system collects information about the operations carried out
in several markets, including Sisbex (Trading system of the Brazilian Mercantile & Futures Exchange) and
over-the-counter operations, based on information gathered from several market participants. The sample
used in this study contains all orders fed into the system, starting on May 28, 2006 and ending on November
30, 2006. The data set format is shown in Table 1.
The data file contains eight columns. The first column specifies the traded asset (currency). The second
column contains the time at which the operation was fed into the system, with accuracy in seconds. The
third and fourth columns show the bid identification and the bid price; the fifth column presents the agent
who provided the bid value. The sixth, seventh and eighth columns show the ask identification, the ask
price, and the operator who provided the ask value. Note that the data do not always specify the operator in
charge of the bid and ask, since anonymous orders are allowed.
The high-frequency exchange rate market data have several limitations compared to other databases
used in empirical microstructure. The Trade and Quote (TAQ) database for stocks traded on the New York
Stock Exchange (NYSE), for instance, contains additional information such as the price and volume of
transactions, instead of only indicative quotes. Exchange rate data show only the behavior of quotes, but do
5. Lyons (2001) contains a full description of the various databases used in the microstructure of foreign exchange markets.
[Table 1: Data format]
not reveal the traded value. Other limitations include the absence of information on the trading volume and
the impossibility to find out whether the order was initiated by a buyer or seller, which is provided by the
Transaction Orders and Quotes (TORQ) database compiled by Hasbrouck (1992). The lack of this
information on actual transactions (transaction prices, volumes, and whether the transaction was initiated
by a buyer or a seller) results from the absence of a disclosure requirement for exchange rate operations.
Although this lack of disclosure limits the scope of the market microstructure analysis, one should note
that this is the data set publicly available to spot market operators, and it is the one analyzed in other
studies such as Goodhart (1989) and Bollerslev and Domowitz (1993).
There is some evidence that the omission of transactions does not affect the estimation results, as
pointed out by Goodhart (1989), but the literature has demonstrated some problems with the use of quotes
because they are just indications and not transactions. Lyons (1996) shows that interdealer spreads are
lower than those of indicative quotes.6 In response to such criticisms, note that these quotes correspond
to the information made publicly available in the spot market; studies such as Lyons (1996), which use data
from a private dealer, have two shortcomings: the time interval is too short (weeks at most), and they
capture a private dealer's trading behavior, which does not necessarily represent the behavior of the
market, given the large heterogeneity of agents in the exchange rate market.
Note that the high-frequency data used in this paper contain two important pieces of information that
are not provided by other studies involving the exchange rate market, such as that conducted by Wu
(2007). The information on trading time allows us to build a variable for order duration, which is given by
the time elapsed between the arrival of two orders to the market. This variable provides information about
market liquidity and volatility, and its behavior reflects the arrival of new information to the market (e.g.
Engle and Russell (1998), Engle (2000) and Fernandes and Grammig (2005a)). Another variable is the
conditional volatility derived from a GARCH model for the mid-quote between the bid and ask, used as a
proxy for the traded price. The use of a mid-quote as trading price is justified in the microstructure
literature, since in some models, the mid-quote is related to the fundamental asset price.7
Time series containing information on bid and ask prices are used in this paper, as shown in Table 1. Two
additional variables are built: the duration variable, given by the number of seconds between the arrival of
two orders, and the mid-quote variable, given by the average between the bid and ask prices, which will be
used to build the volatility proxy using a GARCH model.
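The construction of these two derived variables can be sketched as follows; this is a minimal numpy illustration using a handful of hypothetical ticks (the times and quotes below are invented, not taken from our database):

```python
import numpy as np

# Hypothetical ticks: arrival times (seconds since midnight) and bid/ask quotes.
times = np.array([36000.0, 36002.0, 36002.5, 36010.0, 36011.0])
bid = np.array([2.1600, 2.1602, 2.1601, 2.1605, 2.1604])
ask = np.array([2.1625, 2.1628, 2.1626, 2.1631, 2.1629])

# Duration: number of seconds between the arrival of two consecutive orders.
duration = np.diff(times)                 # 2.0, 0.5, 7.5, 1.0

# Mid-quote: average of bid and ask, used as a proxy for the traded price.
mid_quote = (bid + ask) / 2.0             # first value is 2.16125

# Log returns of the mid-quote feed the GARCH volatility proxy.
mid_returns = np.diff(np.log(mid_quote))
print(duration, mid_returns)
```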
Real-time storage of data yields a relatively large number of operations with incorrect information.8 The
correction was made by eliminating clearly discrepant observations due to mistypings (e.g.: a bid recorded
as 0.219 instead of 2.19), observations with negative spreads or spreads that are not compatible with the
6. For this conclusion Lyons (1996) uses the order flow from a particular dealer.
7. See Hasbrouck (2007) for a review of procedures and models used in the empirical modeling of market microstructure.
8. See Falkenberry (2002) for details on the problems in the processing of high-frequency data.
local spread behavior. These outliers were removed by a filtering rule that regards as outliers operations
whose bid-ask spread lies more than 10 standard deviations from the mean spread. This
rule captures mainly mistypings in the database. After filtering out this information, our database
comprises 279,737 tick-by-tick observations between May 28, 2006 and November 30, 2006.
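This filtering rule can be sketched as follows; the quotes are simulated for illustration, and only the 10-standard-deviation threshold comes from the text:

```python
import numpy as np

def filter_spread_outliers(bid, ask, n_sd=10.0):
    """Keep quotes whose bid-ask spread is non-negative and lies within
    n_sd standard deviations of the mean spread; mistypings fall outside."""
    spread = ask - bid
    mu, sd = spread.mean(), spread.std()
    return (spread >= 0) & (np.abs(spread - mu) <= n_sd * sd)

# Illustrative quotes with one decimal-point mistyping (0.219 instead of ~2.19).
rng = np.random.default_rng(0)
bid = 2.16 + 0.001 * rng.standard_normal(1000)
ask = bid + 0.0025
bid[500] = 0.219                     # mistyped bid produces a huge spread
keep = filter_spread_outliers(bid, ask)
print(keep.sum())                    # 999 observations survive the filter
```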
Another consistency check concerns the time of each transaction: whether it was recorded correctly and
whether it follows the correct sequence. Note that, in line with the microstructure
literature, we did not restrict trading hours because exchange rate market quotes are negotiated around the
clock, since this market operates nonstop in the three major trading locations (United States, Europe, and
Japan). Restricting the sample would exclude data that serve as a benchmark for pricing other financial
instruments, as information from other markets outside Brazilian trading hours may affect exchange rate
determination in the domestic market. Around 4% of the quotes are negotiated outside the normal business
hours of the Sisbex system at the BM&F, the most important market for the BRL/US$ exchange rate.
The volatility variable was constructed using a GARCH(1,1) model for the mid-quote price, so that it can
serve as a proxy for the volatility of the traded price, which is not directly observed. The GARCH(1,1)9
model was estimated by maximum likelihood, with estimated parameters (ω = 1.51E−09, α = 0.210275,
β = 0.7773732). The duration series is given by the number of seconds between two orders.
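The variance recursion behind this proxy can be sketched as follows; the returns below are simulated, and only the (ω, α, β) values come from the text:

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()        # initialize at the sample variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Simulated mid-quote log returns; (omega, alpha, beta) as estimated in the text.
rng = np.random.default_rng(1)
r = 0.0002 * rng.standard_normal(500)
sigma2 = garch11_variance(r, omega=1.51e-9, alpha=0.210275, beta=0.7773732)
print(sigma2[:3])                    # conditional variance proxy, tick by tick
```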
Figs. 1 and 2 show the graphs for the bid–ask series, spread, duration and volatility variables. The figures
reveal that the periods with an increase in spreads correspond to the highest values obtained for the bid–
ask series. Table 2 presents the descriptive statistics for these variables, as well as Phillips–Perron unit root
tests for the bid–ask series and bid–ask log returns. The test results indicate that the unit root null is not
rejected for the bid–ask series at the 1% significance level, while the log return series are stationary at the
1% level. As usual in financial time series, the Gaussian distribution is rejected for the bid and ask series.
9. The complete model is omitted for reasons of space, but can be obtained from the authors upon request.
A stylized fact in high-frequency financial series is the presence of intraday periodic patterns, e.g. Zivot
and Wang (2003). To analyze these patterns, we fitted a nonparametric smoothing spline to the spread,
duration and volatility series. A smoothing spline (Green and Silverman (1994)) can be defined as the
solution to the minimization of the following function:
S_λ(g) = Σ_{i=1}^{n} (Y_i − g(X_i))² + λ ∫ (g″(x))² dx        (1)
where g can be any twice-differentiable curve and λ controls the smoothness of the fit, governing the
trade-off between residual minimization and the roughness of the fitted curve. To obtain the intraday
patterns, we fitted the smoothing splines using 24 knots for each series. No significant
periodic patterns were found for the bid–ask log returns. Fig. 3 shows the patterns obtained for the spread,
duration and volatility.
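The fit of Eq. (1) can be sketched with scipy's smoothing spline, where the parameter s plays the role of the penalty λ; a U-shaped intraday spread pattern is simulated here for illustration (none of the numbers are from our data):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Simulated intraday spread with a "U" shape: higher at open and close.
rng = np.random.default_rng(2)
hour = np.linspace(0.0, 24.0, 500)
spread = (0.0025 + 0.001 * ((hour - 12.0) / 12.0) ** 2
          + 0.0002 * rng.standard_normal(500))

# Smoothing spline; s controls the smoothness/fidelity trade-off of Eq. (1)
# (here set to the expected sum of squared noise, n * noise_sd**2).
fit = UnivariateSpline(hour, spread, s=500 * 0.0002 ** 2)
pattern = fit(hour)
print(float(fit(0.0)) > float(fit(12.0)))   # True: spread higher at the open
```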
The observed patterns show that the spread is often higher outside Brazilian market trading hours (the
2–6 a.m. interval). The figure shows that the spread tends to increase at opening and closing hours
Table 2
Descriptive statistics

                        Ask        Bid        Ask log returns  Bid log returns  Duration   Variance
Mean                    2.163719   2.162170   −2.67E−07        −2.07E−07        12.75741   4.17E−08
Median                  2.162000   2.160500    0.000000         0.000000         1.999999   2.47E−08
Maximum                 2.236900   2.355400    0.010915         0.0100925        3563.100   0.000509
Minimum                 2.122600   2.120000   −0.009199        −0.008751         0.000000   1.54E−08
Standard deviation      0.022946   0.028870    0.000187         0.000203         72.92399   1.76E−06
Skewness                0.502882   0.494570    1.111605        −0.136050         24.96540   54.49727
Kurtosis                2.515426   2.490172    247.0023         176.0372         814.3413   4827.558
Jarque–Bera             14518.00   14433.53    6.93E+08         2.20E+11         7.96E+09   7.46E+14
Prob.                   0.000000   0.000000    0.000000         0.000000         0.000000   0.000000
p-value, PP unit root   0.0372     0.0348      0.0001           0.0001           –          –
Sample size             279,737    279,737     279,736          279,736          279,736    279,736
in the Brazilian currency exchange market, which is analogous to the “U” pattern obtained by Bollerslev and
Domowitz (1993) and consistent with their theoretical model for spread determination. A relevant effect is
that the spread tends to increase around 5 p.m., which is the time
limit for currency operations at Sisbacen. This effect can be rationalized by the fact that unrecorded
operations must be canceled, producing adjustment costs, inventory imbalance, and problems in risk
margins.
The periodic pattern for the duration series shows that the time between two quotes is quite long
outside Brazilian trading hours and shorter within them, with a slight increase around 5 p.m. The periodic patterns for
the volatility series show large volatility outside trading hours (due to the smaller number of quotes, price
jumps are higher), and that volatility tends to increase during the opening and the closure of trading hours
in Brazilian market, which is possibly associated with the adjustment of trading positions. Note that these
figures indicate a possible correlation between spread, duration and volatility. This association will be
tested in Sections 6 and 7.
An additional hypothesis about the distribution of spreads concerns the existence of clusters in the
spread, which may be consistent with price collusion (e.g. Christie et al. (1994), Hasbrouck (1999)). The
hypothesis of price collusion can be summarized as the tendency of quoted spreads to cluster at multiples
of the minimum allowed variation. In the case of exchange rate series, for instance, the minimum spread is
0.001, but the spread distribution concentrates on certain multiples of this minimum value. An easy way to
test this effect is by checking the distribution of the last digit of the spread.
Under the null of no price clustering, the last digit of the spread should be uniformly distributed and the
proportion of values in each digit should be statistically equal. This is the approach followed by McGroarty
et al. (2006), who provide a comprehensive analysis of the clustering effect on exchange rates.
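This last-digit test can be sketched with a χ² goodness-of-fit test against the uniform distribution; the counts below are invented to mimic the clustering pattern described, they are not the Table 3 figures:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical counts of the spread's last digit (0-9), clustered on 1-3.
counts = np.array([5000, 90000, 80000, 60000, 12000, 9000,
                   8000, 6000, 5000, 4737])

# Under the null of no price clustering, each digit has probability 1/10;
# chisquare tests the observed counts against this uniform benchmark.
stat, pvalue = chisquare(counts)
print(pvalue)                        # essentially zero: uniformity rejected
```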
Table 3 shows the distribution of the last digit of the spread. Note that the last digit concentrates on the
values 1, 2 and 3, indicating that spread values tend to range from 0.001 to 0.003. The test is carried out
using the χ² test for equality of proportions, which rejects the null hypothesis of equal proportions. Even
though the test rejects the null of no clustering, it does not indicate the possible cause of the price clusters.
Hasbrouck (1999) discusses some possible causes for this effect and complements the analysis with a
dynamic model, in which price clustering may be related to a stochastic cost of liquidity provision incurred
by the market maker. In Section 6 we show that the equilibrium spread has a value around 0.0025, and that the
[Table 3: Distribution of the spread last digit]
concentration of values immediately below 0.001 and 0.003 units is related to short-run deviations from
the equilibrium spread.
5. Markovian property
The market efficiency property, obtained from the assumption of agents' rationality and efficient
processing of the available data, implies that asset prices should be compatible with a first-order Markov
process. Thus, the price at time t should depend only on the most recent information at t − 1 plus an
innovation process:10

P_t = P_{t−1} + e_t        (2)
If the information is efficiently processed, all the information available up to period t − 1 should be
contained in price Pt − 1, and thus price variation should correspond to a nonsystematic error process.11
Another characteristic, related to no-arbitrage conditions (e.g. Harrison and Kreps (1979) and Harrison and
Pliska (1981)), is that the conditional expectation satisfies E^Q[P_{t+k} | F_t] = P_t; that is, under the
risk-neutral measure the price should be a martingale, which leads to the concept of an equivalent
martingale measure.
However, microstructure models based on asymmetric information, such as those by Glosten and
Milgrom (1985) and Easley and O'Hara (1987), predict that the existence of different information sets
between agents affects the Markov property of bid and ask prices. In these models, asymmetric information
causes prices at t to depend upon the whole trading history and not only upon the most recent information,
invalidating the Markov property for prices and indirectly characterizing some type of market inefficiency.
A discussion of this issue can be found in Flood (1994), who shows that the decentralization of agents makes
the currency exchange market informationally less efficient than stock markets. Decentralization slows
down the dissemination of information, and therefore prices are correlated not only with the most recent
price, but also with a long set of past prices.
Thus, we can check for the presence of asymmetric information using Markov property tests. The test
proposed by Fernandes and Amaro de Matos (2007) takes into account the irregular pattern of price quotes
over time in high-frequency financial data. This is a nonparametric test for conditional independence,
based on the null hypothesis that if the Markov property holds, the length of time between both operations
should be independent of the realization of the variable related to asset prices, that is, the spread between
bid and ask.
10. Formally, the information process is a filtration F_t given by an increasing sequence of sub-sigma-algebras B_t ⊂ B_u ⊂ B for 0 ≤ t ≤ u, defined on a probability space (Ω, B, P). We assume the usual filtration throughout and suppress it from the notation.
11. Different types of market efficiency correspond to different assumptions about the process εt. Type III efficiency is associated with an uncorrelated process εt; type II efficiency imposes the more restrictive assumption that εt is independent; and the most restrictive, type I efficiency, assumes that εt is independent and identically distributed. See Campbell et al. (1997).
[Table 4: Nonparametric test of Markov property]
The null hypothesis of the test derived by Fernandes and Amaro de Matos (2007) is given by:

H_0: f_{iXj}(d_i, x, d_j) = f_{i|X}(d_i | x) f_{Xj}(x, d_j)        (3)

where f_{iXj}(d_i, x, d_j) is the joint density of duration d_i, spread x and duration d_j; f_{i|X}(d_i | x)
is the conditional density of duration d_i given the spread; and f_{Xj}(x, d_j) is the joint density of
spread x and duration d_j, for i > j. If this conditional independence property holds, the null hypothesis in
Eq. (3) is equivalent to the validity of the Markov property.
The test derived by Fernandes and Amaro de Matos (2007) is based on the weighted quadratic distance
between f_{iXj}(d_i, x, d_j) and f_{i|X}(d_i | x) f_{Xj}(x, d_j), with the densities replaced by nonparametric
density estimators. The test statistic is given by:

Λ̂ = (1/n) Σ_{k=1}^{n} w(d_{k+j}, X_k, d_k) [ f̂_{iXj}(d_{k+j}, X_k, d_k) − (f̂_{iX}(d_{k+j}, X_k) / f̂_X(X_k)) f̂_{Xj}(X_k, d_k) ]²        (4)

where the estimators f̂(•) are kernel density estimators of the joint, marginal and conditional densities and
w is a weighting function. Fernandes and Amaro de Matos (2007) show that the standardized statistic in
Eq. (5) has a standard normal asymptotic distribution:

λ̂_n = (n b_n^{3/2} Λ̂ − b_n^{−3/2} δ̂_Λ) / σ̂_Λ  →_d  N(0, 1)        (5)

with

δ̂_Λ = (e_k/n) Σ_{i=1}^{n} w(d_{k+j}, X_k, d_k) f̂_{iXj}(d_{k+j}, X_k, d_k),
σ̂²_Λ = (υ_k/n) Σ_{i=1}^{n} w²(d_{k+j}, X_k, d_k) f̂³_{iXj}(d_{k+j}, X_k, d_k)        (6)

where e_k and υ_k are constants that depend on the selected kernel function.
To test the Markov property on prices, we followed the method proposed by Fernandes and Amaro de
Matos (2007), using the spread series at time t and the duration series at times t + 1 and t, in log form
(log-spreads and log-durations), in line with the same methodology. We computed the marginal, conditional
and joint density estimators using a quartic kernel and the same bandwidth-selection rules applied by
Fernandes and Amaro de Matos (2007). Table 4 shows the Markov property test results both for raw spreads
and durations and for spreads and durations adjusted by removing the periodic pattern found in Section 4.
The results indicate rejection of the null hypothesis that
the first-order Markov property holds at any significance level in both data sets. This evidence supports
the conclusion reached by Richard Lyons: “Contrary to the asset approach, exchange rate determination
is not wholly a function of public news,”12 where we take the public information to be the past
information on spreads and durations.
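The intuition of the test can be illustrated with a much cruder, discretized check: bin spreads and subsequent durations into quantile classes and test independence of the resulting contingency table. This is not the Fernandes and Amaro de Matos (2007) statistic, only a simplified stand-in on simulated data in which the spread does, by construction, carry information about the next duration:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 5000
spread = rng.exponential(0.0025, n)
# Next-period durations depend on the current spread (dependence by design).
duration_next = rng.exponential(10.0, n) * (1.0 + 5.0 * spread / 0.0025)

def quantile_bins(x, nbins=4):
    """Assign each observation to one of nbins quantile bins."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, nbins + 1)[1:-1])
    return np.searchsorted(edges, x)

table = np.zeros((4, 4))
for i, j in zip(quantile_bins(spread), quantile_bins(duration_next)):
    table[i, j] += 1.0
stat, pvalue, dof, expected = chi2_contingency(table)
print(pvalue)                        # very small: independence rejected
```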
The rejection of the null hypothesis of the Markov property confirms the presumed existence of
asymmetric information effects pointed out by Glosten and Milgrom (1985) and Easley and O'Hara (1987)
for the exchange rate market. Our results are analogous to those obtained by Flood (1994), showing that the
12. Lyons (2001), p. 9.
differential information between agents is key to the determination of spreads and that the existence of
different information sets affects the agents' price discovery process. Note that it is possible to explain the
violation of the Markov property by the operational structure of the currency exchange market based on a
framework that includes several dealers with different locations.13
The rejection of the Markov property indicates that the price discovery process is based not only on the
price established in the immediately preceding period, but on the whole set of past information. Note that
this property directly affects the model used for price discovery, which requires a larger number of lags to
fit the short-run dynamics.
6. Price discovery

One of the predictions of the market microstructure literature in the presence of asymmetric
information is that agents should discover the actual equilibrium price of the asset in a process known as
price discovery.14 In this process, agents seek to determine the fundamental asset price, which is contained
in current quotes but contaminated with microstructure noise. When the price process is based on the
existence of several prices for the same asset, e.g. stocks traded on several stock exchanges or existence of
bid and ask prices, the discovery of the fundamental asset price is related to a mechanism of search for an
equilibrium price between several quotes for the same asset. This is equivalent to the existence of
correction mechanisms for deviations of prices in each quote from equilibrium prices.
This multivariate price discovery process can be represented by a vector error correction model in event
time.15 By assuming a bivariate vector of prices Pt = [p1t p2t], the vector error correction model is represented
as follows:
ΔP_t = μ_0 + A_1 ΔP_{t−1} + A_2 ΔP_{t−2} + … + A_k ΔP_{t−k} + γ(Z_{t−1} − μ_1) + λ_1 X_{t−1} + … + λ_j X_{t−k},   Z_{t−1} = p_{1,t−1} − B_1 p_{2,t−1}        (7)
In this model, vector Zt − 1 represents the deviations of the long-run equilibrium values between p1 and
p2 and B1 are the coefficient vectors in the equilibrium relationship; γ is the coefficient vector that controls
the error correction mechanism; coefficients Ai represent the short-run coefficients; Xt − k is a vector of
explanatory variables that are not cointegrated with p1 and p2 and λ is a coefficient vector that captures the
influence of Xt − k on the short-run dynamics of ΔPt.
The VECM (vector error correction model) seeks to decompose the price adjustment dynamics into two
components — one that is linked to the short-run dynamics, given by components AiΔPt − k and λjXt − k and
the other one linked to the dynamics of the long-run equilibrium deviations given by cointegration vector
Zt − 1. In the context of the price discovery model, the cointegration vector represents the equilibrium
between two asset price measures, and this equilibrium includes a measure of the fundamental asset price,
given by Zt − 1 in the VECM.
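A stripped-down numeric sketch of the error-correction mechanism in Eq. (7), on simulated cointegrated log bid/ask series with a single lag and two-step OLS (the model in the text uses 120 lags, exogenous regressors and Johansen's procedure; all series and parameters below are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
# Common efficient-price component (random walk) plus stationary half-spreads.
p = np.log(2.16) + np.cumsum(0.0002 * rng.standard_normal(n))
log_bid = p - 0.0006 + 0.0001 * rng.standard_normal(n)
log_ask = p + 0.0006 + 0.0001 * rng.standard_normal(n)

# Step 1: equilibrium deviation Z_t = log_ask - B * log_bid, with B from OLS.
B = np.polyfit(log_bid, log_ask, 1)[0]
Z = log_ask - B * log_bid

# Step 2: loading coefficients gamma, from regressing each price change
# on the lagged (demeaned) equilibrium deviation.
dz = Z[:-1] - Z.mean()
gamma_ask = np.polyfit(dz, np.diff(log_ask), 1)[0]
gamma_bid = np.polyfit(dz, np.diff(log_bid), 1)[0]
print(gamma_ask < 0 < gamma_bid)     # True: ask falls and bid rises when the
                                     # spread is above its equilibrium value
```

The opposite signs of the two loadings reproduce, in miniature, the adjustment mechanism estimated in Table 6.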
In our study, the price vector is given by the bid and ask prices Pt = [bidt askt], and thus the cointegration
vector captures the equilibrium spread relationship given by ask − B·bid. The aim of the VECM
proposed in this study is to decompose bid and ask variations at time t into two components: a short-run
component given by past bid and ask variations and explanatory variables and another component linked
to the long-run equilibrium, given by the correction of the deviations of the long-run relationship between
bid and ask.
The violation of the Markov property shows that the price discovery process cannot be based only on
the set of information about the immediately previous observation (ΔPt − 1 and Xt − 1), since quotes at time
t − 1 do not contain all the information available in the market (the private information shown in previous
transactions is not instantly incorporated into the prices, causing the violation of the Markov property
presented in Section 5). The price discovery process does not depend only on the immediately previous
13. See Lyons (2001) for a discussion of the effects of a multiple-dealer structure on the foreign exchange market.
14. For further references and econometric models of price discovery, see Hasbrouck (1988, 1991, 1996, 2007).
15. For a review of multivariate models of price discovery see Hasbrouck (2007), and also Engle (2000) for models in event time. Event time uses the order of operations as the time index, replacing calendar time as the index of the stochastic process and generating irregularly spaced time series. Hasbrouck (2007) discusses the advantages of event time in microstructure studies, which are related to time deformation processes.
[Table 5: Cointegration tests]
price, but on the whole trading history, a property that is compatible with some asymmetric information
models such as Glosten and Milgrom (1985) and Easley and O'Hara (1987).
In our VECM model, we also included the values of past durations and past conditional volatilities of the
mid-quote price as explanatory variables. These two variables are included as a way to check whether the
bids and asks respond to the effects of liquidity and uncertainty, using durations and volatilities as proxies
for these effects.
The specification of the VECM is valid in the presence of an equilibrium vector between the bid and ask,
a hypothesis that can be verified using a cointegration test. To test for the existence of an equilibrium
relationship, we used Johansen's cointegration test, whose results for the bid and ask log series are shown
in Table 5. The specification of the cointegration test is an error correction model with 120 lags for the bid
and ask log variations, 24 lags for past durations and 20 lags for past volatilities, with the specification
determined by the Schwarz information criterion. As discussed in Section 5, this large number of lags is
related to the violation of the Markov property. The test results show that the null hypothesis of no
cointegration vector is rejected with a p-value of 0.001 by both tests (maximum eigenvalue and trace)
obtained from Johansen's procedure, indicating the existence of a long-run equilibrium mechanism between
the bid and ask logs and the validity of the vector error correction model.
Note that Johansen's cointegration test is based on the assumptions of normally distributed residuals
and the absence of structural breaks. The normality hypothesis is not valid for our data set, since the
kurtosis and skewness indicate non-Gaussian distributions; the critical values used by the test may
therefore be affected by this violation. Note, however, that there is an a priori economic reason for the
existence of a cointegration vector between the bid and ask: an imbalance between them, representing a
non-stationary spread, would lead to systematic arbitrage opportunities. Thus, we did not reject the
evidence in favor of the cointegration hypothesis, even with possible distortions in the power of the
test.
The vector error correction model is partially shown in Table 6, where we present the estimated
cointegration vector (cointegration equation), the loading matrix for the correction of long-run deviations
(error correction) and the first two lags of the short-run mechanism. The cointegrating equation shows that
the normalized cointegration vector for the bid–ask logs is [1 − 1.000949], which represents an equilibrium
spread of 0.00259.
This equilibrium spread value can be explained by three basic factors: costs related to the market dealers'
functions, costs of carrying currency inventory, and a factor related to asymmetric information, given by
adverse selection. The dealers' costs are linked to the provision of immediate liquidity. The inventory costs
are related to the provision of liquidity and to the possibility that the dealers may be trading with agents
who have privileged information (insiders). The adverse selection problem arises because dealers cannot
distinguish agents with liquidity and hedging demands from insiders, and therefore increase the spread for
both classes of agents.16
The estimated loading matrix γ is given by the value of −0.015364 for the ask log variation and 0.014691
for the bid log variation. We can interpret these signs as follows: positive deviations from the equilibrium
16. See Frenkel et al. (1996) and Sarno and Taylor (2002) for detailed references on this explanation of the determinants of the spread.
Table 6
Error correction model

Cointegrating equation:
  log(ask(−1))    1.000000
  log(bid(−1))   −1.000949   (s.e. 1.7E−05)   [−575570.]

Error correction:          D(log(ask))    D(log(bid))
  CointEq(1)               −0.015364       0.014691
  (standard errors)        (0.00089)      (0.00094)
  [t-statistics]           [−17.2781]     [15.9325]
spread are adjusted by reducing the ask price and increasing the bid price, but the mechanism is the
opposite for spreads below the equilibrium value, being characterized by an increase in the ask price and a
decrease in the bid price. The short-run mechanisms are harder to analyze, due to the large number of lags
and the change in the sign of coefficients. There is evidence that 120 lags correspond on average to 20 min
in calendar time, showing that the mean time for the incorporation of information can be approximated
by this value.
Fig. 4 shows the generalized impulse response functions obtained from the VECM estimation in Table 6.
The figure indicates that the shocks converge to their permanent values after around 50 observations,
showing stable convergence to the long-run values.
7. Spread determination
The VECM estimated in Section 6 allows determining empirical models for the equilibrium spread value
and the correction mechanism for equilibrium spread deviations, but it does not allow the direct
identification of the factors that influence and impact the spread. To assess spread determinants, we begin
by investigating the empirical characteristics of the spread deviation series, created by the cointegrating
equation in the VECM. Based on the observed characteristics, we formulated an asymmetric response model
for the conditional distribution of the spread based on quantile autoregression (Koenker and Xiao, 2006)
and on the quantilogram estimates (Linton and Whang, 2007).
Note that the existence of a cointegrating vector between the log bid and the log ask is equivalent to the
existence of a stationary process for spread deviations. To describe this process, we first formulated a linear
model for spread deviations. This method is based on regressions on the spread such as those described in
Jorion (1996). Following that methodology, we developed an autoregressive linear model for the
spread by adding variables that represent the expected values of volatility and duration, measuring the
expected effects of uncertainty and liquidity.
The formulation of this model seeks to control the stochastic impacts of the dealer's costs and the impact
of expected risk and liquidity values on the determination of equilibrium spread deviations (which can be
seen as deviations on the average costs embedded in the equilibrium spread) discussed in Section 6.17
The model is based on a third-order autoregressive process for equilibrium spread deviations, with the
addition of one-step ahead predictions for volatility and duration. Volatility forecasts are obtained by the
same GARCH model used for the construction of the volatility variable. Duration forecasts are generated
by means of the Autoregressive Conditional Duration model of Engle and Russel (1998).18 The aim of
incorporating these predictions is to include in spread determination the effect of agents' expectations
about the volatility and duration of the next transaction.
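The duration forecast step can be sketched with the ACD(1,1) recursion ψ_t = ω + a·x_{t−1} + b·ψ_{t−1}. This is a minimal illustration with made-up stationary parameters (the paper's own estimates are reported in footnote 18), and `acd_filter` is a hypothetical helper, not the estimation code used in the paper:

```python
import numpy as np

def acd_filter(x, omega, a, b, psi0=None):
    """Filter conditional expected durations through the ACD(1,1) recursion
    psi_t = omega + a * x_{t-1} + b * psi_{t-1}; the last entry of the
    returned array is the one-step-ahead forecast E[x_{T+1} | F_T]."""
    psi = np.empty(len(x) + 1)
    psi[0] = np.mean(x) if psi0 is None else psi0  # start at the sample mean
    for t in range(len(x)):
        psi[t + 1] = omega + a * x[t] + b * psi[t]
    return psi

# Illustrative stationary parameters (omega, a, b), not the paper's estimates.
omega, a, b = 0.1, 0.15, 0.8

rng = np.random.default_rng(1)
# Simulated durations whose mean matches the ACD fixed point omega / (1 - a - b).
x = rng.exponential(scale=omega / (1 - a - b), size=500)
psi = acd_filter(x, omega, a, b)
print("one-step-ahead expected duration:", psi[-1])
```

With a constant duration series the recursion converges to the fixed point (ω + a·x)/(1 − b), which is a quick sanity check on the filter.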
17
An alternative way of looking at the determination of bids and asks is the use of permanent-transitory decompositions for time
series (e.g. Hasbrouck (2007)). The permanent component would be linked to fundamentals of the asset and the transitory
components to microstructure effects.
18 The estimated model is an Autoregressive Conditional Duration (ACD) model, with estimated parameters $\psi_t = 1.37\times 10^{-5} + 0.196830\,x_{t-1} + 0.884851\,\psi_{t-1}$, where $x_t$ are the price durations and $\psi_t$ the conditional durations. More general ACD models could be used; see Bauwens and Giot (2000), Bauwens et al. (2002) and Fernandes and Grammig (2005b) for generalizations of the ACD model.
The results obtained in this spread decomposition (7) show that there is a high dependence of the
spread on past spreads (the persistence measured by the sum of the autoregressive coefficients is
approximately 0.93). Other important effects are the signs obtained for the coefficients related to volatility
and duration forecasts. We obtained a positive sign for volatility and a negative but nonsignificant sign for
duration. The positive volatility sign may be interpreted as an additional premium in the spread for
uncertainty, while the nonsignificance of duration can be interpreted as a sign of high liquidity in this
market, where agents do not have to pay a premium for urgent transactions. These effects can be
interpreted as the dealers' protection against uncertainty (volatility can be related to the arrival of insiders
with privileged information, and to protection against a higher inventory loading cost when volatility
increases). Similar effects are obtained in Glassman (1987), who found a positive correlation between
spread and volatility.
A possible shortcoming of this model is the symmetric treatment of spread deviations: spreads below
the equilibrium value are treated just like those above it. Note that these two cases are intuitively
associated with different market conditions, so imposing the same response in both situations can be an
invalid restriction.
7.1. Quantilogram
To check for a possible asymmetric response in spread deviations, we used a tool known as
quantilogram, derived by Linton and Whang (2007). The quantilogram is a generalization of the
correlogram to the modeling of the dependence in conditional quantiles of the time series distribution. The
quantilogram is also a measure of directional predictability, as discussed in Linton and Whang (2007) and is
part of the general literature on tests for market efficiency.
Let y1, y2, …19 be a stationary process whose marginal distribution has quantiles μα for α ∈ (0,1). The
null hypothesis of no directional predictability conditional on quantile α is

E[\psi_\alpha(y_t - \mu_\alpha) \mid \mathcal{F}_{t-1}] = 0

where ψα(x) = α − 1(x < 0) is an indicator function that measures whether the variable at time t hit the
quantile α, and \mathcal{F}_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\} is the usual filtration. Under the null
hypothesis of no directional predictability, if the variable at time t − 1 is below quantile α, the chance is no
more than α that series y hits this quantile again at time t. Violations of this hypothesis are evidence of
predictability in this conditional quantile. Note that the traditional test of weak-form market efficiency fits
this context when one uses the conditional mean:

E[y_t - \mu \mid \mathcal{F}_{t-1}] = 0
The quantilogram has two advantages over the usual directional predictability measures: the estimation
of conditional quantiles is robust to the presence of outliers and some quantiles of the distribution of asset
returns have a straightforward interpretation in risk management, being related to measures such as Value
at Risk and Expected Shortfall. Note that there is a similar interpretation in the analysis of the spread: if
the higher quantiles of the spread distribution show persistence, this effect affects the dealers' incomes
and the transaction costs in this market.
To measure the dependence in conditional quantiles, the quantilogram derived by Linton and Whang
(2007) is given by the following expression:
19 The analysed series yt can be a directly observed process or the residual of a model estimated in a first stage, as in our case.
Linton and Whang (2007) derived asymptotic distributions valid in both situations. Inference based on the quantilogram also
remains valid in the presence of general heteroskedastic components, such as stationary GARCH processes.
Fig. 5. Quantilogram.
\hat\rho_\alpha(k) = \frac{\sum_{t=1}^{T-k} \psi_\alpha(y_t - \hat\mu_\alpha)\,\psi_\alpha(y_{t+k} - \hat\mu_\alpha)}{\sqrt{\sum_{t=1}^{T-k} \psi_\alpha^2(y_t - \hat\mu_\alpha)}\,\sqrt{\sum_{t=1}^{T-k} \psi_\alpha^2(y_{t+k} - \hat\mu_\alpha)}} \qquad (11)
where the estimator \hat\mu_\alpha is the sample quantile, obtained as the minimizer

\hat\mu_\alpha = \arg\min_{\mu \in \mathbb{R}} \sum_{t=1}^{T} \rho_\alpha(y_t - \mu) \qquad (12)

for each quantile α. Fig. 5 shows the quantilogram estimated for quantiles (0.01, 0.05, 0.10, 0.25, 0.50, 0.75,
0.90, 0.95, 0.99). The quantilogram estimated for these quantiles shows that the same shape of autoregressive
dependence found in the conditional mean occurs in the quantiles, but the dependence intensity differs
across quantiles, with a correlation in the first lags close to 0.4 for quantile 0.01 and an upward trend in the
persistence of higher quantiles; in quantile 0.99 (highest spreads), persistence is close to 0.95.
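The quantilogram estimator in Eq. (11) translates directly into a few lines of code. This is our own minimal implementation, not Linton and Whang's code, applied to a simulated persistent series and to i.i.d. noise:

```python
import numpy as np

def quantilogram(y, alpha, max_lag):
    """Sample quantilogram rho_alpha(k), k = 1..max_lag: the autocorrelation of
    the quantile-hit process psi_alpha(y_t - mu_alpha) = alpha - 1(y_t < mu_alpha)."""
    y = np.asarray(y, dtype=float)
    mu = np.quantile(y, alpha)       # sample alpha-quantile, as in Eq. (12)
    psi = alpha - (y < mu)           # quantile-hit indicator series
    rho = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        u, v = psi[:-k], psi[k:]
        rho[k - 1] = np.sum(u * v) / np.sqrt(np.sum(u**2) * np.sum(v**2))
    return rho

# Persistent AR(1) series: positive quantilogram at short lags;
# i.i.d. noise: values near zero at all lags.
rng = np.random.default_rng(2)
e = rng.standard_normal(5000)
ar = np.empty(5000)
ar[0] = e[0]
for t in range(1, 5000):
    ar[t] = 0.8 * ar[t - 1] + e[t]
print("AR(1), alpha=0.9, lags 1-3:", quantilogram(ar, 0.9, 3))
print("i.i.d., alpha=0.9, lags 1-3:", quantilogram(e, 0.9, 3))
```

Under the no-directional-predictability null the quantilogram should be close to zero at every lag, which is what the i.i.d. series delivers.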
This asymmetric effect demonstrates that the lowest percentiles are characterized by low persistence
and fast reversion to the unconditional quantile, whereas for the points where the spread is well above the
equilibrium values (percentiles greater than 0.90), persistence is large. Note that this asymmetry effect is
of major financial importance, since it shows that high spreads tend to be more persistent than low ones.
Again, we can interpret this effect as a response of dealers to unanticipated shocks, such as increases in
uncertainty and higher currency inventory maintenance costs.
The quantilogram reveals different time dependence patterns for each conditional quantile, but it does
not represent a complete parametric model. This indicates the necessity to build a model for the conditional
distribution of the spread for each quantile using an autoregressive structure, but also controlling for
volatility effects and expected durations, analogous to the linear model estimated for the spread.
A possible tool for this type of analysis is the quantile autoregression model (Koenker and Xiao, 2006),
which consists of formulating a quantile regression model using the lags of the dependent variable as
Table 7
Linear model for spread
Table 8
Quantile autoregressions
explanatory variables. In a quantile regression (Koenker and Basset (1978)), the objective function is
formulated directly in terms of the conditional quantile, minimizing the function

\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} \rho_\alpha\left(y_i - x_i'\beta(\alpha)\right) \qquad (14)

which corresponds to a loss function ρα conditional on quantile α, where α ∈ (0,1). We define the loss
function as ρα(u) = u(α − I(u < 0)), where I(·) is an indicator function and u is the difference between the
observed value yi and the value predicted by x_i'\beta(\alpha). Estimators \hat\beta(\alpha) are obtained by
minimizing the loss function given by (14), the sample analogue of the expected value of
\rho_\alpha(y_i - x_i'\beta(\alpha)), with respect to each β(α).
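The link between this loss function and sample quantiles can be checked numerically: in the intercept-only case, minimizing the pinball loss ρα over constants recovers a sample α-quantile. A minimal sketch:

```python
import numpy as np

def pinball(u, alpha):
    """Quantile-regression loss rho_alpha(u) = u * (alpha - I(u < 0))."""
    return u * (alpha - (u < 0))

rng = np.random.default_rng(3)
y = rng.standard_normal(501)
alpha = 0.75

# The minimizer over constants is attained at a kink of the piecewise-linear
# objective, i.e. at a data point, so a search over the observations suffices.
losses = [np.sum(pinball(y - c, alpha)) for c in y]
c_star = y[np.argmin(losses)]
print("pinball minimizer:", c_star)
print("fraction of data below it:", np.mean(y <= c_star))
```

The minimizer satisfies the usual quantile characterization: the fraction of observations strictly below it is at most α, and the fraction at or below it is at least α.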
Note that the quantile regression model can be extended to autoregressive structures, using a quantile
regression for the autoregressive process20:

Q_{y_t}\left(\alpha \mid y_{t-1}, \ldots, y_{t-p}\right) = \beta_0(\alpha) + \beta_1(\alpha)\,y_{t-1} + \cdots + \beta_p(\alpha)\,y_{t-p} \qquad (15)

Note that we can represent this model as y_t = \beta_0(U_t) + \beta_1(U_t)\,y_{t-1}, where U_t is uniformly
distributed on (0,1). In this formulation, the traditional autoregressive AR(1) model is obtained when
\beta_0(u) = \sigma\Phi^{-1}(u) and \beta_1(u) = \beta_1. To model the possible asymmetry structure in the
response of spread deviations, we built the following model:

Q_{y_t}\left(\alpha \mid y_{t-1}, \ldots, y_{t-p}\right) = \beta_0(\alpha) + \gamma(\alpha)\,E[\mathrm{Vol} \mid \mathcal{F}_{t-1}] + \delta(\alpha)\,E[\mathrm{Dur} \mid \mathcal{F}_{t-1}] + \beta_1(\alpha)\,y_{t-1} + \cdots + \beta_p(\alpha)\,y_{t-p} \qquad (17)
20
Engle and Manganelli (2004) use a similar formulation for the estimation of conditional value at risk.
We estimated21 these models for the same quantiles (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)
used in the quantilogram (8). The quantile regression model (Eq. (17)) reveals the asymmetry effect already
shown by the quantilogram — the persistence of shocks is lower for the quantiles below the conditional
median and increases for quantiles above the conditional median, being close to 1 in these quantiles, and
confirming the asymmetric response of spread deviations.
The estimated coefficients related to volatility and duration show another interesting effect. The
coefficients of volatility always have negative signs for quantiles below the median and positive signs for
percentiles above the median. The effect of duration is clearer for quantiles above the median, where the
coefficients are always positive and statistically significant (Table 8).
This asymmetry effect indicates that volatility and duration increase the spreads for spread deviations
above the conditional median, and this effect is enhanced in higher quantiles. We may interpret this as an
asymmetric relationship of the spread with volatility and duration: spreads above the equilibrium spread
show high persistence and are positively influenced by volatility and by expected durations, whereas
spreads below the equilibrium spread are weakly persistent and negatively correlated with expected
volatility and durations, indicating an asymmetric mechanism of reversion to the equilibrium spread.
8. Conclusions
In this paper, we assessed some empirical properties related to market microstructure using high-
frequency bid and ask quote data for the BRL/US$ exchange rate market. The paper shows that some
stylized facts observed in the international literature on exchange rate market microstructure are valid for
the BRL/US$ series and introduces new tools for the analysis of empirical microstructure effects.
Among the effects analyzed herein, we observed that the violation of the Markov property implies that
there is no immediate incorporation of new information into prices in this market, resulting in a structure
with long-range dependence in terms of bid and ask returns. To capture this long-range dependence
structure, we built a price discovery model using a vector error correction model, parameterizing this
process of information incorporation and obtaining an estimate for the equilibrium spread, which is
interpreted in the microstructure literature as a measure of the average costs of liquidity provision and
stock loading by dealers who operate in this market.
The modeling of the spread shows that there is a mechanism of asymmetric response for this variable,
where spread values above and below the equilibrium value react differently to the previous information
about the spread, volatility and durations. Spread values above the equilibrium value show high persistence
and react positively, and proportionally to the quantile, to the expected volatility and conditional duration,
whereas we found an inverse relationship for quantiles below the median of the spread distribution, with a
negative correlation of the spreads with the expected volatilities and durations and low persistence, which
characterizes a nonlinear mean reversion in the spreads.
This analysis of asymmetry using tools such as quantilogram for the identification of asymmetry in
conditional quantiles and the modeling of this structure using quantile regression models is original in the
literature on currency exchange market microstructure. Such empirical evidence points to new stylized
facts that should be added to theoretical models of market microstructure.
References
Bauwens, L., Giot, P., 2000. The logarithmic ACD model: an application to the bid–ask quote process of three NYSE stocks. Annales
d'Economie et de Statistique 60, 117–149.
Bauwens, L., Giot, P., Grammig, J., Veredas, D., 2002. A comparison of financial duration models through density forecasts.
International Journal of Forecasting 20, 589–609.
Bollerslev, T., Domowitz, I., 1993. Trading patterns and prices in the interbank foreign exchange market. Journal of Finance 48,
1421–1424.
Campbell, J., Lo, A., MacKinlay, A.C., 1997. The Econometrics of Financial Markets. Princeton University Press.
Christie, W., Harris, J.H., Schultz, P., 1994. Why did NASDAQ market makers stop avoiding odd-eighth quotes? Journal of Finance 49,
1841–1890.
21
For a complete reference about quantile regression see Koenker (2005). We use the method of rank inversion for calculating the
variance–covariance matrix of the parameters.
Dufour, A., Engle, R., 2000. Time and the price impact of a trade. Journal of Finance 55 (6), 2467–2498.
Easley, D., O'Hara, M., 1987. Price, trade size, and information in securities markets. Journal of Financial Economics 19, 69–90.
Engle, R., 2000. The econometrics of ultra high frequency data. Econometrica 68, 1–22.
Engle, R., Manganelli, S., 2004. CAViaR: conditional autoregressive value at risk by regression quantiles. Journal of Business and Economic
Statistics 22 (4), 367–381.
Engle, R., Russel, J., 1998. Autoregressive conditional duration: a new model for irregularly spaced transaction data. Econometrica 66,
1127–1162.
Falkenberry, T. N., 2002. High Frequency Data Filtering. Tech. rept. Tick Data.
Fernandes, M., Amaro de Matos, J., 2007. Testing the Markov property with high frequency data. Journal of Econometrics 141 (1),
44–64.
Fernandes, M., Grammig, J., 2005a. A family of autoregressive conditional duration models. Journal of Econometrics 127 (1), 1–23.
Fernandes, M., Grammig, J., 2005b. Nonparametric specification tests for conditional duration models. Journal of Econometrics 127
(1), 35–68.
Flood, M.D., 1994. Market structure and inefficiency in the foreign exchange market. Journal of International Money and Finance 13,
52–70.
Flood, R.P., Taylor, M.P., 1996. Exchange rate economics: what's wrong with the conventional macro approach? In: The Microstructure
of Foreign Exchange Markets. National Bureau of Economic Research, pp. 261–294.
Frankel, J.A., Rose, A.K., 1995. Empirical research in nominal exchange rates. In: Handbook of International Economics. North-Holland,
pp. 1698–1729.
Frenkel, J.A., Galli, G., Giovannini, A. (Eds.), 1996. The Microstructure of Foreign Exchange Market. National Bureau of Economic
Research.
Garcia, M., Urban, F., 2004. O Mercado Interbancário de Câmbio no Brasil. Unpublished Working Paper.
Glassman, D., 1987. Exchange rate risk and transactions costs: evidence from bid–ask spreads. Journal of International Money and
Finance 6, 481–490.
Glosten, L., Milgrom, P., 1985. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders. Journal of
Financial Economics 14, 71–100 (Mar.).
Goodhart, C., 1989. News and the Foreign Exchange Market. In: Manchester Statistical Society.
Green, P.J., Silverman, B.W. (Eds.), 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach.
Chapman and Hall.
Harrison, J.M., Kreps, D., 1979. Martingales and arbitrage in multiperiod securities markets. Journal of Economic Theory 20, 381–408.
Harrison, J.M., Pliska, S., 1981. Martingales and stochastic integrals in the theory of continuous trading. Stochastic Processes and Their
Applications 11, 215–260.
Hasbrouck, J., 1991. Measuring the information content of stock trades. Journal of Finance 46, 179–207.
Hasbrouck, J., 1992. Using the TORQ Database. Tech. rept. NYSE Working Paper 92-05.
Hasbrouck, J., 1999. Security bid/ask dynamics with discreteness and clustering. Journal of Financial Markets 2, 1–28.
Hasbrouck, J., 2007. Empirical Market Microstructure. Oxford University Press.
Hasbrouck, J., 1988. Trades, quotes, inventories and information. Journal of Financial Economics 22, 229–252.
Hasbrouck, J., 1996. Modelling market microstructure time series. In: Maddala, G.S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 14:
Statistical Methods in Finance. North-Holland (Chap. 22).
Jorion, J., 1996. Risk and turnover in the foreign exchange market. In: The Microstructure of Foreign Exchange Markets. National
Bureau of Economic Research, pp. 19–40.
Koenker, R., 2005. Quantile Regression. Cambridge University Press.
Koenker, R., Basset, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Xiao, Z., 2006. Quantile autoregression. Journal of the American Statistical Association 475, 980–1006.
Linton, O., Whang, J., 2007. A quantilogram approach to evaluating directional predictability. Journal of Econometrics 141 (1),
250–282.
Lyons, R., 1996. Foreign exchange volume: sound and fury signifying nothing? In: The Microstructure of Foreign Exchange Markets.
National Bureau of Economic Research, pp. 183–208.
Lyons, R.K., 2001. The Microstructure Approach to Exchange Rates. MIT Press.
McGroarty, F., ap Gwilym, O., Thomas, S.H., 2006. Microstructure effects, bid–ask spreads and volatility in the spot foreign exchange
market pre and post-EMU. Global Finance Journal 17 (1), 23–49.
O'Hara, M., 1995. Market Microstructure Theory. Blackwell.
Sarno, L., Taylor, M., 2002. The Economics of Exchange Rates. Cambridge University Press.
Taylor, M.P., 1995. The economics of exchange rates. Journal of Economic Literature 83, 19–47.
Wu, T., 2007. Order Flow in the South: Anatomy of the Brazilian FX Market. University of California, Santa Cruz, Unpublished Working
Paper.
Zivot, E., Yan, B., 2003. Analysis of High-Frequency Financial Data with S-PLUS.
General Conclusion
In the second article of the thesis, "Generalized Latent Factor Models For Yield Curves In
Multiple Markets", we show the potential for generalization of this Bayesian latent factor
formulation by jointly modeling multiple yield curves. The proposed structures adequately
capture the more complex shapes observed in yield curves and also make it possible to identify
the interactions between curve movements across markets. This article also proposes a
methodology for reducing the number of effective parameters through Bayesian shrinkage
methods. The validity of no-arbitrage conditions and the existing identification problems are
also addressed in this article.
two following articles of the thesis. In the paper "Generalized Empirical Likelihood/Minimum
Contrast Estimation of Stochastic Differential Equations" we show that the use of
semiparametric methods yields estimators with good bias and efficiency properties for the
estimation of a series of continuous-time models used in interest rate modeling. These
properties are of interest because bias and inefficient estimation can cause problems in asset
pricing and in risk management procedures. The semiparametric inference procedures based
on Generalized Empirical Likelihood and Generalized Minimum Contrast discussed here also
deliver better properties in hypothesis tests on parameters and in specification testing
procedures.
problems existing in these methodologies, such as the inadequate fit for observations with
very long maturities or for segments of the curve with few observations.