Abstract—We present an adaptive smoother for linear state-space models with unknown process and measurement noise covariances. The proposed method utilizes the variational Bayes technique to perform approximate inference. The resulting smoother is computationally efficient and easy to implement.
The initial state x0 is assumed to have a Gaussian prior, i.e., p(x0) = N(x0; m0, P0), where N(·; µ, Σ) denotes the Gaussian probability density function (PDF) with mean µ and covariance Σ. Qk and Rk are the unknown positive definite process noise and measurement noise covariance matrices, assumed to have initial inverse Wishart priors
p(Q0) = IW(Q0; ν0, V0),   (2a)
p(R0) = IW(R0; µ0, M0).   (2b)

The inverse Wishart PDF we use in this work is given in the following form

IW(Σ; ν, Ψ) ≜ |Ψ|^{(ν−d−1)/2} exp{Tr(−(1/2)ΨΣ^{-1})} / [2^{(ν−d−1)d/2} Γ_d((ν−d−1)/2) |Σ|^{ν/2}],   (3)
where Σ is a symmetric positive definite random matrix of dimension d × d, ν > 2d is the scalar degrees of freedom, and Ψ is a symmetric positive definite matrix of dimension d × d called the scale matrix. This form of the inverse Wishart distribution is used in [27]. When Σ ∼ IW(Σ; ν, Ψ), then Σ^{-1} ∼ W(Σ^{-1}; ν − d − 1, Ψ^{-1}), E[Σ] = Ψ/(ν − 2d − 2) when ν − 2d − 2 > 0, and E[Σ^{-1}] = (ν − d − 1)Ψ^{-1}. Further, any diagonal element of an inverse Wishart matrix is inverse gamma distributed [27, Corollary 3.4.2.1]. Therefore, the proposed inverse Wishart model is more general than models with a diagonal covariance assumption, where the diagonal entries are inverse gamma distributed; see e.g. [11].
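Since (3) differs from some common inverse Wishart conventions, the following short sketch (an illustration, not code from the paper) checks it numerically against scipy.stats.invwishart under the assumed mapping df = ν − d − 1, scale = Ψ; that mapping also reproduces the mean formula E[Σ] = Ψ/(ν − 2d − 2) quoted above.

```python
# Numerical check (illustrative) of the inverse Wishart density in Eq. (3)
# against scipy.stats.invwishart, using the mapping df = nu - d - 1, scale = Psi.
import numpy as np
from scipy.stats import invwishart
from scipy.special import multigammaln

def iw_logpdf_eq3(Sigma, nu, Psi):
    """log IW(Sigma; nu, Psi) written exactly as in Eq. (3)."""
    d = Psi.shape[0]
    a = 0.5 * (nu - d - 1)
    _, logdet_Psi = np.linalg.slogdet(Psi)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    log_num = a * logdet_Psi - 0.5 * np.trace(Psi @ np.linalg.inv(Sigma))
    log_den = a * d * np.log(2.0) + multigammaln(a, d) + 0.5 * nu * logdet_Sigma
    return log_num - log_den

d, nu = 2, 10                                   # nu > 2d, as required below (3)
Psi = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma = np.array([[1.5, 0.1], [0.1, 0.8]])

print(iw_logpdf_eq3(Sigma, nu, Psi))            # the two log-densities agree
print(invwishart.logpdf(Sigma, df=nu - d - 1, scale=Psi))
print(Psi / (nu - 2 * d - 2))                   # E[Sigma] = Psi / (nu - 2d - 2)
```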
It is common in the Kalman filtering and smoothing literature to assume that the noise covariances are fixed parameters [28]. However, the noise covariances can be unknown and time-varying. In such cases, the noise parameters can be treated as state variables with dynamics. The dynamical model for the covariance matrices is adopted here from [29], where the matrix Beta-Bartlett stochastic evolution model was proposed for estimating multivariate stochastic volatility. The dynamical models for the covariance matrices Rk and Qk are parametrized by the covariance discount factors 0 < λR ≤ 1 and 0 < λQ ≤ 1, respectively. The matrix Beta-Bartlett stochastic evolution model for a generic random matrix Σk having a covariance discount factor 0 < λ ≤ 1 is described in the following.
Let p(Σk−1) = IW(Σk−1; νk−1, Ψk−1). The forward predictive model p(Σk|Σk−1) is such that the forward prediction marginal density becomes the inverse Wishart density p(Σk) = IW(Σk; νk, Ψk), where

Ψk = λΨk−1,   (4a)
νk = λνk−1 + (1 − λ)(2d + 2).   (4b)

Similar to the Kalman filter's prediction update, the forward prediction keeps the marginal expected value of Σk unchanged while the spread is increased. Furthermore, the backwards smoothing recursion is given by [29]

Ψk^{-1} ← (1 − λ)Ψk^{-1} + λΨk+1^{-1},   (5a)
νk ← (1 − λ)νk + λνk+1.   (5b)

Note that in the prediction and smoothing iterations above, setting λ = 1 corresponds to the fixed parameter case.
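For illustration only (not code from the paper), the recursions (4)-(5) on the inverse Wishart parameters can be written as the small helpers below; the final lines check the mean-preservation property stated above, using E[Σ] = Ψ/(ν − 2d − 2).

```python
# Sketch of the matrix Beta-Bartlett parameter recursions (4) and (5)
# for an inverse Wishart IW(nu, Psi); helper names are illustrative.
import numpy as np

def forward_predict(nu, Psi, lam, d):
    """Eq. (4): Psi_k = lam*Psi_{k-1}, nu_k = lam*nu_{k-1} + (1-lam)*(2d+2)."""
    return lam * nu + (1.0 - lam) * (2 * d + 2), lam * Psi

def backward_smooth(nu_filt, Psi_filt, nu_next, Psi_next, lam):
    """Eq. (5): blend the filtered (k) and smoothed (k+1) parameters."""
    Psi_sm = np.linalg.inv((1.0 - lam) * np.linalg.inv(Psi_filt)
                           + lam * np.linalg.inv(Psi_next))
    return (1.0 - lam) * nu_filt + lam * nu_next, Psi_sm

d, lam = 2, 0.95
nu0 = 12.0
Psi0 = np.array([[3.0, 0.5], [0.5, 2.0]])

nu1, Psi1 = forward_predict(nu0, Psi0, lam, d)
# The prediction leaves E[Sigma] = Psi/(nu - 2d - 2) unchanged while nu
# shrinks toward 2d + 2, i.e. the spread of the distribution increases.
print(Psi0 / (nu0 - 2 * d - 2))
print(Psi1 / (nu1 - 2 * d - 2))
```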
The aforementioned dynamical model for noise covariances is adopted in [30] for an adaptive Kalman filtering framework and in [13] for filtering and smoothing¹ with heavy-tailed measurement noise, where an inverse Wishart distributed prior with Gaussian likelihood is exploited to model and estimate the measurement noise covariance in the VB framework. It is also shown in [13] that the mean square error of state estimates can be reduced by using the proposed VB measurement update for robust filtering and smoothing with an unknown measurement noise covariance.

Our aim is to obtain an analytical approximation of the posterior density for the state trajectory x0:K and noise covariances R0:K and Q0:K−1. We will derive an approximate smoother which propagates the sufficient statistics of the approximate distributions through fixed-point iterations with guaranteed convergence.

¹The expressions given in lines 21 and 22 of Algorithm 3 in [13] are inaccurate. The correct version is given in [29].

III. VARIATIONAL SOLUTION

The joint prior density p(x0, Q0, R0) is given as follows:

p(x0, Q0, R0) = N(x0; m0, P0) IW(Q0; ν0, V0) IW(R0; µ0, M0).   (6)

Then, the posterior for the state trajectory and the unknown parameters, denoted by Z ≜ {x0:K, Q0:K−1, R0:K}, is given by Bayes' theorem as

p(Z|y0:K) ∝ p(x0, Q0, R0) p(yK|xK, RK) ∏_{l=0}^{K−2} p(Ql+1|Ql) × ∏_{k=0}^{K−1} p(yk|xk, Rk) p(xk+1|xk, Qk) p(Rk+1|Rk).   (7)

There is no analytical solution for this posterior. We are going to look for an approximate analytical solution using the following variational approximation:

p(Z|y0:K) ≈ q(Z) ≜ qx(x0:K) qQ(Q0:K−1) qR(R0:K),   (8)

where the densities qx(·), qQ(·) and qR(·) are the approximate posterior densities for x0:K, Q0:K−1 and R0:K, respectively. The well-known VB technique [25, Ch. 10], [26] chooses the estimates q̂x(·), q̂Q(·) and q̂R(·) for the factors in (8) via the following optimization problem:

q̂x, q̂Q, q̂R = argmin_{qx, qQ, qR} DKL(q(Z) || p(Z|y0:K)),   (9)

where DKL(q(x)||p(x)) ≜ ∫ q(x) log[q(x)/p(x)] dx is the Kullback-Leibler divergence [31]. The optimal solution satisfies the following set of equations:

log q̂x(x0:K) = E_{q̂Q q̂R}[log p(Z, y0:K)] + cx,   (10a)
log q̂Q(Q0:K−1) = E_{q̂x q̂R}[log p(Z, y0:K)] + cQ,   (10b)
log q̂R(R0:K) = E_{q̂x q̂Q}[log p(Z, y0:K)] + cR,   (10c)

where cx, cQ and cR are constants with respect to the variables x0:K, Q0:K−1 and R0:K, respectively. The solution to (10) can be obtained via fixed-point iterations where only one factor in (8) is updated while all the other factors are fixed at their last estimated values [25, Ch. 10]. The iterations converge to a local optimum of (9) [25, Ch. 10], [32, Ch. 3]. The complete (standard but tedious) derivations for the variational iterations are given in [33].

The implementation pseudo-code for the proposed algorithm is given in Table I. When the recursions of the proposed algorithm converge, the expected values or the modes of the posteriors for xk, Rk and Qk can be used as point estimates for the random variables. When an estimate of uncertainty for a point estimate is required, the posterior variances can be used. Nevertheless, it is well known that the VB method underestimates the covariance when the posterior is multimodal [25, Ch. 10].
Table I
SMOOTHING WITH UNKNOWN NOISE COVARIANCES

1: Inputs: Ak, Ck, ν0, V0, µ0, M0, m0, P0, λQ, λR and y0:K.
   initialization
2: Vk|K ← V0, νk|K ← ν0, Mk|K ← M0, µk|K ← µ0 for 0 ≤ k ≤ K
3: repeat
   update qx(x0:K) given qQ(Q0:K−1) and qR(R0:K)
4:   Q̃k ← Vk|K/(νk|K − nx − 1) for 0 ≤ k ≤ K
5:   R̃k ← Mk|K/(µk|K − ny − 1) for 0 ≤ k ≤ K
6:   m0|−1 ← m0, P0|−1 ← P0
7:   for k = 0 to K do
8:     Kk ← Pk|k−1 Ck^T (Ck Pk|k−1 Ck^T + R̃k)^{−1}
9:     mk|k ← mk|k−1 + Kk (yk − Ck mk|k−1)
10:    Pk|k ← (I − Kk Ck) Pk|k−1
11:    mk+1|k ← Ak mk|k
12:    Pk+1|k ← Ak Pk|k Ak^T + Q̃k
13:  end for
14:  for k = K−1 down to 0 do
15:    Gk ← Pk|k Ak^T Pk+1|k^{−1}
16:    mk|K ← mk|k + Gk (mk+1|K − Ak mk|k)
17:    Pk|K ← Pk|k + Gk (Pk+1|K − Pk+1|k) Gk^T
18:  end for
   update qQ(Q0:K−1) and qR(R0:K) given qx(x0:K)
19:  ν0|−1 ← ν0, V0|−1 ← V0, µ0|−1 ← µ0, M0|−1 ← M0
20:  for k = 0 to K do
21:    µk|k ← µk|k−1 + 1
22:    Mk|k ← Mk|k−1 + Ck Pk|K Ck^T + (yk − Ck mk|K)(yk − Ck mk|K)^T
23:    µk+1|k ← λR µk|k + (1 − λR)(2ny + 2),  Mk+1|k ← λR Mk|k
24:  end for
25:  for k = 0 to K−1 do
26:    νk|k ← νk|k−1 + 1
27:    Vk|k ← Vk|k−1 + Pk+1|K + Ak Pk|K Ak^T − Pk+1,k|K Ak^T − Ak Pk,k+1|K + (mk+1|K − Ak mk|K)(mk+1|K − Ak mk|K)^T
28:    νk+1|k ← λQ νk|k + (1 − λQ)(2nx + 2),  Vk+1|k ← λQ Vk|k
29:  end for
30:  for k = K−1 down to 0 do
31:    µk|K ← (1 − λR) µk|k + λR µk+1|K
32:    Mk|K^{−1} ← (1 − λR) Mk|k^{−1} + λR Mk+1|K^{−1}
33:  end for
34:  for k = K−2 down to 0 do
35:    νk|K ← (1 − λQ) νk|k + λQ νk+1|K
36:    Vk|K^{−1} ← (1 − λQ) Vk|k^{−1} + λQ Vk+1|K^{−1}
37:  end for
38: until converged
39: Outputs: mk|K, Pk|K, Mk|K, µk|K and R̂k ≜ E_{qR}[Rk|y0:K] = Mk|K/(µk|K − 2ny − 2) for k = 0, . . . , K; Vk|K, νk|K and Q̂k ≜ E_{qQ}[Qk|y0:K] = Vk|K/(νk|K − 2nx − 2) for k = 0, . . . , K − 1.
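To make the recursions concrete, the following NumPy sketch is one possible reading of Table I; it is illustrative rather than the authors' reference implementation. It assumes time-invariant model matrices A and C, a fixed number of VB sweeps in place of the "until converged" test, and the standard RTS lag-one cross-covariance P_{k+1,k|K} = P_{k+1|K} Gk^T for the term appearing in line 27.

```python
# Illustrative NumPy sketch of the recursions in Table I (a reading aid,
# not the authors' code); assumptions are listed in the text above.
import numpy as np

def vb_smoother(y, A, C, m0, P0, nu0, V0, mu0, M0, lam_Q, lam_R, n_sweeps=10):
    K = len(y) - 1                                   # measurements y_0..y_K
    nx, ny = A.shape[0], C.shape[0]
    Inx = np.eye(nx)
    nu = np.full(K + 1, float(nu0)); V = np.tile(V0, (K + 1, 1, 1))   # line 2
    mu = np.full(K + 1, float(mu0)); M = np.tile(M0, (K + 1, 1, 1))

    for _ in range(n_sweeps):                                         # line 3
        Qt = V / (nu - nx - 1)[:, None, None]                         # line 4
        Rt = M / (mu - ny - 1)[:, None, None]                         # line 5

        # lines 6-13: Kalman filter with the current noise estimates
        mp, Pp = m0.copy(), P0.copy()
        mf = np.zeros((K + 1, nx)); Pf = np.zeros((K + 1, nx, nx))
        for k in range(K + 1):
            Kk = Pp @ C.T @ np.linalg.inv(C @ Pp @ C.T + Rt[k])       # line 8
            mf[k] = mp + Kk @ (y[k] - C @ mp)                         # line 9
            Pf[k] = (Inx - Kk @ C) @ Pp                               # line 10
            mp, Pp = A @ mf[k], A @ Pf[k] @ A.T + Qt[k]               # lines 11-12

        # lines 14-18: RTS smoother, keeping lag-one cross-covariances
        ms, Ps = mf.copy(), Pf.copy()
        Pc = np.zeros((K, nx, nx))                 # Cov(x_{k+1}, x_k | y_{0:K})
        for k in range(K - 1, -1, -1):
            Ppred = A @ Pf[k] @ A.T + Qt[k]
            G = Pf[k] @ A.T @ np.linalg.inv(Ppred)                    # line 15
            ms[k] = mf[k] + G @ (ms[k + 1] - A @ mf[k])               # line 16
            Ps[k] = Pf[k] + G @ (Ps[k + 1] - Ppred) @ G.T             # line 17
            Pc[k] = Ps[k + 1] @ G.T

        # lines 19-24: forward sweep for the R_k inverse Wishart statistics
        mu_f = np.zeros(K + 1); M_f = np.zeros((K + 1, ny, ny))
        mu_p, M_p = float(mu0), M0.copy()                             # line 19
        for k in range(K + 1):
            e = y[k] - C @ ms[k]
            mu_f[k] = mu_p + 1                                        # line 21
            M_f[k] = M_p + C @ Ps[k] @ C.T + np.outer(e, e)           # line 22
            mu_p = lam_R * mu_f[k] + (1 - lam_R) * (2 * ny + 2)       # line 23
            M_p = lam_R * M_f[k]

        # lines 25-29: forward sweep for the Q_k statistics
        nu_f = np.zeros(K); V_f = np.zeros((K, nx, nx))
        nu_p, V_p = float(nu0), V0.copy()
        for k in range(K):
            d = ms[k + 1] - A @ ms[k]
            nu_f[k] = nu_p + 1                                        # line 26
            V_f[k] = (V_p + Ps[k + 1] + A @ Ps[k] @ A.T               # line 27
                      - Pc[k] @ A.T - A @ Pc[k].T + np.outer(d, d))
            nu_p = lam_Q * nu_f[k] + (1 - lam_Q) * (2 * nx + 2)       # line 28
            V_p = lam_Q * V_f[k]

        # lines 30-37: backward sweeps for the smoothed statistics
        mu[K], M[K] = mu_f[K], M_f[K]
        for k in range(K - 1, -1, -1):                                # lines 30-33
            mu[k] = (1 - lam_R) * mu_f[k] + lam_R * mu[k + 1]
            M[k] = np.linalg.inv((1 - lam_R) * np.linalg.inv(M_f[k])
                                 + lam_R * np.linalg.inv(M[k + 1]))
        nu[K - 1], V[K - 1] = nu_f[K - 1], V_f[K - 1]
        for k in range(K - 2, -1, -1):                                # lines 34-37
            nu[k] = (1 - lam_Q) * nu_f[k] + lam_Q * nu[k + 1]
            V[k] = np.linalg.inv((1 - lam_Q) * np.linalg.inv(V_f[k])
                                 + lam_Q * np.linalg.inv(V[k + 1]))

    # line 39: posterior-mean point estimates of the noise covariances
    R_hat = M / (mu - 2 * ny - 2)[:, None, None]
    Q_hat = V[:K] / (nu[:K] - 2 * nx - 2)[:, None, None]
    return ms, Ps, R_hat, Q_hat
```

In place of the fixed number of sweeps used here, the "until converged" test of line 38 can be realized by, for example, stopping once the relative change of the smoothed means between consecutive sweeps falls below a tolerance.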
IV. SIMULATIONS

A. Unknown time-varying noise covariances

We illustrate the performance of the proposed smoother in an object tracking scenario in which a point object moves according to the continuous white noise acceleration model in two-dimensional Cartesian coordinates [34, p. 269]. The sampling time is τ = 1 s and the simulation length K is chosen to be 4000. The state vector is defined as the position and the velocity of the object. A sensor collects noisy measurements of the object's position according to (1b). The true parameters of the linear state-space model are given as

Ak = Diag(a, a),   Q0 = Diag(q, q),
a = [1 τ; 0 1],   q = σν² [τ³/3 τ²/2; τ²/2 τ],
Rk^True = (2 − cos(4πk/K)) R0,   R0 = σe² [5 1; 1 5],
Qk^True = (2/3 + (1/3) cos(4πk/K)) Q0,   Ck = [1 0 0 0; 0 0 1 0].

The noise related parameters are σe² = 2 m² and σν² = 3 m²/s³. The initial values of the parameters at time index k = 0, which are R0 and Q0, are used in the RTS smoother.

Using the simulated measurement data, we compare four smoothers: the RTS smoother using the fixed noise covariances R0 and Q0 (denoted by RTS), the VB smoother estimating only Rk as given in [13, Algorithm 3] (denoted by VBS-R), the proposed VB algorithm estimating Rk and Qk simultaneously (denoted by VBS-RQ), and the oracle RTS smoother which knows the true noise covariances (denoted by Oracle-RTS).

We perform NMC = 5000 Monte Carlo (MC) simulations. In each MC run, a new trajectory with an initial state and the corresponding measurements are generated. The prior for the initial state is assumed to be Gaussian, i.e., x0^j ∼ p(x0) = N(x0; m0, P0) i.i.d. for the jth MC simulation, where

m0 = [0 m, 5 m/s, 0 m, 5 m/s]^T,   (11a)
P0 = Diag([30², 30², 30², 30²]).   (11b)
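For reference, the following hypothetical NumPy sketch assembles the true parameters defined above and draws a single Monte Carlo run from them; all variable names are illustrative and chosen for this example only.

```python
# Hypothetical sketch of the simulation setup above: build the true model
# parameters and generate one Monte Carlo run (names are illustrative only).
import numpy as np

tau, K = 1.0, 4000
sigma_e2, sigma_nu2 = 2.0, 3.0                     # sigma_e^2, sigma_nu^2

a = np.array([[1.0, tau], [0.0, 1.0]])
q = sigma_nu2 * np.array([[tau**3 / 3, tau**2 / 2],
                          [tau**2 / 2, tau]])
A = np.kron(np.eye(2), a)                          # A_k = Diag(a, a)
Q0 = np.kron(np.eye(2), q)                         # Q_0 = Diag(q, q)
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])               # position-only measurements
R0 = sigma_e2 * np.array([[5.0, 1.0], [1.0, 5.0]])

k = np.arange(K + 1)
R_true = (2 - np.cos(4 * np.pi * k / K))[:, None, None] * R0
Q_true = (2 / 3 + np.cos(4 * np.pi * k / K) / 3)[:, None, None] * Q0

m0 = np.array([0.0, 5.0, 0.0, 5.0])                # Eq. (11a)
P0 = np.diag([30.0**2] * 4)                        # Eq. (11b)

rng = np.random.default_rng(0)
x = np.zeros((K + 1, 4)); y = np.zeros((K + 1, 2))
x[0] = rng.multivariate_normal(m0, P0)             # initial state draw
for t in range(K + 1):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(2), R_true[t])
    if t < K:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(4), Q_true[t])
```

The resulting measurement sequence y can then be passed to a smoother implementation such as the sketch given after Table I.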
Figure 1. (a) First diagonal and the off-diagonal elements of the estimated measurement noise covariance R̂k using VBS-RQ are plotted versus time index k along with their true values (in black). (b) First two diagonal and an off-diagonal element of the estimated process noise covariance Q̂k using VBS-RQ are plotted versus time index k along with their true values (in black).
REFERENCES

[1] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
[2] H. E. Rauch, C. T. Striebel, and F. Tung, "Maximum likelihood estimates of linear dynamic systems," Journal of the American Institute of Aeronautics and Astronautics, vol. 3, no. 8, pp. 1445–1450, Aug. 1965.
[3] H. Heffes, "The effect of erroneous models on the Kalman filter response," Automatic Control, IEEE Transactions on, vol. 11, no. 3, pp. 541–543, Jul. 1966.
[4] S. Sangsuk-Iam and T. Bullock, "Analysis of discrete-time Kalman filtering under incorrect noise covariances," Automatic Control, IEEE Transactions on, vol. 35, no. 12, pp. 1304–1309, Dec. 1990.
[5] B. Anderson and J. Moore, Optimal Filtering, ser. Dover Books on Electrical Engineering. Dover Publications, 2012. [Online]. Available: http://books.google.se/books?id=iYMqLQp49UMC
[6] R. Mehra, "On the identification of variances and adaptive Kalman filtering," Automatic Control, IEEE Transactions on, vol. 15, no. 2, pp. 175–184, Apr. 1970.
[7] ——, "Approaches to adaptive filtering," Automatic Control, IEEE Transactions on, vol. 17, no. 5, pp. 693–698, Oct. 1972.
[8] R. H. Shumway and D. S. Stoffer, "An approach to time series smoothing and forecasting using the EM algorithm," Journal of Time Series Analysis, vol. 3, no. 4, pp. 253–264, 1982.
[9] R. Frigola, Y. Chen, and C. Rasmussen, "Variational Gaussian process state-space models," in Advances in Neural Information Processing Systems 27, 2014, pp. 3680–3688. [Online]. Available: http://papers.nips.cc/paper/5375-variational-gaussian-process-state-space-models.pdf
[10] M. J. Beal, "Variational algorithms for approximate Bayesian inference," Ph.D. dissertation, Gatsby Computational Neuroscience Unit, University College London, 2003.
[11] S. Särkkä and A. Nummenmaa, "Recursive noise adaptive Kalman filtering by variational Bayesian approximations," Automatic Control, IEEE Transactions on, vol. 54, no. 3, pp. 596–600, March 2009.
[12] W. Li and Y. Jia, "State estimation for jump Markov linear systems by variational Bayesian approximation," Control Theory Applications, IET, vol. 6, no. 2, pp. 319–326, 2012.
[13] G. Agamennoni, J. Nieto, and E. Nebot, "Approximate inference in state-space models with heavy-tailed noise," Signal Processing, IEEE Transactions on, vol. 60, no. 10, pp. 5024–5037, Oct. 2012.
[14] R. Piché, S. Särkkä, and J. Hartikainen, "Recursive outlier-robust filtering and smoothing for nonlinear systems using the multivariate Student-t distribution," in Machine Learning for Signal Processing (MLSP), 2012 IEEE International Workshop on, Sept. 2012, pp. 1–6.
[15] D. Barber and S. Chiappa, "Unified inference for variational Bayesian linear Gaussian state-space models," in Advances in Neural Information Processing Systems 19. MIT Press, 2007, pp. 81–88. [Online]. Available: http://papers.nips.cc/paper/3023-unified-inference-for-variational-bayesian-linear-gaussian-state-space-models.pdf
[16] G. Agamennoni and E. Nebot, “Robust estimation in non-linear state-
space models with state-dependent noise,” Signal Processing, IEEE
Transactions on, vol. 62, no. 8, pp. 2165–2175, April 2014.
[17] A. Y. Aravkin, J. V. Burke, and G. Pillonetto, “Optimization viewpoint
on Kalman smoothing, with applications to robust and sparse estima-
tion,” ArXiv e-prints, Mar. 2013.
[18] ——, “Robust and Trend Following Student’s t Kalman Smoothers,”
ArXiv e-prints, Mar. 2013.
[19] A. Aravkin, B. Bell, J. Burke, and G. Pillonetto, "An ℓ1-Laplace robust Kalman smoother," Automatic Control, IEEE Transactions on, vol. 56, no. 12, pp. 2898–2911, Dec. 2011.
[20] S. Särkkä, Bayesian Filtering and Smoothing. New York, NY, USA:
Cambridge University Press, 2013.
[21] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977. [Online]. Available: http://www.jstor.org/stable/2984875
[22] S. Gibson and B. Ninness, “Robust maximum-likelihood
estimation of multivariable dynamic systems,” Automatica, vol. 41,
no. 10, pp. 1667 – 1682, 2005. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0005109805001810
[23] Z. Ghahramani and G. E. Hinton, “Parameter estimation for linear
dynamical systems,” Department of Computer Science, University of
Toronto, Tech. Rep., 1996.
[24] J. Fan, Y. Fan, and J. Lv, “High dimensional covariance matrix
estimation using a factor model,” Journal of Econometrics, vol.
147, no. 1, pp. 186–197, November 2008. [Online]. Available:
http://ideas.repec.org/a/eee/econom/v147y2008i1p186-197.html
[25] C. M. Bishop, Pattern Recognition and Machine Learning. Springer,
2007.
[26] D. Tzikas, A. Likas, and N. Galatsanos, "The variational approximation for Bayesian inference," IEEE Signal Process. Mag., vol. 25, no. 6, pp. 131–146, Nov. 2008.
[27] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
[28] T. Kailath, A. Sayed, and B. Hassibi, Linear Estimation, ser. Prentice-Hall Information and System Sciences Series. Prentice Hall, 2000. [Online]. Available: https://books.google.se/books?id=zNJFAQAAIAAJ
[29] C. M. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Analysis, vol. 2, pp. 69–98, 2007.
[30] S. Särkkä and J. Hartikainen, "Non-linear noise adaptive Kalman filtering via variational Bayes," in Machine Learning for Signal Processing (MLSP), 2013 IEEE International Workshop on, Sept. 2013, pp. 1–6.
[31] T. M. Cover and J. Thomas, Elements of Information Theory. John Wiley and Sons, 2006.
[32] M. Wainwright and M. Jordan, Graphical Models, Exponential Families, and Variational Inference, ser. Foundations and Trends in Machine Learning. Now Publishers, 2008.
[33] T. Ardeshiri, E. Özkan, U. Orguner, and F. Gustafsson, "Variational iterations for smoothing with unknown process and measurement noise covariances," Department of Electrical Engineering, Linköping University, SE-581 83 Linköping, Sweden, Tech. Rep. LiTH-ISY-R-3086, August 2015. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-120700
[34] Y. Bar-Shalom, X. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory Algorithms and Software. Wiley, 2004.