
BAYESIAN STATISTICS 7, pp. 000–000
J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid,
D. Heckerman, A. F. M. Smith and M. West (Eds.)
© Oxford University Press, 2003

Markov Random Field Extensions using State Space Models

CLAUS DETHLEFSEN
Department of Mathematical Sciences, Aalborg University, Aalborg, Denmark
dethlef@math.auc.dk

SUMMARY
We elaborate on the link between state space models and (Gaussian) Markov random fields. We
extend the Markov random field models by generalising the corresponding state space model. It
turns out that several non Gaussian spatial models can be analysed by combining approximate
Kalman filter techniques with importance sampling. We illustrate the ideas by formulating a
model for edge detection in digital images, which then forms the basis of a simulation study.

Keywords: EXTENDED KALMAN SMOOTHING; IMAGE RESTORATION; EDGE DETECTION; GAUSSIAN MIXTURES; LATTICE DATA.

1. INTRODUCTION
The class of state space models is very broad and comprises structural time series
models, ARIMA models, cubic spline models and, as demonstrated by Lavine (1999),
also Markov random field models. The Kalman filter techniques are powerful tools for
inference in such sequential models. Basic references on state space model methodology
are Harvey (1989), West and Harrison (1997) and Durbin and Koopman (2001). In
the past decade, interest has focused on developing Markov chain Monte Carlo (MCMC)
methods for the analysis of complex state space models, see Carlin et al. (1992),
Carter and Kohn (1994), Frühwirth-Schnatter (1994) and de Jong and Shephard (1995).
Our approach is not based on MCMC, but on iterated extended Kalman smoothing,
which may be combined with importance sampling for exact simulation, see Durbin
and Koopman (2001). Using this method, we avoid the MCMC problems of ensuring
that the Markov chain is mixing well and assessing whether the chain has converged.
Writing Markov random field models as state space models, following Lavine (1999),
makes it possible to use Kalman filter techniques to extend and analyse more complex
Markov random field models. We show how to analyse such extensions and illustrate
this by formulating a model for restoring digital images, with focus on finding edges in
the image. However, the new class of models also has applications within agricultural
experiments, see e.g. Besag and Higdon (1999), and within disease mapping, see e.g.
Knorr-Held and Rue (2002).

2. GAUSSIAN STATE SPACE MODELS


The Gaussian state space model involves two processes, namely the latent state process
$\theta_t = G_t \theta_{t-1} + \omega_t$ with $\omega_t \sim N_p(0, W_t)$, and the observation process $y_t = F_t^T \theta_t + \nu_t$
with $\nu_t \sim N_d(0, V_t)$. The latent process is initialised with $\theta_0 \sim N_p(m_0, C_0)$. We
assume that the disturbances $\{\nu_t\}$ and $\{\omega_t\}$ are both serially independent and mutually
independent. The possibly time-dependent system matrices $F_t$, $G_t$, $V_t$ and $W_t$ are
all considered known for every $t$. They may also depend on a parameter vector $\psi$, but
this is suppressed in the notation.
The Kalman filter recursively yields $p(\theta_t \mid D_t)$, the conditional distribution of $\theta_t$
given all information available, $D_t$, at current time $t$:
$$\theta_t \mid D_{t-1} \sim N_p(a_t, R_t), \qquad a_t = G_t m_{t-1}, \quad R_t = G_t C_{t-1} G_t^T + W_t,$$
$$y_t \mid D_{t-1} \sim N_d(f_t, Q_t), \qquad f_t = F_t^T a_t, \quad Q_t = F_t^T R_t F_t + V_t,$$
$$\theta_t \mid D_t \sim N_p(m_t, C_t), \qquad m_t = a_t + A_t(y_t - f_t), \quad C_t = R_t - A_t Q_t A_t^T,$$
where $A_t = R_t F_t Q_t^{-1}$.
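These recursions translate directly into code. The following R sketch (illustrative only, not the author's implementation from the web page cited later) runs one forward pass for time-invariant system matrices and also accumulates the predictive log density, which is used for likelihood evaluation in (3) below.

```r
# One forward pass of the Kalman filter recursions above; a minimal sketch
# assuming time-invariant system matrices Fmat (p x d), G, V, W.
kalman_filter <- function(y, Fmat, G, V, W, m0, C0) {
  n <- ncol(y); p <- length(m0)
  a <- m <- matrix(0, p, n)
  R <- C <- array(0, c(p, p, n))
  loglik <- 0                               # sum of log p(y_t | D_{t-1}), up to a constant
  m_prev <- m0; C_prev <- C0
  for (t in 1:n) {
    a[, t]   <- G %*% m_prev                         # a_t
    R[, , t] <- G %*% C_prev %*% t(G) + W            # R_t
    f <- t(Fmat) %*% a[, t]                          # f_t
    Q <- t(Fmat) %*% R[, , t] %*% Fmat + V           # Q_t
    A <- R[, , t] %*% Fmat %*% solve(Q)              # A_t = R_t F_t Q_t^{-1}
    e <- y[, t] - f                                  # forecast error
    m[, t]   <- a[, t] + A %*% e                     # m_t
    C[, , t] <- R[, , t] - A %*% Q %*% t(A)          # C_t
    loglik <- loglik - 0.5 * (log(det(Q)) + drop(t(e) %*% solve(Q, e)))
    m_prev <- m[, t]; C_prev <- C[, , t]
  }
  list(a = a, R = R, m = m, C = C, loglik = loglik)
}
```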
Assessment of the state vector, $\theta_t$, using all available information, $D_n$, is called
Kalman smoothing and we write $(\theta_t \mid D_n) \sim N_p(\tilde m_t, \tilde C_t)$. Starting with $\tilde m_n = m_n$ and
$\tilde C_n = C_n$, the Kalman smoother is a backwards recursion in time, $t = n-1, \ldots, 1$,
with $\tilde m_t = m_t + B_t(\tilde m_{t+1} - a_{t+1})$ and $\tilde C_t = C_t + B_t(\tilde C_{t+1} - R_{t+1})B_t^T$, where
$B_t = C_t G_{t+1}^T R_{t+1}^{-1}$. When $p$ is large it is often computationally faster to use the
mathematically equivalent disturbance smoother, see Koopman (1993).
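A matching backward pass might look as follows (again a sketch for constant $G$); it consumes the filtered and predicted moments stored by the filter sketch above.

```r
# Backward pass of the Kalman smoother, consuming the output of kalman_filter()
# above (which stores a_t, R_t, m_t, C_t for all t).
kalman_smoother <- function(kf, G) {
  n <- ncol(kf$m)
  ms <- kf$m; Cs <- kf$C                    # tilde m_n = m_n, tilde C_n = C_n
  for (t in (n - 1):1) {
    B <- kf$C[, , t] %*% t(G) %*% solve(kf$R[, , t + 1])                          # B_t
    ms[, t]   <- kf$m[, t] + B %*% (ms[, t + 1] - kf$a[, t + 1])                  # tilde m_t
    Cs[, , t] <- kf$C[, , t] + B %*% (Cs[, , t + 1] - kf$R[, , t + 1]) %*% t(B)   # tilde C_t
  }
  list(m = ms, C = Cs)
}
```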
The posterior mode of $p(\theta \mid y)$ is $\tilde m^T = (\tilde m_1^T, \ldots, \tilde m_n^T)$. From the definition of
conditional densities, $\tilde m$ maximises $p(\theta, y)$ and thus also
$$\log p(\theta, y) = \sum_{t=1}^n \log p(y_t \mid \theta_t) + \sum_{t=1}^n \log p(\theta_t \mid \theta_{t-1}) + \log p(\theta_0). \qquad (1)$$
The derivative with respect to $\theta$ is zero at the maximum, so $\tilde m$ solves the equations
$$\frac{\partial \log p(y_t \mid \theta_t)}{\partial \theta_t} + \frac{\partial \log p(\theta_t \mid \theta_{t-1})}{\partial \theta_t} + \frac{\partial \log p(\theta_{t+1} \mid \theta_t)}{\partial \theta_t} \cdot I_{[t \ne n]} = 0,$$
where $I_{[t \ne n]}$ is an indicator function, which is 1 when $t \ne n$ and zero otherwise. From
the definition of the state space model this gives
$$F_t V_t^{-1}(y_t - F_t^T \theta_t) - W_t^{-1}(\theta_t - G_t \theta_{t-1}) + G_{t+1}^T W_{t+1}^{-1}(\theta_{t+1} - G_{t+1}\theta_t) \cdot I_{[t \ne n]} = 0. \qquad (2)$$

We may now interpret the Kalman smoother as an algorithm that solves (2) recursively.
The log likelihood function for a vector of hyperparameters $\psi$ is given by
$$\ell(\psi) = \sum_{t=1}^n \log p(y_t \mid y_1, \ldots, y_{t-1}, \psi) = c - \frac{1}{2}\sum_{t=1}^n \left( \log |Q_t| + \| y_t - f_t \|^2_{Q_t^{-1}} \right), \qquad (3)$$

where $\|x\|^2_\Sigma = x^T \Sigma x$ and $c$ is a constant. The log likelihood for a given value of $\psi$ can
thus be obtained directly from the Kalman filter. The expression can then be maximised
numerically, yielding the maximum likelihood estimate. Approximate standard errors
can be extracted from numerical second derivatives.
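Under the assumptions of the filter sketch above, the numerical maximisation could be organised as follows; build_system() is a hypothetical user-supplied function mapping $\psi$ to the system matrices.

```r
# Numerical maximum likelihood for a hyperparameter psi via (3), using the
# loglik accumulated by kalman_filter() above; build_system() is hypothetical.
negloglik <- function(psi, y, m0, C0) {
  s <- build_system(psi)
  -kalman_filter(y, s$Fmat, s$G, s$V, s$W, m0, C0)$loglik
}
fit <- optim(par = 0, fn = negloglik, y = y, m0 = m0, C0 = C0,
             method = "BFGS", hessian = TRUE)
psi_hat <- fit$par
psi_se  <- sqrt(diag(solve(fit$hessian)))   # approximate standard errors from numerical Hessian
```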

3. NON GAUSSIAN STATE SPACE MODELS


Consider a state space model with observation density $p(y_t \mid F_t^T \theta_t)$, which may be
non Gaussian. The latent process is specified as $\theta_t = G_t \theta_{t-1} + \omega_t$, where $\omega_t \sim p(\omega_t)$
may be non Gaussian. Conditional on the latent process, the observations are assumed
serially independent. For notational convenience, we define $\lambda_t = F_t^T \theta_t$.
Kitagawa (1987) and Carlin et al. (1992) analysed similar models using numerical
approximation and MCMC techniques, respectively. Another approach to analysing these
types of state space models is particle filtering, see for example Doucet et al. (2000).
The exposition here is due to Durbin and Koopman (2001) and is based on maximising
the posterior $p(\theta \mid y)$ with respect to $\theta$. This is equivalent to maximising $\log p(\theta, y)$
given by (1). Differentiating with respect to $\theta_t$ and equating to zero yields
$$\frac{\partial \log p(\theta, y)}{\partial \theta_t} = F_t \frac{\partial \log p(y_t \mid \lambda_t)}{\partial \lambda_t} + \frac{\partial \log p(\omega_t)}{\partial \omega_t} - G_{t+1}^T \frac{\partial \log p(\omega_{t+1})}{\partial \omega_{t+1}} \cdot I_{[t \ne n]} = 0.$$
We assume that the densities are sufficiently well-behaved so that a unique maximum
exists and that it solves the above equations. For a discussion of this point, see Durbin
and Koopman (2001).
The strategy employed to find the maximum is to obtain an approximating Gaussian
state space model, identifying $\tilde y_t$, $\tilde V_t$ and $\tilde W_t$ by comparison with (2). The
approximation requires an initial value $\tilde\theta$, which is improved by iteration, the new value
being the output from the Kalman smoother, $\tilde m$, in the approximating linear state
space model. The procedure is called iterated extended Kalman smoothing.
We will consider two methods for approximation, depending on the form of the
densities. Both methods ensure a common mode of the approximating model and the
original model. The first method also ensures common curvature at the mode, but it is
not always applicable, in which case the second method must be used.
Method 1: Letting $\tilde\lambda_t = F_t^T \tilde\theta_t$, the observation part is linearised as
$$\frac{\partial \log p(y_t \mid \lambda_t)}{\partial \lambda_t} \approx \left.\frac{\partial \log p(y_t \mid \lambda_t)}{\partial \lambda_t}\right|_{\lambda_t = \tilde\lambda_t} + \left.\frac{\partial^2 \log p(y_t \mid \lambda_t)}{\partial \lambda_t \,\partial \lambda_t^T}\right|_{\lambda_t = \tilde\lambda_t} (\lambda_t - \tilde\lambda_t).$$
Comparing this with the first term in (2), we recognise it as a state space model with
(linear) observation part specified by
$$\tilde V_t = -\left[\left.\frac{\partial^2 \log p(y_t \mid \lambda_t)}{\partial \lambda_t \,\partial \lambda_t^T}\right|_{\lambda_t = \tilde\lambda_t}\right]^{-1}, \qquad
\tilde y_t = \tilde\lambda_t + \tilde V_t \left.\frac{\partial \log p(y_t \mid \lambda_t)}{\partial \lambda_t}\right|_{\lambda_t = \tilde\lambda_t}.$$
For example, exponential family distributions can be linearised using Method 1.
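As a concrete (illustrative) instance of Method 1, take a scalar Poisson observation with log link, $y_t \mid \lambda_t \sim \text{Po}(e^{\lambda_t})$; the score is $y_t - e^{\lambda_t}$ and the second derivative is $-e^{\lambda_t}$, giving the pseudo-quantities below.

```r
# Method 1 pseudo-observation and variance for a scalar Poisson observation
# with log link: y_t | lambda_t ~ Poisson(exp(lambda_t)).
# lambda_tilde is the current guess F_t' theta_tilde_t.
method1_poisson <- function(y, lambda_tilde) {
  score   <- y - exp(lambda_tilde)           # d log p / d lambda
  hessian <- -exp(lambda_tilde)              # d^2 log p / d lambda^2
  V_tilde <- -1 / hessian                    # approximating observation variance
  y_tilde <- lambda_tilde + V_tilde * score  # approximating pseudo-observation
  list(y_tilde = y_tilde, V_tilde = V_tilde)
}
```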
Method 2: In the second approach, it is assumed that the log densities are functions
of $(y_t - \lambda_t)^2$ or $\omega_t^2$, respectively. The method applies to either or both of
the derivatives $\partial \log p(y_t \mid \lambda_t)/\partial \lambda_t$ and $\partial \log p(\omega_t)/\partial \omega_t$, evaluated at $\tilde\lambda_t = F_t^T \tilde\theta_t$ and
$\tilde\omega_t = \tilde\theta_t - G_t \tilde\theta_{t-1}$, respectively. The latter term, at time $t+1$ and evaluated at $\tilde\omega_{t+1}$, is
also needed for insertion in (2).
Since
$$\frac{\partial \log p(y_t \mid \lambda_t)}{\partial \lambda_t} = -2\, \frac{\partial \log p(y_t \mid \lambda_t)}{\partial (y_t - \lambda_t)^2}\,(y_t - \lambda_t)$$
and
$$\frac{\partial \log p(\omega_t)}{\partial \omega_t} = 2\, \frac{\partial \log p(\omega_t)}{\partial \omega_t^2}\,(\theta_t - G_t \theta_{t-1}),$$
we see by comparison with (2) that the approximating model is given by
$$\tilde V_t = -\frac{1}{2}\left[\left.\frac{\partial \log p(y_t \mid \lambda_t)}{\partial (y_t - \lambda_t)^2}\right|_{\lambda_t = \tilde\lambda_t}\right]^{-1}, \qquad
\tilde W_t = -\frac{1}{2}\left[\left.\frac{\partial \log p(\omega_t)}{\partial \omega_t^2}\right|_{\omega_t = \tilde\omega_t}\right]^{-1}.$$
For example, t-distributions or Gaussian mixtures can be approximated by Method 2.
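As an illustration (the paper only notes that t-distributions fit Method 2; the particular parametrisation is an assumption), a t-distributed state increment with $\nu$ degrees of freedom and scale $s$ gives $\partial \log p(\omega)/\partial \omega^2 = -(\nu+1)/\{2(\nu s^2 + \omega^2)\}$, and hence:

```r
# Method 2 approximating state variance when the state increment omega_t
# follows a scaled t-distribution with nu degrees of freedom and scale s:
# log p(omega) = const - ((nu + 1)/2) * log(1 + omega^2 / (nu * s^2)), so
# W_tilde = -(1/2) / (d log p / d(omega^2)) = (nu * s^2 + omega^2) / (nu + 1).
method2_t <- function(omega_tilde, nu, s) {
  (nu * s^2 + omega_tilde^2) / (nu + 1)
}
```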

4. MARKOV RANDOM FIELDS


Let $S$ denote an $I \times J$ regular, finite lattice. Elements of $S$ are called sites or pixels,
and a typical element is denoted $s$, $ij$ or $(i,j)$, depending on the context. A site $s$ is
associated with two random variables, $y_s$ and $\theta_s$, denoting, respectively, the observed
and the latent value at $s$. The vector of random variables at a set of sites $A \subseteq S$ is
denoted $y_A$, and at the remaining sites $y_{-A} = y_{S \setminus A}$. The shorthand notation for $y_S$ is
$y$. The vector $y_i = \{y_{ij}\}_{j=1,\ldots,J}$ contains the variables in the $i$th row, and $y_{-i}$ is the vector at
sites outside the $i$th row. Similar notation is used for $\theta$ and other derived variables.
Let
$$T_l = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & -1 & 2 & -1 \\
0 & \cdots & 0 & -1 & 1
\end{pmatrix}_{l \times l}.$$
Then the prior density is given by Besag (1974),
$$p(\theta) \propto \exp\left( -\tfrac{1}{2}\, \theta^T P \theta \right), \qquad (4)$$
where $P$ is the $IJ \times IJ$ precision matrix
$$P = \tau_1^{-2}\, T_I \otimes I_J + \tau_2^{-2}\, I_I \otimes T_J,$$
and $\tau_1^2$ and $\tau_2^2$ are hyperparameters, which we assume are known. The prior is improper,
since the precision matrix is singular. The parameters $\tau_1^2$ and $\tau_2^2$ measure the degree of
dependency in the row- and column-direction, respectively.
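For concreteness, the precision matrix can be assembled in R with Kronecker products; a minimal sketch:

```r
# Build the (improper) GMRF precision matrix P = tau1^{-2} T_I (x) I_J + tau2^{-2} I_I (x) T_J
# for an I x J lattice; T_l is the structure matrix defined above.
make_T <- function(l) {
  T <- diag(2, l)
  T[1, 1] <- T[l, l] <- 1
  T[cbind(1:(l - 1), 2:l)] <- -1   # super-diagonal
  T[cbind(2:l, 1:(l - 1))] <- -1   # sub-diagonal
  T
}
make_P <- function(I, J, tau1sq, tau2sq) {
  kronecker(make_T(I), diag(J)) / tau1sq + kronecker(diag(I), make_T(J)) / tau2sq
}
```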
The observations $y$ are assumed to be normally distributed given $\theta$,
$$y \mid \theta \sim N_{IJ}(F^T \theta, \Sigma).$$

We will assume that $\Sigma$ and $F$ are given so that, in the following, the posterior is proper.
The posterior distribution of $\theta$ is given by Bayes' Theorem,
$$\theta \mid y \sim N_{IJ}(m, C), \qquad (5)$$
where $C = (P + F \Sigma^{-1} F^T)^{-1}$ and $m = C F \Sigma^{-1} y$. The aim is to assess the posterior
distribution, $p(\theta \mid y)$, in a computationally attractive way. Note that the matrices to
be inverted in the above expression are of size $IJ \times IJ$.
The Markov random field model is equivalent to the following state space model
in the sense that their posterior distributions are identical. The state space model
evolves along the rows instead of time:
$$\begin{pmatrix} y_i \\ x_i \end{pmatrix} \Big|\; \theta_i \sim N\!\left( \begin{pmatrix} F_i^T \theta_i \\ H\theta_i \end{pmatrix}, \begin{pmatrix} \Sigma_i & 0 \\ 0 & \tau_1^2 I_{J-1} \end{pmatrix} \right) \qquad (6)$$
$$\theta_i \mid \theta_{i-1} \sim N(\theta_{i-1}, \tau_2^2 I_J) \qquad (7)$$
$$p(\theta_1) \propto 1, \qquad (8)$$
where
$$H = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 \\
0 & 1 & -1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & 1 & -1
\end{pmatrix}_{(J-1) \times J}.$$
Thus $y_i$ are the observed rows, $\theta_i$ are the corresponding latent variables and $x_i$ are
so-called pseudo observations. The analysis of the model is carried out conditional on
the pseudo observations being observed equal to zero, as this ensures the equivalence of the
state space model with the Markov random field model. In other words, $\theta \mid x = 0$
corresponds to the Markov random field prior (4) and $\theta \mid x = 0, y$ corresponds to the
posterior (5). This equivalence was established by Lavine (1999).
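A sketch of how the row-wise system matrices could be set up in R, so that the Kalman filter and smoother of Section 2 run along the rows with the pseudo observations treated as data fixed at zero (helper names are hypothetical; $F_i = I$ and $\Sigma_i = \sigma^2 I$ are assumed for simplicity, as in Section 5):

```r
# System matrices for the row-wise state space representation (6)-(8).
make_H <- function(J) {
  H <- matrix(0, J - 1, J)
  H[cbind(1:(J - 1), 1:(J - 1))] <- 1
  H[cbind(1:(J - 1), 2:J)] <- -1
  H                                    # (J-1) x J first-difference matrix
}
make_row_system <- function(J, sigma2, tau1sq, tau2sq) {
  H <- make_H(J)
  list(
    Fmat = cbind(diag(J), t(H)),       # so that t(Fmat) %*% theta_i = (theta_i, H theta_i)
    V    = diag(c(rep(sigma2, J), rep(tau1sq, J - 1))),
    G    = diag(J),                    # theta_i = theta_{i-1} + omega_i
    W    = tau2sq * diag(J)
  )
}
# Each "time point" i then supplies the augmented observation c(y[i, ], rep(0, J - 1)),
# i.e., the pseudo observations are observed equal to zero.
```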

5. EXTENSIONS TO MARKOV RANDOM FIELD MODELS


We now extend the Markov random field model (6)–(8). For notational convenience,
we assume $F_i = I$ and $\Sigma_i = \sigma^2 I$. The model can then be written in coordinate form
as
$$y_{ij} \mid \theta_{ij} \sim N(\theta_{ij}, \sigma^2)$$
$$x_{ij} \mid (\theta_{ij}, \theta_{i,j+1}) \sim N(\theta_{ij} - \theta_{i,j+1}, \tau_1^2) \qquad (9)$$
$$\theta_{ij} \mid \theta_{i-1,j} \sim N(\theta_{i-1,j}, \tau_2^2). \qquad (10)$$
The idea is to substitute any or all of the Gaussian distributions with more general
distributions. These may be mixed in any way, and this opens up a large class of
models. For example, the observations may be Poisson distributed conditional on a
Markov random field, similar to the model analysed by Christensen and Waagepetersen
(2002). Another example is the fertility model considered by Besag and Higdon (1999),
where t-distributions are used to allow for observational outliers and for jumps in the
underlying fertility. In both examples, MCMC methods were used to analyse the models.
They were reanalysed in Dethlefsen (2002) using our methodology, which does not
make use of MCMC. Our approach is approximate, but using importance sampling
as described in Durbin and Koopman (2001), it is possible to assess various functions
of interest using exact simulation.
We illustrate our approach by substituting the distributions in (9) and (10) by a
mixture of Gaussian distributions, arriving at an image restoration model. Further
illustrations are given in Dethlefsen (2002).

5.1. Application in Digital Image Analysis


Let the observed digital image $y$ be represented by the grey scale values $y_{ij}$ in an $I \times J$
lattice made up of pixels $ij$, for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. We assume that $y_{ij}$ is an
indirect observation of the noise-free pixel value $\theta_{ij}$, so that the noise-free image is $\theta$.
Let $\mathcal{M}(\mu, k, v_1^2, v_2^2) = k\,N(\mu, v_1^2) + (1-k)\,N(\mu, v_2^2)$ be a mixture of two univariate
Gaussian distributions. Let $X \sim \mathcal{M}(0, k, v_1^2, v_2^2)$ and denote the density of $X$ by $p(x)$.
The log density is a function of $x^2$, so we use Method 2 to approximate $p(x)$ in the
point $\xi$. Thus, $p(x)$ is approximated by a Gaussian distribution with zero mean and
variance $\tilde v^2$, where
$$\tilde v^2 = \frac{k \exp[-\xi^2/(2v_1^2)]/v_1 + (1-k)\exp[-\xi^2/(2v_2^2)]/v_2}{k \exp[-\xi^2/(2v_1^2)]/v_1^3 + (1-k)\exp[-\xi^2/(2v_2^2)]/v_2^3}. \qquad (11)$$
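Expression (11) is straightforward to evaluate; a small R sketch, assuming $\xi$ denotes the point of evaluation:

```r
# Method 2 approximating variance (11) for the two-component Gaussian mixture
# M(0, k, v1^2, v2^2), evaluated at the point xi (vectorised over xi).
mixture_var <- function(xi, k, v1sq, v2sq) {
  w1 <- k * exp(-xi^2 / (2 * v1sq)) / sqrt(v1sq)
  w2 <- (1 - k) * exp(-xi^2 / (2 * v2sq)) / sqrt(v2sq)
  (w1 + w2) / (w1 / v1sq + w2 / v2sq)
}
```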

We now formulate a model for image restoration by replacing (9)–(10) with
$$x_{ij} \mid (\theta_{ij}, \theta_{i,j+1}) \sim \mathcal{M}(\theta_{ij} - \theta_{i,j+1}, k, \tau^2, c\tau^2)$$
$$\theta_{ij} \mid \theta_{i-1,j} \sim \mathcal{M}(\theta_{i-1,j}, k, \tau^2, c\tau^2),$$


where k is the probability of having variance τ 2 , and c > 1 is the scaling factor for the
variance of a jump. When c approaches 1, the model approaches the Gaussian Markov
random field model where edges tend to be blurred due to the smoothing nature of this
model. For larger values of c the model implies that a larger difference in neighbouring
values is needed in order to classify the jump as an edge.
In the approximating state space model, each pixel is associated with two variances.
These are calculated using (11) with $\xi$ substituted by $E[\theta_{ij} - \theta_{i-1,j} \mid x = 0, y]$ and
$E[\theta_{ij} - \theta_{i,j+1} \mid x = 0, y]$, respectively, and improved by iteration. If both variances are
small, the pixel is in a smooth part of the image. If one or both are large, this indicates
an edge in the up-down and/or left-right direction.
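The resulting iteration can be sketched as follows; smooth_image() is a hypothetical wrapper around the row-wise Kalman smoother of Section 4 that returns the posterior mode for given pixel-wise variances, and mixture_var() is the sketch of (11) above.

```r
# One possible organisation of the iterated extended Kalman smoother for the
# image restoration model (illustrative structure only, not the author's program).
iterate_image_model <- function(y, sigma2, tau2, k, c_scale, n_iter = 50) {
  I <- nrow(y); J <- ncol(y)
  W  <- matrix(tau2, I, J)        # up-down (state) variances, one per pixel
  V1 <- matrix(tau2, I, J - 1)    # left-right (pseudo-observation) variances
  theta <- y                      # initial guess for the latent image
  for (iter in 1:n_iter) {
    theta <- smooth_image(y, sigma2, V1, W)            # posterior mode given current variances
    xi_ud <- theta[-1, ] - theta[-I, ]                 # up-down increments
    xi_lr <- theta[, -J] - theta[, -1]                 # left-right increments
    W[-1, ] <- mixture_var(xi_ud, k, tau2, c_scale * tau2)   # update via (11)
    V1      <- mixture_var(xi_lr, k, tau2, c_scale * tau2)
  }
  list(theta = theta, W = W, V1 = V1)
}
```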
Simulation Example.
To illustrate the methodology, we simulate an image and denoise it using the image
restoration model. Implementation was done using R, see Ihaka and Gentleman (1996).
The programs are available from the web page www.math.auc.dk/~dethlef.
The simulated image is built up from four additive parts: a) an “egg box” function, b)
a solid disc, c) a solid rectangle and d) independent Gaussian noise. The sum of these
four contributions determines the grey scale value in each pixel.
Figure 1 shows a simulated image (left) of size 128 × 128 together with the posterior
mode (middle) and residuals (right) of the smoothed image obtained by the Kalman
smoother using the Gaussian Markov random field model with parameters obtained
from maximum likelihood estimation.
Figure 1. To the left is shown a simulated image. The middle image shows the posterior mode
found by Kalman smoothing using the Gaussian Markov random field model. The right image
shows the residual image.

The log likelihood is obtained from (3), but without the terms resulting from the
pseudo observations. It turned out to be useful to concentrate $\sigma^2$ out of the likelihood,
leaving only the ratio $\tau^2/\sigma^2$ to be estimated by the numerical maximisation
algorithm. For convenience, we worked with the transformed parameter $\log \tau/\sigma$, allowing
the maximiser to suggest any real number as input. In all runs, we have chosen
$m_{0j} = 128$ and $C_0 = 1000 \cdot I$.
The resulting maximum likelihood estimates were $\hat\tau^2 = 338$ and $\hat\sigma^2 = 117$, and the
posterior mode of $\theta$ is displayed in Figure 1 (middle). The result is very smooth and
edges are blurred, as expected from the model. The residual image in Figure 1 (right)
also suggests that the edges are over-smoothed.
Given a value of $\log \tau/\sigma$, the Kalman smoother took approximately 3 minutes in R
on a SUN Enterprise 220R machine. Maximum likelihood estimation of this parameter
took approximately 90 minutes on the same machine.

Figure 2 The posterior mode (left) after 50 iterations using the iterated extended Kalman
smoother. The residual image is shown in the middle and to the right is shown the average of
the two variances for each pixel calculated via (11).

The restored image after 50 iterations of the iterated extended Kalman smoother
from the image restoration model is seen in Figure 2 (left), along with the residual
image (middle). The parameters chosen were $\sigma^2 = 60$, $\tau^2 = 50$, $k = 0.95$ and $c = 25$.
These were chosen by tuning, since all attempts to perform maximum likelihood
estimation failed due to numerical instabilities. One run of 50 iterations of the model
took approximately 5 hours in R.

The image to the right in Figure 2 shows the average of the up-down and left-right
variances calculated in each iteration by (11). As seen, the edges are now found,
although the “egg box” function confuses the detection slightly. The edges in the posterior
mode of $\theta$ are clearer than in the result from the Gaussian Markov random field model.
The residual image resembles white noise and indicates a good fit.

6. DISCUSSION
We provide an alternative to MCMC analysis of spatial models. For non Gaussian
state space models, the iterated extended Kalman smoother is capable of finding an
approximating Gaussian state space model with the same posterior mode. This allows
us to construct Markov random field models with non Gaussian increments. The
approximating state space model can then be used as an importance density, as described
in Durbin and Koopman (2001), to provide exact sampling of quantities of interest.
In the image restoration example, we experience a weakness of our method. When
the lattice is high-dimensional, the iterated extended Kalman smoother is slow. For
this reason, we have not employed importance sampling in the example, but the result
from the approximating state space model seems very satisfactory.
We find that the methodology has great potential and a wide range of applications.
This is illustrated in Dethlefsen (2002) with examples from agricultural
experiments.

ACKNOWLEDGEMENTS
I am indebted to my Ph.D. supervisor Søren Lundbye-Christensen for inspiring discussions.

REFERENCES
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion).
J. Roy. Statist. Soc. B 36, 192–236.
Besag, J. and Higdon, D. (1999). Bayesian analysis of agricultural field experiments (with discussion).
J. Roy. Statist. Soc. B 61, 691–746.
Carlin, B. P., Polson, N. G., and Stoffer, D. S. (1992). A Monte Carlo approach to nonnormal and
nonlinear state-space modeling. J. Amer. Statist. Assoc. 87, 493–500.
Carter, C. K. and Kohn, R. (1994). On Gibbs Sampling for State Space Models. Biometrika 81, 541–553.
Christensen, O. F. and Waagepetersen, R. (2002). Bayesian prediction of spatial count data using gen-
eralised linear mixed models. Biometrics 58 (to appear).
de Jong, P. and Shephard, N. (1995). The simulation smoother for time series models. Biometrika 82,
339–350.
Dethlefsen, C. (2002). Space Time Problems and Applications. Ph.D. Thesis, Aalborg University.
Doucet, A., Godsill, S. J. and West, M. (2000). Monte Carlo filtering and smoothing with application to
time-varying spectral estimation. Proc. of the IEEE International Conference on Acoustics, Speech
and Signal Processing, volume II, 701–704.
Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford: Oxford
University Press.
Frühwirth-Schnatter, S. (1994). Data Augmentation and Dynamic Linear Models. J. Time Series Analysis 15, 183–202.
Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge:
Cambridge University Press.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. J. Comp. Graph.
Statist. 5, 299–314.

Kitagawa, G. (1987). Non-Gaussian state-space modeling of nonstationary time series (with discussion).
J. Amer. Statist. Assoc. 82, 1032–1063.
Knorr-Held, L. and Rue, H. (2002). On block updating in Markov random field models for disease
mapping. Scandinavian J. Statist. (to appear).
Koopman, S. J. (1993). Disturbance smoother for state space models. Biometrika 80, 117–126.
Lavine, M. (1999). Another look at conditionally Gaussian Markov random fields. Bayesian Statistics 6
(J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.). Oxford: Oxford University Press.
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag.
