Time I
Lecturer: Jason Wyse
December 3, 2014
Contents

1 Examples of Stochastic Processes
1.1 Importance
1.2 Gambler's Ruin
1.3 Social Mobility
1.4 Fancy a Drink Tonight?
1.5 Modelling Evolutionary Divergence

2 The
2.1
2.2
2.3
2.4

3
3.1 Decomposability
3.2 Periodicity
3.3 Stability
3.4 Long-Run Regularity
3.5
3.6 Detailed Balance

4 Poisson Processes
4.1 Assumptions of the Poisson Process
4.2 Probability Law of the Poisson Process
4.3 Moments of the Poisson Distribution
4.4 Times of First Arrival
4.5 Memoryless Property of the Exponential Distribution
4.6 Time to Occurrence of rth Event
4.7 Summary of Inter-Arrival Times
4.8 General Poisson Process
4.9 Compound Poisson Processes

5
5.1 Brownian Motion
5.2 Gaussian Processes
5.3
5.4 Finance Applications

6
6.1
6.2 Prior Distributions
6.3 Posterior Distributions

A Tutorials
A.1 Tutorial 1
A.2 Tutorial 2
1 Examples of Stochastic Processes

1.1 Importance
Stochastic models comprise some of the most powerful methods available to data analysts
in the description of observed real-life processes. In this module we will study Markov
models. The importance of these models is highlighted by the fact that:
1. a huge number of physical, biological, economic and social phenomena can be modelled naturally using them;
2. there is a well-developed body of theory and methods which allows us to do this modelling (in a correct way).
On a very crude level one could describe Markov models as models which use information observed in the past to give an idea of what to expect in the present.
Let's begin by considering some examples and giving informal notions of some key concepts.
1.2 Gambler's Ruin

Consider playing a game where on any play of the game you win €1 with probability p = 0.4 or alternatively lose €1 with probability 1 − p = 0.6. Suppose that you decide to quit playing either when you go broke or when your fortune reaches €N. Your expected gain on any single play is

(€1)p − (€1)(1 − p) = €(2p − 1) = −€0.2.

Let x_n be your fortune after n plays. Then

P(x_{n+1} = i + 1 | x_n = i, x_{n−1} = i_{n−1}, . . . , x_0 = i_0) = p = 0.4

where i_{n−1}, . . . , i_0 are past values of your fortune. Note the use here of the conditioning event: it is the probability that x_{n+1} = i + 1 conditional on the past fortunes.

Recall that

P(B|A) = P(B ∩ A) / P(A).

We say that x_n is a discrete (indexed by the positive integers) time Markov chain with transition probabilities p(i, j) if for any j, i, i_{n−1}, . . . , i_0:

P(x_{n+1} = j | x_n = i, x_{n−1} = i_{n−1}, . . . , x_0 = i_0) = p(i, j).

For the gambler's ruin chain, with absorbing states i = 0 and i = N:

p(i, i + 1) = 0.4 , p(i, i − 1) = 0.6 for 0 < i < N,
p(0, 0) = 1 , p(N, N) = 1,
p(i, i ± k) = 0 , k > 1.
For example, with N = 4 the one-step transition matrix on the states {0, 1, 2, 3, 4} is

P =
  1    0    0    0    0
 .6    0   .4    0    0
  0   .6    0   .4    0
  0    0   .6    0   .4
  0    0    0    0    1

where rows are indexed by the current fortune i and columns by the next fortune j.
1.3 Social Mobility

Suppose x_n is a family's social class in the nth generation, assuming this to be either 1 = lower class, 2 = middle, 3 = upper. In this very simplified version of sociology changes of status are a Markov chain with transition matrix:

P =
 .7  .2  .1
 .3  .5  .2
 .2  .4  .4

The graph diagram is as follows:
DIAGRAM

Critiques of this simple model of social mobility:
1. limited as there are only three states;
2. time (temporal) homogeneity: we may expect probabilities to change with time;
3. the model doesn't learn from the external environment (economy, etc.).

Nonetheless, using this simple model we can ask (and later answer) questions like: does the proportion of families in the three classes approach a limit?

Note that in any transition matrix P the sum of any row is 1:

Σ_j p(i, j) = 1   (a stochastic matrix).
1.4 Fancy a Drink Tonight?

This example works with two transition matrices, p_today and p_tomorrow.
1.5 Modelling Evolutionary Divergence

The fundamental description of all living organisms is their genome sequence. This is a string of 4 characters:

A = adenine, C = cytosine, G = guanine, T = thymine.

In DNA terminology these are the bases. DNA is a double-stranded helix (Watson & Crick); complementary base pairs: A with T, C with G.

E. coli → 4.6 × 10^6 base pairs.
Homo sapiens → 3 × 10^9 base pairs.

Briefly, evolution of organisms occurs because of mutations in these base pairs, which amounts to a copying error when DNA replicates. Looking at mutations is the key when talking about evolution. Modern man started our divergence from apes about 5-6 million years ago.

Modern evolutionary models are largely based on Markov processes, in continuous time. At any site on the genome we have a stochastic variable x(t) (t is time) taking one of the following values:

{1, 2, 3, 4} → {A, C, T, G}

so that the probabilities have the Markov property. The transition matrix is

P =
 P(A|A, t)  P(C|A, t)  P(G|A, t)  P(T|A, t)
    ...        ...        ...        ...
 P(A|T, t)  P(C|T, t)  P(G|T, t)  P(T|T, t)

Using some further assumptions on the structure of the probabilities we get the Jukes-Cantor model for base substitution:

p_{i,i}(t) = (1/4)(1 + 3e^{−4t})
p_{i,j}(t) = (1/4)(1 − e^{−4t}) , i ≠ j.
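The Jukes-Cantor transition probabilities can be coded directly; a quick numerical check (a sketch, with the rate normalised as in the formulas above) confirms that each row of the matrix is a probability distribution and that all four bases become equally likely as t grows.

```python
import numpy as np

def jukes_cantor(t):
    """4x4 Jukes-Cantor transition matrix at time t (rate normalised to 1)."""
    same = (1 + 3 * np.exp(-4 * t)) / 4   # probability of no net substitution
    diff = (1 - np.exp(-4 * t)) / 4       # probability of a given substitution
    P = np.full((4, 4), diff)
    np.fill_diagonal(P, same)
    return P

assert np.allclose(jukes_cantor(0.0), np.eye(4))               # t = 0: no change
assert np.allclose(jukes_cantor(0.3).sum(axis=1), 1.0)         # rows sum to 1
assert np.allclose(jukes_cantor(50.0), np.full((4, 4), 0.25))  # uniform limit
```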
2.1

A discrete time Markov chain {x_n} with transition probabilities p(i, j) satisfies, for all states j, i, i_{n−1}, . . . , i_0,

P(x_{n+1} = j | x_n = i, x_{n−1} = i_{n−1}, . . . , x_0 = i_0) = p(i, j).

2.2

2.3

The one-step transition probabilities p(i, j) give the probability that the chain x_n goes from state i (x_n = i) to state j in one step (x_{n+1} = j). As we know the p(i, j)'s are probabilities, it is clear that

0 ≤ p(i, j) ≤ 1 , ∀ 1 ≤ i, j ≤ k,

and since the chain either stays where it is or transitions to a different state,

Σ_{j=1}^{k} p(i, j) = 1.

Writing P for the k × k matrix with (i, j)th entry p(i, j), we refer to P as the one-step transition matrix of the Markov chain {x_n, n ≥ 0}.
Example:
Let {X_t, t ≥ 1} be independent identically distributed (iid). (Recall that this means: ∀ t, E(X_t) = μ, μ ∈ ℝ, and ∀ t, Var(X_t) = σ², σ² ∈ ℝ.) Suppose P(X_t = ℓ) = a_ℓ for ℓ = 0, ±1, ±2, . . .
Now suppose S_0 = 0, S_n = Σ_{t=1}^{n} X_t. Then

P(S_{n+1} = j | S_n = i, S_{n−1} = i_{n−1}, . . . , S_0 = 0) = P(X_{n+1} = j − i)   (iid)
                                                            = a_{j−i},

so the random walk {S_n, n ≥ 0} is a Markov chain.
End Example

Example:
Now consider the reflected process |S_n| built from the simple random walk {S_n, n ≥ 0}, S_0 = 0, S_n = Σ_{t=1}^{n} X_t, where

P(X_t = +1) = p , P(X_t = −1) = q = 1 − p.

Let j be the last time at or before n − 1 at which the walk was at zero, so i_j = 0. Since we know S_j = 0:

P(S_n = i | |S_n| = i, |S_{n−1}| = i_{n−1}, . . . , |S_1| = i_1) = P(S_n = i | |S_n| = i, |S_{n−1}| = i_{n−1}, . . . , |S_j| = 0).

There are two possible values of the sequence S_{j+1}, . . . , S_n for which |S_{j+1}| = i_{j+1}, . . . , |S_n| = i. Since the process does not cross zero between times j + 1 and n, these are i_{j+1}, . . . , i and −i_{j+1}, . . . , −i.

Assume i_{j+1} > 0. To obtain the first sequence we note that in the n − j steps there are i more up steps (+1) than down steps (−1). Let d_s be the number of down steps. Then

(d_s + i) + d_s = n − j  ⟹  d_s = (n − j − i)/2,

so there are (n − j + i)/2 up steps. Thus,

P(S_n = i | |S_n| = i, . . . , |S_{j+1}| = i_{j+1})
 = p^{(n−j+i)/2} q^{(n−j−i)/2} / [ p^{(n−j+i)/2} q^{(n−j−i)/2} + p^{(n−j−i)/2} q^{(n−j+i)/2} ]
 = p^i / (p^i + q^i).

Similarly,

P(S_n = −i | |S_n| = i, . . . , |S_{j+1}| = i_{j+1}) = q^i / (p^i + q^i).

From this, conditioning on whether S_n = i or S_n = −i,

P(|S_{n+1}| = i + 1 | |S_n| = i, . . .) = p · p^i/(p^i + q^i) + q · q^i/(p^i + q^i) = (p^{i+1} + q^{i+1}) / (p^i + q^i)

and

P(|S_{n+1}| = i − 1 | |S_n| = i, . . .) = (p^i (1 − p) + q^i (1 − q)) / (p^i + q^i) , ∀ i > 0,

with p(0, 1) = 1. These probabilities depend only on the current value i, so |S_n| is itself a Markov chain.
End Example
2.4

The probability p(i, j) = P(x_{n+1} = j | x_n = i) gives the probability of going from state i to state j in one step. How do we compute the probability of going from i to j in m steps,

p^m(i, j) = P(x_{n+m} = j | x_n = i) ?

Recall the social mobility example with

P =
 .7  .2  .1
 .3  .5  .2
 .2  .4  .4

If my grandmother was upper class (state 3) and my parents were middle class (state 2), what is the probability that I will be lower class (state 1)?

P(x_2 = 1 | x_1 = 2, x_0 = 3) = P(x_2 = 1, x_1 = 2, x_0 = 3) / P(x_1 = 2, x_0 = 3)

and, since the chain is Markov, we can drop the conditioning on x_0 = 3, giving p(2, 1) = .3. For two-step probabilities in general,

P(x_2 = 3 | x_0 = 2) = Σ_{ℓ=1}^{3} P(x_2 = 3, x_1 = ℓ | x_0 = 2) = Σ_{ℓ=1}^{3} p(2, ℓ) p(ℓ, 3)

and

P(x_2 = 2 | x_0 = 3) = Σ_{ℓ=1}^{3} P(x_2 = 2, x_1 = ℓ | x_0 = 3) = Σ_{ℓ=1}^{3} p(3, ℓ) p(ℓ, 2).

In general, for a chain with k states,

p²(i, j) = Σ_{ℓ=1}^{k} p(i, ℓ) p(ℓ, j).
If we think of transition matrices P in general, the term p²(i, j) can be seen as the dot product of the ith row of P with the jth column of P, i.e. the (i, j)th entry of P². More generally (the Chapman-Kolmogorov equations),

p^{m+n}(i, j) = Σ_{ℓ=1}^{k} p^m(i, ℓ) p^n(ℓ, j).

Proof:
To prove this equation we break things down according to the state at time m: the chain goes from i at time 0, through some state ℓ at time m, to j at time m + n. So

P(x_{m+n} = j | x_0 = i) = Σ_{ℓ=1}^{k} P(x_{m+n} = j, x_m = ℓ | x_0 = i).

Now use the definition of conditional probability for the term in the sum:

P(x_{m+n} = j, x_m = ℓ | x_0 = i) = P(x_{m+n} = j, x_m = ℓ, x_0 = i) / P(x_0 = i)
 = [ P(x_{m+n} = j, x_m = ℓ, x_0 = i) / P(x_m = ℓ, x_0 = i) ] · [ P(x_m = ℓ, x_0 = i) / P(x_0 = i) ]
 = p^n(ℓ, j) p^m(i, ℓ)   (by the Markov property).

Thus,

p^{m+n}(i, j) = Σ_{ℓ=1}^{k} p^m(i, ℓ) p^n(ℓ, j).   QED

In particular,

p^{m+1}(i, j) = Σ_{ℓ=1}^{k} p^m(i, ℓ) p(ℓ, j),

which is the ith row of the m-step transition matrix by the jth column of P. So the (m + 1)-step transition matrix is given by P^{m+1}: the m-step transition matrix is equal to the 1-step transition matrix to the power of m.
Example:
Let x_n be the weather on day n in Dublin, either rainy = 1 or not rainy = 2, with transition matrix

P =
 .8  .2
 .6  .4

The day after tomorrow? The two-step transition matrix is

P² =
 .76  .24
 .72  .28

so if it is rainy today, there is a 76% chance it is rainy the day after tomorrow. Further,

P^10 =
 .750  .250
 .749  .251

P^20 =
 .75  .25
 .75  .25
(approx.)

Note the apparent converging behaviour of the entries of the transition matrix as n gets larger and larger. More formally,

lim_{n→∞} P^n =
 3/4  1/4
 3/4  1/4

End Example
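The matrix powers quoted in this example are easy to reproduce numerically; a minimal sketch:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.6, 0.4]])   # Dublin weather: 1 = rainy, 2 = not rainy

P2 = np.linalg.matrix_power(P, 2)    # two-step transition matrix
P20 = np.linalg.matrix_power(P, 20)  # essentially the limiting matrix
print(P2)    # [[0.76 0.24], [0.72 0.28]]
print(P20)   # rows approach (0.75, 0.25)
```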
Consider the general 2-state chain, x_n ∈ {1, 2}, with transition matrix

P =
 1 − a    a
   b    1 − b

where 0 ≤ a ≤ 1, 0 ≤ b ≤ 1. What is P^n in general? In other words, what is the limiting behaviour as n → ∞?

If we can write P = QΛQ^{−1} with Λ diagonal, then P^n = QΛ^nQ^{−1}. The eigenvalues of P solve

|P − λI| = (1 − a − λ)(1 − b − λ) − ab = 0,

i.e.

λ² − (2 − a − b)λ + (1 − a − b) = 0,

so

λ_{1,2} = [ (2 − a − b) ± √( (2 − a − b)² − 4(1 − a − b) ) ] / 2
        = [ (2 − a − b) ± √( 4 − 4(a + b) + (a + b)² − 4 + 4(a + b) ) ] / 2
        = [ (2 − a − b) ± (a + b) ] / 2,

giving λ_1 = 1 and λ_2 = 1 − a − b, so

Λ =
 1        0
 0   1 − a − b

For λ_1 = 1 the eigenvector (y_1, y_2) satisfies

P (y_1, y_2)^T = (y_1, y_2)^T  ⟹  (1 − a)y_1 + a y_2 = y_1  ⟹  a y_1 = a y_2  ⟹  y_1 = y_2 = y,

so the first eigenvector is (y, y). For λ_2 = 1 − a − b the eigenvector (z_1, z_2) satisfies

(1 − a)z_1 + a z_2 = (1 − a − b)z_1  ⟹  b z_1 = −a z_2  ⟹  z_2 = −(b/a) z_1,

so the second eigenvector is (z, −(b/a)z). Now,

Q =
 y      z
 y   −(b/a)z

with inverse

Q^{−1} = (1 / (yz(a + b))) ·
 bz    az
 ay   −ay

Since P = QΛQ^{−1}, we have P^n = QΛ^nQ^{−1}. Multiplying out, with λ_2^n = (1 − a − b)^n,

P^n = (1/(a + b)) ·
 b + a(1 − a − b)^n     a − a(1 − a − b)^n
 b − b(1 − a − b)^n     a + b(1 − a − b)^n

What happens as n → ∞? We have (1 − a − b)^n → 0 as n → ∞ provided

−1 < 1 − a − b < 1  ⟺  0 < a + b < 2.

Thus, if 0 < a + b < 2, then

lim_{n→∞} P^n =
 b/(a + b)   a/(a + b)
 b/(a + b)   a/(a + b)

so, whatever the starting state, the long-run probabilities of states 1 and 2 are b/(a + b) and a/(a + b).

We know that

p^{n+1}(i, j) = Σ_{ℓ=1}^{2} p^n(i, ℓ) p(ℓ, j).

Letting n → ∞ on both sides, and writing π(j) for the long-run probability of state j,

π(j) = Σ_{ℓ=1}^{2} π(ℓ) p(ℓ, j),

or in matrix form

(π(1) π(2)) = (π(1) π(2)) ·
 p(1, 1)  p(1, 2)
 p(2, 1)  p(2, 2)

with solution (π(1), π(2)) = (b/(a + b), a/(a + b)).
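The closed-form expression for P^n can be checked against brute-force matrix powering; a sketch, where a and b are hypothetical illustrative values:

```python
import numpy as np

def two_state_power(a, b, n):
    """Closed-form P^n for P = [[1-a, a], [b, 1-b]] via eigendecomposition."""
    lam = (1 - a - b) ** n
    s = a + b
    return np.array([[(b + a * lam) / s, (a - a * lam) / s],
                     [(b - b * lam) / s, (a + b * lam) / s]])

a, b = 0.2, 0.6   # illustrative values with 0 < a + b < 2
P = np.array([[1 - a, a], [b, 1 - b]])
for n in (1, 2, 7, 25):
    assert np.allclose(two_state_power(a, b, n), np.linalg.matrix_power(P, n))
# limiting matrix: every row tends to (b/(a+b), a/(a+b)) = (0.75, 0.25)
assert np.allclose(two_state_power(a, b, 300), [[0.75, 0.25], [0.75, 0.25]])
```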
In this chapter we will look at properties which can be used to classify the behaviour of Markov chains.

3.1 Decomposability

Suppose we have two disjoint closed sets A_1 and A_2. If we start the chain in A_1, i.e. x_0 ∈ A_1, then the states outside of A_1 are immaterial, and the process (chain) can be analysed solely through its movement in A_1. This is the idea of decomposability.

A Markov chain is indecomposable if its set of states does not contain two or more disjoint closed sets of states.

DIAGRAM
3.2 Periodicity

Some Markov chains exhibit periodic behaviour. Suppose that the states are indecomposable and consider, say, the two-step transition probabilities p²(i, A) (two-step transition probabilities in going from state i to states in the set A).

It is possible that the states decompose into two closed sets under this transition probability. That is, there are two disjoint sets B_1 and B_2 such that

p²(i, B_1) = 1 ∀ i ∈ B_1
p²(i, B_2) = 1 ∀ i ∈ B_2

Example
In the simple random walk

p(i, i + 1) = p , p(i, i − 1) = q = 1 − p.

DIAGRAM

If i is an odd integer, then the next state will be even, and the next state again will be odd. Similarly, if i is even, in two steps the state will be even again. So if we let B_1 be the even integers and B_2 the odd integers, then

p²(i, B_1) = 1 ∀ i even
p²(i, B_2) = 1 ∀ i odd
3.3
! Stability
Stability
Suppose X0 = x. We want to know what statements can be made about the chain after
a large number of subsequent movements. A key question is that of stability.
Regardless of initial state, can the states that are visited by a chain be represented
by some limiting distribution after a large number of steps? If we think of the m-step
transition probabilities pm (x, A) (the probability of being in A after m steps) the question
were asking is:
Is there a limiting distribution (A) such that pm (x, A) ! (A) as m ! 1.
If the answer is yes, we say that the chain is stable. Stability here is a property of the
chains transition probabilities since pm (x, A) ! (A) regardless of the arbitrary state x.
It can be seen that if the chain is decomposable or periodic, we cant have stability.
Decomposability:
Let A1 , A2 be disjoint closed sets of states. Then for every m
(
1 , x 2 A1
pm (x, A1 ) =
0 , x 2 A2
with the reverse for m odd. So theres no limiting value for this probability as m ! 1.
At the very least the chain must be indecomposable and aperiodic to be stable.
Exercise:
Find an example of a decomposable chain and a periodic chain.
17
3.4 Long-Run Regularity

If the chain is stable it has some long-run regularity properties. No matter what state we start from, the proportion of time the chain spends in the set of states A will be π(A).

Count the number of times x_1, . . . , x_m is in A. Let

f(x_t) = 1 if x_t ∈ A, and 0 otherwise.

Then the proportion of time spent in A up to time m is

(1/m) Σ_{t=1}^{m} f(x_t),

and since E[f(x_t)] = P(x_t ∈ A) = p^t(x, A), the expected proportion of time spent in A is

(1/m) Σ_{t=1}^{m} p^t(x, A).

As time ticks on, as m → ∞, then since the chain is stable p^m(x, A) → π(A). If the numbers in a sequence satisfy α_t → α, then their averages converge too:

(1/m) Σ_{t=1}^{m} α_t → α.

So that

(1/m) Σ_{t=1}^{m} p^t(x, A) → π(A)

and, moreover,

P( | (1/m) Σ_{t=1}^{m} f(x_t) − π(A) | > ε ) → 0

as m gets large(r).
3.5
! stationary distribution
! irreducible
18
m+1
(x, A) =
k
X
pm (x, `)p(`, A)
`=1
k
X
(`)p(`, A)
`=1
..
.
3
p(1, k)
p(2, k)7
7
.. 7
. 5
p(k, k)
a
b
19
a
1
(1) (2)
(1) (2)
1 = (1
a
b
a
1
a)1 + b2
2 = a1 + (1
b)2
a
= a1 =) 2 = 1
b
b2
a
1 + 2 = 1, 1 + 1 =
b
a+b
1
= 1
b
b
a+b
1 =
b
a
=
a+b
a+b
2 = 1
)
b
a+b
a
a+b
Quick Recap:
Stable, stationary distributions . . .
k k transition matrix P
P =
` = 1
`=1
Example
Weather on day n in Dublin: 1 = rainy, 2 = not rainy,

P =
 .8  .2
 .6  .4

This has the form of the general 2-state chain with a = .2, b = .6, so

(π_1, π_2) = ( b/(a + b) , a/(a + b) ) = ( .6/.8 , .2/.8 ) = ( 3/4 , 1/4 ).

Alternatively, solve πP = π directly:

.8π_1 + .6π_2 = π_1   (I)
.2π_1 + .4π_2 = π_2   (II)
π_1 + π_2 = 1  ⟹  π_2 = 1 − π_1.

Substituting into (I):

.8π_1 + .6(1 − π_1) = π_1
.8π_1 + .6 − .6π_1 = π_1
.6 = .8π_1

so π_1 = 6/8 = 3/4 and π_2 = 1/4.

End Example
Example:
Social mobility. Recall the transition matrix:

P =
 .7  .2  .1
 .3  .5  .2
 .2  .4  .4

Does the proportion of families falling into the three social classes approach a stable limit? Solve (π_1 π_2 π_3) P = (π_1 π_2 π_3):

.7π_1 + .3π_2 + .2π_3 = π_1   (I)
.2π_1 + .5π_2 + .4π_3 = π_2   (II)
.1π_1 + .2π_2 + .4π_3 = π_3   (III)
π_1 + π_2 + π_3 = 1

Solving yields:

π_1 = 22/47 , π_2 = 16/47 , π_3 = 9/47.

End Example
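The stationary distribution can also be found numerically as the (normalised) left eigenvector of P for eigenvalue 1; a sketch for the social mobility chain:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.4, 0.4]])

vals, vecs = np.linalg.eig(P.T)                   # left eigenvectors of P
v = np.real(vecs[:, np.argmin(np.abs(vals - 1))]) # eigenvector for eigenvalue 1
pi = v / v.sum()                                  # normalise to sum to 1
print(pi)                                         # approx (22/47, 16/47, 9/47)
assert np.allclose(pi @ P, pi)                    # stationarity: pi P = pi
```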
Theorem:
For an indecomposable, non-periodic chain with transition probabilities p(x, A) such that any two states x and y communicate, the system of equations

π(j) = Σ_{ℓ=1}^{k} π(ℓ) p(ℓ, j) , j = 1, . . . , k,
Σ_{ℓ=1}^{k} π(ℓ) = 1

has a unique solution π, and p^m(x, j) → π(j) as m → ∞ for every starting state x.
3.6 Detailed Balance

A distribution π satisfies detailed balance with respect to p if

π(x) p(x, y) = π(y) p(y, x) for all states x, y.

Detailed balance implies stationarity: summing over x,

Σ_{x=1}^{k} π(x) p(x, y) = Σ_{x=1}^{k} π(y) p(y, x) = π(y) Σ_{x=1}^{k} p(y, x) = π(y).

A good way to think of detailed balance is as follows: imagine a beach where π(x) gives the amount of sand at mound x. A transition of the chain means a fraction p(x, y) of the sand at x is transferred to y. Detailed balance says that the amount of sand going from x to y in one step is completely balanced by the amount going back from y to x:

π(x) p(x, y) = π(y) p(y, x).

In contrast, πP = π says that after all transfers of sand, the amount that ends up on each mound is the same as the amount that started there.
Example:
A graph is defined by giving two things:
1. a set of vertices V (finite);
2. an adjacency matrix A(u, v) which is 1 if there is an edge connecting u and v, and is 0 otherwise.
(Actors = group of nodes.)

The adjacency matrix can be used to describe the topology of the graph. By convention A(v, v) = 0 ∀ v ∈ V. For example, for a graph on the vertices {1, 2, 3, 4}:

A(u, v) =
 0  1  1  0
 1  0  1  0
 1  1  0  1
 0  0  1  0

The degree of a vertex u is

d(u) = Σ_{v∈V} A(u, v),

since each neighbour of u contributes 1 to this sum. Now consider a random walk X_n on this graph. Define the transition probability by

p(u, v) = A(u, v) / d(u).

This says

d(u) p(u, v) = A(u, v).

Try a stationary distribution of the form π(u) = c d(u) and check detailed balance:

π(u) p(u, v) = c d(u) A(u, v)/d(u) = c A(u, v) = c A(v, u) = π(v) p(v, u).

So π(u) = c d(u) satisfies detailed balance, hence is stationary. Requiring Σ_{v∈V} π(v) = 1 gives

c = 1 / Σ_{v∈V} d(v) and π(u) = d(u) / Σ_{v∈V} d(v).

End Example
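The degree formula for the stationary distribution of a random walk on a graph can be checked numerically; a sketch, using a small 4-vertex graph (a hypothetical adjacency matrix in the spirit of the example):

```python
import numpy as np

# A small 4-vertex graph (illustrative adjacency matrix).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
d = A.sum(axis=1)           # vertex degrees d(u)
P = A / d[:, None]          # random walk: p(u, v) = A(u, v) / d(u)
pi = d / d.sum()            # claimed stationary distribution d(u) / sum d(v)

flows = pi[:, None] * P     # flows[u, v] = pi(u) p(u, v)
assert np.allclose(flows, flows.T)   # detailed balance holds
assert np.allclose(pi @ P, pi)       # hence pi is stationary
```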
4 Poisson Processes

4.1 Assumptions of the Poisson Process

First, we assume that we observe the process for a fixed period of time t. The number of events that occur in this fixed interval (0, t] is a random variable X. X will be discrete and its probability law will depend on the manner in which events occur. We make the following assumptions about the way in which events occur:

1. In a sufficiently short length of time Δt, either 0 or 1 events occur in that time (two or more simultaneous occurrences are impossible).

2. The probability of exactly one event occurring in this short time interval of length Δt is λΔt. So, the probability of exactly one event occurring in the interval is proportional to the length of the interval.

3. Events in non-overlapping intervals of length Δt occur independently of one another.

These three assumptions are the assumptions for a Poisson process with parameter λ.
4.2 Probability Law of the Poisson Process

Divide the interval (0, t] into n non-overlapping equal subintervals, each of length Δt = t/n. By the assumptions above, each subinterval contains one event with probability λΔt = λt/n and no event otherwise, independently of the other subintervals. So X is approximately Binomial(n, λt/n):

P(X = k) = (n choose k) (λt/n)^k (1 − λt/n)^{n−k}
 = [ n! / (k!(n−k)!) ] (1/n^k) (λt)^k (1 − λt/n)^{n−k}
 = [ n(n−1)···(n−k+1) / n^k ] [ (λt)^k / k! ] (1 − λt/n)^n (1 − λt/n)^{−k}.

Now,

lim_{n→∞} n(n−1)(n−2)···(n−k+1) / n^k = lim_{n→∞} 1 · (1 − 1/n)(1 − 2/n)···(1 − (k−1)/n) = 1,
lim_{n→∞} (1 − λt/n)^{−k} = 1,
lim_{n→∞} (1 − λt/n)^n = e^{−λt},

so in the limit as Δt → 0 (the same as n → ∞),

lim P(X = k) = [ (λt)^k / k! ] e^{−λt},

i.e. X follows a Poisson distribution with parameter λt. Recall that

e^x = 1 + x + x²/2! + · · · = Σ_{j=0}^{∞} x^j / j! , valid for x ∈ ℝ,

so

Σ_{k=0}^{∞} P(X = k) = Σ_{k=0}^{∞} [ (λt)^k / k! ] e^{−λt} = e^{−λt} e^{λt} = 1,

as we'd expect.
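The binomial-to-Poisson limit above can be illustrated numerically: for fixed k and λt, the binomial probability approaches the Poisson probability as n grows. A minimal sketch with hypothetical values λ = 2, t = 3, k = 4:

```python
from math import comb, exp, factorial

lam, t, k = 2.0, 3.0, 4     # hypothetical rate, time window and event count
mu = lam * t                # Poisson mean

def binom_pmf(n):
    """P(X = k) for X ~ Binomial(n, mu/n)."""
    p = mu / n
    return comb(n, k) * p**k * (1 - p)**(n - k)

poisson = mu**k * exp(-mu) / factorial(k)
print(abs(binom_pmf(100) - poisson))      # small already
print(abs(binom_pmf(100000) - poisson))   # far smaller: convergence to Poisson
```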
4.3 Moments of the Poisson Distribution

For any ℓ ≥ 1, the factorial moment is

E[X(X − 1) · · · (X − ℓ + 1)] = (λt)^ℓ.

The terms with k < ℓ vanish, since then k(k − 1) · · · (k − ℓ + 1) = 0, so

E[X(X − 1) · · · (X − ℓ + 1)] = Σ_{k=ℓ}^{∞} k(k − 1) · · · (k − ℓ + 1) [ (λt)^k / k! ] e^{−λt}
 = e^{−λt} Σ_{k=ℓ}^{∞} (λt)^{k−ℓ} (λt)^ℓ / (k − ℓ)!
 = e^{−λt} (λt)^ℓ Σ_{j=0}^{∞} (λt)^j / j!
 = e^{−λt} (λt)^ℓ e^{λt}
 = (λt)^ℓ.

So then,

E[X] = (λt)^1 = λt
E[X(X − 1)] = (λt)²

and hence Var(X) = E[X(X − 1)] + E[X] − E[X]² = (λt)² + λt − (λt)² = λt.
Example:
Assume molecules of a rare gas occur at an average rate of λ per cubic metre. If it is reasonable to assume that these molecules of the gas are distributed independently in the air, then the number of molecules in a cubic metre of air is a Poisson random variable with rate parameter λ. If we wanted to be 100(1 − α)% confident of finding at least one molecule of the gas in a sample of air, what sample size of air would we need to take?

Let the sample size be s cubic metres, and let the number of molecules be X, which is Poisson distributed with rate λs. We would require

P(X ≥ 1) = 1 − P(X = 0) = 1 − (λs)^0 e^{−λs}/0! = 1 − e^{−λs} ≥ 1 − α.

So e^{−λs} ≤ α, giving −λs ≤ log α, i.e.

s ≥ log(1/α) / λ.

End Example
4.4 Times of First Arrival

Let T be the time to the first event. For fixed t > 0, {T > t} occurs exactly when there are no events in (0, t], so

P(T ≤ t) = F_T(t)          (distribution function)
P(T > t) = 1 − F_T(t) = e^{−λt} , t > 0   (survival function)

and the density is

f_T(t) = (d/dt) F_T(t) = λ e^{−λt} , t > 0.

So the time to the first event in a Poisson process is exponentially distributed with parameter λ. The expected value for an exponential random variable is

E(T) = ∫_0^∞ t λ e^{−λt} dt = 1/λ.

More generally, the moment generating function is

m_T(s) = E(e^{sT}) = ∫_0^∞ e^{st} λ e^{−λt} dt = λ/(λ − s) , s < λ,

and moments are obtained by differentiating:

E(T) = (d/ds) m_T(s) |_{s=0} , E(T^j) = (d^j/ds^j) m_T(s) |_{s=0}.

The generating function from above can be used to verify that

Var(T) = 1/λ²

and so the standard deviation is the same as the mean.

Example:
Students arrive at a lecture at a rate of λ = 2 per minute. If I observe for 3 minutes, what is the probability of no students arriving? With λt = 6,

P(X = 0) = (λt)^0 e^{−λt} / 0! = e^{−6} ≈ 0.0025.

So the probability of observing no students arriving in this interval is incredibly small.
End Example
4.5 Memoryless Property of the Exponential Distribution

The exponential probability law has the memoryless property. If T is exponential with parameter λ and a and b are positive constants, then

P(T > a + b | T > a) = P(T > a + b) / P(T > a) = e^{−λ(a+b)} / e^{−λa} = e^{−λb} = P(T > b).

The exponential distribution is the only continuous probability law with the memoryless property. There are some similarities between the exponential and geometric probability distributions: for independent Bernoulli trials X_1, . . . , X_n, the number of trials to the first success is a geometric random variable. The geometric is the number of trials to first success, while the exponential represents the time to first event in a Poisson process. If Y is a geometric RV with parameter p, then

P(Y > n) = (1 − p)^n,

and with p = λt/n,

lim_{n→∞} (1 − λt/n)^n = e^{−λt}.
4.6 Time to Occurrence of rth Event

Suppose we begin observing a Poisson process at time zero and let T_r be the time to occurrence of the rth event, r ≥ 1. This random variable is analogous to a negative binomial random variable. Again let t be any fixed number and consider the event {T_r > t} (time to rth event greater than t). {T_r > t} is equivalent to the event {X ≤ r − 1}, where X is the number of events in (0, t], since T_r can only exceed t if there are r − 1 or fewer events in (0, t]. X is Poisson with parameter λt, so

P(T_r > t) = P(X ≤ r − 1) = Σ_{k=0}^{r−1} (λt)^k e^{−λt} / k!

and

F_{T_r}(t) = 1 − Σ_{k=0}^{r−1} (λt)^k e^{−λt} / k!.

T_r is called an Erlang random variable with parameters r and λ. The density function for T_r is

f_{T_r}(t) = (d/dt) F_{T_r}(t)
 = −(d/dt) [ e^{−λt} + λt e^{−λt} + ((λt)²/2!) e^{−λt} + · · · + ((λt)^{r−1}/(r−1)!) e^{−λt} ].

Differentiating term by term, the terms cancel in pairs except the last, leaving

f_{T_r}(t) = λ^r t^{r−1} e^{−λt} / (r − 1)! = λ^r t^{r−1} e^{−λt} / Γ(r) , t > 0,

since Γ(r) = (r − 1)! for integer r. So T_r is gamma distributed with shape r and rate λ.
Example:
Calls arrive according to a Poisson process at a rate of λ = 2 per minute, starting at 9.00am. The expected arrival time of the tenth call is

E(T_10) = r/λ = 10/2 = 5 mins,

so at 9.05am. The probability that the tenth call is received by 9.05am is, with λt = 5(2) = 10,

P(T_10 ≤ 5) = 1 − Σ_{k=0}^{9} 10^k e^{−10} / k! = .542.

The probability that the tenth call is received between 9.05am and 9.07am:

P(5 < T_10 ≤ 7) = ( 1 − Σ_{k=0}^{9} (14)^k e^{−14} / k! ) − ( 1 − Σ_{k=0}^{9} 10^k e^{−10} / k! ) = .349.

End Example
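The Erlang probabilities in this example can be computed directly from the Poisson survival sum P(T_r > t) = Σ_{k=0}^{r−1} (λt)^k e^{−λt}/k!; a minimal sketch:

```python
from math import exp, factorial

def erlang_surv(r, lam, t):
    """P(T_r > t): r-th event later than t, via the Poisson sum."""
    m = lam * t
    return sum(m**k * exp(-m) / factorial(k) for k in range(r))

# Calls at rate 2 per minute from 9.00am; T_10 is the time of the tenth call.
p_by_905 = 1 - erlang_surv(10, 2, 5)                      # tenth call by 9.05am
p_between = erlang_surv(10, 2, 5) - erlang_surv(10, 2, 7)  # between 9.05 and 9.07
print(round(p_by_905, 3), round(p_between, 3))            # 0.542 0.349
```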
4.7 Summary of Inter-Arrival Times

We have seen some results concerning the distributions of the times between occurrences in a Poisson process:

1. the distribution of the time to the first event is exponential(λ);
2. times between events are exponential(λ);
3. the time to the rth event is gamma distributed, shape = r, rate = λ (scale = 1/rate).

So far we have assumed that the rate of occurrence λ is constant; this is called a time homogeneous Poisson process.

Let X(t) denote the number of events in (0, t]. Then the Poisson process is said to have independent increments. Let T_1, T_2, T_3, T_4, . . . denote the arrival times of the process, and define T_0 = 0; then X(T_1) − X(T_0), X(T_2) − X(T_1), X(T_3) − X(T_2), . . . are independent random variables. This is since X(t + s) − X(s), t ≥ 0, is independent of X(r), 0 ≤ r ≤ s.
4.8 General Poisson Process

Let X(t) be the number of events in (0, t]. We say that X(t) is a Poisson process with rate λ(t) if:

1. X(0) = 0;
2. X(t) has independent increments;
3. X(t) − X(s) is Poisson distributed with mean ∫_s^t λ(r) dr, for 0 ≤ s < t.

For the time to first arrival,

P(T_1 > t) = P(X(t) = 0) = exp( −∫_0^t λ(r) dr ),

so

f_{T_1}(t) = −(d/dt) exp( −∫_0^t λ(r) dr ) = λ(t) exp( −∫_0^t λ(r) dr ).

If we call m(t) = ∫_0^t λ(r) dr, then we can see f_{T_1}(t) = λ(t) e^{−m(t)}, which generalises the exponential distribution. When λ(r) = λ is constant,

m(t) = ∫_0^t λ dr = λt

and we recover f_{T_1}(t) = λ e^{−λt}.
When λ(t) depends explicitly on t, i.e. is non-constant, we term this a time non-homogeneous Poisson process. An example is a change point in the rate:

λ(t) = λ_1 , t < τ
λ(t) = λ_2 , t ≥ τ.

Showing that a Poisson process satisfies the Markov property in general follows from the independent increments property (2). However, a Poisson process is a continuous time process, so we need to formally say what we mean by the Markov property in continuous time. In discrete time we observe our process at time points 0, 1, 2, 3, . . . , n, n + 1, . . .; for continuous time we observe the process at arbitrary points in time [ℝ_+]:

0 = s_0 < s_1 < s_2 < . . . < s_k < s < t < t_1 < . . . < t_n,

with states i_0, i_1, i_2, . . . , i_k, i, j, j_1, . . . , j_n. We say that the Markov property holds if for these arbitrary points in time:

P(X(t) = j, X(t_1) = j_1, . . . , X(t_n) = j_n | X(s_0) = i_0, . . . , X(s_k) = i_k, X(s) = i)
 = P(X(t) = j, X(t_1) = j_1, . . . , X(t_n) = j_n | X(s) = i).

Compare to the discrete time definition. For the Poisson process,

P(X(t) = j | X(s) = i) = P(X(t) = j, X(s) = i) / P(X(s) = i)
 (independent increments) = P(X(t) − X(s) = j − i) P(X(s) = i) / P(X(s) = i)
 = P(X(t) − X(s) = j − i)
 = [ ∫_s^t λ(r) dr ]^{j−i} exp( −∫_s^t λ(r) dr ) / (j − i)!.

Therefore, the Poisson process satisfies the Markov property. We will denote P(X(t) = j | X(s) = i) by P_{s,t}(i, j) for continuous time processes. In the next chapter we will meet examples where the states form a continuous random variable.
4.9 Compound Poisson Processes

Let X(t) be a Poisson process and let Y_1, Y_2, . . . be iid random variables, independent of X(t), each distributed like Y. The compound Poisson process is

S(t) = Y_1 + Y_2 + · · · + Y_{X(t)}.

For the mean, condition on X(t):

E[S(t)] = Σ_{n=0}^{∞} E[S(t) | X(t) = n] P(X(t) = n)
 = Σ_{n=0}^{∞} n E(Y) P(X(t) = n)
 = E[Y] Σ_{n=0}^{∞} n P(X(t) = n)
 = E[Y] E[X(t)].

For the variance, again if X(t) = n, then

Var[S(t) | X(t) = n] = Var[Y_1 + . . . + Y_n] = n Var[Y],

so E[S(t)² | X(t) = n] = n Var[Y] + n² E[Y]². Hence,

E[S(t)²] = Σ_{n=0}^{∞} ( n Var[Y] + n² E[Y]² ) P(X(t) = n) = Var[Y] E[X(t)] + E[Y]² E[X(t)²]

and

Var[S(t)] = E[S(t)²] − E[S(t)]²
 = Var[Y] E[X(t)] + E[Y]² ( E[X(t)²] − E[X(t)]² )
 = Var[Y] E[X(t)] + E[Y]² Var[X(t)]
 = λt ( Var[Y] + E[Y]² ) = λt E[Y²].
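These moment formulas are easy to check by simulation. The sketch below assumes a hypothetical jump distribution Y ~ Uniform(0, 1), so E[Y] = 1/2 and E[Y²] = 1/3, with λ = 3 and t = 2: theory gives E[S(t)] = λt E[Y] = 3 and Var[S(t)] = λt E[Y²] = 2.

```python
import random
random.seed(1)

lam, t = 3.0, 2.0

def sample_compound():
    # count events in (0, t] via exponential inter-arrival times
    n, clock = 0, random.expovariate(lam)
    while clock <= t:
        n += 1
        clock += random.expovariate(lam)
    # sum n iid jumps; Y ~ Uniform(0, 1) is a hypothetical choice
    return sum(random.random() for _ in range(n))

draws = [sample_compound() for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((s - mean) ** 2 for s in draws) / len(draws)
print(mean, var)   # theory: mean = 3, variance = 2
```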
5.1 Brownian Motion

Consider the symmetric random walk, which at each time step is equally likely to move one unit up or down:

P(X_i = +1) = 1/2 , P(X_i = −1) = 1/2.

If we think about speeding up this process, i.e., looking at it in smaller and smaller time intervals for smaller and smaller increments to the left and right, we'll get a continuous time process.

In this regard, consider the symmetric random walk taking steps over short intervals of length Δt, with steps of size Δx. Let X(t) be the value of the process at time t, and we'll imagine we have n = t/Δt time intervals.

DIAGRAM

Then,

X(t) = Δx X_1 + Δx X_2 + . . . + Δx X_{[t/Δt]} = Δx [ X_1 + X_2 + . . . + X_{[t/Δt]} ],

so

E[X(t)] = Δx [t/Δt] E[X_1] = 0,

since E[X_1] = (1/2)(1) + (1/2)(−1) = 0, and

Var[X(t)] = (Δx)² [t/Δt] Var(X_1) = (Δx)² (t/Δt),

since E[X_1²] = (1/2)(1)² + (1/2)(−1)² = 1. Now we want to take the limit as Δx and Δt tend to 0. Let

Δx = c √Δt.

Then

Var[X(t)] = c² Δt (t/Δt) = c² t.

The process that we're left with in the limit is Brownian motion. Observe some more properties of this process:

1. Since X(t) = Δx (X_1 + X_2 + . . . + X_{[t/Δt]}), by the Central Limit Theorem, X(t) follows a normal distribution with mean 0 and variance c²t.

2. As the distribution of the change in position of the random walk is independent over non-overlapping time intervals, this implies that {X(t), t ≥ 0} has independent increments.

3. This process also has stationary increments, since the change in the process value over a given time interval, X(t + s) − X(s) ~ N(0, c²t), depends only on the length of the interval.

The standard Brownian motion (c = 1) is sometimes called the Wiener process. It is one of the most widely used processes in applied probability.

The independent increments assumption implies that the change in the value of the process between times s and t + s, i.e. X(t + s) − X(s), is independent of the process values before time s:

P(X(t + s) ≤ a | X(s) = x, X(u), 0 ≤ u < s) = P(X(t + s) − X(s) ≤ a − x | X(s) = x, X(u), 0 ≤ u < s)
 = P(X(t + s) − X(s) ≤ a − x)   (independence)
 = P(X(t + s) ≤ a | X(s) = x).

So this tells us that Brownian motion satisfies the Markov property (we showed earlier that a simple random walk satisfies the Markov property).
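The scaling construction can be simulated directly: take steps of size √Δt (so c = 1) every Δt and check that the end point at time t = 1 behaves like N(0, 1). A minimal sketch:

```python
import random
random.seed(0)

t, dt = 1.0, 0.01
n = int(t / dt)       # number of steps
dx = dt ** 0.5        # step size: dx = c * sqrt(dt) with c = 1

def walk_endpoint():
    """Position at time t of the scaled symmetric random walk."""
    return sum(dx * random.choice((-1, 1)) for _ in range(n))

ends = [walk_endpoint() for _ in range(5000)]
mean = sum(ends) / len(ends)
var = sum(x * x for x in ends) / len(ends)
print(mean, var)   # X(t) ~ N(0, t): mean near 0, variance near t = 1
```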
Let X(t) be standard Brownian motion; then X(t) ~ N(0, t). So, the density of X(t) is

f_t(x) = (1/√(2πt)) e^{−x²/(2t)}.

Since Brownian motion has stationary and independent increments we can write down the joint distribution of X(t_1), X(t_2), . . . , X(t_n). This is:

f(x_1, . . . , x_n) = f_{t_1}(x_1) f_{t_2−t_1}(x_2 − x_1) f_{t_3−t_2}(x_3 − x_2) . . . f_{t_n−t_{n−1}}(x_n − x_{n−1}).

For example, the conditional distribution of X(s) given that X(t) = B, where s < t, is

f_{s|t}(x|B) = f_{s,t}(x, B) / f_t(B)
 = f_s(x) f_{t−s}(B − x) / f_t(B)
 = [ (1/√(2πs)) e^{−x²/(2s)} · (1/√(2π(t−s))) e^{−(B−x)²/(2(t−s))} ] / [ (1/√(2πt)) e^{−B²/(2t)} ]
 = (1/√(2π s(t−s)/t)) exp( −(1/2)[ x²/s + (B−x)²/(t−s) − B²/t ] ).

Completing the square in x, the exponent becomes

−(x − Bs/t)² / ( 2 s(t−s)/t ),

so

X(s) | X(t) = B ~ N( Bs/t , s(t−s)/t ).

Interestingly, the variance here does not depend on B. If we set λ = s/t, then since s < t, 0 < λ < 1, the mean is λB and the variance is λ(1 − λ)t.

When we consider the process only between 0 and 1, conditional on X(1) = 0, this new process is known as the Brownian bridge:

X(0) = 0 , X(1) = 0.

DIAGRAM TO BE FINISHED

This is used in the analysis of empirical distribution functions.
5.2 Gaussian Processes

A random vector X = (X_1, . . . , X_n) is multivariate normal with density

f_X(x) = ( 1 / ( (2π)^{n/2} |Σ|^{1/2} ) ) exp( −(1/2) (x − μ)^T Σ^{−1} (x − μ) ),

where Σ is an n × n covariance matrix and μ = (μ_1, . . . , μ_n) is the mean vector. The quadratic form (x − μ)^T Σ^{−1} (x − μ) is the Mahalanobis distance.

Example:
If X_1, . . . , X_n ~ N(μ, σ²) iid, then

μ = (μ, . . . , μ) , Σ = diag(σ², . . . , σ²),

so |Σ| = (σ²)^n and Σ^{−1} = diag(1/σ², . . . , 1/σ²). The Mahalanobis distance is

(x − μ)^T Σ^{−1} (x − μ) = (1/σ²) (x − μ)^T I (x − μ) = (1/σ²) Σ_{i=1}^{n} (x_i − μ)²,

so

f_X(x) = (2πσ²)^{−n/2} exp( −(1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)² ) = Π_{i=1}^{n} f(x_i),

the product of the individual N(μ, σ²) densities, as expected for independent variables.
End Example

Recall that the joint density function of X(t_1), . . . , X(t_n) for Brownian motion was

f(x_1, . . . , x_n) = f_{t_1}(x_1) f_{t_2−t_1}(x_2 − x_1) f_{t_3−t_2}(x_3 − x_2) . . . f_{t_n−t_{n−1}}(x_n − x_{n−1}).
5.3

1. X(0) = 0;
2. {X(t), t ≥ 0} has stationary and independent increments.

DIAGRAM TO BE FINISHED

It can be written as

X(t) = μt + σW(t)

where W(t) is a standard Brownian motion; this is Brownian motion with drift μ.

5.4 Finance Applications

Alas, no time.
6

6.1
Given a random sample x = (x_1, . . . , x_n) from a density f(·|θ), the likelihood function is

L(x|θ) = f(x_1|θ) f(x_2|θ) . . . f(x_n|θ) = Π_{i=1}^{n} f(x_i|θ).

This can be thought of as the probability of observing the given random sample with parameters θ.

Example:
Suppose that the time to failure of a vital component in an electronic device is exponentially distributed. A sample of n failure times is x = (x_1, . . . , x_n). The likelihood function is:

L(x|λ) = Π_{i=1}^{n} λ e^{−λ x_i} = λ^n e^{−λ Σ_{i=1}^{n} x_i}.

End Example

Maximum likelihood proceeds by maximising the likelihood with respect to the unknown parameter θ. Usually, we work with the log-likelihood

log L(x|θ) = log [ Π_{i=1}^{n} f(x_i|θ) ] = Σ_{i=1}^{n} log f(x_i|θ).

Then take the gradient of the log-likelihood and set this equal to zero:

∇_θ log L(x|θ) = 0.

The value of θ which satisfies this is the maximum likelihood estimate.

Example:
For the exponential sample above, log L(x|λ) = n log λ − λ Σ_i x_i, so

(d/dλ) log L(x|λ) = n/λ − Σ_{i=1}^{n} x_i = 0  ⟹  λ̂ = n / Σ_{i=1}^{n} x_i = 1/x̄.

End Example
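The exponential MLE can be checked numerically: simulate failure times, form λ̂ = 1/x̄, and confirm it beats nearby values of λ on the log-likelihood. A sketch with a hypothetical true rate of 2.5:

```python
import math
import random
random.seed(42)

true_rate = 2.5   # hypothetical rate used to simulate data
x = [random.expovariate(true_rate) for _ in range(10000)]
lam_hat = len(x) / sum(x)   # MLE: n / sum(x_i) = 1 / xbar

def loglik(lam):
    """Exponential log-likelihood: n log(lam) - lam * sum(x_i)."""
    return len(x) * math.log(lam) - lam * sum(x)

# the stationarity condition: lam_hat beats nearby candidate values
assert loglik(lam_hat) > loglik(lam_hat * 1.01)
assert loglik(lam_hat) > loglik(lam_hat * 0.99)
print(lam_hat)   # close to the true rate 2.5
```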
Example:
Assume X_1, . . . , X_n ~ Bernoulli(p). What is the MLE of p? The probability function is

f(x|p) = p^x (1 − p)^{1−x} , x ∈ {0, 1},

so

L(x|p) = Π_{i=1}^{n} p^{x_i} (1 − p)^{1−x_i} = p^{Σ_i x_i} (1 − p)^{n − Σ_i x_i}

and

log L(x|p) = ( Σ_i x_i ) log p + ( n − Σ_i x_i ) log(1 − p).

Differentiating and setting to zero,

(d/dp) log L(x|p) = ( Σ_i x_i )/p − ( n − Σ_i x_i )/(1 − p) = 0
 ⟹ (1 − p) Σ_i x_i = p ( n − Σ_i x_i )
 ⟹ Σ_i x_i = p n

so

p̂ = (1/n) Σ_{i=1}^{n} x_i = x̄.

End Example
Example:
X_1, . . . , X_n ~ N(μ, σ²). The density is

f(x; μ, σ²) = (2πσ²)^{−1/2} exp( −(x − μ)²/(2σ²) ),

so

L(x; μ, σ²) = Π_{i=1}^{n} (2πσ²)^{−1/2} exp( −(x_i − μ)²/(2σ²) ) = (2πσ²)^{−n/2} exp( −(1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)² )

and

log L(x; μ, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (x_i − μ)².

MLE for μ:

(d/dμ) log L(x; μ, σ²) = (1/σ²) Σ_{i=1}^{n} (x_i − μ) = 0
 ⟹ Σ_{i=1}^{n} x_i − nμ = 0
 ⟹ μ̂ = (1/n) Σ_{i=1}^{n} x_i = x̄.

MLE for σ²:

(d/dσ²) log L(x; μ, σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (x_i − μ)² = 0
 ⟹ σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ̂)² = (1/n) Σ_{i=1}^{n} (x_i − x̄)².

Writing s² = (1/n) Σ_{i=1}^{n} (x_i − x̄)², note that E(s²) = ((n − 1)/n) σ², so the MLE of σ² is biased.

End Example

Example:
Let X_1, . . . , X_n ~ Gamma(α, β). What are the MLEs of α and β?
Here θ = (α, β) and

f(x|θ) = ( β^α / Γ(α) ) x^{α−1} e^{−βx},

so

L(x|θ) = Π_{i=1}^{n} ( β^α / Γ(α) ) x_i^{α−1} e^{−βx_i} = ( β^{nα} / [Γ(α)]^n ) [ Π_{i=1}^{n} x_i ]^{α−1} e^{−β Σ_i x_i}

and

log L(x|θ) = nα log β − n log Γ(α) + (α − 1) Σ_i log x_i − β Σ_i x_i.

ML for β:

(∂/∂β) log L(x|θ) = nα/β − Σ_i x_i = 0  ⟹  β̂ = nα̂ / Σ_i x_i.

ML for α:

(∂/∂α) log L(x|θ) = n log β − n Γ′(α)/Γ(α) + Σ_i log x_i = 0,

so α̂ is the solution of

n log β̂ + Σ_i log x_i = n Γ′(α̂)/Γ(α̂).

There is no closed form solution for the MLEs; use numerical methods to solve for α̂, β̂. This can be done quite easily using R's optim.

End Example
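As a rough stand-in for the optim approach the notes mention, the gamma MLE can be found by profiling out β (setting β = nα/Σx_i for each candidate α) and maximising over α with a simple grid search; the true parameter values below are hypothetical.

```python
import math
import random
random.seed(7)

alpha0, beta0 = 3.0, 1.5   # hypothetical true shape and rate
# random.gammavariate takes a SCALE parameter, so pass 1/rate
x = [random.gammavariate(alpha0, 1 / beta0) for _ in range(20000)]
n, sx, slx = len(x), sum(x), sum(math.log(v) for v in x)

def profile_loglik(a):
    """Gamma log-likelihood with beta profiled out as beta = n*a/sum(x)."""
    b = a * n / sx
    return n * a * math.log(b) - n * math.lgamma(a) + (a - 1) * slx - b * sx

alphas = [0.5 + 0.01 * i for i in range(600)]
a_hat = max(alphas, key=profile_loglik)
b_hat = a_hat * n / sx
print(a_hat, b_hat)   # near the true (3.0, 1.5)
```

In practice one would refine the grid search with a proper optimiser (Newton's method on the digamma equation, or a quasi-Newton routine), but the profiled objective is the same.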
6.2 Prior Distributions

In finding the maximum likelihood estimates in the previous section, only the observed sample values $x_1, \dots, x_n$ are used to construct the estimate of $\theta$. ML does not require any other information to estimate $\theta$ beyond the sample values. If we did have some prior information about the possible values $\theta$ may take, such as expert opinion, ML gives no way to incorporate it. In many situations such information will be available. We can use this information to form a prior distribution for $\theta$, and then use the Bayesian approach for estimation. The prior distribution of a parameter $\theta$ is a probability function/density expressing our degree of belief about the value of $\theta$ prior to observing a sample of a random variable $X$ whose distribution function depends on $\theta$. The prior distribution makes use of information available above and beyond what is in the random sample.
Example:
Suppose we have a brand new 50 cent coin and we want to estimate the probability $\theta$ of a head. We know $\theta$ has to lie between 0 and 1. A prior for $\theta$ could be uniform over the interval from 0 to 1:

\[
\pi(\theta) = \begin{cases} 1 & \theta \in (0, 1) \\ 0 & \text{otherwise} \end{cases}
\]

This corresponds to an assumption of total ignorance; we feel that all values of $\theta$ are equally likely. On the other hand, one may feel justified in assuming a priori $\theta \in (.4, .6)$ since the coin appears quite symmetric. Then the following prior corresponds to a belief that any value in $(.4, .6)$ is equally likely:

\[
\pi(\theta) = \begin{cases} 5 & \theta \in (.4, .6) \\ 0 & \text{otherwise} \end{cases}
\]
Finally, we may only allow the values .4, .5, .6, with .5 twice as likely, giving the discrete prior

\[
\pi(.4) = \tfrac{1}{4}, \qquad \pi(.5) = \tfrac{1}{2}, \qquad \pi(.6) = \tfrac{1}{4} .
\]

Note in this example that the priors are different and depend on the assumptions we are willing to make regarding the unknown $\theta$. Often these assumptions will be informed using expert opinion on the problem.
End Example
$\pi(\theta) \to$ prior beliefs about where $\theta$ may lie in the parameter space $\Theta$.

$X \sim \text{Normal}(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2) \implies \Theta = \mathbb{R} \times \mathbb{R}^+$

$X \sim \text{Bernoulli}(\theta) \implies \Theta = (0, 1)$
6.3 Posterior Distributions

Having observed a sample $x = (x_1, \dots, x_n)$ we can write down the likelihood for $x$ given the value of $\theta$:

\[
\text{Likelihood} = \ell(x|\theta) = \prod_{i=1}^n f(x_i|\theta)
\]

The joint density of the sample and the parameter is

\[
\pi(x, \theta) = \ell(x|\theta)\, \pi(\theta) ,
\]

i.e. the product of the likelihood and the prior. Then the marginal density of the sample values, which is independent of $\theta$, is given by the integral of the joint density over the space $\Theta$. Thus,

\[
\pi(x) = \int_\Theta \pi(x, \theta)\, d\theta = \int_\Theta \ell(x|\theta)\, \pi(\theta)\, d\theta .
\]
If $Y \sim \text{Gamma}(\alpha, \beta)$ with rate $\beta$ (where scale $= 1/\text{rate}$), then $1/Y$ has an inverse gamma distribution, $1/Y \sim \text{Inv-Gamma}(\alpha, \beta)$:

\[
F_{1/Y}(t) = P\left( \frac{1}{Y} \le t \right) = P\left( Y \ge \frac{1}{t} \right) = 1 - F_Y\left( \frac{1}{t} \right)
\]

\[
f_{1/Y}(t) = \frac{d}{dt} F_{1/Y}(t) = \frac{d}{dt}\left[ 1 - F_Y\left( \frac{1}{t} \right) \right] = -f_Y\left( \frac{1}{t} \right)\left( -\frac{1}{t^2} \right) = \frac{1}{t^2}\, f_Y\left( \frac{1}{t} \right)
\]

so that

\[
f_{1/Y}(t) = \frac{1}{t^2} \cdot \frac{\beta^\alpha}{\Gamma(\alpha)} \left( \frac{1}{t} \right)^{\alpha - 1} e^{-\beta/t}
= \frac{\beta^\alpha}{\Gamma(\alpha)}\, t^{-(\alpha + 1)}\, e^{-\beta/t} .
\]
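As a check on the derived density, the sketch below (parameter values are illustrative) verifies numerically that it integrates to 1 and has the known mean $\beta/(\alpha - 1)$, and compares against simulated values of $1/Y$:

```python
import math, random

def inv_gamma_pdf(t, a, b):
    # the density just derived for 1/Y when Y ~ Gamma(a, rate b)
    return (b ** a / math.gamma(a)) * t ** (-(a + 1)) * math.exp(-b / t)

a, b = 3.0, 2.0

# numerical check: the density integrates to ~1 and has mean b/(a-1) = 1
dt = 0.001
total = mean = 0.0
for i in range(1, 200_000):
    t = dt * (i + 0.5)
    p = inv_gamma_pdf(t, a, b) * dt
    total += p
    mean += t * p

# simulation check: 1/Y for Y ~ Gamma(a, rate b) (gammavariate takes scale 1/b)
random.seed(2)
sim = [1.0 / random.gammavariate(a, 1.0 / b) for _ in range(20_000)]
sim_mean = sum(sim) / len(sim)
```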
Example:
$X_1, \dots, X_n \sim N(\mu, \sigma^2)$ with an $\text{Inv-Gamma}(\alpha, \beta)$ prior on $\sigma^2$:

\[
\ell(x|\mu, \sigma^2) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right) \quad \text{(independence)}
\]

\[
\pi(\sigma^2) = \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma^2)^{-(\alpha + 1)} \exp\left( -\frac{\beta}{\sigma^2} \right)
\]

The posterior satisfies

\[
\pi(\sigma^2 | \mu, x) \propto \ell(x|\mu, \sigma^2)\, \pi(\sigma^2)
\propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right) (\sigma^2)^{-(\alpha + 1)} \exp\left( -\frac{\beta}{\sigma^2} \right)
\]

\[
\propto (\sigma^2)^{-\left( \frac{n}{2} + \alpha + 1 \right)} \exp\left( -\frac{1}{\sigma^2} \left[ \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^2 + \beta \right] \right) ,
\]

which is the kernel of an $\text{Inv-Gamma}\left( \alpha + \frac{n}{2},\; \beta + \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^2 \right)$ density. Expanding $\sum_i (x_i - \mu)^2 = \sum_i x_i^2 - 2\mu \sum_i x_i + n\mu^2$ and completing the square in $\mu$ gives the corresponding conditional for $\mu$ under a normal prior.
End Example
Computing a marginal likelihood $\pi(x)$: only possible in the simplest of cases/models.

$X_1, \dots, X_n \sim N(\mu, \sigma^2)$, $\sigma^2$ known, with a prior for $\mu$ which is $N(\nu, \tau^2)$.
\[
\ell(x|\mu, \sigma^2) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)
= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right)
\]

\[
\pi(\mu) = (2\pi\tau^2)^{-1/2} \exp\left( -\frac{(\mu - \nu)^2}{2\tau^2} \right)
\]

\[
\pi(x) = \int_{-\infty}^{+\infty} \ell(x|\mu, \sigma^2)\, \pi(\mu)\, d\mu
= C \int_{-\infty}^{+\infty} \exp\left( -\frac{1}{2\sigma^2} \sum_i (x_i - \mu)^2 - \frac{(\mu - \nu)^2}{2\tau^2} \right) d\mu
\]

where $C = (2\pi\sigma^2)^{-n/2} (2\pi\tau^2)^{-1/2}$ collects the constants. Expanding the squares and collecting terms in $\mu$,

\[
-\frac{1}{2\sigma^2} \sum_i (x_i - \mu)^2 - \frac{(\mu - \nu)^2}{2\tau^2}
= -\frac{1}{2}\left[ \mu^2 \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right) - 2\mu \left( \frac{\sum_i x_i}{\sigma^2} + \frac{\nu}{\tau^2} \right) + \frac{\sum_i x_i^2}{\sigma^2} + \frac{\nu^2}{\tau^2} \right] .
\]

Completing the square in $\mu$, this equals

\[
-\frac{1}{2} \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right) \left[ \mu - \frac{ \frac{\sum_i x_i}{\sigma^2} + \frac{\nu}{\tau^2} }{ \frac{n}{\sigma^2} + \frac{1}{\tau^2} } \right]^2
+ \frac{1}{2} \frac{ \left( \frac{\sum_i x_i}{\sigma^2} + \frac{\nu}{\tau^2} \right)^2 }{ \frac{n}{\sigma^2} + \frac{1}{\tau^2} }
- \frac{1}{2} \left( \frac{\sum_i x_i^2}{\sigma^2} + \frac{\nu^2}{\tau^2} \right) .
\]

The first term is the kernel of a normal density in $\mu$ with $\text{Var} = \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1}$, so integrating over $\mu$ contributes a factor $\sqrt{2\pi}\, \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1/2}$ (the normalised density integrates to 1). Thus

\[
\pi(x) = C \sqrt{2\pi} \left( \frac{n}{\sigma^2} + \frac{1}{\tau^2} \right)^{-1/2}
\exp\left( \frac{1}{2} \frac{ \left( \frac{\sum_i x_i}{\sigma^2} + \frac{\nu}{\tau^2} \right)^2 }{ \frac{n}{\sigma^2} + \frac{1}{\tau^2} } - \frac{1}{2} \left( \frac{\sum_i x_i^2}{\sigma^2} + \frac{\nu^2}{\tau^2} \right) \right) .
\]
6.4 Posterior Summaries

There are many quantities of interest that we may want to get from a Bayesian analysis. For example, the mean of the posterior distribution is a widely used Bayesian estimator. The mode of the posterior is called the maximum a posteriori (MAP) estimate of $\theta$.
If $\theta$ is of dimension $p$, $\theta = (\theta_1, \dots, \theta_p)$, we may be interested in the marginal density of $\theta_j$:

\[
\pi(\theta_j | x) = \int \pi(\theta|x)\, d\theta_{-j}, \quad j = 1, \dots, p ,
\]

where $\theta_{-j}$ denotes all components of $\theta$ except $\theta_j$. Similarly, the posterior mean is

\[
E_{\theta|x}[\theta] = \int \theta\, \pi(\theta|x)\, d\theta = \int \theta\, \frac{\ell(x|\theta)\, \pi(\theta)}{\pi(x)}\, d\theta .
\]

This calculation requires knowing $\pi(x)$, which will be intractable in most cases. This is a big problem! We will face these integrals in each problem we look at.
What if we could simulate values of $\theta$, say $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(N)}$, from $\pi(\theta|x)$? Instead of doing these integrals analytically, we could approximate them numerically:

\[
E_{\theta|x}[\theta] = \int \theta\, \pi(\theta|x)\, d\theta \approx \frac{1}{N} \sum_{k=1}^N \theta^{(k)}
\]

In fact we could use the same approach to approximate the posterior expectation of any function $g(\theta)$ of $\theta$:

\[
E_{\theta|x}[g(\theta)] = \int g(\theta)\, \pi(\theta|x)\, d\theta \approx \frac{1}{N} \sum_{k=1}^N g(\theta^{(k)})
\]

The main idea of Markov chain Monte Carlo is to approximately generate samples from the posterior $\pi(\theta|x)$, and then use these to approximate integrals.
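For the coin example, a uniform prior with $s$ heads in $n$ tosses gives a $\text{Beta}(s+1, n-s+1)$ posterior (standard conjugacy). The sketch below (illustrative numbers, my own seed) samples that posterior directly in place of MCMC output to approximate posterior expectations:

```python
import random

# Monte Carlo approximation of posterior expectations: draws stand in for
# MCMC samples theta^(1), ..., theta^(N) from pi(theta | x).
random.seed(3)
n, s = 20, 14
N = 200_000
draws = [random.betavariate(s + 1, n - s + 1) for _ in range(N)]

mc_mean = sum(draws) / N                    # approximates E[theta | x]
exact_mean = (s + 1) / (n + 2)              # closed-form check: 15/22
# same approach for any g(theta): here the posterior variance
mc_var = sum((d - mc_mean) ** 2 for d in draws) / N
```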
6.5 Markov Chain Monte Carlo

The key idea of MCMC is simple. We want to generate samples from $\pi(\theta|x)$ but we can't do this directly. However, suppose we can construct a Markov chain (through its transition probabilities) with state space $\Theta$ (all values of $\theta$) which is straightforward to simulate from, and which has stable (stationary) distribution the posterior $\pi(\theta|x)$:

\[
\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \theta^{(3)}, \dots, \theta^{(t)}, \dots, \theta^{(N)}
\]
6.6 The Gibbs Sampler

The Gibbs sampler works with the full conditional distributions

\[
\pi(\theta_j | \theta_{-j}, x), \quad j = 1, \dots, p ,
\]

where $\theta_{-j} = \{\theta_i : i \ne j\}$. These are densities of the individual components given the data and the specified values of the other components of $\theta$. They can typically be recognised as standard densities, e.g. normal, gamma, etc., in $\theta_j$.

Starting from an initial value $\theta^{(0)} = (\theta_1^{(0)}, \dots, \theta_p^{(0)})$, the first iteration draws, in turn,

\[
\theta_1^{(1)} \text{ from } \pi(\theta_1 | \theta_2^{(0)}, \dots, \theta_p^{(0)}, x)
\]
\[
\theta_2^{(1)} \text{ from } \pi(\theta_2 | \theta_1^{(1)}, \theta_3^{(0)}, \dots, \theta_p^{(0)}, x)
\]
\[
\vdots
\]
\[
\theta_p^{(1)} \text{ from } \pi(\theta_p | \theta_1^{(1)}, \dots, \theta_{p-1}^{(1)}, x)
\]

Now suppose this procedure is continued through $t$ iterations. The resulting sampled vector $\theta^{(t)} = (\theta_1^{(t)}, \dots, \theta_p^{(t)})$ is a realisation of a Markov chain with transition probabilities

\[
p(\theta^{(t)}, \theta^{(t+1)}) = \prod_{j=1}^p \pi\left( \theta_j^{(t+1)} \,\middle|\, \theta_\ell^{(t+1)},\, \ell < j;\ \theta_\ell^{(t)},\, \ell > j;\ x \right) .
\]
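As a minimal sketch of this scheme (data, priors and hyperparameter values are illustrative, not from the notes), a two-component Gibbs sampler for normal data with unknown mean and variance, using the normal and inverse gamma full conditionals derived earlier:

```python
import math, random

# Gibbs sampler for X_i ~ N(mu, s2) with semi-conjugate priors
# mu ~ N(nu, tau2) and s2 ~ Inv-Gamma(alpha, beta).
random.seed(4)
data = [random.gauss(5.0, 2.0) for _ in range(200)]
n, xbar = len(data), sum(data) / len(data)
nu, tau2, alpha, beta = 0.0, 100.0, 2.0, 2.0   # illustrative hyperparameters

mu, s2 = 0.0, 1.0
mus, s2s = [], []
for t in range(3000):
    # mu | s2, x ~ Normal with precision n/s2 + 1/tau2
    prec = n / s2 + 1.0 / tau2
    m = (n * xbar / s2 + nu / tau2) / prec
    mu = random.gauss(m, math.sqrt(1.0 / prec))
    # s2 | mu, x ~ Inv-Gamma(alpha + n/2, beta + 0.5 * sum((x - mu)^2))
    shape = alpha + n / 2.0
    rate = beta + 0.5 * sum((x - mu) ** 2 for x in data)
    s2 = rate / random.gammavariate(shape, 1.0)   # rate/Gamma(shape,1) is Inv-Gamma
    if t >= 500:                                   # discard burn-in
        mus.append(mu)
        s2s.append(s2)

post_mean_mu = sum(mus) / len(mus)
post_mean_s2 = sum(s2s) / len(s2s)
```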
Example:
A popular application of the Gibbs sampler is in finite mixture models used for model-based clustering. In R, see the package mclust (Raftery).
The mixture density with $G$ components is

\[
f(x) = \sum_{g=1}^G w_g\, f(x | \mu_g, \sigma_g^2), \qquad \sum_{g=1}^G w_g = 1 ,
\]

where $f(x | \mu_g, \sigma_g^2)$ is the $N(\mu_g, \sigma_g^2)$ density. The likelihood for a sample $x_1, \dots, x_n$ is

\[
\ell(x|\theta) = \prod_{i=1}^n \sum_{g=1}^G w_g\, f(x_i | \mu_g, \sigma_g^2) .
\]
The likelihood is very difficult to work with. Thus we usually complete the data with component labels $z = (z_1, \dots, z_n)$, which tell us which component each observation belongs to: if $z_i = g$, then $x_i$ arises from a $N(\mu_g, \sigma_g^2)$. Of course the labels give the clustering of the data (compare K-means), but they can't be observed directly. We can include them as unknowns in the Gibbs sampler.
The likelihood of the complete data is:

\[
\ell(x, z|\theta) = \prod_{g=1}^G \prod_{i: z_i = g} w_g\, \frac{1}{\sqrt{2\pi\sigma_g^2}} \exp\left( -\frac{(x_i - \mu_g)^2}{2\sigma_g^2} \right)
= \prod_{g=1}^G w_g^{n_g} (2\pi\sigma_g^2)^{-n_g/2} \exp\left( -\frac{1}{2\sigma_g^2} \sum_{i: z_i = g} (x_i - \mu_g)^2 \right) ,
\]

where $n_g = \#\{i : z_i = g\}$. For the weights $w_1, \dots, w_G$ we assume a symmetric $\text{Dirichlet}(\delta, \dots, \delta)$ prior:

\[
\pi(w_1, \dots, w_G) = \frac{\Gamma(\delta + \delta + \dots + \delta)}{\Gamma(\delta)\Gamma(\delta) \cdots \Gamma(\delta)} \prod_{g=1}^G w_g^{\delta - 1}
= \frac{\Gamma(G\delta)}{[\Gamma(\delta)]^G} \prod_{g=1}^G w_g^{\delta - 1} .
\]
Usually one assumes that the means $\mu_g$ arise from a $N(\nu, \tau^2)$ a priori and independently:

\[
\pi(\mu_1, \dots, \mu_G) = \prod_{g=1}^G \frac{1}{\sqrt{2\pi\tau^2}} \exp\left( -\frac{(\mu_g - \nu)^2}{2\tau^2} \right)
\]

Finally, we'll assume that the variances arise from an inverse gamma distribution, independently:

\[
\pi(\sigma_1^2, \dots, \sigma_G^2) = \prod_{g=1}^G \frac{\beta^\alpha}{\Gamma(\alpha)} (\sigma_g^2)^{-(\alpha + 1)} \exp\left( -\frac{\beta}{\sigma_g^2} \right) .
\]
\[
\pi(\theta, z|x) \propto \ell(x, z|\theta)\, \pi(\theta)
\]
\[
\propto \underbrace{ \prod_{g=1}^G w_g^{n_g} (2\pi\sigma_g^2)^{-n_g/2} \exp\left( -\frac{1}{2\sigma_g^2} \sum_{i: z_i = g} (x_i - \mu_g)^2 \right) }_{\text{likelihood}}
\times \underbrace{ \prod_{g=1}^G w_g^{\delta - 1} }_{\text{prior: weights}}
\]
\[
\times \underbrace{ \prod_{g=1}^G \exp\left( -\frac{1}{2\tau^2} (\mu_g - \nu)^2 \right) }_{\text{prior: means}}
\times \underbrace{ \prod_{g=1}^G (\sigma_g^2)^{-(\alpha + 1)} \exp\left( -\frac{\beta}{\sigma_g^2} \right) }_{\text{prior: variances}}
\]
The next step in implementing a Gibbs sampler for this model is to derive the full conditionals. We want to iteratively sample the labels, weights, means and variances.
P(z + i = k|everything else) / wk (2
/
wk
k
exp
2
k)
1
2
exp
1
2
k
(xi
1
2
2
k
(xi
k )
k )
G
Y
wgng +
g=1
"
2
P
#!
1
ng
1
i:zi =g xi
2
/ exp
+ 2 g 2
+ 2 g
2
2
2
g
g
"
0 n
1
P
g
#2
1
xi
+
+
2
i:zi =g g2
2
2
g
A
/ exp @
g
ng
1
2
2 + 2
g
57
i:zi =g
ng
2
g
2
g
Variances full conditional:

\[
\pi(\sigma_g^2 \,|\, \text{everything else}) \propto (\sigma_g^2)^{-\left( \frac{n_g}{2} + \alpha + 1 \right)} \exp\left( -\frac{1}{\sigma_g^2} \left[ \frac{1}{2} \sum_{i: z_i = g} (x_i - \mu_g)^2 + \beta \right] \right) ,
\]

which is $\text{Inv-Gamma}\left( \frac{n_g}{2} + \alpha,\; \frac{1}{2} \sum_{i: z_i = g} (x_i - \mu_g)^2 + \beta \right)$.
End Example
6.7 The Metropolis-Hastings Algorithm

This algorithm constructs a Markov chain $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(t)}, \dots$ by defining the transition probability from $\theta^{(t)}$ to $\theta^{(t+1)}$ as follows.

Let $q(\theta, \theta')$ denote a proposal distribution such that if $\theta = \theta^{(t)}$, then $\theta'$ is a proposed next value for the chain, i.e. $\theta'$ is a proposed value for $\theta^{(t+1)}$. However, a further randomization then takes place: with some probability $\alpha(\theta, \theta')$ we actually accept $\theta^{(t+1)} = \theta'$, and otherwise we set $\theta^{(t+1)} = \theta^{(t)}$. This construction defines a Markov chain with transition probabilities given by

\[
p(\theta, \theta') = q(\theta, \theta')\, \alpha(\theta, \theta') + I(\theta' = \theta) \left[ 1 - \int q(\theta, \theta'')\, \alpha(\theta, \theta'')\, d\theta'' \right]
\]

where $I(\cdot)$ is an indicator function. If we now set

\[
\alpha(\theta, \theta') = \min\left\{ 1,\ \frac{\pi(\theta'|x)\, q(\theta', \theta)}{\pi(\theta|x)\, q(\theta, \theta')} \right\}
\]

then

\[
\pi(\theta|x)\, q(\theta, \theta')\, \alpha(\theta, \theta') = \pi(\theta'|x)\, q(\theta', \theta)\, \alpha(\theta', \theta) .
\]

This is called the detailed balance condition, and it is a sufficient condition to ensure that $\pi(\theta|x)$ is the stable distribution of the chain. Since $\alpha$ involves the posterior only through a ratio, the normalising constant $\pi(x)$ cancels; thus, we only require the functional form of the posterior.

In practice we generally assume that $q(\theta, \theta')$ is a normal distribution, $N(\theta, \sigma^2_{\text{prop}} I)$, where $I$ is the identity matrix. The behaviour of the chain will depend on the value of $\sigma^2_{\text{prop}}$; generally we tune this to give an acceptance rate of 25-40%.
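A minimal random-walk Metropolis sketch (the standard-normal target and the tuning value are illustrative): because the normal proposal is symmetric, the ratio $q(\theta', \theta) / q(\theta, \theta')$ cancels and only the unnormalised target is needed:

```python
import math, random

def log_target(theta):
    # unnormalised log "posterior": standard normal for illustration;
    # the normalising constant cancels in the acceptance ratio
    return -0.5 * theta ** 2

random.seed(6)
sigma_prop = 3.0     # proposal sd, chosen by trial toward the 25-40% band
theta, chain, accepts = 0.0, [], 0
for _ in range(50_000):
    prop = random.gauss(theta, sigma_prop)        # symmetric q: q-ratio = 1
    log_alpha = log_target(prop) - log_target(theta)
    if random.random() < math.exp(min(0.0, log_alpha)):  # accept w.p. min(1, ratio)
        theta = prop
        accepts += 1
    chain.append(theta)        # on rejection the chain repeats the old value

acc_rate = accepts / len(chain)
post_mean = sum(chain) / len(chain)
```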
Spatial Processes
A Tutorials

A.1 Tutorial 1
Problems 1

1. Two urns with five balls in each; 5 white and 5 black in total. Let $X_n$ be the number of black balls in the left urn. At each step, draw one ball at random from each urn and swap them. With $X_n = i$, the left urn holds $i$ black and $5-i$ white balls, and the right urn holds $5-i$ black and $i$ white balls, so

\[
P(X_{n+1} = i+1 | X_n = i) = \frac{5-i}{5} \cdot \frac{5-i}{5} = \frac{(5-i)^2}{25} \quad \text{(white from left, black from right)}
\]
\[
P(X_{n+1} = i-1 | X_n = i) = \frac{i}{5} \cdot \frac{i}{5} = \frac{i^2}{25} \quad \text{(black from left, white from right)}
\]
\[
P(X_{n+1} = i | X_n = i) = \frac{i}{5} \cdot \frac{5-i}{5} + \frac{5-i}{5} \cdot \frac{i}{5} = \frac{2i(5-i)}{25} \quad \text{(take two of the same colour from the urns)}
\]

Now suppose the draws are made in the same way but with $m$ balls in each urn and $b$ of the $2m$ balls black. Let $X_n$ be the number of black balls in the left urn, so the left urn holds $i$ black and $m-i$ white balls, and the right urn holds $b-i$ black and $m-b+i$ white balls. Then

\[
P(X_{n+1} = i+1 | X_n = i) = \frac{m-i}{m} \cdot \frac{b-i}{m} \quad \text{(draw white from left, black from right)}
\]

Exercise: find $P(X_{n+1} = i-1 | X_n = i)$.
2. (Gambler's ruin, $N = 4$.) $p(i, i+1) = .4$ and $p(i, i-1) = .6$ for $i = 1, 2, 3$. Stop if $i$ reaches 4 ($p(4, 4) = 1$) or if $i$ reaches 0 ($p(0, 0) = 1$).

Since the games are independent, the only three-step path from 1 to 4 is $1 \to 2 \to 3 \to 4$ (three consecutive wins), so

\[
p^3(1, 4) = (.4)^3 = .064 .
\]

(All other three-step paths from 1, such as $1 \to 0$ with absorption, $1 \to 2 \to 1 \to 0$ or $1 \to 2 \to 1 \to 2$, fail to reach 4.)

Alternative method: with the states ordered $0, 1, 2, 3, 4$,

\[
P = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
.6 & 0 & .4 & 0 & 0 \\
0 & .6 & 0 & .4 & 0 \\
0 & 0 & .6 & 0 & .4 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

Compute $P^3$, the matrix of 3-step transition probabilities, and simply read off the required values.
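The alternative method can be carried out directly (pure-Python matrix multiply, for brevity):

```python
# Three-step transition probabilities for the gambler's ruin chain above,
# states ordered 0, 1, 2, 3, 4, via P^3.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.6, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P3 = matmul(matmul(P, P), P)
# p^3(1, 4) = P3[1][4]; the only route is 1 -> 2 -> 3 -> 4, prob .4^3 = .064
```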
3. General two-state chain; state space $S = \{1, 2\}$,

\[
P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}
\]

Use the Markov property to show that

\[
P(X_{n+1} = 1) - \frac{b}{a+b} = (1 - a - b)\left[ P(X_n = 1) - \frac{b}{a+b} \right] .
\]
Now,

\[
P(X_{n+1} = 1) = P(X_n = 1)\, P(X_{n+1} = 1 | X_n = 1) + P(X_n = 2)\, P(X_{n+1} = 1 | X_n = 2)
\]
\[
= P(X_n = 1)(1 - a) + P(X_n = 2)\, b
= P(X_n = 1)(1 - a) + (1 - P(X_n = 1))\, b
= (1 - a - b)\, P(X_n = 1) + b .
\]

Hence

\[
P(X_{n+1} = 1) - \frac{b}{a+b} = (1 - a - b)\, P(X_n = 1) + b - \frac{b}{a+b}
= (1 - a - b)\, P(X_n = 1) + \frac{b(a+b) - b}{a+b}
\]
\[
= (1 - a - b)\left[ P(X_n = 1) - \frac{b}{a+b} \right] .
\]

And hence, iterating,

\[
P(X_n = 1) = \frac{b}{a+b} + (1 - a - b)\left[ P(X_{n-1} = 1) - \frac{b}{a+b} \right]
= \dots
= \frac{b}{a+b} + (1 - a - b)^n \left[ P(X_0 = 1) - \frac{b}{a+b} \right] ,
\]

so if $|1 - a - b| < 1$,

\[
\lim_{n \to \infty} P(X_n = 1) = \frac{b}{a+b} + \lim_{n \to \infty} (1 - a - b)^n \left[ P(X_0 = 1) - \frac{b}{a+b} \right] = \frac{b}{a+b} .
\]
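The recursion can be iterated numerically to see the convergence (the values of $a$ and $b$ below are illustrative):

```python
# Iterate P(X_{n+1} = 1) = (1 - a - b) * P(X_n = 1) + b and check
# convergence to the limit b / (a + b).
a, b = 0.3, 0.2
p = 1.0                      # P(X_0 = 1)
for _ in range(200):
    p = (1 - a - b) * p + b
# after 200 steps p is essentially b / (a + b) = 0.4
```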
A.2 Tutorial 2
Problems 2

1. $\pi_1 = \left( \frac{11}{47}, \frac{19}{47}, \frac{17}{47} \right)$; $\pi_2 = \left( \frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right)$.

2. $\pi_1 = (.4, .6)$; $\pi_2 = \left( \frac{6}{35}, \frac{7}{35}, \frac{22}{35} \right)$.
3. Let $T_1, T_2, T_3$ be independent exponential random variables with rates $\lambda_1, \lambda_2, \lambda_3$, and set $U = \min(T_1, T_3)$, $V = \min(T_2, T_3)$.

Part 1: Since $\{U > s, V > t\} = \{T_1 > s, T_2 > t, T_3 > \max(t, s)\}$,

\[
P(U > s, V > t) = e^{-\lambda_1 s}\, e^{-\lambda_2 t}\, e^{-\lambda_3 \max(t, s)} .
\]

Part 2: If $U$ and $V$ were independent then, since $U \sim \exp(\lambda_1 + \lambda_3)$ and $V \sim \exp(\lambda_2 + \lambda_3)$, we would have

\[
P(U > s, V > t) = P(U > s)\, P(V > t)
= \int_s^\infty (\lambda_1 + \lambda_3)\, e^{-(\lambda_1 + \lambda_3) s_1}\, ds_1 \int_t^\infty (\lambda_2 + \lambda_3)\, e^{-(\lambda_2 + \lambda_3) t_1}\, dt_1
\]
\[
= e^{-(\lambda_1 + \lambda_3) s}\, e^{-(\lambda_2 + \lambda_3) t}
= e^{-\lambda_1 s}\, e^{-\lambda_2 t}\, e^{-\lambda_3 (s + t)} .
\]

Since $e^{-\lambda_3 (s + t)} \ne e^{-\lambda_3 \max(t, s)}$ in general, $U$ and $V$ are not independent.
4. (a) Let $T_1, \dots, T_n$ be independent exponential random variables with rates $\lambda_1, \dots, \lambda_n$, and let $T = \min(T_1, \dots, T_n)$. Distribution of $T$: $P(T \le t) = F_T(t)$. Consider $P(T > t)$: if $T > t$, then we must have that each of the $T_j$ is greater than $t$, so

\[
P(T > t) = \prod_{j=1}^n P(T_j > t) = \prod_{j=1}^n e^{-\lambda_j t} = e^{-\left( \sum_{j=1}^n \lambda_j \right) t} ,
\]

i.e. $T$ is exponential with rate $\sum_{j=1}^n \lambda_j$.
(b) Show $P(T_i < T_j) = \dfrac{\lambda_i}{\lambda_i + \lambda_j}$ for $i \ne j$.

\[
P(T_i < T_j) = \int_0^\infty \lambda_i e^{-\lambda_i t}\, P(T_j > t)\, dt
= \int_0^\infty \lambda_i e^{-\lambda_i t}\, e^{-\lambda_j t}\, dt
\]
\[
= \frac{\lambda_i}{\lambda_i + \lambda_j} \int_0^\infty (\lambda_i + \lambda_j)\, e^{-(\lambda_i + \lambda_j) t}\, dt \quad \text{(density of an } \exp(\lambda_i + \lambda_j) \text{, which integrates to 1)}
\]
\[
= \frac{\lambda_i}{\lambda_i + \lambda_j} .
\]
(c) $T_1, \dots, T_n$ exponential with rates $\lambda_1, \dots, \lambda_n$. Show

\[
P(T_i = \min(T_1, \dots, T_n)) = \frac{\lambda_i}{\sum_{j=1}^n \lambda_j} .
\]

\[
P(T_i = \min(T_1, \dots, T_n)) = \int_0^\infty f_{T_i}(t) \prod_{j \ne i} P(T_j > t)\, dt
= \int_0^\infty \lambda_i e^{-\lambda_i t} \prod_{j \ne i} e^{-\lambda_j t}\, dt
\]
\[
= \int_0^\infty \lambda_i\, e^{-\left( \sum_{j=1}^n \lambda_j \right) t}\, dt
= \lambda_i \left[ \frac{ -e^{-\left( \sum_{j=1}^n \lambda_j \right) t} }{ \sum_{j=1}^n \lambda_j } \right]_0^\infty
= \frac{\lambda_i}{\sum_{j=1}^n \lambda_j} .
\]
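Parts (a) and (c) can be checked by simulation (the rates below are illustrative):

```python
import random

# min(T_1,...,T_n) should be exponential with rate sum(lambda_j), and
# P(T_i = min) should equal lambda_i / sum(lambda_j).
random.seed(7)
lams = [1.0, 2.0, 3.0]
N = 100_000
total_min, argmin_counts = 0.0, [0] * len(lams)
for _ in range(N):
    ts = [random.expovariate(l) for l in lams]
    m = min(ts)
    total_min += m
    argmin_counts[ts.index(m)] += 1

mean_min = total_min / N          # expect ~ 1 / (1 + 2 + 3) = 1/6
p_first = argmin_counts[0] / N    # expect ~ lambda_1 / 6 = 1/6
```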
5. $X_1 \sim \text{Poisson}(\lambda_1)$, $X_2 \sim \text{Poisson}(\lambda_2)$, independent. Show $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$.

\[
P(X_1 + X_2 = k) = \sum_{m=0}^k P(X_1 = m)\, P(X_2 = k - m)
= \sum_{m=0}^k \frac{\lambda_1^m e^{-\lambda_1}}{m!} \cdot \frac{\lambda_2^{k-m} e^{-\lambda_2}}{(k-m)!}
\]
\[
= \frac{e^{-(\lambda_1 + \lambda_2)}}{k!} \sum_{m=0}^k \frac{k!}{m!(k-m)!}\, \lambda_1^m \lambda_2^{k-m}
= \frac{e^{-(\lambda_1 + \lambda_2)}}{k!} \sum_{m=0}^k \binom{k}{m} \lambda_1^m \lambda_2^{k-m}
= \frac{e^{-(\lambda_1 + \lambda_2)} (\lambda_1 + \lambda_2)^k}{k!}
\]

by the binomial theorem.

7. Later.
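The convolution result can be checked by simulation (illustrative rates; the inverse-transform Poisson sampler is an assumption of this sketch, not from the notes):

```python
import math, random

def poisson_draw(lam, rng):
    # inverse-transform sampling of a Poisson(lam) variate
    u, k, p = rng.random(), 0, math.exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

rng = random.Random(8)
l1, l2 = 1.5, 2.5
N = 100_000
counts = {}
for _ in range(N):
    s = poisson_draw(l1, rng) + poisson_draw(l2, rng)
    counts[s] = counts.get(s, 0) + 1

lam = l1 + l2
pmf4 = math.exp(-lam) * lam ** 4 / math.factorial(4)  # Poisson(4) pmf at k = 4
emp4 = counts.get(4, 0) / N                           # empirical frequency of sum = 4
```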