
ACTL2003 Stochastic Modelling

By Zhi Ying Feng


Contents
Part 1: Stochastic Processes ................................................................................................................. 4
Stochastic Processes ......................................................................................................................... 4
Increment Properties .................................................................................................................... 4
Markov Processes ............................................................................................................................ 4
Transition Probability .................................................................................................................. 5
Chapman-Kolmogorov Equations................................................................................................ 5
Classification of States ................................................................................................................. 6
Irreducible Markov Chain ............................................................................................................ 6
Recurrent and Transient States .................................................................................................... 6
Class Properties ............................................................................................................................ 8
Limiting Probabilities .................................................................................................................. 8
Mean Time in Transient States .................................................................................................... 9
Branching Processes .................................................................................................................... 9
Probability Generating Functions .............................................................................................. 10
Time Reversible Markov Chains ............................................................................................... 11
Exponential, Poisson and Gamma Distributions........................................................................ 11
Counting Process............................................................................................................................ 12
Poisson Process .......................................................................................................................... 13
Inter-arrival and Waiting Time .................................................................................................. 13
Order Statistics ........................................................................................................................... 13
Conditional Distribution of Arrival Time .................................................................................. 14
Thinning of Poisson Process ...................................................................................................... 14
Non-homogeneous Poisson Process ............................................................................ 15
Compound Poisson Process ....................................................................................................... 15
Continuous-time Markov Chains ................................................................................................... 16
Time Spent in a State ................................................................................................................. 16
Transition Rates and Probabilities ............................................................................................. 17
Chapman-Kolmogorov Equations.............................................................................................. 18
Limiting Probabilities ................................................................................................................ 19
Embedded Markov Chain .......................................................................................................... 19

1
Zhi Ying Feng

Time Reversibility...................................................................................................................... 20
Birth and Death Process ................................................................................................................. 21
Transition Rates and Embedded Markov Chain ........................................................................ 21
Examples of Birth and Death Processes .................................................................................... 21
Expected Time in States ............................................................................................................. 22
Kolmogorov Equations .............................................................................................................. 23
Limiting Probabilities ................................................................................................................ 23
Application: Pure Birth Process ................................................................................................. 24
Application: Simple Sickness Model ......................................................................................... 24
Occupancy Probabilities and Time ............................................................................................ 25
First Holding Time ..................................................................................................................... 25
Non-homogeneous Markov Jump Processes ................................................................... 26
Residual Holding Time .............................................................................................................. 27
Current Holding Time ................................................................................................................ 27
Part 2: Time Series ............................................................................................................................. 28
Introduction to Time Series............................................................................................................ 28
Classical Decomposition Model ................................................................................................ 28
Moving Average Linear Filters .................................................................................................. 28
Differencing ............................................................................................................................... 29
Stationarity ................................................................................................................................. 30
Sample Statistics ........................................................................................................................ 30
Noise .......................................................................................................................................... 31
Linear Processes ......................................................................................................................... 31
Time Series Models ....................................................................................................................... 32
Moving Average Process ........................................................................................................... 32
Autoregressive Process .............................................................................................................. 33
Autoregressive Moving Average Models .................................................................................. 34
Causality..................................................................................................................................... 34
Invertibility................................................................................................................................. 35
Calculation of ACF .................................................................................................................... 35
Partial Autocorrelation Function ................................................................................................ 37
Model Building .............................................................................................................................. 38
Model Selection ......................................................................................................................... 38
Parameter Estimation ................................................................................................................. 39
Model Diagnosis ........................................................................................................................ 40

Non-Stationarity ............................................................................................................................. 41
Stochastic Trends ....................................................................................................................... 41
ARIMA Model ........................................................................................................................... 42
SARIMA Model ......................................................................................................................... 42
Dickey-Fuller Test ..................................................................................................................... 43
Overdifferencing ........................................................................................................................ 44
Cointegrated Time Series ........................................................................................................... 44
Time Series Forecasting ................................................................................................................. 45
Time Series and Markov Property ............................................................................................. 45
k-step Ahead Predictor ............................................................................................................... 46
Best Linear Predictor ................................................................................................................. 47
Part 3: Brownian Motion.................................................................................................................... 48
Definitions .................................................................................................................................. 48
Properties of Brownian Motion.................................................................................................. 48
Brownian Motion and Symmetric Random Walk...................................................................... 49
Brownian Motion with Drift ...................................................................................................... 49
Geometric Brownian Motion ..................................................................................................... 49
Gaussian Processes .................................................................................................................... 50
Differential Form of Brownian Motion ..................................................................................... 50
Stochastic Differential Equations............................................................................................... 51
Stochastic Integration ................................................................................................................. 52
Part 4: Simulation............................................................................................................................... 53
Continuous Random Variables Simulation .................................................................................... 53
Pseudo-Random Numbers.......................................................................................................... 53
Inverse Transform Method......................................................................................................... 53
Discrete Random Variable Simulation ...................................................................................... 54
Acceptance-Rejection Method ................................................................................................... 55
Simulation Using Distributional Relationships.......................................................................... 56
Monte Carlo Simulation ................................................................................................................. 57
Expectation and Variance .......................................................................................................... 57
Antithetic Variables ................................................................................................................... 58
Control Variates ......................................................................................................................... 59
Importance Sampling ................................................................................................................. 60
Number of Simulations .............................................................................................................. 60


Part 1: Stochastic Processes


Stochastic Processes
A stochastic process is any collection of random variables, denoted as:

    {X(t), t ∈ T}

- T is the index set of the process, usually the time parameter
- X(t) is the state of the process at time t
- S is the state space, i.e. the set of values that X(t) can take on

Stochastic processes can be classified according to the nature of the:

- Index Set T
  - Discrete time process: index set is finite or countably infinite
  - Continuous time process: index set is continuous
- State Space S
  - Discrete state space: X(t) is a discrete random variable
  - Continuous state space: X(t) is a continuous random variable
A sample path or a realisation of a stochastic process is a particular assignment of possible values
of X(t) for all t ∈ T.
Increment Properties
For a stochastic process, an increment is the random variable, for any t_1 < t_2:

    X(t_2) - X(t_1)

A stochastic process has independent increments if X(t_0), X(t_1) - X(t_0), ..., X(t_n) - X(t_(n-1)) are
independent for all t_0 < t_1 < ... < t_n. Equivalently, the r.v. X(t+s) - X(t) and X(t) are independent
for all s, t ≥ 0, i.e. future increases are independent of the past or present.

A stochastic process has stationary increments if X(t_2 + s) - X(t_1 + s) and X(t_2) - X(t_1), i.e.
increments over intervals of the same length, have the same probability distribution, for all t_1, t_2 and s > 0.

Markov Processes
A Markov process is a stochastic process that has the Markov property: given the present
state, the future state is independent of the past states:

    Pr(X(t_n) = x_n | X(t_1) = x_1, ..., X(t_(n-1)) = x_(n-1)) = Pr(X(t_n) = x_n | X(t_(n-1)) = x_(n-1))

A Markov chain is a Markov process on a discrete index set T = {0, 1, 2, ...}, denoted by

    {X_n, n = 0, 1, 2, ...}

where X_n = k means the process is in state k at time n, and a finite or countable state space:

    S = {0, 1, 2, ..., n}  or  S = {0, 1, 2, ...}

The Markov property for a Markov chain is given by:

    Pr(X_(n+1) = x_(n+1) | X_0 = x_0, ..., X_n = x_n) = Pr(X_(n+1) = x_(n+1) | X_n = x_n)


Transition Probability
The one-step transition probabilities of the Markov chain are the conditional probabilities of
moving to state j in one step, given that the process is in state i at present:

    P_ij^(n,n+1) = Pr(X_(n+1) = j | X_n = i)

If the one-step transition probabilities do NOT depend on time, i.e. are the same for every step n to n+1,
then the Markov chain is homogeneous with stationary transition probabilities:

    P_ij = P_ij^(n,n+1) = Pr(X_(n+1) = j | X_n = i)

The transition probability matrix is the matrix consisting of all transition probabilities:

    P = [P_ij] = | P_00  P_01  P_02  ... |
                 | P_10  P_11  P_12  ... |
                 | P_20  P_21  P_22  ... |
                 |  ...   ...   ...      |

This matrix satisfies the properties:

    P_ij ≥ 0 for i, j = 0, 1, 2, ...   and   Σ_(all j) P_ij = 1 for i = 0, 1, 2, ...

Note: in the non-homogeneous case, the time must be specified.

The n-step transition probability is the probability that a process in state i will be in state j after n
steps:

    P_ij^(n) = Pr(X_(n+m) = j | X_m = i)
Chapman-Kolmogorov Equations
The Chapman-Kolmogorov equations compute the (n+m)-step transition probabilities:

    P_ij^(n+m) = Σ_(k=0)^∞ P_ik^(n) P_kj^(m)   for all n, m ≥ 0

This is the sum over intermediate states k of the probability of reaching state k after n steps, then
reaching state j after m more steps. Or in matrix multiplication:

    P^(n+m) = P^(n) P^(m) = P · P · ... · P   (n+m times)

where P^(n) = [P_ij^(n)] is the matrix consisting of the n-step transition probabilities.
Recall, for the matrix product AB:
- To get the ith row of AB, multiply the ith row of A by B
- To get the jth column of AB, multiply A by the jth column of B

Kolmogorov Forward Equations: start in state i, transition into state k after n steps, then a one-step
transition to state j:

    P_ij^(n+1) = Σ_(k=0)^∞ P_ik^(n) P_kj   for all n ≥ 0

Kolmogorov Backward Equations: start in state i, a one-step transition into state k, then an n-step
transition into state j:

    P_ij^(n+1) = Σ_(k=0)^∞ P_ik P_kj^(n)   for all n ≥ 0
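The Chapman-Kolmogorov identity is easy to verify numerically. A minimal sketch with numpy, using an arbitrary two-state transition matrix (an illustration, not a matrix from these notes):

```python
import numpy as np

# Hypothetical one-step transition matrix of a two-state chain (rows sum to 1)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# n-step transition matrices are matrix powers of P
P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)

# Chapman-Kolmogorov: P^(2+3) = P^(2) P^(3)
assert np.allclose(P5, P2 @ P3)

# Each row of an n-step matrix is still a probability distribution
assert np.allclose(P5.sum(axis=1), [1.0, 1.0])
print(P5)
```

The same check with `P @ P3` and `P3 @ P` illustrates the forward and backward equations respectively.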


Classification of States
Absorbing state
A state i is said to be an absorbing state if P_ii = 1, or equivalently P_ij = 0 for all j ≠ i. An absorbing
state is a state in which, once the process arrives, it stays forever.
Accessible
State j is accessible from state i if P_ij^(n) > 0 for some n ≥ 0. That is, state j is accessible from i if
there is a positive probability that the process will be in state j at some future time, given that the
process is currently in state i. This is written as:

    i → j

Note that:
- if i → j and j → k, then i → k
- the ONLY state accessible from an absorbing state is the absorbing state itself
Communicate
States i and j communicate if:

    i → j and j → i, i.e. i ↔ j

That is, there is a positive probability that if the process is currently in state i, then at a future point
in time the process will visit state j and afterwards return to state i. Note that:
- i ↔ i for all i
- if i ↔ j then j ↔ i
- if i ↔ j and j ↔ k then i ↔ k
The class of states that communicate with state i is the set:

    C(i) = {j ∈ S : i ↔ j}

That is, there is a positive probability that if the process is in state i, then at a future point in time the
process will return to state i in between visiting at least one state in class C(i).
Irreducible Markov Chain
A Markov chain is irreducible if there is only ONE class, i.e. all states communicate with each
other. Properties:
- All states in a finite, irreducible Markov chain are recurrent
- The probability of returning to the current state in the long run is positive
- The probability of being in any state in the long run is also positive
Recurrent and Transient States
Let f_i be the probability that the process, starting in state i, will return to state i at some time in the
future:

    f_i = Pr(X_n = i for some n ≥ 1 | X_0 = i)

A state is:
- Recurrent if f_i = 1
- Transient if f_i < 1

If state i is recurrent then, starting in state i, the process will return to state i at some point in the
future, and the process will enter state i infinitely often.


If state i is transient then, starting in state i, whether the process ever returns to state i is a
Bernoulli trial:

    X_i = 0 if the process returns to state i; 1 if it does not,   X_i ~ Ber(p = 1 - f_i)

The probability that the process is in state i for exactly n time periods (counting the initial period)
then follows a geometric distribution:

    f_i^(n-1) (1 - f_i),   n = 1, 2, ...

The total number of periods that the process is in state i is given by:

    Σ_(n=0)^∞ I_n,   where I_n = 1 if X_n = i; 0 if X_n ≠ i

with the expected number of time periods, given that the process starts in state i:

    E[Σ_(n=0)^∞ I_n | X_0 = i] = Σ_(n=0)^∞ E[I_n | X_0 = i]
                               = Σ_(n=0)^∞ Pr(X_n = i | X_0 = i)
                               = Σ_(n=0)^∞ P_ii^(n)
                               = 1/(1 - f_i)

An alternative definition for recurrent/transient states: a state i is:

- recurrent if Σ_(n=1)^∞ P_ii^(n) = ∞
- transient if Σ_(n=1)^∞ P_ii^(n) < ∞

That is, a transient state will only be visited a finite number of times.
Note:
- In a finite-state Markov chain, AT LEAST one state must be recurrent and NOT ALL states
  can be transient. Otherwise, after a finite number of steps, no states would be visited!
- An absorbing state is recurrent, since it revisits itself infinitely often
- If state i communicates with state j and state i is:
  - Recurrent, then state j is also recurrent
  - Transient, then state j is also transient

A class of states is:
- Recurrent if all states in the class are recurrent
- Transient if all states in the class are transient
- Closed if all of the states in the class can only lead to states WITHIN the class. Hence,
  states in a closed class are recurrent
- Open if a state in the class can lead to states OUTSIDE the class. Hence states in an open
  class are transient


Class Properties
Period
State i has period d(i) if d(i) is the greatest common divisor of all n ≥ 1 for which P_ii^(n) > 0, i.e. it is
the g.c.d. of the lengths of all possible return paths to state i, if the process starts in state i. Note that:
- If P_ii^(n) = 0 for all n ≥ 1, then d(i) = 0
- If i ↔ j, then d(i) = d(j)
- A state with period 1 is called aperiodic

Positive Recurrent
A recurrent state i is positive recurrent if the expected time of return to itself is finite. In a finite-state
Markov chain, ALL recurrent states are positive recurrent.
Ergodic
A state i is ergodic if it is positive recurrent and aperiodic.

Limiting Probabilities
The limiting probability π_j is the long-run probability that the process is in state j. For an
irreducible (only one class, all states communicate) and ergodic Markov chain, the limiting
probability exists and is independent of i, i.e. of where the process starts from:

    π_j = lim_(n→∞) P_ij^(n),   π_j ≥ 0

where the π_j are the unique non-negative solution to the set of equations:

    π_j = Σ_(i=0)^∞ π_i P_ij   and   Σ_(j=0)^∞ π_j = 1

Or, in matrix form:

    π = πP,   where π = (π_0, π_1, π_2, ...)

This can be interpreted as:
- The probability that the process is in state j at time t is the same as at time t+1, as t → ∞
- π_j is the long-run proportion of time that the Markov chain is in state j

Note that the limiting probabilities may not exist at all, or may exist only along even/odd transitions
(e.g. for a periodic chain).

If state j is transient, then lim_(n→∞) P_ij^(n) = 0 for every state i.

If the distribution of the initial state is chosen to be the limiting distribution, then the probability
of being in state j initially is the same as the probability of being in state j at time n:

    if Pr(X_0 = j) = π_j, then Pr(X_n = j) = π_j for all n

The mean time between visits to state j is the expected number of transitions m_jj until a Markov
chain which starts in state j returns to state j, given by the mean of a geometric distribution:

    m_jj = 1/π_j

That is, the proportion of time in state j equals the inverse of the mean time between visits to state j.
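The system π = πP with Σ_j π_j = 1 can be solved numerically. A small sketch with numpy, using an arbitrary two-state chain whose stationary distribution works out to (5/6, 1/6) (illustrative values, not from these notes):

```python
import numpy as np

# Hypothetical transition matrix of an irreducible, ergodic two-state chain
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

n = P.shape[0]
# Solve pi = pi P together with sum(pi) = 1: stack (P^T - I) with a row of
# ones and solve the (overdetermined but consistent) system by least squares
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# The limiting probabilities also appear as every row of P^n for large n
assert np.allclose(np.linalg.matrix_power(P, 50)[0], pi)
print(pi)  # approximately [5/6, 1/6]
```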

Mean Time in Transient States
For a finite-state Markov chain, let its transient states be denoted T = {1, 2, ..., t}. Let P_T denote the
transition matrix restricted to the transient states only. Note that its rows will sum to less than 1:

    P_T = | P_11  P_12  ...  P_1t |
          | P_21  P_22  ...  P_2t |
          |  ...               ... |
          | P_t1  P_t2  ...  P_tt |

The mean time spent in transient states, s_ij, is the expected number of periods that the Markov
chain is in transient state j, given that the process starts in transient state i. Let S = [s_ij] be the
t×t matrix of these quantities.
Conditioning on the initial transition, we have for i, j ∈ T:

    s_ij = Σ_(k=1)^t P_ik s_kj       if i ≠ j
    s_ij = 1 + Σ_(k=1)^t P_ik s_kj   if i = j

i.e. the mean time spent in state j, given the process starts in state i, is the transition probability
from state i into state k, times the mean time spent in state j starting from state k, summed over all
possible k (plus 1 for the initial period when i = j).
Then we have:

    S = I + P_T S
    S = (I - P_T)^(-1)

Note that for a 2×2 matrix:

    A = | a  b |,   A^(-1) = (1/det A) |  d  -b | = (1/(ad - bc)) |  d  -b |
        | c  d |                       | -c   a |                 | -c   a |
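The identity S = (I - P_T)^(-1) can be evaluated directly. A sketch with numpy, where P_T is an arbitrary illustrative matrix for two transient states:

```python
import numpy as np

# Hypothetical chain with transient states {1, 2} plus an absorbing state:
# P_T keeps only the rows/columns of the transient states (rows sum to < 1)
PT = np.array([[0.4, 0.3],
               [0.2, 0.5]])

# S = (I - P_T)^(-1): s_ij = expected periods in transient state j starting in i
S = np.linalg.inv(np.eye(2) - PT)

# Sanity check against the defining relation S = I + P_T S
assert np.allclose(S, np.eye(2) + PT @ S)
print(S)
```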

Branching Processes
A branching process is a Markov chain that describes the size of a population in which each member
of each generation produces a random number of offspring in the next generation.
- Let p_j, j ≥ 0, be the probability that an individual produces j offspring
- Assume the number of offspring of each individual is independent of the numbers produced by the others
- Let X_0 be the number of individuals initially present, i.e. the zeroth generation
- Individuals produced by the nth generation belong to the (n+1)th generation
- Let X_n be the size of the nth generation

Then {X_n, n = 0, 1, ...} is a branching process with:
- P_00 = 1, so state 0 is recurrent (absorbing)
- If p_0 > 0, all states other than 0 are transient, i.e. the population either dies out or
  diverges to infinity, since transient states are visited a finite number of times

After n+1 generations, the size of the population is a random sum:

    X_(n+1) = Σ_(i=1)^(X_n) Y_i

where the Y_i are i.i.d. r.v. representing the number of offspring of the ith individual of the nth
generation:

    Pr(Y_i = k) = p_k for k = 0, 1, ...,   with Σ_(k=0)^∞ p_k = 1

The mean and variance of the number of offspring of an individual are:

    μ = E[Y_i] = Σ_(k=0)^∞ k p_k
    σ² = var(Y_i) = Σ_(k=0)^∞ (k - μ)² p_k

The mean and variance of the population size of the nth generation (with X_0 = 1) are:

    E[X_n] = μ^n

    var(X_n) = σ² μ^(n-1) (μ^n - 1)/(μ - 1)   if μ ≠ 1
    var(X_n) = n σ²                           if μ = 1

Under the assumption that X_0 = 1, the probability of extinction π_0, i.e. that the population
eventually dies out, is the smallest non-negative solution of:

    π_0 = Σ_(k=0)^∞ π_0^k p_k

Since the pgf of Y is G_Y(s) = Σ_(k=0)^∞ p_k s^k, we have:

    π_0 = G_Y(π_0)

- If μ > 1 then π_0 < 1
- If μ ≤ 1 then π_0 = 1, i.e. extinction is certain
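The extinction probability can be found by fixed-point iteration, since iterating s ← G_Y(s) from s = 0 converges to the smallest root of s = G_Y(s). A sketch with a hypothetical offspring distribution p = (0.2, 0.3, 0.5), for which μ = 1.3 > 1 and the smallest root is 0.4:

```python
# Hypothetical offspring distribution: p_0 = 0.2, p_1 = 0.3, p_2 = 0.5
p = [0.2, 0.3, 0.5]
mu = sum(k * pk for k, pk in enumerate(p))   # mean offspring = 1.3 > 1

def G(s):
    """pgf of the offspring distribution: G_Y(s) = sum_k p_k s^k."""
    return sum(pk * s**k for k, pk in enumerate(p))

# Iterate s <- G_Y(s) starting from 0; this converges to the smallest
# non-negative fixed point, which is the extinction probability pi_0
s = 0.0
for _ in range(200):
    s = G(s)

assert abs(s - G(s)) < 1e-10   # fixed point reached
assert s < 1                   # mu > 1, so extinction is not certain
print(s)  # 0.4 (solving 0.5 s^2 - 0.7 s + 0.2 = 0 gives roots 0.4 and 1)
```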

Probability Generating Functions
Let X be an integer-valued random variable with Pr(X = i) = p_i for i = 0, 1, 2, .... The p.g.f. is:

    G_X(t) = E[t^X] = Σ_(x=0)^∞ t^x Pr(X = x)

Properties:
- Relationship with the probability mass function of X:

    p_X(x) = (1/x!) (d^x/dt^x) G_X(t), evaluated at t = 0

- Relationship with the moment generating function of X:

    m_X(t) = E[exp(Xt)] = G_X(e^t),   equivalently   G_X(t) = m_X(log t)


Time Reversible Markov Chains
Consider a stationary ergodic Markov chain with transition probabilities P_ij and stationary
probabilities π_i. If we start at time n and work backwards, the reversed sequence of states X_n,
X_(n-1), ... is also a Markov chain, since independence is a symmetric relationship.
The transition probabilities of this reversed process are:

    Q_ij = Pr(X_m = j | X_(m+1) = i)
         = Pr(X_m = j, X_(m+1) = i) / Pr(X_(m+1) = i)
         = Pr(X_m = j) Pr(X_(m+1) = i | X_m = j) / Pr(X_(m+1) = i)
         = π_j P_ji / π_i

A time reversible Markov chain is one where:

    Q_ij = P_ij   for all i, j

For a time reversible Markov chain:

    π_i P_ij = π_j P_ji   for all i, j

Thus, the rate at which the process goes from state i to state j (LHS) is the same as the rate at which
it goes from state j to state i.
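The detailed balance condition π_i P_ij = π_j P_ji is straightforward to check numerically. A sketch for a hypothetical random-walk-type chain on three states (such birth-and-death-style chains are time reversible):

```python
import numpy as np

# Hypothetical reversible chain: transitions only between neighbouring states
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Stationary distribution: solve pi = pi P together with sum(pi) = 1
A = np.vstack([P.T - np.eye(3), np.ones(3)])
pi = np.linalg.lstsq(A, np.append(np.zeros(3), 1.0), rcond=None)[0]

# Detailed balance: pi_i P_ij == pi_j P_ji for all i, j,
# i.e. the "probability flow" matrix is symmetric
flow = pi[:, None] * P          # flow[i, j] = pi_i * P_ij
assert np.allclose(flow, flow.T)
print(pi)  # [0.25, 0.5, 0.25]
```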

Exponential, Poisson and Gamma Distributions
A r.v. X has an exponential distribution with rate λ > 0 if its probability density function is of the
form:

    f(x) = λe^(-λx) if x ≥ 0;   0 otherwise

The cumulative distribution function and the survival function are:

    F(x) = 1 - e^(-λx) if x ≥ 0;   0 if x < 0
    S(x) = 1 - F(x) = e^(-λx) if x ≥ 0;   1 if x < 0

The moment generating function is given by:

    M_X(t) = E[e^(tX)] = λ/(λ - t)   for t < λ

The mean and variance of the exponential distribution are:

    E[X] = 1/λ,   var(X) = 1/λ²

The hazard (or failure) rate is given by:

    λ(x) = f(x)/S(x) = (-(d/dx) S(x)) / S(x) = λe^(-λx) / e^(-λx) = λ


A key property of the exponential distribution is that it is memoryless. X is said to be memoryless if,
for all s, t > 0:

    Pr(X > s + t | X > t) = Pr(X > s),  or equivalently
    Pr(X > s + t) = Pr(X > t) Pr(X > s)

Consider independent X_i ~ Exp(λ_i) for i = 1, 2, ..., n. Then:
- If all the parameters are equal, λ_i = λ, then Y_n = X_1 + X_2 + ... + X_n has a gamma(n, λ)
  distribution with pdf:

      f_Yn(t) = λe^(-λt) (λt)^(n-1) / (n-1)!

- X = min(X_1, X_2, ..., X_n) also has an exponential distribution, with parameter Σ_(i=1)^n λ_i
  and mean:

      E[X] = 1 / Σ_(i=1)^n λ_i

- The probability that X_i is the smallest is given by:

      Pr(X_i = min_j X_j) = λ_i / Σ_(j=1)^n λ_j

- If the parameters are not all equal, i.e. λ_i ≠ λ_j for i ≠ j, then the sum of the n independent
  exponential r.v. has a hypoexponential distribution with pdf:

      f(x) = Σ_(i=1)^n C_(i,n) λ_i e^(-λ_i x),   where C_(i,n) = Π_(j≠i) λ_j / (λ_j - λ_i)
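The results for the minimum of independent exponentials can be confirmed by Monte Carlo. A sketch with hypothetical rates λ = (1, 2, 3):

```python
import numpy as np

rng = np.random.default_rng(42)
lam = np.array([1.0, 2.0, 3.0])          # hypothetical rates
n_sims = 200_000

# Draw (X_1, X_2, X_3) and record the minimum and which variable attains it
X = rng.exponential(scale=1.0 / lam, size=(n_sims, 3))
mins = X.min(axis=1)
argmins = X.argmin(axis=1)

# min(X_1, ..., X_n) ~ Exp(sum of rates): mean should be 1/6
assert abs(mins.mean() - 1.0 / lam.sum()) < 0.01

# Pr(X_i smallest) = lam_i / sum(lam) = (1/6, 2/6, 3/6)
freq = np.bincount(argmins, minlength=3) / n_sims
assert np.allclose(freq, lam / lam.sum(), atol=0.01)
print(freq)
```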

Counting Process
A counting process {N(t), t ≥ 0} represents the number of events that occur up to time t; it is a
discrete state space, continuous time stochastic process. A counting process has the properties:
- N(t) ≥ 0
- N(t) is integer-valued
- N(s) ≤ N(t) for s < t, i.e. it must be non-decreasing
- For s < t, N(t) - N(s) is the number of events that occurred in the interval (s, t]

A counting process has independent increments if the number of events that occur in the interval
(s, t], i.e. N(t) - N(s), is independent of the number of events that occur up to time s.

A counting process has stationary increments if N(t_2 + s) - N(t_1 + s) has the same distribution as
N(t_2) - N(t_1), i.e. the number of events that occur in any interval depends only on the length of
the interval.


Poisson Process
A Poisson process, denoted {N(t), t ≥ 0}, is a counting process that counts the number of events
occurring at rate λ from time 0. It has the properties:
- N(0) = 0
- Independent and stationary increments
- The number of events in any interval of length t has a Poisson distribution with mean λt:

    Pr(N(s + t) - N(s) = n) = Pr(N(t) = n) = e^(-λt) (λt)^n / n!   (stationary increments)

Inter-arrival and Waiting Time
The inter-arrival or holding times {T_n, n = 1, 2, ...} are the times between the (n-1)th and nth
events. For a Poisson process, the inter-arrival times are i.i.d. exponential r.v. with:

    f_Tn(t) = λe^(-λt),   Pr(T_n > t) = e^(-λt)   and   E[T_n] = 1/λ

The waiting time S_n is the time until the nth event, given by:

    S_n = T_1 + T_2 + ... + T_n = Σ_(i=1)^n T_i

The sum of exponential random variables each with the SAME parameter λ has a gamma(n, λ)
distribution, therefore:

    f_Sn(t) = λe^(-λt) (λt)^(n-1) / (n-1)!

Note: if the claims come from m independent policyholders each claiming at rate λ, the aggregate
process has rate mλ, so S_n has a gamma(n, mλ) distribution. E.g. for a motor insurer, each insured
motorist makes a claim at a rate of 0.2 per year, and there are a total of 200 insured. Then S_100,
the waiting time until the 100th claim, has the distribution:

    S_100 ~ Gamma(100, 0.2 × 200) = Gamma(100, 40)
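The waiting-time distribution can be confirmed by simulating the i.i.d. exponential inter-arrival times. A sketch using the motor-insurer figures above (aggregate rate 40 per year, waiting for the 100th claim), so that S_100 ~ Gamma(100, 40) with mean 100/40 = 2.5 years:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 40.0        # aggregate claim rate: 0.2 per year x 200 insured
n = 100           # waiting time until the 100th claim
n_sims = 20_000

# S_n is a sum of n i.i.d. Exp(lam) inter-arrival times, i.e. Gamma(n, lam)
T = rng.exponential(scale=1.0 / lam, size=(n_sims, n))
S_n = T.sum(axis=1)

# Gamma(n, lam) has mean n/lam = 2.5 and variance n/lam^2 = 0.0625
assert abs(S_n.mean() - n / lam) < 0.02
assert abs(S_n.var() - n / lam**2) < 0.01
print(S_n.mean())
```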

Order Statistics
Let Y_(i) be the ith smallest value among Y_1, Y_2, ..., Y_n; then (Y_(1), Y_(2), ..., Y_(n)) is called
the order statistics. If the Y_i are i.i.d. continuous r.v. with pdf f(y), then the joint pdf of the order
statistics is:

    f(y_1, y_2, ..., y_n) = n! Π_(i=1)^n f(y_i),   where y_1 < y_2 < ... < y_n

If the Y_i are uniformly distributed over (0, t), then the joint pdf of the order statistics is:

    f(y_1, y_2, ..., y_n) = n!/t^n,   where 0 < y_1 < y_2 < ... < y_n < t


Conditional Distribution of Arrival Time
Given that N(t) = n, i.e. n events have occurred up to time t, the n arrival times S_1, ..., S_n have the
same distribution as the order statistics of n independent r.v. uniformly distributed over (0, t).
Note that the following two events are equivalent:

    {S_1 = s_1, S_2 = s_2, ..., S_n = s_n, N(t) = n}
        = {T_1 = s_1, T_2 = s_2 - s_1, ..., T_n = s_n - s_(n-1), T_(n+1) > t - s_n}

i.e. the inter-arrival time between the (n-1)th and nth events equals the waiting time for n events
minus the waiting time for n-1 events. Using the i.i.d. exponential inter-arrival times:

    f(s_1, s_2, ..., s_n | N(t) = n)
        = f(s_1, s_2, ..., s_n, n) / Pr(N(t) = n)
        = [λe^(-λs_1) · λe^(-λ(s_2 - s_1)) · ... · λe^(-λ(s_n - s_(n-1))) · e^(-λ(t - s_n))] / [e^(-λt) (λt)^n / n!]
        = n!/t^n

which is exactly the joint pdf of the order statistics of n uniforms on (0, t).
The conditional marginal distribution is binomial:

    Pr(N(s) = m | N(t) = n)
        = Pr(N(s) = m, N(t) = n) / Pr(N(t) = n)
        = Pr(N(s) = m) Pr(N(t) - N(s) = n - m) / Pr(N(t) = n)
        = [e^(-λs) (λs)^m / m!] [e^(-λ(t-s)) (λ(t-s))^(n-m) / (n-m)!] / [e^(-λt) (λt)^n / n!]
        = (n choose m) (s/t)^m (1 - s/t)^(n-m)
Thinning of Poisson Process
Consider a Poisson process {N(t), t ≥ 0} with rate λ, and suppose that each time an event occurs it is
either:
- a type I event with probability p, or
- a type II event with probability 1 - p,

independently of all other events. Let N_1(t) and N_2(t) denote the numbers of type I and type II
events occurring in [0, t], so that:

    N(t) = N_1(t) + N_2(t)

Then:
- {N_1(t), t ≥ 0} is a Poisson process with rate λp
- {N_2(t), t ≥ 0} is a Poisson process with rate λ(1 - p)
- {N(t), t ≥ 0} is a Poisson process with rate equal to the sum of the two rates, i.e.
  λ = λp + λ(1 - p)
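Thinning can be simulated directly: draw N(t), then split the events with a binomial draw. A sketch with hypothetical values λ = 5, p = 0.3, t = 2:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, p, t = 5.0, 0.3, 2.0       # hypothetical rate, thinning prob., horizon
n_sims = 100_000

# Simulate N(t) ~ Poisson(lam * t), then mark each event type I w.p. p
N = rng.poisson(lam * t, size=n_sims)
N1 = rng.binomial(N, p)          # type I counts
N2 = N - N1                      # type II counts

# N1(t) ~ Poisson(lam*p*t) = Poisson(3); N2(t) ~ Poisson(lam*(1-p)*t) = Poisson(7)
assert abs(N1.mean() - lam * p * t) < 0.05
assert abs(N2.mean() - lam * (1 - p) * t) < 0.05
# For a Poisson r.v., the mean equals the variance
assert abs(N1.var() - N1.mean()) < 0.1
print(N1.mean(), N2.mean())
```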

Non-homogeneous Poisson Process
The counting process {N(t), t ≥ 0} is a non-homogeneous Poisson process with intensity function
λ(t), i.e. a non-constant rate, if:
- N(0) = 0
- It has independent increments
- It has unit jumps, i.e.

    Pr(N(t + h) - N(t) = 1) = λ(t)h + o(h)
    Pr(N(t + h) - N(t) ≥ 2) = o(h)

λ(t) can be deterministic or a process itself. Note that if λ(t) = λ, then it is a homogeneous
Poisson process.
The mean value function of a non-homogeneous Poisson process is:

    m(t) = ∫_0^t λ(y) dy

Then:
- N(t) is a Poisson r.v. with mean m(t)
- N(s + t) - N(s) is a Poisson r.v. with mean m(s + t) - m(s)
- {N(m^(-1)(t)), t ≥ 0} is a homogeneous Poisson process with intensity 1
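A non-homogeneous Poisson process can be simulated by thinning a homogeneous one: generate candidate events at a constant rate λ_max ≥ λ(t), and accept a candidate at time s with probability λ(s)/λ_max. A sketch with the hypothetical intensity λ(t) = 2t on [0, 1], for which m(1) = ∫₀¹ 2y dy = 1:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_nhpp(rate, rate_max, t_end, rng):
    """Simulate one non-homogeneous Poisson path on [0, t_end] by thinning:
    candidates arrive at constant rate rate_max; a candidate at time s is
    kept with probability rate(s) / rate_max."""
    times, s = [], 0.0
    while True:
        s += rng.exponential(1.0 / rate_max)
        if s > t_end:
            return times
        if rng.random() < rate(s) / rate_max:
            times.append(s)

# Hypothetical intensity lambda(t) = 2t on [0, 1], bounded by rate_max = 2
counts = np.array([len(simulate_nhpp(lambda s: 2.0 * s, 2.0, 1.0, rng))
                   for _ in range(20_000)])

# N(1) should be Poisson with mean m(1) = 1
assert abs(counts.mean() - 1.0) < 0.03
print(counts.mean())
```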

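A non-homogeneous Poisson process with bounded intensity λ(t) ≤ λ_max can be simulated by thinning a homogeneous rate-λ_max process, keeping a point at time t with probability λ(t)/λ_max. A sketch with the hypothetical intensity λ(t) = 1 + sin²(t):

```python
import math, random

random.seed(2)
T = 10.0
rate = lambda t: 1.0 + math.sin(t) ** 2   # example intensity, bounded by 2
rate_max = 2.0                            # dominating constant rate

# simulate a rate_max homogeneous process; keep each point t
# with probability rate(t) / rate_max
events, t = [], 0.0
while True:
    t += random.expovariate(rate_max)
    if t > T:
        break
    if random.random() < rate(t) / rate_max:
        events.append(t)
```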
Compound Poisson Process

A stochastic process {X(t), t ≥ 0} is a compound Poisson process if

X(t) = Σ_{i=1}^{N(t)} Y_i

where:
- {N(t), t ≥ 0} is a Poisson process
- Y_i, i = 1, 2, ... are i.i.d. random variables

This is useful for insurance companies, where the uncertainty in the total claim size X(t) is due to both the number of claims N(t), which follows a Poisson process, and the claim sizes Y, which can have any distribution as long as they are i.i.d. Note that the claim sizes Y are independent of the number of claims N(t).
The mean and variance of a compound Poisson process are given by:

E[X(t)] = λt E[Y_i]
var[X(t)] = λt E[Y_i²]
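The mean formula E[X(t)] = λt E[Y] can be checked by Monte Carlo. A sketch with exponential claim sizes (λ = 3, t = 2, E[Y] = 1.5 are arbitrary choices):

```python
import random

random.seed(3)
lam, t = 3.0, 2.0
mu_y = 1.5   # exponential claim sizes with mean 1.5, so E[Y] = 1.5

def compound_poisson():
    # one realisation of X(t): sum of N(t) i.i.d. claim sizes
    n, s = 0, random.expovariate(lam)
    while s <= t:
        n += 1
        s += random.expovariate(lam)
    return sum(random.expovariate(1 / mu_y) for _ in range(n))

n_sims = 20000
sample_mean = sum(compound_poisson() for _ in range(n_sims)) / n_sims
theoretical = lam * t * mu_y   # lam * t * E[Y] = 9.0
```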
Continuous-time Markov Chains

A continuous-time Markov chain, or Markov jump process, is a Markov process in continuous time with a discrete state space. The Markov jump process {X(t), t ≥ 0} has a continuous version of the Markov property: for all s, t ≥ 0, states i, j, and any history x(u), 0 ≤ u < t:

Pr(X(t + s) = j | X(t) = i, X(u) = x(u), 0 ≤ u < t) = Pr(X(t + s) = j | X(t) = i)

i.e. the process at time t + s is conditional only on the state at time t and independent of its past states before time t, i.e. X(u) = x(u) for 0 ≤ u < t. This means the time path up to the state at time t does not matter.
The Markov jump process is a homogeneous process, i.e. all transition probabilities are stationary, if:

Pr(X(t + s) = j | X(t) = i) = Pr(X(s) = j | X(0) = i) = P_ij(s)

i.e. independent of t. Hence, the transition probability over a time period only depends on the duration of the time period.
The Poisson process is a continuous-time Markov chain:

Pr(X(t + s) = j | X(s) = i) = Pr(j - i jumps in (s, s + t] | i jumps in (0, s])
= Pr(j - i jumps in (s, s + t])     (independent increments)
= Pr(j - i jumps in (0, t])          (stationary increments)
= e^{-λt} (λt)^{j-i} / (j - i)!
Time Spent in a State

The time spent in a state i before the next jump, denoted T_i, for a continuous-time Markov chain has the memoryless property:

Pr(T_i > s + t | T_i > s) = Pr(X(v) = i, s < v ≤ s + t | X(u) = i, 0 ≤ u ≤ s)
= Pr(X(v) = i, s < v ≤ s + t | X(s) = i)     (Markov property)
= Pr(X(v) = i, 0 < v ≤ t | X(0) = i)          (stationary increments)
= Pr(T_i > t)

The only continuous distribution with the memoryless property is the exponential distribution. Hence, the time spent in a state is exponentially distributed with:

T_i ~ exp(v_i)
Pr(T_i ≤ t) = 1 - e^{-v_i t}

The expected time spent in state i before transitioning into another state is:

E[T_i] = 1 / v_i

where v_i is the transition rate when in state i, i.e. the transition rate out of state i to another state. It is the sum of all the rates of jumping from state i to another state j, denoted q_ij:

v_i = Σ_{j≠i} q_ij = -q_ii
Transition Rates and Probabilities

The transition probability of the continuous-time Markov chain is defined as:

P_ij(t, t + s) = Pr(X(t + s) = j | X(t) = i)

For a homogeneous or stationary process we have stationary transition probabilities:

P_ij(t, t + s) = Pr(X(t + s) = j | X(t) = i) = Pr(X(s) = j | X(0) = i) = P_ij(s)

with the initial conditions:

P_ij(s, s) = P_ij(0) = 0 if i ≠ j, 1 if i = j

Transition Rates of Homogeneous Processes
The instantaneous transition rate q_ij from state i to state j of a continuous-time Markov chain is defined as the rate of change of the transition probability over a small period of time:

(∂/∂t) P_ij(t) |_{t=0} = lim_{h→0} [P_ij(0 + h) - P_ij(0)] / h
= lim_{h→0} P_ij(h) / h = q_ij                     if i ≠ j
= lim_{h→0} [P_ii(h) - 1] / h = q_ii = -v_i        if i = j

where v_i = -q_ii denotes the transition rate out of state i when the process is in state i. Note that:
- Σ_j P_ij = 1, the sum of all transition probabilities is 1
- Σ_j q_ij = 0, the sum of all transition rates is 0

Therefore, we have that:

q_ii + Σ_{j≠i} q_ij = 0
-q_ii = Σ_{j≠i} q_ij = v_i

Equivalently, the transition probability over a small time h can be defined as:

P_ij(h) = q_ij h + o(h)        if i ≠ j
P_ii(h) = 1 - v_i h + o(h)     if i = j
Transition Rates of Non-homogeneous Processes

In the non-homogeneous case, the transition rates are also a derivative of the transition probabilities:

(∂/∂t) P_ij(s, t) |_{t=s} = lim_{h→0} [P_ij(s, s + h) - P_ij(s, s)] / h
= lim_{h→0} P_ij(s, s + h) / h = q_ij(s)                    if i ≠ j
= lim_{h→0} [P_ii(s, s + h) - 1] / h = q_ii(s) = -v_i(s)    if i = j

Equivalently, the transition probability over a small time h can be defined as:

P_ij(s, s + h) = q_ij(s) h + o(h)        if i ≠ j
P_ii(s, s + h) = 1 - v_i(s) h + o(h)     if i = j
Now, ignore when the transitions occur and how long is spent in each state, and consider only the series of states that the process transitions into. Let:
- v_i be the transition rate OUT of state i when the process is in state i
- P_ij be the probability that the transition is into state j, conditional on the fact that a transition has OCCURRED and the process is currently in state i (see embedded Markov chain)

Then q_ij, the transition rate INTO state j when the process is in state i, is given by:

q_ij = v_i P_ij

using Pr(A ∩ B) = Pr(A) Pr(B | A), where A is the event of moving out of state i and B is the event of moving to state j. Then we also have:

-q_ii = v_i = Σ_{j≠i} q_ij = Σ_{j≠i} v_i P_ij

Therefore,

P_ij = q_ij / v_i = q_ij / Σ_{j≠i} q_ij
Chapman-Kolmogorov Equations
For a homogeneous continuous-time Markov chain the Chapman-Kolmogorov equation is:

P_ij(t + s) = Σ_{k=0}^∞ P_ik(t) P_kj(s)     for t, s ≥ 0 and all states i, j

with the initial conditions:

P_ij(0) = 0 if i ≠ j, 1 if i = j

The Kolmogorov backward equation is:

(∂/∂t) P_ij(t) = Σ_k q_ik P_kj(t) = Σ_{k≠i} q_ik P_kj(t) - v_i P_ij(t)

The Kolmogorov forward equation is:

(∂/∂t) P_ij(t) = Σ_k P_ik(t) q_kj = Σ_{k≠j} P_ik(t) q_kj - v_j P_ij(t)

Define the matrices:

P(t) = [P_ij(t)],   Q = [q_ij],   (∂/∂t) P(t) = [(∂/∂t) P_ij(t)]

In matrix form we have:

Chapman-Kolmogorov equations:     P(t + s) = P(t) P(s)
Kolmogorov's backward equations:  (∂/∂t) P(t) = Q P(t)
Kolmogorov's forward equations:   (∂/∂t) P(t) = P(t) Q
Initial condition:                P(0) = I
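The matrix form with initial condition P(0) = I has solution P(t) = exp(Qt). A pure-Python sketch for a two-state chain (rates λ = 2, μ = 3 chosen arbitrarily), approximating the matrix exponential by a truncated power series and comparing P_01(t) with the known closed form λ/(λ+μ)(1 - e^{-(λ+μ)t}):

```python
import math

def mat_mult(a, b):
    # 2x2 matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(q, t, terms=60):
    # truncated power series exp(Qt) = sum_k (Qt)^k / k!,
    # accumulating term_k = term_{k-1} * Qt / k
    qt = [[q[i][j] * t for j in range(2)] for i in range(2)]
    term = [[1.0, 0.0], [0.0, 1.0]]
    total = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = mat_mult(term, qt)
        term = [[term[i][j] / k for j in range(2)] for i in range(2)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

lam, mu, t = 2.0, 3.0, 0.7
Q = [[-lam, lam], [mu, -mu]]   # generator: rows sum to zero
P = expm(Q, t)
closed = lam / (lam + mu) * (1 - math.exp(-(lam + mu) * t))
```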
Limiting Probabilities
The limiting probability P_j of a continuous-time Markov chain is the long-run probability, or the long-run proportion of time, that the process will be in state j, independent of the initial state:

P_j = lim_{t→∞} P_ij(t)

To solve for the limiting probabilities, if they exist, we solve the equations:

v_j P_j = Σ_{k≠j} q_kj P_k   for all j   (equivalently Σ_k q_kj P_k = 0, i.e. P Q = 0 in matrix form)
Σ_k P_k = 1

This equation implies that the rate at which the process leaves state j is equal to the rate at which the process enters state j:
- v_j P_j is the rate at which the process leaves state j, since P_j is the long-run proportion of time the process is in state j and v_j is the rate of transition out of state j when the process is in state j
- Σ_{k≠j} q_kj P_k is the rate at which the process enters state j, since P_k is the long-run proportion of time the process is in state k and q_kj is the rate of transition from state k to state j

For the limiting probabilities lim_{t→∞} P_ij(t) to exist, the conditions are:
- All states communicate, so that starting in state i there is a positive probability of being in state j
- The Markov chain is positive recurrent, so that starting in any state, the mean time to return to that state is finite

When the limiting probabilities exist, the Markov chain is ergodic; note that aperiodicity is unnecessary, as periodicity does not apply to continuous-time Markov chains.
If the initial state is chosen according to the limiting distribution, then the probability of being in state j at time t is the same for all t (homogeneous):

Pr(X(0) = j) = P_j   ⟹   Pr(X(t) = j) = P_j

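Solving P Q = 0 together with Σ_k P_k = 1 is a linear-algebra exercise: transpose Q, replace one (redundant) balance equation with the normalisation constraint, and solve. A pure-Python sketch with a hypothetical three-state generator matrix:

```python
def limiting_probs(Q):
    # solve P Q = 0 with sum(P) = 1: work with Q transposed and
    # replace the last (linearly dependent) equation by normalisation
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # Q transposed
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

# example generator matrix (each row sums to zero)
Q = [[-3.0, 2.0, 1.0],
     [1.0, -2.0, 1.0],
     [2.0, 2.0, -4.0]]
P = limiting_probs(Q)
# residual of the full balance equations P Q = 0
resid = max(abs(sum(P[i] * Q[i][j] for i in range(3))) for j in range(3))
```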
Embedded Markov Chain

For a continuous-time Markov chain that is ergodic with limiting probabilities P_i, the sequence of states visited (ignoring the time spent in each state) is a discrete-time Markov chain, known as the embedded Markov chain.
The embedded Markov chain has transition probabilities:

P_ij = Pr(transition to state j | transition out of state i)
= Pr(transition from state i to state j) / Pr(transition out of state i)
= q_ij / v_i = q_ij / Σ_{j≠i} q_ij

Note: in general P_ij ≠ P_ij(t)! Do not get confused!
Assuming that the embedded Markov chain is ergodic, its limiting probabilities π_i, i.e. the long-run proportions of transitions into state i, are the unique solutions of the set of equations:

π_i = Σ_j π_j P_ji   and   Σ_i π_i = 1

The proportion of time the continuous-time process spends in state i, i.e. the limiting probabilities of the original continuous Markov process, can also be found using:

P_i = (π_i / v_i) / Σ_j (π_j / v_j)

Note that π_i, the limiting probability of the embedded Markov chain, is the proportion of transitions into state i, and it is multiplied by 1/v_i, the mean time spent in state i during a visit.
Time Reversibility
Going backwards, given the process is in state i at time t, the probability that the process has been in state i for an amount of time greater than s is e^{-v_i s}:

Pr(process in state i throughout [t - s, t] | X(t) = i)
= Pr(process in state i throughout [t - s, t]) / Pr(X(t) = i)
= Pr(X(t - s) = i) Pr(T_i > s) / Pr(X(t) = i)
= e^{-v_i s}

since for large t we have the limiting probabilities Pr(X(t - s) = i) = Pr(X(t) = i) = P_i. Therefore, if we go back in time, the amount of time the process spends in state i is also exponential with rate v_i. Thus the reversed continuous-time Markov chain has the same transition intensities as the forward-time process.
The sequence of states visited by the reversed process, i.e. its embedded chain, is a discrete-time Markov chain with transition probabilities:

Q_ij = π_j P_ji / π_i

Thus, the continuous-time Markov chain will be time reversible, i.e. have the same probability structure as the original process, if the embedded chain is time reversible, i.e.

π_i P_ij = π_j P_ji

Using the proportion of time the continuous-time chain is in state i:

P_i = (π_i / v_i) / Σ_j (π_j / v_j)   ⟹   π_i ∝ P_i v_i

Substituting into π_i P_ij = π_j P_ji gives:

P_i v_i P_ij = P_j v_j P_ji

Noting that q_ij = v_i P_ij, we have an equivalent condition for time reversibility:

P_i q_ij = P_j q_ji

i.e. the rate at which the process goes directly from i to j is the same as the rate at which it goes directly from j to i.
Birth and Death Process

A birth and death process is a continuous-time Markov chain with states 0, 1, 2, ... for which transitions from state n may only go into either state n + 1 (a birth) or state n - 1 (a death). Suppose that the number of people in a population/system is n:
- New arrivals enter the population/system at a birth/arrival rate λ_n
  - The time until the next arrival is exponentially distributed with mean 1/λ_n
- People leave the population/system at a death/departure rate μ_n
  - The time until the next departure is exponentially distributed with mean 1/μ_n
Transition Rates and Embedded Markov Chain

Let v_i be the transition rate out of state i when the process is in state i. Initially, with a population of 0, there can only be a birth (the population can't be negative), so μ_0 = 0:

v_i = Σ_{j≠i} q_ij
v_0 = λ_0
v_i = λ_i + μ_i

For the corresponding embedded Markov chain, let P_ij denote the transition probabilities between states. If the Markov chain is in state 0 and a transition occurs, it must transition into state 1, therefore:

P_01 = 1

To derive the transition probabilities of the embedded Markov chain, consider a population of i. It will jump to i + 1 if a birth occurs before a death, and jump to i - 1 if a death occurs before a birth:
- The time to a birth T_b is exponential with rate λ_i
- The time to a death T_d is exponential with rate μ_i

Therefore, the probability that a birth occurs before a death is given by (minimum of independent exponentials):

Pr(T_b < T_d) = λ_i / (λ_i + μ_i)

Then the transition probabilities are:

P_{i,i+1} = λ_i / (λ_i + μ_i),   P_{i,i-1} = 1 - P_{i,i+1} = μ_i / (λ_i + μ_i),   P_{i,k} = 0 for i - 1 < k < i + 1

Also, since the time to the next jump, regardless of whether it is a birth or a death, is exponential with rate v_i, and a jump out of state i is either a birth or a death, we have as before:

v_i = λ_i + μ_i
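The description above translates directly into a simulation: in state n, wait an Exp(λ_n + μ_n) time, then move up with probability λ_n/(λ_n + μ_n) and down otherwise. A sketch with hypothetical single-server-queue rates:

```python
import random

random.seed(4)

def simulate_bd(lam, mu, x0, t_end):
    # simulate one birth-death path up to time t_end:
    # holding time Exp(lam(n) + mu(n)), then jump +1 or -1
    t, state, path = 0.0, x0, [x0]
    while True:
        rate = lam(state) + mu(state)
        if rate == 0:
            break          # absorbing state
        t += random.expovariate(rate)
        if t > t_end:
            break
        state += 1 if random.random() < lam(state) / rate else -1
        path.append(state)
    return path

# constant arrivals, service only when the queue is non-empty
path = simulate_bd(lambda n: 1.0, lambda n: 1.5 if n > 0 else 0.0, 0, 200.0)
jumps = [b - a for a, b in zip(path, path[1:])]
```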
Examples of Birth and Death Processes
Birth and death rates independent of n
A supermarket has one server and customers join a queue. Customers arrive at rate λ with exponential inter-arrival times (a Poisson process). The server serves at a rate μ, with service times also exponentially distributed. Let X(t) be the number in the queue at time t; then:

λ_n = λ for n ≥ 0
μ_n = 0 for n = 0, μ for n ≥ 1

P_{i,i+1} = 1 for i = 0, λ/(λ + μ) for i = 1, 2, ...
P_{i,i-1} = 0 for i = 0, μ/(λ + μ) for i = 1, 2, ...
Birth and death rates dependent on n

If there are now s checkout operators serving the queue, with customers joining a single queue and going to the first available server, then:

λ_n = λ for n ≥ 0
μ_n = nμ for 1 ≤ n ≤ s, sμ for n > s, i.e. μ_n = min(n, s)μ

P_{i,i+1} = λ / (λ + min(i, s)μ) for i = 0, 1, 2, ...
P_{i,i-1} = min(i, s)μ / (λ + min(i, s)μ) for i = 0, 1, 2, ...
Poisson Process
A Poisson process is a special case of a birth and death process with no deaths:

λ_n = λ for n ≥ 0
μ_n = 0 for n ≥ 0

P_{i,i+1} = 1 and P_{i,i-1} = 0 for i = 0, 1, 2, ...
Population Growth
Each individual in a population gives birth at an exponential rate λ, plus there is immigration at an exponential rate θ. Each individual dies at an exponential rate μ:

λ_n = nλ + θ for n ≥ 0
μ_n = nμ for n ≥ 0

P_{i,i+1} = (iλ + θ) / (iλ + θ + iμ) and P_{i,i-1} = iμ / (iλ + θ + iμ) for i = 0, 1, 2, ...
Expected Time in States

Let T_i be the time for the process, starting from state i, to enter state i + 1, i.e. the time until a birth takes the population above i. Define the indicator function:

I_i = 1 if the first transition is i → i + 1, 0 if the first transition is i → i - 1

Then the expected value of T_i can be calculated by conditioning on the first transition:

E[T_i] = E[T_i | I_i = 1] Pr(I_i = 1) + E[T_i | I_i = 0] Pr(I_i = 0)
= [1/(λ_i + μ_i)] λ_i/(λ_i + μ_i) + [1/(λ_i + μ_i) + E[T_{i-1}] + E[T_i]] μ_i/(λ_i + μ_i)

Rearranging gives the recursion:

E[T_i] = 1/λ_i + (μ_i/λ_i) E[T_{i-1}]

since the time to the first transition, regardless of whether it is a birth or a death, is exponential with rate λ_i + μ_i, and if the first transition is a death, the time taken to reach i + 1 is the time from i - 1 back to i and then from i to i + 1. Note:

E[T_0] = 1/λ_0

In the case where the rates are homogeneous, i.e. λ_i = λ, μ_i = μ for all i:

E[T_i] = (1/λ)[1 + μ/λ + (μ/λ)² + ... + (μ/λ)^i] = (1/λ) Σ_{k=0}^{i} (μ/λ)^k

Then the expected time to go from state i to a higher state j is:

E[T_{i→j}] = E[T_i] + E[T_{i+1}] + ... + E[T_{j-1}]
Kolmogorov Equations
We can use the transition rates of a birth and death process to find the Kolmogorov equations, and then solve these differential equations to find the transition probabilities.

q_{i,i+1} = λ_i and q_{i,i-1} = μ_i for all i ≥ 1
q_{0,1} = λ_0,  q_{0,0} = -v_0 = -λ_0,  q_{0,i} = 0 for i ≥ 2
q_{i,i} = -v_i = -(λ_i + μ_i), otherwise zero

Kolmogorov's backward equations:

(∂/∂t) P_{0,j}(t) = λ_0 P_{1,j}(t) - λ_0 P_{0,j}(t)
(∂/∂t) P_{i,j}(t) = λ_i P_{i+1,j}(t) + μ_i P_{i-1,j}(t) - (λ_i + μ_i) P_{i,j}(t)

Kolmogorov's forward equations:

(∂/∂t) P_{i,0}(t) = μ_1 P_{i,1}(t) - λ_0 P_{i,0}(t)
(∂/∂t) P_{i,j}(t) = λ_{j-1} P_{i,j-1}(t) + μ_{j+1} P_{i,j+1}(t) - (λ_j + μ_j) P_{i,j}(t)
Limiting Probabilities
The limiting probabilities are determined by the balance equations:

v_j P_j = Σ_{k≠j} q_kj P_k for all j,   and   Σ_k P_k = 1

This states that the rate at which the process leaves state j (LHS) is equal to the rate at which the process enters state j (RHS). For a birth and death process, this condition breaks down iteratively:

(λ_n + μ_n) P_n = λ_{n-1} P_{n-1} + μ_{n+1} P_{n+1}
λ_n P_n = μ_{n+1} P_{n+1}

Then in general:

P_1 = (λ_0/μ_1) P_0,   P_2 = (λ_1/μ_2) P_1 = (λ_1 λ_0 / (μ_2 μ_1)) P_0,  ...

P_n = (λ_{n-1}/μ_n) P_{n-1} = [λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1)] P_0 = P_0 ∏_{j=1}^{n} (λ_{j-1}/μ_j)     (2)

Substituting this into the other balance equation Σ_{n≥0} P_n = 1:

P_0 + Σ_{n=1}^∞ [λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1)] P_0 = 1

P_0 = 1 / (1 + Σ_{n=1}^∞ λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1))

Substituting back into (2) for P_n gives:

P_n = [λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1)] / (1 + Σ_{n=1}^∞ λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1))

This also gives the condition for the long-run probabilities to exist, i.e.:

Σ_{n=1}^∞ λ_{n-1} ··· λ_1 λ_0 / (μ_n ··· μ_2 μ_1) < ∞
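For the single-server queue (λ_n = λ, μ_n = μ for n ≥ 1), the products λ_0···λ_{n-1}/(μ_1···μ_n) reduce to ρⁿ with ρ = λ/μ < 1, and the general formula collapses to the geometric distribution P_n = (1 - ρ)ρⁿ. A sketch checking this with a truncated sum (λ = 1, μ = 1.5 are arbitrary):

```python
lam, mu = 1.0, 1.5
rho = lam / mu      # traffic intensity, must be < 1 for stability

# truncate the infinite sum; the geometric tail makes this very accurate
N = 200
products = [1.0]    # prod_{j=1}^{n} lam_{j-1}/mu_j, with the n = 0 term = 1
for n in range(1, N):
    products.append(products[-1] * lam / mu)
p0 = 1.0 / sum(products)
p = [p0 * prod for prod in products]

# closed form for this queue: P_n = (1 - rho) * rho^n
closed = [(1 - rho) * rho**n for n in range(N)]
err = max(abs(a - b) for a, b in zip(p, closed))
```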
Application: Pure Birth Process

The Poisson process is an example of a pure birth process, i.e.

λ_n = λ for n ≥ 0,   μ_n = 0 for n ≥ 0

In this case, rearranging the Kolmogorov backward equation:

(∂/∂t) P_{i,j}(t) = λ P_{i+1,j}(t) - λ P_{i,j}(t)
(∂/∂t) P_{i,j}(t) + λ P_{i,j}(t) = λ P_{i+1,j}(t)

Multiply both sides by e^{λt}; by the product rule, this is equivalent to:

(∂/∂t) [e^{λt} P_{i,j}(t)] = λ e^{λt} P_{i+1,j}(t)

Integrate both sides w.r.t. t from 0 to s:

e^{λs} P_{i,j}(s) - e^{0} P_{i,j}(0) = ∫_0^s λ e^{λt} P_{i+1,j}(t) dt

Since P_{i,j}(0) = 0 for i ≠ j:

P_{i,j}(s) = ∫_0^s λ e^{-λ(s-t)} P_{i+1,j}(t) dt

- The LHS is the probability of moving from state i to j over the time period 0 to s.
- On the RHS, e^{-λ(s-t)} is the probability of the process staying in state i for a time s - t; λ dt is the probability of jumping to state i + 1 over the time interval (s - t, s - t + dt); and P_{i+1,j}(t) is the probability that the process starts in i + 1 and finishes in j over the time interval s - t to s. This is integrated over all possible jump times, which is equivalent to t varying from 0 to s.
Application: Simple Sickness Model
In a simple sickness model, an individual can be in state 0 (healthy) or state 1 (sick):
- An individual remains healthy for an exponential time with mean 1/λ before becoming sick
- An individual remains sick for an exponential time with mean 1/μ before becoming healthy

In this birth and death process, we have:

λ_0 = λ and μ_1 = μ

with all other λ_i, μ_i being zero.
Using Kolmogorov's backward/forward equations, the transition probabilities are:

P_{0,0}(s) = μ/(λ + μ) + λ/(λ + μ) e^{-(λ+μ)s}
P_{1,0}(s) = μ/(λ + μ) (1 - e^{-(λ+μ)s})
P_{0,1}(s) = λ/(λ + μ) (1 - e^{-(λ+μ)s})
P_{1,1}(s) = λ/(λ + μ) + μ/(λ + μ) e^{-(λ+μ)s}
Occupancy Probabilities and Time

In a simple sickness model, the occupancy probabilities are defined as:
- the probability that a healthy individual stays healthy throughout a period of length s
- the probability that a sick individual remains sick throughout the same period

By the Markov property, the time spent in state 0 or state 1 is exponential with the memoryless property:

Pr(stay healthy throughout s) = Pr(T_0 > s) = 1 - Pr(T_0 ≤ s) = 1 - (1 - e^{-λs}) = e^{-λs}
Pr(stay sick throughout s) = Pr(T_1 > s) = 1 - Pr(T_1 ≤ s) = 1 - (1 - e^{-μs}) = e^{-μs}

The occupancy time O(t) is the total time that the process spends in each state during the interval (0, t). If we define the indicator function:

I(s) = 1 if X(s) = 1, 0 if X(s) = 0

then the occupation time for being sick is:

O(t) = ∫_0^t I(s) ds

The expected occupation time being sick, given that the initial state is healthy, is:

E[O(t) | X(0) = 0] = E[∫_0^t I(s) ds | X(0) = 0]
= ∫_0^t E[I(s) | X(0) = 0] ds
= ∫_0^t Pr(X(s) = 1 | X(0) = 0) ds
= ∫_0^t P_{0,1}(s) ds
= ∫_0^t λ/(λ + μ) (1 - e^{-(λ+μ)s}) ds

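Evaluating the last integral gives λ/(λ+μ)[t - (1 - e^{-(λ+μ)t})/(λ+μ)]. A sketch checking this against a simple numerical quadrature of P_{0,1}(s), with hypothetical sickness and recovery rates:

```python
import math

lam, mu, t = 0.4, 1.2, 5.0   # hypothetical rates and horizon

def p01(s):
    # probability of being sick at s given healthy at 0
    return lam / (lam + mu) * (1 - math.exp(-(lam + mu) * s))

# trapezoidal rule for the expected sick-time integral
n = 20000
h = t / n
numeric = h * (p01(0) / 2 + sum(p01(k * h) for k in range(1, n)) + p01(t) / 2)

closed = lam / (lam + mu) * (t - (1 - math.exp(-(lam + mu) * t)) / (lam + mu))
```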
First Holding Time

The first holding time T_0 is the first time the process jumps out of the initial state:

T_0 = inf{t : X(t) ≠ X(0)}

For a homogeneous Markov jump process, this is exponentially distributed with rate λ_i, the rate of jumping out of state i, previously denoted v_i:

λ_i = v_i = Σ_{j≠i} q_ij

Thus, the first holding time has c.d.f. and p.d.f.:

Pr(T_0 > t | X(0) = i) = e^{-λ_i t}
f_{T_0}(t | X(0) = i) = λ_i e^{-λ_i t}

The probability distribution of the state to which the process jumps is:

Pr(X(T_0) = j | X(0) = i) = q_ij / λ_i = P_ij

where X(T_0) is independent of T_0.
Non-homogeneous Markov Jump Processes

Let the transition rates be denoted σ_ij(s), equivalent to the q_ij(s) previously:

σ_ij(s) = (∂/∂t) P_ij(s, t) |_{t=s}

so that the non-homogeneous transition probabilities over a small time period h are:

P_ij(s, s + h) = σ_ij(s) h + o(h)        if i ≠ j
P_ii(s, s + h) = 1 + σ_ii(s) h + o(h)    if i = j

where σ_ii(s) = -Σ_{j≠i} σ_ij(s).
Note that this can also be written as:

P_ij(s - h, s) = σ_ij(s - h) h + o(h)        if i ≠ j
P_ii(s - h, s) = 1 + σ_ii(s - h) h + o(h)    if i = j

so that:

σ_ij(s - h) = [P_ij(s - h, s) - P_ij(s, s)] / h + o(h)/h

Then by letting h → 0 we get:

σ_ij(s) = lim_{h→0} [P_ij(s - h, s) - P_ij(s, s)] / h = -(∂/∂s) P_ij(s, t) |_{t=s}

The Chapman-Kolmogorov equations are:

P_ij(s, t) = Σ_k P_ik(s, u) P_kj(u, t)     for s ≤ u ≤ t

or in matrix form P(s, t) = P(s, u) P(u, t), with P(s, s) = I.

The Kolmogorov forward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. t, then setting u = t:

(∂/∂t) P_ij(s, t) = Σ_k P_ik(s, u) (∂/∂t) P_kj(u, t) |_{u=t}
= Σ_k P_ik(s, t) σ_kj(t) = Σ_{k≠j} P_ik(s, t) σ_kj(t) - v_j(t) P_ij(s, t)

(∂/∂t) P(s, t) = P(s, t) Q(t)

The Kolmogorov backward equations are derived by differentiating the Chapman-Kolmogorov equation w.r.t. s, then setting u = s:

(∂/∂s) P_ij(s, t) = Σ_k [(∂/∂s) P_ik(s, u)] |_{u=s} P_kj(s, t)
= -Σ_k σ_ik(s) P_kj(s, t) = v_i(s) P_ij(s, t) - Σ_{k≠i} σ_ik(s) P_kj(s, t)

(∂/∂s) P(s, t) = -Q(s) P(s, t)

where Q(s) is the matrix containing σ_ij(s) for all i, j.

Residual Holding Time

The residual holding time R_s is the time between s and the NEXT jump:

{R_s > w, X(s) = i} = {X(u) = i, s ≤ u ≤ s + w}

i.e. the process remains in the same state i between times s and s + w. Note that this is the non-homogeneous analogue of the first holding time. We can show that:

Pr(R_s > w | X(s) = i) = exp(-∫_s^{s+w} v_i(u) du)

The density of R_s | X(s) = i is given by:

(∂/∂w)[1 - Pr(R_s > w | X(s) = i)] = v_i(s + w) exp(-∫_s^{s+w} v_i(u) du)

Define X(s + R_s) as the state the process jumps to at the next jump. The distribution of this is:

Pr(X(s + R_s) = j | X(s) = i, R_s = w) = σ_ij(s + w) / v_i(s + w)

Then we have the transition probability, for t ≥ s:

P_ij(s, t) = Pr(X(t) = j | X(s) = i)
= Σ_{I≠i} ∫_0^{t-s} exp(-∫_s^{s+w} v_i(u) du) · v_i(s + w) · [σ_iI(s + w) / v_i(s + w)] · P_Ij(s + w, t) dw

This is the conditional probability that the process is in state j at time t, given that it started in state i at time s. The transition can be decomposed as:
(1) staying in state i from time s for a duration w, i.e. exp(-∫_s^{s+w} v_i(u) du)
(2) jumping out of state i, i.e. v_i(s + w)
(3) given that a jump has occurred, jumping to an intermediate state I, i.e. σ_iI(s + w) / v_i(s + w)
(4) then moving from state I to j over (s + w, t] and being in state j at time t, i.e. P_Ij(s + w, t)
(5) integrated over all possible w and summed over all possible states I

This is the integral form of the Kolmogorov backward equation, i.e. we consider the jump (q) first.
Current Holding Time
For non-homogeneous processes, the current holding time C_t is the time between the last jump and t:

{C_t > w, X(t) = j} = {X(u) = j, t - w ≤ u ≤ t}

It can be shown that:

Pr(C_t > w | X(t) = j) = exp(-∫_{t-w}^{t} v_j(u) du)

The density of C_t | X(t) = j is given by:

(∂/∂w)[1 - Pr(C_t > w | X(t) = j)] = v_j(t - w) exp(-∫_{t-w}^{t} v_j(u) du)

Then we have the transition probability, for t ≥ s:

P_ij(s, t) = Σ_{k≠j} ∫_0^{t-s} P_ik(s, t - w) · σ_kj(t - w) · exp(-∫_{t-w}^{t} v_j(u) du) dw

i.e. move from i to some state k ≠ j by time t - w, jump from k to j in (t - w, t - w + dw), then stay in state j until time t. This is the integral form of the Kolmogorov forward equation, i.e. we consider the jump (q) last.
Part 2: Time Series

Introduction to Time Series
A time series is a sequence of observations that are recorded at regular time intervals, usually discrete and evenly spaced, e.g. daily or monthly. Let x_t denote the observed data:

x_1, x_2, ..., x_t

A time series model for x_t is a family of distributions to which the joint distribution of {X_t} is assumed to belong. Let {X_t} denote the time series, where each X_t is a r.v.:

..., X_{t-1}, X_t, X_{t+1}, ...

Therefore, x_{t-1}, x_t, x_{t+1} are realisations of the r.v. X_{t-1}, X_t, X_{t+1}.
Classical Decomposition Model
The classical decomposition model decomposes the original data X_t into 3 components:

X_t = T_t + S_t + N_t

where E[N_t] = 0, S_{t+d} = S_t and Σ_{j=1}^{d} S_{t+j} = 0, and:
- T_t is a deterministic trend component that is slowly changing and perfectly predictable
- S_t is a deterministic seasonal component with a known period d, also perfectly predictable. Note that the seasonal component sums to zero over a complete cycle; d would be 4, for example, if the data are quarterly.
- N_t is a random component with expected value 0, as all systematic information should be captured in the trend and seasonal components. N_t may be correlated over time and hence partially predictable.
Moving Average Linear Filters
A moving average linear filter has the form:

T̂_t = Σ_{j=-q}^{q} a_j X_{t+j} = (1/(2q+1)) Σ_{j=-q}^{q} X_{t+j}

where the second expression is the equally-weighted (2q+1)-point filter.
Eliminating the Seasonality

A moving average linear filter can also be passed through X_t to eliminate the seasonal component:
- If the period d is odd, i.e. d = 2q + 1, use the filter:

  (1/d) X_{t-q} + (1/d) X_{t-q+1} + ... + (1/d) X_{t+q}

- If the period d is even, i.e. d = 2q, use the filter:

  (1/d) [(1/2) X_{t-q} + X_{t-q+1} + ... + X_{t+q-1} + (1/2) X_{t+q}]
Estimating the Trend

A moving average linear filter can be applied to estimate the trend. Consider X_t = T_t + N_t where T_t = α + βt; applying a (2q+1)-point moving average filter to X_t gives:

T̂_t = (1/(2q+1)) Σ_{j=-q}^{q} X_{t+j} = (1/(2q+1)) Σ_{j=-q}^{q} (T_{t+j} + N_{t+j}) ≈ α + βt = T_t

since the noise terms average to approximately zero and a linear trend passes through the filter unchanged.
For the classical decomposition model:

T̂_t = Σ_{j=-q}^{q} a_j X_{t+j} = (1/(2q+1)) Σ_{j=-q}^{q} T_{t+j} + (1/(2q+1)) Σ_{j=-q}^{q} S_{t+j} + (1/(2q+1)) Σ_{j=-q}^{q} N_{t+j}

This works well, i.e. T̂_t ≈ T_t, if:
- The trend is approximately linear
- The sum of the N_{t+j} is close to zero
- The sum of the S_{t+j} is zero

The estimated noise, or the residuals, is:

N̂_t = x_t - T̂_t - Ŝ_t
Differencing
The backshift operator, or lag operator, B is defined by:

B X_t = X_{t-1},   B^j X_t = X_{t-j}

The difference operator ∇ is defined by:

∇X_t = (1 - B) X_t = X_t - X_{t-1}

The powers of ∇ are defined by:

∇^j X_t = (1 - B)^j X_t,   ∇^0 X_t = X_t

The difference operator with lag d is defined by:

∇_d X_t = (1 - B^d) X_t = X_t - X_{t-d}

Eliminating the Trend
Trend can be eliminated by using differencing. E.g. consider a linear trend:

X_t = T_t + N_t,   T_t = c_0 + c_1 t

Apply differencing to the power of one:

∇X_t = ∇T_t + ∇N_t = c_0 + c_1 t - (c_0 + c_1(t - 1)) + ∇N_t = c_1 + ∇N_t

i.e. the trend becomes a constant c_1, which is stationary. This constant can be estimated by the sample average of ∇X_t. In general, a trend that is a polynomial of degree k can be reduced to a constant by differencing k times:

T_t = c_0 + c_1 t + ... + c_k t^k   ⟹   ∇^k X_t = ∇^k T_t + ∇^k N_t

which is a random process with mean k! c_k.

Differencing to Eliminate Seasonality
Seasonality can also be eliminated by differencing: difference once only, but at a lag d equal to the period of the seasonality:

∇_d S_t = S_t - S_{t-d} = 0
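Both facts are easy to check on synthetic data with exact integer arithmetic. A sketch: a quadratic trend is reduced to the constant 2!·c_2 by two ordinary differences, and a period-4 seasonal component is removed entirely by one lag-4 difference.

```python
def diff(x, lag=1):
    # lag-d difference: y_t = x_t - x_{t-lag}
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# quadratic trend T_t = 1 + 2t + 3t^2 (no noise, so the algebra is exact)
x = [1 + 2 * t + 3 * t**2 for t in range(20)]
dd = diff(diff(x))   # second difference: the constant 2! * 3 = 6

# period-4 seasonal component on its own: lag-4 differencing removes it
s = [[1.0, -0.5, 2.0, -2.5][t % 4] for t in range(20)]
ds = diff(s, lag=4)  # all zeros
```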
Stationarity
The theory of stationary random processes is important in modelling time series, as stationarity allows parameters to be estimated efficiently, since we can treat all samples as coming from the same distribution. A non-stationary random process must be transformed into a stationary one before analysis and modelling, i.e. by removing trend and seasonality, or by applying transformations.
A random process X_t is said to be:
- Integrated of order 0, i.e. I(0), if X_t itself is a stationary time series process
- Integrated of order 1, i.e. I(1), if X_t is not stationary, but the increments through differencing Y_t = ∇X_t = X_t - X_{t-1} form a stationary process
- Integrated of order n, i.e. I(n), if ∇^{n-1} X_t is still not stationary, but the nth differenced series Y_t = ∇^n X_t is a stationary time series process

However, the deseasonalised, detrended residuals, i.e. the random component, may still contain information, i.e. they may be correlated over time. Assuming stationarity in the residuals, we can try to fit a probability model to the residuals for forecasting purposes. To do so, we will need to look at some sample statistics of the residuals.
Sample Statistics
Let X_t be a stochastic process such that var(X_t) < ∞ for all t.
The autocovariance function (ACVF) is defined by:

γ(τ) = Cov(X_t, X_{t+τ}),   γ(0) = Cov(X_t, X_t) = var(X_t)

The autocorrelation function (ACF) is defined by:

ρ(τ) = Corr(X_t, X_{t+τ}) = γ(τ) / γ(0)

The covariance function is defined to be:

Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)] = E[XY] - μ_X μ_Y

Some properties of the covariance function are:
- Cov(X, a) = 0
- Cov(aX, bY) = ab Cov(X, Y)
- Cov(aX + bY, cU + dV) = ac Cov(X, U) + ad Cov(X, V) + bc Cov(Y, U) + bd Cov(Y, V)

A process X_t is said to be weakly stationary if:

E[X_t] = μ   and   Cov(X_t, X_{t+τ}) = γ(τ)

i.e. the mean is constant and the covariance of the process only depends on the time difference τ.

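In practice the ACVF and ACF are estimated from data. A pure-Python sketch of the usual sample versions (using the biased 1/n normalisation, which keeps |ρ̂(h)| ≤ 1):

```python
def sample_acf(x, max_lag):
    # sample autocorrelations rho_hat(h) = gamma_hat(h) / gamma_hat(0), with
    # gamma_hat(h) = (1/n) * sum_{t} (x_t - xbar)(x_{t+h} - xbar)
    n = len(x)
    xbar = sum(x) / n
    def gamma(h):
        return sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(n - h)) / n
    g0 = gamma(0)
    return [gamma(h) / g0 for h in range(max_lag + 1)]

acf = sample_acf([2.0, 4.0, 3.0, 5.0, 1.0, 6.0, 2.0, 5.0], 3)
```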
Noise
I.I.D. Noise
X_t is i.i.d. noise if X_t and X_{t+h} are independently and identically distributed with mean zero, i.e. no covariance:

X_t ~ IID(0, σ²)

Assuming E[X_t²] < ∞ (as for weakly stationary series we need bounded first and second moments), then:

μ_X = 0
γ_X(h) = σ² if h = 0, 0 if h ≠ 0

White Noise
X_t is white noise with zero mean if:

X_t ~ WN(0, σ²)

where:

μ_X = 0
γ_X(h) = σ² if h = 0, 0 if h ≠ 0

- IID noise is white noise, but white noise is not necessarily IID noise.
- White noise is weakly stationary.
- Usually, we assume that the error terms have a normal distribution for the purposes of parameter estimation, etc.:

X_t ~ N(0, σ²)

Linear Processes
X_t is a linear process if it can be represented as:

X_t = Σ_{j=-∞}^{∞} ψ_j Z_{t-j} = (Σ_j ψ_j B^j) Z_t = ψ(B) Z_t

where {ψ_j} is a sequence of constants with Σ_j |ψ_j| < ∞ and Z_t ~ WN(0, σ²).
Linear processes are stationary because they are a linear combination of stationary white noise terms. For stationary processes, the regularity condition Σ_j |ψ_j| < ∞ holds, i.e. Σ_j ψ_j is absolutely convergent. This ensures that the infinite sum can be manipulated the same way as a finite sum, i.e. two absolutely convergent series can be added or multiplied together.
In general, if Y_t is stationary and Σ_j |ψ_j| < ∞ holds, then

X_t = ψ(B) Y_t

is stationary. Essentially, ALL stationary processes can be represented by a linear process.
Time Series Models

Moving Average Process

X_t is a first-order moving average process, MA(1), if there is a process Z_t ~ WN(0, σ²) and a constant θ such that:

X_t = Z_t + θ Z_{t-1}

i.e. the next term depends on the current noise plus a proportion of the noise of the previous period.
The mean is:

E[X_t] = E[Z_t] + θ E[Z_{t-1}] = 0

The autocovariance (ACVF) is:

γ(h) = Cov(X_t, X_{t+h})
= Cov(Z_t + θZ_{t-1}, Z_t + θZ_{t-1}) = var(Z_t) + θ² var(Z_{t-1}) = σ²(1 + θ²)    if h = 0
= Cov(Z_t + θZ_{t-1}, Z_{t+1} + θZ_t) = θ var(Z_t) = θσ²                           if |h| = 1
= Cov(Z_t + θZ_{t-1}, Z_{t+h} + θZ_{t+h-1}) = 0                                    if |h| > 1

The autocorrelation (ACF) is:

ρ(h) = γ(h)/γ(0) = 1 if h = 0, θ/(1 + θ²) if |h| = 1, 0 if |h| > 1

The conditional mean is stochastic and depends on the past:

E[X_{t+1} | X_t, X_{t-1}, ...] = E[Z_{t+1} + θZ_t | Z_t] = θZ_t + 0 ≠ E[X_{t+1}]

The conditional variance is constant, i.e. independent of X_t:

var[X_{t+1} | X_t, X_{t-1}, ...] = var[Z_{t+1} | Z_t] = σ² ≠ σ²(1 + θ²) = var[X_{t+1}]

Therefore, the conditional and unconditional means and variances are different.
X_t is a moving average process of order q, MA(q), if the process depends on its previous q realisations of noise:

X_t = Z_t + θ_1 Z_{t-1} + ... + θ_q Z_{t-q}

Note:
- Moving average processes are stationary, as X_t is a linear combination of stationary white noise terms
- The ACF of an MA(q) process has non-zero values up until lag q, and (in samples, near-) zero values for all lags greater than q
- The conditional mean/variance is used for forecasts, whereas the unconditional mean/variance gives the long-run results

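The MA(1) ACF can be verified by simulation: the sample ACF should be close to θ/(1 + θ²) at lag 1 and close to zero beyond. A sketch (θ = 0.6 and Gaussian noise are arbitrary choices):

```python
import random

random.seed(7)
theta, n = 0.6, 200000

# simulate an MA(1) path X_t = Z_t + theta * Z_{t-1}
z = [random.gauss(0, 1) for _ in range(n + 1)]
x = [z[t] + theta * z[t - 1] for t in range(1, n + 1)]

xbar = sum(x) / n
def gamma_hat(k):
    # biased sample autocovariance
    return sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n

g0 = gamma_hat(0)
rho1 = gamma_hat(1) / g0
rho3 = gamma_hat(3) / g0
theoretical = theta / (1 + theta**2)   # = 0.6 / 1.36
```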
Autoregressive Process

X_t is a first-order autoregressive process, AR(1), if X_t is stationary and there is a process Z_t ~ WN(0, σ²), uncorrelated with all X_s for s < t, and a constant φ such that:

X_t = φ X_{t-1} + Z_t

i.e. the next term depends on the previous value of the process and slowly reverts to its mean.
Note that an AR(1) can also be written, by repeated substitution, as:

X_{t+h} = φ^h X_t + Z_{t+h} + φ Z_{t+h-1} + ... + φ^{h-1} Z_{t+1} = φ^h X_t + Σ_{j=0}^{h-1} φ^j Z_{t+h-j}

The mean is:

E[X_t] = φ E[X_{t-1}] + E[Z_t] = 0

The autocovariance (ACVF) is:

γ_X(h) = Cov(X_t, X_{t+h}) = σ²/(1 - φ²) if h = 0, φ^{|h|} σ²/(1 - φ²) if h ≠ 0

The autocorrelation (ACF) is:

ρ_X(h) = γ_X(h)/γ_X(0) = 1 if h = 0, φ^{|h|} if h ≠ 0

The conditional mean is stochastic and depends on X_t:

E[X_{t+1} | X_t, X_{t-1}, ...] = φ X_t + E[Z_{t+1} | X_t] = φ X_t + 0 ≠ E[X_{t+1}]

The conditional variance is constant, i.e. independent of X_t:

var[X_{t+1} | X_t, X_{t-1}, ...] = var[φX_t + Z_{t+1} | X_t] = σ² ≠ var[X_{t+1}]

Therefore, the conditional and unconditional means and variances are different.
X_t is an autoregressive process of order p, AR(p), if:

X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + ... + φ_p X_{t-p} + Z_t

Note:
- For an AR(1) process to be stationary, |φ| < 1, so that it can be expressed as a linear process. When φ = 1, the process is known as a random walk.
- For a higher order AR(p) process, there will also be conditions on φ_1, φ_2, ..., φ_p for stationarity
- An AR(1) process is equivalent to an MA(∞) process, and an MA(1) process is equivalent to an AR(∞) process:

  X_{t+h} = φ^h X_t + Σ_{j=0}^{h-1} φ^j Z_{t+h-j} → Σ_{j=0}^{∞} φ^j Z_{t+h-j}   as h → ∞

- An AR(p) process has, in absolute value, a decaying non-zero ACF for all lags. The smaller |φ|, the faster the ACF decays. If φ is negative, the ACF will have alternating signs.
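The geometric decay ρ(h) = φ^h is likewise easy to check by simulation. A sketch (φ = 0.7, Gaussian noise, and the burn-in-free start at 0 are arbitrary choices; the transient is negligible at this sample size):

```python
import random

random.seed(8)
phi, n = 0.7, 200000

# simulate an AR(1) path X_t = phi * X_{t-1} + Z_t
x = [0.0]
for _ in range(n):
    x.append(phi * x[-1] + random.gauss(0, 1))
x = x[1:]

xbar = sum(x) / n
def gamma_hat(k):
    # biased sample autocovariance
    return sum((x[t] - xbar) * (x[t + k] - xbar) for t in range(n - k)) / n

g0 = gamma_hat(0)
rho1 = gamma_hat(1) / g0   # should be near phi
rho2 = gamma_hat(2) / g0   # should be near phi**2
```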
Autoregressive Moving Average Models

X_t is a first-order autoregressive moving average model, ARMA(1,1), if X_t is stationary and:

X_t = φ X_{t-1} + Z_t + θ Z_{t-1}

where φ and θ are constants and Z_t ~ WN(0, σ²). Alternatively, defining

φ(B) = 1 - φB   and   θ(B) = 1 + θB

the ARMA(1,1) equation can be written as:

φ(B) X_t = θ(B) Z_t

Note that for X_t to be stationary, the condition is the same as for the AR(1) process, i.e. |φ| < 1.
X_t is an ARMA process of order (p, q), i.e. ARMA(p,q), if:

X_t - φ_1 X_{t-1} - ... - φ_p X_{t-p} = Z_t + θ_1 Z_{t-1} + ... + θ_q Z_{t-q}
φ(B) X_t = θ(B) Z_t

where:

φ(z) = 1 - φ_1 z - φ_2 z² - ... - φ_p z^p
θ(z) = 1 + θ_1 z + θ_2 z² + ... + θ_q z^q

and the polynomials φ(z) and θ(z) have no common roots.
The ARMA(p,q) process X_t is stationary if the equation φ(z) = 0 has no roots on the unit circle; note that the roots can be complex, i.e.:

φ(z) ≠ 0 for |z| = 1

X_t is an ARMA(p,q) process with drift if it is of the form:

X_t = μ + φ_1 X_{t-1} + ... + φ_p X_{t-p} + Z_t + θ_1 Z_{t-1} + ... + θ_q Z_{t-q}

If we estimate and then remove the mean, then we have a normal ARMA(p,q) process. The drift here represents the expected change after differencing.
Causality

An ARMA(p,q) process X_t is causal if there exist constants ψ_j such that Σ_j |ψ_j| < ∞ and:

X_t = Σ_{j=0}^{∞} ψ_j Z_{t-j}

i.e. X_t is expressible in terms of current and past noise terms. Causal processes are a subset of stationary processes, i.e. to be causal a process must be stationary first. Causality is important in practice, since if X_t is not causal then it depends on future noise terms, which doesn't make sense.
Theorem:
The X_t satisfying φ(B) X_t = θ(B) Z_t is causal if and only if all the roots of the equation φ(z) = 0 are outside the unit circle.
Invertibility

An ARMA(p,q) process X_t is invertible if there exist constants π_j such that Σ_j |π_j| < ∞ and:

Z_t = Σ_{j=0}^{∞} π_j X_{t-j}

i.e. Z_t is expressible in terms of current and past X_t. If X_t is not invertible then Z_t depends on future values of the process, which again does not make sense.
Theorem:
The X_t satisfying φ(B) X_t = θ(B) Z_t is invertible if and only if all the roots of the equation θ(z) = 0 are outside the unit circle.

Note:
- All MA processes are causal, but AR processes might not be
- All AR processes are invertible, but MA processes might not be
- An equivalent condition for an AR(1) process to be causal is |φ| < 1
- An equivalent condition for an MA(1) process to be invertible is |θ| < 1

Calculation of ACF
Linear Filter Method
Consider the causal ARMA(p,q) process:

X_t − φ_1X_{t−1} − ... − φ_pX_{t−p} = Z_t + θ_1Z_{t−1} + ... + θ_qZ_{t−q}

This can be written as:

X_t = (θ(B)/φ(B)) Z_t = ψ(B)Z_t = Σ_{j=0}^{∞} ψ_j Z_{t−j}

Note that the summation runs only from 0 to infinity, as the process is causal.
The first step is to determine the ψ_j by equating coefficients in the equation φ(B)ψ(B)Z_t = θ(B)Z_t:

(1 − φ_1B − ... − φ_pB^p)(ψ_0 + ψ_1B + ψ_2B² + ...) = 1 + θ_1B + ... + θ_qB^q

Then calculate the ACF by replacing the X_t by their linear filter form. For h > 0:

γ(h) = Cov(X_{t+h}, X_t)
     = Cov(Σ_{j=0}^{∞} ψ_j Z_{t+h−j}, Σ_{j=0}^{∞} ψ_j Z_{t−j})
     = Cov(Σ_{j=−h}^{∞} ψ_{j+h} Z_{t−j}, Σ_{j=0}^{∞} ψ_j Z_{t−j})   (push the index back so that both terms have Z_{t−j})
     = σ² Σ_{j=0}^{∞} ψ_{j+h} ψ_j   (since Cov(Z_t, Z_t) = σ² and Cov(Z_t, Z_s) = 0 for s ≠ t)

This method is convenient for MA processes, since they are easily expressed as linear processes.
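As a sketch of the linear filter method, the snippet below computes the ψ-weights of an ARMA(1,1) recursively and recovers ρ(1) from the truncated series γ(h) = σ²Σ_j ψ_{j+h}ψ_j; the values φ = 0.5, θ = 0.4 are illustrative assumptions, not values from the notes.

```python
# Linear-filter (psi-weight) computation of the ACF for an ARMA(1,1).
# Equating coefficients in (1 - phi*B)(psi_0 + psi_1*B + ...) = 1 + theta*B gives
# psi_0 = 1, psi_1 = phi + theta, and psi_j = phi * psi_{j-1} for j >= 2.

def arma11_psi(phi, theta, n):
    """First n psi-weights of a causal ARMA(1,1)."""
    psi = [1.0, phi + theta]
    for _ in range(2, n):
        psi.append(phi * psi[-1])
    return psi

def acvf(psi, h, sigma2=1.0):
    """gamma(h) = sigma^2 * sum_j psi_{j+h} * psi_j (series truncated at len(psi))."""
    return sigma2 * sum(psi[j + h] * psi[j] for j in range(len(psi) - h))

phi, theta = 0.5, 0.4                 # illustrative parameters (assumption)
psi = arma11_psi(phi, theta, 200)     # truncation is harmless since |phi| < 1
rho1 = acvf(psi, 1) / acvf(psi, 0)
# closed form for ARMA(1,1): rho(1) = (1 + phi*theta)(phi + theta) / (1 + 2*phi*theta + theta^2)
closed = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
```

The truncated-series value agrees with the closed-form ρ(1), which illustrates why the method is most convenient when the ψ_j decay quickly.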

Yule-Walker Equations
Consider the causal ARMA(p,q) process:

X_t − φ_1X_{t−1} − ... − φ_pX_{t−p} = Z_t + θ_1Z_{t−1} + ... + θ_qZ_{t−q}

First, multiply both sides by X_{t−h} and then take expectations:

Cov(X_{t−h}, X_t) − φ_1Cov(X_{t−h}, X_{t−1}) − ... − φ_pCov(X_{t−h}, X_{t−p}) = C_h

which is equivalent to:

γ(h) − φ_1γ(h−1) − ... − φ_pγ(h−p) = C_h

where C_h collects the moving average components on the RHS:

C_h = Cov(X_{t−h}, Z_t) + θ_1Cov(X_{t−h}, Z_{t−1}) + ... + θ_qCov(X_{t−h}, Z_{t−q})

For h = 0, 1, 2, ..., p there are a total of p+1 such equations. Using the symmetry γ(−h) = γ(h), they are linear in the unknowns γ(0), γ(1), ..., γ(p) and can be expressed in matrix form as:

A(φ) (γ(0), γ(1), ..., γ(p))^T = (C_0, C_1, ..., C_p)^T

where the (p+1)×(p+1) matrix A(φ) contains the coefficients 1 and −φ_j. Thus, given the C_j and φ_j, the ACF can be computed by solving this linear system, i.e. by taking the inverse. For h > p, we can find the ACF through the recursion:

γ(h) = φ_1γ(h−1) + ... + φ_pγ(h−p) + C_h

To figure out C_0, C_1, ..., C_q, if the ψ_j are available, then we can use:

X_{t−h} = Σ_{j=0}^{∞} ψ_j Z_{t−h−j} = Σ_{j=h}^{∞} ψ_{j−h} Z_{t−j}

This gives C_h as:

C_h = Cov(X_{t−h}, Z_t + θ_1Z_{t−1} + ... + θ_qZ_{t−q})
    = Cov(Σ_{j=h}^{∞} ψ_{j−h}Z_{t−j}, Z_t + θ_1Z_{t−1} + ... + θ_qZ_{t−q})
    = σ² Σ_{j=h}^{q} θ_j ψ_{j−h} = σ² Σ_{j=0}^{q−h} θ_{j+h} ψ_j

where θ_0 = 1 and C_h = 0 for h > q. However, if the ψ_j are available, we may as well use method 1! Therefore, this method is more convenient for AR processes, for which C_0 = σ² and C_h = 0 for h > 0.
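For a pure AR process the Yule-Walker system collapses to a small set of linear equations in the ρ(h). The sketch below solves it for an AR(2); the coefficients φ_1 = 0.5, φ_2 = 0.25 are illustrative assumptions.

```python
# Yule-Walker ACF for an AR(2): X_t = phi1*X_{t-1} + phi2*X_{t-2} + Z_t.
# Since C_h = 0 for h > 0, dividing the h = 1, 2 equations by gamma(0) gives:
#   rho(1) = phi1 + phi2*rho(1)   and   rho(2) = phi1*rho(1) + phi2
phi1, phi2 = 0.5, 0.25            # illustrative causal parameters (assumption)

rho1 = phi1 / (1 - phi2)          # solve the h = 1 equation for rho(1)
rho2 = phi1 * rho1 + phi2         # then the h = 2 equation gives rho(2)

def rho(h):
    """Extend the ACF by the recursion rho(h) = phi1*rho(h-1) + phi2*rho(h-2)."""
    r = [1.0, rho1, rho2]
    for _ in range(3, h + 1):
        r.append(phi1 * r[-1] + phi2 * r[-2])
    return r[h]
```

With these parameters ρ(1) = 2/3 and ρ(2) = 7/12, and higher lags follow from the same recursion that holds for h > p.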


Partial Autocorrelation Function

The autocorrelation function is used to determine the number of lags of an MA(q) process, while the partial autocorrelation function is necessary to determine the number of lags of an AR(p) process. In finding the ACF, we set up the Yule-Walker equations and converted them into matrix form to solve for γ(h). By omitting the first equation (h = 0), we can rewrite the system to solve instead for the parameters φ_j:

| γ(0)    γ(1)    ...  γ(p−1) | | φ_1 |   | γ(1) − C_1 |
| γ(1)    γ(0)    ...  γ(p−2) | | φ_2 | = | γ(2) − C_2 |
| ...                         | | ... |   | ...        |
| γ(p−1)  γ(p−2)  ...  γ(0)   | | φ_p |   | γ(p) − C_p |

or equivalently:

Γ_p φ_p = γ_p − C_p

where Γ_p is the covariance matrix above, γ_p = (γ(1), ..., γ(p))^T, and the C_h are:

C_h = Cov(X_{t−h}, Z_t) + θ_1Cov(X_{t−h}, Z_{t−1}) + ... + θ_qCov(X_{t−h}, Z_{t−q})

Therefore, the parameters are given by:

φ_p = Γ_p^{−1}(γ_p − C_p)

Note that for an AR(p) process the C_h are all zero, since the noise terms are uncorrelated with the past:

φ_p = Γ_p^{−1} γ_p

Define the vector φ_h = (φ_{h1}, φ_{h2}, ..., φ_{hh})^T = Γ_h^{−1} γ_h. Then for an AR(p) process, where φ_p = Γ_p^{−1} γ_p, we have:
- If h = p, then φ_p = (φ_1, ..., φ_p)^T, which implies that φ_pp = φ_p
- If h > p, then φ_h = (φ_1, ..., φ_p, 0, ..., 0)^T, which implies that φ_hh = 0, because an AR(p) process is a special case of an AR(h) process with φ_t = 0 for t > p
- If h < p, then we do not know precisely what φ_h is

Therefore, for an AR(p) process:

X_t = φ_1X_{t−1} + ... + φ_pX_{t−p} + Z_t

we have:

φ_hh = unknown if h < p,   φ_hh = φ_p if h = p,   φ_hh = 0 if h > p

The h-th partial autocorrelation is defined by:

α(h) = 1 if h = 0,   α(h) = φ_hh if h > 0

where φ_hh is the last element in the vector Γ_h^{−1} γ_h.
E.g., for an AR(2) process, the PACF at lag 2 is given by α(2) = φ_22, where:

| φ_21 |   | γ(0)  γ(1) |^{−1} | γ(1) |
| φ_22 | = | γ(1)  γ(0) |      | γ(2) |
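The 2×2 system for the lag-2 PACF can be solved by hand with Cramer's rule. A minimal sketch, for an AR(2) with assumed coefficients φ_1 = 0.5, φ_2 = 0.25 (so that theoretically ρ(1) = 2/3 and ρ(2) = 7/12):

```python
# Lag-2 PACF of an AR(2): solve Gamma_2 * (phi21, phi22)^T = (gamma(1), gamma(2))^T.
# Dividing through by gamma(0), the same system holds with rho in place of gamma.
rho1, rho2 = 2 / 3, 7 / 12        # theoretical ACF of the assumed AR(2)

det = 1 - rho1 ** 2               # determinant of [[1, rho1], [rho1, 1]]
phi21 = (rho1 - rho1 * rho2) / det
phi22 = (rho2 - rho1 ** 2) / det  # the lag-2 PACF alpha(2)
```

As the theory above predicts, the solution reproduces the AR coefficients themselves: φ_21 = φ_1 and φ_22 = φ_2, i.e. α(2) = φ_pp = φ_2.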

Model Building
For a stationary time series, model building can be classified into 3 stages:
Model Selection
To determine the appropriate model we need to look at the sample ACF and PACF:
The sample autocovariance function is:

γ̂(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄)

The sample autocorrelation function (SACF) is:

ρ̂(h) = γ̂(h)/γ̂(0)
Note that since SPACF will be calculated from SACF, both measures will have estimation error.
Once we have graphed the sample ACF and sample PACF, we first check that the sample ACF
should quickly converge to zero, which shows that the time series is stationary.
If the sample ACF decreases slowly but steadily from a value near 1, then the data need to
be differenced before fitting the model
If the sample ACF exhibits a periodic oscillation, then there may be some seasonality still.
Then we compare the sample ACF and PACF to the theoretical ACF and PACF of different
processes to see if there is a match.
For an AR(p) process:
Sample ACF shows exponential decay towards near-zero values
Sample PACF shows significant values up to lag p, then near-zero values thereafter
For an MA(q) process:
Sample ACF shows significant values up to lag q, then near-zero values thereafter
Sample PACF shows exponential decay towards near-zero values

If neither of these situations occurs, then consider an ARMA(p,q) process. The sample ACF and PACF of an ARMA(p,q) process are very flexible, but in general they combine the patterns of an AR(p) and an MA(q) process. So an ARMA(p,q) model should display:
- ACF that decays towards zero after lag p, either directly or with oscillation
- PACF that decays towards zero after lag q, either directly or with oscillation

Model Selection Criteria

When deciding the number of parameters, there is a trade-off between goodness of fit and the simplicity of the model. More parameters mean more flexibility and a better in-sample fit. However, more parameters also mean that each parameter is estimated with more uncertainty, i.e. a higher standard error.
A systematic method to choose p and q, i.e. the orders, is to minimise an information criterion. Information criteria have the form:

IC(p, q) = −2 log L(θ̂) + P(n, p, q)

where the first term, based on the log-likelihood function, always decreases with the number of parameters, while the second penalty term, a function of the number of observations and of parameters, always increases with the number of parameters. Thus the IC seeks to balance out bias and variance.
The common ICs are:

Information Criterion    Penalty Term
AIC                      2(p + q + 1)
BIC                      (p + q + 1) log n
AICc                     2(p + q + 1)n / (n − p − q − 2)
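The three criteria above can be sketched directly from a maximised log-likelihood; here k = p + q + 1 counts the ARMA coefficients plus the noise variance, an assumption consistent with the penalty terms in the table.

```python
import math

def information_criteria(loglik, n, p, q):
    """AIC, BIC and AICc for an ARMA(p,q) fit with maximised log-likelihood loglik."""
    k = p + q + 1                                # number of estimated parameters
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * math.log(n)
    aicc = -2 * loglik + 2 * k * n / (n - k - 1) # equals 2(p+q+1)n / (n-p-q-2)
    return aic, bic, aicc

aic, bic, aicc = information_criteria(loglik=-100.0, n=100, p=1, q=1)
```

Note that for n > e² ≈ 7.4 the BIC penalty exceeds the AIC penalty, so BIC tends to select smaller models.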

Parameter Estimation
Mean Estimation
Let {X_t} be a weakly stationary process with mean μ. An estimator of μ is the sample mean:

X̄ = (1/n) Σ_{t=1}^{n} X_t

This estimator is unbiased:

E[X̄] = (1/n) Σ_{t=1}^{n} E[X_t] = μ

This estimator is also consistent:

var(X̄) = (1/n²) Σ_{s=1}^{n} Σ_{t=1}^{n} cov(X_t, X_s)
        = (1/n²) Σ_{s=1}^{n} Σ_{t=1}^{n} γ(t − s)
        = (1/n²) Σ_{h=−n}^{n} (n − |h|) γ(h)
        ≤ (1/n) Σ_{h=−∞}^{∞} |γ(h)|

Thus, the variance → 0 as n → ∞, i.e. the estimator is consistent, provided Σ_h |γ(h)| < ∞, which is true since for a stationary time series the ACF should eventually converge to zero regardless of whether it is AR or MA. For a more detailed proof, see ACTL2003 Proofs.

Parameter Estimation
If a sufficient number of sample ACF values is known, then one way to estimate the parameters of a model, i.e. φ, θ, σ², is by equating them to the theoretical ACF derived from the Yule-Walker equations. Use this to set up equations in terms of the parameters and solve. If there is more than one solution, choose the solution that makes the model causal and/or invertible.
Another method is to use maximum likelihood estimation. Suppose the errors are assumed to be normal; then the X_t themselves are also normal, so:

X_n = (X_1, ..., X_n)^T ~ N(0, Σ_n(β))

where:
- β is the fixed set of parameters in the model, e.g. (θ, σ²)
- Σ_n(β) is the symmetric covariance matrix of X_n, expressed in terms of the parameters, with (i, j) entry γ(|i − j|):

        | γ(0)    γ(1)    γ(2)    ...  γ(n−1) |
  Σ_n = | γ(1)    γ(0)    γ(1)    ...  γ(n−2) |
        | γ(2)    γ(1)    γ(0)    ...  γ(n−3) |
        | ...                                 |
        | γ(n−1)  γ(n−2)  γ(n−3)  ...  γ(0)   |

  For an MA(1) model, for example, only the diagonal γ(0) and the first off-diagonals γ(1) are non-zero.

Assuming that the observations follow a multivariate normal distribution, the likelihood function is:

L(β) = (2π)^{−n/2} det(Σ_n)^{−1/2} exp(−(1/2) X_n^T Σ_n^{−1} X_n)

The maximum likelihood estimator β̂ is the value that maximises the likelihood function L(β). Under the normality assumption, we have the asymptotic distribution of the MLE:

β̂ ~ N(β, var(β̂))   (asymptotically)

where the variance can be estimated from the inverse of the observed information:

var(β̂) ≈ (−∂² ln L(β)/∂β ∂β^T)^{−1}, evaluated at β = β̂

This result can be used to compute confidence intervals for the parameters and for hypothesis testing about the parameters, e.g. whether to include certain parameters.

Model Diagnosis
The residuals of the proposed model are:

Ẑ_t = X_t − X̂_t

where X̂_t are the fitted values computed using the estimated parameters. If the proposed model is a good approximation to the underlying time series process, then the residuals should be approximately a white noise process. There are several methods to check this:
Plot of Residuals
If the plot of Ẑ_t against t shows any trend or patterns in its fluctuations, then the model is inadequate.

SACF of Residuals
If the residuals are white noise, then it can be shown that their sample ACF has the approximate distribution:

ρ̂_Ẑ(h) ~ N(0, 1/n)

Therefore, at the 95% level, the sample ACF should be within the range:

0 ± 1.96/√n

since for a white noise process, ρ(h) = 0 for h ≥ 1. If too many values of the SACF lie outside this range, then the model does not fit the process well and more parameters will be needed.
Ljung-Box Test
We can test the null hypothesis that the residuals are white noise using the Ljung-Box test statistic. This tests whether, jointly, all correlations at lags greater than zero are zero. Under the null hypothesis, for large n, the Ljung-Box test statistic is:

Q = n(n + 2) Σ_{j=1}^{h} ρ̂_Ẑ²(j)/(n − j) ~ χ²_{h−p−q}

where n is the number of time series observations. In practice, h is chosen to be between 15 and 30, and n should be large, i.e. n ≥ 100.
Using the Ljung-Box test, we reject the null hypothesis, i.e. conclude that the residuals are not white noise, if:

Q > χ²_{h−p−q, 1−α}
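The Q statistic above can be sketched in a few lines of pure Python; the alternating "residual" series below is an illustrative placeholder, not real model output.

```python
# Ljung-Box statistic Q = n(n+2) * sum_{j=1}^{h} rho_hat(j)^2 / (n - j).

def sacf(x, h):
    """Sample autocorrelation at lag h."""
    n = len(x)
    m = sum(x) / n
    g0 = sum((v - m) ** 2 for v in x) / n
    gh = sum((x[t + h] - m) * (x[t] - m) for t in range(n - h)) / n
    return gh / g0

def ljung_box(x, h):
    n = len(x)
    return n * (n + 2) * sum(sacf(x, j) ** 2 / (n - j) for j in range(1, h + 1))

resid = [1.0, -1.0] * 10          # strongly negatively autocorrelated toy series
q_stat = ljung_box(resid, h=1)
```

In practice Q is compared with the χ²_{h−p−q} critical value; a large Q, as for this deliberately patterned series, leads to rejecting the white-noise hypothesis.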

Non-Stationarity
In practice, a non-stationary time series may exhibit a non-stationary level of mean, variance or both. Transformations can be used to remove non-stationarity, e.g. taking the logarithm of an exponential trend can remove a non-stationary mean, or it can smooth out the variance.
Stochastic Trends
Apart from a deterministic trend or seasonality, a stochastic trend also causes non-stationarity. A stochastic trend arises when the noise terms have a permanent effect on the process. Consider the random walk rewritten iteratively:

X_t = X_{t−1} + Z_t = X_0 + Σ_{j=1}^{t} Z_j,   where Z_t ~ WN(0, σ²)

In this case, the effect of any Z_t on X_{t+h} is the SAME for all h ≥ 0, since φ = 1. This is not true for stationary processes like AR(1) or ARMA(1,1), where |φ| < 1: depending on h, Z_t has a different level of impact, since the coefficient ψ_j changes; e.g. for an AR(1), X_{t+j} contains the term φ^j Z_t, which decays as j grows.
Since the noise terms have a lasting impact, the correlation between X_t and X_{t−h} is relatively high, so a distinctive feature of a random walk is a very slowly decaying positive ACF. Note that differencing the random walk once obtains a stationary series!

Y_t = (1 − B)X_t = X_t − X_{t−1} = Z_t

ARIMA Model
The process Y_t is an autoregressive integrated moving average model ARIMA(p,d,q) with order of integration d if:

φ(B)(1 − B)^d Y_t = θ(B)Z_t

That is, Y_t becomes a stationary ARMA(p,q) process after differencing d times, i.e.:

W_t = (1 − B)^d Y_t ~ ARMA(p, q),   with φ(B)W_t = θ(B)Z_t

E.g. consider the process defined by:

X_t = 0.6X_{t−1} + 0.3X_{t−2} + 0.1X_{t−3} + Z_t + 0.25Z_{t−1}

This process can be re-written as:

(1 − 0.6B − 0.3B² − 0.1B³)X_t = (1 + 0.25B)Z_t
(1 − B)(1 + 0.4B + 0.1B²)X_t = (1 + 0.25B)Z_t

Since the AR polynomial contains the unit-root factor (1 − B), this is an ARIMA(2,1,1) process.
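The factorisation in this example can be verified mechanically: multiply the claimed factors back together and check that φ(1) = 0 (a unit root). A minimal sketch:

```python
# Verify phi(z) = 1 - 0.6z - 0.3z^2 - 0.1z^3 = (1 - z)(1 + 0.4z + 0.1z^2).

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index = power of z)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

phi = [1, -0.6, -0.3, -0.1]
product = poly_mul([1, -1], [1, 0.4, 0.1])  # (1 - z) times (1 + 0.4z + 0.1z^2)
phi_at_1 = sum(phi)                         # phi(1); zero signals a unit root
```

Evaluating φ(1) = 1 − 0.6 − 0.3 − 0.1 = 0 is the quick way to spot that differencing is needed before checking the remaining factor for stationarity.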

SARIMA Model
Suppose that {X_t} exhibits a stochastic seasonal trend, i.e. X_t depends not only on X_{t−1}, X_{t−2}, ... but also on X_{t−s}, X_{t−2s}, ....
To model this, we can use a SARIMA(p,d,q)×(P,D,Q)_s process, given by:

φ(B) (1 − B)^d Φ(B^s) (1 − B^s)^D X_t = θ(B) Θ(B^s) Z_t
AR(p)   I(d)    AR(P)     I(D)          MA(q)  MA(Q)

where Φ(B^s) = 1 − Φ_1B^s − ... − Φ_PB^{Ps} and Θ(B^s) = 1 + Θ_1B^s + ... + Θ_QB^{Qs}, i.e. the AR(P), MA(Q) and I(D) factors are polynomials in B^s.
E.g. consider a SARIMA(1,0,0)×(0,1,1)_12 process given by:

(1 − φB)(1 − B^{12})X_t = (1 + θB^{12})Z_t
 AR(1)    I(1)              MA(1)

This can be written as:

(1 − φB − B^{12} + φB^{13})X_t = Z_t + θZ_{t−12}
X_t = X_{t−12} + φX_{t−1} − φX_{t−13} + Z_t + θZ_{t−12}

Note that X_t depends on X_{t−12}, X_{t−1}, X_{t−13} as well as Z_{t−12}.

SARIMA models can always be rewritten as ARIMA models, with some constraints on the new parameters. However, converting to ARIMA always requires more parameters than the SARIMA model, which leads to a better in-sample fit but worse out-of-sample fit, i.e. predictions. Thus, when forecasting it is better to use SARIMA models than to convert them to ARIMA models.

Dickey-Fuller Test
The Dickey-Fuller test is a unit root test, i.e. it tests whether there is a unit root in the time series. Note that if the polynomial φ(z) has a unit root, then the time series is not stationary and requires differencing.
Consider a causal time series process {X_t} with:

X_t = α + βt + φX_{t−1} + Z_t,   Z_t ~ WN(0, σ²)

To test for a stochastic trend, we test the hypothesis:

H_0: φ = 1   against   H_1: φ < 1

Note that if φ = 1 then there is a unit root, which leads to a stochastic trend, so X_t is not stationary.
We can write the above model as:

X_t − X_{t−1} = α + βt + (φ − 1)X_{t−1} + Z_t
ΔX_t = α + βt + φ*X_{t−1} + Z_t,   where φ* = φ − 1

Then the alternative and equivalent null hypothesis is:

H_0: φ* = 0

This is known as the Dickey-Fuller test, and the test statistic is:

τ = φ̂*/se(φ̂*)

Once the parameters α, β, φ* have been estimated, reject the null hypothesis φ* = 0 if τ is LESS than the critical value, which will be negative since φ* is negative under the alternative. Note that rejection of the null hypothesis implies that the time series is stationary, and accepting the null hypothesis implies that differencing is required.
The distribution of this test statistic is a non-standard distribution depending on α and β, with asymptotic percentiles:

Probability to the left     0.01    0.05    0.10
Standard Normal            −2.33   −1.65   −1.28
DF with α = 0, β = 0       −2.58   −1.95   −1.62
DF with β = 0              −3.43   −2.86   −2.57
DF (unconstrained)         −3.96   −3.41   −3.12

Note that the DF distributions are much more spread out than the standard normal. When choosing whether or not to use DF with α = 0, β = 0, there is a trade-off:
- If α or β is set to 0 when in fact the true values are nonzero, the test becomes inconsistent and the asymptotic critical values are no longer valid. Decisions based on the test are likely to be wrong, i.e. it might confuse deterministic and stochastic trends
- However, allowing α or β to be non-zero reduces the power of the test, i.e. it is harder to detect a false null hypothesis
How to determine α and β usually depends on what type of series we have. E.g. if a linear trend exists, then we expect the differenced series to have only a constant, so α ≠ 0, β = 0
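The simplest (α = 0, β = 0) version of the regression ΔX_t = φ*X_{t−1} + Z_t can be sketched with ordinary least squares in pure Python; the five-point series below is a toy example, not a realistic sample.

```python
import math

def dickey_fuller_tau(x):
    """OLS fit of dX_t = phi* X_{t-1} + e_t; returns (phi*_hat, tau)."""
    lag = x[:-1]
    dx = [x[t + 1] - x[t] for t in range(len(x) - 1)]
    sxx = sum(v * v for v in lag)
    phi_star = sum(l * d for l, d in zip(lag, dx)) / sxx       # OLS slope
    rss = sum((d - phi_star * l) ** 2 for l, d in zip(lag, dx))
    s2 = rss / (len(dx) - 1)                                   # residual variance
    tau = phi_star / math.sqrt(s2 / sxx)                       # t-ratio on phi*
    return phi_star, tau

phi_star, tau = dickey_fuller_tau([1.0, 2.0, 3.0, 4.0, 5.0])
# Reject the unit root at the 5% level only if tau < -1.95 (the alpha = beta = 0 row)
```

Here τ is positive, so it is nowhere near the −1.95 cut-off and the unit root cannot be rejected — which is sensible, since a straight-line series differenced once is constant.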

Overdifferencing
Let U_t be an ARMA(p,q) process:

φ(B)U_t = θ(B)Z_t

Define V_t = (1 − B)^d U_t, i.e. d differences. Then we have:

V_t = (1 − B)^d (θ(B)/φ(B)) Z_t   ⇔   φ(B)V_t = (1 − B)^d θ(B)Z_t

Therefore, V_t becomes an ARMA(p, q+d) process, since (1 − B)^d θ(B) is a polynomial of order q+d. However, this MA polynomial has a unit root, so the process V_t is not invertible. Therefore, we should avoid overdifferencing, as it will give us a non-invertible process, even though it is still stationary.
Cointegrated Time Series
Many time series in finance and economics are non-stationary (random walks), e.g. CPI and GDP, but at the same time do not move too far apart from each other. Cointegration is used to model non-stationary series that move together.
For a bivariate process X_t = (X_{t,1}, X_{t,2})^T, we define:

Δ^d X_t = (Δ^d X_{t,1}, Δ^d X_{t,2})^T

A bivariate process X_t = (X_{t,1}, X_{t,2})^T is integrated of order d, i.e. I(d), if Δ^d X_t is stationary but Δ^{d−1} X_t is not.
An I(1) bivariate process is cointegrated if there is a cointegrating vector α = (α_1, α_2)^T such that α^T X_t becomes stationary, i.e.:

α_1X_{t,1} + α_2X_{t,2} ~ I(0)

If X_{t,1} and X_{t,2} are cointegrated, then:
- X_{t,1} and X_{t,2} are both I(1)
- e_t = α^T X_t is I(0)
- Cointegration implies that there is a common stochastic trend between X_{t,1} and X_{t,2}
In many financial applications, the cointegrating vector is of the form:

α = (1, −a)^T

That is, X_{t,1} and X_{t,2} are random walks themselves, but the difference X_{t,1} − aX_{t,2} is stationary. The coefficient a can be estimated by the regression:

X_{t,1} = aX_{t,2} + ε_t

Then we expect that in the long run the two processes converge so that X_{t,1} ≈ aX_{t,2}.

Time Series Forecasting


Time Series and Markov Property
Recall that a process {X_t, t = 0, 1, ...} has the Markov property if all future states of the process depend on its present state alone and not on any of its past states:

Pr(X_t ∈ A | X_{s_1} = x_1, ..., X_{s_n} = x_n, X_s = x) = Pr(X_t ∈ A | X_s = x)

for all times s_1 < s_2 < ... < s_n < s < t, all states x_1, x_2, ..., x_n, x ∈ S and all subsets A of S.
AR Processes
An AR(1) process has the Markov property, since the conditional distribution of X_{n+1} given all previous X_t depends only on X_n. However, an AR(2) process does not have the Markov property, since the conditional distribution of X_{n+1} given all previous X_t depends on both X_n and X_{n−1}. Thus, in general, AR(p) processes do not have the Markov property for p greater than 1.
However, for an AR(2) process, if we define a vector-valued process Y by Y_t = (X_t, X_{t−1})^T, then Y has the Markov property, since the conditional distribution of Y_{n+1} given all previous Y_t depends only on Y_n. In general, for an AR(p) process we can define a vector-valued process with p elements that will have the Markov property.
MA Processes
An MA(q) process can never have the Markov property, even in vector form, since the distribution of X_{n+1} depends on the value of Z_n, and in theory no knowledge of the value of X_n or of any finite collection (X_n, ..., X_{n−q+1})^T will ever be enough to deduce the value of Z_n. However, in practice we can estimate Z_n very accurately, so Markov simulation techniques still apply.
ARIMA Processes
For an ARIMA(p,d,q) process, if q is zero, i.e. there is no moving average component, then the process behaves similarly to AR(p) in terms of the Markov property, i.e. it might not be Markov, but an appropriate vector form can be Markov.
If d is also zero and p is greater than 1, then it is essentially an AR(p) process. If both p and d are equal to 1, then the model can be written as:

(1 − φ_1B)(1 − B)X_t = Z_t
(1 − (1 + φ_1)B + φ_1B²)X_t = Z_t
X_t = (1 + φ_1)X_{t−1} − φ_1X_{t−2} + Z_t

This is clearly not Markov. However, it can still be written as a vector-valued process that has the Markov property. In general, the vector process needs p+d terms to be Markov.
If q is not equal to zero, i.e. there is a moving average part, then the process will never be Markov, for the same reason that MA(q) is never Markov.

k-step Ahead Predictor

Assume that we have the following information:
(1) All observations of X_t up until time n: x_1, ..., x_n
(2) An ARMA model has been fitted to the data
(3) All parameters of the model (φ, θ, σ²) have been estimated
(4) The process Z_t is known up until time n
The k-step ahead forecast X̂_{n+k|n} is one method of forecasting/predicting X_{n+k} using the given observations up until time n:

X̂_{n+k|n} = E[X_{n+k} | X_n, ..., X_1]

This is obtained by:
- Replacing the random variables X_1, ..., X_n by their observed values x_1, ..., x_n
- Replacing the random variables X_{n+1}, ..., X_{n+k−1} by their forecast values X̂_{n+1|n}, ..., X̂_{n+k−1|n}
- Replacing the random variables Z_{n+1}, ..., Z_{n+k} by their expectations, i.e. 0
- If Z_1, ..., Z_n are unknown, replacing Z_1, ..., Z_n by the residuals Ẑ_1, ..., Ẑ_n, where Ẑ_i = E[Z_i | X_n, ..., X_1]

For the one-step forecast X̂_{n+1|n} we have:

X̂_{n+1|n} = E[X_{n+1} | X_n, ..., X_1]
          = E[φ_1X_n + ... + φ_pX_{n+1−p} + Z_{n+1} + θ_1Z_n + ... + θ_qZ_{n+1−q} | X_n, ..., X_1]
          = φ_1X_n + ... + φ_pX_{n+1−p} + θ_1Z_n + ... + θ_qZ_{n+1−q}

For the two-step forecast X̂_{n+2|n} we have:

X̂_{n+2|n} = E[X_{n+2} | X_n, ..., X_1]
          = E[φ_1X_{n+1} + ... + φ_pX_{n+2−p} + Z_{n+2} + θ_1Z_{n+1} + θ_2Z_n + ... + θ_qZ_{n+2−q} | X_n, ..., X_1]
          = φ_1X̂_{n+1|n} + ... + φ_pX_{n+2−p} + θ_2Z_n + ... + θ_qZ_{n+2−q}

In practice we do not observe Z_t, so if there are MA terms in the model, then there are more values of Z_t than of X_t and there is no way of determining all of them from the data. Consider the MA(1) process:

X_t = Z_t + 0.5Z_{t−1}   ⇔   Z_t = X_t − 0.5Z_{t−1}

Then by repeated substitution we have:

Z_n = Σ_{j=0}^{n−1} (−0.5)^j X_{n−j} + (−0.5)^n Z_0

To determine Z_n we need Z_0 first; one simple way is to assume that Z_0 = 0. If the process is invertible, then this assumption will have a negligible effect on X̂_{n+k|n} if n is large, since (−0.5)^n will be small.

Best Linear Predictor

Under assumptions (1), (2) & (3), the best linear predictor P_nX_{n+h} of X_{n+h} has the form:

P_nX_{n+h} = a_0 + a_1X_n + a_2X_{n−1} + ... + a_nX_1

We need the values of a_0, ..., a_n that minimise the mean squared error:

MSE = E[(X_{n+h} − P_nX_{n+h})²]

The general solution is found from the n+1 first-order conditions:

∂MSE/∂a_0 = 0  ⇒  E[X_{n+h} − P_nX_{n+h}] = 0
∂MSE/∂a_1 = 0  ⇒  E[(X_{n+h} − P_nX_{n+h})X_n] = 0
...
∂MSE/∂a_n = 0  ⇒  E[(X_{n+h} − P_nX_{n+h})X_1] = 0

Note that due to the very first condition, the expected prediction error is:

E[X_{n+h} − P_nX_{n+h}] = 0

i.e. the prediction is unbiased.

The first equation can be simplified to:

a_0 = μ(1 − a_1 − a_2 − ... − a_n)

while the second can be simplified using a trick: subtracting E[X_{n+h} − P_nX_{n+h}]E[X_n] = 0 turns the expectation into a covariance:

E[(X_{n+h} − P_nX_{n+h})X_n] = Cov(X_{n+h} − P_nX_{n+h}, X_n)
                             = γ(h) − a_1γ(0) − a_2γ(1) − ... − a_nγ(n−1) = 0

Then, applying this trick to every subsequent MSE-minimising condition, we end up with a system of n equations (excluding the first one) that is very similar to the process of finding the PACF coefficients. This system can be represented in matrix form:

Γ_n a_n = γ_n(h)

where:

      | γ(0)    γ(1)    ...  γ(n−1) |        | a_1 |           | γ(h)     |
Γ_n = | γ(1)    γ(0)    ...  γ(n−2) |, a_n = | a_2 |, γ_n(h) = | γ(h+1)   |
      | ...                         |        | ... |           | ...      |
      | γ(n−1)  γ(n−2)  ...  γ(0)   |        | a_n |           | γ(h+n−1) |

Therefore, the solution is:

a_n = Γ_n^{−1} γ_n(h)

Once this is known, a_0 can be found by rewriting the first equation in matrix form:

a_0 = μ(1 − 1^T a_n)

where 1^T represents a row vector of 1s.



Part 3: Brownian Motion

Definitions
A stochastic process {X_t, t ≥ 0} is a Brownian motion process if:
(1) X_0 = 0
(2) {X_t, t ≥ 0} has stationary and independent increments
(3) For all t > 0, X_t ~ N(0, σ²t)
The conditions (2) and (3) together are equivalent to:
- X_t ~ N(0, σ²t) for t > 0
- {X_t, t ≥ 0} has stationary increments and cov(X_s, X_t − X_s) = 0 for 0 ≤ s < t, since if two normal r.v. have 0 covariance then they are independent

A standard Brownian motion {B_t, t ≥ 0} is one whose volatility is one, i.e. σ = 1. Any Brownian motion {X_t, t ≥ 0} can be standardised by setting:

B_t = X_t/σ

Note that Brownian motion is continuous for all values of t.

A Brownian motion process exhibits some strange behaviour:
- A Brownian motion is continuous w.r.t. time t everywhere, but differentiable nowhere
- Brownian motion will eventually hit any and every real value, no matter how large or how negative. Likewise, no matter how large it gets, it will always (with probability 1) come back down to zero at some future time
- Once a Brownian motion hits a certain value, it immediately hits it again infinitely often, and will continue to return after arbitrarily large times
- Brownian motion is fractal, i.e. it looks the same regardless of the scale at which you examine it

Properties of Brownian Motion
Consider a Brownian motion {X_t, t ≥ 0} with volatility σ:
- For any s with 0 < s < t, X_s and X_t are NOT independent. However, by independence of increments, X_s and X_t − X_s are independent.
- This can be shown by finding the covariance between X_s and X_t:

cov(X_s, X_t) = cov(X_s, X_s + (X_t − X_s)) = cov(X_s, X_s) + cov(X_s, X_t − X_s) = σ²s

- For any s < t, X_s and X_t − X_s are both normally distributed, i.e.:

X_s ~ N(0, σ²s),   X_t − X_s ~ N(0, σ²(t − s))

Brownian Motion and Symmetric Random Walk

Define a symmetric random walk using the random variables:

Y_i = +1 w.p. 0.5,   Y_i = −1 w.p. 0.5

with mean E[Y_i] = 0 and var(Y_i) = 1.
Now divide the interval [0, t] into equal subintervals of length Δt, so that there is a total of t/Δt subintervals. Suppose that in each subinterval the process can go either up or down by size Δx. Now let X(t) be the position at time t; then:

X(t) = Δx (Y_1 + Y_2 + ... + Y_{t/Δt})

Then, if we let Δx = σ√Δt and let Δt → 0, the limiting process of X(t) is a Brownian motion. This is because:
- {X(t), t ≥ 0} has independent increments, since the Y_i are independent (changes in the value of the random walk over non-overlapping time intervals are independent)
- {X(t), t ≥ 0} has stationary increments, since the distribution of the change in position of the random walk over any time interval depends only on the length of the interval
- X(t) has an approximately normal distribution with mean 0 and variance σ²t as Δt converges to 0, by the Central Limit Theorem applied to i.i.d. random variables
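The construction above translates directly into a simulation: take steps of size Δx = σ√Δt with equal probability of each sign. The step count, σ and the seed below are illustrative assumptions.

```python
import math
import random

def brownian_path(t=1.0, sigma=1.0, n_steps=1000, seed=42):
    """Approximate Brownian motion on [0, t] by a scaled symmetric random walk."""
    random.seed(seed)
    dt = t / n_steps
    dx = sigma * math.sqrt(dt)            # Delta_x = sigma * sqrt(Delta_t)
    path = [0.0]                          # X(0) = 0
    for _ in range(n_steps):
        step = dx if random.random() < 0.5 else -dx   # +dx or -dx, w.p. 1/2 each
        path.append(path[-1] + step)
    return path

path = brownian_path()
```

By the CLT argument above, refining n_steps makes the marginal distribution of X(t) approach N(0, σ²t).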

Brownian Motion with Drift

A stochastic process {X_t, t ≥ 0} is a Brownian motion process with drift coefficient μ and variance parameter σ² if it satisfies:
- X_0 = 0
- {X_t, t ≥ 0} has stationary and independent increments
- For all t > 0, X_t ~ N(μt, σ²t)
A Brownian motion with drift can be converted to a standard Brownian motion by defining:

B_t = (X_t − μt)/σ

Similarly, a standard Brownian motion can be converted to a Brownian motion with drift by defining:

X_t = μt + σB_t

Geometric Brownian Motion
Let {X_t, t ≥ 0} be a Brownian motion process with drift μ and volatility σ²; then the process {Y_t, t ≥ 0} defined by:

Y_t = exp(X_t)

is called a geometric Brownian motion. This is useful when you don't want negative values.

Gaussian Processes

A stochastic process {X_t, t ≥ 0} is a Gaussian, or normal, process if X = (X_{t_1}, X_{t_2}, ..., X_{t_n}) has a multivariate normal distribution for all t_1, t_2, ..., t_n, i.e. the values at any finite set of times are jointly normal.
A Brownian motion belongs to the class of Gaussian processes, since X_{t_1}, X_{t_2}, ..., X_{t_n} can be expressed as a linear combination of independent normal random variables:

X_{t_1} = X_{t_1}
X_{t_2} = X_{t_1} + (X_{t_2} − X_{t_1})
...
X_{t_n} = X_{t_1} + (X_{t_2} − X_{t_1}) + ... + (X_{t_n} − X_{t_{n−1}})

Remember that X_s and X_t are not independent, but their increments are!
Differential Form of Brownian Motion
Consider a standard Brownian motion {B_t, t ≥ 0} and define ΔB_t as the change in B_t over Δt. Using the properties of Brownian motion, we can write this as:

ΔB_t = B_{t+Δt} − B_t = Z√Δt

where Z denotes a standard normal random variable.
Taking the differential limit Δt → 0, we have the differential form:

dB_t = Z√dt

with:

E[dB_t] = 0   and   var(dB_t) = dt

Consider the behaviour of (dB_t)²:

E[(ΔB_t)²] = var(ΔB_t) = Δt
var((ΔB_t)²) = E[(ΔB_t)⁴] − (E[(ΔB_t)²])²
             = E[Z⁴](Δt)² − (Δt)²      (since ΔB_t = Z√Δt)
             = 3(Δt)² − (Δt)² = 2(Δt)² = o(Δt)

using E[Z⁴] = 3, which follows since Z² ~ χ²_1 with mean 1 and variance 2.
In differential form, these can be expressed as:

E[(dB_t)²] = dt   and   var((dB_t)²) = o(dt)

We can essentially treat o(dt) as zero, so the variance is zero and, loosely speaking, (dB_t)² is just the constant:

(dB_t)² = dt

Now consider the behaviour of dt·dB_t:

E[Δt·ΔB_t] = Δt·E[ΔB_t] = 0
var(Δt·ΔB_t) = (Δt)²·var(ΔB_t) = (Δt)³ = o(Δt)

Then in differential form, E[dt·dB_t] = 0 and var(dt·dB_t) = o(dt). Taking o(dt) as zero, this is again reduced to a constant:

dt·dB_t = 0

Finally, (dt)² = o(dt) = 0.
The following multiplication table summarises the results:

  ×      dt    dB_t
  dt     0     0
  dB_t   0     dt

Stochastic Differential Equations

Assume that a stochastic process X_t satisfies a stochastic differential equation (SDE) of the form:

dX_t = μ(X_t, t) dt + σ(X_t, t) dB_t

Consider another stochastic process defined as:

Y_t = F(X_t, t)

Then the stochastic differential equation satisfied by Y_t is given by Itô's formula:

dY_t = ∂F/∂x|_{x=X_t} dX_t + ∂F/∂t|_{x=X_t} dt + (1/2) ∂²F/∂x²|_{x=X_t} σ²(X_t, t) dt

Proof:
The differential of the second-order Taylor series expansion of a function of two variables is:

df(x, y) = f_x dx + f_y dy + (1/2)(f_xx dx² + f_yy dy² + 2f_xy dx dy)

Applying this to Y_t gives:

dY_t = ∂F/∂x dX_t + ∂F/∂t dt + (1/2) ∂²F/∂x² (dX_t)² + (1/2) ∂²F/∂t² (dt)² + ∂²F/∂x∂t dX_t dt

with all derivatives evaluated at x = X_t. However, we know from the previous section that dt·dB_t = (dt)² = 0, so:

dt·dX_t = μ(X_t, t)(dt)² + σ(X_t, t) dB_t dt = 0

and:

(dX_t)² = (μ(X_t, t) dt + σ(X_t, t) dB_t)²
        = μ²(X_t, t)(dt)² + σ²(X_t, t)(dB_t)² + 2μ(X_t, t)σ(X_t, t) dB_t dt
        = σ²(X_t, t) dt

Therefore, the expansion reduces to Itô's formula.
E.g. consider a Brownian motion {X_t, t ≥ 0} with 0 drift and variance σ²; find the SDE for tX_t².
We have the SDE for X_t:

dX_t = σ dB_t

Let Y_t = tX_t² = F(X_t, t), where F(x, t) = tx², so the derivatives are:

∂F/∂x = 2tx,   ∂F/∂t = x²,   ∂²F/∂x² = 2t

Then using Itô's formula, the SDE for tX_t² is given by:

d(tX_t²) = 2tX_t dX_t + X_t² dt + (1/2)σ²(2t) dt
         = 2σtX_t dB_t + (X_t² + σ²t) dt
Stochastic Integration
Consider a Brownian motion {X_t, t ≥ 0} with zero drift and variance σ². Let f be a function with a continuous derivative on [a,b]. The stochastic process defined by the stochastic/Itô integral is:

∫_a^b f(t) dX_t = lim_{max(t_k − t_{k−1}) → 0} Σ_{k=1}^{n} f(t_{k−1})(X_{t_k} − X_{t_{k−1}})

where a = t_0 < t_1 < ... < t_n = b is a partition of [a,b].

This stochastic integral satisfies the properties:
- The mean of the Itô integral is zero, i.e.:

E[∫_a^b f(t) dX_t] = lim Σ_{k=1}^{n} f(t_{k−1}) E[X_{t_k} − X_{t_{k−1}}] = 0

Note that the "functions of t" here include stochastic integrands, e.g. E[∫ B_t^n dB_t] = 0.
- Thus, the variance and the expectation of the square are equal:

var(∫_a^b f(t) dX_t) = lim Σ_{k=1}^{n} f²(t_{k−1}) var(X_{t_k} − X_{t_{k−1}})
                     = lim Σ_{k=1}^{n} σ² f²(t_{k−1})(t_k − t_{k−1})
                     = σ² ∫_a^b f²(t) dt

Part 4: Simulation
Continuous Random Variables Simulation
When simulating continuous random variables, we work with the cumulative distribution function of the random variable. The cumulative probabilities F(X) always have a Uniform(0,1) distribution, and conversely, if U ~ U(0,1) then F⁻¹(U) is distributed according to F. This means that the probability that F(X) lies in [a,b] is the same as the probability that it lies in [c,d] whenever the two intervals have the same length. Therefore, the first step in simulation is usually to generate random variables from a Uniform(0,1).
Pseudo-Random Numbers
The procedure to generate pseudo-random numbers, i.e. approximate draws from a Uniform(0,1), is:
1. Start with a seed X_0 and specify positive integers a, c and m, which are usually given
2. Generate pseudo-random numbers recursively using:

X_{n+1} = (aX_n + c) mod m

3. Then X_{n+1}/m will be an approximation to a Uniform(0,1) random variable
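The recursion above is a linear congruential generator. A minimal sketch, where the constants a, c, m are illustrative (glibc-style) choices rather than values prescribed by the notes:

```python
def lcg(seed, n, a=1103515245, c=12345, m=2 ** 31):
    """Return n pseudo-random uniforms X_k / m from X_{k+1} = (a*X_k + c) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)            # rescale the integer state into [0, 1)
    return out

u = lcg(seed=1, n=5)
```

The same seed always reproduces the same stream, which is exactly what makes simulation studies repeatable.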

Inverse Transform Method

Consider a continuous random variable with cumulative distribution function F_X and a Uniform(0,1) random variable U. Define:

X = F_X^{−1}(U)

Then X will have the distribution function:

F_X(x) = Pr(X ≤ x) = Pr(F_X^{−1}(U) ≤ x)
       = Pr(U ≤ F_X(x))         (since F_X is strictly increasing)
       = F_U(F_X(x)) = F_X(x)   (using F_U(y) = y)

Then the inverse transform procedure to generate an r.v. from a cumulative distribution F is:
1. Compute F_X^{−1}, from the p.d.f. or c.d.f., if possible
2. Generate a Uniform(0,1) random variable U
3. Set X = F_X^{−1}(U); then X will be a draw from the distribution F
Note that for the inverse transform method to work, we must be able to calculate the inverse of the c.d.f., i.e. F_X^{−1} must have an explicit expression.
E.g. to simulate an exponential random variable, first find F_X^{−1}:

F_X(x) = ∫_0^x λe^{−λs} ds = 1 − e^{−λx}   ⇒   F_X^{−1}(y) = −(1/λ) ln(1 − y)

Suppose we generate a random variable U from Uniform(0,1). Then we set:

x = F^{−1}(U) = −(1/λ) ln(1 − U), which has the same distribution as −(1/λ) ln U

(since 1 − U is also Uniform(0,1)). Therefore x is a random variable with an exponential distribution.
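The exponential example above can be sketched as follows; the option of supplying u explicitly is an addition for reproducibility.

```python
import math
import random

def exponential_sample(lam, u=None):
    """One Exp(lam) draw via the inverse transform X = -ln(U)/lam."""
    if u is None:
        u = random.random()          # U ~ Uniform(0,1)
    return -math.log(u) / lam

x = exponential_sample(lam=2.0, u=0.5)   # -ln(0.5)/2, roughly 0.347
```

Feeding in U = 0.5 returns the median of the Exp(2) distribution, ln 2 / 2, which is a handy sanity check on the inversion.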



Discrete Random Variable Simulation

The inverse transform method can also be applied to simulate discrete random variables. Consider one with probability mass function:

Pr(X = x_j) = P_j for j = 1, 2, ...,   with Σ_j P_j = 1

Then the procedure to simulate an r.v. from this p.m.f. is:
1. Generate a Uniform(0,1) random variable U
2. Set:

X = x_1 if U ≤ P_1
X = x_2 if P_1 < U ≤ P_1 + P_2
...
X = x_j if Σ_{i=1}^{j−1} P_i < U ≤ Σ_{i=1}^{j} P_i

E.g. simulate random variables from a geometric distribution.
Consider the probability mass function for a geometric random variable:

P_j = Pr(X = j) = p(1 − p)^{j−1},   j ≥ 1

Notice that, summing the geometric series:

Σ_{i=1}^{j−1} P_i = Pr(X ≤ j − 1) = p · (1 − (1 − p)^{j−1}) / (1 − (1 − p)) = 1 − (1 − p)^{j−1}

Then we have X = j when:

Σ_{i=1}^{j−1} P_i < U ≤ Σ_{i=1}^{j} P_i
1 − (1 − p)^{j−1} < U ≤ 1 − (1 − p)^j
(1 − p)^j ≤ 1 − U < (1 − p)^{j−1}

Since 1 − U is also a Uniform(0,1) random variable, this is equivalent to:

(1 − p)^j ≤ U < (1 − p)^{j−1}
j ln(1 − p) ≤ ln U < (j − 1) ln(1 − p)
j − 1 < ln U / ln(1 − p) ≤ j      (since ln(1 − p) < 0)

Therefore, to generate a random variable from a geometric distribution:
1. Generate a uniform random number U
2. Set X = j, where j is the first integer for which:

j ≥ ln U / ln(1 − p)
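Since j is the first integer at or above ln U / ln(1 − p), the rule is just a ceiling; a minimal sketch (the optional u argument is an addition for reproducibility):

```python
import math
import random

def geometric_sample(p, u=None):
    """Geometric draw on {1, 2, ...}: the first integer j >= ln(u)/ln(1-p)."""
    if u is None:
        u = random.random()
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))

x = geometric_sample(p=0.5, u=0.3)   # ln(0.3)/ln(0.5) is about 1.74, so x = 2
```

The max(1, ...) guard only matters in the boundary case where u is so close to 1 that the ratio rounds to zero.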

Acceptance-Rejection Method
Suppose we have a method, e.g. inverse transform, to simulate an r.v. with density g(y). Then we
can use this as the basis for simulating from a continuous distribution with density f(x).
The procedure to simulate random variables using the rejection method is:
1. Choose a distribution g from which you know how to simulate outcomes
2. Let c be some constant such that c \ge f(y)/g(y) for all y
3. Simulate a random variable Y with density function g(y)
4. Simulate a Uniform(0,1) random variable U
5. Accept this as the random number, i.e. set X = Y, if:

U \le \frac{f(Y)}{c\,g(Y)}

6. Otherwise, reject and return to step 3.
Therefore, the value for X is Y_N, where N is the number of iterations until a random number is
accepted. We want to be as efficient as possible, i.e. minimise the number of iterations, by:
• choosing a density g(y) similar in shape to f(y), e.g. an exponential for a normal
• choosing the smallest value of c that satisfies the inequality, found using calculus:

c = \max_y \frac{f(y)}{g(y)}
E.g. simulate the absolute value of a standard normal random variable, X = |Z|.

Firstly, X = |Z| has the density:

f(x) = \frac{2}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \quad x > 0

1. Let another random variable Y be from the density g(x) = e^{-x}, x > 0; note that this is
comparable in shape to f(x)
2. Choose the smallest value of c such that c \ge f(y)/g(y):

c = \max_x \frac{f(x)}{g(x)} = \max_x \sqrt{\frac{2}{\pi}}\exp\left(x - \frac{x^2}{2}\right) = \sqrt{\frac{2}{\pi}}\,e^{1/2} = \sqrt{\frac{2e}{\pi}}

since x - x^2/2 is maximised at x = 1. The acceptance probability is then:

\frac{f(x)}{c\,g(x)} = \exp\left(-\frac{(x-1)^2}{2}\right)

where the exponential part will always be less than or equal to 1.
3. Generate U_1 and U_2 from Unif(0,1)
4. Compute Y from the density g using the inverse transform and U_1, i.e. Y = -\ln U_1
5. Now check if:

U_2 \le \frac{f(Y)}{c\,g(Y)} = \exp\left(-\frac{(Y-1)^2}{2}\right) = \exp\left(-\frac{(-\ln U_1 - 1)^2}{2}\right)

6. If true, then set X = Y = -\ln U_1; if false, return to step 3 and repeat.
7. To now generate a standard normal random variable Z, generate U_3 and set:

Z = \begin{cases} X & \text{if } U_3 > 0.5 \\ -X & \text{if } U_3 \le 0.5 \end{cases}
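The half-normal rejection scheme, plus the random sign of step 7, can be sketched in Python (function names are illustrative):

```python
import math
import random

def half_normal_rejection(rng):
    """Simulate |Z| by rejection from g(x) = e^{-x}; accept with prob exp(-(y-1)^2/2)."""
    while True:
        y = -math.log(rng.random())       # Y ~ Exp(1) via inverse transform
        u2 = rng.random()
        if u2 <= math.exp(-((y - 1.0) ** 2) / 2.0):
            return y

def standard_normal_rejection(rng):
    """Attach a random sign to |Z| to recover Z."""
    x = half_normal_rejection(rng)
    return x if rng.random() > 0.5 else -x

rng = random.Random(7)
draws = [standard_normal_rejection(rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)                # should settle near 0
var = sum(d * d for d in draws) / len(draws)  # should settle near 1
```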


Simulation Using Distributional Relationships


Gamma Distribution
Recall that the waiting time to the nth event in continuous Markov chains has a Gamma distribution.
To simulate random variables from a Gamma(n, \lambda) distribution, the procedure is:
1. Generate n independent Uniform(0,1) random variables U_1, U_2, \ldots, U_n
2. Simulate exponential random variables using the inverse transform method:

F^{-1}(U) = -\frac{1}{\lambda}\ln U

3. Since the sum of n independent Exp(\lambda) random variables has a Gamma distribution, set:

X = -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \sim \text{Gamma}(n, \lambda)
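The three steps above can be sketched in Python (the function name is an illustrative choice):

```python
import math
import random

def gamma_n_lambda(n, lam, rng):
    """Gamma(n, lam) as a sum of n Exp(lam) draws: -(1/lam) * sum of log-uniforms."""
    return -sum(math.log(rng.random()) for _ in range(n)) / lam

rng = random.Random(3)
draws = [gamma_n_lambda(5, 2.0, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)  # should settle near n/lam = 2.5
```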

Chi-Squared Distribution
The sum of squares of n standard normal r.v.s has a chi-squared distribution with n degrees of freedom:

\sum_{i=1}^{n} Z_i^2 \sim \chi^2_n

Alternatively, a chi-squared distribution with an even degree of freedom 2k is equivalent to a
Gamma(k, 1/2) distribution. If the degree of freedom is odd, i.e. 2k+1, we can add on an extra Z^2
term, where Z is standard normal. That is:

-2\sum_{i=1}^{k}\ln U_i \sim \chi^2_{2k}

Z^2 - 2\sum_{i=1}^{k}\ln U_i \sim \chi^2_{2k+1}
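The even-degree case can be sketched directly from log-uniforms (the function name is illustrative; the odd case would add one squared standard normal):

```python
import math
import random

def chi2_even(k, rng):
    """Chi-squared with 2k d.f.: -2 * sum of k log-uniforms, i.e. a Gamma(k, 1/2) draw."""
    return -2.0 * sum(math.log(rng.random()) for _ in range(k))

rng = random.Random(11)
draws = [chi2_even(3, rng) for _ in range(50_000)]  # chi-squared with 6 d.f.
mean = sum(draws) / len(draws)                      # should settle near 6
```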

Poisson Distribution
Recall that the number of events within one period, when inter-event times follow an Exp(\lambda)
distribution, is Poi(\lambda). To simulate random variables from a Poisson(\lambda) distribution, the
procedure is:
1. Generate the cumulative arrival times -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i, which are Gamma(n, \lambda) random variables, using the steps above
2. Since the waiting time until the nth event has a Gamma(n, \lambda) distribution, which is a sum of
independent Exp(\lambda) random variables, we set:

X = \max\left\{n : -\frac{1}{\lambda}\sum_{i=1}^{n}\ln U_i \le 1\right\} \sim \text{Poi}(\lambda)

i.e. the total number of arrivals within one period.
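Counting arrivals whose cumulative Exp(lambda) inter-arrival times fit within one period gives the sketch below (the function name is an illustrative choice):

```python
import math
import random

def poisson_via_exponentials(lam, rng):
    """Count the Exp(lam) arrival times that fall within one period of length 1."""
    total, n = 0.0, 0
    while True:
        total += -math.log(rng.random()) / lam  # next inter-arrival time
        if total > 1.0:
            return n
        n += 1

rng = random.Random(5)
draws = [poisson_via_exponentials(4.0, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)  # should settle near lambda = 4
```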

Normal Distribution
One method of simulating random variables from a normal distribution is the Box-Muller approach:
1. Generate two random variables U_1 and U_2 from Uniform(0,1)
2. Set:

X = (-2\ln U_1)^{1/2}\cos(2\pi U_2)
Y = (-2\ln U_1)^{1/2}\sin(2\pi U_2)

Then X and Y are a pair of independent standard normal random variables.
3. A pair of uncorrelated N(\mu, \sigma^2) random variables can then be derived using \mu + \sigma X and \mu + \sigma Y
4. Note that if X is normal, then to generate a lognormal random variable L we set L = e^X
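The Box-Muller transform can be sketched as follows (the function name is an illustrative choice):

```python
import math
import random

def box_muller(rng):
    """One Box-Muller step: returns a pair of independent standard normals."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

rng = random.Random(13)
pairs = [box_muller(rng) for _ in range(50_000)]
xs = [x for x, _ in pairs]
mean_x = sum(xs) / len(xs)                # should settle near 0
var_x = sum(x * x for x in xs) / len(xs)  # should settle near 1
lognormal_sample = math.exp(xs[0])        # L = e^X is lognormal
```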

Monte Carlo Simulation


Suppose that X = (X_1, \ldots, X_n) is a random vector with a given joint density function f(x_1, \ldots, x_n) and
we want to compute:

E[g(X)] = \int \cdots \int g(x_1, \ldots, x_n) f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n

but with a function g for which this is impossible to compute analytically. Then we can approximate this
integral by using Monte Carlo simulation.

Let X^{(i)} and Y^{(i)} be the ith simulated sample path of X and Y. Then the MC simulation procedure is:
1. Generate a random vector X^{(1)} = (X_1^{(1)}, \ldots, X_n^{(1)}) with joint density f(x_1, \ldots, x_n) and compute
Y^{(1)} = g(X^{(1)})
2. Generate a second random vector X^{(2)} = (X_1^{(2)}, \ldots, X_n^{(2)}), independent from step 1, with joint
density f(x_1, \ldots, x_n), and compute Y^{(2)} = g(X^{(2)})
3. Repeat this process until r (a fixed number) i.i.d. random vectors are generated:

Y^{(i)} = g(X^{(i)}), \quad i = 1, 2, \ldots, r

4. Estimate E[g(X)] by using the arithmetic average of the generated Ys:

\bar{Y} = \frac{Y^{(1)} + \cdots + Y^{(r)}}{r}

This method works due to the strong law of large numbers:

\lim_{r\to\infty}\frac{1}{r}\sum_{i=1}^{r} Y^{(i)} = E[g(X)] = E[g(X_1, \ldots, X_n)]
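The procedure can be sketched generically in Python. The particular choice of X ~ Uniform(0,1) and g(x) = e^x is a hypothetical illustration (not from the notes), picked because the true value E[e^U] = e - 1 is known:

```python
import math
import random

def monte_carlo_mean(g, sample_x, r, rng):
    """Estimate E[g(X)] by averaging Y_i = g(X_i) over r i.i.d. draws of X."""
    return sum(g(sample_x(rng)) for _ in range(r)) / r

rng = random.Random(17)
# Illustration: X ~ Uniform(0,1), g(x) = e^x, so E[g(X)] = e - 1 = 1.71828...
estimate = monte_carlo_mean(math.exp, lambda rng: rng.random(), 200_000, rng)
```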

Expectation and Variance


\bar{Y} is an unbiased estimate of E[g(X)]:

E[\bar{Y}] = \frac{1}{r}\sum_{i=1}^{r} E[Y^{(i)}] = E[g(X)]

since each Y^{(i)} is independent and identically distributed with the distribution of g(X_1, \ldots, X_n).
The variance of \bar{Y} is given by:

\text{var}(\bar{Y}) = \text{var}\left(\frac{1}{r}\sum_{i=1}^{r} Y^{(i)}\right) = \frac{1}{r^2}\sum_{i=1}^{r}\text{var}(Y^{(i)}) = \frac{1}{r}\text{var}(Y^{(i)})

Note that usually we do not know \text{var}(Y^{(i)}), so we estimate it using the sample estimate:

\widehat{\text{var}}(Y) = \frac{1}{r-1}\sum_{i=1}^{r}\left(Y^{(i)} - \bar{Y}\right)^2

In the next 3 sections, we will describe some techniques to REDUCE this variance


Antithetic Variables
Reducing the variance of the estimate using antithetic variables involves generating pairs of estimates with
negative correlation and then averaging these estimates to obtain a final estimate.
Assume that r is even. The antithetic variates procedure is:
1. Generate a set of n variates X_1, \ldots, X_n and determine Y^{(1)} = g(X_1, \ldots, X_n)
2. Generate a set of n variates X_1^*, \ldots, X_n^*, which are negatively correlated with X_1, \ldots, X_n, and
determine Y^{(2)} = g(X_1^*, \ldots, X_n^*)
3. Repeat steps 1 and 2 r/2 times to form Y^{(1,1)}, \ldots, Y^{(1,r/2)} and Y^{(2,1)}, \ldots, Y^{(2,r/2)}
4. Calculate the arithmetic averages of the Ys:

\bar{Y}^1 = \frac{2}{r}\sum_{i=1}^{r/2} Y^{(1,i)} \qquad \text{and} \qquad \bar{Y}^2 = \frac{2}{r}\sum_{i=1}^{r/2} Y^{(2,i)}

5. Use:

\bar{Y}_{AV} = \frac{\bar{Y}^1 + \bar{Y}^2}{2}

as the final estimate for E[g(X)].
Using this method, as long as the correlation between Y^{(1)} and Y^{(2)} is negative, the variance will
be reduced. One example of this is by using:
• X ~ Uniform(0,1) for the first set of n variates, e.g. Y^{(1)} = g(X)
• 1-X, which is also Uniform(0,1), for the second set of n variates, e.g. Y^{(2)} = g(1-X)
Then as long as g(X) is a monotonic increasing/decreasing function, we will have a negative
correlation, since:

\text{corr}(X, 1-X) = -1

To show why this method reduces the variance, consider the variance of the estimator using the
antithetic variable:

\text{var}(\bar{Y}_{AV}) = \text{var}\left(\frac{\bar{Y}^1 + \bar{Y}^2}{2}\right) = \frac{1}{4}\left[\text{var}(\bar{Y}^1) + \text{var}(\bar{Y}^2) + 2\,\text{cov}(\bar{Y}^1, \bar{Y}^2)\right] = \frac{1}{2}\text{var}(\bar{Y}^1)\left(1+\rho\right)

since \text{var}(\bar{Y}^1) = \text{var}(\bar{Y}^2), where \rho is the correlation between \bar{Y}^1 and \bar{Y}^2. With \text{var}(\bar{Y}^1) = \frac{2}{r}\text{var}(Y^{(i)}), this gives:

\text{var}(\bar{Y}_{AV}) = (1+\rho)\,\frac{\text{var}(Y^{(i)})}{r} \le \frac{\text{var}(Y^{(i)})}{r} = \text{var}(\bar{Y}) \qquad \text{for } \rho \le 0

i.e. whenever the correlation is negative, the antithetic estimator has a smaller variance than the plain
Monte Carlo estimator based on the same number r of evaluations.
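Using the U and 1-U pairing, the method can be sketched as follows (the function name is illustrative; g(x) = e^x with known mean e - 1 is a hypothetical test function, not from the notes):

```python
import math
import random

def antithetic_estimate(g, r, rng):
    """Average g(U) and g(1-U) over r/2 antithetic pairs of Uniform(0,1) draws."""
    assert r % 2 == 0
    total = 0.0
    for _ in range(r // 2):
        u = rng.random()
        total += g(u) + g(1.0 - u)  # negatively correlated when g is monotonic
    return total / r

rng = random.Random(19)
est = antithetic_estimate(math.exp, 100_000, rng)  # E[e^U] = e - 1
```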


Control Variates
If we wish to evaluate the expected value E[g(X)], and there is a function f such that the expected
value \mu = E[f(X)] can be evaluated analytically, then we can evaluate E[g(X)] using:

E[g(X)] = E[g(X)] + a\left(E[f(X)] - \mu\right) = E\left[g(X) + a\left(f(X) - \mu\right)\right] \qquad (1)

where a is a parameter we can choose.

Therefore, instead of evaluating E[g(X)] directly, we evaluate (1) using the control variate f:

Y^{CV} = \frac{1}{r}\sum_{i=1}^{r}\left[g(X^{(i)}) + a\left(f(X^{(i)}) - \mu\right)\right]

Then, this estimator will have the variance:

\text{var}(Y^{CV}) = \frac{1}{r^2}\sum_{i=1}^{r}\text{var}\left(g(X^{(i)}) + a f(X^{(i)})\right) = \frac{1}{r}\left[\text{var}(g(X^{(i)})) + a^2\,\text{var}(f(X^{(i)})) + 2a\,\text{cov}(g(X^{(i)}), f(X^{(i)}))\right]

This variance can be minimised by solving:

\frac{\partial}{\partial a}\,\text{var}(Y^{CV}) = 0

with the solution being the best choice of a:

a^* = -\frac{\text{cov}(g(X^{(i)}), f(X^{(i)}))}{\text{var}(f(X^{(i)}))}

Substituting this value back, we have that the minimised variance is:

\text{var}(Y^{CV}) = \frac{\text{var}(g(X^{(i)}))}{r} - \frac{\text{cov}^2(g(X^{(i)}), f(X^{(i)}))}{r\,\text{var}(f(X^{(i)}))} \le \text{var}(\bar{Y})

Therefore, using a control variate has decreased the variance compared to the original estimator.
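A sketch with a* estimated from the sample itself; the choice g(x) = e^x with control f(x) = x, E[f(U)] = 1/2, is a hypothetical illustration (not from the notes):

```python
import math
import random

def control_variate_estimate(g, f, mu_f, r, rng):
    """Control-variate estimator with a = -cov(g,f)/var(f) estimated from the sample."""
    us = [rng.random() for _ in range(r)]
    gs = [g(u) for u in us]
    fs = [f(u) for u in us]
    g_bar = sum(gs) / r
    f_bar = sum(fs) / r
    cov_gf = sum((gi - g_bar) * (fi - f_bar) for gi, fi in zip(gs, fs)) / (r - 1)
    var_f = sum((fi - f_bar) ** 2 for fi in fs) / (r - 1)
    a = -cov_gf / var_f
    return g_bar + a * (f_bar - mu_f)  # correct g_bar using the known mean of f

rng = random.Random(23)
est = control_variate_estimate(math.exp, lambda x: x, 0.5, 100_000, rng)  # E[e^U] = e - 1
```

Because e^U and U are strongly correlated, the correction term removes most of the sampling noise.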


Importance Sampling
We can write:

E[g(X)] = \int g(x)\,f_X(x)\,dx

where f_X is the density of X.

Now consider another probability density h(x), equivalent to f(x) in the sense that the zero-probability
events agree under both densities, i.e. f(x) = 0 if and only if h(x) = 0. Let Y be the
random variable with density function h(x).
Then we can write:

E[g(X)] = \int g(x)\,\frac{f(x)}{h(x)}\,h(x)\,dx = E\left[g(Y)\,\frac{f(Y)}{h(Y)}\right]

Then, we can simulate Y_i from the density h(x) and estimate the expectation E[g(X)] with:

\bar{Y}_h = \frac{1}{r}\sum_{i=1}^{r} g(Y_i)\,\frac{f(Y_i)}{h(Y_i)}

If it is possible to select a probability density h(x) so that the random variable

g(X)\,\frac{f(X)}{h(X)}

has a smaller variance, then the estimator will be more efficient, i.e.:

\text{var}\left(g(X)\,\frac{f(X)}{h(X)}\right) < \text{var}(g(X))

To do this, we need to select a density h(x) such that the ratio of the two densities, i.e. f(x)/h(x), is
large when g(x) is small and vice versa.
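A sketch of the weighted estimator; the specific choices below are hypothetical illustrations (not from the notes): f = Uniform(0,1), g(x) = x^2 with E_f[g(X)] = 1/3, and h(y) = 2y on (0,1), sampled by the inverse transform Y = sqrt(U). Note f/h = 1/(2y) is large exactly where g is small, as the text suggests:

```python
import math
import random

def importance_sampling(g, f, h, sample_h, r, rng):
    """Estimate E_f[g(X)] as the average of g(Y_i) * f(Y_i)/h(Y_i), with Y_i ~ h."""
    return sum(
        g(y) * f(y) / h(y)
        for y in (sample_h(rng) for _ in range(r))
    ) / r

rng = random.Random(29)
est = importance_sampling(
    g=lambda x: x * x,
    f=lambda x: 1.0,                           # Uniform(0,1) density
    h=lambda y: 2.0 * y,                       # tilted density on (0,1)
    sample_h=lambda rng: math.sqrt(rng.random()),  # inverse transform for h
    r=100_000,
    rng=rng,
)  # should settle near 1/3
```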

Number of Simulations
Ideally, we want to carry out simulations as efficiently as possible, since the number of simulations
required for accuracy is quite large. Assume we generate n samples from a known distribution. To
estimate its mean, we will use the sample average as an estimator:

\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y^{(i)}

This is an unbiased estimator, i.e. E[\bar{Y}] = \mu. Its variance is given by:

\text{var}(\bar{Y}) = \frac{\text{var}(Y^{(i)})}{n}


However, the value of \text{var}(Y^{(i)}) is usually not known, so we estimate it using the sample variance
from the first k runs, where k < n and usually at least 30. Then the estimates of the mean and variance
become:

\bar{Y} = \frac{1}{k}\sum_{i=1}^{k} Y^{(i)} \qquad \widehat{\text{var}}(Y) = \frac{1}{k-1}\sum_{i=1}^{k}\left(Y^{(i)} - \bar{Y}\right)^2

For large values of n, we know that by the Central Limit Theorem this estimator will be
approximately normal:

\bar{Y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)

Then, we can select n such that the estimate is within a desired accuracy of the true mean, i.e. a
percentage of the true mean with a specified probability.
E.g. random variates are generated from a Gamma distribution:

f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\lambda x} \quad \text{for } x > 0

Twenty values are generated for each sample, and the mean and standard deviation of each sample
are given as:

Sample   1      2      3      4      5      6      7      8      9      10
Mean     12.01  11.79  13.43  14.01  11.44  11.19  11.24  12.42  12.91  12.29
SD       6.15   4.73   6.42   7.02   4.30   3.90   3.84   4.35   3.59   4.60

Determine the number of simulated values required for the estimate of the mean to be within 5% of
the true mean with 95% certainty.
We require n such that:

\Pr\left(\left|\bar{X} - \mu\right| < 0.05\mu\right) \ge 0.95
\Pr\left(\frac{\left|\bar{X} - \mu\right|}{\sigma/\sqrt{n}} < \frac{0.05\mu}{\sigma/\sqrt{n}}\right) \ge 0.95
\Pr\left(-\frac{0.05\mu\sqrt{n}}{\sigma} < Z < \frac{0.05\mu\sqrt{n}}{\sigma}\right) \ge 0.95
2\Pr\left(Z < \frac{0.05\mu\sqrt{n}}{\sigma}\right) - 1 \ge 0.95
\frac{0.05\mu\sqrt{n}}{\sigma} \ge 1.96

We do not know the true values of \mu and \sigma, so we must use the sample estimates. The sample
estimate for the mean is given by averaging the mean of each sample:

\bar{X} = \frac{1}{10 \times 20}\sum_{i=1}^{10}\sum_{j=1}^{20} X_{ij} = \frac{1}{10}\sum_{i=1}^{10}\bar{X}_i = 12.273


To estimate the sample variance, use:

s_i^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)

So the estimated variance is given by:

s^2 = \frac{1}{10 \times 20 - 1}\left(\sum_{i=1}^{10}\sum_{j=1}^{20} x_{ij}^2 - 10 \times 20\,\bar{X}^2\right)

where we have:

\sum_{i=1}^{10}\sum_{j=1}^{20} x_{ij}^2 = \sum_{i=1}^{10}\left[(20-1)s_i^2 + 20\bar{X}_i^2\right] = \sum_{i=1}^{10} 19 s_i^2 + \sum_{i=1}^{10} 20\bar{X}_i^2 = 4790.0596 + 30285.702 = 35075.7616

Therefore the estimated variance and standard deviation are:

s^2 = \frac{1}{199}\left(35075.7616 - 200 \times 12.273^2\right) = 24.87666 \qquad s = 4.98765

Substituting back into the previous equation, we have:

\frac{0.05 \times 12.273 \times \sqrt{n}}{4.98765} \ge 1.96 \implies n \ge 254

We can also estimate the parameters of the Gamma distribution, since we know that:

E[X] = \frac{\alpha}{\lambda} \qquad \text{var}(X) = \frac{\alpha}{\lambda^2}

Using our estimated values, we have:

\frac{\hat{\alpha}}{\hat{\lambda}} = 12.273 \qquad \frac{\hat{\alpha}}{\hat{\lambda}^2} = 24.87666 \implies \hat{\lambda} = 0.4934, \quad \hat{\alpha} = 6.0549
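The whole calculation can be reproduced in a few lines of Python (variable names are illustrative):

```python
import math

means = [12.01, 11.79, 13.43, 14.01, 11.44, 11.19, 11.24, 12.42, 12.91, 12.29]
sds = [6.15, 4.73, 6.42, 7.02, 4.30, 3.90, 3.84, 4.35, 3.59, 4.60]
m = 20  # values per sample

# Pooled sample mean across all 10 x 20 = 200 variates
x_bar = sum(means) / len(means)  # 12.273

# Recover sum of squares from each sample's SD and mean: sum x^2 = (m-1)s^2 + m*xbar^2
sum_sq = sum((m - 1) * sd**2 + m * mu**2 for sd, mu in zip(sds, means))
s2 = (sum_sq - len(means) * m * x_bar**2) / (len(means) * m - 1)  # 24.87666
s = math.sqrt(s2)

# 0.05 * mu * sqrt(n) / sigma >= 1.96  =>  n >= (1.96 * sigma / (0.05 * mu))^2
n_required = math.ceil((1.96 * s / (0.05 * x_bar)) ** 2)  # 254

# Method-of-moments Gamma parameters: mean = alpha/lambda, var = alpha/lambda^2
lam_hat = x_bar / s2         # ~ 0.4934
alpha_hat = x_bar * lam_hat  # ~ 6.0549
```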
