TCD-MT2014
Rozenn Dahyot
Contents
1 Introduction
    1.1 Who Forecasts?
    1.2 Why Forecast?
    1.3 How to Forecast?
    1.4 What are the Steps in the Forecasting Procedure?

2 Quantitative Forecasting
    2.1 When Can We Use Quantitative Methods?
    2.2 What are the Types of Quantitative Methods?
    2.3 Explanatory model and Time Series Forecasting

8 Linear Regression
    8.1 Regression with one explanatory variable
    8.2 Using Linear Regression to Make Forecasts
        8.2.1 Time as an explanatory variable
        8.2.2 Indicator variables: modelling seasonality
    8.3 Least Square algorithm in Matrix Form
        8.3.1 Least Squares for Linear regression
        8.3.2 Multiple Linear regression

15 ARIMA(p, d, q)
    15.1 Differencing a time series
    15.2 Integrating differencing into ARMA models
    15.3 Which ARIMA(p, d, q) model do I use?
Chapter 1
Introduction
Qualitative: little or no quantitative data is available; however, there is a lot of knowledge and expertise that is then used to make the forecast.
Example: an economist forecasting how a large increase in oil price will affect oil consumption.
Unpredictable: little or no information about the events to be forecast exists. Forecasts are made on the basis of
speculation.
Example: predicting the effect of a new, very cheap, non-polluting form of energy on the world economy.
In this course we will concentrate on Quantitative Forecasting.
Further, we will concentrate on the last 2 steps of the forecasting procedure: choosing and fitting
models, making forecasts and evaluating them. We will look at several different forecasting models, how they can be used to make forecasts, and different ways to compare their performance.
Chapter 2
Quantitative Forecasting
Explanatory models: the quantity to be forecast has a relationship with variables in the data. Example: in forecasting next month's inflation rate, we assume:
inflation rate = f (previous inflation rates, export level, GDP growth last year,
inflation in neighbouring countries, exchange rates, . . . , error).
It assumes that any change in input variables will change the forecast in a predictable way (given by
f ). We have to discover the form of f . There are always random changes in inflation that cannot be
forecast, so we always include error in the function f to account for these.
Time series: unlike explanatory models, we make no attempt to discover what variables might affect
the quantity to be forecast. We merely look at the values of the quantity over time (a time series), try to
discover patterns in the series, and make a forecast by extrapolating them into the future. Thus, the
inflation rate at month t + 1 can be written:
$$ \text{inflation rate}_{t+1} = g(\text{inflation rate}_t, \text{inflation rate}_{t-1}, \text{inflation rate}_{t-2}, \ldots, \text{error}) $$
This is often a good idea because the function f in the explanatory model can be very difficult to
define, even approximately. Indeed the effect of the variables on the forecast may not be understood,
and it may not be worthwhile or too expensive to try to understand it. It is therefore often better just
to treat the time series as a black box, and use a time series model.
Chapter 3
The beer data: monthly Australian beer production, January 1991 to August 1995:

Year | Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1991 | 164  148  152  144  155  125  153  146  138  190  192  192
1992 | 147  133  163  150  129  131  145  137  138  168  176  188
1993 | 139  143  150  154  137  129  128  140  143  151  177  184
1994 | 151  134  164  126  131  125  127  143  143  160  190  182
1995 | 138  136  152  127  151  130  119  153
1.2 Definition The time plot is the plot of the series in order of time (i.e. time is reported on the x-axis).
Figure 3.1(a) shows the time plot for the beer data.
1.3 Definition The seasonal plot overlays the data from each seasonal cycle (here, each year).
Fig. 3.1(b) shows the seasonal plot of the beer data.
[Figure 3.1: (a) time plot and (b) seasonal plot of the beer data.]
$$ r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}, $$
where $\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t$ is the mean of the series. The ACF can be plotted by reporting the values $r_k$ on the y-axis with the lag $k$ on the abscissa.
2.2 Definition The Partial AutoCorrelation Function (PACF) is another useful method to examine serial
dependencies. This is an extension of the autocorrelation, where the dependence on the intermediate
elements (those within the lag) is removed. The set of partial autocorrelations at different lags is called the
partial autocorrelation function (PACF) and is plotted like the ACF.
Figure 3.2 shows the time plot, ACF, and PACF for the beer data.
Figure 3.2: Displaying the time plot of the beer data with its ACF and PACF plots. R>tsdisplay(beer)
Figure 3.3: Time plots: (1) Daily morning temperature of a cow, (2) Accidental deaths in USA, (3) International airline passengers and (4) Annual mink trapping.
Figure 3.4: ACF. Example of R command for usdeaths time series >acf(ts(usdeaths,freq=1),25).
Figure 3.5: PACF. Example of R command for usdeaths time series >pacf(ts(usdeaths,freq=1),25).
Part I
This part introduces a number of forecasting methods called Holt-Winters algorithms, which are not explicitly based on probability models and can be seen as ad hoc in nature [1]. Chapter 4 introduces an algorithm suitable for time series with little or no trend and no seasonal pattern. A second algorithm is introduced in chapter 5 to deal with time series with a trend but no seasonal pattern. Chapter 7 proposes two algorithms for time series presenting both a trend and a seasonal component. Chapter 6 proposes criteria that can be used to select the best algorithm for a particular time series.
Chapter 4
Choose $0 \leq \alpha \leq 1$ and compute, for each $t$,
$$ F_{t+1} = F_t + \alpha\,(y_t - F_t) $$
until no more observations are available; then
$$ F_{n+k} = F_{n+1}, \quad k \geq 1. $$

Expanding the recursion by substituting $F_t = F_{t-1} + \alpha(y_{t-1} - F_{t-1})$:
$$
\begin{aligned}
F_{t+1} &= F_t + \alpha\,(y_t - F_t) \\
&= [F_{t-1} + \alpha(y_{t-1} - F_{t-1})] + \alpha\big(y_t - [F_{t-1} + \alpha(y_{t-1} - F_{t-1})]\big) \\
&= \alpha\, y_t + \alpha(1-\alpha)\, y_{t-1} + (1-\alpha)^2 F_{t-1},
\end{aligned}
\tag{4.1}
$$
So exponential smoothing forecasts are a weighted sum of all the previous observations.
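A minimal R sketch of this recursion (the function name ses_forecast and the initialisation F_1 = y_1 are illustrative choices, not part of the algorithm above):

    # Minimal sketch of simple exponential smoothing (SES).
    # Assumes y is a numeric vector; uses the convention F_1 = y_1.
    ses_forecast <- function(y, alpha) {
      n <- length(y)
      F <- numeric(n + 1)
      F[1] <- y[1]
      for (t in 1:n) {
        F[t + 1] <- F[t] + alpha * (y[t] - F[t])  # F_{t+1} = F_t + alpha (y_t - F_t)
      }
      F  # F[n+1] is the forecast F_{n+k} for every k >= 1
    }

    # Base R fits the same model, with alpha chosen automatically:
    # HoltWinters(beer, beta = FALSE, gamma = FALSE)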
4.3 Exercises
(1) What is $F_{t+1}$ when $\alpha = 0$? What happens as $\alpha$ increases to 1? What range of values must $F_{t+1}$ lie in?
(2) Here is a short time series. Calculate the exponentially smoothed series and make a forecast for the next value in the sequence, using $\alpha = 0.5$ and $\alpha = 0.1$:

t                        1    2    3    4    ...
y_t
F_t (for alpha = 0.5)    3
error                    0
F_t (for alpha = 0.1)    3
error                    0
(3) Can you make k-step ahead forecasts using exponential smoothing?
(4) Which observation is given the biggest weight in the formula (4.1) for $F_{t+1}$? Which is given the smallest? Is this sensible?
Chapter 5
Choose $0 \leq \alpha \leq 1$ and $0 \leq \beta \leq 1$ and compute, for each $t$,
$$
\begin{aligned}
L_t &= \alpha\, y_t + (1-\alpha)\,(L_{t-1} + b_{t-1}) \\
b_t &= \beta\,(L_t - L_{t-1}) + (1-\beta)\, b_{t-1} \\
F_{t+1} &= L_t + b_t
\end{aligned}
$$
until no more observations are available; then
$$ F_{n+k} = L_n + k\, b_n, \quad k \geq 1. $$
Note that no forecasts or fitted values can be computed until $y_1$ and $y_2$ have been observed. Also, by convention, we let $F_1 = y_1$.
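A minimal R sketch of these recursions; the initial values $L_2 = y_2$ and $b_2 = y_2 - y_1$ are one common convention, assumed here for illustration (the sketch assumes at least three observations):

    # Minimal sketch of Holt's linear method (DES).
    holt_forecast <- function(y, alpha, beta, k = 1) {
      n <- length(y)
      L <- numeric(n); b <- numeric(n)
      L[2] <- y[2]; b[2] <- y[2] - y[1]       # illustrative initialisation
      for (t in 3:n) {
        L[t] <- alpha * y[t] + (1 - alpha) * (L[t - 1] + b[t - 1])
        b[t] <- beta * (L[t] - L[t - 1]) + (1 - beta) * b[t - 1]
      }
      L[n] + (1:k) * b[n]                     # F_{n+k} = L_n + k b_n
    }

In base R, HoltWinters(y, gamma = FALSE) fits the same model with (alpha, beta) chosen automatically.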
5.1 Exercises
(1) Calculate the level and slope of the series below by Holt's linear method, using $\alpha = 0.5$ and $\beta = 0.1$. Compute, at each point, the 1-step-ahead forecast $F_{t+1} = L_t + b_t$.

t    y_t    L_t    b_t    F_t = L_{t-1} + b_{t-1}    y_t - F_t
1    4
Which algorithm to use?

- No trend and no seasonal pattern: SES (parameter $\alpha$).
- Trend but no seasonal pattern: DES (parameters $(\alpha, \beta)$).
- Seasonal pattern, with or without trend (no/yes): seasonal Holt-Winters (chapter 7; parameters $(\alpha, \beta, \gamma)$).
Chapter 6
6.1 Definitions
Given a time series $y_1, \ldots, y_n$, fit a model and compute the fitted values $F_1, \ldots, F_n$.
1.1 Definition (SSE) The Sum of Squared Errors is defined by:
$$ \mathrm{SSE} = \sum_{t=1}^{n} (y_t - F_t)^2 $$
In R, the computer selects the best forecasting model by finding the parameters, $\alpha$ for SES or $(\alpha, \beta)$ for DES, such that the SSE is minimal. Other software may use other criteria such as RMSE and MAPE.
1.2 Definition (RMSE) The root mean square error of the model is:
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - F_t)^2} = \sqrt{\frac{\mathrm{SSE}}{n}} $$
Note that the same parameters ($\alpha$ or $(\alpha, \beta)$) would be found when minimising the SSE or the RMSE.
1.3 Definition (MAPE) The mean absolute percent error is:
$$ \mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{y_t - F_t}{y_t} \right|. $$
It is written as a percentage. Again, pick the model with the smallest MAPE.
Other software may use MAPE for finding the best parameters. This would give slightly different estimates of the parameters than using the SSE/RMSE.
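These three measures are one-liners in R; a small sketch, assuming y and F are vectors of observations and fitted values:

    # SSE, RMSE and MAPE for observations y and fitted values F.
    sse  <- function(y, F) sum((y - F)^2)
    rmse <- function(y, F) sqrt(mean((y - F)^2))
    mape <- function(y, F) 100 * mean(abs((y - F) / y))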
6.2 Exercises

(1) Here is a short time series and its exponentially smoothed forecast with $\alpha = 0.5$. Compute the error and then the RMSE and MAPE:

y_i                  4     3     5     7     5     6     4
F_i                  4     4     3.5   4.25  5.73  5.26  5.73
y_i - F_i
(y_i - F_i)^2
|y_i - F_i| / y_i
Chapter 7
Init:
$$ L_s = \frac{1}{s}\sum_{i=1}^{s} y_i, \qquad b_s = \frac{1}{s}\left[\frac{y_{s+1}-y_1}{s} + \frac{y_{s+2}-y_2}{s} + \cdots + \frac{y_{2s}-y_s}{s}\right], \qquad S_i = y_i - L_s, \; i = 1, \ldots, s $$
and choose $0 \leq \alpha \leq 1$, $0 \leq \beta \leq 1$ and $0 \leq \gamma \leq 1$.

Compute for $t > s$:
$$
\begin{aligned}
\text{level:} \quad & L_t = \alpha\,(y_t - S_{t-s}) + (1-\alpha)\,(L_{t-1} + b_{t-1}) \\
\text{trend:} \quad & b_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\, b_{t-1} \\
\text{seasonal:} \quad & S_t = \gamma\,(y_t - L_t) + (1-\gamma)\, S_{t-s} \\
\text{forecast:} \quad & F_{t+1} = L_t + b_t + S_{t+1-s}
\end{aligned}
$$
until no more observations are available, and subsequent forecasts:
$$ F_{n+k} = L_n + k\, b_n + S_{n+k-s} $$

Table 7.1: Seasonal Holt-Winters Additive Model Algorithm (noted SHW+).
$s$ is the length of the seasonal cycle. We have to pick the values of $\alpha$, $\beta$ and $\gamma$. As with the other methods (i.e. SES and DES), we can use the SSE/RMSE or MAPE to choose the best values.
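In R, HoltWinters() fits both seasonal variants, choosing (alpha, beta, gamma) by minimising the squared one-step prediction errors; a sketch on the beer data, assuming beer is a ts object with frequency 12:

    # Additive and multiplicative seasonal Holt-Winters on the beer series.
    fit_add  <- HoltWinters(beer, seasonal = "additive")
    fit_mult <- HoltWinters(beer, seasonal = "multiplicative")
    predict(fit_add, n.ahead = 12)   # forecasts F_{n+1}, ..., F_{n+12}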
7.1 Exercise

In the table below are the first 14 months of beer production data. Since the data have a 12-month seasonal cycle, we initialise $L_{12}$, $b_{12}$ and $S_1, \ldots, S_{12}$. Use the additive model formulae to calculate the level, trend and seasonality for months 13 and 14, and make a 1-step-ahead forecast for months 13, 14 and 15. Use $\alpha = 0.5$, $\beta = 0.3$ and $\gamma = 0.9$.
Init:
$$ L_s = \frac{1}{s}\sum_{i=1}^{s} y_i, \qquad b_s = \frac{1}{s}\left[\frac{y_{s+1}-y_1}{s} + \frac{y_{s+2}-y_2}{s} + \cdots + \frac{y_{2s}-y_s}{s}\right], \qquad S_i = \frac{y_i}{L_s}, \; i = 1, \ldots, s $$

Compute for $t > s$:
$$
\begin{aligned}
\text{level:} \quad & L_t = \alpha\,\frac{y_t}{S_{t-s}} + (1-\alpha)\,(L_{t-1} + b_{t-1}) \\
\text{trend:} \quad & b_t = \beta\,(L_t - L_{t-1}) + (1-\beta)\, b_{t-1} \\
\text{seasonal:} \quad & S_t = \gamma\,\frac{y_t}{L_t} + (1-\gamma)\, S_{t-s} \\
\text{forecast:} \quad & F_{t+1} = (L_t + b_t)\, S_{t+1-s}
\end{aligned}
$$
until no more observations are available, and subsequent forecasts:
$$ F_{n+k} = (L_n + k\, b_n)\, S_{n+k-s} $$

Table 7.2: Seasonal Holt-Winters Multiplicative Model Algorithm (noted SHW×).
Month No. | Production | Level L_t | Trend b_t | Seasonal S_t | Forecast F_t
 1        |    164     |           |           |    5.75      |
 2        |    148     |           |           |  -10.25      |
 3        |    152     |           |           |   -6.25      |
 4        |    144     |           |           |  -14.25      |
 5        |    155     |           |           |   -3.25      |
 6        |    125     |           |           |  -33.25      |
 7        |    153     |           |           |   -5.25      |
 8        |    146     |           |           |  -12.25      |
 9        |    138     |           |           |  -20.25      |
10        |    190     |           |           |   31.75      |
11        |    192     |           |           |   33.75      |
12        |    192     |  158.25   |   0.65    |   33.75      |
13        |    147     |           |           |              |
14        |    133     |           |           |              |
15        |    163     |           |           |              |
Part II
This part investigates one important class of linear statistical models called ARIMA. ARIMA models use the same hypotheses and the same approach as linear regression, so chapter 8 (re-)introduces linear regression and shows how it can be used for forecasting. Linear regression, however, requires the definition of explanatory variables, and the selection of informative explanatory variables can often only be done by domain experts. In addition to choosing these explanatory variables, one also needs to collect their data along with the time series of interest. By contrast, ARIMA models only require the time series itself to be recorded for a while; no additional information is required for analysis.
Chapter 8
Linear Regression
8.1 Regression with one explanatory variable
We have collected the data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where the $x_i$ are known predictor values set by the experimenter and $y_i$ is the observed response. We wish to model $y$ as a function of $x$. In simple linear regression we assume that $y$ is related to $x$ by:
$$ y_i = a + b\, x_i + \epsilon_i, $$
with the assumptions that:
- $\epsilon_i$ is the error, normally distributed with mean 0 and unknown variance $\sigma^2$;
- $\epsilon_i$ and $\epsilon_j$ are independent when $i \neq j$.
Thus we say that the $y_i$ follow a pattern $a + b\, x_i$ but with some random, unpredictable behaviour modelled as a normal distribution. Or, in other words, given $a$ and $b$, $y_i$ will be normally distributed with mean $a + b\, x_i$ and variance $\sigma^2$.
(1) Fitting the Model: The best fitting values of $a$ and $b$ are the least squares estimates that minimise the Residual Sum of Squares:
$$ \mathrm{RSS} = \sum_{i=1}^{n} (y_i - a - b\, x_i)^2 $$
Setting the derivatives $\frac{\partial \mathrm{RSS}}{\partial a} = 0$ and $\frac{\partial \mathrm{RSS}}{\partial b} = 0$ gives:
$$ \hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}, \tag{8.1} $$
The error variance $\sigma^2$ is estimated by
$$ s^2 = \frac{\mathrm{RSS}(\hat{a}, \hat{b})}{n-2}, $$
where $\mathrm{RSS}(\hat{a}, \hat{b})$ indicates that the RSS value is computed with the least squares estimates $(\hat{a}, \hat{b})$, and the denominator $n-2$ corresponds to the degrees of freedom in the errors $\{\epsilon_i\}_{i=1,\ldots,n}$: the estimation of the 2 parameters (a, b) removes 2 degrees of freedom.
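In R, this least squares fit is obtained with lm(); a minimal sketch with simulated data (the values a = 1, b = 2 and the noise level are illustrative):

    # Simple linear regression by least squares on simulated data.
    set.seed(1)
    x <- 1:20
    y <- 1 + 2 * x + rnorm(20, sd = 0.5)
    fit <- lm(y ~ x)
    coef(fit)               # estimates of a (intercept) and b (slope)
    summary(fit)$sigma^2    # estimate s^2 = RSS / (n - 2)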
The means of the observations are
$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{and} \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i $$
The correlation coefficient between $x$ and $y$ is:
$$ r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \; \sum_{i=1}^{n}(y_i - \bar{y})^2}}. $$
If you denote by $\vec{x}$ a column vector gathering all the values $(x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x})$, and by $\vec{y}$ a column vector collecting all the values $(y_1 - \bar{y}, y_2 - \bar{y}, \ldots, y_n - \bar{y})$, then the correlation coefficient corresponds to:
$$ r_{xy} = \frac{\langle \vec{x}, \vec{y} \rangle}{\|\vec{x}\| \; \|\vec{y}\|} $$
where $\langle \vec{x}, \vec{y} \rangle$ is the dot product between $\vec{x}$ and $\vec{y}$, and $\|\vec{x}\|$ (resp. $\|\vec{y}\|$) is the norm of the vector $\vec{x}$ (resp. $\vec{y}$). By definition of the dot product, we have:
$$ \langle \vec{x}, \vec{y} \rangle = \|\vec{x}\| \; \|\vec{y}\| \cos(\theta) $$
where $\theta$ is the angle between the vectors $\vec{x}$ and $\vec{y}$. The correlation coefficient then simplifies to $r_{xy} = \cos\theta$ and consequently has values between $-1$ (perfect negative correlation) and $+1$ (perfect positive correlation).
Another important measure when we do regression is the coefficient of determination.
1.2 Definition (coefficient of determination) The coefficient of determination is the squared correlation between the $y_i$ and their predicted values $\hat{y}_i = \hat{a} + \hat{b}\, x_i$ from the fitted model:
$$ R^2 = r^2_{y\hat{y}} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} $$
For simple linear regression, as explained here, $R^2 = r^2_{xy}$. This is not true for more general multivariate regression.
(3) Evaluating Model Fit: We look at the residuals. These are the differences between the observed and predicted values for $y$:
$$ \hat{\epsilon}_i = y_i - \hat{y}_i. $$
There are two problems with model fit that can arise:
- We have not fit the pattern of the data. Any unmodelled relationships between $x$ and $y$ appear in the residuals. A scatter plot of the residuals against the $x_i$ should show up any unmodelled patterns.
- The normally distributed error assumption is not correct. The residuals should be independent and normally distributed with variance $\sigma^2$. A histogram of the residuals can usually verify this.
(4) Outliers: An observation that has an unusually large residual is an outlier. An outlier is an observation
that has been predicted very badly by the model. Standardised residuals give a good indication as to
whether an observation is an outlier. We should investigate if there is some special reason for the
outlier occurring. Outliers can also cause problems because they can significantly alter the model fit,
making it fit the rest of the data worse than it would otherwise.
(5) Making Predictions: Given a new value $X$, $y$ should be normally distributed with mean $a + bX$ and variance $\sigma^2$. We replace $a$, $b$ and $\sigma^2$ by their estimates and so forecast the value of $y$ to be $\hat{y} = \hat{a} + \hat{b}X$. A 95% prediction interval turns out to be:
$$ \hat{a} + \hat{b}X \pm 2s $$
(6) Statistical Tests in Regression: An F-test is used to determine if there is any significant linear relationship between $x$ and $y$. In other words, it checks whether the hypothesis $b = 0$ is true or not. The test statistic is:
$$ F = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 / (n-2)}, $$
which is compared with the F-value having 1 and $n-2$ degrees of freedom.
EXERCISE: If month $i$ is January, what does the above equation reduce to? What if the month is February?
The parameters $\gamma_1, \ldots, \gamma_{12}$ represent a monthly effect, the same in that month for all the series, that is, the departure from the trend-cycle in that month. There's one technical matter with the above model: one of the monthly terms is not a free parameter. We can in fact only fit 11 of the 12 monthly effects; the rest is absorbed by the term $a + b\,i$. In other words, we need only 11 binary variables to encode the 12 months:
I choose to eliminate the January effect (you are free to choose any other if you wish), so the model we use is:
$$ y_i = a + b\,i + \gamma_2\,\mathrm{Feb}_i + \cdots + \gamma_{12}\,\mathrm{Dec}_i + \epsilon_i. $$
EXERCISE: What is the trend-cycle component of the model? What is the seasonal component?
EXERCISE: what is the earliest value in the series that we can compute a predicted value for?
EXERCISE: we have quarterly data with a yearly seasonal component. What model would you fit using this
method?
Having collected $n$ observations $\{(x_i, y_i)\}_{i=1,\ldots,n}$, we can write the following linear system:
$$
\begin{aligned}
y_1 &= a + b\, x_1 + \epsilon_1 \\
y_2 &= a + b\, x_2 + \epsilon_2 \\
&\;\;\vdots \\
y_n &= a + b\, x_n + \epsilon_n
\end{aligned}
$$
This system can be rewritten as:
$$ \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{\mathbf{X}} \underbrace{\begin{bmatrix} a \\ b \end{bmatrix}}_{\boldsymbol{\beta}} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{\boldsymbol{\epsilon}} \tag{8.2} $$
The least squares estimate is
$$ \hat{\boldsymbol{\beta}} = \arg\min \mathrm{RSS} = \arg\min \sum_{i=1}^{n} \epsilon_i^2 = \arg\min \|\boldsymbol{\epsilon}\|^2 $$
Setting the derivative of the RSS to zero (using the matrix derivatives in table 8.1) gives:
$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \quad \text{(Least Squares estimate)} \tag{8.3} $$
You can check that equation (8.3) gives the same result as equation (8.1).
$$ \begin{array}{c|cccc} y & \mathbf{A}\mathbf{x} & \mathbf{x}^T\mathbf{A} & \mathbf{x}^T\mathbf{x} & \mathbf{x}^T\mathbf{A}\mathbf{x} \\ \hline \dfrac{\partial y}{\partial \mathbf{x}} & \mathbf{A} & \mathbf{A} & 2\mathbf{x} & \mathbf{A}\mathbf{x} + \mathbf{A}^T\mathbf{x} \end{array} $$
Table 8.1: Useful matrix derivatives.

For multiple linear regression, e.g. with two explanatory variables $x$ and $z$, the design matrix and parameter vector become:
$$ \mathbf{X} = \begin{bmatrix} 1 & x_1 & z_1 \\ 1 & x_2 & z_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & z_n \end{bmatrix} \qquad \text{and} \qquad \boldsymbol{\beta} = \begin{bmatrix} a \\ b \\ c \end{bmatrix} $$
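Equation (8.3) can be checked numerically in R; a sketch, assuming the same x and y vectors as in the simple-regression example of section 8.1:

    # Least squares estimate in matrix form: beta_hat = (X'X)^{-1} X'y.
    X <- cbind(1, x)                          # design matrix with a column of ones
    beta_hat <- solve(t(X) %*% X, t(X) %*% y)
    beta_hat                                  # same result as coef(lm(y ~ x))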
Chapter 9
$$
\begin{aligned}
y_2 &= \phi_0 + \phi_1\, y_1 + \epsilon_2 \\
y_3 &= \phi_0 + \phi_1\, y_2 + \epsilon_3 \\
&\;\;\vdots \\
y_n &= \phi_0 + \phi_1\, y_{n-1} + \epsilon_n
\end{aligned}
$$
(1) Define $x_i = y_{i-1}$; this is called the lagged series. Note that $x_i$ is only defined for $i = 2, \ldots, n$. It is NOT defined for $i = 1$, since there is no $y_0$.
(2) The AR(1) model is then:
$$ y_i = \phi_0 + \phi_1\, x_i + \epsilon_i. $$
This is just the linear regression model! So, we can fit this model by doing a linear regression of the series against the lagged series. That will give us the best values for the parameters $\phi_0$ and $\phi_1$, and an estimate $s^2$ for $\sigma^2$. We could also do an F-test to verify if there is a significant relationship.
(3) NOTE: because $x_1$ does not exist, the regression is fitted on $n-1$ points $(x_2, y_2), \ldots, (x_n, y_n)$.
(4) Our fitted values for the series are then
$$ \hat{y}_i = \hat{\phi}_0 + \hat{\phi}_1\, x_i = \hat{\phi}_0 + \hat{\phi}_1\, y_{i-1}, $$
for $i = 2, \ldots, n$. We cannot fit a value to $y_1$ because there is no $y_0$!
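A sketch of this fit in R, assuming the series is stored in a numeric vector y:

    # Fit an AR(1) model by regressing the series on its lagged version.
    y_lag <- y[-length(y)]    # x_i = y_{i-1}, for i = 2, ..., n
    y_cur <- y[-1]            # y_i,           for i = 2, ..., n
    fit <- lm(y_cur ~ y_lag)  # coefficients: phi_0 (intercept) and phi_1
    coef(fit)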
(5) We estimate $\sigma^2$ by $s^2$:
$$ s^2 = \frac{1}{(n-1)-2} \sum_{i=2}^{n} (y_i - \hat{\phi}_0 - \hat{\phi}_1\, y_{i-1})^2 = \frac{1}{n-3} \sum_{i=2}^{n} (y_i - \hat{\phi}_0 - \hat{\phi}_1\, y_{i-1})^2 $$
Note that we had only $n-1$ equations in the linear system used to estimate $(\phi_0, \phi_1)$, and there are 2 parameters $(\phi_0, \phi_1)$ in our model. Thus a 95% prediction interval for $y_i$ when $x_i$ is known is
$$ \hat{\phi}_0 + \hat{\phi}_1\, x_i \pm 2s, \qquad \text{with } s^2 = \frac{\sum_{i=2}^{n} \hat{\epsilon}_i^2}{n-3}. $$
$$
\begin{aligned}
y_{n+2} &= \phi_0 + \phi_1\, y_{n+1} + \epsilon_{n+2} \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1\, y_n + \epsilon_{n+1}) + \epsilon_{n+2} \\
&= \underbrace{\phi_0 + \phi_1\phi_0 + \phi_1^2\, y_n}_{\text{Forecast } \hat{y}_{n+2}} + \underbrace{\phi_1\, \epsilon_{n+1} + \epsilon_{n+2}}_{\text{error term}}
\end{aligned}
$$
Note that the forecast is the part that we can compute (i.e. we know the values of $\phi_0$, $\phi_1$, $y_n$), whereas we don't know the values of the errors; we only know how these behave statistically.
We know $\epsilon_{n+1} \sim N(0, s^2)$ and $\epsilon_{n+2} \sim N(0, s^2)$, so $E[\epsilon_{n+1}] = 0$ and $E[\epsilon_{n+2}] = 0$. Now let's compute the variance of the error term:
$$ E[(\phi_1 \epsilon_{n+1} + \epsilon_{n+2})^2] = \phi_1^2 \underbrace{E[\epsilon_{n+1}^2]}_{=s^2} + 2\phi_1 \underbrace{E[\epsilon_{n+1}\epsilon_{n+2}]}_{=0} + \underbrace{E[\epsilon_{n+2}^2]}_{=s^2} $$
The expectation of $\epsilon_{n+1}\epsilon_{n+2}$ is 0 because we assume independence of the residuals. So the 95% confidence interval is
$$ y_{n+2} = \hat{y}_{n+2} \pm 2s\sqrt{\phi_1^2 + 1} $$
We see that the confidence interval is getting larger as we move further into the future from the last observation available $y_n$.
(4) How would we go about forecasting $k$ steps ahead, that is $y_{n+k}$? What is the prediction interval? We know that
$$ y_{n+1} = \underbrace{\phi_0 + \phi_1\, y_n}_{\text{forecast}} \pm \underbrace{2s}_{\text{confidence interval}} $$
and
$$ y_{n+2} = \underbrace{\phi_0 + \phi_0\phi_1 + \phi_1^2\, y_n}_{\text{Forecast } \hat{y}_{n+2}} \pm \underbrace{2s\sqrt{1 + \phi_1^2}}_{\text{confidence interval}} $$
and
$$
\begin{aligned}
y_{n+3} &= \phi_0 + \phi_1\, y_{n+2} + \epsilon_{n+3} \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1\, y_{n+1} + \epsilon_{n+2}) + \epsilon_{n+3} \\
&= \phi_0 + \phi_1(\phi_0 + \phi_1(\phi_0 + \phi_1\, y_n + \epsilon_{n+1}) + \epsilon_{n+2}) + \epsilon_{n+3} \\
&= \underbrace{\phi_0 + \phi_1\phi_0 + \phi_1^2\phi_0 + \phi_1^3\, y_n}_{\text{Forecast } \hat{y}_{n+3}} + \underbrace{\phi_1^2\,\epsilon_{n+1} + \phi_1\,\epsilon_{n+2} + \epsilon_{n+3}}_{\text{error term}}
\end{aligned}
$$
so
$$ y_{n+3} = \phi_0 + \phi_1\phi_0 + \phi_1^2\phi_0 + \phi_1^3\, y_n \pm 2s\sqrt{1 + \phi_1^2 + \phi_1^4}. $$
In general,
$$ y_{n+k} = \underbrace{\phi_0 \sum_{i=1}^{k} \phi_1^{i-1} + \phi_1^k\, y_n}_{\text{forecast}} + \underbrace{\sum_{i=1}^{k} \phi_1^{i-1}\, \epsilon_{n+k-i+1}}_{\text{error term}} \tag{9.1} $$
implying:
$$ y_{n+k} = \underbrace{\phi_0 \sum_{i=1}^{k} \phi_1^{i-1} + \phi_1^k\, y_n}_{\text{forecast}} \pm \underbrace{2s \sqrt{\sum_{i=1}^{k} \phi_1^{2(i-1)}}}_{\text{Confidence interval}} \tag{9.2} $$
This can be checked by recursion:
$$
\begin{aligned}
y_{n+k+1} &= \phi_0 + \phi_1\, y_{n+k} + \epsilon_{n+k+1} \\
&= \phi_0 + \phi_1\Big(\phi_0 \sum_{i=1}^{k} \phi_1^{i-1} + \phi_1^k\, y_n + \sum_{i=1}^{k} \phi_1^{i-1}\,\epsilon_{n+k-i+1}\Big) + \epsilon_{n+k+1} \\
&= \phi_0 \sum_{i=1}^{k+1} \phi_1^{i-1} + \phi_1^{k+1}\, y_n + \sum_{i=1}^{k+1} \phi_1^{i-1}\,\epsilon_{n+k+1-i+1}
\end{aligned}
$$
Note that the width of the confidence interval depends on the term:
$$ \sum_{i=1}^{k} \phi_1^{2(i-1)} \;\xrightarrow[k \to \infty]{}\; \frac{1}{1 - \phi_1^2} \qquad (|\phi_1| < 1) $$
So for an AR(1) model, the confidence interval grows up to a finite limit (it is bounded). Check this result for AR, MA and ARMA models using simulations in R.
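The suggested simulation check can be done with arima.sim(); a sketch for an AR(1) (the parameter values 0.7, 500 and 20 are illustrative):

    # Simulate a long AR(1) path and inspect the forecast standard errors.
    set.seed(42)
    sim <- arima.sim(model = list(ar = 0.7), n = 500)
    fit <- arima(sim, order = c(1, 0, 0))
    pred <- predict(fit, n.ahead = 20)
    pred$se   # grows with the horizon but approaches a finite limit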
Chapter 10
An MA(1) model $y_t = \theta_1 \epsilon_{t-1} + \epsilon_t$ can be rewritten in terms of past observations (with $\epsilon_0 = 0$):
$$
\begin{aligned}
y_1 &= \theta_1 \epsilon_0 + \epsilon_1 &\;\Longleftrightarrow\;& y_1 = \epsilon_1 \\
y_2 &= \theta_1 \epsilon_1 + \epsilon_2 &\;\Longleftrightarrow\;& y_2 = \theta_1 y_1 + \epsilon_2 \\
y_3 &= \theta_1 \epsilon_2 + \epsilon_3 &\;\Longleftrightarrow\;& y_3 = \theta_1 (y_2 - \theta_1 y_1) + \epsilon_3 \\
&\;\;\vdots & & \;\;\vdots \\
y_n &= \theta_1 \epsilon_{n-1} + \epsilon_n &\;\Longleftrightarrow\;& y_n = \theta_1 y_{n-1} - \theta_1^2 y_{n-2} + \cdots + (-1)^n \theta_1^{n-1} y_1 + \epsilon_n
\end{aligned}
$$
We estimate the parameter $\theta_1$ by minimising the sum of squared errors again; however, the system of equations is non-linear w.r.t. the parameter $\theta_1$ (powers of $\theta_1$ appear in the expression). More complex numerical methods can perform this estimation (out of scope in this class). The simple least squares algorithm used for linear regression (cf. chapter 8) and AR models cannot be used when there is an MA component in the model.
Chapter 11
11.2 Exercises
(1) Identify the following equations as MA(q), AR(p) or ARMA(p,q), identifying the orders p and q:
(a) $y_t = \phi_0 + \phi_{12}\, y_{t-12} + \epsilon_t$
(b) $y_t = \theta_0 + \theta_{12}\, \epsilon_{t-12} + \epsilon_t$
(c) $y_t = c + \phi_{12}\, y_{t-12} + \theta_{12}\, \epsilon_{t-12} + \epsilon_t$
(d) $y_t = \theta_0 + \theta_1\, \epsilon_{t-1} + \theta_{12}\, \epsilon_{t-12} + \epsilon_t$
(2) Assume an MA(1) model; what is the expectation $E[y_t]$? Is it stationary in mean (i.e. is $E[y_t]$ changing with the time $t$)?
(3) Assume an MA(2) model; what is the expectation $E[y_t]$? Is it stationary in mean?
(4) Assume an AR(1) model; what is the expectation $E[y_t]$ (consider when $t \to \infty$)? Is it stationary in mean?
The correlation between values in the time series depends only on the time distance between these
values. (stationary in autocorrelation)
We spend most of our time discussing the first two.
4.2 Definition (Stationarity in variance) In addition to stationarity in mean, a time series is said to be
stationary in variance if the variance in the time series does not change with time.
EXERCISE:
(1) Are the simulated times series in figures 11.1, 11.2 and 11.3 stationary in variance?
(2) As well as stationarity in both mean and variance, series can also be: non-stationary in mean and stationary in variance; or stationary in mean and non-stationary in variance; or non-stationary in both. Sketch time series with each of these three properties.
11.5 Conclusion
ARMA models are not able to handle time series that are not stationary in mean and variance. In other words, ARMA models should only be fitted to time series that are stationary in mean (i.e. no trend and no seasonal pattern) and stationary in variance.
Chapter 12

         ACF                                     PACF
AR(1)    decays exponentially                    cuts off after lag 1
AR(p)    tails off (exponential decay or         cuts off after lag p
         damped sine wave)
MA(1)    cuts off after lag 1                    tails off
MA(q)    cuts off after lag q                    tails off

Table 12.1: Shapes of ACF and PACF to identify AR or MA models suitable to fit time series.
The autocorrelation at lag $k$ is defined as:
$$ ACF(k) = \frac{E\big[(y_t - E[y_t])\,(y_{t-k} - E[y_{t-k}])\big]}{\sqrt{\mathrm{Var}[y_t]\,\mathrm{Var}[y_{t-k}]}} $$
In time series, we may want to measure the relationship between $y_t$ and $y_{t-k}$ when the effects of the other time lags $1, 2, \ldots, k-1$ have been removed. The autocorrelation does not measure this. However, the partial autocorrelation is a way to measure this effect.
1.2 Definition (PACF) The partial autocorrelation of a time series at lag $k$ is found as follows:
(1) Fit a linear regression of $y_t$ on the first $k$ lags (i.e. fit an AR(k) model to the time series):
$$ y_t = \beta_0 + \beta_1\, y_{t-1} + \beta_2\, y_{t-2} + \cdots + \beta_k\, y_{t-k} + \epsilon_t $$
(2) Then the partial autocorrelation at lag $k$ is $\hat{\beta}_k$, the fitted value of $\beta_k$ from the regression (least squares).
The set of partial autocorrelations at different lags is called the partial autocorrelation function (PACF) and is plotted like the ACF.
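This construction can be checked directly in R; a sketch comparing the lag-2 partial autocorrelation with the last coefficient of an order-2 autoregression (y is an assumed, illustrative series):

    # The partial autocorrelation at lag k is (approximately) the last
    # coefficient of a least squares AR(k) fit.
    k <- 2
    n <- length(y)
    fit <- lm(y[(k + 1):n] ~ y[k:(n - 1)] + y[(k - 1):(n - 2)])
    coef(fit)[3]                  # last coefficient, beta_k
    pacf(y, plot = FALSE)$acf[k]  # approximately the same value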
For an MA(1) model, $E[\epsilon_t \epsilon_{t-k}] = 0$ for $k \geq 1$, which gives:
$$ \mathrm{Corr}[y_t, y_{t-k}] = \frac{\mathrm{Cov}[y_t, y_{t-k}]}{\sqrt{\mathrm{Var}[y_t]\,\mathrm{Var}[y_{t-k}]}} = \begin{cases} 1 & \text{if } k = 0 \\ \dfrac{\theta_1}{\theta_1^2 + 1} & \text{if } k = 1 \\ 0 & \text{otherwise } (t > 1) \end{cases} $$
(3) Conclude about the form of the ACF function for MA(1) models.
Ans. The ACF plots the lags $k$ on the x-axis, and the y-axis reports the correlation $\mathrm{Corr}[y_t, y_{t-k}]$.
Chapter 13
13.2 Exercises.
(1) Write an MA(1) model with the backshift operator.
(2) Write an AR(1) model with the backshift operator.
(3) Write an MA(q) model with the backshift operator.
(4) Write an AR(p) model with the backshift operator.
(5) Write an ARMA(p,q) model with the backshift operator.
Chapter 14
is the smallest. Again, a forecasting computer package will allow you to hunt for the ARMA model with the
smallest AIC or BIC.
14.2 R output
Tables 14.1 and 14.2 show the R outputs when fitting an AR(1) and an AR(2) to the dowjones data.
Series: dowjones
ARIMA(1,0,0) with non-zero mean
Call: arima(x = dowjones, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.9964   116.0329
s.e.  0.0045     4.8878

sigma^2 estimated as 0.1974: log likelihood = -49.86
AIC = 105.72   AICc = 106.04   BIC = 112.79
Table 14.1: Output in R of arima(dowjones,order=c(1,0,0)).
Series: dowjones
ARIMA(2,0,0) with non-zero mean
Call: arima(x = dowjones, order = c(2, 0, 0))

Coefficients:
         ar1      ar2  intercept
      1.4990  -0.5049   115.7854
s.e.  0.0993   0.1000     4.1654

Table 14.2: Output in R of arima(dowjones,order=c(2,0,0)).
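The AIC values can be extracted and compared directly; a sketch (assuming the dowjones series is loaded, e.g. from the fma package):

    # Compare candidate models for dowjones by AIC: smaller is better.
    fit1 <- arima(dowjones, order = c(1, 0, 0))
    fit2 <- arima(dowjones, order = c(2, 0, 0))
    c(AIC(fit1), AIC(fit2))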
Chapter 15
ARIMA(p, d, q)
An ARMA model is not suitable for fitting a time series with a trend: remember, a time series showing a trend is not stationary in mean, and ARMA models are only suited to time series stationary in mean and variance. Differencing is an operation that can be applied to a time series to remove a trend. If after differencing the time series looks stationary in mean and variance, then an ARMA(p,q) model can be used. Section 15.1 presents differencing (of order d) and section 15.2 extends the ARMA(p,q) models to the ARIMA(p,d,q) models.
$$ \underbrace{(1 - \phi_1 B - \cdots - \phi_p B^p)}_{AR(p)}\; \underbrace{(1 - B)^d}_{I(d)}\; y_t = c + \underbrace{(1 - \theta_1 B - \cdots - \theta_q B^q)}_{MA(q)}\; \epsilon_t $$
[Figures: time plot, ACF and PACF of the dowjones series, of diff(dowjones, differences = 1), and of diff(dowjones, differences = 2).]
(1) In the table overpage are some data to which we wish to fit the ARIMA(1,1,1) model
$$ (1 - 0.4B)(1 - B)\, y_t = 0.1 + (1 - 0.9B)\, \epsilon_t. $$
If we let $x_t = (1 - B)y_t$ be the differenced series, we can fit the simpler ARMA(1,1) model:
$$ (1 - 0.4B)\, x_t = 0.1 + (1 - 0.9B)\, \epsilon_t. $$
We can re-write this as $x_t = 0.1 + 0.4\, x_{t-1} - 0.9\, \epsilon_{t-1} + \epsilon_t$, and so create fitted values $\hat{x}_t$. With these fitted values we can back-transform to get fitted values $\hat{y}_t = \hat{x}_t + y_{t-1}$. Use these facts to fill in the table.
(2) Here is the ARIMA(1,1,1) model:
$$ (1 - \phi_1 B)(1 - B)\, y_t = c + (1 - \theta_1 B)\, \epsilon_t. $$
Expand this equation and apply the backshift operators to get an equation for $y_t$ in terms of $y_{t-1}$, $y_{t-2}$, $\epsilon_t$ and $\epsilon_{t-1}$.
Time t | Data y_t | Differenced data x_t = y_t - y_{t-1} | Fitted values x̂_t | Error ε_t = x_t - x̂_t | Fitted values ŷ_t = x̂_t + y_{t-1}
  1    |   9.5    |
  2    |  13.7    |   4.2
  3    |   8.7    |  -5.0
  4    |  16.1    |   7.4
  5    |  15.3    |  -0.8
  6    |  12.2    |  -3.1
Chapter 16
The general seasonal ARIMA(p,d,q)(P,D,Q)_s model is:
$$ \underbrace{(1 - \phi_1 B - \cdots - \phi_p B^p)}_{AR(p)}\; \underbrace{(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})}_{AR_s(P)}\; \underbrace{(1-B)^d}_{I(d)}\; \underbrace{(1-B^s)^D}_{I_s(D)}\; y_t = c + \underbrace{(1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)}_{MA(q)}\; \underbrace{(1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs})}_{MA_s(Q)}\; \epsilon_t \tag{16.1} $$
where
AR(p) Autoregressive part of order p
M A(q) Moving average part of order q
I (d ) differencing of order d
AR s (P ) Seasonal Autoregressive part of order P
M A s (Q) Seasonal Moving average part of order Q
I s (D) seasonal differencing of order D
s is the period of the seasonal pattern appearing, e.g. s = 12 months in the Australian beer production data.
The idea behind the seasonal ARIMA is to look at the best explanatory variables to model a seasonal pattern. For instance, let's consider the Australian beer production, which shows a seasonal pattern of period 12 months. Then to predict the production at time $t$, $y_t$, the explanatory variables to consider are: $y_{t-12}, y_{t-24}, \ldots$, and/or $\epsilon_{t-12}, \epsilon_{t-24}, \ldots$
For seasonal data, it might also make sense to take differences between observations at the same point in the seasonal cycle, i.e. for monthly data with an annual cycle, define differences (D=1)
$$ y_t - y_{t-12} $$
or (D=2)
$$ y_t - 2\, y_{t-12} + y_{t-24}. $$
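In R, these differences are computed with diff(); a sketch for a monthly series y:

    # Seasonal differencing for monthly data (s = 12):
    d1 <- diff(y, lag = 12)                   # D = 1: y_t - y_{t-12}
    d2 <- diff(y, lag = 12, differences = 2)  # D = 2: y_t - 2 y_{t-12} + y_{t-24}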
16.4 Conclusion
We have now defined the full class of statistical models ARIMA(p,d,q)(P,D,Q)_s studied in this course. ARMA(p,q) can only be applied to time series stationary in mean, hence the extension to ARIMA(p,d,q)(P,D,Q)_s (introducing d, D, P, Q, s) allows us to make the time series stationary in mean. Unfortunately, we are still not able to deal with time series that are not stationary in variance. We propose some possible solutions in the next chapter.
[Figures: time plots, ACF and PACF of simulated time series ts1.]
Chapter 17
Figure 17.1: Monthly totals of international airline passengers (1949 to 1960) (time series airpass).
Mathematical functions can be applied to the time series to make them stationary in variance. Four such transformations are commonly used, and reduce variance by differing amounts. In increasing order of strength, they are: the square root, the cube root, the logarithm, and the negative reciprocal.

EXERCISE: look at the four transformations as applied to the airpass time series (figure 17.2). Which transformation is best at stabilising the variance?
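The four transformations are one-liners in R; a sketch on the airpass series (assumed positive, as these transformations require):

    # Variance-stabilising transformations, in increasing order of strength:
    y1 <- sqrt(airpass)    # square root
    y2 <- airpass^(1/3)    # cube root
    y3 <- log(airpass)     # logarithm
    y4 <- -1 / airpass     # negative reciprocal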
[Figure 17.2: Transformations of airpass: square root, cube root (·)^{1/3}, log, and negative reciprocal −(·)^{−1}.]
Chapter 18
Conclusion
18.1 Summary of the course
We have introduced the Holt-Winters algorithms and the ARIMA models as two classes of techniques to analyse time series. Some Holt-Winters algorithms have equivalent ARIMA models (cf. table 18.1). Figure 18.1 provides a summary of the content of this course. Remember that all these methods rely on the hypothesis that, to some extent, what has happened in the past repeats itself in the future (continuity hypothesis).
Simple Exponential Smoothing (SES)              ARIMA(0,1,1)
Double Exponential Smoothing (DES)              ARIMA(0,2,2)
Seasonal Holt-Winters, additive (SHW+)          ARIMA(0,1,s+1)(0,1,0)_s
Seasonal Holt-Winters, multiplicative (SHW×)    no ARIMA equivalent

Table 18.1: Final remarks about Holt-Winters Algorithms and ARIMA models ([2] p. 373).
[Figure 18.1: Summary of the course. Holt-Winters algorithms (SES, DES; selection criteria: SSE/MAPE/RMSE; computation of forecasts) and ARIMA models (ARMA(p,q), ARIMA(p,d,q), seasonal ARIMA(p,d,q)(P,D,Q)_s, with ε_t ∼ N(0, σ²) independent; selection criteria: BIC/AIC; computation of forecasts and prediction intervals).]
$$ y_t = \sum_{j=1}^{k} \big( a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t) \big) + \epsilon_t. $$
Now we have $k$ harmonics and the series is written as a sum of terms (determining the number of harmonics we require is not always easy). We can adjust the seasonality of the different harmonics by changing the $\lambda_j$ terms. For example, if we had daily data with a yearly seasonal trend we could set the first harmonic $\lambda_1 = 2\pi/365$. We can make the model more complicated by letting the data determine the $\lambda_j$'s via, for example, maximum likelihood.
highly systematic. For example, in order for the log transformation to work the variability in the data must
be increasing as time goes on. It is easy to envisage situations where the variability gets bigger and smaller
over time in a non-systematic way.
2.1 Definition (ARCH) Autoregressive Conditional Heteroscedastic (ARCH) models allow the variance term to have its own random walk, for instance extending AR(1):
$$ y_t = \phi_0 + \phi_1\, y_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma_t^2) $$
so that
$$ \sigma_t^2 = \alpha_0 + \alpha_1\, \epsilon_{t-1}^2 $$
The net effect is to allow the variability in $y_t$ to vary with time; each $y_t$ has its own $\sigma_t$. The parameters $\alpha_0$ and $\alpha_1$ control the amount of variation in $y_t$ at time $t$ by controlling the size of $\sigma_t$.
2.2 Definition (GARCH) A further generalisation exists, known as Generalised Autoregressive Conditional Heteroscedastic (GARCH) models. In this version, the variance terms are given their own ARMA process (rather than the AR process in ARCH models), based on the squares of the error terms. A simple version would now have:
$$ \epsilon_t \sim N(0, \sigma_t^2), \qquad \sigma_t^2 = \alpha_0 + \alpha_1\, \epsilon_{t-1}^2 + \beta_1\, \sigma_{t-1}^2, $$
where $\epsilon_t$ has the same meaning as in a standard MA process. Further generalisations exist which allow non-linear, non-stationary processes to be applied to the variance terms.
Bibliography
[1] C. Chatfield. Time-Series Forecasting. Chapman & Hall/CRC, 2001.
[2] S. Makridakis, S.C. Wheelwright, and R.J. Hyndman. Forecasting: Methods and Applications. Wiley, 1998.