
Notes on ARIMA

Contents

1. Introduction
2. Model Building
3. Model Diagnostics
4. Seasonal ARIMA
5. References

Introduction: ARIMA = Autoregressive (AR) Integrated (I) Moving Average (MA)

1. Why do we need ARIMA, and how is it different from other time series
methods? ARIMA models the correlation between terms of the series;
exponential smoothing does not.
2. How does ARIMA identify that? By using correlation coefficients. There are two
types here: autocorrelation coefficients, plotted as the autocorrelation function
(ACF), and partial autocorrelation coefficients, plotted as the partial
autocorrelation function (PACF).
3. ARIMA(p,d,q)
a. p - number of autoregressive terms - measures the influence of previous
terms
b. d - number of non-seasonal differences (the order of the difference operator)
c. q - number of moving average terms - measures the influence of previous
errors, or lagged error terms
4. Assumptions: the AR model assumes errors that are independent and normally
distributed with zero mean and constant variance (Ord & Fildes, 2017)
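
The roles of p, d and q in item 3 can be written out as a single equation. This is a sketch under notation the notes do not define, so treat the symbols as assumptions: y_t is the observed series, y'_t the series after d differences, B the backshift operator, c a constant and e_t the error term. The minus signs on the MA terms follow the Box-Jenkins convention; some software, including R's arima, writes the MA terms with plus signs instead.

```latex
\begin{align}
  y'_t &= (1 - B)^d \, y_t, \qquad B\,y_t = y_{t-1} \\
  y'_t &= c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
          + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}
\end{align}
```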

Model Building: Step 1 is to create an ACF chart and then ask the following questions.

1. Questions:
a. Is the data random?
i. If all ACF values (at non-zero lags) are within the horizontal dotted lines, then yes.
b. Is the data stationary?
i. Observation: if the ACF drops to nearly 0 after the 2nd or 3rd lag, then yes.
Otherwise, a trend exists.
ii. Statistics: apply a stationarity test such as
Kwiatkowski–Phillips–Schmidt–Shin (KPSS), or a unit root test such as the
Augmented Dickey-Fuller (ADF) test or the Phillips-Perron (PP) test
(faculty.washington.edu, 2018).

For ADF, the null hypothesis is a unit root: if the p-value is less than 0.05, reject it and treat the data as stationary.
For KPSS, the null hypothesis is stationarity: if the p-value is greater than 0.05, do not reject it and treat the data as stationary.
c. How to remove non-stationarity?
i. By differencing the terms
ii. Example (see Table 8.3, Makridakis, for more examples)

     DATA            FIRST         SECOND
     (TIME SERIES)   DIFFERENCE    DIFFERENCE
1    2.5             2.7           0.3
2    5.2             3.0
3    8.2
d. Look at the ACF plot of the first-differenced series. Does the ACF drop after
the 2nd or 3rd lag? If not, take the second difference, and so on. If d=1 is
enough, the trend is linear; if d=2 is needed, the trend is quadratic. If the
trend is curved, apply a log transformation before differencing.
e. If the data is not stationary, at what level of differencing does it become
stationary?
i. See the answer above. To find out how many differences you need, use
nsdiffs (seasonal differences) or ndiffs (non-seasonal differences) in R.
f. Is the data seasonal?
i. For monthly data, a high ACF at lag=12 means that something recurs
every year (once in every 12 months), which explains the spike. Hence,
seasonality exists.
g. If yes, what’s the length of seasonality?
i. In the case above, it is 12.
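
The differencing and ACF checks in the questions above can be sketched in a few lines. This is a minimal illustration in plain Python (standard library only), not the R workflow the notes refer to. The helper names `difference`, `acf` and `looks_random` are made up for this example, the linear trend series is hypothetical, and the limits of plus or minus 1.96/sqrt(n) are the usual approximate 95% bounds, assumed here to be what the dotted lines represent.

```python
import math


def difference(series, lag=1):
    """Return the lag-`lag` difference: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def looks_random(series, max_lag):
    """True if every ACF value at lags 1..max_lag is inside +/- 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    return all(abs(r) <= bound for r in acf(series, max_lag)[1:])


# The three-point example from the differencing table.
data = [2.5, 5.2, 8.2]
first = difference(data)      # first differences
second = difference(first)    # second differences

# Hypothetical linear trend: its ACF stays high across lags (non-stationary),
# while one round of differencing leaves a constant series, so d = 1.
trend = [3.0 * t + 1.0 for t in range(50)]
trend_acf = acf(trend, 3)
trend_diff = difference(trend)
```

For monthly data, the same helper takes a seasonal difference with `difference(series, lag=12)`.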
2. ACF for lag=0 will always be 1, because the data is perfectly correlated with
itself.
3. ACF at non-zero lags for a series generated by a random number generator is
theoretically 0. The sample values may not be exactly 0 due to correlation by
chance, but they should fall within the upper and lower limits (the dotted
horizontal lines)
4. If the stationarized series has positive autocorrelation at lag 1, AR terms often work
best. If it has negative autocorrelation at lag 1, MA terms often work best.
(people.duke.edu, 2018)
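5. In the model equation, MA terms enter with a minus sign (the parameter is denoted
by theta; theta is the weight of the lagged error term) and AR terms enter with a
plus sign (the parameter is denoted by phi). This is a sign convention; the
parameter values themselves may be positive or negative.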
6. Step 2 is to remove non-stationarity, that is, trend. If you don’t, spurious
autocorrelations will occur. (AR and MA models assume stationarity) (Otexts.org,
2018)
7. Step 3 is to create PACF plots and determine the order of the AR or MA process
(see pp. 331-337, Makridakis)
a. If the PACF spikes trail off (i.e., do not drop suddenly), then it is MA
b. If the PACF spikes drop suddenly after lag 1, 2 or 3, then it is AR(1), AR(2) or
AR(3) respectively
c. The opposite is true for ACF plots. For MA, the spikes drop suddenly and for
AR, the spikes decrease gradually.
d. Phi may be positive or negative. If the ACF spikes alternate in sign, then phi
is negative.
e. If both ACF and PACF trail off, then it is a mixed ARMA model. Determine AR
order from PACF and MA order from ACF.
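
The ACF and PACF patterns in step 3 can be checked numerically on a simulated series. The sketch below is plain Python, not the notes' own code: it computes the sample PACF with the Durbin-Levinson recursion and applies it to a simulated AR(1) series. The function names, the coefficient 0.7, the sample size and the seed are all arbitrary choices made for illustration.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelations for lags 1..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    result = []
    prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        result.append(phi_kk)
        prev = current
    return result


def simulate_ar1(phi, n, seed=0):
    """Simulate y[t] = phi * y[t-1] + e[t] with standard normal errors."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y


series = simulate_ar1(phi=0.7, n=2000)
acf_values = acf(series, 3)     # expected to decay gradually (AR pattern)
pacf_values = pacf(series, 3)   # expected to cut off after lag 1
```

With a positive phi the ACF values stay positive as they decay; rerunning with a negative phi (for example -0.7) should make them alternate in sign, as item 7d describes.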

Model Diagnostics:

1. After creating the ARIMA model, plot the residuals and their ACF. If they look
like white noise (no spikes outside the horizontal lines), then your model is
adequate. Also, test the residuals for autocorrelation to make sure the ARIMA
model fits the data: the Box-Pierce and Ljung-Box portmanteau tests, the
Durbin-Watson test, or the Breusch-Godfrey test.

Ljung-Box test interpretation: if the p-value > 0.05, do not reject the null
hypothesis of no autocorrelation.
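
The Ljung-Box rule above can be illustrated with a small plain-Python sketch; it is not the R code the notes refer to. It computes Q = n(n+2) * sum(r_k^2 / (n-k)) over the first h lags and compares it with a chi-square distribution. Several things here are assumptions made for the example: the function names, the `fitted_params` argument (the number of estimated ARMA parameters subtracted from the degrees of freedom), the two simulated series, and the restriction to an even number of degrees of freedom, which keeps the chi-square tail probability in closed form.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags, fitted_params=0):
    """Return (Q statistic, p-value) for the first `lags` autocorrelations."""
    n = len(residuals)
    r = acf(residuals, lags)
    q = n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, lags + 1))
    return q, chi2_sf_even(q, lags - fitted_params)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]       # white-noise-like
trended = [0.05 * t + e for t, e in enumerate(noise)]   # clearly autocorrelated

q_noise, p_noise = ljung_box(noise, lags=10)
q_trend, p_trend = ljung_box(trended, lags=10)
# p_trend falls far below 0.05, so "no autocorrelation" is rejected;
# for white noise the p-value is usually, but not always, above 0.05.
```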

Seasonal ARIMA:

The seasonal part is denoted by (P,D,Q), so the final model becomes (p,d,q)(P,D,Q).
Start by attaching (0,1,1) or (0,0,1) to a non-seasonal ARIMA model. Then check which
combination of non-seasonal and seasonal parts [such as (1,1,1)(0,1,1) or
(2,1,2)(0,0,1), etc.] has the lowest AICc value.

There are plenty of possible combinations. You might want to run auto.arima in R first, to
determine the seasonal part of the model.
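
The AICc comparison can be sketched as follows, in plain Python rather than R. The formula used is AICc = AIC + 2k(k+1)/(n-k-1) with AIC = -2 log L + 2k, where k is the number of estimated parameters and n the number of observations. The function names are invented for this example, and the log-likelihoods and parameter counts in the usage lines are hypothetical placeholders, not results from any fitted model.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected AIC: AIC plus a small-sample penalty."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_by_aicc(candidates, n_obs):
    """Pick the label with the lowest AICc.

    `candidates` maps a model label to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n_obs))


# Hypothetical placeholder values for illustration only.
candidates = {
    "(1,1,1)(0,1,1)": (-250.0, 4),
    "(2,1,2)(0,0,1)": (-249.5, 6),
}
chosen = best_by_aicc(candidates, n_obs=120)
```

In this made-up comparison the second model has a slightly higher log-likelihood but pays for its two extra parameters, so the first model has the lower AICc.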

References:

• Makridakis, S. (1998). Forecasting: Methods and Applications. Wiley.
• Otexts.org. (2018). Forecasting: principles and practice. [online] Available at:
https://www.otexts.org/fpp [Accessed 11 Apr. 2018].
• Ord, J.K., Fildes, R. & Kourentzes, N. (2017). Principles of Business
Forecasting. 2nd ed. New York: Wessex Press Inc.
• People.duke.edu. (2018). [online] Available at:
https://people.duke.edu/~rnau/Slides_on_ARIMA_models--Robert_Nau.pdf
[Accessed 11 Apr. 2018].
• Faculty.washington.edu. (2018). [online] Available at:
http://faculty.washington.edu/ezivot/econ584/notes/unitroot.pdf [Accessed 11 Apr.
2018].
