2010-11-17
ARIMA LAB
ECONOMIC TIME SERIES MODELING
FORECAST THE INFLATION IN RUSSIA
1. Introduction
1.1 About the lab
This lab introduces some basic time series techniques, including how to build an ARIMA model for
forecasting. The variable to be forecast in this lab is monthly inflation in Russia. The CPI data for
Russia is stored in russia.xls.
1.2 About the EViews student version
A student version of EViews 6 is available for 40:- at
http://www.timberlake.co.uk/Students/eviews.php
There are no serious restrictions except one: the student version only works for two years, which is a serious limitation.
2. Read in the data and transform the original series into usable variables
2.1 Intro and Identification
Read in the data. Use the calculator to form the log, the first and the second log differences of the series.
Then turn to the identification step.
The model to be found is an ARIMA model of the form

(1 - L)^d x_t = a(L) x_{t-1} + b(L) ε_t,

where (1 - L) is the difference operator1 with d = 1, and a(L) and b(L) represent the autoregressive (AR)
structure (the lag parameters) and the moving average (MA) structure, respectively. The term ε_t is a white
noise stochastic process. The idea is to use the autocorrelation function (ACF) and the partial
autocorrelation function (PACF) to identify the order of the process, and then to estimate it. In this lab you
are asked to estimate a large number of models to learn how ARIMA modelling works in practice.
IMPORTANT. Since we want to forecast, it is a good idea to extend the sample so that it includes the
forecast horizon as well. Go to the spreadsheet window for the data. Click on the first empty cell after the last
observation, and make that empty cell a missing observation. Once indicated as a missing observation, it will
be included in the forecast horizon. Add at least 8 observations.
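As a sketch of the same idea outside EViews (assuming the data were loaded into a pandas Series; the CPI numbers below are made up for illustration, while in the lab they come from russia.xls), the forecast-horizon placeholders can be appended as missing values:

```python
import numpy as np
import pandas as pd

# made-up monthly CPI values; in the lab these come from russia.xls
cpi = pd.Series([100.0, 101.2, 102.5, 103.1],
                index=pd.period_range("1999-01", periods=4, freq="M"))

# append 8 missing observations so the sample covers the forecast horizon
horizon = 8
extra = pd.period_range(cpi.index[-1] + 1, periods=horizon, freq="M")
extended = pd.concat([cpi, pd.Series(np.nan, index=extra)])

print(len(extended), int(extended.isna().sum()))
```

The NaN entries play the role of the "missing observations" in the EViews spreadsheet: estimation routines skip them, while forecast output can fill them.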
1 Fractional integration, and fractional differencing, means that d can take non-integer values. If -0.5 ≤ d < 0.5 the series
would still be an integrated series but long-run stationary, and if 0.5 ≤ d < 1.5 the series would be integrated and non-stationary.
Typically, you will need a log transformation, and first and second differences of the log. (In this case you
should also take the third difference, for reasons explained later).
Building an ARIMA model, or any dynamic econometric time series model, involves the following steps:2
1) Identification
2) Estimation
3) Testing
4) Re-estimation until a well-defined statistical model is found.
In the identification step you should do the following. Study the graphs of the data in log level and in
differences.
First, determine whether the series is stationary or non-stationary. Second, graph the series in level, in first
differences and in second differences, and of course graph the ACFs and PACFs. Third, find the approximate
lengths of the ACFs and PACFs; the latter will help you determine the number of lags to use in the estimated
models.
Compare the series in level with its first, second and third differences, and determine when the series becomes
stationary. Use the graph module to graph the ACF and the PACF. An integrated series has an ACF that
starts around unity (or 0.99 to 0.80) and dies out slowly, while the PACF cuts off after the first lag.
It is sometimes easy to see when you have over-differenced the data: the differenced series starts to display
a moving average process. Differencing is a form of temporal aggregation of the data, and temporal
aggregation typically gives rise to an MA process.
Second, look for structural breaks in the data; this can be important depending on the series. The effect of
structural breaks, as well as large outliers, is typically that the AR lag structure tends to become relatively
long. Adjusting the sample to avoid including (pre-)historical periods representing a different economic
policy regime is always a good idea in a univariate model.3 Also look for outliers and possible shifts in the
series that might need dummy variables.
Hint: for modelling the CPI of Russia, you might want to start the ARIMA modelling from 1999:07, and put
in two dummies for the outliers around the year 2000. The graphs will be very informative about this.
2 These steps represent the Box-Jenkins approach to time series modelling. You are supposed to explain how and
why we use these steps in some detail.
3 In a VAR model, or any structural model, you might have the choice to model the break in one variable with the
help of other variables in the model. Say that inflation takes off at some time; this could be caused by an expansion in
the money supply.
First, use the ACF and the PACF to identify seasonal effects in the stationary series. For this you
need to do a log transformation and take differences (that is, if you conclude that the series is non-stationary).
Typically, seasonality shows up as significant ACF values at the seasonal lags.
In the lab using Russian data you may skip adjusting for seasonal effects.4 This does not mean that they are not
there, only that the sample is too short to identify them. Again, look at the graphs! And look at section 4.1 if
you think dummies are needed to deal with outliers.
Seasonal effects can be dealt with through:
- Seasonal dummies in the regression (Centered or non-centered, see model formulation menu)5
- Seasonal differencing
- X12 program
Seasonal differencing is recommended by Box & Jenkins, but it might be too crude for the data: you
might impose a seasonal unit root on the process, which might not be a valid transformation.6
Seasonal (impulse) dummies are a standard tool. Centered seasonal dummies are better since they leave
the constant intact. Centered seasonal dummies with quarterly data include three dummy variables in the
regression, with values that sum to zero. Centered seasonal dummies are often the better solution.7
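One common construction of centered seasonal dummies, sketched here in Python with made-up quarterly dates (EViews can generate these for you, so this is only illustrative), subtracts the seasonal mean 1/4 from each 0/1 dummy so that the columns sum to zero over a complete year:

```python
import numpy as np
import pandas as pd

idx = pd.period_range("2000Q1", periods=12, freq="Q")
quarter = np.asarray(idx.quarter)

# plain 0/1 dummies for quarters 1..3 (quarter 4 is the base category)
raw = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])

# centering: subtract the seasonal mean 1/4, so each column sums to
# zero over a complete year and leaves the regression constant intact
centered = raw - 0.25

print(centered[:4].sum(axis=0))   # each column sums to 0 over one year
```

Because the centered columns sum to zero over a year, adding them to a regression does not shift the estimated constant, which is the advantage mentioned above.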
The X12 program is a black box of all sorts of transformations that remove seasonal effects. It is an ad hoc
procedure, but it works OK and is used by many official agencies that publish seasonally adjusted data. If
you don't like black boxes, don't use it: the program will do things with your data that you cannot
control. (In principle, X12 will not affect unit roots. Thus, it is possible, but not recommended, to apply the
technique first and then test for unit roots on the deseasonalized data.) Please remember though that the X12
procedure will remove degrees of freedom from your sample even though the program returns the same
number of observations. In effect you are estimating 12 seasonal variables with monthly data, and 4 seasonal
variables with quarterly data.
            ACF                  PACF
AR(p)       Tails off            Cuts off at lag p
MA(q)       Cuts off at lag q    Tails off
ARMA(p, q)  Tails off            Tails off

5 Any textbook in regression analysis explains the use of seasonal dummies.
6 If other methods and models don't work well, it might be worth coming back to seasonal differencing. Though, if
you are careful, you should test for seasonal unit roots before using seasonal differencing.
7 Observe that it is recommended that centered seasonal dummies be used on a sample with complete years, or exactly
the same number of observations for each season.
Seasonal effects typically show up as spikes at the seasonal frequencies. Of course, a white noise process has
no significant ACF or PACF. A random walk, or any integrated series, has a high first autocorrelation
coefficient, and its ACF dies out very slowly.
Keep your conclusions about the AR and MA components in mind.
Start by estimating a number of AR(p) models; second, estimate a number of MA(q) models; third, estimate a
number of ARMA(p, q) models; finally, pick the best model.
Start with a high lag order, 10-12. Given the sample size this is relatively long; you have to use your judgment
regarding the degrees of freedom in the model.
Estimate AR(12) down to AR(0). For each estimated model, test for autocorrelation and record the value of the AIC.
Start with AR(12): indicate AR order 12 in the Model Settings window and click OK.
For the estimation method, always click Maximum Likelihood. The MLE procedure might not converge; if there
is no convergence, the results and the model cannot be used. The solution is to find a better specification
(selection) of p and q.
In the results window, confirm that the model has converged. This is especially important for MA models.
The output shows the estimated parameters and their t-values. The t-values are only indicative as long as we
have not confirmed that there is no autocorrelation in the residual process (important!).
In the output, below the estimated model, you will find AIC.T (and AIC), Akaike's Information Criterion. This
criterion corresponds to an adjusted R2 value: it can be used to compare the residual variance (the value of the
likelihood function) across models. We want to minimize this criterion. Thus, among models with no
autocorrelation in the residuals, we pick the one with the smallest information criterion. AIC.T and AIC are
nearly equivalent and will generally give the same results. There are also other information criteria, reported by
other modules of the program; look under Help, Akaike for more info. The AIC was developed for
AR(p) models but works OK for other dynamic models as well.
Choose the best AR process according to the three design criteria above.
After finding a good AR model, do the same for MA models: estimate MA(q) models from q = 12 down to 1,
and pick the best MA(q) model. MA models are more difficult to estimate, and some might not converge.
Then estimate ARMA(p, q) models up to ARMA(2, 2). These models might also fail to converge.
Given all these models, you can quite mechanically select the model that fits the data best.
Remember that you are looking for a model with (in order)
i) white noise residuals (no autocorrelation in the residuals),
ii) low residual variance (a low information criterion),
iii) few parameters, and one that is understandable and makes economic sense (a parsimonious model).
Thus, among all possible AR(p), MA(q) and ARIMA(p, d, q) models, select the models with white noise error
terms.8 Among those with white noise errors, pick the model with the lowest information criterion. Different
people might find different models, given the same data set, depending on dummies, seasonal adjustment,
and other choices.
The interesting question is: how do you motivate your final model? Is your model consistent with the ACFs
and PACFs you saw in the beginning? Investigate the residuals from your estimated model through their
ACF and PACF.
8 In the context of ARIMA models, we identify white noise as no significant autocorrelation in the estimated residual process.
The program will produce a huge amount of output (if we ask for it), but we are only interested in a series the
program calls D11. This is the seasonally adjusted series (make sure D11 is marked in the Graphic analysis
window). Next, the program shows a graph with the original series and the adjusted series, so you can judge
whether there is a difference or not.
Return to the X12arima window and save the series under Test, Store in Database. In the following window,
mark Seasonally Adjusted (D11); D11 is the name the X12 procedure gives the series.
The outcome of using X12 is not always predictable. Sometimes you get a de-seasonalized series that is
easy to model, leading to a nice parsimonious model; sometimes it leads to complex models with long lag
lengths.
5. FORECAST INFLATION
Finally, produce some forecasts.
Question: what are your estimated inflation figures for the year 2008, expressed as annual inflation figures?
Give the 95% confidence intervals of your estimates on a yearly basis. This is easy if you know how to use
log differences. Hint: yearly inflation is Δ12 X_t = ln X_t - ln X_{t-12}. You will find the official inflation
figures and forecasts at some web page, I hope.
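The hint above can be worked through numerically. The CPI path below is made up for illustration (a constant 1% monthly log change); only Python's standard math module is used:

```python
import math

# a constant 1% monthly log change, purely for illustration
monthly = 0.01
log_cpi = [math.log(100.0) + monthly * t for t in range(13)]

annual_log_diff = log_cpi[12] - log_cpi[0]       # ln X_t - ln X_{t-12}
annual_inflation = math.exp(annual_log_diff) - 1.0

print(round(annual_log_diff, 2))     # 0.12
print(round(annual_inflation, 4))    # 0.1275, i.e. about 12.75% per year
```

Note that the annual log difference (0.12) understates the compounded annual rate (about 12.75%); the two are close only when inflation is low.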
The point here is that you should be able to see the final model from the original ACF and PACF
calculated from the first difference. You should also become familiar with standard time series concepts:
building and testing models to find a good fit, making predictions, and describing the characteristics of a
given time series. In addition, draw some economic conclusions from the graphs and the significant correlations.
Do you think your forecast is bad? Don't worry. This is, after all, only a single-equation model. From here we
can turn to transfer functions, rational distributed lag models, VAR models, Error Correction Models and
Vector Error Correction Models (in order of improvement). Finally, all forecasters will in the end use their
judgment and adjust the constant in the equation to get better forecasts, or at least adjust the forecast range.
An AR model can be estimated in several ways. In the saved workfile, go to Object and New Object.
Define an Equation and give it a name.
Formulate your model. For an AR(p) model write dlx ar(1) ar(2) ... ar(p)
Notice that, in this case, ar(1) means the first lag, ar(2) means the second lag, etc.
As long as you don't close the window, it will be helpful to you. Start with a long lag structure and work
down. For each estimated model, gather information: the information criterion (AIC) will do, plus the Q-stat.
The Q-stat should not be significant at any lag beyond p.9
Test for autocorrelation: in the output window, under View, find Residual Diagnostics and the Q-test. Of
course, there are no prob. values for the first p lags, since they are already eaten up by the lags.
The program suggests the number of lags to use, so use that.
If you look around you can also find the roots (also called latent or inverted roots) of the process, which you
can use to study the dynamics; in particular, check whether any roots are close to the unit circle, indicating a
unit root.
Using these criteria, find the best fitting AR model.
Next, do the same for MA models. These are formulated as dlx ma(1) ma(2) etc. Again, find the best MA
model using the same criteria.
Next, try combinations of AR and MA terms to find the best ARMA representation. Finally, pick the best model
of the three final models.
Learn about the AIC and the Q-stat, and notice how they are formulated.