
Bo Sjo

2010-11-17

ARIMA LAB
ECONOMIC TIME SERIES MODELING
FORECAST THE INFLATION IN RUSSIA

Send in a written report to bosjo@liu.se before Wednesday November 24, 2010.

Notice that this is a relatively complex and time consuming exercise.

1. Introduction
1.1 About the lab
This lab will introduce you to some basic time series techniques including how to build an ARIMA model for
forecasting. The variable to be forecasted, in this lab, is the monthly inflation in Russia. The CPI data of
Russia is stored in russia.xls.
1.2 General: About the EViews student version
There is an EViews ver. 6 for students, for 40:-, available at
http://www.timberlake.co.uk/Students/eviews.php
There are no serious restrictions except one: the student version will only work for 2 years. And that is a very serious limitation.

2. Read in the data and transform the original series to usable variables
2.1 Intro and Identification
Read in the data. Use the calculator to form the log, and the first and second log differences, of the series.
Then turn to the identification step.
The model to be found is an ARIMA model of the form

(1 − L)^d x_t = a(L) x_{t−1} + b(L) ε_t,

where (1 − L) is the difference operator with d = 1¹, and a(L) and b(L) represent the autoregressive (AR) lag structure and the moving average (MA) structure, respectively. The term ε_t is a white noise stochastic process. The idea is to use the Auto-Correlation Function (ACF) and the Partial Auto-Correlation Function (PACF) to identify the order of the process, and then to estimate it. In this lab you are asked to estimate a large number of models, to learn how ARIMA modelling works in practice.

IMPORTANT: Since we want to forecast, it is a good idea to extend the sample so that it includes the forecast horizon as well. Go to the spreadsheet window for the data. Click on the first empty cell after the last observation, and make that empty cell a missing observation. Once indicated as a missing observation, it will be included in the forecast horizon. Add at least 8 observations.
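The same sample extension can be sketched outside EViews. A minimal pandas illustration with made-up CPI values (the names `cpi` and `horizon` are assumptions for this sketch, not part of the lab):

```python
import pandas as pd

# Extend the sample with missing observations so the forecast horizon is
# part of the dataset, mirroring the empty cells added in the spreadsheet view.
cpi = pd.Series([100.0, 101.2, 102.5],
                index=pd.period_range("2007-10", periods=3, freq="M"))

horizon = 8  # the lab asks for at least 8 extra observations
extended = cpi.reindex(pd.period_range(cpi.index[0],
                                       periods=len(cpi) + horizon, freq="M"))
```

The reindexed series keeps the observed values and fills the extra months with NaN, which plays the role of EViews' missing observations.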

1 Fractional integration, and fractional differencing, means that d can take non-integer values. If −0.5 ≤ d < 0.5 the series would still be an integrated series but long-run stationary, and if 0.5 ≤ d < 1.5 the series would be integrated and non-stationary.

Typically, you will need a log transformation, and first and second differences of the log. (In this case you should also take the third difference, for reasons explained later.)
Building an ARIMA model, or any dynamic econometric time series model, involves the following steps:²
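As a cross-check of the transformations, here is a hypothetical Python sketch (the lab itself uses EViews; the CPI numbers below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical CPI levels standing in for the series in russia.xls.
cpi = pd.Series([100.0, 102.0, 105.1, 109.3, 112.0, 114.2, 115.9, 118.4],
                index=pd.period_range("1999-07", periods=8, freq="M"))

lx = np.log(cpi)     # log level
dlx = lx.diff()      # first log difference (monthly inflation)
ddlx = dlx.diff()    # second difference
dddlx = ddlx.diff()  # third difference, mentioned in the text
```

Each difference costs one observation at the start of the sample, which is why the differenced series begin with missing values.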
1) Identification
2) Estimation
3) Testing
4) Re-estimation until a well-defined statistical model is found.
In the Identification step you should do the following. Study the graphs of the data in log level and
differences.

First you need to determine whether the series is stationary or non-stationary. Graph the series in level, in first differences and in second differences, and of course graph the ACFs and PACFs. Then find the approximate lengths of the ACFs and PACFs; these will help you to determine the number of lags used in the estimated models.
Compare the series in level, with first, second and third differences. Determine when the series becomes
stationary. Use the graph module to graph the ACF and the PACF. An integrated series has an ACF that
starts around unity (or 0.99 to 0.80) and dies out slowly, while the PACF cuts off after the first lags.
It is sometimes easy to see when you have over-differenced the data. Over-differencing is spotted when the differenced series starts to display a moving average process. Differencing is a form of temporal aggregation of the data, and temporal aggregation typically gives rise to an MA process.
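The ACF patterns described above can be reproduced with simulated data. A small numpy sketch (the `acf` helper is written here for illustration, it is not part of the lab software):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function, lags 0..nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([1.0] + [np.sum(x[k:] * x[:-k]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(0)
rw = np.cumsum(rng.standard_normal(500))  # random walk: integrated of order 1

acf_rw = acf(rw, 10)             # starts near 1 and dies out slowly
acf_diff = acf(np.diff(rw), 10)  # after first differencing: roughly white noise
```

Differencing once turns the integrated series into (approximately) white noise; differencing it again would plant an MA(1) signature in the ACF, which is the over-differencing symptom the text describes.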
Next, look for structural breaks in the data; this can be important depending on the series. The effect of structural breaks, as well as large outliers, is typically that the AR lag structure tends to become relatively long. Adjusting the sample to avoid including (pre-)historical periods representing a different economic policy regime is always a good idea in a univariate model.³ Also look for outliers and possible shifts in the series that might need dummy variables.
Hint: For modeling the CPI of Russia, you might want to start the ARIMA modeling from 1999:07, and put in two dummies for the outlier around the year 2000. The graphs will be very informative about this.

2.2 Perform Seasonal Adjustment


2 These steps represent the Box-Jenkins approach to time series modelling. You are supposed to explain how and why we use these steps in some detail.
3 In a VAR model, or any structural model, you might have the choice to model the break in one variable with the help of other variables in the model. Say that inflation takes off at some time; this could be caused by an expansion in the money supply.

First, use the ACF and the PACF to identify seasonal effects in the stationary series. For this you need to do a log transformation and take differences; that is, if you conclude that the series is non-stationary. Typically, seasonality is indicated through significant ACFs at the seasonal lags.
In the lab using Russian data you might skip adjusting for seasonal effects. This does not mean that they are not there, only that the sample is too short to identify them. Again, look at the graphs! And look at section 4.1 if you think dummies are needed to deal with outliers.
Seasonal effects can be dealt with through:
- Seasonal dummies in the regression (centered or non-centered, see the model formulation menu)⁵
- Seasonal differencing
- The X12 program
Seasonal differencing is recommended by Box & Jenkins, but it might be too crude for the data: you might impose a seasonal unit root on the process, which might not be a valid transformation.⁶
Seasonal (impulse) dummies are a standard tool. Centered seasonal dummies are better since they leave the constant intact. Centered seasonal dummies with quarterly data include three dummy variables in the regression with values that sum to zero. Centred seasonal dummies are often the better solution.⁷
The X12 program is a black box of all sorts of transformations that remove seasonal effects. It is an ad hoc procedure, but it works o.k. and is used by many official departments that publish seasonally adjusted data. If you don't like black boxes then don't use it: the program will do things with your data that you cannot control. (In principle, X12 will not affect unit roots. Thus, it is possible, but not recommended, to apply the technique first and then test for unit roots on the deseasonalized data.) Please remember though that the X12 procedure will remove degrees of freedom from your sample even though the program returns the same number of observations. In effect you are estimating 12 seasonal variables in monthly data, and 4 seasonal variables in quarterly data.
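Centered seasonal dummies can be constructed by hand. The sketch below uses one common convention (own month = 1, omitted month = −1; this is an assumption for illustration, so check the program's own definition before relying on it):

```python
import pandas as pd

# Centered seasonal dummies for monthly data: 11 dummies, each equal to 1
# in its own month and -1 in the omitted month (December), so every column
# sums to zero over complete years and the constant term is left intact.
idx = pd.period_range("1999-01", periods=48, freq="M")  # four complete years
month = idx.month

D = pd.DataFrame({f"s{m}": (month == m).astype(float) - (month == 12).astype(float)
                  for m in range(1, 12)}, index=idx)
```

The zero column sums are exactly the property the text highlights: adding these regressors does not shift the estimated constant.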

2.3 Identification after deciding on differencing and seasonality


After finding d in ARIMA(p, d, q), the next step is to identify p and q, or get some idea about them. To our help we have the following scheme:

              ACF                  PACF
AR(p)         Tails off            Cuts off at lag p
MA(q)         Cuts off at lag q    Tails off
ARMA(p, q)    Tails off            Tails off

5 Any textbook in regression analysis explains the use of seasonal dummies.
6 If other methods and models don't work well, it might be worth coming back to seasonal differencing. Though, if you are careful, you should test for seasonal unit roots before using seasonal differencing.
7 Observe that it is recommended that centred seasonal dummies have a sample with complete years, or exactly the same number of observations for each seasonal dummy.

Seasonal effects typically show up as spikes at the seasonal frequencies. Of course, a white noise process has no significant ACF or PACF. A random walk, or an integrated series, has a high first autocorrelation coefficient and an ACF that dies out very slowly.
Remember your conclusions about the AR and MA components.
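The scheme in the table can be verified on simulated processes. A numpy sketch with assumed coefficients (0.7 in both cases, chosen only for illustration):

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function, lags 0..nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([1.0] + [np.sum(x[k:] * x[:-k]) / denom
                             for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
e = rng.standard_normal(2000)

ar1 = np.zeros(2000)            # AR(1) with coefficient 0.7: ACF tails off
for t in range(1, 2000):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]

ma1 = e[1:] + 0.7 * e[:-1]      # MA(1): ACF cuts off after lag 1

acf_ar = acf(ar1, 5)
acf_ma = acf(ma1, 5)
```

For the AR(1) the ACF decays geometrically (0.7, 0.49, ...), while for the MA(1) only the lag-1 autocorrelation, 0.7/(1 + 0.7²) ≈ 0.47, is non-zero.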

3. Building and Estimating the ARIMA Model


Start with the log of CPI in levels. Is the log of CPI stationary? How many times do you need to difference to achieve stationarity? This is first of all a matter of judgment (in a forthcoming lab you will learn to back up your judgment with statistical tests).
After deciding on appropriate differencing, and having formed some initial conclusions about the appropriate process and the order of p and q, continue with Estimation.
Our aim is to find the exact order of the AR and MA processes. The order of differencing has been decided
upon already.
We have three criteria for selecting the final model and lag order:
1) The estimated residuals should display no significant autocorrelation (very important). Use the Box-Ljung (portmanteau) test, and look at the ACFs and PACFs of the residuals, perhaps in combination with some other test of autocorrelation if that is available.
2) The final model should have the lowest possible residual variance among all models with no autocorrelation.
3) The model should be parsimonious and easy to understand. (We don't want long lag structures, or the inclusion of significant lags beyond what can be meaningful from an economic perspective.) Use your judgment.
Whether there is autocorrelation in the residuals can be judged by inspecting the ACF and the PACF of the residuals, in combination with the test above. The standard test for ARMA models is the portmanteau test (Box-Ljung test). This test builds on the assumption that, under the null of white noise, the estimated ACFs are zero-mean and asymptotically normally distributed. Take the square of each ACF and sum the squares (scaled by the sample size): the outcome is a chi-square distributed random variable whose value should not be significantly different from zero. The test makes some additional adjustments for degrees of freedom and small-sample corrections. (Look up the test in any textbook and confirm this!)
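The Box-Ljung statistic is simple enough to compute by hand, which is one way to "confirm this". A numpy sketch (the 5% critical value 18.31 for 10 degrees of freedom is a standard chi-square table entry):

```python
import numpy as np

def ljung_box(x, h):
    """Ljung-Box Q statistic over lags 1..h. Under white noise it is
    approximately chi-square with h degrees of freedom (for ARMA residuals,
    subtract the number of estimated ARMA parameters from h)."""
    x = np.asarray(x, float) - np.mean(x)
    T = len(x)
    denom = np.sum(x * x)
    rho = np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, h + 1)])
    return T * (T + 2) * np.sum(rho ** 2 / (T - np.arange(1, h + 1)))

rng = np.random.default_rng(2)
white = rng.standard_normal(500)
q_white = ljung_box(white, 10)          # small: no autocorrelation
q_rw = ljung_box(np.cumsum(white), 10)  # huge: heavy autocorrelation
```

Comparing q_white to the 5% chi-square critical value for 10 df (about 18.31) illustrates the accept/reject logic the lab asks you to apply to the residuals.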

Start by estimating a number of AR(p) models; second, estimate a number of MA(q) models; third, estimate a number of ARMA(p, q) models; finally, pick the best model.

Start with a high lag order, 10-12. Given the sample size this is relatively long; you have to use your judgment regarding the degrees of freedom in the model.
Model AR(12) down to AR(0). For each estimated model, test for autocorrelation and note the value of the AIC.
Start with AR(12): indicate AR order 12 in the Model Settings window and click OK.
For the estimation method, always click Maximum Likelihood. The MLE procedure might not converge. If there is no convergence, the results and the model cannot be used; the solution is to find a better specification (selection) of p and q.
In the results window, confirm that the model has converged. This is important for MA models.
The output shows the estimated parameters and the t-values. The t-values are only indicative as long as we have not confirmed that there is no autocorrelation in the residual process (important!).

In the output, below the estimated model, you will find AIC.T (and AIC), Akaike's Information Criterion. This criterion corresponds to an adjusted R² value. It can be used to compare the residual variance (the value of the likelihood function) across models. We want to minimize this criterion. Thus, among models with no autocorrelation in the residuals, we pick the one with the smallest information criterion. AIC.T and AIC are quite similar, and will generally give the same results. There are also other information criteria, reported by other modules of the program. Look under Help, Akaike for more info. The AIC criterion was developed for AR(p) models but works o.k. for other dynamic models as well.
Choose the best AR process according to the three design criteria above.
After finding a good AR model, do the same for MA models: estimate MA(q) models from q = 12 down to 1, and pick the best MA(q) model. MA models are more difficult to estimate; some models might not converge.
Then estimate ARMA(p, q) models up to ARMA(2, 2). These models, too, might not converge.
Given all these models, you can quite mechanically select a model that fits the data.
Remember that you are looking for a model with (in order):
i) white noise residuals (no autocorrelation in the residuals),
ii) low residual variance (a low information criterion),
iii) few parameters, and one which is understandable and makes common sense (a parsimonious model).
Thus, among all possible AR(p), MA(q) and ARIMA(p, d, q) models, select the models with white noise error terms.⁸ Among those with white noise errors, pick the model with the lowest information criterion. Different people might find different models, given the same data set, depending on dummies, seasonal adjustment, and other choices.
The interesting question is how do you motivate your final model?

8 In the context of ARIMA models, we identify white noise as no significant autocorrelation in the estimated residual process.

Is your model consistent with the ACF:s and PACF:s you saw in the beginning?

4. IMPROVING THE MODEL


4.1 Dummies
The graphs of the series, and most of all the residuals of the ARIMA model, indicate outliers. These models are sensitive to outliers. Graph the residuals (under the test menu) and look for big outliers. Save the residuals and look at the numbers and the graph.
How do you search for outliers? First, you might of course inspect the estimated residuals in a graph. But you can also ask the program to list extreme residual values. The simplest way of doing this is to run an OLS regression of the variable of interest against a constant and inspect the residuals.
Usually the model and the tests will improve after including a few dummies. Sometimes it is possible to get white noise, normally distributed residuals after removing some extreme outliers. Be careful though: adding dummies might just hide the poor fit of the model instead of improving it.
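Listing extreme residuals and building impulse dummies can be sketched as follows; the residual series, the 3-standard-deviation cutoff and the planted outlier are all assumptions for illustration:

```python
import numpy as np
import pandas as pd

idx = pd.period_range("1999-07", periods=60, freq="M")
rng = np.random.default_rng(4)
resid = pd.Series(0.01 * rng.standard_normal(60), index=idx)
resid.iloc[20] += 0.08  # plant an artificial outlier for the sketch

# List extreme residuals: more than 3 standard deviations from the mean.
z = (resid - resid.mean()) / resid.std()
outliers = resid[z.abs() > 3]

# One impulse dummy per detected outlier month.
dummies = pd.DataFrame({f"d_{p}": (idx == p).astype(float)
                        for p in outliers.index}, index=idx)
```

Each dummy is 1 in exactly one month and 0 elsewhere, which is the impulse-dummy form the lab asks you to add for the outliers around 2000.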

Investigate the residual from your estimated model through the ACF & PACF.

4.2 More on X12 and Seasonal Dummies


X12, seasonal dummies, or no seasonal effects?
One might choose between using X12 and seasonal dummies in the model. Centered seasonal dummies should be the first choice; if they do not work well, one might try X12. Centered seasonal dummies have the advantage that they sum to zero. Thus, their inclusion in the model will not affect the estimated constant term. In time series modeling this can be important, since the constant reflects the average growth rate in the sample. You can test whether the seasonal dummies are significant: under the test menu, choose an exclusion test and exclude the dummies. The outcome is an F-test with H0: the seasonal dummies have zero parameters.

The program will produce a huge amount of output (if we ask for it), but we are only interested in a series the program calls D11. The latter is the seasonally adjusted series (make sure D11 is marked in the Graphic analysis window). Next, the program shows a graph with the original series and the adjusted series. You can now judge whether there is a difference or not.
Return to the X12arima window and save the series under Test, Store in Database. In the following window, mark Seasonally Adjusted (D11); D11 is the name given to the series by the X12 procedure.
The outcome of using X12 is not always predictable. Sometimes you get a de-seasonalized series which is easy to model, leading to a nice parsimonious model; sometimes it leads to complex models with long lag lengths.

5. FORECAST INFLATION
Finally, produce some forecasts.
Question: what are your estimated inflation figures for the year 2008, expressed as annual inflation figures? Give the 95% confidence intervals of your estimates on a yearly basis. This is easy if you know how to use log differences. Hint: yearly inflation is Δ₁₂ ln X_t = ln X_t − ln X_{t−12}. You will find the official inflation figures and forecasts at some web page, I hope.
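The hint can be checked numerically: the yearly log difference equals the sum of the last 12 monthly log differences. A sketch with a stylized log-CPI series (a constant 1% monthly inflation, assumed for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly log CPI growing by 0.01 per month.
lx = pd.Series(np.log(100.0) + 0.01 * np.arange(24),
               index=pd.period_range("2007-01", periods=24, freq="M"))

annual = lx - lx.shift(12)                 # ln X_t - ln X_{t-12}
monthly = lx.diff()                        # monthly log differences
annual_from_monthly = monthly.rolling(12).sum()  # same thing, built up
```

Because log differences are additive across periods, converting monthly forecasts (and their confidence intervals) to a yearly basis is just a matter of summing.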

6. LEARNING OBJECTIVES AND LAB INSTRUCTIONS


MOTIVATE YOUR FINAL CHOICE OF MODEL
COMMENT ON YOUR RESULTS (forecasts) AND CONCLUSIONS
COMMENT ON THE WEAKNESS OF THE MODEL.
In the end time series modeling is about choices that you make and must defend.
What is needed?
- A final ARIMA model.
- Motivations for choosing this model.
- Forecasts as described above.
- Confidence intervals for the estimates.
- A graph comparing seasonal and seasonally adjusted series.
- A comparison with actual inflation rates during 2004.
Remember, this is a real-life experiment. Time series modeling is about judgment. Experienced forecasters will always look at the estimates and consider whether they are reasonable or not. If not, a common tactic is to adjust the constant in the model, the point from which you start your forecasts. There is no totally correct answer. Hence, the reader should be convinced, after reading your report, that your model is as good as it can get. And you should not show 100 different estimates of 100 different models or graphs. Consider which basic figures a reader might want to look at.

The point here is that you should be able to see the final model from the original ACF and PACF functions calculated from the first difference. You should also become familiar with standard time series concepts, with building and testing models to find a good fit and make predictions, and with describing the characteristics of a given time series. In addition, draw some economic conclusions from the graphs and significant correlations.
Do you think your forecast is bad? Don't worry. This is, after all, only a single-equation model. From here we can turn to transfer functions, rational distributed lag models, VAR models, Error Correction Models and Vector Error Correction Models (in order of improvement). Finally, all forecasters will in the end use their judgment and adjust the constant in the equation to get better forecasts, or at least adjust the forecast range.

7. UPDATE ON USING EVIEWS


The manuals for EViews are stored in the same subdirectory as the program on the hard drive of the computer.
To read the data from Excel: File, Open, Foreign Data as Workfile.
After loading the data, highlight the Workfile and save it in EViews format.
If you double click on a series you are taken to a workfile window for that series. It is your choice whether you want to use this facility at this time.
Generate logs of series. Under Genr (or Object and Generate series) write
lx = log(x)
for the natural log of x. Remember that EViews doesn't like capital letters. For the first difference write, again in the Generate window, dlx = lx - lx(-1). Note that EViews, like most time series programs, uses (-1) for the first lag and (1) for the first lead, etc.
Study the graphs of lx and dlx (and maybe ddlx) to learn about non-stationarity and outliers.
Do not use the EViews facility to calculate per cent differences. We want the first (natural) log difference because it is a close approximation to the per cent change and represents continuous compounding.
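How close the approximation is can be seen with a single number (a 2% price increase, chosen as an example):

```python
import numpy as np

x_prev, x_now = 100.0, 102.0
pct = (x_now - x_prev) / x_prev            # ordinary per cent change: 0.02
logdiff = np.log(x_now) - np.log(x_prev)   # continuously compounded: ~0.0198

# The two agree closely for small changes, and log differences add up
# across periods, which is why yearly inflation is a sum of monthly ones.
gap = pct - logdiff
```

For inflation rates of a few per cent per month the gap is negligible, but it grows with the size of the change.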
If you double click on the series, you can, under View, find Descriptive statistics and tests, and Correlogram. Correlogram gives you the estimated ACFs and PACFs in combination with the Box-Ljung test (Q) for autocorrelation.
Use the Correlogram to identify the possible ARIMA process.
Notice that the correlogram can be reached in other ways in the program, such as Quick and Series Statistics.
How to estimate ARMA models:
After deciding on structural shifts (sample length) and on the differencing needed to achieve stationarity (the order of integration), estimate a number of AR, MA and ARMA models.
Even though the program can be told to take the first difference and then model the ARIMA, wait with that option unless you understand what it is doing.

An AR model can be estimated in several ways. Under the saved workfile, go to Object and New object. Define an Equation and give it a name.
Formulate your model. For an AR(p) model write: dlx ar(1) ar(2) ... ar(p)
Notice that, in this case, (1) means the first lag, (2) means the second lag, etc.
As long as you don't close the window, it will be helpful for you. Start with a long lag structure and work down. For each estimated model, gather information about the information criterion (AIC will do) and the Q-stat. The Q-stat should not be significant at any point after p.⁹
Test whether there is autocorrelation. In the output window, under View, find Residual diagnostics and the Q-test. Of course, there are no prob. values for the first p lags, since those degrees of freedom are already eaten by the lags.
The program suggests the number of lags to use, so use that.
If you look around you can also find the roots (= latent = inverted roots) of the process, to study the dynamics; in particular, whether there are any roots close to the unit circle, indicating a unit root.
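The inverted roots can also be computed directly. A numpy sketch with assumed AR(2) coefficients (0.6 and −0.3, chosen for illustration):

```python
import numpy as np

# The inverted roots of an AR(2) are the roots of
# lambda^2 - phi1*lambda - phi2 = 0; the process is stationary when they
# all lie strictly inside the unit circle, and a modulus near 1 hints at
# a unit root.
phi = np.array([0.6, -0.3])
inv_roots = np.roots(np.r_[1.0, -phi])
stationary = bool(np.all(np.abs(inv_roots) < 1))
```

Here the roots are a complex pair with modulus sqrt(0.3) ≈ 0.55, safely inside the unit circle.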
Using these criteria, find the best fitting AR model.
Next, do the same for MA models. These are formulated as: dlx ma(1) ma(2) etc. Again, find the best MA model using the same criteria.
Next, try combinations of AR and MA terms to find the best ARMA representation. Finally, pick the best model of the three final models.

9 Learn about the AIC and the Q-stat; notice how they are formulated.
