
A GUIDE TO BOX-JENKINS MODELING

By George C. S. Wang

Describes in simple language how to use Box-Jenkins models for forecasting ... the key requirement of Box-Jenkins modeling is that the time series is either stationary or can be transformed into one ... the most difficult part in this type of modeling is the identification of a model.

GEORGE C. S. WANG
Dr. Wang is currently an independent consultant, specializing in statistical modeling and business forecasting. Formerly, he was Forecast Manager at Consolidated Edison Company of New York, where he was responsible for forecast modeling and forecasting. He has also served as a Company witness and given testimony in regulatory proceedings. He is the co-author of the book Regression Analysis: Modeling and Forecasting. He received his M.B.A. and Ph.D. degrees from New York University.

George Box and Gwilym Jenkins developed a statistical approach for time series modeling. Time series models developed on the basis of their approach are called Box-Jenkins models, also known as ARIMA models. A time series can be defined as a sequence of data observed over time.

ARIMA models are univariate, that is, they are based on a single time series variable. Box and Jenkins also developed procedures for multivariate modeling. In practice, however, even their univariate approach is sometimes not as well understood as the classic regression method. The objective of this article is to describe the basics of univariate Box-Jenkins models in simple, layman's terms.

UNIVARIATE MODELING

The purpose of univariate modeling is to establish a relationship between the present value of a time series and its past values so that forecasts can be made on the basis of the past values alone.

Stationary Time Series: The first requirement for univariate Box-Jenkins modeling is that the time series data to be modeled are either stationary or can be transformed into a stationary series. A stationary time series can be defined as one that has a constant mean and no trend over time. A plot of the data is usually enough to see whether the data are stationary. In practice, few time series meet this condition, but as long as the data can be transformed into a stationary series, a Box-Jenkins model can be developed.

THE MODELING PROCESS

Box-Jenkins modeling of a stationary time series involves the following four steps:

1. Model identification
2. Model estimation
3. Diagnostic checking
4. Forecasting

The four steps are similar to those required for linear regression, except that Step 1 is a little more involved. Box-Jenkins uses a statistical procedure to identify a model, which can be confusing. The other three steps are quite straightforward. Let's first discuss the mechanics of Step 1, model identification, in some detail. Then we will use an example to illustrate the whole modeling process.

MODEL IDENTIFICATION

ARIMA stands for Autoregressive-Integrated-Moving Average. The letter "I" (Integrated) indicates that the time series being modeled has been transformed into a stationary time series. ARIMA represents three different types of models: an AR (autoregressive) model, an MA (moving average) model, or an ARMA model, which includes both AR and MA terms. Notice that we have dropped the "I" from ARIMA for simplicity. Let's briefly define these three model forms.

AR Model: An AR model looks like a linear regression model, except that in a regression model the dependent variable and the independent variables are different, whereas in an AR model the independent variables are simply time-lagged values of the dependent variable; hence the name autoregressive. An AR model can include different numbers of autoregressive terms. If an AR model includes only one autoregressive term, it is an AR (1) model; we can also have AR (2), AR (3), etc. An AR model can be linear or nonlinear.
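Because an AR model is a regression of the series on its own lagged values, it can be fitted by ordinary least squares. The minimal sketch below, in plain Python, builds the lagged regressors and solves the normal equations. The helpers `solve` and `fit_ar` are hypothetical names of our own, not part of any library, and the noise-free series is made-up illustrative data; in practice a statistics package would do this work.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar(y, lags):
    """OLS fit of y_t = c + sum(phi_j * y_{t - lag_j}) + a_t.

    The independent variables are lagged copies of the dependent variable,
    which is what makes the model autoregressive. Returns [c, phi_1, ..., phi_p].
    """
    start = max(lags)  # the first `start` observations lack a complete set of lags
    rows = [[1.0] + [y[t - lag] for lag in lags] for t in range(start, len(y))]
    target = y[start:]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    return solve(xtx, xty)


# Noise-free AR (1) series y_t = 1 + 0.5 * y_{t-1}; OLS should recover c = 1 and phi = 0.5.
series = [0.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])
c, phi = fit_ar(series, [1])
print(round(c, 6), round(phi, 6))  # 1.0 0.5
```

Passing `lags=[4]` instead would fit a single seasonal autoregressive term of lag 4, the kind of model identified in the example later in the article.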

THE JOURNAL OF BUSINESS FORECASTING, SPRING 2008


FIGURE 1. THEORETICAL ACF AND PACF CORRELOGRAMS (panels 1a, 2a, 3a: ACF; panels 1b, 2b, 3b: PACF; horizontal axes: time lags)

MA Model: An MA model is a weighted moving average of a fixed number of forecast errors produced in the past; hence the name moving average. Unlike a traditional moving average, the weights in an MA model are not equal and do not sum to 1. In a traditional moving average, the weight assigned to each of the n values being averaged equals 1/n; the n weights are equal and add up to 1. In an MA model, the number of terms and the weight of each term are determined statistically from the pattern of the data; the weights are not equal and do not add up to 1. Usually, in an MA model, the most recent value carries a larger weight than the more distant values.

For a stationary time series, one may use its mean or the immediate past value as a forecast for the next period. Each forecast will produce a forecast error. If the errors so produced in the past exhibit any pattern, we can develop an MA model. Notice that these forecast errors are not observed values; they are generated values. All MA models, such as MA (1), MA (2), and MA (3), are nonlinear.
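The statement that MA errors are generated rather than observed can be made concrete. The minimal sketch below assumes the common Box-Jenkins sign convention $y_t = c + a_t - \theta a_{t-1}$ for an MA (1) and a starting error $a_0 = 0$; both are our assumptions, not details given in the article. For any trial value of $\theta$ the past errors must be rebuilt recursively, so the sum of squared errors is not a quadratic function of $\theta$ and ordinary least squares cannot be applied directly. The grid search is a deliberately crude stand-in for the nonlinear least squares routines in statistical software, and the data are made up.

```python
def ma1_errors(y, c, theta):
    """Generate the forecast errors a_t of the MA (1) model y_t = c + a_t - theta * a_{t-1}.

    The errors are not observed: each one depends on the previous generated error,
    starting from the assumed value a_0 = 0.
    """
    errors = []
    previous = 0.0  # assumed a_0; the true starting error is unknown
    for value in y:
        forecast = c - theta * previous  # one-step forecast of y_t
        error = value - forecast
        errors.append(error)
        previous = error
    return errors


def sse(y, c, theta):
    """Sum of squared generated errors for a trial value of theta."""
    return sum(e * e for e in ma1_errors(y, c, theta))


def fit_theta_by_grid(y, c):
    """Crude nonlinear search over theta = -0.95, -0.94, ..., 0.95 (illustration only)."""
    grid = [k / 100 for k in range(-95, 96)]
    return min(grid, key=lambda theta: sse(y, c, theta))


y = [1.0, 0.5, -0.25, 0.8, -0.4, 0.3]  # made-up illustrative data
theta_hat = fit_theta_by_grid(y, c=0.0)
print(theta_hat, round(sse(y, 0.0, theta_hat), 4))
```

Changing `theta` changes every generated error after the first, which is why MA models need an iterative, nonlinear estimation procedure.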

ARMA Model: An ARMA model includes both AR and MA terms.

Given a stationary time series, we must first identify an appropriate model form. Is it an AR, an MA, or an ARMA? How many terms do we need in the identified model? To answer these questions, we need to calculate the autocorrelation function and the partial autocorrelation function of the series.
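As a preview of how these two functions separate the model forms, the sketch below computes theoretical (population) values for the simplest cases. For an AR (1) with coefficient $\phi$, the autocorrelation at lag $k$ is $\phi^k$; for an MA (1), written here as $y_t = a_t - \theta a_{t-1}$ (our sign convention), only lag 1 is nonzero, at $-\theta/(1+\theta^2)$. The partial autocorrelations are obtained with the Durbin-Levinson recursion, a standard shortcut we substitute for running successive regressions; for population autocorrelations the two give the same values. The function names are our own.

```python
def ar1_acf(phi, max_lag):
    """Theoretical ACF of a stationary AR (1): rho_k = phi ** k for k = 1..max_lag."""
    return [phi ** k for k in range(1, max_lag + 1)]


def ma1_acf(theta, max_lag):
    """Theoretical ACF of the MA (1) y_t = a_t - theta * a_{t-1}: only lag 1 is nonzero."""
    return [-theta / (1 + theta ** 2)] + [0.0] * (max_lag - 1)


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: partial autocorrelations phi_kk from rho_1..rho_K."""
    pacf, phi = [], []  # phi holds the coefficients of the order-(k-1) fit
    for k in range(1, len(rho) + 1):
        num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


print("AR (1) ACF: ", [round(v, 3) for v in ar1_acf(0.7, 5)])
print("AR (1) PACF:", [round(v, 3) for v in pacf_from_acf(ar1_acf(0.7, 5))])
print("MA (1) ACF: ", [round(v, 3) for v in ma1_acf(0.7, 5)])
print("MA (1) PACF:", [round(v, 3) for v in pacf_from_acf(ma1_acf(0.7, 5))])
```

The AR (1) rows show an ACF that dies down gradually and a PACF with a single spike at lag 1; the MA (1) rows show the reverse, which is the exchange of patterns the identification rules rely on.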

What are the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF)? Without going into the mathematics, ACF values, which fall between -1 and +1, are calculated from the time series at different lags to measure the significance of the correlations between the present observation and past observations, and to determine how far back in time (i.e., over how many time lags) they are correlated.

PACF values are the coefficients of a linear regression of the time series on its own lagged values. When the regression includes only one independent variable, lagged one period, the coefficient of that variable is called the first-order partial autocorrelation; when a second term, lagged two periods, is added to the regression, the coefficient of the second term is called the second-order partial autocorrelation, and so on. The PACF values also fall between -1 and +1 if the time series is stationary.

How do we use the pair of ACF and PACF functions to identify an appropriate model? A plot of the pair gives us a good indication of what type of model we want to entertain. The plot of a pair of ACF and PACF is called a correlogram. Figure 1 shows three pairs of theoretical ACF and PACF correlograms.

In modeling, if the actual correlogram looks like one of these three theoretical correlograms, in which the ACF diminishes quickly and the PACF has only one large spike, we will choose an AR (1) model for the data. The "1" in parentheses indicates that the AR model needs only one autoregressive term, and the model is an AR of order 1.

Notice that the ACF patterns in 2a and 3a are the same, but the large PACF spike in 2b occurs at lag 1, whereas in 3b it occurs at lag 4. Although both correlograms suggest an AR (1) model for the data, the 2a and 2b pattern indicates that the one autoregressive term in the model is of lag 1, whereas the 3a and 3b pattern indicates that it is of lag 4. If this lag 4 term is to represent seasonality of period 4, we will denote this model as SAR (4) or AR (4^) to distinguish it from an AR (4) model, which includes four autoregressive terms.

Suppose that in Figure 1 the ACF and PACF exchange their patterns, that is, the PACF looks like the ACF and the ACF looks like the PACF, having only one large spike; then we will choose an MA (1) model. Suppose instead that the PACF in each pair looks the same as the ACF; then we will try an ARMA (1, 1).

So far we have described the simplest AR, MA, and ARMA models. Higher-order models can, of course, be identified in the same way from different correlogram patterns. Let's use an example to demonstrate what we have just discussed.

An Example

Table 1 shows the quarterly electric demand in New York City from the first quarter of 1995 through the fourth quarter of 2005. The demand is a time series.

TABLE 1
QUARTERLY ELECTRIC DEMAND

Year and quarter are coded YYQQ (9501 = 1995 Q1); Sales = $Y_t$; differenced data $y_t = Y_t - Y_{t-4}$.

Original Data                       Differenced Data
(1)         (2)   (3)         (4)   (5)         (6)   (7)         (8)
Yr&Qt.    Sales   Yr&Qt.    Sales   Yr&Qt.      y_t   Yr&Qt.      y_t
9501      22.91   0003      33.36   9501          -   0003       0.16
9502      20.63   0004      23.50   9502          -   0004      -0.18
9503      28.85   0101      24.95   9503          -   0101      -0.42
9504      22.97   0102      22.22   9504          -   0102      -0.14
9601      23.39   0103      34.81   9601       0.48   0103       1.45
9602      20.65   0104      24.64   9602       0.02   0104       1.14
9603      30.02   0201      26.21   9603       1.17   0201       1.26
9604      23.13   0202      23.45   9604       0.16   0202       1.23
9701      23.51   0203      31.85   9701       0.12   0203      -2.96
9702      22.99   0204      25.28   9702       2.34   0204       0.64
9703      32.61   0301      25.76   9703       2.59   0301      -0.45
9704      23.28   0302      22.88   9704       0.15   0302      -0.57
9801      23.97   0303      34.02   9801       0.46   0303       2.17
9802      21.48   0304      25.80   9802      -1.51   0304       0.52
9803      27.39   0401      25.91   9803      -5.22   0401       0.15
9804      23.75   0402      24.07   9804       0.47   0402       1.19
9901      24.81   0403      36.60   9901       0.84   0403       2.58
9902      21.51   0404      26.43   9902       0.03   0404       0.63
9903      33.20   0501      27.08   9903       5.81   0501       1.17
9904      23.68   0502      24.99   9904      -0.07   0502       0.92
0001      25.37   0503      41.29   0001       0.56   0503       4.69
0002      22.36   0504      26.69   0002       0.85   0504       0.26
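The seasonal differencing used to build Table 1 is easy to reproduce. The sketch below re-enters Columns (2) and (4) as one series in time order and computes the four-quarter differences $y_t = Y_t - Y_{t-4}$ shown in Columns (6) and (8). The names `demand` and `seasonal_difference` are our own, and rounding to two decimals simply mirrors the table.

```python
# Table 1, Columns (2) and (4): quarterly demand Y_t in time order, 1995 Q1 through 2005 Q4.
demand = [
    22.91, 20.63, 28.85, 22.97,  # 1995
    23.39, 20.65, 30.02, 23.13,  # 1996
    23.51, 22.99, 32.61, 23.28,  # 1997
    23.97, 21.48, 27.39, 23.75,  # 1998
    24.81, 21.51, 33.20, 23.68,  # 1999
    25.37, 22.36, 33.36, 23.50,  # 2000
    24.95, 22.22, 34.81, 24.64,  # 2001
    26.21, 23.45, 31.85, 25.28,  # 2002
    25.76, 22.88, 34.02, 25.80,  # 2003
    25.91, 24.07, 36.60, 26.43,  # 2004
    27.08, 24.99, 41.29, 26.69,  # 2005
]


def seasonal_difference(series, span=4):
    """y_t = Y_t - Y_{t-span}; the first `span` observations have no partner and are lost."""
    return [round(series[t] - series[t - span], 2) for t in range(span, len(series))]


diffed = seasonal_difference(demand)
mean = sum(diffed) / len(diffed)
print(len(demand), len(diffed), round(mean, 2))  # 44 40 0.62
print(diffed[:3])                                 # [0.48, 0.02, 1.17]
```

Comparing `diffed` with Columns (6) and (8) is a quick check on the hand calculations, and the printed counts show the four points lost to differencing.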
The data have been modified to simplify calculations. Columns (2) and (4) show the original quarterly demand data. Columns (6) and (8) show the quarterly differenced data.

Stationarity: Is the demand series stationary? Figure 2 is a plot of the original electric demand data in Columns (2) and (4) of Table 1. The plot clearly shows that the demand data are quarterly seasonal and trending upward; consequently, the mean of the data changes over time. As defined above, this time series is not stationary.

FIGURE 2. PLOT OF ORIGINAL DATA (horizontal axis: quarter)

Since the data are quarterly seasonal, one way to transform the data into a stationary series is to perform a four-quarter seasonal differencing in the following manner. Let $Y_t$ be the original data point of quarter $t$ in Table 1. For $t = 9601$, $Y_{9601} = 23.39$, and for $t - 4 = 9501$, $Y_{9501} = 22.91$. The quarterly differenced value is $y_t = Y_t - Y_{t-4}$; the data in Columns (6) and (8) were calculated as follows:

$$y_{9601} = Y_{9601} - Y_{9501} = 23.39 - 22.91 = 0.48$$

Similarly,

$$y_{9602} = 20.65 - 20.63 = 0.02$$
$$y_{9603} = 30.02 - 28.85 = 1.17$$

The differenced values so calculated are given in Columns (6) and (8) of Table 1 and plotted in Figure 3. Notice that the data base originally had 44 data points; the first four points were lost in differencing, leaving 40 points for modeling.

After differencing, has the series become stationary? Figure 3 shows that seasonal differencing has eliminated the trend from the data, and the mean of the data no longer changes over time. The series has become stationary, and we are ready to develop an ARMA model.

Model Identification: As discussed before, the tools for identifying a good model for a stationary time series are its ACF and PACF, the two statistical measures used in Step 1 of ARMA modeling. When we go through the calculations, we can easily see that they are analogous to the correlation coefficient and the partial correlation coefficient in multiple linear regression analysis.
The ACF and PACF values, calculated for ten lags, are given in Table 2. Let's demonstrate manually how to calculate the ACF and PACF of lag 1.

In Table 1, Columns (6) and (8), we have 40 differenced data points, so n = 40. The differenced data have a mean $\mu = 0.62$.

Calculation of ACF of Lag 1: The calculation of the ACF is analogous to the calculation of a correlation coefficient. On the basis of the data given in Table 1, the ACF of lag 1 is calculated below; the ACF of longer lags can be calculated similarly.

$$\text{ACF of lag 1} = \frac{\text{Auto-covariance of lag 1}}{\text{Variance}}$$

$$\text{Auto-covariance of lag 1} = \frac{1}{40}\Big[(0.48-0.62)(0.02-0.62) + (0.02-0.62)(1.17-0.62) + \dots + (4.69-0.62)(0.26-0.62)\Big] = \frac{8.53}{40}$$

$$\text{Variance} = \frac{1}{40}\Big[(0.48-0.62)^2 + (0.02-0.62)^2 + \dots + (0.26-0.62)^2\Big] = \frac{118.27}{40}$$

$$\text{ACF of lag 1} = \frac{8.53}{118.27} = 0.072$$

The common factor 1/40 cancels, so only the two bracketed sums, 8.53 and 118.27, are needed.

FIGURE 3. PLOT OF DIFFERENCED DEMAND DATA

TABLE 2
ACF AND PACF AT DIFFERENT LAGS

Lag      ACF     PACF      Lag      ACF     PACF
  1    0.072    0.072        6    0.012    0.041
  2     0.01    0.005        7   -0.051   -0.003
  3    0.045    0.045        8    0.148    0.015
  4   -0.396   -0.406        9    0.122   -0.013
  5   -0.177   -0.137       10    0.029    0.025

Calculation of PACF of Lag 1: In Table 2, the PACF of lag 1 also equals 0.072. We can use the EXCEL regression add-in to regress $y_t$ on $y_{t-1}$ and obtain the estimated coefficient of $y_{t-1}$.



FIGURE 4. CORRELOGRAM OF ACF AND PACF (panels: ACF and PACF; horizontal axes: time lags)

As defined before, the coefficient of $y_{t-1}$, 0.072, is the PACF of lag 1. Adding $y_{t-2}$ to this regression equation, we will get the PACF of lag 2, and so on.
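The hand calculations of the lag 1 ACF and PACF can be checked with a short script. This sketch uses the 40 differenced values from Table 1 (with the unrounded mean), computes the ACF as the ratio of lagged cross-products to the sum of squares (the 1/n factors cancel), and obtains the PACF as the last coefficient of an OLS regression of $y_t$ on its first $k$ lags, following the article's definition. The helper names are ours; regression-based PACF values at higher lags need not match Table 2 exactly, since software often computes the PACF from the ACF instead.

```python
# Differenced demand y_t from Table 1, Columns (6) and (8), in time order (1996 Q1 .. 2005 Q4).
y = [
    0.48, 0.02, 1.17, 0.16, 0.12, 2.34, 2.59, 0.15, 0.46, -1.51,
    -5.22, 0.47, 0.84, 0.03, 5.81, -0.07, 0.56, 0.85, 0.16, -0.18,
    -0.42, -0.14, 1.45, 1.14, 1.26, 1.23, -2.96, 0.64, -0.45, -0.57,
    2.17, 0.52, 0.15, 1.19, 2.58, 0.63, 1.17, 0.92, 4.69, 0.26,
]


def acf(series, lag):
    """Sample ACF at `lag`: lagged cross-products of deviations over their sum of squares."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    autocov = sum(dev[t] * dev[t - lag] for t in range(lag, n))  # 1/n omitted: it cancels
    variance = sum(d * d for d in dev)
    return autocov / variance


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def pacf_by_regression(series, lag):
    """Coefficient of y_{t-lag} in an OLS regression of y_t on a constant and y_{t-1}, ..., y_{t-lag}."""
    rows = [[1.0] + [series[t - j] for j in range(1, lag + 1)] for t in range(lag, len(series))]
    target = series[lag:]
    p = lag + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    return solve(xtx, xty)[-1]


print(round(acf(y, 1), 3), round(acf(y, 4), 3))  # 0.072 -0.396
print(round(pacf_by_regression(y, 1), 3))        # 0.072
```

The large negative value at lag 4 is the single spike that drives the identification that follows.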

The correlogram for the ACF and PACF, based on the data given in Table 2, is shown in Figure 4.

Does this correlogram look like one of the three sets of correlograms in Figure 1? The ACF in Figure 4 is quite similar to those in 2a and 3a in Figure 1, but the PACF here looks different from PACF 2b and 3b in Figure 1. However, of the 10 bars in the PACF chart, there is only one large spike, at lag 4. If we ignore the nine smaller bars, the chart becomes similar to chart 3b in Figure 1. We have said that the patterns of charts 3a and 3b in Figure 1 suggest an AR (4^) model; therefore, the two charts in Figure 4 also suggest an AR (4^) model, as follows:

$$y_t = c + \phi\, y_{t-4} + a_t \qquad (1)$$

Although we denote Equation (1) as AR (4^), it is an AR (1) model in the sense that it has only one autoregressive term, which models seasonality of period 4.

MODEL ESTIMATION AND DIAGNOSTIC CHECKING

The next two steps are estimating the model coefficients and diagnostically checking the goodness of fit. These two steps are usually done together.

Estimation: In fact, most identified ARMA models are nonlinear and require a nonlinear estimation procedure. Only some simple AR models are linear and can be estimated with the Ordinary Least Squares (OLS) procedure. For either procedure, the criterion for getting the best estimates of the coefficients is the same: minimize the sum of the squared errors.

Equation (1) is clearly a linear regression equation, where $c$ is the constant term, $\phi$ is the coefficient of $y_{t-4}$, and $a_t$ is the residual. We have estimated the model with both procedures, as follows:

$$y_t = 0.863 - 0.465\, y_{t-4} + a_t \qquad (2)$$

The residual, $a_t$, in Equation (2) is expected to be zero in forecasting. Interested readers can use the data given in Columns (6) and (8) of Table 1 and EXCEL to verify the estimated coefficients in Equation (2). With a nonlinear procedure, it takes three iterations to arrive at Equation (2).

Suppose instead that the identified model is an MA (4^), as follows:

$$y_t = c + a_t - \theta\, a_{t-4} \qquad (3)$$

Equation (3) is nonlinear because the errors $a_t$ are not observable and must be generated. We have to use the nonlinear least squares procedure to produce the $a_t$ (the historical forecast errors) before we can iteratively estimate the coefficient $\theta$.

Notice that Box and Jenkins used the backward shift operator $B$ very extensively in their analysis. For example, they denoted $y_{t-1} = B y_t$, $y_{t-2} = B^2 y_t$, $y_{t-4} = B^4 y_t$, etc. In this article, we have avoided the use of $B$.

Diagnostic Checking: Regardless of which estimation procedure is used, the criteria for testing the goodness of fit are the same. We use the $R^2$ to measure the degree of correlation between the dependent variable and the independent variables, the t-statistics to test the significance of the coefficients, and the standard error to measure how closely the model fits the data.

We also need to check the stability of the estimated model. For an AR (1) model, we require that $-1 < \phi < 1$. An AR (2) model has two coefficients, $\phi_1$ and $\phi_2$, and we require that:

$$\phi_1 + \phi_2 < 1, \qquad \phi_2 - \phi_1 < 1, \qquad -1 < \phi_2 < 1$$

The coefficient in Equation (2), -0.465, falls between -1 and 1, so the model is stable. If these conditions are not met, it is either because the time series is not stationary and requires more transformation, or because the model was not properly identified.

FORECASTING

Equation (2) is our model for forecasting, but we want to forecast the demand $Y_t$, not the differenced value $y_t$. Therefore, we must transform the model from the $y_t$ form to the $Y_t$ form. Recall that $y_t = Y_t - Y_{t-4}$ and $y_{t-4} = Y_{t-4} - Y_{t-8}$. Equation (2) becomes

$$Y_t - Y_{t-4} = 0.863 - 0.465\,(Y_{t-4} - Y_{t-8}) \qquad (4)$$

Notice that we have dropped the $a_t$ term in Equation (4) because, in forecasting, $a_t$ is assumed to be zero. Rearranging terms in Equation (4), we obtain

THE JOURNAL OF BUSINESS FORECASTING SPRING 2008 23


$$Y_t = 0.863 + (1 - 0.465)\,Y_{t-4} + 0.465\,Y_{t-8} = 0.863 + 0.535\,Y_{t-4} + 0.465\,Y_{t-8} \qquad (5)$$
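Equation (2) and the rearranged coefficients of Equation (5) can be checked with ordinary least squares, as the article suggests doing in EXCEL. This sketch regresses each differenced value on the value four quarters earlier (36 usable pairs), applies the stability condition $-1 < \phi < 1$, and converts the coefficients to the level form $Y_t = c + (1+\phi)Y_{t-4} - \phi Y_{t-8}$. The function names are our own.

```python
# Differenced demand y_t from Table 1, Columns (6) and (8), in time order (1996 Q1 .. 2005 Q4).
y = [
    0.48, 0.02, 1.17, 0.16, 0.12, 2.34, 2.59, 0.15, 0.46, -1.51,
    -5.22, 0.47, 0.84, 0.03, 5.81, -0.07, 0.56, 0.85, 0.16, -0.18,
    -0.42, -0.14, 1.45, 1.14, 1.26, 1.23, -2.96, 0.64, -0.45, -0.57,
    2.17, 0.52, 0.15, 1.19, 2.58, 0.63, 1.17, 0.92, 4.69, 0.26,
]


def fit_seasonal_ar(series, lag=4):
    """OLS fit of y_t = c + phi * y_{t-lag} + a_t, which has a single regressor."""
    x = series[:-lag]  # y_{t-lag}
    z = series[lag:]   # y_t
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxy = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    phi = sxy / sxx
    return z_bar - phi * x_bar, phi, n


def level_form(c, phi):
    """Substitute y_t = Y_t - Y_{t-4} to get Y_t = c + (1 + phi) * Y_{t-4} - phi * Y_{t-8}."""
    return c, 1 + phi, -phi


c, phi, n_pairs = fit_seasonal_ar(y)
print(n_pairs, round(c, 3), round(phi, 3))         # 36 0.863 -0.465
print("stable:", -1 < phi < 1)                     # stable: True
print([round(v, 3) for v in level_form(c, phi)])  # [0.863, 0.535, 0.465]
```

Because Equation (1) has a single lagged regressor, the closed-form simple-regression formulas are enough here; a model with MA terms would need an iterative nonlinear routine instead.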
Suppose that we want to forecast the demand for the first quarter of 2006. According to Equation (5), we need the demand data for the first quarters of 2005 and 2004. From Table 1, $Y_{0501} = 27.08$ and $Y_{0401} = 25.91$; then

$$Y_{0601} = 0.863 + 0.535\,Y_{0501} + 0.465\,Y_{0401}$$
$$Y_{0601} = 0.863 + 0.535 \times 27.08 + 0.465 \times 25.91 = 27.40$$

Forecast accuracy can be evaluated in the same way as in linear regression.
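The forecast for the first quarter of 2006 follows directly from Equation (5). The sketch below plugs in the coefficients of Equation (2) and the Table 1 values $Y_{0501} = 27.08$ and $Y_{0401} = 25.91$; `forecast_level` is a hypothetical helper, and setting the error term to zero follows the article.

```python
def forecast_level(c, phi, y_lag4, y_lag8):
    """Equation (5) with the error term set to zero:
    Y_t = c + (1 + phi) * Y_{t-4} - phi * Y_{t-8}."""
    return c + (1 + phi) * y_lag4 - phi * y_lag8


# Coefficients of Equation (2); Y_0501 = 27.08 and Y_0401 = 25.91 from Table 1.
y_0601 = forecast_level(0.863, -0.465, 27.08, 25.91)
print(f"{y_0601:.2f}")  # 27.40
```

For the other quarters of 2006 the same function applies with the matching quarters of 2005 and 2004; beyond four quarters ahead, earlier forecasts would have to stand in for the unobserved $Y_{t-4}$.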
CONCLUDING REMARKS

It is obvious that the most difficult step in ARIMA modeling is Step 1, model identification. Once we get a handle on Step 1, the other three steps are quite similar to those in linear regression. Although the calculations of the ACF and PACF and the nonlinear estimation procedure look complicated and tedious, computer software is available to do these jobs.

In the example, the data base originally included 44 points; we lost 4 points in differencing. The identified model has a term of lag 4; therefore, only 36 data points were available for model estimation. This is why ARIMA modeling requires a relatively large sample size to accommodate the data lost to differencing and to the lagged structure of the model.
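The bookkeeping behind the 36 usable observations can be written down directly. This hypothetical helper assumes one seasonal difference of span 4 and a largest autoregressive lag of 4, as in the example; other models would lose a different number of points.

```python
def usable_observations(n_points, seasonal_span=4, max_ar_lag=4):
    """Return (points left after differencing, points left for estimation).

    Seasonal differencing of span `seasonal_span` drops the first `seasonal_span`
    points; a regressor lagged `max_ar_lag` periods drops `max_ar_lag` more.
    """
    after_differencing = n_points - seasonal_span
    return after_differencing, after_differencing - max_ar_lag


print(usable_observations(44))  # (40, 36)
```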

