
Modelling the demand for motor vehicles in the U.S.

Daria Nikiforova
(s2595893)
January 30, 2014
Abstract
This paper demonstrates how time series regression models can be used for describing and
forecasting the demand for new cars in the United States. Using the characteristics of the manufacturing industry, a valid seasonal ARIMA (SARIMA) model was constructed and its forecasting
power was tested. The attempt to explain the dynamics of auto sales by the corresponding dynamics of disposable personal income failed. A possible explanation is the presence of strong seasonality in
one of the variables. However, the SARIMA specification possesses sufficient predictive power.

1 Introduction

The automobile industry is one of the major industries in the U.S. manufacturing sector. Competition
among manufacturers has intensified since the economic crisis of 2008, when the Big Three (General Motors, Ford and Chrysler) requested government aid to relieve their financial problems.
To revive the companies and remain competitive in the market, they were forced to devise various
strategies aimed at overcoming competition in the industry (Hulsmann et al., 2012).
It is clear that not only optimal positioning of new products is required; controlling activities,
such as demand planning, are also necessary for effective management of resources and profit maximization.
In the present highly dynamic environment, demand planning is becoming both more challenging and
more important. Errors in demand planning often lead to enormous costs and loss of revenue.
Hence, accurate demand (sales) modelling is vital to successful strategic planning.

2 Problem formulation

This paper considers an econometric model for auto sales in the U.S. from January 1992 onwards, as
shown in Fig. 1.
The characteristics of the manufacturing industry help in modelling this time series. Retailers gave
large rebates in some years, raising sales above their expected levels. Automobile sales are a
prime example of seasonal data, traditionally peaking in the fall when new models arrive on the
showroom floor. A pattern such as demand build-up can also be investigated: if sales
are lower than expected in 2007, demand builds up and sales can be higher than expected in 2008.
All these features most likely give rise to a SARIMA process, which usually occurs in industry and
firm sales. The crisis of 2008 would be included in the model, as it led to a dramatic change in the sales
level. Taking into account the data structure (trend and seasonality), regression analysis will be
applied for sales forecasting.
Additionally, another macroeconomic variable, disposable personal income, can be helpful
in the prediction of auto sales (Adda and Cooper, 2006). The intuition is straightforward: the more
income consumers have to spend, the more money they will spend on automobiles and everything
else. Some developments in multivariate time series techniques have been
specifically designed to quantify the long-run impact of related variables on the variables of interest.
In particular, for multivariate cointegrated non-stationary time series, VEC models have been theoretically
proven to identify long-run equilibrium relationships among the variables
in the system.

Figure 1: Auto sales and personal income against time
Figure 2: Seasonal stacked line of US auto sales

3 Data mining

The main time series in this paper is monthly retail sales of Auto and Other Motor
Vehicles provided by Census.gov. It is measured in billions of nominal dollars and includes 263 observations for the period 1992m01 - 2013m11. The second variable in the data set is disposable
personal income, taken as an economic indicator. The income time series is obtained from the report of the
Bureau of Economic Analysis for the same period, 1992m01 - 2013m11, and contains exactly the
same number of observations.
Since inflation affects both time series and leads to heterogeneity in their variation, natural logarithms are
taken of both variables to obtain approximately constant variation.
From here on, the log of the auto sales series is denoted logsales and the log of disposable
personal income is denoted logincome.
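The variance-stabilising effect of the logarithm can be illustrated with a minimal Python sketch. The series below is hypothetical (a made-up growth rate and seasonal factor, not the Census data), and seasonal_range is an illustrative helper; the sketch only shows that a seasonal swing proportional to the level becomes a swing of constant size after taking logs.

```python
import math

def seasonal_range(values, start, period=12):
    """Spread (max minus min) of one seasonal cycle starting at index `start`."""
    window = values[start:start + period]
    return max(window) - min(window)

# Hypothetical nominal series: 0.5% monthly growth with a proportional seasonal swing.
n_months = 120
sales = [100.0 * 1.005 ** t * (1.0 + 0.2 * math.sin(2 * math.pi * t / 12))
         for t in range(n_months)]
log_sales = [math.log(v) for v in sales]

# Seasonal spread in the first and in the last year, before and after taking logs.
raw_first, raw_last = seasonal_range(sales, 0), seasonal_range(sales, 108)
log_first, log_last = seasonal_range(log_sales, 0), seasonal_range(log_sales, 108)
```

In levels the seasonal spread grows with the series, while in logs it is the same in the first and the last year.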
On preliminary inspection, logsales is a strongly seasonal variable,
whereas logincome is not. The trend in logincome is seen to match the trend in logsales very
closely, although the latter seems to have a more cyclical pattern. There is evidence of level changes
in 2008, which can be explained by the economic crisis. Focusing on logsales first, a
deterministic trend along with seasonality can be observed, while logincome tends to have a more
linear trend. Both time series have an intercept and a trend.

4 Empirical analysis

4.1 Stationarity and seasonality

Since it is assumed that logsales is both non-stationary and strongly seasonal, the HEGY test, a special
modification of the unit root test, was used. The auxiliary regression model that allows the test to be
performed is given by eq. (1):

$$\varphi(B)\, y_{13,t} = \alpha_0 + \alpha_1 t + \sum_{k=2}^{12} \beta_k D_{k,t} + \sum_{k=1}^{12} \pi_k y_{k,t-1} + \varepsilon_t \tag{1}$$

where $y_{k,t}$ (k = 1, 2, ..., 13) are auxiliary variables, whose definitions can be found in the paper
of Beaulieu and Miron (1993). There are 12 possible unit roots: one non-seasonal root at the zero
frequency and 11 seasonal roots.
Table 1: Regression results for HEGY test for unit roots

Null hypothesis              Estimation statistics     Critical values (a)
                             LOGSALES     SARIMA       5% (t,i,sd)    10% (t,i)
$\pi_1 = 0$                  -1.88*       -2.94*       -3.28          -3.06
$\pi_2 = 0$                  -3.31        -3.57        -2.75          -1.55
$\pi_3 = \pi_4 = 0$           7.49         8.71         6.23           2.30
$\pi_5 = \pi_6 = 0$           8.67        10.45         6.23           2.30
$\pi_7 = \pi_8 = 0$           7.59         7.36         6.23           2.30
$\pi_9 = \pi_{10} = 0$       14.65        13.69         6.23           2.30
$\pi_{11} = \pi_{12} = 0$    13.00        12.61         6.23           2.30

(a) Critical values from the paper of Beaulieu and Miron (1993)

Table 1 presents a summary of the results of the tests for
integration of the series in its seasonal and non-seasonal parts, under the null hypothesis that
the series is SI(1,1). The null of a unit root at frequencies 0 and $\pi$ is tested with
the t-statistic of the hypothesis $H_0: \pi_i = 0$, $i = 1, 2$. The nulls of
seasonal unit roots are tested, at each frequency, by means of the F-statistics corresponding to the
joint hypotheses $H_0: \pi_i = \pi_{i+1} = 0$, $i = 3, 5, 7, 9, 11$, which take into account all pairs of complex
conjugate roots. The significance tests for $\pi_1$ and $\pi_2$ are one-sided.
In this case, the data reject the presence of unit roots at all seasonal frequencies. However,
the existence of a unit root at the zero frequency cannot be rejected. These results imply that the
seasonality present in this monthly series is partly deterministic and partly stationary stochastic.
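These decisions follow mechanically from Table 1. As a small sketch, the comparison can be written as below; the statistics and 5% critical values are copied from the LOGSALES column of Table 1, while the labels pi_1 ... pi_12 and the helper name unit_root_rejected are assumptions made for illustration.

```python
# Test statistics (LOGSALES column) and 5% critical values copied from Table 1.
# The labels pi_1 ... pi_12 are assumed names for the HEGY coefficients.
T_TESTS = {"pi_1": (-1.88, -3.28), "pi_2": (-3.31, -2.75)}
F_TESTS = {
    "pi_3=pi_4": (7.49, 6.23),
    "pi_5=pi_6": (8.67, 6.23),
    "pi_7=pi_8": (7.59, 6.23),
    "pi_9=pi_10": (14.65, 6.23),
    "pi_11=pi_12": (13.00, 6.23),
}

def unit_root_rejected(stat, critical, kind):
    """One-sided t-test rejects below the critical value; F-test rejects above it."""
    return stat < critical if kind == "t" else stat > critical

decisions = {name: unit_root_rejected(s, c, "t") for name, (s, c) in T_TESTS.items()}
decisions.update({name: unit_root_rejected(s, c, "F") for name, (s, c) in F_TESTS.items()})
```

Only the zero-frequency hypothesis (pi_1) is left unrejected, which matches the reading of the table given above.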
Some graphical techniques can also be used to detect seasonality and non-stationarity. In the
correlogram given in Fig. 3, the suspension bridge pattern in the ACF of logsales can be
observed, as well as high spikes at each 13th lag after the 1st one in the PACF. This indicates
that the logsales time series is both non-stationary and strongly seasonal. The seasonal subseries plot for
a period of 12, shown in Fig. 2, reveals a strong seasonal pattern.
To conclude, the seasonal period in the logsales data is 12, which is essential knowledge
for SARIMA model construction.
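How a seasonal period shows up in the ACF can be sketched in a few lines of Python. The series here is a hypothetical pure 12-month cycle, not logsales, and acf is an illustrative helper that computes the usual sample autocorrelations.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Hypothetical monthly series with a pure 12-month cycle (20 years of data).
series = [math.sin(2 * math.pi * t / 12) for t in range(240)]
rho = acf(series, 24)
```

For such a series the autocorrelation is strongly positive at lags 12 and 24 and strongly negative at lag 6, which is the kind of regular pattern that points to a seasonal period of 12.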

Figure 3: Correlogram of logsales
Figure 4: Correlogram of first-differenced logsales
Figure 5: Correlogram of SARIMA(9,1,3)(12,0,12)

4.2 SARIMA model

The general SARIMA(p,d,q)(P,D,Q) model is characterised by the order of regular differencing (d), the order of seasonal
differencing (D), the non-seasonal autoregressive order (p), the seasonal autoregressive order
(P), the non-seasonal moving average order (q) and the seasonal moving average order (Q)
(Chickobvu and Siganke, 2012):
$$\phi_p(B)\,\Phi_P(B^s)\, z_t = \delta + \theta_q(B)\,\Theta_Q(B^s)\, a_t \tag{2}$$

where

$z_t$ is the differenced time series

$\phi_p(B) = (1 - \phi_1 B - \dots - \phi_p B^p)$ is the non-seasonal autoregressive operator of order p

$\Phi_P(B^s) = (1 - \Phi_{1,s} B^s - \dots - \Phi_{P,s} B^{Ps})$ is the seasonal autoregressive operator of order P

$\theta_q(B) = (1 + \theta_1 B + \dots + \theta_q B^q)$ is the non-seasonal moving average operator of order q

$\Theta_Q(B^s) = (1 + \Theta_{1,s} B^s + \dots + \Theta_{Q,s} B^{Qs})$ is the seasonal moving average operator of order Q

$\delta = \mu\,\phi_p(B)\,\Phi_P(B^s)$ is a constant term, where $\mu$ is the mean of the stationary time series

$\phi$, $\Phi$, $\theta$, $\Theta$ are unknown parameters that can be calculated from the sample data

$a_t, a_{t-1}, \dots$ are random shocks that are assumed to be independent of each other

The general stationarity transformation using the backshift operator is presented below:

$$z_t = (1 - B^s)^D (1 - B)^d\, y_t \tag{3}$$

Since only regular (non-seasonal) differencing was chosen, eq. (3) is evaluated with d = 1, D = 0 and s = 12
to define $z_t$ as:

$$z_t = (1 - B^{12})^0 (1 - B)^1\, y_t = (1 - B)\, y_t = y_t - y_{t-1} \tag{4}$$
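The transformation can be sketched as follows, assuming the standard reading in which regular differencing is applied d times and seasonal differencing D times with period s. The function name difference is hypothetical.

```python
def difference(y, d=1, D=0, s=12):
    """Apply (1 - B^s)^D (1 - B)^d to the sequence y and return the shortened result."""
    z = list(y)
    for _ in range(d):
        # Regular differencing: z_t - z_{t-1}
        z = [z[t] - z[t - 1] for t in range(1, len(z))]
    for _ in range(D):
        # Seasonal differencing: z_t - z_{t-s}
        z = [z[t] - z[t - s] for t in range(s, len(z))]
    return z
```

With the defaults d = 1, D = 0 the function returns the first differences, for example difference([1, 4, 9, 16]) gives [3, 5, 7]. Each regular difference costs one observation and each seasonal difference costs s observations.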

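In this case, separate non-seasonal and seasonal models were computed. Their combination
describes the final model.

Step 1 Model for non-seasonal level

AR(9):
$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \phi_9 z_{t-9} + a_t \tag{5}$$

MA(3):
$$z_t = \delta + \theta_1 a_{t-1} + \theta_3 a_{t-3} + a_t \tag{6}$$

Step 2 Model for seasonal level

AR(12):
$$z_t = \delta + \Phi_{1,12} z_{t-12} + a_t \tag{7}$$

MA(12):
$$z_t = \delta + a_t + \Theta_{1,12} a_{t-12} \tag{8}$$

Step 3 Combining eq. (5)-(8) gives eq. (9)

$$z_t = \delta + \phi_1 z_{t-1} + \phi_2 z_{t-2} + \phi_9 z_{t-9} + \Phi_{1,12} z_{t-12} + \theta_1 a_{t-1} + \theta_3 a_{t-3} + \Theta_{1,12} a_{t-12} + a_t \tag{9}$$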

Since the estimate of the constant $\delta$ was -0.079 and not statistically different from zero, it can be excluded from the
model.
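The final model was derived using the backshift operator in multiplicative form:

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_9 B^9)(1 - \Phi_{1,12} B^{12})\, z_t = (1 + \theta_1 B + \theta_3 B^3)(1 + \Theta_{1,12} B^{12})\, a_t$$

which expands to

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_9 B^9 - \Phi_{1,12} B^{12} + \phi_1 \Phi_{1,12} B^{13} + \phi_2 \Phi_{1,12} B^{14} + \phi_9 \Phi_{1,12} B^{21})\, z_t = (1 + \theta_1 B + \theta_3 B^3 + \Theta_{1,12} B^{12} + \theta_1 \Theta_{1,12} B^{13} + \theta_3 \Theta_{1,12} B^{15})\, a_t \tag{10}$$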
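The expansion of a multiplicative model is a product of lag polynomials, which is easy to check mechanically. The sketch below uses made-up coefficient values (not the estimates of this paper) with the same lag structure, that is AR lags 1, 2, 9 and MA lags 1, 3, each combined with seasonal lag 12; poly_mul is a hypothetical helper.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as {power_of_B: coefficient} dictionaries."""
    out = {}
    for i, ai in a.items():
        for j, bj in b.items():
            out[i + j] = out.get(i + j, 0.0) + ai * bj
    return dict(sorted(out.items()))

# Hypothetical coefficients: phi_1 = 0.5, phi_2 = 0.2, phi_9 = -0.1, Phi_{1,12} = 0.8.
ar_regular = {0: 1.0, 1: -0.5, 2: -0.2, 9: 0.1}   # 1 - phi_1 B - phi_2 B^2 - phi_9 B^9
ar_seasonal = {0: 1.0, 12: -0.8}                  # 1 - Phi_{1,12} B^12
# Hypothetical coefficients: theta_1 = 0.4, theta_3 = -0.3, Theta_{1,12} = -0.7.
ma_regular = {0: 1.0, 1: 0.4, 3: -0.3}            # 1 + theta_1 B + theta_3 B^3
ma_seasonal = {0: 1.0, 12: -0.7}                  # 1 + Theta_{1,12} B^12

ar_full = poly_mul(ar_regular, ar_seasonal)
ma_full = poly_mul(ma_regular, ma_seasonal)
```

The product of the autoregressive factors has terms at powers 1, 2, 9, 12, 13, 14 and 21, and the product of the moving average factors at powers 1, 3, 12, 13 and 15, so the cross terms at 13, 14, 15 and 21 arise purely from the multiplication and need no parameters of their own.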

The parameter estimates shown in Fig. 6 were computed in EViews for the period 1992m01 - 2011m12 (218 observations), so that the forecast could be checked against real data.
The estimated parameters were substituted into eq. (10) to form the final model, which is expressed as
follows:
$$(1 - 0.896B - 0.532B^2 + 0.233B^9)(1 + 0.993B^{12})\, z_t = (1 + 0.487B - 0.325B^3)(1 - 0.794B^{12})\, a_t \tag{11}$$

4.3 Diagnostic checking

A diagnostic check was carried out on the selected model. The correlogram and residual plots shown in Fig. 5
indicate that there is no serial correlation, because both the ACF and PACF are all nearly zero and within the
95% confidence interval, except at the 8th lag, which can be neglected. The regression demonstrates quite
good descriptive statistics, such as $R^2 = 0.73$ and DW close to 2, and the values of the information criteria
are the best among the specifications considered.
The LM test applied for the first lag did not detect any serial correlation. The ADF test rejected the
null of a unit root. However, the HEGY test shown in Table 1 (column SARIMA) detected a unit root at the zero
frequency, although the obtained t-statistic was extremely close to the critical value at the 10% level of significance.
Summing up, we can accept that the autocorrelation is extremely weak and almost absent, which
makes the selected SARIMA(9,1,3)(12,0,12) model admissible.
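As an illustration of the DW statistic quoted above, a minimal sketch of its computation from a residual series is given below. The residuals are simulated, since the residuals of the estimated model are not reproduced here, and the function name durbin_watson is an assumption.

```python
import random

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Simulated uncorrelated residuals (hypothetical, seeded for reproducibility).
random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
dw_white = durbin_watson(white_noise)
```

Values near 2 are consistent with no first-order serial correlation, values near 0 indicate strong positive correlation and values near 4 strong negative correlation.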

Figure 6: Estimation output for SARIMA model
Figure 7: Forecast fitting

4.4 Forecasting

The model was used to forecast the series from 2012m01 through 2013m11 (23 observations).
This part of the real data was held out on purpose to validate the forecast. The complete time
series plot shown in Fig. 7 represents actual and fitted values.
After empirical examination, forecast accuracy was inspected using different accuracy
measures. The following results were obtained: RMSE = 0.037, MAPE = 75.18, Theil's U-statistic = 0.22.
Relying on these values, the model is judged to be accurate and a close fit. Thus, the
empirical result indicates that the model was able to represent the historical data accurately, which
means good forecasting power.
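For reference, the three accuracy measures can be computed as in the sketch below. The Theil statistic is assumed here to be the inequality coefficient bounded between 0 and 1 (the form reported by EViews); if another variant of Theil's U was meant, the last function would differ. The function names are illustrative.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def theil_u(actual, forecast):
    """Theil inequality coefficient: RMSE scaled by the root mean squares of both series."""
    n = len(actual)
    scale = (math.sqrt(sum(f * f for f in forecast) / n)
             + math.sqrt(sum(a * a for a in actual) / n))
    return rmse(actual, forecast) / scale
```

Note that MAPE divides by the actual values, so it becomes large whenever those values are close to zero, while the Theil coefficient stays between 0 (perfect fit) and 1.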

4.5 Cointegration analysis

In this section an auxiliary variable is used for modelling US auto sales. Stationarity
was assessed over the whole range of logincome observations. For this purpose, a unit root test and
correlograms were used. The ADF test with trend and intercept was applied to logincome. The lag length
was chosen according to the SIC. The result shows that this time series is not stationary (p-value = 0.99).
Although the plot in Fig. 1 gives some useful insight into whether these
time series trend together or are cointegrated, formal tests need to be carried out. Since all variables are
I(1), the estimation proceeds in the single-equation Engle-Granger framework. It is a standard
way of investigating cointegration, which holds that variables are cointegrated if there is a linear
combination of them that is I(0), i.e. stationary.
The residuals of the regression of logsales on logincome, given in
eq. (12), were tested for stationarity with the ADF test in the intercept-and-trend specification. The null
of a unit root was not rejected, with p-value = 0.37.

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t \tag{12}$$

According to the definition of cointegration, there is no evidence that it exists.
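The two Engle-Granger steps can be sketched as below. This is a simplified illustration, not the procedure run in EViews: the second step uses a Dickey-Fuller regression without constant, trend or lagged differences, whereas the text uses an ADF test with intercept and trend, and the data are simulated. All function names are hypothetical. The resulting statistic would have to be compared with Engle-Granger critical values rather than the standard Dickey-Fuller ones, because the residuals come from an estimated regression.

```python
import math
import random

def ols_line(x, y):
    """Intercept and slope of the least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

def df_tstat(e):
    """t-statistic of rho in the regression  e_t - e_{t-1} = rho * e_{t-1} + u_t."""
    lag = e[:-1]
    diff = [e[t] - e[t - 1] for t in range(1, len(e))]
    sxx = sum(v * v for v in lag)
    rho = sum(l * d for l, d in zip(lag, diff)) / sxx
    s2 = sum((d - rho * l) ** 2 for l, d in zip(lag, diff)) / (len(diff) - 1)
    return rho / math.sqrt(s2 / sxx)

def engle_granger_stat(y, x):
    """Step 1: regress y on x. Step 2: Dickey-Fuller statistic of the residuals."""
    b0, b1 = ols_line(x, y)
    return df_tstat([yi - b0 - b1 * xi for xi, yi in zip(x, y)])

# Simulated data: x is a random walk, y_coint shares its trend, w is an unrelated random walk.
random.seed(1)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0.0, 1.0))
y_coint = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
w = [0.0]
for _ in range(499):
    w.append(w[-1] + random.gauss(0.0, 1.0))

stat_coint = engle_granger_stat(y_coint, x)
stat_indep = engle_granger_stat(w, x)
```

For the cointegrated pair the statistic is strongly negative, while for the unrelated random walks it stays much closer to zero, which corresponds to the non-rejection reported above.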

Since these time series are not cointegrated, it is theoretically still possible to build a VAR model
in differences describing the relation between auto sales and income. Constructing a typical VAR
model in EViews with different lag lengths indicated that auto sales have no influence on income
dynamics, which is a trivial result. However, some lags of income turned out to be significant,
namely the 5th and the 6th lags. These lags of income were included in the SARIMA specification of sales
on the assumption that they could improve the forecast of auto sales. The new model can be treated as a
VAR model with restrictions on several coefficients, which are insignificant for the dependent variable
by nature. As a result, $R^2$ increased slightly to 0.74, while the information criteria stayed almost the same.
So, in principle, income might be helpful in predicting auto sales.
To construct the new forecast correctly, the income forecast values were calculated first,
to be used for the sales prediction. The logincome series was modelled as ARMA(1,3) with a constant,
with a residual correlogram resembling white noise. A considerably low $R^2$ is quite typical for differenced
variables. The obtained values were used as the input sequence for the logsales forecast. Finally, the
forecast with the logincome regressor was produced. The predictive power was not improved. Thus, to
conclude, it is troublesome to combine a non-stationary series with a seasonal one.
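Including the 5th and 6th lags of income as regressors amounts to aligning each observation of sales with the income values five and six months earlier. A minimal sketch of that alignment is shown below; lag_design is a hypothetical helper and the toy numbers are not data from this paper.

```python
def lag_design(target, regressor, lags):
    """Pair each target[t] with regressor[t - k] for every k in lags.

    The first max(lags) observations are dropped because their lags are unavailable.
    """
    start = max(lags)
    rows = [[regressor[t - k] for k in lags] for t in range(start, len(target))]
    return target[start:], rows

# Toy example: ten periods, regressor values 0, 10, 20, ...
toy_sales = list(range(10))
toy_income = [10 * t for t in range(10)]
y, X = lag_design(toy_sales, toy_income, lags=(5, 6))
```

Each row of X holds the regressor at lags 5 and 6 for the matching entry of y, so six observations are lost at the start of the sample.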

5 Conclusion and ideas for future research

This paper demonstrates how time series regression models can be used for describing and forecasting
the demand for new cars in the United States.
Several approaches were explored in order to find the most plausible and most forecast-reliable regression model. As an assumption, the hypothesis of a strong relation between disposable
income and auto sales was tested with the help of cointegration analysis. Surprisingly, the auxiliary
variable failed to meet expectations of a predictable influence on auto sales. This can be explained by some
methodological issues and special characteristics of the data set, namely seasonality. However, the empirical
results indicate that historical values of auto sales can be used to develop the best descriptive and
predictive regression model, namely the $SARIMA(9,1,3)(12,0,12)_{12}$ specification, as shown in
the main part.
Further work is needed to evaluate and apply other methods in auto sales modelling in
order to obtain more accurate estimation results. For example, one way to improve the chosen
model would be to add dummy variables representing the dramatic shock observed in the data, caused
by the 2008 economic crisis. Nevertheless, the chosen model is valid according to the diagnostic
checks performed.

References
J. Adda and R. Cooper, The dynamics of car sales: a discrete choice approach, 2006.
J. Beaulieu and J. Miron, Seasonal unit roots in aggregate U.S. data, Journal of Econometrics,
vol. 55, pp. 305-328, 1993.
D. Chickobvu and C. Siganke, Regression-SARIMA modelling of daily peak electricity demand in
South Africa, Journal of Energy in Southern Africa, vol. 23, pp. 23-30, 2012.
M. Hulsmann, D. Borscheid, C. Friedrich, and D. Reith, General sales forecast models for automobile
market and their analysis, Transactions on Machine Learning and Data Mining, pp. 65-86, 2012.
