Abstract: Time series forecasting is useful in many research areas. Models that provide reliable predictions of financial time series may bring valuable profits to investors. An intelligent agent can be built from a suitable prediction model to trade in the stock market daily. Furthermore, even an investor who is cautious about letting an automatic agent trade can use the prediction model as valuable decision support. A methodology based on information obtained from exogenous series was used in combination with a neural network to predict stock series. Exogenous series were selected by analyzing their correlation with the stock series under study. In this way, the prediction was obtained not only from the previous values of the series but also from information external to the main series. Additionally, the best trained neural networks were combined to improve on the prediction capacity of single networks. To evaluate the proposed prediction models, some known metrics were used plus a proposed one, Prediction in Direction and Accuracy (PDA), which combines accuracy and trend information to determine whether a model predicts both values and direction well. Using this novel metric, we have used an evolutionary algorithm to choose the best trained models in order to obtain better results. Experiments with the stock quotes of two of the most important Brazilian companies have shown the usefulness of the proposed prediction system for generating profits in investments.

Manoel C. Amorim Neto, Gustavo Tavares, and Victor M. O. Alves are with the Facilit Technology Company, Brazil, {manoel,gustavotavares,victor}@facilit.com.br. Site: www.aistocktrend.com
George D. C. Cavalcanti and Tsang Ing Ren are with the Center of Informatics, Federal University of Pernambuco, Brazil, {gdcc,tir}@cin.ufpe.br. Site: www.cin.ufpe.br/∼viisar

I. INTRODUCTION

Time series are sets of variables observed over a defined period of time. These observations may be discrete or continuous, and they are taken at equal time intervals [1]. Many research areas involve time series analysis, such as economics, physics, engineering, social sciences, computing, biology, medicine, and meteorology.

Perhaps the most common application of time series analysis is prediction. The prediction can be made using past observations of the series to be forecast or even other time series. These other series, used to predict the main one, are known as Exogenous Time Series.

There are two types of models in time series prediction: linear and non-linear. A well-known linear method is ARIMA, proposed by Box and Jenkins [2]. Some examples of non-linear models are: bilinear, exponential autoregressive, threshold autoregressive, smooth transition autoregressive, autoregressive with time-dependent coefficients [3], autoregressive conditional heteroscedasticity (ARCH) and generalized autoregressive conditional heteroscedasticity (GARCH) [1], among other models.

Artificial neural networks (ANN) have been used successfully for time series prediction in recent years because of some interesting features such as universal function approximation, robustness and fault tolerance [4]. For these reasons, neural networks are considered useful for building prediction models for non-stationary time series [4]. Furthermore, ANNs handle noisy data well and are able to predict nonlinear systems, which is the type of system we are interested in predicting: the stock market. Among the various ANN models, the most used in the literature is the multilayer perceptron (MLP) [5]. Radial basis function (RBF), wavelet-based and recurrent neural networks have also been applied with success [6].

The stock market is a complex system composed of many investors selling and buying financial products in the form of securities. Here, we are interested in predicting the stocks of the biggest Brazilian oil company, Petrobras, and of one of the biggest mining companies in the world, Vale do Rio Doce. The Petrobras stock is named PETR4 and the Vale do Rio Doce stock is named VALE5. These time series were analyzed between the years 2003 and 2009.

In this paper, a comparison between two ANN models, MLP and RBF networks, both with and without exogenous time series, is presented. Additionally, we propose a novel performance metric to select the best trained models, which aims to maximize trend prediction and accuracy. The proposed metric was used to select the best trained networks to be combined in a combination machine.

This paper is organized as follows. Section II briefly describes the stock market and the exogenous time series used. Section III presents the performance metrics that were used and the newly introduced metric. Section IV presents the proposed methods for combining neural networks in a combination machine. Section V describes the experiments and the results obtained. Finally, Section VI presents the conclusions and final remarks.

II. THE STOCK MARKET AND EXOGENOUS TIME SERIES

The main function of the capital market is the trade of stocks with the purpose of financing development, which in turn produces and nourishes the market itself. In this way, a third function is attributed to it: acting as the market for its own sources of income [7]. The monetary market, as a whole, is important for economic development. However, as the economy and the market develop, the market for sources of capital
emerges, which comprises the stock market, debt securities and the real estate market.

Globalization is a trend that allows intense interchange between countries. Consequently, it is common nowadays for the stock market of an emerging country like Brazil to attain increasing importance on the international scene. Today the stock market is not only an important source of corporate finance but also a resource for individual capitalization. When investing in a portfolio, the investor wishes to obtain a large return in order to compensate for the associated risks. In other words, the objective is to minimize risk and maximize capital returns. Hence, a prediction method is most useful, and a neural network is well suited to this kind of optimization procedure.

Currently, the Brazilian stock market, which is known in the World Federation of Exchanges (WFE) as São Paulo SE, has global importance. Among the 51 exchanges monitored by the WFE, BOVESPA was in eighth position among the biggest stock markets in the world in terms of capitalization and stock values, in a ranking for developing countries. Two of the biggest companies in the BOVESPA stock market are the Petrobras oil company and Vale do Rio Doce, which makes them ideal stocks to be analyzed. For the professional investor to understand the behavior of a stock, at least five series are necessary:

1) The highest value at which the stock was traded on a given day.
2) The lowest value at which the stock was traded during the same day.
3) The value of the first trade of the day: the opening price.
4) The value of the last trade of the day: the closing price.
5) The trading volume of the stock during the same day.

The closing price is the most important series, since most professional investors and financial institutions act based on its value.

In time series forecasting methods, the choice of the input variables is an important step. In this work, we are interested in predicting the stock quotations of PETR4 and VALE5. To predict these stock values, we have used exogenous time series that were chosen based on autocorrelation analysis, similarly to previous work [8].

For the Petrobras Company (PETR4) the exogenous time series used were: Dollar, IBOV, CLF, NSY:PBR, DAX and SP500.

The Dollar time series is the quotation of the Brazilian Real converted to the United States Dollar. IBOV is the BOVESPA quotation. CLF is the Crude Light Oil Future quotation. NSY:PBR is the quotation of Brazilian Oil. DAX is the German stock market index. SP500 is the S&P 500 index.

For the Vale do Rio Doce Company (VALE5) the exogenous time series used were: Dollar and IBOV. These series were chosen based on economic analyses [8].

III. PERFORMANCE MEASUREMENT OF PREDICTION MODELS

Several metrics are used to evaluate time series forecasting models. In this paper we have employed five metrics that are commonly used in the literature: MSE, MAPE, POCID, THEIL (or NMSE) and ARV. Additionally, we used SLG, which was proposed by Amorim Neto [8], and a novel metric proposed in this work, named Prediction in Direction and Accuracy (PDA).

A simple measure of the accuracy of a forecasting model is the absolute difference between the expected value and the output value of the model. In Equation 1, $T_t$ is the expected value, $Y_t$ is the output of the forecast model, and $e_t$ is the calculated error, all at time $t$. This measure serves as a basis for the others.

$$e_t = |T_t - Y_t| \tag{1}$$

The performance metrics used in this work are briefly described here. For every metric, $T_t$ is the desired output of the forecasting model at time $t$, $Y_t$ is the output of the proposed model, and $N$ is the total number of available patterns.

A. MSE (Mean Squared Error)

The Mean Squared Error is the best-known metric for evaluating the performance of forecasting models. It is defined as:

$$\mathrm{MSE} = \frac{1}{N} \sum_{t=1}^{N} (e_t)^2 \tag{2}$$

B. MAPE (Mean Absolute Percent Error)

The Mean Absolute Percent Error measures the accuracy of the model as a percentage. It is defined as:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{e_t}{Y_t} \tag{3}$$

A lower value of MAPE is the desired result from a prediction method.

C. THEIL or NMSE (Normalized Mean Squared Error)

The Normalized Mean Squared Error compares the model with the random walk model. Equation 4 defines this value.

$$\mathrm{THEIL} = \frac{\sum_{t=1}^{N} (e_t)^2}{\sum_{t=1}^{N} (Y_t - Y_{t-1})^2} \tag{4}$$

The random walk model proposes that the future value of the time series is equal to the current value. When THEIL is equal to one, the proposed model is equivalent to the random walk model. If THEIL is lower than one, the proposed model is better than the random walk model. If THEIL is greater than one, the proposed model performs worse than the random walk model.
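As a rough illustration, the three error metrics above (Equations 2 to 4) could be computed as in the following Python sketch. The function names and the sample values are hypothetical and not taken from the paper. MAPE follows Equation 3 as printed, dividing by the model output $Y_t$. For THEIL, the sketch assumes that the first element of each sequence plays the role of the value at $t = 0$, so that $Y_{t-1}$ always exists.

```python
from typing import List, Sequence


def abs_errors(target: Sequence[float], output: Sequence[float]) -> List[float]:
    """Equation 1: e_t = |T_t - Y_t| for every time step."""
    return [abs(t - y) for t, y in zip(target, output)]


def mse(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 2: mean of the squared errors."""
    errors = abs_errors(target, output)
    return sum(e * e for e in errors) / len(errors)


def mape(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 3 as printed: mean of e_t / Y_t (Y_t must be non-zero)."""
    errors = abs_errors(target, output)
    return sum(e / y for e, y in zip(errors, output)) / len(errors)


def theil(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 4, under the assumption that index 0 is the value at t = 0.

    Both sums therefore run over indices 1..len-1, where Y_{t-1} exists.
    """
    numerator = sum((t - y) ** 2 for t, y in zip(target[1:], output[1:]))
    denominator = sum(
        (output[i] - output[i - 1]) ** 2 for i in range(1, len(output))
    )
    return numerator / denominator


# Hypothetical closing prices (expected) and model outputs (predicted).
expected = [10.0, 11.0, 12.0, 11.0]
predicted = [10.0, 10.5, 12.5, 11.5]

print("MSE  :", mse(expected, predicted))
print("MAPE :", mape(expected, predicted))
print("THEIL:", theil(expected, predicted))
```

For these sample values THEIL comes out below one, which under the interpretation given in the text would mean the model does better than the random walk.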
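D. POCID (Prediction On Change In Direction)

POCID is the percentage of time steps at which the model predicts the correct trend, relative to the trend of the expected value. This metric is defined by Equation 5.

$$\mathrm{POCID} = 100 \, \frac{\sum_{t=1}^{N} D_t}{N} \tag{5}$$

The value of $D_t$ is defined by Equation 6.

$$D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

$$\mathrm{PDA} = \frac{\sum_{t=1}^{N} G_t}{N} \tag{10}$$

where $G_t$ is defined in Equation 11:

$$G_t = \begin{cases} 1 - \dfrac{re_t}{re_{max}}, & \text{if } (D_t = 1) \text{ and } re_t < re_{max}, \\ 0, & \text{if } (D_t = 1) \text{ and } re_t \geq re_{max}, \\ -1 + \dfrac{re_t}{re_{max}}, & \text{if } (D_t = 0) \text{ and } re_t < re_{max}, \\ -1, & \text{if } (D_t = 0) \text{ and } re_t \geq re_{max} \end{cases} \tag{11}$$

where $D_t$ is defined by Equation 12 and $re_t = e_t / T_t$.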
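A minimal Python sketch of POCID (Equations 5 and 6) and PDA (Equations 10 and 11) is given below for illustration. The function names and sample data are hypothetical. The sketch assumes that $N$ counts only the time steps that have a predecessor, that $e_t$ is the absolute error of Equation 1, and that the tolerance defaults to the value $re_{max} = 0.02$ stated in the text. The four branches of $G_t$ are implemented exactly as printed in Equation 11.

```python
from typing import List, Sequence


def direction_hits(target: Sequence[float], output: Sequence[float]) -> List[int]:
    """Equation 6: D_t = 1 when target and output move in the same direction.

    Index 0 has no predecessor, so D_t is computed for indices 1..len-1.
    """
    return [
        1 if (target[i] - target[i - 1]) * (output[i] - output[i - 1]) > 0 else 0
        for i in range(1, len(target))
    ]


def pocid(target: Sequence[float], output: Sequence[float]) -> float:
    """Equation 5: percentage of steps with a correctly predicted direction."""
    hits = direction_hits(target, output)
    return 100.0 * sum(hits) / len(hits)


def pda(
    target: Sequence[float], output: Sequence[float], re_max: float = 0.02
) -> float:
    """Equations 10 and 11: mean of G_t over the steps that have a predecessor."""
    hits = direction_hits(target, output)
    total = 0.0
    for i, d_t in enumerate(hits, start=1):
        # Relative error re_t = e_t / T_t (T_t assumed non-zero, e.g. a price).
        re_t = abs(target[i] - output[i]) / target[i]
        if d_t == 1:
            # Correct direction: score shrinks from 1 to 0 as the error grows.
            g_t = 1.0 - re_t / re_max if re_t < re_max else 0.0
        else:
            # Wrong direction: branches taken as printed in Equation 11.
            g_t = -1.0 + re_t / re_max if re_t < re_max else -1.0
        total += g_t
    return total / len(hits)


# Hypothetical expected values and model outputs.
expected = [100.0, 102.0, 101.0, 103.0]
predicted = [100.0, 101.0, 101.5, 104.0]

print("POCID:", pocid(expected, predicted))
print("PDA  :", pda(expected, predicted))
```

In this reading, PDA lies between -1 and 1: a model that always predicts the right direction with zero error scores 1, while steps with a wrong direction push the score toward -1.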
The constant $re_{max} = 0.02$ is the maximum relative error accepted for a prediction. In this case, the maximum tolerance is a 2% error.

$$D_t = \begin{cases} 1, & \text{if } (T_t - T_{t-1})(Y_t - Y_{t-1}) > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

E. ARV (Average Relative Variance)

The Average Relative Variance compares the model with a reference model which proposes that the future value of the time series is equal to the arithmetic mean of the past values. It is defined as: