
Neurocomputing 104 (2013) 10–25


A HMM-based adaptive fuzzy inference system for stock market forecasting


Md. Rafiul Hassan a,*, Kotagiri Ramamohanarao b, Joarder Kamruzzaman c, Mustafizur Rahman b, M. Maruf Hossain b

a Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
b Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
c Gippsland School of IT, Monash University, Churchill, VIC 3842, Australia

Article info

Article history:
Received 24 March 2012
Received in revised form 12 July 2012
Accepted 12 September 2012
Communicated by P. Zhang
Available online 6 December 2012

Abstract

In this paper, we propose a new type of adaptive fuzzy inference system that aims to achieve improved performance for forecasting nonlinear time series data by dynamically adapting the fuzzy rules with the arrival of new data. The structure of the fuzzy model utilized in the proposed system is developed based on the log-likelihood value of each data vector generated by a trained Hidden Markov Model. As part of its adaptation process, our system checks and recomputes the parameter values and generates new fuzzy rules as required, in response to new observations, to obtain better performance. In addition, it can also identify the most appropriate fuzzy rule in the system that covers the new data, and thus needs to adapt the parameters of the corresponding rule only, while keeping the rest of the model unchanged. This intelligent adaptive behavior enables our adaptive fuzzy inference system (FIS) to outperform standard FISs. We evaluate the performance of the proposed approach for forecasting stock price indices. The experimental results demonstrate that our approach can predict a number of stock indices, e.g., the Dow Jones Industrial (DJI) index, the NASDAQ index, the Standard and Poor's 500 (S&P500) index and a few other indices from the UK (FTSE100), Germany (DAX), Australia (AORD) and Japan (NIKKEI) stock markets, more accurately than other existing computational and statistical methods.

© 2012 Elsevier B.V. All rights reserved.

Keywords:
Fuzzy system
Hidden Markov Model (HMM)
Stock market forecasting
Log-likelihood value

1. Introduction
Adaptive online systems have great appeal in domains where events change dynamically; typical examples include finance, manufacturing and control engineering. A system is termed adaptive if it can evolve according to changes in the characteristics of the problem. For instance, to model a chaotic time series whose values change randomly, the system should continuously update its knowledge and adapt itself. The aim of such a system is to improve performance through enhanced modelling of the changes in behavior. Different application areas in engineering, computer science and financial forecasting and analysis can benefit from such adaptive systems.
An adaptive online learning system should satisfy the following criteria to be efficient and effective:

1. It should be able to capture the characteristics of new information as it becomes available;


2. The system should be able to represent the overall knowledge about the problem without needing to memorize the large amount of raw data;
3. The system should be able to update its knowledge in real time and incrementally update its model;
4. The performance of the adaptive system should be better than that of a static offline system for nonstationary time series data.
Neural networks have been popular for supervised learning; however, several studies [1–7] have demonstrated that these tools can be limited in their ability to be adaptive. In contrast, fuzzy logic can more easily be made adaptive [8], since new rules can be generated online and rule parameters can be modified in accordance with the new data. When generating an adaptive fuzzy model, performance is a crucial factor, particularly since increasing the number of rules does not always guarantee improved performance. However, changing the parameter values according to new data can potentially overcome the influence of the most distant past data on the model construction.
There exist a number of adaptive models which combine a neural network-like structure to optimize the parameters of fuzzy rules. One example is the Adaptive Neuro Fuzzy Inference System (ANFIS) [8]. The limitation of this system is that it cannot adapt


the structure of the fuzzy model once the fuzzy model has been built. The Evolving Fuzzy Neural Network (EFuNN), introduced in [9,10], is another system, which uses the evolving connectionist systems (ECOS) architecture to make the system evolve. In the dynamic version of EFuNN [11] the parameters are self-optimized. In EFuNN, a new rule is generated if the distance between the new data vector and the cluster centres of each of the existing rules is greater than the predefined cluster radius R. Hence, the performance of the model depends on the optimal choice of R. Furthermore, the distance function between two fuzzy membership vectors works well for discretized data values but is not suitable for real continuous numbers. To adjust the rule parameters, a feedback algorithm is used, which requires storage to keep the desired outputs.
Recently, the Dynamic Evolving Neuro Fuzzy Inference System (DENFIS) [12] has become popular due to its adaptive and online learning nature. DENFIS is quite similar to EFuNN, except that in DENFIS the position of the input vector in the input space is identified online and the output is dynamically computed based on the set of fuzzy rules created during the earlier learning process. Rules are created and updated by partitioning the input space using an online evolving clustering method (ECM). In ECM, the distance between a data point and a cluster center is compared with a predefined threshold Dthr, which is then used to generate clusters and corresponding fuzzy rules. The threshold Dthr, which is effectively the radius of a cluster, must be statically defined and can affect the performance of the obtained model. DENFIS uses the Euclidean distance [13] to measure the difference between two input vectors. However, the Euclidean distance is not a suitable method for differentiating time series data patterns containing a linear drift [14]. For example, the two time series data vectors D1: ⟨0, 1, 2, 3, 4, 5, 6, 7, 8⟩ and D2: ⟨5, 6, 7, 8, 9, 10, 11, 12, 13⟩ (shown in bold in Fig. 1) have similar trends, although they are dissimilar in terms of Euclidean distance. For a time series application, since these two data vectors exhibit a similar pattern, they should belong to the same rule. Consequently, the performance of DENFIS usually degrades as more new data is adapted when it is applied to forecasting non-linear time series data.
Another approach proposed in the literature for realizing adaptive Fuzzy Inference Systems is to leverage evolutionary approaches, such as the Genetic Algorithm (GA). In [15], a GA-based approach for adapting and evolving fuzzy rules was proposed to achieve automated negotiation among relax-criteria negotiation agents in e-markets. An evolutionary approach for the automatic generation of a FIS was proposed in [16], where the structure and parameters of the FIS are generated through reinforcement learning

and the fuzzy rules are evolved via GA. In [17], a method for generating a Mamdani FIS was introduced, where the fuzzy model parameters are optimized by applying GA. Although GA is quite popular for developing evolving fuzzy systems, its inherent computational and time complexity makes this approach inapplicable to forecasting ever-changing non-linear chaotic time series data.
A Hidden Markov Model (HMM) can be applied to find similarities in the patterns of time series data [18–20]. In [21,22,19,23], the HMM–Fuzzy model was proposed by exploiting the ability of the HMM to capture pattern similarities as well as the ease with which a fuzzy approach can be made adaptive. The HMM–Fuzzy model is an offline data-driven fuzzy rule generation technique where the HMM's data pattern identification method is used to rank the data vectors, after which fuzzy rules are generated. The reason for using a HMM is that it models a system such that higher probability is assigned to the data vectors that represent the system than to the data vectors that represent minority scenarios of the system. Though these models have shown promising results, their performance in forecasting time series data is still inadequate, and they are designed for offline learning only. To improve performance, a model needs to learn online, where new and recent data trends can be captured, making the model continuously adaptive.
In this paper, we propose a model called the Adaptive Fuzzy Inference System (AFIS), which consists of two phases. First, an initial fuzzy model is generated using a small number of training data vectors. To generate the initial fuzzy model, a HMM is trained and used to compute log-likelihood values for each of the data vectors. These log-likelihood values are then used to rank and group the data vectors to generate appropriate fuzzy rules, as described in Section 3.1.2. Second, the fuzzy model is adapted upon the arrival of new data, making it a continuously adaptive online system. On observing new data, either the fuzzy rule that satisfies the data is identified using the HMM and is then adapted for the new data, or a new fuzzy rule is generated.
The proposed AFIS differs from the models in our previous studies in a number of ways. First, AFIS is an online learning system, while the others learn only offline. Second, AFIS is an adaptive model: in previous studies, once a model was built based on the available data it remained unchanged, while in AFIS intelligent online learning is used to adapt the initial model as new data arrives; the currently defined rule is fine-tuned to fit the new data and, if necessary, a new rule is generated. Third, in AFIS, the training dataset does not have to be large, and the model need not necessarily be trained with data having the characteristics of the unknown test data; rather, it can be trained incrementally as new data become available. All these features make AFIS very suitable for forecasting time series data, and it outperforms other existing methods in the literature, including our previous models, as demonstrated in Section 5.
The remainder of the paper is organized as follows. In Section 2, we briefly discuss the fundamental concepts of HMMs. We describe the proposed approach in detail in Section 3. Section 4 presents the design of our experimental investigation. We present and discuss the results in Section 5. Lastly, in Section 6, we suggest future improvements and conclude the paper. The notations listed in Table 1 are used in describing the algorithms in the remainder of the paper.

2. Preliminaries


In this section, we describe the preliminary concepts of HMMs, which are useful in understanding the proposed AFIS.


Fig. 1. Two similar data patterns with different Euclidean distance (ED). Here the ED between D1 and D2 is 15.

2.1. Hidden Markov Model


A Hidden Markov Model (HMM) is like a finite state machine that represents the structure or statistical regularities of sequences. HMMs have been applied to speech recognition since the early 1970s. We first use the common urn and ball example to review the basic idea of HMMs. Suppose there are N urns containing colored balls, and there are M distinct colors of balls. Each urn has a (possibly) different distribution of colors. First, we pick an initial urn according to some probability. Second, we randomly pick a ball from the urn and then replace it. Third, we again select an urn according to a random selection process associated with the urns. We repeat Steps 2 and 3. In this example, we can regard the urns as states and the balls as observation symbols.

Table 1
List of notations.

Notation          Description
N                 Number of states of a HMM
M                 Number of distinct symbols for a HMM
x⃗                 An input data vector/an input observation sequence
x1, x2, ..., xk   Data features in data vector x⃗
A                 State transition probability matrix
Si                ith state of a HMM
Q                 State sequence {q1, q2, ...}
q                 Current state
q′                The next state
aij               State transition probability from state Si to Sj at time t
B                 Observation emission probability matrix
bSj(ck)           Emission probability of observation symbol ck from state Sj
π                 Initial state transition probability vector
λ                 The HMM model
Rl                l-dimensional space of continuous/real numbers
Mij               Membership function for the jth rule of the ith feature xi
ωj                The firing strength of the jth rule
E(x⃗j)             Prediction error for the data vector x⃗j
Emse              Prediction error for the total training dataset in MSE
O                 The set of observation symbols
ck                A distinct observation symbol
χ                 A set of observation sequences
x⃗cont             An input data vector/an observation sequence of continuous real numbers
lli               The log-likelihood value of generating x⃗ given the HMM λ
k                 Dimension of an input data vector/instance x⃗
b⃗                 Coefficients of the consequent part of a fuzzy rule
D                 Training dataset
G                 A probability distribution on [0, 1]
N(·)              A multivariate Gaussian density function
μ                 The mean vector for the Gaussian density function N
u²(S)             The covariance matrix for the Gaussian density function N
u²t(Si)           Scaling matrices for all states i
Φij               Center of the membership function for the ith feature of the jth rule
σij               Steepness of the membership function for the ith feature of the jth rule

In HMMs, the states (in the above example, the urns) are not observable (i.e., they are hidden). Observations are a probabilistic function of the state, and state transitions are probabilistic. More formally, a HMM is characterized by the following elements:

1. N, the number of states in the model. The set of states is denoted as S = {S1, S2, ..., SN}.
2. M, the number of distinct observation symbols, i.e., the individual symbols. The set of symbols is denoted as O = {c1, c2, ..., cM}.
3. The state transition probability distribution matrix A = [aij]; the values of aij are calculated as

   aij = Pr(q′ = Sj | q = Si),  1 ≤ i, j ≤ N,    (1)

   where q′ is the next state, q is the current state and Sj is the jth state.
4. The observation symbol probability distribution matrix B = [bSj(ck)], where ck ∈ O and

   bSj(ck) = Pr(ck | q = Sj),  1 ≤ j ≤ N and 1 ≤ k ≤ M,    (2)

   where bSj(c) represents the emission probability of an observation symbol c in state Sj.
5. The initial state distribution vector π = {πi}, where

   πi = Pr(q0 = Si),  1 ≤ i ≤ N,    (3)

   where q0 is the initial state.
6. λ, the entire model, λ = (A, B, π).

There are three basic problems associated with using HMMs [20]. First, given an observation sequence x⃗ = ⟨x1, x2, ..., xT⟩, xi ∈ O, and a HMM λ = (A, B, π), compute Pr(x⃗|λ). Second, given an observation sequence x⃗ = ⟨x1, x2, ..., xT⟩, xi ∈ O, and a model λ, find the optimal state sequence Q = ⟨q1, q2, ..., qT⟩, qi ∈ S. Third, given a set of observation sequences χ, estimate the model parameters λ = (A, B, π) that maximize Pr(x⃗i|λ) for all x⃗i ∈ χ.
Our goal in using HMMs is to rank the data vectors, which we will then use to generate fuzzy rules in a later phase. To achieve this, we need to solve both the third and the first problems described above. There is no known method to solve the third problem analytically, i.e., to adjust the model parameters so as to maximize the probability of the observation data vectors. The Baum–Welch algorithm [24] is an iterative procedure that can determine the parameters suboptimally, similar to the expectation maximization (EM) method. It operates as follows: (1) let the initial model be λ0; (2) compute a new λ based on λ0 and the observation sequence x⃗; (3) if log Pr(x⃗|λ) − log Pr(x⃗|λ0) < τ, stop; otherwise set λ0 = λ and go to step 2 (τ is the minimum tolerance between two subsequent models).
The first problem can be solved using the forward–backward algorithm, where, given the HMM, the probability of generating a k-dimensional data vector ⟨x1, x2, ..., xk⟩ is calculated using the following set of equations [20]:

Pr(x⃗|λ) = Σ_{∀Q} Pr(x⃗|Q, λ) Pr(Q|λ),    (4)

where Q = state sequence (q1, q2, ..., qk), qi ∈ S (for a k-state HMM), and x⃗ = input data vector ⟨x1, x2, ..., xk⟩, xi ∈ O (observation sequence). The values of Pr(x⃗|Q, λ) and Pr(Q|λ) are calculated using the following equations [20]:

Pr(x⃗|Q, λ) = Π_{i=1}^{k} Pr(xi|qi, λ) = bq1(x1) bq2(x2) ... bqk(xk),    (5)

where bqi(xi) = emission probability of the feature xi from state qi, and

Pr(Q|λ) = π1 · aq1,q2 · aq2,q3 ... aq(k−1),qk,    (6)

where π (or π1) = prior probability and aqi,qj = transition probability from state qi to state qj.
So far we have described a HMM that deals with a sequence of discrete symbols. Most real-world problems, however, are continuous (e.g., speech signal recognition, human movement recognition and stock indices prediction), and hence a HMM able to deal with continuous datasets is required. This can be achieved through a slight modification of the discrete HMM. The following section reviews how a HMM can be used for continuous data.
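As a concrete illustration of solving the first problem, the following is a minimal forward-algorithm sketch for a discrete HMM in Python; it is our own illustration under a toy model, and omits the numerical scaling used in practice for long sequences.

import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Forward algorithm for Pr(x | lambda) of a discrete HMM (problem 1).

    A:  N x N state transition matrix; B: N x M emission matrix;
    pi: length-N initial state distribution; obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]          # initialization with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction step over the sequence
    return alpha.sum()                 # Pr(x | lambda)

# toy 2-state, 3-symbol model
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(forward_likelihood(A, B, pi, [0, 1, 2]))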
2.2. HMM for continuous data

There are a number of ways to generate a HMM that deals with continuous data. First, the continuous dataset can be converted into a number of discrete sets by adopting a quantization technique. In fact, a number of studies, especially those dealing with continuous speech data [25], first translate the continuous features into a set of discrete symbols. Another approach is to map the discrete output distribution bj(k) to a continuous output probability density function. The advantage of doing this over quantization is that the inherent quantization error can be eliminated [26]. Hence, a HMM with a continuous output probability density function is less error prone than a discrete HMM with quantized continuous data.

Fig. 2. Step-by-step example of the proposed model: (1) Convert univariate time series data into data vectors (windows); (2) Feed the data vectors into a HMM; (3) Train the HMM using the expectation maximization algorithm; (4) Calculate the log-likelihood value for each of the training data vectors and rank them; (5) Group the data vectors based on the ranking; (6) Generate a set of fuzzy rules (considered as the fuzzy system) using the data vector groups; (7) Adapt the generated fuzzy system whenever a new data vector arrives; (8) Feed the new data vector into the trained HMM; (9) Compute the log-likelihood value lnew for the new data vector; (10) If lnew is not within the range of minimum and maximum log-likelihood values (i.e., ranking scores) of the fuzzy system, create a new fuzzy rule; (11) Otherwise identify the rule that the new data vector fits; and (12) Modify the selected fuzzy rule.
To re-estimate the HMM parameters, Baum et al. [27,28] described a generalization of the Baum–Welch algorithm to deal with such a continuous density function. A necessary condition is that the probability density functions must be strictly log-concave, which constrains the choice of the continuous probability density function to the Gaussian, Poisson or Gamma distribution [26].

3. Adaptive fuzzy inference system

The proposed adaptive online fuzzy inference system has two phases (as illustrated in Fig. 2; a minimal sketch of this two-phase flow is given after the list):

Phase 1: Initial fuzzy rule base generation
  - Convert univariate time series data into data vectors (windows)
  - Feed the data vectors into a HMM
  - Train the HMM using the expectation maximization algorithm
  - Calculate the log-likelihood value for each of the training data vectors and rank them
  - Group the data vectors based on the ranking (log-likelihood scores)
  - Generate a set of fuzzy rules (considered as the fuzzy system) using the data vector groups

Phase 2: Adaptation of the rule parameters in response to the arrival of new data sequences, or online generation of a new rule, if required
  - Feed the new data vector into the trained HMM
  - Compute the log-likelihood value for the new data vector
  - If the log-likelihood value does not fall within the range of minimum and maximum log-likelihood values (i.e., ranking scores) of the fuzzy system, create a new fuzzy rule
  - Else identify the rule where the new data vector fits
  - Adapt the selected fuzzy rule
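The following minimal Python sketch captures this two-phase control flow. All names (create_rule, adapt_rule, afis_step) and the default steepness sigma0 of a freshly seeded rule are our own illustration, not the authors' implementation; the recursive update of the consequent coefficients (Eq. (15)) is sketched separately under Section 3.2.2.

import numpy as np

def create_rule(x_new, y_new, ll_new, sigma0=1.0):
    """Seed a rule from a single data vector (Section 3.2.3). The initial
    steepness sigma0 is an assumption; the paper derives the membership
    parameters from the arriving data vector itself."""
    k = len(x_new)
    return {
        'phi': x_new.astype(float),          # membership centers, Eq. (7)
        'sigma': np.full(k, sigma0),         # membership steepness
        'b': np.append(y_new, np.zeros(k)),  # consequent coefficients, Eq. (8)
        'n': 1,                              # data vectors covered so far
        'll_range': (ll_new, ll_new),        # log-likelihood range of the rule
    }

def adapt_rule(rule, x_new, y_new):
    """Update centers and steepness with Eqs. (16)-(17) of Section 3.2.2."""
    n = rule['n']
    phi, sigma = rule['phi'], rule['sigma']
    rule['phi'] = (x_new + n * phi) / (n + 1.0)
    rule['sigma'] = np.sqrt((n + 1.0) / n * sigma**2 + (x_new - phi)**2 / (n + 1.0))
    rule['n'] = n + 1

def afis_step(score, rules, x_new, y_new):
    """One Phase-2 step; `score` maps a data vector to its HMM log-likelihood."""
    ll = score(x_new)
    for rule in rules:
        lo, hi = rule['ll_range']
        if min(lo, hi) <= ll <= max(lo, hi):
            adapt_rule(rule, x_new, y_new)
            return
    rules.append(create_rule(x_new, y_new, ll))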


In Phase 1, the initial fuzzy model is generated. In the process of generating the model, the extraction of appropriate and accurate fuzzy rules from the data is a challenge, because, even for a small number of data features in the dataset, there is potentially a large number of rules that can be generated. There are several methods that can be employed for generating fuzzy rules representing the input–output relationship [29–31]. In AFIS, we follow the fuzzy rule generation approach that uses a HMM's ability to identify and group similar patterns, following the studies [22,32,18,23]. The HMM considers the dependency between features, because it uses a Markov process, where it is assumed that the current state depends on the immediate past state. Details about the generation of the fuzzy rule base are provided in the sequel.
3.1. HMM–Fuzzy model

To generate the fuzzy rule base, a HMM is trained using the well-known Baum–Welch algorithm [27], and the trained HMM is then used to compute log-likelihood values for each of the data vectors. The HMM log-likelihood value is used as a guide to extract the appropriate fuzzy rules from the training dataset.
Let us consider the training dataset, which is a univariate time series, where the set of data vectors is obtained by choosing a fixed window of size WT which slides forward with respect to time. Let D be a univariate time series of length T, i.e., D = ⟨x1, x2, x3, ..., xT⟩, where xi ∈ O for 1 ≤ i ≤ T. Table 2 shows the input data vectors and the corresponding desired outputs.

Table 2
The set of predictor data vectors and the desired output for a univariate data D of length T, where xi ∈ D for 1 ≤ i ≤ T.

Data Vector 1: ⟨x1, x2, ..., xWT⟩                Desired Output: xWT+1
Data Vector 2: ⟨x2, x3, ..., xWT+1⟩              Desired Output: xWT+2
Data Vector 3: ⟨x3, x4, ..., xWT+2⟩              Desired Output: xWT+3
...
Data Vector m: ⟨xT−WT, xT−WT+1, ..., xT−1⟩       Desired Output: xT

The training data vectors are ranked using a trained HMM, and then the initial fuzzy model is built as described in the following subsections. (A short sketch of this windowing step follows below.)
It may be noted that we have used a fixed window size with uniform sampling of recent data (in our case with time lag 1) to predict future data, which is the standard practice in time series forecasting. A recent study by Minvydas and Kristina [33] showed promising results when non-uniform rather than uniform sampling of data was used in forecasting; it requires determining an optimal set of time lags from the observed time series data. Since our focus in this work is to make the HMM–Fuzzy model adaptive for online forecasting, we stick to the standard approach here; however, the effect of non-uniform sampling on our model is worth investigating in the future.
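The windowing of Table 2 can be written compactly as follows; make_windows is an illustrative helper name, not part of the authors' implementation.

import numpy as np

def make_windows(series, WT=4):
    """Slide a window of length WT over a univariate series (Table 2):
    vector i is <x_i, ..., x_{i+WT-1}> and its desired output is x_{i+WT}."""
    X = np.array([series[i:i + WT] for i in range(len(series) - WT)])
    y = np.array(series[WT:])
    return X, y

# e.g., weekly closing prices; four weeks predict the fifth (Section 4.2)
prices = [10.0, 10.4, 10.1, 10.7, 11.0, 10.8, 11.2]
X, y = make_windows(prices, WT=4)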
3.1.1. Ranking the data vectors

To partition the input data space, the data vectors are initially ranked using HMM log-likelihood scores. Since the data are continuous, a HMM for continuous data sequences (as described in Section 2.2) is used. Each data vector x⃗i has a HMM log-likelihood value lli. This value is the log of the probability of generating x⃗i given the HMM λ:

lli = log f(x⃗i|λ) = log Σ_{∀Q} Π_{t=1}^{WT} a_{q(t−1),qt} N(xit; μqt, u²t(Sqt)).

These scores are thus used to rank the data vectors, with the trained HMM as a reference point. This is depicted in the following scenario.
!
Example 1. Let us consider a dataset D, where x i A D for
!
1 ri r m; i represents the index for data vector x i and xij is the
!
jth element of x i ; 1 rj rk. That is, each of the data vectors is
k-dimensional (here the dimension of the data vector is the length
Table 2
The set of predictor data vectors and the desired output for a univariate data D of
length T where xi A D for 1 r ir T.
Data
Data
Data
y
y
Data

Vector 1:
Vector 2:
Vector 3:

Vector m:

/x1 ,x2 , . . . ,xWT S


/x2 ,x3 , . . . ,xWT 1 S
/x3 ,x4 , . . . ,xWT 2 S
y
y
/xTWT ,xTWT 1 , . . . ,xT1 S

Desired
Desired
Desired
y
y
Desired

Output:
Output:
Output:

Output:

xWT 1
xWT 2
xWT 3
y
y
xT

of the window size WT)and the dataset contains m data vectors.


Assume that the dataset D represents the daily closing price
of a stock: i.e. the ith data vector /xi1 ,xi2 ,xi3 ,xi4 S will be /xdayi ,
xdayi 1 ,xdayi 2 ,xdayi 3 S. For this dataset D, we train a HMM
l A,B, p. Assume that the log-likelihood values for the three
! ! !
data vectors x 1 , x 2 , x 3 are l1, l2 and l3, respectively. In HMM
Fuzzy model, data values with close log-likelihood values would
be assigned the same rank. Let us assume that the values of l1 and
l3 are very close within a tolerance level. On the other hand,
suppose the value of l3 is not close to the value of l1 and l2. Thus,
!
!
data vectors x 1 and x 3 will be assigned the same rank and data
!
vector x 2 will be assigned a different rank.
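The ranking step can be sketched as follows, assuming the third-party hmmlearn library (GaussianHMM with fit and score); this is an illustration only — the authors' programs were written in Matlab.

import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumption: hmmlearn is available

def rank_by_loglikelihood(X, n_states=5, seed=0):
    """Train a continuous HMM on the windowed vectors and rank each vector
    by its log-likelihood ll_i = log Pr(x_i | lambda) (Section 3.1.1)."""
    WT = X.shape[1]
    flat = X.reshape(-1, 1)        # each vector is a univariate sequence of length WT
    lengths = [WT] * len(X)
    hmm = GaussianHMM(n_components=n_states, random_state=seed).fit(flat, lengths)
    lls = np.array([hmm.score(x.reshape(-1, 1)) for x in X])
    order = np.argsort(-lls)       # highest log-likelihood first
    return hmm, lls, order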
3.1.2. Fuzzy rule inference

AFIS uses the Takagi–Sugeno (TS) type fuzzy inference model [34]. The model comprises a set of fuzzy rules such that each rule has two parts: a premise and a consequence. The consequence is usually a linear combination of all variables in an input space, and is usually denoted as a function of the input variables.
In AFIS, all fuzzy membership functions are radial basis functions which depend on two parameters, as given by the following equation:

Mij(xi) = e^(−(1/2)((xi − Φij)/σij)²),    (7)

where Mij(xi) represents the membership function for the attribute xi and the jth fuzzy rule, and Φij is the center and σij the steepness of the membership function for the ith feature, i.e., xi in the dataset considered to generate the jth rule.
In the fuzzy model, the non-linearity in the dataset is considered to be a combination of linear representations. As soon as representative straight lines are obtained for the non-linear data, the TS fuzzy model is generated and the membership function of each of the linear representations is derived. The mathematical equation for each of the linear representations is a first-order polynomial, which is used to represent a fuzzy rule in the model. The representation syntax for such a fuzzy rule is [35]

jth rule: If v1 is M1j and v2 is M2j and ... vk is Mkj
          Then ŷj is fj(v1, v2, ..., vk).

Here vi ∈ V⃗, 1 ≤ i ≤ k; V⃗ is a k-dimensional input data vector and Mij is the fuzzy relationship among the vi's.
The linear function fj(v1, v2, ..., vk) is represented as follows:

ŷj_pred = v̂k+1 = bj0 + bj1 v1 + bj2 v2 + ... + bjk vk,    (8)

here ŷj_pred is the output predicted by the jth rule and bj0, bj1, ..., bjk are the coefficients.
In the TS model, the consequence defining a linear mapping can be interpreted geometrically as a hyperplane in the input–output space [24], and the defuzzification of the model is computed as the weighted average of each rule's output, as represented in Eq. (9) [35]:

ŷpred = (Σ_{j=1}^{c} ωj · ŷj_pred) / (Σ_{j=1}^{c} ωj),    (9)

where ωj = Π_{i=1}^{k} Mij (for a k-dimensional input data vector) and c = the total number of rules in the model.
In AFIS, the least-square estimation (LSE) [36,37] method is used to obtain the optimized parameter values of Eq. (8) in the consequent part of each fuzzy rule.
Let us assume that there are m data vectors for the jth fuzzy rule. The coefficients bi ∈ b⃗, 0 ≤ i ≤ k, of Eq. (8) are obtained by applying the LSE formula (Eq. (10)):

b⃗ = C X^T y⃗,    (10)


where

C = (X^T X)^(−1),   X = [ x11  x12  ...  x1k
                          x21  x22  ...  x2k
                          ...  ...  ...  ...
                          xm1  xm2  ...  xmk ],   y⃗ = (y1, y2, ..., ym)^T.
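For illustration, the inference equations (7)–(9) and the batch LSE fit of Eq. (10) can be sketched as follows. The rule representation (dicts with keys 'phi', 'sigma' and 'b') matches the sketch given after the Section 3 overview; prepending a column of ones to X to absorb the intercept b0, and solving via least squares rather than an explicit matrix inverse, are our conventions.

import numpy as np

def memberships(x, phi, sigma):
    """Eq. (7): radial-basis membership of each feature of x for one rule."""
    return np.exp(-0.5 * ((x - phi) / sigma) ** 2)

def ts_predict(x, rules):
    """Eqs. (8)-(9): firing-strength-weighted average of the rule outputs."""
    w = np.array([memberships(x, r['phi'], r['sigma']).prod() for r in rules])
    y_rule = np.array([r['b'][0] + r['b'][1:] @ x for r in rules])
    return (w @ y_rule) / w.sum()

def fit_consequent(X, y):
    """Eq. (10): least-squares estimate of the consequent coefficients b0..bk."""
    X1 = np.hstack([np.ones((len(X), 1)), X])  # column of ones for b0
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b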

3.1.3. Fuzzy rule generation

In the process of generating the initial fuzzy rules, we divide the dataset using the log-likelihood score/rank of each data vector, through application of a divide and conquer approach.
To begin with, we create only one fuzzy rule that represents the entire input space of the training dataset. At this point, all the data vectors are considered to belong to one global group; therefore the log-likelihood value of each data vector does not have any role in generating the fuzzy rule. In the process of rule generation, we calculate the center Φi and steepness σi that define the membership function for each feature xi in the dataset. Let us assume the dataset D is used to build the initial fuzzy model. The parameters {Φ⃗, σ⃗} for the single generated rule, which covers the whole dataset D, are obtained as follows:

Φi = (Σ_{j=1}^{m} xij) / m,    (11)

σi = sqrt( (1/m) Σ_{j=1}^{m} (xij − Φi)² ),    (12)

where m = total number of data vectors in D and xij = ith attribute of the jth vector.
The parameter values of the consequent part are obtained using Eq. (10). The generated fuzzy rule is used to predict the output ŷj for each data vector x⃗j in the training dataset. The prediction error E(x⃗j) for each data vector is computed using Eq. (13):

E(x⃗j) = ŷj − yj,    (13)

here ŷj is the value predicted using the generated fuzzy rule set and yj is the actual value for the jth data vector x⃗j.
The total mean squared error (MSE) Emse for the training dataset (m = total number of training data vectors/instances) is obtained by Eq. (14):

Emse = (Σ_{j=1}^{m} E(x⃗j)²) / m.    (14)

The prediction error Emse is used to evaluate the performance of the developed model on the training dataset. If the error for the training dataset does not reduce further, the algorithm is terminated and no further rule is generated. Otherwise, the input training data is split into two groups with the help of the data vectors sorted according to their ranks. The splitting of the data is done by grouping the data vectors based on their ranks.
Initially, the split is done in such a way that the first group contains data vectors having comparatively higher rank than the data belonging to the other group. To achieve this, a parameter θ is introduced: the first θ% of the whole ranked dataset is considered to form one group and the remaining data, i.e., (100 − θ)% of the whole ranked dataset, belongs to the other group. We create a new rule for each created partition; thus each split increases the number of rules by one. The prediction error Emse for the training dataset is recalculated using the extracted rule set. At each step of increase in the number of rules, the convergence of the error is monitored. Rule generation is stopped when adding a rule does not yield further improvement in the prediction error Emse.


In the case of further rule creation, the dataset in the second part is further split to extract more rules. The first θ% of the ranked dataset in the second part is selected to form a group, and the remaining part of the dataset in the second part of the whole dataset is considered as another group. A new rule is generated for each of the new partitions and the prediction error Emse for the training dataset is recalculated using the extracted rule set. This process of rule generation continues until the prediction error Emse for the training dataset reaches a plateau or there is no data in the last partitioned group to split further.
Each time a new rule (say, the jth rule) is generated, the total number of data vectors nj and the HMM log-likelihood value range (starting point lj_start and end point lj_end) considered to generate that jth rule are stored, to be used at a later stage by the system.
used at later stage of the system.
Example 2. Let us consider the dataset described in Example 1. First, one fuzzy rule is generated using all the data vectors in dataset D. We assume that the prediction error for the generated fuzzy rule is 0.9, which is greater than the best possible minimum prediction error 0. To reduce the error further, the number of fuzzy rules is increased by one. The dataset is divided into two parts using the HMM log-likelihood value of each data vector, which serves as a rank score. The first part consists of the data vectors with log-likelihood values in the range of 0.0 to −1.5 (i.e., the first θ% of the whole ranked dataset), and the second part contains the data vectors with log-likelihood values in the range of −1.5 to −3.5 (i.e., the remaining (100 − θ)% of the dataset). Two fuzzy rules are generated using these two parts of the divided dataset, and we obtain a prediction error of 0.3 using the two fuzzy rules. Assuming that the error might be reduced further, additional rule generation is attempted. The dataset in the second partitioned group is split into two groups: the first θ% of the ranked data vectors of the second partitioned dataset forms one group and the remaining data vectors form another group. New rules are generated for each of the groups (the first partitioned group and the newly obtained two groups), so we have now obtained three fuzzy rules. Let us assume that the prediction error using these three fuzzy rules is 0.5. A further split of the last partitioned data produces a prediction error of 0.7. Let us assume that we have reached a point where it is not possible to split the last partitioned data vectors any further. Now, the best minimum prediction error is 0.3, which was achieved using two fuzzy rules. The final fuzzy model is built using those two fuzzy rules, and the rule generation process is terminated.
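The divide-and-conquer loop of Section 3.1.3 can be summarized as below. This is a sketch under our own naming: fit_rule (which builds one rule via Eqs. (10)–(12) and records its log-likelihood range) and evaluate (which returns Emse via Eq. (14)) are assumed callables, and the sketch returns the rule set with the lowest training error, as in Example 2.

import numpy as np

def generate_rules(X_ranked, y_ranked, lls_ranked, fit_rule, evaluate, theta=0.8):
    """Divide-and-conquer rule generation (Section 3.1.3).

    Inputs are sorted by descending log-likelihood; theta is the split ratio."""
    rules = [fit_rule(X_ranked, y_ranked, lls_ranked)]   # one global rule
    best_err, best_rules = evaluate(rules), list(rules)
    start = 0                                  # head of the still-splittable tail
    while len(X_ranked) - start > 1:
        cut = start + max(1, int(theta * (len(X_ranked) - start)))
        head = slice(start, cut)               # first theta% of the tail
        tail = slice(cut, len(X_ranked))       # remaining (100 - theta)%
        rules = rules[:-1] + [
            fit_rule(X_ranked[head], y_ranked[head], lls_ranked[head]),
            fit_rule(X_ranked[tail], y_ranked[tail], lls_ranked[tail]),
        ]
        err = evaluate(rules)
        if err < best_err:                     # keep the best rule set seen so far
            best_err, best_rules = err, list(rules)
        start = cut                            # keep splitting the second part
    return best_rules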
3.2. Adaptive fuzzy

To make the HMM–Fuzzy model adaptive to newly arriving data, which might become available after the model has been built, we first need to identify the rule that is to be adapted to reflect the new data behavior. This process therefore has two steps: (1) identifying the related rule and (2) updating the selected rule's parameters.

3.2.1. Extracting the rule to be modified

When new data is available, the log-likelihood value for the new data is calculated. Based on this log-likelihood value, the corresponding rule, i.e., the rule that was generated using data vectors with similar log-likelihood scores, is identified. Then that rule's parameters are adapted with the new data.
Fig. 3. The selected rule and its adaptation.

Example 3. Let us consider a dataset D with a total of n data vectors. For this dataset we train a HMM λ and produce the set of fuzzy rules using the HMM–Fuzzy approach. Let us assume n = 40 and that we have three fuzzy rules. Among these rules, the first rule has been generated using the data vectors with log-likelihood values in the range of 0.0 to −0.6, while that for the second rule is between −0.61 and −0.7, and that for the third rule is between −0.71 and −3.0. Suppose the new data vector produces a log-likelihood value of −0.9. This value indicates that the new data vector is covered by the third rule. Hence, to adapt the fuzzy model given the new data vector, we must update the third rule.
3.2.2. Adapt the extracted rule(s)

Adaptation of the extracted rule given the new data vector is the process of adjusting the parameters of the rule. There are three sets of parameters which need to be adjusted: (1) the linear parameters of the consequent part of the rule, (2) the center Φ of each membership function M, and (3) the steepness σ of each membership function.
To adapt the linear parameters b⃗ to b⃗* of the consequent part of the extracted rule upon arrival of the new data ynew, the formula for recursive LSE [37] is used, as in Eq. (15):

C* = C − (C x⃗new x⃗new^T C) / (1 + x⃗new^T C x⃗new),
b⃗* = b⃗ + C* x⃗new (ynew − x⃗new^T b⃗).    (15)

Here, x⃗new is the data vector that corresponds to the output ynew.
Based on the newly available data x⃗new, recalculation of the parameters Φ and σ for each membership function of the selected rule is done using Eqs. (16) and (17):

Φ*ij = (1/(nj + 1)) (xnew,i + nj Φij),    (16)

σ*ij² = ((nj + 1)/nj) σij² + (xnew,i − Φij)²/(nj + 1),    (17)

where Φij = current center defining the membership function Mij, Φ*ij = the new center of the membership function Mij, j = the selected rule, nj = number of data vectors that were used to generate rule j during initial model building, xnew,i = ith feature of the data vector x⃗new, and Mij = membership function of the ith feature of the selected jth rule.
Fig. 3 shows the effect of adaptation on membership functions
of a selected rule.
Example 4. Considering the same setup described in Example 3, we now have the new data vector x⃗new and the selected rule rulej. We adjust the center Φij and steepness σij of each membership function Mij of rule rulej. Let us assume that Φ⃗j = ⟨6.2, 5.7, 6.3, 6.5⟩, i.e., Φ1j = 6.2, ..., Φ4j = 6.5, σ⃗j = ⟨0.4, 0.7, 0.63, 0.49⟩ and x⃗new = ⟨5.5, 6.5, 6, 7⟩. Using Eqs. (16) and (17) we obtain the adjusted center values Φ⃗*j = ⟨6.16, 5.75, 6.28, 6.53⟩ and the adjusted steepness values σ⃗*j = ⟨0.4488, 0.7502, 0.6550, 0.5213⟩.
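A sketch of this adaptation step in Python is given below; the function names are ours. With nj = 15 (an assumption — Example 4 does not state nj) the membership update reproduces the adjusted values quoted above up to rounding.

import numpy as np

def rlse_update(C, b, x_new, y_new):
    """Eq. (15): recursive least-squares update of the consequent coefficients."""
    Cx = C @ x_new
    C_new = C - np.outer(Cx, Cx) / (1.0 + x_new @ Cx)
    b_new = b + C_new @ x_new * (y_new - x_new @ b)
    return C_new, b_new

def update_membership(phi, sigma, x_new, n_j):
    """Eqs. (16)-(17): adapt the center and steepness of one rule."""
    phi_new = (x_new + n_j * phi) / (n_j + 1.0)
    var_new = (n_j + 1.0) / n_j * sigma**2 + (x_new - phi)**2 / (n_j + 1.0)
    return phi_new, np.sqrt(var_new)

# Example 4 check (n_j = 15 assumed):
phi = np.array([6.2, 5.7, 6.3, 6.5])
sigma = np.array([0.4, 0.7, 0.63, 0.49])
x_new = np.array([5.5, 6.5, 6.0, 7.0])
phi_s, sigma_s = update_membership(phi, sigma, x_new, n_j=15)
print(phi_s)    # ~ [6.16, 5.75, 6.28, 6.53]
print(sigma_s)  # ~ [0.4487, 0.7501, 0.6550, 0.5213], matching Example 4 up to rounding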
3.2.3. Generate new rule

In AFIS, a new rule is also generated when required. If the HMM generates a log-likelihood value for a new data vector which does not fit in the existing ranges (i.e., the new log-likelihood value exceeds the range of log-likelihood values that was used to generate the existing fuzzy model), a new rule needs to be generated. In this case, given the desired output for the new data vector, the LSE algorithm is used to obtain the linear parameters for the new rule. The parameters of the membership functions for the new rule are obtained from the new data vector: since there is only one data vector pertaining to the new rule, the center value Φnew,i and steepness value σnew,i of each membership function for this new rule are obtained from the newly arriving data vector. Thereafter, every time the system fetches a new data vector of similar pattern, it adapts itself with that arriving data vector as described in Section 3.2.1.


4. Experiment design and data sets


Our experiments are conducted on real stock data. The hardware used is an AMD 2.3 GHz CPU with 4 GB memory. Programs
were written in Matlab and run using Windows Vista.
4.1. Data sets

We have used data for seven leading stock market indices from different parts of the world: the Dow Jones Industrial Average (DJI), NASDAQ Composite (NASDAQ) and S&P 500 Index RTH (S&P500) from the US stock market; the FTSE 100 and DAX Performance Index (GDAXI or DAX) from the European stock markets; and the All Ordinaries (AORD) and NIKKEI 225 (N225 or NIKKEI) from the Asian stock markets [38]. We have used the historical weekly data of the above-mentioned stock market indices; the time range of each dataset is detailed in Table 3.
4.2. Data setup

We have used weekly stock indices for model training and evaluation: four weeks are used as input variables and the fifth week as the predicted variable.
In AFIS, we build an initial model considering a small amount of data as training data. This initially built model is then adapted with the arrival of new data, following the adaptive process described in Section 3.2. To evaluate the online adaptive behavior of the proposed system, we built the initial model by varying the length of the training dataset from as few as 60 data vectors to a maximum of 3000 data vectors. The remaining part of the dataset was used as test data (completely unknown to the system). To make the comparison consistent, we built the other models using the same split of the data into training and test sets. The other models we have tested are: DENFIS, Chiu's fuzzy model [39] (a fuzzy model generated using subtractive clustering) tuned by the hybrid learning algorithm presented by Jang et al. [40], and the HMM–Fuzzy model.
Autoregressive Integrated Moving Average (ARIMA) is a popular technique for forecasting time series data. We compared the performance of AFIS with ARIMA, where we used the first 1000 data instances as training data and the remaining data instances as test data.
We have also generated predictions using a repetitively trained ARIMA, where the ARIMA is retrained with each arriving new data instance. Each time, the ARIMA is trained using the new data instance along with 1000 data instances from the recent past. We term this ARIMA the repetitively trained ARIMA.
We developed another fuzzy model by partitioning the training data randomly and then generating fuzzy rules for each of the partitions, in order to study the effectiveness of the initial partitioning of data using the HMM in AFIS. The number of partitions was chosen to be the same as the number of fuzzy rules generated in AFIS. We refer to this model as Randomly Partitioned Fuzzy Rule Generation (RPFRG). We made this model adaptive by adapting its parameter values with arriving new data. To adapt the parameter
Table 3
Details of the datasets used in the experiment.

Stock name   From date     To date
DJI          01/10/1928    24/08/2009
NASDAQ       05/02/1971    24/08/2009
S&P 500      03/01/1950    24/08/2009
FTSE100      02/04/1984    24/08/2009
DAX          26/11/1990    24/08/2009
AORD         03/08/1984    24/08/2009
NIKKEI       04/01/1984    24/08/2009


values, we have used the same methodology as in AFIS. We call this adaptive model the Adaptive Fuzzy System followed by RPFRG (ARPFRG). In ARPFRG, the rule that needs to be adapted is selected randomly, and its parameters are adapted given the new data vector following Eqs. (16) and (17). Details of these approaches are provided in the Appendix.
We also developed a fuzzy model where fuzzy rules were generated following a k-means clustering algorithm [41]. In this model, the value of k is provided by the user prior to building the fuzzy model. Details of this approach are provided in Appendix C. In this study we refer to this model as fuzzy rule generation using k-means clustering. We modified this offline approach of generating a fuzzy model into an online adaptive system following the procedure described in Appendix D. The adaptive version of the fuzzy rule generation using k-means is referred to as the adaptive k-means fuzzy model.
The Artificial Neural Network (ANN) is a popular forecasting tool. We used a three-layer (input-hidden-output) ANN trained by the backpropagation algorithm [42]. To determine the most suitable architecture, we trained the ANN by varying the number of hidden nodes from 5 to 35 and then selected the ANN that produced the best forecast performance.

4.3. Performance metrics

We have used three different metrics to evaluate the predictive models: Mean Absolute Percentage Error (MAPE), Normalized Root Mean Squared Error (NRMSE) and the t-test.

4.3.1. Mean Absolute Percentage Error (MAPE)

This value is calculated by first taking the absolute deviation between the actual value and the forecast value. Then the sum of the ratios of each deviation to its actual value is calculated. The percentage of the average of this total ratio is the mean absolute percentage error, as shown in Eq. (18):

MAPE = (1/r) Σ_{i=1}^{r} (|yi − ŷi| / yi) × 100%,    (18)

where r = total number of test data vectors, yi = actual stock price on week i, and ŷi = forecast stock price on week i.

4.3.2. Normalized Root Mean Squared Error (NRMSE)

This is the root mean squared error divided by the range of observed values:

NRMSE = sqrt( (1/r) Σ_{i=1}^{r} (yi − ŷi)² ) / (ymax − ymin),    (19)

where ymax and ymin are the maximum and minimum values of yi over the r test data vectors.
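A minimal, direct transcription of Eqs. (18) and (19) for reference (illustrative helper names, not the authors' code):

import numpy as np

def mape(y, y_hat):
    """Eq. (18): mean absolute percentage error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat) / y) * 100.0

def nrmse(y, y_hat):
    """Eq. (19): root mean squared error normalized by the observed range."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2)) / (y.max() - y.min())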

4.3.3. t-test

The t-test is a statistical hypothesis test where the averages of two samples, the predicted output using AFIS and the predicted output using another approach, are tested against the null hypothesis H0. Let the two averages be ȳAFIS and ȳ#, respectively. The null hypothesis is defined as

H0: ȳAFIS = ȳ#,    (20)

where ȳAFIS = average of the predicted outputs from AFIS and ȳ# = average of the predicted outputs from any other approach. The t-value for the


Fig. 4. Comparison of the MAPE for all datasets for the forecasts generated using AFIS, HMM–Fuzzy, DENFIS and Chiu's fuzzy model [39]: (a) DJI, (b) NASDAQ, (c) S&P500, (d) FTSE100, (e) DAX, (f) AORD and (g) NIKKEI. (Each panel plots MAPE against the number of training data instances.)


Fig. 5. Comparison of the NRMSE for all datasets for the forecasts generated using AFIS, HMM–Fuzzy, DENFIS and Chiu's fuzzy model: (a) DJI, (b) NASDAQ, (c) S&P500, (d) FTSE100, (e) DAX, (f) AORD and (g) NIKKEI. (Each panel plots NRMSE against the number of training data instances.)


t-test is calculated as in Eq. (21):

t-value = (ȳAFIS − ȳ#) / sqrt( (var(yAFIS) + var(ŷ#)) / n ),    (21)

where n = total number of samples. The t-value is used to determine the significance level of the difference between the two data samples. This significance level is known as the p-value. A p-value ≤ 0.05 indicates that the null hypothesis is rejected at the 95% confidence level and hence the differences in prediction are statistically significant.
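For reference, a paired t-test (as used in Section 5) can be run over two prediction series with SciPy; this is an illustrative sketch with hypothetical data, not the authors' code.

import numpy as np
from scipy import stats

# hypothetical prediction series from AFIS and a competing model
y_afis = np.array([1.02, 0.98, 1.10, 1.05, 0.97])
y_other = np.array([1.10, 0.90, 1.25, 1.00, 0.80])

t_value, p_value = stats.ttest_rel(y_afis, y_other)  # paired t-test
if p_value <= 0.05:
    print(f"difference is significant (t={t_value:.3f}, p={p_value:.4f})")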

4.4. Choice of parameters

Following the studies [43,22,19,21,23], the number of states in the HMM for the stocks is chosen as 5, matching the number of input features (i.e., the window size WT = 5) in the dataset. We generated forecasts by varying the window size from 3 to 6 and noticed insignificant variation in forecast performance; all the experimental results reported here were generated using the window size WT = 5. The initial parameter values of the HMM are chosen by following the same steps as in Hassan et al. [19]. We identified the parameter values of ARIMA through analysis of the training data using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). The parameter values (θ) of the HMM–Fuzzy model (where the first 1000 data instances were used as training data) and of ARIMA (p, d, f) are listed in Table 6. In HMM–Fuzzy, for choosing the parameter value θ, we used 90% of the training data to generate fuzzy rules and the remaining 10% of the training data to monitor the performance of the generated fuzzy model. We selected the θ that produced the minimum error for this 10% of the data, varying the value of θ from 10% to 90%. This θ was then used to generate the fuzzy model using the full training dataset. It should be noted that we used the training data only for selecting an optimal θ and generating the initial fuzzy model, while keeping the test data completely unknown to the system. The same approach was used to select the parameter values (i.e., the radii of clusters) for generating Chiu's fuzzy model. The model obtained using Chiu's subtractive clustering technique was tuned using the hybrid learning algorithm presented by Jang et al. [40] with 500 epochs.

5. Results and discussion

The graphs in Figs. 4 and 5 show the performance in MAPE and NRMSE, respectively, for the seven stocks considered in this paper. As can be seen from these graphs, the forecast performance of the proposed adaptive fuzzy inference system (AFIS) on the stock market datasets clearly surpasses that of all the other reported competing fuzzy models. From observation of the figures, two important aspects are evident: (a) better performance is achieved by AFIS irrespective of the length of the training dataset; especially for the DJI, NASDAQ and S&P500 datasets, the MAPE and NRMSE attained by AFIS are significantly lower than those of the others; (b) AFIS achieves its superior performance with only a very short

Table 4
Comparison of the p-values of the t-test (AFIS vs. each competing model, within 95% confidence) for all datasets. All differences are statistically significant.

Stock name   HMM–Fuzzy                     Chiu's model                   DENFIS
DJI          Significant (1.092 × 10⁻³)    Significant (0.0092)           Significant (3.056 × 10⁻³)
NASDAQ       Significant (0.049)           Significant (2.336 × 10⁻⁵)     Significant (0.0154)
S&P500       Significant (1.006 × 10⁻⁷)    Significant (0.0064)           Significant (2.931 × 10⁻³⁶)
FTSE100      Significant (2.009 × 10⁻⁹)    Significant (0.0137)           Significant (0.02685)
DAX          Significant (2.017 × 10⁻³)    Significant (3.336 × 10⁻¹³)    Significant (0.0471)
AORD         Significant (0.0243)          Significant (0.0015)           Significant (3.987 × 10⁻¹³)
NIKKEI       Significant (2.171 × 10⁻⁷)    Significant (0.0341)           Significant (5.289 × 10⁻⁹⁰)

Fig. 6. Performance comparison among HMM–Fuzzy, AFIS, Randomly Partitioned Fuzzy Rule Generation (RPFRG) and the Adaptive Fuzzy System followed by RPFRG (ARPFRG) for the DJI stock index: (a) performance metric NRMSE and (b) performance metric MAPE, plotted against the number of training instances.


length of the training data. For example, for the NIKKEI series, all the fuzzy models produced a consistent minimum MAPE and NRMSE starting from a training data length of 200 (as seen in Figs. 4(g) and 5(g)); for this stock, AFIS produced even better performance starting from a training data length of 60 onwards.
To further analyze the results, we conducted a paired t-test at the 5% significance level (i.e., 95% confidence level) between AFIS and the other considered techniques. As shown in Table 4, the computed p-values between the values predicted using AFIS and those predicted using HMM–Fuzzy, Chiu's subtractive clustering based fuzzy model and DENFIS are much less than 0.05. The fact that the performance of AFIS is far better than the other fuzzy systems (Figs. 4 and 5), along with the small p-values (i.e., p-value < 0.05), statistically signifies that AFIS is capable of forecasting time series data significantly better than HMM–Fuzzy, Chiu's fuzzy model and DENFIS for the stock data considered in our experiment.
To analyze what makes AFIS such an efficient forecasting approach, we first generated fuzzy rules using a scheme that randomly partitions the training data. The generated rules are also adapted as soon as new data arrives, following a random process as stated in Section 4. Fig. 6 provides the performance results of Randomly Partitioned Fuzzy Rule Generation (RPFRG) and the Adaptive Fuzzy System followed by RPFRG (ARPFRG), along with HMM–Fuzzy and AFIS, for the DJI stock index. The results show that AFIS is clearly able to model the behavior of the stock series. For example, the MAPE of AFIS is 1.93 for a training data length of 700, whereas for the same training data ARPFRG attains a MAPE value of nearly 430 000 (see Fig. 6b). It is worth mentioning here that, due to the randomness introduced in generating the fuzzy rules and in identifying the rule that needs to be adapted, the MAPE values for both RPFRG and ARPFRG are much higher than those of AFIS. Second, we generated a fuzzy model using a k-means clustering algorithm and its adaptive version. In the adaptive k-means fuzzy model, with the arrival of new data vectors, the initial fuzzy model generated using the k-means algorithm is adapted by coupling the intelligent dynamic adaptive approach described in Section 3.2.1. As shown in Table 5, the performance of the adaptive k-means fuzzy model is significantly better than that of the non-adaptive fuzzy model. This signifies the importance of the proposed adaptive approach in improving forecast accuracy. More interestingly, even though the adaptive approach yields better results, the performance of AFIS is far better than that of the adaptive k-means fuzzy model in terms of MAPE. Hence, these findings substantiate that both the effectiveness of partitioning the data using the HMM and the intelligent adaptive approach contributed to the improved performance of AFIS.
In the literature, ANNs have been used to forecast time series data, e.g., stock market prediction by Atsalakisa et al. [44] and foreign currency exchange rate forecasting by Kamruzzaman et al. [45]. Table 6 provides a comparison between AFIS and ANN. The forecast performance of AFIS is significantly better than that of ANN. The poor performance of ANN is due to its inability to cope with new data vectors: an ANN trained on a past dataset might not reflect the characteristics of a new data vector, and retraining an ANN with new data is time consuming and hence not suitable for time series data like stock markets, where the trend may change considerably from that of the past. Evidently, the better performance of AFIS is due to its intelligent adaptability to new data.
ARIMA is one of the widely used techniques for predicting time series data. ARIMA is an offline process where the initial model is built using the available training dataset; once built, the model does not adapt itself with the arrival of new data. Table 6 shows the performance comparison between AFIS and ARIMA for the seven stock indices. To make the comparison consistent, the performance of the repetitively trained ARIMA is also presented. Once again, AFIS outperforms standard ARIMA. Moreover, the performance of AFIS is slightly better than that of the repetitively trained ARIMA, except in the case of NIKKEI, for which the repetitively trained ARIMA performs slightly better. This is not surprising, as that ARIMA is retrained with new data and thus exhibits adaptiveness. However, ARIMA is significantly worse in terms of its computational performance, as

Table 5
Performance comparison among AFIS, the fuzzy model generated using k-means and its adaptive version (trained for each new data instance; the first 1000 data instances used for training and the remaining data for testing).

             AFIS              Fuzzy rule generation    Adaptive k-means
                               using k-means            fuzzy model
Stock name   NRMSE    MAPE     NRMSE    MAPE            NRMSE    MAPE      # of fuzzy rules
DJI          0.0087   1.5216   3.402    42.2622         0.013    2.4216    3
NASDAQ       0.0170   2.2276   0.3446   60.773          0.0216   3.4422    3
S&P500       0.0102   1.6291   0.4287   46.623          0.011    1.704     5
FTSE100      0.0396   1.7005   0.04     1.7003          0.04     1.6976    4
DAX          0.0735   3.6791   0.3402   42.0948         0.0121   2.2278    2
AORD         0.0251   1.5668   0.3446   60.773          0.0216   3.4422    3
NIKKEI       0.0259   2.4377   0.0339   2.4259          0.0339   2.426     3

Table 6
Performance comparison among AFIS, ARIMA and the Artificial Neural Network (trained for each new data instance; the first 1000 data instances used for training and the remaining data for testing). θ is the split parameter of the HMM–Fuzzy model (Section 4.4).

             AFIS              θ     ARIMA                        Repetitively       Artificial Neural Network
                                                                  trained ARIMA
Stock name   NRMSE    MAPE           NRMSE    MAPE      (p,d,f)   NRMSE    MAPE      NRMSE    MAPE      # Nodes
DJI          0.0087   1.5216   0.8   0.3429   78.4625   (3,1,3)   0.0090   1.5697    0.3184   42.7288   35
NASDAQ       0.0170   2.2276   0.9   0.2720   50.9447   (1,1,0)   0.0174   2.3953    0.3142   60.5563   10
S&P500       0.0102   1.6291   0.7   0.3407   46.9004   (1,1,2)   0.0105   1.6411    0.3225   32.9437   10
FTSE100      0.0396   1.7005   0.8   0.4475   20.7286   (2,1,2)   0.0404   1.7213    0.118    2.7618    10
DAX          0.0735   3.6791   0.8   0.3432   79.1502   (4,1,4)   0.0742   3.7993    0.3225   46.7498   15
AORD         0.0251   1.5668   0.9   0.2736   51.5942   (1,1,0)   0.0312   1.6817    0.2603   47.1483   15
NIKKEI       0.0259   2.4377   0.9   0.3955   25.8386   (1,1,3)   0.0237   2.3701    0.0411   2.4377    20


The execution times of AFIS and of repetitively trained ARIMA are shown in Table 7. On average, AFIS is between 4 and 13 times faster than repetitively trained ARIMA. The average execution time to generate a prediction using AFIS is almost consistent (4–5 ms) across the seven stock indices, as shown in Table 8, whereas the time to generate a prediction using repetitively trained ARIMA varies from 19 ms to 57 ms.
Furthermore, unlike repetitively trained ARIMA, AFIS does not require the model to be retrained and rebuilt for every new observation. Instead, AFIS dynamically adapts only its structure and coefficients with every arriving data instance, thus providing the best performance among the competing models.
The above results demonstrate the capability of our method in yielding better forecasts for stock market data. In addition, we carried out further experiments to assess its efficacy on other time series data. Fig. 7 shows the forecast values and the actual values of monthly electricity production in Australia (data available at [46]). As shown in the figure, AFIS follows the trend of the time series better than the offline fuzzy models, e.g., the HMM-Fuzzy model and Chiu's fuzzy model. This is because each of the offline fuzzy models was trained using a small data set (Jan 1956 to Oct 1964, length 100) and hence, as time goes on, they cannot produce reasonable forecasts for the new data, whereas AFIS exploits its intelligent adaptive ability with the arrival of new data. The forecast errors in terms of other performance metrics, shown in Table 9, also confirm the superiority of AFIS over the other models, even when the length of the training data is small.
Table 10 summarizes the characteristics required for an ideal online adaptive system. AFIS possesses all of these characteristics, while DENFIS satisfies three of the criteria in the list and repetitively trained ARIMA satisfies only two. Moreover, the performance of DENFIS is much worse than that of AFIS on all seven stock indices, as demonstrated in our experiment.

Table 7
Execution time comparison between AFIS and repetitively trained ARIMA (the first 1000 data instances used for building the initial model and the remaining data for testing; the experiment was executed 10 times for each stock, and the average performance along with its variation is reported).

Stock     Length of       Length of   AFIS                                        Repetitively trained ARIMA                  Speedup per
name      training data   test data   Initial training     Prediction + adapt     Initial training    Prediction + adapt      data prediction
                                      time (s), mean±std   time (s), mean±std     time (s), mean±std  time (s), mean±std
DJI       1000            3216        3.209 ± 0.3512       14.0915 ± 0.4251       2.68 ± 0.1312       137.8300 ± 3.8289       9.79
NASDAQ    1000            1007        3.315 ± 0.1324       4.8941 ± 0.3393        2.63 ± 0.3532       25.7742 ± 0.6556        5.27
S&P500    1000            2042        2.998 ± 0.5121       9.0808 ± 0.4734        2.58 ± 0.4512       116.9670 ± 1.8735       12.87
FTSE100   1000            321         3.126 ± 0.4142       1.6265 ± 0.3381        2.71 ± 0.4417       7.5625 ± 0.1253         4.65
DAX       890             84          3.391 ± 0.3531       0.4156 ± 0.2332        2.25 ± 0.3931       2.6600 ± 0.1867         6.40
AORD      1000            303         3.326 ± 0.1367       1.5275 ± 0.2993        2.55 ± 0.6671       6.0240 ± 0.08514        3.94
NIKKEI    1000            324         3.253 ± 0.4851       1.6441 ± 0.8474        2.57 ± 0.3985       6.2243 ± 0.6285         3.79

Table 8
Execution time to generate a prediction.

Stock name   Time to predict the next data point (ms)   Speedup
             AFIS      Repetitively trained ARIMA
DJI          4.38      42.86                             9.79
NASDAQ       4.86      25.60                             5.27
S&P500       4.45      57.28                             12.87
FTSE100      5.07      23.56                             4.65
DAX          4.95      31.67                             6.40
AORD         5.04      19.88                             3.94
NIKKEI       5.07      19.21                             3.79

Fig. 7. Forecast values vs. actual values, where forecasts are computed using AFIS, the HMM-Fuzzy model, Chiu's fuzzy model and DENFIS, for the monthly electricity production in Australia (y-axis: million kilowatt hours; training data: Jan 1956 to Oct 1964; test data: Nov 1964 to Aug 1995).


Table 9
Performance comparison among AFIS, HMM-Fuzzy, DENFIS and Chiu's model (varying the length of the training data: 100 and 200) for monthly electricity production in Australia (million kilowatt hours, Jan 1956 to Aug 1995).

Training data          Test data              AFIS               HMM-Fuzzy           Chiu's model        DENFIS
From       To          From       To          NRMSE     MAPE     NRMSE     MAPE      NRMSE     MAPE      NRMSE     MAPE
Jan 1956   Oct 1964    Nov 1964   Aug 1995    0.0686    7.5400   0.4898    38.0055   0.2958    19.0174   0.4686    50.1938
Jan 1956   Feb 1973    Mar 1973   Aug 1995    0.0507    4.5667   0.0610    5.1254    0.0611    5.1157    0.4180    32.5049

Table 10
Comparison of adaptive online learning systems based on the desired characteristics.

Desired characteristics                                                                 AFIS   Rep. trained ARIMA   DENFIS
Can it capture any new information as it becomes available?                             Yes    Yes                  Yes
Does the system represent the overall knowledge about the problem without
memorizing a large amount of representative data?                                       Yes    No                   Yes
Is the system able to update, in real time, knowledge observed in recent data
that was not considered when building the initial system, and thereby avoid
rebuilding a new system when there is no change in the model?                           Yes    No                   Yes
Is the performance of the system significantly better than that of a static system?    Yes    Yes                  No

6. Conclusion
In this paper, a new adaptive fuzzy inference system (AFIS) has been proposed and developed with a view to achieving improved performance by adapting dynamically with the arrival of new data. The structure of the base fuzzy model of our proposed system is developed based on the HMM log-likelihood of each data vector. The rationale for using an HMM is to model the underlying system and use this HMM to rank the data vectors accordingly. Fuzzy rules are then generated after grouping the data vectors that have higher log-likelihood values than the others. These initially generated rules are adjusted dynamically every time new data are observed. Due to its intelligent adaptation mechanism, AFIS performs better than the other existing competing models, both static fuzzy models and dynamic models, i.e., static ARIMA, HMM-Fuzzy, Chiu's model, DENFIS and repetitively trained ARIMA. Such a dynamic adaptive fuzzy inference system has many potential applications in computer science, financial engineering and control engineering, where the value of an event continuously changes with time.

Appendix A. Randomly Partitioned Fuzzy Rule Generation (RPFRG)
In this approach to fuzzy rule generation, we assume that the number of fuzzy rules to be generated for a given data set is known to the user prior to building the fuzzy model. Let the number of fuzzy rules be k. The training input (predictor) data are then partitioned into k groups, where each data vector belongs to exactly one of the k groups; each data vector is placed in one of the k groups at random. Consider a set of input predictor data vectors X1 to X33 and let k = 3. For each data vector Xi, a random integer in the range 1 to 3 is generated, and the data vector is assigned to the group with that index. Assume a random value of 3 is generated for X1 and 2 is generated for X2; then X1 and X2 are assigned to Group 3 and Group 2, respectively. A possible partitioning of the input data vectors is as follows:

Group 1: X3, X4, X7, X9, X10, X16, X20, X29, X33
Group 2: X2, X5, X6, X8, X11, X14, X17, X19, X23, X24, X28, X31
Group 3: X1, X12, X13, X15, X18, X21, X22, X25, X26, X27, X30, X32

Since the data are partitioned into k groups by choosing group labels from randomly generated numbers, we call this partitioning "randomly partitioned". Once the data have been partitioned into k groups, an individual fuzzy rule is generated for each group of data following Section 3.1.2. This approach is referred to as randomly partitioned fuzzy rule generation (RPFRG); a brief sketch is given below.
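The partitioning step itself is easy to state in code. Below is a minimal sketch: randomly_partition is a hypothetical helper, the 33 synthetic vectors stand in for X1 to X33, and the final loop only illustrates one plausible way (per-group means and standard deviations) to seed Gaussian membership functions, since the actual rule construction of Section 3.1.2 is not reproduced here.

```python
import random
import numpy as np

def randomly_partition(data, k, seed=None):
    """Assign each data vector to one of k groups uniformly at random (RPFRG)."""
    rng = random.Random(seed)
    groups = [[] for _ in range(k)]
    for x in data:
        groups[rng.randrange(k)].append(x)
    return groups

# 33 two-dimensional input vectors, stand-ins for X1..X33 (values illustrative).
X = np.random.default_rng(0).uniform(0, 10, size=(33, 2))
groups = randomly_partition(X, k=3, seed=42)

# One fuzzy rule per group: here each Gaussian membership function is assumed
# to take the group's per-dimension mean as its center and std as its width.
for i, g in enumerate(groups, start=1):
    g = np.array(g)
    print(f"Rule {i}: centers={g.mean(axis=0)}, widths={g.std(axis=0)}")
```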

Appendix B. Adaptive Randomly Partitioned Fuzzy Rule Generation (ARPFRG)
In this approach, the fuzzy model generated in Appendix A is transformed into an adaptive system in which the rule structure is adapted with the arrival of each new input data vector xnew. As soon as a data vector xnew is available, the system either chooses one of the k existing fuzzy rules generated in Appendix A and adapts that rule, or generates a new fuzzy rule in the RPFRG fuzzy model. Here, the rule that needs to be adapted is chosen by generating a random number between 1 and k, where k is the total number of fuzzy rules in the RPFRG fuzzy model.
Let the new input data vector be ⟨5.9, 1.2⟩ and the desired output for this input be 0.9. In the process of adapting the RPFRG fuzzy model, an integer in the range 1 to 3 (since we consider k = 3) is generated. Assume the randomly generated integer is 2; thus Rule no. 2 is chosen for adaptation given the new input data vector. Note that the parameters of Rule no. 2 before adaptation are as follows:

M2,1: F = 4.8633 and s = 2.3628
M2,2: F = 3.1875 and s = 1.2521

Now, following Eqs. (15)-(17), the new parameters (i.e., after adapting the rule given the new input data vector) are as follows:

M2,1: F* = 4.9431 and s* = 2.2804
M2,2: F* = 3.0346 and s* = 1.3195

The effect of adaptation on the fuzzy membership functions of Rule no. 2 is similar to the one illustrated in Fig. 3. In a similar way, the new values of C* and the vector b* are computed following Eq. (15). Once the rule is adapted, the resulting fuzzy model becomes more suitable for forecasting new data than the nonadaptive static system (e.g., RPFRG). This approach is referred to as adaptive randomly partitioned fuzzy rule generation (ARPFRG); a sketch of the adaptation step follows.
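Since Eqs. (15)-(17) are not reproduced in this appendix, the sketch below uses a simple learning-rate update as a stand-in for them; adapt_random_rule, the rule dictionary and the learning rate lr are hypothetical names and values, and only the Rule 2 parameters come from the worked example above.

```python
import random

# (center, width) pairs of the Gaussian membership functions of each rule;
# Rule 2 holds the pre-adaptation values from the worked example, the
# other two rules hold illustrative values.
rules = {
    1: [(2.10, 1.10), (6.40, 0.90)],
    2: [(4.8633, 2.3628), (3.1875, 1.2521)],
    3: [(7.30, 1.70), (1.20, 0.60)],
}

def adapt_random_rule(rules, x_new, lr=0.08, seed=None):
    """Pick a rule uniformly at random and nudge its membership parameters
    toward the new input vector (a stand-in for Eqs. (15)-(17))."""
    rng = random.Random(seed)
    r = rng.randrange(1, len(rules) + 1)
    updated = []
    for (center, width), xi in zip(rules[r], x_new):
        center += lr * (xi - center)              # shift center toward x_new
        width += lr * (abs(xi - center) - width)  # stretch width toward the residual
        updated.append((center, width))
    rules[r] = updated
    return r

chosen = adapt_random_rule(rules, x_new=(5.9, 1.2))
print(f"Adapted rule {chosen}: {rules[chosen]}")
```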

Appendix C. Fuzzy rule generation using k-means clustering
In this approach, a k-means clustering algorithm [41] is applied to partition the input data, assuming that the number of fuzzy rules (k) to be generated for a given data set is known to the user prior to building the fuzzy model. Once the data are partitioned into k clusters, a total of k fuzzy rules are generated following Section 3.1.2, where each rule corresponds to one cluster. We refer to this approach as the k-means fuzzy model; a sketch is given below.
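A minimal sketch of this construction is given below, using scikit-learn's KMeans. The use of each cluster's per-dimension mean and standard deviation as Gaussian membership parameters is an assumption for illustration; the consequent parameters fitted in Section 3.1.2 are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_fuzzy_rules(X, k, seed=0):
    """Partition X into k clusters and derive one fuzzy-rule antecedent per cluster."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    rules = []
    for c in range(k):
        members = X[km.labels_ == c]
        rules.append({
            "center": members.mean(axis=0),  # assumed Gaussian MF centers
            "width": members.std(axis=0),    # assumed Gaussian MF widths
        })
    return km.cluster_centers_, rules

# Illustrative data: 100 two-dimensional predictor vectors.
X = np.random.default_rng(1).normal(size=(100, 2))
centers, rules = kmeans_fuzzy_rules(X, k=3)
```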

Appendix D. Adaptive k-means fuzzy model
In this approach, the k-means fuzzy model is made adaptive by coupling the k-means partitioning with the dynamic adaptive fuzzification described in Section 3.2.2. In the process of adaptation, the fuzzy rule to be adapted given a new data vector xnew is chosen using the minimum Euclidean distance between the cluster centers and xnew. Consider three rules generated from three clusters; the distances from xnew to each of the cluster centers are computed. Assume these distances are 2.3, 0.9 and 4.5 from the center points of cluster 1, cluster 2 and cluster 3, respectively. Since 0.9 is the minimum distance, Rule no. 2 is selected and its parameters are adapted. Adaptation of the rule parameters is accomplished following Eqs. (15)-(17). We refer to this approach as the adaptive k-means fuzzy model; the selection step is sketched below.
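The selection step can be stated as follows; select_rule is a hypothetical helper, and the centers are illustrative values chosen so that the second center is nearest to xnew, as in the worked example. The subsequent parameter update via Eqs. (15)-(17) is not shown.

```python
import numpy as np

def select_rule(centers, x_new):
    """Return the index of the rule whose cluster center is closest to x_new."""
    distances = np.linalg.norm(centers - x_new, axis=1)
    return int(np.argmin(distances))

# Illustrative cluster centers for three rules; with these values the second
# center has the minimum Euclidean distance to x_new, so Rule no. 2 is adapted.
centers = np.array([[2.0, 1.0], [5.5, 1.5], [9.0, 4.0]])
rule_idx = select_rule(centers, x_new=np.array([5.9, 1.2]))
print(f"Adapting rule no. {rule_idx + 1}")
```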
References
[1] X. Liang, R.-C. Chen, J. Yang, An architecture-adaptive neural network online control system, Neural Comput. Appl. 17 (4) (2008) 413-423.
[2] A. Robins, Sequential learning in neural networks: a review and a discussion of pseudorehearsal based methods, Intell. Data Anal. 8 (3) (2004) 301-322.
[3] Y. Zhi-Gang, S. Shen-Min, D. Guang-Ren, R. Pei, Robust adaptive neural networks with an online learning technique for robot control, in: Advances in Neural Networks (Part I-III: ISNN 2006: Third International Symposium on Neural Networks), 2006, pp. 1153-1159.
[4] J.A.S. Freeman, D. Saad, Online learning in radial basis function networks, Neural Comput. 9 (7) (1997) 1601-1622.
[5] R.M. French, Semi-destructive representations and catastrophic forgetting in connectionist networks, Connection Sci. 1 (1992) 365-377.
[6] T.M. Heskes, B. Kappen, On-line learning processes in artificial neural networks, in: J. Taylor (Ed.), Mathematical Approaches to Neural Networks, Elsevier, Amsterdam, 1993, pp. 199-233.
[7] G.A. Rummery, M. Niranjan, On-line Q-Learning using Connectionist Systems, Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
[8] J.S.R. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Trans. Syst. Man Cybern. 23 (1993) 651-663.
[9] N. Kasabov, Evolving fuzzy neural networks: algorithms, applications and biological motivation, in: Methodologies for the Conception, Design and Application of Soft Computing, World Scientific, Singapore, 1998.
[10] N. Kasabov, Evolving fuzzy neural networks: theory and applications for on-line adaptive prediction, decision making and control, Australian Journal of Intelligent Information Processing Systems (1998) 154-160.
[11] N. Kasabov, Evolving fuzzy neural networks for online, adaptive, knowledge-based learning, IEEE Trans. Syst. Man Cybern. B 31 (2001) 902-918.
[12] N.K. Kasabov, Q. Song, DENFIS: dynamic evolving neural-fuzzy inference system and its application for time series prediction, IEEE Trans. Fuzzy Syst. 10 (2002) 144-154.
[13] M.M. Deza, E. Deza, Encyclopedia of Distances, Springer, Berlin, Heidelberg, 2009.
[14] Similarity search and outlier detection in time series, <http://www.latest-science-articles.com/IT/Similarity-Search-and-Outlier-Detection-in-Time-Series-4480.html>.
[15] K.M. Sim, Evolving fuzzy rules for relaxed-criteria negotiation, IEEE Trans. Syst. Man Cybern. B 38 (6) (2008) 1486-1499.
[16] Y. Zhou, M.J. Er, An evolutionary approach toward dynamic self-generated fuzzy inference systems, IEEE Trans. Syst. Man Cybern. B 38 (4) (2008) 963-969.
[17] A. Elmzabi, M. Bellafkih, M. Ramdani, An adaptive fuzzy clustering approach for the network management, Int. J. Inf. Technol. 1 (3) (2007) 12-17.
[18] M.R. Hassan, B. Nath, M. Kirley, A fusion model of HMM, ANN and GA for stock market forecasting, Expert Syst. Appl. 31 (1) (2007) 171-180.
[19] M.R. Hassan, Hybrid HMM and Soft Computing Modeling with Applications to Time Series Analysis, Ph.D. Thesis, Department of Computer Science and Software Engineering, The University of Melbourne, 2007.
[20] L.R. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proc. IEEE 77 (1989) 257-286.
[21] M.R. Hassan, A combination of HMM and fuzzy model for stock market forecasting, Neurocomputing 72 (16-18) (2009) 3439-3446.
[22] M.R. Hassan, B. Nath, M. Kirley, A HMM based fuzzy model for time series prediction, in: Proceedings of the FUZZ-IEEE Conference, 2006, pp. 9966-9974.
[23] M.R. Hassan, B. Nath, M. Kirley, J. Kamruzzaman, A hybrid of multiobjective evolutionary algorithm and HMM-Fuzzy model for time series prediction, Neurocomputing 81 (2012) 1-11.
[24] M. Männle, Identifying rule-based TSK fuzzy models, in: Proceedings of EUFIT, 1999, pp. 286-299.
[25] H. Bahi, M. Sellami, Combination of vector quantization and Hidden Markov Models for Arabic speech recognition, in: ACS/IEEE Proceedings of the International Conference on Computer Systems and Applications, 2001, p. 96.
[26] X. Huang, Y. Ariki, M. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, 1990.
[27] L.E. Baum, T. Petrie, G. Soules, N. Weiss, A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Ann. Math. Stat. 41 (1970) 164-171.
[28] L.E. Baum, An inequality and associated maximization technique in statistical estimation of probabilistic functions of Markov processes, Inequalities 3 (1972) 1-8.
[29] S.-M. Chen, S.-H. Lee, A new method for generating fuzzy rules from numerical data for handling classification problems, Appl. Artif. Intell. (2001) 645-664.
[30] P.P. Angelov, R.A. Buswell, Automatic generation of fuzzy rule-based models from data by genetic algorithms, Inf. Sci. (2003) 17-31.
[31] X.Z. Wang, Y.D. Wang, X.F. Xu, W.D. Ling, D.S. Yeung, A new approach to fuzzy rule generation: fuzzy extension matrix, Fuzzy Sets Syst. (2001) 291-306.
[32] M.R. Hassan, B. Nath, M. Kirley, A data clustering algorithm based on single Hidden Markov Model, in: Proceedings of the International Multiconference on Computer Science and Information Technology, 2006, pp. 57-66.
[33] M. Ragulskis, K. Lukoseviciute, Non-uniform attractor embedding for time series forecasting by fuzzy inference systems, Neurocomputing 72 (2009) 2618-2626.
[34] T. Takagi, M. Sugeno, Fuzzy identification of systems and its application to modeling and control, IEEE Trans. Syst. Man Cybern. (1985) 116-132.
[35] J. Zurada, Optimal Data Driven Rule Extraction using Adaptive Fuzzy-Neural Models, Ph.D. Dissertation, University of Louisville, 2002.
[36] A.E. Gaweda, J.M. Zurada, Data-driven linguistic modeling using relational fuzzy rules, IEEE Trans. Fuzzy Syst. 11 (2003) 121-134.
[37] G.C. Goodwin, K.S. Sin, Adaptive Filtering, Prediction and Control, Prentice-Hall, Upper Saddle River, NJ, 1984.
[38] Yahoo! Finance, <http://finance.yahoo.com/>.
[39] S.L. Chiu, An efficient method for extracting fuzzy classification rules from high dimensional data, J. Adv. Comput. Intell. 1 (1997) 1-7.
[40] J.R. Jang, C.T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, Englewood Cliffs, NJ, 1997.
[41] J.A. Hartigan, M.A. Wong, Algorithm AS 136: a k-means clustering algorithm, Appl. Stat. 28 (1979) 100-108.
[42] D. Rumelhart, J. McClelland, Parallel Distributed Processing, MIT Press, 1986.
[43] M.R. Hassan, B. Nath, Stock market forecasting using Hidden Markov Model: a new approach, in: Proceedings of the International Conference on Intelligent System Design and Application, 2005, pp. 192-196.
[44] G. Atsalakis, K. Valavanis, Surveying stock market forecasting techniques, part II: soft computing methods, Expert Syst. Appl. 36 (3) (2009) 5932-5941.
[45] J. Kamruzzaman, R. Sarker, Forecasting of currency exchange rates using ANN: a case study, in: International Conference on Neural Networks and Signal Processing, 2003, pp. 793-797.
[46] Monthly electricity data, <http://www.robjhyndman.com/TSDL>; <http://datamarket.com/data/set/22l0/monthly-electricity-production-in-australia-million-kilowatt-hours-jan-1956-aug-1995#!display=line&ds=22l0>.
Md. Rafiul Hassan received a B.Sc. (Engg.) in Electronics and Computer Science from Shah Jalal University of Science and Technology, Bangladesh, and a Ph.D. in Computer Science and Software Engineering from the University of Melbourne, Australia, in 2000 and 2007, respectively. Currently, he is a faculty member in the Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Saudi Arabia. His research interests include neural networks, fuzzy logic, evolutionary algorithms, Hidden Markov Models and support vector machines, with a particular focus on developing new data mining and machine learning techniques for the analysis and classification of biomedical data. He is currently involved in several research and development projects for effective prognosis and diagnosis of breast cancer from gene expression microarray data. He is the author of around 30 papers published in recognized international journals and conference proceedings. He is a member of the University of Melbourne breast cancer research group, the Australian Society of Operations Research (ASOR) and the IEEE Computer Society, and is involved in several program committees of international conferences. He also serves as a reviewer for several renowned journals, including BMC Breast Cancer, IEEE Transactions on Fuzzy Systems, Neurocomputing, Knowledge and Information Systems, Current Bioinformatics, Information Sciences, Digital Signal Processing, IEEE Transactions on Industrial Electronics and Computer Communications.

Kotagiri Ramamohanarao received the B.E. degree from Andhra University in 1972, the M.E. degree from the Indian Institute of Science in 1974, and the Ph.D. degree from Monash University in 1980. He joined the Department of Computer Science and Software Engineering at the University of Melbourne in 1980, was awarded the Alexander von Humboldt Fellowship in 1983, and was appointed a professor of computer science in 1989. He has held several senior positions, such as head of the School of Electrical Engineering and Computer Science at the University of Melbourne, co-director of the Key Centre for Knowledge-Based Systems, and research director of the Cooperative Research Centre for Intelligent Decision Systems. He served as a member of the Australian Research Council Information Technology Panel. He also served on the editorial boards of the IEEE Transactions on Knowledge and Data Engineering, the Computer Journal and the VLDB Journal. At present, he is on the editorial boards of Universal Computer Science and the Journal of Knowledge and Information Systems. He has served as a program committee member of several international conferences, including SIGMOD, IEEE ICDM, VLDB, ICLP and ICDE, and was a program co-chair of the VLDB, PAKDD and DOOD conferences. He is a steering committee member of IEEE ICDM, DASFAA and PAKDD. He is a fellow of the Institute of Engineers Australia, the Australian Academy of Technological Sciences and Engineering, and the Australian Academy of Science. He is a recipient of the Centenary Medal for his contribution to computer science. He has published more than 200 research papers. His research interests are in the areas of database systems, logic-based systems, agent-oriented systems, information retrieval, data mining, and machine learning. He is currently working as a professor at the University of Melbourne.

Joarder Kamruzzaman received B.Sc. and M.Sc. degrees in electrical engineering from Bangladesh University of Engineering & Technology, Dhaka, Bangladesh, in 1986 and 1989, respectively, and a Ph.D. in information system engineering from Muroran Institute of Technology, Japan, in 1993. Currently, he is a faculty member in the Faculty of Information Technology, Monash University, Australia. His research interests include computer networks, computational intelligence and bioinformatics. He has published over 150 peer-reviewed publications, including 40 journal papers and six book chapters, and has edited two reference books on computational intelligence theory and applications. He is currently serving as a program committee member of a number of international conferences.

Mustafizur Rahman received his Ph.D. in Computer Science and Software Engineering from the University of Melbourne in August 2010. He completed a Graduate Certificate in Research Commercialization from Melbourne Business School in 2009 and a B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology (BUET) in 2004. His research interests include scientific and business workflow management, scheduling in Grid and P2P systems, Cloud computing and autonomic systems. He has contributed to the Gridbus Workflow Engine, which enables users to execute scientific workflow applications on Grids. Mustafizur Rahman is currently working as a consultant in the Business Analytics and Optimization Service Line at IBM Australia.

M. Maruf Hossain received the B.Sc. (Hons) degree from the University of Dhaka, Bangladesh, in 2000, the MIT degree from Deakin University, Australia, in 2005, and a Ph.D. in Computer Science from the University of Melbourne, Australia, in 2009. He is currently working as a senior data analyst at the Australian Transaction Reports and Analysis Centre, Australia. His research interests include data mining, machine learning, receiver operating characteristic curves, and classification.
