www.elsevier.com/locate/neunet
Abstract
Sales forecasting plays a very prominent role in business strategy. Numerous investigations addressing this problem have generally employed statistical methods, such as regression or the autoregressive moving average (ARMA) model. However, sales forecasting is very complicated owing to the influence of internal and external environments. Recently, artificial neural networks (ANNs) have also been applied to sales forecasting, given their promising performance in the areas of control and pattern recognition. However, further improvement is still necessary, since unique circumstances, e.g. a promotion, cause sudden changes in the sales pattern. Thus, this study utilizes a proposed fuzzy neural network (FNN), which is able to eliminate unimportant weights, to learn the fuzzy IF-THEN rules obtained from marketing experts with respect to promotion. The result from the FNN is further integrated with the time series data through an ANN. Both the simulated and real-world problem results show that the FNN with weight elimination achieves a lower training error than the regular FNN. Moreover, the real-world results also indicate that the proposed forecasting system outperforms the conventional statistical method and a single ANN in accuracy. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Sales forecasting; Artificial neural networks; Fuzzy neural networks; Fuzzy weight elimination
Nomenclature
p — sample number
X_p — input vector of sample p
T_p — target vector of sample p
O_pk — output of the kth output node
O_ph — output of the hth hidden node
W_ih — connection weight from the ith input node to the hth hidden node
W_hk — connection weight from the hth hidden node to the kth output node
Net_pk — net internal activity level of the kth output node
Net_ph — net internal activity level of the hth hidden node
Θ_j — bias of the jth output node
E_p — cost function for sample p
E_p^s — cost function of the s-level α-cut set for sample p
E_pks^L — cost function of the lower boundary of the s-level α-cut set for sample p
E_pks^U — cost function of the upper boundary of the s-level α-cut set for sample p
X̄_p — fuzzy input of sample p
Ō_p — fuzzy output of sample p
W̄_ih, W̄_hk — fuzzy weights
Θ̄_h, Θ̄_k — fuzzy biases
η — learning rate
α — momentum term
[·]_α^L, [·]_α^U — lower and upper limits of the α-cut of a fuzzy number
main reason to propose such an FNN is that the promotion effect on sales is always very vague, or fuzzy.

In addition to obtaining the promotion effect on sales, it is also necessary to provide the forecast of the sales. Therefore, this study also aims to develop an intelligent sales forecasting system, which consists of three parts: (1) data collection, (2) special pattern model (FNN), and (3) decision integration (ANN). To evaluate the proposed system, real-world data provided by a well-known convenience store (CVS) company in Taiwan are used, while the promotion effect is obtained by surveying experts in retailing. According to the results, the proposed system performs more accurately than the conventional statistical method and a single ANN, particularly when a promotion is conducted.

The rest of this paper is organized as follows. Section 2 provides some necessary background information, while the proposed system is discussed in Section 3. Section 4 presents the simulation results of the FNN, while the evaluation results are summarized in Section 5. Discussion and concluding remarks are finally made in Sections 6 and 7, respectively.

2. Background

In this section, sales forecasting systems and applications of artificial neural networks in sales forecasting are briefly reviewed. In addition, fuzzy neural networks are discussed.

2.1. Artificial neural networks in sales forecasting

In an enterprise's decision support system, sales forecasting always plays a prominent role. An accurate sales forecast made in advance helps the decision maker calculate production and materials costs, and even determine the sale price (LeVee, 1992, 1993). This leads to a lower inventory level and achieves the objective of just-in-time. Among the conventional sales forecasting methods (Chase, 1993; Florance & Sawicz, 1993; Meyer, 1993), most use either factors or time series data to determine the forecast. However, the relationship between the factors or the past time series data (independent variables) and the sales (dependent variable) is always quite complicated, and obtaining promising results through the above-mentioned approaches is quite difficult. Therefore, various decision makers prefer using their own intuition instead of model-based approaches (i.e. time series or regression models). However, a model-free approach, the ANN, has recently been applied in the area of forecasting owing to its adequate performance in control and pattern recognition.

Artificial neural network (ANN) models are built on networks of processing units, called neurons, that are arranged in layers and connected to one another by links. Links between neurons have associated weights. Many studies have attempted to apply ANNs to time-series forecasting. However, their conclusions are often contradictory. Some studies found that ANNs are better than conventional methods (Weigend, Rumelhart, & Huberman, 1991), while others reached the opposite conclusion (Tang, Almeida, & Fishwick, 1991). A weight-elimination back-propagation
R.J. Kuo et al. / Neural Networks 15 (2002) 909–925
learning procedure to effectively deal with the overfitting problem was introduced by Weigend et al. (1991). It was also applied to sunspots and an exchange rate time series. Tang et al. (1991) compared the ANN and Box–Jenkins models, using international airline passenger traffic, domestic car sales and foreign car sales in the USA. They concluded that the Box–Jenkins models outperformed the ANN models in short-term forecasting. On the other hand, the ANN models outperformed the Box–Jenkins models in long-term forecasting.

In order to predict the flour prices in three cities in the USA, Chakraborty et al. (1992) presented an ANN approach to multivariate time-series analysis. They showed that the result is quite accurate. According to their results, the ANN approach is a leading contender among statistical modeling approaches. Lachtermacher and Fuller (1995) developed a calibrated ANN model. The model used Box–Jenkins methods to identify the lag components of the data, which should be used as input variables. In addition, it employed a heuristic to suggest the number of hidden units needed in structuring the model. In examining stationary series, they observed that the calibrated ANN models have only a slightly better overall performance than the conventional time-series methods used in the benchmark. In the case of non-stationary series, the calibrated ANN models outperformed the ARMA model for three of the four series, and did almost as well as the ARMA in the fourth series. The above survey indicates that the ANN is appropriate for time series data. Ansuj, Camargo, Radharamanan, and Petry (1996) compared the time series model with interventions and the ANN model in analyzing the behavior of sales in a medium-size enterprise. The results showed that the ANN model is more accurate. Kumar et al. (1995) found that the ANN does quite well compared to logistic regression in predicting a dichotomous choice in the presence of several independent variables. However, considering only time series data may result in a worse forecast. Including both the time series data and factors in the forecasting model seems preferable.

Recently, Bigus (1996) used promotion, time of year, end-of-month flag, and weekly sales as inputs to an ANN in order to forecast weekly demand. The results seem very promising. Agrawal and Schorling (1997) also showed that an ANN is able to predict brand shares quite well even when price promotions, feature, and display are present in the data set.

2.2. Fuzzy neural networks

ANNs and fuzzy models have been used in many application areas (Lee, 1990; Lippmann, 1987; Zadeh, 1973), each pairing its own advantages and disadvantages. Therefore, how to combine these two approaches successfully has become a relevant concern of further studies.

Two major parts have recently received much interest: (1) fusing the ANN and fuzzy logic and (2) integrating the ANN and fuzzy logic. In the first one, the traditional fuzzy system mentioned above is based on expert knowledge. However, it is not very objective. Besides, acquiring robust knowledge and finding available human experts are extremely difficult. Now, the ANN learning algorithm has been applied to enhance the performance of a fuzzy system and demonstrated to be an innovative approach. In addition, fuzzy IF-THEN rules were generated and adjusted by learning methods using numerical data. Takagi and Hayashi (1991) introduced a feedforward ANN into fuzzy inference. An ANN represents a rule, while all the membership functions are represented by only one ANN. Jang (1991, 1992) and Jang and Sun (1993) proposed a method which transforms the fuzzy inference system into a functionally equivalent adaptive network, and then employs the EBP-type algorithm to update the premise parameters and a least squares method to identify the consequence parameters. Meanwhile, Fukuda and Shibata (1992), Shibata, Fukuda, Kosuge, and Arai (1992) and Wang and Mendel (1992) also presented similar methods. Nakayama, Horikawa, Furuhashi, and Uchikawa (1992) proposed a so-called FNN which has a special structure for realizing a fuzzy inference system. Each membership function consists of one or two sigmoid functions for each inference rule. Lin and Lee (1991) proposed the so-called Neural-Network-Based Fuzzy Logic Control System (NN-FLCS). They introduced the low-level learning power of neural networks into the fuzzy logic system and provided high-level human-understandable meaning to the normal connectionist architecture. In addition, Kuo (1994) and Kuo and Cohen (1998, 1999) introduced a feedforward ANN into fuzzy inference represented by the Takagi–Sugeno model.

The above-mentioned FNNs are only appropriate for numerical data. However, expert knowledge is always of the fuzzy type. Thus, some researchers have attempted to address this problem. Gupta and Knopf (1990) and Gupta and Qi (1991, 1992) presented some models with fuzzy neurons, but no learning algorithms were proposed. However, in a series of papers cited in a survey paper (Buckley & Hayashi, 1994), the authors discussed learning algorithms and applications for fuzzy neural networks with fuzzy inputs, weights and outputs (Buckley & Hayashi, 1992; Hayashi, Buckley, & Czogala, 1993). Ishibuchi, Kwon and Tanaka (1995a) and Ishibuchi, Okada, Fujioka, and Tanaka (1993) also proposed learning methods for neural networks that utilize not only numerical data but also expert knowledge represented by fuzzy IF-THEN rules. Lin (1995) and Lin and Lu (1995) also presented an FNN capable of handling both fuzzy inputs and outputs. Based on Ishibuchi's work, Kuo and Xue (1999) presented an FNN which possesses not only asymmetric fuzzy inputs and outputs, but also asymmetric fuzzy weights. Moreover, a genetic algorithm was integrated with the proposed FNN in order to yield better results in both speed and accuracy (Kuo, Chen, & Hwang, 2001).
Section 2 has emphasized the relevance of sales forecasting as well as some necessary background information. Though studies such as Agrawal and Schorling (1997) and Bigus (1996) have taken the promotion effect on sales into consideration, theirs is still a straightforward model. It is necessary to develop a more robust approach to handle the promotion effect on sales and then input it to the ANN. The proposed system is discussed in more detail in the following.

The proposed intelligent forecasting system consists of (1) data collection, (2) a special pattern model (FNN), and (3) decision integration (ANN). Fig. 1 shows the proposed system architecture. The system first determines the qualitative factors affecting the sales. Thereafter, this effect is integrated with the time series data through a feedforward neural network with the error back-propagation (EBP) learning algorithm. Each part is thoroughly discussed in the following subsections.

3.1. Data collection

The current study requires two different kinds of data: quantitative and qualitative. The CVS (convenience store) franchise company can provide the daily sales data needed, while the promotion effect on sales is obtained by questionnaire. This study employs a fuzzy questionnaire to obtain the fuzzy IF-THEN rules from the domain experts.

3.2. Special pattern model (FNN)

This section discusses how to use the FNN to effectively handle the circumstance of promotion. Since the FNN architecture is based on fuzzy logic, which possesses both precondition and consequence, the precondition variables represent the effective factors while the sales represents the consequence variable. First, the data and IF-THEN rules are obtained through the fuzzy Delphi method. After this procedure, the collected data can be applied to train the proposed FNN. The structure of the FNN presented in this study is similar to that of Ishibuchi et al. (1995a). The main difference is that the network employs asymmetric bell-shaped instead of triangular fuzzy weights. In addition, the network can eliminate unimportant weights during training. In the following, the two components, fuzzy Delphi and FNN, are discussed in more detail.

3.2.1. Fuzzy Delphi
The Delphi method was first developed by Dalkey and Helmer (1963) at the RAND Corporation. This approach has been widely applied in many management areas, e.g. forecasting, public policy analysis, and project planning. However, the conventional Delphi method cannot converge very well. Besides, high survey frequencies always result in high costs. Thus, Ishikawa, Amagasa, Tomiqawa, Tatsuta, and Mieno (1993) utilized fuzzy set theory in the Delphi method to resolve the above shortcomings. However, the method proposed by Ishikawa et al. (1993) is inappropriate
for this research. Therefore, the procedures of the modified fuzzy Delphi method for this research are as follows:

a. Collect all the possible factors which may affect the sales, and sort and group them in order to formulate the first questionnaire. The domain experts select the important factors and give each a fuzzy number.
b. Formulate the second questionnaire, which is a set of IF-THEN rules based on the three dimensions.
c. Fuzzify the returned second questionnaires from the senior managers and determine the pessimistic index, optimistic index and average index. The formulations are as follows:

1. Pessimistic (minimum) index:

   l = (l_1 + l_2 + ... + l_n)/n,   (1)

   where l_i is the pessimistic index of the ith expert and n is the number of experts.

2. Optimistic (maximum) index:

   u = (u_1 + u_2 + ... + u_n)/n,   (2)

   where u_i is the optimistic index of the ith expert.

3. Average (most appropriate) index:

   For each interval [l_i, u_i], calculate the midpoint m_i = (l_i + u_i)/2 and then find

   m = (m_1 + m_2 + ... + m_n)/n.   (3)

Thereafter, the fuzzy number A = (m, σ_L, σ_R) is formed.

This component intends to modify Ishibuchi's work (1995a), in which the input, weight, and output fuzzy numbers are symmetric triangular. This paper replaces the triangular fuzzy numbers with asymmetric Gaussian functions, since this can speed up the convergence (Kuo & Xue, 1999). The input–output relation of the proposed FNN is discussed in the following; however, the operations of fuzzy numbers are presented first.

Operations of fuzzy numbers. Before describing the FNN architecture, fuzzy numbers and fuzzy number operations are defined by the extension principle. In the proposed algorithm, real numbers and fuzzy numbers are denoted by lowercase letters (e.g. a, b, ...) and by a bar placed over uppercase letters (e.g. Ā, B̄, ...), respectively.

Since the input vectors, connection weights and output vectors of the multi-layer feedforward neural network are fuzzified in the proposed FNN, the addition, multiplication and non-linear mapping of fuzzy numbers are necessary for defining the proposed FNN. They are defined as follows:

(X̄ + Ȳ)(z) = max{X̄(x) ∧ Ȳ(y) | z = x + y},   (6)

(X̄ · Ȳ)(z) = max{X̄(x) ∧ Ȳ(y) | z = xy},   (7)

f(N̄et)(z) = max{N̄et(x) | z = f(x)},   (8)

where X̄, Ȳ, Z̄ and N̄et are fuzzy numbers, X̄(·) denotes the membership function of each fuzzy number, ∧ is the minimum operator, and f(x) = 1/(1 + exp(−x)) is the activation function of the hidden units and output units of the proposed FNN. The α-cut of a fuzzy number X̄ is defined as

[X̄]_α = {x | X̄(x) ≥ α} = [[X̄]_α^L, [X̄]_α^U],  0 < α ≤ 1.
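The Delphi indices of Eqs. (1)–(3) and the extension-principle operations of Eqs. (6)–(8) can be sketched as below. This is a hypothetical illustration, not the authors' C implementation; all names are assumptions, and the extension-principle operations are realized, level by level, as interval arithmetic on the α-cuts.

```python
import math

# Hypothetical sketch of the fuzzy Delphi indices, Eqs. (1)-(3).
def delphi_indices(intervals):
    """intervals: list of (l_i, u_i) pessimistic/optimistic pairs, one per expert."""
    n = len(intervals)
    l = sum(li for li, _ in intervals) / n               # Eq. (1): pessimistic index
    u = sum(ui for _, ui in intervals) / n               # Eq. (2): optimistic index
    m = sum((li + ui) / 2 for li, ui in intervals) / n   # Eq. (3): average index
    return l, m, u

# For fuzzy numbers, the extension-principle operations of Eqs. (6)-(8) reduce,
# level by level, to interval arithmetic on the alpha-cuts [X]_a = [X_a^L, X_a^U]:

def interval_add(x, y):
    # Eq. (6): the sum's alpha-cut is the sum of the alpha-cuts.
    return (x[0] + y[0], x[1] + y[1])

def interval_mul(x, y):
    # Eq. (7): with possibly negative bounds, take the extreme cross products.
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))

def interval_sigmoid(x):
    # Eq. (8): f(x) = 1/(1 + e^-x) is monotone, so it maps bounds to bounds.
    f = lambda t: 1.0 / (1.0 + math.exp(-t))
    return (f(x[0]), f(x[1]))

l, m, u = delphi_indices([(0.2, 0.6), (0.3, 0.7), (0.1, 0.8)])
```

The interval form of multiplication takes the minimum and maximum of all four cross products, which is what makes the sign tests in the layer equations below necessary.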
The input–output relation of the proposed FNN is as follows.

Hidden layer:

Ō_ph = f(N̄et_ph),  h = 1, 2, ..., n_H,   (10)

N̄et_ph = Σ_{i=1}^{n_I} W̄_ih · Ō_pi + Θ̄_h.   (11)

Output layer:

Ō_pk = f(N̄et_pk),  k = 1, 2, ..., n_O,   (12)

N̄et_pk = Σ_{h=1}^{n_H} W̄_hk · Ō_ph + Θ̄_k.   (13)

From Eqs. (10)–(13), the α-cut sets of the fuzzy output Ō_pk are calculated from the α-cut sets of the fuzzy inputs, fuzzy weights, and fuzzy biases. If the α-cut set of the fuzzy output Ō_pk is required, the above relation can be rewritten as follows.

Input layer:

[Ō_pi]_α = [[Ō_pi]_α^L, [Ō_pi]_α^U] = [[X̄_pi]_α^L, [X̄_pi]_α^U],  i = 1, 2, ..., n_I.   (14)

Hidden layer:

[Ō_ph]_α = [[Ō_ph]_α^L, [Ō_ph]_α^U] = [f([N̄et_ph]_α^L), f([N̄et_ph]_α^U)],  h = 1, 2, ..., n_H,   (15)

[N̄et_ph]_α^L = Σ_{i: [W̄_ih]_α^L ≥ 0} [W̄_ih]_α^L [Ō_pi]_α^L + Σ_{i: [W̄_ih]_α^L < 0} [W̄_ih]_α^L [Ō_pi]_α^U + [Θ̄_h]_α^L,

[N̄et_ph]_α^U = Σ_{i: [W̄_ih]_α^U ≥ 0} [W̄_ih]_α^U [Ō_pi]_α^U + Σ_{i: [W̄_ih]_α^U < 0} [W̄_ih]_α^U [Ō_pi]_α^L + [Θ̄_h]_α^U.   (16)

Output layer:

[Ō_pk]_α = [[Ō_pk]_α^L, [Ō_pk]_α^U] = [f([N̄et_pk]_α^L), f([N̄et_pk]_α^U)],  k = 1, 2, ..., n_O,   (17)

[N̄et_pk]_α^L = Σ_{h: [W̄_hk]_α^L ≥ 0} [W̄_hk]_α^L [Ō_ph]_α^L + Σ_{h: [W̄_hk]_α^L < 0} [W̄_hk]_α^L [Ō_ph]_α^U + [Θ̄_k]_α^L,

[N̄et_pk]_α^U = Σ_{h: [W̄_hk]_α^U ≥ 0} [W̄_hk]_α^U [Ō_ph]_α^U + Σ_{h: [W̄_hk]_α^U < 0} [W̄_hk]_α^U [Ō_ph]_α^L + [Θ̄_k]_α^U.   (18)

The objective is to minimize the cost function defined as

E_p = Σ_α E_pα,   (19)

E_pα = α · Σ_{k=1}^{n_O} (E_pkα^L + E_pkα^U),   (20)

E_pkα^L = (1/2)([T̄_pk]_α^L − [Ō_pk]_α^L)²,  E_pkα^U = (1/2)([T̄_pk]_α^U − [Ō_pk]_α^U)²,   (21)

where E_pkα^L and E_pkα^U can be viewed as the squared errors for the lower boundaries and the upper boundaries of the α-cut sets of the fuzzy outputs and fuzzy targets. The α-cut sets of a fuzzy weight cannot be modified independently to reduce E_pα; otherwise, the fuzzy numbers after modification are distorted. Therefore, each fuzzy weight is updated in a similar but still different way from the approach of Ishibuchi et al. (1995a). That is, in the proposed FNN, the membership functions are asymmetric Gaussian functions (i.e. a general shape), represented as

Ā(x) = exp(−(1/2)((x − m)/σ_L)²),  x < m;
Ā(x) = 1,  x = m;   (22)
Ā(x) = exp(−(1/2)((x − m)/σ_R)²),  otherwise.

Thus, the asymmetric Gaussian fuzzy weights are specified by their three parameters (i.e. center, left width and right width). The gradient search method is derived for each parameter, giving the amount of adjustment of each parameter.
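A minimal sketch of how one α-level of the interval forward pass of Eqs. (14)–(18) and the cost of Eqs. (19)–(21) can be realized. This is a hypothetical illustration, not the authors' C implementation; names and dimensions are assumptions. The key point is that the sign of each weight bound decides which input bound it multiplies.

```python
import math

def f(x):
    # Activation of hidden and output units, f(x) = 1/(1 + e^-x).
    return 1.0 / (1.0 + math.exp(-x))

def layer_bounds(o_low, o_up, w_low, w_up, theta_low, theta_up):
    """Interval net input and activation of one layer at a single alpha level.
    o_low/o_up: alpha-cut bounds of the incoming activations;
    w_low/w_up: bounds of the fuzzy weights, w_*[j][i] from unit i to unit j;
    theta_low/theta_up: bounds of the fuzzy biases."""
    out_low, out_up = [], []
    for j in range(len(w_low)):
        net_l = theta_low[j]
        net_u = theta_up[j]
        for i in range(len(o_low)):
            # Lower bound (pattern of Eqs. (16)/(18)): a non-negative lower
            # weight bound pairs with the lower input bound, a negative one
            # with the upper input bound.
            net_l += w_low[j][i] * (o_low[i] if w_low[j][i] >= 0 else o_up[i])
            # Upper bound: a non-negative upper weight bound pairs with the
            # upper input bound, a negative one with the lower input bound.
            net_u += w_up[j][i] * (o_up[i] if w_up[j][i] >= 0 else o_low[i])
        out_low.append(f(net_l))  # Eqs. (15)/(17): f is monotone
        out_up.append(f(net_u))
    return out_low, out_up

def cost(alpha, t_low, t_up, o_low, o_up):
    """Eqs. (19)-(21): alpha-weighted squared errors of both boundaries."""
    e = 0.0
    for k in range(len(o_low)):
        e += alpha * (0.5 * (t_low[k] - o_low[k]) ** 2
                      + 0.5 * (t_up[k] - o_up[k]) ** 2)
    return e
```

Summing `cost` over all trained α-levels (0.1, 0.3, 0.5, 0.7, 0.9 in the simulations below) gives the total cost E_p of Eq. (19) for one sample.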
The surviving fragments define the crisp representative value F of an asymmetric Gaussian fuzzy number by averaging the midpoints of its α-cuts over the levels α ∈ [0.05, 1]:

F = (1/0.95) ∫_{0.05}^{1} [m + ((σ_R − σ_L)/2)(−2 ln α)^{1/2}] dα
  = (1/0.95) [0.95m + ((σ_R − σ_L)/2) ∫_{0.05}^{1} (−2 ln α)^{1/2} dα],   (25)

since the α-cut of the asymmetric Gaussian number is [m − σ_L(−2 ln α)^{1/2}, m + σ_R(−2 ln α)^{1/2}]. In discrete form, F ≈ (1/0.95) [0.95m + ((σ_R − σ_L)/2) Σ_α (−2 ln α)^{1/2} Δα].

Fig. 2. Yager's area.
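Assuming the asymmetric Gaussian membership of Eq. (22) and reading the fragmentary Eq. (25) as an average of α-cut midpoints over α ∈ [0.05, 1], a numerical sketch is given below. All names are hypothetical, and the 0.05 lower limit is taken from the garbled integral bounds in the source, so treat it as an assumption.

```python
import math

def membership(x, m, s_left, s_right):
    """Eq. (22): asymmetric Gaussian with center m and left/right widths."""
    if x < m:
        return math.exp(-0.5 * ((x - m) / s_left) ** 2)
    if x == m:
        return 1.0
    return math.exp(-0.5 * ((x - m) / s_right) ** 2)

def alpha_cut(alpha, m, s_left, s_right):
    """Invert Eq. (22): [m - sL*sqrt(-2 ln a), m + sR*sqrt(-2 ln a)]."""
    r = math.sqrt(-2.0 * math.log(alpha))
    return (m - s_left * r, m + s_right * r)

def defuzzify(m, s_left, s_right, lo=0.05, steps=1000):
    """Average the alpha-cut midpoints over alpha in [lo, 1] with a midpoint
    Riemann sum; for a symmetric number (sL == sR) this returns m."""
    total = 0.0
    da = (1.0 - lo) / steps
    for i in range(steps):
        a = lo + (i + 0.5) * da
        low, up = alpha_cut(a, m, s_left, s_right)
        total += 0.5 * (low + up) * da
    return total / (1.0 - lo)
```

For σ_R > σ_L the crisp value lies to the right of the center m, which matches the intuition that a longer right tail pulls the representative value upward.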
Fig. 3. (a) Training procedure for GA + FNNW. (b) Training procedure for GAW + FNN.
the FNN. It is written in the C language and implemented on an IBM-compatible PC. The simulation results can be referenced for the real-world problem, which will be presented in Section 5.

4.1. Example one

The first example is a linear mapping. Three training samples (X̄_p, T̄_p), where X̄_p is the fuzzy input, T̄_p is the fuzzy desired output and the training sample number is p = 1, 2, 3, are developed in the two-dimensional space. Each fuzzy number has mean y_m = x_m, left width y_σL = 2x_σL, and right width y_σR = 3x_σR. The corresponding means and standard deviations are presented in Table 1. The relationship of the fuzzy inputs and fuzzy outputs is shown in Fig. 5 at α = 0.1, 0.3, 0.5, 0.7, and 0.9. All the required parameters are set up as follows:

1. Number of hidden nodes: 3.
2. Number of hidden layers: 2.
3. α-cut levels: α = 0.1, 0.3, 0.5, 0.7, and 0.9.
4. Number of training epochs: 30,000.
5. Training rate: η = 0.3.
6. Momentum: β = 0.6.
7. Weight elimination: yes and no.

Fig. 6 shows the results after training, and Table 2 presents the MSE values for the two algorithms. The testing sample results are shown in Fig. 7. It is obvious that the proposed FNN can learn the fuzzy relation between fuzzy inputs and fuzzy outputs accurately. In addition, the network with weight elimination is really better than the network without weight elimination.
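The linear fuzzy mapping of example one can be sketched as below: the target keeps the input mean and scales the left and right widths by 2 and 3 (y_m = x_m, y_σL = 2x_σL, y_σR = 3x_σR). The sample values are illustrative assumptions, not the entries of Table 1.

```python
# Hypothetical sketch of example one's target mapping; the three input
# triples (mean, left width, right width) below are made up, not Table 1.

def linear_fuzzy_target(x_mean, x_s_left, x_s_right):
    # y_m = x_m, y_sL = 2 * x_sL, y_sR = 3 * x_sR
    return (x_mean, 2.0 * x_s_left, 3.0 * x_s_right)

training_pairs = [
    ((x_m, sl, sr), linear_fuzzy_target(x_m, sl, sr))
    for (x_m, sl, sr) in [(0.2, 0.05, 0.05), (0.5, 0.10, 0.05), (0.8, 0.05, 0.10)]
]
```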
Table 1. Training pairs for example one.
Table 2. Example one's MSE values.
Fig. 5. Example one's training pairs.
Table 3. Training pairs of example two.
Table 4. Example two's MSE values.
terms, 0.3 and 0.6, are also tested. In addition, two different fuzzy thresholds, (0.3, 0.1, 0.3) and (0.5, 0.3, 0.5), are tested. In total, there are 48 different combinations. The simulation results are shown in Table 7. The network with weight elimination definitely reduces the training error; the decrease rate is 3.5%. Twelve out of thirteen cases show better performance when weights are actually eliminated.

Fig. 9. Example two's testing results for training pairs.

The other goal of the trained FNN is to infer two more linguistic terms for each precondition variable with respect to the three existing linguistic terms. Thus, there are 25 fuzzy IF-THEN rules in total after inference. The newly added fuzzy rules are presented in Table 8. Besides, a genetic algorithm is also applied to increase the accuracy. Table 9 presents the computational results. It is very clear that GA + FNNW provides the best forecast. Using the GA to eliminate the unimportant fuzzy weights provides a better result, but not the best.

4.4. Example four

The purpose of this example is to reconfirm the validity of the FNN with weight elimination in both speed and accuracy. The training pairs are adopted from example one. The initial weights are generated randomly. The number of simulations is 30 for both cases, using an IBM-compatible PC-166. All the parameter setup is identical to example one. Table 10 shows the simulation results and t-test values. Weight elimination really reduces the training time and training error at α = 0.01.

In order to further establish the procedure for weight elimination, we conducted a three-factorial (training rate, momentum and weight elimination) design using example three. There are two levels for each factor; thus, there are eight different combinations in total. The ANOVA results are presented in Table 11. They indicate that the training rate and momentum have a significant influence on the training error compared with weight elimination, since their p-values are 0.006 and 0.009, respectively. Besides, the interaction of training rate and momentum is significant, while the interaction of training rate and weight elimination and the interaction of momentum and weight elimination are not significant at α = 0.05. According to these results, weight elimination should be implemented after the training rate and momentum term have been well set up. This results in the lowest training error.

Table 5. Example three's fuzzy rule table for training (rows: X1; columns: X2; only the three original terms S, M and L are filled):

X1 \ X2 | S  | MS | M  | ML | L
L       | S  |    | MS |    | M
ML      |    |    |    |    |
M       | MS |    | M  |    | ML
MS      |    |    |    |    |
S       | ML |    | L  |    | L

Fig. 10. Example two's testing results for four new testing pairs.

5. Model evaluation results

The above sections have presented the proposed forecasting system and the FNN's feasibility numerically. Further, a real-world problem is applied to verify the proposed system's practicality. In addition, the proposed system is compared with other methods, a single ANN and ARMA. Both the procedures and the results are presented in the following subsections.

5.1. Data collection

A nationally well-known CVS franchise company provides the daily sales data. Since the forecasting pattern
is divided into two categories, general pattern and special pattern, the data collection also comprises two parts:

(1) Time series data. The company provides the daily sales of 500 cm³ papaya milk. The total number of data points is 379, as shown in Fig. 11. A sudden increase in sales indicates that a promotion is being conducted. In total there are five promotions. The time period lasts from 1 January 1995 to 14 January 1996. For the purpose of testing, these 379 data points are further divided into a training set and a testing set. The former has 334 data points, while the latter has 45.

(2) Expert questionnaire. To survey all the possible factors of promotion and their effects on the sales, this study employs the fuzzy Delphi method. The questionnaire setup is based on the company's practical requirements; thus, some factors are included. The procedures are based on the modified fuzzy Delphi method.

Fig. 11. The time series data.

Table 6. Training pairs of example three (X1, X2, Y).
Table 7. Example three's MSE values for different topologies and setups (hidden nodes, η, β, MSE without elimination, thresholds (0.1, 0.3) and (0.3, 0.5), number of deleted weights, improvement rate (%)).

A large number of factors can generally affect the sales. However, different products have different characteristics. After discussing with the company's senior managers, all the factors were divided into three dimensions. The first dimension represents the methods of promotion, while the types of advertising media are presented in the second dimension. The third dimension represents the competitors' actions. Table 12 presents the fuzzy number of each event
after three times of survey. The reason for three surveys is that the similarity testing results indicate that all the fuzzy numbers converged by the third survey. Therefore, this knowledge base will be applied to train the FNN and represents the FNN outputs.

Besides, one event, the 3-dollar discount, is selected as the testing case. Thus, in total there are only 42 (3 × 7 × 2) IF-THEN rules for training the FNN, as shown in Table 13.

Table 8. New fuzzy rule table of example three (rows: X1; columns: X2):

X1 \ X2 | S  | MS | M  | ML | L
L       | S  | S  | MS | M  | M
ML      | S  | MS | M  | M  | ML
M       | MS | M  | M  | ML | ML
MS      | M  | M  | ML | L  | L
S       | ML | ML | L  | L  | L

Table 9. The results for different algorithms. FNN: FNN only, without weight elimination; FNNW: FNN only, with weight elimination; GA + FNN: genetic algorithm to find the initial weights, followed by FNN; GAW + FNN: genetic algorithm with weight elimination, followed by FNN; GA + FNNW: genetic algorithm, followed by FNN with weight elimination.

Table 10. Example four's MSE values and t values (with/without weight elimination: mean, standard deviation, sample no., t value, p-value, improvement rate (%)).

Table 11. Simulation ANOVA table (source of deviation, degrees of freedom, sum of squares, mean sum of squares, F value, p value).

Table 12. The fuzzy number of each event for the third questionnaire.

5.2. Special pattern model (FNN)

The initial weights for the FNN are generated using the genetic algorithm proposed by Kuo et al. (2001) in this study. The reason is that the genetic algorithm may prevent the network from getting stuck in a local minimum and accelerate the training. In addition, different training rates and momentum terms may yield different results. Thus, two training rates, 0.3 and 0.6, and two momentum terms, 0.3 and 0.6, are tested. The two fuzzy threshold numbers used in example four are also utilized. The network does not stop learning until 30,000 epochs. The α-level sets are 0.1, 0.3, 0.5, 0.7, and 0.9. For the training results shown in Table 14, the marked cell is the best result, or best network. The lowest MSE value is 0.993 × 10⁻³, with training rate and momentum of 0.3 and 0.6, respectively. The network structure is 3-7-7-1, and the weight elimination criterion is (0.5, 0.3, 0.5). Besides, GAW + FNN is also implemented for comparison. Its best topology is 3-8-8-1 and its MSE is 1.021 × 10⁻³, with training rate and momentum of 0.3 and 0.1, respectively. Finally, both of these networks are integrated with the time series data in the next part.

5.3. Decision integration model (ANN)

This subsection demonstrates the integration of the qualitative factor effect on sales and the time series data. Both the training and testing results are presented in the following.

(1) Training. For the integration network with both qualitative and quantitative factors, eight different models are tested in order to find the best network topology. Table 15 shows these network topologies (Models I–VIII). Besides, Table 15 also presents four network topologies (Models IX–XII) without qualitative factors. In addition, a conventional network with an additional input unit for the promotion effect is also considered. This input unit receives on/off values (1: promotion, 0: non-promotion). There are also four network topologies (Models XIII–XVI) for this setup. The training rate and momentum are both 0.5. The network does not stop training until the MSE no longer decreases over 500 epochs. The best networks for the two integration networks, the conventional network, and the conventional network with a promotion unit are Models III, VIII, XII, and XVI, respectively. Model III, with network structure 25-28-1, has an MSE value of 0.002151, while Model VIII, with structure 30-58-1, has an MSE value of 0.002248. Model XII, with structure 20-34-1, has 0.002441, while Model XVI, with network structure 21-36-1, has an MSE value of 0.002397. Basically, the MSE values of the networks with both quantitative and qualitative factors are all smaller than those of the networks with only quantitative factors. Besides, the former always needs 10.05% fewer training epochs than the latter.

In order to find better results for these models, three training rates, 0.1, 0.3, 0.5, and three momentum terms, 0.1, 0.5, 0.8, are tested. In total there are nine combinations for each model. The computational results are shown in Table 16. Model III has the lowest MSE value, 0.001837, with training rate and momentum of 0.1 and 0.5, respectively, while the lowest MSE value for Model VIII is 0.002002, also with 0.1 and 0.5. Model XII has the lowest MSE value, 0.002282, with 0.3 and 0.5, and Model XVI has the lowest MSE value, 0.002102, with 0.5 and 0.8.

(2) Testing. Though Model III has been shown to be the best network using the training data, it cannot be guaranteed that its testing results are also the best. The further comparison is based on the 45 testing data points mentioned above. There is one promotion, the 3-dollar discount, during this period, and it has not been included in the FNN knowledge base. Both the MSE and MAPE (mean absolute percentage error) values for the testing set are listed in Table 17. It is very clear that the integration model still has the best performance compared with both the conventional networks and the ARMA (autoregressive moving average) model. The MSE value for the integration model with the GA + FNNW algorithm is 0.001753, while the integration model with the GAW + FNN algorithm has an MSE value of 0.001856; the former also outperforms the latter. The coefficients of the ARMA model are determined after examining the ACF (auto-correlation function) and PACF (partial auto-correlation function). The actual and forecast outputs are shown in Fig. 12.
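The two input encodings compared above can be sketched as follows. This is a hypothetical illustration: the paper does not spell out the lag count or scaling, so the seven-day window and all names here are assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of the two input encodings compared in Section 5.3:
# lagged sales plus the FNN's defuzzified promotion effect (integration
# models) versus lagged sales plus a bare on/off promotion flag
# (conventional network with a promotion unit).

def integration_input(sales_history, promo_effect, lags=7):
    """Concatenate the last `lags` sales values with the FNN promotion output."""
    if len(sales_history) < lags:
        raise ValueError("not enough history")
    return list(sales_history[-lags:]) + [promo_effect]

def flag_input(sales_history, promotion_active, lags=7):
    """Conventional alternative: an on/off unit (1: promotion, 0: none)."""
    if len(sales_history) < lags:
        raise ValueError("not enough history")
    return list(sales_history[-lags:]) + [1.0 if promotion_active else 0.0]
```

The design difference is that the integration input carries a graded, rule-based estimate of the promotion effect, while the flag input reduces it to a single binary unit.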
Table 13. Fuzzy IF-THEN rules (IF part, THEN part).

6. Discussion

The proposed learning scheme utilizes two kinds of information (i.e. fuzzy IF-THEN rules and numerical time series data) in the learning of neural networks. The factor effects on the sales may seem subjective, since the data are provided by either the senior managers or the experts. However, the number of experts is 20, implying that the subjective factors can be reduced. In particular, as the fuzzy Delphi method is employed, the above concern can also be set aside.

Table 17 indicates that the integration model outperforms all the other forecasting methods, e.g. ARMA(2,5) and the conventional ANN. The reason is that the integration ANN prioritizes the promotion effect on the sales pattern. Even regarding the forecasting result, the integration ANN is also second to none. Between the ARMA model and the ANN model, the results are mixed; basically, if the ANN is well set up, it provides the better result.

The proposed FNN is able to learn the relationships between fuzzy inputs and outputs. If the pruning technique is included in the training, it provides more promising results. However, the simulation results of example four indicate that the training rate and momentum should be well set up before including weight elimination. Also, GA + FNNW provides the best performance.
Table 14. Training results of the FNN.

Table 15. Different neural network models (network number, time series, qualitative factor, α-cut, promotion unit, input nodes, hidden nodes).

Table 16. MSE values of Models III, VIII, XII and XVI for different setups (rows: momentum β; columns: training rate η = 0.1, 0.3, 0.5 for each model):

β \ (model, η) | III 0.1 | III 0.3 | III 0.5 | VIII 0.1 | VIII 0.3 | VIII 0.5 | XII 0.1 | XII 0.3 | XII 0.5 | XVI 0.1 | XVI 0.3 | XVI 0.5
0.1 | 0.001943 | 0.002315 | 0.002067 | 0.002054 | 0.002398 | 0.002284 | 0.002708 | 0.002349 | 0.002350 | 0.002683 | 0.002293 | 0.002302
0.5 | 0.001837 | 0.002077 | 0.002151 | 0.002002 | 0.002211 | 0.002309 | 0.002454 | 0.002282 | 0.002441 | 0.002377 | 0.002254 | 0.002397
0.8 | 0.002389 | 0.002210 | 0.002332 | 0.002477 | 0.002428 | 0.002457 | 0.002793 | 0.002697 | 0.002712 | 0.002636 | 0.002598 | 0.002102
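A short sketch of reading the flattened Table 16 grid. The column layout (four models, each with training rates 0.1, 0.3, 0.5; rows indexed by momentum) is inferred from the minima quoted in Section 5.3, e.g. Model III's best MSE of 0.001837 at training rate 0.1 and momentum 0.5; treat that layout as an assumption.

```python
# Hypothetical helper for the reconstructed Table 16 grid (values from the
# source; the column grouping by model and training rate is inferred).
MODELS = ["III", "VIII", "XII", "XVI"]
RATES = [0.1, 0.3, 0.5]
MOMENTA = [0.1, 0.5, 0.8]
GRID = {
    0.1: [0.001943, 0.002315, 0.002067, 0.002054, 0.002398, 0.002284,
          0.002708, 0.002349, 0.002350, 0.002683, 0.002293, 0.002302],
    0.5: [0.001837, 0.002077, 0.002151, 0.002002, 0.002211, 0.002309,
          0.002454, 0.002282, 0.002441, 0.002377, 0.002254, 0.002397],
    0.8: [0.002389, 0.002210, 0.002332, 0.002477, 0.002428, 0.002457,
          0.002793, 0.002697, 0.002712, 0.002636, 0.002598, 0.002102],
}

def best_setup(model):
    """Return (mse, training_rate, momentum) minimizing MSE for one model."""
    col0 = MODELS.index(model) * len(RATES)
    best = None
    for b in MOMENTA:
        for j, a in enumerate(RATES):
            candidate = (GRID[b][col0 + j], a, b)
            if best is None or candidate[0] < best[0]:
                best = candidate
    return best
```

Under this layout, the recovered minima match all four figures quoted in Section 5.3.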