Article info
Keywords:
Classification
Quantum-inspired
Differential evolution
Data mining
Continuous attribute
Abstract
The quantum-inspired differential evolution algorithm (QDE) is a new optimization algorithm for the binary-valued space. This paper proposes the DE/QDE algorithm for the discovery of classification rules. DE/QDE combines the characteristics of the conventional DE algorithm and the QDE algorithm. Based on some strategies of DE and QDE, DE/QDE can directly cope with continuous and nominal attributes without discretizing the continuous attributes in a preprocessing step. DE/QDE also has a specific weight mutation for managing the weight values of the individual encoding. DE/QDE is then compared with Ant-Miner and CN2 on six problems from the UCI repository. The results indicate that DE/QDE is competitive with Ant-Miner and CN2 in terms of predictive accuracy.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction
Data mining is the process of knowledge discovery: it searches a large volume of data to discover interesting and useful information that was previously unknown (Collard & Francisci, 2001). Data classification is one of the most common tasks of data mining. It generates, from a set of training examples, a set of rules to classify future test data. Evolutionary algorithms (EAs) have been applied to numerical optimization, combinatorial optimization, neural networks, and data mining.
1.1. Related work
Genetic algorithms (GAs) have been applied widely to data mining for classification. Holland (1986) proposed the Michigan approach, which represents one rule by one individual, and Smith (1983) proposed the Pittsburgh approach, which represents several rules by one individual. Rule induction is one of the most common forms of knowledge discovery. It converts the data into a set of IF-THEN rules for classification. Algorithms based on GAs for rule discovery have been studied in Jong, Spears, and Gordon (1993), Liu and Kwok (2000), Fidelis, Lopes, and Freitas (2000), Au, Chan, and Yao (2003) and Chiu (2002).
Recently, some algorithms based on other EAs have been developed for rule discovery. Jiao, Liu, and Zhong (2006) proposed the organizational coevolutionary algorithm for classification (OCEC). OCEC uses a bottom-up search mechanism and evolves sets of examples, which form organizations. Three new evolutionary operators are designed for this purpose.

* Corresponding author. Tel.: +86 21 34204261; fax: +86 21 34204427.
E-mail addresses: hjsu@sjtu.edu.cn, hjsush@gmail.com (H. Su).
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2009.06.029
2. The DE algorithm
This section describes three versions of DE, which deal with continuous, integer and binary optimization, respectively.
2.1.1. Mutation
There are several mutation schemes for DE at present; DE/rand/1 is one of the most popular and is described as follows. Target vectors are denoted X_{i,G} = (x_{1,i}, ..., x_{N,i})^T and trial vectors are denoted V_{i,G} = (v_{1,i}, ..., v_{N,i})^T, where i = 1, ..., NP, N is the dimension of the target function, and the subscript G denotes the G-th generation. DE/rand/1 is expressed by the following equation:

V_{i,G} = X_{r1,G} + F (X_{r2,G} - X_{r3,G})

The crossover operator builds the trial vector U_{i,G} component-wise:

u_{j,i} = v_{j,i}   if rand_j[0,1] <= CR or j = j_rand
u_{j,i} = x_{j,i}   otherwise

and the selection operator keeps the better of the target and trial vectors:

X_{i,G+1} = U_{i,G}   if f(U_{i,G}) <= f(X_{i,G})
X_{i,G+1} = X_{i,G}   otherwise

DE is a parallel search method. Because DE is a floating-point encoded evolutionary algorithm, it usually deals with real-valued optimization problems. DE generates new candidate individuals by combining a parent individual with one or several weighted differences. DE has three parameters: the mutation control parameter F, the crossover control parameter CR and the population size NP. Some versions of DE provide novel strategies for setting these three parameters.

For problems with integer variables, the objective function is evaluated as f(y), where

y_j = INT(x_j),   x_j in X,   j = 1, ..., N

and INT(.) truncates a real value to an integer. The population is initialized by

x_{i,j} = x_j^L + r_{i,j} (x_j^U - x_j^L)

where j = 1, ..., N, r_{i,j} is a uniform random number in [0, 1], and x_j^U and x_j^L are the upper and the lower bounds of the j-th variable, respectively. Using these two modified equations, problems containing integer variables can be handled easily.
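As a concrete illustration, the scheme above can be sketched in a few lines of Python. This is a minimal DE/rand/1/bin sketch under our own function names and default parameters, not the paper's implementation; the integer case corresponds to applying INT(.) to the variables inside the objective function.

```python
import random

def de_rand_1(f, bounds, NP=20, F=0.5, CR=0.9, generations=150, seed=1):
    """Minimal DE/rand/1/bin sketch: minimizes f over the box `bounds`."""
    rng = random.Random(seed)
    N = len(bounds)
    # initialization: x_ij = x_j^L + r_ij * (x_j^U - x_j^L)
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds] for _ in range(NP)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(NP):
            # three mutually distinct indices, all different from i
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            # mutation: V = X_r1 + F * (X_r2 - X_r3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(N)]
            # binomial crossover; j_rand guarantees at least one mutated component
            jrand = rng.randrange(N)
            u = [v[j] if (rng.random() <= CR or j == jrand) else pop[i][j]
                 for j in range(N)]
            # one-to-one greedy selection
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = min(range(NP), key=lambda i: fit[i])
    return pop[best], fit[best]
```

On a 2-D sphere function, for example, `de_rand_1(lambda x: sum(v*v for v in x), [(-5.0, 5.0)] * 2)` converges to near zero.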
2.3. QDE for binary optimization
The above method is not suitable for handling binary variables, because the value of a binary variable is either 0 or 1. Thus the form of DE should be modified properly to implement binary optimization. QDE is a novel evolutionary algorithm based on the concepts and principles of quantum computing (Su & Yang, 2008). It uses a string of Q-bits as an individual. The Q-bit representation has better features than an ordinary binary string. QDE is designed with the Q-bit representation. We present the QDE algorithm in the following.
QDE maintains a population of Q-bit individuals Q(G) = {q_1^G, q_2^G, ..., q_n^G} at generation G, where n is the size of the population, and a Q-bit individual is defined as

q_i^G = | alpha_{i1}^G  alpha_{i2}^G  ...  alpha_{im}^G |
        | beta_{i1}^G   beta_{i2}^G   ...  beta_{im}^G  |

where m is the length of a Q-bit individual. Because |alpha_{ij}^G|^2 or |beta_{ij}^G|^2 denotes a probability toward either the 0 or the 1 state, they can be changed by the mutation operator of DE. Thus the mutation operator can be expressed as

alpha'_{i,j} = v_{i,j}^G       if rand_j[0,1] <= CR or j = j_rand
alpha'_{i,j} = alpha_{i,j}^G   otherwise

beta'_{i,j} = sqrt(1 - (alpha'_{i,j})^2)        (10)

where v_{i,j}^G is the j-th component of the mutant vector generated from the alpha amplitudes by the DE mutation, which yields the mutated Q-bit individual

q'_i = | alpha'_{i1}  alpha'_{i2}  ...  alpha'_{im} |
       | beta'_{i1}   beta'_{i2}   ...  beta'_{im}  |

The new binary string u_i^G can be obtained by observing each Q-bit state of q_i^G. The population at generation G is denoted X(G) = {x_1^G, ..., x_n^G}, and the observation vectors are denoted U(G) = {u'_1, ..., u'_n}. The selection operator is expressed as

x_i^{G+1} = u'_i    if f(u'_i) < f(x_i^G)
x_i^{G+1} = x_i^G   otherwise        (11)
(Figure: the encoding of an individual. The genome consists of genes 1 to m, one per attribute, followed by the class field C_k; each gene j holds a weight W_j and two value fields, which are V_{j,lower} and V_{j,upper} for a continuous attribute and V_{j,1} and V_{j,2} for a binary or integer attribute.)
Likewise, the Q-bit individuals are updated as

q_i^{G+1} = q'_i    if f(u'_i) < f(x_i^G)
q_i^{G+1} = q_i^G   otherwise        (12)

So the observed population at generation G+1 is X(G+1) = {x_1^{G+1}, x_2^{G+1}, ..., x_n^{G+1}}, and the Q-bit population at generation G+1 is Q(G+1) = {q_1^{G+1}, q_2^{G+1}, ..., q_n^{G+1}}.
Ac = TP / (TP + FP)        (13)
Co = TP / (TP + FN)        (14)
Sp = TN / (TN + FP)        (15)

where TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives, respectively.
Fitness = x1 · Ac · Co · Sp + x2 · Simp        (16)

where Simp is a measure of rule simplicity, and x1 and x2 are user-defined weights. The Simp measure can be defined in many different ways. Here, Simp is expressed as follows:

Simp = Term_u / Term_a        (17)
where Term_a denotes the number of the useful attributes of a rule, and Term_u denotes the number of potentially useful attributes. A given attribute A_i is said to be potentially useful if there is at least one training example having both the A_i value specified in the rule antecedent and the goal attribute value specified in the rule consequent (de Araujo, Lopes, & Freitas, 1999). In our experiments, x1 and x2 are set to 0.8 and 0.2, respectively. The fitness takes values in the range [0, 1]. In the classification problem, we search for the individual with the maximum fitness value in each optimization process.
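The fitness of Eqs. (13)-(17) can be sketched as a small Python helper; the zero-denominator guards are our addition, and the product form of the first term follows Eq. (16) as given above.

```python
def rule_fitness(tp, fp, tn, fn, term_u, term_a, x1=0.8, x2=0.2):
    """Rule-quality fitness: Fitness = x1 * Ac * Co * Sp + x2 * Simp, in [0, 1]."""
    ac = tp / (tp + fp) if tp + fp else 0.0    # Eq. (13): consistency of the rule
    co = tp / (tp + fn) if tp + fn else 0.0    # Eq. (14): coverage of the class
    sp = tn / (tn + fp) if tn + fp else 0.0    # Eq. (15): specificity
    simp = term_u / term_a if term_a else 0.0  # Eq. (17): simplicity
    return x1 * ac * co * sp + x2 * simp
```

A rule that covers its class perfectly, produces no false positives, and uses only potentially useful attributes scores 1.0.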
3.3. Rule extraction and prediction method
In DE/QDE, each individual of the population represents a rule. The genome of an individual consists of the antecedent (IF part) and the consequent (THEN part) of the rule. During one run, the antecedent of each individual's genome undergoes the interrelated evolution operators, while the consequent specifies a preset fixed class. Each run discovers a single rule which predicts a given class for examples. If a given dataset contains m classes, the algorithm needs to run at least m times.
  }
  else if( the j-th attribute is integral )
  {
    Change the value of the second field V_{j,1};
    V_{j,2} = INT(V_{j,1});
  }
}
}
end.
The weight mutation is developed to change the weight of an attribute in an individual. In the initial population, we usually choose one attribute at random and set its weight value to 1. This means that each initial individual has only one useful attribute. The parameter pw denotes the attribute-insertion or attribute-removal probability. When a random number in the range [0, 1] is less than pw and the j-th attribute is useless, the j-th attribute is transformed to be useful. When a random number in the range [0, 1] is less than pw and the j-th attribute is useful, the j-th attribute is transformed to be useless. However, during the weight mutation of an individual, the transformation from useless to useful is limited to at most two times, and the transformation from useful to useless is limited to at most one time. The following pseudo-code sums up the process of the weight mutation.
The pseudo-code of the weight mutation:

begin
  for( i = 0; i < NP; i++ )
  {
    bflag1 = 0; bflag2 = 0;
    for( j = 0; j < m; j++ )
    {
      if( rand(0, 1) < pw )
      {
        if( Wj == 0 && bflag1 < 2 )
        {
          Wj = 1; bflag1++;
        }
        else if( Wj == 1 && bflag2 < 1 )
        {
          Wj = 0; bflag2++;
        }
      }
    }
  }
end.
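A Python rendering of this operator might look as follows. It is illustrative only: the attribute-wise random test follows the prose description, and the two counters enforce the limits stated above (at most two insertions, at most one removal), with the removal test placed in an else-branch so a freshly inserted attribute is not removed in the same pass.

```python
import random

def weight_mutation(W, pw=0.1, rng=None):
    """Flip attribute weights: at most two useless->useful and one useful->useless."""
    rng = rng or random.Random()
    inserted = removed = 0  # bflag1, bflag2 in the pseudo-code
    for j in range(len(W)):
        if rng.random() < pw:
            if W[j] == 0 and inserted < 2:
                W[j] = 1          # attribute j becomes useful
                inserted += 1
            elif W[j] == 1 and removed < 1:
                W[j] = 0          # attribute j becomes useless
                removed += 1
    return W
```

With pw = 1.0 every attribute is tested, so an all-zero weight vector gains exactly two useful attributes.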
We consider that pw should be set to a small value, such as 0.1. Then each attribute has a small chance of being removed from or inserted into an individual. This ensures that the actual length of an individual changes only slightly in each iteration. The attribute-insertion and attribute-removal strategy iteratively tests whether an attribute is useful to an individual. The basic process of the DE/QDE algorithm is presented as follows:
Step 1: Initialize the population.
Step 2: Perform the value mutation operator according to the kinds of the attributes.
Step 3: Perform the weight mutation operator.
Step 4: Perform the crossover operator.
Step 5: Evaluate each individual's fitness.
Step 6: Perform the selection operator.
Step 7: If the iterative generation gets to the preset value, go to
Step 8, else go to Step 2.
Step 8: Extract a rule from the best individual.
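The control flow of Steps 1-8 can be outlined as below. This is a skeleton only: the toy individual is a plain bit string, and a single random bit flip stands in for the value mutation, weight mutation and crossover operators, which in DE/QDE act on the mixed weight/value encoding.

```python
import random

def de_qde_outline(fitness, m, NP=10, max_gen=60, seed=7):
    """Outline of Steps 1-8 on a toy bit-string individual (minimization)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(NP)]  # Step 1
    fit = [fitness(x) for x in pop]
    for _ in range(max_gen):                                          # Step 7: loop
        for i in range(NP):
            trial = pop[i][:]                # Steps 2-4: stand-in variation operator
            trial[rng.randrange(m)] ^= 1
            ft = fitness(trial)              # Step 5: evaluate
            if ft < fit[i]:                  # Step 6: selection
                pop[i], fit[i] = trial, ft
    best = min(range(NP), key=lambda i: fit[i])
    return pop[best]                         # Step 8: extract the best individual
```

One such run would then be repeated for each class, as Section 3.3 describes.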
Table 1
UCI repository datasets used in experiments.

No.  Dataset            Examples  Continuous  Binary  Integral  Classes
1    Breast cancer (L)  282                           3         2
2    Breast cancer (W)  683                           9         2
3    Tic-tac-toe        958                           9         2
4    Dermatology        366       1           1       32        6
5    Hepatitis          155       6           13                2
6    Heart              303       5           3                 5
Table 2
The predictive accuracy (%, mean ± standard deviation) of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset             DE/QDE         Ant-Miner      Ant-Miner (no pruning)  CN2
Breast cancer (L)   75.52 ± 4.91   75.28 ± 2.24   70.69 ± 3.87            67.69 ± 3.59
Breast cancer (W)   92.68 ± 5.07   96.04 ± 0.93   95.74 ± 0.74            94.88 ± 0.88
Tic-tac-toe         98.85 ± 2.07   73.04 ± 2.53   76.83 ± 2.27            97.38 ± 0.52
Dermatology         91.53 ± 2.40   94.29 ± 1.20   83.05 ± 1.94            90.38 ± 1.66
Hepatitis           90.97 ± 6.34   90.00 ± 3.11   92.50 ± 2.76            90.00 ± 2.50
Heart               52.15 ± 3.61   59.67 ± 2.50   54.82 ± 2.56            57.48 ± 1.78
Table 3
The average number of rules of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset             DE/QDE         Ant-Miner     Ant-Miner (no pruning)  CN2
Breast cancer (L)   6.30 ± 1.19    7.10 ± 0.31   19.60 ± 0.22            55.40 ± 2.07
Breast cancer (W)   11.80 ± 1.08   6.20 ± 0.25   22.80 ± 0.20            18.60 ± 0.45
Tic-tac-toe         10.00 ± 0.00   8.50 ± 0.62   68.80 ± 0.32            39.70 ± 2.52
Dermatology         11.90 ± 2.02   7.30 ± 0.15   25.90 ± 0.31            18.50 ± 0.47
Hepatitis           4.30 ± 0.64    3.40 ± 0.16   6.80 ± 0.13             7.20 ± 0.25
Heart               11.10 ± 1.37   9.50 ± 0.92   21.80 ± 0.20            42.40 ± 0.71
Table 4
The number of terms per rule of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset             DE/QDE  Ant-Miner  Ant-Miner (no pruning)  CN2
Breast cancer (L)   2.80    1.28       3.25                    2.21
Breast cancer (W)   1.20    1.97       5.72                    2.39
Tic-tac-toe         2.60    1.18       3.47                    2.90
Dermatology         3.11    3.16       16.86                   2.47
Hepatitis           2.98    2.41       6.01                    1.58
Heart               3.38    1.71       4.32                    2.79
reduce the number of the rules. In DE/QDE, the match value is used to classify a new example that is not covered by the rule set. Pruning the antecedent of a rule does not reduce the size of the rule set, so the number of rules is not changed in the rule-pruning step.
In Table 2, it can be seen that the predictive accuracies of DE/QDE for Breast cancer (L), Tic-tac-toe and Hepatitis are higher than those of Ant-Miner. DE/QDE has higher predictive accuracies than CN2 for Breast cancer (L), Tic-tac-toe, Dermatology and Hepatitis. But the standard deviations of DE/QDE are often larger than those of Ant-Miner and CN2. In DE/QDE, the continuous attributes are used directly in the process of finding rules, whereas Ant-Miner needs to discretize the continuous attributes. After a continuous attribute is discretized, its values become several isolated values rather than many continuous values in a range. So discretization is often a good thing, because this step simplifies the distribution of the continuous attribute. Of course, DE/QDE can also accept continuous attributes that have been discretized, but this is not the purpose of the paper.
In each dataset, the number of rules of DE/QDE is obviously smaller than those of Ant-Miner without rule pruning and CN2, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (L). Likewise, in Table 4, the number of terms per rule shows similar comparison results, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (W).
Generally, the performance of a classification method using rules is affected by several aspects, such as the rule-discovery algorithm, the fitness evaluation and the rule pruning method. In DE/QDE, the values of F and CR are not discussed specially because they seldom influence DE/QDE in discovering classification rules. But how the weight mutation works is a very important step, which determines the conversion of the attributes between useful and useless.
References

Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261-283.
Collard, M., & Francisci, D. (2001). Evolutionary data mining: An overview of genetic-based algorithms. In Proceedings of the IEEE congress on evolutionary computation (pp. 3-9).
de Araujo, D. L. A., Lopes, H. S., & Freitas, A. A. (1999). A parallel genetic algorithm for rule discovery in large databases. In Proceedings of the IEEE congress on systems, man and cybernetics, Tokyo (pp. 940-945).
Falco, I., Cioppa, A., & Tarantino, E. (2002). Discovering interesting classification rules with genetic programming. Applied Soft Computing, 1, 257-269.
Fidelis, M. V., Lopes, H. S., & Freitas, A. A. (2000). Discovering comprehensible classification rules with a genetic algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 805-810).
Han, K.-H., & Kim, J.-H. (2002). Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation, 6(6), 580-593.
Han, K.-H., & Kim, J.-H. (2004). Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate and two-phase scheme. IEEE Transactions on Evolutionary Computation, 8(2), 156-169.
Holden, N., & Freitas, A. A. (2008). A hybrid PSO/ACO algorithm for discovering classification rules in data mining. Journal of Artificial Evolution and Applications, 2008, Article ID 316145, 11 pages. doi:10.1155/2008/316145.
Holland, J. H. (1986). Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Machine Learning: An Artificial Intelligence Approach, 2, 593-623.
Jiao, L., Liu, J., & Zhong, W. (2006). An organizational coevolutionary algorithm for classification. IEEE Transactions on Evolutionary Computation, 10(1), 67-80.
Johnson, H. E., Gilbert, R. J., & Winson, M. K. (2000). Explanatory analysis of the metabolome using genetic programming of simple interpretable rules. Genetic Programming and Evolvable Machines, 1, 243-258.
Jong, K. A. D., Spears, W. M., & Gordon, D. F. (1993). Using genetic algorithms for concept learning. Machine Learning, 13(2-3), 161-188.
Lampinen, J., & Zelinka, I. (1999). Mixed integer-discrete-continuous optimization by differential evolution, Part 1. In Proceedings of the fifth international congress on soft computing.
Liu, J. J., & Kwok, J. T.-Y. (2000). An extended genetic rule induction algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 458-463).
Pampara, G., Engelbrecht, A., & Franken, N. (2006). Binary differential evolution. In Proceedings of the IEEE congress on evolutionary computation (pp. 1873-1879).
Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation, 6(4), 321-332.
Smith, S. F. (1983). Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th international congress on artificial intelligence, Karlsruhe, Germany (pp. 422-425).
Sousa, T., Silva, A., & Neves, A. (2004). Particle swarm based data mining algorithms for classification tasks. Parallel Computing, 30, 767-783.
Storn, R., & Price, K. (1997). Differential evolution - A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341-359.
Su, H., & Yang, Y. (2008). Quantum-inspired differential evolution for binary optimization. In The 4th international conference on natural computation (pp. 341-346).