
Expert Systems with Applications 37 (2010) 1216–1222

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Classification rule discovery with DE/QDE algorithm


Haijun Su a,*, Yupu Yang a, Liang Zhao b

a Department of Automation, Shanghai Jiaotong University, Shanghai, PR China
b Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, PR China

Article info

Keywords:
Classification
Quantum-inspired
Differential evolution
Data mining
Continuous attribute

Abstract
The quantum-inspired differential evolution algorithm (QDE) is a new optimization algorithm in the binary-valued space. This paper proposes the DE/QDE algorithm for the discovery of classification rules. DE/QDE combines the characteristics of the conventional DE algorithm and the QDE algorithm. Based on strategies of DE and QDE, DE/QDE can directly cope with continuous and nominal attributes, without discretizing the continuous attributes in a preprocessing step. DE/QDE also has a specific weight mutation for managing the weight values of the individual encoding. DE/QDE is then compared with Ant-Miner and CN2 on six problems from the UCI repository. The results indicate that DE/QDE is competitive with Ant-Miner and CN2 in terms of predictive accuracy.
© 2009 Elsevier Ltd. All rights reserved.

1. Introduction
Data mining is the process of knowledge discovery: it searches a large volume of data to discover interesting and useful information that was previously unknown (Collard & Francisci, 2001). Data classification is one of the most common tasks of data mining: from a set of training examples, it generates a set of rules for classifying future test data. Evolutionary algorithms (EAs) have been applied to numerical optimization, combinatorial optimization, neural networks, and data mining.
1.1. Related work
Genetic algorithms (GAs) have been applied widely to data mining for classification. Holland (1986) proposed the Michigan approach, which represents one rule by one individual, and Smith (1983) proposed the Pittsburgh approach, which represents several rules by one individual. Rule induction is one of the most common forms of knowledge discovery: it converts data into a set of IF-THEN rules for classification. GA-based algorithms for rule discovery have been studied in Jong, Spears, and Gordon (1993), Liu and Kwok (2000), Fidelis, Lopes, and Freitas (2000), Au, Chan, and Yao (2003) and Chiu (2002).
Recently, some algorithms based on other EAs have been developed for rule discovery. Jiao, Liu, and Zhong (2006) proposed the organizational coevolutionary algorithm for classification (OCEC). OCEC uses a bottom-up search mechanism and evolves sets of examples which form organizations. Three new evolutionary operators and a selection mechanism are devised for realizing the evolutionary operations performed on organizations. OCEC can handle multiclass learning in a natural way because it is inspired by the coevolutionary model.

* Corresponding author. Tel.: +86 21 34204261; fax: +86 21 34204427. E-mail addresses: hjsu@sjtu.edu.cn, hjsush@gmail.com (H. Su). doi:10.1016/j.eswa.2009.06.029
Genetic programming for discovering comprehensible classification rules has also been investigated (Falco, Cioppa, & Tarantino, 2002; Johnson, Gilbert, & Winson, 2000). The algorithm proposed in Falco et al. (2002) can provide compact and comprehensible classification rules and has good robustness.
Particle swarm optimization (PSO) is a newer evolutionary algorithm, which simulates the coordinated motion of flocks of birds. Sousa, Silva, and Neves (2004) proposed the use of PSO for data mining. PSO can carry out the rule discovery process; its rule representation follows the Michigan approach, and PSO needs fewer particles than a GA to obtain the same results.
Ant colony optimization (ACO) is a heuristic algorithm derived from research on the behavior of real ant colonies. Parpinelli, Lopes, and Freitas (2002) first proposed Ant-Miner, an ACO-based algorithm for extracting classification rules from data. Ant-Miner discovers rules referring only to nominal attributes; continuous attributes have to be discretized. By using entropy measures, Ant-Miner obtains initial rules of higher quality than a GA that generates its initial population at random. Ant-Miner adopts a normalized information-theoretic heuristic function which computes the entropy for an attribute-value pair only.
Holden and Freitas (2008) proposed a hybrid PSO/ACO algorithm for discovering classification rules. In PSO/ACO, the rule discovery process is divided into two separate phases: in the first phase, ACO discovers a rule containing nominal attributes only; in the second phase, PSO potentially extends the rule with continuous attributes.


1.2. Proposed algorithm


Differential evolution (DE) is a population-based, stochastic global optimization approach proposed by Storn and Price (1997). DE is an excellent algorithm in the floating-point search space, and many modified DE algorithms have been developed for solving continuous optimization problems. Although these DE algorithms perform well on continuous problems, they have difficulty with many practical engineering problems formulated as discrete optimization problems, such as combinatorial, scheduling or routing problems. Compared with continuous DE, binary DE has not been researched extensively, and its applications are still limited to a few cases.
Lampinen modified differential evolution to solve non-linear programming problems containing integer, discrete and continuous variables (Lampinen & Zelinka, 1999). A new binary DE, called AMDE, was proposed for numerical optimization by Pampara, Engelbrecht, and Franken (2006).
Han proposed the quantum-inspired evolutionary algorithm (QEA), based on the concepts of Q-bits and the superposition of states, in Han and Kim (2002) and Han and Kim (2004). Although QEA performs well on the knapsack problem, changing the initial values of the Q-bits can influence QEA's search for the best solutions. Additionally, the strategy for choosing the magnitude of the rotation angle affects the convergence speed of QEA.
We previously proposed a novel binary differential evolution algorithm, called the quantum-inspired differential evolution algorithm (QDE) (Su & Yang, 2008). QDE uses a Q-bit individual as a probabilistic representation, instead of a binary or floating-point representation. The mutation and crossover operators of DE are adapted in order to generate new Q-bit strings. The selection operator of DE allows better Q-bit strings and their observed states to enter the next-generation population.
In this paper, we propose a new classification algorithm called DE/QDE. DE/QDE uses ideas from the original DE and a modified DE to cope with continuous and integer attributes, and ideas from QDE to cope with binary attributes. Here, the binary and integer attributes together constitute the nominal attributes.
1.3. Structure of the paper
The rest of the paper is organized as follows. Section 2 describes the original DE, the modified DE for integer optimization, and the QDE algorithm for binary optimization. Section 3 discusses the use of DE/QDE for rule discovery. Section 4 presents the experimental procedure and results. Conclusions are discussed in Section 5.

2. The DE algorithm
This section describes three versions of DE, which deal with continuous, integer and binary optimization, respectively.

2.1. DE for continuous optimization

DE is a novel parallel search method. Because DE is a floating-point encoded evolutionary algorithm, it usually deals with real-valued optimization problems. DE generates new candidate individuals by combining a parent individual with one or several weighted differences. DE has three parameters: the mutation control parameter F, the crossover control parameter CR and the population size NP. Some versions of DE provide novel strategies for setting these three parameters.

2.1.1. Mutation
Several mutation forms of DE exist at present; the scheme DE/rand/1 is one of the most popular, and is described as follows. Target vectors are denoted X_{i,G} = (x_{1,i}, ..., x_{N,i})^T and trial vectors are denoted V_{i,G} = (v_{1,i}, ..., v_{N,i})^T, where i = 1, ..., NP, N is the dimension of the target function, and the subscript G denotes the G-th generation. DE/rand/1 is expressed by the following equation:

V_{i,G} = X_{r_1,G} + F (X_{r_2,G} - X_{r_3,G})    (1)

where r_1, r_2, r_3 are randomly chosen, mutually different integers in the range {1, ..., NP}, also different from the running index i. F is a real parameter which controls the amplification of the differential variation. Generally, the value of F is set in the range [0, 2], usually less than 1. If F is set too low, the diversity of DE deteriorates and DE is more likely to get trapped in local optima.

2.1.2. Crossover
After the mutation step, the final trial vector U_{i,G} = (u_{1,i}, ..., u_{N,i})^T is calculated by the following equation:

u_{j,i} = v_{j,i}  if rand_j[0, 1] <= CR or j = j_rand
u_{j,i} = x_{j,i}  otherwise    (2)

where j = 1, ..., N, CR in [0, 1] and j_rand in {1, ..., N}. CR represents the probability that an element of the final trial vector is taken from the new mutation vector rather than from the old target vector. If CR is set to a high value, DE converges faster; if CR is set to a low value, DE becomes more robust but spends more time finding the minimum of the problem. j_rand makes sure that the final trial vector differs from the corresponding target vector in at least one element. Eq. (2) is the binomial crossover operator and is the one adopted in this paper.

2.1.3. Selection
Each target vector of the next generation is generated as:

X_{i,G+1} = U_{i,G}  if f(U_{i,G}) < f(X_{i,G})
X_{i,G+1} = X_{i,G}  otherwise    (3)

This equation is a greedy selection scheme. If the vector U_{i,G} yields a smaller objective function value (for a minimization problem) than X_{i,G}, then U_{i,G} replaces X_{i,G} and enters the population of the next generation, i.e. X_{i,G+1} takes the information of U_{i,G}; otherwise X_{i,G} remains in the population for the next generation, i.e. X_{i,G+1} keeps the information of X_{i,G}.

2.2. DE for integer optimization

DE is only capable of handling continuous variables in its normal form. Lampinen and Zelinka (1999) discussed how to modify DE for integer variables and extended DE to integer optimization, proposing two simple modifications. First, integer values should be used to evaluate the fitness function, while DE may still work internally with continuous values. Thus the fitness function takes the form:

f(y_j)    (4)

where

y_j = INT(x_j), x_j in X, j = 1, ..., N

INT is a function converting a real value to an integer value by truncation. Truncated values are not assigned elsewhere; DE performs its operators on a population of continuous variables regardless of the corresponding variable type. Second, in the case of integer variables, the population should be initialized as follows:




x_{i,j} = x_j^L + r_{i,j} (x_j^U - x_j^L + 1)    (5)

where j = 1, ..., N, r_{i,j} is a uniform random number in [0, 1], and x_j^U and x_j^L are the upper and lower bounds of the j-th variable, respectively. Using these two modified equations, problems containing integer variables can be handled easily.
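To make the operators concrete, the following is a minimal, self-contained sketch of the DE/rand/1/bin scheme of Eqs. (1)-(3) in Python. The function name, the sphere test function and all parameter values are our own illustrative choices, not prescribed by the paper:

```python
import random

def de_rand_1(fitness, bounds, NP=20, F=0.5, CR=0.3, generations=100, seed=1):
    """Minimize `fitness` with DE/rand/1 mutation, binomial crossover and
    greedy selection, i.e. Eqs. (1)-(3)."""
    rng = random.Random(seed)
    N = len(bounds)
    # random initial population inside the given bounds
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds] for _ in range(NP)]
    for _ in range(generations):
        for i in range(NP):
            # Eq. (1): three mutually different indices r1, r2, r3, all != i
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(N)]
            # Eq. (2): binomial crossover; j_rand forces at least one new element
            j_rand = rng.randrange(N)
            u = [v[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                 for j in range(N)]
            # Eq. (3): greedy selection
            if fitness(u) < fitness(pop[i]):
                pop[i] = u
    return min(pop, key=fitness)

sphere = lambda x: sum(t * t for t in x)
best = de_rand_1(sphere, bounds=[(-5.0, 5.0)] * 3)
```

For integer variables, Eq. (4) would simply truncate each element with INT() before the fitness evaluation, while the population itself stays continuous.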
2.3. QDE for binary optimization

The above method is not suited to handling binary variables, because the value of a binary variable is either 0 or 1; the form of DE must therefore be modified appropriately to implement binary optimization. QDE is a novel evolutionary algorithm based on the concepts and principles of quantum computing (Su & Yang, 2008). It uses a string of Q-bits as an individual, and this Q-bit representation has better properties than an ordinary binary string. We present the QDE algorithm in the following.
QDE maintains a population of Q-bit individuals, Q(G) = {q_1^G, q_2^G, ..., q_n^G}, at generation G, where n is the size of the population and q_i^G is a Q-bit individual defined as

q_i^G = [ α_{i1}^G  α_{i2}^G  ...  α_{im}^G ;  β_{i1}^G  β_{i2}^G  ...  β_{im}^G ]    (6)

where m is the length of a Q-bit individual. Because |α_{ij}^G|^2 or |β_{ij}^G|^2 denotes a probability toward either the 0 or the 1 state, these amplitudes can be changed by the mutation operator of DE. Thus the mutation operator can be expressed as

v_{i,j}^G = α_{r_1,j}^G + F (α_{r_2,j}^G - α_{r_3,j}^G)    (7)

where i = 1, ..., n and j = 1, ..., m. The integers r_1, r_2, r_3 are randomly chosen, mutually different integers in the range {1, ..., n}, also different from the running index i. F is a real factor which controls the amplification of the differential variation; generally, the value of F is set in the range [0, 2].

The crossover operator is expressed as

α'_{i,j} = v_{i,j}^G  if rand_j[0, 1] <= CR or j = j_rand
α'_{i,j} = α_{i,j}^G  otherwise    (8)

where i = 1, ..., n, j = 1, ..., m, CR in [0, 1] and j_rand in {1, ..., m}. CR is the crossover probability. β'_{i,j} is calculated by the following equation:

β'_{i,j} = sqrt(1 - (α'_{i,j})^2)    (9)

So the new Q-bit individual is

q'_i = [ α'_{i1}  α'_{i2}  ...  α'_{im} ;  β'_{i1}  β'_{i2}  ...  β'_{im} ]    (10)
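A compact sketch of one QDE amplitude update (Eqs. (7)-(8)) and of the observation step that collapses a Q-bit string into a binary string. Clipping the amplitudes to [-1, 1] is our own safeguard so that the square root in Eq. (9) stays real; the paper does not spell out how out-of-range amplitudes are treated:

```python
import math
import random

def qde_step(alpha, F=0.5, CR=0.3, rng=random):
    """One generation of amplitude updates for a population of Q-bit strings.
    `alpha` is a list of n individuals, each a list of m amplitudes a_ij."""
    n, m = len(alpha), len(alpha[0])
    new_alpha = []
    for i in range(n):
        r1, r2, r3 = rng.sample([k for k in range(n) if k != i], 3)
        j_rand = rng.randrange(m)
        row = []
        for j in range(m):
            v = alpha[r1][j] + F * (alpha[r2][j] - alpha[r3][j])           # Eq. (7)
            a = v if (rng.random() <= CR or j == j_rand) else alpha[i][j]  # Eq. (8)
            row.append(max(-1.0, min(1.0, a)))   # assumption: clip to [-1, 1]
        new_alpha.append(row)
    return new_alpha                             # beta then follows from Eq. (9)

def observe(alpha_row, rng=random):
    """Collapse one Q-bit string: P(bit = 1) = |beta_ij|^2 = 1 - a_ij^2."""
    return [1 if rng.random() < 1.0 - a * a else 0 for a in alpha_row]

rng = random.Random(7)
pop = [[1.0 / math.sqrt(2.0)] * 8 for _ in range(6)]   # uniform superposition
pop = qde_step(pop, rng=rng)
bits = observe(pop[0], rng=rng)
```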

The new binary string u_i^G can be obtained by observing each Q-bit of q_i^G. The population at generation G is denoted X^G = {x_1^G, ..., x_n^G}, and the observation vectors are denoted U^G = {u'_1, ..., u'_n}. The selection operator is expressed as

x_i^{G+1} = u'_i   if f(u'_i) < f(x_i^G)
x_i^{G+1} = x_i^G  otherwise    (11)

and

q_i^{G+1} = q'_i   if f(u'_i) < f(x_i^G)
q_i^{G+1} = q_i^G  otherwise    (12)

So the observed population at generation G+1 is X^{G+1} = {x_1^{G+1}, x_2^{G+1}, ..., x_n^{G+1}}, and the Q-bit population at generation G+1 is Q(G+1) = {q_1^{G+1}, q_2^{G+1}, ..., q_n^{G+1}}.

After a Q-bit individual has undergone the mutation operation and the crossover operation, the values of the corresponding bits are updated. The individual must then be observed in order to obtain a binary string. It is, of course, unnecessary to observe all of its Q-bits: observing the entire Q-bit string of one individual not only increases the computational cost, but may also generate a brand-new binary individual which does not inherit any information from the former binary individual. For binary problems such as the knapsack problem, a binomial observation approach has been proposed.

3. The DE/QDE algorithm for rule discovery

In this section, we use the three versions of DE to solve the classification problem. Our algorithm, called DE/QDE, can cope with datasets containing continuous, binary and integer attributes. In particular, continuous attributes can be used directly.

3.1. Representation of the rule

For a problem with m attributes A_j, j = 1, ..., m, rules can be represented as: IF cond_1 AND ... AND cond_m THEN class = C_k. The representation of an individual in DE/QDE is (cond_1, ..., cond_m, C_k). Here, C_k is the value of the class, and cond_j is a condition on A_j. For a continuous attribute, cond_j has the form V_{j,lower} <= A_j <= V_{j,upper}. For a nominal attribute, cond_j has the form A_j = V_j.

It is well known that DE easily deals with continuous variables, but classification problems also contain nominal attributes. If a nominal attribute has more than two values, we call it an integer attribute; if it has only two values, we call it a binary attribute. With simple modifications, DE can deal with integer attributes. For binary attributes, DE must be given additional mechanisms, so the QDE algorithm of Section 2.3 is used to handle them.

[Fig. 1. Representation of an individual: Gene_1, ..., Gene_m followed by the class C_k, where each Gene_j consists of the fields (W_j, V_{j,lower}, V_{j,upper}) for a continuous attribute and (W_j, V_{j,1}, V_{j,2}) for a binary or integer attribute.]

Fig. 1 shows the form of an individual in the population. In a problem with m attributes, Gene_j represents the condition of the j-th attribute A_j, j = 1, ..., m. Each gene contains three fields. The first field, the weight W_j, is a binary variable taking the value 0 or 1: when W_j is set to 0 or 1, the j-th attribute is removed from or inserted into the individual, respectively. The second field of Gene_j takes different forms depending on the type of the corresponding attribute. If an attribute A_j is continuous, the second field represents the lower bound V_{j,lower} of the attribute. If A_j is binary, the second field holds the quantum representation V_{j,1} of the attribute. If A_j is integer, the second field holds the value V_{j,1} of the attribute, a real number used in the mutation operator. The third field is similar to the second. If A_j is continuous, the third field


represents the upper bound V_{j,upper} of the attribute. If A_j is binary, the third field holds the value V_{j,2} of the attribute, i.e. V_{j,2} = 0 or 1. If A_j is integer, the third field holds the value V_{j,2} of the attribute, i.e. V_{j,2} = INT(V_{j,1}). C_k is the value of the class to which the individual belongs.

Although the encoding of each individual in DE/QDE has a fixed length, the length of its corresponding rule is variable, depending on the value of the weight field W_j. How the weight is regulated is described in Section 3.4.
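As a concrete (hypothetical) illustration of Fig. 1, an individual for a problem with one continuous, one binary and one integer attribute could be encoded as below; the dictionary layout and field names are ours, not the paper's:

```python
# Hypothetical encoding of one DE/QDE individual for a three-attribute problem.
individual = {
    "genes": [
        {"type": "continuous", "W": 1, "V1": 2.5,  "V2": 7.0},  # 2.5 <= A1 <= 7.0
        {"type": "binary",     "W": 0, "V1": 0.71, "V2": 1},    # V1 holds the Q-bit amplitude
        {"type": "integer",    "W": 1, "V1": 3.6,  "V2": 3},    # V2 = INT(V1)
    ],
    "class": "C1",
}

def decode_rule(ind):
    """Render only the genes whose weight W is 1, as described in Section 3.1."""
    conds = []
    for j, g in enumerate(ind["genes"], start=1):
        if g["W"] == 0:
            continue                     # attribute removed from the rule
        if g["type"] == "continuous":
            conds.append(f"{g['V1']} <= A{j} <= {g['V2']}")
        else:
            conds.append(f"A{j} = {g['V2']}")
    return "IF " + " AND ".join(conds) + f" THEN class = {ind['class']}"

rule = decode_rule(individual)
```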
3.2. Fitness function
The fitness function is used to evaluate the quality of each rule during the training process. Fidelis et al. (2000) proposed a fitness function based on the sensitivity Se and the specificity Sp. The following four counts must be evaluated first:

(1) True positives TP: the number of examples covered by the rule that have the class predicted by the rule.
(2) False positives FP: the number of examples covered by the rule that have a class different from the class predicted by the rule.
(3) False negatives FN: the number of examples not covered by the rule that have the class predicted by the rule.
(4) True negatives TN: the number of examples not covered by the rule that have a class different from the class predicted by the rule.
The accuracy Ac, the coverage Co (sensitivity) and the specificity Sp are defined as follows:

Ac = TP / (TP + FP)    (13)
Co = TP / (TP + FN)    (14)
Sp = TN / (TN + FP)    (15)

We define the fitness function as follows:

Fitness = ω_1 · Ac · Co · Sp + ω_2 · Simp    (16)

where Simp is a measure of rule simplicity, and ω_1 and ω_2 are user-defined weights. The Simp measure can be defined in many different ways. Here, Simp is expressed as follows:

Simp = Term_u / Term_a    (17)

where Term_a is the number of useful attributes of a rule, and Term_u is the number of potentially useful attributes. A given attribute A_i is said to be potentially useful if there is at least one training example having both the A_i value specified in the rule antecedent and the goal attribute value specified in the rule consequent (de Araujo, Lopes, & Freitas, 1999). In our experiments, ω_1 and ω_2 are set to 0.8 and 0.2, respectively. The fitness takes values in the range [0, 1]. In the classification problem, we search for the individual with the maximum fitness value in each optimization process.
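The fitness of Eqs. (13)-(17) can be computed directly from the four counts; the zero-division guards for degenerate rules are our own addition:

```python
def rule_fitness(tp, fp, fn, tn, term_u, term_a, w1=0.8, w2=0.2):
    """Fitness of Eq. (16) for one rule, from its confusion counts."""
    ac = tp / (tp + fp) if tp + fp else 0.0    # accuracy, Eq. (13)
    co = tp / (tp + fn) if tp + fn else 0.0    # coverage/sensitivity, Eq. (14)
    sp = tn / (tn + fp) if tn + fp else 0.0    # specificity, Eq. (15)
    simp = term_u / term_a if term_a else 0.0  # rule simplicity, Eq. (17)
    return w1 * ac * co * sp + w2 * simp       # Eq. (16)

f = rule_fitness(tp=40, fp=10, fn=5, tn=45, term_u=2, term_a=2)
```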
3.3. Rule extraction and prediction method
In DE/QDE, each individual of the population represents a rule. The genome of an individual consists of the antecedent (IF part) and the consequent (THEN part) of the rule. During one run, the antecedent of each individual's genome undergoes the evolutionary operators, while the consequent specifies a preset, fixed class. Each run discovers a single rule predicting the given class. If a given dataset contains m classes, the algorithm needs to run at least m times.

1219

After a rule is discovered, it goes through a pruning process in order to remove redundant attributes. This is done by iteratively removing one attribute of the rule at a time: if the newly obtained rule has the same or higher quality than the original rule, the new rule replaces the original. Note that our pruning process only adjusts the length of a rule; it does not reduce the number of rules.
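The pruning loop can be sketched as follows; rules are represented as sets of antecedent terms and `quality` stands for any rule-quality measure (the toy measure below is our own):

```python
def prune_rule(rule, quality):
    """Iteratively drop one antecedent term while quality does not decrease."""
    rule = set(rule)
    improved = True
    while improved and len(rule) > 1:
        improved = False
        for term in sorted(rule):
            candidate = rule - {term}
            # same or higher quality: keep the shorter rule and restart the scan
            if quality(frozenset(candidate)) >= quality(frozenset(rule)):
                rule = candidate
                improved = True
                break
    return rule

# toy quality measure: the term "noise" contributes nothing, so it gets pruned
q = lambda terms: len(terms - {"noise"})
pruned = prune_rule({"a1", "a2", "noise"}, q)
```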
When an example is tested, there are three possible outcomes. First, there might be exactly one rule covering the example; in this case, the example is simply classified by that rule. Second, there might be more than one rule covering the example, with consequents belonging to different classes; in this case, the example is classified by the rule with the highest fitness among all the rules covering it. Third, there might be no rule covering the example; in this case, the example is classified by the rule with the maximum match value among all the rules. When more than one rule has the same maximum match value, the one with the higher fitness is used. The match value is defined as MV_r^i = |term_r^i| / |term_r|, where |term_r| denotes the number of terms in the antecedent of rule r, and |term_r^i| denotes the number of those terms satisfied by example i (Jiao et al., 2006). According to this definition, the range of the match value is [0, 1], but in the third case the match value must be less than 1.
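The prediction scheme and the match value can be sketched as below; the (terms, class, fitness) triple layout and the toy rules are our own illustration:

```python
def match_value(rule_terms, example):
    """MV_r^i = (number of terms of rule r satisfied by example i) / |term_r|."""
    satisfied = sum(1 for attr, test in rule_terms.items() if test(example[attr]))
    return satisfied / len(rule_terms)

def classify(example, rules):
    """Covering rules win first (best fitness among them); otherwise the rule
    with the highest match value, ties broken by fitness (Section 3.3)."""
    covering = [r for r in rules if match_value(r[0], example) == 1.0]
    pool = covering if covering else rules
    best = max(pool, key=lambda r: (match_value(r[0], example), r[2]))
    return best[1]

rules = [
    ({"A1": lambda v: v > 5},                          "pos", 0.9),
    ({"A1": lambda v: v <= 5, "A2": lambda v: v == 0}, "neg", 0.7),
]
label = classify({"A1": 3, "A2": 0}, rules)
```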
3.4. Implementation of DE/QDE
There may be continuous, binary or integer attributes, or a combination of them, in a given dataset. For many classification algorithms, continuous attributes have to be converted into discrete values by a discretization method in a preprocessing step. Discretization often improves the comprehensibility of continuous attributes, because the classification algorithm then operates on a few discrete values rather than a set of continuous values.

Our DE/QDE algorithm can cope with continuous attributes directly. Section 2 introduced three mutation operators for continuous, binary and integer attributes; here, we need to organize these strategies. Suppose a dataset contains m attributes, each of which may be continuous, binary or integer. We need to mark the kind of each attribute in sequence. The j-th item of an individual corresponds to the j-th attribute of the dataset, j = 1, ..., m. If the j-th item is continuous, the mutation strategy of Section 2.1 is performed; if it is integer, the mutation strategy of Section 2.2 is performed; if it is binary, the mutation strategy of Section 2.3 is performed. Each continuous item has lower and upper bounds, so the mutation operator must be applied to both bounds. The following pseudo-code sums up the value mutation process.
The pseudo-code of the value mutation (there are NP individuals in the population, and each individual has m attributes):

begin
for( i = 0; i < NP; i++ )
{
  for( j = 0; j < m; j++ )
  {
    if( the j-th attribute is continuous )
    {
      Change the lower and upper bounds respectively;
    }
    else if( the j-th attribute is binary )
    {
      Change the value of the quantum bit;
      Obtain the new value by the observation approach;
    }
    else if( the j-th attribute is integer )
    {
      Change the value of the second field V_{j,1};
      Set V_{j,2} = INT(V_{j,1});
    }
  }
}
end.
The weight mutation is developed to change the weight of an attribute in an individual. In the initial population, we randomly choose one attribute whose weight is set to 1; that is, each initial individual has only one useful attribute. The parameter p_w denotes the attribute-insertion or attribute-removal probability. When a random number in the range [0, 1] is less than p_w and the j-th attribute is useless, the j-th attribute is made useful; when the random number is less than p_w and the j-th attribute is useful, the j-th attribute is made useless. However, during the weight mutation of one individual, the useless-to-useful transformation is limited to at most two times, and the useful-to-useless transformation to at most one time. The following pseudo-code sums up the weight mutation process.
The pseudo-code of the weight mutation:

begin
for( i = 0; i < NP; i++ )
{
  bflag1 = 0; bflag2 = 0;
  for( j = 0; j < m; j++ )
  {
    if( rand < p_w || j == m - 1 )
    {
      if( W_j == 0 && bflag1 < 2 )
      {
        W_j = 1; bflag1++;
      }
      else if( W_j == 1 && bflag2 < 1 )
      {
        W_j = 0; bflag2++;
      }
    }
  }
}
end.
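In Python, the same weight mutation reads as follows (we omit the pseudo-code's extra `j == m - 1` clause for brevity; the flip-count limits are the ones described above):

```python
import random

def weight_mutation(W, pw=0.1, rng=random):
    """Flip attribute weights: at most two useless->useful flips and at most
    one useful->useless flip per individual, each with probability pw."""
    inserted = removed = 0
    W = list(W)
    for j in range(len(W)):
        if rng.random() < pw:
            if W[j] == 0 and inserted < 2:
                W[j] = 1
                inserted += 1
            elif W[j] == 1 and removed < 1:
                W[j] = 0
                removed += 1
    return W

rng = random.Random(5)
w = weight_mutation([1, 0, 0, 0, 0, 0], pw=0.5, rng=rng)
```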
We consider that p_w should be set to a small value, such as 0.1. Then each attribute has only a small chance of being removed from or inserted into an individual, which ensures that the actual length of an individual changes only slightly in each iteration. This attribute-insertion and attribute-removal strategy iteratively tests whether an attribute is useful to an individual. The basic process of the DE/QDE algorithm is presented as follows:
Step 1: Initialize the population.
Step 2: Perform the value mutation operator according to the kind of each attribute.
Step 3: Perform the weight mutation operator.
Step 4: Perform the crossover operator.
Step 5: Evaluate each individual's fitness.
Step 6: Perform the selection operator.
Step 7: If the iteration count reaches the preset value, go to Step 8; otherwise go to Step 2.
Step 8: Extract a rule from the best individual.
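Steps 1-8 can be exercised end to end on a toy one-dimensional problem, where an individual is simply an interval (lo, hi) predicting the target class and the fitness is the Ac · Co · Sp product of Eq. (16) with the simplicity term dropped. Everything below (the data, the parameter values, the sorting that keeps lo <= hi) is our own illustrative setup, not the paper's:

```python
import random

rng = random.Random(11)
# 11 points on [0, 1]; "pos" exactly on [0.3, 0.7]
data = [(x / 10.0, "pos" if 3 <= x <= 7 else "neg") for x in range(11)]

def fitness(lo, hi, target="pos"):
    tp = sum(1 for x, c in data if lo <= x <= hi and c == target)
    fp = sum(1 for x, c in data if lo <= x <= hi and c != target)
    fn = sum(1 for x, c in data if not lo <= x <= hi and c == target)
    tn = sum(1 for x, c in data if not lo <= x <= hi and c != target)
    ac = tp / (tp + fp) if tp + fp else 0.0
    co = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return ac * co * sp

NP, F, CR = 20, 0.5, 0.3
pop = [sorted((rng.random(), rng.random())) for _ in range(NP)]        # Step 1
for _ in range(200):                                                   # Steps 2-7
    for i in range(NP):
        r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
        trial = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in (0, 1)]
        j_rand = rng.randrange(2)
        trial = [trial[d] if (rng.random() <= CR or d == j_rand) else pop[i][d]
                 for d in (0, 1)]
        trial = sorted(trial)          # keep the interval well-formed (lo <= hi)
        if fitness(*trial) > fitness(*pop[i]):                         # Steps 5-6
            pop[i] = trial
best = max(pop, key=lambda ind: fitness(*ind))                         # Step 8
```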

4. Experiments and results


For the experiments, six datasets from the well-known UCI repository (Blake, Keogh, & Merz, 1998) are used to test the performance of DE/QDE. The basic information of the datasets is presented in Table 1. The attributes are partitioned into three types: continuous, binary and integer; Table 1 indicates the number of each type in every dataset. Breast cancer (L) contains binary and integer attributes. Breast cancer (W) and Tic-tac-toe contain only integer attributes. Dermatology, Hepatitis and Cleveland contain continuous, binary and integer attributes. As mentioned earlier, DE/QDE can cope with continuous attributes directly, so discretization is dropped completely from the data preprocessing step. We use 10-fold cross-validation as the test method: each dataset is divided into 10 equal partitions; one partition is used as the test set and the remaining nine partitions as the training set. The average predictive accuracy over the 10 runs is reported as the predictive accuracy of the discovered rule set.

DE/QDE has 50 individuals and runs for a maximum of 200 iterations to discover one rule. Other important parameters are listed as follows:

- Mutation parameter F is generated randomly in the range [0, 0.5];
- Crossover parameter CR is generated randomly in the range [0, 0.3];
- Attribute-insertion or attribute-removal probability p_w is set to 0.1;
- Minimum number of training examples covered per rule is set to 4.
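The 10-fold protocol amounts to the following split (the shuffling seed and the helper name are ours):

```python
import random

def ten_fold_indices(n, seed=0):
    """Split n example indices into 10 near-equal folds for cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::10] for k in range(10)]

folds = ten_fold_indices(155)            # e.g. the Hepatitis dataset size
test_fold = folds[0]                     # one fold tests ...
train_idx = [i for k, f in enumerate(folds) if k != 0 for i in f]  # ... nine train
```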
We evaluate the performance of DE/QDE in comparison with Ant-Miner (Parpinelli et al., 2002) and CN2 (Clark & Niblett, 1989), two well-known rule-based classification algorithms. Ant-Miner uses ant colony optimization (ACO) to discover classification rules; CN2 is an induction algorithm combining some fine strategies of ID3 and AQ. Table 2 reports the average predictive accuracy and the standard deviation. Table 3 reports the average number of rules and the standard deviation. Table 4 reports the number of terms per rule.

The results of Ant-Miner without rule pruning are also shown in the tables. It can be seen that the rules discovered by Ant-Miner without rule pruning are longer than those discovered by Ant-Miner, and the rule set discovered by Ant-Miner without rule pruning is much larger than that discovered by Ant-Miner. The reason is that rule pruning is usually beneficial to the predictive accuracy of the rule set, since it deletes redundant rules and shortens the antecedent of each rule. Additionally, in Ant-Miner a default rule is used to predict any new example uncovered by the rule list. The default rule contains no conditions and has only a consequent; rules are removed if they have the same consequent as the default rule. So using the default rule is helpful to

Table 1
UCI repository datasets used in the experiments.

No.  Dataset            Examples  Continuous  Binary  Integer  Classes
1    Breast cancer (L)  282       0           3       6        2
2    Breast cancer (W)  683       0           0       9        2
3    Tic-tac-toe        958       0           0       9        2
4    Dermatology        366       1           1       32       6
5    Hepatitis          155       6           13      0        2
6    Cleveland          303       5           3       5        5



Table 2
The predictive accuracy of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset            DE/QDE        Ant-Miner     Ant-Miner w/o pruning  CN2
Breast cancer (L)  75.52 ± 4.91  75.28 ± 2.24  70.69 ± 3.87           67.69 ± 3.59
Breast cancer (W)  92.68 ± 5.07  96.04 ± 0.93  95.74 ± 0.74           94.88 ± 0.88
Tic-tac-toe        98.85 ± 2.07  73.04 ± 2.53  76.83 ± 2.27           97.38 ± 0.52
Dermatology        91.53 ± 2.40  94.29 ± 1.20  83.05 ± 1.94           90.38 ± 1.66
Hepatitis          90.97 ± 6.34  90.00 ± 3.11  92.50 ± 2.76           90.00 ± 2.50
Cleveland          52.15 ± 3.61  59.67 ± 2.50  54.82 ± 2.56           57.48 ± 1.78

Table 3
The average number of rules of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset            DE/QDE        Ant-Miner    Ant-Miner w/o pruning  CN2
Breast cancer (L)  6.30 ± 1.19   7.10 ± 0.31  19.60 ± 0.22           55.40 ± 2.07
Breast cancer (W)  11.80 ± 1.08  6.20 ± 0.25  22.80 ± 0.20           18.60 ± 0.45
Tic-tac-toe        10.00 ± 0.00  8.50 ± 0.62  68.80 ± 0.32           39.70 ± 2.52
Dermatology        11.90 ± 2.02  7.30 ± 0.15  25.90 ± 0.31           18.50 ± 0.47
Hepatitis          4.30 ± 0.64   3.40 ± 0.16  6.80 ± 0.13            7.20 ± 0.25
Cleveland          11.10 ± 1.37  9.50 ± 0.92  21.80 ± 0.20           42.40 ± 0.71

Table 4
The number of terms per rule of DE/QDE, Ant-Miner, Ant-Miner without rule pruning and CN2 for the datasets.

Dataset            DE/QDE  Ant-Miner  Ant-Miner w/o pruning  CN2
Breast cancer (L)  2.80    1.28       3.25                   2.21
Breast cancer (W)  1.20    1.97       5.72                   2.39
Tic-tac-toe        2.60    1.18       3.47                   2.90
Dermatology        3.11    3.16       16.86                  2.47
Hepatitis          2.98    2.41       6.01                   1.58
Cleveland          3.38    1.71       4.32                   2.79

reduce the number of rules. In DE/QDE, the match value is used to classify a new example uncovered by the rule set. Pruning the antecedent of a rule does not reduce the size of the rule set, so the number of rules is not changed in the rule pruning step.

In Table 2, it can be seen that the predictive accuracies of DE/QDE for Breast cancer (L), Tic-tac-toe and Hepatitis are higher than those of Ant-Miner. DE/QDE has higher predictive accuracies than CN2 for Breast cancer (L), Tic-tac-toe, Dermatology and Hepatitis. However, the standard deviations of DE/QDE are often larger than those of Ant-Miner and CN2. In DE/QDE, the continuous attributes are used directly in the process of finding rules, whereas Ant-Miner needs to discretize them. After a continuous attribute is discretized, its values become a few isolated values rather than many continuous values in a range, so discretization is often helpful because it simplifies the distribution of the continuous attribute. Of course, DE/QDE can also accept continuous attributes that have been discretized, but that is not the purpose of this paper.
For each dataset, the number of rules of DE/QDE is clearly smaller than those of Ant-Miner without rule pruning and CN2, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (L). Likewise, in Table 4, the number of terms per rule shows similar comparative results, but DE/QDE obtains a better result than Ant-Miner only for Breast cancer (W).
Generally, the performance of a rule-based classification method is affected by several aspects, such as the rule discovery algorithm, the fitness evaluation and the rule pruning method. In DE/QDE, the values of F and CR are not discussed in detail because they have little influence on the ability of DE/QDE to discover classification rules. By contrast, the weight mutation is a very important step, because it determines whether an attribute is converted between useful and useless. The fitness function is another important factor. Its design depends on the actual requirements, such as predictive accuracy, comprehensibility and interestingness. In many cases, the fitness function needs to take more than one measure into account.
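As an illustration of a fitness function that takes more than one measure into account, predictive quality can be blended with a simplicity term that rewards shorter (more comprehensible) rules. This is a hedged sketch: the sensitivity-times-specificity quality term follows Ant-Miner, but the linear blend and the weight `w` are assumptions, not the formula used by DE/QDE:

```python
def rule_fitness(tp, fp, tn, fn, num_terms, max_terms, w=0.8):
    """Illustrative multi-measure fitness for a classification rule.

    tp/fp/tn/fn: confusion-matrix counts of the rule on the training set.
    num_terms/max_terms: rule length relative to the longest possible rule.
    w: assumed trade-off weight between accuracy and comprehensibility.
    """
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    quality = sensitivity * specificity       # predictive quality in [0, 1]
    simplicity = 1.0 - num_terms / max_terms  # fewer terms -> higher score
    return w * quality + (1.0 - w) * simplicity
```

A perfectly accurate two-term rule out of ten possible terms would score `0.8 * 1.0 + 0.2 * 0.8 = 0.96` under these assumptions; lengthening the rule lowers the score even if accuracy is unchanged.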
5. Conclusion
This paper has proposed a new classification algorithm called DE/QDE, which combines two differential evolution algorithms, i.e. DE and QDE. QDE is an optimization algorithm based on the strategies of the DE algorithm in the binary-valued space. DE/QDE can deal with datasets containing continuous, binary and integer attributes. Because continuous attributes can be used directly in DE/QDE, discretization can be omitted from the data preprocessing step. The weight mutation operator is used to update the weights of the attributes of an individual. DE/QDE has strong search ability and can find high-quality rules. The results on six datasets show that DE/QDE obtains competitive predictive accuracies, although it generates slightly larger rule sets than Ant-Miner. Future research will therefore aim to reduce the size of the rule set and improve its comprehensibility.
References
Au, W.-H., Chan, K. C. C., & Yao, X. (2003). A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation, 7(6), 532–545.
Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases, <http://www.ics.uci.edu/mlearn/MLRepository.html>.
Chiu, C. (2002). A case-based customer classification approach for direct marketing. Expert Systems with Applications, 22(2), 163–168.
Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.
Collard, M., & Francisci, D. (2001). Evolutionary data mining: An overview of genetic-based algorithms. In Proceedings of the IEEE congress on evolutionary computation (pp. 3–9).
de Araujo, D. L. A., Lopes, H. S., & Freitas, A. A. (1999). A parallel genetic algorithm for rule discovery in large databases. In Proceedings of the IEEE congress on systems, man and cybernetics, Tokyo (pp. 940–945).
Falco, I., Cioppa, A., & Tarantino, E. (2002). Discovering interesting classification rules with genetic programming. Applied Soft Computing, 1, 257–269.
Fidelis, M. V., Lopes, H. S., & Freitas, A. A. (2000). Discovering comprehensible classification rules with a genetic algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 805–810).
Han, K.-H., & Kim, J.-H. (2002). Quantum-inspired evolutionary algorithm for a class of combinatorial optimization. IEEE Transactions on Evolutionary Computation, 6(6), 580–593.
Han, K.-H., & Kim, J.-H. (2004). Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate and two-phase scheme. IEEE Transactions on Evolutionary Computation, 8(2), 156–169.
Holden, N., & Freitas, A. A. (2008). A hybrid PSO/ACO algorithm for discovering classification rules in data mining. Journal of Artificial Evolution and Applications, 2008, 11. Article ID 316145. doi:10.1155/2008/316145.
Holland, J. H. (1986). Escaping brittleness: The possibilities of general purpose learning algorithms applied to parallel rule-based systems. Machine Learning: An Artificial Intelligence Approach, 2, 593–623.
Jiao, L., Liu, J., & Zhong, W. (2006). An organizational coevolutionary algorithm for classification. IEEE Transactions on Evolutionary Computation, 10(1), 67–80.
Johnson, H. E., Gilbert, R. J., & Winson, Michael K. (2000). Explanatory analysis of the metabolome using genetic programming of simple interpretable rules. Genetic Programming and Evolvable Machines, 1, 243–258.
Jong, K. A. D., Spears, W. M., & Gordon, D. F. (1993). Using genetic algorithms for concept learning. Machine Learning, 13(2–3), 161–188.
Lampinen, J., & Zelinka, I. (1999). Mixed integer–discrete–continuous optimization by differential evolution, part 1. In Proceedings of the fifth international congress on soft computing.
Liu, J. J., & Kwok, J. T.-Y. (2000). An extended genetic rule induction algorithm. In Proceedings of the IEEE congress on evolutionary computation (pp. 458–463).
Pampara, G., Engelbrecht, A., & Franken, N. (2006). Binary differential evolution. In Proceedings of the IEEE congress on evolutionary computation (pp. 1873–1879).
Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation, 6(4), 321–332.
Smith, S. F. (1983). Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th international congress on artificial intelligence, Karlsruhe, Germany (pp. 422–425).
Sousa, T., Silva, A., & Neves, A. (2004). Particle swarm based data mining algorithms for classification tasks. Parallel Computing, 30, 767–783.
Storn, R., & Price, K. (1997). Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Su, H., & Yang, Y. (2008). Quantum-inspired differential evolution for binary optimization. In The 4th international conference on natural computation (pp. 341–346).