Feature selection based on rough sets and particle swarm optimization
www.elsevier.com/locate/patrec
a Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, China
b Institute of Automation, Shanghai Jiao Tong University, Shanghai 200030, China
c Department of Computer Science, The University of Wales, Aberystwyth, Ceredigion SY23 3DB, Wales, UK
Received 28 December 2004; received in revised form 24 March 2006
Available online 7 November 2006
Communicated by S.K. Pal
Abstract
We propose a new feature selection strategy based on rough sets and particle swarm optimization (PSO). Rough sets have been used as a feature selection method with much success, but current hill-climbing rough set approaches to feature selection are inadequate at finding optimal reductions, as no perfect heuristic can guarantee optimality. On the other hand, complete searches are not feasible for even medium-sized datasets. So, stochastic approaches provide a promising feature selection mechanism. Like genetic algorithms (GAs), PSO is a new evolutionary computation technique, in which each potential solution is seen as a particle with a certain velocity flying through the problem space. The particle swarm finds optimal regions of the complex search space through the interaction of individuals in the population. PSO is attractive for feature selection in that particle swarms will discover the best feature combinations as they fly within the subset space. Compared with GAs, PSO does not need complex operators such as crossover and mutation; it requires only primitive and simple mathematical operators, and is computationally inexpensive in terms of both memory and runtime. Experimentation is carried out, using UCI data, which compares the proposed algorithm with a GA-based approach and other deterministic rough set reduction algorithms. The results show that PSO is efficient for rough set-based feature selection.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Feature selection; Rough sets; Reduct; Genetic algorithms; Particle swarm optimization; Hill-climbing method; Stochastic method
1. Introduction
In many fields such as data mining, machine learning, pattern recognition and signal processing, datasets containing huge numbers of features are often involved. In such cases, feature selection will be necessary (Liu and Motoda, 1998; Guyon and Elisseeff, 2003). Feature selection is the process of choosing a subset of features from the original set of features forming patterns in a given dataset. The subset should be necessary and sufficient to describe the target concepts, retaining a suitably high accuracy in representing the original features. The importance of feature selection is
Given the equivalence class $[x]_P$ of an object $x$ under $IND(P)$, the lower and upper approximations of a set $X \subseteq U$, the positive, negative and boundary regions of a partition $U/Q$ with respect to $P$, and the classification quality of $Q$ with respect to $P$ are defined as:

$\underline{P}X = \{x \in U \mid [x]_P \subseteq X\}$

$\overline{P}X = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$

$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X$

$NEG_P(Q) = U - \bigcup_{X \in U/Q} \overline{P}X$

$BND_P(Q) = \bigcup_{X \in U/Q} \overline{P}X - \bigcup_{X \in U/Q} \underline{P}X$

$\gamma_P(Q) = \frac{|POS_P(Q)|}{|U|}$
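To make these definitions concrete, here is a minimal sketch (our illustration, not code from the paper) that computes the positive region and classification quality for a decision table stored as a list of attribute-value dictionaries; the helper names partition, positive_region and gamma are our own.

    from collections import defaultdict

    def partition(table, attrs):
        """Equivalence classes of IND(attrs): objects indiscernible on attrs."""
        blocks = defaultdict(list)
        for i, obj in enumerate(table):
            blocks[tuple(obj[a] for a in attrs)].append(i)
        return [set(b) for b in blocks.values()]

    def positive_region(table, P, Q):
        """POS_P(Q): union of the P-blocks wholly contained in some Q-block."""
        q_blocks = partition(table, Q)
        pos = set()
        for b in partition(table, P):
            if any(b <= q for q in q_blocks):  # b lies in a lower approximation
                pos |= b
        return pos

    def gamma(table, P, Q):
        """Classification quality gamma_P(Q) = |POS_P(Q)| / |U|."""
        return len(positive_region(table, P, Q)) / len(table)

    # A four-object toy decision table: conditions 'a', 'b', decision 'd'.
    U = [{'a': 0, 'b': 0, 'd': 0},
         {'a': 0, 'b': 1, 'd': 1},
         {'a': 1, 'b': 1, 'd': 1},
         {'a': 1, 'b': 1, 'd': 0}]  # conflicts with the previous object
    print(gamma(U, ['a', 'b'], ['d']))  # 0.5: the last two objects are boundary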
The intersection of all reducts is called the core, the elements of which are those attributes that cannot be eliminated. The core is defined as:

$Core(C) = \bigcap Red(C) \qquad (10)$
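Eq. (10) is simply a set intersection over all reducts. As a one-line illustration with a hypothetical reduct family of our own invention:

    # Hypothetical reducts of a decision table with attributes {a, b, c}.
    reducts = [{'a', 'b'}, {'a', 'c'}]
    core = set.intersection(*reducts)  # Eq. (10): the indispensable attributes
    print(core)                        # {'a'}: 'a' cannot be eliminated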
3. PSO for feature selection
3.1. The principle of PSO
Particle swarm optimization (PSO) is an evolutionary computation technique developed by Kennedy and Eberhart (1995a,b). The original intent was to graphically simulate the graceful but unpredictable movements of a flock of birds. Initial simulations were modified to form the original version of PSO. Later, Shi introduced an inertia weight into the particle swarm optimizer to produce the standard PSO (Shi and Eberhart, 1998a; Eberhart and Shi, 2001).

PSO is initialized with a population of random solutions, called particles. Each particle is treated as a point in an S-dimensional space. The ith particle is represented as Xi = (xi1, xi2, . . ., xiS). The best previous position (pbest, the position giving the best fitness value) of any particle is recorded and represented as Pi = (pi1, pi2, . . ., piS). The index of the best particle among all the particles in the population is represented by the symbol gbest. The rate of the position change (velocity) for particle i is represented as Vi = (vi1, vi2, . . ., viS). The particles are manipulated according to the following equations:
$v_{id} = w \cdot v_{id} + c_1 \cdot \mathrm{rand}() \cdot (p_{id} - x_{id}) + c_2 \cdot \mathrm{Rand}() \cdot (p_{gd} - x_{id}) \qquad (11)$

$x_{id} = x_{id} + v_{id} \qquad (12)$
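A minimal sketch of ours applying Eqs. (11) and (12) to one real-valued particle; c1 = c2 = 2.0 follows the settings in Table 1, while the function name pso_step and the default clamp value vmax are illustrative assumptions.

    import random

    def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, vmax=4.0):
        """Apply Eq. (11) (velocity) and Eq. (12) (position) to one particle."""
        new_x, new_v = [], []
        for d in range(len(x)):
            vd = (w * v[d]
                  + c1 * random.random() * (pbest[d] - x[d])   # cognitive part
                  + c2 * random.random() * (gbest[d] - x[d]))  # social part
            vd = max(-vmax, min(vmax, vd))  # clamp velocity to [-Vmax, Vmax]
            new_v.append(vd)
            new_x.append(x[d] + vd)         # Eq. (12)
        return new_x, new_v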
Fig. 1. (a) PSO process. (b) PSO-based feature selection.

Fig. 2. The principle of updating velocity. Individual particles (1 and 2) are accelerated toward the location of the best solution, gbest, and the location of their own personal best, pbest, in the two-dimensional problem space (feature number and classification quality).
Begin
  Initialize a population of particles with random positions and velocities;
  While (stopping criterion is not met) {
    For each particle {
      Evaluate fitness; update pbest and gbest;
    }
    For each particle {
      Update velocity by Eq. (11) and position by Eq. (12);
    }
    Iter = Iter + 1;
  } /* rand() and Rand() are two random functions in the range [0, 1] */
  Return P_gbest
End
3.2. PSO and rough set-based feature selection (PSORSFS)
We can use the idea of PSO for the optimal feature selection problem. Consider a large feature space full of feature subsets. Each feature subset can be seen as a point or position in such a space. If there are N total features, then there will be 2^N kinds of subsets, different from each other in the length and the features contained in each subset. The optimal position is the subset with the least length and the highest classification quality. Now we put a particle swarm into this feature space; each particle takes one position. The particles fly in this space, and their goal is to fly to the best position. Over time, they change their position, communicate with each other, and search around the local best and global best positions. Eventually, they should converge on good, possibly optimal, positions. It is this exploration ability of particle swarms that should better equip them to perform feature selection and discover optimal subsets.

To apply the PSO idea to feature selection, some matters must first be considered.
3.2.1. Representation of position
We represent a particle's position as a binary bit string of length N, where N is the total number of attributes. Every bit represents an attribute: a value of 1 means the corresponding attribute is selected, while 0 means it is not. Each position is an attribute subset.
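For example (a toy illustration of ours, with hypothetical attribute names and N = 5):

    attributes = ['a1', 'a2', 'a3', 'a4', 'a5']  # N = 5 attributes
    position   = [1, 0, 1, 1, 0]                 # bit d = 1: attribute d is selected

    subset = [a for a, bit in zip(attributes, position) if bit]
    print(subset, sum(position))                 # ['a1', 'a3', 'a4'] 3  (|R| = 3)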
$Fitness = \alpha \cdot \gamma_R(D) + \beta \cdot \frac{|C| - |R|}{|C|} \qquad (13)$

where $\gamma_R(D)$ is the classification quality of condition attribute set R relative to decision D, |R| is the '1' number of a position, i.e. the length of the selected feature subset, and |C| is the total number of features. $\alpha$ and $\beta$ are two parameters corresponding to the importance of classification quality and subset length, with $\alpha \in [0, 1]$ and $\beta = 1 - \alpha$.

This formula means that the classification quality and feature subset length have different significance for the feature selection task. In our experiments we assume that classification quality is more important than subset length and set $\alpha = 0.9$, $\beta = 0.1$. The high $\alpha$ assures that the best position is at least a real rough set reduct. The goodness of each position is evaluated by this fitness function. The criterion is to maximize the fitness value.
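A direct transcription of Eq. (13), as a sketch of ours; in practice the argument gamma_R would come from a dependency computation like the positive-region sketch in Section 2.

    def fitness(position, gamma_R, alpha=0.9):
        """Eq. (13): alpha * gamma_R(D) + beta * (|C| - |R|) / |C|, beta = 1 - alpha."""
        beta = 1.0 - alpha
        C, R = len(position), sum(position)  # total features, subset length
        return alpha * gamma_R + beta * (C - R) / C

    # A full-quality 8-feature subset out of 16, as for Vote in Table 7:
    print(round(fitness([1] * 8 + [0] * 8, gamma_R=1.0), 4))  # 0.95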
4. Experimental results and discussions
Table 1
PSORSFS and GAAR parameter settings

Parameter              GA    PSO
Population             100   20
Generation             100   100
Crossover probability  0.6   –
Mutation probability   0.4   –
c1                     –     2.0
c2                     –     2.0
Weight                 –     1.4 to 0.4
Velocity               –     1 to (1/3) * N
Table 2
Experimental results on 14 datasets

Dataset               Features  Instances  Reduct size  Number of rules  Accuracy (%)  Time (s)
Balloon1              4         20         2            4                100           10.172
Balloon2              4         20         2            8                100           9.75
Balloon3              4         20         2            4                100           10.984
Balloon4              4         16         4            6                80            14.078
Balance-scale         4         625        4            207              88.5          3001.1
Lenses                4         24         4            9                87.6          24.844
Hayes-Roth            4         132        3            15               89.5          95.422
Corral                6         64         4            6                100           36.016
Monk1                 6         124        3            20               93.5          103.406
Monk2                 6         169        6            82               65.4          297.875
Monk3                 6         432        3            9                97.2          327.406
PostoperativePatient  8         90         8            37               59.9          138.75
Parity5+2             10        1024       5            128              100           615.844
Parity5+5             10        1024       5            32               100           722.14
Table 3
Reduct sizes found by feature selection algorithms

Dataset        Features  Instances  POSAR  CEAR  DISMAR  GAAR  PSORSFS
Tic-tac-toe    9         958        8      7     8       8     8
Breastcancer   9         699        4      4     5       4     4
M-of-N         13        1000       7      7     6       6     6
Exactly        13        1000       8      8     6       6     6
Exactly2       13        1000       10^b   11    10^b    11    10^b
Vote           16        300        9      11    8^b     9     8^b
Zoo            16        101        5      10    5       6     5
Lymphography   18        148        6^b    8     7       8     7
Mushroom       22        8124       5      5     6       5     4^a
Led            24        2000       6      12    18      8     5^b
Soybean-small  35        47         2      2     2       6     2
Lung           56        32         4      5     4       6     4
DNA            57        318        7      6     6       7     6

^a Optimal solution.
^b Exclusive optimal solution.
Table 4
Classification results with different reducts

Dataset        POSAR          CEAR           DISMAR         GAAR           PSORSAR        Time (s)
               Rules  Acc(%)  Rules  Acc(%)  Rules  Acc(%)  Rules  Acc(%)  Rules  Acc(%)
Tic-tac-toe     93    94.42   126    77.89   161    86.21    91    93.05    70    96.32   5719
Breastcancer    67    95.94    75    94.20    67    95.94    64    95.65    64    95.80   1686.4
M-of-N          35    100      35    100      35    100      35    100      35    100     1576.6
Exactly         50    100      50    100      50    100      50    100      50    100     1671.6
Exactly2       217    83.7    178    69.6    230    83      200    80.8    217    83.7    5834.0
Vote            25    94.33    25    92.33    28    93.67    25    94.0     25    95.33   424.17
Zoo             13    96.0     13    94.0     13    94       13    92.0     10    96.0    87.5
Lymphography    32    85.71    42    72.14    40    74.29    38    70.0     39    75.71   336.172
Mushroom        19    100      61    90.83    19    100      19    100      23    99.70   14176
Led             10    100     228    83.10   257    78.85    10    100      10    100     1758.3
Soybean-small    5    100       4    100       4    100       4    97.50     4    100     25.719
Lung            11    86.67    13    73.33    14    73.3     12    70.0      8    90.0    26.203
DNA            173    33.23   192    26.45   191    36.45   191    33.87   169    49.68   1667.0
Table 5
Experimental results by RSES

Dataset        Total reducts  Reduct size  Number of rules  Accuracy (%)
Tic-tac-toe    9              8            1085             100
Breastcancer   20             4            455              94.4
M-of-N         1              6            19               92.9
Exactly        1              6            41               85.9
Exactly2       1              10           242              76.4
Vote           2              8            67               92.9
Zoo            33             5            109              96.8
Lymphography   424            6            353              85.9
Mushroom       292            4            480              98.3
Led            140            5            6636             100
Soybean-small  10             2            31               92.5
Lung           10             4            100              75
DNA            10             5            2592             74.3

Note: For the last three datasets (Soybean-small, Lung and DNA), RSES cannot find all reducts by the exhaustive algorithm as it requires too much memory. Instead, we use a fast genetic algorithm to find 10 reducts.
Table 6
PSO searching process on Exactly2

Iter   Best solution              Fitness value  Feature subset length
1      1, 2, 4, 5, 7, 8, 9, ...   0.8272         11
2      1, 2, 4, 5, 6, 7, 8, ...   0.8362         11
3      1, 2, 4, 5, 6, 7, 8, ...   0.8362         11
4-11   1, 2, 4, 5, 6, 7, 8, ...   0.8663         12
12     1, 2, 3, 4, 5, 6, 7, ...   0.9154         11
13     1, 2, 3, 4, 5, 6, 7, ...   0.9231         10
5. Conclusion
This paper discusses the shortcomings of conventional hill-climbing rough set approaches to feature selection. These techniques often fail to find optimal reductions, as no perfect heuristic can guarantee optimality. On the other hand, complete searches are not feasible for even medium-sized datasets. So, stochastic approaches provide a promising feature selection mechanism.

We propose a new optimal feature selection technique based on rough sets and particle swarm optimization (PSO). PSO has the ability to converge quickly (Shi and Eberhart, 1999); it has a strong search capability in the problem space and can efficiently find minimal reducts. Experimental results demonstrate competitive performance. PSO is a promising method for rough set reduction.

More experimentation and further investigation into this technique may be required. The inertia weight (w) and maximum velocity (Vmax) have an important impact on the performance of PSO. The selection of these parameters may be problem-dependent. Vmax serves as a constraint that controls the maximum global exploration ability of PSO. In many practical problems, it is difficult to select the best Vmax without trial and error (Shi and Eberhart, 1998b). In our feature selection problem, we first limited each particle's velocity to N, since [1, N] is the dynamic range of the feature space. But we found that under such a limitation, particles have poor local exploration ability. So after many tests, we set Vmax to (1/3) * N, which is suitable for our problem (see Section 3.2). The inertia weight balances the global and local exploration abilities. In our experiments, we let it decrease from 1.4 to 0.4 over the iterations. Linearly decreasing the inertia weight greatly improves the performance of PSO: the larger inertia weights at the beginning help to find good seeds, and the later small inertia weights facilitate fine search (Shi and Eberhart, 1998b, 1999).
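The two settings just discussed can be written down directly; this is a sketch of ours, with N = 30 as an arbitrary example value.

    def inertia(it, max_iter, w_start=1.4, w_end=0.4):
        """Inertia weight decreasing linearly from 1.4 to 0.4 over the run."""
        return w_start - (w_start - w_end) * it / max_iter

    N = 30                  # total number of features (example value)
    vmax = N / 3.0          # Vmax = (1/3) * N, as chosen above
    print([round(inertia(i, 100), 2) for i in (0, 50, 100)])  # [1.4, 0.9, 0.4]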
Table 7
PSO searching process on Vote

Iter   Best solution                 Fitness value  Feature subset length
1-45   1, 2, 3, 4, 7, 9, 11, 13, 16  0.9437         9
46-52  1, 2, 3, 4, 7, 11, 13, 16     0.9440         8
53     1, 2, 3, 4, 11, 13, 16        0.9443         7
54     1, 2, 3, 4, 9, 11, 13, 16     0.9500         8
Table 8
PSO searching process on Mushroom

Iter   Fitness value  Feature subset length
1      0.9591         9
2      0.9636         8
3      0.9682         7
4-15   0.9773         5
16     0.9818         4
...    ...            ...
Table 9
PSO searching process on Soybean-small

Iter   Fitness value  Feature subset length
1      0.9657         12
2      0.9686         11
3-5    0.9714         10
6      0.9743         9
7      0.9771         8
8-15   0.9800         7
16-19  0.9829         6
20-21  0.9857         5
22-27  0.9886         4
28-32  0.9914         3
33     0.9943         2
...    ...            ...
Table 10
PSO searching process on Lung

Iter   Fitness value  Feature subset length
1      0.9696         17
2-5    0.9714         16
6      0.9732         15
7-8    0.9750         14
9      0.9768         13
10-21  0.9786         12
22-23  0.9804         11
24-25  0.9821         10
26     0.9839         9
27-31  0.9857         8
32     0.9893         6
33-39  0.9911         5
40     0.9929         4
Table 11
PSO searching process on DNA

Iter   Fitness value  Feature subset length
1      0.9684         18
2      0.9702         17
3      0.9737         15
4-7    0.9772         13
8-11   0.9789         12
12-15  0.9807         11
16-19  0.9825         10
20-23  0.9842         9
24-27  0.9860         8
28-31  0.9877         7
32     0.9895         6