S K Padma, Y S Nijagunarya
I. INTRODUCTION
In conventional data analysis the objects are numerical
vectors. Clustering of such vectors is achieved by
minimizing the intra-cluster dissimilarity and maximizing
the inter-cluster dissimilarity. Many different approaches
have been devised to handle this type of data [1] [2].
Symbolic objects are extensions of classical data types. The
main distinction between the two forms of data is that in
classical data the objects are more individualized,
whereas in the symbolic framework they are more unified
by relationships. Symbolic objects are defined as the logical
conjunction of events linking values and variables.
For example, e1 = [Color = (white, blue)], e2 = [height =
(1.5-2.0)]; here the variable Color in event e1 takes the value
either white or blue, whereas the variable height in event e2
takes a value between 1.5 and 2.0.
1 This research was supported by the Ministry of Information & Communication (MIC), Korea under the IT Foreign Specialist Programme (ITFSIP)
supervised by the Institute of Information Technology Advancement (IITA).
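The two example events in the introduction can be written down directly in code. The following is only an illustrative sketch: the names Event, admits and satisfies are hypothetical and do not come from the paper. A qualitative value is assumed to be a set of categories and a quantitative value a closed interval.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# Assumption: a feature value is either a set of categories
# or a closed numeric interval (low, high).
Value = Union[FrozenSet[str], Tuple[float, float]]


@dataclass(frozen=True)
class Event:
    """One event [variable = values], e.g. [Color = (white, blue)]."""
    variable: str
    value: Value

    def admits(self, observed) -> bool:
        """Return True if a single observed value is allowed by this event."""
        if isinstance(self.value, frozenset):
            return observed in self.value
        low, high = self.value
        return low <= observed <= high


def satisfies(events, individual) -> bool:
    """A symbolic object is a conjunction: every event must admit the individual."""
    return all(
        e.variable in individual and e.admits(individual[e.variable])
        for e in events
    )


# The two events used as the example in the text.
e1 = Event("Color", frozenset({"white", "blue"}))
e2 = Event("height", (1.5, 2.0))
symbolic_object = [e1, e2]
```

An individual described as {"Color": "blue", "height": 1.7} satisfies both events, while one with Color red or height 2.3 does not.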
In general, symbolic data have both quantitative (numeric
or interval) and qualitative attributes. There exist
three types of symbolic data, namely Assertion, Hoard and
Synthetic. Clustering and classification of such data
require specialized schemes, and most of the reported
procedures use similarity and dissimilarity measures. The
similarity and dissimilarity between two symbolic objects are
defined in terms of three components, namely position, span
and content [1] [2]. In the context of mining
important information from large complex data types, such as
multimedia data, it becomes imperative to develop methods
that have generalization ability. Most of the data generated
today closely resemble symbolic data. This work presents
some novel methods for the classification of symbolic
objects using machine learning techniques.
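To make the position, span and content components concrete, the sketch below follows the per-feature dissimilarity as it is usually stated for the measure of [1]. This is an assumption: the measure actually used in this paper is not recoverable from the text, so the code should not be read as the authors' formula. The function names and the feature_range parameter (the length of the full range of the feature over the data) are hypothetical.

```python
def interval_dissimilarity(a, b, feature_range):
    """Position + span + content dissimilarity for two intervals (low, high).

    Assumed form: each component is normalized, position by the overall
    feature range and span/content by the span covered by both intervals.
    """
    a_low, a_high = a
    b_low, b_high = b
    len_a = a_high - a_low
    len_b = b_high - b_low
    # Length of the smallest interval covering both a and b.
    span = max(a_high, b_high) - min(a_low, b_low)
    # Length of the overlap, zero when the intervals are disjoint.
    overlap = max(0.0, min(a_high, b_high) - max(a_low, b_low))

    d_position = abs(a_low - b_low) / feature_range
    if span == 0:
        # Both intervals are the same single point.
        return d_position
    d_span = abs(len_a - len_b) / span
    d_content = abs(len_a + len_b - 2 * overlap) / span
    return d_position + d_span + d_content


def set_dissimilarity(a, b):
    """Span + content dissimilarity for two qualitative value sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0
    common = len(a & b)
    d_span = abs(len(a) - len(b)) / union
    d_content = (len(a) + len(b) - 2 * common) / union
    return d_span + d_content
```

For instance, interval_dissimilarity((1.5, 2.0), (1.8, 2.5), 2.0) adds a position term of 0.15, a span term of 0.2 and a content term of 0.8. Qualitative features have no position component in this formulation, so set_dissimilarity uses span and content only.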
Analysis of symbolic data has been explored and expanded
by several researchers, such as Edwin Diday and K. C.
Gowda [1], Ichino [3] and D. S. Guru [4]. All of them
have viewed the analysis of symbolic objects from different
mathematical frameworks and reported good results. However,
none of the available techniques provides
good generalization for the test samples. In other words,
neural computing techniques have not been tried on symbolic
objects, except in a recent work by S. Mitra [5]. In that paper
the authors take a benchmark dataset from the UCI
machine learning repository, propose schemes to find
clusters with respect to medoids, train fuzzy radial basis
function (RBF) neural networks on the samples, and report
very good results. The only drawback of
that scheme is the fixed architecture of the network:
the medoids serve as the fixed optimal centers for the RBF
neural network. While this method works well for small data
sets, for larger data sets the algorithm incurs an additional
computational burden, since it involves the calculation of
fuzzy membership functions.
In our work, we propose a very simple approach to the
classification of the symbolic objects. The main contributions
of this work are:
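Let [symbol lost] be the number of samples. The similarity between [object lost] and [object lost] is given by

[expression lost] (1)

where [term lost] = 1 if [condition lost] and = 0 if [condition lost].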
IV. TRAINING
Training is conducted with the full dataset as well as with selected significant patterns. The algorithm for training the network,
along with the notation used, is given in the following sections.
A. Notations
The following are the notations used in the algorithm
presented below:
Pattern index
Input pattern
Desired output
Width of RBF
Learning rate
Center adaptation parameter for BMU
Center adaptation parameter for non-BMU
Output layer
Hidden layer
Input layer
Actual output
Activation output
Error
RBF center
Weight between output and hidden neurons
Best Matching Unit
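The quantities listed above (RBF width, learning rate, separate center adaptation parameters for the BMU and for non-BMU units, hidden-to-output weights) suggest one conventional way of combining them in a single training step. The sketch below is an assumption and not the authors' algorithm, which is not recoverable from the text: it uses a Gaussian activation, a delta-rule weight update, a single output neuron, and moves every center toward the input pattern with a larger rate for the BMU. All function and parameter names are hypothetical.

```python
import math


def gaussian(x, center, width):
    """Gaussian RBF activation for input x around a center."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def train_step(x, desired, centers, weights, width,
               learning_rate, bmu_rate, non_bmu_rate):
    """One pattern presentation; updates centers and weights in place.

    Returns the output error before the update and the index of the BMU.
    """
    activations = [gaussian(x, c, width) for c in centers]
    actual = sum(w * a for w, a in zip(weights, activations))
    error = desired - actual

    # Delta rule on the hidden-to-output weights.
    for j, a in enumerate(activations):
        weights[j] += learning_rate * error * a

    # With equal widths, the best matching unit is the center nearest
    # to x, i.e. the hidden unit with the highest activation.
    bmu = max(range(len(centers)), key=lambda j: activations[j])

    # Move each center toward x: strongly for the BMU, weakly otherwise.
    for j, c in enumerate(centers):
        rate = bmu_rate if j == bmu else non_bmu_rate
        centers[j] = [ci + rate * (xi - ci) for xi, ci in zip(x, c)]

    return error, bmu
```

Calling train_step repeatedly on the same pattern reduces the magnitude of the returned error, and the BMU center drifts toward that pattern. The varying number of RBFs reported in Table III suggests that the network in the paper also changes size during training; that part is not sketched here.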
TABLE I

S.N   Error
1     0.32
2     0.30
3     0.28
4     0.26
5     0.24
6     0.22
7     0.21
8     0.20
9     0.19
10    0.18
11    0.17
12    0.16
13    0.15
14    0.14
15    0.13
16    0.12
17    0.11
18    0.10
19    0.005

TABLE II

Percentage   Number
10           42
20           69
30           95
40           122
50           148
60           175
70           202
75           219
80           228
90           255
100          266
TABLE III
SIGNIFICANT PATTERNS

%     #     Epochs   RBFs
100   266   2783     68
90    255   2225     60
80    228   2539     67
75    219   2501     59
70    202   2407     68
60    175   2221     62
50    148   1850     53
40    122   1511     47
30    95    1306     44
20    69    1276     39
10    42    853      28

[Fig. 1 and Fig. 2: plots over Epochs; surviving labels: Error, RBF units, 70% Significant patterns]

TABLE IV
% GENERALIZATION ACHIEVED

%     #     Generalization
100   266   100.00
90    255   99.60
80    228   97.71
75    219   97.71
70    202   99.20
60    175   95.86
50    148   89.09
40    122   84.59
30    95    74.81
20    69    65.79
10    42    51.50
REFERENCES
[1] K. C. Gowda and E. Diday, "Symbolic clustering using a new dissimilarity
measure," Pattern Recognition, vol. 24, no. 6, pp. 567-578, 1991.
[2] K. C. Gowda and T. V. Ravi, "Divisive clustering of symbolic objects
using the concepts of both similarity and dissimilarity," Pattern Recognition, vol. 28, no. 8, pp. 1277-1282, 1995.
[3] M. Ichino and H. Yaguchi, "Generalized Minkowski metrics for mixed
feature-type data analysis," IEEE Transactions on Systems, Man and
Cybernetics, vol. 24, no. 4, 1994.
[4] D. S. Guru, B. B. Kiranagi, and P. Nagabhushan, "Multivalued type
proximity measure and concept of mutual similarity value useful for
clustering symbolic patterns," Pattern Recognition Letters, vol. 25, pp.
1203-1213, 2004.