
2007 International Symposium on Information Technology Convergence

Classification of Symbolic Objects using Adaptive Auto-Configuring RBF Neural Networks

T N Nagabhushan, Hanseok Ko, Junbum Park
Department of Electronics & Computer Engineering
Korea University
Anam dong, Seongbuk-gu, Seoul 136-713, Korea
nagabhushan@ispl.korea.ac.kr, hsko@korea.ac.kr

S K Padma, Y S Nijagunarya
Department of Information Science & Engineering
S J College of Engineering, Mysore 570 006, India.
skp@sjce.ac.in, nijagunarya@yahoo.com

Abstract: Symbolic data represent a general form of classical data, and their analysis has been the subject of focused research in recent years. Since many future applications will involve such general forms of data, there is a need to explore novel methods to analyze them. In this paper we present two simple novel approaches for the classification of symbolic data.¹

In the first step, we show the representation of symbolic data in binary form and then use a simple Hamming-distance measure to obtain clusters from the binarised symbolic data. This yields the class label and the number of samples in each cluster. In the second part we pick a specific percentage of significant data samples in each cluster and use them to train an adaptive auto-configuring RBF neural network. The training automatically builds an optimal architecture for the presented samples. The complete dataset has been used to test the generalization ability of the RBF network. We demonstrate the proposed approach on the soybean benchmark dataset and discuss the results. The proposed neural network is found to work well for symbolic data, opening further investigations for data mining applications.

Key words: Auto-configuring neural networks, Incremental learning, RBF, Significant patterns.

I. INTRODUCTION

In conventional data analysis the objects are numerical vectors. Clustering of such numerical vectors is achieved by minimizing the intra-cluster dissimilarity and maximizing the inter-cluster dissimilarity, and many different approaches have been devised to handle this type of data [1] [2]. Symbolic objects are extensions of classical data types. The main distinction between the two forms of data is that classical objects are more individualized, whereas in the symbolic framework objects are unified by relationships. Symbolic objects are defined as the logical conjunction of events linking values and variables. For example, consider e1 = [Color = (white, blue)] and e2 = [height = (1.5-2.0)]: the event e1 states that the color is either white or blue, whereas e2 states that the height lies between 1.5 and 2.0.

¹ This research was supported by the Ministry of Information & Communication (MIC), Korea under the IT Foreign Specialist Programme (ITFSIP) supervised by the Institute of Information Technology Advancement (IITA).

0-7695-3045-1/07 $25.00 © 2007 IEEE    DOI 10.1109/ISITC.2007.31
In general, symbolic data have both quantitative (numeric or interval) and qualitative attributes. There exist three types of symbolic data, namely Assertion, Hoard and Synthetic. Clustering and classification of such data require specialized schemes, and most of the reported procedures use similarity and dissimilarity measures. The similarity and dissimilarity between two symbolic objects are defined in terms of three components, namely position, span and content [1] [2]. In the context of mining important information from large complex data types, such as multimedia data, it becomes imperative to develop methods that have generalization ability. Much of the data generated today closely resembles symbolic data. This work presents novel methods for the classification of symbolic objects using machine learning techniques.

Analysis of symbolic data has been explored and expanded by several researchers such as Edwin Diday and K C Gowda [1], Ichino [3] and D S Guru [4]. All of them have viewed the analysis of symbolic objects from different mathematical frameworks and reported good results, but none of the available techniques has the ability to provide good generalization for test samples. In other words, neural computing techniques have not been tried on symbolic objects, except in a recent work by S. Mitra [5]. In that work the authors take a benchmark dataset from the UCI machine learning repository, propose schemes to find clusters with respect to medoids, train the samples using a fuzzy radial basis function neural network, and report very good results. The main drawback of the proposed scheme is the fixed architecture of the network: the medoids serve as the fixed optimal centers of the RBF neural network. While this method works well for small data sets, for larger data sets the algorithm incurs an additional computational burden since it involves the calculation of fuzzy membership functions.

In our work, we propose a very simple approach to the classification of symbolic objects. The main contributions of this work are:


1) Conversion of symbolic data into homogeneous binary strings.
2) Using the binary form of the symbolic data sets, we determine the similarity between objects using Hamming distance, and from this similarity index we obtain the clusters. The entire procedure is very simple and can easily be applied to larger data sets.
3) We then compute the medoid of each cluster of binary symbolic data. We employ the farthest-neighbor concept with respect to the medoid within a cluster and select a specified number of samples for training the network.
The adaptive auto-configuring RBF neural network proposed in our earlier work [6] has been used for synthesizing the architecture. The learning and generalization features of the proposed techniques are illustrated on the standard benchmark soybean dataset from the UCI machine learning repository. The next section introduces our new similarity measure for symbolic objects.
II. A NEW SIMILARITY MEASURE FOR SYMBOLIC OBJECTS

It is well known that symbolic objects exist in a generalized form in several applications, predominantly in image processing. There exist many methods for clustering symbolic objects. In our work we propose a simple approach to compute the similarity between symbolic objects through a homogeneous binary format covering both quantitative and qualitative features. Our new method doesn't require the traditional formulae; it uses a simple representation of each symbolic object as a concatenated binary string. We compute the similarity indices for the binary equivalents of symbolic objects as follows.
Let $S_p = (s_{p1}, s_{p2}, \ldots, s_{pn})$, $s_{pi} \in \{0, 1\}$, denote a binarised symbolic object of $n$ bits, and let $N$ be the number of samples. Consider two strings $S_p$ and $S_q$. The similarity between $S_p$ and $S_q$ is given by

$$ Sim(S_p, S_q) = \frac{1}{n} \sum_{i=1}^{n} \delta_i \qquad (1) $$

where $\delta_i = 1$ if $s_{pi} = s_{qi}$ and $\delta_i = 0$ otherwise.
Equation 1 constitutes a new measure which defines the similarity between two samples $S_p$ and $S_q$ as the ratio of the number of matching bits in corresponding positions of the two strings to the total number of bits in the strings. It is seen from the equation that the computation of similarity values is simple and direct when compared to traditional methods. We employ the agglomerative clustering algorithm to cluster the symbolic objects using the similarity indices computed from Equation 1. This procedure has been applied to the soybean data, and the class labels obtained are in concurrence with the benchmark dataset.
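To make the measure concrete, the following Python sketch computes the similarity index of Equation 1 and feeds the resulting dissimilarities to an off-the-shelf agglomerative clustering routine. The function names and the choice of SciPy's complete-linkage hierarchy are our own illustration, not a reconstruction of the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def similarity(s_p, s_q):
    """Equation 1: fraction of matching bits between two binary strings."""
    s_p, s_q = np.asarray(s_p), np.asarray(s_q)
    return float(np.mean(s_p == s_q))

def cluster_binary_objects(X, num_clusters):
    """Agglomerative clustering of binarised symbolic objects.

    X is an (N, n) array of 0/1 attributes. The similarity is converted
    to a dissimilarity (1 - similarity) so it can serve as a distance.
    """
    N = X.shape[0]
    dist = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            d = 1.0 - similarity(X[i], X[j])
            dist[i, j] = dist[j, i] = d
    Z = linkage(squareform(dist), method='complete')
    return fcluster(Z, t=num_clusters, criterion='maxclust')

# Example: four 6-bit objects forming two obvious clusters.
X = np.array([[1, 0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 1, 1],
              [0, 1, 0, 1, 1, 1]])
print(cluster_binary_objects(X, num_clusters=2))  # e.g. [1 1 2 2]
```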
III. SELECTION OF SIGNIFICANT PATTERNS

It is known that the training patterns control the dynamics of the neural network architecture: the larger the number of patterns, the longer the training times and the larger the generated network. In many real-life situations, the training patterns often have high dimensions besides being voluminous. In such situations, training a neural network is laborious and sometimes frustrating too. Even though incremental learning algorithms offer a better training procedure, they too end up with a large number of training cycles.

If the training samples are structured and compact, then neural networks can learn fast. On the other hand, if the training patterns are unstructured and noisy, then training on all of them results in an overfitting architecture, and at the expense of more resources. Therefore it becomes imperative to take a look at the input patterns themselves and choose those which lead to good learning and generalization. Samples which aid in good learning and generalization are called informative patterns, significant patterns or representative patterns. Theoretical procedures to compute upper bounds on the number of training samples needed for a specified level of generalization are available; a widely used tool is the Vapnik-Chervonenkis (VC) dimension [7] [8] [9]. Experimental results have shown that acceptable generalization performance can be obtained with training sets smaller than those specified by the VC dimension [10].
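As a concrete instance of such a bound (our paraphrase of the result in [8]), a feedforward network with $W$ weights and $N$ computational units can achieve generalization error below $\epsilon$ with on the order of

$$ m = O\!\left( \frac{W}{\epsilon} \log \frac{N}{\epsilon} \right) $$

training examples; the experiments reported in [10] indicate that considerably fewer examples often suffice in practice.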
In the context of classification problems, neural networks define a classification boundary. While generating the decision boundaries, a neural network often uses all patterns, many of which are significant and many of which are not. The non-significant patterns, which may be outliers, often consume the most training cycles during learning and therefore need to be removed to improve the learning performance. In this work, we propose a simple approach to select significant patterns with respect to the medoid of each class.
A. Problem Definition

Given a set $V$ of input vectors, the problem is to obtain a subset of $V$ such that the vectors in the subset achieve the desired generalization level.

Let $V = \{v_1, v_2, \ldots, v_n\}$ be the $n$ vectors which constitute the samples in the input space, $v_i \in R^d$. To obtain a subset $V_s$ of $V$, select $k$ samples from the given training set such that $V_s = \{v_1, v_2, \ldots, v_k\}$, $k \le n$, where $v_1, v_2, \ldots, v_k$ constitute the significant patterns. Let the entire set of pattern vectors $v_1, v_2, \ldots, v_n$ belong to $c$ classes $C_1, C_2, \ldots, C_c$.

For each class $C_j$:

a) Determine the medoid $m_j$.
b) Calculate the distance between each of the input samples in the class and the medoid $m_j$.
c) Choose a known percentage of the samples which are farthest from the medoid. These are the samples that represent this class in the training set. In addition to those chosen, also include the medoid itself.
d) Repeat steps [a] through [c] for all the classes in the given input set.
e) Append all the selected samples from all the classes to form the training set $V_s$.

The above procedure has been applied to the benchmark data, and the patterns derived are shown in Table II.

IV. TRAINING WITH INCREMENTAL LEARNING ADAPTIVE RBF NEURAL NETWORK

As mentioned earlier, there exist many versions of incremental algorithms for synthesizing RBF networks; each is complex in its own way and bears its own benefits. We have modified the incremental learning algorithm proposed by Fritzke [11] and used it for training on the significant samples. The modified algorithm has the ability to adapt its learning parameters, which control the movement of the RBF centers in the input space [6].

Training is conducted with the full dataset as well as with the selected significant patterns. The algorithm for training the network, along with the notations used, is given in the following sections.

A. Notations

The following notations are used in the algorithm presented below:

$p$ : pattern index
$x_p$ : input pattern
$d_p$ : desired output
$\sigma$ : width of an RBF unit
$\eta$ : learning rate
$\epsilon_b$ : center adaptation parameter for the BMU
$\epsilon_n$ : center adaptation parameter for non-BMU units
$O$ : output layer
$H$ : hidden layer
$I$ : input layer
$y_p$ : actual output
$\phi$ : activation output
$E$ : error
$c_j$ : RBF center
$w_{kj}$ : weight between output and hidden neurons
BMU : best matching unit

B. Algorithm for Adaptive RBF (ARBF) Network

1) Select two random vectors from the input space as the initial RBF centers and connect them by an edge. Set their widths equal to the distance between them, that is

$$ \sigma_1 = \sigma_2 = \| c_1 - c_2 \| \qquad (2) $$

where $c_1$ and $c_2$ are the coordinates of the two selected RBF units.

2) For a given training pattern $(x_p, d_p)$, calculate the output of the RBF network using

$$ y_{pk} = \sum_j w_{kj} \, \phi_j(x_p) \qquad (3) $$

where

$$ \phi_j(x_p) = \exp\!\left( -\frac{\| x_p - c_j \|^2}{\sigma_j^2} \right) \qquad (4) $$

Calculate the error using

$$ e_{pk} = d_{pk} - y_{pk} \qquad (5) $$

$$ E = \sum_k e_{pk}^2 \qquad (6) $$

3) Set $\epsilon_b$ using Table I and calculate $\epsilon_n$ from it. For every input pattern $x_p$ that is presented, find the best matching unit (BMU) using

$$ s = \arg\min_j \| x_p - c_j \| \qquad (7) $$

Move the BMU $\epsilon_b$ times the current distance towards the input pattern using

$$ \Delta c_s = \epsilon_b \, (x_p - c_s) \qquad (8) $$

Move all the immediate neighbors of the BMU $\epsilon_n$ times their current distance towards the input pattern using

$$ \Delta c_k = \epsilon_n \, (x_p - c_k) \qquad (9) $$

4) Update the weights between the hidden and output units using

$$ \Delta w_{kj} = \eta \, e_{pk} \, \phi_j(x_p) \qquad (10) $$

where $\eta$ is the learning rate, a small value between 0 and 1.

5) The widths of the RBF units are computed using age information. For each input presented, compute the BMU and the next-best matching unit, connect them by an edge, and associate the edge with an age variable. When an edge is created, its age is set to zero. The age of all edges emanating from the BMU is increased at every adaptation step. Edges exceeding an age limit $A$ are deleted, and so are nodes left with no emanating edges. The insertion of a new RBF unit is based on the squared error accumulated across all the output units.

TABLE I
Lookup table for choosing $\epsilon_b$

S.N   Error
1     0.32
2     0.30
3     0.28
4     0.26
5     0.24
6     0.22
7     0.21
8     0.20
9     0.19
10    0.18
11    0.17
12    0.16
13    0.15
14    0.14
15    0.13
16    0.12
17    0.11
18    0.10
19    0.005

6) An RBF unit is inserted between the unit which has accumulated the maximum error and any of its neighbors. Its weights are set to small random values, and its width is set to the mean distance to the neighboring units.
7) The learning rate is decremented linearly by a small value during the convergence cycle.
8) Repeat steps 2 to 7 until the classification error for all the patterns falls below a set value.

It is evident from the algorithm that the RBF units are subjected to movement in the input space during the entire learning phase. The insertion and movement of RBF units are carefully controlled by the adaptation parameters $\epsilon_b$ and $\epsilon_n$ and the learning rate $\eta$. These parameters help to synthesize optimal network architectures.
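To make the update rules concrete, the following Python sketch implements one adaptation step of the ARBF network, i.e. Equations (2) to (10). It is a minimal illustration under our own naming, data-layout and parameter-value assumptions; the edge-ageing, width recomputation and unit-insertion bookkeeping of steps 5 to 7 are omitted for brevity.

```python
import numpy as np

def rbf_outputs(x, centers, widths):
    """Equation (4): Gaussian activations of all hidden units."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / widths ** 2)

def adapt_step(x, d, centers, widths, W, neighbors, eps_b, eps_n, eta):
    """One adaptation step: output, error, center moves and weight update."""
    phi = rbf_outputs(x, centers, widths)                     # Eq. (4)
    y = W @ phi                                               # Eq. (3)
    e = d - y                                                 # Eq. (5)
    s = int(np.argmin(np.sum((centers - x) ** 2, axis=1)))    # Eq. (7): BMU
    centers[s] += eps_b * (x - centers[s])                    # Eq. (8)
    for k in neighbors.get(s, []):                            # Eq. (9)
        centers[k] += eps_n * (x - centers[k])
    W += eta * np.outer(e, phi)                               # Eq. (10)
    return float(np.sum(e ** 2))                              # Eq. (6)

# Step 1: two random initial units joined by an edge, widths per Eq. (2).
rng = np.random.default_rng(0)
centers = rng.random((2, 105))                 # 105 binary input attributes
widths = np.full(2, np.linalg.norm(centers[0] - centers[1]))
W = 0.01 * rng.random((15, 2))                 # 15 output classes
neighbors = {0: [1], 1: [0]}                   # the initial edge

x = rng.integers(0, 2, 105).astype(float)      # a dummy binarised pattern
d = np.eye(15)[3]                              # one-hot desired output
print(adapt_step(x, d, centers, widths, W, neighbors, 0.05, 0.006, 0.1))
```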
V. EXPERIMENTS

Since most improvements in RBF network construction have been illustrated on well-defined benchmark datasets from the UCI machine learning repository [12], we have also used the soybean dataset for our experiments and comparison. The dataset is briefly described below.

A. Dataset

SOYBEAN LARGE data: The soybean data is available in two forms, small and large. We have used the large dataset, which has more samples. The dataset has 307 instances belonging to 19 classes, but 41 of the 307 samples have missing attributes. After deleting these 41 samples, the dataset has 266 patterns belonging to 15 classes. Each pattern has 35 attributes.
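A minimal preprocessing sketch appears below. The file name, the column layout (class label first) and '?' as the missing-value marker follow the usual UCI distribution of the soybean-large data, and should be treated as assumptions rather than a prescription.

```python
import pandas as pd

# soybean-large.data: class name in column 0, then 35 categorical attributes.
df = pd.read_csv('soybean-large.data', header=None, na_values='?')
df = df.dropna()                      # drops the instances with missing values
labels = df[0]
attributes = df.drop(columns=0)
print(len(df), labels.nunique())      # expected here: 266 patterns, 15 classes
```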

TABLE II
Soybean data samples selected for training

Percentage   Number
10           42
20           69
30           95
40           122
50           148
60           175
70           202
75           219
80           228
90           255
100          266

We have used the binarised representation of the available dataset. After binarisation, each pattern is a 105-attribute vector consisting of the binary equivalents of the original data. We have therefore trained and tested on data having 105 attributes belonging to 15 classes.
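As a hypothetical illustration of this kind of encoding (the actual soybean binarisation is taken from the distributed data, not recomputed here), each categorical attribute can be one-hot encoded over its value domain and the resulting bit groups concatenated:

```python
import numpy as np

def binarise(values, domain):
    """One-hot encode a categorical attribute over its value domain."""
    bits = np.zeros(len(domain), dtype=int)
    for v in values:                 # a symbolic attribute may hold a set
        bits[domain.index(v)] = 1
    return bits

# Hypothetical symbolic object: Color = {white, blue}, Size = {small}.
color_domain = ['white', 'blue', 'red']
size_domain = ['small', 'medium', 'large']
pattern = np.concatenate([binarise(['white', 'blue'], color_domain),
                          binarise(['small'], size_domain)])
print(pattern)  # [1 1 0 1 0 0]
```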
B. Training Set Generation

We have investigated two approaches to selecting significant training patterns. In the first approach the Euclidean norm is computed with respect to the mean of the samples in a class, and a percentage of the samples is then picked for training. In the second approach the medoid is used as the reference point and samples are picked with respect to it. Both approaches have been used to select training samples whose number is progressively increased in order to study the optimality of the generated architecture. We have tabulated results using the medoid only, since the medoid is an actual sample in the input space whereas the mean is a nonexistent sample; nevertheless, the results from both approaches are the same. Table II shows the number of samples selected for training.
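A sketch of the medoid-based selection, steps [a] to [e] of Section III, might look as follows; the Hamming-based distance and all function names are our own assumptions.

```python
import numpy as np

def hamming(a, b):
    """Number of differing bits between two binary patterns."""
    return int(np.sum(a != b))

def medoid_index(C):
    """Index of the sample minimising the total distance to its class."""
    D = np.array([[hamming(a, b) for b in C] for a in C])
    return int(np.argmin(D.sum(axis=1)))

def select_significant(X, y, fraction):
    """Pick the medoid plus the given fraction of farthest samples per class."""
    chosen = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        m = idx[medoid_index(X[idx])]
        dists = np.array([hamming(X[i], X[m]) for i in idx])
        k = max(1, int(round(fraction * len(idx))))
        farthest = idx[np.argsort(dists)[::-1]][:k]
        chosen.extend([m] + [i for i in farthest if i != m])
    return np.array(sorted(set(chosen)))

# e.g. the 70% training set of Table II: select_significant(X, y, 0.70)
```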
C. Learning Characteristics

The adaptive RBF network is trained with the different proportions of significant patterns shown in Table II. Each set of significant patterns is used to synthesize an optimal RBF architecture; thus in the proposed study we obtained 11 different RBF architectures. Table III shows the number of RBF units generated and the epochs taken by the various significant pattern sets.

Figures 1 and 2 show the learning curves for 70% significant samples. It can be seen that 70% significant samples yielded the best results when compared with the remaining pattern sets.
D. Testing and Generalization

We have used all 266 patterns of the soybean dataset to test the classification accuracy of the generated RBF architectures. Table IV shows the generalization produced by the networks.
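The generalization figures of Table IV amount to a plain percentage-accuracy computation over all 266 patterns; a minimal helper (our illustration) is:

```python
import numpy as np

def generalization(predicted, actual):
    """Percentage of correctly classified patterns, as reported in Table IV."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return 100.0 * float(np.mean(predicted == actual))

print(generalization([1, 2, 2, 3], [1, 2, 2, 2]))  # 75.0
```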


TABLE III
Results: Epochs & RBF units for different compositions of significant patterns

%     #     Epochs   RBFs
100   266   2783     68
90    255   2225     60
80    228   2539     67
75    219   2501     59
70    202   2407     68
60    175   2221     62
50    148   1850     53
40    122   1511     47
30    95    1306     44
20    69    1276     39
10    42    853      28

TABLE IV
% Generalization achieved

%     #     Generalization
100   266   100.00
90    255   99.60
80    228   97.71
75    219   97.71
70    202   99.20
60    175   95.86
50    148   89.09
40    122   84.59
30    95    74.81
20    69    65.79
10    42    51.50

Fig. 1. Epochs vs. Error for 70% soybean data (learning curve for 70% significant patterns; plot not reproduced).

Fig. 2. Epochs vs. RBF units for 70% soybean data (learning curve for 70% significant patterns; plot not reproduced).

It is seen that the generalization levels are poor in the lower half of the table. This can be attributed to the small number of patterns present in some of the clusters: picking significant patterns from a small pool does not yield good results. For this dataset specifically, the 70% significant patterns picked using the farthest-neighbour principle with respect to the medoid yielded good results both in terms of network size and training time.
VI. CONCLUSIONS

In this work, we have shown that symbolic data can be classified using an auto-configuring RBF network with good generalization. Further applications in multimedia data mining are under investigation.

REFERENCES

[1] K. C. Gowda and E. Diday, "Symbolic clustering using a new dissimilarity measure," Pattern Recognition, vol. 24, no. 6, pp. 567-578, 1991.
[2] K. C. Gowda and T. V. Ravi, "Divisive clustering of symbolic objects using the concepts of both similarity and dissimilarity," Pattern Recognition, vol. 28, no. 8, pp. 1277-1282, 1995.
[3] M. Ichino and H. Yaguchi, "Generalized Minkowski metrics for mixed feature-type data analysis," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 4, 1994.
[4] D. S. Guru, B. B. Kiranagi, and P. Nagabhushan, "Multivalued type proximity measure and concept of mutual similarity value useful for clustering symbolic patterns," Pattern Recognition Letters, vol. 25, pp. 1203-1213, 2004.
[5] K. Mali and S. Mitra, "Symbolic classification, clustering and fuzzy radial basis function network," Fuzzy Sets and Systems, vol. 152, pp. 553-564, 2005.
[6] T. N. Nagabhushan and S. K. Padma, "Adaptive learning in incremental learning RBF networks," ICONIP 2004, pp. 471-476, 2004.
[7] Y. S. Abu-Mostafa, "The Vapnik-Chervonenkis dimension: Information versus complexity in learning," Neural Computation, vol. 1, pp. 312-317, 1989.
[8] E. B. Baum and D. Haussler, "What size net gives valid generalization?" Advances in Neural Information Processing Systems, vol. 1, pp. 81-90, 1989.
[9] M. Opper, "Learning and generalization in a two-layer neural network: The role of the Vapnik-Chervonenkis dimension," Physical Review Letters, vol. 72, pp. 2113-2116, 1994.
[10] D. Cohn and G. Tesauro, "Can neural networks do better than the Vapnik-Chervonenkis bounds?" Advances in Neural Information Processing Systems, vol. 3, pp. 911-917, 1991.
[11] B. Fritzke, "Supervised learning with growing structures," Advances in Neural Information Processing Systems, vol. 6, pp. 255-262, 1994.
[12] C. Blake and C. Merz, "UCI repository of machine learning databases," 1998.
