
Chapter 15: DATA CLUSTERING ALGORITHMS

Jyh-Shing Roger Jang et al., Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence,
First Edition, Prentice Hall, 1997
Clustering partitions a data set into
several groups such that the similarity
within a group is larger than that among
groups.
Such partitioning requires a similarity metric that takes two input vectors and returns a value reflecting their similarity.
Clustering techniques are used in conjunction with radial basis function (RBF) networks or fuzzy modeling, primarily to determine initial locations for radial basis functions or fuzzy if-then rules.
Clustering techniques are validated on the basis of the following assumptions.
Assumptions
1. Similar inputs to the target system to be modeled should produce similar outputs.
This states that the target system to be modeled is a smooth input-output mapping.
2. These similar input-output pairs are bundled into clusters in the training data.
This assumption requires the data set to conform to some specific type of distribution.
Off-line clustering techniques
K-means (or Hard C-means) Clustering,
Fuzzy C-means Clustering,
Mountain Clustering, and
Subtractive Clustering.
K-means Clustering
K-means clustering, or Hard C-means clustering, is an algorithm that finds clusters in a data set such that a cost function (or objective function) of a dissimilarity (distance) measure is minimized.
Usually, the Euclidean distance is chosen as the dissimilarity measure.
A set of n vectors x_j, j = 1, ..., n, is to be partitioned into c groups G_i, i = 1, ..., c. The cost function, based on the Euclidean distance between a vector x_k in group i and the corresponding cluster center c_i, can be defined by

J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Big( \sum_{k,\, x_k \in G_i} \| x_k - c_i \|^2 \Big)    (1)
where J_i = \sum_{k,\, x_k \in G_i} \| x_k - c_i \|^2 is the cost function within group i.
The partitioned groups are defined by a c x n binary membership matrix U, where the element u_ij is 1 if the jth data point x_j belongs to group i, and 0 otherwise.
Once the cluster centers c_i are fixed, the minimizing u_ij for Equation (1) can be derived as follows:

u_{ij} = 1 if \| x_j - c_i \|^2 \le \| x_j - c_k \|^2 for each k \ne i, and u_{ij} = 0 otherwise.    (2)
This means that x_j belongs to group i if c_i is the closest center among all centers.
If the membership matrix is fixed, i.e., if u_ij is fixed, then the optimal center c_i that minimizes Equation (1) is the mean of all vectors in group i:

c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k    (3)
where |G_i| is the size of G_i, or |G_i| = \sum_{j=1}^{n} u_{ij}.
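As a small worked example, take the one-dimensional data set {0, 2, 10} with c = 2 and initial centers c_1 = 0 and c_2 = 10: Equation (2) assigns 0 and 2 to group 1 and 10 to group 2, Equation (3) then moves the centers to c_1 = 1 and c_2 = 10, and the cost of Equation (1) drops from 4 to 2.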
ALGORITHM
The algorithm is presented with a data set x_i, i = 1, ..., n; it then determines the cluster centers c_i and the membership matrix U iteratively using the following steps:
Step 1: Initialize the cluster centers c_i, i = 1, ..., c. This is typically done by randomly selecting c points from among all of the data points.
Step 2: Determine the membership matrix U by Equation (2).
Step 3: Compute the cost function according to Equation (1). Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.
Step 4: Update the cluster centers
according to Equation (3). Go to step 2.
The algorithm is inherently iterative, and it is not guaranteed to converge to a globally optimal solution.
Properties of K-means:
Guaranteed to converge
Guaranteed to achieve a local optimum, not necessarily the global optimum
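As a rough illustration of Steps 1-4, here is a minimal NumPy sketch of the batch procedure; the function name kmeans and the parameters tol, max_iter, and seed are illustrative assumptions rather than part of the original algorithm statement.

import numpy as np

def kmeans(X, c, tol=1e-4, max_iter=100, seed=None):
    # X: (n, d) data matrix, c: number of clusters.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize the centers by randomly selecting c data points.
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)
    prev_cost = np.inf
    for _ in range(max_iter):
        # Step 2: membership by Equation (2) -- each point joins its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, c) squared distances
        labels = d2.argmin(axis=1)
        # Step 3: cost function of Equation (1); stop when the improvement is small.
        cost = d2[np.arange(n), labels].sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step 4: update each center to the mean of its group, Equation (3).
        for i in range(c):
            members = X[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    return centers, labels

Because the result depends on the random initialization, a typical usage is to call kmeans(X, c, seed=s) for several seeds and keep the run with the lowest cost.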
The performance of the K-means
algorithm depends on the initial positions
of the cluster centers, thus it is advisable
to run the algorithm several times, each
with a different set of initial cluster
centers.
K-means clustering has been applied to a variety of areas, including image and speech data compression, data preprocessing for system modeling using radial basis function networks, and task decomposition in heterogeneous neural network architectures.
The K-means algorithm can also be operated in an on-line mode, where the cluster centers and the corresponding groups are derived through time averaging.
The closest center c_i to a given input data point x is updated using

\Delta c_i = \eta (x - c_i),

where \eta is a learning rate.
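A minimal sketch of that on-line update, assuming NumPy arrays and a small fixed learning rate eta (the value 0.1 is an illustrative choice):

def online_update(centers, x, eta=0.1):
    # centers: (c, d) NumPy array of cluster centers; x: (d,) NumPy array, one new data point.
    i = ((centers - x) ** 2).sum(axis=1).argmin()  # index of the closest center
    centers[i] += eta * (x - centers[i])           # delta c_i = eta * (x - c_i)
    return centers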
Pros:
Low complexity
complexity is O(nkt), where t = #iterations
Cons:
Necessity of specifying k
Sensitive to noise and outlier data points (a small number of such points can substantially influence the mean value)
Clusters are sensitive to initial assignment of
centroids
K-means is not a deterministic algorithm
Clusters can be inconsistent from one run
to another
Fuzzy C-means Clustering
An extension of k-means
Also known as Fuzzy ISODATA, it is a data clustering algorithm in which each data point belongs to a cluster to a degree specified by a membership grade.
Fuzzy C-means clustering was proposed by Bezdek in 1973 [1] as an improvement over the earlier Hard C-means clustering.
Fuzzy C-means clustering (FCM) relies on the basic idea of Hard C-means clustering (HCM), with the difference that in FCM each data point belongs to a cluster to a degree given by a membership grade, while in HCM every data point either belongs to a given cluster or does not.
FCM employs fuzzy partitioning such that
a given data point can belong to several
groups with the degree of belongingness
specified by membership grades between
0 and 1.
FCM uses a cost function that is to be
minimized while trying to partition the data
set.
The membership matrix U is allowed to have elements with values between 0 and 1.
The summation of the degrees of belongingness of a data point to all clusters is always equal to unity:

\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, \ldots, n.    (4)
The cost function for FCM is a generalization of Equation (1):

J(U, c_1, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2},    (5)

where u_ij is between 0 and 1; c_i is the cluster center of fuzzy group i; d_ij = \| c_i - x_j \| is the Euclidean distance between the ith cluster center and the jth data point; and m \in [1, \infty) is a weighting exponent.
The necessary conditions for Equation (5) to reach its minimum are

c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}    (6)

and

u_{ij} = \frac{1}{\sum_{k=1}^{c} (d_{ij} / d_{kj})^{2/(m-1)}}.    (7)

The algorithm iterates through the preceding two conditions until no more improvement is noticed.
In a batch-mode operation, FCM determines the cluster centers c_i and the membership matrix U using the following steps:
ALGORITHM
Step 1: Initialize the membership matrix U with random values between 0 and 1 such that the constraints in Equation (4) are satisfied.
Step 2: Calculate c fuzzy cluster centers c_i, i = 1, ..., c, using Equation (6).
Step 3: Compute the cost function according to Equation (5). Stop if either it is below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.
Step 4: Compute a new U using Equation (7). Go to step 2.
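Analogously to the K-means sketch above, here is a minimal NumPy sketch of this FCM iteration; the names fcm, m, tol, and max_iter are illustrative assumptions.

import numpy as np

def fcm(X, c, m=2.0, tol=1e-4, max_iter=100, seed=None):
    # X: (n, d) data matrix, c: number of clusters, m > 1: weighting exponent (fuzzifier).
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix U (c, n) whose columns sum to one, Equation (4).
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    prev_cost = np.inf
    for _ in range(max_iter):
        Um = U ** m
        # Step 2: fuzzy cluster centers, Equation (6).
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ij between center i and point j.
        d = np.sqrt(((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
        d = np.fmax(d, 1e-10)  # guard against division by zero
        # Step 3: cost function of Equation (5); stop when the improvement is small.
        cost = (Um * d ** 2).sum()
        if prev_cost - cost < tol:
            break
        prev_cost = cost
        # Step 4: new membership matrix, Equation (7).
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1))).sum(axis=1)
    return centers, U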
The performance of FCM depends on the initial membership matrix values; it is therefore advisable to run the algorithm several times, each starting with different values of the membership grades of the data points.
Pros:
Allows a data point to be in multiple clusters
A more natural representation of the
behavior of genes
genes usually are involved in multiple functions
Cons:
Need to define c, the number of clusters
Need to determine membership cutoff value
Clusters are sensitive to initial assignment of
centroids
Fuzzy c-means is not a deterministic algorithm
Mountain Clustering
Proposed by Yager and Filev.
This technique calculates a mountain function (density function) at every possible position in the data space, and chooses the position with the greatest density value as the center of the first cluster.
It then destructs (removes) the effect of the first cluster from the mountain function and finds the second cluster center.
The process is repeated until the desired number of clusters has been found.
This method is a simple way to find approximate cluster centers, and can be used as a pre-processor for other, more sophisticated clustering methods.
STEP 1: involves forming a grid on the data space, where the intersections of the grid lines constitute the potential cluster centers, denoted as a set V.
STEP 2: entails constructing a mountain function representing a data density measure. The height of the mountain function at a point v \in V is equal to

m(v) = \sum_{i=1}^{n} \exp\!\left( -\frac{\| v - x_i \|^2}{2\sigma^2} \right),

where x_i is the ith data point and \sigma is an application-specific constant.
STEP 3: involves selecting the cluster centers by sequentially destructing the mountain function.
The first cluster center c_1 is determined by selecting the point with the greatest density measure.
Finding the next cluster center requires eliminating the effect of the first cluster.
A new mountain function is formed by subtracting a scaled Gaussian function centered at c_1:

m_{new}(v) = m(v) - m(c_1) \exp\!\left( -\frac{\| v - c_1 \|^2}{2\beta^2} \right),

where \beta is an application-specific constant.
The subtracted amount eliminates the effect of the first cluster.
After subtraction, the new mountain function m_{new}(v) reduces to zero at v = c_1.
The second cluster center is then selected as the point having the greatest value of the new mountain function.
The process continues until a sufficient number of cluster centers is attained.
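A rough NumPy sketch of the three steps, assuming a modest grid resolution and illustrative values for sigma and beta (all of these are tuning choices, not prescribed by the method itself):

import numpy as np

def mountain_clustering(X, n_clusters, grid_pts=20, sigma=0.5, beta=0.5):
    # X: (n, d) data matrix; practical only for low d, since the grid grows as grid_pts**d.
    # Step 1: grid over the data space; the grid intersections are the candidate centers V.
    axes = [np.linspace(X[:, k].min(), X[:, k].max(), grid_pts) for k in range(X.shape[1])]
    V = np.array(np.meshgrid(*axes)).reshape(X.shape[1], -1).T
    # Step 2: mountain (density) function at every candidate point v.
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    m = np.exp(-d2 / (2 * sigma ** 2)).sum(axis=1)
    centers = []
    for _ in range(n_clusters):
        # Step 3: the highest peak becomes the next cluster center ...
        best = m.argmax()
        centers.append(V[best])
        # ... and its effect is then destructed by subtracting a scaled Gaussian centered there.
        m = m - m[best] * np.exp(-((V - V[best]) ** 2).sum(axis=1) / (2 * beta ** 2))
    return np.array(centers)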
Subtractive Clustering
Proposed by Chiu.
This technique is similar to mountain
clustering, except that instead of
calculating the density function at every
possible position in the data space, it
uses the positions of the data points to
calculate the density function, thus
reducing the number of calculations
significantly.
A problem with mountain clustering is that its computation grows exponentially with the dimension of the problem, because the mountain function has to be evaluated at each grid point.
Subtractive clustering solves this problem
by using data points as the candidates for
cluster centers, instead of grid points as
in mountain clustering.
That is, the computation is now proportional to the problem size instead of the problem dimension.
Since each data point is a candidate for a cluster center, a density measure at data point x_i is defined as

D_i = \sum_{j=1}^{n} \exp\!\left( -\frac{\| x_i - x_j \|^2}{(r_a/2)^2} \right),

where r_a is a positive constant representing a neighborhood radius.
Hence, a data point will have a high density value if it has many neighboring data points.
The first cluster center x_{c1} is chosen as the point having the largest density value D_{c1}.
Next, the density measure of each data point x_i is revised as follows:

D_i \leftarrow D_i - D_{c1} \exp\!\left( -\frac{\| x_i - x_{c1} \|^2}{(r_b/2)^2} \right),
where r_b is a positive constant that defines a neighborhood with measurable reductions in the density measure.
Therefore, the data points near the first cluster center x_{c1} will have a significantly reduced density measure.
After revising the density function, the next cluster center x_{c2} is selected as the point having the greatest density value.
The process is continued until a sufficient number of cluster centers is attained.
The constant r_b is normally larger than r_a to prevent closely spaced cluster centers; generally, r_b = 1.5 r_a.
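A minimal NumPy sketch of subtractive clustering under these definitions; the default radius ra and the stopping rule (a fixed number of centers) are illustrative assumptions.

import numpy as np

def subtractive_clustering(X, n_clusters, ra=0.5):
    # X: (n, d) data matrix; every data point is a candidate cluster center.
    rb = 1.5 * ra  # rb > ra to prevent closely spaced cluster centers
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    # Density measure D_i: high if x_i has many neighbors within roughly radius ra.
    D = np.exp(-d2 / ((ra / 2) ** 2)).sum(axis=1)
    centers = []
    for _ in range(n_clusters):
        best = D.argmax()                 # point with the greatest (revised) density
        centers.append(X[best])
        # Revise the densities: points near the new center lose most of their density.
        D = D - D[best] * np.exp(-d2[best] / ((rb / 2) ** 2))
    return np.array(centers)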
