MAYA NAYAK et al., International Journal of Engineering Science & Advanced Technology (IJESAT), ISSN: 2250-3676, Volume 2, Issue 1, pp. 38-48, Jan-Feb 2012. Available online @ http://www.ijesat.org
GPAC-APSO CLUSTERING USING MODIFIED S-TRANSFORM FOR
DATA MINING

Maya Nayak¹, Satyabrata Dash²

¹ Professor, Department of IT, BPUT, Orissa, India, mayanayak3299@yahoo.com
² Asst. Professor, Department of IT, BPUT, Orissa, India, dash_satyabrata@yahoo.co.in

Abstract
This paper presents a new approach to power signal time series data mining using an S-transform based K-means clustering technique. The power signal time series disturbance data are first pre-processed with an advanced signal processing tool, the S-transform, and various statistical features are extracted, which are used as inputs to the K-means algorithm for disturbance event detection. A Particle Swarm Optimization (PSO) technique is used to optimize the cluster centers, which can then be fed to a decision tree for pattern classification. The proposed hybrid PSO-K-means clustering technique provides accurate classification rates even under noisy conditions compared to existing techniques, which shows the efficacy and robustness of the proposed algorithm for time-varying databases such as power signal data.

Index Terms: K-means clustering, Time-varying power signal data, Particle swarm optimization, S-transform, decision
tree, classification
-------------------------------------------------------------------------- *** -------------------------------------------------------------------------


1. INTRODUCTION
Time series clustering has been very effective in providing pertinent and useful information in various application domains as a part of temporal data mining research. Time series data are of interest because of their pervasiveness in areas ranging from science, engineering, business, finance, and economics to healthcare. The goal of clustering is to identify the structure of an unlabeled data set by objectively organizing data into homogeneous groups, where the within-group object similarity is maximized and the between-group object similarity is minimized. Clustering is required when no labeled data, whether binary, categorical, numerical, textual, interval, spectral, temporal, multimedia, or a mixture of these types, are available. Data are termed static if all their feature values do not change with time or change negligibly. The bulk of clustering analysis has been applied to static data. Cluster analysis has become an important technique in exploratory data analysis, pattern recognition, machine learning, neural computing, and other engineering system studies. Clustering aims at identifying and extracting significant groups in the underlying data.

Unlike static data, the features of a time series comprise values that change with time. From given unlabelled time series data streams, it is often desirable to determine the groups of data that belong to a similar time series. The unlabelled time series data can be obtained by monitoring one or more processes over a certain period of time. Just as for static data clustering, time series data clustering requires a clustering algorithm or method to form clusters from a given set of unlabelled data objects, and the clustering algorithm is chosen depending on the type of data available and its area of application [1-7]. Various algorithms have been developed to cluster different types of time series data. In order to apply the clustering algorithms, it is necessary to convert the time series data to static data and modify the similarity or distance measures for clustering. This is known as the feature-based approach and is accomplished by processing the raw time series data with a time-frequency transform such as the modified S-transform. The raw time series database comprises the power line disturbance data recorded or simulated using MATLAB software. The various power line disturbances include voltage sag, swell, impulsive and oscillatory transients, multiple notches, momentary interruption, harmonics, and voltage flickers. In order to improve the quality of electrical power supplied to customers, it is customary to continuously record the disturbance waveforms using power-monitoring instruments. Unfortunately, most of these recorders rely on visual inspection of the data records, creating an unprecedented volume of data to be inspected by engineers. Thus an automatic recognition or
classification of the voluminous data is required for improving
the quality of electricity supply.
Although wavelet multiresolution analysis combined with a large number of neural networks provides efficient classification of power quality (PQ) events, time-domain featured disturbances such as sags, swells, etc. may not be easily classified [8], [9]. In addition, if an important disturbance frequency component is not precisely extracted by the wavelet transform, which consists of octave band-pass filters, the classification accuracy may also be limited. This paper, therefore, presents a generalized discrete S-transform [10] with a modified Gaussian window derived from the STFT for the detection and localization of power line disturbances in a signal time series database. The S-transform with modified Gaussian window, or Modified S-transform (MST), is essentially a variable-window STFT whose window width varies inversely with the frequency. The MST produces a time-frequency representation of a time-varying signal by uniquely combining frequency-dependent resolution with simultaneous localization of the real and imaginary spectra. Since the MST provides the local spectrum of a signal, the time averaging of the local spectrum gives the Fourier spectrum. The MST of a power line disturbance signal time series provides contours which closely resemble the disturbance patterns, unlike the wavelet transform, and hence the features extracted from it are very suitable for developing a highly efficient and accurate classification scheme. Further, the MST analysis of time-varying signal data yields all of the quantifiable parameters for localization, detection, and quantification of the signals comprising a time series database.

After the features are extracted from the time series data using a fast discrete S-transform, a clustering analysis is used to group the data into clusters and thereby identify the class of the data. The well-known Fuzzy C-means algorithm [12] is commonly used for data clustering, but it suffers from the trial-and-error choice of the initial cluster centers and also from the noise present in the original time series data. In addition, the traditional Fuzzy C-means algorithm gets stuck in local minima and, therefore, does not offer robustness in clustering the features. To overcome this problem, a simpler hybrid K-means clustering algorithm along with the Particle Swarm Optimization (PSO) technique [13-16] is used for clustering the features into distinct groups that may be used further for classification. Particle swarm optimization is a population-based algorithm that simulates bird flocking or fish schooling behavior to achieve an optimized cluster center using an objective function. Unlike other evolutionary learning algorithms, PSO has fewer parameters to tune. PSO can be easily implemented and has stable convergence characteristics with good computational efficiency. However, to improve the clustering performance of PSO further, a hybrid GPAC-Adaptive PSO, which is more effective and capable of solving non-linear optimization problems faster and with better accuracy in detecting the global best solution, is used in this paper. The main concept of this development is based on the passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, which is modified further to yield the GPAC-Adaptive PSO, or simply GPAC-APSO.

This paper is organized as follows: Section 2 presents the modified discrete S-transform used to extract features from raw time-varying power signal data, and its implementation is described in Section 2.1. Section 3 describes K-means clustering and Section 4 describes particle swarm optimization, including the GPAC-APSO variant used to improve the clustering performance of K-means-PSO. Section 5 presents the hybrid clustering scheme and the computational results, and the conclusion is given in Section 6.
2. MODIFIED DISCRETE S-TRANSFORM FOR FEATURE
EXTRACTION
The short-time Fourier transform (STFT) of a signal $x(t)$ is given by

$$STFT(\tau,f)=\int_{-\infty}^{\infty}x(t)\,w(\tau-t)\,e^{-i2\pi ft}\,dt \qquad (1)$$

where $t$ and $f$ denote time and frequency, $x(t)$ is the signal, and the position of the translating window is determined by $\tau$. An alternative expression for (1) is obtained through the use of the convolution theorem as

$$STFT(\tau,f)=\int_{-\infty}^{\infty}X(\alpha+f)\,W(\alpha)\,e^{i2\pi\alpha\tau}\,d\alpha \qquad (2)$$

where $W$ is the Fourier transform of $w$ and the convolution variable $\alpha$ has the same dimension as $f$.
One of the disadvantages of the STFT is the fixed width and height of the window. This causes misinterpretation of signal components with periods longer than the window width, and a width that works well for low-frequency components cannot detect the high-frequency components. The S-transform is a time-localized Fourier transform and has a window whose width and height vary with frequency. Although the theory of the S-transform has been presented in references [9], [10] and originally by Stockwell et al. [8], some of the equations and computational steps are outlined below:

The general window function $w(t,f)$ is chosen as

$$w(t,f)=\frac{1}{\sigma(f)\sqrt{2\pi}}\,e^{-\frac{t^{2}}{2\sigma^{2}(f)}}, \qquad \sigma(f)=\frac{\beta}{f} \qquad (3)$$
The window is normalized as

$$\int_{-\infty}^{\infty}w(t,f)\,dt=1 \qquad (4)$$
Here $\beta$ is normally set to 0.2 for the best overall performance of the S-transform, where the contours exhibit the least edge effects; for computing the highest-frequency components of very short duration oscillatory transients, $\beta$ is made equal to 5. The S-transform performs multiresolution analysis on the signal, because the width of its window varies inversely with frequency. This gives high time resolution at high frequencies and, on the other hand, high frequency resolution at low frequencies.

The expression for the S-transform is given by

$$S(\tau,f)=\int_{-\infty}^{\infty}x(t)\,\frac{|f|}{\sqrt{2\pi}}\,e^{-\frac{f^{2}(\tau-t)^{2}}{2}}\,e^{-i2\pi ft}\,dt \qquad (5)$$

Here the window function given in (3) is used.
An alternative formulation for the S-transform is given by

$$S(\tau,f)=\int_{-\infty}^{\infty}X(\alpha+f)\,e^{-\frac{2\pi^{2}\alpha^{2}\beta^{2}}{f^{2}}}\,e^{i2\pi\alpha\tau}\,d\alpha \qquad (6)$$

The S-transform has an advantage in that it provides multi-resolution analysis while retaining the absolute phase of each frequency. This has led to its application for the detection and interpretation of events in time series in a variety of disciplines. In this expression of the S-transform, the scalable Gaussian window (the product of $|f|/\sqrt{2\pi}$ and the real exponential) localizes the complex Fourier sinusoid, giving the S-transform analyzing function (the term in braces in (5)). The CWT provides time resolution by translating its whole analyzing function (the wavelet) along the time axis. The S-transform is different because only the amplitude envelope of the analyzing function (the window) translates; the oscillations are given by the fixed Fourier sinusoid, which does not depend on $\tau$. Since the local oscillatory properties of the analyzing function determine the phase of the local spectrum, the S-transform can be considered as having a fixed phase reference. The fixed phase reference gives the S-transform some advantages over the wavelet transform.
The inverse S-transform is given by

$$x(t)=\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}S(\tau,f)\,d\tau\right]e^{i2\pi ft}\,df$$

and $S(\tau,f)$ is complex; it is represented as

$$S(\tau,f)=A(\tau,f)\,e^{i\theta(\tau,f)} \qquad (7)$$

where $A(\tau,f)=|S(\tau,f)|$ is the amplitude of the S-spectrum and

$$\theta(\tau,f)=\arctan\!\left(\frac{\mathrm{Im}[S(\tau,f)]}{\mathrm{Re}[S(\tau,f)]}\right)$$

is the phase of the S-spectrum.
The phase of the S-spectrum is an improvement over the wavelet transform in that the average of all the local spectra does indeed give the same result as the Fourier transform.

The discrete version of the S-transform of a signal is obtained as

$$S[j,n]=\sum_{m=0}^{N-1}X[m+n]\,W[m,n]\,e^{\frac{i2\pi mj}{N}} \qquad (8)$$

where $X[m+n]$ is obtained by shifting the discrete Fourier transform (DFT) of $x(k)$ by $n$, with $X[m]$ given by

$$X[m]=\frac{1}{N}\sum_{k=0}^{N-1}x[k]\,e^{-\frac{i2\pi mk}{N}}, \qquad W[m,n]=e^{-\frac{2\pi^{2}m^{2}\beta^{2}}{n^{2}}},$$

and $j, m, n = 0, 1, \ldots, N-1$.

Another version of the discrete S-transform used for computation is

$$S[m,n]=e^{\frac{i2\pi nm}{N}}\sum_{k=-N/2}^{N/2-1}x[m+k]\,w^{*}(k,n)\,e^{-\frac{i2\pi nk}{N}} \qquad (9)$$

for $m=1,2,\ldots,M$ and $n=0,1,2,\ldots,N/2$, where $M$ is the number of data points of the signal $x[m]$, $N$ is the width of the window, and the signal vector $x[m]$ is padded with zeros at the beginning or at the end.

2.1 IMPLEMENTATION OF S-TRANSFORM (ST)

The computation of the S-transform is efficiently implemented using the convolution theorem and the FFT. The following steps are used for the computation of the S-transform (see the sketch below):
(i) Compute the DFT of the signal x(k) using an FFT software routine and shift the spectrum X[m] to X[m+n].
(ii) Compute the Gaussian window function $\exp(-2\pi^{2}m^{2}\beta^{2}/n^{2})$ for the required frequency n.
(iii) Compute the inverse Fourier transform of the product of the DFT and the Gaussian window function to give the ST matrix.
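A minimal Python sketch of steps (i)-(iii) is given below. It follows the frequency-domain form of (8): the DFT of the signal is shifted by each voice frequency n, multiplied by the Gaussian window $\exp(-2\pi^{2}m^{2}\beta^{2}/n^{2})$, and inverse-transformed. The function name, the NumPy-based implementation, and the default β = 0.2 are illustrative choices and are not taken from the paper.

```python
import numpy as np

def modified_s_transform(x, beta=0.2):
    """Discrete (modified) S-transform of a real signal x via FFT.

    Returns an (N//2 + 1) x N complex ST matrix: rows correspond to the
    frequency voices n = 0 .. N/2, columns to the time samples.
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    X = np.fft.fft(x) / N                        # step (i): X[m] = (1/N) sum x[k] e^{-i2pi mk/N}
    S = np.zeros((N // 2 + 1, N), dtype=complex)
    S[0, :] = x.mean()                           # zero-frequency voice: mean of the signal
    m = np.arange(N)
    m[m > N // 2] -= N                           # symmetric index so the Gaussian wraps correctly
    for n in range(1, N // 2 + 1):
        W = np.exp(-2.0 * np.pi**2 * m**2 * beta**2 / n**2)   # step (ii): Gaussian window
        # step (iii): inverse FFT of the shifted, windowed spectrum; the factor N
        # turns numpy's normalized ifft into the plain sum of (8)
        S[n, :] = N * np.fft.ifft(np.roll(X, -n) * W)
    return S

# Example: 50 Hz tone sampled with 256 points over one second
S = modified_s_transform(np.sin(2 * np.pi * 50 * np.linspace(0, 1, 256, endpoint=False)))
print(S.shape)   # (129, 256)
```

Plotting the magnitude of the returned matrix then gives the frequency-time contours discussed next.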

The output of the S-transform is an $n \times m$ matrix whose rows pertain to frequency and whose columns indicate time. Each column thus represents the local spectrum for that point in time. From the ST matrix we obtain the frequency-time contours having the same amplitude spectrum, and these contours can be used to
visually classify the nature of the signal and its change of frequency and harmonic content, if any, during the distortion.

Although there are several algorithms available for time series data clustering, the simple and widely used ones are the K-means and Fuzzy C-means approaches. Other algorithms include self-organizing maps, similarity and distance measures, short time series distance measures, probability-based methods, linear predictive coding, etc. The main idea behind the K-means algorithm is the minimization of an objective function, which is normally chosen to be the total distance of all patterns from their respective cluster centers. The usual approach in arriving at a solution is to start with arbitrary cluster centers and then proceed iteratively. The distribution of feature values among clusters and the updating of cluster centers are the two main steps of the K-means algorithm. The clustering algorithm alternates between these two steps until the value of the objective function becomes minimum. The objective function of K-means is not convex and hence may contain local minima. Consequently, while minimizing the objective function, there is a possibility of getting stuck at local minima (and also at local maxima and saddle points). The performance of the K-means algorithm depends on the initial choice of the cluster centers. Besides, the Euclidean norm is sensitive to noise and outliers; hence the K-means algorithm can be affected by noise and outliers. To overcome these problems, as well as those faced by GA and simulated annealing, we provide an evolutionary clustering method based on PSO, which has a higher ability to find near-optimal solutions in the search space.

Particle swarm optimization (PSO) is a population-based algorithm. This algorithm simulates bird flocking or fish schooling behavior to achieve a self-evolving system. It can automatically search for the optimum solution in the solution space, but the searching process is not random: depending on the problem, the search is guided by a fitness function. Unlike other evolutionary learning algorithms, PSO has fewer parameters to tune. PSO can be easily implemented and has stable convergence characteristics with good computational efficiency. This paper presents an evolutionary particle swarm optimization learning-based method to optimally cluster N data points into K clusters. The main concept of this development is based on the passive congregation GPAC-PSO and a kind of coordinated aggregation observed in swarms. As with GA, GPAC-PSO also suffers from premature convergence and thus may result in an inaccurate solution.
3. K-MEANS CLUSTERING
In the field of clustering, the K-means algorithm is the most popularly used algorithm to find a partition that minimizes the mean square error (MSE) measure. Clustering analysis is a technique which can classify similar sample points into the same group from a data set. It is a branch of multivariate analysis and an unsupervised learning rule in pattern recognition. For a space S which has K groups and N points {x_1, x_2, ..., x_N}, the definition of the clustering analysis is as follows. Among the many clustering techniques, the K-means algorithm (generally called hard C-means) is one of the well-known hard clustering techniques. It allocates each data point $x_i$ to the closest cluster center $z_j$ using the Euclidean distance:

$$d=\|x_{i}-z_{j}\|,\qquad i\in\{1,2,\ldots,N\},\ j\in\{1,2,\ldots,K\} \qquad (10)$$

A good method will cluster the data set X = {x_1, x_2, ..., x_N} into K well-separated partitions with $2\le K\le N-1$. When we have an unlabelled data set, it is very important to define an objective function for a clustering analysis method. Intuitively, each cluster shall be as compact as possible. Thus, the objective function of the K-means algorithm is created with the Euclidean norm as

$$J_{E}=\sum_{j=1}^{K}\sum_{x_{i}\in C_{j}}\|x_{i}-z_{j}\|^{2} \qquad (11)$$
where $z_{j}$ is the jth cluster center. The necessary condition for the minimum of J is

$$z_{j}=\frac{1}{N_{j}}\sum_{x_{i}\in C_{j}}x_{i},\qquad j=1,2,\ldots,K \qquad (12)$$

where $N_{j}$ is the number of points belonging to cluster $C_{j}$.

The K-means clustering process can be stopped when any one of the following criteria is satisfied: when the maximum number of iterations has been exceeded, when there is little change in the centroid vectors over a number of iterations, or when there are no cluster membership changes. A sketch of this procedure is given below.
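A compact Python sketch of this procedure is shown here; the feature matrix X (one row per feature vector), the tolerance value, and the random initialization are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def k_means(X, K, max_iter=100, tol=1e-4, seed=0):
    """Cluster the rows of X into K groups by minimizing J_E of (11)."""
    rng = np.random.default_rng(seed)
    z = X[rng.choice(len(X), K, replace=False)]          # arbitrary initial cluster centers
    labels = np.full(len(X), -1)
    for _ in range(max_iter):                            # stop 1: maximum iterations exceeded
        d = np.linalg.norm(X[:, None, :] - z[None, :, :], axis=2)   # Euclidean distances, eq. (10)
        new_labels = d.argmin(axis=1)                    # assign each point to its closest center
        new_z = np.array([X[new_labels == j].mean(axis=0) if np.any(new_labels == j) else z[j]
                          for j in range(K)])            # center update, eq. (12)
        no_change = np.array_equal(new_labels, labels)   # stop 3: no cluster membership changes
        small_move = np.linalg.norm(new_z - z) < tol     # stop 2: little change in the centroids
        z, labels = new_z, new_labels
        if no_change or small_move:
            break
    return z, labels
```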
4. PARTICLE SWARM OPTIMIZATION
Particle swarm optimization (PSO) is a population-based stochastic search process, modeled after the social behavior of a bird flock. The algorithm maintains a population of particles, where each particle represents a potential solution to an optimization problem. In the context of PSO, a swarm refers to a number of potential solutions to the optimization problem, where each potential solution is referred to as a particle. The aim of PSO is to find the particle position that results in the best evaluation of a given fitness (objective) function. Each particle represents a position in $N_{d}$-dimensional space and is "flown" through this multi-dimensional search space, adjusting its position towards both its own best position and that of the best particle found so far. For a particle moving in the multidimensional search space, let $x_{i,k}(t)$ and $v_{i,k}(t)$ denote the position and velocity of the ith particle in the kth dimension at time t.

Using the above notation, a particle's position is adjusted according to

$$v_{i,k}(t+1)=w\,v_{i,k}(t)+c_{1}r_{1,k}(t)\,\big(y_{i,k}(t)-x_{i,k}(t)\big)+c_{2}r_{2,k}(t)\,\big(\hat{y}_{k}(t)-x_{i,k}(t)\big) \qquad (13)$$

$$x_{i}(t+1)=x_{i}(t)+v_{i}(t+1) \qquad (14)$$

where $w$ is the inertia weight, $c_{1}$ and $c_{2}$ are the acceleration constants, $r_{1,k}(t),\,r_{2,k}(t)\sim U(0,1)$, and $k=1,\ldots,N_{d}$. The
velocity is thus calculated based on three contributions: (1) a
fraction of the previous velocity, (2) the cognitive component
which is a function of the distance of the particle from its
personal best position, and (3) the social component which is a
function of the distance of the particle from the best particle
found thus far (i.e. the best of the personal bests).

The personal best position of particle i is calculated as

$$y_{i}(t+1)=\begin{cases}y_{i}(t) & \text{if } f(x_{i}(t+1))\ge f(y_{i}(t))\\[2pt] x_{i}(t+1) & \text{if } f(x_{i}(t+1))<f(y_{i}(t))\end{cases} \qquad (15)$$
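The update rules (13)-(15) translate directly into a few lines of Python. The sketch below assumes the positions, velocities, and personal bests are stored as arrays of shape (number of particles, N_d) and that the fitness function f (to be minimized) accepts a single particle; these are implementation choices, not details taken from the paper.

```python
import numpy as np

def pso_step(x, v, y, y_hat, f, w=0.72, c1=1.49, c2=1.49, rng=None):
    """One gbest PSO iteration implementing equations (13), (14), and (15)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = x.shape
    r1, r2 = rng.random((n, d)), rng.random((n, d))
    v = w * v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x)    # velocity update, eq. (13)
    x = x + v                                                # position update, eq. (14)
    fx = np.array([f(p) for p in x])
    fy = np.array([f(p) for p in y])
    improved = fx < fy                                       # personal-best update, eq. (15)
    y = np.where(improved[:, None], x, y)
    y_hat = y[np.array([f(p) for p in y]).argmin()]          # best of the personal bests
    return x, v, y, y_hat
```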
4.1 PSO CLUSTERING
In the context of clustering, a single particle represents the $N_{c}$ cluster centroid vectors. That is, each particle $x_{i}$ is constructed as follows:

$$x_{i}=(m_{i1},\ldots,m_{ij},\ldots,m_{iN_{c}}) \qquad (16)$$

where $m_{ij}$ refers to the jth cluster centroid vector of the ith particle, i.e., the centroid of cluster $C_{ij}$.
Therefore, a swarm represents a number of candidate clusterings for the current data vectors. The fitness of a particle is easily measured as the quantization error,

$$J_{e}=\frac{1}{N_{c}}\sum_{j=1}^{N_{c}}\left[\frac{\sum_{z_{p}\in C_{ij}}d(z_{p},m_{ij})}{|C_{ij}|}\right] \qquad (17)$$

where d is the Euclidean distance defined in equation (10) and $|C_{ij}|$ is the number of data vectors belonging to cluster $C_{ij}$.
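Under the particle encoding of (16), the quantization error (17) can be computed as in the sketch below. Flattening each particle into a single vector of N_c × d centroid coordinates is an implementation choice assumed here, not something prescribed by the paper.

```python
import numpy as np

def quantization_error(particle, Z, n_clusters):
    """Fitness J_e of (17): average, over the N_c clusters, of the mean distance
    of the data vectors in Z (one per row) to the centroid they are assigned to."""
    m = np.asarray(particle).reshape(n_clusters, -1)           # centroids m_ij of this particle
    d = np.linalg.norm(Z[:, None, :] - m[None, :, :], axis=2)  # d(z_p, m_ij) for all pairs
    labels = d.argmin(axis=1)                                  # each z_p joins its closest centroid
    per_cluster = [d[labels == j, j].mean() if np.any(labels == j) else 0.0
                   for j in range(n_clusters)]                 # empty clusters contribute 0 (a choice)
    return float(np.mean(per_cluster))
```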

The next section first presents a standard gbest PSO for clustering data into a given number of clusters and then shows how K-means and the PSO algorithm can be combined to further improve the performance of PSO clustering. The hybrid PSO algorithm consistently performs better than the other two approaches as the number of clusters increases. The K-means algorithm exhibits a faster, but premature, convergence to a large quantization error, while the PSO algorithms converge more slowly but to lower quantization errors. We therefore proceed to develop more effective PSO algorithms by reflecting recent advances in swarm intelligence and, in addition, by introducing new concepts. Under these conditions, a new hybrid PSO algorithm is proposed, which is more effective and capable of solving non-linear optimization problems faster and with better accuracy in detecting the global best solution. The main concept of this development is based on the passive congregation and a kind of coordinated aggregation observed in swarms, known as GPAC-PSO, which is modified further to yield the GPAC-Adaptive PSO, or simply GPAC-APSO.

Congregation, on the other hand, is swarming by social forces, which are the source of attraction of a particle to others, and is classified into two types: social and passive. Social congregation usually happens when the swarm's fidelity is high, such as with genetic relation. Social congregation necessitates active information transfer; e.g., ants that have a high genetic relation use antennal contacts to transfer information about the location of resources. We do not consider social congregation in this paper because it frequently displays a division of labor. Passive congregation, finally, is an attraction of a particle to other swarm members where there is no display of social behavior; particles need to monitor both the environment and their immediate surroundings, such as the position and speed of neighbors, and such information transfer can be employed in passive congregation. The global variant-based passive congregation PSO (GPAC) is enhanced with the constriction factor approach.

The swarm of the enhanced GPAC is manipulated by the velocity update

$$v_{i}(t+1)=k\left[w(t)\,v_{i}(t)+c_{1}\,rand_{1}\,(P_{i}-S_{i}(t))+c_{2}\,rand_{2}\,(P_{k}-S_{i}(t))+c_{3}\,rand_{3}\,(P_{r}-S_{i}(t))\right] \qquad (18)$$

where i = 1, 2, ..., N; $c_{1}$, $c_{2}$, and $c_{3}$ are the cognitive, social, and passive congregation parameters, respectively; $rand_{1}$, $rand_{2}$, and $rand_{3}$ are random numbers uniformly distributed within [0,1]; $P_{i}$ is the best previous position of the ith particle; $P_{k}$ is the global best position ever attained among all particles in the case of the enhanced GPAC; and $P_{r}$ is the position of the passive congregator (the position of a randomly chosen particle r). The position of the ith particle in the n-dimensional decision space is limited by the minimum and maximum positions expressed by the vectors $[S_{i}^{\min},\,S_{i}^{\max}]$. Here, the minimum and maximum position vectors express the inequality constraints. The velocities of the ith particle in the n-dimensional decision space are limited by $[-V_{i}^{\max},\,V_{i}^{\max}]$, where the maximum velocity in the lth dimension of the search space is proposed as
$$v_{i,l}^{\max}=\frac{S_{i,l}^{\max}-S_{i,l}^{\min}}{N_{r}},\qquad l=1,2,\ldots,n \qquad (19)$$

where $S_{i,l}^{\min}$ and $S_{i,l}^{\max}$ are the limits in the lth dimension of the search space. The maximum velocities are constructed on small intervals of the search space for a better balance between exploration and exploitation. $N_{r}$ is a chosen number of search intervals for the particles and is an important parameter in the enhanced GPAC algorithm. A small $N_{r}$ facilitates global exploration (searching new areas), while a large one tends to facilitate local exploration (fine tuning of the current search area). A suitable value for $N_{r}$ usually provides a balance between the global and local exploration abilities and consequently results in a reduction of the number of iterations required to locate the optimum solution.

The basic steps of the enhanced GPAC are listed below (see also the sketch that follows):

Step-1: Generate a swarm of N particles with uniform probability distribution, initial positions $S_{i}(0)$ and velocities $V_{i}(0)$, $i=1,2,\ldots,N$, and initialize the random parameters. Evaluate each particle i using the objective function f (e.g., f to be minimized).

Step-2: For each particle i, calculate the distance $d_{ij}$ between its position and the positions of all other particles: $d_{ij}=\|S_{i}-S_{j}\|$, $i\ne j=1,2,\ldots,N$, where $S_{i}$ and $S_{j}$ are the position vectors of particle i and particle j, respectively.

Step-3: For each particle i, determine the nearest particle, particle k, with a better evaluation than its own, i.e., $d_{ik}=\min_{j}(d_{ij})$ with $f_{k}\le f_{i}$, and set it as the leader of particle i. In the case of the enhanced GPAC, the global best particle is set as the leader.

Step-4: For each particle i, randomly select a particle r and set it as the passive congregator of particle i.

Step-5: Update the velocities, positions, and personal bests of the particles using equations (18), (14), and (15), respectively.

Step-6: Check whether the limits on positions and velocities are enforced. If the limits are violated, they are replaced by the respective limits.

Step-7: Evaluate each particle using the objective function f.

Step-8: If the stopping criteria are not satisfied, go to Step 2. The enhanced GPAC algorithm is terminated if one of the following criteria is satisfied: (1) no improvement of the global best is observed in the last 30 generations, or (2) the maximum number of allowed iterations is reached.
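A sketch of how Steps 1-8 fit together is given below. The leader is taken as the global best (the enhanced-GPAC variant of (18)), the passive congregator is a randomly chosen particle, and velocities are clipped to the limits of (19). The constriction factor k, the linearly decreasing inertia weight used here as a stand-in for the adaptive scheme of GPAC-APSO, and the parameter values are assumptions made for illustration; the paper does not list its exact settings.

```python
import numpy as np

def gpac_pso(f, lo, hi, n_particles=20, n_iter=1000,
             c1=1.49, c2=1.49, c3=0.6, k=0.729, n_r=5, seed=0):
    """Sketch of the enhanced GPAC loop (Steps 1-8); f is minimized."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)     # per-dimension position limits
    dim = lo.size
    S = rng.uniform(lo, hi, (n_particles, dim))               # Step 1: random positions
    V = np.zeros((n_particles, dim))
    v_max = (hi - lo) / n_r                                   # maximum velocity, eq. (19)
    P = S.copy()                                              # personal bests P_i
    fP = np.array([f(p) for p in P])
    g = P[fP.argmin()]                                        # global best, used as the leader P_k
    for t in range(n_iter):
        w = 0.9 - 0.5 * t / n_iter                            # decreasing inertia (assumed schedule)
        r = rng.integers(n_particles, size=n_particles)       # Step 4: passive congregators P_r
        r1, r2, r3 = rng.random((3, n_particles, dim))
        V = k * (w * V + c1 * r1 * (P - S)                    # velocity update, eq. (18)
                 + c2 * r2 * (g - S) + c3 * r3 * (S[r] - S))
        V = np.clip(V, -v_max, v_max)                         # Step 6: enforce velocity limits
        S = np.clip(S + V, lo, hi)                            # position update and position limits
        fS = np.array([f(p) for p in S])                      # Step 7: evaluate the swarm
        better = fS < fP
        P[better], fP[better] = S[better], fS[better]
        g = P[fP.argmin()]                                    # loop until the iteration budget is spent
    return g, float(fP.min())
```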

5. PSO AND GPAC-APSO BASED K-MEANS
CLUSTERING ALGORITHM FOR TIME SERIES PATTERN
CLASSIFICATION
The hybrid K-means PSO algorithm is applied to the time series data obtained from an electricity distribution network, and the feature clusters are found to clearly classify the disturbance events of the time series. After the complex S-matrix is obtained by using the procedure outlined earlier, the feature vectors F1, F2, F3, and F4 required for clustering are chosen from the sets of features described in Table 1. These features are used to classify the time series data into several classes by using a hybrid clustering technique based on the PSO and K-means algorithms. The K-means algorithm tends to converge faster (after fewer function evaluations) than the PSO, but usually with a less accurate clustering. This section shows that the performance of PSO clustering can be further improved by seeding the initial swarm with the result of the K-means algorithm. The hybrid algorithm first executes the K-means algorithm once. In this case the K-means clustering is terminated when the maximum number of iterations is exceeded, or when the average change in the centroid vectors is less than 0.0001 (a user-specified parameter). The result of the K-means algorithm is then used as one of the particles, while the rest of the swarm is initialized randomly. The features extracted from the S-transform are as follows:
features are extracted from S-transform as
1.
1
F

= max (A)+min (A)-max (B)-min (B). (20)
where A is the amplitude versus time graph from the S-matrix
under disturbance and B is the amplitude versus time graph of
the S-matrix without disturbance.
2.
2
F = Standard deviation of the magnitude versus time
spectrum obtained from ST matrix
=
2 / 1 2
) )) (
1
) ( (
1
(


j j
j S
N
j S
N
.
(21)
3.
3
F = energy in the S-transform output
=
2
) , ( j n S (22)
4. F
4
= Total harmonic distortion (THD) =
| |
1
2
2
/ V V
N
n
n
=
(23)
where N is the number of points in the FFT and $V_{n}$ is the value of the nth harmonic component of the FFT.

5. $F_{5}$ = estimated frequency at the maximum amplitude obtained from the ST matrix.

A sketch of this feature computation is given below.
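Given ST matrices from the modified S-transform, features F1-F5 can be computed roughly as in the sketch below. The interpretation of the magnitude-versus-time curve as the column-wise maximum of |S|, the use of the raw waveform's FFT for the THD of (23) with every non-fundamental bin treated as a harmonic, and the frequency axis derived from the sampling rate fs are assumptions made for illustration.

```python
import numpy as np

def st_features(S_dist, S_ref, x, fs):
    """Features F1-F5 from the ST matrices of the disturbed signal (S_dist) and of a
    clean reference (S_ref), plus the raw disturbed waveform x sampled at fs Hz."""
    A = np.abs(S_dist).max(axis=0)                     # amplitude-vs-time curve, with disturbance
    B = np.abs(S_ref).max(axis=0)                      # amplitude-vs-time curve, without disturbance
    F1 = A.max() + A.min() - B.max() - B.min()                          # eq. (20)
    F2 = A.std()                                       # eq. (21): std of the magnitude-vs-time curve
    F3 = np.sum(np.abs(S_dist) ** 2)                                    # eq. (22): ST energy
    V = np.abs(np.fft.rfft(x)) / len(x)                # spectrum magnitudes for the THD of (23)
    k0 = np.argmax(V[1:]) + 1                          # bin of the fundamental component V_1
    F4 = np.sqrt(np.sum(np.delete(V[1:], k0 - 1) ** 2)) / V[k0]         # eq. (23), approximate
    row = np.abs(S_dist).max(axis=1).argmax()          # frequency voice with the maximum amplitude
    F5 = row * fs / S_dist.shape[1]                    # that voice expressed in Hz
    return F1, F2, F3, F4, F5
```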

Table 1: Features extracted from the S-transform (with noise)

Disturbance                    F1       F2      F3      F4      F5
Normal                         0.9963   0.001   0.052   0.028   50
Sag (60%)                      0.591    0.022   0.039   0.027   50
Swell (50%)                    1.503    0.012   0.076   0.029   50
Momentary Interruption (MI)    0.070    0.0387  0.0323  0.044   50
Harmonics                      1.032    0.050   0.064   0.25    50
Sag with Harmonic (60%)        0.601    0.0228  0.0408  0.1139  50
Swell with Harmonic (50%)      1.50     0.0219  0.079   0.1155  50
Flicker (4%, 5 Hz)             0.998    0.0209  0.027   0.1159  55
Notch + harmonics              0.940    0.1275  0.0531  0.198   50
Spike + harmonics              1.072    0.141   0.066   0.204   50
Transient (low frequency)      1.02     0.1473  0.0148  0.0566  440
Transient (high frequency)     1.0384   0.155   0.014   0.068   3315

5.1 Computational Results for the PSO-Optimized K-means Algorithm

This section compares the results of the K-means, PSO, and hybrid clustering algorithms on six time series events. The main purpose is to compare the quality of the respective clusterings, where quality is measured according to the following three criteria:
- the quantization error;
- the intra-cluster distances, i.e. the distances between data vectors within a cluster, where the objective is to minimize the intra-cluster distances;
- the inter-cluster distances, i.e. the distances between the centroids of the clusters, where the objective is to maximize the distance between clusters.

The latter two objectives, respectively, correspond to crisp, compact clusters that are well separated. For all the results
reported, an average over 30 simulations is given. All algorithms are run for 1000 function evaluations, and 20 particles are used for the PSO-based clustering. The parameters chosen for PSO are w = 0.72 and $c_{1}=c_{2}=1.49$ to ensure good convergence. A sketch of this experimental protocol is given below.
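The protocol described above (K-means run once, its centroids seeded as one particle, 20 particles, roughly 1000 function evaluations, w = 0.72 and c1 = c2 = 1.49) can be scripted as in the sketch below, which reuses the k_means, pso_step, and quantization_error sketches given earlier and is therefore illustrative only.

```python
import numpy as np
# assumes k_means, pso_step, and quantization_error from the earlier sketches are in scope

def hybrid_kmeans_pso(Z, n_clusters, n_particles=20, n_evals=1000, seed=0):
    """Seed one PSO particle with the K-means centroids; the rest are random."""
    rng = np.random.default_rng(seed)
    dim = n_clusters * Z.shape[1]
    fitness = lambda p: quantization_error(p, Z, n_clusters)   # fitness, eq. (17)
    z_km, _ = k_means(Z, n_clusters, seed=seed)                 # run K-means once
    x = rng.uniform(Z.min(axis=0), Z.max(axis=0),
                    (n_particles, n_clusters, Z.shape[1])).reshape(n_particles, dim)
    x[0] = z_km.ravel()                                         # K-means result as one particle
    v = np.zeros_like(x)
    y, y_hat = x.copy(), x[np.array([fitness(p) for p in x]).argmin()]
    for _ in range(n_evals // n_particles):                     # about 1000 fitness evaluations
        x, v, y, y_hat = pso_step(x, v, y, y_hat, fitness, w=0.72, c1=1.49, c2=1.49)
    return y_hat.reshape(n_clusters, Z.shape[1])                # optimized cluster centers
```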

Fig. 2 shows the plot of feature F1 versus F2, clearly showing the clusters belonging to steady disturbance patterns such as voltage sag (or dip), voltage swell (rise), harmonics, and the normal signal time series, and to transient disturbance patterns such as oscillatory transients, notches, and spikes. From Fig. 2 it can be seen that although K-means forms clusters, there may be a number of misclassifications, while GPAC-APSO is found to produce clear cluster centers separating the above-mentioned disturbance patterns in comparison to K-means, as shown in Fig. 3. The fitness functions of K-means and the different PSO variants plotted in Fig. 4 show that GPAC-APSO converges faster than the plain K-means algorithm.

Similar plots of features F2 versus F4 relate the clusters to the presence or absence of harmonics and to the fitness values, respectively. The fitness function converges rapidly to a low quantization error in the case of both the K-means and GPAC-APSO algorithms. Cluster 1 belongs to power signal time series data in which the harmonic content is less than 5% of the fundamental component. When the harmonic content is more than 7 to 10% of the fundamental component, cluster 2 becomes prominent and contains such data points. However, K-means does not produce a very clear boundary between the different patterns, while GPAC-APSO is very effective in separating the clusters with greater clarity in comparison to the other PSO variants.




FIGURES

[Fig. 1 Time series clustering using the feature-based approach: time series → feature extraction (ST or WT) → features → clustering algorithm → clusters.]

[Fig. 2 F1-F2 clusters from K-Means PSO. Hybrid PSO output: Cluster 1: Sag; Cluster 2: Swell; Cluster 3: Normal + harmonic; Cluster 4: Transient + Notch.]
[Fig. 3 F1-F2 clusters from GPAC-APSO. Cluster 1: Sag; Cluster 2: Swell; Cluster 3: Normal + harmonic; Cluster 4: Transient + Notch.]

6. CONCLUSION
An evolutionary clustering technique has been developed by hybridizing the K-means algorithm and PSO for time series clustering of power signal disturbance data. It can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to the problem of allocating N data points to K clusters. The proposed method is very efficient and simple to implement for clustering analysis when the number of clusters is known a priori. According to the simulation results of the proposed approach on a number of time series data sets occurring in power network disturbances with different geometrical properties, it is clear that the proposed approach is more robust than the traditional clustering analysis algorithms. Several variants of PSO have also been presented in this paper, and the modified GPAC-PSO, known as GPAC-APSO, is found to be more efficient and to have a quicker convergence property in comparison to the other PSO variants.

REFERENCES
[1]. S. Z. Selim and M. A. Ismail, "K-means type algorithms: a generalized convergence theorem and characterization of local optimality," IEEE Trans. Pattern Anal. Mach. Intell., vol. 6, pp. 81-87, 1984.

[2]. K. L. Wu and M.-S. Yang, "Alternative c-means clustering algorithms," Pattern Recognition, vol. 35, pp. 2267-2278, 2002.



[Fig. 4 F1-F2 fitness plots for K-Means, PSO, GBEST PSO, and GPAC-APSO.]
[3]. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001, pp. 346-389.

[4]. K. Krishna and M. N. Murty, "Genetic K-means algorithms," IEEE Trans. Syst. Man Cybernet. B: Cybernet., vol. 29, no. 3, pp. 433-439, 1999.

[5]. R. H. Shumway, "Time-frequency clustering and discriminant analysis," Stat. Probab. Lett., vol. 63, pp. 307-314, 2003.

[6]. D. Tran and M. Wagner, "Fuzzy c-means clustering-based speaker verification," in: N. R. Pal and M. Sugeno (Eds.), AFSS 2002, Lecture Notes in Artificial Intelligence, vol. 2275, 2002, pp. 318-324.

[7]. V. S. Ananthanarayana, M. N. Murty, and D. K. Subramanian, "Efficient clustering of large data sets," Pattern Recognition, vol. 34, pp. 2561-2563, 2001.

[8]. S. R. Shaw, C. R. Laughman, S. B. Leeb, and R. F. Lepard, "A power quality prediction system," IEEE Transactions on Industrial Electronics, vol. 47, no. 3, pp. 511-517, June 2000.

[9]. M. Gaouda, S. H. Kanoun, M. M. A. Salama, and A. Y. Chikhani, "Wavelet based signal processing for disturbance classification and measurement," IEE

[10]. Proc. Gener. Transm. Distrib., vol. 149, pp. 310-318, May 2002.

[11]. I. W. C. Lee and P. K. Dash, "S-transform based intelligent system for classification of power quality disturbance signals," IEEE Transactions on Industrial Electronics, vol. 50, no. 4, pp. 800-806.

[12]. P. K. Dash, B. K. Panigrahi, and G. Panda, "Power quality analysis using S-transform," IEEE Transactions on Power Delivery, vol. 18, no. 2, April 2003.

[13]. N. B. Karayiannis and J. C. Bezdek, "An integrated approach to fuzzy learning vector quantization and fuzzy C-means clustering," IEEE Trans. on Fuzzy Systems, vol. 5, no. 4, November 1997.

[14]. R. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory," Proc. 6th Int. Symposium on Micro Machine and Human Science, pp. 39-43, 1995.

[15]. R. C. Eberhart and Y. Shi, "Particle swarm optimization: developments, applications and resources," Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2001), Seoul, Korea, 2001.

[16]. Ching-Yi Chen and Fun Ye, "Particle swarm optimization algorithm and its application to clustering analysis," IEEE ICNSC 2004, Taipei, Taiwan, R.O.C., pp. 789-794, 2004.

[17]. Ching-Yi Chen and Fun Ye, "K-means algorithm based on particle swarm optimization," 2003 International Conference on Informatics, Cybernetics, and Systems, I-Shou University, Taiwan, R.O.C., pp. 1470-1475, 2003.
