
Varying Dimensional Particle Swarm Optimization

Yanjun Yan and Lisa Ann Osadciw


Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244. Email: {yayan,laosadci}@syr.edu

Abstract—A new algorithm, varying dimensional particle swarm optimization (VD-PSO), is proposed for entities with varying-dimensional components, where each component assumes continuous-valued parameters. Such problems are distinct from current benchmark problems, where the dimension of the particles is fixed. One well-studied application of VD-PSO is probability density estimation by Gaussian mixture models. A particle in VD-PSO includes a discrete number, the number of components, and a set of real-valued component parameters. The number of components varies according to a random scheme, which dictates how many sets of components remain or expand. The component parameters are matched up to support a linked update. Three other methods, the last two of which are also proposed by us as intuitive attempts to solve varying-dimensional problems, are compared with VD-PSO: 1. binary-headered PSO, 2. exhaustive PSO, 3. discrete-headered PSO. Simulations on known data with specified components show that VD-PSO provides a density estimation competitive with the exhaustive PSO while spending the least wall-clock time of all algorithms, almost 12% of that of the exhaustive PSO. The binary-headered PSO and discrete-headered PSO do not achieve similar performance, because the binary-headered PSO is adversely affected by the dummy parameters, and the discrete-headered PSO makes too many dimension variations before the parameters are well tuned. The new VD-PSO algorithm is a viable and efficient solution to varying dimensional optimization problems.

Index Terms—Varying Dimension, Particle Swarm Optimization, Distribution Estimation, Gaussian Mixture Model

I. MOTIVATION AND INTRODUCTION

Particle Swarm Optimization (PSO) [1] has successfully solved various optimization problems [2]. The search space and the dimension of the particles are traditionally determined a priori and fixed throughout the search. However, there are problems in which a valid solution occupies only a subset of the potential dimensions. The dimension of a solution is defined as the number of components that constitute a complete and feasible solution, where each component assumes the same structure, with either a (set of) discrete number(s), a (set of) continuous number(s), or a (set of) hybrid structure(s). For instance, Kamath, Ye and Osadciw [3] implemented a binary-headered PSO for aircraft interrogation scheduling, where interrogations can happen between many pairs of aircraft and their nearby ground transceivers, but an optimal schedule is a subset of all potential interrogations that ensures accurate detection, mitigates resource allocation collisions, and minimizes transmission costs. In this binary-headered PSO, each particle is constructed with the maximal dimension. A binary header associated with each particle

differentiates the used parameters from the dummy parameters. The binary-headered PSO isolates the solution dimensions, and it is suitable for problems where the parameters at different dimensions do not affect each other in particle movements; the interactions between different dimensions appear only in the fitness evaluations. Another example of a varying-dimensional problem is attribute selection for protein classification by the discrete PSO algorithm proposed by Correa, Freitas and Johnson [4]. In attribute selection, the discrete values are the indices of attributes (up to 443 in this case), and they are unordered. The number of attributes in each particle is initialized randomly to be different, but the dimension of a particle does not change over iterations. Correa et al. propose to modify the discrete values within a particle (the indices of the selected attributes) following probabilities enhanced by the overlap between the current particle and the attractors. This procedure works well for unordered attribute selection. However, in probability density estimation problems, the attributes of the potential components are continuous, which invalidates the proportional likelihood selection method of Correa et al. Another difference is that the values of the component parameters are continuous and their magnitudes are ordered. Based on the ordering of the magnitudes, the parameters are connected across dimensions, and the linkage affects the movements of the particles.

Motivated by the need to select a varying number of components with continuous-valued parameters, varying dimensional particle swarm optimization (VD-PSO) is proposed. In VD-PSO, a particle includes a discrete number of components, and each component contains a collection of parameters. The dimensions of the particles are random, and they vary with a certain probability during the iterations. The parameters in different components are linked based on their values. VD-PSO also differs from Niching [5] or Stretching [6] based PSO algorithms, which are utilized to solve multi-modal problems. In multi-modal problems, the search space is still fixed and common to all particles, although there are multiple optimal solutions and the particles may be clustered or directed to search a partial region. The particles in VD-PSO cover a varying subset of the potential solution space (each part is a component), and the fitness evaluation is on the union of the components altogether. We compare VD-PSO with three other viable PSO-based algorithms for solving dimension-varying problems.

The first comparison is with the binary-headered PSO, which has proven successful in some applications [3]. The second comparison is with the exhaustive PSO, an intuitive extension of the canonical PSO that tries out all potential dimensions; in each trial the dimension is fixed, and the best result is selected from all the trials. The third comparison is with a straightforward application of discrete PSO to a discrete header that determines how many components exist. We designed the exhaustive PSO and discrete-headered PSO as further attempts to solve varying-dimensional problems. Our simulation indicates that, in terms of fitting accuracy, calculation time and memory storage altogether, the three comparison algorithms do not work as well as VD-PSO. Nevertheless, they show interesting properties and should be suitable for certain situations.

The rest of the paper is organized as follows. VD-PSO is formulated in Section II with its dimension-varying and parameter-updating rules. The experiments on estimating the probability density function of four datasets by Gaussian mixture models are set up in Section III, using the new VD-PSO algorithm compared with three other PSO algorithms. The performance and calculation resources are reported in Section IV with explanations of why the algorithms do or do not work. Section V concludes the paper with future work.

II. VARYING DIMENSIONAL PARTICLE SWARM OPTIMIZATION

In VD-PSO the definition of the particles and the fitness evaluation are straightforward, as in other PSO algorithms. The design challenge lies in the dimension-varying and parameter-updating rules. Dimension increment or decrement is determined by a probabilistic scheme, and the attractions of the particles need to be carefully evaluated, especially when the dimension of a particle differs from the dimension of the attractors (the local and global best particles). VD-PSO is versatile and applicable to problems whose components can be numerically ordered, such as the density estimation discussed in this paper and the economic dispatch (ED) problem [7].

A. VD-PSO Formulation

In this section, the mathematical formulation of VD-PSO is elaborated. An example of fitting a mixture model is utilized throughout the paper to facilitate understanding. Suppose that a dataset is given, and its statistical properties are modeled by a mixture model as
Hist_{data}(x) ≈ H_{pdf}(x) = \sum_{j=1}^{M} w_j F(x | v_{1,j}, ..., v_{N,j}),   (1)

where the data are grouped into bins, and the centers of the bins constitute a vector x. Hist_{data}(x) is the histogram of the data, which conforms to a mixture model with probability density function (PDF) H_{pdf}(x). The mixture model consists of several components. The components are assumed to follow a known distribution with PDF F(x | v_{1,j}, ..., v_{N,j}), where F is determined by its N parameters, (v_{1,j}, ..., v_{N,j}), and j denotes the component index ranging from 1 to M. M is the total number of components, w_j is the weight of the j-th component, and \sum_{j=1}^{M} w_j = 1. In a Gaussian mixture model, the distribution of the Gaussian components is

F(x | v_1, v_2) = \frac{1}{\sqrt{2\pi v_2^2}} e^{-\frac{(x - v_1)^2}{2 v_2^2}}   (2)

for x ∈ (−∞, ∞). All the unknown parameters are included in the particle as follows,

P : {M, (v_{1,1}, v_{2,1}, w_1), ..., (v_{1,M}, v_{2,M}, w_M)},   (3)

where N = 2 is used for Gaussian mixture models. M dictates the number of sets of parameters. The particles can have different M values, and may change their own M during each movement (iteration). There are four basic parts to the VD-PSO algorithm: updating M, updating the sets of parameters, fitness evaluation, and attraction of particles. These are described as follows.

1) Updating M: Determined by the physical modeling of the data, the integer number of components, M, is searched within a range, [M_{min}, M_{max}]. Without any other constraints, M is designed to be increased or decreased by 1 with a certain probability to avoid instability. The probability of the state transition of M is

P(M → max(M − 1, M_{min})) = p,
P(M → M) = 1 − 2p,
P(M → min(M + 1, M_{max})) = p.   (4)

The value p is a real-valued probability parameter within the range [0, 0.5], but preferably less than 0.25 so that M has more than half of the chances (1 − 2p > 0.5 when 0 < p < 0.25) to stay at its original value, ensuring stability. Currently, the probability of increasing is set to be the same as the probability of decreasing, since no evidence favors either. If such evidence of a favorable operation exists in an application, the probabilities for an increment or a decrement (p_1 and p_2) can be set differently as long as 1 − p_1 − p_2 > 0. Later, this paper discusses certain component values that may get pruned, affecting M; that variation of M is an involuntary transition.

2) Updating the Sets of Parameters: Once the transition of M is determined, the sets of parameters are adjusted. If M is increased by 1, a random set of parameters, within the range of the parameters, is added to the particle. If M is decreased by 1, a decision needs to be made on which set of parameters to discard. In density estimation, the components share the same metric and represent the same physical quantities, which enables the comparison of the components in a common sub-space, (v_1, v_2, w). If a problem with multiple dimensions can be simplified and optimized in its sub-dimensions, then it is a sub-space problem and solvable by fixed-dimensional canonical PSO. However, a mixture model fitting problem is not a sub-space problem, because multiple components interact with each other at the same time, and the mixture model cannot be optimized through any single component. Therefore, VD-PSO differs significantly in concept and mathematical details from a sub-space PSO problem. In the particles, \sum_{j=1}^{M} w_j = 1. The weight indicates how important a component is, and it is used to decide which component to delete, if needed.
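As an illustration, the following is a minimal sketch of the dimension-update step of parts 1) and 2), assuming components are stored as (v_1, v_2, w) lists. The range limits, the parameter ranges, and the helper names are our own assumptions, not the authors' implementation; the weights are renormalized after each change so that the constraint \sum w_j = 1 still holds.

```python
import random

M_MIN, M_MAX = 2, 10   # assumed search range [Mmin, Mmax]
P_VARY = 0.1           # p in Equation (4): P(increment) = P(decrement) = p

def random_component(v1_range=(-10.0, 20.0), v2_range=(0.5, 8.0)):
    # A random set of parameters (v1, v2, w) within assumed parameter ranges.
    return [random.uniform(*v1_range), random.uniform(*v2_range), random.random()]

def renormalize(components):
    # Keep the constraint sum(w_j) = 1 after adding or deleting a component.
    total = sum(c[2] for c in components)
    for c in components:
        c[2] /= total

def update_dimension(components):
    r = random.random()
    if r < P_VARY and len(components) > M_MIN:
        # M -> M - 1: discard the least important (lowest-weight) component.
        components.remove(min(components, key=lambda c: c[2]))
    elif r < 2.0 * P_VARY and len(components) < M_MAX:
        # M -> M + 1: add a random set of parameters within range.
        components.append(random_component())
    # With probability 1 - 2p (and at the range limits), M is unchanged.
    renormalize(components)
```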

3) Fitness Evaluation: After the M value and the component parameters are set, the composite mixture distribution is a continuous PDF, as in Equation (1). Among data density fitting tests, the Kolmogorov–Smirnov (K-S) test [8] is a powerful goodness-of-fit test. When the data are abundant, the histogram is an efficient representation, and the K-S test is adapted to measure the absolute difference between the fitted model and the empirical distribution without considering statistical significance. The summation of the histogram is the number of observations in the dataset. The histogram is normalized by the number of observations to approximate the PDF by

H_{norm}(i) = H(i) / \sum_{i=1}^{B} H(i),   (5)

where H(i) is the number of data points in the i-th bin of the histogram, and the number of bins is B. H_{norm}(i) is the normalized histogram. The width of each bin, or the resolution of the histogram, is not necessarily uniform, and the boundaries of the i-th bin are denoted as [b_{i,min}, b_{i,max}), where b_{i,min} = b_{i−1,max} and b_{i,max} = b_{i+1,min}. On the other hand, the theoretical composite PDF, which is a continuous function, is evaluated at the center of each histogram bin, with

x = {(b_{1,min} + b_{1,max})/2, ..., (b_{B,min} + b_{B,max})/2},   (6)

H_{pdf}(x | P) = \sum_{j=1}^{M} w_j F(x | v_{1,j}, ..., v_{N,j}),   (7)

where P is the particle defined in Equation (3), and F(x | v) is the fitted model evaluated at x with distribution parameter vector v, specifically by Equation (2) for the Gaussian mixture model. A normalized histogram is equivalent to a mass density function, and the values on the continuous PDF curve must be multiplied by the width of the interval to match a mass density function, as

H_{mdf}(x_i | P) = (b_{i,max} − b_{i,min}) \sum_{j=1}^{M} w_j F((b_{i,min} + b_{i,max})/2 | v_{1,j}, ..., v_{N,j}),   (8)

where i = 1, ..., B. Based on H_{norm}(x) (Equation (5)) and H_{mdf}(x | P) (Equation (8)), the fitness of a particle P is defined as the mean square error between H_{norm}(x) and H_{mdf}(x | P),

fitness(P) = \sum_{i=1}^{B} (H_{mdf}(x_i | P) − H_{norm}(x_i))^2,   (9)

which is minimized.
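The pipeline of Equations (5)–(9) can be sketched as follows, assuming Gaussian components per Equation (2) with v_2 as the standard deviation. The function and variable names, the synthetic data, and the binning below are our own illustration.

```python
import numpy as np

def gaussian_pdf(x, v1, v2):
    # Equation (2): Gaussian PDF with mean v1 and standard deviation v2.
    return np.exp(-(x - v1) ** 2 / (2.0 * v2 ** 2)) / np.sqrt(2.0 * np.pi * v2 ** 2)

def fitness(components, data, bin_edges):
    counts, edges = np.histogram(data, bins=bin_edges)
    h_norm = counts / counts.sum()                  # Equation (5): normalized histogram
    centers = 0.5 * (edges[:-1] + edges[1:])        # Equation (6): bin centers
    widths = np.diff(edges)                         # bins need not be uniform
    h_pdf = sum(w * gaussian_pdf(centers, v1, v2)   # Equation (7): composite mixture PDF
                for (v1, v2, w) in components)
    h_mdf = widths * h_pdf                          # Equation (8): mass density per bin
    return float(np.sum((h_mdf - h_norm) ** 2))     # Equation (9): to be minimized

# Usage example: fitness of a two-component guess on synthetic data.
data = np.concatenate([np.random.normal(3, 2, 5000), np.random.normal(9, 4, 5000)])
print(fitness([(3.0, 2.0, 0.5), (9.0, 4.0, 0.5)], data, np.linspace(-10, 25, 50)))
```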

4) Attraction of Particles: A particle's position is updated by its velocity, and its current velocity is determined by its previous velocity and outside influences. Traditional PSO algorithms update the velocity by

u_{t+1} = w u_t + c_1 r_1 (P_{l,t} − P_t) + c_2 r_2 (P_{g,t} − P_t),   (10)

where u_t is the velocity of the particle at time instance t, u_{t+1} is the velocity at the next step, and w is an inertia constant, less than 1, that retains information from the previous velocity. P_t is the current particle's location, which needs to be updated. P_{l,t} is the local best location, the best position the current particle itself has visited, representing its own cognitive awareness. P_{g,t} is the location of the global best particle, representing the social influence. c_1 and c_2 are constants, and r_1 and r_2 are uniform random numbers. The location of the current particle is updated by

P_{t+1} = P_t + u_{t+1}.   (11)

In this paper, the locations of the particles do not necessarily reside in the same space, as a result of the varying number of dimensions. Thus, the location updating is not as straightforward as in canonical PSO algorithms. An important step is to match up the components. In the mixture model example, all the components share the same subspace, (v_1, v_2, w), and thus the components within two different particles can set up a multiple-to-multiple mapping without the need for cross-space projection. The mapping between the components in two particles is determined by the physical meaning of the component parameters. In the mixture model example, the mean parameter, v_1, indicates the location of the component along the data value axis, and the variance, or shape, parameter, v_2, indicates the characteristic of that component. Considering that a dataset may contain significant components at different means but with similar variances, v_1 is more discriminative than v_2 as the indicator to set up the mapping. After M is updated, and the components are hence deleted or expanded, the fitness of all the particles in the population is evaluated. The particles then need to be updated by their corresponding attractors. Suppose that there are M_p components in the current particle and M_a components in the attractor. L = min(M_p, M_a) is the number of mapping links. If there are no fewer components in the attractor than in the current particle (M_p ≤ M_a), then all the components in the current particle get updated by the attractor. Otherwise, if M_p > M_a, some components in the current particle (M_p − M_a of them) do not have a matching component to be attracted to, and they are updated by inertia only. The components are associated by the following procedure:
1) Sort the components in the current particle by importance, i.e., the weight w, in descending order.
2) Compare the parameter v_1 of the first component in the current particle with the v_1's in the attractor to select the closest one (say, the a_1-th component in the attractor). Set up the mapping 1 → a_1.

3) Put a_1 into a map list. Exclude the first component in the current particle and the a_1-th component in the attractor from further comparison for linkage.
4) Repeat steps 2 and 3 another L − 1 times.
This updating is illustrated in the diagram of Figure 1, where the attractor is a general notation that can be either the global or the local best particle.
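A minimal sketch of this association procedure follows, with particles represented as lists of (v_1, v_2, w) components; the function name and data layout are our own illustration.

```python
def associate(particle, attractor):
    # Greedy matching: visit the particle's components in descending weight
    # order and link each to the unclaimed attractor component whose mean v1
    # is closest; L = min(Mp, Ma) links are produced.
    links = {}                                 # particle index -> attractor index
    free = set(range(len(attractor)))          # unclaimed attractor components
    by_weight = sorted(range(len(particle)),
                       key=lambda k: particle[k][2], reverse=True)
    for k in by_weight:
        if not free:                           # Mp > Ma: the rest keep inertia only
            break
        a_k = min(free, key=lambda j: abs(attractor[j][0] - particle[k][0]))
        links[k] = a_k                         # set up the mapping k -> a_k
        free.remove(a_k)                       # exclude a_k from later comparisons
    return links
```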

[Fig. 1. Particle location updating diagram: the current particle's M_p components are sorted by weight w and each is linked to the closest attractor component by v_1; if M_p ≤ M_a, all M_p components are updated by the attractor, otherwise the unmatched components (M_a + 1, ..., M_p) are not changed by the attractor.]

Based on the mapping between the particle and its attractor, the updating is applied to the corresponding components. The velocity of the current particle is first attenuated from the previous velocity by

u_{t+0.3, k_1} = w u_{t, k_1},   (12)

where k_1 ∈ [1, M_p] is the index of the components that need to be updated; k_1 covers all M_p components in the current particle. The time index t+0.3 in u_{t+0.3, k_1} denotes that the updating is not finished yet. The velocity is then updated by the local best particle. The mapping of the components from the current particle, P_t, to the local best particle, P_{l,t}, is {k_2 → l_{k_2}}. For each updatable component with index k_2 ∈ [1, min(M_p, M_l)], where M_l is the number of components in the local best particle, the velocity is updated by

u_{t+0.6, k_2} = u_{t+0.3, k_2} + c_1 r_1 (P_{l,t,l_{k_2}} − P_{t,k_2}).   (13)

The velocity is lastly updated by the global best particle. The mapping of the components from the current particle, P_t, to the global best particle, P_{g,t}, is {k_3 → g_{k_3}}. The k_3's in this social update may not be identical to the k_2's in the cognitive update, as the number of components may differ between the local and global best particles. For each updatable component with index k_3 ∈ [1, min(M_p, M_g)], where M_g is the number of components in the global best particle, the velocity is updated by

u_{t+1, k_3} = u_{t+0.6, k_3} + c_2 r_2 (P_{g,t,g_{k_3}} − P_{t,k_3}).   (14)

Finally, the locations of the current particles are updated by

P_{t+1} = P_t + u_{t+1}.   (15)

This final step is identical to the canonical PSO algorithms. The differences between canonical PSO and VD-PSO are the varying number of components in each update and the matching of components between the particle and its attractors.
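A condensed sketch of this staged update, reusing the associate() helper from the earlier sketch, is given below. The inertia and acceleration constants are assumed values, and the random numbers r_1 and r_2 are drawn once per attractor here for brevity.

```python
import random

W, C1, C2 = 0.7, 1.5, 1.5   # assumed inertia and acceleration constants

def move(particle, velocity, local_best, global_best):
    # Equation (12): inertia attenuates the velocity of all Mp components.
    for vel in velocity:
        for d in range(3):                     # dimensions (v1, v2, w)
            vel[d] *= W
    # Equations (13)-(14): cognitive then social attraction, applied only to
    # the components linked by associate(); surplus components keep inertia.
    for attractor, c in ((local_best, C1), (global_best, C2)):
        r = random.random()                    # r1 or r2, uniform
        for k, a_k in associate(particle, attractor).items():
            for d in range(3):
                velocity[k][d] += c * r * (attractor[a_k][d] - particle[k][d])
    # Equation (15): position update, identical to canonical PSO.
    for k, vel in enumerate(velocity):
        for d in range(3):
            particle[k][d] += vel[d]
```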

III. EXPERIMENT SETUP

The components of a mixture model's probability density function (either Gaussian, Weibull, or other distributions) are not orthogonal, and hence the distribution estimate produced by a mixture model may be neither unique nor analytic. The goodness of fitting is evaluated by the fitness defined in Section II. In order to test the robustness and efficiency of our proposed VD-PSO algorithm, we carried out experiments on four datasets. The first two datasets contain three and six equally weighted Gaussian components, respectively. The next two datasets contain three and six unequally weighted Gaussian components, respectively. The density estimation is implemented on a histogram instead of on the raw data points, and the PDFs of the four datasets with known components are converted to histograms by rounding in Figure 2.

A. Equally Weighted Known Components

The first dataset, with three equally weighted components (M = 3), has the component parameters listed in Table I with the weights in the third column. Its histogram is the upper-left plot in Figure 2. The second dataset, with six equally weighted components (M = 6), has the component parameters listed in Table II with the weights in the third column. Its histogram is the upper-right plot in Figure 2. The six components are deliberately selected to be close, to illustrate that the solution may be neither unique nor analytic.


[Fig. 2. The histograms of the four datasets with known components: three (upper-left) and six (upper-right) equally weighted components, and three (lower-left) and six (lower-right) unequally weighted components, plotted against the independent variable.]

B. Unequally Weighted Known Components

The third dataset, with three unequally weighted components (M = 3), has the component parameters listed in Table I with the weights in the fourth column. Its histogram is the lower-left plot in Figure 2. The fourth dataset, with six unequally weighted components (M = 6), has the component parameters listed in Table II with the weights in the fourth column. Its histogram is the lower-right plot in Figure 2.
TABLE I
THE GAUSSIAN MIXTURE MODEL WITH THREE EQUALLY AND UNEQUALLY WEIGHTED COMPONENTS

  v_{1,j}   v_{2,j}   equal w_j   unequal w_j
     3         2        0.333        0.2
     5         6        0.333        0.5
     9         4        0.334        0.3

TABLE II
THE GAUSSIAN MIXTURE MODEL WITH SIX EQUALLY AND UNEQUALLY WEIGHTED COMPONENTS

  v_{1,j}   v_{2,j}   equal w_j   unequal w_j
     2         1        0.167        0.1
     3         2        0.167        0.2
     5         6        0.167        0.3
     7         2        0.167        0.2
     9         4        0.167        0.1
    12         1        0.165        0.1

C. Algorithm Comparison

As a comparison to VD-PSO, the binary-headered PSO, the exhaustive PSO, and the discrete-headered PSO are implemented. The last two methods are intuitive attempts, also proposed by us, to solve varying-dimensional problems.

In the binary-headered PSO, full-dimensional particles are constructed with the maximum number of components (assumed to be 10), and a binary header is included in each particle. Depending on the binary header, only some of the parameters, those with 1's in the header, are used in evaluating the fitness. In the binary-headered PSO, extra storage and calculations are spent on the dummy parameters. The dummy parameters may adversely affect the performance, because the dummy parameters in an attractor influence the corresponding parameters of a particle, which may be active, even though the dummy parameters are not involved in the fitness evaluation. Nevertheless, if the attractor (global or local best) has certain digits set to 0 in the binary header, the attracted particles tend to have 0's at the same locations, and the effect of the dummy components tends to be small. The binary header is updated by binary PSO as in reference [9]. The velocity of a binary bit X_t is u_{bin}, evaluated by Equation (10). u_{bin} is first transformed by a sigmoid function as

s_{bin} = 1 / (1 + e^{−u_{bin}}),   (16)

and the final value of that binary bit is

X_{t+1} = U(s_{bin} − r_3),   (17)

where U(·) is a unit step function and r_3 is a uniform random number.

A binary-headered PSO implicitly allows different numbers of components, and a single run completes a trial, as in VD-PSO.

In the exhaustive PSO, canonical PSO is applied to find the best fitting for each feasible number of components (from 2 to 10). After all numbers of components are tried, the fitness values for the different numbers of components are compared, and the minimum fitness, together with the corresponding number of components, is recorded. This method is essentially a brute-force search of the component number over several fixed candidate parameter spaces, and it breaks the linkage between the particles across the evaluations of adjacent numbers of components.

In the discrete-headered PSO, a discrete header dictating the number of components is updated by a discrete PSO algorithm [10]. The multi-ary discrete PSO algorithm proposed by Veeramachaneni, Osadciw and Kamath [9] is implemented here. The velocity of the discrete header X_t is u_{dis}, evaluated by Equation (10). u_{dis} is first transformed by a sigmoid function as

s_{dis} = M / (1 + e^{−u_{dis}}),   (18)

and the final value of that discrete number is

X_{t+1} = round(s_{dis} + σ (M − 1) r_4),   (19)

where round(·) is a rounding operator, σ is an adjustable parameter, and r_4 is a uniform random number. If X_{t+1} falls outside the extreme values of 0 and M − 1 under this random scheme, it is adjusted to the corresponding extreme value.
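A minimal sketch of the header update of Equations (18)–(19) follows. The symbol σ (and thus the keyword name sigma) is our reconstruction of the adjustable parameter whose glyph was lost in extraction, and the function name is ours.

```python
import math
import random

def update_header(u_dis, M, sigma=0.2):
    # Equation (18): sigmoid of the header velocity, scaled to M values.
    s_dis = M / (1.0 + math.exp(-u_dis))
    # Equation (19): randomized rounding with adjustable spread sigma.
    r4 = random.random()                  # uniform random number
    x = round(s_dis + sigma * (M - 1) * r4)
    # Values outside [0, M-1] are adjusted to the corresponding extreme.
    return max(0, min(int(x), M - 1))
```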

A discrete-headered PSO allows different numbers of components, and a single run completes a trial. The updating of the components follows the same rule as in VD-PSO.

IV. EXPERIMENTAL RESULTS

Because of the randomness involved in the algorithms and the non-uniqueness of the density estimation problem, the best results found by PSO differ across trials. 100 trials are carried out on each dataset by each algorithm. The mean and standard deviation of the fitness values change little after about 20 trials, as illustrated in Figure 3, which indicates that the number of trials is sufficient. The numerical results are provided in Table III with the average fitness values and the standard deviations achieved over the 100 trials. The variation in performance is further illustrated by the box plot in Figure 4.
TABLE III
AVERAGE FITNESS COMPARISON WITH STANDARD DEVIATION (THE LOWER, THE BETTER). VD-PSO: VARYING-DIMENSIONAL PSO, B-PSO: BINARY-HEADERED PSO, E-PSO: EXHAUSTIVE PSO, D-PSO: DISCRETE-HEADERED PSO

Dataset     Algorithm   Mean                 Standard Deviation
3 equal     VD-PSO      0.0019281941246989   0.0018969909569263
3 equal     B-PSO       0.1140728887922870   0.0474392312545014
3 equal     E-PSO       0.0006885234732987   0.0007156515786001
3 equal     D-PSO       0.0162925569967957   0.0054736947697648
6 equal     VD-PSO      0.0277046618964822   0.0107943381434862
6 equal     B-PSO       0.1778608650130660   0.0487448774346023
6 equal     E-PSO       0.0160631062054728   0.0091813059144331
6 equal     D-PSO       0.0734462729637343   0.0130126949649935
3 unequal   VD-PSO      0.0014622718293250   0.0011356950795961
3 unequal   B-PSO       0.0618603084970137   0.0395654377391042
3 unequal   E-PSO       0.0007933367941558   0.0005092401251780
3 unequal   D-PSO       0.0074931974678837   0.0019082254476293
6 unequal   VD-PSO      0.0118673691678036   0.0054619261013581
6 unequal   B-PSO       0.1165901541655910   0.0515274309886316
6 unequal   E-PSO       0.0096214866259121   0.0032608458406728
6 unequal   D-PSO       0.0306238731541168   0.0047533632220672

[Fig. 3. Comparison of the mean and standard deviation of the fitness of the different PSO algorithms on Gaussian mixture model fitting of dataset 4, as functions of the trial number. The upper row is in absolute scale, and the lower row is in dB to magnify the small values.]

When the number of components is small, exhaustive PSO reaches an average fitness about one half or one third that of VD-PSO, but both values are small, and the difference may not matter if both are below the density fitting tolerance, which is usually specified by the application. When the number of components is large, VD-PSO works comparably to exhaustive PSO. It should be pointed out that the performance gain of exhaustive PSO comes at more than 8 times the time and calculations, as illustrated in Table IV. Figure 5 shows the wall-clock time for density estimation on dataset 1. VD-PSO spends the least wall-clock time among all algorithms, almost 12% of the time of the exhaustive PSO. Calculation time is important for real-time systems, and the savings in memory and calculations are crucial for distributed computation systems such as wireless sensor networks.
[Fig. 4. Boxplot comparison of the fitness variations of the different PSO algorithms on Gaussian mixture model fitting of dataset 4 over 100 trials (the lower, the better).]

[Fig. 5. Wall-clock time comparison (minutes per trial; 500 iterations, 100 particles) of the different PSO algorithms on Gaussian mixture model fitting of dataset 1.]

In each trial, the search stops after 500 iterations. This iteration number is selected based on the convergence speed of the algorithms in a few pre-trials.

TABLE IV
CALCULATION TIME COMPARISON (THE LOWER, THE BETTER). VD-PSO: VARYING-DIMENSIONAL PSO, B-PSO: BINARY-HEADERED PSO, E-PSO: EXHAUSTIVE PSO, D-PSO: DISCRETE-HEADERED PSO

time (seconds)   VD-PSO        B-PSO      E-PSO      D-PSO
3 equal          14.20891017   13.63025   13.34575   13.76101
6 equal          18.02801089   17.9426    16.72608   16.24746
3 unequal        109.8114595   102.185    101.3227   104.4016
6 unequal        30.65605916   31.42115   28.62866   30.10395

Meanwhile, in exhaustive PSO all the potential dimension numbers are tried, and except for the configuration with the minimum fitness, the other configurations incur larger fitness values and variations, as illustrated in Figure 6. The red circles indicate the correct number of components, yet the selections based on minimum fitness do not always coincide with them. The calculation requirements of VD-PSO, binary-headered PSO or discrete-headered PSO are comparable to a single configuration of exhaustive PSO. If all the configurations of exhaustive PSO were taken into account in the average fitness evaluation, the average fitness would be worse than that of the other algorithms.
[Fig. 6. The mean and standard deviation of the fitness for each number of components M tried in exhaustive PSO on the four datasets, shown as the middle line and gray region. The red circles indicate the known number of components.]

It is observed that the fitness values of nearly all algorithms on dataset 1 are worse than on dataset 3, and the fitness values on dataset 2 are worse than on dataset 4. This observation indicates that the difficulty of a density estimation problem is determined not by whether the weighting of the components is equal, but rather by the smoothness of the PDF (please refer to Figure 2). The performance of all the algorithms illustrates a consistent difficulty metric across the four datasets. VD-PSO provides a density estimation competitive with exhaustive PSO, with savings in calculations and time. VD-PSO also works more favorably than the binary-headered PSO and the discrete-headered PSO on all four datasets. These experiments do not show, though, that binary-headered PSO or discrete-headered PSO would not work well for other applications; the binary-headered PSO has been applied successfully in UWB waveform design [11], among others. The reason that the binary-headered PSO does not work well here may be the linkage between the component parameters, through which the dummy parameters in the attractor adversely affect the active parameters in a particle.

The comparison between VD-PSO and discrete-headered PSO inspires an interesting observation on how the variation of the discrete header affects the solution search. These two algorithms differ only in the variation of the discrete header. In discrete-headered PSO, the σ used in Equation (19) is first selected as 0.2 [9], and the discrete header varies in more than 80% (in the sense of the mean) of the iteration steps of all particles, as illustrated in Figure 7. With such high variation in the number of components, the parameters associated with the components may get cut or re-initialized before they are well tuned. A smaller value of σ is thus tried, at 0.002, which corresponds to a very small standard deviation of the Gaussian distribution around m (m ∈ [1, M]) in the multi-ary discrete PSO. A small σ limits the step size of the jumps from one value to another, but it does not eliminate the jumping, because the random factor is still in effect. When σ = 0.002, the variation of the discrete header concentrates mostly at 1, instead of being distributed from 1 to 8 as when σ = 0.2. Though the step size of the jumps decreases, the discrete header with σ = 0.002 still varies in more than 60% of the iterations.

[Fig. 7. Combined ratio of dimension variations through expansion or deletion by the discrete-headered PSO on the four datasets, with the parameter σ in the discrete PSO tried at 0.2 and at the extremely small value 0.002. In both cases, the discrete-headered PSO shows high variation in the dimension, causing immature development of the parameters and poor performance.]

The above observation illustrates that the discrete PSO is designed to explore the solution space thoroughly, but its high variation makes it unsuitable to act as a dimension indicator. In comparison, the VD-PSO algorithm employs a simple probabilistic scheme that varies the dimension indicator in about 20% of the iterations in the simulation (10% expansion and 10% deletion). The fitness is used to guide the linked update, and the component parameters are able to develop fully and constitute a good solution. In summary, VD-PSO is designed for varying-dimensional

components with continuous-valued parameters, and it is a viable and efficient solution to such optimization problems.

V. CONCLUSIONS AND FUTURE WORK

A new varying-dimensional particle swarm optimization (VD-PSO) algorithm is proposed with a probabilistic dimension variation rule and a linked parameter updating rule. VD-PSO is designed for problems with a varying number of components and continuous-valued parameters for each component. The linkage between the components of the current particle and its attractors is determined by association. A Gaussian mixture model (GMM) example is utilized throughout the paper to explain the procedure and the advantages of incorporating the number of components as a parameter that guides the updating of the other, continuous, parameters. Three other methods, the last two of which are our own further attempts to solve varying-dimensional problems, are compared with VD-PSO. The first comparison algorithm is the binary-headered PSO, which constructs the full-size parameter set and uses a binary header to distinguish the used parameters from the dummy parameters. The second comparison algorithm is the exhaustive PSO, which tries each potential number of parameters and selects the best result from among the trials. The third comparison algorithm is the discrete-headered PSO, where a discrete number of components is updated by the discrete PSO algorithm. The simulation results of the four algorithms over 100 trials show that VD-PSO is competitive in performance with the exhaustive PSO, and it is more efficient than the exhaustive PSO in calculations and time. VD-PSO achieves better fitness than the binary-headered PSO and discrete-headered PSO, and it is also more efficient in storage than the binary-headered PSO.

VD-PSO handles varying dimensions effectively compared with other PSO algorithms as well as with non-PSO solutions, such as expectation maximization (EM), auto-regressive, or spectral estimation methods. In the density estimation problem with mixture models, if the number of components is known and the components are assumed to be Gaussian, the EM algorithm is widely used to estimate the means and variances of the Gaussian mixture components [12]. If the number of components is unknown, a search for the number of components is needed as well [13]. VD-PSO, on the other hand, updates the number of components and the corresponding parameters simultaneously. Furthermore, if the components are not Gaussian, then in EM the maximum likelihood estimator of the parameters needs to be re-derived for each potential distribution, which is usually a complex procedure. In contrast, in VD-PSO only the corresponding probability density function in the fitness evaluation needs to be changed. This minimal change avoids the theoretical re-derivation of the estimator for different distributions, saving complexity. This paper hopefully begins to hint at the many applications and corresponding PSO approaches that may solve difficult and intractable problems. A study comparing these types of algorithms would be interesting research to follow this paper.

VD-PSO addresses the problem of varying dimensions, and it can be refined by adopting new features such as selecting which attractor to follow. Veeramachaneni, Peram, Mohan and Osadciw [14] incorporate near-neighbor interactions based on fitness. This adds more influence from the application in the search, resulting in faster and more accurate convergence. The distribution matching could be added into the attraction. In the future, the dimension of a particle can be set to approach the attractor's dimension with care. Another kind of VD-PSO problem, where the linkage between the current particle and its attractor requires a transformation to project onto a common space, will also be studied.

ACKNOWLEDGEMENT

The authors are grateful for the discussions with Kalyan Veeramachaneni and Ganapathi Kamath on the binary and discrete PSO algorithms. Thanks to the anonymous reviewers for the insightful questions and suggestions.

REFERENCES
[1] J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Intl. Conf. on Neural Networks (Perth, Australia), vol. IV, IEEE Service Center, Piscataway, NJ, 1995, pp. 1942-1948.
[2] J. Kennedy, R. C. Eberhart, and Y. Shi, Swarm Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., March 23, 2001.
[3] G. Kamath, X. Ye, and L. A. Osadciw, "Using swarm intelligence and Bayesian inference for aircraft interrogation," in Wireless Communication and Networks Conference 2008, Las Vegas, NV, Apr. 2008.
[4] E. S. Correa, A. A. Freitas, and C. G. Johnson, "A new discrete particle swarm algorithm applied to attribute selection in a bioinformatics data set," in GECCO '06: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation. New York, NY, USA: ACM, 2006, pp. 35-42.
[5] A. Passaro and A. Starita, "Particle swarm optimization for multimodal functions: A clustering approach," Journal of Artificial Evolution and Applications, vol. 2008, no. 482032, 15 pages, 2008.
[6] K. E. Parsopoulos, V. P. Plagianakos, G. D. Magoulas, and M. N. Vrahatis, Improving Particle Swarm Optimizer by Function Stretching. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2001, vol. 54.
[7] D. Liu and Y. Cai, "Taguchi method for solving the economic dispatch problem with nonsmooth cost functions," IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 2006-2014, Nov. 2005.
[8] A. Baklizi, "Weighted Kolmogorov-Smirnov type tests for grouped Rayleigh data," Applied Mathematical Modelling, vol. 30, no. 5, pp. 437-445, May 2006.
[9] K. Veeramachaneni, L. Osadciw, and G. Kamath, "Probabilistically driven particle swarms for discrete multi valued problems: Design and analysis," in Proceedings of the IEEE Swarm Intelligence Symposium, Hawaii, April 1-5, 2007.
[10] J. Pugh and A. Martinoli, "Discrete multi-valued particle swarm optimization," in IEEE Swarm Intelligence Symposium, 2006, pp. 103-110. [Online]. Available: http://www.computelligence.org/sis/
[11] W. Gao and L. Osadciw, "A MUI deduction pulse shape design scheme for UWB communications," in Proc. 7th IEEE Upstate New York Workshop on Communications, Sensors and Networking, Nov. 2007.
[12] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977. [Online]. Available: http://www.jstor.org/stable/2984875
[13] Z. Zivkovic and F. van der Heijden, "Recursive unsupervised learning of finite mixture models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 5, pp. 651-656, 2004.
[14] K. Veeramachaneni, T. Peram, C. Mohan, and L. A. Osadciw, "Optimization using particle swarm with near neighbor interactions," in GECCO 2003, Lecture Notes in Computer Science, vol. 2723, Springer Verlag, 2003.
