
A Hierarchical Clustering Strategy to Improve the Biological Plausibility of an Ecology-based Evolutionary Algorithm

Rafael Stubs Parpinelli (1,2) and Heitor Silvério Lopes (2)

(1) Applied Cognitive Computing Group, Santa Catarina State University, Joinville, Brazil
(2) Bioinformatics Laboratory, Federal Technological University of Paraná, Curitiba, Brazil
parpinelli@joinville.udesc.br, hslopes@utfpr.edu.br

Abstract. It is well known that, in nature, populations are dynamic in space and time. This means that habitats change over time and that their formation is not deterministic. This work uses the concepts of ecological relationships, ecological successions and probabilistic formation of habitats to build a cooperative search algorithm, named ECO. It explores the use of a hierarchical clustering technique to probabilistically set the habitats of the computational ecosystem. The Artificial Bee Colony (ABC) algorithm was used in the experiments, in which benchmark mathematical functions were optimized. Results were compared among ABC running alone and ECO with and without hierarchical clustering. The ECO algorithm with hierarchical clustering performed better than the other approaches, possibly thanks to the ecological interactions (intra- and inter-habitats) that enabled the co-evolution of populations and to a more bio-plausible probabilistic strategy for habitat definition. Also, a critical parameter was suppressed.

Keywords: optimization; cooperative search; co-evolution; habitats; ecology; hierarchical clustering; single-link algorithm; biological plausibility

1 Introduction

The search for biologically plausible ideas, models and computational paradigms has always drawn the interest of computer scientists, particularly those from the Natural Computing area [1]. The main feature of bio-plausible systems is the use of natural inspiration to some degree: the designers of these systems generally aim to achieve biologically plausible functionalities in non-biological contexts, such as the optimization of engineering problems.

The authors would like to thank the Brazilian National Research Council (CNPq) for the research grant to H.S. Lopes, as well as UDESC (Santa Catarina State University) and the FUMDES program for the doctoral scholarship to R.S. Parpinelli.

Parpinelli, R.S and Lopes, H.S

The concept of optimization can be abstracted from several natural processes, such as the evolution of species, the behavior of social groups, the dynamics of the immune system, the strategies of searching for food and the ecological relationships of different populations. Most of these cases were the source of inspiration for the development of optimization algorithms, such as those of evolutionary computation (EC) and swarm intelligence (SI), which currently offer a wide range of strategies for optimization [2][3]. It is worth mentioning that most bio-inspired algorithms focus on and take inspiration from only specific aspects of natural phenomena. However, in nature, biological systems are interlinked with each other, e.g. in biological ecosystems [4][5]. In [6] the authors first introduced the potential of some ecological concepts (e.g., habitats, ecological relationships and ecological successions), presenting a simplified ecological-inspired algorithm. In this work we use a hierarchical clustering algorithm [7][8] as a biologically plausible strategy for creating habitats in an ecological-inspired system. The aim is to compare the results obtained by the algorithm with the use of ecological concepts, without the use of ecological concepts (application of stand-alone algorithms), and with the use of hierarchical clustering.

2 Hierarchical Clustering

Hierarchical clustering refers to methods that produce a nested series of partitions [9]. Single-link and complete-link algorithms are the most popular hierarchical clustering algorithms. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between any two points (or patterns) in the different clusters. In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between any two points in the different clusters. In either case, two clusters are merged to form a larger cluster based on the minimum distance criterion. In this work we use the single-link algorithm. A hierarchical algorithm yields a dendrogram representing the nested grouping of patterns and the similarity levels at which groupings change [7]. Figure 1 gives a sample distance matrix for five items (1 to 5). In our context, each item represents the centroid of a given population and the distance matrix is computed using the Euclidean distance metric. The single-link algorithm returns the linkage information needed to build a dendrogram (Figure 2) in a matrix with three columns and $N_Q - 1$ rows, where $N_Q$ is the number of items [8]. In Figure 2, each row identifies a node and represents a link between clusters. The first column identifies the nodes, and the two subsequent columns identify the clusters that have been linked. Negative items represent newly formed binary clusters. The last column contains the distance between these objects. The dendrogram of Figure 3 shows the series of merges that result from using the single-link technique. The height at which two clusters are merged in the dendrogram reflects

Hierarchical Clustering Improving Bio-Plausibility of an Ecology-based EA

the distance between the two clusters. The dendrogram can be cut at different levels to yield different clusterings of the data. For example, if we define a cutoff level at 3.0 on the y-axis, three clusters are formed: one with items 1 and 2, another with items 4 and 5, and another with item 3.
Items    1      2      3      4      5
1       0.0    0.5    4.3    3.8    4.8
2       0.5    0.0    4.7    3.3    4.4
3       4.3    4.7    0.0    6.2    6.6
4       3.8    3.3    6.2    0.0    1.1
5       4.8    4.4    6.6    1.1    0.0

Fig. 1. Distance matrix for five items.

Node    Item_left    Item_right    Distance
1           1            2           0.8
2           4            5           2.7
3          -1           -2           3.8
4           3           -3           5.7

Fig. 2. Single-link result for the data in Fig. 1.
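The single-link rule described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the experiments: the function name `single_link` and the list-of-merges output format are assumptions made for this example. Run on the matrix of Fig. 1, it reproduces the merge order of Fig. 2; the merge heights it reports are read directly from the matrix entries as printed and are not claimed to match the Distance column of Fig. 2.

```python
def single_link(dist):
    """Agglomerative single-link clustering on a symmetric distance matrix.

    Items are numbered from 1. Returns the merges in the order they happen,
    each as (members_of_first_cluster, members_of_second_cluster, distance).
    """
    clusters = [frozenset([i]) for i in range(1, len(dist) + 1)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: smallest pairwise distance between the two clusters.
                d = min(dist[i - 1][j - 1]
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Distance matrix of Fig. 1 (hypothetical variable name).
FIG1 = [
    [0.0, 0.5, 4.3, 3.8, 4.8],
    [0.5, 0.0, 4.7, 3.3, 4.4],
    [4.3, 4.7, 0.0, 6.2, 6.6],
    [3.8, 3.3, 6.2, 0.0, 1.1],
    [4.8, 4.4, 6.6, 1.1, 0.0],
]

merges = single_link(FIG1)
for left, right, d in merges:
    print(left, right, d)
```

The quadratic search over cluster pairs keeps the sketch short; it is adequate for a few hundred populations but is not an efficient single-link algorithm.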
[Figure: dendrogram. x-axis: items in the order 1, 2, 4, 5, 3; y-axis: merge distance, from 1 to 5.5.]

Fig. 3. Dendrogram generated using linkage information from Fig. 2.
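The cutoff example in the text can be checked with a short sketch. It assumes the linkage encoding of Fig. 2 (node k is row k, and a negative item -k refers to the cluster formed at node k); the function names `leaves` and `cut` are hypothetical.

```python
def leaves(linkage, item):
    """Expand an item into original items; a negative item -k refers to node k."""
    if item > 0:
        return [item]
    left, right, _ = linkage[-item - 1]
    return leaves(linkage, left) + leaves(linkage, right)


def cut(linkage, cutoff, item=None):
    """Clusters obtained by cutting the dendrogram at the given height.

    Walks down from the root and stops splitting as soon as a node's merge
    distance falls below the cutoff.
    """
    if item is None:
        item = -len(linkage)  # the last node is the root of the dendrogram
    if item > 0:
        return [[item]]
    left, right, d = linkage[-item - 1]
    if d < cutoff:
        return [sorted(leaves(linkage, item))]
    return cut(linkage, cutoff, left) + cut(linkage, cutoff, right)


# Linkage rows of Fig. 2 as (item_left, item_right, distance).
FIG2 = [(1, 2, 0.8), (4, 5, 2.7), (-1, -2, 3.8), (3, -3, 5.7)]

clusters = cut(FIG2, 3.0)
print(clusters)
```

With the cutoff at 3.0 the sketch returns the three clusters named in the text: items 1 and 2, items 4 and 5, and item 3 alone.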

3 The Proposed Ecological-Inspired Approach

The ecological-inspired algorithm, named ECO, represents a new perspective for developing cooperative evolutionary algorithms. ECO is composed of populations of individuals (candidate solutions for the problem being solved), and each population evolves according to an optimization strategy. Therefore, individuals of each population are modified according to the mechanisms of intensification and diversification, and the initial parameters, specific to each optimization strategy. The ECO system can be modeled in two ways: homogeneous or heterogeneous. A homogeneous model implies that all populations evolve according to the same optimization strategy, configured with the same parameters. Any change in the strategy or parameters of at least one population characterizes a heterogeneous model.

The ecological inspiration stems from the use of some ecological concepts, such as habitats, ecological relationships and ecological successions [4][5]. Once dispersed in the search space, populations of individuals established in the same region constitute an ecological habitat. For instance, in a multimodal hyper-surface, each peak can become a promising habitat for some populations. A hyper-surface may have several habitats. As in nature, populations can move around the whole environment. However, each population may belong to only one habitat at a given moment of time t. Therefore, by definition, the habitats are pairwise disjoint at any moment t.

With the definition of habitats, two categories of ecological relationships can be defined: intra-habitats relationships, which occur between populations inside each habitat, and inter-habitats relationships, which occur between habitats [4][5]. In ECO, the intra-habitats relationship is the mating between individuals. Populations belonging to the same habitat can establish a reproductive link between their individuals, favoring the co-evolution of the involved populations through competition for mating. Populations belonging to different habitats are called reproductively isolated. The inter-habitats relationship is the great migration. Individuals belonging to a given habitat can migrate to other habitats, aiming at identifying promising areas for survival and mating. In addition to the mechanisms of intensification and diversification specific to each optimization strategy, in the ecological context of the proposed algorithm the intra-habitats relationships are responsible for intensifying the search and the inter-habitats relationships are responsible for diversifying it.

Inside the ecological metaphor, the ecological successions represent the transformational process of the system. In this process, population groups (habitats) are formed, relations between populations are established and the system stabilizes by means of the self-organization of its components. Algorithm 1 shows the pseudo-code of the proposed approach. In this algorithm, the ecological succession loop (lines 3 to 12) refers to iterations of the computational ecosystem. In line 4, the evolutive period, each population evolves (for a number of generations or iterations) according to its own criteria. In line 5, the metric chosen to define the region of reference is the centroid, which represents the point in the search space with the highest concentration of individuals of population i. For a detailed description refer to [6].

Algorithm 1 Pseudo-code for ECO

1: Consider $i = 1, \ldots, N_Q$, $j = 1, \ldots, N_H$ and $t = 0$;
2: Initialize each population $Q_i^t$ with $n_i$ random candidate solutions;
3: while stop criteria not satisfied do {Ecological succession cycles}
4:    Perform evolutive period for each population $Q_i^t$;
5:    Apply metric $C_i$ to identify the region of reference for each population $Q_i^t$;
6:    Using the $C_i$ values, define the $N_H$ habitats;
7:    For each habitat $H_j^t$ define the communication topology $CT_j^t$ between populations $Q_{ij}^t$;
8:    For each topology $CT_j^t$, perform interactions between populations $Q_{ij}^t$;
9:    Define communication topology $TH^t$ between $H_j^t$ habitats;
10:   For $TH^t$ topology, perform interactions between $H_j^t$ habitats;
11:   Increase $t$;
12: end while
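Lines 5 and 6 of Algorithm 1 start from one reference point per population. The sketch below illustrates that step under stated assumptions: each individual is a list of floats of equal length, the centroid is taken as the arithmetic mean of the individuals, and the function names `centroid` and `distance_matrix` are hypothetical. The resulting matrix plays the role of the distance matrix that is handed to the clustering step.

```python
import math


def centroid(population):
    """Arithmetic mean of the individuals of one population, per dimension."""
    dims = len(population[0])
    return [sum(ind[k] for ind in population) / len(population)
            for k in range(dims)]


def distance_matrix(centroids):
    """Pairwise Euclidean distances between population centroids."""
    return [[math.dist(a, b) for b in centroids] for a in centroids]


# Two toy populations in two dimensions (made-up values for illustration).
pop_a = [[0.0, 0.0], [2.0, 4.0]]
pop_b = [[4.0, 6.0], [4.0, 6.0]]

cents = [centroid(pop_a), centroid(pop_b)]
dmat = distance_matrix(cents)
print(cents, dmat)
```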


3.1 Habitats Formation Using Hierarchical Clustering

A key concept of the proposed ECO system is the definition of habitats (line 6 in Algorithm 1). In [6] the definition of habitats is performed deterministically by means of a user-defined proximity threshold. In this work we use a hierarchical clustering algorithm to set up the habitats, where each cluster represents a habitat. Hence, the habitats are defined probabilistically, taking into account the distance information returned by the single-link algorithm. This gives more biological plausibility to the system since, in nature, habitats are not defined deterministically as done in [6]. Also, this approach suppresses the proximity threshold as a control parameter.

To create the habitats probabilistically we use the linkage information returned by the single-link algorithm (Figure 2). The distances are used as probabilities to drive the formation of habitats in a top-down strategy (see Algorithm 2): it starts from the top of the dendrogram (farthest clusters) and goes down to the bottom (closest clusters). After some initializations, the first step in Algorithm 2 is to linearly scale the single-link distances so that they can be used as probabilities (line 6). We chose the closed interval [0.01, 0.99] in order to give one more biologically plausible feature to the system. Concerning the lower bound, no matter how close two populations are to each other, there is still a 1% chance of not grouping them, so even the closest populations may end up in different habitats. Concerning the upper bound, no matter how far two populations are from each other, there is still a 1% chance of grouping them, so even the farthest populations may end up in the same habitat. Figure 4 gives the linearly scaled values for the example of Figure 2.

Algorithm 2 Pseudo-code for probabilistic habitats formation.

1: $N_H = 0$;
2: nodeCount = 0;
3: curNode = $N_Q - 1$;
4: curHabitat = 0;
5: Create $H_{curHabitat}$ with no items;
6: Linearly scale the single-link distances;
7: while nodeCount < $N_Q - 1$ do
8:    if rand $\ge$ Distance(curNode) then {Group items}
9:       $H_{curHabitat}$ = curNode.item_left and curNode.item_right;
10:      nodeCount = nodeCount + 1;
11:   else {Separate items}
12:      $H_{curHabitat}$ = curNode.item_nearest;
13:      $N_H = N_H + 1$;
14:      Create $H_{N_H}$ with no items;
15:      $H_{N_H}$ = curNode.item_farthest;
16:      nodeCount = nodeCount + 1;
17:   end if
18:   Update curHabitat;
19:   Update curNode;
20: end while
21: Return $N_H$;
22: Return $H_j$ where $j = 1, \ldots, N_H$;


After the scaling step, the algorithm enters a loop until all nodes are analyzed (lines 7 to 20). The nodeCount variable counts the number of analyzed nodes. Inside this loop, a probabilistic conditional statement decides whether the items will be grouped together or separated into two groups (line 8). Notice that the distance between items directly influences the probabilistic decision: the closer two items are to each other, the larger the chance of grouping them, and the opposite holds for the farthest items. If two items are to be grouped together, the current habitat ($H_{curHabitat}$) receives the left and the right items from the node being analyzed (curNode) (lines 9 and 10). If two items are to be separated, it is necessary to decide which item stays and which item will belong to a new habitat. As a general rule, the item closest to the current group stays and the farthest item creates a new habitat (lines 12 to 16). The next steps are to update the next habitat and the next node to be analyzed. The curHabitat variable is updated to the first habitat with a negative item inside (a newly formed binary cluster) (line 18). The curNode variable is updated to the absolute value of the first negative item inside $H_{curHabitat}$ (line 19). Finally, the algorithm returns the number of habitats ($N_H$) and the habitats themselves ($H_j$) (lines 21 and 22, respectively).
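The top-down decision process just described can be approximated by a short recursive sketch. It is a simplified reading of Algorithm 2, not the authors' implementation. Three assumptions are made here: the linkage rows already carry scaled distances as in Fig. 4, the recursion replaces the curHabitat and curNode bookkeeping, and on a separation the left item stays while the right item opens the new habitat (a stand-in for the nearest and farthest rule). The name `form_habitats` and the injectable `rand` argument are hypothetical.

```python
import random


def form_habitats(linkage, rand=random.random):
    """Probabilistic, top-down habitat formation from scaled linkage rows.

    linkage[k - 1] = (item_left, item_right, p) describes node k, where p is
    the scaled distance; a negative item -k refers to node k.
    """
    habitats = [[]]

    def place(item, h):
        if item > 0:                      # an original population
            habitats[h].append(item)
            return
        left, right, p = linkage[-item - 1]
        if rand() >= p:                   # group: both sides share habitat h
            place(left, h)
            place(right, h)
        else:                             # separate: right side opens a habitat
            place(left, h)                # assumption: the left item stays
            habitats.append([])
            place(right, len(habitats) - 1)

    place(-len(linkage), 0)               # start at the top of the dendrogram
    return habitats


# Scaled linkage rows of Fig. 4.
FIG4 = [(1, 2, 0.01), (4, 5, 0.39), (-1, -2, 0.62), (3, -3, 0.99)]

habitats = form_habitats(FIG4, rand=random.Random(0).random)
print(habitats)
```

Passing a fixed `rand` makes the two extremes easy to inspect: a source that always returns 1.0 groups every node into a single habitat, and one that always returns 0.0 separates every node, giving one habitat per population.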

3.2 Intra-habitats Communication

Once the habitats are probabilistically defined, the next step in Algorithm 1 (line 7) is the definition of the communication topology for each habitat. Differently from the work done in [6], in this work the definition of the intra-habitats communication topology does not use any proximity threshold. Again, aiming at improving the biological plausibility of the system, we use a communication topology that is probabilistically defined. For a habitat with more than one population, intra-habitat communication occurs in such a way that each population inside the habitat chooses another population to communicate with. Here, the distance between populations directly influences the probabilistic decision: the closer two populations are to each other, the higher the chance that they communicate, and the opposite happens with the farthest populations. All the procedures of Algorithm 1 not mentioned here remain the same as published in [6].
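One way to realize this distance-biased choice is a weighted random draw. The text does not specify the weighting, so the inverse-distance weights below are an assumption chosen only to make closer populations more likely partners; the function name `choose_partners`, the 0-based population indices and the `rng` argument are likewise hypothetical.

```python
import random


def choose_partners(habitat, dist, rng=random):
    """For each population in a habitat, draw one other population to talk to.

    habitat: list of population indices (0-based);
    dist: matrix of distances between population centroids.
    """
    partners = {}
    for p in habitat:
        others = [q for q in habitat if q != p]
        if not others:
            continue  # a habitat with one population has no partner to choose
        # Assumed weighting: inverse distance, so nearer means more likely.
        weights = [1.0 / (dist[p][q] + 1e-12) for q in others]
        partners[p] = rng.choices(others, weights=weights, k=1)[0]
    return partners


# Distance matrix of Fig. 1, reused here with 0-based indices.
DIST = [
    [0.0, 0.5, 4.3, 3.8, 4.8],
    [0.5, 0.0, 4.7, 3.3, 4.4],
    [4.3, 4.7, 0.0, 6.2, 6.6],
    [3.8, 3.3, 6.2, 0.0, 1.1],
    [4.8, 4.4, 6.6, 1.1, 0.0],
]

partners = choose_partners([0, 1, 3], DIST, rng=random.Random(1))
print(partners)
```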

Node    Item_left    Item_right    Distance
1           1            2           0.01
2           4            5           0.39
3          -1           -2           0.62
4           3           -3           0.99

Fig. 4. Linearly scaled values for distance.
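The scaling in line 6 of Algorithm 2 is an ordinary min-max mapping onto [0.01, 0.99]. A minimal sketch follows; the function name `scale_linear` and the handling of the degenerate case where all distances are equal are assumptions, since the text does not cover that case.

```python
def scale_linear(values, lo=0.01, hi=0.99):
    """Map values linearly so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Assumption: with no spread, place every value at the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


# Distance column of Fig. 2.
scaled = scale_linear([0.8, 2.7, 3.8, 5.7])
print([round(s, 2) for s in scaled])
```

Applied to the distances printed in Fig. 2, the mapping gives 0.01, 0.39, 0.61 and 0.99. The third value differs slightly from the 0.62 shown in Fig. 4, which may reflect rounding of the printed distances.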


4 Experiments and Results

Experiments were conducted using four benchmark functions extensively used in the literature for testing optimization methods [10]. Each function to be minimized was tested with 10 and 200 dimensions. The first function ($f_1(x)$ with $-100 \le x_i \le 100$) is known as the generalized Schaffer F6 function. The second function ($f_2(x)$ with $-5.12 \le x_i \le 5.12$) is the Rastrigin function. The third function ($f_3(x)$ with $-600 \le x_i \le 600$) is the Griewank function. Finally, the fourth function ($f_4(x)$ with $-30 \le x_i \le 30$) is the Rosenbrock function.

The parameters of the ECO algorithm are: the number of populations that will be co-evolved (N-POP), the initial population size (POP-SIZE), the number of cycles of ecological successions (ECO-STEP), the size of the evolutive period (EVO-STEP), which represents the number of function evaluations in each ECO-STEP, the tournament size (T-SIZE) and the proximity threshold. In this development (ECO-C), with the definition of habitats using hierarchical clustering, the proximity threshold is suppressed. In all experiments the initial population size was set to POP-SIZE = 10. Studies about the adjustment of parameters have not been carried out yet. Hence, all the parameters of the algorithm were defined empirically [6]. In all experiments, the Artificial Bee Colony Optimization (ABC) algorithm [11] was used in a homogeneous model, i.e. all populations use this algorithm with the same control parameters.

For the number of dimensions (D) equal to 10, the parameters used were N-POP = 100, ECO-STEP = 100, EVO-STEP = 100, T-SIZE = 5 and a proximity threshold of 0.5. With this configuration, the total number of function evaluations was 10,000 for each population (ECO-STEP x EVO-STEP = 100 x 100). For D = 200, some parameters were redefined: N-POP = 200, ECO-STEP = 500, EVO-STEP = 200. With this adjustment, the total number of function evaluations was 100,000 for each population (500 x 200).

Table 1 shows the averaged results obtained for the benchmark functions. For both dimensions, D = 10 and D = 200, the results obtained by each configuration of the algorithms are presented (columns 2 to 4). The ecological-inspired framework was tested using three configurations. The first configuration implements Algorithm 1 as described in Section 3, with the definition of habitats using the proximity threshold, topologies and ecological relations (ECO_ABC, fourth column of Table 1). The second configuration implements Algorithm 1 and enables the ability to probabilistically create habitats using the single-link clustering information through the proposed Algorithm 2 (ECO-C_ABC, third column of Table 1). The third configuration disables the ability to create habitats and, consequently, topologies and interactions are not defined. This third configuration simulates the evolution of completely isolated populations, which evolve without exchanging information (ABC, second column of Table 1). For each configuration, the algorithm was run 30 times. For each dimension, the third line (Global Best) in Table 1 shows the average and standard deviation of the best result obtained by all populations in all runs.
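The four benchmark functions named at the start of this section are not written out in the text. The sketch below uses their commonly published forms, which is an assumption about the exact variants used in the experiments (in particular for the generalized Schaffer F6 function, taken here as a sum over consecutive pairs of coordinates). All four have a minimum value of 0.

```python
import math


def schaffer_f6(x):
    """Generalized Schaffer F6: sum of the 2-D F6 term over consecutive pairs."""
    total = 0.0
    for a, b in zip(x, x[1:]):
        s = a * a + b * b
        total += 0.5 + (math.sin(math.sqrt(s)) ** 2 - 0.5) / (1.0 + 0.001 * s) ** 2
    return total


def rastrigin(x):
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


def griewank(x):
    quad = sum(v * v for v in x) / 4000.0
    prod = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return 1.0 + quad - prod


def rosenbrock(x):
    return sum(100.0 * (b - a * a) ** 2 + (a - 1.0) ** 2
               for a, b in zip(x, x[1:]))


# Known minimizers: the origin for f1 to f3 and the all-ones point for f4.
D = 10
print(schaffer_f6([0.0] * D), rastrigin([0.0] * D),
      griewank([0.0] * D), rosenbrock([1.0] * D))
```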

Table 1. Results obtained for the benchmark functions (average ± standard deviation).

f1(x)
  D = 10
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    4.6569 ± 0.8         1.0687 ± 0.2        1.1344 ± 0.2
  D = 200
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    27.5936 ± 0.73       19.8027 ± 0.5       20.2792 ± 0.4

f2(x)
  D = 10
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    10^-11 ± 0.0         0.0000 ± 0.0        0.0000 ± 0.0
  D = 200
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    62.1453 ± 0.0        10^-05 ± 0.0        10^-05 ± 0.0

f3(x)
  D = 10
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    10^-06 ± 0.0         10^-18 ± 0.0        10^-13 ± 0.0
  D = 200
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    10^-07 ± 0.0         10^-11 ± 0.0        10^-11 ± 0.0

f4(x)
  D = 10
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    0.0098 ± 0.0         0.0082 ± 0.0        0.0086 ± 0.0
  D = 200
    Model          ABC                  ECO-C_ABC           ECO_ABC
    Global Best    13036.1 ± 4193.4     1.8778 ± 1.7        137.86 ± 42.0

Analyzing ABC and ECO_ABC, we can observe that the ecological-inspired approach obtained better results than the algorithm executed without the concept of habitats for all functions. This gain is mainly due to the ecological interactions (intra- and inter-habitats) that enabled the co-evolution of populations. Analyzing the results for the ecological-inspired approach with probabilistic habitat definition, ECO-C_ABC, we can observe that the results were equivalent or better for all functions when compared with the ecological-inspired approach without the clustering strategy (ECO_ABC). This indicates that the behavior of the ecological algorithm is preserved when using the proposed hierarchical clustering strategy to probabilistically set up the habitats and the communication topology. It is worth mentioning that with this strategy a critical parameter (the proximity threshold) is suppressed. Also, one can notice that the result obtained by ECO-C_ABC for function f4(x) with D = 200 was much better than that of the ECO_ABC approach. This indicates that the value chosen for the proximity threshold in ECO_ABC was not the best choice and should be optimized. With hierarchical clustering this problem does not arise. Overall, ECO-C_ABC matched or outperformed the other approaches on all functions. Figure 5 shows the results for D = 200 graphically: the x-axis shows the different approaches, the y-axis represents the Global Best value of each approach, and the values are shown at the top of each bar.

[Figure 5: bar graphs of the Global Best values, one panel per function: (a) Function f1(x), (b) Function f2(x), (c) Function f3(x), (d) Function f4(x).]

Fig. 5. Bar graph of each benchmark function with D = 200.

5 Conclusions

This paper presents an ecological-inspired algorithm for optimization that uses a hierarchical clustering strategy to probabilistically set up the distribution of populations into habitats. The proposed algorithm uses cooperative search strategies in which populations of individuals co-evolve and interact among themselves using some ecological concepts. Each population behaves according to the mechanisms of intensification and diversification, and the control parameters, specific to a given search strategy. The Artificial Bee Colony Optimization algorithm was used in all populations.

In this work, a more biologically plausible definition of habitats is achieved by using, in a probabilistic way, the distance information returned by the single-link clustering algorithm. The main ecological concepts addressed are the probabilistic definition of habitats, ecological relationships and ecological successions. These features bring a higher biological plausibility to the proposed algorithm, as opposed to most bio-inspired algorithms, which take inspiration from only one biological phenomenon. Thus, the proposed methodology opens the possibility of inserting several ecological concepts into the optimization process, bringing more biological plausibility to the system.

The results showed that the use of habitats and ecological relationships significantly influences the co-evolution process of populations, leading to better solutions (when compared to the results obtained without the ecological concepts). Also, the use of a probabilistic habitat definition inside the ECO framework improved the results and, mainly, suppressed the proximity threshold, a critical control parameter that would otherwise have to be set by the user.

This work is still under development. As future work we intend to analyze the influence of the remaining control parameters (number of ecological successions, evolutive period and number of populations) on the quality of solutions, as well as to add other search strategies to the proposed model. Currently, in order to bring more biological plausibility to the system, other ecological concepts are being modeled, and efforts are being made to eliminate control parameters.

References
1. de Castro, L.N.: Fundamentals of natural computing: an overview. Physics of Life Reviews 4(1) (2007) 1-36
2. Engelbrecht, A.P.: Computational Intelligence: An Introduction. 2nd edn. Wiley, Chichester, UK (2007)
3. Parpinelli, R.S., Lopes, H.S.: New inspirations in swarm intelligence: a survey. International Journal of Bio-Inspired Computation 3(1) (2011) 1-16
4. Begon, M., Townsend, C.R., Harper, J.L.: Ecology: from individuals to ecosystems. 4th edn. Blackwell Publishing, Oxford, UK (2006)
5. May, R.M.C., McLean, A.R.: Theoretical Ecology: Principles and Applications. Oxford University Press, Oxford, UK (2007)
6. Parpinelli, R.S., Lopes, H.S.: An eco-inspired evolutionary algorithm applied to numerical optimization. In: Proceedings of the Third World Congress on Nature and Biologically Inspired Computing, Salamanca, Spain (2011) 473-478
7. Murtagh, F., Contreras, P.: Algorithms for hierarchical clustering: an overview. Data Mining and Knowledge Discovery 2(1) (2012) 86-97
8. Legendre, P., Legendre, L.: Numerical ecology. Elsevier, Amsterdam (1998)
9. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Transactions on Neural Networks 16 (2005) 645-678
10. Digalakis, J.G., Margaritis, K.G.: An experimental study of benchmarking functions for evolutionary algorithms. International Journal of Computer Mathematics 79(4) (2002) 403-416
11. Karaboga, D., Akay, B.: A comparative study of artificial bee colony algorithm. Applied Mathematics and Computation 214 (2009) 108-132
