
Genetic Algorithms and Supervised Learning
Genetic algorithms apply an evolutionary approach to inductive learning.
Applications include optimization problems such as scheduling, network routing, and the traveling salesman problem (TSP).
Algorithm
1. Initialize a population P of n elements, often referred to as chromosomes, as potential solutions.
2. Until a specified termination condition is satisfied:
   a. Use a fitness function to evaluate each element of the current population. If an element passes the fitness criteria, it remains in P.
   b. The population now contains m elements (m <= n). Use genetic operators to create (n - m) new elements. Add the new elements to the population.
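
The loop above can be written compactly in code. Below is a minimal sketch in Python; the operator names (fitness, crossover, mutate), the survival threshold, and the 10% mutation rate are illustrative assumptions rather than part of the algorithm as stated.

import random

def genetic_learning(population, fitness, crossover, mutate,
                     threshold, max_generations=50):
    # Generic supervised genetic learning loop (illustrative sketch).
    n = len(population)                          # population size stays constant
    for _ in range(max_generations):
        # Step 2a: keep only elements that pass the fitness criterion
        survivors = [e for e in population if fitness(e) >= threshold]
        if len(survivors) == n:                  # termination: every element passes
            break
        # Step 2b: create (n - m) new elements with the genetic operators
        new_elements = []
        while len(survivors) + len(new_elements) < n:
            p1, p2 = random.sample(population, 2)
            for child in crossover(p1, p2):      # crossover yields new elements
                if random.random() < 0.1:        # mutation is applied sparingly
                    child = mutate(child)
                new_elements.append(child)
        population = (survivors + new_elements)[:n]   # keep the size fixed at n
    return population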
Figure 3.8 Supervised genetic learning: population elements are scored by a fitness function against the training data; elements that pass are kept, while those that fail are removed and become candidates for crossover and mutation.


Table 3.8 An Initial Population for Supervised Genetic Learning

Population Element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex    | Age
1                  | 20-30K       | No                       | Yes                   | Male   | 30-39
2                  | 30-40K       | Yes                      | No                    | Female | 50-59
3                  | ?            | No                       | No                    | Male   | 40-49
4                  | 30-40K       | Yes                      | Yes                   | Male   | 40-49
The goal of the algorithm is to differentiate individuals who have accepted the life insurance promotion from those who have not.
The table above is the initial population.
After each iteration, the population should still contain two yes and two no values for the life insurance promotion attribute.
To implement the fitness function, we compare each population element to the six training instances shown below.
Table 3.9 Training Data for Genetic Learning

Training Instance | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex    | Age
1                 | 30-40K       | Yes                      | Yes                   | Male   | 30-39
2                 | 30-40K       | Yes                      | No                    | Female | 40-49
3                 | 50-60K       | Yes                      | No                    | Female | 30-39
4                 | 20-30K       | No                       | No                    | Female | 50-59
5                 | 20-30K       | No                       | No                    | Male   | 20-29
6                 | 30-40K       | No                       | No                    | Male   | 40-49
For a single population element E, the fitness function is defined as follows:
1. Let N be the number of matches of the input attribute values of E with training instances of its own class.
2. Let M be the number of input attribute value matches of E with training instances from the competing class.
3. Add 1 to M (to avoid division by zero).
4. Fitness score = N / M. Elements scoring below a threshold are removed and become candidates for the genetic operations.
Let us compute the fitness score for element 1, which belongs to the life insurance promotion = no class. N is the total number of input attribute value matches with training instances of that class:
Income range = 20-30K: matches instances 4 and 5
Credit card insurance = yes: no matches
Sex = male: matches instances 5 and 6
Age = 30-39: no matches
So N = 4.
Similarly, M, the number of matches with the life insurance promotion = yes training instances, works out to 4; adding 1 gives 5. Hence
F(1) = 4/5 = 0.8
F(2) = 0.86
F(3) = 1.2
F(4) = 1.0
We choose the lowest-scoring element from each class, that is, elements 1 and 2, as candidates for crossover.
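
The hand computation can be verified with a short script. The data below is transcribed from Tables 3.8 and 3.9; treating the unknown income value "?" as matching nothing is an assumption, since the text does not say how it is handled.

# Population elements and training instances:
# (income range, life insurance promotion, credit card insurance, sex, age)
population = {
    1: ("20-30K", "No",  "Yes", "Male",   "30-39"),
    2: ("30-40K", "Yes", "No",  "Female", "50-59"),
    3: ("?",      "No",  "No",  "Male",   "40-49"),
    4: ("30-40K", "Yes", "Yes", "Male",   "40-49"),
}
training = [
    ("30-40K", "Yes", "Yes", "Male",   "30-39"),
    ("30-40K", "Yes", "No",  "Female", "40-49"),
    ("50-60K", "Yes", "No",  "Female", "30-39"),
    ("20-30K", "No",  "No",  "Female", "50-59"),
    ("20-30K", "No",  "No",  "Male",   "20-29"),
    ("30-40K", "No",  "No",  "Male",   "40-49"),
]

INPUT_ATTRS = (0, 2, 3, 4)   # index 1 (life insurance promotion) is the class

def matches(element, instance):
    # Number of input attribute values shared by element and training instance;
    # an unknown value "?" is assumed to match nothing
    return sum(element[i] == instance[i] and element[i] != "?" for i in INPUT_ATTRS)

def fitness(element):
    own_class = element[1]
    n = sum(matches(element, t) for t in training if t[1] == own_class)
    m = sum(matches(element, t) for t in training if t[1] != own_class)
    return n / (m + 1)       # add 1 to M to avoid division by zero

for key, elem in sorted(population.items()):
    print(f"F({key}) = {fitness(elem):.2f}")
# Prints F(1) = 0.80, F(2) = 0.86, F(3) = 1.20, F(4) = 1.00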
Genetic operators
The most widely used genetic operators are crossover and mutation.
Crossover forms new elements for the population by combining parts of two elements currently in the population, usually those destined for elimination.
Mutation, a second genetic operator, is applied sparingly; it randomly changes (flips) individual values within an element.
The termination criterion of the genetic algorithm can be a fixed number of iterations, or elements within the population meeting some minimum fitness criteria.
The total number of population elements remains constant from one iteration to the next.
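
A minimal sketch of the two operators, for attribute-value elements like those in Table 3.8, is shown below. The single-point form of crossover and the domains table used by mutation are illustrative assumptions.

import random

def crossover(parent1, parent2, point=2):
    # Single-point crossover: swap everything after the crossover point
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def mutate(element, domains, rate=0.1):
    # Sparingly replace attribute values with another value from that attribute's domain;
    # domains[i] is assumed to list the legal values for attribute position i
    mutated = list(element)
    for i, value in enumerate(mutated):
        if random.random() < rate:
            alternatives = [v for v in domains[i] if v != value]
            if alternatives:
                mutated[i] = random.choice(alternatives)
    return tuple(mutated)

# Reproducing the crossover shown in Figure 3.9 below (point after the first two attributes)
e1 = ("20-30K", "No",  "Yes", "Male",   "30-39")
e2 = ("30-40K", "Yes", "No",  "Female", "50-59")
new1, new2 = crossover(e1, e2, point=2)
print(new1)   # ('20-30K', 'No', 'No', 'Female', '50-59')
print(new2)   # ('30-40K', 'Yes', 'Yes', 'Male', '30-39')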
Let us place the crossover point after the life insurance promotion attribute. One possibility is to combine the first two attribute values of element 1 with the last three attribute values of element 2, and vice versa, as shown below.
Before crossover:
Element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex    | Age
#1      | 20-30K       | No                       | Yes                   | Male   | 30-39
#2      | 30-40K       | Yes                      | No                    | Female | 50-59

After crossover:
Element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex    | Age
#2      | 30-40K       | Yes                      | Yes                   | Male   | 30-39
#1      | 20-30K       | No                       | No                    | Female | 50-59

Figure 3.9 A crossover operation


Table 3.10 A Second-Generation Population

Population Element | Income Range | Life Insurance Promotion | Credit Card Insurance | Sex    | Age
1                  | 20-30K       | No                       | No                    | Female | 50-59
2                  | 30-40K       | Yes                      | Yes                   | Male   | 30-39
3                  | ?            | No                       | No                    | Male   | 40-49
4                  | 30-40K       | Yes                      | Yes                   | Male   | 40-49
We can see that the fitness scores of the first two elements have improved, to 7/5 = 1.4 and 6/4 = 1.5.
To use the model, we can compare a new unknown instance with the elements of the final population. A simple technique is to give the unknown instance the same classification as the population element to which it is most similar.
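
A minimal sketch of this classification step appears below. Using the count of matching input attribute values as the similarity measure is an assumption, since the text only says "most similar", and the unknown instance shown is hypothetical.

def classify(unknown, population, input_attrs=(0, 2, 3, 4), class_attr=1):
    # Give the unknown instance the class of the most similar population element
    def similarity(element):
        return sum(element[i] == unknown[i] for i in input_attrs)
    best = max(population, key=similarity)
    return best[class_attr]

# Hypothetical unknown instance; the class value (index 1) is unknown
unknown = ("30-40K", None, "Yes", "Male", "30-39")
second_generation = [
    ("20-30K", "No",  "No",  "Female", "50-59"),
    ("30-40K", "Yes", "Yes", "Male",   "30-39"),
    ("?",      "No",  "No",  "Male",   "40-49"),
    ("30-40K", "Yes", "Yes", "Male",   "40-49"),
]
print(classify(unknown, second_generation))   # prints: Yes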
Genetic Algorithms and Unsupervised Clustering
Figure 3.10 Unsupervised genetic clustering: P instances (I1 ... Ip, each described by attributes a1 ... an) are used to score a population of candidate solutions S1 ... SK, where each solution holds one element per cluster (for example, E11 and E12 for S1).


Table 3.11 A First-Generation Population for Unsupervised Clustering

                                        | S1                   | S2                   | S3
Solution elements (initial population)  | (1.0,1.0), (5.0,5.0) | (3.0,2.0), (3.0,5.0) | (4.0,3.0), (5.0,1.0)
Fitness score                           | 11.31                | 9.78                 | 15.55
Solution elements (second generation)   | (5.0,1.0), (5.0,5.0) | (3.0,2.0), (3.0,5.0) | (4.0,3.0), (1.0,1.0)
Fitness score                           | 17.96                | 9.78                 | 11.34
Solution elements (third generation)    | (5.0,5.0), (1.0,5.0) | (3.0,2.0), (3.0,5.0) | (4.0,3.0), (1.0,1.0)
Fitness score                           | 13.64                | 9.78                 | 11.34
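
The slides do not show how the fitness scores in Table 3.11 are obtained. One common choice for evaluating a clustering solution, and a plausible reading of Figure 3.10, is to sum the distance from each instance to its nearest solution element (cluster center); the sketch below uses that rule with made-up instance data, so both the scoring rule and the coordinates are assumptions, not the values behind Table 3.11.

import math

def clustering_fitness(solution, instances):
    # Sum of Euclidean distances from each instance to its nearest cluster center.
    # Under this (assumed) measure, lower totals mean tighter clusters.
    return sum(min(math.dist(point, center) for center in solution)
               for point in instances)

# Hypothetical two-dimensional instances (not the data used for Table 3.11)
instances = [(1.0, 1.5), (2.0, 1.0), (4.5, 4.0), (5.0, 5.0), (4.0, 2.5), (3.0, 4.5)]
s1 = [(1.0, 1.0), (5.0, 5.0)]   # solution S1 from the first generation
print(round(clustering_fitness(s1, instances), 2))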


General Considerations
Finding a globally optimal solution is not guaranteed.
The fitness function determines the complexity of the algorithm.
Genetic algorithms can explain their results, provided the fitness function is understandable.
Transforming the data to a form suitable for genetic learning can be a challenge.
3.5 Choosing a Data Mining Technique
Initial Considerations
Is learning supervised or unsupervised?
Is explanation required?
What is the interaction between input and output attributes?
What are the data types of the input and output attributes?
Further Considerations
Do we know the distribution of the data?
Do we know which attributes best define the data?
Does the data contain missing values?
Is time an issue?
Which technique is most likely to give the best test set accuracy?
