Você está na página 1de 43

ICT619 Intelligent

Systems

Topic 5: Genetic
Algorithms
Genetic Algorithms

Introduction
How GAs work
The TSP as an example
Business Applications of GA
Advantages of GA systems
Some issues related to GA based systems
Case Study

ICT619 2
What is a genetic algorithm?

GA part of the broader soft computing (aka


"computational intelligence") paradigm known as
evolutionary computation

First introduced by Holland (1975)


Inspired by possibility of problem solving through a
process of evolution
ICT619 3
What is a GA? (contd)

GA mimics biological evolution to generate better


solutions from existing solutions through
survival of the fittest
crossbreeding and
mutation

ICT619 4
What is a GA? (contd)

A GA is capable of finding solutions for many problems


for which no usable algorithmic solutions exist

GA methodology particularly suited for optimization

Optimization searches a solution space consisting of a


large number of possible solutions

GA reduces the search space through evolution of


solutions, conceived as individuals in a population

ICT619 5
Intelligence and Evolution
One way of understanding intelligence is as the capability of a creature
to adapt itself to an ever-changing environment
We normally think of adaptation as changes in the characteristics
(including behaviours) of a single animal in response to experiences
over its history
But adaptation is also change in the characteristics of a species, over
the generations, in response to environmental change

An individual creature is in competition with other individuals of the


same species for resources, mates etc.
There is also rivalry from other species which may be a direct (predator)
or indirect (food, water, land, etc.) threat
In nature, evolution operates on populations of organisms, ensuring by
natural selection that characteristics that serve the members well tend
to be passed on to the next generation, while those that dont die out

ICT619 6
Evolution as Optimisation
Evolution can be seen as a process leading to the optimisation of a
populations ability to survive and thus reproduce in a specific
environment.
Evolutionary fitness - the measure of the ability to respond adequately
to the environment, is the quantity that is actually optimised in natural
life
Consider a normal population of rabbits. Some rabbits are naturally
faster than others. Any characteristic has a range of variation that is due
to i) sexual reproduction and ii) mutation
We may say that the faster rabbits possess superior fitness, since they
have a greater chance of avoiding foxes, surviving and then breeding
If two parents have superior fitness, there is a good chance that a
combination of their genes will produce an offspring with even higher
fitness. We say that there is crossover between the parents genes

Over successive generations, the entire population of rabbits tends to


become faster to meet their environment challenges in the face of foxes
ICT619 7
How GAs work
A population of candidate solutions - mathematical
representations - is repeatedly altered until an optimal
solution is found

The GA evolutionary cycle


1. Starts with a randomly generated initial population of solutions
(1st generation)

2. Selects a population of better solutions (next generation) by


using a measure of 'goodness' (a fitness evaluation function)

3. Alters new generation population through crossbreeding and


mutation

Processes of selection (step 2) and alteration (step 3) lead to


a population with a higher proportion of better solutions

ICT619 8
How GAs work (contd)

The GA evolutionary cycle continues until an


acceptable solution is found in the current population,
or
some control parameter such as the maximum
number of generations is exceeded
ICT619 9
How solutions are represented

A series of genes, known as a chromosome,


represents one possible solution

Each gene in the chromosome represents one


component of the solution pattern

Each gene can have one of a number of possible


values known as alleles

The process of converting a solution from its original


form into a chromosome is known as coding

ICT619 10
How solutions are represented
(contd)
The most common form of representing a solution as a
chromosome is a string of binary digits (aka a binary
vector) eg 1010110101001

Each bit in this string is a gene with two alleles: 0 and 1

Other forms of representation are also used, eg,


integer vectors

Solution bit strings are decoded to enable them to be


evaluated using a fitness measure

ICT619 11
GA Selection
Selection in GA based on a process analogous to that
of biological evolution

Only the fittest survive and contribute to the gene pool


of the next generation

Fitness proportional selection


Each chromosomes likelihood of being selected is
proportional to its fitness value.

Solutions failing selection are bad, and are discarded

ICT619 12
Alteration = Crossover + Mutation
Alteration refines good solutions from current generation to
produce next generation of solutions

Carried out by performing crossover and mutation

Crossover by splicing two chromosomes at a crossover point and


swapping the spliced parts

A better chromosome may be created by combining genes with


good characteristics from one chromosome with some good genes
in the other chromosome

Crossover carried out with a probability typically 0.7

Chromosomes not crossed over are cloned

ICT619 13
Crossover and Mutation

Mutation
A random adjustment in the genetic composition
Can be useful for introducing new characteristics in a
population
May be counterproductive
Probability kept low: typically 0.001 to 0.01
ICT619 14
ICT619 15
An albino is a common mutation
The typical Genetic Algorithm

1. Represent the solution as a chromosome of fixed


length, choose size of population N, crossover
probability pc and mutation probability pm.

2. Define a fitness function f for measuring fitness of


chromosomes.

3. Create an initial solution population randomly of size


N:
x1, x2, , xN

4. Use the fitness function f to evaluate the fitness value


of each solution in the current generation:
f(x1), f(x2), , f(xN)
ICT619 16
The typical Genetic Algorithm
(contd)
5. Select good solutions based on fitness value.
Discard rest of the solutions.

6. If acceptable solution(s) found in the current


generation or maximum number of generations is
exceeded then stop.

7. Alter the solution population using crossover and


mutation to create a new generation of solutions with
population size N.

8. Go to step 4.

ICT619 17
The typical Genetic Algorithm
(contd)

ICT619 18
The Travelling Salesperson Problem
Given a set of n cities (A, B, C, ...) find a closed tour of all cities
with the shortest total distance d

Tour 'cost' may be something other than distance d

This is an optimization problem with following constraints

1. Each city to be visited once and only once


2. Total distance travelled must be shortest
possible

The time required to find a solution by exhaustive search


increases exponentially - the problem is NP-hard

Possible number of tours for n cities = n!/2n

1 million centuries for 50 cities at the rate of 1 billion tours per sec!
ICT619 19
The Travelling Salesperson Problem

In 1987, Martin Groetschel and Olaf Holland found an optimal tour of 666
interesting places in the world. Source: http://www.tsp.gatech.edu//index.html
The TSP example (contd)

Representation and coding of TSP solutions

The representation might be an ordered list of numbers each


representing a city nominally (known as order-based GA):

1) London 3) Dunedin 5) Beijing 7) Tokyo


2) Venice 4) Singapore 6) Phoenix 8) Victoria

CityList1 (3 5 7 2 1 6 4 8)
CityList2 (2 5 7 6 8 1 3 4)

Alternatively, the representation of the solution may be


encoded in binary on a matrix...

ICT619 21
The TSP example (contd)
Representation and coding of TSP solutions
A solution to the TSP problem is an ordered list of the n
cities
Each city is assigned 1 out of n possible positions
Representation of the solution may be visualised as a
table:
Each row represents a city
Each column associated with a tour position for cities

ICT619 22
The TSP example (contd)

The tour represented above is CAEBDC


One possible bit string code for this solution:
01000 00010 1000 00001 00100
(rows written end to end)
Binary bit strings can produce "faulty" chromosomes
needing repair
An integer vector scheme produced a 100 city tour
9.4% above optimal cost
ICT619 23
An Optimal 100-City Tour

ICT619 24
Business Applications of GA

Increasing number of industrial and business


applications of GA since late 1980s

In business, applications include (Kingdon 1997)


Portfolio optimisation
Bankruptcy prediction
Financial forecasting
Fraud detection
Scheduling
Design of complex
machines eg. jet engines

ICT619 25
Business Applications of GA (contd)

First Quadrant - investment firm in


California
Started using GA technique in 1993
Uses GA to manage US$5 billion worth of
investments in 17 different countries
Their evolved model earns, on average, $255 for
every $100 invested over six years, as opposed to
$205 for other types of modeling systems (Begley &
Beals, 1995)

ICT619 26
Advantages of GA systems

Useful when no algorithms or heuristics are


available for solving a problem

No formulation of the solution is required - only


"recognition" of a good solution

A GA system can be built as long as a solution


representation and an evaluation scheme can
be worked out
So minimal domain expert access is required

ICT619 27
Advantages of GA systems

GA can act as an alternative to -


Expert Systems if
number of rules is too large or
the nature of the knowledge-base too dynamic

Traditional optimization techniques if


constraints and objective functions are non-linear
and/or discontinuous

ICT619 28
Advantages of GA systems (cont'd)

GA does not guarantee optimal solutions, but produce


near optimal solutions which are likely to be very good

Solution time with GA is highly predictable


Determined by
Size of the population
Time taken to decode and evaluate a solution and
Number of generations of population

GA uses simple operations to solve problems that are


computationally prohibitive otherwise
Example: the TSP problem

ICT619 29
Advantages of GA systems (cont'd)

Because of simplicity, GA software are


Reasonably sized and self-contained
Easier to embed them as a module in another
system

GA can also aid in developing intelligent


business systems that use other
methodologies, eg,
Building the rule base of an expert system
Finding optimal neural networks
ICT619 30
Some issues related to GA based
systems
Level of explainability
Capability to explain why a particular solution was
arrived at is practically nil
The system does not know what a fitness value really
means

Scalability
Moderately scalable
Accommodates increased number of variables by
increasing the length of the chromosome
But
A longer chromosome means a larger population space
(more potential combinations of genes)
More time required for decoding and fitness evaluation
ICT619 31
Some issues related to GA based
systems (contd)
Data requirements
In general, GA do not require extensive access to data
but some applications may need it to evaluate solutions
This makes the quality and quantity of data is important

Local maxima
Local maxima are regions that hold good solutions
relative to regions around them, but which do not
necessarily contain the best overall solutions
The region(s) that contain the best solutions are called
global maxima
GAs are less prone to being trapped in local maxima
because of the use of mutation and crossover
ICT619 32
Some issues related to GA based
systems (contd)

Premature convergence
A GA is said to have converged prematurely if it
explores a local maximum extensively

It may be then dominated by very similar solutions


within the region

Most significant factor leading to such convergence is a


mutation rate which is too slow

Mutation interference is an effect opposite to that of


premature convergence
ICT619 33
Some issues related to GA based
systems (contd)

Mutation interference
Finding a mutation rate which allows the GA to
converge but which also allows adequate exploration of
the solution space is essential for satisfactory
performance

Mutation interference occurs when mutation rates in a


GA are too high, and as a result:
Solutions are frequently or drastically mutated
The algorithm never manages to explore any region of
the space thoroughly
Any good solutions found tend to be destroyed rapidly

ICT619 34
Case Study - Help Desk Task
Scheduling (Dhar & Stein 1997,
pp.219-227)
GA based system developed at Moodys for
scheduling service tasks to its customer
service representatives

Major constraints
The system
Must minimise computer downtime and customer
dissatisfaction
Must integrate with existing database system which kept track
help desk requests. ICT619 35
Case study constraints (contd)

Must be flexible to
Accommodate new types of task definitions and changes
in employee, training etc.
Allow administrator to modify solutions

Must generate and reevaluate schedules quickly


(under 15 minutes) and consistently

Must not take administrator or CSRs away from their


jobs for any extended period of time

Must be developed quickly


ICT619 36
Case study constraints (contd)

Should be scalable in case of future growth in number


of requests for help and the number of CSRs

Must not be too complicated for its users the


administrator and CSRs

The main difficulties in meeting the constraints:


the large number of tasks
the large number of CSRs
the varying capabilities of CSRs, and
the wide variety of tasks
ICT619 37
Case Study - Variables and issues
needing to be considered
The priority of a task, which is determined by the severity of
the problem

The length of time required to perform the task and how it


would affect the servicing of other users

The ability of various CSRs to perform different levels of


tasks (expertise must match the complexity)

Low priority tasks must not be kept waiting indefinitely

The measure of goodness of a schedule to be based on


amount of downtime each schedule cost the organisation.

ICT619 38
Possible solution methodologies
considered
1. Traditional linear programming (a numerical
optimisation technique)
2. A rule based expert system
3. A GA based system

ES ruled out because


Expertise to solve this problem not expressible as a set of
rules
Help desk administrator not available for knowledge extraction

Linear programming ruled out because


It fails if no optimal solution can be found
It does not produce any sub-optimal solutions, which is the
case with GA.
ICT619 39
Case Study - the solution

SOGA (Schedule Optimising for GA)


A hybrid system consisting of GA and fuzzy system
components
The GA component deals with the scheduling task.

Each task in the queue is represented by a gene

The entire task list forms the chromosome

Each chromosome is decoded by a scheduling module


that assigns tasks to available CSRs who can perform
them
ICT619 40
Case Study - the solution (contd)
Fitness of each chromosome is determined by
calculating the amount of downtime that would result
based on the schedule represented by the
chromosome.

Schedules generated by the GA component are


modified by the FS component

SOGA runs in the background behind the help request


tracking system
Updates schedules based upon a predefined time
interval (eg, every 10 or 15 minutes)
CSRs access their current job queue through their
interface to accept jobs.
ICT619 41
Case Study - Results

The system is timely generating schedules in about 5


minutes.

The solutions are found to be good by the help desk


administrator

The system is flexible enough to allow for task


definitions

The system scales up well to larger domains (higher


number of tasks)

The SOGA system was developed in two months using


one programmer overseen by its designers
ICT619 42
REFERENCES
Begley, S. and Beals, G. "Software au naturel." Newsweek, May 8, 1995,
p.70
Dhar, V., & Stein, R., Seven Methods for Transforming Corporate
Data into Business Intelligence., Prentice Hall 1997, pp. 126-148,
203-210.
Goldberg, D. E., Genetic and Evolutionary Algorithms Come of
Age, Communications of the ACM, Vol.37, No.3, March 1994,
pp.113-119.
Holland, J. H., Adaptation in Natural and Artificial Systems, Univ.
of Michigan Press, 1975.
Kingdon, J., Intelligent Systems and Financial Forecasting,
Springer Verlag, London 1997.
Medsker,L., Hybrid Intelligent Systems, Kluwer Academic Press,
Boston 1995.
Michalewicz, Z., Genetic Algorithms + Data Structures = Evolution
Programs, Springer-Verlag, Berlin 1996.
Negnevitsky, M. Artificial Intelligence A Guide to Intelligent
Systems, Addison-Wesley 2005.
ICT619 43

Você também pode gostar