
Artificial Intelligence and Neural Networks

Sven Helmer Birkbeck College, London

Chapter 1

Introduction

What is AI?
Artificial Intelligence (AI) is a very new field (the term was coined in 1956)
Along with molecular biology, it is cited as the "field I would most like to be in"
Quite a universal field: it systematizes and automates intellectual tasks and is potentially relevant to many human activities

Artificial Intelligence and Neural Networks p.1/410


Beyond the Hype
Many different definitions of AI exist
They vary roughly along two main dimensions:
  reasoning vs. behavior
  human vs. rational
All four approaches have been pursued

Acting Humanly
Turing (1950): "Computing machinery and intelligence"
"Can machines think?" becomes "Can machines behave intelligently?"
Turing Test:
[Figure: a human interrogator questions both a human and an AI system]
To pass the test, a computer would need the following capabilities: language, knowledge, reasoning, understanding, learning

Acting Humanly(2)
Although researchers don't focus on passing the Turing test, it is still valid
AI researchers concentrate on the underlying issues
Artificial flight succeeded after turning away from imitating birds and looking at aerodynamics

Thinking Humanly
1960s: "cognitive revolution", information-processing psychology started replacing behaviorism
Requires scientific theories on the internal activities of the brain
Cognitive science (top-down): predicting and testing human behavior
Cognitive neuroscience (bottom-up): identification from neurological data
Distinct approaches, although there is some overlap

Thinking Rationally
Aristotle: one of the first to codify "laws of thought"
From this the field of logic evolved
Provides patterns for arguments that, given correct premises, yield correct solutions
By 1965 there were programs that could (in principle) solve any solvable problem described in logical notation
Two main obstacles:
  Difficult to state informal, uncertain knowledge in formal, logical notation
  Solving "in principle" doesn't mean solving in practice

Acting Rationally
Rational behavior: doing the right thing
Given the available information, try to achieve the best outcome
Doesn't necessarily involve thinking (e.g. reflexes)
More general than thinking rationally (which is only concerned with correct inference)
Also better suited for scientific development than human-based approaches
Human-based approaches mimic behavior that is the result of a complicated and largely unknown evolutionary process
Therefore we focus on the approach of acting rationally


Rational Agents
An agent is an entity that perceives and acts (autonomously)
Formally, an agent is a function from a perception history to actions: $f : P^* \rightarrow A$
We look for agents with the best performance for a given class of environments and tasks
Perfect rationality is unrealistic; aim for the best program for the given resources

Foundations of AI
Philosophy
  Can formal rules be used to draw valid conclusions?
  How does the mind arise from the physical brain?
  Where does knowledge come from?
  How does knowledge lead to action?
Mathematics
  What are the formal rules to draw valid conclusions?
  What can be computed?
  How do we reason with uncertain information?

Foundations of AI(2)
Economics
  How to make decisions to maximize payoff?
  What about others not going along?
  What if the payoff is far in the future?
Neuroscience
  How do brains process information?
Psychology
  How do humans and animals think and act?


Foundations of AI(3)
Computer engineering
  How can we build efficient computers?
Control theory and cybernetics
  How can artifacts operate under their own control?
Linguistics
  How does language relate to thought?

History of AI
1952-1969: Early enthusiasm, great expectations
1966-1973: A dose of reality
1969-1979: Knowledge-based systems
1980-present: AI becomes an industry
1986-present: Return of neural networks
1987-present: AI becomes a science
1995-present: Intelligent agents

State of the Art
Which of the following can be done today?
  Play a (decent) game of air hockey
  (Safely) drive through the desert
  (Safely) drive through Central London at rush hour
  Buy a week's worth of groceries on the web
  Buy a week's worth of groceries at the local supermarket on Saturday
  Play a game of chess
  Translate languages
  Converse successfully with another person for an hour


Chapter Summary
Different people think differently of AI
Are you concerned with thinking or behavior?
Do you want to model humans or work from an ideal standard?
We focus on rational action

Overview of Lecture
General introduction to AI
Problem Solving
Neural Networks
Evolutionary Computing
Swarm Intelligence
Fuzzy Systems
Social and Philosophical Implications of AI

Chapter 2

Intelligent Agents


Outline
Agents and environments
Rationality
PEAS (Performance measure, Environment, Actuators, Sensors)
Environment types
Agent types

Agents and Environments
[Figure: an agent and its environment; the agent receives percepts through sensors and performs actions through actuators]
Agents include humans, robots, programs
Remember: the agent function maps from perception history to actions: $f : P^* \rightarrow A$
The agent program runs on a physical architecture to produce f

Vacuum Cleaner World
[Figure: two locations, A and B]
Perception: location and contents, e.g. [A, Dirty]
Actions: Left, Right, Suck, Do Nothing


Vacuum Cleaner Agent
Perception history          Action
[A,Clean]                   Right
[A,Dirty]                   Suck
[B,Clean]                   Left
[B,Dirty]                   Suck
[A,Clean], [A,Clean]        Right
[A,Clean], [A,Dirty]        Suck
Is this the right function (i.e. is it rational)?
Can we implement this in a small agent program?

Vacuum Cleaner Agent(2)
function VACUUM-AGENT([location,status])
  if status = Dirty then return Suck
  else if location = A then return Right
  else return Left

Rationality
What is rational at any given time depends on
  Performance measure that defines success
  Agent's prior knowledge of the environment
  Actions that the agent can perform
  Agent's perception history to date
Rational agent: for each possible perception history, selects an action that is expected to maximize the performance measure, given the history and built-in knowledge
Is our vacuum-cleaning function rational?


Rationality(2)
Depends on the situation
Assume the following performance measure and environment:
  One point per clean location per time step
  Environment is known to consist only of locations A and B
  Clean locations stay clean, sucking cleans dirty locations
  We only have the aforementioned four actions
  Agent correctly perceives location and status
In this case it is rational
Is it still rational if we give a penalty of one point per move?
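The question about the move penalty can be explored with a small simulation. The following is a minimal sketch, not part of the original slides: the function names, the scoring order (points are counted after each action), and the 10-step horizon are assumptions made for illustration.

```python
def vacuum_agent(location, status):
    """Simple reflex agent: decides from the current percept only."""
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def run(agent, dirt, location, steps, move_penalty=0):
    """Simulate the two-location world and return the total score.

    One point per clean location per time step (as on the slide), minus an
    optional penalty for every Left/Right move (assumed scoring order:
    the action is applied first, then the clean locations are counted).
    """
    dirt = dict(dirt)  # copy so the caller's dict is not modified
    score = 0
    for _ in range(steps):
        status = "Dirty" if dirt[location] else "Clean"
        action = agent(location, status)
        if action == "Suck":
            dirt[location] = False
        elif action == "Right":
            location = "B"
            score -= move_penalty
        elif action == "Left":
            location = "A"
            score -= move_penalty
        score += sum(1 for is_dirty in dirt.values() if not is_dirty)
    return score


both_dirty = {"A": True, "B": True}
print(run(vacuum_agent, both_dirty, "A", 10))                  # 18
print(run(vacuum_agent, both_dirty, "A", 10, move_penalty=1))  # 10
```

With the penalty, the agent keeps losing points by moving back and forth after both locations are clean, which suggests that it is no longer rational under that performance measure.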

Rationality(3)
Rational ≠ omniscient: perceptions may not supply all relevant information
Rational ≠ clairvoyant: outcomes of actions may not be as expected
Hence, rational does not always mean successful
Being rational is also about
  Exploration: doing actions to modify future perceptions
  Learning: changing prior knowledge due to perceptions
  Autonomy: not relying on the designer's input alone

PEAS
When designing a rational agent, we need to consider performance measure, environment, actuators, sensors
Example: designing an automated taxi
  Performance measure: safety, destination, profits, legality, comfort, ...
  Environment: streets and motorways, traffic, pedestrians, weather, day-/nighttime, ...
  Actuators: steering, accelerator, brake, horn, speaker/display, ...
  Sensors: video, accelerometer, gauges, engine sensors, keyboard, microphone, GPS, ...


PEAS(2)
Example: internet shopping
  Performance measure: price, quality, appropriateness, efficiency
  Environment: current and future web sites of vendors and shippers
  Actuators: displaying to user, following URLs, filling in forms
  Sensors: HTML pages (text, graphics, scripts)

Environment Types
Fully observable vs. partially observable: sensors are able to detect all relevant aspects / noisy or inaccurate sensors
Deterministic vs. stochastic: next state completely defined by current state / uncertainty
Episodic vs. sequential: future episodes do not depend on decisions in previous ones / short-term actions can have long-term consequences

Environment Types(2)
Static vs. dynamic: no change in environment while agent is deliberating / change in environment
Discrete vs. continuous: finite number of distinct states / smooth range of values
Single agent vs. multi agent: agent working by itself / agent has to compete or cooperate with other agents


Agent Types
Four basic types in order of increasing generality
  Simple reflex agent
  Reflex agent with state
  Goal-based agent
  Utility-based agent
All of them can be turned into learning agents

Simple Reflex Agent
[Figure: simple reflex agent; sensors report what the world is like now, condition-action rules determine what action the agent should do now, and actuators act on the environment]

Example
The vacuum agent from before is a simple reflex agent
function VACUUM-AGENT([location,status])
  if status = Dirty then return Suck
  else if location = A then return Right
  else return Left


Reflex Agent with State
[Figure: reflex agent with state; boxes labeled Sensors, State, How the world evolves, What my actions do, What the world is like now, Condition-action rules, What action I should do now, Actuators]

Example
cleaned[A], cleaned[B] = false
function VACUUM-AGENT([location,status])
  if status = Dirty then
    cleaned[location] = true
    return Suck
  else if location = A then
    cleaned[A] = true
    if cleaned[B] then return Do nothing
    else return Right
  else
    cleaned[B] = true
    if cleaned[A] then return Do nothing
    else return Left
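The stateful vacuum agent can be sketched in Python as follows. This is an illustrative translation of the pseudocode, not the lecture's own implementation; the closure-based design and the name `make_stateful_vacuum_agent` are assumptions.

```python
def make_stateful_vacuum_agent():
    """Return an agent function that remembers which locations are known to be clean."""
    cleaned = {"A": False, "B": False}  # internal state, as in the pseudocode

    def agent(location, status):
        # Either the location is already clean or it will be after sucking
        cleaned[location] = True
        if status == "Dirty":
            return "Suck"
        other = "B" if location == "A" else "A"
        if cleaned[other]:
            return "Do nothing"
        return "Right" if location == "A" else "Left"

    return agent


agent = make_stateful_vacuum_agent()
percepts = [("A", "Dirty"), ("A", "Clean"), ("B", "Dirty"), ("B", "Clean")]
actions = [agent(location, status) for location, status in percepts]
print(actions)  # ['Suck', 'Right', 'Suck', 'Do nothing']
```

Unlike the simple reflex agent, this agent stops moving once it has seen both locations clean, so it would not be hurt by a penalty per move.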

Goal-based Agent
Knowing about the current state is not always enough
The correct decision depends on what our goal is
Agent has to determine if an action will bring it towards the goal
Doing this involves planning
Vacuum-cleaning agent had its goal hardwired into the program: clean up all locations (so no planning involved)


Goal-based Agent(2)
[Figure: goal-based agent; boxes labeled Sensors, State, How the world evolves, What my actions do, What the world is like now, What it will be like if I do action A, Goals, What action I should do now, Actuators]

Utility-based Agent
Goals alone are not enough
E.g. many ways for a taxi to get to its destination, but some are quicker, safer, more reliable, cheaper, ...
How happy does a state make an agent?
"Happy" doesn't sound very scientific, therefore it's called (high) utility

Utility-based Agent(2)
[Figure: utility-based agent; boxes labeled Sensors, State, How the world evolves, What my actions do, What the world is like now, What it will be like if I do action A, Utility, How happy I will be in such a state, What action I should do now, Actuators]


Learning Agent
How do programs for selecting actions come into being?
Programming everything by hand is very laborious
Alternative: build machines that can learn and then teach them

Learning Agent(2)
[Figure: learning agent with performance standard, critic, sensors, learning element, problem generator, performance element, and actuators; arrows labeled feedback, changes, knowledge, and learning goals]

Chapter Summary
Agents interact with environments through actuators and sensors
An agent function f describes what the agent does in all circumstances
A performance measure evaluates the sequence of actions and its effects
A rational agent maximizes the expected performance


Chapter 3

Problem Solving and Search

Outline
Problem-solving agents
Problem formulation
Example problems
Basic search algorithms

Problem-solving Agents
Simplest agents discussed so far are reflex agents
They use a direct mapping from states to actions
Unsuitable for very large mappings
Goal-based agents consider future actions and their outcome
Problem-solving agents are one kind of goal-based agent
Problem-solving agents find action sequences that lead to desirable states


Problem-solving Agents(2)
seq: an action sequence, initially empty
state: a description of the current world state
goal: a goal, initially null
problem: a problem formulation

function SIMPLE-PROBLEM-SOLVER(perception)
  state = UPDATE-STATE(state, perception)
  if seq is empty then
    goal = FORMULATE-GOAL(state)
    problem = FORMULATE-PROBLEM(state,goal)
    seq = SEARCH(problem)
  action = FIRST(seq)
  seq = REST(seq)
  return action

Problem-solving Agents(3)
Agent on previous slide does offline problem solving
Simple "formulate, search, execute" design
When executing the sequence of actions, it ignores the perceptions
Assumes that the solution that was found will always work

Example: Romania
On holiday in Romania
Currently in Arad, flight leaves tomorrow from Bucharest
Formulate goal: be in Bucharest in time
Formulate problem:
  States: being in various cities
  Actions: drive between cities
Find solution: sequence of cities, e.g. Arad, Sibiu, Fagaras, Bucharest
Execute solution


Simplified Road Map
[Figure: simplified road map of Romania with 20 cities (Arad, Bucharest, Craiova, Dobreta, Eforie, Fagaras, Giurgiu, Hirsova, Iasi, Lugoj, Mehadia, Neamt, Oradea, Pitesti, Rimnicu Vilcea, Sibiu, Timisoara, Urziceni, Vaslui, Zerind); roads are labeled with distances]

A More Detailed Look
In a first step we look at the process of problem formulation
Then take a look at how to search

Problem Formulation
A problem is defined by four components
  Initial state: the state in which the agent starts, e.g. In(Arad)
  Successor function S(x): description of possible actions and their outcome, e.g. S(Arad) = { <Go(Sibiu), In(Sibiu)>, ... }
  Goal test: determines if a given state is a goal state, e.g. explicit: In(Bucharest), implicit: king checkmated
  Path cost: function that assigns a numeric cost to each path (reflects performance measure), e.g. route distances between cities
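The four components of a problem formulation can be sketched as a small Python class. This is an illustration only: the class name `RouteProblem`, its method names, and the reduced road map are assumptions, and the distances follow the commonly used textbook version of this map rather than anything readable in the extracted figure.

```python
# Partial road map: (city, city) -> distance (assumed values for illustration)
ROADS = {
    ("Arad", "Zerind"): 75, ("Arad", "Sibiu"): 140, ("Arad", "Timisoara"): 118,
    ("Sibiu", "Fagaras"): 99, ("Sibiu", "Rimnicu Vilcea"): 80,
    ("Fagaras", "Bucharest"): 211, ("Rimnicu Vilcea", "Pitesti"): 97,
    ("Pitesti", "Bucharest"): 101,
}


class RouteProblem:
    """Initial state, successor function, goal test, and path cost."""

    def __init__(self, initial, goal, roads):
        self.initial = initial
        self.goal = goal
        self.graph = {}
        for (a, b), dist in roads.items():  # roads can be driven both ways
            self.graph.setdefault(a, {})[b] = dist
            self.graph.setdefault(b, {})[a] = dist

    def successors(self, state):
        """S(x): list of <action, resulting state> pairs."""
        return [(f"Go({city})", city) for city in sorted(self.graph[state])]

    def goal_test(self, state):
        return state == self.goal

    def path_cost(self, path):
        """Sum of the road distances along a sequence of cities."""
        return sum(self.graph[a][b] for a, b in zip(path, path[1:]))


problem = RouteProblem("Arad", "Bucharest", ROADS)
print(problem.successors("Arad"))
print(problem.path_cost(["Arad", "Sibiu", "Fagaras", "Bucharest"]))  # 450
```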


Problem Formulation(2)
At the moment we assume a single-state problem
  Deterministic, fully observable environment
  Agent knows exactly which state it will be in
Solution is a sequence of actions from the initial state to a goal state
Solution quality is measured by path cost
Optimal solution has the lowest path cost among all solutions

Selecting State Space
Formulation of the Romanian holiday problem seems reasonable, yet omits many details
Real state of the world is much more complex: condition of the road, condition of the car, weather, fuel level, ...
Actions are not trivial either: keep car on the road, fuel car, switch on/off lights, use indicator, ...
Outcomes of actions are also more complex: besides changing location, an action takes up time, consumes fuel, generates pollution, ...
Agent trying to consider all these details would be completely swamped

Selecting State Space(2)
State space and actions need to be abstracted for problem solving
Removing irrelevant and keeping relevant details
An abstract solution corresponds to a large number of detailed plans
E.g. driving with lights on between Sibiu and Rimnicu Vilcea, then switching them off, fueling the car between Pitesti and Bucharest, ...
Abstraction is valid if we can expand an abstract solution into a real-world solution straightforwardly


Example: Vacuum World


States: agent has two possible locations, each of which might be dirty or not, so $2 \cdot 2^2 = 8$ possible states
Initial state: any state can be designated as initial state
Successor function: generates legal states that result from trying (Left, Right, Suck) (Do nothing stays in current state)
Goal test: checks whether all locations are clean
Path cost: let's assume each step costs 1, so path cost = number of steps in solution
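The state count and the successor function of the vacuum world can be checked with a short sketch. The tuple encoding of a state as (agent location, dirt in A, dirt in B) and the function names are assumptions chosen for illustration.

```python
from itertools import product

LOCATIONS = ("A", "B")

# A state is (agent location, is A dirty?, is B dirty?)
STATES = list(product(LOCATIONS, (True, False), (True, False)))


def result(state, action):
    """State reached by applying an action; unknown actions leave the state unchanged."""
    location, dirt_a, dirt_b = state
    if action == "Left":
        return ("A", dirt_a, dirt_b)
    if action == "Right":
        return ("B", dirt_a, dirt_b)
    if action == "Suck":
        if location == "A":
            return (location, False, dirt_b)
        return (location, dirt_a, False)
    return state  # "Do nothing"


def successors(state):
    """Successor function: <action, resulting state> pairs."""
    return [(action, result(state, action)) for action in ("Left", "Right", "Suck")]


def goal_test(state):
    _, dirt_a, dirt_b = state
    return not dirt_a and not dirt_b


print(len(STATES))                          # 8
print([s for s in STATES if goal_test(s)])  # the two all-clean states
```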

Example: Vacuum World(2)
[Figure: state space of the vacuum world; the states are connected by arrows labeled L (Left), R (Right), and S (Suck)]
Lower two states are goal states

Example: 8-puzzle
3 × 3 board with eight numbered tiles and a blank space
Tiles adjacent to the blank space can slide into the space
[Figure: a start state and the goal state of the 8-puzzle]


Example: 8-puzzle(2)
States: specifying the location of each tile and the blank in one of the nine squares
Initial state: any state can be designated as initial state
Successor function: generates legal states that result from the four actions (blank moves Left, Right, Up, Down)
Goal test: checks whether the configuration matches the goal state from the previous slide
Path cost: let's assume each step costs 1, so path cost = number of steps in solution
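A successor function for the 8-puzzle might look like the sketch below. The state encoding (a tuple of nine entries read row by row, with 0 for the blank) and the example boards are assumptions for illustration; they are not taken from the figure on the slide.

```python
def successors(state):
    """Return <action, resulting state> pairs for moving the blank (0)."""
    blank = state.index(0)
    row, col = divmod(blank, 3)

    # Collect the legal moves together with the index the blank moves to
    moves = []
    if col > 0:
        moves.append(("Left", blank - 1))
    if col < 2:
        moves.append(("Right", blank + 1))
    if row > 0:
        moves.append(("Up", blank - 3))
    if row < 2:
        moves.append(("Down", blank + 3))

    result = []
    for action, target in moves:
        tiles = list(state)
        tiles[blank], tiles[target] = tiles[target], tiles[blank]
        result.append((action, tuple(tiles)))
    return result


corner = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # blank in the top-left corner
center = (1, 2, 3, 4, 0, 5, 6, 7, 8)   # blank in the middle
print([action for action, _ in successors(corner)])  # ['Right', 'Down']
print(len(successors(center)))                       # 4
```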

Real-world Examples
Touring problems: traveling salesman, delivery tour
VLSI layout: positioning millions of components and connections on a chip (minimizing area, circuit delays, stray capacitance, ...)
Robot navigation: no discrete sets of routes, continuous space
Automatic assembly sequence: find an order in which to assemble parts of an object
Protein design: find a sequence of amino acids that will fold into a three-dimensional protein with the right properties
Internet searching: looking for relevant information on the Web

Searching for Solutions
Having formulated problems, we now need to solve them
This is done by searching through the state space
Search tree is generated by taking the initial state and applying the successor function to it

Basic Tree Search Algorithm


function TREE-SEARCH(problem,strategy)
  initialize tree with initial problem state
  loop do
    if no candidates can be expanded then return failure
    choose leaf node for expansion (according to strategy)
    expand node
    add resulting nodes to tree
    if one of nodes contains goal state then return solution
  end

Tree Search Example
[Figure: search tree for the Romania problem, shown in three steps; the root Arad has the children Sibiu, Timisoara, and Zerind; Sibiu has the children Arad, Fagaras, Oradea, and Rimnicu Vilcea; Timisoara has the children Arad and Lugoj; Zerind has the children Arad and Oradea]

States vs. Nodes
It is important to distinguish between the state space and the search tree
E.g. Romanian route planning
  20 states, one for each city
  Infinite number of paths, infinite number of nodes (a good search algorithm avoids this)
A node in the search tree is a bookkeeping data structure
A state corresponds to a configuration of the world
Naming nodes after states in the example is a mere convenience

General Tree Search Algorithm
Collection of generated nodes not yet expanded is called the fringe
Nodes with bold outlines on previous slides make up the fringe
Each element of the fringe is a leaf node
A search strategy is implemented by a function that
  removes a node from the fringe
  expands it
  adds generated nodes to the fringe
Fringe is often realized in the form of a queue to simplify the selection process


General Tree Search Algorithm(2)


function GEN-TREE-SEARCH(problem,fringe)
  fringe = INSERT(INITIAL-STATE(problem),fringe)
  loop do
    if fringe is empty then return failure
    node = REMOVE-FIRST(fringe)
    if GOAL-TEST(problem,STATE(node)) then return node
    fringe = INSERT(EXPAND(node,problem),fringe)
  end

General Tree Search Algorithm(3)
function EXPAND(node,problem)
  successors = empty set
  for each <action,result> in S(problem,STATE(node)) do
    n = new node
    PARENT-NODE[n] = node
    ACTION[n] = action
    STATE[n] = result
    PATH-COST[n] = PATH-COST(node) + STEP-COST(node,action,n)
    DEPTH[n] = DEPTH[node] + 1
    add n to successors
  end
  return successors
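The general tree search and the EXPAND function can be sketched in Python as follows. This is a simplified illustration, not the lecture's implementation: the `Node` dataclass, the dictionary-based toy graph, and the use of a FIFO fringe are assumptions. The toy graph is acyclic on purpose, since plain tree search can loop forever when states repeat.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Bookkeeping structure: a state plus how it was reached."""
    state: str
    parent: Optional["Node"] = None
    action: Optional[str] = None
    path_cost: int = 0
    depth: int = 0


def expand(node, graph):
    """Create one child node per successor of node.state."""
    return [
        Node(state=nxt, parent=node, action=f"Go({nxt})",
             path_cost=node.path_cost + cost, depth=node.depth + 1)
        for nxt, cost in graph[node.state].items()
    ]


def tree_search(start, goal, graph):
    fringe = deque([Node(start)])   # FIFO fringe gives breadth-first order
    while fringe:
        node = fringe.popleft()     # REMOVE-FIRST
        if node.state == goal:      # GOAL-TEST
            return node
        fringe.extend(expand(node, graph))
    return None                     # failure


def path(node):
    """Follow parent links back to the root and return the states in order."""
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]


GRAPH = {"A": {"B": 1, "C": 4}, "B": {"D": 2}, "C": {"D": 1}, "D": {}}
goal_node = tree_search("A", "D", GRAPH)
print(path(goal_node), goal_node.path_cost, goal_node.depth)  # ['A', 'B', 'D'] 3 2
```

Swapping the fringe for a stack or a priority queue changes the search strategy without touching the rest of the loop.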

Search Strategies
Strategy is defined by picking the order of node expansion
Uninformed strategies use only the information available in the problem definition
  Breadth-first search
  Uniform-cost search
  Depth-first search
  Depth-limited search
  Iterative deepening search

Breadth-first Search (BFS)
Expand shallowest unexpanded node
Implementation: fringe is a FIFO queue, i.e., new successors are added at the end
[Figure: breadth-first expansion of a binary tree with nodes A to G, shown in four steps]

Quality of Strategy
How do we measure the quality of a search strategy?
Four aspects:
  Completeness: does the algorithm find a solution if there is one?
  Optimality: is the strategy able to find the optimal solution?
  Time complexity: how long does it take to find a solution?
  Space complexity: how much memory is needed to perform the search?

Quality of Strategy(2)
Complexity is expressed in terms of three quantities
  Branching factor (b): maximum number of successors of any node
  Depth (d): depth of the shallowest goal node
  Maximum length of any path (m) in the search space

How does BFS fare?


Completeness: yes (if b is finite)
Optimality: only if cost = 1 per step (not optimal in general)
Time complexity: $1 + b + b^2 + b^3 + \dots + b^d + (b^{d+1} - b) = O(b^{d+1})$
Space complexity: $O(b^{d+1})$ (keeps whole fringe in memory)
Big problem: can easily generate nodes at 100 MB/sec, so searching a large search space can take hours

Uniform-cost Search(UCS)
Expand least-cost unexpanded node
Implementation: fringe is a queue ordered by path cost
Equivalent to BFS if step costs are all equal
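A fringe ordered by path cost can be realized with a binary heap. The sketch below uses Python's `heapq` module; the reduced road map and its distances are assumptions taken from the commonly used textbook version of the Romania map, and the check against revisiting cities on the current path is an addition to keep the example finite.

```python
import heapq

# Assumed partial road map for illustration: (city, city, distance)
EDGES = [
    ("Arad", "Zerind", 75), ("Arad", "Sibiu", 140), ("Arad", "Timisoara", 118),
    ("Sibiu", "Fagaras", 99), ("Sibiu", "Rimnicu Vilcea", 80),
    ("Fagaras", "Bucharest", 211), ("Rimnicu Vilcea", "Pitesti", 97),
    ("Pitesti", "Bucharest", 101),
]


def build_graph(edges):
    """Turn the edge list into an undirected adjacency dictionary."""
    graph = {}
    for a, b, dist in edges:
        graph.setdefault(a, {})[b] = dist
        graph.setdefault(b, {})[a] = dist
    return graph


def uniform_cost_search(graph, start, goal):
    fringe = [(0, start, [start])]  # (path cost, state, path so far)
    while fringe:
        cost, state, path = heapq.heappop(fringe)  # least-cost node first
        if state == goal:
            return cost, path
        for nxt, step in graph[state].items():
            if nxt not in path:  # do not loop back along the current path
                heapq.heappush(fringe, (cost + step, nxt, path + [nxt]))
    return None


print(uniform_cost_search(build_graph(EDGES), "Arad", "Bucharest"))
# (418, ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'])
```

The goal test is applied when a node is removed from the fringe, not when it is generated; this is what lets the cheaper route via Pitesti win over the route via Fagaras, which reaches Bucharest in fewer steps.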

How does UCS fare?


Completeness: yes (if step cost $\geq \epsilon$)
Optimality: yes (nodes expanded in increasing order of costs)
Time complexity: number of nodes with costs $\leq C^*$ (cost of optimal solution), $O(b^{C^*/\epsilon})$
Space complexity: number of nodes with costs $\leq C^*$, $O(b^{C^*/\epsilon})$

Depth-first Search (DFS)
Expand deepest unexpanded node
Implementation: fringe is a LIFO queue (or stack), i.e., new successors are added at front
[Figure: depth-first expansion of a binary tree with nodes A to O, shown in twelve steps]

How does DFS fare?


Completeness: no (fails in infinitely deep spaces or spaces with loops, complete in finite spaces; modify to avoid repeated states)
Optimality: no
Time complexity: $O(b^m)$, terrible if m is much larger than d; if solutions are dense, faster than BFS
Space complexity: $O(bm)$, newly expanded nodes are expanded all the way down to a leaf node

Depth-limited Search (DLS)
DFS with depth limit l, i.e., nodes at depth l are not expanded
function DLS(problem,limit)
  return RECURSIVE-DLS(INITIAL-STATE(problem),problem,limit)

function RECURSIVE-DLS(node,problem,limit)
  limit_reached = false
  if GOAL-TEST(problem,STATE(node)) then return node
  else if DEPTH(node) = limit then return cutoff
  else for each successor in EXPAND(node,problem) do
    result = RECURSIVE-DLS(successor,problem,limit)
    if result = cutoff then limit_reached = true
    else if result <> failure then return result
  end
  if limit_reached then return cutoff else return failure

Iterative Deepening Search (IDS)


IDS is an iterative depth-limited search that gradually increases the limit
Combines the benefits of BFS and DFS
  Lower memory consumption than BFS
  Is complete (and optimal, if step cost = 1)
Disadvantage: states are generated multiple times

Iterative Deepening Search(2)


function IDS(problem)
  for depth = 0 to infinity do
    result = DLS(problem,depth)
    if result <> cutoff then return result
  end
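Depth-limited and iterative deepening search can be sketched together in Python. The version below is an illustration under assumptions: the tree is a plain dictionary of child lists (lettered like the binary tree used in the figures), the remaining limit is counted down instead of comparing node depth with the limit, and `max_depth` replaces the unbounded loop so that the example always terminates.

```python
CUTOFF = "cutoff"  # sentinel: the depth limit was hit somewhere below

TREE = {
    "A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
    "D": ["H", "I"], "E": ["J", "K"], "F": ["L", "M"], "G": ["N", "O"],
}


def depth_limited_search(tree, state, goal, limit, path=()):
    """Return the path to goal, CUTOFF if the limit was reached, or None on failure."""
    path = path + (state,)
    if state == goal:
        return list(path)
    if limit == 0:
        return CUTOFF
    limit_reached = False
    for child in tree.get(state, []):
        result = depth_limited_search(tree, child, goal, limit - 1, path)
        if result == CUTOFF:
            limit_reached = True
        elif result is not None:
            return result
    return CUTOFF if limit_reached else None


def iterative_deepening_search(tree, start, goal, max_depth=50):
    """Run depth-limited search with limits 0, 1, 2, ... up to max_depth."""
    for depth in range(max_depth + 1):
        result = depth_limited_search(tree, start, goal, depth)
        if result != CUTOFF:
            return result
    return None


print(depth_limited_search(TREE, "A", "M", 2))   # cutoff
print(iterative_deepening_search(TREE, "A", "M"))  # ['A', 'C', 'F', 'M']
```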

Iterative Deepening Search(3)
[Figure: iterative deepening search on a binary tree with nodes A to O, shown step by step for Limit = 0, Limit = 1, Limit = 2, and Limit = 3]

How does IDS fare?
Completeness: yes
Optimality: yes (if step cost = 1; can also be modified to explore uniform-cost tree)
Time complexity: $(d + 1)b^0 + d b^1 + (d - 1)b^2 + \dots + b^d = O(b^d)$
Space complexity: $O(bd)$
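To get a feeling for the two time-complexity formulas, the node counts of BFS and IDS can be evaluated directly. The function names and the example values b = 10 and d = 5 are assumptions for illustration.

```python
def bfs_nodes(b, d):
    """1 + b + b^2 + ... + b^d + (b^(d+1) - b), as in the BFS analysis."""
    return sum(b**i for i in range(d + 1)) + (b**(d + 1) - b)


def ids_nodes(b, d):
    """(d+1)b^0 + d*b^1 + (d-1)b^2 + ... + 1*b^d, as in the IDS analysis."""
    return sum((d + 1 - i) * b**i for i in range(d + 1))


print(bfs_nodes(10, 5))  # 1111101
print(ids_nodes(10, 5))  # 123456
```

For these values, IDS generates roughly a ninth of the nodes that BFS does, even though it regenerates the upper levels of the tree several times.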

Avoiding Repeated States
One complication ignored up to now: wasting space by expanding states that have already been encountered
Failure to detect repeated states can turn a linear problem into an exponential one:
[Figure: a state space with states A, B, C, D and the corresponding search tree, in which states are repeated]

Avoiding Repeated States(2)
Sometimes repeated states are unavoidable (e.g. problems where actions are reversible)
Then search trees are infinite, but often we can prune some of the repeated states
DFS can detect loops (as it knows the path from the root to the current node), but cannot avoid the situation on the previous slide
Only way to avoid these is to keep more nodes in memory
Fundamental tradeoff between time and space

Avoiding Repeated States(3)
General tree search can be modified to include a new data structure
Closed list stores every expanded node
If the current node matches a node in the closed list, it is discarded
New algorithm is called graph search

Graph Search
function GRAPH-SEARCH(problem,fringe)
  closed = empty set
  fringe = INSERT(INITIAL-STATE(problem),fringe)
  loop do
    if fringe is empty then return failure
    node = REMOVE-FIRST(fringe)
    if GOAL-TEST(problem,STATE(node)) then return node
    if STATE(node) is not in closed then
      add STATE(node) to closed
      fringe = INSERTALL(EXPAND(node,problem), fringe)
  end
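A compact Python sketch of graph search is shown below. It is an illustration under assumptions: the fringe is a FIFO queue, each fringe entry carries its path instead of a full node structure, and the small graph with reversible actions is made up for the example.

```python
from collections import deque


def graph_search(graph, start, goal):
    closed = set()                      # states that were already expanded
    fringe = deque([(start, [start])])  # (state, path to that state)
    while fringe:
        state, path = fringe.popleft()
        if state == goal:
            return path
        if state not in closed:         # repeated states are discarded here
            closed.add(state)
            for nxt in graph[state]:
                fringe.append((nxt, path + [nxt]))
    return None                         # failure


# Reversible actions: every edge can be followed in both directions
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(graph_search(GRAPH, "A", "E"))  # ['A', 'B', 'D', 'E']
print(graph_search(GRAPH, "A", "Z"))  # None
```

Plain tree search would never terminate on the second call, because the reversible actions make the search tree infinite; the closed set bounds the work by the size of the state space.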

How does Graph Search fare?


Completeness: yes
Optimality: no (when two paths to a state exist, the newly discovered one is discarded; can be modified by using UCS)
Time complexity: proportional to the size of the state space
Space complexity: proportional to the size of the state space
Graph search is more efficient for problems with many repeated states

Chapter Summary
This chapter dealt with problem formulation and (simple) search strategies
Problem formulation involves abstracting away irrelevant real-world details for feasibility reasons
Variety of uninformed search strategies
IDS is often the preferred search method for large search spaces and unknown depths of goals
Graph search can be exponentially more efficient than tree search

Chapter 4
Informed Search Algorithms

Outline
Using problem-specific knowledge, searching can be improved considerably
We look at the following strategies in this chapter:
  Best-first search
  Greedy search
  A* search
  Heuristics

Review: Tree Search
function GEN-TREE-SEARCH(problem,fringe)
  fringe = INSERT(INITIAL-STATE(problem),fringe)
  loop do
    if fringe is empty then return failure
    node = REMOVE-FIRST(fringe)
    if GOAL-TEST(problem,STATE(node)) then return node
    fringe = INSERTALL(EXPAND(node,problem),fringe)
  end
Strategy is defined by picking the order of node expansion
INSERT (or REMOVE-FIRST) are crucial functions

Best-first Search
Main idea: node is selected for expansion based on an evaluation function f(n)
Usually the node with the lowest score is selected (as f(n) normally measures distance to goal)
Implementation: fringe is a queue sorted in ascending order of evaluation scores

Best-first Search(2)
Rarely possible to find a perfect evaluation function f(n)
There is a whole family of best-first search algorithms
Key component is a heuristic function h(n) that estimates the cost of the cheapest path from node n to a goal
E.g. for the Romanian route, h(n) could be the straight-line distance from n to Bucharest
We look at two members of the family:
  Greedy search
  A* search

Greedy Search
Evaluation function f(n) = h(n)
Greedy search expands the node that appears to be closest to the goal
For the Romanian route example we choose h_SLD(n) = straight-line distance from n to Bucharest
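A minimal sketch of greedy search with f(n) = h(n), using a priority queue as the sorted fringe. The road fragment `ROADS` and the `visited` set (a repeated-state check) are assumptions made for this illustration; the step costs are those implied by the g-values of the A* example slides, and the h_SLD values are the ones used in the greedy example.

```python
import heapq

# Assumed fragment of the Romania road map (step costs in km).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97, "Craiova": 146},
    "Pitesti": {"Rimnicu Vilcea": 97, "Craiova": 138, "Bucharest": 101},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Timisoara": {"Arad": 118},
    "Craiova": {"Rimnicu Vilcea": 146, "Pitesti": 138},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
H_SLD = {"Arad": 366, "Bucharest": 0, "Craiova": 160, "Fagaras": 176,
         "Oradea": 380, "Pitesti": 100, "Rimnicu Vilcea": 193, "Sibiu": 253,
         "Timisoara": 329, "Zerind": 374}

def greedy_search(start, goal):
    """Always expand the fringe node with the lowest h(n)."""
    fringe = [(H_SLD[start], start, [start])]
    visited = set()
    while fringe:
        _, city, path = heapq.heappop(fringe)
        if city == goal:
            return path
        if city in visited:
            continue
        visited.add(city)
        for nxt in ROADS[city]:
            heapq.heappush(fringe, (H_SLD[nxt], nxt, path + [nxt]))
    return None

def path_cost(path):
    return sum(ROADS[a][b] for a, b in zip(path, path[1:]))

route = greedy_search("Arad", "Bucharest")
```

On this fragment the route goes through Sibiu and Fagaras and costs 450 km, which is not the cheapest route.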

Heuristic for Romania
[Figure: road map of Romania with step costs in km]

Straight-line distance to Bucharest
Arad            366    Mehadia          241
Bucharest         0    Neamt            234
Craiova         160    Oradea           380
Dobreta         242    Pitesti          100
Eforie          161    Rimnicu Vilcea   193
Fagaras         176    Sibiu            253
Giurgiu          77    Timisoara        329
Hirsova         151    Urziceni          80
Iasi            226    Vaslui           199
Lugoj           244    Zerind           374

Greedy Search Example
Arad 366

Greedy Search Example(2)
Arad
  Sibiu 253
  Timisoara 329
  Zerind 374

Greedy Search Example(3)
Arad
  Sibiu
    Arad 366
    Fagaras 176
    Oradea 380
    Rimnicu Vilcea 193
  Timisoara 329
  Zerind 374

Greedy Search Example(4)
Arad
  Sibiu
    Arad 366
    Fagaras
      Sibiu 253
      Bucharest 0
    Oradea 380
    Rimnicu Vilcea 193
  Timisoara 329
  Zerind 374

How does Greedy Search fare?
Completeness: no (can get stuck in loops, e.g. getting from Neamt to Fagaras: Neamt - Iasi - Neamt - ...)
  Complete in finite spaces with checking for repeated states
Optimality: no (only optimal for problems with matroid property)
Time complexity: O(b^m), but a good heuristic can give dramatic improvement
Space complexity: O(b^m) (keeps all nodes in memory)

A* Search
Idea: avoid expanding paths that are already expensive
Evaluation function f(n) = g(n) + h(n)
  g(n) = cost so far to reach n
  h(n) = estimated cost from n to goal
  f(n) = estimated total cost of path through n to goal
If h(n) has certain properties, then A* search is optimal!

A* Search(2)
A* search is optimal if we use an admissible heuristic
This means that, if we are wrong, we only underestimate the true cost: h(n) ≤ h*(n), where h*(n) is the true cost from n to the goal
We also require h(G) = 0 for any goal G and h(n) > 0 otherwise
For our example, h_SLD(n) is an admissible heuristic

A* Search Example
Arad 366=0+366
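A minimal sketch of A* with f(n) = g(n) + h(n). The road fragment is an assumption made for illustration (step costs as implied by the g-values of the example slides), and the `best_g` bookkeeping for repeated states is an addition that the slides do not prescribe.

```python
import heapq

# Assumed fragment of the Romania road map (step costs in km).
ROADS = {
    "Arad": {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97, "Craiova": 146},
    "Pitesti": {"Rimnicu Vilcea": 97, "Craiova": 138, "Bucharest": 101},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Timisoara": {"Arad": 118},
    "Craiova": {"Rimnicu Vilcea": 146, "Pitesti": 138},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
H_SLD = {"Arad": 366, "Bucharest": 0, "Craiova": 160, "Fagaras": 176,
         "Oradea": 380, "Pitesti": 100, "Rimnicu Vilcea": 193, "Sibiu": 253,
         "Timisoara": 329, "Zerind": 374}

def a_star(start, goal):
    """Expand the fringe node with the lowest f = g + h; returns (path, cost)."""
    fringe = [(H_SLD[start], 0, start, [start])]   # entries are (f, g, city, path)
    best_g = {}                                    # cheapest g at which a city was expanded
    while fringe:
        f, g, city, path = heapq.heappop(fringe)
        if city == goal:
            return path, g
        if city in best_g and best_g[city] <= g:
            continue                               # already expanded via a path at least as cheap
        best_g[city] = g
        for nxt, step in ROADS[city].items():
            g2 = g + step
            heapq.heappush(fringe, (g2 + H_SLD[nxt], g2, nxt, path + [nxt]))
    return None, float("inf")

route, cost = a_star("Arad", "Bucharest")
```

The expansion order (Arad, Sibiu, Rimnicu Vilcea, Fagaras, Pitesti) follows the f-values shown in the example slides, and the goal is selected with f = 418.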

A* Search Example(2)
Arad
  Sibiu 393=140+253
  Timisoara 447=118+329
  Zerind 449=75+374

A* Search Example(3)
Arad
  Sibiu
    Arad 646=280+366
    Fagaras 415=239+176
    Oradea 671=291+380
    Rimnicu Vilcea 413=220+193
  Timisoara 447=118+329
  Zerind 449=75+374

A* Search Example(4)
Arad
  Sibiu
    Arad 646=280+366
    Fagaras 415=239+176
    Oradea 671=291+380
    Rimnicu Vilcea
      Craiova 526=366+160
      Pitesti 417=317+100
      Sibiu 553=300+253
  Timisoara 447=118+329
  Zerind 449=75+374

A* Search Example(5)
Arad
  Sibiu
    Arad 646=280+366
    Fagaras
      Sibiu 591=338+253
      Bucharest 450=450+0
    Oradea 671=291+380
    Rimnicu Vilcea
      Craiova 526=366+160
      Pitesti 417=317+100
      Sibiu 553=300+253
  Timisoara 447=118+329
  Zerind 449=75+374

A* Search Example(6)
Arad
  Sibiu
    Arad 646=280+366
    Fagaras
      Sibiu 591=338+253
      Bucharest 450=450+0
    Oradea 671=291+380
    Rimnicu Vilcea
      Craiova 526=366+160
      Pitesti
        Bucharest 418=418+0
        Craiova 615=455+160
        Rimnicu Vilcea 607=414+193
      Sibiu 553=300+253
  Timisoara 447=118+329
  Zerind 449=75+374

Optimality of A*
Suppose some suboptimal goal G2 has been generated and is in the queue
Let n be an unexpanded node on a shortest path to an optimal goal G1
[Figure: search tree from Start, with the unexpanded node n on the path to the optimal goal and G2 in the queue]

Optimality of A*(2)
f(G2) = g(G2)    since h(G2) = 0
      > g(G1)    since G2 is suboptimal
      ≥ f(n)     since h is admissible
Since f(G2) > f(n), A* will never select G2 for expansion

How does A* fare?
Completeness: yes (unless there are infinitely many nodes with f ≤ f(G))
Optimality: yes
Time complexity: exponential in (relative error in h × length of solution)
Space complexity: keeps all nodes in memory

Heuristics Functions
We'll now shed some light on h(n) in general
Let's take another look at the 8-puzzle
Start State      Goal State
7 2 4            1 2 3
5 _ 6            4 5 6
8 3 1            7 8 _
Average solution cost for a randomly generated instance is about 22 steps
Branching factor is about 3

Heuristics Functions(2)
Exhaustive search would look at about 3.1 × 10^10 states
Keeping track of repeated states, this can be cut down to 181,440
For the 8-puzzle this is manageable; for the 15-puzzle the corresponding number is roughly 10^13
We need a good heuristic function

Heuristics Functions(3)
Commonly-used candidates:
  h1(n) = the number of misplaced tiles
  h2(n) = total Manhattan distance (i.e., number of squares from desired location of each tile)
Start State      Goal State
7 2 4            1 2 3
5 _ 6            4 5 6
8 3 1            7 8 _
h1(Start State) = 6 (tiles 2 and 6 are placed correctly)
h2(Start State) = 4 + 0 + 3 + 3 + 1 + 0 + 2 + 1 = 14

Heuristics Functions(4)
Both h1 and h2 are admissible
How efficient are they? Typical search costs (nodes):
            d = 14        d = 24
IDS         3,473,941     ≈ 54,000,000,000
A*(h1)      539           39,135
A*(h2)      113           1,641
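The two heuristics can be sketched in a few lines of Python. The board layout below is an assumption: it is the start/goal pair that is consistent with the stated values h1 = 6 and h2 = 4 + 0 + 3 + 3 + 1 + 0 + 2 + 1 = 14, written row by row with 0 for the blank.

```python
START = (7, 2, 4,
         5, 0, 6,
         8, 3, 1)   # 0 marks the blank square
GOAL = (1, 2, 3,
        4, 5, 6,
        7, 8, 0)

def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for tile, target in zip(state, goal)
               if tile != 0 and tile != target)

def h2(state, goal=GOAL):
    """Total Manhattan distance of all tiles from their goal squares."""
    total = 0
    for tile in range(1, 9):
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

def h_max(state):
    """Combination of two admissible heuristics via max."""
    return max(h1(state), h2(state))
```

Both functions return 0 on the goal state, as required of a heuristic used with A*.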

Quality of Heuristics
h2 seems to be better than h1
Is this always the case?
Yes, as h2 dominates h1, i.e., h2(n) ≥ h1(n) for all n
In terms of efficiency this means that A* using h2 will never expand more nodes than A* using h1
If you are not sure about dominance: given any admissible heuristics ha and hb, h(n) = max(ha(n), hb(n)) is also admissible and dominates ha and hb

Inventing Admissible Heuristics
Admissible heuristics can be derived from the exact solution of a relaxed version of the problem
Relaxed means we have fewer restrictions on the allowed actions
If a tile can move anywhere (even to occupied spaces), h1(n) gives the shortest solution
If a tile can move to any adjacent square (even to occupied spaces), h2(n) gives the shortest solution
Key point: optimal solution cost of relaxed problem ≤ optimal solution cost of real problem

Inventing Admissible Heuristics(2)
Relaxed problem should be much easier to solve
If it is hard to solve, obtaining the values for h(n) will be expensive
Heuristics can be combined via max to get a dominant one
Process has been automated in a program called ABSOLVER (problem definition needs to be formalized)

Chapter Summary
Heuristic functions estimate costs of shortest paths
Good heuristics can dramatically reduce search cost
Greedy search expands node with lowest h(n)
  Incomplete and not always optimal
A* search expands node with lowest g(n) + h(n)
  Complete and optimal (given an admissible heuristic)
  Also very efficient
Admissible heuristics can be derived by relaxing problems

Chapter 5
Local Search Algorithms

Outline
Hill climbing
Simulated Annealing
Local Beam Search
Genetic algorithms (very briefly)

Motivation
Search algorithms up to now memorize the path from initial state to goal
In many problems the path is irrelevant, we are only interested in a solution (e.g. 8-queens problem)
This class of problems includes many important applications:
  Integrated-circuit design
  Factory-floor layout
  Job scheduling
  Network optimization
  Vehicle routing
  Portfolio management

Iterative Improvement (II)
In such cases we can use iterative improvement algorithms
Keep a single current state and try to improve this state
Constant space, suitable for online as well as offline search

Example: Traveling Salesman
Start with any complete tour, perform pairwise exchanges (that improve the current tour)
Variants of this approach get within 1% of optimum very quickly (for thousands of cities)

Example: n-queens
Put n queens on an n × n board with no two queens sharing a row, column, or diagonal
II: move a queen to reduce the number of conflicts
[Figure: three successive boards with h = 5, h = 2 and h = 0 conflicts]
Solves n-queens problem very quickly for very large n

Hill-climbing Search
Simply a loop that continues moving in the direction of increasing value
Terminates when it reaches a peak (where no neighboring state has a higher value)
Does not look beyond immediate neighbors of current state (greedy local search)
Like climbing Everest in thick fog with amnesia

Hill-climbing Search(2)
function HILL-CLIMBING(problem)
  current = INITIAL-STATE(problem)
  loop do
    neighbor = highest-valued successor of current
    if VALUE(neighbor) <= VALUE(current) then return STATE(current)
    current = neighbor
  end
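A minimal sketch of HILL-CLIMBING applied to n-queens. The state encoding (one queen per column, `state[c]` is its row), the value function (negated number of attacking pairs) and the `random_restart` wrapper are assumptions chosen for this illustration.

```python
import random

def value(state):
    """Negated number of attacking queen pairs; 0 means a solution."""
    n = len(state)
    attacks = sum(1 for a in range(n) for b in range(a + 1, n)
                  if state[a] == state[b] or abs(state[a] - state[b]) == b - a)
    return -attacks

def hill_climbing(state):
    """Move to the best neighbor until no neighbor is strictly better."""
    current = list(state)
    n = len(current)
    while True:
        # neighbors: move one queen to another row within its column
        neighbors = [current[:c] + [r] + current[c + 1:]
                     for c in range(n) for r in range(n) if r != current[c]]
        best = max(neighbors, key=value)
        if value(best) <= value(current):
            return current              # peak (possibly only a local maximum)
        current = best

def random_restart(n, rng, max_restarts=10_000):
    """Repeat hill climbing from random initial states until a solution is found."""
    for _ in range(max_restarts):
        result = hill_climbing([rng.randrange(n) for _ in range(n)])
        if value(result) == 0:
            return result
    return None
```

A single run of `hill_climbing` may stop at a local maximum with conflicts left; the restart wrapper simply tries again from a new random state.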

Problems with Hill-climbing
Hill-climbing can get stuck without finding the optimum
Is trapped in local maxima or on shoulders
[Figure: objective function plotted over the state space, showing the current state, a shoulder, a local maximum, a "flat" local maximum and the global maximum]

Solving the Problems
Random sideways moves allow escapes from shoulders
But still lead to infinite loops on flat maxima
Common technique: restart hill climbing with a new random initial state
Depending on the shape of the state space landscape this is more or less successful

Simulated Annealing (SA)
Idea: escape local maxima by allowing some bad moves
These bad moves are gradually decreased in their size and frequency
Modeled after gradually cooling down material in a heat bath to grow crystals

Simulated Annealing(2)
function SIM-ANNEALING(problem,schedule)
  current = INITIAL-STATE(problem)
  t = 1
  loop do
    temperature = schedule[t]
    if temperature = 0 then return current
    next = randomly selected successor of current
    diff = VALUE(next) - VALUE(current)
    if diff > 0 then current = next
    else current = next only with probability e^(diff/temperature)
    t = t + 1
  end
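A minimal sketch of SIM-ANNEALING. The one-dimensional `LANDSCAPE`, its neighbor function and the geometric cooling schedule are hypothetical choices for illustration; only the acceptance rule is taken from the pseudocode.

```python
import math
import random

def accept_probability(diff, temperature):
    """Good moves are always taken; bad moves with probability e^(diff/temperature)."""
    return 1.0 if diff > 0 else math.exp(diff / temperature)

def simulated_annealing(initial, successors, value, schedule, rng):
    current = initial
    for temperature in schedule:
        if temperature <= 0:
            break                       # temperature 0 ends the search
        nxt = rng.choice(successors(current))
        diff = value(nxt) - value(current)
        if rng.random() < accept_probability(diff, temperature):
            current = nxt
    return current

# Hypothetical objective values; the state is an index into this list.
LANDSCAPE = [1, 3, 2, 5, 9, 4, 6, 2]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]

schedule = [5.0 * 0.95 ** t for t in range(400)] + [0.0]
result = simulated_annealing(0, neighbors, lambda i: LANDSCAPE[i],
                             schedule, random.Random(42))
```

At high temperature almost every move is accepted; as the temperature falls, moves that lower the value become increasingly unlikely, so the search settles on a peak.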

Simulated Annealing(3)
Vivid description: getting a ping-pong ball into the deepest crevice of a bumpy surface (the hill turned upside down, so we look for minima)
Left alone, the ball will roll into a local minimum
If we shake the surface, we can bounce the ball out of a local minimum
The trick is to shake hard enough to get it out of a local minimum, but not hard enough to dislodge it from the global one
We start by shaking hard and then gradually reduce the intensity of shaking

Local Beam Search
Idea: keep k states instead of just 1
Begins with k randomly generated states
At each step all the successors of all k states are generated
If one is a goal, we stop; otherwise select the k best successors from the complete list and repeat

Local Beam Search(2)
At first glance, local beam search looks like running k iterative improvement algorithms in parallel
However, it isn't, as the results of all k states influence each other
If one state generates several good successors, they all end up in the next iteration
States generating bad successors are weeded out

Local Beam Search(3)
This is a strength and a weakness
Unfruitful searches are quickly abandoned and searches making the most progress are intensified
Can lead to a lack of diversity: concentration in a small region of the search space
Remedy: choose k successors randomly, biased towards good ones

Genetic Algorithms (GA)
Genetic algorithms can be seen as a variant of stochastic local beam search
However, successor states are generated by combining two parent states (rather than modifying one parent state)
A more detailed look at GA later in the lecture

Chapter Summary
We covered search algorithms that do not care about the path from initial state to goal
Only solutions are relevant for local search algorithms

Chapter 6
Constraint Satisfaction Problems

Outline
Constraint Satisfaction Problems (CSP) examples
Backtracking search for CSPs
Problem structure and problem decomposition
Local search for CSPs

Motivation
Up to now the states in search spaces were black boxes to the search algorithms
Only accessible by problem-specific routines: successor function, heuristic function, and goal test
The search algorithm itself had no knowledge about the internals of the states
We now look at CSPs, whose states and goal tests conform to a standard, structured, and simple representation
Consequence: search algorithms can use general-purpose rather than problem-specific heuristics

Representation of a CSP
A CSP is defined by a set of variables X1, X2, ..., Xn and a set of constraints C1, C2, ..., Cm
Each variable Xi has a domain Di of possible values
Each constraint specifies the allowable combinations of values for some subset of variables
An assignment not violating any constraints is called consistent (or legal)
A consistent, complete assignment (involving every variable) is a solution
Some CSPs also require a solution that maximizes an objective function

Example: Map-coloring
[Figure: map of Australia with Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria and Tasmania]
Variables: WA, NT, Q, NSW, V, SA, T
Domains: Di = {red, green, blue}
Constraints: adjacent regions must have different colors

Example: Map-coloring(2)
Formal description of constraints: WA ≠ NT, WA ≠ SA, NT ≠ SA, NT ≠ Q, ...
Or, depending on the description language allowed:
  (WA,NT) ∈ { (red,green),(red,blue),(green,red), ... }
  (WA,SA) ∈ { (red,green),(red,blue),(green,red), ... }
  (NT,SA) ∈ { (red,green),(red,blue),(green,red), ... }
We now have to find a complete assignment that does not violate any constraint

Example: Map-coloring(3)
Possible solution: { WA=red, NT=green, Q=red, NSW=green, V=red, SA=blue, T=green }
[Figure: map of Australia colored according to this solution]

Constraint Graph
Binary CSP: each constraint relates at most two variables
Constraint graph: nodes are variables, edges are constraints
[Figure: constraint graph with nodes WA, NT, SA, Q, NSW, V and T]

Constraint Graph(2)
General-purpose CSP algorithms use the graph structure
A relatively simple way to describe a CSP
Can speed up search, e.g. Tasmania is an independent subproblem

Varieties of Constraints
Unary constraints involve a single variable
  E.g. SA ≠ green
Binary constraints involve a pair of variables
  E.g. SA ≠ WA
Higher-order constraints involve three or more variables
  E.g. cryptarithmetic column constraints (example in just a moment)
Preferences (soft constraints)
  E.g. red is better than green
  Often represented by a cost for each variable assignment; such problems are called constrained optimization problems

Example: Cryptarithmetic
    T W O
  + T W O
  -------
  F O U R
[Figure: constraint hypergraph over the letters, with the auxiliary carry variables X1, X2, X3]
Variables: F, T, U, W, R, O, X1, X2, X3
Domains: { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
Constraints: alldifferent(F, T, U, W, R, O)
  O + O = R + 10·X1, W + W + X1 = U + 10·X2, ...

Varieties of CSPs
Discrete variables
  Finite domains: size d means O(d^n) complete assignments with n variables
    E.g. Boolean CSPs: Boolean satisfiability (NP-complete)
  Infinite domains (integers, strings, etc.)
    E.g. job scheduling, variables are start/end days for each job
    Formulated via constraint language: e.g. StartJob1 + 5 ≤ StartJob3
    Linear constraints solvable, nonlinear undecidable (in general case)

Varieties of CSPs(2)
Continuous variables
  E.g. start/end times for Hubble Telescope observations
  Linear constraints solvable in polynomial time by Linear Programming methods
  Very common in the real world, widely studied in Operations Research

Real-world CSPs
Assignment problems, e.g. who teaches what class?
Timetabling problems, e.g. which train arrives and leaves when and where?
Transportation scheduling, e.g. which vehicle leaves when and where and carries which goods with it?
Usually very hard to solve

Standard Search Formulation
CSPs can be formulated as standard search problems:
  Initial state: the empty assignment
  Successor function: assign a value to an unassigned variable in a consistent way
  Goal test: current assignment is complete
  Path cost: constant cost (e.g. 1) for each step
Every solution is a complete assignment
Assuming n variables, it appears at depth n
That means the search tree extends only to depth n
We could use standard search algorithms: BFS, DFS, IDS

Standard Search Formulation(2)
Path to solution is irrelevant
Alternative: use complete-state formulation
Every state is a complete assignment, which may or may not satisfy the constraints
Then apply local search methods to this state

Standard Search Formulation(3)
Why is naive application of search algorithms a bad idea?
Let's have a look at breadth-first search:
  Branching factor at top level is n·d: any of d values can be assigned to any of the n variables
  On the next level it is (n-1)·d, and so on for n levels
  This generates a tree with n!·d^n leaves, although there are only d^n possible complete assignments!
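The gap between the two counts is easy to check numerically. The sketch below evaluates both formulas for the map-coloring example (n = 7 variables, d = 3 colors); the function names are illustrative only.

```python
from math import factorial

def naive_leaves(n, d):
    """Leaves of the naive search tree: (n*d) * ((n-1)*d) * ... * (1*d) = n! * d^n."""
    return factorial(n) * d ** n

def distinct_assignments(n, d):
    """Number of distinct complete assignments."""
    return d ** n

leaves = naive_leaves(7, 3)            # map coloring: 7 regions, 3 colors
complete = distinct_assignments(7, 3)
```

For map coloring this gives 11,022,480 leaves for only 2,187 distinct complete assignments, a factor of 7! = 5,040.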

Improving the Search
We have ignored a crucial property of CSPs: commutativity
Order of the application of actions has no effect on the outcome
When assigning values to variables, the order of assignment doesn't matter
CSP search algorithms generate successors by considering assignment of only a single variable at each node:
  E.g. at the root node for coloring the map, we choose between SA=red, SA=green, SA=blue, but not between SA=red, WA=blue, NT=green, NSW=red, ...

Improving the Search(2)
With this restriction we get the number of leaves back down to d^n
CSP search is done via backtracking search
Depth-first search that chooses values for one variable at a time
If no legal values are left (due to constraints), it backtracks

Backtracking Search
function BACKTRACK(csp)
  return RECURSIVE-BACKTRACK({},csp)

function RECURSIVE-BACKTRACK(assignment,csp)
  if assignment is complete then return assignment
  var = SELECT-UNASSIGNED(VARIABLES(csp),assignment,csp)
  for each value in DOMAIN(var,assignment,csp) do
    if value is consistent with CONSTRAINTS(csp) then
      add {var = value} to assignment
      result = RECURSIVE-BACKTRACK(assignment,csp)
      if result <> failure then return result
      remove {var = value} from assignment
  end
  return failure
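A minimal sketch of RECURSIVE-BACKTRACK on the map-coloring problem. The adjacency table `NEIGHBORS` encodes the "adjacent regions must differ" constraints; choosing the first unassigned variable and trying colors in a fixed order are simplifying assumptions (no MRV or other heuristics).

```python
# Adjacent regions of the map-coloring example.
NEIGHBORS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "T": [],
}
COLORS = ["red", "green", "blue"]

def consistent(var, value, assignment):
    """A value is consistent if no already-assigned neighbor has the same color."""
    return all(assignment.get(n) != value for n in NEIGHBORS[var])

def recursive_backtrack(assignment):
    if len(assignment) == len(NEIGHBORS):
        return assignment                                     # complete assignment
    var = next(v for v in NEIGHBORS if v not in assignment)   # first unassigned variable
    for value in COLORS:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = recursive_backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]                               # undo and try the next value
    return None                                               # failure: backtrack

solution = recursive_backtrack({})
```

Each node of the search assigns exactly one variable, which is what keeps the number of leaves at d^n.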

Backtracking Example
[Figures, slides (1) to (4): successive steps of backtracking search on the map-coloring problem]

Improving Efficiency
Backtracking can be improved in terms of efficiency by looking at:
  Which variable should be assigned next?
  In what order should its values be tried?
  Can we detect inevitable failure early?
  Can we take advantage of the problem structure?

Minimum Remaining Values (MRV)
Choose the variable with the fewest legal values

Degree Heuristic
If there are ties among MRVs, use the degree heuristic
Choose the variable with the most constraints on remaining variables

Least Constraining Value
Given a variable, choose the least constraining value
The one that rules out the fewest values in the remaining variables
[Figure: two alternative colorings, one that allows 1 value for SA and one that allows 0 values for SA]

Forward Checking
Idea: keep track of remaining legal values for unassigned variables
Terminate branch of search when any variable has no legal values
[Figure: table of remaining values for WA, NT, Q, NSW, V, SA, T]

Forward Checking(2) to (4)
[Figures: the table of remaining legal values for the unassigned variables after each successive assignment, until a variable has no legal values left]

Constraint Propagation
Forward checking propagates information from assigned to unassigned variables
Doesn't provide early detection for all failures
NT and SA cannot both be blue
[Figure: map and table of remaining values in which only blue is left for both NT and SA]

Constraint Propagation(2)
Constraint propagation repeatedly enforces constraints locally
Forward checking propagates from WA and Q onto NT and SA
We want to continue by propagating onto the constraint between NT and SA
And we want to do this efficiently: reducing the search space is no good if it takes longer than simple search

Arc Consistency
Method of constraint propagation that is stronger than forward checking
Arc refers to a (directed) edge in the constraint graph
E.g. there is an arc from SA to NSW
Arc is consistent iff for every value x of SA there is some value y of NSW that is consistent with x

Arc Consistency(2)
Simplest form of propagation makes each arc consistent
If we find a value x for which no consistent y exists, we delete x
E.g. arc from NSW to SA

Arc Consistency(3)
If a variable loses a value, neighbors of this variable need to be rechecked
E.g. arc from V to NSW

Arc Consistency(4)
Arc consistency detects failure earlier than forward checking
E.g. arc from SA to NT
[Figures: map and table of remaining values for each of the three steps]

Arc Consistency Algorithm
function AC-3(csp)
  queue = all arcs in csp
  while queue is not empty do
    (Xi,Xj) = REMOVE-FIRST(queue)
    if REMOVE-INCONSISTENT(Xi,Xj) then
      for each Xk in NEIGHBORS(Xi) do
        add (Xk,Xi) to queue
      end
  end

Arc Consistency Algorithm(2)
function REMOVE-INCONSISTENT(Xi,Xj)
  removed = false
  for each x in DOMAIN(Xi) do
    if no value y in DOMAIN(Xj) allows (x,y) to satisfy constraint between Xi and Xj then
      delete x from DOMAIN(Xi)
      removed = true
  end
  return removed
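A minimal sketch of AC-3 for "must differ" constraints, run on the mainland part of the map-coloring problem. Two details go beyond the pseudocode and are assumptions of this sketch: the function reports failure as soon as a domain becomes empty, and the arc back to Xj is not re-queued.

```python
from collections import deque

def remove_inconsistent(domains, xi, xj):
    """Delete every value of xi that has no differing partner value in xj."""
    removed = False
    for x in list(domains[xi]):
        if not any(x != y for y in domains[xj]):
            domains[xi].remove(x)
            removed = True
    return removed

def ac3(domains, neighbors):
    """Make all arcs consistent; returns False if some domain becomes empty."""
    queue = deque((xi, xj) for xi in neighbors for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if remove_inconsistent(domains, xi, xj):
            if not domains[xi]:
                return False
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))   # neighbors of xi must be rechecked
    return True

NEIGHBORS = {"WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
             "SA": ["WA", "NT", "Q"], "Q": ["NT", "SA"]}
ALL = {"red", "green", "blue"}

# After assigning WA=red and Q=green, NT and SA would both have to be blue.
after_assign = {"WA": {"red"}, "Q": {"green"}, "NT": set(ALL), "SA": set(ALL)}
fails = not ac3(after_assign, NEIGHBORS)

untouched = {v: set(ALL) for v in NEIGHBORS}
ok = ac3(untouched, NEIGHBORS)
```

The first call detects the failure that forward checking alone would miss; the second call leaves the full domains unchanged because every arc is already consistent.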

Problem Structure
Structure of the constraint graph can often be exploited
We are going to look at two techniques for
  independent subproblems
  tree-structured problems

Independent Subproblems
[Figure: constraint graph of the map-coloring problem with nodes WA, NT, SA, Q, NSW, V and T]
Tasmania and mainland are independent subproblems
Identifiable as connected components of the constraint graph

Independent Subproblems(2)
Each of these (smaller) subproblems can be solved independently of the others
Performance gains can be quite high:
  Suppose each subproblem has c variables out of n total
  Worst-case solution cost is (n/c)·d^c
  Compare with d^n for the whole problem:
  Assume n = 80, d = 2, c = 20 and a processing speed of 10 million nodes/sec
  Whole problem: 2^80 nodes ≈ 4 billion years
  All subproblems: 4·2^20 nodes ≈ 0.4 seconds

Tree-structured CSPs
[Figure: tree-structured constraint graph with variables A, B, C, D, E, F]
If the constraint graph has no loops, the CSP can be solved in O(n·d^2) time (instead of worst-case O(d^n))

Tree-structured CSPs(2)
Choose a variable as root, order variables from root to leaves (parent precedes all children)
[Figure: the tree over A, B, C, D, E, F and the corresponding linear ordering of the variables]
For j from n down to 2, apply REMOVE-INCONSISTENT(Parent(Xj),Xj)
For j from 1 to n, assign Xj consistently with Parent(Xj)

Nearly Tree-Structured CSPs
Conditioning: instantiate a variable, prune its neighbors' domains
[Figure: constraint graph of the map-coloring problem before and after instantiating one variable]
Cutset conditioning: instantiate (in all ways) a set of variables such that the remaining graph is a tree

Local Search for CSPs
Hill-climbing and simulated annealing work with complete states
To apply them to CSPs:
  Allow states with violated constraints
  Successor function reassigns variable values
Variable selection: randomly
Better: use min-conflicts heuristic
  Change to the value that leads to the fewest violated constraints
  E.g. do hill-climbing with h(n) = total number of violated constraints
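A minimal sketch of local search with the min-conflicts heuristic on n-queens. The encoding (one queen per column, `state[c]` is its row), the random choice among conflicted columns, the random tie-breaking and the step limit are assumptions of this illustration.

```python
import random

def conflicts(state, col, row):
    """Number of queens that would attack a queen placed at (row, col)."""
    return sum(1 for c, r in enumerate(state)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, rng, max_steps=100_000):
    state = [rng.randrange(n) for _ in range(n)]      # random complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(state, c, state[c]) > 0]
        if not conflicted:
            return state                              # no violated constraints left
        col = rng.choice(conflicted)                  # pick a conflicted variable at random
        counts = [conflicts(state, col, r) for r in range(n)]
        best = min(counts)
        # reassign it to a value with the fewest conflicts (ties broken at random)
        state[col] = rng.choice([r for r in range(n) if counts[r] == best])
    return None
```

The search works on complete states throughout; the step limit only guards against the rare case of wandering on a plateau for a long time.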

Example: 4-queens
States: 4 queens in 4 columns (4^4 = 256 states)
Successor function: move queen up or down in column
Goal test: no attacking queens
Evaluation function: h(n) = number of attacks
[Figure: three successive boards with h = 5, h = 2 and h = 0]

Performance
Given a random initial state, can solve n-queens in almost constant time for large n with high probability
In general very good for any randomly-generated CSP
Exceptions are problems in a narrow range of the ratio R = (number of constraints) / (number of variables)
[Figure: CPU time plotted against R, with a peak at the critical ratio]

Chapter Summary
CSPs are a special kind of problem
  States defined by values of a fixed set of variables
  Goal test defined by constraints on variable values
Backtracking = depth-first search with one variable assignment per node
Various techniques to improve performance
Alternative: local search with min-conflicts heuristic
  Usually efficient in practice

Chapter 7
Adversarial Search

Outline
Games
Perfect play
  Minimax decisions
  α-β pruning
Resource limits and approximate evaluation
Games of chance
Games of imperfect information

Motivation
Up to now search problems were hard, but nobody was working against us
How do we plan if other agents are planning against us?
Games are an ideal domain for exploring capabilities of AI in terms of adversarial search:
  The rules are fixed
  The scope of the problem is constrained
  The interactions between players are well defined
  Yet, problems are far from simple
Can be seen as the Formula 1 of AI research

Games vs. Searching
In games we have an unpredictable opponent
Solution is a strategy to find an answer to each move of our opponent
Due to time limits, perfect play is often not possible, need to approximate

Types of Games
                        deterministic                   random element
perfect information     chess, checkers, go, othello    backgammon, monopoly
imperfect information   battleships, stratego           bridge, poker, scrabble

Representation
We'll first consider games with two players, called MAX and MIN
A game can be formally defined as a kind of search problem
  Initial state: includes board position and identifies player to move
  Successor function: returns a list of legal moves and resulting states
  Terminal test: determines when the game is over (terminal states are states where the game has ended)
  Utility function: gives numeric values for terminal states (e.g. win = +1, loss = -1, draw = 0)

Representation(2)
Game tree for Tic-Tac-Toe
[Figure: partial game tree with alternating MAX (X) and MIN (O) levels down to terminal states with utilities -1, 0 and +1]

Minimax
In normal search, the optimal solution is a sequence of moves leading to a goal (each terminal state is a win)
In a game, however, opponent MIN has something to say about it
We'll first look at deterministic, perfect-information games
MAX must find a contingent strategy:
  Specify MAX's move in the initial state
  Then MAX's moves in the states resulting from every possible response by MIN
  Then MAX's moves replying to MIN's responses to those moves, and so on

Minimax(2)
Idea: choose the move to the position with the highest minimax value
Best achievable payoff against best play
[Figure: two-ply game tree; MAX root with minimax value 3 and moves A1, A2, A3 leading to MIN nodes with values 3, 2 and 2, whose moves A11 to A33 lead to the terminal states]

Minimax(3)
function MINIMAX-DECISION(state)
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(a,state))

function MIN-VALUE(state)
  if TERMINAL-TEST(state) then return UTILITY(state)
  v = infinity
  for a,s in SUCCESSORS(state) do
    v = MIN(v, MAX-VALUE(s))
  end
  return v

function MAX-VALUE(state)
  if TERMINAL-TEST(state) then return UTILITY(state)
  v = - infinity
  for a,s in SUCCESSORS(state) do
    v = MAX(v, MIN-VALUE(s))
  end
  return v
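The minimax pseudocode can be sketched in Python as follows. This is only an illustration under a simplifying assumption that is not in the slides: a terminal state is a number (its utility for MAX) and any other state is a list of successor states. The leaf values are chosen for illustration so that the MIN nodes evaluate to 3, 2 and 2.

```python
import math

# Hypothetical toy representation (an assumption, not from the slides):
# a terminal state is a number, any other state is a list of successors.

def is_terminal(state):
    return not isinstance(state, list)

def max_value(state):
    if is_terminal(state):
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_value(s))
    return v

def min_value(state):
    if is_terminal(state):
        return state
    v = math.inf
    for s in state:
        v = min(v, max_value(s))
    return v

def minimax_decision(state):
    """Index of the move at a MAX node whose MIN-VALUE is highest."""
    return max(range(len(state)), key=lambda i: min_value(state[i]))

# Two-ply tree: three MIN nodes with illustrative leaf utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree), max_value(tree))  # prints: 0 3
```

The first move is chosen because its MIN node guarantees 3, whereas the other two only guarantee 2.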

How does Minimax fare?
Completeness: yes (if the tree is finite; even for some infinite trees, a finite strategy can exist)
Optimality: yes (against an optimal opponent)
Time complexity: $O(b^m)$, with branching factor b and maximum depth m
  For chess, $b \approx 35$, $m \approx 100$: infeasible
Space complexity: $O(bm)$ (depth-first search)
But do we need to explore every path?

α-β Pruning
Problem with minimax: the number of game states examined is exponential in the number of moves
We can't eliminate the exponent, but we can effectively cut it in half
α-β pruning cuts off branches that cannot possibly influence the final decision

α-β Pruning(2) to α-β Pruning(4)
[Figures: first steps of α-β pruning on an example game tree with a MAX root and three MIN nodes; the surviving labels are the values 12 and 14]

α-β Pruning(5) and α-β Pruning(6)
[Figures: final steps of α-β pruning on the example game tree; the surviving labels include the values 3, 5, 2, 12 and 14]

Why Is It Called α-β?
[Figure: a path through alternating MAX and MIN levels down to a node with value v]
α is the best value (to MAX) found so far (in another branch)
If v is worse than α, MAX will avoid it (cut off this branch)
β is the best value (to MIN) found so far

α-β Algorithm
function ALPHA-BETA-DECISION(state)
  return the a in ACTIONS(state) maximizing MIN-VALUE(RESULT(a,state),-infinity,infinity)

function MIN-VALUE(state,alpha,beta)
  if TERMINAL-TEST(state) then return UTILITY(state)
  v = infinity
  for a,s in SUCCESSORS(state) do
    v = MIN(v, MAX-VALUE(s,alpha,beta))
    if v <= alpha then return v
    beta = MIN(beta,v)
  end
  return v

function MAX-VALUE(state,alpha,beta)
  same as MIN-VALUE but with roles of alpha and beta reversed
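A Python sketch of the α-β functions, written out for both MIN and MAX. It assumes a toy representation that is not in the slides (a terminal state is a number, any other state is a list of successors) and records the leaves it evaluates in a hypothetical `visited` list so the pruning becomes visible. Unlike ALPHA-BETA-DECISION, this sketch starts from `max_value` at the root, so α is carried over between the root's moves.

```python
import math

def is_terminal(state):
    return not isinstance(state, list)

def max_value(state, alpha, beta, visited):
    if is_terminal(state):
        visited.append(state)
        return state
    v = -math.inf
    for s in state:
        v = max(v, min_value(s, alpha, beta, visited))
        if v >= beta:          # MIN above would never allow this branch
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, visited):
    if is_terminal(state):
        visited.append(state)
        return state
    v = math.inf
    for s in state:
        v = min(v, max_value(s, alpha, beta, visited))
        if v <= alpha:         # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v

# Illustrative leaf utilities (an assumption, not taken from the slides).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = max_value(tree, -math.inf, math.inf, visited)
print(value, visited)  # prints: 3 [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 are never evaluated: once the second MIN node has seen the 2, it cannot beat the 3 that MAX already has.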

Properties of α-β Pruning
Pruning does not affect the final result
Good move ordering improves the effectiveness of pruning
With perfect ordering, time complexity = $O(b^{m/2})$
Unfortunately, $35^{50}$ is still infeasible

Resource Limits
Standard approach:
  Use CUTOFF-TEST instead of TERMINAL-TEST, e.g. a depth limit
  Use EVAL instead of UTILITY, i.e. an evaluation function that estimates the desirability of a position
State of the art: Deep Blue, up to $2 \times 10^8$ nodes/sec
Assume we have 300 seconds
We can go through $6 \times 10^{10}$ nodes $\approx 35^{14/2}$, so α-β reaches depth 14
The evaluation function is the crucial element in quality of play
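The depth estimate on the Resource Limits slide can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function name `reachable_depth` is made up for illustration, and it assumes the idealized node counts $b^m$ for plain minimax and $b^{m/2}$ for α-β with perfect move ordering.

```python
import math

def reachable_depth(node_budget, b, exponent_factor):
    """Largest depth m with b ** (exponent_factor * m) <= node_budget (as a real number)."""
    return math.log(node_budget) / (exponent_factor * math.log(b))

nodes_per_second = 2e8      # Deep Blue figure from the slide
seconds = 300
budget = nodes_per_second * seconds     # 6e10 nodes

depth_minimax = reachable_depth(budget, 35, 1.0)      # b^m nodes
depth_alpha_beta = reachable_depth(budget, 35, 0.5)   # b^(m/2) nodes

print(round(depth_minimax, 2), round(depth_alpha_beta, 2))
```

Under these assumptions plain minimax gets to roughly depth 7 in the same time in which α-β gets to roughly depth 14.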

Evaluation Functions
For chess, typically a linear weighted sum of features:
$\mathrm{EVAL}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s)$
E.g., $f_1(s)$ = (# white queens - # black queens) with $w_1 = 9$
[Figure: two chess positions, captioned "Black to move, White slightly better" and "White to move, Black winning"]

Evaluation Functions(2)
Exact values don't matter, only the order matters
Behavior is preserved under any monotonic transformation of EVAL
[Figure: MAX/MIN game trees with rescaled leaf values; the values 20 and 400 survive in this extraction]

Deterministic Games in Practice
Checkers: Chinook defeats human world champion Marion Tinsley in 1994
Chess: Deep Blue defeats human world champion Garry Kasparov in 1997
Othello: human champions refuse to play computers, which are too good
Go: human champions refuse to play computers, which are too bad (in Go, b > 300)

Nondeterministic Games
In nondeterministic games, chance is introduced by throwing dice, shuffling cards, etc.
MAX knows his own moves, but does not know which moves will be available to MIN
We have to add chance nodes in addition to MAX and MIN nodes
The branches leading from each chance node denote possible events (each with a probability)

Example: Coin Flip
[Figure: game tree with a MAX root, two CHANCE nodes whose branches each have probability 0.5, and MIN nodes below them]

Algorithm
EXPECTIMINIMAX gives perfect play
Works like MINIMAX with the addition of chance nodes:
  ...
  if state is a MAX node then
    return the highest EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
  if state is a MIN node then
    return the lowest EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
  if state is a chance node then
    return the weighted average EXPECTIMINIMAX-VALUE of SUCCESSORS(state)
  ...
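The three cases of EXPECTIMINIMAX can be sketched in Python. The node encoding is an assumption made for this example: a number is a terminal utility, and any other node is a pair `(kind, children)`, where the children of a chance node are `(probability, node)` pairs. The leaf values of the coin-flip tree are illustrative.

```python
def expectiminimax(node):
    """Value of a node in a game tree with MAX, MIN and chance nodes."""
    if isinstance(node, (int, float)):
        return node                      # terminal utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # weighted average over the possible events
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# MAX chooses between two coin flips; after each flip MIN moves.
coin_flip = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])
print(expectiminimax(coin_flip))  # prints: 3.0
```

The first chance node is worth 0.5 × 2 + 0.5 × 4 = 3 and the second 0.5 × 0 + 0.5 × (-2) = -1, so MAX prefers the first move.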

Evaluation Functions
Exact values do matter here
Behavior is preserved only by a positive linear transformation of EVAL
Hence, EVAL should be proportional to the expected payoff
[Figure: two MAX/DICE/MIN trees with outcome probabilities .9 and .1. With leaf values 2, 3, 1, 4 the dice nodes evaluate to 2.1 and 1.3; with the order-preserving rescaling 20, 30, 1, 400 they evaluate to 21 and 40.9, so the preferred move changes]

Nondeterministic Games in Practice
Dice rolls increase b: 36 ways to roll two dice (21 of them are distinct)
Backgammon: $\approx 20$ legal moves; depth 4 gives $20 \times (21 \times 20)^3 \approx 10^9$ nodes
As depth increases, the probability of reaching a given node shrinks
  The value of lookahead is diminished
α-β pruning is less effective
TDGAMMON uses depth-2 search + a very good EVAL: world-champion level

Games of Imperfect Information
Example: card games, where the opponent's initial cards are unknown
Could be seen as a game where all the dice are rolled at the beginning
Unfortunately, this is not quite right...
Let's look at an example: two players (MAX and MIN) playing four-card hands of bridge with all cards showing

Example
MAX's hand: ♥6 ♦6 ♣9 ♣8
MIN's hand: ♥4 ♠2 ♣10 ♣5
MAX leading the ♣9 is an optimal play (as is leading any other card in this case)
  MAX will get two tricks against optimal play by MIN
  MIN will get two tricks (with ♠2 and ♣10)
Replacing MIN's hand with ♦4 ♠2 ♣10 ♣5 does not make a difference
This can be shown with a suitable variant of minimax

Example(2)
Now let's hide one of MIN's cards
MAX does not know if MIN has the ♥4 or the ♦4
One could argue: leading the ♣9 is optimal against the first hand and against the second hand; as MIN has one of these hands, it is still optimal
But: MIN takes the trick with the ♣10 and leads the ♠2
  MAX has to discard the ♥6 or the ♦6
  If the wrong card is discarded, MAX will get only one trick

Example(3)
MAX is using what we might call "averaging over clairvoyance":
  Compute the minimax value of each action for each possible deal of the cards
  Then compute the expected value over all deals (using the probability of each deal)
If you think this is reasonable, consider the following

Story Example
Day 1: Road A leads to a heap of gold; Road B leads to a fork: turn left and find a mound of jewels, turn right and get run over by a bus
Day 2: Road A leads to a heap of gold; Road B leads to a fork: turn left and get run over by a bus, turn right and find a mound of jewels
Day 3: Road A leads to a heap of gold; Road B leads to a fork: guess correctly and find a mound of jewels, guess incorrectly and get run over by a bus
Choosing Road B on the first two days is just as optimal as choosing Road A
Would you choose Road B on the third day?

Proper analysis
With partial observability, the intuition that the value of an action is the average of its values in all states is wrong
The value of an action depends on the information state (or belief state) the agent is in
The correct strategy is to generate and search a tree of information states
This leads to rational behaviors such as:
  Acting to obtain information
  Signaling to one's partner
  Acting randomly to minimize information disclosure

Chapter Summary
Games illustrate several important points about AI:
  Perfection is unattainable, so we need to approximate
  Uncertainty constrains the assignment of values to states
  Optimal decisions depend on the information state, not the real state

Chapter 8
Neural Networks

Outline
Brains
Neural networks
  Feed-forward networks
    Single-layer networks
    Multi-layer networks
  Recurrent networks
    Elman networks
Learning
  Supervised Learning
  Unsupervised Learning
  Reinforcement Learning

Brains
A neuron is a brain cell whose function is to collect and process electrical signals
[Figure: anatomy of a neuron, labeled: cell body (soma), nucleus, dendrites, axon, axonal arborization, synapses, axon from another cell]

Brains(2)
The brain's information-processing capacity is thought to emerge primarily from networks of neurons
There are approx. $10^{11}$ neurons in the human brain, connected via approx. $10^{14}$ synapses
1 ms to 10 ms cycle time
Some of the earliest AI work aimed to create artificial neural networks

Artificial Neuron
McCulloch and Pitts devised a simple mathematical model of a neuron
It is a gross oversimplification of real neurons
Its purpose was to develop an understanding of what networks of simple units can do
[Figure: a unit i. Input links carry activations $a_j$ with weights $W_{j,i}$, plus a fixed bias input $a_0 = -1$ with bias weight $W_{0,i}$; the input function computes $in_i$, the activation function $g$ produces the output $a_i = g(in_i)$, which is sent along the output links]

Artificial Neuron(2)
Each unit i first computes a weighted sum of its inputs:
$in_i = \sum_{j=0}^{n} W_{j,i} a_j$
Then it applies an activation function g to derive the output:
$a_i = g(in_i) = g\left(\sum_{j=0}^{n} W_{j,i} a_j\right)$

Activation Function
Activation function g is designed to meet two desiderata:
  The unit should be active (near 1) when the right inputs are given and inactive (near 0) when the wrong inputs are given
  The activation needs to be nonlinear, otherwise the entire neural network collapses into a simple linear function
Two typical activation functions are:
  Threshold function (or step function)
  Sigmoid function
In general, activation functions are monotonically increasing

Activation Function(2)
[Figure: plots of $g(in_i)$ against $in_i$, each rising to +1: (a) threshold function, (b) sigmoid function]
(a) is the threshold function: $g(x) = 1$ for $x > 0$, $g(x) = 0$ otherwise
(b) is the sigmoid function: $g(x) = 1/(1 + e^{-x})$
Usually, the bias weight $W_{0,i}$ is used to move the threshold location: $g(in_i) = g(x - W_{0,i})$

What Can We Do?
With single neurons, Boolean functions can be implemented:
  AND: $W_0 = 1.5$, $W_1 = 1$, $W_2 = 1$
  OR: $W_0 = 0.5$, $W_1 = 1$, $W_2 = 1$
  NOT: $W_0 = -0.5$, $W_1 = -1$
Using neurons, we can build a network to compute any Boolean function of the inputs
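The Boolean units can be tried out directly. The sketch below assumes the threshold activation function and a fixed bias input $a_0 = -1$, so that the bias weight $W_0$ acts as the threshold; the signs used for the NOT unit ($W_0 = -0.5$, $W_1 = -1$) are part of that assumption. The XOR network at the end is an added illustration of combining units, not something taken from the slides.

```python
def threshold(x):
    return 1 if x > 0 else 0

def unit(weights, inputs):
    """Threshold unit; weights[0] is the bias weight W0, paired with a0 = -1."""
    activations = [-1] + list(inputs)
    return threshold(sum(w * a for w, a in zip(weights, activations)))

def AND(x1, x2):
    return unit([1.5, 1, 1], [x1, x2])

def OR(x1, x2):
    return unit([0.5, 1, 1], [x1, x2])

def NOT(x1):
    return unit([-0.5, -1], [x1])

def XOR(x1, x2):
    # A single unit cannot do this, but a small network can:
    # (x1 OR x2) AND NOT (x1 AND x2)
    return AND(OR(x1, x2), NOT(AND(x1, x2)))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2), XOR(x1, x2))
```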

Network Structures
Two main categories of neural network structures:
  Feed-forward networks
    Represent a function of their current input
    No internal state other than the weights
  (Cyclic or) recurrent networks
    Feed outputs back into inputs
    Dynamical system (may reach a stable state, exhibit oscillations or even chaotic behavior)
    Can support short-term memory
    This makes them more interesting, but also harder to understand

Feed-forward Example
[Figure: input units 1 and 2 connect to hidden units 3 and 4 via weights $W_{1,3}$, $W_{1,4}$, $W_{2,3}$, $W_{2,4}$; units 3 and 4 connect to output unit 5 via $W_{3,5}$ and $W_{4,5}$]
Simple neural network with two inputs, one hidden layer of two units, and one output
Feed-forward networks are usually arranged in layers (each unit receives input only from the immediately preceding layer)

Single-layer Networks
A network with all inputs connected directly to the outputs is called a single-layer neural network (or perceptron network)
Each output unit is independent of the others, so we look at a single output unit
We start by examining the expressiveness of perceptrons
  As already seen, simple Boolean functions are possible
  The majority function (outputs 1 if more than half of the inputs are 1) is also possible: $W_j = 1$, threshold $W_0 = n/2$

Expressiveness
[Figure: single-layer network with input units connected to output units by weights $W_{j,i}$, next to a plot of the perceptron output over $x_1$ and $x_2$]
Output units all operate separately, with no shared weights
Adjusting the weights moves the location, orientation, and steepness of the cliff

Expressiveness(2)
A perceptron represents a linear separator in input space
A threshold perceptron returns 1 iff the weighted sum of its inputs is positive: $\sum_{j=0}^{n} W_j x_j > 0$
Or, interpreting the $W_j$ and $x_j$ as vectors, $W \cdot x > 0$
This defines a hyperplane in the input space; the perceptron returns 1 if the input is on one side of that plane
For this reason, the threshold perceptron is called a linear separator

Expressiveness(3)
Consider a perceptron with a threshold function
It can represent AND, OR, NOT, majority, but e.g. not XOR:
[Figure: the four input points plotted in the $x_1$-$x_2$ plane for (a) $x_1$ and $x_2$, (b) $x_1$ or $x_2$, (c) $x_1$ xor $x_2$; a separating line exists for (a) and (b) but not for (c)]

Multilayer Networks
Layers are usually fully connected
[Figure: input units $I_k$ connect to hidden units $a_j$ via weights $W_{k,j}$; hidden units connect to output units $O_i$ via weights $W_{j,i}$]

Expressiveness
All continuous functions with 2 layers, all functions with 3 layers
[Figure: two surface plots of $h_W(x_1, x_2)$ over $x_1$ and $x_2$, a ridge and a bump]
Combine two opposite-facing threshold functions to make a ridge
Combine two perpendicular ridges to make a bump
Add various bumps to fit any surface

Recurrent Networks
Recurrent neural networks have feedback connections (to store information over time)
The Elman network is a simple one that makes a copy of the hidden layer
This copy is called the context layer
The context layer stores the previous state of the hidden layer

Elman network
[Figure: Elman network with four labeled layers: input units, context layer, hidden units, output units]

Elman Network(2)
The context layer feeds previous network states into the hidden layer
Input vector: $x = (x_1, \dots, x_n, x_{n+1}, \dots, x_{2n})$, where $x_1, \dots, x_n$ are the actual inputs and $x_{n+1}, \dots, x_{2n}$ are the context units
The connection from each hidden unit to its corresponding context unit has weight 1
Context units are fully connected to all hidden units (not necessarily with weight 1)

Learning
For simple Boolean or majority functions it is easy to find appropriate weights
Generally, by adjusting the weights, we change the function that a network represents
That is how learning occurs in neural networks
When we have no prior knowledge about the function except for data, we have to learn the values for $W_j$ from this data

Learning(2)
Three main types of learning (we are looking at the first two):
  Supervised learning
    The network is provided with a data set of input vectors and desired outputs (training set)
    Adjust the weights so that the error between the real output and the desired output is minimized
  Unsupervised learning
    Clusters the training set to discover patterns or features in the input data
  Reinforcement learning
    Reward the network for good performance, penalize it for bad performance

Supervised Learning
Gradient descent is a widely used approach to training (single-layer) networks
Idea: adjust the weights of the network to minimize some measure of the error on the training set
The classical measure of error is the sum of squared errors
The squared error for a single training example with input x and desired output y is
$E = \frac{1}{2} Err^2 = \frac{1}{2} (y - h_W(x))^2$
(where $h_W(x)$ is the output of the perceptron)

Gradient Descent
Depending on the gradient of the error, we increase or decrease the weight
[Figure: error plotted against a weight, with the minimum marked]

Gradient Descent(2)
For calculating the gradient, we need some calculus
We need to determine the partial derivative of E with respect to each weight:
$\frac{\partial E}{\partial W_j} = \frac{1}{2} \frac{\partial Err^2}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} = Err \times \frac{\partial}{\partial W_j} \left( y - g\left( \sum_{j=0}^{n} W_j x_j \right) \right) = -Err \times g'(in) \times x_j$

Gradient Descent(3)
In the gradient descent algorithm, the $W_j$ are updated as follows:
$W_j = W_j + \alpha \times Err \times g'(in) \times x_j$
$\alpha$ is the learning rate: the size of the steps taken in the negative direction of the gradient
$g'$ is the derivative of the activation function (e.g. for the sigmoid, $g' = g(1 - g)$)

Gradient Descent(4)
The complete algorithm runs the training examples through the net one at a time (adjusting the weights slightly)
Each cycle through the examples is called an epoch
Epochs are repeated until some stopping criterion is reached
  E.g. weight changes become very small
Only converges for a linearly separable data set
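The update rule and the epoch loop can be put together in a short Python sketch that trains a single sigmoid unit on the (linearly separable) OR function. The function names, the learning rate, the number of epochs, the random initialization and the fixed bias input $a_0 = -1$ are all assumptions made for this illustration; a fixed number of epochs stands in for the stopping criterion.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def with_bias(x):
    return [-1.0] + list(x)          # fixed bias input a0 = -1 (assumption)

def predict(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, with_bias(x))))

def train_unit(examples, alpha=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n + 1)]   # w[0] is the bias weight
    for _ in range(epochs):                              # one epoch = one pass over the examples
        for x, y in examples:
            xs = with_bias(x)
            out = predict(w, x)
            err = y - out                                # Err = y - h_W(x)
            g_prime = out * (1.0 - out)                  # sigmoid: g' = g (1 - g)
            for j in range(len(w)):
                w[j] += alpha * err * g_prime * xs[j]    # W_j = W_j + alpha * Err * g'(in) * x_j
    return w

or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_unit(or_examples)
print([round(predict(w, x)) for x, _ in or_examples])
```

Rounding the outputs at 0.5 turns the trained unit into a classifier for OR.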

Gradient Descent(5)
Different variants for cycling through training examples:
  Batch: add up all gradient contributions and adjust the weights at the end of the epoch
  Stochastic: select examples randomly
There are many other methods besides gradient descent:
  Widrow-Hoff
  Generalized Delta
  Error-Correction
  ...

Back-propagation Learning
Learning in a multi-layer network is a little different
  Minor difference: we now have several outputs and an output vector $h_W(x)$
  Major difference: the error at the output layer is clear, the error in the hidden layers is unclear
Idea: back-propagate the error from the output layer to the hidden layers

Back-propagation Learning(2)
At the output layer, the weight update is identical to gradient descent
We have multiple output units, so let $Err_i$ be the i-th component of the error vector $y - h_W(x)$, so
$W_{j,i} = W_{j,i} + \alpha \times Err_i \times g'(in_i) \times x_j$, where we abbreviate $\Delta_i = Err_i \times g'(in_i)$
Now we need to connect the output units to the hidden units

Back-propagation Learning(3)
Idea: hidden node j is responsible for some fraction of the error $\Delta_i$ in the nodes to which it connects (with $\Delta_i = Err_i \times g'(in_i)$ at the output units)
The $\Delta_i$ values are divided according to the weights of the connections:
$\Delta_j = g'(in_j) \sum_i W_{j,i} \Delta_i$
Now we can use the same weight-update rule for the hidden nodes:
$W_{k,j} = W_{k,j} + \alpha \times \Delta_j \times x_k$

Back-propagation Learning(4)
The back-propagation process can be summarized as follows:
  Compute the $\Delta$ values for the output units (using the observed error)
  Starting with the output layer, repeat for each layer until the earliest layer is reached:
    Propagate the $\Delta$ values back to the previous layer
    Update the weights in the previous layer

Unsupervised Learning
In supervised learning, a supervisor (or teacher) presents an input pattern and a desired response
Neural networks try to learn the functional mapping between input and output
The objective of unsupervised learning is to discover patterns or features in the input data
This is done with no help or feedback from a teacher
No explicit target outputs are prescribed; however, similar inputs will result in similar outputs
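Returning to the back-propagation slides above, the two update rules can be sketched in Python for a network with one hidden layer and a single output unit. Everything concrete here is an assumption made for illustration: the sigmoid activation, the bias inputs of -1, four hidden units, the learning rate, the random seed and the XOR training set. The sketch only reports the total squared error before and after training; how well it ends up fitting XOR depends on the initial weights.

```python
import math
import random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W_hidden, W_out):
    xs = [-1.0] + list(x)                                  # inputs plus bias input
    hidden = [g(sum(w * v for w, v in zip(row, xs))) for row in W_hidden]
    hs = [-1.0] + hidden                                   # hidden activations plus bias input
    out = g(sum(w * v for w, v in zip(W_out, hs)))
    return xs, hs, out

def backprop_step(x, y, W_hidden, W_out, alpha):
    xs, hs, out = forward(x, W_hidden, W_out)
    delta_out = (y - out) * out * (1.0 - out)              # Delta_i = Err_i * g'(in_i)
    # Delta_j = g'(in_j) * sum_i W_{j,i} Delta_i (a single output unit here)
    delta_hidden = [hs[j + 1] * (1.0 - hs[j + 1]) * W_out[j + 1] * delta_out
                    for j in range(len(W_hidden))]
    for j in range(len(W_out)):                            # output-layer update
        W_out[j] += alpha * delta_out * hs[j]
    for j, row in enumerate(W_hidden):                     # hidden-layer update
        for k in range(len(row)):
            row[k] += alpha * delta_hidden[j] * xs[k]

def total_error(examples, W_hidden, W_out):
    return sum(0.5 * (y - forward(x, W_hidden, W_out)[2]) ** 2 for x, y in examples)

rng = random.Random(1)
n_hidden = 4
W_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
W_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

before = total_error(xor_examples, W_hidden, W_out)
for _ in range(5000):
    for x, y in xor_examples:
        backprop_step(x, y, W_hidden, W_out, alpha=0.5)
after = total_error(xor_examples, W_hidden, W_out)
print(before, after)
```

Note that the hidden deltas are computed from the output weights before those weights are changed.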

Hebbian Learning Rule
Developed in 1949 by Donald Hebb, a neuropsychologist; one of the oldest learning rules
Idea: when neuron A repeatedly participates in firing neuron B, the strength of the action of A onto B increases
Formally speaking, this means
$W_{j,i} = W_{j,i} + \alpha \, g(in_i) \, g(in_j)$
where $\alpha$ is the learning rate and neuron j feeds neuron i

Hebbian Learning Rule(2)
Summary of the Hebbian learning rule:
  1. Initialize all weights (e.g. to small random values)
  2. For each input pattern, compute the corresponding output vector
  3. Adjust the weights as shown on the last slide
  4. Repeat from step 2 until a stopping criterion has been reached

Hebbian Learning Rule(3)
Problem with Hebbian learning: repeated presentation of input patterns leads to unlimited growth in the weight values
Solution: impose a limit on the increase in weight
One type of limit is to introduce a nonlinear forgetting factor $\gamma$:
$W_{j,i} = W_{j,i} + \alpha \, g(in_i) \, g(in_j) - \gamma \, g(in_j) \, W_{j,i}$
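The effect of the forgetting factor can be seen in a small numeric sketch. It makes a strong simplifying assumption: the activations of both neurons are held fixed at 1 instead of being recomputed from the weights, so only the weight dynamics are shown. The names `hebb_update` and `run`, and the values of the learning rate and forgetting factor, are chosen for illustration.

```python
def hebb_update(w, pre, post, alpha=0.1, gamma=0.0):
    """One Hebbian step for the weight from neuron j (pre) to neuron i (post).

    pre and post stand for g(in_j) and g(in_i); gamma is the forgetting factor.
    """
    return w + alpha * post * pre - gamma * pre * w

def run(steps, gamma):
    w = 0.0
    for _ in range(steps):
        # Both neurons are assumed to be fully active on every presentation.
        w = hebb_update(w, pre=1.0, post=1.0, gamma=gamma)
    return w

print(run(100, gamma=0.0))     # plain Hebbian rule: keeps growing with the number of steps
print(run(1000, gamma=0.05))   # with forgetting: settles near alpha / gamma = 2
```

With forgetting, the weight stops changing once the Hebbian increase and the decay term balance, which under the fixed-activation assumption happens at $\alpha / \gamma$.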

Reinforcement Learning
In supervised learning, an input data set and a full set of desired outputs are presented
In reinforcement learning, the feedback is not as elaborate:
  The desired output is not described explicitly
  The learning network only gets feedback on whether the output was a success or not
Learning with a critic (rather than learning with a teacher)
The main objective is to maximize the (expected) reward or reinforcement signal

Reinforcement Learning(2)
General situation:
[Figure: the learner receives sensory input and a reward from the environment and sends actions back to it]

Learning Rule
Neural network reinforcement learning usually requires a multi-layer architecture
An external evaluator is needed to decide whether the network has scored a success or not
Every node in the network receives a scalar reinforcement signal r representing the quality of the output
  r is between 0 and 1, with 0 meaning maximum error and 1 meaning optimal
In contrast to back-propagation (where output nodes receive an error signal, which is propagated backward), here every node receives the same signal

Learning Rule(2)
Mazzoni et al. presented the following weight-update algorithm (based on Hebbian learning):
$W_{j,i} = W_{j,i} + \alpha \left( (g(in_i) - p_i) \, g(in_j) \, r + \lambda \, (1 - g(in_i) - p_i) \, g(in_j) \, (1 - r) \right)$
where $\lambda$ is a constant, and $p_i$ is the probability of neuron i firing
A correct response (large r) will strengthen connections that were active during the response
An incorrect response (small r) will weaken active synapses

Learning Structures
So far, we have only looked at learning weights (given a fixed network structure)
How do we find the best network structure?
  Choosing a network that is too small: it may not be powerful enough to get the task done
  Choosing a network that is too big: problem of overfitting, i.e. the network memorizes examples rather than generalizing

Learning Structures(2)
If we stick to fully connected networks, the only choices are:
  The number of hidden layers
  The number of neurons in each
Usual approach: try several and keep the best
Try to keep it small to avoid overfitting

Learning Structures(3)
Let us now consider networks that are not fully connected
We need some effective search method to weed out connections
One approach is the optimal brain damage algorithm:
  Starts with a fully connected network and removes connections from it
  After a first training run, an information-theoretic approach identifies a selection of connections to be dropped
  The network is retrained and, if performance has not decreased, the process is repeated
It is also possible to remove neurons that are not contributing much to the result

Learning Structures(4)
Several algorithms exist for growing a larger network from a smaller one
The tiling algorithm starts with a single unit that tries its best
Subsequent units are added to take care of examples that the first unit got wrong
The algorithm adds only as many units as are needed to cover all examples

Applications: Speech Recognition
[Figure]

Applications: Handwriting Recognition
400-300-10 unit network: 1.6% error
768-192-30-10 unit LeNet: 0.9% error

Applications: Fraud Detection
Banks are using AI software (including neural networks) to detect fraud
Such systems can detect fraudulent behavior by analyzing transactions and alerting staff
Credit card fraud losses in the UK fell for the first time in nearly a decade in 2003 (by more than 5%, to 402.4m pounds)
Barclays reported that after installing a system in 1997, fraud was reduced by 30% by 2003

Applications: CNC
Neural networks are also used in computer numerically controlled (CNC) machines
E.g. the Siemens SINUMERIK 840D controller for drilling, turning, milling, grinding and special-purpose machines

Applications: Drug Design
Used for testing whether certain anti-inflammatory drugs cause adverse reactions
The rate of these reactions is about 10% (with 1% serious and 0.1% fatal)
A three-layer network trained with back-propagation was used to predict serious reactions
The predicted rate matched the observed rate to within 5%

Chapter Summary
Neural networks are an AI technique modeled on the brain
Single-layer feed-forward networks can represent linearly separable functions
Multi-layer feed-forward networks can represent any function (given enough units)
Recurrent networks can store information over time
There are many different techniques to train networks
Neural networks have been used for hundreds of applications

Chapter 9
Evolutionary Computing

Outline
Introduction to Evolutionary Computing
Genetic algorithms
Evolutionary programming

Introduction
Genetic algorithms were already mentioned when discussing local search algorithms; now we take a closer look
(Biological) evolution is an optimization process with the aim of improving the ability to survive
The characteristics of an individual are contained in his/her chromosomes
After sexual reproduction, the offspring's chromosomes consist of a combination of the parents' chromosomes
The process of natural selection allows fitter individuals to produce more offspring
One expects the offspring to have similar or even better fitness

Introduction(2)
Occasionally mutations occur
  These have a random effect on the chromosomes of an individual
  May improve or worsen the fitness of an individual (or the offspring)
  Introduces some variation into a population
Evolutionary Computing (EC) emulates the process of natural selection in a search procedure

Evolutionary Computing
An evolutionary algorithm (EA) is a stochastic search algorithm comprising:
  An encoding of solutions to a problem in the form of chromosomes
  Initial state: starting population (usually with randomly determined chromosomes)
  Successor function: generating offspring given two parents
  Evaluation function (or fitness function): determining the fitness of an individual
  Selection function: choosing the individuals to reproduce

Genetic Algorithms (GA)
Genetic algorithms model genetic evolution
One of the first EC paradigms to be developed and applied (1975)
The original GAs by Holland had as distinct features:
  Bit string chromosome representation
  Proportional selection
  Crossover as the primary successor function

Example
Solving the 8-queens problem using GAs
The n-th number in the chromosome stands for the position of the queen within the n-th column
[Figure: 8-queens board]
The position shown is encoded as 74258136

Example(2)
Initial population   Fitness   Selection probability   Selected pairs   After crossover   After mutation
24748552             24        31%                     32752411         32748552          32748152
32752411             23        29%                     24748552         24752411          24752411
24415124             20        26%                     32752411         32752124          32252124
32543213             11        14%                     24415124         24415411          24415417
As fitness function we use the number of nonattacking pairs of queens (here the probability of being chosen is proportional to fitness)
Two individuals are chosen randomly (biased by these probabilities) for reproduction
A random crossover point determines which fragments will be exchanged when reproducing

Example(3)
Graphical representation of the crossover of the first two parents (before mutation):
[Figure: 8-queens boards illustrating the crossover]

Algorithm
function GENETIC-ALG(population,FITNESS-FN)
  repeat
    new_pop = empty set
    loop for i from 1 to SIZE(population) do
      x = RAND-SELECT(population,FITNESS-FN)
      y = RAND-SELECT(population,FITNESS-FN)
      child = REPRODUCE(x,y)
      if small random probability then child = MUTATE(child)
      add child to new_pop
    end
    population = new_pop
  until some individual fit enough or enough time has elapsed
  return best individual
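A Python sketch of GENETIC-ALG for the 8-queens example follows. Chromosomes are tuples of row numbers (1 to 8), one per column, and fitness is the number of nonattacking pairs, so a solution scores 28. The helper names, the mutation probability, the population size and the generation limit are assumptions chosen for illustration; selection is fitness-proportional and reproduction uses a single random crossover point.

```python
import random

def parse(digits):
    return tuple(int(d) for d in digits)

def fitness(chromosome):
    """Number of nonattacking pairs of queens (28 for a solution)."""
    n = len(chromosome)
    attacking = 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            r1, r2 = chromosome[c1], chromosome[c2]
            if r1 == r2 or abs(r1 - r2) == c2 - c1:   # same row or same diagonal
                attacking += 1
    return n * (n - 1) // 2 - attacking

def rand_select(population, rng):
    # Probability of being chosen is proportional to fitness.
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]

def reproduce(x, y, rng):
    c = rng.randint(1, len(x) - 1)                    # random crossover point
    return x[:c] + y[c:]

def mutate(child, rng):
    i = rng.randrange(len(child))
    return child[:i] + (rng.randint(1, 8),) + child[i + 1:]

def genetic_alg(population, rng, p_mutate=0.1, max_generations=200):
    for _ in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == 28:                       # some individual is fit enough
            return best
        new_pop = []
        for _ in range(len(population)):
            x = rand_select(population, rng)
            y = rand_select(population, rng)
            child = reproduce(x, y, rng)
            if rng.random() < p_mutate:
                child = mutate(child, rng)
            new_pop.append(child)
        population = new_pop
    return max(population, key=fitness)

initial = [parse(s) for s in ("24748552", "32752411", "24415124", "32543213")]
print([fitness(ind) for ind in initial])              # prints: [24, 23, 20, 11]

rng = random.Random(0)
population = [tuple(rng.randint(1, 8) for _ in range(8)) for _ in range(20)]
best = genetic_alg(population, rng)
print(best, fitness(best))
```

The fitness values of the four initial individuals match the table in the example; whether the run finds a full solution within the generation limit depends on the random seed.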

Variants for Selection
Several different techniques for selecting parents exist:
  Random selection
    Individuals are selected randomly with no reference to fitness at all
  Proportional selection
    The chance of an individual being selected is proportional to its fitness value
  Tournament selection
    A group of k individuals is selected randomly
    These individuals take part in a tournament, i.e. the best individual is selected (done for both parents)

Variants for Selection(2)
Elitism
  The next generation will not consist entirely of new individuals
  A certain number of individuals from the current generation survive into the next one:
    The k best individuals
    k individuals selected using any of the previous techniques

Variants for Crossover
There are also different techniques for doing crossover
One-point crossover: a single position is randomly selected and the substrings after that point are swapped
[Figure: two parent chromosomes and the two children produced by one-point crossover]

Variants for Crossover(2)


Two-point crossover: Two positions are randomly selected and substrings between points are swapped

Variants for Crossover(3)


Uniform crossover: Any random parts are swapped

Mutation
Aim of mutation is to introduce new genetic material
Adding diversity to the population
Usually a small probability for mutations to occur is chosen
  To ensure that good solutions are not distorted too much
Initial large mutation rate that decreases exponentially can also be quite successful
  Similarity to simulated annealing
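
A minimal Python sketch of bit-flip mutation with an exponentially decreasing rate. The bit-string encoding, the function names, and the constants `initial_rate`, `decay`, and `floor` are assumptions chosen for illustration; the slides do not fix any of them.

```python
import math
import random

def mutation_rate(generation, initial_rate=0.5, decay=0.1, floor=0.01):
    """Rate starts large and decays exponentially, never dropping below floor."""
    return max(floor, initial_rate * math.exp(-decay * generation))

def mutate(bits, rate, rng=random):
    """Flip each bit of a 0/1 string independently with probability rate."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )
```

As in simulated annealing, early generations explore widely while later ones only make small changes.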

Assessment of GAs
Genetic algorithms are similar to stochastic local beam searches
  Combine an uphill tendency in searching with random exploration
  Exchange information among parallel search threads
Crossover seems to be the crucial component of GAs

Assessment of GAs(2)
However, crossover conveys no advantage if positions of the chromosomes are randomly permuted initially
Advantage comes from combining large blocks that have evolved independently to perform useful functions
E.g. putting queens in positions 7, 4, and 2 in the first three columns
  They don't attack each other (useful block)
  Could be combined with other useful blocks to construct a solution (e.g. 58136, which is also a useful block)
This raises the level of granularity at which search takes place

Assessment of GAs(3)
Theory of GAs explains this with the idea of a schema
A schema is a substring in which some of the positions can be left unspecified
  E.g. 742***** describes all states in which the first three queens are at positions 7, 4, and 2
Strings that match a schema are called instances of a schema
  E.g. 74213378
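
Checking whether a string is an instance of a schema is a position-by-position comparison. The following Python sketch assumes `*` marks an unspecified position; the function name is illustrative.

```python
def matches(schema, state):
    """True if state is an instance of schema ('*' matches any symbol)."""
    return len(schema) == len(state) and all(
        s == "*" or s == c for s, c in zip(schema, state)
    )
```

For example, `matches("742*****", "74213378")` holds, while a state starting with 247 is not an instance of that schema.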


Assessment of GAs(4)
If the average fitness of a schema's instances is above the population average, the number of this schema's instances will grow over time
This effect is unlikely to be significant if adjacent positions are totally unrelated
  There will be few contiguous blocks that provide consistent benefit
GAs work best when schemas correspond to meaningful components of a solution
Successful use of genetic algorithms requires careful engineering of the representation

Applications
Parametric design of aircraft
  Optimizing aircraft designs
  Task is posed as that of optimizing a list of parameters
Routing in circuit-switched telecommunications networks
  Optimize the routing of telephone networks in order to minimize costs to US West
  Hybridization with other algorithms can lead to better performance

Applications(2)
Robot trajectory generation
  Planning the path that a robot arm takes
  Not only optimizing the length of the path, but also wear and tear on arm (by acceleration and deceleration)
Tuning for sonar information processing
  Training neural networks classifying sonar signals using GAs

Evolutionary Programming
Evolutionary Programming (EP) emphasizes behavioral models and not genetic models
EP is derived from the simulation of adaptive behavior in evolution
EP considers phenotypes (not genotypes)
EP is about finding a set of optimal behaviors; the fitness function measures behavioral error

Finite-State Machines
How do we model behavior?
A popular way to do this is using finite-state machines (FSMs)
An FSM describes a sequence of actions that are taken
Each action depends on the current state of the machine and the input

Finite-State Machines(2)
Formal definition:
$$FSM = (S, s_0, I, O, \rho, \phi)$$
where
  $S$ is a finite set of states
  $s_0$ is the initial state
  $I$ is a finite set of input symbols
  $O$ is a finite set of output symbols
  $\rho$ is the next-state function
  $\phi$ is the next-output function

Finite-State Machines(3)
Example: S = {X, Y, Z}, initial state Z, I = {0, 1}, O = {f, t}

Present state   X  X  Y  Y  Z  Z
Input symbol    0  1  0  1  0  1
Next state      Y  X  Y  Z  Y  X
Next output     f  f  t  t  f  t

[Figure: state diagram of the example FSM, edges labeled input/output]

FSMs and EP
Evolutionary programming was originally developed to evolve finite-state machines
Aim of early applications was to make predictions about the future
Given a sequence of previously observed symbols, evolve a program to predict next symbol

Making Predictions
For the example FSM the following sequence would produce the following output

Sequence   0  0  1  0  0  1  1  ...
Output     f  t  t  f  t  t  t  ...

Interpreting f=0 and t=1, the FSM made just one mistake
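
The example machine and its prediction run can be traced with a short Python sketch. The transition table is copied from the example; the dictionary encoding and the function names are assumptions for illustration, as is reading each output as a prediction of the next input symbol.

```python
# Transition table of the example FSM: (state, input) -> (next state, output)
TRANSITIONS = {
    ("X", "0"): ("Y", "f"),
    ("X", "1"): ("X", "f"),
    ("Y", "0"): ("Y", "t"),
    ("Y", "1"): ("Z", "t"),
    ("Z", "0"): ("Y", "f"),
    ("Z", "1"): ("X", "t"),
}

def run_fsm(inputs, start="Z"):
    """Feed the input symbols to the machine and collect its outputs."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = TRANSITIONS[(state, symbol)]
        outputs.append(out)
    return outputs

def prediction_errors(inputs, outputs):
    """Count wrong next-symbol predictions, reading f as 0 and t as 1."""
    predicted = ["1" if o == "t" else "0" for o in outputs]
    # The output produced at step n is compared with the input at step n + 1.
    return sum(p != nxt for p, nxt in zip(predicted, inputs[1:]))
```

Running `run_fsm("0010011")` reproduces the output row f t t f t t t, and `prediction_errors` counts a single wrong prediction on that sequence.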

Algorithm
function EVOLUTION-PROG(population, FITNESS-FN)
  repeat
    new_pop, tmp_pop = empty set
    loop for i from 1 to SIZE(population) do
      child = SELECT-NTH(i, population)
      child = MUTATE(child)
      add child to tmp_pop
    end
    new_pop = SELECT-FITTEST(SIZE(population), population + tmp_pop, FITNESS-FN)
    population = new_pop
  until some individual fit enough or enough time has elapsed
  return best individual
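
A runnable Python sketch of the same loop, under stated assumptions: higher fitness is better, SELECT-FITTEST is realized as plain truncation selection over parents plus offspring, the stopping rule is a fixed number of generations, and the mutation operator is supplied by the caller. None of these choices are prescribed by the slides.

```python
import random

def evolution_prog(population, fitness_fn, mutate, generations=100, rng=random):
    """Mutation-only evolutionary loop (no crossover)."""
    population = list(population)
    for _ in range(generations):
        # Every parent produces exactly one mutated child.
        offspring = [mutate(ind, rng) for ind in population]
        # Keep the fittest SIZE(population) individuals of parents + offspring.
        pool = sorted(population + offspring, key=fitness_fn, reverse=True)
        population = pool[:len(population)]
    return max(population, key=fitness_fn)
```

As a usage example, evolving real numbers with `fitness_fn = lambda x: -(x - 3) ** 2` and a Gaussian `mutate` drives the best individual towards 3.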

Mutation
In the original EP, there was no crossover, only mutation
One point in the behavioral space was seen as standing for a species, not an individual
However, EP can be combined with GAs

Variants for Selection
The function SELECT-FITTEST can be realized in different ways
  Any selection process used for GAs
  An elitist mechanism picking k best parents, remainder of population from best remaining parents and offspring
  First cull the worst k among parents and offspring, then select best n among rest

Application
Nark: a bug-finding compiler extension developed at Stanford
Static analysis techniques are used to find bugs in software systems
One system, Metacompilation (MC), allows users to encode rules such as
  Don't use freed memory
  Don't call blocking functions with interrupts disabled
To encode a rule, user describes it as a state machine
Source code is used as input to the state machine

Application(2)
Unfortunately, encoding those rules is quite complicated
Nark allows the user to simply give examples of (a class of) bugs
Nark evolves the checker for the rule itself
The complexity of the classes of bugs that Nark is able to find is still somewhat limited

Chapter Summary
In practice, genetic algorithms have had a widespread impact on optimization problems
At present it is not quite clear whether the appeal arises from their performance or their aesthetically pleasing origins
Similar things can be said about evolutionary programming (although it is not as widely used as GAs)

Chapter 10

Swarm Intelligence

Outline
General Introduction to Swarm Intelligence
Particle swarm optimization (PSO)
Ant colony optimization (ACO)

Swarm Intelligence
Simple agents interacting locally with one another and their environment
No central control or data source
(Simple) local interactions often lead to the emergence of (complex) global behavior
Examples found in nature:
  Ant colonies/bee hives
  Bird flocking
  Animal herding
  Bacteria molding
  Fish schooling

Examples

Examples(2)

Examples(3)


Examples(4)

Particle Swarm Optimization
Particle swarm optimization (PSO) is a global optimization technique
A swarm consists of a set of particles
  Each particle represents a potential solution
Assumes that solution can be represented as a point in n-dimensional space
Different starting solutions are plotted in this space
  Each with an initial velocity
  As well as a communication channel to other particles

Particle Swarm Optimization(2)
Particles then move through solution space
After each timestep, particles are evaluated according to some fitness criterion
Particles are accelerated towards particles with better fitness values within their communication group
A large number of members makes technique resilient against getting stuck in local optima

Algorithm
Formally speaking, each particle $i$ has
  a position vector $x_i(t)$ describing its position at time $t$
  a current velocity $v_i(t)$ at time $t$

Algorithm(2)
The position of a particle is changed by adding the velocity vector to the position vector:
$$x_i(t+1) = x_i(t) + v_i(t)$$
Change the velocity vector slightly (random factor $r$) to point into direction of best neighbor:
$$v_i(t+1) = v_i(t) + r \, (x_{\mathrm{best}_i}(t) - x_i(t))$$
As this may lead to a unanimous, unchanging direction, sometimes a random craziness factor is added

Algorithm(3)
function SWARM-OPT(population, FITNESS-FN)
  repeat
    loop for i from 1 to SIZE(population) do
      fit_i = FITNESS-FN(SELECT-NTH(i, population))
    end
    loop for i from 1 to SIZE(population) do
      look for fittest neighbor of particle i
      change velocity vector v_i
      change position x_i
    end
  until some individual fit enough or enough time has elapsed
  return best individual
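
A small Python sketch of the swarm loop. Several choices here are assumptions rather than part of the slides: every particle is attracted to the best position the whole swarm has found so far, particles start at rest, higher fitness is better, no craziness factor is added, and the run stops after a fixed number of steps. Following the step order of the pseudocode, the velocity is changed first and the position is then moved with the new velocity.

```python
import random

def swarm_opt(positions, fitness_fn, steps=100, rng=random):
    """Return the best position seen by the swarm."""
    xs = [list(p) for p in positions]      # copy so the caller's data is untouched
    vs = [[0.0] * len(p) for p in xs]      # assumption: zero initial velocity
    best = list(max(xs, key=fitness_fn))
    for _ in range(steps):
        for x, v in zip(xs, vs):
            r = rng.random()               # random factor r in [0, 1)
            for d in range(len(x)):
                v[d] += r * (best[d] - x[d])   # pull velocity towards the best position
                x[d] += v[d]                   # move the particle
        candidate = max(xs, key=fitness_fn)
        if fitness_fn(candidate) > fitness_fn(best):
            best = list(candidate)
    return best
```

Because `best` is only replaced by a strictly fitter position, the returned solution is never worse than the fittest starting particle.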

Neighborhood
Different neighborhood types have been defined and studied
Star topology
  Every particle can communicate with every other particle
  Each particle is attracted to the best global solution
  Was used in the first version of PSO

Neighborhood(2)
Ring topology
  Every particle communicates with its n immediate neighbors
  [Figure: ring topology for n = 2]
  Hybrids with star topology are possible ($v_i$ is changed towards best neighbor and best overall)

Neighborhood(3)
Wheel topology Only one particle is connected to all others, all other particles are only neighbors to this focal particle Isolates particles from each other, all particles communicate through focal particle Creates a follow the leader effect
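
The three topologies differ only in which particle indices count as neighbors. A Python sketch, assuming particles are numbered 0 to `swarm_size - 1`, that the ring splits its n neighbors evenly between both sides (n even), and that particle 0 is the focal particle of the wheel; the function names are illustrative.

```python
def star_neighbors(i, swarm_size):
    """Star: every other particle is a neighbor."""
    return [j for j in range(swarm_size) if j != i]

def ring_neighbors(i, swarm_size, n=2):
    """Ring: the n immediate neighbors, n/2 on each side, wrapping around."""
    half = n // 2
    return [(i + k) % swarm_size for k in range(-half, half + 1) if k != 0]

def wheel_neighbors(i, swarm_size, focal=0):
    """Wheel: only the focal particle is connected to everyone."""
    return star_neighbors(i, swarm_size) if i == focal else [focal]
```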


Applications
Biochemistry: Improving the fermentation medium for Echinocandin B production
Military: Traveling Salesman Problem for Surveillance Mission
Electrical engineering: Reactive Power and Voltage Control

PSO vs. Evolutionary Computing
Both are optimization paradigms using adaptation of a population of individuals
There are some differences, however:
Memory:
  PSO's velocity vectors reflect all reactions to previous best solutions
  EC does not conserve all information from ancestors
Driving forces for changes:
  Learning from peers in case of PSO
  Genetic recombinations and mutations in case of EC

Ant Colony Optimization (ACO)
PSO is modeled on swarms of individuals having the same behavior and characteristics
When looking at social insects, we have a large number of individuals with different morphological structures and tasks but all contributing to a common goal (e.g. survival of the hive)
Here we look at ants

Ant Behavior in the Wild

Tasks within an Ant Colony
Reproduction: queen
Brood care: specialized worker
Food collection: specialized worker
Defense: soldier
Nest cleaning: specialized worker
Nest building & maintenance: specialized worker

Tasks within an Ant Colony(2)
How do individual ants know which task to do?
There is no central command center
Seems to occur magically
Actually is based on two different things:
  Anatomical differences
  Stigmergy

Tasks within an Ant Colony(3)
Anatomical differences:
  Size and larger jaws, for example, distinguish soldier ants from food collectors
Stigmergy:
  Indirect interactions between ants (as opposed to direct interactions, like antenna, mandibular, or visual contact)
  Based on local modification of the environment
Unfortunately, many aspects of ant behavior are not fully understood yet
We are going to look at one aspect that has been studied intensively: food collection

Food Collection
Ants have the ability to find the shortest path between a food source and their nest
[Figure: nest and food source]

Food Collection(2)
Several experiments have been conducted to study this behavior
Initially, paths are chosen randomly
With time, more and more ants follow the shorter path
[Figures: paths between nest and food]

Food Collection(3)
What's the reason for that?
Common ant is not very intelligent (a few hundred neurons)
It's done via stigmergy
When walking around, each ant leaves behind a pheromone trail
When an ant has to decide which path to follow, usually it picks the one with higher pheromone concentration
Ants on the shorter path will return faster, leaving more pheromone on this path in shorter time
Also, pheromone evaporates with time; pheromone on longer path will vanish faster

Modeling Ants and Optimization
Next we show how to model the ant behavior to help solve the traveling salesman problem (TSP)
Ants are given a list of cities they have already visited
Ant follows trail probabilistically (depending on pheromones) without updating pheromones
After completing a tour, ant deposits pheromone depending on quality of solution
After each update step, pheromones evaporate

Ants and TSP
Number of ants is constant (e.g. one for each node)

Ants and TSP(2)

Ants and TSP(3)
The probability of ant $k$ at city $i$ to visit city $j$ next is:
$$\phi_{i,k}(j) = \frac{\tau_{ij}}{\sum_{c \in J_i^k} \tau_{ic}}$$
$\tau_{ij}$ is the amount of pheromone on edge between two nodes $i$ and $j$
$J_i^k$ is the set of cities that ant $k$ can still visit from city $i$

Ants and TSP(4)
ACO usually performs better if mixed with other heuristics (e.g. greedy local optimization taking shortest path):
$$\phi_{i,k}(j) = \frac{\tau_{ij}^{\alpha} \, \eta_{ij}^{\beta}}{\sum_{c \in J_i^k} \tau_{ic}^{\alpha} \, \eta_{ic}^{\beta}}$$
$\eta_{ij}$ is the inverted distance $1/d_{ij}$ between cities $i$ and $j$
$\alpha$ and $\beta$ control the influence of pheromone and heuristic

Ants and TSP(5)
At the end of each tour, ant $k$ deposits pheromone $\Delta\tau_{ij}^k$ on links $ij$:
$$\Delta\tau_{ij}^k = \begin{cases} 0 & \text{when link } (i,j) \text{ has not been used} \\ Q / L_k & \text{otherwise} \end{cases}$$
$Q$ is a fixed amount of pheromone and $L_k$ is length of tour of ant $k$
The longer the tour, the worse the solution, the smaller the amount of pheromone awarded each link

Ants and TSP(6)
The pheromone deposits of all ants are added up:
$$\Delta\tau_{ij} = \sum_{k=1}^{n} \Delta\tau_{ij}^k$$
The pheromone for all links is adjusted:
$$\tau_{ij} = (1 - \rho) \, \tau_{ij} + \Delta\tau_{ij}$$
$\rho$ is the evaporation coefficient

Algorithm
function ACO-TSP()
  nant = NUMBER-OF-ANTS()
  nnode = NUMBER-OF-NODES()
  place ants on nodes
  repeat
    loop for k from 1 to nant do
      loop for step from 1 to nnode do
        choose next node according to probability phi
      end
    end
    update pheromone trails
  until some tour good enough or enough time has elapsed
  return best tour found so far
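
The pieces above (transition probability with pheromone and inverse distance, deposit of Q/L_k per used link, evaporation) fit together as in the following Python sketch. It is an illustration under assumptions: a symmetric distance matrix with positive off-diagonal entries, one ant per node, uniform initial pheromone of 1.0, a fixed number of iterations as the stopping rule, and default parameter values that are chosen arbitrarily.

```python
import math
import random

def aco_tsp(dist, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, rng=random):
    """Return (best tour, its length) for the distance matrix dist."""
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]        # assumption: uniform initial pheromone
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        delta = [[0.0] * n for _ in range(n)]  # deposits collected in this iteration
        for start in range(n):                 # one ant per node
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cities = sorted(unvisited)
                # Unnormalized phi: tau^alpha * (1/d)^beta; choices() normalizes.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cities]
                j = rng.choices(cities, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):                 # deposit Q / L_k on every used link
                a, b = tour[k], tour[(k + 1) % n]
                delta[a][b] += q / length
                delta[b][a] += q / length
        for i in range(n):                     # evaporation plus new deposits
            for j in range(n):
                tau[i][j] = (1 - rho) * tau[i][j] + delta[i][j]
    return best_tour, best_len
```

Pheromone is only updated after all ants have finished their tours, matching the rule that ants follow trails without updating them while walking.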

Applications
One important field in which ACO has been applied is telecommunications routing
When routing calls through a network, they go through a number of intermediate switching stations
In a large network there are many possible routes
Some network parts may experience congestion while others have spare capacity
Load balancing tries to distribute calls over the network such that
  almost no calls will be lost
  there is a short route between callers

Applications(2)
ACO has been used to optimize the BT network
[Figure: British Synchronous Digital Hierarchy (SDH) network]
M. Ward, "There's an ant in my phone", New Scientist, 24 January 1998

Applications(3)
Centralized control systems scale badly
Usually decentralized approach with several routers is used, each with (local) routing information
Main idea of ACO:
  Enhance routing tables with pheromone information
  Send virtual ants through the network going from a random source to a random destination
  Ant going through network updates pheromone information depending on quality of connection (length, congestion)

Chapter Summary
Particle swarm optimization seems to be an efficient and robust technique
  Although full potential has not been tapped yet
Study of ant colonies is still a young field in computational intelligence
  More interesting applications still to be explored

Chapter 11

Fuzzy Systems

Outline
Fuzzy sets and fuzzy logic
Approximate Reasoning/fuzzy controllers

Motivation

Motivation
Development of logic has a long and rich history (many philosophers played a role)
Foundations of two-valued logic come from Aristotle (Laws of Thought)
  400 B.C.: Law of the Excluded Middle: Every proposition must have only one of two outcomes: true or false
Even back then, there were objections:
  Cretan philosopher Epimenides of Knossos said: "All Cretans are liars"

Motivation(2)
Many successes have been achieved with two-valued logic
However, not all problems can be mapped into the domain of two-valued variables
In most real-world problems incomplete, imprecise, vague, or uncertain data has to be represented
With fuzzy logic domains are characterized by linguistic terms (rather than numbers), e.g.
  "It is partly cloudy"
  "John is very tall"
"partly" and "very" describe the magnitude of the (fuzzy) variables "cloudy" and "tall"

Motivation(3)
In the 1900s Łukasiewicz proposed an alternative in form of a three-valued logic
  The possible values are true, false, and undecided
  Later on, he extended it to a four- and five-valued logic
In 1965 Zadeh produced the foundations of an infinite-valued logic in form of fuzzy logic
  Was ignored for some time, really took off after reimporting it from Japan

Set Theory
We want to construct the set of all large ants
Suppose ants longer than 1.5cm are considered large
Clearly, an ant with length of 3cm will belong, one with 0.5cm will not
What about an ant with length 1.48cm or 1.52cm?

Set Theory(2)
Regular sets or crisp sets have a rigid distinction
Either an element belongs to the set or not
Formally speaking, we have a membership function $m_A(x)$ for set $A$, which maps elements $x$ of the domain $X$ onto 0 or 1:
$$m_A : X \to \{0, 1\}$$

Set Theory(3)
A graphical presentation of our set "large ants" looks like this:
[Figure: $m_{\text{large ants}}(x)$ plotted over length in cm (axis ticks 0.5 to 2.5), with values 0 and 1]

Fuzzy Sets
In contrast to crisp sets, fuzzy sets have membership degrees
That means, in addition to the values 1 (belongs to) and 0 (does not belong to) an element can have any value in between (kind of belongs to)
Formally speaking, the membership function $\mu_A(x)$ for a fuzzy set $A$ maps elements $x$ to any value in the interval $[0, 1]$:
$$\mu_A : X \to [0, 1]$$
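
The difference between the two kinds of membership function shows up directly in code. In this Python sketch the crisp threshold of 1.5 cm comes from the large-ants example, while the fuzzy version is an assumed linear ramp from 1.0 cm to 2.0 cm; the actual shape of the fuzzy set is a modeling decision.

```python
def crisp_large(length_cm, threshold=1.5):
    """Crisp membership: 1 if the ant is longer than the threshold, else 0."""
    return 1 if length_cm > threshold else 0

def fuzzy_large(length_cm, lo=1.0, hi=2.0):
    """Fuzzy membership: 0 below lo, 1 above hi, linear in between (assumed shape)."""
    if length_cm <= lo:
        return 0.0
    if length_cm >= hi:
        return 1.0
    return (length_cm - lo) / (hi - lo)
```

With these definitions the ants of length 1.48 cm and 1.52 cm fall on opposite sides of the crisp set, but receive almost the same fuzzy membership degree (about 0.48 and 0.52).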

Fuzzy Sets(2)
A fuzzy set for large ants could look like this:
[Figure: $\mu_{\text{large ants}}(x)$ plotted over length in cm (axis ticks 0.5 to 2.5), with values between 0 and 1]

Comparing Fuzzy Sets
Equality:
  Crisp sets: two sets $A$ and $B$ are equal, if they contain the same elements
  Fuzzy sets: all membership degrees have to be equal, i.e. $\mu_A(x) = \mu_B(x)$ for all $x \in X$
Containment:
  Crisp sets: $A$ is contained in $B$ ($A \subseteq B$), if all elements in $A$ are also elements of $B$
  Fuzzy sets: again membership degrees have to be considered, i.e. $A \subseteq B \Leftrightarrow \mu_A(x) \le \mu_B(x)$ for all $x \in X$

Fuzzy Operators
Complement (logical NOT): $\mu_{\bar{A}}(u) = 1 - \mu_A(u)$
Union (logical OR): $\mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u))$
Intersection (logical AND): $\mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$
There are alternatives to these operators (which we will not look at here)
All operators need to satisfy certain axioms (e.g. commutativity, associativity for union and intersection)
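
For a finite domain these operators are one-liners. The Python sketch below assumes a fuzzy set is stored as a dictionary from element to membership degree, with missing elements counting as degree 0; this representation and the function names are illustrative only.

```python
def complement(a):
    """Logical NOT: 1 - degree for every element."""
    return {x: 1.0 - m for x, m in a.items()}

def union(a, b):
    """Logical OR: maximum of the two degrees."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def intersection(a, b):
    """Logical AND: minimum of the two degrees."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def is_subset(a, b):
    """Containment: every degree in a is at most the degree in b."""
    return all(m <= b.get(x, 0.0) for x, m in a.items())
```

Since min and max are commutative and associative, union and intersection defined this way satisfy the corresponding axioms.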

Fuzzy Operators(2)
[Figures: Complement; Union (green) and Intersection (red)]

Fuzziness and Probability
Fuzzy logic and probability are often confused
Both refer to uncertainty, but the similarity stops there
There are conceptual differences

Fuzziness and Probability(2)
Fuzzy truth represents membership in vaguely defined sets
  Describes an imprecision in facts
  E.g. coin lying in doorway has a degree of 0.5 belonging to kitchen and 0.5 belonging to dining room
Probabilities refer to the likelihood of some event or condition
  Deals with chances of an event happening (the result, however, is precise)
  E.g. flipping a coin has a probability of 0.5 for turning up heads or tails; after flipping there is no imprecision

Rudimentary Reasoning
Using the aforementioned operators we can do some simple reasoning
For example, consider the three fuzzy sets tall, good_athlete, and good_basketball_player
Now assume:
$$\mu_{\text{tall}}(\text{Michael Jordan}) = 0.9 \qquad \mu_{\text{good\_athlete}}(\text{Michael Jordan}) = 0.9$$
$$\mu_{\text{tall}}(\text{Sven}) = 0.9 \qquad \mu_{\text{good\_athlete}}(\text{Sven}) = 0.2$$

Rudimentary Reasoning(2)
If we know that a good basketball player is tall and a good athlete, then which one is the better player?
We can apply the intersection operator and get:
$$\mu_{\text{good\_basketball\_player}}(\text{Michael Jordan}) = \min(0.9, 0.9) = 0.9$$
$$\mu_{\text{good\_basketball\_player}}(\text{Sven}) = \min(0.9, 0.2) = 0.2$$
So Michael Jordan is the better player
However, this is a very simplistic situation
For most real-world problems, we have to model much more complex scenarios
For these cases (rule-based) fuzzy controllers are used

Fuzzy Controllers
Mainly used for controlling complex dynamic systems
In that case, formal description by mathematical models is very difficult or even impossible
Instead of mathematical model, knowledge of human experts in form of linguistic variables and rules is employed

Fuzzy Controllers(2)
In principle, a fuzzy controller works as follows
  It observes its environment checking for unusual events
  In a fuzzification phase the input data is transformed into fuzzy sets
  Based on a (fuzzy) rule set the input data is evaluated and certain actions may be triggered
  The output data (which is also described in terms of fuzzy sets) needs to be defuzzified

Fuzzy Controllers(3)
Fuzzy controllers can also be seen as intelligent agents using fuzzy logic for their reasoning:
[Diagram: Fuzzy Controller with the components Sensors, Fuzzification, Inference, Rule Set, Defuzzification, and Actuators, connected to the Environment]

Fuzzy Rules
Fuzzy rules are of the general form
  if antecedent(s) then consequent(s)
Antecedents of a rule form a combination of fuzzy sets (which are connected via logic operators)
The consequent part is usually a single fuzzy set (multiple combined fuzzy sets can also appear)

Example
Let us look at an example application to clarify how a fuzzy controller works
We want to monitor the performance of a Web server running on a cluster
The goal is to do (automatic) load balancing in order to use resources efficiently

Example(2)
The cpu load of a machine is described using fuzzy sets:
[Figure: membership functions low, medium, and high plotted over cpu load (axis ticks 20 to 100)]

Example(3)
A machine with a cpu load of 60% has a medium load to a degree of 0.5 and a high load to a degree of 0.2:
[Figure: cpu load membership functions (axis ticks 20 to 100) with the degrees 0.5 and 0.2 marked]

Example(4)
We have different ways to react to a situation
To keep things simple, we look at two of them
  Scale-up: moving a service to a more powerful machine
  Scale-out: starting a new instance of a service

Example(5)
Let's assume that cpuLoad and performanceIndex are input variables (performanceIndex expressing how powerful a machine is) and scaleUp and scaleOut are output variables
Then rules could look like this
  IF (cpuLoad IS high AND (performanceIndex IS low OR performanceIndex IS medium)) THEN scaleUp IS applicable
  IF (cpuLoad IS high AND performanceIndex IS high) THEN scaleOut IS applicable

Example(6)
Let's assume that we have a cpu load of 90%, then for degrees of membership we get:
$$\mu_{\text{low\_load}}(90) = 0.0 \qquad \mu_{\text{medium\_load}}(90) = 0.0 \qquad \mu_{\text{high\_load}}(90) = 0.8$$
Furthermore assume that for the performance index 5 we have the following degrees of membership:
$$\mu_{\text{low\_perf}}(5) = 0.0 \qquad \mu_{\text{medium\_perf}}(5) = 0.6 \qquad \mu_{\text{high\_perf}}(5) = 0.3$$

Example(7)
For the antecedent of the first rule we get:
0.8 AND (0.0 OR 0.6) = min(0.8, max(0.0, 0.6)) = 0.6
For the antecedent of the second rule we get:
0.8 AND 0.3 = min(0.8, 0.3) = 0.3
In classical logic: if the antecedents are true, then the implications are true
In fuzzy logic there are several different approaches, we use min-max inference

Example(8)
The applicability of a scale-up is also described with the help of a linguistic fuzzy variable:
[Figure: fuzzy set plotted over applicability (of scale-up), axis ticks 0.2 to 1.0]
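
The antecedent evaluation of the two load-balancing rules can be reproduced with min for AND and max for OR. In this Python sketch the membership degrees are the ones assumed in the example (cpu load 90%, performance index 5); the dictionary layout and function names are illustrative.

```python
def fuzzy_and(*degrees):
    """Fuzzy AND as the minimum of the degrees."""
    return min(degrees)

def fuzzy_or(*degrees):
    """Fuzzy OR as the maximum of the degrees."""
    return max(degrees)

# Membership degrees taken from the example.
cpu_load = {"low": 0.0, "medium": 0.0, "high": 0.8}      # cpu load of 90%
performance = {"low": 0.0, "medium": 0.6, "high": 0.3}   # performance index 5

def rule_degrees(cpu, perf):
    """Degree to which the antecedent of each rule is fulfilled."""
    scale_up = fuzzy_and(cpu["high"], fuzzy_or(perf["low"], perf["medium"]))
    scale_out = fuzzy_and(cpu["high"], perf["high"])
    return {"scaleUp": scale_up, "scaleOut": scale_out}
```

Calling `rule_degrees(cpu_load, performance)` gives 0.6 for the scale-up rule and 0.3 for the scale-out rule.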

Example(9)
Using min-max inference the result set is cut off at the degree of the antecedent (for scale-up 0.6):
[Figure: fuzzy set over applicability (of scale-up), axis ticks 0.2 to 1.0, cut off at 0.6]

Example(10)
We use the left-most point of the maximal value to defuzzify the result
In this case we say that scale-up is applicable to a degree of 0.6
Assuming a similar set describing the applicability of a scale-out, for the second rule we get an applicability of 0.3
Since 0.6 > 0.3, we decide to scale-up the service in this case

Applications
The first application of fuzzy control comes from the work of Mamdani and Assilian (1975)
  Design of a fuzzy controller for a steam engine
  Objective was to maintain a constant speed by controlling the pressure on pistons
  Was done by adjusting the heat supplied to a boiler

Applications(2)
Since then, a vast number of fuzzy controllers have been developed:
  Washing machines
  Video cameras
  Air conditioners
  Robot control
  Underground trains
  Hydro-electrical power plants

Chapter Summary
Fuzzy controllers have been very successful in commercial products
  Although critics argue that these applications are successful because they are quite simple (e.g. have small rule base)
There have been attempts to merge fuzzy set theory and probability theory, however, there remain many open questions

Chapter 12

Social and Philosophical Implications of AI

Outline
Can machines act intelligently?
Can machines really think?
Ethics and risks of developing AI

Terminology
Weak AI: assertion that machines could possibly act intelligently (or act as if they were intelligent)
Strong AI: assertion that machines that do so are actually thinking
Opinion of most AI researchers:
  Take weak AI hypothesis for granted
  Don't care about strong AI hypothesis (as long as programs work)

Weak AI
Some philosophers have tried to prove that AI is impossible
Whether it is possible or impossible depends on how it is defined
Engineering point of view: finding best agent program on a given architecture
Philosophical point of view:
  Comparing two architectures: human and machine
  Traditionally posed the question: can machines think?
  Unfortunately, there is no unambiguous definition of thinking

Strong AI
Main criticism: even if a machine passes the Turing test, is it actually thinking or just simulating the thinking process?
Chinese room problem:
  Human (who doesn't understand Chinese) is put into a room with sheets of paper and detailed instructions
  Sheets of paper with Chinese writing are slipped under the door
  Human looks up in the instructions what to do, paints some characters on a paper, slips it back
  From the outside this may seem like an intelligent agent understanding Chinese is at work

Ethics and Risks
AI may pose some serious problems:
  People might lose their jobs to automation
  People might have too much (or too little) leisure time
  People might lose their sense of being unique
  People might lose some of their privacy rights
  Use of AI systems might result in a loss of accountability
  Success of AI might mean the end of human race

Automation
This is not a new problem, happens every time new technology is deployed
  Some people lose their jobs
  New jobs are created elsewhere
Main problem: usually new jobs demand higher qualification

Leisure Time
Arthur C. Clarke once wrote that people might face a future of utter boredom
No risk of that yet: due to integrated computerized systems that run 24/7, people tend to work longer hours
Winner-Takes-All-Society:
  Traditional industrial economy: working 10% more results in roughly 10% more profit
  Fast-paced information age economy: an edge of 10% over competitor might mean 100% more profit

Uniqueness
AI research might suggest that human capabilities are not that unique after all
Mankind has survived similar setbacks before:
  Copernicus moving the Earth out of the center of the universe
  Darwin putting Homo sapiens on the level of other species

Privacy Rights
Widespread wiretapping becomes possible
Computer systems using language translation, speech recognition, and keyword search already sift through telephone, email and fax traffic
There is an ongoing controversial debate about this:
  Scott McNealy (CEO Sun): "You have zero privacy anyway. Get over it."
  Louis Brandeis (Judge, 1890): "Privacy is the most comprehensive of all rights . . . the right to one's personality."

Accountability
What is the legal liability of an AI system?
Who takes responsibility if something goes wrong?
This is magnified when money changes hands:
  Who is liable for any debts made by an intelligent agent?
It may also play a role in life and death situations:
  When a physician uses a medical expert system, who is at fault if the diagnosis is wrong?

End of Human Race
Almost any technology can cause harm in the wrong hands
However, with AI, technology might take things into its own hands (Terminator, Matrix)
Nevertheless, AI might achieve a sort of conquest by serving and becoming indispensable

What if AI Does Succeed?
We already covered some of the ethical questions
Modest successes in AI have already changed computer science in some way, making possible new applications
Medium-level successes in AI would affect all kinds of people in their daily lives (like cell phones or the Internet)
Large-scale success would change the lives of a majority of humankind, the very nature of our work and play would be altered

Lecture Summary
This lecture can only be seen as a brief introduction to this subject
AI has made considerable progress in its short history
Final word belongs to Alan Turing: "We can see only a short distance ahead, but we can see that much remains to be done."