
Artificial Intelligence

Prof. Dr. Jürgen Dix

Department of Computer Science


TU Clausthal

SS 2005

Artificial Intelligence – Prof. Dr. Jürgen Dix (1/554)


Time and place: Wednesdays and Thursdays 10–12
in T4 (Tannenhöhe)
Exercises: Every other Wednesday (approx.).

Website
http://cig.in.tu-clausthal.de/
Visit regularly!

There you will find important information about the


lecture, documents, exercises, et cetera.

Organization: Lecture: Dix, Exercise: Jamroga

Exam: Oral exams on demand.


History
This course evolved over the last 10 years. It was first held in Koblenz
(WS 95/96, WS 96/97), then in Vienna (Austria, WS 98, WS 00), Bahia
Blanca (Argentina, SS 98, SS 01), and Clausthal (SS 2004).
Most chapters (except 5, 6 and 8) are based on the seminal book by
Russell/Norvig: Artificial Intelligence.

I am indebted to Tristan Behrens who translated my slides


into English using beamer tex (and made many stylistic
improvements).



Lecture Overview

1. Introduction
2. Solving Problems by Searching
3. Learning from Observations
4. Neural Networks
5. Knowledge Engineering
6. More about Logic
7. Planning
8. Reasoning under Uncertainty

1. Introduction


1.1 Overview
1.2 Basics
1.3 History of AI
1.4 Intelligent agents


1.1 What is AI?


“The exciting new effort to make computers “The study of mental faculties through the
think . . . machines with minds, in the full use of computational models”
and literal sense” (Haugeland, 1985) (Charniak and McDermott, 1985)
“[The automation of] activities that we asso- “The study of the computations that make
ciate with human thinking, activities such as it possible to perceive, reason, and act”
decision-making, problem solving, learning (Winston, 1992)
. . .” (Bellman, 1978)

“The art of creating machines that perform “A field of study that seeks to explain and
functions that require intelligence when per- emulate intelligent behavior in terms of
formed by people” (Kurzweil, 1990) computational processes” (Schalkoff, 1990)
“The study of how to make computers do “The branch of computer science that is con-
things at which, at the moment, people are cerned with the automation of intelligent
better” (Rich and Knight, 1991) behavior” (Luger and Stubblefield, 1993)


1. Cognitive science

2. "Socrates is a man. All men are mortal. Therefore
   Socrates is mortal."
   (A famous syllogism due to Aristotle.)

(1) Informal description
(2) Formal description
(3) Problem solution

(2) is often problematic due to under-specification.
(3) is deduction (correct inference): only recursively
enumerable, not decidable.

3. Turing Test :
http://cogsci.ucsd.edu/∼asaygin/tt/ttest.html
http://www.loebner.net/Prizef/loebner-prize.html

Standard Turing Test

Total Turing Test

Turing believed in 1950:

By 2000 a computer with 10^9 memory units could be
programmed such that it can chat with a human for 5
minutes and pass the Turing Test with a probability of
30 %.

4. In item 2, correct inferences were mentioned.

Often not enough information is available to act in a
way that makes sense (i.e. to act in a provably
correct way): this leads to non-monotonic logics.

The world is in general under-specified. It is also
possible to act rationally without correct
inferences: reflexes.


To pass the total Turing Test, the computer will
need:
computer vision to perceive objects,
robotics to move them about.

The question Are machines able to think? leads
to two theses:
Weak AI thesis: Machines can be built that act
as if they were intelligent.
Strong AI thesis: Machines that act in an
intelligent way do possess cognitive states, i.e.
a mind.


The Chinese Room was used in 1980 by Searle
to argue that a system can pass the
Turing Test and yet refute the Strong AI
thesis. The room contains:

CPU: an English-speaking man with no knowledge of
the Chinese language,
Program: a book of rules formulated in English
which describe how to manipulate Chinese symbols,
Memory: sufficient pens and paper.
Papers with Chinese texts are passed to the man, which he
answers using the book.

Question
Does the room understand the Chinese
language?

1.2 Basics


450 BC: Plato, Socrates, Aristotle

Socrates: "What is characteristic of piety which
makes all actions pious?"
Aristotle: "Which laws govern the rational part of
the mind?"
800: al-Khwarizmi (Arabia): algorithm
1300: Raymundus Lullus: Ars Magna
1350: William of Ockham: Ockham's Razor
"Entia non sunt multiplicanda
praeter necessitatem."
(Entities must not be multiplied beyond necessity.)


1596–1650 : R. Descartes:
Mind = physical system

Free will, dualism


1623–1662 : B. Pascal, W. Schickard:
Addition-machines


1646–1716 : G. W. Leibniz:
Materialism, uses some ideas of Ars Magna

to build a machine for simulating the human mind


1561–1626: F. Bacon: Empiricism
1632–1704: J. Locke: Empiricism
"Nihil est in intellectu quod non antefuerat in sensu."
(Nothing is in the intellect that was not first in the senses.)
1711–1776: D. Hume: Induction
1724–1804: I. Kant: "Der Verstand schöpft seine Gesetze nicht
aus der Natur, sondern schreibt sie dieser vor."
(The understanding does not derive its laws from nature,
but prescribes them to it.)


1805 : Jacquard: Loom


1815–1864 : G. Boole:
Formal language,
Logic as a mathematical discipline

1792–1871 : Ch. Babbage:
Difference Engine: Logarithm-tables
Analytical Engine: with addressable memory,

stored programs and conditional jumps

1848–1925: G. Frege:
Begriffsschrift: 2-dimensional notation for PL1
(first-order predicate logic)


1862–1943 : D. Hilbert:
Famous talk 1900 in Paris: 23 problems
23rd problem: The Entscheidungsproblem
1872–1970 : B. Russell:
1910: Principia Mathematica
Logical positivism, Vienna Circle (1920–40)


1902–1983 : A. Tarski: (1936)


Idea of truth in formal languages

1906–1978 : K. Gödel:
Completeness theorem (1930)
Incompleteness theorem (1930/31)
Unprovability of theorems (1936)
1912–1954 : A. Turing:
Turing-machine (1936)

Computability

1903–1995 : A. Church:
λ-Calculus, Church-Turing-thesis


1940: First computer "Heath Robinson"
built to decipher German messages (Turing)
1943: "Colossus" built from vacuum tubes

1941 : First operational programmable computer:
Z 3 by K. Zuse (Deutsches Museum)

with floating-point-arithmetic.
Plankalkül: First high-level programming language


1940-1945 : H. Aiken: develops MARK I, II, III.


ENIAC: First general purpose electronic computer

1948 : First stored program computer (The Baby)
Tom Kilburn (Manchester)
Manchester beats Cambridge by 3 months


First program run on a Stored Program Computer in 1948:


1952 : IBM: IBM 701, first computer to yield a profit


(Rochester et al.)


1.3 History of AI


The year 1943:

W. McCulloch and W. Pitts drew on three sources:
1. physiology and function of neurons in the brain,
2. propositional logic due to Russell/Whitehead,
3. Turing's theory of computation.

Model of artificial, connected neurons:

Any computable function can be computed by some
network of neurons.
All the logical connectives can be implemented by
simple net structures.


The year 1951:

Minsky and Edmonds build the first computer based
on neural networks (Princeton).

The year 1952:

A. Samuel develops programs for checkers that learn
to play tournament-level checkers.


The year 1956:


Two-month workshop at Dartmouth organized by
McCarthy, Minsky, Shannon and Rochester.

Idea:
Combine knowledge about automata theory, neural
nets and the studies of intelligence (10 participants)


Newell and Simon present a reasoning program, the
Logic Theorist, able to prove most of the theorems
in Chapter 2 of the Principia Mathematica (one even
with a shorter proof than the original).
But the Journal of Symbolic Logic rejected a paper
authored by Newell, Simon and the Logic Theorist.
Newell and Simon claim to have solved the venerable
mind-body problem.


The term Artificial Intelligence is proposed as the
name of the new discipline and becomes established.

The Logic Theorist is followed by the General
Problem Solver, which was designed from the start
to imitate human problem-solving protocols.


The year 1958: birth year of AI

McCarthy joins MIT and develops:
1. Lisp, the dominant AI programming language,
2. time-sharing to optimize the use of computer time,
3. programs with common sense.
Advice Taker: a hypothetical program that can be seen as
the first complete AI system. Unlike others it embodies
general knowledge of the world.


The year 1959:


H. Gelernter: Geometry Theorem Prover


The years 1960–1966:

McCarthy concentrates on knowledge representation
and reasoning in formal logic (Robinson's
resolution, Green's planner, Shakey).

Minsky is more interested in getting programs to
work and focuses on special worlds, the so-called
microworlds.


SAINT: able to solve integration problems typical of first-year
college calculus courses.
ANALOGY: able to solve geometric analogy problems that
appear in IQ tests.

[Figure: a geometric analogy problem: "A is to B as C is to
which of the candidate figures 1–5?"]


Blocksworld is the most famous microworld.

Work building on the neural networks of McCulloch and Pitts
continued: Rosenblatt's perceptrons and his convergence
theorem:

Convergence theorem
An algorithm exists that can adjust the connection
strengths of a perceptron to match any input data
(provided such a match exists).

Summarized: great promises for the future, initial
successes, but miserable further results.

The year 1966: all US funds for automatic
translation research are cancelled.

An unacceptable mistake
The spirit is willing but the flesh is weak.
was translated (via Russian) into
The vodka is good but the meat is rotten.


The years 1966–1974: A dose of reality

Simon 1958: "In 10 years a computer will be
grandmaster of chess."

Simple problems are solvable due to their small
search space. Serious problems remain unsolvable.

Hope:
Faster hardware and more memory will solve
everything!
NP-completeness, S. Cook/R. Karp (1971/72),
P ≠ NP?


The year 1973:

The Lighthill report forms the basis for the decision by
the British government to end support for AI research.

Minsky and Papert's book Perceptrons proved limitations
of some approaches, with fatal consequences:
research funding for neural nets decreased
to almost nothing.


The years 1969–79: Knowledge-based systems

General-purpose mechanisms are called weak methods,
because they use weak information about the domain. For many
complex problems it turns out that their performance is also
weak.
Idea:
Use knowledge suited to making larger reasoning steps and to
solving typically occurring cases in narrow areas of expertise.

Example: DENDRAL (1969)

Leads to expert systems like MYCIN (diagnosis of blood
infections).

The year 1973: programming language PROLOG

The year 1974: relational database model (Codd)

Something to laugh about: in 1902 a German poem was
translated into Japanese. The Japanese version was translated
into French. Finally this version was translated back into
German, assuming that it was a Japanese poem.

The result:

Stille ist im Pavillon aus Jade,
Krähen fliegen stumm
Zu beschneiten Kirschbäumen im Mondlicht.
Ich sitze
und weine.

(Silence in the jade pavilion, / crows fly mutely / to
snow-covered cherry trees in the moonlight. / I sit / and weep.)

Which is the original poem?



1.4 Intelligent Agents


Definition 1 (Agent a)
An agent a is anything that can be viewed as
perceiving its environment through sensors and
acting upon that environment through effectors.

[Figure: an agent interacts with its environment:
percepts flow in through sensors, actions flow out
through effectors.]

Definition 2 (Rational Agent, Omniscient Agent)

A rational agent is one that does the right thing
(a performance measure determines how successful
an agent is).
An omniscient agent knows the actual outcome of its
actions and can act accordingly.

Attention:
A rational agent is in general not omniscient!


Question
What is the right thing and what does it depend on?

1 Performance measure (as objective as possible).


2 Percept sequence (everything the agent has received so
far).
3 The agent’s knowledge about the environment.
4 How the agent can act.


Definition 3 (Ideal Rational Agent)


For each possible percept-sequence an ideal
rational agent should do whatever action is expected
to maximize its performance measure (based on the
evidence provided by the percepts and built-in
knowledge).


Mappings:
set of percept sequences −→ set of actions

can be used to describe agents in a mathematical


way.

Hint:
Internally an agent is

agent = architecture + program

AI is engaged in designing agent programs


Agents and their PAGE-description:

Agent Type: Medical diagnosis system
  Percepts: Symptoms, findings, patient's answers
  Actions: Questions, tests, treatments
  Goals: Healthy patient, minimize costs
  Environment: Patient, hospital

Agent Type: Satellite image analysis system
  Percepts: Pixels of varying intensity, color
  Actions: Print a categorization of the scene
  Goals: Correct categorization
  Environment: Images from orbiting satellite

Agent Type: Part-picking robot
  Percepts: Pixels of varying intensity
  Actions: Pick up parts and sort into bins
  Goals: Place parts in correct bins
  Environment: Conveyor belt with parts

Agent Type: Refinery controller
  Percepts: Temperature, pressure readings
  Actions: Open, close valves; adjust temperature
  Goals: Maximize purity, yield, safety
  Environment: Refinery

Agent Type: Interactive English tutor
  Percepts: Typed words
  Actions: Print exercises, suggestions, corrections
  Goals: Maximize student's score on test
  Environment: Set of students


A simple agent program:

function SKELETON-AGENT(percept) returns action
  static: memory, the agent's memory of the world

  memory ← UPDATE-MEMORY(memory, percept)
  action ← CHOOSE-BEST-ACTION(memory)
  memory ← UPDATE-MEMORY(memory, action)
  return action
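The skeleton above can be sketched in Python as a closure. The helpers `update_memory` and `choose_best_action` are placeholders for designer-supplied functions, not a fixed API:

```python
# Sketch of SKELETON-AGENT as a Python closure. The two helper
# functions are supplied by the designer; the names are illustrative.

def make_skeleton_agent(update_memory, choose_best_action):
    memory = []  # the agent's memory of the world

    def agent(percept):
        nonlocal memory
        memory = update_memory(memory, percept)   # UPDATE-MEMORY
        action = choose_best_action(memory)       # CHOOSE-BEST-ACTION
        memory = update_memory(memory, action)    # remember the action too
        return action

    return agent

# Toy instance: remember everything, echo the latest percept.
echo = make_skeleton_agent(lambda m, x: m + [x], lambda m: m[-1])
```

Calling `echo("ping")` returns `"ping"` while the internal memory grows with every interaction.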


In theory everything is trivial:

function TABLE-DRIVEN-AGENT(percept) returns action
  static: percepts, a sequence, initially empty
          table, a table, indexed by percept sequences,
                 initially fully specified

  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action
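In Python the table can be an ordinary dictionary keyed by percept tuples; the sketch also makes the impracticality visible, since the table needs one entry per possible percept sequence. The concrete table is invented:

```python
# Sketch of TABLE-DRIVEN-AGENT: a (hypothetical) table maps whole
# percept sequences, represented as tuples, to actions.

def make_table_driven_agent(table):
    percepts = []  # the percept sequence, initially empty

    def agent(percept):
        percepts.append(percept)          # append percept to percepts
        return table[tuple(percepts)]     # LOOKUP(percepts, table)

    return agent

# A two-entry toy table for a vacuum-like agent.
table = {("dirty",): "suck",
         ("dirty", "clean"): "right"}
agent = make_table_driven_agent(table)
```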


An agent example – taxi driver:

Agent Type: Taxi driver
  Percepts: Cameras, speedometer, GPS, sonar, microphone
  Actions: Steer, accelerate, brake, talk to passenger
  Goals: Safe, fast, legal, comfortable trip; maximize profits
  Environment: Roads, other traffic, pedestrians, customers


Question:
How can agent programs be designed?

There are four types of agent programs:


Simple reflex agents
Agents that keep track of the world
Goal-based agents
Utility-based agents


Some examples:
1. Production rules: If the driver in front hits the
   brakes, then hit the brakes too.

function SIMPLE-REFLEX-AGENT(percept) returns action
  static: rules, a set of condition-action rules

  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action
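The rule matching can be sketched with (condition, action) pairs, where a condition is a predicate on the interpreted state; the rule set and the interpretation function below are invented for illustration:

```python
# Sketch of SIMPLE-REFLEX-AGENT with condition-action rules given as
# (predicate, action) pairs; the first matching rule fires.

def simple_reflex_agent(percept, rules, interpret_input):
    state = interpret_input(percept)       # INTERPRET-INPUT
    for condition, action in rules:        # RULE-MATCH
        if condition(state):
            return action                  # RULE-ACTION
    return "noop"                          # no rule matched

# Toy rule set for the braking example.
rules = [(lambda s: s == "front-car-braking", "brake")]
```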


Definition 4 (Actions A, Percepts P, States S)

A := {a1, a2, . . . , an} is the set of actions,
P := {p1, p2, . . . , pm} is the set of percepts,
S := {s1, s2, . . . , sl} is the set of states.


How does the environment develop (the state s)
when an action a is executed?
We describe this with a function, too:

env : S × A −→ 2^S

This includes non-deterministic environments.
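As a sketch, such an env can be a function returning a set of successor states; a result with more than one element models non-determinism. The concrete states and actions below are invented:

```python
# env : S x A -> 2^S as a Python function returning a set of states.

def env(state, action):
    table = {
        ("s0", "right"): {"s1"},
        ("s1", "suck"):  {"s1", "s2"},   # non-deterministic outcome
    }
    return table.get((state, action), {state})  # default: nothing changes
```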


How do we describe agents?

We could take a function action : P −→ A.

[Figure: simple reflex agent: sensors determine "what the
world is like now"; condition-action rules select "what
action I should do now"; effectors execute it in the
environment.]

This is too weak! Take the whole history into account.


With history:
s0 →a0 s1 →a1 . . . sn →an . . . (or the sequence of
observations).


Question:
Why not do the same for env?

We define the run of an agent in an environment as a
sequence of interleaved states and actions:

Definition 5 (Run r, R = Ract ∪ Rstate)
A run r over A and S is a finite sequence

r : s0 →a0 s1 →a1 . . . sn →an . . .

Such a sequence may end with a state sn or with an
action an: we denote by Ract the set of runs ending
with an action and by Rstate the set of runs ending
with a state.

Definition 6 (Environment)
An environment Env is a triple ⟨S, s0, τ⟩ consisting of
1. the set S of states,
2. the initial state s0 ∈ S,
3. a function τ : Ract −→ 2^S, which describes how
   the environment changes when an action is
   performed (given the whole history).
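Definition 6 can be transcribed directly into code. In this sketch τ is Markovian, i.e. it only looks at the last state and action of the run, which is a simplifying assumption; the states are invented:

```python
# An environment <S, s0, tau> per Definition 6. tau takes a run ending
# in an action; this toy tau inspects only its last two elements.

from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

@dataclass
class Environment:
    states: FrozenSet[str]
    s0: str
    tau: Callable[[Tuple[str, ...]], FrozenSet[str]]

def tau(run):
    *_, state, action = run              # last state and last action
    return frozenset({state + action})   # deterministic successor

env = Environment(frozenset({"s", "sa"}), "s", tau)
```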


Definition 7 (Agent a)
An agent a is determined by a function

action : Rstate −→ A,

describing which action the agent performs, given its
current history.

Important:
An agent system is then a pair a = ⟨action, Env⟩
consisting of an agent and an environment.
We denote by R(a, Env) the set of runs of agent a in
environment Env.


Definition 8 (Characteristic Behaviour)

The characteristic behaviour of an agent a in an
environment Env is the set R of all possible runs
r : s0 →a0 s1 →a1 . . . sn →an . . . with:
1. for all n: an = action(⟨s0, a0, . . . , an−1, sn⟩),
2. for all n > 0: sn ∈ τ(⟨s0, a0, s1, a1, . . . , sn−1, an−1⟩).
For deterministic τ, the relation "∈" can be replaced
by "=".
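For finite horizons the characteristic behaviour can be enumerated mechanically from the two conditions. The `action` and `tau` below are arbitrary toy functions, invented for illustration:

```python
# Enumerate all runs up to a given depth that satisfy Definition 8:
# a_n = action(run so far) and s_n in tau(run extended by a_n).

def runs(s0, action, tau, depth):
    frontier = [(s0,)]
    for _ in range(depth):
        extended = []
        for run in frontier:
            a = action(run)                 # condition 1
            for s in tau(run + (a,)):       # condition 2
                extended.append(run + (a, s))
        frontier = extended
    return frontier

# Toy system: the agent always plays "a"; tau offers two successors.
all_runs = runs("s", lambda r: "a",
                lambda r: {r[-2] + "x", r[-2] + "y"}, 1)
```

With the non-deterministic toy τ the behaviour contains two runs of depth one.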


Equivalence
Two agents a, b are called behaviourally equivalent
wrt. environment Env, if R(a, Env) = R(b, Env).
Two agents a, b are called behaviourally
equivalent, if they are behaviourally equivalent
wrt. all possible environments Env.


Question:
What about purely reactive agents? They can be
described much more simply!

Definition 9 (Purely Reactive Agent)

An agent is called purely reactive, if its function is
given by

action : S −→ A.

A purely reactive agent is called a simple reflex agent
if actions are selected according to condition-action
rules.


function SIMPLE-REFLEX-AGENT(percept) returns action
  static: rules, a set of condition-action rules

  state ← INTERPRET-INPUT(percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  return action


function REFLEX-AGENT-WITH-STATE(percept) returns action
  static: state, a description of the current world state
          rules, a set of condition-action rules

  state ← UPDATE-STATE(state, percept)
  rule ← RULE-MATCH(state, rules)
  action ← RULE-ACTION[rule]
  state ← UPDATE-STATE(state, action)
  return action
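This scheme can be sketched as a Python closure; `update_state` and the rule set are problem-specific and invented here:

```python
# Sketch of REFLEX-AGENT-WITH-STATE as a closure over an internal
# world-state description.

def make_reflex_agent_with_state(update_state, rules, initial_state):
    state = initial_state

    def agent(percept):
        nonlocal state
        state = update_state(state, percept)         # fold in the percept
        for condition, action in rules:              # RULE-MATCH
            if condition(state):
                state = update_state(state, action)  # fold in the action
                return action
        return "noop"

    return agent

# Toy: the state is the list of everything seen and done so far.
agent = make_reflex_agent_with_state(
    lambda s, x: s + [x],
    [(lambda s: s[-1] == "dirt", "suck")],
    [])
```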


Replace states by percepts:

Definition 10 (Standard Agent a)

A standard agent a is given by a function

action : P∗ −→ A

together with

see : S −→ P.

An agent is thus a pair ⟨see, action⟩.


Definition 11 (Indistinguishable)
Two different states s, s′ are called indistinguishable
for an agent a, if see(s) = see(s′).

The relation "indistinguishable" on S × S is an
equivalence relation.
What does |∼| = |S| mean?
And what about |∼| = 1?
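The two boundary cases can be checked concretely: since see induces a partition of S, |∼| = |S| means every state is perceived differently (perfect perception), while |∼| = 1 means all states look alike (no information). A small sketch with invented states:

```python
# Partition S into the equivalence classes induced by see.

from collections import defaultdict

def equivalence_classes(states, see):
    classes = defaultdict(set)
    for s in states:
        classes[see(s)].add(s)   # states with equal see() are merged
    return list(classes.values())

S = {"s1", "s2", "s3"}
perfect = equivalence_classes(S, lambda s: s)       # |~| = |S|
blind   = equivalence_classes(S, lambda s: "same")  # |~| = 1
```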


Definition 12 (Characteristic Behaviour)

The characteristic behaviour of a standard agent ⟨see, action⟩
in an environment Env is the set of all finite sequences

p0 →a0 p1 →a1 . . . pn →an . . .

where

p0 = see(s0),
ai = action(⟨p0, . . . , pi⟩),
pi = see(si), where si ∈ τ(⟨s0, a0, s1, a1, . . . , si−1, ai−1⟩).

Such a run, even if deterministic from the agent's viewpoint, may
cover different environmental behaviours:
s0 →a0 s1 →a1 . . . sn →an . . .

Instead of using the whole history, resp. P∗, one can
also use internal states I := {i1, i2, . . . , in}.

Definition 13 (State-based Agent a_state)
A state-based agent a_state is given by a function
action : I −→ A together with

see : S −→ P,
and next : I × P −→ I.

Here next(i, p) is the successor state of i if p is
observed.
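A state-based agent ⟨action, see, next⟩ can be sketched directly; the internal state below merely counts percepts, an invented example:

```python
# Sketch of a state-based agent per Definition 13: observe via see,
# update the internal state via next, act on the internal state.

def make_state_based_agent(action, see, next_fn, i0):
    i = i0

    def step(world_state):
        nonlocal i
        p = see(world_state)   # observe
        i = next_fn(i, p)      # update internal state
        return action(i)       # act on the internal state

    return step

agent = make_state_based_agent(action=lambda i: "act" + str(i),
                               see=lambda s: s,
                               next_fn=lambda i, p: i + 1,
                               i0=0)
```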


[Figure: reflex agent with internal state: the agent keeps
track of "what the world is like now" using its internal
state, "how the world evolves" and "what my actions do";
condition-action rules then select "what action I should do
now".]

Definition 14 (Characteristic Behaviour)

The characteristic behaviour of a state-based agent a_state in an
environment Env is the set of all finite sequences

(i0, p0) →a0 (i1, p1) →a1 . . . →an (in, pn), . . .

with
1. for all n: an = action(in),
2. for all n: next(in, pn) = in+1.

Such a sequence covers the runs r : s0 →a0 s1 →a1 . . . where

aj = action(ij),
sj ∈ τ(⟨s0, a0, s1, a1, . . . , sj−1, aj−1⟩),
pj = see(sj).


Are state-based agents more expressive than
standard agents? How can we measure this?

Definition 15 (Environmental Behaviour)
The environmental behaviour of an agent a_state is the
set of possible runs covered by the characteristic
behaviour of the agent.


Theorem 16 (Equivalence)
Standard agents and state-based agents are
equivalent with respect to their environmental
behaviour.
More precisely: for each state-based agent a_state
(with storage function next) there exists a standard
agent a which has the same environmental
behaviour, and vice versa.
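One direction of the construction can be made concrete: from a state-based agent one obtains a standard agent on percept sequences by folding next over the whole sequence. The toy functions are invented:

```python
# Build a standard agent (on percept sequences) from a state-based
# agent by replaying next over the whole history.

from functools import reduce

def standard_from_state_based(action, next_fn, i0):
    def std_action(percepts):
        i = reduce(next_fn, percepts, i0)   # replay the percept history
        return action(i)
    return std_action

# With a counting next, the output depends only on the sequence length.
std = standard_from_state_based(lambda i: "act" + str(i),
                                lambda i, p: i + 1, 0)
```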


3. Goal-based agents:

[Figure: goal-based agent: in addition to its world model
("how the world evolves", "what my actions do"), the agent
predicts "what it will be like if I do action A" and selects
an action that achieves its goals.]

Will be discussed in the chapter on planning.



Intention:
We want to tell our agent what to do, without telling it
how to do it!
We need a utility measure.


How can we define a utility function?

Take for example

u : S −→ ℝ.

Question:
What is the problem with this definition?

Better idea:

u : R −→ ℝ.

Take the set of runs R and not the set of states S.
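A utility on runs can, for instance, sum per-state rewards along the run; the reward function below is made up for illustration:

```python
# u : R -> Reals as the sum of rewards over the states of a run.
# A run alternates s0, a0, s1, a1, ...; states sit at even positions.

def run_utility(run, reward):
    return sum(reward(s) for s in run[::2])

u = run_utility(("s0", "a", "s1", "a", "s2"), lambda s: 1)
```

Here every state is worth 1, so the utility of the run is simply the number of states visited.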


4. Agents with a utility measure:

[Figure: utility-based agent: beyond predicting "what it will
be like if I do action A", the agent evaluates "how happy I
will be in such a state" using its utility function.]

The utility measure can often be simulated
through a set of goals.

Example:
Tileworld

[Figure: three Tileworld configurations (a), (b), (c) with
tiles (T) and holes (H).]


Question:
How do properties of the environment influence the
design of a fitting agent?


Some principal properties:

Accessible/inaccessible: If the environment is not completely
accessible, the agent will need internal states.
Deterministic/nondeterministic: If the environment is
inaccessible, then it may appear nondeterministic.
Episodic/nonepisodic: The agent's experience is divided
into episodes; percept-action sequences of different
episodes are independent.
Static/dynamic: The environment can change while an agent is
deliberating. An environment is semi-dynamic
if it does not change with the passage
of time but the agent's performance measure does.
Discrete/continuous: If there is a limited number of percepts and
actions, the environment is discrete.


Environment Accessible Deterministic Episodic Static Discrete


Chess with a clock Yes Yes No Semi Yes
Chess without a clock Yes Yes No Yes Yes
Poker No No No Yes Yes
Backgammon Yes No No Yes Yes
Taxi driving No No No No No
Medical diagnosis system No No No No No
Image-analysis system Yes Yes Yes Semi No
Part-picking robot No No Yes No No
Refinery controller No No No No No
Interactive English tutor No No No No Yes

2. Searching


2.1 Problem formulation


2.2 Uninformed search strategies
2.3 Best first (Greedy) search
2.4 A∗
2.5 Heuristics
2.6 Searching with limited memory
2.7 Iterative improvements


Wanted: Problem-solving agents, which are


obviously a subclass of the goal-oriented
agents.
Structure: Formulate, Search, Execute.


Formulate:
Goal: a set of states,
Problem: States, actions mapping
from states into states, transitions
Search: Which sequence of actions is helpful?
Execute: The resulting sequence of actions is
applied to the initial state.


function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  inputs: percept, a percept
  static: seq, an action sequence, initially empty
          state, some description of the current world state
          goal, a goal, initially null
          problem, a problem formulation

  state ← UPDATE-STATE(state, percept)
  if seq is empty then do
      goal ← FORMULATE-GOAL(state)
      problem ← FORMULATE-PROBLEM(state, goal)
      seq ← SEARCH(problem)
  action ← FIRST(seq)
  seq ← REST(seq)
  return action

When executing, percepts are ignored.


2.1 Problem formulation


We distinguish four types:

1. 1-state-problems: Actions are completely described.
   Complete information through sensors to determine the
   actual state.
2. Multiple-state-problems: Actions are completely
   described, but the initial state is not certain.
3. Contingency-problems: Sometimes the result is not a
   fixed sequence, so the complete tree must be considered.
   (Exercise: Murphy's law; Chapter 7, Planning.)
4. Exploration-problems: Not even the effect of each action
   is known. You have to search in the world instead of in
   the abstract model.


[Figure: the eight states of the two-square vacuum world.]


Definition 17 (1-state-problem)
A 1-state-problem consists of:
  a set of states (incl. the initial state),
  a set of n actions (operators), each of which – applied
  to a state – leads to another state:
      Operator_i : States → States, i = 1, . . . , n.
  We use a function Successor-Fn : S → 2^(A×S), mapping a
  state to the set of its ⟨action, successor⟩ pairs,
  a set of goal states, or a goal test, which – applied
  to a state – determines whether it is a goal-state or not,
  a cost function g, which assesses every path in
  the state space (the set of reachable states);
  g is usually additive.
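Definition 17 can be rendered as a small data structure. A minimal Python sketch; the class name, the toy graph and the unit step costs are illustrative assumptions, not part of the lecture:

```python
# Minimal sketch of the 1-state-problem of Definition 17.
# The toy graph, state names and unit step costs are assumptions.

class Problem:
    def __init__(self, initial, goals, successor_fn):
        self.initial = initial            # the initial state
        self.goals = goals                # set of goal states
        self.successor_fn = successor_fn  # state -> list of (action, state)

    def goal_test(self, state):
        return state in self.goals

    def step_cost(self, state, action, result):
        return 1                          # additive cost function g

# Toy state space: a tiny directed graph.
graph = {'A': [('go-B', 'B'), ('go-C', 'C')],
         'B': [('go-D', 'D')],
         'C': [('go-D', 'D')],
         'D': []}
toy_problem = Problem('A', {'D'}, lambda s: graph[s])
```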

What about multiple-state-problems?
(→ exercise)

How to choose actions and states?
(→ abstraction)

Definition 18 (State Space)
The state space of a problem is the set of all states
reachable from the initial state. It forms a
directed graph with the states as nodes and the
actions leading from one state to another as arcs.


Game-example: the 8-puzzle.

[Figure: 8-puzzle start state and goal state]





State Space for Vacuum world

[Figure: state-space graph of the vacuum world; arcs labelled L (left), R (right), S (suck)]

Missionaries vs. cannibals



Belief Space for Vacuum world without sensors

[Figure: belief-state space of the sensorless vacuum world; arcs labelled L, R, S]


Real-world-problems:
travelling-salesman-problem
VLSI-layout
labelling maps
robot-navigation


2.2 Uninformed search


Map of Romania

[Figure: simplified road map of Romania with driving distances between cities]


Principle: Choose, test, expand. (→ search tree)

[Figure: partial search tree for the route from Arad:
(a) the initial state Arad;
(b) after expanding Arad: Sibiu, Timisoara, Zerind;
(c) after expanding Sibiu: Arad, Fagaras, Oradea, Rimnicu Vilcea]


function TREE-SEARCH(problem, strategy) returns a solution, or failure
  initialize the search tree using the initial state of problem
  loop do
      if there are no candidates for expansion then return failure
      choose a leaf node for expansion according to strategy
      if the node contains a goal state then return the corresponding solution
      else expand the node and add the resulting nodes to the search tree


Important:
State-space versus search-tree:
the search-tree can be countably infinite even when the
state-space is finite.

A node is a bookkeeping data structure with respect to the
problem-instance and with respect to an algorithm;
a state is a snapshot of the world.


Definition 19 (Datatype Node)
The datatype node is defined by: state (∈ S), parent
(a node), action (also called operator) which
generated this node, path-costs (the costs to reach
the node) and depth (distance from the root).

Important:
The recursive dependency between node and parent
is important. If the depth is left out then a special
node root has to be introduced.
Conversely the root can be defined by the depth: root
is its own parent with depth 0.

[Figure: a node of the search tree for the 8-puzzle, with fields
STATE, PARENT-NODE, ACTION = right, DEPTH = 6, PATH-COST = 6]


Now we will try to instantiate the function


GENERAL-SEARCH a bit.

Design-decision: Queue
The nodes-to-be-expanded could be described as a
set. But we will use a queue instead.
The fringe is the set of generated nodes that are not
yet expanded.

Here are a few functions operating on queues:


Make-Queue(Elements) Remove-First(Queue)
Empty?(Queue) Insert(Element,Queue)
First(Queue) Insert-All(Elements,Queue)


function TREE-SEARCH(problem, fringe) returns a solution, or failure
  fringe ← INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
  loop do
      if EMPTY?(fringe) then return failure
      node ← REMOVE-FIRST(fringe)
      if GOAL-TEST[problem] applied to STATE[node] succeeds
          then return SOLUTION(node)
      fringe ← INSERT-ALL(EXPAND(node, problem), fringe)

function EXPAND(node, problem) returns a set of nodes
  successors ← the empty set
  for each ⟨action, result⟩ in SUCCESSOR-FN[problem](STATE[node]) do
      s ← a new NODE
      STATE[s] ← result
      PARENT-NODE[s] ← node
      ACTION[s] ← action
      PATH-COST[s] ← PATH-COST[node] + STEP-COST(node, action, s)
      DEPTH[s] ← DEPTH[node] + 1
      add s to successors
  return successors


Question:
Which are interesting requirements of
search-strategies?

completeness
time-complexity
space complexity
optimality (w.r.t. path-costs)

We distinguish:
Uninformed vs. informed search.


Breadth-first-search: “nodes with the smallest depth
are expanded first”.

Make-Queue: add new nodes at the end: FIFO.

Complete? Yes.
Optimal? Yes, if all operators are equally expensive.

Constant branching-factor b: at depth d we
have generated (in the worst case)
b + b^2 + . . . + b^d + (b^(d+1) − b)-many nodes.

Space complexity = time complexity.
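The FIFO strategy can be sketched directly in Python; the toy graph below is an illustrative assumption:

```python
# A sketch of breadth-first search: the FIFO queue guarantees that
# the shallowest nodes are expanded first. The toy graph is ours.
from collections import deque

def breadth_first_search(graph, start, goal):
    frontier = deque([[start]])            # queue of paths, FIFO
    explored = set()
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        if state in explored:
            continue
        explored.add(state)
        for succ in graph.get(state, []):
            frontier.append(path + [succ])
    return None                            # no goal reachable

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
```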


[Figure: breadth-first search on a binary tree; nodes A, then B, C, then D–G are expanded level by level]


Depth   Nodes     Time            Memory
0       1         1 millisecond   100 bytes
2       111       0.1 seconds     11 kilobytes
4       11,111    11 seconds      1 megabyte
6       10^6      18 minutes      111 megabytes
8       10^8      31 hours        11 gigabytes
10      10^10     128 days        1 terabyte
12      10^12     35 years        111 terabytes
14      10^14     3500 years      11,111 terabytes


Uniform-Cost-Search: “nodes n with the lowest
path-costs g(n) are expanded first”.

Make-Queue: new nodes are compared to
those in the queue according
to their path-costs and are
inserted accordingly.

Complete? Yes, if each operator increases the
path-costs by a minimum of δ > 0 (see below).

Worst-case space/time complexity: O(b^(1 + ⌊C*/δ⌋)),
where C* is the cost of the optimal solution and each
action costs at least δ.
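Uniform-cost search is conveniently sketched with a priority queue; the weighted toy graph is an illustrative assumption:

```python
# A sketch of uniform-cost search: nodes with the lowest path cost
# g(n) are expanded first. The weighted toy graph is an assumption.
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0, start, [start])]       # priority queue ordered by g
    best_g = {}
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in best_g and best_g[state] <= g:
            continue                       # already expanded more cheaply
        best_g[state] = g
        for succ, cost in graph.get(state, []):
            heapq.heappush(frontier, (g + cost, succ, path + [succ]))
    return None

graph = {'A': [('B', 1), ('C', 5)], 'B': [('C', 1), ('D', 9)],
         'C': [('D', 2)], 'D': []}
```

Because the heap is ordered by g, the first time the goal is popped its path is a cheapest one.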

If all operators have the same costs (in particular if
g(n) = depth(n) holds):

    Uniform-cost = Breadth-first.

Theorem 20 (Optimality of uniform-cost)
If ∃δ > 0 : g(succ(n)) ≥ g(n) + δ, then
Uniform-Cost is optimal.


Depth-first-search: “nodes of the greatest depth are
expanded first”.

Make-Queue: LIFO, new nodes are added at the beginning.

If the branching-factor b is constant and the maximum
depth is m then:
Space-complexity: b × m,
Time-complexity: b^m.
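With a LIFO stack the same skeleton turns into depth-first search; the acyclic toy graph is an illustrative assumption (plain depth-first search can loop forever on cyclic graphs):

```python
# A sketch of depth-first search with an explicit LIFO stack; new
# nodes go on top, so the deepest node is expanded first.
def depth_first_search(graph, start, goal):
    frontier = [[start]]                   # stack of paths, LIFO
    while frontier:
        path = frontier.pop()
        state = path[-1]
        if state == goal:
            return path
        for succ in reversed(graph.get(state, [])):
            if succ not in path:           # avoid cycles along the path
                frontier.append(path + [succ])
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
```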


[Figure: depth-first search on a binary tree of depth 3; the leftmost branch is explored completely before backtracking]


Depth-limited-search: “search only down to depth l”.

function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
  return RECURSIVE-DLS(MAKE-NODE(INITIAL-STATE[problem]), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
  cutoff-occurred? ← false
  if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
  else if DEPTH[node] = limit then return cutoff
  else for each successor in EXPAND(node, problem) do
      result ← RECURSIVE-DLS(successor, problem, limit)
      if result = cutoff then cutoff-occurred? ← true
      else if result ≠ failure then return result
  if cutoff-occurred? then return cutoff
  else return failure

In total: 1 + b + b^2 + . . . + b^(l−1) + b^l + (b^(l+1) − b)-many nodes.
Space-complexity: b × l,
Time-complexity: b^l.
Two different sorts of failure: failure and cutoff!


Iterative-deepening-search: “depth is increased
one-by-one”.

[Figure: iterative deepening on a binary tree; complete depth-limited searches with Limit = 0, 1, 2, 3]


function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
  inputs: problem, a problem
  for depth ← 0 to ∞ do
      result ← DEPTH-LIMITED-SEARCH(problem, depth)
      if result ≠ cutoff then return result

How many nodes?

d × b + (d − 1) × b^2 + . . . + 2 × b^(d−1) + 1 × b^d -many.

Compare with breadth-first search:

b + b^2 + . . . + b^d + (b^(d+1) − b)-many.
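Iterative deepening itself can be sketched as follows; the 'cutoff' marker and the toy graph are our own conventions:

```python
# A sketch of iterative deepening: depth-limited DFS whose limit is
# increased one by one. 'cutoff' distinguishes the two failure modes.
def depth_limited(graph, state, goal, limit, path=None):
    path = path or [state]
    if state == goal:
        return path
    if limit == 0:
        return 'cutoff'
    cutoff = False
    for succ in graph.get(state, []):
        result = depth_limited(graph, succ, goal, limit - 1, path + [succ])
        if result == 'cutoff':
            cutoff = True
        elif result is not None:
            return result
    return 'cutoff' if cutoff else None    # two different sorts of failure

def iterative_deepening(graph, start, goal, max_depth=20):
    for depth in range(max_depth + 1):
        result = depth_limited(graph, start, goal, depth)
        if result != 'cutoff':
            return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
```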


Attention:
Iterative-deepening-search is faster than
breadth-first-search, because the latter also generates
nodes at depth d + 1 (even if the solution is at depth
d).

The number of re-expanded nodes stays within
a small constant factor.


Bidirectional search:

[Figure: two search frontiers growing from Start and Goal until they meet]


Criterion    Breadth-  Uniform-  Depth-  Depth-   Iterative   Bidirectional
             First     Cost      First   Limited  Deepening   (if applicable)
Time         b^d       b^d       b^m     b^l      b^d         b^(d/2)
Space        b^d       b^d       b^m     b^l      b^d         b^(d/2)
Optimal?     Yes       Yes       No      No       Yes         Yes
Complete?    Yes       Yes       No      Yes, if l ≥ d   Yes  Yes


How to avoid repeated states?

Can we avoid infinite trees by checking
for loops?

Compare the number of states with the number of
paths in the search tree.


Rectangular grid: how many different states are
reachable within a path of length d?

[Figure: (a)–(c) state spaces in which repeated states blow up the search tree]


function GRAPH-SEARCH(problem, fringe) returns a solution, or failure
  closed ← an empty set
  fringe ← INSERT(MAKE-NODE(INITIAL-STATE[problem]), fringe)
  loop do
      if EMPTY?(fringe) then return failure
      node ← REMOVE-FIRST(fringe)
      if GOAL-TEST[problem](STATE[node]) then return SOLUTION(node)
      if STATE[node] is not in closed then
          add STATE[node] to closed
          fringe ← INSERT-ALL(EXPAND(node, problem), fringe)


2.3 Best first (Greedy) search


Idea:
Use problem-specific knowledge to improve the
search.

Tree-search is precisely defined. The only freedom:
Make-Queue.

Assume we have an evaluation-function f which
assigns a value f(n) to each node n.
We change Make-Queue as follows:
the nodes with the smallest f are located at the
beginning of the queue
– thus the queue is sorted w.r.t. f.

function BEST-FIRST-SEARCH(problem, EVAL-FN) returns a solution sequence
  inputs: problem, a problem
          Eval-Fn, an evaluation function
  Queueing-Fn ← a function that orders nodes by EVAL-FN
  return GENERAL-SEARCH(problem, Queueing-Fn)

What about time and space complexity?


Greedy search: here the evaluation-function is

    Eval_Fn(n) := expected cost of an optimal
    path from the state in n to a goal state.

The word optimal is used with respect to the given
cost-function g.
This evaluation-function is also called the
heuristic-function, written h(n).
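Greedy search is best-first search ordered purely by h(n); in the sketch below the graph and the h-values are illustrative assumptions (note that greedy search need not return a cheapest path):

```python
# A sketch of greedy search: the frontier is ordered only by h(n).
import heapq

def greedy_search(graph, h, start, goal):
    frontier = [(h[start], start, [start])]
    explored = set()
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in explored:
            continue
        explored.add(state)
        for succ in graph.get(state, []):
            heapq.heappush(frontier, (h[succ], succ, path + [succ]))
    return None

graph = {'A': ['B', 'C'], 'B': ['G'], 'C': ['G'], 'G': []}
h = {'A': 3, 'B': 1, 'C': 2, 'G': 0}
```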


Example 21 (path-finding)

[Figure: map of Romania together with a table of straight-line distances
to Bucharest (e.g. Arad 366, Sibiu 253, Fagaras 176, Rimnicu Vilcea 193,
Bucharest 0)]

h(n) might be defined as the straight-line distance
between Bucharest and the city denoted by n.

[Figure: stages of greedy search for Bucharest using h:
(a) the initial state Arad (366);
(b) after expanding Arad: Sibiu (253), Timisoara (329), Zerind (374);
(c) after expanding Sibiu: Arad (366), Fagaras (176), Oradea (380), Rimnicu Vilcea (193);
(d) after expanding Fagaras: Sibiu (253), Bucharest (0)]

Questions:
1  Is greedy-search optimal?
2  Is greedy-search complete?


2.4 A∗ search


A*-search: here the evaluation-function is the sum of
a heuristic-function h(n) and the real
path-costs g(n):

    Eval_Fn(n) := h(n) + g(n)

Also written f(n).

So A*-search is “greedy + uniform-cost”: h(nz) = 0
holds for final states nz, so f(nz) = g(nz).
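A sketch of A* along these lines; the weighted graph, the h-values and the function name are assumptions (h below happens to be admissible for this toy graph):

```python
# A sketch of A* graph search: the frontier is ordered by
# f(n) = g(n) + h(n).
import heapq

def a_star(graph, h, start, goal):
    frontier = [(h[start], 0, start, [start])]    # (f, g, state, path)
    best_g = {}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if state in best_g and best_g[state] <= g:
            continue                              # already expanded cheaper
        best_g[state] = g
        for succ, cost in graph.get(state, []):
            g2 = g + cost
            heapq.heappush(frontier, (g2 + h[succ], g2, succ, path + [succ]))
    return None

graph = {'A': [('B', 1), ('C', 2)], 'B': [('G', 5)], 'C': [('G', 2)], 'G': []}
h = {'A': 3, 'B': 4, 'C': 2, 'G': 0}              # h never overestimates
```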


We require that the heuristic function never
overestimates the cost to reach the goal.

Definition 22 (Admissible heuristic function)
The heuristic-function h is called admissible if h(n)
never exceeds the optimal cost from n to a
goal-node nz:

    h(n) ≤ g(nz) − g(n)


[Figure: stages of A* search for Bucharest; each node shows f = g + h:
(a) the initial state Arad 366 = 0+366;
(b) after expanding Arad: Sibiu 393 = 140+253, Timisoara 447 = 118+329, Zerind 449 = 75+374;
(c) after expanding Sibiu: Arad 646, Fagaras 415, Oradea 671, Rimnicu Vilcea 413;
(d) after expanding Rimnicu Vilcea: Craiova 526, Pitesti 417, Sibiu 553;
(e) after expanding Fagaras (remaining: Arad 646 = 280+366, Oradea 671 = 291+380)]

To show completeness of A* we have to ensure:

1  It is never the case that an infinite number of
   nodes is expanded in one step (local finiteness).
2  There is a δ > 0 such that in each step the path
   costs increase by at least δ.

These conditions must also hold in the
following optimality results.


f := Eval_Fn is monotone in our example. This does
not hold in general.
But it is an easy task to make f monotone:

Definition 23 (Monotone modification fmon)
We define a modified function fmon recursively,
along the node-depth:

    fmon(n) := max { fmon(n′), f(n) }

in which n′ is the parent of n.


Theorem 24 (Optimality of A* wrt Tree Search)
A* is optimal using Tree Search if (1) the heuristic
function h is admissible, and (2) the evaluation
function f is monotone.


[Figure: map of Romania with A* contours at f = 380, 400 and 420 around Arad]

What if we use Graph Search?

Definition 25 (Consistent heuristic function)
The heuristic function h is called consistent if the
following holds for every node n and every successor
n′ of n (reached via action a):

    h(n) ≤ cost(n, a, n′) + h(n′).

Theorem 26 (Optimality of A* wrt Graph Search)
A* is optimal using Graph Search if the heuristic
function h is consistent.


Question:
How many nodes does A* store in memory?

Answer:
Virtually always exponentially-many with respect to
the length of the solution.

It can be shown: as long as the heuristic function is
not extremely exact, i.e. unless

    |h(n) − h*(n)| < O(log h*(n)),

the number of nodes is always exponential with
respect to the length of the solution.

For almost every usable heuristic a bad
error-estimate holds:

    |h(n) − h*(n)| ≈ O(h*(n))

Important:
A*'s problem is space, not time.


Important:
A* is even optimally efficient: no other algorithm
(which expands search-paths beginning with an initial
node) expands fewer nodes than A*.


2.5 Heuristics


[Figure: 8-puzzle start state and goal state]


Question:
Which branching-factor?

Answer:
Approx. 3 (more exactly 8/3).

Question:
How many states have to be considered?

Answer:
3^g ≈ 10^10

in which g is the number of moves necessary to reach a solution; g
is approx. 20.


But: there are only 9! ≈ 10^5 states!
In other words: checking for repeated states can be very
helpful.


Question:
Which heuristic-functions come in handy?

Hamming-distance: h1 is the number of tiles
in the wrong position. I.e.
h1(start) = 8.

Manhattan-distance: calculate for every tile the
distance to its goal position and sum up:

    h2 := Σ_{i=1}^{8} (tile i's distance to its goal position)

I.e. h2(start) =
2 + 3 + 2 + 1 + 2 + 2 + 1 + 2 = 15.
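Both heuristics are easy to compute. A sketch using our own encoding of 8-puzzle states as tuples of length 9 in row-major order, with 0 for the blank; the goal layout below is an illustrative assumption:

```python
# A sketch of the two 8-puzzle heuristics on tuple-encoded states.
def hamming(state, goal):
    # number of non-blank tiles not on their goal position (h1)
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal):
    # sum over tiles 1..8 of horizontal plus vertical distance (h2)
    dist = 0
    for tile in range(1, 9):
        i, j = state.index(tile), goal.index(tile)
        dist += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return dist

goal = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # assumed goal layout
```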

Question:
How to determine the quality of a heuristic-function?
(In a single value)

Definition 27 (Effective Branching Factor)
Suppose A* finds an optimal solution for an
instance of a problem at depth d with N nodes
expanded. Then we define b* via

    N + 1 = 1 + b* + (b*)^2 + . . . + (b*)^d

and call it the effective branching-factor of A*.
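Since the right-hand side is strictly increasing in b*, the defining equation can be solved numerically; a bisection sketch (the function name is ours):

```python
# A sketch of computing b* of Definition 27 by bisection on
# N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d.
def effective_branching_factor(n_expanded, depth, tol=1e-9):
    def total(b):
        return sum(b ** i for i in range(depth + 1))  # 1 + b + ... + b^d
    lo, hi = 1.0, float(n_expanded + 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_expanded + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```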


Attention:
b∗ depends on h and on the special
problem-instance.
But for many classes of problem-instances b∗ is quite
constant.


        Search Cost                   Effective Branching Factor
d       IDS       A*(h1)   A*(h2)    IDS     A*(h1)   A*(h2)
2       10        6        6         2.45    1.79     1.79
4       112       13       12        2.87    1.48     1.45
6       680       20       18        2.73    1.34     1.30
8       6384      39       25        2.80    1.33     1.24
10      47127     93       39        2.79    1.38     1.22
12      364404    227      73        2.78    1.42     1.24
14      3473941   539      113       2.83    1.44     1.23
16      –         1301     211       –       1.45     1.25
18      –         3056     363       –       1.46     1.26
20      –         7276     676       –       1.47     1.27
22      –         18094    1219      –       1.48     1.28
24      –         39135    1641      –       1.48     1.26
24 – 39135 1641 – 1.48 1.26

Question:
Is the Manhattan-distance better than the
Hamming-distance?


2.6 Searching with limited memory


Question:
A∗ is memory-intensive. What if the memory is
limited? What to do if the queue is restricted in its
length?

This leads to:


IDA∗ : A∗ + iterative deepening,
RBFS: Recursive Best-First Search,
SMA∗ : Simplified memory bounded A∗ .


IDA*: A* combined with iterative-deepening
search. We perform depth-first-search
(small memory), but use values of f
instead of depth.

So we consider contours: only nodes within the
f-limit are expanded. Of those beyond the limit we only
store the smallest f-value above the current limit.


function IDA*(problem) returns a solution sequence
  inputs: problem, a problem
  static: f-limit, the current f-COST limit
          root, a node

  root ← MAKE-NODE(INITIAL-STATE[problem])
  f-limit ← f-COST(root)
  loop do
      solution, f-limit ← DFS-CONTOUR(root, f-limit)
      if solution is non-null then return solution
      if f-limit = ∞ then return failure; end

function DFS-CONTOUR(node, f-limit) returns a solution sequence and a new f-COST limit
  inputs: node, a node
          f-limit, the current f-COST limit
  static: next-f, the f-COST limit for the next contour, initially ∞

  if f-COST[node] > f-limit then return null, f-COST[node]
  if GOAL-TEST[problem](STATE[node]) then return node, f-limit
  for each node s in SUCCESSORS(node) do
      solution, new-f ← DFS-CONTOUR(s, f-limit)
      if solution is non-null then return solution, f-limit
      next-f ← MIN(next-f, new-f); end
  return null, next-f


Theorem 28 (Properties of IDA*)
IDA* is optimal if enough memory is available to
store the longest solution-path with costs ≤ f*.


Question:
What about complexity?

Space: depends on the smallest operator-cost δ, the
branching-factor b and the optimal cost f*:
b · f*/δ-many nodes are stored (worst case).

Time: depends on the number of distinct values of the
heuristic-function.

If there are only few distinct values, the last iteration of IDA*
will often behave like A*.
If there are many distinct values, on the other hand, only
one node is added per contour: how many nodes will IDA*
expand if A* expands n-many?
What does this say about the time-complexity of A* and IDA*?

RBFS: recursive depth-first version of Best-First
search using only linear memory.
Memorizes the f-value of the best alternative path.

The f-value of each node is replaced by the best
(smallest) f-value of its children.
RBFS is optimal and complete under the same
assumptions as A*.
IDA* and RBFS suffer from using too little
memory.


function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
  RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f-limit) returns a solution, or failure and a new f-cost limit
  if GOAL-TEST[problem](STATE[node]) then return node
  successors ← EXPAND(node, problem)
  if successors is empty then return failure, ∞
  for each s in successors do
      f[s] ← max(g(s) + h(s), f[node])
  repeat
      best ← the lowest-f-value node in successors
      if f[best] > f-limit then return failure, f[best]
      alternative ← the second-lowest f-value among successors
      result, f[best] ← RBFS(problem, best, min(f-limit, alternative))
      if result ≠ failure then return result

[Figure: stages of RBFS for Bucharest; backed-up f-values in parentheses:
(a) after expanding Arad, Sibiu, and Rimnicu Vilcea; the best leaf Pitesti (417) exceeds the best alternative Fagaras (415);
(b) after unwinding back to Sibiu and expanding Fagaras; its best leaf (450) now exceeds the stored alternative 417;
(c) after switching back to Rimnicu Vilcea and expanding Pitesti; the solution via Bucharest (418) is found]


RBFS is optimal and complete under the same
assumptions as A*.
IDA* and RBFS suffer from using too little
memory.

We would like to use as much memory as possible.
This leads to SMA*.


SMA*: an extension of A* which only needs a
limited amount of memory.

If there is no space left but nodes have to be
expanded, nodes are removed from the queue:
those with the greatest f-value (forgotten
nodes). But their f-costs are stored. Later
those nodes are considered again if all other paths
lead to higher costs.


[Figure: SMA* on a small binary tree with memory for three nodes; each node shows g+h = f. The f-values of forgotten nodes (in parentheses) are backed up to the parent, e.g. 13(15), 15(24), 20(24)]

function SMA*(problem) returns a solution sequence
  inputs: problem, a problem
  static: Queue, a queue of nodes ordered by f-cost

  Queue ← MAKE-QUEUE({MAKE-NODE(INITIAL-STATE[problem])})
  loop do
      if Queue is empty then return failure
      n ← deepest least-f-cost node in Queue
      if GOAL-TEST(n) then return success
      s ← NEXT-SUCCESSOR(n)
      if s is not a goal and is at maximum depth then
          f(s) ← ∞
      else
          f(s) ← MAX(f(n), g(s) + h(s))
      if all of n's successors have been generated then
          update n's f-cost and those of its ancestors if necessary
      if SUCCESSORS(n) all in memory then remove n from Queue
      if memory is full then
          delete shallowest, highest-f-cost node in Queue
          remove it from its parent's successor list
          insert its parent on Queue if necessary
      insert s on Queue
  end


Theorem 29 (Properties of SMA∗ )


SMA∗ is complete if enough memory is available to
store the shortest solution-path.
SMA∗ is optimal if there is enough memory to store
the optimal solution-path.


2.7 Iterative improvements


Idea:
For certain problems only the actual state is
important, not the path leading to it.

[Figure: a one-dimensional state-space landscape; the x-axis shows the current state, the y-axis the evaluation]

Of course such a problem can still be as difficult as you like!



Hill-climbing: gradient-ascent (or -descent). Move in a
direction at random. Compare the new
evaluation with the old one. Move to the
new point if the evaluation is better.

Problems:
local optima (getting stuck),
plateaux (wandering around),
ridges (detours).


function HILL-CLIMBING(problem) returns a solution state
  inputs: problem, a problem
  static: current, a node
          next, a node

  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
      next ← a highest-valued successor of current
      if VALUE[next] < VALUE[current] then return current
      current ← next
  end
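The loop above can be sketched on an explicit toy landscape; the values, neighbour function and names below are illustrative assumptions. Note how the search gets stuck in a local maximum:

```python
# A sketch of hill-climbing on an explicit 1-D landscape.
def hill_climbing(value, neighbours, start):
    current = start
    while True:
        best = max(neighbours(current), key=value, default=current)
        if value(best) <= value(current):
            return current                 # local maximum reached
        current = best

# Local maximum at x = 2, global maximum at x = 7.
landscape = {0: 1, 1: 3, 2: 4, 3: 2, 4: 1, 5: 5, 6: 8, 7: 9, 8: 6}
nbrs = lambda x: [n for n in (x - 1, x + 1) if n in landscape]
```

Started left of the valley the climber stops at the local maximum; started right of it, it reaches the global one.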


Simulated annealing: modified hill-climbing: bad
moves (smaller evaluation-value) are also
allowed, with the small probability

    e^(−|ΔEval|/T).

It depends on the “temperature” T,
ranging from ∞ to 0.


function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to “temperature”
  static: current, a node
          next, a node
          T, a “temperature” controlling the probability of downward steps

  current ← MAKE-NODE(INITIAL-STATE[problem])
  for t ← 1 to ∞ do
      T ← schedule[t]
      if T = 0 then return current
      next ← a randomly selected successor of current
      ΔE ← VALUE[next] − VALUE[current]
      if ΔE > 0 then current ← next
      else current ← next only with probability e^(ΔE/T)
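A Python sketch of the same loop; the cooling schedule, toy landscape, neighbour function and fixed seed are illustrative assumptions:

```python
# A sketch of simulated annealing on a toy 1-D landscape.
import math
import random

def simulated_annealing(value, neighbours, start, schedule, seed=1):
    rng = random.Random(seed)              # fixed seed for reproducibility
    current = start
    for T in schedule:
        if T == 0:
            return current
        nxt = rng.choice(neighbours(current))
        delta = value(nxt) - value(current)
        # accept improvements; accept bad moves with probability e^(ΔE/T)
        if delta > 0 or rng.random() < math.exp(delta / T):
            current = nxt
    return current

landscape = {0: 1, 1: 3, 2: 4, 3: 2, 4: 1, 5: 5, 6: 8, 7: 9, 8: 6}
nbrs = lambda x: [n for n in (x - 1, x + 1) if n in landscape]
schedule = [2.0 * 0.95 ** t for t in range(200)]   # geometric cooling
```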


function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual

  repeat
      new_population ← empty set
      loop for i from 1 to SIZE(population) do
          x ← RANDOM-SELECTION(population, FITNESS-FN)
          y ← RANDOM-SELECTION(population, FITNESS-FN)
          child ← REPRODUCE(x, y)
          if (small random probability) then child ← MUTATE(child)
          add child to new_population
      population ← new_population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
  inputs: x, y, parent individuals

  n ← LENGTH(x)
  c ← random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c+1, n))
, ))



[Figure: the genetic algorithm on 8-queens strings:
(a) Initial Population   (b) Fitness Function   (c) Selection   (d) Crossover   (e) Mutation
24748552  24  31%   32752411   32748552   32748152
32752411  23  29%   24748552   24752411   24752411
24415124  20  26%   32752411   32752124   32252124
32543213  11  14%   24415124   24415411   24415417]

Artificial Intelligence – Prof. Dr. Jürgen Dix (176/554)


2. Searching 7. Iterative improvements

(Figure: the crossover step — two parent 8-queens states are combined
into an offspring state.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (177/554)


2. Searching 7. Iterative improvements

Up to now: offline search. What about

interleaving actions and computation?

Remember the exploration problem.

(Figure: a 3×3 maze; the agent starts at S = (1,1) and has to reach
G = (3,3).)

Artificial Intelligence – Prof. Dr. Jürgen Dix (178/554)


2. Searching 7. Iterative improvements

Definition 30 (Competitive Ratio)


The competitive ratio of an online search problem is
the costs of the path (leading to a goal state) taken by
the agent divided by the costs of the optimal path.

Artificial Intelligence – Prof. Dr. Jürgen Dix (179/554)


2. Searching 7. Iterative improvements

The competitive ratio can be infinite.

(Figure: two state spaces that look identical from S; whichever one the
agent chooses to explore first, an adversary can place it in the other,
leading it into a dead end.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (180/554)


2. Searching 7. Iterative improvements

The competitive ratio can be unbounded (even for


reversible actions).

(Figure: a state space in which an adversary can keep the agent
exploring arbitrarily long dead ends between S and G, even though all
actions are reversible.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (181/554)


2. Searching 7. Iterative improvements

Artificial Intelligence – Prof. Dr. Jürgen Dix (182/554)


2. Searching 7. Iterative improvements

Why not simply taking random walks?

(Figure: a state space in which a random walk takes exponentially many
steps from S to G, because in every state a step back towards S is more
likely than a step towards G.)

Because they can lead to exponentially many steps.


Will a random walk eventually find the goal
(completeness)?

Yes, if the state space is finite;
yes, for 2-dimensional grids;
only with probability ≈ 0.34 for 3-dimensional grids.

Artificial Intelligence – Prof. Dr. Jürgen Dix (183/554)


2. Searching 7. Iterative improvements

Hill-climbing + memory = LRTA∗

(Figure: five iterations (a)–(e) of LRTA∗ on a one-dimensional state
space; all edge costs are 1. The stored cost estimates of the visited
states — initially 8, 9, 2, 2, 4, 3 — are raised step by step until the
agent escapes the local minimum.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (184/554)
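The idea — hill-climbing whose cost estimates are stored in memory and updated as the agent moves — can be sketched as follows. This is a minimal illustration on an invented graph, not the lecture's formulation; the update rule sets H(s) to the minimum of step cost plus the successor's current estimate.

```python
def lrta_star(start, goal, neighbours, cost, h, max_steps=1000):
    """Minimal LRTA* sketch: update the estimate of the current state,
    then move to the apparently best successor."""
    H = {}  # learned cost-to-goal estimates, initialised from h on demand
    s = start
    for _ in range(max_steps):
        if s == goal:
            return True, H
        # f(s') = step cost + current estimate of successor s'
        f = {n: cost(s, n) + H.get(n, h(n)) for n in neighbours(s)}
        H[s] = min(f.values())            # raise the estimate of s
        s = min(sorted(f), key=f.get)     # move to the best successor
    return False, H

# invented example: a line of states 0..5 with goal 5, all step costs 1
neighbours = lambda s: [x for x in (s - 1, s + 1) if 0 <= x <= 5]
cost = lambda s, t: 1
h = lambda s: 0                           # trivially admissible heuristic
reached, H = lrta_star(0, 5, neighbours, cost, h)
```

In a finite state space with a reachable goal, the stored estimates can only grow, so the agent cannot loop forever — this is the completeness argument behind LRTA∗.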


2. Searching 7. Iterative improvements

Artificial Intelligence – Prof. Dr. Jürgen Dix (185/554)


3. Supervised Learning

3. Supervised Learning

Artificial Intelligence – Prof. Dr. Jürgen Dix (186/554)


3. Supervised Learning

3.1 Basics
3.2 Decision Trees
3.3 Ensemble Learning
3.4 PL1 formalisations
3.5 PAC learning
3.6 Noise and overfitting

Artificial Intelligence – Prof. Dr. Jürgen Dix (187/554)


3. Supervised Learning

Learning in general
Hitherto: An agent’s intelligence is in its program, it
is hard-wired.
Now: We want a more autonomous agent, which
should learn through percepts
(experiences) to know its environment.
Important: This matters especially when the domain in
which the agent acts cannot be described completely.

Artificial Intelligence – Prof. Dr. Jürgen Dix (188/554)


3. Supervised Learning 1. Basics

3.1 Basics

Artificial Intelligence – Prof. Dr. Jürgen Dix (189/554)


3. Supervised Learning 1. Basics

(Figure: the general model of a learning agent. The sensors feed the
performance element and the critic; the critic compares the outcome of
actions with an external performance standard and gives feedback to the
learning element; the learning element modifies the performance element
(knowledge) and sets learning goals for the problem generator, which
proposes exploratory actions; the performance element controls the
effectors.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (190/554)


3. Supervised Learning 1. Basics

Performance element: Observes and chooses actions.
This was the whole agent up to now.
Critic: Observes the result of an action and assesses it
with respect to an external standard. (Why external? Otherwise the
agent would set its own standard so low that it always holds!)
Learning element: Modifies the performance element by
considering the critic’s feedback (and the inner architecture of the
performance element). It also creates new goals (they
improve the understanding of the effects of actions).
Problem generator: Proposes the execution of actions to
satisfy the goals of the learning element. These do not have
to be the “best” actions (from the performance element’s point
of view), but they should be informative and deliver new
knowledge about the world.
Example: driving a taxi
Artificial Intelligence – Prof. Dr. Jürgen Dix (191/554)
3. Supervised Learning 1. Basics

Question:
What does learning (the design of the learning
element) really depend on?

1. Which components of the performance element
   should be improved?
2. How are these components represented?
   ( Section 199 (decision trees), Section 222
   (ensemble learning), Section 231 (domains
   formalised in PL1))

Artificial Intelligence – Prof. Dr. Jürgen Dix (192/554)


3. Supervised Learning 1. Basics

3. Which feedback is available?
Supervised learning: The execution of an incorrect
action leads to the “right” solution as feedback (e.g.
how intensively should the brakes be used?).
Driving instructor.
Reinforcement learning: Only the result is perceived.
The critic tells whether it was good or bad, but not what
would have been right.
Unsupervised learning: No hints about the right
actions.

4. Which a priori information is there?
(Often there is useful background knowledge.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (193/554)


3. Supervised Learning 1. Basics

Components of the performance element:

1. Mappings from the conditions of the actual state
   into the set of actions
2. Deriving relevant properties of the world from the
   percepts
3. Information about the development of the world
4. Information about the sequence of actions
5. An assessment function of the world states
6. An assessment function concerning the quality of
   single actions in one state
7. A description of state classes which maximise the
   utility function of an agent
Artificial Intelligence – Prof. Dr. Jürgen Dix (194/554)
3. Supervised Learning 1. Basics

Important:
All these components are – from a mathematical
point of view – mappings.

Learning means to represent these mappings.

Artificial Intelligence – Prof. Dr. Jürgen Dix (195/554)


3. Supervised Learning 1. Basics

Inductive learning: Given are pairs (x, y).
Find f with f(x) = y.

Example 31 (Continue a series of numbers)

Which number is next?
3, 5, 7, ?
3, 5, 17, 257, ?

(Figure: the same data points fitted by four different hypotheses
(a)–(d) of increasing complexity — all of them are consistent with the
examples.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (196/554)


3. Supervised Learning 1. Basics

Simple reflex agent:

global examples ← { }

function REFLEX-PERFORMANCE-ELEMENT(percept) returns an action

   if (percept, a) in examples then return a
   else
       h ← INDUCE(examples)
       return h(percept)

procedure REFLEX-LEARNING-ELEMENT(percept, action)

   inputs: percept, feedback percept
           action, feedback action

   examples ← examples ∪ {(percept, action)}

Artificial Intelligence – Prof. Dr. Jürgen Dix (197/554)


3. Supervised Learning 1. Basics

Example 32 (Wason’s Test; Verify and Falsify)


Consider a set of cards. Each card has a letter printed on one
side and a number on the other. Having taken a look at some of
these cards you formulate the following hypothesis:

If there is a vowel on one side then there is an even number


on the other.

Now there are the following cards on the table:

A T 4 7.

You are allowed to turn over a maximum of two cards to check
the hypothesis.
Which card(s) do you flip?

Artificial Intelligence – Prof. Dr. Jürgen Dix (198/554)


3. Supervised Learning 2. Decision trees

3.2 Decision trees

Artificial Intelligence – Prof. Dr. Jürgen Dix (199/554)


3. Supervised Learning 2. Decision trees

Decision trees represent boolean functions.


Small example:
You plan to go out for dinner and arrive at a
restaurant. Should you wait for a free table or should
you move on?

Artificial Intelligence – Prof. Dr. Jürgen Dix (200/554)


3. Supervised Learning 2. Decision trees

(Figure: a decision tree for deciding whether to wait for a table:

Patrons?
  None: No
  Some: Yes
  Full: WaitEstimate?
    >60: No
    30–60: Alternate?
      No: Reservation?
        No: Bar?  (No: No, Yes: Yes)
        Yes: Yes
      Yes: Fri/Sat?  (No: No, Yes: Yes)
    10–30: Hungry?
      No: Yes
      Yes: Alternate?
        No: Yes
        Yes: Raining?  (No: No, Yes: Yes)
    0–10: Yes )


Artificial Intelligence – Prof. Dr. Jürgen Dix (201/554)
3. Supervised Learning 2. Decision trees

decision tree = conjunction of implications
(implication = path leading to a leaf)

For all restaurants r:

(Patrons(r, Full) ∧ Wait_estimate(r, 10−30) ∧ ¬Hungry(r)) −→ Will_Wait(r)

Attention:
This is written in first order logic, but a decision tree talks only
about a single object (r above). So this is propositional logic:

Patrons_r(Full) ∧ Wait_estimate_r(10−30) ∧ ¬Hungry_r −→ Will_Wait_r

Artificial Intelligence – Prof. Dr. Jürgen Dix (202/554)


3. Supervised Learning 2. Decision trees

Question:
Can every boolean function be represented by a
decision tree?
Answer:
Yes! Each row of the table describing the function
belongs to one path in the tree.

Attention
Decision trees can be much smaller! But there are
boolean functions which can only be represented by
trees of exponential size, e.g. the parity function:

par(x1, . . . , xn) := 1, if ∑_{i=1}^n xi is even; 0, else.
Artificial Intelligence – Prof. Dr. Jürgen Dix (203/554)
3. Supervised Learning 2. Decision trees

A variant of decision-trees is the...


Example 33 (Decision List)
All attributes are boolean. A decision list is a chain of tests:

Test1 --no--> Test2 --no--> . . . --no--> Test_e --no--> Answer_{e+1}
  | yes         | yes                       | yes
Answer1       Answer2                    Answer_e

with Answer_i ∈ {Yes, No} and Test_i a conjunction of
(possibly negated) attributes ( Exercise: compare
decision trees and decision lists).
k-DL(n) is the set of boolean functions with n
attributes which can be represented by decision
lists with a maximum of k checks in each test.
Artificial Intelligence – Prof. Dr. Jürgen Dix (204/554)
3. Supervised Learning 2. Decision trees

PAC-Learning

(Figure: a decision list for the restaurant domain:
if Patrons(x, Some) then Yes;
else if Patrons(x, Full) ∧ Fri/Sat(x) then Yes;
else No.)

Obviously:
n-DL(n) = set of all boolean functions over n attributes,
card(n-DL(n)) = 2^(2^n).

Artificial Intelligence – Prof. Dr. Jürgen Dix (205/554)


3. Supervised Learning 2. Decision trees

Question:
How should decision trees be learned? Consider the following table
of examples:

Example | Alt Bar Fri Hun Pat Price Rain Res Type Est | WillWait (goal)
X1 Yes No No Yes Some $$$ No Yes French 0–10 Yes
X2 Yes No No Yes Full $ No No Thai 30–60 No
X3 No Yes No No Some $ No No Burger 0–10 Yes
X4 Yes No Yes Yes Full $ No No Thai 10–30 Yes
X5 Yes No Yes No Full $$$ No Yes French >60 No
X6 No Yes No Yes Some $$ Yes Yes Italian 0–10 Yes
X7 No Yes No No None $ Yes No Burger 0–10 No
X8 No No No Yes Some $$ Yes Yes Thai 0–10 Yes
X9 No Yes Yes No Full $ Yes No Burger >60 No
X10 Yes Yes Yes Yes Full $$$ No Yes Italian 10–30 No
X11 No No No No None $ No No Thai 0–10 No
X12 Yes Yes Yes Yes Full $ No No Burger 30–60 Yes


Artificial Intelligence – Prof. Dr. Jürgen Dix (206/554)


3. Supervised Learning 2. Decision trees

The set of examples-to-be-learned is called training


set. Examples can be evaluated positively (attribute
holds) or negatively (attribute does not hold).
Trivial solution of learning
The paths in the tree are exactly the examples.

Disadvantage:
New cases can not be considered.

Artificial Intelligence – Prof. Dr. Jürgen Dix (207/554)


3. Supervised Learning 2. Decision trees

Idea:
Choose the simplest tree (or rather the most
general) which is compatible with all examples.

Ockham’s razor: Entia non sunt multiplicanda praeter
necessitatem. (“Entities are not to be multiplied beyond
necessity.”)

Artificial Intelligence – Prof. Dr. Jürgen Dix (208/554)


3. Supervised Learning 2. Decision trees

Example 34 (Guess!)
A computer program encodes triples of numbers with respect to
a certain rule. Find out that rule.

You enter triples (x1 , x2 , x3 ) of your choice (xi ∈ N) and get as


answers “yes” or “no”.
Simplification: At the beginning the program tells you that these
triples are in the set:

(4, 6, 8), (6, 8, 12), (20, 22, 40)

Your task:
Make more enquiries (approx. 10) and try to find out the rule.

Artificial Intelligence – Prof. Dr. Jürgen Dix (209/554)


3. Supervised Learning 2. Decision trees

The idea behind the learning algorithm


Goal: A tree which is as small as possible. First test
the most important attributes (in order to get a quick
classification).
This will be formalised later, using information theory.
Then proceed recursively, i.e. with decreasing
amounts of examples and attributes.

Artificial Intelligence – Prof. Dr. Jürgen Dix (210/554)


3. Supervised Learning 2. Decision trees

We distinguish the following cases:

1. There are positive and negative examples.
   Choose the best attribute.
2. There are only positive or only negative
   examples. Done – a solution has been found.
3. There are no more examples. Then a default
   value has to be chosen, e.g. the majority of the
   examples of the parent node.
4. There are positive and negative examples, but
   no more attributes. Then the basic set of
   attributes does not suffice, and a decision cannot be
   made. Not enough information is given.
Artificial Intelligence – Prof. Dr. Jürgen Dix (211/554)
3. Supervised Learning 2. Decision trees

function DECISION-TREE-LEARNING(examples, attributes, default) returns a decision tree

   inputs: examples, set of examples
           attributes, set of attributes
           default, default value for the goal predicate

   if examples is empty then return default
   else if all examples have the same classification then return the classification
   else if attributes is empty then return MAJORITY-VALUE(examples)
   else
       best ← CHOOSE-ATTRIBUTE(attributes, examples)
       tree ← a new decision tree with root test best
       for each value v_i of best do
           examples_i ← {elements of examples with best = v_i}
           subtree ← DECISION-TREE-LEARNING(examples_i, attributes − {best},
                                            MAJORITY-VALUE(examples))
           add a branch to tree with label v_i and subtree subtree
       end
       return tree

Artificial Intelligence – Prof. Dr. Jürgen Dix (212/554)
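A compact Python version of this algorithm may make the recursion concrete. It is a sketch, not the lecture's code: CHOOSE-ATTRIBUTE is realised via information gain (defined formally a few slides later), and the tiny training set at the bottom is invented for illustration.

```python
import math
from collections import Counter

def majority(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]

def entropy(examples):
    n = len(examples)
    counts = Counter(label for _, label in examples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(examples, a):
    """Information gained by splitting the examples on attribute a."""
    rem = 0.0
    for v in {attrs[a] for attrs, _ in examples}:
        subset = [e for e in examples if e[0][a] == v]
        rem += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - rem

def dtl(examples, attributes, default):
    if not examples:
        return default
    if len({label for _, label in examples}) == 1:
        return examples[0][1]                       # same classification
    if not attributes:
        return majority(examples)
    best = max(attributes, key=lambda a: gain(examples, a))
    tree = (best, {})
    for v in {attrs[best] for attrs, _ in examples}:
        subset = [e for e in examples if e[0][best] == v]
        tree[1][v] = dtl(subset, attributes - {best}, majority(examples))
    return tree

def classify(tree, attrs):
    while isinstance(tree, tuple):
        a, branches = tree
        tree = branches[attrs[a]]
    return tree

# invented toy data in the style of the restaurant domain
data = [({'Pat': 'Some', 'Hun': 'Yes'}, 'Yes'),
        ({'Pat': 'Full', 'Hun': 'Yes'}, 'Yes'),
        ({'Pat': 'None', 'Hun': 'Yes'}, 'No'),
        ({'Pat': 'Full', 'Hun': 'No'},  'No'),
        ({'Pat': 'Some', 'Hun': 'No'},  'Yes')]
tree = dtl(data, {'Pat', 'Hun'}, 'No')
```

Since the toy data is consistent and the two attributes suffice to separate it, the learned tree classifies every training example correctly.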


3. Supervised Learning 2. Decision trees

(Figure: the tree learned from the 12 examples:

Patrons?
  None: No
  Some: Yes
  Full: Hungry?
    No: No
    Yes: Type?
      French: Yes
      Italian: No
      Thai: Fri/Sat?  (No: No, Yes: Yes)
      Burger: Yes )


Artificial Intelligence – Prof. Dr. Jürgen Dix (213/554)
3. Supervised Learning 2. Decision trees

The algorithm computes a tree which is as small as
possible and consistent with the given examples.

Question:
How good is the generated tree? How different is it
from the “actual” tree? Is there an
a priori estimation? ( Section 247).

Artificial Intelligence – Prof. Dr. Jürgen Dix (214/554)


3. Supervised Learning 2. Decision trees

Empiric approach:
1. Choose a set of examples MEx.
2. Divide it into two sets: MEx = MTrai ∪ MTest.
3. Apply the learning algorithm to MTrai and get a
   hypothesis H.
4. Calculate the proportion of correctly classified
   elements of MTest.
5. Repeat 1.–4. for many MTrai ∪ MTest with randomly
   generated MTrai.

Attention: peeking!

Artificial Intelligence – Prof. Dr. Jürgen Dix (215/554)


3. Supervised Learning 2. Decision trees

(Figure: a learning curve for DECISION-TREE-LEARNING on the
restaurant data — the proportion correct on the test set rises from
about 0.4 towards 1 as the training set size grows from 0 to 100.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (216/554)


3. Supervised Learning 2. Decision trees

Information theory
Question:
How do we choose the “best” attribute? The best attribute is the one
that delivers the highest amount of information.

Example: flipping a coin

Definition 35 (1 bit, information)
1 bit is the information contained in the outcome of flipping a
(fair) coin.
More generally: assume there is an experiment with n possible
outcomes v1, . . . , vn. Each outcome vi will result with a
probability of P(vi). The information encoded in this result (the
outcome of the experiment) is defined as follows:

I(P(v1), . . . , P(vn)) := ∑_{i=1}^n −P(vi) log2 P(vi)

Artificial Intelligence – Prof. Dr. Jürgen Dix (217/554)
3. Supervised Learning 2. Decision trees

Assume the coin is manipulated. With a probability of


90% head will come out. Then

I(0.1, 0.9) = . . . ≈ 0.47

Artificial Intelligence – Prof. Dr. Jürgen Dix (218/554)
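The omitted calculation is easy to reproduce (a quick check in Python; `I` is the information function from Definition 35):

```python
import math

def I(*probs):
    """Information content of an experiment with the given outcome
    probabilities (Definition 35); terms with probability 0 contribute 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair = I(0.5, 0.5)      # flipping a fair coin: exactly 1 bit
biased = I(0.1, 0.9)    # the manipulated coin: less information
```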


3. Supervised Learning 2. Decision trees

Question:
For each attribute A: If this attribute is evaluated with
respect to the actual training-set, how much
information will be gained this way?

The “best” attribute is the one with the highest


gain of information!

Artificial Intelligence – Prof. Dr. Jürgen Dix (219/554)


3. Supervised Learning 2. Decision trees

Definition 36 (Gain of Information)

We gain the following information by testing the attribute A:

Gain(A) = I( p/(p+n), n/(p+n) ) − Missing_Inf(A)

with

Missing_Inf(A) = ∑_{i=1}^ν (pi + ni)/(p + n) · I( pi/(pi+ni), ni/(pi+ni) )

Choose_Attribute chooses the A with maximal Gain(A).

Artificial Intelligence – Prof. Dr. Jürgen Dix (220/554)
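With these definitions, Gain(Patrons) for the 12 restaurant examples can be checked numerically. This is a sketch; the counts (p_i, n_i) per attribute value are read off the table of examples.

```python
import math

def I2(p, n):
    """Information of a set with p positive and n negative examples."""
    total = p + n
    return sum(-x / total * math.log2(x / total) for x in (p, n) if x > 0)

def gain(p, n, splits):
    """Gain(A) = I(p,n) minus the information still missing after
    splitting into subsets with (p_i, n_i) examples each."""
    missing = sum((pi + ni) / (p + n) * I2(pi, ni) for pi, ni in splits)
    return I2(p, n) - missing

# Patrons? splits the 12 examples into None (0+,2-), Some (4+,0-), Full (2+,4-)
gain_patrons = gain(6, 6, [(0, 2), (4, 0), (2, 4)])
# Type? splits into French (1+,1-), Italian (1+,1-), Thai (2+,2-), Burger (2+,2-)
gain_type = gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)])
```

This confirms Gain(Patrons) ≈ 0.54, and shows that Type? gains nothing at all: every branch keeps the same half-and-half mix as the whole set.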


3. Supervised Learning 2. Decision trees

(Figure: splitting the 12 examples — positive: 1 3 4 6 8 12, negative:
2 5 7 9 10 11 — by two attributes:
(a) Type? — French: {1 | 5}, Italian: {6 | 10}, Thai: {4 8 | 2 11},
Burger: {3 12 | 7 9};
(b) Patrons? — None: { | 7 11}, Some: {1 3 6 8 | }, Full: {4 12 | 2 5 9 10},
where the Full branch is split further by Hungry? — No: { | 5 9},
Yes: {4 12 | 2 10}.)

The figure implies Gain(Patrons) ≈ 0.54. Calculate
Gain(Type), Gain(Hungry) (Hungry as the first
attribute), and Gain(Hungry) (with predecessor Patrons).
Artificial Intelligence – Prof. Dr. Jürgen Dix (221/554)
3. Supervised Learning 3. Ensemble Learning

3.3 Ensemble Learning

Artificial Intelligence – Prof. Dr. Jürgen Dix (222/554)


3. Supervised Learning 3. Ensemble Learning

So far:
A single hypothesis is used to make predictions.

Idea:
Let’s take a whole bunch of ’em (an ensemble).

Motivation:
Among several hypotheses, use majority voting.
The misclassified ones get higher weights!

Artificial Intelligence – Prof. Dr. Jürgen Dix (223/554)


3. Supervised Learning 3. Ensemble Learning

Consider three simple hypotheses. None of them is perfect.
But taken together, a new hypothesis is created (which is
not constructible by the original method).


(Figure: three linear hypotheses over a set of positive and negative
examples; none of them separates the classes perfectly, but their
combination — the triangular region on which all three agree —
classifies all examples correctly and cannot be expressed by a single
linear hypothesis.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (224/554)
3. Supervised Learning 3. Ensemble Learning

(Figure: boosting — hypotheses h1, . . . , h4 are trained on
successively reweighted training sets and combined into the final
ensemble hypothesis h.)
Artificial Intelligence – Prof. Dr. Jürgen Dix (225/554)
3. Supervised Learning 3. Ensemble Learning

Weighted training set: Each example gets a weight
wj ≥ 0.
Initialisation: All weights are set to 1.
Boosting: Misclassified examples get higher
weights.
Iterate: We obtain new hypotheses hi. After we have a
certain number M of them, we feed them
into the
Boosting algorithm: It creates a weighted ensemble
hypothesis.

Artificial Intelligence – Prof. Dr. Jürgen Dix (226/554)


3. Supervised Learning 3. Ensemble Learning

Artificial Intelligence – Prof. Dr. Jürgen Dix (227/554)
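The AdaBoost scheme just described can be sketched in Python with decision "stumps" (single-threshold rules) as weak hypotheses. This is a minimal illustration on an invented one-dimensional data set, not the lecture's formulation.

```python
import math

def best_stump(xs, ys, w):
    """Weak learner: threshold rule minimising the weighted error."""
    thresholds = [x - 0.5 for x in sorted(set(xs))] + [max(xs) + 0.5]
    best = None
    for t in thresholds:
        for pol in (+1, -1):
            h = lambda x, t=t, pol=pol: pol if x >= t else -pol
            err = sum(wi for x, y, wi in zip(xs, ys, w) if h(x) != y)
            if best is None or err < best[0]:
                best = (err, h)
    return best

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []                              # list of (alpha, hypothesis)
    for _ in range(rounds):
        err, h = best_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against err = 0 or 1
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # boost the weights of misclassified examples, then normalise
        w = [wi * math.exp(-alpha * y * h(x)) for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

# invented data no single stump can fit: + + - - + +
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
H = adaboost(xs, ys)
mistakes = sum(H(x) != y for x, y in zip(xs, ys))
```

No single threshold rule classifies this labelling correctly, but the weighted vote of a few stumps does — the effect stated in Theorem 37 below.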


3. Supervised Learning 3. Ensemble Learning

Theorem 37 (Effect of boosting)


Suppose the learning algorithm has the following
property: it always returns a hypothesis whose weighted
error is slightly better than random guessing.
Then AdaBOOST will return a hypothesis classifying
the training data perfectly for a large enough M.

Artificial Intelligence – Prof. Dr. Jürgen Dix (228/554)


3. Supervised Learning 3. Ensemble Learning

(Figure: boosted decision stumps vs. a single decision stump on the
restaurant data — the boosted version reaches a clearly higher
proportion of correctly classified test examples as the training set
size grows to 100.)
Artificial Intelligence – Prof. Dr. Jürgen Dix (229/554)
3. Supervised Learning 3. Ensemble Learning

(Figure: training and test accuracy as a function of the number of
hypotheses M — the training error reaches zero, while the test
accuracy keeps improving even beyond that point.)
Artificial Intelligence – Prof. Dr. Jürgen Dix (230/554)
3. Supervised Learning 4. PL1 formalisations

3.4 PL1 formalisations

Artificial Intelligence – Prof. Dr. Jürgen Dix (231/554)


3. Supervised Learning 4. PL1 formalisations

Goal:
A more general framework.

Idea:
To learn means to search in the hypotheses space
( planning).

Goal predicate:
Q(x), one-dimensional (hitherto: Will_Wait)

We seek a definition of Q(x), i.e. a formula C(x) with

∀x (Q(x) ↔ C(x))
Artificial Intelligence – Prof. Dr. Jürgen Dix (232/554)
3. Supervised Learning 4. PL1 formalisations

Each example Xi represents a set of conditions under
which Q(Xi) holds or not. We look for an explanation:
a formula C(x) which uses all predicates of the
examples.

∀r Will_Wait(r) ↔
      Patrons(r, Some)
   ∨ (Patrons(r, Full) ∧ ¬Hungry(r) ∧ Type(r, French))
   ∨ (Patrons(r, Full) ∧ ¬Hungry(r) ∧ Type(r, Thai) ∧ Fri_Sat(r))
   ∨ (Patrons(r, Full) ∧ ¬Hungry(r) ∧ Type(r, Burger))
Artificial Intelligence – Prof. Dr. Jürgen Dix (233/554)


3. Supervised Learning 4. PL1 formalisations

Definition 38 (Hypothesis, Candidate Function)

A formula Ci(x) with ∀x (Q(x) ↔ Ci(x)) is called a
candidate function. The whole formula is called a
hypothesis:

Hi : ∀x (Q(x) ↔ Ci(x))

Definition 39 (Hypotheses Space H)
The hypotheses space H of a learning algorithm is
the set of those hypotheses the algorithm can create.
The extension of a hypothesis H with respect to the
goal predicate Q is the set of examples for which H
holds.
Artificial Intelligence – Prof. Dr. Jürgen Dix (234/554)
3. Supervised Learning 4. PL1 formalisations

Attention:
The combination of hypotheses with different
extensions leads to inconsistency.

Hypotheses with the same extension are logically


equivalent.

Artificial Intelligence – Prof. Dr. Jürgen Dix (235/554)


3. Supervised Learning 4. PL1 formalisations

The situation in general:


We have a set of examples {X1 , . . . , Xn }. We describe each
example X through a clause and the declaration Q (x )
(holds) or ¬Q (x ) (does not hold).

Example | Alt Bar Fri Hun Pat Price Rain Res Type Est | WillWait (goal)
X1 Yes No No Yes Some $$$ No Yes French 0–10 Yes
X2 Yes No No Yes Full $ No No Thai 30–60 No
X3 No Yes No No Some $ No No Burger 0–10 Yes
X4 Yes No Yes Yes Full $ No No Thai 10–30 Yes
X5 Yes No Yes No Full $$$ No Yes French >60 No
X6 No Yes No Yes Some $$ Yes Yes Italian 0–10 Yes
X7 No Yes No No None $ Yes No Burger 0–10 No
X8 No No No Yes Some $$ Yes Yes Thai 0–10 Yes
X9 No Yes Yes No Full $ Yes No Burger >60 No
X10 Yes Yes Yes Yes Full $$$ No Yes Italian 10–30 No
X11 No No No No None $ No No Thai 0–10 No
X12 Yes Yes Yes Yes Full $ No No Burger 30–60 Yes

Artificial Intelligence – Prof. Dr. Jürgen Dix (236/554)


3. Supervised Learning 4. PL1 formalisations

X1 is defined through

Alt(X1) ∧ ¬Bar(X1) ∧ ¬Fri_Sat(X1) ∧ Hungry(X1) ∧ . . .

and Will_Wait(X1).

Note that H is the set of hypotheses as defined in
Definition 38. While it corresponds to decision
trees, it is not the same.
The training set is the set of all those clauses.

Artificial Intelligence – Prof. Dr. Jürgen Dix (237/554)


3. Supervised Learning 4. PL1 formalisations

We search for a hypothesis which is consistent with the


training set.

Question:
Under which conditions is a hypothesis H inconsistent with an
example X ?

false negative: The hypothesis says no (¬Q(X)), but Q(X) really
holds.
false positive: The hypothesis says yes (Q(X)), but ¬Q(X) really
holds.

Attention:
Inductive learning in logic-based domains means to restrict the
set of possible hypotheses with every example.

Artificial Intelligence – Prof. Dr. Jürgen Dix (238/554)


3. Supervised Learning 4. PL1 formalisations

For first order logic: H is in general infinite, thus


automatic theorem proving is way too general.
1st approach: We keep one hypothesis and modify it
if the examples are inconsistent with it.
2nd approach: We keep the whole subspace that is
still consistent with the examples (version
space). This is effectively represented by
two sets (analogical to the representation
of a range of real numbers by [a, b]).

Artificial Intelligence – Prof. Dr. Jürgen Dix (239/554)


3. Supervised Learning 4. PL1 formalisations

1st approach: current-best-hypothesis search.


Begin with a simple hypothesis H. If a new example
is consistent with H: okay. If it is false negative:
generalise H. If it is false positive: specialise H.

(Figure: current-best-hypothesis search, stages (a)–(e) — the
boundary of the current hypothesis is generalised to include false
negatives and specialised to exclude false positives as new examples
arrive.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (240/554)


3. Supervised Learning 4. PL1 formalisations

This leads to an algorithm:

function CURRENT-BEST-LEARNING(examples) returns a hypothesis

   H ← any hypothesis consistent with the first example in examples
   for each remaining example e in examples do
       if e is false positive for H then
           H ← choose a specialization of H consistent with examples
       else if e is false negative for H then
           H ← choose a generalization of H consistent with examples
       if no consistent specialization/generalization can be found then fail
   end
   return H

Artificial Intelligence – Prof. Dr. Jürgen Dix (241/554)


3. Supervised Learning 4. PL1 formalisations

Question:
How to generalize/specialize?

H1 : ∀x (Q (x ) ↔ C1 (x ))
H2 : ∀x (Q (x ) ↔ C2 (x ))

H1 generalises H2 if C2(x) → C1(x); H1 specialises H2 if
C1(x) → C2(x).
Generalisation means: leave out ∧-elements in a
conjunction, add ∨-elements to a disjunction.
Specialisation means: add ∧-elements to a conjunction,
leave out ∨-elements in a disjunction.

Artificial Intelligence – Prof. Dr. Jürgen Dix (242/554)


3. Supervised Learning 4. PL1 formalisations

2nd approach: version-space.

function VERSION-SPACE-LEARNING(examples) returns a version space

   local variables: V, the version space: the set of all hypotheses

   V ← the set of all hypotheses
   for each example e in examples do
       if V is not empty then V ← VERSION-SPACE-UPDATE(V, e)
   end
   return V

function VERSION-SPACE-UPDATE(V, e) returns an updated version space

   V ← {h ∈ V : h is consistent with e}

Artificial Intelligence – Prof. Dr. Jürgen Dix (243/554)


3. Supervised Learning 4. PL1 formalisations

Problem:
H is a big disjunction H1 ∨ . . . ∨ Hn . How to represent this?

Reminder:
How is the set of real numbers between 0 and 1 represented?
Through the range [0, 1].

To solve our problem:


There is a partial order on H (generalize/specialize). The
borders are defined through
G set: is consistent with all previous examples and there is
no more general hypothesis.
S set: is consistent with all previous examples and there is
no more special hypothesis.

Artificial Intelligence – Prof. Dr. Jürgen Dix (244/554)


3. Supervised Learning 4. PL1 formalisations

On considering new examples the G- and S-sets must be
appropriately modified in VERSION-SPACE-UPDATE.

(Figure: the version space, bounded above by the G-set
(G1, G2, G3, . . . , Gm — the most general hypotheses) and below by the
S-set (S1, . . . , Sn — the most specific ones); every hypothesis more
general than the G-set or more specific than the S-set is inconsistent
with the examples seen so far.)

Artificial Intelligence – Prof. Dr. Jürgen Dix (245/554)


3. Supervised Learning 4. PL1 formalisations

Question:
What could happen?

1. Only one hypothesis remains: hooray!
2. The space collapses: G = ∅ or S = ∅.
3. No examples are left, but several hypotheses remain:
   i.e. we end with a big disjunction.

Artificial Intelligence – Prof. Dr. Jürgen Dix (246/554)


3. Supervised Learning 5. PAC Learning

3.5 PAC Learning

Artificial Intelligence – Prof. Dr. Jürgen Dix (247/554)


3. Supervised Learning 5. PAC Learning

Question:
How big is the distance between the hypothesis H
calculated by the learning algorithm and the real
function f ?
computational learning theory: PAC-learning –
Probably Approximately Correct.
Idea:
If a hypothesis is consistent with a big training set
then it cannot be that wrong.

Artificial Intelligence – Prof. Dr. Jürgen Dix (248/554)


3. Supervised Learning 5. PAC Learning

Question:
How are the training set and test set related?

We assume:

The elements of the training and test set are


taken from the set of all examples with the same
probability.

Artificial Intelligence – Prof. Dr. Jürgen Dix (249/554)


3. Supervised Learning 5. PAC Learning

Definition 40 (error(h))
Let h ∈ H be a hypothesis and f the target (i.e.
to-be-learned) function. We are interested in the set

Diff(f, h) := {x : h(x) ≠ f(x)}.

We denote by error(h) the probability of a randomly
selected example being in Diff(f, h).
For ε > 0 the hypothesis h is called ε
approximatively correct if error(h) ≤ ε holds.

Artificial Intelligence – Prof. Dr. Jürgen Dix (250/554)


3. Supervised Learning 5. PAC Learning

Question:
ε > 0 is given. How many examples must the training
set contain to make sure that the hypothesis created
by a learning algorithm is ε approximatively correct?
Question is wrongly stated!

Artificial Intelligence – Prof. Dr. Jürgen Dix (251/554)


3. Supervised Learning 5. PAC Learning

Different sets of examples lead to different
propositions and with them to different values of ε. It
depends not only on how many but also on which
examples are chosen.
For this reason we reformulate our question more
carefully.
Question: More carefully and precisely stated.
ε > 0 and δ > 0 given. How many examples must the
training-set contain to make sure that the hypothesis
computed by a learning-algorithm is ε approximatively
correct with a probability of at least 1 − δ?

Artificial Intelligence – Prof. Dr. Jürgen Dix (252/554)


3. Supervised Learning 5. PAC Learning

We want to abstract from special


learning-algorithms and make a statement about
all possible learning algorithms.
So we assume that every learning-algorithm
calculates a hypothesis that is consistent with all
previous examples.

Artificial Intelligence – Prof. Dr. Jürgen Dix (253/554)


3. Supervised Learning 5. PAC Learning

Definition 41 (Example Complexity)


δ > 0 and ε > 0 are given. The example complexity is
the amount m of examples an arbitrary learning
algorithm needs so that the created hypothesis h is ε
approximatively correct with the probability 1 − δ.

Artificial Intelligence – Prof. Dr. Jürgen Dix (254/554)


3. Supervised Learning 5. PAC Learning

Theorem 42 (Example Complexity)

The example complexity m depends on ε, δ and the
hypotheses space H as follows:

m ≥ (1/ε) · (ln(1/δ) + ln |H|)

Artificial Intelligence – Prof. Dr. Jürgen Dix (255/554)
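For concrete numbers, the bound can be evaluated directly. This is a quick sketch; the choice ε = 0.1, δ = 0.05 and n = 10 boolean attributes is an invented example.

```python
import math

def sample_complexity(eps, delta, ln_H):
    """Smallest integer m with m >= (1/eps) * (ln(1/delta) + ln|H|)."""
    return math.ceil((math.log(1 / delta) + ln_H) / eps)

# all boolean functions of n = 10 attributes: |H| = 2^(2^10),
# so ln|H| = 2^10 * ln 2 -- the exponential blow-up discussed below
m = sample_complexity(eps=0.1, delta=0.05, ln_H=2**10 * math.log(2))
```

Even for only 10 attributes, thousands of examples are needed just for ε-approximative correctness with high probability.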


3. Supervised Learning 5. PAC Learning

(Figure: the hypotheses space H, the target function f and the set
H_bad of “bad” hypotheses whose error exceeds ε.)

Question:
What does the last result mean?

Answer:
The complexity depends on log(|H|). What does this mean
for boolean functions with n arguments?
Artificial Intelligence – Prof. Dr. Jürgen Dix (256/554)
3. Supervised Learning 5. PAC Learning

Answer:
We have log2(|H|) = 2^n. Thus one needs
exponentially many examples, even if one is satisfied
with ε approximative correctness under a certain
probability!

Artificial Intelligence – Prof. Dr. Jürgen Dix (257/554)


3. Supervised Learning 5. PAC Learning

Proof (of the theorem):
Consider hb ∈ H with error(hb) > ε.
What is the probability Pb of hb being consistent with the m
chosen examples? For one single example it is ≤ (1 − ε), by the
definition of error. So:

Pb ≤ (1 − ε)^m

What is the probability P′ that there is any hypothesis hb with
error(hb) > ε which is consistent with all m examples?

P′ ≤ |H_bad| · (1 − ε)^m ≤ |H| · (1 − ε)^m

Artificial Intelligence – Prof. Dr. Jürgen Dix (258/554)


3. Supervised Learning 5. PAC Learning

Proof (continuation):
We want P′ ≤ δ:

|H| · (1 − ε)^m ≤ δ

After some transformations:

(1 − ε)^m ≤ δ/|H|
m · ln(1 − ε) ≤ ln(δ) − ln(|H|)
m ≥ (−1/ln(1 − ε)) · (ln(1/δ) + ln(|H|))
m ≥ (1/ε) · (ln(1/δ) + ln(|H|))

The last line holds because of ln(1 − ε) < −ε.

Artificial Intelligence – Prof. Dr. Jürgen Dix (259/554)


3. Supervised Learning 5. PAC Learning

Essential result:
Learning is never better than looking up in

a table!

Artificial Intelligence – Prof. Dr. Jürgen Dix (260/554)


3. Supervised Learning 5. PAC Learning

1st way out: We ask for a more specialised


hypothesis instead of one that is just
consistent (complexity gets worse).
2nd way out: We give up on learning arbitrary
boolean functions and concentrate on
appropriate subclasses.

Artificial Intelligence – Prof. Dr. Jürgen Dix (261/554)


3. Supervised Learning 5. PAC Learning

We could consider decision lists.

Theorem 43 (Decision Lists can be learned)
Learning functions in k-DL(n) (decision lists with a
maximum of k tests) has a PAC complexity of

m = (1/ε) · (ln(1/δ) + O(n^k log2(n^k))).

Each algorithm which returns a consistent decision
list for a set of examples can be turned into a
PAC learning algorithm, which learns a k-DL(n)
function after a maximum of m examples.

Artificial Intelligence – Prof. Dr. Jürgen Dix (262/554)


3. Supervised Learning 5. PAC Learning

Important estimates (exercises):

|Conj(n, k)| = ∑_{i=0}^k (2n choose i) = O(n^k),
|k-DL(n)| ≤ 3^|Conj(n,k)| · |Conj(n, k)|! ,
|k-DL(n)| ≤ 2^O(n^k · log2(n^k)).

Artificial Intelligence – Prof. Dr. Jürgen Dix (263/554)
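The counts can be evaluated directly. A small Python sketch (function names are ours; C(2n, i) is the binomial coefficient):

```python
from math import comb, lgamma, log

def conj_count(n, k):
    # |Conj(n, k)| = sum_{i=0}^{k} C(2n, i): conjunctions of at most k
    # of the 2n literals (each attribute plain or negated)
    return sum(comb(2 * n, i) for i in range(k + 1))

def kdl_log2_bound(n, k):
    # log2 of the bound |k-DL(n)| <= 3^|Conj| * |Conj|!
    c = conj_count(n, k)
    return c * log(3, 2) + lgamma(c + 1) / log(2)
```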


3. Supervised Learning 5. PAC Learning

function DECISION-LIST-LEARNING(examples) returns a decision list, No or failure

if examples is empty then return the trivial decision list No
t ← a test that matches a nonempty subset examplest of examples
    such that the members of examplest are all positive or all negative
if there is no such t then return failure
if the examples in examplest are positive then o ← Yes
else o ← No
return a decision list with initial test t and outcome o
    and remaining elements given by DECISION-LIST-LEARNING(examples − examplest)

Artificial Intelligence – Prof. Dr. Jürgen Dix (264/554)
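A possible rendering of this algorithm in Python, restricted to tests that are conjunctions of at most k literals (the representation, helper names and the default outcome are our own choices):

```python
from itertools import combinations

def decision_list_learning(examples, n, k):
    """Greedy sketch of DECISION-LIST-LEARNING.
    examples: list of (x, y), x a tuple of n booleans, y the classification.
    A test is a tuple of literals (index, value); returns [(test, outcome)] or None."""
    examples = list(examples)
    literals = [(i, v) for i in range(n) for v in (True, False)]
    dlist = []
    while examples:
        found = None
        for size in range(k + 1):
            for test in combinations(literals, size):
                matched = [(x, y) for x, y in examples
                           if all(x[i] == v for i, v in test)]
                if matched and len({y for _, y in matched}) == 1:
                    found = (test, matched)
                    break
            if found:
                break
        if not found:
            return None                      # failure
        test, matched = found
        dlist.append((test, matched[0][1]))  # outcome: the common class
        examples = [e for e in examples if e not in matched]
    return dlist

def predict(dlist, x):
    for test, outcome in dlist:
        if all(x[i] == v for i, v in test):
            return outcome
    return False                             # default if no test matches
```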


3. Supervised Learning 5. PAC Learning

Figure: proportion correct on the test set vs. size of the training
set, for decision tree and decision list learning.
Artificial Intelligence – Prof. Dr. Jürgen Dix (265/554)
3. Supervised Learning 6. Noise and overfitting

3.6 Noise and overfitting

Artificial Intelligence – Prof. Dr. Jürgen Dix (266/554)


3. Supervised Learning 6. Noise and overfitting

Noise:
examples are inconsistent (Q (x ) together with
¬Q (x )),
no attributes left to classify more examples,
makes sense if the environment is
nondeterministic.

Artificial Intelligence – Prof. Dr. Jürgen Dix (267/554)


3. Supervised Learning 6. Noise and overfitting

Overfitting: Dual to noise.


remaining examples can be classified using
attributes which establish a pattern, which is not
existent (irrelevant attributes).
Example 44 (Tossing Dice)
Several coloured dice are tossed. Every toss is
described via (day, month, time, colour). As long as
there is no inconsistency every toss is described by
exactly one (totally overfitted) hypothesis.

Artificial Intelligence – Prof. Dr. Jürgen Dix (268/554)


3. Supervised Learning 6. Noise and overfitting

Other examples:
the pyramids,
astrology,
“Mein magisches Fahrrad”.

Artificial Intelligence – Prof. Dr. Jürgen Dix (269/554)


4. Learning in networks

4. Learning in networks

Artificial Intelligence – Prof. Dr. Jürgen Dix (270/554)


4. Learning in networks

4.1 The human brain


4.2 Neural networks
4.3 The Perceptron
4.4 Multi-layer feed-forward nets
4.5 Kernel Machines
4.6 Applications
4.7 Genetic Algorithms

Artificial Intelligence – Prof. Dr. Jürgen Dix (271/554)


4. Learning in networks

Learning with networks is a method to


build complex functions from many very simple
but connected units and to learn this construction
from examples,
improve the understanding of the functionality of
the human brain.

Artificial Intelligence – Prof. Dr. Jürgen Dix (272/554)


4. Learning in networks 1. The human brain

4.1 The human brain

Artificial Intelligence – Prof. Dr. Jürgen Dix (273/554)


4. Learning in networks 1. The human brain

Figure: a neuron, with cell body (soma), nucleus, dendrites and
axon (with axonal arborization); the axon from another cell
connects via synapses.

Artificial Intelligence – Prof. Dr. Jürgen Dix (274/554)


4. Learning in networks 1. The human brain

A neuron consists of
the soma: the body of the cell,
the nucleus: the core of the cell,
the dendrites,
the axon: 1 cm - 1 m in length.

The axon branches and connects to the dendrites of


other neurons: these locations are called synapses.
Each neuron shares synapses with 10-100000
others.

Artificial Intelligence – Prof. Dr. Jürgen Dix (275/554)


4. Learning in networks 1. The human brain

Signals are propagated from neuron to neuron by a


complicated electrochemical reaction.

Chemical transmitter substances are released from


the synapses and enter the dendrite, raising or
lowering the electrical potential of the cell body.

When the potential reaches a threshold an electrical


pulse or action potential is sent along the axon.
The pulse spreads out along the branches of the
axons and releases transmitters into the bodies of
other cells.

Artificial Intelligence – Prof. Dr. Jürgen Dix (276/554)


4. Learning in networks 1. The human brain

Question:
What does the building process of the network of
neurons look like?

Answer:
Long-term changes in the strength of the connections
occur in response to the pattern of stimulation.

Artificial Intelligence – Prof. Dr. Jürgen Dix (277/554)


4. Learning in networks 1. The human brain

Biology versus electronics:


                     Computer                         Human Brain
Computational units  1 CPU, 10^5 gates                10^11 neurons
Storage units        10^9 bits RAM, 10^10 bits disk   10^11 neurons, 10^14 synapses
Cycle time           10^-8 sec                        10^-3 sec
Bandwidth            10^9 bits/sec                    10^14 bits/sec
Neuron updates/sec   10^5                             10^14

Computer: sequential processes, very fast,


“rebooting quite often”
Brain: works profoundly concurrently, quite slow,
error-correcting, fault-tolerant (neurons die
constantly)

Artificial Intelligence – Prof. Dr. Jürgen Dix (278/554)


4. Learning in networks 2. Neural networks

4.2 Neural networks

Artificial Intelligence – Prof. Dr. Jürgen Dix (279/554)


4. Learning in networks 2. Neural networks

Definition 45 (Neural Network)


A neural network consists of:
1
units,
2
links between units.
The links are weighted. There are three kinds of
units:
1
input units,
2
hidden units,
3
output units.

Artificial Intelligence – Prof. Dr. Jürgen Dix (280/554)


4. Learning in networks 2. Neural networks

Idea:
A unit i receives an input via links to other units j. The
input function

ini := ∑ Wj ,i aj
j

calculates the weighted sum.

Artificial Intelligence – Prof. Dr. Jürgen Dix (281/554)


4. Learning in networks 2. Neural networks

Notation Meaning
ai Activation value of unit i (also the output of the unit)
ai Vector of activation values for the inputs to unit i
g Activation function
g0 Derivative of the activation function
Erri Error (difference between output and target) for unit i
Erre Error for example e
Ii Activation of a unit i in the input layer
I Vector of activations of all input units
Ie Vector of inputs for example e
ini Weighted sum of inputs to unit i
N Total number of units in the network
O Activation of the single output unit of a perceptron
Oi Activation of a unit i in the output layer
O Vector of activations of all units in the output layer
t Threshold for a step function
T Target (desired) output for a perceptron
T Target vector when there are several output units
Te Target vector for example e
Wj,i Weight on the link from unit j to unit i
Wi Weight from unit i to the output in a perceptron
Wi Vector of weights leading into unit i
W Vector of all weights in the network
Artificial Intelligence – Prof. Dr. Jürgen Dix (282/554)
4. Learning in networks 2. Neural networks

Figure: a unit. The input links deliver activations aj with
weights Wj ,i , the input function computes the weighted sum
ini = ∑j Wj ,i aj , the activation function g yields the output
ai = g (ini ), which is passed on via the output links.

Artificial Intelligence – Prof. Dr. Jürgen Dix (283/554)


4. Learning in networks 2. Neural networks

The activation function g calculates the output ai


(from the inputs) which will be transferred to other
units via output-links:
ai := g (ini )
Examples:
Figure: (a) step function stept , (b) sign function, (c) sigmoid
function 1/(1 + e^−x ).

Artificial Intelligence – Prof. Dr. Jürgen Dix (284/554)
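The three activation functions and the unit computation ai = g (ini ) can be sketched in a few lines of Python (a toy sketch, names are ours):

```python
import math

def step(t, x):     # step_t
    return 1 if x >= t else 0

def sign(x):
    return 1 if x >= 0 else -1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, activations, g):
    # in_i = sum_j W_ji * a_j ;  a_i = g(in_i)
    return g(sum(w * a for w, a in zip(weights, activations)))
```

For example, `unit_output([1, 1], (1, 0), lambda s: step(0.5, s))` realises an OR-like unit on boolean inputs.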


4. Learning in networks 2. Neural networks

Simple units:
Figure: (a) step function stept , (b) sign function, (c) sigmoid
function.

Artificial Intelligence – Prof. Dr. Jürgen Dix (285/554)


4. Learning in networks 2. Neural networks

Standardisation:
We consider step0 instead of stept .
If an unit i uses the activation function stept (x ) then
we bring in an additional input link “0” which adds a
constant value of a0 := −1. This value is weighted as
W0,i := t. Now we can use step0 for the activation
function:
stept (∑_{j=1}^{n} Wj ,i aj ) = step0 (−t + ∑_{j=1}^{n} Wj ,i aj ) = step0 (∑_{j=0}^{n} Wj ,i aj )

Artificial Intelligence – Prof. Dr. Jürgen Dix (286/554)
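The standardisation can be verified directly: with the extra input a0 = −1 weighted by W0 = t, thresholding at t and thresholding at 0 coincide. A Python check (names are ours):

```python
def stept(t, x):
    return 1 if x >= t else 0

def with_threshold(weights, inputs, t):
    # step_t applied to the weighted sum
    return stept(t, sum(w * a for w, a in zip(weights, inputs)))

def with_bias(weights, inputs, t):
    # extra input a_0 = -1 with weight W_0 = t, then threshold 0
    return stept(0, sum(w * a for w, a in zip([t] + weights, [-1] + inputs)))
```

Both functions agree on every input, since x ≥ t holds exactly if x − t ≥ 0.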


4. Learning in networks 2. Neural networks

Networks can be
recurrent, i.e. somehow connected,
feed-forward, i.e. they form an acyclic graph.
Usually networks are partitioned into layers:
units in one layer have only links to units of the next layer.
E.g. multi-layer feed-forward networks: without internal states
(no short-term memory).
Figure: a two-layer feed-forward network with input units I1 , I2 ,
hidden units H3 , H4 and output unit O5 , connected by the
weights w13 , w14 , w23 , w24 , w35 , w45 .
Artificial Intelligence – Prof. Dr. Jürgen Dix (287/554)
4. Learning in networks 2. Neural networks

Important:
The output of the input-units is determined by the environment.

Question 1:
Which function does the figure describe?

Question 2:
Why can non-trivial functions be represented at all?

Artificial Intelligence – Prof. Dr. Jürgen Dix (288/554)


4. Learning in networks 2. Neural networks

Question 3:
How many units do we need?

a few: a small number of functions can be


represented,
many: the network learns by heart (overfitting,
see Section 3.6)

Artificial Intelligence – Prof. Dr. Jürgen Dix (289/554)


4. Learning in networks 2. Neural networks

Hopfield networks:
1
bidirectional links with symmetrical weights,
2
activation function: sign,
3
units are input- and output-units,
4
can store up to 0.14 N training examples.

Artificial Intelligence – Prof. Dr. Jürgen Dix (290/554)


4. Learning in networks 2. Neural networks

Learning with neural networks means the

adjustment of the parameters to ensure

consistency with the training-data.

Question:
How to find the optimal network structure?

Answer:
Perform a search in the space of network structures
(e.g. with genetic algorithms).

Artificial Intelligence – Prof. Dr. Jürgen Dix (291/554)


4. Learning in networks 3. The Perceptron

4.3 The Perceptron

Artificial Intelligence – Prof. Dr. Jürgen Dix (292/554)


4. Learning in networks 3. The Perceptron

Definition 46 (Perceptron)
A perceptron is a feed-forward network with only one
layer based on the activation function step0 .

Artificial Intelligence – Prof. Dr. Jürgen Dix (293/554)


4. Learning in networks 3. The Perceptron

Question:
Can every boolean function be represented by a
feed-forward network?

Can AND, OR and NOT be represented?


Is it possible to represent every boolean

function by simply combining these?

What about
n
1, if ∑ni=1 xi >

f (x1 , . . . , xn ) := 2
0, else.

Artificial Intelligence – Prof. Dr. Jürgen Dix (294/554)


4. Learning in networks 3. The Perceptron

Solution:
Every boolean function can be composed using
AND, OR and NOT (or even only NAND).
The combination of the respective perceptrons
is not a perceptron!

Artificial Intelligence – Prof. Dr. Jürgen Dix (295/554)


4. Learning in networks 3. The Perceptron

Perceptron with sigmoid activation

Figure: output of a two-input perceptron with sigmoid
activation function, plotted over the inputs x1 and x2 .

Artificial Intelligence – Prof. Dr. Jürgen Dix (296/554)


4. Learning in networks 3. The Perceptron

Question:
What about XOR?

Figure: the functions (a) I1 and I2 , (b) I1 or I2 , (c) I1 xor I2 ,
plotted over the inputs I1 , I2 . In (a) and (b) a separating line
exists; in (c) it does not.

Artificial Intelligence – Prof. Dr. Jürgen Dix (297/554)


4. Learning in networks 3. The Perceptron

Output = step0 (∑_{j=0}^{n} Wj Ij )

The equation ∑_{j=0}^{n} Wj Ij = 0 defines a hyperplane
((n − 1)-dimensional subspace) in the input space.


Definition 47 (Linear Separable)
A boolean function with n attributes is called linear
separable if there is a hyperplane
(n − 1-dimensional subspace) which separates the
positive domain-values from the negative ones.

Artificial Intelligence – Prof. Dr. Jürgen Dix (298/554)


4. Learning in networks 3. The Perceptron

Figure: (a) the plane separating the positive from the negative
examples, (b) a perceptron representing it, with inputs I1 , I2 , I3 ,
weights W = −1 and threshold t = −1.5.

Artificial Intelligence – Prof. Dr. Jürgen Dix (299/554)


4. Learning in networks 3. The Perceptron

Learning algorithm:
Similar to current best hypothesis (see the chapter
on learning).
hypothesis: network with the current weights
(firstly randomly generated)
UPDATE: make it consistent through small
changes.
Important: UPDATE is called several times for the
examples; each pass through all examples is called an epoch.

Artificial Intelligence – Prof. Dr. Jürgen Dix (300/554)


4. Learning in networks 3. The Perceptron

function NEURAL-NETWORK-LEARNING(examples) returns network

network ← a network with randomly assigned weights
repeat
    for each e in examples do
        O ← NEURAL-NETWORK-OUTPUT(network, e)
        T ← the observed output values from e
        update the weights in network based on e, O, and T
    end
until all examples correctly predicted or stopping criterion is reached
return network

Artificial Intelligence – Prof. Dr. Jürgen Dix (301/554)


4. Learning in networks 3. The Perceptron

Definition 48 (Perceptron Learning (step0 ))

Perceptron learning modifies the weights Wj with


respect to this rule:

Wj := Wj + α × Ij × Error

with Error:= T − O (i.e. the difference between the


correct and the current output-value). α is the
learning rate.

Artificial Intelligence – Prof. Dr. Jürgen Dix (302/554)


4. Learning in networks 3. The Perceptron

Definition 49 (Perceptron Learning (sigmoid))

Perceptron learning modifies the weights Wj with
respect to this rule:

Wj := Wj + α × g ′(in) × Ij × Error

with Error := T − O (i.e. the difference between the
correct and the current output-value). α is the
learning rate.

Artificial Intelligence – Prof. Dr. Jürgen Dix (303/554)


4. Learning in networks 3. The Perceptron

The following algorithm uses xj [e] for Ij .

function PERCEPTRON-LEARNING(examples, network) returns a perceptron hypothesis

inputs: examples, a set of examples, each with input x = x1 , . . . , xn
            and output y
        network, a perceptron with weights Wj , j = 0 . . . n,
            and activation function g

repeat
    for each e in examples do
        in ← ∑_{j=0}^{n} Wj xj [e]
        Err ← y [e] − g (in)
        Wj ← Wj + α × Err × g ′(in) × xj [e]
until some stopping criterion is satisfied
return NEURAL-NET-HYPOTHESIS(network)
Artificial Intelligence – Prof. Dr. Jürgen Dix (304/554)
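A runnable Python version of perceptron learning with the step0 activation and the bias input a0 = −1 (a sketch with our own naming; on a linearly separable function such as AND the rule converges):

```python
def perceptron_learning(examples, n, alpha=0.1, epochs=100):
    # step_0 activation with bias input a_0 = -1; w[0] plays the role of t
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, y in examples:
            a = (-1,) + tuple(x)
            o = 1 if sum(wi * ai for wi, ai in zip(w, a)) >= 0 else 0
            err = y - o                       # Error := T - O
            for j in range(n + 1):
                w[j] += alpha * a[j] * err    # W_j := W_j + alpha * I_j * Error
    return w

def run(w, x):
    a = (-1,) + tuple(x)
    return 1 if sum(wi * ai for wi, ai in zip(w, a)) >= 0 else 0
```

Training on the four examples of AND yields weights that classify all of them correctly after a handful of epochs.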


4. Learning in networks 3. The Perceptron

Theorem 50 (Rosenblatt’s Theorem)


Every function which can be represented by a
perceptron is learned through the perceptron learning
algorithm (Definition 48).
More exactly: The series Wj converges to a function
which represents the examples correctly.

Artificial Intelligence – Prof. Dr. Jürgen Dix (305/554)


4. Learning in networks 3. The Perceptron

Proof:
Let ŵ be a solution, i.e. (why? exercise)

ŵ · I > 0 for all I ∈ Ipos ∪ −Ineg

with Ipos consisting of the positive and Ineg consisting of the
negative examples (and −Ineg = {−I : I ∈ Ineg }).
Let I0 := Ipos ∪ −Ineg and m := min {ŵ · I : I ∈ I0 }.
Let w1 , . . . , wj , . . . be the sequence of weight vectors resulting
from the algorithm.
We want to show that this sequence eventually becomes
constant.

Artificial Intelligence – Prof. Dr. Jürgen Dix (306/554)


4. Learning in networks 3. The Perceptron

Proof (continued):
Consider the possibility that not all wi are different from their
predecessor (or successor) (this is the case if the new example
is consistent with the current weights). Let k1 , k2 , . . . , kj be
the indices of the changed weights (where the error is
non-zero), i.e.

wkj · Ikj ≤ 0,   wkj +1 = wkj + α Ikj ,

with Ikj being the kj -th tested example in I0 (which is not
consistent (wrt the definition of kj )).
Then we have

wkj +1 = wk1 + α Ik1 + α Ik2 + . . . + α Ikj .

Artificial Intelligence – Prof. Dr. Jürgen Dix (307/554)


4. Learning in networks 3. The Perceptron

Proof (continued):
We use this to show that j cannot become arbitrarily big. We
consider

cos ω = (ŵ · wkj +1 ) / (‖ŵ ‖ ‖wkj +1 ‖)

and estimate as follows (by decreasing the numerator and
increasing the denominator):

cos ω = (ŵ · wkj +1 ) / (‖ŵ ‖ ‖wkj +1 ‖)
      ≥ (ŵ · wk1 + α m j ) / (‖ŵ ‖ √(‖wk1 ‖² + α² M j ))

The right side converges to infinity (when j increases to infinity),
while the cosine can never get greater than 1. This leads to the
contradiction.
Artificial Intelligence – Prof. Dr. Jürgen Dix (308/554)
4. Learning in networks 3. The Perceptron

Proof (continued):
How do we get the estimation above?

The scalar product ŵ · wkj +1 equals

ŵ · wk1 + α ŵ · (Ik1 + . . . + Ikj ) ≥ ŵ · wk1 + α m j .

For the denominator:

‖wkj +1 ‖² = ‖wkj + α Ikj ‖²
          = ‖wkj ‖² + 2α Ikj · wkj + α² ‖Ikj ‖²
          ≤ ‖wkj ‖² + α² ‖Ikj ‖² ,

since Ikj · wkj ≤ 0: this is how we have chosen kj .
Now let M := max {‖I ‖² : I ∈ I0 }. Then

‖wkj +1 ‖² ≤ ‖wk1 ‖² + α² ‖Ik1 ‖² + . . . + α² ‖Ikj ‖² ≤ ‖wk1 ‖² + α² M j

holds.

Artificial Intelligence – Prof. Dr. Jürgen Dix (309/554)


4. Learning in networks 3. The Perceptron

Figure: proportion correct on the test set vs. training set size,
comparing perceptron and decision tree learning (here the
perceptron does better).
Artificial Intelligence – Prof. Dr. Jürgen Dix (310/554)
4. Learning in networks 3. The Perceptron

Figure: proportion correct on the test set vs. training set size,
comparing decision tree and perceptron learning (here the
decision tree does better).
Artificial Intelligence – Prof. Dr. Jürgen Dix (311/554)
4. Learning in networks 4. Multi-layer feed-forward

4.4 Multi-layer feed-forward

Artificial Intelligence – Prof. Dr. Jürgen Dix (312/554)


4. Learning in networks 4. Multi-layer feed-forward

Problem:
What does the error function of the hidden units look
like?

Learning with multi-layer networks is called back


propagation.
Hidden units can be seen as perceptrons (figure
on page 296). The outcome can be a linear
combination of such perceptrons (see next two
slides).

Artificial Intelligence – Prof. Dr. Jürgen Dix (313/554)


4. Learning in networks 4. Multi-layer feed-forward

Figure: the output hW (x1 , x2 ) obtained by combining two
sigmoid units (a ridge), plotted over the inputs x1 , x2 .

Artificial Intelligence – Prof. Dr. Jürgen Dix (314/554)


4. Learning in networks 4. Multi-layer feed-forward

Figure: the output hW (x1 , x2 ) obtained by combining two such
ridges (a localised bump), plotted over the inputs x1 , x2 .

Artificial Intelligence – Prof. Dr. Jürgen Dix (315/554)


4. Learning in networks 4. Multi-layer feed-forward

Figure: a two-layer feed-forward network: input units Ik ,
hidden units aj and output units Oi , with weights Wk ,j and
Wj ,i .

Artificial Intelligence – Prof. Dr. Jürgen Dix (316/554)


4. Learning in networks 4. Multi-layer feed-forward

The perceptron was not powerful enough in our


restaurant-example. So we try 2 layers. 10 attributes
lead to 10 input-units.
Question
How many hidden units are necessary?

Answer:
Four!
Perceptron’s error is easily determined because there
was only one Wj between input and output. Now we
have several.
How should the error be distributed?
Artificial Intelligence – Prof. Dr. Jürgen Dix (317/554)
4. Learning in networks 4. Multi-layer feed-forward

We minimise E = 1/2 ∑i (Ti − Oi )² and get

E = 1/2 ∑i (Ti − g (∑j Wj ,i aj ))²
  = 1/2 ∑i (Ti − g (∑j Wj ,i g (∑k Wk ,j Ik )))²

Do a gradient descent.

∂E /∂Wj ,i = 1/2 · 2 (Ti − Oi ) · (−g ′(ini )) · aj
          = −aj (Ti − Oi ) g ′(ini )
          = −aj ∆i
Artificial Intelligence – Prof. Dr. Jürgen Dix (318/554)


4. Learning in networks 4. Multi-layer feed-forward

Now the Wk ,j :

∂E /∂Wk ,j = ∑i (Ti − Oi ) · (−g ′(ini )) · Wj ,i g ′(inj ) Ik
          = − ∑i ∆i Wj ,i g ′(inj ) Ik
          = −g ′(inj ) Ik ∑i Wj ,i ∆i = −Ik ∆j

with ∆j := g ′(inj ) ∑i Wj ,i ∆i .
Artificial Intelligence – Prof. Dr. Jürgen Dix (319/554)


4. Learning in networks 4. Multi-layer feed-forward

Idea:
We perform two different updates. One for the
weights to the input units and one for the weights to
the output units.

output units: similar to the perceptron


Wj ,i := Wj ,i + α × aj × Errori × g 0 (ini )
Instead of Errori × g 0 (ini ) write ∆i .
hidden units: each hidden unit j is partly responsible for the
error ∆i (if j is connected with the output unit i).

Wk ,j := Wk ,j + α × Ik × ∆j
with ∆j := g 0 (inj ) ∑i Wj ,i ∆i .

Artificial Intelligence – Prof. Dr. Jürgen Dix (320/554)
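The two update rules can be sketched for a network with one hidden layer and a single sigmoid output unit (a minimal Python sketch with our own naming; one gradient step on a single example should decrease that example's error for a small α):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gprime(x):                                        # g'(x) for the sigmoid
    return sigmoid(x) * (1.0 - sigmoid(x))

def forward(W_in, W_out, x):
    # W_in[j][k]: input k -> hidden j ;  W_out[j]: hidden j -> output
    hidden_in = [sum(W_in[j][k] * x[k] for k in range(len(x)))
                 for j in range(len(W_in))]
    a = [sigmoid(s) for s in hidden_in]
    out_in = sum(W_out[j] * a[j] for j in range(len(a)))
    return hidden_in, a, out_in, sigmoid(out_in)

def backprop_step(W_in, W_out, x, t, alpha):
    hidden_in, a, out_in, o = forward(W_in, W_out, x)
    delta_out = (t - o) * gprime(out_in)              # Delta_i of the output unit
    for j in range(len(W_out)):
        delta_j = gprime(hidden_in[j]) * W_out[j] * delta_out
        for k in range(len(x)):
            W_in[j][k] += alpha * x[k] * delta_j      # hidden-layer update W_kj
        W_out[j] += alpha * a[j] * delta_out          # output-layer update W_ji
```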


4. Learning in networks 4. Multi-layer feed-forward

function BACK-PROP-LEARNING(examples, network) returns a neural network

inputs: examples, a set of examples, each with input vector x and output vector y
        network, a multilayer network with L layers, weights Wj ,i ,
            activation function g

repeat
    for each e in examples do
        for each node j in the input layer do aj ← xj [e]
        for ℓ = 2 to L do
            ini ← ∑j Wj ,i aj
            ai ← g (ini )
        for each node i in the output layer do
            ∆i ← g ′(ini ) × (yi [e] − ai )
        for ℓ = L − 1 to 1 do
            for each node j in layer ℓ do
                ∆j ← g ′(inj ) ∑i Wj ,i ∆i
            for each node i in layer ℓ + 1 do
                Wj ,i ← Wj ,i + α × aj × ∆i
until some stopping criterion is satisfied
return NEURAL-NET-HYPOTHESIS(network)

Artificial Intelligence – Prof. Dr. Jürgen Dix (321/554)
4. Learning in networks 4. Multi-layer feed-forward

Figure: total error on the training set vs. number of epochs
during back-propagation training.

Artificial Intelligence – Prof. Dr. Jürgen Dix (322/554)


4. Learning in networks 4. Multi-layer feed-forward

Figure: proportion correct on the test set vs. training set size,
comparing the decision tree and the multilayer network
(restaurant data).
Artificial Intelligence – Prof. Dr. Jürgen Dix (323/554)
4. Learning in networks 4. Multi-layer feed-forward

Back propagation algorithm:


1
calculate ∆i for the output units based on the
observed error errori .
2
for each layer proceed recursively (output layer
first):
back propagate the ∆i (predecessor layer)
modify the weight between the current layers

Important:
Back propagation is gradient search!

Artificial Intelligence – Prof. Dr. Jürgen Dix (324/554)


4. Learning in networks 4. Multi-layer feed-forward

The error is a function of the network’s weights; this
function defines the error surface.

Figure: the error surface Err plotted over a weight W1 ,
illustrating local minima (points a and b).
Artificial Intelligence – Prof. Dr. Jürgen Dix (325/554)
4. Learning in networks 4. Multi-layer feed-forward

General remarks:
expressibility: neural networks are suitable for
continuous inputs and outputs (noise).
To represent all boolean functions with
n attributes, 2^n /n hidden units suffice.
Often much less suffice: the art of
determining the topology of the network.

Artificial Intelligence – Prof. Dr. Jürgen Dix (326/554)


4. Learning in networks 4. Multi-layer feed-forward

efficiency: m examples, |W | weights: each epoch


needs O(m × |W |)-time. We know:
Number of epochs is exponential.
In practice the time of convergence is very
variable.
Problem: local minima on the error
surface.

Artificial Intelligence – Prof. Dr. Jürgen Dix (327/554)


4. Learning in networks 4. Multi-layer feed-forward

transparency: black box. Trees and lists explain their


results!

Artificial Intelligence – Prof. Dr. Jürgen Dix (328/554)


4. Learning in networks 5. Kernel Machines

4.5 Kernel Machines

Artificial Intelligence – Prof. Dr. Jürgen Dix (329/554)


4. Learning in networks 5. Kernel Machines

Figure: a two-dimensional data set (in x1 , x2 ) that is not
linearly separable.

Artificial Intelligence – Prof. Dr. Jürgen Dix (330/554)
4. Learning in networks 5. Kernel Machines

The last figure is not linearly separable. Can it be
transformed into a linearly separable set?
Yes, in a higher dimensional space!

F : R² → R³ ; ⟨x1 , x2 ⟩ ↦ ⟨x1² , x2² , √2 x1 x2 ⟩

Note that

F (xi ) · F (xj ) = (xi · xj )²
Artificial Intelligence – Prof. Dr. Jürgen Dix (331/554)


4. Learning in networks 5. Kernel Machines

Figure: the data from the previous figure mapped into the three
dimensions (x1² , x2² , √2 x1 x2 ), where it becomes linearly
separable.
Artificial Intelligence – Prof. Dr. Jürgen Dix (332/554)
4. Learning in networks 5. Kernel Machines

Given n data points. Can they always be linearly


separated in n − 1 dimensions?
Yes, so what?

Overfitting!!!

Support Vector Machines
Kernel (or support vector) machines were invented
to compute the optimal linear separator.
Artificial Intelligence – Prof. Dr. Jürgen Dix (333/554)


4. Learning in networks 5. Kernel Machines

Given pairs ⟨xi , yi ⟩ (yi ∈ {−1, 1} is the classification).
Find αi such that the following is maximised:

∑i αi − 1/2 ∑i ,j αi αj yi yj (xi · xj )

subject to the constraints αi ≥ 0 and ∑i αi yi = 0.

The separator is then given by

h(x) = sign(∑i αi yi (x · xi ))

Most of the αi are 0; only those closest to the
separator are nonzero: they are called support
vectors.
Artificial Intelligence – Prof. Dr. Jürgen Dix (334/554)
4. Learning in networks 5. Kernel Machines

Figure: the optimal separator in the transformed space; the
examples closest to it are the support vectors.

Artificial Intelligence – Prof. Dr. Jürgen Dix (335/554)
4. Learning in networks 6. Applications

4.6 Applications

Artificial Intelligence – Prof. Dr. Jürgen Dix (336/554)


4. Learning in networks 6. Applications

Each of the following applications has been solved in
a test period of several months.
Pronunciation:
(1987) NETtalk can pronounce English texts. The
input consists of the letters to be spoken, the
predecessor and three successors:
80 hidden units.
output layer: high pitch, low pitch, stressed,
unstressed, ...
training: 1024 words. After 50 epochs: 95%
works correctly on the training-set.
testing-set: 78 % correct.
Artificial Intelligence – Prof. Dr. Jürgen Dix (337/554)
4. Learning in networks 6. Applications

Handwritten character recognition:


(1989) recognition of the zipcode on envelopes.
input: 16 × 16 pixel-array. 3 hidden layers, with
768, 192 and 30 units. 10 output units (0 – 9).
This leads to 200000 weights!
Trick: Units in the first layer observe only a 5 × 5
submatrix. 768 = 12 × 64. Each of these 12
groups uses its own weights. Each group is
responsible for one characteristic. Alltogether
9760 weights.
training: 7300 examples.
testing-set: 99 % correct.
Artificial Intelligence – Prof. Dr. Jürgen Dix (338/554)
4. Learning in networks 6. Applications

Driving:
(1993) ALVINN. Actions: steer, accelerate, brake. Driving
straight.
input: color stereo video, radar.
Picture is a 30 × 32 pixel-array (input units).
output layer: 30 units (representing the steering direction).
The output unit with the highest activation is chosen.
Demanded is a function which maps each image on a
steering direction.
hidden units: one layer with 5 units
training: a human drives for 5 minutes, the network
observes.
back-propagation: 10 minutes
problems:
the human drives too well
brightness! Depends on covering and weather.
Artificial Intelligence – Prof. Dr. Jürgen Dix (339/554)
4. Learning in networks 7. Genetic Algorithms

4.7 Genetic Algorithms

Artificial Intelligence – Prof. Dr. Jürgen Dix (340/554)


4. Learning in networks 7. Genetic Algorithms

Remember:
Darwin, Lamarck
survival of the fittest
mutation, selection, reproduction.

Algorithm:
Works on a population, reproduces and selects
according to a fitness-function and returns
individuals.

Artificial Intelligence – Prof. Dr. Jürgen Dix (341/554)


4. Learning in networks 7. Genetic Algorithms

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual

inputs: population, a set of individuals
        FITNESS-FN, a function that measures the fitness of an individual
repeat
    parents ← SELECTION(population, FITNESS-FN)
    population ← REPRODUCTION(parents)
until some individual is fit enough
return the best individual in population, according to FITNESS-FN

Artificial Intelligence – Prof. Dr. Jürgen Dix (342/554)
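A concrete instance of GENETIC-ALGORITHM for bit-string individuals (a Python sketch with our own choices: a one-max style fitness, fitness-proportional selection with an added constant so the weights stay positive, and a generation limit as stopping criterion):

```python
import random

def genetic_algorithm(population, fitness_fn, fit_enough,
                      generations=200, mutation_rate=0.02):
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        cand = max(population, key=fitness_fn)
        if fitness_fn(cand) > fitness_fn(best):
            best = cand
        if fitness_fn(best) >= fit_enough:
            break
        # selection: probability proportional to fitness (+1 keeps weights positive)
        parents = random.choices(population,
                                 weights=[fitness_fn(i) + 1 for i in population],
                                 k=len(population))
        nxt = []
        for i in range(0, len(parents) - 1, 2):
            a, b = parents[i], parents[i + 1]
            cut = random.randrange(1, len(a))          # cross-over position
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                nxt.append(''.join(c if random.random() > mutation_rate
                                   else random.choice('01') for c in child))
        population = nxt
    return best
```

With the fitness function `lambda s: s.count('1')`, the population quickly drifts towards the all-ones string.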


4. Learning in networks 7. Genetic Algorithms

Question:
What does this have to do with learning?

The role of the individual:


function of an agent
(fitness is performance),
component function of an agent
(fitness is critic).
GENETIC-ALGORITHM represents concurrent
search with hill-climbing.

Artificial Intelligence – Prof. Dr. Jürgen Dix (343/554)


4. Learning in networks 7. Genetic Algorithms

Question:
How to apply this learning algorithm?

1 fitness function: set of individuals ⇒ IR,


2 representation of an individual: a string (finite alphabet),
3 selections of individuals: randomly, probability
proportional to the fitness,
4 reproduction of individuals: cross over and mutation.
Selected individuals are randomly mated. For each pair a
cross-over-position is randomly chosen (n0 ∈ IN): i.e.
descendant has the first n0 characters of the first parent
and the last of the second parent. Each character mutates
with a certain probability.

Artificial Intelligence – Prof. Dr. Jürgen Dix (344/554)


4. Learning in networks 7. Genetic Algorithms

Example 51 (Restaurant revisited)


fitness function: number of examples which are
consistent with an individual ( a decision list).
representation: (remember: 10 attributes, some
not boolean).
To represent all pairs (attribute, value) we need 5
bits. For k − DL(n) we need: t × (5 × k + 1) bits.

Artificial Intelligence – Prof. Dr. Jürgen Dix (345/554)


4. Learning in networks 7. Genetic Algorithms

Figure: one step of the genetic algorithm on bit-string
individuals: (a) initial population, (b) fitness function (values 8,
6, 6, 5, giving selection probabilities 32%, 24%, 24%, 20%),
(c) selection, (d) cross-over, (e) mutation.

Artificial Intelligence – Prof. Dr. Jürgen Dix (346/554)


5. Knowledge Engineering

5. Knowledge Engineering

Artificial Intelligence – Prof. Dr. Jürgen Dix (347/554)


5. Knowledge Engineering

5.1 Sentential Logic (SL)


5.2 Calculi for SL
5.3 The Wumpus world in SL
5.4 The P=NP problem
5.5 First-Order Logic (FOL)
5.6 Herbrand’s Theorem
5.7 Robinson’s Resolution
5.8 The Situation Calculus
5.9 Important Problems of KR

Artificial Intelligence – Prof. Dr. Jürgen Dix (348/554)


5. Knowledge Engineering 1. Sentential Logic

5.1 Sentential Logic

Artificial Intelligence – Prof. Dr. Jürgen Dix (349/554)


5. Knowledge Engineering 1. Sentential Logic

Definition 52 (Sentential Logic LSL , Language L ⊆ LSL )

The language LSL of propositional (or sentential) logic consists


of
⊥ and ⊤: falsum and verum,
p, q , r , x1 , x2 , . . . xn , . . .: a countable set AT of SL-constants,
¬, ∧, ∨, →: the sentential connectives (¬ is unary, all
others are binary operators),
(, ): the parentheses to help the readability.
Generally we consider only a finite set of SL-constants. They
define a language L ⊆ LSL . The set of L -formulae FmlL is
defined inductively.

Artificial Intelligence – Prof. Dr. Jürgen Dix (350/554)


5. Knowledge Engineering 1. Sentential Logic

Definition 53 (Semantics, Valuation, Model)

A valuation v for a language L ⊆ LSL is a mapping from the set


of SL-constants defined through L and {⊤, ⊥} into the set
{true, false} with v (⊥) = false, v (⊤) = true.
Each valuation v can be uniquely extended to a function
v̄ : FmlL → {true, false} so that:

true, if v̄ (p) = false,
v̄ (¬p) =
false, if v̄ (p) = true.

true, if v̄ (ϕ) = true and v̄ (γ) = true,
v̄ (ϕ ∧ γ) =
false, else

true, if v̄ (ϕ) = true or v̄ (γ) = true,
v̄ (ϕ ∨ γ) =
false, else

Artificial Intelligence – Prof. Dr. Jürgen Dix (351/554)


5. Knowledge Engineering 1. Sentential Logic

Definition (continued)
v̄(ϕ → γ) =
true, if v̄ (ϕ) = false or (v̄ (ϕ) = true and v̄ (γ) = true),
false, else
Thus each valuation v uniquely defines a v̄ . We call v̄ an
L -structure or model Av . From now on we will speak of models
and valuations.
A model determines for each formula if it is true or false.
The process of mapping a set of L-formulae into {true, false} is
called semantics.

Artificial Intelligence – Prof. Dr. Jürgen Dix (352/554)
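The semantics v̄ (and, later, the consequence relation T |= ϕ) can be implemented directly by recursion over formulae and enumeration of valuations (a Python sketch; the tuple encoding of formulae is our own):

```python
from itertools import product

# formulae as nested tuples (our own encoding):
# ('atom', 'p'), ('not', f), ('and', f, g), ('or', f, g), ('imp', f, g)
def holds(v, f):
    """The extension v-bar of a valuation v (a dict atom -> bool)."""
    op = f[0]
    if op == 'atom':
        return v[f[1]]
    if op == 'not':
        return not holds(v, f[1])
    if op == 'and':
        return holds(v, f[1]) and holds(v, f[2])
    if op == 'or':
        return holds(v, f[1]) or holds(v, f[2])
    if op == 'imp':
        return (not holds(v, f[1])) or holds(v, f[2])
    raise ValueError(op)

def entails(atoms, T, phi):
    """T |= phi: every valuation satisfying T also satisfies phi."""
    for bits in product([True, False], repeat=len(atoms)):
        v = dict(zip(atoms, bits))
        if all(holds(v, t) for t in T) and not holds(v, phi):
            return False
    return True
```

For instance, modus ponens corresponds to `entails(['p', 'q'], [p, ('imp', p, q)], q)` being true.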


5. Knowledge Engineering 1. Sentential Logic

Definition 54 (Validity of a Formula, Tautology)


1
A formula ϕ ∈ FmlL holds under the valuation v if
v̄ (ϕ) = true. We also write v̄ |= ϕ or simply
v |= ϕ.
2
A theory is a set of formulae: T ⊆ FmlL . v
satisfies T if v̄ (ϕ) = true for all ϕ ∈ T . We write
v |= T .
3
An L -formula ϕ is called an L -tautology if v |= ϕ
holds for all possible valuations v in L .
From now on we suppress the language L , because
it is obvious from context. Nevertheless it needs to be
carefully defined.
Artificial Intelligence – Prof. Dr. Jürgen Dix (353/554)
5. Knowledge Engineering 1. Sentential Logic

Definition 55 (Consequence Set Cn(T ))


A formula ϕ results from T if for all valuations v with
v |= T also v |= ϕ holds. We write: T |= ϕ.
We call

CnL (T ) =def {ϕ ∈ FmlL : T |= ϕ},

or simply Cn(T ), the semantic consequence


operator.

Artificial Intelligence – Prof. Dr. Jürgen Dix (354/554)


5. Knowledge Engineering 1. Sentential Logic

Lemma 56 (Properties of Cn(T ))


The semantic consequence operator has the following
properties:
1 T -expansion: T ⊆ Cn(T ),
2 Monotony: T ⊆ T ′ ⇒ Cn(T ) ⊆ Cn(T ′),
3 Closure: Cn(Cn(T )) = Cn(T ).

Lemma 57 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a valuation v with v |= T and v̄(ϕ) = false.
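Lemma 57 suggests a brute-force entailment check: T |= ϕ fails exactly when some valuation satisfies T but falsifies ϕ. A minimal sketch; the formula encoding and the names `atoms`, `entails` are assumptions of mine for illustration:

```python
from itertools import product

def evaluate(f, v):
    """Evaluate a formula (atoms are strings; compound formulas are
    ("not", f), ("and", f, g), ("or", f, g)) under a valuation v."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not evaluate(f[1], v)
    if f[0] == "and":
        return evaluate(f[1], v) and evaluate(f[2], v)
    return evaluate(f[1], v) or evaluate(f[2], v)   # "or"

def atoms(f):
    """Collect the atoms occurring in a formula."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))

def entails(T, phi):
    """T |= phi iff there is no valuation v with v |= T and v(phi) = false."""
    vocab = atoms(phi)
    for f in T:
        vocab |= atoms(f)
    vocab = sorted(vocab)
    for bits in product([True, False], repeat=len(vocab)):
        v = dict(zip(vocab, bits))
        if all(evaluate(f, v) for f in T) and not evaluate(phi, v):
            return False   # counterexample valuation found
    return True

T = [("or", "p", "q"), ("not", "p")]
print(entails(T, "q"))   # True: every model of T makes q true
print(entails(T, "p"))   # False: v(p) = false, v(q) = true satisfies T
```

This is exponential in the number of atoms, which is exactly the observation that motivates the P=NP discussion later in this chapter.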


Definition 58 (MOD(T), Cn(U ))

If T ⊆ FmlL then we denote with MOD(T ) the set of all


L -structures A which are models of T :
MOD(T ) =def {A : A |= T }.
If U is a set of models, we consider all those sentences, which
are valid in all models of U . We call this set Cn(U ):

Cn(U ) =def {ϕ ∈ FmlL : ∀v ∈ U : v̄ (ϕ) = true}.

MOD is obviously dual to Cn:

Cn(MOD(T )) = Cn(T ), MOD(Cn(T )) = MOD(T ).


Definition 59 (Completeness of a Theory T )


T is called complete if for each formula ϕ ∈ Fml: T |= ϕ or
T |= ¬ϕ holds.

Attention:
Do not mix up this last condition with the property of a valuation
v (or a model): each valuation is complete in the above sense.


Definition 60 (Consistency of a Theory)


T is called consistent if there is a valuation v with v̄ (ϕ) = true
for all ϕ ∈ T .

Lemma 61 (Ex Falso Quodlibet)


T is consistent if and only if Cn(T) ≠ FmlL.



5. Knowledge Engineering 2. Calculi for SL

5.2 Calculi for SL


Definition 62 (Axiom, Inference Rule)

Axioms in SL are the following formulae:


p → ⊤, ⊥ → p, ¬⊤ → ⊥, ⊥ → ¬⊤,
(p ∧ q ) → p, (p ∧ q ) → q,
p → (p ∨ q ), q → (p ∨ q ),
¬¬p → p, (p → q ) → ((p → ¬q ) → ¬p),
p → (q → p), p → (q → (p ∧ q )).


Definition (continued)
The only inference rule in SL is modus ponens:

MP : Fml × Fml → Fml : (ϕ, ϕ → ψ) 7→ ψ.

or short
ϕ, ϕ → ψ
(MP) .
ψ
(ϕ, ψ are arbitrary complex formulae).


Definition 63 (Proof)

A proof of a formula ϕ from a theory T ⊆ FmlL is a sequence ϕ1, . . . , ϕn of formulae such that ϕn = ϕ and for all i with 1 ≤ i ≤ n one of the following conditions holds:
1. ϕi is a substitution instance of an axiom,
2. ϕi ∈ T,
3. there are ϕl, ϕk = (ϕl → ϕi) with l, k < i; then ϕi is the result of applying modus ponens to predecessor-formulae of ϕi.
We write: T ⊢ ϕ (ϕ can be derived from T).

Show that:
1. A ⊢ A ∨ B,
2. ⊢ A ∨ ¬A,
3. the rule
(R): from A → ϕ and ¬A → ψ derive ϕ ∨ ψ
can be derived.


Theorem 64 (Completeness)
A formula follows semantically from a theory T if and only if it can be derived:

T |= ϕ if and only if T ⊢ ϕ

Theorem 65 (Compactness)
A formula follows semantically from a theory T if and only if it already follows from a finite subset of T:

Cn(T) = ⋃{ Cn(T′) : T′ ⊆ T, T′ finite }.


Although the axioms from above and modus ponens suffice, it is reasonable to consider more general systems. Therefore we introduce the following notion:

Definition 66 (Rule System MP + D)
Let D be a set of general inference rules, i.e. mappings which assign a formula ψ to a finite set of formulae ϕ1, ϕ2, . . . , ϕn. We write

ϕ1, ϕ2, . . . , ϕn
ψ

MP + D is the rule system which emerges from adding the rules in D to modus ponens. For W ⊆ Fml, let

CnD(W)

denote in the following the set of all formulae ϕ which can be derived from W using the inference rules from MP + D.

We bear in mind that Cn(W) is defined semantically, whereas CnD(W) is defined syntactically (using the notion of proof). By the completeness theorem both sets are equal in the special case D = ∅.

Lemma 67 (Properties of CnD)
Let D be a set of general inference rules and W ⊆ Fml. Then:
1. Cn(W) ⊆ CnD(W).
2. CnD(CnD(W)) = CnD(W).
3. CnD(W) is the smallest set which is closed with respect to D and contains W.

Question:
What is the difference between an inference rule ϕ/ψ and the implication ϕ → ψ?

Suppose we have a set T of formulae and we choose two constants p, q ∈ L. We could either consider
(1) T together with MP and {p/q},
or
(2) T ∪ {p → q} together with MP.


1. Case: Cn^{p/q}(T),
2. Case: Cn(T ∪ {p → q}).
If T = {¬q}, then we have in (2):
¬p ∈ Cn(T ∪ {p → q}), but not in (1).


A resolution calculus for SL


Let M be a set of clauses of the form

A ∨ ¬B ∨ C ∨ . . . ∨ ¬E

Such a clause is also written as the set

{A, ¬B , C , . . . , ¬E }.
We define the following inference rule:
Definition 68 (SL resolution)
Deduce the clause C1 ∪ C2 from C1 ∪ {A} and C2 ∪ {¬A}.

Question:
Is this calculus correct and complete?


Answer:
No!

But every problem of the kind “T |= φ” is equivalent to

“T ∪ {¬φ} is unsatisfiable”

or rather to

T ∪ {¬φ} ⊢ □

(⊢ stands for the calculus introduced above).

Theorem 69 (Completeness of Resolution Refutation)
If M is an unsatisfiable set of clauses, then the empty clause □ can be derived from M using only the resolution rule.

We also say that resolution is refutation complete.
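The resolution rule and refutation completeness can be sketched as a small saturation procedure; the clause and literal encodings below are my own illustrative choices, not the lecture's notation:

```python
def resolve(c1, c2):
    """All resolvents of two clauses. A clause is a frozenset of literals;
    a literal is ('p', True) for p and ('p', False) for ¬p."""
    out = []
    for (atom, sign) in c1:
        if (atom, not sign) in c2:
            out.append((c1 - {(atom, sign)}) | (c2 - {(atom, not sign)}))
    return out

def refutes(clauses):
    """Saturate under resolution; True iff the empty clause is derivable,
    i.e. the clause set is unsatisfiable (refutation completeness)."""
    known = set(frozenset(c) for c in clauses)
    while True:
        new = set()
        pairs = [(a, b) for a in known for b in known if a != b]
        for a, b in pairs:
            for r in resolve(a, b):
                r = frozenset(r)
                if not r:          # derived the empty clause
                    return True
                if r not in known:
                    new.add(r)
        if not new:                # saturated without finding the empty clause
            return False
        known |= new

# {p ∨ q, ¬p ∨ q, p ∨ ¬q, ¬p ∨ ¬q} is unsatisfiable:
clauses = [{("p", True), ("q", True)}, {("p", False), ("q", True)},
           {("p", True), ("q", False)}, {("p", False), ("q", False)}]
print(refutes(clauses))   # True
```

Since the clause space over finitely many atoms is finite, the saturation loop always terminates.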

5. Knowledge Engineering 3. Wumpus in SL

5.3 Wumpus in SL


[Figure: the 4×4 Wumpus world with pits (and breezes in the adjacent squares), the Wumpus (with stench in the adjacent squares), the gold, and the start square (1,1).]

[Figure: the agent's knowledge (a) after the first percept and (b) after moving to (2,1) and perceiving a breeze, with possible pits marked P?. Legend: A = Agent, B = Breeze, G = Glitter/Gold, OK = safe square, P = Pit, S = Stench, V = Visited, W = Wumpus.]


[Figure: the agent's knowledge (a) after perceiving a stench in (1,2), which pins the Wumpus to (1,3) and the pit to (3,1), and (b) after moving on to (2,3) and perceiving stench, breeze and glitter. Same legend as above.]


Language definition:
Si,j: stench in (i,j)
Bi,j: breeze in (i,j)
Piti,j: (i,j) is a pit
Gli,j: (i,j) glitters
Wi,j: (i,j) contains the Wumpus

General knowledge:
¬S1,1 −→ (¬W1,1 ∧ ¬W1,2 ∧ ¬W2,1 )
¬S2,1 −→ (¬W1,1 ∧ ¬W2,1 ∧ ¬W2,2 ∧ ¬W3,1 )
¬S1,2 −→ (¬W1,1 ∧ ¬W1,2 ∧ ¬W2,2 ∧ ¬W1,3 )

S1,2 −→ (W1,3 ∨ W1,2 ∨ W2,2 ∨ W1,1 )


Knowledge after the 3rd move:

¬S1,1 ∧ ¬S2,1 ∧ S1,2 ∧ ¬B1,1 ∧ ¬B2,1 ∧ ¬B1,2

Question:
Can we deduce that the wumpus is located at (1,3)?

Answer:
Yes. Either via resolution or using our Hilbert-calculus.


Problem:
We want more: given a certain situation we want to choose the
best action. This is impossible in SL.

But we can check for each action if it should be done or not.


Therefore we need additional axioms:
A1,1 ∧ East ∧ W2,1 −→ ¬Forward
A1,1 ∧ East ∧ Pit2,1 −→ ¬Forward
Ai,j ∧ Gli,j −→ TakeGold


[Figure: the 4×4 Wumpus world (same as above).]

Disadvantages
- actions can only be guessed
- the database must be changed continuously
- the set of rules becomes very big, because there are no variables

Using an appropriate formalisation (additional axioms) we can check whether

KB ⊢ ¬action or KB ⊢ action.

But it can happen that neither one nor the other is deducible.

5. Knowledge Engineering 4. The P=NP Problem

5.4 The P=NP Problem


Observation:
Polynomial algorithms are fast, exponential ones are slow!
To show the unsatisfiability of a PL-formula using the truth-table method we need to test 2^n assignments in the worst case (n atoms).

Question:
Can this problem be solved in polynomial time with other methods?


We consider yes-no-problems. For each instance I of a problem we can ask whether I has the respective property or not: I is a yes- or a no-instance.

Definition 70 (the class P)
A problem is in the class P – problems which can be deterministically recognized in polynomial time – if there is an algorithm Alg and a polynomial p(n) so that for each instance I the following holds:

I is a yes-instance iff Alg replies with yes to the input I after at most p(length(I)) steps.


Definition 71 (the class NP)
A problem is in the class NP – problems which can be non-deterministically recognized in polynomial time – if there is an algorithm Check and a polynomial p(n) so that for each instance I the following holds:

I is a yes-instance iff there is an instance I′ of another problem with:
(1) length(I′) < p(length(I)),
(2) Check replies with yes to the input I′ after at most p(length(I)) steps.


Obvious:
The satisfiability of PL-formulae is in NP, and P is a subset of NP.
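The contrast between the two classes can be illustrated on satisfiability itself: the truth-table method inspects up to 2^n assignments, while checking one given assignment (the NP certificate) is polynomial. A sketch using a DIMACS-style clause encoding, which is my own choice here:

```python
from itertools import product

def brute_force_sat(clauses, n):
    """Truth-table method: try all 2^n assignments (exponential).
    A clause is a list of nonzero ints: k means x_k, -k means ¬x_k."""
    tried = 0
    for bits in product([False, True], repeat=n):
        tried += 1
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True, tried
    return False, tried

def check(clauses, assignment):
    """The NP certificate check: verifying one assignment is polynomial
    in the size of the formula."""
    return all(any(assignment[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

# (x1 ∨ x2) ∧ (¬x1 ∨ x2) ∧ (x1 ∨ ¬x2) ∧ (¬x1 ∨ ¬x2): unsatisfiable
unsat = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
print(brute_force_sat(unsat, 2))        # (False, 4): all 2^2 assignments tried
print(check([[1, 2]], [True, False]))   # True
```

Showing *unsatisfiability* forces the brute-force search through all assignments, while a single satisfying assignment would serve as a short certificate.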


Question:
Which are the most difficult problems in NP?

Problems which can be polynomially reduced onto each other are obviously equivalent.

Definition 72 (Polynomial transformability)
A problem Prob1 is polynomially transformable into a problem Prob2 if for each instance I1 of Prob1 one can compute an instance I2 of Prob2 in polynomial time so that the following holds:

I1 is a yes-instance of Prob1 if and only if I2 is a yes-instance of Prob2.


Definition 73 (NP-completeness)
A problem in NP is called NP-complete if all problems
in NP can be reduced onto it in polynomial time.

A problem (not necessarily in NP) is called NP-hard if all problems in NP can be reduced onto it in polynomial time.


These problems are NP-complete:
- Hamiltonian circuit
- satisfiability
- maximum clique
- 3-colorability
- subset sum

5. Knowledge Engineering 5. First Order Logic (FOL)

5.5 First Order Logic (FOL)


Definition 74 (First order logic LFOL, language L ⊆ LFOL)
The language LFOL of first order logic consists of:
- x, y, z, x1, x2, . . . , xn, . . .: a countable set Var of variables
- for each k ∈ IN: P1k, P2k, . . . , Pnk, . . ., a countable set Predk of k-dimensional predicate symbols (the 0-dimensional predicate symbols are the propositional constants from At of LPL; we suppose that ⊥ and ⊤ are available)
- for each k ∈ IN: f1k, f2k, . . . , fnk, . . ., a countable set Functk of k-dimensional function symbols
- ¬, ∧, ∨, →: the sentential connectives
- (, ): the parentheses
- ∀, ∃: the quantifiers


Definition (continued)
The 0-dimensional function symbols are called individual constants – we leave out the parentheses. In general we will need – as in propositional logic – only a certain subset of the predicate and function symbols.
These define a language L ⊆ LFOL (analogously to Definition 52). The set of predicate and function symbols used is also called the signature Σ.


Definition (continued)
The concepts of an L-term t and an L-formula ϕ are defined inductively:
Term: L-terms t are defined as follows:
1. each variable is an L-term.
2. if fk is a k-dimensional function symbol from L and t1, . . . , tk are L-terms, then fk(t1, . . . , tk) is an L-term.
The set of all L-terms that one can create from the set X ⊆ Var is called TermL(X) or TermΣ(X). Using X = ∅ we get the set of basic terms TermL(∅), short: TermL.


Definition (continued)
Formula: L-formulae ϕ are also defined inductively:
1. if Pk is a k-dimensional predicate symbol from L and t1, . . . , tk are L-terms, then Pk(t1, . . . , tk) is an L-formula.
2. for every L-formula ϕ, (¬ϕ) is an L-formula.
3. for all L-formulae ϕ and ψ, (ϕ ∧ ψ) and (ϕ ∨ ψ) are L-formulae.
4. if x is a variable and ϕ an L-formula, then (∃x ϕ) and (∀x ϕ) are L-formulae.


Definition (continued)
Atomic L-formulae are those which are composed according to 1.; we call them AtL(X) (X ⊆ Var). The set of all L-formulae with respect to X is called FmlL(X).
Positive formulae (FmlL+(X)) are those which are composed using only 1., 3. and 4.
If an L-formula ϕ is part of another L-formula ψ, then ϕ is called a sub-formula of ψ.


An illustrating example

Example 75 (From semigroups to rings)


We consider L = {0, 1, +, ·, ≤, =}, where 0, 1 are constants,
+, · binary operations and ≤, = binary relations. What can one
express in this language?
Ax 1: ∀x ∀y ∀z x + (y + z ) = (x + y ) + z
Ax 2: ∀x (x + 0 = x ) ∧ (0 + x = x )
Ax 3: ∀x ∃y (x + y = 0) ∧ (y + x = 0)
Ax 4: ∀x ∀y x +y = y +x
Ax 5: ∀x ∀y ∀z x · (y · z ) = (x · y ) · z
Ax 6: ∀x ∀y ∀z x · (y + z ) = x · y + x · z
Ax 7: ∀x ∀y ∀z (y + z ) · x = y · x + z · x
Axiom 1 describes a semigroup, the axioms 1–2 describe a monoid, the axioms 1–3 a group, and the axioms 1–7 a ring.


Definition 76 (L -structure A = (UA , IA ))


An L-structure or L-interpretation is a pair A =def (UA, IA) with UA being an arbitrary non-empty set, which is called the basic set (the universe or the individual range) of A. Further, IA is a
mapping which
assigns to each k -dimensional predicate symbol P k in L a
k -dimensional predicate over UA
assigns to each k -dimensional function symbol f k in L a
k -dimensional function on UA

In other words: the domain of IA is exactly the set of predicate


and function symbols of L .


Definition (continued)
The range of IA consists of the predicates and functions on UA .
We write:

IA (P ) = P A , IA (f ) = f A .

Let ϕ be an L1-formula and A =def (UA, IA) an L-structure. A is called matching with ϕ if IA is defined for all predicate and function symbols which appear in ϕ, i.e. if L1 ⊆ L.


Definition 77 (Variable assignment ρ)


A variable assignment ρ over an L-structure A = (UA, IA) is a function
ρ : Var → UA; x ↦ ρ(x).


Definition 78 (Semantics of first order logic)


Let ϕ be a formula, A a structure matching with ϕ, and ρ a variable assignment over A. For each term t which can be built from components of ϕ, we define inductively the value of t in the structure A, called A(t):
1. for a variable x, A(x) =def ρ(x).
2. if t has the form t = fk(t1, . . . , tk), with t1, . . . , tk being terms and fk a k-dimensional function symbol, then A(t) =def fA(A(t1), . . . , A(tk)).


Definition (continued)
Now we define inductively the logical value of a formula ϕ in A:
1. if ϕ =def Pk(t1, . . . , tk) with terms t1, . . . , tk and the k-dimensional predicate symbol Pk, then
A(ϕ) =def true, if (A(t1), . . . , A(tk)) ∈ PA; false, else.
2. if ϕ =def ¬ψ, then
A(ϕ) =def true, if A(ψ) = false; false, else.
3. if ϕ =def (ψ ∧ η), then
A(ϕ) =def true, if A(ψ) = true and A(η) = true; false, else.


Definition (continued)
4. if ϕ =def (ψ ∨ η), then
A(ϕ) =def true, if A(ψ) = true or A(η) = true; false, else.
5. if ϕ =def ∀x ψ, then
A(ϕ) =def true, if for all d ∈ UA: A[x/d](ψ) = true; false, else.
6. if ϕ =def ∃x ψ, then
A(ϕ) =def true, if there is d ∈ UA: A[x/d](ψ) = true; false, else.

In cases 5. and 6. the notation [x/d] was used. It is defined as follows: for d ∈ UA let A[x/d] be the structure A′ which is identical with A except for the definition of xA′: xA′ =def d (independent of whether IA is defined for x or not).

Definition (continued)
We write:
A |= ϕ[ρ] for A(ϕ) = true: A is a model for ϕ with respect to ρ.
If ϕ does not contain free variables, then A |= ϕ[ρ] is independent of ρ. We simply leave out ρ.
If there is at least one model for ϕ, then ϕ is called satisfiable or consistent.

Definition 79 (Tautology)
1. A theory is a set of formulae without free variables: T ⊆ FmlL. A satisfies T if A |= ϕ holds for all ϕ ∈ T. We write A |= T.
2. An L-formula ϕ is called an L-tautology if for all matching L-structures A the following holds: A |= ϕ.

From now on we mostly leave out the language L, because it is obvious from the context. Nevertheless it always has to be defined.


Definition 80 (Consequence set Cn(T ))


A formula ϕ results from T if for all structures A with A |= T also A |= ϕ holds. We write: T |= ϕ.
We denote with CnL(T) =def {ϕ ∈ FmlL : T |= ϕ}, or simply Cn(T), the semantic consequence operator.


Lemma 81 (Properties of Cn(T ))


The semantic consequence operator has the following properties:
1. T-extension: T ⊆ Cn(T),
2. Monotony: T ⊆ T′ ⇒ Cn(T) ⊆ Cn(T′),
3. Closure: Cn(Cn(T)) = Cn(T).

Lemma 82 (ϕ ∉ Cn(T))
ϕ ∉ Cn(T) if and only if there is a structure A with A |= T and A |= ¬ϕ.


Definition 83 (MOD (T ), Cn(U ))

If T ⊆ FmlL, then we denote by MOD(T) the set of all L-structures A which are models of T:

MOD(T) =def {A : A |= T}.

If U is a set of structures then we can consider all sentences which are true in all structures of U. We call this set also Cn(U):

Cn(U) =def {ϕ ∈ FmlL : ∀A ∈ U : A |= ϕ}.

MOD is obviously dual to Cn:
Cn(MOD(T)) = Cn(T), MOD(Cn(T)) = MOD(T).

Definition 84 (Completeness of a theory T )


T is called complete, if for each formula ϕ ∈ FmlL : T |= ϕ oder
T |= ¬ϕ holds.

Attention:
Do not mix up this last condition with the property of a structure A (a model): each structure is complete in the above sense.

Lemma 85 (Ex Falso Quodlibet)
T is consistent if and only if Cn(T) ≠ FmlL.


An illustrating example
Example 86 (Natural numbers in different languages)
NPr = (IN, 0N, +N, =N) ("Presburger arithmetic"),
NPA = (IN, 0N, +N, ·N, =N) ("Peano arithmetic"),
NPA′ = (IN, 0N, 1N, +N, ·N, =N) (a variant of NPA).
These structures each define the natural numbers, but in different languages.

Question:
If the language is bigger, then we can express more.
Is LPA′ more expressive than LPA?


Answer:
No, because one can replace the 1N by an LPA-formula: there is an LPA-formula φ(x) so that for each variable assignment ρ the following holds:

NPA′ |=ρ φ(x) if and only if ρ(x) = 1

Thus we can define a macro for the 1. Each formula of LPA′ can be transformed into an equivalent formula of LPA.

Question:
Is LPA perhaps more expressive than LPr , or can the
multiplication be defined somehow?

We will see later that LPA is indeed more expressive:


the set of sentences valid in NPr is decidable
whereas
the set of sentences valid in NPA is not even

recursively enumerable.


We have already defined a complete and correct calculus for propositional logic LPL. Such calculi also exist for first order logic LFOL.

Theorem 87 (Completeness of first order logic)
A formula follows semantically from a theory T if and only if it can be derived:

T ⊢ ϕ if and only if T |= ϕ

Theorem 88 (Compactness of first order logic)
A formula follows from a theory T if and only if it follows from a finite subset of T:

Cn(T) = ⋃{ Cn(T′) : T′ ⊆ T, T′ finite }.

5. Knowledge Engineering 6. Herbrand’s Theorem

5.6 Herbrand’s Theorem


The introduced relation T |= φ says that each model of T is also a model of φ. But because there are many models with very large universes, the following question arises: can we restrict ourselves to particular models?

Theorem 89 (Löwenheim–Skolem)
T |= φ holds if and only if φ holds in all countable models of T.


Quite often the universes of the models we are interested in consist exactly of the basic terms TermL(∅). This leads to the following notion:

Definition 90 (Herbrand model)
A model A is called a Herbrand model with respect to a language if the universe of A consists exactly of TermL(∅) and the function symbols fik are interpreted as follows:

fikA : TermL(∅) × . . . × TermL(∅) → TermL(∅); (t1, . . . , tk) ↦ fik(t1, . . . , tk)

We write T |=Herb φ if each Herbrand model of T is also a model of φ.
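For a signature with at least one constant, the basic terms TermL(∅) can be enumerated level by level; as soon as one function symbol of arity ≥ 1 is present, the Herbrand universe is infinite, so we cut off at a nesting depth. A small sketch of mine; the string encoding of terms is purely illustrative:

```python
from itertools import product

def herbrand_universe(constants, functions, depth):
    """Ground terms Term_L(emptyset) up to a given nesting depth, for a
    signature given as a list of constants and a dict mapping each
    function symbol to its arity."""
    terms = set(constants)
    for _ in range(depth):
        new = set(terms)
        for f, arity in functions.items():
            # apply every function symbol to every tuple of known terms
            for args in product(sorted(terms), repeat=arity):
                new.add(f + "(" + ",".join(args) + ")")
        terms = new
    return terms

# Signature with one constant a and one unary function symbol f:
u = herbrand_universe(["a"], {"f": 1}, depth=2)
print(sorted(u))   # ['a', 'f(a)', 'f(f(a))']
```

Each iteration adds one more level of function nesting, mirroring the inductive definition of TermL.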

Theorem 91 (Reduction to Herbrand models)


If T is universal and φ existential, then the following
holds:

T |= φ if and only if T |=Herb φ

Question:
Is T |=Herb φ not much easier to decide, since we only have to consider Herbrand models? Is it perhaps decidable? (More on this in the next section.)


The following theorem is the basic result needed to apply resolution. In a way it expresses that FOL can be reduced to SL.

Theorem 92 (Herbrand)
Let T be universal and let φ contain no quantifiers. Then:

T |= ∃φ if and only if there are t1, . . . , tn ∈ TermL(∅) with: T |= φ(t1) ∨ . . . ∨ φ(tn)

Or: let M be a set of first order clauses (formulae of the form P1(t1) ∨ ¬P2(t2) ∨ . . . ∨ Pn(tn) with ti ∈ TermL(X)). Then M is unsatisfiable if and only if there is a finite and unsatisfiable set Minst of basic instances of M.


In automatic theorem proving we are always interested in the question

M |= ∃x1, . . . , xn ⋀i φi

Then

M ∪ { ¬∃x1, . . . , xn ⋀i φi }

is a set of clauses which is unsatisfiable if and only if the relation above holds.

5. Knowledge Engineering 7. Robinson’s Resolution

5.7 Robinson’s Resolution


Definition 93 (Most general unifier: mgU)


Given a finite set of equations between terms or between literals, there is an algorithm which either computes a most general solution substitution (i.e. a substitution for the involved variables such that the left side of each equation becomes syntactically identical to the right side) or returns fail.
In the first case the most general solution substitution is unique (up to renaming of variables): it is called

mgU, most general unifier


Examples:
p(x, a) = q(y, b): not unifiable (different predicate symbols).
p(g(a), f(x)) = p(g(y), z): solution substitutions are e.g.
[y/a, x/a, z/f(a)], [y/a, x/f(a), z/f(f(a))], . . .
The mgU is: [y/a, z/f(x)].


We want to outline the mentioned algorithm using an example.

Given: f (x , g (h(y ), y )) = f (x , g (z , a))

The algorithm successively calculates the following sets of


equations:

{ x = x , g (h(y ), y ) = g (z , a) }
{ g (h(y ), y ) = g (z , a) }
{ h(y ) = z , y = a }
{ z = h(y ), y = a }
{ z = h(a), y = a }

Thus the mgU is: [x /x , y /a, z /h(a)].
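The equation-solving steps above correspond to a standard unification procedure with an occurs check. The term encoding below (variables as plain strings, a term f(a, b) as a tuple, a constant a as the 0-ary tuple) is my own illustrative choice, not the lecture's notation:

```python
def is_var(t):
    return isinstance(t, str)

def walk(t, s):
    """Follow variable bindings in the substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(x, t, s):
    """Occurs check: does variable x occur in term t under s?"""
    t = walk(t, s)
    if is_var(t):
        return t == x
    return any(occurs(x, a, s) for a in t[1:])

def unify(t1, t2, s=None):
    """Return an mgU extending s (as a dict of bindings), or None on failure."""
    s = {} if s is None else s
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if is_var(t2):
        return None if occurs(t2, t1, s) else {**s, t2: t1}
    if t1[0] != t2[0] or len(t1) != len(t2):
        return None   # clash: different function symbols or arities
    for a, b in zip(t1[1:], t2[1:]):
        s = unify(a, b, s)
        if s is None:
            return None
    return s

# f(x, g(h(y), y)) = f(x, g(z, a)) from the example above:
a = ("a",)
lhs = ("f", "x", ("g", ("h", "y"), "y"))
rhs = ("f", "x", ("g", "z", a))
# bindings are triangular: z -> h(y) with y -> a, i.e. z/h(a) as above
print(unify(lhs, rhs))   # {'z': ('h', 'y'), 'y': ('a',)}
```

The returned bindings are in triangular form; applying them exhaustively yields the mgU [y/a, z/h(a)] computed in the example.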


Definition 94 (Robinson’s resolution for FOL)


The resolution calculus consists of two rules:
C1 ∪ {A1 } C2 ∪ {¬A2 }
Resolution:
(C1 ∪ C2 )mgU (A1 , A2 )
where C1 ∪ {A1} and C2 ∪ {¬A2} are assumed to be variable-disjoint.

C1 ∪ {L1 , L2 }
Factorization:
(C1 ∪ {L1 })mgU (L1 , L2 )
Consider for example M = {r (x ) ∨ ¬p(x ), p(a), s(a)}
and the question M |= ∃x (s(x ) ∧ r (x ))?

Question:
Why do we need factorization?

Answer:
Consider

M = {s(x1 ) ∨ s(x2 ), ¬s(y1 ) ∨ ¬s(y2 )}

Resolving both clauses gives

{s(x1 )} ∪ {¬s(y1 )}
or variants of it.
Resolving this new clause with one in M only leads to
variants of the respective clause in M.

Answer (continued):
The empty clause □ can never be derived (using resolution only).

Factorization instantly solves the problem, we can


deduce both s(x ) and ¬s(y ), and from there the
empty clause.

Theorem 95 (Resolution is refutation complete)


Robinson's resolution calculus is refutation complete: given an unsatisfiable set of clauses, the empty clause can be derived using resolution and factorization.

5. Knowledge Engineering 8. Situation Calculus

5.8 Situation Calculus


Question:
How do we axiomatize the Wumpus-world in FOL?

function KB-AGENT(percept) returns an action
  static: KB, a knowledge base
          t, a counter, initially 0, indicating time

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action


Idea:
In order to describe actions and their effects consistently, we consider the world as a sequence of situations (snapshots of the world). Therefore we have to extend each predicate by an additional argument.
We use the function symbol

result(action, situation)

as the term for the situation which emerges when the action action is executed in the situation situation.
Actions: Turn_right, Turn_left, Forward, Shoot, Grab, Release, Climb.


[Figure: a sequence of situations S0, S1, S2, S3 of the Wumpus world, connected by the actions Forward, Turn(Right), Forward via the result function.]

We also need a memory, a predicate

At(person, location, situation)

with person being either Wumpus or Agent and


location being the actual position (stored as pair [i,j]).
Important axioms are the so-called successor-state axioms; they describe how actions affect situations. The most general form of these axioms is

true afterwards ⇐⇒ an action made it true
or it was already true and no action made it false


Axioms over At(p, l, s):

At(p, l, result(a, s)) ↔ ((l = Location_ahead(p, s) ∧ ¬Wall(l) ∧ a = Forward)
∨ (At(p, l, s) ∧ ¬(a = Forward)))

At(p, l, s) → Location_ahead(p, s) = Location_toward(l, Orientation(p, s))

Wall([x, y]) ↔ (x = 0 ∨ x = 5 ∨ y = 0 ∨ y = 5)


Location_toward([x, y], 0) = [x + 1, y]
Location_toward([x, y], 90) = [x, y + 1]
Location_toward([x, y], 180) = [x − 1, y]
Location_toward([x, y], 270) = [x, y − 1]

Orientation(Agent, s0) = 90
Orientation(p, result(a, s)) = d ↔
((a = Turn_right ∧ d = mod(Orientation(p, s) − 90, 360))
∨ (a = Turn_left ∧ d = mod(Orientation(p, s) + 90, 360))
∨ (Orientation(p, s) = d ∧ ¬(a = Turn_right ∨ a = Turn_left)))

mod(x, y) is the usual "modulo" function, assigning to each x a value between 0 and y − 1.
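These axioms can be replayed as ordinary functions, representing a situation by the list of actions executed since s0. This executable reading is a sketch of my own, not part of the calculus:

```python
def location_toward(loc, d):
    """Location_toward([x, y], d) for the four directions."""
    x, y = loc
    return {0: (x + 1, y), 90: (x, y + 1),
            180: (x - 1, y), 270: (x, y - 1)}[d]

def orientation(p, s, s0_orient=90):
    """Evaluate Orientation(p, result(an, ... result(a1, s0))) by replaying
    the successor-state axiom over a situation given as a list of actions.
    p is ignored in this sketch: only the agent turns."""
    d = s0_orient
    for a in s:
        if a == "Turn_right":
            d = (d - 90) % 360
        elif a == "Turn_left":
            d = (d + 90) % 360
        # any other action leaves the orientation unchanged
    return d

print(orientation("Agent", []))                          # 90, as in the axioms
print(orientation("Agent", ["Turn_right", "Forward"]))   # 0
print(location_toward((1, 1), 0))                        # (2, 1)
```

Note how the frame part of the axiom (the last disjunct) becomes the implicit "do nothing" branch of the loop.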


Axioms over percepts, useful new notions:

Percept ([Stench, b, g , u , c ], s) → Stench(s)


Percept ([a, Breeze, g , u , c ], s) → Breeze(s)
Percept ([a, b, Glitter , u , c ], s) → At_Gold (s)
Percept ([a, b, g , Bump, c ], s) → At_Wall (s)
Percept ([a, b, g , u , Scream], s) → Wumpus_dead (s)

At (Agent , l , s) ∧ Breeze(s) → Breezy (l )


At (Agent , l , s) ∧ Stench(s) → Smelly (l )


Adjacent (l1 , l2 ) ↔ ∃ d l1 = Location_toward (l2 , d )


Smelly (l1 ) → ∃l2 At (Wumpus, l2 , s) ∧ (l2 = l1 ∨ Adjacent (l1 , l2 ))

Percept ([none, none, g , u , c ], s) ∧ At (Agent , x , s) ∧ Adjacent (x , y )


→ OK (y )
(¬At(Wumpus, x, t) ∧ ¬Pit(x))
→ OK(x)
At (Wumpus, l1 , s) ∧ Adjacent (l1 , l2 )
→ Smelly (l2 )
At (Pit , l1 , s) ∧ Adjacent (l1 , l2 )
→ Breezy (l2 )


Axioms to describe actions:

Holding (Gold , result (Grab, s)) ↔ At_Gold (s)∨


Holding (Gold , s)
Holding (Gold , result (Release, s)) ↔ False
Holding (Gold , result (Turn_right , s)) ↔ Holding (Gold , s)
Holding (Gold , result (Turn_left , s)) ↔ Holding (Gold , s)
Holding (Gold , result (Forward , s)) ↔ Holding (Gold , s)
Holding (Gold , result (Climb, s)) ↔ Holding (Gold , s)

Each effect must be described carefully.
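These axioms collapse into one small function when read operationally; a sketch (function name and boolean encoding are our own):

```python
# The Holding(Gold, result(a, s)) successor-state axiom as code:
# Grab makes it true (when At_Gold holds), Release makes it false,
# every other action leaves it unchanged (the frame part).

def holding_gold(action, at_gold, holding):
    if action == "Grab":
        return at_gold or holding
    if action == "Release":
        return False
    return holding          # Turn_right, Turn_left, Forward, Climb, ...

h = holding_gold("Grab", at_gold=True, holding=False)
h = holding_gold("Forward", at_gold=False, holding=h)
print(h)  # True: still holding after moving
print(holding_gold("Release", at_gold=False, holding=h))  # False
```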


Axioms describing preferences among actions:

Great (a, s) → Action(a, s)


(Good(a, s) ∧ ¬∃b Great(b, s)) → Action(a, s)
(Medium(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s))) → Action(a, s)
(Risky(a, s) ∧ ¬∃b (Great(b, s) ∨ Good(b, s) ∨ Medium(b, s)))
→ Action(a, s)
At (Agent , [1, 1], s) ∧ Holding (Gold , s) → Great (Climb, s)
At_Gold (s) ∧ ¬Holding (Gold , s) → Great (Grab, s)
At (Agent , l , s) ∧ ¬Visited (Location_ahead (Agent , s))∧
∧OK (Location_ahead (Agent , s)) → Good (Forward , s)
Visited (l ) ↔ ∃s At (Agent , l , s)

The goal is not only to find the gold but also to return safely. We
need the additional axioms Holding (Gold , s) → Go_back (s)
etc.

5. Knowledge Engineering 9. Important Problems

5.9 Important Problems


There are three very important


representation-problems concerning the
axiomatization of a changing world:
frame problem: most actions change only little – we
need many axioms to describe the invariant
properties.
It would be ideal to axiomatize only what
changes and to add a proposition
like “nothing else changes”.


qualification problem: it is necessary to enumerate all


conditions under which an action is
executed successfully. For example:
∀x (Bird (x ) ∧¬Penguin(x ) ∧ ¬Dead (x )∧
∧¬Ostrich(x ) ∧ ¬Broken_wing (x )∧
∧...
−→ Flies(x )
It would be ideal to only store “birds normally fly”.
ramification problem: How should we handle implicit
consequences of actions? For example
Grab(Gold ): Gold can be contaminated. Then
Grab(Gold ) is not optimal.


Programming versus knowledge engineering


programming knowledge engineering
choose programming language choose logic
write program define knowledge base
write compiler implement calculus
run program derive new facts


From Wittgenstein’s Tractatus Logico-Philosophicus (in translation):

1 The world is everything that is the case.
1.1 The world is the totality of facts, not of things.
2 What is the case, the fact, is the existence of atomic facts.
3 The logical picture of the facts is the thought.
5 The proposition is a truth-function of the elementary propositions.
6 The general form of the truth-function is: [p̄, ξ̄, N(ξ̄)].
7 Whereof one cannot speak, thereof one must be silent.

6. More about Logic

6. More about Logic


6.1 Notes about Resolution


6.2 Equality
6.3 (Un-) Decidability
6.4 Variants of Resolution
6.5 SLD-Resolution
6.6 Higher Order Logic

6. More about Logic 1. Notes about resolution

6.1 Notes about resolution


We want to formulate the following Problem (1) in


propositional logic and (2) in first-order logic:

Knowledge:
1. Jack owns a dog.
2. Dog owners are animal lovers.
3. No animal lover kills an animal.
4. Either Jack or Bill killed a cat named Tuna.

Question:
Who killed the cat?


Question:
How do we represent the proposition “every animal
has a brain”?
What about ∀x animal (x ) → has_brain(x )?
Or using a unary function symbol brain_of (·)? E.g.

∀x animal (x ) → ∃b b = brain_of (x )


Question:
Can Modus Ponens be simulated using resolution?

Answer:
No! (Why?)


Question:
Can resolution be simulated using Modus Ponens?
Only because there are no axioms?

Answer:
Yes! (Why?)


Question:
Why is the resolution calculus not complete?

Answer:
No, not only because of the missing axioms. Even
axioms do not help because resolution rules only
operate on clauses. And we cannot derive (in the
calculus!) clauses from the implications (axioms).
This works only with Modus Ponens.

6. More about Logic 2. Equality

6.2 Equality


There is only one relation which has a canonical
interpretation in each structure: equality ≐ (a
binary predicate symbol), which can be interpreted as
identity.
In short:
Two elements of an individual range are in the
relation ≐ iff they are identical.


There are two possible formalisations in PL1:
“PL1 without ≐”: as introduced in the last
chapter
“PL1 with ≐”: there is a binary
predicate symbol ≐ which is interpreted as identity
in each structure.
Question:
Is “PL1 with ≐” truly more expressive than “PL1
without ≐”?


Answer:
We can introduce formulae Ψn so that in all
structures A with universe A the following holds:

A |= Ψn iff A contains exactly n elements


Question:
Can ≐ be simulated in “PL1 without ≐”?

Answer:
We could try to axiomatize ≐ (like the Wumpus world). We
consider PL1 without ≐ with an additional binary predicate
symbol eq and demand the following axioms:
∀x eq(x, x),
for each function symbol f with adequate arity:
∀x1 . . . xn y1 . . . yn ((eq(x1, y1) ∧ . . . ∧ eq(xn, yn))
→ eq(f(x1, . . . , xn), f(y1, . . . , yn))),
for each predicate symbol P with adequate arity:
∀x1 . . . xn y1 . . . yn ((eq(x1, y1) ∧ . . . ∧ eq(xn, yn))
→ (P(x1, . . . , xn) → P(y1, . . . , yn))).


Important:
This set of axioms depends on the underlying
language L; we call it EQL .


Question:
We consider the following equivalence, with φ being a
formula in PL1 with ≐ (φ[≐/eq] is obtained from φ by
replacing each occurrence of ≐ with eq):

T |=≐ φ iff T[≐/eq] ∪ EQL |= φ[≐/eq] ?

Does it hold?


Answer:
Unfortunately not, because:
1. the axioms EQL leave open the possibility that
the interpretation of eq in a structure A is all
of A × A,
2. models may contain elements which cannot be
represented by terms of the considered language
– we can make only limited statements about
these elements.
It can be shown that we are not able to completely
axiomatize the identity even with additional axioms.
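The first point can be checked by brute force: a sketch showing that interpreting eq as all of A × A satisfies the EQ axioms in a structure whose functions and predicates do not separate elements (the concrete universe, function and predicate here are arbitrary illustrations):

```python
# eq interpreted as the TOTAL relation A x A satisfies reflexivity,
# function congruence and predicate congruence, yet is not the identity.
from itertools import product

A = [0, 1, 2]                        # a small universe
eq = set(product(A, A))              # interpret eq as all of A x A
f = lambda x: (x + 1) % 3            # a unary function on A
P = set(A)                           # a predicate that does not split eq-classes

reflexivity  = all((a, a) in eq for a in A)
f_congruence = all((f(a), f(b)) in eq for (a, b) in eq)
# eq(a,b) -> (P(a) -> P(b)), with <= as implication on booleans
P_congruence = all((a in P) <= (b in P) for (a, b) in eq)

print(reflexivity and f_congruence and P_congruence)  # True: EQ axioms hold
print(eq == {(a, a) for a in A})                      # False: eq is not identity
```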

6. More about Logic 3. (Un-)Decidability

6.3 (Un-)Decidability


We dealt with the natural numbers with addition and
multiplication before. We now approach them differently and
consider the language LPA with
a constant 0 and a unary function symbol s,
a binary predicate symbol eq and two ternary predicate
symbols ⊕ and ⊗.
Before defining some axioms about ⊕ and ⊗ we define the
following abbreviation: the formula ∃!z φ(z) stands for

∃z (φ(z) ∧ ∀y (φ(y) → eq(y, z))).


Definition 96 (Peano arithmetic PAfin )


PAfin consists of EQLPA and the following axioms:

∀x ∀y ∃!z ⊕(x, y, z)
∀x ⊕(x, 0, x)
∀x ∀y ∀z (⊕(x, y, z) → ⊕(x, s(y), s(z)))

∀x ∀y ∃!z ⊗(x, y, z)
∀x ⊗(x, 0, 0)
∀x ∀y ∀z ∃z′ (⊗(x, y, z) → (⊗(x, s(y), z′) ∧ ⊕(z, x, z′)))
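The axioms of Definition 96 can be sanity-checked over an initial segment of IN, with ⊕ and ⊗ read as ternary relations; a brute-force sketch (the bounds are arbitrary, chosen so that all quantified successors stay inside the segment):

```python
# Check the PA_fin axioms over a finite cut of the natural numbers,
# with addition and multiplication as ternary relations.
N = range(30)
plus  = {(x, y, z) for x in N for y in N for z in N if x + y == z}
times = {(x, y, z) for x in N for y in N for z in N if x * y == z}
B = range(5)      # quantify only over a small core

ax_p1 = all(sum((x, y, z) in plus for z in N) == 1 for x in B for y in B)
ax_p2 = all((x, 0, x) in plus for x in B)
ax_p3 = all((x, y + 1, z + 1) in plus
            for x in B for y in B for z in range(10) if (x, y, z) in plus)

ax_t1 = all(sum((x, y, z) in times for z in N) == 1 for x in B for y in B)
ax_t2 = all((x, 0, 0) in times for x in B)
ax_t3 = all(any((x, y + 1, z2) in times and (z, x, z2) in plus for z2 in N)
            for x in B for y in B for z in range(10) if (x, y, z) in times)

print(all([ax_p1, ax_p2, ax_p3, ax_t1, ax_t2, ax_t3]))  # True
```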


Obviously N := (IN, 0^N, s^N, ⊕^N, ⊗^N, eq^N) is a
model of all these axioms: 0^N is the “right” 0, s^N is
the successor function, ⊕^N is the addition and ⊗^N is
the multiplication of natural numbers (each considered
as a relation), and eq^N is the identity. The
following holds:
The set {φ : PAfin |= φ} is recursively
enumerable but not recursive (decidable).
The set {φ : N |= φ} is not even recursively
enumerable.


The following is the famous result of Kurt Gödel.

Theorem 97 (Gödel)
Each set of formulae which contains PAfin and has N
as a model is not recursive.
Each recursively enumerable set of formulae Φ which
contains PAfin and has N as a model is incomplete,
i.e. there is a ψ with N |= ψ but Φ ⊭ ψ.
N can never be completely axiomatized.

6. More about Logic 4. Variants of resolution

6.4 Variants of resolution


Our general goal is to derive an existentially
quantified formula from a set of formulae:

M ⊢ ∃ϕ.

To use resolution we form M ∪ {¬∃ϕ} and put it
into clause form. This set is called the input.
Instead of allowing arbitrary resolvents, we try to
restrict the search space.


Example 98 (Unlimited Resolution)
Let M := {r(x) ∨ ¬p(x), p(a), s(a)} and
□ ← s(x), r(x) the query.
An unlimited resolution might look like this:

r(x) ∨ ¬p(x), p(a) ⟹ r(a)
s(a), ¬s(x) ∨ ¬r(x) ⟹ ¬r(a)
r(a), ¬r(a) ⟹ □


Input resolution: in each resolution step one of the two parent
clauses must be from the input. In our example:

¬s(x) ∨ ¬r(x), s(a) ⟹ ¬r(a)
¬r(a), r(x) ∨ ¬p(x) ⟹ ¬p(a)
¬p(a), p(a) ⟹ □

Linear resolution: in each resolution step one of the two
parent clauses must either be from the input or
must be a successor of the other parent clause.


Theorem 99 (Completeness of resolution variants)


Linear resolution is refutation complete.
Input resolution is correct but not refutation complete.

Idea:
Maybe input resolution is complete for a restricted
class of formulae.

6. More about Logic 5. SLD resolution

6.5 SLD resolution


Definition 100 (Horn clause)


A clause is called Horn clause if it contains at most
one positive atom.
A Horn clause is called definite if it contains exactly
one positive atom. It has the form

A(t ) ← A1 (t1 ), . . . , An (tn ).

A Horn clause without positive atom is called a query:

□ ← A1(t1), . . . , An(tn).


Theorem 101 (Input resolution for Horn clauses)


Input resolution for Horn clauses is refutation complete.

Definition 102 (SLD resolution wrt P and query Q)


SLD resolution with respect to a program P and the query Q is
input resolution beginning with the query □ ← A1, . . . , An. Then
one Ai is chosen and resolved with a clause of the program. A
new query emerges, which is treated as before. If the empty
clause □ can be derived then the SLD resolution was successful
and the instantiation of the variables is called the computed answer.
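Definition 102 can be turned into a tiny interpreter. A sketch with leftmost selection and depth-first search over renamed program clauses (the term and clause encodings are our own, and a depth bound stands in for PROLOG's unbounded search):

```python
# Minimal SLD resolution: leftmost computation rule, depth-first search.
# Atoms are tuples (pred, arg1, ...); uppercase strings are variables.

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(x, y, s):
    """Extend substitution s so that atoms/terms x and y become equal."""
    if s is None:
        return None
    if isinstance(x, tuple) and isinstance(y, tuple):
        if x[0] != y[0] or len(x) != len(y):
            return None
        for a, b in zip(x[1:], y[1:]):
            s = unify(a, b, s)
        return s
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return {**s, x: y}
    if is_var(y):
        return {**s, y: x}
    return None

def rename(clause, n):
    """Fresh variables for each clause use (suffix _n)."""
    def r(atom):
        return (atom[0],) + tuple(a + "_" + str(n) if is_var(a) else a
                                  for a in atom[1:])
    head, body = clause
    return r(head), [r(a) for a in body]

def sld(program, goals, s=None, depth=0, limit=10):
    """Yield computed answer substitutions for the list of goal atoms."""
    s = s or {}
    if not goals:
        yield s
        return
    if depth >= limit:                       # crude bound on infinite branches
        return
    selected, rest = goals[0], goals[1:]     # leftmost selection
    for i, clause in enumerate(program):
        head, body = rename(clause, depth * 10 + i)
        s2 = unify(selected, head, dict(s))
        if s2 is not None:
            yield from sld(program, body + rest, s2, depth + 1, limit)

program = [
    (("p", "X", "Y"), [("q", "X", "Z"), ("p", "Z", "Y")]),  # p(X,Y) <- q(X,Z), p(Z,Y)
    (("p", "X", "X"), []),                                  # p(X,X).
    (("q", "a", "b"), []),                                  # q(a,b).
]

answers = sorted({walk("X", s) for s in sld(program, [("p", "X", "b")])})
print(answers)  # ['a', 'b']
```

For the query ← p(X, b) this yields exactly the two computed answers [X/a] and [X/b] discussed with the SLD trees below.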


Theorem 103 (Correctness of SLD resolution)


Let P be a definite program and Q a query. Then each answer
calculated for P wrt Q is correct.

Question:
Is SLD resolution complete?

Definition 104 (Computation rule)


A computation rule R is a function which assigns an atom
Ai ∈ {A1, . . . , An} to each query □ ← A1, . . . , An. This Ai is the
chosen atom against which we resolve in the next step.

Note:
PROLOG always uses the leftmost atom.


[Figure: SLD trees for the query ← p(x, b). Branches that derive
the empty clause □ are marked “Success” and yield the computed
answers [x/b] and [x/a]; other branches fail (“Failure”), and one
branch grows infinitely.]

An SLD tree may have three different kinds of
branches:
1. infinite ones,
2. branches ending with the empty clause (and
leading to an answer) and
3. failing branches (dead ends).


Theorem 105 (Independence of the computation rule)

Let R be a computation rule and σ an answer
calculated wrt R (i.e. there is a successful SLD
resolution). Then there is also a successful SLD
resolution for each other computation rule R′, and the
answer σ′ belonging to R′ is a variant of σ.


Theorem 106 (Completeness of SLD resolution)

Each correct answer substitution is subsumed by a
computed answer substitution, i.e.
P |= ∀(QΘ)
implies
SLD resolution computes an answer τ with ∃σ : Qτσ = QΘ.


Question:
How to find successful branches in a SLD tree?

Definition 107 (Search rule)


A search rule is a strategy to search for successful
branches in SLD trees.

Note:
PROLOG uses depth-first-search.

An SLD resolution is determined by a computation rule
and a search rule.


SLD trees for P ∪ {Q } are determined by the


computation rule.

PROLOG is incomplete for two reasons:
depth-first search
incorrect unification (no occur check).


A third reason comes up if we also ask for finite and


failed SLD resolutions:
the computation rule must be fair, i.e. there must
be a guarantee that every atom on the list of
goals is eventually chosen.

6. More about Logic 6. Higher order logic

6.6 Higher order logic


Definition 108 (Second order logic PL2)

The language LPL2 of second order logic (Prädikatenlogik
zweiter Stufe) consists of the language LPL1 and additionally,
for each k ∈ IN, a countable set
RelVark = {X1^k, X2^k, . . . , Xn^k, . . .} of k-ary predicate variables.
Thereby the set of terms becomes bigger:
if X^k is a k-ary predicate variable and t1, . . . , tk are terms,
then X^k(t1, . . . , tk) is also a term,
and also the set of formulae:
if X is a predicate variable and ϕ a formula, then (∃X ϕ) and
(∀X ϕ) are also formulae.
Not only elements of the individual range can be quantified but
also arbitrary subsets resp. k-ary relations.
The semantics does not change much – except for the new
interpretation of formulae like (∃X ϕ), (∀X ϕ).

We also demand from IA that the new k-ary predicate
variables are mapped onto k-ary relations on UA .

If ϕ =def ∀X^k ψ, then
IA(ϕ) =def true, if for all R^k ⊆ UA × · · · × UA :
IA[X^k/R^k](ψ) = true,
and false otherwise.

If ϕ =def ∃X^k ψ, then
IA(ϕ) =def true, if there is an R^k ⊆ UA × · · · × UA with
IA[X^k/R^k](ψ) = true,
and false otherwise.

We can quantify over arbitrary n-ary relations, not just
over elements (as in first order logic).

Question
What do the following two 2nd order sentences mean?

∀x ∀y (x ≐ y ⇐⇒ ∀X (X(x) ⇐⇒ X(y))),

∀X ( (∀x ∃!y X(x, y) ∧ ∀x ∀y ∀z
((X(x, z) ∧ X(y, z)) → x ≐ y))
→ ∀x ∃y X(y, x) )


Answer:
The first sentence shows that equality can be defined in 2nd
order logic (in contrast to FOL).
The second sentence holds in a structure iff the structure is
finite. Note that this cannot be expressed in PL1.

While the semantics of LPL2 is a canonical extension of
LPL1, this does not hold on the calculus level. It can be shown
that the set of valid sentences of LPL2 is not even recursively
enumerable.
Attention:
There is no correct and complete calculus for 2nd order logic!

7. Planning

7. Planning


7.1 Planning vs. Problem-Solving


7.2 STRIPS
7.3 Partial Order Planning
7.4 Conditional Planning
7.5 Extensions

7. Planning 1. Planning vs. Problem-Solving

7.1 Planning vs. Problem-Solving


Motivation:
problem-solving agent: The effects of a static
sequence of actions are determined.
knowledge-based agent: Actions can be chosen.

We try to merge both into a planning agent.


function SIMPLE-PLANNING-AGENT(percept) returns an action
  static: KB, a knowledge base (includes action descriptions)
          p, a plan, initially NoPlan
          t, a counter, initially 0, indicating time
  local variables: G, a goal
                   current, a current state description

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  current ← STATE-DESCRIPTION(KB, t)
  if p = NoPlan then
      G ← ASK(KB, MAKE-GOAL-QUERY(t))
      p ← IDEAL-PLANNER(current, G, KB)
  if p = NoPlan or p is empty then action ← NoOp
  else
      action ← FIRST(p)
      p ← REST(p)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action


Example 109 (Running Example)


We want to drink freshly made banana shake and drill
some holes into a wall at home.
Thus an agent needs to solve the following problem:
Get a quart of milk, a bunch of bananas and a
variable-speed cordless drill.


Question:
How does a problem-solving agent handle this?
[Figure: the enormous search tree of a problem-solving agent.
From Start, actions like “Talk to Parrot”, “Go To Pet Store / Buy a
Dog”, “Go To School / Go To Class”, “Go To Supermarket / Buy
Tuna Fish / Buy Arugula / Buy Milk”, “Go To Sleep”, “Read A
Book”, “Sit in Chair”, “Etc.” branch out in all directions – almost
none of them lead towards Finish.]


Planning in the situation calculus:

Initial state: At (Home, S0 ), ¬Have(Milk , S0 ),


¬Have(Bananas, S0 ), ¬Have(Drill , S0 ).
Goal: ∃s (At(Home, s) ∧ Have(Milk, s) ∧
Have(Bananas, s) ∧ Have(Drill, s)).
Axioms: e.g. “buy milk”:
∀a, s Have(Milk , result (a, s)) ↔
(a = Buy (Milk ) ∧ At (Supermarket , s))∨
(Have(Milk , s) ∧ a 6= Drop(Milk ))


We also need a term Result′(l, s): the situation that
emerges when the sequence of actions l is
executed in s.

∀s Result′([], s) := s
∀a, p, s Result′([a|p], s) := Result′(p, result(a, s))


The task is now to find a p with

At(Home, Result′(p, S0)) ∧
Have(Milk, Result′(p, S0)) ∧
Have(Bananas, Result′(p, S0)) ∧
Have(Drill, Result′(p, S0)).
Idea:
This is the solution of our problem! Now we start our
theorem prover and get the answer, don’t we?


Problems:
T ⊢ φ is only semi-decidable.
If p is a plan, then so are [Empty_Action|p] and
[A, A⁻¹|p].

Result:
We should not resort to a general prover, but to one
which is specially designed for our special case. We
also should restrict the language.


We should make clear the difference between
shipping a goal to a planner and
asking a query to a theorem prover.

In the first case we look for a plan so that after the


execution of the plan the goal holds.
In the second case we ask if the query can be made
true wrt the KB: KB |= ∃x φ(x ).
The dynamics is in the terms. The logic itself is static.

7. Planning 2. STRIPS

7.2 STRIPS


STRIPS stands for STanford Research Institute
Problem Solver.

states: conjunctions of function-free ground
atoms (positive literals).
goal: conjunctions of function-free literals.
actions: STRIPS operators consist of three
components:
1. description, the name of the action
2. precondition, a conjunction of atoms
3. postcondition, a conjunction of literals


At(here), Path(here, there)
Go(there)
At(there), ¬At(here)


Example 110 (Air Cargo Transportation)


Three actions: Load , Unload , Fly . Two predicates
In(c , p) meaning cargo c is inside plane p and
At (x , a) meaning object x is at airport a.
Is cargo c at airport a when it is loaded in a plane p?


The air cargo transportation problem as a STRIPS description:

Init(At(C1, SFO) ∧ At(C2, JFK) ∧ At(P1, SFO) ∧ At(P2, JFK)
  ∧ Cargo(C1) ∧ Cargo(C2) ∧ Plane(P1) ∧ Plane(P2)
  ∧ Airport(JFK) ∧ Airport(SFO))
Goal(At(C1, JFK) ∧ At(C2, SFO))
Action(Load(c, p, a),
  PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: ¬At(c, a) ∧ In(c, p))
Action(Unload(c, p, a),
  PRECOND: In(c, p) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
  EFFECT: At(c, a) ∧ ¬In(c, p))
Action(Fly(p, from, to),
  PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
  EFFECT: ¬At(p, from) ∧ At(p, to))

Example 111 (Spare Tire)


Four actions: Remove(Spare, Trunk), Remove(Flat, Axle),
PutOn(Spare, Axle) and LeaveOvernight. One predicate
At(x, a) meaning object x is at location a.
Is the following a STRIPS description?


Init(At(Flat, Axle) ∧ At(Spare, Trunk))
Goal(At(Spare, Axle))
Action(Remove(Spare, Trunk),
  PRECOND: At(Spare, Trunk)
  EFFECT: ¬At(Spare, Trunk) ∧ At(Spare, Ground))
Action(Remove(Flat, Axle),
  PRECOND: At(Flat, Axle)
  EFFECT: ¬At(Flat, Axle) ∧ At(Flat, Ground))
Action(PutOn(Spare, Axle),
  PRECOND: At(Spare, Ground) ∧ ¬At(Flat, Axle)
  EFFECT: ¬At(Spare, Ground) ∧ At(Spare, Axle))
Action(LeaveOvernight,
  PRECOND:
  EFFECT: ¬At(Spare, Ground) ∧ ¬At(Spare, Axle) ∧ ¬At(Spare, Trunk)
          ∧ ¬At(Flat, Ground) ∧ ¬At(Flat, Axle))

ADL (Action Description Language) and its many


derivatives are extensions of STRIPS:
States: both positive and negative literals in states.
OWA: Open World assumption (not CWA)
Effects: Add all positive and negative literals, and
delete their negations.
Quantification: Quantified variables in goals are
allowed.
Goals: disjunction and negation also allowed.
when P: Conditional effects allowed.


Example 112 (Blocks World)


One action: Move. Move(b, x , y ) moves the block b
from x to y if both b and y are clear. One predicate
On(b, x ) meaning block b is on x (x can be another
block or the table).

How to formulate that a block is clear?


“b is clear”: ∀x ¬On(x , b). Not allowed in STRIPS.
Therefore we introduce another predicate Clear (y ).
What about Move(b, x , y ) defined by
Precondition: On(b, x ) ∧ Clear (b) ∧ Clear (y ) and
Effect:
On(b, y ) ∧ Clear (x ) ∧ ¬On(b, x ) ∧ ¬Clear (y )?

(a) Progression (forward) and (b) Regression


(backward) state-space search
At(P1 , B)
Fly(P1 ,A,B) At(P2 , A)
At(P1 , A)
(a)
At(P2 , A)
Fly(P2 ,A,B) At(P1 , A)
At(P2 , B)

At(P1 , A)
At(P2 , B) Fly(P1 ,A,B)
At(P1 , B)
(b)
At(P2 , B)
At(P1 , B) Fly(P2 ,A,B)

At(P2 , A)


The blocks world as a STRIPS problem:

Init(On(A, Table) ∧ On(B, Table) ∧ On(C, Table)
  ∧ Block(A) ∧ Block(B) ∧ Block(C)
  ∧ Clear(A) ∧ Clear(B) ∧ Clear(C))
Goal(On(A, B) ∧ On(B, C))
Action(Move(b, x, y),
  PRECOND: On(b, x) ∧ Clear(b) ∧ Clear(y)
  EFFECT: On(b, y) ∧ Clear(x) ∧ ¬On(b, x) ∧ ¬Clear(y))
Action(MoveToTable(b, x),
  PRECOND: On(b, x) ∧ Clear(b)
  EFFECT: On(b, Table) ∧ Clear(x) ∧ ¬On(b, x))


Definition 113 (Applicable Operator)

An operator Op is applicable in a state s if there is
some way to instantiate the variables in Op so that
every one of the preconditions of Op is true in
s: Precond(Op) ⊆ s.
In the resulting state, all the positive literals in
Effect(Op) hold, as do all the literals that held in s,
except for those that are negative literals in
Effect(Op).
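For ground (fully instantiated) operators, Definition 113 is a few lines of code; a sketch (the set encoding of states and the simplified spare-tire operators are illustrative):

```python
# Applying a ground STRIPS operator: the successor state keeps
# everything in s except the deleted literals and adds the positive
# effect literals.

def applicable(state, op):
    return op["pre"] <= state            # Precond(Op) subset of s

def apply_op(state, op):
    assert applicable(state, op)
    return (state - op["del"]) | op["add"]

remove_spare = {"pre": {("At", "Spare", "Trunk")},
                "del": {("At", "Spare", "Trunk")},
                "add": {("At", "Spare", "Ground")}}
put_on_spare = {"pre": {("At", "Spare", "Ground")},
                "del": {("At", "Spare", "Ground")},
                "add": {("At", "Spare", "Axle")}}

s = {("At", "Flat", "Axle"), ("At", "Spare", "Trunk")}
print(applicable(s, put_on_spare))       # False: precondition not satisfied
s = apply_op(s, remove_spare)
s = apply_op(s, put_on_spare)
print(("At", "Spare", "Axle") in s)      # True
```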


The frame problem is handled implicitly: literals not
mentioned in effects remain unchanged
(persistence).
Effect is sometimes split into add and delete lists.
Up to now we can consider this as
problem-solving. We use STRIPS as a
representation formalism and search for a solution path:
nodes in the search tree ≈ situations
solution paths ≈ plans.

Idea:
Perform a search in plan-space!

Begin with a partial plan and extend successively.


Therefore we need operators which operate on
plans.We distinguish two types:
refinement-op: constraints are attached to a plan.
Then a plan represents the set of all
complete plans (analogously Cn(T ) for
MOD (T )).
modification-op: all others.


Question:
How do we represent plans?

Answer:
We have to consider two things:
instantiation of variables: instantiate only if
necessary, i.e. always choose the mgU
partial order: refrain from the exact ordering
(reduces the search-space)


Definition 114 (Plan)


A plan is formally defined as a data structure consisting of the
following four components:
A set of plan steps. Each step is one of the operators for
the problem.
A set of step ordering constraints. Each one of the form
Si ≺ Sj : Si has to be executed before Sj .
A set of variable binding constraints. Each one of the form
v = x: v is a variable in some step, and x is either a
constant or another variable.
A set of causal links. Each one of the form Si —c→ Sj : Si
achieves c for Sj .


The initial plan consists of two steps, called START


and FINISH.

[Figure: (a) the generic initial plan: a Start step (effects: the initial
state) ordered before a Finish step (preconditions: the goal state);
(b) for the shoe problem, Finish has the preconditions
LeftShoeOn, RightShoeOn.]


The initial plan of a problem is:

Plan( Steps: {S1 : START, S2 : FINISH},
      Orderings: {S1 ≺ S2},
      Bindings: ∅,
      Links: ∅ )

[Figure: a partial-order plan (Start; LeftSock before LeftShoe,
RightSock before RightShoe; Finish with preconditions
LeftShoeOn, RightShoeOn) and its six total-order linearizations.]

Question:
What is a solution?

Answer:
Considering only fully instantiated, linearly ordered plans:
checking is easy.

But our case is far more complicated:


Definition 115 (Solution of a Plan)
A solution is a complete and consistent plan. Here complete
means that each precondition of each step is achieved by some
other step and consistent means that there are no contradictions
in the ordering or binding constraints.


More precisely:
“Si achieves c for Sj ” means
c ∈ Precondition(Sj ),
Si ≺ Sj ,
there is no Sk with ¬c ∈ Effect(Sk ) and Si ≺ Sk ≺ Sj in any
linearization of the plan.
“No contradictions” means
neither (Si ≺ Sj and Sj ≺ Si ) nor (v = A and v = B for different
constants A, B).
Note: these propositions may be derivable, because
≺ and = are transitive.
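Consistency of the ordering constraints (no Si ≺ Sj and Sj ≺ Si, even via transitivity) amounts to cycle detection; a sketch using Python's graphlib on the sock/shoe plan:

```python
# A partial order of plan steps is consistent iff it has a linearization,
# i.e. the ordering graph is acyclic.
from graphlib import TopologicalSorter, CycleError

def linearize(steps, orderings):
    ts = TopologicalSorter({s: set() for s in steps})
    for before, after in orderings:
        ts.add(after, before)              # 'before' must precede 'after'
    try:
        return list(ts.static_order())
    except CycleError:
        return None                        # contradictory constraints

steps = ["Start", "LeftSock", "LeftShoe", "RightSock", "RightShoe", "Finish"]
orderings = [("Start", "LeftSock"), ("LeftSock", "LeftShoe"),
             ("Start", "RightSock"), ("RightSock", "RightShoe"),
             ("LeftShoe", "Finish"), ("RightShoe", "Finish")]

order = linearize(steps, orderings)
print(order.index("LeftSock") < order.index("LeftShoe"))   # True
print(linearize(["A", "B"], [("A", "B"), ("B", "A")]))     # None
```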

7. Planning 3. Partial-Order Planning

7.3 Partial-Order Planning


We consider the banana-milk example. The operators


are Buy and Go.

Start

At(Home) Sells(SM,Banana) Sells(SM,Milk) Sells(HWS,Drill)

Have(Drill) Have(Milk) Have(Banana) At(Home)

Finish

Causal links are protected!



[Figure: two refinements of the initial plan. In the first, new steps Go(HWS) and Go(SM) each have an open precondition At(x); Buy(Drill) requires At(HWS) and Sells(HWS,Drill), Buy(Milk) requires At(SM) and Sells(SM,Milk), and Buy(Bananas) requires At(SM) and Sells(SM,Bananas). In the second, both At(x) preconditions are achieved from Start via At(Home).]


The last partial plan cannot be extended. How do we determine this? By determining that a causal link is threatened and that the threat cannot be resolved.

[Figure: (a) a step S3 with effect ¬c threatens the causal link from S1 to S2 that protects c; the threat is resolved either by ordering S3 before S1 or by ordering S3 after S2, shown in (b) and (c).]

The way to resolve the threat is to add ordering constraints (this will not always work!).
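Whether such an ordering constraint may be added is a pure consistency question: the ordering must stay acyclic. A sketch using Python's standard graphlib (the step names and the pair encoding of constraints are illustrative):

```python
from graphlib import TopologicalSorter, CycleError

def consistent(orderings):
    """True iff the ordering constraints, given as (before, after)
    pairs, admit a linearization, i.e. contain no cycle."""
    ts = TopologicalSorter()
    for before, after in orderings:
        ts.add(after, before)  # 'before' is a predecessor of 'after'
    try:
        ts.prepare()
        return True
    except CycleError:
        return False

def resolve_threat(orderings, s_threat, s_i, s_j):
    """Try promotion (S_threat before S_i), then demotion
    (S_j before S_threat); return the extended orderings or None."""
    for extra in [(s_threat, s_i), (s_j, s_threat)]:
        if consistent(orderings | {extra}):
            return orderings | {extra}
    return None  # neither choice is consistent: backtrack

base = {("Si", "Sj"), ("Si", "St")}  # St is already after Si
# Promotion (St before Si) would create a cycle, so demotion is chosen:
print(resolve_threat(base, "St", "Si", "Sj"))
```

Using a topological sort as the CONSISTENT check mirrors the definition of consistency from Definition 115: a plan ordering is consistent exactly when some linearization exists.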


Question:
We have to introduce a Go-step in order to ensure the last
precondition. But how can we ensure the precondition of the
Go-step?

Now there are a lot of threats, and many of them are unresolvable. This leads to:

[Figure: the plan extended with a Go(Home) step that achieves At(Home) for Finish; its own precondition At(SM) is achieved by Go(SM), whose precondition At(HWS) is in turn achieved by Go(HWS).]


[Figure: the complete solution plan. Start precedes Go(HWS); Buy(Drill) uses At(HWS) and Sells(HWS,Drill); Go(SM) follows with precondition At(HWS); Buy(Milk) and Buy(Ban.) use At(SM) and the respective Sells facts; Go(Home) with precondition At(SM) achieves At(Home); Finish requires Have(Milk), At(Home), Have(Ban.), Have(Drill).]


This leads to the following algorithm:

In each round the plan is extended in order to ensure the precondition of a step. This is done by choosing an appropriate operator.
The respective causal link is introduced. Threats are resolved through ordering constraints (two cases: the new step threatens existing ones, or the existing ones threaten the new one).
If there is no operator or the threat cannot be resolved, then perform backtracking.

Theorem 116 (POP)
POP is complete and correct.

function POP(initial, goal, operators) returns plan

  plan ← MAKE-MINIMAL-PLAN(initial, goal)
  loop do
    if SOLUTION?(plan) then return plan
    S_need, c ← SELECT-SUBGOAL(plan)
    CHOOSE-OPERATOR(plan, operators, S_need, c)
    RESOLVE-THREATS(plan)
  end

function SELECT-SUBGOAL(plan) returns S_need, c

  pick a plan step S_need from STEPS(plan)
    with a precondition c that has not been achieved
  return S_need, c

procedure CHOOSE-OPERATOR(plan, operators, S_need, c)

  choose a step S_add from operators or STEPS(plan) that has c as an effect
  if there is no such step then fail
  add the causal link S_add --c--> S_need to LINKS(plan)
  add the ordering constraint S_add ≺ S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)

  for each S_threat that threatens a link S_i --c--> S_j in LINKS(plan) do
    choose either
      Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
      Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
    if not CONSISTENT(plan) then fail
  end

So far we did not consider variable substitutions.

Question:
Assume S1 achieves the At (home) precondition of a step S2 , and there is a concurrent step S3 with the postcondition ¬At (x ). Is this a threat?


We call such a threat possible and ignore it for the time being,
but keep it in mind. But if x is instantiated with home then a real
threat emerges which has to be resolved.

[Figure: S1 achieves At(home) for S2 while a concurrent step S3 has the effect ¬At(x): a possible threat.]
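A possible threat can simply be re-checked whenever the bindings grow. A small sketch (the term representation and the function names are assumptions):

```python
def substitute(bindings, term):
    """Apply variable bindings to a (predicate, argument) term."""
    pred, arg = term
    return (pred, bindings.get(arg, arg))

def real_threat(neg_effect, condition, bindings):
    """A negated effect is a real threat to the protected condition
    iff both terms become equal under the current bindings."""
    return substitute(bindings, neg_effect) == substitute(bindings, condition)

# S3's effect is ¬At(x); the protected condition is At(home):
print(real_threat(("At", "x"), ("At", "home"), {}))             # False: only possible
print(real_threat(("At", "x"), ("At", "home"), {"x": "home"}))  # True: must be resolved
```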


procedure CHOOSE-OPERATOR(plan, operators, S_need, c)

  choose a step S_add from operators or STEPS(plan) that has c_add as an effect
    such that u = UNIFY(c, c_add, BINDINGS(plan))
  if there is no such step then fail
  add u to BINDINGS(plan)
  add S_add --c--> S_need to LINKS(plan)
  add S_add ≺ S_need to ORDERINGS(plan)
  if S_add is a newly added step from operators then
    add S_add to STEPS(plan)
    add Start ≺ S_add ≺ Finish to ORDERINGS(plan)

procedure RESOLVE-THREATS(plan)

  for each S_i --c--> S_j in LINKS(plan) do
    for each S_threat in STEPS(plan) do
      for each c′ in EFFECT(S_threat) do
        if SUBST(BINDINGS(plan), c) = SUBST(BINDINGS(plan), ¬c′) then
          choose either
            Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
            Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
          if not CONSISTENT(plan) then fail
      end
    end
  end

STRIPS was originally designed for SHAKEY, a small mobile robot. SHAKEY is described by six operators: Go(x ), Push(b, x , y ), Climb(b), Down(b), Turn_On(ls), Turn_Off (ls).

To turn a light on or off, SHAKEY has to stand on a box. On(Shakey , floor ) is a precondition of the Go-action so that SHAKEY does not fall off a box.

[Figure: SHAKEY's world. Rooms 1 to 4 lie along a corridor, each with a door (Door 1 to Door 4) and a light switch (Ls1 to Ls4). SHAKEY is in room 3; the boxes Box1 to Box4 are in room 1.]

POP for ADL

[Figure: a partial plan for the tire problem. Start has the effects At(Spare,Trunk) and At(Flat,Axle); Remove(Spare,Trunk) achieves At(Spare,Ground) for PutOn(Spare,Axle), which achieves At(Spare,Axle) for Finish.]

[Figure: the same plan with a LeaveOvernight step added. Its effects negate At(Spare,Ground), At(Spare,Axle), At(Spare,Trunk), At(Flat,Axle) and At(Flat,Ground), threatening the causal links.]

[Figure: the resulting solution. Remove(Spare,Trunk) and Remove(Flat,Axle) are both ordered before PutOn(Spare,Axle), which achieves At(Spare,Axle) for Finish.]


7. Planning 4. Conditional Planning

7.4 Conditional Planning


Question:
What to do if the world is not fully accessible? Where do we get milk? The price of milk may have doubled, and we may not have enough money.

Idea:
Introduce new actions to query certain conditions and to react accordingly. (How much does milk cost?)


A flat tire
Op( Action : Remove(x ),
    Prec : On(x ),
    Effect : Off (x ) ∧ Clear_Hub ∧ ¬On(x ))
Op( Action : Put_on(x ),
    Prec : Off (x ) ∧ Clear_Hub,
    Effect : On(x ) ∧ ¬Clear_Hub ∧ ¬Off (x ))
Op( Action : Inflate(x ),
    Prec : Intact (x ) ∧ Flat (x ),
    Effect : Inflated (x ) ∧ ¬Flat (x ))

goal: On(x ) ∧ Inflated (x ),
initial state: Inflated (Spare) ∧ Intact (Spare) ∧ Off (Spare) ∧ On(Tire1 ) ∧ Flat (Tire1 ).
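Ground instances of these operators can be written directly as (precondition, add, delete) sets, with the negated effects as delete lists. The encoding below is a sketch of the state-progression semantics, not notation from the lecture:

```python
# Flat-tire operators as (precondition, add, delete) sets.
ops = {
    "Remove(Tire1)": ({"On(Tire1)"},
                      {"Off(Tire1)", "Clear_Hub"}, {"On(Tire1)"}),
    "Put_on(Spare)": ({"Off(Spare)", "Clear_Hub"},
                      {"On(Spare)"}, {"Clear_Hub", "Off(Spare)"}),
    "Inflate(Tire1)": ({"Intact(Tire1)", "Flat(Tire1)"},
                       {"Inflated(Tire1)"}, {"Flat(Tire1)"}),
}

def apply_op(op, state):
    """Apply a STRIPS operator, or return None if it is not applicable."""
    pre, add, delete = op
    if not pre <= state:
        return None
    return (state - delete) | add

initial = {"Inflated(Spare)", "Intact(Spare)", "Off(Spare)",
           "On(Tire1)", "Flat(Tire1)"}
# Inflate(Tire1) is not applicable: Intact(Tire1) is not in the state.
# Remove(Tire1) followed by Put_on(Spare) reaches the goal for x = Spare:
state = apply_op(ops["Put_on(Spare)"],
                 apply_op(ops["Remove(Tire1)"], initial))
print({"On(Spare)", "Inflated(Spare)"} <= state)  # True
```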

Question:
What does POP deliver?

Answer:
Because Intact (Tire1 ) is not present in the initial state, POP delivers the plan

[Remove(Tire1 ), Put_on(Spare)].


Question:
Would you also do it like that?

Answer:
A better way would be a conditional plan:

If Intact (Tire1 ) then: Inflate(Tire1 ).


Therefore we have to allow conditional steps during plan construction:

Definition 117 (Conditional Step)
A conditional step in a plan has the form
If({Condition}, {Then_Part }, {Else_Part })
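At execution time such a step is resolved by querying the world and continuing with the matching branch. A sketch (the tuple encoding of If-steps and the holds callback are assumptions):

```python
def execute(plan, holds):
    """Flatten a conditional plan into the actions actually executed,
    using holds(condition) to query the world at runtime."""
    actions = []
    for step in plan:
        if isinstance(step, tuple) and step[0] == "If":
            _, cond, then_part, else_part = step
            branch = then_part if holds(cond) else else_part
            actions += execute(branch, holds)
        else:
            actions.append(step)
    return actions

plan = [("If", "Intact(Tire1)",
         ["Inflate(Tire1)"],
         ["Remove(Tire1)", "Put_on(Spare)"])]
print(execute(plan, lambda c: c == "Intact(Tire1)"))  # ['Inflate(Tire1)']
```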


The respective planning agent looks like this:

function CONDITIONAL-PLANNING-AGENT(percept) returns an action
  static: KB, a knowledge base (includes action descriptions)
          p, a plan, initially NoPlan
          t, a counter, initially 0, indicating time
          G, a goal

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  current ← STATE-DESCRIPTION(KB, t)
  if p = NoPlan then p ← CPOP(current, G, KB)
  if p = NoPlan or p is empty then action ← NoOp
  else
    action ← FIRST(p)
    while CONDITIONAL?(action) do
      if ASK(KB, CONDITION-PART[action])
        then p ← APPEND(THEN-PART[action], REST(p))
        else p ← APPEND(ELSE-PART[action], REST(p))
      action ← FIRST(p)
    end
    p ← REST(p)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

The agent has to know, when executing the plan (at runtime), whether the respective if-condition holds. Therefore we introduce new checking actions:

Op( Action : Check _Tire(x ),
    Prec : Tire(x ),
    Effect : “We know whether Intact (x ) holds or not.”)

Definition 118 (Context)
We associate a context with each step: the set of conditions which have to hold before the step is executed.

[Figure: two partial plans. In the first, Finish requires On(x) and Inflated(x) under the context (True). In the second, x is instantiated to Tire1 and an Inflate(Tire1) step is added; its precondition Intact(Tire1) gives Finish the context (Intact(Tire1)).]

Here POP would backtrack, because Intact (Tire1 ) cannot be shown. Hence we introduce a new type of link: conditional links.


[Figure: the plan with a Check(Tire1) step added. A conditional link connects its outcome Intact(Tire1) to Inflate(Tire1); Finish keeps the context (Intact(Tire1)).]

We have to cover each case:

[Figure: the same plan together with a second Finish step requiring On(x) and Inflated(x) under the context (¬Intact(Tire1)).]

Question:
What if new contexts emerge in the second case?

Answer:
Then we have to introduce new copies of the FINISH
step: there must be a complete distinction of cases in
the end.


Question:
How can we make Inflated (x ) true in the second FINISH step? Adding the step Inflate(Tire1 ) would not make sense, because its preconditions are inconsistent with the context.

[Figure: the plan with the second Finish step instantiated to On(Spare) and Inflated(Spare) under the context (¬Intact(Tire1)).]

At last On(Spare):

[Figure: the complete conditional plan. After Check(Tire1), the branch with context (Intact(Tire1)) inflates Tire1 and reaches the first Finish; the branch with context (¬Intact(Tire1)) executes Remove(Tire1) and Puton(Spare) and reaches the second Finish with On(Spare) and Inflated(Spare).]

Attention:
At first the context “True” is attached to Remove(Tire1 ) and Put_on(Spare). This leads to a threat against the topmost causal link.

We can resolve this threat by making the contexts incompatible. The respective contexts are handed down.

More exactly:
Search for a conditional step whose precondition makes the contexts incompatible and thereby resolves the threat.
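If contexts are sets of ground literals, compatibility is a simple syntactic check: two contexts are incompatible as soon as one contains the negation of a literal in the other. A small sketch (the '~' prefix encoding of negation is an assumption):

```python
def negate(lit):
    """Negation encoded as a '~' prefix on the literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def compatible(ctx1, ctx2):
    """Two contexts are compatible iff no literal of one
    occurs negated in the other."""
    return all(negate(l) not in ctx2 for l in ctx1)

print(compatible({"Intact(Tire1)"}, {"~Intact(Tire1)"}))  # False
print(compatible({"Intact(Tire1)"}, {"Flat(Tire1)"}))     # True
```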


function CPOP(initial, goals, operators) returns plan

  plan ← MAKE-PLAN(initial, goals)
  loop do
    Termination:
      if there are no unsatisfied preconditions
         and the contexts of the finish steps are exhaustive
      then return plan
    Alternative context generation:
      if the plans for existing finish steps are complete and have contexts C1 . . . Cn then
        add a new finish step with context ¬(C1 ∨ . . . ∨ Cn)
        this becomes the current context
    Subgoal selection and addition:
      find a plan step S_need with an open precondition c
    Action selection:
      choose a step S_add from operators or STEPS(plan) that adds c
        or knowledge of c and has a context compatible with the current context
      if there is no such step then fail
      add S_add --c--> S_need to LINKS(plan)
      add S_add ≺ S_need to ORDERINGS(plan)
      if S_add is a newly added step then
        add S_add to STEPS(plan)
        add Start ≺ S_add ≺ Finish to ORDERINGS(plan)
    Threat resolution:
      for each step S_threat that potentially threatens any causal link S_i --c--> S_j
          with a compatible context do
        choose one of
          Promotion: add S_threat ≺ S_i to ORDERINGS(plan)
          Demotion: add S_j ≺ S_threat to ORDERINGS(plan)
          Conditioning:
            find a conditional step S_cond possibly before both S_threat and S_j, where
              1. the context of S_cond is compatible with the contexts of S_threat and S_j;
              2. the step has outcomes consistent with S_threat and S_j, respectively
            add conditioning links for the outcomes from S_cond to S_threat and S_j
            augment and propagate the contexts of S_threat and S_j
        if no choice is consistent then fail
      end
  end


7. Planning 5. Extensions

7.5 Extensions


1. A plan can fail for the following reasons:
   Actions may have unexpected effects, but these can be enumerated (as a disjunction).
   The unexpected effects are not known in advance. Then we have to replan.

function REPLANNING-AGENT(percept) returns an action

  static: KB, a knowledge base (includes action descriptions)
          p, an annotated plan, initially NoPlan
          q, an annotated plan, initially NoPlan
          G, a goal

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  current ← STATE-DESCRIPTION(KB, t)
  if p = NoPlan then
    p ← PLANNER(current, G, KB)
    q ← p
  if p = NoPlan or p is empty then return NoOp
  if PRECONDITIONS(p) not currently true in KB then
    p′ ← CHOOSE-BEST-CONTINUATION(current, q)
    p ← APPEND(PLANNER(current, PRECONDITIONS(p′), KB), p′)
    q ← p
  action ← FIRST(p)
  p ← REST(p)
  return action

I.e. we perceive, and then plan only if something has changed.

2. Combine replanning and conditional planning. Planning and execution are integrated.


