
AI Systems and Definitions

A good general definition of AI could be: AI is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour - understanding language, learning, reasoning and solving problems. There are many other suitable definitions -- see last year's AI1 notes (Lecture 1). A theme we will develop in this course is that most AI systems can be broken into:
- Search,
- Knowledge Representation,
- applications of the above.

Search was studied in great depth last year (see last year's AI1 notes, Lecture 3).
- Knowledge representation deals with finding a means of encoding knowledge so that a machine can use it. e.g. expert systems have to work with a knowledge base, as do many other reasoning tasks.
- We will look at knowledge representation early on in this course.
- Tasks such as planning, reasoning, learning and understanding basically involve some searching and perhaps updating of a knowledge base.
- Tasks such as vision, natural language understanding, speech recognition and robot planning also involve searching knowledge.
- We will look at some of these tasks in the remainder of the course.

An example of intelligent action


I have to go shopping in Cardiff. I need to get the following:
- The latest Soundgarden CD.
- A birthday present for a friend.
- This month's Sound on Sound magazine.
- A bottle of wine for tonight's dinner.

What present do I get my friend? I know he likes modern Jazz. I know Herbie Hancock has a new CD out. I decide to get that for him. What wine do I get?

I am cooking beef with black bean sauce tonight, so a good red wine would be in order. A French Burgundy would be nice. What reasoning has gone on so far? Quite a lot:
- I know my friend likes Jazz.
- Recently, from the Internet newsgroups, I have updated my knowledge: Herbie Hancock has a new CD out.
- The fact that it is new means that there is a good chance he does not have it already.
- (Also I do not have it, so I can tape it.)
- I know what meal I intend to cook.
- Red wine goes well with red meat, and I know that Burgundy is a nice wine.

Where to buy? There are many possibilities:


- I know Spillers Records is generally the cheapest store with a decent stock.
- But I also know they do not have a large stock of Jazz CDs.
- I therefore decide to try Spillers first for both CDs.
- The Virgin Megastore has the largest selection of Jazz CDs, so I'll go there if not successful.
- Since I am going to the bottom area of Cardiff, I decide to try Oddbins for the wine.

What reasoning has gone on now?


- We have now decided which stores to visit and our basic route. This is a planning problem.
- This is basically a search problem:
  o there are a variety of possible shops;
  o I choose them to allow effective shopping.
- A lot of knowledge has been used (e.g.):
  o The locations of the shops.
  o What they stock (and general prices).
- We have used this knowledge to help us select our shops:
  o Generally, unconstrained search is hard -- a hallmark of intelligence is the use of knowledge to make search problems more tractable.

How could a computer achieve this? NOTE: This would be a very difficult task.

1. Capture knowledge -- encode knowledge in a ``language''. At least two forms of knowledge:
   o Certain -- my friend likes Jazz; street plan of Cardiff.
   o Probable -- I can find a suitable CD in Cardiff.
   Knowledge encoding is a difficult task. We will see this later in the course.
2. Search knowledge for a solution. Could do this by:
   o Enumerating every combination of CD shop, wine merchant, newsagent, OR
   o Using information about shop locations etc. to constrain our search. OUR CHOSEN METHOD.

Summary
Typical problem solving (and hence many AI) tasks can be commonly reduced to:
- Knowledge Representation, and
- Search.

Some problems highlight search whilst others highlight knowledge representation.


Our way forward

There are many methods of search and knowledge representation. In last year's course, search was studied extensively; the Machine Learning and Optimisation course also deals with this. Last year you also studied Natural Language Understanding, which is an example of searching a fairly well defined knowledge space, and some problem solving methodologies were looked at. This year we will start by looking at knowledge representation, then look at how we reason with knowledge (i.e. effectively search the knowledge base), and then look at these topics in action.

Further reading
Good introductory AI material can be found in most of the recommended books.

Exercise
1. Try to figure out what basic AI processes would be involved in the following:
   o Translating English sentences into, say, Japanese.
   o Teaching a child to subtract integers.
   o Solving a crossword puzzle.
2. Scan through The Handbook of AI volumes or some recent AI journal, pick a survey of a given topic (e.g. some expert system, vision, planning, robotics etc.) and
   o analyse what basic AI processes are involved;
   o determine which application domains they can be / have been employed in;
   o determine the current state of understanding of the topic;
   o compare the current state of understanding to normal human abilities.

AI Key Concepts - Revisited


In this lecture we will briefly review key AI concepts met in last year's course. We will summarise how they work and highlight how they are employed in topics we will meet later in this course. Please refer to last year's course and recommended books for full details.

- Problems and Search
  o Problem Definition
  o Searching
  o Heuristic Search
- Knowledge Representation and Search
  o And-Or Graphs
  o AO* Algorithm
- Means-Ends Analysis
- Constraint Satisfaction
- Why are these topics important?
- Further reading
- Exercises

Problems and Search


Problem solving involves:
- problem definition -- detailed specification of inputs and what constitutes an acceptable solution;
- problem analysis;
- knowledge representation;
- problem solving -- selection of the best techniques.

Problem Definition

In order to solve the problem play a game (restricted here to two-person table or board games, a topic fully discussed in last year's course), we require the rules of the game and the targets for winning, as well as a means of representing positions in the game. The opening position can be defined as the initial state and a winning position as a goal state (there can be more than one). Legal moves allow for transfer from the initial state to other states leading to a goal state. However, the rules are far too copious in most games, especially chess, where the number of possible positions exceeds the number of particles in the universe. Thus the rules cannot in general be supplied as an exhaustive list and computer programs cannot easily handle them; storage also presents a problem, although searching can be helped by hashing. The number of rules that are used must be minimised, and the set can be produced by expressing each rule in as general a form as possible. Representing games in this way leads to a state space representation, which is natural for well organised games with some structure. This representation allows for the formal definition of a problem as the movement from a set of initial positions to one of a set of target positions. It means that the solution involves using known techniques and a systematic search. This is quite a common method in AI.
- Well organised problems (e.g. games) can be described as a set of rules.
- Rules can be generalised and represented as a state space representation:
  o formal definition;
  o move from initial states to one of a set of target positions;
  o the move is achieved via a systematic search.

Searching

There are 2 basic ways to perform a search:


- Blind search -- can only move according to position in the search.
- Heuristic search -- uses domain-specific information to decide where to search next.

Blind Search

Depth-First Search

1. Set L to be a list of the initial nodes in the problem.
2. If L is empty, fail; otherwise pick the first node n from L.
3. If n is a goal state, quit and return the path from the initial node.
4. Otherwise remove n from L and add to the FRONT of L all of n's children, labelling each with its path from the initial node. Return to 2.

Note: All numbers in Fig 1 refer to the order visited in the search.

Breadth-First Search

1. Set L to be a list of the initial nodes in the problem.
2. If L is empty, fail; otherwise pick the first node n from L.
3. If n is a goal state, quit and return the path from the initial node.
4. Otherwise remove n from L and add to the END of L all of n's children, labelling each with its path from the initial node. Return to 2.

Note: All numbers in Fig 1 refer to order visited in search.
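The two blind searches above differ only in where n's children join L. A minimal sketch of both, assuming a made-up example graph, start node and goal set:

```python
# A sketch of the blind searches above. The graph, start and goal below
# are made-up examples; only the position where children are added to L
# differs between depth-first and breadth-first search.

def blind_search(graph, start, goals, depth_first=True):
    # L holds (node, path-from-initial-node) pairs
    L = [(start, [start])]
    visited = set()
    while L:
        node, path = L.pop(0)          # always pick the first node on L
        if node in goals:
            return path                # quit and return path from initial node
        if node in visited:
            continue
        visited.add(node)
        children = [(c, path + [c]) for c in graph.get(node, [])]
        if depth_first:
            L = children + L           # add children to the FRONT of L
        else:
            L = L + children           # add children to the END of L
    return None                        # L is empty: fail

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['E'], 'D': [], 'E': []}
print(blind_search(graph, 'A', {'E'}, depth_first=False))  # ['A', 'C', 'E']
```

The `visited` set guards against nodes generated at more than one level of the graph (compare Exercise 8 below).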


Heuristic Search

A heuristic is a method that

- might not always find the best solution
- but is guaranteed to find a good solution in reasonable time.
- By sacrificing completeness it increases efficiency.
- It is useful in solving tough problems which
  o could not be solved any other way, or
  o whose solutions take an infinite time or a very long time to compute.

The classic example of heuristic search methods is the travelling salesman problem.

Heuristic Search Methods

Generate and Test Algorithm

1. Generate a possible solution, which can either be a point in the problem space or a path from the initial state.
2. Test to see if this possible solution is a real solution by comparing the state reached with the set of goal states.
3. If it is a real solution, return. Otherwise repeat from 1.
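The generate-and-test loop above can be sketched in a few lines. The toy problem below (arranging letters to spell a target word) is a made-up stand-in for a real problem space:

```python
# A sketch of generate-and-test: enumerate candidate solutions and test
# each against the goal. The letter-arrangement problem is illustrative.
from itertools import permutations

def generate_and_test(items, is_solution):
    for candidate in permutations(items):   # 1. generate a possible solution
        if is_solution(candidate):          # 2. test against the goal states
            return candidate                # 3. real solution: return it
    return None                             # otherwise exhausted: fail

result = generate_and_test('tca', lambda c: ''.join(c) == 'cat')
print(result)  # ('c', 'a', 't')
```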

This method is basically a depth first search, as complete solutions must be created before testing. It is often called the British Museum method, as it is like looking for an exhibit at random. A heuristic is needed to sharpen up the search. Consider the problem of four 6-sided cubes, where each side of each cube is painted in one of four colours. The four cubes are placed next to one another and the problem lies in arranging them so that all four colours are displayed whichever way the row of cubes is viewed. The problem can only be solved if there are at least four sides coloured in each colour, and the number of options tested can be reduced using heuristics, e.g. by hiding the most popular colour against the adjacent cube.

Hill Climbing

Here the generate and test method is augmented by a heuristic function which measures the closeness of the current state to the goal state.
1. Evaluate the initial state; if it is a goal state, quit; otherwise the current state is the initial state.
2. Select a new operator for this state and generate a new state.
3. Evaluate the new state:
   o if it is closer to the goal state than the current state, make it the current state;
   o if it is no better, ignore it.
4. If the current state is a goal state or no new operators are available, quit. Otherwise repeat from 2.
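The loop above can be sketched as follows. The 1-D scoring function and the +1/-1 operators are illustrative assumptions, not from the notes:

```python
# A sketch of hill climbing: keep applying operators while they improve
# the heuristic score. The score function and operators are made up.

def hill_climb(initial, score, operators, goal_score):
    current = initial
    while True:
        if score(current) >= goal_score:
            return current                   # goal state reached: quit
        improved = False
        for op in operators:                 # 2. select a new operator...
            new = op(current)                # ...and generate a new state
            if score(new) > score(current):  # 3. closer to the goal?
                current = new                #    make it the current state
                improved = True
                break                        #    (if no better, ignore)
        if not improved:
            return current                   # 4. no operator helps: quit

# maximise -(x - 3)^2, whose peak (score 0) is at x = 3
best = hill_climb(0, lambda x: -(x - 3) ** 2,
                  [lambda x: x + 1, lambda x: x - 1], goal_score=0)
print(best)  # 3
```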

In the case of the four cubes a suitable heuristic is the sum of the number of different colours on each of the four sides, and the goal state is 16 (four on each side). The set of rules is simply: choose a cube and rotate it through 90 degrees. The starting arrangement can either be specified or chosen at random.

Simulated Annealing

This is a variation on hill climbing, and the idea is to include a general survey of the scene to avoid climbing false foothills. The whole space is explored initially, which avoids the danger of being caught on a plateau or ridge and makes the procedure less sensitive to the starting point. There are two additional changes: we go for minimisation rather than maximisation, and we use the term objective function rather than heuristic. It becomes clear that we are valley descending rather than hill climbing. The name comes from the metallurgical process of heating metals and then letting them cool until they reach a minimal-energy steady final state. The probability that the metal will jump to a higher energy level is given by p = e^(-dE/kT), where k is Boltzmann's constant, dE is the change in energy (for us, the change in the value of the objective function) and T is a type of temperature. The rate at which the system is cooled is called the annealing schedule. An example of a problem suitable for such an algorithm is the travelling salesman.

The rate of cooling clearly affects the finished product. If the rate of cooling is fast, such as when the metal is quenched in a large tank of water, the structure present at high temperatures persists at low temperature and large crystal structures exist, which is equivalent to getting stuck in a local minimum. On the other hand, if the rate of cooling is slow, as in an air based method, then a more uniform crystalline structure results, equivalent to the global minimum. The probability of making a large uphill move is lower than that of a small move, and the probability of making large uphill moves decreases with temperature. Downward moves are allowed at any time.
1. Evaluate the initial state. If it is a goal state, quit; otherwise make it the current state.
2. Set BEST_STATE to the current state.
3. Set the temperature, T, according to the annealing schedule.
4. Repeat until a solution is found or there are no more new operators:
   1. Apply an operator to produce a new state, and compute dE, the difference between the values of the current and new states.
   2. If the new state is a goal state, quit.
   3. Otherwise compare it with the current state: if it is better, set BEST_STATE to it if appropriate and make it the current state.
   4. If it is not better, make it the current state with probability p' = e^(-dE/T). This involves generating a random number in the range 0 to 1 and comparing it with p': if the number is less than p', accept the new state as the current state; otherwise do nothing.
   5. Revise T according to the annealing schedule.
5. Return BEST_STATE as the answer.
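The loop above can be sketched as follows, minimising a made-up objective f(x) = x^2 over integer states. The geometric cooling schedule and the +1/-1 move set are illustrative assumptions:

```python
# A sketch of simulated annealing: always accept downhill moves, accept
# uphill moves with probability e^(-dE/T), and cool T geometrically.
import math
import random

def simulated_annealing(initial, objective, moves,
                        t0=10.0, cooling=0.95, t_min=0.01):
    current = best = initial
    T = t0                                           # set T from the schedule
    while T > t_min:
        new = random.choice(moves)(current)          # generate a new state
        delta = objective(new) - objective(current)  # dE between states
        if delta < 0:                                # downhill: always accept
            current = new
            if objective(current) < objective(best):
                best = current                       # update BEST_STATE
        elif random.random() < math.exp(-delta / T):
            current = new                            # uphill with prob. e^(-dE/T)
        T *= cooling                                 # revise T per the schedule
    return best

random.seed(0)
print(simulated_annealing(8, lambda x: x * x,
                          [lambda x: x + 1, lambda x: x - 1]))
```

Because BEST_STATE is only ever replaced by a strictly better state, the answer is never worse than the starting point, even if the random walk wanders uphill.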

Best First Search

A combination of depth first and breadth first searches. Depth first is good because a solution can be found without computing all nodes, and breadth first is good because it does not get trapped in dead ends. Best first search allows us to switch between paths, thus gaining the benefit of both approaches. At each step the most promising node is chosen. If one of the nodes chosen generates nodes that are less promising, it is possible to choose another at the same level; in effect the search changes from depth to breadth. If on analysis these are no better, then the previously unexpanded node and branch are not forgotten: the search reverts to the descendants of the first choice and proceeds, backtracking as it were.

This process is very similar to steepest ascent, but in hill climbing once a move is chosen and the others rejected, the others are never reconsidered, whilst in best first they are saved to enable revisits if an impasse occurs on the apparent best path. Also, the best available state is selected in best first even if its value is worse than the value of the node just explored, whereas in hill climbing progress stops if there are no better successor nodes.

Best first search uses an OR graph, which avoids the problem of node duplication and assumes that each node has a parent link, to give the best node from which it came, and a link to all its successors. In this way, if a better node is found, this path can be propagated down to the successors. This method requires 2 lists of nodes:

OPEN is a priority queue of nodes that have been evaluated by the heuristic function but which have not yet been expanded into successors. The most promising nodes are at the front.

CLOSED are nodes that have already been generated; these nodes must be stored because a graph is being used in preference to a tree.

Heuristics

In order to find the most promising nodes a heuristic function is needed, called f', where f' is an approximation to f and is made up of two parts, g and h'. g is the cost of going from the initial state to the current node; g is considered simply in this context to be the number of arcs traversed, each of which is treated as being of unit weight. h' is an estimate of the cost of getting from the current node to the goal state. The function f' is the estimate of the cost of getting from the initial state to the goal state via the current node. Both g and h' are positive valued.

Best First

The Best First algorithm is a simplified form of the A* algorithm. From A* we note that f' = g + h', where g is a measure of the cost of going from the initial node to the current node and h' is an estimate of the cost from the current node to a solution. Thus f' is an estimate of the cost of going from the initial node to the solution via the current node. As an aid we take the cost of going from one node to the next to be a constant 1.

Best First Search Algorithm:
1. Start with OPEN holding the initial state.
2. Pick the best node on OPEN.
3. Generate its successors.
4. For each successor do:
   o If it has not been generated before, evaluate it, add it to OPEN and record its parent.
   o If it has been generated before, change the parent if this new path is better, and in that case update the cost of getting to any successor nodes.
5. If a goal is found or no more nodes are left in OPEN, quit; else return to 2.

The A* Algorithm

Best first search is a simplified A*.

1. Start with OPEN holding the initial nodes.
2. Pick the BEST node on OPEN such that f = g + h' is minimal.
3. If BEST is a goal node, quit and return the path from the initial node to BEST.
4. Otherwise remove BEST from OPEN and add to OPEN all of BEST's children, labelling each with its path from the initial node. Return to 2.

Graceful decay of admissibility: if h' rarely overestimates h by more than d, then the A* algorithm will rarely find a solution whose cost is more than d greater than the cost of the optimal solution.
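The A* steps above can be sketched with a priority queue for OPEN, ordered by f = g + h'. The small graph and heuristic values below are made-up examples:

```python
# A sketch of A*: OPEN is a heap ordered by f = g + h'; CLOSED records
# the best g found so far for each node. Graph and heuristic are made up.
import heapq

def a_star(graph, start, goal, h):
    # OPEN holds (f, g, node, path-from-initial-node)
    open_list = [(h(start), 0, start, [start])]
    closed = {}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)  # pick BEST: minimal f
        if node == goal:
            return path, g                           # path from initial to BEST
        if node in closed and closed[node] <= g:
            continue                                 # a cheaper path is known
        closed[node] = g
        for child, cost in graph.get(node, []):      # add BEST's children
            heapq.heappush(open_list,
                           (g + cost + h(child), g + cost, child, path + [child]))
    return None, float('inf')                        # OPEN empty: fail

graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 1), ('G', 5)], 'B': [('G', 1)]}
h = {'S': 3, 'A': 2, 'B': 1, 'G': 0}.get
path, cost = a_star(graph, 'S', 'G', h)
print(path, cost)  # ['S', 'A', 'B', 'G'] 3
```

Note how the direct edge A-G (cost 5) is passed over: its f value of 6 keeps it behind the cheaper route through B.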

Knowledge Representation and Search

- And-Or Graphs
- AO* Algorithm

And-Or Graphs

Useful for certain problems where


- The solution involves decomposing the problem into smaller problems.
- We then solve these smaller problems.

Here the alternatives often involve branches where some or all must be satisfied before we can progress. For example, if I want to learn to play a Frank Zappa guitar solo I could (Fig. 2.2.1):

- Transcribe it from the CD; OR
- Buy the ``Frank Zappa Guitar Book'' AND read it from there.

Note the use of arcs to indicate that one or more nodes must all be satisfied before the parent node is achieved. To find solutions using an And-Or graph, the best first algorithm is used as a basis, with a modification to handle the sets of nodes linked by AND arcs. Plain best first search is inadequate here: it cannot deal with the AND arcs well.
AO* Algorithm

1. Initialise the graph to the start node.
2. Traverse the graph following the current path, accumulating nodes that have not yet been expanded or solved.
3. Pick any of these nodes and expand it. If it has no successors, call this value FUTILITY; otherwise calculate f' for each of the successors.
4. If f' is 0 then mark the node as SOLVED.
5. Change the value of f' for the newly created node to reflect its successors by back propagation.
6. Wherever possible use the most promising routes, and if a node is marked as SOLVED then mark the parent node as SOLVED.
7. If the starting node is SOLVED or its value is greater than FUTILITY, stop; else repeat from 2.

Means-Ends Analysis
- Allows both backward and forward searching.
- This means we could solve major parts of a problem first and then return to smaller problems when assembling the final solution.
- GPS was the first AI program to exploit means-ends analysis.
- STRIPS (a robot planner) is an advanced problem solver that incorporates means-ends analysis and other techniques.

Very loosely, the means-ends analysis algorithm is:

1. Until the goal is reached or no more procedures are available:
   o Describe the current state, the goal state and the differences between the two.
   o Use the difference to select a procedure that will hopefully get nearer to the goal.
   o Apply the procedure and update the current state.
2. If the goal is reached then success; otherwise fail.

See last year's course for numerous examples of this.
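The loop above can be sketched very loosely as follows. States are sets of facts, and each made-up procedure declares the facts it adds and deletes (the shopping-flavoured facts are illustrative assumptions):

```python
# A loose sketch of means-ends analysis: repeatedly pick a procedure that
# reduces some part of the difference between current and goal states.

def means_ends(current, goal, procedures, limit=10):
    current = set(current)
    for _ in range(limit):
        difference = set(goal) - current       # describe the difference
        if not difference:
            return current, True               # goal reached: success
        # select a procedure that reduces part of the difference
        usable = [p for p in procedures if p['adds'] & difference]
        if not usable:
            return current, False              # no more procedures: fail
        proc = usable[0]
        current = (current - proc['deletes']) | proc['adds']  # update state
    return current, False

procedures = [
    {'adds': {'at_shop'}, 'deletes': {'at_home'}},
    {'adds': {'have_cd'}, 'deletes': set()},
]
state, ok = means_ends({'at_home'}, {'at_shop', 'have_cd'}, procedures)
print(ok)  # True
```

A real GPS-style system would also recurse on a procedure's own preconditions; this sketch keeps only the difference-reduction loop.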

Constraint Satisfaction
- The general problem is to find a solution that satisfies a set of constraints.
- Heuristics are used not to estimate the distance to the goal but to decide which node to expand next.
- Examples of this technique are design problems, labelling graphs, robot path planning and cryptarithmetic puzzles (see last year).

Algorithm:

1. Propagate available constraints:
   o Open all objects that must be assigned values in a complete solution.
   o Repeat until an inconsistency is found or all objects are assigned valid values:
     - select an object and strengthen as much as possible the set of constraints that apply to it;
     - if this set of constraints is different from the previous set, then open all objects that share any of these constraints;
     - remove the selected object.
2. If the union of the constraints discovered above defines a solution, return the solution.
3. If the union of the constraints discovered above defines a contradiction, return failure.
4. Make a guess in order to proceed. Repeat until a solution is found or all possible solutions are exhausted:
   o select an object with no assigned value and try to strengthen its constraints;
   o recursively invoke constraint satisfaction with the current set of constraints plus the selected strengthening constraint.
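The guess-and-check side of constraint satisfaction can be sketched on a small cryptarithmetic puzzle. TO + GO = OUT below is a made-up example (not the CROSS + ROADS exercise), and for brevity the sketch enumerates guesses rather than propagating constraints incrementally:

```python
# A sketch of constraint satisfaction by guessing assignments and testing
# them against the constraints, on the puzzle TO + GO = OUT.
from itertools import permutations

def solve_cryptarithmetic():
    letters = 'TOGU'
    for digits in permutations(range(10), len(letters)):  # make a guess
        v = dict(zip(letters, digits))
        if v['T'] == 0 or v['G'] == 0 or v['O'] == 0:
            continue                       # constraint: no leading zeros
        to = 10 * v['T'] + v['O']
        go = 10 * v['G'] + v['O']
        out = 100 * v['O'] + 10 * v['U'] + v['T']
        if to + go == out:                 # arithmetic constraint satisfied
            return v
    return None                            # all guesses exhausted: failure

print(solve_cryptarithmetic())  # {'T': 2, 'O': 1, 'G': 8, 'U': 0}
```

A full solver would first propagate constraints (e.g. O must be 1, since two 2-digit numbers sum to less than 200) to prune most of these guesses before searching.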

Why are these topics important?


- Search and knowledge representation form the basis for many AI techniques.
- We have studied search in some detail (last year).
- We will study knowledge representation next.

Here are a few pointers as to where specific search methods are used in this course:

- Knowledge Representation -- best first search (A*), constraint satisfaction and means-ends analysis searches used in rule-based knowledge.
- Uncertain reasoning -- depth first, breadth first and constraint satisfaction methods used.
- Distributed reasoning -- best first search (A*) and constraint satisfaction.
- Planning -- best first search (A*), AO*, constraint satisfaction and means-ends analysis.
- Understanding -- constraint satisfaction.
- Learning -- constraint satisfaction, means-ends analysis.
- Common sense -- constraint satisfaction.
- Vision -- depth first, breadth first, heuristics, simulated annealing and constraint satisfaction are all used extensively.
- Robotics -- constraint satisfaction and means-ends analysis used in planning robot paths.

Further reading
Most AI introductory texts give a good account of Search techniques. The Handbook of Artificial Intelligence (Vol. 1, Ch. 2) also surveys the topic well. Volume 3 gives more details on GPS and STRIPS.

Exercises

1. Implement the search algorithms described in this lecture in LISP and/or C. Comment on how suited each language would be for each type of search.
2. How suited would PROLOG be to implementing the search algorithms? Comment on how this might be done and what difficulties might exist.
3. Discuss the relative merits of depth first and breadth first searching methods. What memory overheads exist? How might searches be affected? Suggest some applications to which each is best suited.
4. Steepest ascent hill climbing uses the basic hill climbing algorithm but chooses the best successor rather than the first successor that is better. How will this improve matters?
5. When will hill climbing searches fail? Does steepest ascent hill climbing always find solutions? How might some problems be overcome in the search?
6. List 3 differences between simulated annealing and simple hill climbing methods.
7. List examples where hill climbing and best first search behave (a) similarly (b) differently.
8. Write an algorithm to perform a breadth first search of a graph, making sure your algorithm works when a single node is generated at more than one level of the graph.
9. When would best first search be worse than a simple breadth first search?
10. Trace the constraint satisfaction procedure to solve the following cryptarithmetic problem:

        CROSS
       +ROADS
       ------
       DANGER

11. Discuss how constraint satisfaction might work if it implemented its search strategy via:
    o depth first search
    o breadth first search
    o best first search

Knowledge Representation

- What to Represent?
- Using Knowledge
- Properties for Knowledge Representation Systems
- Approaches to Knowledge Representation
  o Simple relational knowledge
  o Inheritable knowledge
  o Inferential Knowledge
  o Procedural Knowledge
- Issues in Knowledge Representation
- Summary and the way forward
- Further Reading

What to Represent?
Let us first consider what kinds of knowledge might need to be represented in AI systems:

- Objects -- facts about objects in our world domain. e.g. guitars have strings; trumpets are brass instruments.
- Events -- actions that occur in our world. e.g. Steve Vai played the guitar in Frank Zappa's band.
- Performance -- a behaviour like playing the guitar involves knowledge about how to do things.
- Meta-knowledge -- knowledge about what we know. e.g. Bobrow's robot who plans a trip: it knows that it can read street signs along the way to find out where it is.

Thus in solving problems in AI we must represent knowledge, and there are two entities to deal with:

- Facts -- truths about the real world and what we represent. This can be regarded as the knowledge level.
- Representations of the facts, which we manipulate. This can be regarded as the symbol level, since we usually define the representation in terms of symbols that can be manipulated by programs.

We can structure these entities at two levels:

- the knowledge level -- at which facts are described;
- the symbol level -- at which representations of objects are defined in terms of symbols that can be manipulated in programs (see Fig. 5).

Fig 5 Two Entities in Knowledge Representation

English or natural language is an obvious way of representing and handling facts. Logic enables us to represent the fact

    Spot is a dog

as

    dog(Spot)

We could then express that all dogs have tails with:

    forall x: dog(x) -> hasatail(x)

We can then deduce:

    hasatail(Spot)

Using an appropriate backward mapping function, the English sentence "Spot has a tail" can be generated. The available mapping functions are not always one to one but rather many to many, which is a characteristic of English representations. The sentences "All dogs have tails" and "Every dog has a tail" both say that each dog has a tail, but the first could also be read as saying that each dog has more than one tail (try substituting teeth for tails). When an AI program manipulates the internal representation of facts, these new representations should also be interpretable as new representations of facts.

Consider the classic problem of the mutilated chess board.

Problem: In a normal chess board the opposite corner squares have been eliminated. The given task is to cover all the squares on the remaining board with dominoes so that each domino covers two squares. No overlapping of dominoes is allowed. Can it be done?

Consider three data structures:

Fig. 3.1 Mutilated Checker Board

The first two are illustrated in the diagrams above, and the third data structure is the number of black squares and the number of white squares. The first diagram loses the colour of the squares and a solution is not easy to see; the second preserves the colours but produces no easier path; whereas counting the number of squares of each colour, giving 32 of one colour and 30 of the other, yields an immediate answer of NO: a domino must cover one white square and one black square, so the numbers of squares of each colour must be equal for a positive solution.

Using Knowledge
We have briefly mentioned where knowledge is used in AI systems. Let us consider a little further in what applications and how knowledge may be used.

- Learning -- acquiring knowledge. This is more than simply adding new facts to a knowledge base: new data may have to be classified prior to storage for easy retrieval, etc., and must interact and be checked against existing facts to avoid redundancy and replication in the knowledge base, and also so that facts can be updated.
- Retrieval -- the representation scheme used can have a critical effect on the efficiency of the method. Humans are very good at it. Many AI methods have tried to model human retrieval (see the lecture on distributed reasoning).

- Reasoning -- infer facts from existing data. If a system only knows:


  o Miles Davis is a Jazz musician.
  o All Jazz musicians can play their instruments well.

If questions like "Is Miles Davis a Jazz musician?" or "Can Jazz musicians play their instruments well?" are asked, then the answer is readily obtained from the data structures and procedures. However, a question like "Can Miles Davis play his instrument well?" requires reasoning. The above are all related: for example, it is fairly obvious that learning and reasoning involve retrieval.

Properties for Knowledge Representation Systems


The following properties should be possessed by a knowledge representation system:

- Representational Adequacy -- the ability to represent the required knowledge;
- Inferential Adequacy -- the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original;
- Inferential Efficiency -- the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides;
- Acquisitional Efficiency -- the ability to acquire new knowledge using automatic methods wherever possible rather than relying on human intervention.

To date no single system optimises all of the above.

Approaches to Knowledge Representation


We briefly survey some representation schemes; we will look at some in more detail in further lectures. Some other courses (e.g. Expert Systems) also deal with related subjects.

- Simple relational knowledge
- Inheritable knowledge

Simple relational knowledge

The simplest way of storing facts is to use a relational method where each fact about a set of objects is set out systematically in columns. This representation gives little opportunity for inference, but it can be used as the knowledge basis for inference engines.
- Simple way to store facts.
- Each fact about a set of objects is set out systematically in columns (Fig. 7).
- Little opportunity for inference.
- Knowledge basis for inference engines.

Figure: Simple Relational Knowledge We can ask things like:


- Who is dead?
- Who plays Jazz/Trumpet etc.?

This sort of representation is popular in database systems.
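A sketch of such a column-per-attribute fact table and the two queries above (the rows are illustrative, in the spirit of the course's musician examples):

```python
# Simple relational knowledge: facts set out systematically in columns,
# queried like a database table. Rows are illustrative examples.

musicians = [
    # (name, instrument, genre, alive)
    ('Miles Davis', 'Trumpet', 'Jazz', False),
    ('John Zorn', 'Saxophone', 'Jazz', True),
    ('Frank Zappa', 'Guitar', 'Rock', False),
]

def who(predicate):
    # return the names of all rows satisfying the predicate
    return [name for (name, *rest) in musicians if predicate(name, *rest)]

print(who(lambda name, inst, genre, alive: not alive))        # who is dead?
print(who(lambda name, inst, genre, alive: genre == 'Jazz'))  # who plays Jazz?
```

Note that nothing here supports inference: answering "can Miles Davis play well?" would need the inheritable or inferential schemes below.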


Inheritable knowledge

Relational knowledge is made up of objects consisting of


- attributes
- corresponding associated values.

We extend the base more by allowing inference mechanisms:


- Property inheritance:
  o elements inherit values from being members of a class;
  o data must be organised into a hierarchy of classes (Fig. 8).

Fig. 8 Property Inheritance Hierarchy


- Boxed nodes -- objects and values of attributes of objects.
- Values can be objects with attributes, and so on.
- Arrows -- point from an object to its value.
- This structure is known as a slot and filler structure, semantic network or a collection of frames.

The algorithm to retrieve a value for an attribute of an instance object:


1. Find the object in the knowledge base.
2. If there is a value for the attribute, report it.
3. Otherwise look for a value of instance; if none, fail.
4. Otherwise go to that node and find a value for the attribute and then report it.
5. Otherwise search through using isa links until a value is found for the attribute.
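The retrieval algorithm above can be sketched over a made-up slot-and-filler hierarchy, where each node holds attribute-value pairs plus optional `instance` and `isa` links:

```python
# A sketch of attribute retrieval with property inheritance: look on the
# object itself, then follow instance and isa links up the hierarchy.
# The knowledge base below is a made-up example.

kb = {
    'Musician': {'can_read_music': True},
    'Jazz-Musician': {'isa': 'Musician', 'genre': 'Jazz'},
    'Miles': {'instance': 'Jazz-Musician', 'instrument': 'Trumpet'},
}

def get_value(obj, attribute):
    node = kb.get(obj)                        # 1. find the object
    if node is None:
        return None
    if attribute in node:
        return node[attribute]                # 2. value present: report it
    if 'instance' in node:                    # 3. otherwise follow instance
        return get_value(node['instance'], attribute)
    if 'isa' in node:                         # 5. otherwise climb isa links
        return get_value(node['isa'], attribute)
    return None                               # no value found: fail

print(get_value('Miles', 'instrument'))       # Trumpet   (on the object)
print(get_value('Miles', 'genre'))            # Jazz      (via instance)
print(get_value('Miles', 'can_read_music'))   # True      (via isa)
```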

Inferential Knowledge
Represent knowledge as formal logic:

    All dogs have tails:  forall x: dog(x) -> hasatail(x)

Advantages:

- A set of strict rules:
  o can be used to derive more facts;
  o truths of new statements can be verified;
  o guaranteed correctness.
- Many inference procedures are available to implement standard rules of logic.
- Popular in AI systems. e.g. automated theorem proving.
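A minimal sketch of deriving new facts from such rules, using the dog/tail example from earlier. The fact and rule encodings are illustrative assumptions, far simpler than a real theorem prover:

```python
# A sketch of rule-based inference: from dog(Spot) and the rule
# "forall x: dog(x) -> hasatail(x)", derive hasatail(Spot).

facts = {('dog', 'Spot')}
rules = [(('dog',), ('hasatail',))]   # if dog(x) then hasatail(x)

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:                    # apply rules until nothing new follows
        changed = False
        for premises, conclusion in rules:
            for pred, arg in list(derived):
                if (pred,) == premises and (conclusion[0], arg) not in derived:
                    derived.add((conclusion[0], arg))  # derive the new fact
                    changed = True
    return derived

print(sorted(forward_chain(facts, rules)))
# [('dog', 'Spot'), ('hasatail', 'Spot')]
```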

Procedural Knowledge

Basic idea:
- Knowledge is encoded in some procedures:
  - small programs that know how to do specific things, how to proceed;
  - e.g. a parser in a natural language understander has the knowledge that a noun phrase may contain articles, adjectives and nouns. It is represented by calls to routines that know how to process articles, adjectives and nouns.

Advantages:
- Heuristic or domain-specific knowledge can be represented.
- Extended logical inferences, such as default reasoning, are facilitated.
- Side effects of actions may be modelled. Some rules may become false in time; keeping track of this in large systems may be tricky.

Disadvantages:
- Completeness -- not all cases may be represented.
- Consistency -- not all deductions may be correct. E.g. if we know that Fred is a bird we might deduce that Fred can fly; later we might discover that Fred is an emu.
- Modularity is sacrificed. Changes in the knowledge base might have far-reaching effects.
- Cumbersome control information.

Issues in Knowledge Representation


Below are listed issues that should be raised when using a knowledge representation technique:

Important Attributes -- Are there any attributes that occur in many different types of problem? There are two, instance and isa, and each is important because each supports property inheritance.

Relationships -- What about the relationship between the attributes of an object, such as inverses, existence, techniques for reasoning about values and single-valued attributes? We can consider an example of an inverse in:

    band(John Zorn, Naked City)

This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked City. Another representation is:

    band = Naked City
    band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron

Granularity -- At what level should the knowledge be represented and what are the primitives? Primitives are fundamental concepts such as holding, seeing and playing, and as English is a very rich language with over half a million words it is clear we will find difficulty in deciding which words to choose as our primitives in a series of situations.

If Tom feeds a dog then it could become:

    feeds(tom, dog)

If Tom gives the dog a bone, then:

    gives(tom, dog, bone)

Are these the same? In any sense does giving an object food constitute feeding? If

    give(x, food) → feed(x)

then we are making progress. But we need to add certain inferential rules.

In the famous program on relationships, Louise is Bill's cousin. How do we represent this?

    louise = daughter(brother or sister(father or mother(bill)))

Suppose it is Chris: then we do not know whether Chris is male or female, and then son applies as well. Clearly the separate levels of understanding require different levels of primitives, and these need many rules to link together apparently similar primitives. Obviously there is a potential storage problem, and the underlying question must be what level of comprehension is needed.

Summary and the way forward


In this lecture we have seen the need for knowledge in reasoning programs. Many issues need to be considered when deciding on the representation scheme for knowledge. We introduced the concept of a slot and filler data structure for inheritable knowledge. In the next two lectures we will look at different types of slot and filler representations, starting with semantic nets and frames and moving on to stronger conceptual dependency based representations.

Further Reading
Artificial Intelligence by Rich and Knight covers most topics well. One book dedicated to this field is Knowledge Representation: An AI Perspective by H. Reichgelt, Ablex, New York. The Handbook of Artificial Intelligence (Vol. 1, Ch. 3) provides a good concise introduction to most topics. Bobrow and Collins deal with fundamental issues, and also their robot planner, in Representation and Understanding, Academic Press (1975).

Logic Knowledge Representation


We briefly mentioned how logic can be used to represent simple facts in the last lecture. Here we will highlight the major principles involved in knowledge representation. In particular, predicate logic will be met in other knowledge representation schemes and reasoning methods. A more comprehensive treatment is given in the third year Expert Systems course.

Symbols used

The standard logic symbols we use in this course are:

    ∀ -- For all
    ∃ -- There exists
    → -- Implies
    ¬ -- Not
    ∨ -- Or
    ∧ -- And

Let us now look at an example of how predicate logic is used to represent knowledge. There are other ways but this form is popular.

- Predicate logic
  - An example
  - Isa and instance relationships
- Applications and extensions
- Further reading
- Exercises

Predicate logic

- An example
- Isa and instance relationships

An example

Consider the following:


- Prince is a mega star.
- Mega stars are rich.
- Rich people have fast cars.
- Fast cars consume a lot of petrol.

and try to draw the conclusion: Prince's car consumes a lot of petrol.

So we can translate Prince is a mega star into:

    mega_star(prince)

and Mega stars are rich into:

    ∀m: mega_star(m) → rich(m)

Rich people have fast cars, the third axiom, is more difficult:

- Is cars a relation, so that car(c,m) says that car c is m's car? OR
- Is cars a function, so that we may have car_of(m)?

Assume cars is a relation; then axiom 3 may be written:

    ∀c,m: car(c,m) ∧ rich(m) → fast(c)

The fourth axiom is a general statement about fast cars. Let consume(c) mean that car c consumes a lot of petrol. Then we may write:

    ∀c: fast(c) ∧ ∃m: car(c,m) → consume(c)

Is this enough? NO! -- Does Prince have a car? We need the car_of function after all (in addition to the car relation):

    ∀m: car(car_of(m), m)

The result of applying car_of to m is m's car. The final set of predicates is:

    mega_star(prince)
    ∀m: mega_star(m) → rich(m)
    ∀m: car(car_of(m), m)
    ∀c,m: car(c,m) ∧ rich(m) → fast(c)
    ∀c: fast(c) ∧ ∃m: car(c,m) → consume(c)

Given this we could conclude:

    consume(car_of(prince))
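The derivation of consume(car_of(prince)) can be traced in code. This is a hand-rolled illustration over a small closed domain of individuals, not a theorem prover; each function mirrors one of the axioms above.

```python
# A sketch of the Prince derivation: the axioms as Python predicates
# over string terms. car_of(m) is modelled as a term-building function,
# and we quantify over a small, closed set of individuals.
individuals = {"prince"}
mega_star = {"prince"}                  # mega_star(prince)

def car_of(m):                          # forall m: car(car_of(m), m)
    return f"car_of({m})"

def rich(m):                            # forall m: mega_star(m) -> rich(m)
    return m in mega_star

def car(c, m):                          # the car relation: c is m's car
    return c == car_of(m)

def fast(c):                            # forall c,m: car(c,m) & rich(m) -> fast(c)
    return any(car(c, m) and rich(m) for m in individuals)

def consume(c):                         # forall c: fast(c) & exists m: car(c,m) -> consume(c)
    return fast(c) and any(car(c, m) for m in individuals)

print(consume(car_of("prince")))        # True: Prince's car is a heavy consumer
```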

Isa and instance relationships

Two attributes isa and instance play an important role in many aspects of knowledge representation. The reason for this is that they support property inheritance.
- isa -- used to show class inclusion, e.g. isa(mega_star, rich).
- instance -- used to show class membership, e.g. instance(prince, mega_star).

From the above it should be simple to see how to represent these in predicate logic.

Applications and extensions


- First order logic basically extends predicate calculus to allow:
  - functions -- return objects, not just TRUE/FALSE;
  - an equals predicate.
- Problem solving and theorem proving -- large application areas. The STRIPS robot planning system employs a first order logic system to enhance its means-ends analysis (GPS) planning. This amalgamation provided a very powerful heuristic search.
- Question answering systems.

Further reading
Artificial Intelligence by Rich and Knight covers this topic well. Knowledge Representation: An AI Perspective by H. Reichgelt, Ablex, New York covers all aspects of this topic with good examples. The Handbook of Artificial Intelligence (Vol. 1, Ch. 3) provides a good concise treatment of this area. See also the Expert Systems course.

Exercises
1. Assume the following facts:
   - Steve only likes easy courses.
   - Computing courses are hard.
   - All courses in Sociology are easy.
   - ``Society is evil'' is a sociology course.
   Represent these facts in predicate logic and answer the question: what course would Steve like?
2. Find out what knowledge representation schemes are used in the STRIPS system.

Procedural Knowledge Representations

- Declarative or Procedural?
- An Example
- Representing How to Use Knowledge
- Further reading
- Exercises

Declarative or Procedural?
Declarative knowledge representation:
- Static representation -- knowledge about objects, events etc., their relationships and their states, is given.
- Requires a program that knows what to do with the knowledge and how to do it.

Procedural representation:
- The control information necessary to use the knowledge is embedded in the knowledge itself, e.g. how to find relevant facts, make inferences etc.
- Requires an interpreter to follow the instructions specified in the knowledge.

An Example
Let us consider what knowledge an alphabetical sorter would need:
- Implicit knowledge that A comes before B etc.
- This is easy -- really integer comparison of (ASCII) codes for letters.
  - All programs contain procedural knowledge of this sort.
- The procedural information here is that knowledge of how to alphabetise is represented explicitly in the alphabetisation procedure.
  - A declarative system might have to have explicit facts like A comes before B, B comes before C etc.

Representing How to Use Knowledge


Need to represent how to control the processing:

direction -- indicate the direction in which an implication can be used. E.g. to prove something can fly, show it is a bird: fly(x) ← bird(x).

Knowledge to achieve a goal -- specify what knowledge might be needed to achieve a specific goal. For example, to prove something is a bird, try using the two facts has_wings and has_feathers to show it.
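Both control ideas above amount to using implications backwards, which can be sketched with a tiny backward-chaining prover. The fact and rule names are illustrative; a real procedural system would attach this control knowledge to the rules themselves.

```python
# A sketch of using implications "backwards": to prove fly(x), show
# bird(x); to prove bird(x), show has_wings(x) and has_feathers(x).
facts = {("has_wings", "tweety"), ("has_feathers", "tweety")}
rules = {
    "fly":  [["bird"]],                       # fly(x) <- bird(x)
    "bird": [["has_wings", "has_feathers"]],  # bird(x) <- wings & feathers
}

def prove(goal, x):
    if (goal, x) in facts:
        return True                           # goal is a known fact
    for body in rules.get(goal, []):          # try each rule for the goal
        if all(prove(sub, x) for sub in body):
            return True
    return False

print(prove("fly", "tweety"))   # True
print(prove("fly", "rock"))     # False
```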

Further reading
Knowledge Representation: An AI Perspective by H. Reichgelt, Ablex, New York covers all aspects of this topic with good examples. The Handbook of Artificial Intelligence (Vol. 1, Ch. 3) provides a good concise treatment of this area. Winograd's article in Bobrow and Collins, Representation and Understanding, Academic Press (1975), deals with the declarative versus procedural issue.

Exercises
1. Discuss how procedural methods may be used to solve the following problems:
   - Natural Language Understanding
   - The games of Nim and Kalah (see last year's notes for rules).
   - Path planning type tasks.

Weak Slot and Filler Structures


We have already met this type of structure when discussing inheritance in the last lecture. We will now study this in more detail.

- Why use this data structure?
- Semantic Nets
  - Representation in a Semantic Net
  - Inference in a Semantic Net
  - Extending Semantic Nets
- Frames
  - Frame Knowledge Representation
  - Interpreting frames
- Further Reading
- Exercises

Why use this data structure?


- It enables attribute values to be retrieved quickly:
  - assertions are indexed by the entities;
  - binary predicates are indexed by their first argument, e.g. team(Mike-Hall, Cardiff).
- Properties of relations are easy to describe.
- It embraces aspects of object oriented programming, which eases implementation.

So called because:
- A slot is an attribute-value pair in its simplest form.
- A filler is a value that a slot can take -- it could be a numeric, string (or any data type) value, or a pointer to another slot.
- A weak slot and filler structure does not consider the content of the representation.

We will study two types:


- Semantic Nets.
- Frames.

Semantic Nets
The major idea is that:
- The meaning of a concept comes from its relationship to other concepts, and
- the information is stored by interconnecting nodes with labelled arcs.

- Representation in a Semantic Net
- Inference in a Semantic Net
- Extending Semantic Nets

Representation in a Semantic Net

The physical attributes of a person can be represented as in Fig. 9.

Fig. 9 A Semantic Network

These values can also be represented in logic as:

    isa(person, mammal)
    instance(Mike-Hall, person)
    team(Mike-Hall, Cardiff)

We have already seen how conventional predicates such as lecturer(dave) can be written as instance(dave, lecturer). Recall that isa and instance represent inheritance and are popular in many knowledge representation schemes. But we have a problem: how can we have more than 2-place predicates in semantic nets? E.g. score(Cardiff, Llanelli, 23-6). Solution:
- Create new nodes to represent the new objects either contained or alluded to in the knowledge -- game and fixture in the current example.
- Relate information to the nodes and fill up the slots (Fig. 10).

Fig. 10 A Semantic Network for an n-Place Predicate

As a more complex example consider the sentence: John gave Mary the book. Here we have several aspects of an event.

Fig. 11 A Semantic Network for a Sentence


Inference in a Semantic Net

Basic inference mechanism: follow links between nodes. Two methods to do this:
Intersection search -- the notion that spreading activation out of two nodes and finding their intersection finds relationships among objects. This is achieved by assigning a special tag to each visited node.

Many advantages including entity-based organisation and fast parallel implementation. However very structured questions need highly structured networks.
Inheritance -- the isa and instance representation provide a mechanism to implement this.

Inheritance also provides a means of dealing with default reasoning. E.g. we could represent:
- Emus are birds.
- Typically birds fly and have wings.
- Emus run.

in the following Semantic net:

Fig. 12 A Semantic Network for Default Reasoning

In making certain inferences we will also need to distinguish between the link that defines a new entity and holds its value, and the other kind of link that relates two existing entities. Consider the example shown, where the height of two people is depicted and we also wish to compare them. We need extra nodes for the concept as well as its value.

Fig. 13 Two heights

Special procedures are needed to process these nodes, but without this distinction the analysis would be very limited.

Fig. 14 Comparison of two heights
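The emu example above can be sketched as inheritance with overriding: birds fly by default, but the more specific emu node shadows the inherited value. The encoding is illustrative.

```python
# A sketch of default reasoning by inheritance: local slots shadow
# values inherited along instance/isa links, so the emu node overrides
# the default "birds fly".
net = {
    "bird": {"flies": True, "has_wings": True},
    "emu":  {"isa": "bird", "flies": False, "locomotion": "run"},
    "fred": {"instance": "emu"},
}

def lookup(node, attr):
    """Return the most specific value found while climbing the links."""
    while node:
        slots = net[node]
        if attr in slots:
            return slots[attr]
        node = slots.get("instance") or slots.get("isa")
    return None

print(lookup("fred", "flies"))      # False -- emu overrides the default
print(lookup("fred", "has_wings"))  # True  -- still inherited from bird
```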


Extending Semantic Nets

Here we will consider some extensions to Semantic nets that overcome a few problems (see Exercises) or extend their expression of knowledge.

Partitioned Networks

Partitioned Semantic Networks allow for:


- propositions to be made without commitment to truth;
- expressions to be quantified.

Basic idea: Break network into spaces which consist of groups of nodes and arcs and regard each space as a node.

Consider the following: Andrew believes that the earth is flat. We can encode the proposition the earth is flat in a space and within it have nodes and arcs that represent the fact (Fig. 15). We can then have nodes and arcs to link this space to the rest of the network to represent Andrew's belief.

Fig. 15 Partitioned network

Now consider the quantified expression: Every parent loves their child. To represent this we:
- Create a general statement, GS, special class.
- Make node g an instance of GS.
- Every element will have at least 2 attributes:
  - a form that states which relation is being asserted;
  - one or more forall (∀) or exists (∃) connections -- these represent universally quantifiable variables in such statements, e.g. x and y in:

        ∀x: parent(x) → ∃y: child(y) ∧ loves(x,y)

Here we have to construct two spaces, one for each of x and y. NOTE: we can express the variables as existentially quantified variables and express the event of love as having an agent p and receiver b for every parent p, which could simplify the network (see Exercises). Also, if we change the sentence to Every parent loves a child, then the node for the object being acted on (the child) lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the agent. (See the Exercises and the Rich and Knight book for examples of this.) So we could construct a partitioned network as in Fig. 16.

Fig. 16 Partitioned network

Frames
Frames can also be regarded as an extension to semantic nets. Indeed it is not clear where the distinction between a semantic net and a frame ends. Semantic nets were initially used to represent labelled connections between objects. As tasks became more complex the representation needed to be more structured, and the more structured the system, the more beneficial it becomes to use frames. A frame is a collection of attributes or slots and associated values that describe some real world entity. Frames on their own are not particularly helpful, but frame systems are a powerful way of encoding information to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame represents:
- a class (set), or
- an instance (an element of a class).

- Frame Knowledge Representation
- Interpreting frames

Frame Knowledge Representation

Consider the example first discussed in Semantic Nets (Section 6.2.1):


Person
    isa:            Mammal
    Cardinality:

Adult-Male
    isa:            Person
    Cardinality:

Rugby-Player
    isa:            Adult-Male
    Cardinality:
    Height:
    Weight:
    Position:
    Team:
    Team-Colours:

Back
    isa:            Rugby-Player
    Cardinality:
    Tries:

Mike-Hall
    instance:       Back
    Height:         6-0
    Position:       Centre
    Team:           Cardiff-RFC
    Team-Colours:   Black/Blue

Rugby-Team
    isa:            Team
    Cardinality:
    Team-size:      15
    Coach:

Figure: A simple frame system

Here the frames Person, Adult-Male, Rugby-Player and Rugby-Team are all classes, and the frames Mike-Hall and Cardiff-RFC are instances. Note:
- The isa relation is in fact the subset relation.
- The instance relation is in fact element-of.
- The isa attribute possesses a transitivity property. This implies: Mike-Hall is a Back and a Back is a Rugby-Player, who in turn is an Adult-Male and also a Person.
- Both isa and instance have inverses, which are called subclasses and all-instances.
- There are attributes that are associated with the class or set, such as cardinality, and on the other hand there are attributes that are possessed by each member of the class or set.

DISTINCTION BETWEEN SETS AND INSTANCES

It is important that this distinction is clearly understood. Cardiff-RFC can be thought of as a set of players or as an instance of a Rugby-Team. If Cardiff-RFC were a class then:

- its instances would be players;
- it could not be a subclass of Rugby-Team, otherwise its elements would be members of Rugby-Team, which we do not want.

Instead we make Cardiff-RFC a subclass of Rugby-Player, which allows the players to inherit the correct properties, and an instance of Rugby-Team, which lets Cardiff-RFC inherit information about teams. BUT there is a problem here:
- A class is a set and its elements have properties.
- We wish to use inheritance to bestow values on its members.
- But there are properties that the set or class itself has, such as the manager of a team.

This is why we need to view Cardiff-RFC as both a subset of one class (players) and an instance of another (teams). We seem to have a CATCH 22.

Solution: MetaClasses

A metaclass is a special class whose elements are themselves classes. Now consider our rugby teams as:

Figure: A Metaclass frame system

The basic metaclass is Class, and this allows us to:

- define classes which are instances of other classes, and
- (thus) inherit properties from this class.

Inheritance of default values occurs when one element or class is an instance of a class.

Slots as Objects

How can we represent the following properties in frames?
- Attributes such as weight and age being attached and making sense.
- Constraints on values, such as age being less than a hundred.
- Default values.
- Rules for inheritance of values, such as children inheriting parents' names.
- Rules for computing values.
- Many values for a slot.

A slot is a relation that maps from its domain of classes to its range of values. A relation is a set of ordered pairs, so one relation may be a subset of another. Since a slot is a set, the set of all slots can be represented by a metaclass called Slot, say. Consider the following:
SLOT
    isa:                Class
    instance:           Class
    domain:
    range:
    range-constraint:
    definition:
    default:
    to-compute:
    single-valued:

Coach
    instance:           SLOT
    domain:             Rugby-Team
    range:              Person
    range-constraint:   (experience x.manager)
    default:
    single-valued:      TRUE

Colour
    instance:           SLOT
    domain:             Physical-Object
    range:              Colour-Set
    single-valued:      FALSE

Team-Colours
    instance:           SLOT
    isa:                Colour
    domain:             team-player
    range:              Colour-Set
    range-constraint:   not Pink
    single-valued:      FALSE

Position
    instance:           SLOT
    domain:             Rugby-Player
    range:              { Back, Forward, Reserve }
    to-compute:         x.position
    single-valued:      TRUE

NOTE the following:

- Instances of SLOT are slots.
- Associated with SLOT are attributes that each instance will inherit.
- Each slot has a domain and range.
- Range is split into two parts: one is the class of the elements, the other is a constraint, which is a logical expression; if absent it is taken to be true.
- If there is a value for default then it must be passed on unless an instance has its own value.
- The to-compute attribute involves a procedure to compute its value, e.g. in Position, where we use the dot notation to assign values to the slot of a frame.
- Transfers-through lists other slots from which values can be derived by inheritance.

Interpreting frames

A frame system interpreter must be capable of the following in order to exploit the frame slot representation:
- Consistency checking -- when a slot value is added to the frame, relying on the domain attribute, and checking that the value is legal using range and range-constraints.
- Propagation of definition values along isa and instance links.
- Inheritance of default values along isa and instance links.
- Computation of the value of a slot as needed.
- Checking that only the correct number of values is computed.

See Exercises for further instances of drawing inferences etc. from frames.
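Two of the interpreter's jobs can be sketched together: inheriting a value along instance/isa links and consistency-checking a new slot value against the slot's range. Frame and slot names follow the rugby example; the coach's name and the membership table are hypothetical.

```python
# A sketch of a frame-system interpreter: inheritance plus a range
# consistency check when filling a slot. Details are illustrative.
frames = {
    "Rugby-Team":  {"isa": "Team", "team-size": 15},
    "Cardiff-RFC": {"instance": "Rugby-Team"},
}
slots = {
    "coach": {"domain": "Rugby-Team", "range": "Person", "single-valued": True},
}
known_members = {"Person": {"Dai Jones"}}   # hypothetical members of each range class

def inherit(frame, attr):
    """Climb instance/isa links until a value for attr is found."""
    while frame:
        f = frames.get(frame, {})
        if attr in f:
            return f[attr]
        frame = f.get("instance") or f.get("isa")
    return None

def fill_slot(frame, slot, value):
    """Consistency check: the value must lie in the slot's range class."""
    if value not in known_members.get(slots[slot]["range"], set()):
        raise ValueError(f"{value} not in range {slots[slot]['range']}")
    frames[frame][slot] = value

print(inherit("Cardiff-RFC", "team-size"))   # 15, inherited from Rugby-Team
fill_slot("Cardiff-RFC", "coach", "Dai Jones")
print(frames["Cardiff-RFC"]["coach"])        # Dai Jones
```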

Further Reading
Artificial Intelligence by Rich and Knight covers this topic well. Knowledge Representation: An AI Perspective by H. Reichgelt, Ablex, New York covers all aspects of this topic with good examples. The Handbook of Artificial Intelligence (Vol. 1, Ch. 3) provides a good concise treatment of this area. Volume 3, Chapter 11 also gives good background information on Quillian's semantic memory system; the HAM, ACT and MEMOD systems are also dealt with. Artificial Intelligence Programming by Charniak et al. gives example LISP code for constructing semantic nets and frames. See also the Expert Systems course.

Exercises
1. Construct Semantic Net representations of the following:
   1. Dave is Welsh, Dave is a Lecturer.
   2. Paul lent his new Frank Zappa CD to his best friend.
2. Find out how the SNePS system aims to improve the expressiveness of semantic nets.
3. Represent the following in a Semantic Net and comment on how the SNePS system helps:
   1. Mike and Mary's telephone number is the same.
   2. John believes that Mike and Mary's telephone number is the same.
4. Represent the following in partitioned semantic networks:
   1. Every player kicked a ball.
   2. All players like the referee.
   3. Andrew believes that there is a fish with lungs.
5. Simplify Fig. 16, which represents

       ∀x: parent(x) → ∃y: baby(y) ∧ loves(x,y)

   so that we can express the variables as existentially quantified variables and express the event of love as having an agent p and receiver b for every parent p.
6. Pick a problem area and represent the knowledge in a frame based system.
7. Devise algorithms that enable reasoning with frames. Discuss how:
   1. inference through inheritance can be achieved;
   2. matching can be achieved.
8. What are the advantages of a frame based knowledge representation?
9. What problems do you envisage a frame based knowledge representation having? Give examples of knowledge hard to represent in a frame. How could some problems be overcome? HINT: address issues to do with semantics, inheritance and expressiveness.
10. What programming languages would be suited to implement a semantic network and frames?

Strong Slot and Filler Structures


Strong Slot and Filler Structures typically:

- represent links between objects according to more rigid rules;
- provide specific notions of what types of object and relations between them are allowed;
- represent knowledge about common situations.

Conceptual Dependency (CD)


Conceptual Dependency was originally developed to represent knowledge acquired from natural language input. The goals of this theory are:

- to help in the drawing of inferences from sentences;
- to be independent of the words used in the original input.

That is to say: for any two (or more) sentences that are identical in meaning there should be only one representation of that meaning.

It has been used by many programs that purport to understand English (MARGIE, SAM, PAM). CD was developed by Schank et al., as were these programs. CD provides:

- a structure into which nodes representing information can be placed;
- a specific set of primitives;
- a given level of granularity.

Sentences are represented as a series of diagrams depicting actions using both abstract and real physical situations.
- The agent and the objects are represented.
- The actions are built up from a set of primitive acts which can be modified by tense.

Examples of Primitive Acts are:

ATRANS -- Transfer of an abstract relationship, e.g. give.
PTRANS -- Transfer of the physical location of an object, e.g. go.
PROPEL -- Application of a physical force to an object, e.g. push.
MTRANS -- Transfer of mental information, e.g. tell.
MBUILD -- Construct new information from old, e.g. decide.
SPEAK -- Utter a sound, e.g. say.
ATTEND -- Focus a sense on a stimulus, e.g. listen, watch.
MOVE -- Movement of a body part by its owner, e.g. punch, kick.
GRASP -- Actor grasping an object, e.g. clutch.
INGEST -- Actor ingesting an object, e.g. eat.
EXPEL -- Actor getting rid of an object from the body, e.g. ????.

Six primitive conceptual categories provide the building blocks which are the set of allowable dependencies in the concepts in a sentence:

PP -- Real world objects.
ACT -- Real world actions.
PA -- Attributes of objects.
AA -- Attributes of actions.
T -- Times.
LOC -- Locations.

How do we connect these things together? Consider the example: John gives Mary a book.

Arrows indicate the direction of dependency. Letters above indicate certain relationships:

o -- object.
R -- recipient-donor.
I -- instrument, e.g. eat with a spoon.
D -- destination, e.g. going home.
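Without the diagram notation, a CD structure can still be held as data. The dictionary encoding below is our own illustration, not Schank's notation; the primitive act (ATRANS) and the role letters (o, R) come from the text above.

```python
# A sketch of the CD structure for "John gave Mary a book": an ATRANS
# event with object and recipient-donor roles plus a tense modifier.
event = {
    "actor": "John",
    "act":   "ATRANS",                        # transfer of an abstract relationship
    "tense": "p",                             # p -- past
    "o":     "book",                          # object role
    "R":     {"from": "John", "to": "Mary"},  # recipient-donor role
}

def paraphrase(e):
    """Read the structure back as English (crude, for illustration)."""
    if e["act"] == "ATRANS" and e["tense"] == "p":
        return f'{e["actor"]} gave {e["R"]["to"]} a {e["o"]}'
    return "?"

print(paraphrase(event))   # John gave Mary a book
```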

- Double arrows (⇔) indicate two-way links between the actor (PP) and action (ACT).
- The actions are built from the set of primitive acts (see above). These can be modified by tense etc.

The use of tense and mood in describing events is extremely important, and Schank introduced the following modifiers:

p -- past
f -- future
t -- transition
ts -- start transition
tf -- finished transition
k -- continuing
? -- interrogative
/ -- negative
delta -- timeless
c -- conditional

The absence of any modifier implies the present tense. So the past tense of the above example, John gave Mary a book, is the same structure annotated with the past modifier p.

The double arrow links an object (actor), PP, and an action, ACT: PP ⇔ ACT. The triple arrow is also a two-way link, but between an object, PP, and its attribute, PA. It represents isa-type dependencies, e.g. Dave ⇔ lecturer: Dave is a lecturer.

Primitive states are used to describe many state descriptions such as height, health, mental state and physical state. There are many more physical states than primitive actions. They use a numeric scale, e.g.:

John height(+10) -- John is the tallest.
John height(< average) -- John is short.
Frank Zappa health(-10) -- Frank Zappa is dead.
Dave mental_state(-10) -- Dave is sad.
Vase physical_state(-10) -- The vase is broken.

You can also specify things like the time of occurrence in the relationship. For example: John gave Mary the book yesterday.

Now let us consider a more complex sentence: Since smoking can kill you, I stopped. Let's look at how we represent the inference that smoking can kill:

- Use the notion of one to apply the knowledge to.
- Use the primitive act of INGESTing smoke from a cigarette to one.
- Killing is a transition from being alive to dead. We use triple arrows to indicate a transition from one state to another.
- Have a conditional, c, causality link. The triple arrow indicates dependency of one concept on another.

To add the fact that I stopped smoking


- Use similar rules to imply that I smoke cigarettes.
- The qualification attached to this dependency indicates that the instance of INGESTing smoke has stopped.

Advantages of CD:
- Using these primitives involves fewer inference rules.
- Many inference rules are already represented in the CD structure.
- The holes in the initial structure help to focus on the points still to be established.

Disadvantages of CD:
- Knowledge must be decomposed into fairly low level primitives.
- It may be impossible or difficult to find the correct set of primitives.
- A lot of inference may still be required.
- Representations can be complex even for relatively simple actions. Consider: Dave bet Frank five pounds that Wales would win the Rugby World Cup. Complex representations require a lot of storage.

Applications of CD:

MARGIE (Meaning Analysis, Response Generation and Inference on English) -- model natural language understanding.
SAM (Script Applier Mechanism) -- scripts to understand stories (see next section).
PAM (Plan Applier Mechanism) -- plans to understand stories.

Schank et al. developed all of the above.

Scripts
A script is a structure that prescribes a set of circumstances which could be expected to follow on from one another.

It is similar to a thought sequence or a chain of situations which could be anticipated. It could be considered to consist of a number of slots or frames but with more specialised roles. Scripts are beneficial because:
- Events tend to occur in known runs or patterns.
- Causal relationships between events exist.
- Entry conditions exist which allow an event to take place.
- Prerequisites exist upon events taking place, e.g. when a student progresses through a degree scheme or when a purchaser buys a house.

The components of a script include:

Entry Conditions -- these must be satisfied before events in the script can occur.
Results -- conditions that will be true after events in the script occur.
Props -- slots representing objects involved in the events.
Roles -- persons involved in the events.
Track -- variations on the script. Different tracks may share components of the same script.
Scenes -- the sequence of events that occur. Events are represented in conceptual dependency form.

Scripts are useful in describing certain situations such as robbing a bank. This might involve:
- Getting a gun.
- Holding up the bank.
- Escaping with the money.

Here the Props might be:

- Gun, G.
- Loot, L.
- Bag, B.
- Get-away car, C.

The Roles might be:

- Robber, S.
- Cashier, M.
- Bank Manager, O.
- Policeman, P.

The Entry Conditions might be:

- S is poor.
- S is destitute.

The Results might be:

- S has more money.
- O is angry.
- M is in a state of shock.
- P is shot.

There are three scenes: obtaining the gun, robbing the bank and the getaway. The full script is described in Fig. 19.
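The script components listed above can be gathered into one structure. The encoding is our own sketch; the scene contents are paraphrased rather than written as full CD diagrams, and the activation test is a deliberately simple stand-in for the significance-based activation discussed below.

```python
# The bank-robbery script components collected into one structure.
script = {
    "track": "successful robbery",
    "props": {"G": "gun", "L": "loot", "B": "bag", "C": "get-away car"},
    "roles": {"S": "robber", "M": "cashier", "O": "bank manager", "P": "policeman"},
    "entry_conditions": ["S is poor", "S is destitute"],
    "scenes": ["obtaining the gun", "robbing the bank", "the getaway"],
    "results": ["S has more money", "O is angry",
                "M is in a state of shock", "P is shot"],
}

def may_activate(script, satisfied):
    """A script only applies once all its entry conditions hold."""
    return all(cond in satisfied for cond in script["entry_conditions"])

print(may_activate(script, {"S is poor", "S is destitute"}))  # True
print(may_activate(script, {"S is poor"}))                    # False
```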

Fig. 19 Simplified Bank Robbing Script

Some additional points to note on scripts:

- If a particular script is to be applied it must be activated, and the activating depends on its significance.
- If a topic is mentioned in passing then a pointer to that script could be held.
- If the topic is important then the script should be opened.
- The danger lies in having too many active scripts, much as one might have too many windows open on the screen or too many recursive calls in a program.
- Provided events follow a known trail we can use scripts to represent the actions involved and use them to answer detailed questions.

Different trails may be allowed for different outcomes of Scripts ( e.g. The bank robbery goes wrong).

Advantages of Scripts:
- Ability to predict events.
- A single coherent interpretation may be built up from a collection of observations.

Disadvantages:
- Less general than frames.
- May not be suitable to represent all kinds of knowledge.

Reasoning with Uncertainty: Non-Monotonic Reasoning

- What is reasoning?
- How can we reason?
- Uncertain Reasoning?
- Non-Monotonic Reasoning
  - Default reasoning
  - Circumscription
  - Implementations: Truth Maintenance Systems
- Further Reading
- Exercises

What is reasoning?
y y

When we require any knowledge system to do something it has not been explicitly told how to do it must reason. The system must figure out what it needs to know from what it already knows.

We have seen simple examples of reasoning, or drawing inferences, already. For example, if we know: Robins are birds. All birds have wings. Then if we ask: Do robins have wings? some reasoning (albeit very simple) has to go on to answer the question.
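The robins-and-wings inference above can be sketched as a tiny forward-chaining loop over triples. The rule format and names here are illustrative assumptions, not a standard library:

```python
# A sketch of simple forward chaining: from "robins are birds" and
# "all birds have wings", derive "robins have wings".

facts = {("robin", "is_a", "bird")}
rules = [
    # if ?x is_a bird then ?x has wings
    (("?x", "is_a", "bird"), ("?x", "has", "wings")),
]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, cond_pred, cond_obj), (_, head_pred, head_obj) in rules:
            for (subj, pred, obj) in list(derived):
                if pred == cond_pred and obj == cond_obj:
                    new_fact = (subj, head_pred, head_obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(("robin", "has", "wings") in forward_chain(facts, rules))   # True
```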

How can we reason?


To a certain extent this will depend on the knowledge representation chosen, although a good knowledge representation scheme has to allow easy, natural and plausible reasoning. Listed below are very broad methods of how we may reason. We will study specific instances of some of these methods in the next few lectures.

- Formal reasoning -- basic rules of inference with logic knowledge representations.
- Procedural reasoning -- uses procedures that specify how to solve (sub)problems.
- Reasoning by analogy -- humans are good at this; it is more difficult for AI systems. E.g. if we are asked Can robins fly?, the system might reason that robins are like sparrows and it knows sparrows can fly so ...
- Generalisation and abstraction -- again humans are effective at this. This is basically getting towards learning and understanding methods.
- Meta-level reasoning -- once again uses knowledge about what you know and perhaps ordering it in some kind of importance.

Uncertain Reasoning?
Unfortunately the world is an uncertain place. Any AI system that seeks to model and reasoning in such a world must be able to deal with this. In particular it must be able to deal with:
- Incompleteness -- compensate for lack of knowledge.
- Inconsistencies -- resolve ambiguities and contradictions.
- Change -- it must be able to update its world knowledge base over time.

Clearly, in order to deal with this we must accept that some decisions made are more likely to be true (or false) than others, and we must introduce methods that can cope with this uncertainty. There are three basic methods that can do this:
- Symbolic methods.
- Statistical methods.
- Fuzzy logic methods.

We will look at symbolic methods in this lecture and look at the others in the next lecture.

Non-Monotonic Reasoning
Predicate logic and the inferences we perform on it are an example of monotonic reasoning. In monotonic reasoning, if we enlarge a set of axioms we cannot retract any existing assertions or axioms. Humans do not adhere to this monotonic structure when reasoning:
- We need to jump to conclusions in order to plan and, more basically, survive.
  o We cannot anticipate all possible outcomes of our plan.
  o We must make assumptions about things we do not specifically know about.

- Default reasoning
- Circumscription
- Implementations: Truth Maintenance Systems

Default reasoning

This is a very common form of non-monotonic reasoning. Here we want to draw conclusions based on what is most likely to be true. We have already seen examples of this and possible ways to represent this knowledge. We will discuss two approaches to do this:
- Non-Monotonic logic.
- Default logic.

DO NOT get confused about the labels Non-Monotonic and Default being applied both to reasoning and to a particular logic. Non-Monotonic reasoning is a generic description of a class of reasoning. Non-Monotonic logic is a specific theory. The same goes for Default reasoning and Default logic.

Non-Monotonic Logic

This is basically an extension of first-order predicate logic to include a modal operator, M. The purpose of this is to allow for consistency. For example:

    ∀x: plays_instrument(x) ∧ M improvises(x) → jazz_musician(x)

states that, for all x, if x plays an instrument and if the fact that x can improvise is consistent with all other knowledge, then we can conclude that x is a jazz musician. How do we define consistency? One common solution (consistent with PROLOG notation) is to show that a fact P is consistent by attempting to prove ¬P: if we fail, we may say that P is consistent (since we cannot show it is false).
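This negation-as-failure test can be sketched very simply. Here "proving" is deliberately reduced to membership in a set of known facts, and the string-based fact encoding is purely an illustrative assumption:

```python
# Negation as failure, sketched: P is taken to be consistent if an
# attempt to prove ¬P fails.

known_facts = {"plays_instrument(john)"}

def consistent(p, facts):
    """P is consistent if ¬P cannot be proved from the facts."""
    negation = "not " + p
    return negation not in facts

# improvises(john) is consistent: we cannot prove its negation, so the
# non-monotonic rule may conclude jazz_musician(john).
print(consistent("improvises(john)", known_facts))          # True
print(consistent("improvises(john)",
                 known_facts | {"not improvises(john)"}))   # False
```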

However, consider the famous set of assertions relating to President Nixon:

    ∀x: Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)
    ∀x: Quaker(x) ∧ M Pacifist(x) → Pacifist(x)

Now this states that Quakers tend to be pacifists and Republicans tend not to be. BUT Nixon was both a Quaker and a Republican, so we could assert:

    Quaker(Nixon)
    Republican(Nixon)

This now leads to our total knowledge becoming inconsistent.

Default Logic

Default logic introduces a new inference rule:

    A : B
    -----
      C

which states: if A is deducible and it is consistent to assume B, then conclude C. Now this is similar to Non-Monotonic logic but there are some distinctions:
- New inference rules are used for computing the set of plausible extensions. So in the Nixon example above Default logic can support both assertions, since it does not say anything about how to choose between them -- it will depend on the inference being made.
- In Default logic any nonmonotonic expressions are rules of inference rather than expressions.

Circumscription

Circumscription is a rule of conjecture that allows you to jump to the conclusion that the objects you can show possess a certain property, p, are in fact all the objects that possess that property.

Circumscription can also cope with default reasoning. Suppose we know:

    bird(tweety)
    ∀x: penguin(x) → bird(x)
    ∀x: penguin(x) → ¬flies(x)

and we wish to add the fact that typically, birds fly. In circumscription this phrase would be stated as: a bird will fly if it is not abnormal, and can thus be represented by:

    ∀x: bird(x) ∧ ¬abnormal(x) → flies(x)

However, this is not sufficient. We cannot conclude flies(tweety) since we cannot prove ¬abnormal(tweety). This is where we apply circumscription and, in this case, we will assume that those things that are shown to be abnormal are the only things to be abnormal. Thus we can rewrite our default rule as:

    ∀x: bird(x) → flies(x) ∨ abnormal(x)

and add the following:

    ∀x: ¬abnormal(x)

since there is nothing that can be shown to be abnormal. If we now add the fact:

    penguin(tweety)

clearly we can prove abnormal(tweety). If we circumscribe abnormal now we would add the sentence that a penguin (tweety) is the only abnormal thing:

    ∀x: abnormal(x) → penguin(x)

Note the distinction between Default logic and circumscription: in circumscription, defaults are sentences in the language itself, not additional inference rules.
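The circumscription-style default for tweety can be sketched operationally: the provably abnormal things are taken to be exactly the penguins. The set-based representation is an illustrative assumption:

```python
# A sketch of the circumscribed default: a bird flies unless it can be
# shown to be abnormal, and only penguins can be shown abnormal.

birds = {"tweety", "robin"}
penguins = set()          # initially we know of no penguins

def abnormal(x):
    # circumscription: the abnormal things are exactly the known penguins
    return x in penguins

def flies(x):
    return x in birds and not abnormal(x)

print(flies("tweety"))    # True: nothing shows tweety is abnormal

penguins.add("tweety")    # now add the fact penguin(tweety)
print(flies("tweety"))    # False: tweety is now provably abnormal
```

Note the non-monotonic behaviour: adding the fact penguin(tweety) retracts the earlier conclusion flies(tweety).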
Implementations: Truth Maintenance Systems

Due to lecture time limitations this topic is not dealt with in any great depth. Please refer to the further reading section. A variety of Truth Maintenance Systems (TMS) have been developed as a means of implementing Non-Monotonic Reasoning Systems. Basically TMSs:
- all do some form of dependency-directed backtracking;
- assertions are connected via a network of dependencies.

Justification-Based Truth Maintenance Systems (JTMS)


This is a simple TMS in that it does not know anything about the structure of the assertions themselves.

- Each supported belief (assertion) has a justification.
- Each justification has two parts:
  o An IN-List -- which supports beliefs held.
  o An OUT-List -- which supports beliefs not held.
- An assertion is connected to its justification by an arrow.
- One assertion can feed another justification, thus creating the network.
- Assertions may be labelled with a belief status.
- An assertion is valid if every assertion in the IN-List is believed and none in the OUT-List are believed.
- An assertion is non-monotonic if the OUT-List is not empty or if any assertion in the IN-List is non-monotonic.
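The JTMS validity rule above can be sketched as a simple labelling loop. This naive version does not handle circular non-monotonic justifications correctly; the node format and example facts (a hay-fever/cold default, echoed in the exercises later) are illustrative assumptions:

```python
# A sketch of JTMS-style labelling: an assertion is believed (IN) if it
# has a justification whose IN-list members are all believed and whose
# OUT-list members are all disbelieved.

justifications = {
    # node: list of (in_list, out_list) justifications
    "runny_nose": [([], [])],                      # premise: always believed
    "hay_fever":  [],                              # no justification at all
    "cold":       [(["runny_nose"], ["hay_fever"])],
}

def label(justs):
    believed = set()
    changed = True
    while changed:
        changed = False
        for node, node_justs in justs.items():
            if node in believed:
                continue
            for in_list, out_list in node_justs:
                if all(n in believed for n in in_list) and \
                   all(n not in believed for n in out_list):
                    believed.add(node)
                    changed = True
    return believed

print(sorted(label(justifications)))   # ['cold', 'runny_nose']
```

Here "cold" is believed because "runny_nose" is IN and "hay_fever" (on the OUT-list) has no justification and so stays OUT.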

Fig. 20 A JTMS Assertion

Logic-Based Truth Maintenance Systems (LTMS)

Similar to JTMS except:
- Nodes (assertions) assume no relationships among them except ones explicitly stated in justifications.
- A JTMS can represent P and ¬P simultaneously; an LTMS would throw a contradiction here.
- If this happens the network has to be reconstructed.

Assumption-Based Truth Maintenance Systems (ATMS)


- JTMS and LTMS pursue a single line of reasoning at a time and backtrack (dependency-directed) when needed -- depth-first search.
- ATMS maintain alternative paths in parallel -- breadth-first search.
- Backtracking is avoided at the expense of maintaining multiple contexts.
- However, as reasoning proceeds contradictions arise and the ATMS can be pruned:
  o Simply find the assertion with no valid justification.

Further Reading
Artificial Intelligence by Rich and Knight covers these topics well. In particular see the sections on LTMS, JTMS and ATMS. Knowledge Representation: An AI Perspective by H. Reichgelt, Ablex, New York discusses default and nonmonotonic reasoning in an easy to read and understandable manner. The Essentials of AI by Ginsberg provides some alternative and comprehensive treatments of all topics discussed in this lecture. Individual chapters are dedicated to ATMS and Non-Monotonic reasoning respectively.

The Handbook of Artificial Intelligence (Vol 3 Ch 12) deals with nonmonotonic logics and logic programming. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference by Pearl, Morgan Kaufmann, describes non-monotonic reasoning and Truth Maintenance Systems in some detail. It also describes their use with topics dealt with in the next lecture on Statistical reasoning. An excellent collection of research papers is "Readings in Uncertain Reasoning", Ed. Shafer and Pearl, Morgan Kaufmann, 1990.

Exercises
1. What knowledge representation schemes met so far are most suited to the forms of reasoning discussed in the lectures?
   1. Discuss how they might deal with incompleteness, inconsistency, change and non-monotonic default reasoning.
   2. What knowledge forms are totally inadequate for uncertain reasoning?
2. Rewrite the President Nixon example using abnormal predicates.
3. Use circumscription to resolve the President Nixon example.
4. Consider the problem of deciding which clothes to wear using knowledge such as:
   o Wear casual clothes unless they are not clean or an important meeting occurs today.
   o Wear a sweater if it is cold.
   o The winter is usually cold.
   o Wear shorts if it is warm.
   o The summer is usually warm.
   1. Construct a JTMS network to represent these facts.
   2. Try to solve the problem In winter do I wear shorts?
   3. Answer the question What shall I wear today? (You may assume that the system knows the time of year.)
5. Construct a JTMS and ATMS to represent the following:
   1. If you have spots and a temperature you have measles.
   2. If you have a runny nose then unless it is hay fever season you have a cold.
6. Show how a JTMS could be used to facilitate constraint satisfaction problems and in particular cryptarithmetic puzzles.
7. Show how an ATMS could be used to facilitate constraint satisfaction problems and in particular cryptarithmetic puzzles.
8. Design a depth-first based algorithm to search or label a JTMS.
9. Design a breadth-first based algorithm to search or label an ATMS.
10. Use an ATMS to solve the following car diagnostic problem. You may assume the following:
    o If the battery is OK and the headlight bulbs are OK then the headlights will work.
    o If the battery is OK and the starter is OK then the engine will start.

Find explanations as to why:

    o The engine won't start but the headlights work.
    o The headlights won't work but the car starts.

Planning I

- What does planning involve?
- Search in Planning
- Blocks World Planning Examples
- Planning System Components
  o Choice of best rule
  o Rule application
  o Detecting Progress
- Goal Stack Planning
  o Sussman Anomaly (1975)
- Further Reading
- Exercises

What does planning involve?


Planning problems are hard problems:
- They are certainly non-trivial.
- Solutions involve many aspects that we have studied so far:
  o Search and problem solving strategies.
  o Knowledge representation schemes.
  o Problem decomposition -- breaking the problem into smaller pieces and trying to solve these first.

We have seen that it is possible to solve a problem by considering the appropriate form of knowledge representation, by using algorithms to solve parts of the problem, and also by using searching methods.

Search in Planning
Search basically involves moving from an initial state to a goal state. Classic search techniques can be applied to planning in this manner:

- A* Algorithm -- best-first search; problem decomposition -- synthesis; the frame problem.
- AO* Algorithm -- split the problem into distinct parts.
- Heuristic reasoning -- ordinary search backtracking can be hard, so introduce reasoning to devise heuristics and to control the backtracking.

The first major method considered the solution as a search from the initial state to a goal state through a state space. There are many ways of moving through this space by using operators, and the A* algorithm described the best-first search through a graph. This method is fine for simpler problems, but for more realistic problems it is advisable to use problem decomposition. Here the problem is split into smaller subproblems and the partial solutions are then synthesised. The danger in this method occurs when certain paths become abortive and have to be discarded: how much of the partial solution can be saved and how much needs to be recomputed? The frame problem -- deciding which things change and which do not -- gave some guidance, enabling us to decide on what stays the same and on what changes as we go from state to state. If the problem concerned designing a robot to build a car, then mounting the engine on the chassis would not affect the rear of the car.

The AO* algorithm enabled us to handle the solution of problems where the problem could be split into distinct parts and the partial solutions then reassembled. However, difficulties arise if parts interact with one another. Most problems have some interaction, and this implies some thought about the ordering of the steps; for example if the robot has to move a desk with objects on it from one room to another, or to move a sofa from one room to another when the piano is near the doorway. The thought process involved in recombining partial solutions of such problems is known as planning.

At this point we run into a discussion about the role of the computer in the design of a plan as to how we can solve a problem.
It is extremely unlikely at this stage that a computer will actually solve the problem unless it is a game, and here some interaction with a person is needed. Generally the computer is used to decide upon, or to offer words of wisdom on, the best method of approaching the solution of a problem. In one sense this can be interpreted as a simulation: if we are considering the handling of queues at an airport or a post office there are no actual people, and so we can try a range of possibilities; likewise in a traffic control problem there are no cars piling up at a junction or aircraft stacked two miles over an airport. Once the computer based investigation has come up with the best solution we can then implement it at the real site.

This approach assumes that there is continuity in the way of life. It cannot budget for rapid change or revolution. How can this approach cater for unexpected events such as a faulty component, a spurious happening such as two items stuck together, or a breakdown in the path somewhere vital such as in a camera reel? When a fault or some unrecognisable state is encountered it is not necessary to restart, for much of what has been successfully solved is still useful. Consider a child removing the needles from a partially completed knitted sweater: the problem lies in restarting from a dead end, and this will need some backtracking. This method of solution is pursued to reduce the level of complexity, and so to ensure successful handling we must introduce reasoning to help in the backtracking required to cater for faults. To assist in controlling backtracking many methods go backward from the goal to an initial state.

Blocks World Planning Examples


What is the Blocks World? -- The world consists of:
- A flat surface such as a tabletop.
- An adequate set of identical blocks which are identified by letters.
- The blocks can be stacked one on one to form towers of apparently unlimited height.
- The stacking is achieved using a robot arm which has fundamental operations and states which can be assessed using logic and combined using logical operations.
- The robot can hold one block at a time and only one block can be moved at a time.

We shall use the four actions:

- UNSTACK(A,B) -- pick up clear block A from block B;
- STACK(A,B) -- place block A using the arm onto clear block B;
- PICKUP(A) -- lift clear block A with the empty arm;
- PUTDOWN(A) -- place the held block A onto a free space on the table.

and the five predicates:

- ON(A,B) -- block A is on block B.
- ONTABLE(A) -- block A is on the table.
- CLEAR(A) -- block A has nothing on it.
- HOLDING(A) -- the arm holds block A.
- ARMEMPTY -- the arm holds nothing.

Using logic, but not logical notation, we can say that:

- If the arm is holding a block it is not empty.
- If block A is on the table it is not on any other block.
- If block A is on block B, block B is not clear.

Why use the Blocks World as an example?

The blocks world is chosen because:


- It is sufficiently simple and well behaved.
- It is easily understood yet still provides a good sample environment to study planning:
  o problems can be broken into nearly distinct subproblems;
  o we can show how partial solutions need to be combined to form a realistic complete solution.

Planning System Components


Simple problem solving tasks basically involve the following tasks:

1. Choose the best rule based upon heuristics.
2. Apply this rule to create a new state.
3. Detect when a solution is found.
4. Detect dead ends so that they can be avoided.

More complex problem solvers often add a fifth task:

5. Detect when a nearly solved state occurs and use special methods to make it a solved state.

Now let us look at what AI techniques are generally used in each of the above tasks. We will then look at specific methods of implementation.

- Choice of best rule
- Rule application
- Detecting Progress

Choice of best rule

Methods used involve finding the differences between the current states and the goal states and then choosing the rules that reduce these differences most effectively. Means-end analysis is a good example of this.

If we wish to travel by car to visit a friend:

- The first thing to do is to fill up the car with fuel.
- If we do not have a car then we need to acquire one.
- The largest difference must be tackled first.

Rule application
- Previously rules could be applied without any difficulty, as complete systems were specified and rules enabled the system to progress from one state to the next.
- Now we must be able to handle rules which only cover parts of systems.
- A number of approaches to this task have been used.

Green's Approach (1969)

Basically this states that we note the changes to a state produced by the application of a rule. Consider the problem of having two blocks A and B stacked on each other (A on top). Then we may have an initial state s0 which could be described as:

    ON(A, B, s0) ∧ ONTABLE(B, s0) ∧ CLEAR(A, s0)

If we now wish to UNSTACK(A, B), we express the operation as follows:

    ∀x ∀y ∀s: [CLEAR(x, s) ∧ ON(x, y, s)] → [HOLDING(x, DO(UNSTACK(x,y), s)) ∧ CLEAR(y, DO(UNSTACK(x,y), s))]

where x, y are any blocks, s is any state and DO() specifies that a new state results from the given action. The result of applying this to state s0, giving the state s1 = DO(UNSTACK(A,B), s0), is:

    HOLDING(A, s1) ∧ CLEAR(B, s1)

There are a few problems with this approach:

- The frame problem -- in the above we know that B is still on the table. This needs to be encoded into frame axioms that describe components of the state not affected by the operator.
- The qualification problem -- if we resolve the frame problem the resulting description may still be inadequate. Do we need to encode that a block cannot be placed on top of itself? If so, should this attempt fail? If we allow failure things can get complex -- do we allow for a lot of unlikely events?
- The ramification problem -- after unstacking block A, previously, how do we know that A is no longer at its initial location?

Not only is it hard to specify exactly what does not happen (the frame problem), it is hard to specify exactly what does happen.

STRIPS (1971 ff.)

STRIPS proposed another approach:

- Basically each operator has three lists of predicates associated with it:
  o a list of things that become TRUE, called ADD;
  o a list of things that become FALSE, called DELETE;
  o a set of prerequisites that must be true before the operator can be applied.
- Anything not in these lists is assumed to be unaffected by the operation.
- This method -- the initial implementation of STRIPS -- has been extended to include other forms of reasoning/planning (e.g. nonmonotonic methods, Goal Stack Planning and even Nonlinear Planning -- see later).

Consider the following example in the Blocks World and the fundamental operations:

- STACK -- requires the arm to be holding a block A and the other block B to be clear. Afterwards block A is on block B and the arm is empty; these become true -- ADD. The arm is not holding a block and block B is not clear; these predicates become false -- DELETE.
- UNSTACK -- requires that block A is on block B, that the arm is empty and that block A is clear. Afterwards block B is clear and the arm is holding block A -- ADD. The arm is not empty and block A is not on block B -- DELETE.

See Exercises for more examples.

We have now greatly reduced the information that needs to be held. If a new attribute is introduced we do not need to add new axioms for existing operators. Unlike in Green's method we remove the state indicator and use a database of predicates to indicate the current state. Thus if the last state was:

    ONTABLE(B) ∧ ON(A,B) ∧ CLEAR(A)

after the unstack operation the new state is:

    ONTABLE(B) ∧ CLEAR(B) ∧ HOLDING(A) ∧ CLEAR(A)
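The STRIPS mechanism just described -- check the prerequisites, remove the DELETE list, insert the ADD list -- can be sketched directly. The set-of-tuples state layout is an assumption for illustration:

```python
# A sketch of STRIPS-style operator application: an operator fires only
# if its prerequisites hold; then its DELETE list is removed from the
# state database and its ADD list inserted.

def unstack(a, b):
    return {
        "pre":    {("ON", a, b), ("CLEAR", a), ("ARMEMPTY",)},
        "delete": {("ON", a, b), ("ARMEMPTY",)},
        "add":    {("HOLDING", a), ("CLEAR", b)},
    }

def apply_op(state, op):
    if not op["pre"] <= state:               # prerequisites must be met
        raise ValueError("preconditions not met")
    return (state - op["delete"]) | op["add"]

state = {("ONTABLE", "B"), ("ON", "A", "B"), ("CLEAR", "A"), ("ARMEMPTY",)}
state = apply_op(state, unstack("A", "B"))
print(sorted(state))
# [('CLEAR', 'A'), ('CLEAR', 'B'), ('HOLDING', 'A'), ('ONTABLE', 'B')]
```

Note that ONTABLE(B) survives untouched: anything not in the ADD or DELETE lists is assumed unaffected, which is exactly how STRIPS sidesteps the frame axioms of Green's approach.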
Detecting Progress

The final solution can be detected if we can devise a predicate that is true when the solution is found and is false otherwise. This requires a great deal of thought and requires a proof.

Detecting false trails is also necessary:

- E.g. in A* search, if insufficient progress is made then a trail is aborted in favour of a more hopeful one.
- Sometimes it is clear that solving a problem one way has reduced the problem to parts that are harder than the original state.
- By moving back from the goal state to the initial state it is possible to detect conflicts, and any trail or path that involves a conflict can be pruned out.
- Reducing the number of possible paths means that there are more resources available for those left.

Supposing that the computer teacher is ill at a school, there are two possible alternatives:

- transfer a teacher from mathematics who knows computing, or
- bring another one in.

Possible problems:

- If the maths teacher is the only teacher of maths the problem is not solved.
- If there is no money left the second solution could be impossible.

If the problems are nearly decomposable we can treat them as decomposable and then patch them. How? Consider the final state reached by treating the problem as decomposable, note the differences between the goal state and the current state and between the goal state and the initial state, and use appropriate means-end analysis techniques to move in the best direction.

Better is to work back along the path leading to the current state and see if there are options. It may be that one optional path could lead to a solution whereas the existing route led to a conflict. Generally this means that some conditions are changed prior to taking an optional path through the problem.

Another approach involves putting off decisions until one has to make them, leaving decision making until more information is available and other routes have been explored. Often some decisions need not be taken, as those nodes are never reached.

Goal Stack Planning


The basic idea to handle interacting compound goals uses goal stacks. Here the stack contains:

- goals,
- operators -- ADD, DELETE and PREREQUISITE lists,
- a database maintaining the current situation for each operator used.

Consider the following, where we wish to proceed from the start state to the goal state.

Fig. 24 Goal Stack Planning Example

We can describe the start state:

    ON(B,A) ∧ ONTABLE(A) ∧ ONTABLE(C) ∧ ONTABLE(D) ∧ ARMEMPTY

and the goal state:

    ON(C,A) ∧ ON(B,D) ∧ ONTABLE(A) ∧ ONTABLE(D)

- Initially the goal stack is the goal state.
- We then split the problem into four subproblems.
- Two are solved as they are already true in the initial state -- ONTABLE(A), ONTABLE(D).
- With the other two there are two ways to proceed:

Alternative 1:

    ON(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

Alternative 2:

    ON(B,D)
    ON(C,A)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)
The method is to:

- Investigate the first node on the stack, i.e. the top goal.
- If a sequence of operators is found that satisfies this goal, it is removed and the next goal is attempted.
- This continues until the goal stack is empty.

Consider alternative 1 above further. The first goal ON(C,A) is not true, and the only operator that would make it true is STACK(C,A), which replaces ON(C,A), giving:

    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

STACK has prerequisites that must be met, namely that block A is clear and the arm is holding block C. So the stack becomes:

    CLEAR(A)
    HOLDING(C)
    CLEAR(A) ∧ HOLDING(C)
    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

Now the top goal is false and can only be made true by unstacking B. This leads to:

    ON(B,A)
    CLEAR(B)
    ARMEMPTY
    ON(B,A) ∧ CLEAR(B) ∧ ARMEMPTY
    UNSTACK(B,A)
    HOLDING(C)
    CLEAR(A) ∧ HOLDING(C)
    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

Now the first goal is true, the second is universally true, and the arm is empty. Since all top three goals are true we can apply the operator UNSTACK(B,A), as all prerequisites are met. This gives us the first node in the database:

    ONTABLE(A) ∧ ONTABLE(C) ∧ ONTABLE(D) ∧ HOLDING(B) ∧ CLEAR(A)

Note, as a future reference of the use of UNSTACK(B,A), that HOLDING(B) is true as well as CLEAR(A). The goal stack becomes:

    HOLDING(C)
    CLEAR(A) ∧ HOLDING(C)
    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

There are two ways we can achieve HOLDING(C): by using the operators PICKUP(C) or UNSTACK(C,x), where x is an unspecified block. This leads to two alternative goal stacks:

Alternative 1 (UNSTACK):

    ON(C,x)
    CLEAR(C)
    ARMEMPTY
    ON(C,x) ∧ CLEAR(C) ∧ ARMEMPTY
    UNSTACK(C,x)
    CLEAR(A) ∧ HOLDING(C)
    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

Alternative 2 (PICKUP):

    ONTABLE(C)
    CLEAR(C)
    ARMEMPTY
    ONTABLE(C) ∧ CLEAR(C) ∧ ARMEMPTY
    PICKUP(C)
    CLEAR(A) ∧ HOLDING(C)
    STACK(C,A)
    ON(B,D)
    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(D) ∧ ON(B,D)

In the first route we can see three references to some block x, and these must refer to the same block, although in the search it is conceivable that several blocks will become temporarily attached. Hence the binding of variables to blocks must be recorded. Investigating further, we need to satisfy the first goal, and this requires stacking C on some block x which is clear:
    CLEAR(x)
    HOLDING(C)
    CLEAR(x) ∧ HOLDING(C)
    STACK(C,x)
    CLEAR(C)
    ARMEMPTY
    ...
We now notice that one of the goals created is HOLDING(C), which was the goal we were trying to achieve by applying UNSTACK(C,x) in this case and PICKUP(C) in the other approach. So it would appear that we have added new goals and not made progress, and in terms of the A* algorithm it seems best to try the other approach. So, looking at the second approach:
- We can see that the first goal is achieved: block C is on the table.
- The second goal is also achieved: block C is clear.
- Remember that HOLDING(B) is still true, which means that the arm is not empty. This can be remedied by placing B on the table or planting it on block D if it is clear.

Lookahead could be used here to compare the ADD lists of the competing operators with the goals in the goal stack: there is a match with ON(B,D), which is satisfied by STACK(B,D). This also binds some block to block D. Applying STACK(B,D) generates the extra goals CLEAR(D) and HOLDING(B).

The new goal stack becomes:

    CLEAR(D)
    HOLDING(B)
    CLEAR(D) ∧ HOLDING(B)
    STACK(B,D)
    ONTABLE(C) ∧ CLEAR(C) ∧ ARMEMPTY
    PICKUP(C)
    ...

At this point the top goal is true, and the next, and thus the combined goal, leading to the application of STACK(B,D), which means that the world model becomes:

    ONTABLE(A) ∧ ONTABLE(C) ∧ ONTABLE(D) ∧ ON(B,D) ∧ ARMEMPTY

- This means that we can perform PICKUP(C) and then STACK(C,A).
- Coming to the goal ON(B,D) we realise that this has already been achieved, and checking the final goal we derive the following plan:

1. UNSTACK(B,A)
2. STACK(B,D)
3. PICKUP(C)
4. STACK(C,A)

This method produces a plan using good Artificial Intelligence techniques such as heuristics to find matching goals and the A* algorithm to detect unpromising paths which can be discarded.
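The derived four-step plan can be checked mechanically by running it through STRIPS-style operators and testing the goal. The operator encodings are assumptions that follow the ADD/DELETE lists described earlier in these notes:

```python
# A sketch that executes the derived plan with STRIPS-style operators
# and verifies that the goal state is reached.

def op(pre, delete, add):
    return {"pre": set(pre), "delete": set(delete), "add": set(add)}

def UNSTACK(a, b):
    return op([("ON", a, b), ("CLEAR", a), ("ARMEMPTY",)],
              [("ON", a, b), ("ARMEMPTY",)],
              [("HOLDING", a), ("CLEAR", b)])

def STACK(a, b):
    return op([("HOLDING", a), ("CLEAR", b)],
              [("HOLDING", a), ("CLEAR", b)],
              [("ON", a, b), ("ARMEMPTY",)])

def PICKUP(a):
    return op([("ONTABLE", a), ("CLEAR", a), ("ARMEMPTY",)],
              [("ONTABLE", a), ("ARMEMPTY",)],
              [("HOLDING", a)])

def run(state, plan):
    for step in plan:
        assert step["pre"] <= state, "preconditions not met"
        state = (state - step["delete"]) | step["add"]
    return state

start = {("ON", "B", "A"), ("ONTABLE", "A"), ("ONTABLE", "C"),
         ("ONTABLE", "D"), ("CLEAR", "B"), ("CLEAR", "C"),
         ("CLEAR", "D"), ("ARMEMPTY",)}
plan = [UNSTACK("B", "A"), STACK("B", "D"), PICKUP("C"), STACK("C", "A")]
final = run(start, plan)
print({("ON", "C", "A"), ("ON", "B", "D")} <= final)   # True: goal achieved
```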

Sussman Anomaly (1975)

The above method may fail to give a good solution. Consider:

Fig. 25 Sussman's Anomaly

The start state is given by:

    ON(C,A) ∧ ONTABLE(A) ∧ ONTABLE(B) ∧ ARMEMPTY

and the goal by:

    ON(A,B) ∧ ON(B,C)

This immediately leads to two approaches, as given below:

Alternative 1:

    ON(A,B)
    ON(B,C)
    ON(A,B) ∧ ON(B,C)

Alternative 2:

    ON(B,C)
    ON(A,B)
    ON(A,B) ∧ ON(B,C)

Choosing path 1 and trying to get block A on block B leads to the goal stack:

    ON(C,A)
    CLEAR(C)
    ARMEMPTY
    ON(C,A) ∧ CLEAR(C) ∧ ARMEMPTY
    UNSTACK(C,A)
    ARMEMPTY
    CLEAR(A) ∧ ARMEMPTY
    PICKUP(A)
    CLEAR(B)
    HOLDING(A)
    CLEAR(B) ∧ HOLDING(A)
    STACK(A,B)
    ON(B,C)
    ON(A,B) ∧ ON(B,C)

This achieves block A on block B, which was produced by putting block C on the table. The sequence of operators is:
1. UNSTACK(C,A)
2. PUTDOWN(C)
3. PICKUP(A)
4. STACK(A,B)

Working on the next goal of ON(B,C) requires block B to be cleared so that it can be stacked on block C. Unfortunately we need to unstack block A, which we have just stacked. Thus the list of operators becomes:

1. UNSTACK(C,A)
2. PUTDOWN(C)
3. PICKUP(A)
4. STACK(A,B)
5. UNSTACK(A,B)
6. PUTDOWN(A)
7. PICKUP(B)
8. STACK(B,C)

We are now in the state where block A is not on block B, so two extra operations are needed to restore it:

9. PICKUP(A)
10. STACK(A,B)

Analysing this sequence we observe that:

- steps 4 and 5 are opposites and therefore cancel each other out;
- steps 3 and 6 are opposites and therefore cancel each other out as well.

So a more efficient scheme is:

1. UNSTACK(C,A)
2. PUTDOWN(C)
3. PICKUP(B)
4. STACK(B,C)
5. PICKUP(A)
6. STACK(A,B)

Producing this efficient scheme in all such cases, where there is interaction between the goals, requires more sophisticated techniques, which will be considered in the next lecture.

Further Reading
Rich and Knight's book serves as the basis for most of the examples given in this lecture. The Handbook of Artificial Intelligence (Vol. 3 Ch 15) deals with planning and gives descriptions of STRIPS and other planners.

Exercises

1. Devise STRIPS-style operators to describe the PICKUP and PUTDOWN operators, similar to the STACK and UNSTACK operators given in the notes.
2. Express the STRIPS-style operators STACK, UNSTACK, PICKUP and PUTDOWN in a more computationally compatible form using clear definitions P (Precondition), D (Delete) and A (Add) and the predicates given in the notes. E.g. the form should be:

       STACK(x, y)
       P: describe using predicates
       D:
       A:

3. Work through the Sussman anomaly example step by step (using Goal Stack Planning) to produce the goal stack given for getting block A on block B, which was produced by putting block C on the table.
4. Work through the Sussman anomaly example further (using Goal Stack Planning) to derive the sequence of operators given at each stage in the lectures.
5. Consider the following:

Fig. 26

   1. Describe the start and goal states.
   2. Solve the problem using Goal Stack Planning.

Common Sense
Truly intelligent systems exhibit common sense -- they possess more than enough knowledge to be able to work in a given environment. We have already mentioned the CYC system, which is an ambitious attempt to code up common sense. However, as this example illustrates, you require a very large knowledge base for this type of system. Common sense systems need to support:
- Descriptions of everyday objects -- Frames.
- Typical sequences of everyday events -- Scripts.
- Default reasoning -- Nonmonotonic logics.
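As an informal sketch of the first of these, a frame with default values and inheritance might be modelled as below (the slot names and the lookup mechanism are illustrative assumptions, not a fixed notation from the notes):

```python
# A frame is a named set of slots; a frame inherits default slot
# values from its parent and may override them, giving a simple
# form of default (nonmonotonic) reasoning.

class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.get(slot)   # fall back to inherited default
        raise KeyError(slot)

bird = Frame("bird", legs=2, flies=True)
penguin = Frame("penguin", parent=bird, flies=False)  # override the default

print(penguin.get("legs"))   # 2, inherited from bird
print(penguin.get("flies"))  # False, overridden locally
```

The "birds fly, but penguins don't" conclusion is retracted simply by the more specific frame shadowing the inherited default, which is the essence of default reasoning.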

Common sense strategies illustrate many important AI topics. We will study how they can be implemented, drawing on many of the topics we have studied previously.

- The Physical World -- Qualitative Physics
  o Modelling the Qualitative World
  o Reasoning with qualitative information
- Common sense Ontologies
  o Time
  o Space
  o Materials
- Memory Organisation
- Memory in problem solving
- Further Reading
- Exercises

The Physical World -- Qualitative Physics


Qualitative Physics is an area of AI concerned with reasoning about the behaviour of physical systems. It is a good area to study since humans know a great deal about this world:
- They can predict that a falling ball will bounce many times.
- They can predict the trajectory of a cricket ball and even catch it.
- They know a pendulum swings back and forth, finally coming to rest in the middle.

However most humans, whilst being able to operate in this world, have no notion of the laws of physics that govern it. We can clearly look up the information and derive equations to describe, say, pendulum motion. Indeed computers are very good at this sort of computation when they have been programmed by experienced programmers. But is this how an intelligent system functions in this world? Three year old children can function in it, yet cannot even read or do elementary maths. One other motivation is that whilst complex computer models can be assembled, many problems remain difficult or impossible to solve analytically. Systems of equations (differential etc.) might be hard to derive and even impossible to solve.
Modelling the Qualitative World

Qualitative physics seeks to understand physical processes by building models of them.

A model may consist of the following entities:


- Variables -- may take on values as in a traditional physics model, but with a restricted set of values, e.g. temperature.
- Quantity Spaces -- a small set of discrete values for a variable.
- Rate of Change -- variables take on different values at different times. A real valued rate of change can be modelled qualitatively with a quantity space.
- Expressions -- combinations of variables.
- Equations -- assignment of expressions to variables.
- States -- sets of variables whose values change over time.

Note that qualitative algebra is different from ordinary algebra: say we describe the volume of a glass using a quantity space; then adding two qualitative values together does not follow the usual rules of arithmetic.
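The notes' own example is not reproduced above, but as an illustration using the standard sign quantity space {-, 0, +} (an assumption; a quantity space such as {empty, part-full, full} would behave similarly), qualitative addition can be sketched as:

```python
# Qualitative addition over the sign quantity space {-, 0, +}.
# Adding values of opposite sign is ambiguous: the result may be
# any of the three values, so we return a set of possibilities.

def qadd(a, b):
    if a == "0":
        return {b}
    if b == "0":
        return {a}
    if a == b:
        return {a}
    return {"-", "0", "+"}   # opposite signs: outcome is ambiguous

print(qadd("+", "0"))   # {'+'}
print(qadd("+", "+"))   # {'+'}
print(qadd("+", "-"))   # {'-', '0', '+'}
```

The last case is the important one: the algebra cannot decide the outcome, which is exactly the ambiguity that qualitative simulation (next section) must branch on.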

Reasoning with qualitative information

Reasoning in this area is often called qualitative simulation.

The basic idea being:


- Construct a sequence of discrete episodes that occur as qualitative variable values change.
- States are linked by qualitative rules that may be general. Rules may be applied to many objects simultaneously as they may all influence each other -- constraint satisfaction is used.
- Ambiguity may arise, so split outcomes into different paths and form a network of all possible states and transitions. Each path is called a history; the network an envisionment.
- In order to achieve effective programs for this we must know how to represent the behaviour of many kinds of processes, materials and the world in which they act.
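As a toy illustration of building an envisionment (the system here, a bouncing ball with qualitative position and velocity, and its transition rules are my own assumed example, not one from the notes):

```python
# Tiny envisionment: qualitative states of a bouncing ball are
# (position, velocity) pairs over small quantity spaces. Transitions
# encode qualitative rules; a path through the network is a
# "history", and the whole network is the "envisionment".

transitions = {
    ("up", "+"):     [("up", "0")],                      # rising ball slows
    ("up", "0"):     [("up", "-")],                      # apex: starts to fall
    ("up", "-"):     [("ground", "-")],                  # falls to the ground
    ("ground", "-"): [("ground", "+"), ("rest", "0")],   # bounce OR come to rest
    ("ground", "+"): [("up", "+")],                      # leaves the ground again
}

def histories(state, length):
    """Enumerate all qualitative histories of a given length."""
    if length == 0:
        return [[state]]
    return [[state] + rest
            for nxt in transitions.get(state, [])
            for rest in histories(nxt, length - 1)]

for h in histories(("up", "+"), 4):
    print(" -> ".join(f"{p}/{v}" for p, v in h))
```

The ambiguous transition at ("ground", "-") is where the envisionment branches into two histories: the ball either bounces back up or comes to rest.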

Common sense Ontologies


Some concepts are fundamental to common sense reasoning.

- Time
- Space
- Materials

Time

Here we address notions of time familiar to most people as opposed to the philosophical nature of time. For instance:
- Jimi Hendrix recorded albums between the mid 1960s and 1970.
- Jimi Hendrix died in 1970.
- Beautiful People released an album based on samples of all of Hendrix's recorded music.
- We can easily infer that Beautiful People's album was released after 1970.

The most basic notion is that time is occupied by events:


- Events occur during intervals -- continuous spaces of time.
- An interval has a start and end point and a duration (of time) between them.
- Intervals can be related to one another -- descriptions such as is-before, is-after, meets, is-met-by, starts, is-started-by, during, contains, ends, is-ended-by and equals.
- We can build axioms with intervals to describe events in time.
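These relations (essentially Allen's interval algebra) can be sketched over intervals with numeric endpoints; the (start, end) pair representation below is an illustrative assumption:

```python
# Intervals as (start, end) pairs, and a few of the qualitative
# relations between them, in the style of Allen's interval algebra.

def is_before(a, b):
    return a[1] < b[0]                    # a ends before b starts

def meets(a, b):
    return a[1] == b[0]                   # a ends exactly where b starts

def during(a, b):
    return b[0] < a[0] and a[1] < b[1]    # a lies strictly inside b

a, b, c = (0, 5), (5, 9), (6, 8)
print(is_before(a, c))   # True: a ends at 5, c starts at 6
print(meets(a, b))       # True: a ends where b starts
print(during(c, b))      # True: 5 < 6 and 8 < 9
```

The Hendrix inference above is just is-before applied to the recording interval and the release interval: every endpoint of the first precedes every endpoint of the second.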

Space

The Blocks World is a simple example of how we can model and describe space. However common sense notions such as "place object x near object y" are not accommodated. Now objects have a spatial extent while events have a temporal extent, so we might try to extend our common sense theory of time. However space is 3D and there are many more relationships than those for time, so this is not a good idea.

Another approach is to view objects and space at various levels of abstraction. E.g. we can view most printed circuit boards as being 2D objects. Choosing a representation means selecting relevant properties at particular levels of granularity. For instance we can define relations over spaces, such as inside, adjacent etc. We can also define relations for curves, lines, surfaces, planes and volumes, e.g. along, across, perpendicular etc.
Materials

We need to describe properties of materials:


- You cannot walk on water.
- If you knock a cup of coffee over what happens?
- If you pour a full kettle into a cup what happens?
- You can squeeze a sponge but not a brick.

Liquids (as can be seen from above) provide many interesting points. It is useful to think of spaces occupied by objects. Thus we can define properties such as:
- Capacity -- a bound on the amount of liquid a space can hold.
- Amount -- the volume occupied by a liquid.
- Full -- if amount equals capacity.
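These three properties answer the kettle question from the list above directly; a minimal sketch (the dictionary representation and numbers are illustrative assumptions):

```python
# A container modelled by its capacity and the amount of liquid in
# it. "Full" means the amount has reached the capacity.

def is_full(container):
    return container["amount"] >= container["capacity"]

def pour(src, dst):
    """Pour src into dst; dst can take no more than its free capacity."""
    space = dst["capacity"] - dst["amount"]
    moved = min(src["amount"], space)
    src["amount"] -= moved
    dst["amount"] += moved
    return src["amount"]          # liquid that could not be transferred

kettle = {"capacity": 1.5, "amount": 1.5}   # a full kettle, in litres
cup = {"capacity": 0.3, "amount": 0.0}

left = pour(kettle, cup)
print(is_full(cup))   # True: the cup fills to capacity
print(left)           # roughly 1.2 litres could not fit in the cup
```

Pouring a full kettle into a cup makes the cup full and leaves an excess, which is the common sense answer a reasoner should produce.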

Other properties materials can possess include:


- Free -- if a space is not wholly contained inside another object.
- Surround -- if enclosed by a very thin free space.
- Rigid
- Flexible
- Particulate -- e.g. sand

Memory Organisation
Memory is central to common sense behaviour and is also the basis for learning. Human memory is still not fully understood; however, psychologists have proposed several ideas:
- Short term memory (STM) -- only a few items at a time can be held here. Perceptual information is stored directly here.
- Long term memory (LTM) -- capacity for storage is very large and fairly permanent.

LTM is often divided up further:


- Episodic memory -- contains information about personal experiences.
- Semantic memory -- general facts with no personal meaning, e.g. birds fly. Useful in natural language understanding.

In terms of AI research, work started by Quillian on semantic memory led to semantic networks, frames and other slot and filler structures. Work on episodic memory grew out of scripts. Production systems are an example of STM-LTM computer models.

Memory in problem solving


Let us finish this topic by seeing how memory is employed in problem solving. We have seen that many problems are solved by analogy. Computer systems that perform this task are sometimes called case based reasoning (CBR) systems. CBR systems employ large case libraries rather than descriptions from first principles. They therefore rely heavily on memory organisation and retrieval.
- A rich indexing system must be employed -- when reasoning with a problem only relevant past experience should be recalled.
  o Index by features present in the problem. Require some measure of relevance of retrieved information.
  o Some features are only important in a certain context.
  o Inductive and explanation based learning are suitable here.
- The data structures used will be important, as the number of cases represented will be large.
  o Do we retrieve all information about a case or a fragment of it?
- A number of cases are usually retrieved. We need to select the best one using some heuristic, which may include:
  o Goal directed preference -- cases that include the same goal as the current problem.
  o Salient feature preference -- cases that include the most important (or largest number of) features.
  o Specificity preference -- cases where certain match features are identified.
  o Frequency preference -- select frequently matched cases.
  o Recency preference -- select recently matched cases.
  o Ease of adaptation preference -- cases whose features are easily modified for the new problem.
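A minimal retrieval step combining two of these heuristics (salient feature overlap, with recency as a tie-breaker) might look like the following; the case structure, feature names and weights are illustrative assumptions:

```python
# Retrieve the best-matching case from a small case library by
# counting shared features, preferring more recently used cases
# when the feature counts tie.

case_library = [
    {"name": "case1", "features": {"red-wine", "beef", "dinner"}, "last_used": 3},
    {"name": "case2", "features": {"white-wine", "fish"},         "last_used": 9},
    {"name": "case3", "features": {"red-wine", "beef"},           "last_used": 7},
]

def retrieve(problem_features, library):
    return max(library,
               key=lambda c: (len(c["features"] & problem_features),  # feature overlap
                              c["last_used"]))                        # recency tie-break

best = retrieve({"red-wine", "beef", "dinner"}, case_library)
print(best["name"])  # case1: shares the most features with the problem
```

Real CBR systems replace the raw feature count with weighted, context-sensitive relevance measures, but the retrieve-then-rank structure is the same.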

Further Reading
The Rich and Knight book covers all the issues mentioned, as does The Handbook of Artificial Intelligence, Vol. 4, Ch. XXI.

Exercises
1. Consider a toy balloon being blown up with compressed air. As air is released the balloon expands. Use qualitative measures to list the quantity spaces of the variables and rates of change in this system. Construct an envisionment for the system and devise one possible history.
2. Consider the function of a single stroke engine. Fuel is admitted to a cylinder and is ignited. As the fuel is ignited the air fuel mixture expands in a controlled explosion and the cylinder piston is moved. Use qualitative measures to list the quantity spaces of the variables and rates of change in this system. Construct an envisionment for the system and devise one possible history.
3. Compare and contrast case based reasoning and learning by analogy. How may transformational and derivational analogies be applied in case based reasoning systems?
4. Forgetting is one aspect of human memory. How could this be modelled in a case based reasoning system? Under what circumstances might this be beneficial?
