Course Outline
1. Introduction to artificial intelligence
2. Knowledge representation
3. Heuristic search
4. Natural language processing
5. Symbolic machine learning
6. Connectionism and evolutionary computation
Introduction to Artificial Intelligence
Artificial intelligence (AI) is an area in computer science that focuses on creating machines that can exhibit intelligent behaviour and apply knowledge.
In 1955, a program was developed that represented each problem as a tree model; the program would attempt to solve a problem by selecting the branch most likely to lead to the correct conclusion. In 1956, the term AI was coined at the Dartmouth Conference, and since then research has continued into developing programs and applications that can solve problems efficiently and learn by themselves. Several applications have been developed, e.g. missile systems, voice and character recognition, and engineering controllers.
Conventional computers operate by following explicitly programmed rules. This allows them to perform simple, monotonous tasks efficiently and reliably, tasks to which human beings are ill suited. For more complex problems, however, computers have trouble understanding specific situations.
4. Expert systems: systems that are able to make decisions and perform the work of professionals (human experts), e.g. diagnostic systems in hospitals.
5. Robotics: automation of tasks performed by a mechanical device through predefined programs.
6. Information predictors: used e.g. in banks, insurance companies and market surveys, whereby intelligence tools are used to detect trends and predict, for example, customer behaviour.
7. Computer vision and pattern recognition: computer processing of images from the real world and recognition of features present in the images.
Challenges of AI:
- Some domains, e.g. the affective and psychomotor domains, are hard to store in a machine.
- There is no well understood model with which to represent reality and thus induce artificial intelligence.
- It is expensive to acquire tools and to develop and research artificial intelligence.
Knowledge Representation
Knowledge is the symbolic representation of some named universe of discourse. The universe of discourse may consist of actual activities, or of fictional ones set in the future or in some belief. In AI systems, we may need to represent objects, events, and performance or behaviour as kinds of knowledge.
Knowledge representation is an area of artificial intelligence concerned with how to use a symbol system to represent a domain of discourse. Its goal is to organize knowledge in a manner that facilitates the drawing of conclusions.
Components of a Representation
A representation has four components:
i. A represented world: the domain that the representations are mapped from.
ii. A representing world: the domain that contains the representations.
iii. Representing rules: the set of rules that map elements of the represented world to those of the representing world.
iv. The representation system: the procedure for extracting information from a knowledge representation; its choice determines the ease or difficulty of finding the information.
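The four components can be illustrated with a minimal Python sketch. The domain (city distances), the names and the figures below are hypothetical, chosen only to show how each component lines up with a piece of code:

```python
# Represented world: a set of facts in the domain of discourse
# (hypothetical city-distance data, not from the notes).
represented_world = [("Nairobi", "Nakuru", 160), ("Nairobi", "Kisumu", 345)]

# Representing rules: map each fact of the represented world to an
# element of the representing world (here, a dictionary entry).
representing_world = {(a, b): d for a, b, d in represented_world}

def distance(a, b):
    """Representation system: the procedure for extracting
    information from the representation."""
    d = representing_world.get((a, b))
    return d if d is not None else representing_world.get((b, a))

print(distance("Nakuru", "Nairobi"))  # 160
```

Note how the choice of representation system (lookup in both orders) determines how easy it is to find the information.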
Uses of a Representation
After representing the knowledge, we use it for:
i. Inference/reasoning: inferring facts from the existing data.
ii. Learning: acquiring knowledge, whereby new data has to be classified prior to storage for easy retrieval and has to interact with existing facts to avoid duplication.
Types of Knowledge
There are two main types of knowledge:
i. Declarative/descriptive/propositional knowledge: the factual information stored in memory, static in nature. It is the part of knowledge that describes how things are. Its domain is defined by things, events or processes, their attributes, and the relations between them.
ii. Procedural/imperative/know-how knowledge: the knowledge of how to perform a task or how to operate. It is mainly applied in problem solving.
Properties of Knowledge
Good representations of knowledge:
i. They make the important objects and relations explicit.
ii. They expose natural constraints, i.e. one can express the way one object or relation influences another.
iii. They bring objects and relations together.
iv. They suppress irrelevant detail.
v. They are transparent, i.e. the meaning can be understood clearly.
vi. They are complete, i.e. they contain all that needs to be contained.
vii. They are concise, i.e. they communicate the information efficiently.
viii. They are fast, i.e. retrieval of information is fast.
ix. They are computable, i.e. they can be created by a known procedure.
Properties of Good Knowledge Representation Systems
These characteristics can be summarised into the following four properties for knowledge representation systems:
i. Representational adequacy: the ability to represent the required knowledge.
ii. Inferential efficiency: the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides.
iii. Inferential adequacy: the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original.
iv. Acquisitional efficiency: the ability to acquire new knowledge using automatic methods wherever possible.
iii. Procedural component: the part that specifies the access procedures that enable one to create descriptions, modify them, and answer questions using them.
iv. Semantic component: the part that establishes a way of associating meaning with the descriptions created from the procedural part.
a. Propositional logic
This is logic at the sentence level, where we consider sentences or statements that are either true or false. If a proposition is true then it has a truth value of true; if it is false then its truth value is false.
Example
Proposition: Saturday is the last day of the week.
Non-proposition: Walk out.
Simple sentences which are true or false are basic propositions. Larger and more complex sentences can be constructed from basic propositions by combining them with connectives. Therefore, the basic elements of propositional logic are propositions and connectives. Examples of connectives are: not (¬), and (∧), or (∨), implies (→), and if and only if (↔).
Truth tables are used to map the relations of propositions when they are combined with connectives. Let p and q be propositions:

i. Not
p | ¬p
T | F
F | T

ii. And
p q | p ∧ q
T T | T
T F | F
F T | F
F F | F

iii. Or
p q | p ∨ q
T T | T
T F | T
F T | T
F F | F

iv. Imply
p q | p → q
T T | T
T F | F
F T | T
F F | T

v. If and only if
p q | p ↔ q
T T | T
T F | F
F T | F
F F | T
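The truth tables above can also be generated programmatically, since each connective is just a Boolean function of the proposition values. A small sketch:

```python
from itertools import product

def truth_table(name, op):
    """Print and return the truth table of a binary connective."""
    print(f"p q | {name}")
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append((p, q, op(p, q)))
        print(f"{'T' if p else 'F'} {'T' if q else 'F'} | "
              f"{'T' if op(p, q) else 'F'}")
    return rows

implies = lambda p, q: (not p) or q   # p -> q is false only when T -> F
iff     = lambda p, q: p == q         # p <-> q is true when values agree

truth_table("p -> q", implies)
truth_table("p <-> q", iff)
```

Note that implication is defined as ¬p ∨ q, which reproduces the column T F T T of table iv above.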
b. Predicate Logic Propositional logic is not powerful enough to represent all types of assertions.
To cope with the deficiencies of propositional logic, we introduce predicates and quantifiers to form predicate logic. A predicate is a verb phrase that describes properties of objects or the relationships among objects. For example, in man(Maina) the predicate man asserts a property of the object Maina, while in loyalto(Maina, Mugo) the predicate loyalto expresses a relationship between two objects.
Algorithm: Converting to Clause Form
1. Eliminate →, using the fact that a → b ≡ ¬a ∨ b.
2. Reduce the scope of each ¬ to a single term, using de Morgan's laws (i.e. ¬(a ∧ b) ≡ ¬a ∨ ¬b and ¬(a ∨ b) ≡ ¬a ∧ ¬b) and the equivalences between quantifiers (¬∀x: P ≡ ∃x: ¬P and ¬∃x: P ≡ ∀x: ¬P).

Consider the following set of facts:
i. Maina was a man.
ii. Maina was Larilian.
iii. All Larilians were Nyandaruans.
iv. Mugo was a chief.
v. All Nyandaruans were either loyal to Mugo or hated him.
vi. Everyone is loyal to someone.
vii. People only try to stone chiefs they are not loyal to.
viii. Maina tried to stone Mugo.
ix. All men are people.

The above in predicate logic:
i. man(Maina)
ii. Larilian(Maina)
iii. ∀x: Larilian(x) → Nyandaruan(x)
iv. chief(Mugo)
v. ∀x: Nyandaruan(x) → loyalto(x, Mugo) ∨ hate(x, Mugo)
vi. ∀x ∃y: loyalto(x, y)
vii. ∀x ∀y: person(x) ∧ chief(y) ∧ trystone(x, y) → ¬loyalto(x, y)
viii. trystone(Maina, Mugo)
ix. ∀x: man(x) → person(x)

Conversion of these statements to clause form (well-formed formulae, wffs) gives:
i. man(Maina)
ii. Larilian(Maina)
iii. ¬Larilian(x1) ∨ Nyandaruan(x1)
iv. chief(Mugo)
v. ¬Nyandaruan(x2) ∨ loyalto(x2, Mugo) ∨ hate(x2, Mugo)
vi. loyalto(x3, S1(x3)) (S1 is a Skolem function replacing ∃y)
vii. ¬person(x4) ∨ ¬chief(y1) ∨ ¬trystone(x4, y1) ∨ ¬loyalto(x4, y1)
viii. trystone(Maina, Mugo)
ix. ¬man(x5) ∨ person(x5)

Proof: did Maina hate Mugo?
i. Express the question in predicate form: hate(Maina, Mugo)
ii. Negate the statement/predicate: ¬hate(Maina, Mugo)
iii. Look for relevant statements and resolve them together:

¬hate(Maina, Mugo)
with (v), binding x2 = Maina:   ¬Nyandaruan(Maina) ∨ loyalto(Maina, Mugo)
with (iii), binding x1 = Maina: ¬Larilian(Maina) ∨ loyalto(Maina, Mugo)
with (ii):                      loyalto(Maina, Mugo)
with (vii), x4 = Maina, y1 = Mugo: ¬person(Maina) ∨ ¬chief(Mugo) ∨ ¬trystone(Maina, Mugo)
with (iv):                      ¬person(Maina) ∨ ¬trystone(Maina, Mugo)
with (viii):                    ¬person(Maina)
with (ix), binding x5 = Maina:  ¬man(Maina)
with (i):                       NULL

The empty clause (NULL) is a contradiction, so the negated statement is false and we conclude that Maina hated Mugo.
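The conclusion of the Maina/Mugo story can be cross-checked mechanically. The sketch below forward-chains over the ground facts; the disjunctive axiom "Nyandaruans are loyal to Mugo or hate him" is used in the contrapositive Horn form (not loyal implies hate), which is an encoding choice, not the resolution procedure itself:

```python
# Ground facts from the story; tuples are (predicate, arg1[, arg2]).
facts = {("man", "Maina"), ("Larilian", "Maina"),
         ("chief", "Mugo"), ("trystone", "Maina", "Mugo")}

def step(fs):
    """Apply every rule once to the current fact set."""
    new = set(fs)
    for f in fs:
        if f[0] == "Larilian":                  # all Larilians are Nyandaruans
            new.add(("Nyandaruan", f[1]))
        if f[0] == "man":                       # all men are people
            new.add(("person", f[1]))
        if f[0] == "trystone":                  # people only try to stone
            _, x, y = f                         # chiefs they are not loyal to
            if ("person", x) in fs and ("chief", y) in fs:
                new.add(("notloyal", x, y))
        if f[0] == "Nyandaruan" and ("notloyal", f[1], "Mugo") in fs:
            new.add(("hate", f[1], "Mugo"))     # not loyal => must hate
    return new

while (nxt := step(facts)) != facts:            # iterate to a fixpoint
    facts = nxt

print(("hate", "Maina", "Mugo") in facts)  # True
```

The fixpoint loop derives person(Maina), then notloyal(Maina, Mugo), then hate(Maina, Mugo), mirroring the resolution steps.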
ii. Rules
Rules are commonly used to represent knowledge in an inference system. They are usually in the form of production rules (if-then rules). They are used to show relationships among variables and to derive actions from inputs to an inference engine.
Each rule consists of an antecedent (the if part) and a consequent (the then part). Interpreting an if-then rule involves two distinct parts: evaluating the antecedent, then applying the result to the consequent.
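The two interpretation steps can be sketched in a few lines. Combining the parts of a multi-part antecedent with min is one common choice for the AND operator (an assumption here, not the only option):

```python
def evaluate_rule(antecedent_degrees):
    """Degree to which the consequent holds, given the degree of
    truth of each antecedent part; combining with min implements AND.
    Degrees of exactly 0.0/1.0 reduce this to binary logic."""
    return min(antecedent_degrees)

# "if the sky is grey and the wind is blowing then it will rain"
print(evaluate_rule([1.0, 1.0]))   # binary case: antecedent true -> 1.0
print(evaluate_rule([0.7, 0.4]))   # multivalued case: rule fires to 0.4
```

In the multivalued case the consequent holds to the same degree as the (weakest part of the) antecedent, as described below.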
In the case of a binary/2-valued logic, if the premise is true then the conclusion is true. In the case of a multivalued logic, if the antecedent is true to some degree, then the consequent is also true to that same degree. Binary logic uses the values 0 or 1; multivalued logic uses a range from 0 to 1 (e.g. 0.5).
The antecedent of a rule can have multiple parts, e.g. "if the sky is grey and the wind is blowing then it will rain". In such a case, all the parts of the antecedent are evaluated and resolved into a single value using logical operators.
iii. Natural Language
Natural language is the human spoken language. It is the most expressive knowledge representation formalism, since everything that can be expressed symbolically can also be expressed in natural language. Its reasoning potential is very rich, but it is hard to model.
Problems with natural language:
i. It is often ambiguous.
ii. There is little uniformity in the structure of sentences.
iii. Syntax and semantics are not fully understood.
iv. Database Systems
Database systems are logical organizations of data in a form that is meaningful to the user and facilitates easy retrieval. They are well suited to efficiently representing and processing large amounts of data. However, only simple aspects of some universe of discourse can be represented, hence reasoning is very simple and limited.
v. Semantic Networks
Semantic networks are capable of representing individual objects, categories of objects, and relations among objects.
(Figure: a semantic network. Mary and John are linked to Persons by member-of links; Persons is linked to Mammals by a subset link and carries the property "two legs"; Mary is linked to John by a sister-of link.)
Mary is a sister of John. Mary and John are members of Persons. Persons have two legs. Semantic nets make it easy to perform inheritance reasoning, and they are simple and efficient compared to logic.
vi. Frames (**to read)
Frames are AI data structures used to divide knowledge into sub-structures by representing stereotyped situations. Frames are connected together to form a complete idea.
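A minimal sketch of the Mary/John semantic network as a dictionary of nodes, where "isa" links support the inheritance reasoning mentioned above (the encoding is one simple choice among many):

```python
# Each node maps property names to values; "isa" points up the hierarchy.
net = {
    "Mammals": {"isa": None},
    "Persons": {"isa": "Mammals", "legs": 2},
    "Mary":    {"isa": "Persons", "sister_of": "John"},
    "John":    {"isa": "Persons"},
}

def lookup(node, prop):
    """Walk the isa links until the property is found (inheritance)."""
    while node is not None:
        if prop in net[node]:
            return net[node][prop]
        node = net[node]["isa"]
    return None

print(lookup("Mary", "legs"))       # 2, inherited from Persons
print(lookup("Mary", "sister_of"))  # John, stored on Mary directly
```

Inheritance here is just a walk up the isa chain, which is why semantic nets make this kind of reasoning cheap compared to general logical inference.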
Heuristic Search
Heuristic search uses problem-specific knowledge beyond the definition of the problem itself. It is also known as informed search. It can thus arrive at solutions more efficiently than uninformed/blind search strategies.
Breadth First Search (BFS)
In BFS, all nodes at a given depth in the search tree are expanded before any nodes at the next level are expanded. BFS can be implemented using a FIFO queue, ensuring that the nodes that are visited first will be expanded first.
Evaluation of the BFS Algorithm
i. Completeness: BFS is complete. If the goal node is at finite depth d, BFS will eventually find it after expanding all shallower nodes.
ii. Optimality
The shallowest goal node is not necessarily the optimal one; hence the BFS algorithm is optimal only if the path cost is a non-decreasing function of the depth of the node, e.g. when all the actions/moves have the same cost.
iii. Time complexity
Consider a state space where every state has b successors. The root of the search tree generates b nodes at level 1, b² at level 2, and b³ at level 3. Each of these generates b more nodes, and so on. If the solution is at level d, then in the worst case we would expand all but the last node at level d, giving

1 + b + b² + b³ + … + b^d = O(b^d)

generated nodes, i.e. exponential time complexity.
Every node that is generated must remain in memory, hence the space complexity also grows exponentially: BFS places a very high demand on memory.
Depth First Search (DFS)
The search proceeds to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped off and the search backs up to the next shallowest node that still has unexplored successors. DFS can be implemented using stacks or LIFO queues.
Comparison: BFS is complete while DFS may fail to terminate on infinite trees; neither is optimal in general; DFS requires far less memory than BFS, although its worst-case time is still exponential.
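The FIFO/LIFO distinction is the only difference between the two searches, which a sketch makes plain. The graph below is a hypothetical example; the letters are arbitrary node names:

```python
from collections import deque

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
         "D": [], "E": ["G"], "F": [], "G": []}

def bfs(start, goal):
    frontier, visited = deque([[start]]), set()
    while frontier:
        path = frontier.popleft()          # FIFO: shallowest path first
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)
            for nbr in graph[node]:
                frontier.append(path + [nbr])
    return None

def dfs(start, goal):
    frontier, visited = [[start]], set()
    while frontier:
        path = frontier.pop()              # LIFO: deepest path first
        node = path[-1]
        if node == goal:
            return path
        if node not in visited:
            visited.add(node)
            for nbr in graph[node]:
                frontier.append(path + [nbr])
    return None

print(bfs("A", "G"))  # ['A', 'B', 'E', 'G']
print(dfs("A", "G"))
```

Swapping `popleft()` for `pop()` turns the queue into a stack and BFS into DFS; everything else is identical.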
Heuristic Searches
A key component of a heuristic search is the heuristic function, denoted h(n): the estimated cost of the cheapest path from node n to a goal node. If n is the goal node, then h(n) = 0.
Greedy Best First Search (GBFS)
GBFS tries to expand the node that is closest to the goal, on the grounds that it is likely to reach a solution quickly. It evaluates nodes using the heuristic function alone: f(n) = h(n). It resembles DFS in the way it prefers to follow a single path all the way to the end, backing up when it hits a dead end. Just like DFS, it is neither optimal nor complete.
(Figure: an example search tree with the heuristic estimate h(n) marked at each node: 366, 253, 329, 374, 172, 380, 193, and 0 at the goal.)
A* Search
A* evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to get from the node to the goal:
f(n) = g(n) + h(n)
Since g(n) gives the path cost from the start node to node n, f(n) is the estimated cost of the cheapest solution through node n. Therefore, in trying to find the cheapest solution, we try the node with the lowest value of f(n); with an admissible heuristic, A* is optimal. From the figure, the route would be A-B-E-H (the cheapest route).
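A* can be sketched with a priority queue ordered by f(n) = g(n) + h(n). The edge costs and heuristic values below are hypothetical (the figure's original example did not survive), but they are chosen so that the cheapest route is A-B-E-H, matching the text:

```python
import heapq

graph = {"A": {"B": 1, "C": 4}, "B": {"D": 3, "E": 1},
         "C": {"E": 2}, "D": {}, "E": {"H": 2}, "H": {}}
h = {"A": 3, "B": 2, "C": 3, "D": 5, "E": 1, "H": 0}

def a_star(start, goal):
    # frontier entries are (f, g, path); heapq pops the lowest f first
    frontier = [(h[start], 0, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, g
        for nbr, cost in graph[node].items():
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):   # better path to nbr
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, path + [nbr]))
    return None, None

path, cost = a_star("A", "H")
print(path, cost)  # ['A', 'B', 'E', 'H'] 4
```

The `best_g` map discards dominated paths, so each node is pushed only when a cheaper route to it is found.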
Learning
Forms of Learning
The field of machine learning distinguishes three forms of learning:
a. Supervised learning
b. Unsupervised learning
c. Reinforcement learning
The type of feedback available is usually the most important factor in determining the nature of the learning problem the agent faces.
1. Supervised Learning
Supervised learning involves learning a function from examples of its inputs and outputs, e.g. learning multiplication tables.
The correct output values are first provided, after which the learning agent can produce the correct output for what it perceives. For fully observable environments, an agent can observe the effects of its actions and hence can use supervised learning methods to learn to predict them.
(Diagram: inputs → learning function → outputs.)
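Supervised learning in miniature: a 1-nearest-neighbour learner that induces a function from labelled examples. The data points below are made up for illustration:

```python
# Labelled training examples: (input vector, output label).
train = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"),
         ((4.0, 4.2), "no"),  ((3.8, 4.0), "no")]

def predict(x):
    """Return the label of the closest training example
    (squared Euclidean distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(train, key=lambda ex: dist(ex[0], x))[1]

print(predict((1.1, 1.0)))  # yes
print(predict((4.1, 3.9)))  # no
```

The correct outputs are supplied up front in `train`; the learned function then generalises them to unseen inputs, which is exactly the inputs → learning function → outputs picture.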
2. Unsupervised Learning
Unsupervised learning involves learning patterns in the input when no specific output values are supplied.
(Diagram: inputs → learning function.)
A purely unsupervised learning agent cannot learn what to do, because it has no information as to what constitutes a correct action or a desirable state. An example is conducting research.
3. Reinforcement Learning
Rather than being told what to do, the agent learns from reinforcement, such as a reward or its absence (this teaches behavioural skills, e.g. potty training, or promoting hard-working employees).
The design of a learning element is affected by three major concerns:
i. Which components of the performance element are to be learned?
ii. What feedback is available to learn these components?
iii. What representation is used for the components?
The components of a performance element in learning may include the following:
i. A direct mapping from conditions of the current state to actions.
ii. A means to infer relevant properties of the world being learned.
iii. Information about the way the world being learned responds and the results of possible actions.
iv. Information indicating the desirable states and actions.
viii. Reservation: whether we have made a reservation or not.
ix. Type: the kind of restaurant (e.g. Italian/French).
x. WaitEstimate: the wait estimated by the host (0-10 min, 10-30 min, 30-60 min, >60 min).
Example | Patrons | Type    | WaitEstimate
x1      | Some    | French  | 0-10
x2      | Full    | Thai    | 30-60
x3      | Some    | Burger  | 0-10
x4      | Full    | Thai    | 10-30
x5      | Full    | French  | >60
x6      | Some    | Italian | 0-10
x7      | None    | Burger  | 0-10
x8      | Some    | Thai    | 0-10
x9      | Full    | Burger  | >60
x10     | Full    | Italian | 10-30
x11     | None    | Thai    | 0-10
x12     | Full    | Burger  | 30-60
(Each example also records the attributes Alt, Bar, Fri/Sat, Hungry, Price, Rain and Reservation.)
The restaurant scenario is an example of a Boolean decision tree, which consists of a vector of input attributes X and a single Boolean output Y. A set of examples (x1, y1), …, (x12, y12) is as shown above. Decision trees are fully expressive within the class of propositional languages, since any Boolean function can be written as a decision tree. Positive examples are the ones in which the goal WillWait is true, e.g. x1, x3, x4, while the negative examples are the ones in which it is false. The complete set of examples is called the training set.
The idea behind the decision tree learning algorithm is to test the most important attribute first, i.e. the attribute that makes the most difference to the classification of the training examples. This hopes to get the correct classification with a small number of tests, implying that all paths in the tree will be short and the tree as a whole will be small, e.g. starting with Patrons and then Hungry, as opposed to starting with Type.
Testing the attribute Type first:
- Type = French: {x1, x5}
- Type = Italian: {x6, x10}
- Type = Thai: {x2, x4, x8, x11}
- Type = Burger: {x3, x7, x9, x12}

Testing the attribute Patrons first:
- Patrons = None: {x7, x11} → negative (No)
- Patrons = Some: {x1, x3, x6, x8} → positive (Yes)
- Patrons = Full: {x2, x4, x5, x9, x10, x12} → mixed; test Hungry next:
  - Hungry = Yes: {x2, x4, x10, x12}
  - Hungry = No: {x5, x9}
Type is a poor attribute because it leaves us with four outcomes, each with the same number of positive and negative examples. Patrons is a fairly important attribute: if its value is None or Some, then we are left with example sets which we can classify definitively.
Considerations for the recursive algorithm include:
i. If there are some positive and some negative examples, choose the best attribute to split them on.
ii. If all remaining examples are positive or all are negative, then we can answer yes or no (true or false).
iii. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node's parent.
iv. If there are no attributes left but both positive and negative examples remain, then there is a problem: the examples have the same descriptions but different classifications, as a result of incorrect data or of attributes that do not give enough information to describe the situation fully.
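The recursive algorithm can be sketched as a compact ID3-style learner that picks the attribute with the highest information gain at each node. The five training rows below are a made-up miniature of the restaurant data, not the table above:

```python
from collections import Counter
import math

data = [
    {"patrons": "some", "hungry": "yes", "wait": "yes"},
    {"patrons": "none", "hungry": "no",  "wait": "no"},
    {"patrons": "full", "hungry": "yes", "wait": "yes"},
    {"patrons": "full", "hungry": "no",  "wait": "no"},
    {"patrons": "some", "hungry": "no",  "wait": "yes"},
]

def entropy(rows):
    counts = Counter(r["wait"] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, attr):
    """Information gain = entropy before split - weighted entropy after."""
    parts = Counter(r[attr] for r in rows)
    rem = sum(n / len(rows) * entropy([r for r in rows if r[attr] == v])
              for v, n in parts.items())
    return entropy(rows) - rem

def learn(rows, attrs):
    labels = {r["wait"] for r in rows}
    if len(labels) == 1 or not attrs:       # pure node, or no attributes left:
        return Counter(r["wait"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))   # most important attribute
    return {best: {v: learn([r for r in rows if r[best] == v],
                            [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}}}

tree = learn(data, ["patrons", "hungry"])
print(tree)
```

On this data the learner picks Patrons at the root (its gain is higher than Hungry's) and only tests Hungry inside the Full branch, reproducing the shape argued for above.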
(Figure: a learning curve — prediction quality on the test set plotted against training set size, from 10 to 80 examples.)
As the training set grows, prediction quality increases. NB: the learning algorithm must not be allowed to see the test data before the learned hypothesis is tested on them.
D.I.Y.: research the terms "noise" and "overfitting" with regard to problems that arise when using decision trees for training, and how to minimise them.
In order to extend decision trees to a wider variety of problems, the following issues must be addressed:
- Missing data: in many domains, not all attribute values will be known for every example. The values might not have been recorded, or might be too expensive to obtain.
- Multivalued attributes: when an attribute has many possible values, the information gain measure gives an inappropriate indication of the attribute's usefulness.
- Continuous and integer-valued input attributes: these have an infinite set of possible values that would generate infinitely many branches. Typically, we find the split point that gives the highest information gain.