
1.0 ARTIFICIAL INTELLIGENCE

1.1 Introduction

Artificial intelligence (AI) is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents," where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1956, defines it as "the science and engineering of making intelligent machines." The field was founded on the claim that a central property of humans, intelligence (the sapience of Homo sapiens), can be so precisely described that it can be simulated by a machine. This raises philosophical issues about the nature of the mind and the ethics of creating artificial beings, issues which have been addressed by myth, fiction and philosophy since antiquity.

Artificial intelligence has been the subject of optimism, but has also suffered setbacks and, today, has become an essential part of the technology industry, providing the heavy lifting for many of the most difficult problems in computer science. AI research is highly technical and specialized, and deeply divided into subfields that often fail to communicate with each other. Subfields have grown up around particular institutions, the work of individual researchers, the solution of specific problems, longstanding differences of opinion about how AI should be done, and the application of widely differing tools. The central problems of AI include such traits as reasoning, knowledge, planning, learning, communication, perception and the ability to move and manipulate objects. General intelligence (or "strong AI") is still among the field's long-term goals.

1.2 Characteristics

In order for something to be considered an "Artificial Intelligence," a few different characteristics are required, including the following abilities:

- The ability to act intelligently, as a human does.
- The ability to behave following "general intelligent action."
- The ability to artificially simulate the human brain.
- The ability to actively learn and adapt as a human does.
- The ability to process language and symbols.

As can be seen from just these few characteristics, Artificial Intelligence primarily concerns a computer's ability to mimic human intelligence. That is its key characteristic.

1.3 Branches

The branches of Artificial Intelligence are as follows:

1. Automatic Programming: specify the desired behavior and allow the AI system to write the program.
2. Bayesian Networks: building networks with probabilistic information.
3. Constraint Satisfaction: solving NP-complete problems using a variety of techniques.
4. Knowledge Engineering: transforming human knowledge into a form that a computer can understand.
5. Machine Learning: programs that learn from past experience.
6. Neural Networks: modeling programs that are structured like mammalian brains.
7. Planning: systems that identify the best sequence of actions to reach a given goal.

1.4 Approach

Neural Networks and Parallel Computation

The human brain is made up of a web of billions of cells called neurons, and understanding its complexities is seen as one of the last frontiers in scientific research. The aim of AI researchers who prefer this bottom-up approach is to construct electronic circuits that act as neurons do in the human brain. Although much of the working of the brain remains unknown, the complex network of neurons is what gives humans intelligent characteristics. By itself, a neuron is not intelligent, but when grouped together, neurons are able to pass electrical signals through networks. When a neuron "fires," it passes a signal to the next neuron in the chain.

Research has shown that a signal received by a neuron travels through the dendrite region, and down the axon. Separating nerve cells is a gap called the synapse. In order for the signal to be transferred to the next neuron, the signal must be converted from electrical to chemical energy. The signal can then be received by the next neuron and processed.
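To make the bottom-up idea concrete, here is a minimal sketch of a single artificial neuron of the kind such researchers model: it sums weighted inputs and "fires" when the total crosses a threshold. The weights and threshold values below are illustrative assumptions, not taken from the text.

```python
# A minimal artificial neuron: weighted inputs, a threshold, a binary "fire" decision.
# The weights and threshold are illustrative assumptions.

def neuron_fires(inputs, weights, threshold):
    """Return True if the weighted sum of inputs reaches the firing threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return activation >= threshold

# Three incoming signals (e.g., from upstream neurons), each firing (1) or not (0).
incoming = [1, 0, 1]
weights = [0.5, 0.9, 0.4]   # synaptic strengths (assumed)
print(neuron_fires(incoming, weights, threshold=0.8))  # True: 0.5 + 0.4 >= 0.8
```

Grouping many such units into layered networks, with learned rather than hand-picked weights, is what gives the approach its power.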

2.0 KNOWLEDGE REPRESENTATION

2.1 Introduction

Knowledge Representation (KR) research involves analysis of how to accurately and effectively reason and how best to use a set of symbols to represent a set of facts within a knowledge domain. A symbol vocabulary and a system of logic are combined to enable inferences about elements in the KR, creating new KR sentences. Logic is used to supply formal semantics of how reasoning functions should be applied to the symbols in the KR system. Logic is also used to define how operators can process and reshape the knowledge. Examples of operators and operations include negation, conjunction, adverbs, adjectives, quantifiers and modal operators. This logic constitutes the interpretation theory. These elements (symbols, operators, and interpretation theory) are what give sequences of symbols meaning within a KR.

A key parameter in choosing or creating a KR is its expressivity. The more expressive a KR, the easier and more compact it is to express a fact or element of knowledge within the semantics and grammar of that KR. However, more expressive languages are likely to require more complex logic and algorithms to construct equivalent inferences. A highly expressive KR is also less likely to be complete and consistent; less expressive KRs may be both complete and consistent.

In applying KR systems to practical problems, the complexity of the problem may exceed the resource constraints or the capabilities of the KR system. In computer science, particularly artificial intelligence, a number of representations have been devised to structure information. KR is most commonly used to refer to representations intended for processing by modern computers, and in particular, for representations consisting of explicit objects (the class of all elephants, or Clyde, a certain individual), and of assertions or claims about them ('Clyde is an elephant', or 'all elephants are grey'). Representing knowledge in such explicit form enables computers to draw conclusions from knowledge already stored ('Clyde is grey').

Many KR methods were tried in the 1970s and early 1980s, such as heuristic question-answering, neural networks, theorem proving, and expert systems, with varying success. Medical diagnosis (e.g., Mycin) was a major application area, as were games such as chess. In the 1980s, formal computer knowledge representation languages and systems arose. Major projects attempted to encode wide bodies of general knowledge; for example, the "Cyc" project (still ongoing) went through a large encyclopedia, encoding not the information itself, but the information a reader would need in order to understand the encyclopedia: naive physics; notions of time, causality, motivation; commonplace objects and classes of objects.

There are several KR techniques, such as frames, rules, tagging, and semantic networks, which originated in Cognitive Science. Since knowledge is used to achieve intelligent behavior, the fundamental goal of knowledge representation is to facilitate reasoning, inferencing, or drawing conclusions. A good KR must capture both declarative and procedural knowledge. What knowledge representation is can best be understood in terms of five distinct roles it plays, each crucial to the task at hand.
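A minimal sketch of the Clyde example above: explicit objects and assertions are stored as facts, and one simple rule derives the new conclusion. The fact and rule encodings are illustrative assumptions.

```python
# Explicit facts about objects, as in the Clyde example.
facts = {("is-a", "Clyde", "elephant"), ("property", "elephant", "grey")}

def derive(facts):
    """Apply one rule: if X is-a C and C has property P, conclude X has property P."""
    new = set()
    for rel1, x, cls in facts:
        for rel2, cls2, prop in facts:
            if rel1 == "is-a" and rel2 == "property" and cls == cls2:
                new.add(("property", x, prop))
    return facts | new

print(("property", "Clyde", "grey") in derive(facts))  # True: Clyde is grey
```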

A knowledge representation (KR) is most fundamentally a surrogate, a substitute for the thing itself, used to enable an entity to determine consequences by thinking rather than acting, i.e., by reasoning about the world rather than taking action in it. It is a set of ontological commitments, i.e., an answer to the question: in what terms should I think about the world? It is a fragmentary theory of intelligent reasoning, expressed in terms of three components: (i) the representation's fundamental conception of intelligent reasoning; (ii) the set of inferences the representation sanctions; and (iii) the set of inferences it recommends. It is a medium for pragmatically efficient computation, i.e., the computational environment in which thinking is accomplished. One contribution to this pragmatic efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences. Finally, it is a medium of human expression, i.e., a language in which we say things about the world.

2.2 Characteristics

A good knowledge representation covers six basic characteristics:

1. Coverage, which means the KR covers a breadth and depth of information. Without wide coverage, the KR cannot determine anything or resolve ambiguities.
2. Understandable by humans. KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals). It should also have simple primitives that combine in complex forms.
3. Consistency. If John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.
4. Efficiency.
5. Ease of modifying and updating.
6. Support for the intelligent activity which uses the knowledge base.

2.3 Language and Notation
Some think it is best to represent knowledge in the same way that it is represented in the human mind, or to represent knowledge in the form of human language. Psycholinguistics investigates how the human mind stores and manipulates language. Other branches of cognitive science examine how human memory stores sounds, sights, smells, emotions, procedures, and abstract ideas. Science has not yet completely described the internal mechanisms of the brain to the point where they can simply be replicated by computer programmers. Various artificial languages and notations have been proposed for representing knowledge. They are typically based on logic and mathematics, and have easily parsed grammars to ease machine processing.

2.4 Storage and Manipulation

One problem in knowledge representation is how to store and manipulate knowledge in an information system in a formal way so that it may be used by mechanisms to accomplish a given task. Examples of applications are expert systems, machine translation systems, computer-aided maintenance systems and information retrieval systems (including database front-ends).

Semantic networks may be used to represent knowledge. Each node represents a concept and arcs are used to define relations between the concepts. The conceptual graph model is probably the oldest model still alive. One of the most expressive and comprehensively described knowledge representation paradigms along the lines of semantic networks is MultiNet (an acronym for Multilayered Extended Semantic Networks).

From the 1960s, the knowledge frame, or just frame, has been used. Each frame has its own name and a set of attributes, or slots, which contain values; for instance, the frame for house might contain a color slot, a number-of-floors slot, etc. Using frames for expert systems is an application of object-oriented programming, with inheritance of features described by the "is-a" link. However, there has been no small amount of inconsistency in the usage of the "is-a" link: Ronald J. Brachman wrote a paper titled "What IS-A is and isn't", wherein 29 different semantics were found in projects whose knowledge representation schemes involved an "is-a" link. Other links include the "part-of" link.

Frame structures are well-suited for the representation of schematic knowledge and stereotypical cognitive patterns. The elements of such schematic patterns are weighted unequally, attributing higher weights to the more typical elements of a schema. A pattern is activated by certain expectations: if a person sees a big bird, he or she will classify it as a sea eagle rather than a golden eagle, assuming that his or her "sea scheme" is currently activated and his "land scheme" is not. Frame representations are object-centered in the same sense as semantic networks are: all the facts and properties connected with a concept are located in one place, so there is no need for costly search processes in the database.

A behavioral script is a type of frame that describes what happens temporally; the usual example given is that of describing going to a restaurant. The steps include waiting to be seated, receiving a menu, ordering, etc. The different solutions can be arranged in a so-called semantic spectrum with respect to their semantic expressivity.
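As a concrete illustration of the frame idea described above, the sketch below models frames as objects with named slots and "is-a" inheritance. The building/house frames and their slot values are illustrative assumptions.

```python
# Frames as objects with named slots; "is-a" inheritance via a parent link.
class Frame:
    def __init__(self, name, is_a=None, **slots):
        self.name = name
        self.is_a = is_a          # parent frame, if any ("is-a" link)
        self.slots = dict(slots)  # attribute -> value

    def get(self, slot):
        """Look up a slot locally, then inherit along the is-a chain."""
        if slot in self.slots:
            return self.slots[slot]
        if self.is_a is not None:
            return self.is_a.get(slot)
        raise KeyError(slot)

building = Frame("building", has_roof=True)
house = Frame("house", is_a=building, color="white", floors=2)

print(house.get("color"))     # "white" (local slot)
print(house.get("has_roof"))  # True (inherited via is-a from building)
```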
2.5 DIFFERENCES BETWEEN ARTIFICIAL INTELLIGENCE AND HUMAN INTELLIGENCE

1. Brains are analogue; computers are digital

It's easy to think that neurons are essentially binary, given that they fire an action potential if they reach a certain threshold, and otherwise do not fire. This superficial similarity to digital "1's and 0's" belies a wide variety of continuous and non-linear processes that directly influence neuronal processing. For example, one of the primary mechanisms of information transmission appears to be the rate at which neurons fire, an essentially continuous variable. Similarly, networks of neurons can fire in relative synchrony or in relative disarray; this coherence affects the strength of the signals received by downstream neurons. Finally, inside each and every neuron is a leaky integrator circuit, composed of a variety of ion channels and continuously fluctuating membrane potentials.

Failure to recognize these important subtleties may have contributed to Minsky & Papert's infamous mischaracterization of perceptrons, a neural network without an intermediate layer between input and output. In linear networks, any function computed by a 3-layer network can also be computed by a suitably rearranged 2-layer network. In other words, combinations of multiple linear functions can be modeled precisely by just a single linear function. Since their simple 2-layer networks could not solve many important problems, Minsky & Papert reasoned that larger networks also could not. In contrast, the computations performed by more realistic (i.e., nonlinear) networks are highly dependent on the number of layers; thus, "perceptrons" grossly underestimate the computational power of neural networks.

2. The brain uses content-addressable memory

In computers, information in memory is accessed by polling its precise memory address. This is known as byte-addressable memory. In contrast, the brain uses content-addressable memory, such that information can be accessed in memory through "spreading activation" from closely related concepts. For example, thinking of the word "fox" may automatically spread activation to memories related to other clever animals, fox-hunting horseback riders, or attractive members of the opposite sex. The end result is that your brain has a kind of "built-in Google," in which just a few cues (key words) are enough to cause a full memory to be retrieved. Of course, similar things can be done in computers, mostly by building massive indices of stored data, which then also need to be stored and searched through for the relevant information (incidentally, this is pretty much what Google does, with a few twists); a toy code sketch of such cue-based retrieval appears below, after Difference #3.

Although this may seem like a rather minor difference between computers and brains, it has profound effects on neural computation. For example, a lasting debate in cognitive psychology concerned whether information is lost from memory because of simple decay or because of interference from other information. In retrospect, this debate is partially based on the false assumption that these two possibilities are dissociable, as they can be in computers. Many are now realizing that this debate represents a false dichotomy.

3. The brain is a massively parallel machine; computers are modular and serial

An unfortunate legacy of the brain-computer metaphor is the tendency for cognitive psychologists to seek out modularity in the brain. For example, the idea that computers require memory has led some to search for the "memory area," when in fact these distinctions are far more messy. One consequence of this over-simplification is that we are only now learning that "memory" regions (such as the hippocampus) are also important for imagination, the representation of novel goals, spatial navigation, and other diverse functions.
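Returning to Difference #2: the contrast between address-based and content-addressable lookup can be sketched in a few lines. Instead of fetching a record by its index, we retrieve whichever stored memory shares the most cue words with the query. The memory strings below are illustrative assumptions.

```python
# Content-addressable retrieval: fetch the stored memory that best matches a set
# of cues, rather than looking it up by an explicit address. Example memories assumed.
memories = [
    "the fox is a clever animal",
    "riders in red coats hunting a fox on horseback",
    "a golden eagle soaring over the coast",
]

def recall(cues):
    """Return the memory sharing the most words with the cue set."""
    return max(memories, key=lambda m: len(cues & set(m.split())))

print(recall({"fox", "horseback"}))  # -> the fox-hunting memory
```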

Similarly, one could imagine there being a "language module" in the brain, as there might be in computers with natural language processing programs. Cognitive psychologists even claimed to have found this module, based on patients with damage to a region of the brain known as Broca's area. More recent evidence has shown that language too is computed by widely distributed and domain-general neural circuits, and Broca's area may also be involved in other computations.

4. Processing speed is not fixed in the brain; there is no system clock

The speed of neural information processing is subject to a variety of constraints, including the time for electrochemical signals to traverse axons and dendrites, axonal myelination, the diffusion time of neurotransmitters across the synaptic cleft, differences in synaptic efficacy, the coherence of neural firing, the current availability of neurotransmitters, and the prior history of neuronal firing. Although there are individual differences in something psychometricians call "processing speed," this does not reflect a monolithic or unitary construct, and certainly nothing as concrete as the speed of a microprocessor. Instead, psychometric "processing speed" probably indexes a heterogeneous combination of all the speed constraints mentioned above. Similarly, there does not appear to be any central clock in the brain, and there is debate as to how clock-like the brain's time-keeping devices actually are. To use just one example, the cerebellum is often thought to calculate information involving precise timing, as required for delicate motor movements; however, recent evidence suggests that time-keeping in the brain bears more similarity to ripples on a pond than to a standard digital clock.

5. Short-term memory is not like RAM

Although the apparent similarities between RAM and short-term or "working" memory emboldened many early cognitive psychologists, a closer examination reveals strikingly important differences. Although RAM and short-term memory both seem to require power (sustained neuronal firing in the case of short-term memory, and electricity in the case of RAM), short-term memory seems to hold only "pointers" to long-term memory, whereas RAM holds data that is isomorphic to that being held on the hard disk. Unlike RAM, the capacity limit of short-term memory is not fixed; the capacity of short-term memory seems to fluctuate with differences in "processing speed" (see Difference #4) as well as with expertise and familiarity.

6. No hardware/software distinction can be made with respect to the brain or mind

For years it was tempting to imagine that the brain was the hardware on which a "mind program" or "mind software" is executing. This gave rise to a variety of abstract program-like models of cognition, in which the details of how the brain actually executed those programs were considered irrelevant, in the same way that a Java program can accomplish the same function as a C++ program. Unfortunately, this appealing hardware/software distinction obscures an important fact: the mind emerges directly from the brain, and changes in the mind are always accompanied by changes in the brain. Any abstract information processing account of cognition will always need to specify how neuronal architecture can implement those processes; otherwise, cognitive modeling is grossly underconstrained. Some blame this misunderstanding for the infamous failure of "symbolic AI."

7. Synapses are far more complex than electrical logic gates

Another pernicious feature of the brain-computer metaphor is that it seems to suggest that brains might also operate on the basis of electrical signals (action potentials) traveling along individual logic gates. Unfortunately, this is only half true. The signals which are propagated along axons are actually electrochemical in nature, meaning that they travel much more slowly than electrical signals in a computer, and that they can be modulated in myriad ways. For example, signal transmission is dependent not only on the putative "logical gates" of synaptic architecture but also on the presence of a variety of chemicals in the synaptic cleft, the relative distance between synapse and dendrites, and many other factors. This adds to the complexity of the processing taking place at each synapse, and it is therefore profoundly wrong to think that neurons function merely as transistors.

8. Unlike computers, processing and memory are performed by the same components in the brain

Computers process information from memory using CPUs, and then write the results of that processing back to memory. No such distinction exists in the brain. As neurons process information they are also modifying their synapses, which are themselves the substrate of memory. As a result, retrieval from memory always slightly alters those memories (usually making them stronger, but sometimes making them less accurate).

9. The brain is a self-organizing system

This point follows naturally from the previous point: experience profoundly and directly shapes the nature of neural information processing in a way that simply does not happen in traditional microprocessors. For example, the brain is a self-repairing circuit; something known as "trauma-induced plasticity" kicks in after injury. This can lead to a variety of interesting changes, including some that seem to unlock unused potential in the brain (known as acquired savantism), and others that can result in profound cognitive dysfunction (as is unfortunately far more typical in traumatic brain injury and developmental disorders).

One consequence of failing to recognize this difference has been in the field of neuropsychology, where the cognitive performance of brain-damaged patients is examined to determine the computational function of the damaged region. Unfortunately, because of the poorly understood nature of trauma-induced plasticity, the logic cannot be so straightforward. Similar problems underlie work on developmental disorders and the emerging field of "cognitive genetics", in which the consequences of neural self-organization are frequently neglected.

10. Brains have bodies

This is not as trivial as it might seem: it turns out that the brain takes surprising advantage of the fact that it has a body at its disposal. For example, despite your intuitive feeling that you could close your eyes and know the locations of objects around you, a series of experiments in the field of change blindness has shown that our visual memories are actually quite sparse. In this case, the brain is "offloading" its memory requirements to the environment in which it exists: why bother remembering the location of objects when a quick glance will suffice? A surprising set of experiments by Jeremy Wolfe has shown that even after being asked hundreds of times which simple geometrical shapes are displayed on a computer screen, human subjects continue to answer those questions by gaze rather than rote memory. A wide variety of evidence from other domains suggests that we are only beginning to understand the importance of embodiment in information processing.

3.0 NATURAL LANGUAGE PROCESSING

Definition: Natural Language Processing is a theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis, for the purpose of achieving human-like language processing for a range of tasks or applications.

3.1 Natural Language Processing (NLP)

NLP encompasses anything a computer needs to understand natural language (typed or spoken) and also to generate natural language.

1. Natural Language Understanding (NLU): the NLU task is understanding and reasoning when the input is natural language. Here we ignore the issues of natural language generation.

2. Natural Language Generation (NLG): NLG is a subfield of natural language processing (NLP). NLG is also referred to as text generation.

3.2 NATURAL LANGUAGE

A natural language (or ordinary language) is a language that is spoken or written by humans for general-purpose communication. Examples: Hindi, English, French, Chinese, etc. A language is a system: a set of symbols and a set of rules (or grammar). The symbols are combined to convey new information, and the rules govern the manipulation of the symbols.

3.3 FORMAL LANGUAGE

Before defining formal language, we need to define symbols, alphabets, strings and words.

A symbol is a character, an abstract entity that has no meaning by itself, e.g., letters, digits and special characters.

An alphabet is a finite set of symbols; an alphabet is often denoted by Σ (sigma). For example, B = {0, 1} says B is an alphabet of two symbols, 0 and 1.

A string or word is a finite sequence of symbols from an alphabet, e.g., 01110 and 111 are strings from the alphabet B above.

A language is a set of strings from an alphabet. A formal language (or simply language) is a set L of strings over some finite alphabet Σ. Formal languages are described using formal grammars.
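A quick sketch of these definitions: the alphabet B = {0, 1} from the example above, a membership test for strings over B, and a toy formal language. The even-length language is an illustrative assumption, not from the text.

```python
# Alphabet B = {0, 1}; strings are finite sequences of symbols drawn from it.
B = {"0", "1"}

def is_string_over(s, alphabet):
    """True if every symbol of s belongs to the alphabet."""
    return all(symbol in alphabet for symbol in s)

# A formal language is a set of strings over the alphabet. As an assumed
# example: L = all strings over B of even length.
def in_language(s):
    return is_string_over(s, B) and len(s) % 2 == 0

print(is_string_over("01110", B))  # True: a string over B
print(in_language("0111"))         # True: over B and even length
print(in_language("111"))          # False: odd length
```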

3.4 LINGUISTICS AND LANGUAGE PROCESSING

Linguistics is the science of language. Its study includes: sounds (phonology), word formation (morphology), sentence structure (syntax), meaning (semantics), and understanding (pragmatics), etc. The levels of linguistic analysis are shown below; the early levels, closest to the acoustic signal, correspond to Speech Recognition (SR), while the later levels correspond to Natural Language Processing (NLP).

Levels of Linguistic Analysis:

- Acoustic signal -> Phonetics (production and perception of speech) -> Phones
- Phones -> Phonology (sound patterns of language) -> Letter strings
- Letter strings -> Morphology (word formation and structure) -> Morphemes
- Morphemes -> Lexicon (dictionary of words in a language) -> Words
- Words -> Syntax (sentence structure) -> Phrases & sentences
- Phrases & sentences -> Semantics (intended meaning) -> Meaning out of context
- Meaning out of context -> Pragmatics (understanding from external information) -> Meaning in context

Phones: phones are acoustic patterns that are significant and distinguishable in some human language. Example: in English, the L-sounds at the beginning and end of the word "loyal" are termed "light L" and "dark L" by linguists.

Phonetics: tells how acoustic signals are classified into phones.

Phonology: tells how phones are grouped together to form phonemes in particular human languages.

Strings: an alphabet is a finite set of symbols, for example the English alphabet {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}. A string is a sequence of symbols taken from an alphabet.

Lexicon: a lexicon is a collection of information about the words of a language, in particular the lexical categories to which words belong. Example: "pig" is usually a noun (N), but also occurs as a verb (V) and an adjective (ADJ). The lexicon is structured as a collection of lexical entries, e.g., ("pig": N, V, ADJ).

Words: a word is a unit of language that carries meaning. Words like bear, car, and house are very different from words like run, sleep, and think, and different again from words like in, under, and about. These and other categories of words have names: nouns, verbs, prepositions, and so on. Words build phrases, which in turn build sentences.

Determiners: determiners occur before nouns and indicate the kind of reference which the noun has, e.g., the boy, a bus, our car, these children, both hospitals, where the, a, our, these, and both are the determiners.

Morphology: morphology is the analysis of words into morphemes, and conversely the synthesis of words from morphemes.

Morphemes: a morpheme is the smallest meaningful unit in the grammar of a language, the smallest linguistic unit that has semantic meaning. Example: the word "unbreakable" has 3 morphemes: "un-", a bound morpheme; "-break-", a free morpheme; and "-able", a bound morpheme. "un-" is also a prefix and "-able" is a suffix; both are affixes. Morphemes are of many types. (A toy affix-stripping sketch appears at the end of this subsection.)

Syntax: syntax is the structure of language. It is the grammatical arrangement of words in a sentence to show their relationship to one another. Syntax is a finite set of rules that specifies a language; syntax rules govern proper sentence structure. Syntax is represented by a parse tree, a way to show the structure of a language fragment, or by a list.

Semantics: semantics is the meaning of words, phrases, sentences, and whole texts. Normally semantics is restricted to "meaning out of context," that is, meaning as it can be determined without taking context into account.

Pragmatics: pragmatics tells how language is used, that is, meaning in context. Example: if someone says "the door is open," then it is necessary to know which door "the door" refers to, and what the intention of the speaker is: it could be a pure statement of fact, an explanation of how the cat got in, or a request to the person addressed to close the door.
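To illustrate the morphology level just described, here is a toy affix-stripping routine that splits a word such as "unbreakable" into its prefix, stem, and suffix. The affix lists are small illustrative assumptions; real morphological analyzers use far richer rules and lexicons.

```python
# Toy morphological analysis by affix stripping. The affix inventories are
# illustrative assumptions.
PREFIXES = ["un", "re", "dis"]
SUFFIXES = ["able", "ness", "ing"]

def morphemes(word):
    """Split a word into (bound) prefix, (free) stem, and (bound) suffix."""
    parts = []
    for p in PREFIXES:
        if word.startswith(p):
            parts.append(p + "-")      # bound morpheme (prefix)
            word = word[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if word.endswith(s):
            suffix = "-" + s           # bound morpheme (suffix)
            word = word[:-len(s)]
            break
    parts.append(word)                 # the free morpheme (stem)
    if suffix:
        parts.append(suffix)
    return parts

print(morphemes("unbreakable"))  # ['un-', 'break', '-able']
```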

3.5 STEPS OF NATURAL LANGUAGE PROCESSING (NLP)

Natural Language Processing is done at 5 levels:

1. Morphological and Lexical Analysis: the lexicon of a language is its vocabulary, which includes its words and expressions. Morphology is the identification, analysis and description of the structure of words. Words are generally accepted as being the smallest units of syntax. Syntax refers to the rules and principles that govern the sentence structure of any individual language.

Lexical analysis aims to divide the text into paragraphs, sentences and words. Lexical analysis cannot be performed in isolation from morphological and syntactic analysis.

2. Syntactic Analysis: here the words in a sentence are analyzed to determine the grammatical structure of the sentence. The words are transformed into structures that show how the words relate to each other. Some word sequences may be rejected if they violate the rules of the language for how words may be combined. Example: an English syntactic analyzer would reject the sentence "Boy the go the to store".

3. Semantic Analysis: it derives an absolute (dictionary-definition) meaning from context; it determines the possible meanings of a sentence in a context. The structures created by the syntactic analyzer are assigned meaning: a mapping is made between the syntactic structures and objects in the task domain. Structures for which no such mapping is possible are rejected. Example: the sentence "Colorless green ideas . . ." would be rejected as semantically anomalous because colorless and green make no sense together.

4. Discourse Integration: the meaning of an individual sentence may depend on the sentences that precede it and may influence the meaning of the sentences that follow it. Example: the word "it" in the sentence "you wanted it" depends on the prior discourse context.

5. Pragmatic Analysis: it derives knowledge from external commonsense information; it means understanding the purposeful use of language in situations, particularly those aspects of language which require world knowledge. The idea is that what was said is reinterpreted to determine what was actually meant. Example: the sentence "Do you know what time it is?" should be interpreted as a request.

3.6 SYNTACTIC PROCESSING

Syntactic processing converts a flat input sentence into a hierarchical structure that corresponds to the units of meaning in the sentence. Syntactic processing has two main components: one is called the grammar, and the other is called the parser.

Grammar: a declarative representation of syntactic facts about the language. It is the specification of the legal structures of a language. It has three basic components: terminal symbols, non-terminal symbols, and rules (productions).

Parser: a procedure that compares the grammar against input sentences to produce a parsed structure called a parse tree.
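A minimal sketch of these two components: a grammar with terminal symbols, non-terminal symbols, and productions, and a recursive-descent parser that builds a parse tree. The tiny grammar and lexicon below are illustrative assumptions.

```python
# A toy grammar (non-terminals, terminals, productions) and a recursive-descent
# parser that returns a parse tree. The grammar itself is an assumed example.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"Det": {"the", "a"}, "N": {"boy", "store"}, "V": {"sees"}}

def parse(symbol, words, i):
    """Try to parse `symbol` at position i; return (tree, next_i) or None."""
    if symbol in LEXICON:                       # terminal category
        if i < len(words) and words[i] in LEXICON[symbol]:
            return (symbol, words[i]), i + 1
        return None
    for production in GRAMMAR[symbol]:          # non-terminal: try each rule
        children, j = [], i
        for child_symbol in production:
            result = parse(child_symbol, words, j)
            if result is None:
                break
            tree, j = result
            children.append(tree)
        else:                                   # every child parsed successfully
            return (symbol, children), j
    return None

tree, consumed = parse("S", "the boy sees the store".split(), 0)
print(tree)  # nested parse tree for the whole sentence
```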

4.0 KNOWLEDGE REPRESENTATION

Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate inferencing from those knowledge elements, creating new elements of knowledge. The KR can be made independent of the underlying knowledge model or knowledge base system (KBS), such as a semantic network. How a symbol vocabulary and a system of logic combine to license inferences, and the role of interpretation theory, are discussed in Section 2.1.

As noted in Section 2.1, more expressive KRs state knowledge more compactly but demand more complex inference and are less likely to be complete and consistent. Autoepistemic temporal modal logic is a highly expressive KR system, encompassing meaningful chunks of knowledge with brief, simple symbol sequences (sentences). Propositional logic is much less expressive but highly consistent and complete, and can efficiently produce inferences with minimal algorithmic complexity. Nonetheless, only the limitations of an underlying knowledge base affect the ease with which inferences may ultimately be made (once the appropriate KR has been found). This is because a knowledge set may be exported from a knowledge model or knowledge base system (KBS) into different KRs, with different degrees of expressiveness, completeness, and consistency. If a particular KR is inadequate in some way, the set of problematic KR elements may be transformed by importing it into a KBS, modified and operated on to eliminate the problematic elements or augmented with additional knowledge imported from other sources, and then exported into a different, more appropriate KR.

Recent developments in KR include the concept of the Semantic Web and the development of XML-based knowledge representation languages and standards, including the Resource Description Framework (RDF), RDF Schema,

Topic Maps, DARPA Agent Markup Language (DAML), Ontology Inference Layer (OIL), and Web Ontology Language (OWL).

As discussed in Section 2.1, several KR techniques (frames, rules, tagging, and semantic networks) originated in Cognitive Science; the fundamental goal of knowledge representation is to facilitate reasoning, inference, or drawing conclusions, and a good KR must capture both declarative and procedural knowledge. The five roles a KR plays (a surrogate for the world, a set of ontological commitments, a fragmentary theory of intelligent reasoning, a medium for pragmatically efficient computation, and a medium of human expression) are also described in Section 2.1.

Some issues that arise in knowledge representation from an AI perspective are:

- How do people represent knowledge?
- What is the nature of knowledge?
- Should a representation scheme deal with a particular domain, or should it be general purpose?
- How expressive is a representation scheme or formal language?
- Should the scheme be declarative or procedural?

There has been very little top-down discussion of knowledge representation (KR) issues, and research in this area is a well-aged quiltwork. There are well-known problems such as "spreading activation" (a problem in navigating a network of nodes), "subsumption" (concerned with selective inheritance; e.g., an ATV can be thought of as a specialization of a car, but it inherits only particular characteristics) and "classification" (for example, a tomato could be classified both as a fruit and as a vegetable).

In the field of artificial intelligence, problem solving can be simplified by an appropriate choice of knowledge representation. Representing knowledge in some ways makes certain problems easier to solve. For example, it is easier to divide numbers represented in Hindu-Arabic numerals than numbers represented as Roman numerals.

4.1 CHARACTERISTICS

A good knowledge representation covers six basic characteristics:

COVERAGE: the KR covers a breadth and depth of information. Without wide coverage, the KR cannot determine anything or resolve ambiguities.

UNDERSTANDABLE BY HUMANS: KR is viewed as a natural language, so the logic should flow freely. It should support modularity and hierarchies of classes (polar bears are bears, which are animals). It should also have simple primitives that combine in complex forms.

CONSISTENCY: if John closed the door, it can also be interpreted as the door was closed by John. By being consistent, the KR can eliminate redundant or conflicting knowledge.

EFFICIENCY.

EASE OF MODIFICATION AND UPDATING.

SUPPORT FOR THE INTELLIGENT ACTIVITY that uses the knowledge base.

To gain a better understanding of why these characteristics represent a good knowledge representation, think about how an encyclopedia (e.g., Wikipedia) is structured. There are millions of articles (coverage), and they are sorted into categories, content types, and similar topics (understandable). It redirects different titles for the same content to the same article (consistency). It is efficient, easy to add new pages to or update existing ones in, and allows users on their mobile phones and desktops to view its knowledge base.

A system that employs human knowledge captured in a computer to solve problems that ordinarily require human expertise is called an expert system. The knowledge base of an expert system contains both declarative and procedural knowledge. The procedural knowledge is represented in the form of heuristic if-then production rules and is completely integrated with the declarative knowledge. An expert system is a computer application that performs a task that would otherwise be performed by a human expert. For example, there are expert systems that can diagnose human illnesses, make financial forecasts, and schedule routes for delivery vehicles. Some expert systems are designed to take the place of human experts, while others are designed to aid them. Expert systems are part of a general category of computer applications known as artificial intelligence.

Expert systems are used to perform a variety of extremely complicated tasks that in the past could be performed by only a trained human expert. Expert systems capture basic knowledge that allows the system to act as an expert when dealing with complicated problems through the application of artificial intelligence techniques. The most powerful characteristic of expert systems is their capability to deal with challenging real-world problems through processes that reflect human judgment and intuition. The inference engine component of an expert system controls how and when the information in the knowledge base is applied. The user interface component enables the user to communicate with the expert system. Several expert system packages have been developed in the past; a few popular ones are MYCIN (used in the medical field), Steamer (teaches naval officers through simulation) and Dendral (estimates the molecular structure of unknown compounds).

4.2 Expert systems shell

Development of an expert system traditionally involves three participants: the domain expert, the knowledge engineer and the programmer. The domain expert is the most important element in the system development; he or she has valuable knowledge in a particular field (the problem or knowledge domain). The goal is to embody that knowledge in an expert system. The knowledge engineer extracts relevant knowledge from the domain expert and enters it in a form specific to the system. Knowledge engineering is difficult because domain experts often do not know how they solve problems, or are unable to verbalize their insight. The knowledge acquired from the domain expert, once represented, is called the knowledge base. The expert system programmer decides how to process the knowledge base. Most expert systems are written using Prolog or Lisp, each of which has symbolic processing, sophisticated data definition capabilities, interpreted code, and a uniform method of representing data and instructions.

This team approach has been replaced by a new tool for building expert systems called a shell. These tools are designed for use by a wide range of people, including domain experts who may not

have sufficient knowledge about programming. They aid in acquiring knowledge, representing it in the knowledge base, and verifying that the system uses it properly. Shells reduce the time and cost of development by providing structures for holding and representing knowledge (frames and rules) and a program to process it and draw conclusions (forward and backward chaining).
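The following sketch shows the core of such a shell: if-then production rules over a working set of facts, applied by a forward-chaining inference engine until nothing new can be concluded. The rules and facts are illustrative assumptions, not taken from any named system.

```python
# A minimal forward-chaining rule engine: the heart of an expert-system shell.
# Rules and facts below are illustrative assumptions.
RULES = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"suspect_measles"}, "recommend_doctor_visit"),
]

def forward_chain(facts):
    """Fire every rule whose conditions hold, until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # the if-then rule fires
                changed = True
    return facts

print(forward_chain({"has_fever", "has_rash"}))
# {'has_fever', 'has_rash', 'suspect_measles', 'recommend_doctor_visit'}
```

Backward chaining would instead start from a goal fact and work back through the rules to the evidence needed to establish it.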

5.0 PROBLEM SOLVING

In the simplest case of an agent reasoning about what it should do, the agent has a state-based model of the world, with no uncertainty and with goals to achieve. The agent can determine how to achieve its goals by searching in its representation of the world state space for a way to get from its current state to a goal state. It can find a sequence of actions that will achieve its goal before it has to act in the world.

This notion of search is computation inside the agent. It is different from searching in the world, where the agent may have to act, for example, an agent searching for its keys, lifting up cushions, and so on. It is also different from searching the web, which involves searching for information. Searching in this case means searching in an internal representation for a path to a goal. The idea of search is straightforward: the agent constructs a set of potential partial solutions to a problem that can be checked to see if they truly are solutions or if they could lead to solutions. Search proceeds by repeatedly selecting a partial solution, stopping if it is a path to a goal, and otherwise extending it by one more arc in all possible ways.

Search underlies much of artificial intelligence. When an agent is given a problem, it is usually given only a description that lets it recognize a solution, not an algorithm to solve it. It has to search for a solution. The existence of public-key encryption codes, where the search space is clear and the test for a solution is given, yet humans have no hope of solving the problem and computers cannot solve it in a realistic time frame, demonstrates the difficulty of search. The difficulty of search, and the fact that humans are able to solve some search problems efficiently, suggests that computer agents should exploit knowledge about special cases to guide them to a solution. This extra knowledge beyond the search space is heuristic knowledge. One kind of heuristic knowledge is an estimate of the cost from a node to a goal.

5.1 STATE SPACES

One general formulation of intelligent action is in terms of state space. A state contains all of the information necessary to predict the effects of an action and to determine if it is a goal state. State-space searching assumes that:

- the agent has perfect knowledge of the state space and can observe what state it is in (i.e., there is full observability);
- the agent has a set of actions that have known deterministic effects;
- some states are goal states, the agent wants to reach one of these goal states, and the agent can recognize a goal state; and
- a solution is a sequence of actions that will get the agent from its current state to a goal state.

Example: in a tutoring system, a state may consist of the set of topics that the student knows. An action may be teaching a particular lesson, and the result of a teaching action may be that the

student knows the topic of the lesson, as long as the student knows the topics that are prerequisites for the lesson being taught. The aim is for the student to know some particular set of topics. If the effect of teaching also depends on the aptitude of the student, this detail must be part of the state space, too. We do not have to model what the student is carrying if that does not affect the result of actions or whether the goal is achieved.

A state-space problem consists of:

- a set of states;
- a distinguished set of states called the start states;
- a set of actions available to the agent in each state;
- an action function that, given a state and an action, returns a new state;
- a set of goal states, often specified as a Boolean function, goal(s), that is true when s is a goal state; and
- a criterion that specifies the quality of an acceptable solution.

For example, any sequence of actions that gets the agent to the goal state may be acceptable, or there may be costs associated with actions and the agent may be required to find a sequence that has minimal total cost. This is called an optimal solution. Alternatively, the agent may be satisfied with any solution that is within 10% of optimal.

This framework is extended in subsequent chapters to include cases where an agent can exploit the internal features of the states, where the state is not fully observable (e.g., the robot does not know where the parcels are, or the teacher does not know the aptitude of the student), where the actions are stochastic (e.g., the robot may overshoot, or the student perhaps does not learn a topic that is taught), and where complex preferences exist in terms of rewards and punishments, not just goal states.
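The components just listed map directly onto a small interface. The sketch below encodes a state-space problem in that form, with a toy two-topic tutoring domain as the (assumed) example.

```python
# A state-space problem: start state, available actions, an action function,
# and a goal test. The tiny tutoring domain is an illustrative assumption.
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass
class StateSpaceProblem:
    start: FrozenSet[str]                                     # the start state
    actions: Callable[[FrozenSet[str]], List[str]]            # actions in a state
    result: Callable[[FrozenSet[str], str], FrozenSet[str]]   # action function
    is_goal: Callable[[FrozenSet[str]], bool]                 # goal(s)

# Toy domain: a state is the set of topics the student knows.
PREREQS = {"algebra": set(), "calculus": {"algebra"}}

problem = StateSpaceProblem(
    start=frozenset(),
    actions=lambda s: [t for t in PREREQS if PREREQS[t] <= s and t not in s],
    result=lambda s, t: s | {t},
    is_goal=lambda s: "calculus" in s,
)

print(problem.actions(problem.start))  # ['algebra']: calculus needs a prerequisite
```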

5.2 GRAPH SEARCHING

To solve a problem, first define the underlying search space and then apply a search algorithm to that search space. Many problem-solving tasks can be transformed into the problem of finding a path in a graph. Searching in graphs provides an appropriate level of abstraction within which to study simple problem solving independent of a particular domain. A (directed) graph consists of a set of nodes and a set of directed arcs between nodes. The idea is to find a path along these arcs from a start node to a goal node. The abstraction is necessary because there may be more than one way to represent a problem as a graph.

Formalizing Graph Searching

A directed graph consists of a set N of nodes and a set A of ordered pairs of nodes called arcs. In this definition, a node can be anything; all the definition does is constrain arcs to be ordered pairs of nodes. There can be infinitely many nodes and arcs. We do not assume that the graph is represented explicitly; we require only a procedure that can generate nodes and arcs as needed. The arc (n1, n2) is an outgoing arc from n1 and an incoming arc to n2. A node n2 is a neighbor of n1 if there is an arc from n1 to n2, that is, if (n1, n2) ∈ A.

A path from node s to node g is a sequence of nodes (n0, n1, . . . , nk) such that s = n0, g = nk, and (n_{i-1}, n_i) ∈ A; that is, there is an arc from n_{i-1} to n_i for each i. Sometimes it is useful to view a path as the sequence of arcs (n0, n1), (n1, n2), . . . , (n_{k-1}, nk), or as a sequence of labels of these arcs.

A cycle is a non-empty path such that the end node is the same as the start node; that is, a cycle is a path (n0, n1, . . . , nk) such that n0 = nk and k ≠ 0. A directed graph without any cycles is called a directed acyclic graph (DAG). This should probably be called an acyclic directed graph, because it is a directed graph that happens to be acyclic, not an acyclic graph that happens to be directed, but DAG sounds better than ADG!

A tree is a DAG where there is one node with no incoming arcs and every other node has exactly one incoming arc. The node with no incoming arcs is called the root of the tree, and nodes with no outgoing arcs are called leaves.

An optimal solution is one of the least-cost solutions; that is, it is a path p from a start node to a goal node such that there is no path p′ from a start node to a goal node where cost(p′) < cost(p).

The forward branching factor of a node is the number of arcs leaving the node. The backward branching factor of a node is the number of arcs entering the node. These factors provide measures of the complexity of graphs. When we discuss the time and space complexity of the search algorithms, we assume that the branching factors are bounded from above by a constant.

5.3 A GENERIC SEARCHING ALGORITHM

This section describes a generic algorithm to search for a solution path in a graph. The algorithm is independent of any particular search strategy and any particular graph. The intuitive idea behind the generic search algorithm, given a graph, a set of start nodes, and a set of goal nodes, is to incrementally explore paths from the start nodes. This is done by maintaining a frontier (or fringe) of paths from the start node that have been explored. The frontier contains all of the paths that could form initial segments of paths from a start node to a goal node.

If the procedure returns ⊥, no solutions exist (or there are no remaining solutions if the proof has been retried). The algorithm only tests whether a path ends in a goal node after the path has been selected from the frontier, not when it is added to the frontier. There are two main reasons for this. First, sometimes a very costly arc exists from a node on the frontier to a goal node. The search should not always return the path with this arc, because a lower-cost solution may exist. This is crucial when the least-cost path is required. Second, it may be expensive to determine whether a node is a goal node.
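Here is a compact sketch of that generic algorithm, assuming the graph is given as a neighbor function: paths are repeatedly selected from the frontier, tested against the goal, and extended by one arc in all possible ways. How `select` removes a path from the frontier is deliberately left abstract, since that choice is exactly what defines a search strategy (see Section 5.4).

```python
# Generic graph search: keep a frontier of paths; select, test, extend.
# `select` abstracts the search strategy; here it is a parameter.

def generic_search(start_nodes, neighbors, is_goal, select):
    """neighbors(n) -> iterable of nodes; select(frontier) -> (path, rest)."""
    frontier = [(s,) for s in start_nodes]   # each path is a tuple of nodes
    while frontier:
        path, frontier = select(frontier)
        node = path[-1]
        if is_goal(node):                    # goal test on selection, not insertion
            return path
        for n in neighbors(node):            # extend by one arc in all possible ways
            frontier.append(path + (n,))
    return None                              # frontier empty: no solution exists

# Example on a small assumed graph, with a stack-like (depth-first) select:
GRAPH = {"s": ["a", "b"], "a": ["g"], "b": [], "g": []}
pop_last = lambda f: (f[-1], f[:-1])
print(generic_search(["s"], lambda n: GRAPH[n], lambda n: n == "g", pop_last))
# ('s', 'a', 'g')
```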

5.4 UNINFORMED SEARCH STRATEGIES

A problem determines the graph and the goal but not which path to select from the frontier. This is the job of a search strategy. A search strategy specifies which paths are selected from the frontier. Different strategies are obtained by modifying how the selection of paths in the frontier is implemented.

Depth-First Search

The first strategy is depth-first search. In depth-first search, the frontier acts like a last-in first-out stack. The elements are added to the stack one at a time. The one selected and taken off the frontier at any time is the last element that was added. Implementing the frontier as a stack results in paths being pursued in a depth-first manner: searching one path to its completion before trying an alternative path. This method is said to involve backtracking: the algorithm selects a first alternative at each node, and it backtracks to the next alternative when it has pursued all of the paths from the first selection. Some paths may be infinite when the graph has cycles or infinitely many nodes, in which case a depth-first search may never stop.

Depth-first search is appropriate when either: space is restricted; many solutions exist, perhaps with long path lengths, particularly for the case where nearly all paths lead to a solution; or the order in which the neighbors of a node are added to the stack can be tuned so that solutions are found on the first try.

Breadth-First Search

In breadth-first search the frontier is implemented as a FIFO (first-in, first-out) queue. Thus, the path that is selected from the frontier is the one that was added earliest. This approach implies that the paths from the start node are generated in order of the number of arcs in the path. One of the paths with the fewest arcs is selected at each stage.

Breadth-first search is useful when: space is not a problem; you want to find the solution containing the fewest arcs; few solutions may exist, and at least one has a short path length; and infinite paths may exist, because it explores all of the search space, even with infinite paths. It is a poor method when all solutions have a long path length or there is some heuristic knowledge available. It is not used very often because of its space complexity.

Lowest-Cost-First Search

When a non-unit cost is associated with arcs, we often want to find the solution that minimizes the total cost of the path. For example, for a delivery robot, costs may be distances and we may want a solution that gives the minimum total distance. Costs for a delivery robot may be resources required by the robot to carry out the action represented by the arc. In each of these cases, the searcher should try to minimize the total cost of the path found to reach the goal. The simplest search method that is guaranteed to find a minimum-cost path is similar to breadth-first

search; however, instead of expanding a path with the fewest number of arcs, it selects a path with the minimum cost. This is implemented by treating the frontier as a priority queue ordered by the cost of the paths.
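Plugged into the generic search sketch above, the three strategies differ only in the `select` function: take the newest path (stack) for depth-first, the oldest (queue) for breadth-first, or the cheapest for lowest-cost-first. A sketch, assuming an arc-cost function is supplied for the last case:

```python
# Strategy = how a path is selected from the frontier (see generic_search above).

def select_dfs(frontier):
    return frontier[-1], frontier[:-1]        # LIFO stack: newest path first

def select_bfs(frontier):
    return frontier[0], frontier[1:]          # FIFO queue: oldest path first

def make_select_lcfs(arc_cost):
    """Lowest-cost-first: pick the frontier path with minimum total cost."""
    def path_cost(path):
        return sum(arc_cost(a, b) for a, b in zip(path, path[1:]))
    def select(frontier):
        best = min(frontier, key=path_cost)   # a real priority queue (heapq)
        rest = [p for p in frontier if p is not best]  # would avoid this scan
        return best, rest
    return select
```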

5.5 HEURISTIC SEARCH

One form of heuristic information about which nodes seem the most promising is a heuristic function h(n), which takes a node n and returns a non-negative real number that is an estimate of the path cost from node n to a goal node. The function h(n) is an underestimate if h(n) is less than or equal to the actual cost of a lowest-cost path from node n to a goal. The heuristic function is a way to inform the search about the direction to a goal. It provides an informed way to guess which neighbor of a node will lead to a goal. It must use only information that can be readily obtained about a node. Typically a trade-off exists between the amount of work it takes to derive a heuristic value for a node and how accurately the heuristic value of a node measures the actual path cost from the node to a goal. A standard way to derive a heuristic function is to solve a simpler problem and to use the actual cost in the simplified problem as the heuristic function of the original problem.

A* Search

A* search is a combination of lowest-cost-first and best-first searches that considers both path cost and heuristic information in its selection of which path to expand. For each path on the frontier, A* uses an estimate of the total path cost from a start node to a goal node constrained to start along that path. It uses cost(p), the cost of the path found, as well as the heuristic function h(p), the estimated path cost from the end of p to the goal. For any path p on the frontier, define f(p) = cost(p) + h(p). This is an estimate of the total path cost to follow path p and then go to a goal node. If n is the node at the end of path p, this can be depicted as follows:

    start --- cost(p) ---> n --- h(p) ---> goal
             (actual)         (estimate)

    f(p) = cost(p) + h(p)
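A compact A* sketch, assuming a weighted graph given as an adjacency dict and an admissible heuristic supplied by the caller; the frontier is a priority queue ordered by f(p) = cost(p) + h(p). The example graph and heuristic values are illustrative assumptions.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """neighbors(n) -> iterable of (next_node, arc_cost); h(n) -> estimate to goal."""
    frontier = [(h(start), 0, (start,))]        # entries: (f, cost, path)
    while frontier:
        f, cost, path = heapq.heappop(frontier) # select path with minimum f
        node = path[-1]
        if node == goal:
            return path, cost
        for n, arc_cost in neighbors(node):
            g = cost + arc_cost
            heapq.heappush(frontier, (g + h(n), g, path + (n,)))
    return None

# Assumed example graph and (admissible) heuristic:
GRAPH = {"s": [("a", 1), ("b", 4)], "a": [("goal", 5)], "b": [("goal", 1)], "goal": []}
H = {"s": 3, "a": 4, "b": 1, "goal": 0}
print(a_star("s", "goal", lambda n: GRAPH[n], lambda n: H[n]))
# (('s', 'b', 'goal'), 5): the cheaper route wins despite its costlier first arc
```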

5.6 MORE SOPHISTICATED SEARCH

A number of refinements can be made to the preceding strategies.

Cycle Checking

It is possible for a graph representing a search space to include cycles. For example, in the robot delivery domain, the robot can go back and forth between nodes. Some of the aforementioned search methods can get trapped in cycles, continuously repeating the cycle and never finding an answer even in finite graphs. The other methods can loop through cycles, but eventually they still find a solution. The simplest method of pruning the search tree, while guaranteeing that a solution will be found in a finite graph, is to ensure that the algorithm does not consider

neighbors that are already on the path from the start. A cycle check or loop check checks for paths where the last node already appears on the path from the start node to that node. With a cycle check, only the paths <s0, . . . , sk, s>, where s ∉ {s0, . . . , sk}, are added to the frontier.

Multiple-Path Pruning

There is often more than one path to a node. If only one path is required, a search algorithm can prune from the frontier any path that leads to a node to which it has already found a path. Multiple-path pruning can be implemented by keeping a closed list of nodes that have been expanded. This approach does not necessarily guarantee that the shortest path is not discarded; something more sophisticated may have to be done to guarantee that an optimal solution is found. To ensure that the search algorithm can still find a lowest-cost path to a goal, one of the following can be done:

- Make sure that the first path found to any node is a lowest-cost path to that node, and then prune all subsequent paths found to that node, as discussed earlier.
- If the search algorithm finds a lower-cost path to a node than one already found, it can remove all paths that used the higher-cost path to the node (because these cannot be on an optimal solution). That is, if there is a path p on the frontier <s, . . . , n, . . . , m>, and a path p′ to n is found that is shorter than the portion of the path from s to n in p, then p can be removed from the frontier.
- Whenever the search finds a lower-cost path to a node than a path already found, it can incorporate a new initial section on the paths that have extended the initial path. Thus, if there is a path p = <s, . . . , n, . . . , m> on the frontier, and a path p′ to n is found that is shorter than the portion of p from s to n, then p′ can replace the initial part of p up to n.

Iterative Deepening

One way to combine the space efficiency of depth-first search with the optimality of breadth-first methods is to use iterative deepening. The idea is to recompute the elements of the frontier rather than storing them. Each computation can be a depth-first search, which thus uses less space. Consider making a breadth-first search into an iterative deepening search. This is carried out by having a depth-first searcher, which searches only to a limited depth. It can first do a depth-first search to depth 1 by building paths of length 1 in a depth-first manner. Then it can build paths to depth 2, then depth 3, and so on. It can throw away all of the previous computation each time and start again. Eventually it will find a solution if one exists, and, as it is enumerating paths in order, the path with the fewest arcs will always be found first.

When implementing an iterative deepening search, you have to distinguish between failure because the depth bound was reached and failure that does not involve reaching the depth bound. In the first case, the search must be retried with a larger depth bound. In the second case, it is a waste of time to try again with a larger depth bound, because no path exists no matter what the depth. We say that failure due to reaching the depth bound is failing unnaturally, and failure without reaching the depth bound is failing naturally.
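A sketch of iterative deepening on the same kind of adjacency-dict graph assumed earlier: a depth-bounded depth-first search is restarted with increasing bounds, and the two kinds of failure are distinguished so the loop knows when to stop.

```python
# Iterative deepening: repeated depth-bounded DFS, distinguishing "natural"
# failure (no path at any depth) from failure caused by hitting the bound.

def depth_bounded_dfs(node, goal, neighbors, bound, path=()):
    path = path + (node,)
    if node == goal:
        return path
    if bound == 0:
        return "cutoff"                       # failed unnaturally: bound reached
    hit_bound = False
    for n in neighbors(node):
        result = depth_bounded_dfs(n, goal, neighbors, bound - 1, path)
        if result == "cutoff":
            hit_bound = True
        elif result is not None:
            return result
    return "cutoff" if hit_bound else None    # None = failed naturally

def iterative_deepening(start, goal, neighbors):
    bound = 0
    while True:
        result = depth_bounded_dfs(start, goal, neighbors, bound)
        if result == "cutoff":
            bound += 1                        # retry with a larger depth bound
        else:
            return result                     # a path, or None if none exists

GRAPH = {"s": ["a", "b"], "a": ["g"], "b": [], "g": []}
print(iterative_deepening("s", "g", lambda n: GRAPH[n]))  # ('s', 'a', 'g')
```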
