
Review on use of Reinforcement Learning in Artificial Intelligence

Mehdi Samiei yeganeh, M.Tech. (S/w. Eng.), School of Information Technology (SIT), Jawaharlal Nehru Technological University, Hyderabad, India. E-mail: en_samieiyeganeh@yahoo.com

Parisa Bahraminikoo, M.Tech. (S/w. Eng.), School of Information Technology (SIT), Jawaharlal Nehru Technological University, Hyderabad, India. E-mail: en_bahrami@yahoo.com

G. Praveen Babu, Associate Professor, School of Information Technology (SIT), Jawaharlal Nehru Technological University, Hyderabad, India. E-mail: pravbob@gmail.com

Abstract - With the start of the 21st century, humans moved into a new world of mechanics. Humans started making machinery that can do the job for them. The technology developed so much that it started involving many other branches of engineering such as electronics and robotics. This eventually led to much more complex and smart machinery involving Artificial Intelligence. Reinforcement Learning is a type of Machine Learning, and thereby also a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize their performance. Reinforcement Learning (RL) comes from animal learning theory. RL does not need prior knowledge; it can autonomously obtain an optimal policy using knowledge gained by trial and error and by continuous interaction with a dynamic environment. As a matter of fact, Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed as Reinforcement Learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process. A Markov Decision Process is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action that is available in state s. The Markov Decision Process provides a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.

Keywords - Artificial Intelligence, Reinforcement Learning

I. INTRODUCTION

Since ancient times, philosophers and mathematicians have developed formal reasoning. The study and work of the mathematician Alan Turing led to the invention of the programmable digital electronic computer [14]. According to Turing's theory of computation, any mathematical deduction could be simulated by shuffling symbols such as 0 and 1. A group of researchers continued with research in neurology, information theory and cybernetics in order to develop an electronic brain.

In the year 1956, Artificial Intelligence was established as a field during a conference at Dartmouth College. Those present at this conference included Marvin Minsky, Allen Newell, John McCarthy, and Herbert Simon. The programs they wrote seemed beyond belief at the time: computers solving word problems in algebra, proving logical theorems and speaking English. Herbert Simon envisaged that in twenty years, machines would be able to work like humans. Marvin Minsky believed that the problem of creating Artificial Intelligence would be substantially solved.

In human society, learning is an essential component of intelligent behavior. However, each individual agent need not learn everything from scratch by its own discovery. Reinforcement Learning is a type of Machine Learning [13], and thereby also a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize their performance. Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal. Reinforcement Learning is defined not by characterizing learning methods, but by characterizing a learning problem.

Reinforcement Learning is an important machine learning method. Machine learning technology is divided into three types: unsupervised learning, supervised learning and Reinforcement Learning. Reinforcement Learning is an online learning technology [19] which is different from both supervised and unsupervised learning. The reinforcement signal provided by the environment is an appraisal of the quality of the intelligent agent's action; it does not tell the agent how to generate the correct action. Planning under uncertainty is fundamental to solving many important real-world problems, including applications in robotics, network routing, scheduling, and financial decision making.

The rest of this paper is organized as follows: Section II briefly describes the features of Artificial Intelligence and Section III explains Reinforcement Learning.

II. ARTIFICIAL INTELLIGENCE

A. Definition

Artificial Intelligence (AI) is the field of Computer Science focused on ensuring that the dream of these scientists becomes a reality [3]. AI systems are currently capable of understanding speech, playing chess and performing household tasks. Artificial Intelligence includes:

Game playing: Programming computers to play games such as Chess and Checkers [8].

Neural networks: Systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains [9].

Robotics: Programming computers to see and hear and react to other sensory stimuli [6].

Natural language: Programming computers to understand natural human languages [8].

Expert systems: Programming computers to make decisions in real-life situations (for example, some expert systems help doctors diagnose diseases based on symptoms) [4].

AI is set to play an important role in our lives. Researchers produce new products which duplicate intelligence, understand speech, beat opposing chess players, and act in complex conditions. The major problems of Artificial Intelligence include qualities such as knowledge, planning, learning, reasoning, communication, perception and the capability to move and control objects [2]. One aim of Artificial Intelligence is to develop machines that perform tasks better than humans. Another aim is to understand such actions, whether they occur in humans, machines or animals. As a result, Artificial Intelligence is gaining importance in Science and Engineering fields. AI can be classified into two major types:

Weak AI: Weak AI represents technology that is capable of manipulating predetermined rules and applying each of these rules to achieve a definite goal.

Strong AI: Strong AI represents technology that has the ability to think or function similarly to the human brain. Most people say that this technology will never be realized or will take another century to achieve, but there is huge hope ahead.

B. Application of AI in the Business World

AI is being used extensively in the business world, despite the fact that the discipline itself is still in the embryonic stages of development. Its applications cross a wide spectrum. For example, AI is being applied in Management and Administration, Science, Engineering, Manufacturing, Financial and Legal areas, Military and Space endeavors, Medicine, Diagnostics and many more [5]. Senior managers in many companies use AI-based strategic planning systems to assist in functions like competitive analysis, technology deployment, and resource allocation. They also use programs to assist in equipment configuration design, product distribution, regulatory-compliance advisement, and personnel assessment. AI is contributing heavily to organizing, planning, and controlling operations, and will continue to do so with more frequency as programs are refined.

Robots are being utilized more frequently in the business world [16]. In 1990, over 200,000 robots were in use in U.S. factories. Experts predict that by the year 2025 robots could potentially replace humans in almost all manufacturing jobs. This includes not only the mundane tasks, but also those requiring specialized skills. They will be performing jobs such as shearing sheep, scraping barnacles from the bottoms of ships, and sandblasting walls. However, there are jobs that robots will never be able to perform, such as surgery [1].

III. REINFORCEMENT LEARNING

A. Definition

Reinforcement Learning allows the machine or software agent to learn its behavior based on feedback from the environment [20]. This behavior can be learnt once and for all, or keep on adapting as time goes by. If the problem is modeled with care, some Reinforcement Learning algorithms can converge to the global optimum; this is the ideal behavior that maximizes the reward. This automated learning scheme implies that there is little need for a human expert who knows about the domain of application. Much less time will be spent designing a solution, since there is no need for hand-crafting complex sets of rules as with Expert Systems, and all that is required is someone familiar with Reinforcement Learning. The basic model of Reinforcement Learning is shown in Figure 1. The intelligent agent perceives the environment and chooses actions so as to obtain the largest reward value by continuously interacting with the environment. The interface between an intelligent agent and its environment consists of action, reward and state [12].
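The interaction described above, in which the agent observes a state, chooses an action and receives a reward, can be sketched in a few lines of Python. The example is only an illustration under stated assumptions: the four-cell corridor environment, the function names `step`, `choose_action` and `train`, and all parameter values are hypothetical, and tabular Q-learning is used as one well-known Reinforcement Learning algorithm rather than a method this review prescribes.

```python
import random

# Hypothetical toy environment: a corridor of 4 cells. The agent starts in
# cell 0 and the episode ends when it reaches cell 3, which pays reward +1.
N_STATES = 4
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action):
    """Environment: take (state, action), return (next_state, reward, done)."""
    if action == RIGHT:
        next_state = state + 1
    else:
        next_state = max(0, state - 1)  # a wall blocks movement left of cell 0
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: mostly exploit current estimates, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; every hyperparameter here is illustrative."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # The reward only scores the action; it never names the right one.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-goal cell after learning.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

After training, the greedy policy moves right in every non-goal cell, although the environment never told the agent which action was correct; it only returned a reward. Here `gamma` plays the role of a discount factor that weights later rewards less than immediate ones.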

[Figure 1. The basic model of RL: the intelligent agent sends an action (A) to the environment, and the environment returns the state (Si) and a reward and punishment signal to the agent.]

Each time the Reinforcement Learning system interacts with the environment, the system first accepts the environment state s as input and then, according to its internal inference mechanism, outputs an action a that acts on the environment. The environment changes to a new state s' after accepting the action. The system then accepts the new state s' as input and obtains the environment's reward and punishment signal r. The goal of the Reinforcement Learning system is to learn an action strategy π: S → A that enables the chosen actions to obtain the largest cumulative reward from the environment. The cumulative reward can be defined as formula (1), where γ is the discount factor.

$$\sum_{i=0}^{\infty} \gamma^{i} r_{t+i} \qquad (1)$$

The basic theory of Reinforcement Learning technology is as follows: if a certain action of the system brings a positive reward from the environment, the system's tendency to generate this action is strengthened, which is a positive feedback process; otherwise, the system's tendency to generate this action is diminished.

If the environment is Markov, the interaction between the system and the environment may be regarded as a Markov Decision Process (MDP). An MDP model can be defined by four factors (S, A, R, P) [18]: S is the set of environment states, A is the set of system actions, R: S × A → ℝ is the reward function, and P, defined on S × A, is the state transition probability [10].

B. Limitations

There are many challenges in current Reinforcement Learning research. Firstly, it is often too memory expensive to store values of each state, since the problems can be pretty complex. Solving this involves looking into value approximation techniques, such as Decision Trees or Neural Networks. There are many consequences of introducing these imperfect value estimations, and research tries to minimize their impact on the quality of the solution. Moreover, problems are also generally very modular; similar behaviors reappear often, and modularity can be introduced to avoid learning everything all over again. Hierarchical approaches are commonplace for this, but doing this automatically is proving a challenge [15]. Finally, due to limited perception, it is often impossible to fully determine the current state. This also affects the performance of the algorithm, and much work has been done to compensate for this perceptual aliasing.

C. Applications

The possible applications of Reinforcement Learning are abundant, due to the generic nature of the problem specification. As a matter of fact, a very large number of problems in Artificial Intelligence can be fundamentally mapped to a decision process. This is a distinct advantage, since the same theory can be applied to many different domain-specific problems with little effort. In practice, this ranges from controlling robotic arms, where the goal is to find the most efficient motor combination, to robot navigation, where collision avoidance behavior can be learnt by negative feedback from bumping into obstacles [21]. Logic games are also well suited to Reinforcement Learning, as they are traditionally defined as a sequence of decisions: games such as Poker, Backgammon, Othello, and Chess have been tackled more or less successfully. Reinforcement Learning is also used in process control [17] and dispatch management. In dispatch management, the most successful application is Crites and Barto's elevator scheduling problem, in which they apply a step Reinforcement Learning algorithm to the operation scheduling of 4 lifts and 10 floors [7].

D. Examples

A good way to understand Reinforcement Learning is to consider some of the examples and possible applications that have guided its development.

A master chess player makes a move. The choice is informed both by planning (anticipating possible replies and counter-replies) and by immediate, intuitive judgments of the desirability of particular positions and moves.

An adaptive controller adjusts parameters of a petroleum refinery's operation in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs without sticking strictly to the set points originally suggested by engineers.

A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 20 miles per hour.

A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in the past.

Ramon prepares his breakfast. Closely examined, even this apparently mundane activity reveals a complex web of conditional behavior and interlocking goal-subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned, interactive sequences of behavior are required to obtain a bowl, spoon, and milk jug. Each step involves a series of eye movements to obtain information and to guide reaching and locomotion. Rapid judgments are continually made about how to carry the objects or whether it is better to ferry some of them to the dining table before obtaining others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and is in service of other goals, such as having the spoon to eat with once the cereal is prepared and ultimately obtaining nourishment.
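As a rough illustration, the mobile robot example above can be written as a small Markov Decision Process with the four factors (S, A, R, P). Everything specific in the sketch is an assumption made for illustration: the two battery states, the rewards, the transition probabilities and the discount factor of 0.9 are invented, and value iteration is used as one standard dynamic programming way to solve such a model, not as a method taken from this paper.

```python
# Hypothetical two-state MDP (S, A, R, P) for the trash-collecting robot.
# Every number below is invented for illustration; none comes from the paper.
STATES = ("high", "low")            # S: battery level
ACTIONS = ("search", "recharge")    # A: look for more trash, or go recharge

# R[s][a]: expected immediate reward for taking action a in state s.
R = {
    "high": {"search": 2.0, "recharge": 0.0},
    "low": {"search": -1.0, "recharge": 0.0},  # risk of running flat
}

# P[s][a][s2]: probability of moving from state s to state s2 under action a.
# "low" -> "high" under "search" stands for being carried back to the charger.
P = {
    "high": {"search": {"high": 0.7, "low": 0.3}, "recharge": {"high": 1.0}},
    "low": {"search": {"low": 0.6, "high": 0.4}, "recharge": {"high": 1.0}},
}


def action_value(s, a, values, gamma):
    """Expected reward now plus discounted value of where the robot ends up."""
    return R[s][a] + gamma * sum(p * values[s2] for s2, p in P[s][a].items())


def value_iteration(gamma=0.9, tol=1e-10):
    """Repeat Bellman optimality backups until state values stop changing."""
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {
            s: max(action_value(s, a, values, gamma) for a in ACTIONS)
            for s in STATES
        }
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            return new_values
        values = new_values


def greedy_policy(values, gamma=0.9):
    """Strategy pi: S -> A that picks the highest-valued action per state."""
    return {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values, gamma))
        for s in STATES
    }


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

With these made-up numbers the resulting strategy is to keep searching while the battery is high and to recharge once it is low. Changing the rewards or probabilities changes the answer, which is the point of the model: the decision depends on how costly and how likely a flat battery is.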

The above examples share features that are so basic that they are easy to overlook. All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment. The agent's actions are permitted to affect the future state of the environment (e.g., the next chess position, the level of reservoirs of the refinery, the next location of the robot), thereby affecting the options and opportunities available to the agent at later times. Correct choice requires taking into account indirect, delayed consequences of actions, and thus may require foresight or planning [11].

IV. CONCLUSION

Artificial Intelligence in the nineties focused on providing a better life for humans. Currently, the focus is on research that aids in building human-like robots. If highly intelligent robots are a possibility, then the very role humans play in society could change. Machine Learning (including Reinforcement Learning) could be strongly considered a part of AI; however, we would classify Machine Learning as the study of the creation of semantic models and adaptive behavior, with AI being the overall science of systems that exhibit intelligent-seeming behavior. Machine Learning is also the discipline which attempts to improve a machine's performance of a task, given examples. It could be considered to be within AI's range of interests, but researchers in Machine Learning need have no intellectual stake in AI's overall success. Machine Learning has a close overlap with statistical physics, certain signal processing topics, certain formulations related to planning, control theory, and dynamic programming.

Researchers have predicted that weak AI has a very exciting future in store for mankind. With the exponential growth of computing power, weak AI will provide more breakthroughs in the near future. One exciting prospect is the supercomputer. Supercomputers of the near future will be so powerful that they will be able to perform as expert systems, which will further be used as a database of expert knowledge to solve everyday problems.

REFERENCES

[1] R. B. Mishra, Professor, Department of Computer Engineering, Banaras Hindu University, Artificial Intelligence, PHI Learning Private Limited, New Delhi, 2011.

[2] Chuck Williams, "A Brief Introduction to Artificial Intelligence," 10.0 109 83 IEEE.

[3] Asa B. Simmons and Steven G. Chappell, "Artificial Intelligence-Definition and Practice," IEEE Journal of Oceanic Engineering, vol. 13, no. 2, April 1988.

[4] V. B. (Kisan) Pandit, United States Navy, Naval Sea Systems Command, Surface Ship Maintenance Division, "Artificial Intelligence and Expert Systems: A Technology Update."

[5] Henry G. Green, Managing Director of Research and Development, and Michael A. Pearson, "Artificial Intelligence in Financial Markets."

[6] E. S. Brunette, R. C. Flemmer and C. L. Flemmer, School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand, "A Review of Artificial Intelligence," IEEE, Proceedings of the 4th International Conference on Autonomous Robots and Agents, Feb 10-12, 2009, Wellington, New Zealand.

[7] R. H. Crites and A. G. Barto, "Improving Elevator Performance Using Reinforcement Learning," in D. S. Touretzky, M. C. Mozer, and M. E. H., Advances in Neural Information Processing Systems, Cambridge, MA: The MIT Press, 1995, pp. 1017-1023.

[8] R. A. Corlett, "Features of Artificial Intelligence Languages and Their Environments."

[9] Simon Le Blond and Raj Aggarwal, University of Bath, "A Review of Artificial Intelligence Techniques as Applied to Adaptive Auto reclosure, with Particular Reference to Deployment with Wind Generation," 9780-947649-44-9/09/$26.00, 2009 IEEE.

[10] Wang Qiang and Zhan Zhongli, "Reinforcement Learning Model, Algorithms and Its Application," 2011 International Conference on Mechatronic Science, Electric Engineering and Computer, August 19-22, 2011, Jilin, China.

[11] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, A Bradford Book, The MIT Press, Cambridge, Massachusetts; London, England.

[12] Tatsuya Kasai, Hiroshi Tenmoto, and Akimoto Kamiya, Department of Information Engineering, Kushiro National College of Technology, "Learning of Communication Codes in Multi-Agent Reinforcement Learning Problem," 2008 IEEE Conference on Soft Computing in Industrial Applications (SMCia/08), June 25-27, 2008, Muroran, Japan.

[13] Martin Sewell, Department of Computer Science, University College London, "Machine Learning."

[14] B. Jack Copeland and Diane Proudfoot, "Alan Turing's Forgotten Ideas in Computer Science."

[15] Graham Taylor, University of Waterloo, Canada; INSA-Lyon, France, "Reinforcement Learning for Parameter Control of Text Detection in Images from Video Sequences."

[16] Tim Niemueller and Sumedha Widyadharma, "Artificial Intelligence - An Introduction to Robotics."

[17] Ihsan Omur Bucak and Mohamed A. Zohdy, School of Engineering and Computer Science, Electrical and Systems Engineering Department, Oakland University, "Application of Reinforcement Learning Control to a Nonlinear Dexterous Robot," Proceedings of the 38th Conference on Decision & Control, Phoenix, Arizona, USA, December 1999.

[18] Bob Givan, Purdue University, and Ron Parr, Duke University, "An Introduction to Markov Decision Processes."

[19] James J. Govindhasamy, Sean F. McLoone, George W. Irwin, John J. French, and Richard P. Doyle, "Reinforcement Learning for Online Control and Optimization."

[20] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning).

[21] Marco Wiering, Intelligent Systems Group, Utrecht University, "Reinforcement Learning for Robot Control."
