1. Machine Learning and ILP for Multi-Agent Systems
Daniel Kudenko & Dimitar Kazakov
Department of Computer Science, University of York, UK
ACAI-01, Prague, July 2001
3. A Brief History
Machine Learning: Disembodied ML → Single-Agent Learning → Multiple Single-Agent Learners → Social Multi-Agent Learners
Agents: Single-Agent System → Multiple Single-Agent System → Social Multi-Agent System
8. Inductive Learning
Examples of Category C1, Examples of Category C2, ..., Examples of Category Cn
→ Inductive Learning System →
Hypothesis (Procedure to Classify New Examples)
9. Inductive Learning Example
Training examples:
  Ammo   Monster   Light    Category
  low    near      good     shoot
  low    far       medium   ¬shoot
  high   far       good     shoot
→ Inductive Learning System →
Hypothesis: If (Ammo = high) and (Light ∈ {medium, good}) then shoot; ...
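To make the diagram concrete, here is a minimal sketch, not from the slides, of an inductive learning system for the toy data above: it induces a one-attribute rule (1R-style). The attribute and category names mirror the slide; the learner itself is an illustrative simplification.

```python
# Toy training examples from the slide: should the agent shoot?
examples = [
    ({"ammo": "low",  "monster": "near", "light": "good"},   "shoot"),
    ({"ammo": "low",  "monster": "far",  "light": "medium"}, "not_shoot"),
    ({"ammo": "high", "monster": "far",  "light": "good"},   "shoot"),
]

def learn_one_rule(examples):
    """Pick the attribute whose value -> class rule makes the fewest training errors."""
    best = None
    for attr in examples[0][0]:
        # class of the first example seen for each attribute value
        rule = {}
        for features, label in examples:
            rule.setdefault(features[attr], label)
        errors = sum(rule[features[attr]] != label for features, label in examples)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]

attr, rule = learn_one_rule(examples)
print(attr, rule)   # e.g. light {'good': 'shoot', 'medium': 'not_shoot'}
```

On this data the learner returns a rule on the Light attribute; a real inductive learner would of course search a richer hypothesis space, as in the rule shown on the slide.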
23. Q Learning
Value of a state: discounted cumulative reward
V^π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i})
where 0 ≤ γ < 1 is a discount factor (γ = 0 means that only immediate reward is considered) and r(s_{t+i}, a_{t+i}) is the reward obtained by performing the actions specified by policy π.
Q(s,a) = r(s,a) + γ V*(δ(s,a)), where δ(s,a) is the state reached by taking action a in state s.
Optimal policy: π*(s) = argmax_a Q(s,a)
24. Q Learning
Initialize all Q(s,a) to 0.
In some state s, choose some action a. Let s' be the resulting state.
Update Q: Q(s,a) := r + γ max_{a'} Q(s',a')
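A tabular sketch of this algorithm, assuming a deterministic environment with an interface env.reset(), env.step(a) -> (next_state, reward, done) and a finite env.actions list (these names are illustrative, not from the slides):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with the update Q(s,a) := r + gamma * max_a' Q(s',a')."""
    Q = defaultdict(float)                      # all Q(s,a) start at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action choice: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            # deterministic-world update from the slide (no learning rate)
            Q[(s, a)] = r + gamma * max(Q[(s_next, act)] for act in env.actions)
            s = s_next
    return Q
```

The epsilon-greedy exploration strategy is one common choice; the slide's update rule itself only requires that every state-action pair keeps being visited.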
45. Bibliography
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Michalski et al. 98] R.S. Michalski, I. Bratko, and M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-Based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[Dzeroski et al. 98] S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP-98). Springer, 1998.
[Gordon 00] D. Gordon. Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is 'Multi' in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. In Proceedings of the Fourth International Conference on Multiagent Systems. IEEE Press, 2000.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. In Proceedings of AAAI-98, 1998.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
46. Bibliography
[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning, 1993.
[Prasad et al. 96] M.V.N. Prasad, S.E. Lander, and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. In Proceedings of the First International Conference on Multi-Agent Systems, 1996.
[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With Other Agents, 1999.
[Provost & Kolluri 99] F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.
[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling Up: Distributed Machine Learning with Cooperation. In Proceedings of AAAI-96, 1996.
56. Synchronisation, Time Constraints
- Multi-agent Progol (Muggleton): asynchronous
- The York MA Environment (Kazakov et al.): 1 move per round, immediate update
- Logic-based MAS for conflict simulations (Kudenko, Alonso): 1 move per round, batch update
Time constraints range from real time, through an upper bound, to unlimited time.
58. Learning and Recall (2)
1. Update sensory information.
2. Recall the current model of the world to choose and carry out an action.
3. Observe the action outcome.
4. Learn a new model of the world.
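A schematic sketch of this cycle, with assumed agent and world interfaces (sense, recall, execute, and learn are illustrative names, not from the slides):

```python
def learning_and_recall(agent, world, steps=100):
    """One agent repeatedly senses, acts from its current model, and learns."""
    for _ in range(steps):
        percept = world.sense()                # update sensory information
        action = agent.recall(percept)         # use the current world model to choose an action
        outcome = world.execute(action)        # carry out the action and observe its outcome
        agent.learn(percept, action, outcome)  # learn a new model of the world
```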
68. ILP Object Language Example
Good bargain cars:
  Model      Mileage   Price   y/n   ILP representation
  Audi V8    30,000    £4000   +     gbc(v8,30000,4000).
  Fiat Uno   90,000    £3000   -     :- gbc(uno,90000,3000).
  BMW Z3     50,000    £5000   +     gbc(z3,50000,5000).
75. ILP Types and Modes: Example
Good bargain cars:
  Model      Mileage   Price   y/n   ILP representation (Progol)
  Audi V8    30,000    4000    +     gbc(v8,30000,4000).
  Fiat Uno   90,000    3000    -     :- gbc(uno,90000,3000).
  BMW Z3     50,000    5000    +     gbc(z3,50000,5000).
Mode declaration: modeh(1,gbc(+model,+mileage,+price))?
82. Example of Induction
Background knowledge (BK): q(b). q(c).
Training examples: p(b,a). p(f,g). :- p(i,j).
Candidate hypotheses:
  p(X,Y).
  p(b,a) :- q(b).
  p(X,a).
  p(X,Y) :- q(X).
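To see how these clauses compare, here is a small sketch, not from the slides, that tests each candidate hypothesis against the background knowledge and the training examples; the clauses are hand-translated into Python predicates for this toy case.

```python
BK = {("q", "b"), ("q", "c")}         # background knowledge: q(b). q(c).
positives = [("b", "a"), ("f", "g")]  # positive examples: p(b,a). p(f,g).
negatives = [("i", "j")]              # negative example: :- p(i,j).

def q(x):
    return ("q", x) in BK

# the candidate hypotheses from the slide, as Python predicates over (X, Y)
hypotheses = {
    "p(X,Y).":         lambda x, y: True,
    "p(b,a) :- q(b).": lambda x, y: x == "b" and y == "a" and q("b"),
    "p(X,a).":         lambda x, y: y == "a",
    "p(X,Y) :- q(X).": lambda x, y: q(x),
}

for clause, covers in hypotheses.items():
    pos = sum(covers(*e) for e in positives)
    neg = sum(covers(*e) for e in negatives)
    print(f"{clause:18} covers {pos}/2 positive and {neg}/1 negative examples")

# p(X,Y). is too general (it also covers the negative example p(i,j)),
# while each of the other clauses covers only one of the two positive examples.
```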