1. Machine Learning and ILP for Multi-Agent Systems
Daniel Kudenko & Dimitar Kazakov
Department of Computer Science, University of York, UK
ACAI-01, Prague, July 2001
3. A Brief History
Machine Learning: Disembodied ML → Single-Agent Learning → Multiple Single-Agent Learners → Social Multi-Agent Learners
Agents: Single-Agent System → Multiple Single-Agent System → Social Multi-Agent System
8. Inductive Learning
Examples of Category C1, Examples of Category C2, ..., Examples of Category Cn
→ Inductive Learning System →
Hypothesis (Procedure to Classify New Examples)
9. Inductive Learning Example
Training examples:
  Ammo   Monster   Light    Category
  low    near      good     shoot
  low    far       medium   ¬shoot
  high   far       good     shoot
→ Inductive Learning System →
Hypothesis: If (Ammo = high) and (Light ∈ {medium, good}) then shoot; ...
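To make the diagram concrete, here is a minimal sketch, not from the slides, of an inductive learning system for the toy data above: it induces a one-attribute rule (1R-style). The attribute and category names mirror the slide; the learner itself is an illustrative simplification.

```python
# Toy training examples from the slide: should the agent shoot?
examples = [
    ({"ammo": "low",  "monster": "near", "light": "good"},   "shoot"),
    ({"ammo": "low",  "monster": "far",  "light": "medium"}, "not_shoot"),
    ({"ammo": "high", "monster": "far",  "light": "good"},   "shoot"),
]

def learn_one_rule(examples):
    """Pick the attribute whose value -> class rule makes the fewest training errors."""
    best = None
    for attr in examples[0][0]:
        # class of the first example seen for each attribute value
        rule = {}
        for features, label in examples:
            rule.setdefault(features[attr], label)
        errors = sum(rule[features[attr]] != label for features, label in examples)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best[0], best[1]

attr, rule = learn_one_rule(examples)
print(attr, rule)   # e.g. light {'good': 'shoot', 'medium': 'not_shoot'}
```

On this data the learner returns a rule on the Light attribute; a real inductive learner would of course search a richer hypothesis space, as in the rule shown on the slide.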
23. Q Learning
Value of a state: discounted cumulative reward
V^π(s_t) = Σ_{i≥0} γ^i r(s_{t+i}, a_{t+i})
where 0 ≤ γ < 1 is a discount factor (γ = 0 means that only immediate reward is considered) and r(s_{t+i}, a_{t+i}) is the reward obtained by performing the actions specified by policy π.
Q(s,a) = r(s,a) + γ V*(δ(s,a)), where δ(s,a) is the state reached by taking action a in state s.
Optimal policy: π*(s) = argmax_a Q(s,a)
24. Q Learning
Initialize all Q(s,a) to 0.
In some state s, choose some action a. Let s' be the resulting state.
Update Q: Q(s,a) := r + γ max_{a'} Q(s',a')
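A tabular sketch of this algorithm, assuming a deterministic environment with an interface env.reset(), env.step(a) -> (next_state, reward, done) and a finite env.actions list (these names are illustrative, not from the slides):

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with the update Q(s,a) := r + gamma * max_a' Q(s',a')."""
    Q = defaultdict(float)                      # all Q(s,a) start at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action choice: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            # deterministic-world update from the slide (no learning rate)
            Q[(s, a)] = r + gamma * max(Q[(s_next, act)] for act in env.actions)
            s = s_next
    return Q
```

The epsilon-greedy exploration strategy is one common choice; the slide's update rule itself only requires that every state-action pair keeps being visited.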
45. Bibliography
[Mitchell 97] T. Mitchell. Machine Learning. McGraw Hill, 1997.
[Michalski et al. 98] R.S. Michalski, I. Bratko, and M. Kubat. Machine Learning and Data Mining: Methods and Applications. Wiley, 1998.
[Dietterich & Flann 95] T. Dietterich and N. Flann. Explanation-Based Learning and Reinforcement Learning. In Proceedings of the Twelfth International Conference on Machine Learning, 1995.
[Dzeroski et al. 98] S. Dzeroski, L. De Raedt, and H. Blockeel. Relational Reinforcement Learning. In Proceedings of the Eighth International Conference on Inductive Logic Programming (ILP-98). Springer, 1998.
[Gordon 00] D. Gordon. Asimovian Adaptive Agents. Journal of Artificial Intelligence Research, 13, 2000.
[Weiss & Dillenbourg 99] G. Weiss and P. Dillenbourg. What is 'Multi' in Multi-Agent Learning? In P. Dillenbourg (ed.), Collaborative Learning: Cognitive and Computational Approaches. Pergamon Press, 1999.
[Vidal & Durfee 97] J.M. Vidal and E. Durfee. Agents Learning about Agents: A Framework and Analysis. In Working Notes of the AAAI-97 Workshop on Multiagent Learning, 1997.
[Mundhe & Sen 00] M. Mundhe and S. Sen. Evaluating Concurrent Reinforcement Learners. In Proceedings of the Fourth International Conference on Multiagent Systems. IEEE Press, 2000.
[Claus & Boutilier 98] C. Claus and C. Boutilier. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems. In Proceedings of AAAI-98, 1998.
[Lauer & Riedmiller 00] M. Lauer and M. Riedmiller. An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-Agent Systems. In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.
46. Bibliography
[Tan 93] M. Tan. Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents. In Proceedings of the Tenth International Conference on Machine Learning, 1993.
[Prasad et al. 96] M.V.N. Prasad, S.E. Lander, and V.R. Lesser. Learning Organizational Roles for Negotiated Search. International Journal of Human-Computer Studies, 48(1), 1996.
[Ono & Fukumoto 96] N. Ono and K. Fukumoto. A Modular Approach to Multi-Agent Reinforcement Learning. In Proceedings of the First International Conference on Multi-Agent Systems, 1996.
[Crites & Barto 98] R. Crites and A. Barto. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning, 1998.
[Balch 99] T. Balch. Reward and Diversity in Multi-Robot Foraging. In Proceedings of the IJCAI-99 Workshop on Agents Learning About, From, and With Other Agents, 1999.
[Provost & Kolluri 99] F. Provost and V. Kolluri. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 1999.
[Provost & Hennessy 96] F. Provost and D. Hennessy. Scaling Up: Distributed Machine Learning with Cooperation. In Proceedings of AAAI-96, 1996.
56. Synchronisation, Time Constraints
- Multi-agent Progol (Muggleton): asynchronous
- The York MA Environment (Kazakov et al.): 1 move per round, immediate update
- Logic-based MAS for conflict simulations (Kudenko, Alonso): 1 move per round, batch update
Time constraints range from real time, through an upper bound, to unlimited time.
58. Learning and Recall (2)
1. Update sensory information.
2. Recall the current model of the world to choose and carry out an action.
3. Observe the action outcome.
4. Learn a new model of the world.
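A schematic sketch of this cycle, with assumed agent and world interfaces (sense, recall, execute, and learn are illustrative names, not from the slides):

```python
def learning_and_recall(agent, world, steps=100):
    """One agent repeatedly senses, acts from its current model, and learns."""
    for _ in range(steps):
        percept = world.sense()                # update sensory information
        action = agent.recall(percept)         # use the current world model to choose an action
        outcome = world.execute(action)        # carry out the action and observe its outcome
        agent.learn(percept, action, outcome)  # learn a new model of the world
```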
68. ILP Object Language Example
Good bargain cars:
  Model      Mileage   Price   y/n   ILP representation
  Audi V8    30,000    £4000   +     gbc(v8,30000,4000).
  Fiat Uno   90,000    £3000   -     :- gbc(uno,90000,3000).
  BMW Z3     50,000    £5000   +     gbc(z3,50000,5000).
75. ILP Types and Modes: Example
Good bargain cars:
  Model      Mileage   Price   y/n   ILP representation (Progol)
  Audi V8    30,000    4000    +     gbc(v8,30000,4000).
  Fiat Uno   90,000    3000    -     :- gbc(uno,90000,3000).
  BMW Z3     50,000    5000    +     gbc(z3,50000,5000).
Mode declaration: modeh(1,gbc(+model,+mileage,+price))?
82. Example of Induction
Background knowledge (BK): q(b). q(c).
Training examples: p(b,a). p(f,g). :- p(i,j).
Candidate hypotheses:
  p(X,Y).
  p(b,a) :- q(b).
  p(X,a).
  p(X,Y) :- q(X).
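To see how these clauses compare, here is a small sketch, not from the slides, that tests each candidate hypothesis against the background knowledge and the training examples; the clauses are hand-translated into Python predicates for this toy case.

```python
BK = {("q", "b"), ("q", "c")}         # background knowledge: q(b). q(c).
positives = [("b", "a"), ("f", "g")]  # positive examples: p(b,a). p(f,g).
negatives = [("i", "j")]              # negative example: :- p(i,j).

def q(x):
    return ("q", x) in BK

# the candidate hypotheses from the slide, as Python predicates over (X, Y)
hypotheses = {
    "p(X,Y).":         lambda x, y: True,
    "p(b,a) :- q(b).": lambda x, y: x == "b" and y == "a" and q("b"),
    "p(X,a).":         lambda x, y: y == "a",
    "p(X,Y) :- q(X).": lambda x, y: q(x),
}

for clause, covers in hypotheses.items():
    pos = sum(covers(*e) for e in positives)
    neg = sum(covers(*e) for e in negatives)
    print(f"{clause:18} covers {pos}/2 positive and {neg}/1 negative examples")

# p(X,Y). is too general (it also covers the negative example p(i,j)),
# while each of the other clauses covers only one of the two positive examples.
```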