3. Introduction
In some applications, the output of the system is a sequence of actions. In such a case, a single action is not important on its own; what matters is the sequence as a whole. An example is game playing, where a single move by itself is not that important.
4. Reinforcement learning
When the agent acts on its environment, it receives some evaluation of its action (a reinforcement), but it is not told which action is the correct one to achieve its goal.
5. Reinforcement learning
Task
Learn how to behave successfully to achieve a goal while interacting with an external environment.
Learn via experience!
Examples
Game playing: the player knows whether it wins or loses, but does not know how to move at each step.
Control: a traffic system can measure the delay of cars, but does not know how to decrease it.
6.
At the heart of RL is the following analogy: consider teaching a dog a new trick. You cannot tell it what to do, but you can reward or punish it. Here we train a computer as if we were training a dog.
7.
If the dog obeys and acts according to our instructions, we encourage it by giving it biscuits; otherwise we punish it by beating or scolding it.
Similarly, if the system works well, the teacher gives it a positive value (i.e. a reward); otherwise the teacher gives it a negative value (i.e. a punishment).
8.
A learning system that receives a punishment has to improve itself.
Reinforcement learning algorithms selectively retain the outputs that maximize the received reward over time.
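To make "selectively retain the outputs that maximize reward" concrete, here is a minimal Python sketch that is not from the slides: the agent tracks the average reward received for each action and increasingly repeats the best-paying one. The two actions and the get_reward() feedback function are invented for illustration.

import random

# Hypothetical environment feedback: action "b" pays off more often than "a".
def get_reward(action):
    return 1.0 if random.random() < {"a": 0.3, "b": 0.7}[action] else 0.0

actions = ["a", "b"]
totals = {a: 0.0 for a in actions}   # accumulated reward per action
counts = {a: 0 for a in actions}     # how often each action was tried

def average(a):
    return totals[a] / counts[a] if counts[a] else 0.0

for step in range(1000):
    # Mostly exploit the best action so far, but keep exploring occasionally.
    if random.random() < 0.1:
        action = random.choice(actions)
    else:
        action = max(actions, key=average)
    reward = get_reward(action)      # the reinforcement signal (reward/punishment)
    totals[action] += reward
    counts[action] += 1

print({a: round(average(a), 2) for a in actions})   # "b" ends up preferred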
10. RL model
Each percept e is enough to determine the state (the state is accessible).
The agent can decompose the reward component from a percept.
The agent's task: find an optimal policy, mapping states to actions, that maximizes a long-run measure of the reinforcement.
Think of the reinforcement as a reward.
This can be modeled as an MDP!
11. Review of MDP model
MDP model <S,T,A,R>
[Diagram: the agent–environment interaction loop. The agent observes the state and reward and sends back an action, generating the sequence s0, a0, r0, s1, a1, r1, s2, a2, r2, s3, ...]
• S – the set of states
• A – the set of actions
• T(s,a,s') = P(s'|s,a) – the probability of transitioning from s to s' given action a
• R(s,a) – the expected reward for taking action a in state s
R(s,a) = Σ_{s'} T(s,a,s') · r(s,a,s') = Σ_{s'} P(s'|s,a) · r(s,a,s')
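As a concrete (and invented) illustration of these components, the following Python sketch encodes a tiny two-state MDP as dictionaries and computes the expected reward R(s,a) by summing T(s,a,s') · r(s,a,s') over the successor states s', as in the formula above, where r(s,a,s') is the immediate reward for that transition. The specific states, actions, and rewards are assumptions made for the example.

# A tiny, invented MDP <S, A, T, R> encoded as plain dictionaries.
S = ["s0", "s1"]
A = ["stay", "go"]

# T[(s, a)] maps each successor state s' to P(s'|s, a).
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.9, "s1": 0.1},
}

# r(s, a, s'): immediate reward for a transition; here, reaching s1 pays 1.
def r(s, a, s2):
    return 1.0 if s2 == "s1" else 0.0

# Expected reward R(s, a) = sum over s' of T(s, a, s') * r(s, a, s').
def R(s, a):
    return sum(p * r(s, a, s2) for s2, p in T[(s, a)].items())

print(R("s0", "go"))   # 0.2*0 + 0.8*1 = 0.8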
13.
The target function to be learned in this case is a control policy Φ: S → A.
The reinforcement learning problem differs from other function approximation tasks in several important respects.
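The idea that the target function maps states to actions can be shown very directly. The sketch below, with placeholder states and a made-up transition sampler, represents a policy Φ as a lookup table and follows it for a few steps.

import random

# A policy Phi: S -> A represented as a simple lookup table.
policy = {"s0": "go", "s1": "stay"}

# Hypothetical environment: sample the next state from (state, action).
def step(state, action):
    if state == "s0" and action == "go":
        return "s1" if random.random() < 0.8 else "s0"
    return state

state = "s0"
for t in range(5):
    action = policy[state]        # the policy picks the action for this state
    state = step(state, action)
    print(t, action, state)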
15. THE LEARNING TASK
To formulate the problem of learning sequential control strategies more precisely, we might vary our assumptions about the agent, for example whether:
1. its actions are deterministic or nondeterministic;
2. it can predict the next state that will result from each action;
3. it is trained by an expert; or
4. it must train itself (a sketch of this last setting follows below).
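As one possible illustration of the fourth setting, where the agent trains itself from its own experience under nondeterministic actions, here is a minimal tabular Q-learning sketch in Python. The three-state chain environment and the learning-rate, discount, and exploration values are assumptions chosen only to make the example run; they are not part of the original material.

import random
from collections import defaultdict

states, actions = [0, 1, 2], ["left", "right"]

def step(s, a):
    """Nondeterministic environment: the intended move succeeds 80% of the time.
    Reaching state 2 gives reward 1 and ends the episode."""
    move = 1 if a == "right" else -1
    if random.random() > 0.8:
        move = -move
    s2 = min(max(s + move, 0), 2)
    return s2, (1.0 if s2 == 2 else 0.0), s2 == 2

Q = defaultdict(float)                  # Q[(state, action)] value estimates
alpha, gamma, eps = 0.1, 0.9, 0.1       # learning rate, discount, exploration

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection from the agent's own value estimates.
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a_: Q[(s, a_)])
        s2, reward, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in actions)
        # Q-learning update: move toward reward + discounted best future value.
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy should prefer "right" in every state.
print({s: max(actions, key=lambda a_: Q[(s, a_)]) for s in states})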