Reinforcement Learning

Yigit UNALLAR

Machine Learning
Learning without being explicitly programmed!
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning

Reinforcement Learning
● Learning from interaction!
○ Driving a car,
○ Holding a conversation,
● Goal-directed approach
○ Closed-loop,
○ Reward oriented,

Reinforcement vs. Unsupervised Learning
● Unsupervised Learning:
○ Finds hidden structures in unlabeled data,
● Reinforcement Learning:
○ No reliance on finding structure,
○ Maximizes a reward signal!

Exploration vs. Exploitation Dilemma
● Exploit to obtain rewards!
● Explore to perform better!
● Exploration or exploitation? Neither alone succeeds, the agent must balance both!
● Closest to human and animal learning!
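A minimal sketch of one common way to balance the two, ε-greedy action selection on a k-armed bandit (an assumption: the slides name no specific method). The arm means in `true_means`, the Gaussian reward noise and the function name `epsilon_greedy_bandit` are hypothetical, chosen only for illustration. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest estimated value.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a hypothetical Gaussian k-armed bandit.

    true_means: assumed expected reward of each arm (illustration only).
    Returns the sample-average value estimates q and the pull counts n.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # estimated value of each arm
    n = [0] * k    # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: pick a random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: best current estimate
        reward = rng.gauss(true_means[a], 1.0)     # noisy reward from the chosen arm
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample average
    return q, n

q, n = epsilon_greedy_bandit([0.2, 1.0, 0.5], steps=5000, epsilon=0.1)
print("estimates:", [round(v, 2) for v in q])
print("pulls:", n)
```

Setting `epsilon=0` gives a purely greedy agent that can lock onto whichever arm happened to look best early on; a small positive `epsilon` keeps sampling every arm, so all estimates keep improving.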

Examples
● Mobile Robot
○ Search for more trash, or
○ Find its way back to the battery recharging station,
● Adaptive Controller for a Petroleum Refinery
○ Optimizes the yield/cost/quality trade-off,
○ Based on specified marginal costs,

Agent & Environment
● Policy,
○ Mapping from states to actions,
● Reward,
○ Immediate signal, like pain or pleasure,
● Value Function,
○ Farsighted judgement of reward in the long run,
● Model,
○ Mimics the behavior of the environment, used for planning,
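The policy, reward and value function can be sketched as a tiny agent-environment loop. Everything here is hypothetical: `CorridorEnv` is a made-up five-cell corridor where a move "slips" in the wrong direction 20% of the time, the policy is a plain lookup table from states to actions, and the value of the start state is estimated by averaging discounted returns over many episodes. No model is used; the agent only samples the environment.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, start at 0, goal at the right end."""

    def __init__(self, length=5, slip=0.2, seed=0):
        self.length = length
        self.slip = slip                  # probability that the chosen move is reversed
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the environment may slip and reverse it
        if self.rng.random() < self.slip:
            action = -action
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0     # reward signal: computed by the environment
        return self.state, reward, done

# Policy: a mapping from (non-terminal) states 0..3 to actions, here "always move right"
policy = {s: +1 for s in range(4)}

def run_episode(env, policy, gamma=0.9, max_steps=100):
    """Run one episode and return its discounted return G_0."""
    state = env.reset()
    g, discount = 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy[state])
        g += discount * reward
        discount *= gamma
        if done:
            break
    return g

env = CorridorEnv()
returns = [run_episode(env, policy) for _ in range(1000)]
# Value function: expected long-run return from the start state under this policy
v_start = sum(returns) / len(returns)
print(f"estimated v(0) = {v_start:.3f}")
```

Reaching the goal takes at least four moves, so no single return can exceed 0.9³ = 0.729; slips only lower the estimate below that bound.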

Pick and Place Robot
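Action:
Voltages applied to the motors at each joint,
States:
Latest readings of joint angles and velocities,
Reward:
+1 for each object successfully picked up and placed, computed in the environment, not in the agent!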

Goals & Markov Decision Process
Goals:
Markov Decision Process:
A state signal that retains all relevant information has the Markov Property!
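Written out in the notation commonly used in RL textbooks (the symbols are an assumption, since the slide states the property only in words), a state signal has the Markov property when the next state and reward depend only on the current state and action, not on the rest of the history:

```latex
\Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_0, A_0, R_1, \ldots, S_{t-1}, A_{t-1}, R_t, S_t, A_t\}
  = \Pr\{S_{t+1} = s',\, R_{t+1} = r \mid S_t, A_t\}
```

The right-hand side is the one-step dynamics of the environment, usually written p(s', r | s, a).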

Markov Decision Process ctd.
A task is an MDP if,
● It satisfies the Markov property,
A finite MDP if, in addition,
● The state and action spaces are finite,
Example: Recycling Robot
● Actively search for a can,
● Remain stationary and wait for someone to bring it a can,
● Go back to its home base to recharge its battery,
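One way to write this example down as a finite MDP, following the usual textbook version: the state is the battery level (`high` or `low`), `recharge` is only available when the battery is `low`, and searching may drain the battery. The probabilities `ALPHA` and `BETA` and the reward values are hypothetical numbers chosen for illustration, not taken from the slides.

```python
import random

# Hypothetical parameters (assumptions, not from the slides):
# ALPHA / BETA = probability the battery level is unchanged after searching from high / low.
ALPHA, BETA = 0.9, 0.6
R_SEARCH, R_WAIT = 2.0, 1.0   # assumed expected cans collected per step
R_RESCUE = -3.0               # penalty if the battery runs out and the robot must be rescued

# p[state][action] = list of (probability, next_state, reward) outcomes
p = {
    "high": {
        "search": [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
        "wait":   [(1.0, "high", R_WAIT)],
    },
    "low": {
        "search":   [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUE)],
        "wait":     [(1.0, "low", R_WAIT)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

def step(state, action, rng):
    """Sample (next_state, reward) from p(s', r | s, a); uses only the current (s, a)."""
    outcomes = p[state][action]
    weights = [prob for prob, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights)[0]
    return next_state, reward

# Sanity check: every (s, a) row must be a probability distribution
for s, actions in p.items():
    for a, outcomes in actions.items():
        assert abs(sum(prob for prob, _, _ in outcomes) - 1.0) < 1e-12

rng = random.Random(1)
state = "high"
for t in range(5):
    action = "recharge" if state == "low" else "search"
    next_state, reward = step(state, action, rng)
    print(t, state, action, "->", next_state, reward)
    state = next_state
```

The `step` function looks only at the current `(state, action)` pair, which is exactly what the Markov property requires.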

Value Functions & Bellman Equations
What does solving an RL task mean?!
● Finding a policy
○ That achieves a lot of reward
■ Over the long RUN!
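For reference, the quantities behind this slide in standard notation (assumed, as the slide gives no formulas): the discounted return G_t with discount factor γ ∈ [0, 1], the Bellman equation for the state-value function v_π of a policy π, and the Bellman optimality equation for v_*. Here π(a | s) is the probability that π selects a in s, and p(s', r | s, a) is the environment's one-step dynamics.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
         = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma\, v_\pi(s') \right]

v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma\, v_*(s') \right]
```

A policy that is greedy with respect to v_* (picking the maximizing a in every state) achieves the most reward over the long run.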

Dynamic Programming
● Use value functions,
● To organize and structure the search,
● For GOOD POLICIES!
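A minimal sketch of one DP method, value iteration, which turns the Bellman optimality equation into a repeated update and then reads off a greedy policy. It uses a recycling-robot MDP in the usual textbook form; `ALPHA`, `BETA`, the rewards and `GAMMA` are hypothetical values, and DP needs all of them because it assumes a complete model of the environment.

```python
# Hypothetical parameters (assumptions, not from the slides)
ALPHA, BETA = 0.9, 0.6
R_SEARCH, R_WAIT, R_RESCUE = 2.0, 1.0, -3.0
GAMMA = 0.9

# p[state][action] = list of (probability, next_state, reward) outcomes
p = {
    "high": {
        "search": [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
        "wait":   [(1.0, "high", R_WAIT)],
    },
    "low": {
        "search":   [(BETA, "low", R_SEARCH), (1 - BETA, "high", R_RESCUE)],
        "wait":     [(1.0, "low", R_WAIT)],
        "recharge": [(1.0, "high", 0.0)],
    },
}

def q_value(V, outcomes, gamma=GAMMA):
    """One-step lookahead: sum over (s', r) of p(s', r | s, a) * (r + gamma * V[s'])."""
    return sum(prob * (r + gamma * V[s2]) for prob, s2, r in outcomes)

def value_iteration(p, gamma=GAMMA, theta=1e-10):
    V = {s: 0.0 for s in p}
    while True:
        delta = 0.0
        for s, actions in p.items():
            best = max(q_value(V, out, gamma) for out in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update of the state value
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(actions, key=lambda a: q_value(V, actions[a], gamma))
              for s, actions in p.items()}
    return V, policy

V, policy = value_iteration(p)
print({s: round(v, 3) for s, v in V.items()}, policy)
```

With these assumed numbers the greedy policy searches when the battery is high and recharges when it is low; other parameter choices can change that.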

Monte Carlo Methods
● Used in an algorithm that mimics policy iteration,
○ Policy Evaluation,
■ Average the returns observed after each (s,a) ==> Q
○ Policy Improvement,
■ Next policy is greedy with respect to Q,
● Given s, the new policy returns the a that maximizes Q(s, · )
● Works for episodic tasks ONLY!
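A minimal sketch of Monte Carlo control with exploring starts, assuming a hypothetical four-stage "stop or go" task (not from the slides): `stop` ends the episode with reward 0.5, `go` moves to the next stage, and `go` at the last stage ends the episode with reward 1.0. Every episode terminates, which is why the method applies. Q is the sample average of observed returns, and after each update the policy is made greedy with respect to Q. Unlike DP, no transition model is used, only sampled episodes.

```python
import random
from collections import defaultdict

# Hypothetical episodic task (an assumption, not from the slides): stages 0..3.
N_STAGES = 4
ACTIONS = ("stop", "go")
GAMMA = 0.9

def env_step(stage, action):
    """Return (next_stage, reward, done) for the toy task."""
    if action == "stop":
        return None, 0.5, True
    if stage == N_STAGES - 1:
        return None, 1.0, True
    return stage + 1, 0.0, False

def mc_exploring_starts(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)      # Q[(s, a)]: sample average of returns after (s, a)
    counts = defaultdict(int)   # number of returns averaged into each Q[(s, a)]
    policy = {s: rng.choice(ACTIONS) for s in range(N_STAGES)}
    for _ in range(episodes):
        # Exploring start: every (s, a) pair has a chance to begin an episode
        s, a = rng.randrange(N_STAGES), rng.choice(ACTIONS)
        episode = []
        while True:
            s_next, r, done = env_step(s, a)
            episode.append((s, a, r))
            if done:
                break
            s, a = s_next, policy[s_next]
        # Walk the episode backwards, accumulating the discounted return g
        g = 0.0
        for s, a, r in reversed(episode):
            g = GAMMA * g + r
            # Policy evaluation: incremental average of returns (each stage is
            # visited at most once per episode, so this is first-visit MC)
            counts[(s, a)] += 1
            Q[(s, a)] += (g - Q[(s, a)]) / counts[(s, a)]
            # Policy improvement: greedy with respect to Q
            policy[s] = max(ACTIONS, key=lambda act: Q[(s, act)])
    return Q, policy

Q, policy = mc_exploring_starts()
print(policy)
```

With γ = 0.9, continuing from stage s is worth 0.9^(3-s) (0.729 at stage 0) against 0.5 for stopping, so the learned policy should pick `go` at every stage.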

