Reinforcement learning


The document discusses reinforcement learning concepts such as states, actions and rewards, and how reinforcement learning agents learn optimal policies through trial-and-error interaction with an environment by maximizing rewards. It uses the card game Exploding Kittens as a running example: the agent receives different rewards for drawing certain cards and learns which moves maximize long-term reward. The document also contrasts reinforcement learning with supervised learning and other machine learning techniques.


Reinforcement Learning: The Exploding Kittens Edition
Tarek Amr

Why Reinforcement Learning?
I learned, after playing many times, that I'm more likely to win if I play this move after that one.
No one kept telling me to make this or that move!

States, Actions and Rewards
[Diagram: S_t → A_t → S_{t+1} → A_{t+1} → S_{t+2} → … → Goal State, with reward R]

What's a good reward?
If getting an Exploding Kitten card gives me a reward of -1, what reward do I get for a Defuse card? And for a Nope card?

From Rewards, States Get Values
And from values come policies!
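As a minimal sketch of the last step (values to policy), assume we already have action values Q(s, a); a greedy policy then simply picks the highest-valued action in each state. (Going from state values V alone to a policy would also require knowing where each action leads.) The state names, actions and numbers below are hypothetical illustrations, not values from the talk.

```python
# Hypothetical action values: (state, action) -> estimated long-term reward.
# State and action names are illustrative only.
Q = {
    ("has_defuse", "draw"): 0.4,
    ("has_defuse", "skip"): 0.1,
    ("no_defuse", "draw"): -0.6,
    ("no_defuse", "skip"): 0.2,
}

def greedy_policy(q_table):
    """Map every state to the action with the highest estimated value."""
    best = {}
    for (state, action), value in q_table.items():
        # Keep the action if it is the first seen for this state or beats the current best.
        if state not in best or value > q_table[(state, best[state])]:
            best[state] = action
    return best

print(greedy_policy(Q))  # {'has_defuse': 'draw', 'no_defuse': 'skip'}
```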

A state has a value (V)
[Diagram: S_t → A_t → S_{t+1} → A_{t+1} → S_{t+2} → … → Goal State, with reward R; values V_t and V_{t+1} attached to S_t and S_{t+1}]
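The value of a state is usually defined as the expected discounted return from that state, $G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$. A minimal sketch of computing the return of one sampled episode follows; the discount γ = 0.9 and the reward sequence are assumptions for illustration. Averaging such returns over many episodes that start in the same state estimates V for that state.

```python
def discounted_return(rewards, gamma=0.9):
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    g = 0.0
    # Walk backwards so each step reuses the later return: G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episode: two neutral turns, then an Exploding Kitten (reward -1).
print(discounted_return([0, 0, -1]))  # approximately -0.81
```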

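Or a state/action pair has a value (Q)
[Diagram: S_t → A_t → S_{t+1} → A_{t+1} → S_{t+2} → … → Goal State, with reward R; values Q_t and Q_{t+1} attached to the pairs (S_t, A_t) and (S_{t+1}, A_{t+1})]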

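Temporal Difference: SARSA (S-A-R-S-A)
[Diagram: S_t → A_t → S_{t+1} → A_{t+1} → S_{t+2} → … → Goal State, with reward R]
$Q_t \leftarrow Q_t + \alpha\,(R_{t+1} + \gamma\,Q_{t+1} - Q_t)$
where $Q_t = Q(S_t, A_t)$ and $Q_{t+1} = Q(S_{t+1}, A_{t+1})$, α is the learning rate and γ the discount factor. The name comes from the quintuple (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}) used in each update.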
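A minimal Python sketch of the SARSA update above, with Q stored as a dictionary keyed by (state, action). The step size α = 0.1, the discount γ = 0.9, the state names and the `terminal` flag (which treats the value after a goal state as 0) are assumptions added for illustration.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, terminal=False):
    """One SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    # At a goal state there is no next action, so the target is just the reward.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
Q[("s1", "draw")] = 0.5
print(sarsa_update(Q, "s0", "draw", 0.0, "s1", "draw"))  # approximately 0.045
```

Because the target uses the action actually taken next (A_{t+1}), SARSA learns the value of the policy it is following, exploration included.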

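Epsilon Greedy: Exploration vs Exploitation
[Diagram: S_t → A_t → S_{t+1} → A_{t+1} → S_{t+2} → … → Goal State, with reward R]
With probability ε the agent tries a random action (exploration); otherwise it takes the action with the highest current Q-value (exploitation). The values are still updated with the SARSA rule:
$Q_t \leftarrow Q_t + \alpha\,(R_{t+1} + \gamma\,Q_{t+1} - Q_t)$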
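A minimal sketch of ε-greedy action selection, assuming Q is a dictionary keyed by (state, action) and that unseen pairs count as 0.0. The state and action names and ε = 0.1 are illustrative assumptions; the `rng` parameter only exists so the randomness can be seeded.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (best Q)."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # exploration
    return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploitation

Q = {("s0", "draw"): 0.2, ("s0", "nope"): 0.7}
print(epsilon_greedy(Q, "s0", ["draw", "nope"], epsilon=0.0))  # nope
```

A common design choice is to start with a large ε and decay it over time, so the agent explores early and exploits more once its Q-values are reliable.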

Deep Q Learning
State Feature 1   State Feature 2   Action   Value
10                20                JUMP     0.5
20                15                DUCK     0.6
15                25                JUMP     0.8
Warning: oversimplification ahead!
This is a Q-table. What if there are too many states and actions?
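When there are too many states to list, the table is replaced by a function that computes Q(s, a) from state features. Deep Q Learning uses a neural network for this; the sketch below swaps in a much simpler linear model, Q(s, a) = w_a · x(s), to show the idea. The feature scaling (dividing by 25), the bias term, the step size and the number of passes are all assumptions for illustration.

```python
# Hypothetical sketch: Q(s, a) = w_a . x(s), one weight vector per action.
ACTIONS = ["JUMP", "DUCK"]

def features(state):
    """Turn a (feature1, feature2) state into a scaled vector with a bias term."""
    f1, f2 = state
    return [f1 / 25.0, f2 / 25.0, 1.0]  # scaling by 25 is an arbitrary choice

weights = {a: [0.0, 0.0, 0.0] for a in ACTIONS}

def q_value(state, action):
    """Approximate Q(state, action) as a dot product instead of a table lookup."""
    return sum(w * x for w, x in zip(weights[action], features(state)))

def update(state, action, target, alpha=0.1):
    """One gradient-descent step that reduces (target - Q(state, action))^2."""
    x = features(state)
    error = target - q_value(state, action)
    weights[action] = [w + alpha * error * xi for w, xi in zip(weights[action], x)]

# Treat the rows of the toy Q-table as regression targets.
table = [((10, 20), "JUMP", 0.5), ((20, 15), "DUCK", 0.6), ((15, 25), "JUMP", 0.8)]
for _ in range(2000):
    for state, action, value in table:
        update(state, action, value)

# Unlike the table, the model also returns an estimate for an unseen state.
print(round(q_value((12, 22), "JUMP"), 2))
```

In an actual RL loop the target would not be a fixed table entry but the temporal-difference target, e.g. $R_{t+1} + \gamma\,Q_{t+1}$ from the SARSA rule.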

MDP, MC and TD
Markov Decision Process:
● You need to know the states and the transitions between them.
Monte Carlo (variance ↑):
● You wait until the episode's end, then re-assign values to the states visited.
● No need to know the states in advance; we sample them from the environment.
Temporal Difference (bias ↑):
● Update on the go, after every step. No need to even have goal states.
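To make the Monte Carlo vs Temporal Difference contrast concrete, here is a minimal sketch of both state-value updates, assuming V is a dictionary and an episode is a list of (state, reward received after leaving that state) pairs. The Monte Carlo variant shown is every-visit MC; α = 0.1, γ = 0.9 and the state names are illustrative assumptions.

```python
def mc_update(V, episode, alpha=0.1, gamma=0.9):
    """Every-visit Monte Carlo: after the episode ends, move V(s) toward the observed return G."""
    g = 0.0
    for state, reward in reversed(episode):  # reward is R_{t+1}, received after leaving state
        g = reward + gamma * g  # actual return, no bootstrapping
        V[state] = V.get(state, 0.0) + alpha * (g - V.get(state, 0.0))
    return V

def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """TD(0): update right after one step, bootstrapping from the current estimate V(s')."""
    target = reward if terminal else reward + gamma * V.get(next_state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))
    return V

episode = [("a", 0.0), ("b", -1.0)]  # hypothetical: the -1 arrives at the end
V_mc = mc_update({}, episode)

V_td = {}
td0_update(V_td, "a", 0.0, "b")
td0_update(V_td, "b", -1.0, None, terminal=True)
```

After this one episode, Monte Carlo has already pushed a negative value back to state `a`, while TD(0), starting from zero estimates, has only changed `b`; TD carries that information back to `a` on later visits, which is where its bias (it trusts its own estimates) and lower variance come from.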

Let's play the RL vs SL game
for (i=0; i<3; i++) {
● Pick a Catawiki problem
● Should it be solved via
○ Reinforcement learning?
○ Supervised learning?
}

Editor's Notes

We expect, in general, that the environment will be nondeterministic; that is, that taking the same action in the same state on two different occasions may result in different next states and/or different reinforcement values. However, we assume the environment is stationary; that is, that the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.
Reinforcement learning differs from the more widely studied problem of supervised learning in several ways. The most important difference is that there is no presentation of input/output pairs. Instead, after choosing an action the agent is told the immediate reward and the subsequent state, but is not told which action would have been in its best long-term interests. To act optimally, the agent must actively gather useful experience about the possible system states, actions, transitions and rewards. Another difference from supervised learning is that online performance is important: the system is often evaluated while it is still learning.
Use cases for RL: if there is path dependence (i.e. the order of your moves matters, as in chess), if you have a budget (e.g. a maximum number of emails to send, or money), or if your decisions select your future training examples (e.g. greedily not bidding on new websites in programmatic advertising will never let you acquire data about them). (via Peter Tegelaar)