Duality between OOP and RL
Kwanghee Choi
Local Optima 2019
Contents
I. OOP Perspective
A. Characteristics of Objects
B. Good Objects
C. Object State
D. Object Behavior
II. RL Perspective
A. Agent and Environment
B. Reward and Action
C. History and State
D. Markov Property
III. Dual Perspective
A. Feedback Loop with Messages
B. States
C. Humankind Behind the Duality
I. OOP Perspective
Reference
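- 객체지향의 사실과 오해: 역할, 책임, 협력 관점에서 본 객체지향 (조영호, 2015)
[Facts and Misconceptions of Object Orientation: Object Orientation from the
Perspective of Roles, Responsibilities, and Collaboration (Cho Young-ho, 2015)]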
https://wikibook.co.kr/object-orientation/
- Summary of the book above (Kwanghee Choi, 2019)
https://juice500ml.github.io/software_design/2019/02/16/The-Essence-of-Object-Orientation.html
- Note: The following content draws heavily on both references.
A. Characteristics of Objects
● Real-world objects are passive. Software objects are active.
They can do much more than real-world objects.
They act as if they were living beings (anthropomorphism).
● Real-world objects are just metaphors for software objects,
minimizing the representational gap.
● Humans think and decide autonomously.
Objects encapsulate states and behaviors to act autonomously.
● Humans make promises to collaborate for a common goal.
Objects message each other to collaborate for a single functionality.
B. Good Objects
● An object should be able to cooperate via messages, like an open port.
● An object should be autonomous, with its own principles and control.
● To ensure openness and autonomy, an object has
behavior (the way it collaborates with other objects)
and state (the data inside the object that its behaviors need).
● OO is not about classes. It is about autonomous objects messaging each
other. It is about maintaining collaborations between roles with
responsibilities. Classes are just tools to implement those.
C. Object State
● State is the total information that the object has at a specific time.
● State is an abstraction of all the previous behaviors, which reduces the
complexity of the real world.
● An object has, and should have, full control over its own state, hence the
autonomy. State and behavior are bound into one unit: an object.
D. Object Behavior
● Behavior is what an object does in response to incoming messages.
● Behavior changes state (side effect), and behavior depends on the state.
● Behavior is the only way for an object to participate in collaborations.
● State Encapsulation: Only behaviors are visible, states are invisible (from
the outside). The only way to manipulate its states is via behaviors.
● As the object becomes more autonomous, it gets more intelligent.
In other words, collaboration gets more flexible and concise.
● Behaviors are either queries, which read the state of the object (getters),
or commands, which change the state of the object (setters).
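As a minimal sketch of state encapsulation and the query/command split described above: the `Thermostat` class, its fields, and its method names are hypothetical and chosen only for illustration.

```python
class Thermostat:
    """Hypothetical object: state is hidden, behaviors are the public surface."""

    def __init__(self, target: float) -> None:
        self._target = target    # state, private by convention
        self._heating = False    # state that behaviors read and change

    def is_heating(self) -> bool:
        """Query: reads the state without changing it."""
        return self._heating

    def report_temperature(self, measured: float) -> None:
        """Command: the object decides for itself how its state changes."""
        self._heating = measured < self._target


thermostat = Thermostat(target=20.0)
thermostat.report_temperature(18.5)   # message in, state changes as a side effect
print(thermostat.is_heating())        # True
thermostat.report_temperature(21.0)
print(thermostat.is_heating())        # False
```

The caller never sets `_heating` directly. It only sends messages, and the object keeps control over what those messages do to its state.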
II. RL Perspective
Reference
- UCL COMPGI13 Reinforcement Learning (David Silver, 2015)
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
- Reinforcement Learning: An Introduction, 2nd edition (Richard S. Sutton
and Andrew G. Barto, 2018)
http://incompleteideas.net/book/the-book-2nd.html
- Note: The following content draws heavily on both references.
A. Agent and Environment
● Reinforcement Learning trains the agent to choose actions
so as to maximize the reward received from the environment.
● At each step t,
the agent executes action $A_t$
and receives observation $O_t$ and reward $R_t$;
the environment receives $A_t$
and emits $O_{t+1}$ and $R_{t+1}$.
● The agent's actions affect the environment,
and therefore affect the subsequent data it receives.
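The step-by-step exchange above can be sketched as a small loop. This is an illustrative toy under stated assumptions: the `Corridor` environment, its `reset`/`step` methods, and the fixed `policy` are hypothetical and not taken from the referenced lectures.

```python
from typing import Tuple


class Corridor:
    """Hypothetical environment: a 1-D corridor with a goal at the right end."""

    def __init__(self, length: int = 5) -> None:
        self._length = length
        self._position = 0

    def reset(self) -> Tuple[int, float]:
        """Emit the first observation and reward (O_1, R_1)."""
        self._position = 0
        return self._position, 0.0

    def step(self, action: int) -> Tuple[int, float]:
        """Receive A_t, then emit O_{t+1} and R_{t+1}."""
        self._position = min(max(self._position + action, 0), self._length - 1)
        reward = 1.0 if self._position == self._length - 1 else 0.0
        return self._position, reward


def policy(observation: int) -> int:
    """Hypothetical fixed policy: always move one cell to the right."""
    return +1


env = Corridor()
observation, reward = env.reset()
total_reward = 0.0
for t in range(1, 5):
    action = policy(observation)             # agent executes A_t
    observation, reward = env.step(action)   # environment emits O_{t+1}, R_{t+1}
    total_reward += reward

print(observation, total_reward)  # 4 1.0
```

Because each action moves the agent, the next observation depends on what the agent did, which is the feedback described in the last bullet.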
B. Reward and Action
● A reward $R_t$ is a scalar feedback signal,
which indicates how well the agent is doing at step t.
● Sequential Decision Making is selecting actions
to maximize total future reward.
● Actions may have long term consequences, and rewards may be delayed.
● It may be better to sacrifice short-term reward to gain more long-term reward.
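A small worked example of the last point, with made-up numbers: the two plans and their reward sequences below are hypothetical, and the total is a plain undiscounted sum.

```python
# Hypothetical reward sequences for two plans over three steps.
plans = {
    "greedy": [1.0, 0.0, 0.0],    # take a small reward immediately
    "patient": [-1.0, 0.0, 5.0],  # pay a cost now, collect a delayed reward
}


def total_reward(rewards):
    """Total future reward of a plan (undiscounted sum, for simplicity)."""
    return sum(rewards)


# Judging by the first reward alone favors one plan...
best_first_step = max(plans, key=lambda name: plans[name][0])
# ...while judging by total future reward favors the other.
best_overall = max(plans, key=lambda name: total_reward(plans[name]))

print(best_first_step, best_overall)  # greedy patient
```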
C. History and State
● History $H_t$ is all observable variables up to time t,
i.e. the sequence of observations, actions, and rewards up to time t:
$H_t = O_1, R_1, A_1, \dots, O_{t-1}, R_{t-1}, A_{t-1}, O_t, R_t$
● State $S_t$ is a function, or summary, of the history: $S_t = f(H_t)$.
● State is the information used to determine what to do.
● Depending on the history/state, the agent selects actions,
and the environment selects observations and rewards.
● The environment and the agent each have their own state:
the environment state $S_t^e$ and the agent state $S_t^a$.
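To make $S_t = f(H_t)$ concrete, here is a minimal sketch in which the history is a list of (observation, reward, action) steps and two different summary functions play the role of f. The data and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# One history entry is (O_k, R_k, A_k). The last action is None because
# A_t has not been chosen yet when H_t ends with O_t, R_t.
Step = Tuple[int, float, Optional[int]]

history: List[Step] = [(0, 0.0, 1), (1, 0.0, 1), (2, 1.0, None)]


def latest_observation(history: List[Step]) -> int:
    """One possible f: keep only the most recent observation."""
    return history[-1][0]


def reward_so_far(history: List[Step]) -> float:
    """Another possible f: keep only the accumulated reward."""
    return sum(reward for _, reward, _ in history)


print(latest_observation(history), reward_so_far(history))  # 2 1.0
```

Both functions are valid states in the sense of the slide. Which one is useful depends on whether it keeps the information needed to decide what to do next.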
D. Markov Property
● A state $S_t$ is Markov iff
$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \dots, S_t)$; in other words,
the future does not depend on the past given the present.
● The state is a sufficient statistic of the future,
which captures all relevant information from the history.
Therefore once the state is known, the history may be thrown away.
● Full observability is achieved when the agent directly observes
the environment state: $O_t = S_t^e = S_t^a$.
● Full observability is necessary for a Markov Decision Process (MDP);
without it, the problem is a partially observable MDP (POMDP).
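A toy illustration of the Markov property, assuming a hypothetical deterministic system that moves at constant velocity. Position alone is not a Markov state here, while the pair (position, velocity) is.

```python
def next_position(positions):
    """Hypothetical dynamics: constant velocity, inferred from the last two positions."""
    velocity = positions[-1] - positions[-2]
    return positions[-1] + velocity


moving_right = [0, 1, 2]
moving_left = [4, 3, 2]

# With state = last position, both histories collapse to the same state (2),
# yet their futures differ, so that state is not Markov.
print(next_position(moving_right), next_position(moving_left))  # 3 1


def state_of(positions):
    """A richer summary of the history: (position, velocity)."""
    return positions[-1], positions[-1] - positions[-2]


def advance(state):
    """The next state computed from the current state alone, no history needed."""
    position, velocity = state
    return position + velocity, velocity


print(advance(state_of(moving_right)))  # (3, 1)
print(advance(state_of(moving_left)))   # (1, -1)
```

Once `state_of` has been computed, the list of past positions can be discarded without changing the prediction, which is the sense in which the history "may be thrown away".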
III. Dual Perspective
A. Feedback Loop with Messages
● Agent and environment are two objects affecting each other,
alternating between being caller and callee.
● A message is the only way for the caller to manipulate the callee.
Therefore, action is the only way for the agent to manipulate the
environment into returning the maximum reward.
Conversely, observation and reward are the only way for the environment to
manipulate the agent.
● Only the observation and the reward are visible to the agent.
The environment state has to be deduced from them.
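The message-passing view can be sketched with two objects that each hide their own state. Everything here is hypothetical: the `Bandit` environment, its payout values, and the `Learner` agent are illustrative, and the observation is omitted so that reward is the only message flowing back.

```python
from typing import List, Optional


class Bandit:
    """Hypothetical environment: the payout table is private state."""

    def __init__(self) -> None:
        self._payouts = [0.2, 0.8]   # never exposed directly to the agent

    def step(self, action: int) -> float:
        """Message from the agent (action); the reply is the reward."""
        return self._payouts[action]


class Learner:
    """Hypothetical agent: its estimates are deduced only from rewards."""

    def __init__(self, n_actions: int) -> None:
        self._estimates: List[Optional[float]] = [None] * n_actions
        self._last_action = 0

    def act(self) -> int:
        """Try each action once, then repeat the best one seen so far."""
        for action, estimate in enumerate(self._estimates):
            if estimate is None:
                self._last_action = action
                return action
        self._last_action = max(
            range(len(self._estimates)), key=lambda a: self._estimates[a]
        )
        return self._last_action

    def receive(self, reward: float) -> None:
        """Message from the environment (reward) updates the agent state."""
        self._estimates[self._last_action] = reward


env, agent = Bandit(), Learner(n_actions=2)
actions, rewards = [], []
for _ in range(4):
    action = agent.act()        # agent is the caller, environment the callee
    reward = env.step(action)
    agent.receive(reward)       # roles swap: the environment's reply drives the agent
    actions.append(action)
    rewards.append(reward)

print(actions)  # [0, 1, 1, 1]
print(rewards)  # [0.2, 0.8, 0.8, 0.8]
```

Neither object reads the other's underscore-prefixed fields. The agent ends up favoring action 1 only because of the rewards it was sent.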
B. States
● State determines the action of the agent.
● State is the summary of the previous interaction history,
or abstraction of all the previous behaviors.
● If the state fails to summarize the history, it loses the Markov property,
and the object ends up depending on information outside its own knowledge.
C. Humankind Behind the Duality
● OOP reflects the innate human ability to see the world as
a set of independent and perceivable objects.
● RL is an idealized computational model of
humans learning from interactions with the environment.