Contents
I. OOP Perspective
A. Characteristics of Objects
B. Good Objects
C. Object State
D. Object Behavior
II. RL Perspective
A. Agent and Environment
B. Reward and Action
C. History and State
D. Markov Property
III. Dual Perspective
A. Feedback Loop with Messages
B. States
C. Humankind Behind the Duality
References (OOP Perspective)
- 객체지향의 사실과 오해: 역할, 책임, 협력 관점에서 본 객체지향
[The Essence of Object Orientation: Object Orientation from the Perspective
of Roles, Responsibilities, and Collaborations] (조영호, 2015)
https://wikibook.co.kr/object-orientation/
- Summary of the book above (Kwanghee Choi, 2019)
https://juice500ml.github.io/software_design/2019/02/16/The-Essence-of-Object-Orientation.html
- Note: the following contents draw heavily on both of the references above.
A. Characteristics of Objects
● Real-world objects are passive; software objects are active.
They can do far more than their real-world counterparts,
and they act as if they were living beings (anthropomorphism).
● Real-world objects are just metaphors for software objects,
minimizing the representational gap.
● Humans think and decide autonomously.
Objects encapsulate state and behavior to act autonomously.
● Humans make promises to collaborate toward a common goal.
Objects message each other to collaborate on a single functionality.
B. Good Objects
● An object should be able to cooperate via messages, like an open port.
● An object should be autonomous, with its own principles and control.
● To ensure openness and autonomy, an object has
behavior (the ways it can collaborate with other objects)
and state (the data its behaviors need), as sketched below.
● OO is not about classes. It is about autonomous objects messaging each
other, and about maintaining collaborations between roles with
responsibilities. Classes are just tools to implement those.
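A minimal sketch of these ideas (CoffeeMachine and Barista are illustrative
names, not examples from the book): state stays private, and collaboration
happens only through messages.

    class CoffeeMachine:
        """An autonomous object: state is private, behavior is public."""
        def __init__(self, beans_g=500):
            self._beans_g = beans_g  # state: data the behaviors need, hidden

        def brew(self):
            """Behavior: the 'open port' other objects can message."""
            if self._beans_g < 20:
                raise RuntimeError("out of beans")
            self._beans_g -= 20  # the object alone decides how state changes
            return "espresso"

    class Barista:
        """A collaborator: it sends messages, never touches the machine's state."""
        def __init__(self, machine):
            self._machine = machine

        def serve(self):
            return self._machine.brew()  # collaboration via a message, not data

    print(Barista(CoffeeMachine()).serve())  # espresso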
C. Object State
● State is the total information that the object holds at a specific time.
● State is an abstraction of all the previous behaviors, reducing the
complexity of the real world (illustrated below).
● An object has, and should be in full control of, its own state; hence the
autonomy. State and behavior are bound into one unit: an object.
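A sketch with an illustrative Account object: the balance abstracts the
entire history of deposits and withdrawals, so the individual behaviors
can be forgotten.

    class Account:
        def __init__(self):
            self._balance = 0  # state = summary of all previous behaviors

        def deposit(self, amount):
            self._balance += amount

        def withdraw(self, amount):
            if amount > self._balance:
                raise ValueError("insufficient funds")
            self._balance -= amount

    acct = Account()
    acct.deposit(100)
    acct.withdraw(30)
    # The state (balance 70) now stands in for the whole message history.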
D. Object Behavior
● Behavior is what an object does in response to incoming messages.
● Behavior changes state (a side effect), and behavior depends on the state.
● Behavior is the only way for an object to participate in collaborations.
● State encapsulation: only behaviors are visible; states are invisible from
the outside. The only way to manipulate an object's state is via its behaviors.
● As an object becomes more autonomous, it gets more intelligent;
in other words, collaboration gets more flexible and concise.
● Queries read the state of the object (getters);
commands change the state of the object (setters). See the sketch below.
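A sketch of the query/command split on an illustrative Thermostat object;
both go through behaviors, never through the raw state.

    class Thermostat:
        def __init__(self, target_c=20.0):
            self._target_c = target_c  # encapsulated state

        def target(self):
            """Query: reads state, changes nothing (a getter)."""
            return self._target_c

        def set_target(self, target_c):
            """Command: changes state under the object's own rules (a setter)."""
            if not -30.0 <= target_c <= 50.0:
                raise ValueError("target out of range")
            self._target_c = target_c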
References (RL Perspective)
- UCL COMPGI13 Reinforcement Learning (David Silver, 2015)
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
- Reinforcement Learning: An Introduction, 2nd edition
(Richard S. Sutton and Andrew G. Barto, 2018)
http://incompleteideas.net/book/the-book-2nd.html
- Note: the following contents draw heavily on both of the references above.
A. Agent and Environment
● Reinforcement learning trains the agent what to do
so as to maximize the reward received from the environment.
● At each step t, the agent executes action A_t
and receives observation O_t and reward R_t;
the environment receives A_t and emits O_{t+1} and R_{t+1}.
● The agent's actions affect the environment,
and therefore affect the subsequent data it receives.
A minimal version of this loop is sketched below.
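A minimal, runnable sketch of the loop (GuessEnv, RandomAgent, and their
interfaces are illustrative assumptions, not from the lectures):

    import random

    class GuessEnv:
        """Toy environment: reward 1 when the action hits a hidden target."""
        def __init__(self):
            self._target = random.randint(0, 9)

        def reset(self):
            return 0  # initial observation O_1

        def step(self, action):
            reward = 1.0 if action == self._target else 0.0
            return action, reward  # observation O_{t+1} and reward R_{t+1}

    class RandomAgent:
        """Toy agent: ignores its inputs and acts at random."""
        def act(self, observation, reward):
            return random.randint(0, 9)  # action A_t

    def interaction_loop(env, agent, steps=100):
        total, observation, reward = 0.0, env.reset(), 0.0
        for _ in range(steps):
            action = agent.act(observation, reward)  # agent executes A_t
            observation, reward = env.step(action)   # env emits O_{t+1}, R_{t+1}
            total += reward
        return total

    print(interaction_loop(GuessEnv(), RandomAgent()))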
B. Reward and Action
● A reward R_t is a scalar feedback signal,
which indicates how well the agent is doing at step t.
● Sequential decision making is selecting actions
to maximize total future reward (computed below).
● Actions may have long-term consequences, and rewards may be delayed.
● It may be better to sacrifice in the short term to gain in the long term.
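One standard way to make "total future reward" precise uses a discount
factor gamma, which the slide does not introduce; a minimal sketch:

    def discounted_return(rewards, gamma=0.99):
        """G = R_1 + gamma*R_2 + gamma^2*R_3 + ...: discounting trades off
        immediate rewards against delayed ones."""
        g = 0.0
        for r in reversed(rewards):
            g = r + gamma * g
        return g

    # Sacrificing in the short term (-1 now) still wins if the delayed
    # reward is large enough:
    print(discounted_return([-1.0, 0.0, 0.0, 10.0]))  # ~8.70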
C. History and State
● History H_t is all observable variables up to time t,
i.e. the sequence of observations, rewards, and actions up to time t:
H_t = O_1, R_1, A_1, ..., O_{t-1}, R_{t-1}, A_{t-1}, O_t, R_t
● State S_t is a function, or a summary, of the history: S_t = f(H_t).
● State is the information used to determine what to do.
● Depending on the history/state, the agent selects actions,
and the environment selects observations and rewards.
● The environment state S_t^e is the environment's private representation;
the agent state S_t^a is the agent's internal summary, S_t^a = f(H_t).
One possible f is sketched below.
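A sketch of one possible f (the fixed-window choice is an illustrative
assumption): the agent state keeps only the k most recent observations
and forgets the rest of the history.

    from collections import namedtuple

    Step = namedtuple("Step", "observation reward action")

    def last_k_observations(history, k=4):
        """One possible f: S_t^a = f(H_t) keeps only the k latest observations,
        compressing the full history into a fixed-size state."""
        return tuple(step.observation for step in history[-k:])

    history = [Step(o, 0.0, a) for a, o in enumerate("abcdefg")]
    print(last_k_observations(history))  # ('d', 'e', 'f', 'g')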
D. Markov Property
● A state S_t is Markov iff P(S_{t+1} | S_t) = P(S_{t+1} | S_1, S_2, ..., S_t);
in other words, the future does not depend on the past given the present.
● The state is a sufficient statistic of the future,
which captures all relevant information from the history.
Therefore, once the state is known, the history may be thrown away.
● Full observability is achieved when the agent directly observes the
environment state: O_t = S_t^e = S_t^a. (A toy check follows below.)
● Full observability is what makes the problem a Markov Decision Process (MDP).
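A toy empirical check, assuming a simple random walk (not an example from
the lectures): if the current position is the state, conditioning on extra
history should not change the next-state distribution.

    import random
    from collections import Counter

    # Toy chain: S_{t+1} = S_t + e, with e uniform on {-1, +1}.
    # The current position is Markov: given S_t, older states add nothing.
    def walk(steps):
        s, path = 0, [0]
        for _ in range(steps):
            s += random.choice((-1, 1))
            path.append(s)
        return path

    # Estimate P(S_3 = 1 | S_2 = 0) and P(S_3 = 1 | S_1 = 1, S_2 = 0).
    given_s, given_hist = Counter(), Counter()
    for _ in range(20000):
        p = walk(3)
        if p[2] == 0:                 # condition on the present, S_2 = 0
            given_s[p[3] == 1] += 1
            if p[1] == 1:             # additionally condition on the past
                given_hist[p[3] == 1] += 1

    print(given_s[True] / sum(given_s.values()))        # both print ~0.5:
    print(given_hist[True] / sum(given_hist.values()))  # the past adds nothing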
A. Feedback Loop with Messages
● The agent and the environment are two objects affecting each other,
alternating between being caller and callee.
● A message is the only way for the caller to manipulate the callee.
Therefore, the action is the only way for the agent to manipulate the
environment into returning the maximized reward;
inversely, the observation and the reward are the only way for the
environment to manipulate the agent.
● Only the observation and the reward are visible to the agent;
the environment state has to be deduced from them.
This duality is sketched below.
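A sketch of the duality in code (all names and dynamics are an illustrative
toy): the environment state is encapsulated, and each side can only reach
the other through messages.

    class Environment:
        """Object view of the environment: its true state is private;
        the agent can reach it only through the step() message."""
        def __init__(self):
            self._state = 0  # environment state S_t^e, invisible to the agent

        def step(self, action):
            self._state += action
            observation = self._state > 0           # the agent sees O, not S^e
            reward = 1.0 if self._state == 0 else -abs(self._state)
            return observation, reward

    class Agent:
        """Object view of the agent: it deduces a state from the messages
        it receives, and answers with an action."""
        def __init__(self):
            self._belief = 0  # agent state S_t^a, built from observations

        def act(self, observation, reward):
            self._belief = 1 if observation else -1
            return -self._belief  # try to push the hidden state back to 0

    env, agent = Environment(), Agent()
    observation, reward = env.step(0)  # bootstrap the loop
    for _ in range(5):
        action = agent.act(observation, reward)  # agent as caller, env as callee
        observation, reward = env.step(action)   # then the roles swap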
B. States
● State determines the action of the agent.
● State is the summary of the previous interaction history,
or, equivalently, an abstraction of all the previous behaviors.
● If the state fails to summarize the history, it loses the Markov property,
and the resulting object comes to depend on information outside its own
knowledge.
C. Humankind Behind the Duality
● OOP: the innate human ability to see the world as
a set of independent and perceivable objects.
● RL: an idealized computational model of
humans learning from interactions with the environment.