I will introduce a DeepMind paper on the I2A architecture: Imagination-Augmented Agents for Deep Reinforcement Learning.
These slides were presented at the Deep Learning Study group in DAVIAN LAB.
Paper link: https://arxiv.org/abs/1707.06203
1. Imagination-Augmented Agents for Deep Reinforcement Learning
Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing et al.
DeepMind
Presented by Choi Seong Jae
2. Introduction
• Reinforcement learning is a method for solving Markov Decision Process (MDP) problems
• S: a set of states
• A: a set of actions
• P(s′|s, a): the transition function, mapping state-action pairs to a distribution over successor states
• r(s, a, s′) → r: the reward function, mapping state-action-successor-state triples to a scalar reward
π*(s) = argmax_{a ∈ A} Q(s, a)
Q(s, a) = E[ Σ_{t=0}^{∞} γ^t r_t | s, a ]
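The discounted return inside the Q-function above can be sketched numerically. This is only an illustrative single-trajectory estimate (the reward sequence below is a made-up example, not data from the paper):

```python
# Q(s, a) = E[ sum_t gamma^t * r_t | s, a ]: expected discounted return.
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t over one sampled reward trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Made-up Sokoban-style trajectory: three step penalties, then a box lands on a target.
rewards = [-0.1, -0.1, -0.1, 1.0]
print(round(discounted_return(rewards), 4))  # 0.6733
```

Averaging this quantity over many sampled trajectories from (s, a) approximates Q(s, a).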
12. Appendix
• Standard model-free baseline agent
• For Sokoban: a 3-layer CNN with kernel sizes 8x8, 4x4, 3x3, strides of 4, 2, 1, and 32, 64, 64 output channels; the following FC layer has 512 units
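As a sanity check on those layer hyperparameters, each valid (no-padding) convolution shrinks the spatial size as (W − K)/S + 1. A small sketch, assuming an 80×80 input frame (the input resolution is an assumption, not stated on this slide):

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution."""
    return (size - kernel) // stride + 1

size = 80  # assumed input resolution (not given on the slide)
for kernel, stride, channels in [(8, 4, 32), (4, 2, 64), (3, 1, 64)]:
    size = conv_out(size, kernel, stride)
    print(f"{channels} channels, {size}x{size}")
# 32 channels, 19x19
# 64 channels, 8x8
# 64 channels, 6x6
```

The final 64×6×6 feature map is then flattened and fed to the 512-unit FC layer.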
• The rollout encoder LSTM has 512 hidden units (for Sokoban). All rollouts are concatenated into a single vector c_ia of length 2560 (one rollout encoder per action).
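The 2560 figure is consistent with one 512-unit rollout encoding per action, assuming a 5-action Sokoban action set (4 moves plus a no-op is one common convention; the action count is an assumption, not stated on this slide):

```python
hidden_units = 512  # rollout-encoder LSTM hidden size (for Sokoban)
num_actions = 5     # assumed Sokoban action count: 4 moves + no-op
# One rollout encoder per action; their final encodings are concatenated.
code_length = num_actions * hidden_units
print(code_length)  # 2560, matching the slide
```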
13. Appendix
• Sokoban environment
• At every time step, a penalty of -0.1 is applied to the agent
• Whenever the agent pushes a box onto a target, it receives a reward of +1
• Whenever the agent pushes a box off a target, it receives a penalty of -1
• Finishing the level gives the agent a reward of +10, and the episode terminates
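The reward rules above can be summarized in a small per-step sketch; the boolean event flags are hypothetical names for illustration, not the environment's actual API:

```python
def sokoban_reward(box_on_target, box_off_target, level_finished):
    """Per-step Sokoban reward as listed on the slide (flag names are hypothetical)."""
    reward = -0.1                               # time-step penalty, always applied
    reward += 1.0 if box_on_target else 0.0     # pushed a box onto a target
    reward -= 1.0 if box_off_target else 0.0    # pushed a box off a target
    reward += 10.0 if level_finished else 0.0   # solved the level (episode ends)
    return reward

print(sokoban_reward(True, False, False))  # 0.9
```

The constant -0.1 penalty encourages the agent to solve levels in as few steps as possible.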