SlideShare ist ein Scribd-Unternehmen logo
1 von 9
Downloaden Sie, um offline zu lesen
ChainerRLの紹介
Chainer Meetup #4
自己紹介
- 藤田康博 / mooopan / muupan
- 2015- Preferred Networks
- Chainerで強化学習を書いている
ChainerRL
- 深層強化学習(deep reinforcement learning)ライブラリ
- 2017/02/20 公開
- GitHub: https://github.com/pfnet/chainerrl
- Preferred Research Blog: https://research.preferred.jp/2017/02/chainerrl/
こういうのが学習できる →
実装済みアルゴリズム
- Deep Q-Network (Mnih et al., 2015)
- Double DQN (Hasselt et al., 2016)
- Normalized Advantage Function (Gu et al., 2016)
- (Persistent) Advantage Learning (Bellemare et al., 2016)
- Deep Deterministic Policy Gradient (Lillicrap et al., 2016)
- SVG(0) (Heese et al., 2015)
- Asynchronous Advantage Actor-Critic (Mnih et al., 2016)
- Asynchronous N-step Q-learning (Mnih et al., 2016)
- Actor-Critic with Experience Replay (Wang et al., 2017) <- NEW!
- etc.
- いっぱい並べているけど共通部分は多い
ChainerRLによる強化学習の流れ
- エージェントが環境とのインタラクションを通じて報酬を最大化する行動を学習する
- 環境(environment)を定義する
環境
行動
観測, 報酬
ChainerRLによる強化学習の流れ
- モデルを定義する
- Q-function:観測 -> 各行動の価値(将来の報酬の和の期待値)
- Policy:観測 -> 行動の確率分布
Distribution: Softmax, Mellowmax, Gaussian
ActionValue: Discrete, Quadratic
ChainerRLによる強化学習の流れ
- エージェントを定義する
- インタラクションさせる
おわりに
- ChainerRL Quickstart Guide
- Jupyter NotebookでQ-functionを定義してDouble DQNでCart Pole Balancingを学習
https://github.com/pfnet/chainerrl/blob/master/examples/quickstart/quickstart.ipynb
- ChainerRLはまだβ版なのでインタフェース等変わる可能性があります
- むしろ積極的に改善していきたいのでぜひご意見ください
- フィードバックください(欲しい機能・アルゴリズムとかでもOK)
ChainerのTrainer
- 今のところ使ってない
- 強化学習においてDatasetとは?iterationとは?
- うまい使い方あったら教えてください

Weitere ähnliche Inhalte

Mehr von mooopan

Clipped Action Policy Gradient
Clipped Action Policy GradientClipped Action Policy Gradient
Clipped Action Policy Gradientmooopan
 
Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017mooopan
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learningmooopan
 
A3Cという強化学習アルゴリズムで遊んでみた話
A3Cという強化学習アルゴリズムで遊んでみた話A3Cという強化学習アルゴリズムで遊んでみた話
A3Cという強化学習アルゴリズムで遊んでみた話mooopan
 
最近のDQN
最近のDQN最近のDQN
最近のDQNmooopan
 
Learning Continuous Control Policies by Stochastic Value Gradients
Learning Continuous Control Policies by Stochastic Value GradientsLearning Continuous Control Policies by Stochastic Value Gradients
Learning Continuous Control Policies by Stochastic Value Gradientsmooopan
 
Trust Region Policy Optimization
Trust Region Policy OptimizationTrust Region Policy Optimization
Trust Region Policy Optimizationmooopan
 
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...Effective Modern C++ Item 24: Distinguish universal references from rvalue re...
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...mooopan
 
"Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning""Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning"mooopan
 

Mehr von mooopan (9)

Clipped Action Policy Gradient
Clipped Action Policy GradientClipped Action Policy Gradient
Clipped Action Policy Gradient
 
Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017
 
Safe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement LearningSafe and Efficient Off-Policy Reinforcement Learning
Safe and Efficient Off-Policy Reinforcement Learning
 
A3Cという強化学習アルゴリズムで遊んでみた話
A3Cという強化学習アルゴリズムで遊んでみた話A3Cという強化学習アルゴリズムで遊んでみた話
A3Cという強化学習アルゴリズムで遊んでみた話
 
最近のDQN
最近のDQN最近のDQN
最近のDQN
 
Learning Continuous Control Policies by Stochastic Value Gradients
Learning Continuous Control Policies by Stochastic Value GradientsLearning Continuous Control Policies by Stochastic Value Gradients
Learning Continuous Control Policies by Stochastic Value Gradients
 
Trust Region Policy Optimization
Trust Region Policy OptimizationTrust Region Policy Optimization
Trust Region Policy Optimization
 
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...Effective Modern C++ Item 24: Distinguish universal references from rvalue re...
Effective Modern C++ Item 24: Distinguish universal references from rvalue re...
 
"Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning""Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning"
 

ChainerRLの紹介