These are the lecture 6 notes for the JAIST summer school on computational motor control (Hirokazu Tanaka & Hiroyuki Kambara). Lecture video: https://www.youtube.com/watch?v=GHMcx5F0_j8
4. Concept
Reinforcement learning is learning what to do--how to map
situations to actions--so as to maximize a numerical reward
signal. The learner is not told which actions to take, as in most
forms of machine learning, but instead must discover which
actions yield the most reward by trying them.
Reinforcement Learning -An Introduction-
(Sutton & Barto, 1998, MIT Press)
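The idea in the quotation, discovering which actions yield the most reward by trying them, can be illustrated with a small multi-armed bandit. This is only a sketch under stated assumptions: the epsilon-greedy rule, the function name `epsilon_greedy_bandit`, the Bernoulli reward probabilities, and the parameter values are illustrative choices and are not taken from the lecture.

```python
import random

def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values by trial and error (hypothetical example setup).

    reward_probs[a] is the probability that action a pays a reward of 1.
    The learner never sees these probabilities; it only sees rewards.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # how often each action was tried
    values = [0.0] * n_arms  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n_arms)  # explore: try a random action
        else:
            a = max(range(n_arms), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < reward_probs[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean
    return values, counts

if __name__ == "__main__":
    values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("estimated values:", [round(v, 2) for v in values])
    print("times chosen:", counts)
```

No one tells the learner which action is best; the estimates in `values` come only from the rewards it observes after acting.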
17. More efficient TD methods (1)
• n-step TD method: a compromise between the Monte Carlo method and the TD method
• Update rule
  - when n = 1: 1-step TD method
  - when n = T-t: Monte Carlo method
• Error reduction property
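The equations on this slide are not preserved in these notes, so the following is a sketch that assumes the standard n-step TD formulation from Sutton & Barto: the n-step return G = r_{t+1} + γ r_{t+2} + ... + γ^{n-1} r_{t+n} + γ^n V(s_{t+n}), and the update V(s_t) ← V(s_t) + α [G - V(s_t)]. The random-walk task, the function name `n_step_td`, and all parameter values are assumptions chosen for illustration.

```python
import random

def n_step_td(n, episodes=200, alpha=0.1, gamma=1.0, n_states=5, seed=0):
    """n-step TD prediction on a random walk (hypothetical test task).

    States 1..n_states are non-terminal; 0 and n_states + 1 are terminal.
    Reward is 1 on reaching the right terminal state, 0 otherwise.
    """
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)  # terminal values stay at 0
    for _ in range(episodes):
        states = [(n_states + 1) // 2]  # start in the middle
        rewards = [0.0]  # rewards[k] is received on entering states[k]
        T = float("inf")  # episode length, unknown until termination
        t = 0
        while True:
            if t < T:
                s_next = states[t] + rng.choice((-1, 1))
                states.append(s_next)
                rewards.append(1.0 if s_next == n_states + 1 else 0.0)
                if s_next in (0, n_states + 1):
                    T = t + 1
            tau = t - n + 1  # time step whose state is updated now
            if tau >= 0:
                last = min(tau + n, T)
                # Discounted sum of up to n observed rewards.
                G = sum(gamma ** (k - tau - 1) * rewards[k]
                        for k in range(tau + 1, last + 1))
                # Bootstrap from V unless the episode ended first.
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V[1:-1]

if __name__ == "__main__":
    for n in (1, 4):
        print(n, [round(v, 2) for v in n_step_td(n, episodes=1000)])
```

With n = 1 the target uses one reward plus the bootstrapped value, which is the 1-step TD method. When n is at least the remaining episode length (n = T-t), the bootstrap term is never used and the target is the full return, which is the Monte Carlo method.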
34. Applying reinforcement learning to arm reaching movements
• Reaching in the sagittal plane
• 2-link, 6-muscle musculoskeletal model
• Flexion/extension of both the shoulder and elbow joints
• Moving the hand to various target points
[Figure: musculoskeletal arm model in the sagittal plane, with numbered joints (1-2) and muscles (1-6); axes: horizontal direction, vertical direction; gravity g]
Kambara et al. (Neural Networks, 2009)
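As a rough illustration of the geometry of a 2-link arm reaching in a plane, the sketch below computes the hand position from the shoulder and elbow angles and its distance to a target point. It is not the model of Kambara et al.: it ignores the six muscles, the dynamics, and gravity, and the link lengths, angle conventions, and function names are assumptions made for this example.

```python
import math

def hand_position(shoulder, elbow, l1=0.30, l2=0.35):
    """Hand position (horizontal, vertical) of a planar 2-link arm.

    Assumed conventions: angles in radians, shoulder angle measured from
    the horizontal axis, elbow angle measured relative to the upper arm.
    l1 and l2 are hypothetical link lengths in metres.
    """
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y

def distance_to_target(shoulder, elbow, target):
    """Euclidean distance between the hand and a target point (x, y)."""
    x, y = hand_position(shoulder, elbow)
    return math.hypot(target[0] - x, target[1] - y)

if __name__ == "__main__":
    # Arm stretched out horizontally, then with the elbow flexed 90 degrees.
    print(hand_position(0.0, 0.0))
    print(hand_position(0.0, math.pi / 2))
    print(distance_to_target(0.0, math.pi / 2, (0.30, 0.35)))
```

A reaching task of the kind listed above would need a controller that changes the two joint angles until this distance becomes small for each target.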
43. References
• Doya, K., What are the computations of the cerebellum, the basal ganglia and the cerebral cortex?,
Neural Networks, 12, 961-974, (1999).
• Sutton, R. S., Barto, A. G., Reinforcement Learning -An Introduction-, MIT Press, (1998).
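• Kimura, H., Kobayashi, S., An Actor-Critic algorithm using eligibility traces in the Actor: reinforcement
learning under an imperfect Value-Function (in Japanese), Journal of the Japanese Society for Artificial Intelligence, 11, (1996).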
• Williams, R. J., Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement
Learning, Machine Learning, 8, 229-256, (1992).
• Doya, K., An Invitation to Computational Neuroscience (計算論的神経科学への招待, in Japanese), Saiensu-sha, (2007).
• Schultz, W., Dayan, P., Montague, P. R., A Neural Substrate of Prediction and Reward, Science, 275,
1593-1599, (1997).
• Kawagoe, R., Takikawa, Y., Hikosaka, O., Expectation of reward modulates cognitive signals in the basal
ganglia, Nature Neuroscience, 1, 411-416, (1998).
• Samejima, K., Ueda, Y., Doya, K., Kimura, M., Representation of Action-Specific Reward Values in the
Striatum, Science, 310, 1337-1340, (2005).
• Kambara, H., Kim, K., Shin, D., Sato, M., Koike, Y., Learning and generation of goal-directed arm reaching
from scratch, Neural Networks, 22, 348-361, (2009).
• Izawa, J., Shadmehr, R., Learning from Sensory and Reward Prediction Errors during Motor Adaptation,
PLoS Computational Biology, 7, e1002012, (2011).