Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Robotic Clothing Assistance

Workshop on Practical Bayesian Nonparametrics, NIPS 2016
Nishanth Koganti (1,2), Tomoya Tamei (1), Kazushi Ikeda (1), Tomohiro Shibata (2)
(1) Nara Institute of Science and Technology, Ikoma, Japan
(2) Kyushu Institute of Technology, Kitakyushu, Japan
February 11, 2017

Robotic Clothing Assistance
Aging causes loss of the motor functions needed to perform dexterous tasks.
Goal: Develop a learning framework for humanoid robots to perform clothing assistance.
Challenge: Close interaction of the robot with clothes and the human.
Figure: non-rigid clothing material (left; Ramisa et al., 2011) and varying posture of the human (right; Dan MacLeod Posture Study).

Reinforcement Learning for Clothing Assistance
Markov Decision Process (MDP) formulated with low-dimensional state and policy representations [1].
[1] Tamei, T. et al., “Reinforcement learning of clothing assistance with a dual-arm robot”, in IEEE-RAS Humanoids 2011

Clothing Assistance Framework: Outline

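Clothing Assistance Framework: Policy
Control policy parametrized by via-points [1] of the trajectory.
A finite-difference policy gradient method is used for the policy update:
\[ \frac{\partial \eta(\theta)}{\partial \theta} \approx \frac{r(\theta_i + \Delta\theta) - r(\theta_i - \Delta\theta)}{2\Delta\theta} \]
\[ \theta \leftarrow \theta + \alpha \frac{\partial \eta(\theta)}{\partial \theta} \]
[1] Wada, Y. et al., “Theory for handwriting on minimization principle”, in Biological Cybernetics, 1995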
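The finite-difference update on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the quadratic `toy_reward`, the step size `delta`, the learning rate `alpha` and the iteration count are all assumptions chosen so the example runs on its own.

```python
import numpy as np

def finite_difference_gradient(reward, theta, delta=1e-2):
    """Central-difference estimate of d(reward)/d(theta), one parameter at a time."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = delta
        grad[i] = (reward(theta + step) - reward(theta - step)) / (2.0 * delta)
    return grad

def policy_update(reward, theta, alpha=0.1, delta=1e-2, n_iters=100):
    """Repeated gradient ascent: theta <- theta + alpha * gradient."""
    theta = np.asarray(theta, dtype=float)
    for _ in range(n_iters):
        theta = theta + alpha * finite_difference_gradient(reward, theta, delta)
    return theta

# Hypothetical stand-in for a rollout reward: best when theta equals `target`.
target = np.array([0.5, -1.0, 2.0])
toy_reward = lambda th: -np.sum((th - target) ** 2)

theta_final = policy_update(toy_reward, np.zeros(3))
```

On a real robot each call to `reward` would be a full rollout, so the method needs two rollouts per policy parameter per update, which is why a low-dimensional policy parametrization matters.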

Problem: Adaptive Learning of Clothing Skills
Design of robust motor-skills learning framework is crucial for
real-world implementation on low-cost robots.
Tight coupling with cloth and close proximity to Human.
Optimal policy varies with initial conditions.
Figure: non-rigid clothing material (left; Ramisa et al., 2011) and varying posture of the human (right; Dan MacLeod Posture Study).

Reinforcement Learning in Latent Space
Combining motor-skills learning with dimensionality reduction:
Tractable search space reducing learning time.
Latent space can be modeled to capture task space constraints.
Existing methods rely on linear models or a MAP estimate of the latent space.
Figures: Bitzer et al., 2010 [1]; Luck et al., 2014 [2]
[1] Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning”, in IEEE/RSJ IROS, 2010
[2] Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014

Motor-skill Learning in Latent Spaces
Use Bayesian nonparametric nonlinear dimensionality reduction for efficient learning of clothing skills [1].
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016

Bayesian Gaussian Process Latent Variable Model
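Latent variable model (Titsias et al., 2010 [1]):
\[ y = f(x) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2 I) \]
$y \in \mathbb{R}^D$: observed variable
$x \in \mathbb{R}^Q$ ($Q \ll D$): unknown latent variable
$f: x \to y$: mapping given by a Gaussian process
\[ p(Y \mid X) = \prod_{d=1}^{D} \mathcal{N}(y_d \mid 0, K_{NN} + \beta^{-1} I_N) \]
Figure: graphical model relating x, f, the parameters w, θ, and y.
[1] Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010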
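As a rough illustration of the likelihood p(Y|X) above, the sketch below evaluates its logarithm for given latent points. It assumes a plain RBF kernel with unit variance and lengthscale and a noise precision `beta` of 100; these choices and the function names are hypothetical, and the full Bayesian model additionally integrates over X rather than treating it as fixed.

```python
import numpy as np

def rbf_kernel(X, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix K_NN for latent points X of shape (N, Q)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale ** 2)

def gplvm_log_likelihood(Y, X, beta=100.0):
    """log p(Y|X) = sum over the D output dimensions of log N(y_d | 0, K_NN + I/beta)."""
    N, D = Y.shape
    K = rbf_kernel(X) + np.eye(N) / beta
    L = np.linalg.cholesky(K)
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return (-0.5 * D * N * np.log(2.0 * np.pi)
            - 0.5 * D * log_det
            - 0.5 * np.sum(Y * K_inv_Y))

# Synthetic data only: 6 points, 14 observed dimensions, 2 latent dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 2))
Y_demo = rng.normal(size=(6, 14))
log_lik = gplvm_log_likelihood(Y_demo, X_demo)
```

The Cholesky factor is shared by all D output dimensions because every dimension uses the same covariance matrix.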

BGPLVM: Manifold Learning
Bayesian inference: posterior distribution on the latent space.
\[ p(Y) = \int p(Y \mid X)\, p(X)\, dX \]
Marginalization is made tractable using variational inference [1]:
\[ q(X) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu_n, S_n) \]
\[ \log p(Y) \geq \int q(X) \log p(Y \mid X)\, dX - \int q(X) \log \frac{q(X)}{p(X)}\, dX \]
Automatic dimensionality reduction is possible using an ARD kernel:
\[ k(x, x') = \sigma_f^2 \exp\left( -\frac{1}{2} \sum_{q=1}^{Q} w_q (x_q - x'_q)^2 \right) \]
[1] Titsias, M. K. et al., “Bayesian Gaussian Process Latent Variable Model”, in AISTATS 2010
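The ARD kernel can be written directly from its formula. The sketch below is illustrative only; the function name and the example weights are assumptions. It shows why the weights act as a relevance switch: a latent dimension whose weight is zero has no effect on the kernel value.

```python
import numpy as np

def ard_kernel(X1, X2, weights, signal_variance=1.0):
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_q w_q * (x_q - x'_q)^2)."""
    X1 = np.atleast_2d(np.asarray(X1, dtype=float))
    X2 = np.atleast_2d(np.asarray(X2, dtype=float))
    w = np.asarray(weights, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]          # shape (N1, N2, Q)
    return signal_variance * np.exp(-0.5 * np.sum(w * diff ** 2, axis=2))

# With w = [1, 0] the second latent dimension is switched off.
k_near = ard_kernel([[0.0, 0.0]], [[1.0, 0.0]], weights=[1.0, 0.0])
k_far = ard_kernel([[0.0, 0.0]], [[1.0, 5.0]], weights=[1.0, 0.0])
```

Here `k_near` and `k_far` are identical even though the second coordinate differs by 5, which is the mechanism behind the automatic selection of the latent dimensionality.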

Motor-skills Transfer through Latent Space
BGPLVM model trained on robot joint angles ($\in \mathbb{R}^{14}$) from a kinesthetic demonstration of clothing assistance [1].
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016

Reinforcement Learning in BGPLVM Space
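Apply the Cross Entropy Method (CEM) to perform policy improvement:
\[ \theta^{*} \sim \mathcal{N}(\theta \mid \mu^{*}, \Sigma^{*}) \]
\[ \mu^{*} := \operatorname{mean}(\operatorname{argmax} \theta_{old}), \quad \Sigma^{*} := \operatorname{var}(\operatorname{argmax} \theta_{old}) \]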
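A minimal sketch of the CEM update follows: sample parameters from the current Gaussian, keep the best rollouts, and refit the mean and variance to them. The diagonal covariance, the small variance floor, the iteration count, the fixed seed and the quadratic `toy_reward` are assumptions made for illustration; the defaults of 50 rollouts and 5 elites mirror the experimental setting reported later in the deck.

```python
import numpy as np

def cross_entropy_method(reward, mu, sigma, n_rollouts=50, n_elite=5,
                         n_iters=30, seed=0):
    """Diagonal-Gaussian CEM: refit mean and std to the best samples each iteration."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(n_iters):
        thetas = rng.normal(mu, sigma, size=(n_rollouts, mu.size))
        rewards = np.array([reward(theta) for theta in thetas])
        elite = thetas[np.argsort(rewards)[-n_elite:]]   # highest-reward samples
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6                 # floor keeps sampling alive
    return mu, sigma

# Hypothetical stand-in for a rollout reward.
target = np.array([0.5, -1.0, 2.0])
toy_reward = lambda theta: -np.sum((theta - target) ** 2)

mu_final, sigma_final = cross_entropy_method(toy_reward, np.zeros(3), 2.0 * np.ones(3))
```

CEM needs only reward values, not gradients, so the same loop applies unchanged whether the parameters live in joint space or in a latent space.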
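Represent the policy using a Dynamic Movement Primitive (DMP):
\[ \tau \ddot{x} = K(g - x) - D\dot{x} + (g - x_0) f \]
\[ f(s) = \frac{\sum_i w_i \psi_i(s)\, s}{\sum_i \psi_i(s)}, \quad \text{where } \tau \dot{s} = -\alpha s \]
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016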
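The DMP equations can be integrated numerically to turn a weight vector into a trajectory. The sketch below covers one dimension with simple Euler steps. The Gaussian form of the basis functions, their centers and widths, the gains `K` and `D`, and the values of `alpha`, `dt` and `duration` are assumptions; the slide does not specify them.

```python
import numpy as np

def dmp_rollout(weights, x0=0.0, g=1.0, tau=1.0, K=100.0, D=20.0,
                alpha=4.0, dt=0.001, duration=2.0):
    """Integrate tau*x'' = K(g-x) - D*x' + (g-x0)*f(s), with tau*s' = -alpha*s."""
    weights = np.asarray(weights, dtype=float)
    n_basis = weights.size
    # Assumed basis layout: centers spread along the decaying phase variable s.
    centers = np.exp(-alpha * np.linspace(0.0, 1.0, n_basis))
    widths = n_basis ** 1.5 / centers
    x, xd, s = x0, 0.0, 1.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        psi = np.exp(-widths * (s - centers) ** 2)
        f = (psi @ weights) * s / (np.sum(psi) + 1e-10)
        xdd = (K * (g - x) - D * xd + (g - x0) * f) / tau
        xd += xdd * dt
        x += xd * dt
        s += (-alpha * s / tau) * dt
        trajectory.append(x)
    return np.array(trajectory)

plain = dmp_rollout(np.zeros(10))          # no forcing: pure spring-damper to the goal
shaped = dmp_rollout(50.0 * np.ones(10))   # forcing term reshapes the path
```

Because the forcing term is scaled by the phase `s`, which decays to zero, both trajectories end near the goal `g`; the weights only change the shape of the path, and they are the parameters a policy search method would adjust.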

Reinforcement Learning in BGPLVM Space
Represent the reward function by the distance of the current policy from the desired via-points:
\[ R(\pi(\theta)) = \sum_{i=1}^{n_{dims}} \sum_{j=1}^{n_{via}} \lVert V_{i,j} - \pi_i(\theta, t_{i,j}) \rVert^2 \]
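The via-point term can be computed from a sampled trajectory as sketched below. The array layouts, the function name and the use of linear interpolation to read the trajectory at the via-point times are assumptions. The function returns the summed squared distance exactly as written in the formula, so a search method that maximizes reward would presumably use its negative.

```python
import numpy as np

def via_point_distance(trajectory, times, via_points, via_times):
    """Sum over dimensions i and via-points j of (V[i, j] - pi_i(t[i, j]))^2.

    trajectory: (T, n_dims) sampled policy output, times: (T,) increasing,
    via_points and via_times: (n_dims, n_via).
    """
    total = 0.0
    for i in range(via_points.shape[0]):
        # Read dimension i of the trajectory at its via-point times.
        pi_at_via = np.interp(via_times[i], times, trajectory[:, i])
        total += np.sum((via_points[i] - pi_at_via) ** 2)
    return float(total)

# Toy 2-D trajectory: x(t) = t and y(t) = 2t on [0, 1].
times = np.linspace(0.0, 1.0, 11)
trajectory = np.stack([times, 2.0 * times], axis=1)
via_times = np.array([[0.5], [0.5]])
on_path = via_point_distance(trajectory, times, np.array([[0.5], [1.0]]), via_times)
off_path = via_point_distance(trajectory, times, np.array([[1.0], [1.0]]), via_times)
```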

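Latent Space Controller for Clothing Tasks [1]
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016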

Generalization in Latent Space
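Evaluation: reconstruction error of the latent space, measured by RMS error [1].
Dataset: clothing trajectories for 4 postures, shoulder angle ∈ {65°, 70°, 75°, 80°}.
Figure panels: PCA, GPLVM, BGPLVM.
[1] Koganti, N. et al., “Motor-skill Learning in Latent Spaces for Robotic Clothing Assistance”, in RSJ Annual Conference, 2016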
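To make the evaluation metric concrete, the sketch below computes the RMS reconstruction error for the PCA baseline only: fit on training trajectories, project held-out trajectories into the latent space, map them back and compare. The data here is synthetic and the function name is hypothetical; none of it reproduces the reported results.

```python
import numpy as np

def pca_reconstruction_rmse(Y_train, Y_test, n_latent):
    """RMS error after projecting Y_test onto the top principal directions of Y_train."""
    mean = Y_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
    W = Vt[:n_latent].T                  # (D, n_latent) projection matrix
    Z = (Y_test - mean) @ W              # latent coordinates
    Y_hat = Z @ W.T + mean               # reconstruction in observation space
    return float(np.sqrt(np.mean((Y_test - Y_hat) ** 2)))

# Synthetic 14-dimensional data lying exactly on a 2-D linear subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 14))
Y_all = rng.normal(size=(60, 2)) @ basis + 3.0
Y_train, Y_test = Y_all[:40], Y_all[40:]

rmse_q2 = pca_reconstruction_rmse(Y_train, Y_test, n_latent=2)
rmse_q1 = pca_reconstruction_rmse(Y_train, Y_test, n_latent=1)
```

For GPLVM and BGPLVM the projection and reconstruction steps are replaced by latent-point inference and the GP mean prediction, but the error measure is the same.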

Reinforcement Learning in Latent Space
Apply reinforcement learning in different action spaces with the same formulation and reward function [1].
Parameters: 50 × n_dims basis functions
CEM: 50 rollouts per iteration
Policy update: 5 best rollouts per iteration
[1] Koganti, N. et al., “Bayesian Nonparametric Motor-skill Representations for Efficient Learning of Clothing Assistance”, in Workshop on Practical Bayesian Nonparametrics, NIPS, 2016

Moving forward
Immediate Goal: Latent spaces for Robotics applications:
Auto-regressive prior on latent space to capture task dynamics.
Explicit model of human-robot interaction as constraint.
Ambitious Goal: Combine policy search RL and BGPLVM:
Non-linear dimensionality reduction.
Bayesian and data-efficient learning.
Figures: data-efficient learning and Bayesian inference [1].
[1] Deisenroth, M. P. et al., “Gaussian processes for data-efficient learning in robotics and control”, in IEEE Transactions on PAMI, 2015

Topology Coordinates
To approximate a Markov Decision Process, the relationship between cloth and subject needs to be observed as fully as possible.
Low-dimensional representations are needed for a fast learning time.
Topology Coordinates were introduced to address both requirements.
Concept proposed by Ho and Komura (2009) [1].
Given 2 line segments, the amount of twist (writhe) between them is given by the Gaussian Linking Integral (GLI):
\[ w = \mathrm{GLI}(\gamma_1, \gamma_2) = \frac{1}{4\pi} \int_{\gamma_1} \int_{\gamma_2} \frac{d\gamma_1 \times d\gamma_2 \cdot (\gamma_1 - \gamma_2)}{\lVert \gamma_1 - \gamma_2 \rVert^3} \quad (1) \]
[1] Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum (Eurographics), 2009
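For two straight segments the GLI can be approximated numerically. The sketch below uses a midpoint rule on both segments; the quadrature resolution `n` and the example segments are assumptions, and a closed-form expression could be used in practice instead.

```python
import numpy as np

def gauss_linking_integral(a0, a1, b0, b1, n=200):
    """Midpoint-rule estimate of the GLI between segments a0->a1 and b0->b1."""
    a0, a1, b0, b1 = (np.asarray(p, dtype=float) for p in (a0, a1, b0, b1))
    t = (np.arange(n) + 0.5) / n
    pa = a0 + t[:, None] * (a1 - a0)                   # sample points on segment a
    pb = b0 + t[:, None] * (b1 - b0)                   # sample points on segment b
    cross = np.cross((a1 - a0) / n, (b1 - b0) / n)     # d(gamma1) x d(gamma2), constant
    diff = pa[:, None, :] - pb[None, :, :]             # gamma1 - gamma2, shape (n, n, 3)
    dist3 = np.linalg.norm(diff, axis=2) ** 3
    return float(np.sum((diff @ cross) / dist3) / (4.0 * np.pi))

# Two skew segments: one along x at z = 0, one along y at z = 1.
a_start, a_end = [-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
b_start, b_end = [0.0, -1.0, 1.0], [0.0, 1.0, 1.0]

w = gauss_linking_integral(a_start, a_end, b_start, b_end)
w_reversed = gauss_linking_integral(a_start, a_end, b_end, b_start)
w_coplanar = gauss_linking_integral(a_start, a_end, [3.0, -1.0, 0.0], [3.0, 1.0, 0.0])
```

Reversing the direction of one segment flips the sign of the writhe, and coplanar segments give zero because the triple product in the numerator vanishes.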

Topology Space
The relationship between line segments is defined by the writhe matrix ($T_{n \times m}$).
Given line segments $S_1$, $S_2$ with n, m links, $T_{n \times m}$ is given by:
\[ T_{ij} = \mathrm{GLI}(S_1^i, S_2^j) \]
The parameters writhe, center and density are defined from the writhe matrix and form the Topology Space.
[1] Ho, E. S. L. and Komura, T., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum (Eurographics), 2009
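A writhe matrix can be assembled by evaluating the GLI for every pair of links. The sketch below is illustrative: the chains are made-up polylines, the GLI is approximated with a coarse midpoint rule, and only the total writhe (the sum of all entries) is derived, since the definitions of center and density are not given on the slide.

```python
import numpy as np

def link_gli(a0, a1, b0, b1, n=50):
    """Midpoint-rule GLI between two straight links."""
    t = (np.arange(n) + 0.5) / n
    pa = a0 + t[:, None] * (a1 - a0)
    pb = b0 + t[:, None] * (b1 - b0)
    cross = np.cross((a1 - a0) / n, (b1 - b0) / n)
    diff = pa[:, None, :] - pb[None, :, :]
    return float(np.sum((diff @ cross) / np.linalg.norm(diff, axis=2) ** 3) / (4.0 * np.pi))

def writhe_matrix(chain1, chain2, n=50):
    """T[i, j] = GLI(link i of chain1, link j of chain2); chains are (n_points, 3) arrays."""
    chain1 = np.asarray(chain1, dtype=float)
    chain2 = np.asarray(chain2, dtype=float)
    T = np.zeros((len(chain1) - 1, len(chain2) - 1))
    for i in range(T.shape[0]):
        for j in range(T.shape[1]):
            T[i, j] = link_gli(chain1[i], chain1[i + 1], chain2[j], chain2[j + 1], n)
    return T

# Hypothetical chains: 3 links along x at z = 0, 2 links along y at z = 1.
chain_a = [[-1.5, 0, 0], [-0.5, 0, 0], [0.5, 0, 0], [1.5, 0, 0]]
chain_b = [[0, -1, 1], [0, 0, 1], [0, 1, 1]]

T = writhe_matrix(chain_a, chain_b)
total_writhe = T.sum()
```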

Clothing Assistance Framework: State and Reward
Low-dimensional state representation using Topology Coordinates [1].
Reward given by the distance between the final state and the target state:
\[ r_i = -\lVert s_i^{target} - s_i \rVert \quad (i = 1, 2, 3), \qquad r(s) = \sum_{i=1}^{3} \frac{r_i - \mu_i}{\sigma_i} \]
[1] Ho, E. S. L. et al., “Character motion synthesis by topology coordinates”, in Computer Graphics Forum, 2009

Combining DR and RL
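Policy representation:
\[ a = W(Z^T \Phi) + M\Phi + E\Phi \]
Expectation step: posterior distribution over the latent variables
\[ p_{\theta_{old}}(Z^T \Phi \mid a) = \mathcal{N}\left( C W^T (a - M\Phi),\; C \sigma^2 \operatorname{tr}(\Phi\Phi^T) \right), \quad C = (\sigma^2 I + W^T W)^{-1} \]
Maximization step: compute gradients with respect to the policy parameters
\[ \frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial M}, \quad \frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial W}, \quad \frac{\partial \ln p(a)\, Q_t^{\pi}}{\partial \sigma^2} \]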
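The expectation step has the same form as the latent posterior in standard probabilistic PCA. The sketch below shows that simplified case only: it drops the feature vector Φ, so the mean term MΦ becomes a plain offset `m` and the trace factor disappears. The function name and the synthetic matrices are assumptions, and this is not the full algorithm of Luck et al.

```python
import numpy as np

def ppca_latent_posterior(a, W, m, sigma2):
    """Posterior N(mean, cov) over the latent z for a = W z + m + noise, noise ~ N(0, sigma2 I)."""
    n_latent = W.shape[1]
    C = np.linalg.inv(sigma2 * np.eye(n_latent) + W.T @ W)
    mean = C @ W.T @ (a - m)
    cov = sigma2 * C
    return mean, cov

# Synthetic example: a 14-dimensional action generated from a 2-D latent point.
rng = np.random.default_rng(0)
W_demo = rng.normal(size=(14, 2))
m_demo = rng.normal(size=14)
z_true = np.array([0.7, -0.3])
a_demo = W_demo @ z_true + m_demo

z_mean, z_cov = ppca_latent_posterior(a_demo, W_demo, m_demo, sigma2=1e-6)
```

With very small noise the posterior mean recovers the latent point that generated the action, and the covariance shrinks in proportion to the noise variance.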

DR as Preprocessing for RL
Bitzer et al. (2010) [1]: GPLVM-based latent space encoding task space constraints.
Non-linear dimensionality reduction
Data-efficient learning with GP mapping
Value-function reinforcement learning (TD(0)) applied to a tractable search space.
[1] Bitzer, S. et al., “Using dimensionality reduction in reinforcement learning”, in IEEE/RSJ IROS, 2010

Combining DR and RL
Luck et al. (2014) [1]: Joint learning of latent space and optimal policy.
\[ a = W(Z^T \Phi) + M\Phi + E\Phi \quad (2) \]
PePPER: Expectation-Maximization formulation based on a KL-divergence lower bound.
Probabilistic PCA used as the model for learning the latent space.
[1] Luck, K. S. et al., “Latent space policy search for robotics”, in IEEE/RSJ IROS, 2014

Combining DR and RL
Inverse Kinematics: Planning in joint angle space of highly
redundant robot (20 DOF).
Standing on one leg: Applied to full-humanoid robot and
policy learned from scratch.

Discussion
Robotic Clothing Assistance involves several problems.
Propose use of DR with RL for efficient motor-skills learning.
Future Work
Implement the latent space RL framework within the clothing assistance framework.
Combine real-time state estimation with the motor-skills learning framework.

References
Tamei, Tomoya, et al. “Reinforcement learning of clothing assistance with a dual-arm robot.” Humanoid Robots (Humanoids), 2011 11th IEEE-RAS International Conference on. IEEE, 2011.
Ho, Edmond SL, and Taku Komura. “Character motion synthesis by topology coordinates.” Computer Graphics Forum. Vol. 28. No. 2. Blackwell Publishing Ltd, 2009.
Pohl, William F. “The self-linking number of a closed space curve (Gauss integral formula treated for disjoint closed space curves linking number).” Journal of Mathematics and Mechanics 17 (1968): 975-985.
Miyamoto, Hiroyuki, et al. “A kendama learning robot based on bi-directional theory.” Neural networks 9.8 (1996): 1281-1302.
Koganti, Nishanth, et al. “Cloth dynamics modeling in latent spaces and its application to robotic clothing assistance.” Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015.
Deisenroth, Marc Peter, Dieter Fox, and Carl Edward Rasmussen. “Gaussian processes for data-efficient learning in robotics and control.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 37.2 (2015): 408-423.
Levine, Sergey, et al. “End-to-end training of deep visuomotor policies.” arXiv preprint arXiv:1504.00702 (2015).
