As robots become increasingly affordable, they are used in ever more
diverse areas in order to perform increasingly complex tasks. These
tasks are typically preprogrammed by a human expert. In some cases,
however, this is not feasible -- either because of the inherent
complexity of the task itself or due to the dynamics of the
environment. The only possibility then is to let the robot learn the
task by itself. This learning process usually involves a long training
period in which the robot experiments with its surroundings in order
to learn the desired behavior. If robots have to learn a shared goal
in a group, they should imitate each other in order to reduce their
individual learning time. This thesis considers how robots in a group
can learn to achieve their shared goal and imitate each other in order
to increase the performance and the speed of learning by spreading the
learned knowledge in the group.
To allow for this intertwined learning and imitation, a dedicated
robot architecture has been developed. On the one hand, it fosters
autonomous and self-exploratory learning. On the other hand, it allows
for manipulating the learned knowledge and behavior to account for new
knowledge gathered by the imitation process. Learning of behavior is
achieved by separately learning at two levels of abstraction. At the
higher level, the strategy is learned as a mapping from abstract
states to symbolic actions. At the lower level, the symbolic actions
are grounded autonomously by learned low-level actions.
The imitation approaches presented in this thesis are unique in
that they relax the requirements that have governed multi-robot
imitation so far. They enable robots in a robot group to imitate each
other in a non-obtrusive manner. The robots can thus increase their learning
speed and thereby the overall performance of the group by simply
observing the other group members without requiring them to stick to a
certain communication protocol that would provide the necessary
information. With the presented approach, a robot is able to infer the
behavior that the observed demonstrator is performing and to replay
the beneficial behavior with its own capabilities.
In addition, the presented approaches allow the robots to apply
imitation even if the group is heterogeneous. Normally, the
performance of a group degrades if robots with incompatible
capabilities imitate each other. Capability differences arise if robot
morphologies differ in a robot group. This is the case, for instance,
if robots from different manufacturers form a robot group that has to
achieve shared goals. This thesis presents an approach that is able to
determine similarities and differences between robots. This guides the
robots in a heterogeneous robot group toward imitating those robots
that are most similar to themselves.
Learning and imitation in heterogeneous robot groups
1. Introduction Architecture Imitation in robot groups Conclusion
Learning and imitation in
heterogeneous robot groups
Wilhelm Richert
richert@c-lab.de
Fakultät für Elektrotechnik, Informatik und Mathematik,
Universität Paderborn
22 December 2009
Learning and imitation in heterogeneous robot groups 1 / 58
Motivation
Why do we need learning and imitation?
State of the art
- Off-line learning (mostly population-based)
- Behavior is fixed afterwards
- Examples: Swarmanoid [Dorigo et al., 2006], Symbrion [Baele et al., 2009]
Desired
- On-line learning to react intelligently to unforeseeable events and problems
- Means to benefit from the “redundancy” in group behavior
- Robustness to arbitrary robot groups
The five big challenges in imitation [Dautenhahn and Nehaniv, 2002]
Five big challenges governing successful imitation in multi-robot systems:
- whom: heterogeneous robot groups
- when: concentrate on salient behavior
- what: the results, the actions, or the hidden goals of the imitatee?
- how: the correspondence problem
- how to evaluate: what should be counted as successful imitation?
Thesis objectives
Robots in a group shall be able to
1. combine learning with imitation,
2. recognize and learn observed behavior non-obtrusively, and
3. choose potential imitatees wisely, also in heterogeneous robot groups.
Robot architecture
[Figure: layered robot architecture consisting of motivation layer, strategy layer, and skill layer. Labels: current motivation, perception, action, imitation, choice of the imitatee, request, result.]
Strategy layer
- Inspired by AMPS [Kochenderfer, 2006]
[Figure: processing pipeline of the strategy layer within the robot architecture]
1. raw perception and motivation: $I, \mu_i$
2. perception filtering: $o_t \in I_s$
3. experience: $\langle (o, a, d, \mu_i, f)_{t-N}, \ldots, (o, a, d, \mu_i, f)_t \rangle$
4. abstraction: $s = \xi(o)$
5. heuristics
6. model: $T, R, \gamma$
7. reinforcement learning
8. policy: $\pi$
9. action selection: $a = \pi(s) \in A$
Strategy layer: state abstraction
- The state abstraction function $\xi$ might use any abstraction method supporting
  - insertion of new state observations
  - deletion of old state observations
  - querying the most similar state observation to a new state observation
- Experiments use nearest neighbor
- A region is an abstract state; a state observation is a raw state
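The three operations required of the abstraction method can be illustrated with a small nearest-neighbor sketch. This is only an illustration under assumptions: the class name, the method names, the region labels, and the Euclidean distance are hypothetical and not taken from the thesis.

```python
import math


class NearestNeighborAbstraction:
    """Hypothetical nearest-neighbor state abstraction (assumed design)."""

    def __init__(self):
        # Each entry pairs a raw state observation with its region label.
        self._observations = []

    def insert(self, observation, region):
        """Insert a new state observation that belongs to a region."""
        self._observations.append((tuple(observation), region))

    def delete(self, observation):
        """Delete an old state observation."""
        target = tuple(observation)
        self._observations = [
            (o, r) for o, r in self._observations if o != target
        ]

    def most_similar(self, observation):
        """Return the stored (observation, region) closest to the query."""
        if not self._observations:
            raise ValueError("no state observations stored")
        return min(
            self._observations,
            key=lambda item: math.dist(item[0], observation),
        )

    def abstract(self, observation):
        """Map a raw observation to the region of its nearest neighbor."""
        return self.most_similar(observation)[1]
```

A usage example: after `xi.insert((0.0, 0.0), "near")` and `xi.insert((5.0, 5.0), "far")`, the call `xi.abstract((0.5, 0.2))` returns `"near"`.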
Strategy layer: heuristics
- Heuristics maintain the models so that the same action feels similar in all observations of the same state
- Heuristics may split or merge regions
- Heuristics: transition, failure, reward, simplification, experience
- Example: transition heuristic
Building a policy
- Reinforcement learning with an SMDP:
  $Q(s, a) = R(s, a) + \sum_{s' \in S} P(s' \mid s, a)\, \gamma(s, a, s')\, V^{\pi}(s')$
- Determine the current best policy:
  $V^{\pi}(s) = \max_{a \in A} Q(s, a)$
  $\pi(s) = \arg\max_{a \in A} Q(s, a)$
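A minimal sketch of how a policy could be derived from such a model by repeated sweeps over all state-action pairs. The function name, the dictionary-based model layout, and the fixed number of sweeps are assumptions made for illustration; the slides do not specify how the equations are solved.

```python
def solve_smdp(states, actions, R, P, gamma, sweeps=100):
    """Iterate Q(s,a) = R(s,a) + sum_s' P(s'|s,a) * gamma(s,a,s') * V(s').

    R maps (s, a) to a reward, P maps (s, a) to a dict {s': probability},
    and gamma is a callable returning the transition-dependent discount.
    All three containers are hypothetical stand-ins for the learned model.
    """
    V = {s: 0.0 for s in states}
    Q = {}
    for _ in range(sweeps):
        for s in states:
            for a in actions:
                Q[(s, a)] = R.get((s, a), 0.0) + sum(
                    p * gamma(s, a, s2) * V[s2]
                    for s2, p in P.get((s, a), {}).items()
                )
        # V(s) = max_a Q(s, a)
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
    # pi(s) = argmax_a Q(s, a)
    policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
    return Q, V, policy
```

In an SMDP the discount depends on the transition because actions may take different amounts of time, which is why `gamma` is passed as a function rather than a constant.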
Strategy layer
- The strategy layer requests symbolic actions
- Execution of these actions is up to the skill layer
Skill layer
Tasks
1. discover and learn a set of skills that are useful to the strategy layer (ground the symbols in $A$)
2. execute them when requested and optimize at runtime
Skill
- skill $s = (f_{e_1}, \ldots, f_{e_N})$, where
- error function $f_e : I_a \times I_a \to \mathbb{R}$ assigns an error value to a pair of perceptions $(I(t_i), I(t_j))$
Example: “approach the ball and orient towards it”
- $f_{e_1}(I(t_i), I(t_j)) = d_{ball}(I(t_j))$: minimize the ball distance
- $f_{e_2}(I(t_i), I(t_j)) = |\alpha_{ball}(I(t_j))|$: minimize the ball angle
- $s = (f_{e_1}, f_{e_2})$: approach the ball and orient towards it
Skill layer
Measuring a skill’s progress
- Progress function $f_p : I_a \times I_a \to [0, 1]$ measures a skill’s progress
- For a skill $s = (f_{e_1}, \ldots, f_{e_N})$ it is defined as
  $f_p(I(t_i), I(t_j)) = \begin{cases} 0 & \text{if } C_a < W(I(t_i), I(t_j)) \\ \frac{C_a - W(I(t_i), I(t_j))}{C_a - C_s} & \text{if } C_s \le W(I(t_i), I(t_j)) \le C_a \\ 1 & \text{if } W(I(t_i), I(t_j)) < C_s \end{cases}$
  with $f_{e_k}$: error function, $I(t_i)$: perception when the skill has been started, $I(t_j)$: current perception, success and abort thresholds $C_s \in \mathbb{R}$ and $C_a \in \mathbb{R}$ ($C_s \le C_a$)
- $W(I(t_i), I(t_j)) = \sum_{k=1}^{N} f_{e_k}(I(t_i), I(t_j))$
[Figure: example graph of the progress function for specific thresholds $C_s$ and $C_a$]
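A short sketch of a progress measure built from a skill's error functions. It assumes that progress is 0 above the abort threshold, 1 below the success threshold, and linear in between; the function names and the dictionary-based perceptions are hypothetical. The sketch additionally requires the success threshold to be strictly smaller than the abort threshold to avoid a division by zero.

```python
def accumulated_error(error_functions, start, current):
    """W: sum of all error functions of the skill for a perception pair."""
    return sum(f(start, current) for f in error_functions)


def progress(error_functions, start, current, c_success, c_abort):
    """Piecewise-linear progress in [0, 1] (assumed reading of f_p)."""
    if not c_success < c_abort:
        raise ValueError("c_success must be smaller than c_abort")
    w = accumulated_error(error_functions, start, current)
    if w > c_abort:
        return 0.0  # error too large: the skill would be aborted
    if w < c_success:
        return 1.0  # error small enough: the skill has succeeded
    return (c_abort - w) / (c_abort - c_success)


# Hypothetical skill "approach the ball and orient towards it".
approach_ball = (
    lambda start, current: current["ball_dist"],        # ball distance
    lambda start, current: abs(current["ball_angle"]),  # ball angle
)
```

For instance, with `c_success=0.25` and `c_abort=2.25`, a current perception with ball distance 1.0 and ball angle -0.25 has an accumulated error of 1.25 and therefore a progress of 0.5.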
Imitation
Overview of the approach
- Robots observe each other permanently
- Moving window of observations and well-being states for each observed robot
- Imitation process starts when a well-being improvement is detected
Processing steps:
1. observed episode: $\langle (o^I_1, e_1), \ldots, (o^I_N, e_N) \rangle$
2. transform observations into subjective observation data: $\langle (o^D_1, e_1), \ldots, (o^D_N, e_N) \rangle$
3. interpret behavior, yielding recognized episodes: $\langle \ldots, ((t, o^D, s), a_t, (t', o'^D, s')), \ldots \rangle$
4. estimate rewards, yielding observed interpreted experience: $\langle \ldots, ((t, o^D, s), a_t, r_t, (t', o'^D, s')), \ldots \rangle$
5. integrate into experience, update SMDP
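Imitation
HMM and the Viterbi connection [Viterbi, 1967]
[Figure: hidden Markov model with hidden states $s_a$, $s_b$, $s_c$, transition probabilities $P(s_b \mid s_a)$ and $P(s_c \mid s_a)$, and emission probabilities $P(o_x \mid s_a)$, $P(o_y \mid s_a)$, $P(o_z \mid s_a)$]
$o_1 o_2 \ldots o_T \rightarrow \text{Viterbi} \rightarrow s_1 s_2 \ldots s_T$
$V(s, t) = P(o_t \mid s_t = s) \cdot \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \right]$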
Imitation
Interpreting observed behavior with the imitator’s own knowledge
Knowledge in strategy layer
- Imitator’s own transition probabilities $T(s, a, s')$ instead of “foreign” HMM transition probabilities
Knowledge in skill layer
- Skills (approach ball, approach goal, lift ball) vote on perceptual changes $\Delta o$ (ball distance, goal distance, ball height) via their progress functions $f_p^a$, giving $P(\Delta o \mid a)$
- plus the following heuristics ...
[Figure: state graph annotated with the imitator’s transition probabilities $T(s, a, s')$; below it, each action is connected to a vector of observed changes $\Delta o$ through $P(\Delta o \mid a)$]
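Recognition
1. Recognize observation changes $o_{t-1} \rightarrow o_t$
   a) Prefer nearer goals (ambiguous situation: the robot might drive either to the red or to the yellow goal base)
   b) Ignore skills that “seem to have finished”
   c) Clip votes to $[0, 1]$
   $P_a(o_t \mid o_{t-1}) = \begin{cases} \min\left(\max\left(\frac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_{t-1})}, 0\right), 1\right) & \text{if } f_p^a(o_{t-1}) \le 1 - \epsilon \\ 0 & \text{otherwise} \end{cases}$
2. Recognize actions in the sequence $o_{t_1}^{t_2} = o_{t_1} o_{t_1+\Delta} \ldots o_{t_2}$
   $a_{ml} = \arg\max_a \sum_{t=t_1}^{t_2} P_a(o_t \mid o_{t-1})$
3. Recognize state transitions
   $P(s_t \mid s_{t-1}) = T(s_{t-1}, a_{ml}, s_t)$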
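The voting idea can be sketched in a few lines. The sketch rests on assumptions that should be read as such: a skill's vote is taken to be the progress gained between two observations relative to the progress still missing, skills whose previous progress is within `eps` of 1 are treated as finished, and the votes are summed over the observed sequence. All names are hypothetical.

```python
def vote(progress_prev, progress_now, eps=1e-3):
    """Vote of one skill for an observation change, clipped to [0, 1]."""
    if progress_prev > 1.0 - eps:
        return 0.0  # the skill seems to have finished: ignore it
    relative = (progress_now - progress_prev) / (1.0 - progress_prev)
    return min(max(relative, 0.0), 1.0)


def most_likely_action(progress_traces, eps=1e-3):
    """Pick the action whose skill collects the largest sum of votes.

    progress_traces maps an action name to the list of progress values
    that its skill assigns to the observed sequence.
    """
    def total(action):
        trace = progress_traces[action]
        return sum(vote(p, q, eps) for p, q in zip(trace, trace[1:]))

    return max(progress_traces, key=total)
```

Dividing by the remaining progress favors skills whose goal is close, which is one possible way to realize the "prefer nearer goals" heuristic.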
Evaluation
Recognition scenario: description
- Demonstrator (right robot) has to transport the yellow ball onto the base
- Imitator (left robot) tries to “understand” its observations
- Two scenarios:
  1. Imitator is only able to drive (and thereby push the ball)
  2. Imitator is also able to lift the ball
Evaluation
Recognition scenario: results
1. Without lifting capabilities
[Figure: plot of distance [m] with segments labeled “move to ball”, “???”, and “move to goal”]
- Recognized “drive to ball” (B) and “drive to goal” (G) correctly
- Detected “missing behavior” in between
2. With lifting capabilities
[Figure: plot of distance [m] with segments labeled “move to ball”, “lift the ball”, and “move to goal”]
- Recognized “drive to ball” (B), “lift the ball” (L), and “drive to goal” (G) correctly
Evaluation
Multi-robot scenario “three bases”
- Task: transport objects to goal bases
- Reward for reaching an object: 10
- Goal bases provide different rewards
- State space consists of
  - distance to closest object
  - distance of closest object to closest goal
  - ID of closest goal
Conclusion
Objectives achieved in this thesis
1. Combination of learning and imitation
2. Non-obtrusive recognition and learning
of observed behavior
3. Support for heterogeneous robot
groups
Thank you for your attention!
- Architecture
  - State of the art
  - Overview
  - Layer interaction
  - Motivation layer
    - Excitation
    - Prioritizing goals
  - Strategy layer
    - State abstraction
    - Heuristics
    - Policy
    - Sample frequency
    - Strategy example
  - Skill layer
    - Overview of the approach (explore, exploit)
    - Skill manager
    - Model manager
    - Error minimizer
    - Configuration
    - Skill example
- Imitation in robot groups
  - Overview of the approach
  - Choice of the imitatee
  - Recognizing behavior
    - Viterbi
    - Interpreting observed behavior
    - Recognition example
  - Integrating recognized behavior
  - Evaluation
    - CTF with three bases: performance, state abstraction, group homogeneity
    - CTF with five bases: performance, state abstraction, group homogeneity
- Choice of the imitatee
  - Affordance detection
  - Affordance network generation
  - Comparing ANs
  - Evaluation
    - Parameterization of the environment
    - Robustness experiment
    - Clustering experiment
State of the art
[Takahashi et al., 2008] use imitation to learn robotic soccer behaviors (approaching, shooting a ball)
- combines learning with imitation
- requires the robot group to stop whenever a robot imitates
- needs multiple presentations of the same behavior
- needs sufficient prior knowledge of the task to imitate
[Priesterjahn, 2008] evolves game bots with performance similar to that of the human player
- shows that imitation-based adaptation is able to outperform the purely evolutionary approach
- targeted at computer game scenarios, not stochastic real-world applications
- assumes group homogeneity
[Figure: the rule-based operation cycle of an agent]
[Inamura et al., 2003] combine top-down teaching with bottom-up learning from the robot’s side
- exclusive approach (cannot be combined with other learning techniques)
- HMM is learned and then fixed throughout the robot’s lifetime
[Figures: motion capturing system (motion for learning data); a result of motion generation on a humanoid robot]
Layer interaction
[Figure: sequence diagram between clock, motivation layer, strategy layer, skill layer, perception, and action]
(1) Strategy step is triggered
- Determining the current motivation and the corresponding next strategy action.
- The strategy layer requires the most current motivation as feedback regarding its last chosen action; both are synchronous.
- Diagram sequence: next strategy step event, request $I_m$, processed perception, set next motivation, request $I_s$, processed perception, determine next strategy step
(2) Skill step is triggered
- The strategy step does not have to be finished yet.
- The skill layer simply executes according to the action most recently delivered by the strategy layer.
- Diagram sequence: next skill step event, request $I_a$, processed perception, calculate best actuator command, set next low-level action
(3), (4) Strategy step has finished
- It signals the next action to execute to the skill layer.
- Subsequent skill steps then perform this action accordingly.
- Diagram sequence: set next skill, next skill step event, request $I_a$, processed perception, calculate best actuator command, set next low-level action
Motivation layer
Motivation system example
- The current motivation $\mu$ is the vector to the current drive state, dependent on
  - time
  - perception
- Each drive measures the status of accomplishing a sub-goal (0 = fully accomplished)
- A drive $i$ is called satisfied (goal achieved) if the corresponding motivation is below its threshold: $\mu_i \le \mu_i^{\theta}$
[Figure: drive space spanned by drive 1, drive 2, and drive 3, showing the current drive state, the current motivation, the well-being region, and $\mu_p$, the shortest vector to the desired drive area, used for prioritization]
A sub-goal subjected to an excitation
[Figure: excitation (0 to 1) over time t, with the threshold that triggers behavior and the well-being region below it]
- Excitation describes the force to which the current drive state is subjected.
- By specifying it dependent on the perception and on the internal state of the robot, the user is “programming” the final behavior.
Prioritizing goals
- At each time step, the motivation layer provides the current motivation vector to the strategy layer.
- With $\mu_p$ the strategy layer prioritizes which of the sub-goals are to be handled first:
  $\mu_p = \begin{pmatrix} \max(0, \mu_1 - \mu_1^{\theta}) \\ \max(0, \mu_2 - \mu_2^{\theta}) \\ \vdots \\ \max(0, \mu_n - \mu_n^{\theta}) \end{pmatrix}$
- Different drives can be prioritized by means of an appropriate scaling, modeling a hierarchy of needs
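A minimal sketch of the prioritization vector, assuming that each component is the amount by which a motivation exceeds its threshold and that the scaling between drives is a simple per-drive weight. The function name and the `weights` parameter are hypothetical.

```python
def prioritized_motivation(motivation, thresholds, weights=None):
    """Component-wise max(0, mu_i - threshold_i), optionally scaled.

    Satisfied drives (motivation at or below threshold) contribute 0.
    The weights are an assumed way to express a hierarchy of needs.
    """
    if weights is None:
        weights = [1.0] * len(motivation)
    return [
        w * max(0.0, mu - theta)
        for mu, theta, w in zip(motivation, thresholds, weights)
    ]
```

For example, `prioritized_motivation([0.75, 0.25, 0.5], [0.5, 0.5, 0.5])` yields `[0.25, 0.0, 0.0]`: only the first drive is unsatisfied and therefore has to be handled.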
Strategy layer
Sample frequency
A new interaction is recorded under any of the following conditions:
- Sufficiently different perception (measured by some scenario-specific distance metric $d$): $d(o_{t_1}, o_{t_2}) > \theta_o$
- Sufficiently interesting motivation change: $|\mu_{t_2} - \mu_{t_1}| > \theta_r$
- Enough time has passed: $t_2 - t_1 > \theta_t$
$\theta_o$, $\theta_r$, and $\theta_t$ are application-specific and have to be determined empirically.
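The three sampling conditions can be combined into one predicate. The sketch below is an assumption-laden illustration: it compares the current values against those of the previously recorded interaction, uses the Euclidean distance both as the default perception metric and as the norm of the motivation change, and all names are hypothetical.

```python
import math


def should_record(obs_prev, obs_now, mu_prev, mu_now, t_prev, t_now,
                  theta_o, theta_r, theta_t, distance=math.dist):
    """Return True if a new interaction should be recorded."""
    perception_changed = distance(obs_prev, obs_now) > theta_o
    motivation_changed = math.dist(mu_prev, mu_now) > theta_r
    enough_time_passed = (t_now - t_prev) > theta_t
    return perception_changed or motivation_changed or enough_time_passed
```

Passing a different `distance` callable is one way to plug in the scenario-specific metric mentioned above.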
Strategy example
[Figure: grid-world example with start S and goal G; cells are labeled with coordinates from (1, 1) to (6, 6)]
Skill layer
1. discover and learn a set of skills that are useful to the strategy layer (ground the symbols in $A$)
2. execute them when requested and optimize at runtime
[Figure: data flow in the skill layer, side by side for exploration mode and exploitation mode]
Skill layer
Data flow in exploration mode
[Figure: skill layer in exploration mode. Components: skill manager, model manager, error minimizer, skills. Labels: training mode, notify new skill, explore actions $O$, create, fetch skills, create models, perception $I_a$, action.]
Skill layer
Data flow in exploitation mode
[Figure: skill layer in exploitation mode. Components: skill manager, model manager, error minimizer, skills. Labels: execution mode, request skill, set current skill, update models, fetch current skill, perception $I_a$, action $O$.]
Skill definition
- extraction function $f_{ext} : I_a \to \mathbb{R}$ extracts information from a perception $I(t) \in I_a$
- control function $f_c : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ associates an error value to the tuple $(v_{t_i}, v_{t_j})$
  - decrease: $f_c(v_{t_i}, v_{t_j}) = |v_{t_j}|$
  - increase: $f_c(v_{t_i}, v_{t_j}) = \frac{1}{|v_{t_j}|}$
  - keep value: $f_c(v_{t_i}, v_{t_j}) = |v_{t_i} + \delta - v_{t_j}|$
- error function $f_e : I_a \times I_a \to \mathbb{R}$ assigns an error value to a perception pair
- progress function $f_p : I_a \times I_a \to [0, 1]$ measures a skill’s progress between two time points
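The composition of extraction and control functions into an error function might look like the following sketch. The concrete formulas are assumptions: "decrease" is read as the absolute current value, "increase" as its reciprocal, and "keep value" as the absolute deviation from the start value shifted by an offset `delta`. The function names and the dictionary-based perceptions are hypothetical.

```python
import math


def decrease(v_start, v_now):
    """Error shrinks as the value approaches zero."""
    return abs(v_now)


def increase(v_start, v_now):
    """Error shrinks as the magnitude of the value grows."""
    return math.inf if v_now == 0 else 1.0 / abs(v_now)


def keep_value(delta=0.0):
    """Build a control function that penalizes drifting from the start value."""
    return lambda v_start, v_now: abs(v_start + delta - v_now)


def make_error_function(extract, control):
    """Compose an extraction and a control function into an error function."""
    return lambda p_start, p_now: control(extract(p_start), extract(p_now))
```

For example, `make_error_function(lambda p: p["ball_dist"], decrease)` gives an error function that is minimized by driving towards the ball.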
Skill manager
- exploration phase
  - generate skills that enable the robot to control the perceived properties
  - assign a priority to each skill dependent on its execution priority
  - determine the skills the robot can reliably perform and notify them as new skills to the strategy layer
- exploitation phase
  - manage the execution of requested skills
Model manager
- creating prediction models for each perceived property
  - a prediction model is the tuple $(id_p, \tilde{S}, \tilde{M}, m)$
  - $id_p \in ID_p$: perception feature to be predicted
  - $\tilde{S} \subseteq ID_o \times ID_p$: subset of the perceptual features
  - $\tilde{M} \subseteq O$: subset of the actuators to control
  - $m: \mathbb{R}^{|\tilde{S}| + |\tilde{M}|} \to \mathbb{R}$ predicts the value for the perceptual feature $id_p$ at the next input perception, given the values of $\tilde{S}$ and $\tilde{M}$
  - $m$ in experiments: polynomial (Poly) and radial basis function (RBF) models
- updating prediction models to reflect new experiences
- scoring each model dependent on its prediction accuracy: $\mathrm{score}(m)$ accumulates, over time points $t_i$, the deviation between the prediction $m(\tilde{S}(t_i), \tilde{M}(t_i))$ and the observed value $v$
[Figure: the skill layer diagram in training mode and execution mode, with the model manager creating and updating models.]
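The scoring step can be illustrated with a minimal sketch. The exact score formula is not assumed here: the sketch uses the negative mean squared one-step prediction error over the `k` most recent experiences, which is only one plausible accuracy measure. The names `score`, `exact`, `rough`, and `trace` are hypothetical.

```python
def score(model, experiences, k=10):
    """Score a prediction model on the k most recent experiences.

    experiences: list of (inputs, observed_next_value) pairs, oldest first,
    where inputs holds the values of the selected features and actuators.
    Assumption: accuracy is measured as negative mean squared error,
    so a higher score means a better model.
    """
    recent = experiences[-k:]
    if not recent:
        return float("-inf")
    squared = [(model(inputs) - observed) ** 2 for inputs, observed in recent]
    return -sum(squared) / len(recent)


# Two hypothetical candidate models for the same perceptual feature.
exact = lambda x: 2.0 * x[0] + x[1]
rough = lambda x: x[0]

# Hypothetical experience trace: ((feature value, actuator value), next value).
trace = [((1.0, 0.5), 2.5), ((2.0, 0.0), 4.0), ((0.0, 1.0), 1.0)]

# The best model is the one with the highest score.
best = max([exact, rough], key=lambda m: score(m, trace))
```

With a score defined this way, picking the best model per feature is a plain arg max over the candidate models.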
Error minimizer
1. $I_c(t)$: only those perceptual features on which the error functions of the current skill $s$ depend, at the current time $t$
2. Estimate the next perception, $I_c^M(t^*)$, dependent on the motor action $M$, as predicted by $m^j_{best} = \arg\max_{m^j} \mathrm{score}(m^j)$:
   $I_c^M(t^*) = \{ m^j_{best}(I_c(t), M(t)) \mid p_j \in I_c(t) \}$
3. For each error function $f_e^k$: calculate the expected next error $e_k^M(t^*)$, with $I_c(t_i)$ being the perception when the skill was started:
   $e_k^M(t^*) = f_e^k(I_c(t_i), I_c^M(t^*))$
4. Determine the best actuator command $M_{next}(t)$ by finding the one that minimizes the accumulated expected error:
   $M_{next}(t) = \arg\min_M \sum_{k=1}^{N} e_k^M(t^*)$
$t^*$ is the time point of the next interaction after time $t$
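The four steps above can be sketched as a search over a finite set of candidate actuator commands. This is a minimal sketch under stated assumptions, not the implementation behind the slides: the function `choose_command`, the dictionary-based perception, and the discrete candidate set are all hypothetical.

```python
def choose_command(perception, start_perception, candidates,
                   models, scores, error_fns):
    """Pick the command with the lowest accumulated expected error.

    perception:       dict feature -> value at time t (relevant features only)
    start_perception: dict feature -> value when the skill was started
    candidates:       iterable of candidate actuator commands (assumed finite)
    models:           dict feature -> list of callables m(perception, command)
    scores:           dict feature -> list of scores, aligned with models
    error_fns:        list of callables f_e(start_perception, predicted)
    """
    best_cmd, best_err = None, float("inf")
    for cmd in candidates:
        # Step 2: predict each feature with its best-scoring model.
        predicted = {}
        for feature in perception:
            idx = max(range(len(models[feature])),
                      key=lambda i: scores[feature][i])
            predicted[feature] = models[feature][idx](perception, cmd)
        # Steps 3 and 4: accumulate the expected errors and keep the minimum.
        err = sum(f(start_perception, predicted) for f in error_fns)
        if err < best_err:
            best_cmd, best_err = cmd, err
    return best_cmd, best_err


# Hypothetical "minimize angle" skill with one feature and two models.
models = {"angle": [lambda p, c: p["angle"] - 0.5 * c,   # good model
                    lambda p, c: p["angle"]]}             # poor model
scores = {"angle": [-0.1, -5.0]}
error_fns = [lambda start, pred: abs(pred["angle"])]

cmd, err = choose_command({"angle": 1.0}, {"angle": 1.0},
                          [-1, 0, 1, 2], models, scores, error_fns)
```

A continuous actuator space would need an optimizer or a sampling scheme in place of the plain loop over `candidates`.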
Skill layer configuration
Greater universality leads to a bigger exploration space. It is wise to limit the
exploration space by specifying non-changing parameters beforehand. This can be
achieved by configuring the following parameters:
- Degrees of freedom specify the number of actuators the skill layer has to control.
- Extraction functions define the language that can be used to specify the error functions.
- Control functions specify the functions that the error minimizer will minimize by means of the error functions.
- Regression models are used by the model manager to build predictions for the environment interaction. A regression model consists of two methods: one that fits a model to an experience trace and one that predicts the value of the modeled property.
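The two-method interface of a regression model can be sketched as follows. The class is a hypothetical one-dimensional linear stand-in for the polynomial and RBF models used in the experiments; the method names `fit` and `predict` and the trace layout are assumptions.

```python
class LinearRegressionModel:
    """Hypothetical stand-in for a Poly/RBF model: y = a * x + b."""

    def __init__(self):
        self.a, self.b = 0.0, 0.0

    def fit(self, trace):
        """Fit the model to an experience trace of (x, y) pairs
        using ordinary least squares."""
        n = len(trace)
        mean_x = sum(x for x, _ in trace) / n
        mean_y = sum(y for _, y in trace) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in trace)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in trace)
        # Fall back to a constant model if all inputs are identical.
        self.a = cov_xy / var_x if var_x > 0 else 0.0
        self.b = mean_y - self.a * mean_x
        return self

    def predict(self, x):
        """Predict the value of the modeled property for input x."""
        return self.a * x + self.b


# Hypothetical experience trace: (actuator value, observed property value).
trace = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
model = LinearRegressionModel().fit(trace)
prediction = model.predict(3.0)
```

Any model that offers these two methods could be plugged into the model manager in the same way, which is what makes the regression models a configuration parameter.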
Skill example
“Minimize angle to object” learned with radial basis functions
[Figure, left panel: controlling speed dependent on angle and distance to the object. Right panel: controlling rotational speed dependent on angle and distance to the object.]
Imitation
Viterbi [Viterbi, 1967]
Problem description
- Given the observation sequence $o^N = \langle o_1, o_2, \ldots, o_N \rangle$ $(o_i \in \mathbb{R}^d)$
- Find the most likely hidden state sequence $s^N = \langle s_1, s_2, \ldots, s_N \rangle$ $(s_i \in S)$
Approach
- Maximizing probability $P(s^N \mid o^N)$: $s^{N*} = \arg\max_{s^N} P(s^N \mid o^N)$
  by recursively calculating the probability $V(s, t) = \max_{s_1 \ldots s_{t-1}} P(o^t, s_1 \ldots s_{t-1}, s_t = s)$ that $s \in S$ is the observed hidden state at time $t$ given the observations $o^t$:
  - $V(s, 1) = P(o_1 \mid s_1 = s) \, P(s_1 = s) \quad \forall s \in S$
  - $V(s, t) = P(o_t \mid s_t = s) \, \max_{s'} [P(s_t = s \mid s_{t-1} = s') \, V(s', t-1)]$
  - $\phi(s, t) = \arg\max_{s'} [P(s_t = s \mid s_{t-1} = s') \, V(s', t-1)]$
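The recursion for $V$ and $\phi$ can be sketched in a few lines. This is a minimal sketch, assuming a finite state set and emission likelihoods $P(o_t \mid s)$ that have already been evaluated for the observed sequence; it works in log space to avoid underflow. The function name `viterbi` and the argument layout are assumptions.

```python
import math


def _log(p):
    # log(0) is treated as minus infinity so impossible paths never win.
    return math.log(p) if p > 0 else -math.inf


def viterbi(emission, transition, prior):
    """Return the most likely state sequence and its log joint probability.

    emission[t][s]   : P(o_t | s_t = s), evaluated for the observed sequence
    transition[r][s] : P(s_t = s | s_{t-1} = r)
    prior[s]         : P(s_1 = s)
    """
    n_states = len(prior)
    # Initialization: log V(s, 1)
    V = [[_log(emission[0][s]) + _log(prior[s]) for s in range(n_states)]]
    phi = []  # phi[t-1][s]: best predecessor of state s at step t
    for t in range(1, len(emission)):
        row, back = [], []
        for s in range(n_states):
            cands = [V[t - 1][r] + _log(transition[r][s])
                     for r in range(n_states)]
            best_prev = max(range(n_states), key=cands.__getitem__)
            row.append(_log(emission[t][s]) + cands[best_prev])
            back.append(best_prev)
        V.append(row)
        phi.append(back)
    # Backtracking from the best final state.
    last = max(range(n_states), key=V[-1].__getitem__)
    path = [last]
    for back in reversed(phi):
        path.append(back[path[-1]])
    path.reverse()
    return path, V[-1][last]


# Hypothetical two-state example (state 0 and state 1), three observations.
prior = [0.6, 0.4]
transition = [[0.7, 0.3], [0.4, 0.6]]
emission = [[0.5, 0.1], [0.4, 0.3], [0.1, 0.6]]
path, log_prob = viterbi(emission, transition, prior)
```

For observations in $\mathbb{R}^d$, the rows of `emission` would come from evaluating a density per state, which is outside the scope of this sketch.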
Imitation
Recognition
Problem description
- Given the observation sequence $o^N = \langle o_1, o_2, \ldots, o_N \rangle$ $(o_i \in \mathbb{R}^d)$
- Find the most likely behavior sequence $(t \in \mathbb{R},\ o \in \mathbb{R}^d,\ s \in S,\ a \in A)$:
  $\Gamma = (\ldots, ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})), \ldots)$
Approach
- Maximizing probability $P(s^n, a^n \mid o^N)$, $n \le N$
- Adapting $V(s, 1)$ and $V(s, t)$:
  - Use own state and action space for $S$ and $A$
  - Support bootstrapping of probabilities
  - Let actions recognize themselves
    (technical realization of the mirror neuron system)