Introduction               Architecture                      Imitation in robot groups            Conclusion




                        Learning and imitation in
                       heterogeneous robot groups

                                          Wilhelm Richert
                                              richert@c-lab.de



                    Fakultät für Elektrotechnik, Informatik und Mathematik,
                                     Universität Paderborn


                                      22 December 2009




               Learning and imitation in heterogeneous robot groups                      1 / 58

Motivation
Why do we need learning and imitation?
       State of the art
           -   Off-line learning (mostly population-based)
           -   Behavior is fixed afterwards
           [Figures: Swarmanoid [Dorigo et al., 2006]; Symbrion [Baele et al., 2009]]
       Desired
           -   On-line learning to react intelligently to unforeseeable events/problems
           -   Means to benefit from the “redundancy” in group behavior
           -   Robustness to arbitrary robot groups


The five big challenges in imitation
[Dautenhahn and Nehaniv, 2002]

       Five big challenges governing successful imitation in multi-robot systems:

           whom             heterogeneous robot groups
           when             concentrate on salient behavior
           what             the results, the actions, or the hidden goals of the imitatee?
           how              correspondence problem
           how to evaluate  What should be counted as successful imitation?


Thesis objectives

       Robots in a group shall be able to

          1. combine learning with imitation,
          2. recognize and learn observed behavior non-obtrusively, and
          3. choose potential imitatees wisely, even in heterogeneous robot groups.


Robot architecture

       [Figure: layered robot architecture with motivation layer, strategy layer, and skill layer. Labels: perception, current motivation, imitation, choice of the imitatee, request, result, action.]


Strategy layer
 -   Inspired by AMPS [Kochenderfer, 2006]

 [Figure: strategy-layer pipeline, shown next to the robot architecture overview]
     raw perception, motivation:  $I, \mu_i$
     perception filtering:        $o_t \in I_s$
     experience:                  $(o, a, d, \mu_i, f)_{t-N}, \ldots, (o, a, d, \mu_i, f)_t$
     abstraction:                 $s = \xi(o)$
     heuristics
     model:                       $T, R, \gamma$
     reinforcement learning
     policy:                      $\pi$
     action selection:            $a = \pi(s) \in A$


Strategy layer
 -   State abstraction function $\xi$ might use any abstraction method supporting
         -     insertion of new state observations
         -     deletion of old state observations
         -     querying the most similar state observation to a new state observation
 -   Experiments use nearest neighbor

 [Figure: state observations (raw states) grouped into regions (abstract states)]

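
The three operations required of the abstraction method (insert, delete, query the most similar observation) can be illustrated with a brute-force nearest-neighbor store. This is only a minimal sketch under stated assumptions: the class name, the region labels, and the use of Euclidean distance are hypothetical choices for illustration and are not taken from the slides.

```python
import math


class NearestNeighborAbstraction:
    """Maps raw state observations to abstract states (regions).

    Hypothetical illustration: every stored observation carries the id of
    the region it belongs to, and a new observation inherits the region of
    its nearest stored neighbor (Euclidean distance is an assumption).
    """

    def __init__(self):
        self._points = {}  # observation (tuple of floats) -> region id

    def insert(self, observation, region):
        """Insert a new state observation and assign it to a region."""
        self._points[tuple(observation)] = region

    def delete(self, observation):
        """Delete an old state observation; unknown ones are ignored."""
        self._points.pop(tuple(observation), None)

    def nearest(self, observation):
        """Return the stored observation most similar to the given one."""
        if not self._points:
            raise ValueError("no state observations stored")
        return min(self._points, key=lambda p: math.dist(p, observation))

    def abstract(self, observation):
        """xi(o): region of the most similar stored observation."""
        return self._points[self.nearest(observation)]


xi = NearestNeighborAbstraction()
xi.insert((0.0, 0.0), region="near_ball")
xi.insert((5.0, 5.0), region="far_from_ball")
print(xi.abstract((0.5, 1.0)))  # near_ball
xi.delete((0.0, 0.0))
print(xi.abstract((0.5, 1.0)))  # far_from_ball
```

A linear scan keeps the sketch short; any index structure offering the same three operations could take its place.
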

Strategy layer
 -   Heuristics maintain the models so that the same action feels similar in all observations of the same state
 -   Heuristics may split or merge regions:
     transition, failure, reward, simplification, experience
 -   Example: transition heuristic

 [Figure: state observations (raw states) grouped into regions (abstract states)]


Building a policy
 -   Reinforcement Learning with SMDP
     -    $Q(s, a) = R(s, a) + \sum_{s' \in S} P(s' \mid s, a)\, \gamma(s, a, s')\, V^{\pi}(s')$
 -   Determine current best policy
     -    $V^{\pi}(s) = \max_{a \in A} Q(s, a)$
     -    $\pi(s) = \arg\max_{a \in A} Q(s, a)$

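
The backup $Q(s,a) = R(s,a) + \sum_{s'} P(s' \mid s,a)\,\gamma(s,a,s')\,V^{\pi}(s')$ followed by $V^{\pi}(s) = \max_a Q(s,a)$ and $\pi(s) = \arg\max_a Q(s,a)$ can be sketched as a plain value-iteration loop. The two-state model, its numbers, the fixed sweep count, and the reading of $\gamma(s,a,s')$ as a transition-specific discount (smaller for long-running actions) are assumptions made for this illustration only.

```python
def q_value(s, a, T, R, gamma, V):
    """Q(s,a) = R(s,a) + sum over s' of P(s'|s,a) * gamma(s,a,s') * V(s')."""
    return R[(s, a)] + sum(
        p * gamma[(s, a, s2)] * V[s2] for s2, p in T[(s, a)].items()
    )


def solve(states, actions, T, R, gamma, sweeps=100):
    """Apply V(s) = max_a Q(s,a) for a fixed number of sweeps, then read off
    the greedy policy pi(s) = argmax_a Q(s,a)."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {s: max(q_value(s, a, T, R, gamma, V) for a in actions)
             for s in states}
    pi = {s: max(actions, key=lambda a: q_value(s, a, T, R, gamma, V))
          for s in states}
    return V, pi


# Hypothetical toy model: a robot is either far from or near the ball.
states = ["far", "near"]
actions = ["approach", "wait"]
T = {  # T[(s, a)] maps successor states to probabilities
    ("far", "approach"): {"near": 1.0},
    ("far", "wait"): {"far": 1.0},
    ("near", "approach"): {"near": 1.0},
    ("near", "wait"): {"near": 1.0},
}
R = {("far", "approach"): 0.0, ("far", "wait"): 0.0,
     ("near", "approach"): 1.0, ("near", "wait"): 0.0}
gamma = {  # approaching from far away takes long, hence a stronger discount
    ("far", "approach", "near"): 0.5,
    ("far", "wait", "far"): 0.9,
    ("near", "approach", "near"): 0.9,
    ("near", "wait", "near"): 0.9,
}

V, pi = solve(states, actions, T, R, gamma)
print(pi)  # {'far': 'approach', 'near': 'approach'}
```
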

Strategy layer
 -   Strategy layer requests symbolic actions
 -   Execution of these actions is up to the skill layer

 [Figures: robot architecture overview; strategy-layer pipeline]


Skill layer
       Tasks
         1. discover and learn a set of skills that are useful to the strategy layer (ground the symbols $\in A$)
         2. execute them when requested and optimize at runtime
       Skill
           -   skill $s = (f_{e_1}, \ldots, f_{e_N})$, where
           -   error function $f_e : I_a \times I_a \to \mathbb{R}$ assigns an error value to a pair of perceptions $(I(t_i), I(t_j))$
       Example: “approach the ball and orient towards it”
           $f_{e_1}(I(t_i), I(t_j)) = d_{ball}(I(t_j))$           minimize the ball distance
           $f_{e_2}(I(t_i), I(t_j)) = |\alpha_{ball}(I(t_j))|$    minimize the ball angle
           $s = (f_{e_1}, f_{e_2})$                              approach the ball and orient towards it

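
A skill defined as a tuple of error functions over a pair of perceptions (the one at skill start and the current one) maps directly onto a tuple of callables. The sketch below mirrors the ball example; the dictionary-based perception format and the key names `ball_distance` and `ball_angle` are assumptions for illustration, not the representation used in the thesis.

```python
from typing import Callable, Dict, List, Tuple

Perception = Dict[str, float]  # hypothetical perception format
ErrorFunction = Callable[[Perception, Perception], float]


def ball_distance_error(i_start: Perception, i_now: Perception) -> float:
    """f_e1: the error is the current distance to the ball."""
    return i_now["ball_distance"]


def ball_angle_error(i_start: Perception, i_now: Perception) -> float:
    """f_e2: the error is the absolute angle to the ball."""
    return abs(i_now["ball_angle"])


# s = (f_e1, f_e2): approach the ball and orient towards it
approach_and_orient: Tuple[ErrorFunction, ...] = (
    ball_distance_error,
    ball_angle_error,
)


def errors(skill: Tuple[ErrorFunction, ...],
           i_start: Perception, i_now: Perception) -> List[float]:
    """Evaluate every error function of a skill on one perception pair."""
    return [f_e(i_start, i_now) for f_e in skill]


start = {"ball_distance": 2.0, "ball_angle": -0.6}
now = {"ball_distance": 0.5, "ball_angle": -0.1}
print(errors(approach_and_orient, start, now))  # [0.5, 0.1]
```

Both example functions ignore the start perception; the two-argument signature is kept so that other error functions could compare against it.
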
Introduction                                 Architecture                           Imitation in robot groups                     Conclusion




Skill layer
Measuring a skill’s progress
     v   Progress function $f_p : I_a \times I_a \to [0, 1]$ measures a skill’s progress
     v   For a skill $s = (f_{e_1}, \ldots, f_{e_N})$ it is defined as

         $$f_p(I(t_i), I(t_j)) = \begin{cases} 0 & \text{if } C_a \le W(I(t_i), I(t_j)) \\ \dfrac{C_a - W(I(t_i), I(t_j))}{C_a - C_s} & \text{if } C_s < W(I(t_i), I(t_j)) < C_a \\ 1 & \text{if } W(I(t_i), I(t_j)) \le C_s \end{cases}$$

         $f_{e_i}$: error function, $I(t_i)$: perception when the skill was started, $I(t_j)$: current perception, success and
         abort thresholds $C_s \in \mathbb{R}$ and $C_a \in \mathbb{R}$ ($C_s < C_a$)

    v    $W(I(t_i), I(t_j)) = \sum_{k=1}^{N} f_{e_k}(I(t_i), I(t_j))$
    v    Example graph:
         [Figure: progress function plotted for example thresholds $C_s$ and $C_a$; full skill definition]
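
A minimal Python sketch of the progress function, assuming the reading that progress is 0 at or above the abort threshold, 1 at or below the success threshold, and linear in between. The function and parameter names are chosen for this illustration only.

```python
from typing import Iterable


def total_error(error_values: Iterable[float]) -> float:
    """W: sum of the values of all error functions of a skill."""
    return sum(error_values)


def progress(w: float, c_success: float, c_abort: float) -> float:
    """Map the summed error w to a progress value in [0, 1].

    c_success (C_s) and c_abort (C_a) are the success and abort
    thresholds; C_s < C_a is required.
    """
    if not c_success < c_abort:
        raise ValueError("success threshold must be below abort threshold")
    if w >= c_abort:    # error too large: no progress, skill would be aborted
        return 0.0
    if w <= c_success:  # error small enough: skill has succeeded
        return 1.0
    # Linear interpolation between the two thresholds.
    return (c_abort - w) / (c_abort - c_success)
```

For example, with `c_success=0.0` and `c_abort=2.0`, a summed error of 0.5 yields a progress of 0.75.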








Imitation
Overview of the approach
 v   Robots observe each other permanently
 v   Moving window of observations and well-being states
     for each observed robot
 v   Imitation process starts when well-being
     improvement is detected

 [Figure: architecture with motivation layer, strategy layer and skill layer; labels: perception, action,
  current motivation, request, result, choice of the imitatee, imitation]

 Processing chain of an observed episode:
   observed episode $\langle (o_I^1, e_I^1), \ldots, (o_I^N, e_I^N) \rangle$
     -> transform observations
   subjective observation data $\langle (o_D^1, e^1), \ldots, (o_D^N, e^N) \rangle$
     -> interpret behavior
   recognized episodes $\langle \ldots, ((t, o_D, s), a^t, (t', o'_D, s')), \ldots \rangle$
     -> estimate rewards
   observed interpreted experience $\langle \ldots, ((t, o_D, s), a^t, r^t, (t', o'_D, s')), \ldots \rangle$
     -> integrate into experience, update SMDP

Imitation
HMM and the Viterbi connection [Viterbi, 1967]


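 [Figure: hidden Markov model with hidden states $s_a$, $s_b$, $s_c$ and observations $o_x$, $o_y$, $o_z$;
  edges from $s_a$ are labeled with the transition probabilities $P(s_b \mid s_a)$, $P(s_c \mid s_a)$
  and the emission probabilities $P(o_x \mid s_a)$, $P(o_y \mid s_a)$, $P(o_z \mid s_a)$]

                        $o_1 o_2 \ldots o_T \;\to\; \text{Viterbi} \;\to\; s_1 s_2 \ldots s_T$
               $V(s, t) = P(o_t \mid s_t = s) \cdot \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s') \, V(s', t - 1) \right]$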
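
The recursion can be illustrated with a textbook Viterbi decoder in plain Python. The dictionary-based model layout and the explicit start distribution `start_p` are assumptions of this sketch; the slide only states the recursion itself.

```python
from typing import Dict, Hashable, List, Sequence

State = Hashable
Obs = Hashable


def viterbi(observations: Sequence[Obs],
            states: Sequence[State],
            start_p: Dict[State, float],
            trans_p: Dict[State, Dict[State, float]],
            emit_p: Dict[State, Dict[Obs, float]]) -> List[State]:
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # V(s, 0): start probability times emission of the first observation.
    v = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back: List[Dict[State, State]] = []
    for obs in observations[1:]:
        scores: Dict[State, float] = {}
        pointers: Dict[State, State] = {}
        for s in states:
            # Best predecessor s' maximizes P(s | s') * V(s', t - 1).
            best_prev = max(states, key=lambda p: v[-1][p] * trans_p[p][s])
            scores[s] = emit_p[s][obs] * v[-1][best_prev] * trans_p[best_prev][s]
            pointers[s] = best_prev
        v.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: v[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

For long sequences, log-probabilities would avoid numerical underflow; plain products are used here to stay close to the formula.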


Imitation
Interpreting observed behavior with the imitator’s own knowledge

 Knowledge in strategy layer
 [Figure: graph of states $s$; every edge is labeled with a transition probability $T(s, a, s')$]
 v   Imitator’s own transition probabilities
     instead of “foreign” HMM transition
     probabilities

 Knowledge in skill layer
 [Figure: skills $a$ “approach ball”, “approach goal” and “lift ball”, each linked by $P(\Delta o \mid a)$
  to a perceptual change vector $\Delta o$ with the components ball dist, goal dist and ball height]
 v   Skills vote on perceptual changes (using their progress functions $f_p^a$)
     plus the following heuristics ...





Recognition
   1. Recognize observation changes $o^{t-1} \to o^t$
        a) Prefer nearer goals
           [Figure: ambiguous situation. The robot might drive either to the red or to the yellow goal base.]

        b) Ignore skills that “seem to have finished”
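        c) Clip votes to $[0, 1]$

           $$P_a(o^t \mid o^{t-1}) = \begin{cases} \min\left(\max\left(\dfrac{f_p^a(o^t) - f_p^a(o^{t-1})}{1 - f_p^a(o^{t-1})},\, 0\right),\, 1\right) & \text{if } f_p^a(o^{t-1}) < 1 - \epsilon \\ 0 & \text{otherwise} \end{cases}$$

   2. Recognize actions in sequence $o_{t_1}^{t_2} = o^{t_1} \, o^{t_1 + \Delta} \ldots o^{t_2}$

           $$a_{ml} = \arg\max_a \sum_{t = t_1 + 1}^{t_2} P_a(o^t \mid o^{t-1})$$

   3. Recognize state transitions

           $$P(s^t \mid s^{t-1}) = T(s^{t-1}, a_{ml}, s^t)$$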
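
The three recognition steps can be sketched in Python under one reading of the vote formula: a skill's vote is its progress gain divided by its remaining progress, clipped to [0, 1], and set to 0 once the skill seems finished. The tolerance `eps`, the dictionary layouts and all names are assumptions of this sketch.

```python
from typing import Dict, Hashable, List, Tuple


def vote(progress_prev: float, progress_now: float, eps: float = 1e-3) -> float:
    """Vote of one skill for the observation change o^(t-1) -> o^t."""
    if progress_prev >= 1.0 - eps:
        return 0.0  # skill seems to have finished already: ignore it
    # Dividing by the remaining progress prefers nearer goals.
    raw = (progress_now - progress_prev) / (1.0 - progress_prev)
    return min(max(raw, 0.0), 1.0)  # clip to [0, 1]


def most_likely_skill(progress_by_skill: Dict[str, List[float]],
                      eps: float = 1e-3) -> Tuple[str, Dict[str, float]]:
    """Sum the votes of each skill over the sequence and pick the maximum."""
    totals = {
        skill: sum(vote(p, q, eps) for p, q in zip(series, series[1:]))
        for skill, series in progress_by_skill.items()
    }
    return max(totals, key=totals.get), totals


def transition_probability(T: Dict[Tuple[Hashable, str, Hashable], float],
                           s_prev: Hashable, skill: str, s_next: Hashable) -> float:
    """Look up the imitator's own T(s, a, s') for the recognized skill."""
    return T.get((s_prev, skill, s_next), 0.0)
```

Here `progress_by_skill` maps each of the imitator's skills to the progress values its own progress function assigns to the observed perceptions, so the demonstrator's behavior is interpreted purely with the imitator's knowledge.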







Evaluation
Recognition scenario: description



v    Demonstrator (right robot) has to
     transport the yellow ball onto the
     base
v    Imitator (left robot) tries to
     “understand” its observations
v    Two scenarios:
    1. Imitator is only able to drive (and
       thereby push the ball)
    2. Imitator is also able to lift the
       ball





Evaluation
Recognition scenario: results


1. Without lifting capabilities
   [Plot: distance [m]; recognized segments: move to ball, ???, move to goal]
   v   Recognized “drive to ball” (B) and “drive to goal” (G) correctly
   v   Detected “missing behavior” in between

2. With lifting capabilities
   [Plot: distance [m]; recognized segments: move to ball, lift the ball, move to goal]
   v   Recognized “drive to ball” (B), “lift the ball” (L), and “drive to goal” (G) correctly







Evaluation
Multi-robot scenario “three bases”

v   Task: transport objects to goal bases
v   Reward for reaching an object: 10
v   Goal bases provide different rewards
v   State space consists of
    v   distance to closest object
    v   distance of closest object to closest goal
    v   ID of closest goal




Conclusion
Objectives achieved in this thesis




   1. Combination of learning and imitation
   2. Non-obtrusive recognition and learning
      of observed behavior
   3. Support for heterogeneous robot
      groups




               Thank you for your attention!




v   Architecture
    v   State of the art
    v   Overview
    v   Layer interaction
v   Motivation layer
    v   Excitation
    v   Prioritizing goals
v   Strategy layer
    v   State abstraction
    v   Heuristics
    v   Policy
    v   Sample frequency
    v   Strategy example
v   Skill layer
    v   Overview of the approach
        explore, exploit
    v   Skill manager
    v   Model manager
    v   Error minimizer
    v   Configuration
    v   Skill example

v   Imitation in robot groups
    v   Overview of the approach
    v   Recognizing behavior
        v   Viterbi
        v   Interpreting observed behavior
        v   Recognition example
    v   Integrating recognized behavior
    v   Evaluation
        v   CTF with three bases
            v   Performance
            v   State abstraction
            v   Group homogeneity
        v   CTF with five bases
            v   Performance
            v   State abstraction
            v   Group homogeneity

v   Choice of the imitatee
    v   Affordance detection
    v   Affordance network generation
    v   Comparing ANs
    v   Choice of the imitatee
    v   Evaluation
    v   Parameterization of the environment
    v   Robustness experiment
    v   Clustering experiment








 State of the art
[Takahashi et al., 2008] use imitation to learn
      robotic soccer behaviors (approaching,
      shooting a ball)
          + combines learning with imitation
          - requires the robot group to stop
            whenever a robot imitates
          - needs multiple presentations of the
            same behavior
          - needs sufficient prior knowledge of
            the task to imitate
[Priesterjahn, 2008] evolves game bots with
        similar performance as the human
        player
          + shows that imitation-based
            adaptation is able to outperform the
            purely evolutionary approach
          - targeted to computer game
            scenarios, not stochastic real-world
            applications
          - assumes group homogeneity
          [Figure: The Rule-Based Operation Cycle of an Agent]
[Inamura et al., 2003] combine top-down
      teaching with the bottom-up learning
      from the robot’s side
          - exclusive approach (cannot be
            combined with other learning
            techniques)
          - HMM is learned and then fixed
            throughout the robot’s lifetime
          [Figure: Motion capturing system: motion for learning data]
          [Figure: A result of motion generation on a humanoid robot]




Layer interaction
  (1) Strategy step is triggered
      v   Determining the current motivation
          and the corresponding next strategy
          action.
      v   The strategy layer requires the most
          current motivation as feedback
          regarding its last chosen action → both
          are synchronous.
  (2) Skill step is triggered
      v   Strategy step does not have to be
          finished yet
      v   The skill layer simply executes
          according to the action most recently
          delivered by the strategy layer
  (3), (4) Strategy step has finished
      v   It signals the next action to execute
          to the skill layer.
      v   Subsequent skill steps then perform
          this action accordingly.

[Figure: sequence diagram with the lifelines clock, motivation layer, strategy layer, skill layer, perception, and action. (1) A "next strategy step" event starts a strategy step: the motivation layer requests Im, receives the processed perception, and sets the next motivation; the strategy layer requests Is, receives the processed perception, and determines the next strategy step. (2), (3), (4) Each "next skill step" event starts a skill step: the skill layer requests Ia, receives the processed perception, calculates the best actuator command, and sets the next low-level action. The strategy layer's "set next skill" arrives while the first skill step is running.]
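
The decoupling of strategy steps and skill steps described on this slide can be illustrated with a small event replay. This is only a sketch under stated assumptions: the event names, the function `run_cycle`, and the example actions are hypothetical and not taken from the original architecture; the only behavior modeled is that a skill step always executes the action most recently delivered by a finished strategy step.

```python
def run_cycle(events, strategy_results):
    """Replay a sequence of layer events and record what each skill step executes.

    events: list of "strategy_start", "strategy_done", or "skill" in temporal order.
    strategy_results: the actions that successive strategy steps will deliver
    (hypothetical placeholders for the strategy layer's decisions).
    """
    current_action = None           # action most recently delivered to the skill layer
    pending = iter(strategy_results)
    in_progress = None              # result of a strategy step that has not finished yet
    executed = []
    for event in events:
        if event == "strategy_start":
            in_progress = next(pending)
        elif event == "strategy_done":
            # Only now does the skill layer learn about the new action.
            current_action = in_progress
            in_progress = None
        elif event == "skill":
            # A skill step never waits for a running strategy step.
            executed.append(current_action)
    return executed


events = ["strategy_start", "strategy_done",   # first strategy step delivers "approach"
          "strategy_start",                    # second strategy step begins
          "skill", "skill",                    # skill steps keep executing "approach"
          "strategy_done",                     # second strategy step delivers "grasp"
          "skill"]                             # subsequent skill steps perform "grasp"
executed = run_cycle(events, ["approach", "grasp"])
```

In this replay the two skill steps that fall inside the second strategy step still execute the older action, and only the skill step after its completion switches to the new one.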






Motivation layer
Motivation system example

v   The current motivation $\mu$ is the vector
    to the current drive state, dependent
    on
    v   time
    v   perception
v   Each drive measures the status of
    accomplishing a sub-goal
    (0 = fully accomplished)
v   A drive $i$ is called satisfied (goal
    achieved) if the corresponding
    motivation does not exceed its threshold:
    $\mu_i \le \mu_i^{\theta}$

[Figure: drive space spanned by drive 1, drive 2, and drive 3, showing the well-being region, the current drive state, the current motivation, and $\mu_p$, the shortest vector to the desired drive area, used for prioritization.]








A sub-goal subjected to an excitation

[Figure: plot over time t with a vertical axis from 0 to 1, labelled: excitation, threshold, triggering behavior, well-being region.]

     v   Excitation describes the force to which the current drive state
         is subjected.
     v   By specifying it as a function of the perception and of the
         internal state of the robot, the user is "programming" the
         final behavior.







Prioritizing goals

     v   At each time step, the motivation layer provides the current
         motivation vector to the strategy layer.
     v   With $\mu_p$, the strategy layer prioritizes which of the sub-goals
         are to be handled first:

         $$\mu_p = \begin{pmatrix} \max(0,\ \mu_1 - \mu_1^{\theta}) \\ \max(0,\ \mu_2 - \mu_2^{\theta}) \\ \vdots \\ \max(0,\ \mu_n - \mu_n^{\theta}) \end{pmatrix}$$

     v   Different drives can be prioritized by scaling them
         accordingly → modeling a hierarchy of needs
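
The prioritization can be sketched in a few lines. The sketch assumes that each component of $\mu_p$ is the amount by which a drive exceeds its threshold, clipped at zero, i.e. $\max(0, \mu_i - \mu_i^{\theta})$; the function names, the optional `weights` argument (standing in for the scaling mentioned above), and the numbers are hypothetical.

```python
def prioritized_motivation(mu, thresholds, weights=None):
    """Return mu_p: per-drive excess over the threshold, clipped at zero.

    weights is a hypothetical per-drive scaling used to rank drives
    against each other (a simple "hierarchy of needs").
    """
    if weights is None:
        weights = [1.0] * len(mu)
    return [w * max(0.0, m - th) for m, th, w in zip(mu, thresholds, weights)]


def most_urgent_drive(mu_p):
    """Index of the drive with the largest component, or None if all are satisfied."""
    if not any(value > 0 for value in mu_p):
        return None
    return max(range(len(mu_p)), key=lambda i: mu_p[i])


mu = [0.8, 0.2, 0.5]        # current motivation per drive
theta = [0.3, 0.4, 0.5]     # per-drive thresholds
mu_p = prioritized_motivation(mu, theta, weights=[1.0, 1.0, 2.0])
urgent = most_urgent_drive(mu_p)
```

Here only the first drive exceeds its threshold, so it is the one handled first; the second and third drives are satisfied and contribute zero regardless of their weight.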







Strategy layer
Sample frequency


       A new interaction is made under one of the following conditions ($t'$ denotes the time of the previous interaction):
     v   Sufficiently different perception (measured by some scenario-specific distance
         metric $d$):

                                                   $d(o_t, o_{t'}) \ge \theta_o$
     v   Sufficiently interesting motivation change:

                                                   $\lVert \mu_t - \mu_{t'} \rVert \ge \theta_r$
     v   Enough time has passed:

                                                   $t - t' \ge \theta_t$

       $\theta_o$, $\theta_r$, and $\theta_t$ are application-specific and have to be determined empirically.
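
The three conditions can be combined into a single sampling test, sketched below. The function name `should_sample`, the use of the Euclidean distance as default metric $d$ and as the norm of the motivation change, and all numeric thresholds are assumptions for illustration; the slide only states that $d$ is scenario-specific and that the thresholds are found empirically.

```python
import math


def should_sample(o_t, o_prev, mu_t, mu_prev, t, t_prev,
                  theta_o, theta_r, theta_t, distance=None):
    """Return True if a new interaction should be recorded at time t."""
    if distance is None:
        distance = math.dist          # assumed default for the scenario-specific metric d
    perception_changed = distance(o_t, o_prev) >= theta_o
    motivation_changed = math.dist(mu_t, mu_prev) >= theta_r
    timed_out = (t - t_prev) >= theta_t
    return perception_changed or motivation_changed or timed_out


# Nothing changed and little time has passed: no new interaction.
quiet = should_sample([0, 0], [0, 0], [0.1], [0.1], 1.0, 0.5,
                      theta_o=1.0, theta_r=0.5, theta_t=2.0)
# The perception moved by a distance of 5, which exceeds theta_o.
moved = should_sample([3, 4], [0, 0], [0.1], [0.1], 1.0, 0.5,
                      theta_o=1.0, theta_r=0.5, theta_t=2.0)
# Perception and motivation are unchanged, but enough time has passed.
late = should_sample([0, 0], [0, 0], [0.1], [0.1], 3.0, 0.5,
                     theta_o=1.0, theta_r=0.5, theta_t=2.0)
```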








Strategy example

[Figure: two example state graphs, each leading from a start state S to a goal G, with states labelled by coordinate pairs: (1, 1) to (4, 2) in the left example and (1, 1) to (6, 6) in the right example.]








Skill layer

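   1. discover and learn a set of skills that are useful to the
      strategy layer → grounds the symbols in $A$
   2. execute them when requested and optimize at runtime

[Figure: the skill layer in exploration mode (left) and exploitation mode (right). In both modes the skill layer sits below the strategy layer, receives the perception Ia, emits an action (O), and consists of the skill manager, the skills, the model manager, and the error minimizer. In exploration mode (training mode) the skill manager explores actions, creates and fetches skills, and notifies the strategy layer of new skills, while the model manager creates models. In exploitation mode (execution mode) the strategy layer requests a skill, the skill manager sets the current skill, the model manager updates models, and the error minimizer fetches the current skill.]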








Skill layer
Data flow in exploration mode

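[Figure: data flow in exploration mode. The skill layer (training mode) receives the perception Ia; the skill manager explores actions (O), creates and fetches skills, and notifies the strategy layer of new skills; the model manager creates models; the error minimizer is also part of the skill layer.]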








Skill layer
Data flow in exploitation mode

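[Figure: data flow in exploitation mode. The strategy layer requests a skill from the skill layer (execution mode); the skill manager sets the current skill; the model manager updates models from the perception Ia; the error minimizer fetches the current skill and outputs the action (O).]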








Skill definition



     v   extraction function $f_{ext}: I_a \to \mathbb{R}$ extracts information from a perception $I(t) \in I_a$
     v   control function $f_c: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ associates an error value with the tuple $(v_{t_i}, v_{t_j})$
         v     decrease: $f_c(v_{t_i}, v_{t_j}) = |v_{t_j}|$
         v     increase: $f_c(v_{t_i}, v_{t_j}) = \frac{1}{|v_{t_j}|}$
         v     keep value: $f_c(v_{t_i}, v_{t_j}) = |v_{t_i} + \delta - v_{t_j}|$
     v   error function $f_e: I_a \times I_a \to \mathbb{R}$ assigns an error value to a perception pair
     v   progress function $f_p$ maps a perception pair from $I_a \times I_a$ to a closed interval and measures a skill's progress between two
         time points
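
How these functions fit together can be sketched as follows. The sketch assumes that an error function is obtained by applying an extraction function to both perceptions of a pair and feeding the two extracted values into a control function; this composition, the dictionary-based perception format, and the names `make_error_function` and `angle_to_object` are illustrative assumptions. Only the "decrease" control function, whose definition $|v_{t_j}|$ is given above, is used.

```python
def decrease(v_ti, v_tj):
    """Control function 'decrease': the error is the magnitude of the current value."""
    return abs(v_tj)


def make_error_function(f_ext, f_c):
    """Build an error function on perception pairs from an extraction
    function and a control function (assumed composition)."""
    def f_e(perception_i, perception_j):
        return f_c(f_ext(perception_i), f_ext(perception_j))
    return f_e


def angle_to_object(perception):
    """Hypothetical extraction function: read one value out of a perception."""
    return perception["angle"]


# "Minimize angle to object" expressed as an error function.
minimize_angle = make_error_function(angle_to_object, decrease)

start = {"angle": 0.9, "distance": 2.0}   # perception when the skill started
now = {"angle": -0.3, "distance": 1.5}    # current perception
error = minimize_angle(start, now)
```

The error shrinks as the extracted angle approaches zero, regardless of its sign, which is what a skill that decreases a value needs.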




Skill manager
 v   exploration phase
     v   generate skills that enable the robot to
         control the perceived properties
     v   assign a priority to each skill dependent on
         its execution priority
     v   determine the skills the robot can reliably
         perform and report them as new skills to
         the strategy layer
 v   exploitation phase
     v   manage the execution of requested skills

[Figures: skill-layer data flow in exploration mode (top) and in exploitation mode (bottom).]








Model manager
v   creating prediction models for each perceived
    property
    v   prediction model is the tuple $(id_p, \tilde{S}, \tilde{M}, m)$
        $id_p \in ID_p$: perception feature to be predicted
        $\tilde{S} \subseteq ID_o \times ID_p$: subset of the perceptual features
        $\tilde{M} \subseteq O$: subset of the actuators to control
        $m: \mathbb{R}^{|\tilde{S}| + |\tilde{M}|} \to \mathbb{R}$ predicts the value for the
        perceptual feature $id_p$ at the next input
        perception given the values of $\tilde{S}$ and $\tilde{M}$.
    v   $m$ in the experiments: polynomial (Poly) and radial basis function (RBF) models
v   updating prediction models to reflect new
    experiences
v   scoring each model dependent on its prediction
    accuracy:

[Figures: skill-layer data flow in exploration mode (top) and in exploitation mode (bottom).]
          $$\operatorname{score}(m) = \frac{n}{\sum_{i=k}^{k+n} \bigl(m(\tilde{S}(t_i), \tilde{M}(t_i)) - v_{t_i}\bigr)^{2}}$$








Error minimizer
    1. $I_c(t)$: only those perceptual features on which the error functions of the current
       skill $s$ depend, at the current time $t$
    2. Estimate the next perception, $I_c^M(t^+)$ *, dependent on the motor action $M$ as
       predicted by $m^j_{best} = \arg\max_{m} \operatorname{score}(m)$:

           $$I_c^M(t^+) = \{\, m^j_{best}(I_c(t), M(t)) \mid p_j \in I_c(t) \,\}$$

    3. For each error function $f_e^k$: calculate the expected next error $e_k^M(t^+)$, with $I_c(t_i)$
       being the perception when the skill was started:

           $$e_k^M(t^+) = f_e^k\bigl(I_c(t_i),\ I_c^M(t^+)\bigr)$$

    4. Determine the best actuator command $M_{next}(t)$ by finding the one that minimizes the
       accumulated expected error:

           $$M_{next}(t) = \operatorname*{arg\,min}_{M} \sum_{k=1}^{N} e_k^M(t^+)$$

    * $t^+$ is the time point of the next interaction after time $t$
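
The four steps can be sketched as a small search over candidate actuator commands. Several things here are assumptions made for illustration and are not specified on the slide: the minimization is done by enumerating a finite list of candidate commands, perceptions are dictionaries, each model is a `(score, predict)` pair with precomputed score, and the two toy models and the single error function are made up.

```python
def best_actuator_command(perception, start_perception, candidate_commands,
                          models, error_functions):
    """Pick the command with the smallest accumulated expected error.

    models: dict mapping a perceptual feature to a list of (score, predict)
            pairs, where predict(perception, command) returns the feature's
            expected next value.
    error_functions: callables f_e(start_perception, predicted_perception).
    """
    # Step 2: for every relevant feature, use the best-scoring model.
    best_models = {feature: max(candidates, key=lambda sm: sm[0])[1]
                   for feature, candidates in models.items()}

    def expected_error(command):
        # Predicted next perception under this command.
        predicted = {feature: predict(perception, command)
                     for feature, predict in best_models.items()}
        # Steps 3 and 4: accumulate the expected error over all error functions.
        return sum(f_e(start_perception, predicted) for f_e in error_functions)

    return min(candidate_commands, key=expected_error)


# Toy setup: one feature "angle", two competing models with different scores.
models = {
    "angle": [
        (0.2, lambda p, m: p["angle"]),             # ignores the command
        (0.9, lambda p, m: p["angle"] - 0.5 * m),   # command reduces the angle
    ]
}
error_functions = [lambda start, predicted: abs(predicted["angle"])]
now = {"angle": 0.4}
command = best_actuator_command(now, {"angle": 0.9}, [-1.0, 0.0, 1.0],
                                models, error_functions)
```

With the better-scored model, the command 1.0 brings the predicted angle closest to zero, so it is selected.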




Skill layer configuration


       Greater universality leads to a bigger exploration space. It is wise to limit the
       exploration space by specifying non-changing parameters beforehand. This can be
       achieved by configuring the following parameters:
     v   Degrees of freedom specify the number of actuators the skill layer has to control.
     v   Extraction functions define the language that can be used to specify the error
         functions.
     v   Control functions specify the functions that the error minimizer will minimize by
         means of the error functions.
     v   Regression models are used by the model manager to build predictions for the
         environment interaction. A regression model consists of two methods: one that fits
         a model to an experience trace and one that predicts the value of the modeled
         property.
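
A regression model with the two methods named above could look like the following sketch. The class name `LinearModel`, the method names `fit` and `predict`, the one-dimensional input, and the representation of an experience trace as a list of (input, observed value) pairs are assumptions; the experiments in the talk use polynomial and RBF models, for which a straight-line fit stands in here only to keep the example short.

```python
class LinearModel:
    """Minimal regression model: fit to an experience trace, then predict."""

    def __init__(self):
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, trace):
        """Fit by ordinary least squares; trace is a list of (x, y) pairs."""
        n = len(trace)
        mean_x = sum(x for x, _ in trace) / n
        mean_y = sum(y for _, y in trace) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in trace)
        if var_x == 0:
            # All inputs identical: fall back to predicting the mean.
            self.slope, self.intercept = 0.0, mean_y
        else:
            cov = sum((x - mean_x) * (y - mean_y) for x, y in trace)
            self.slope = cov / var_x
            self.intercept = mean_y - self.slope * mean_x

    def predict(self, x):
        """Predict the value of the modeled property for input x."""
        return self.slope * x + self.intercept


model = LinearModel()
model.fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
prediction = model.predict(3.0)
```

Any model family that offers the same two methods could be plugged into the model manager in the same way.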








Skill example
“Minimize angle to object” learned with radial basis functions




Controlling speed dependent on angle and                 Controlling rotational speed dependent on
distance to the object                                   angle and distance to the object







Imitation
Viterbi [Viterbi, 1967]


       Problem description
     v   Given the observation sequence $o^N = \langle o_1, o_2, \ldots, o_N \rangle$ $(o_i \in \mathbb{R}^d)$
     v   Find the most likely hidden state sequence $s^N = \langle s_1, s_2, \ldots, s_N \rangle$ $(s_i \in S)$

       Approach
     v   Maximizing probability $P(s^N \mid o^N)$: $s^{N*} = \arg\max_{s^N} P(s^N \mid o^N)$
         by recursively calculating the probability $V(s, t) = \max_{s^{t-1}} P(o^t, s_1 \ldots s_{t-1}, s_t = s)$ that
         $s \in S$ is the hidden state at time $t$ given the observations $o^t$:
     v   $V(s, 1) = P(o_1 \mid s_1 = s)\, P(s_1 = s) \quad \forall s \in S$
     v   $V(s, t) = P(o_t \mid s_t = s)\, \max_{s'} \bigl[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \bigr]$
     v   $\varphi(s, t) = \arg\max_{s'} \bigl[ P(s_t = s \mid s_{t-1} = s')\, V(s', t-1) \bigr]$
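
The recursion for $V$ and the back-pointers $\varphi$ can be written down directly. The sketch below is the textbook Viterbi algorithm on a small discrete example; note the simplifying assumption that observations are symbols looked up in an emission table, whereas the slide considers observations in $\mathbb{R}^d$. The states, probabilities, and observation symbols are made-up illustration data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence and its joint probability."""
    # Initialization: V(s, 1) = P(o_1 | s) * P(s)
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    phi = [{}]  # phi[t][s] = best predecessor of state s at time t

    # Recursion: V(s, t) = P(o_t | s) * max_s' [P(s | s') * V(s', t-1)]
    for t in range(1, len(observations)):
        V.append({})
        phi.append({})
        for s in states:
            prev = max(states, key=lambda sp: trans_p[sp][s] * V[t - 1][sp])
            phi[t][s] = prev
            V[t][s] = emit_p[s][observations[t]] * trans_p[prev][s] * V[t - 1][prev]

    # Backtracking along the stored predecessors.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(phi[t][path[-1]])
    path.reverse()
    return path, V[-1][last]


states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
# path is ["Healthy", "Healthy", "Fever"] with joint probability 0.01512
```

For long sequences, an implementation would typically work with log-probabilities to avoid numerical underflow; the plain products are kept here to mirror the formulas.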








Imitation
Recognition

       Problem description
     v   Given the observation sequence $o_1^N = \langle o_1, o_2, \dots, o_N \rangle$ ($o_i \in \mathbb{R}^d$)
     v   Find the most likely behavior sequence ($t \in \mathbb{R}$, $o \in \mathbb{R}^d$, $s \in S$, $a \in A$)
         $\Gamma = (\dots, ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})), \dots)$

       Approach
     v   Maximizing probability $P(s_1^n, a_1^n \mid o_1^N)$, $n \ll N$
     v   Adapting $V(s, 1)$ and $V(s, t)$:
         v     Use own state and action space for $S$ and $A$
         v     Support bootstrapping of probabilities
         v     Let actions recognize themselves
                 technical realization of the mirror neuron system
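
The slide does not spell out the adapted formulas, so the following Python sketch is only one possible reading of the adaptation listed above, not the author's formulation. It assumes that the HMM transition probabilities are replaced by the imitator's own transition model T(s, a, s'), that the maximization runs jointly over the imitator's own states and actions, and that each action scores how well it explains the change between two consecutive observations ("actions recognize themselves"). All names (`recognize`, `obs_lik`, `act_lik`) and the toy ball-distance model are hypothetical.

```python
def recognize(observations, states, actions, prior, T, obs_lik, act_lik):
    """Joint state/action Viterbi sketch; returns (state path, action path).

    prior[s]               ~ P(s_1 = s)
    T[(s, a, s2)]          ~ imitator's own transition model; missing keys count as 0
    obs_lik(o, s)          ~ how well observation o fits the imitator's state s
    act_lik(a, o_prev, o)  ~ how well action a explains the change o_prev -> o
    """
    V = {s: obs_lik(observations[0], s) * prior[s] for s in states}
    back = []  # back[t][s2] = (previous state, action) that best explains s2

    for o_prev, o in zip(observations, observations[1:]):
        V_new, back_t = {}, {}
        for s2 in states:
            # Maximize over the imitator's own states and actions.
            score, s, a = max(
                ((T.get((s, a, s2), 0.0) * act_lik(a, o_prev, o) * V[s], s, a)
                 for s in states for a in actions),
                key=lambda candidate: candidate[0],
            )
            V_new[s2] = obs_lik(o, s2) * score
            back_t[s2] = (s, a)
        V = V_new
        back.append(back_t)

    # Backtrack from the best final state, collecting states and actions.
    state_path = [max(states, key=lambda s: V[s])]
    action_path = []
    for back_t in reversed(back):
        s, a = back_t[state_path[-1]]
        state_path.append(s)
        action_path.append(a)
    state_path.reverse()
    action_path.reverse()
    return state_path, action_path


# Hypothetical toy model: the observation is the distance to the ball.
toy_T = {("far", "approach", "near"): 0.5, ("far", "approach", "far"): 0.5,
         ("far", "wait", "far"): 1.0,
         ("near", "wait", "near"): 1.0, ("near", "approach", "near"): 1.0}

def toy_obs_lik(o, s):
    return 0.9 if (o > 1.0) == (s == "far") else 0.1

def toy_act_lik(a, o_prev, o):
    moved_closer = o < o_prev - 0.05
    return 0.9 if moved_closer == (a == "approach") else 0.1

states_seen, actions_seen = recognize(
    [2.0, 2.0, 0.5], ["far", "near"], ["approach", "wait"],
    {"far": 0.5, "near": 0.5}, toy_T, toy_obs_lik, toy_act_lik)
print(states_seen, actions_seen)
```

On the toy data the observed robot is interpreted as staying far from the ball while waiting and then approaching it: the recognized states are far, far, near and the recognized actions are wait, approach.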



Learning and imitation in heterogeneous robot groups

  • 1. Introduction Architecture Imitation in robot groups Conclusion Learning and imitation in heterogeneous robot groups Wilhelm Richert richert@c-lab.de Fakultät für Elektrotechnik, Informatik und Mathematik, Universität Paderborn 22. Dezember 2009 Learning and imitation in heterogeneous robot groups 1 / 58
  • 2. Introduction Architecture Imitation in robot groups Conclusion Motivation Why do we need learning and imitation? State of the art v Off-line learning (mostly population-based) v Behavior is fixed afterwards Swarmanoid [Dorigo et al., 2006] Symbrion [Baele et al., 2009] Learning and imitation in heterogeneous robot groups 2 / 58
  • 3. Introduction Architecture Imitation in robot groups Conclusion Motivation Why do we need learning and imitation? State of the art v Off-line learning (mostly population-based) v Behavior is fixed afterwards Swarmanoid [Dorigo et al., 2006] Symbrion [Baele et al., 2009] Desired v On-line learning to intelligently react on unforeseeable events/problems v Means to benefit from the “redundancy” in group behavior v Robustness to arbitrary robot groups Learning and imitation in heterogeneous robot groups 2 / 58
  • 4. Introduction Architecture Imitation in robot groups Conclusion The five big challenges in imitation [Dautenhahn and Nehaniv, 2002] Five big challenges governing successful imitation in multi-robot systems: whom   heterogeneous robot groups when   concentrate on salient behavior what   the results, the actions, or the hidden goals of the imitatee? how   correspondence problem how to evaluate What should be counted as successful imitation? Learning and imitation in heterogeneous robot groups 3 / 58
  • 5. Introduction Architecture Imitation in robot groups Conclusion Thesis objectives Robots in a groups shall be able to 1. combine learning with imitation, 2. recognize and learn observed behavior non-obtrusively, and 3. choose potential imitatees wisely also in heterogeneous robot groups. Learning and imitation in heterogeneous robot groups 4 / 58
  • 6. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 7. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 8. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 9. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 10. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 11. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 12. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 13. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 14. Introduction Architecture Imitation in robot groups Conclusion Robot architecture motivation layer current motivation perception action strategy layer choice of the imitation imitatee request result skill layer interaction example Learning and imitation in heterogeneous robot groups 5 / 58
  • 15. Introduction Architecture Imitation in robot groups Conclusion Strategy layer raw perception, motivation I , µi perception filtering ot b Is experience motivation layer –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e current motivation perception abstraction heuristics action s   ξˆo strategy layer choice of the imitation imitatee request result model T, R, γ skill layer reinforcement learning v Inspired by AMPS [Kochenderfer, 2006] policy π action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 6 / 58
  • 16. Introduction Architecture Imitation in robot groups Conclusion Strategy layer raw perception, motivation I , µi v State abstraction function ξ might use any perception filtering abstraction method supporting ot b Is v insertion of new state observations v deletion of old state observations experience –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e v querying most similar state observation to a new state observation abstraction v Experiments use nearest neighbor s   ξˆo heuristics model T, R, γ reinforcement learning region policy (abstract state) π state observation (raw state) action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 6 / 58
  • 17. Introduction Architecture Imitation in robot groups Conclusion Strategy layer raw perception, motivation I , µi v Heuristics maintain the models so that the same action feels similar in all observations of the perception filtering ot b Is same state v Heuristics may split or merge regions experience transition, failure, reward, simplification, experience –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e v Example: transition heuristic abstraction heuristics s   ξˆo model T, R, γ reinforcement learning region policy (abstract state) π state observation (raw state) action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 6 / 58
  • 18. Introduction Architecture Imitation in robot groups Conclusion Strategy layer raw perception, motivation I , µi v Heuristics maintain the models so that the same action feels similar in all observations of the perception filtering ot b Is same state v Heuristics may split or merge regions experience transition, failure, reward, simplification, experience –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e v Example: transition heuristic abstraction heuristics s   ξˆo model T, R, γ reinforcement learning region policy (abstract state) π state observation (raw state) action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 6 / 58
  • 19. Introduction Architecture Imitation in robot groups Conclusion Building a policy raw perception, motivation I , µi v Reinforcement Learning with SMDP perception filtering v Qˆs, a   Rˆs, a Q Pˆs ƒs, aγˆs, a, s Vπ ˆs  œ œ œ œ ot b Is s bS v Determine current best policy experience –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e v V π ˆs   max Qˆs, a abA v π ˆs   arg max Qˆs, a abstraction abA s   ξˆo heuristics model T, R, γ reinforcement learning region policy (abstract state) π state observation (raw state) action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 7 / 58
  • 20. Introduction Architecture Imitation in robot groups Conclusion Strategy layer raw perception, motivation I , µi perception filtering ot b Is v Strategy layer requests symbolic actions experience v Execution of these actions is up to the skill layer –ˆo, a, d, µi , f tN , . . . , ˆo, a, d, µi , f t e abstraction motivation layer heuristics s   ξˆo current motivation perception model action strategy layer T, R, γ choice of the imitation imitatee request result reinforcement learning skill layer policy π action selection a   π ˆs  b A Learning and imitation in heterogeneous robot groups 8 / 58
  • 21. Introduction Architecture Imitation in robot groups Conclusion Skill layer Tasks 1. discover and learn a set of skills that are useful to the strategy layer   ground symbols b A 2. execute them when requested and optimize at runtime Skill v skill s   ˆfe , . . . , feN , where v error function fe ¢ Ia ! Ia   R assigns an error value to a pair of perception ‰I ˆti , I ˆtj Ž Example: “approach the ball and orient towards it” fe ˆI ˆti , I ˆtj    dball ˆI ˆtj    minimize the ball distance fe ˆI ˆti , I ˆtj    ƒαball ˆI ˆtj ƒ   minimize the ball angle s   ˆfe , fe    approach the ball and orient towards it Learning and imitation in heterogeneous robot groups 9 / 58
  • 22. Introduction Architecture Imitation in robot groups Conclusion Skill layer Measuring a skill’s progress v Progress function fp ¢ Ia ! Ia     , ¥ measures a skill’s progress v For a skill s   ˆfe , . . . , feN  it is defined as ¢ ¨ ¨ if Ca f W ˆI ˆti , I ˆtj  ¨ C W ˆI ˆt  ,I ˆt  ¨ fp ˆI ˆti , I ˆtj    ¦ a i j if Csd W ˆI ˆti , I ˆtj  d Ca ¨ ¨ C a C s ¨ ¨ if W ˆI ˆti , I ˆtj  f Cs ¤ f ei : error function, I ˆt i : perception when the skill has been started, I ˆt j : current perception, success and abort thresholds C s b R and Ca b R (Cs d Ca ) v W ˆI ˆti , I ˆtj    PN  fek ˆIˆti , Iˆtj  k v Example graph: Cs   . , Ca   . full skill definition Learning and imitation in heterogeneous robot groups 10 / 58
  • 23. Introduction Architecture Imitation in robot groups Conclusion observed episode Imitation `ˆo I , e I , . . . , ˆo I , e N e N I Overview of the approach transform observations v Robots observe each other permanently v Moving window of observations and well-being states subjective observation data for each observed robot `ˆo D , e , . . . , ˆo D , e N e N v Imitation process starts when well-being improvement is detected interpret behavior motivation layer recognized episodes `. . . , ˆˆ t, o D , s , a t , ˆ t œ , oœD , s œ  , . . .e current motivation perception action strategy layer estimate rewards choice of the imitation imitatee request result observed interpreted experience skill layer `. . . , ˆˆ t, o D , s , a t , r t , ˆ t œ , oœD , s œ  , . . .e integrate into experience, update SMDP Learning and imitation in heterogeneous robot groups 11 / 58
  • 26. Imitation: HMM and the Viterbi connection [Viterbi, 1967]
      [Figure: HMM with hidden states $s_a$, $s_b$, $s_c$, transition probabilities $P(s_b \mid s_a)$ and $P(s_c \mid s_a)$, and emission probabilities $P(o_x \mid s_a)$, $P(o_y \mid s_a)$, $P(o_z \mid s_a)$ for the observations $o_x$, $o_y$, $o_z$]
      - Viterbi maps an observation sequence $o_1 o_2 \dots o_T$ to the most likely state sequence $s_1 s_2 \dots s_T$
      - $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s') \, V(s', t-1) \right]$
  • 32. Imitation: Interpreting observed behavior with the imitator’s own knowledge
      Knowledge in strategy layer
        [Figure: state graph whose edges are labelled with the transition probabilities $T(s, a, s')$]
        - Imitator’s own transition probabilities instead of “foreign” HMM transition probabilities
      Knowledge in skill layer
        [Figure: skills “approach ball”, “approach goal” and “lift ball”, each action $a$ with a probability $P(\Delta o \mid a)$ for an observed change $\Delta o$ over the features ball dist, goal dist and ball height]
        - Skills vote on perceptual changes → $f_p^a$, plus the following heuristics ...
  • 33. Recognition
      1. Recognize observation changes $o_{t-1} \to o_t$
         a) Prefer nearer goals
            Ambiguous situation: the robot might drive either to the red or to the yellow goal base
         b) Ignore skills that “seem to have finished”
         c) Clip votes to $[0, 1]$
         $$P_a(o_t \mid o_{t-1}) = \frac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_{t-1})}$$
  • 36. Recognition
      1. Recognize observation changes $o_{t-1} \to o_t$
         a) Prefer nearer goals
         b) Ignore skills that “seem to have finished”
         c) Clip votes to $[0, 1]$
         $$P_a(o_t \mid o_{t-1}) = \begin{cases} \min\left(\max\left(\dfrac{f_p^a(o_t) - f_p^a(o_{t-1})}{1 - f_p^a(o_{t-1})}, 0\right), 1\right) & \text{if } f_p^a(o_{t-1}) \le 1 - \epsilon \\ 0 & \text{otherwise} \end{cases}$$
      2. Recognize actions in a sequence $o_{t_1}, \dots, o_{t_2}$
         $$a_{ml} = \arg\max_a \sum_{t = t_1 + 1}^{t_2} P_a(o_t \mid o_{t-1})$$
      3. Recognize state transitions
         $$P(s_t \mid s_{t-1}) = T(s_{t-1}, a_{ml}, s_t)$$
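The voting scheme of the recognition slide can be illustrated with a short sketch. Several details are assumptions rather than facts taken from the slides: the vote is normalized by the remaining progress `1 - fp_prev`, a skill counts as finished once its progress exceeds `1 - eps`, votes are summed over the sequence, and the skill names and progress traces are made up for the example.

```python
from typing import Dict, List


def vote(fp_prev: float, fp_now: float, eps: float = 0.05) -> float:
    """Vote of one skill for an observed change, clipped to [0, 1].

    fp_prev and fp_now are the skill's progress values for two consecutive
    observations. Normalizing by the remaining progress favors goals that
    are already near (assumed reading of "prefer nearer goals")."""
    if fp_prev > 1.0 - eps:
        return 0.0  # the skill "seems to have finished": ignore it
    gain = (fp_now - fp_prev) / (1.0 - fp_prev)
    return min(max(gain, 0.0), 1.0)


def most_likely_action(traces: Dict[str, List[float]], eps: float = 0.05) -> str:
    """Return the skill whose votes, summed over the observed sequence, are highest.

    traces maps a (hypothetical) skill name to its progress values on the
    observed sequence."""
    scores = {
        action: sum(vote(p, q, eps) for p, q in zip(trace, trace[1:]))
        for action, trace in traces.items()
    }
    return max(scores, key=scores.get)


observed = {
    "approach_ball": [0.0, 0.3, 0.6],   # progress rises: plausible explanation
    "approach_goal": [0.5, 0.45, 0.4],  # progress falls: votes are clipped to 0
}
recognized = most_likely_action(observed)
```

The recognized action would then select which of the imitator's own transition probabilities $T(s, a, s')$ are used in the state recognition step.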
  • 37. Evaluation: Recognition scenario: description
      - Demonstrator (right robot) has to transport the yellow ball onto the base
      - Imitator (left robot) tries to “understand” its observations
      - Two scenarios:
        1. Imitator is only able to drive (and thereby push the ball)
        2. Imitator is also able to lift the ball
      [Figure: demonstrator lifting the ball]
  • 39. Evaluation: Recognition scenario: results
      1. Without lifting capabilities
         [Plot: distance [m] over time, segments labelled “move to ball”, “???”, “move to goal”]
         - Recognized “drive to ball” (B) and “drive to goal” (G) correctly
         - Detected “missing behavior” in between
      2. With lifting capabilities
         [Plot: distance [m] over time, segments labelled “move to ball”, “lift the ball”, “move to goal”]
         - Recognized “drive to ball” (B), “lift the ball” (L), and “drive to goal” (G) correctly
  • 40. Evaluation: Multi-robot scenario “three bases”
      - Task: transport objects to goal bases
      - Reward for reaching an object: 10
      - Goal bases provide different rewards
      - State space consists of
        - distance to closest object
        - distance of closest object to closest goal
        - ID of closest goal
  • 42. Conclusion: Objectives achieved in this thesis
      1. Combination of learning and imitation
      2. Non-obtrusive recognition and learning of observed behavior
      3. Support for heterogeneous robot groups
      Thank you for your attention!
  • 43. Overview of the backup slides
      Architecture
        - State of the art
        - Overview
        - Layer interaction
        - Motivation layer: Excitation, Prioritizing goals
        - Strategy layer: State abstraction, Heuristics, Policy, Sample frequency, Strategy example
        - Skill layer: Overview of the approach (explore, exploit), Skill manager, Model manager, Error minimizer, Configuration, Skill example
      Imitation in robot groups
        - Overview of the approach
        - Choice of the imitatee
        - Recognizing behavior: Viterbi, Interpreting observed behavior, Recognition example
        - Integrating recognized behavior
        - Evaluation
          - CTF with three bases: Performance, State abstraction, Group homogeneity
          - CTF with five bases: Performance, State abstraction, Group homogeneity
      Choice of the imitatee
        - Affordance detection
        - Affordance network generation
        - Comparing ANs
        - Evaluation: Parameterization of the environment, Robustness experiment, Clustering experiment
  • 44. State of the art
      [Takahashi et al., 2008] use imitation to learn robotic soccer behaviors (approaching, shooting a ball)
        - combines learning with imitation
        - requires the robot group to stop whenever a robot imitates
        - needs multiple presentations of the same behavior
        - needs sufficient prior knowledge of the task to imitate
      [Priesterjahn, 2008] evolves game bots with similar performance as the human player
      [Inamura et al., 2003] combine top-down teaching with bottom-up learning from the robot’s side
  • 45. State of the art
      [Priesterjahn, 2008] evolves game bots with similar performance as the human player
        - shows that imitation-based adaptation is able to outperform the evolution-only approach
        - targeted to computer game scenarios, not stochastic real-world applications
        - assumes group homogeneity
        [Figure: “The Rule-Based Operation Cycle of an Agent”]
  • 46. State of the art
      [Inamura et al., 2003] combine top-down teaching with bottom-up learning from the robot’s side
        - exclusive approach (cannot be combined with other learning techniques)
        - HMM is learned and then fixed throughout the robot’s lifetime
        [Figures: “Motion capturing system: motion for learning data”; “A result of motion generation on a humanoid robot”]
  • 47. Layer interaction
      [Sequence diagram between clock, motivation layer, strategy layer, skill layer, perception and action. On a “next strategy step” event, the motivation layer requests $I_m$, receives the processed perception and sets the next motivation; the strategy layer requests $I_s$, receives the processed perception and determines the next strategy step, which sets the next skill. On each “next skill step” event, the skill layer requests $I_a$, receives the processed perception, calculates the best actuator command and sets the next low-level action.]
      (1) Strategy step is triggered
        - Determining the current motivation and the corresponding next strategy action.
        - The strategy layer requires the most current motivation as feedback regarding its last chosen action → both are synchronous.
      (2) Skill step is triggered
        - Strategy step does not have to be finished yet
        - The skill layer simply executes according to the action most recently delivered by the strategy layer
      (3), (4) Strategy step has finished
        - It signals the next action to execute to the skill layer.
        - Subsequent skill steps then perform this action accordingly.
  • 48. Motivation layer: Motivation system example
      [Figure: drive space spanned by drive 1, drive 2 and drive 3, showing the current drive state, the current motivation, the well-being region, and the shortest vector $\mu_p$ to the desired drive area, used for prioritization]
      - The current motivation $\mu$ is the vector to the current drive state, dependent on
        - time
        - perception
      - Each drive measures the status of accomplishing a sub-goal (0 = fully accomplished)
      - A drive $i$ is called satisfied (goal achieved) if the corresponding motivation is below its threshold: $\mu_i \le \mu_i^\theta$
  • 49. A sub-goal subjected to an excitation
      [Plot: excitation (0 to 1) over time $t$, with the threshold triggering behavior and the well-being region]
      - Excitation describes the force to which the current drive state is subjected.
      - By specifying it dependent on the perception and on the internal state of the robot, the user is “programming” the final behavior.
  • 50. Prioritizing goals
      - At each time step, the motivation layer provides the current motivation vector to the strategy layer.
      - With $\mu_p$ the strategy layer prioritizes which of the sub-goals are to be handled first:
        $$\mu_p = \begin{pmatrix} \max(0, \mu_1 - \mu_1^\theta) \\ \max(0, \mu_2 - \mu_2^\theta) \\ \vdots \\ \max(0, \mu_n - \mu_n^\theta) \end{pmatrix}$$
      - Different drives can be prioritized by means of an according scaling → modeling a hierarchy of needs
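As a small worked example of the prioritization vector, the following sketch computes $\mu_p$ for three drives. The drive values, the thresholds and the multiplicative form of the scaling are assumptions chosen for illustration; the slides only state that drives can be prioritized by scaling.

```python
from typing import List, Optional, Sequence


def prioritized_motivation(mu: Sequence[float], mu_theta: Sequence[float],
                           scale: Optional[Sequence[float]] = None) -> List[float]:
    """mu_p: per-drive excess of the motivation over its threshold.

    A satisfied drive (mu_i <= mu_theta_i) contributes 0. The optional
    scale factors are a hypothetical way to weight drives against each other."""
    if scale is None:
        scale = [1.0] * len(mu)
    return [w * max(0.0, m - th) for m, th, w in zip(mu, mu_theta, scale)]


def most_urgent_drive(mu_p: Sequence[float]) -> int:
    """Index of the drive that should be handled first."""
    return max(range(len(mu_p)), key=lambda i: mu_p[i])


mu_p = prioritized_motivation(mu=[0.8, 0.2, 0.5], mu_theta=[0.3, 0.3, 0.5])
first = most_urgent_drive(mu_p)
```

Here only the first drive exceeds its threshold, so it is the only non-zero component of `mu_p` and is handled first.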
  • 51. Strategy layer: Sample frequency
      A new interaction is recorded as soon as one of the following conditions holds (with $t'$ the time of the previous interaction):
      - Sufficiently different perception (measured by some scenario-specific distance metric $d$): $d(o_t, o_{t'}) \ge \theta_o$
      - Sufficiently interesting motivation change: $|\mu_t - \mu_{t'}| \ge \theta_r$
      - Enough time has passed: $t - t' \ge \theta_t$
      $\theta_o$, $\theta_r$, and $\theta_t$ are application-specific and have to be determined empirically.
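The three sampling conditions combine into a single predicate. The sketch below assumes the Euclidean distance both as the scenario-specific metric $d$ and for the motivation change; the function name, argument order and threshold values in the example are hypothetical.

```python
import math
from typing import Sequence


def should_sample(o_prev: Sequence[float], o_now: Sequence[float],
                  mu_prev: Sequence[float], mu_now: Sequence[float],
                  t_prev: float, t_now: float,
                  theta_o: float, theta_r: float, theta_t: float) -> bool:
    """True if a new interaction should be recorded.

    Euclidean distance stands in for the scenario-specific metric d."""
    perception_changed = math.dist(o_prev, o_now) >= theta_o
    motivation_changed = math.dist(mu_prev, mu_now) >= theta_r
    enough_time_passed = (t_now - t_prev) >= theta_t
    return perception_changed or motivation_changed or enough_time_passed


# Illustrative thresholds only; the slides say they are determined empirically.
triggered = should_sample([0.0, 0.0], [0.3, 0.4], [0.5], [0.5],
                          t_prev=10.0, t_now=10.2,
                          theta_o=0.4, theta_r=0.1, theta_t=1.0)
```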
  • 52. Strategy example
      [Figure: two grid-world panels with start S and goal G; the cells are labelled with their coordinates, from (1, 1) to (6, 6)]
  • 53. Skill layer
      1. discover and learn a set of skills that are useful to the strategy layer → ground symbols $\in A$
      2. execute them when requested and optimize at runtime
      [Diagram, exploration mode (training mode): the skill manager explores actions $O$, creates skills and notifies the strategy layer of new skills; the model manager creates models from the perception $I_a$; the error minimizer fetches the skills]
      [Diagram, exploitation mode (execution mode): the strategy layer requests a skill; the skill manager sets the current skill; the model manager updates the models; the error minimizer fetches the current skill and the models and outputs the action $O$]
  • 54. Skill layer: Data flow in exploration mode
      [Diagram (training mode): the skill manager explores actions $O$, creates skills and notifies the strategy layer of new skills; the model manager creates models from the perception $I_a$; the error minimizer fetches the skills]
  • 55. Skill layer: Data flow in exploitation mode
      [Diagram (execution mode): the strategy layer requests a skill; the skill manager sets the current skill; the model manager updates the models from the perception $I_a$; the error minimizer fetches the current skill and the models and outputs the action $O$]
  • 56. Skill definition
      - extraction function $f_{ext} : I_a \to \mathbb{R}$ extracts information from a perception $I(t) \in I_a$
      - control function $f_c : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ associates an error value to the tuple $(v_{t_i}, v_{t_j})$
        - decrease: $f_c(v_{t_i}, v_{t_j}) = |v_{t_j}|$
        - increase: $f_c(v_{t_i}, v_{t_j}) = \frac{1}{|v_{t_j}|}$
        - keep value: $f_c(v_{t_i}, v_{t_j})$ penalizes the deviation between $v_{t_i}$ and $v_{t_j}$, with a parameter $\delta$
      - error function $f_e : I_a \times I_a \to \mathbb{R}$ assigns an error value to a perception pair
      - progress function $f_p : I_a \times I_a \to [0, 1]$ measures a skill’s progress between two time points
  • 57. Skill manager
      - exploration phase
        - generate skills that enable the robot to control the perceived properties
        - assign a priority to each skill dependent on its execution priority
        - determine the skills the robot can reliably perform and notify them as new skills to the strategy layer
      - exploitation phase
        - manage the execution of requested skills
      [Diagrams: data flow in training mode and in execution mode, as on the skill layer slides]
  • 58. Model manager
      - creating prediction models for each perceived property
        - a prediction model is the tuple $(id_p, \tilde{S}, \tilde{M}, m)$
          - $id_p \in ID_p$: perception feature to be predicted
          - $\tilde{S} \subseteq ID_o \times ID_p$: subset of the perceptual features
          - $\tilde{M} \subseteq O$: subset of the actuators to control
          - $m : \mathbb{R}^{|\tilde{S}| + |\tilde{M}|} \to \mathbb{R}$ predicts the value of the perceptual feature $id_p$ at the next perception, given the values of $\tilde{S}$ and $\tilde{M}$ as input
        - $m$ in experiments: Poly, RBF
      - updating prediction models to reflect new experiences
      - scoring each model dependent on its prediction accuracy: $score(m)$ is computed over $n$ samples from the deviations between the prediction $m(S(t_i), M(t_i))$ and the value observed at the following interaction
      [Diagrams: data flow in training mode and in execution mode, as on the skill layer slides]
  • 59. Error minimizer
      1. $I_c(t)$: only those perceptual features on which the error functions of the current skill $s$ depend, at current time $t$
      2. Estimate the next perception, $I_c^M(t^*)$, dependent on the motor action $M$ as predicted by $m_{best} = \arg\max_m score(m)$:
         $$I_c^M(t^*) = \{ m_{best}^j(I_c(t), M(t)) \mid p_j \in I_c(t) \}$$
      3. For each error function $f_{e_k}$: calculate the expected next error $e_k^M(t^*)$, with $I_c(t_i)$ being the perception when the skill has been started:
         $$e_k^M(t^*) = f_{e_k}(I_c(t_i), I_c^M(t^*))$$
      4. Determine the best actuator command $M_{next}(t)$ by finding the one that minimizes the accumulated expected error:
         $$M_{next}(t) = \arg\min_M \sum_{k=1}^{N} e_k^M(t^*)$$
      $t^*$ is the time point of the next interaction after time $t$
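The four steps of the error minimizer can be sketched as a search over candidate actuator commands. The finite candidate set, the toy prediction model and the feature name `d_ball` are assumptions made for this illustration; the slides do not say how the minimization is carried out.

```python
from typing import Callable, Dict, Iterable, Sequence

Perception = Dict[str, float]  # hypothetical perception format
ErrorFn = Callable[[Perception, Perception], float]
Model = Callable[[Perception, float], Perception]  # predicts the next perception


def best_command(candidates: Iterable[float], predict_next: Model,
                 error_fns: Sequence[ErrorFn],
                 start: Perception, current: Perception) -> float:
    """Pick the command whose predicted next perception minimizes the
    accumulated expected error of the current skill."""
    def accumulated_error(command: float) -> float:
        predicted = predict_next(current, command)               # step 2
        return sum(f_e(start, predicted) for f_e in error_fns)   # steps 3 and 4
    return min(candidates, key=accumulated_error)


def toy_model(perception: Perception, speed: float) -> Perception:
    """Toy stand-in for a learned model: driving forward reduces the ball distance."""
    return {"d_ball": perception["d_ball"] - 0.5 * speed}


def ball_distance_error(start: Perception, now: Perception) -> float:
    return abs(now["d_ball"])


command = best_command([-1.0, 0.0, 1.0], toy_model, [ball_distance_error],
                       start={"d_ball": 2.0}, current={"d_ball": 2.0})
```

With a continuous actuator space, the `min` over candidates would be replaced by a numerical optimizer; the structure of the objective stays the same.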
  • 60. Skill layer configuration
      Greater universality leads to a bigger exploration space. It is wise to limit the exploration space by specifying non-changing parameters beforehand. This can be achieved by configuring the following parameters:
      - Degrees of freedom specify the number of actuators the skill layer has to control.
      - Extraction functions define the language that can be used to specify the error functions.
      - Control functions specify the functions that the error minimizer will minimize by means of the error functions.
      - Regression models are used by the model manager to build predictions for the environment interaction. A regression model consists of two methods: one that fits a model to an experience trace and one that predicts the value of the modeled property.
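The two-method regression model interface described above might look as follows. The class name, the method names `fit` and `predict`, the one-dimensional input and the least-squares line are assumptions for this sketch; the slides name polynomial and RBF models as the ones used in the experiments.

```python
from typing import List, Tuple


class LinearModel:
    """Minimal regression model with the two methods named on the slide.

    A least-squares line is used as a simple stand-in for the polynomial
    and RBF models mentioned in the slides."""

    def __init__(self) -> None:
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, trace: List[Tuple[float, float]]) -> None:
        """Fit the model to an experience trace of (input, observed value) pairs."""
        n = len(trace)
        mean_x = sum(x for x, _ in trace) / n
        mean_y = sum(y for _, y in trace) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in trace)
        if var_x == 0.0:
            # All inputs identical: fall back to predicting the mean.
            self.slope, self.intercept = 0.0, mean_y
            return
        cov = sum((x - mean_x) * (y - mean_y) for x, y in trace)
        self.slope = cov / var_x
        self.intercept = mean_y - self.slope * mean_x

    def predict(self, x: float) -> float:
        """Predict the value of the modeled property for input x."""
        return self.slope * x + self.intercept


model = LinearModel()
model.fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
prediction = model.predict(3.0)
```

Any model that offers these two methods could be plugged into the model manager in the same way, which is what keeps the configuration independent of the regression technique.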
  • 61. Skill example: “Minimize angle to object” learned with radial basis functions
      [Figure, left: controlling speed dependent on angle and distance to the object]
      [Figure, right: controlling rotational speed dependent on angle and distance to the object]
  • 62. Imitation: Viterbi [Viterbi, 1967]
      Problem description
        - Given the observation sequence $o^N = \langle o_1, o_2, \dots, o_N \rangle$ ($o_i \in \mathbb{R}^d$)
        - Find the most likely hidden state sequence $s^N = \langle s_1, s_2, \dots, s_N \rangle$ ($s_i \in S$)
      Approach
        - Maximizing probability $P(s^N \mid o^N)$: $s^{N*} = \arg\max_{s^N} P(s^N \mid o^N)$, by recursively calculating the probability $V(s, t) = \max_{s_1 \dots s_{t-1}} P(o^t, s_1 \dots s_{t-1}, s_t = s)$ that $s \in S$ is the observed hidden state at time $t$ given the observations $o^t$:
          - $V(s, 1) = P(o_1 \mid s_1 = s) \, P(s_1 = s) \quad \forall s \in S$
          - $V(s, t) = P(o_t \mid s_t = s) \max_{s'} \left[ P(s_t = s \mid s_{t-1} = s') \, V(s', t-1) \right]$
          - $\phi(s, t) = \arg\max_{s'} \left[ P(s_t = s \mid s_{t-1} = s') \, V(s', t-1) \right]$
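The recursion for $V(s, t)$ and the backpointers $\phi(s, t)$ translate directly into code. The sketch below is a plain textbook Viterbi decoder, not the adapted recognition variant of the thesis; states are indexed by integers, and the emission likelihoods $P(o_t \mid s_t = s)$ are assumed to be precomputed per time step so that real-valued observations need no special handling. The numbers in the example are made up.

```python
from typing import List, Sequence


def viterbi(prior: Sequence[float],
            transition: Sequence[Sequence[float]],
            emission: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden state sequence.

    prior[s]          = P(s_1 = s)
    transition[sp][s] = P(s_t = s | s_{t-1} = sp)
    emission[t][s]    = P(o_t | s_t = s), one row per observation
    """
    n_states = len(prior)
    # Initialization: V(s, 1) = P(o_1 | s_1 = s) P(s_1 = s)
    v = [emission[0][s] * prior[s] for s in range(n_states)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emission)):
        new_v, phi = [], []
        for s in range(n_states):
            # phi(s, t): best predecessor of state s at time t
            best_prev = max(range(n_states),
                            key=lambda sp: transition[sp][s] * v[sp])
            phi.append(best_prev)
            new_v.append(emission[t][s] * transition[best_prev][s] * v[best_prev])
        v = new_v
        backpointers.append(phi)
    # Backtracking from the most likely final state
    state = max(range(n_states), key=lambda s: v[s])
    path = [state]
    for phi in reversed(backpointers):
        state = phi[state]
        path.append(state)
    return path[::-1]


path = viterbi(prior=[0.5, 0.5],
               transition=[[0.9, 0.1], [0.1, 0.9]],
               emission=[[0.9, 0.2], [0.8, 0.2], [0.05, 0.9]])
```

For long sequences the products underflow, so a practical implementation would typically work with log-probabilities and sums instead.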
  • 65. Imitation: Recognition
      Problem description
        - Given the observation sequence $o^N = \langle o_1, o_2, \dots, o_N \rangle$ ($o_i \in \mathbb{R}^d$)
        - Find the most likely behavior sequence ($t \in \mathbb{R}$, $o \in \mathbb{R}^d$, $s \in S$, $a \in A$)
          $\Gamma = \langle \dots, ((t_k, o_k, s_k), a_k, (t_{k+1}, o_{k+1}, s_{k+1})), \dots \rangle$
      Approach
        - Maximizing probability $P(s^n, a^n \mid o^N)$, $n \le N$
        - Adapting $V(s, 1)$ and $V(s, t)$:
          - Use own state and action space for $S$ and $A$
          - Support bootstrapping of probabilities
          - Let actions recognize themselves → technical realization of the mirror neuron system