In these slides, we present a common gesture and speech production framework for both virtual agents (ECAs, IVAs, virtual humans) and physical agents such as humanoid robots. The framework is designed for different embodiments, so that its processes are independent of any specific agent.
1. A Common Gesture and Speech Production Framework for Virtual and Physical Agents
Quoc Anh Le - Jing Huang - Catherine Pelachaud
CNRS, LTCI
Telecom-ParisTech, France
Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA
2. Introduction
Motivations
• Similar approaches between virtual agents and humanoid robots
• Limits of existing systems: agent-dependent
Objectives
• A common co-verbal gesture generation framework for both virtual and physical agents
Methodology
• Based on the GRETA system
• Uses:
- the same representation languages
- the same algorithm for selecting and planning gestures
- different algorithms for creating the animation
3. Architecture Overview
[Architecture diagram: input data (text, audio, video, etc.) feeds three common modules, the Intent Planner, Behavior Planner and Behavior Realizer, which exchange FML-APML and BML messages and draw on an intent lexicon and behavior lexicons (baselines and gestuaries for Nao and Greta). The resulting keyframes are sent through the ActiveMQ messaging central system to agent-specific Animation Realizer modules: for Greta, FAP-BAP values go to the FAP-BAP player; for Nao, joint values go to the robot's built-in proprietary procedures. Each agent has its own animation lexicon.]
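To make the common/specific split concrete, here is a minimal sketch in Python of how the common Behavior Realizer could hand keyframes to agent-specific Animation Realizers. The class and function names (AnimationRealizer, behavior_realizer_output, etc.) are illustrative assumptions, not the actual GRETA or Nao APIs.

# Minimal sketch of the common / agent-specific split (hypothetical names, not the real APIs).
from abc import ABC, abstractmethod
from typing import List

class Keyframe:
    def __init__(self, time: float, description: str):
        self.time = time                  # absolute time in seconds
        self.description = description    # symbolic description of the configuration

class AnimationRealizer(ABC):
    """Agent-specific module: turns symbolic keyframes into animation parameters."""
    @abstractmethod
    def realize(self, keyframes: List[Keyframe]) -> None: ...

class GretaAnimationRealizer(AnimationRealizer):
    def realize(self, keyframes):
        # Would produce FAP-BAP values for the Greta player.
        print(f"Greta: computing FAP-BAP values for {len(keyframes)} keyframes")

class NaoAnimationRealizer(AnimationRealizer):
    def realize(self, keyframes):
        # Would produce joint values for Nao's built-in proprietary procedures.
        print(f"Nao: computing joint values for {len(keyframes)} keyframes")

def behavior_realizer_output(keyframes: List[Keyframe], realizers: List[AnimationRealizer]):
    # In the real system the keyframes travel over the ActiveMQ messaging central system;
    # here they are simply dispatched to every registered realizer.
    for realizer in realizers:
        realizer.realize(keyframes)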
4. Behavior Realizer
[The same architecture diagram as on the previous slide, here highlighting the Behavior Realizer common module, which receives BML from the Behavior Planner and produces keyframes.]
5. Behavior Realizer: Outline
Processes common to all agents
1. Create the gesture from the agent's gestuary
2. Schedule the timing of the gesture phases
3. Generate keyframes: pairs (absolute time, symbolic description of the hand configuration at this time); see the sketch after this list
Different databases
For Nao
- Gestuary (for instance, pointing with a fully stretched arm)
- Velocity profile (empirically determined from Nao)
For Greta
- Gestuary (for instance, pointing with one finger)
- Velocity profile (empirically determined from real humans)
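A minimal sketch of steps 1-3 above, in Python. The gestuary data format and the function names are assumptions for illustration, not the actual GRETA code: a gesture template is looked up in the gestuary, its phases are scheduled around the stroke, and one keyframe (absolute time, symbolic description) is emitted per phase boundary.

# Sketch of the three Behavior Realizer steps (hypothetical data format, not GRETA code).
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Keyframe:
    time: float        # absolute time in seconds
    hand_shape: str    # symbolic hand configuration at this time
    arm_position: str  # symbolic arm position, e.g. "XEP YUpperEP ZNear"

# 1. Gestuary: symbolic gesture templates; each phase has an offset from the stroke start.
Gestuary = Dict[str, List[Tuple[str, str, float]]]  # id -> [(hand_shape, arm_position, offset)]

def realize(gesture_id: str, stroke_start: float, gestuary: Gestuary) -> List[Keyframe]:
    template = gestuary[gesture_id]
    keyframes = []
    for hand_shape, arm_position, offset in template:
        # 2. Schedule the phase boundary relative to the planned stroke start.
        # 3. Emit the keyframe: (absolute time, symbolic description of the configuration).
        keyframes.append(Keyframe(stroke_start + offset, hand_shape, arm_position))
    return keyframes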
7. BR: Synchronization with speech
Algorithm
• Compute the preparation phase
• Do not perform gesture i if there is not enough time (strokeEnd(i-1) + duration > strokeStart(i))
• Add a hold phase to fit the planned gesture duration
• Co-articulation between consecutive gestures:
- If there is enough time, add a retraction phase (i.e. go back to the rest position)
- Otherwise, go from the end of the stroke directly to the preparation phase of the next gesture
[Timeline sketches illustrate both cases, marking gesture start/end and stroke start (S-start) / stroke end (S-end).]
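The scheduling decision above can be sketched as follows in Python. The variable names (prep_duration, retraction_duration) and the exact structure are assumptions for illustration, not the published algorithm's code.

# Sketch of the synchronization rules (hypothetical names).
def schedule(prev_stroke_end, stroke_start, stroke_end,
             prep_duration, retraction_duration, next_stroke_start=None):
    # Skip the gesture if there is not enough time for its preparation phase.
    if prev_stroke_end + prep_duration > stroke_start:
        return None
    phases = {"preparation_start": stroke_start - prep_duration,
              "stroke_start": stroke_start,
              "stroke_end": stroke_end}
    if next_stroke_start is None or stroke_end + retraction_duration < next_stroke_start:
        # Enough time: retract to the rest position.
        phases["retraction_end"] = stroke_end + retraction_duration
    else:
        # Co-articulation: hold the stroke end position until the next preparation starts.
        phases["hold_end"] = next_stroke_start
    return phases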
8. BR: Velocity profiles
Gesture velocity
• Predict a movement duration using Fitts' law:
Movement Time = a + b * log2(Distance + 1)
• Thresholds on maximal speeds (empirically determined)
• The stroke phase differs from the other phases in velocity and acceleration (Quek, 1995)
Add expressivity
• Temporal extent (TMP): modulate the duration of the whole gesture
=> change the coefficients of Fitts' law (see the sketch below)
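A small sketch of how a movement duration could be predicted with Fitts' law and modulated by the temporal-extent (TMP) parameter. The coefficients a and b and the TMP mapping are illustrative assumptions, not the values used in the system.

import math

def movement_time(distance: float, tmp: float = 0.0, a: float = 0.1, b: float = 0.15) -> float:
    """Fitts'-law prediction MT = a + b * log2(distance + 1), with the TMP
    expressivity parameter scaling the coefficients: higher TMP -> faster gesture."""
    scale = 1.0 - 0.5 * tmp   # illustrative mapping of TMP to a coefficient change
    return scale * (a + b * math.log2(distance + 1.0))

# Example: the same reach with neutral vs. high temporal extent.
print(movement_time(0.4))             # neutral
print(movement_time(0.4, tmp=1.0))    # faster gesture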
10. Animation Realizer
[The same architecture diagram, here highlighting the agent-specific Animation Realizer modules: the Greta Animation Realizer outputs FAP-BAP values, the Nao Animation Realizer outputs joint values, each drawing on its own animation lexicon.]
11. Implemented expressivity parameters
EXP | Definition | Nao | Greta
TMP (temporal extent) | Velocity of movement | Change coefficients of Fitts' law | Change coefficients of Fitts' law
SPC (spatial extent) | Amplitude of movement | Limited to predefined key positions | Change gesture space scales
PWR (power) | Acceleration of movement | Modulate stroke duration | Modulate stroke acceleration
REP (repetition) | Number of stroke repetitions | Yes | Yes
FLD (fluidity) | Smoothness and continuity | No | No
OPN (openness) | Spatial extent relative to the body | No | Elbow swivel angle
TEN (tension) | Muscular tension | No | No
Create animation parameters
• Joint values for Nao
• BAP values for Greta
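One way to carry the parameters listed in the table above through the pipeline is a simple container. The sketch below (Python) is an assumption about the representation; the value range and field comments are illustrative, and the table above describes what each agent actually does with the values.

# Illustrative container for the expressivity parameters (hypothetical representation).
from dataclasses import dataclass

@dataclass
class Expressivity:
    tmp: float = 0.0  # temporal extent: velocity of movement (Fitts'-law coefficients)
    spc: float = 0.0  # spatial extent: amplitude of movement
    pwr: float = 0.0  # power: acceleration of the stroke
    rep: int = 0      # repetition: number of stroke repetitions
    fld: float = 0.0  # fluidity: smoothness and continuity (not yet implemented)
    opn: float = 0.0  # openness: spatial extent relative to the body (Greta: elbow swivel)
    ten: float = 0.0  # tension: muscular tension (not yet implemented)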
12. Create animation parameters
Discretization of the gestural space of McNeill (1992)
Each symbolic position is translated into concrete joint values of the agent (for instance, the 6 joints of Nao in the table below)
Code ArmX ArmY ArmZ Joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand)
000 XEP YUpperEP ZNear (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
001 XEP YUpperEP ZMiddle (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
002 XEP YUpperEP ZFar (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
010 XEP YUpperP ZNear (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
... ... ... ... ...
Translate symbolic keyframes into joint values
Animation is obtained by interpolating between joint values:
- for Nao, with the robot's built-in proprietary procedures
- for Greta, with Slerp (spherical linear interpolation) and time warping (easing in/out functions)
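A minimal sketch of the interpolation used on the Greta side as described above: spherical linear interpolation (Slerp) between two joint quaternions with an ease-in/ease-out time-warping function. The quaternion representation and the specific easing curve below are assumptions, not Greta's actual implementation.

import math

def ease_in_out(t: float) -> float:
    # Smoothstep-like time warping: slow start and end, faster in the middle.
    return t * t * (3.0 - 2.0 * t)

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                       # take the shorter arc
        q1, dot = [-c for c in q1], -dot
    if dot > 0.9995:                    # nearly parallel: fall back to linear interpolation
        return [a + t * (b - a) for a, b in zip(q0, q1)]
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(q0, q1)]

def interpolate_joint(q_start, q_end, t):
    # Combine Slerp with the easing function to warp time between two keyframes.
    return slerp(q_start, q_end, ease_in_out(t))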
13. Greta: Full Body IK
Torso IK
• Analytic method: arm to torso
• Torso target depends on the hand positions
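A minimal sketch of the idea in Python: the torso target is derived analytically from the midpoint of the two hand targets and decomposed into a horizontal (yaw) and a vertical (pitch) component. Axis conventions, the gain factor and the function name are assumptions, not Greta's actual IK code.

import math

def torso_target(left_hand, right_hand, shoulder_center, gain=0.2):
    """Hypothetical sketch: derive a torso orientation from the midpoint of both
    hand targets, decomposed into a horizontal (yaw) and a vertical (pitch) angle."""
    # Midpoint of the two hand targets, relative to the shoulder center.
    mx = (left_hand[0] + right_hand[0]) / 2.0 - shoulder_center[0]
    my = (left_hand[1] + right_hand[1]) / 2.0 - shoulder_center[1]
    mz = (left_hand[2] + right_hand[2]) / 2.0 - shoulder_center[2]
    # Analytic decomposition: yaw around the vertical axis, pitch around the lateral axis.
    yaw = math.atan2(mx, mz)
    pitch = -math.atan2(my, math.hypot(mx, mz))
    # Only a fraction of the rotation is applied to the torso (the rest stays on the arm).
    return gain * yaw, gain * pitch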
16. Perceptive Evaluation
Objective
• Evaluate how the robot's gestures are perceived by human users
Procedure
• Participants (63 French speakers) rated videos of a Nao storyteller
• Versions were displayed to participants in random order:
- gestures with expressivity vs. gestures without expressivity
- gesture-speech synchronization vs. gesture-speech asynchrony
Results (using the ANOVA method)
• Synchronization:
- F(1, 124) = 4.94, p < .05
- 76% agreed that gestures were synchronized with speech for the synchronized version
• Expressivity:
- F(1, 124) = 4.43, p < .05
- 70% agreed that gestures were expressive for the expressive version
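For illustration only, a one-way ANOVA of this kind can be computed as below in Python with scipy. The ratings here are made-up placeholders, not the study's data.

# Illustrative only: how a one-way ANOVA like the one reported above can be run.
from scipy.stats import f_oneway

sync_ratings  = [4, 5, 4, 3, 5, 4, 4, 5]   # hypothetical ratings of the synchronized videos
async_ratings = [3, 2, 4, 3, 3, 2, 4, 3]   # hypothetical ratings of the asynchronous videos

f_value, p_value = f_oneway(sync_ratings, async_ratings)
print(f"F = {f_value:.2f}, p = {p_value:.3f}")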
17. State of the art
Most similar work: Salem et al. (2012)
• Same idea (based on the existing Max virtual agent system)
Main differences:
• Our system: re-designed GRETA as a common framework
• Salem et al.'s system: adapted Max's ACE to the ASIMO robot
Features | Our model | Salem et al.'s system
Gesture production | Online, from templates, regardless of a specific domain | Automatically generated from a trained, domain-specific data corpus
Gesture shapes | Agent-specific parameter | Original for Max, mapped to ASIMO configurations
Gesture timing | Agent-specific parameter | Original for Max, adapted to ASIMO by feedback
Expressivity | Yes | No
Synchronization | Adapt gesture to speech | Cross-modal adjustment
18. Future work
Short-term plan
• Human-like gestures: enhance velocity profiles
• Expressivity: implement fluidity and tension
Long-term plan
• Feedback mechanism
• Study of the coherence between consecutive gestures in a G-Unit (Kendon, 2004)
Editor's Notes
Give a description of the keyframes: what information do they contain?
Add the missing definitions. "Power": acceleration simulated through Slerp (frame interpolation) or trajectory interpolation, using time-variation functions (easing in/out functions). Expressive posture: volume editing; with the Power parameter, the relative torso rotation varies with time and with the gesture target positions, due to inertia. Expressive animated sequence: sequential editing; "fluidity" and "tension" using TCB splines and noise functions (for the trajectory).
Joint rotation interpolation: use Slerp (spherical linear interpolation) with time warping (easing in/out functions). Definition of trajectory parameters: various trajectory paths (line, circle, spiral, etc.). Expressivity: Kochanek-Bartels splines (TCB splines).
For posture generation, we use forward kinematics (FK): FK defines the initial states and IK retargets the postures. Relative torso movement is first generated by using a potential torso target that depends on the positions of both hand gestures (vt1, vl5). We decompose the torso movement into horizontal and vertical movements depending on the center of the two hand targets, and solve it directly with an analytical method. Head direction is generated by FK, and a trigonometric function is used for gaze. For arm gestures we use a mass-spring solver, which can apply lightweight shoulder movements by defining the arm chain from the sternoclavicular joint to the wrist. This allows us to model passive shoulder movement.
The system of Salem et al. produces gesture parameters that may result in mistimed synchronization with the speech affiliate, due to physical joint velocity limits. In Max, gesture shapes are designed for the virtual agent, hence the mapping solution.
Long-term plan: mutual synchronization, adapting phoneme durations to gestures.