Socially-Sensitive Interfaces: From Offline Studies to Interactive Experiences

  1. Socially-Sensitive Interfaces: From Offline Studies to Interactive Experiences. Elisabeth André, Augsburg University, Germany.
  2. Human-Centered Multimedia. Founded: April 2001. Chair: Elisabeth André. Research topics: human-computer interaction, social signal processing, affective computing, embodied conversational agents, social robotics.
  3. Motivation. There is another level of human communication that is just as important as the spoken message: nonverbal communication. How can we enrich the precise and useful functions of computers with the human ability to shape the meaning of a message through nonverbal signals?
  4. Observation. Social signal processing has developed from a side issue into a major area of research, but the effort undertaken has not translated well into applications. Why is this? [Timeline of ACM MM events, with the years 1998, 2005, 2006, 2009, 2011, 2012, 2013, and 2015 marked: Special Session on Face and Gesture Recognition; Keynote "Honest Signals"; 1st HCM Workshop; 1/3 of Grand Challenge papers on affective computing; 3 workshops on "Social Cues"; Brave New Topic: Affective Multimodal HCI.]
  5. Challenge: Real-Life Applications. A total of 434 publications on SSPNet. 10% include the term "real(-)time" and are related to detection. Only 2% address multimodal detection. Social signal processing in the wild. [Pie chart: slice labels 90%, 3%, 2%, 2%, 1%, 0%, 2%; legend: face (15), gesture (9), speech (9), interaction (8), physiological (2), multimodal (13).] Meta-analysis by J. Wagner.
  6. Organization of the Talk. Analysis of emotional and social signals. Generation of expressive behaviors in virtual agents and robots. Applications of social signal processing and embodied agents: socially sensitive robots; training of presentation skills in job interviews and public speaking; providing information on social context to blind people.
  7. Challenge: Noisy and Corrupted Data. We can rely only on previously seen data, and we have to deal with noisy and corrupted data. [Figure: signal over time up to "now", with noisy and missing segments.]
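
The two constraints on this slide (only past data is available, and samples may be noisy or missing) can be illustrated with a small online filter. This is a minimal sketch rather than anything shown in the talk: the function name `causal_smooth`, the use of exponential smoothing, and the `alpha` parameter are all assumptions made for illustration.

```python
import math
from typing import Iterable, List, Optional


def causal_smooth(samples: Iterable[Optional[float]],
                  alpha: float = 0.5) -> List[Optional[float]]:
    """Smooth a stream using only the samples seen so far.

    A missing sample (None or NaN) does not update the estimate; the
    previous estimate is carried forward instead. Exponential smoothing
    is an illustrative choice, not the method used in the talk.
    """
    estimate: Optional[float] = None
    output: List[Optional[float]] = []
    for x in samples:
        missing = x is None or math.isnan(x)
        if not missing:
            if estimate is None:
                estimate = x  # first valid sample initialises the estimate
            else:
                estimate = alpha * x + (1.0 - alpha) * estimate
        output.append(estimate)
    return output
```

Because the loop never looks ahead, the same function could in principle be applied sample by sample in an online system; until the first valid sample arrives, the output stays `None`.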
  8. Challenge: Non-Prototypical Behaviors. Previous research focused on the analysis of prototypical samples, preferably in pure form. In daily life, we also observe subtle, blended, and suppressed emotions, i.e., non-prototypical emotional displays. [Pictures from Ekman and Friesen’s database of emotional faces.]
  9. Accuracy Drops with Naturalness. Systems developed under laboratory conditions often perform poorly in real-world scenarios. [Chart: accuracy (100%, 80%, 70%) plotted against naturalness (acted, read, WOZ).]
  10. Contextualized Analysis. Improvement through context-sensitive analysis: gender-specific information (Vogt & André 2006); success or failure of the student in tutoring applications (Conati & McLaren 2009); dialogue behavior of the virtual agent or robot (Baur et al. 2014); learning context using (B)LSTM (Metallinou et al. 2014).
  11. Challenge: Multimodal Fusion. A meta-study by D'Mello and Kory on multimodal affect detection shows that the improvement gained from fusion depends on the naturalness of the corpus: more than 10% for acted data but less than 5% for natural data. In natural interaction, people draw on a mixture of strategies to express emotion, which leads to a complementary rather than consistent display of social behaviour. S. K. D'Mello, J. M. Kory: Consistent but modest: a meta-analysis on unimodal and multimodal affect detection accuracies from 30 studies. ICMI 2012: 31-38
  12. Event-Based Fusion. In the case of contradictory cues, fusion methods trust the "right" modality just as often as the "wrong" one. [Figure: correct and incorrect classifications per sample for single modalities and fusion techniques.] J. Wagner, E. André, F. Lingenfelser, J. Kim: Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data. T. Affective Computing 2(4): 206-218 (2011)
  13. Event-Based Fusion. The number of misclassified samples is significantly higher when annotations mismatch. [Chart labelled "Agreement?": Yes 71%, No 29%; further values 62% and 36%.]
  14. Event-Based Fusion. [Diagram: when face and voice both indicate "happy", fusion yields "happy"; when one indicates "neutral" and the other "happy", the fusion result is an open question.]
  15. Synchronous Fusion. Synchronous fusion approaches consider multiple modalities within the same time frame.
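
A minimal sketch of what fusion within a single time frame can look like at the decision level. Averaging per-class scores is only one possible combination rule and is an assumption here, as are the names `fuse_frame` and `decide`; the slide does not prescribe a particular rule.

```python
from typing import Dict, List

Scores = Dict[str, float]  # class label -> score from one modality


def fuse_frame(modalities: List[Scores]) -> Scores:
    """Average the class scores of all modalities observed in one frame."""
    if not modalities:
        return {}
    labels = set().union(*modalities)  # every label seen in any modality
    return {
        label: sum(m.get(label, 0.0) for m in modalities) / len(modalities)
        for label in labels
    }


def decide(scores: Scores) -> str:
    """Pick the label with the highest fused score."""
    return max(scores, key=scores.get)


# Example frame: face and voice classifiers disagree.
face = {"happy": 0.8, "neutral": 0.2}
voice = {"happy": 0.4, "neutral": 0.6}
fused = fuse_frame([face, voice])
```

Note that both modalities must deliver a score for the same frame, which is exactly the restriction that makes this scheme fragile when one channel is delayed or missing.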
  16. Asynchronous Fusion. Asynchronous fusion algorithms refer to past time frames with the help of some kind of memory. They are therefore able to capture the asynchronous nature of the observed modalities.
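
One simple way to give a fusion step "some kind of memory" is to remember the most recent output of each modality and weight it by its age. The sketch below assumes an exponential decay with a hypothetical `time_constant`; the class name `MemoryFusion` and this particular weighting are illustrative and are not taken from the talk.

```python
import math
from typing import Dict, Tuple


class MemoryFusion:
    """Keep the latest scores per modality and weight them by recency."""

    def __init__(self, time_constant: float = 2.0) -> None:
        self.time_constant = time_constant  # seconds; assumed value
        self._memory: Dict[str, Tuple[float, Dict[str, float]]] = {}

    def update(self, modality: str, timestamp: float,
               scores: Dict[str, float]) -> None:
        """Store the newest observation of one modality."""
        self._memory[modality] = (timestamp, dict(scores))

    def fuse(self, now: float) -> Dict[str, float]:
        """Weighted average of remembered scores; older entries count less."""
        totals: Dict[str, float] = {}
        weight_sum = 0.0
        for timestamp, scores in self._memory.values():
            weight = math.exp(-(now - timestamp) / self.time_constant)
            weight_sum += weight
            for label, score in scores.items():
                totals[label] = totals.get(label, 0.0) + weight * score
        if weight_sum == 0.0:
            return {}
        return {label: total / weight_sum for label, total in totals.items()}
```

Modalities can report at different times: a face result from one second ago still contributes, but with less weight than a voice result that has just arrived.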
  17. Event-Based Fusion
  18. Event-Based Fusion. Take into account temporal relationships between channels and learn when to combine information. Move from segmentation-based processing to asynchronous, event-driven approaches. More robust in the case of missing or noisy data. [Figure: fusion output (from + to 0) over time, driven by events such as "haha" and "hehe".] F. Lingenfelser, J. Wagner, E. André, G. McKeown, W. Curran: An Event Driven Fusion Approach for Enjoyment Recognition in Real-time. ACM Multimedia 2014: 377-386
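
The event-driven idea can be sketched as follows: each detected event (for example a laugh in the voice channel) pushes a fused score up, and its influence fades over time. This is a simplified illustration under stated assumptions, not the algorithm from the cited paper: the `Event` fields, the linear decay, and the plain summation are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    modality: str    # e.g. "voice" or "face"
    onset: float     # seconds
    strength: float  # initial contribution to the fused score
    lifetime: float  # seconds until the influence has faded to zero


def event_influence(event: Event, now: float) -> float:
    """Linearly decaying contribution of one event at time `now`."""
    age = now - event.onset
    if age < 0.0 or age >= event.lifetime:
        return 0.0
    return event.strength * (1.0 - age / event.lifetime)


def fused_score(events: Iterable[Event], now: float) -> float:
    """Sum the contributions of all events that are still active."""
    return sum(event_influence(e, now) for e in events)


events = [
    Event("voice", onset=1.0, strength=1.0, lifetime=4.0),  # a laugh
    Event("face", onset=2.0, strength=0.5, lifetime=2.0),   # a smile
]
```

If one channel drops out, it simply stops producing events and the other channel's events still drive the score, which is one way to read the robustness claim on the slide.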
  19. SSI Framework. The Social Signal Interpretation (SSI) framework is an attempt to provide a general architecture for tackling the challenges discussed so far: collecting large and rich multimodal corpora, investigating advanced fusion techniques, and simplifying the development of online systems. Johannes Wagner, Florian Lingenfelser, Tobias Baur, Ionut Damian, Felix Kistler, Elisabeth André: The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. ACM Multimedia 2013: 831-834. SSI is freely available under:
  20. SSI Framework: Mic Cam Xsens Wii Smartex Empatica WAX9 AHM Emotiv Kinect Leap SensingTex Touch Mouse EyeTribe SMI Nexus IOM eHealth Myo
  22. Social Robots
  23. Affective Feedback Loop. [Diagram components: Create Rapport; Mirror Emotional Behavior; Generate Implicit Feedback; Behavior Analysis; Emotion Recognition; Sensors.]
  24. Generation of Facial Expressions. FACS (Facial Action Coding System) can be used to generate and recognize facial expressions. Action Units are used to describe emotional expressions. Seven Action Units were identified for the robotic face (out of 40 Action Units for the human face). Lower face: lip corner puller (AU 12), lip corner depressor (AU 15), and lip opening (AU 25). Upper face: inner brow raiser (AU 1), brow lowerer (AU 4), upper lid raiser (AU 5), and eye closure (AU 43).
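
The seven Action Units listed on this slide can be written down as a small lookup table, with an expression described as a set of AU intensities. The table follows the slide; the function `describe_expression`, the 0-to-1 intensity scale, and the example combination `EXAMPLE_SMILE` are assumptions for illustration and do not reflect the robot's actual control interface.

```python
from typing import Dict

# The seven Action Units named on the slide for the robotic face.
ROBOT_ACTION_UNITS: Dict[int, str] = {
    1: "inner brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
    25: "lip opening",
    43: "eye closure",
}


def describe_expression(intensities: Dict[int, float]) -> Dict[str, float]:
    """Translate AU numbers to names and clamp intensities to [0, 1].

    Raises KeyError for an AU that the robotic face does not provide.
    """
    described: Dict[str, float] = {}
    for au, intensity in intensities.items():
        if au not in ROBOT_ACTION_UNITS:
            raise KeyError(f"AU {au} is not available on the robotic face")
        described[ROBOT_ACTION_UNITS[au]] = min(1.0, max(0.0, intensity))
    return described


# Illustrative combination only; the talk does not give intensity values.
EXAMPLE_SMILE = {12: 0.8, 25: 0.3}
```

Calling `describe_expression(EXAMPLE_SMILE)` yields the named units with their intensities, while an AU outside the seven (for instance AU 6) is rejected.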
  25. Generation of Facial Expressions
  26. Realization of Social Lies for the Hanson Robokind. Social lies constitute a large part of human conversation. Social lies told for reasons of politeness are generally accepted. Humans often show deceptive cues in their nonverbal behavior while lying. Humanoid robots should likewise show deceptive cues when telling social lies.
  27. Deceptive Cues. Deceptive cues in human faces, according to Ekman and colleagues. Micro-expressions: a false emotion is displayed, but the felt emotion is unconsciously expressed for a fraction of a second. Masks: the felt emotion is intentionally masked by a non-corresponding facial expression. Timing: the longer an expression is shown, the more likely it is to accompany a lie. Asymmetry: voluntarily shown facial expressions tend to be displayed asymmetrically.
  28. Real versus Faked Smile. [Images: Pan Am smile (without eyes); real smile.]
  29. Real versus Faked Smile. [Images: asymmetric (Pan Am) smile; real smile.]
  30. Real versus Faked Smile. [Images: smile with blended anger (in the eye region); real smile.]
  31. Results of a Study. Faked smiles were easier to detect from the mouth region. Robots with an asymmetrical smile were rated as significantly less happy than robots with a genuine smile. The results are in line with research on virtual agents (Rehm & André, AAMAS 2005): agents that fake emotions are perceived as less trustworthy and less convincing, and subjects were not able to name reasons for their uneasiness with the deceptive agent. B. Endrass, M. Häring, G. Akila, E. André: Simulating Deceptive Cues of Joy in Humanoid Robots. IVA 2014: 174-177
  32. TARDIS: A Job Interview Training System for Young Adults
  33. Social Feedback Loop. [Diagram components: Improve Social Skills; Implicit Social Response; Generate Feedback; Explicit Hint on Social Behavior; Behavior Analysis; Social Behavior; Sensors.]
  34. Behavior Analysis. Real-time multimodal analysis and classification of social signals: expressivity features (energy, openness, fluidity); facial expressions (smiles, lip biting); speech quality (speech rate, loudness, pitch); engagement and nervousness.
  35. Evaluation. Location: Parkschule in Stadtbergen, Germany. Participants: 20 pupils (10 male, 10 female), 13-16 years old, job seeking; two practitioners. I. Damian, T. Baur, B. Lugrin, P. Gebhard, G. Mehlmann, E. André: Games are Better than Books: In-Situ Comparison of an Interactive Job Interview Game with Conventional Training. AIED 2015: 84-94
  36. Evaluation. Two conditions: TARDIS versus book.
  37. Experimental Setting (Day 1, Day 2, Day 3). Pre-interviews: 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner). Training (control): 10 pupils; task: reading a job interview guide; duration: ~10 min; user experience questionnaires. Training (TARDIS): 10 pupils; task: interaction with TARDIS + NovA; duration: ~10 min; user experience questionnaires. Post-interviews: 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner).
  38. Results. The overall behavior of the pupils who had interacted with TARDIS was rated significantly better by job trainers than that of the pupils who had prepared for the job interview using books. Statistically significant improvements were measured only for the pupils who trained with TARDIS: their use of smiles appeared more appropriate, their use of eye contact appeared more appropriate, and they appeared significantly less nervous.
  39. [...] using the system, pupils seem to be highly motivated and able to learn how to improve their behaviour […] they usually lack such motivation during class [...] transports the experience into the youngster’s own world [...] makes the feedback be much more believable
  40. Augmenting Social Interactions. I. Damian, C.S. Tan, T. Baur, J. Schöning, K. Luyten, E. André: Augmenting Social Interactions: Realtime Behavioural Feedback using Social Signal Processing Techniques. CHI 2015: 565-574
  41. Social Feedback Loop. [Diagram components: Explicit Feedback Generation; Behavior Analysis; Social Behavior; Sensors; Improve Social Skills.]
  42. Social Feedback Loop. [Diagram components: Behavior Analysis; Social Behavior; Explicit Feedback Generation; Haptic Feedback; Sensors; Improve Social Skills.]
  43. Study 1: Quantitative study in a controlled environment. 15 speakers, 2 observers. Task: hold a 5-minute presentation. Two conditions (system on, system off), within subjects, in randomized order, 2 weeks apart. Data acquisition: social signal recordings, questionnaires (speakers and observers).
  44. Objective analysis of recordings: the amount of inappropriate behaviour decreased when the system was on. [Bar chart: % inappropriate behaviour (lower is better), system off versus on.]
  45. Example user reaction: every time the user received negative feedback, he quickly adjusted his openness.
  46. Study 2: Qualitative study in a real presentation setting. 3 speakers, 13 observers. Task: present PhD progress. Data acquisition: semi-structured interview.
  47. [...] once I saw the feedback that I was talking too fast, I tried to adapt
  48. [...] most of the time I did not perceive the system, only when I consciously looked at the feedback
  49. It was a good feeling seeing everything [the icons] green ... it’s like applause, or as if someone looks at you and nods. However, the green lasts longer than a nod [laughs]
  50. Exploring Eye-Tracking-Driven Sonification for the Visually Impaired. Augmented Human, Geneva, 2016.
  51. Feedback Loop. [Diagram components: Provide Information on Social Context; Behavior Analysis; Social Behavior; Feedback Generation; Explicit Audio Feedback; Sensors.]
  52. Facial Expression Sonification. Map facial expressions onto musical instruments: woodblock, piano, guitar, French horn, bells.
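
The mapping idea can be sketched as a plain lookup from a detected expression to an instrument. The instrument names come from the slide, but the slide does not say which expression is assigned to which instrument, so the pairing below is a hypothetical placeholder, as are the expression labels, the function name `sonify`, and the use of detector confidence as volume.

```python
from typing import Optional, Tuple

# Instruments named on the slide.
INSTRUMENTS = ("woodblock", "piano", "guitar", "french horn", "bells")

# HYPOTHETICAL pairing for illustration only; the slide does not state
# which expression maps to which instrument.
EXPRESSION_TO_INSTRUMENT = {
    "neutral": "woodblock",
    "happy": "piano",
    "sad": "guitar",
    "angry": "french horn",
    "surprised": "bells",
}


def sonify(expression: str, confidence: float) -> Optional[Tuple[str, float]]:
    """Return (instrument, volume) for a detected expression, or None.

    Volume is the detector confidence clamped to the range [0, 1];
    expressions without an assigned instrument produce no sound.
    """
    instrument = EXPRESSION_TO_INSTRUMENT.get(expression)
    if instrument is None:
        return None
    volume = min(1.0, max(0.0, confidence))
    return instrument, volume
```

Keeping the mapping in a single dictionary makes it easy to swap in the pairing actually used in a study without touching the playback logic.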
  53. User Study. Users: 7 blind and visually impaired participants. Criteria: no nystagmus, unrestricted eye movements.
      Age | Gender | Visual impairment | Control method
      68 | male | Cataract | center point
      49 | female | Cataract (early stage) | eye gaze
      43 | female | Optic atrophy | eye gaze
      73 | male | Congenital blindness | center point
      68 | male | Optic nerve damage (accident) | center point
      87 | female | Macular degeneration | eye gaze
      70 | male | Retinal degeneration | eye gaze
  54. Experiment. Scenario: two videos of a speaker giving a monologue are shown. Task: rate the emotional state of the speaker. Results: videos were rated more accurately with the system on.
  55. Results
  56. Overall Conclusions. Social and emotional sensitivity are key elements of human intelligence. Social signals are particularly difficult to interpret because doing so requires understanding and modeling their causes and consequences. Offline applications start from overly optimistic recognition rates. More work needs to be devoted to interactive online applications. More information and software available under:
  57. Current Work: Mobile Social Signal Processing. SSJ: Realtime Social Signal Processing for Java/Android. SSI: Unix/Android build compatibility.
  58. Thank you very much!