Automatic Recognition of Emotions in Speech:
models and methods
Prof. Dr. Andreas Wendemuth
Univ. Magdeburg, Germany
Chair of Cognitive Systems
Institute for Information Technology and Communications
YAC / Yandex, 30. October 2014, Moscow
Abstract
Recorded speech starts as an acoustic signal. For decades, appropriate
methods in acoustic speech recognition and natural language processing
have been developed which aimed at the detection of the verbal content of
that signal, and its usage for dictation, command purposes, and assistive
systems. These techniques have matured to date. As it turns out, they can be
utilized in a modified form to detect and analyse further affective
information which is transported by the acoustic signal: emotional content,
intentions, and involvement in a situation. Whereas words and phonemes are
the unique symbolic classes for assigning the verbal content, finding
appropriate descriptors for affective information is much more difficult.
We describe the corresponding technical steps for software-supported affect
annotation and for automatic emotion recognition, and we report on the
data material used for evaluation of these methods.
Further, we show possible applications in companion systems and in dialog
control.
Contents
1. Affective Factors in Man-Machine Interaction
2. Speech and multimodal sensor data – what they reveal
3. Discrete or dimensional affect description
4. Software-supported affect annotation
5. Corpora
6. Automatic emotion recognition
7. Applications in companion systems and in dialog control
Affective Factors in Man-Machine Interaction
Affective Terms - Disambiguation
Emotion [Becker 2001]
• short-term affect
• bound to specific events
Mood [Morris 1989]
• medium-term affect
• not bound to specific events
Personality [Mehrabian 1996]
• long-term stable
• represents individual characteristics
Emotion: the PAD-space
• Dimensions:
• pleasure / valence (p),
• arousal (a) and
• dominance (d)
• values each from -1.0 to 1.0
• "neutral" at the center
• defines octants, e.g. (+p+a+d)
Siegert et al. 2012 Cognitive Behavioural Systems. COST
Correlation of emotion and mood
In order to make it measurable, there has to be an empirical correlation of
moods to the PAD space (emotion octants) [Mehrabian 1996].

Moods for the octants of the PAD space:

  PAD   Mood        PAD   Mood
  +++   Exuberant   ---   Bored
  ++-   Dependent   --+   Disdainful
  +-+   Relaxed     -+-   Anxious
  +--   Docile      -++   Hostile

(A lookup-table sketch in code follows below.)
Siegert et al. 2012 Cognitive Behavioural Systems. COST
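To make the octant-to-mood mapping concrete, here is a minimal Python sketch (an illustration, not code from the talk). It assumes the sign-pattern octant convention above; a value of exactly 0 is treated as positive, although "neutral" strictly sits at the center of the space.

```python
# Hypothetical helper: maps a PAD triple (each in [-1.0, 1.0]) to its octant
# and to the mood label Mehrabian (1996) associates with that octant.

MOODS = {
    "+++": "Exuberant", "++-": "Dependent",  "+-+": "Relaxed", "+--": "Docile",
    "---": "Bored",     "--+": "Disdainful", "-+-": "Anxious", "-++": "Hostile",
}

def octant(p, a, d):
    """Sign pattern of the PAD triple, e.g. (+p, +a, +d) -> '+++'."""
    return "".join("+" if v >= 0 else "-" for v in (p, a, d))

def mood(p, a, d):
    return MOODS[octant(p, a, d)]

print(mood(0.4, 0.7, 0.2))     # '+++' -> Exuberant
print(mood(-0.3, -0.5, -0.1))  # '---' -> Bored
```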
Personality and PAD-space
Unique personality model: Big Five [Allport and Odbert 1936]
• 5 strong, independent factors
• [Costa and McCrae 1985] presented the five-factor personality inventory,
  deliberately applicable to non-clinical environments:
  • Neuroticism
  • Extraversion
  • Openness
  • Agreeableness
  • Conscientiousness
• measurable by questionnaires (NEO-FFI test)
• Mehrabian showed a relation between the Big Five factors (from NEO-FFI,
  scaled to [0,1]) and the PAD space, e.g.:
  P := 0.21 · extraversion + 0.59 · agreeableness + 0.19 · neuroticism
  (other formulae are available for arousal and dominance; a code sketch follows below)
Siegert et al. 2012 Cognitive Behavioural Systems. COST
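As a worked example of the slide's formula, the following sketch computes the pleasure value P from NEO-FFI scores scaled to [0, 1]. The arousal and dominance regressions exist in Mehrabian (1996) but are not reproduced on the slide, so only P is implemented here.

```python
def pleasure(extraversion, agreeableness, neuroticism):
    """Mehrabian's regression for P from Big Five scores in [0, 1]."""
    return 0.21 * extraversion + 0.59 * agreeableness + 0.19 * neuroticism

# Example: a fairly extraverted, agreeable, emotionally stable person.
print(round(pleasure(0.7, 0.8, 0.2), 3))  # 0.657
```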
Interaction modalities – what a person "tells"
• Speech (semantics)
• Non-semantic utterances ("hmm", "aehhh")
• Nonverbals (laughing, coughing, swallowing,…)
• Emotions in speech
Discourse Particles
Especially the intonation reveals details about the speaker's attitude, but it
is influenced by semantic and grammatical information.
⇒ investigate discourse particles (DPs):
• cannot be inflected, but can be emphasized
• occur at crucial communicative points
• have specific intonation curves (pitch contours)
• thus may indicate specific functional meanings
Siegert et al. 2013 WIRN Vietri
The Role of Discourse Particles for Human Interaction
J. E. Schmidt [2001] presented an empirical study in which he determined
seven form-function relations of the DP "hm":
Siegert et al. 2013 WIRN Vietri
  Name   Function (the original slide also shows the idealised pitch-contours)
  DP-A   attention
  DP-T   thinking
  DP-F   finalisation signal
  DP-C   confirmation
  DP-D   decline∗
  DP-P   positive assessment
  DP-R   request to respond
The Role of Discourse Particles for Human Interaction
• [Kehrein and Rabanus, 2001] examined different conversational styles and
confirmed the form-function relation.
• [Benus et al., 2007] investigated the occurrence frequency of specific
backchannel words for American English HHI.
• [Fischer et al., 1996]: the number of partner-oriented signals decreases
  while the number of signals indicating a task-oriented or expressive
  function increases
• Research Questions
• Do DPs occur within HCI?
• Which meanings can be determined?
• Which form-types occur?
Siegert et al. 2013 WIRN Vietri
Interaction modalities – what a person "tells" with other modalities
• Speech (semantics)
• Non-semantic utterances ("hmm", "aehhh")
• Nonverbals (laughing, coughing, swallowing,…)
• Emotions in speech
• Eye contact / direction of sight
• General mimics
• Facial expressions (laughing, anger, …)
• Hand gestures, arm gestures
• Head posture, body posture
• Bio-signals (blushing, paleness, shivering, frowning…)
• Pupil width
• Haptics: Direct operation of devices (keyboard, mouse, touch)
• Handwriting, drawing, sculpturing, …
What speech can (indirectly) reveal
• Indirect expression (pauses, idleness, fatigue)
• Indirect content (humor, irony, sarcasm)
• Indirect intention (hesitation, fillers, discourse particles)
Technical difficulties
• Recognizing speech, mimics, gestures, poses, haptics, bio-signals: indirect information
• Many (most) modalities need data-driven recognition engines
• Unclear categories (across modalities?)
• Robustness of recognition in varying / mobile environments
Now you (hopefully) have recorded (multimodal)
data with (reliable) emotional content
Actually, you have a (speech) signal,
but what does it convey?
So, really, you have raw data.
Now you need:
transcriptions (intended things which happened)
(Speech: "Nice to see you"; Mimics: "eyes open, lip corners up"; …)
and
annotations (unintended events, or the way it happened).
(Speech: heavy breathing, fast, happy; Mimics: smile, happiness; …)
Both processes require
labelling: tagging each recording chunk with marks which correspond to the
relevant transcription / annotation categories.
How to transcribe / annotate?
• Trained transcribers / annotators with high intra- and inter-rater reliability
  (kappa measures; a Cohen's-kappa sketch follows below)
• Time-aligned (synchronicity!), simultaneous presentation of all modalities to
  the transcriber / annotator
• Selection of (known) categories for the transcriber / annotator
• Labelling
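The slide mentions kappa measures without fixing a variant. A minimal sketch of Cohen's kappa for two annotators (an assumption; other variants such as Fleiss' kappa exist for more than two annotators) could look like this:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' chunk labels."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (p_obs - p_chance) / (1 - p_chance)

a = ["happy", "neutral", "angry", "neutral", "happy"]
b = ["happy", "neutral", "neutral", "neutral", "happy"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```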
Categories:
Clear (?) modal units of investigation / categories e.g.:
• Speech: phonemes, syllables, words
• Language: letters, syllables, words
• Request: content! (origin city, destination city, day, time)
• Dialogues: turn, speaker, topic
• Situation involvement: object/subject of attention, deictics, active/passive participant
• Mimics: FACS (Facial Action Coding System) -> 40 action units
• Big 5 Personality Traits (OCEAN)
• Sleepiness (Karolinska Scale)
• Intoxication (Blood Alcohol Percentage)
Categories:
• Unclear (?) modal categories e.g.:
• Emotion: ???
• Cf.: Disposition: domain-specific …?
• Cf.: Level of Interest (?)
Categorial Models of human emotion ...
... which can be utilized for automatic emotion recognition
• Two-Class models,
e.g. (not) cooperative
• Base emotions [Ekman, 1992]
  (anger, disgust, fear, joy, sadness, surprise, neutral)
• VA(D) Models
(Valence (Pleasure) Arousal Dominance)
• Geneva Emotion Wheel
[Scherer, 2005]
Categorial Models of human emotion (2):
enhanced listings
Siegert et al. 2011 ICME
• sadness,
• contempt,
• surprise,
• interest,
• hope,
• relief,
• joy,
• helplessness,
• confusion
Categorial Models of human emotion (3):
Self-Assessment Manikins [Bradley, Lang, 1994]
Böck et al. 2011 ACII
Transcription / annotation tools
• (having fixed the modalities and categories)
• Examples: EXMARaLDA, FOLKER, ikannotate

EXMARaLDA: "Extensible Markup Language for Discourse Annotation", www.exmaralda.org/, Hamburg Centre for Language Corpora (HZSK) and SFB 538 'Multilingualism', since 2001/2006
FOLKER: transcription editor for the "Forschungs- und Lehrkorpus Gesprochenes Deutsch" (Research and Teaching Corpus of Spoken German), http://agd.ids-mannheim.de/folker.shtml, Institute for German Language, University of Mannheim, since 2010
[Schmidt, Schütte, 2010]
ikannotate tool
ikannotate – A Tool for Labelling, Transcription, and Annotation of Emotionally Coloured Speech (2011)
• Otto von Guericke University – Chair of Cognitive Systems + Dept. of Psychosomatic Medicine and Psychotherapy
• Written in Qt4, based on C++
• Versions for Linux, Windows XP and higher, and Mac OS X
• Sources and binaries are available on demand
• Handles different output formats, especially XML and TXT
• Processes MP3 and WAV files
• Follows the conversation-analytic transcription system (GAT), versions 1 and 2 [Selting et al., 2011]
http://ikannotate.cognitive-systems-magdeburg.de/
Screenshots of ikannotate (I)
Böck et al. 2011 ACII
Screenshots of ikannotate (II)
Böck et al. 2011 ACII
Corpora of affective speech (+other modalities)
• Overview: http://emotion-research.net/wiki/Databases (not complete)
• Contains information on: identifier, URL, modalities, emotional content,
  emotion elicitation methods, size, nature of material, language
• Published overviews: Ververidis & Kotropoulos 2006, Schuller et al. 2010,
  appendix of [Pittermann et al. 2010]
• Popular corpora listed on the website above:
  • Emo-DB: Berlin Database of Emotional Speech (2005)
  • SAL: Sensitive Artificial Listener (Semaine, 2010)
• Popular corpora not listed on the website above:
  • eNTERFACE (2005)
  • LMC: LAST MINUTE (2012)
  • Table Talk (2013)
  • Audio-Visual Interest Corpus (AVIC) (ISCA 2009)
• Ververidis, D. & Kotropoulos, C. (2006). "Emotional speech recognition: Resources, features, and methods". Speech Communication 48 (9), pp. 1162–1181.
• Schuller, B.; Vlasenko, B.; Eyben, F.; Wöllmer, M.; Stuhlsatz, A.; Wendemuth, A. & Rigoll, G. (2010). "Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies". IEEE Transactions on Affective Computing 1 (2), pp. 119–131.
• Pittermann, J.; Pittermann, A. & Minker, W. (2010). Handling Emotions in Human-Computer Dialogues. Amsterdam, The Netherlands: Springer.
Example 1: Berlin Database of Emotional Speech (EMO-DB)
• Burkhardt et al., 2005: A Database of German Emotional Speech.
  Proc. INTERSPEECH 2005, Lisbon, Portugal, pp. 1517–1520.
• 7 emotions: anger, boredom, disgust, fear, joy, neutral, sadness
• 10 professional German actors (5 female), 494 phrases
• Perception test with 20 subjects: 84.3% mean accuracy
• http://pascal.kgw.tu-berlin.de/emodb/index-1280.html
Example 2: LAST MINUTE Corpus
Setup: non-acted; emotions evoked by a story: task solving with difficulties (barriers)
Groups: N = 130, balanced in age, gender, education
Duration: 56:02:14
Sensors: 13
Max. video resolution: 1388×1038 at 25 Hz
Biopsychological data: heart beat, respiration, skin conductance
Questionnaires: sociodemographic, psychometric
Interviews: yes (73 subjects)
Language: German
Available upon request at roesner@ovgu.de and joerg.frommer@med.ovgu.de
Frommer et al. 2012 LREC
Data-driven recognition engines
• Remember, now you have transcribed/annotated data with fixed
categories (across modalities?) and modalities.
• You want to use that data to construct unimodal or multimodal
data-driven recognition engines
• Once you have these engines, you can automatically determine the
categories in yet unknown data.
A Unified View on data-driven recognition
• It's pattern recognition.
[Diagram: generic data-driven recognition pipeline: capture, pre-processing, feature generation/extraction, feature selection/reduction, classification/regression, and decoding, supported by knowledge sources (dictionary, interaction grammar, production model). A learner optimises the classifier once on labelled training pairs (x_l, y_l), l = 1, …, L, so that unseen input x is mapped to an output y = f(x).]
Schuller 2012 Cognitive Behavioural Systems COST
(A minimal code sketch of such a pipeline follows below.)
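A minimal sketch of such a pipeline, assuming scikit-learn (an assumption; the talk does not prescribe a toolkit) and assuming X holds per-chunk acoustic feature vectors with annotated labels y:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Pre-processing -> feature reduction -> classification, trained once
# on labelled pairs (x_l, y_l), l = 1..L, then applied to unseen chunks.
pipeline = Pipeline([
    ("scale", StandardScaler()),       # pre-processing
    ("reduce", PCA(n_components=50)),  # feature reduction
    ("classify", SVC(kernel="rbf")),   # classification
])

# pipeline.fit(X_train, y_train)       # learner / optimisation
# y_pred = pipeline.predict(X_test)    # decoding on new data
```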
Audio Features
Böck et al. 2013 HCII
• MFCCs with delta and acceleration coefficients
• Prosodic features
• Formants and corresponding bandwidths
• Intensity
• Pitch
• Jitter
(The original slide also illustrates Facial Action Units for the video channel.)
• For acoustic feature extraction: Hidden Markov Toolkit (HTK) and the phonetic
  analysis software PRAAT (http://www.praat.org); an illustrative extraction
  sketch follows below.
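The slide names HTK and PRAAT; purely as an illustrative alternative (an assumption, since the authors' own tooling differs), comparable low-level descriptors can be sketched with the librosa library. The file name is hypothetical.

```python
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical recording
mfcc  = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
delta = librosa.feature.delta(mfcc)              # delta coefficients
accel = librosa.feature.delta(mfcc, order=2)     # acceleration coefficients
f0    = librosa.yin(y, fmin=60, fmax=400)        # pitch track (YIN)
rms   = librosa.feature.rms(y=y)                 # intensity proxy
```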
What is the current state of affect recognition?
Table: Overview of reported results. #C: number of classes; eNT: eNTERFACE;
VAM: Vera am Mittag; SAL: Sensitive Artificial Listener; LMC: LAST MINUTE.

  Database (type)  Result  #C  Comment                                      Reference
  emoDB (acted)    91.5%   2   6552 acoustic features and GMMs              Schuller et al., 2009
  eNT (primed)     74.9%   2   6552 acoustic features, GMMs                 Schuller et al., 2009
  VAM (natural)    76.5%   2   6552 acoustic features with GMMs             Schuller et al., 2009
  SAL (natural)    61.2%   2   6552 acoustic features with GMMs             Schuller et al., 2009
  LMC (natural)    80%     2   pre-classification of visual, acoustic       Krell et al., 2013
                               and gestural features, MFN

Comparing the results on acted emotional data and naturalistic interactions:
• recognition performance decreases
• there is too much variability within the naturalistic data
(A per-class GMM sketch follows below.)
Siegert et al. 2013 ERM4HCI
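A minimal sketch of the GMM approach from the table (an assumption about the exact setup: one GMM per emotion class, decision by maximum log-likelihood; extracting the 6552-dimensional feature vectors is out of scope here):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(X, y, n_components=8):
    """Fit one diagonal-covariance GMM per emotion class."""
    return {c: GaussianMixture(n_components, covariance_type="diag").fit(X[y == c])
            for c in np.unique(y)}

def classify(gmms, X):
    """Assign each feature vector to the class with maximal log-likelihood."""
    classes = list(gmms)
    scores = np.stack([gmms[c].score_samples(X) for c in classes], axis=1)
    return [classes[i] for i in scores.argmax(axis=1)]
```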
User-group / temporal specific affect recognition
Success rates [stress / no stress] (tested on the LAST MINUTE corpus):
• 72% utilizing (few) group-specific (young/old, male/female)
  audio features [Siegert et al., 2013]
• 71% utilizing audio-visual features and a linear filter as decision-level
  fusion [Panning et al., 2012]
• 80% using facial expressions, gestural analysis and acoustic features
  with Markov Fusion Networks [Krell et al., 2013]
Approaches 2 and 3 integrate their classifiers over longer temporal sequences
(a simple late-fusion sketch follows below).
Siegert et al. 2013 ERM4HCI, workshop ICMI 2013
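Approaches 2 and 3 fuse modalities at the decision level. A toy sketch of late fusion with fixed linear weights follows (an assumption: the cited works use a learned linear filter and a Markov Fusion Network, which are more involved):

```python
import numpy as np

def late_fusion(posteriors_by_modality, weights):
    """Each array: (n_chunks, n_classes) class posteriors for one modality."""
    fused = sum(w * p for w, p in zip(weights, posteriors_by_modality))
    return fused.argmax(axis=1)

audio = np.array([[0.6, 0.4], [0.2, 0.8]])  # [P(stress), P(no stress)] per chunk
video = np.array([[0.7, 0.3], [0.4, 0.6]])
print(late_fusion([audio, video], weights=[0.5, 0.5]))  # [0 1]
```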
Classification Engines – Cross-Modalities
• Classification based on audio features
• Preselection of relevant video sequences
• Manual annotation of Action Units and classification of facial expressions
Further:
• pre-classification of the sequences
• dialog act representation models
Böck et al. 2013 HCII, Friesen et al. 2014 LREC
Usage of multimodal information
• Remember, now you have transcribed/annotated data with fixed
categories (across modalities?) and modalities (maybe a corpus).
• You also have a category classifier trained on these data, i.e.
domain-specific / person-specific.
Now we use categorized information in applications:
Why more modalities help in understanding what a person wants to "tell"
• Disambiguation (saying and pointing)
• Person's choice (talking is easier than typing)
• "Real" information (jokes from a blushing person?)
• Robustness (talking obscured by noise, but lipreading works)
• Higher information content (multiple congruent modalities)
• Uniqueness (reliable emotion recognition only from multiple modalities)
Companion Technology
[Diagram: architecture of a companion system. Input signals from speech, gesture, touch and physiological sensors feed multimodal components and an interaction management layer, which connects to the application / dialog management; the output signal to the user is multimodal, adaptive and individualised.]
Weber et al. 2012 SFB TRR 62
Emotional and dialogic conditions in user behavior
Recognition of critical dialogue courses
• on the basis of linguistic content
• in combination with multimodal emotion recognition
Development of empathy-promoting dialogue strategies
• motivation of the user
• prevention of dialogue abandonment in problem-prone situations
(A toy trigger sketch follows below.)
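Purely as an illustration of such a strategy (not from the talk), a dialog manager might trigger an empathic intervention when recognised negative affect persists over several consecutive turns:

```python
NEGATIVE = {"angry", "anxious", "bored"}

def needs_intervention(turn_emotions, window=3):
    """True if the last `window` turns were all recognised as negative."""
    recent = turn_emotions[-window:]
    return len(recent) == window and all(e in NEGATIVE for e in recent)

history = ["neutral", "angry", "angry", "angry"]
if needs_intervention(history):
    print("Switch to an empathy-promoting strategy: acknowledge the problem, offer help.")
```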
Take home messages / outlook
Emotion / Affect recognition:
• Data driven, automatic pattern recognition
• Categorisation, Annotation tools
• Temporal course of emotion, dependent on mood and personality
• Outlook: Emotion-categorial Appraisal-Model
Use in Man-Machine Interaction:
• Early detection / counteraction of adverse dialogs
• Outlook: use in call centers and companion technology
… thank you!
www.cogsy.de