MediaEval 2016 - Emotion in Music Task: Lessons Learned
1. Emotion in Music Task: Lessons Learned
Anna Aljanaki1 Yi-Hsuan Yang2
Mohammad Soleymani1
1University of Geneva, Switzerland
2Academia Sinica, Taiwan
20-21 October, MediaEval 2016
2. Emotion in Music Task
2013 — Emotion in Music Brave New Task.
Organized by M. Soleymani, M.N. Caro, E.M. Schmidt and
Y.-H. Yang
2 subtasks - dynamic (per-second) music emotion
recognition and song-level emotion recognition
3 participating teams
3. Emotion in Music Task
Focused on audio analysis (optionally, with metadata)
Most attention was paid to recognizing how emotion changes over time
Used the valence/arousal model
7. Emotion in Music Task
2014 — Emotion in Music Task, Second Edition
Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
2 tasks - dynamic (per-second) music emotion recognition
and feature design
7 participating teams
2015 — Emotion in Music Task, Third Edition.
Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
1 task - dynamic (per-second) music emotion recognition,
with three submission types: feature sets, predictions on
baseline features, and predictions on custom features
11 participating teams
8. Quality of the annotations
Year                     2013         2014         2015
Total length             9h 18min     12h 30min    3h 46min
Cronbach's α, arousal    .28 ± 0.28   .31 ± 0.30   .66 ± 0.26
GAM R², arousal          .13 ± 0.10   .14 ± 0.11   .44 ± 0.19
Cronbach's α, valence    .28 ± 0.29   .20 ± 0.24   .51 ± 0.35
GAM R², valence          .13 ± 0.10   .10 ± 0.08   .37 ± 0.21
9. Quality of the annotations
2013 & 2014 – 45-second excerpts; 2015 – full songs.
2013 & 2014 – Amazon Mechanical Turk workers; 2015 – both
lab and AMT workers.
2015 – introduced a preliminary listening step.
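The consistency figures above can be reproduced from the raw annotation curves. As a rough sketch (the function name and array layout are illustrative assumptions, not the task's released code), Cronbach's α treats each annotator as an "item" and each per-second frame as an observation:

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-rater consistency.

    ratings: 2-D array of shape (n_raters, n_frames) -- each row is
    one annotator's per-second valence or arousal curve for a song.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[0]                         # number of raters
    item_vars = ratings.var(axis=1, ddof=1)      # per-rater variance
    total_var = ratings.sum(axis=0).var(ddof=1)  # variance of summed curve
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

Identical raters give α = 1; the low 2013/2014 values in the table mean individual curves shared little variance until the 2015 protocol changes.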
16. Continuous annotation problems
Annotations exhibit a reaction time: before listeners can judge
the emotional content of the music, they need to listen to it for
some time.
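One common way to quantify such a lag (a hypothetical sketch, not the task's evaluation code; the signal names and frame rate are assumptions) is to shift the annotation curve against an audio feature curve and keep the shift that maximizes Pearson correlation:

```python
import numpy as np

def best_lag(feature, annotation, max_lag=10):
    """Estimate annotator reaction time, in frames, as the shift of
    the annotation curve that best correlates with a feature curve.

    feature, annotation: 1-D arrays sampled at the same rate
    (e.g. one value per second); max_lag: largest shift to try.
    """
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        if lag == 0:
            a, b = feature, annotation
        else:
            # annotation[lag:] is compared against feature[:-lag],
            # i.e. the annotation is assumed to trail the audio
            a, b = feature[:-lag], annotation[lag:]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best
```

The estimated lag can then be subtracted from the annotations before training, so models are not penalized for the annotators' delay.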
17. Continuous annotation problems
There is a scaling problem – the unit of emotional expression
can be a structural section, a phrase, or a single note.
22. Possible solutions and modifications
Change the task from emotion tracking to dynamics
tracking (diminuendo, crescendo, rallentando)
Change the data collection interface
Find a practical task where continuous tracking is
necessary:
Retrieval by an emotional trajectory
Thumbnailing
Emotion prediction from physiological signals and audio
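Retrieval by an emotional trajectory can be prototyped directly on per-second valence/arousal curves. A minimal sketch (the song names, trajectory shapes, and plain Euclidean distance are all illustrative assumptions; a real system might use dynamic time warping to tolerate timing differences):

```python
import numpy as np

def retrieve_by_trajectory(query, catalog, top_k=3):
    """Rank songs by how closely their valence/arousal trajectories
    match a query trajectory.

    query   -- (n_frames, 2) array of per-second (valence, arousal)
    catalog -- dict mapping song id to a trajectory of the same shape
               (trajectories assumed resampled to a common length)
    """
    dists = {song: np.linalg.norm(traj - query)
             for song, traj in catalog.items()}
    return sorted(dists, key=dists.get)[:top_k]
```

For example, a query trajectory with steadily rising arousal would rank a song with a matching build-up above one that fades out.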
23. Acknowledgements
We thank Erik M. Schmidt, Mike N. Caro, Cheng-Ya Sha,
Alexander Lansky, Sung-Yen Liu and Eduardo Coutinho for
their contributions to the task's development, and the anonymous
Turkers for their work.