The MediaEval 2017 AcousticBrainz Genre Task: Content-based Music Genre Recognition from Multiple Sources

AcousticBrainz Genre Task: Content-based music genre recognition from multiple sources
Dmitry Bogdanov, Alastair Porter (Universitat Pompeu Fabra)
Julián Urbano (Delft University of Technology)
Hendrik Schreiber (tagtraum industries incorporated)

Genre recognition in Music Information Retrieval
● A popular task in MIR (Sturm 2014)
● Only a small number of broad genres (e.g., rock, jazz, classical, electronic)
● Almost no studies on more specific genres (subgenres)
● Studies don’t consider the subjective nature of genre labels and taxonomies
● Treated as a single-label classification problem instead of a multi-label one
● Genre hierarchy is not exploited
● Small datasets
B. L. Sturm. 2014. The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval. Journal of New Music Research 43, 2 (2014), 147–172.

AcousticBrainz
AcousticBrainz: a community database containing music features extracted from audio (https://acousticbrainz.org) (Porter et al. 2015)
● Open data computed by open algorithms
● Built on submissions from the community
● Over 5,600,000 analyzed recordings (tracks)
● ~3,000 music features (bags-of-frames)
● Statistical information about spectral shape, rhythm, tonality, loudness, etc.
● Rich music metadata from MusicBrainz (https://musicbrainz.org)
● Lots of data... What can we do with it?
A. Porter, D. Bogdanov, R. Kaye, R. Tsukanov, and X. Serra. 2015. AcousticBrainz: a community platform for gathering music information obtained from audio. In International Society for Music Information Retrieval (ISMIR’15) Conference. Málaga, Spain, 786–792.

The 2017 AcousticBrainz MediaEval Task
Content-based music genre recognition from multiple ground truth sources
Goal: Predict the genre and subgenre of unknown music recordings given precomputed music features
Task novelty:
● Four different genre annotation sources (and taxonomies)
● Hundreds of specific subgenres
● Multi-label genre classification problem
● A very large dataset (~2 million recordings in total)

Sources of genre information
● Scraped from internet sources
● Discogs (discogs.com) and AllMusic (allmusic.com)
● Explicit genre and subgenre annotations at the album level
● Predefined taxonomies
● Album genres propagated to recordings: AcousticBrainz song → album → genre

Sources of genre information
● Tagtraum dataset based on beaTunes
○ Consumer application for Windows and Mac by tagtraum industries incorporated
○ Encourages users to correct metadata
○ Collects anonymized, user-submitted metadata
○ Relationship Song:Genre is 1:n
● Last.fm
○ Folksonomy tags for each song
○ Relative strength (0-100)
● Tag cleaning (normalization and blacklisting)
● Automatic inference of genre-subgenre relations

Mapping user genre labels to a genre taxonomy
1. Normalization (lowercasing, smart substitutions, ...)
R&B → rnb
Rhythm and Blues → rnb
R and B → rnb
2. Removal of unwanted labels via blacklisting (80spop, love, charts, djonly, ...)
3. Inferring hierarchical relationships via co-occurrence
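Steps 1 and 2 can be illustrated with a minimal Python sketch. The substitution rules and the blacklist below are hypothetical, seeded only with the examples from this slide; the rule set actually used for the task is not given here.

```python
import re

# Hypothetical substitution rules covering the R&B examples above.
SUBSTITUTIONS = [
    (r"\brhythm\s*(and|&|n)\s*blues\b", "rnb"),
    (r"\br\s*(and|&|n)\s*b\b", "rnb"),
]
# Hypothetical blacklist, seeded with the examples from the slide.
BLACKLIST = {"80spop", "love", "charts", "djonly"}


def normalize(label: str) -> str:
    """Lowercase, apply substitutions, then drop non-alphanumeric characters."""
    s = label.strip().lower()
    for pattern, replacement in SUBSTITUTIONS:
        s = re.sub(pattern, replacement, s)
    return re.sub(r"[^a-z0-9]", "", s)


def clean_labels(labels):
    """Normalize labels, drop blacklisted or empty ones, and remove duplicates (order kept)."""
    cleaned = []
    for label in labels:
        norm = normalize(label)
        if norm and norm not in BLACKLIST and norm not in cleaned:
            cleaned.append(norm)
    return cleaned


print(clean_labels(["R&B", "Rhythm and Blues", "80s Pop", "Hip-Hop"]))
```

Applying substitutions before stripping punctuation matters in this sketch: "R&B" would otherwise collapse to "rb" and no longer match the "rnb" form.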

Co-Occurrence matrix
If a song is labeled with Alternative, how often is it also labeled with Rock?

Co-Occurrence matrix
What about the other way around?
If a song is labeled with Rock, how often is it also labeled with
Alternative?
Co-occurrence is not symmetric!
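The asymmetry can be made concrete by estimating the conditional share P(b | a) = count(a, b) / count(a) in both directions. The Python sketch below does this. The rule for turning it into a parent/child relation (P(parent | child) high, P(child | parent) low) and the thresholds `child_min` and `parent_max` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence(tagged_songs):
    """Count songs per label and ordered label pairs occurring on the same song."""
    label_counts = Counter()
    pair_counts = Counter()
    for labels in tagged_songs:
        unique = set(labels)
        label_counts.update(unique)
        pair_counts.update(permutations(unique, 2))
    return label_counts, pair_counts


def conditional(label_counts, pair_counts, a, b):
    """Share of songs labeled a that are also labeled b, i.e. P(b | a)."""
    return pair_counts[(a, b)] / label_counts[a] if label_counts[a] else 0.0


def infer_parents(label_counts, pair_counts, child_min=0.8, parent_max=0.5):
    """Hypothetical rule: b is a parent of a if P(b|a) is high and P(a|b) is low."""
    relations = []
    for a, b in pair_counts:
        if (conditional(label_counts, pair_counts, a, b) >= child_min
                and conditional(label_counts, pair_counts, b, a) <= parent_max):
            relations.append((b, a))  # (parent, child)
    return sorted(relations)


# Hypothetical toy data.
songs = [["alternative", "rock"]] * 3 + [["rock"], ["rock"], ["rock", "pop"], ["pop"]]
counts, pairs = cooccurrence(songs)
print(conditional(counts, pairs, "alternative", "rock"))  # P(rock | alternative)
print(conditional(counts, pairs, "rock", "alternative"))  # P(alternative | rock)
print(infer_parents(counts, pairs))
```

In this toy data every "alternative" song is also "rock", but only half of the "rock" songs are "alternative", so the sketch proposes rock as the parent of alternative.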

What does the data look like?
https://www.youtube.com/watch?v=zlaz7aR7B44

Subjectivity in music genre
● Classification tasks typically rely on an agreed answer for ground truth
● What should we do if our ground-truth sources do not agree?
● What if different sources use the same label but define it differently?
Example: Reggae → Dub; Electronic → Ambient Dub; Electronic → World Fusion; World, Dub, Fusion

Sub-tasks
● Task 1: Build a separate system for each ground-truth dataset
● Task 2: Can we benefit from combining different ground truths into one system?

Development and testing dataset split
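● Four development and four test datasets (70%/15% split; the remaining 15% is kept for future use)
● Album filter: all recordings from the same release group are kept in the same split
● Each label has at least 40 recordings from 6 release groups in the development (training) dataset (20 recordings from 3 release groups in the test dataset)
● Development dataset statistics: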
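The split and the label filter can be sketched in Python as follows. This is a minimal sketch, not the task's actual splitting code. It assumes the 70/15/15 fractions are applied to release groups, which is one way to implement the album filter. It also assumes each recording is a dict with hypothetical keys `release_group` and `labels`.

```python
import random
from collections import defaultdict


def split_by_release_group(recordings, fractions=(0.70, 0.15, 0.15), seed=0):
    """Album filter: every release group is assigned to exactly one split."""
    groups = sorted({r["release_group"] for r in recordings})
    random.Random(seed).shuffle(groups)
    cut_dev = round(fractions[0] * len(groups))
    cut_test = cut_dev + round(fractions[1] * len(groups))
    assignment = {}
    for i, group in enumerate(groups):
        if i < cut_dev:
            assignment[group] = "development"
        elif i < cut_test:
            assignment[group] = "test"
        else:
            assignment[group] = "held_out"  # the 15% kept for future use
    splits = defaultdict(list)
    for r in recordings:
        splits[assignment[r["release_group"]]].append(r)
    return dict(splits)


def supported_labels(recordings, min_recordings, min_groups):
    """Labels with >= min_recordings recordings spread over >= min_groups release groups."""
    counts = defaultdict(int)
    groups = defaultdict(set)
    for r in recordings:
        for label in set(r["labels"]):
            counts[label] += 1
            groups[label].add(r["release_group"])
    return {label for label in counts
            if counts[label] >= min_recordings and len(groups[label]) >= min_groups}


# Hypothetical toy data: 20 release groups with 3 recordings each.
recordings = [{"id": f"{g}-{i}", "release_group": f"rg{g}", "labels": ["rock"]}
              for g in range(20) for i in range(3)]
splits = split_by_release_group(recordings)
dev_labels = supported_labels(splits["development"], 40, 6)
test_labels = supported_labels(splits["test"], 20, 3)
kept = dev_labels & test_labels  # labels usable in both splits
```

With only 9 test recordings, "rock" fails the test-side threshold here. This illustrates why rare subgenres drop out of the datasets.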

Results
Submissions
● Five participating teams
● Maximum of 5 submissions per subtask per team
● (5 submissions × 2 tasks × 4 datasets = up to 40 runs per team)
● 115 runs received in total
Baselines
● Random baseline: predicts random labels following the distribution of labels
● Popularity baseline: always predicts the most popular genre
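The two baselines can be sketched in Python as follows. The inputs are assumptions: `train_labels` is a list of label lists, one per training recording. For the random baseline, the number of labels drawn per recording (copied from a randomly chosen training recording) is a hypothetical choice. The official baseline code may differ.

```python
import random
from collections import Counter


def popularity_baseline(train_labels, n_test):
    """Predict the single most frequent training label for every test recording."""
    counts = Counter(label for labels in train_labels for label in labels)
    top = counts.most_common(1)[0][0]
    return [[top] for _ in range(n_test)]


def random_baseline(train_labels, n_test, seed=0):
    """Sample labels with probabilities proportional to their training frequency."""
    rng = random.Random(seed)
    counts = Counter(label for labels in train_labels for label in labels)
    names = list(counts)
    weights = [counts[name] for name in names]
    predictions = []
    for _ in range(n_test):
        # Hypothetical: draw as many labels as a random training recording has.
        k = len(rng.choice(train_labels))
        predictions.append(sorted(set(rng.choices(names, weights=weights, k=k))))
    return predictions


train = [["rock", "rock---punk"], ["rock"], ["jazz"]]
print(popularity_baseline(train, 2))
print(random_baseline(train, 3))
```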

Methodologies
● Manual feature preselection
● Classifiers
○ Hierarchical (SVMs + extra trees)
○ Neural networks
○ Random Forest Classifiers
● Task 2 (combining datasets)
○ Genre similarity based on text string matching distance, voting
○ Genre/subgenre similarity based on co-occurrence, conversion matrix, weighting
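The first Task 2 strategy (string matching plus voting) can be sketched in Python. This is a minimal sketch, not any team's actual system. It uses `difflib.SequenceMatcher.ratio()` as a stand-in string similarity. The threshold `min_ratio` and the vote count `min_votes` are hypothetical parameters. Labels predicted by systems trained on each source are mapped to the closest label of a target taxonomy, and target labels are kept if enough sources agree.

```python
from collections import Counter
from difflib import SequenceMatcher


def closest_label(label, target_taxonomy, min_ratio=0.8):
    """Map a label to the most similar target label, or None if nothing is close enough."""
    best, best_ratio = None, 0.0
    for candidate in target_taxonomy:
        ratio = SequenceMatcher(None, label.lower(), candidate.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best if best_ratio >= min_ratio else None


def vote(predictions_per_source, target_taxonomy, min_votes=2):
    """Keep target labels proposed (after mapping) by at least min_votes sources."""
    votes = Counter()
    for labels in predictions_per_source:
        mapped = {closest_label(label, target_taxonomy) for label in labels}
        mapped.discard(None)
        votes.update(mapped)
    return sorted(label for label, n in votes.items() if n >= min_votes)


target = ["electronic", "ambient dub", "world", "world fusion"]
sources = [["Electronic", "Ambient Dub"], ["electronic", "dub"], ["World", "Worldfusion"]]
print(vote(sources, target))
```

Plain string similarity maps "Worldfusion" onto "world fusion" here, but cannot relate "dub" to "ambient dub". Co-occurrence-based similarity, the second strategy, is one way to capture such relations.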

Evaluation metrics
Effectiveness: Precision, Recall and F-score
● Per recording, all labels (genres and subgenres)
● Per recording, only genres
● Per recording, only subgenres
● Per label, all recordings
● Per genre label, all recordings
● Per subgenre label, all recordings
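The per-recording and per-label variants can be sketched in Python. This sketch assumes ground truth and predictions are dicts mapping a recording id to a set of labels. It also assumes scores are averaged with equal weight per recording or per label. The `only_genres` helper assumes subgenres are written as "genre---subgenre". Both assumptions are for illustration only and may differ from the official evaluation code.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f_score


def mean_scores(scores):
    """Average a list of (precision, recall, f_score) tuples component-wise."""
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


def per_recording(truth, pred):
    """Average P/R/F over recordings, comparing each recording's label sets."""
    scores = []
    for rec, true_labels in truth.items():
        predicted = pred.get(rec, set())
        scores.append(prf(len(true_labels & predicted),
                          len(predicted - true_labels),
                          len(true_labels - predicted)))
    return mean_scores(scores)


def per_label(truth, pred, labels):
    """Average P/R/F over labels, each computed across all recordings."""
    scores = []
    for label in labels:
        tp = fp = fn = 0
        for rec, true_labels in truth.items():
            is_true = label in true_labels
            is_pred = label in pred.get(rec, set())
            tp += int(is_true and is_pred)
            fp += int(is_pred and not is_true)
            fn += int(is_true and not is_pred)
        scores.append(prf(tp, fp, fn))
    return mean_scores(scores)


def only_genres(labels_by_rec):
    """Hypothetical filter, assuming subgenres are written as 'genre---subgenre'."""
    return {rec: {l for l in labels if "---" not in l}
            for rec, labels in labels_by_rec.items()}


truth = {"a": {"rock", "rock---punk"}, "b": {"jazz"}}
pred = {"a": {"rock"}, "b": {"rock"}}
print(per_recording(truth, pred))
print(per_label(truth, pred, ["rock", "rock---punk", "jazz"]))
```

The "only subgenres" variants follow the same pattern with the filter inverted.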

Per-track and per-label F-measure, all labels (genres and subgenres)
(Figure: results for JKU, DBIS and the popularity and random baselines)

Results on genres vs subgenres

Conclusions: The Task is Challenging!
● Subgenre recognition is much more difficult: there is much room for improvement!
● Datasets are heavily unbalanced
● High recall, but poor precision for many systems
● AllMusic dataset is the most difficult
● Systems should exploit hierarchies more
● No significant improvement from combining genre sources yet
Team results
● JKU's systems are consistently the best across all datasets
● DBIS exploits hierarchies and is significantly better than baselines
● KART, SAM-IRIT and ICSI perform at or close to baseline level

Future directions
● AcousticBrainz is an ongoing experiment in collaborative extraction of music knowledge from audio
● MediaEval 2017 is our starting point
● Integrate promising systems into AcousticBrainz
Next iteration of the AcousticBrainz Genre Task
● Exploit hierarchies to improve predictions on subgenre level
● Better combination of multiple genre annotation sources (Task 2)
● New music features?

Reproducibility
● Open development data
○ Music features computed by open-source software (Essentia)
○ Most genre annotations are open and are gathered by open-source software (MetaDB)
● Open-source code for evaluation and baselines
● Open validation datasets (to be published after the workshop)

Differences between genre annotation sources
Recording "Ambassel" by Dub Colossus
● Source 1: Electronic→ambient dub and Electronic→Downtempo
● Source 2: Electronic→dub and Hip-Hop and Reggae→Dub
● Source 3: World→Worldfusion
● Source 4: World→African and world→Worldfusion
Recording "Como Poden" by In Extremo
● Source 1: Pop/Rock→Heavy Metal
● Source 2: Rock→Folk Rock
● Source 3: Metal→Folk Metal
● Source 4: Rock/Pop→Folk Metal and Rock/Pop→Metal
