ISWC 2017 Resources track paper. The study of music is highly interdisciplinary, and thus requires the combination of datasets from multiple musical domains, such as catalog metadata (authors, song titles, dates), industrial records (labels, producers, sales), and music notation (scores). While today an abundance of music metadata exists on the Linked Open Data cloud, linked datasets containing interoperable symbolic descriptions of music itself, i.e. music notation with note and instrument level information, are scarce. In this paper, we describe the MIDI Linked Data Cloud dataset, which represents multiple collections of digital music in the MIDI standard format as Linked Data using the novel midi2rdf algorithm. At the time of writing, our proposed dataset comprises 10,215,557,355 triples of 308,443 interconnected MIDI files, and provides Web-compatible descriptions of their MIDI events. We provide a comprehensive description of the dataset, and reflect on its applications for research in the Semantic Web and Music Information Retrieval communities.
THE MIDI LINKED DATA CLOUD
Albert Meroño-Peñuela, Rinke Hoekstra, Aldo Gangemi, Peter Bloem, Reinier de Valk,
Bas Stringer, Berit Janssen, Victor de Boer, Alo Allik, Stefan Schlobach, Kevin Page
ISWC 2017, October 23rd
3. Vrije Universiteit Amsterdam
LINKED MUSIC ON THE WEB
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar,
Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
Etree
See Daquino et al. 2017 (WHiSe II)
Characterizing the Landscape of Musical Data on the Web: state of the art and challenges
Symbolic music databases (MusicXML, MIDI, NIFF, MEI) are non-interoperable
From Daquino et al. (WHiSe 2017):
“Repositories and digital libraries are the most representative
resources collecting musical data. They mainly offer digitisations
of scores and lyrics (77%), published as PDF and/or JPG (40%)”
“The more the scale of repositories increases, the less structured
formats for representing symbolic notation seem to be used and
the less depth of analysis is provided”
“Larger collections are more likely to feature melody”
Can we find ways of increasing the level of structure of
musical data without compromising its scalability?
COOL, BUT…
MIDI: Digital music representation protocol
> (i.e. leaving nothing to the analog signals of actual instruments)
Popular/abundant, production, standard
Musical Instrument Digital Interface (1983)
> Universal synthesizer interface
> Roland (I. Kakehashi), Yamaha, Korg, Kawai (1981)
> Digital, fine-grained representation of musical tracks and events
> Wide range of controllers and instruments
MIDI
[ 144, 60, 100 ]
[ 128, 60, 64 ]
BUT WHAT IS MIDI?
Thanks @rumyra! https://www.youtube.com/watch?v=khsBjXKJOPs
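The two messages above can be read byte by byte. As a minimal sketch (the function name and the returned dictionary layout are illustrative assumptions, not part of any library), the status byte splits into a message type in its upper four bits and a channel in its lower four bits, followed by a note number and a velocity:

```python
def decode_channel_message(message):
    """Decode a 3-byte MIDI note message such as [144, 60, 100]."""
    status, data1, data2 = message
    kind = status & 0xF0            # upper nibble: message type
    channel = (status & 0x0F) + 1   # lower nibble: channel, shown 1-based by convention
    names = {0x80: "note_off", 0x90: "note_on"}
    if kind not in names:
        raise ValueError(f"not a note message: status byte {status}")
    return {"type": names[kind], "channel": channel, "note": data1, "velocity": data2}


print(decode_channel_message([144, 60, 100]))  # note on, channel 1, note 60 (middle C)
print(decode_channel_message([128, 60, 64]))   # note off for the same note
```

Only note-on and note-off are handled here; the other channel message types (control change, program change, and so on) are left out to keep the sketch short.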
midi2rdf: lossless conversion of MIDI to RDF (and back)
Albert Meroño-Peñuela, Rinke Hoekstra. “The Song Remains the Same: Lossless Conversion and
Streaming of MIDI to RDF and Back”. In: 13th Extended Semantic Web Conference (ESWC 2016),
posters and demos track. May 29th — June 2nd, Heraklion, Crete, Greece (2016).
rdf2midi, direct stream mapping
MIDI2RDF & RDF2MIDI
https://midi-ld.github.io/
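To illustrate what a lossless round trip means here, the sketch below serialises a list of raw MIDI messages as N-Triples-style lines and then rebuilds the same list from them. It is not the midi2rdf implementation: the http://example.org/midi/ namespace, the property names (hasEvent, status, data1, data2) and both function names are hypothetical, and real MIDI files also carry tracks, delta times and meta events that are omitted here.

```python
import re

EX = "http://example.org/midi/"  # hypothetical namespace, not the project's vocabulary
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def events_to_triples(piece_id, events):
    """Emit one event resource per message, with one triple per byte."""
    triples = []
    for index, (status, data1, data2) in enumerate(events):
        event = f"<{EX}{piece_id}/event/{index}>"
        triples.append(f"<{EX}{piece_id}> <{EX}hasEvent> {event} .")
        for name, value in (("status", status), ("data1", data1), ("data2", data2)):
            triples.append(f'{event} <{EX}{name}> "{value}"^^<{XSD_INT}> .')
    return triples


# Matches only the per-byte triples; the hasEvent links are skipped on the way back.
BYTE_TRIPLE = re.compile(r'<[^>]*/event/(\d+)> <[^>]*/(status|data1|data2)> "(\d+)"')


def triples_to_events(triples):
    """Rebuild the message list, ordered by the event index kept in each URI."""
    fields = {}
    for line in triples:
        match = BYTE_TRIPLE.match(line)
        if match:
            index, name, value = match.groups()
            fields.setdefault(int(index), {})[name] = int(value)
    return [[f["status"], f["data1"], f["data2"]] for _, f in sorted(fields.items())]


events = [[144, 60, 100], [128, 60, 64]]
triples = events_to_triples("demo", events)
assert triples_to_events(triples) == events  # nothing was lost in the round trip
```

Keeping the event index in the URI is what preserves ordering in this sketch, since a set of triples has no inherent order.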
MIDI LINKED DATA RESOURCES
MIDI Pieces http://purl.org/midi-ld/piece/
> Access to MIDI level triples
> Cryptographic hash for unique MIDI content
http://purl.org/midi-ld/pattern/87dd99fb346cd4c7934cb36a00868cbe
MIDI Notes http://purl.org/midi-ld/notes/
> Type, label, octave, pitch value
MIDI Programs http://purl.org/midi-ld/programs/
> All instruments linked to DBpedia
MIDI Chords http://purl.org/midi-ld/chords/
> Label, quality, number of pitch classes, intervals
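As a rough illustration of the resources above, the sketch below mints a content-based URI and derives a note description from a MIDI pitch value. Several points are assumptions rather than facts from the slides: that the hash is MD5 (the 32 hexadecimal digits in the example URI are consistent with it), that it is computed over the raw file bytes, and that octaves follow the convention in which pitch 60 is C4. The function names are illustrative.

```python
import hashlib

PATTERN_BASE = "http://purl.org/midi-ld/pattern/"
NOTE_LABELS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pattern_uri(midi_bytes):
    """Identical content always hashes to the same URI (MD5 is an assumption)."""
    return PATTERN_BASE + hashlib.md5(midi_bytes).hexdigest()


def describe_note(pitch):
    """Label and octave for a MIDI pitch value, assuming pitch 60 is C4."""
    return {"pitch": pitch, "label": NOTE_LABELS[pitch % 12], "octave": pitch // 12 - 1}


print(pattern_uri(b"example MIDI bytes"))
print(describe_note(60))
```

Deriving the identifier from the content means two files with identical bytes resolve to the same resource, however they were named or where they were found.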
Enrichments
> Provenance
> Integrated lyrics (mostly from karaoke data)
> Key (Krumhansl-Schmuckler), scale degree, metric accents
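The key enrichment can be pictured with a simplified Krumhansl-Schmuckler sketch: build a pitch-class histogram and correlate it with the Krumhansl-Kessler major and minor profiles rotated to each of the 12 tonics. This version counts note occurrences instead of weighting by duration, and it is not necessarily how the dataset computes its key annotations; the function names are illustrative.

```python
from math import sqrt

# Krumhansl-Kessler key profiles, indexed by semitones above the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def estimate_key(pitches):
    """Return the best-correlating key label for a list of MIDI pitch values."""
    if not pitches:
        raise ValueError("need at least one note")
    histogram = [0] * 12
    for pitch in pitches:
        histogram[pitch % 12] += 1
    best_score, best_key = None, None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            # Rotate the profile so that index 0 lines up with the candidate tonic.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(histogram, rotated)
            if best_score is None or score > best_score:
                best_score, best_key = score, f"{NAMES[tonic]} {mode}"
    return best_key


# A C major scale with extra weight on the tonic triad (C, E, G).
print(estimate_key([60] * 4 + [64] * 3 + [67] * 3 + [62, 65, 69, 71]))
```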
MIDI LINKED DATA RESOURCES
Current collections
The largest MIDI collection on the Internet (thanks @midi_man)
Lakh MIDI dataset (thanks @colinraffel)
MySongBook MIDI
Yours! https://midi-ld.github.io/
308,443 interconnected MIDI files
10,215,557,355 triples
Full dump, SPARQL endpoint, RESTful API
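For the SPARQL access route, a request might be assembled as in the sketch below, which follows the SPARQL 1.1 Protocol convention of passing the query in a `query` URL parameter. The endpoint address and the piece identifier are placeholders (the slides do not give them), and the query only asks for all properties of one resource so that no vocabulary terms need to be assumed.

```python
from urllib.parse import urlencode


def build_sparql_request(endpoint, resource_uri, limit=20):
    """Build a GET URL listing the properties and values of one resource."""
    query = f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT {int(limit)}"
    return endpoint + "?" + urlencode({"query": query})


# Both arguments are placeholders: substitute the real endpoint and a real piece URI.
url = build_sparql_request(
    "http://example.org/sparql",
    "http://purl.org/midi-ld/piece/EXAMPLE-ID",
)
print(url)
```

Sending the request, for example with urllib.request and an Accept header of application/sparql-results+json, is left out so that the sketch runs without network access.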
ENABLING SEMANTIC WEB RESEARCH
Data integration
> Further format interoperability: MIDI, MusicXML, NIFF, MEI
> Integration with formats of other arts: LabanXML
Entity linking
> Audio (Spotify URIs), symbolic notation (MIDI), metadata (MusicBrainz)
> High heterogeneity, low overlap
> Challenge to entity linking algorithms
Semantics and ontologies
> Music Ontology, Chord Ontology, Timeline Ontology
> Underspecification of musical concepts
> Reasoning
> Challenge for ontology alignment
ENABLING MUSICOLOGY RESEARCH
Analysis of chords, patterns and melodies at Web scale
> Integrating knowledge from external databases
> Historical, geographical, cultural, economic, stylistic contexts
Everything has a URI
> Annotation tasks, workflow descriptions
Establishing standard Web vocabularies
> Chords (iReal Pro), melodies, metadata
Recommender systems
> Collaborative filtering, content-based feature extraction, hybrid
> Notation-based support for abstract representation of musical concepts
Machine learning (multimodal training data, convincing samples)
Audiolisation