MusicBrainz is an encyclopedia of music tracks, artists, and albums. Its data is available as a PostgreSQL database under a CC license. Two approaches to loading the database into MongoDB are examined: in the first, four tables are denormalized in PostgreSQL and then loaded into MongoDB; in the second, the tables are loaded into MongoDB and denormalized into a single collection there. We also show MongoDB's full-text index.
4. What is MusicBrainz?
• MusicBrainz is a community-maintained open
source encyclopedia of music information.
• This means that anyone - including you - can help
contribute to the project by adding information
about your favorite artists and their related works.
• Robert Kaye founded MusicBrainz. The project
has grown rapidly from a one-man operation to
an international community of enthusiasts who
appreciate both music and music metadata.
5. MusicBrainz
• Along the way, the scope of the project has
expanded from its origins as a mere CDDB
replacement to today, where MusicBrainz has
become a true encyclopedia of music.
• As an encyclopedia and as a community,
MusicBrainz exists solely to collect as much
information about music as we can without
discriminating or preferring one "type" of music
over another.
6. MusicBrainz Database
The MusicBrainz Database is where all of the various pieces of information we
collect about music are stored, from artists and their releases to works and their
composers, and of course much more.
The majority of the data in the MusicBrainz Database is placed in the Public
Domain, which means that anyone can download the data and use it in any way
they see fit. The remaining data is released under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.0 license.
13. Import strategies
• Denormalized from source DB
– Import TSV into PostgreSQL
– Export joined tables from PostgreSQL
– mongoimport TSV
• Separate collections from TSV
– mongoimport TSVs into temporary collections
– “Join” temporary collections in the client (PyMongo) and
insert into the destination collection
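
The second strategy, joining the temporary collections on the client side, could look roughly like the sketch below. It is an illustration under assumptions, not the code used in the talk: the collection names (`tmp_track`, `tmp_artist`, `tmp_album`, `track_denormalized`), the field names (`id`, `name`, `artist_id`, `album_id`), the connection URI, and the batch size are all hypothetical. The join itself is plain Python over iterables of dicts, which is what a PyMongo cursor yields, so it can be tried without a running server.

```python
from collections.abc import Iterable, Iterator


def index_by(docs: Iterable[dict], key: str) -> dict:
    """Build an in-memory lookup table: value of `key` -> document."""
    return {doc[key]: doc for doc in docs}


def join_tracks(
    tracks: Iterable[dict],
    artists: Iterable[dict],
    albums: Iterable[dict],
) -> Iterator[dict]:
    """Client-side "join": embed artist and album names into each track.

    All field names are hypothetical placeholders for the real columns.
    """
    artist_by_id = index_by(artists, "id")
    album_by_id = index_by(albums, "id")
    for track in tracks:
        artist = artist_by_id.get(track["artist_id"])
        album = album_by_id.get(track["album_id"])
        yield {
            "name": track["name"],
            # A missing reference becomes None instead of raising.
            "artist": artist["name"] if artist else None,
            "album": album["name"] if album else None,
        }


def run(uri: str = "mongodb://localhost:27017", db_name: str = "musicbrainz") -> None:
    """Read the temporary collections and fill the destination collection."""
    from pymongo import MongoClient  # imported here so the join is testable alone

    db = MongoClient(uri)[db_name]
    joined = join_tracks(
        db["tmp_track"].find(),
        db["tmp_artist"].find(),
        db["tmp_album"].find(),
    )
    batch = []
    for doc in joined:
        batch.append(doc)
        if len(batch) == 1000:  # assumed batch size
            db["track_denormalized"].insert_many(batch)
            batch = []
    if batch:  # insert_many rejects an empty list
        db["track_denormalized"].insert_many(batch)
```

This sketch holds the artist and album lookup tables in client memory, which keeps the join to a single pass over the tracks but assumes those tables fit in RAM; for larger lookup tables, per-document queries against an indexed collection would be the alternative.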