Working with Humongous Music Database
MongoDB
Prasoon Kumar
#HyderabadDataScienceGroup
Agenda
•  MongoDB Features
•  Bulk Import
•  Full Text Index Creation
•  Full Text Search
•  MusicBrainz Database
MUSICBRAINZ
What is MusicBrainz?
•  MusicBrainz is a community-maintained open
source encyclopedia of music information.
•  This means that anyone, including you, can
contribute to the project by adding information
about your favorite artists and their related works.
•  Robert Kaye founded MusicBrainz. The project
has grown rapidly from a one-man operation to
an international community of enthusiasts who
appreciate both music and music metadata.
MusicBrainz
•  Along the way, the scope of the project has
expanded from its origins as a mere CDDB
replacement to what it is today: a true
encyclopedia of music.
•  As an encyclopedia and as a community,
MusicBrainz exists solely to collect as much
information about music as we can without
discriminating or preferring one "type" of music
over another.
MusicBrainz Database
The MusicBrainz Database is where all of the various pieces of information we
collect about music are stored, from artists and their releases to works and their
composers, and much more.
The majority of the data in the MusicBrainz Database is placed in the Public
Domain, which means that anyone can download the data and use it in any way
they see fit. The remaining data is released under a Creative Commons
Attribution-NonCommercial-ShareAlike 2.0 license.
MongoDB
•  Document database
•  Open source
•  General purpose
Scalability
Auto-Sharding
•  Increase capacity as you go
•  Commodity and cloud architectures
•  Improved operational simplicity and cost visibility
Drivers & Ecosystem
Support for the most popular languages and frameworks:
•  Java, Python, Perl, Ruby
•  Morphia, MEAN Stack
Music Mongo
•  Load (import)
•  Run
–  Exact match
–  Full text search
•  To do
–  Application interface
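The two query types can be sketched as PyMongo-style query documents. This is a minimal sketch, not the talk's code: it only builds the filter, projection, and sort values, so it runs without a server. The queried field `name` is an assumption (it is the first field of the shard key used later), and `exact_match_query` / `text_search_query` are hypothetical helper names. `$text`, `$search`, and `{"$meta": "textScore"}` are standard MongoDB query syntax; a `$text` query needs a text index on the collection.

```python
def exact_match_query(name):
    """Filter for an exact, case-sensitive match on the (assumed) `name` field."""
    return {"name": name}


def text_search_query(terms):
    """Return (filter, projection, sort) for a ranked $text search.

    Requires a text index on the collection. MongoDB exposes each document's
    relevance through {"$meta": "textScore"}; we project it and sort by it.
    """
    query = {"$text": {"$search": terms}}
    projection = {"name": 1, "score": {"$meta": "textScore"}}
    sort = [("score", {"$meta": "textScore"})]
    return query, projection, sort


# Against a live PyMongo collection `coll` (not executed here):
# coll.find(exact_match_query("Nirvana"))
# q, proj, sort = text_search_query("teen spirit")
# coll.find(q, proj).sort(sort)
```

An exact match can use an ordinary index on `name` and returns only identical strings, while the `$text` query tokenizes the search terms and ranks partial matches by score.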
AWS Setup
•  s0: 54.225.100.65
•  s1: 54.235.157.214
•  s2: 54.225.100.42
•  Client & mongos: 54.225.100.39
•  Config server: 184.73.195.120
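The topology above (three shard hosts behind one mongos) can be sketched as the admin commands the router would receive. This is a hedged illustration, not the talk's setup script: port 27017 (MongoDB's default) and the database name `musicbrainz2` (taken from the shard key slide) are assumptions, and `sharding_setup_commands` is a hypothetical helper. The commands are plain dicts that PyMongo's `client.admin.command(...)` accepts.

```python
SHARD_HOSTS = {
    "s0": "54.225.100.65",
    "s1": "54.235.157.214",
    "s2": "54.225.100.42",
}
DEFAULT_PORT = 27017  # assumption: each shard listens on MongoDB's default port


def sharding_setup_commands(hosts, database, port=DEFAULT_PORT):
    """Return, in order, the admin commands to register shards and enable sharding."""
    commands = [{"addShard": f"{ip}:{port}"} for ip in hosts.values()]
    commands.append({"enableSharding": database})
    return commands


# Sent to the mongos host (not executed here):
# from pymongo import MongoClient
# client = MongoClient("54.225.100.39", 27017)  # the client & mongos machine
# for cmd in sharding_setup_commands(SHARD_HOSTS, "musicbrainz2"):
#     client.admin.command(cmd)
```

The standalone `host:port` form matches a 2013-era setup; MongoDB 3.6 and later require each shard to be a replica set, so `addShard` would then take a `replicaSetName/host:port` string.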
Relevant schema of MusicBrainz:
Import strategies
•  Denormalized from source DB
–  Import TSV into PostgreSQL
–  Export joined tables from PostgreSQL as TSV
–  mongoimport the TSV
•  Separate collections from TSV
–  mongoimport TSVs into temporary collections
–  “Join” the temporary collections in the client (PyMongo) and
insert into the destination collection
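For the second strategy, each MusicBrainz table dump is loaded into its own temporary collection. A hedged sketch that builds, and optionally runs, one mongoimport invocation from Python: `--host`, `--db`, `--collection`, `--type tsv`, `--fields`, and `--file` are standard mongoimport options, while the `tmp_` collection prefix, the `mbdump/` path, the database name, and the field lists are illustrative assumptions. `--fields` is used on the assumption that the dump files are headerless PostgreSQL COPY output.

```python
import shlex
import subprocess


def mongoimport_argv(table, fields, db="musicbrainz2", host="54.225.100.39"):
    """Build the mongoimport argument list for one TSV dump file."""
    return [
        "mongoimport",
        "--host", host,                  # the mongos from the AWS setup (assumed)
        "--db", db,
        "--collection", f"tmp_{table}",  # hypothetical temp-collection naming
        "--type", "tsv",
        "--fields", ",".join(fields),    # headerless file, so name columns here
        "--file", f"mbdump/{table}",     # hypothetical dump-file path
    ]


def run_import(table, fields, dry_run=True):
    """Print the command (dry run) or execute it, raising on a non-zero exit."""
    argv = mongoimport_argv(table, fields)
    if dry_run:
        print(shlex.join(argv))
        return None
    return subprocess.run(argv, check=True)
```

COPY writes NULL as `\N`, which mongoimport would likely load as a literal string, so a cleanup pass during the client-side join may be needed.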
Steps for creating the denormalized table:
Client join
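The client join code is not part of this text, so here is a minimal sketch of the idea, assuming hypothetical field names (`id`, `name`, `artist_credit`) loosely modeled on the MusicBrainz `recording` and `artist_credit` tables. It is an in-memory hash join: the smaller collection is indexed in a dict and each document of the larger one is streamed, enriched, and yielded in batches. With PyMongo the inputs would be cursors such as `db.tmp_recording.find()`; plain lists keep the sketch self-contained.

```python
from itertools import islice


def client_join(recordings, artist_credits, batch_size=1000):
    """Hash-join recording['artist_credit'] == credit['id'].

    Yields lists of denormalized documents, each with the matching credit
    embedded under 'artist_credit' (None when there is no match).
    """
    # Index the smaller side in memory; the larger side is streamed.
    credits_by_id = {c["id"]: c for c in artist_credits}

    def joined():
        for rec in recordings:
            doc = dict(rec)  # copy so the source document is not mutated
            credit = credits_by_id.get(rec.get("artist_credit"))
            doc["artist_credit"] = (
                {"id": credit["id"], "name": credit["name"]} if credit else None
            )
            yield doc

    stream = joined()
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            break
        yield batch


# With PyMongo (not executed here):
# for batch in client_join(db.tmp_recording.find(), db.tmp_artist_credit.find()):
#     db.records.insert_many(batch)
```

Per the import statistics, `artist_credit` (756,247 documents) is the side small enough to hold in memory, while the 12.8 million recordings are streamed and written with batched `insert_many` calls.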
Import statistics
•  recording: imported 12817015 objects, real 69m49.949s (2013-11-11T22:02:51.213+0000)
•  artist_credit: imported 756247 objects, real 1m50.256s (2013-11-11T22:04:41.469+0000)
•  track: imported 15427255 objects, real 44m17.973s (2013-11-11T22:48:59.423+0000)
•  release: imported 1208854 objects, real 4m7.183s (2013-11-11T22:53:06.627+0000)
•  medium: imported 1343234 objects, real 4m38.414s (2013-11-11T22:57:45.030+0000)
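The import statistics above can be turned into throughput figures. A small sketch that parses the same numbers, written one collection per line, and computes documents per second; the regular expression assumes exactly this line format.

```python
import re

IMPORT_LOG = """\
recording: 2013-11-11T22:02:51.213+0000 imported 12817015 objects real 69m49.949s
artist_credit: 2013-11-11T22:04:41.469+0000 imported 756247 objects real 1m50.256s
track: 2013-11-11T22:48:59.423+0000 imported 15427255 objects real 44m17.973s
release: 2013-11-11T22:53:06.627+0000 imported 1208854 objects real 4m7.183s
medium: 2013-11-11T22:57:45.030+0000 imported 1343234 objects real 4m38.414s
"""

LINE_RE = re.compile(
    r"^(?P<name>\w+):.*imported (?P<count>\d+) objects "
    r"real (?P<min>\d+)m(?P<sec>[\d.]+)s$"
)


def import_rates(log):
    """Return {collection: documents per second} for each matching line."""
    rates = {}
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        seconds = int(m["min"]) * 60 + float(m["sec"])
        rates[m["name"]] = int(m["count"]) / seconds
    return rates


for name, rate in import_rates(IMPORT_LOG).items():
    print(f"{name:14s} {rate:8.0f} docs/s")
```

On these figures, `recording` is the slowest import at about 3,060 documents per second and `artist_credit` the fastest at about 6,860.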
Import via Postgres
Operation          Time
Postgres import    08m11s
Denormalize        14m57s
Export             00m29s

                   Unsharded   Sharded
MongoDB import     14m59s      12m15s
Index              07m45s      02m35s
Overall            45m23s      40m13s
Indexes & Sharding
Indexes & Sharding - Text Index
Indexes & Sharding - Shard key
musicbrainz2.records3
shard key: { "name" : 1, "_id" : 1 }
chunks:
shard0002 18
shard0000 18
shard0001 18
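The compound shard key above is range-based: each chunk owns a contiguous range of (`name`, `_id`) values, and the 54 chunks are spread evenly, 18 per shard. The sketch shows the matching `shardCollection` admin command and a simplified router that finds a key's chunk by binary search. The split points are made up for illustration, `RangeRouter` is a hypothetical class, and Python tuple comparison only approximates MongoDB's BSON ordering (it holds for string names and same-typed `_id` values).

```python
from bisect import bisect_right

# Admin command for the collection on the slide; send via client.admin.command(...).
SHARD_COLLECTION_CMD = {
    "shardCollection": "musicbrainz2.records3",
    "key": {"name": 1, "_id": 1},
}


class RangeRouter:
    """Route compound shard-key values to chunks by range (simplified)."""

    def __init__(self, split_points, shards):
        # split_points: sorted lower bounds of every chunk except the first,
        # which starts at MinKey. Chunk i is owned by shards[i].
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard entry per chunk")
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, name, _id):
        # Tuples compare element by element, like a compound key on (name, _id);
        # bisect_right makes each lower bound inclusive: [lower, upper).
        return self.shards[bisect_right(self.split_points, (name, _id))]


# Hypothetical split points: 3 chunks, one per shard from the slide.
router = RangeRouter(
    split_points=[("G", 0), ("P", 0)],
    shards=["shard0000", "shard0001", "shard0002"],
)
```

Adding `_id` after `name` lets the balancer split chunks even when many documents share a popular name, which a shard key on `name` alone could not do.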
Thank You
team = {
  members: ["Jonathan", "Prasoon"],
  company: "MongoDB"
}
@prasoonk
