Audio fingerprinting and metadata
     correction with Python

           Alastair Porter


         November 21, 2011
Me

     Background in Computer Science
     Master's in Music Technology, McGill
     Online
         http://github.com/alastair (20/28 music; 11 in python)
         http://twitter.com/alastairporter
Python as a go-to language

     Quick for prototyping
     Use the same code in a production release
     Very handy for API access (thin wrapper around urllib2)
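
A minimal sketch of what a "thin wrapper" for API access might look like. The slide names urllib2, which became urllib.request in Python 3; that is what is used here. The class name ThinClient, the base URL, and the "lookup" method are hypothetical and are not taken from albumidentify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


class ThinClient:
    """Tiny web-API wrapper (illustrative only, not albumidentify's code)."""

    def __init__(self, base_url, fetch=None):
        self.base_url = base_url.rstrip("/")
        # `fetch` maps a URL to response bytes; it can be swapped out
        # so the wrapper is usable without network access.
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url):
        with urlopen(url, timeout=10) as response:
            return response.read()

    def build_url(self, method, **params):
        # Sort parameters so the same call always produces the same URL.
        query = urlencode(sorted(params.items()))
        return "%s/%s?%s" % (self.base_url, method, query)

    def call(self, method, **params):
        # Assumes the (hypothetical) service answers with UTF-8 JSON.
        body = self._fetch(self.build_url(method, **params))
        return json.loads(body.decode("utf-8"))
```

Keeping URL construction separate from fetching means the same code can serve both a quick prototype and a tested release.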
Music and Metadata

  The problem:
      People are really bad at naming music
      Inconsistent over releases


  The solution:
      Crowdsourcing
      Get info from as many trusted sources as possible
      Make renaming take no effort
MusicBrainz
Amazon
Amazon (Coverart)
Last.fm
Last.fm (Genre tags)
MusicBrainz
albumidentify

  http://github.com/albumidentify/albumidentify
MP3, FLAC, Ogg, CDs
Identification strategy

      If there’s a CD table of contents (TOC), use that (MusicBrainz lookup)
      If no match, use audio fingerprinting
      If no match, do a text lookup (artist/album)
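
The fallback order above can be sketched as a chain of lookups tried in turn. The three lookup functions and the release identifiers below are hypothetical stand-ins; the real lookups would query MusicBrainz or a fingerprint service.

```python
def identify(album_dir, strategies):
    """Try each (name, lookup) pair in order; return the first match."""
    for name, lookup in strategies:
        release = lookup(album_dir)
        if release is not None:
            return name, release
    return None, None


# Hypothetical stand-ins for the three lookups named on the slide.
def lookup_by_toc(album_dir):
    return None  # pretend the files carry no CD TOC


def lookup_by_fingerprint(album_dir):
    return "release-123"  # made-up release identifier


def lookup_by_text(album_dir):
    return "release-456"  # made-up release identifier


STRATEGIES = [
    ("toc", lookup_by_toc),
    ("fingerprint", lookup_by_fingerprint),
    ("text", lookup_by_text),
]
```

Calling identify("some/album", STRATEGIES) stops at the first strategy that returns something, so the cheaper, more exact lookups come first.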
Fingerprinting

     Converts an audio signal to a short sequence of numbers
     Smaller to compare than an entire file
     Perceptual features rather than byte comparison (works
     with different encodings)
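
To illustrate the idea only: the toy fingerprint below records whether the energy rises from one frame to the next. It is not the algorithm used by albumidentify, AcoustID, or Echoprint; the function names, the frame size, and the test tone are all assumptions made for this sketch.

```python
import math


def fingerprint(samples, frame_size=256):
    """Toy fingerprint: one bit per pair of adjacent frames (1 = energy rose)."""
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def hamming(fp_a, fp_b):
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(fp_a, fp_b))


def tone(n=4096, gain=1.0):
    """A 440 Hz tone sampled at 8 kHz with a slow amplitude wobble."""
    return [
        gain
        * (1 + 0.5 * math.sin(2 * math.pi * 3 * t / n))
        * math.sin(2 * math.pi * 440 * t / 8000)
        for t in range(n)
    ]


loud = tone()
quiet = tone(gain=0.5)  # same sound, different sample values
```

The two signals differ sample by sample, so a byte comparison fails, yet hamming(fingerprint(loud), fingerprint(quiet)) is 0 because only the shape of the energy curve is kept.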
Identification strategy

      Fingerprinting gives us a set of candidate tracks
      A track could be on many albums (original release, best of,
      mix album)
      Keep a list of what tracks we have for each album
      Once we fill all the slots for an album, success!
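
The slot-filling idea can be sketched as follows. This is an assumption about how the bookkeeping could work, not albumidentify's actual code; the function name, the album identifiers, and the input shapes are invented for the example.

```python
from collections import defaultdict


def first_complete_album(track_candidates, album_sizes):
    """Return the first album whose track slots are all filled.

    track_candidates: list of (file_name, [(album_id, track_number), ...])
    album_sizes: dict mapping album_id to its number of tracks
    """
    slots = defaultdict(dict)  # album_id -> {track_number: file_name}
    for file_name, candidates in track_candidates:
        for album_id, track_number in candidates:
            # Keep the first file seen for each slot.
            slots[album_id].setdefault(track_number, file_name)
            if len(slots[album_id]) == album_sizes[album_id]:
                return album_id, dict(slots[album_id])
    return None, {}


files = [
    ("a.mp3", [("original", 1), ("best-of", 7)]),
    ("b.mp3", [("original", 2)]),
    ("c.mp3", [("original", 3), ("best-of", 2)]),
]
sizes = {"original": 3, "best-of": 12}
```

Here first_complete_album(files, sizes) settles on "original": the best-of compilation also matches two of the files, but its twelve slots never fill.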
Metadata strategy

     Text information from MusicBrainz
     Genre from Last.fm
     Image from Amazon (or folder.jpg)
     MusicBrainz tells us where these are (don’t need to search)
     Save in every file (Text is cheap)
Writing it all out

      Custom MP3/ID3 writer
      Ogg meta tags
      FLAC meta tags
      Name files
          Artist/Artist - Year - Album/01 - Artist - Track
      ReplayGain!
      Be a good citizen: Submit fingerprints to MusicBrainz
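
The naming scheme "Artist/Artist - Year - Album/01 - Artist - Track" could be built like this. The helper names and the set of characters replaced are assumptions for illustration; the slide does not say how albumidentify sanitises names.

```python
import posixpath
import re


def safe(part):
    """Replace path separators and characters many filesystems reject."""
    return re.sub(r'[\\/:*?"<>|]', "_", part).strip()


def target_path(artist, year, album, track_number, title, ext):
    """Build Artist/Artist - Year - Album/NN - Artist - Track.ext"""
    folder = "%s - %s - %s" % (safe(artist), year, safe(album))
    name = "%02d - %s - %s.%s" % (track_number, safe(artist), safe(title), ext)
    return posixpath.join(safe(artist), folder, name)
```

For example, target_path("AC/DC", 1980, "Back in Black", 1, "Hells Bells", "mp3") gives "AC_DC/AC_DC - 1980 - Back in Black/01 - AC_DC - Hells Bells.mp3"; without the sanitising step the slash in the artist name would create an extra directory level.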
What’s next

     New version of MusicBrainz
     New fingerprinter
     More metadata
     More metadata
Thanks

  More information:
      MusicBrainz: http://musicbrainz.org
      albumidentify: http://github.com/albumidentify/albumidentify
      More fingerprinting: http://acoustid.org, http://echoprint.me
      Last.fm

Mp25: Audio Fingerprinting and metadata correction with Python
