Deep learning for music recommendation
Aloïs Gruson
@nilandmusic @aloisgr niland.io
Who we are
• Founded in 2013 by two PhDs who worked at IRCAM
• Won MIREX 2011 in Music Similarity Estimation and Music Classification
• We sell our technology through our API
• A team of 9 today
What we want to do
• Create a high-dimensional space where every song is a vector
• Use this space to find similar tracks and classify songs
• Each query must take less than 50 ms over millions of tracks
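As a minimal sketch of what "every song is a vector" means for search, the snippet below ranks tracks by cosine similarity to a query vector. The track ids and the 3-dimensional vectors are hypothetical, and the exhaustive scan is only for illustration: it assumes non-zero vectors, and meeting a 50 ms budget over millions of tracks would presumably require an indexed or approximate search rather than this loop.

```python
import math

def normalize(vec):
    """Scale a (non-zero) vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def most_similar(query, catalog, k=2):
    """Return the k track ids whose vectors are closest to the query (cosine)."""
    q = normalize(query)
    scored = []
    for track_id, vec in catalog.items():
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), track_id))
    scored.sort(reverse=True)  # highest similarity first
    return [track_id for _, track_id in scored[:k]]

# Hypothetical embeddings; a real space would have many more dimensions.
catalog = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
}
print(most_similar([1.0, 0.0, 0.0], catalog))  # ['track_a', 'track_b']
```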
How music information retrieval worked in 2011
• Short-term descriptors: MFCCs, Fluctuation Patterns ("Block-level audio features for music genre classification", Seyerlehner et al.) and many more!
• Pooling techniques: VQ, GMM-SV ("GMM Supervector for content based music similarity", Charbuillet et al.), VLAD ("Aggregating local descriptors into a compact image representation", Jégou et al.) ...
[Diagram: audio, short-term descriptors (MFCCs, FP), pooling (VLAD, GMM-SV)]
One of our evaluation datasets
• Evaluation metrics for a search engine: Precision at K (P@k) or mean Average Precision
• Evaluation set presented here: 8,500 tracks in 141 playlists of mainstream music
P@k (%)      1      5      10     20     50
mirex2011    17.48  15.39  13.87  12.23  10.00
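The two metrics named above can be sketched as follows. This is a minimal illustration, not the evaluation code behind the table: it assumes that the tracks relevant to a query are those sharing a playlist with it, which the slide does not state explicitly, and the track ids are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant to the query."""
    top_k = ranked_ids[:k]
    return sum(1 for t in top_k if t in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values measured at the rank of each relevant hit."""
    hits, total = 0, 0.0
    for rank, t in enumerate(ranked_ids, start=1):
        if t in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical query result: a ranked list and the set of same-playlist tracks.
ranked = ["t1", "t7", "t3", "t9", "t4"]
relevant = {"t1", "t3", "t4"}
print(precision_at_k(ranked, relevant, 1))  # 1.0
print(precision_at_k(ranked, relevant, 5))  # 0.6
print(average_precision(ranked, relevant))
```

Mean Average Precision is then the mean of `average_precision` over all queries, and the P@k figures in the table would be the mean of `precision_at_k` over queries, expressed in percent.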
From 2013 to 2014 @niland
• How to make a product from research work!
• And a lot of work on short-term descriptors and pooling techniques
• But still completely unsupervised, with no real way to match outputs with human perception!
P@k (%)            1       5       10      20      50
mirex2011          17.48   15.39   13.87   12.23   10.00
2014               19.70   16.81   15.37   13.57   11.01
Relative gain (%)  +12.70  +9.23   +10.81  +10.96  +10.10
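The percentage row above is a relative gain over the baseline, not a difference in precision points. A short check, using only the figures from the table:

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline

mirex2011 = [17.48, 15.39, 13.87, 12.23, 10.00]
system_2014 = [19.70, 16.81, 15.37, 13.57, 11.01]

gains = [round(relative_gain(b, s), 2) for b, s in zip(mirex2011, system_2014)]
print(gains)
```

For example, at k = 1 the gain is (19.70 - 17.48) / 17.48, or about 12.7 %.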
Matching algorithm outputs with human perception
• Learn the outputs of a collaborative filtering model ("Deep content-based music recommendation", van den Oord et al.)
• Or use a network trained to classify tracks into groups of similar tracks
Integrating the human idea of similarity
• 150k tracks in 3,500 theme-based albums from our clients
• Each album represents a genre, a mood or a usage
• Each album gathers socially similar tracks
Learning with theme-based albums
• We use the outputs of our previous system as the network input
• We train the network with a classification cost
• And remove the classification layer!
P@k (%)            1       5       10      20      50
2014               19.70   16.81   15.37   13.57   11.01
+deep              23.40   21.09   19.68   18.07   15.19
Relative gain (%)  +18.78  +25.46  +28.04  +33.16  +37.97
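The recipe on this slide (train with a classification cost, then drop the classification layer and keep the hidden activations as the track vector) can be sketched at toy scale. Everything below is an assumption for illustration: the one-hidden-layer tanh network, the plain SGD loop, the layer sizes and the four fake "tracks" with two fake "albums" are not taken from the talk.

```python
import math
import random

def embed(x, W1):
    """Embedding = hidden activations; no classification layer involved."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def forward(x, W1, W2):
    """Return (hidden embedding, class probabilities) for one input vector."""
    h = embed(x, W1)
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    s = sum(exps)
    return h, [e / s for e in exps]

def train(data, n_in, n_hidden, n_classes, epochs=200, lr=0.5, seed=0):
    """Train with softmax cross-entropy by plain SGD (toy scale only)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in data:
            h, p = forward(x, W1, W2)
            # Gradient of cross-entropy w.r.t. the logits: p - one_hot(label).
            d_logits = [pj - (1.0 if j == label else 0.0) for j, pj in enumerate(p)]
            # Back-propagate into the hidden layer before W2 is modified.
            d_h = [sum(d_logits[j] * W2[j][i] for j in range(n_classes)) * (1 - h[i] ** 2)
                   for i in range(n_hidden)]
            for j in range(n_classes):
                for i in range(n_hidden):
                    W2[j][i] -= lr * d_logits[j] * h[i]
            for i in range(n_hidden):
                for k in range(n_in):
                    W1[i][k] -= lr * d_h[i] * x[k]
    return W1, W2

# Hypothetical inputs standing in for previous-system features (last value is a
# constant bias input); labels stand in for album ids.
data = [([1.0, 0.0, 1.0], 0), ([0.9, 0.1, 1.0], 0),
        ([0.0, 1.0, 1.0], 1), ([0.1, 0.9, 1.0], 1)]
W1, W2 = train(data, n_in=3, n_hidden=4, n_classes=2)

# After training, W2 (the classification layer) is simply not used any more.
embeddings = [embed(x, W1) for x, _ in data]
```

The vectors in `embeddings` are what would be indexed for similarity search; the album labels only serve to shape that space during training.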
What if we want to remove the highly engineered features and pooling techniques?
Convolutional Neural Networks for Image Recognition
Source: http://www.clarifai.com/technology
And for music?
• Mel-spectrogram (time-frequency representation) as input: the two axes have different meanings! Should we really use square filters?
• Labels apply to the whole track (>= 30 seconds): the input is 128x1200 for a 30-second song! We have to pool along the time axis!
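One way to respect the different meaning of the two axes is a filter that spans every frequency band and slides along time only, followed by pooling along time. The sketch below shows that idea on a tiny hand-made "spectrogram"; the filter shape, the ReLU and the pooling size are assumptions, not the architecture used in the talk. For scale, 1200 frames for 30 seconds means 40 frames per second.

```python
def conv_time(spec, kernel):
    """Slide a filter along time only; the filter covers every frequency band.

    spec:   list of n_bands rows, each with n_frames values
    kernel: list of n_bands rows, each with `width` values
    Returns one rectified activation per valid time position.
    """
    n_frames = len(spec[0])
    width = len(kernel[0])
    out = []
    for t in range(n_frames - width + 1):
        acc = 0.0
        for band_values, band_weights in zip(spec, kernel):
            for dt in range(width):
                acc += band_values[t + dt] * band_weights[dt]
        out.append(max(acc, 0.0))  # ReLU
    return out

def max_pool_time(signal, size):
    """Non-overlapping max pooling along the time axis."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

# Toy input: 2 bands x 6 frames instead of 128 x 1200.
spec = [[0, 1, 0, 0, 2, 0],
        [0, 1, 0, 0, 2, 0]]
kernel = [[1, 1],
          [1, 1]]  # 2 bands x 2 frames

activations = conv_time(spec, kernel)
print(activations)                    # [2.0, 2.0, 0.0, 4.0, 4.0]
print(max_pool_time(activations, 2))  # [2.0, 4.0]
```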
And for music?
Source: Sander Dieleman, http://benanne.github.io/2014/08/05/spotify-cnns.html
And for music?
Some ideas to slightly improve it:
• Multi-scale pooling
• Reduce max pooling
• Add batch-norm
P@k (%)      1      5      10     20     50
2014+deep    23.40  21.09  19.68  18.07  15.19
CNN          23.85  21.31  19.81  18.06  15.18
Okay, so?
• Our 2014 system is a mix of 6 different short-term descriptors + 6 different "smart" pooling functions: 10 years of research!
• Has the engineering problem become a data problem?
P@k (%)      1      5      10     20     50
2014+deep    23.40  21.09  19.68  18.07  15.19
CNN          23.85  21.31  19.81  18.06  15.18
From Fisher Vectors to simple pooling functions?
• A very simple pooling function can give great results!
P@k (%)        1      5      10     20     50
Mean           20.94  19.04  17.69  16.17  13.74
Max            22.21  19.90  18.58  17.07  14.61
Var            21.66  19.46  18.14  16.58  14.13
Mean+Max+Var   23.85  21.31  19.81  18.06  15.18
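The "Mean+Max+Var" row can be read as concatenating three global statistics computed over time for each feature map, which turns a variable-length track into a fixed-size vector. A minimal sketch, assuming population variance and hypothetical feature-map values:

```python
def global_pool(feature_maps):
    """Concatenate mean, max and variance over time for each feature map."""
    means, maxes, variances = [], [], []
    for fmap in feature_maps:
        mean = sum(fmap) / len(fmap)
        means.append(mean)
        maxes.append(max(fmap))
        # Population variance (divide by N); an assumption of this sketch.
        variances.append(sum((v - mean) ** 2 for v in fmap) / len(fmap))
    return means + maxes + variances

# Two hypothetical feature maps with three time steps each.
pooled = global_pool([[1.0, 3.0, 5.0], [2.0, 2.0, 2.0]])
print(len(pooled))  # 6
```

With F feature maps the output always has 3 x F values, whatever the track duration.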
And with square filters?
• Square filters also seem to work!
P@k (%)      1      5      10     20     50
CNN          23.85  21.31  19.81  18.06  15.18
CNNsq        22.94  20.84  19.79  18.15  15.52
A transferable model for music
• Also works for world music, library music…
• This dataset: 10k tracks of library music in 300 groups
P@k (%)      1      5      10     20     50
2014+deep    30.66  19.99  15.57  11.81  7.93
CNN          29.76  19.82  15.55  11.85  7.80
The spectrogram is still an engineered feature…
Could we learn a better temporal filter bank to replace the FFT and mel-filtering?
"End-to-end learning for music audio", Dieleman and Schrauwen
"Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
Source: "Learning the Speech Front-end with raw waveform CLDNNs", Sainath et al.
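A learned front-end of this kind can be pictured as a strided 1-D convolution over raw samples, followed by rectification and log compression, so that each output frame plays the role of one spectrogram column. The sketch below only shows the forward computation with fixed, hand-written filters; in an end-to-end model the filter values would be learned by back-propagation. The filter values, hop size and the choice of `log1p(abs(.))` are assumptions of this sketch, not details from the cited papers.

```python
import math

def learned_frontend(waveform, filters, hop):
    """Strided 1-D convolution over raw samples, then rectification and log compression.

    Returns a list of frames; each frame holds one value per filter.
    """
    width = len(filters[0])
    frames = []
    for start in range(0, len(waveform) - width + 1, hop):
        window = waveform[start:start + width]
        frame = []
        for f in filters:
            response = sum(w * s for w, s in zip(f, window))
            frame.append(math.log1p(abs(response)))
        frames.append(frame)
    return frames

# Toy signal: 16 samples of a period-4 oscillation.
waveform = [0.0, 1.0, 0.0, -1.0] * 4
filters = [
    [0.0, 1.0, 0.0, -1.0],  # matches the oscillation
    [1.0, 1.0, 1.0, 1.0],   # averages it out
]
frames = learned_frontend(waveform, filters, hop=4)
print(len(frames), len(frames[0]))  # 4 2
```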
The spectrogram is still an engineered feature…
P@k (%)      1      5      10     20     50
Raw          20.11  18.95  17.23  15.91  14.26
Spectro      23.85  21.31  19.81  18.06  15.18
Maybe we need more data?
We can improve!
• Add more albums!
• With 500k tracks? 1M?
P@k (%)       1      5      10     20     50
25k tracks    19.84  17.98  15.21  14.06  13.41
150k tracks   23.85  21.31  19.81  18.06  15.18
And…
• Add more layers!
"Deep Residual Learning for Image Recognition", He et al.
P@k (%)      1      5      10     20     50
PlainNet9    23.85  21.31  19.81  18.06  15.18
ResNet78     23.87  22.17  20.98  19.38  16.68
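The difference between a plain and a residual network is the skip connection: each block outputs its input plus a learned correction, y = ReLU(x + F(x)). The sketch below uses small dense layers in place of convolutions (an assumption made to keep it short) and shows why very deep stacks stay trainable: a block with zero weights passes its input through unchanged instead of wiping it out.

```python
def linear(x, weights):
    """Matrix-vector product; each row of `weights` produces one output."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(x):
    return [max(0.0, v) for v in x]

def plain_block(x, W1, W2):
    """Two stacked layers without a skip connection."""
    return relu(linear(relu(linear(x, W1)), W2))

def residual_block(x, W1, W2):
    """Same two layers, but the input is added back before the final ReLU."""
    fx = linear(relu(linear(x, W1)), W2)
    return relu([a + b for a, b in zip(x, fx)])

zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]
print(plain_block(x, zeros, zeros))     # [0.0, 0.0]
print(residual_block(x, zeros, zeros))  # [1.0, 2.0]
```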
And?
• Data augmentation?
"Exploring data augmentation for improved singing voice detection with neural networks", Schlüter and Grill
• Recurrent Neural Networks?
• Siamese Network?
"An exploration of deep learning in music informatics", Humphrey et al.
• More data! Or a semi-supervised approach?
"Semi-supervised learning with ladder networks", Rasmus et al.
Questions?
@aloisgr @nilandmusic niland.io
Try it for yourself: http://demo.niland.io

Deep Learning Meetup #5
