SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Downloaden Sie, um offline zu lesen
Music Personalization:
Realtime Platforms
♫ + ML + You = ❤
CrunchConf, Budapest, October 30, 2015
Esh Kumar
Machine Learning & Data Products @ Spotify NYC
@eshvk
Who am I?
• UT Austin Machine Learning
• Building Large Scale Recommendation Systems @
Mozilla, StumbleUpon & Spotify
75 M+ Active Users
58 Markets
1 TB of Logs/Day
1200+ Node Hadoop Cluster
Products
•Discover … to find new albums
•Discover Weekly … A weekly Playlist
•Editorial Playlist Recommendations
•Radio
Music Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
ML
Content
User
Music Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
Music Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
• Manually tag attributes
• Curation
Music Personalization
•Understanding People
➡ User Experience, Cultural Variations
•Understanding Content
➡ Genres, Cultural knowledge
•Models
➡ Collaborative Filtering, Content Based
• News, Blogs, NLP
• Manually tag attributes
• Curation
• CF
30 Million Songs…
WhatTo Play?
75 Million Users … 1 Person Every 3 Secs…
Recommendation Systems
• Predict user response to options.
• Rich field: Matrix completion, ranking, text models,
latent factor models.
• Several conferences annually. RecSys, NIPS, ICML etc
• Industry researchers include NFLX, GOOG, MS and
more…
Collaborative Filtering
Hey,
I like tracks P, Q, R, S!
Well,
I like tracks Q, R, S, T!
Then you should check out
track P!
Nice! Btw try track T!
Model you based on songs you played…
Predict your future based on similar users…
Millions of users and billions of streams…
…. so there is someone like you out there
Collaborative Filtering
The Netflix Prize.
A million dollars for beating NFLX’s
best algorithms by ~ 10%.
Similarity
Our problem is to figure out how similar two
items are.
Mathematically, this means modeling a function
Similarity(x,y) for all users and items, if possible.
How do we do this?
Matrix Completion. A matrix expresses a system. We model the
data in the form of a matrix. For example, play counts for all songs
and all users could be:
Users
8
>>>>>><
>>>>>>:
0
B
B
B
B
B
B
@
Song Plays
z }| {
s1,1 s1,2 14 · · · s1,n
s2,1 s2,2 2 · · · s2,n
·
·
·
sm,1 sm,2 1 · · · sm,n
1
C
C
C
C
C
C
A
Users
8
>>>>>><
>>>>>>:
0
B
B
B
B
B
B
@
Song Plays
z }| {
s1,1 s1,2 14 · · · s1,n
s2,1 s2,2 2 · · · s2,n
·
·
·
sm,1 sm,2 1 · · · sm,n
1
C
C
C
C
C
C
A
Call Me Maybe
Esh
Esh listened to call me maybe once…
⇡
0
B
B
B
B
B
B
B
B
B
@
u1
u2
...
...
...
um
1
C
C
C
C
C
C
C
C
C
A
t1 t2 · · · · · · · · · tn⇡
0
B
B
B
B
B
B
B
B
B
@
u1
u2
...
...
...
um
1
C
C
C
C
C
C
C
C
C
A
t1 t2 · · · · · · · · · tn
Matrix Completion is well studied …
Start with random vectors around the origin. Run alternating least
squares or gradient descent or stochastic gradient descent… All this
is Hadoopable™.
Users
8
>>>>>><
>>>>>>:
0
B
B
B
B
B
B
@
Song Plays
z }| {
s1,1 s1,2 14 · · · s1,n
s2,1 s2,2 2 · · · s2,n
·
·
·
sm,1 sm,2 1 · · · sm,n
1
C
C
C
C
C
C
A
Users
8
>>>>>><
>>>>>>:
0
B
B
B
B
B
B
@
Song Plays
z }| {
s1,1 s1,2 14 · · · s1,n
s2,1 s2,2 2 · · · s2,n
·
·
·
sm,1 sm,2 1 · · · sm,n
1
C
C
C
C
C
C
A
Call Me Maybe
Esh
Esh listened to call me maybe once…
⇡
0
B
B
B
B
B
B
B
B
B
@
u1
u2
...
...
...
um
1
C
C
C
C
C
C
C
C
C
A
t1 t2 · · · · · · · · · tn⇡
0
B
B
B
B
B
B
B
B
B
@
u1
u2
...
...
...
um
1
C
C
C
C
C
C
C
C
C
A
t1 t2 · · · · · · · · · tn
30 Million Songs…
WhatTo Play?
75 Million People … 1 Person Every 3 Secs…
1.5 Billion Playlists
Language Models
• Language models work well too. For example, a
playlist could be considered as a document and
you could learn the latent vectors for tracks
(words).
• Then represent a User as a linear combination
of their Tracks.
word2vec
Words with similar contexts have similar
meaning
word2vec
word2vec
Target Word
Context Word
word2vec
Target Words and Corresponding Contexts
shining bright trees dark green
stars 61 50 10 30 1
sun 71 60 5 2 0
cucumber 2 1 15 3 40
word2vec
Playlists CPU Vectors
Read GetVectors & Update
Vectors are awesome!
•Unique fingerprint for every users, tracks,
albums, artists & even playlists in the same
space.
•Similarity is easily computable. Euclidean
Distance or Cosine Similarity.
Approximate Nearest Neighbors
•Fast approximate nearest neighbor search.
• Locality Sensitive Hashing
• https://github.com/spotify/annoy
Vectors are great for Infrastructure too…
•Machine Learning can be decomposed &
abstracted away.
•A Lambda Architecture involving Machine
Learning becomes eas(ier).
•Platforms for Personalization become
possible….
The Record Store…
The List Maker …
How do you scale this?
Tools of the trade
• Build models in Python. (NumPy, SciPy )
• Jobs in Scalding + Luigi ( https://github.com/spotify/luigi )
• Storm for real time.
• In house RPC for serving requests.
Storm 101
• Realtime Stream Processing.
• Like Hadoop but easier.
• Fault tolerant.
• Java, Clojure (yay!) and more!
Storm @ Spotify
• Major users are Ads & Personalization!
• Everyteam manages its own cluster. For personalization, we have
a 12 node cluster.
• Relatively a new tech, compared to Hadoop™.
So why Storm?
• Hadoop is slowwww. Daily UserVector jobs takes ~ 16 hours to
run. Small Data FTW!
• New Users are important; they need a friend!
• What moment are you in? Gym, Running etc?.
Getting Data Across The Globe
HDFS
Kafka
Pipeline …
User

Listens
Playlists
Realtime Listens
Spout
HDFS
Kafka
Pipeline …
User

Listens
Playlists
Realtime Listens
Spout
User Vector
Generation Job
Latent
Vector
Models
Track, Artist, Album
Vectors
HDFS
Kafka
Pipeline …
User

Listens
Playlists
Realtime Listens
Spout
User Vector
Generation Job
Latent
Vector
Models
Track, Artist, Album
Vectors
Compressed
Listening History
Bolts
Cassandra
Cassandra
HDFS
Kafka
Pipeline + Platform
User

Listens
Playlists
Realtime Listens
Spout
User Vector
Generation Job
Latent
Vector
Models
Track, Artist, Album
Vectors
Compressed
Listening History
Bolts
Cassandra
Cassandra
Backend
Systems
•Top Albums
•Top Tracks
•Top Playlists
Discover New User
•Going from two weeks of no
recommendations to recommendations as
soon as a user plays a track.
•Successful A/B test
•First team to build a production ready
personalization feature using Storm.
Lessons Learnt …
• Boring technology works well. Complicated Storm
Topology = Bad. (Dan Mckinley)
• Storm is nice. Would have preferred reusing batch
Scalding Code. Maybe Spark Streaming?
• Grow your API from one use case to another. Don’t
solve for everything at one time.
Join the band!
• Machine Learning, Data & Backend Gigs.
• Now touring in New York, Boston & Stockholm!
• https://www.spotify.com/jobs/
Thanks !
Esh Kumar
@eshvk

Weitere ähnliche Inhalte

Was ist angesagt?

Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyChing-Wei Chen
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyJosh Baer
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ SpotifyOscar Carlsson
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with SparkChris Johnson
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Mounia Lalmas-Roelleke
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyChris Johnson
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ SpotifyNikhil Tibrewal
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experienceMounia Lalmas-Roelleke
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsJimmy Mårdell
 
Engagement, Metrics & Personalisation at Scale
Engagement, Metrics &  Personalisation at ScaleEngagement, Metrics &  Personalisation at Scale
Engagement, Metrics & Personalisation at ScaleMounia Lalmas-Roelleke
 

Was ist angesagt? (20)

Machine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at Spotify
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
How Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
 
Recommending and searching @ Spotify
Recommending and searching @ SpotifyRecommending and searching @ Spotify
Recommending and searching @ Spotify
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ Spotify
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
 
Collaborative Filtering with Spark
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with Spark
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Search @ Spotify
Search @ Spotify Search @ Spotify
Search @ Spotify
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ Spotify
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objects
 
Engagement, Metrics & Personalisation at Scale
Engagement, Metrics &  Personalisation at ScaleEngagement, Metrics &  Personalisation at Scale
Engagement, Metrics & Personalisation at Scale
 

Ähnlich wie Music Personalization Platforms: Realtime ML for 75M Spotify Users

Recommendations 101
Recommendations 101 Recommendations 101
Recommendations 101 Esh Vckay
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5Aloïs Gruson
 
Anghami: From Billions Of Streams To Better Recommendations
Anghami: From Billions Of Streams To Better RecommendationsAnghami: From Billions Of Streams To Better Recommendations
Anghami: From Billions Of Streams To Better RecommendationsRamzi Karam
 
[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova musicNAVER D2
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksJonathan Mugan
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Information retrieval to recommender systems
Information retrieval to recommender systemsInformation retrieval to recommender systems
Information retrieval to recommender systemsData Science Society
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoveryKarthik Murugesan
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming serviceJulie Knibbe
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingTed Xiao
 
Natural language Analysis
Natural language AnalysisNatural language Analysis
Natural language AnalysisRudradeb Mitra
 
Mendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleMendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleKris Jack
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 

Ähnlich wie Music Personalization Platforms: Realtime ML for 75M Spotify Users (20)

Recommendations 101
Recommendations 101 Recommendations 101
Recommendations 101
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5
 
Anghami: From Billions Of Streams To Better Recommendations
Anghami: From Billions Of Streams To Better RecommendationsAnghami: From Billions Of Streams To Better Recommendations
Anghami: From Billions Of Streams To Better Recommendations
 
[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music
 
Hive at Last.fm
Hive at Last.fmHive at Last.fm
Hive at Last.fm
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
 
Generating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural NetworksGenerating Natural-Language Text with Neural Networks
Generating Natural-Language Text with Neural Networks
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Information retrieval to recommender systems
Information retrieval to recommender systemsInformation retrieval to recommender systems
Information retrieval to recommender systems
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music Discovery
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming service
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
Natural language Analysis
Natural language AnalysisNatural language Analysis
Natural language Analysis
 
Speech Recognition System
Speech Recognition SystemSpeech Recognition System
Speech Recognition System
 
Mendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scaleMendeley: crowdsourcing and recommending research on a large scale
Mendeley: crowdsourcing and recommending research on a large scale
 
Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)Deep Learning Summit (DLS01-4)
Deep Learning Summit (DLS01-4)
 
Cassandra nyc
Cassandra nycCassandra nyc
Cassandra nyc
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Let the Public and the Computer do the Metadata Work!
Let the Public and the Computer do the Metadata Work!Let the Public and the Computer do the Metadata Work!
Let the Public and the Computer do the Metadata Work!
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 

Kürzlich hochgeladen

办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 

Kürzlich hochgeladen (20)

办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

Music Personalization Platforms: Realtime ML for 75M Spotify Users

  • 1. Music Personalization: Realtime Platforms ♫ + ML + You = ❤ CrunchConf, Budapest, October 30, 2015
  • 2. Esh Kumar Machine Learning & Data Products @ Spotify NYC @eshvk
  • 3. Who am I? • UT Austin Machine Learning • Building Large Scale Recommendation Systems @ Mozilla, StumbleUpon & Spotify
  • 4. 75 M+ Active Users
  • 6. 1 TB of Logs/Day
  • 8. Products •Discover … to find new albums •Discover Weekly … A weekly Playlist •Editorial Playlist Recommendations •Radio
  • 9. Music Personalization •Understanding People ➡ User Experience, Cultural Variations •Understanding Content ➡ Genres, Cultural knowledge •Models ➡ Collaborative Filtering, Content Based ML Content User
  • 10. Music Personalization •Understanding People ➡ User Experience, Cultural Variations •Understanding Content ➡ Genres, Cultural knowledge •Models ➡ Collaborative Filtering, Content Based • News, Blogs, NLP
  • 11. Music Personalization •Understanding People ➡ User Experience, Cultural Variations •Understanding Content ➡ Genres, Cultural knowledge •Models ➡ Collaborative Filtering, Content Based • News, Blogs, NLP • Manually tag attributes • Curation
  • 12. Music Personalization •Understanding People ➡ User Experience, Cultural Variations •Understanding Content ➡ Genres, Cultural knowledge •Models ➡ Collaborative Filtering, Content Based • News, Blogs, NLP • Manually tag attributes • Curation • CF
  • 13. 30 Million Songs… WhatTo Play? 75 Million Users … 1 Person Every 3 Secs…
  • 14. Recommendation Systems • Predict user response to options. • Rich field: Matrix completion, ranking, text models, latent factor models. • Several conferences annually. RecSys, NIPS, ICML etc • Industry researchers include NFLX, GOOG, MS and more…
  • 15. Collaborative Filtering Hey, I like tracks P, Q, R, S! Well, I like tracks Q, R, S, T! Then you should check out track P! Nice! Btw try track T! Model you based on songs you played… Predict your future based on similar users… Millions of users and billions of streams… …. so there is someone like you out there
  • 16. Collaborative Filtering The Netflix Prize. A million dollars for beating NFLX’s best algorithms by ~ 10%.
  • 17. Similarity Our problem is to figure out how similar two items are. Mathematically, this means modeling a function Similarity(x,y) for all users and items, if possible.
  • 18. How do we do this? Matrix Completion. A matrix expresses a system. We model the data in the form of a matrix. For example, play counts for all songs and all users could be: Users 8 >>>>>>< >>>>>>: 0 B B B B B B @ Song Plays z }| { s1,1 s1,2 14 · · · s1,n s2,1 s2,2 2 · · · s2,n · · · sm,1 sm,2 1 · · · sm,n 1 C C C C C C A Users 8 >>>>>>< >>>>>>: 0 B B B B B B @ Song Plays z }| { s1,1 s1,2 14 · · · s1,n s2,1 s2,2 2 · · · s2,n · · · sm,1 sm,2 1 · · · sm,n 1 C C C C C C A Call Me Maybe Esh Esh listened to call me maybe once… ⇡ 0 B B B B B B B B B @ u1 u2 ... ... ... um 1 C C C C C C C C C A t1 t2 · · · · · · · · · tn⇡ 0 B B B B B B B B B @ u1 u2 ... ... ... um 1 C C C C C C C C C A t1 t2 · · · · · · · · · tn
  • 19. Matrix Completion is well studied … Start with random vectors around the origin. Run alternating least squares or gradient descent or stochastic gradient descent… All this is Hadoopable™. Users 8 >>>>>>< >>>>>>: 0 B B B B B B @ Song Plays z }| { s1,1 s1,2 14 · · · s1,n s2,1 s2,2 2 · · · s2,n · · · sm,1 sm,2 1 · · · sm,n 1 C C C C C C A Users 8 >>>>>>< >>>>>>: 0 B B B B B B @ Song Plays z }| { s1,1 s1,2 14 · · · s1,n s2,1 s2,2 2 · · · s2,n · · · sm,1 sm,2 1 · · · sm,n 1 C C C C C C A Call Me Maybe Esh Esh listened to call me maybe once… ⇡ 0 B B B B B B B B B @ u1 u2 ... ... ... um 1 C C C C C C C C C A t1 t2 · · · · · · · · · tn⇡ 0 B B B B B B B B B @ u1 u2 ... ... ... um 1 C C C C C C C C C A t1 t2 · · · · · · · · · tn
  • 20. 30 Million Songs… WhatTo Play? 75 Million People … 1 Person Every 3 Secs…
  • 22. Language Models • Language models work well too. For example, a playlist could be considered as a document and you could learn the latent vectors for tracks (words). • Then represent a User as a linear combination of their Tracks.
  • 23. word2vec Words with similar contexts have similar meaning
  • 26. word2vec Target Words and Corresponding Contexts shining bright trees dark green stars 61 50 10 30 1 sun 71 60 5 2 0 cucumber 2 1 15 3 40
  • 27. word2vec Playlists CPU Vectors Read GetVectors & Update
  • 28. Vectors are awesome! •Unique fingerprint for every users, tracks, albums, artists & even playlists in the same space. •Similarity is easily computable. Euclidean Distance or Cosine Similarity.
  • 29. Approximate Nearest Neighbors •Fast approximate nearest neighbor search. • Locality Sensitive Hashing • https://github.com/spotify/annoy
  • 30. Vectors are great for Infrastructure too… •Machine Learning can be decomposed & abstracted away. •A Lambda Architecture involving Machine Learning becomes eas(ier). •Platforms for Personalization become possible….
  • 31. The Record Store… The List Maker … How do you scale this?
  • 32. Tools of the trade • Build models in Python. (NumPy, SciPy ) • Jobs in Scalding + Luigi ( https://github.com/spotify/luigi ) • Storm for real time. • In house RPC for serving requests.
  • 33. Storm 101 • Realtime Stream Processing. • Like Hadoop but easier. • Fault tolerant. • Java, Clojure (yay!) and more!
  • 34. Storm @ Spotify • Major users are Ads & Personalization! • Everyteam manages its own cluster. For personalization, we have a 12 node cluster. • Relatively a new tech, compared to Hadoop™.
  • 35. So why Storm? • Hadoop is slowwww. Daily UserVector jobs takes ~ 16 hours to run. Small Data FTW! • New Users are important; they need a friend! • What moment are you in? Gym, Running etc?.
  • 36. Getting Data Across The Globe
  • 38. HDFS Kafka Pipeline … User
 Listens Playlists Realtime Listens Spout User Vector Generation Job Latent Vector Models Track, Artist, Album Vectors
  • 39. HDFS Kafka Pipeline … User
 Listens Playlists Realtime Listens Spout User Vector Generation Job Latent Vector Models Track, Artist, Album Vectors Compressed Listening History Bolts Cassandra Cassandra
  • 40. HDFS Kafka Pipeline + Platform User
 Listens Playlists Realtime Listens Spout User Vector Generation Job Latent Vector Models Track, Artist, Album Vectors Compressed Listening History Bolts Cassandra Cassandra Backend Systems •Top Albums •Top Tracks •Top Playlists
  • 41. Discover New User •Going from two weeks of no recommendations to recommendations as soon as a user plays a track. •Successful A/B test •First team to build a production ready personalization feature using Storm.
  • 42. Lessons Learnt … • Boring technology works well. Complicated Storm Topology = Bad. (Dan Mckinley) • Storm is nice. Would have preferred reusing batch Scalding Code. Maybe Spark Streaming? • Grow your API from one use case to another. Don’t solve for everything at one time.
  • 43. Join the band! • Machine Learning, Data & Backend Gigs. • Now touring in New York, Boston & Stockholm! • https://www.spotify.com/jobs/