Oscar Carlsson
Data Engineer
lad@spotify.com
Big Data
and
Machine Learning
@ Spotify
Friday 6/3 2015
● D-student starting 2009
● Graduated last year from CSALL
(Student in this class 2013)
● Master thesis at Spotify
● Data Engineer at Spotify in Gothenburg
Me
● What is data at Spotify?
● Big data and processing it
● Using data at Spotify
● Machine Learning
Outline
Supervised learning:
data (X), labels (Y)
Unsupervised learning:
data (X)
In the Machine Learning class:
What is data at Spotify?
● Songs
○ Track metadata
○ Cover art
○ Album
○ Genres, mood etc.
● User generated
○ Listens
○ Clicks
○ Page views
● Users
○ Country, email etc.
● Playlists
○ Tracks of playlist
○ Adds/removes
30 Million songs
60 Million Monthly Active Users
58 Markets
15 Million subscribers
1.5 Billion Playlists
Big Data and processing it
● 20 TB compressed data / DAY
○ 200 TB generated and stored / day (replication)
● Our business is highly dependent on these logs
○ We pay artists depending on plays, plays = logs
Too much to store on a single computer. We need a
cluster to process it!
.. this is typically what is called “Big Data”
Big Data and processing it
● Distributed computing and storage
○ Hadoop
■ MapReduce
○ Cassandra
● Hadoop cluster
○ 1100 nodes
○ ~8000 jobs/day
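As a rough illustration of the MapReduce model named above, the sketch below counts plays per track in plain Python. The log format (`user_id<TAB>track_id`) and all function names are assumptions made for this example; they are not Spotify's log schema or Hadoop's API.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(log_lines):
    """Emit one (track_id, 1) pair per play log line."""
    for line in log_lines:
        _user_id, track_id = line.rstrip("\n").split("\t")
        yield track_id, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Sum the counts for each track."""
    return {track_id: sum(values) for track_id, values in grouped}


def count_plays(log_lines):
    return reduce_phase(shuffle_phase(map_phase(log_lines)))


# Made-up log lines: two plays of track t1, one play of track t2.
logs = ["u1\tt1", "u2\tt1", "u1\tt2"]
play_counts = count_plays(logs)
```

On a real cluster the map and reduce steps would run in parallel on many nodes, and the framework would handle the grouping step.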
Using data at Spotify
Every part of the company is interested in our data
● Product
○ Are people using X? Should we focus on features such as Y?
● Insights
○ What music is trending? Which artists are popular where?
● Performance
○ How is latency in country Y? Did this reduce stutter in country X?
Using data at Spotify
● Data-driven decision making
○ Like.. every decision.
○ Analysts / Data scientists
● A/B test everything!
● A/B testing:
○ Statistical hypothesis testing
○ Simple randomized experiment with >= 2
variants (A, B)
Using data at Spotify: A/B testing
Objective: Decrease time from loading playlist to first play
Hypothesis: The bigger the button, the faster users find it
Test set up:
● A - variant 1
○ 2% US and SE MAU users
● B - variant 2
○ 2% US and SE MAU users
● Control - normal
○ Rest of users in US SE
“The shuffle button”
Using data at Spotify: A/B testing
CONTROL A B
Analytics: A/B testing
Metric:
Share of users whose time to first play is > 500 ms
(500ms is made up)
Let's roll out A to all users and throw away B!
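One possible way to check whether a difference in such a share is more than noise is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The user counts are made up for illustration, and the slides do not say which test was actually used.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for the difference between two shares."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts of users slower than the threshold, out of all users
# in each group (control vs. variant A).
z, p_value = two_proportion_z_test(5000, 100_000, 4500, 100_000)
significant = p_value < 0.05
```

A positive `z` here means the control group has the larger share of slow first plays, so the variant looks better on this metric.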
● Machine Learning
○ User analysis
○ Artist disambiguation
○ Recommender systems
Outline
“ A music session
somehow represents
a moment for the
user. Can we find
these moments and
describe them? ”
● Take a subset of user listening data with new genre
data
○ Combine listens in sessions
■ Consecutive plays with no pause of 15 min or more
○ Session = [genres]
● Clustering algorithms to find similar sessions
○ K-means / Hierarchical clustering
● Describe the clusters using logistic regression
Machine Learning: Cluster user music sessions
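The session-building step above could look roughly like the sketch below. It assumes each play is a `(timestamp, genre)` pair and measures the pause between play start times, which is a simplification. The function names and the genre-share representation are assumptions for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=15)  # pause length that ends a session


def split_into_sessions(plays, max_gap=MAX_GAP):
    """Group one user's (timestamp, genre) plays into lists of genres.

    A new session starts whenever the gap since the previous play
    is at least `max_gap`.
    """
    sessions = []
    previous_time = None
    for timestamp, genre in sorted(plays):
        if previous_time is None or timestamp - previous_time >= max_gap:
            sessions.append([])
        sessions[-1].append(genre)
        previous_time = timestamp
    return sessions


def genre_vector(session, genres):
    """Represent a session as the share of its plays in each genre."""
    counts = Counter(session)
    return [counts[g] / len(session) for g in genres]


# Made-up listening data for one user.
plays = [
    (datetime(2015, 3, 6, 9, 0), "rock"),
    (datetime(2015, 3, 6, 9, 4), "rock"),
    (datetime(2015, 3, 6, 9, 9), "pop"),
    (datetime(2015, 3, 6, 10, 0), "jazz"),
]
sessions = split_into_sessions(plays)
vectors = [genre_vector(s, ["jazz", "pop", "rock"]) for s in sessions]
```

Fixed-length vectors like these are one convenient input for a clustering algorithm.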
Machine Learning: Cluster user music sessions
K-Means Per cluster classification
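For reference, a minimal K-means (Lloyd's algorithm) sketch in plain Python follows. Starting the centroids at the first `k` points is an assumption made to keep the example deterministic; the data is made up.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignment, centroids


# Made-up session vectors: share of (rock, jazz) plays per session.
session_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
assignment, centroids = k_means(session_vectors, k=2)
```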
Machine Learning: Cluster user music sessions
Per cluster logistic regression
w: weight vector
Each w_i can be interpreted as the effect of the x_i variable
x_i = genres
Machine Learning: Cluster user music sessions
Clusters described by logistic regression
name of x_i
at largest
w_i
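The idea of naming a cluster after the genre with the largest weight can be sketched as follows. This is a one-vs-rest logistic regression trained with plain batch gradient descent; the learning rate, epoch count, and data are assumptions for illustration, not the settings used in the thesis work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        w = [wi - learning_rate * g / len(X) for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / len(X)
    return w, b


def describe_cluster(X, cluster_ids, cluster, feature_names):
    """Label a cluster with the name of x_i at the largest w_i."""
    y = [1 if c == cluster else 0 for c in cluster_ids]
    w, _ = fit_logistic_regression(X, y)
    return feature_names[max(range(len(w)), key=lambda i: w[i])]


# Made-up genre-share vectors and cluster assignments.
genres = ["jazz", "pop", "rock"]
X = [[0.0, 0.1, 0.9], [0.0, 0.2, 0.8], [0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
cluster_ids = [0, 0, 1, 1]
label_0 = describe_cluster(X, cluster_ids, 0, genres)
label_1 = describe_cluster(X, cluster_ids, 1, genres)
```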
Machine Learning
Artist disambiguation
Cleaning up the artist pages
Machine Learning: Artist disambiguation
Let's listen to those tracks!
Is it really the same Fredrik?
Machine Learning: Artist disambiguation
● Rank artists by probability of being ambiguous
● Apply clustering to each “ambiguous” artist's
albums/tracks
○ Using features such as country, release year,
label/licensor etc.
○ Distinct cluster could be different artists
● Nicely present this for manual curation
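A toy sketch of the clustering step is shown below. The linking rule (same label or same country, and release years close together) and the `max_year_gap` threshold are hypothetical choices for this example. The slides only name the kinds of features used, not the actual rule or algorithm.

```python
def same_artist_likely(a, b, max_year_gap=10):
    """Hypothetical rule: two releases are linked if they share a label or a
    country and were released within `max_year_gap` years of each other."""
    shares_context = a["label"] == b["label"] or a["country"] == b["country"]
    return shares_context and abs(a["year"] - b["year"]) <= max_year_gap


def cluster_releases(releases):
    """Single-linkage grouping: a release joins (and merges) every existing
    cluster that contains at least one release it is linked to."""
    clusters = []
    for release in releases:
        linked, rest = [], []
        for cluster in clusters:
            if any(same_artist_likely(release, other) for other in cluster):
                linked.append(cluster)
            else:
                rest.append(cluster)
        merged = [release] + [r for cluster in linked for r in cluster]
        clusters = rest + [merged]
    return clusters


# Made-up releases filed under one artist name.
releases = [
    {"title": "Album A", "country": "SE", "year": 2008, "label": "Label X"},
    {"title": "Album B", "country": "SE", "year": 2011, "label": "Label X"},
    {"title": "Album C", "country": "US", "year": 1975, "label": "Label Y"},
]
clusters = cluster_releases(releases)
```

More than one cluster for a single artist name would flag that page as a candidate for manual curation.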
Machine Learning: Recommender system
The discover page
Machine Learning: Recommender system
Collaborative filtering
● Build a matrix of user plays
● Compute similarity between items
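The two steps above can be sketched on a tiny, made-up play-count matrix. Cosine similarity is used here as one common choice of item similarity; the slide does not state which measure was used at this step.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_column(matrix, j):
    """Play counts of track j across all users."""
    return [row[j] for row in matrix]


# Rows are users, columns are tracks, values are play counts (made up).
plays = [
    [5, 3, 0],
    [4, 2, 0],
    [0, 0, 7],
]

sim_0_1 = cosine_similarity(item_column(plays, 0), item_column(plays, 1))
sim_0_2 = cosine_similarity(item_column(plays, 0), item_column(plays, 2))
```

Tracks 0 and 1 are played by the same users and come out as similar, while tracks 0 and 2 share no listeners and get a similarity of zero.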
Machine Learning: Recommender system
4 Million tracks x 60 Million users
→ Pairwise similarity infeasible
Approximate the matrix with NMF
Machine Learning: Recommender system
Matrix factorization (latent factor models)
Machine Learning: Recommender system
Small vectors
Cosine similarity and dot product efficient
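In symbols, the factorization idea can be written as below. The notation is an assumption for this note: R is the play matrix with m users and n tracks, U and V hold one k-dimensional vector per user and per track, and the squared Frobenius norm is one common objective for NMF.

```latex
R \approx U V^{\top}, \qquad
R \in \mathbb{R}_{\ge 0}^{m \times n},\;
U \in \mathbb{R}_{\ge 0}^{m \times k},\;
V \in \mathbb{R}_{\ge 0}^{n \times k},\;
k \ll \min(m, n)

\min_{U \ge 0,\; V \ge 0} \; \lVert R - U V^{\top} \rVert_F^2

\hat{r}_{ui} = u_u^{\top} v_i, \qquad
\operatorname{sim}(i, j) = \frac{v_i^{\top} v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}
```

Each score or similarity then costs only k multiplications, which is why the small vectors make dot products and cosine similarity cheap.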
Machine Learning: Recommender system
Finding recommendations:
Approximate nearest neighbour (ANN)
code: https://github.com/spotify/annoy
Related artists & Radio:
Similar to user recommendations, more models and not
all CF-based
Multiple models:
Score candidates from all models, combine and rank!
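For comparison, here is an exact brute-force nearest-neighbour search over latent vectors. It scans every track, which is the cost that an approximate index such as Annoy is meant to avoid. The vectors and names are made up, and this sketch does not use the Annoy library.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_tracks(query_vector, track_vectors, n=2):
    """Return the ids of the n tracks most similar to the query vector."""
    return heapq.nlargest(
        n,
        track_vectors,
        key=lambda track_id: cosine_similarity(query_vector, track_vectors[track_id]),
    )


# Made-up 2-dimensional latent vectors.
track_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
user_vector = [1.0, 0.0]
recommendations = nearest_tracks(user_vector, track_vectors, n=2)
```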
Machine Learning: Recommender system
I just went through this quickly, read more details of
Spotify Rec sys here:
Doing this on MapReduce
Comparing with Netflix
Music Rec @ MLConf 2014
● More content-based ML
○ Fingerprinting: The Echo Nest
○ Content-based music recommendation using
convolutional neural networks
● Personalize everything
○ Emails
○ Ads
○ User profiling
● ML on other parts of product than Rec Sys
.. final last words on the Future of ML at Spotify
Summary
● Multiple data sources -> multiple angles
● Data drives decisions with A/B testing
● User analysis
○ Cluster and describe with classifier
● Artist disambiguation
○ Cluster and give to manual curators
● Recommender systems
○ Collaborative filtering
● We supervise thesis workers
○ Artist disambiguation/deduplication
○ Cluster user music sessions
○ Context-based recommender systems
○ Personalized ads / Personalized emails
● We have internships!
www.spotify.com/jobs
.. and potentially you could help us?
Oscar Carlsson
lad@spotify.com
Linkedin
Thank you for
listening!
