From Idea to Execution: Spotify's Discover Weekly

Chris Johnson
Chris JohnsonEngineering Manager - Recommendations and Personalization um Spotify
From Idea to
Execution: Spotify’s
Discover Weekly
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf • NYC • Nov 2015
Or: 5 lessons in building
recommendation products at scale
Who are We??
Chris Johnson Edward Newett
Spotify in Numbers
• Started in 2006, now available in 58 markets
• 75+ Million active users, 20 Million paying subscribers
• 30+ Million songs, 20,000 new songs added per day
• 1.5 Billion user generated playlists
• 1 TB user data logged per day
• 1,700 node Hadoop cluster
• 10,000+ Hadoop jobs run daily
Challenge: 30M songs… how do we recommend
music to users?
Discover
Radio
Related Artists
Discover Weekly
• Started in 2006, now available in 58 markets
• 75+ Million active users, 20 Million paying subscribers
• 30+ Million songs, 20,000 new songs added per day
• 1.5 Billion user generated playlists
• 1 TB user data logged per day
• 1,700 node Hadoop cluster
• 10,000+ Hadoop jobs run daily
The Road to
Discover Weekly
2013 :: Discover Page v1.0
• Personalized News Feed of
recommendations
• Artists, Album Reviews, News
Articles, New Releases, Upcoming
Concerts, Social
Recommendations, Playlists…
• Required a lot of attention and
digging to engage with
recommendations
• No organization of content
2014 :: Discover Page v2.0
• Recommendations grouped into
strips (a la Netflix)
• Limited to Albums and New
Releases
• More organized than News-Feed
but still requires active
interaction
Insight: users spending more time on
editorial Browse playlists than Discover.
Idea: combine the
personalized experience
of Discover with the lean-
back ease of Browse
Meanwhile… 2014 Year In Music
Play it forward: Same content as the
Discover Page but.. a playlist
Lesson 1:
Be data driven from
start to finish
Slide from Dan McKinley - Etsy
2008 2012 2015
• Reach: How many users are you reaching
• Depth: For the users you reach, what is the
depth of reach.
• Retention: For the users you reach, how many
do you retain?
Define success metrics BEFORE you
release your test
• Reach: DW WAU / Spotify WAU
• Depth: DW Time Spent / Spotify WAU
• Retention: DW week-over-week retention
Discover Weekly Key Success Metrics
2008 2012 2015
Slide from Dan McKinley - Etsy
Step 1: Prototype (employee test)
Step 1: Prototype (employee test)
Results of Employee Test were very positive!
2008 2012 2015
Slide from Dan McKinley - Etsy
Step 2: Release AB Test to 1% of Users
Google Form 1% Results
Personalized image resulted in 10% lift in WAU
• Initial 0.5% user test
• 1% Spaceman image
• 1% Personalized
image
Lesson 2:
Reuse existing
infrastructure in creative
ways
Discover Weekly Data Flow
Recommendation
Models
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the
weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight
X YUsers
Songs
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
[1] Hu Y. & Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets 8th IEEE International Conference on Data Mining
Implicit Matrix Factorization
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, track) streams into a large matrix
•Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary
preference matrix, weighting positive observations by a function of plays, context, and recency
X YUsers
Songs
• = bias for user
• = bias for item
• = regularization parameter
• = user latent factor vector
• = item latent factor vector
[2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data NIPS Workshop on Distributed Matrix Computations
Can also use Logistic Loss!
NLP Models on News and Blogs
Playlist itself is a
document
Songs in
playlist are
words
NLP Models work great on Playlists!
[3] http://benanne.github.io/2014/08/05/spotify-cnns.html
Deep Learning on Audio
•normalized item-vectors
Songs in a Latent Space representation
•user-vector in same space
Songs in a Latent Space representation
Lesson 3:
Don’t scale until
you need to
Scaling to 100%: Rollout Challenges
‣Create and publish 75M playlists every week
‣Downloading and processing Facebook images
‣Language translations
Scaling to 100%: Weekly refresh
‣Time sensitive updates
‣Refresh 75M playlists every Sunday night
‣Take timezones into account
Discover Weekly publishing flow
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
What’s next?
Iterating on content
quality and interface
enhancements
Iterating on quality and adding a feedback loop.
DW feedback comes at the expense of presentation bias.
Lesson 4:
Users know best. In the
end, AB Test everything!
Lesson 5 (final lesson!):
Empower bottom-up
innovation in your org and
amazing things will happen.
Thank You!
(btw, we’re hiring Machine Learning and
Data Engineers, come chat with us!)
1 von 50

Recomendados

Collaborative Filtering at Spotify von
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
92.8K views63 Folien
Music Personalization At Spotify von
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At SpotifyVidhya Murali
7.1K views35 Folien
Algorithmic Music Recommendations at Spotify von
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
137.9K views42 Folien
Personalized Playlists at Spotify von
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at SpotifyRohan Agrawal
904 views20 Folien
Scala Data Pipelines for Music Recommendations von
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
163.6K views50 Folien
Building Data Pipelines for Music Recommendations at Spotify von
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyVidhya Murali
5.2K views58 Folien

Más contenido relacionado

Was ist angesagt?

Recommending and searching @ Spotify von
Recommending and searching @ SpotifyRecommending and searching @ Spotify
Recommending and searching @ SpotifyMounia Lalmas-Roelleke
9.4K views38 Folien
Music recommendations @ MLConf 2014 von
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Erik Bernhardsson
28.6K views44 Folien
Machine learning @ Spotify - Madison Big Data Meetup von
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
17.1K views39 Folien
Machine Learning and Big Data for Music Discovery at Spotify von
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at SpotifyChing-Wei Chen
20.6K views46 Folien
ML+Hadoop at NYC Predictive Analytics von
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
19.5K views41 Folien
Spotify Discover Weekly: The machine learning behind your music recommendations von
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
2K views29 Folien

Was ist angesagt?(20)

Machine learning @ Spotify - Madison Big Data Meetup von Andy Sloane
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
Andy Sloane17.1K views
Machine Learning and Big Data for Music Discovery at Spotify von Ching-Wei Chen
Machine Learning and Big Data for Music Discovery at SpotifyMachine Learning and Big Data for Music Discovery at Spotify
Machine Learning and Big Data for Music Discovery at Spotify
Ching-Wei Chen20.6K views
ML+Hadoop at NYC Predictive Analytics von Erik Bernhardsson
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
Erik Bernhardsson19.5K views
Spotify Discover Weekly: The machine learning behind your music recommendations von Sophia Ciocca
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
Sophia Ciocca2K views
Music Recommendations at Scale with Spark von Chris Johnson
Music Recommendations at Scale with SparkMusic Recommendations at Scale with Spark
Music Recommendations at Scale with Spark
Chris Johnson58.6K views
Music Personalization : Real time Platforms. von Esh Vckay
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.
Esh Vckay1.7K views
CF Models for Music Recommendations At Spotify von Vidhya Murali
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
Vidhya Murali1.6K views
Scala Data Pipelines @ Spotify von Neville Li
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
Neville Li51.4K views
Collaborative Filtering with Spark von Chris Johnson
Collaborative Filtering with SparkCollaborative Filtering with Spark
Collaborative Filtering with Spark
Chris Johnson46.8K views
How data drives spotify von Ali Sarrafi
How data drives spotifyHow data drives spotify
How data drives spotify
Ali Sarrafi2.1K views
Homepage Personalization at Spotify von Oguz Semerci
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
Oguz Semerci3.5K views
How Apache Drives Music Recommendations At Spotify von Josh Baer
How Apache Drives Music Recommendations At SpotifyHow Apache Drives Music Recommendations At Spotify
How Apache Drives Music Recommendations At Spotify
Josh Baer5.8K views
Big Data At Spotify von Adam Kawa
Big Data At SpotifyBig Data At Spotify
Big Data At Spotify
Adam Kawa18.6K views
The Evolution of Big Data at Spotify von Josh Baer
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
Josh Baer7.8K views

Similar a From Idea to Execution: Spotify's Discover Weekly

Latest SoundCloud Market Research on Vertical Expansion von
Latest SoundCloud Market Research on Vertical ExpansionLatest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical ExpansionInes Chen
5K views36 Folien
Analysis of the Changes in Listening Trends of a Music Streaming Service von
Analysis of the Changes in Listening Trends of a Music Streaming ServiceAnalysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming ServiceMasanori Takano
840 views16 Folien
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks von
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social NetworksBang Hui Lim
791 views30 Folien
NISO Webinar: Keyword Search = "Improve Discovery Systems" von
NISO Webinar: Keyword Search = "Improve Discovery Systems"NISO Webinar: Keyword Search = "Improve Discovery Systems"
NISO Webinar: Keyword Search = "Improve Discovery Systems"National Information Standards Organization (NISO)
1.5K views93 Folien
Groeling, Tim: NewsScape: Preserving TV News von
Groeling, Tim: NewsScape: Preserving TV NewsGroeling, Tim: NewsScape: Preserving TV News
Groeling, Tim: NewsScape: Preserving TV NewsReynolds Journalism Institute (RJI)
597 views27 Folien
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text von
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextElizabeth Murnane
2K views150 Folien

Similar a From Idea to Execution: Spotify's Discover Weekly(20)

Latest SoundCloud Market Research on Vertical Expansion von Ines Chen
Latest SoundCloud Market Research on Vertical ExpansionLatest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical Expansion
Ines Chen5K views
Analysis of the Changes in Listening Trends of a Music Streaming Service von Masanori Takano
Analysis of the Changes in Listening Trends of a Music Streaming ServiceAnalysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming Service
Masanori Takano840 views
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks von Bang Hui Lim
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
Bang Hui Lim791 views
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text von Elizabeth Murnane
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
Deezer - Big data as a streaming service von Julie Knibbe
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming service
Julie Knibbe7.3K views
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti... von Marius Miron
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Marius Miron276 views
Electronic Resource Assessment: Adventures in Engagement von Colleen Major
Electronic Resource Assessment: Adventures in EngagementElectronic Resource Assessment: Adventures in Engagement
Electronic Resource Assessment: Adventures in Engagement
Colleen Major914 views
[221]똑똑한 인공지능 dj 비서 clova music von NAVER D2
[221]똑똑한 인공지능 dj 비서 clova music[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music
NAVER D28K views
Music Recommendation 2018 von Fabien Gouyon
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018
Fabien Gouyon10.2K views
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr... von midem
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
midem4.5K views
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present... von midem
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
midem1.1K views
There is a method to it: Making meaning in information research through a mix... von Lynn Connaway
There is a method to it: Making meaning in information research through a mix...There is a method to it: Making meaning in information research through a mix...
There is a method to it: Making meaning in information research through a mix...
Lynn Connaway111 views
Data Visualization: A Quick Tour for Data Science Enthusiasts von Krist Wongsuphasawat
Data Visualization: A Quick Tour for Data Science EnthusiastsData Visualization: A Quick Tour for Data Science Enthusiasts
Data Visualization: A Quick Tour for Data Science Enthusiasts
Krist Wongsuphasawat50.6K views

Último

Techstack Ltd at Slush 2023, Ukrainian delegation von
Techstack Ltd at Slush 2023, Ukrainian delegationTechstack Ltd at Slush 2023, Ukrainian delegation
Techstack Ltd at Slush 2023, Ukrainian delegationViktoriiaOpanasenko
7 views4 Folien
Transport Management System - Shipment & Container Tracking von
Transport Management System - Shipment & Container TrackingTransport Management System - Shipment & Container Tracking
Transport Management System - Shipment & Container TrackingFreightoscope
6 views3 Folien
Using Qt under LGPL-3.0 von
Using Qt under LGPL-3.0Using Qt under LGPL-3.0
Using Qt under LGPL-3.0Burkhard Stubert
15 views11 Folien
JioEngage_Presentation.pptx von
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptxadmin125455
9 views4 Folien
Introduction to Maven von
Introduction to MavenIntroduction to Maven
Introduction to MavenJohn Valentino
7 views10 Folien
Understanding HTML terminology von
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminologyartembondar5
9 views8 Folien

Último(20)

Transport Management System - Shipment & Container Tracking von Freightoscope
Transport Management System - Shipment & Container TrackingTransport Management System - Shipment & Container Tracking
Transport Management System - Shipment & Container Tracking
Freightoscope 6 views
JioEngage_Presentation.pptx von admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254559 views
Understanding HTML terminology von artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar59 views
Supercharging your Python Development Environment with VS Code and Dev Contai... von Dawn Wages
Supercharging your Python Development Environment with VS Code and Dev Contai...Supercharging your Python Development Environment with VS Code and Dev Contai...
Supercharging your Python Development Environment with VS Code and Dev Contai...
Dawn Wages9 views
Automated Testing of Microsoft Power BI Reports von RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS13 views
predicting-m3-devopsconMunich-2023.pptx von Tier1 app
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
Tier1 app10 views
University of Borås-full talk-2023-12-09.pptx von Mahdi_Fahmideh
University of Borås-full talk-2023-12-09.pptxUniversity of Borås-full talk-2023-12-09.pptx
University of Borås-full talk-2023-12-09.pptx
Mahdi_Fahmideh13 views
Ports-and-Adapters Architecture for Embedded HMI von Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert37 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... von NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi219 views

From Idea to Execution: Spotify's Discover Weekly

  • 1. From Idea to Execution: Spotify’s Discover Weekly Chris Johnson :: @MrChrisJohnson Edward Newett :: @scaladaze DataEngConf • NYC • Nov 2015 Or: 5 lessons in building recommendation products at scale
  • 2. Who are We?? Chris Johnson Edward Newett
  • 3. Spotify in Numbers • Started in 2006, now available in 58 markets • 75+ Million active users, 20 Million paying subscribers • 30+ Million songs, 20,000 new songs added per day • 1.5 Billion user generated playlists • 1 TB user data logged per day • 1,700 node Hadoop cluster • 10,000+ Hadoop jobs run daily
  • 4. Challenge: 30M songs… how do we recommend music to users?
  • 8. Discover Weekly • Started in 2006, now available in 58 markets • 75+ Million active users, 20 Million paying subscribers • 30+ Million songs, 20,000 new songs added per day • 1.5 Billion user generated playlists • 1 TB user data logged per day • 1,700 node Hadoop cluster • 10,000+ Hadoop jobs run daily
  • 10. 2013 :: Discover Page v1.0 • Personalized News Feed of recommendations • Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists… • Required a lot of attention and digging to engage with recommendations • No organization of content
  • 11. 2014 :: Discover Page v2.0 • Recommendations grouped into strips (a la Netflix) • Limited to Albums and New Releases • More organized than News-Feed but still requires active interaction
  • 12. Insight: users spending more time on editorial Browse playlists than Discover.
  • 13. Idea: combine the personalized experience of Discover with the lean- back ease of Browse
  • 15. Play it forward: Same content as the Discover Page but.. a playlist
  • 16. Lesson 1: Be data driven from start to finish
  • 17. Slide from Dan McKinley - Etsy 2008 2012 2015
  • 18. • Reach: How many users are you reaching • Depth: For the users you reach, what is the depth of reach. • Retention: For the users you reach, how many do you retain? Define success metrics BEFORE you release your test
  • 19. • Reach: DW WAU / Spotify WAU • Depth: DW Time Spent / Spotify WAU • Retention: DW week-over-week retention Discover Weekly Key Success Metrics
  • 20. 2008 2012 2015 Slide from Dan McKinley - Etsy
  • 21. Step 1: Prototype (employee test)
  • 22. Step 1: Prototype (employee test)
  • 23. Results of Employee Test were very positive!
  • 24. 2008 2012 2015 Slide from Dan McKinley - Etsy
  • 25. Step 2: Release AB Test to 1% of Users
  • 26. Google Form 1% Results
  • 27. Personalized image resulted in 10% lift in WAU • Initial 0.5% user test • 1% Spaceman image • 1% Personalized image
  • 31. 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 •Aggregate all (user, track) streams into a large matrix •Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight X YUsers Songs • = bias for user • = bias for item • = regularization parameter • = 1 if user streamed track else 0 • • = user latent factor vector • = item latent factor vector [1] Hu Y. & Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets 8th IEEE International Conference on Data Mining Implicit Matrix Factorization
  • 32. 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 •Aggregate all (user, track) streams into a large matrix •Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary preference matrix, weighting positive observations by a function of plays, context, and recency X YUsers Songs • = bias for user • = bias for item • = regularization parameter • = user latent factor vector • = item latent factor vector [2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data NIPS Workshop on Distributed Matrix Computations Can also use Logistic Loss!
  • 33. NLP Models on News and Blogs
  • 34. Playlist itself is a document Songs in playlist are words NLP Models work great on Playlists!
  • 36. •normalized item-vectors Songs in a Latent Space representation
  • 37. •user-vector in same space Songs in a Latent Space representation
  • 38. Lesson 3: Don’t scale until you need to
  • 39. Scaling to 100%: Rollout Challenges ‣Create and publish 75M playlists every week ‣Downloading and processing Facebook images ‣Language translations
  • 40. Scaling to 100%: Weekly refresh ‣Time sensitive updates ‣Refresh 75M playlists every Sunday night ‣Take timezones into account
  • 45. What’s next? Iterating on content quality and interface enhancements
  • 46. Iterating on quality and adding a feedback loop.
  • 47. DW feedback comes at the expense of presentation bias.
  • 48. Lesson 4: Users know best. In the end, AB Test everything!
  • 49. Lesson 5 (final lesson!): Empower bottom-up innovation in your org and amazing things will happen.
  • 50. Thank You! (btw, we’re hiring Machine Learning and Data Engineers, come chat with us!)