SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
From Idea to
Execution: Spotify’s
Discover Weekly
Chris Johnson :: @MrChrisJohnson
Edward Newett :: @scaladaze
DataEngConf • NYC • Nov 2015
Or: 5 lessons in building
recommendation products at scale
Who are We??
Chris Johnson Edward Newett
Spotify in Numbers
• Started in 2006, now available in 58 markets
• 75+ Million active users, 20 Million paying subscribers
• 30+ Million songs, 20,000 new songs added per day
• 1.5 Billion user generated playlists
• 1 TB user data logged per day
• 1,700 node Hadoop cluster
• 10,000+ Hadoop jobs run daily
Challenge: 30M songs… how do we recommend
music to users?
Discover
Radio
Related Artists
Discover Weekly
• Started in 2006, now available in 58 markets
• 75+ Million active users, 20 Million paying subscribers
• 30+ Million songs, 20,000 new songs added per day
• 1.5 Billion user generated playlists
• 1 TB user data logged per day
• 1,700 node Hadoop cluster
• 10,000+ Hadoop jobs run daily
The Road to
Discover Weekly
2013 :: Discover Page v1.0
• Personalized News Feed of
recommendations
• Artists, Album Reviews, News
Articles, New Releases, Upcoming
Concerts, Social
Recommendations, Playlists…
• Required a lot of attention and
digging to engage with
recommendations
• No organization of content
2014 :: Discover Page v2.0
• Recommendations grouped into
strips (a la Netflix)
• Limited to Albums and New
Releases
• More organized than News-Feed
but still requires active
interaction
Insight: users spending more time on
editorial Browse playlists than Discover.
Idea: combine the
personalized experience
of Discover with the lean-
back ease of Browse
Meanwhile… 2014 Year In Music
Play it forward: Same content as the
Discover Page but.. a playlist
Lesson 1:
Be data driven from
start to finish
Slide from Dan McKinley - Etsy
2008 2012 2015
• Reach: How many users are you reaching
• Depth: For the users you reach, what is the
depth of reach.
• Retention: For the users you reach, how many
do you retain?
Define success metrics BEFORE you
release your test
• Reach: DW WAU / Spotify WAU
• Depth: DW Time Spent / Spotify WAU
• Retention: DW week-over-week retention
Discover Weekly Key Success Metrics
2008 2012 2015
Slide from Dan McKinley - Etsy
Step 1: Prototype (employee test)
Step 1: Prototype (employee test)
Results of Employee Test were very positive!
2008 2012 2015
Slide from Dan McKinley - Etsy
Step 2: Release AB Test to 1% of Users
Google Form 1% Results
Personalized image resulted in 10% lift in WAU
• Initial 0.5% user test
• 1% Spaceman image
• 1% Personalized
image
Lesson 2:
Reuse existing
infrastructure in creative
ways
Discover Weekly Data Flow
Recommendation
Models
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, track) streams into a large matrix
•Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the
weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight
X YUsers
Songs
• = bias for user
• = bias for item
• = regularization parameter
• = 1 if user streamed track else 0
•
• = user latent factor vector
• = item latent factor vector
[1] Hu Y. & Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets 8th IEEE International Conference on Data Mining
Implicit Matrix Factorization
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
•Aggregate all (user, track) streams into a large matrix
•Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary
preference matrix, weighting positive observations by a function of plays, context, and recency
X YUsers
Songs
• = bias for user
• = bias for item
• = regularization parameter
• = user latent factor vector
• = item latent factor vector
[2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data NIPS Workshop on Distributed Matrix Computations
Can also use Logistic Loss!
NLP Models on News and Blogs
Playlist itself is a
document
Songs in
playlist are
words
NLP Models work great on Playlists!
[3] http://benanne.github.io/2014/08/05/spotify-cnns.html
Deep Learning on Audio
•normalized item-vectors
Songs in a Latent Space representation
•user-vector in same space
Songs in a Latent Space representation
Lesson 3:
Don’t scale until
you need to
Scaling to 100%: Rollout Challenges
‣Create and publish 75M playlists every week
‣Downloading and processing Facebook images
‣Language translations
Scaling to 100%: Weekly refresh
‣Time sensitive updates
‣Refresh 75M playlists every Sunday night
‣Take timezones into account
Discover Weekly publishing flow
What’s next?
Iterating on content
quality and interface
enhancements
Iterating on quality and adding a feedback loop.
DW feedback comes at the expense of presentation bias.
Lesson 4:
Users know best. In the
end, AB Test everything!
Lesson 5 (final lesson!):
Empower bottom-up
innovation in your org and
amazing things will happen.
Thank You!
(btw, we’re hiring Machine Learning and
Data Engineers, come chat with us!)

Weitere ähnliche Inhalte

Was ist angesagt?

Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At SpotifyVidhya Murali
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyJosh Baer
 
The Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and PainThe Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and PainRafał Wojdyła
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Erik Bernhardsson
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...Hakka Labs
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ SpotifyOscar Carlsson
 
Building Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyVidhya Murali
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Mounia Lalmas-Roelleke
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at SpotifyOguz Semerci
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsJimmy Mårdell
 

Was ist angesagt? (20)

Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
The Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and PainThe Evolution of Hadoop at Spotify - Through Failures and Pain
The Evolution of Hadoop at Spotify - Through Failures and Pain
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ Spotify
 
Recommending and searching @ Spotify
Recommending and searching @ SpotifyRecommending and searching @ Spotify
Recommending and searching @ Spotify
 
Building Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at Spotify
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objects
 

Ähnlich wie From Idea to Execution: Spotify's Discover Weekly

Latest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical ExpansionLatest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical ExpansionInes Chen
 
Analysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming ServiceAnalysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming ServiceMasanori Takano
 
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social NetworksBang Hui Lim
 
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextElizabeth Murnane
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming serviceJulie Knibbe
 
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...Marius Miron
 
Electronic Resource Assessment: Adventures in Engagement
Electronic Resource Assessment: Adventures in EngagementElectronic Resource Assessment: Adventures in Engagement
Electronic Resource Assessment: Adventures in EngagementColleen Major
 
[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova musicNAVER D2
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018Fabien Gouyon
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Karthik Murugesan
 
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...midem
 
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...midem
 
There is a method to it: Making meaning in information research through a mix...
There is a method to it: Making meaning in information research through a mix...There is a method to it: Making meaning in information research through a mix...
There is a method to it: Making meaning in information research through a mix...Lynn Connaway
 
Data Visualization: A Quick Tour for Data Science Enthusiasts
Data Visualization: A Quick Tour for Data Science EnthusiastsData Visualization: A Quick Tour for Data Science Enthusiasts
Data Visualization: A Quick Tour for Data Science EnthusiastsKrist Wongsuphasawat
 

Ähnlich wie From Idea to Execution: Spotify's Discover Weekly (20)

Latest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical ExpansionLatest SoundCloud Market Research on Vertical Expansion
Latest SoundCloud Market Research on Vertical Expansion
 
Analysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming ServiceAnalysis of the Changes in Listening Trends of a Music Streaming Service
Analysis of the Changes in Listening Trends of a Music Streaming Service
 
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
 
NISO Webinar: Keyword Search = "Improve Discovery Systems"
NISO Webinar: Keyword Search = "Improve Discovery Systems"NISO Webinar: Keyword Search = "Improve Discovery Systems"
NISO Webinar: Keyword Search = "Improve Discovery Systems"
 
Groeling, Tim: NewsScape: Preserving TV News
Groeling, Tim: NewsScape: Preserving TV NewsGroeling, Tim: NewsScape: Preserving TV News
Groeling, Tim: NewsScape: Preserving TV News
 
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short TextRESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text
 
Tv, print, radio measuring the impact of traditional media
Tv, print, radio   measuring the impact of traditional mediaTv, print, radio   measuring the impact of traditional media
Tv, print, radio measuring the impact of traditional media
 
Deezer - Big data as a streaming service
Deezer - Big data as a streaming serviceDeezer - Big data as a streaming service
Deezer - Big data as a streaming service
 
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
Luis Aguiar: Platforms, Promotion, and Product Discovery: Evidence from Spoti...
 
Electronic Resource Assessment: Adventures in Engagement
Electronic Resource Assessment: Adventures in EngagementElectronic Resource Assessment: Adventures in Engagement
Electronic Resource Assessment: Adventures in Engagement
 
Past Slides
Past SlidesPast Slides
Past Slides
 
[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music[221]똑똑한 인공지능 dj 비서 clova music
[221]똑똑한 인공지능 dj 비서 clova music
 
Viral loops
Viral loopsViral loops
Viral loops
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018
 
Trends in Music Recommendations 2018
Trends in Music Recommendations 2018Trends in Music Recommendations 2018
Trends in Music Recommendations 2018
 
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
Marketing Tips for Classical Music Artists - midem 2012 Carnegie Hall present...
 
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
Marketing Tips for Classical Music: Digital Content Marketing – midem 2012 pr...
 
Challenging cajoling and rewarding the community for their contributions to o...
Challenging cajoling and rewarding the community for their contributions to o...Challenging cajoling and rewarding the community for their contributions to o...
Challenging cajoling and rewarding the community for their contributions to o...
 
There is a method to it: Making meaning in information research through a mix...
There is a method to it: Making meaning in information research through a mix...There is a method to it: Making meaning in information research through a mix...
There is a method to it: Making meaning in information research through a mix...
 
Data Visualization: A Quick Tour for Data Science Enthusiasts
Data Visualization: A Quick Tour for Data Science EnthusiastsData Visualization: A Quick Tour for Data Science Enthusiasts
Data Visualization: A Quick Tour for Data Science Enthusiasts
 

Kürzlich hochgeladen

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 

Kürzlich hochgeladen (20)

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 

From Idea to Execution: Spotify's Discover Weekly

  • 1. From Idea to Execution: Spotify’s Discover Weekly Chris Johnson :: @MrChrisJohnson Edward Newett :: @scaladaze DataEngConf • NYC • Nov 2015 Or: 5 lessons in building recommendation products at scale
  • 2. Who are We?? Chris Johnson Edward Newett
  • 3. Spotify in Numbers • Started in 2006, now available in 58 markets • 75+ Million active users, 20 Million paying subscribers • 30+ Million songs, 20,000 new songs added per day • 1.5 Billion user generated playlists • 1 TB user data logged per day • 1,700 node Hadoop cluster • 10,000+ Hadoop jobs run daily
  • 4. Challenge: 30M songs… how do we recommend music to users?
  • 8. Discover Weekly • Started in 2006, now available in 58 markets • 75+ Million active users, 20 Million paying subscribers • 30+ Million songs, 20,000 new songs added per day • 1.5 Billion user generated playlists • 1 TB user data logged per day • 1,700 node Hadoop cluster • 10,000+ Hadoop jobs run daily
  • 10. 2013 :: Discover Page v1.0 • Personalized News Feed of recommendations • Artists, Album Reviews, News Articles, New Releases, Upcoming Concerts, Social Recommendations, Playlists… • Required a lot of attention and digging to engage with recommendations • No organization of content
  • 11. 2014 :: Discover Page v2.0 • Recommendations grouped into strips (a la Netflix) • Limited to Albums and New Releases • More organized than News-Feed but still requires active interaction
  • 12. Insight: users spending more time on editorial Browse playlists than Discover.
  • 13. Idea: combine the personalized experience of Discover with the lean- back ease of Browse
  • 15. Play it forward: Same content as the Discover Page but.. a playlist
  • 16. Lesson 1: Be data driven from start to finish
  • 17. Slide from Dan McKinley - Etsy 2008 2012 2015
  • 18. • Reach: How many users are you reaching • Depth: For the users you reach, what is the depth of reach. • Retention: For the users you reach, how many do you retain? Define success metrics BEFORE you release your test
  • 19. • Reach: DW WAU / Spotify WAU • Depth: DW Time Spent / Spotify WAU • Retention: DW week-over-week retention Discover Weekly Key Success Metrics
  • 20. 2008 2012 2015 Slide from Dan McKinley - Etsy
  • 21. Step 1: Prototype (employee test)
  • 22. Step 1: Prototype (employee test)
  • 23. Results of Employee Test were very positive!
  • 24. 2008 2012 2015 Slide from Dan McKinley - Etsy
  • 25. Step 2: Release AB Test to 1% of Users
  • 26. Google Form 1% Results
  • 27. Personalized image resulted in 10% lift in WAU • Initial 0.5% user test • 1% Spaceman image • 1% Personalized image
  • 31. 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 •Aggregate all (user, track) streams into a large matrix •Goal: Approximate binary preference matrix by inner product of 2 smaller matrices by minimizing the weighted RMSE (root mean squared error) using a function of plays, context, and recency as weight X YUsers Songs • = bias for user • = bias for item • = regularization parameter • = 1 if user streamed track else 0 • • = user latent factor vector • = item latent factor vector [1] Hu Y. & Koren Y. & Volinsky C. (2008) Collaborative Filtering for Implicit Feedback Datasets 8th IEEE International Conference on Data Mining Implicit Matrix Factorization
  • 32. 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 •Aggregate all (user, track) streams into a large matrix •Goal: Model probability of user playing a song as logistic, then maximize log likelihood of binary preference matrix, weighting positive observations by a function of plays, context, and recency X YUsers Songs • = bias for user • = bias for item • = regularization parameter • = user latent factor vector • = item latent factor vector [2] Johnson C. (2014) Logistic Matrix Factorization for Implicit Feedback Data NIPS Workshop on Distributed Matrix Computations Can also use Logistic Loss!
  • 33. NLP Models on News and Blogs
  • 34. Playlist itself is a document Songs in playlist are words NLP Models work great on Playlists!
  • 36. •normalized item-vectors Songs in a Latent Space representation
  • 37. •user-vector in same space Songs in a Latent Space representation
  • 38. Lesson 3: Don’t scale until you need to
  • 39. Scaling to 100%: Rollout Challenges ‣Create and publish 75M playlists every week ‣Downloading and processing Facebook images ‣Language translations
  • 40. Scaling to 100%: Weekly refresh ‣Time sensitive updates ‣Refresh 75M playlists every Sunday night ‣Take timezones into account
  • 42.
  • 43.
  • 44.
  • 45. What’s next? Iterating on content quality and interface enhancements
  • 46. Iterating on quality and adding a feedback loop.
  • 47. DW feedback comes at the expense of presentation bias.
  • 48. Lesson 4: Users know best. In the end, AB Test everything!
  • 49. Lesson 5 (final lesson!): Empower bottom-up innovation in your org and amazing things will happen.
  • 50. Thank You! (btw, we’re hiring Machine Learning and Data Engineers, come chat with us!)