SlideShare a Scribd company logo
1 of 37
Download to read offline
HowApache
Drives Music
Recommendations
At Spotify
Josh Baer (jbx@spotify.com)
Note:The view expressed is my own and does not necessarily represent that of Spotify
WhoAm I?
• Technical Product Owner at
Spotify
• Working with batch and fast
processing infrastructure
@l_phant
Music Discovery in the 90s
What is Spotify?
• Music Streaming Service
• Launched in 2008
• Free and PremiumTiers
• Available in 58 Countries
75+ MillionActive Users
30+ Million Songs
1+ Billion Plays/Day
Music
Recommendations
with Apache
How do we recommend a
personalized playlist of
new music to 75+ million
users?
10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/
admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/
admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36”
10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/
1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/
dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81
Safari/537.36”
10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin"
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
It begins with a log
Apache Kafka at Spotify
•340 Kafka-related nodes
•30TB/day from logs
How do we store TBs of
new data every data?
Apache Hadoop at Spotify
• 1700 Nodes
• 60 PB of Data
• 70TB of Memory
• Over 1 Million jobs run in Q3, 2015
ProcessingGrowth
150%
250%
350%
450%
550%
Q4-2013 Q1-2014 Q2-2014 Q3-2014 Q4-2014 Q1-2015 Q2-2015 Q3-2015
Hadoop at Spotify
Processing Toolbox
• Apache Crunch
• Scalding
• Apache Hive
• Apache Spark
• Apache Storm
• Hadoop Streaming
• Apache Pig
Storage Formats
• ApacheAvro
• Apache Parquet
How do we personalize
the playlists?
Collaborative Filtering
Justin Bieber Drake Avicii Major Lazer
Anna Listened Listened
Gustav Listened Listened Listened
Mary Listened Listened Listened Listened
Michael Listened ListenedSuggest
How do we serve new
playlists to all our users
every week?
Apache Cassandra at Spotify
• Number of Clusters: 113
• Number of Machines: 1155
• Largest Cluster: 60 Nodes
Driven By Data
Driven ByApache
Thank YOU for your
contributions to
Apache products!
One Last Thing…
Spotify Luigi
•Workflow Manager
•Over 150 contributors
•Used by 10s, possibly 100s of companies
Maybe…Apache Luigi?
Sponsors/mentors/contributors wanted!
Think this stuff is interesting?
We have a great time building it!
spotify.com/jobs
Better Spotify ML Presentations
• Algorithmic Music Recommendations at Spotify (Chris Johnson)
• Interactive Recommender Systems with Netflix and Spotify (Chris Johnson)
• Music recommendations @ MLConf 2014 (Erik Bernhardsson)
• Machine learning @ Spotify (Andy Sloane)
• Recommending music on Spotifywith deep learning (Sander Dieleman)
• Scala Data Pipelines @ Spotify (Neville Li)
• Spotify's Music Recommendations LambdaArchitecture (Esh Kumar and Emily Samuels)

More Related Content

What's hot

Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at SpotifyRohan Agrawal
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At SpotifyVidhya Murali
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ SpotifyOscar Carlsson
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ SpotifyNikhil Tibrewal
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyChris Johnson
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing playNiklas Gustavsson
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotifyAli Sarrafi
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayDataWorks Summit
 
Music Recommendations at Scale with Spark
Music Recommendations at Scale with SparkMusic Recommendations at Scale with Spark
Music Recommendations at Scale with SparkChris Johnson
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingSameera Horawalavithana
 

What's hot (20)

Spotify: behind the scenes
Spotify: behind the scenesSpotify: behind the scenes
Spotify: behind the scenes
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
Personalized Playlists at Spotify
Personalized Playlists at SpotifyPersonalized Playlists at Spotify
Personalized Playlists at Spotify
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
 
Big data and machine learning @ Spotify
Big data and machine learning @ SpotifyBig data and machine learning @ Spotify
Big data and machine learning @ Spotify
 
Playlist Recommendations @ Spotify
Playlist Recommendations @ SpotifyPlaylist Recommendations @ Spotify
Playlist Recommendations @ Spotify
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
 
Spotify: Data center & Backend buildout
Spotify: Data center & Backend buildoutSpotify: Data center & Backend buildout
Spotify: Data center & Backend buildout
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing play
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Spotify: P2P music streaming
Spotify: P2P music streamingSpotify: P2P music streaming
Spotify: P2P music streaming
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotify
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Music Recommendations at Scale with Spark
Music Recommendations at Scale with SparkMusic Recommendations at Scale with Spark
Music Recommendations at Scale with Spark
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 

Viewers also liked

Shortening the feedback loop
Shortening the feedback loopShortening the feedback loop
Shortening the feedback loopJosh Baer
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsJimmy Mårdell
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureEsh Vckay
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeAdam Kawa
 
Docker at Spotify - Dockercon14
Docker at Spotify - Dockercon14Docker at Spotify - Dockercon14
Docker at Spotify - Dockercon14dotCloud
 
Empowering Engineering Talent - an update from Spotify
Empowering Engineering Talent - an update from SpotifyEmpowering Engineering Talent - an update from Spotify
Empowering Engineering Talent - an update from SpotifyKevin Goldsmith
 
How spotify makes product
How spotify makes productHow spotify makes product
How spotify makes productAli Sarrafi
 
Activation: From thinking to tweaking it, how we do it at Spotify
Activation: From thinking to tweaking it, how we do it at Spotify Activation: From thinking to tweaking it, how we do it at Spotify
Activation: From thinking to tweaking it, how we do it at Spotify TheFamily
 
Quality Built In @ Spotify
Quality Built In @ SpotifyQuality Built In @ Spotify
Quality Built In @ SpotifyAndrii Dzynia
 
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...Kevin Goldsmith
 
Yahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded ReplicationYahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded ReplicationYashada Jadhav
 
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015ACTUONDA
 
Music Recommendations at Spotify
Music Recommendations at SpotifyMusic Recommendations at Spotify
Music Recommendations at SpotifyEmily Samuels
 
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...ACTUONDA
 
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016ACTUONDA
 
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...ACTUONDA
 

Viewers also liked (20)

Shortening the feedback loop
Shortening the feedback loopShortening the feedback loop
Shortening the feedback loop
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Playlists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objectsPlaylists at Spotify - Using Cassandra to store version controlled objects
Playlists at Spotify - Using Cassandra to store version controlled objects
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda Architecture
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
HCatalog
HCatalogHCatalog
HCatalog
 
Docker at Spotify - Dockercon14
Docker at Spotify - Dockercon14Docker at Spotify - Dockercon14
Docker at Spotify - Dockercon14
 
Empowering Engineering Talent - an update from Spotify
Empowering Engineering Talent - an update from SpotifyEmpowering Engineering Talent - an update from Spotify
Empowering Engineering Talent - an update from Spotify
 
How spotify makes product
How spotify makes productHow spotify makes product
How spotify makes product
 
Activation: From thinking to tweaking it, how we do it at Spotify
Activation: From thinking to tweaking it, how we do it at Spotify Activation: From thinking to tweaking it, how we do it at Spotify
Activation: From thinking to tweaking it, how we do it at Spotify
 
Scaling Operations At Spotify
Scaling Operations At SpotifyScaling Operations At Spotify
Scaling Operations At Spotify
 
Quality Built In @ Spotify
Quality Built In @ SpotifyQuality Built In @ Spotify
Quality Built In @ Spotify
 
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...
How Spotify Builds Products (Organization. Architecture, Autonomy, Accountabi...
 
Yahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded ReplicationYahoo: Experiences with MySQL GTID and Multi Threaded Replication
Yahoo: Experiences with MySQL GTID and Multi Threaded Replication
 
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015
Spotify: Profils d'écoute et data storytelling @ Radio 2.0 2015
 
Music Recommendations at Spotify
Music Recommendations at SpotifyMusic Recommendations at Spotify
Music Recommendations at Spotify
 
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...
Impact et attractivité des concerts des radio pour les auditeurs Etude HyperW...
 
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016
Vif Musique et Marques @ Webinar Radio 2.0 MaMA 2016
 
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...
Reinventing radio and music for digital audience at BBC by Marc Friend @ Radi...
 

Similar to How Apache Drives Music Recommendations At Spotify

Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCJosh Baer
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Big Data Spain
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14N Masahiro
 
Guidelines for productive full stack data engineers
Guidelines for productive full stack data engineersGuidelines for productive full stack data engineers
Guidelines for productive full stack data engineersDataWorks Summit
 
Productive data engineer
Productive data engineerProductive data engineer
Productive data engineerRafał Wojdyła
 
Music Hackday Boston - The Last.fm API
Music Hackday Boston - The Last.fm APIMusic Hackday Boston - The Last.fm API
Music Hackday Boston - The Last.fm APIdavidsingleton
 
Open source-secret-sauce-rit-2010
Open source-secret-sauce-rit-2010Open source-secret-sauce-rit-2010
Open source-secret-sauce-rit-2010Ted Husted
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumarconfluent
 
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022Timothy Spann
 
Sql bits apache nifi 101 Introduction and best practices
Sql bits    apache nifi 101 Introduction and best practicesSql bits    apache nifi 101 Introduction and best practices
Sql bits apache nifi 101 Introduction and best practicesTimothy Spann
 
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesOSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesNETWAYS
 
Apache Pulsar Community-Jennifer
Apache Pulsar Community-JenniferApache Pulsar Community-Jennifer
Apache Pulsar Community-JenniferStreamNative
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...LINE Corporation
 
Create Your Own Chatbot with Hubot and CoffeeScript
Create Your Own Chatbot with Hubot and CoffeeScriptCreate Your Own Chatbot with Hubot and CoffeeScript
Create Your Own Chatbot with Hubot and CoffeeScriptRob Scaduto
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...LINE Corporation
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - StockholmMatthew Ogle
 

Similar to How Apache Drives Music Recommendations At Spotify (20)

Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14fluent-plugin-beats at Elasticsearch meetup #14
fluent-plugin-beats at Elasticsearch meetup #14
 
Guidelines for productive full stack data engineers
Guidelines for productive full stack data engineersGuidelines for productive full stack data engineers
Guidelines for productive full stack data engineers
 
Productive data engineer
Productive data engineerProductive data engineer
Productive data engineer
 
Music Hackday Boston - The Last.fm API
Music Hackday Boston - The Last.fm APIMusic Hackday Boston - The Last.fm API
Music Hackday Boston - The Last.fm API
 
Open source-secret-sauce-rit-2010
Open source-secret-sauce-rit-2010Open source-secret-sauce-rit-2010
Open source-secret-sauce-rit-2010
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022Data minutes #2   Apache Pulsar with MQTT for Edge Computing Lightning - 2022
Data minutes #2 Apache Pulsar with MQTT for Edge Computing Lightning - 2022
 
Sql bits apache nifi 101 Introduction and best practices
Sql bits    apache nifi 101 Introduction and best practicesSql bits    apache nifi 101 Introduction and best practices
Sql bits apache nifi 101 Introduction and best practices
 
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-BayesOSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
OSDC 2016 - Ingesting Logs with Style by Pere Urbon-Bayes
 
Apache Pulsar Community-Jennifer
Apache Pulsar Community-JenniferApache Pulsar Community-Jennifer
Apache Pulsar Community-Jennifer
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
Music streams
Music streamsMusic streams
Music streams
 
Create Your Own Chatbot with Hubot and CoffeeScript
Create Your Own Chatbot with Hubot and CoffeeScriptCreate Your Own Chatbot with Hubot and CoffeeScript
Create Your Own Chatbot with Hubot and CoffeeScript
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
Last.fm API workshop - Stockholm
Last.fm API workshop - StockholmLast.fm API workshop - Stockholm
Last.fm API workshop - Stockholm
 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

How Apache Drives Music Recommendations At Spotify

  • 1. HowApache Drives Music Recommendations At Spotify Josh Baer (jbx@spotify.com) Note:The view expressed is my own and does not necessarily represent that of Spotify
  • 2. WhoAm I? • Technical Product Owner at Spotify • Working with batch and fast processing infrastructure @l_phant
  • 4. What is Spotify? • Music Streaming Service • Launched in 2008 • Free and PremiumTiers • Available in 58 Countries
  • 9.
  • 10.
  • 11. How do we recommend a personalized playlist of new music to 75+ million users?
  • 12. 10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/ admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" 10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/ admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36” 10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" 10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/ 1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" 10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/ dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36” 10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36" It begins with a log
  • 13.
  • 14.
  • 15. Apache Kafka at Spotify •340 Kafka-related nodes •30TB/day from logs
  • 16.
  • 17. How do we store TBs of new data every data?
  • 18.
  • 19. Apache Hadoop at Spotify • 1700 Nodes • 60 PB of Data • 70TB of Memory • Over 1 Million jobs run in Q3, 2015
  • 20. ProcessingGrowth 150% 250% 350% 450% 550% Q4-2013 Q1-2014 Q2-2014 Q3-2014 Q4-2014 Q1-2015 Q2-2015 Q3-2015 Hadoop at Spotify
  • 21. Processing Toolbox • Apache Crunch • Scalding • Apache Hive • Apache Spark • Apache Storm • Hadoop Streaming • Apache Pig
  • 23. How do we personalize the playlists?
  • 24.
  • 25. Collaborative Filtering Justin Bieber Drake Avicii Major Lazer Anna Listened Listened Gustav Listened Listened Listened Mary Listened Listened Listened Listened Michael Listened ListenedSuggest
  • 26.
  • 27. How do we serve new playlists to all our users every week?
  • 28.
  • 29. Apache Cassandra at Spotify • Number of Clusters: 113 • Number of Machines: 1155 • Largest Cluster: 60 Nodes
  • 32. Thank YOU for your contributions to Apache products!
  • 34. Spotify Luigi •Workflow Manager •Over 150 contributors •Used by 10s, possibly 100s of companies
  • 36. Think this stuff is interesting? We have a great time building it! spotify.com/jobs
  • 37. Better Spotify ML Presentations • Algorithmic Music Recommendations at Spotify (Chris Johnson) • Interactive Recommender Systems with Netflix and Spotify (Chris Johnson) • Music recommendations @ MLConf 2014 (Erik Bernhardsson) • Machine learning @ Spotify (Andy Sloane) • Recommending music on Spotifywith deep learning (Sander Dieleman) • Scala Data Pipelines @ Spotify (Neville Li) • Spotify's Music Recommendations LambdaArchitecture (Esh Kumar and Emily Samuels)