How Apache Drives Music Recommendations At Spotify

The slides walk through the high-level process of generating personalized playlists for all of Spotify's users, making extensive use of Apache big data products.

Presentation given at Apache: Big Data Europe conference on September 29th, 2015 in Budapest.

Published in: Technology

  1. How Apache Drives Music Recommendations At Spotify • Josh Baer (jbx@spotify.com) • Note: The view expressed is my own and does not necessarily represent that of Spotify
  2. Who Am I? • Technical Product Owner at Spotify • Working with batch and fast processing infrastructure • @l_phant
  3. Music Discovery in the 90s
  4. What is Spotify? • Music Streaming Service • Launched in 2008 • Free and Premium Tiers • Available in 58 Countries
  5. 75+ Million Active Users
  6. 30+ Million Songs
  7. 1+ Billion Plays/Day
  8. Music Recommendations with Apache
  9. How do we recommend a personalized playlist of new music to 75+ million users?
  10. It begins with a log:
      10.123.133.333 - - [Mon, 3 June 2015 11:31:33 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1847 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
      10.123.133.222 - - [Mon, 3 June 2015 11:31:43 GMT] "GET /api/admin/job/aggregator/status HTTP/1.1" 200 1984 "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
      10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
      10.321.145.111 - - [Mon, 3 June 2015 11:33:03 GMT] "GET /api/loggedInUser HTTP/1.1" 304 - "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
      10.112.322.111 - - [Mon, 3 June 2015 11:33:03 GMT] "POST /api/instrumentation/events/new HTTP/1.1" 200 2 "https://my.analytics.app/dashboard/courses/1291726" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
      10.123.133.222 - - [Mon, 3 June 2015 11:33:02 GMT] "GET /dashboard/courses/1291726 HTTP/1.1" 304 - "https://my.analytics.app/admin" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36"
  11. Apache Kafka at Spotify • 340 Kafka-related nodes • 30 TB/day from logs
  12. How do we store TBs of new data every day?
  13. Apache Hadoop at Spotify • 1700 Nodes • 60 PB of Data • 70 TB of Memory • Over 1 million jobs run in Q3 2015
  14. Processing Growth: Hadoop at Spotify (chart; vertical axis 150% to 550%, horizontal axis Q4-2013 to Q3-2015)
  15. Processing Toolbox • Apache Crunch • Scalding • Apache Hive • Apache Spark • Apache Storm • Hadoop Streaming • Apache Pig
  16. Storage Formats • Apache Avro • Apache Parquet
  17. How do we personalize the playlists?
  18. Collaborative Filtering • Artists: Justin Bieber, Drake, Avicii, Major Lazer • Anna: Listened, Listened • Gustav: Listened, Listened, Listened • Mary: Listened, Listened, Listened, Listened • Michael: Listened, Listened, Suggest
  19. How do we serve new playlists to all our users every week?
  20. Apache Cassandra at Spotify • Number of Clusters: 113 • Number of Machines: 1155 • Largest Cluster: 60 Nodes
  21. Driven By Data
  22. Driven By Apache
  23. Thank YOU for your contributions to Apache products!
  24. One Last Thing…
  25. Spotify Luigi • Workflow Manager • Over 150 contributors • Used by 10s, possibly 100s of companies
  26. Maybe… Apache Luigi? Sponsors/mentors/contributors wanted!
  27. Think this stuff is interesting? We have a great time building it! spotify.com/jobs
  28. Better Spotify ML Presentations • Algorithmic Music Recommendations at Spotify (Chris Johnson) • Interactive Recommender Systems with Netflix and Spotify (Chris Johnson) • Music recommendations @ MLConf 2014 (Erik Bernhardsson) • Machine learning @ Spotify (Andy Sloane) • Recommending music on Spotify with deep learning (Sander Dieleman) • Scala Data Pipelines @ Spotify (Neville Li) • Spotify's Music Recommendations Lambda Architecture (Esh Kumar and Emily Samuels)
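
Slide 10 presents raw access-log entries as the starting point of the pipeline. As a rough illustration of what turning such lines into structured records might look like, here is a minimal Python sketch. The regular expression, the field names, and the helper functions `parse_line` and `count_by_path` are assumptions made for this example; the slides do not describe how Spotify actually parses its logs.

```python
import re
from collections import Counter

# Assumed layout, inferred from the sample lines on slide 10:
# ip - - [timestamp] "METHOD path PROTOCOL" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the size column means no body was sent (e.g. a 304 response).
    record["size"] = None if record["size"] == "-" else int(record["size"])
    return record


def count_by_path(lines):
    """Count requests per path, skipping lines that cannot be parsed."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["path"]] += 1
    return counts
```

Calling `count_by_path` on an iterable of log lines yields a `Counter` keyed by request path, which is the kind of small aggregate a batch job could compute before any recommendation logic runs.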
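
The collaborative filtering idea on slide 18 (suggest to Michael an artist that similar listeners have played) can be sketched in a few lines of Python. This is a toy, user-based variant using Jaccard similarity, not Spotify's production method. The listening data below is hypothetical: the slide's table does not survive in this transcript, so which user listened to which artist is assumed, and `jaccard` and `recommend` are names invented for the example.

```python
from collections import defaultdict

# Hypothetical listening history; the user/artist pairings are assumed.
listens = {
    "Anna": {"Justin Bieber", "Drake"},
    "Gustav": {"Drake", "Avicii", "Major Lazer"},
    "Mary": {"Justin Bieber", "Drake", "Avicii", "Major Lazer"},
    "Michael": {"Avicii", "Major Lazer"},
}


def jaccard(a, b):
    """Overlap between two sets of artists, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, listens, top_n=1):
    """Rank artists the user has not played by the similarity of users who have."""
    seen = listens[user]
    scores = defaultdict(float)
    for other, artists in listens.items():
        if other == user:
            continue
        similarity = jaccard(seen, artists)
        for artist in artists - seen:
            scores[artist] += similarity
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [artist for artist, _ in ranked[:top_n]]


print(recommend("Michael", listens))  # ['Drake']
```

With this assumed data, Gustav and Mary overlap most with Michael and both have played Drake, so Drake scores highest. At the scale quoted in the slides (75+ million users), pairwise comparison like this would not be practical, which is one reason the batch tooling listed on slide 15 matters.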
