11. Collecting it
All data into and out of Spotify goes through access points
Access points and services log data
Logs are aggregated and shipped to Hadoop
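The flow on this slide can be sketched as a minimal aggregation step over raw access-point logs. The log format and field names below are illustrative, not Spotify's actual schema:

```python
from collections import Counter

def aggregate_plays(log_lines):
    """Aggregate raw access-point log lines into per-(user, track) play counts.

    Each line is assumed to look like: "timestamp user_id track_id event"
    (an illustrative format, not Spotify's real log schema).
    """
    counts = Counter()
    for line in log_lines:
        _ts, user_id, track_id, event = line.split()
        if event == "play":
            counts[(user_id, track_id)] += 1
    return counts

logs = [
    "1400000000 u1 t42 play",
    "1400000001 u1 t42 play",
    "1400000002 u2 t7 skip",
]
print(aggregate_plays(logs))  # Counter({('u1', 't42'): 2})
```

In the real pipeline this kind of aggregation runs as a Hadoop job over the shipped logs rather than in a single process.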
23. Storing recommendations
About 80 kilobytes of data per user
Regenerated daily for all active users
> 1 TB of recommendations overall
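The figures on this slide are mutually consistent; at roughly 80 KB per user, a terabyte of recommendations corresponds to on the order of 12 million active users:

```python
per_user_bytes = 80 * 1024          # ~80 KB of recommendations per user
total_bytes = 10**12                # ~1 TB overall (decimal terabyte)
users = total_bytes / per_user_bytes
print(round(users / 1e6, 1))        # ~12.2 million active users
```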
24. Storing recommendations
A terabyte a day is a fair amount of data to write and index (plus replicas)
Tradeoff between storing precomputed data and looking things up on the fly
27. Recommendations all generated in Hadoop, in London
Need to be shipped to our data centers worldwide
So much Internet weather
28. hdfs2cass
Internal tool to copy data from Hadoop to Cassandra
Creates tables from data in HDFS
Loads tables into Cassandra using a bulk loader
github.com/spotify/hdfs2cass
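This follows the same pattern as Cassandra's bundled bulk loader: build SSTable files offline, then stream them into the ring instead of issuing per-row inserts. For comparison, Cassandra's own sstableloader is invoked like this (the host and path are illustrative):

```shell
# Stream pre-built SSTables from a local directory into the cluster,
# using one node as the initial contact point.
sstableloader -d cassandra-host.example.net /var/data/discover/user_recs/
```

Bulk loading trades write-path overhead (commit log, memtables) for an offline build step, which suits a dataset that is regenerated wholesale every day.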
30. 1. Aggregate data from all our data sources
2. Decorate it with metadata
3. Shuffle!
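The three steps above can be sketched end to end. The toy data and field names here are illustrative, not the production pipeline's actual types:

```python
import random

def build_recommendations(play_logs, metadata, seed=0):
    """1) Aggregate plays per user, 2) decorate tracks with metadata,
    3) shuffle so a user's list isn't ordered by play count alone."""
    # 1. Aggregate: collect the set of tracks each user played.
    per_user = {}
    for user_id, track_id in play_logs:
        per_user.setdefault(user_id, set()).add(track_id)

    # 2. Decorate: attach track metadata (title, artist) to each track id.
    decorated = {
        user: [metadata[t] for t in sorted(tracks)]
        for user, tracks in per_user.items()
    }

    # 3. Shuffle: randomize the order within each user's list.
    rng = random.Random(seed)
    for tracks in decorated.values():
        rng.shuffle(tracks)
    return decorated

logs = [("u1", "t1"), ("u1", "t2"), ("u2", "t1")]
meta = {"t1": ("Song A", "Artist X"), "t2": ("Song B", "Artist Y")}
print(build_recommendations(logs, meta))
```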
31. Serving at scale…
Discover is Spotify’s home page
500+ requests per second
Thousands of requests to other services
Database calls for each unique user
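A single page request fanning out into many downstream calls is usually issued concurrently, so page latency tracks the slowest call rather than the sum of all calls. A minimal sketch, with made-up service names:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(service_name):
    """Stand-in for a call to a downstream service (metadata, playlists, ...)."""
    return f"{service_name}:ok"

def render_page(user_id, services):
    # Fan out: call every backing service concurrently instead of serially.
    with ThreadPoolExecutor(max_workers=len(services)) as pool:
        results = list(pool.map(fetch, services))
    return {"user": user_id, "responses": results}

page = render_page("u1", ["metadata", "playlists", "recommendations"])
print(page["responses"])  # ['metadata:ok', 'playlists:ok', 'recommendations:ok']
```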
37. Takeaways
You will need hardware: big data can still be expensive
Less data can be better
Prototype, iterate, optimize: fail early, but improve