3. March 25, 2014
Spotify in numbers
Started in 2006, available in 55 markets
20+ million songs, 20,000 added per day
24+ million active users, 6+ million subscribers
1.5 billion playlists
4. Big Data @spotify
600 node cluster
Every day
•400GB service logs
•4.5TB user data
•5,000 Hadoop jobs
•61TB generated
11. Social Listening
Take 1
•PUB/SUB
•Almost real-time
•Spammy
•Hard to scale
All characters appearing in this work are fictitious. Any resemblance to real persons, living
or dead, is purely coincidental.
17. What are we transferring?
•TSV logs with type and version (moving to Avro)
•Centralized Schema Repository
•Parsers in Python and Java
•Log parsing and splitting by topic in Kafka
EndSong 21 username:Str timestamp:Int trackId:Str msPlayed:Int reasonStart:Str reasonEnd:Str …
ClientEvent 15 username:Str platform:Str timestamp:Int jsonData:Str …
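A minimal sketch of the parse-and-split step, under stated assumptions: each TSV line is taken to start with the log type and version, followed by the field values in schema order. The `SCHEMAS` dict is a hypothetical in-memory stand-in for the centralized schema repository and lists only the fields shown above; the elided ones are not guessed. `parse_line` and `split_by_topic` are illustrative names, and the grouping by type only stands in for routing to Kafka topics (no Kafka client is used).

```python
from collections import defaultdict

# Hypothetical stand-in for the centralized schema repository.
# Only the fields visible on the slide are listed; the real schemas continue.
SCHEMAS = {
    ("EndSong", 21): [
        ("username", str), ("timestamp", int), ("trackId", str),
        ("msPlayed", int), ("reasonStart", str), ("reasonEnd", str),
    ],
    ("ClientEvent", 15): [
        ("username", str), ("platform", str), ("timestamp", int),
        ("jsonData", str),
    ],
}


def parse_line(line):
    """Parse one TSV line assumed to be: type, version, then field values."""
    parts = line.rstrip("\n").split("\t")
    log_type, version = parts[0], int(parts[1])
    schema = SCHEMAS[(log_type, version)]
    values = parts[2:]
    # Cast each raw value with the type declared in the schema.
    record = {name: cast(raw) for (name, cast), raw in zip(schema, values)}
    # Values beyond the known part of the schema are kept unparsed.
    record["_extra"] = values[len(schema):]
    return log_type, record


def split_by_topic(lines):
    """Group parsed records by log type (one group per would-be topic)."""
    topics = defaultdict(list)
    for line in lines:
        log_type, record = parse_line(line)
        topics[log_type].append(record)
    return dict(topics)
```

Keying the lookup on both type and version means a schema change becomes a new repository entry rather than a change to the parser.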
39. Development Process
One git repository
One storm-shared sub project → jar → Artifactory
Many storm-team/application subprojects
Sampled log for local development
Tunable params in config files
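One way the last two points could fit together, as a sketch only: the deck does not say how the log is sampled or what the config format is. Here the sample is assumed to be chosen by hashing the username, so a given user's events are either all kept or all dropped, and the rate is read from JSON config text. The names `load_params`, `keep`, `sample_log`, and `sample_rate`, and the username column index, are all hypothetical.

```python
import json
import zlib


def load_params(text):
    """Read tunable parameters from JSON config text (format is assumed)."""
    params = {"sample_rate": 0.01}  # assumed default, overridden by config
    params.update(json.loads(text))
    return params


def keep(username, sample_rate):
    """Deterministically keep roughly `sample_rate` of all usernames."""
    bucket = zlib.crc32(username.encode("utf-8")) % 10000
    return bucket < sample_rate * 10000


def sample_log(lines, sample_rate, username_column=2):
    """Yield only the TSV lines whose username falls inside the sample."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if keep(fields[username_column], sample_rate):
            yield line
```

Because the choice depends only on the username, rerunning the sampler over new logs keeps the same users, which makes local runs reproducible.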
43. Language Choices
Java for boring stuff - Cassandra, memcached, RPC, etc.
Clojure for fun stuff - algorithm heavy
Scala - Summingbird?
46. Want to join the band?
Check out http://www.spotify.com/jobs
or @Spotifyjobs for more information.
Neville Li
neville@spotify.com @sinisa_lyh