The Evolution of
Hadoop at Spotify
Through Failures and Pain
Josh Baer (jbx@spotify.com)
Rafal Wojdyla (rav@spotify.com)
1
Note: Our views are our own and don't necessarily represent those of Spotify.
2
Overview
• Growing Pains (2009-2012)
• Gaining Focus (2013-2014)
• The Future (2015+)
3
Our First Major
Hadoop Bug
4
Cluster 1.0
What is Spotify?
• Music Streaming Service
• Browse and Discover Millions of Songs, Artists and Albums
• Launched in October 2008
• December 2014:
  • 60 Million Monthly Users
  • 15 Million Paid Subscribers
5
What is Spotify?
• Data Infrastructure:
  • 1300 Hadoop Nodes
  • 42 PB Storage
  • 20 TB of data ingested via Kafka per day
  • 200 TB generated by Hadoop per day
6
select artist_id, count(1)
from user_activities
where play_seconds > 30
group by artist_id;
7
0 * * * *     spotify-core       hadoop jar hourly_import.jar
15 * * * *    spotify-core       hadoop jar hourly_listeners.jar
30 * * * *    spotify-analytics  hadoop jar user_funnel_hourly.jar
* 1 * * *     spotify-core       hadoop jar daily_aggregate.jar
* 2 * * *     spotify-core       hadoop jar calculate_royalties.jar
*/2 22 * * *  spotify-radio      hadoop jar generate_radio.jar
8
9
Luigi: handles the ‘plumbing’ for Hadoop jobs
https://github.com/spotify/luigi
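The cron schedule shown earlier staggers jobs by clock offsets and relies on upstream jobs having finished in time. A dependency-aware runner instead starts a job only once the jobs it requires are done. The following is a minimal pure-Python sketch of that idea. It is not Luigi's API, and the dependency edges between jobs are assumptions for illustration (only the job names are borrowed from the jar names on the cron slide).

```python
def run_order(requires):
    """Return jobs in an order where every job follows its requirements.

    `requires` maps a job name to the list of jobs it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    order, done, in_progress = [], set(), set()

    def visit(job):
        if job in done:
            return
        if job in in_progress:
            raise ValueError(f"dependency cycle at {job!r}")
        in_progress.add(job)
        for upstream in requires.get(job, []):
            visit(upstream)
        in_progress.discard(job)
        done.add(job)
        order.append(job)

    for job in sorted(requires):
        visit(job)
    return order


# Hypothetical dependency edges; the slides do not state them.
jobs = {
    "hourly_import": [],
    "hourly_listeners": ["hourly_import"],
    "daily_aggregate": ["hourly_listeners"],
    "calculate_royalties": ["daily_aggregate"],
}
print(run_order(jobs))
```

With explicit dependencies, a late upstream job delays its downstream jobs instead of letting them run on missing input.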
10
To the Cloud!
11
12
# sudo addgroup hadoop
# sudo adduser --ingroup hadoop hdfs
# sudo adduser --ingroup hadoop yarn
# cp /tmp/configs/*.xml /etc/hadoop/conf/
# apt-get update
…
[hdfs@sj-hadoop-b20 ~] $ apt-get install hadoop-hdfs-datanode
…
[yarn@sj-hadoop-b20 ~] $ apt-get install hadoop-yarn-nodemanager
13
Automated
Config Management
(via Puppet)
14
[data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data
Found 3 items
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 lake
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 pond
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 ocean
[data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data/lake
Found 1 items
drwxr-xr-x   - hdfs hadoop    1321451 2015-01-01 12:00 boats.txt
[data-sci@sj-edge-a1 ~] $ hdfs dfs -cat /data/lake/boats.txt
…
15
$ time for i in {1..100}; do hadoop fs -ls / > /dev/null; done
real    3m32.014s
user    6m15.891s
sys     0m18.821s
$ time for i in {1..100}; do snakebite ls / > /dev/null; done
real    0m34.760s
user    0m29.962s
sys     0m4.512s
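To put the two timings side by side, the short sketch below converts the wall-clock (`real`) values from the slide into seconds and derives a per-call cost and a speedup factor. The helper name `to_seconds` is made up for this example, and the slide itself does not say where the difference comes from.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '3m32.014s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ('real') times for 100 listings, copied from the slide.
hadoop_real = to_seconds("3m32.014s")
snakebite_real = to_seconds("0m34.760s")

speedup = hadoop_real / snakebite_real
print(f"hadoop fs: {hadoop_real / 100:.2f} s per call")
print(f"snakebite: {snakebite_real / 100:.2f} s per call")
print(f"speedup:   {speedup:.1f}x")
```

On these numbers a single listing costs roughly 2.1 s with `hadoop fs` and roughly 0.35 s with snakebite, about a sixfold difference in wall-clock time.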
  
16
17
Gaining Focus
(2013-2014)
18
Forming a team
• In 2013, expanded to 200 nodes
• Hadoop had become critical
• Needed a team totally focused on it
• Created a ‘squad’ with two missions:
  • Migrate to a new distribution with YARN
  • Make Hadoop reliable
19
Hadoop ownerless Upgrades Getting there
Squad
20
Alerting
• Alert on service-level problems (e.g. no jobs running)
• Keep your alarm channel clean. Beware of alert fatigue.
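A service-level check of the kind described here can be very small. The sketch below alerts on the symptom users care about (no job has finished for a while) rather than on individual node or daemon events. The function name and the one-hour threshold are assumptions for illustration; the slides give neither.

```python
from datetime import datetime, timedelta


def should_alert(last_job_finished, now, max_quiet=timedelta(hours=1)):
    """Service-level check: alert when no job has finished recently.

    `last_job_finished` is the completion time of the most recent job,
    or None if no job has ever finished. `max_quiet` is a hypothetical
    threshold; the slides do not give one.
    """
    if last_job_finished is None:
        return True
    return now - last_job_finished > max_quiet


now = datetime(2015, 1, 1, 12, 0)
print(should_alert(datetime(2015, 1, 1, 11, 30), now))  # a job finished 30 minutes ago
print(should_alert(datetime(2015, 1, 1, 9, 0), now))    # nothing finished for 3 hours
```

One check like this produces a single actionable alert, which helps keep the alarm channel clean.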
21
Uhh ohh…..
I think I made a mistake
[data-sci@sj-edge-a1 ~] $ snakebite rm -R /team/disco/ CF/test-10/
22
Goodbye Data (1 PB)
[data-sci@sj-edge-a1 ~] $ snakebite rm -R /team/disco/ CF/test-10/
OK: Deleted /team/disco
23
Lessons Learned
• “Sit on your hands before you type” - Wouter de Bie
• Users will always want to retain data!
• Remove superusers from edge nodes
• Moving to trash is a client-side implementation, so a client can bypass it
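The command on the incident slide contains a stray space, so `/team/disco/` and `CF/test-10/` were passed as two separate arguments. A client-side guard can catch that pattern before anything is deleted. The sketch below is a local-filesystem stand-in, not snakebite or HDFS code: `check_targets`, `move_to_trash` and the `MIN_DEPTH` policy are all hypothetical names and choices made for this example.

```python
import shutil
import tempfile
from pathlib import Path, PurePosixPath

MIN_DEPTH = 3  # hypothetical policy: refuse anything shallower than /a/b/c


def check_targets(paths, min_depth=MIN_DEPTH):
    """Reject relative or shallow paths before any delete happens."""
    for raw in paths:
        path = PurePosixPath(raw)
        if not path.is_absolute():
            raise ValueError(f"refusing relative path {raw!r} (stray space?)")
        # parts[0] is the root '/', so subtract it from the depth.
        if len(path.parts) - 1 < min_depth:
            raise ValueError(f"refusing shallow path {raw!r}")


def move_to_trash(target, trash_dir):
    """Client-side 'delete': move the target into a trash directory."""
    trash_dir.mkdir(parents=True, exist_ok=True)
    destination = trash_dir / target.name
    shutil.move(str(target), str(destination))
    return destination


# The arguments from the incident are rejected before any delete.
try:
    check_targets(["/team/disco/", "CF/test-10/"])
except ValueError as error:
    print(error)

# Demonstrate the trash move on a throwaway local directory.
with tempfile.TemporaryDirectory() as root:
    data = Path(root) / "test-10"
    data.mkdir()
    moved = move_to_trash(data, Path(root) / ".Trash")
    print(moved.exists(), data.exists())
```

Because the trash move lives in the client, every client that can delete data needs the same safeguard, which is one reading of the last bullet on the slide.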
24
The Wild Wild West
25
Pre-Production Cluster
• Same hardware profile as production cluster
• Similar configuration
• Staging environment
• Reliable
26
Automated Testing
27
28
Moving Data
29
Apache Falcon
• Features:
  • Data discovery
  • Lineage
  • Lifecycle management
  • More
• We use it for data movement
• Uses Oozie behind the scenes
30
Improving Performance
• Most of our jobs were Hadoop (Python) Streaming
• Lots of failures, slow performance
• Had to find a better way…
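For readers who have not written one, a Hadoop Streaming job in Python is a pair of scripts that exchange tab-separated text lines over stdin and stdout. The sketch below models the mapper and reducer as plain functions so it runs without a cluster; the two-column input layout and the sample rows are assumptions for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'artist_id<TAB>1' for each play longer than 30 seconds."""
    for line in lines:
        artist_id, play_seconds = line.rstrip("\n").split("\t")
        if int(play_seconds) > 30:
            yield f"{artist_id}\t1"


def reducer(sorted_lines):
    """Sum counts per key; the input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for artist_id, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{artist_id}\t{sum(int(count) for _, count in group)}"


# Made-up input; sorted() stands in for Hadoop's shuffle-and-sort phase.
raw = ["a1\t45\n", "a2\t31\n", "a1\t10\n", "a1\t200\n"]
for out in reducer(sorted(mapper(raw))):
    print(out)
```

Every field travels as a string and is parsed by hand, so a malformed line only fails at run time inside `int()`. That is the kind of fragility a typed, JVM-based framework is meant to reduce.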
31
Improving Performance
• Investigated several frameworks
• Selected Crunch:
  • Real types!
  • Higher level API
  • Easier to test
  • Better performance #JVM_FTW
*Dave Whiting’s analysis of systems: http://thewit.ch/scalding_crunchy_pig
32
33
Let’s Review
34
The Future
(2015+)
35
Note: Spotify user counts are based on publicly released numbers only
36
Explosive Growth
• Increased Spotify Users
• Increased use cases
• Increased Engineers
37
http://everynoise.com/engenremap.html
38
Scaling machines: easy
Scaling people: hard
39
User Feedback:
Automate IT!
40
Data Management
41
Raynor
• Data-discovery tool
• Luigi Integration
• Find and browse datasets
• View schemas
• Trace lineage
• Open-source plans? :-(
42
Two Takeaways
• Automate Everything
  • More time to (play FIFA) build cool tools
• Listen to your users
  • Fail fast, don’t be afraid to scrap work
43
Join the Band!
Engineers wanted in NYC & Stockholm
http://spotify.com/jobs

More Related Content

What's hot

Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCJosh Baer
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingSameera Horawalavithana
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experienceMounia Lalmas-Roelleke
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessNick Barkas
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing playNiklas Gustavsson
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at SpotifyNeville Li
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018Fabien Gouyon
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Esh Vckay
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
A/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyA/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyDanielle Jabin
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringErik Krogen
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouseVianney FOUCAULT
 

What's hot (20)

Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Spotify architecture - Pressing play
Spotify architecture - Pressing playSpotify architecture - Pressing play
Spotify architecture - Pressing play
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Music Recommendation 2018
Music Recommendation 2018Music Recommendation 2018
Music Recommendation 2018
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
A/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at SpotifyA/B Testing Pitfalls and Lessons Learned at Spotify
A/B Testing Pitfalls and Lessons Learned at Spotify
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
 
[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse[Meetup] a successful migration from elastic search to clickhouse
[Meetup] a successful migration from elastic search to clickhouse
 
Spotify: P2P music streaming
Spotify: P2P music streamingSpotify: P2P music streaming
Spotify: P2P music streaming
 

Viewers also liked

Moving into movies - using video in E-Learning
Moving into movies - using video in E-Learning Moving into movies - using video in E-Learning
Moving into movies - using video in E-Learning Aurion Learning
 
We’ve created a monster! Truth and fiction in SOA
We’ve created a monster! Truth and fiction in SOAWe’ve created a monster! Truth and fiction in SOA
We’ve created a monster! Truth and fiction in SOAJon Collins
 
A thousand fronts: on the architectures I like
A thousand fronts: on the architectures I likeA thousand fronts: on the architectures I like
A thousand fronts: on the architectures I likeDavide Tommaso Ferrando
 
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)360i
 
4.4 mb portfolio print 2012-2016
4.4 mb portfolio print 2012-20164.4 mb portfolio print 2012-2016
4.4 mb portfolio print 2012-2016Meghan Garnett
 
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.Hubbub Media
 
Big Idea: FIction & Non-Fiction
Big Idea: FIction & Non-FictionBig Idea: FIction & Non-Fiction
Big Idea: FIction & Non-FictionAngela Maiers
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureEsh Vckay
 
When Brands Attack
When Brands AttackWhen Brands Attack
When Brands AttackDirk Herbert
 
Nature, Architecture And Music (V M )
Nature, Architecture And Music (V M )Nature, Architecture And Music (V M )
Nature, Architecture And Music (V M )Valeriu Margescu
 
exploring architecture and music
exploring architecture and musicexploring architecture and music
exploring architecture and musicmichielmoyaert
 
Structural Project 1 Group 1
Structural Project 1 Group 1Structural Project 1 Group 1
Structural Project 1 Group 1guestb2749c7
 
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...Davide Tommaso Ferrando
 
Architectural structures world Wide
Architectural structures   world WideArchitectural structures   world Wide
Architectural structures world WideSagun Rakibe
 
Notaable Buildings Around The World
Notaable Buildings Around The WorldNotaable Buildings Around The World
Notaable Buildings Around The WorldFakhrul Afifi
 

Viewers also liked (20)

Postmodernism
PostmodernismPostmodernism
Postmodernism
 
Barnes and Noble
Barnes and NobleBarnes and Noble
Barnes and Noble
 
Moving into movies - using video in E-Learning
Moving into movies - using video in E-Learning Moving into movies - using video in E-Learning
Moving into movies - using video in E-Learning
 
We’ve created a monster! Truth and fiction in SOA
We’ve created a monster! Truth and fiction in SOAWe’ve created a monster! Truth and fiction in SOA
We’ve created a monster! Truth and fiction in SOA
 
A thousand fronts: on the architectures I like
A thousand fronts: on the architectures I likeA thousand fronts: on the architectures I like
A thousand fronts: on the architectures I like
 
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)
360i Idea Safari: The Hunt of the Mysterious BIG IDEA (Presented at Cannes 2012)
 
Math music and architecture
Math music and architectureMath music and architecture
Math music and architecture
 
4.4 mb portfolio print 2012-2016
4.4 mb portfolio print 2012-20164.4 mb portfolio print 2012-2016
4.4 mb portfolio print 2012-2016
 
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.
Story, Sci-Fi & Transmedia to develop Corporate Technology Strategies.
 
Big Idea: FIction & Non-Fiction
Big Idea: FIction & Non-FictionBig Idea: FIction & Non-Fiction
Big Idea: FIction & Non-Fiction
 
Hamburg 2012
Hamburg 2012Hamburg 2012
Hamburg 2012
 
Spotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda ArchitectureSpotify's Music Recommendations Lambda Architecture
Spotify's Music Recommendations Lambda Architecture
 
When Brands Attack
When Brands AttackWhen Brands Attack
When Brands Attack
 
Nature, Architecture And Music (V M )
Nature, Architecture And Music (V M )Nature, Architecture And Music (V M )
Nature, Architecture And Music (V M )
 
exploring architecture and music
exploring architecture and musicexploring architecture and music
exploring architecture and music
 
Structural Project 1 Group 1
Structural Project 1 Group 1Structural Project 1 Group 1
Structural Project 1 Group 1
 
Urban Design
Urban DesignUrban Design
Urban Design
 
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...
But Today We Collect Bullshit: Architecture and Storytelling in the Age of So...
 
Architectural structures world Wide
Architectural structures   world WideArchitectural structures   world Wide
Architectural structures world Wide
 
Notaable Buildings Around The World
Notaable Buildings Around The WorldNotaable Buildings Around The World
Notaable Buildings Around The World
 

Similar to The Evolution of Hadoop at Spotify - Through Failures and Pain

Unleash your cluster with YARN
Unleash your cluster with YARNUnleash your cluster with YARN
Unleash your cluster with YARNFerran Galí Reniu
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran Galí Reniu
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...DataStax
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & HadoopJeffrey Breen
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Toshihiro Suzuki
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoopfvanvollenhoven
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparktrihug
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an exampleNikita Kesharwani
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
RMLL 2013 : Build Your Personal Search Engine using Crawlzilla
RMLL 2013 : Build Your Personal Search Engine using CrawlzillaRMLL 2013 : Build Your Personal Search Engine using Crawlzilla
RMLL 2013 : Build Your Personal Search Engine using CrawlzillaJazz Yao-Tsung Wang
 
Druid at Strata Conf NY 2016.pdf
Druid at Strata Conf NY 2016.pdfDruid at Strata Conf NY 2016.pdf
Druid at Strata Conf NY 2016.pdfHimanshuGupta936
 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologySagi Brody
 

Similar to The Evolution of Hadoop at Spotify - Through Failures and Pain (20)

Unleash your cluster with YARN
Unleash your cluster with YARNUnleash your cluster with YARN
Unleash your cluster with YARN
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
Hecuba2: Cassandra Operations Made Easy (Radovan Zvoncek, Spotify) | C* Summi...
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Getting started with R & Hadoop
Getting started with R & HadoopGetting started with R & Hadoop
Getting started with R & Hadoop
 
Hadoop bootcamp getting started
Hadoop bootcamp getting startedHadoop bootcamp getting started
Hadoop bootcamp getting started
 
Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤Amebaサービスのログ解析基盤
Amebaサービスのログ解析基盤
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 
Hadoop installation with an example
Hadoop installation with an exampleHadoop installation with an example
Hadoop installation with an example
 
20080529dublinpt1
20080529dublinpt120080529dublinpt1
20080529dublinpt1
 
Backups
BackupsBackups
Backups
 
Big Data
Big DataBig Data
Big Data
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
RMLL 2013 : Build Your Personal Search Engine using Crawlzilla
RMLL 2013 : Build Your Personal Search Engine using CrawlzillaRMLL 2013 : Build Your Personal Search Engine using Crawlzilla
RMLL 2013 : Build Your Personal Search Engine using Crawlzilla
 
Druid at Strata Conf NY 2016.pdf
Druid at Strata Conf NY 2016.pdfDruid at Strata Conf NY 2016.pdf
Druid at Strata Conf NY 2016.pdf
 
#WeSpeakLinux Session
#WeSpeakLinux Session#WeSpeakLinux Session
#WeSpeakLinux Session
 
Apache hadoop
Apache hadoopApache hadoop
Apache hadoop
 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container Technology
 

Recently uploaded

Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
The Evolution of Hadoop at Spotify - Through Failures and Pain

  • 1. The Evolution of Hadoop at Spotify: Through Failures and Pain. Josh Baer (jbx@spotify.com), Rafal Wojdyla (rav@spotify.com). Note: Our views are our own and don't necessarily represent those of Spotify.
  • 2. Overview • Growing Pains (2009-2012) • Gaining Focus (2013-2014) • The Future (2015+)
  • 5. What is Spotify? • Music Streaming Service • Browse and Discover Millions of Songs, Artists and Albums • Launched in October 2008 • December 2014: 60 Million Monthly Users, 15 Million Paid Subscribers
  • 6. What is Spotify? • Data Infrastructure: • 1300 Hadoop Nodes • 42 PB Storage • 20 TB data ingested via Kafka/day • 200 TB generated by Hadoop/day
  • 7.
      select artist_id, count(1)
      from user_activities
      where play_seconds > 30
      group by artist_id;
  • 10.
      0 * * * *     spotify-core       hadoop jar hourly_import.jar
      15 * * * *    spotify-core       hadoop jar hourly_listeners.jar
      30 * * * *    spotify-analytics  hadoop jar user_funnel_hourly.jar
      * 1 * * *     spotify-core       hadoop jar daily_aggregate.jar
      * 2 * * *     spotify-core       hadoop jar calculate_royalties.jar
      */2 22 * * *  spotify-radio      hadoop jar generate_radio.jar
  • 12. Luigi handles the ‘plumbing’ for Hadoop jobs: https://github.com/spotify/luigi
  • 18.
      # sudo addgroup hadoop
      # sudo adduser --ingroup hadoop hdfs
      # sudo adduser --ingroup hadoop yarn
      # cp /tmp/configs/*.xml /etc/hadoop/conf/
      # apt-get update
      …
      [hdfs@sj-hadoop-b20 ~] $ apt-get install hadoop-hdfs-datanode
      …
      [yarn@sj-hadoop-b20 ~] $ apt-get install hadoop-yarn-nodemanager
  • 21.
      [data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data
      Found 3 items
      drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 lake
      drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 pond
      drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 ocean
      [data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data/lake
      Found 1 items
      drwxr-xr-x   - hdfs hadoop    1321451 2015-01-01 12:00 boats.txt
      [data-sci@sj-edge-a1 ~] $ hdfs dfs -cat /data/lake/boats.txt
      …
  • 24.
      $ time for i in {1..100}; do hadoop fs -ls / > /dev/null; done
      real  3m32.014s
      user  6m15.891s
      sys   0m18.821s
      $ time for i in {1..100}; do snakebite ls / > /dev/null; done
      real  0m34.760s
      user  0m29.962s
      sys   0m4.512s
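
The two timings on slide 24 are easier to compare as ratios. The sketch below only does arithmetic on the figures copied from the slide; the helper names (`to_seconds`, `cpu_seconds`) are made up for this illustration and are not part of snakebite or Hadoop.

```python
def to_seconds(stamp: str) -> float:
    """Convert a shell `time` stamp such as '3m32.014s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Figures copied verbatim from the slide (100 listings of "/" each).
hadoop_fs = {"real": "3m32.014s", "user": "6m15.891s", "sys": "0m18.821s"}
snakebite = {"real": "0m34.760s", "user": "0m29.962s", "sys": "0m4.512s"}


def cpu_seconds(run: dict) -> float:
    """Total CPU time: user plus sys."""
    return to_seconds(run["user"]) + to_seconds(run["sys"])


# Wall-clock and CPU ratios between the two clients.
real_speedup = to_seconds(hadoop_fs["real"]) / to_seconds(snakebite["real"])
cpu_speedup = cpu_seconds(hadoop_fs) / cpu_seconds(snakebite)

# Average wall-clock cost of a single listing (the loop ran 100 times).
per_call_hadoop = to_seconds(hadoop_fs["real"]) / 100
per_call_snakebite = to_seconds(snakebite["real"]) / 100

print(f"wall-clock speedup: {real_speedup:.1f}x")
print(f"CPU-time speedup:   {cpu_speedup:.1f}x")
print(f"per call: {per_call_hadoop:.2f}s vs {per_call_snakebite:.2f}s")
```

On these numbers snakebite is roughly six times faster in wall-clock time, and the gap in CPU time is larger still. The slide does not say where the difference comes from, so the ratios are best read as a measurement rather than an explanation.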
  • 26. Forming a team • In 2013, expanded to 200 nodes • Hadoop critical • Needed a team totally focused on it • Created a ‘squad’ with two missions: • Migrate to a new distribution with YARN • Make Hadoop reliable
  • 31. [Chart labels: Hadoop ownerless, Upgrades, Getting there, Squad]
  • 32. Alerting • Alert on service-level problems (e.g. no jobs running) • Keep your alarm channel clean. Beware of alert fatigue.
  • 33. Uhh ohh... I think I made a mistake
  • 34.
      [data-sci@sj-edge-a1 ~] $ snakebite rm -R /team/disco/ CF/test-10/
  • 35. Goodbye Data (1PB)
      [data-sci@sj-edge-a1 ~] $ snakebite rm -R /team/disco/ CF/test-10/
      OK: Deleted /team/disco
  • 36. Lessons Learned • “Sit on your hands before you type” - Wouter de Bie • Users will always want to retain data! • Remove superusers from ‘edgenodes’ • Moving to trash = client-side implementation
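
The stray space in the command on slides 34 and 35 turned one path into two arguments, and the first of them was the whole team directory. A minimal sketch of a pre-flight check that would have caught it is shown below. It is a hypothetical wrapper, not a snakebite or HDFS feature: `MIN_DEPTH`, `path_depth` and `check_delete_args` are names invented for this example, and the depth threshold is an assumed policy.

```python
import posixpath
import shlex

# Assumed policy for this sketch: never delete anything shallower than /a/b/c.
MIN_DEPTH = 3


def path_depth(path: str) -> int:
    """Number of components in an absolute path ('/team/disco/' has 2)."""
    normalized = posixpath.normpath(path)
    return len([part for part in normalized.split("/") if part])


def check_delete_args(paths):
    """Return the paths unchanged if all look safe, else raise ValueError."""
    for path in paths:
        if not path.startswith("/"):
            # A relative argument often means a space slipped into a path.
            raise ValueError(f"refusing relative path {path!r}")
        if path_depth(path) < MIN_DEPTH:
            raise ValueError(f"refusing shallow path {path!r}")
    return list(paths)


# The command from the slide, split the way a shell would split it.
argv = shlex.split("snakebite rm -R /team/disco/ CF/test-10/")
targets = argv[3:]  # ['/team/disco/', 'CF/test-10/']

try:
    check_delete_args(targets)
    verdict = "allowed"
except ValueError as err:
    verdict = f"blocked: {err}"

print(verdict)
```

With the intended single argument, `/team/disco/CF/test-10/`, the check passes. A guard like this complements the lessons on the slide (no superusers on edge nodes, and trash being a client-side behaviour) rather than replacing them.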
  • 38. Pre-Production Cluster • Same hardware profile as production cluster • Similar configuration • Staging environment • Reliable
  • 42. Apache Falcon • Features: • Data discovery • Lineage • Lifecycle management • More • We use it for data movement • Uses Oozie behind the scenes
  • 43. Improving Performance • Most of our jobs were Hadoop (Python) Streaming • Lots of failures, slow performance • Had to find a better way...
  • 44. Improving Performance • Investigated several frameworks • Selected Crunch: • Real types! • Higher level API • Easier to test • Better performance #JVM_FTW • *Dave Whiting’s analysis of systems: http://thewit.ch/scalding_crunchy_pig
  • 48. Note: Spotify user counts are based on publicly released numbers only
  • 49. Explosive Growth • Increased Spotify Users • Increased use cases • Increased Engineers
  • 54. Raynor • Data-discovery tool • Luigi Integration • Find and browse datasets • View schemas • Trace lineage • Open-source plans? :-(
  • 55. Two Takeaways • Automate Everything • More time to play FIFA build cool tools • Listen to your users • Fail fast, don’t be afraid to scrap work
  • 56. Join the Band! Engineers wanted in NYC & Stockholm http://spotify.com/jobs