PILOT HADOOP TOWARDS 2500 NODES AND CLUSTER REDUNDANCY
Stuart Pook (s.pook@criteo.com @StuartPook)
BROUGHT TO YOU BY LAKE
Anna Savarin, Anthony Rabier, Meriam Lachkar, Nicolas Fraison,
Rémy Saissy, Stuart Pook, Thierry Lefort & Yohan Bismuth
CRITEO
Online advertising
Target the right user
At the right time
With the right message
SIMPLIFIED BUSINESS MODEL
We buy
Ad spaces
We sell
Clicks — that convert — a lot
We take the risk
HADOOP AT CRITEO BACK IN 2014 (AM5)
1200 nodes
39 PB raw capacity
> 100 000 jobs/day
10 000 CPUs
HPE ProLiant Gen8
105 TB RAM
40 TB imported/day
Cloudera CDH4
TODAY'S PRIMARY DATA INPUT
Kafka
500 billion events per day
up to 4.3 million events/second
JSON → protobuf (see the sketch below)
72 hour buffers
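A minimal sketch of that conversion step, assuming a generic schema: the real pipeline would use typed messages generated from .proto files, so the Struct stand-in and the sample event below are purely illustrative.

```python
# Illustrative sketch: re-encode a JSON event as protobuf. A real pipeline
# would use a generated message class, not the generic Struct stand-in.
import json

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Struct

def json_event_to_protobuf(raw: bytes) -> bytes:
    """Parse one JSON event and re-serialise it as a protobuf blob."""
    msg = Struct()
    json_format.ParseDict(json.loads(raw), msg)
    return msg.SerializeToString()  # compact binary, smaller than the JSON

if __name__ == "__main__":
    sample = b'{"event": "click", "ts": 1500000000, "user": "u42"}'
    print(len(json_event_to_protobuf(sample)), "bytes of protobuf")
```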
PAID
COMPUTE AND DATA ARE ESSENTIAL
Extract, Transform & Load logs
Bidding models
Billing
Business analysis
HADOOP PROVIDES LOCAL REDUNDANCY
Failing datanodes (1 or 2)
Failing racks (1)
Failing namenodes (1)
Failing resourcemanager (1)
NO PROTECTION AGAINST
Data centre disaster
Multiple datanode failures in a short time
Multiple rack failures in a day
Operator error
DATA BACKUP IN THE CLOUD
Backups take a long time
Import 100 TB/day
Create 80 TB/day
Backup at 50 Gb/s?
Restoring 2 PB takes too long at 50 Gb/s (see the arithmetic below)
What about compute?
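The back-of-the-envelope arithmetic behind these numbers, assuming the 50 Gb/s link runs at a sustained line rate:

```python
# Restore time for 2 PB over a 50 Gb/s link at a sustained line rate.
restore_seconds = (2e15 * 8) / 50e9           # bits to move / bits per second
print(f"{restore_seconds / 86400:.1f} days")  # ~3.7 days, before any overhead

# And the daily churn alone (100 TB imported + 80 TB created) would keep
# that same link busy for about 8 hours every day just for the backup.
print(f"{(180e12 * 8) / 50e9 / 3600:.0f} hours/day")
```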
COMPUTE IN THE CLOUD
20 000 CPUs require reservation
Reservation expensive
No need for elasticity (batch processing)
Criteo has data centres
Criteo likes bare metal
Cloud > 8 times more expensive
In-house gets us exactly the network & hardware we need
BUILD NEW DATA CENTRE (PA4) IN 9 MONTHS
Space for 5000 machines
Non-blocking network
10 Gb/s endpoints
Level 3 routing
Clos topology
Power 1 megawatt + option for 1 MW
It's impossible
NEW HARDWARE
Had one supplier
Need competition to keep prices down
3 replies to our call for tenders
3 similar 2U machines
16 (or 12) 6 TB SATA disks
2 Xeon E5-2650L v3, 24 cores, 48 threads
256 GB RAM
Mellanox 10 Gb/s network card
2 different RAID cards
TEST THE HARDWARE
Three 10 node clusters
Disk bandwidth
Disk errors?
Network bandwidth
Teragen
Zero replication (disks)
High replication (network)
Terasort
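A sketch of these benchmarks, assuming a standard Hadoop install; the jar path, row count and HDFS paths are illustrative. "Zero replication" means dfs.replication=1 (blocks stay on the writing node, stressing the disks), while a high replication factor copies every block across the fabric, stressing the network:

```python
# Minimal benchmark driver using the stock MapReduce examples jar;
# paths, sizes and replication factors are illustrative.
import subprocess

EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"
ROWS = 10_000_000_000  # teragen rows are 100 bytes each, so ~1 TB here

def run(*args: str) -> None:
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, *args], check=True)

# Replication 1: every block stays on the writing node, so this mostly
# measures local disk write bandwidth.
run("teragen", "-Ddfs.replication=1", str(ROWS), "/bench/teragen-disk")

# High replication: every block is copied many times across the cluster,
# so this mostly measures the network (and the non-blocking Clos claim).
run("teragen", "-Ddfs.replication=10", str(ROWS), "/bench/teragen-net")

# Terasort exercises disks, network and shuffle together.
run("terasort", "/bench/teragen-disk", "/bench/terasort-out")
```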
HARDWARE IS SIMILAR
Eliminate the manufacturer with
4 DOA disks
Other failed disks
20% higher power consumption
Choose the cheapest and most dense
Huawei
LSI-3008 RAID card
MIX THE HARDWARE
Operations are more difficult
Multiple configurations needed
Some clusters have both hardware types
We have more choice at each order
Avoid vendor lock-in
HAVE THE DC, BUILD HADOOP
Configure using Chef
Infrastructure is code in git
Automate to scale (because the team doesn't)
Test Hadoop with 10 hour petasorts
Tune Hadoop for this scale
Namenode machine crashes
Upgrade kernel
Rolling restart on all machines
Rack by rack, not node by node (sketched below)
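A sketch of that rack-by-rack choreography, with hypothetical host lists and helpers; the real orchestration is Chef-driven, this only shows the ordering and the health gate. Restarting a whole rack at once is safe because HDFS places replicas on at least two racks:

```python
# Rack-by-rack rolling restart sketch; transport and service names are
# assumptions (CDH-style packaging shown).
import subprocess
import time

def hdfs_is_healthy() -> bool:
    """Crude health gate: continue only once re-replication has settled."""
    report = subprocess.run(["hdfs", "dfsadmin", "-report"],
                            capture_output=True, text=True, check=True).stdout
    return "Under replicated blocks: 0" in report

def restart_datanode(host: str) -> None:
    # Hypothetical transport; could equally be knife ssh or a Chef run.
    subprocess.run(["ssh", host, "sudo", "service",
                    "hadoop-hdfs-datanode", "restart"], check=True)

def rolling_restart(racks: dict[str, list[str]]) -> None:
    # One rack at a time: with >=2 racks per block, no block goes missing.
    for rack, hosts in racks.items():
        for host in hosts:
            restart_datanode(host)
        while not hdfs_is_healthy():   # wait before touching the next rack
            time.sleep(30)
```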
A MISTAKE
Made master and datanodes the same
Just one type of machine
Many spare nodes in case of failure
Moving services with Kerberos is hard
Move master nodes to bigger machines
Offload DNS & KDC
FASTER MASTER NODES
Namenode does many sequential operations
Long locks
Failovers too slow
Heartbeats lost
→ fast CPU
Two big namenodes
512 GB RAM
2 × Intel Xeon CPU E5-2643 v4 @ 3.40GHz
3 × RAID 1 of 2 SSD
PA4 CLUSTER ONLINE
More capacity now that we have 2 clusters
Users love it
Users find new uses
Soon using more capacity than the new cluster provides
Impossible to stop the old cluster
SITUATION NOW WORSE
Two critical clusters
Two Hadoop versions to support: CDH4 & CDH5
Not one but two SPOFs
GROW THE NEW DATA CENTRE
Add hundreds of new nodes
Soon the new cluster will be big enough
1 370 datanodes
+ 650 in Q3 2017
+ 900 in Q4 2017
~ 3 000 datanodes end 2017
Too many blocks for the namenode?
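One way to size that question, using the folk rule of thumb of roughly 150 bytes of namenode heap per namespace object (file, directory or block); the object counts below are illustrative, not Criteo's:

```python
# Rough namenode heap estimate from the folklore figure of ~150 bytes of
# heap per namespace object; real usage varies with path lengths etc.
BYTES_PER_OBJECT = 150

def heap_gb(files_millions: float, blocks_millions: float) -> float:
    objects = (files_millions + blocks_millions) * 1e6
    return objects * BYTES_PER_OBJECT / 2**30

# e.g. 400M files + 500M blocks -> ~126 GB, the order of magnitude at
# which the 132 GB heap described later in this deck filled up.
print(f"{heap_gb(400, 500):.0f} GB")
```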
ONE SPOF IS ENOUGH
Move all jobs and data to the new cluster (PA4)
Stop the old cluster (AM5)
Only one SPOF but still no redundancy
Have file backups on different technology
2018: BUILD ANOTHER CLUSTER
Human users (development, machine-learning, BI)
QA & non-regression for service jobs
All data for service jobs
➡ PA4 backup for service jobs
But RAM is expensive
ANOTHER ROUND OF TENDERS
Need more CPU
Denser machines
4U, 8 nodes, 16 CPUs, 4 × 8 × 2.5" disks (8/U)
2U, 4 nodes, 8 CPUs, 6 × 4 × 2.5" disks (12/U)
Infrastructure validation
Hadoop tests and benchmarks
THE RAID CONTROLLER STORY
FIRST RAID CONTROLLER
Historically HPE Smart Array P420 (PMC-Sierra)
Only RAID 0 volumes used at Criteo
OS status = RAID card status
Skip volumes flagged as bad by the RAID card
Very rare cases of fsck failures
Filesystems mounted with errors=remount-ro
Very rare cases of unflagged read-only filesystems
Access to the volumes blocked
Assumed to be standard behaviour
Operations need RAID card error flag
Out-of-band status → Jira ticket
Identification LED → disk swap
SECOND RAID CONTROLLER: LSI-3008
Disks vanished
No diagnostic LED
Used ID LED on other disks
Later tested OK
Worked after power cycle
Change 700 cards on a running cluster
All blocks lost on each card change
No downtime allowed
Rack by rack
Many HDFS stability problems
THIRD RAID CONTROLLER: LSI-3108
LSI RAID card
Now the OS flags bad disks before the RAID card does
Failing fsck
Read-only filesystems
Volume seen as OK by the RAID card
OS can access the volume but gets timeouts
No error for out of band monitoring
No error for in-band monitoring
We can only handle OK or Failed volumes
2 SOLUTIONS WITH VENDOR LOCK-IN
Buy all machines from HPE
Get the supplier to “solve” the problem for us
Agent running in the controller
They develop in China
We debug in France (in prod)
2 COMPLICATED SOLUTIONS
Create a RAID0 team (see the sketch below) to
Handle all error conditions
Stop access to the volume
Reformat (once) volume with read errors
Open tickets
Set identification LED
Work with LSI to tweak their controller
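A sketch of what such a RAID0 agent has to do, with hypothetical helpers; the storcli locate call is an assumption and is controller-specific:

```python
# Sketch of a volume-health agent: catch volumes the RAID card still
# reports OK but that the OS has quietly remounted read-only.
import subprocess

def mounted_read_only(mountpoint: str) -> bool:
    """errors=remount-ro silently turns a failing ext4 volume read-only."""
    with open("/proc/mounts") as mounts:
        for line in mounts:
            device, mnt, fstype, opts = line.split()[:4]
            if mnt == mountpoint and "ro" in opts.split(","):
                return True
    return False

def open_ticket(summary: str) -> None:
    print("would open a Jira ticket:", summary)  # stub for the real client

def handle_bad_volume(mountpoint: str, slot: str) -> None:
    # Stop HDFS access to the volume (lazy unmount once it is drained).
    subprocess.run(["umount", "-l", mountpoint], check=False)
    # Light the identification LED so the right disk gets swapped; the
    # storcli invocation is an assumption, syntax varies per controller.
    subprocess.run(["storcli64", slot, "start", "locate"], check=False)
    open_ticket(f"{mountpoint} read-only but RAID card reports OK")
```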
STAYING ALIVE
Automate operations
Disk changes
Ramp-ups
Infrastructure as code
Tests: kitchen, ChefSpec & preprod
Merge requests with reviews
Choreograph operations
Restarts (machines or services)
NEED LAKE CLUSTER
Infrastructure validation
Test configuration recipes
Test Hadoop patches
And lab for hardware tests
STAYING ALIVE — TUNE HADOOP
Increase bandwidth and time limit for checkpoint
332 GB heap for namenode
180 GB reserved for native code & OS
Tune GC for namenode
Serial → not efficient on multi-thread
Parallel → long pauses + high throughput
Concurrent Mark Sweep → short pauses + lower throughput
G1 → in prod (options combined in the example below)
G1NewSizePercent=0
G1RSetRegionEntries=4096
+ParallelRefProcEnabled & +PerfDisableSharedMem
Azul not required for 1300 datanodes, but for 3000?
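As a concrete example, the options above might combine as follows in hadoop-env.sh; the heap figures are the ones quoted on this slide, and +UnlockExperimentalVMOptions is an assumption (G1NewSizePercent is experimental on JDK 8):

```python
# Render the namenode JVM options quoted above as one hadoop-env.sh line.
# Heap sizes are from this deck; UnlockExperimentalVMOptions is assumed.
NAMENODE_OPTS = " ".join([
    "-Xms332g", "-Xmx332g",          # 332 GB heap, rest left to OS & native
    "-XX:+UseG1GC",
    "-XX:+UnlockExperimentalVMOptions",
    "-XX:G1NewSizePercent=0",        # let the young gen shrink to cap pauses
    "-XX:G1RSetRegionEntries=4096",  # bigger remembered sets, less coarsening
    "-XX:+ParallelRefProcEnabled",   # process references with many threads
    "-XX:+PerfDisableSharedMem",     # avoid hsperfdata mmap write stalls
])
print(f'export HADOOP_NAMENODE_OPTS="{NAMENODE_OPTS} $HADOOP_NAMENODE_OPTS"')
```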
STAYING ALIVE — FIX BUGS
The cluster crashes: find the bug; if already fixed upstream, backport the fix, otherwise fix it ourselves
Fix
HDFS-10220 Expired leases make namenode unresponsive and failover
Backport
YARN-4041 Slow delegation token renewal prolongs RM recovery
HDFS-9305 Delayed heartbeat processing causes storm of heartbeats
YARN-4546 ResourceManager crash due to scheduling opportunity overflow
HDFS-9906 Remove spammy log spew when a datanode is restarted
STAYING ALIVE — MONITORING
HDFS
Namenode: missing blocks, GC, checkpoints, safemode, QPS, live datanodes
Datanodes: disks, read/write throughput, space
YARN
Queue length, memory & CPU usage, job duration (scheduling + run time)
ResourceManager: QPS
Bad nodes
Probes to emulate client behavior with witness jobs (see the sketch below)
Zookeeper: availability, probes
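One such probe as a sketch, using the namenode's standard /jmx HTTP endpoint; the host and thresholds are illustrative (50070 was the default namenode HTTP port in the CDH4/CDH5 era):

```python
# Probe sketch: read namenode health counters over the /jmx endpoint.
import json
from urllib.request import urlopen

NAMENODE = "http://namenode.example.com:50070"   # illustrative host

def fsnamesystem() -> dict:
    qry = "Hadoop:service=NameNode,name=FSNamesystem"
    with urlopen(f"{NAMENODE}/jmx?qry={qry}") as resp:
        return json.load(resp)["beans"][0]

def alerts() -> list[str]:
    m = fsnamesystem()
    found = []
    if m["MissingBlocks"]:
        found.append(f"missing blocks: {m['MissingBlocks']}")
    if m["UnderReplicatedBlocks"] > 100_000:     # illustrative threshold
        found.append(f"under-replicated: {m['UnderReplicatedBlocks']}")
    return found
```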
CLUSTER COLLAPSE
Lots of blocks → 132 GB namenode (NN) heap full
User creates 20 million files & 20 PB of data on a Friday afternoon
NN gets stuck doing GC → no throughput
Increase standby heap size to 85% of RAM via restart
Too many requests during restart (iptables)
Failover crashed
Fsimage on the active corrupted as it was too big to transfer
Copy missing NN edits from a journal node
Restart 1200 datanodes in batches
36 hours to recover the cluster
RESOURCEMANAGER SLOWS
Event EventType: KILL_CONTAINER sent to absent container
These messages happen occasionally
Almost no jobs running (8% capacity used)
Need to kill the applications
During NodeManager’s resync with the ResourceManager?
NEED A SERVICE-LEVEL AGREEMENT (SLA)
Define
Time for operations
Job duration
Request handling
Measure
Monitoring
Respect
Some services are “best effort”
OPERATOR ERROR
Same operators on both clusters
One Chef server for both clusters
Single mistake → both clusters
WE HAVE
2 prod clusters
2 pre-prod clusters
1 infrastructure cluster
2 running CDH4
3 running CDH5
2682 datanodes
49 248 cores
135 PB disk space
842 TB RAM
> 300 000 jobs/day
100 TB imported daily
6 PB created or read per day
UPCOMING CHALLENGES
Optimize and fix Hadoop
Add hundreds more datanodes
Create a new bare-metal data centre
Make 2 big clusters work together
Improve scheduling
We are hiring
Come and join us in Paris, Palo Alto or Ann Arbor
s.pook@criteo.com @StuartPook
Questions?
