Hadoop Distributed File System Reliability and Durability at Facebook
1. I accidentally the Namenode
HDFS reliability at Facebook
Andrew Ryan
Facebook
April 2012
2. The HDFS Namenode: SPOF by design
▪ Single Point of Failure by design
▪ All metadata operations go through the Namenode
▪ Early designers made tradeoffs: features & performance first

[Diagram: simplified HDFS architecture (Clients, Datanodes, Secondary Namenode) with the Namenode as SPOF]
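To make the dependency concrete, here is a minimal sketch (not from the talk) using the 0.20-era FileSystem API; the hostname is a placeholder. Every metadata call below is an RPC to the single Namenode, so all of them fail while it is down, even though the file data itself lives on many Datanodes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeDependency {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name (the 0.20-era key) points at the one Namenode.
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/logs/2012/04"));            // Namenode RPC
        fs.listStatus(new Path("/logs"));                // Namenode RPC
        fs.rename(new Path("/tmp/in"), new Path("/in")); // Namenode RPC
    }
}
```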
3. HDFS major use cases at Facebook: Data Warehouse and Facebook Messages

| | Data Warehouse | Facebook Messages |
|---|---|---|
| # of clusters | <10 | 10's |
| Size of clusters | Large (100's – 1000's of nodes) | Small (~100 nodes) |
| Processing workload | MapReduce batch jobs | HBase transactions |
| Namenode load | Very heavy | Very light |
| End-user downtime impact | None | Users without Messages |
4. HDFS at Facebook: 2009-2012
Some things have changed…
| | 2009 | 2012 |
|---|---|---|
| # HDFS clusters | 1 | >100 |
| Largest HDFS cluster size (storage) | 600 TB | >100 PB |
| Largest HDFS cluster size (# files) | 10 million | 200 million |
| HDFS cluster types | MapReduce | MapReduce, HBase, MySQL backups, +more |
5. HDFS at Facebook: 2009-2012
…and some things have not
| | 2009 | 2012 |
|---|---|---|
| Single points of failure in HDFS | Namenode | Namenode |
| HDFS cluster restart time | 60 minutes | 60 minutes |
| Namenode failover method | Manual, complicated | Manual, complicated |
| SPOF Namenode as a cause of downtime | Unknown | Unknown |
6. Data Warehouse
▪ Storage and querying of structured log data using Hive and Hadoop MapReduce
▪ Composed of dozens of tools/components
▪ A “vigorous and creative” user population

[Diagram: Data Warehouse stack: UI Tools, Workflow (Nocron), Query (Hive), Compute (MapReduce), Storage (HDFS), all built on Hadoop]
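As an illustration of the “querying” layer, the sketch below runs a Hive aggregation over HDFS-resident log data through Hive's 0.x (HiveServer1-era) JDBC interface; the host, database, and table names are made up for the example, and this is not Facebook's actual tooling:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveLogQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer1-era JDBC driver.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive-server.example.com:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // Hive compiles this into MapReduce jobs that scan files in HDFS.
        ResultSet rs = stmt.executeQuery(
                "SELECT dt, COUNT(*) FROM page_views GROUP BY dt");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```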
9. Facebook Messages
[Diagram: Facebook Messages architecture: Clients (www, chat, MTA, etc.), User Directory Service, a Messages Cell (Application Server, HBase/HDFS/ZK, Haystack), and mail infrastructure (Anti-spam, Outbound Mail, Mail Servers)]
15. AvatarNode is…
▪ A two-node, highly available Namenode with manual failover
▪ In production today at Facebook
▪ Open-sourced, based on Hadoop 0.20:
https://github.com/facebook/hadoop-20
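Since failover relies on ZooKeeper (noted on the next slide), here is a purely hypothetical sketch of how a client-side wrapper could look up the current primary; the znode path and data format are assumptions for illustration, not the actual layout used by facebook/hadoop-20:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PrimaryLookup {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                new Watcher() {
                    public void process(WatchedEvent event) { /* no-op */ }
                });
        // Assumed znode holding the address of the current primary AvatarNode.
        byte[] data = zk.getData("/avatarnode/primary", false, null);
        System.out.println("Active Namenode: " + new String(data, "UTF-8"));
        zk.close();
    }
}
```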
16. AvatarNode does not…
▪ Eliminate the dependency on shared storage for image/edits
▪ Provide instant failover (~1 second per million blocks+files)
▪ Provide automated failover
▪ Guarantee I/O fencing for Primary/Standby (although precautions are taken)
▪ Require ZooKeeper during normal operation (it is required for failover)
▪ Allow for >2 Namenodes to participate in an HA cluster
▪ Have any special network requirements
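The “~1 second per million blocks+files” figure implies failover time grows linearly with namespace size. A back-of-envelope calculation, using hypothetical counts rather than Facebook's real numbers:

```java
public class FailoverEstimate {
    public static void main(String[] args) {
        // Hypothetical namespace: 100M files plus 150M blocks.
        long files = 100000000L;
        long blocks = 150000000L;
        // Rule of thumb from the slide: ~1 second per million blocks+files.
        double seconds = (files + blocks) / 1000000.0;
        System.out.printf("Estimated failover: ~%.0f seconds%n", seconds);
    }
}
```

By that estimate, a 250-million-object namespace fails over in roughly four minutes: far better than a 60-minute restart, but not instant.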
17. Wrapping up…
▪ The SPOF Namenode is a weak link in HDFS’s design
▪ In our services which use HDFS, we estimate that eliminating it would remove:
  ▪ 10% of service downtime from unscheduled outages
  ▪ 20-50% of downtime from scheduled maintenance
▪ AvatarNode is Facebook’s solution for 0.20, available today
▪ Other Namenode HA solutions are being worked on in HDFS trunk (HDFS-1623)