Hadoop Distributed File System Reliability and Durability at Facebook
1. I accidentally the Namenode
HDFS reliability at Facebook
Andrew Ryan
Facebook
April 2012
2. The HDFS Namenode: SPOF by design
▪ Single Point of Failure by design
▪ All metadata operations go through the Namenode
▪ Early designers made tradeoffs: features & performance first

[Diagram: simplified HDFS architecture (Clients, Datanodes, Secondary Namenode) with the Namenode as SPOF]
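To make the dependency concrete, here is a minimal sketch (not from the talk) using the 0.20-era FileSystem API; the hostname is a placeholder. Every metadata call below is an RPC to the single Namenode, so all of them fail while it is down, even though the file data itself lives on many Datanodes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeDependency {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.default.name (the 0.20-era key) points at the one Namenode.
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
        FileSystem fs = FileSystem.get(conf);

        fs.mkdirs(new Path("/logs/2012/04"));            // Namenode RPC
        fs.listStatus(new Path("/logs"));                // Namenode RPC
        fs.rename(new Path("/tmp/in"), new Path("/in")); // Namenode RPC
    }
}
```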
3. HDFS major use cases at Facebook: Data Warehouse and Facebook Messages

| | Data Warehouse | Facebook Messages |
|---|---|---|
| # of clusters | <10 | 10's |
| Size of clusters | Large (100's – 1000's of nodes) | Small (~100 nodes) |
| Processing workload | MapReduce batch jobs | HBase transactions |
| Namenode load | Very heavy | Very light |
| End-user downtime impact | None | Users without Messages |
4. HDFS at Facebook: 2009-2012
Some things have changed…
| | 2009 | 2012 |
|---|---|---|
| # HDFS clusters | 1 | >100 |
| Largest HDFS cluster size (storage) | 600 TB | >100 PB |
| Largest HDFS cluster size (# files) | 10 million | 200 million |
| HDFS cluster types | MapReduce | MapReduce, HBase, MySQL backups, +more |
5. HDFS at Facebook: 2009-2012
…and some things have not
| | 2009 | 2012 |
|---|---|---|
| Single points of failure in HDFS | Namenode | Namenode |
| HDFS cluster restart time | 60 minutes | 60 minutes |
| Namenode failover method | Manual, complicated | Manual, complicated |
| SPOF Namenode as a cause of downtime | Unknown | Unknown |
6. Data Warehouse
▪ Storage and querying of structured log data using Hive and Hadoop MapReduce
▪ Composed of dozens of tools/components
▪ A “vigorous and creative” user population

[Diagram: Data Warehouse stack: UI Tools, Workflow (Nocron), Query (Hive), Compute (MapReduce), Storage (HDFS), all built on Hadoop]
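As an illustration of the “querying” layer, the sketch below runs a Hive aggregation over HDFS-resident log data through Hive's 0.x (HiveServer1-era) JDBC interface; the host, database, and table names are made up for the example, and this is not Facebook's actual tooling:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveLogQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer1-era JDBC driver.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive-server.example.com:10000/default", "", "");
        Statement stmt = conn.createStatement();
        // Hive compiles this into MapReduce jobs that scan files in HDFS.
        ResultSet rs = stmt.executeQuery(
                "SELECT dt, COUNT(*) FROM page_views GROUP BY dt");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
    }
}
```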
9. Facebook Messages
[Diagram: Facebook Messages architecture: Clients (www, chat, MTA, etc.), User Directory Service, a Messages Cell (Application Server, HBase/HDFS/ZK, Haystack), and mail infrastructure (Anti-spam, Outbound Mail, Mail Servers)]
15. AvatarNode is…
▪ A two-node, highly available Namenode with manual failover
▪ In production today at Facebook
▪ Open-sourced, based on Hadoop 0.20:
https://github.com/facebook/hadoop-20
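Since failover relies on ZooKeeper (noted on the next slide), here is a purely hypothetical sketch of how a client-side wrapper could look up the current primary; the znode path and data format are assumptions for illustration, not the actual layout used by facebook/hadoop-20:

```java
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class PrimaryLookup {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                new Watcher() {
                    public void process(WatchedEvent event) { /* no-op */ }
                });
        // Assumed znode holding the address of the current primary AvatarNode.
        byte[] data = zk.getData("/avatarnode/primary", false, null);
        System.out.println("Active Namenode: " + new String(data, "UTF-8"));
        zk.close();
    }
}
```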
16. AvatarNode does not…
▪ Eliminate the dependency on shared storage for image/edits
▪ Provide instant failover (~1 second per million blocks+files)
▪ Provide automated failover
▪ Guarantee I/O fencing for Primary/Standby (although precautions are taken)
▪ Require ZooKeeper during normal operation (it is required for failover)
▪ Allow for >2 Namenodes to participate in an HA cluster
▪ Have any special network requirements
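The “~1 second per million blocks+files” figure implies failover time grows linearly with namespace size. A back-of-envelope calculation, using hypothetical counts rather than Facebook's real numbers:

```java
public class FailoverEstimate {
    public static void main(String[] args) {
        // Hypothetical namespace: 100M files plus 150M blocks.
        long files = 100000000L;
        long blocks = 150000000L;
        // Rule of thumb from the slide: ~1 second per million blocks+files.
        double seconds = (files + blocks) / 1000000.0;
        System.out.printf("Estimated failover: ~%.0f seconds%n", seconds);
    }
}
```

By that estimate, a 250-million-object namespace fails over in roughly four minutes: far better than a 60-minute restart, but not instant.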
17. Wrapping up…
▪ The SPOF Namenode is a weak link in HDFS’s design
▪ In our services which use HDFS, we estimate that eliminating it would remove:
  ▪ 10% of service downtime from unscheduled outages
  ▪ 20-50% of downtime from scheduled maintenance
▪ AvatarNode is Facebook’s solution for 0.20, available today
▪ Other Namenode HA solutions are being worked on in HDFS trunk (HDFS-1623)