© Hortonworks Inc. 2017
Scaling HDFS to Manage Billions of Files
with Distributed Storage Schemes
Jing Zhao
Tsz-Wo Nicholas Sze
June 14, 2017
About Us
• Tsz-Wo Nicholas Sze, Ph.D.
– Software Engineer at Hortonworks
– PMC member/Committer of Apache Hadoop
– Active contributor and committer of Apache Ratis
– Ph.D. from University of Maryland, College Park
– MPhil & BEng from Hong Kong University of Sci & Tech
Architecting the Future of Big Data
• Jing Zhao, Ph.D.
– Software Engineer at Hortonworks
– PMC member/Committer of Apache Hadoop
– Active contributor and committer of Apache Ratis
– Ph.D. from University of Southern California
– B.S. from Tsinghua University, Beijing
Outline
• Current HDFS Architecture
• Namespace Scaling
• Storage Container Architecture
– Storage Containers
– Next Generation HDFS
– Ozone – Hadoop Object Store
– cBlock
• Current Development Status
Current HDFS Architecture
HDFS Architecture
• Namenode
– Namespace Tree: File Path → Block IDs
– Block Map: Block ID → Block Locations
– Receives Heartbeats & Block Reports from Datanodes
• Datanodes
– Block ID → Data
– Horizontally scale IO and storage
• Layers: Namespace, Block Storage
[Diagram: blocks b1, b2, b3, b5 replicated across the datanodes]
HDFS Layering
• Namespace layer: namenodes NN-1 … NN-k … NN-n, serving namespaces NS1 … NS k and a foreign NS n
• Block storage layer: block pools Pool 1 … Pool k … Pool n
• Common storage: datanodes DN 1, DN 2, … DN m
Scalability – What HDFS Does Well?
• HDFS NN stores all metadata in memory
– Scales to large clusters (5k nodes) since all metadata is in memory
• 60K-100K tasks (large # of parallel ops) can share Namenode
• Low latency
• Large data if files are large
– Proof points of large data and large clusters
• Single Organizations have over 600PB in HDFS
• Single clusters with over 200PB using federation
Metadata in memory is the strength of the original GFS and HDFS design, but it is also a weakness when scaling the number of files and blocks.
Scalability – The Challenges
• Large number of files (> 350 million)
– The files may be small in size.
– NN’s strength has become a limitation
• Number of file operations
– Need to improve concurrency – move to multiple name servers
• HDFS Federation is the current solution
– Add NameNodes to scale number of files & operations
– Deployed at Twitter
• 5000+ node cluster with three NameNodes (plans to grow to 10,000 nodes)
– Backported and used at Facebook to scale HDFS
Scalability – Large Number of Blocks
• Block report processing
– Datanode block reports also become huge
– Processing them takes a long time
[Diagram: datanodes, each holding several block replicas (b1, b2, b3, b5), send heartbeats and block reports to the Namenode]
Namespace Scaling
Partial Namespace in Memory
• Use a key-value store to represent the namespace tree
– Every INode has a unique id.
– Map: id -> INode
– Map: (Parent id, child name) -> child id
• Keep only the working set in memory
– Keep part of the namespace in memory and part of it on disk
– Various caching strategies
• LRU, caching hot directories, etc.
• LevelDB
– A fast key-value store
– Used in a prototype of partial namespace implementation
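The two maps and the working-set cache described on this slide can be sketched as follows. This is a minimal illustration, not the prototype's code: plain dictionaries stand in for a key-value store such as LevelDB, and the names `KVNamespace`, `ROOT_ID`, `add`, `get_inode`, and `resolve` are hypothetical.

```python
from collections import OrderedDict

ROOT_ID = 1  # assumed id of the root directory


class KVNamespace:
    """Namespace tree kept as two key-value maps, fronted by an LRU cache."""

    def __init__(self, cache_size=2):
        # Map: id -> INode (a dict stands in for the on-disk key-value store).
        self.inodes = {ROOT_ID: {"name": "/", "is_dir": True}}
        # Map: (parent id, child name) -> child id.
        self.children = {}
        # In-memory working set: id -> INode, ordered by recency of use.
        self.cache = OrderedDict()
        self.cache_size = cache_size
        self.next_id = ROOT_ID + 1

    def add(self, parent_id, name, is_dir=False):
        """Create a child INode under parent_id and return its id."""
        inode_id = self.next_id
        self.next_id += 1
        self.inodes[inode_id] = {"name": name, "is_dir": is_dir}
        self.children[(parent_id, name)] = inode_id
        return inode_id

    def get_inode(self, inode_id):
        """Return an INode, loading it into the LRU cache on a miss."""
        if inode_id in self.cache:
            self.cache.move_to_end(inode_id)  # mark as most recently used
            return self.cache[inode_id]
        inode = self.inodes[inode_id]  # stands in for a disk read
        self.cache[inode_id] = inode
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used
        return inode

    def resolve(self, path):
        """Walk the path one component at a time through the child map."""
        inode_id = ROOT_ID
        for name in [part for part in path.split("/") if part]:
            inode_id = self.children[(inode_id, name)]
        return inode_id, self.get_inode(inode_id)
```

Resolving a path costs one lookup in the child map per path component, so no tree structure has to be resident in memory; only the INodes that are actually touched enter the cache.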
Partial Namespace in Memory
• Has been prototyped
– Benchmarks show that the model works well
– Most file systems keep only a partial namespace in memory, but not at this scale
• Hence cache replacement policies for the working set are important
• In Big Data, only the last 3, 6, or 12 months of five to ten years of data are actively used, so the working set is small
• Work in progress to get it into HDFS
• Partial Namespace has other benefits
– Faster NN start up – load-in the working set as needed
– Partial Namespace in Memory will allow multiple namespace volumes
Previous Talks on Partial Namespace
• Evolving HDFS to a Generalized Storage Subsystem
– Sanjay Radia, Jitendra Pandey (@Hortonworks)
– Hadoop Summit 2016
• Scaling HDFS to Manage Billions of Files with Key Value Stores
– Haohui Mai, Jing Zhao (@Hortonworks)
– Hadoop Summit 2015
• Removing the NameNode's memory limitation
– Lin Xiao (Ph.D. student @CMU, intern @Hortonworks)
– Hadoop User Group 2013
Container Architecture
Containers
• Storage Container – a storage unit
• Local block map
– Map block IDs to local block locations
• Small in size
– 4GB or 32GB (configurable)
[Diagram: storage containers c1 (blocks b1, b3, b6) and c2 (blocks b2, b7, b8), each with its own block map]
Distributed Block Map
• The block map is moved from the namenode to datanodes
– The block map becomes distributed
– Entire container is replicated
– A datanode has multiple containers
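A container with its own local block map, replicated as a unit onto several datanodes, might look like the sketch below. The on-disk layout (blocks appended to one byte buffer) and the names `Container`, `Datanode`, `put_block`, `get_block`, and `add_replica` are assumptions made for illustration; the slides do not specify them.

```python
import copy


class Container:
    """A storage unit that carries its own block map."""

    def __init__(self, container_id):
        self.container_id = container_id
        # Local block map: block id -> (offset, length) inside this container.
        self.block_map = {}
        self.data = bytearray()

    def put_block(self, block_id, payload):
        """Append a block and record where it lives in the local block map."""
        self.block_map[block_id] = (len(self.data), len(payload))
        self.data.extend(payload)

    def get_block(self, block_id):
        """Look the block up locally; no namenode is involved."""
        offset, length = self.block_map[block_id]
        return bytes(self.data[offset:offset + length])


class Datanode:
    """A datanode holds multiple containers."""

    def __init__(self, name):
        self.name = name
        self.containers = {}  # container id -> Container

    def add_replica(self, container):
        # The entire container is copied, block map included.
        self.containers[container.container_id] = copy.deepcopy(container)
```

Because the block map travels with the container, the number of blocks inside a container adds nothing to the state a central service has to hold.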
[Diagram: container c1 (blocks b1, b3, b6 plus its block map) replicated on three datanodes; each datanode holds several of the containers c1 … c6]
SCM – Storage Container Manager
• SCM
– Container Map: Container ID → Container Locations
– Receives Heartbeats & Container Reports from Datanodes
• Datanodes
– Each holds several containers (c1, c2, c3, c5 in the diagram)
Next Generation HDFS
• NameNode
– Namespace Tree: File Path → Block IDs and Container IDs
• SCM
– Container Map: Container ID → Container Locations
– Receives Heartbeats & Container Reports from Datanodes
• Datanodes
– Each holds several containers (c1, c2, c3, c5 in the diagram)
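The three lookups behind a read in the next generation HDFS layout can be sketched with plain dictionaries. The sample paths, ids, and the `read_file` helper are hypothetical, and picking the first replica is a simplification rather than the real placement policy.

```python
# Namespace tree (NameNode): file path -> list of (container id, block id).
namespace = {
    "/logs/app.log": [("c1", "b1"), ("c2", "b7")],
}

# Container map (SCM): container id -> datanodes holding a replica.
container_map = {
    "c1": ["dn1", "dn2", "dn3"],
    "c2": ["dn2", "dn3", "dn4"],
}

# Datanodes: container id -> local block map (block id -> data).
datanodes = {
    "dn1": {"c1": {"b1": b"first "}},
    "dn2": {"c1": {"b1": b"first "}, "c2": {"b7": b"second"}},
    "dn3": {"c1": {"b1": b"first "}, "c2": {"b7": b"second"}},
    "dn4": {"c2": {"b7": b"second"}},
}


def read_file(path):
    """Resolve path -> (container, block) -> datanode -> data."""
    parts = []
    for container_id, block_id in namespace[path]:
        datanode = container_map[container_id][0]  # simplification: first replica
        parts.append(datanodes[datanode][container_id][block_id])
    return b"".join(parts)
```

Note that `container_map` has one entry per container, not per block, which is what keeps the SCM small as the block count grows.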
Billions of Files
• Next generation HDFS architecture
– Support up to 1 million blocks per container
• Provided that the total block size can fit into a container.
– A 5k-node cluster could have 1 million containers
– The cluster can store up to 1 trillion (small) blocks.
– HDFS can easily scale to manage billions of files!
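The arithmetic behind these figures can be checked in a few lines. The per-container and per-cluster counts come from the slide; treating 1 GB as 10^9 bytes and ignoring replication when dividing containers across nodes are assumptions of this sketch.

```python
BLOCKS_PER_CONTAINER = 1_000_000    # from the slide
CONTAINERS_PER_CLUSTER = 1_000_000  # from the slide, for a 5k-node cluster
NODES = 5_000
GB = 10**9                          # assumption: decimal gigabytes

# Upper bound on (small) blocks in the cluster.
total_blocks = BLOCKS_PER_CONTAINER * CONTAINERS_PER_CLUSTER

# Distinct containers per node, before counting replicas (assumption).
containers_per_node = CONTAINERS_PER_CLUSTER // NODES


def max_avg_block_bytes(container_bytes, blocks=BLOCKS_PER_CONTAINER):
    """Largest average block size that still fits `blocks` blocks in a container."""
    return container_bytes // blocks


small_container_limit = max_avg_block_bytes(4 * GB)
large_container_limit = max_avg_block_bytes(32 * GB)
```

Under these assumptions the cluster holds up to 10^12 blocks, each node carries 200 distinct containers, and a container only reaches 1 million blocks if the blocks average at most about 4 KB (4 GB container) or 32 KB (32 GB container).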
Ozone – Hadoop Object Store
• Store KV (key-value) pairs
– Similar to Amazon S3
• Need a Key Map – a key-to-container-id map
• Containers are partial object stores (partial KV maps)
[Diagram: Ozone keeps a Key Map (Key → Container IDs) and a Container Map (Container ID → Container Locations); datanodes hold the containers and send heartbeats and container reports]
Challenge – Trillions of Key-Value Pairs
• Values (Objects) are distributed in DataNodes
– 5k nodes can handle a trillion objects (no problem)
• Trillions of keys in the Key Map
– The Key Map becomes huge (TB in size)
– Cannot fit in memory – the same old problem
• Avoid storing all keys in the Key Map
– Hash partitioning
– Range partitioning
– Partitions can be split/merged
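The two partitioning schemes named above can be sketched as follows. This is a generic illustration, not Ozone's implementation: the choice of SHA-256, the `hash_partition` function, and the `RangePartitioner` class with its `split` and `merge` methods are all assumptions.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Hash partitioning: a stable hash of the key picks the partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class RangePartitioner:
    """Range partitioning: sorted split points divide the key space.

    Partition i holds keys k with split_points[i-1] <= k < split_points[i].
    """

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def partition_of(self, key):
        return bisect.bisect_right(self.split_points, key)

    def split(self, split_key):
        """Split the partition containing split_key into two at that key."""
        bisect.insort(self.split_points, split_key)

    def merge(self, split_key):
        """Merge the two partitions that meet at split_key."""
        self.split_points.remove(split_key)
```

Hash partitioning spreads keys evenly but loses key order; range partitioning keeps neighbouring keys together, which is what makes splitting a hot or oversized partition a local operation.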
[Diagram: Ozone Key Map, Key → Container IDs]
Closed Containers
• Initially, a container is open for read and write
– Using Raft for its replication
• Close the container
– once the container has reached a certain size, say 4GB or 32GB
– No longer managed by Raft
• Closed containers are immutable
– Cannot add new KV entries
– Cannot overwrite/delete KV entries
• Open containers
– New KV entries are always written to open containers
– Only need a small number of open containers (thousands)
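The open/closed lifecycle can be sketched as a small state machine. The class and function names (`KVContainer`, `ContainerClosedError`, `write`) are hypothetical, the size threshold is tiny for readability, and Raft replication of open containers is left out entirely.

```python
class ContainerClosedError(Exception):
    """Raised when a write is attempted on a closed (immutable) container."""


class KVContainer:
    def __init__(self, container_id, max_bytes):
        self.container_id = container_id
        self.max_bytes = max_bytes  # e.g. 4 GB or 32 GB in practice
        self.entries = {}
        self.used = 0
        self.open = True

    def put(self, key, value):
        if not self.open:
            raise ContainerClosedError(self.container_id)
        # Overwrites are allowed only while the container is open.
        self.used += len(value) - len(self.entries.get(key, b""))
        self.entries[key] = value
        if self.used >= self.max_bytes:
            self.open = False  # close once the size threshold is reached

    def delete(self, key):
        if not self.open:
            raise ContainerClosedError(self.container_id)
        self.used -= len(self.entries.pop(key))

    def get(self, key):
        return self.entries[key]  # reads work on open and closed containers


def write(containers, key, value, max_bytes):
    """Write to an open container, creating a new one if none is open."""
    for container in containers:
        if container.open:
            container.put(key, value)
            return container.container_id
    container = KVContainer("c%d" % (len(containers) + 1), max_bytes)
    containers.append(container)
    container.put(key, value)
    return container.container_id
```

Since new entries only ever land in open containers, the set of containers that need the more expensive consensus-based replication stays small no matter how much closed data accumulates.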
Container Replication
• Closed containers
– Replication or Erasure Coding
– The same way HDFS does for blocks
• Open containers are replicated by Raft
– Raft: a consensus algorithm
– Apache Ratis – an implementation of Raft
• More detail in later slides
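As background on what "replicated by Raft" implies, the sketch below shows the leader-side commit rule from the Raft paper: an entry is committed once it is stored on a majority of servers and belongs to the leader's current term. It is a standalone illustration, not Apache Ratis code, and the function name and argument layout are assumptions.

```python
def commit_index(match_index, log_terms, current_term, current_commit):
    """Return the leader's new commit index under Raft's commit rule.

    match_index:    highest log index known to be stored on each server,
                    leader included (log indices are 1-based).
    log_terms:      log_terms[i - 1] is the term of log entry i.
    current_term:   the leader's current term.
    current_commit: the leader's commit index before this call.
    """
    majority = len(match_index) // 2 + 1
    # Largest index that at least `majority` servers have stored.
    candidate = sorted(match_index, reverse=True)[majority - 1]
    # Only an entry from the current term may be committed by counting replicas.
    for n in range(candidate, current_commit, -1):
        if log_terms[n - 1] == current_term:
            return n
    return current_commit
```

With three replicas of an open container, a write is therefore acknowledged as soon as two of them have it in their logs; the third can catch up later.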
Big Picture
[Diagram, bottom to top]
• Physical Storage – Shared
• DataNodes holding Block Containers and Object Store Containers
• Container Management Services (run on DataNodes): Cluster Membership, Replication Management, Container Location Service
• Applications: HDFS, HBase, Object Store Metadata
Current Development Status
HDFS-7240 – Object store in HDFS
• The umbrella JIRA for Ozone, including the container framework
– 235 subtasks
– 182 subtasks resolved (as of June 13)
– Code contributors
• Anu Engineer, Arpit Agarwal, Chen Liang, Mingliang Liu, Chris Nauroth, Kanaka
Kumar Avvaru, Mukul Kumar Singh, Tsz Wo Nicholas Sze, Weiwei Yang, Xiaobing
Zhou, Xiaoyu Yao, Yuanbo Liu, …
HDFS-11118: Block Storage for HDFS
• The umbrella JIRA for additional work for cBlock
– 23 subtasks
– 20 subtasks resolved (as of June 13)
– Code contributors
• Chen Liang
• Mukul Kumar Singh
• Xiaoyu Yao
• cBlock has already been deployed in Hortonworks’ QE
environment for several months!
Raft – A Consensus Algorithm
• “In Search of an Understandable Consensus Algorithm”
– The Raft paper by Diego Ongaro and John Ousterhout
– USENIX ATC’14
• “In Search of a Usable Raft Library”
– A long list of Raft implementations is available
– Most of them are tied to another project or a part of another project.
• We need a Raft implementation with high throughput!
Apache Ratis – A Raft Library
• A brand new, incubating Apache project
– Open source, open development
– Written in Java 8
• Emphasis on pluggability
– Pluggable state machine
– Pluggable Raft log
– Pluggable RPC
• Currently supported RPC in examples: gRPC, Netty, Hadoop RPC
• Users may provide their own RPC implementation
• Support high throughput data ingest
– For more general data replication use cases
– Pipeline support for log replication
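The idea of a pluggable state machine can be illustrated without any Raft machinery: the replication layer only appends and applies log entries, and the application supplies the object that interprets them. The names below (`Replica`, `CounterStateMachine`, `KeyValueStateMachine`, `apply`) are hypothetical and are not the Apache Ratis API.

```python
class CounterStateMachine:
    """One possible plug-in: entries are ("add", n) or ("set", n)."""

    def __init__(self):
        self.value = 0

    def apply(self, entry):
        op, amount = entry
        if op == "add":
            self.value += amount
        elif op == "set":
            self.value = amount
        else:
            raise ValueError("unknown operation: %r" % (op,))


class KeyValueStateMachine:
    """Another plug-in: entries are (key, value) pairs."""

    def __init__(self):
        self.store = {}

    def apply(self, entry):
        key, value = entry
        self.store[key] = value


class Replica:
    """Replication-side code; it knows nothing about what entries mean."""

    def __init__(self, state_machine):
        self.state_machine = state_machine  # any object with apply(entry)
        self.log = []
        self.last_applied = 0

    def append(self, entry):
        self.log.append(entry)

    def apply_committed(self, commit_index):
        """Apply log entries, in order, up to the committed index."""
        while self.last_applied < commit_index:
            self.state_machine.apply(self.log[self.last_applied])
            self.last_applied += 1
```

Replicas that apply the same log prefix in the same order reach the same state, which is why swapping the state machine does not require touching the replication code.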
Apache Ratis – Use cases
• General use case:
– You already have a service running on a single server
• You want to:
– replicate the server log/states to multiple machines
• The replication number/cluster membership can be changed at runtime
– have an HA service
• When a server fails, another server will automatically take over
• Clients automatically failover to the new server
• Apache Ratis is for you!
• Use cases in Ozone/HDFS
– Replicating open containers (HDFS-11519, committed on April 3)
– Support HA in SCM
– Replacing the current NameNode HA solution
Apache Ratis – Development Status
• A brief history
– 2016-03: Project started at Hortonworks
– 2016-04: First commit “leader election (without tests)”
– 2017-01: Entered Apache incubation.
– 2017-03: Started preparing the first Alpha release (RATIS-53).
– 2017-04: Hadoop Ozone branch started using Ratis (HDFS-11519)
– 2017-05: first 0.1.0-alpha release entered distribution
• Committers
– Anu Engineer, Arpit Agarwal, Chen Liang, Chris Nauroth, Devaraj Das,
Enis Soztutar, Hanisha Koneru, Jakob Homan, Jing Zhao, Jitendra
Pandey, Li Lu, Mayank Bansal, Mingliang Liu, Tsz Wo Nicholas Sze,
Uma Maheswara Rao G, Xiaobing Zhou, Xiaoyu Yao
• Contributions are welcome!
– http://incubator.apache.org/projects/ratis.html
– dev@ratis.incubator.apache.org
Thank You!
Backup Slides
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes

  • 1. Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
      Jing Zhao, Tsz-Wo Nicholas Sze
      June 14, 2017
      © Hortonworks Inc. 2017
  • 2. About Us
      • Tsz-Wo Nicholas Sze, Ph.D.
        – Software Engineer at Hortonworks
        – PMC member/Committer of Apache Hadoop
        – Active contributor and committer of Apache Ratis
        – Ph.D. from University of Maryland, College Park
        – MPhil & BEng from Hong Kong University of Sci & Tech
  • 3. Jing Zhao, Ph.D.
        – Software Engineer at Hortonworks
        – PMC member/Committer of Apache Hadoop
        – Active contributor and committer of Apache Ratis
        – Ph.D. from University of Southern California
        – B.S. from Tsinghua University, Beijing
  • 4. Outline
      • Current HDFS Architecture
      • Namespace Scaling
      • Storage Container Architecture
        – Storage Containers
        – Next Generation HDFS
        – Ozone – Hadoop Object Store
        – cBlock
      • Current Development Status
  • 5. Current HDFS Architecture
  • 6. HDFS Architecture (diagram)
      • Layers: Namespace and Block Storage
      • Namenode
        – Namespace tree: file path → block IDs
        – Block map: block ID → block locations
      • Datanodes
        – Block ID → data
        – Send heartbeats & block reports to the Namenode
        – Horizontally scale IO and storage
  • 7. HDFS Layering (diagram)
      • Namespace: NN-1 ... NN-k ... NN-n, serving NS1 ... NS k ... Foreign NS n
      • Block Storage: block pools Pool 1 ... Pool k ... Pool n
      • Common storage: DN 1, DN 2, ... DN m
  • 8. Scalability – What Does HDFS Do Well?
      • HDFS NN stores all metadata in memory
        – Scales to large clusters (5k nodes); since all metadata is in memory:
          • 60K-100K tasks (large # of parallel ops) can share the Namenode
          • Low latency
      • Large data if files are large
        – Proof points of large data and large clusters
          • Single organizations have over 600PB in HDFS
          • Single clusters with over 200PB using federation
      • Metadata in memory is the strength of the original GFS and HDFS design, but also its weakness in scaling the number of files and blocks
  • 9. Scalability – The Challenges
      • Large number of files (> 350 million)
        – The files may be small in size.
        – NN’s strength has become a limitation
      • Number of file operations
        – Need to improve concurrency: move to multiple name servers
      • HDFS Federation is the current solution
        – Add NameNodes to scale number of files & operations
        – Deployed at Twitter
          • A 5000+ node cluster with three NameNodes (plans to grow to 10,000 nodes)
        – Backported and used at Facebook to scale HDFS
  • 10. Scalability – Large Number of Blocks
      • Block report processing
        – Datanode block reports also become huge
        – Processing them takes a long time
      • Diagram: Datanodes send heartbeats & block reports to the Namenode
  • 11. Namespace Scaling
  • 12. Partial Namespace in Memory
      • Use a key-value store to represent the namespace tree
        – Every INode has a unique id.
        – Map: id -> INode
        – Map: (Parent id, child name) -> child id
      • Keep only the working set in memory
        – Keep part of the namespace in memory and part of it on disk
        – Various caching strategies
          • LRU, caching hot directories, etc.
      • LevelDB
        – A fast key-value store
        – Used in a prototype of partial namespace implementation
  • 13. Partial Namespace in Memory
      • Has been prototyped
        – Benchmarks show that the model works well
        – Most file systems keep only a partial namespace in memory, but not at this scale
          • Hence cache replacement policies for the working set are important
          • In Big Data, you actively use only the last 3-6-12 months of your five/ten years of data => working set is small
      • Work in progress to get it into HDFS
      • Partial Namespace has other benefits
        – Faster NN startup: load the working set as needed
        – Partial Namespace in Memory will allow multiple namespace volumes
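
The two maps and the working-set idea from the slides above can be sketched as follows. This is only a minimal illustration, not the prototype's code: plain dictionaries stand in for the on-disk key-value store (LevelDB in the prototype), LRU is used as the caching strategy, and the names `PartialNamespace`, `ROOT_ID`, `add`, and `resolve` are hypothetical.

```python
from collections import OrderedDict

ROOT_ID = 1  # hypothetical id assigned to the root directory


class PartialNamespace:
    """Namespace tree kept as two key-value maps plus an LRU cache of INodes."""

    def __init__(self, cache_capacity):
        # Dictionaries standing in for the on-disk key-value store (assumption).
        self.inodes = {ROOT_ID: {"name": "/", "is_dir": True}}  # id -> INode
        self.children = {}  # (parent id, child name) -> child id
        # In-memory working set, ordered from least to most recently used.
        self.cache = OrderedDict()
        self.cache_capacity = cache_capacity
        self.next_id = ROOT_ID + 1

    def add(self, parent_id, name, is_dir=False):
        """Create a child INode under parent_id and return its new id."""
        inode_id = self.next_id
        self.next_id += 1
        self.inodes[inode_id] = {"name": name, "is_dir": is_dir}
        self.children[(parent_id, name)] = inode_id
        return inode_id

    def _load(self, inode_id):
        """Return an INode, reading it from the store on a cache miss."""
        if inode_id in self.cache:
            self.cache.move_to_end(inode_id)  # mark as most recently used
            return self.cache[inode_id]
        inode = self.inodes[inode_id]  # simulated disk read
        self.cache[inode_id] = inode
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return inode

    def resolve(self, path):
        """Walk the path one component at a time and return the final id."""
        inode_id = ROOT_ID
        self._load(inode_id)
        for name in (part for part in path.split("/") if part):
            inode_id = self.children[(inode_id, name)]
            self._load(inode_id)
        return inode_id


ns = PartialNamespace(cache_capacity=2)
user_dir = ns.add(ROOT_ID, "user", is_dir=True)
data_file = ns.add(user_dir, "data.txt")
resolved = ns.resolve("/user/data.txt")
```

With a cache capacity of 2, resolving `/user/data.txt` touches three INodes, so the root is evicted and only the two most recently used INodes stay in memory.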
  • 14. Previous Talks on Partial Namespace
      • Evolving HDFS to a Generalized Storage Subsystem
        – Sanjay Radia, Jitendra Pandey (@Hortonworks)
        – Hadoop Summit 2016
      • Scaling HDFS to Manage Billions of Files with Key Value Stores
        – Haohui Mai, Jing Zhao (@Hortonworks)
        – Hadoop Summit 2015
      • Removing the NameNode's memory limitation
        – Lin Xiao (Ph.D. student @CMU, intern @Hortonworks)
        – Hadoop User Group 2013
  • 15. Container Architecture
  • 16. Containers
      • Storage Container – a storage unit
      • Local block map
        – Map block IDs to local block locations
      • Small in size
        – 4GB or 32GB (configurable)
      • Diagram: storage containers c1 (blocks b1, b3, b6) and c2 (blocks b2, b7, b8), each with its own block map
  • 17. Distributed Block Map
      • The block map is moved from the namenode to datanodes
        – The block map becomes distributed
        – Entire container is replicated
        – A datanode has multiple containers
      • Diagram: container c1 (blocks b1, b3, b6 and its block map) replicated three times; datanodes holding containers (c1, c5, c3), (c1, c4, c2), and (c2, c6, c3)
  • 18. SCM – Storage Container Manager
      • Container map: container ID → container locations
      • Datanodes send heartbeats & container reports to the SCM
  • 19. Next Generation HDFS
      • NameNode
        – Namespace tree: file path → block IDs and container IDs
      • SCM
        – Container map: container ID → container locations
      • Datanodes send heartbeats & container reports to the SCM
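
The three maps described in the container slides above (namespace tree, container map, and per-container local block map) can be chained into a lookup roughly as below. This is a hedged sketch: the dictionaries, the sample paths and IDs, and the functions `locate` and `local_block_path` are all hypothetical stand-ins, not HDFS or Ozone APIs.

```python
# NameNode: namespace tree, file path -> [(container ID, block ID), ...]
namespace_tree = {
    "/logs/a.log": [("c1", "b1"), ("c2", "b7")],
}

# SCM: container map, container ID -> datanodes holding a replica
container_map = {
    "c1": ["dn1", "dn2", "dn3"],
    "c2": ["dn2", "dn3", "dn4"],
}

# Datanode side: each container carries its own local block map,
# block ID -> local block location (paths are made up for illustration).
local_block_maps = {
    "c1": {"b1": "/data/c1/b1", "b3": "/data/c1/b3", "b6": "/data/c1/b6"},
    "c2": {"b2": "/data/c2/b2", "b7": "/data/c2/b7", "b8": "/data/c2/b8"},
}


def locate(path):
    """Resolve a file path to (block ID, container ID, replica datanodes)."""
    result = []
    for container_id, block_id in namespace_tree[path]:
        # The NameNode no longer tracks block locations; it asks the SCM
        # where the enclosing container lives.
        result.append((block_id, container_id, container_map[container_id]))
    return result


def local_block_path(container_id, block_id):
    """Final step on a datanode: look the block up in the container's map."""
    return local_block_maps[container_id][block_id]


locations = locate("/logs/a.log")
```

The point of the split is that the central services track containers rather than individual blocks, while the per-block detail stays inside each container on the datanodes.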
  • 20. Billions of Files
      • Next generation HDFS architecture
        – Support up to 1 million blocks per container
          • Provided that the total block size can fit into a container.
        – A 5k-node cluster could have 1 million containers
        – The cluster can store up to 1 trillion (small) blocks.
        – HDFS can easily scale to manage billions of files!
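
The arithmetic behind the trillion-block figure can be checked in a few lines. The block, container, and node counts come from the slide; the per-node count ignores replication, and reading "4GB" as 4 GiB is an assumption made only for this illustration.

```python
BLOCKS_PER_CONTAINER = 1_000_000  # from the slide
CONTAINERS = 1_000_000            # from the slide
NODES = 5_000                     # "5k-node cluster"

# Upper bound on blocks in the cluster: one trillion.
max_blocks = BLOCKS_PER_CONTAINER * CONTAINERS

# Distinct containers per datanode, before counting replicas (assumption).
containers_per_node = CONTAINERS // NODES


def avg_block_bytes(container_bytes):
    """Average block size at which a container holds the maximum block count."""
    return container_bytes / BLOCKS_PER_CONTAINER


GIB = 2 ** 30  # assumes the slide's "GB" means GiB
small_container_avg = avg_block_bytes(4 * GIB)   # a little over 4 KB per block
large_container_avg = avg_block_bytes(32 * GIB)  # a little over 34 KB per block
```

This is why the slide qualifies the blocks as "(small)": a container only reaches a million blocks when the blocks average a few kilobytes to a few tens of kilobytes.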
  • 21. Ozone – Hadoop Object Store
      • Store KV (key-value) pairs
        – Similar to Amazon S3
      • Need a Key Map – a key-to-container-id map
      • Containers are partial object stores (partial KV maps)
      • Diagram
        – Ozone key map: key → container IDs
        – Container map: container ID → container locations
        – Datanodes send heartbeats & container reports
  • 22. Challenge – Trillions of Key-Value Pairs
      • Values (objects) are distributed across DataNodes
        – 5k nodes can handle a trillion objects (no problem)
      • Trillions of keys in the Key Map
        – The Key Map becomes huge (TB in size)
        – Cannot fit in memory – the same old problem
      • Avoid storing all keys in the Key Map
        – Hash partitioning
        – Range partitioning
        – Partitions can be split/merged
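
The two partitioning schemes named above can be sketched as follows. This illustrates the general techniques only, not how Ozone implements its key map; `hash_partition` and `RangePartitioner` are hypothetical names, and SHA-256 is just one convenient stable hash.

```python
import bisect
import hashlib


def hash_partition(key, num_partitions):
    """Hash partitioning: a stable hash of the key picks the partition."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class RangePartitioner:
    """Range partitioning: sorted split points divide the key space.

    Partition i holds keys k with split_points[i-1] <= k < split_points[i].
    """

    def __init__(self, split_points):
        self.split_points = sorted(split_points)

    def partition(self, key):
        # Number of split points <= key is the index of the key's range.
        return bisect.bisect_right(self.split_points, key)

    def split(self, new_split_point):
        """Split the range containing new_split_point into two ranges.

        Simplification: partitions are identified by position, so ranges
        after the new split point are renumbered.
        """
        bisect.insort(self.split_points, new_split_point)


ranges = RangePartitioner(["g", "p"])
before = [ranges.partition(k) for k in ("apple", "kiwi", "zebra")]
ranges.split("k")
after = [ranges.partition(k) for k in ("apple", "kiwi", "zebra")]
```

Hash partitioning spreads keys evenly but scatters adjacent keys, while range partitioning keeps sorted keys together and lets a hot or oversized range be split, which fits the slide's note that partitions can be split or merged.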
  • 23. Closed Containers
      • Initially, a container is open for read and write
        – Using Raft for its replication
      • Close the container
        – Once the container has reached a certain size, say 4GB or 32GB
        – No longer managed by Raft
      • Closed containers are immutable
        – Cannot add new KV entries
        – Cannot overwrite/delete KV entries
      • Open containers
        – New KV entries are always written to open containers
        – Only need a small number of open containers (thousands)
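
The open/closed lifecycle can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the Ozone container code: the `Container` class, its methods, and `ContainerClosedError` are hypothetical, the size threshold is a constructor parameter, and Raft replication of open containers is not modeled at all.

```python
class ContainerClosedError(Exception):
    """Raised when a write is attempted on a closed (immutable) container."""


class Container:
    def __init__(self, container_id, max_bytes):
        self.container_id = container_id
        self.max_bytes = max_bytes  # e.g. 4GB or 32GB in the slides
        self.entries = {}           # partial KV map held by this container
        self.used_bytes = 0
        self.is_open = True         # containers start open for read and write

    def put(self, key, value):
        """Add or overwrite a KV entry; only allowed while the container is open."""
        if not self.is_open:
            raise ContainerClosedError(self.container_id)
        previous = self.entries.get(key)
        if previous is not None:
            self.used_bytes -= len(previous)
        self.entries[key] = value
        self.used_bytes += len(value)
        # Close once the size threshold is reached; from here on it is immutable.
        if self.used_bytes >= self.max_bytes:
            self.is_open = False

    def delete(self, key):
        """Remove a KV entry; rejected once the container is closed."""
        if not self.is_open:
            raise ContainerClosedError(self.container_id)
        self.used_bytes -= len(self.entries.pop(key))

    def get(self, key):
        """Reads work on both open and closed containers."""
        return self.entries[key]


c = Container("c1", max_bytes=8)
c.put("k1", b"abcd")   # 4 bytes used, still open
c.put("k2", b"efgh")   # 8 bytes used, container closes
```

After the second `put`, further writes or deletes raise `ContainerClosedError`, while `get` continues to work.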
  • 24. Container Replication
      • Closed containers
        – Replication or Erasure Coding
        – The same way HDFS does for blocks
      • Open containers are replicated by Raft
        – Raft: a consensus algorithm
        – Apache Ratis – an implementation of Raft
          • More detail in later slides
  • 25. Big Picture (diagram labels)
      • Applications: HDFS, HBase, Object Store Metadata
      • Container Management Services (runs on DataNodes)
      • Cluster Membership, Replication Management, Container Location Service
      • DataNodes: Block Containers, Object Store Containers
      • Physical Storage – Shared
  • 26. Current Development Status
  • 27. HDFS-7240 – Object store in HDFS
      • The umbrella JIRA for Ozone, including the container framework
        – 235 subtasks
        – 182 subtasks resolved (as of June 13)
        – Code contributors
          • Anu Engineer, Arpit Agarwal, Chen Liang, Mingliang Liu, Chris Nauroth, Kanaka Kumar Avvaru, Mukul Kumar Singh, Tsz Wo Nicholas Sze, Weiwei Yang, Xiaobing Zhou, Xiaoyu Yao, Yuanbo Liu, …
  • 28. HDFS-11118: Block Storage for HDFS
      • The umbrella JIRA for additional work for cBlock
        – 23 subtasks
        – 20 subtasks resolved (as of June 13)
        – Code contributors
          • Chen Liang
          • Mukul Kumar Singh
          • Xiaoyu Yao
      • cBlock has already been deployed in Hortonworks’ QE environment for several months!
  • 29. Raft – A Consensus Algorithm
      • “In Search of an Understandable Consensus Algorithm”
        – The Raft paper by Diego Ongaro and John Ousterhout
        – USENIX ATC’14
      • “In Search of a Usable Raft Library”
        – A long list of Raft implementations is available
        – Most of them are tied to another project or are a part of another project.
      • We need a Raft implementation with high throughput!
  • 30. Apache Ratis – A Raft Library
      • A brand new, incubating Apache project
        – Open source, open development
        – Written in Java 8
      • Emphasis on pluggability
        – Pluggable state machine
        – Pluggable Raft log
        – Pluggable RPC
          • RPC implementations currently supported in examples: gRPC, Netty, Hadoop RPC
          • Users may provide their own RPC implementation
      • Support high throughput data ingest
        – For more general data replication use cases
        – Pipeline support for log replication
  • 31. Apache Ratis – Use cases
      • General use case:
        – You already have a service running on a single server
      • You want to:
        – Replicate the server log/states to multiple machines
          • The replication number/cluster membership can be changed at runtime
        – Have an HA service
          • When a server fails, another server will automatically take over
          • Clients automatically fail over to the new server
      • Apache Ratis is for you!
      • Use cases in Ozone/HDFS
        – Replicating open containers (HDFS-11519, committed on April 3)
        – Support HA in SCM
        – Replacing the current NameNode HA solution
  • 32. Apache Ratis – Development Status
      • A brief history
        – 2016-03: Project started at Hortonworks
        – 2016-04: First commit “leader election (without tests)”
        – 2017-01: Entered Apache incubation.
        – 2017-03: Started preparing the first Alpha release (RATIS-53).
        – 2017-04: Hadoop Ozone branch started using Ratis (HDFS-11519)
        – 2017-05: First 0.1.0-alpha release entered distribution
      • Committers
        – Anu Engineer, Arpit Agarwal, Chen Liang, Chris Nauroth, Devaraj Das, Enis Soztutar, Hanisha Koneru, Jakob Homan, Jing Zhao, Jitendra Pandey, Li Lu, Mayank Bansal, Mingliang Liu, Tsz Wo Nicholas Sze, Uma Maheswara Rao G, Xiaobing Zhou, Xiaoyu Yao
      • Contributions are welcome!
        – http://incubator.apache.org/projects/ratis.html
        – dev@ratis.incubator.apache.org
  • 33. Thank You!
  • 34. Backup Slides