SlideShare ist ein Scribd-Unternehmen logo
1 von 16
HDFS Federation


Suresh Srinivas
@suresh_m_s




                  Page 1
Agenda

• HDFS Background

• Current Limitations

• Federation Architecture

• Federation Details

• Next Steps
HDFS Architecture

                                                        Two main layers
                                                        •   Namespace
Block Storage Namespace




                                                            ›   Consists of dirs, files and blocks
                            Namenode
                                        NS                  ›   Supports create, delete, modify and list files or
                                                                dirs operations
                                Block Management


                                                        •   Block Storage
                           Datanode     …    Datanode       ›   Block Management
                                      Storage                   • Datanode cluster membership
                                                                • Supports create/delete/modify/get block location
                                                                  operations
                                                                • Manages replication and replica placement
                                                            ›   Storage - provides read and write access to
                                                                blocks



                                                                   3
HDFS Architecture…

                                                        Implemented as
Block Storage Namespace




                           Namenode                     • Single Namespace Volume
                                        NS                 › Namespace Volume = Namespace +
                                Block Management             Blocks
                                                        • Single namenode with a namespace
                           Datanode     …    Datanode      › Entire namespace is in memory
                                      Storage              › Provides Block Management
                                                        • Datanodes store block replicas
                                                           › Block files stored on local file system




                                                                4
Limitation - Scalability

 Scalability
 •   Storage scales horizontally - namespace doesn’t
 •   Limited number of files, dirs and blocks
     ›   250 million files and blocks at 64GB Namenode heap size
           •   Still a very large cluster
           •   Facebook clusters are sized at ~70 PB storage



 Performance
 •   File system operations throughput limited by a single node
     ›   120K read ops/sec and 6000 write ops/sec
           •   Easily scalable to 20K write ops/sec by code improvements



                                              5
Limitation - Isolation

 Poor Isolation
 •   All the tenants share a single namespace
     ›   Separate volume for tenants is desirable

 •   Lacks separate namespace for different application
     categories or application requirements
     ›   Experimental apps can affect production apps

     ›   Example - HBase could use its own namespace




                                       6
Limitation – Tight Coupling

Namespace and Block Management are distinct services
• Tightly coupled due to co-location
• Scaling block management independent of namespace is simpler
• Simplifies Namespace and scaling it


Block Storage could be a generic service
• Namespace is one of the applications to use the service
• Other services can be built directly on Block Storage
   › HBase
   ›   Foreign namespaces



                                    7
Isolation is a problem for even small
                 clusters




                   8
HDFS Federation
         Namespace       NN-1                    NN-k                   NN-n

                                                                               Foreig
                                NS1                     NS k                   n NS n
                                           ...                    ...


                                  Pool 1             Pool k              Pool n
         Block Storage




                                                    Block Pools




                          Datanode 1               Datanode 2            Datanode m
                                ...                     ...                    ...
                                                 Common Storage


•   Multiple independent Namenodes and Namespace Volumes in a cluster
     ›   Namespace Volume = Namespace + Block Pool
•   Block Storage as generic storage service
     ›   Set of blocks for a Namespace Volume is called a Block Pool
     ›   DNs store blocks for all the Namespace Volumes – no partitioning
Key Ideas & Benefits

•   Distributed Namespace: Partitioned across namenodes
                                                                                        Alternate NN
     ›     Simple and Robust due to independent masters                                Implementation   HBase
                                                                           HDFS
         • Each master serves a namespace volume                         Namespace                      MR tmp

         • Preserves namenode stability – little namenode code change

     ›     Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files

•   Block Pools enable generic storage service                                       Storage Service


     ›     Enables Namespace Volumes to be independent of each other
     ›     Fuels innovation and Rapid development
            • New implementations of file systems and Applications on top of block storage possible
            • New block pool categories – tmp storage, distributed cache, small object storage

•   In future, move Block Management out of namenode to separate set of nodes
     ›     Simplifies namespace/application implementation
            • Distributed namenode becomes significantly simpler
HDFS Federation Details
• Simple design
   › Little change to the Namenode, most changes in Datanode, Config and Tools
   › Core development in 4 months
   › Namespace and Block Management remain in Namenode
      • Block Management could be moved out of namenode in the future

• Little impact on existing deployments
   › Single namenode configuration runs as is

• Datanodes provide storage services for all the namenodes
   › Register with all the namenodes
   › Send periodic heartbeats and block reports to all the namenodes
   › Send block received/deleted for a block pool to corresponding namenode



                                           11
HDFS Federation Details…
• Cluster Web UI for better manageability
    › Provides cluster summary
    › Namenode list and summary of namenode status
    › Decommissioning status
• Tools
    › Decommissioning works with multiple namespace
    › Balancer works with multiple namespaces
       • Both Datanode storage or Block Pool storage can be balanced

• Namenode can be added/deleted in Federated cluster
    › No need to restart the cluster
• Single configuration for all the nodes in the cluster
Managing Namespaces
•   Federation has multiple namespaces – Don’t                                 Client-side
    you need a single global namespace?                                    /
                                                                               mount-table
     ›   Key is to share the data and the names used
         to access the data

•   A global namespace is one way to do that
                                                               data project home     tmp
•   Client-side mount table is another way to
    share.
     ›   Shared mount-table => “global” shared view
     ›   Personalized mount-table => per-application                                  NS4
         view
         • Share the data that matter by mounting it

•   Client-side implementation of mount tables
                                                         NS1       NS2         NS3
     ›   No single point of failure
     ›   No hotspot for root and top level directories
Next Steps

• Complete separation of namespace and block
  management layers
  › Block storage as generic service

• Partial namespace in memory for further scalability
• Move partial namespace from one namenode to
  another
  › Namespace operation - no data copy
Next Steps…
                          • Namenode as a container for namespaces
                             › Lots of small namespaces
                               •   Chosen per user/tenant/data feed

                               •   Mount tables for unified namespace
                                   •    Can be managed by a central volume server
                 …

     Namenodes               › Move namespace from one container to
                               another for balancing
Datanode   …   Datanode
                          • Combined with partial namespace
                             › Choose number of namenodes to match
                               •   Sum of (Namespace working set)

                               •   Sum of (Namespace throughput)




                                   15
Thank You

More information
1. HDFS-1052 – HDFS Scalability with multiple namenodes
2. An Introduction to HDFS Federation –
  https://hortonworks.com/an-introduction-to-hdfs-federation/

Weitere ähnliche Inhalte

Was ist angesagt?

Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageGlusterFS
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011GlusterFS
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseChristopher Choi
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsTrendProgContest13
 
Extending the lifecycle of your storage area network
Extending the lifecycle of your storage area networkExtending the lifecycle of your storage area network
Extending the lifecycle of your storage area networkInterop
 
Ph.D. thesis presentation
Ph.D. thesis presentationPh.D. thesis presentation
Ph.D. thesis presentationdavidkftam
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010GlusterFS
 
Cache-partitioning
Cache-partitioningCache-partitioning
Cache-partitioningdavidkftam
 
My sql with enterprise storage
My sql with enterprise storageMy sql with enterprise storage
My sql with enterprise storageCaroline_Rose
 
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...Kenneth de Brucq
 
Embracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaEmbracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaWensong Zhang
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudRightScale
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java DevelopersRichard McDougall
 
Severalnines Training: MySQL Cluster - Part X
Severalnines Training: MySQL Cluster - Part XSeveralnines Training: MySQL Cluster - Part X
Severalnines Training: MySQL Cluster - Part XSeveralnines
 
LVS development and experience
LVS development and experienceLVS development and experience
LVS development and experienceWensong Zhang
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfsRami Jebara
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesDataWorks Summit
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 

Was ist angesagt? (20)

Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS StorageWebinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
 
Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011Intro to GlusterFS Webinar - August 2011
Intro to GlusterFS Webinar - August 2011
 
Benchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBaseBenchmarking MongoDB and CouchBase
Benchmarking MongoDB and CouchBase
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
1556 a 07
1556 a 071556 a 07
1556 a 07
 
Extending the lifecycle of your storage area network
Extending the lifecycle of your storage area networkExtending the lifecycle of your storage area network
Extending the lifecycle of your storage area network
 
Hadoop on Virtual Machines
Hadoop on Virtual MachinesHadoop on Virtual Machines
Hadoop on Virtual Machines
 
Ph.D. thesis presentation
Ph.D. thesis presentationPh.D. thesis presentation
Ph.D. thesis presentation
 
Gluster Blog 11.15.2010
Gluster Blog 11.15.2010Gluster Blog 11.15.2010
Gluster Blog 11.15.2010
 
Cache-partitioning
Cache-partitioningCache-partitioning
Cache-partitioning
 
My sql with enterprise storage
My sql with enterprise storageMy sql with enterprise storage
My sql with enterprise storage
 
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...
Dell Solutions Tour 2015 - Azure i ditt eget datasenter, Kristian Nese, CTO L...
 
Embracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from AlibabaEmbracing Open Source: Practice and Experience from Alibaba
Embracing Open Source: Practice and Experience from Alibaba
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
Virtualization Primer for Java Developers
Virtualization Primer for Java DevelopersVirtualization Primer for Java Developers
Virtualization Primer for Java Developers
 
Severalnines Training: MySQL Cluster - Part X
Severalnines Training: MySQL Cluster - Part XSeveralnines Training: MySQL Cluster - Part X
Severalnines Training: MySQL Cluster - Part X
 
LVS development and experience
LVS development and experienceLVS development and experience
LVS development and experience
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
 
Apache Hadoop on Virtual Machines
Apache Hadoop on Virtual MachinesApache Hadoop on Virtual Machines
Apache Hadoop on Virtual Machines
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 

Ähnlich wie Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks

Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Open Stack
 
Tailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual MachinesTailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual MachinesThe Linux Foundation
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptxSwarnaSLcse
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualizationSisimon Soman
 
Future of cloud storage
Future of cloud storageFuture of cloud storage
Future of cloud storageGlusterFS
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++Hortonworks
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012StampedeCon
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaYahoo Developer Network
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationCeph Community
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfsNAVER D2
 
Severalnines Self-Training: MySQL® Cluster - Part VIII
Severalnines Self-Training: MySQL® Cluster - Part VIIISeveralnines Self-Training: MySQL® Cluster - Part VIII
Severalnines Self-Training: MySQL® Cluster - Part VIIISeveralnines
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 

Ähnlich wie Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks (20)

March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
Gluster open stack dev summit 042011
Gluster open stack dev summit 042011Gluster open stack dev summit 042011
Gluster open stack dev summit 042011
 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Tailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual MachinesTailoring NAS Proxies for Virtual Machines
Tailoring NAS Proxies for Virtual Machines
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptx
 
Index file
Index fileIndex file
Index file
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
VDI storage and storage virtualization
VDI storage and storage virtualizationVDI storage and storage virtualization
VDI storage and storage virtualization
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Future of cloud storage
Future of cloud storageFuture of cloud storage
Future of cloud storage
 
HDFS Federation++
HDFS Federation++HDFS Federation++
HDFS Federation++
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012Facebook's HBase Backups - StampedeCon 2012
Facebook's HBase Backups - StampedeCon 2012
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
[B4]deview 2012-hdfs
[B4]deview 2012-hdfs[B4]deview 2012-hdfs
[B4]deview 2012-hdfs
 
Severalnines Self-Training: MySQL® Cluster - Part VIII
Severalnines Self-Training: MySQL® Cluster - Part VIIISeveralnines Self-Training: MySQL® Cluster - Part VIII
Severalnines Self-Training: MySQL® Cluster - Part VIII
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 

Kürzlich hochgeladen (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks

  • 2. Agenda • HDFS Background • Current Limitations • Federation Architecture • Federation Details • Next Steps
  • 3. HDFS Architecture Two main layers • Namespace Block Storage Namespace › Consists of dirs, files and blocks Namenode NS › Supports create, delete, modify and list files or dirs operations Block Management • Block Storage Datanode … Datanode › Block Management Storage • Datanode cluster membership • Supports create/delete/modify/get block location operations • Manages replication and replica placement › Storage - provides read and write access to blocks 3
  • 4. HDFS Architecture… Implemented as Block Storage Namespace Namenode • Single Namespace Volume NS › Namespace Volume = Namespace + Block Management Blocks • Single namenode with a namespace Datanode … Datanode › Entire namespace is in memory Storage › Provides Block Management • Datanodes store block replicas › Block files stored on local file system 4
  • 5. Limitation - Scalability Scalability • Storage scales horizontally - namespace doesn’t • Limited number of files, dirs and blocks › 250 million files and blocks at 64GB Namenode heap size • Still a very large cluster • Facebook clusters are sized at ~70 PB storage Performance • File system operations throughput limited by a single node › 120K read ops/sec and 6000 write ops/sec • Easily scalable to 20K write ops/sec by code improvements 5
  • 6. Limitation - Isolation Poor Isolation • All the tenants share a single namespace › Separate volume for tenants is desirable • Lacks separate namespace for different application categories or application requirements › Experimental apps can affect production apps › Example - HBase could use its own namespace 6
  • 7. Limitation – Tight Coupling Namespace and Block Management are distinct services • Tightly coupled due to co-location • Scaling block management independent of namespace is simpler • Simplifies Namespace and scaling it Block Storage could be a generic service • Namespace is one of the applications to use the service • Other services can be built directly on Block Storage › HBase › Foreign namespaces 7
  • 8. Isolation is a problem for even small clusters 8
  • 9. HDFS Federation Namespace NN-1 NN-k NN-n Foreig NS1 NS k n NS n ... ... Pool 1 Pool k Pool n Block Storage Block Pools Datanode 1 Datanode 2 Datanode m ... ... ... Common Storage • Multiple independent Namenodes and Namespace Volumes in a cluster › Namespace Volume = Namespace + Block Pool • Block Storage as generic storage service › Set of blocks for a Namespace Volume is called a Block Pool › DNs store blocks for all the Namespace Volumes – no partitioning
  • 10. Key Ideas & Benefits • Distributed Namespace: Partitioned across namenodes Alternate NN › Simple and Robust due to independent masters Implementation HBase HDFS • Each master serves a namespace volume Namespace MR tmp • Preserves namenode stability – little namenode code change › Scalability – 6K nodes, 100K tasks, 200PB and 1 billion files • Block Pools enable generic storage service Storage Service › Enables Namespace Volumes to be independent of each other › Fuels innovation and Rapid development • New implementations of file systems and Applications on top of block storage possible • New block pool categories – tmp storage, distributed cache, small object storage • In future, move Block Management out of namenode to separate set of nodes › Simplifies namespace/application implementation • Distributed namenode becomes significantly simpler
  • 11. HDFS Federation Details • Simple design › Little change to the Namenode, most changes in Datanode, Config and Tools › Core development in 4 months › Namespace and Block Management remain in Namenode • Block Management could be moved out of namenode in the future • Little impact on existing deployments › Single namenode configuration runs as is • Datanodes provide storage services for all the namenodes › Register with all the namenodes › Send periodic heartbeats and block reports to all the namenodes › Send block received/deleted for a block pool to corresponding namenode 11
  • 12. HDFS Federation Details… • Cluster Web UI for better manageability › Provides cluster summary › Namenode list and summary of namenode status › Decommissioning status • Tools › Decommissioning works with multiple namespace › Balancer works with multiple namespaces • Both Datanode storage or Block Pool storage can be balanced • Namenode can be added/deleted in Federated cluster › No need to restart the cluster • Single configuration for all the nodes in the cluster
  • 13. Managing Namespaces • Federation has multiple namespaces – Don’t Client-side you need a single global namespace? / mount-table › Key is to share the data and the names used to access the data • A global namespace is one way to do that data project home tmp • Client-side mount table is another way to share. › Shared mount-table => “global” shared view › Personalized mount-table => per-application NS4 view • Share the data that matter by mounting it • Client-side implementation of mount tables NS1 NS2 NS3 › No single point of failure › No hotspot for root and top level directories
  • 14. Next Steps • Complete separation of namespace and block management layers › Block storage as generic service • Partial namespace in memory for further scalability • Move partial namespace from one namenode to another › Namespace operation - no data copy
  • 15. Next Steps… • Namenode as a container for namespaces › Lots of small namespaces • Chosen per user/tenant/data feed • Mount tables for unified namespace • Can be managed by a central volume server … Namenodes › Move namespace from one container to another for balancing Datanode … Datanode • Combined with partial namespace › Choose number of namenodes to match • Sum of (Namespace working set) • Sum of (Namespace throughput) 15
  • 16. Thank You More information 1. HDFS-1052 – HDFS Scalability with multiple namenodes 2. An Introduction to HDFS Federation – https://hortonworks.com/an-introduction-to-hdfs-federation/