SlideShare ist ein Scribd-Unternehmen logo
1 von 20
HDFS Router-based
Federation
Íñigo Goiri
Microsoft
Chao Sun
Uber
High-level HDFS setup
Microsoft
• 3 data centers
• 2 more provisioning
• Harvested capacity [OSDI’16]
• Hadoop 2.7.1 (migrating to 2.9)
• Many internal extensions
• Internal batch workloads
Uber
• 3 data centers
• 4th is coming soon
• Thousands of servers
• 100+ PB of data
• Hadoop 2.8.2
• Many custom and backported patches
• Serves applications
• Fraud detection, ML/DL, ETA calculation…
• Foundation for Uber’s data lake
Current architecture in HDFS
• NameNode scalability/performance issues
• Split into independent namespaces (subclusters)
• Fragmentation
• Users in charge of choosing subcluster
• ViewFS to unify subclusters
Subcluster 0
NN NN
DN DN DN
Subcluster 1
NN NN
DN DN DN
Subcluster 2
NN NN
DN DN DN
Mount
Table
Setup at Uber
• Use client-side ViewFS config
• Split into 3 clusters (each DC)
• Main production HDFS cluster
• Hbase cluster
• Tmp cluster (Hive scratch directory, Yarn application logs, etc).
Problems with ViewFS
• Hard to maintain per-client mount table
• Many HDFS clients, libraries, configurations,…
• Each client may have its own mount table
• Requires infrastructure to distribute mount table updates
• Adding more datacenters
• Manual balancing of subclusters
• Solution
• “Centralize” the mount table
• Introduce a routing layer: Router-Based Federation (RBF)
Our solution: RBF
• “Centralize” the mount table
• Introduce a routing layer: Router-Based Federation (RBF)
• Router: Proxy requests from users to the right NameNode
• State Store: Shared mount table and other state
Federation layer
Subcluster 0
NN NN
DN DN DN
Subcluster 1
NN NN
DN DN DN
Subcluster N
NN NN
DN DN DN
Router State Store
Router
• Unified view of the federation
• Performance improvements
• Caching State Store data
• Includes modified client to contact NameNodes
Subcluster 0
NN NN
DN DN DN
R
State Store
R
Subcluster 1
NN NN
DN DN DN
RR
Subcluster 2
NN NN
DN DN DN
RR
State Store
• Shared information
• Mount table: /DC0-2  DC0-2, /
• Federation membership: DC0, DC0-1, DC0-2, DC0-3
• Router tracking: R1, R2, R11, R22,…
• Not critical for performance (cached in each Router)
• Implementations
• ZooKeeper, HDFS, SQL server,…
HDFS RBF deployments
Microsoft
• 1 Router per Namenode
• All data centers
• ZooKeeper as the State Store
• Main workloads
• Large deep learning
• Application logs
Uber
• 2 routers
• 1 data center
• ZooKeeper as the State Store
• Workload
• Hive traffic accessing scratch dir
Microsoft RBF deployment
• 23k servers
• 8 subclusters
• 28 NameNodes (4 per subcluster)
• 28 Routers: load balancing
• Transparent access
• RPC
• WebHDFS
• Web UI
Router adds latency
Client Router Server Core (e.g., access SStore) NamenodeRouter Client
Handler Queue
Pre-process
Open
Connection
Socket
Queue
Post-process NN
Latency
Explicitly
measured
Implicitly
estimated
Performance and scalability
NN vs RNN 4NN vs 12R  4NN
0
1
2
3
4
5
6
7
8
9
10
0 10000 20000 30000 40000 50000
Latency(ms)
Requests/sec
With Router
NN
0
1
2
3
4
5
6
7
8
9
10
0 50000 100000 150000 200000
Latency(ms)
Requests/sec
12 Routers 4 NNs
4 NNs
Read metadata (cheapest access  worst case for Router)
Tuning Router RPC client
• Router uses a connection pool for each <user, namenode> pair
• Connection creation/cleanup is done asynchronously
• Hard to configure: Connection pool size, creator queue size,…
• Default IPC client is a bottleneck
• Router forwarding many requests to NameNodes
• HADOOP-13144 allow multiple concurrent client/connections
• HDFS-13274 leverage concurrent connections
Active development
• Many contributors
• VipShop, Uber, Huawei,…
• Bug fixes
• New features:
• WebHDFS
• Better subcluster isolation
• Federated quotas
• Tracking the state of the Routers
• Spreading mount points across subclusters
Spreading mount points across subclusters
• Large jobs processing data spanning multiple subclusters
• Similar to merge mount points (HADOOP-8298)
• Policies to decide the subcluster to write a file
• Consistent hashing, available space, local subcluster,…
• Limitations
• Needs to find files in multiple subclusters
• Renaming has some inefficiencies
• Subcluster unavailability is hard to manage
Open issues
• Router  Namenode connections (HDFS-13274)
• Security (HDFS-13532)
• Manage delegation tokens
• LinkedIn driving this effort
• Rebalancer (HDFS-13123)
• inodes (HDFS-13469)
• We need to identify the nameservice
• Other issues tracked in HDFS-12615
Rebalancer
• Subclusters can get unbalanced
• Load, space,…
• Manually move files/folders between subclusters
• Hard to provide strong consistency
• Transparent data movement between subclusters
• Service to monitor if subclusters balanced
• Policies to decide what to move and trigger rebalancing [ATC’17]
• Rebalancer
• Move data: DistCp, Tiered storage, DN block linking,…
• Change namespace (mount table) consistently
Uber future plan: overview
Uber future plan: detailed
• Routers on top of all HDFS clusters
• Service to initiate data movement between hot and warm clusters
• Similar to Rebalancer, implemented in Golang
• Service to track dataset temperature and initiate data movement
• Observer (ReadOnly) NameNodes to serve stale read requests
• HDFS-12943
• > 90% traffic are read RPC requests
• Mostly for Presto queries
New ideas/brainstorming
• Federating federations
1. Creating a hierarchy: putting Routers in front of other Routers
2. Large federation: sharing State Store across deployments
• Connecting DataNodes to RBF
• DNs need to be configured to join subclusters
• Can DNs use the Routers to join subclusters?
1. Heartbeat through the Router
2. Find the Datanodes through the Router

Weitere ähnliche Inhalte

Mehr von DataWorks Summit

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsDataWorks Summit
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesDataWorks Summit
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentDataWorks Summit
 
Big Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteBig Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteDataWorks Summit
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraDataWorks Summit
 
Free Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachFree Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachDataWorks Summit
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsDataWorks Summit
 

Mehr von DataWorks Summit (20)

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart Cities
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake Environment
 
Big Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteBig Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science Institute
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native Era
 
Free Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachFree Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s Approach
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management Things
 

KĂźrzlich hochgeladen

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vĂĄzquez
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĂşjo
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

KĂźrzlich hochgeladen (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

HDFS router based federation

  • 2. High-level HDFS setup Microsoft • 3 data centers • 2 more provisioning • Harvested capacity [OSDI’16] • Hadoop 2.7.1 (migrating to 2.9) • Many internal extensions • Internal batch workloads Uber • 3 data centers • 4th is coming soon • Thousands of servers • 100+ PB of data • Hadoop 2.8.2 • Many custom and backported patches • Serves applications • Fraud detection, ML/DL, ETA calculation… • Foundation for Uber’s data lake
  • 3. Current architecture in HDFS • NameNode scalability/performance issues • Split into independent namespaces (subclusters) • Fragmentation • Users in charge of choosing subcluster • ViewFS to unify subclusters Subcluster 0 NN NN DN DN DN Subcluster 1 NN NN DN DN DN Subcluster 2 NN NN DN DN DN Mount Table
  • 4. Setup at Uber • Use client-side ViewFS config • Split into 3 clusters (each DC) • Main production HDFS cluster • Hbase cluster • Tmp cluster (Hive scratch directory, Yarn application logs, etc).
  • 5. Problems with ViewFS • Hard to maintain per-client mount table • Many HDFS clients, libraries, configurations,… • Each client may have its own mount table • Requires infrastructure to distribute mount table updates • Adding more datacenters • Manual balancing of subclusters • Solution • “Centralize” the mount table • Introduce a routing layer: Router-Based Federation (RBF)
  • 6. Our solution: RBF • “Centralize” the mount table • Introduce a routing layer: Router-Based Federation (RBF) • Router: Proxy requests from users to the right NameNode • State Store: Shared mount table and other state Federation layer Subcluster 0 NN NN DN DN DN Subcluster 1 NN NN DN DN DN Subcluster N NN NN DN DN DN Router State Store
  • 7. Router • Unified view of the federation • Performance improvements • Caching State Store data • Includes modified client to contact NameNodes Subcluster 0 NN NN DN DN DN R State Store R Subcluster 1 NN NN DN DN DN RR Subcluster 2 NN NN DN DN DN RR
  • 8. State Store • Shared information • Mount table: /DC0-2  DC0-2, / • Federation membership: DC0, DC0-1, DC0-2, DC0-3 • Router tracking: R1, R2, R11, R22,… • Not critical for performance (cached in each Router) • Implementations • ZooKeeper, HDFS, SQL server,…
  • 9. HDFS RBF deployments Microsoft • 1 Router per Namenode • All data centers • ZooKeeper as the State Store • Main workloads • Large deep learning • Application logs Uber • 2 routers • 1 data center • ZooKeeper as the State Store • Workload • Hive traffic accessing scratch dir
  • 10. Microsoft RBF deployment • 23k servers • 8 subclusters • 28 NameNodes (4 per subcluster) • 28 Routers: load balancing • Transparent access • RPC • WebHDFS • Web UI
  • 11. Router adds latency Client Router Server Core (e.g., access SStore) NamenodeRouter Client Handler Queue Pre-process Open Connection Socket Queue Post-process NN Latency Explicitly measured Implicitly estimated
  • 12. Performance and scalability NN vs RNN 4NN vs 12R  4NN 0 1 2 3 4 5 6 7 8 9 10 0 10000 20000 30000 40000 50000 Latency(ms) Requests/sec With Router NN 0 1 2 3 4 5 6 7 8 9 10 0 50000 100000 150000 200000 Latency(ms) Requests/sec 12 Routers 4 NNs 4 NNs Read metadata (cheapest access  worst case for Router)
  • 13. Tuning Router RPC client • Router uses a connection pool for each <user, namenode> pair • Connection creation/cleanup is done asynchronously • Hard to configure: Connection pool size, creator queue size,… • Default IPC client is a bottleneck • Router forwarding many requests to NameNodes • HADOOP-13144 allow multiple concurrent client/connections • HDFS-13274 leverage concurrent connections
  • 14. Active development • Many contributors • VipShop, Uber, Huawei,… • Bug fixes • New features: • WebHDFS • Better subcluster isolation • Federated quotas • Tracking the state of the Routers • Spreading mount points across subclusters
  • 15. Spreading mount points across subclusters • Large jobs processing data spanning multiple subclusters • Similar to merge mount points (HADOOP-8298) • Policies to decide the subcluster to write a file • Consistent hashing, available space, local subcluster,… • Limitations • Needs to find files in multiple subclusters • Renaming has some inefficiencies • Subcluster unavailability is hard to manage
  • 16. Open issues • Router  Namenode connections (HDFS-13274) • Security (HDFS-13532) • Manage delegation tokens • LinkedIn driving this effort • Rebalancer (HDFS-13123) • inodes (HDFS-13469) • We need to identify the nameservice • Other issues tracked in HDFS-12615
  • 17. Rebalancer • Subclusters can get unbalanced • Load, space,… • Manually move files/folders between subclusters • Hard to provide strong consistency • Transparent data movement between subclusters • Service to monitor if subclusters balanced • Policies to decide what to move and trigger rebalancing [ATC’17] • Rebalancer • Move data: DistCp, Tiered storage, DN block linking,… • Change namespace (mount table) consistently
  • 18. Uber future plan: overview
  • 19. Uber future plan: detailed • Routers on top of all HDFS clusters • Service to initiate data movement between hot and warm clusters • Similar to Rebalancer, implemented in Golang • Service to track dataset temperature and initiate data movement • Observer (ReadOnly) NameNodes to serve stale read requests • HDFS-12943 • > 90% traffic are read RPC requests • Mostly for Presto queries
  • 20. New ideas/brainstorming • Federating federations 1. Creating a hierarchy: putting Routers in front of other Routers 2. Large federation: sharing State Store across deployments • Connecting DataNodes to RBF • DNs need to be configured to join subclusters • Can DNs use the Routers to join subclusters? 1. Heartbeat through the Router 2. Find the Datanodes through the Router