SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Hadoop Cluster Management



                 Dheeraj Kapur
                 Principal Engineer, Yahoo!
                 dheerajk@yahoo-inc.com
What it is…
§  Workflow based system for cluster management.
§  Completely modular & distributed design.
§  Has its own JMX based library(can be used to monitor other
    services on cluster).
§  Fully controllable from WebUI.
§  Has command line utility for adhoc administration.




                                    2
What it does…
§    Manage clusters.
§    Break fixing.
§    Upgrades OS seamlessly.
§    Consistency/efficiency of clusters.
§    Proactive self-healing Model.
§    User Management.




                                       3
Manage Clusters
§    Its has well defined workflow to manage clusters.
§    No/Minimal human intervention required.
§    Keep up efficiency of cluster.
§    Keep track of Missing/Bad blocks on system.
§    Well defined WebUI and Command line utility




                                      4
System Overview




                  5
Workflow




           6
Contd..




          7
Command Line Utility




                   8
Web Interface




                9
Web Interface contd…




                  10
fixing bad/mal-performing nodes
These errors can lead to SLA miss or Job failures
§  Takes care of Blacklisted JT nodes.
§  Errors like high load average, wrong network speed.
§  Parse system logs at X frequency (thru workflows) and look for
    patterns.
§  Visit each node multiple times in a day and check health of node.




                                   11
Upgrade OS
§  Upgrade & rollback OS seamlessly.
§  Upgrading on production, heavily used clusters.




                                   12
Consistency & efficiency of clusters
§  Keep track of cluster MR capacity
§  Proactive Fixing of sick nodes, which can cause potential issues.




                                   13
Introducing Proactive self-healing system

Let me set the ground for it.
§  Wounded hosts Called Set A - Hosts having issues, but still in service
    (with degraded services), Which can cause potential SLA misses and
    job execution issues.(which we have seen in past)
§  Fractured Hosts Called Set B - Hosts already in Break fix cycle and
    getting fixed
§  All grid hosts Called Set X - all grid hosts healthy + fine
§  Set A & B are sub-set of set X
§  to find wounded hosts we have to scan entire infrastructure once a
    day.
§  Calculate Symmetric difference b/w Set A & B, we will get actual
    wounded hosts needs service.




                                    14
Proactive self-healing contd….
                  All Grid Hosts - X




                  Set A    Set B




                          15
Proactive self-healing contd….




                   16
User Management
§  We have one of the most complex and secure environment.
§  User access and management is a complex task, due to the
    number of users, security constraints and complexity involved in
    provisioning access.
§  Single request provisioning requires change at multiple places.
§  Well defined workflow based system, where 100% automation is
    achieved.
§  Great help during system audit and compliance.




                                   17
Q&A




 18
Thank You



    19
Sessions will resume at 4:30pm




                             Page 20

Weitere ähnliche Inhalte

Was ist angesagt?

PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALEPostgreSQL Experts, Inc.
 
Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Sameer Kumar
 
MySQL's new Secure by Default Install -- All Things Open October 20th 2015
MySQL's new Secure by Default Install -- All Things Open October 20th 2015MySQL's new Secure by Default Install -- All Things Open October 20th 2015
MySQL's new Secure by Default Install -- All Things Open October 20th 2015Dave Stokes
 
Windows Server 2012 R2 Hyper-V Replica
Windows Server 2012 R2 Hyper-V ReplicaWindows Server 2012 R2 Hyper-V Replica
Windows Server 2012 R2 Hyper-V ReplicaRavikanth Chaganti
 
Ora10g Rac Best Practices
Ora10g Rac Best PracticesOra10g Rac Best Practices
Ora10g Rac Best Practicesvasanthkp
 
My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backupAndrejs Vorobjovs
 
Built-in Replication in PostgreSQL
Built-in Replication in PostgreSQLBuilt-in Replication in PostgreSQL
Built-in Replication in PostgreSQLMasao Fujii
 
What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3EDB
 
MySQL Backup and Recovery Essentials
MySQL Backup and Recovery EssentialsMySQL Backup and Recovery Essentials
MySQL Backup and Recovery EssentialsRonald Bradford
 
Online MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupOnline MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupKenny Gryp
 
Sql server 2012 ha dr nova
Sql server 2012 ha dr novaSql server 2012 ha dr nova
Sql server 2012 ha dr novaJoseph D'Antoni
 
PGPool-II Load testing
PGPool-II Load testingPGPool-II Load testing
PGPool-II Load testingEDB
 
PostgreSQL Scaling And Failover
PostgreSQL Scaling And FailoverPostgreSQL Scaling And Failover
PostgreSQL Scaling And FailoverJohn Paulett
 
Essential Linux Commands for DBAs
Essential Linux Commands for DBAsEssential Linux Commands for DBAs
Essential Linux Commands for DBAsGokhan Atil
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationKenny Gryp
 
Sql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_finalSql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_finalJoseph D'Antoni
 

Was ist angesagt? (20)

PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3
 
MySQL's new Secure by Default Install -- All Things Open October 20th 2015
MySQL's new Secure by Default Install -- All Things Open October 20th 2015MySQL's new Secure by Default Install -- All Things Open October 20th 2015
MySQL's new Secure by Default Install -- All Things Open October 20th 2015
 
Windows Server 2012 R2 Hyper-V Replica
Windows Server 2012 R2 Hyper-V ReplicaWindows Server 2012 R2 Hyper-V Replica
Windows Server 2012 R2 Hyper-V Replica
 
GlassFish v2 Clustering
GlassFish v2 ClusteringGlassFish v2 Clustering
GlassFish v2 Clustering
 
Ora10g Rac Best Practices
Ora10g Rac Best PracticesOra10g Rac Best Practices
Ora10g Rac Best Practices
 
My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backup
 
SQL Server vs Postgres
SQL Server vs PostgresSQL Server vs Postgres
SQL Server vs Postgres
 
Built-in Replication in PostgreSQL
Built-in Replication in PostgreSQLBuilt-in Replication in PostgreSQL
Built-in Replication in PostgreSQL
 
What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3
 
MySQL Backup and Recovery Essentials
MySQL Backup and Recovery EssentialsMySQL Backup and Recovery Essentials
MySQL Backup and Recovery Essentials
 
Online MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupOnline MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackup
 
Sql server 2012 ha dr nova
Sql server 2012 ha dr novaSql server 2012 ha dr nova
Sql server 2012 ha dr nova
 
PGPool-II Load testing
PGPool-II Load testingPGPool-II Load testing
PGPool-II Load testing
 
PostgreSQL Scaling And Failover
PostgreSQL Scaling And FailoverPostgreSQL Scaling And Failover
PostgreSQL Scaling And Failover
 
Essential Linux Commands for DBAs
Essential Linux Commands for DBAsEssential Linux Commands for DBAs
Essential Linux Commands for DBAs
 
ProxySQL para mysql
ProxySQL para mysqlProxySQL para mysql
ProxySQL para mysql
 
MySQL Tuning
MySQL TuningMySQL Tuning
MySQL Tuning
 
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group ReplicationPercona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
 
Sql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_finalSql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_final
 

Andere mochten auch

Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급hslkdfjs
 
New Use Cases for DAM in the Enterprise
New Use Cases for DAM in the EnterpriseNew Use Cases for DAM in the Enterprise
New Use Cases for DAM in the EnterpriseNuxeo
 
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB Schemas
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB SchemasRemaining Agile with Billions of Documents: Appboy and Creative MongoDB Schemas
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB SchemasMongoDB
 
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기설리번 프로젝트
 
Tailings dump recovery concept
Tailings dump recovery conceptTailings dump recovery concept
Tailings dump recovery conceptphillip shambare
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezJan Pieter Posthuma
 
GIS for Infrastructure Management
GIS for Infrastructure ManagementGIS for Infrastructure Management
GIS for Infrastructure ManagementDavid Puckett
 
Real-time, Sensor-based Monitoring of Shipping Containers
Real-time, Sensor-based Monitoring of Shipping ContainersReal-time, Sensor-based Monitoring of Shipping Containers
Real-time, Sensor-based Monitoring of Shipping Containersbenaam
 
Designing your Product as a Platform
Designing your Product as a PlatformDesigning your Product as a Platform
Designing your Product as a PlatformMicah Laaker
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Ed Kohlwey
 
Airport Billing System for Aviation and Non-Aviation Services
Airport Billing System for Aviation and Non-Aviation Services Airport Billing System for Aviation and Non-Aviation Services
Airport Billing System for Aviation and Non-Aviation Services Ericsson
 
Web Services Automated Testing via SoapUI Tool
Web Services Automated Testing via SoapUI ToolWeb Services Automated Testing via SoapUI Tool
Web Services Automated Testing via SoapUI ToolSperasoft
 
Spend Analysis In 60 Seconds
Spend Analysis In 60 SecondsSpend Analysis In 60 Seconds
Spend Analysis In 60 SecondsClaritum
 
Surgical induced astigmatism
Surgical induced astigmatismSurgical induced astigmatism
Surgical induced astigmatismNamrata Gupta
 
Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Blackbaud Pacific
 

Andere mochten auch (20)

Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급
빠르고쉬운대출『LG777』.『XYZ』무자본창업 운전자보험만기환급
 
New Use Cases for DAM in the Enterprise
New Use Cases for DAM in the EnterpriseNew Use Cases for DAM in the Enterprise
New Use Cases for DAM in the Enterprise
 
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB Schemas
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB SchemasRemaining Agile with Billions of Documents: Appboy and Creative MongoDB Schemas
Remaining Agile with Billions of Documents: Appboy and Creative MongoDB Schemas
 
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기
01_HTML - 작심10시간! 나만의 웹사이트 기획하고 만들기
 
Tailings dump recovery concept
Tailings dump recovery conceptTailings dump recovery concept
Tailings dump recovery concept
 
Polymer optical fibers
Polymer optical fibersPolymer optical fibers
Polymer optical fibers
 
SAP Cloud for Service
SAP Cloud for ServiceSAP Cloud for Service
SAP Cloud for Service
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
GIS for Infrastructure Management
GIS for Infrastructure ManagementGIS for Infrastructure Management
GIS for Infrastructure Management
 
Real-time, Sensor-based Monitoring of Shipping Containers
Real-time, Sensor-based Monitoring of Shipping ContainersReal-time, Sensor-based Monitoring of Shipping Containers
Real-time, Sensor-based Monitoring of Shipping Containers
 
Chem Lab Report (1)
Chem Lab Report (1)Chem Lab Report (1)
Chem Lab Report (1)
 
Designing your Product as a Platform
Designing your Product as a PlatformDesigning your Product as a Platform
Designing your Product as a Platform
 
Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?Hadoop & Greenplum: Why Do Such a Thing?
Hadoop & Greenplum: Why Do Such a Thing?
 
High-Density Wireless Networks for Auditoriums
High-Density Wireless Networks for AuditoriumsHigh-Density Wireless Networks for Auditoriums
High-Density Wireless Networks for Auditoriums
 
Airport Billing System for Aviation and Non-Aviation Services
Airport Billing System for Aviation and Non-Aviation Services Airport Billing System for Aviation and Non-Aviation Services
Airport Billing System for Aviation and Non-Aviation Services
 
Web Services Automated Testing via SoapUI Tool
Web Services Automated Testing via SoapUI ToolWeb Services Automated Testing via SoapUI Tool
Web Services Automated Testing via SoapUI Tool
 
Spend Analysis In 60 Seconds
Spend Analysis In 60 SecondsSpend Analysis In 60 Seconds
Spend Analysis In 60 Seconds
 
Surgical induced astigmatism
Surgical induced astigmatismSurgical induced astigmatism
Surgical induced astigmatism
 
Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...Best practice strategies to clean up and maintain your database with Hether G...
Best practice strategies to clean up and maintain your database with Hether G...
 

Ähnlich wie Hadoop Cluster Management

Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13Dave Gardner
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql ClusterAmr Fawzy
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
VMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld
 
1 introduction
1 introduction1 introduction
1 introductionMohd Arif
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Nonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinNonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinTechWell
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudVinay Kumar Chella
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaSShawn Zhu
 
Resilience Testing
Resilience Testing Resilience Testing
Resilience Testing Ran Levy
 
Ansible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeAnsible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeSarah Z
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msIlya Ganelin
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Spark Summit
 
IaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdfIaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdfDharavathRamesh2
 

Ähnlich wie Hadoop Cluster Management (20)

Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
 
Highly Available Load Balanced Galera MySql Cluster
Highly Available Load Balanced  Galera MySql ClusterHighly Available Load Balanced  Galera MySql Cluster
Highly Available Load Balanced Galera MySql Cluster
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
VMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing DatabasesVMworld 2014: Virtualizing Databases
VMworld 2014: Virtualizing Databases
 
1 introduction
1 introduction1 introduction
1 introduction
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin Orchestrate
 
Nonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinNonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the Coin
 
How netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloudHow netflix manages petabyte scale apache cassandra in the cloud
How netflix manages petabyte scale apache cassandra in the cloud
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaS
 
Resilience Testing
Resilience Testing Resilience Testing
Resilience Testing
 
Bishwambar Linux Admin
Bishwambar Linux AdminBishwambar Linux Admin
Bishwambar Linux Admin
 
Ansible for networks
Ansible for networksAnsible for networks
Ansible for networks
 
Ansible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less CoffeeAnsible: How to Get More Sleep and Require Less Coffee
Ansible: How to Get More Sleep and Require Less Coffee
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
Accelerating Spark Genome Sequencing in Cloud—A Data Driven Approach, Case St...
 
IaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdfIaaS - Virtualization_Cambridge.pdf
IaaS - Virtualization_Cambridge.pdf
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Kürzlich hochgeladen (20)

Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Hadoop Cluster Management

  • 1. Hadoop Cluster Management Dheeraj Kapur Principal Engineer, Yahoo! dheerajk@yahoo-inc.com
  • 2. What it is… §  Workflow based system for cluster management. §  Completely modular & distributed design. §  Has its own JMX based library(can be used to monitor other services on cluster). §  Fully controllable from WebUI. §  Has command line utility for adhoc administration. 2
  • 3. What it does… §  Manage clusters. §  Break fixing. §  Upgrades OS seamlessly. §  Consistency/efficiency of clusters. §  Proactive self-healing Model. §  User Management. 3
  • 4. Manage Clusters §  Its has well defined workflow to manage clusters. §  No/Minimal human intervention required. §  Keep up efficiency of cluster. §  Keep track of Missing/Bad blocks on system. §  Well defined WebUI and Command line utility 4
  • 11. fixing bad/mal-performing nodes These errors can lead to SLA miss or Job failures §  Takes care of Blacklisted JT nodes. §  Errors like high load average, wrong network speed. §  Parse system logs at X frequency (thru workflows) and look for patterns. §  Visit each node multiple times in a day and check health of node. 11
  • 12. Upgrade OS §  Upgrade & rollback OS seamlessly. §  Upgrading on production, heavily used clusters. 12
  • 13. Consistency & efficiency of clusters §  Keep track of cluster MR capacity §  Proactive Fixing of sick nodes, which can cause potential issues. 13
  • 14. Introducing Proactive self-healing system Let me set the ground for it. §  Wounded hosts Called Set A - Hosts having issues, but still in service (with degraded services), Which can cause potential SLA misses and job execution issues.(which we have seen in past) §  Fractured Hosts Called Set B - Hosts already in Break fix cycle and getting fixed §  All grid hosts Called Set X - all grid hosts healthy + fine §  Set A & B are sub-set of set X §  to find wounded hosts we have to scan entire infrastructure once a day. §  Calculate Symmetric difference b/w Set A & B, we will get actual wounded hosts needs service. 14
  • 15. Proactive self-healing contd…. All Grid Hosts - X Set A Set B 15
  • 17. User Management §  We have one of the most complex and secure environment. §  User access and management is a complex task, due to the number of users, security constraints and complexity involved in provisioning access. §  Single request provisioning requires change at multiple places. §  Well defined workflow based system, where 100% automation is achieved. §  Great help during system audit and compliance. 17
  • 19. Thank You 19
  • 20. Sessions will resume at 4:30pm Page 20