SlideShare ist ein Scribd-Unternehmen logo
1 von 28
Deployment and Management
of Hadoop Clusters
Amal G Jose
Big Data Analytics
http://www.coderfox.com/
http://amalgjose.wordpress.com/
in.linkedin.com/in/amalgjose/
• Introduction
• Cluster design and deployment
• Backup and Recovery
• Hadoop Upgrade
• Routine Administration Tasks
Agenda
Introduction
• What is Hadoop ?
• What makes Hadoop different ?
• Need for a hadoop cluster ?
This has 4 parts:
• Cluster Planning.
• OS installation & Hardening.
• Cluster Software Installation.
• Cluster configuration.
Cluster Installation
Cluster Planning
Hadoop Daemon Configuration
Namenode Dedicated servers.
OS is installed on the RAID device.
The dfs.name.dir will reside on the
same RAID device. One more copy is
configured to have on NFS.
Secondary Namenode Dedicated Server
OS is installed on RAID device
Jobtracker Dedicated Server.
OS installed on JBOD configuration
Datanode/Tasktracker Individual servers.
OS installed on JBOD configuration
Workload Patterns For Hadoop
• Balanced Workload
• Compute Intensive
• I/O Intensive
• Unknown or evolving workload patterns
Cluster Topology
Name Node
Job Tracker
Ganglia-Daemon
Name Node
Job Tracker
Ganglia-Daemon
MN
Hive
Pig
Oozie
Mahout
Ganglia-Master
Hive
Pig
Oozie
Mahout
Ganglia-Master
CN
Typical Hadoop Cluster Topology
Task Tracker
Data Node
Ganglia-Daemon
Task Tracker
Data Node
Ganglia-Daemon
SN
• Creating the instances based on the
requirement
Creating Instances (in case of cloud)
• We will be installing the Hadoop on the RHEL6 64-
bit servers.
• OS should be hardened based on RHEL6
hardening document.
• Setting iptable rules necessary for hadoop
services.
• In case of Amazon EC2 instances create
key/value pairs for logging in.
• GUI can be disabled to make more room for
hadoop.
• Time should be made same in all the servers.
Operating System Hardening
• Choosing the distribution of Hadoop.
• Creation of Local Yum Repository.
• Java Installation in all the machines.
Cluster Software Installation
Hadoop Ecosystem
Installation Methods
• Hadoop can be installed either manually
or automatically using some tools such as
ClouderaManager, Ambari etc.
• One click installation tools helps the users
to install hadoop on clusters without any
pain.
Manual Installation
• Install hadoop daemons in the nodes.
• We can either use tarball or rpm for
installation.
• rpm installation will be easier.
Setting up Client Node
• What is client node ?
• Necessity of a client node ?
• How to configure a client node ?
• What all services are installed ?
• Need for multiuser segregation ?
Cluster Configuration
• Storage location for namenode,
secondarynamenode and datanode.
• Number of task slots (map/reduce slots).
– Number of task slots/node = (memory
available/child jvm size)
• Backup location for namenode.
• Configuring mysql for hive and oozie.
Namenode - Single point of
Failure
• Why namenode is the single point of
failure?
• How to resolve this issue?
• How backup can be achieved?
Implementing Schedulers
• Capacity scheduler
• Fair scheduler
Monitoring Hadoop Cluster
• For manual installation, we can use
Ganglia.
• Automated installation tools have built-in
monitoring mechanisms available.
Ganglia
Cluster Maintenance
• Managing Hadoop processes
– Starting/stopping processes
• HDFS Maintenance
– Adding /Decommissioning datanode
– Checking file system integrity with fsck.
– Balancing hdfs block data
– Dealing with a failed disk
• Mapreduce Maintenance
– Adding /Decommissioning tasktracker
– Killing a mapreduce Job/ Task
– Dealing with a blacklisted tasktracker
Backup and Recovery
• Data Backup
– Distributed copy (distcp)
– Parallel Ingestion
• Namenode Metadata
• Hive metastore backup.
Hadoop Upgrades
• Data Backup
• Software upgrade
• HDFS upgrade
• Finalize upgrade
Steps for Hadoop Upgrade
• Make sure that any previous upgrade is finalized before proceeding
with another upgrade.
• Shut down MapReduce and kill any orphaned task processes on the
tasktrackers.
• Shut down HDFS and backup the namenode directories.
• Install new versions of Hadoop HDFS and MapReduce on the
cluster and on clients.
• Start HDFS with the -upgrade option.
• Wait until the upgrade is complete.
• Perform some sanity checks on HDFS.
• Start MapReduce.
• Roll back or finalize the upgrade (optional).
Routine Administration
Procedures
• Checking every nodes
• Metadata backups
• Data backups
• File system check
• File system balancer
Summary
• Hadoop Cluster design
• Hadoop Cluster Installation
• Back up and Recovery
• Hadoop Upgrade
• Routine Administration Procedures
For more info, visit:
http://amalgjose.wordpress.com
http://coderfox.com
http://in.linkedin.com/in/amalgjose
Additional Information
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopCloudera, Inc.
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterEdureka!
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationAlex Moundalexis
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxAlex Moundalexis
 
A Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop AdministratorA Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop AdministratorEdureka!
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosEdureka!
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideDouglas Bernardini
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - OverviewJay
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin trainingArun Kumar
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nageSantosh Nage
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingGreat Wide Open
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 

Was ist angesagt? (20)

Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
Improving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux ConfigurationImproving Hadoop Cluster Performance via Linux Configuration
Improving Hadoop Cluster Performance via Linux Configuration
 
Improving Hadoop Performance via Linux
Improving Hadoop Performance via LinuxImproving Hadoop Performance via Linux
Improving Hadoop Performance via Linux
 
A Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop AdministratorA Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop Administrator
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
ha_module5
ha_module5ha_module5
ha_module5
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
Hadoop - Overview
Hadoop - OverviewHadoop - Overview
Hadoop - Overview
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin training
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Troubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed DebuggingTroubleshooting Hadoop: Distributed Debugging
Troubleshooting Hadoop: Distributed Debugging
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 

Andere mochten auch

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoopMohit Tare
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop AdministrationEdureka!
 
Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Ashok Royal
 
SQL Server 2012 - FileTables
SQL Server 2012 - FileTables SQL Server 2012 - FileTables
SQL Server 2012 - FileTables Sperasoft
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonVitthal Gogate
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaCloudera, Inc.
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and HadoopKai Zheng
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkDatabricks
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101MongoDB
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsKalyan Hadoop
 
Where is my next jobs in the age of Big Data and Automation
Where is my next jobs in the age of Big Data and AutomationWhere is my next jobs in the age of Big Data and Automation
Where is my next jobs in the age of Big Data and AutomationTrieu Nguyen
 

Andere mochten auch (20)

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)
 
SQL Server 2012 - FileTables
SQL Server 2012 - FileTables SQL Server 2012 - FileTables
SQL Server 2012 - FileTables
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Hadoop fault-tolerance
Hadoop fault-toleranceHadoop fault-tolerance
Hadoop fault-tolerance
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College London
 
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - ClouderaHadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
Hadoop World 2011: Hadoop Troubleshooting 101 - Kate Ting - Cloudera
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Kerberos, Token and Hadoop
Kerberos, Token and HadoopKerberos, Token and Hadoop
Kerberos, Token and Hadoop
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Orienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshotsOrienit hadoop practical cluster setup screenshots
Orienit hadoop practical cluster setup screenshots
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Where is my next jobs in the age of Big Data and Automation
Where is my next jobs in the age of Big Data and AutomationWhere is my next jobs in the age of Big Data and Automation
Where is my next jobs in the age of Big Data and Automation
 

Ähnlich wie Deployment and Management of Hadoop Clusters

Distro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDistro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDataWorks Summit
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around HadoopDataWorks Summit
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
SpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecturesaipriyacoool
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewNisanth Simon
 
Hadoop online trainings
Hadoop online trainingsHadoop online trainings
Hadoop online trainingsGeek Trainings
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to HadoopAnandMHadoop
 

Ähnlich wie Deployment and Management of Hadoop Clusters (20)

Exp-3.pptx
Exp-3.pptxExp-3.pptx
Exp-3.pptx
 
Distro-independent Hadoop cluster management
Distro-independent Hadoop cluster managementDistro-independent Hadoop cluster management
Distro-independent Hadoop cluster management
 
Infrastructure Around Hadoop
Infrastructure Around HadoopInfrastructure Around Hadoop
Infrastructure Around Hadoop
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platform
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
SpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache HadoopSpringPeople Introduction to Apache Hadoop
SpringPeople Introduction to Apache Hadoop
 
Instant hadoop of your own
Instant hadoop of your ownInstant hadoop of your own
Instant hadoop of your own
 
Hadoop ppt on the basics and architecture
Hadoop ppt on the basics and architectureHadoop ppt on the basics and architecture
Hadoop ppt on the basics and architecture
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Hadoop_Admin_eVenkat
Hadoop_Admin_eVenkatHadoop_Admin_eVenkat
Hadoop_Admin_eVenkat
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Hadoop online trainings
Hadoop online trainingsHadoop online trainings
Hadoop online trainings
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Unit 5
Unit  5Unit  5
Unit 5
 

Kürzlich hochgeladen

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Kürzlich hochgeladen (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Deployment and Management of Hadoop Clusters

  • 1. Deployment and Management of Hadoop Clusters Amal G Jose Big Data Analytics http://www.coderfox.com/ http://amalgjose.wordpress.com/ in.linkedin.com/in/amalgjose/
  • 2. • Introduction • Cluster design and deployment • Backup and Recovery • Hadoop Upgrade • Routine Administration Tasks Agenda
  • 3. Introduction • What is Hadoop ? • What makes Hadoop different ? • Need for a hadoop cluster ?
  • 4. This has 4 parts: • Cluster Planning. • OS installation & Hardening. • Cluster Software Installation. • Cluster configuration. Cluster Installation
  • 5. Cluster Planning Hadoop Daemon Configuration Namenode Dedicated servers. OS is installed on the RAID device. The dfs.name.dir will reside on the same RAID device. One more copy is configured to have on NFS. Secondary Namenode Dedicated Server OS is installed on RAID device Jobtracker Dedicated Server. OS installed on JBOD configuration Datanode/Tasktracker Individual servers. OS installed on JBOD configuration
  • 6. Workload Patterns For Hadoop • Balanced Workload • Compute Intensive • I/O Intensive • Unknown or evolving workload patterns
  • 8. Name Node Job Tracker Ganglia-Daemon Name Node Job Tracker Ganglia-Daemon MN Hive Pig Oozie Mahout Ganglia-Master Hive Pig Oozie Mahout Ganglia-Master CN Typical Hadoop Cluster Topology Task Tracker Data Node Ganglia-Daemon Task Tracker Data Node Ganglia-Daemon SN
  • 9. • Creating the instances based on the requirement Creating Instances (in case of cloud)
  • 10. • We will be installing the Hadoop on the RHEL6 64- bit servers. • OS should be hardened based on RHEL6 hardening document. • Setting iptable rules necessary for hadoop services. • In case of Amazon EC2 instances create key/value pairs for logging in. • GUI can be disabled to make more room for hadoop. • Time should be made same in all the servers. Operating System Hardening
  • 11. • Choosing the distribution of Hadoop. • Creation of Local Yum Repository. • Java Installation in all the machines. Cluster Software Installation
  • 13. Installation Methods • Hadoop can be installed either manually or automatically using some tools such as ClouderaManager, Ambari etc. • One click installation tools helps the users to install hadoop on clusters without any pain.
  • 14. Manual Installation • Install hadoop daemons in the nodes. • We can either use tarball or rpm for installation. • rpm installation will be easier.
  • 15. Setting up Client Node • What is client node ? • Necessity of a client node ? • How to configure a client node ? • What all services are installed ? • Need for multiuser segregation ?
  • 16. Cluster Configuration • Storage location for namenode, secondarynamenode and datanode. • Number of task slots (map/reduce slots). – Number of task slots/node = (memory available/child jvm size) • Backup location for namenode. • Configuring mysql for hive and oozie.
  • 17. Namenode - Single point of Failure • Why namenode is the single point of failure? • How to resolve this issue? • How backup can be achieved?
  • 18. Implementing Schedulers • Capacity scheduler • Fair scheduler
  • 19. Monitoring Hadoop Cluster • For manual installation, we can use Ganglia. • Automated installation tools have built-in monitoring mechanisms available.
  • 21. Cluster Maintenance • Managing Hadoop processes – Starting/stopping processes • HDFS Maintenance – Adding /Decommissioning datanode – Checking file system integrity with fsck. – Balancing hdfs block data – Dealing with a failed disk • Mapreduce Maintenance – Adding /Decommissioning tasktracker – Killing a mapreduce Job/ Task – Dealing with a blacklisted tasktracker
  • 22. Backup and Recovery • Data Backup – Distributed copy (distcp) – Parallel Ingestion • Namenode Metadata • Hive metastore backup.
  • 23. Hadoop Upgrades • Data Backup • Software upgrade • HDFS upgrade • Finalize upgrade
  • 24. Steps for Hadoop Upgrade • Make sure that any previous upgrade is finalized before proceeding with another upgrade. • Shut down MapReduce and kill any orphaned task processes on the tasktrackers. • Shut down HDFS and backup the namenode directories. • Install new versions of Hadoop HDFS and MapReduce on the cluster and on clients. • Start HDFS with the -upgrade option. • Wait until the upgrade is complete. • Perform some sanity checks on HDFS. • Start MapReduce. • Roll back or finalize the upgrade (optional).
  • 25. Routine Administration Procedures • Checking every nodes • Metadata backups • Data backups • File system check • File system balancer
  • 26. Summary • Hadoop Cluster design • Hadoop Cluster Installation • Back up and Recovery • Hadoop Upgrade • Routine Administration Procedures
  • 27. For more info, visit: http://amalgjose.wordpress.com http://coderfox.com http://in.linkedin.com/in/amalgjose Additional Information