1. Deployment and Management
of Hadoop Clusters
Amal G Jose
Big Data Analytics
http://www.coderfox.com/
http://amalgjose.wordpress.com/
in.linkedin.com/in/amalgjose/
2. Agenda
• Introduction
• Cluster design and deployment
• Backup and Recovery
• Hadoop Upgrade
• Routine Administration Tasks
3. Introduction
• What is Hadoop?
• What makes Hadoop different?
• Why is a Hadoop cluster needed?
4. Cluster Installation
Cluster installation has four parts:
• Cluster planning
• OS installation and hardening
• Cluster software installation
• Cluster configuration
5. Cluster Planning
Hadoop Daemon | Configuration
Namenode | Dedicated servers. OS is installed on the RAID device. The dfs.name.dir resides on the same RAID device, and a second copy is kept on NFS.
Secondary Namenode | Dedicated server. OS is installed on a RAID device.
Jobtracker | Dedicated server. OS is installed on a JBOD configuration.
Datanode/Tasktracker | Individual servers. OS is installed on a JBOD configuration.
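The namenode row above can be illustrated with a short sketch. It assumes Hadoop 1.x, where dfs.name.dir accepts a comma-separated list of directories and the namenode writes its metadata to each of them. Both mount points below are hypothetical placeholders, not paths from the original deck.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs):
    """Build a <property> element listing every namenode metadata directory."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.name.dir"
    # Hadoop 1.x expects a comma-separated list of directories here.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return prop


# Hypothetical mount points: one on the local RAID device, one on NFS.
local_raid_dir = "/data/raid/dfs/name"
nfs_dir = "/mnt/nfs/dfs/name"

prop = name_dir_property([local_raid_dir, nfs_dir])
snippet = ET.tostring(prop, encoding="unicode")
print(snippet)
```

The printed element would go inside the configuration block of hdfs-site.xml on the namenode.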
6. Workload Patterns For Hadoop
• Balanced Workload
• Compute Intensive
• I/O Intensive
• Unknown or evolving workload patterns
8. Typical Hadoop Cluster Topology
[Diagram: three node types and the services running on each]
MN: Name Node, Job Tracker, Ganglia-Daemon
CN: Hive, Pig, Oozie, Mahout, Ganglia-Master
SN: Task Tracker, Data Node, Ganglia-Daemon
9. Creating Instances (in case of cloud)
• Create the instances based on the requirements.
10. Operating System Hardening
• Hadoop will be installed on RHEL6 64-bit servers.
• The OS should be hardened according to the RHEL6 hardening document.
• Set the iptables rules necessary for Hadoop services.
• For Amazon EC2 instances, create key pairs for logging in.
• The GUI can be disabled to free up resources for Hadoop.
• Clocks should be kept in sync across all servers.
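As a rough illustration of the iptables point, the sketch below generates one ACCEPT rule per Hadoop port. The port numbers are the commonly used Hadoop 1.x defaults for the web UIs and datanode endpoints and are an assumption here; they should be checked against the cluster's own configuration. The namenode and jobtracker RPC ports are set per site (through fs.default.name and mapred.job.tracker) and would need to be added. The subnet is a hypothetical example.

```python
# Assumed Hadoop 1.x default ports; verify against the actual configuration.
HADOOP_PORTS = {
    "namenode web UI": 50070,
    "datanode data transfer": 50010,
    "datanode IPC": 50020,
    "datanode web UI": 50075,
    "secondary namenode web UI": 50090,
    "jobtracker web UI": 50030,
    "tasktracker web UI": 50060,
}


def iptables_rules(ports, source_cidr):
    """Return (service, command) pairs, one ACCEPT rule per port, sorted by port."""
    rules = []
    for service, port in sorted(ports.items(), key=lambda item: item[1]):
        command = (
            f"iptables -A INPUT -p tcp -s {source_cidr} "
            f"--dport {port} -j ACCEPT"
        )
        rules.append((service, command))
    return rules


# Hypothetical cluster subnet; only hosts in this range may connect.
rules = iptables_rules(HADOOP_PORTS, "10.0.0.0/24")
for service, command in rules:
    print(f"# {service}")
    print(command)
```

The script only prints the commands, so they can be reviewed against the hardening document before being applied.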
11. Cluster Software Installation
• Choose the Hadoop distribution.
• Create a local Yum repository.
• Install Java on all machines.
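A local Yum repository is made visible to each node through a .repo file under /etc/yum.repos.d/. The sketch below renders such a file. The repository id, description and URL are hypothetical, and gpgcheck=0 is an assumption that only suits an unsigned internal mirror.

```python
def yum_repo_file(repo_id, description, base_url):
    """Render the contents of a .repo file for a single repository."""
    lines = [
        f"[{repo_id}]",
        f"name={description}",
        f"baseurl={base_url}",
        "enabled=1",
        # Assumption: packages in the local mirror are not GPG-signed.
        "gpgcheck=0",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical repository id, description and internal web server URL.
repo_text = yum_repo_file(
    "hadoop-local",
    "Local Hadoop repository",
    "http://repo.example.internal/hadoop/",
)
print(repo_text)
```

The output would be saved on every node, for example as /etc/yum.repos.d/hadoop-local.repo.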
13. Installation Methods
• Hadoop can be installed either manually or automatically, using tools such as Cloudera Manager or Ambari.
• One-click installation tools let users install Hadoop on clusters with little manual effort.
14. Manual Installation
• Install the Hadoop daemons on the nodes.
• Either a tarball or an rpm can be used for installation.
• rpm installation is usually the easier option.
15. Setting up Client Node
• What is a client node?
• Why is a client node necessary?
• How is a client node configured?
• Which services are installed on it?
• Why is multiuser segregation needed?
16. Cluster Configuration
• Storage locations for the namenode, secondary namenode and datanode.
• Number of task slots (map/reduce slots).
– Number of task slots per node = (available memory / child JVM size)
• Backup location for the namenode.
• Configuring MySQL for Hive and Oozie.
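The task slot formula above can be worked through with a small example. All the numbers are hypothetical: a node with 48 GB of RAM, 8 GB set aside for the OS and the Hadoop daemons, and a 1 GB child JVM. How much memory to reserve is an assumption, not something the formula itself specifies.

```python
def task_slots_per_node(available_memory_mb, child_jvm_mb):
    """Apply the formula: task slots per node = available memory / child JVM size."""
    if child_jvm_mb <= 0:
        raise ValueError("child JVM size must be positive")
    # Integer division: a partial slot cannot run a task.
    return available_memory_mb // child_jvm_mb


# Hypothetical node sizing, in megabytes.
total_memory_mb = 48 * 1024
reserved_mb = 8 * 1024  # assumed headroom for the OS and Hadoop daemons
child_jvm_mb = 1024

slots = task_slots_per_node(total_memory_mb - reserved_mb, child_jvm_mb)
print(slots)  # 40
```

In Hadoop 1.x the resulting total would then be divided between mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.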
17. Namenode - Single Point of Failure
• Why is the namenode a single point of failure?
• How can this issue be resolved?
• How can backup be achieved?
19. Monitoring Hadoop Cluster
• For manual installation, we can use
Ganglia.
• Automated installation tools have built-in
monitoring mechanisms available.
24. Steps for Hadoop Upgrade
• Make sure that any previous upgrade is finalized before proceeding
with another upgrade.
• Shut down MapReduce and kill any orphaned task processes on the
tasktrackers.
• Shut down HDFS and back up the namenode directories.
• Install new versions of Hadoop HDFS and MapReduce on the
cluster and on clients.
• Start HDFS with the -upgrade option.
• Wait until the upgrade is complete.
• Perform some sanity checks on HDFS.
• Start MapReduce.
• Roll back or finalize the upgrade (optional).
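The steps above can be sketched as an ordered checklist. The commands shown are the Hadoop 1.x scripts and dfsadmin options as best understood here, so treat them as assumptions to verify against the installed release. Steps with no command are manual and site-specific. The script only prints the plan and runs nothing.

```python
def upgrade_plan(finalize=True):
    """Return the upgrade steps in order as (description, command) pairs.

    A command of None marks a manual, site-specific step.
    """
    status = "hadoop dfsadmin -upgradeProgress status"
    steps = [
        ("Confirm any previous upgrade is finalized", status),
        ("Shut down MapReduce", "stop-mapred.sh"),
        ("Kill orphaned task processes on the tasktrackers", None),
        ("Shut down HDFS", "stop-dfs.sh"),
        ("Back up the namenode directories", None),
        ("Install the new Hadoop version on cluster and clients", None),
        ("Start HDFS in upgrade mode", "start-dfs.sh -upgrade"),
        ("Wait until the upgrade is complete", status),
        ("Run sanity checks on HDFS", "hadoop fsck /"),
        ("Start MapReduce", "start-mapred.sh"),
    ]
    if finalize:
        steps.append(("Finalize the upgrade", "hadoop dfsadmin -finalizeUpgrade"))
    else:
        # Rollback assumes HDFS was stopped and the old version reinstalled first.
        steps.append(("Roll back to the previous version", "start-dfs.sh -rollback"))
    return steps


plan = upgrade_plan()
for number, (description, command) in enumerate(plan, start=1):
    print(f"{number:2d}. {description}: {command or 'manual step'}")
```

Calling upgrade_plan(finalize=False) ends the list with the rollback step instead of finalization.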