5. Basics
• Ensure that all the namenode and datanode machines are running
on the same OSX version
• For the purpose of this POC, I have selected OS X 10.7.5. All sample
commands are specific to this OS. You may need to tweak the
commands to suit your OS version
• I am a Homebrew fan, so I have used the good old Ruby-based
package manager to download all the software needed to run the POC.
You may just as well download the installers individually and
tweak the process if you wish
• You will need a fair bit of understanding of OS X and Hadoop to
follow and interpret the steps. If not, no worries; most of the stuff
can be looked up online with a simple Google search
• The “Namenode” machine needs more RAM than “Datanode”
machines. Please configure the namenode machine with at least 8
GB RAM
6. The Cooking
• Ensure that ALL datanode and namenode machines are running the
same OS X version and preferably follow a regulated software update strategy
(i.e. automatic software updates disabled)
• Disable the automatic “sleep” options on the machines (from System
Preferences) so that they do not go into hibernation
• Download and Install “Xcode command line tools for Lion” (skip if Xcode
present)
• As of today, Hadoop is not IPv6 friendly, so please disable IPv6 on all
machines:
"networksetup -listallnetworkservices" displays all the network service
names that your machine uses to connect to your network (e.g. Ethernet,
Wi-Fi etc.)
"networksetup -setv6off Ethernet" disables IPv6 over Ethernet (change the
service name if yours is different)
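If a machine has several network services, the two commands above can be combined. The following is only a sketch: the function name ipv6_off_commands is made up for illustration, and it assumes the first line of the list output is an explanatory note and that disabled services are prefixed with an asterisk. It only prints the commands, so you can review them before running anything.

```shell
# Read the output of `networksetup -listallnetworkservices` on stdin and
# print one `networksetup -setv6off` command per enabled service.
# (ipv6_off_commands is a hypothetical helper name, not part of OS X.)
ipv6_off_commands() {
  first=1
  while IFS= read -r service; do
    # The first line is assumed to be the explanatory note, not a service.
    if [ "$first" -eq 1 ]; then
      first=0
      continue
    fi
    case "$service" in
      ''|\**) continue ;;   # skip blank lines and disabled (asterisk) services
    esac
    printf 'networksetup -setv6off "%s"\n' "$service"
  done
}
```

Usage: "networksetup -listallnetworkservices | ipv6_off_commands" prints the commands; append "| sh" once the list looks right. You may need to run them with sudo depending on your account.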
7. The Cooking..
• Give logical names to ALL machines, e.g. namenode.local, datanode01.local,
datanode02.local and so on (from System Preferences -> Sharing -> Computer
Name)
• Enable the following services from the Sharing panel of System
Preferences
– File Sharing
– Remote Login
– Remote Management
• Create one universal username (with Administrator privileges) on all
machines, e.g. hadoopuser. Preferably use the same password everywhere
• For the rest of the steps, please log in as this user and execute the commands
8. The Cooking
• On the namenode, run the command:
sudo vi /etc/hosts
• Add all datanode hostnames, one host per line (each line is an IP address
followed by the hostname)
• On each of the datanodes, run the command:
sudo vi /etc/hosts
• Add the namenode hostname
• Run the command:
sudo visudo
• Add an entry on the last line of the file as follows:
hadoopuser ALL=(ALL) NOPASSWD: ALL
9. Coffee Time
• Install the Java JDK and JRE on all the machines from the Oracle site
(http://bit.ly/1s2i7VC). Configure $JAVA_HOME (see slides for
instructions)
• Set $JAVA_HOME on ALL machines. Usually, it is best to configure it
in your .profile file. Run the following command to open your .profile:
vi ~/.profile
• Paste the following line in the file and save it:
export JAVA_HOME="`/System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands/java_home`"
• You may additionally paste the following lines in the same file:
export PATH=$PATH:/usr/local/sbin
PS1="\H : \d \t: \w :"
This is helpful for housekeeping activities: the prompt shows the hostname,
date, time and working directory
10. The Brewing
• Install “brew” and the other components through it
Run in the terminal:
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
[the quotes need to be there]
Run the following command in the terminal to ensure that it has been installed properly:
brew doctor
Run the following commands in the terminal, in this order:
brew install makedepend
brew install wget
brew install ssh-copy-id
brew install hadoop
Run the following commands on the “namenode” machine only:
brew install hive
brew install mysql
[the assumption is that the namenode will host the resourcemanager, jobtracker, hive metastore and hiveserver.
brew installs the software under “/usr/local/Cellar”]
11. Run the following command to set up keyless login from the namenode to ALL
datanodes. Run the command on the namenode:
ssh-keygen
[press the Enter key twice to accept the default RSA key and no passphrase]
Run the following command once for EACH datanode hostname. Run the command
on the namenode:
ssh-copy-id hadoopuser@datanode01.local
Provide the password when prompted. The command is verbose and tells you whether the key was
installed properly. You may validate this by executing the command:
ssh hadoopuser@datanode01.local
It should NOT ask you for a password anymore.
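With more than a couple of datanodes, repeating ssh-copy-id by hand gets tedious. A minimal sketch of a loop follows; the function name push_keys and the DRY_RUN switch are made up for illustration, and the hostnames are the example names used in these slides.

```shell
# Copy the namenode's public key to every datanode passed as an argument.
# With DRY_RUN=1 the commands are only printed, not executed.
# (push_keys and DRY_RUN are hypothetical names used for this sketch.)
push_keys() {
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id hadoopuser@$host"
    else
      ssh-copy-id "hadoopuser@$host"
    fi
  done
}
```

Usage: "DRY_RUN=1 push_keys datanode01.local datanode02.local" shows what would run; drop DRY_RUN=1 to actually copy the key. You will still be asked for the password once per host.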
After the requisite software has been installed, the next step is to configure the different
components step by step. Hadoop works in distributed mode, with the “namenode”
being the central hub of the cluster. This is reason enough to create the common
configuration files on the namenode first, and then copy them in an automated manner
to all the datanodes. Let’s start with the .profile changes on the namenode machine.
The Saute
12. We are going to configure Hive to use MySQL as the metastore for this POC. All we need
to do is create a DB user “hiveuser” with a valid password in the MySQL instance installed and
running on the namenode, AND copy the MySQL driver jar into the Hive lib directory
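A rough sketch of the MySQL side of this step is shown here. It is not taken from the original setup: the function name hive_metastore_sql, the database name "metastore" and the 'localhost' host part are all assumptions, and the password is a placeholder you must replace. The function only prints SQL, so nothing is changed until you pipe it into the mysql client.

```shell
# Print the SQL needed to create a metastore database and a Hive DB user.
# Arguments: database name, user name, password (all chosen by you).
# (hive_metastore_sql is a hypothetical helper name for this sketch.)
hive_metastore_sql() {
  db="$1"
  user="$2"
  password="$3"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS $db;
CREATE USER '$user'@'localhost' IDENTIFIED BY '$password';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

Usage: "hive_metastore_sql metastore hiveuser 'your-password' | mysql -u root -p" on the namenode. Copying the driver jar is a separate manual step.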
On the namenode, run the following command to go to your HADOOP_CONF_DIR
location:
cd /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop
Here, we need to create/modify the following set of files:
slaves
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
log4j.properties
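To give an idea of what two of these files contain, here is a minimal sketch that generates them. It is an illustration only, not the configuration used in this POC: the function names are made up, the port 9000 in the usage line is an assumption, and a real cluster needs more properties than the single fs.defaultFS entry shown.

```shell
# Write the "slaves" file: one datanode hostname per line.
# Arguments: config directory, then the datanode hostnames.
# (write_slaves and write_core_site are hypothetical helper names.)
write_slaves() {
  conf_dir="$1"
  shift
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Write a minimal core-site.xml that points every node at the namenode.
# Arguments: config directory, namenode hostname, port (your choice).
write_core_site() {
  conf_dir="$1"
  namenode="$2"
  port="$3"
  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$namenode:$port</value>
  </property>
</configuration>
EOF
}
```

Usage, from inside the HADOOP_CONF_DIR: "write_slaves . datanode01.local datanode02.local" and "write_core_site . namenode.local 9000". Back up any existing files first, since both functions overwrite them.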
On the namenode, run the following command to go to your HIVE_CONF_DIR location:
cd /usr/local/Cellar/hive/0.13.0/libexec/conf
Here, we need to create/modify the following set of files:
hive-site.xml
hive-log4j.properties
The Slow cooking
13. Please find attached a simple script that, if installed on the namenode, can help you
copy your config files to ALL datanodes (I call it the config-push)
Please find attached another simple script that I use for rebooting all the datanodes.
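For readers without the attachments, the general idea of both scripts can be sketched as follows. This is NOT the author's script: the function names, the DRY_RUN switch and the use of rsync are assumptions made for illustration. It relies on the keyless login and the NOPASSWD sudo entry set up earlier, and it reads the datanode hostnames from the slaves file.

```shell
# Run a command for real, or only print it when DRY_RUN=1.
# stdin is redirected so ssh/rsync cannot swallow the host list being read.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@" < /dev/null
  fi
}

# Copy the config directory to the same path on every host in the slaves file.
# Arguments: config directory, path to the slaves file.
config_push() {
  conf_dir="$1"
  slaves_file="$2"
  while IFS= read -r host; do
    case "$host" in ''|\#*) continue ;; esac   # skip blank and comment lines
    run rsync -az "$conf_dir/" "hadoopuser@$host:$conf_dir/"
  done < "$slaves_file"
}

# Reboot every host listed in the slaves file.
reboot_datanodes() {
  slaves_file="$1"
  while IFS= read -r host; do
    case "$host" in ''|\#*) continue ;; esac
    run ssh "hadoopuser@$host" sudo shutdown -r now
  done < "$slaves_file"
}
```

Usage: run "DRY_RUN=1 config_push /usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop slaves" from the HADOOP_CONF_DIR to preview the commands, then repeat without DRY_RUN=1 to push for real.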
The Plating
14. You may wish to take the following next steps:
Install ZooKeeper
Configure and run journalnodes
Go for a High Availability cluster implementation with multiple namenodes
Leave feedback if you would like to see sample Hadoop configuration files
The Garnishing