1. GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
@krisgeus
krisgeusebroek@godatadriven.com
Bare metal Hadoop provisioning
With Ansible and Cobbler
Kris Geusebroek
Big Data Hacker
2. -- Big Data Borat
“Give man Hadoop cluster he gain
insight for a day. Teach man build
Hadoop cluster he soon leave for
better job. #bigdata”
5. Separate layers...
- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)
6. Want...
- Horizontal scaling: effort for an extra machine is minimal
- Commodity, industry-standard hardware
- So: cope with errors, malfunctions and re-installation
- Multiple clusters
- Experiment first to find the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes
7. Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (masternode, datanode etc.)
9. What we have done here...
Nothing new, just another possibility
Nothing tool specific
- The demo installs Cloudera Manager, but the approach also works with Hortonworks Data Platform.
Most important is:
14. Ansible...
Ansible used for
- Tying it all together
- Initial setup of network config
- One time push of SSH key
- Full software install
ansible.cc
15. Cloudera Manager...
Cloudera Manager used for
- Installing the cluster software
- Currently manual labour; can be automated using the API
cloudera.com
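The "can be automated using the API" point could start from something like the sketch below. It is not taken from the talk: it assumes Cloudera Manager is reachable over plain HTTP on port 7180, that API version v1 is available on the installed release, and that credentials are passed in the hypothetical environment variables CM_USER and CM_PASS. The helper names cm_url and cm_get are made up for illustration.

```shell
#!/bin/sh
# Minimal helpers for talking to the Cloudera Manager REST API with curl.
# Assumptions (not from the slides): port 7180, no TLS, API version v1,
# credentials in CM_USER / CM_PASS.
CM_PORT="${CM_PORT:-7180}"
CM_API_VERSION="${CM_API_VERSION:-v1}"

cm_url() {
    # $1 = Cloudera Manager host, $2 = resource path such as "clusters" or "hosts"
    printf 'http://%s:%s/api/%s/%s\n' "$1" "$CM_PORT" "$CM_API_VERSION" "$2"
}

cm_get() {
    # Authenticated GET of one resource; exits non-zero on HTTP errors.
    curl --silent --fail --user "${CM_USER}:${CM_PASS}" "$(cm_url "$1" "$2")"
}
```

For example, `cm_get node01 hosts` would ask the manager on node01 for the hosts it knows about. Creating clusters and assigning roles needs POST requests with JSON bodies, which this sketch leaves out.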
16. Show me the code...
Add node information to the cobbler CMS
First make the install DVD known to Cobbler:
mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
cobbler import --path=/mnt/dvd --name=CentOS64
Next make the node information known:
sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 \
    --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
If needed, re-enable the netboot flag:
sudo cobbler system edit --name=node01 --netboot-enabled=True
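With more than a handful of nodes, typing the command above per machine gets tedious. A minimal sketch, assuming the node data lives in a plain text file with one "name mac ip" triple per line (that file format and the function name cobbler_add_commands are illustrative assumptions, not part of the talk), could generate the commands instead:

```shell
#!/bin/sh
# Print one `cobbler system add` command per node read from stdin.
# Assumed input format: "<name> <mac> <ip>" per line; lines starting
# with '#' and blank lines are skipped.
# PROFILE defaults to the profile name used on this slide.
PROFILE="${PROFILE:-CentOS64-x86_64}"

cobbler_add_commands() {
    while read -r name mac ip; do
        # Skip blank lines and comments.
        case "$name" in ''|'#'*) continue ;; esac
        printf 'cobbler system add --name=%s --profile=%s --hostname=%s --mac=%s --ip-address=%s --static=True\n' \
            "$name" "$PROFILE" "$name" "$mac" "$ip"
    done
}
```

The function only prints; after reviewing the output it can be piped into a root shell, for example `cobbler_add_commands < nodes.txt | sudo sh`.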
17. Show me the code...
Ansible needs to know what goes where
[cluster]
node01
node02
node03
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
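To keep the effort for an extra machine minimal, an inventory like the one above can be generated rather than edited by hand. The sketch below is one way to do that, assuming nodes are named node01, node02, and so on. The function name make_inventory is hypothetical, and keeping Ganglia and Cloudera Manager on node01 simply copies the slide.

```shell
#!/bin/sh
# Print an Ansible inventory for COUNT cluster nodes named node01, node02, ...
# Group names follow the slide; the fixed groups below are copied from it.
make_inventory() {
    count="$1"
    echo '[cluster]'
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'node%02d\n' "$i"
        i=$((i + 1))
    done
    cat <<'EOF'
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
EOF
}
```

Running `make_inventory 4 > hosts` writes an inventory with four cluster nodes; adding a fifth machine is then a matter of changing one number.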
18. Show me the code...
For the rest it's just a DSL thingy with extras
- hosts:
  - cloudera-manager
  - cluster
  user: root
  sudo: yes
  vars_files:
    - vars/common.yml
  tasks:
    - include: cloudera-manager/tasks/common.yml
  handlers:
    - include: cloudera-manager/handlers/main.yml

- name: Configure CM4 Repo
  copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
- name: Install CM4 common stuff
  yum: name=$item state=installed
20. Shared problems...
- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and different RAID settings are not handled (yet)
- Large amount of initial network traffic with large clusters (the same software packages downloaded N times from yum repositories) => repo mirroring to the rescue
- MAC addresses of all nodes must be known
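Repo mirroring can be done in several ways; the slides do not say which one was used. One option, sketched here as an assumption, is to let Cobbler itself hold the mirror with its `repo add` and `reposync` subcommands. The function name mirror_repo_commands and the example repo name and URL are hypothetical.

```shell
#!/bin/sh
# Print the Cobbler commands that mirror one yum repository locally and
# attach it to a profile. Nothing is executed; review the output first.
mirror_repo_commands() {
    name="$1"      # Cobbler repo name (hypothetical, e.g. "cm4")
    upstream="$2"  # upstream yum repository URL
    profile="$3"   # Cobbler profile whose nodes should use the mirror
    printf 'cobbler repo add --name=%s --mirror=%s\n' "$name" "$upstream"
    printf 'cobbler reposync --only=%s\n' "$name"
    # Note: --repos replaces the profile's current repo list.
    printf 'cobbler profile edit --name=%s --repos=%s\n' "$profile" "$name"
}
```

For example, `mirror_repo_commands cm4 http://mirror.example.com/cm4/ CentOS64-x86_64 | sudo sh` would fetch the packages once onto the Cobbler host, so the nodes install from the local network instead of each downloading from the internet.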
21. Takeaways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start
https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy
22. GoDataDriven
We’re hiring / Questions? / Thank you!
@krisgeus
krisgeusebroek@godatadriven.com
Kris Geusebroek
Big Data Hacker