Bare metal Hadoop provisioning
With Ansible and Cobbler
Kris Geusebroek, Big Data Hacker
@krisgeus
krisgeusebroek@godatadriven.com
GoDataDriven, proudly part of the Xebia Group

“Give man Hadoop cluster he gain insight for a day. Teach man build Hadoop cluster he soon leave for better job. #bigdata”
-- Big Data Borat

“We’re hiring”
-- Kris Geusebroek

Don’t want to...
Manually install everything needed for a Hadoop cluster...

Separate layers...
- Hardware
- OS
- Basic install and configuration (firewalls, IPsec, IPv6, NTPd, raising ulimits, disk formatting and mounting)
- Cluster install (Cloudera Manager or Hortonworks Data Platform)
- Extra stuff (monitoring with Ganglia, R and R packages, ...)

Want...
- Horizontal scaling: the effort for an extra machine is minimal
- Commodity, industry-standard hardware
  - So cope with errors, malfunctions, and re-installation
- Multiple clusters
- Experiment first to find the appropriate configuration for a specific goal
  - Think memory, hard disks, number of nodes

Want...
- Automate all the tasks for every layer
- Parameterise a lot
- Simple configuration of the separate layers
- Definition of roles (masternode, datanode etc.)
Possible with...
Vendor-specific tools
The problem is that they can do only a subset of all the tasks

What we have done here...
Nothing new, just another possibility
Nothing tool-specific
- The demo installs Cloudera Manager, but the approach also works with Hortonworks Data Platform.
Most important is:

Stack...
“Essentially, this solution is CoSSaaS. (Couple of Shell Scripts as a Service)”
-- Big Data Borat

Cobbler...
Cobbler used for
- CMS
- DHCP server
- OS image hosting
- OS kickstart
cobblerd.org
Ansible...
Ansible used for
- Tying it all together
- Initial setup of network config
- One time push of SSH key
- Full software install
ansible.cc
Cloudera Manager...
Cloudera Manager used for
- Installing the cluster software
- Currently manual labour; can be automated using the Cloudera Manager API
cloudera.com

Show me the code...
Add node information to the Cobbler CMS
First make the install DVD known to Cobbler:
mount -t iso9660 -o loop /<directoryname>/CentOS-6.4-x86_64-bin-DVD1.iso /mnt/dvd
cobbler import --path=/mnt/dvd --name=CentOS64
Next make the node information known:
sudo cobbler system add --name=node01 --profile=CentOS64-x86_64 --hostname=node01 \
    --mac=<00:00:00:00:00:00> --ip-address=10.20.0.101 --static=True
If needed, re-enable the netboot flag:
sudo cobbler system edit --name=node01 --netboot-enabled=True
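With more than a handful of nodes, typing the per-node command by hand gets tedious. A minimal sketch of how it could be generated from a list instead: the file layout (one name,mac,ip triple per line) and the helper name gen_cobbler_cmds are assumptions made for illustration and are not part of the original setup. The function only prints the commands, so they can be reviewed before anything is run.

```shell
# Sketch: print one "cobbler system add" command per node.
# Assumed (hypothetical) input file: one "name,mac,ip" triple per line.
# Nothing is executed here; the commands are only written to stdout.
gen_cobbler_cmds() (
    profile="$1"
    nodes_file="$2"
    while IFS=, read -r name mac ip; do
        # Skip blank lines and lines starting with '#'.
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf 'cobbler system add --name=%s --profile=%s --hostname=%s --mac=%s --ip-address=%s --static=True\n' \
            "$name" "$profile" "$name" "$mac" "$ip"
    done < "$nodes_file"
)
```

Assuming the list is saved as nodes.csv, `gen_cobbler_cmds CentOS64-x86_64 nodes.csv` prints the commands, and piping that output to `sudo sh` would run them.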
Show me the code...
Ansible needs to know what goes where
[cluster]
node01
node02
node03
[cobbler]
cobbler
[proxy]
cobbler
[ganglia-master]
node01
[ganglia-nodes:children]
cluster
[cloudera-manager]
node01
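To check what goes where before running a playbook, the group membership can be read straight from the inventory file. A small sketch, assuming the inventory above is saved to a file; the helper name hosts_in_group is hypothetical. It lists only the hosts written directly under a group header and does not resolve ":children" groups, which Ansible itself does.

```shell
# Sketch: list the hosts written directly under one [group] header
# of an INI-style inventory. ":children" groups are not resolved.
hosts_in_group() (
    inventory="$1"
    group="$2"
    awk -v g="[$group]" '
        # A section header switches the flag on or off.
        /^\[/ { found = ($0 == g); next }
        # Inside the wanted section, print the host name of each entry,
        # skipping blank lines and comment lines.
        found && NF && $1 !~ /^[#;]/ { print $1 }
    ' "$inventory"
)
```

With the inventory shown above, `hosts_in_group hosts cluster` would list node01, node02 and node03, one per line, if the file is named hosts.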
Show me the code...
For the rest it’s just a DSL thingy with extras
# Legacy Ansible syntax as shown on the slide; newer releases use become: instead of sudo: and {{ item }} instead of $item
- hosts:
  - cloudera-manager
  - cluster
  user: root
  sudo: yes
  vars_files:
  - vars/common.yml
  tasks:
  - include: cloudera-manager/tasks/common.yml
  handlers:
  - include: cloudera-manager/handlers/main.yml
# Tasks, presumably from the included cloudera-manager/tasks/common.yml
- name: Configure CM4 Repo
  copy: src=cloudera-manager/files/etc/yum.repos.d/cm4.repo dest=/etc/yum.repos.d/ owner=root group=root
- name: Install CM4 common stuff
  yum: name=$item state=installed

Demo...
Shared problems...
- No magic: vendor-specific hardware can screw things up (strange names for disk mounts, for example)
- BIOS settings and differing RAID settings are not handled (yet)
- Large amount of initial network traffic with large clusters (the same software packages are downloaded N times from the yum repositories) => repo mirroring to the rescue
- MAC address of all nodes must be known
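Repo mirroring can be done with Cobbler itself, which can keep a local copy of a yum repository and attach it to a profile. A sketch of the commands involved, printed rather than executed: the helper name mirror_repo_cmds and whatever repository name and mirror URL are passed in are assumptions for illustration, not taken from the original setup.

```shell
# Sketch: print the cobbler commands that mirror one yum repository
# locally and attach it to an install profile. Nothing is executed.
mirror_repo_cmds() (
    repo_name="$1"    # name for the repo inside cobbler (your choice)
    mirror_url="$2"   # upstream yum repository URL (supply your own)
    profile="$3"      # existing cobbler profile, e.g. CentOS64-x86_64
    printf 'cobbler repo add --name=%s --mirror=%s\n' "$repo_name" "$mirror_url"
    printf 'cobbler reposync --only=%s\n' "$repo_name"
    printf 'cobbler profile edit --name=%s --repos=%s\n' "$profile" "$repo_name"
)
```

The nodes then fetch packages from the Cobbler host instead of each one downloading them from the upstream repository.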
Takeaways...
- Do automate from the start
- It’s easy
- Use (our) open source code to get a head start: https://github.com/godatadriven/ansible_cluster
- Our team will do the additional consultancy

We’re hiring / Questions? / Thank you!
Kris Geusebroek, Big Data Hacker
@krisgeus
krisgeusebroek@godatadriven.com
GoDataDriven
