SlideShare a Scribd company logo
1 of 18
Download to read offline
Hadoop Cluster on Docker Containers
“What Works and What Doesn't”
By:
Pranav Joshi
ME-HPC
GTU PG School
Content
● Introduction to Hadoop and Docker
● Why Hadoop on Docker?
● Job Configuration
● Openstack Sahara
● Handling Hadoop Single Point of Failure
● Validating the Prototype
● Performance Test
● Conclusion
● Reference
Introduction to Hadoop
● Apache Hadoop is an open-source software framework written in
Java for distributed storage and distributed processing of very large
data sets on computer clusters built from commodity hardware.
● Major Components of Apache Hadoop are,
– Hadoop Common: The common utilities that support the other Hadoop
modules.
– Hadoop Distributed File System (HDFS™): A distributed file system that
provides high-throughput access to application data.
– Hadoop YARN: A framework for job scheduling and cluster resource
management.
– Hadoop MapReduce: A YARN-based system for parallel processing of
large data sets.
Introduction to Docker Container
● Docker allows you to package an application with all of
its dependencies into a standardized unit for software
development.
● It is an open-source program that enables a Linux
application and its dependencies to be packaged as a
container.
● Containers include the application and all of its
dependencies, but share the kernel with other
containers.
Why Docker?
● Lightweight, Portable
● Build once, Run anywhere
● VM – without the overhead of a VM
● Isolated containers
● Automated and scripted
Separating out simple tasks
Container vs. VMs
Job Configuration
● YARN’s ApplicationMaster asks the NodeManager to launch
containers: LinuxContainerExecutor
● Docker can be used not only for fine-grained performance
isolation, but for delivering software packages
Openstack Sahara
Design and Implementation
● Implementation:
– Using a Dockerfile, our solution creates an image with Java, ssh
and some basic packages installed, and set up the image to
use the Hadoop build in a shared folder with the host.
– When an instance is created from the image, it starts ssh
daemon by default in order to allow further runtime configuration
through this channel.
● Management:
– Cluster managing library offers an even more abstract API
allowing the client to list and create a cluster, start, stop and get
details of a container and starting service in a specific container.
Hadoop and Fault Tolerance
● HDFS allows the replication of the NameNode (through
passive replication), but a failure at the level of the Job-
Tracker forces a job to be restarted.
● On Hadoop 2.x, part of the job management responsibility
is transferred to the ApplicationMaster, which becomes a
task manager.
● The loss of the ResourceManager does not block the
execution of a job, only prevents new jobs to be submitted.
However, the loss of an ApplicationMaster forces the
restart of the job, just like on Hadoop l.x.
Handling Hadoop Single Point of
Failures
● Fast recovery in the case of a failure
● Small impact on the performance
● Adapt to the capacity and context of the nodes
Validating the Prototype
● Using the Docker-Hadoop dashboard allowed us to analyze different failure
scenarios, including:
– Crash of the Job'Tracker node: we kill the JobTracker to force a new
node to resume the JobTracker role.
– Restart of an old JobTracker: we investigate the impacts of the return of
an old JobTracker node. Two possibilities are investigated:
● The returning node was simply disconnected from the network and
still thinks it is the JobTracker.
● The returning node has restarted and has lost all its status, but is still
on the top of Zookeeper's list.
– Heartbeat tuning: a too lazy heartbeat slows-down the reaction to failures
and may lead to some of the situations in the previous item. An intensive
heartbeat may impact negatively on the overall performance.
Performance Test
Performance Test
Execution time analysis when using different number of tasktrackers
Conclusion
● From this presentation we can explore the use of
container-based virtual machines to develop a prototyping
environment for MapReduce applications.
● The use of Docker-Hadoop allowed us to improve the
development speed of our Hadoop solution, as the
developers could test their code directly on their own
computers.
References
●
IEEE Paper 1
– Title: Efficient Prototyping of Fault Tolerant Map-Reduce Applications with
Docker-Hadoop
– Authors: Luiz Angelo Steffenel,Javier Rey, Matias Cogorno and Sergio
Nesmachnow, France
– Publication: 2015 IEEE International Conference on Cloud Engineering
●
IEEE Paper 2
– Title: Finding the Big Data Sweet Spot: Towards Automatically
Recommending Configurations for Hadoop Clusters on Docker Containers
– Authors: Rui Zhang, Min Li* and Dean Hildebrand, IBM Research and
Almaden *IBM T.J. Watson Research Center
– Publication: 2015 IEEE International Conference on Cloud Engineering
Thank You

More Related Content

What's hot

April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...Yahoo Developer Network
 
DevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container EngineDevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container EngineKit Merker
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Chris Fregly
 
Angular2, Spring Boot, Docker Swarm
Angular2, Spring Boot, Docker SwarmAngular2, Spring Boot, Docker Swarm
Angular2, Spring Boot, Docker Swarm🐊 Erwin Alberto
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNDataWorks Summit
 
Webinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwareWebinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwarePlatform9
 
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDev
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDevTriple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDev
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDevWerner Keil
 
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...Oleg Shalygin
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarKamesh Pemmaraju
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentDataWorks Summit
 
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCS
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCSOracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCS
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCSFrank Munz
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...Edureka!
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosDiscover Pinterest
 
Openshift Container Platform on Azure
Openshift Container Platform on AzureOpenshift Container Platform on Azure
Openshift Container Platform on AzureGlenn West
 

What's hot (20)

April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
 
DevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container EngineDevNexus 2015: Kubernetes & Container Engine
DevNexus 2015: Kubernetes & Container Engine
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
Angular2, Spring Boot, Docker Swarm
Angular2, Spring Boot, Docker SwarmAngular2, Spring Boot, Docker Swarm
Angular2, Spring Boot, Docker Swarm
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
 
Webinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMwareWebinar: OpenStack Benefits for VMware
Webinar: OpenStack Benefits for VMware
 
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDev
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDevTriple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDev
Triple-E’class Continuous Delivery with Hudson, Maven, Kokki and PyDev
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
GCP - Continuous Integration and Delivery into Kubernetes with GitHub, Travis...
 
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with CrowbarWicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
Wicked Easy Ceph Block Storage & OpenStack Deployment with Crowbar
 
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environmentLessons learned from scaling YARN to 40K machines in a multi tenancy environment
Lessons learned from scaling YARN to 40K machines in a multi tenancy environment
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCS
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCSOracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCS
Oracle CODE 2017 San Francisco: Docker on Raspi Swarm to OCCS
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
What Is Kubernetes | Kubernetes Introduction | Kubernetes Tutorial For Beginn...
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
Openshift Container Platform on Azure
Openshift Container Platform on AzureOpenshift Container Platform on Azure
Openshift Container Platform on Azure
 

Viewers also liked

Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Jeffrey Breen
 
Docker Swarm Cluster
Docker Swarm ClusterDocker Swarm Cluster
Docker Swarm ClusterFernando Ike
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...DataWorks Summit/Hadoop Summit
 
Managing Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesManaging Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesMarc Sluiter
 

Viewers also liked (8)

Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Docker Swarm Cluster
Docker Swarm ClusterDocker Swarm Cluster
Docker Swarm Cluster
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
 
Managing Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesManaging Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing Kubernetes
 

Similar to Hadoop Cluster on Docker Containers

project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn Yahoo Developer Network
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesNicola Ferraro
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache HadoopSufi Nawaz
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Olalekan Fuad Elesin
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingStanislav Osipov
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsAsad Masood Qazi
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online trainingHarika583
 

Similar to Hadoop Cluster on Docker Containers (20)

project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Unit 5
Unit  5Unit  5
Unit 5
 
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
November 2014 HUG: Lessons from Hadoop 2+Java8 migration at LinkedIn
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Extending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with KubernetesExtending DevOps to Big Data Applications with Kubernetes
Extending DevOps to Big Data Applications with Kubernetes
 
Intro to Apache Hadoop
Intro to Apache HadoopIntro to Apache Hadoop
Intro to Apache Hadoop
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
SCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scalingSCM Puppet: from an intro to the scaling
SCM Puppet: from an intro to the scaling
 
Spark
SparkSpark
Spark
 
Hadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questionsHadoop 31-frequently-asked-interview-questions
Hadoop 31-frequently-asked-interview-questions
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
HugNov14
HugNov14HugNov14
HugNov14
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 

Recently uploaded

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 

Recently uploaded (20)

(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 

Hadoop Cluster on Docker Containers

  • 1. Hadoop Cluster on Docker Containers “What Works and What Doesn't” By: Pranav Joshi ME-HPC GTU PG School
  • 2. Content ● Introduction to Hadoop and Docker ● Why Hadoop on Docker? ● Job Configuration ● Openstack Sahara ● Handling Hadoop Single Point of Failure ● Validating the Prototype ● Performance Test ● Conclusion ● Reference
  • 3. Introduction to Hadoop ● Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. ● Major Components of Apache Hadoop are, – Hadoop Common: The common utilities that support the other Hadoop modules. – Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. – Hadoop YARN: A framework for job scheduling and cluster resource management. – Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
  • 4. Introduction to Docker Container ● Docker allows you to package an application with all of its dependencies into a standardized unit for software development. ● It is an open-source program that enables a Linux application and its dependencies to be packaged as a container. ● Containers include the application and all of its dependencies, but share the kernel with other containers.
  • 5. Why Docker? ● Lightweight, Portable ● Build once, Run anywhere ● VM – without the overhead of a VM ● Isolated containers ● Automated and scripted
  • 8. Job Configuration ● YARN’s ApplicationMaster asks the NodeManager to launch containers: LinuxContainerExecutor ● Docker can be used not only for fine-grained performance isolation, but for delivering software packages
  • 10. Design and Implementation ● Implementation: – Using a Dockerfile, our solution creates an image with Java, ssh and some basic packages installed, and set up the image to use the Hadoop build in a shared folder with the host. – When an instance is created from the image, it starts ssh daemon by default in order to allow further runtime configuration through this channel. ● Management: – Cluster managing library offers an even more abstract API allowing the client to list and create a cluster, start, stop and get details of a container and starting service in a specific container.
  • 11. Hadoop and Fault Tolerance ● HDFS allows the replication of the NameNode (through passive replication), but a failure at the level of the Job- Tracker forces a job to be restarted. ● On Hadoop 2.x, part of the job management responsibility is transferred to the ApplicationMaster, which becomes a task manager. ● The loss of the ResourceManager does not block the execution of a job, only prevents new jobs to be submitted. However, the loss of an ApplicationMaster forces the restart of the job, just like on Hadoop l.x.
  • 12. Handling Hadoop Single Point of Failures ● Fast recovery in the case of a failure ● Small impact on the performance ● Adapt to the capacity and context of the nodes
  • 13. Validating the Prototype ● Using the Docker-Hadoop dashboard allowed us to analyze different failure scenarios, including: – Crash of the Job'Tracker node: we kill the JobTracker to force a new node to resume the JobTracker role. – Restart of an old JobTracker: we investigate the impacts of the return of an old JobTracker node. Two possibilities are investigated: ● The returning node was simply disconnected from the network and still thinks it is the JobTracker. ● The returning node has restarted and has lost all its status, but is still on the top of Zookeeper's list. – Heartbeat tuning: a too lazy heartbeat slows-down the reaction to failures and may lead to some of the situations in the previous item. An intensive heartbeat may impact negatively on the overall performance.
  • 15. Performance Test Execution time analysis when using different number of tasktrackers
  • 16. Conclusion ● From this presentation we can explore the use of container-based virtual machines to develop a prototyping environment for MapReduce applications. ● The use of Docker-Hadoop allowed us to improve the development speed of our Hadoop solution, as the developers could test their code directly on their own computers.
  • 17. References ● IEEE Paper 1 – Title: Efficient Prototyping of Fault Tolerant Map-Reduce Applications with Docker-Hadoop – Authors: Luiz Angelo Steffenel,Javier Rey, Matias Cogorno and Sergio Nesmachnow, France – Publication: 2015 IEEE International Conference on Cloud Engineering ● IEEE Paper 2 – Title: Finding the Big Data Sweet Spot: Towards Automatically Recommending Configurations for Hadoop Clusters on Docker Containers – Authors: Rui Zhang, Min Li* and Dean Hildebrand, IBM Research and Almaden *IBM T.J. Watson Research Center – Publication: 2015 IEEE International Conference on Cloud Engineering