Deploying Hadoop-Based Bigdata Environments
“[Tall] Tales From The Frontier”

Roman Shaposhnik
rvs@apache.org, Cloudera Inc.
$ whoami

   An open source software developer
       Linux kernel, C/C++ compilers, FFmpeg, Plan9
   A Hadoop and all around UNIX guy
   root@cloudera
       Member of the “Kitchen” team
   Apache Software Foundation Incubator PMC
       [Bigtop], Hadoop Development Tools, Celix, Helix
   VP of Apache Bigtop
                                                    2
ZooKeeper (coordination)

       HUE (web based UI)


Pig (DQL) Hive (SQL) Impala (SQL)

 HBase      YARN/MR1         Oozie

         HDFS (filesystem)


                                     3
It is a jungle out there
   Zookeeper         Sqoop       JDK/JRE
   Hadoop            Oozie       Kerberos
         HDFS        Whirr       Ganglia
         YARN        Mahout      Nagios
         MR1         Flume       JSVC
         HTTPFS      Giraph      Tomcat
   HBase             Hama        Utils
   Pig               Hue         Postgress
   Hive              Solr        HTTPD
   Impala            Crunch
                                                5
And the answer is:

         Puppet[forge]


                     6
One way of using Apache software

  $ wget http://apache.org/httpd.tar.gz
  $ tar xzvf httpd.tar.gz
  $ cd httpd
  $ ./configure ; make
  $ make install
  ERROR: can't write to /usr/local/bin
  $ sudo make install
                                          7
A different way

  $ sudo apt-get install httpd
  Would you like to also upgrade your conf?




                                              8
Is there apt-get install hadoop ?

   Hadoop is still under very active development
   Hadoop is Java based
   Hadoop is a distributed application
   Hadoop is way more than HDFS + MR




                                              9
Project-by-project approach

   “Passively” maintained code
       Packaging, OS-level (init.d)
   Developer-centric view
       Edit-compile-debug cycle vs. deployment
       Lack of integration testing
   Differences in distributions/packaging:
       Where is this valid: /usr/libexec ?
   Combinatoric explosion of dependencies
                                                  10
Dependencies Inferno:

   Hive 0.8.1
       HBase (0.92, 0.90)
       Hadoop (1.0, 0.22, 0.23)

   A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
                                             12
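The Hive listing shows the same library bundled in more than one version, and that kind of clash can be spotted mechanically. The following is a minimal sketch, assuming jars follow the common `<name>-<version>.jar` naming scheme. The function name `list_duplicate_jars` is made up for illustration and is not part of Hive or Bigtop.

```shell
#!/bin/sh
# list_duplicate_jars DIR
# Prints the base name of every library that appears in DIR in more than
# one version. Assumes files are named <name>-<version>.jar; jars that do
# not match that pattern are ignored.
list_duplicate_jars() {
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue      # directory had no jars at all
        basename "$jar"
    done \
        | sed -n 's/-[0-9][0-9A-Za-z.-]*\.jar$//p' \
        | sort \
        | uniq -d
}
```

Run against a directory holding only the three jars in the listing, `list_duplicate_jars hive-0.8.1/lib` would report `log4j`.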
Remember what Debian did to Linux?


 GNU Software             Linux kernel




                                         13
Bigtop is trying to do it with Hadoop

Hadoop Ecosystem (Pig, Hive, Mahout)        Hadoop (HDFS + MR)




CDH4 beta 1
                                            14
What's there in Bigtop

       Build/Packaging infrastructure
           RPM, DEB, (tarballs, homebrew/MacPorts)
           VirtualBox, VMware and KVM VMs
           Fedora, openSUSE, Mageia, CentOS, Ubuntu
       Puppet deployment infrastructure
       Integration test infrastructure (iTest)
       Bigtop Jenkins:
           http://bigtop01.cloudera.org:8080
                                                      15
And the answer is:

      Puppet[Bigtop]


                     16
System software deployment

   Packages vs. Puppet code
       package/file/service
   What is packaging?
       dependency tracking
       build encapsulation
       java packaging
       file layout
       user creation
       service registration
                                                     17
Does it really work?

   Java packaging
       maven/ivy integration
   file layout
       side-by-side installations of the same package
   user creation
       LDAP/AD provisioning
   service registration
       start on install vs. start on reboot
                                                     18
Petascale distributed systems

       Scale
           Yahoo! ~5000 nodes
       Deployment orchestration
           Kerberos::Host_keytab <| title == "hdfs" |> ->
              Service["hadoop-hdfs-datanode"]
       Highly coordinated distributed system
           It ain't HTTPD/loadbalancer
           Rolling upgrades/asynchronous rollbacks
                                                             19
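The Puppet relationship on this slide says that the "hdfs" host keytab must be in place before the hadoop-hdfs-datanode service is managed. The same ordering can be sketched as a plain shell guard. This is only an illustration: the function name `start_after_keytab` and the keytab path in the usage note are assumptions, not something Bigtop ships.

```shell
#!/bin/sh
# start_after_keytab KEYTAB COMMAND [ARGS...]
# Runs COMMAND only if the KEYTAB file exists and is non-empty.
# Returns 1 without running COMMAND when the keytab is missing.
start_after_keytab() {
    keytab=$1
    shift
    if [ ! -s "$keytab" ]; then
        echo "keytab $keytab is missing, not starting service" >&2
        return 1
    fi
    "$@"
}
```

A call might look like `start_after_keytab /etc/hdfs.keytab service hadoop-hdfs-datanode start`, where the keytab location is hypothetical. Unlike the Puppet version, this guard only covers a single node and a single dependency.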
Back to tarballs and shell?

       What's better for Puppet: fpm or rpm?
       What is the role of Puppet?
           coordinating the entire system: lack of DSL
           converging an isolated node: will it ever work?
           a building block for an agent-based system
       One agent to rule them all?
           there's no spoon^H^H^H^H^H agent: Whirr
           MCollective
           Cloudera Manager, Ambari
                                                          20
Evolution, not perfection!
   Minimalistic, highly consistent packages
       /usr/lib/hadoop, /etc/hadoop/conf (alternative)
       fail gracefully: .... || : )
       Java packaging is not solved [yet]: symlinks
   Minimalistic Puppet code
       package/file/service
       masterless (most of the time)
       integration with Whirr
   BoxGrinder
                                                         21
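The "fail gracefully" bullet refers to the shell idiom of appending `|| :` to a command. `:` is the shell's built-in no-op and always succeeds, so an optional step that fails does not abort the surrounding script. A minimal sketch follows; the wrapper name `run_optional` is hypothetical and used only for illustration.

```shell
#!/bin/sh
# run_optional COMMAND [ARGS...]
# Runs COMMAND and always reports success, using the "|| :" idiom.
run_optional() {
    "$@" || :
}

# Under "set -e" a failing command normally aborts the script.
# Wrapped in the idiom, the failure is swallowed and execution continues.
(
    set -e
    run_optional false           # fails, but the failure is ignored
    echo "scriptlet finished"    # still reached
)
```

The trade-off is that real errors are hidden as well, so the idiom is best kept for steps that are truly optional.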
The road ahead
   New kind of configuration management
       /etc/hadoop vs Zookeeper
   New kinds of system packaging
       Parcels (tarballs + metadata)
       HPS (Hadoop Packaging System)
   Orchestration: to puppet or not to puppet?
       Cloudera Manager
       Apache Ambari (incubating)
       Reactor 8: http://reactor8.com
                                            22
Java Packaging
   Fate of Java
       OpenJDK
   OSGi
       Hadoop's view: MAPREDUCE-1700
        https://issues.apache.org/jira/browse/MAPREDUCE-1700
   Project Jigsaw
       Language tie-ins? Really?
   Linux vendors getting their act together
                                                               23
Integration testing
   Clean room provisioning
       Those ain't unit tests – they trash the system
   Cluster topology and cluster state discovery
       How can puppet help us?
   Cluster state manipulation
       Test-driven orchestration
       Chaos Monkey
   How to be successful in OS co-opetition
       Make everything pluggable (and subvert ;-))
                                                        24
Anatomy of iTest

   Versioned, JVM-based test/data artifacts
   Dependency between test artifacts
   Matching stack of integration tests
   Implementation
       Maven artifacts, pom files
       JUnit test-execution entry point
       Groovy for scripting

                                               25
Who's the target audience

       End users
           YOU!
       ASF Projects/Bigdata developers
           from Avro to Zookeeper
       Bigdata solutions vendors
           Cloudera, EMC, Hortonworks, Karmasphere
       DevOps
           eBay, Yahoo, Facebook, LinkedIn
                                                      26
Who's on-board?
   Cloudera
       CDH4 is 100% based on Bigtop (hadoop v2)
       Available @cloudera.com
   Canonical
       Ubuntu Server: Hadoop and Bigdata blueprint
        https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop

   TrendMicro
   Hortonworks (partially)
   EMC, eBay (early stages of prototyping)
                                                                          27
What's happening?
   A special release: Bigtop 0.3.0-incubating
       Hadoop 1.0.1
   Last stable release: Bigtop 0.5.0
       Hadoop 2.0.2-alpha
   Next stable release: Bigtop 0.6.0
       End of Mar 2013 release
       Hadoop 2.0.3-beta
       Major focus on developers
                                                 28
What does Bigtop need from you?

       More of you!
           Meetup: “Silicon Valley Hands-on Programming”
            http://www.meetup.com/HandsOnProgrammingEvents/
       More infrastructure for build/test
           EC2, Supercell, EMC magic cluster, CloudStack
       More integration tests
           Convince your bosses to commit to Bigtop
       Validate upstream release using Bigtop
                                                              29
Contact
    Bigtop home @Apache:
        http://incubator.apache.org/bigtop/
    Hangout places:
        {dev,user}@bigtop.apache.org
        #bigtop on Freenode
    Roman Shaposhnik
        rvs@apache.org, rvs@cloudera.com



                                 30

 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Recently uploaded (20)

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Deploying Hadoop-Based Bigdata Environments

  • 1. Deploying Hadoop-Based Bigdata Environments “[Tall] Tales From The Frontier” Roman Shaposhnik rvs@apache.org, Cloudera Inc.
  • 2. $ whoami: An open source software developer (Linux kernel, C/C++ compilers, FFmpeg, Plan9); a Hadoop and all-around UNIX guy; root@cloudera (member of the “Kitchen” team); Apache Software Foundation Incubator PMC ([Bigtop], Hadoop Development Tools, Celix, Helix); VP of Apache Bigtop
  • 3. ZooKeeper (coordination); HUE (web-based UI); Pig (DQL), Hive (SQL), Impala (SQL); HBase, YARN/MR1, Oozie; HDFS (filesystem)
  • 4. ZooKeeper (coordination); HUE (web-based UI); Pig (DQL), Hive (SQL), Impala (SQL); HBase, YARN/MR1, Oozie; HDFS (filesystem)
  • 5. It is a jungle out there: ZooKeeper, Hadoop (HDFS, YARN, MR1, HTTPFS), HBase, Pig, Hive, Impala; Sqoop, Oozie, Whirr, Mahout, Flume, Giraph, Hama, Hue, Solr, Crunch; JDK/JRE, Kerberos, Ganglia, Nagios, JSVC, Tomcat, Utils, Postgres, HTTPD
  • 6. And the answer is: Puppet[forge]
  • 7. One way of using Apache software
      $ wget http://apache.org/httpd.tar.gz
      $ tar xzvf httpd.tar.gz
      $ cd httpd
      $ ./configure ; make
      $ make install
      ERROR: can't write to /usr/local/bin
      $ sudo make install
  • 8. A different way
      $ sudo apt-get install httpd
      Would you like to also upgrade your conf?
  • 9. Is there apt-get install hadoop? Hadoop is still in very active development; Hadoop is Java based; Hadoop is a distributed application; Hadoop is way more than HDFS + MR
  • 10. Project-by-project approach: “Passively” maintained code; Packaging, OS-level (init.d); Developer-centric view; Edit-compile-debug cycle vs. deployment; Lack of integration testing; Differences in distributions/packaging (where is /usr/libexec valid?); Combinatoric explosion of dependencies
  • 11. Dependencies Inferno: Hive 0.8.1; HBase (0.92, 0.90); Hadoop (1.0, 0.22, 0.23). A million dollar question:
      $ tar xzvf hive-0.8.1.tar.gz
      $ ls hive-0.8.1/lib
  • 12. Dependencies Inferno: Hive 0.8.1; HBase (0.92, 0.90); Hadoop (1.0, 0.22, 0.23). A million dollar question:
      $ tar xzvf hive-0.8.1.tar.gz
      $ ls hive-0.8.1/lib
      hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
  • 13. Remember what Debian did to Linux? GNU Software; Linux kernel
  • 14. Bigtop is trying to do it with Hadoop: Hadoop Ecosystem (Pig, Hive, Mahout); Hadoop (HDFS + MR); Linux kernel; CDH4 beta 1
  • 15. What's there in Bigtop: Build/Packaging infrastructure; RPM, DEB, (tarballs, homebrew/MacPorts); VirtualBox, VMware and KVM VMs; Fedora, OpenSUSE, Mageia, CentOS, Ubuntu; Puppet deployment infrastructure; Integration test infrastructure (iTest); Bigtop Jenkins: http://bigtop01.cloudera.org:8080
  • 16. And the answer is: Puppet[Bigtop]
  • 17. System software deployment: Packages vs. Puppet code; package/file/service; What is packaging? dependency tracking; build encapsulation; java packaging; file layout; user creation; service registration
  • 18. Does it really work? Java packaging: maven/ivy integration; file layout: side-by-side installations of the same package; user creation: LDAP/AD provisioning; service registration: start on install vs. start on reboot
  • 19. Petascale distributed systems: Scale; Yahoo! ~5000 nodes; Deployment orchestration:
      Kerberos::Host_keytab <| title == "hdfs" |> -> Service["hadoop-hdfs-datanode"]
    Highly coordinated distributed system; It ain't HTTPD/loadbalancer; Rolling upgrades/asynchronous rollbacks
  • 20. Back to tarballs and shell? What's better for Puppet: fpm or rpm? What is the role of Puppet? coordinating the entire system: lack of DSL; converging an isolated node: will it ever work?; a building block for an agent-based system; One agent to rule them all? there's no spoon^H^H^H^H^H^ agent: Whirr; MCollective; Cloudera Manager, Ambari
  • 21. Evolution, not perfection! Minimalistic, highly consistent packages; /usr/lib/hadoop, /etc/hadoop/conf (alternative); fail gracefully: .... || : ); Java packaging is not solved [yet]: symlinks; Minimalistic Puppet code; package/file/service; masterless (most of the time); integration with Whirr; BoxGrinder
  • 22. The road ahead: New kind of configuration management; /etc/hadoop vs ZooKeeper; New kinds of system packaging; Parcels (tarballs + metadata); HPS (Hadoop Packaging System); Orchestration: to puppet or not to puppet? Cloudera Manager; Apache Ambari (incubating); Reactor 8: http://reactor8.com
  • 23. Java Packaging: Fate of Java; OpenJDK; OSGi; Hadoop's view: MAPREDUCE-1700 https://issues.apache.org/jira/browse/MAPREDUCE-1700; Project Jigsaw; Language tie-ins? Really? Linux vendors getting their act together
  • 24. Integration testing: Clean room provisioning; Those ain't unit tests – they trash the system; Cluster topology and cluster state discovery; How can puppet help us? Cluster state manipulation; Test-driven orchestration; Chaos Monkey; How to be successful in OS co-opetition; Make everything pluggable (and subvert ;-))
  • 25. Anatomy of iTest: Versioned, JVM-based test/data artifacts; Dependency between test artifacts; Matching stack of integration tests; Implementation; Maven artifacts, pom files; JUnit test-execution entry point; Groovy for scripting
  • 26. Who's the target audience: End users; YOU!; ASF Projects/Bigdata developers; from Avro to ZooKeeper; Bigdata solutions vendors; Cloudera, EMC, Hortonworks, Karmasphere; DevOps; eBay, Yahoo, Facebook, LinkedIn
  • 27. Who's on-board? Cloudera; CDH4 is 100% based on Bigtop (hadoop v2); Available @cloudera.com; Canonical; Ubuntu Server: Hadoop and Bigdata blueprint https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop; TrendMicro; Hortonworks (partially); EMC, eBay (early stages of prototyping)
  • 28. What's happening? A special release: Bigtop 0.3.0-incubating; Hadoop 1.0.1; Last stable release: Bigtop 0.5.0; Hadoop 2.0.2-alpha; Next stable release: Bigtop 0.6.0; End of Mar 2013 release; Hadoop 2.0.3-beta; Major focus on developers
  • 29. What does Bigtop need from you? More of you!; Meetup: “Silicon Valley Hands-on Programming” http://www.meetup.com/HandsOnProgrammingEvents/; More infrastructure for build/test; EC2, Supercell, EMC magic cluster, CloudStack; More integration tests; Convince your bosses to commit to Bigtop; Validate upstream release using Bigtop
  • 30. Contact: Bigtop home @Apache: http://incubator.apache.org/bigtop/; Hangout places: {dev,user}@bigtop.apache.org, #bigtop on Freenode; Roman Shaposhnik: rvs@apache.org, rvs@cloudera.com
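
The `ls hive-0.8.1/lib` listing on slide 12, with two log4j versions side by side, is the kind of conflict that can be checked mechanically. What follows is a minimal sketch in shell, not something taken from the deck or from Bigtop: the function name `find_duplicate_jars` is hypothetical, and it assumes jars are named `<artifact>-<version>.jar` with the version beginning at the first hyphen followed by a digit.

```shell
#!/bin/sh
# find_duplicate_jars DIR  (hypothetical helper, for illustration only)
# Print the name of every artifact that appears in DIR under more than
# one version. Assumes the <artifact>-<version>.jar naming convention.
find_duplicate_jars() {
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue     # glob did not match: no jars in DIR
        basename "$jar" .jar          # e.g. log4j-1.2.15
    done |
        sed 's/-[0-9].*$//' |         # strip the "-<version>" suffix
        sort |
        uniq -d                       # keep names that occur at least twice
}
```

For a directory holding only the three jars listed on the slide, the function prints `log4j`. Jars that do not follow the naming assumption (for example, ones with no version in the file name) are compared by their full base name.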
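
The "fail gracefully" item on slide 21 appears to refer to the shell idiom of appending `|| :` to a command so that its failure does not abort the enclosing script. A minimal sketch of that idiom follows, assuming a scriptlet that runs under `set -e`; the function name `post_install` and the use of `false` as the failing step are stand-ins for illustration, not Bigtop's actual package scriptlets.

```shell
#!/bin/sh
# Hypothetical post-install step, run in a subshell with `set -e` so that
# any unguarded failure would abort it, as in a strict package scriptlet.
post_install() (
    set -e
    false || :      # `false` stands in for an optional step that fails;
                    # `:` is the no-op builtin, so the list exits 0
    echo "post-install finished"
)
```

Without the `|| :` guard, the failing command would stop the subshell before the final `echo` is reached; with it, the step is treated as best-effort and the rest of the scriptlet still runs.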