Deploying and Managing Hadoop Clusters with AMBARI
Matt Foley and Hitesh Shah
Hortonworks, Inc.
mfoley@hortonworks.com
hitesh@hortonworks.com


 © Hortonworks Inc. 2012     Page 1
Matt Foley - Background
•  MTS at Hortonworks Inc.
   – Hadoop Core contributor, part of original ~25 in Yahoo! spin-out of
     Hortonworks
   – Currently managing engineering infrastructure for Hortonworks, including
     build and deployment automation
   – My team also volunteers Build Engineering infrastructure services to ASF,
     for Hadoop core and several related projects within Apache
   – Participated in the Hortonworks team working on Ambari implementation
     during transitional phase
   – Formerly, led software development for back end of Yahoo Mail for three
     years – 20,000 servers in hundreds of clusters, with 30 PB of data under
     management, 400M active users


•  Apache Hadoop, ASF
   – Committer and PMC member, Hadoop core
   – Release Manager – Hadoop-1.0

       Architecting the Future of Big Data
                                                                           Page 2
       © Hortonworks Inc. 2012
Hitesh Shah - Background
• MTS at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various
  frameworks all the way from data storage platforms to
  high throughput online ad-serving systems.




Overview
• Brief history – evolution of the Ambari project
• Installation
• Monitoring
• Management
• Invitation




All features are available today
• Apologies that screen shots are from HMC
  (Hortonworks Management Console) version of
  Ambari
• Same code as current Ambari, but with Hortonworks
  graphic elements
• You too can “skin” Ambari with your own logotype
  and graphic elements!




History of Ambari

Brief History of the Ambari Project
• Deployment, Monitoring, and Management of Hadoop
  and HBase clusters is:
  – HARD, due to massive scale and distributed services; and
  – DIFFERENT from other kinds of compute clusters,
    due to Hadoop’s intrinsic fault-tolerance
• We needed an Apache open source solution
• Started Ambari as an Apache Incubator project
  – Originally based in part on what was learned from “Hadoop
    Management System” project out of Yahoo!




History (continued)
• Early work specified a full architecture, including
  many elements that remain today:
  – State-based configuration management, rather than event-based
  – Cluster configuration as a data object, able to be saved and manipulated
  – Reliable deployment, parallelized for scalability
  – Insightful monitoring and alerting, sharing our deep experience with the
    community
  – Take advantage of Puppet to achieve idempotence on installs, and
    reliable start/stop of processes
  – Go beyond Puppet to offer orchestrated start/stop of distributed services
• The team started with a “whole cloth” design and
  build project
• 6 months into it, we figured out we had a 2-year
  project on our hands!
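
The state-based approach in the first bullets above can be illustrated with a minimal Python sketch. This is not Ambari or Puppet code: the `reconcile` and `apply` functions and the service names are hypothetical. The idea shown is that actions are computed from the difference between desired and actual state, so a second run after convergence produces no actions (idempotence).

```python
# Hypothetical illustration of state-based (desired-state) management.
# None of these names come from Ambari or Puppet.

def reconcile(desired, actual):
    """Return the actions needed to move `actual` to `desired`.

    Both arguments map a service name to "started" or "stopped".
    """
    actions = []
    for service, want in sorted(desired.items()):
        have = actual.get(service, "stopped")
        if have != want:
            actions.append(("start" if want == "started" else "stop", service))
    return actions


def apply(actions, actual):
    """Apply actions to a copy of `actual` and return the new state."""
    new_state = dict(actual)
    for verb, service in actions:
        new_state[service] = "started" if verb == "start" else "stopped"
    return new_state


desired = {"namenode": "started", "datanode": "started"}
actual = {"namenode": "started", "datanode": "stopped"}

first = reconcile(desired, actual)       # one action: start the datanode
converged = apply(first, actual)
second = reconcile(desired, converged)   # nothing left to do
```

An event-based design would instead replay "start datanode" whenever the event fires, whether or not the process is already running.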

Evolution
•  How to get a useful tool out to the community sooner?
•  Make more use of existing tech
   – Ganglia and Nagios for monitoring and alerting
   – Puppet for reliable deployment and process control
•  Commit to incremental delivery
   – First generation won’t have all the breadth and features desirable
   – But will be useful and worth using


•  And the team has completed the first usable version of Ambari
   over the last few weeks!
   – Offers a good, GUI-driven Deploy experience, currently limited to
     RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
   – Quite nice Monitoring, based on our experience managing
     multi-thousand-node Hadoop clusters at Yahoo!
   – A beginning on Management, with several basic post-install operations

Deployment with Ambari

Deployment and Installation Phases
• Preparation
• Cluster Pre-config
• Hadoop Stack Configuration
• Hadoop Stack Deploy / Install
• Service start-up and smoke test




Deployment and Installation (Preparation)
•  Prepare Ambari and the Ambari Agent (includes Puppet agent)
   –  Can follow instructions at
      http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
   –  Or download the HMC from Hortonworks after Summit, and access its
      documentation
•  Prepare access to ‘yum’ Repositories containing Hadoop Stack
   and Ambari dependencies
   –  If your nodes have direct internet access, can use provided RPMs to “install” the
      repos on each node
   –  Or, to avoid direct access from each node and minimize WAN traffic, can mirror the
      yum repositories to an internal server accessible from the nodes
•  Prepare nodes for installation commands
   –  Set up password-less ‘ssh’ for the root user (secured via public keys and agent
      forwarding) from the Install Master node to all other cluster nodes, so that
      ‘yum install’ and ‘puppet’ commands can be run on them
   –  Take care of any other issues that may prevent root ssh during the Deployment
      phase, such as iptables or SELinux.
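
A pre-flight check for the password-less ssh requirement might look like the following Python sketch. It is an assumption for illustration, not part of Ambari: the helper names are hypothetical and the host names are placeholders. It relies only on the standard OpenSSH options `BatchMode` and `ConnectTimeout`, so a node with missing keys fails fast instead of prompting for a password.

```python
# Hypothetical pre-flight helper; not part of Ambari. Host names are placeholders.
import subprocess


def ssh_check_command(host, user="root", timeout_s=5):
    """Build an ssh command that fails fast instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if keys are missing
        "-o", f"ConnectTimeout={timeout_s}",  # do not hang on unreachable nodes
        f"{user}@{host}",
        "true",                               # trivial remote command
    ]


def can_ssh(host, runner=subprocess.run):
    """Return True if key-based login to `host` succeeds."""
    result = runner(ssh_check_command(host), capture_output=True)
    return result.returncode == 0
```

Running `can_ssh("node1.example.com")` from the Install Master for every host in the list gives an early signal about key, iptables, or SELinux problems before deployment starts.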


Deployment and Installation (Pre-config)

• Start running Ambari
• Provide list of hosts
  – Works with Amazon EC2 IP addresses too
• Ambari does node Validation and Discovery
  – Confirms availability and access capability
  – Scans for node attributes and mount points
• Select desired services and data directory paths
• Automatic role assignments to nodes, with your
  approval
  – Based on node attributes and selected services
  – Currently based primarily on memory size, to be refined in future
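
As a rough illustration of memory-based role assignment, the Python sketch below places master roles on the hosts with the most memory and treats the rest as workers. This is not Ambari's actual algorithm; the function name, host names, and memory sizes are all hypothetical.

```python
# Hypothetical sketch of memory-based role assignment; not Ambari's algorithm.

def assign_roles(nodes, master_roles):
    """Place master roles on the hosts with the most memory, one role per host.

    `nodes` maps host name -> memory in GB. Remaining hosts become workers.
    """
    # Largest memory first; ties are broken by host name for a stable result.
    by_memory = sorted(nodes, key=lambda h: (-nodes[h], h))
    assignment = {}
    for role, host in zip(master_roles, by_memory):
        assignment[host] = role
    for host in by_memory:
        assignment.setdefault(host, "worker")
    return assignment


nodes = {"host-a": 16, "host-b": 64, "host-c": 32, "host-d": 16}
plan = assign_roles(nodes, ["NameNode", "JobTracker"])
```

The result is only a proposal; as the slide notes, the assignment is applied with your approval.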



Deployment and Installation (Configuration)
•  Currently supported Hadoop Stack components for installation:
   – Hadoop Core (required)
   – HBase
   – Pig
   – Hive
   – HCatalog
   – ZooKeeper (required for HBase, Hive, HCatalog)
   – Sqoop
   – Oozie
   – Ganglia
   – Nagios


•  Modify a subset of about 50 key parameters that most commonly
   need to be adjusted, depending on components selected
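
The "required" notes in the component list can be expressed as a small dependency table. The Python sketch below uses only the two facts stated on the slide (Hadoop Core is always required; ZooKeeper is required for HBase, Hive, and HCatalog). The function name and data layout are hypothetical, not Ambari's.

```python
# Dependency facts taken from the slide; the function itself is hypothetical.

REQUIRES = {
    "HBase": {"ZooKeeper"},
    "Hive": {"ZooKeeper"},
    "HCatalog": {"ZooKeeper"},
}
ALWAYS_REQUIRED = {"Hadoop Core"}


def resolve_components(selected):
    """Return the selected components plus everything they require."""
    resolved = set(selected) | ALWAYS_REQUIRED
    for component in selected:
        resolved |= REQUIRES.get(component, set())
    return sorted(resolved)


stack = resolve_components(["Pig", "HBase"])
```

Here selecting Pig and HBase pulls in Hadoop Core and ZooKeeper automatically.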


Deployment and Installation (Deploy)
•  Final review of Cluster and Stack parameters
•  Puppet agent on each node is invoked (in parallel) to reliably
   deploy needed packages
•  Actual fetch and install is managed with ‘yum’
   (for RHEL/CentOS) or comparable services
•  Success / failure is reported back to Install Master and the
   Ambari application
•  Log messages for failures are provided to assist debugging
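
The parallel deploy and report-back pattern described above can be sketched in Python as follows. `install_on_host` is a hypothetical stand-in for whatever triggers the per-node Puppet run; it is not an Ambari or Puppet API, and the host names and error text are made up for the example.

```python
# Hypothetical sketch of parallel deployment with per-host result reporting.
from concurrent.futures import ThreadPoolExecutor


def deploy(hosts, install_on_host, max_workers=16):
    """Run `install_on_host(host)` on all hosts in parallel.

    Returns {host: (ok, message)} so failures can be reported with their logs.
    """
    def run(host):
        try:
            return host, (True, install_on_host(host))
        except Exception as exc:  # record the failure instead of aborting the batch
            return host, (False, str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, hosts))


def fake_install(host):
    """Stand-in installer that fails on one host."""
    if host == "node3":
        raise RuntimeError("yum: package not found")
    return "installed"


report = deploy(["node1", "node2", "node3"], fake_install)
failed = sorted(h for h, (ok, _) in report.items() if not ok)
```

One failing node does not stop the others; its error message is kept in the report for debugging.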




Deployment and Installation (Smoke Test)
After successful install:

•  Ambari provides “orchestration” to start-up distributed services
   in dependency order

•  Puppet “kicks” are used to reliably (mostly) start and stop
   service processes on individual nodes

•  After each distributed service is started, a smoke test is run and
   results reported

•  Each component is smoke tested before its dependent components are started


After a successful smoke test, you can be confident that your
selected components have been successfully installed and
started, and are running correctly.
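
Start-up in dependency order with a smoke test after each service can be sketched with a topological sort. The Python example below is a hypothetical illustration, not Ambari's orchestrator. Only the ZooKeeper dependency comes from the slides; the other edges, and the `start` and `smoke_test` callbacks, are assumptions.

```python
# Hypothetical sketch. Only the ZooKeeper dependency comes from the slides;
# the other edges are assumptions for illustration.
from graphlib import TopologicalSorter  # Python 3.9+

# Each service maps to the set of services that must be running first.
DEPENDS_ON = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
}


def start_cluster(depends_on, start, smoke_test):
    """Start services in dependency order; stop at the first failed smoke test."""
    started = []
    for service in TopologicalSorter(depends_on).static_order():
        start(service)
        if not smoke_test(service):
            raise RuntimeError(f"smoke test failed for {service}")
        started.append(service)
    return started


order = start_cluster(DEPENDS_ON, start=lambda s: None, smoke_test=lambda s: True)
```

Because a failed smoke test raises immediately, no dependent service is started on top of a component that is not working.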

Going forward
•  Multiple OS support
   – RHEL6/CentOS6
   – Ubuntu and Debian
   – SUSE/SLES
   – Windows
•  Hadoop Security support, including secure install for all
   components
•  HA support
•  Hadoop 2.0 support
•  Improved GUI user interface
•  Integration: Provide CLI commands for invoking Puppet scripts,
   and Web APIs where appropriate
•  Etc.



Monitoring with Ambari

[Screenshot: Monitoring Dashboard]

Ambari Monitoring
•  Basic Monitoring capabilities for Hadoop Cluster Services
   –  Up/Down status for installed Hadoop services
   –  Key Alerts configured for health, performance and usage monitoring of
      Hadoop services
   –  Consolidated summary information for Hadoop Services (HDFS, M/R & HBase)
   –  Key service metrics graphs for temporal analysis of service performance, utilization
      and health (plus system metrics: CPU, memory, network, etc.)


•  Efficient collection and visualization of monitoring metrics
   –  Lightweight alert condition checks (mostly over network) for better scalability


•  Leverage Open Source monitoring systems such as Nagios & Ganglia
   –  Nagios - for Alert Monitoring
   –  Ganglia/RRDTool for Hadoop metrics graphs


•  Simple and Intuitive UI to monitor the Hadoop cluster status
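
A lightweight, network-based up/down check of the kind mentioned above could look like this Python sketch. It is written in the style of a Nagios plugin (return code 0 for OK, 2 for CRITICAL, which are the standard plugin codes), but it is a hypothetical example and not one of the checks shipped with Ambari. Host and port are placeholders.

```python
# Hypothetical check in the style of a Nagios plugin; not shipped with Ambari.
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin return codes


def check_tcp(host, port, timeout_s=3.0):
    """Return (status, message) depending on whether a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"
```

The check runs on the monitoring server and needs nothing installed on the monitored node, which is what keeps this style of alerting cheap at cluster scale.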


[Screenshots: HDFS Service, Map/Reduce Service, HBase Service]

Going forward
•  Rapid iterations with Ambari Open Source community to add more
   monitoring capabilities e.g.
   –  More service Alerts, Summary stats & Reporting for the Hadoop services
   –  Queue/Job level monitoring & Diagnostic Reporting for M/R
   –  Improved Visualization of service metrics graphs & reports
   –  Ability to customize dashboard with relevant graphs, alerts and service information


•  RESTful APIs for Hadoop Monitoring
   –  For integration with Enterprise and Cloud Management Systems, and
      “powered by Ambari” products integration
   –  CLIs


•  Ability to integrate with third party monitoring tools in place of Nagios &
   Ganglia

•  Best practices, tips and guidelines for using Monitoring dashboard for
   identifying and debugging common cluster problems

Management with Ambari

Management
• “Management” can include many different
  post-install activities with Hadoop clusters

• Ambari currently supports only a small set:
  – Start / Stop individual services
       – Dependent services will be automatically stopped also

  – Change configuration parameters for a service
       – Cannot currently change data directory paths

  – Add nodes to the Cluster
       – Decommissioning nodes is currently a manual process

  – Uninstall the Cluster
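
The rule that stopping a service also stops its dependents can be sketched as a walk over the dependency graph. The Python example below is hypothetical and not taken from Ambari; the dependency edges are assumptions chosen for illustration.

```python
# Hypothetical sketch; the dependency edges are assumptions for illustration.

# Each service maps to the set of services it depends on.
DEPENDS_ON = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
    "Hive": {"MapReduce", "ZooKeeper"},
}


def services_to_stop(target, depends_on):
    """Return `target` plus every service that directly or indirectly depends on it,
    ordered so that dependents are stopped before what they depend on."""
    order = []
    seen = set()

    def visit(service):
        if service in seen:
            return
        seen.add(service)
        for other, deps in depends_on.items():
            if service in deps:
                visit(other)      # stop dependents first
        order.append(service)

    visit(target)
    return order


plan = services_to_stop("HDFS", DEPENDS_ON)
```

With these assumed edges, stopping HDFS first stops Hive, MapReduce, and HBase, and leaves ZooKeeper running.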


Going forward
•  Lots more management actions supported
   – Security and user management
   – HA alerting and recovery
   – Extensions of current functionalities
   – Etc.


•  Integration: RESTful APIs / web services for integration with
   established management tools in the data center

•  Improved GUI user interface




Invitation
• Deployment, Monitoring, and Management – this is
  just the first generation!
• If you are interested in these functionalities and want
  to participate in an Apache open source project,
  please consider becoming a contributor to the
  AMBARI (incubating) project!
• http://incubator.apache.org/ambari/mail-lists.html




Thank you.




  Architecting the Future of Big Data
                                        Page 41
  © Hortonworks Inc. 2012

More Related Content

What's hot

Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 

What's hot (20)

Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Ozone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objectsOzone: scaling HDFS to trillions of objects
Ozone: scaling HDFS to trillions of objects
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
Apache HBase at Airbnb
Apache HBase at Airbnb Apache HBase at Airbnb
Apache HBase at Airbnb
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Introduction to Apache Sqoop
Introduction to Apache SqoopIntroduction to Apache Sqoop
Introduction to Apache Sqoop
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 

Viewers also liked

Ambari Meetup: Architecture and Demo
Ambari Meetup: Architecture and DemoAmbari Meetup: Architecture and Demo
Ambari Meetup: Architecture and Demo
Hortonworks
 
Apache Ambari - What's New in 1.4.2
Apache Ambari - What's New in 1.4.2Apache Ambari - What's New in 1.4.2
Apache Ambari - What's New in 1.4.2
Hortonworks
 

Viewers also liked (20)

Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
Discover HDP 2.1: Using Apache Ambari to Manage Hadoop Clusters
 
Ambari Meetup: Architecture and Demo
Ambari Meetup: Architecture and DemoAmbari Meetup: Architecture and Demo
Ambari Meetup: Architecture and Demo
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Cloumon Product Introduction
Cloumon Product IntroductionCloumon Product Introduction
Cloumon Product Introduction
 
Managing your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with AmbariManaging your Hadoop Clusters with Ambari
Managing your Hadoop Clusters with Ambari
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache Ambari
 
Hadoop 기반 빅데이터 이해
Hadoop 기반 빅데이터 이해Hadoop 기반 빅데이터 이해
Hadoop 기반 빅데이터 이해
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
 
Hadoop Report
Hadoop ReportHadoop Report
Hadoop Report
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Past, Present and Future of Apache Ambari
Past, Present and Future of Apache AmbariPast, Present and Future of Apache Ambari
Past, Present and Future of Apache Ambari
 
Faster Python
Faster PythonFaster Python
Faster Python
 
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetupΠαρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
Παρουσίαση Hadoop, MapReduce και Mahout στο 1o Hadoop UserGroup meetup
 
Reversing the dropbox client on windows
Reversing the dropbox client on windowsReversing the dropbox client on windows
Reversing the dropbox client on windows
 
Apache Ambari - What's New in 1.4.2
Apache Ambari - What's New in 1.4.2Apache Ambari - What's New in 1.4.2
Apache Ambari - What's New in 1.4.2
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 

Similar to Deploying and Managing Hadoop Clusters with AMBARI

Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
DataWorks Summit
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
PatrickCrompton
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 

Similar to Deploying and Managing Hadoop Clusters with AMBARI (20)

Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'Don't Let Security Be The 'Elephant in the Room'
Don't Let Security Be The 'Elephant in the Room'
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
Inside hadoop-dev
Inside hadoop-devInside hadoop-dev
Inside hadoop-dev
 
Mrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big DataMrinal devadas, Hortonworks Making Sense Of Big Data
Mrinal devadas, Hortonworks Making Sense Of Big Data
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
 
Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5Hadoop summit cloudera keynote_v5
Hadoop summit cloudera keynote_v5
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Deploying and Managing Hadoop Clusters with AMBARI

  • 1. Deploying and Managing Hadoop Clusters with AMBARI Matt Foley and Hitesh Shah Hortonworks, Inc. mfoley@hortonworks.com hitesh@hortonworks.com © Hortonworks Inc. 2012 Page 1
  • 2. Matt Foley - Background
      • MTS at Hortonworks Inc.
          – Hadoop Core contributor, part of original ~25 in Yahoo! spin-out of Hortonworks
          – Currently managing engineering infrastructure for Hortonworks, including build and deployment automation
          – My team also volunteers Build Engineering infrastructure services to ASF, for Hadoop core and several related projects within Apache
          – Participated in the Hortonworks team working on Ambari implementation during transitional phase
          – Formerly, led software development for back end of Yahoo Mail for three years – 20,000 servers in hundreds of clusters, with 30 PB of data under management, 400M active users
      • Apache Hadoop, ASF
          – Committer and PMC member, Hadoop core
          – Release Manager – Hadoop-1.0
  • 3. Hitesh Shah - Background
      • MTS at Hortonworks Inc.
      • Committer for Apache MapReduce and Ambari
      • Earlier, spent 8+ years at Yahoo! building various frameworks all the way from data storage platforms to high throughput online ad-serving systems.
  • 4. Overview
      • Brief history – evolution of the Ambari project
      • Installation
      • Monitoring
      • Management
      • Invitation
  • 5. All features are available today
      • Apologies that screen shots are from HMC (Hortonworks Management Console) version of Ambari
      • Same code as current Ambari, but with Hortonworks graphic elements
      • You too can “skin” Ambari with your own logotype and graphic elements!
  • 6. History Of Ambari
  • 7. Brief History of the Ambari Project
      • Deployment, Monitoring, and Management of Hadoop and HBase clusters is:
          – HARD, due to massive scale and distributed services; and
          – DIFFERENT from other kinds of compute clusters, due to Hadoop’s intrinsic fault-tolerance
      • We needed an Apache open-source solution
      • Started Ambari as an Apache incubator project
          – Originally based in part on what was learned from “Hadoop Management System” project out of Yahoo!
  • 8. History (continued)
      • Early work specified a full architecture, including many elements that remain today:
          – State-based configuration management, rather than event-based
          – Cluster configuration as a data object, able to be saved and manipulated
          – Reliable deployment, parallelized for scalability
          – Insightful monitoring and alerting, sharing our deep experience with the community
          – Take advantage of Puppet to achieve idempotence on installs, and reliable start/stop of processes
          – Go beyond Puppet to offer orchestrated start/stop of distributed services
      • The team started with a “whole cloth” design and build project
      • 6 months into it, we figured out we had a 2-year project on our hands!
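The "state-based, rather than event-based" and "idempotence" points on slide 8 can be illustrated with a minimal sketch. This is not Ambari or Puppet code: the function names `reconcile` and `apply_actions` and the dictionary model of process state are hypothetical, chosen only to show the idea that a tool compares desired state with actual state and that a second run finds nothing left to do.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` to `desired`.

    Both arguments map a process name to "running" or "stopped".
    (Hypothetical model, not Ambari's or Puppet's data format.)
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name, "stopped")
        if have != want:
            actions.append(("start" if want == "running" else "stop", name))
    return actions


def apply_actions(actual, actions):
    """Apply actions to a copy of the actual state and return the new state."""
    new_state = dict(actual)
    for verb, name in actions:
        new_state[name] = "running" if verb == "start" else "stopped"
    return new_state


desired = {"namenode": "running", "datanode": "running"}
actual = {"namenode": "running", "datanode": "stopped"}

first_pass = reconcile(desired, actual)       # only the datanode needs work
converged = apply_actions(actual, first_pass)
second_pass = reconcile(desired, converged)   # idempotent: nothing left to do
```

Because the desired configuration is plain data, it can also be saved and compared between runs, which is the "cluster configuration as a data object" point on the same slide.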
  • 9. Evolution
      • How to get a useful tool out to the community sooner?
      • Make more use of existing tech
          – Ganglia and Nagios for monitoring and alerting
          – Puppet for reliable deployment and process control
      • Commit to incremental delivery
          – First generation won’t have all the breadth and features desirable
          – But will be useful and worth using
      • And the team has completed the first usable version of Ambari over the last few weeks!
          – Offers a good, GUI-driven Deploy experience, currently limited to RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
          – Quite nice Monitoring, based on our experience managing multi-thousand-node Hadoop clusters at Yahoo!
          – A beginning on Management, with several basic post-install operations
  • 10. Deployment With Ambari
  • 11. Deployment and Installation Phases
      • Preparation
      • Cluster Pre-config
      • Hadoop Stack Configuration
      • Hadoop Stack Deploy / Install
      • Service start-up and smoke test
  • 12. Deployment and Installation (Preparation)
      • Prepare Ambari and the Ambari Agent (includes Puppet agent)
          – Can follow instructions at http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
          – Or download the HMC from Hortonworks after Summit, and access its documentation
      • Prepare access to ‘yum’ Repositories containing Hadoop Stack and Ambari dependencies
          – If your nodes have direct internet access, can use provided RPMs to “install” the repos on each node
          – Or, to avoid direct access from each node and minimize WAN traffic, can mirror the yum repositories to an internal server accessible from the nodes
      • Prepare nodes for installation commands
          – Set up password-less ‘ssh’ for root user (secured via public keys and agent forwarding) from Install Master node to all other cluster nodes, so can run ‘yum install’ and ‘puppet’ commands
          – Take care of any other issues that may prevent root ssh during the Deployment phase, such as iptables or SELinux.
  • 13. Deployment and Installation (Pre-config)
      • Start running Ambari
      • Provide list of hosts
          – Works with Amazon EC2 IP addresses too
      • Ambari does node Validation and Discovery
          – Confirms availability and access capability
          – Scans for node attributes and mount points
      • Select desired services and data directory paths
      • Automatic role assignments to nodes, with your approval
          – Based on node attributes and selected services
          – Currently based primarily on memory size, to be refined in future
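Slide 13 says role assignment is currently driven mainly by memory size and is presented for approval. A minimal sketch of such a heuristic follows. It is an assumption about how one might do this, not Ambari's actual algorithm: the function `suggest_roles`, the host names, and the memory figures are all hypothetical, and NameNode and JobTracker are used only as familiar examples of master roles.

```python
def suggest_roles(hosts, master_roles):
    """Suggest one host per master role, preferring hosts with more memory.

    hosts: dict mapping hostname -> memory in GB (a hypothetical scan result).
    master_roles: list of role names, most memory-hungry first.
    Hosts that receive no master role are suggested as workers.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Largest memory first; ties broken by hostname for a stable result.
    by_memory = sorted(hosts, key=lambda h: (-hosts[h], h))
    assignment = {}
    for i, role in enumerate(master_roles):
        # Reuse the largest hosts round-robin if roles outnumber hosts.
        assignment[role] = by_memory[i % len(by_memory)]
    masters = set(assignment.values())
    workers = [h for h in by_memory if h not in masters] or by_memory
    return assignment, workers


hosts = {"node1": 16, "node2": 64, "node3": 32, "node4": 16}
assignment, workers = suggest_roles(hosts, ["NameNode", "JobTracker"])
```

The function only returns a suggestion, which mirrors the "with your approval" step: the operator can inspect and override `assignment` before anything is installed.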
  • 14. .
  • 15. .
  • 16. .
  • 17. Deployment and Installation (Configuration)
      • Currently supported Hadoop Stack components for installation:
          – Hadoop Core (required)
          – HBase
          – Pig
          – Hive
          – HCatalog
          – ZooKeeper (required for HBase, Hive, HCatalog)
          – Sqoop
          – Oozie
          – Ganglia
          – Nagios
      • Modify a subset of about 50 key parameters that most commonly need to be adjusted, depending on components selected
  • 18. .
  • 19. .
  • 20. .
  • 21. Deployment and Installation (Deploy)
      • Final review of Cluster and Stack parameters
      • Puppet agent on each node is invoked (in parallel) to reliably deploy needed packages
      • Actual fetch and install is managed with ‘yum’ (for RHEL/CentOS) or comparable services
      • Success / failure is reported back to Install Master and the Ambari application
      • Log messages for failures are provided to assist debugging
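The deploy step on slide 21 (invoke every node in parallel, then report success or failure with a log message) can be sketched with Python's standard thread pool. This is only an illustration under stated assumptions: `deploy_to_host` is a hypothetical stand-in that simulates the per-node work instead of invoking Puppet or yum, and the host names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_to_host(host):
    """Stand-in for the per-node install step (hypothetical).

    A real implementation would trigger the agent on the node; here a
    failure is simulated for any host whose name contains "bad".
    """
    if "bad" in host:
        raise RuntimeError(f"package install failed on {host}")
    return "installed"


def deploy_all(hosts, max_workers=8):
    """Run deploy_to_host on every host in parallel and collect a report."""
    report = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(deploy_to_host, host) for host in hosts}
        for host, future in futures.items():
            error = future.exception()  # blocks until this host has finished
            if error is None:
                report[host] = ("success", future.result())
            else:
                # Keep the message so the failure can be debugged later.
                report[host] = ("failure", str(error))
    return report


report = deploy_all(["node1", "node2", "bad-node3"])
```

One failing node does not abort the others; every host ends up with an entry in `report`, which matches the slide's per-node success/failure reporting.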
  • 22. .
  • 23. .
  • 24. .
  • 25. Deployment and Installation (Smoke Test)
      After successful install:
      • Ambari provides “orchestration” to start-up distributed services in dependency order
      • Puppet “kicks” are used to reliably (mostly) start and stop service processes on individual nodes
      • After each distributed service is started, a smoketest is run and results reported
      • Each component is smoketested before dependent components
      After successful smoketest, you can be confident that your selected components have been successfully installed and started, and are running correctly.
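Starting services "in dependency order" and smoke-testing each one before its dependents amounts to walking a dependency graph in topological order. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The dependency map is an illustrative assumption, not taken from Ambari; only the ZooKeeper requirement for HBase comes from the slides. The `start` and `smoke_test` callables are hypothetical placeholders for the real actions.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Service -> set of services it depends on (illustrative assumptions).
DEPENDENCIES = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
}


def start_services(dependencies, start, smoke_test):
    """Start each service after its dependencies, smoke-testing as we go.

    Stops at the first failed smoke test so that dependent services are
    never started on top of a broken one.
    """
    started = []
    for service in TopologicalSorter(dependencies).static_order():
        start(service)
        if not smoke_test(service):
            raise RuntimeError(
                f"smoke test failed for {service}; started so far: {started}"
            )
        started.append(service)
    return started


log = []
order = start_services(DEPENDENCIES, start=log.append, smoke_test=lambda s: True)
```

The exact order among independent services (for example HDFS and ZooKeeper) is not fixed; the only guarantee is that each service appears after everything it depends on.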
  • 26. Going forward
      • Multiple OS support
          – RHEL6/CentOS6
          – Ubuntu and Debian
          – SUSE/SLES
          – Windows
      • Hadoop Security support, including secure install for all components
      • HA support
      • Hadoop 2.0 support
      • Improved GUI user interface
      • Integration: Provide CLI commands for invoking Puppet scripts, and Web APIs where appropriate
      • Etc.
  • 27. Monitoring With Ambari
  • 28. Monitoring Dashboard
  • 29. Ambari Monitoring
      • Basic Monitoring capabilities for Hadoop Cluster Services
          – Up/Down status for installed Hadoop services
          – Key Alerts configured for health, performance and usage monitoring of Hadoop services
          – Consolidated summary information for Hadoop Services (HDFS, M/R & HBase)
          – Key service metrics graphs for temporal analysis of service performance, utilization and health (plus system metrics: CPU, memory, network, etc.)
      • Efficient collection and visualization of monitoring metrics
          – Lightweight alert condition checks (mostly over network) for better scalability
      • Leverage Open Source monitoring systems such as Nagios & Ganglia
          – Nagios for Alert Monitoring
          – Ganglia/RRDTool for Hadoop metrics graphs
      • Simple and Intuitive UI to monitor the Hadoop cluster status
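As an illustration of a lightweight, network-based up/down check of the kind slide 29 describes, the sketch below tests whether a TCP port accepts connections. It is not a Nagios plugin and does not reproduce Ambari's actual checks; the function name `port_is_up` and its defaults are hypothetical.

```python
import socket


def port_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A cheap check: it needs no agent on the target node, only network
    access, so one monitoring host can poll many nodes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A check like this only shows that something is listening on the port; it says nothing about whether the service behind it is healthy, which is why the slides pair up/down status with metrics graphs and service summaries.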
  • 30. HDFS Service
  • 31. Map/Reduce Service
  • 32. HBase Service
  • 33. Going forward
      • Rapid iterations with Ambari Open Source community to add more monitoring capabilities, e.g.
          – More services Alerts, Summary stats & Reporting for the Hadoop services
          – Queue/Job level monitoring & Diagnostic Reporting for M/R
          – Improved Visualization of service metrics graphs & reports
          – Ability to customize dashboard with relevant graphs, alerts and service information
      • RESTful APIs for Hadoop Monitoring
          – For integration with Enterprise and Cloud Management Systems, and “powered by Ambari” products integration
          – CLIs
      • Ability to integrate with third party monitoring tools in place of Nagios & Ganglia
      • Best practices, tips and guidelines for using Monitoring dashboard for identifying and debugging common cluster problems
  • 34. Management With Ambari
  • 35. Management
      • “Management” can include many different post-install activities with Hadoop clusters
      • Ambari currently supports only a small set:
          – Start / Stop individual services
              – Dependent services will be automatically stopped also
          – Change configuration parameters for a service
              – Cannot currently change data directory paths
          – Add nodes to the Cluster
              – Decommissioning nodes is currently a manual process
          – Uninstall the Cluster
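The note that stopping a service also stops its dependent services can be sketched as a walk over the reversed dependency graph. As before, this is a hedged illustration and not Ambari's implementation: the dependency map and the function `stop_order` are hypothetical.

```python
# Service -> set of services it depends on (illustrative assumptions).
DEPENDENCIES = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
}


def stop_order(dependencies, service):
    """Return the services to stop: dependents first, then the service itself."""
    # Invert the map: service -> services that depend on it.
    dependents = {}
    for svc, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    order, seen = [], set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for child in sorted(dependents.get(svc, ())):
            visit(child)
        order.append(svc)  # appended only after everything that depends on it

    visit(service)
    return order


hdfs_stop = stop_order(DEPENDENCIES, "HDFS")
hbase_stop = stop_order(DEPENDENCIES, "HBase")
```

Stopping HDFS therefore takes HBase and MapReduce down first, while stopping HBase alone leaves HDFS and ZooKeeper running.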
  • 36. .
  • 37. .
  • 38. .
  • 39. Going forward
      • Lots more management actions supported
          – Security and user management
          – HA alerting and recovery
          – Extensions of current functionalities
          – Etc.
      • Integration: RESTful APIs / web services for integration with established management tools in the data center
      • Improved GUI user interface
  • 40. Invitation
      • Deployment, Monitoring, and Management – this is just the first generation!
      • If you are interested in these functionalities and want to participate in an Apache open-source project, please consider becoming a contributor to the AMBARI (incubating) project!
      • http://incubator.apache.org/ambari/mail-lists.html
  • 41. Thank you.