1. Deploying and Managing
Hadoop Clusters with
AMBARI
Matt Foley and Hitesh Shah
Hortonworks, Inc.
mfoley@hortonworks.com
hitesh@hortonworks.com
© Hortonworks Inc. 2012 Page 1
2. Matt Foley - Background
• MTS at Hortonworks Inc.
– Hadoop Core contributor, part of original ~25 in Yahoo! spin-out of
Hortonworks
– Currently managing engineering infrastructure for Hortonworks, including
build and deployment automation
– My team also volunteers Build Engineering infrastructure services to ASF,
for Hadoop core and several related projects within Apache
– Participated in the Hortonworks team working on Ambari implementation
during transitional phase
– Formerly, led software development for back end of Yahoo Mail for three
years – 20,000 servers in hundreds of clusters, with 30 PB of data under
management, 400M active users
• Apache Hadoop, ASF
– Committer and PMC member, Hadoop core
– Release Manager – Hadoop-1.0
3. Hitesh Shah - Background
• MTS at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various
frameworks, all the way from data storage platforms to
high-throughput online ad-serving systems.
4. Overview
• Brief history – evolution of the Ambari project
• Installation
• Monitoring
• Management
• Invitation
5. All features are available today
• Apologies that the screenshots are from the HMC
(Hortonworks Management Console) version of
Ambari
• Same code as current Ambari, but with Hortonworks
graphic elements
• You too can “skin” Ambari with your own logotype
and graphic elements!
7. Brief History of the Ambari Project
• Deployment, Monitoring, and Management of Hadoop
and HBase clusters is:
– HARD, due to massive scale and distributed services; and
– DIFFERENT from other kinds of compute clusters,
due to Hadoop’s intrinsic fault-tolerance
• We needed an Apache open-source solution
• Started Ambari as an Apache incubator project
– Originally based in part on what was learned from “Hadoop
Management System” project out of Yahoo!
8. History (continued)
• Early work specified a full architecture, including
many elements that remain today:
– State-based configuration management, rather than event-based
– Cluster configuration as a data object, able to be saved and manipulated
– Reliable deployment, parallelized for scalability
– Insightful monitoring and alerting, sharing our deep experience with the
community
– Take advantage of Puppet to achieve idempotence on installs, and
reliable start/stop of processes
– Go beyond Puppet to offer orchestrated start/stop of distributed services
• The team started with a “whole cloth” design and
build project
• 6 months into it, we figured out we had a 2-year
project on our hands!
9. Evolution
• How to get a useful tool out to the community sooner?
• Make more use of existing tech
– Ganglia and Nagios for monitoring and alerting
– Puppet for reliable deployment and process control
• Commit to incremental delivery
– First generation won’t have all the breadth and features desirable
– But will be useful and worth using
• And the team has completed the first usable version of Ambari
over the last few weeks!
– Offers a good, GUI-driven Deploy experience, currently limited to
RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
– Quite nice Monitoring, based on our experience managing
multi-thousand-node Hadoop clusters at Yahoo!
– A beginning on Management, with several basic post-install operations
11. Deployment and Installation Phases
• Preparation
• Cluster Pre-config
• Hadoop Stack Configuration
• Hadoop Stack Deploy / Install
• Service start-up and smoke test
12. Deployment and Installation (Preparation)
• Prepare Ambari and the Ambari Agent (includes Puppet agent)
– Can follow instructions at
http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
– Or download the HMC from Hortonworks after Summit, and access its
documentation
• Prepare access to ‘yum’ Repositories containing Hadoop Stack
and Ambari dependencies
– If your nodes have direct internet access, can use provided RPMs to “install” the
repos on each node
– Or, to avoid direct access from each node and minimize WAN traffic, can mirror the
yum repositories to an internal server accessible from the nodes
• Prepare nodes for installation commands
– Set up password-less ‘ssh’ for root user (secured via public keys and agent
forwarding) from Install Master node to all other cluster nodes, so can run ‘yum
install’ and ‘puppet’ commands
– Take care of any other issues that may prevent root ssh during the Deployment
phase, such as iptables or SELinux.
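The password-less ssh requirement above can be sanity-checked before starting the deploy. A minimal sketch, not part of Ambari; the `runner` hook is an assumption added only to make the function testable:

```python
# Verify non-interactive root ssh to every cluster node before deploying.
import subprocess

def check_root_ssh(hosts, runner=subprocess.run, timeout=10):
    """Return (reachable, unreachable) host lists using non-interactive ssh."""
    reachable, unreachable = [], []
    for host in hosts:
        # BatchMode=yes makes ssh fail fast instead of prompting for a
        # password, so any node still requiring interactive auth is flagged.
        result = runner(
            ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}",
             f"root@{host}", "true"],
            capture_output=True,
        )
        (reachable if result.returncode == 0 else unreachable).append(host)
    return reachable, unreachable
```

Running this from the Install Master before the Deploy phase surfaces iptables, SELinux, or key-distribution problems early, instead of mid-install.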
13. Deployment and Installation (Pre-config)
• Start running Ambari
• Provide list of hosts
– Works with Amazon EC2 IP addresses too
• Ambari does node Validation and Discovery
– Confirms availability and access capability
– Scans for node attributes and mount points
• Select desired services and data directory paths
• Automatic role assignments to nodes, with your
approval
– Based on node attributes and selected services
– Currently based primarily on memory size, to be refined in future
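The memory-based role assignment can be sketched as below; the role names and the "largest node gets the NameNode" heuristic are illustrative assumptions, not Ambari's actual logic:

```python
# Sketch: assign Hadoop roles to nodes based on memory size.
def assign_roles(nodes):
    """nodes: dict of hostname -> RAM in GB. Returns hostname -> role list."""
    # Put master daemons on the largest-memory nodes; every node also
    # carries the worker roles.
    by_mem = sorted(nodes, key=nodes.get, reverse=True)
    roles = {host: ["DataNode", "TaskTracker"] for host in nodes}
    roles[by_mem[0]].insert(0, "NameNode")
    roles[by_mem[1 % len(by_mem)]].insert(0, "JobTracker")
    return roles
```

The user then approves or adjusts the proposed mapping, as the slide describes.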
17. Deployment and Installation (Configuration)
• Currently supported Hadoop Stack components for installation:
– Hadoop Core (required)
– HBase
– Pig
– Hive
– HCatalog
– ZooKeeper (required for HBase, Hive, HCatalog)
– Sqoop
– Oozie
– Ganglia
– Nagios
• Modify a subset of about 50 key parameters that most commonly
need to be adjusted, depending on components selected
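For illustration, a handful of those parameters can be rendered into Hadoop's `*-site.xml` property format like this. The keys are standard Hadoop 1.x names; the values are made-up defaults, not Ambari's actual choices:

```python
# Render a parameter subset into Hadoop's *-site.xml property format.
from xml.sax.saxutils import escape

def to_site_xml(params):
    lines = ["<configuration>"]
    for name, value in sorted(params.items()):
        lines += ["  <property>",
                  f"    <name>{escape(name)}</name>",
                  f"    <value>{escape(str(value))}</value>",
                  "  </property>"]
    lines.append("</configuration>")
    return "\n".join(lines)

# Illustrative HDFS overrides of the kind the install wizard exposes.
hdfs_overrides = {
    "dfs.name.dir": "/hadoop/hdfs/namenode",
    "dfs.data.dir": "/hadoop/hdfs/data",
    "dfs.replication": 3,
}
```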
21. Deployment and Installation (Deploy)
• Final review of Cluster and Stack parameters
• Puppet agent on each node is invoked (in parallel) to reliably
deploy needed packages
• Actual fetch and install is managed with ‘yum’
(for RHEL/CentOS) or comparable services
• Success / failure is reported back to Install Master and the
Ambari application
• Log messages for failures are provided to assist debugging
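The parallel deploy-and-report step can be sketched as follows; `install_on` is a hypothetical stand-in for the per-node Puppet/yum invocation, not an Ambari API:

```python
# Sketch: run the per-node install concurrently and collect success/failure
# plus a log tail for each node, for debugging failed installs.
from concurrent.futures import ThreadPoolExecutor

def deploy(hosts, install_on, max_workers=20):
    """Run install_on(host) -> (ok: bool, log: str) across hosts in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(install_on, h): h for h in hosts}
        for fut, host in futures.items():
            ok, log = fut.result()
            # Keep only a log tail for failures, to assist debugging.
            results[host] = {"ok": ok, "log": None if ok else log[-2000:]}
    return results
```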
25. Deployment and Installation (Smoke Test)
After successful install:
• Ambari provides “orchestration” to start-up distributed services
in dependency order
• Puppet “kicks” are used to (mostly) reliably start and stop
service processes on individual nodes
• After each distributed service is started, a smoke test is run and
the results are reported
• Each component is smoke-tested before its dependent components
After a successful smoke test, you can be confident that your
selected components have been successfully installed and
started, and are running correctly.
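The dependency-ordered start-up amounts to a topological sort over a service dependency map. A sketch, with a map that mirrors the stack components listed earlier (cycle detection is omitted for brevity, since the stack's dependencies are acyclic):

```python
# Sketch: compute a start-up order in which every service's dependencies
# come before it.
def start_order(deps):
    """Topologically sort services; deps: service -> services it depends on."""
    order, seen = [], set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for d in deps.get(svc, []):
            visit(d)          # start dependencies first
        order.append(svc)

    for svc in deps:
        visit(svc)
    return order

# Illustrative dependencies among the installable stack components.
STACK_DEPS = {
    "HDFS": [],
    "MapReduce": ["HDFS"],
    "ZooKeeper": [],
    "HBase": ["HDFS", "ZooKeeper"],
    "Hive": ["MapReduce", "ZooKeeper"],
}
```

Each service in the resulting order is started and smoke-tested before anything that depends on it.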
26. Going forward
• Multiple OS support
– RHEL6/CentOS6
– Ubuntu and Debian
– SUSE/SLES
– Windows
• Hadoop Security support, including secure install for all
components
• HA support
• Hadoop 2.0 support
• Improved GUI user interface
• Integration: Provide CLI commands for invoking Puppet scripts,
and Web APIs where appropriate
• Etc.
29. Ambari Monitoring
• Basic Monitoring capabilities for Hadoop Cluster Services
– Up/Down status for installed Hadoop services
– Key Alerts configured for health, performance and usage monitoring of
Hadoop services
– Consolidated summary information for Hadoop Services (HDFS, M/R & HBase)
– Key service metrics graphs for temporal analysis of service performance, utilization,
and health (plus system metrics: CPU, memory, network, etc.)
• Efficient collection and visualization of monitoring metrics
– Lightweight alert condition checks (mostly over the network) for better scalability
• Leverage Open Source monitoring systems such as Nagios & Ganglia
– Nagios - for Alert Monitoring
– Ganglia/RRDTool for Hadoop metrics graphs
• Simple and Intuitive UI to monitor the Hadoop cluster status
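A “lightweight check over the network” in the Nagios style can be as simple as a TCP connect to a daemon's port, returning standard Nagios plugin exit codes. This is an illustration, not one of Ambari's actual checks:

```python
# Sketch: Nagios-style up/down check via a plain TCP connect, without
# touching the remote daemon's internals.
import socket

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes

def check_tcp(host, port, timeout=5.0):
    """Return (status, message) for a single service endpoint."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"{host}:{port} is accepting connections"
    except OSError as e:
        return CRITICAL, f"{host}:{port} unreachable: {e}"
```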
33. Going forward
• Rapid iterations with the Ambari open-source community to add more
monitoring capabilities, e.g.:
– More service alerts, summary stats & reporting for the Hadoop services
– Queue/job-level monitoring & diagnostic reporting for M/R
– Improved visualization of service metrics graphs & reports
– Ability to customize the dashboard with relevant graphs, alerts and service information
• RESTful APIs for Hadoop Monitoring
– For integration with Enterprise and Cloud Management Systems, and
with “powered by Ambari” products
– CLIs
• Ability to integrate with third party monitoring tools in place of Nagios &
Ganglia
• Best practices, tips and guidelines for using Monitoring dashboard for
identifying and debugging common cluster problems
35. Management
• “Management” can include many different
post-install activities with Hadoop clusters
• Ambari currently supports only a small set:
– Start / Stop individual services
(dependent services will automatically be stopped as well)
– Change configuration parameters for a service
(data directory paths cannot currently be changed)
– Add nodes to the Cluster
(decommissioning nodes is currently a manual process)
– Uninstall the Cluster
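The rule that dependent services are stopped automatically amounts to a transitive closure over an inverted dependency map. A sketch, not the actual Ambari implementation:

```python
# Sketch: given "service -> services it depends on", compute everything
# that must stop when one service is stopped.
def stop_set(service, deps):
    """Return every service that must stop when `service` stops."""
    # Invert the map: who depends on whom.
    dependents = {}
    for svc, needed in deps.items():
        for d in needed:
            dependents.setdefault(d, []).append(svc)
    # Walk the dependents transitively.
    to_stop, stack = {service}, [service]
    while stack:
        for child in dependents.get(stack.pop(), []):
            if child not in to_stop:
                to_stop.add(child)
                stack.append(child)
    return to_stop
```

Stopping HDFS, for example, pulls in MapReduce, HBase, and anything layered on those; stopping a leaf service like Hive affects only itself.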
39. Going forward
• Lots more management actions supported
– Security and user management
– HA alerting and recovery
– Extensions of current functionalities
– Etc.
• Integration: RESTful APIs / web services for integration with
established management tools in the data center
• Improved GUI user interface
40. Invitation
• Deployment, Monitoring, and Management – this is
just the first generation!
• If you are interested in these functionalities and want
to participate in an Apache open-source project,
please consider becoming a contributor to the
AMBARI (incubating) project!
• http://incubator.apache.org/ambari/mail-lists.html
41. Thank you.