3. What is Apache Ambari?
Apache Ambari is the open-source platform to
provision, manage and monitor Hadoop clusters
4. New Enterprise Features
Ambari 2.4
• New Services: Log Search, Zeppelin, Hive LLAP
• Role Based Access Control
• Management Packs
• Grafana UI for Ambari Metrics System
• New Views: Zeppelin, Storm
7. Deploy On-Premises
The Ambari UI wizard handles all of these
combinations and makes recommendations
based on host specs.
8. Deploy in the Cloud
• Certified environments
• Sysprepped VMs
• Hundreds of similar clusters
9. Deploy with Blueprints
• Systematic way of defining a cluster
• Export existing cluster into blueprint
GET /api/v1/clusters/:clusterName?format=blueprint
(Diagram: Configs, Topology, Hosts, Cluster)
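Exporting a cluster and installing from a blueprint both go through the REST API: one call registers the blueprint, a second creates the cluster from it. Below is a minimal sketch that only builds the requests and does not send them. The server address, blueprint, cluster, and host names are hypothetical. The component list is trimmed for brevity, so a real HDP blueprint would need more components to pass Ambari's validation. Ambari also expects an `X-Requested-By` header on POST requests.

```python
import json
from urllib.parse import quote

# Hypothetical server address and names, for illustration only.
AMBARI = "http://ambari.example.com:8080"
BLUEPRINT_NAME = "my-blueprint"
CLUSTER_NAME = "my-cluster"


def export_blueprint_url(cluster_name: str) -> str:
    """URL to export an existing cluster as a blueprint (sent as an HTTP GET)."""
    return f"{AMBARI}/api/v1/clusters/{quote(cluster_name)}?format=blueprint"


# Call 1 body: the blueprint holds configs, topology (host groups) and stack version.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.4"},
    "configurations": [
        {"hdfs-site": {"properties": {"dfs.replication": "3"}}},
    ],
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker", "cardinality": "3",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

# Call 2 body: the cluster creation template maps concrete hosts onto host groups.
cluster_template = {
    "blueprint": BLUEPRINT_NAME,
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "worker",
         "hosts": [{"fqdn": f"worker{i}.example.com"} for i in range(1, 4)]},
    ],
}

# (method, URL, JSON body) for the two install calls; nothing is sent here.
install_calls = [
    ("POST", f"{AMBARI}/api/v1/blueprints/{BLUEPRINT_NAME}", json.dumps(blueprint)),
    ("POST", f"{AMBARI}/api/v1/clusters/{CLUSTER_NAME}", json.dumps(cluster_template)),
]
```

Keeping configs and topology in the blueprint, and host assignment in the second call, is what lets the same blueprint stamp out many similar clusters.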
14. Blueprints for Large Scale
• Kerberos, secure out-of-the-box
• High Availability is set up initially for
NameNode, YARN, Hive, Oozie, etc.
• Host Discovery allows Ambari to
automatically install services on a host
when it comes online
• Stack Advisor recommendations
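Host Discovery can be pictured as matching each newly registered host against the host groups that still need hosts. The sketch below is a simplified, hypothetical model of that idea, not Ambari's implementation. The `HostGroupRequest` class, the attribute-equality predicate, and the first-match rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HostGroupRequest:
    name: str
    host_count: int  # how many hosts this group needs in total
    predicate: Dict[str, object] = field(default_factory=dict)  # hypothetical attribute filter
    assigned: List[str] = field(default_factory=list)


def matches(host: Dict[str, object], predicate: Dict[str, object]) -> bool:
    # Every attribute named in the predicate must equal the host's value.
    return all(host.get(key) == value for key, value in predicate.items())


def on_host_registered(host: Dict[str, object], groups: List[HostGroupRequest]) -> Optional[str]:
    """Place a newly registered host into the first unfilled group it matches."""
    for group in groups:
        if len(group.assigned) < group.host_count and matches(host, group.predicate):
            group.assigned.append(str(host["fqdn"]))
            return group.name  # a real server would now install and start components
    return None  # no pending request wants this host


groups = [
    HostGroupRequest("master", 1, {"cpu_count": 16}),
    HostGroupRequest("worker", 2),
]
arrivals = [
    {"fqdn": "h1", "cpu_count": 4},
    {"fqdn": "h2", "cpu_count": 16},
    {"fqdn": "h3", "cpu_count": 4},
    {"fqdn": "h4", "cpu_count": 4},
]
placements = {h["fqdn"]: on_host_registered(h, groups) for h in arrivals}
```

Here h2 lands in "master" because it is the only 16-CPU host, h1 and h3 fill "worker", and h4 is left unassigned once both groups are full.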
16. Kerberos
Available since Ambari 2.0
• Ambari manages Kerberos principals and keytabs
• Works with existing MIT KDC or Active Directory
• Once Kerberized, Ambari handles:
o Adding hosts
o Adding components to existing hosts
o Adding services
o Moving components to different hosts
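Adding a host to a Kerberized cluster means creating per-host service principals and keytabs for the components placed on it. Below is a minimal sketch of that bookkeeping, assuming the Hadoop `service/_HOST@REALM` convention, where `_HOST` becomes the lower-cased FQDN. The descriptor table is a hypothetical subset. The keytab paths are typical HDP defaults, not values read from Ambari.

```python
from typing import Dict, List, Tuple

# Hypothetical subset of per-component identity descriptors:
# component -> (principal pattern, keytab path).
DESCRIPTORS: Dict[str, Tuple[str, str]] = {
    "DATANODE": ("dn/_HOST@${realm}", "/etc/security/keytabs/dn.service.keytab"),
    "NODEMANAGER": ("nm/_HOST@${realm}", "/etc/security/keytabs/nm.service.keytab"),
}


def expand_principal(pattern: str, fqdn: str, realm: str) -> str:
    # Substitute the realm, then replace _HOST with the lower-cased host name.
    return pattern.replace("${realm}", realm).replace("_HOST", fqdn.lower())


def identities_for_new_host(fqdn: str, components: List[str], realm: str) -> List[Tuple[str, str]]:
    """(principal, keytab path) pairs to create when components land on a new host."""
    return [
        (expand_principal(DESCRIPTORS[c][0], fqdn, realm), DESCRIPTORS[c][1])
        for c in components
    ]


new_ids = identities_for_new_host("Worker7.Example.com", ["DATANODE", "NODEMANAGER"], "EXAMPLE.COM")
```

Moving a component to a different host is the same computation for the target host, plus cleanup of the old host's identities.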
17. Management Packs - Motivation
• Release Management
o Ambari core and stacks released together
o Stack changes require Ambari release
o Decouple stack and Ambari core releases
• Add-on Services
o Release vehicle for 3rd party services
o Self contained release artifacts
19. Management Packs
• Generalized release artifact for stacks, add-on
services, views, etc.
• Decouples stack releases from Ambari core
release
• Tarballs with metadata for applicability and
content
• Stack is an overlay of multiple management
packs
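The "metadata for applicability" and "stack as an overlay" ideas can be illustrated with a toy model. Real packs are tarballs installed with `ambari-server install-mpack --mpack=<path>`. The dictionary fields below (`min_ambari_version`, `services`) are hypothetical stand-ins, not Ambari's actual mpack schema. This is a minimal sketch assuming later packs override earlier ones.

```python
from typing import Dict, List, Tuple

# Hypothetical pack metadata; field names are illustrative, not Ambari's schema.
MPACKS: List[dict] = [
    {"name": "base-stack", "min_ambari_version": "2.4.0",
     "services": {"HDFS": "2.7.1", "YARN": "2.7.1"}},
    {"name": "addon-foo", "min_ambari_version": "2.4.0",
     "services": {"FOO": "1.0"}},
    {"name": "hdfs-update", "min_ambari_version": "2.5.0",
     "services": {"HDFS": "2.7.3"}},
]


def version_tuple(version: str) -> Tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def applicable(mpack: dict, ambari_version: str) -> bool:
    # Applicability metadata: skip packs that need a newer Ambari core.
    return version_tuple(ambari_version) >= version_tuple(mpack["min_ambari_version"])


def overlay(mpacks: List[dict], ambari_version: str) -> Dict[str, str]:
    """Stack view = applicable packs layered in order; later packs override earlier ones."""
    stack: Dict[str, str] = {}
    for mpack in mpacks:
        if applicable(mpack, ambari_version):
            stack.update(mpack["services"])
    return stack
```

On Ambari 2.4.0 the third pack is skipped, so HDFS stays at 2.7.1. On 2.5.0 it applies and overrides HDFS, without a new release of the base stack pack.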
21. Management Pack++
Short Term Goals (Ambari 2.4)
• Retrofit in Stack Processing Framework
• Enable 3rd party to ship add-on services
• Command line support
Long Term Goals (Future)
• Management Pack Framework
• Deliver Views
• REST API support
22. Role Based Access Control (RBAC)
As Ambari & organizations grow,
so do security needs
Ambari integrates with external
authentication systems & LDAP
23. RBAC Terms
• Roles have permissions,
e.g., add services to a cluster
• Roles are applied to resources,
e.g., Ambari, a particular cluster, a particular view
• Users belong to groups
• A group has a role
• Users can also have additional roles
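The terms above combine into a lookup: a user's permissions on a resource come from the role their group holds there plus any roles granted to the user directly. The sketch below assumes these sources are unioned. The permission names and all user, group, and resource names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical permission names and assignments, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "Cluster Operator": {"view", "operate_services", "modify_configs", "add_host"},
    "Read-Only": {"view"},
}
USER_GROUPS: Dict[str, Set[str]] = {"alice": {"ops"}, "bob": set()}
# A group has a role on a resource (a cluster, a view, or Ambari itself).
GROUP_ROLES: Dict[str, Dict[str, str]] = {"ops": {"cluster:prod": "Cluster Operator"}}
# Users can also hold additional roles directly.
USER_ROLES: Dict[str, Dict[str, Set[str]]] = {
    "alice": {"view:zeppelin": {"Read-Only"}},
    "bob": {"cluster:prod": {"Read-Only"}},
}


def effective_permissions(user: str, resource: str) -> Set[str]:
    """Union of permissions from the user's own roles and their groups' roles."""
    roles = set(USER_ROLES.get(user, {}).get(resource, set()))
    for group in USER_GROUPS.get(user, set()):
        role = GROUP_ROLES.get(group, {}).get(resource)
        if role is not None:
            roles.add(role)
    permissions: Set[str] = set()
    for role in roles:
        permissions |= ROLE_PERMISSIONS[role]
    return permissions
```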
24. New RBAC Roles
Each role has the permissions of the role above it, with the listed exceptions:
• Ambari Admin: all permissions
• Cluster Admin: all except manage permissions
• Cluster Op: except add services, Kerberos, manage Alerts, & upgrades
• Service Admin: except alter cluster topology or install components
• Service Op: except change configs
• Read-Only: only view
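The role list reads as a ladder: start from every permission and remove the listed exceptions at each step down. The sketch below derives each role's permission set that way. The coarse permission names are hypothetical placeholders, since Ambari's real authorizations are finer-grained.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical coarse permission names, for illustration only.
ALL_PERMISSIONS: Set[str] = {
    "manage_permissions", "add_service", "toggle_kerberos", "manage_alerts",
    "upgrade_stack", "alter_topology", "install_component",
    "modify_configs", "operate_services", "view",
}

# Top to bottom: each role loses these permissions relative to the role above it.
HIERARCHY: List[Tuple[str, Set[str]]] = [
    ("Ambari Admin", set()),
    ("Cluster Admin", {"manage_permissions"}),
    ("Cluster Op", {"add_service", "toggle_kerberos", "manage_alerts", "upgrade_stack"}),
    ("Service Admin", {"alter_topology", "install_component"}),
    ("Service Op", {"modify_configs"}),
]


def build_roles() -> Dict[str, FrozenSet[str]]:
    roles: Dict[str, FrozenSet[str]] = {}
    current = set(ALL_PERMISSIONS)
    for name, removed in HIERARCHY:
        current -= removed
        roles[name] = frozenset(current)
    roles["Read-Only"] = frozenset({"view"})  # only view
    return roles


ROLES = build_roles()
```

Deriving each role from the one above keeps the ladder consistent: every role is automatically a superset of the roles below it.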
26. Background: Upgrade Terminology
• Manual Upgrade: the user follows instructions to upgrade the stack. Incurs downtime.
• Rolling Upgrade: automated. Upgrades one component per host at a time. Preserves cluster operation and minimizes service impact.
• Express Upgrade: automated. Runs in parallel across hosts. Incurs downtime.
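The speed difference between Rolling and Express is mostly arithmetic. The Editor's Notes say a Rolling step can take up to 1 minute per component per host, and that Express works in batches of 100 hosts. Below is a rough upper-bound sketch. It assumes every host runs the same number of upgradable components and that an Express batch takes about as long as one Rolling step. That second point is an assumption, and the sketch ignores service checks and restart ordering.

```python
import math


def rolling_minutes(hosts: int, components: int, minutes_per_step: float = 1.0) -> float:
    """Rolling: one component on one host at a time, so every step runs serially."""
    return hosts * components * minutes_per_step


def express_minutes(hosts: int, components: int, batch_size: int = 100,
                    minutes_per_batch: float = 1.0) -> float:
    """Express: each component is upgraded on up to batch_size hosts in parallel."""
    return components * math.ceil(hosts / batch_size) * minutes_per_batch
```

For 100 hosts with 5 components each, this gives 500 serial steps for Rolling versus 5 batches for Express. Express trades that speed for downtime.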
28. Automated Upgrade: Rolling or Express
• Check Prerequisites: review the prereqs to confirm your cluster configs are ready
• Prepare: take backups of critical cluster metadata
• Register + Install: register the HDP repository and install the target HDP version on the cluster
• Perform Upgrade: perform the HDP upgrade; the steps depend on the upgrade method, Rolling or Express
• Finalize: finalize the upgrade, making the target version the current version
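The automated upgrade steps run in a fixed order: check prerequisites, prepare, register and install the target version, perform the upgrade, and finalize. Only the perform step differs between Rolling and Express. The tracker below is a hypothetical model of that ordering, not Ambari's upgrade orchestration.

```python
from enum import Enum
from typing import List, Optional

STEPS: List[str] = [
    "check_prerequisites", "prepare", "register_and_install",
    "perform_upgrade", "finalize",
]


class Method(Enum):
    ROLLING = "rolling"
    EXPRESS = "express"


class UpgradeWorkflow:
    """Hypothetical tracker that only lets the upgrade steps run in order."""

    def __init__(self, method: Method) -> None:
        self.method = method
        self.done: List[str] = []

    def next_step(self) -> Optional[str]:
        return STEPS[len(self.done)] if len(self.done) < len(STEPS) else None

    def run(self, step: str) -> str:
        expected = self.next_step()
        if step != expected:
            raise RuntimeError(f"cannot run {step!r}; next step is {expected!r}")
        self.done.append(step)
        # Only the perform step depends on the upgrade method.
        if step == "perform_upgrade":
            return f"{step} ({self.method.value})"
        return step
```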
33. Future of Ambari
• Cloud features
• Multiple instances of same service at different
versions, e.g., Spark 1.6 and Spark 2.0
• YARN assemblies
• Component & Patch Upgrades: upgrade
individual components within the same stack
version, e.g., just the DataNode (DN) and ResourceManager (RM) in HDP 2.4.*.*,
with zero downtime
Editor's Notes
Log Search: Solr, Logfeeder (similar to Logstash), and Grafana UI
Zeppelin for data exploration and visualization that can plug in to multiple data backends
Role Based Access Control
Ambari 2.3.0 was not used as a release version
Ambari 2.4.0 is slated to ship with many new features
The cadence is 2-3 major releases per year, with follow-up maintenance releases in the months after.
Deploy: Blueprints with Host Discovery
Secure: Kerberos, LDAP sync
Smart Configs: Stack Advisor, since it is painful to configure a thousand related knobs by hand
Monitor: Ambari Alerts, Ambari Metrics
Upgrade: Rolling and Express Upgrade, get patches
Analyze, Scale, Extend: Views, Management Packs
Cloudbreak can install on Amazon EC2, MSFT Azure,
Cluster install takes 5-10 mins, mostly downloading packages, installing bits, and starting services.
Used by HDInsight (Microsoft Azure) and Hortonworks QA
Allow cluster creation or scaling to be started via the REST API before any or all hosts are available. As hosts register with the Ambari server, they will be matched to the requested host groups and provisioned according to the requested topology.
Allow host predicates to be specified along with a host count to provide more flexibility in matching hosts to host groups. This will allow for host flavors, where different host groups are matched to different host flavors.
Break up the current monolithic provisioning request into a request for each host operation, for example: install on host A, start on host A, install on host B, etc. This will allow hosts to make progress even when another host encounters a failure.
Allow a host count to be specified in the cluster creation template instead of host names. This is documented in https://issues.apache.org/jira/browse/AMBARI-6275
Install a cluster with two API calls
The blueprint contains the configs, the assignment of topology to host groups, and the stack version.
The cluster creation request actually assigns hosts to each host group.
Dynamic availability
Allow host_count to be specified instead of host_names. As hosts register, they will be matched to the requested host groups and provisioned according to the requested topology.
When specifying a host_count, a predicate can also be specified for finer-grained control
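In the cluster creation template, host_count (with an optional host_predicate) replaces explicit host names per host group. Below is a minimal sketch that builds such a body. The blueprint name, group names, counts, and predicate value are hypothetical. The predicate string follows the style of Ambari's query predicates, but check it against the Blueprints documentation before relying on it.

```python
import json
from typing import Dict, Optional, Tuple


def cluster_template(blueprint: str,
                     groups: Dict[str, Tuple[int, Optional[str]]]) -> dict:
    """Build a cluster creation body that asks for host counts instead of host names."""
    host_groups = []
    for name, (count, predicate) in groups.items():
        entry: dict = {"name": name, "host_count": count}
        if predicate:  # optional finer-grained filter on registering hosts
            entry["host_predicate"] = predicate
        host_groups.append(entry)
    return {"blueprint": blueprint, "host_groups": host_groups}


template = cluster_template("my-blueprint", {
    "master": (1, "Hosts/os_type=centos6&Hosts/cpu_count=16"),
    "worker": (50, None),
})
body = json.dumps(template, indent=2)
```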
3 terabytes, since the units are in MB
As Ambari grows and organizations grow, so do security needs
Users have fine-grained roles over the cluster and individual views.
Granular authorization checks to distribute the responsibilities and privileges of authenticated users
Express Upgrade: fastest method to upgrade the stack, since it upgrades an entire component in batches of 100 hosts at a time
Rolling Upgrade: one component at a time per host, which can take up to 1 min. For a 100-node cluster with
This Grafana instance is specifically for AMS, not meant to be general-purpose
If customer is already using Grafana, this is not a replacement.
Grafana will support read-only access for anonymous users, and HTTPS