1. Kerberos and Health Checks and Bare
Metal, Oh My!
Updates to OpenStack Sahara in Newton
Vitaly Gridnev, Sahara PTL (Mirantis)
Elise Gafford, Sahara Core (Red Hat)
Nikita Konovalov, Sahara Core (Mirantis)
2. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
4. Sahara: The Use Cases
● Data Processing Cluster Management
○ On-demand, scalable, configurable, persistent clusters
○ Supports multiple plugins (Apache, Ambari, CDH, MapR...)
○ Integrates with Heat, Glance, Nova, Neutron, and Cinder
● EDP (Elastic Data Processing)
○ Supports multiple job types (Java, MR, Hive, Pig, Spark, Storm...)
○ Supports transient clusters (spin up, process, shut down) or
persistent clusters
○ Integrates with Swift and/or Manila (optionally)
6. Sahara: The Project
● Cluster provisioning plugins:
○ Cloudera Distribution of Hadoop (using Cloudera Manager)
○ Hortonworks Data Platform (using Apache Ambari)
○ MapR
○ “Vanilla” Apache Hadoop, Spark, and Storm
● EDP job types:
○ MapReduce, Java, Hive, and Pig jobs (using Apache Oozie)
○ Spark, Spark Streaming, and Storm jobs (using Apache Spark and Apache Storm)
● Image packing repository (sahara-image-elements)
● Framework to validate Sahara installation (sahara-tests)
● UI plugin
● OpenStackClient plugin
7. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
8. Event log for clusters
● Cluster provisioning events make it possible to see
the current status of cluster provisioning, or the
reasons for a failure
● Available since Newton for clusters created using
Ambari
● Supported in the CLI since Newton, with a full dump
of all steps and events
11. Health checks for clusters
● Users want to monitor cluster state after
provisioning: vital for long-lived clusters
● Sahara in Liberty had no monitoring of the health of
cluster processes: a cluster could be broken or
unavailable while Sahara still reported it as
ACTIVE
12. Health checks for clusters
● Cluster health checks have been available since
Mitaka
● Available for clusters deployed using Ambari and
Cloudera Manager, with more limited coverage for
vanilla clusters
● Since Newton, checks are also available for the MapR
plugin
● Health results can be configured to notify Ceilometer
● Health is easy to recheck on demand
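As an illustration only (this is not Sahara's actual health-check API), the model can be thought of as a set of per-cluster checks that each report a colour-coded result and can be re-run at any time; all names below are hypothetical:

```python
# Illustrative sketch only -- not Sahara's health-check implementation.
# Each check reports a colour-coded status; a crashing check is RED.
from dataclasses import dataclass
from typing import Callable, List

GREEN, YELLOW, RED = "GREEN", "YELLOW", "RED"

@dataclass
class HealthResult:
    check_name: str
    status: str       # GREEN / YELLOW / RED
    description: str

def run_checks(checks: List[Callable[[], HealthResult]]) -> List[HealthResult]:
    """Run every registered check; rechecking is just calling this again."""
    results = []
    for check in checks:
        try:
            results.append(check())
        except Exception as exc:
            results.append(HealthResult(getattr(check, "__name__", "?"), RED, str(exc)))
    return results

def hdfs_capacity_check(used_ratio: float = 0.42) -> HealthResult:
    # A real check would query the NameNode; the ratio here is a stub.
    if used_ratio < 0.8:
        return HealthResult("hdfs_capacity", GREEN, "HDFS usage OK")
    return HealthResult("hdfs_capacity", YELLOW, "HDFS almost full")

print(run_checks([hdfs_capacity_check])[0].status)  # GREEN
```

Because checks are plain callables, rechecking health on demand is just re-running the list, which matches the "easy to recheck" behaviour described above.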
16. Health checks for clusters
Next steps:
● More detailed health checks
○ Detecting a particular datanode/slave failure
○ Detecting insufficient space in HDFS
● Suggestions/actions to repair health:
○ Datanode replacement
○ Adding new nodes
○ Restarting services
● More flexible configuration of health checks (advanced health
checks, enabling or disabling individual checks)
17. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
18. Security improvements
● Security is an important part of created clusters
● Previously, security could be enabled only by
configuring Ambari or Cloudera Manager directly;
Sahara was then unable to perform the required
auth operations, and EDP did not work
● Security matters not just for clusters, but for
Sahara itself
19. Security improvements
In Newton the following Kerberos security features were implemented:
● An MIT KDC can be preconfigured (or an existing KDC can be used)
● The Oozie client was re-implemented to support auth operations with Kerberos
● Spark job execution is also supported
● Keys are distributed to nodes for system users (hdfs, hadoop, spark)
● Supported for clusters deployed using Ambari and Cloudera Manager
● Note: make sure the latest hadoop-swift JARs are in place for Swift data sources!
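Kerberized clusters rely on a standard krb5.conf pointing at the KDC. A hypothetical fragment (the realm and host names are placeholders, not values Sahara generates):

```ini
# Hypothetical krb5.conf fragment; realm and hosts are placeholders.
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```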
21. Security improvements
● Bandit tests per commit
● Improved secret storage
(using Barbican and
Castellan) was implemented
in the previous release
22. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
23. Where we were
Sahara had 2 flows that were relevant to image manipulation:
● Pre-Nova spawn image packing
○ Used sahara-image-elements repository to generate images (to store in Glance)
● Post-Nova spawn cluster generation from “clean” (OS-only) images
○ Logic maintained in Sahara process within plugins
● Pre-Configuration validation of images by plugins
○ Remember how I said we had 2 flows relevant to image manipulation?
○ We didn’t do this at all.
24. Where We Were: Problems
● Duplication of logic
○ Steps required for packing images and for "clean" image clusters were often
identical, but had to be expressed separately (in DIB and in Python).
● Poor validation
○ Plugins did not validate that images provided to them met their needs.
○ Failures due to image contents surfaced late and were sometimes difficult to
understand.
● Poor encapsulation
○ Image generation and cluster provisioning logic for any one plugin are really one
application
○ Maintaining them in two places invites version skew and dependency problems
○ Having one monolithic repo for all plugins makes them less pluggable
25. Our Dream Implementation
● All flows share common logic:
○ Image packing
○ Image validation
○ Clean image cluster gen
● Image manipulation is stored and versioned within plugins
● The user can still generate images with a CLI...
● But they can also use an API to generate images in clean build environments
● ... And both dev test cycles and user retries are as quick and painless as
possible
26. The plan
1. Build a validation engine that ensures that images meet a specification
a. YAML-based spec definition
2. Extend that engine to optionally modify images to spec
3. Build a CLI to expose this functionality
4. Create and test specifications for each plugin to support this method
5. Deprecate sahara-image-elements (only when this method proves stable)
6. Build an API to:
a. Spawn a clean tenant-plane image build environment
b. Download a base image from Glance and modify it to spec
c. Push the new image back to Glance and register it for use by Sahara
28. What it looks like: the specs
● YAML-based definitions
● Argument definitions for
configurability
● Idempotent resource
declarations
○ Scripts must be written
idempotently, just like the
resource declarations
● Logical control operators (any,
all, os_case, etc.)
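The shape of such a spec might look like the following hypothetical sketch; the argument names, package names, and script path are illustrative, not the exact schema:

```yaml
# Hypothetical sketch of a plugin image spec; names are illustrative.
arguments:
  java-distro:
    description: Which Java distribution to install.
    default: openjdk
    choices:
      - openjdk
      - oracle-java

validators:
  - os_case:
      - redhat:
          - package: java-1.8.0-openjdk-devel
      - ubuntu:
          - package: openjdk-8-jdk
  - script: install_hadoop.sh   # must be written idempotently
```

Declarative elements such as `package` are idempotent by nature, while `os_case` shows the kind of logical control operator mentioned above.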
29. What it looks like: the CLI
Command structure:
sahara-image-pack --image ./image.qcow2
PLUGIN VERSION [plugin arguments]
Features:
● Auto-generates help text from arguments
● Idempotent and modifies images in-place
○ Very fast test cycles and retries
● Allows freeform bash scripts and more
structured resources
○ Though it’s on you to make your scripts
idempotent
● Test-only mode to validate without change
30. What it’s doing
The images module runs a sequence
of steps against a remote machine
● Validation uses the Sahara SSH remote in
read-only mode
● Clean image gen uses the SSH remote
● Image packing uses a libguestfs Python
API image handle
All three use the same logic,
contained in the appropriate plugin
Plugin implementations targeting Ocata!
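A minimal sketch, assuming nothing about Sahara's actual code, of the shared-logic idea: one step sequence runs against interchangeable back-ends, with validation using a read-only handle:

```python
# Sketch only: one step sequence, three interchangeable back-ends
# (validation = read-only SSH, cluster gen = SSH, packing = libguestfs).
class RecordingRemote:
    """Stand-in for an SSH or libguestfs handle; records commands run."""
    def __init__(self, read_only=False):
        self.read_only = read_only
        self.log = []

    def execute(self, command):
        # In validate-only mode, refuse anything that is not a check.
        if self.read_only and not command.startswith("test "):
            raise RuntimeError("write attempted in validate-only mode")
        self.log.append(command)

def run_steps(remote, steps):
    """The shared logic: the same steps, whatever the back-end."""
    for step in steps:
        remote.execute(step)

steps = ["test -x /usr/bin/java"]            # a validation-safe step
packer = RecordingRemote()                   # image-packing style handle
validator = RecordingRemote(read_only=True)  # validation in read-only mode
run_steps(packer, steps)
run_steps(validator, steps)
print(validator.log)  # ['test -x /usr/bin/java']
```

The point of the design is that only the handle differs; validation, clean-image cluster generation, and image packing all reuse the same step definitions.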
31. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
32. Ironic integration
Why you should run Bare Metal in OpenStack:
● Big Data workloads traditionally originate from bare-metal installations
● Quick cluster scalability may have lower priority than long-running stability
and persistence
● Best performance by design: no virtualization overhead
● The ability to manage a bare-metal cluster with the OpenStack API
33. Bare Metal compared to Virtualized
● Cluster size flexibility: bare metal (Ironic) dedicates nodes
completely; virtual machines use flavor-based scheduling
● Resource utilization: a bare-metal host is 100% utilized; with VMs,
KVM has memory overhead and other VMs may abuse the host's
resources
● Data locality: on bare metal, data is accessible directly from the
local disks; with VMs, locality may be achieved by proper resource
scheduling
● Live migration: on bare metal a host may be lost completely; for
VMs it is supported for some target daemons
34. Some tips before running Bare Metal
● Scheduling is not trivial: the cloud operator may need to define additional
flavors, availability zones, or other metadata
● Storage is not backed by Cinder for bare metal
○ Sahara does disk discovery on its own
○ Disks other than the one holding the root mount are dedicated to HDFS
● Non-standard hardware will require drivers built into the provisioning image
● Network tenant isolation is achievable through manual hardware switch
configuration
35. Agenda
1. Sahara overview
2. Health checks and management improvements
3. Kerberos integration for clusters
4. Image generation improvements
5. Bare metal clusters
6. What is NEW in NEWton
7. Q&A
36. What is NEW in NEWton
● Designate integration
● API improvements: pagination for list operations, an API to
manage/enable/disable plugins
● New plugin versions
○ HDP 2.4 supported
○ MapR 5.2.0
○ CDH 5.7.x
○ Vanilla + Spark on YARN
37. What is NEW in Newton
● Sahara tests framework to validate
environment readiness for Sahara's
clusters
○ Sahara Tempest plugin with more tests (CLI,
API)
○ Sahara scenario framework with a bunch of
templates
○ Published on PyPI:
https://pypi.python.org/pypi/sahara-tests