© Hortonworks Inc. 2013
Hadoop + OpenStack Integration Roadmap
Himanshu Bari, Sr. Product Manager
hbari@hortonworks.com
June 28th, 2013
Disclaimer
•  This document may contain product features and technology directions
that are under development or may be under development in the future.
•  Technical feasibility, market demand, user feedback, and the Apache
Software Foundation community development process can all affect
timing and final delivery.
•  This document’s description of these features and technology
directions does not represent a contractual commitment from
Hortonworks to deliver these features in any generally available
product.
•  Product features and technology directions are subject to change, and
must not be included in contracts, purchase orders, or sales
agreements of any kind.
Agenda
-  Why Hadoop on OpenStack
-  Use cases
-  A bit under the hood
Big Data & Cloud Intersection Point → 2013
Big Data & Cloud are top priorities for CIOs
OpenStack is an open source cloud management platform
-  Glance: Image Service
-  Keystone: Identity Service
-  Horizon
-  Quantum
-  Nova
-  Cinder: Block Store
-  Swift: Object Store
-  Ceilometer: Metering
-  Heat: Orchestration
Integrated
Multi-hypervisor & guest OS support
(Apache License)
OpenStack has overtaken Amazon AWS in market awareness…
Source: Google Trends
Maturing quickly with broad support
•  Pushed by 150+ vendors
•  Millions of dollars in venture capital
•  Early adoption across all verticals
Why Hadoop & OpenStack?
Hadoop provides a greenfield
use case
•  Net new workload
•  Needs scale out
infrastructure
•  Shared platform
OpenStack provides the perfect
cloud platform
•  Operational agility
•  Supports scale out architecture
•  Deployment choice across
public & private clouds
1.  Open source communities provide the fastest path to innovation
2.  Open source is changing the game as economics and accessibility serve to
accelerate cloud & big data market trends
3.  Both are attracting major ecosystem players: IBM, RHT, HP, RAX, etc…
Marries two of the largest open source movements
Accelerate Adoption of Hadoop on OpenStack
The leading contributor
to Apache Hadoop
The leading system
integrator for OpenStack
The leading contributor
to OpenStack
Apache Hadoop…
The killer app for OpenStack
Collaborating on Project Savanna
[Diagram: Savanna (Elastic Hadoop Controller) sits on OpenStack Infrastructure next to a Hadoop Cluster of nodes (N), Swift storage and Ambari (Hadoop management). Numbered callouts 1 to 3 are listed below.]
1.  Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
2.  HDFS-Swift connectors: move data between HDFS and Swift object storage
3.  Simplified elasticity
Project Savanna
Automate deployment of
Apache Hadoop on
OpenStack
Agenda
-  Why Hadoop on OpenStack
-  Use cases
-  A bit under the hood
Focus on API driven tight integration
Hide Hadoop complexity
through APIs
“It Just Works” experience
Fully leverage virtualization
Scalability, Reliability,
Performance
Project Savanna
design Goals
Problems driving use cases
Business units: Finance, Compliance, IT, Marketing
Data types: Web, Mobile, Sensor
Analytics use cases: Interactive, Batch
Environments: Dev, QA, Prod
Operational nightmare of
supporting multiple cluster flavors
Lack of agility
Underutilized
resources
Maintenance
complications
Cluster requirements vary by business unit,
data type & analytics use case
Can’t migrate from public to private cloud
Provisioning related use cases
-  Frequent dev/test/staging cluster provision requests
-  Migrations from staging to prod and vice versa
-  Reduce operator error in cluster provisioning
-  Migrate away from Amazon EMR for Ad hoc analytics
requests to support experimentation
Simplified provisioning (Phase-1, Phase-2)
Template based provisioning
-  Use as is: single click provisioning
-  Modify: update VM resource allocation, service to VM mapping and service config
-  Provision and/or save template
-  Cluster template, e.g. QA cluster
-  Node template
   a. Resource based: node.Large
   b. Function based: node.NameNode
Hadoop as a service (job flow based provisioning)
-  Pick job type: Cascading, streaming & custom jar
-  Upload data to Swift
-  Get results in Swift
Ambari embedded in Horizon
Swift object store support
Phase-1: Read/write data from/to Swift object stores
-  Option-1: Copy data from Swift to HDFS, run MapReduce and copy results back to Swift
-  Option-2: Run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)
Phase-2: Bug fixes & optimizations
Elasticity related use cases
-  Commission a new node or decommission a node for
maintenance
-  For dev/test/staging clusters: automatically vary
cluster data & compute capacity based on tenant,
workload, time of day, resource utilization etc.
-  Automatically vary compute capacity for production
clusters
Elasticity
[Matrix: node elasticity (compute and/or data), manual or rule based, plotted against cluster life, long lived or short lived (Swift or HDFS used for storage). Cells are marked Phase-1 or Phase-2.]
Cell notes:
-  Handle variable workloads, e.g. alter cluster compute node count for peak/off-peak hours
-  Job flow based clusters for ad hoc analysis
-  Best for Dev/QA use
-  Best for predictable workloads
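The rule based case, such as changing the compute node count for peak and off-peak hours, can be sketched as below. This is only an illustration of the idea: the `ScalingRule` class, its fields and the node counts are hypothetical and are not part of Savanna or Ambari.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical rule: peak_nodes inside the peak window, off_peak_nodes outside."""
    peak_start_hour: int  # inclusive, 0-23
    peak_end_hour: int    # exclusive, 0-23
    peak_nodes: int
    off_peak_nodes: int

    def target_nodes(self, hour):
        """Return the desired compute node count for the given hour of day."""
        if self.peak_start_hour <= self.peak_end_hour:
            in_peak = self.peak_start_hour <= hour < self.peak_end_hour
        else:  # window wraps past midnight, e.g. 22 -> 6
            in_peak = hour >= self.peak_start_hour or hour < self.peak_end_hour
        return self.peak_nodes if in_peak else self.off_peak_nodes


def scaling_action(rule, hour, current_nodes):
    """Return how many compute nodes to add (positive) or remove (negative)."""
    return rule.target_nodes(hour) - current_nodes


# Assumed example values: 10 compute nodes from 08:00 to 18:00, 4 otherwise.
rule = ScalingRule(peak_start_hour=8, peak_end_hour=18,
                   peak_nodes=10, off_peak_nodes=4)
```

A controller could evaluate `scaling_action` on a timer and commission or decommission that many compute nodes; data nodes would need more care because removing them moves blocks.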
Multi-tenancy related use cases
-  Improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads
-  Simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool
-  Support varying SLAs based on tenant and workload
through resource isolation provided by VMs
-  Simplify chargeback/showback
Multi-tenancy
Phase-1
•  Access isolation
   -  Single sign-on for Ambari & HUE through Keystone integration
   -  Dedicated Ambari & HUE instance per cluster per tenant
•  Resource isolation
   -  CPU, memory isolation through VMs
   -  Ability to pin a Hadoop VM to a given set of physical hosts to enable per tenant physical host isolation
•  Version isolation
   -  Choice of Hadoop versions for tenants
Phase-2
•  Access isolation
   -  Single Ambari instance per tenant (multi-cluster support with Ambari)
   -  Keystone enhancements to support Hadoop job flow level RBAC to support Hadoop as a service
Agenda
-  Why Hadoop on OpenStack
-  Use cases
-  A bit under the hood
Savanna logical architecture
[Architecture diagram. Labeled components:]
-  Horizon + Savanna UI
-  API
-  Savanna Controller
-  Hadoop Provisioning, Configuration, Elasticity, Orchestration, Plugin manager
-  HDP Savanna plugin
-  Ambari template management
-  Hadoop Cluster: Ambari + API
-  OpenStack Infrastructure: Network, Storage, Security, Compute
Provisioning workflow overview
1.  A cluster request goes from Horizon to the Savanna Controller + HDP OpenStack Plugin
2.  Vanilla VMs are provisioned through Nova and Glance; the VM image is OS only or pre loaded with HDP bits
3.  The HDP plugin installs Ambari
4.  The HDP plugin passes the cluster template to Ambari
5.  Ambari configures all services and starts the cluster
Resulting Hadoop Cluster: Ambari Server, HUE, NN, JT, DN, DN, …
Ambari based cluster templates
Preconfigured information across all clusters using this template
•  HDP Stack Information
   -  Services & Components & Packages
   -  Description
   -  Package Dependencies
•  Hadoop Topology
   -  Component / Host Group Mapping
•  Hadoop Configuration
   -  All Hadoop Configuration for the Cluster (hundreds of parameters and their values)
Per cluster pluggable data
-  User names
-  Passwords
-  Host names
-  Host VM flavors (CPU/Mem)
-  Node count per host group
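A minimal sketch of the split this slide describes: a template holds what every cluster shares, and the per cluster data is plugged in at provisioning time. All class and field names here are hypothetical illustrations and do not reflect the Ambari or Savanna APIs; the component names and values are example data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTemplate:
    """Preconfigured information shared by all clusters using the template."""
    stack: str                         # HDP stack information (example label)
    host_groups: dict[str, list[str]]  # host group -> components placed on it
    configuration: dict[str, str]      # Hadoop configuration parameters


@dataclass(frozen=True)
class ClusterRequest:
    """Per cluster pluggable data supplied at provisioning time."""
    name: str
    flavors: dict[str, str]      # host group -> VM flavor (CPU/memory)
    node_counts: dict[str, int]  # host group -> node count


def render(template, request):
    """Combine template and per cluster data into one plan entry per node."""
    missing = set(template.host_groups) - set(request.node_counts)
    if missing:
        raise ValueError(f"no node count for host groups: {sorted(missing)}")
    plan = []
    for group, components in template.host_groups.items():
        for index in range(request.node_counts[group]):
            plan.append({
                "host": f"{request.name}-{group}-{index}",
                "flavor": request.flavors[group],
                "components": list(components),
            })
    return plan


# Example data only: a small QA cluster built from one shared template.
qa_template = ClusterTemplate(
    stack="HDP-1.3",
    host_groups={"master": ["NameNode", "JobTracker"],
                 "worker": ["DataNode", "TaskTracker"]},
    configuration={"dfs.replication": "3"},
)
qa_request = ClusterRequest(
    name="qa",
    flavors={"master": "node.Large", "worker": "node.Large"},
    node_counts={"master": 1, "worker": 3},
)
plan = render(qa_template, qa_request)
```

Keeping the two parts separate is what lets the same template back a QA cluster and a production cluster that differ only in flavors and node counts.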
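Swift object store support (HADOOP-8545)
[Diagram: MapReduce, Pig & Hive reach Swift through the HDFS + Swift bridge, with Keystone. A directory (Dir) holding file1, file2 and file3 maps to objects Dir/file1, Dir/file2 and Dir/file3 in containers spread across Swift store-1 … Swift store-n. Input data is read from Swift and output results are written to it.]
Supported operations: create, read, write, delete, mkdir, ls, mv & stat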
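One way to picture the directory to object mapping on this slide is a toy in-memory model. It assumes that a container holds flat object names and that directories are emulated by name prefixes; `ToyContainer` is a hypothetical class for illustration and is not the HADOOP-8545 code.

```python
class ToyContainer:
    """In-memory stand-in for one Swift container (flat object names)."""

    def __init__(self):
        self.objects = {}  # object name -> bytes

    def write(self, path, data):
        self.objects[path.strip("/")] = data

    def read(self, path):
        return self.objects[path.strip("/")]

    def delete(self, path):
        del self.objects[path.strip("/")]

    def ls(self, directory):
        """List immediate children of a directory by scanning name prefixes."""
        prefix = directory.strip("/")
        prefix = prefix + "/" if prefix else ""
        children = set()
        for name in self.objects:
            if name.startswith(prefix):
                rest = name[len(prefix):]
                children.add(rest.split("/", 1)[0])
        return sorted(children)

    def mv(self, src, dst):
        """Rename as copy then delete; this toy store has no atomic rename."""
        self.write(dst, self.read(src))
        self.delete(src)


container = ToyContainer()
container.write("Dir/file1", b"input a")
container.write("Dir/file2", b"input b")
container.write("Dir/sub/file3", b"input c")
listing = container.ls("Dir")
container.mv("Dir/file1", "Dir/renamed")
```

In this model `ls` and `mv` cost a scan or a copy rather than a metadata update, which is one reason a bridge over an object store can behave differently from HDFS for rename heavy jobs.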
Hadoop virtualization extensions (HVE)
•  Account for the additional ‘node group’ layer so replicas do not end up on VMs in the same hypervisor
•  Available in HDP 1.3. Work in progress to enable in HDP 2.0 (YARN & HDFS)
Topology:
Data Center
   Rack-1
      Node group-1: VM1, VM2
      Node group-2: VM1, VM2
   Rack-2
      Node group-1: VM1, VM2
      Node group-2: VM1, VM2
Affected policies:
-  Replica (place, choose & remove) policies
-  Balancer policies
-  Task placement & container allocation (YARN)
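The node group constraint can be sketched as follows. This is a simplified illustration of "no two replicas on the same hypervisor" over the rack / node group / VM layout shown on the slide; it is not the placement policy shipped in HDP, and `Node` and `choose_targets` are hypothetical names.

```python
from collections import namedtuple

# Location of a VM below the data center: rack / node group / VM.
Node = namedtuple("Node", ["rack", "node_group", "vm"])


def choose_targets(nodes, replicas):
    """Pick `replicas` nodes so that no two share a node group (hypervisor)."""
    if replicas <= 0:
        return []
    chosen = []
    used_groups = set()
    for node in nodes:
        group = (node.rack, node.node_group)
        if group in used_groups:
            continue  # another replica already lives on this hypervisor
        chosen.append(node)
        used_groups.add(group)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough distinct node groups for the requested replicas")


topology = [
    Node("rack-1", "group-1", "vm1"), Node("rack-1", "group-1", "vm2"),
    Node("rack-1", "group-2", "vm1"), Node("rack-1", "group-2", "vm2"),
    Node("rack-2", "group-1", "vm1"), Node("rack-2", "group-1", "vm2"),
    Node("rack-2", "group-2", "vm1"), Node("rack-2", "group-2", "vm2"),
]
targets = choose_targets(topology, 3)
```

Without the node group layer, two VMs on one hypervisor look like independent nodes in the same rack, so a single host failure could take out more than one replica of a block.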

Apache Ambari BOF - OpenStack - Hadoop Summit 2013

  • 1. © Hortonworks Inc. 2013 Hadoop + OpenStack integration Roadmap Himanshu Bari June 28th, 2013 Sr. Product Manager hbari@hortonworks.com
  • 2. © Hortonworks Inc. 2013 Disclaimer •  This document may contain product features and technology directions that are under development or may be under development in the future. •  Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. •  This document’s description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product. •  Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
  • 3. © Hortonworks Inc. 2013 Agenda Why Hadoop on OpenStack Use cases A bit under the hood
  • 4. © Hortonworks Inc. 2013 Big Data & Cloud Intersection Point è2013 Big Data & Cloud are top priority for CIOs Page 4 *
  • 5. © Hortonworks Inc. 2013 OpenStack is an open source cloud management platform Glance Image Service Keystone Identity Service Horizon QuantumNova Cinder Block Store Swift Object Store (Apache License) Ceilometer Metering Heat Orchestration Integrated Mutli-hypervisor & guest OS support
  • 6. © Hortonworks Inc. 2013 OpenStack has taken over Amazon AWS in market awareness… Source: Google trends
  • 7. © Hortonworks Inc. 2013 Maturing quickly with broad support.. Pushed  by     150+  vendors       Millions  of  dollars  in   venture  capital   Early  adop;on  across  all   ver;cals  
  • 8. Why Hadoop & OpenStack? Marries two of the largest open source movements. Hadoop provides a greenfield use case: net new workload; needs scale-out infrastructure; shared platform. OpenStack provides the perfect cloud platform: operational agility; supports scale-out architecture; deployment choice across public & private clouds. (1) Open source communities provide the fastest path to innovation. (2) Open source is changing the game as economics and accessibility serve to accelerate cloud & big data market trends. (3) Both are attracting major ecosystem players: IBM, RHT, HP, RAX, etc.
  • 9. Accelerate adoption of Hadoop on OpenStack: the leading contributor to Apache Hadoop, the leading system integrator for OpenStack, and the leading contributor to OpenStack. Apache Hadoop: the killer app for OpenStack.
  • 10. Collaborating on Project Savanna: automate deployment of Apache Hadoop on OpenStack. Diagram components: OpenStack infrastructure; Savanna (elastic Hadoop controller); Swift storage; Hadoop cluster; Ambari (Hadoop management). (1) Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari. (2) HDFS-Swift connectors: move data between HDFS and Swift object storage. (3) Simplified elasticity.
  • 11. Agenda: Why Hadoop on OpenStack; Use cases; A bit under the hood.
  • 12. Project Savanna design goals: focus on API-driven tight integration; hide Hadoop complexity through APIs; “It Just Works” experience; fully leverage virtualization; scalability, reliability, performance.
  • 13. Problems driving use cases. Cluster requirements vary by business unit (Finance, Compliance, IT, Marketing), data type (Web, Mobile, Sensor) & analytics use case (Interactive, Batch), across Dev, QA and Prod. Operational nightmare of supporting multiple cluster flavors; lack of agility; underutilized resources; maintenance complications; can’t migrate from public to private cloud.
  • 14. Provisioning-related use cases: frequent dev/test/staging cluster provision requests; migrations from staging to prod and vice versa; reduce operator error in cluster provisioning; migrate away from Amazon EMR for ad hoc analytics requests to support experimentation.
  • 15. Simplified provisioning. Phase-1: template-based provisioning. Start from a cluster template (e.g. QA cluster) or a node template ((a) resource based: node.Large; (b) function based: node.NameNode). Either use as is (single-click provisioning) or modify (update VM resource allocation, service-to-VM mapping and service config), then provision and/or save the template. Phase-2: Hadoop as a service (job-flow-based provisioning). Pick job type (plus Cascading, streaming & custom jar), upload data to Swift, get results in Swift.
  • 16. Ambari embedded in Horizon
  • 17. Swift object store support. Phase-1: read/write data from/to Swift object stores. Option-1: copy data from Swift to HDFS, run MapReduce and copy results back to Swift. Option-2: run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift). Phase-2: bug fixes & optimizations.
  • 18. Elasticity-related use cases: commission a new node or decommission a node for maintenance; for dev/test/staging clusters, automatically vary cluster data & compute capacity based on tenant, workload, time of day, resource utilization, etc.; automatically vary compute capacity for production clusters.
  • 19. Elasticity. Two axes: node elasticity (compute and/or data), manual or rule based; cluster life (Swift or HDFS used for storage), long lived or short lived. Labels across Phase-1 and Phase-2: handle variable workloads, e.g. alter cluster compute node count for peak/off-peak hours; job-flow-based clusters for ad hoc analysis; best for Dev/QA use; best for predictable workloads.
  • 20. Multi-tenancy-related use cases: improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads; simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool; support varying SLAs based on tenant and workload through resource isolation provided by VMs; simplify chargeback/showback.
  • 21. Multi-tenancy. Phase-1: access isolation (single sign-on for Ambari & HUE through Keystone integration; dedicated Ambari & HUE instance per cluster per tenant); resource isolation (CPU and memory isolation through VMs; ability to pin a Hadoop VM to a given set of physical hosts to enable per-tenant physical host isolation); version isolation (choice of Hadoop versions for tenants). Phase-2: access isolation (single Ambari instance per tenant, i.e. multi-cluster support with Ambari; Keystone enhancements to support Hadoop job-flow-level RBAC to support Hadoop as a service).
  • 22. Agenda: Why Hadoop on OpenStack; Use cases; A bit under the hood.
  • 23. Savanna logical architecture. OpenStack infrastructure: network, storage, security, compute. Horizon + Savanna UI. Savanna controller: API, configuration, elasticity, orchestration, plugin manager. HDP Savanna plugin: API, Hadoop provisioning, Ambari template management. Hadoop cluster: Ambari + API.
  • 24. Provisioning workflow overview. Components: Horizon, Savanna controller + HDP OpenStack plugin, Nova, Glance. Steps: a cluster request arrives from Horizon; vanilla VMs are provisioned from a VM image (OS only, or preloaded with HDP bits); the HDP plugin installs Ambari; the HDP plugin passes the cluster template to Ambari; Ambari configures all services and starts the cluster. Hadoop cluster: Ambari Server, HUE, NN, JT, DNs.
  • 25. Ambari-based cluster templates. Preconfigured information across all clusters using this template: HDP stack information (services, components & packages; description; package dependencies); Hadoop topology (component / host group mapping); Hadoop configuration (all Hadoop configuration for the cluster: hundreds of parameters and their values). Per-cluster pluggable data: user names; passwords; host names; host VM flavors (CPU/mem); node count per host group.
  • 26. Swift object store support (HADOOP-8545). MapReduce, Pig & Hive take input data from and write output results to Swift through an HDFS + Swift bridge. Supported operations: create, read, write, delete, mkdir, ls, mv & stat. Diagram components: Keystone; a directory (Dir) holding file1, file2 and file3, stored as Dir/file1, Dir/file2 and Dir/file3 in containers (Container-1, Container-2) across Swift store-1 … Swift store-n.
  • 27. Hadoop Virtualization Extensions (HVE): account for the additional ‘node group’ layer so replicas do not end up on VMs on the same hypervisor. Available in HDP 1.3; work in progress to enable in HDP 2.0 (YARN & HDFS). Topology: data center → rack (Rack-1, Rack-2) → node group (Node group-1, Node group-2 per rack) → VM (VM1, VM2 per node group). Affected: replica (place, choose & remove) policies; balancer policies; task placement & container allocation (YARN).
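
The node-group idea on the last slide can be illustrated with a small sketch. This is not the HDFS/HVE implementation: the `Node` tuple, the `choose_replica_targets` function and the greedy selection rule are assumptions made up for illustration. It only shows the constraint the slide states, that no two replicas land in the same node group (hypervisor), with a preference for putting the second replica on another rack.

```python
from collections import namedtuple

# Hypothetical model of a VM's place in the topology from slide 27:
# rack -> node group (hypervisor) -> VM. Names are illustrative only.
Node = namedtuple("Node", ["rack", "node_group", "vm"])


def choose_replica_targets(nodes, writer, replication=3):
    """Pick replica targets so that no two share a node group."""
    targets = [writer]
    used_groups = {(writer.rack, writer.node_group)}

    def pick(predicate):
        # Take the first node in an unused node group that satisfies predicate.
        for n in nodes:
            key = (n.rack, n.node_group)
            if key not in used_groups and predicate(n):
                targets.append(n)
                used_groups.add(key)
                return True
        return False

    # Second replica: prefer another rack, else any unused node group.
    if len(targets) < replication:
        pick(lambda n: n.rack != writer.rack) or pick(lambda n: True)

    # Remaining replicas: any node group not used yet; stop when none are left.
    while len(targets) < replication and pick(lambda n: True):
        pass
    return targets


# Topology from the slide: 2 racks x 2 node groups x 2 VMs.
nodes = [
    Node(rack, group, vm)
    for rack in ("rack-1", "rack-2")
    for group in ("node-group-1", "node-group-2")
    for vm in ("vm1", "vm2")
]

for target in choose_replica_targets(nodes, writer=nodes[0]):
    print(target.rack, target.node_group, target.vm)
```

Without the node-group layer, a rack-only policy could place two replicas on VM1 and VM2 of the same node group, and a single hypervisor failure would then take out both copies.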