Hadoop as a Service: HDP 2.0 with OpenStack on Cisco UCS Servers
Karthik Kulkarni, TME, Big Data Solutions Architect
Date: 10.17.14
© 2013-2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
• Hadoop as a Service (HaaS) virtualizes Hadoop and delivers it as a cloud computing solution
• HaaS is a managed Hadoop cluster in which the details of the underlying services are hidden from the user
Combining OpenStack with Hadoop brings the following benefits to Hadoop:
• Self-service provisioning
• Elastic scaling
• Support for multi-tenancy
• Improved infrastructure utilization
• Pay based on use
OpenStack provides Infrastructure as a Service (IaaS)
OpenStack is a free and open-source cloud computing software platform.
Source: openstack.org
Service | Project name | Description
Dashboard | Horizon | Provides a web-based self-service portal to interact with underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access controls.
Compute | Nova | Manages the lifecycle of compute instances in an OpenStack environment. Responsibilities include spawning, scheduling and decommissioning of machines on demand.
Networking | Neutron | Enables network connectivity as a service for other OpenStack services, such as OpenStack Compute. Has a pluggable architecture that supports many popular networking vendors and technologies.
OpenStack has a modular architecture with various code names for 
its components. 
Source: openstack.org
OpenStack has a modular architecture with various components. 
Source: openstack.org
OpenStack defines three roles for the underlying nodes (host OS):
• Controller node – the main management node for OpenStack; it controls the compute and storage nodes.
• Compute node – these nodes host the VMs that are spawned.
• Storage node – these nodes host the storage for the VMs.
In this HaaS architecture, storage is ephemeral, that is, local to the VM. Compute nodes therefore also act as storage nodes, and there are no separate storage nodes.
Source: openstack.org
Cisco UCS Common Platform Architecture for Big Data
Common Platform Architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands.
Provisioning, Monitoring, Growth, Maintenance
UCS Manager (UCSM) provides:
• Speed
• Consistency
• Simplicity
• Visibility
Components:
• UCS 6200 Series Fabric Interconnects: high-speed connectivity and management (LAN, SAN, management); integration with enterprise applications on blades
• Nexus 2232 Fabric Extenders: scalability at lower cost
• UCS C240 Servers: compute, storage
Consistent Management at Scale
UCS Manager: Single Rack, Single Domain, Multiple Domains
HaaS with OpenStack on UCS
The following hardware and software infrastructure was used for the HaaS solution on UCS:
• Cisco UCS Common Platform Architecture for Big Data Version 2 (CPAv2) with the Capacity Optimized configuration
• Ubuntu 12.04 LTS for host and guest OS
• OpenStack release - Havana
• Hortonworks HDP 2.0.6 - installed manually on the guest VMs
The following OpenStack components are used:
• Keystone - identity service
• Glance - VM image service
• Nova - compute (KVM as the hypervisor)
• Storage - ephemeral storage (if a VM is deleted, all data associated with that VM is lost)
• Networking - nova-network (flat network)
• Horizon - OpenStack dashboard
[Diagram: NameNode VM on the Controller node; ResourceManager VM and DataNode (DN) VMs on the Compute nodes]
• One of the nodes is the Controller node
• All other nodes are Compute nodes
• The Hadoop NameNode runs as a single VM on the Controller node
• The Hadoop ResourceManager runs as a single VM on one of the Compute nodes
Pass the --hint option to the "nova boot" command with same_host or different_host.
In nova.conf add:
scheduler_default_filters=SameHostFilter,DifferentHostFilter

# nova boot --flavor 1 --key_name mykey --image <image-id> \
    --security_group default --hint different_host=<vm-id>

# nova boot --flavor 1 --key_name mykey --image <image-id> \
    --security_group default --hint same_host=<vm-id>

Additional details: www.cisco.com/go/bigdata_design
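
The placement commands above can also be assembled programmatically. The following is a minimal Python sketch, not part of the original deck: it only builds the argument list for "nova boot" using the flags shown on the slide and does not call OpenStack. The function name nova_boot_command, the optional instance_name parameter, and the IDs IMAGE_ID and NAMENODE_VM_ID are hypothetical placeholders.

```python
import shlex


def nova_boot_command(image_id, hint_key, hint_vm_id, instance_name=None,
                      flavor="1", key_name="mykey", security_group="default"):
    """Build the argument list for a `nova boot` call with one scheduler hint.

    Only the flags shown on the slide are used. hint_key must be one of the
    two hints served by SameHostFilter and DifferentHostFilter.
    """
    if hint_key not in ("same_host", "different_host"):
        raise ValueError("unsupported scheduler hint: %s" % hint_key)
    args = [
        "nova", "boot",
        "--flavor", flavor,
        "--key_name", key_name,
        "--image", image_id,
        "--security_group", security_group,
        "--hint", "%s=%s" % (hint_key, hint_vm_id),
    ]
    # The slide omits the instance name; append it only when the caller gives one.
    if instance_name is not None:
        args.append(instance_name)
    return args


if __name__ == "__main__":
    # Hypothetical IDs: keep a new VM off the host that runs the NameNode VM.
    cmd = nova_boot_command("IMAGE_ID", "different_host", "NAMENODE_VM_ID")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting issues.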
Category | Workloads
Micro Benchmarks | WordCount (per node), TeraSort (cluster), Sort (per node)
Machine Learning | Mahout Bayesian Classification (Bayes), Mahout K-means clustering (kmeans)
HDFS Benchmark | EnhancedDFSIO (dfsioe)
Hive Query Benchmark | Hive Bench
Hardware / Software | Configuration
Servers | 20 x UCS C240 M3 LFF (1 Name node, 1 Secondary Name node, 18 Data nodes)
Processor | 2 x Intel® Xeon® Processor E5-2680 v2 (25M Cache, 2.80 GHz), 10 Cores (Each)
Hard disk drives | 12 x 4TB SATA 7200RPM HDDs, RAID 10
Memory | 256 GB RAM
Network | 2 x 10 Gigabit Ethernet, Cisco VIC 1225 NIC
Operating system | Ubuntu 14.04 LTS (Host OS and Guest OS)
Hadoop Version | Hortonworks HDP 2.0.6
HiBench | HiBench 2.2
Name | vCPU | RAM (MB) | Root Disk (GB) | Ephemeral (GB) | VM Filesystem
hadoop.8vm.ephemeral | 2 | 28250 | 50 | 2000 | ext3
hadoop.4vm.ephemeral | 4 | 56500 | 50 | 4000 | xfs
hadoop.2vm.ephemeral | 8 | 113000 | 50 | 8000 | xfs
hadoop.1vm.ephemeral | 16 | 226000 | 50 | 16000 | xfs
hadoop.master | 16 | 226000 | 50 | 20000 | xfs
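
The flavor sizes above appear to split one host evenly among its VMs. The short Python sketch below checks that arithmetic. It rests on an assumption that the deck does not state: that "hadoop.<N>vm.ephemeral" means N VMs of that flavor run on each compute host. The names Flavor, vms_per_host and host_totals are hypothetical helpers, and hadoop.master is left out because its name carries no VM count.

```python
from collections import namedtuple

Flavor = namedtuple("Flavor", "name vcpu ram_mb root_gb ephemeral_gb")

# Values copied from the flavor table above.
FLAVORS = [
    Flavor("hadoop.8vm.ephemeral", 2, 28250, 50, 2000),
    Flavor("hadoop.4vm.ephemeral", 4, 56500, 50, 4000),
    Flavor("hadoop.2vm.ephemeral", 8, 113000, 50, 8000),
    Flavor("hadoop.1vm.ephemeral", 16, 226000, 50, 16000),
]


def vms_per_host(flavor):
    """Assumption: the '<N>vm' part of the flavor name is the VM count per host."""
    middle = flavor.name.split(".")[1]   # e.g. "8vm"
    return int(middle[:-2])              # strip the trailing "vm"


def host_totals(flavor):
    """Total vCPU, RAM (MB) and ephemeral disk (GB) used by N VMs of this flavor."""
    n = vms_per_host(flavor)
    return (n * flavor.vcpu, n * flavor.ram_mb, n * flavor.ephemeral_gb)


if __name__ == "__main__":
    for flavor in FLAVORS:
        print(flavor.name, vms_per_host(flavor), host_totals(flavor))
```

Under that assumption, every flavor adds up to the same per-host footprint of 16 vCPUs, 226000 MB of RAM and 16000 GB of ephemeral disk, so the benchmark runs would compare different VM counts on an otherwise identical resource budget.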
[Chart: Sort run time in seconds]
This workload sorts its text input data (24 GB); results are per node.
[Chart: TeraSort run time in seconds]
TeraSort is a standard benchmark created by Jim Gray. Its input data (1 TB) is generated by the Hadoop TeraGen example program.
[Chart: WordCount run time in seconds]
This workload counts the occurrences of each word in the input data, which is generated using the Hadoop RandomTextWriter (32 GB per node).
Summary
While mainstream Hadoop is still expected to run on bare metal, Hadoop as a Service with OpenStack holds great promise and is expected to gain popularity with service providers, IT departments offering HaaS internally within an organization, and testing and development environments, to name a few.
Additional details: www.cisco.com/go/bigdata_design 
Cisco Validated Design: 
Hadoop as a Service (HaaS) with Cisco UCS CPA v2 for Big Data and Open Stack 
Thank You 
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. 
and/or other countries.

CISCO - Presentation at Hortonworks Booth - Strata 2014
