CISCO - Presentation at Hortonworks Booth - Strata 2014
1. Hadoop as a Service: HDP 2.0 with OpenStack on Cisco UCS Servers
Karthik Kulkarni, TME, Big Data Solutions Architect
Date: 10.17.14
2. Hadoop as a Service (HaaS)
• HaaS essentially means virtualizing Hadoop: it refers to a cloud computing solution for Hadoop
• HaaS is a managed Hadoop cluster in which the nitty-gritty details of the underlying services are hidden from the user
© 2013-2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential
3. Combining OpenStack with Hadoop brings the following benefits to Hadoop seamlessly:
• Self-service provisioning
• Elastic scaling
• Support for multi-tenancy
• Improved infrastructure utilization
• Pay based on use
4. OpenStack provides Infrastructure as a Service (IaaS)
OpenStack is a free and open-source cloud computing software platform
Source: openstack.org
5. OpenStack has a modular architecture with various code names for its components.
Service | Project name | Description
Dashboard | Horizon | Provides a web-based self-service portal to interact with underlying OpenStack services, such as launching an instance, assigning IP addresses and configuring access controls.
Compute | Nova | Manages the lifecycle of compute instances in an OpenStack environment. Responsibilities include spawning, scheduling and decommissioning of machines on demand.
Networking | Neutron | Enables network connectivity as a service for other OpenStack services, such as OpenStack Compute. Has a pluggable architecture that supports many popular networking vendors and technologies.
Source: openstack.org
6. OpenStack has a modular architecture with various components.
Source: openstack.org
7. OpenStack defines three roles for the underlying nodes (host OS)
• Controller node: the main management node for OpenStack, which controls the compute and storage nodes.
• Compute node: these nodes host the VMs that are spawned.
• Storage node: these nodes host the storage for the VMs.
In this HaaS architecture, storage is ephemeral, that is, local to the VM. Hence the compute nodes are also the storage nodes, and there are no separate storage nodes.
Source: openstack.org
9. Common Platform Architecture (CPA) is a highly scalable architecture designed to meet a variety of scale-out application demands
• UCS Manager (UCSM): provisioning, monitoring, growth and maintenance; provides speed, consistency, simplicity and visibility
• UCS 6200 Series Fabric Interconnects: high-speed connectivity and management (LAN, SAN, management); integration with enterprise applications on blades
• Nexus 2232 Fabric Extenders: scalability at lower cost
• UCS C240 Servers: compute and storage
10. Consistent Management at Scale
[Diagram: UCS Manager across a single rack, a single domain, and multiple domains]
12. The following hardware and software infrastructure was used for the HaaS solution on UCS:
• Cisco UCS Common Platform Architecture for Big Data Version 2 (CPAv2) with the Capacity Optimized configuration
• Ubuntu 12.04 LTS for the host and guest OS
• OpenStack release: Havana
• Hortonworks HDP 2.0.6, installed manually on the guest VMs
13. The OpenStack components used are as follows:
• Keystone: identity service
• Glance: VM image service
• Nova: compute (with KVM as the hypervisor)
• Storage: ephemeral storage (if a VM is deleted, all data associated with that VM is lost)
• Networking: nova-network (flat network)
• Horizon: OpenStack dashboard
14. Name node Resource Mgr DN … DN DN … DN
Controller Compute Compute … Compute
• One of the nodes is the Controller node
• All other nodes are Compute nodes
• The Hadoop NameNode runs as a single VM on the controller node
• The Hadoop Resource Manager runs as a single VM on one of the compute nodes
15. Name node Resource Mgr DN … DN DN … DN
Controller Compute Compute … Compute
In nova.conf add:
scheduler_default_filters=SameHostFilter,DifferentHostFilter
Pass the --hint option to the “nova boot” command with same_host or different_host:
# nova boot --flavor 1 --key_name mykey --image <image-id> --security_group default --hint different_host=<vm-id>
# nova boot --flavor 1 --key_name mykey --image <image-id> --security_group default --hint same_host=<vm-id>
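As a rough illustration of the placement hints on this slide, the following Python sketch assembles the same `nova boot` argument list with either a `same_host` or a `different_host` hint. The helper name `nova_boot_args` and the placeholder IDs are hypothetical; only the flags shown on the slide are used, and the sketch only builds the command rather than running it.

```python
from typing import List

# The two hint keys shown on the slide.
ALLOWED_HINTS = ("same_host", "different_host")


def nova_boot_args(image_id: str, hint: str, vm_id: str,
                   flavor: str = "1",
                   key_name: str = "mykey",
                   security_group: str = "default") -> List[str]:
    """Build the `nova boot` argument list from the slide (hypothetical helper)."""
    if hint not in ALLOWED_HINTS:
        raise ValueError(f"hint must be one of {ALLOWED_HINTS}, got {hint!r}")
    return [
        "nova", "boot",
        "--flavor", flavor,
        "--key_name", key_name,
        "--image", image_id,
        "--security_group", security_group,
        # Scheduler hint: place the new VM relative to an existing VM.
        "--hint", f"{hint}={vm_id}",
    ]


# Placeholder IDs for illustration only; real values come from the deployment.
cmd = nova_boot_args("IMAGE_ID", "different_host", "VM_ID")
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, which makes it straightforward to hand to a process launcher later without shell quoting issues.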
Additional details: www.cisco.com/go/bigdata_design
17. Category | Workloads
Micro Benchmarks | WordCount (per node), TeraSort (cluster), Sort (per node)
Machine Learning | Mahout Bayesian Classification (Bayes), Mahout K-means clustering (kmeans)
HDFS Benchmark | EnhancedDFSIO (dfsioe)
Hive Query Benchmark | Hive Bench
18. Hardware / Software | Configuration
Servers | 20 x UCS C240 M3 LFF (1 Name node, 1 Secondary Name node, 18 Data nodes)
Processor | 2 x Intel® Xeon® Processor E5-2680 v2 (25M Cache, 2.80 GHz), 10 Cores (Each)
Hard disk drives | 12 x 4TB SATA 7200RPM HDDs, RAID 10
Memory | 256 GB RAM
Network | 2 x 10 Gigabit Ethernet, Cisco VIC 1225 NIC
Operating system | Ubuntu 14.04LTS (Host OS and Guest OS)
Hadoop Version | Hortonworks HDP 2.0.6
HiBench | HiBench 2.2
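To put the per-server figures on this slide in perspective, the short Python sketch below adds them up across the 20 servers. It assumes that RAID 10 leaves half of the raw disk capacity usable and that sizes are decimal terabytes; neither assumption is stated on the slide.

```python
# Per-server figures taken from the hardware/software configuration slide.
SERVERS = 20
SOCKETS_PER_SERVER = 2
CORES_PER_SOCKET = 10
RAM_GB_PER_SERVER = 256
DRIVES_PER_SERVER = 12
DRIVE_TB = 4

# Assumption: RAID 10 mirrors every drive, so usable space is half of raw.
RAID10_USABLE_FRACTION = 0.5

cores_per_server = SOCKETS_PER_SERVER * CORES_PER_SOCKET
raw_tb_per_server = DRIVES_PER_SERVER * DRIVE_TB
usable_tb_per_server = raw_tb_per_server * RAID10_USABLE_FRACTION

totals = {
    "physical_cores": SERVERS * cores_per_server,
    "ram_gb": SERVERS * RAM_GB_PER_SERVER,
    "raw_disk_tb": SERVERS * raw_tb_per_server,
    "usable_disk_tb": SERVERS * usable_tb_per_server,
}

for name, value in totals.items():
    print(f"{name}: {value}")
```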
19. Name | vCPU | RAM (MB) | Root Disk (GB) | Ephemeral (GB) | VM Filesystem
hadoop.8vm.ephemeral | 2 | 28250 | 50 | 2000 | ext3
hadoop.4vm.ephemeral | 4 | 56500 | 50 | 4000 | xfs
hadoop.2vm.ephemeral | 8 | 113000 | 50 | 8000 | xfs
hadoop.1vm.ephemeral | 16 | 226000 | 50 | 16000 | xfs
hadoop.master | 16 | 226000 | 50 | 20000 | xfs
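The flavor sizes on this slide appear to be chosen so that each host carries the same total load however it is split into VMs. The Python sketch below checks that arithmetic. The VMs-per-host counts are an assumption read off the flavor names (8vm, 4vm, 2vm, 1vm), and the names `Flavor`, `VMS_PER_HOST` and `per_host_totals` are illustrative only.

```python
from typing import Dict, NamedTuple


class Flavor(NamedTuple):
    vcpus: int
    ram_mb: int
    root_gb: int
    ephemeral_gb: int


# Values copied from the flavor table on this slide.
FLAVORS: Dict[str, Flavor] = {
    "hadoop.8vm.ephemeral": Flavor(2, 28250, 50, 2000),
    "hadoop.4vm.ephemeral": Flavor(4, 56500, 50, 4000),
    "hadoop.2vm.ephemeral": Flavor(8, 113000, 50, 8000),
    "hadoop.1vm.ephemeral": Flavor(16, 226000, 50, 16000),
}

# Assumption: the number in each flavor name is the VM count per host.
VMS_PER_HOST: Dict[str, int] = {
    "hadoop.8vm.ephemeral": 8,
    "hadoop.4vm.ephemeral": 4,
    "hadoop.2vm.ephemeral": 2,
    "hadoop.1vm.ephemeral": 1,
}


def per_host_totals(name: str) -> Dict[str, int]:
    """Multiply one flavor's resources by the assumed VM count per host."""
    flavor = FLAVORS[name]
    count = VMS_PER_HOST[name]
    return {
        "vms": count,
        "vcpus": count * flavor.vcpus,
        "ram_mb": count * flavor.ram_mb,
        "ephemeral_gb": count * flavor.ephemeral_gb,
    }


for flavor_name in FLAVORS:
    print(flavor_name, per_host_totals(flavor_name))
```

Under that assumption, every layout comes to 16 vCPUs, 226000 MB of RAM and 16000 GB of ephemeral disk per host, so the benchmark runs would differ only in how the same host resources are partitioned.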
20. Sort [Chart: run time in seconds]
This workload sorts its text input data (24GB); results are per node
21. TeraSort [Chart: run time in seconds]
TeraSort is a standard benchmark created by Jim Gray. Its input data (1TB) is generated by the Hadoop TeraGen example program
22. WordCount [Chart: run time in seconds]
This workload counts the occurrences of each word in the input data, which is generated using the Hadoop RandomTextWriter (32GB/node)
23. Summary
While mainstream Hadoop is still expected to run on bare metal, Hadoop as a Service with OpenStack holds great promise and is likely to gain popularity with service providers, IT departments offering HaaS internally within an organization, and testing and development environments, to name a few.
Additional details: www.cisco.com/go/bigdata_design
Cisco Validated Design:
Hadoop as a Service (HaaS) with Cisco UCS CPA v2 for Big Data and OpenStack
24. Thank You
Intel and the Intel logo are trademarks of Intel Corporation in the U.S.
and/or other countries.