Solution Guide
EMC HYBRID CLOUD SOLUTION
WITH VMWARE
Hadoop Applications Solution Guide 2.5
EMC Solutions
Abstract
This document serves as a reference for planning and designing a Pivotal Hadoop
solution that enables IT organizations to quickly deploy Hadoop as a service (HaaS)
on an existing cloud.
August 2014
2 EMC Hybrid Cloud Solution with VMware
Hadoop Applications Solution Guide 2.5
Copyright © 2014 EMC Corporation. All rights reserved. Published in the USA.
Published August 2014
EMC believes the information in this publication is accurate as of its publication date.
The information is subject to change without notice.
The information in this publication is provided as is. EMC Corporation makes no
representations or warranties of any kind with respect to the information in this
publication, and specifically disclaims implied warranties of merchantability or
fitness for a particular purpose. Use, copying, and distribution of any EMC software
described in this publication requires an applicable software license.
EMC², EMC, and the EMC logo are registered trademarks or trademarks of EMC
Corporation in the United States and other countries. All other trademarks used
herein are the property of their respective owners.
For the most up-to-date listing of EMC product names, see EMC Corporation
Trademarks on EMC.com.
EMC Hybrid Cloud Solution with VMware
Hadoop Applications Solution Guide 2.5
Part Number H13221
Contents
Chapter 1 Executive Summary
Document purpose
Audience
Solution purpose
Business challenge
Technology solution
Chapter 2 EMC Hybrid Cloud Solution Overview
Introduction
EMC Hybrid Cloud features and functionality
Automation and self-service provisioning
Multitenancy and secure separation
Workload-optimized storage
Elasticity and service assurance
Operational monitoring and management
Metering and chargeback
Modular add-on components
Chapter 3 EMC Hybrid Cloud Hadoop as a Service
Overview
EMC Hybrid Cloud HaaS and IaaS
Pivotal Hadoop
Serengeti
VMware vSphere Big Data Extensions
Chapter 4 HaaS Component Integration
Overview
Integrating Hadoop components with EMC Hybrid Cloud
BDE Topology
Virtualized Hadoop
Configuring the platform
Installing and configuring BDE
Installing and configuring PHD
Installing and configuring EMC Hybrid Cloud IaaS
Chapter 5 Creating vCO Workflows and vCAC Catalog Services for HaaS
Overview
Importing and modifying custom vCO workflows
Modifying custom workflows
Creating BDE Clusters
Creating new BDE clusters
Configuring a Hadoop cluster
Creating vCAC Catalog Services
Accessing vCAC
Creating a new service blueprint
Chapter 6 Use Cases: EMC Hybrid Cloud IaaS
Overview
IaaS – storage services
Overview
Use case 1: Storage provisioning
Use case 2: Select virtual machine storage
Use case 3: Metering storage services
Summary
Monitoring and capacity planning
Monitoring
Capacity planning
Capacity planning example
Metering and chargeback
Chapter 7 Conclusion
Summary
Appendix A References
VMware references
Figures
Figure 1. EMC Hybrid Cloud key components
Figure 2. EMC Hybrid Cloud self-service portal
Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager
Figure 4. IT Business Management Suite overview dashboard for hybrid cloud
Figure 5. EMC Hybrid Cloud HaaS component overview
Figure 6. Pivotal Hadoop (PHD) components
Figure 7. BDE and Serengeti stack
Figure 8. BDE and vSphere deployment topology
Figure 9. The evolution of virtual Hadoop
Figure 10. Configuring the SSO lookup service and management server IP addresses
Figure 11. Importing Hadoop binaries into BDE management server
Figure 12. Removing the default Apache template from BDE
Figure 13. Importing custom workflows into vCO
Figure 14. Using the validate workflows action
Figure 15. How to edit the attributes
Figure 16. Editing and creating custom parameter passing
Figure 17. Launching scripts from the VCO
Figure 18. Launching of Micro Hadoop Cluster workflow
Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere web client)
Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client
Figure 21. Create and name a new Big Data Cluster
Figure 22. Advance Service Designer
Figure 23. Edit Entitlement window
Figure 24. vCAC Service Catalog showing Hadoop as a Service
Figure 25. Storage Services - Provision cloud storage
Figure 26. Provision Cloud Storage – select vCenter cluster
Figure 27. Storage Provisioning – Select datastore type
Figure 28. Storage provisioning – Choose ViPR storage pool
Figure 29. Storage provisioning – Enter storage size
Figure 30. Provision Storage – Storage Reservation for vCAC Business Group
Figure 31. Set storage reservation policy for virtual machine disks
Figure 32. Create new virtual machine storage profile for Tier 2 storage
Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider
Figure 34. VMware ITBM chargeback based on storage profile of datastore
Figure 35. Choosing virtual machine consumption models and profiles
Figure 36. Specifying configuration and projected capacity usage of new virtual machines
Figure 37. Capacity summary showing insufficient CPU and RAM resources
Figure 38. Specifying number of hosts and amount of CPU and memory
Figure 39. Specifying datastore size
Figure 40. Compared scenarios
Figure 41. Combined scenarios
Figure 42. Categorized hybrid cloud environment cost overview
Figure 43. vSphere Cluster cost overview
Figure 44. Storage cost overview
Chapter 1 Executive Summary
This chapter presents the following topics:
Document purpose
Audience
Solution purpose
Business challenge
Technology solution
Document purpose
This document serves as a reference for planning and designing a Pivotal Hadoop
solution that enables IT organizations to quickly deploy Hadoop as a service (HaaS)
on an existing cloud. The solution delivers infrastructure as a service (IaaS)
capabilities to support big data application development. This document introduces
the main features and functionality of the solution, the solution architecture and key
components, and the validated hardware and software environment. It demonstrates
the integration of Pivotal Hadoop Enterprise in the EMC® Hybrid Cloud solution.
The Pivotal Hadoop solution is a modular add-on to the EMC Hybrid Cloud solution.
EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Reference
Architecture 2.5 and EMC Hybrid Cloud Solution with VMware: Foundation
Infrastructure Solution Guide 2.5 describe the reference architecture and the
foundation solution upon which all the EMC Hybrid Cloud add-on solutions build.
The following documents provide further information about how to implement
specific capabilities or enable specific use cases within the EMC Hybrid Cloud
solution with VMware:
• EMC Hybrid Cloud Solution with VMware: Data Protection Continuous
Availability Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Data Protection Disaster Recovery
Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Data Protection Backup Solution
Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Security Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Pivotal CF Platform as a Service
Solution Guide 2.5
Audience
This document is intended for executives, managers, architects, cloud
administrators, and technical administrators of IT environments who want to build a
self-service, Pivotal Hadoop-based enterprise big data platform. Readers should be
familiar with VMware vCloud Suite, Pivotal Hadoop, VMware Big Data Extensions
(BDE), EMC ViPR®, general IaaS and software-defined data center concepts, and how a
hybrid cloud infrastructure accommodates these technologies and requirements.
Solution purpose
The EMC Hybrid Cloud solution enables EMC customers to build an enterprise-class,
scalable, multitenant infrastructure that enables:
• Complete management of the infrastructure and application service lifecycle
• On-demand access to and control of network bandwidth, servers, storage, and
security
• Quick deployment of IaaS components to support HaaS-based services without
IT administrator involvement
• Scalable, elastic, flexible HaaS-based services for maximum asset utilization
• Access to application services from a single platform for both business-critical
and next-generation cloud applications
This solution provides the reference architecture and the best practice guidance
necessary to integrate the key components and functionality of enterprise HaaS into
an underlying EMC Hybrid Cloud infrastructure.
Business challenge
Today’s enterprise demands an agile development platform that can enable the
continuous delivery, updating, and horizontal scalability of applications. The Pivotal
Hadoop (PHD) platform enables developers to easily deploy, bind, and scale
applications and data services. When integrated with VMware vCloud Automation
Center, it delivers a self-service Pivotal Hadoop platform that facilitates rapid
deployment and instant scaling or updating of Hadoop clusters.
To accommodate new-generation applications while maintaining existing end-to-end
service delivery, HaaS must interoperate with the underlying infrastructure in a way
that provides:
• Efficiency and flexibility
• Fast, proactive responses to service requests
• An easy as-a-service deployment model
• Adequate visibility into the cost of the infrastructure
Technology solution
This EMC Hybrid Cloud solution integrates the best of EMC, VMware, and Pivotal
products and services, and empowers IT organizations to adopt an as-a-service
implementation model of compute and storage infrastructure within the data center.
Agile, elastic, on-demand, end-to-end IaaS provisioning is crucial to support a
comprehensive, dynamic, and fast-growing big data environment.
The key solution components include:
• EMC ViPR software-defined storage platform
• VMware vCloud Suite cloud management and infrastructure
• EMC and VMware integrated workflows
• VMware NSX virtual networking technologies
• VMware vSphere virtualization platform
• VMware Big Data Extensions (BDE) with Project Serengeti
• Pivotal Hadoop (PHD)
Chapter 2 EMC Hybrid Cloud Solution Overview
This chapter presents the following topics:
Introduction
EMC Hybrid Cloud features and functionality
Introduction
The EMC Hybrid Cloud solution enables a well-run hybrid cloud by bringing new
functionality not only to IT organizations, but also to developers, end users, and line-
of-business owners. Beyond delivering baseline infrastructure as a service (IaaS),
built on a software-defined data center (SDDC) architecture, the solution delivers
feature-rich capabilities to expand from IaaS to business-enabling IT as a service
(ITaaS). Backup as a service (BaaS) and disaster recovery as a service (DRaaS) are
now policies that users can enable with just a few mouse clicks. End users and
developers can quickly access a marketplace of resources for Microsoft, Oracle, SAP,
EMC Syncplicity®, and Pivotal applications, and can add third-party packages as
required. All of these resources can be deployed on private cloud or public cloud
services, including VMware vCloud Air, from EMC-powered cloud service providers.
The EMC Hybrid Cloud solution uses the best of EMC and VMware products and
services, and takes advantage of the strong integration between EMC and VMware
technologies to provide the foundation for enabling IaaS on new and existing
infrastructure for the hybrid cloud.
Figure 1 shows the key components of the EMC Hybrid Cloud solution. For detailed
information, refer to EMC Hybrid Cloud Solution with VMware: Foundation
Infrastructure Solution Guide 2.5. For information on EMC Hybrid Cloud modular add-
on solutions, which provide functionality such as data protection, continuous
availability, and application services, refer to Modular add-on components and to the
individual Solution Guides for those add-ons.
Figure 1. EMC Hybrid Cloud key components
EMC Hybrid Cloud features and functionality
The EMC Hybrid Cloud solution incorporates the following features and functionality:
• Automation and self-service provisioning
• Multitenancy and secure separation
• Workload-optimized storage
• Elasticity and service assurance
• Operational monitoring and management
• Metering and chargeback
• Modular add-on components
Automation and self-service provisioning
The solution provides self-service provisioning of automated cloud services to both
users and infrastructure administrators. It uses VMware vCloud Automation Center
(vCAC), integrated with EMC ViPR software-defined storage and VMware NSX, to
provide the compute, storage, network, and security virtualization platforms for the
SDDC.
Cloud users can request and manage their own applications and compute resources
within established operational policies. This can reduce IT service delivery times from
days or weeks to minutes. Automation and self-service provisioning features include:
• Self-service portal—Provides a cross-cloud storefront that delivers a catalog of
custom-defined services for provisioning workloads based on business and IT
policies, as shown in Figure 2
• Role-based entitlements—Ensure that the self-service portal presents only the
virtual machine, application, or service blueprints appropriate to a user’s role
within the business
• Resource reservations—Allocate resources for use by a specific group and
ensure that those resources are inaccessible to other groups
• Service levels—Define the amount and types of resources that a particular
service can receive during initial provisioning or as part of configuration
changes
• Blueprints—Contain the build specifications and automation policies that
define the process for building or reconfiguring compute resources
Figure 2. EMC Hybrid Cloud self-service portal
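The role-based entitlements and resource reservations described above can be modeled in a few lines. The following Python sketch is hypothetical and is not the vCAC API: the Catalog and Reservation classes, the group and blueprint names, and the capacity figures are illustrative assumptions. It shows the two rules the text describes: the portal lists only the blueprints a user's groups are entitled to, and a reservation serves only the group that owns it.

```python
from dataclasses import dataclass, field


@dataclass
class Reservation:
    """Capacity set aside for exactly one business group (hypothetical model)."""
    group: str
    cpu: int
    memory_gb: int
    used_cpu: int = 0
    used_memory_gb: int = 0

    def try_allocate(self, group: str, cpu: int, memory_gb: int) -> bool:
        # A reservation is never usable by any group other than its owner.
        if group != self.group:
            return False
        # Refuse requests that would exceed the reserved capacity.
        if self.used_cpu + cpu > self.cpu or self.used_memory_gb + memory_gb > self.memory_gb:
            return False
        self.used_cpu += cpu
        self.used_memory_gb += memory_gb
        return True


@dataclass
class Catalog:
    # Blueprint name -> set of groups entitled to request it.
    entitlements: dict[str, set[str]] = field(default_factory=dict)

    def visible_to(self, user_groups: set[str]) -> list[str]:
        # Show only blueprints that at least one of the user's groups is entitled to.
        return sorted(name for name, groups in self.entitlements.items() if groups & user_groups)


# Example: an analytics user sees only the Hadoop blueprint and draws on the analytics reservation.
catalog = Catalog({"micro-hadoop-cluster": {"analytics"}, "sql-server-vm": {"finance"}})
analytics = Reservation(group="analytics", cpu=16, memory_gb=64)
allowed = analytics.try_allocate("analytics", cpu=8, memory_gb=32)  # fits the reservation
blocked = analytics.try_allocate("finance", cpu=2, memory_gb=4)     # wrong group, refused
```

Keeping entitlements (what a user may see) separate from reservations (what capacity a group may consume) mirrors the way the features are listed above as distinct controls.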
Multitenancy and secure separation
The solution provides the ability to enforce physical and virtual separation for
multitenancy, as strongly as the administrator requires. This separation can
encompass network, compute, and storage resources to ensure appropriate security
and performance for each tenant.
The solution supports secure multitenancy through vCAC role-based access control
(RBAC), which enables vCAC roles to be mapped to Microsoft Active Directory groups.
The self-service portal shows only the appropriate views, functions, and operations to
cloud users, based on their role within the business.
Workload-optimized storage
The solution enables customers to take advantage of the proven benefits of EMC
storage in a hybrid cloud environment. Using ViPR storage services, which leverage
the capabilities of EMC VNX® and EMC VMAX® storage systems, the solution provides
software-defined, policy-based management of block- and file-based virtual storage.
ViPR abstracts the storage configuration and presents it as a single storage control
point, enabling cloud administrators to access all heterogeneous storage resources
within a data center as if the resources were a single large array.
Elasticity and service assurance
The solution uses the capabilities of vCAC and various EMC tools to provide the
intelligence and visibility required to proactively ensure service levels in virtual and
cloud environments. Infrastructure administrators can add storage, compute, and
network resources to their resource pools as needed. Cloud users can select from a
range of service levels for compute, storage, and data protection for their applications
and can expand the resources of their virtual machines on demand to achieve the
service levels they expect for their application workloads.
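The ViPR abstraction described under workload-optimized storage, together with the storage service levels that cloud users choose from, can be pictured as a single control point that hides which array serves a request. In the following hypothetical Python sketch, the array names, the "gold" and "silver" labels, and the most-free-space placement rule are assumptions; this guide does not describe how ViPR actually places volumes.

```python
from dataclasses import dataclass


@dataclass
class PhysicalPool:
    array: str          # hypothetical array name, for example a VNX or VMAX system
    service_level: str  # hypothetical tier label such as "gold" or "silver"
    free_gb: int


class VirtualStorage:
    """A single control point over pools on many arrays (illustrative model only)."""

    def __init__(self, pools: list[PhysicalPool]) -> None:
        self.pools = pools

    def service_levels(self) -> list[str]:
        # Consumers see service levels, not individual arrays.
        return sorted({p.service_level for p in self.pools})

    def provision(self, service_level: str, size_gb: int) -> str:
        # Any array offering the requested level and enough free space is a candidate.
        candidates = [p for p in self.pools
                      if p.service_level == service_level and p.free_gb >= size_gb]
        if not candidates:
            raise ValueError(f"no {service_level} capacity for {size_gb} GB")
        # Assumed placement rule: choose the candidate with the most free space.
        target = max(candidates, key=lambda p: p.free_gb)
        target.free_gb -= size_gb
        return target.array


storage = VirtualStorage([
    PhysicalPool("vmax-01", "gold", 500),
    PhysicalPool("vnx-01", "silver", 2000),
    PhysicalPool("vnx-02", "gold", 800),
])
placed_on = storage.provision("gold", 600)  # only vnx-02 has 600 GB of gold capacity
```

The requester names only a service level and a size; which physical array receives the volume is decided behind the control point, which is the property the paragraph attributes to ViPR.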
Operational monitoring and management
The solution features automated monitoring and management capabilities that
provide IT administrators with a comprehensive view of the cloud environment to
enable smart decision-making for resource provisioning and allocation. These
automated capabilities are based on a combination of EMC ViPR Storage Resource
Management (SRM), VMware vCenter Log Insight, and VMware vCenter Operations
Manager (vC Ops), and use EMC plug-ins for ViPR, VNX, VMAX, and EMC Avamar®
systems to provide extensive additional storage detail.
Cloud administrators can use ViPR SRM to understand and manage the impact that
storage has on their applications and to view their storage topologies from
application to disk, as shown in Figure 3.
Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager
Capacity analytics and what-if scenarios in vC Ops identify over-provisioned
resources so they can be right-sized for the most efficient use of virtualized
resources. In addition, for centralized logging, infrastructure components can be
configured to forward their logs to vCenter Log Insight, which then aggregates the
logs from all the disparate sources for analytics and reporting.
Metering and chargeback
The solution uses VMware IT Business Management Suite (ITBM) to provide cloud
administrators with comprehensive metering and cost information across all
business groups in the enterprise. ITBM is integrated into the cloud administrator’s
self-service portal and presents a dashboard overview of the hybrid cloud
infrastructure, as shown in Figure 4.
Figure 4. IT Business Management Suite overview dashboard for hybrid cloud
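At its simplest, metering and chargeback multiply metered usage by unit rates and total the result per business group. The following Python sketch assumes hypothetical rates and usage records; it does not reproduce the ITBM cost model, which this guide does not describe. Decimal is used rather than float so that currency amounts round predictably.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class UsageRecord:
    business_group: str
    vcpu_hours: Decimal
    memory_gb_hours: Decimal
    storage_gb_months: Decimal


# Hypothetical unit rates; real rates come from the organization's own cost model.
RATES = {
    "vcpu_hours": Decimal("0.03"),
    "memory_gb_hours": Decimal("0.01"),
    "storage_gb_months": Decimal("0.10"),
}


def chargeback(records: list[UsageRecord]) -> dict[str, Decimal]:
    """Return the total cost per business group, rounded to cents."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for r in records:
        cost = (r.vcpu_hours * RATES["vcpu_hours"]
                + r.memory_gb_hours * RATES["memory_gb_hours"]
                + r.storage_gb_months * RATES["storage_gb_months"])
        totals[r.business_group] += cost
    return {group: total.quantize(Decimal("0.01")) for group, total in totals.items()}


usage = [
    UsageRecord("analytics", Decimal(720), Decimal(2880), Decimal(500)),
    UsageRecord("finance", Decimal(360), Decimal(720), Decimal(100)),
]
monthly_bill = chargeback(usage)  # analytics: 100.40, finance: 28.00 at the assumed rates
```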
Modular add-on components
The EMC Hybrid Cloud solution provides modular add-on components for the
following services:
• Application services
This add-on solution leverages VMware vCloud Application Director to optimize
application deployment and release management through logical application
blueprints in vCAC. Users can quickly and easily deploy blueprints for
applications and databases such as Microsoft Exchange, Microsoft SQL Server,
Microsoft SharePoint, Oracle, and SAP.
• Data protection services
EMC Avamar and EMC Data Domain® systems provide a backup infrastructure
that offers features such as deduplication, compression, and VMware
integration. By using VMware vCenter Orchestrator (vCO) workflows customized
by EMC, administrators can quickly and easily set up multitier data protection
policies and enable users to select an appropriate policy when they provision
their virtual machines.
• Continuous availability
A combination of EMC VPLEX® virtual storage and VMware vSphere High
Availability (HA) provides the ability to federate information across multiple
data centers over synchronous distances. With virtual storage and virtual
servers working together over distance, the infrastructure can transparently
provide load balancing, real-time remote data access, and improved
application protection.
• Disaster recovery
This add-on solution enables cloud administrators to select disaster recovery
(DR) protection for their applications and virtual machines when they provision
their hybrid cloud environment. ViPR automatically places these systems on
storage that is protected remotely by EMC RecoverPoint® technology. VMware
vCenter Site Recovery Manager automates the recovery of all virtual storage and
virtual machines.
• Platform as a service
The EMC Hybrid Cloud solution provides an elastic and scalable IaaS
foundation for platform-as-a-service (PaaS) and software-as-a-service (SaaS)
services. Pivotal CF provides a highly available platform that enables
application owners to easily deliver and manage applications over the
application lifecycle. The EMC Hybrid Cloud service offerings enable PaaS
administrators to easily provision compute and storage resources on demand
to support scalability and growth in their Pivotal CF enterprise PaaS
environments.
Chapter 3 EMC Hybrid Cloud Hadoop as a Service
This chapter presents the following topics:
Overview
EMC Hybrid Cloud HaaS and IaaS
Pivotal Hadoop
Serengeti
VMware Big Data Extensions
Overview
This chapter identifies and briefly describes the major features and functionality
required to support Pivotal Hadoop as a service and promote scalability in the EMC
Hybrid Cloud environment:
• EMC Hybrid Cloud HaaS and IaaS
• Project Serengeti
• VMware Big Data Extensions (BDE)
• Pivotal Hadoop (PHD)
• HaaS Self-Service Portal
EMC Hybrid Cloud HaaS and IaaS
EMC Hybrid Cloud HaaS is a solution stack made up of EMC Hybrid Cloud (EHC) IaaS
integrated with BDE and PHD. vCAC controls the self-service aspect of the portal, as
shown in Figure 5.
Hadoop is an open-source software program that supports the processing of large
data sets in a distributed computing environment. It is part of the Apache project
sponsored by the Apache Software Foundation. PHD is an Apache Hadoop
distribution.
Deploying a Hadoop cluster using traditional methods is complex and time-
consuming. It typically involves setting up the infrastructure, installing and
configuring the operating system, acquiring the respective Hadoop media, installing
Hadoop components, and finally creating the Hadoop cluster.
This process typically takes weeks and requires a significant skillset. The EMC HaaS
offering simplifies the process by using extensive workflow automation in the EHC
IaaS backend. Through self-service automation, it is now possible to deploy or
expand a Hadoop cluster in minutes using the vCloud Automation Center self-service
portal.
Figure 5. EMC Hybrid Cloud HaaS component overview
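The HaaS workflow automation replaces the manual deployment sequence described above. The following Python sketch shows the general pattern only, not the vCO workflows used in this solution: an ordered list of steps that runs in sequence and stops at the first failure, recording which step failed. The step names mirror the sequence above; the step actions are hypothetical placeholders that only record progress.

```python
from typing import Callable

Step = tuple[str, Callable[[dict], None]]


def run_workflow(steps: list[Step], context: dict) -> list[str]:
    """Run steps in order; return the completed step names, stopping at the first failure."""
    completed: list[str] = []
    for name, action in steps:
        try:
            action(context)
        except Exception as exc:
            # Record where the workflow stopped so an operator can resume or roll back.
            context["failed_step"] = name
            context["error"] = str(exc)
            break
        completed.append(name)
    return completed


def record(label: str) -> Callable[[dict], None]:
    # Hypothetical placeholder action: appends a progress message to the shared context.
    def action(ctx: dict) -> None:
        ctx.setdefault("log", []).append(label)
    return action


STEPS: list[Step] = [
    ("prepare infrastructure", record("infrastructure ready")),
    ("install operating system", record("os installed")),
    ("stage Hadoop media", record("media staged")),
    ("install Hadoop components", record("components installed")),
    ("create Hadoop cluster", record("cluster created")),
]

context: dict = {}
finished = run_workflow(STEPS, context)  # all five placeholder steps complete
```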
Pivotal Hadoop
Pivotal Hadoop (PHD) is Pivotal’s distribution of Apache Hadoop, the open-source
framework for processing large data sets in a distributed computing environment. The
complete PHD platform contains a number of components, not all of which are used
within this solution:
• YARN (Yet Another Resource Negotiator)—a distributed processing framework
that can schedule and execute resource requests from multiple applications
• HBase—a column-oriented database that runs on top of the Hadoop Distributed File
System (HDFS)
• HAWQ—a parallel SQL query engine that combines the merits of the
Greenplum Database Massively Parallel Processing (MPP) relational database
engine and the Hadoop parallel processing framework
• ZooKeeper—a centralized service for maintaining configuration information and
naming, and for providing distributed synchronization and group services
• Hive—a data warehouse infrastructure built on top of Hadoop
• Hadoop MapReduce—a programming model for processing and generating large
data sets with a parallel, distributed algorithm on a cluster
Figure 6 shows the PHD components.
Figure 6. Pivotal Hadoop (PHD) components
Note: YARN, HBase, HAWQ, and Hive are not used in this solution. HAWQ is not
installed by default and must be installed separately. If required, this installation
can be automated by using vCO workflows.
Serengeti
Serengeti is an open source project initiated by VMware to enable the deployment
and management of Hadoop and big data clusters in a vCenter Server managed
environment. The key components are the Serengeti Management Server, which
provides a framework for running big data clusters on vSphere, and a command line
interface that provides tools and utilities that form an administrative interface for
managing and monitoring the cluster environments.
VMware vSphere Big Data Extensions
VMware vSphere Big Data Extensions, or BDE, is a feature within vSphere to support
big data and open source Hadoop distribution workloads. BDE provides an integrated
set of management tools to help enterprises deploy, run, and manage Hadoop on a
common virtual infrastructure. As Figure 7 shows, BDE is installed as a virtual
appliance with a vSphere plug-in, and it controls and monitors Hadoop services. The BDE
virtual appliance runs on top of vSphere and uses the Serengeti Management Server to
control cluster creation by cloning templates through the template server.
BDE is the commercially supported version of Serengeti, the open source project from
VMware. BDE provides the features of Serengeti in an enterprise-ready form, including:
• A supported version of the open source Apache Hadoop distribution
• The Big Data Extensions GUI, which is integrated into the vSphere Web Client, for
performing Hadoop infrastructure and cluster management tasks
• Elasticity-enabled clusters that scale and optimize the use of physical compute
resources in a vSphere environment
Figure 7. BDE and Serengeti stack
Chapter 4 HaaS Component Integration
This chapter presents the following topics:
Overview
Integrating Hadoop components with EMC Hybrid Cloud
Configuring the platform
Overview
This section provides guidance on configuring the services required for Hadoop as a
Service, specifically BDE and PHD, and integrating them with EMC Hybrid Cloud IaaS
services.
Integrating Hadoop components with EMC Hybrid Cloud
To install and configure Hadoop-as-a-Service components, refer to the appropriate
vendor documentation referenced in the installing and configuring sections for each
component in this chapter.
The steps discussed assume that the EMC Hybrid Cloud has been installed and
configured as described in the EMC Hybrid Cloud Solution with VMware – Foundation
Infrastructure Solution Guide 2.5, and that the IaaS, portal, catalog services, and
tenant structure are all in place.
BDE Topology
BDE runs on top of Serengeti. Figure 8 shows the virtual appliance that runs the
Serengeti Management Server and Template Server. BDE provides the GUI for
managing Hadoop clusters, communicating through the Serengeti Management
Server.
Figure 8. BDE and vSphere deployment topology
With VMware vSphere Big Data Extensions, you can deploy Hadoop inside your
VMware vSphere environment. Big Data Extensions is distributed as a downloadable
OVA-based virtual appliance that is imported into an existing environment. BDE
requires vSphere 5.0 or later with Enterprise or Enterprise Plus vSphere licenses.
The basic Apache Hadoop distribution is included by default, and other commercial
Hadoop distributions, such as Pivotal Hadoop, Cloudera Hadoop, Hortonworks
Hadoop, or MapR Hadoop, can be added. This solution uses the Pivotal Hadoop
distribution integrated with the EMC Hybrid Cloud IaaS stack to create Hadoop as a
Service.
After BDE is installed, you can begin creating a virtual Hadoop cluster. You can
specify a number of configuration options including distribution, topology (basic,
compute/storage separation, HBase-only, or custom), and the number and size of the
virtual machines for each of the Hadoop roles (for example, name node, client node,
and data nodes). Note that the options presented in the web interface are only a
fraction of what can be invoked through the advanced command-line tools and API.
When you deploy a Hadoop cluster, BDE clones the appropriate virtual
machines and automatically builds out the cluster. When you are satisfied with the
cluster, you can scale up (increase the memory and CPU resources of the virtual
machines) or scale out (increase the number of virtual machines). For additional
flexibility and efficiency, you can configure the cluster to scale automatically as
the load changes.
Virtualized Hadoop
Some of the benefits of virtualizing Hadoop, for example elasticity and
multi-tenancy, arise from the increased number of deployment options that become
available when Hadoop is virtualized. Figure 9 shows the evolution of virtual Hadoop
from a self-contained model to a tenant-based model.
Figure 9. The evolution of virtual Hadoop
The traditional Hadoop model combines compute and data. This implementation is
straightforward because the physical Hadoop model translates directly into a
virtual machine. However, the ability to scale up and down is limited because the
lifecycle of this type of virtual machine is tightly coupled to the data it manages.
Powering off a virtual machine that combines storage and compute makes its data
inaccessible. Scaling out by adding more nodes requires rebalancing data across
the expanded cluster, so this model is not particularly elastic.
Separating compute from storage in a virtual Hadoop cluster can provide compute
elasticity, enabling mixed workloads to run on the same virtualization platform and
improving resource utilization. This model is simple to configure: an always-available
HDFS data layer is paired with a compute layer made up of a variable number of
TaskTracker nodes that can be expanded and contracted on demand.
Extending the concept of data-compute separation, multiple tenants can be
accommodated on the virtualized Hadoop cluster by running multiple Hadoop
compute clusters against the same data service. In this model, each virtual
compute cluster has its own performance, security, and configuration isolation.
While Hadoop performance using the combined data-compute model on vSphere is
similar to its performance on physical hardware, giving virtualized Hadoop greater
topology awareness can provide the data locality needed to improve performance
when data and compute layers are separated. Topology awareness
allows Hadoop operators to realize elasticity and multi-tenancy benefits when data
storage and computing are separated. Furthermore, topology awareness can improve
reliability when multiple nodes of the same Hadoop cluster are colocated on the
same physical host.
To optimize the data locality and failure group characteristics of virtualized Hadoop:
• Group virtual Hadoop nodes on the same physical host into the same failure
domain, so that Hadoop avoids placing multiple replicas of a block on that host.
• Maximize usage of the virtual network between virtual nodes on the same
physical host. The virtual network has higher throughput and lower latency than
the physical network and does not consume any physical switch bandwidth.
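To make the host-as-failure-domain idea concrete, the following Python sketch shows the kind of mapping a Hadoop topology script returns when each physical ESXi host is treated as a rack. This is similar in spirit to the HOST_AS_RACK option described later in this guide. It is an illustration only and does not represent the mechanism BDE uses internally. The VM-to-host inventory and the addresses are hypothetical. Hadoop invokes the script named in the net.topology.script.file.name property with one or more node addresses and reads one rack path per address from standard output.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script that treats each ESXi host as a rack.

Hadoop runs the script named in net.topology.script.file.name with one or
more node addresses as arguments and reads one rack path per address.
"""
import sys

# Hypothetical inventory: Hadoop node VM address -> ESXi host running that VM.
VM_TO_ESXI_HOST = {
    "10.10.10.11": "esxi-01",
    "10.10.10.12": "esxi-01",
    "10.10.10.13": "esxi-02",
}

# Rack returned for addresses that are not in the inventory.
DEFAULT_RACK = "/default-rack"


def rack_for(node: str) -> str:
    """Map a node to a rack path named after the physical host it runs on."""
    host = VM_TO_ESXI_HOST.get(node)
    return f"/{host}" if host else DEFAULT_RACK


def resolve(nodes: list[str]) -> list[str]:
    """Resolve every address Hadoop passed in, preserving the order."""
    return [rack_for(node) for node in nodes]


if __name__ == "__main__":
    # Hadoop accepts whitespace-separated rack paths on standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

For example, running the script with the arguments 10.10.10.11 10.10.10.13 prints /esxi-01 /esxi-02. HDFS then treats the two nodes as separate failure domains, while nodes that share a host share a rack.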
Configuring the platform
Installing and configuring BDE
Refer to the VMware vSphere Big Data Extensions Administrator's and User's Guide to
install and configure the BDE components required for Hadoop as a Service.
Configuration task order
The following steps outline the high-level tasks you need to perform to install and
configure BDE:
1. Ensure the environment meets the minimum vSphere requirements, correct
licensing is in place, and compute, storage, and networking prerequisites are
met.
2. Configure cluster settings, including vSphere HA, Distributed Resource
Scheduler (DRS), host monitoring, and admission control.
3. Configure network settings using either vSwitch, vSphere Distributed Switch
(vDS), or NSX. Ensure the required ports are configured as part of any firewall
policy.
4. Deploy the BDE OVF file and assign the management network. When you
deploy BDE, the setup wizard asks for a destination port group. This is the
network that the BDE management server uses to communicate with vCenter
Server, so select the port group that corresponds to the management VLAN. If
vCenter Server and BDE are unable to communicate with each other, the
integration fails.
Configuring SSO service
An important step in the configuration process is configuring the SSO service
and the management server IP addresses.
1. As shown in Figure 10, from the left pane in the Deploy OVF Template page
select Customize template.
2. In the VC SSO Lookup Service URL box, type the vCenter Server fully qualified
domain name (FQDN) in the format shown (if the default server name has not
been changed). If you do not specify the FQDN here, the certificate is not
accepted, and a connection issue occurs later between BDE and the Serengeti
server.
3. Under Management Server Network Settings, enter the appropriate IP address
settings.
Figure 10. Configuring the SSO lookup service and management server IP addresses
Starting BDE in vSphere
After successfully installing and configuring BDE within vSphere, power on the BDE
management server and then register BDE within vSphere as the final part of
configuration by performing the following steps:
1. Log in to the vSphere client with administrative privileges.
2. Within the vSphere client, locate the BDE management server. The
management server is located under the datacenter resource pool in which it
was deployed.
3. Select and record the management IP address.
4. Register the management server using the register plugin URL:
https://management-server-ip-address:8443/register-plugin where
management-server-ip-address is the IP address you recorded in step 3.
5. Complete the required registration information and then click Submit.
The BDE icon should now be available in the list of objects within the inventory.
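You can assemble the registration URL from the IP address recorded in step 3 when you script this step, for example as part of an automated build of the environment. The following minimal Python sketch assumes an IPv4 management address. The function name is hypothetical. Validating the address first catches a mistyped IP before you use the URL.

```python
import ipaddress


def register_plugin_url(management_server_ip: str) -> str:
    """Build the BDE plug-in registration URL from the recorded management IP.

    ipaddress.IPv4Address raises ValueError for a malformed address, so a typo
    is caught before the URL is used.
    """
    ip = ipaddress.IPv4Address(management_server_ip.strip())
    return f"https://{ip}:8443/register-plugin"


if __name__ == "__main__":
    # Hypothetical address; substitute the IP recorded in step 3.
    print(register_plugin_url("192.168.10.25"))
```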
Installing and configuring PHD
Before installing and configuring PHD, download the following required components
and make them available for the installation:
• CentOS 6.2 64-bit ISO
• Pivotal Hadoop TAR files
• Oracle JDK 7 64-bit RPM for CentOS
• Big Data Extensions OVF
VMware BDE comes supplied with a default Hadoop distribution from Apache. The
HaaS integration requires that Pivotal Hadoop be installed. Get the Pivotal Hadoop
media and documentation from http://www.gopivotal.com/big-data/pivotal-hd, and
register and obtain the necessary licenses. The following high-level tasks outline the
process to load the media and create a PHD template within the BDE configuration.
Installing PHD
To create the required installation configuration for BDE, use Yum repositories (as
opposed to a TAR-ball). When you create a Yum-deployed Hadoop cluster, the
Hadoop nodes within the cluster download the Red Hat Package Manager (RPM)
packages for the Pivotal Hadoop distribution from the official Yum repositories.
The Pivotal Hadoop distribution must be installed on a 64-bit version of the CentOS
6.x operating system. The Hadoop template virtual machine is used in the cloning
process for creating a Hadoop cluster. After you have deployed the BDE OVF, create
a Yum repository for PHD as outlined below, and then create the template.
Creating a Yum repository for PHD
The steps for configuring PHD with BDE are described in the VMware vSphere Big Data
Extensions Administrator’s and User’s Guide.
Creating a Hadoop template virtual machine
You must use either CentOS 6.2 or CentOS 6.4 to create the Hadoop template virtual
machine. To upgrade from a previous version, refer to the chapter titled “Create a
Hadoop Template Virtual Machine using RHEL Server 6.x” in the VMware vSphere Big
Data Extensions Administrator’s and User’s Guide.
The following steps outline the procedure for creating a Hadoop template virtual
machine:
1. Import the PHD binaries and create PHD media by logging in to the BDE
management server and importing the PHD TAR files into an appropriate
directory structure on the server. Figure 11 shows the binary import process.
Figure 11. Importing Hadoop binaries into BDE management server
2. Test that the import was successful by accessing the URL path from a browser
and ensuring that the expected folders are present.
3. After installing the media into the BDE management server, create a new
Pivotal Hadoop template.
4. Make the new Pivotal Hadoop template the default template by removing the
default Hadoop Apache template from the BDE management server, as shown
in Figure 12.
Figure 12. Removing the default Apache template from BDE
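You can also run the check in step 2 of the procedure above from a script instead of a browser. The following Python sketch parses the HTTP directory listing served at the media URL and reports any expected folders that are missing. It assumes the server returns a standard HTML index page. The folder names used with it are hypothetical and depend on the directory structure you chose in step 1.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect href targets from an HTTP directory-listing page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def missing_folders(listing_html: str, expected: list[str]) -> list[str]:
    """Return the expected folder names that do not appear as links in the listing."""
    parser = _LinkCollector()
    parser.feed(listing_html)
    # Directory links in typical listings end with "/"; normalize before comparing.
    present = {link.rstrip("/") for link in parser.links}
    return sorted(name for name in expected if name.rstrip("/") not in present)


def fetch_listing(url: str) -> str:
    """Download the listing page served at the media URL."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, missing_folders(fetch_listing(url), ["phd", "jdk"]) returns an empty list when both folders are present. In this call, url is the address of the directory you imported into. If the server uses a self-signed certificate, urlopen rejects the connection unless you supply a suitable SSL context.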
Configuring custom resources for BDE
VMware BDE requires two resource types when automating Hadoop clusters:
networking resources and storage resources.
Networking resources
Networking resources determine how IP addresses are assigned to virtual machines. BDE deploys all nodes of
a Hadoop cluster from a single common CentOS template that comes preconfigured
with the BDE vApp management server. As BDE deploys virtual machines into a
cluster, it uses either an existing DHCP server or a statically created IP address pool.
As part of the deployment process, hostnames are assigned by BDE. The hostnames
are the same as the IP addresses. For example, if DHCP assigns 10.10.10.10 then the
hostname of that virtual machine is 10.10.10.10. Hadoop then uses this hostname
for the clusters.
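The following Python sketch shows how hostnames follow IP addresses when BDE draws nodes from a static pool. It assumes addresses are handed out sequentially from the start of the pool. This is a simplification, because this guide does not document BDE's actual allocation order. The function and class names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAddress:
    ip: str
    hostname: str


def allocate_from_static_pool(first_ip: str, last_ip: str, node_count: int) -> list[NodeAddress]:
    """Hand out addresses in order from a static pool, using each IP as the hostname."""
    start = ipaddress.IPv4Address(first_ip)
    end = ipaddress.IPv4Address(last_ip)
    pool_size = int(end) - int(start) + 1
    if pool_size < node_count:
        raise ValueError(f"pool has {max(pool_size, 0)} addresses, {node_count} needed")
    addresses = []
    for offset in range(node_count):
        ip = str(start + offset)
        # As described above, the hostname mirrors the assigned IP address.
        addresses.append(NodeAddress(ip=ip, hostname=ip))
    return addresses
```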
Storage resources
BDE defines two types of storage resources: local and shared. Shared storage is
useful for management or client servers deployed by BDE, because shared storage can
be protected with technologies such as vSphere HA.
Within Hadoop there are two types of nodes: master and worker nodes. Master
nodes provide tracking functions, whereas worker nodes provide job processing
capabilities. Because Hadoop is designed to deal with node failure, worker nodes are
disposable and do not require top-tier storage. There is also no reason to
deploy worker nodes on shared storage. However, the chosen storage must deliver
the level of performance the nodes require. Allowing BDE
to use local VMFS storage for worker nodes is analogous to deploying physical worker
nodes on commodity direct-attached storage.
The final stage of configuration is to assign storage resources to BDE. This defines
how the Hadoop clusters are deployed, using either local or shared datastores. By
default, BDE defines datastores as local. If you need shared datastores, you must
configure the datastores accordingly. Refer to Chapter 6 of the VMware vSphere Big
Data Extensions Administrator’s and User’s Guide for details on how to add
datastores and networks to a cluster from the vSphere client.
Installing and configuring EMC Hybrid Cloud IaaS
For details on installing and configuring the EMC Hybrid Cloud IaaS components,
refer to the EMC Hybrid Cloud Solution with VMware – Foundation Infrastructure
Reference Architecture 2.5. Detailed installation and configuration information is
available only to select EMC personnel and authorized partners.
Chapter 5 Creating vCO Workflows and vCAC
Catalog Services for HaaS
This chapter presents the following topics:
Overview
Importing and modifying custom vCO workflows
Creating vCAC Catalog Services
Overview
The automation of Hadoop clusters is achieved by using custom workflows created
with VMware vCenter Orchestrator (vCO). This chapter describes how these workflows
are configured from within VMware vCloud Automation Center (vCAC) to present
enterprise organizations with a self-service portal that includes a catalog of
preconfigured Hadoop deployment scenarios.
Importing and modifying custom vCO workflows
To use HaaS within EMC Hybrid Cloud, the administrator must use custom vCO
workflows for deploying HaaS. These workflows offer a choice of cluster sizes that can
then be presented as catalog items from the vCloud Automation Center portal. The
workflows are imported into VMware vCO using the vCO import function to be edited,
tested, and packaged according to the needs of the organization.
This section describes the process for importing the custom workflows into vCO, so
that the Hadoop Administrator can alter them and link them with the big data cluster
configurations created in the earlier stages of the process.
Modifying custom workflows
Importing custom workflows
From within the vCO client, as shown in Figure 13, select Run, click Workflows, and
select Import workflow. Browse to the location where you have placed the workflow
package and click Open. The imported workflow appears in the folder selected.
Figure 13. Importing custom workflows into vCO
Validating workflows
After importing the workflows into vCO, validate them by clicking the name of the
folder containing the workflows and then selecting the Validate option from the
context menu, as shown in Figure 14. The validation process ensures there are no
open ends, unreachable workflow elements, or unused attributes in the workflows, so
that they will execute correctly.
Figure 14. Using the validate workflows action
Customizing HaaS workflows
The HaaS workflows provide a framework for deploying each Hadoop cluster
configuration of a given size through an automated workflow. The Hadoop
administrator should modify the attributes of these workflows to meet the specific
needs of the organization. Figure 15 shows how to use the vCO client to edit the
attributes within a workflow.
Figure 15. How to edit the attributes
Configuring custom parameters
To make the workflows dynamic, vCO uses a combination of attributes and
parameters to transfer data when it is processing a workflow. Workflow parameters
must receive an input to generate an output or action. For example, a custom
parameter can receive input from the user or the system and pass it to a command
or script that creates a username or password. The result can in turn be passed to
the Hadoop cluster for authentication.
Figure 16 shows how to create a custom username and password for the Hadoop
Client node.
Figure 16. Editing and creating custom parameter passing
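vCO scriptable tasks are written in JavaScript. The Python sketch below only illustrates the data flow described above. An input parameter is received, a script derives a username and generates a password, and the result is returned as output that a later workflow element could pass to the cluster. The naming convention, the 32-character username limit, and the function name are assumptions for illustration.

```python
import re
import secrets
import string

# Letters and digits only, to avoid quoting problems in later workflow steps.
PASSWORD_ALPHABET = string.ascii_letters + string.digits


def make_hadoop_credentials(requester: str, cluster_name: str, length: int = 16) -> dict[str, str]:
    """Derive a username from workflow input and generate a random password."""
    # Hypothetical convention: <requester>_<cluster>, lowercase, safe characters only.
    base = f"{requester}_{cluster_name}".lower()
    username = re.sub(r"[^a-z0-9_]", "_", base)[:32]
    # secrets provides cryptographically strong random choices.
    password = "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(length))
    return {"username": username, "password": password}
```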
Launching a custom script
Scripts help to edit the schema, which is the main component of a workflow.
Launching individual scripts lets you test the components of the workflow one
element at a time, or execute a script at runtime to prepare the data set, for example.
Figure 17 shows how to launch scripting from within the workflow by using the
Schema panel within the workflow itself.
Figure 17. Launching scripts from vCO
Testing vCO HaaS custom workflows
The previous sections demonstrated how to import the HaaS sample workflows into
vCenter Orchestrator, the main orchestration and automation engine of the EMC
Hybrid Cloud solution. Once imported, the default workflows can be altered to
reflect any changes made to the Hadoop cluster configurations. The workflows can
also be modified to pass any additional parameters that may be required, for
example, a username and password, or to execute additional script components.
The final stage in importing and configuring the workflows is to test the workflows
that have been imported and modified for each of the HaaS cluster sizes (micro
cluster, small cluster, and large cluster). Figure 18 shows how to:
• Select the specific workflow for a given cluster size
• Execute the workflow from vCO
• View the execution process
• Verify the execution progress by checking the log files for any error messages
Figure 18. Launching of Micro Hadoop Cluster workflow
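You can partly automate the last check in the list above, scanning the log files for error messages. The following Python sketch reports every line of a log file that looks like an error. The match patterns are assumptions. Adjust them to the log format that your vCO and BDE versions actually write.

```python
import re
import sys
from typing import Iterable

# Assumed error markers; adjust to the log format your vCO and BDE versions write.
ERROR_PATTERN = re.compile(r"\b(?:ERROR|FATAL)\b|Exception")


def find_errors(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line number, text) for every log line that looks like an error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_errors.py /path/to/workflow.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in find_errors(log):
            print(f"{number}: {text}")
```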
Viewing cluster creation
After the vCO workflow is launched, the cluster creation process starts within vSphere
and BDE. The management server uses the template server to clone the number and
types of nodes that make up the cluster. To view and verify the cluster creation
process, follow these steps:
1. Log in to the vSphere Web Client.
2. Go to Big Data Extensions and view the cluster being created.
Figure 19 shows the status of the creation of a micro Hadoop cluster in the BDE panel
of the vSphere web client.
Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere web client)
You can also log in to the vSphere Client Application and view the Hadoop
cluster being created. Figure 20 shows the status of the creation of the Micro
Hadoop cluster in the vSphere Client Application.
Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client
Creating BDE Clusters
After the vCO workflows are imported, they need to be customized for the different
sized clusters according to the requirements of the enterprise. The examples
provided describe micro, small, and large Hadoop clusters.
The custom workflows define the type of the cluster, including cluster configuration,
in terms of the number of master nodes, client nodes, and data nodes for each size.
Creating a Hadoop cluster
These steps document the procedure for creating a Hadoop cluster within BDE, which
can then be translated into a vCO workflow:
1. In vCenter, under Objects > Data Extensions, click New Big Data Cluster.
2. Follow the steps in the wizard, specifying the appropriate parameters as
required. More detail can be found in the VMware vSphere Big Data
Extensions Administrator’s and User’s Guide.
The following sections outline the options and details required during the cluster
configuration process.
Naming a Hadoop cluster
When prompted by the wizard, type a name to identify the cluster. Valid characters for
cluster names are alphanumeric characters and underscores. When choosing a cluster name, you
should also consider the associated vApp name. Together the vApp and cluster name
must be less than 80 characters.
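A catalog form or workflow can check a requested name against these rules before it calls BDE. The following Python sketch encodes the two rules stated above. It assumes that "alphanumeric" means ASCII letters and digits, and that the 80-character limit applies to the sum of the vApp and cluster name lengths. The function name is hypothetical.

```python
import re

CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_COMBINED_LENGTH = 80  # vApp name + cluster name must be shorter than this


def cluster_name_problems(cluster_name: str, vapp_name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name is acceptable."""
    problems = []
    if not CLUSTER_NAME_PATTERN.fullmatch(cluster_name):
        problems.append("cluster name may contain only letters, digits, and underscores")
    if len(vapp_name) + len(cluster_name) >= MAX_COMBINED_LENGTH:
        problems.append(
            f"vApp and cluster names together must be under {MAX_COMBINED_LENGTH} characters"
        )
    return problems
```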
Configuring the Hadoop distribution
When configuring a Hadoop cluster, you must select the correct Hadoop distribution
from the Hadoop distribution list box. Change the default from Apache to Pivotal HD,
as shown in Figure 21. The distribution name matches the value of the name
parameter that was passed to the config-distro.rb script when the Hadoop
distribution was configured. For a Pivotal PHD 1.1 cluster, you must configure a valid
DNS and FQDN for the cluster's HDFS and MapReduce traffic. Without valid DNS and
FQDN settings, the cluster creation process might fail or the cluster is created but
does not function.
Figure 21. Create and name a new Big Data Cluster
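You can pre-check the DNS and FQDN requirement for PHD 1.1 clusters described above before you start the wizard. The following Python sketch performs three checks. It confirms that a name is fully qualified, that it resolves forward to an IP address, and that the address resolves back to the same name. Treating forward and reverse agreement as "valid DNS" is an assumption rather than a stated PHD requirement. The resolver functions are parameters so that you can test the check without a DNS server.

```python
import socket
from typing import Callable


def check_fqdn(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> list[str]:
    """Return DNS problems for one cluster node name; an empty list means OK."""
    problems = []
    if "." not in fqdn:
        problems.append(f"{fqdn!r} is not fully qualified")
    try:
        ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return problems + [f"forward lookup failed for {fqdn}: {exc}"]
    try:
        name = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return problems + [f"reverse lookup failed for {ip}: {exc}"]
    if name.rstrip(".").lower() != fqdn.rstrip(".").lower():
        problems.append(f"{ip} resolves back to {name}, not {fqdn}")
    return problems
```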
Specifying deployment type
When prompted by the wizard, select the deployment type for the cluster, either Basic
Hadoop Cluster or Data/Compute Separation Cluster. The type of cluster you create
determines the available node group selections.
Creating new BDE
clusters
Configuring a
Hadoop cluster
Identifying the DataMaster node group
The DataMaster node is a virtual machine that runs the Hadoop NameNode service.
This node manages the HDFS namespace and tracks the DataNode services deployed
in the worker node group. To identify the group:
1. Select a resource template from the list box or select Customize to create a
custom resource template.
2. For the master node, specify shared storage so that the virtual machine is
protected with vSphere HA.
Identifying the ComputeMaster node group
The ComputeMaster node is a virtual machine that runs the Hadoop JobTracker
service. This node assigns tasks to Hadoop TaskTracker services deployed in the
worker node group. To identify the group:
1. Select a resource template from the list box or select Customize to create a
custom resource template.
2. For the master node, specify shared storage so that the virtual machine is
protected with vSphere HA.
Identifying the HBaseMaster node group (HBase cluster only)
The HBaseMaster node is a virtual machine that runs the HBase master service. This
node orchestrates a cluster of one or more RegionServer slave nodes. To identify the
group:
1. Select a resource template from the list box or select Customize to create a
custom resource template.
2. For the master node, specify shared storage so that the virtual machine is
protected with vSphere HA.
Identifying the Worker node group
Worker nodes are virtual machines that run the Hadoop DataNode, TaskTracker, and
HBase HRegionServer services. These nodes store HDFS data and execute tasks. To
identify the group:
1. Select a resource template from the list box or select Customize to create a
custom resource template.
2. For the worker nodes, use local storage.
Note: You can add nodes to the worker node group by using Scale Out Cluster, but you
cannot reduce the number of nodes.
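A vCO workflow or catalog form that lets users resize a cluster can reject requests that would shrink the worker node group before those requests reach BDE. The function below is a hypothetical Python guard that reflects the note above.

```python
def nodes_to_add(current_nodes: int, requested_nodes: int) -> int:
    """Return how many nodes to add, rejecting requests that shrink the group."""
    if requested_nodes < current_nodes:
        raise ValueError(
            f"cannot reduce the node group from {current_nodes} to {requested_nodes} "
            "nodes; Scale Out Cluster can only add nodes"
        )
    return requested_nodes - current_nodes
```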
Identifying the Client node group
A client node is a virtual machine that contains Hadoop client components. From this
virtual machine you can access HDFS, submit MapReduce jobs, run Pig scripts, run
Hive queries, and run HBase commands. When configuring the cluster for use with
HaaS, do not configure the client node group unless these capabilities are
required outside of the HaaS solution.
To identify the group:
1. Select a resource template from the list box or select Customize to create a
custom resource template.
2. For the client nodes, use local storage.
Note: You can add nodes to the client node group by using Scale Out Cluster, but you
cannot reduce the number of nodes.
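A workflow can capture the node group choices above in a cluster specification and submit it to BDE. The following Python sketch builds such a specification as a dictionary and prints it as JSON. The field names (nodeGroups, roles, instanceNum, storage, haFlag) and the role names are modelled on Serengeti cluster specification files; verify them against the BDE guide for your version. The storage sizes and node counts are hypothetical placeholders for the micro, small, and large tiers.

```python
import json


def node_group(name: str, roles: list[str], instances: int,
               storage_type: str, size_gb: int, ha: bool) -> dict:
    """Describe one node group (field names modelled on Serengeti specs)."""
    return {
        "name": name,
        "roles": roles,
        "instanceNum": instances,
        "storage": {"type": storage_type, "sizeGB": size_gb},
        "haFlag": "on" if ha else "off",
    }


def build_cluster_spec(worker_count: int, include_client: bool = False) -> dict:
    """Assemble node groups following the storage guidance in this section."""
    groups = [
        # Master groups on shared storage so that vSphere HA can protect them.
        node_group("DataMaster", ["hadoop_namenode"], 1, "SHARED", 50, ha=True),
        node_group("ComputeMaster", ["hadoop_jobtracker"], 1, "SHARED", 50, ha=True),
        # Worker nodes on local storage, like direct-attached commodity disks.
        node_group("Worker", ["hadoop_datanode", "hadoop_tasktracker"],
                   worker_count, "LOCAL", 100, ha=False),
    ]
    if include_client:
        # The client group is optional for HaaS and also uses local storage.
        groups.append(node_group("Client", ["hadoop_client"], 1, "LOCAL", 50, ha=False))
    return {"nodeGroups": groups}


if __name__ == "__main__":
    # Hypothetical "small" tier with three worker nodes.
    print(json.dumps(build_cluster_spec(worker_count=3), indent=2))
```

Master groups use SHARED storage with HA enabled, and the worker and client groups use LOCAL storage, in line with the recommendations for each node group above.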
Selecting the Hadoop topology configuration
When you create a cluster with BDE, BDE disables automatic migration for the
cluster's virtual machines. This prevents vSphere from migrating the virtual machines
automatically, but it does not prevent an administrator from unintentionally
migrating nodes to other hosts. Do not migrate cluster nodes manually from within
vCenter, as this could break the cluster placement policy.
As part of the final cluster configuration, select the topology configuration
that you want the cluster to use: RACK_AS_RACK, HOST_AS_RACK, HVE, or NONE.
More information is available in the section “About Cluster Topology” in Chapter 7 of
the VMware vSphere Big Data Extensions Administrator’s and User’s Guide.
Creating vCAC Catalog Services
The focus of customization for this EMC Hybrid Cloud solution is the VMware vCAC
user self-service portal, which is extended to offer additional services to cloud
users. The final stage of integrating Hadoop as a Service is to present to vCAC the
HaaS workflows that have been imported and modified, so that they can be selected
as catalog items.
VMware vCAC 6.0 provides the extensibility to enable IaaS functionality through
Advanced Service blueprints. The IaaS functionality is achieved by exposing custom
vCO workflows that the vCAC 6.0 portal can present as a catalog of services for cloud
users.
You can create custom workflow definitions using vCAC Designer. The vCAC Designer
console provides a visual workflow editor for customizing vCAC lifecycle workflows.
The extensibility toolkits include a library of activities that serve as building blocks for
custom workflows.
Using the Advanced Service Designer, you can define new service offerings and
publish them to the common catalog as catalog items.
To create the service blueprints you must access vCAC from a browser and log in to
vCAC.
Each tenant has a unique URL to the vCAC console:
• The default tenant URL is in the following format:
https://hostname/shell-ui-app
where hostname is the fully qualified domain name (FQDN) of a vCAC host.
• The URL for additional tenants is in the following format:
https://hostname/shell-ui-app/org/tenantURL
where tenantURL is the URL name specified when the tenant was created.
This is the workspace in which the customer creates catalog services.
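The two URL formats above differ only in the /org/tenantURL suffix. The following Python sketch builds either form. The function name is hypothetical, and the host name in the usage note is an example only.

```python
from typing import Optional
from urllib.parse import quote


def vcac_console_url(hostname: str, tenant_url: Optional[str] = None) -> str:
    """Return the vCAC console URL for the default tenant or a named tenant."""
    base = f"https://{hostname}/shell-ui-app"
    if not tenant_url:
        return base  # default tenant
    # Additional tenants append /org/<tenantURL>; quote() escapes unusual characters.
    return f"{base}/org/{quote(tenant_url, safe='')}"
```

For example, vcac_console_url("vcac.example.com", "finance") returns https://vcac.example.com/shell-ui-app/org/finance.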
The following steps demonstrate, at a high level, how to integrate the HaaS workflows
into the vCAC self-service catalog by showing the creation of:
• Catalog services
• Blueprints
• Custom resources and resource actions
For more information, refer to the vCloud Automation Center Extensibility Guide.
To integrate the HaaS workflows into the vCACA self-service catalog, follow these
steps:
1. From the main vCAC portal page, click Advanced Services to list all of the
current service blueprints defined.
2. Click the green “plus” symbol, shown in Figure 22, to create a new service
blueprint.
Accessing vCAC
Creating a new
service blueprint
Chapter 5: Creating vCO Workflows and vCAC Catalog Services for HaaS
46 EMC Hybrid Cloud Solution with VMware
Hadoop Applications Solution Guide 2.5
Figure 22. Advance Service Designer
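The two tenant URL formats given above can be captured in a small helper. This is a minimal Python sketch: the function name `vcac_console_url` and the example host `vcac01.example.com` are hypothetical, and only the two URL patterns come from the text.

```python
from typing import Optional


def vcac_console_url(hostname: str, tenant_url: Optional[str] = None) -> str:
    """Return the vCAC console URL for a tenant (hypothetical helper).

    hostname   -- FQDN of a vCAC host
    tenant_url -- URL name given to an additional tenant when it was created;
                  None selects the default tenant
    """
    base = f"https://{hostname}/shell-ui-app"
    if tenant_url is None:
        return base  # default tenant
    return f"{base}/org/{tenant_url}"  # additional tenant


print(vcac_console_url("vcac01.example.com"))
print(vcac_console_url("vcac01.example.com", "finance"))
```

Because the default tenant and additional tenants share the same base path, the helper builds the base once and appends `/org/<tenantURL>` only when a tenant name is given.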
Follow these steps to create a new service blueprint:
1. Select one of the imported Hadoop Cluster Creation workflows from the list.
2. Name the new service and create a form to support user input for the required
parameters. If required, delete the default form and create a new form.
3. Drag and drop any appropriate input fields onto the form.
4. Publish the new service to create the appropriate service definition in Catalog Management.
5. In Catalog Management, assign a service to the new advanced service and create the appropriate entitlement definition, as shown in Figure 23.
Figure 23. Edit Entitlement window
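The order of these steps matters: the entitlement is defined against the catalog item that publishing creates. The Python sketch below models that sequence as a toy state object and assumes nothing about the real vCAC API. `ServiceBlueprint`, its methods, and the example workflow, service, and business group names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServiceBlueprint:
    """Toy model of an Advanced Service Designer blueprint (hypothetical, not a vCAC API)."""
    workflow: str                      # imported vCO workflow that the blueprint wraps
    name: str
    form_fields: List[str] = field(default_factory=list)
    published: bool = False
    catalog_service: Optional[str] = None
    entitled_groups: List[str] = field(default_factory=list)

    def add_field(self, name: str) -> None:
        """Step 3: add an input field to the request form."""
        self.form_fields.append(name)

    def publish(self) -> None:
        """Step 4: publishing creates the service definition in Catalog Management."""
        self.published = True

    def entitle(self, service: str, business_group: str) -> None:
        """Step 5: assign a catalog service and entitle a business group."""
        if not self.published:
            raise RuntimeError("publish the blueprint before assigning a service")
        self.catalog_service = service
        self.entitled_groups.append(business_group)

    @property
    def in_catalog(self) -> bool:
        """True once the item is published, assigned to a service, and entitled."""
        return self.published and self.catalog_service is not None and bool(self.entitled_groups)


haas = ServiceBlueprint(workflow="Create Hadoop Cluster", name="HaaS basic cluster")
haas.add_field("clusterName")
haas.publish()
haas.entitle("Hadoop as a Service", "Analytics")
print(haas.in_catalog)
```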
When these tasks are completed, the new service is available in the service catalog for the cloud administrator. You can replace the default VMware logo icons in the service catalog with more suitable HaaS icons. Replacing the icons is the final stage of customization and ensures that the service catalog items are tailored to a specific function or application. To do this, open the Catalog Management menu, select the Catalog Items list box, select the option to configure an icon, and then browse to and select a new icon.
After the configuration stages have been performed within vCAC, the service catalog
is available to provision HaaS items, as shown in Figure 24.
Figure 24. vCAC Service Catalog showing Hadoop as a Service
Chapter 6 Use Cases: EMC Hybrid Cloud IaaS
This chapter presents the following topics:
Overview
IaaS – storage services
Monitoring and capacity planning
Metering and chargeback
Overview
This chapter covers EMC Hybrid Cloud IaaS and other use cases that can be incorporated to extend the functionality for consuming resources beyond virtual machine provisioning.
From time to time, additional physical resources are required to extend a Hadoop environment. The following sections show how EHC storage provisioning workflows can create additional resources on demand by provisioning storage as required. They also show how the VMware vCenter Operations (vC Ops) tool set can be used to analyze consumed resources and to plan capacity with scenarios that add physical resources or increase VM and node capacity.
IaaS – storage services
Overview
Storage is provisioned, allocated, and consumed by different cloud users in this solution.
For vCAC IaaS users, the storage services provided in the vCAC service catalog provision storage resources that are then allocated to and consumed by other cloud users.
After the storage resources are available, fabric group administrators can assign the resources to business groups. Creators of virtual machine blueprints (business group managers) can then configure their blueprints to use those particular storage resources for the virtual machine disks.
When they provision virtual machines, cloud users consume the storage and, depending on their entitlements, may choose the storage service for their virtual machines.
Use case 1: Storage provisioning
This use case demonstrates how ViPR software-defined storage is provisioned for the hybrid cloud from the VMware vCAC self-service catalog.
1. To provision block or file storage from the vCAC self-service portal, select the Provision Cloud Storage item from the vCAC service catalog, as shown in Figure 25.
Figure 25. Storage Services - Provision cloud storage
The storage service blueprint can be created using vCAC anything-as-a-service
(XaaS) functionality in the vCAC Advanced Service Designer. EMC ViPR
provisioning workflows, which are presented by vCO to the vCAC service
catalog, support storage services.
The storage provisioned by the IaaS user enables the fabric group
administrator to make storage resources available to their business group.
The storage provisioning request requires very little input from the vCAC IaaS
user.
The main inputs required are:
• Datastore Type: VMFS or NFS
• Datastore Size
• vCenter Cluster
• Storage Tier
All of these inputs except Datastore Size are selected from prepopulated list boxes whose items are determined by the cluster resources available through vCenter and the virtual pools available in ViPR.
After entering a description and reason for the storage-provisioning request, enter your password. Because the vCenter Server can manage multiple ESXi clusters, you must choose the relevant vCenter cluster to tell the provisioning operation where to assign the storage device. Select a vCenter cluster from the next screen, as shown in Figure 26.
Figure 26. Provision Cloud Storage – select vCenter cluster
2. Select the type of datastore you require from the list of available storage types, as shown in Figure 27. A VMFS datastore requires block storage, while an NFS datastore requires file storage. Other data services, such as disaster recovery and continuous availability, are displayed only if they are detected in the underlying infrastructure.
Figure 27. Storage Provisioning – Select datastore type
3. Select the storage offering from which the new storage device should be provisioned. The list of available storage offerings depends on the selected datastore type (VMFS or NFS) and on which matching virtual pools are available from the ViPR virtual array.
In this example, a single NFS-based ViPR virtual pool is available to provision
storage from, with the available capacity of the virtual pool also displayed to
the user, as shown in Figure 28.
The storage pools listed have been configured in the EMC ViPR virtual array
and their storage capabilities are associated with storage profiles created in
vCenter.
Figure 28. Storage provisioning – Choose ViPR storage pool
4. Enter the size required for the new storage, in GB, as shown in Figure 29.
Figure 29. Storage provisioning – Enter storage size
5. The fabric group administrator must reserve the new Storage Pool for use by
the business group, as shown in Figure 30.
Figure 30. Provision Storage – Storage Reservation for vCAC Business Group
When the automated process sends an email notification to the fabric group
administrator that the storage is ready and available in vCAC, the fabric group
administrator can then assign capacity reservations on the device for use by
the business group.
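The choices in steps 2 through 4 narrow each other: the datastore type fixes whether block or file storage is needed, which in turn filters the ViPR virtual pools offered. The Python sketch below illustrates that filtering and a basic check of the request. `VirtualPool`, the pool and cluster names, and the rule that the size must fit in the pool's displayed free capacity are assumptions for illustration, not the ViPR or vCO implementation.

```python
from dataclasses import dataclass
from typing import List

# From the text: a VMFS datastore needs block storage, an NFS datastore needs file storage.
DATASTORE_STORAGE_TYPE = {"VMFS": "block", "NFS": "file"}


@dataclass
class VirtualPool:
    """Hypothetical view of a ViPR virtual pool as listed in the request form."""
    name: str
    storage_type: str  # "block" or "file"
    free_gb: int


def offerings_for(datastore_type: str, pools: List[VirtualPool]) -> List[VirtualPool]:
    """Step 3: keep only the virtual pools that match the chosen datastore type."""
    if datastore_type not in DATASTORE_STORAGE_TYPE:
        raise ValueError(f"unsupported datastore type: {datastore_type}")
    wanted = DATASTORE_STORAGE_TYPE[datastore_type]
    return [p for p in pools if p.storage_type == wanted]


def validate_request(datastore_type: str, size_gb: int, cluster: str, pool_name: str,
                     clusters: List[str], pools: List[VirtualPool]) -> VirtualPool:
    """Check a request against the prepopulated choices and return the selected pool."""
    if cluster not in clusters:
        raise ValueError(f"unknown vCenter cluster: {cluster}")
    matches = [p for p in offerings_for(datastore_type, pools) if p.name == pool_name]
    if not matches:
        raise ValueError(f"{pool_name} does not offer {datastore_type} storage")
    pool = matches[0]
    # Assumption: the size (step 4, in GB) must be positive and fit in the pool's free capacity.
    if not 0 < size_gb <= pool.free_gb:
        raise ValueError(f"size must be between 1 and {pool.free_gb} GB")
    return pool


pools = [VirtualPool("Tier2-NFS", "file", 2048), VirtualPool("Tier1-Block", "block", 4096)]
print([p.name for p in offerings_for("NFS", pools)])
print(validate_request("NFS", 500, "HaaS-Cluster01", "Tier2-NFS", ["HaaS-Cluster01"], pools).name)
```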
In this example, a number of required input values, such as LUN or datastore name,
have been masked from the user during the storage provisioning request process.
Some of these values are locked-in and managed by the orchestration process and
logic to ensure consistency.
In addition to the initial provisioning of storage to the ESXi cluster at the vSphere
layer, this solution provides further automation and integration of the new storage up
into the vCAC layer. The ViPR storage provider automatically tags the storage device
with the appropriate storage profile based on its storage capabilities.
The remaining automated steps in this solution are:
• vCAC rediscovery of resources under the vCenter endpoint
• vCAC storage reservation policy assigned to the new datastore
• vCAC fabric group administrator notification of availability of the new datastore
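These follow-up steps run in a fixed order, and the notification is presumably meaningful only after the earlier steps have succeeded. The Python sketch below shows that sequencing under the assumption that each step can be wrapped as a callable. `run_post_provisioning` and the step actions in the usage are hypothetical placeholders, not vCO or vCAC calls.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], None]]


def run_post_provisioning(datastore: str, steps: List[Step]) -> List[str]:
    """Run the automated follow-up steps for a new datastore, in order.

    An exception in any step propagates and stops the sequence, so the
    notification step is never reached for a half-configured datastore.
    """
    completed: List[str] = []
    for name, action in steps:
        action(datastore)
        completed.append(name)
    return completed


# Placeholder actions that only print what a real workflow step would do.
steps = [
    ("rediscover", lambda ds: print(f"rediscover resources for {ds}")),
    ("reserve", lambda ds: print(f"assign storage reservation policy to {ds}")),
    ("notify", lambda ds: print(f"notify fabric group administrator about {ds}")),
]
print(run_post_provisioning("HaaS01-ds01", steps))
```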
Use case 2: Select virtual machine storage
This use case demonstrates how cloud users can consume the available storage service offerings. It is part of the broader virtual machine deployment use case, but here it relates directly to how business group managers and users can manage the storage service offerings available to them.
VMware vCAC business group managers and users can select the appropriate storage for their virtual machines through the VMware vCAC user portal.
For business group managers, the storage type for the virtual machine disks can be set during the creation of a virtual machine blueprint. As shown in Figure 31, the relevant storage reservation policy can be applied to each of the virtual disks.
Figure 31. Set storage reservation policy for virtual machine disks
After the storage reservation policy is set, the blueprint always deploys this virtual machine and its virtual disks to that storage type. If more user control is required, the business group manager can allow business group users to reconfigure the storage reservation policies at deployment time by selecting the checkbox Allow user to see and change storage reservation policies.
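The interaction between the per-disk policies set by the business group manager and the "Allow user to see and change storage reservation policies" checkbox can be modeled as below. This is a Python sketch under assumptions: `BlueprintStorage`, the disk identifiers, and the policy names are hypothetical, and the real vCAC behavior may differ in detail.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BlueprintStorage:
    """Toy model of per-disk storage reservation policies in a vCAC blueprint."""
    disk_policies: Dict[str, str]      # virtual disk -> storage reservation policy
    allow_user_change: bool = False    # "Allow user to see and change storage reservation policies"

    def resolve(self, requested: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        """Return the policy each disk is deployed with."""
        requested = requested or {}
        if requested and not self.allow_user_change:
            raise PermissionError("this blueprint does not let users change storage reservation policies")
        unknown = set(requested) - set(self.disk_policies)
        if unknown:
            raise KeyError(f"unknown disks: {sorted(unknown)}")
        # Manager-defined policies, overridden by the user's choices where permitted.
        return {**self.disk_policies, **requested}


locked = BlueprintStorage({"disk0": "Tier1", "disk1": "Tier2"})
flexible = BlueprintStorage({"disk0": "Tier1", "disk1": "Tier2"}, allow_user_change=True)
print(locked.resolve())
print(flexible.resolve({"disk1": "Tier3"}))
```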
Use case 3: Metering storage services
This solution uses VMware IT Business Management Suite (ITBM) to provide chargeback information on the storage service offerings for the hybrid cloud. Through its integration with VMware vCenter and vCAC, ITBM enables the cloud administrator to automatically track utilization of storage resources provided by EMC ViPR.
The EMC ViPR VASA provider in vCenter automatically captures the underlying storage capabilities of LUNs provisioned from virtual pools on the EMC ViPR virtual array. Storage profiles are created based on these storage capabilities, which are aligned with the storage service offerings. This integration enables ITBM to automatically discover and group datastores based on predefined storage service levels.
In this solution, we created a separate virtual machine storage profile for each of the storage service offerings, as shown in Figure 32.
Figure 32. Create new virtual machine storage profile for Tier 2 storage
The storage capabilities are displayed automatically in vSphere, as shown in Figure 33, where Tier 2 EMC ViPR storage supports a datastore.
Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider
Note: Storage capabilities are visible only in the traditional vSphere client, not in the web client. Also, the web client uses virtual machine storage policies in place of virtual machine storage profiles.
After the EMC ViPR Storage Provider has automatically configured the datastores with the appropriate storage profiles, the datastores can be grouped and managed in ITBM according to their storage profile. Figure 34 shows that the storage profiles created in vCenter are discovered by ITBM. This allows the business management administrator to group tiered datastores provisioned with ViPR and set the monthly cost per GB as needed.
Figure 34. VMware ITBM chargeback based on storage profile of datastore
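The grouping described here, with datastores grouped by storage profile and a monthly cost per GB set for each group, can be approximated as follows. This Python sketch assumes the charge is based on each datastore's provisioned capacity; the datastore names, tier names, and rates are hypothetical, and ITBM's own cost model may be more detailed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def monthly_storage_cost(datastores: Iterable[Tuple[str, str, float]],
                         cost_per_gb: Dict[str, float]) -> Dict[str, float]:
    """Group datastores by storage profile and total the monthly cost of each group.

    datastores  -- (name, storage_profile, capacity_gb) tuples
    cost_per_gb -- monthly cost per GB for each storage profile, set by the administrator
    """
    totals: Dict[str, float] = defaultdict(float)
    for _name, profile, capacity_gb in datastores:
        totals[profile] += capacity_gb * cost_per_gb[profile]
    return dict(totals)


datastores = [("HaaS01-ds1", "Tier 2", 500), ("HaaS01-ds2", "Tier 2", 250), ("HaaS02-ds1", "Tier 1", 100)]
rates = {"Tier 1": 0.50, "Tier 2": 0.25}  # hypothetical monthly cost per GB
print(monthly_storage_cost(datastores, rates))
```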
Summary
VMware vCAC can provide a storefront for storage services to be used by cloud users. These service catalog items deploy EMC ViPR software-defined storage services using multiple block and file storage service offerings across EMC VNX and VMAX storage arrays. Each service offers varying levels of availability, capacity, and performance to satisfy the operational requirements of different lines of business.
This solution combines EMC ViPR and VMware vSphere with EMC array-based, FAST-enabled storage service offerings across the EMC storage arrays to simplify storage operations for hybrid cloud consumers.
Monitoring and capacity planning
Monitoring
The vCenter Operations Management Suite has functions that can help HaaS administrators achieve the following goals:
• Eliminate or significantly reduce the manual problem-solving effort in the environment.
• Proactively manage core service and cloud infrastructure performance, and utilize infrastructure resources optimally.
• Provide proactive warnings about performance issues before problems affect the end user. Real-time performance dashboards enable service providers to meet their SLAs by highlighting potential performance issues before end users notice them.
Infrastructure maintenance and operations teams need end-to-end visibility and intelligence to make fast, informed operational decisions that proactively ensure service levels in cloud environments. They need to get to the root cause of performance problems promptly, optimize capacity in real time, and maintain compliance in a dynamic environment of constant change.
The vCenter Operations Management Suite offers many features and functions to deliver quality of service, operational efficiency, and continuous compliance for your dynamic cloud infrastructure and business-critical applications.
Capacity planning
This section describes in detail the capacity planning functions that can help you predict the impact of new HaaS deployments, or of upgrading current HaaS instances with new services, on the underlying infrastructure.
Forecasting capacity risks in vCenter Operations Manager involves creating what-if scenarios to examine the demand and supply of resources in the cloud infrastructure. A what-if scenario is a supposition about how capacity and load might change if certain conditions occur, such as an increase or decrease in the number of ESX hosts, storage resources, or virtual machines in the environment, without making actual changes to your virtual infrastructure. If you then implement the scenario, you know in advance what your capacity requirements are.
To create a what-if scenario, you can use models and profiles based on current resource consumption in the existing environment. Alternatively, you can manually define amounts of virtual machine RAM, storage, CPU, and utilization in a new consumption profile, as shown in Figure 35, to predict the potential impact of growth.
Figure 35. Choosing virtual machine consumption models and profiles
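At its simplest, a what-if scenario for new virtual machines subtracts the projected demand of a consumption profile from the free capacity of the cluster. The Python sketch below shows that arithmetic with hypothetical numbers. It is purely additive and ignores overcommitment, HA buffers, and utilization trends, which a real capacity model such as vCenter Operations Manager's would account for.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Resources:
    vcpus: int
    ram_gb: int
    storage_gb: int


def what_if(free: Resources, profile: Resources, new_vms: int) -> Dict[str, int]:
    """Capacity left in each resource if new_vms VMs of the given profile were added.

    A negative value is a shortfall. The model is purely additive.
    """
    return {
        "vcpus": free.vcpus - new_vms * profile.vcpus,
        "ram_gb": free.ram_gb - new_vms * profile.ram_gb,
        "storage_gb": free.storage_gb - new_vms * profile.storage_gb,
    }


def shortfalls(result: Dict[str, int]) -> List[str]:
    """Names of the resources that would run out."""
    return [name for name, left in result.items() if left < 0]


def max_new_vms(free: Resources, profile: Resources) -> int:
    """Largest number of profile-sized VMs that fit in every resource at once."""
    return min(free.vcpus // profile.vcpus,
               free.ram_gb // profile.ram_gb,
               free.storage_gb // profile.storage_gb)


free = Resources(vcpus=160, ram_gb=1024, storage_gb=20000)   # hypothetical free capacity
profile = Resources(vcpus=4, ram_gb=32, storage_gb=200)       # hypothetical Hadoop node VM
result = what_if(free, profile, 50)
print(result, shortfalls(result), max_new_vms(free, profile))
```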
When you define a new virtual machine profile, you can specify resource utilization, reservations, and limits in detail to obtain as accurate a projection as possible, as shown in Figure 36.
Figure 36. Specifying configuration and projected capacity usage of new virtual machines
Figure 37 shows that there are insufficient resources for a planned deployment scenario of either 50 or 85 new virtual machines. In this case, you can provision new vSphere hosts using vCAC services, as described in previous sections.
Figure 37. Capacity summary showing insufficient CPU and RAM resources
Before you provision new hardware resources, you can create hardware change
scenarios to determine the effect of adding, removing, or updating the hardware
capacity in a vSphere cluster. You can create a scenario that models changes to hosts
and datastores, as shown in Figure 38 and Figure 39.
Figure 38. Specifying number of hosts and amount of CPU and memory
Figure 39. Specifying datastore size
The what-if scenario capacity planning function allows you to compare how adding different numbers of virtual machines and amounts of hardware will affect your actual environment, as shown in Figure 40.
Figure 40. Compared scenarios
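Comparing hardware change scenarios amounts to recomputing free capacity for each candidate number of new hosts and checking it against the planned demand. The Python sketch below does that for a few hypothetical host counts and picks the smallest one with no shortfall. It assumes identical hosts whose capacity is expressed in the same units as the demand, and all figures are illustrative.

```python
from typing import Dict, Iterable, Optional

Capacity = Dict[str, int]


def add_hosts(free: Capacity, hosts: int, per_host: Capacity) -> Capacity:
    """Free capacity after adding `hosts` identical hosts to the cluster."""
    return {k: free[k] + hosts * per_host.get(k, 0) for k in free}


def compare(free: Capacity, per_host: Capacity, demand: Capacity,
            host_options: Iterable[int]) -> Dict[int, Capacity]:
    """Capacity left after `demand` for each candidate number of new hosts."""
    results = {}
    for n in host_options:
        after = add_hosts(free, n, per_host)
        results[n] = {k: after[k] - demand.get(k, 0) for k in after}
    return results


def smallest_fit(results: Dict[int, Capacity]) -> Optional[int]:
    """Fewest new hosts whose scenario leaves no resource negative, if any."""
    fits = [n for n, left in results.items() if all(v >= 0 for v in left.values())]
    return min(fits) if fits else None


free = {"vcpus": 160, "ram_gb": 1024}          # hypothetical free capacity today
per_host = {"vcpus": 32, "ram_gb": 256}        # hypothetical blade server
demand = {"vcpus": 200, "ram_gb": 1600}        # e.g. 50 VMs x (4 vCPU, 32 GB)
scenarios = compare(free, per_host, demand, [0, 2, 3])
print(scenarios, smallest_fit(scenarios))
```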
Capacity planning example
In a planning exercise, assume that you:
• Have a request to deploy an additional 45 Hadoop node instances in the existing HaaS.
• Plan to purchase blade servers compliant with a certain specification.
• Want to deploy an additional 25 Hadoop clusters.
In Figure 41, each column shows how an individual change affects resources in your
environment. The Combined Scenarios column shows you the cumulative effect of
hardware purchasing and an overall expansion of 70 virtual machines.
Figure 41. Combined scenarios
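The Combined Scenarios column adds up the individual changes. Following the text, the 45 node instances and the 25 clusters together count as an overall expansion of 70 virtual machines. The Python sketch below reproduces that bookkeeping under the assumption that the effects of the scenarios simply add. The blade server count is hypothetical, because the text does not give one.

```python
from collections import Counter
from typing import Dict, List


def combine(scenarios: List[Dict[str, int]]) -> Dict[str, int]:
    """Cumulative change across individual scenarios, assuming their effects simply add up."""
    total: Counter = Counter()
    for scenario in scenarios:
        total.update(scenario)
    return dict(total)


node_request = {"new_vms": 45}        # 45 additional Hadoop node instances
blade_purchase = {"new_hosts": 4}     # hypothetical: four blade servers
new_clusters = {"new_vms": 25}        # counted as 25 VMs, matching the text's total of 70

print(combine([node_request, blade_purchase, new_clusters]))
```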
Metering and chargeback
VMware ITBM provides cloud administrators with comprehensive metering and cost information across physical and virtual resources in the EMC Hybrid Cloud environment. In addition to calculating the cost of physical components such as storage, compute, and networking resources, you can include and configure other factors that affect the overall cost of your cloud environment, such as operating system licensing, maintenance, labor, and environmental facilities costs, as shown in Figure 42.
Figure 42. Categorized hybrid cloud environment cost overview
ITBM is integrated into the vCAC portal for the Hadoop administrator and presents a
dashboard overview of the hybrid cloud infrastructure.
VMware ITBM Standard Edition uses its own reference database, preloaded with industry-standard and vendor-specific data, to generate base prices for virtual CPU (vCPU), RAM, and storage. vCAC automatically consumes these base prices as the default costs for CPU, RAM, and storage, and the cloud administrator can change them as appropriate. This eliminates the need to manually configure cost profiles in vCAC and assign them to compute resources.
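To see how unit prices for vCPU, RAM, and storage turn into a per-VM figure, the Python sketch below applies a simple linear model. The linear formula, the `UnitRates` type, and all rates are assumptions for illustration. ITBM derives its own base prices from its reference database and may weight costs differently. The usage also shows an administrator override of one default price.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UnitRates:
    """Monthly price per unit (hypothetical values; ITBM supplies its own defaults)."""
    per_vcpu: float
    per_ram_gb: float
    per_storage_gb: float


def vm_monthly_cost(vcpus: int, ram_gb: float, storage_gb: float, rates: UnitRates) -> float:
    """Assumed linear cost model: units consumed times the unit price, summed."""
    return vcpus * rates.per_vcpu + ram_gb * rates.per_ram_gb + storage_gb * rates.per_storage_gb


defaults = UnitRates(per_vcpu=10.0, per_ram_gb=5.0, per_storage_gb=0.25)
adjusted = replace(defaults, per_ram_gb=4.0)   # administrator changes one default price
print(vm_monthly_cost(4, 32, 200, defaults), vm_monthly_cost(4, 32, 200, adjusted))
```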
ITBM is also integrated with vCenter and can import existing resource hierarchies,
folder structures, and vCenter tags to associate EMC Hybrid Cloud resource usage
with business units, departments, and projects.
Infrastructure resources consumed by HaaS instances and hosted applications are
provided by dedicated vSphere clusters with associated vSphere hosts and
datastores. ITBM provides you with detailed information about:
• Number of vSphere hosts in the vSphere cluster and the number of virtual machines on each host
• CPU and RAM capacity and utilization of the vSphere cluster
• Overall cost of the compute resources provided by the dedicated vSphere cluster
• Cluster cost by virtual machine
The Clusters tab provides you with insight into the cost of the vSphere cluster
resources consumed by Hadoop cluster instances. You can monitor costs while
provisioning new hosts, as shown in Figure 43.
Figure 43. vSphere Cluster cost overview
The Datastores tab provides insight into the cost of the storage resources consumed by an HaaS instance. The name of a datastore provisioned by vCAC storage services includes the cluster name as a prefix. Sorting by datastore name therefore gives you a list of the names and costs of the datastores provisioned and assigned to hosts in the vSphere cluster, as shown in Figure 44.
Figure 44. Storage cost overview
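Because each datastore name carries its cluster name as a prefix, sorting or filtering by name is enough to isolate one HaaS cluster's storage costs. The Python sketch below does that and totals the result. It assumes a hypothetical `<cluster>-<suffix>` naming pattern, and the separator guards against one cluster name being a prefix of another. The names and costs are illustrative only.

```python
from typing import Dict, List, Tuple


def datastores_for_cluster(costs: Dict[str, float],
                           cluster: str) -> Tuple[List[Tuple[str, float]], float]:
    """Datastores named "<cluster>-...", sorted by name, plus their total monthly cost."""
    prefix = cluster + "-"  # assumed naming convention
    rows = sorted((name, cost) for name, cost in costs.items() if name.startswith(prefix))
    return rows, sum(cost for _, cost in rows)


costs = {"HaaS01-ds02": 120.0, "HaaS01-ds01": 80.0, "HaaS02-ds01": 60.0}
rows, total = datastores_for_cluster(costs, "HaaS01")
print(rows, total)
```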
Chapter 7 Conclusion
This chapter presents the following topics:
Summary
Summary
Pivotal Hadoop is designed to create an easy-to-scale big data framework. To achieve
this kind of flexibility, HaaS is designed around the modular system components of
Pivotal Hadoop. Using vCenter Orchestrator workflows, the administrator can provide
fixed cluster configuration catalog items or create dynamic workflows that can be
called from a catalog. The size of the nodes used is determined by the individual
making the request.
Elastic provisioning refers to the ability to provision flexible computing resources
when and where they are required and to easily scale resources up and down to
match demand. Resource elasticity can relate to processing power, memory, storage,
bandwidth, and so on. This document indicates the importance of having an elastic
and scalable IaaS platform on which to support the hosting of dynamically changing
and fast-growing big data platforms.
VMware vCenter Operations Manager enables you to deliver quality of service, attain operational efficiency, and assess current capacity while forecasting the effect of future HaaS deployments or upgrades in your cloud infrastructure.
HaaS clusters can grow to a large number of node instances, and the maximum can be changed through the BDE configuration parameters. It is therefore crucial to have proactive performance monitoring and capacity planning solutions in place.
To support comprehensive, dynamic, and fast-growing development environments such as Hadoop as a service, you must ensure the stability of the underlying cloud compute infrastructure, which must provide availability, scalability, flexibility, and performance to the big data platform and its services. To address these challenges, this document has described simple provisioning from a self-service catalog and considerations for building scalable Hadoop-as-a-service environments, with an elastic and easy-to-deploy underlying IaaS infrastructure provided by the EMC Hybrid Cloud solution.
Appendix A References
This appendix presents the following topic:
References
VMware references
The following VMware documents provide additional and relevant information:
• Advanced Service Design vCloud Automation Center 6.0
• Installing and Configuring VMware vCenter Orchestrator
• VMware Compatibility Guide
• VMware vSphere Big Data Extensions Administrator’s and User’s Guide: vSphere Big Data Extensions 1.0
• Installing and Configuring VMware vSphere Big Data Extensions (Video)

Flss Test PlanFlss Test Plan
Flss Test PlanSara M
 
Cybercrime and the Healthcare Industry
Cybercrime and the Healthcare IndustryCybercrime and the Healthcare Industry
Cybercrime and the Healthcare IndustryEMC
 

Andere mochten auch (20)

Fri rights of man
Fri rights of manFri rights of man
Fri rights of man
 
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics WorkbenchPivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
Pivotal: Operationalizing 1000 Node Hadoop Cluster - Analytics Workbench
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Tuesday voltaire
Tuesday voltaireTuesday voltaire
Tuesday voltaire
 
IT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare ProvidersIT-as-a-Service Solutions for Healthcare Providers
IT-as-a-Service Solutions for Healthcare Providers
 
Chromatography lect 2
Chromatography lect 2Chromatography lect 2
Chromatography lect 2
 
Minimum wage mon042514
Minimum wage mon042514Minimum wage mon042514
Minimum wage mon042514
 
Tues ind rev problems pollution
Tues ind rev problems pollutionTues ind rev problems pollution
Tues ind rev problems pollution
 
Creating a VMware Software-Defined Data Center Reference Architecture
Creating a VMware Software-Defined Data Center Reference Architecture Creating a VMware Software-Defined Data Center Reference Architecture
Creating a VMware Software-Defined Data Center Reference Architecture
 
Fed fiscal monetary policy
Fed fiscal monetary policyFed fiscal monetary policy
Fed fiscal monetary policy
 
Fri evaluate crusades
Fri evaluate crusadesFri evaluate crusades
Fri evaluate crusades
 
Diminishing marginal returns
Diminishing marginal returnsDiminishing marginal returns
Diminishing marginal returns
 
Trabajo Telenchana Cristian
Trabajo Telenchana CristianTrabajo Telenchana Cristian
Trabajo Telenchana Cristian
 
การนำเสนอในการประชุม
การนำเสนอในการประชุมการนำเสนอในการประชุม
การนำเสนอในการประชุม
 
Dedupe-Centric Storage for General Applications
Dedupe-Centric Storage for General Applications Dedupe-Centric Storage for General Applications
Dedupe-Centric Storage for General Applications
 
Substitutes income effect
Substitutes income effectSubstitutes income effect
Substitutes income effect
 
Modelli di business e di servizio digitali nell'industria dell'informazione
Modelli di business e di servizio digitali nell'industria dell'informazioneModelli di business e di servizio digitali nell'industria dell'informazione
Modelli di business e di servizio digitali nell'industria dell'informazione
 
Flss Test Plan
Flss Test PlanFlss Test Plan
Flss Test Plan
 
Cybercrime and the Healthcare Industry
Cybercrime and the Healthcare IndustryCybercrime and the Healthcare Industry
Cybercrime and the Healthcare Industry
 
Trabajo cristian 2
Trabajo cristian 2Trabajo cristian 2
Trabajo cristian 2
 

Ähnlich wie EMC Hybrid Cloud Solution with VMware: Hadoop Applications Solution Guide 2.5

EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...EMC
 
Analyzing SAP Performance with VMware vRealize Operations (vROps)
Analyzing SAP Performance with VMware vRealize Operations (vROps)Analyzing SAP Performance with VMware vRealize Operations (vROps)
Analyzing SAP Performance with VMware vRealize Operations (vROps)Blue Medora
 
H13531.1 eehc-federation-sddc-ra
H13531.1 eehc-federation-sddc-raH13531.1 eehc-federation-sddc-ra
H13531.1 eehc-federation-sddc-rasri200012
 
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...EMC
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC
 
Livre blanc technique sur l’architecture de référence
Livre blanc technique sur l’architecture de référenceLivre blanc technique sur l’architecture de référence
Livre blanc technique sur l’architecture de référenceMicrosoft France
 
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...EMC
 
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...EMC
 
Practical Guide to Business Continuity & Disaster Recovery
Practical Guide to Business Continuity & Disaster RecoveryPractical Guide to Business Continuity & Disaster Recovery
Practical Guide to Business Continuity & Disaster Recoveryatif_kamal
 
Intro to embedded systems programming
Intro to embedded systems programming Intro to embedded systems programming
Intro to embedded systems programming Massimo Talia
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Banking at Ho Chi Minh city
 
Pervasive Video in the Enterprise
Pervasive Video in the EnterprisePervasive Video in the Enterprise
Pervasive Video in the EnterpriseAvaya Inc.
 
Backing up web sphere application server with tivoli storage management redp0149
Backing up web sphere application server with tivoli storage management redp0149Backing up web sphere application server with tivoli storage management redp0149
Backing up web sphere application server with tivoli storage management redp0149Banking at Ho Chi Minh city
 
Q T P Tutorial
Q T P  TutorialQ T P  Tutorial
Q T P Tutorialrosereddy
 
Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Banking at Ho Chi Minh city
 
Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Banking at Ho Chi Minh city
 

Ähnlich wie EMC Hybrid Cloud Solution with VMware: Hadoop Applications Solution Guide 2.5 (20)

EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Backup Solution G...
 
Analyzing SAP Performance with VMware vRealize Operations (vROps)
Analyzing SAP Performance with VMware vRealize Operations (vROps)Analyzing SAP Performance with VMware vRealize Operations (vROps)
Analyzing SAP Performance with VMware vRealize Operations (vROps)
 
H13531.1 eehc-federation-sddc-ra
H13531.1 eehc-federation-sddc-raH13531.1 eehc-federation-sddc-ra
H13531.1 eehc-federation-sddc-ra
 
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...
EMC Enterprise Hybrid Cloud 2.5.1, Federation SDDC Edition: Foundation Infras...
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter Kit
 
ESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdfESM_InstallGuide_5.6.pdf
ESM_InstallGuide_5.6.pdf
 
Livre blanc technique sur l’architecture de référence
Livre blanc technique sur l’architecture de référenceLivre blanc technique sur l’architecture de référence
Livre blanc technique sur l’architecture de référence
 
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...
White Paper: Using VPLEX Metro with VMware High Availability and Fault Tolera...
 
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...
White Paper: EMC Compute-as-a-Service — EMC Ionix IT Orchestrator, VCE Vblock...
 
Practical Guide to Business Continuity & Disaster Recovery
Practical Guide to Business Continuity & Disaster RecoveryPractical Guide to Business Continuity & Disaster Recovery
Practical Guide to Business Continuity & Disaster Recovery
 
Bb sql serverdell
Bb sql serverdellBb sql serverdell
Bb sql serverdell
 
Qtp tutorial
Qtp tutorialQtp tutorial
Qtp tutorial
 
Intro to embedded systems programming
Intro to embedded systems programming Intro to embedded systems programming
Intro to embedded systems programming
 
Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951Pda management with ibm tivoli configuration manager sg246951
Pda management with ibm tivoli configuration manager sg246951
 
Acad acg
Acad acgAcad acg
Acad acg
 
Pervasive Video in the Enterprise
Pervasive Video in the EnterprisePervasive Video in the Enterprise
Pervasive Video in the Enterprise
 
Backing up web sphere application server with tivoli storage management redp0149
Backing up web sphere application server with tivoli storage management redp0149Backing up web sphere application server with tivoli storage management redp0149
Backing up web sphere application server with tivoli storage management redp0149
 
Q T P Tutorial
Q T P  TutorialQ T P  Tutorial
Q T P Tutorial
 
Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...
 
Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...Deployment guide series ibm tivoli composite application manager for web reso...
Deployment guide series ibm tivoli composite application manager for web reso...
 

Mehr von EMC

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDEMC
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote EMC
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOEMC
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremioEMC
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History EMC
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewEMC
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeEMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic EMC
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityEMC
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeEMC
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 

Mehr von EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 

Kürzlich hochgeladen

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Kürzlich hochgeladen (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

EMC Hybrid Cloud Solution with VMware: Hadoop Applications Solution Guide 2.5

Contents
Chapter 1 Executive Summary 7
Document purpose 8
Audience 8
Solution purpose 8
Business challenge 9
Technology solution 9
Chapter 2 EMC Hybrid Cloud Solution Overview 11
Introduction 12
EMC Hybrid Cloud features and functionality 13
  Automation and self-service provisioning 13
  Multitenancy and secure separation 14
  Workload-optimized storage 14
  Elasticity and service assurance 14
  Operational monitoring and management 15
  Metering and chargeback 15
  Modular add-on components 16
Chapter 3 EMC Hybrid Cloud Hadoop as a Service 19
Overview 20
EMC Hybrid Cloud HaaS and IaaS 20
Pivotal Hadoop 21
Serengeti 22
VMware vSphere Big Data Extensions 22
Chapter 4 HaaS Component Integration 25
Overview 26
Integrating Hadoop components with EMC Hybrid Cloud 26
  BDE Topology 26
  Virtualized Hadoop 27
Configuring the platform 28
  Installing and configuring BDE 28
  Installing and configuring PHD 30
  Installing and configuring EMC Hybrid Cloud IaaS 33
Chapter 5 Creating vCO Workflows and vCAC Catalog Services for HaaS
  Overview
  Importing and modifying custom vCO workflows
    Modifying custom workflows
  Creating BDE Clusters
    Creating new BDE clusters
    Configuring a Hadoop cluster
  Creating vCAC Catalog Services
    Accessing vCAC
    Creating a new service blueprint
Chapter 6 Use Cases: EMC Hybrid Cloud IaaS
  Overview
  IaaS – storage services
    Overview
    Use case 1: Storage provisioning
    Use case 2: Select virtual machine storage
    Use case 3: Metering storage services
    Summary
  Monitoring and capacity planning
    Monitoring
    Capacity planning
    Capacity planning example
  Metering and chargeback
Chapter 7 Conclusion
  Summary
Appendix A References
  VMware references
Figures
Figure 1. EMC Hybrid Cloud key components
Figure 2. EMC Hybrid Cloud self-service portal
Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager
Figure 4. IT Business Management Suite overview dashboard for hybrid cloud
Figure 5. EMC Hybrid Cloud HaaS component overview
Figure 6. Pivotal Hadoop (PHD) components
Figure 7. BDE and Serengeti stack
Figure 8. BDE and vSphere deployment topology
Figure 9. The evolution of virtual Hadoop
Figure 10. Configuring the SSO lookup service and management server IP addresses
Figure 11. Importing Hadoop binaries into BDE management server
Figure 12. Removing the default Apache template from BDE
Figure 13. Importing custom workflows into vCO
Figure 14. Using the validate workflows action
Figure 15. How to edit the attributes
Figure 16. Editing and creating custom parameter passing
Figure 17. Launching scripts from vCO
Figure 18. Launching of Micro Hadoop Cluster workflow
Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere Web Client)
Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client
Figure 21. Create and name a new Big Data Cluster
Figure 22. Advanced Service Designer
Figure 23. Edit Entitlement window
Figure 24. vCAC Service Catalog showing Hadoop as a Service
Figure 25. Storage Services - Provision cloud storage
Figure 26. Provision Cloud Storage – select vCenter cluster
Figure 27. Storage Provisioning – Select datastore type
Figure 28. Storage provisioning – Choose ViPR storage pool
Figure 29. Storage provisioning – Enter storage size
Figure 30. Provision Storage – Storage Reservation for vCAC Business Group
Figure 31. Set storage reservation policy for virtual machine disks
Figure 32. Create new virtual machine storage profile for Tier 2 storage
Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider
Figure 34. VMware ITBM chargeback based on storage profile of datastore
Figure 35. Choosing virtual machine consumption models and profiles
Figure 36. Specifying configuration and projected capacity usage of new virtual machines
Figure 37. Capacity summary showing insufficient CPU and RAM resources
Figure 38. Specifying number of hosts and amount of CPU and memory
Figure 39. Specifying datastore size
Figure 40. Compared scenarios
Figure 41. Combined scenarios
Figure 42. Categorized hybrid cloud environment cost overview
Figure 43. vSphere Cluster cost overview
Figure 44. Storage cost overview
Chapter 1 Executive Summary
This chapter presents the following topics:
• Document purpose
• Audience
• Solution purpose
• Business challenge
• Technology solution
Document purpose
This document serves as a reference for planning and designing a Pivotal Hadoop solution that enables IT organizations to quickly deploy Hadoop as a service (HaaS) on an existing cloud. The solution delivers infrastructure as a service (IaaS) capabilities to support big data application development.
This document introduces the main features and functionality of the solution, the solution architecture and key components, and the validated hardware and software environment. It demonstrates the integration of Pivotal Hadoop Enterprise in the EMC® Hybrid Cloud solution.
The Pivotal Hadoop solution is a modular add-on to the EMC Hybrid Cloud solution. EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Reference Architecture 2.5 and EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5 describe the reference architecture and the foundation solution on which all the EMC Hybrid Cloud add-on solutions are built.
The following documents provide further information about how to implement specific capabilities or enable specific use cases within the EMC Hybrid Cloud solution with VMware:
• EMC Hybrid Cloud Solution with VMware: Data Protection Continuous Availability Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Data Protection Disaster Recovery Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Data Protection Backup Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Security Solution Guide 2.5
• EMC Hybrid Cloud Solution with VMware: Pivotal CF Platform as a Service Solution Guide 2.5
Audience
This document is intended for executives, managers, architects, cloud administrators, and technical administrators of IT environments who want to build a self-service, Pivotal Hadoop-based enterprise big data platform.
Readers should be familiar with VMware vCloud Suite, Pivotal Hadoop, VMware Big Data Extensions (BDE), EMC ViPR®, general IaaS and software-defined data center concepts, and how a hybrid cloud infrastructure accommodates these technologies and requirements.
Solution purpose
The EMC Hybrid Cloud solution enables EMC customers to build an enterprise-class, scalable, multitenant infrastructure that provides:
• Complete management of the infrastructure and application service lifecycle
• On-demand access to and control of network bandwidth, servers, storage, and security
• Quick deployment of IaaS components to support HaaS-based services without IT administrator involvement
• Scalable, elastic, flexible HaaS-based services for maximum asset utilization
• Access to application services from a single platform for both business-critical and next-generation cloud applications
This solution provides the reference architecture and best-practice guidance necessary to integrate the key components and functionality of enterprise HaaS into an underlying EMC Hybrid Cloud infrastructure.
Business challenge
Today's enterprise demands an agile development platform that enables the continuous delivery, updating, and horizontal scalability of applications. The Pivotal Hadoop (PHD) platform enables developers to easily deploy, bind, and scale applications and data services. When integrated with VMware vCloud Automation Center, it delivers a self-service Pivotal Hadoop platform that facilitates rapid deployment, scaling, and updating of Hadoop clusters.
HaaS must interoperate with the underlying infrastructure, accommodating new-generation applications while maintaining existing end-to-end service delivery, to provide:
• Efficiency and flexibility
• Fast, proactive responses to service requests
• An easy as-a-service deployment model
• Adequate visibility into the cost of the infrastructure
Technology solution
This EMC Hybrid Cloud solution integrates the best of EMC, VMware, and Pivotal products and services, and empowers IT organizations to adopt an as-a-service implementation model for compute and storage infrastructure within the data center. Agile, elastic, on-demand, end-to-end IaaS provisioning is crucial to support a comprehensive, dynamic, and fast-growing big data environment.
The key solution components include:
• EMC ViPR software-defined storage platform
• VMware vCloud Suite cloud management and infrastructure
• EMC and VMware integrated workflows
• VMware NSX virtual networking technologies
• VMware vSphere virtualization platform
• VMware Big Data Extensions (BDE) with Project Serengeti
• Pivotal Hadoop (PHD)
Chapter 2 EMC Hybrid Cloud Solution Overview
This chapter presents the following topics:
• Introduction
• EMC Hybrid Cloud features and functionality
Introduction
The EMC Hybrid Cloud solution enables a well-run hybrid cloud by bringing new functionality not only to IT organizations, but also to developers, end users, and line-of-business owners. Beyond delivering baseline infrastructure as a service (IaaS) built on a software-defined data center (SDDC) architecture, the solution delivers feature-rich capabilities to expand from IaaS to business-enabling IT as a service (ITaaS). Backup as a service (BaaS) and disaster recovery as a service (DRaaS) are now policies that users can enable with just a few mouse clicks. End users and developers can quickly access a marketplace of resources for Microsoft, Oracle, SAP, EMC Syncplicity®, and Pivotal applications, and can add third-party packages as required. All of these resources can be deployed on private cloud or public cloud services, including VMware vCloud Air, from EMC-powered cloud service providers.
The EMC Hybrid Cloud solution uses the best of EMC and VMware products and services, and takes advantage of the strong integration between EMC and VMware technologies to provide the foundation for enabling IaaS on new and existing infrastructure for the hybrid cloud.
Figure 1 shows the key components of the EMC Hybrid Cloud solution. For detailed information, refer to EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5. For information on EMC Hybrid Cloud modular add-on solutions, which provide functionality such as data protection, continuous availability, and application services, refer to the Modular add-on components section and to the individual Solution Guides for those add-ons.
Figure 1. EMC Hybrid Cloud key components
EMC Hybrid Cloud features and functionality
The EMC Hybrid Cloud solution incorporates the following features and functionality:
• Automation and self-service provisioning
• Multitenancy and secure separation
• Workload-optimized storage
• Elasticity and service assurance
• Operational monitoring and management
• Metering and chargeback
• Modular add-on components
Automation and self-service provisioning
The solution provides self-service provisioning of automated cloud services to both users and infrastructure administrators. It uses VMware vCloud Automation Center (vCAC), integrated with EMC ViPR software-defined storage and VMware NSX, to provide the compute, storage, network, and security virtualization platforms for the SDDC. Cloud users can request and manage their own applications and compute resources within established operational policies. This can reduce IT service delivery times from days or weeks to minutes.
Automation and self-service provisioning features include:
• Self-service portal: Provides a cross-cloud storefront that delivers a catalog of custom-defined services for provisioning workloads based on business and IT policies, as shown in Figure 2
• Role-based entitlements: Ensure that the self-service portal presents only the virtual machine, application, or service blueprints appropriate to a user's role within the business
• Resource reservations: Allocate resources for use by a specific group and ensure that those resources are inaccessible to other groups
• Service levels: Define the amount and types of resources that a particular service can receive during initial provisioning or as part of configuration changes
• Blueprints: Contain the build specifications and automation policies that define the process for building or reconfiguring compute resources
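The resource reservation rule in the list above can be illustrated with a small Python sketch. The `Reservation` class, its fields, and the group names are hypothetical and are not part of the vCAC API; the sketch only models the stated behavior: a reservation belongs to one business group, and requests from any other group, or requests that exceed the reserved capacity, are refused.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical model of compute resources reserved for exactly one business group."""
    business_group: str
    cpu: int
    memory_gb: int
    used_cpu: int = 0
    used_memory_gb: int = 0

    def allocate(self, group: str, cpu: int, memory_gb: int) -> bool:
        """Draw on the reservation only for its owning group and only within capacity."""
        if group != self.business_group:
            return False  # resources reserved for one group are inaccessible to others
        if (self.used_cpu + cpu > self.cpu
                or self.used_memory_gb + memory_gb > self.memory_gb):
            return False  # the request would exceed the reserved capacity
        self.used_cpu += cpu
        self.used_memory_gb += memory_gb
        return True


finance = Reservation("Finance", cpu=16, memory_gb=64)
print(finance.allocate("Finance", 4, 16))    # True
print(finance.allocate("Marketing", 1, 1))   # False: not Marketing's reservation
print(finance.allocate("Finance", 16, 8))    # False: only 12 vCPUs remain
```

A real reservation also covers storage and network resources; the sketch keeps to CPU and memory to show the isolation rule.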
Figure 2. EMC Hybrid Cloud self-service portal
Multitenancy and secure separation
The solution provides the ability to enforce physical and virtual separation for multitenancy, as strongly as the administrator requires. This separation can encompass network, compute, and storage resources to ensure appropriate security and performance for each tenant.
The solution supports secure multitenancy through vCAC role-based access control (RBAC), which enables vCAC roles to be mapped to Microsoft Active Directory groups. The self-service portal shows only the appropriate views, functions, and operations to cloud users, based on their role within the business.
Workload-optimized storage
The solution enables customers to take advantage of the proven benefits of EMC storage in a hybrid cloud environment. Using ViPR storage services, which leverage the capabilities of EMC VNX® and EMC VMAX® storage systems, the solution provides software-defined, policy-based management of block- and file-based virtual storage. ViPR abstracts the storage configuration and presents it as a single storage control point, enabling cloud administrators to access all heterogeneous storage resources within a data center as if the resources were a single large array.
Elasticity and service assurance
The solution uses the capabilities of vCAC and various EMC tools to provide the intelligence and visibility required to proactively ensure service levels in virtual and cloud environments. Infrastructure administrators can add storage, compute, and network resources to their resource pools as needed. Cloud users can select from a range of service levels for compute, storage, and data protection for their applications and can expand the resources of their virtual machines on demand to achieve the service levels they expect for their application workloads.
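The role-based access control described under Multitenancy and secure separation, in which vCAC roles are mapped to Active Directory groups and the portal shows only role-appropriate operations, can be sketched as a two-step lookup. The group names, role names, and operation names below are hypothetical placeholders, not vCAC identifiers.

```python
from __future__ import annotations

# Hypothetical mapping from Active Directory groups to portal roles.
AD_GROUP_ROLES = {
    "HadoopDevelopers": {"business_user"},
    "CloudAdmins": {"business_user", "tenant_admin"},
}

# Hypothetical portal operations each role is allowed to see.
ROLE_OPERATIONS = {
    "business_user": {"request_catalog_item", "view_own_items"},
    "tenant_admin": {"manage_entitlements", "manage_reservations"},
}


def visible_operations(ad_groups: list[str]) -> set[str]:
    """Return the operations the portal should show to a member of the given AD groups."""
    roles: set[str] = set()
    for group in ad_groups:
        roles |= AD_GROUP_ROLES.get(group, set())  # unknown groups grant nothing
    operations: set[str] = set()
    for role in roles:
        operations |= ROLE_OPERATIONS[role]
    return operations


print(sorted(visible_operations(["HadoopDevelopers"])))
# ['request_catalog_item', 'view_own_items']
```

Resolving groups to roles first, and roles to operations second, means that changing a user's AD group membership changes what the portal shows without editing any role definitions.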
Operational monitoring and management
The solution features automated monitoring and management capabilities that provide IT administrators with a comprehensive view of the cloud environment to enable smart decision-making for resource provisioning and allocation. These automated capabilities are based on a combination of EMC ViPR Storage Resource Management (SRM), VMware vCenter Log Insight, and VMware vCenter Operations Manager (vC Ops), and use EMC plug-ins for ViPR, VNX, VMAX, and EMC Avamar® systems to provide extensive additional storage detail.
Cloud administrators can use ViPR SRM to understand and manage the impact that storage has on their applications and to view their storage topologies from application to disk, as shown in Figure 3.
Figure 3. EMC ViPR Analytics with VMware vCenter Operations Manager
Capacity analytics and what-if scenarios in vC Ops identify over-provisioned resources so they can be right-sized for the most efficient use of virtualized resources. In addition, for centralized logging, infrastructure components can be configured to forward their logs to vCenter Log Insight, which then aggregates the logs from all the disparate sources for analytics and reporting.
Metering and chargeback
The solution uses VMware IT Business Management Suite (ITBM) to provide cloud administrators with comprehensive metering and cost information across all business groups in the enterprise. ITBM is integrated into the cloud administrator's self-service portal and presents a dashboard overview of the hybrid cloud infrastructure, as shown in Figure 4.
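As a rough illustration of the metering-to-chargeback step, the following Python sketch multiplies metered usage by unit rates and totals the cost for each business group. The rates, record fields, and function names are assumptions made for illustration; ITBM's actual cost model is configured in the product and is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical unit rates; in practice these come from the cost model configured in ITBM.
RATES = {"vcpu_hour": 0.03, "gb_ram_hour": 0.01, "gb_storage_month": 0.10}


@dataclass
class UsageRecord:
    """Metered consumption for one business group over a billing period."""
    business_group: str
    vcpu_hours: float
    gb_ram_hours: float
    gb_storage_months: float


def chargeback(records: list[UsageRecord]) -> dict[str, float]:
    """Multiply each metered quantity by its unit rate and total the cost per group."""
    totals: dict[str, float] = {}
    for r in records:
        cost = (r.vcpu_hours * RATES["vcpu_hour"]
                + r.gb_ram_hours * RATES["gb_ram_hour"]
                + r.gb_storage_months * RATES["gb_storage_month"])
        totals[r.business_group] = totals.get(r.business_group, 0.0) + cost
    return {group: round(cost, 2) for group, cost in totals.items()}


usage = [
    UsageRecord("Analytics", vcpu_hours=720, gb_ram_hours=2880, gb_storage_months=500),
    UsageRecord("Marketing", vcpu_hours=100, gb_ram_hours=400, gb_storage_months=50),
]
print(chargeback(usage))
```

With these assumed rates, the Analytics group is charged 21.60 for CPU, 28.80 for memory, and 50.00 for storage, a total of 100.40.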
Figure 4. IT Business Management Suite overview dashboard for hybrid cloud
Modular add-on components
The EMC Hybrid Cloud solution provides modular add-on components for the following services:
• Application services
  This add-on solution uses VMware vCloud Application Director to optimize application deployment and release management through logical application blueprints in vCAC. Users can quickly and easily deploy blueprints for applications and databases such as Microsoft Exchange, Microsoft SQL Server, Microsoft SharePoint, Oracle, and SAP.
• Data protection services
  EMC Avamar and EMC Data Domain® systems provide a backup infrastructure that offers features such as deduplication, compression, and VMware integration. By using VMware vCenter Orchestrator (vCO) workflows customized by EMC, administrators can quickly and easily set up multitier data protection policies and enable users to select an appropriate policy when they provision their virtual machines.
• Continuous availability
  A combination of EMC VPLEX® virtual storage and VMware vSphere High Availability (HA) provides the ability to federate information across multiple data centers over synchronous distances. With virtual storage and virtual servers working together over distance, the infrastructure can transparently provide load balancing, real-time remote data access, and improved application protection.
• Disaster recovery
  This add-on solution enables cloud administrators to select disaster recovery (DR) protection for their applications and virtual machines when they provision their hybrid cloud environment. ViPR automatically places these systems on storage that is protected remotely by EMC RecoverPoint® technology. VMware vCenter Site Recovery Manager automates the recovery of all virtual storage and virtual machines.
• Platform as a service
  The EMC Hybrid Cloud solution provides an elastic and scalable IaaS foundation for platform-as-a-service (PaaS) and software-as-a-service (SaaS) services. Pivotal CF provides a highly available platform that enables application owners to easily deliver and manage applications over the application lifecycle. The EMC Hybrid Cloud service offerings enable PaaS administrators to easily provision compute and storage resources on demand to support scalability and growth in their Pivotal CF enterprise PaaS environments.
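The disaster recovery add-on relies on ViPR placing protected systems on remotely replicated storage. The Python sketch below shows one way such policy-based placement could work: filter virtual pools by tier, protection, and free capacity, then choose the pool with the most free space. The `VirtualPool` fields, the pool names, and the "most free space" rule are assumptions for illustration, not ViPR's documented placement algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualPool:
    """Hypothetical description of a ViPR virtual pool."""
    name: str
    tier: str
    remote_protection: bool  # True if the backing storage is replicated to a remote site
    free_gb: int


def place_volume(pools: list[VirtualPool], tier: str, needs_dr: bool, size_gb: int) -> VirtualPool:
    """Pick a pool that matches the requested tier and protection and has room for the volume."""
    candidates = [
        p for p in pools
        # exact match on protection keeps replicated capacity for workloads that request DR
        if p.tier == tier and p.remote_protection == needs_dr and p.free_gb >= size_gb
    ]
    if not candidates:
        raise LookupError(f"no {tier} pool with remote_protection={needs_dr} has {size_gb} GB free")
    return max(candidates, key=lambda p: p.free_gb)  # spread load to the emptiest matching pool


pools = [
    VirtualPool("Tier1-Local", "tier1", False, 2000),
    VirtualPool("Tier1-Replicated-A", "tier1", True, 800),
    VirtualPool("Tier1-Replicated-B", "tier1", True, 1500),
]
print(place_volume(pools, "tier1", needs_dr=True, size_gb=500).name)  # Tier1-Replicated-B
```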
Chapter 3 EMC Hybrid Cloud Hadoop as a Service
This chapter presents the following topics:
• Overview
• EMC Hybrid Cloud HaaS and IaaS
• Pivotal Hadoop
• Serengeti
• VMware Big Data Extensions
Overview
This chapter identifies and briefly describes the major features and functionality required to support Pivotal Hadoop as a service and promote scalability in the EMC Hybrid Cloud environment:
• EMC Hybrid Cloud HaaS and IaaS
• Project Serengeti
• VMware Big Data Extensions (BDE)
• Pivotal Hadoop (PHD)
• HaaS self-service portal
EMC Hybrid Cloud HaaS and IaaS
EMC Hybrid Cloud HaaS is a solution stack made up of EMC Hybrid Cloud (EHC) IaaS integrated with BDE and PHD. vCAC controls the self-service aspect of the portal, as shown in Figure 5.
Hadoop is an open-source software framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project sponsored by the Apache Software Foundation. PHD is a distribution of Apache Hadoop.
Deploying a Hadoop cluster using traditional methods is complex and time-consuming. It typically involves setting up the infrastructure, installing and configuring the operating system, acquiring the appropriate Hadoop media, installing Hadoop components, and finally creating the Hadoop cluster. This process typically takes weeks and requires a significant skillset.
The EMC HaaS offering simplifies the process by using extensive workflow automation in the EHC IaaS backend. Through self-service automation, it is now possible to deploy or expand a Hadoop cluster in minutes using the vCloud Automation Center self-service portal.
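The manual sequence above is what the HaaS automation replaces. As a minimal sketch, assuming each step can be expressed as a function that updates a shared context, an ordered workflow runner might look like this in Python. The step names mirror the text; the lambdas are hypothetical stand-ins, not vCO workflow calls.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical placeholder actions for the manual steps listed above; a real
# workflow would call vCO, BDE, and the IaaS layer instead of updating a dict.
HAAS_STEPS: list[tuple[str, Callable[[dict], None]]] = [
    ("provision infrastructure", lambda ctx: ctx.update(vms="provisioned")),
    ("install and configure OS", lambda ctx: ctx.update(os="configured")),
    ("acquire Hadoop media", lambda ctx: ctx.update(media="PHD")),
    ("install Hadoop components", lambda ctx: ctx.update(components="installed")),
    ("create Hadoop cluster", lambda ctx: ctx.update(cluster="running")),
]


def run_workflow(steps, context: dict) -> list[str]:
    """Run steps in order, stopping at the first failure and reporting what completed."""
    completed: list[str] = []
    for name, action in steps:
        try:
            action(context)
        except Exception as exc:
            raise RuntimeError(f"step {name!r} failed; completed: {completed}") from exc
        completed.append(name)
    return completed


context: dict = {}
print(run_workflow(HAAS_STEPS, context)[-1])  # create Hadoop cluster
print(context["cluster"])                     # running
```

Stopping at the first failed step and reporting the completed ones gives an operator a clear point from which to resume, which is harder to achieve when the same steps are performed by hand.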
Figure 5. EMC Hybrid Cloud HaaS component overview
Pivotal Hadoop
Pivotal Hadoop (PHD) is Pivotal's distribution of Apache Hadoop, the open-source framework for processing large data sets in a distributed computing environment.
The complete PHD platform contains the following components, some of which are not used in this solution:
• YARN (Yet Another Resource Negotiator): a resource-management framework that can schedule and execute resource requests from multiple applications
• HBase: a column-oriented database that runs on top of the Hadoop Distributed File System (HDFS)
• HAWQ: a parallel SQL query engine that combines the merits of the Greenplum Database massively parallel processing (MPP) relational database engine and the Hadoop parallel processing framework
• ZooKeeper: a centralized service for maintaining configuration information, providing naming services and distributed synchronization, and providing group services
• Hive: a data warehouse infrastructure built on top of Hadoop
• Hadoop MapReduce: a programming model for processing and generating large data sets with a parallel, distributed algorithm on a cluster
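To show the MapReduce programming model named in the list above, the following single-process Python sketch runs the classic word-count example through map, shuffle, and reduce phases. It illustrates the model only; it does not use Hadoop's Java MapReduce API and does not distribute work across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data on vSphere", "big clusters big data"]))
# {'big': 3, 'data': 2, 'on': 1, 'vsphere': 1, 'clusters': 1}
```

In Hadoop, the map and reduce functions run in parallel on many nodes, and the shuffle moves intermediate pairs across the network; the logic of each phase is the same as in this sketch.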
Figure 6 shows the PHD components.
Figure 6. Pivotal Hadoop (PHD) components
Note: YARN, HBase, HAWQ, and Hive are not referenced in this solution. HAWQ is not installed by default and must be installed separately; this can be automated through vCO workflows if required.
Serengeti
Serengeti is an open source project initiated by VMware to enable the deployment and management of Hadoop and big data clusters in a vCenter Server-managed environment. Its key components are the Serengeti Management Server, which provides a framework for running big data clusters on vSphere, and a command-line interface that provides the tools and utilities for managing and monitoring the cluster environments.
VMware vSphere Big Data Extensions
VMware vSphere Big Data Extensions (BDE) is a feature of vSphere that supports big data and open source Hadoop distribution workloads. BDE provides an integrated set of management tools to help enterprises deploy, run, and manage Hadoop on a common virtual infrastructure.
As Figure 7 shows, BDE is an installable virtual appliance and plug-in that controls and monitors Hadoop services. The BDE virtual appliance runs on top of vSphere and uses the Serengeti Management Server to control cluster creation, cloning templates through the template server.
BDE is a commercial version of Serengeti, the open source project from VMware. BDE provides the features of Serengeti in an enterprise format, including:
• A supported version of the open source Apache Hadoop distribution
• The Big Data Extensions GUI, integrated into the vSphere Web Client, for performing Hadoop infrastructure and cluster management tasks
• Elasticity-enabled clusters that optimize the use of physical compute resources in a vSphere environment and scale them on demand
Figure 7. BDE and Serengeti stack
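Elastic scaling adjusts compute capacity to load. This guide does not describe BDE's internal scaling algorithm, so the Python sketch below is an assumption: it sizes a compute-only node group from the number of pending tasks and clamps the result between a minimum and a maximum, leaving the data layer untouched. `ElasticPolicy` and `tasks_per_node` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElasticPolicy:
    """Hypothetical bounds for a compute-only node group; HDFS data nodes are not resized."""
    min_compute_nodes: int
    max_compute_nodes: int
    tasks_per_node: int  # assumed task capacity of one compute VM


def target_compute_nodes(policy: ElasticPolicy, pending_tasks: int) -> int:
    """Size the compute layer to the pending workload, clamped to the policy bounds."""
    needed = -(-pending_tasks // policy.tasks_per_node)  # ceiling division
    return max(policy.min_compute_nodes, min(policy.max_compute_nodes, needed))


policy = ElasticPolicy(min_compute_nodes=2, max_compute_nodes=10, tasks_per_node=4)
for load in (0, 13, 100):
    print(load, target_compute_nodes(policy, load))
# 0 2
# 13 4
# 100 10
```

The lower bound keeps the cluster responsive when idle, and the upper bound caps how much shared vSphere capacity a single cluster can claim.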
Chapter 4 HaaS Component Integration
This chapter presents the following topics:
• Overview
• Integrating Hadoop components with EMC Hybrid Cloud
• Configuring the platform
Overview
This chapter provides guidance on configuring the services required for Hadoop as a Service, specifically BDE and PHD, and integrating them with EMC Hybrid Cloud IaaS services.
Integrating Hadoop components with EMC Hybrid Cloud
To install and configure the Hadoop-as-a-Service components, refer to the vendor documentation referenced in the installation and configuration section for each component in this chapter. The steps discussed assume that the EMC Hybrid Cloud has been installed and configured as described in EMC Hybrid Cloud Solution with VMware: Foundation Infrastructure Solution Guide 2.5, and that the IaaS, portal, catalog services, and tenant structure are all in place.
BDE Topology
BDE runs on top of Serengeti. Figure 8 shows the virtual appliance that runs the Serengeti Management Server and Template Server. BDE provides the GUI for managing Hadoop clusters and communicates through the Serengeti Management Server.
Figure 8. BDE and vSphere deployment topology
With VMware vSphere Big Data Extensions, you can deploy Hadoop inside your VMware vSphere environment. The Big Data Extensions are distributed as a downloadable OVA-based virtual appliance that is imported into an existing environment. At a minimum, BDE requires vSphere 5.0 or later with Enterprise or Enterprise Plus licenses.
By default, the basic Apache Software Foundation distribution of Hadoop is included, but you can easily add other commercial Hadoop distributions such as Pivotal Hadoop, Cloudera Hadoop, Hortonworks Hadoop, or MapR Hadoop. This solution uses the Pivotal Hadoop distribution integrated with the EMC Hybrid Cloud IaaS stack to create Hadoop as a Service.
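The minimum requirements above can be turned into a simple pre-installation check. This Python sketch assumes the vSphere version and license edition have already been gathered as strings (for example, from vCenter); the function name and message text are hypothetical, and the check covers only the two requirements stated in this guide.

```python
from __future__ import annotations

MIN_VSPHERE = (5, 0)
SUPPORTED_EDITIONS = {"enterprise", "enterprise plus"}


def unmet_bde_prerequisites(vsphere_version: str, license_edition: str) -> list[str]:
    """Return readable reasons why an environment falls short of BDE's stated minimums."""
    parts = [int(p) for p in vsphere_version.split(".")[:2]]
    parts += [0] * (2 - len(parts))  # treat "5" as "5.0"
    problems = []
    if tuple(parts) < MIN_VSPHERE:
        problems.append(f"vSphere {vsphere_version} is older than 5.0")
    if license_edition.strip().lower() not in SUPPORTED_EDITIONS:
        problems.append(f"{license_edition!r} is not an Enterprise or Enterprise Plus license")
    return problems


print(unmet_bde_prerequisites("5.5", "Enterprise Plus"))  # []
print(unmet_bde_prerequisites("4.1", "Standard"))
```

Returning a list of problems, rather than a single true/false value, lets an administrator fix every shortfall in one pass before importing the BDE appliance.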
After BDE is installed, you can begin creating a virtual Hadoop cluster. You can specify a number of configuration options, including the distribution, the topology (basic, compute/storage separation, HBase-only, or custom), and the number and size of the virtual machines for each Hadoop role (for example, name node, client node, and data nodes). Note that the options presented in the web interface are only a fraction of those available through the advanced command-line tools and API.
When you deploy a Hadoop cluster, BDE clones the appropriate virtual machines and builds out the cluster automatically. When you are satisfied with the cluster, you can scale up (increase the memory and CPU resources of its virtual machines) or scale out (increase the number of virtual machines). For additional flexibility and efficiency, you can also configure the cluster to scale automatically as the load changes.
Some of the benefits of virtualizing Hadoop, such as elasticity and multi-tenancy, arise from the additional deployment options that become available when Hadoop is virtualized. Figure 9 shows the evolution of virtual Hadoop from a self-contained model to a tenant-based model.
Figure 9. The evolution of virtual Hadoop
The traditional Hadoop model combines compute and data in the same node. This implementation is straightforward, because the physical Hadoop model translates directly into a virtual machine, but its ability to scale up and down is limited: the lifecycle of each virtual machine is tightly coupled to the data it manages. Powering off a virtual machine that combines storage and computing means losing access to its data, and scaling out by adding more nodes requires rebalancing data across the expanded cluster, so this model is not particularly elastic.
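The rebalancing cost of scaling out a combined data-compute cluster can be estimated with simple arithmetic. This is a sketch under one stated assumption: HDFS data is spread evenly across nodes both before and after the rebalance. Under that assumption, each of the n + k nodes ends up holding 1/(n + k) of the stored data, so the k new nodes together must receive k/(n + k) of it. The function names are illustrative only.

```python
from fractions import Fraction


def rebalance_fraction(existing_nodes: int, added_nodes: int) -> Fraction:
    """Fraction of all stored HDFS data that moves onto new nodes.

    Assumes an even spread before and after rebalancing, so the k new nodes
    together end up holding k / (n + k) of the data.
    """
    if existing_nodes < 1 or added_nodes < 0:
        raise ValueError("need at least one existing node and a non-negative addition")
    return Fraction(added_nodes, existing_nodes + added_nodes)


def data_to_move_tb(total_tb: float, existing_nodes: int, added_nodes: int) -> float:
    """Terabytes copied across the network by that rebalance.

    total_tb is the raw stored volume, including all HDFS replicas.
    Fraction(total_tb) converts the float exactly, avoiding rounding drift.
    """
    return float(Fraction(total_tb) * rebalance_fraction(existing_nodes, added_nodes))
```

For example, growing an 8-node combined cluster holding 100 TB to 10 nodes moves about one fifth of the data (20 TB). In the data/compute separation model, adding compute-only TaskTracker nodes moves no HDFS data at all, which is the source of its elasticity.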
Virtualized Hadoop
Separating computing from storage in a virtual Hadoop cluster achieves compute elasticity, enabling mixed workloads to run on the same virtualization platform and improving resource utilization. This configuration is simple to set up: an always-available HDFS data layer is paired with a compute layer made up of a variable number of TaskTracker nodes that can be expanded and contracted on demand.
Extending the concept of data-compute separation, the virtualized Hadoop cluster can accommodate multiple tenants by running multiple Hadoop compute clusters against the same data service. In this model, each virtual compute cluster has its own performance, security, and configuration isolation.
Hadoop performance using the combined data-compute model on vSphere is similar to its performance on physical hardware. When the data and compute layers are separated, however, giving virtualized Hadoop increased topology awareness provides the data locality needed to improve performance. Topology awareness therefore allows Hadoop operators to realize the elasticity and multi-tenancy benefits of separating data storage from computing. Topology awareness can also improve reliability when multiple nodes of the same Hadoop cluster are colocated on the same physical host.
To optimize the data locality and failure group characteristics of virtualized Hadoop:
• Group virtual Hadoop nodes on the same physical host into the same failure domain, and avoid placing multiple replicas of the same data in that domain.
• Maximize use of the virtual network between virtual nodes on the same physical host. The virtual network has higher throughput and lower latency than the physical network and does not consume any physical switch bandwidth.
Configuring the platform
Installing and configuring BDE
Refer to the VMware vSphere Big Data Extensions Administrator's and User's Guide to install and configure the BDE components required for Hadoop as a Service.
Configuration task order
The following steps outline the high-level tasks you need to perform to install and configure BDE:
1. Ensure that the environment meets the minimum vSphere requirements, that the correct licensing is in place, and that the compute, storage, and networking prerequisites are met.
2. Configure the cluster settings, including vSphere HA, Distributed Resource Scheduler (DRS), host monitoring, and admission control.
3. Configure network settings using a vSwitch, a vSphere Distributed Switch (vDS), or NSX. Ensure that the required ports are configured as part of any firewall policy.
4. Deploy the BDE OVF file and assign the management network. During deployment, the setup prompts for a destination port group; this is the network that the management network uses to communicate with the server, so the port group must correspond to the management VLAN ID. If vCenter and BDE cannot communicate with each other, the integration will fail.
Configuring SSO service
An important step in the configuration process is to configure the SSO service and the management server IP addresses:
1. As shown in Figure 10, in the left pane of the Deploy OVF Template page, select Customize template.
2. In the VC SSO Lookup Service URL box, type the vCenter Server fully qualified domain name (FQDN) in the format shown (if the default server name has not been changed). If you do not specify the FQDN here, the certificate will not be accepted, and a connection problem will occur later between BDE and the Serengeti server.
3. Under Management Server Network Settings, enter the appropriate IP address settings.
Figure 10. Configuring the SSO lookup service and management server IP addresses
Starting BDE in vSphere
After successfully installing and configuring BDE within vSphere, power on the BDE management server, and then complete the configuration by registering BDE within vSphere:
1. Log in to the vSphere client with administrative privileges.
2. Within the vSphere client, locate the BDE management server. The management server is located under the datacenter resource pool in which it was deployed.
3. Select the management server and record its management IP address.
4. Register the management server using the register-plugin URL:
https://management-server-ip-address:8443/register-plugin
where management-server-ip-address is the IP address you recorded in step 3.
5. Complete the required registration information, and then click Submit.
The BDE icon should now be available in the list of objects within the inventory.
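Two details in these steps are easy to get wrong: the register-plugin URL must be built from the recorded management IP, and the SSO lookup URL must name vCenter by its FQDN rather than an IP address. The sketch below captures both checks. The function names are hypothetical, and the example host names in the usage are placeholders, not values from this solution.

```python
import ipaddress
from urllib.parse import urlsplit


def register_plugin_url(management_ip: str) -> str:
    """Build the register-plugin URL from the management IP recorded in step 3."""
    ip = ipaddress.ip_address(management_ip.strip())  # raises ValueError if not an IP
    host = f"[{ip}]" if ip.version == 6 else str(ip)  # IPv6 literals need brackets in URLs
    return f"https://{host}:8443/register-plugin"


def sso_url_uses_fqdn(sso_lookup_url: str) -> bool:
    """True if the VC SSO Lookup Service URL names its host by a dotted DNS name.

    An IP address or a single-label short name returns False, because the
    certificate is not accepted unless the vCenter FQDN is used.
    """
    host = urlsplit(sso_lookup_url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return False  # an IP address, not an FQDN
    except ValueError:
        return "." in host
```

For example, `register_plugin_url("10.10.10.20")` yields `https://10.10.10.20:8443/register-plugin`, while `sso_url_uses_fqdn` rejects a lookup URL such as `https://10.0.0.5/lookupservice/sdk` (a hypothetical address).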
Installing and configuring PHD
Before installing and configuring PHD, download the following required components and make them available for the installation:
• CentOS 6.2 64-bit ISO
• Pivotal Hadoop tar files
• Oracle JDK 7 64-bit RPM for CentOS
• Big Data Extensions OVF
VMware BDE comes supplied with a default Hadoop distribution from Apache. The HaaS integration requires that Pivotal Hadoop be installed. Get the Pivotal Hadoop media and documentation from http://www.gopivotal.com/big-data/pivotal-hd, and register and obtain the necessary licenses. The following high-level tasks outline the process for loading the media and creating a PHD template within the BDE configuration.
Installing PHD
To create the required installation configuration for BDE, use Yum repositories rather than a tarball. When you create a Yum-deployed Hadoop cluster, the Hadoop nodes within the cluster download the Red Hat Package Manager (RPM) packages for the Pivotal Hadoop distribution from the official Yum repositories.
The Pivotal Hadoop distribution must be installed on a 64-bit version of the CentOS 6.x operating system. You must use either CentOS 6.2 or CentOS 6.4 to create the Hadoop template virtual machine, which is used in the cloning process when creating a Hadoop cluster. After you have deployed the BDE OVF, create a Yum repository for PHD as outlined below, and then create the template.
Creating a Yum repository for PHD
The steps for configuring PHD with BDE are described in the VMware vSphere Big Data Extensions Administrator's and User's Guide.
Creating a Hadoop template virtual machine
You must use either CentOS 6.2 or CentOS 6.4 to create the Hadoop template virtual machine. To upgrade from a previous version, refer to the chapter titled "Create a Hadoop Template Virtual Machine using RHEL Server 6.x" in the VMware vSphere Big Data Extensions Administrator's and User's Guide.
The following steps outline the procedure for creating a Hadoop template virtual machine:
1. Import the PHD binaries and create the PHD media by logging in to the BDE management server and importing the PHD tar files into an appropriate directory structure on the server. Figure 11 shows the binary import process.
Figure 11. Importing Hadoop binaries into BDE management server
2. Test that the import was successful by accessing the URL path from a browser and confirming that the expected folders are present.
3. After installing the media on the BDE management server, create a new Pivotal Hadoop template.
4. Make the new Pivotal Hadoop template the default template by removing the default Apache Hadoop template from the BDE management server, as shown in Figure 12.
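The Yum-based installation relies on a repository definition that points the Hadoop nodes at the PHD packages, such as the media directory imported onto the BDE management server and checked in a browser in step 2. The sketch below renders a standard Yum `.repo` section. The repository ID, name, and base URL are placeholders; the exact directory layout BDE expects is described in the VMware vSphere Big Data Extensions Administrator's and User's Guide, not here.

```python
import configparser
import io


def yum_repo_file(repo_id: str, name: str, baseurl: str, gpgcheck: bool = False) -> str:
    """Render a Yum .repo section that points nodes at a PHD package location.

    repo_id, name, and baseurl are placeholders: use the URL path you verified
    in a browser after importing the PHD media.
    """
    # interpolation=None keeps '%' in URLs (for example %20) from being parsed.
    config = configparser.ConfigParser(interpolation=None)
    config[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "1" if gpgcheck else "0",
    }
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)  # Yum style: key=value
    return buffer.getvalue()
```

For example, `yum_repo_file("phd", "Pivotal HD", "http://192.0.2.10/phd/")` produces a `[phd]` section with `baseurl=http://192.0.2.10/phd/`, `enabled=1`, and `gpgcheck=0` (the address is a documentation placeholder).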
Figure 12. Removing the default Apache template from BDE
Configuring custom resources for BDE
VMware BDE requires two resource types when automating Hadoop clusters: networking resources and storage resources.
Networking resources
Networking resources determine how virtual machines are assigned IP addresses. BDE deploys all nodes of a Hadoop cluster from a single common CentOS template that comes preconfigured with the BDE vApp management server. As BDE deploys virtual machines into a cluster, it uses either an existing DHCP server or a statically created IP address pool.
As part of the deployment process, BDE assigns hostnames that are the same as the IP addresses. For example, if DHCP assigns 10.10.10.10, the hostname of that virtual machine is 10.10.10.10. Hadoop then uses this hostname within the cluster.
Storage resources
BDE defines two types of storage resources: local and shared. Shared storage is useful for management or client servers deployed by BDE, because shared storage can be protected with technologies such as vSphere HA.
Within Hadoop there are two types of nodes: master nodes and worker nodes. Master nodes provide tracking functions, whereas worker nodes provide job processing capabilities. Because Hadoop is designed to deal with node failure, worker nodes are disposable and do not require top-tier storage, and there is no reason to deploy them on shared storage. However, the chosen storage must still deliver the level of performance the nodes require. Allowing BDE to use local VMFS storage for worker nodes is analogous to deploying physical worker nodes on commodity direct-attached storage.
The final stage of configuration is to assign storage resources to BDE. This defines whether the Hadoop clusters are deployed on local or shared datastores. By default, BDE defines datastores as local; if you need shared datastores, you must configure them accordingly. Refer to Chapter 6 of the VMware vSphere Big Data Extensions Administrator's and User's Guide for details on how to add datastores and networks to a cluster from the vSphere client.
Installing and configuring EMC Hybrid Cloud IaaS
For details, refer to the EMC Hybrid Cloud Solution with VMware - Foundation Infrastructure Reference Architecture 2.5. Detailed installation and configuration information is available only to select EMC personnel and authorized partners.
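The network and storage resource rules in "Configuring custom resources for BDE" can be summarized as a small planning model: each node takes the next address from a static IP pool, its hostname is set to that IP, and its datastore type follows its role (master nodes on shared storage so that vSphere HA can protect them, worker and client nodes on local VMFS). This sketch models the behavior described in this guide and is not BDE's implementation; the role names and the pool representation are hypothetical.

```python
import ipaddress
from typing import Dict, Iterator, List

# Placement described in this guide: masters on shared storage (vSphere HA),
# workers and clients on local VMFS. Role names here are hypothetical.
STORAGE_BY_ROLE = {"master": "shared", "worker": "local", "client": "local"}


def static_pool(first_ip: str, last_ip: str) -> Iterator[ipaddress.IPv4Address]:
    """Yield addresses from a static IPv4 pool, inclusive of both ends."""
    start, end = ipaddress.IPv4Address(first_ip), ipaddress.IPv4Address(last_ip)
    if end < start:
        raise ValueError("pool end precedes pool start")
    for value in range(int(start), int(end) + 1):
        yield ipaddress.IPv4Address(value)


def plan_nodes(roles: List[str], first_ip: str, last_ip: str) -> List[Dict[str, str]]:
    """Give each node an IP, a hostname equal to that IP, and a storage type."""
    pool = static_pool(first_ip, last_ip)
    plan = []
    for role in roles:
        try:
            ip = str(next(pool))
        except StopIteration:
            raise ValueError("static IP pool exhausted") from None
        plan.append({"role": role, "ip": ip, "hostname": ip,
                     "storage": STORAGE_BY_ROLE[role]})  # KeyError for unknown roles
    return plan
```

With DHCP instead of a static pool, the same hostname-equals-IP rule applies; only the source of the address changes.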
Chapter 5 Creating vCO Workflows and vCAC Catalog Services for HaaS
This chapter presents the following topics:
Overview
Importing and modifying custom vCO workflows
Creating vCAC Catalog Services
Overview
The automation of Hadoop clusters is achieved by using custom workflows created with VMware vCenter Orchestrator (vCO). This chapter describes how these workflows are configured from within VMware vCloud Automation Center (vCAC) to present enterprise organizations with a self-service portal that includes a catalog of preconfigured Hadoop deployment scenarios.
Importing and modifying custom vCO workflows
Modifying custom workflows
To use HaaS within EMC Hybrid Cloud, the administrator must use custom vCO workflows to deploy HaaS. These workflows offer a choice of cluster sizes that can then be presented as catalog items in the vCloud Automation Center portal. The workflows are imported into VMware vCO using the vCO import function so that they can be edited, tested, and packaged according to the needs of the organization. This section describes how to import the custom workflows into vCO so that the Hadoop administrator can alter them and link them with the big data cluster configurations created in the earlier stages of the process.
Importing custom workflows
From within the vCO client, as shown in Figure 13, select Run, click Workflows, and select Import workflow. Browse to the location where you placed the workflow package, and click Open. The imported workflow appears in the selected folder.
Figure 13. Importing custom workflows into vCO
Validating workflows
After importing the workflows into vCO, validate them by clicking the name of the folder containing the workflows and then selecting the Validate option from the context menu, as shown in Figure 14. The validation process ensures that there are no open ends, unreachable workflow elements, or unused attributes in the workflows, so that they will execute correctly.
Figure 14. Using the validate workflows action
Customizing HaaS workflows
The HaaS workflows provide a framework for deploying each Hadoop cluster configuration of a given size through an automated workflow. The Hadoop administrator should modify the attributes of these workflows to meet the specific needs of the organization. Figure 15 shows how to use the vCO client to edit the attributes within a workflow.
Figure 15. How to edit the attributes
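The three problem classes that workflow validation reports can be understood by modeling a workflow schema as a directed graph. The sketch below is an illustrative model only, not how vCO implements validation: an element is unreachable if no path leads to it from the start element, an open end is a non-end element with no outgoing transition, and an unused attribute is declared but never referenced. All element and attribute names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def validate_workflow(start: str, transitions: Dict[str, List[str]], end_elements: Set[str],
                      declared_attributes: Set[str], used_attributes: Set[str]) -> Dict[str, Set[str]]:
    """Report open ends, unreachable elements, and unused attributes.

    transitions maps each schema element to the elements it passes control to.
    """
    # Breadth-first search from the start element marks every reachable element.
    reachable = {start}
    queue = deque([start])
    while queue:
        for nxt in transitions.get(queue.popleft(), []):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    targets = {t for successors in transitions.values() for t in successors}
    all_elements = set(transitions) | {start} | set(end_elements) | targets
    return {
        "open_ends": {e for e in all_elements if not transitions.get(e) and e not in end_elements},
        "unreachable": all_elements - reachable,
        "unused_attributes": declared_attributes - used_attributes,
    }
```

A workflow passes this model's checks only when all three sets are empty, which mirrors the goal of the Validate action described above.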
Configuring custom parameters
To make the workflows dynamic, vCO uses a combination of attributes and parameters to transfer data when it is processing a workflow. Workflow parameters must receive an input to generate an output or action. For example, an input received from the user or the system can be passed to a command or script that creates a username or password, which in turn can be passed to the Hadoop cluster for authentication. Figure 16 shows how to create a custom username and password for the Hadoop client node.
Figure 16. Editing and creating custom parameter passing
Launching a custom script
Scripts help to edit the schema, which is the main component of a workflow. Launching individual scripts lets you test the components of the workflow one element at a time or, for example, execute a script at runtime to prepare the data set. Figure 17 shows how to launch scripting from within the workflow by using the Schema panel of the workflow itself.
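The parameter flow described under "Configuring custom parameters" (an input parameter goes into a script, and the script produces a username and password for the client node) can be modeled as a function from inputs to outputs. vCO scriptable tasks are written in JavaScript; this Python sketch only illustrates the data flow. The username pattern and the minimum password length are hypothetical policies, not requirements from this guide.

```python
import re
import secrets
import string

# Hypothetical naming policy for the client-node account; adjust to your standards.
USERNAME_PATTERN = re.compile(r"[a-z_][a-z0-9_-]{0,31}")
PASSWORD_ALPHABET = string.ascii_letters + string.digits


def client_node_credentials(requested_username: str, password_length: int = 16) -> dict:
    """Model of a scriptable task: an input parameter in, username/password out."""
    username = requested_username.strip().lower()
    if not USERNAME_PATTERN.fullmatch(username):
        raise ValueError(f"invalid username: {requested_username!r}")
    if password_length < 12:
        raise ValueError("password_length should be at least 12")
    # secrets (not random) draws from a cryptographically secure source.
    password = "".join(secrets.choice(PASSWORD_ALPHABET) for _ in range(password_length))
    return {"username": username, "password": password}
```

In a real workflow the returned values would be bound to output attributes so that later workflow elements can pass them to the Hadoop cluster.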
Figure 17. Launching scripts from vCO
Testing vCO HaaS custom workflows
The previous sections demonstrated how to import the HaaS sample workflows into EMC Hybrid Cloud, specifically into vCenter Orchestrator, which is the main orchestration and automation engine for the solution. After they are imported, the default workflows can be altered to reflect any changes made to the Hadoop clusters. The workflows can also be modified to pass any additional parameters that may be required, for example, a username and password, or to execute additional script components.
The final stage in importing and configuring the workflows is to test the imported and modified workflow for each of the HaaS cluster sizes (micro cluster, small cluster, and large cluster). Figure 18 shows how to:
• Select the specific workflow for a given cluster size
• Execute the workflow from vCO
• View the execution process
• Verify the execution progress by checking the log files for any error messages
Figure 18. Launching of Micro Hadoop Cluster workflow
Viewing cluster creation
After the vCO workflow is launched, the cluster creation process starts within vSphere and BDE. The management server uses the template server to clone the number and types of nodes that make up the cluster.
To view and verify the cluster creation process, follow these steps:
1. Log in to the vSphere Web Client.
2. Go to BDE and view the cluster being created.
Figure 19 shows the status of the creation of a micro Hadoop cluster in the BDE panel of the vSphere Web Client.
Figure 19. Status of creation of Micro Hadoop cluster from BDE (vSphere web client)
You can also log in to the vSphere Client application and view the Hadoop cluster being created. Figure 20 shows the status of the creation of the micro Hadoop cluster in the vSphere Client application.
Figure 20. Status of Micro Hadoop cluster creation from BDE vSphere Client
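Checking the log files for error messages, the last of the workflow test steps listed earlier, is easy to script once the logs are copied locally. This is a minimal sketch that assumes error lines contain the words ERROR or FATAL or a Java-style exception name; those markers are an assumption, so adapt the pattern to the log format your vCO and BDE versions actually write.

```python
import re
from typing import Iterable, List, Tuple

# Assumed markers: whole-word ERROR or FATAL, or any "...Exception" class name.
ERROR_PATTERN = re.compile(r"\b(?:ERROR|FATAL)\b|Exception")


def find_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for each log line that looks like an error."""
    return [(number, line.rstrip("\n"))
            for number, line in enumerate(lines, start=1)
            if ERROR_PATTERN.search(line)]
```

Because `find_errors` accepts any iterable of lines, an open file handle can be passed directly, for example `find_errors(open(path))` for a log file you have copied from the server.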
Creating BDE Clusters
Creating new BDE clusters
After the vCO workflows are imported, they need to be customized for the different cluster sizes according to the requirements of the enterprise. The examples provided describe micro, small, and large Hadoop clusters. The custom workflows define the type of each cluster, including its configuration in terms of the number of master nodes, client nodes, and data nodes for each size.
Creating a Hadoop cluster
The following steps document the procedure for creating a Hadoop cluster within BDE, which can then be translated into a vCO workflow:
1. In vCenter, under Objects > Data Extensions, click New Big Data Cluster.
2. Follow the steps in the wizard, specifying the appropriate parameters as required. More detail can be found in the VMware vSphere Big Data Extensions Administrator's and User's Guide.
Configuring a Hadoop cluster
The following sections outline the options and details required during the cluster configuration process.
Naming a Hadoop cluster
When prompted by the wizard, type a name to identify the cluster. Valid characters for cluster names are alphanumeric characters and underscores. When choosing a cluster name, also consider the associated vApp name: together, the vApp name and cluster name must be less than 80 characters.
Configuring the Hadoop distribution
When configuring a Hadoop cluster, you must select the correct Hadoop distribution from the Hadoop distribution list box. Change the default from Apache to Pivotal HD, as shown in Figure 21. The distribution name matches the value of the name parameter that was passed to the config-distro.rb script when the Hadoop distribution was configured. For a Pivotal PHD 1.1 cluster, you must configure a valid DNS and FQDN for the cluster's HDFS and MapReduce traffic. Without valid DNS and FQDN settings, the cluster creation process might fail, or the cluster might be created but not function.
Figure 21. Create and name a new Big Data Cluster
Specifying deployment type
When prompted by the wizard, select the deployment type for the cluster, either Basic Hadoop Cluster or Data/Compute Separation Cluster. The type of cluster you create determines the available node group selections.
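Because the custom workflows create clusters by name, it can help to validate the requested name against the naming rules above before the workflow calls BDE. This sketch encodes those rules under two assumptions: "alphanumeric" means ASCII letters and digits, and the 80-character limit applies to the simple sum of the two name lengths. The function name is hypothetical.

```python
import re
from typing import List

# Rules from this guide: letters, digits, and underscores only, and the
# vApp name plus the cluster name must be less than 80 characters.
# Assumptions: ASCII-only "alphanumeric"; lengths are summed with no separator.
CLUSTER_NAME = re.compile(r"[A-Za-z0-9_]+")
MAX_COMBINED_LENGTH = 80


def cluster_name_problems(cluster_name: str, vapp_name: str) -> List[str]:
    """Return rule violations for a proposed cluster name; empty means valid."""
    problems = []
    if not CLUSTER_NAME.fullmatch(cluster_name):
        problems.append("cluster name may contain only letters, digits, and underscores")
    if len(vapp_name) + len(cluster_name) >= MAX_COMBINED_LENGTH:
        problems.append("vApp name plus cluster name must be under 80 characters")
    return problems
```

A workflow could run this check on the user's input and stop with a readable message instead of letting the BDE request fail.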
Identifying the DataMaster node group
The DataMaster node is a virtual machine that runs the Hadoop NameNode service. This node manages the HDFS namespace and the DataNode services deployed in the worker node group. To identify the group:
1. Select a resource template from the list box, or select Customize to create a custom resource template.
2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.
Identifying the ComputeMaster node group
The ComputeMaster node is a virtual machine that runs the Hadoop JobTracker service. This node assigns tasks to the Hadoop TaskTracker services deployed in the worker node group. To identify the group:
1. Select a resource template from the list box, or select Customize to create a custom resource template.
2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.
Identifying the HBaseMaster node group (HBase cluster only)
The HBaseMaster node is a virtual machine that runs the HBase master service. This node orchestrates a cluster of one or more RegionServer slave nodes. To identify the group:
1. Select a resource template from the list box, or select Customize to create a custom resource template.
2. For the master node, specify shared storage so that the virtual machine is protected with vSphere HA.
Identifying the Worker node group
Worker nodes are virtual machines that run the Hadoop DataNode, TaskTracker, and HBase HRegionServer services. These nodes store HDFS data and execute tasks. To identify the group:
1. Select a resource template from the list box, or select Customize to create a custom resource template.
2. For the worker nodes, use local storage.
Note: You can add nodes to the worker node group by using Scale Out Cluster, but you cannot reduce the number of nodes.
Identifying the Client node group
A client node is a virtual machine that contains Hadoop client components. From this virtual machine you can access HDFS, submit MapReduce jobs, run Pig scripts, run Hive queries, and run HBase commands. When configuring the cluster for use with HaaS, do not configure the Client node group unless any of these capabilities are required outside of the HaaS solution.
To identify the group:
1. Select a resource template from the list box, or select Customize to create a custom resource template.
2. For the client nodes, use local storage.
Note: You can add nodes to the client node group by using Scale Out Cluster, but you cannot reduce the number of nodes.
Selecting the Hadoop topology configuration
When you create a cluster with BDE, BDE disables automatic migration for the cluster's virtual machines. This prevents vSphere from migrating them automatically, but it does not prevent an administrator from unintentionally migrating nodes to other vCenter hosts. Do not migrate cluster nodes from within vCenter, because doing so could break the cluster placement policy.
As part of the final cluster configuration, select the topology configuration that you want the cluster to use: RACK_AS_RACK, HOST_AS_RACK, HVE, or NONE. More information is available in the section "About Cluster Topology" in Chapter 7 of the VMware vSphere Big Data Extensions Administrator's and User's Guide.
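The HOST_AS_RACK option applies the failure-domain guidance from Chapter 4: each physical host is treated as a rack, so HDFS does not place replicas of a block on two virtual machines that share a host. BDE generates the topology itself when you select this option; the sketch below only illustrates the mapping, written in the style of a Hadoop topology script (node names in, one rack path per node out). The node-to-host inventory is hypothetical, and `/default-rack` is Hadoop's conventional rack name for unknown nodes.

```python
import sys
from typing import Dict, List

DEFAULT_RACK = "/default-rack"  # Hadoop's conventional rack for unknown nodes

# Hypothetical inventory: node IP (which BDE also uses as the hostname) -> ESXi host.
VM_TO_ESXI_HOST = {
    "10.10.10.11": "esxi-01",
    "10.10.10.12": "esxi-01",
    "10.10.10.13": "esxi-02",
}


def host_as_rack(nodes: List[str], vm_to_host: Dict[str, str]) -> List[str]:
    """Map each node to a rack path named after the physical host it runs on.

    Treating each ESXi host as a rack tells HDFS that VMs on the same host
    share a failure domain, so block replicas are spread across hosts.
    """
    return [f"/{vm_to_host[node]}" if node in vm_to_host else DEFAULT_RACK
            for node in nodes]


if __name__ == "__main__":
    # Topology-script style: node names arrive as arguments; print one rack per node.
    print(" ".join(host_as_rack(sys.argv[1:], VM_TO_ESXI_HOST)))
```

In this example, 10.10.10.11 and 10.10.10.12 report the same rack (`/esxi-01`), which is exactly the information HDFS needs to avoid keeping two replicas on one host.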
Creating vCAC Catalog Services
The focus of customization for this EMC Hybrid Cloud solution is the VMware vCAC user self-service portal, where additional functionality is included to enable additional services for cloud users. The final stage of integrating Hadoop as a Service is to present to vCAC the HaaS workflows that have been imported and modified, so that they can be selected as catalog items.
VMware vCAC 6.0 provides the extensibility to enable IaaS functionality through Advanced Service blueprints. This functionality is achieved by exposing custom vCO workflows that the vCAC 6.0 portal can present as a catalog of services for cloud users.
You can create custom workflow definitions using vCAC Designer. The vCAC Designer console provides a visual workflow editor for customizing vCAC lifecycle workflows. The extensibility toolkits include a library of activities that serve as building blocks for custom workflows. Using the Advanced Service Designer, you can define new service offerings and publish them to the common catalog as catalog items.
Accessing vCAC
To create the service blueprints, access vCAC from a browser and log in. Each tenant has a unique URL to the vCAC console:
• The default tenant URL is in the following format:
https://hostname/shell-ui-app
where hostname is the fully qualified domain name (FQDN) of a vCAC host.
• The URL for additional tenants is in the following format:
https://hostname/shell-ui-app/org/tenantURL
where tenantURL is the URL name specified when the tenant was created.
This is the workspace in which the customer creates catalog services.
Creating a new service blueprint
The following steps demonstrate, at a high level, how to integrate the HaaS workflows into the vCAC self-service catalog by showing the creation of:
• Catalog services
• Blueprints
• Custom resources and resource actions
For more information, refer to the vCloud Automation Center Extensibility Guide.
To integrate the HaaS workflows into the vCAC self-service catalog, follow these steps:
1. From the main vCAC portal page, click Advanced Services to list all of the currently defined service blueprints.
2. Click the green "plus" symbol, shown in Figure 22, to create a new service blueprint.
Figure 22. Advanced Service Designer
Follow these steps to create a new service blueprint:
1. Select one of the imported Hadoop Cluster Creation workflows from the list.
2. Name the new service, and create a form to support user input for the required parameters. If required, delete the default form and create a new form.
3. Drag and drop any appropriate input fields onto the form.
4. Publish the new service to create the appropriate service definition in catalog management.
5. Assign a catalog management service to the new advanced service, and create the appropriate entitlement definition in catalog management, as shown in Figure 23.
Figure 23. Edit Entitlement window
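When scripting access to several tenants (for example, to hand each business unit its own catalog link), the two vCAC console URL formats given earlier in this section can be generated rather than typed. This is a small sketch; the function name is hypothetical, and the host and tenant names in the usage are placeholders.

```python
from typing import Optional
from urllib.parse import quote


def vcac_console_url(hostname: str, tenant_url_name: Optional[str] = None) -> str:
    """Build the vCAC console URL for the default tenant or a named tenant.

    hostname is the FQDN of a vCAC host; tenant_url_name is the URL name given
    when the tenant was created (omit it for the default tenant).
    """
    base = f"https://{hostname}/shell-ui-app"
    if not tenant_url_name:
        return base
    # Percent-encode the tenant URL name so it is always a single path segment.
    return f"{base}/org/{quote(tenant_url_name, safe='')}"
```

For example, `vcac_console_url("vcac.example.local", "finance")` returns `https://vcac.example.local/shell-ui-app/org/finance`.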
When these tasks are completed, the new service is available in the service catalog for the cloud administrator.
You can replace the default VMware logo icons in the service catalog with more suitable HaaS icons. Replacing the icons is the final stage of customization and ensures that the service catalog items are tailored to a specific function or application. To do this, open the Catalog Management menu, select the Catalog Items list box, select the configure an icon option, and then browse to and select a new icon.
After the configuration stages have been performed within vCAC, the service catalog is available to provision HaaS items, as shown in Figure 24.
Figure 24. vCAC Service Catalog showing Hadoop as a Service
Chapter 6 Use Cases: EMC Hybrid Cloud IaaS
This chapter presents the following topics:
Overview
IaaS – storage services
Monitoring and capacity planning
Metering and chargeback
Overview
This chapter covers EMC Hybrid Cloud IaaS and other use cases that extend the solution's functionality beyond provisioning virtual machines that consume resources. From time to time, additional physical resources will be required to support the extension of a Hadoop environment. The following sections show how EMC Hybrid Cloud storage provisioning workflows can be used to create additional resources on demand by provisioning additional storage as required, and how the VMware vCenter Operations Manager (vC Ops) tool set can be used to analyze consumed resources, provide capacity planning, model scenarios that add physical resources, and increase virtual machine and node capacity.
IaaS – storage services
Overview
Storage is provisioned, allocated, and consumed by different cloud users in this solution. For vCAC IaaS users, the storage services provided in the vCAC service catalog provision storage resources that will be allocated to and consumed by other cloud users.
After the storage resources are available, fabric group administrators can assign them to business groups. Creators of virtual machine blueprints (business group managers) can then configure their blueprints to use those storage resources for the virtual machine disks. When cloud users provision virtual machines, they consume the storage and, depending on their entitlements, may choose the storage service for their virtual machines.
Use case 1: Storage provisioning
This use case demonstrates how ViPR software-defined storage is provisioned for the hybrid cloud from the VMware vCAC self-service catalog.
1. To provision block or file storage from the vCAC self-service portal, select the Provision Cloud Storage item from the vCAC service catalog, as shown in Figure 25.
Figure 25. Storage Services - Provision cloud storage
The storage service blueprint can be created with the vCAC anything-as-a-service (XaaS) functionality in the vCAC Advanced Service Designer. The storage services are backed by EMC ViPR provisioning workflows, which vCO presents to the vCAC service catalog. Storage provisioned by the IaaS user enables the fabric group administrator to make storage resources available to their business group.
The storage provisioning request requires very little input from the vCAC IaaS user. The main inputs are:
• Datastore type: VMFS or NFS
• Datastore size
• vCenter cluster
• Storage tier
All of these inputs except the datastore (LUN) size are selected from pre-populated list boxes. The items in these lists are determined by the cluster resources available through vCenter and the virtual pools available in ViPR. After entering a description and a reason for the storage provisioning request, enter your password.
Because vCenter Server can manage multiple ESXi clusters, you must choose the relevant vCenter cluster so that the provisioning operation knows where to assign the storage device. Select a vCenter cluster from the next screen, as shown in Figure 26.
Figure 26. Provision Cloud Storage – select vCenter cluster
2. Select the type of datastore you require from the list of available storage types, as shown in Figure 27. A VMFS datastore requires block storage, while an NFS datastore requires file storage. Other data services, such as disaster recovery and continuous availability, are displayed only if they are detected in the underlying infrastructure.
Figure 27. Storage Provisioning – Select datastore type
3. Select the storage offering from which the new storage device should be provisioned. The list of available offerings depends on two things: the selected datastore type (VMFS or NFS), and the matching virtual pools available from the ViPR virtual array. In this example, a single NFS-based ViPR virtual pool is available, and its available capacity is also displayed to the user, as shown in Figure 28. The listed storage pools have been configured in the EMC ViPR virtual array, and their storage capabilities are associated with storage profiles created in vCenter.
Figure 28. Storage provisioning – Choose ViPR storage pool
4. Enter the size required for the new storage, in GB, as shown in Figure 29.
Figure 29. Storage provisioning – Enter storage size
5. The fabric group administrator must reserve the new storage pool for use by the business group, as shown in Figure 30.
Figure 30. Provision Storage – Storage Reservation for vCAC Business Group
The automated process notifies the fabric group administrator by email when the storage is ready and available in vCAC. The administrator can then assign capacity reservations on the device for use by the business group.
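The request inputs and the datastore-type rule in steps 1 to 5 can be summarized in a small model. The following Python sketch is purely illustrative. The class names, field names, cluster name, and pool names (VirtualPool, StorageRequest, HaaS-Cluster01, Tier2-NFS, and so on) are hypothetical and are not part of the vCAC, vCO, or ViPR APIs. The sketch makes two assumptions: VMFS maps to block pools and NFS maps to file pools, as described in step 2, and the size is the only free-form input.

```python
from dataclasses import dataclass

# Hypothetical model of the "Provision Cloud Storage" request; not a vCAC or ViPR API.

# Step 2: a VMFS datastore needs block storage, an NFS datastore needs file storage.
STORAGE_KIND = {"VMFS": "block", "NFS": "file"}


@dataclass
class VirtualPool:
    name: str       # hypothetical ViPR virtual pool name
    kind: str       # "block" or "file"
    free_gb: int    # available capacity shown to the requester


@dataclass
class StorageRequest:
    datastore_type: str   # "VMFS" or "NFS"
    size_gb: int          # the only free-form input
    vcenter_cluster: str
    pool_name: str        # storage offering chosen in step 3


def matching_pools(datastore_type, pools):
    """Step 3: offer only the pools whose storage kind fits the datastore type."""
    kind = STORAGE_KIND[datastore_type]
    return [p for p in pools if p.kind == kind]


def validate(request, clusters, pools):
    """Return a list of problems with the request; an empty list means it is valid."""
    if request.datastore_type not in STORAGE_KIND:
        return [f"unsupported datastore type: {request.datastore_type}"]
    errors = []
    if request.vcenter_cluster not in clusters:
        errors.append(f"unknown vCenter cluster: {request.vcenter_cluster}")
    offered = {p.name: p for p in matching_pools(request.datastore_type, pools)}
    pool = offered.get(request.pool_name)
    if pool is None:
        errors.append(f"{request.pool_name} does not offer {request.datastore_type} storage")
    elif not 0 < request.size_gb <= pool.free_gb:
        errors.append(f"size must be between 1 and {pool.free_gb} GB")
    return errors


# Example with made-up clusters and pools.
pools = [VirtualPool("Tier2-NFS", "file", 2048), VirtualPool("Tier1-Block", "block", 4096)]
clusters = {"HaaS-Cluster01"}
print(validate(StorageRequest("NFS", 500, "HaaS-Cluster01", "Tier2-NFS"), clusters, pools))
# prints []
```

The catalog item restricts every input except size to pre-populated choices. As a result, most of the remaining validation is a capacity check against the selected pool.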
In this example, several required input values, such as the LUN or datastore name, are hidden from the user during the storage provisioning request. The orchestration process and its logic fix and manage these values to ensure consistency.
In addition to provisioning the storage to the ESXi cluster at the vSphere layer, this solution further automates the integration of the new storage into the vCAC layer. The ViPR storage provider automatically tags the storage device with the appropriate storage profile based on its storage capabilities. The remaining automated steps in this solution are:
• vCAC rediscovery of resources under the vCenter endpoint
• Assignment of a vCAC storage reservation policy to the new datastore
• Notification of the vCAC fabric group administrator that the new datastore is available
Use case 2: Select virtual machine storage
This use case demonstrates how cloud users consume the available storage service offerings. It is part of the broader virtual machine deployment use case, but it relates directly to how business group managers and users manage the storage service offerings available to them.
VMware vCAC business group managers and users can select the appropriate storage for their virtual machines through the VMware vCAC user portal. Business group managers can set the storage type for the virtual machine disks when they create a virtual machine blueprint. As shown in Figure 31, the relevant storage reservation policy can be applied to each virtual disk.
Figure 31. Set storage reservation policy for virtual machine disks
After the storage reservation policy is set, the blueprint always deploys this virtual machine and its virtual disks to that storage type. Users may need more control at deployment time. In that case, the business group manager can select the Allow user to see and change storage reservation policies checkbox, which lets business group users reconfigure the storage reservation policies during deployment.
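The effect of the Allow user to see and change storage reservation policies option can be pictured as a default-plus-override rule. The following Python sketch makes two assumptions: the blueprint stores one reservation policy per virtual disk, and users may override those defaults only when the option is selected. The Blueprint class, the disk labels, and the tier-style policy names are hypothetical and do not represent the vCAC API.

```python
from dataclasses import dataclass

# Hypothetical model of a blueprint's per-disk storage reservation policies and the
# "Allow user to see and change storage reservation policies" option.
# Names are illustrative; this is not the vCAC API.


@dataclass
class Blueprint:
    disk_policies: dict                  # virtual disk -> storage reservation policy
    allow_user_change: bool = False      # the checkbox described above


def resolve_disk_policies(blueprint, user_choices=None):
    """Return the reservation policy each virtual disk will be deployed to."""
    user_choices = user_choices or {}
    if user_choices and not blueprint.allow_user_change:
        raise PermissionError("this blueprint does not let users change storage policies")
    resolved = dict(blueprint.disk_policies)   # blueprint defaults
    for disk, policy in user_choices.items():
        if disk not in resolved:
            raise KeyError(f"blueprint has no disk {disk!r}")
        resolved[disk] = policy                # user override at deployment time
    return resolved


locked = Blueprint({"Hard disk 1": "Tier 2", "Hard disk 2": "Tier 1"})
flexible = Blueprint({"Hard disk 1": "Tier 2", "Hard disk 2": "Tier 1"}, allow_user_change=True)
print(resolve_disk_policies(flexible, {"Hard disk 2": "Tier 3"}))
# prints {'Hard disk 1': 'Tier 2', 'Hard disk 2': 'Tier 3'}
```

Calling resolve_disk_policies(locked, {...}) raises PermissionError. This reflects the default behavior, in which the blueprint always deploys each disk to the storage type chosen by the business group manager.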
Use case 3: Metering storage services
This solution uses VMware IT Business Management Suite (ITBM) to provide chargeback information on the storage service offerings for the hybrid cloud. Through its integration with VMware vCenter and vCAC, ITBM enables the cloud administrator to track the utilization of storage resources provided by EMC ViPR automatically.
The EMC ViPR VASA provider in vCenter automatically captures the underlying storage capabilities of LUNs provisioned from virtual pools on the EMC ViPR virtual array. Storage profiles are created based on these storage capabilities, which are aligned with the storage service offerings. This integration enables ITBM to automatically discover and group datastores based on predefined storage service levels.
In this solution, we created a separate virtual machine storage profile for each storage service offering, as shown in Figure 32.
Figure 32. Create new virtual machine storage profile for Tier 2 storage
The storage capabilities are displayed automatically in vSphere, as shown in Figure 33, where Tier 2 EMC ViPR storage supports a datastore.
Figure 33. Automatic discovery of storage capabilities using EMC ViPR Storage Provider
Note: Storage capabilities are visible only in the traditional vSphere client, not in the web client. Also, the web client uses virtual machine storage policies in place of virtual machine storage profiles.
After the EMC ViPR Storage Provider has automatically configured the datastores with the appropriate storage profiles, the datastores can be grouped and managed in ITBM according to their storage profile. Figure 34 shows that ITBM discovers the storage profiles created in vCenter. This allows the business management administrator to group tiered datastores provisioned with ViPR and set the monthly cost per GB as needed.
Figure 34. VMware ITBM chargeback based on storage profile of datastore
Summary
VMware vCAC can provide a storefront of storage services for cloud users. These service catalog items deploy EMC ViPR software-defined storage services based on multiple block and file storage service offerings across EMC VNX and VMAX storage arrays. Each service offers different levels of availability, capacity, and performance to satisfy the operational requirements of different lines of business. This solution combines EMC ViPR and FAST-enabled storage service offerings on EMC storage arrays with VMware vSphere to simplify storage operations for hybrid cloud consumers.
Monitoring and capacity planning
Monitoring
The vCenter Operations Management Suite has functions that can help HaaS administrators achieve the following goals:
• Eliminate or significantly reduce manual problem-solving effort in the environment.
• Proactively manage core service and cloud infrastructure performance, and use infrastructure resources optimally.
• Provide proactive warnings about performance issues before problems affect the end user.
Real-time performance dashboards help service providers meet their SLAs by highlighting potential performance issues before end users notice them.
Infrastructure maintenance and operations teams need end-to-end visibility and intelligence to make fast, informed operational decisions that proactively ensure service levels in cloud environments. They must get to the root cause of performance problems quickly, optimize capacity in real time, and maintain compliance in a constantly changing environment. The vCenter Operations Management Suite offers many features that deliver quality of service, operational efficiency, and continuous compliance for your dynamic cloud infrastructure and business-critical applications.
Capacity planning
This section describes capacity planning functions that can help you predict how the underlying infrastructure will be affected by new HaaS deployments or by upgrades of current HaaS instances with new services.
Forecasting capacity risks in vCenter Operations Manager involves creating what-if scenarios to examine the demand for and supply of resources in the cloud infrastructure. A what-if scenario models how capacity and load might change if the number of ESX hosts, storage resources, or virtual machines in the environment increases or decreases, without making any actual changes to your virtual infrastructure. If you then implement the scenario, you already know your capacity requirements.
To create a what-if scenario, you can use models and profiles based on current resource consumption in the existing environment. Alternatively, you can manually define the amounts of virtual machine RAM, storage, CPU, and utilization in a new consumption profile, as shown in Figure 35, to predict the potential impact of growth.
Figure 35. Choosing virtual machine consumption models and profiles
When you define a new virtual machine profile, you can make detailed specifications that include specific resource utilizations, reservations, and limits to get as accurate a projection as possible, as shown in Figure 36.
Figure 36. Specifying configuration and projected capacity usage of new virtual machines
Figure 37 shows that there are insufficient resources for a planned deployment scenario of either 50 or 85 new virtual machines. In this case, we can provision new vSphere hosts using vCAC services, as described in previous sections.
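The following sketch shows the demand-versus-supply reasoning behind a result like the one in Figure 37. It is a simplified model under stated assumptions, not the vCenter Operations Manager algorithm. Each virtual machine's projected demand for a resource is its configured amount times its expected utilization, raised to any reservation and capped at any limit. A scenario is short of a resource when fewer virtual machines fit than are planned. The profile values and free capacities are invented for illustration.

```python
import math
from dataclasses import dataclass

# Simplified, hypothetical what-if model. vCenter Operations Manager uses its own
# analytics; this only illustrates comparing projected demand with free supply.


@dataclass
class ResourceSpec:
    configured: float          # amount configured per VM (for example GHz or GB)
    utilization: float         # expected fraction of the configured amount in use
    reservation: float = 0.0   # guaranteed minimum per VM
    limit: float = math.inf    # hard cap per VM

    def projected_demand(self):
        """Expected use, raised to the reservation and capped at the limit."""
        return min(self.limit, max(self.reservation, self.configured * self.utilization))


def max_new_vms(profile, free_capacity):
    """Number of VMs of this profile that fit in the free capacity, per resource."""
    fits = {}
    for resource, spec in profile.items():
        demand = spec.projected_demand()
        fits[resource] = math.inf if demand == 0 else math.floor(free_capacity[resource] / demand)
    return fits


def shortfalls(profile, free_capacity, count):
    """Resources that would run out if `count` VMs of this profile were deployed."""
    return sorted(r for r, n in max_new_vms(profile, free_capacity).items() if n < count)


# Made-up profile for one Hadoop node VM and made-up free cluster capacity.
profile = {
    "cpu_ghz": ResourceSpec(configured=8.0, utilization=0.6, reservation=2.0),
    "ram_gb": ResourceSpec(configured=32, utilization=0.75, reservation=16),
    "disk_gb": ResourceSpec(configured=200, utilization=0.5),
}
free = {"cpu_ghz": 300, "ram_gb": 1500, "disk_gb": 20000}
for count in (50, 85):
    print(count, shortfalls(profile, free, count))
# prints:
# 50 []
# 85 ['cpu_ghz', 'ram_gb']
```

With these invented numbers, 62 node VMs fit by CPU and by RAM. A 50-VM scenario therefore passes, while an 85-VM scenario is short of CPU and RAM, which is the kind of result summarized in Figure 37.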
Figure 37. Capacity summary showing insufficient CPU and RAM resources
Before you provision new hardware resources, you can create hardware change scenarios to determine the effect of adding, removing, or updating hardware capacity in a vSphere cluster. You can create a scenario that models changes to hosts and datastores, as shown in Figure 38 and Figure 39.
Figure 38. Specifying number of hosts and amount of CPU and memory
Figure 39. Specifying datastore size
The what-if scenario capacity planning function allows you to compare how adding different numbers of virtual machines and amounts of hardware will affect your actual environment, as shown in Figure 40.
Figure 40. Compared scenarios
Capacity planning example
In a planning exercise, assume that you:
• Have a request to deploy an additional 45 Hadoop node instances in the existing HaaS.
• Plan to purchase blade servers that comply with a certain specification.
• Want to deploy an additional 25 Hadoop clusters.
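The comparison in this planning example consists of individual columns plus a Combined Scenarios column, and it can be sketched in Python. This is a hypothetical model, not how vCenter Operations Manager computes its results. Each scenario either deploys virtual machines with a fixed per-VM demand or adds host capacity, and the combined column applies all changes at once. The capacities, per-VM demands, and blade specification are made up. The two deployment requests are treated as 45 and 25 virtual machines, which is how the combined expansion of 70 virtual machines adds up.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of comparing what-if scenarios one at a time and
# combined. All numbers are made up; a negative value means a shortfall.


@dataclass
class Scenario:
    name: str
    added_vms: int = 0                                   # VMs this change deploys
    added_supply: dict = field(default_factory=dict)     # capacity this change adds


def headroom(base_free, per_vm, scenarios):
    """Free capacity per resource after applying the given scenarios together."""
    result = dict(base_free)
    for s in scenarios:
        for resource, amount in s.added_supply.items():
            result[resource] += amount
        for resource, demand in per_vm.items():
            result[resource] -= demand * s.added_vms
    return result


def compare(base_free, per_vm, scenarios):
    """One column per individual scenario, plus a combined column."""
    columns = {s.name: headroom(base_free, per_vm, [s]) for s in scenarios}
    columns["Combined"] = headroom(base_free, per_vm, scenarios)
    return columns


base_free = {"cpu_ghz": 200, "ram_gb": 1000}
per_vm = {"cpu_ghz": 4, "ram_gb": 24}
scenarios = [
    Scenario("45 Hadoop nodes", added_vms=45),
    Scenario("25 Hadoop clusters", added_vms=25),   # counted as 25 VMs, so 45 + 25 = 70
    Scenario("8 new blades", added_supply={"cpu_ghz": 8 * 40, "ram_gb": 8 * 256}),
]
for name, free in compare(base_free, per_vm, scenarios).items():
    print(name, free)
```

Here the 45-node request alone leaves RAM at -80 GB, which is a shortfall. With the blade purchase included, the Combined column still shows 240 GHz of CPU and 1368 GB of RAM free after all 70 virtual machines are added.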
In Figure 41, each column shows how an individual change affects resources in your environment. The Combined Scenarios column shows the cumulative effect of the hardware purchase and an overall expansion of 70 virtual machines.
Figure 41. Combined scenarios
Metering and chargeback
VMware ITBM provides cloud administrators with comprehensive metering and cost information across physical and virtual resources in the EMC Hybrid Cloud environment. Besides calculating the cost of physical components such as storage, compute, and networking resources, you can also include and configure other factors that affect the overall cost of your cloud environment, as shown in Figure 42. These factors include operating system licensing, maintenance, labor, and facilities costs.
Figure 42. Categorized hybrid cloud environment cost overview
ITBM is integrated into the vCAC portal for the Hadoop administrator and presents a dashboard overview of the hybrid cloud infrastructure.
VMware ITBM Standard Edition uses its own reference database, preloaded with industry-standard and vendor-specific data, to generate base prices for virtual CPU (vCPU), RAM, and storage. vCAC automatically consumes these prices as the default costs for CPU, RAM, and storage, and the cloud administrator can change them as appropriate. This eliminates the need to manually configure cost profiles in vCAC and assign them to compute resources.
ITBM is also integrated with vCenter and can import existing resource hierarchies, folder structures, and vCenter tags to associate EMC Hybrid Cloud resource usage with business units, departments, and projects.
Infrastructure resources consumed by HaaS instances and hosted applications are provided by dedicated vSphere clusters with associated vSphere hosts and datastores. ITBM provides detailed information about:
• The number of vSphere hosts in the vSphere cluster and the number of virtual machines on each host
• CPU and RAM capacity and utilization of the vSphere cluster
• Overall cost of the compute resources provided by the dedicated vSphere cluster
• Cluster cost by virtual machine
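The Cluster cost by virtual machine view can be approximated with a rate-based model. The following Python sketch is a simplification under stated assumptions. It charges each virtual machine for its allocated vCPU, RAM, and storage at fixed monthly unit rates, which stand in for the base prices described above. The rates, host names, and VM names are placeholders. The sketch does not reproduce ITBM's actual cost allocation, which can also include licensing, maintenance, labor, and facilities.

```python
from collections import Counter
from dataclasses import dataclass

# Simplified, hypothetical chargeback: monthly unit rates stand in for the base
# prices ITBM derives from its reference database. The rates are placeholders.
RATES = {"vcpu": 20.0, "ram_gb": 5.0, "storage_gb": 0.10}


@dataclass
class VM:
    name: str
    host: str
    vcpu: int
    ram_gb: int
    storage_gb: int


def vm_monthly_cost(vm, rates=RATES):
    """Cost of one VM from its allocated vCPU, RAM, and storage."""
    return (vm.vcpu * rates["vcpu"]
            + vm.ram_gb * rates["ram_gb"]
            + vm.storage_gb * rates["storage_gb"])


def cluster_report(vms, rates=RATES):
    """VMs per host, cost by VM, and total cost for one vSphere cluster."""
    cost_by_vm = {vm.name: round(vm_monthly_cost(vm, rates), 2) for vm in vms}
    return {
        "vms_per_host": dict(Counter(vm.host for vm in vms)),
        "cost_by_vm": cost_by_vm,
        "total": round(sum(cost_by_vm.values()), 2),
    }


vms = [
    VM("hadoop-master-01", "esxi-01", vcpu=8, ram_gb=64, storage_gb=200),
    VM("hadoop-worker-01", "esxi-01", vcpu=4, ram_gb=32, storage_gb=500),
    VM("hadoop-worker-02", "esxi-02", vcpu=4, ram_gb=32, storage_gb=500),
]
print(cluster_report(vms))
```

With these placeholder rates, the master VM costs 500.0 and each worker costs 290.0, giving a cluster total of 1080.0 per month.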
The Clusters tab provides insight into the cost of the vSphere cluster resources consumed by Hadoop cluster instances. You can monitor costs while provisioning new hosts, as shown in Figure 43.
Figure 43. vSphere Cluster cost overview
The Datastores tab provides insight into the cost of the storage resources consumed by an HaaS instance. The name of a datastore provisioned by vCAC storage services includes the cluster name as a prefix. Sorting by datastore name therefore lists the names and costs of the datastores provisioned and assigned to hosts in the vSphere cluster, as shown in Figure 44.
Figure 44. Storage cost overview
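Combining the per-GB tier pricing from Use case 3 with the cluster-name prefix gives a simple way to total the storage cost of one HaaS instance. The following Python sketch makes three assumptions: each datastore carries one storage profile, each profile has a flat monthly price per GB of provisioned capacity, and datastore names follow a cluster-prefix-plus-suffix pattern. The naming pattern, the prices, and the datastore names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Datastores tab: per-GB monthly prices are keyed by
# storage profile (as set per tier in Use case 3), and datastore names are assumed
# to start with the vSphere cluster name. Prices and names are made up.
PRICE_PER_GB = {"Tier 1": 0.50, "Tier 2": 0.25, "Tier 3": 0.10}


@dataclass
class Datastore:
    name: str
    profile: str        # storage profile assigned via the ViPR storage provider
    capacity_gb: int


def cluster_storage_costs(datastores, cluster_prefix, prices=PRICE_PER_GB):
    """Return (name, monthly cost) rows sorted by name, plus the cluster total."""
    rows = [(ds.name, round(ds.capacity_gb * prices[ds.profile], 2))
            for ds in datastores if ds.name.startswith(cluster_prefix)]
    rows.sort(key=lambda row: row[0])   # sort by datastore name
    return rows, round(sum(cost for _, cost in rows), 2)


datastores = [
    Datastore("HaaS01-DS02", "Tier 2", 1024),
    Datastore("HaaS01-DS01", "Tier 1", 500),
    Datastore("Web01-DS01", "Tier 3", 2048),
]
rows, total = cluster_storage_costs(datastores, "HaaS01-")
print(rows, total)
# prints [('HaaS01-DS01', 250.0), ('HaaS01-DS02', 256.0)] 506.0
```

The prefix passed in the example includes the separator ("HaaS01-" rather than "HaaS01"). This avoids accidentally matching datastores that belong to a cluster with a longer name, such as HaaS010.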
Chapter 7 Conclusion
This chapter presents the following topics:
• Summary
Summary
Pivotal Hadoop is designed to provide an easy-to-scale big data framework. To achieve this flexibility, HaaS is designed around the modular system components of Pivotal Hadoop. Using vCenter Orchestrator workflows, the administrator can provide catalog items with fixed cluster configurations or create dynamic workflows that can be called from the catalog. The person making the request determines the size of the nodes.
Elastic provisioning refers to the ability to provision flexible computing resources when and where they are required, and to scale resources up and down easily to match demand. Resource elasticity can relate to processing power, memory, storage, bandwidth, and so on. This document highlights the importance of an elastic and scalable IaaS platform for hosting dynamically changing, fast-growing big data platforms.
VMware vCenter Operations Manager enables you to deliver quality of service, attain operational efficiency, and assess current capacity while forecasting the effect of future HaaS deployments or upgrades on your cloud infrastructure.
HaaS clusters can grow to a large number of node instances. The node limit is governed by BDE configuration parameters and can be changed by adjusting them. It is therefore crucial to have proactive performance monitoring and capacity planning solutions in place. Comprehensive, dynamic, and fast-growing development environments such as Hadoop as a service depend on a stable underlying cloud compute infrastructure. That infrastructure must provide availability, scalability, flexibility, and performance to the big data platform and its services.
To address these challenges, this document has covered two topics. The first is simple provisioning from a self-service catalog. The second is the considerations for building scalable Hadoop-as-a-service environments on the elastic, easy-to-deploy IaaS infrastructure provided by the EMC Hybrid Cloud solution.
Appendix A References
This appendix presents the following topic:
• References
VMware references
The following VMware documents provide additional relevant information:
• Advanced Service Design: vCloud Automation Center 6.0
• Installing and Configuring VMware vCenter Orchestrator
• VMware Compatibility Guide
• VMware vSphere Big Data Extensions Administrator’s and User’s Guide: vSphere Big Data Extensions 1.0
• Installing and Configuring VMware vSphere Big Data Extensions (video)