Building Hadoop-as-a-Service
Using Pivotal HD, Project Serengeti, And EMC Isilon

Bernd Kaponig
EMC Solutions Group

© Copyright 2013 EMC Corporation. All rights reserved.

Roadmap Information Disclaimer
• EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).
• Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.
• Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.

Goal Of This Session
• Demonstrate How Greenplum/Pivotal HD, Project Serengeti And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context

What Is Hadoop-As-A-Service?
[Diagram: tenants and their data scientists consume three service layers offered by a service provider: Analytics-as-a-Service, Hadoop-as-a-Service, and Infrastructure-as-a-Service. Supporting functions shown: Tenant/User Management, Self-Service Portal, Metering, Provisioning.]

How “Classic” Hadoop Works
[Diagram: an HDFS client and a cluster on physical hardware with one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each). Numbered steps:]
1: Create file
2: Write
3: Replicate

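The three numbered steps can be sketched as a toy in-memory model. This is not Hadoop code: the `NameNode` and `DataNode` classes, the `write_file` helper, the 4-byte block size, and the rotating placement rule are all assumptions made for illustration. The replication factor of 3 matches the usual HDFS default. The point is only that the NameNode keeps metadata while the DataNodes hold the block copies.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def store(self, block_id, data):
        self.blocks[block_id] = data


@dataclass
class NameNode:
    # Metadata only: file path -> list of (block id, [DataNode names]).
    files: dict = field(default_factory=dict)

    def create(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def record(self, path, block_id, holders):
        self.files[path].append((block_id, holders))


def write_file(namenode, datanodes, path, data, block_size=4, replication=3):
    """Toy version of the three steps for a single file."""
    namenode.create(path)                                  # 1: Create file
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        chunk = data[offset:offset + block_size]
        # Hypothetical placement: rotate the first target per block.
        copies = min(replication, len(datanodes))
        targets = [datanodes[(index + k) % len(datanodes)] for k in range(copies)]
        targets[0].store(block_id, chunk)                  # 2: Write
        for replica in targets[1:]:                        # 3: Replicate
            replica.store(block_id, chunk)
        namenode.record(path, block_id, [t.name for t in targets])


nn = NameNode()
workers = [DataNode(f"worker{n}") for n in (1, 2, 3)]
write_file(nn, workers, "/demo.txt", b"hello hadoop")
stored_bytes = sum(len(b) for w in workers for b in w.blocks.values())
print(len(nn.files["/demo.txt"]), "blocks,", stored_bytes, "bytes stored")
```

With three workers and three copies per block, every worker ends up holding every block, so 12 bytes of data occupy 36 bytes of raw storage.
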
How “Classic” Hadoop Works
[Diagram: a MapReduce application and a cluster on physical hardware with one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each). Numbered steps:]
1: Submit job
2: Check for tasks
3: Retrieve task resources

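The job flow on this slide can be sketched the same way. The class and method names below are hypothetical and the model is deliberately minimal: one queue on the JobTracker, and TaskTrackers that ask for work in turn. The `resources` dictionary stands in for the job files a TaskTracker fetches before running a task; where those files live is an assumption of this sketch, not something the slide states.

```python
from collections import deque


class JobTracker:
    def __init__(self):
        self.pending = deque()

    def submit(self, job_name, num_tasks):            # 1: Submit job
        for i in range(num_tasks):
            self.pending.append(f"{job_name}-task{i}")

    def next_task(self):
        return self.pending.popleft() if self.pending else None


class TaskTracker:
    def __init__(self, name):
        self.name = name
        self.done = []

    def poll(self, tracker, resources):
        task = tracker.next_task()                     # 2: Check for tasks
        if task is None:
            return False
        job_name = task.rsplit("-task", 1)[0]
        _job_files = resources[job_name]               # 3: Retrieve task resources
        self.done.append(task)                         # the "run" is a no-op here
        return True


jt = JobTracker()
resources = {"wordcount": "wordcount.jar"}  # illustrative job files
jt.submit("wordcount", 5)
trackers = [TaskTracker(f"worker{n}") for n in (1, 2, 3)]

# Every tracker polls once per round until a round hands out no work.
while any([t.poll(jt, resources) for t in trackers]):
    pass

for t in trackers:
    print(t.name, t.done)
```

The pull model matters for the later slides: because workers ask the master for tasks, adding or removing workers changes capacity without changing how jobs are submitted.
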
How “Classic” Hadoop Works
• Physical Hardware Is Dedicated To Node
• Each Node Works With Local Storage
• Physical Network Topology
[Diagram: a cluster on physical hardware with one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each).]

Pivotal HD Architecture
[Diagram: the Pivotal HD Enterprise stack. Components shown: HDFS, HBase, Pig, Hive, Mahout, Map Reduce, Yarn, Zookeeper, Sqoop, Flume, DataLoader, Hadoop Virtualization (HVE), and Command Center (Configure, Deploy, Monitor, Manage). One group is labeled Resource Management & Workflow. A legend distinguishes Apache components from Pivotal HD Added Value.]

“Classic” Hadoop Challenges
• Hard To Deploy And Operate
• Poor Utilization Of Storage And/Or CPU
• Inefficient Data Staging And Loading Processes
• Lack Of Multi-Tenancy
• Backup And Disaster Recovery Missing
• Cluster Sprawl

The Road To Hadoop-As-A-Service
[Diagram: the path from a Physical, Dedicated, Single Tenant cluster to a Virtual, Multi-Tenant, Multi-App platform with Shared, Elastic Compute and Shared, Elastic Storage. The As-A-Service stage adds Tenant/User Management, Self-Service Portal, Metering, and Provisioning.]

Virtualized Hadoop With Local Storage
[Diagram: the master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each) run as VMs with VMDKs on virtual infrastructure. The physical hardware underneath is four servers with DAS.]

Virtualized Hadoop With Local Storage
• Unified Operations
• Shared Resources = Higher Utilization
• Elastic Resources = Faster Provisioning
[Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each), each placed on a server with DAS.]
5-10x Better CPU Utilization!

Hadoop Runs Well Virtualized
[Chart: elapsed time in seconds (lower is better), y-axis from 0 to 450, for TeraGen, TeraSort, and TeraValidate. Series: Native, 1 VM.]
Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf

Project Serengeti
• Deploy Hadoop Cluster In 10 Minutes
• Customize Hadoop Cluster
• One-Stop Command Center
• Open Source Project Backed By VMware, Launched In June 2012

Virtualized Hadoop With Shared Storage
[Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each) run on virtual infrastructure. The physical hardware is four servers with DAS.]

Virtualized Hadoop With Shared Storage
[Diagram: one master (JobTracker, NameNode) and three workers (TaskTracker and DataNode on each) run on virtual infrastructure. The physical hardware is two servers and two Isilon nodes, and a NameNode also appears on the physical layer.]

Virtualized Hadoop With Isilon
• Efficient Data Loading
• No SPOF
• End-To-End Data Protection
• Leading Storage Efficiency
• Native HDFS Support (Plus NFS, CIFS etc.)
• Independent Scaling
• Multi-App Scale-Out Storage Platform
[Diagram: one master (JobTracker) and three workers (TaskTracker on each) run on two servers. The NameNode and DataNode roles are shown on each of two Isilon nodes.]
Replication Overhead Only 20% Rather Than 200%!

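The 20% versus 200% callout is plain capacity arithmetic. The sketch below assumes that 200% refers to keeping two extra copies of every block (three-way replication, the usual HDFS default) and that 20% is the protection overhead quoted for Isilon on the slide. The function name and the 100 TB dataset are illustrative, not taken from the deck.

```python
def raw_capacity_tb(usable_tb, overhead_pct):
    """Raw capacity needed to store `usable_tb` of data at a given overhead."""
    return usable_tb * (100 + overhead_pct) / 100


dataset_tb = 100  # illustrative dataset size

replicated_tb = raw_capacity_tb(dataset_tb, 200)  # data plus two extra copies
protected_tb = raw_capacity_tb(dataset_tb, 20)    # data plus 20% protection
saved_tb = replicated_tb - protected_tb

print(f"200% overhead: {replicated_tb:.0f} TB raw")   # 300 TB raw
print(f" 20% overhead: {protected_tb:.0f} TB raw")    # 120 TB raw
print(f"difference:    {saved_tb:.0f} TB")            # 180 TB
```

Under these assumptions the same 100 TB of data needs 300 TB of raw disk in one case and 120 TB in the other, which is the storage-efficiency argument behind the slide.
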
Hadoop With Software-Defined Storage
[Diagram: on virtual infrastructure, one master (JobTracker), two workers (TaskTracker on each), and an Isilon VM carrying the NameNode and DataNode roles. The physical hardware is two servers and two storage systems labeled “Any NAS”.]

Making It As-A-Service
[Diagram: service functions and the products that provide them. Portal: Self-Service (WaveMaker), HD LCM (Serengeti), Workflows and Metering (vCenter Orchestrator & ChargeBack), User Mgmt and Tenant Mgmt (Postgres). Below the portal: HD Command Center with the Hadoop roles (JobTracker, TaskTrackers, NameNodes, DataNodes), and Infrastructure Mgmt. (vCenter).]

HDaaS Solution Component Interaction
[Diagram components: Data Scientist (Analyze, Manage); WaveMaker (Portal UI, API); Serengeti Client; vCenter Orchestrator (HDaaS Workflows); Serengeti Server; Serengeti Agents on the Pivotal HD Master and Pivotal HD Worker; Isilon REST API; Isilon (Platinum, Gold, Silver, Bronze); vCenter & ChargeBack (vC & CB APIs); Postgres (User/Tenant Mgmt). Legend: Serengeti, Pivotal HD. Numbered arrows:]
1: AAA
2: Invoke
3: Provision (shown three times)
4: Instantiate

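The numbered arrows suggest an ordered workflow, which can be outlined as below. Everything in this sketch is hypothetical: the function name, the token check standing in for AAA, and the three back-end labels (an assumption about where the three "Provision" arrows point). It calls no real Serengeti, vCenter, or Isilon API and only records the order of steps.

```python
def provision_cluster(user, token, request, valid_tokens):
    """Hypothetical orchestration that mirrors the slide's numbered steps."""
    log = []

    # 1: AAA - authenticate the caller before anything is provisioned.
    if valid_tokens.get(user) != token:
        raise PermissionError(f"authentication failed for {user}")
    log.append("1: AAA ok")

    # 2: Invoke - the portal hands the request to a workflow.
    log.append(f"2: invoke workflow for tier {request['tier']}")

    # 3: Provision - three back ends (assumed targets of the three arrows).
    for backend in ("compute", "storage", "chargeback"):
        log.append(f"3: provision {backend}")

    # 4: Instantiate - bring up the requested Hadoop nodes.
    log.append("4: instantiate master")
    for n in range(request["workers"]):
        log.append(f"4: instantiate worker {n + 1}")
    return log


steps = provision_cluster(
    user="ds1",
    token="secret",
    request={"tier": "GOLD", "workers": 2},
    valid_tokens={"ds1": "secret"},
)
for line in steps:
    print(line)
```

Keeping authentication as the first step means a rejected request never reaches any provisioning back end.
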
Tenant Isolation On Isilon
• One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist
• Access Controlled By Group And User Rights
• Leverage SmartQuotas To Set Resource Limits And Report Usage
• Separate Subnets For Tenants, Load-Balanced With SmartConnect
[Diagram: directory tree]
/ifs/HDFS
    /tenant1
        /ds1
    /tenant2
        /ds2

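The directory convention on this slide is easy to model. The sketch below only builds and checks paths under `/ifs/HDFS`; the helper names are hypothetical, and actual enforcement would come from OneFS user and group rights and SmartQuotas as the slide describes, not from code like this.

```python
from pathlib import PurePosixPath

HDFS_ROOT = PurePosixPath("/ifs/HDFS")  # root directory shown on the slide


def scientist_dir(tenant, scientist):
    """Build /ifs/HDFS/<tenant>/<scientist>, rejecting names that could escape it."""
    for part in (tenant, scientist):
        if not part or "/" in part or part in (".", ".."):
            raise ValueError(f"invalid directory name: {part!r}")
    return HDFS_ROOT / tenant / scientist


def is_inside_tenant(path, tenant):
    """True if `path` is the tenant's directory or lies below it."""
    p = PurePosixPath(path)
    if ".." in p.parts:  # PurePosixPath does not resolve "..", so refuse it
        return False
    tenant_root = HDFS_ROOT / tenant
    return p == tenant_root or tenant_root in p.parents


print(scientist_dir("tenant1", "ds1"))                                # /ifs/HDFS/tenant1/ds1
print(is_inside_tenant("/ifs/HDFS/tenant1/ds1/data.csv", "tenant1"))  # True
print(is_inside_tenant("/ifs/HDFS/tenant2/ds2", "tenant1"))           # False
```

A portal could use a check of this kind to decide which paths to show a user, while the storage system remains the authority on access.
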
Demo

Summary
• HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!
• Compute
  – Pivotal HD Brings Features Like Virtualization Support To Hadoop
  – Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
• Storage
  – Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS

What’s Next? HAWQ
[Diagram: the Pivotal HD Enterprise stack extended with HAWQ: Advanced Database Services. HAWQ components: ANSI SQL + Analytics, Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining. Stack components: HDFS, HBase, Pig, Hive, Mahout, Map Reduce, Yarn, Zookeeper, Sqoop, Flume, DataLoader, Hadoop Virtualization (HVE), and Command Center (Configure, Deploy, Monitor, Manage). One group is labeled Resource Management & Workflow. A legend distinguishes Apache components from Pivotal HD Added Value.]

Resources
• HDaaS Solution Collateral
  – White Paper, Presentations, Demos
  – http://powerlink.emc.com
• EMC Solution Pavilion
• Related Sessions
  – Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
  – Virtualize Big Data to Make the Elephant Dance
  – Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

Weitere ähnliche Inhalte

Was ist angesagt?

Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVMUse Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
Amazon Web Services
 
32984 cloud system la-bcs
32984 cloud system la-bcs32984 cloud system la-bcs
32984 cloud system la-bcs
gmazuel
 
Technical track 2: arcserve UDP for virtualization & cloud
Technical track 2: arcserve UDP for virtualization & cloudTechnical track 2: arcserve UDP for virtualization & cloud
Technical track 2: arcserve UDP for virtualization & cloud
arcserve data protection
 

Was ist angesagt? (20)

EMC HADOOP Storage Strategy
EMC HADOOP Storage StrategyEMC HADOOP Storage Strategy
EMC HADOOP Storage Strategy
 
Next Generation Data Protection Architecture
Next Generation Data Protection Architecture Next Generation Data Protection Architecture
Next Generation Data Protection Architecture
 
CA ARCserve Solution Overview
CA ARCserve Solution OverviewCA ARCserve Solution Overview
CA ARCserve Solution Overview
 
Arcserve Portfolio Technical Overview
Arcserve Portfolio Technical OverviewArcserve Portfolio Technical Overview
Arcserve Portfolio Technical Overview
 
Can $0.08 Change your View of Storage?
Can $0.08 Change your View of Storage?Can $0.08 Change your View of Storage?
Can $0.08 Change your View of Storage?
 
2/18 Technical Overview
2/18 Technical Overview2/18 Technical Overview
2/18 Technical Overview
 
Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution Brochure : The EMC Big Data Solution
Brochure : The EMC Big Data Solution
 
DataCore At VMworld 2016
DataCore At VMworld 2016DataCore At VMworld 2016
DataCore At VMworld 2016
 
The Value of NetApp with VMware
The Value of NetApp with VMwareThe Value of NetApp with VMware
The Value of NetApp with VMware
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
 
DataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage HypervisorDataCore Software - The one and only Storage Hypervisor
DataCore Software - The one and only Storage Hypervisor
 
Simplified and Efficient Cloud Disaster Recovery and Cloud Data Protection (S...
Simplified and Efficient Cloud Disaster Recovery and Cloud Data Protection (S...Simplified and Efficient Cloud Disaster Recovery and Cloud Data Protection (S...
Simplified and Efficient Cloud Disaster Recovery and Cloud Data Protection (S...
 
DataCore Technology Overview
DataCore Technology OverviewDataCore Technology Overview
DataCore Technology Overview
 
SAP on AWS
SAP on AWSSAP on AWS
SAP on AWS
 
Technical track 2_Virtualization & Cloud
Technical track 2_Virtualization & CloudTechnical track 2_Virtualization & Cloud
Technical track 2_Virtualization & Cloud
 
Oracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kódOracle Cloud - Infrastruktura jako kód
Oracle Cloud - Infrastruktura jako kód
 
Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVMUse Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
Use Hybrid Cloud to Streamline SAP with NetApp, AWS and SAP LVM
 
VIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGEVIPR SOFTWARE-DEFINED STORAGE
VIPR SOFTWARE-DEFINED STORAGE
 
32984 cloud system la-bcs
32984 cloud system la-bcs32984 cloud system la-bcs
32984 cloud system la-bcs
 
Technical track 2: arcserve UDP for virtualization & cloud
Technical track 2: arcserve UDP for virtualization & cloudTechnical track 2: arcserve UDP for virtualization & cloud
Technical track 2: arcserve UDP for virtualization & cloud
 

Andere mochten auch

7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoop
Taldor Group
 
Approaches for data_loading
Approaches for data_loadingApproaches for data_loading
Approaches for data_loading
Mahesh Benne
 

Andere mochten auch (20)

EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter Kit
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
 
2016 Digital Technology Discussion: Strategies, Trends, Future Visions
2016 Digital Technology Discussion: Strategies, Trends, Future Visions2016 Digital Technology Discussion: Strategies, Trends, Future Visions
2016 Digital Technology Discussion: Strategies, Trends, Future Visions
 
Basic introduction of Amazon Web Services (AWS)
Basic introduction of Amazon Web Services (AWS)Basic introduction of Amazon Web Services (AWS)
Basic introduction of Amazon Web Services (AWS)
 
Cloud computing & dbms
Cloud computing & dbmsCloud computing & dbms
Cloud computing & dbms
 
Basic understanding of aws
Basic understanding of awsBasic understanding of aws
Basic understanding of aws
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Vertica - Amazon Web Services
Vertica - Amazon Web ServicesVertica - Amazon Web Services
Vertica - Amazon Web Services
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec Isilon
 
7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoop
 
How to do surya namaskar
How to do surya namaskarHow to do surya namaskar
How to do surya namaskar
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
Vertica on Amazon Web Services
Vertica on Amazon Web ServicesVertica on Amazon Web Services
Vertica on Amazon Web Services
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Approaches for data_loading
Approaches for data_loadingApproaches for data_loading
Approaches for data_loading
 
Gartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesGartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud Services
 

Ähnlich wie Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

100424 teradata cloud computing 3rd party influencers2c
100424 teradata cloud computing 3rd party influencers2c100424 teradata cloud computing 3rd party influencers2c
100424 teradata cloud computing 3rd party influencers2c
guest8ebe0a8
 

Ähnlich wie Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon (20)

True Storage Virtualization with Software-Defined Storage
True Storage Virtualization with Software-Defined StorageTrue Storage Virtualization with Software-Defined Storage
True Storage Virtualization with Software-Defined Storage
 
Cloud Models, Considerations, & Adoption Techniques
Cloud Models, Considerations, & Adoption TechniquesCloud Models, Considerations, & Adoption Techniques
Cloud Models, Considerations, & Adoption Techniques
 
EMC & OpenStack: A View From Within
EMC & OpenStack: A View From WithinEMC & OpenStack: A View From Within
EMC & OpenStack: A View From Within
 
Greenplum feature
Greenplum featureGreenplum feature
Greenplum feature
 
100424 teradata cloud computing 3rd party influencers2c
100424 teradata cloud computing 3rd party influencers2c100424 teradata cloud computing 3rd party influencers2c
100424 teradata cloud computing 3rd party influencers2c
 
Software Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and StorageSoftware Defined Data Center: The Intersection of Networking and Storage
Software Defined Data Center: The Intersection of Networking and Storage
 
Operating Kubernetes at Scale (Australia Presentation)
Operating Kubernetes at Scale (Australia Presentation)Operating Kubernetes at Scale (Australia Presentation)
Operating Kubernetes at Scale (Australia Presentation)
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 
Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014Running your Spring Apps in the Cloud Javaone 2014
Running your Spring Apps in the Cloud Javaone 2014
 
VMAX : répondez aux niveaux de services applicatifs les plus élevés
VMAX : répondez aux niveaux de services applicatifs les plus élevésVMAX : répondez aux niveaux de services applicatifs les plus élevés
VMAX : répondez aux niveaux de services applicatifs les plus élevés
 
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQMassively Parallel Processing with Procedural Python - Pivotal HAWQ
Massively Parallel Processing with Procedural Python - Pivotal HAWQ
 
Maximize Availability With Oracle Database 12c
Maximize Availability With Oracle Database 12cMaximize Availability With Oracle Database 12c
Maximize Availability With Oracle Database 12c
 
EMC Hybrid Cloud Solutions with VMware
EMC Hybrid Cloud Solutions with VMwareEMC Hybrid Cloud Solutions with VMware
EMC Hybrid Cloud Solutions with VMware
 
What is expected from Chief Cloud Officers?
What is expected from Chief Cloud Officers?What is expected from Chief Cloud Officers?
What is expected from Chief Cloud Officers?
 
EMC's IT Transformation Journey ( EMC Forum 2014 )
EMC's IT Transformation Journey ( EMC Forum 2014 )EMC's IT Transformation Journey ( EMC Forum 2014 )
EMC's IT Transformation Journey ( EMC Forum 2014 )
 
OpenStack + CloudFoundry Austin Meetup
OpenStack + CloudFoundry Austin MeetupOpenStack + CloudFoundry Austin Meetup
OpenStack + CloudFoundry Austin Meetup
 
OS + CF Austin meetup
OS + CF Austin meetupOS + CF Austin meetup
OS + CF Austin meetup
 
Implementing Data Caching and Data Synching Using Oracle MAF
Implementing Data Caching and Data Synching Using Oracle MAFImplementing Data Caching and Data Synching Using Oracle MAF
Implementing Data Caching and Data Synching Using Oracle MAF
 
Transforming Mission Critical Applications
Transforming Mission Critical ApplicationsTransforming Mission Critical Applications
Transforming Mission Critical Applications
 
Desktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex CafeDesktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
Desktop, Embedded and Mobile Apps with PrismTech Vortex Cafe
 

Mehr von EMC

Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
EMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
EMC
 

Mehr von EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon

  • 1. Building Hadoop-as-a-Service Using Pivotal HD, Project Serengeti, And EMC Isilon Bernd Kaponig EMC Solutions Group © Copyright 2013 EMC Corporation. All rights reserved. 1
  • 2. Roadmap Information Disclaimer  EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).  Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.  Roadmap information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC NonDisclosure Agreement in place with your organization. © Copyright 2013 EMC Corporation. All rights reserved. 2
  • 3. Goal Of This Session: Demonstrate How Greenplum/Pivotal HD, Project Serengeti, And Isilon Can Work Together To Deliver Hadoop-as-a-Service Capabilities In A Public Or Private Service Provider Context.
  • 5. How “Classic” Hadoop Works (HDFS write path). Diagram: an HDFS client performs 1: Create file, 2: Write, 3: Replicate. Master: JobTracker, NameNode. Three Workers: TaskTracker, DataNode each. All nodes run on physical hardware.
  • 6. How “Classic” Hadoop Works (MapReduce job flow). Diagram: an MR application performs 1: Submit job, 2: Check for tasks, 3: Retrieve task resources. Master: JobTracker, NameNode. Three Workers: TaskTracker, DataNode each. All nodes run on physical hardware.
  • 7. How “Classic” Hadoop Works: Physical Hardware Is Dedicated To Node; Each Node Works With Local Storage; Physical Network Topology. Diagram: Master (JobTracker, NameNode) and three Workers (TaskTracker, DataNode each) on physical hardware.
  • 8. Pivotal HD Architecture. Diagram of Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage); Resource Management & Workflow; Hadoop Virtualization (HVE); DataLoader; HDFS; HBase; Pig, Hive, Mahout; MapReduce; Yarn; Zookeeper; Sqoop; Flume. Legend: Apache; Pivotal HD Added Value.
  • 9. “Classic” Hadoop Challenges: Hard To Deploy And Operate; Poor Utilization Of Storage And/Or CPU; Inefficient Data Staging And Loading Processes; Lack Of Multi-Tenancy; Backup And Disaster Recovery Missing; Cluster Sprawl.
  • 10. The Road To Hadoop-As-A-Service. Diagram labels: Physical; Virtual; Dedicated; Shared, Elastic Compute; Shared, Elastic Storage; Multi-Tenant; Single Tenant; Multi-App; As-A-Service. Service functions: Tenant/User Management; Self-Service Portal; Metering; Provisioning.
  • 11. Virtualized Hadoop With Local Storage. Diagram: in the virtual infrastructure, a Master (JobTracker, NameNode) and three Workers (TaskTracker, DataNode each) run as VM + VMDK; the physical hardware below is four Server + DAS.
  • 12. Virtualized Hadoop With Local Storage: Unified Operations; Shared Resources = Higher Utilization; Elastic Resources = Faster Provisioning. Callout: 5-10x Better CPU Utilization! Diagram: Master (JobTracker, NameNode) and three Workers (TaskTracker, DataNode each), each on Server + DAS.
  • 13. Hadoop Runs Well Virtualized. Chart: elapsed time in seconds (lower is better), Native vs. 1 VM, for TeraGen, TeraSort, and TeraValidate; y-axis from 0 to 450. Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf
  • 14. Project Serengeti: Deploy Hadoop Cluster In 10 Minutes; Customize Hadoop Cluster; One-Stop Command Center; Open Source Project Backed By VMware, Launched In June 2012.
  • 15. Virtualized Hadoop With Shared Storage. Diagram: in the virtual infrastructure, a Master (JobTracker, NameNode) and three Workers (TaskTracker, DataNode each); the physical hardware below is four Server + DAS.
  • 16. Virtualized Hadoop With Shared Storage. Diagram: in the virtual infrastructure, a Master (JobTracker, NameNode) and three Workers (TaskTracker, DataNode each); the physical hardware below is two Servers and two Isilon nodes, with a NameNode label at the physical layer.
  • 17. Virtualized Hadoop With Isilon: Efficient Data Loading; No SPOF; End-To-End Data Protection; Leading Storage Efficiency; Native HDFS Support (Plus NFS, CIFS etc.); Independent Scaling; Multi-App Scale-Out Storage Platform. Callout: Replication Overhead Only 20% Rather Than 200%! Diagram labels: Master (JobTracker); Workers (TaskTracker); Servers; Isilon (NameNode, DataNode).
  • 18. Hadoop With Software-Defined Storage. Diagram labels: in the virtual infrastructure, Master (JobTracker), Workers (TaskTracker), and an Isilon VM (NameNode, DataNode); the physical hardware below is two Servers and two "Any NAS" systems.
  • 19. Making It As-A-Service. Service functions: Self Service; Workflows; Metering; User Mgmt; Tenant Mgmt. Components: Portal; WaveMaker; HD LCM; Serengeti; vCenter O & CB; Postgres; HD Cmd Center; vCenter; Infrastr. Mgmt. Hadoop roles shown: JobTracker, TaskTracker, NameNode, DataNode.
  • 20. HDaaS Solution Component Interaction. Steps: 1: AAA; 2: Invoke; 3: Provision; 4: Instantiate. Diagram labels: Data Scientist (Analyze, Manage); Portal UI; Serengeti Client API; HDaaS Workflows; WaveMaker; vCenter Orchestrator; Serengeti Server; Serengeti Agents; Pivotal HD Master; Pivotal HD Worker; Isilon REST API; vCenter & ChargeBack (vC & CB APIs); User/Tenant Mgmt; Postgres; Platinum, Gold, Silver, Bronze.
  • 21. Tenant Isolation On Isilon: One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist; Access Controlled By Group And User Rights; Leverage SmartQuotas To Set Resource Limits And Report Usage; Separate Subnets For Tenants, Load-Balanced With SmartConnect. Diagram: /ifs/HDFS containing /tenant1, /tenant2, /ds1, /ds2.
  • 22. Demo
  • 23. Summary: HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service (Free!). Compute: Pivotal HD Brings Features Like Virtualization Support To Hadoop; Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems. Storage: Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS.
  • 24. What’s Next? HAWQ. Diagram: HAWQ Advanced Database Services (ANSI SQL + Analytics; Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining) on top of Pivotal HD Enterprise: Command Center (Configure, Deploy, Monitor, Manage); Resource Management & Workflow; Hadoop Virtualization (HVE); DataLoader; HDFS; HBase; Pig, Hive, Mahout; MapReduce; Yarn; Zookeeper; Sqoop; Flume. Legend: Apache; Pivotal HD Added Value.
  • 25. Resources: HDaaS Solution Collateral (White Paper, Presentations, Demos) at http://powerlink.emc.com; EMC Solution Pavilion. Related Sessions: "Hadoop for Powerful Processing of Unstructured Data for Valuable Insights"; "Virtualize Big Data to Make the Elephant Dance"; "Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight".
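
The storage points on slides 17 and 21 can be illustrated with a minimal Python sketch. It is not part of the original deck. It assumes the tenant layout is `/ifs/HDFS/<tenant>/<data scientist>`, as the slide 21 diagram suggests, reads the 200% figure as classic HDFS keeping three copies of every block, and takes the 20% figure from slide 17 at face value. The function names and the 100 TB example are hypothetical.

```python
from pathlib import PurePosixPath

# Root directory shown in the slide 21 diagram.
HDFS_ROOT = PurePosixPath("/ifs/HDFS")

# Protection overhead as a fraction of usable data.
# Assumption: 200% corresponds to three HDFS copies (one original plus
# two extra replicas); 20% is the figure quoted on slide 17.
HDFS_3X_OVERHEAD = 2.0
ISILON_OVERHEAD = 0.2


def scientist_dir(tenant: str, scientist: str) -> PurePosixPath:
    """Return the per-scientist directory: one directory per tenant,
    one subdirectory per data scientist (hypothetical helper)."""
    return HDFS_ROOT / tenant / scientist


def raw_capacity_tb(usable_tb: float, overhead: float) -> float:
    """Raw capacity needed to store usable_tb of data at the given
    protection overhead (hypothetical helper)."""
    return usable_tb * (1.0 + overhead)


if __name__ == "__main__":
    # Example tenant and scientist names taken from the slide labels.
    print(scientist_dir("tenant1", "ds1"))
    for label, overhead in (("3x HDFS replication", HDFS_3X_OVERHEAD),
                            ("Isilon (slide 17 figure)", ISILON_OVERHEAD)):
        print(f"{label}: {raw_capacity_tb(100, overhead):.0f} TB raw for 100 TB usable")
```

Under these assumptions, 100 TB of usable data needs about 300 TB of raw capacity at 200% overhead and about 120 TB at 20%. Access rights, SmartQuotas limits, and SmartConnect subnets from slide 21 are deliberately left out, since the deck does not show how they are configured.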