Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & Isilon
2. Roadmap Information Disclaimer
EMC makes no representation and undertakes no obligations with
regard to product planning information, anticipated product
characteristics, performance specifications, or anticipated release
dates (collectively, “Roadmap Information”).
Roadmap Information is provided by EMC as an accommodation to the
recipient solely for purposes of discussion and without intending to be
bound thereby.
Roadmap Information is EMC Restricted Confidential and is provided
under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.
© Copyright 2013 EMC Corporation. All rights reserved.
3. Goal Of This Session
Demonstrate How Greenplum/Pivotal HD, Project
Serengeti And Isilon Can Work Together To Deliver
Hadoop-as-a-Service Capabilities In A Public Or
Private Service Provider Context
5. How “Classic” Hadoop Works
[Diagram: HDFS write path. An HDFS Client performs "1: Create file" against the Master (Job Tracker, Name Node), then "2: Write" to a Worker (Task Tracker, Data Node), followed by "3: Replicate" to the other Workers. All nodes run on physical hardware.]
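The three numbered steps in the diagram can be sketched as a toy model. This is not Hadoop code: the `NameNode` and `DataNode` classes, the `write_file` helper, and the replication factor of 3 are illustrative assumptions chosen to mirror the labels on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy worker that stores file data on its own local disk."""
    name: str
    blocks: dict = field(default_factory=dict)

    def store(self, path: str, data: bytes) -> None:
        self.blocks[path] = data


@dataclass
class NameNode:
    """Toy master that tracks metadata only: which workers hold a file."""
    datanodes: list
    replication: int = 3  # assumed value for this sketch
    files: dict = field(default_factory=dict)

    def create_file(self, path: str) -> list:
        # Step 1: the client asks the master to create the file and
        # receives the workers that should hold the replicas.
        targets = self.datanodes[: self.replication]
        self.files[path] = [dn.name for dn in targets]
        return targets


def write_file(namenode: NameNode, path: str, data: bytes) -> list:
    targets = namenode.create_file(path)   # 1: Create file
    first, rest = targets[0], targets[1:]
    first.store(path, data)                # 2: Write to the first worker
    for dn in rest:                        # 3: Replicate to the others
        dn.store(path, first.blocks[path])
    return namenode.files[path]


workers = [DataNode(f"worker{i}") for i in range(1, 4)]
nn = NameNode(workers)
locations = write_file(nn, "/data/sample.txt", b"hello")
print(locations)
```

The point of the sketch is the division of labor: the master only records where data lives, while the data itself sits on the workers' local disks.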
6. How “Classic” Hadoop Works
[Diagram: MapReduce job flow. An MR App performs "1: Submit job" against the Master (Job Tracker, Name Node); the Workers (Task Tracker, Data Node) perform "2: Check for tasks" and "3: Retrieve task resources". All nodes run on physical hardware.]
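A minimal sketch of the three steps, assuming a polling model in which workers ask the master for work. The `JobTracker` and `TaskTracker` classes and their methods are hypothetical names for illustration, not the Hadoop API, and "task resources" is reduced here to the input block a task needs.

```python
from collections import deque


class JobTracker:
    """Toy master: splits a submitted job into tasks and hands them out."""

    def __init__(self):
        self.pending = deque()

    def submit_job(self, job_name, input_blocks):
        # 1: Submit job. One task is queued per input block.
        for block in input_blocks:
            self.pending.append((job_name, block))

    def next_task(self):
        # 2: Check for tasks. Returns None when the queue is empty.
        return self.pending.popleft() if self.pending else None


class TaskTracker:
    """Toy worker: polls the master and runs whatever it is given."""

    def __init__(self, name, local_blocks):
        self.name = name
        self.local_blocks = local_blocks  # block id -> text held locally
        self.done = []

    def poll(self, tracker):
        task = tracker.next_task()
        if task is None:
            return False
        job_name, block = task
        # 3: Retrieve task resources (in this toy model, the block's text).
        data = self.local_blocks.get(block, "")
        self.done.append((job_name, block, len(data.split())))
        return True


jt = JobTracker()
jt.submit_job("wordcount", ["b1", "b2", "b3"])
blocks = {"b1": "a b c", "b2": "d e", "b3": "f"}
workers = [TaskTracker("worker1", blocks), TaskTracker("worker2", blocks)]

# Every worker polls once per round; stop when no worker received a task.
while any([w.poll(jt) for w in workers]):
    pass

results = sorted(t for w in workers for t in w.done)
print(results)
```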
7. How “Classic” Hadoop Works
Physical Hardware Is Dedicated To Node
Each Node Works With Local Storage
Physical Network Topology
[Diagram: Master (Job Tracker, Name Node) and three Workers (Task Tracker, Data Node each), all on dedicated physical hardware.]
8. Pivotal HD Architecture
[Diagram: Pivotal HD Enterprise stack. Labels: Command Center (Configure, Deploy, Monitor, Manage); Resource Management & Workflow (Yarn, Zookeeper); HDFS; HBase; Map Reduce; Pig, Hive, Mahout; Sqoop; Flume; DataLoader; Hadoop Virtualization (HVE). Legend: Apache, Pivotal HD Added Value.]
9. “Classic” Hadoop Challenges
Hard To Deploy And Operate
Poor Utilization Of Storage And/Or CPU
Inefficient Data Staging And Loading Processes
Lack Of Multi-Tenancy
Backup And Disaster Recovery Missing
Cluster Sprawl
10. The Road To Hadoop-As-A-Service
[Diagram: progression toward Hadoop-as-a-Service. Labels: Physical, Virtual; Dedicated; Shared, Elastic Compute; Shared, Elastic Storage; Single Tenant, Multi-Tenant; Multi-App; As-A-Service; Tenant/User Management; Self-Service Portal; Metering; Provisioning.]
11. Virtualized Hadoop With Local Storage
[Diagram: each Hadoop node runs as a VM with a VMDK on virtual infrastructure. The Master VM hosts the Job Tracker and Name Node; each Worker VM hosts a Task Tracker and Data Node. The physical hardware underneath consists of servers with DAS.]
12. Virtualized Hadoop With Local Storage
[Diagram: Master (Job Tracker, Name Node) and Workers (Task Tracker, Data Node each) running on servers with DAS.]
Unified Operations
Shared Resources = Higher Utilization
Elastic Resources = Faster Provisioning
5-10x Better CPU Utilization!
13. Hadoop Runs Well Virtualized
[Chart: elapsed time in seconds (lower is better), Native vs. 1 VM, for the TeraGen, TeraSort, and TeraValidate benchmarks.]
Source: http://www.vmware.com/files/pdf/techpaper/VMW-HadoopPerformance-vSphere5.pdf
14. Project Serengeti
Deploy Hadoop Cluster In 10 Minutes
Customize Hadoop Cluster
One-Stop Command Center
Open Source Project Backed By VMware, Launched In June 2012
15. Virtualized Hadoop With Shared Storage
[Diagram: Master (Job Tracker, Name Node) and three Workers (Task Tracker, Data Node each) on virtual infrastructure, running on physical servers with DAS.]
16. Virtualized Hadoop With Shared Storage
[Diagram: Master (Job Tracker, Name Node) and three Workers (Task Tracker, Data Node each) on virtual infrastructure. The physical hardware layer shows servers, Isilon nodes, and a Name Node label.]
17. Virtualized Hadoop With Isilon
[Diagram: the virtualized Master (Job Tracker, Task Tracker) and Workers (Task Tracker) run on servers; the Name Node and Data Node labels appear on Isilon.]
Efficient Data Loading
No SPOF
End-To-End Data Protection
Leading Storage Efficiency
Native HDFS Support (Plus NFS, CIFS etc.)
Independent Scaling
Multi-App Scale-Out Storage Platform
Replication Overhead Only 20% Rather Than 200%!
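The overhead figures can be checked with simple arithmetic. The sketch below assumes that the 200% figure corresponds to keeping three full copies of every block and takes the 20% figure from the slide as given; the `raw_capacity_tb` helper and the 100 TB data set are illustrative.

```python
def raw_capacity_tb(usable_tb: float, overhead_pct: float) -> float:
    """Raw storage needed to hold `usable_tb` of data at a given
    protection overhead, expressed as a percentage of the data size."""
    return usable_tb * (1 + overhead_pct / 100)


def replication_overhead_pct(copies: int) -> float:
    """Overhead of keeping `copies` full copies of the data: every copy
    beyond the first is pure overhead."""
    return (copies - 1) * 100.0


data_tb = 100.0  # hypothetical data set size

three_copy_overhead = replication_overhead_pct(3)  # assumption: 3 copies
slide_overhead = 20.0                              # figure quoted on the slide

print(raw_capacity_tb(data_tb, three_copy_overhead))
print(raw_capacity_tb(data_tb, slide_overhead))
```

For 100 TB of data this works out to 300 TB of raw capacity at 200% overhead versus 120 TB at 20%.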
18. Hadoop With Software-Defined Storage
[Diagram: Master (Job Tracker) and Workers (Task Tracker each) on virtual infrastructure next to an Isilon VM carrying the Name Node and Data Node labels. The physical hardware layer shows servers and "Any NAS".]
20. HDaaS Solution Component Interaction
[Diagram: HDaaS component interaction. A Data Scientist works through the Portal UI (Analyze, Manage; WaveMaker) and the Serengeti Client API. Numbered steps: "1: AAA", "2: Invoke", "3: Provision" (shown three times), "4: Instantiate". Components: User/Tenant Mgmt (Postgres); HDaaS Workflows (vCenter Orchestrator); Serengeti Server; Serengeti Agents with Pivotal HD Master and Pivotal HD Worker; Isilon REST API and Isilon; vCenter & ChargeBack (vC & CB APIs). Service tiers: Platinum, Gold, Silver, Bronze. Legend: Serengeti, Pivotal HD.]
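The numbered steps on this slide can be read as a linear workflow. The sketch below is a hedged illustration only: every function and class name is hypothetical, the back ends are stubs, and grouping the three "3: Provision" arrows as compute, storage, and metering is an assumption about what the diagram shows. No real Serengeti, Isilon, or vCenter API is called.

```python
from dataclasses import dataclass, field

TIERS = ("PLATINUM", "GOLD", "SILVER", "BRONZE")  # tier labels from the slide


@dataclass
class Request:
    tenant: str
    user: str
    tier: str
    log: list = field(default_factory=list)


def authenticate(req: Request, known_users: set) -> None:
    # 1: AAA. Reject users the portal does not know about.
    if (req.tenant, req.user) not in known_users:
        raise PermissionError(f"unknown user {req.user!r}")
    req.log.append("1: AAA")


def invoke_workflow(req: Request) -> None:
    # 2: Invoke. The portal hands the request to the workflow engine.
    if req.tier not in TIERS:
        raise ValueError(f"unknown tier {req.tier!r}")
    req.log.append("2: Invoke")


def provision(req: Request) -> None:
    # 3: Provision. Assumed fan-out to three back ends, each a stub here.
    for backend in ("compute", "storage", "metering"):
        req.log.append(f"3: Provision {backend}")


def instantiate(req: Request) -> None:
    # 4: Instantiate. Start the Hadoop master and worker roles.
    req.log.append("4: Instantiate")


def handle(req: Request, known_users: set) -> list:
    authenticate(req, known_users)
    invoke_workflow(req)
    provision(req)
    instantiate(req)
    return req.log


users = {("tenant1", "ds1")}
steps = handle(Request("tenant1", "ds1", "GOLD"), users)
print(steps)
```

Keeping authentication first means an unknown user never reaches the provisioning stubs.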
21. Tenant Isolation On Isilon
[Diagram: directory tree rooted at /ifs/HDFS with tenant directories /tenant1 and /tenant2 and data scientist subdirectories /ds1 and /ds2.]
One Directory Within OneFS Per Tenant, One Subdirectory Per Data Scientist
Access Controlled By Group And User Rights
Leverage SmartQuotas To Set Resource Limits And Report Usage
Separate Subnets For Tenants, Load-Balanced With SmartConnect
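The directory convention can be sketched as a small path helper. This assumes the layout is `/ifs/HDFS/<tenant>/<data scientist>`, as the first bullet suggests; the function names are hypothetical, and the access check only models directory containment, not OneFS group and user rights.

```python
from pathlib import PurePosixPath

HDFS_ROOT = PurePosixPath("/ifs/HDFS")  # root directory shown on the slide


def _check(name: str) -> str:
    """Reject names that could escape the tenant directory."""
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid name {name!r}")
    return name


def tenant_dir(tenant: str) -> PurePosixPath:
    # One directory per tenant.
    return HDFS_ROOT / _check(tenant)


def scientist_dir(tenant: str, scientist: str) -> PurePosixPath:
    # One subdirectory per data scientist inside the tenant directory.
    return tenant_dir(tenant) / _check(scientist)


def is_inside_tenant(path: PurePosixPath, tenant: str) -> bool:
    """True if `path` is the tenant's directory or lies beneath it."""
    root = tenant_dir(tenant)
    return path == root or root in path.parents


p = scientist_dir("tenant1", "ds1")
print(p)
print(is_inside_tenant(p, "tenant1"), is_inside_tenant(p, "tenant2"))
```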
23. Summary
HDaaS Solution Is Your Jump-Start Kit To Hadoop-As-A-Service – Free!
Compute
Pivotal HD Brings Features Like Virtualization Support To Hadoop
Serengeti Allows “One-Click” Deployment Of Hadoop Clusters On vSphere Systems
Storage
Isilon Is The First And Only Enterprise-Ready, Scale-Out NAS That Natively Supports HDFS
24. What’s Next? HAWQ
[Diagram: Pivotal HD Enterprise stack with HAWQ (Advanced Database Services) added. HAWQ labels: ANSI SQL + Analytics; Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining. Stack labels: Command Center (Configure, Deploy, Monitor, Manage); Resource Management & Workflow (Yarn, Zookeeper); HDFS; HBase; Map Reduce; Pig, Hive, Mahout; Sqoop; Flume; DataLoader; Hadoop Virtualization (HVE). Legend: Apache, Pivotal HD Added Value.]
25. Resources
HDaaS Solution Collateral
– White Paper, Presentations, Demos
– http://powerlink.emc.com
EMC Solution Pavilion
Related Sessions
– Hadoop for Powerful Processing of Unstructured Data for Valuable Insights
– Virtualize Big Data to Make the Elephant Dance
– Taking Command of Big Data: Hadoop Analytics + Isilon Scale-Out Storage = One-Stop Solution for High Impact Business Insight