Architecting a Scalable Hadoop Platform:
Top 10 Considerations for Success
PRESENTED BY Sumeet Singh | April 15, 2015
Hadoop Summit 2015, Brussels
Introduction
§  Manages Hadoop products team at Yahoo
§  Responsible for Product Management, Strategy and
Customer Engagements
§  Managed Cloud Engineering products teams and
headed Strategy functions for the Cloud Platform
Group at Yahoo
§  MBA from UCLA and MS from Rensselaer
Polytechnic Institute (RPI)
Sumeet Singh
Sr. Director, Product Management
Cloud and Big Data Platforms
Platforms and Personalization Products
701 First Avenue,
Sunnyvale, CA 94089 USA
@sumeetksingh
Disclaimer
The considerations presented
here are my personal opinion,
driven purely from my
experiences working with cloud
technologies and services
Hadoop as Secure Shared Hosted Multi-tenant Platform
TV
PC
Phone
Tablet
Pushed Data
Pulled Data
Web Crawl
Social
Email
3rd Party Content
Data Highway
Hadoop Grid
BI, Reporting, Adhoc Analytics
Data
Content
Ads
No-SQL
Serving Stores
Serving
Platform Evolution (2006 – 2015)
[Chart: # Servers (axis 0 to 50,000) and Raw HDFS in PB (axis 0 to 700) by Year, 2006 to 2015]
Milestones along the timeline:
§  Yahoo! commits to scaling Hadoop for production use
§  Research workloads in Search and Advertising
§  Production (modeling) with machine learning & WebMap
§  Revenue systems with security, multi-tenancy, and SLAs
§  Open sourced with Apache
§  Hortonworks spinoff for enterprise hardening
§  Nextgen Hadoop (H 0.23 YARN)
§  New services (HBase, Storm, Spark, Hive)
§  Increased user-base with partitioned namespaces
§  Apache H2.6 (Scalable ML, Latency, Utilization, Productivity)

          Servers   Use Cases
Hadoop    43,000    300
HBase     3,000     70
Storm     2,000     50
Top 10 Considerations for Scaling Hadoop-based Platform
1  On-Premise or Public Cloud
2  Total Cost Of Ownership (TCO)
3  Hardware Configuration
4  Network
5  Software Stack
6  Security and Account Management
7  Data Lifecycle Management and BCP
8  Metering, Audit and Governance
9  Integration with External Systems
10  Debunking Architectural Myths
On-Premise or Public Cloud – Deployment Models
On-Premise
Private (dedicated) Clusters
§  Large demanding use cases
§  New technology not yet platformized
§  Data movement and regulation issues
Hosted Multi-tenant (Private Cloud) Clusters
§  Source of truth for all of the org's data
§  App delivery agility
§  Operational efficiency and cost savings through economies of scale
Public Cloud
Hosted Compute Clusters
§  For cases where more cost effective than on-premise
§  Time to market/results matter
§  Exploration and learning
Purpose-built Big Data Clusters
§  For performance, tighter integration with tech stack
§  Value added services such as monitoring, alerts, tuning and common tools
On-Premise or Public Cloud – Selection Criteria
Criteria: Cost
On-Premise
§  Fixed, does not vary with utilization
§  Favors scale and 24x7 centralized ops
Public Cloud
§  Variable with usage
§  Favors run and done, decentralized ops
Criteria: Data
On-Premise
§  Aggregated from disparate or distributed sources
Public Cloud
§  Typically generated and stored in the cloud
Criteria: SLA
On-Premise
§  Job queue, cap. sched., BCP, catchup
§  Controlled latency and throughput
Public Cloud
§  No guarantees (beyond uptime) without provisioning additional resources
Criteria: Tech Stack
On-Premise
§  Control over deployed technology
§  Requires platform team/vendor support
Public Cloud
§  Little to no control over tech stack
§  No need for platform R&D headcount
Criteria: Security
On-Premise
§  Shared env., control over data/movement, PII, ACLs, pluggable security
Public Cloud
§  Data typically not shared among users in the cloud
Criteria: Multi-tenancy
On-Premise
§  Matters, complex to develop and operate
Public Cloud
§  Does not matter, clusters are dynamic/virtual and dedicated
On-Premise or Public Cloud – Evaluation
On-Premise
Public Cloud
Cost
Data
SLA
Tech Stack
Security
Multi-tenancy
On-Premise or Public Cloud – A Lot About Utilization
[Chart: Cost ($) versus Utilization / Consumption (Compute and Storage)]
Curves:
§  On-premise Hadoop as a Service (high starting cost)
§  On-demand public cloud service
§  Terms-based public cloud service
Crossover points (marked x): before a crossover point the chart favors a public cloud service; when scaling up past it, the chart favors on-premise Hadoop as a Service.
Current and expected or target utilization can provide further insights into your operations and cost competitiveness.
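The crossover point in the chart can be illustrated with a simple linear cost model. This is a minimal sketch and not the presenter's model: the function names and all dollar figures below are hypothetical, chosen only to show how a high fixed cost with a low marginal cost compares against pure pay-per-use pricing.

```python
def on_premise_cost(units, fixed_cost, unit_cost):
    """Monthly cost with a high starting (fixed) cost and a low marginal cost."""
    return fixed_cost + unit_cost * units


def on_demand_cost(units, unit_price):
    """Monthly cost of a pay-per-use service with no fixed cost."""
    return unit_price * units


def crossover_units(fixed_cost, unit_cost, unit_price):
    """Consumption level at which both options cost the same.

    Returns None when the on-demand price never exceeds the on-premise
    marginal cost, i.e. the two cost lines never cross.
    """
    if unit_price <= unit_cost:
        return None
    return fixed_cost / (unit_price - unit_cost)


# Hypothetical figures, for illustration only.
FIXED = 100_000.0   # on-premise fixed monthly cost ($)
ONPREM_UNIT = 0.5   # on-premise marginal cost per unit consumed ($)
CLOUD_UNIT = 2.5    # on-demand price per unit consumed ($)

crossover = crossover_units(FIXED, ONPREM_UNIT, CLOUD_UNIT)
print(f"Crossover at {crossover:,.0f} units per month")
```

Below the computed consumption level the pay-per-use option is cheaper in this model; above it, the fixed-cost option wins, which is why current and target utilization matter for the comparison.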
Total Cost Of Ownership (TCO) – Components
Monthly TCO: $2.1 M (ILLUSTRATIVE)
[Chart: monthly TCO split across the seven components below; segment shares shown are 60%, 12%, 7%, 6%, 3%, 2%, and 10%]
TCO Components
1  Cluster Hardware
§  Data nodes, name nodes, job trackers, gateways, load proxies, monitoring, aggregator, and web servers
2  R&D HC
§  Headcount for platform software development, quality, and release engineering
3  Active Use and Operations (Recurring)
§  Recurring datacenter ops cost (power, space, labor support, and facility maintenance)
4  Network Hardware
§  Aggregated network component costs, including switches, wiring, terminal servers, power strips etc.
5  Acquisition/Install (One-time)
§  Labor, POs, transportation, space, support, upgrades, decommissions, shipping/receiving etc.
6  Operations Engineering
§  Headcount for service engineering and data operations teams responsible for day-to-day ops and support
7  Network Bandwidth
§  Data transferred into and out of clusters for all colos, including cross-colo transfers
Total Cost Of Ownership (TCO) – Unit Costs (Hadoop)
Compute (Memory)
§  Container memory where apps perform computation and access HDFS if needed
§  Unit: $ / GB-Hour (H 2.0+)
§  Total Capacity: GBs of memory available for an hour
§  Unit Cost: Monthly Memory Cost / Avail. Memory Capacity
Compute (CPU)
§  Container CPU cores used by apps to perform computation / data processing
§  Unit: $ / vCore-Hour (H 2.6+)
§  Total Capacity: vCores of CPU available for an hour
§  Unit Cost: Monthly CPU Cost / Avail. CPU vCores
Storage (Disk)
§  HDFS (usable) space needed by an app with default replication factor of three
§  Unit: $ / GB of data stored
§  Total Capacity: Usable storage space (less replication and overheads)
§  Unit Cost: Monthly Storage Cost / Avail. Usable Storage
Bandwidth
§  Network bandwidth needed to move data into/out of the clusters by the app
§  Unit: $ / GB for inter-region data transfers
§  Total Capacity: Inter-region (peak) link capacity
§  Unit Cost: Monthly BW Cost / (Monthly GB In + Out)
Namespace
§  Files and directories used by the apps (to understand/limit the load on NN)
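The unit-cost definition above (monthly cost divided by available capacity) can be sketched for the memory case as follows. This is an illustration under stated assumptions: the 730 hours per month, the cluster size, and the monthly memory cost are hypothetical values, not figures from the presentation.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def gb_hours_capacity(total_memory_gb, hours=HOURS_PER_MONTH):
    """Memory capacity expressed as GB-Hours available in one month."""
    return total_memory_gb * hours


def unit_cost(monthly_cost, available_capacity):
    """Unit cost = monthly cost of a resource / its available capacity."""
    if available_capacity <= 0:
        raise ValueError("available capacity must be positive")
    return monthly_cost / available_capacity


# Hypothetical inputs, for illustration only.
cluster_memory_gb = 100_000        # container memory across the cluster
monthly_memory_cost = 146_000.0    # share of monthly TCO attributed to memory ($)

capacity = gb_hours_capacity(cluster_memory_gb)
memory_rate = unit_cost(monthly_memory_cost, capacity)
print(f"${memory_rate:.4f} per GB-Hour")
```

The same `unit_cost` helper applies to the CPU, storage, and bandwidth rows by swapping in the matching monthly cost and capacity.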
Total Cost Of Ownership (TCO) – Consumption Costs
Compute (Memory)
§  Monthly job and task cost:
   Map GB-Hours = GB(M1) x T(M1) + GB(M2) x T(M2) + …
   Reduce GB-Hours = GB(R1) x T(R1) + GB(R2) x T(R2) + …
   Cost = (M + R) GB-Hours x $0.002 / GB-Hour / Month = $ for the Job / Month
§  Monthly roll-ups: (M + R) GB-Hours for all jobs can be summed up for the month for a user, app, BU, or the entire platform
Compute (CPU)
§  Monthly job and task cost:
   Map vCore-Hours = vCores(M1) x T(M1) + vCores(M2) x T(M2) + …
   Reduce vCore-Hours = vCores(R1) x T(R1) + vCores(R2) x T(R2) + …
   Cost = (M + R) vCore-Hours x $0.002 / vCore-Hour / Month = $ for the Job / Month
§  Monthly roll-ups: (M + R) vCore-Hours for all jobs can be summed up for the month for a user, app, BU, or the entire platform
Storage (Disk)
§  By project (app) quota in GB (peak monthly used)
§  By user quota in GB (peak monthly used)
§  By data, as each user is accountable for their portion of use, e.g. GB Read (U1) / (GB Read (U1) + GB Read (U2) + …)
§  Monthly roll-ups: through relationship among user, file ownership, app, and their BU
Bandwidth
§  Bandwidth measured at the cluster level and divided among select apps and users of data based on average volume In/Out
§  Monthly roll-ups: through relationship among user, app, and their BU
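The memory cost formula and the per-user storage share from this slide can be sketched as below. The $0.002 per GB-Hour rate is the one shown on the slide; the task sizes, runtimes, user names, and function names are hypothetical and serve only as a worked example.

```python
def gb_hours(tasks):
    """Sum GB x hours over (memory_gb, runtime_hours) pairs, one per task."""
    return sum(memory_gb * hours for memory_gb, hours in tasks)


def job_memory_cost(map_tasks, reduce_tasks, rate_per_gb_hour=0.002):
    """Cost = (Map GB-Hours + Reduce GB-Hours) x rate."""
    return (gb_hours(map_tasks) + gb_hours(reduce_tasks)) * rate_per_gb_hour


def read_share(gb_read_by_user):
    """Each user's portion of use: GB Read (Ui) / sum of GB Read over all users."""
    total = sum(gb_read_by_user.values())
    if total == 0:
        return {user: 0.0 for user in gb_read_by_user}
    return {user: gb / total for user, gb in gb_read_by_user.items()}


# Hypothetical job: two map tasks and one reduce task.
map_tasks = [(2.0, 0.5), (2.0, 0.25)]   # (container GB, hours)
reduce_tasks = [(4.0, 0.5)]

total_gb_hours = gb_hours(map_tasks) + gb_hours(reduce_tasks)
cost = job_memory_cost(map_tasks, reduce_tasks)
shares = read_share({"u1": 30.0, "u2": 10.0})
print(total_gb_hours, cost, shares)
```

Summing `gb_hours` over every job of a user, app, or BU gives the monthly roll-up described on the slide; the CPU case is identical with vCores in place of GB.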
Hardware Configuration – Physical Resources
[Diagram: Clusters in Datacenters. Datacenter 1 holds racks Rack 1 through Rack N]
Server Resources
§  CPU
§  Memory
§  Storage (Disk)
§  Bandwidth
C-nn / 64,128,256 G / 4000, 6000 etc.
Hardware Configuration – Eventual Heterogeneity
Memory: 24 G, 48 G, 64 G, 128 G, 192 G, 256 G, 384 G
CPU: 8 cores, 12 cores, Harpertown, Sandy Bridge, Ivy Bridge, Haswell
Storage: SATA 0.5 TB, SATA 1.0 TB, SATA 2.0 TB, SATA 3.0 TB, SATA 4.0 TB, SATA 6.0 TB
§  Heterogeneous Configurations:
10s of configs of data nodes
(collected over the years) without
dictating scheduling decisions –
let the framework balance out the
configs
§  Heterogeneous Storage:
HDFS supports heterogeneous
storage (HDD, SSD, RAM, RAID
etc.) – HDFS-2832, HDFS-5682
§  Heterogeneous Scheduling:
operate multiple purpose
hardware in the same cluster (e.g.
GPUs) – YARN-796
Network – Common Backplane
[Diagram: services sharing a common Network Backplane]
§  Hadoop: DataNode, NodeManager, NameNode, RM
§  HBase: DataNodes, RegionServers, NameNode, HBase Master
§  Storm: Nimbus, Supervisor
§  Administration, Management and Monitoring
§  ZooKeeper Pools
§  HTTP/HDFS/GDM Load Proxies
§  Applications and Data: Data Feeds, Data Stores, Oozie Server, HS2/HCat
Network – Bottleneck Awareness
Clusters sharing the network:
§  Hadoop Cluster (Data Set 1)
§  Hadoop Cluster (Data Set 2)
§  HBase Cluster (Low-latency Data Store)
§  Storm Cluster (Real-time / Stream Processing)
Bottlenecks:
1  Large dataset joins or data sharing over network
2  Large extractions may saturate the network
3  Fast bulk updates may saturate the network
4  Large data copies may not be possible
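A rough way to judge whether a large copy between clusters is feasible is to estimate its transfer time from the link capacity. The sketch below is illustrative only: the data volume, link speed, and usable fraction are hypothetical inputs, and decimal units (1 TB = 8e12 bits) are assumed.

```python
def transfer_hours(data_tb, link_gbps, usable_fraction=1.0):
    """Hours needed to move data_tb terabytes over a link of link_gbps.

    usable_fraction is the share of the link the copy may consume,
    since the backplane is shared with other traffic.
    """
    if link_gbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link speed must be positive and fraction in (0, 1]")
    bits = data_tb * 8e12                      # decimal TB to bits
    seconds = bits / (link_gbps * 1e9 * usable_fraction)
    return seconds / 3600


# Hypothetical example: 100 TB over a 10 Gbps link.
full_link = transfer_hours(100, 10)
half_link = transfer_hours(100, 10, usable_fraction=0.5)
print(f"{full_link:.1f} h at full link, {half_link:.1f} h at half the link")
```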
Network – 1G BAS (Rack Locality Not A Major Issue)
[Diagram: rack switches (N x RSW) connect to BAS pairs (BAS1-1/BAS1-2 through BAS8-1/BAS8-2), which connect over L3 backplanes to the fabric layer (FAB 1 to FAB 8)]
§  Link speeds shown: 1 Gbps, 10 Gbps, 8 x 10 Gbps
§  2:1 oversubscription
§  48 racks, 15,360 hosts
§  SPOF!
Network –10G CLOS (Server Placement Not an Issue)
[Diagram: two virtual chassis (Virtual Chassis 0 and Virtual Chassis 1), each with Spine 0 to Spine 15 and Leaf 0 to Leaf 31; rack switches (N x RSW) connect to the leafs]
§  16 spines, 32 leafs
§  Link speeds shown: 10 Gbps, 2 x 40 Gbps
§  5:1 oversubscription
§  512 racks, 20,480 hosts
§  SPOF!
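The oversubscription ratios quoted on the two network slides can be reproduced as downstream host bandwidth divided by rack uplink bandwidth. This is a sketch under assumptions: 40 hosts per rack is derived from 20,480 hosts over 512 racks and is assumed to hold for the 1G design as well, and two 10 Gbps uplinks per rack switch in the 1G design is a reading of the diagram rather than a stated figure.

```python
def oversubscription(hosts_per_rack, host_gbps, uplinks, uplink_gbps):
    """Ratio of total host-facing bandwidth to total uplink bandwidth of a rack switch."""
    downstream = hosts_per_rack * host_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream


HOSTS_PER_RACK = 20_480 // 512   # 40, derived from the 10G CLOS slide

# 1G BAS design: 1 Gbps hosts, assumed 2 x 10 Gbps uplinks per rack switch.
ratio_1g = oversubscription(HOSTS_PER_RACK, 1, 2, 10)

# 10G CLOS design: 10 Gbps hosts, 2 x 40 Gbps uplinks per rack switch.
ratio_10g = oversubscription(HOSTS_PER_RACK, 10, 2, 40)

print(f"1G BAS: {ratio_1g:.0f}:1, 10G CLOS: {ratio_10g:.0f}:1")
```

Under these assumptions the results match the 2:1 and 5:1 figures on the slides.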
Network – Gen Next
Source: http://www.opencompute.org
Software Stack – Where are We Today
Layers: Compute Services, Storage, Infrastructure Services
Components (versions):
§  Hive (0.13, 1.0)
§  Pig (0.11, 0.14)
§  Oozie (4.4)
§  HDFS Proxy (3.2)
§  GDM (6.2)
§  YARN (2.6)
§  MapReduce (2.6)
§  HDFS (2.6)
§  HBase (0.98)
§  ZooKeeper
§  Grid UI (SS/Doppler, Discovery, Hue 3.7)
§  Monitoring
§  Starling, Timeline Server
§  Messaging Service
§  HCatalog (0.13, 1.0)
§  Storm (0.9)
§  Spark (1.3)
§  Tez (0.6)
Software Stack – Obsess With Use Cases, Not Tech
HDFS (File System)
YARN (Scheduling, Resource Management)
[Diagram: the same Compute Services, Storage, and Infrastructure Services components and versions as on the previous slide, annotated with the categories below]
§  Common
§  In-progress, Unmet needs or Apache Alignment
§  Platformized Tech with Production Support
RHEL6 64-bit, JDK8
Security and Account Management – Overview
Grid
Identity,
Authentication and
Authorization
User Id
SSO
Groups, Netgroups, Roles
RPC (GSSAPI)
UI (SPNEGO)
Security and Account Management – Flexibly Secure
Kerb Realm 2
(Users)
Kerb Realm 1
(Projects, Services)
IdP
SP
CLIENTS
CORP
PROD
Auth
User SSO
Netgroups
Hadoop RPC
Delegation
tokens
Block tokens
Job tokens
Grid
Data Lifecycle Management and BCP
Data Lifecycle
§  Source
§  Acquisition
§  Replication (Feeds)
§  Retention (Policy based Expiration)
§  Archival (Tape Backup)
§  DataOut
Datastore
§  Defines a data source/target (e.g. HDFS)
Dataset
§  Defines the data flow of a feed
Workflow
§  Defines a unit of work carried out by acquisition, replication, retention servers for moving an instance of a feed
Data Lifecycle Management and BCP
[Diagram: Grid Data Management moves feeds between Cluster 1 (Colo 1) and Cluster 2 (Colo 2); each cluster has its own HDFS and MetaStore]
§  Feed Acquisition: feed datasets are registered as partitioned external tables; Growl extracts schema for backfill
§  On each cluster: HCatClient.addPartitions(…), mark LOAD_DONE, then an add_partition event notification
§  Feed Replication: the same steps are repeated on the second cluster
§  Retention: partitions are dropped with HCatClient.dropPartitions(…) after retention expiration, with a drop_partition notification
§  Stages shown: Acquisition, Feed Replication, Retention, Archival, Dataout
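Policy-based retention as described above comes down to selecting the feed partitions older than the retention window. The sketch below only illustrates that selection step: the function name, the daily partitioning, and the 7-day window are hypothetical, and in the flow shown on the slide the selected partitions would then be dropped through HCatClient.dropPartitions(…), which is not called here.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, retention_days, today):
    """Return the partition dates that fall outside the retention window.

    A partition expires when it is strictly older than
    today minus retention_days.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


# Hypothetical daily partitions of one feed.
partitions = [date(2015, 4, 1), date(2015, 4, 10), date(2015, 4, 14)]

to_drop = expired_partitions(partitions, retention_days=7, today=date(2015, 4, 15))
print([d.isoformat() for d in to_drop])
```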
Metering, Audit, and Governance
Starling (Log Warehouse)
Log Sources
§  FS, Job, Task logs (Cluster 1, Cluster 2, ... Cluster n)
§  CF, Region, Action, Query Stats (Cluster 1, Cluster 2, ... Cluster n)
§  DB, Tbl., Part., Colmn. Access Stats (MS 1, MS 2, ... MS n)
§  GDM: Data Defn., Flow, Feed, Source (F 1, F 2, ... F n)
Metering, Audit, and Governance
Data Discovery and Access
Governance Classification
§  Public: No addn. reqmt.
§  Non-sensitive: LMS Integration
§  Financial ($): Stock Admin Integration
§  Restricted: Approval Flow
Integration with External Systems
BI, Reporting, Transactional DBs
Hadoop Customers
…
DH
Cloud Messaging
Serving Systems
Monitoring, Tools, Portals
Infrastructure in Transition
Debunking Myths
✗  Hadoop isn’t enterprise ready
✗  Hadoop isn’t stable, clusters go down
✗  You lose data on HDFS
✗  Data cannot be shared across the org
✗  NameNodes do not scale
✗  Software upgrades are rare
✗  Hadoop use cases are limited
✗  I need expensive servers to get more
✗  Hadoop is so dead
✗  I need Apache this vs. that
Thank You
@sumeetksingh
Yahoo Kiosk #8

Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
 
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo! SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
SAP Technology Services Conference 2013: Big Data and The Cloud at Yahoo!
 
HBaseCon 2013: Multi-tenant Apache HBase at Yahoo!
HBaseCon 2013: Multi-tenant Apache HBase at Yahoo! HBaseCon 2013: Multi-tenant Apache HBase at Yahoo!
HBaseCon 2013: Multi-tenant Apache HBase at Yahoo!
 

Kürzlich hochgeladen

HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Kürzlich hochgeladen (20)

HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 

Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10 Considerations

  • 1. Architecting a Scalable Hadoop Platform: Top 10 Considerations for Success. Presented by Sumeet Singh, April 15, 2015, Hadoop Summit 2015, Brussels
  • 2. Introduction. Sumeet Singh, Sr. Director, Product Management, Cloud and Big Data Platforms, Platforms and Personalization Products, 701 First Avenue, Sunnyvale, CA 94089 USA, @sumeetksingh. Manages Hadoop products team at Yahoo; responsible for Product Management, Strategy and Customer Engagements; managed Cloud Engineering products teams and headed Strategy functions for the Cloud Platform Group at Yahoo; MBA from UCLA and MS from Rensselaer Polytechnic Institute (RPI). Disclaimer: The considerations presented here are my personal opinion, driven purely from my experiences working with cloud technologies and services
  • 3. Hadoop as Secure Shared Hosted Multi-tenant Platform. Diagram labels: TV, PC, Phone, Tablet; Pushed Data; Pulled Data; Web Crawl, Social, Email, 3rd Party Content; Data Highway; Hadoop Grid; BI, Reporting, Adhoc Analytics; Data, Content, Ads; No-SQL Serving Stores; Serving
  • 4. Platform Evolution (2006 – 2015). Chart: number of servers (axis 0 to 50,000) and raw HDFS storage in PB (axis 0 to 700) by year, 2006 to 2015. Milestones: Yahoo! commits to scaling Hadoop for production use; research workloads in search and advertising; production (modeling) with machine learning & WebMap; revenue systems with security, multi-tenancy, and SLAs; open sourced with Apache; Hortonworks spinoff for enterprise hardening; nextgen Hadoop (H 0.23 YARN); new services (HBase, Storm, Spark, Hive); increased user-base with partitioned namespaces; Apache H2.6 (scalable ML, latency, utilization, productivity). Servers and use cases: Hadoop 43,000 servers, 300 use cases; HBase 3,000 servers, 70 use cases; Storm 2,000 servers, 50 use cases
  • 5. Top 10 Considerations for Scaling Hadoop-based Platform: (1) On-Premise or Public Cloud; (2) Total Cost Of Ownership (TCO); (3) Hardware Configuration; (4) Network; (5) Software Stack; (6) Security and Account Management; (7) Data Lifecycle Management and BCP; (8) Metering, Audit and Governance; (9) Integration with External Systems; (10) Debunking Architectural Myths
  • 6. On-Premise or Public Cloud – Deployment Models
      • On-Premise, Private (dedicated) Clusters: large demanding use cases; new technology not yet platformized; data movement and regulation issues
      • On-Premise, Hosted Multi-tenant (Private Cloud) Clusters: source of truth for all of the org's data; app delivery agility; operational efficiency and cost savings through economies of scale
      • Public Cloud, Hosted Compute Clusters: for cases where more cost effective than on-premise; time to market/results matter; exploration and learning
      • Public Cloud, Purpose-built Big Data Clusters: for performance, tighter integration with tech stack; value added services such as monitoring, alerts, tuning and common tools
  • 7. On-Premise or Public Cloud – Selection Criteria
      • Cost. On-Premise: fixed, does not vary with utilization; favors scale and 24x7 centralized ops. Public Cloud: variable with usage; favors run and done, decentralized ops
      • Data. On-Premise: aggregated from disparate or distributed sources. Public Cloud: typically generated and stored in the cloud
      • SLA. On-Premise: job queue, cap. sched., BCP, catchup; controlled latency and throughput. Public Cloud: no guarantees (beyond uptime) without provisioning additional resources
      • Tech Stack. On-Premise: control over deployed technology; requires platform team/vendor support. Public Cloud: little to no control over tech stack; no need for platform R&D headcount
      • Security. On-Premise: shared env., control over data/movement, PII, ACLs, pluggable security. Public Cloud: data typically not shared among users in the cloud
      • Multi-tenancy. On-Premise: matters, complex to develop and operate. Public Cloud: does not matter, clusters are dynamic/virtual and dedicated
  • 8. On-Premise or Public Cloud – Evaluation. On-Premise and Public Cloud compared on Cost, Data, SLA, Tech Stack, Security, Multi-tenancy
  • 9. On-Premise or Public Cloud – A Lot About Utilization. Chart: Cost ($) versus Utilization / Consumption (Compute and Storage) for on-premise Hadoop as a Service, on-demand public cloud service, and terms-based public cloud service. Annotations: High starting cost; Scaling up; Crossover point (x); Favors public cloud service; Favors on-premise Hadoop as a Service. Current and expected or target utilization can provide further insights into your operations and cost competitiveness
  • 10. Total Cost Of Ownership (TCO) – Components. Monthly TCO (illustrative): $2.1 M, with component shares of 60%, 12%, 10%, 7%, 6%, 3% and 2%. TCO components:
      • 1. Cluster Hardware: data nodes, name nodes, job trackers, gateways, load proxies, monitoring, aggregator, and web servers
      • 2. R&D HC: headcount for platform software development, quality, and release engineering
      • 3. Active Use and Operations (Recurring): recurring datacenter ops cost (power, space, labor support, and facility maintenance)
      • 4. Network Hardware: aggregated network component costs, including switches, wiring, terminal servers, power strips etc.
      • 5. Acquisition/Install (One-time): labor, POs, transportation, space, support, upgrades, decommissions, shipping/receiving etc.
      • 6. Operations Engineering: headcount for service engineering and data operations teams responsible for day-to-day ops and support
      • 7. Network Bandwidth: data transferred into and out of clusters for all colos, including cross-colo transfers
  • 11. Total Cost Of Ownership (TCO) – Unit Costs (Hadoop)
      • Compute (Memory): container memory where apps perform computation and access HDFS if needed. Unit: $ / GB-Hour (H 2.0+). Total capacity: GBs of memory available for an hour. Unit cost: Monthly Memory Cost / Avail. Memory Capacity
      • Compute (CPU): container CPU cores used by apps to perform computation / data processing. Unit: $ / vCore-Hour (H 2.6+). Total capacity: vCores of CPU available for an hour. Unit cost: Monthly CPU Cost / Avail. CPU vCores
      • Storage (Disk): HDFS (usable) space needed by an app with default replication factor of three. Unit: $ / GB of data stored. Total capacity: usable storage space (less replication and overheads). Unit cost: Monthly Storage Cost / Avail. Usable Storage
      • Bandwidth: network bandwidth needed to move data into/out of the clusters by the app. Unit: $ / GB for inter-region data transfers. Total capacity: inter-region (peak) link capacity. Unit cost: Monthly BW Cost / Monthly GB In + Out
      • Namespace: files and directories used by the apps (to understand/limit the load on NN)
  • 12. Total Cost Of Ownership (TCO) – Consumption Costs
      • Compute (Memory), monthly job and task cost: Map GB-Hours = GB(M1) x T(M1) + GB(M2) x T(M2) + …; Reduce GB-Hours = GB(R1) x T(R1) + GB(R2) x T(R2) + …; Cost = (M + R) GB-Hours x $0.002 / GB-Hour / Month = $ for the Job / Month. Monthly roll-ups: (M + R) GB-Hours for all jobs can be summed up for the month for a user, app, BU, or the entire platform
      • Compute (CPU), monthly job and task cost: Map vCore-Hours = vCores(M1) x T(M1) + vCores(M2) x T(M2) + …; Reduce vCore-Hours = vCores(R1) x T(R1) + vCores(R2) x T(R2) + …; Cost = (M + R) vCore-Hours x $0.002 / vCore-Hour / Month = $ for the Job / Month. Monthly roll-ups: (M + R) vCore-Hours for all jobs can be summed up for the month for a user, app, BU, or the entire platform
      • Storage (Disk): charged per project (app) quota in GB (peak monthly used), per user quota in GB (peak monthly used), or per data, with each user accountable for their portion of use, e.g. GB Read (U1) / (GB Read (U1) + GB Read (U2) + …). Roll-ups through relationship among user, file ownership, app, and their BU
      • Bandwidth: measured at the cluster level and divided among select apps and users of data based on average volume In/Out. Roll-ups through relationship among user, app, and their BU
  • 13. Hardware Configuration – Physical Resources. Clusters in Datacenters: Datacenter 1, Rack 1 … Rack N. Server Resources: CPU, Memory, Storage (Disk), Bandwidth (C-nn / 64, 128, 256 G / 4000, 6000 etc.)
  • 14. Hardware Configuration – Eventual Heterogeneity
      • Memory: 24 G, 48 G, 64 G, 128 G, 192 G, 256 G, 384 G
      • CPU: 8 cores, 12 cores, Harpertown, Sandy Bridge, Ivy Bridge, Haswell
      • Storage: SATA 0.5 TB, 1.0 TB, 2.0 TB, 3.0 TB, 4.0 TB, 6.0 TB
      • Heterogeneous Configurations: tens of configs of data nodes (collected over the years) without dictating scheduling decisions; let the framework balance out the configs
      • Heterogeneous Storage: HDFS supports heterogeneous storage (HDD, SSD, RAM, RAID etc.), see HDFS-2832, HDFS-5682
      • Heterogeneous Scheduling: operate multiple purpose hardware in the same cluster (e.g. GPUs), see YARN-796
  • 15. Network – Common Backplane. Components on the Network Backplane: DataNode, NodeManager, NameNode, RM; DataNodes, RegionServers, NameNode, HBase Master; Nimbus, Supervisor; Administration, Management and Monitoring; ZooKeeper Pools; HTTP/HDFS/GDM Load Proxies; Applications and Data; Data Feeds; Data Stores; Oozie Server; HS2/HCat
  • 16. Network – Bottleneck Awareness. Clusters: Hadoop Cluster (Data Set 1), Hadoop Cluster (Data Set 2), HBase Cluster (Low-latency Data Store), Storm Cluster (Real-time / Stream Processing). Bottlenecks: (1) large dataset joins or data sharing over network; (2) large extractions may saturate the network; (3) fast bulk updates may saturate the network; (4) large data copies may not be possible
  • 17. Network – 1G BAS (Rack Locality Not A Major Issue). Topology labels: N x RSW per BAS pair (BAS1-1/BAS1-2 through BAS8-1/BAS8-2); L3 Backplane; Fabric Layer (FAB 1 to FAB 8); 1 Gbps; 2:1 oversubscription; 10 Gbps; 8 x 10 Gbps; 48 racks, 15,360 hosts; SPOF!
  • 18. Network – 10G CLOS (Server Placement Not an Issue). Topology labels: Virtual Chassis 0 and Virtual Chassis 1, each with Spine 0 to Spine 15 and Leaf 0 to Leaf 31 (16 spines, 32 leafs); N x RSW; 10 Gbps; 5:1 oversubscription; 2 x 40 Gbps; 512 racks, 20,480 hosts; SPOF!
  • 19. Network – Gen Next. Source: http://www.opencompute.org
  • 20. Software Stack – Where are We Today. Layers: Compute Services, Storage, Infrastructure Services. Components: Hive (0.13, 1.0), Pig (0.11, 0.14), Oozie (4.4), HDFS Proxy (3.2), GDM (6.2), YARN (2.6), MapReduce (2.6), HDFS (2.6), HBase (0.98), Zookeeper, Grid UI (SS/Doppler, Discovery, Hue 3.7), Monitoring, Starling, Timeline Server, Messaging Service, HCatalog (0.13, 1.0), Storm (0.9), Spark (1.3), Tez (0.6)
  • 21. Software Stack – Obsess With Use Cases, Not Tech. Foundation: HDFS (File System), YARN (Scheduling, Resource Management). Layers: Compute Services, Storage, Infrastructure Services. Components: Hive (0.13, 1.0), Pig (0.11, 0.14), Oozie (4.4), HDFS Proxy (3.2), GDM (6.2), YARN (2.6), MapReduce (2.6), HDFS (2.6), HBase (0.98), Zookeeper, Grid UI (SS/Doppler, Discovery, Hue 3.7), Monitoring, Starling, Timeline Server, Messaging Service, HCatalog (0.13, 1.0), Storm (0.9), Spark (1.3), Tez (0.6). Legend: Common; In-progress, Unmet needs or Apache Alignment; Platformized Tech with Production Support. Base: RHEL6 64-bit, JDK8
  • 22. Security and Account Management – Overview. Grid Identity, Authentication and Authorization: User Id, SSO; Groups, Netgroups, Roles; RPC (GSSAPI); UI (SPNEGO)
  • 23. Security and Account Management – Flexibly Secure. Diagram labels: Kerb Realm 2 (Users), Kerb Realm 1 (Projects, Services), IdP, SP, CLIENTS, CORP, PROD, Auth, User SSO, Netgroups, Hadoop RPC, Delegation tokens, Block tokens, Job tokens, Grid
  • 24. Data Lifecycle Management and BCP. Data Lifecycle stages shown: Acquisition, Replication (Feeds), Source, Retention (Policy based Expiration), Archival (Tape Backup), DataOut. Definitions: Datastore defines a data source/target (e.g. HDFS); Dataset defines the data flow of a feed; Workflow defines a unit of work carried out by acquisition, replication, retention servers for moving an instance of a feed
  • 25. Data Lifecycle Management and BCP. Grid Data Management across Cluster 1 (Colo 1) and Cluster 2 (Colo 2), each with HDFS and a MetaStore. Feed Acquisition and Feed Replication: feed datasets as partitioned external tables; Growl extracts schema for backfill; HCatClient.addPartitions(…), Mark LOAD_DONE, add_partition event notification. Retention: partitions are dropped with HCatClient.dropPartitions(…) after retention expiration, with a drop_partition notification. Other labels: Acquisition, Archival, Dataout
  • 26. Metering, Audit, and Governance. Labels: Starling, Log Warehouse, Log Sources. Sources: FS, Job, Task logs (Cluster 1, Cluster 2 … Cluster n); CF, Region, Action, Query Stats (Cluster 1, Cluster 2 … Cluster n); DB, Tbl., Part., Colmn. Access Stats (MS 1, MS 2 … MS n); GDM Data Defn., Flow, Feed, Source (F 1, F 2 … F n)
  • 27. Metering, Audit, and Governance. Data Discovery and Access. Governance Classification: Public, Non-sensitive, Financial, Restricted, $. Approval Flow: No addn. reqmt., LMS Integration, Stock Admin Integration
  • 28. Integration with External Systems. Diagram labels: Hadoop; BI, Reporting, Transactional DBs; Customers; DH; Cloud Messaging; Serving Systems; Monitoring, Tools, Portals; Infrastructure in Transition
  • 29. Debunking Myths
      • ✗ Hadoop isn’t enterprise ready
      • ✗ Hadoop isn’t stable, clusters go down
      • ✗ You lose data on HDFS
      • ✗ Data cannot be shared across the org
      • ✗ NameNodes do not scale
      • ✗ Software upgrades are rare
      • ✗ Hadoop use cases are limited
      • ✗ I need expensive servers to get more
      • ✗ Hadoop is so dead
      • ✗ I need Apache this vs. that
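
The unit-cost and consumption-cost formulas on slides 11 and 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Task`, `unit_cost`, `gb_hours`, `job_memory_cost` and `usage_share` are hypothetical and do not come from the deck, and the $0.002 / GB-Hour rate is the illustrative figure shown on slide 12, not a measured price.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Task:
    """One map or reduce task (hypothetical type): container memory in GB, runtime in hours."""
    memory_gb: float
    hours: float


def unit_cost(monthly_cost: float, available_capacity: float) -> float:
    """Slide 11 pattern: unit cost = monthly cost of a resource / capacity available that month."""
    if available_capacity <= 0:
        raise ValueError("available capacity must be positive")
    return monthly_cost / available_capacity


def gb_hours(tasks: Iterable[Task]) -> float:
    """Slide 12 pattern: GB-Hours = GB(T1) x T(T1) + GB(T2) x T(T2) + ..."""
    return sum(t.memory_gb * t.hours for t in tasks)


def job_memory_cost(map_tasks: Iterable[Task],
                    reduce_tasks: Iterable[Task],
                    rate_per_gb_hour: float) -> float:
    """Cost = (map GB-Hours + reduce GB-Hours) x $ per GB-Hour."""
    return (gb_hours(map_tasks) + gb_hours(reduce_tasks)) * rate_per_gb_hour


def usage_share(user_gb: float, all_users_gb: Iterable[float]) -> float:
    """A user's portion of shared data use: GB(U1) / (GB(U1) + GB(U2) + ...)."""
    total = sum(all_users_gb)
    if total <= 0:
        raise ValueError("total usage must be positive")
    return user_gb / total


if __name__ == "__main__":
    maps = [Task(2.0, 0.5), Task(2.0, 0.25)]   # assumed example tasks
    reduces = [Task(4.0, 1.0)]
    cost = job_memory_cost(maps, reduces, rate_per_gb_hour=0.002)
    print(f"memory cost for the job: ${cost:.4f}")
```

The same shape applies to CPU by swapping GB for vCores; monthly roll-ups for a user, app, or BU would then be a sum of these per-job values over the relevant jobs.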