SlideShare ist ein Scribd-Unternehmen logo
1 von 19
CLOUD FRIENDLY HADOOP/HIVE
Shrikanth Shankar | Qubole
VP of Engineering
Thursday, July 25, 13
INTRODUCTION
• Hadoop has revolutionized big data processing
• Becoming the de-facto platform for new data projects
• Started as file system (HDFS) + Programming framework (Map-Reduce).An ecosystem of
projects has sprung up on top of Hadoop
• Hive, Pig, Cascading etc. - Simple ways of processing data
• Sqoop, Flume etc. - Data movement into and out of HDFS
• Oozie,Azkaban etc. - Workflow scheduling
• However, these systems were all designed with an on-premise architecture in mind.
• The cloud is different enough - Some things can/should change.
Thursday, July 25, 13
DN/TT DN/TT
ON-PREMISE HADOOP
ARCHITECTURE
Hadoop Cluster
Namenode
JobTracker
DN/TTDN/TTDN/TT ......
IT control
Relational
systems
(Hive metastore etc.)
End User End User ...... End User
Thursday, July 25, 13
HADOOP ON-PREMISE
• Usually deployed on bare-metal nodes*
• HDFS is store of choice (3-way replication for safety). Locality of data
access is a big design point
• Clusters are mostly static - new machines are added on IT schedule*
• Static clusters means users can focus on their tasks (MR jobs, Hive
queries) and not on cluster management
• IT bears the burden of managing clusters
Thursday, July 25, 13
HADOOP ON-PREMISE
• Partitioning of resources
• Static partitioning with different clusters for Batch and
Interactive workloads
• Within a cluster load balancing is done by the JT scheduler
• Capex costs are significant
• IT controlled - requires an Ops team (Hadoop ops, Sysadmin
etc.)
Thursday, July 25, 13
CLOUD ARCHITECTURE
HIGHLY AWS CENTRIC - BUT EVERYONE IS
FOLLOWING FAST
Thursday, July 25, 13
CLOUD COMPONENTS
Object Stores
Ephemeral compute
nodes
Block
Stores
PaaS
Offerings
(RDS, etc.)
Thursday, July 25, 13
INFRASTRUCTURE
CHARACTERISTICS
• Running in aVM
• Not that big a deal usually - except plan for performance variability
• No locality information
• Nodes are ephemeral - if you lose a node you will lose data on the node
• AZ-wide correlated failures are to be expected. Region wide are possible (but rare)
• High capacity Object stores with high cross sectional bandwidth
• High latency, Variability in perf, REMOTE*. Not POSIX compliant
• Persistent block stores
• REMOTE,Variable perf,
Thursday, July 25, 13
INFRASTRUCTURE
CHARACTERISTICS
• ELASTIC
• Add a 100 nodes on demand in a few minutes
• Costs are Op-ex (largely).
• Nodes are per hour (CPU + Disk), Storage is per GB
• Cost management is a key challenge
• Some interesting payment choices (On-demand, Spot, Reserved)
Thursday, July 25, 13
LETS PUTTHESE WORLDS
TOGETHER
Thursday, July 25, 13
STORAGE
• From a cost perspective using HDFS for long term storage
means you pay for both CPU and disk.
• Its also more expensive to make HDFS reliable (cross AZ,
maybe even cross Region?)
• Using an object store allows you to pay only for storage
• With object stores you see latency issues since data is remote
Thursday, July 25, 13
STORAGE
• But node storage is still needed when jobs and queries are
active
• For intermediate job results (not all results should go back
to S3 - e.g. stage outputs in Hive)
• For intermediate data (mapper output)
• Makes scaling nodes challenging
• Also since performance is better - may want to move remote
data to HDFS before accessing
Thursday, July 25, 13
COMPUTE AND CLUSTERS
• If you dont need Hadoop for persistent storage - when do
you need a cluster?
• Bring them up on demand - maybe for every job?
• But that can be expensive - no multiplexing
• Ideally you want to share Hadoop clusters as much as
possible. Shut down cluster when not being used
Thursday, July 25, 13
COMPUTE AND CLUSTERS
• If cluster is dynamic and you need sharing - how do you do
‘discover’ it?
• How about cluster sizing?
• Static is a left over from on-premise
• Be dynamic on the cloud. Hard for end users to do manually
Thursday, July 25, 13
COMPUTE AND CLUSTER
• Adding nodes needs to be done based on load
• E.g. Most of the time jobs need < 5 nodes. A batch job
comes in needs 100 nodes. We should expand the cluster
(for as long as needed)
• Removing nodes is trickier
• If we lose intermediate results lots of work will be lost.
• Job1 uses 100 nodes, produces data spread over all of them.
Job 2 consumes results but only needs 10 nodes. How do you
give up 90 nodes?
Thursday, July 25, 13
COMPUTE AND CLUSTER
• Pricing choices are interesting
• For e.g. spot nodes average half the price of an on-demand
node
• But if price spikes you lose all the spot nodes at once
• Hadoop fault tolerance can retry failed jobs (but expensive) -
what about data loss when you lose all the spot nodes?
Thursday, July 25, 13
END USER EXPERIENCE
• The cloud isnt just about cost - its also about agility.To allow
this we need to focus on the end user experience
• End users would prefer to focus on higher level API’s
• e.g. Run a Hadoop job or a Hive query - specifics of
clusters should be hidden from them
• Some things should be persistent (log files, results, ...)
• They get this for free on premise
Thursday, July 25, 13
BETTER END STATE
• IT/dev ops/users should set high level controls
• Usage governance (max cluster size, max bill, cpu hours used
per month etc.)
• End users should focus at the level they understand
• Smart software should bridge the gap
Thursday, July 25, 13
QUESTIONS?
Thursday, July 25, 13

Weitere ähnliche Inhalte

Was ist angesagt?

Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Alluxio, Inc.
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebula Project
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...ScyllaDB
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformHow SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformScyllaDB
 
PLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPROIDEA
 
Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Enrique Catala Bañuls
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsandrewdenty
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScyllaDB
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with ScyllaScyllaDB
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?ScyllaDB
 
Powering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoPowering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoAlluxio, Inc.
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in JavaAltoros
 

Was ist angesagt? (20)

March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
 
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration ...
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Compression talk
Compression talkCompression talk
Compression talk
 
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
OpenNebulaConf 2014 - Dynamic virtual private clusters with OpenNebula and SG...
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy PlatformHow SkyElectric Uses Scylla to Power Its Smart Energy Platform
How SkyElectric Uses Scylla to Power Its Smart Energy Platform
 
PLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob BirdPLNOG15: Exascale future of today - Rob Bird
PLNOG15: Exascale future of today - Rob Bird
 
Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)Dawarehouse como servicio en azure (sqldw)
Dawarehouse como servicio en azure (sqldw)
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAs
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Scylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDBScylla Summit 2022: Stream Processing with ScyllaDB
Scylla Summit 2022: Stream Processing with ScyllaDB
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with Scylla
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
 
BDAAS on the Cloud
BDAAS on the CloudBDAAS on the Cloud
BDAAS on the Cloud
 
Powering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and PrestoPowering Interactive Analytics with Alluxio and Presto
Powering Interactive Analytics with Alluxio and Presto
 
Casual mass parallel data processing in Java
Casual mass parallel data processing in JavaCasual mass parallel data processing in Java
Casual mass parallel data processing in Java
 

Andere mochten auch

Curriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGCurriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGSamdoo Jung
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014StampedeCon
 
Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney
 
How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015StampedeCon
 
Estrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaEstrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaoscar daniel naranjo aristizabal
 
PatrickJacks_Resume2
PatrickJacks_Resume2PatrickJacks_Resume2
PatrickJacks_Resume2Patrick Jacks
 
Economics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeEconomics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeLSE Enterprise
 
LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise
 
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012StampedeCon
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015StampedeCon
 
4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensationfrekhtmanassociates
 
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...StampedeCon
 
2 kien pham cv en vn with project experience
2 kien pham cv en  vn with project experience2 kien pham cv en  vn with project experience
2 kien pham cv en vn with project experienceKien Pham
 
Estrous synchronization
Estrous synchronizationEstrous synchronization
Estrous synchronizationArmia Naguib
 

Andere mochten auch (16)

Curriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNGCurriculum Vitae_Samdoo JUNG
Curriculum Vitae_Samdoo JUNG
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014
 
Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015Carol Varney's 1 Resume 2015
Carol Varney's 1 Resume 2015
 
CORRUPTION IN INDIA
CORRUPTION IN INDIACORRUPTION IN INDIA
CORRUPTION IN INDIA
 
How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015How Big Data Will Save Planet Earth - StampedeCon 2015
How Big Data Will Save Planet Earth - StampedeCon 2015
 
Estrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una culturaEstrategias pedagógicas para la enseñanza de una cultura
Estrategias pedagógicas para la enseñanza de una cultura
 
PatrickJacks_Resume2
PatrickJacks_Resume2PatrickJacks_Resume2
PatrickJacks_Resume2
 
Cattalugue2016
Cattalugue2016Cattalugue2016
Cattalugue2016
 
Economics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programmeEconomics and finance of Europe: An intensive ten-week programme
Economics and finance of Europe: An intensive ten-week programme
 
LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015LSE Enterprise Annual Report 2015
LSE Enterprise Annual Report 2015
 
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
Listening for Insights: The Power of Social Media Listening - StampedeCon 2012
 
Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015Lifting the hood on spark streaming - StampedeCon 2015
Lifting the hood on spark streaming - StampedeCon 2015
 
4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation4 types of train accidents for which victims can claim compensation
4 types of train accidents for which victims can claim compensation
 
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
From Smart Buildings to Smart Cities: An Industry in the Midst of Big Data - ...
 
2 kien pham cv en vn with project experience
2 kien pham cv en  vn with project experience2 kien pham cv en  vn with project experience
2 kien pham cv en vn with project experience
 
Estrous synchronization
Estrous synchronizationEstrous synchronization
Estrous synchronization
 

Ähnlich wie Cloud-Friendly Hadoop and Hive - StampedeCon 2013

Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudRogue Wave Software
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...Mohamed Sayed
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Real-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopReal-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopRogue Wave Software
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerIke Ellis
 
Atom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAtom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAlluxio, Inc.
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkHentsū
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonHentsū
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersAmjith Singh
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupMike Percy
 

Ähnlich wie Cloud-Friendly Hadoop and Hive - StampedeCon 2013 (20)

Top 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloudTop 10 lessons learned from deploying hadoop in a private cloud
Top 10 lessons learned from deploying hadoop in a private cloud
 
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastru...
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Real-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and HadoopReal-time searching of big data with Solr and Hadoop
Real-time searching of big data with Solr and Hadoop
 
Hadoop for the Absolute Beginner
Hadoop for the Absolute BeginnerHadoop for the Absolute Beginner
Hadoop for the Absolute Beginner
 
Atom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at SupremindAtom: A cloud native deep learning platform at Supremind
Atom: A cloud native deep learning platform at Supremind
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Scheduling scheme for hadoop clusters
Scheduling scheme for hadoop clustersScheduling scheme for hadoop clusters
Scheduling scheme for hadoop clusters
 
Hadoop
HadoopHadoop
Hadoop
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Qcon talk
Qcon talkQcon talk
Qcon talk
 

Mehr von StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016StampedeCon
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016StampedeCon
 

Mehr von StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 
Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016Turn Data Into Actionable Insights - StampedeCon 2016
Turn Data Into Actionable Insights - StampedeCon 2016
 

Kürzlich hochgeladen

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Cloud-Friendly Hadoop and Hive - StampedeCon 2013

  • 1. CLOUD FRIENDLY HADOOP/HIVE Shrikanth Shankar | Qubole VP of Engineering Thursday, July 25, 13
  • 2. INTRODUCTION • Hadoop has revolutionized big data processing • Becoming the de-facto platform for new data projects • Started as file system (HDFS) + Programming framework (Map-Reduce).An ecosystem of projects has sprung up on top of Hadoop • Hive, Pig, Cascading etc. - Simple ways of processing data • Sqoop, Flume etc. - Data movement into and out of HDFS • Oozie,Azkaban etc. - Workflow scheduling • However, these systems were all designed with an on-premise architecture in mind. • The cloud is different enough - Some things can/should change. Thursday, July 25, 13
  • 3. DN/TT DN/TT ON-PREMISE HADOOP ARCHITECTURE Hadoop Cluster Namenode JobTracker DN/TTDN/TTDN/TT ...... IT control Relational systems (Hive metastore etc.) End User End User ...... End User Thursday, July 25, 13
  • 4. HADOOP ON-PREMISE • Usually deployed on bare-metal nodes* • HDFS is store of choice (3-way replication for safety). Locality of data access is a big design point • Clusters are mostly static - new machines are added on IT schedule* • Static clusters means users can focus on their tasks (MR jobs, Hive queries) and not on cluster management • IT bears the burden of managing clusters Thursday, July 25, 13
  • 5. HADOOP ON-PREMISE • Partitioning of resources • Static partitioning with different clusters for Batch and Interactive workloads • Within a cluster load balancing is done by the JT scheduler • Capex costs are significant • IT controlled - requires an Ops team (Hadoop ops, Sysadmin etc.) Thursday, July 25, 13
  • 6. CLOUD ARCHITECTURE HIGHLY AWS CENTRIC - BUT EVERYONE IS FOLLOWING FAST Thursday, July 25, 13
  • 7. CLOUD COMPONENTS Object Stores Ephemeral compute nodes Block Stores PaaS Offerings (RDS, etc.) Thursday, July 25, 13
  • 8. INFRASTRUCTURE CHARACTERISTICS • Running in aVM • Not that big a deal usually - except plan for performance variability • No locality information • Nodes are ephemeral - if you lose a node you will lose data on the node • AZ-wide correlated failures are to be expected. Region wide are possible (but rare) • High capacity Object stores with high cross sectional bandwidth • High latency, Variability in perf, REMOTE*. Not POSIX compliant • Persistent block stores • REMOTE,Variable perf, Thursday, July 25, 13
  • 9. INFRASTRUCTURE CHARACTERISTICS • ELASTIC • Add a 100 nodes on demand in a few minutes • Costs are Op-ex (largely). • Nodes are per hour (CPU + Disk), Storage is per GB • Cost management is a key challenge • Some interesting payment choices (On-demand, Spot, Reserved) Thursday, July 25, 13
  • 11. STORAGE • From a cost perspective using HDFS for long term storage means you pay for both CPU and disk. • Its also more expensive to make HDFS reliable (cross AZ, maybe even cross Region?) • Using an object store allows you to pay only for storage • With object stores you see latency issues since data is remote Thursday, July 25, 13
  • 12. STORAGE • But node storage is still needed when jobs and queries are active • For intermediate job results (not all results should go back to S3 - e.g. stage outputs in Hive) • For intermediate data (mapper output) • Makes scaling nodes challenging • Also since performance is better - may want to move remote data to HDFS before accessing Thursday, July 25, 13
  • 13. COMPUTE AND CLUSTERS • If you dont need Hadoop for persistent storage - when do you need a cluster? • Bring them up on demand - maybe for every job? • But that can be expensive - no multiplexing • Ideally you want to share Hadoop clusters as much as possible. Shut down cluster when not being used Thursday, July 25, 13
  • 14. COMPUTE AND CLUSTERS • If cluster is dynamic and you need sharing - how do you do ‘discover’ it? • How about cluster sizing? • Static is a left over from on-premise • Be dynamic on the cloud. Hard for end users to do manually Thursday, July 25, 13
  • 15. COMPUTE AND CLUSTER • Adding nodes needs to be done based on load • E.g. Most of the time jobs need < 5 nodes. A batch job comes in needs 100 nodes. We should expand the cluster (for as long as needed) • Removing nodes is trickier • If we lose intermediate results lots of work will be lost. • Job1 uses 100 nodes, produces data spread over all of them. Job 2 consumes results but only needs 10 nodes. How do you give up 90 nodes? Thursday, July 25, 13
  • 16. COMPUTE AND CLUSTER • Pricing choices are interesting • For e.g. spot nodes average half the price of an on-demand node • But if price spikes you lose all the spot nodes at once • Hadoop fault tolerance can retry failed jobs (but expensive) - what about data loss when you lose all the spot nodes? Thursday, July 25, 13
  • 17. END USER EXPERIENCE • The cloud isnt just about cost - its also about agility.To allow this we need to focus on the end user experience • End users would prefer to focus on higher level API’s • e.g. Run a Hadoop job or a Hive query - specifics of clusters should be hidden from them • Some things should be persistent (log files, results, ...) • They get this for free on premise Thursday, July 25, 13
  • 18. BETTER END STATE • IT/dev ops/users should set high level controls • Usage governance (max cluster size, max bill, cpu hours used per month etc.) • End users should focus at the level they understand • Smart software should bridge the gap Thursday, July 25, 13