SlideShare ist ein Scribd-Unternehmen logo
1 von 53
Downloaden Sie, um offline zu lesen
1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only
MPP vs Hadoop
Alexey Grishchenko
HUG Meetup
28.11.2015
2Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
3Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
4Pivotal Confidential–Internal Use Only
Distributed Systems
Avoid distributed systems in all the problems that potentially
could be solved using non-distributed systems
5Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  Consensus problem
–  Paxos
–  RAFT
–  ZAB
–  etc.
Ÿ  Transaction consistency
–  2PC
–  3PC
6Pivotal Confidential–Internal Use Only
Distributed Systems
Ÿ  CAP Theorem
7Pivotal Confidential–Internal Use Only
Distributed Systems
http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
8Pivotal Confidential–Internal Use Only
Distributed Systems
Reasons to use
Ÿ  Performance issues
–  More than 100’000 TPS
–  More than 4 GB/sec scan rate
–  More than 100’000 IOPS
Ÿ  Capacity issues
–  More than 50TB of data
Ÿ  DR and Geo-Distribution
9Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
10Pivotal Confidential–Internal Use Only
MPP
Main principles
Ÿ  Shared Nothing
Ÿ  Data Sharding
Ÿ  Data Replication
Ÿ  Distributed Transactions
Ÿ  Parallel Processing
11Pivotal Confidential–Internal Use Only
MPP
12Pivotal Confidential–Internal Use Only
MPP
Works well for
Ÿ  Relational data
Ÿ  Batch processing
Ÿ  Ad hoc analytical SQL
Ÿ  Low concurrency
Ÿ  Applications requiring ANSI SQL
13Pivotal Confidential–Internal Use Only
MPP
Not the best choice for
Ÿ  Non-relational data
Ÿ  OLTP and event stream processing
Ÿ  High concurrency
Ÿ  100+ server clusters
Ÿ  Non-analytical use cases
Ÿ  Geo-Distributed use cases
14Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
15Pivotal Confidential–Internal Use Only
Hadoop
Main Components
Ÿ  HDFS
Ÿ  YARN
Ÿ  MapReduce
Ÿ  HBase
Ÿ  Hive / Hive+Tez
16Pivotal Confidential–Internal Use Only
Hadoop
HDFS
Ÿ  Distributed filesystem
Ÿ  Block-level storage with big blocks
Ÿ  Non-updatable
Ÿ  Synchronous block replication
Ÿ  No built-in Geo-Distribution support
Ÿ  No built-in DR solution
17Pivotal Confidential–Internal Use Only
Hadoop
HDFS
18Pivotal Confidential–Internal Use Only
Hadoop
YARN
Ÿ  Cluster resource manager
Ÿ  Manages CPU and RAM allocation
Ÿ  Schedulers are pluggable
Ÿ  Can handle different resource pools
Ÿ  Supports both MR and non-MR workload
19Pivotal Confidential–Internal Use Only
Hadoop
YARN
20Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
Ÿ  Framework for distributed data processing
Ÿ  Two main operations: map and reduce
Ÿ  Data hits disk after “map” and before “reduce”
Ÿ  Scales to thousands of servers
Ÿ  Can process petabytes of data
Ÿ  Extremely reliable
21Pivotal Confidential–Internal Use Only
Hadoop
MapReduce
22Pivotal Confidential–Internal Use Only
Hadoop
HBase
Ÿ  Distributed key-value store
Ÿ  Data is sharded by key
Ÿ  Data is stored in sorted order
Ÿ  Stores multiple versions of the row
Ÿ  Easily scales
23Pivotal Confidential–Internal Use Only
Hadoop
HBase
24Pivotal Confidential–Internal Use Only
Hadoop
Hive
Ÿ  Query engine with SQL-like syntax
Ÿ  Translates HiveQL query to MR / Tez / Spark job
Ÿ  Processes HDFS data
Ÿ  Supports UDFs and UDAFs
25Pivotal Confidential–Internal Use Only
Hadoop
Hive
26Pivotal Confidential–Internal Use Only
Hadoop
Works well for
Ÿ  Write Once Read Many
Ÿ  100+ server clusters
Ÿ  Both relational and non-relational data
Ÿ  High concurrency
Ÿ  Batch processing and analytical workload
Ÿ  Elastic scalability
27Pivotal Confidential–Internal Use Only
Hadoop
Not the best choice for
Ÿ  Write-heavy workloads
Ÿ  Small clusters
Ÿ  Analytical DWH cases
Ÿ  OLTP and event stream processing
Ÿ  Cost savings
28Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Summary
29Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
30Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
31Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
32Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
33Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
34Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
35Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
36Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
37Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
38Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
39Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Business
MPP Hadoop
Platform Openness Mostly Closed Open
Hardware Options Mostly Appliances Commodity
Vendor Lock-in Typical Not Common
Technology Price $200K – $10M $50K – $500K
Implementation Cost Moderate High
Extensibility Vendor-provided APIs Open Source
Supportability Easy Complex
Scalability Up to 100 servers Up to 5000 servers
Scalability Up to 100-300 TB Up to 100 PB
Target Systems DWH Purpose-Built Batch
Target End Users Business Analysts Developers
40Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
41Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
42Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
43Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
44Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
45Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
46Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
47Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
48Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
49Pivotal Confidential–Internal Use Only
MPP vs Hadoop for Architect
MPP Hadoop
Query Optimization Good Poor to None
Debugging Easy Very Hard
Accessibility SQL Mainly Java
DBA Skill Level Low High
Single Job Redundancy Low High
Query Latency 10-20 ms 10-20 sec
Query Runtime 5-7 sec 10-15 mins
Query Max Runtime 1-2 hours 1-2 weeks
Min Collection Size Megabytes Gigabytes
Max Concurrency 10-15 queries 70-100 jobs
50Pivotal Confidential–Internal Use Only
Agenda
Ÿ  Distributed Systems
Ÿ  MPP
Ÿ  Hadoop
Ÿ  MPP vs Hadoop
Ÿ  Examples
Ÿ  Summary
51Pivotal Confidential–Internal Use Only
Summary
Use MPP for
Ÿ  Analytical DWH
Ÿ  Ad hoc analyst SQL queries and BI
Ÿ  Keep under 100TB of data
Use Hadoop for
Ÿ  Specialized data processing systems
Ÿ  Over 100TB of data
52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only
Questions?
BUILT FOR THE SPEED OF BUSINESS

Weitere ähnliche Inhalte

Was ist angesagt?

Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesWalaa Hamdy Assy
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkDatabricks
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Spark Summit
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuFlink Forward
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache CalciteJordan Halterman
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solidLars Albertsson
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 

Was ist angesagt? (20)

Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Simplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache SparkSimplifying Big Data Analytics with Apache Spark
Simplifying Big Data Analytics with Apache Spark
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark WuVirtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
Virtual Flink Forward 2020: A deep dive into Flink SQL - Jark Wu
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Spark
SparkSpark
Spark
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 

Ähnlich wie MPP vs Hadoop

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointInside Analysis
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargAgile Testing Alliance
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeNicolas Morales
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009yhadoop
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceHortonworks
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingYahoo Developer Network
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsInside Analysis
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewYafang Chang
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingInside Analysis
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachablekcmallu
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenDataWorks Summit
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15IBMInfoSphereUGFR
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases DataWorks Summit
 

Ähnlich wie MPP vs Hadoop (20)

Hadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter PointHadoop and the Data Warehouse: Point/Counter Point
Hadoop and the Data Warehouse: Point/Counter Point
 
Big Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya GargBig Data - Hadoop and MapReduce - Aditya Garg
Big Data - Hadoop and MapReduce - Aditya Garg
 
Big SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor LandscapeBig SQL Competitive Summary - Vendor Landscape
Big SQL Competitive Summary - Vendor Landscape
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
 
Eric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers ConferenceEric Baldeschwieler Keynote from Storage Developers Conference
Eric Baldeschwieler Keynote from Storage Developers Conference
 
Hadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data ProcessingHadoop @ Yahoo! - Internet Scale Data Processing
Hadoop @ Yahoo! - Internet Scale Data Processing
 
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens DoorsThe Anywhere Enterprise – How a Flexible Foundation Opens Doors
The Anywhere Enterprise – How a Flexible Foundation Opens Doors
 
HadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop OverviewHadoopCon- Trend Micro SPN Hadoop Overview
HadoopCon- Trend Micro SPN Hadoop Overview
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
 
The practice of big data - making big data approachable
The practice of big data - making big data approachableThe practice of big data - making big data approachable
The practice of big data - making big data approachable
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
Carpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP HavenCarpe Datum: Building Big Data Analytical Applications with HP Haven
Carpe Datum: Building Big Data Analytical Applications with HP Haven
 
Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15Ibm leads way with hadoop and spark 2015 may 15
Ibm leads way with hadoop and spark 2015 may 15
 
SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases  SQL on Hadoop: Defining the New Generation of Analytics Databases
SQL on Hadoop: Defining the New Generation of Analytics Databases
 

Kürzlich hochgeladen

Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectBoston Institute of Analytics
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 

Kürzlich hochgeladen (20)

Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 

MPP vs Hadoop

  • 1. 1Pivotal Confidential–Internal Use Only 1Pivotal Confidential–Internal Use Only MPP vs Hadoop Alexey Grishchenko HUG Meetup 28.11.2015
  • 2. 2Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 3. 3Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 4. 4Pivotal Confidential–Internal Use Only Distributed Systems Avoid distributed systems in all the problems that potentially could be solved using non-distributed systems
  • 5. 5Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  Consensus problem –  Paxos –  RAFT –  ZAB –  etc. Ÿ  Transaction consistency –  2PC –  3PC
  • 6. 6Pivotal Confidential–Internal Use Only Distributed Systems Ÿ  CAP Theorem
  • 7. 7Pivotal Confidential–Internal Use Only Distributed Systems http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 8. 8Pivotal Confidential–Internal Use Only Distributed Systems Reasons to use Ÿ  Performance issues –  More than 100’000 TPS –  More than 4 GB/sec scan rate –  More than 100’000 IOPS Ÿ  Capacity issues –  More than 50TB of data Ÿ  DR and Geo-Distribution
  • 9. 9Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 10. 10Pivotal Confidential–Internal Use Only MPP Main principles Ÿ  Shared Nothing Ÿ  Data Sharding Ÿ  Data Replication Ÿ  Distributed Transactions Ÿ  Parallel Processing
  • 12. 12Pivotal Confidential–Internal Use Only MPP Works well for Ÿ  Relational data Ÿ  Batch processing Ÿ  Ad hoc analytical SQL Ÿ  Low concurrency Ÿ  Applications requiring ANSI SQL
  • 13. 13Pivotal Confidential–Internal Use Only MPP Not the best choice for Ÿ  Non-relational data Ÿ  OLTP and event stream processing Ÿ  High concurrency Ÿ  100+ server clusters Ÿ  Non-analytical use cases Ÿ  Geo-Distributed use cases
  • 14. 14Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 15. 15Pivotal Confidential–Internal Use Only Hadoop Main Components Ÿ  HDFS Ÿ  YARN Ÿ  MapReduce Ÿ  HBase Ÿ  Hive / Hive+Tez
  • 16. 16Pivotal Confidential–Internal Use Only Hadoop HDFS Ÿ  Distributed filesystem Ÿ  Block-level storage with big blocks Ÿ  Non-updatable Ÿ  Synchronous block replication Ÿ  No built-in Geo-Distribution support Ÿ  No built-in DR solution
  • 18. 18Pivotal Confidential–Internal Use Only Hadoop YARN Ÿ  Cluster resource manager Ÿ  Manages CPU and RAM allocation Ÿ  Schedulers are pluggable Ÿ  Can handle different resource pools Ÿ  Supports both MR and non-MR workload
  • 20. 20Pivotal Confidential–Internal Use Only Hadoop MapReduce Ÿ  Framework for distributed data processing Ÿ  Two main operations: map and reduce Ÿ  Data hits disk after “map” and before “reduce” Ÿ  Scales to thousands of servers Ÿ  Can process petabytes of data Ÿ  Extremely reliable
  • 21. 21Pivotal Confidential–Internal Use Only Hadoop MapReduce
  • 22. 22Pivotal Confidential–Internal Use Only Hadoop HBase Ÿ  Distributed key-value store Ÿ  Data is sharded by key Ÿ  Data is stored in sorted order Ÿ  Stores multiple versions of the row Ÿ  Easily scales
  • 24. 24Pivotal Confidential–Internal Use Only Hadoop Hive Ÿ  Query engine with SQL-like syntax Ÿ  Translates HiveQL query to MR / Tez / Spark job Ÿ  Processes HDFS data Ÿ  Supports UDFs and UDAFs
  • 26. 26Pivotal Confidential–Internal Use Only Hadoop Works well for Ÿ  Write Once Read Many Ÿ  100+ server clusters Ÿ  Both relational and non-relational data Ÿ  High concurrency Ÿ  Batch processing and analytical workload Ÿ  Elastic scalability
  • 27. 27Pivotal Confidential–Internal Use Only Hadoop Not the best choice for Ÿ  Write-heavy workloads Ÿ  Small clusters Ÿ  Analytical DWH cases Ÿ  OLTP and event stream processing Ÿ  Cost savings
  • 28. 28Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Summary
  • 29. 29Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open
  • 30. 30Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity
  • 31. 31Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common
  • 32. 32Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K
  • 33. 33Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High
  • 34. 34Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source
  • 35. 35Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex
  • 36. 36Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers
  • 37. 37Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB
  • 38. 38Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch
  • 39. 39Pivotal Confidential–Internal Use Only MPP vs Hadoop for Business MPP Hadoop Platform Openness Mostly Closed Open Hardware Options Mostly Appliances Commodity Vendor Lock-in Typical Not Common Technology Price $200K – $10M $50K – $500K Implementation Cost Moderate High Extensibility Vendor-provided APIs Open Source Supportability Easy Complex Scalability Up to 100 servers Up to 5000 servers Scalability Up to 100-300 TB Up to 100 PB Target Systems DWH Purpose-Built Batch Target End Users Business Analysts Developers
  • 40. 40Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None
  • 41. 41Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard
  • 42. 42Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java
  • 43. 43Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High
  • 44. 44Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High
  • 45. 45Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec
  • 46. 46Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins
  • 47. 47Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks
  • 48. 48Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes
  • 49. 49Pivotal Confidential–Internal Use Only MPP vs Hadoop for Architect MPP Hadoop Query Optimization Good Poor to None Debugging Easy Very Hard Accessibility SQL Mainly Java DBA Skill Level Low High Single Job Redundancy Low High Query Latency 10-20 ms 10-20 sec Query Runtime 5-7 sec 10-15 mins Query Max Runtime 1-2 hours 1-2 weeks Min Collection Size Megabytes Gigabytes Max Concurrency 10-15 queries 70-100 jobs
  • 50. 50Pivotal Confidential–Internal Use Only Agenda Ÿ  Distributed Systems Ÿ  MPP Ÿ  Hadoop Ÿ  MPP vs Hadoop Ÿ  Examples Ÿ  Summary
  • 51. 51Pivotal Confidential–Internal Use Only Summary Use MPP for Ÿ  Analytical DWH Ÿ  Ad hoc analyst SQL queries and BI Ÿ  Keep under 100TB of data Use Hadoop for Ÿ  Specialized data processing systems Ÿ  Over 100TB of data
  • 52. 52Pivotal Confidential–Internal Use Only 52Pivotal Confidential–Internal Use Only Questions?
  • 53. BUILT FOR THE SPEED OF BUSINESS