HBase vs. Hive

Philip Wickline
Chief Technology Officer
Hadapt
Goals


Brief introduction to the differences between
   transactional/operational and analytical systems



Understand when to use Hive and when to use HBase and why




Databases




Datastores




Differences of Purpose: “Transaction Processing”
Operational systems
• Optimized for small, short random-access reads and writes
• E.g. record that an employee invested $100 in an S&P 500 index
  fund in his 401(k) *or* record that a user posted something on
  another user's “wall”

Traditional DB examples
• Oracle
• MySQL
NoSQL Examples
• HBase
• MongoDB
• Cassandra
Differences of Purpose: Analytics
Analytics
• Optimized for read-only computations about large amounts of
  data
• E.g. compute the average amount invested in bond funds and
  stock funds for all employees at all employers over the last 5
  years

DB Examples
• Netezza
• Vertica
NoSQL Examples
• Hive
• Pig

[Figure: sample report charts. Legend entries: Plan, Actual, Option 1, Acme, GM, Newco, Oldco, Bigcorp. Month axis: Oct to Mar.]
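The two access patterns from the last two slides can be contrasted with a small in-memory sketch. Everything here is hypothetical (the record layout, the function names, the sample amounts); it only illustrates that an operational write touches one record, while an analytical query reads all of them.

```python
from collections import defaultdict

# Hypothetical in-memory records: (employee_id, fund_type, amount_in_dollars).
investments = []

def record_investment(employee_id, fund_type, amount):
    """Operational pattern: one small write touching a single record."""
    investments.append((employee_id, fund_type, amount))

def average_by_fund_type(records):
    """Analytical pattern: a read-only pass over every record."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, fund_type, amount in records:
        totals[fund_type] += amount
        counts[fund_type] += 1
    return {fund: totals[fund] / counts[fund] for fund in totals}

# Illustrative sample data, not taken from the slides.
record_investment("e1", "stock", 100.0)
record_investment("e2", "stock", 300.0)
record_investment("e1", "bond", 50.0)

averages = average_by_fund_type(investments)
print(averages)
```

The cost of the write does not depend on how much data already exists, while the cost of the average grows with the total number of records. That difference is why the two kinds of system are optimized differently.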
HBase Data Model : Conceptual


From the BigTable paper:
“a sparse, distributed, persistent multi-dimensional sorted map”



(row : bytestring, column family : bytestring, column : bytestring,
time : int64) -> bytestring




HBase Map
{ "key_1" : {
   "columnfamily_a" : {
     "column_i" : {
       15 : "y",
       4 : "m"
     },
     "column_ii" : {
       15 : "d"
   }},
   "columnfamily_b" : {
     "column_other" : {
       6 : "w",
       3 : "o",
       1 : "w"
  }}}}
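As a rough illustration, the map above can be written as nested Python dictionaries with a small versioned lookup. This is a sketch only: `get_cell` is a hypothetical helper, not part of any HBase client API, and plain dicts do not keep keys sorted the way HBase does.

```python
# A toy, in-memory stand-in for the logical map on the slide:
# row -> column family -> column -> timestamp -> value.
table = {
    "key_1": {
        "columnfamily_a": {
            "column_i": {15: "y", 4: "m"},
            "column_ii": {15: "d"},
        },
        "columnfamily_b": {
            "column_other": {6: "w", 3: "o", 1: "w"},
        },
    },
}

def get_cell(table, row, family, column, timestamp=None):
    """Return the newest value at or before `timestamp` (newest overall if None)."""
    versions = table[row][family][column]
    candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
    if not candidates:
        return None
    return versions[max(candidates)]

latest = get_cell(table, "key_1", "columnfamily_a", "column_i")
as_of_5 = get_cell(table, "key_1", "columnfamily_b", "column_other", timestamp=5)
print(latest, as_of_5)
```

Each column holds several timestamped versions, so a read has to say which version it wants; here the default is the newest one.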
Hive Data Model : Conceptual
Traditional Relational Tables

CUSTKEY   NAME      ADDRESS          NATIONKEY   PHONE          ACCTBAL      COMMENT
451234    NEWCORP   196 Broadway …   1           111-555-1212   $1,231,285   NULL
887765    ACME      1 Main st. …     2           222-555-1212   $46,945      “Top customer”
HBase Data Model : Physical

Every cell is stored with its row, family, column, and timestamp
Allows fast lookup with low copy overhead
BUT
Space inefficient (optional compression available) and inefficient
   to scan

      row       family    column    timestamp value
      “key_1”   “cf_a”    “c_i”     15        “foo”
      “key_1”   “cf_a”    “c_ii”    15        “bar”
      “key_2”   “cf_a”    “c_ii”    4         “baz”
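The physical layout can be sketched by flattening the nested logical map into one record per cell. This is a simplified illustration, not HBase's actual file format: `flatten` is a hypothetical helper, and the sort order (row, family, column ascending, then newest timestamp first) is an assumption made for the sketch.

```python
def flatten(table):
    """Turn {row: {family: {column: {timestamp: value}}}} into a sorted list of cells."""
    cells = [
        (row, family, column, ts, value)
        for row, families in table.items()
        for family, columns in families.items()
        for column, versions in columns.items()
        for ts, value in versions.items()
    ]
    # Sort by row, family, column ascending, then newest timestamp first.
    cells.sort(key=lambda c: (c[0], c[1], c[2], -c[3]))
    return cells

# The same three cells as on the slide, deliberately given out of order.
table = {
    "key_2": {"cf_a": {"c_ii": {4: "baz"}}},
    "key_1": {"cf_a": {"c_ii": {15: "bar"}, "c_i": {15: "foo"}}},
}

cells = flatten(table)
for cell in cells:
    print(cell)
```

Every record repeats the full row key, family, and column name. That repetition is where the space overhead on the slide comes from, and it is also why a scan over a single column still has to read all of the key material.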
Hive Data Model : Physical
Depends on the underlying storage files
Can use flat text files, RCFiles, or even HBase for storage



Standard Row Storage

    C_1        C_2        C_3        C_4
    11         12         13         14
    21         22         23         24
    31         32         33         34
    41         42         43         44
    51         52         53         54



Hive Data Model : RCFile
Break into row groups, and then store as columns

                         Row Group 1
       C_1          11           21           31
       C_2          12           22           32
       C_3          13           23           33
       C_4          14           24           34


                   Row Group 2
       C_1          41           51
       C_2          42           52
       C_3          43           53
       C_4          44           54



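The row-group layout above can be reproduced with a short sketch. The helper names are hypothetical, and sizing a group by a fixed number of rows is an assumption chosen to match the slide rather than a description of how RCFile sizes its row groups.

```python
def to_row_groups(rows, group_size):
    """Split rows into groups of `group_size`, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) transposes a list of rows into a list of columns.
        groups.append([list(column) for column in zip(*group)])
    return groups

def read_column(groups, index):
    """Read one column across all row groups without touching the other columns."""
    return [value for group in groups for value in group[index]]

# The five rows from the "Standard Row Storage" slide.
rows = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
    [51, 52, 53, 54],
]

groups = to_row_groups(rows, group_size=3)
c_2 = read_column(groups, 1)
print(groups)
print(c_2)
```

A query that needs only C_2 reads one list per row group and skips the rest, which is the main benefit of a columnar layout for analytical scans.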
Informal Performance Comparison


                           Hive                                     HBase
  Insert speed             batch                                    Fast!
  Update speed             NA                                       Fast!
  Lookup speed             MapReduce lower bound (10s of seconds)   Fast!
  Data warehouse queries   15x faster on one test                   Uh oh
THANK YOU

H base vs hive srp vs analytics 2-14-2012


Editor's notes

  1. Not about Hadapt. Be inclusive of beginners. Be brief.
  2. Not a religious presentation – different systems have different properties that work for different needs.
  3. 10 GB TPC-H data. CDH3B3 Hive and HBase. Single-node desktop workstation: 4 cores, 8 GB, a few drives.