Gaurav Kohli
Xebia

Breaking with Relational DBMS and Dating with HBase

About me

Gaurav Kohli
gaurav.in@gmail.com
Consultant, Xebia IT Architects

   Why are we here?
   Something about RDBMS
   Limitations of RDBMS
   Why HBase or any NoSQL solution
   Overview of HBase
   Specific use cases
   Paradigm shift in schema design
   Architecture of HBase
   HBase interfaces – Java API, Thrift
   Conclusion

Databases

Relational Databases have a lot of

   Data sets are growing into petabytes
   RDBMS don't scale inherently
       Scale up / scale out (load balancing + replication)
   Hard to shard / partition
   High throughput for both reads and writes is not possible
       Transactional vs. analytical databases
   Specialized hardware is very expensive
       Oracle clustering

[Diagram: master-slave replication, one Master replicating to one Slave]

[Diagram: writes go to the Master, reads are served by the Slave nodes]

   The MySQL master becomes the bottleneck for writes
   All slaves must have the same write capacity as the master
   Single point of failure, no easy failover

[Diagram: master-master replication with an additional Slave]

   2006.11
      Google releases paper on BigTable

   2007.2
      Initial HBase prototype created as Hadoop contrib.

   2007.10
      First usable HBase

   2008.1
      Hadoop become Apache top-level project and HBase becomes
       subproject
   2010.5~
      Hbase becomes Apache top-level project

   2010.6
       Hbase 0.26.5 released.
   2010.10
                                 12
       HBase 0.89.2010092 – third developer release
   Distributed
       Uses HDFS for storage
   Column-oriented
   Multi-dimensional
       Versions
   Highly available
   High-performance
   Storage system

HBase is not a SQL database
     No joins, no query engine, no datatypes, no SQL
     No schema (only column families are declared)
     Denormalized data
     Wide and sparsely populated data structure (key-value)
     No DBA needed

   Bigness
       Big data, big number of users, big number of computers
   Massive write performance
       Facebook needs to handle 135 billion messages a month
       Twitter stores 7 TB of data per day
   Fast key-value access
   Write availability
   No single point of failure

Specific use cases
     Managing large streams of non-transactional data: Apache
      logs, application logs, MySQL logs, etc.
     Real-time inserts, updates, and queries
     Fraud detection by comparing transactions to known
      patterns in real time
     Analytics: use MapReduce, Hive, or Pig to perform
      analytical queries

   Column-oriented database
   Tables are sorted by row key
   Table schema only defines column families
       A column family can have any number of columns
   Each cell value has a timestamp

SortedMap(
    RowKey, List(
        SortedMap(
            Column, List(
                Value, Timestamp
            )
        )
    )
)
SortedMap(RowKey, List(SortedMap(Column, List(Value, Timestamp))))

A BIG SORTED MAP
     Row key + column key + timestamp => value

Student table (sorted by row key and column key)

Row Key     Column Key        Timestamp         Value
1           info:name         1273516197868     Gaurav
1           info:age          1273871824184     28
1           info:age          1273871823022     34
1           info:sex          1273746281432     Male
2           info:name         1273863723227     Harsh
3           info:name         1273822456433     Raman

     In info:name, "info" is the column family and "name" is the column qualifier
     The two info:age entries are 2 versions of the same cell
     The timestamp is a long value

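The "big sorted map" idea can be pictured with a small in-memory sketch. The Python below only illustrates the logical model (row key + column key + timestamp => value); it is not HBase's API or storage format, and the names `put`, `get_latest` and `get_versions` as well as the tuple layout are assumptions made for this example.

```python
from bisect import insort

# Toy model of the logical layout: one sorted list of
# (row_key, column_key, -timestamp, value) entries.
# Negating the timestamp makes newer versions sort first within a cell.
cells = []

def put(row, column, timestamp, value):
    """Add one versioned cell, keeping the list sorted."""
    insort(cells, (row, column, -timestamp, value))

def get_latest(row, column):
    """Return the newest value for (row, column), or None if absent."""
    for r, c, _neg_ts, value in cells:
        if (r, c) == (row, column):
            return value  # the first match is the newest version
    return None

def get_versions(row, column):
    """Return all (timestamp, value) pairs for a cell, newest first."""
    return [(-neg_ts, v) for r, c, neg_ts, v in cells if (r, c) == (row, column)]

# The Student table entries from the slide
put("1", "info:name", 1273516197868, "Gaurav")
put("1", "info:age", 1273871823022, "34")
put("1", "info:age", 1273871824184, "28")
put("1", "info:sex", 1273746281432, "Male")
put("2", "info:name", 1273863723227, "Harsh")
put("3", "info:name", 1273822456433, "Raman")

print(get_latest("1", "info:age"))    # newest version of the cell
print(get_versions("1", "info:age"))  # both stored versions
```

A linear search is used here only to keep the sketch short; the point is that a read without an explicit timestamp sees the newest version, while older versions stay addressable.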
   Example of a Student and Subject (m:n relationship)

      Student Table        Subject Table          Student-Subject Table
      PK   id              PK   id                student_id
           name                 title             subject_id
           age                  introduction      type
           sex                  teacher_id

RDBMS: example of a Student and Subject

Student table

    id      name             age               sex
    1       Gaurav           28                Male

Subject table

    id       title          introduction              teacher_id
    1        Hbase          Hbase is cool             10

Student-Subject table

    student_id       subject_id         type
    1                1                  elective

HBase: Student-Subject schema

Student table

Row Key           Column Family   Column Keys
student_id        info            name, age, sex
student_id        subjects        subject IDs as qualifiers (keys)

Subject table

Row Key           Column Family   Column Keys
subject_id        info            title, introduction, teacher_id
subject_id        students        student IDs as qualifiers (keys)

HBase: Student-Subject schema, sample data

Student table

key               info                              subjects
1                 info:name=Gaurav                  subjects:1="elective"
                  info:age=28                       subjects:2="main"
                  info:sex=Male

Subject table

key               info                              students
1                 info:title=Hbase                  students:1
                  info:introduction=Hbase is cool   students:2
                  info:teacher_id=10

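To see why the denormalized layout needs no join table, here is a small sketch that uses plain Python dictionaries in place of the two HBase tables. The helper names `enroll` and `subjects_of` are hypothetical. In a real client each dictionary write would be a Put on the corresponding table, and the two writes would not be atomic across tables.

```python
# Each "table" maps row key -> {"family:qualifier": value}
student_table = {
    "1": {"info:name": "Gaurav", "info:age": "28", "info:sex": "Male"},
}
subject_table = {
    "1": {"info:title": "Hbase", "info:introduction": "Hbase is cool",
          "info:teacher_id": "10"},
}

def enroll(student_id, subject_id, kind):
    """Record the m:n relationship on both sides (hypothetical helper)."""
    # The subject id is the column qualifier; the enrollment type is the value
    student_table[student_id][f"subjects:{subject_id}"] = kind
    # The student id is the column qualifier; the value is left empty
    subject_table[subject_id][f"students:{student_id}"] = ""

def subjects_of(student_id):
    """List (subject_id, type) pairs for a student from a single row read."""
    row = student_table[student_id]
    return sorted(
        (col.split(":", 1)[1], value)
        for col, value in row.items()
        if col.startswith("subjects:")
    )

enroll("1", "1", "elective")
print(subjects_of("1"))
```

The trade-off is that the relationship is stored twice, so the application has to keep both rows in step.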
Attribute     Possible Values            Default
COMPRESSION   NONE, GZ, LZO              NONE
VERSIONS      1+                         3
TTL           1-2147483647 (seconds)     2147483647
BLOCKSIZE     1 byte – 2 GB              64 KB
IN_MEMORY     true, false                false
BLOCKCACHE    true, false                true

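How VERSIONS and TTL interact can be sketched as a filter over the versions of one cell. This is a simplified model of the semantics (keep at most VERSIONS newest entries, drop entries older than TTL seconds), not HBase's implementation; `visible_versions` and its parameter names are chosen for this example only.

```python
def visible_versions(versions, now_ms, max_versions=3, ttl_seconds=2147483647):
    """Return the (timestamp_ms, value) pairs a read would still see.

    versions: iterable of (timestamp_ms, value) pairs for one cell.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    alive = [(ts, v) for ts, v in versions if ts > cutoff_ms]
    alive.sort(key=lambda pair: pair[0], reverse=True)  # newest first
    return alive[:max_versions]

now = 1_000_000_000
history = [(now - 5_000, "a"), (now - 4_000, "b"),
           (now - 3_000, "c"), (now - 1_000, "d")]

print(visible_versions(history, now))                 # defaults: 3 newest versions
print(visible_versions(history, now, ttl_seconds=2))  # only cells younger than 2 s
```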
   Region: a contiguous set of lexicographically sorted rows
       Maximum size set by hbase.hregion.max.filesize (default: 256 MB)
   Regions are hosted by RegionServers
   Each table is partitioned into regions

Regions and
[Diagram: two regions holding row1 to row200 and row201 to row500; a new row is added]

Regions and
[Diagram: three regions holding row1 to row200, row201 to row350, and row351 to row501]

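The region diagrams can be mimicked with sorted lists of row keys that split once they grow past a threshold. This is a toy sketch under stated assumptions: HBase decides on splits by store file size (hbase.hregion.max.filesize), whereas the example simply counts rows, and `MAX_ROWS`, `insert_row` and `find_region` are invented for illustration.

```python
from bisect import bisect_right, insort

MAX_ROWS = 4  # stand-in for hbase.hregion.max.filesize (assumption: rows, not bytes)

# Each region is a sorted list of row keys; regions are kept in key order.
regions = [[]]

def find_region(row_key):
    """Return the index of the region whose key range contains row_key."""
    start_keys = [r[0] for r in regions[1:]]  # first key of every region but the first
    return bisect_right(start_keys, row_key)

def insert_row(row_key):
    """Insert a row and split the region in half if it grew too large."""
    idx = find_region(row_key)
    region = regions[idx]
    insort(region, row_key)
    if len(region) > MAX_ROWS:
        mid = len(region) // 2
        regions[idx:idx + 1] = [region[:mid], region[mid:]]

for key in ["row100", "row200", "row300", "row400", "row500"]:
    insert_row(key)

print(regions)
print(find_region("row350"))
```

Because rows are sorted lexicographically, locating the region for a key is a binary search over the regions' start keys.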
   Master
   ZooKeeper
   RegionServers
   HDFS
   MapReduce

HBase Interface – Java API, Thrift...
   Java
   Thrift (Ruby, PHP, Python, Perl, C++...)
   REST
   Groovy DSL
   MapReduce
   HBase Shell

HBase Interface – Java API
   Java
       Get
       Put
       Delete
       Scan
       incrementColumnValue

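The operations listed above can be illustrated without a cluster by a small in-memory stand-in. The class below is a hypothetical model, not the HBase client: the method names only mirror the operations on the slide, a dictionary plays the role of a table, and versions are left out for brevity.

```python
class ToyTable:
    """In-memory stand-in for a table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value):
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        """Return a copy of the row's columns (empty dict if missing)."""
        return dict(self.rows.get(row, {}))

    def delete(self, row):
        self.rows.pop(row, None)

    def scan(self, start_row, stop_row):
        """Yield (row, columns) in row-key order for start_row <= row < stop_row."""
        for row in sorted(self.rows):
            if start_row <= row < stop_row:
                yield row, dict(self.rows[row])

    def increment_column_value(self, row, column, amount=1):
        """Add amount to a counter column and return the new value."""
        columns = self.rows.setdefault(row, {})
        columns[column] = columns.get(column, 0) + amount
        return columns[column]

table = ToyTable()
table.put("student-1", "info:name", "Gaurav")
table.put("student-2", "info:name", "Harsh")
table.put("student-3", "info:name", "Raman")
table.delete("student-2")

print(table.get("student-1"))
print([row for row, _ in table.scan("student-1", "student-3")])
print(table.increment_column_value("stats", "info:visits"))
```

The scan treats the stop row as exclusive, which is why "student-3" does not appear in the result.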
   HBase vs. RDBMS
       Not a replacement
       Solves only a small subset (~5%)

   Where SQL makes life easy
       Joins
       Secondary indexing
       Referential integrity (updates)
       ACID
   Where HBase makes life easy
       Dataset scale
       Read/write scale
       Replication
       Batch analysis

   HBase at Apache (http://hbase.apache.org/)
   HBase Wiki (wiki.apache.org/hadoop/Hbase)
   HBase blog (blog.hbase.org)
   Images from Google Search
   http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
   http://highscalability.com/blog/2010/12/6/what-the-heck-are-you-actually-using-nosql-for.html
