Introduction to HBase



    Gokuldas K Pillai
       @gokool
HBase - The Hadoop Database
• Based on Google’s BigTable (OSDI’06)
• Runs on top of Hadoop but provides real-time
  read/write access
• Distributed Column Oriented Database
HBase Strengths
• Can scale to billions of rows × millions of
  columns
• Relatively cheap & easy to scale
• Random, real-time read/write access to
  very large datasets
• Support for update, delete
Who is using it
• StumbleUpon / su.pr
    – Uses HBase as a real-time data storage and analytics platform
• Twitter
    – Distributed read/write backup of all MySQL instances. Powers
      “people search”.
•   Powerset (Now part of MS)
•   Adobe
•   Yahoo
•   Ning
•   Meetup
•   More at http://wiki.apache.org/hadoop/Hbase/PoweredBy
Key features
• Column Oriented store
  – A table costs only the space of the data actually stored
  – NULLs in rows are free
• Rows stored in sorted order
• Can scale to petabytes (as BigTable does at Google)
Comparing to RDBMS
•   No Joins
•   No query engine
•   No multi-row transactions (row-level atomicity only)
•   No column typing
•   No SQL, no ODBC/JDBC (HBql is available now)
Data Model - Tables
•   Tables consist of rows and columns
•   Table cells are versioned (by timestamp)
•   Tables are sorted by row key
•   Table access is via the row key (the primary key)
•   Row updates lock the row no matter how
    many columns are involved
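The table model above can be pictured as a sorted map of row key → column → timestamped versions. The following is a minimal sketch in plain Python, not the HBase API: the `Table` class, its method names, and the `max_versions` setting are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class Table:
    """Toy model of a table: row key -> column -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical retention setting
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        versions = self.rows[row][column]
        versions[ts] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (latest if ts is None)."""
        versions = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self, start=b"", stop=None):
        """Yield row keys in sorted order within [start, stop)."""
        for key in sorted(self.rows):
            if key >= start and (stop is None or key < stop):
                yield key
```

Because rows are kept in key order, a range scan is just a walk over a contiguous slice of keys, which is why row key design matters so much in practice.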
Column Families
• A row’s columns are grouped into families
• Column family members are identified by a
  common ‘printable’ prefix (family:qualifier)
• Column families must be predefined in the table schema
  – but column family members can be added
    dynamically
  – member names can be arbitrary bytes
• All column family members are co-located on
  disk
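As a rough illustration of the family rules above, the sketch below validates `family:qualifier` column names against a fixed set of families and groups cells by family, the way members of a family are stored together. `Schema` and `group_by_family` are hypothetical helpers, not part of any HBase client library.

```python
class Schema:
    """Toy schema: families are fixed up front, qualifiers are free-form."""

    def __init__(self, families):
        self.families = frozenset(families)

    def split_column(self, column: bytes):
        """Split b'family:qualifier' and reject undeclared families."""
        family, sep, qualifier = column.partition(b":")
        if not sep:
            raise ValueError("expected b'family:qualifier'")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        return family, qualifier


def group_by_family(schema, cells):
    """Group {column: value} by family, mimicking per-family storage."""
    stores = {family: {} for family in schema.families}
    for column, value in cells.items():
        family, qualifier = schema.split_column(column)
        stores[family][qualifier] = value
    return stores
```

A new qualifier such as `b"info:new"` is accepted without any schema change, while a column in an undeclared family is rejected.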
Server Architecture
• Similar to HDFS
  – HBaseMaster ~ NameNode
  – RegionServer ~ DataNode
• HBase stores state via the Hadoop FS API
• Can persist to:
  – Local
  – Amazon S3
  – HDFS (Default)
HBaseMaster
What it does:
• Bootstraps a new instance
• Assigns regions and handles RegionServer problems
   – Each region from every table is assigned to a RegionServer
• When machines fail, moves their regions
• When regions split, moves regions to balance load

What it does NOT do:
    – Handle write requests (Not a DB Master)
    – Handle location finding requests (handled by RegionServer)
RegionServer
• Carries (serves) the regions
• Handles client read/write requests
• Manages region splits (informs the Master)
Regions
• Horizontal Partitioning
• Every region has a subset of the table’s rows
• Region identified as
  – [table, first row (inclusive), last row (exclusive)]
• Table starts on a single region
• A region splits into two roughly equal-sized regions as
  it grows bigger, and so on
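The partitioning described above can be sketched as a sorted list of region start keys plus a binary search. This is a toy model under stated assumptions: `RegionMap` is a hypothetical name, and the split trigger here is a row count (`split_threshold`), chosen only to keep the example small; it is not how a real cluster decides when to split.

```python
import bisect


class RegionMap:
    """Toy region map for one table: sorted start keys, each owning
    the half-open key range [start, next_start)."""

    def __init__(self):
        self.starts = [b""]   # a table begins as one region covering all keys
        self.rows = [[]]      # sorted row keys held by each region

    def locate(self, row):
        """Index of the region whose range contains row."""
        return bisect.bisect_right(self.starts, row) - 1

    def insert(self, row, split_threshold=4):
        i = self.locate(row)
        region = self.rows[i]
        pos = bisect.bisect_left(region, row)
        if pos == len(region) or region[pos] != row:
            region.insert(pos, row)
        if len(region) > split_threshold:
            self._split(i)

    def _split(self, i):
        region = self.rows[i]
        mid = len(region) // 2
        # The middle row key becomes the start key of the new right-hand region.
        self.starts.insert(i + 1, region[mid])
        self.rows[i:i + 1] = [region[:mid], region[mid:]]
```

Each start key is inclusive and the next region's start key acts as the exclusive end, matching the [first row, last row) convention on the slide.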
ZooKeeper
• Master election and server availability
• Cluster management
  – Assignment transaction state management
• Client contacts ZooKeeper to bootstrap
  connection to the HBase cluster
• Region key ranges, region server addresses
• Guarantees consistency of data across clients
Workflow (Client connecting first time)
•   Client → ZooKeeper (returns the location of -ROOT-)
•   Client → -ROOT- (returns the location of .META.)
•   Client → .META. (returns the RegionServer for the row)
•   To avoid three lookups every time, the client caches
    this info.
    – Re-cache on fault
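The three-hop lookup and the client-side cache can be sketched as follows. Everything here is a simplification for illustration: `Client`, the dictionary-based `zookeeper` and `catalog` arguments, and the `lookups` counter are hypothetical, and a real client would cache by region key range rather than by individual row.

```python
class Client:
    """Toy client that resolves a row to a RegionServer and caches the answer."""

    def __init__(self, zookeeper, catalog):
        self.zookeeper = zookeeper    # hypothetical: {"root": server name}
        self.catalog = catalog        # hypothetical: server name -> lookup table
        self.cache = {}
        self.lookups = 0              # counts remote lookups, for illustration

    def _resolve(self, table, row):
        # Hop 1: ZooKeeper says which server hosts -ROOT-.
        root_server = self.zookeeper["root"]
        self.lookups += 1
        # Hop 2: -ROOT- says which server hosts .META.
        meta_server = self.catalog[root_server][".META."]
        self.lookups += 1
        # Hop 3: .META. says which RegionServer holds the row.
        region_server = self.catalog[meta_server][(table, row)]
        self.lookups += 1
        return region_server

    def locate(self, table, row):
        key = (table, row)
        if key not in self.cache:
            self.cache[key] = self._resolve(table, row)
        return self.cache[key]

    def call(self, table, row, live_servers):
        server = self.locate(table, row)
        if server not in live_servers:
            # Fault: the cached location is stale, so drop it and look it up again.
            del self.cache[(table, row)]
            server = self.locate(table, row)
        return server
```

The first `locate` costs three lookups; repeated calls cost none until a request hits a server that no longer hosts the region.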
Write/Read Operation
• Write request: Client → RegionServer
    → Commit log (on HDFS), then memstore
        • Flush to filesystem when the memstore fills

• Read request: Client → RegionServer
    → Look up the memstore first
      If not found there, look up the flush files (newest to oldest)
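The write and read paths above can be sketched with three plain data structures. This is a toy model, not HBase code: `Region`, `memstore_limit` (counted in entries rather than bytes), and the in-memory `commit_log` list standing in for a log on HDFS are all assumptions made for the example.

```python
class Region:
    """Toy write/read path: commit log, memstore, and flush files."""

    def __init__(self, memstore_limit=2):
        self.memstore_limit = memstore_limit  # hypothetical flush threshold (entries)
        self.commit_log = []    # stands in for the log kept on HDFS
        self.memstore = {}      # in-memory buffer of recent writes
        self.flush_files = []   # immutable snapshots, oldest first

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, for crash recovery
        self.memstore[key] = value
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Persist the memstore as a sorted, immutable snapshot and start fresh.
        self.flush_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def read(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for snapshot in reversed(self.flush_files):  # newest to oldest
            if key in snapshot:
                return snapshot[key]
        return None
```

Reading newest to oldest means a later write to the same key always shadows earlier flushed values, without ever rewriting the older files.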
Integration
• Java HBase Client API
• High performance Thrift gateway
• A REST-ful Web service gateway (Stargate)
  – Supports XML, binary data encoding options
• Cascading, Hive and Pig integration
• HBase shell (JRuby)
• TableInput/TableOutputFormat for MapReduce
Main Classes
• HBaseAdmin
  – Create table, drop table, list and alter table
• HTable
  – Put
  – Get
  – Scan
Alternatives to HBase
• Cassandra (From Facebook)
  – Based on Amazon’s Dynamo
  – No master/slave; peer-to-peer (P2P)
  – Tunable: consistency vs. latency
• Yahoo’s PNUTS
        – Not open source
        – Works well for multi-DC / geographically dispersed servers
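The "tunable consistency vs. latency" point for Dynamo-style stores is commonly explained with quorum arithmetic: with N replicas, a read that waits for R replicas and a write that waits for W replicas are guaranteed to overlap on at least one replica when R + W > N. The helper below is a hypothetical illustration of that rule only, not a Cassandra API.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum intersects every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n
```

For example, with three replicas, reading and writing at two replicas each overlaps, while reading and writing at one replica each does not; smaller R and W lower latency at the cost of possibly stale reads.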
References
•   Hadoop – The Definitive Guide
•   Cloudera website
•   http://wiki.hbase.apache.org
•   Lars George,
    – http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
•   Comparing HBase, Cassandra and PNUTS
    – http://blog.amandeepkhurana.com/2010/05/comparing-pnuts-hbase-and-cassandra.html
•   ACID compliance of HBase
    – http://hbase.apache.org/docs/r0.89.20100621/acid-semantics.html


Editor's notes

  1. Some are also contributors
  2. Introduce Regions from Tables.
  3. -ROOT- stores the location of the .META. table regions. .META. stores the location of all user regions. Entries are keyed by region name, which is made up of [tableName, start row, timestamp, hash(1,2,3)].
  4. Writes arriving at a RegionServer are first appended to a commit log and then added to an in-memory memstore. When a memstore fills, its content is flushed to the filesystem. The commit log is hosted on HDFS, so it remains available through a RegionServer crash. When reading, the region’s memstore is consulted first. If sufficient versions are found reading the memstore alone, we return. Otherwise, flush files are consulted in order, from newest to oldest, until versions sufficient to satisfy the query are found or until we run out of flush files. Compaction merges multiple flush files into one, removes versions beyond the configured maximum, and deletes expired cells.
  5. Add content one row at a time using HTable.put(Put): create an instance of a Put object and specify the value, target column, and optional timestamp. Read using the get method, HTable.get(Get). Broad: get all cells in a row. Narrow: return only a single cell value. Scan a table using the Scan class for cursor-like access: call HTable.getScanner(Scan) and invoke next on the returned object. Get and Scan return a Result object, which is a list of KeyValue objects. Delete using HTable.delete(Delete) to remove individual cells, entire families, etc. Put, Get, and Delete lock the row.
  6. Cassandra's weak consistency comes in the form of eventual consistency, which means the database eventually reaches a consistent state. As the data is replicated, the latest version of something is sitting on some node in the cluster while older versions are still out there on other nodes, but eventually all nodes will see the latest version. The CAP theorem (Brewer) states that you have to pick two of Consistency, Availability, and Partition tolerance: you can't have all three at the same time and get an acceptable latency.