SlideShare ist ein Scribd-Unternehmen logo
1 von 12
HBase CRUD
Use Java API for Create, Read, Update, Delete operations
Agenda
• Intro
• Create
• Insert
• Update
• Delete
• Read – Table Scan
• Read – Get Field
• Conclusions
Intro
A rowkey primarily represents each row uniquely in the HBase table, whereas other
keys such as column family, timestamp, and so on are used to locate a piece of data
in an HBase table. The HBase API provides the following methods to support the
CRUD operations:
• Put
• Get
• Delete
• Scan
• Increment
You could find source code for this presentation on github:
https://github.com/EugeneYushin/HBase-CRUD
Create
Table creates in ‘Enabled’ state. Check table creation in Hue (Cloudera CDH 5.1.0) and hbase shell
Insert
Use HConnection.getTable() against HTablePool as last is deprecated in 0.94, 0.95/0.96, and removed
in 0.98 .
Insert
All manipulations with table implements through
HTableInterface. HTable represents particular table in
Hbase.
The HTable class is not thread-safe as concurrent
modifications are not safe. Hence, a single instance
of HTable for each thread should be used in any
application. For multiple HTable instances with the
same configuration reference, the same underlying
HConnection instance can be used.
RowKey is main point to consider when configuring
table structure. Use compound RowKey with SHA1,
MD5 hashing algorithms (with additional reverse
timestamp part) as Hbase store data sorted.
Update
Data in Hbase is versioned, by default there’re last 3 values stored into column.
Use HColumnDescriptor.setMaxVersions(n) method to overwrite this value.
Delete
Value for “user_name” qual changed to previous version.
Read – Table Scan
Table Scan...
PaulRK Paul paul01@mail.com
Read – Get Field
Get particular Field...
rowKey = MikeRK, user_name: Mike
rowKey = MikeRK, user_mail: mike@mail.com
Conclusions
• HTable is expensive
Creating HTable instances also comes at a cost. Creating an HTable instance is a slow process as the creation of each HTable instance involves the scanning of
the .META table to check whether the table actually exists, which makes the operation very costly. Hence, it is not recommended that you use a new HTable
instance for each request where the number of concurrent requests are very high
• Scan cashing
A scan can be configured to retrieve a batch of rows in every RPC call it makes to HBase. This configuration can be done at a per-scanner level by using the
setCaching(int) API on the scan object. This configuration can also be set in the hbasesite.xml configuration file using the hbase.client.scanner.caching
property
• Increment
Increment Column Value (ICV). It’s exposed as both the Increment command object like the others but also as a method on the HTableInterface. This
command allows you to change an integral value stored in an HBase cell without reading it back first. The data manipulation happens in HBase, not in your
client application, which makes it fast. It also avoids a possible race condition where some other client is interacting with the same cell.
• Filter
A filter is a predicate that executes in HBase instead of on the client. When you specify a Filter in your Scan, HBase uses it to determine whether a record
should be returned. This can avoid a lot of unnecessary data transfer. It also keeps the filtering on the server instead of placing that burden on the client. The
filter applied is anything implementing the org.apache.hadoop.hbase.filter.Filter interface. HBase provides a number of filters, but it’s easy to implement
your own.
Thank you
ushin.evgenij
https://www.linkedin.com/in/yushyn

Weitere ähnliche Inhalte

Andere mochten auch

Organization of Automated Testing
Organization of Automated TestingOrganization of Automated Testing
Organization of Automated TestingKlika Tech, Inc
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systemsKlika Tech, Inc
 
[Tech Talks] Typesafe Stack Introduction
[Tech Talks] Typesafe Stack Introduction[Tech Talks] Typesafe Stack Introduction
[Tech Talks] Typesafe Stack IntroductionKlika Tech, Inc
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Jeremy Walsh
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015Avinash Ramineni
 
An Overview of HTML5 Storage
An Overview of HTML5 StorageAn Overview of HTML5 Storage
An Overview of HTML5 StoragePaul Irish
 
Introduction to Serverless
Introduction to ServerlessIntroduction to Serverless
Introduction to ServerlessNikolaus Graf
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtMichael Stack
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
HBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseHBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseMichael Stack
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandJosh Elser
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for ArchitectsNick Dimiduk
 
NGINX Microservices Reference Architecture: Ask Me Anything
NGINX Microservices Reference Architecture: Ask Me AnythingNGINX Microservices Reference Architecture: Ask Me Anything
NGINX Microservices Reference Architecture: Ask Me AnythingNGINX, Inc.
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceCloudera, Inc.
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 

Andere mochten auch (20)

Organization of Automated Testing
Organization of Automated TestingOrganization of Automated Testing
Organization of Automated Testing
 
CAP theorem and distributed systems
CAP theorem and distributed systemsCAP theorem and distributed systems
CAP theorem and distributed systems
 
[Tech Talks] Typesafe Stack Introduction
[Tech Talks] Typesafe Stack Introduction[Tech Talks] Typesafe Stack Introduction
[Tech Talks] Typesafe Stack Introduction
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015HBase from the Trenches - Phoenix Data Conference 2015
HBase from the Trenches - Phoenix Data Conference 2015
 
An Overview of HTML5 Storage
An Overview of HTML5 StorageAn Overview of HTML5 Storage
An Overview of HTML5 Storage
 
Introduction to Serverless
Introduction to ServerlessIntroduction to Serverless
Introduction to Serverless
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBaseHBaseConEast2016: Practical Kerberos with Apache HBase
HBaseConEast2016: Practical Kerberos with Apache HBase
 
Apache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to UnderstandApache HBase Internals you hoped you Never Needed to Understand
Apache HBase Internals you hoped you Never Needed to Understand
 
Apache Phoenix + Apache HBase
Apache Phoenix + Apache HBaseApache Phoenix + Apache HBase
Apache Phoenix + Apache HBase
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
NGINX Microservices Reference Architecture: Ask Me Anything
NGINX Microservices Reference Architecture: Ask Me AnythingNGINX Microservices Reference Architecture: Ask Me Anything
NGINX Microservices Reference Architecture: Ask Me Anything
 
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, SalesforceHBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
HBaseCon 2012 | Learning HBase Internals - Lars Hofhansl, Salesforce
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 

Kürzlich hochgeladen

SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 

Kürzlich hochgeladen (20)

SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 

HBase CRUD Java API

  • 1. HBase CRUD Use Java API for Create, Read, Update, Delete operations
  • 2. Agenda • Intro • Create • Insert • Update • Delete • Read – Table Scan • Read – Get Field • Conclusions
  • 3. Intro A rowkey primarily represents each row uniquely in the HBase table, whereas other keys such as column family, timestamp, and so on are used to locate a piece of data in an HBase table. The HBase API provides the following methods to support the CRUD operations: • Put • Get • Delete • Scan • Increment You could find source code for this presentation on github: https://github.com/EugeneYushin/HBase-CRUD
  • 4. Create Table creates in ‘Enabled’ state. Check table creation in Hue (Cloudera CDH 5.1.0) and hbase shell
  • 5. Insert Use HConnection.getTable() against HTablePool as last is deprecated in 0.94, 0.95/0.96, and removed in 0.98 .
  • 6. Insert All manipulations with table implements through HTableInterface. HTable represents particular table in Hbase. The HTable class is not thread-safe as concurrent modifications are not safe. Hence, a single instance of HTable for each thread should be used in any application. For multiple HTable instances with the same configuration reference, the same underlying HConnection instance can be used. RowKey is main point to consider when configuring table structure. Use compound RowKey with SHA1, MD5 hashing algorithms (with additional reverse timestamp part) as Hbase store data sorted.
  • 7. Update Data in Hbase is versioned, by default there’re last 3 values stored into column. Use HColumnDescriptor.setMaxVersions(n) method to overwrite this value.
  • 8. Delete Value for “user_name” qual changed to previous version.
  • 9. Read – Table Scan Table Scan... PaulRK Paul paul01@mail.com
  • 10. Read – Get Field Get particular Field... rowKey = MikeRK, user_name: Mike rowKey = MikeRK, user_mail: mike@mail.com
  • 11. Conclusions • HTable is expensive Creating HTable instances also comes at a cost. Creating an HTable instance is a slow process as the creation of each HTable instance involves the scanning of the .META table to check whether the table actually exists, which makes the operation very costly. Hence, it is not recommended that you use a new HTable instance for each request where the number of concurrent requests are very high • Scan cashing A scan can be configured to retrieve a batch of rows in every RPC call it makes to HBase. This configuration can be done at a per-scanner level by using the setCaching(int) API on the scan object. This configuration can also be set in the hbasesite.xml configuration file using the hbase.client.scanner.caching property • Increment Increment Column Value (ICV). It’s exposed as both the Increment command object like the others but also as a method on the HTableInterface. This command allows you to change an integral value stored in an HBase cell without reading it back first. The data manipulation happens in HBase, not in your client application, which makes it fast. It also avoids a possible race condition where some other client is interacting with the same cell. • Filter A filter is a predicate that executes in HBase instead of on the client. When you specify a Filter in your Scan, HBase uses it to determine whether a record should be returned. This can avoid a lot of unnecessary data transfer. It also keeps the filtering on the server instead of placing that burden on the client. The filter applied is anything implementing the org.apache.hadoop.hbase.filter.Filter interface. HBase provides a number of filters, but it’s easy to implement your own.