HBase CRUD Java API

Basic concepts for implementing Create, Read, Update, and Delete operations on HBase through the Java API. Follow us on LinkedIn: https://www.linkedin.com/groups?home=&gid=8104884

HBase CRUD
Use Java API for Create, Read, Update, Delete operations

Agenda
• Intro
• Create
• Insert
• Update
• Delete
• Read – Table Scan
• Read – Get Field
• Conclusions

Intro
A rowkey uniquely identifies each row in an HBase table, whereas other
keys such as the column family, column qualifier, and timestamp are used to
locate a piece of data within that row. The HBase API provides the following
operations to support CRUD:
• Put
• Get
• Delete
• Scan
• Increment
You can find the source code for this presentation on GitHub:
https://github.com/EugeneYushin/HBase-CRUD

Create
A table is created in the ‘Enabled’ state. Verify the table creation in Hue (Cloudera CDH 5.1.0) and in the hbase shell.
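
The code from the original slide is not reproduced in this transcript. As a minimal sketch only, assuming the HBase 0.98-era client that ships with CDH 5.1.0 is on the classpath, table creation could look like this. The table name "users" and the column family "info" are hypothetical and do not come from the presentation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

import java.io.IOException;

final class CreateTableSketch {
    // Hypothetical names, chosen only for illustration.
    static final String TABLE = "users";
    static final String FAMILY = "info";

    // Builds the table schema; this needs no running cluster.
    static HTableDescriptor describeTable() {
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE));
        table.addFamily(new HColumnDescriptor(FAMILY));
        return table;
    }

    // Creates the table if it is missing and reports whether it is enabled.
    static void createIfAbsent(Configuration conf) throws IOException {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            if (!admin.tableExists(TABLE)) {
                admin.createTable(describeTable());
            }
            System.out.println("enabled = " + admin.isTableEnabled(TABLE));
        } finally {
            admin.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath to locate the cluster.
        createIfAbsent(HBaseConfiguration.create());
    }
}
```

Running `list` or `describe 'users'` in the hbase shell afterwards is one way to cross-check the result.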

Insert
Use HConnection.getTable() instead of HTablePool, as the latter is deprecated in 0.94 and 0.95/0.96, and removed
in 0.98.
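
A minimal sketch of an insert through HConnection, assuming the HBase 0.98-era client API. The qualifiers user_name and user_mail and the row key MikeRK appear in the output shown on later slides; the table name "users" and the family "info" are assumptions made for this example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class InsertSketch {
    // Hypothetical family name; the qualifiers follow the slides' output.
    static final byte[] FAMILY = Bytes.toBytes("info");
    static final byte[] USER_NAME = Bytes.toBytes("user_name");
    static final byte[] USER_MAIL = Bytes.toBytes("user_mail");

    // Builds one Put carrying both columns of a user row (no cluster needed).
    static Put buildUserPut(String rowKey, String name, String mail) {
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(FAMILY, USER_NAME, Bytes.toBytes(name));
        put.add(FAMILY, USER_MAIL, Bytes.toBytes(mail));
        return put;
    }

    // Obtains a lightweight table handle from a shared connection and writes the row.
    static void insert(Configuration conf) throws IOException {
        HConnection connection = HConnectionManager.createConnection(conf);
        try {
            HTableInterface table = connection.getTable("users"); // hypothetical table
            try {
                table.put(buildUserPut("MikeRK", "Mike", "mike@mail.com"));
            } finally {
                table.close();
            }
        } finally {
            connection.close();
        }
    }

    public static void main(String[] args) throws IOException {
        insert(HBaseConfiguration.create());
    }
}
```

In a real application the HConnection would typically be created once and shared, with only the table handle opened and closed per unit of work.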

Insert
All table operations go through HTableInterface.
HTable represents a particular table in HBase.
The HTable class is not thread-safe, because concurrent
modifications are not safe. Hence, each thread in an
application should use its own HTable instance.
Multiple HTable instances created with the same
configuration reference can share the same underlying
HConnection instance.
The RowKey is the main point to consider when designing
the table structure. Because HBase stores data sorted by
RowKey, use a compound RowKey built with a hashing
algorithm such as SHA-1 or MD5 (plus a reverse timestamp
part) so that rows spread evenly across the key space.
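
One possible way to build such a compound RowKey, shown as a sketch in plain Java with no HBase dependency. The layout (16-byte MD5 of a user id followed by an 8-byte reverse timestamp) is an assumption for illustration, not the layout used in the presentation; the `compare` helper mimics the unsigned byte-wise ordering HBase applies to row keys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RowKeySketch {
    // Hypothetical layout: MD5(userId) (16 bytes) + reverse timestamp (8 bytes).
    static byte[] rowKey(String userId, long timestampMillis) {
        byte[] hash = md5(userId);
        // Subtracting from Long.MAX_VALUE makes newer events sort first.
        long reverseTimestamp = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(hash.length + 8)
                .put(hash)
                .putLong(reverseTimestamp) // ByteBuffer writes big-endian by default
                .array();
    }

    static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("MikeRK", 1000L);
        byte[] newer = rowKey("MikeRK", 2000L);
        System.out.println("key length = " + older.length);
        System.out.println("newer sorts first = " + (compare(newer, older) < 0));
    }
}
```

The hash prefix scatters different users across the key space, while the reverse timestamp keeps the most recent entry of one user at the start of that user's key range.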

Update
Data in HBase is versioned. By default, the last 3 versions of a value are kept per column (releases from 0.96 onward default to 1).
Use the HColumnDescriptor.setMaxVersions(n) method to override this value.
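
A sketch of how versions could be configured and read back, assuming the HBase 0.98-era client API. An update is simply another Put to the same row and column. The family name "info" and the new value "Michael" are hypothetical.

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class VersionsSketch {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] USER_NAME = Bytes.toBytes("user_name");

    // Column family descriptor that retains the given number of versions.
    static HColumnDescriptor familyKeeping(int versions) {
        HColumnDescriptor family = new HColumnDescriptor(FAMILY);
        family.setMaxVersions(versions);
        return family;
    }

    // Get that asks for up to `versions` versions of user_name.
    static Get versionsOf(String rowKey, int versions) {
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(FAMILY, USER_NAME);
        try {
            get.setMaxVersions(versions);
        } catch (IOException e) {
            // Thrown by the client when versions is not positive.
            throw new IllegalArgumentException(e);
        }
        return get;
    }

    // Writes a new value, then prints every stored version, newest first.
    static void updateAndPrint(HTableInterface table, String rowKey) throws IOException {
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(FAMILY, USER_NAME, Bytes.toBytes("Michael")); // hypothetical new value
        table.put(put);

        Result result = table.get(versionsOf(rowKey, 3));
        for (Cell cell : result.getColumnCells(FAMILY, USER_NAME)) {
            System.out.println(cell.getTimestamp() + " -> "
                    + Bytes.toString(CellUtil.cloneValue(cell)));
        }
    }
}
```

A plain Get without setMaxVersions returns only the newest version, so older values stay invisible unless they are requested explicitly.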

Delete
After the delete, the value of the “user_name” qualifier reverts to its previous version.
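
This behaviour matches a delete of only the newest version of a cell. As a sketch assuming the HBase 0.98-era client API (family "info" is hypothetical), the two common variants differ as follows: deleteColumn removes the latest version, so the previous one becomes visible again, while deleteColumns removes all versions.

```java
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class DeleteSketch {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] USER_NAME = Bytes.toBytes("user_name");

    // Removes only the newest version; an older version, if kept, shows up again.
    static Delete latestVersionOnly(String rowKey) {
        Delete delete = new Delete(Bytes.toBytes(rowKey));
        delete.deleteColumn(FAMILY, USER_NAME);
        return delete;
    }

    // Removes every stored version of the column.
    static Delete allVersions(String rowKey) {
        Delete delete = new Delete(Bytes.toBytes(rowKey));
        delete.deleteColumns(FAMILY, USER_NAME);
        return delete;
    }

    static void revertUserName(HTableInterface table, String rowKey) throws IOException {
        table.delete(latestVersionOnly(rowKey));
    }
}
```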

Read – Table Scan
Table Scan...
PaulRK Paul paul01@mail.com
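
Output of this shape could come from a full table scan such as the following sketch, which assumes the HBase 0.98-era client API and a hypothetical column family "info". The printed format (row key, name, mail) is modelled on the line above.

```java
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class ScanSketch {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] USER_NAME = Bytes.toBytes("user_name");
    static final byte[] USER_MAIL = Bytes.toBytes("user_mail");

    // Scan over all rows, restricted to one column family.
    static Scan userScan() {
        Scan scan = new Scan();
        scan.addFamily(FAMILY);
        return scan;
    }

    // Prints "rowKey name mail" for every row in the table.
    static void printAll(HTableInterface table) throws IOException {
        System.out.println("Table Scan...");
        ResultScanner scanner = table.getScanner(userScan());
        try {
            for (Result row : scanner) {
                System.out.println(Bytes.toString(row.getRow()) + " "
                        + Bytes.toString(row.getValue(FAMILY, USER_NAME)) + " "
                        + Bytes.toString(row.getValue(FAMILY, USER_MAIL)));
            }
        } finally {
            scanner.close(); // releases the server-side scanner
        }
    }
}
```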

Read – Get Field
Get particular Field...
rowKey = MikeRK, user_name: Mike
rowKey = MikeRK, user_mail: mike@mail.com
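
A sketch of a Get that fetches individual fields of one row, assuming the HBase 0.98-era client API. The family "info" is hypothetical; the print format follows the output lines above.

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class GetSketch {
    static final byte[] FAMILY = Bytes.toBytes("info"); // hypothetical family
    static final byte[] USER_NAME = Bytes.toBytes("user_name");
    static final byte[] USER_MAIL = Bytes.toBytes("user_mail");

    // Get limited to the two columns of interest, so nothing else is transferred.
    static Get fieldsOf(String rowKey) {
        Get get = new Get(Bytes.toBytes(rowKey));
        get.addColumn(FAMILY, USER_NAME);
        get.addColumn(FAMILY, USER_MAIL);
        return get;
    }

    static void printFields(HTableInterface table, String rowKey) throws IOException {
        System.out.println("Get particular Field...");
        Result result = table.get(fieldsOf(rowKey));
        System.out.println("rowKey = " + rowKey + ", user_name: "
                + Bytes.toString(result.getValue(FAMILY, USER_NAME)));
        System.out.println("rowKey = " + rowKey + ", user_mail: "
                + Bytes.toString(result.getValue(FAMILY, USER_MAIL)));
    }
}
```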

Conclusions
• HTable is expensive
Creating an HTable instance is a slow process, because each creation involves scanning the .META. table (hbase:meta since 0.96) to check whether the
table actually exists. Hence, creating a new HTable instance for each request is not recommended when the number of concurrent requests is very high.
• Scan caching
A scan can be configured to retrieve a batch of rows in every RPC call it makes to HBase. This can be configured per scanner by using the
setCaching(int) API on the Scan object. It can also be set in the hbase-site.xml configuration file using the hbase.client.scanner.caching
property.
• Increment
Increment Column Value (ICV) is exposed both as the Increment command object, like the other operations, and as a method on HTableInterface. This
command allows you to change an integral value stored in an HBase cell without reading it back first. The data manipulation happens in HBase, not in your
client application, which makes it fast. It also avoids a possible race condition where some other client is interacting with the same cell.
• Filter
A filter is a predicate that executes in HBase instead of on the client. When you specify a Filter in your Scan, HBase uses it to determine whether a record
should be returned. This can avoid a lot of unnecessary data transfer. It also keeps the filtering on the server instead of placing that burden on the client. The
filter applied is anything implementing the org.apache.hadoop.hbase.filter.Filter interface. HBase provides a number of filters, and you can also implement
your own.
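
The scan caching, filter, and increment points above can be sketched together as follows, assuming the HBase 0.98-era client API. The family "info", the counter qualifier "login_count", and the choice of PrefixFilter are assumptions made for this example.

```java
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

final class TuningSketch {
    static final byte[] FAMILY = Bytes.toBytes("info");              // hypothetical family
    static final byte[] LOGIN_COUNT = Bytes.toBytes("login_count");  // hypothetical counter column

    // Scan that fetches `rowsPerRpc` rows per round trip and lets the region
    // servers drop every row whose key does not start with `prefix`.
    static Scan cachedPrefixScan(String prefix, int rowsPerRpc) {
        Scan scan = new Scan();
        scan.setCaching(rowsPerRpc);
        scan.setFilter(new PrefixFilter(Bytes.toBytes(prefix)));
        return scan;
    }

    // Adds 1 to the counter on the server and returns the new value,
    // without a separate read from the client.
    static long countLogin(HTableInterface table, String rowKey) throws IOException {
        return table.incrementColumnValue(Bytes.toBytes(rowKey), FAMILY, LOGIN_COUNT, 1L);
    }
}
```

A larger caching value reduces the number of RPC calls at the price of more memory per batch on both client and server, so the right number depends on row size.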

Thank you
ushin.evgenij
https://www.linkedin.com/in/yushyn
