Local Secondary Indexes in Apache Phoenix
Rajeshbabu Chintaguntla
A deep dive into local indexes in Apache Phoenix
1.
Local Secondary Indexes in Apache Phoenix
Rajeshbabu Chintaguntla
PhoenixCon 2017
2.
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Agenda
Local indexes introduction
Local index design and data model
Local index writes and reads
Performance results
Helpful tips and recommendations
3.
Secondary indexes in Phoenix
The primary key columns of a Phoenix table form the HBase row key, which acts as the primary index, so filtering on primary key columns becomes a point lookup or a range scan on the table.
Filtering on a non-primary-key column turns the query into a full table scan, which consumes a lot of time and resources.
Secondary indexes provide alternative access paths that turn such queries into point lookups or range scans.
Phoenix supports two kinds of indexes: GLOBAL and LOCAL.
Phoenix also supports functional indexes.
4.
Local Secondary Indexes: Introduction
A local secondary index is LOCAL in the sense that each REGION of a table is treated as a unit, and an index of that region's data is created and maintained.
The local index data is stored and maintained in shadow column families in the same table, so the index always resides on the same server that serves the actual data.
Faster index building.
Syntax (as used in the examples on the following slides):
    CREATE LOCAL INDEX <index name> ON <table name> (<indexed columns>) [INCLUDE (<covered columns>)]
5.
Local Secondary Index: Introduction

    CREATE TABLE IF NOT EXISTS ORDERS(
        ORDER_ID BIGINT NOT NULL PRIMARY KEY,
        CUSTOMER_ID BIGINT NOT NULL,
        ITEM_ID INTEGER NOT NULL,
        DATE DATE NOT NULL);
    CREATE LOCAL INDEX IDX ON ORDERS(DATE)

Base table data (ORDER ID is the primary key)

Region [100, 104)
Order ID | Customer ID | Item ID | Date
100      | 11          | 1111    | 06/10/2017
101      | 23          | 1231    | 06/01/2017
102      | 11          | 1332    | 05/31/2017
103      | 34          | 3221    | 06/01/2017

Region [104, 107)
Order ID | Customer ID | Item ID | Date
104      | 55          | 1343    | 05/28/2017
105      | 11          | 2312    | 06/01/2017
106      | 29          | 1234    | 05/15/2017

Index row keys

Index of Region [100, 104)
Region start key | IDX ID | Date       | Order ID
100              | 1      | 05/31/2017 | 102
100              | 1      | 06/01/2017 | 101
100              | 1      | 06/01/2017 | 103
100              | 1      | 06/10/2017 | 100

Index of Region [104, 107)
Region start key | IDX ID | Date       | Order ID
104              | 1      | 05/15/2017 | 106
104              | 1      | 05/28/2017 | 104
104              | 1      | 06/01/2017 | 105
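The index tables on this slide can be reproduced with a small sketch. It is only an illustration of the idea (prefix each index entry with the start key of the region that holds the data row, then sort), not Phoenix code; the tuple layout and the helper names `region_start_for` and `build_local_index` are assumptions made for this example.

```python
from bisect import bisect_right
from datetime import date

# Rows of the ORDERS table from the slide: (order_id, customer_id, item_id, date).
ORDERS = [
    (100, 11, 1111, date(2017, 6, 10)),
    (101, 23, 1231, date(2017, 6, 1)),
    (102, 11, 1332, date(2017, 5, 31)),
    (103, 34, 3221, date(2017, 6, 1)),
    (104, 55, 1343, date(2017, 5, 28)),
    (105, 11, 2312, date(2017, 6, 1)),
    (106, 29, 1234, date(2017, 5, 15)),
]

REGION_START_KEYS = [100, 104]  # regions [100, 104) and [104, 107)
INDEX_ID = 1


def region_start_for(order_id):
    """Return the start key of the region that holds this primary key."""
    return REGION_START_KEYS[bisect_right(REGION_START_KEYS, order_id) - 1]


def build_local_index(rows):
    """Build sorted index keys: (region start key, index id, date, order id)."""
    keys = [(region_start_for(oid), INDEX_ID, d, oid) for oid, _, _, d in rows]
    return sorted(keys)


local_index = build_local_index(ORDERS)
for key in local_index:
    print(key)
```

Because the region start key is the leading part of every index key, the entries of one region sort together and fall inside that region's own key range.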
6.
Data Model
Shadow column families store the index data.
Base table:
    CREATE TABLE IF NOT EXISTS WEB_STAT (
        HOST CHAR(2) NOT NULL,
        DOMAIN VARCHAR NOT NULL,
        FEATURE VARCHAR NOT NULL,
        DATE DATE NOT NULL,
        STATS.ACTIVE_VISITOR INTEGER
        CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN));
Local indexes:
1) CREATE LOCAL INDEX IDX ON WEB_STAT(DATE)
2) CREATE LOCAL INDEX IDX2 ON WEB_STAT(STATS.ACTIVE_VISITOR) INCLUDE(DATE)
3) CREATE LOCAL INDEX IDX3 ON WEB_STAT(DATE) INCLUDE(STATS.ACTIVE_VISITOR)
Diagram: Region1 and Region2 of the table hold the data column families 0 and STATS together with the shadow column families L#0 and L#STATS.
7.
Data Model: local index row key format
REGION START KEY | SALT NUMBER (empty for a non-salted table) | INDEX ID | TENANT_ID (empty for a non-multi-tenant table) | INDEXED COLUMN VALUE[S] | PRIMARY KEY COLUMN VALUE[S]
REGION START KEY: Start key of the data region. For the first region it is an empty byte array with the length of the region end key. This keeps the index data organized region by region.
SALT NUMBER: A byte value representing the salt bucket number calculated for the index row key.
INDEX ID: A short number that identifies the local index. This keeps the data of each index stored together.
TENANT_ID: Tenant column value of the row key. It is empty if the table is not multi-tenant.
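As a rough sketch of the field order described above, the following function concatenates the parts of a local index row key. The byte encodings are assumptions chosen for illustration (a 2-byte big-endian index id, zero bytes for the first region's prefix, plain concatenation without separators); Phoenix's real encoding is more involved, and the function name `local_index_row_key` is hypothetical.

```python
import struct


def local_index_row_key(region_start_key, index_id, indexed_values, pk_values,
                        region_end_key=b"", salt_byte=None, tenant_id=b""):
    """Concatenate the row key parts in the order listed on the slide.

    Encoding choices here are illustrative assumptions, not Phoenix's
    actual byte layout.
    """
    if region_start_key == b"":
        # First region: empty (zero) bytes with the length of the region end key.
        prefix = bytes(len(region_end_key))
    else:
        prefix = region_start_key
    # Salt byte and tenant id are empty for non-salted, non-multi-tenant tables.
    salt = b"" if salt_byte is None else bytes([salt_byte])
    return b"".join([prefix, salt, struct.pack(">h", index_id), tenant_id,
                     *indexed_values, *pk_values])


# Region starting at b"F", index id 1, one indexed value, one primary key value.
key = local_index_row_key(b"F", 1, [b"findme"], [b"K1"])
# First region of the table (empty start key, end key b"F").
first_region_key = local_index_row_key(b"", 1, [b"a"], [b"B"], region_end_key=b"F")
```

Placing the index id right after the region prefix means all entries of one index sit next to each other inside a region, which matches the slide's remark that it helps store each index's data together.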
8.
Write path
Diagram: CLIENT, and a Region Server hosting a Region with a Data cf and an Index cf (each with a MemStore) and a WAL.
1. Write request
2. Batch call
Prepare index updates (data updates and index updates)
4. Merge data and index updates
5. Write to MemStores
6. Write to WAL
Local index updates are 100% ATOMIC and CONSISTENT with data updates.
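The write steps can be mimicked with a toy model. This is a sketch under simplifying assumptions, not the HBase or Phoenix API: the `Region` class, its dictionaries, and the tuple-shaped index keys are all hypothetical. It only shows why merging data and index cells into one batch lets a single WAL entry cover both.

```python
class Region:
    """Toy region with one data and one index column family (not the HBase API)."""

    def __init__(self, start_key):
        self.start_key = start_key
        self.memstores = {"0": {}, "L#0": {}}  # data cf and shadow index cf
        self.wal = []

    def prepare_index_updates(self, data_updates, index_id=1):
        # One index cell per data row: (region start, index id, indexed value, pk).
        return {(self.start_key, index_id, value, pk): b""
                for pk, value in data_updates.items()}

    def batch_put(self, data_updates):
        index_updates = self.prepare_index_updates(data_updates)
        # Step 4: merge data and index updates into one batch.
        merged = {"0": dict(data_updates), "L#0": index_updates}
        # Step 5: write to the MemStore of each column family.
        for cf, cells in merged.items():
            self.memstores[cf].update(cells)
        # Step 6: a single WAL entry holds both the data and the index cells.
        self.wal.append(merged)
        return merged


region = Region("F")
region.batch_put({"K1": "findme"})
```

Since data and index cells live in the same region, no cross-server call is needed to keep them in step, which is the property the slide highlights.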
9.
Read Path
SELECT COUNT(*) FROM T WHERE INDEXED_COL='findme'
Diagram: the Client sends the query to every region, ['', F), [F, L), [L, R), and [R, ''), hosted across the region servers; each region holds the column families 0 and L#0. The numbers 2, 1, 0, and 5 appear beside the regions.
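A minimal sketch of this fan-out, assuming the client asks every region for a partial count and then adds the partial results. The per-region contents below are made-up toy data, chosen so that the partial counts come out as 2, 1, 0, and 5 (the assignment of those numbers to particular regions is an assumption); `count_in_region` and `count_matches` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-region local index contents: indexed value -> primary keys.
REGION_INDEXES = {
    ("", "F"): {"findme": ["A1", "C7"], "other": ["B2"]},
    ("F", "L"): {"findme": ["G3"]},
    ("L", "R"): {"other": ["M4"]},
    ("R", ""): {"findme": ["R1", "S2", "T3", "U4", "V5"]},
}


def count_in_region(index, value):
    """Count matching index entries inside a single region."""
    return len(index.get(value, []))


def count_matches(value):
    """Scan every region's local index in parallel and sum the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda idx: count_in_region(idx, value),
                                 REGION_INDEXES.values()))
    return partials, sum(partials)


partials, total = count_matches("findme")
print(partials, total)
```

The point of the sketch is that a local index cannot tell the client which region holds a value, so every region has to be consulted, including the one that returns 0.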
10.
Read Path: joining back missing columns from the data table
SELECT INDEX_COL, NON_INDEX_COL FROM T WHERE INDEX_COL='findme'
Diagram: CLIENT, and a Region with an Index cf and a Data cf, each with a MemStore.
1. SCAN, L#0, FILTER
2. Apply filter on index col
3. Get non-index cols for matching rows
4. Merge with index cols
5. Return combined results to client
6. Results
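The join-back steps could look roughly like the sketch below. The dictionaries standing in for the index and data column families, and the function name `query_with_join_back`, are assumptions for illustration only; the step numbers in the comments refer to the list above.

```python
def query_with_join_back(index_cf, data_cf, indexed_value, wanted_columns):
    """Return rows for `indexed_value`, fetching uncovered columns from the data cf.

    index_cf: {(indexed_value, pk): {covered column: value}}
    data_cf:  {pk: {column: value}}
    """
    results = []
    for (value, pk), covered in sorted(index_cf.items()):
        if value != indexed_value:          # step 2: filter on the index column
            continue
        row = dict(covered)
        missing = [c for c in wanted_columns if c not in row]
        if missing:                         # step 3: one get per matching row
            data_row = data_cf[pk]
            row.update({c: data_row[c] for c in missing})  # step 4: merge
        results.append({c: row[c] for c in wanted_columns})
    return results                          # step 5: combined results


index_cf = {
    ("findme", "K1"): {"INDEX_COL": "findme"},
    ("other", "K2"): {"INDEX_COL": "other"},
    ("findme", "K3"): {"INDEX_COL": "findme"},
}
data_cf = {
    "K1": {"INDEX_COL": "findme", "NON_INDEX_COL": 10},
    "K2": {"INDEX_COL": "other", "NON_INDEX_COL": 20},
    "K3": {"INDEX_COL": "findme", "NON_INDEX_COL": 30},
}
rows = query_with_join_back(index_cf, data_cf, "findme",
                            ["INDEX_COL", "NON_INDEX_COL"])
```

In this model the lookup into `data_cf` happens once per matching row, so its cost grows with the number of matches; if the index entry already carried `NON_INDEX_COL`, the `missing` list would be empty and no lookup would occur.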
11.
Region Splits and Merges
Since the indexes are also stored in the same table, splits and merges are taken care of by HBase automatically.
A special mechanism separates the index HFile into the child regions after a split: it scans through each key value, finds the data row key in it, and writes the key value to the corresponding child region.
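The routing rule described here can be sketched as follows, assuming index entries are tuples whose last element is the data row key. The function name `split_index_entries` and the tuple layout are hypothetical, and the sketch leaves out anything the real mechanism may do beyond routing.

```python
def split_index_entries(entries, split_key):
    """Route each index entry to a child region by its embedded data row key.

    entries: iterable of (region_start_key, index_id, indexed_value, data_row_key)
    Returns (lower_child, upper_child); entries whose data row key is at or
    above split_key go to the upper child.
    """
    lower, upper = [], []
    for entry in entries:
        data_row_key = entry[-1]
        (lower if data_row_key < split_key else upper).append(entry)
    return lower, upper


# Index entries of region [100, 104), in index order (sorted by date).
parent_index = [
    (100, 1, "2017-05-31", 102),
    (100, 1, "2017-06-01", 101),
    (100, 1, "2017-06-01", 103),
    (100, 1, "2017-06-10", 100),
]
lower_child, upper_child = split_index_entries(parent_index, split_key=102)
```

The entries are ordered by indexed value rather than by data row key, so cutting the file at one position would not separate the two children; each entry has to be inspected individually, as the slide describes.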
12.
Performance Results
4-node cluster.
Tested with 5 local indexes on a base table of 25 columns with 10 regions.
Ingested 50M rows.
3x faster upsert time compared to global indexes.
5x less network RX/TX utilization during writes compared to global indexes.
Similar read performance compared to global indexes for queries such as aggregations, GROUP BY, and LIMIT.
13.
Performance results: write performance (chart)
14.
Performance results: network Tx/Rx during write (chart)
15.
Performance results: network Tx/Rx during write (chart)
16.
Performance results: network Tx/Rx during write (chart)
17.
Helpful Tips
Mutable vs. immutable rows table?
– Writes are much faster with local indexes on an immutable-rows table than on a mutable one. If rows are written once and never updated, it is better to create the table with the IMMUTABLE_ROWS property.
Online vs. offline index population?
– When a table has pre-existing data, index population time varies with the data size.
– Index population usually happens on the server by reading the data table and writing the index to the same table, which is normally very fast. If the data size is very large, it is better to use ASYNC population with IndexTool.
Covered index vs. non-covered index?
– When a query accesses columns that are not in the index, Phoenix joins the missing columns from the data table itself using get calls. If the number of matching rows is high, it is better to create a covered index to avoid the get calls.
18.
Thank You
Q & A?
rajeshbabu@apache.org
@rajeshhcu32