SlideShare ist ein Scribd-Unternehmen logo
1 von 41
Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Apache Phoenix and HBase: Past, Present
and Future of SQL over HBase
Enis Soztutar (enis@hortonworks.com)
Ankit Singhal (asinghal@hortonworks.com)
Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
About Me
Enis Soztutar
Committer and PMC member in Apache HBase, Phoenix, and Hadoop
HBase/Phoenix team @Hortonworks
Twitter @enissoz
Disclaimer: Not a SQL expert!
Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Outline
PART I – The Past (a.k.a. All the existing stuff)
 Phoenix the basics
 Architecture
 Overview of existing Phoenix features
PART II – The Present (a.k.a. All the recent stuff)
 Look at recent releases
 Transactions
 Phoenix Query Server
 Other features
PART III – The Future (a.k.a. All the upcoming stuff)
 Calcite integration
 Phoenix – Hive
Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Part I – The Past
All the existing stuff !
Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Obligatory Slide - Who uses Phoenix
Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix – The Basics
• Hope everybody is familiar with HBase
• Otherwise you are in the wrong talk!
• What is wrong with pure-HBase?
• HBase is a powerful, flexible and extensible “engine”
• Too low level
• Have to write java code to do anything!
• Phoenix is relational layer over HBase
• Also described as a SQL-Skin
• Looking more and more like a generic SQL engine
• Why not Hive / Spark SQL / other SQL-over-Hadoop
• OTLP versus OLAP
• As fast as HBase, 1 ms query, 10K-1M qps
Page7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Why SQL?
Page8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
From CDK Global
slides
https://phoenix.apache.
org/presentations/Strata
HadoopWorld.pdf
Page9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
HBase Architecture
DataNode
RegionServer 2
T:foo, region:a
T:bar, region:54
T:foo, region:t
Application
HBase client
DataNode
RegionServer 1
T:foo, region:c
T:bar, region:14
T:foo, region:d
DataNode
RegionServer 3
T:bar, region:32
T:foo, region:k
ZooKeeper
Quorum
Page10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix Architecture
DataNode
RegionServer 2
T:foo, region:c
T:bar, region:54
T:foo, region:t
Phoenix RPC
endpoint
px
px
Application
Phoenix client / JDBC
HBase client
DataNode
RegionServer 1
T:foo, region:c
T:bar, region:14
T:foo, region:d
Phoenix RPC
endpoint
px
px
DataNode
RegionServer 3
T:SYSTEM.CATALOG
T:bar, region:32
T:foo, region:k
Phoenix RPC
endpoint
px
px
ZooKeeper
Quorum
Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix Goodies
SQL DataTypes
Schemas / DDL / HBase table properties
Composite Types (Composite Primary Key)
Map existing HBase tables
Write from HBase, read from Phoenix
Salting
Parallel Scan
Skip scan
Filter push down
Statistics Collection / Guideposts
Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
DDL Example
CREATE TABLE IF NOT EXISTS METRIC_RECORD (
METRIC_NAME VARCHAR,
HOSTNAME VARCHAR,
SERVER_TIME UNSIGNED_LONG NOT NULL
METRIC_VALUE DOUBLE,
…
CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME,
SERVER_TIME))
DATA_BLOCK_ENCODING=’FAST_DIFF', TTL=604800,
COMPRESSION=‘SNAPPY’
SPLIT ON ('a', 'k', 'm');
Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
METRIC_NAME HOSTNAME SERVER_TIME METRIC_VALUE
Regionserver.readRequestCount cn011.hortonworks.com 1396743589 92045759
Regionserver.readRequestCount cn011.hortonworks.com 1396767589 93051916
Regionserver.readRequestCount cn011.hortonworks.com …. …
Regionserver.readRequestCount cn012. hortonworks.com 1396743589
….. … … …
Regionserver.wal.bytesWritten cn011.hortonworks.com
Regionserver.wal.bytesWritten …. …. …
SORT ORDERSORTORDER
HBASE ROW KEY OTHER COLUMNS
Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Parallel Scan
SELECT * FROM METRIC_RECORD;
CLIENT 4-CHUNK PARALLEL 1-WAY
FULL SCAN OVER METRIC_RECORD
Region1
Region2
Region3
Region4
Client
RS3RS2
RS1
scanscanscanscan
Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Filter push down
SELECT * FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7;
CLIENT 4-CHUNK PARALLEL 1-WAY
FULL SCAN OVER METRIC_RECORD
SERVER FILTER BY
SERVER_TIME > DATE
'2016-04-06 09:09:05.978’
Region1
Region2
Region3
Region4
Client
RS3RS2RS1
scanscanscanscan
Server-side Filter
Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Skip Scan
SELECT * FROM METRIC_RECORD
WHERE METRIC_NAME LIKE 'abc%'
AND HOSTNAME in ('host1’,
'host2');
CLIENT 1-CHUNK PARALLEL 1-WAY SKIP
SCAN ON 2 RANGES OVER
METRIC_RECORD ['abc','host1'] -
['abd','host2']
Region1
Region2
Region3
Region4
Client
RS3RS2RS1
Skip scan
Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
TopN
SELECT * FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7
ORDER BY HOSTNAME LIMIT 5;
CLIENT 4-CHUNK PARALLEL 4-WAY FULL
SCAN OVER METRIC_RECORD
SERVER FILTER BY SERVER_TIME > …
SERVER TOP 5 ROWS SORTED BY
[HOSTNAME]
CLIENT MERGE SORT
Region1
Region2
Region3
Region4
Client
RS3RS2RS1
scanscanscanscan
Sort by HOSTNAME
Return only 5
ROWS
Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Aggregation
SELECT METRIC_NAME, HOSTNAME,
AVG(METRIC_VALUE)
FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7
GROUP BY METRIC_NAME, HOSTNAME
ORDER BY METRIC_NAME, HOSTNAME;
CLIENT 4-CHUNK PARALLEL 1-WAY FULL
SCAN OVER METRIC_RECORD
SERVER FILTER BY SERVER_TIME > …
SERVER AGGREGATE INTO ORDERED
DISTINCT ROWS BY
[METRIC_NAME, HOSTNAME]
CLIENT MERGE SORT
Region1
Region2
Region3
Region4
Client
RS3RS2RS1
scanscanscanscan
Return only
aggregated data by
METRIC_NAME,
HOSTNAME
Page19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Joins and subqueries in Phoenix
Grammar
• Inner, Left, Right, Full outer join, Cross join
• Semi-join / Anti-join
Algorithms
• Hash-join, sort-merge join
• Hash-join table is computed and pushed to each regionserver from client
Optimizations
• Predicate push-down
• PK-to-FK join optimization
• Global index with missing columns
• Correlated query rewrite
Page20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Joins and subqueries in Phoenix
Phoenix can execute most of TPC-H queries!
No nested loop join
With Calcite support, more improvements soon
No statistical Guided join selection yet
Not very good at executing very big joins
• No generic YARN / Tez execution layer
• But Hive / Spark support for generic DAG execution
Page21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Secondary Indexes
HBase table is a sorted map
• Everything in HBase is sorted in primary key order
• Full or partial scans in sort order is very efficient in HBase
• Sort data differently with secondary index dimensions
Two types
• Global index
• Local index
Query
• Indexes are “covered”
• Indexes are automatically selected from queries
• Only covered columns are returned from index without going back to data table
Page22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Global and Local Index
Global Index
• A single instance for all table data in a
different sort order
• A different HBase table per index
• Optimized for read-heavy use cases
• Can be one edit “behind” actual primary
data
• Transactional tables indices have ACID
guarantees
• Different consistency / durability for
mutable / immutable tables
Local Index
• Multiple mini-instances per region
• Uses same HBase table, different cf
• Optimized for write-heavy use cases
• Atomic commit and visibility (coming soon)
• Queries have to ask all regions for relevant
data from index
Page23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Part II – The Present
All the recent stuff !
Page24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Release Note Highlights
4.4
• Functional Indexes
• UDFs
• Query Server
• UNION ALL
• MR Index Build
• Spark Integration
• Date built-in functions
4.5
• Client-side per-statement metrics
• SELECT without FROM
• ALTER TABLE with VIEWS
• Math and Array built-in functions
Page25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Release Note Highlights
4.6
• ROW_TIMESTAMP for HBase native timestamps
• Support for correlate variable
• Support for un-nesting arrays
• Web-app for visualizing trace info (alpha)
4.7
• Transaction support
• Enhanced secondary index consistency guarantees
• Statistics improvements
• Perf improvements
Page26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Row Timestamps
A pseudo-column for HBase native timestamps (versions)
Enables setting and querying cell timestamps
Perfect for time-series use cases
• Combine with FIFO / Date Tiered Compaction policies
• And HBase scan file pruning based on min-max ts for very efficient scans
CREATE TABLE METRICS_TABLE (
CREATED_DATE NOT NULL DATE,
METRIC_ID NOT NULL CHAR(15), METRIC_VALUE LONG
CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP,
METRIC_ID)) SALT_BUCKETS = 8;
Page27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Transactions
Uses Tephra
Snapshot isolation semantics
Completely optional.
• Can be enabled per-table (TRANSACTIONAL=true)
• Transactional and non-transactional tables can live side by side
Transactions see their own uncommitted data
Released in 4.7, will GA in 5.0
Optimistic Concurrency Control
• No locking for rows
• Transactions have to roll back and undo their writes in case of conflict
• Cost of conflict is higher
Page28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Tephra Architecture
RegionServer 2
Tephra / HBase Client
RegionServer 1 RegionServer 3
HBase client
ZooKeeper
Quorum
Tephra Trx Manager
(active)
Tephra Trx Manager
(standby)
Page29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Transaction Lifecycle
From Tephra
presentation
http://www.slideshare.n
et/alexbaranau/transacti
ons-over-hbase
Page30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix Query Server
Similar to HBase REST Server / Hive Server 2
Built on top of Calcite’s Avatica Server with Phoenix bindings
Embeds a Phoenix thick client inside
No client side sorting / join!
Protobuf-3.0 over HTTP protocol
Has a (thin) JDBC driver
Allows ODBC driver for Phoenix
Page31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix architecture revisited (thick client)
RegionServer 2
T:foo, region:d
Phoenix RPC
endpoint
px
Application
RegionServer 1
T:foo, region:d
Phoenix RPC
endpoint
px
RegionServer 3
T:foo, region:d
Phoenix RPC
endpoint
px
HBase client
Phoenix client / JDBC
Page32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix Query Server
Phoenix Query Server (thin client)
RegionServer 2
T:foo, region:d
Phoenix RPC
endpoint
px
Application
Phoenix thin client / JDBC
RegionServer 1
T:foo, region:d
Phoenix RPC
endpoint
px
RegionServer 3
T:foo, region:d
Phoenix RPC
endpoint
px
Phoenix client / JDBC
HBase client
Phoenix Query Server
Phoenix client / JDBC
HBase client
Phoenix Query Server
Phoenix client / JDBC
HBase client
Page33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Other new features (4.8+)
Shaded client by default. No more library dependency problems!
Phoenix schema mapping to HBase namespace
• Allows using isolation and security features of HBase namespaces
• Standard SQL syntax:
CREATE SCHEMA FOO;
USE FOO;
LIMIT / OFFSET
• We already had LIMIT. Now we have OFFSET
• Together with Row-Value-Constructs, covers most of cursor use cases
Page34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Part III – The Future
All the upcoming stuff !
Page35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Local Index
• Local Index re-implemented
• Instead of a different table, now local index data is kept within the same data
table
• Local index data goes into a different column family
• Index and data is committed together atomically without external transactions
• Bunch of stability improvements with region splits and merges
Page36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Calcite Integration
Calcite is a framework for:
• Query parser
• Compiler
• Planner
• Cost based optimizer
SQL-92 compliant
Based on relational algebra
Cost based optimizer with default rules + pluggable rules per-backend
Used by Hive / Drill / Kylin / Samza, etc.
Page37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Calcite Integration
Page38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Phoenix - Hive integration
Hive is a very rich and generic execution engine
Uses Tez + YARN to execute arbitrary DAG
Hive integration enables big joins and other Hive features
Phoenix DDL with HiveQL
Data insert / update delete (DML) with HiveQL
Predicate pushdown, salting, partitioning, partition pruning, etc
Can use secondary indexes as well since it uses Phoenix compiler
https://issues.apache.org/jira/browse/PHOENIX-2743
Page39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Future<Phoenix>
JSON support
TPC-H / Microstrategy / Tableau queries
Sqoop integration
Support Omid based transactions
Dogfooding within the Hadoop-ecosystem
• Ambari Metrics Service (AMS) uses Phoenix
• YARN will soon use HBase / Phoenix (ATS)
STRUCT type
Improvements to cost based optimization
Security and other HBase features used from Phoenix
See https://phoenix.apache.org/roadmap.html
Page40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Further Reference
Even more info on https://phoenix.apache.org
 New Features: https://phoenix.apache.org/recent.html
 Roadmap: https://phoenix.apache.org/roadmap.html
Get involved in mailing lists
 user@phoenix.apache.org
 dev@phoenix.apache.org
Page41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Thanks
Q & A

Weitere ähnliche Inhalte

Was ist angesagt?

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizationsSzehon Ho
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Redis Introduction
Redis IntroductionRedis Introduction
Redis IntroductionAlex Su
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxData
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache KylinYang Li
 
Strongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixStrongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixYugabyteDB
 

Was ist angesagt? (20)

Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Hive join optimizations
Hive join optimizationsHive join optimizations
Hive join optimizations
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Redis Introduction
Redis IntroductionRedis Introduction
Redis Introduction
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache Kylin
 
Strongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixStrongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache Phoenix
 

Andere mochten auch

Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemDataWorks Summit/Hadoop Summit
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANDataWorks Summit/Hadoop Summit
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceDataWorks Summit/Hadoop Summit
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...DataWorks Summit/Hadoop Summit
 
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...DataWorks Summit/Hadoop Summit
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...DataWorks Summit/Hadoop Summit
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigDataWorks Summit/Hadoop Summit
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...DataWorks Summit/Hadoop Summit
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...DataWorks Summit/Hadoop Summit
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEDataWorks Summit/Hadoop Summit
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...DataWorks Summit/Hadoop Summit
 

Andere mochten auch (20)

Protecting Enterprise Data In Apache Hadoop
Protecting Enterprise Data In Apache HadoopProtecting Enterprise Data In Apache Hadoop
Protecting Enterprise Data In Apache Hadoop
 
Evolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage SubsystemEvolving HDFS to a Generalized Distributed Storage Subsystem
Evolving HDFS to a Generalized Distributed Storage Subsystem
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPANNetwork for the Large-scale Hadoop cluster at Yahoo! JAPAN
Network for the Large-scale Hadoop cluster at Yahoo! JAPAN
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
How to overcome mysterious problems caused by large and multi-tenancy Hadoop ...
 
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
From a single droplet to a full bottle, our journey to Hadoop at Coca-Cola Ea...
 
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
Near Real-Time Network Anomaly Detection and Traffic Analysis using Spark bas...
 
Data science lifecycle with Apache Zeppelin
Data science lifecycle with Apache ZeppelinData science lifecycle with Apache Zeppelin
Data science lifecycle with Apache Zeppelin
 
Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.Case study of DevOps for Hadoop in Recruit.
Case study of DevOps for Hadoop in Recruit.
 
The truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on HadoopThe truth about SQL and Data Warehousing on Hadoop
The truth about SQL and Data Warehousing on Hadoop
 
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/PigHivemall: Scalable machine learning library for Apache Hive/Spark/Pig
Hivemall: Scalable machine learning library for Apache Hive/Spark/Pig
 
Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...Data infrastructure architecture for medium size organization: tips for colle...
Data infrastructure architecture for medium size organization: tips for colle...
 
The real world use of Big Data to change business
The real world use of Big Data to change businessThe real world use of Big Data to change business
The real world use of Big Data to change business
 
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
Leveraging smart meter data for electric utilities: Comparison of Spark SQL w...
 
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNEGenerating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
 
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
Use case and Live demo : Agile data integration from Legacy system to Hadoop ...
 
#HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course #HSTokyo16 Apache Spark Crash Course
#HSTokyo16 Apache Spark Crash Course
 

Ähnlich wie Apache Phoenix and HBase: Past, Present and Future of SQL over HBase

Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanAnkit Singhal
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseJosh Elser
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicasenissoz
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Seetharam Venkatesh
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDataWorks Summit
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Ankit Singhal
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasDataWorks Summit
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizonArtem Ervits
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleYifeng Jiang
 

Ähnlich wie Apache Phoenix and HBase: Past, Present and Future of SQL over HBase (20)

Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, JapanApache Phoenix and HBase - Hadoop Summit Tokyo, Japan
Apache Phoenix and HBase - Hadoop Summit Tokyo, Japan
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
HBase Read High Availability Using Timeline Consistent Region Replicas
HBase  Read High Availability Using Timeline Consistent Region ReplicasHBase  Read High Availability Using Timeline Consistent Region Replicas
HBase Read High Availability Using Timeline Consistent Region Replicas
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
What's new in Ambari
What's new in AmbariWhat's new in Ambari
What's new in Ambari
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
HBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region ReplicasHBase Read High Availabilty using Timeline Consistent Region Replicas
HBase Read High Availabilty using Timeline Consistent Region Replicas
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
Sub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scaleSub-second-sql-on-hadoop-at-scale
Sub-second-sql-on-hadoop-at-scale
 

Mehr von DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Mehr von DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Apache Phoenix and HBase: Past, Present and Future of SQL over HBase

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Apache Phoenix and HBase: Past, Present and Future of SQL over HBase Enis Soztutar (enis@hortonworks.com) Ankit Singhal (asinghal@hortonworks.com)
  • 2. Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved About Me Enis Soztutar Committer and PMC member in Apache HBase, Phoenix, and Hadoop HBase/Phoenix team @Hortonworks Twitter @enissoz Disclaimer: Not a SQL expert!
  • 3. Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Outline PART I – The Past (a.k.a. All the existing stuff)  Phoenix the basics  Architecture  Overview of existing Phoenix features PART II – The Present (a.k.a. All the recent stuff)  Look at recent releases  Transactions  Phoenix Query Server  Other features PART III – The Future (a.k.a. All the upcoming stuff)  Calcite integration  Phoenix – Hive
  • 4. Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Part I – The Past All the existing stuff !
  • 5. Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Obligatory Slide - Who uses Phoenix
  • 6. Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix – The Basics • Hope everybody is familiar with HBase • Otherwise you are in the wrong talk! • What is wrong with pure-HBase? • HBase is a powerful, flexible and extensible “engine” • Too low level • Have to write java code to do anything! • Phoenix is relational layer over HBase • Also described as a SQL-Skin • Looking more and more like a generic SQL engine • Why not Hive / Spark SQL / other SQL-over-Hadoop • OTLP versus OLAP • As fast as HBase, 1 ms query, 10K-1M qps
  • 7. Page7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Why SQL?
  • 8. Page8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved From CDK Global slides https://phoenix.apache. org/presentations/Strata HadoopWorld.pdf
  • 9. Page9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved HBase Architecture DataNode RegionServer 2 T:foo, region:a T:bar, region:54 T:foo, region:t Application HBase client DataNode RegionServer 1 T:foo, region:c T:bar, region:14 T:foo, region:d DataNode RegionServer 3 T:bar, region:32 T:foo, region:k ZooKeeper Quorum
  • 10. Page10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix Architecture DataNode RegionServer 2 T:foo, region:c T:bar, region:54 T:foo, region:t Phoenix RPC endpoint px px Application Phoenix client / JDBC HBase client DataNode RegionServer 1 T:foo, region:c T:bar, region:14 T:foo, region:d Phoenix RPC endpoint px px DataNode RegionServer 3 T:SYSTEM.CATALOG T:bar, region:32 T:foo, region:k Phoenix RPC endpoint px px ZooKeeper Quorum
  • 11. Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix Goodies SQL DataTypes Schemas / DDL / HBase table properties Composite Types (Composite Primary Key) Map existing HBase tables Write from HBase, read from Phoenix Salting Parallel Scan Skip scan Filter push down Statistics Collection / Guideposts
  • 12. Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DDL Example CREATE TABLE IF NOT EXISTS METRIC_RECORD ( METRIC_NAME VARCHAR, HOSTNAME VARCHAR, SERVER_TIME UNSIGNED_LONG NOT NULL METRIC_VALUE DOUBLE, … CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME, SERVER_TIME)) DATA_BLOCK_ENCODING=’FAST_DIFF', TTL=604800, COMPRESSION=‘SNAPPY’ SPLIT ON ('a', 'k', 'm');
  • 13. Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved METRIC_NAME HOSTNAME SERVER_TIME METRIC_VALUE Regionserver.readRequestCount cn011.hortonworks.com 1396743589 92045759 Regionserver.readRequestCount cn011.hortonworks.com 1396767589 93051916 Regionserver.readRequestCount cn011.hortonworks.com …. … Regionserver.readRequestCount cn012. hortonworks.com 1396743589 ….. … … … Regionserver.wal.bytesWritten cn011.hortonworks.com Regionserver.wal.bytesWritten …. …. … SORT ORDERSORTORDER HBASE ROW KEY OTHER COLUMNS
  • 14. Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Parallel Scan SELECT * FROM METRIC_RECORD; CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD Region1 Region2 Region3 Region4 Client RS3RS2 RS1 scanscanscanscan
  • 15. Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Filter push down SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7; CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD SERVER FILTER BY SERVER_TIME > DATE '2016-04-06 09:09:05.978’ Region1 Region2 Region3 Region4 Client RS3RS2RS1 scanscanscanscan Server-side Filter
  • 16. Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Skip Scan SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME in ('host1’, 'host2'); CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2'] Region1 Region2 Region3 Region4 Client RS3RS2RS1 Skip scan
  • 17. Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved TopN SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7 ORDER BY HOSTNAME LIMIT 5; CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER METRIC_RECORD SERVER FILTER BY SERVER_TIME > … SERVER TOP 5 ROWS SORTED BY [HOSTNAME] CLIENT MERGE SORT Region1 Region2 Region3 Region4 Client RS3RS2RS1 scanscanscanscan Sort by HOSTNAME Return only 5 ROWS
  • 18. Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Aggregation SELECT METRIC_NAME, HOSTNAME, AVG(METRIC_VALUE) FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7 GROUP BY METRIC_NAME, HOSTNAME ORDER BY METRIC_NAME, HOSTNAME; CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD SERVER FILTER BY SERVER_TIME > … SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [METRIC_NAME, HOSTNAME] CLIENT MERGE SORT Region1 Region2 Region3 Region4 Client RS3RS2RS1 scanscanscanscan Return only aggregated data by METRIC_NAME, HOSTNAME
  • 19. Page19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Joins and subqueries in Phoenix Grammar • Inner, Left, Right, Full outer join, Cross join • Semi-join / Anti-join Algorithms • Hash-join, sort-merge join • Hash-join table is computed and pushed to each regionserver from client Optimizations • Predicate push-down • PK-to-FK join optimization • Global index with missing columns • Correlated query rewrite
  • 20. Page20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Joins and subqueries in Phoenix Phoenix can execute most of TPC-H queries! No nested loop join With Calcite support, more improvements soon No statistical Guided join selection yet Not very good at executing very big joins • No generic YARN / Tez execution layer • But Hive / Spark support for generic DAG execution
  • 21. Page21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Secondary Indexes HBase table is a sorted map • Everything in HBase is sorted in primary key order • Full or partial scans in sort order is very efficient in HBase • Sort data differently with secondary index dimensions Two types • Global index • Local index Query • Indexes are “covered” • Indexes are automatically selected from queries • Only covered columns are returned from index without going back to data table
  • 22. Page22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Global and Local Index Global Index • A single instance for all table data in a different sort order • A different HBase table per index • Optimized for read-heavy use cases • Can be one edit “behind” actual primary data • Transactional tables indices have ACID guarantees • Different consistency / durability for mutable / immutable tables Local Index • Multiple mini-instances per region • Uses same HBase table, different cf • Optimized for write-heavy use cases • Atomic commit and visibility (coming soon) • Queries have to ask all regions for relevant data from index
  • 23. Page23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Part II – The Present All the recent stuff !
  • 24. Page24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Release Note Highlights 4.4 • Functional Indexes • UDFs • Query Server • UNION ALL • MR Index Build • Spark Integration • Date built-in functions 4.5 • Client-side per-statement metrics • SELECT without FROM • ALTER TABLE with VIEWS • Math and Array built-in functions
  • 25. Page25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Release Note Highlights 4.6 • ROW_TIMESTAMP for HBase native timestamps • Support for correlate variable • Support for un-nesting arrays • Web-app for visualizing trace info (alpha) 4.7 • Transaction support • Enhanced secondary index consistency guarantees • Statistics improvements • Perf improvements
  • 26. Page26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Row Timestamps A pseudo-column for HBase native timestamps (versions) Enables setting and querying cell timestamps Perfect for time-series use cases • Combine with FIFO / Date Tiered Compaction policies • And HBase scan file pruning based on min-max ts for very efficient scans CREATE TABLE METRICS_TABLE ( CREATED_DATE NOT NULL DATE, METRIC_ID NOT NULL CHAR(15), METRIC_VALUE LONG CONSTRAINT PK PRIMARY KEY(CREATED_DATE ROW_TIMESTAMP, METRIC_ID)) SALT_BUCKETS = 8;
  • 27. Page27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Transactions Uses Tephra Snapshot isolation semantics Completely optional. • Can be enabled per-table (TRANSACTIONAL=true) • Transactional and non-transactional tables can live side by side Transactions see their own uncommitted data Released in 4.7, will GA in 5.0 Optimistic Concurrency Control • No locking for rows • Transactions have to roll back and undo their writes in case of conflict • Cost of conflict is higher
  • 28. Page28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Tephra Architecture RegionServer 2 Tephra / HBase Client RegionServer 1 RegionServer 3 HBase client ZooKeeper Quorum Tephra Trx Manager (active) Tephra Trx Manager (standby)
  • 29. Page29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Transaction Lifecycle From Tephra presentation http://www.slideshare.n et/alexbaranau/transacti ons-over-hbase
  • 30. Page30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix Query Server Similar to HBase REST Server / Hive Server 2 Built on top of Calcite’s Avatica Server with Phoenix bindings Embeds a Phoenix thick client inside No client side sorting / join! Protobuf-3.0 over HTTP protocol Has a (thin) JDBC driver Allows ODBC driver for Phoenix
  • 31. Page31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix architecture revisited (thick client) RegionServer 2 T:foo, region:d Phoenix RPC endpoint px Application RegionServer 1 T:foo, region:d Phoenix RPC endpoint px RegionServer 3 T:foo, region:d Phoenix RPC endpoint px HBase client Phoenix client / JDBC
  • 32. Page32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix Query Server Phoenix Query Server (thin client) RegionServer 2 T:foo, region:d Phoenix RPC endpoint px Application Phoenix thin client / JDBC RegionServer 1 T:foo, region:d Phoenix RPC endpoint px RegionServer 3 T:foo, region:d Phoenix RPC endpoint px Phoenix client / JDBC HBase client Phoenix Query Server Phoenix client / JDBC HBase client Phoenix Query Server Phoenix client / JDBC HBase client
  • 33. Page33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Other new features (4.8+) Shaded client by default. No more library dependency problems! Phoenix schema mapping to HBase namespace • Allows using isolation and security features of HBase namespaces • Standard SQL syntax: CREATE SCHEMA FOO; USE FOO; LIMIT / OFFSET • We already had LIMIT. Now we have OFFSET • Together with Row-Value-Constructs, covers most of cursor use cases
  • 34. Page34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Part III – The Future All the upcoming stuff !
  • 35. Page35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Local Index • Local Index re-implemented • Instead of a different table, now local index data is kept within the same data table • Local index data goes into a different column family • Index and data is committed together atomically without external transactions • Bunch of stability improvements with region splits and merges
  • 36. Page36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Calcite Integration Calcite is a framework for: • Query parser • Compiler • Planner • Cost based optimizer SQL-92 compliant Based on relational algebra Cost based optimizer with default rules + pluggable rules per-backend Used by Hive / Drill / Kylin / Samza, etc.
  • 37. Page37 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Calcite Integration
  • 38. Page38 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Phoenix - Hive integration Hive is a very rich and generic execution engine Uses Tez + YARN to execute arbitrary DAG Hive integration enables big joins and other Hive features Phoenix DDL with HiveQL Data insert / update delete (DML) with HiveQL Predicate pushdown, salting, partitioning, partition pruning, etc Can use secondary indexes as well since it uses Phoenix compiler https://issues.apache.org/jira/browse/PHOENIX-2743
  • 39. Page39 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Future<Phoenix> JSON support TPC-H / Microstrategy / Tableau queries Sqoop integration Support Omid based transactions Dogfooding within the Hadoop-ecosystem • Ambari Metrics Service (AMS) uses Phoenix • YARN will soon use HBase / Phoenix (ATS) STRUCT type Improvements to cost based optimization Security and other HBase features used from Phoenix See https://phoenix.apache.org/roadmap.html
  • 40. Page40 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Further Reference Even more info on https://phoenix.apache.org  New Features: https://phoenix.apache.org/recent.html  Roadmap: https://phoenix.apache.org/roadmap.html Get involved in mailing lists  user@phoenix.apache.org  dev@phoenix.apache.org
  • 41. Page41 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Thanks Q & A

Hinweis der Redaktion

  1. - What is hbase? - What is it good at? - How do you use it in my applications? Context, first principals
  2. Understand the world it lives in and it’s building blocks
  3. Understand the world it lives in and it’s building blocks
  4. Understand the world it lives in and it’s building blocks