Phoenix
James Taylor
@JamesPlusPlus
http://phoenix-hbase.blogspot.com/
We put the SQL back in NoSQL
https://github.com/forcedotcom/phoenix
Agenda
 What/why HBase?
 What/why Phoenix?
 How does Phoenix work?
 Demo
 Roadmap
 Q&A
What is HBase?
 Developed as part of Apache Hadoop
 Runs on top of HDFS
 Key/value store: a sparse, distributed, consistent, multidimensional, sorted map
Cluster Architecture
Sharding
Why Use HBase?
 If you have lots of data
 Scales linearly
 Shards automatically
 If you can live without transactions
 If your data changes
 If you need strict consistency
What is Phoenix?
 SQL skin for HBase
 Alternate client API
 Embedded JDBC driver
 Runs at HBase native speed
 Compiles SQL into native HBase calls
 So you don’t have to!
Cluster Architecture (with Phoenix)
Phoenix Performance
Why Use Phoenix?
 Give folks an API they already know
 Reduce the amount of code needed
SELECT TRUNC(date,'DAY'), AVG(cpu)
FROM web_stat
WHERE domain LIKE 'Salesforce%'
GROUP BY TRUNC(date,'DAY')
 Perform optimizations transparently
 Aggregation
 Skip Scan
 Secondary indexing (soon!)
 Leverage existing tooling
 SQL client/terminal
 OLAP engine
How Does Phoenix Work?
 Overlays on top of HBase Data Model
 Keeps Versioned Schema Repository
 Query Processor
Phoenix Data Model
Phoenix maps the HBase data model to the relational world
HBase Table
Column Family A | Column Family B
Qualifier 1 | Qualifier 2 | Qualifier 3
Row Key 1 | Value
Row Key 2 | Value | Value
Row Key 3 | Value
Multiple versions are kept for each cell
Phoenix Table
Row Key Columns | Key Value Columns
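The mapping above can be written directly in Phoenix DDL. Below is a minimal sketch with hypothetical table and column names: the PRIMARY KEY constraint columns become the HBase row key, while columns prefixed with a family name (A or B here) are stored as key values in that column family.

-- Minimal sketch (hypothetical names): the PK columns form the HBase row key;
-- a.metric_1 and a.metric_2 land in column family A, b.note in column family B.
CREATE TABLE example_table (
    org_id      VARCHAR NOT NULL,
    created_on  DATE    NOT NULL,
    a.metric_1  INTEGER,
    a.metric_2  INTEGER,
    b.note      VARCHAR
    CONSTRAINT pk PRIMARY KEY (org_id, created_on)
);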
Phoenix Metadata
 Stored in a Phoenix HBase table
 SYSTEM.TABLE
 Updated through DDL commands
 CREATE TABLE
 ALTER TABLE
 DROP TABLE
 CREATE INDEX
 DROP INDEX
 Keeps older versions as schema evolves
 Correlates timestamps between schema and data
 Flashback queries use the schema that was in place then
 Accessible via JDBC metadata APIs
 java.sql.DatabaseMetaData
 Through Phoenix queries!
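As a hedged illustration of how the repository evolves, reusing the hypothetical example_table from the sketch above, a single DDL statement is all it takes; Phoenix records the change with a timestamp, so the earlier schema version stays available for flashback queries.

-- Sketch: adding a column creates a new, timestamped schema version in the
-- metadata table; the prior version remains queryable at earlier timestamps.
ALTER TABLE example_table ADD metric_3 INTEGER;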
Example
Over metrics data for clusters of servers with a schema like this:
SERVER METRICS
Row Key:    HOST VARCHAR, DATE DATE
Key Values: RESPONSE_TIME INTEGER, GC_TIME INTEGER, CPU_TIME INTEGER, IO_TIME INTEGER, …
With 90 days of data that looks like this:
SERVER METRICS
HOST     DATE                 RESPONSE_TIME   GC_TIME
sf1.s1   Jun 5 10:10:10.234   1234
sf1.s1   Jun 5 11:18:28.456   8012
…
sf3.s1   Jun 5 10:10:10.234   2345
sf3.s1   Jun 6 12:46:19.123   2340
sf7.s9   Jun 4 08:23:23.456   5002            1234
…
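A hedged sketch of the DDL and an UPSERT that would create and populate this example table in Phoenix (table and column names follow the slide; the date literal and its format are illustrative):

-- Sketch: HOST + DATE form the row key; the metric columns are key values.
CREATE TABLE server_metrics (
    host           VARCHAR NOT NULL,
    date           DATE    NOT NULL,
    response_time  INTEGER,
    gc_time        INTEGER,
    cpu_time       INTEGER,
    io_time        INTEGER
    CONSTRAINT pk PRIMARY KEY (host, date)
);

-- Phoenix writes rows with UPSERT rather than INSERT.
UPSERT INTO server_metrics (host, date, response_time)
VALUES ('sf1.s1', TO_DATE('2013-06-05 10:10:10'), 1234);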
Example
Walk through query processing for three scenarios
1. Chart Response Time Per Cluster
2. Identify 5 Longest GC Times
3. Identify 5 Longest GC Times again and again
Scenario 1
Chart Response Time Per Cluster
SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date,'DAY')
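Before stepping through the plan, note that the plan Phoenix compiles can be inspected by prefixing the statement with EXPLAIN. A sketch follows (the exact plan text varies by version); it should report the parallel client scans and the server-side skip scan over the three host prefixes.

-- Sketch: EXPLAIN prints the compiled plan instead of running the query.
EXPLAIN
SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date,'DAY');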
Step 1: Client
Identify Row Key Ranges from Query
SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time)
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date,'DAY')
Row Key Ranges
HOST   DATE
sf1    t1 - *
sf3    t1 - *
sf7    t1 - *
Step 2: Client
Overlay Row Key Ranges with Regions
Diagram: regions R1-R4 (split at sf1, sf4, sf6) overlaid with the row key ranges for sf1, sf3, and sf7
Step 3: Client
Execute Parallel Scans
Diagram: scan1, scan2, and scan3 run in parallel against the regions (R1-R4) that intersect the row key ranges for sf1, sf3, and sf7
Step 4: Server
Filter using Skip Scan
sf1.s1 t0  SKIP
sf1.s1 t1  INCLUDE
sf1.s2 t0  SKIP
sf1.s2 t1  INCLUDE
sf1.s3 t0  SKIP
sf1.s3 t1  INCLUDE
SERVER METRICS
HOST DATE
sf1.s1 Jun 2 10:10:10.234
sf1.s2 Jun 3 23:05:44.975
sf1.s2 Jun 9 08:10:32.147
sf1.s3 Jun 1 11:18:28.456
sf1.s3 Jun 3 22:03:22.142
sf1.s4 Jun 1 10:29:58.950
sf1.s4 Jun 2 14:55:34.104
sf1.s4 Jun 3 12:46:19.123
sf1.s5 Jun 8 08:23:23.456
sf1.s6 Jun 1 10:31:10.234
Step 5: Server
Intercept Scan in Coprocessor
SERVER METRICS
HOST DATE AGG
sf1 Jun 1 …
sf1 Jun 2 …
sf1 Jun 3 …
sf1 Jun 8 …
sf1 Jun 9 …
Step 6: Client
Perform Final Merge Sort
Diagram: the client merge sorts the results of scan1, scan2, and scan3 across regions R1-R4
SERVER METRICS
HOST   DATE    AGG
sf1    Jun 5   …
sf1    Jun 9   …
sf3    Jun 1   …
sf3    Jun 2   …
sf7    Jun 1   …
sf7    Jun 8   …
Scenario 2
Find 5 Longest GC Times
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
Scenario 2
Find 5 Longest GC Times
• Same client parallelization and server skip scan filtering
• Server holds the 5 longest GC_TIME values for each scan
• Client performs final merge sort among the parallel scans (Scan1, Scan2, Scan3)
R1
SERVER METRICS
HOST     DATE                 GC_TIME
sf1.s1   Jun 2 10:10:10.234   22123
sf1.s1   Jun 3 23:05:44.975   19876
sf1.s1   Jun 9 08:10:32.147   11345
sf1.s2   Jun 1 11:18:28.456   10234
sf1.s2   Jun 3 22:03:22.142   10111
Scenario 3
Find 5 Longest GC Times
CREATE INDEX gc_time_index
ON server_metrics (gc_time DESC, date DESC)
INCLUDE (host, response_time)
GC_TIME_INDEX
Row Key:   GC_TIME INTEGER, DATE DATE, HOST VARCHAR
Key Value: RESPONSE_TIME INTEGER
The same query now runs against the index:
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
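Once secondary indexing is available, the same EXPLAIN trick from Scenario 1 can confirm the index is chosen. This is a sketch (plan text varies by version); the plan should reference GC_TIME_INDEX, with rows already ordered by gc_time DESC, instead of a full scan of SERVER_METRICS.

-- Sketch: the plan should show a scan over GC_TIME_INDEX rather than the data table.
EXPLAIN
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5;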
Demo
 Phoenix Stock Analyzer
 Fortune 500 companies
 10 years of historical stock prices
 Demonstrates Skip Scan in action
 Running locally on my single-node laptop cluster
Phoenix Roadmap
 Secondary Indexing
 Count distinct and percentile
 Derived tables
 Hash Joins
 Apache Drill integration
 Cost-based query optimizer
 OLAP extensions
 Transactions
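As a purely hypothetical sketch of where the roadmap points, a query combining COUNT(DISTINCT ...) with a derived table would look something like this once those features land (it does not run on the version presented here):

-- Hypothetical: relies on roadmap features (count distinct, derived tables).
SELECT c.cluster, COUNT(DISTINCT c.host) AS active_hosts
FROM (SELECT substr(host, 1, 3) AS cluster, host
      FROM server_metrics
      WHERE date > CURRENT_DATE() - 7) AS c
GROUP BY c.cluster;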
Thank you!
Questions/comments?
Weitere ähnliche Inhalte

Was ist angesagt?

More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedInMore Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedInCelia Kung
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineAljoscha Krettek
 
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3Splunk
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupMaryann Xue
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event PresentationChartio
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteJulian Hyde
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
URL Design with Lasso
URL Design with LassoURL Design with Lasso
URL Design with Lassomacsolve
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Yahoo Developer Network
 
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...Severalnines
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilDatabricks
 
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Metosin Oy
 
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsWebinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsSeveralnines
 
Optimizing Apache Spark UDFs
Optimizing Apache Spark UDFsOptimizing Apache Spark UDFs
Optimizing Apache Spark UDFsDatabricks
 
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep diveWebinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep diveSeveralnines
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Josh Elser
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineSpark Summit
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingDataWorks Summit
 

Was ist angesagt? (20)

More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedInMore Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
More Data, More Problems: Scaling Kafka Mirroring Pipelines at LinkedIn
 
Apache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing EngineApache Flink - A Stream Processing Engine
Apache Flink - A Stream Processing Engine
 
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
SplunkSummit 2015 - Update on Splunk Enterprise 6.3 & Hunk 6.3
 
phoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetupphoenix-on-calcite-nyc-meetup
phoenix-on-calcite-nyc-meetup
 
Splunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search DojoSplunk Ninjas: New Features, Pivot, and Search Dojo
Splunk Ninjas: New Features, Pivot, and Search Dojo
 
Redshift Chartio Event Presentation
Redshift Chartio Event PresentationRedshift Chartio Event Presentation
Redshift Chartio Event Presentation
 
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache CalciteCost-based Query Optimization in Apache Phoenix using Apache Calcite
Cost-based Query Optimization in Apache Phoenix using Apache Calcite
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
URL Design with Lasso
URL Design with LassoURL Design with Lasso
URL Design with Lasso
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
 
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
Webinar slides: MySQL Query Tuning Trilogy: Working with optimizer and SQL tu...
 
Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
Schema tools-and-trics-and-quick-intro-to-clojure-spec-22.6.2016
 
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and toolsWebinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
Webinar replay: MySQL Query Tuning Trilogy: Query tuning process and tools
 
Optimizing Apache Spark UDFs
Optimizing Apache Spark UDFsOptimizing Apache Spark UDFs
Optimizing Apache Spark UDFs
 
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep diveWebinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
Webinar slides: MySQL Query Tuning Trilogy: Indexing and EXPLAIN - deep dive
 
Calcite meetup-2016-04-20
Calcite meetup-2016-04-20Calcite meetup-2016-04-20
Calcite meetup-2016-04-20
 
Operational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos ChristineOperational Tips for Deploying Spark by Miklos Christine
Operational Tips for Deploying Spark by Miklos Christine
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction Processing
 

Ähnlich wie Phoenix: How (and why) we put the SQL back into the NoSQL

Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBaseRohit Jain
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-finalMaryann Xue
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 StepsScott Cinnamond
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesHBaseCon
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.Vijaykumar Vangapandu
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Technical Overview on Cloudera Impala
Technical Overview on Cloudera ImpalaTechnical Overview on Cloudera Impala
Technical Overview on Cloudera ImpalaPraneeth Krishna
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-finalMaryann Xue
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentHostedbyConfluent
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtMichael Stack
 
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine LifeSAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine LifeImagine life
 

Ähnlich wie Phoenix: How (and why) we put the SQL back into the NoSQL (20)

Hbase mhug 2015
Hbase mhug 2015Hbase mhug 2015
Hbase mhug 2015
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase2 - Trafodion and Hadoop HBase
2 - Trafodion and Hadoop HBase
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
HBaseCon2016-final
HBaseCon2016-finalHBaseCon2016-final
HBaseCon2016-final
 
Impala for PhillyDB Meetup
Impala for PhillyDB MeetupImpala for PhillyDB Meetup
Impala for PhillyDB Meetup
 
MySql to HBase in 5 Steps
MySql to HBase in 5 StepsMySql to HBase in 5 Steps
MySql to HBase in 5 Steps
 
Apache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New FeaturesApache Phoenix: Use Cases and New Features
Apache Phoenix: Use Cases and New Features
 
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.eHarmony @ Hbase Conference 2016 by vijay vangapandu.
eHarmony @ Hbase Conference 2016 by vijay vangapandu.
 
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to HBase | Big Data Hadoop Spark Tutorial | CloudxLab
 
Technical Overview on Cloudera Impala
Technical Overview on Cloudera ImpalaTechnical Overview on Cloudera Impala
Technical Overview on Cloudera Impala
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
HBaseCon2015-final
HBaseCon2015-finalHBaseCon2015-final
HBaseCon2015-final
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, ConfluentCan Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
 
HiveWarehouseConnector
HiveWarehouseConnectorHiveWarehouseConnector
HiveWarehouseConnector
 
SAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAININGSAP S/4 HANA ONLINE TRAINING
SAP S/4 HANA ONLINE TRAINING
 
HBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the ArtHBaseConEast2016: HBase and Spark, State of the Art
HBaseConEast2016: HBase and Spark, State of the Art
 
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine LifeSAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
SAP HANA Integrated Online Course Training in Hyderabad | Imagine Life
 

Mehr von DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mehr von DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Kürzlich hochgeladen

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Kürzlich hochgeladen (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Phoenix: How (and why) we put the SQL back into the NoSQL

  • 1. Phoenix James Taylor @JamesPlusPlus http://phoenix-hbase.blogspot.com/ We put the SQL back in NoSQL https://github.com/forcedotcom/phoenix
  • 4. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?
  • 5. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo
  • 6. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo  Roadmap
  • 7. Agenda Completed  What/why HBase?  What/why Phoenix?  How does Phoenix work?  Demo  Roadmap  Q&A
  • 8. What is HBase? Completed  Developed as part of Apache Hadoop
  • 9. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS
  • 10. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store
  • 11. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map
  • 12. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Distributed
  • 13. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Distributed Sparse
  • 14. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Sparse
  • 15. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Consistent Sparse
  • 16. What is HBase? Completed  Developed as part of Apache Hadoop  Runs on top of HDFS  Key/value store Map Sorted Distributed Consistent Sparse Multidimensional
  • 19. Why Use HBase? Completed  If you have lots of data
  • 20. Why Use HBase? Completed  If you have lots of data  Scales linearly
  • 21. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically
  • 22. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions
  • 23. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions  If your data changes
  • 24. Why Use HBase? Completed  If you have lots of data  Scales linearly  Shards automatically  If you can live without transactions  If your data changes  If you need strict consistency
  • 26. What is Phoenix? Completed  SQL skin for HBase
  • 27. What is Phoenix? Completed  SQL skin for HBase  Alternate client API
  • 28. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver
  • 29. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed
  • 30. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed  Compiles SQL into native HBase calls
  • 31. What is Phoenix? Completed  SQL skin for HBase  Alternate client API  Embedded JDBC driver  Runs at HBase native speed  Compiles SQL into native HBase calls  So you don’t have to!
  • 37. Why Use Phoenix? Completed  Give folks an API they already know
  • 38. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed
  • 39. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed SELECT TRUNC(date,'DAY’), AVG(cpu) FROM web_stat WHERE domain LIKE 'Salesforce%’ GROUP BY TRUNC(date,'DAY’)
  • 40. Why Use Phoenix? Completed  Give folks an API they already know  Reduce the amount of code needed  Perform optimizations transparently
  • 41. Why Use Phoenix? - Give folks an API they already know - Reduce the amount of code needed - Perform optimizations transparently: Aggregation, Skip Scan, Secondary indexing (soon!)
  • 43. Why Use Phoenix? - Give folks an API they already know - Reduce the amount of code needed - Perform optimizations transparently - Leverage existing tooling: SQL client/terminal, OLAP engine
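For contrast with the one-line SQL aggregate on slide 39, here is a rough sketch of what the same kind of per-day average looks like hand-written against the raw HBase client API. Everything here is an assumption for illustration only (a WEB_STAT table whose row key begins with the domain and a USAGE:CPU column holding a long); the point is that the scan, filtering, grouping, and averaging all become your code:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ManualDailyAverage {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "WEB_STAT");                 // assumed table
            Scan scan = new Scan();
            scan.addColumn(Bytes.toBytes("USAGE"), Bytes.toBytes("CPU"));
            Map<String, long[]> perDay = new HashMap<String, long[]>();  // day -> {sum, count}
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    String rowKey = Bytes.toString(r.getRow());
                    if (!rowKey.startsWith("Salesforce")) continue;      // WHERE domain LIKE 'Salesforce%'
                    String day = dayFromRowKey(rowKey);                   // TRUNC(date,'DAY'), by hand
                    long cpu = Bytes.toLong(r.getValue(Bytes.toBytes("USAGE"), Bytes.toBytes("CPU")));
                    long[] agg = perDay.get(day);
                    if (agg == null) perDay.put(day, agg = new long[2]);
                    agg[0] += cpu;
                    agg[1]++;
                }
            } finally {
                scanner.close();
                table.close();
            }
            for (Map.Entry<String, long[]> e : perDay.entrySet()) {       // GROUP BY day, AVG(cpu)
                System.out.println(e.getKey() + "\t" + (double) e.getValue()[0] / e.getValue()[1]);
            }
        }

        // Placeholder: assumes the date sits at a fixed offset in the row key;
        // a real schema would need real parsing here.
        private static String dayFromRowKey(String rowKey) {
            return rowKey.substring(rowKey.length() - 10);
        }
    }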
  • 44. How Does Phoenix Work? - Overlays on top of the HBase data model - Keeps a versioned schema repository - Query processor
  • 45-56. Phoenix Data Model: Phoenix maps the HBase data model to the relational world. An HBase table is organized into column families (here Column Family A with Qualifier 1 and Qualifier 2, Column Family B with Qualifier 3); each row is addressed by a row key, cells are sparsely populated, and each cell can hold multiple versions. A Phoenix table lays directly over this: row key columns map to the HBase row key, and key value columns map to column family/qualifier pairs.
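A hedged sketch of how that mapping can be declared for the SERVER_METRICS example used later in the deck: the columns named in the PRIMARY KEY constraint become row key columns (the HBase row key) and the remaining columns become key value columns. The DDL shape is Phoenix's; the exact schema and connection URL here are assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreateServerMetrics {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement()) {
                stmt.execute(
                    "CREATE TABLE IF NOT EXISTS server_metrics ("
                    + "host VARCHAR NOT NULL, "       // row key column
                    + "date DATE NOT NULL, "          // row key column
                    + "response_time INTEGER, "       // key value columns from here on
                    + "gc_time INTEGER, "
                    + "cpu_time INTEGER, "
                    + "io_time INTEGER, "
                    + "CONSTRAINT pk PRIMARY KEY (host, date))");
            }
        }
    }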
  • 58. Phoenix Metadata - Stored in a Phoenix HBase table: SYSTEM.TABLE
  • 60. Phoenix Metadata - Stored in a Phoenix HBase table - Updated through DDL commands: CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, DROP INDEX
  • 63. Phoenix Metadata - Stored in a Phoenix HBase table - Updated through DDL commands - Keeps older versions as the schema evolves - Correlates timestamps between schema and data - Flashback queries use the schema that was in place at that time
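A sketch of what such a flashback query looks like from the client, using Phoenix's CurrentSCN connection property to pin the connection to an earlier timestamp (the URL, the one-day offset, and the query itself are illustrative assumptions):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class FlashbackQuery {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Pin the connection to "one day ago"; Phoenix resolves both the
            // schema and the data as of this timestamp.
            long oneDayAgo = System.currentTimeMillis() - 24L * 60 * 60 * 1000;
            props.setProperty("CurrentSCN", Long.toString(oneDayAgo));
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(*) FROM server_metrics")) {
                if (rs.next()) {
                    System.out.println("rows as of one day ago: " + rs.getLong(1));
                }
            }
        }
    }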
  • 65. Phoenix Metadata - Stored in a Phoenix HBase table - Updated through DDL commands - Keeps older versions as the schema evolves - Correlates timestamps between schema and data - Accessible via the JDBC metadata APIs (java.sql.DatabaseMetaData) and through Phoenix queries
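For example, a sketch of listing the columns of the SERVER_METRICS table through the standard java.sql.DatabaseMetaData API (getColumns and its COLUMN_NAME/TYPE_NAME result columns are standard JDBC; the connection URL is an assumption):

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class ListColumns {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                DatabaseMetaData md = conn.getMetaData();
                try (ResultSet rs = md.getColumns(null, null, "SERVER_METRICS", null)) {
                    while (rs.next()) {
                        System.out.println(rs.getString("COLUMN_NAME") + "\t"
                            + rs.getString("TYPE_NAME"));
                    }
                }
            }
        }
    }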
  • 66-67. Example: over metrics data for clusters of servers with a schema like this - SERVER_METRICS - row key: HOST VARCHAR, DATE DATE - key values: RESPONSE_TIME INTEGER, GC_TIME INTEGER, CPU_TIME INTEGER, IO_TIME INTEGER, …
  • 68. With 90 days of data that looks like this (SERVER_METRICS):
      HOST    DATE                 RESPONSE_TIME  GC_TIME
      sf1.s1  Jun 5 10:10:10.234   1234
      sf1.s1  Jun 5 11:18:28.456   8012
      …
      sf3.s1  Jun 5 10:10:10.234   2345
      sf3.s1  Jun 6 12:46:19.123   2340
      sf7.s9  Jun 4 08:23:23.456   5002           1234
      …
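A sketch of how one of those rows could be written through Phoenix: UPSERT VALUES is Phoenix's insert/update statement, and mutations are batched on the client until commit(). The values echo the first sample row above; the year and the connection URL are assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;

    public class LoadSampleRow {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "UPSERT INTO server_metrics (host, date, response_time) VALUES (?, ?, ?)")) {
                    ps.setString(1, "sf1.s1");
                    ps.setTimestamp(2, Timestamp.valueOf("2013-06-05 10:10:10.234")); // year assumed
                    ps.setInt(3, 1234);
                    ps.executeUpdate();
                }
                conn.commit(); // Phoenix buffers mutations client-side until commit
            }
        }
    }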
  • 72. Example: walk through query processing for three scenarios - 1. Chart Response Time Per Cluster - 2. Identify 5 Longest GC Times - 3. Identify 5 Longest GC Times again and again (the repeated query that motivates a secondary index)
  • 77. Scenario 1: Chart Response Time Per Cluster - SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host,1,3) IN ('sf1','sf3','sf7') GROUP BY substr(host,1,3), trunc(date,'DAY')
  • 84. Step 1: Client identifies row key ranges from the query - SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host,1,3) IN ('sf1','sf3','sf7') GROUP BY substr(host,1,3), trunc(date,'DAY') - Row key ranges: HOST in {sf1, sf3, sf7}; DATE in (t1, *), where t1 is CURRENT_DATE() - 7
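A sketch of inspecting this step from the client: Phoenix's EXPLAIN statement prints the plan the client compiled, including the row key ranges and the skip scan described in the following steps (the connection URL is an assumption; the plan text is whatever the installed Phoenix version reports, so none is shown here):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExplainScenario1 {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT substr(host,1,3), trunc(date,'DAY'), avg(response_time) "
                     + "FROM server_metrics "
                     + "WHERE date > CURRENT_DATE() - 7 "
                     + "AND substr(host,1,3) IN ('sf1','sf3','sf7') "
                     + "GROUP BY substr(host,1,3), trunc(date,'DAY')")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // one line of the query plan per row
                }
            }
        }
    }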
  • 85. Step 2: Client overlays the row key ranges with the table's regions - four regions R1-R4 (split at sf1, sf4, sf6) are intersected with the sf1, sf3, and sf7 ranges
  • 86. Step 3: Client executes parallel scans - the matching ranges run as parallel scans (scan1, scan2, scan3) against regions R1-R4
  • 87-92. Step 4: Server filters using the Skip Scan - within each scan the filter jumps through the keys, skipping cells outside the date range and including those inside it: sf1.s1 t0 SKIP, sf1.s1 t1 INCLUDE; sf1.s2 t0 SKIP, sf1.s2 t1 INCLUDE; sf1.s3 t0 SKIP, sf1.s3 t1 INCLUDE
  • 93. Step 5: Server intercepts the scan in a coprocessor and aggregates - raw SERVER_METRICS rows (sf1.s1 Jun 2 10:10:10.234; sf1.s2 Jun 3 23:05:44.975; sf1.s2 Jun 9 08:10:32.147; sf1.s3 Jun 1 11:18:28.456; sf1.s3 Jun 3 22:03:22.142; sf1.s4 Jun 1 10:29:58.950; sf1.s4 Jun 2 14:55:34.104; sf1.s4 Jun 3 12:46:19.123; sf1.s5 Jun 8 08:23:23.456; sf1.s6 Jun 1 10:31:10.234) collapse into per-cluster, per-day aggregate rows (sf1 Jun 1, sf1 Jun 2, sf1 Jun 3, sf1 Jun 8, sf1 Jun 9)
  • 94. Step 6: Client performs the final merge sort - the results of scan1, scan2, and scan3 (over R1-R4) are merged into the final aggregate rows: sf1 Jun 5, sf1 Jun 9, sf3 Jun 1, sf3 Jun 2, sf7 Jun 1, sf7 Jun 8, …
  • 95. Scenario 2: Find 5 Longest GC Times - SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host,1,3) IN ('sf1','sf3','sf7') ORDER BY gc_time DESC LIMIT 5
  • 98. Scenario 2: Find 5 Longest GC Times - Same client parallelization and server-side skip scan filtering - Each server holds the 5 largest GC_TIME values for its scan (e.g. sf1.s1 Jun 2 10:10:10.234 / 22123; sf1.s1 Jun 3 23:05:44.975 / 19876; sf1.s1 Jun 9 08:10:32.147 / 11345; sf1.s2 Jun 1 11:18:28.456 / 10234; sf1.s2 Jun 3 22:03:22.142 / 10111) - Client performs the final merge sort among the parallel scans (scan1, scan2, scan3)
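A client-side sketch of running the top-N query: the ORDER BY ... LIMIT is evaluated as described above, with each parallel scan keeping only its five largest GC_TIME rows before the client's merge (the connection URL is an assumption; the SQL is the deck's own):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Top5GcTimes {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT host, date, gc_time FROM server_metrics "
                     + "WHERE date > CURRENT_DATE() - 7 "
                     + "AND substr(host,1,3) IN ('sf1','sf3','sf7') "
                     + "ORDER BY gc_time DESC LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString("HOST") + "\t"
                        + rs.getTimestamp("DATE") + "\t" + rs.getInt("GC_TIME"));
                }
            }
        }
    }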
  • 101. Scenario 3: Find 5 Longest GC Times - CREATE INDEX gc_time_index ON server_metrics (gc_time DESC, date DESC) INCLUDE (host, response_time)
  • 102-103. The resulting GC_TIME_INDEX table - row key: GC_TIME INTEGER, DATE DATE - key values (included columns): HOST VARCHAR, RESPONSE_TIME INTEGER
  • 104. Scenario 3: Find 5 Longest GC Times - the same query as Scenario 2, unchanged: SELECT host, date, gc_time FROM server_metrics WHERE date > CURRENT_DATE() - 7 AND substr(host,1,3) IN ('sf1','sf3','sf7') ORDER BY gc_time DESC LIMIT 5
  • 105. Demo - Phoenix Stock Analyzer - Fortune 500 companies - 10 years of historical stock prices - Demonstrates the Skip Scan in action - Running locally on my single-node laptop cluster
  • 106. Phoenix Roadmap - Secondary Indexing - Count distinct and percentile - Derived tables - Hash Joins - Apache Drill integration - Cost-based query optimizer - OLAP extensions - Transactions