3. Phoenix Overview
SQL layer on top of HBase
Delivered as a embedded JDBC driver
Targeting low latency queries over HBase data
Columns modeled as multi-part row key and key values
Query engine transforms SQL into series of scans
Using native HBase APIs and capabilities
Completed
Coprocessors for aggregation
Custom filters for expression evaluation
Transaction isolation through scan time range
Optionally client-controlled timestamps
Open sourcing soon
100% Java
4. Phoenix SQL Support
SELECT <expression>…
FROM <table>
WHERE <expression>
GROUP BY <expression>…
HAVING <aggregate expression>
ORDER BY <aggregate expression>…
LIMIT <value>
Aggregation Functions
MIN, MAX, AVG, SUM, COUNT
Built-in Functions
SUBSTR, ROUND, TRUNC, TO_CHAR, TO_DATE
Operators
=,!=,<>,<,<=,>,>=, LIKE
AND, OR, NOT
Bind Parameters
?, :#
CASE WHEN
IN (<value>…)
DDL/DML (in progress)
CREATE/DROP <table>
DELETE FROM <table> WHERE <expression>
UPSERT INTO <table> [(<column>…)]
VALUES (<value>…)
5. Sample Queries
SELECT host, TRUNC(dateTime, 'DAY'),
Completed
AVG(cache_hit), MIN(cache_hit), MAX(cache_hit)
FROM server_metrics
WHERE host LIKE 'cs11-%'
AND dateTime> TO_DATE('2012-04-01')
AND dateTime< TO_DATE('2012-07-01')
GROUP BY host, TRUNC(dateTime, 'DAY')
HAVING MIN(cache_hit) < 90
ORDER BY host, AVG(cache_hit)
SELECT product_number, product_name,
CASE
WHEN list_price = 0 THEN 'Mfg item - not for resale'
WHEN list_price < 50 THEN 'Under $50'
WHEN list_price >= 50 and list_price < 250 THEN 'Under $250'
WHEN list_price >= 250 and list_price < 1000 THEN 'Under $1000'
ELSE 'Over $1000'
END as price_category
FROM product_catalogue
WHERE product_category IN ('Camping', 'Hiking’)
AND (product_name LIKE '%Pack’ OR product_name LIKE '% Cots %’)
6. Query Processing
Product Metrics HTable
Row Key ORG_ID DATE FEATURE
TXNS
Key Values IO_TIME
RESPONSE_TIME
Scan
SELECT feature, SUM(txns) Start key: ORG_ID (:1) + DATE (:2)
FROM product_metrics End key: ORG_ID (:1) + DATE (:3)
Filter
WHERE org_id = :1 Filter: IO_TIME > 100
AND date >= :2 Aggregation
AND date <= :3 Intercepts scan on region server
Builds map of distinct FEATURE values
AND io_time > 100
Returns one row per distinct group
GROUP BY feature Client does final merge
7. Phoenix Query Optimizations
Start/stop key of scan based on AND-ed columns
Through SUBSTR, ROUND, TRUNC, LIKE
Parallelized on client by chunking over start/stop key of scan
Aggregation on region-servers through coprocessor
Inline for GROUP BY over row key ordered columns
In memory map per group otherwise
WHERE clause executed through custom filters
Completed
Incremental evaluation with early termination
Evaluated through byte pointers
IN and OR over same column (in progress)
Becomes batched get or filter with next row hint
Top N queries (future)
Through coprocessor keeping top N rows
TABLESAMPLE (future)
Becomes filter with next row hint
10. Phoenix Roadmap
Increase breadth of SQL support
DML/DDL (in progress)
Derived tables (SELECT * FROM (SELECT foo FROM bar))
More built-in functions: COALESCE, UPPER, TRIM
More operators: ||, IS NULL, *,/,+,-
Secondary indexes
Multiple projections for immutable data
Reordered columns Completed
in row key
Different levels of aggregation
Incrementally maintained for non immutable data
TABLESAMPLE for sampling
Improve multi-byte support
Joins
Hash join
OLAP extensions
OVER
PARTITION BY