Apache
Phoenix
Put the SQL back in NoSQL
1
Osama Hussein, March 2021
Agenda
● History
● Overview
● Architecture
2
● Capabilities
● Code
● Scenarios
1.
History
From open-source repo to top Apache project
Overview (Apache Phoenix)
4
● Began as an internal project by the
company (salesforce.com).
● JAN 2014: Originally open-sourced on GitHub.
● MAY 2014: Became a top-level Apache project.
2.
Overview
UDF, Transactions and Schema
Overview (Apache Phoenix)
6
● Support for late-bound, schema-on-read.
● SQL and JDBC API support.
● Access to data stored and produced in other
components such as Apache Spark and Apache Hive.
● Developed as part of Apache Hadoop.
● Runs on top of Hadoop Distributed File System (HDFS).
● HBase scales linearly and shards automatically.
Overview (Apache Phoenix)
7
● Apache Phoenix is an add-on for Apache HBase that provides a
programmatic ANSI SQL interface.
● Implements best-practice optimizations to enable software
engineers to develop next-generation, data-driven applications
based on HBase.
● Create and interact with tables in the form of typical DDL/DML
statements using the standard JDBC API.
Overview (Apache Phoenix)
8
● Written in Java and SQL
● Atomicity, Consistency, Isolation and
Durability (ACID)
● Fully integrated with other Hadoop
products such as Spark, Hive, Pig, Flume,
and MapReduce.
Overview (Apache Phoenix)
9
● Included in:
○ Cloudera Data Platform 7.0 and above.
○ Hortonworks distribution for HDP 2.1
and above.
○ Available as part of Cloudera labs.
○ Part of the Hadoop ecosystem.
Overview (SQL Support)
10
● Compiles SQL queries into HBase scans and
orchestrates their execution.
● Produces JDBC result set.
● All standard SQL query constructs are
supported.
Overview (SQL Support)
11
● Direct use of the HBase API, along with
coprocessors and custom filters.
Performance:
○ Milliseconds for small queries
○ Seconds for tens of millions of rows.
Overview (Bulk Loading)
12
● MapReduce-based:
○ CSV and JSON
○ Via the Phoenix MapReduce library
● Single-threaded:
○ CSV
○ Via the PSQL command-line tool (psql.py)
○ Into HBase on a local machine
Overview (User Defined Functions)
13
● Temporary UDFs exist for the current session only.
● Permanent UDFs are stored in the system functions table.
● UDFs can be used in SQL statements and in indexes.
● Tenant-specific UDF usage and support.
● Updating a UDF jar requires a cluster bounce (see the example below).
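A minimal sketch of how a permanent UDF might be registered and used; the function name, implementation class, jar location, and table are all illustrative, not part of the original deck:

-- Register a permanent UDF (stored in the system functions table)
CREATE FUNCTION MY_REVERSE(VARCHAR) RETURNS VARCHAR
    AS 'com.example.udf.ReverseFunction'
    USING JAR 'hdfs://namenode:8020/phoenix/udfs/my-udfs.jar';

-- The UDF can then appear in queries and in functional indexes
SELECT MY_REVERSE(name) FROM example_table;
CREATE INDEX idx_rev_name ON example_table (MY_REVERSE(name));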
Overview (Transactions)
14
● Uses Apache Tephra for cross-row / cross-table ACID support.
● Create tables with the flag ‘transactional=true’.
● Enable transactions, configure the snapshot directory, and set
the timeout value in ‘hbase-site.xml’.
● A transaction starts implicitly with the first statement against a
transactional table.
● A transaction ends with a commit or rollback (see the sketch below).
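A hedged sketch of that flow, assuming transactions have been enabled in ‘hbase-site.xml’ and the bundled transaction manager is running; the table and values are made up:

-- Declare the table as transactional at creation time
CREATE TABLE txn_demo (k VARCHAR PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true;

-- A transaction begins implicitly with the first statement against the table
UPSERT INTO txn_demo VALUES ('row1', 'value1');
UPSERT INTO txn_demo VALUES ('row2', 'value2');

-- With auto-commit off, the transaction ends only when the JDBC
-- connection issues conn.commit() or conn.rollback().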
Overview (Transactions)
15
● Applications normally let HBase manage timestamps.
● In case the application needs to control the timestamp, the
‘CurrentSCN’ property must be specified at connection time.
● ‘CurrentSCN’ controls the timestamp for any DDL,
DML, or query.
Overview (Schema)
16
● The table metadata is stored in a versioned HBase table
(up to 1,000 versions).
● ‘UPDATE_CACHE_FREQUENCY’ allows the user to
declare how often the server is checked for metadata
updates (see the example below). Values:
○ Always
○ Never
○ Millisecond value
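For illustration, a sketch of how the property could be set on a made-up table; the millisecond value is arbitrary:

-- Check the server for metadata changes at most once per minute
CREATE TABLE metrics_demo (host VARCHAR PRIMARY KEY, val BIGINT)
    UPDATE_CACHE_FREQUENCY = 60000;

-- The property can also be changed later, or set to ALWAYS / NEVER
ALTER TABLE metrics_demo SET UPDATE_CACHE_FREQUENCY = 'ALWAYS';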
Overview (Schema)
17
● Phoenix table can be:
○ Built from scratch.
○ Mapped to an existing HBase table.
■ Read-Write Table
■ Read-Only View
Overview (Schema)
18
● Read-Write Table:
○ Column families will be created automatically if they
don’t already exist.
○ An empty key value will be added to the first column
family of each existing row to minimize the size of
the projection for queries.
Overview (Schema)
19
● Read-Only View:
○ All column families must already exist.
○ Addition of the Phoenix coprocessors used for query
processing is the only change to the HBase table (see the sketch below).
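A minimal sketch of the two mapping options against an existing HBase table named "t1" with a column family "cf1"; the names are illustrative, and in practice you would choose one of the two:

-- Option 1: read-write table; missing column families are created automatically
CREATE TABLE "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);

-- Option 2: read-only view; all column families must already exist,
-- and the only change to the HBase table is adding the Phoenix coprocessors
CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);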
3.
Architecture
Architecture, Phoenix Data Model, Query Execution
and Environment
Architecture
21
Architecture
22
Architecture (Phoenix Data Model)
23
Architecture (Server Metrics Example)
24
Architecture (Server Metrics Example)
25
● Example:
Architecture (Query Execution)
26
1. Identify row key ranges from the query
2. Overlay row key ranges with regions
3. Execute parallel scans
4. Filter using skip scan
5. Intercept scan in coprocessor
6. Perform final merge sort
Architecture (Environment)
27
Data Warehouse
Extract, Transform,
Load (ETL)
BI and Visualizing
4.
Code
Commands and Sample Code
Code (Commands)
29
● DML Commands:
○ UPSERT VALUES
○ UPSERT SELECT
○ DELETE
● DDL Commands (combined example below):
○ CREATE TABLE
○ CREATE VIEW
○ DROP TABLE
○ DROP VIEW
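A hedged end-to-end sketch of these commands against a made-up users table (names and values are illustrative):

CREATE TABLE users (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
CREATE VIEW active_users AS SELECT * FROM users WHERE name IS NOT NULL;

UPSERT INTO users VALUES (1, 'alice');                -- UPSERT VALUES
UPSERT INTO users SELECT id + 100, name FROM users;   -- UPSERT SELECT
DELETE FROM users WHERE id = 1;

DROP VIEW active_users;
DROP TABLE users;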
30
Connection:
● Long Running
● Short Running

// Long-running connection
Connection conn = DriverManager.getConnection(
        "jdbc:phoenix:my_server:longRunning", longRunningProps);

// Short-running connection
Connection conn = DriverManager.getConnection(
        "jdbc:phoenix:my_server:shortRunning", shortRunningProps);
31
@Test
public void createTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    String ddl = "CREATE TABLE " + tableName
            + " (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR)"
            + " SALT_BUCKETS = " + numSaltBuckets;
    Connection conn = DriverManager.getConnection(getUrl());
    conn.createStatement().execute(ddl);
}
Transactions:
● Create Table
32
@Test
public void readTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    long numRows = 1000;
    long numExpectedTasks = numSaltBuckets;
    insertRowsInTable(tableName, numRows);
    String query = "SELECT * FROM " + tableName;
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(query);
    PhoenixResultSet resultSetBeingTested = rs.unwrap(PhoenixResultSet.class);
    changeInternalStateForTesting(resultSetBeingTested);
    while (resultSetBeingTested.next()) {}
    resultSetBeingTested.close();
    Set<String> expectedTableNames = Sets.newHashSet(tableName);
    assertReadMetricValuesForSelectSql(Lists.newArrayList(numRows),
            Lists.newArrayList(numExpectedTasks),
            resultSetBeingTested, expectedTableNames);
}
Transactions:
● Read Table
33
@Override
public void getRowCount(ResultSet resultSet) throws SQLException {
    Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
    Cell kv = row.getValue(0);
    ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(
            kv.getValueArray(), kv.getValueOffset(), kv.getValueLength());
    // A single Cell will be returned with the count(*) - we decode that here
    rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
}
Transactions:
● Row Count
34
private void changeInternalStateForTesting(PhoenixResultSet rs) {
    // get and set the internal state for testing purposes.
    ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
    StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
    Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
    Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
}
Transactions:
● Internal State
5.
Capabilities
Features and Capabilities
Capabilities
● Overlays on top of HBase Data Model
● Keeps a Versioned Schema Repository
● Query Processor
36
Capabilities
● Cost-based query optimizer.
● Enhance existing statistics collection.
● Generate histograms to drive query
optimization decisions and join ordering.
37
Capabilities
● Secondary indexes (sketch below):
○ Boost the speed of queries without relying
on specific row-key designs.
○ Enable users to use star schemas.
○ Leverage SQL tools and online analytics.
38
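A brief sketch of the index DDL this refers to, reusing the server_metrics example from the scenarios section; the index names and the choice of global vs. local are illustrative:

-- Global index (typically favoured for read-heavy workloads)
CREATE INDEX idx_response_time ON server_metrics (response_time) INCLUDE (host);

-- Local index (typically favoured for write-heavy workloads)
CREATE LOCAL INDEX idx_gc_time ON server_metrics (gc_time);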
Capabilities
● Row timestamp column (sketch below).
● Set minimum and maximum time range
for scans.
● Improves performance, especially when
querying the tail end of the data.
39
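A minimal sketch of declaring a ROW_TIMESTAMP column on an illustrative server_metrics-style table, so that scans filtered on the date column map onto HBase's native time-range handling:

CREATE TABLE server_metrics_demo (
    host          VARCHAR NOT NULL,
    created_date  DATE NOT NULL,
    response_time BIGINT,
    gc_time       BIGINT,
    CONSTRAINT pk PRIMARY KEY (host, created_date ROW_TIMESTAMP)
);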
6.
Scenarios
Use Cases
Scenarios (Server Metrics Example)
41
SELECT substr(host, 1, 3), trunc(date, 'DAY'),
avg(response_time) FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
42
Scenarios (Chart Response Time Per Cluster)
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
43
Scenarios (Find 5 Longest GC Times)
Thanks!
Any questions?
You can find me at:
Github: @sxaxmz
Linkedin: linkedin.com/in/husseinosama
44

Editor's Notes

  1. Apache Phoenix -> A scale-out RDBMS with evolutionary schema built on Apache HBase
  2. Internal project out of a need to support a higher level, well understood, SQL language.
3. Apache HBase -> An open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. Used to provide random, real-time read/write access to Big Data. A column-oriented NoSQL database built on top of Hadoop.
4. Apache Phoenix -> An open-source, massively parallel relational database engine supporting Online Transactional Processing (OLTP) and operational analytics in Hadoop. Provides a JDBC driver that enables users to create, delete, and alter SQL tables, views, and indexes, and to query data through SQL. Apache Phoenix is a relational layer over HBase, a SQL skin for HBase, with a JDBC driver that hides the intricacies of the NoSQL store.
5. ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. All changes to data are performed as if they were a single operation. 1. Atomicity preserves the "completeness" of the business process (all-or-nothing behavior). 2. Consistency refers to the state of the data both before and after the transaction is executed (a transaction maintains the consistency of the state of the data). 3. Isolation means that transactions can run at the same time as if there were no concurrency (a locking mechanism is required). 4. Durability refers to the impact of an outage or a failure on a running transaction (data survives any failures). To summarize, a transaction will either complete, producing correct results, or terminate, with no effect.
6. Bulk loading for tables created in Phoenix is easier compared to tables created in the HBase shell.
7. (Server bounce) An administrator or technician removes power to the device in a "non-controlled shutdown", the "down" part of the bounce. Once the server is completely off, and all activity has ceased, the administrator restarts the server.
8. Set the phoenix.transactions.enabled property to true, along with running the transaction manager (included in the distribution), to enable full ACID transactions (tables may optionally be declared as transactional). A concurrency model is used to detect row-level conflicts with first-commit-wins semantics; the later commit would produce an exception indicating that a conflict was detected. A transaction is started implicitly when a transactional table is referenced in a statement, at which point no updates can be seen from other connections until either a commit or rollback occurs. Non-transactional tables will not see their updates until after a commit has occurred.
  9. Phoenix uses the value of this connection property as the max timestamp of scans. Timestamps may not be controlled for transactional tables. Instead, the transaction manager assigns timestamps which become the HBase cell timestamps after a commit. Timestamps are multiplied by 1,000,000 to ensure enough granularity for uniqueness across the cluster.
10. Snapshot queries over older data will pick up and use the correct schema based on the time of connection (based on CurrentSCN). Metadata updates include the addition or removal of a table column or updates to table statistics. 1. The ALWAYS value will cause the client to check with the server each time a statement is executed that references a table (or once per commit for an UPSERT VALUES statement). 2. A millisecond value indicates how long the client will hold on to its cached version of the metadata before checking back with the server for updates.
11. From scratch -> The HBase table and column families will be created automatically. Mapped to existing -> The binary representation of the row key and key values must match that of the Phoenix data types.
12. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot-spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to allow columns to be added to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
13. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot-spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to allow columns to be added to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
14. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
15. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
16. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
  17. ETL is a type of data integration that refers to the three steps used to blend data from multiple sources. It's often used to build a data warehouse.
18. Data Manipulation Language (DML). Data Definition Language (DDL). For CREATE TABLE: 1. Any HBase metadata (table, column families) that doesn't already exist will be created. 2. The KEEP_DELETED_CELLS option is enabled to allow flashback queries to work correctly. 3. An empty key value will also be added for each row so that queries behave as expected (without requiring all columns to be projected during scans). For CREATE VIEW: Instead, the existing HBase metadata must match the metadata specified in the DDL statement (or a table-is-read-only error results). For UPSERT VALUES: Use it multiple times before committing to batch mutations. For UPSERT SELECT: Configure phoenix.mutate.batchSize based on row size; write scans directly to HBase and write on the server while running an UPSERT SELECT on the same table by setting auto-commit to true.
19. Enhance existing statistics collection by enabling further query optimizations based on the size and cardinality of the data. Generate histograms to drive query optimization decisions such as secondary index usage and join ordering based on cardinalities to produce the most efficient query plan.
20. Secondary index types: global index (optimized for read-heavy use cases), local index (optimized for write-heavy, space-constrained use cases) and functional index (create an index on an arbitrary expression). HBase tables are sorted maps. A star schema is the simplest style of data mart schema (it separates business process data into facts); the approach is widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. A fact table contains measurements, metrics, and facts about a business process, while a dimension table is a companion to the fact table that contains descriptive attributes to be used for query constraining. Types of dimension table: slowly changing dimension, conformed dimension, junk dimension, degenerate dimension, role-playing dimension.
21. Maps the HBase native timestamp to a Phoenix column. Takes advantage of various optimizations that HBase provides for time ranges. ROW_TIMESTAMP needs to be a primary key column in a date or time format (see the documentation for details). Only one primary key column can be designated as ROW_TIMESTAMP, declared upon table creation (no null or negative values allowed).
22. Content is cached on the server through two main parts (SQL read, SQL write), serving the end user and collecting content from content providers.
23. Content is cached on the server through two main parts (SQL read, SQL write), serving the end user and collecting content from content providers.