Apache
Phoenix
Put the SQL back in NoSQL
1
Osama Hussein, March 2021
Agenda
● History
● Overview
● Architecture
2
● Capabilities
● Code
● Scenarios
1.
History
From open-source repo to top Apache project
Overview (Apache Phoenix)
4
● Began as an internal project by the
company (salesforce.com).
● JAN 2014: Originally open-sourced on GitHub.
● MAY 2014: Became a top-level Apache project.
2.
Overview
UDF, Transactions and Schema
Overview (Apache Phoenix)
6
● Support for late-bound, schema-on-read.
● SQL and JDBC API support.
● Access to data stored and produced in other
components such as Apache Spark and Apache Hive.
● Developed as part of Apache Hadoop.
● Runs on top of Hadoop Distributed File System (HDFS).
● HBase scales linearly and shards automatically.
Overview (Apache Phoenix)
7
● Apache Phoenix is an add-on for Apache HBase that provides a
programmatic ANSI SQL interface.
● Implements best-practice optimizations to enable software
engineers to develop next-generation, data-driven applications
based on HBase.
● Create and interact with tables in the form of typical DDL/DML
statements using the standard JDBC API.
Overview (Apache Phoenix)
8
● Written in Java and SQL
● Atomicity, Consistency, Isolation and
Durability (ACID)
● Fully integrated with other Hadoop
products such as Spark, Hive, Pig, Flume,
and MapReduce.
Overview (Apache Phoenix)
9
● Included in:
○ Cloudera Data Platform 7.0 and above.
○ Hortonworks distribution for HDP 2.1
and above.
○ Available as part of Cloudera labs.
○ Part of the Hadoop ecosystem.
Overview (SQL Support)
10
● Compiles SQL queries into HBase scans and
orchestrates their execution.
● Produces JDBC result set.
● All standard SQL query constructs are
supported.
Overview (SQL Support)
11
● Direct use of the HBase API, along with
coprocessors and custom filters.
Performance:
○ Milliseconds for small queries
○ Seconds for tens of millions of rows.
Overview (Bulk Loading)
12
● MapReduce-based:
○ CSV and JSON
○ Via the Phoenix MapReduce library
● Single-threaded:
○ CSV
○ Via the PSQL command-line tool (psql.py)
○ Into HBase on a local machine
Overview (User Defined Functions)
13
● Temporary UDFs exist for the current session only.
● Permanent UDFs are stored in the system functions table.
● UDFs can be used in SQL statements and in indexes.
● Tenant-specific UDF usage and support.
● Updating a UDF jar requires a cluster bounce (see the example below).
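A minimal sketch of how a permanent UDF might be registered and used; the function name, implementation class, jar location, and table are all illustrative, not part of the original deck:

-- Register a permanent UDF (stored in the system functions table)
CREATE FUNCTION MY_REVERSE(VARCHAR) RETURNS VARCHAR
    AS 'com.example.udf.ReverseFunction'
    USING JAR 'hdfs://namenode:8020/phoenix/udfs/my-udfs.jar';

-- The UDF can then appear in queries and in functional indexes
SELECT MY_REVERSE(name) FROM example_table;
CREATE INDEX idx_rev_name ON example_table (MY_REVERSE(name));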
Overview (Transactions)
14
● Uses Apache Tephra for cross-row / cross-table ACID support.
● Create tables with the flag ‘transactional=true’.
● Enable transactions, configure the snapshot directory, and set
the timeout value in ‘hbase-site.xml’.
● A transaction starts implicitly with the first statement against a
transactional table.
● A transaction ends with a commit or rollback (see the sketch below).
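A hedged sketch of that flow, assuming transactions have been enabled in ‘hbase-site.xml’ and the bundled transaction manager is running; the table and values are made up:

-- Declare the table as transactional at creation time
CREATE TABLE txn_demo (k VARCHAR PRIMARY KEY, v VARCHAR) TRANSACTIONAL=true;

-- A transaction begins implicitly with the first statement against the table
UPSERT INTO txn_demo VALUES ('row1', 'value1');
UPSERT INTO txn_demo VALUES ('row2', 'value2');

-- With auto-commit off, the transaction ends only when the JDBC
-- connection issues conn.commit() or conn.rollback().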
Overview (Transactions)
15
● Applications normally let HBase manage timestamps.
● In case the application needs to control the timestamp, the
‘CurrentSCN’ property must be specified at connection time.
● ‘CurrentSCN’ controls the timestamp for any DDL,
DML, or query.
Overview (Schema)
16
● The table metadata is stored in a versioned HBase table
(up to 1,000 versions).
● ‘UPDATE_CACHE_FREQUENCY’ allows the user to
declare how often the server is checked for metadata
updates (see the example below). Values:
○ Always
○ Never
○ Millisecond value
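For illustration, a sketch of how the property could be set on a made-up table; the millisecond value is arbitrary:

-- Check the server for metadata changes at most once per minute
CREATE TABLE metrics_demo (host VARCHAR PRIMARY KEY, val BIGINT)
    UPDATE_CACHE_FREQUENCY = 60000;

-- The property can also be changed later, or set to ALWAYS / NEVER
ALTER TABLE metrics_demo SET UPDATE_CACHE_FREQUENCY = 'ALWAYS';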
Overview (Schema)
17
● Phoenix table can be:
○ Built from scratch.
○ Mapped to an existing HBase table.
■ Read-Write Table
■ Read-Only View
Overview (Schema)
18
● Read-Write Table:
○ Column families will be created automatically if they
don’t already exist.
○ An empty key value will be added to the first column
family of each existing row to minimize the size of
the projection for queries.
Overview (Schema)
19
● Read-Only View:
○ All column families must already exist.
○ Addition of the Phoenix coprocessors used for query
processing is the only change to the HBase table (see the sketch below).
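A minimal sketch of the two mapping options against an existing HBase table named "t1" with a column family "cf1"; the names are illustrative, and in practice you would choose one of the two:

-- Option 1: read-write table; missing column families are created automatically
CREATE TABLE "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);

-- Option 2: read-only view; all column families must already exist,
-- and the only change to the HBase table is adding the Phoenix coprocessors
CREATE VIEW "t1" (pk VARCHAR PRIMARY KEY, "cf1"."col1" VARCHAR);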
3.
Architecture
Architecture, Phoenix Data Model, Query Execution
and Environment
Architecture
21
Architecture
22
Architecture (Phoenix Data Model)
23
Architecture (Server Metrics Example)
24
Architecture (Server Metrics Example)
25
● Example:
Architecture (Query Execution)
26
1. Identify row key ranges from the query
2. Overlay row key ranges with regions
3. Execute parallel scans
4. Filter using skip scan
5. Intercept scan in coprocessor
6. Perform final merge sort
Architecture (Environment)
27
Data Warehouse
Extract, Transform,
Load (ETL)
BI and Visualizing
4.
Code
Commands and Sample Code
Code (Commands)
29
● DML Commands:
○ UPSERT VALUES
○ UPSERT SELECT
○ DELETE
● DDL Commands (combined example below):
○ CREATE TABLE
○ CREATE VIEW
○ DROP TABLE
○ DROP VIEW
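A hedged end-to-end sketch of these commands against a made-up users table (names and values are illustrative):

CREATE TABLE users (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
CREATE VIEW active_users AS SELECT * FROM users WHERE name IS NOT NULL;

UPSERT INTO users VALUES (1, 'alice');                -- UPSERT VALUES
UPSERT INTO users SELECT id + 100, name FROM users;   -- UPSERT SELECT
DELETE FROM users WHERE id = 1;

DROP VIEW active_users;
DROP TABLE users;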
30
Connection:
● Long Running
● Short Running

// Long-running connection
Connection conn = DriverManager.getConnection(
        "jdbc:phoenix:my_server:longRunning", longRunningProps);

// Short-running connection
Connection conn = DriverManager.getConnection(
        "jdbc:phoenix:my_server:shortRunning", shortRunningProps);
31
@Test
public void createTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    String ddl = "CREATE TABLE " + tableName
            + " (K VARCHAR NOT NULL PRIMARY KEY, V VARCHAR)"
            + " SALT_BUCKETS = " + numSaltBuckets;
    Connection conn = DriverManager.getConnection(getUrl());
    conn.createStatement().execute(ddl);
}
Transactions:
● Create Table
32
@Test
public void readTable() throws Exception {
    String tableName = generateUniqueName();
    long numSaltBuckets = 6;
    long numRows = 1000;
    long numExpectedTasks = numSaltBuckets;
    insertRowsInTable(tableName, numRows);
    String query = "SELECT * FROM " + tableName;
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(query);
    PhoenixResultSet resultSetBeingTested = rs.unwrap(PhoenixResultSet.class);
    changeInternalStateForTesting(resultSetBeingTested);
    while (resultSetBeingTested.next()) {}
    resultSetBeingTested.close();
    Set<String> expectedTableNames = Sets.newHashSet(tableName);
    assertReadMetricValuesForSelectSql(Lists.newArrayList(numRows),
            Lists.newArrayList(numExpectedTasks),
            resultSetBeingTested, expectedTableNames);
}
Transactions:
● Read Table
33
@Override
public void getRowCount(ResultSet resultSet) throws SQLException {
    Tuple row = resultSet.unwrap(PhoenixResultSet.class).getCurrentRow();
    Cell kv = row.getValue(0);
    ImmutableBytesWritable tmpPtr = new ImmutableBytesWritable(
            kv.getValueArray(), kv.getValueOffset(), kv.getValueLength());
    // A single Cell will be returned with the count(*) - we decode that here
    rowCount = PLong.INSTANCE.getCodec().decodeLong(tmpPtr, SortOrder.getDefault());
}
Transactions:
● Row Count
34
private void changeInternalStateForTesting(PhoenixResultSet rs) {
    // get and set the internal state for testing purposes.
    ReadMetricQueue testMetricsQueue = new TestReadMetricsQueue(LogLevel.OFF, true);
    StatementContext ctx = (StatementContext) Whitebox.getInternalState(rs, "context");
    Whitebox.setInternalState(ctx, "readMetricsQueue", testMetricsQueue);
    Whitebox.setInternalState(rs, "readMetricsQueue", testMetricsQueue);
}
Transactions:
● Internal State
5.
Capabilities
Features and Capabilities
Capabilities
● Overlays on top of HBase Data Model
● Keeps a Versioned Schema Repository
● Query Processor
36
Capabilities
● Cost-based query optimizer.
● Enhance existing statistics collection.
● Generate histograms to drive query
optimization decisions and join ordering.
37
Capabilities
● Secondary indexes (sketch below):
○ Boost the speed of queries without relying
on specific row-key designs.
○ Enable users to use star schemas.
○ Leverage SQL tools and online analytics.
38
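A brief sketch of the index DDL this refers to, reusing the server_metrics example from the scenarios section; the index names and the choice of global vs. local are illustrative:

-- Global index (typically favoured for read-heavy workloads)
CREATE INDEX idx_response_time ON server_metrics (response_time) INCLUDE (host);

-- Local index (typically favoured for write-heavy workloads)
CREATE LOCAL INDEX idx_gc_time ON server_metrics (gc_time);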
Capabilities
● Row timestamp column (sketch below).
● Set minimum and maximum time range
for scans.
● Improves performance, especially when
querying the tail end of the data.
39
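A minimal sketch of declaring a ROW_TIMESTAMP column on an illustrative server_metrics-style table, so that scans filtered on the date column map onto HBase's native time-range handling:

CREATE TABLE server_metrics_demo (
    host          VARCHAR NOT NULL,
    created_date  DATE NOT NULL,
    response_time BIGINT,
    gc_time       BIGINT,
    CONSTRAINT pk PRIMARY KEY (host, created_date ROW_TIMESTAMP)
);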
6.
Scenarios
Use Cases
Scenarios (Server Metrics Example)
41
SELECT substr(host, 1, 3), trunc(date, 'DAY'),
avg(response_time) FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
GROUP BY substr(host, 1, 3), trunc(date, 'DAY')
42
Scenarios (Chart Response Time Per Cluster)
SELECT host, date, gc_time
FROM server_metrics
WHERE date > CURRENT_DATE() - 7
AND substr(host, 1, 3) IN ('sf1', 'sf3', 'sf7')
ORDER BY gc_time DESC
LIMIT 5
43
Scenarios (Find 5 Longest GC Times)
Thanks!
Any questions?
You can find me at:
Github: @sxaxmz
Linkedin: linkedin.com/in/husseinosama
44

Editor's Notes

  1. Apache Phoenix -> A scale-out RDBMS with evolutionary schema built on Apache HBase
  2. Internal project out of a need to support a higher level, well understood, SQL language.
3. Apache HBase -> An open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java. Used to provide random, real-time read/write access to Big Data. A column-oriented NoSQL database built on top of Hadoop.
4. Apache Phoenix -> An open-source, massively parallel relational database engine supporting Online Transactional Processing (OLTP) and operational analytics in Hadoop. Provides a JDBC driver that enables users to create, delete, and alter SQL tables, views, and indexes, and to query data through SQL. Apache Phoenix is a relational layer over HBase, a SQL skin for HBase, with a JDBC driver that hides the intricacies of the NoSQL store.
5. ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. All changes to data are performed as if they were a single operation. 1. Atomicity preserves the "completeness" of the business process (all-or-nothing behavior). 2. Consistency refers to the state of the data both before and after the transaction is executed (a transaction maintains the consistency of the state of the data). 3. Isolation means that transactions can run at the same time as if there were no concurrency (a locking mechanism is required). 4. Durability refers to the impact of an outage or a failure on a running transaction (data survives any failures). To summarize, a transaction will either complete, producing correct results, or terminate, with no effect.
6. Bulk loading for tables created in Phoenix is easier compared to tables created in the HBase shell.
7. (Server bounce) An administrator or technician removes power to the device in a "non-controlled shutdown", the "down" part of the bounce. Once the server is completely off, and all activity has ceased, the administrator restarts the server.
8. Set the phoenix.transactions.enabled property to true, along with running the transaction manager (included in the distribution), to enable full ACID transactions (tables may optionally be declared as transactional). A concurrency model is used to detect row-level conflicts with first-commit-wins semantics; the later commit would produce an exception indicating that a conflict was detected. A transaction is started implicitly when a transactional table is referenced in a statement, at which point no updates can be seen from other connections until either a commit or rollback occurs. Non-transactional tables will not see their updates until after a commit has occurred.
  9. Phoenix uses the value of this connection property as the max timestamp of scans. Timestamps may not be controlled for transactional tables. Instead, the transaction manager assigns timestamps which become the HBase cell timestamps after a commit. Timestamps are multiplied by 1,000,000 to ensure enough granularity for uniqueness across the cluster.
10. Snapshot queries over older data will pick up and use the correct schema based on the time of connection (based on CurrentSCN). Metadata updates include the addition or removal of a table column or updates to table statistics. 1. The ALWAYS value will cause the client to check with the server each time a statement is executed that references a table (or once per commit for an UPSERT VALUES statement). 2. A millisecond value indicates how long the client will hold on to its cached version of the metadata before checking back with the server for updates.
11. From scratch -> The HBase table and column families will be created automatically. Mapped to existing -> The binary representation of the row key and key values must match that of the Phoenix data types.
12. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot-spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to allow columns to be added to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
13. 1. The primary use case for a VIEW is to transfer existing data into a Phoenix table. A table could also be declared as salted to prevent HBase region hot-spotting. The table catalog argument in the metadata APIs is used to filter based on the tenant ID for multi-tenant tables. 2. Data modifications are not allowed on a VIEW, and query performance will likely be lower than with a TABLE. Phoenix supports updatable views on top of tables, leveraging the schemaless capabilities of HBase to allow columns to be added to them. All views share the same underlying physical HBase table and may even be indexed independently. A multi-tenant view may add columns which are defined solely for that user.
14. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
15. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
16. Phoenix chunks up queries using guideposts, which means more threads working on a single region. Phoenix runs the queries in parallel on the client using a configurable number of threads. Aggregation is done in a coprocessor on the server side, reducing the amount of data that is returned to the client.
  17. ETL is a type of data integration that refers to the three steps used to blend data from multiple sources. It's often used to build a data warehouse.
18. Data Manipulation Language (DML). Data Definition Language (DDL). For CREATE TABLE: 1. Any HBase metadata (table, column families) that doesn't already exist will be created. 2. The KEEP_DELETED_CELLS option is enabled to allow flashback queries to work correctly. 3. An empty key value will also be added for each row so that queries behave as expected (without requiring all columns to be projected during scans). For CREATE VIEW: Instead, the existing HBase metadata must match the metadata specified in the DDL statement (or a table-is-read-only error results). For UPSERT VALUES: Use it multiple times before committing to batch mutations. For UPSERT SELECT: Configure phoenix.mutate.batchSize based on row size; write scans directly to HBase and write on the server while running an UPSERT SELECT on the same table by setting auto-commit to true.
19. Enhance existing statistics collection by enabling further query optimizations based on the size and cardinality of the data. Generate histograms to drive query optimization decisions such as secondary index usage and join ordering based on cardinalities to produce the most efficient query plan.
20. Secondary index types: global index (optimized for read-heavy use cases), local index (optimized for write-heavy, space-constrained use cases) and functional index (create an index on an arbitrary expression). HBase tables are sorted maps. A star schema is the simplest style of data mart schema (it separates business process data into facts); the approach is widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables referencing any number of dimension tables. A fact table contains measurements, metrics, and facts about a business process, while a dimension table is a companion to the fact table that contains descriptive attributes to be used for query constraining. Types of dimension table: slowly changing dimension, conformed dimension, junk dimension, degenerate dimension, role-playing dimension.
21. Maps the HBase native timestamp to a Phoenix column. Takes advantage of various optimizations that HBase provides for time ranges. ROW_TIMESTAMP needs to be a primary key column in a date or time format (see the documentation for details). Only one primary key column can be designated as ROW_TIMESTAMP, declared upon table creation (no null or negative values allowed).
22. Content is cached on the server through two main parts (SQL read, SQL write), serving the end user and collecting content from content providers.
23. Content is cached on the server through two main parts (SQL read, SQL write), serving the end user and collecting content from content providers.