SlideShare ist ein Scribd-Unternehmen logo
1 von 26
Query Compilation in Impala
Query Compilation in Impala
Alexander Behm | Software Engineer
May 2014 @ Impala User Group
Query Compilation in Impala
Compile Query
Execute Query
Client
Client
SQL Text
Executable Plan
Query Results
Impala Frontend
(Java)
Impala Backend
(C++)
Focus of this talk
Flow of a SQL Query
Query Compilation in Impala
Client
SQL Text
Executable Plan
Query Compilation
Query
Compiler
SQL
Parsing
Semantic
Analysis
Query
Planning
Parse Tree
Parse Tree + Analyzer
Query Compilation in Impala
Query Parsing
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
SelectList TableRefs WhereClause
SelectStmt
GroupByClause
ColRef AggExpr
ColRef
BinaryPredicate
ColRef IntLiteral
ColRefTableRef TableRef
UsingClause
ColRef
• Applies SQL grammar, reports syntax errors
• Produces parse tree capturing syntactic structure of query
Query Compilation in Impala
Semantic Analysis…
• Precondition: Query is syntactically valid. Analysis operates on parse tree.
• Consults table metadata
• Do t1 and t2 exist? Does c1 exist in t1 or t2 (or both  error)? Does id exist in t1 and t2?
• Does the user have privileges to SELECT from t1?
• Checks type compatibility of expressions, adds implicit casts
• c3 > 10  c3 > cast(10 as bigint)
• SQL rules (semantic, not syntactic)
• Does c1 appear in the GROUP BY clause?
SELECT c1, SUM(c2)
FROM t1 JOIN t2 USING(id)
WHERE c3 > 10 GROUP BY c1
Query Compilation in Impala
… Semantic Analysis
• Expression substitution for views
• Resolve column references against base tables
• Preparation for Planning
• Register state in analyzer for correct predicate assignment during planning
• Register predicates (WHERE, HAVING, ON, USING, etc.)
• Register outer-joined tables
• Compute value-transfer graph and equivalence classes for predicate inference
• (…)
• Postcondition: Query is valid. An executable plan can be produced.
SELECT c1, SUM(c2)
FROM (SELECT dept AS c1, revenue AS c2,
month AS c3 FROM t1) AS v
WHERE c3 > 10 GROUP BY c1
SELECT dept, SUM(revenue)
FROM t1
WHERE month > 10
GROUP BY dept
Query Compilation in Impala
• Generate executable plan (“tree” of operators)
• Maximize scan locality using DN block metadata
• Minimize data movement
• Full distribution of operators
• Query operators
• Scan, HashJoin, HashAggregation, Union, TopN,
Exchange
Query Planning: Goals
Query Compilation in Impala
Query Planning: Overview
Semantic
Analysis
Parse Tree + Analyzer
Query
Planner
Walk Parse Tree
Parallelize
& Fragment
Single-node Plan
Executable Plan
Query Compilation in Impala
Query Planning: Single-Node Plan
• Four major functions:
1. Parse Tree  Plan Tree
2. Assigns predicates to lowest plan node
3. Optimizes join order
4. Prunes irrelevant columns
Query Compilation in Impala
Parse Tree  Single-Node Plan Tree
HashJoin
Scan: t1
Scan: t3
Scan: t2
HashJoin
TopN
Agg
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online‘ AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
Query Compilation in Impala
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online‘ AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
Predicate Assignment & Inference
HashJoin
Scan: t1
Scan: t3
Scan: t2
HashJoin
TopN
Agg
COUNT(t2.revenue) > 10
t1.id2 = t3.id
t1.id1 = t2.id
id1 > 10
category = ‘Online’
id > 10
Inferred
Predicate
Query Compilation in Impala
Join-Order Optimization
• Inner joins are commutative and associative
• Query results correct independent of execution order
• Query execution costs vary dramatically!
• Hash table sizes, network transfers, #hash lookups
• Join-order optimization
• Impala only considers left-deep join trees
• (Right join input is a table, not another join)
• Find cheapest valid join order
• Relies heavily on table and column statistics
• Limitation: Choice of join order independent of join strategy
Query Compilation in Impala
Invalid Join Orders
SELECT t1.dept, SUM(t2.revenue)
FROM LargeHdfsTable t1
JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id)
JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id)
WHERE t3.category = 'Online‘ AND t1.id > 10
GROUP BY t1.dept
HAVING COUNT(t2.revenue) > 10
ORDER BY revenue LIMIT 10
No explicit or implicit
predicate between t2 and t3
Query Compilation in Impala
Join-Order Optimization
HashJoin
Scan: t1
Scan: t3
Scan: t2
HashJoin
HashJoin
Scan: t1
Scan: t2
Scan: t3
HashJoin
HashJoin
Scan: t2
Scan: t3
Scan: t1
HashJoin
HashJoin
Scan: t2
Scan: t1
Scan: t3
HashJoin
HashJoin
Scan: t3
Scan: t2
Scan: t1
HashJoin
HashJoin
Scan: t3
Scan: t1
Scan: t2
HashJoin
Order:
t1, t2, t3
Order:
t1, t3, t2
Order:
t2, t1, t3
Order:
t2, t3, t1
Order:
t3, t1, t2
Order:
t3, t2, t1
Query Compilation in Impala
Join-Order Optimization
• Impala’s Implementation:
1. Heuristic
• Order tables descending by size
• Best plan typically has largest table on the left (if valid)
2. Plan enumeration & costing
• Generate all possible join orders starting from a given
left-most table (starting with largest one)
• Ignore invalid join orders
• Estimate intermediate result sizes (key!)
• Choose plan that minimizes intermediate result sizes
Query Compilation in Impala
Query Planning: Overview
Semantic
Analysis
Parse Tree + Analyzer
Query
Planner
Walk Parse Tree
Parallelize
& Fragment
Single-node Plan
Executable Plan
Query Compilation in Impala
Query Planning: Distributed Plans
• Distributed Aggregation
• Pre-aggregation where data is first materialized
• Merge-aggregation partitioned by grouping columns
• Distinct aggregation: additional level of pre- and merge aggregation
• Distributed Top-N
• Initial Top-N where data is first materialized
• Final Top-N at coordinator
• Distributed Union
• Pre-aggregation/top-n placed into plans of each union operand
• Union-operand plans executed in parallel, merged via exchange
• Above strategies are currently fixed in Impala
• Independent of column/table stats
Query Compilation in Impala
Query Planning: Distributed Joins
• Broadcast Join
• Join is co-located with left input
• Broadcast right input to all nodes executing join
• Build hash table on right input, streaming probe from left input
•  Preferred for small right side (relative to left side)
• Partitioned Join
• Both tables hash-partitioned on join columns
• Same build/probe procedure as above
•  Preferred for joins where both left and right side are large
• Cost-based decision based on table/column stats
• Minimize required network transfer
Query Compilation in Impala
Query Planning: Distributed Plans
HashJoinScan: t2
Scan: t3
Scan: t1
HashJoin
TopN
Pre-Agg
MergeAgg
TopN
Broadcast
Merge
hash t2.idhash t1.id1
hash
t1.custid
at HDFS DN
at HBase RS
at coordinator
HashJoin
Scan: t2
Scan: t3
Scan: t1
HashJoin
TopN
Agg
Single-Node
Plan
Query Compilation in Impala
Explain Example: TPCDS Q42
SELECT d.d_year, i.i_category_id, i.i_category, SUM(ss_ext_sales_price)
FROM store_sales ss
JOIN date_dim d
ON (ss.ss_sold_date_sk = d.d_date_sk)
JOIN item i
ON (ss.ss_item_sk = i.i_item_sk)
WHERE i.i_manager_id = 1 AND d.d_moy = 12 AND d.d_year = 1998
GROUP BY d.d_year, i.i_category_id, i.i_category
ORDER BY total_sales DESC, d_year, i_category_id, i_category
LIMIT 100
Query Compilation in Impala
Explain Example: TPCDS Q42
+-----------------------------------------------------+
| Explain String |
+-----------------------------------------------------+
| Estimated Per-Host Requirements: Memory=0B VCores=0 |
| |
| 06:TOP-N [LIMIT=100] |
| 05:AGGREGATE [FINALIZE] |
| 04:HASH JOIN [INNER JOIN] |
| |--02:SCAN HDFS [tpcds1000gb.item i] |
| 03:HASH JOIN [INNER JOIN] |
| |--01:SCAN HDFS [tpcds1000gb.date_dim d] |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss] |
+-----------------------------------------------------+
set explain_level=0;
set num_nodes=1;
Query Compilation in Impala
Explain Example: TPCDS Q42
+---------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=3.76GB VCores=3 |
| |
| 12:TOP-N [LIMIT=100] |
| 11:EXCHANGE [PARTITION=UNPARTITIONED] |
| 06:TOP-N [LIMIT=100] |
| 10:AGGREGATE [MERGE FINALIZE] |
| 09:EXCHANGE [PARTITION=HASH(d.d_year,i.i_category_id,i.i_category)] |
| 05:AGGREGATE |
| 04:HASH JOIN [INNER JOIN, BROADCAST] |
| |--08:EXCHANGE [BROADCAST] |
| | 02:SCAN HDFS [tpcds1000gb.item i] |
| 03:HASH JOIN [INNER JOIN, BROADCAST] |
| |--07:EXCHANGE [BROADCAST] |
| | 01:SCAN HDFS [tpcds1000gb.date_dim d] |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss] |
+---------------------------------------------------------------------+
set explain_level=0;
set num_nodes=0;
Query Compilation in Impala
Explain Example: TPCDS Q42
| …
| 03:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: ss.ss_sold_date_sk = d.d_date_sk |
| | hosts=10 per-host-mem=511B |
| | tuple-ids=0,1 row-size=40B cardinality=8251124389 |
| | |
| |--07:EXCHANGE [BROADCAST] |
| | | hosts=3 per-host-mem=0B |
| | | tuple-ids=1 row-size=16B cardinality=29 |
| | | |
| | 01:SCAN HDFS [tpcds1000gb.date_dim d, PARTITION=RANDOM] |
| | partitions=1/1 size=9.77MB |
| | predicates: d.d_moy = 12, d.d_year = 1998 |
| | table stats: 73049 rows total |
| | column stats: all |
| | hosts=3 per-host-mem=48.00MB |
| | tuple-ids=1 row-size=16B cardinality=29 |
| | |
| 00:SCAN HDFS [tpcds1000gb.store_sales ss, PARTITION=RANDOM] |
| partitions=1823/1823 size=1.10TB |
| table stats: 8251124389 rows total |
| column stats: all |
| hosts=10 per-host-mem=3.75GB |
| tuple-ids=0 row-size=24B cardinality=8251124389 |
+--------------------------------------------------------------+
set explain_level=2;
set num_nodes=0;
Query Compilation in Impala
Conclusion
• Cost-based choice of join order and strategy
• Critical for performance
• Relies on table and column stats
• Other plan optimizations currently independent of stats
• Likely to expand plan choices in the future
• Likely to increase reliance on stats
• Helpful Impala commands
• compute stats
• show table/column stats
• explain query/insert stmt
• set explain_level=[0-3]
• set num_nodes=0  show single-node plan
Query Compilation in Impala
Try It Out!
•Questions/comments?
• Download: cloudera.com/impala
• Email: impala-user@cloudera.org
• Join: groups.cloudera.org
Query Compilation in Impala

Weitere ähnliche Inhalte

Was ist angesagt?

Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationDatabricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
How Impala Works
How Impala WorksHow Impala Works
How Impala WorksYue Chen
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeDremio Corporation
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 

Was ist angesagt? (20)

Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Apache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper OptimizationApache Spark Core—Deep Dive—Proper Optimization
Apache Spark Core—Deep Dive—Proper Optimization
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Apache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In PracticeApache Arrow: In Theory, In Practice
Apache Arrow: In Theory, In Practice
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 

Ähnlich wie Query Compilation in Impala

Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Databricks
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Spark Summit
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaFlink Forward
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Dremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasetsDremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasetsCarl Lu
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017Jags Ramnarayan
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector? confluent
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSAmazon Web Services
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Gruter
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer OverviewOlav Sandstå
 

Ähnlich wie Query Compilation in Impala (20)

Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
Spark SQL Adaptive Execution Unleashes The Power of Cluster in Large Scale wi...
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
Cost-Based Optimizer Framework for Spark SQL: Spark Summit East talk by Ron H...
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issues
 
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, AlibabaWhat's new in 1.9.0 blink planner - Kurt Young, Alibaba
What's new in 1.9.0 blink planner - Kurt Young, Alibaba
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Dremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasetsDremel interactive analysis of web scale datasets
Dremel interactive analysis of web scale datasets
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
So You Want to Write a Connector?
So You Want to Write a Connector? So You Want to Write a Connector?
So You Want to Write a Connector?
 
Lecture 9.pptx
Lecture 9.pptxLecture 9.pptx
Lecture 9.pptx
 
Deploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWSDeploying your Data Warehouse on AWS
Deploying your Data Warehouse on AWS
 
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
Hadoop Summit 2014: Query Optimization and JIT-based Vectorized Execution in ...
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Kürzlich hochgeladen

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 

Kürzlich hochgeladen (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 

Query Compilation in Impala

  • 1. Query Compilation in Impala Query Compilation in Impala Alexander Behm | Software Engineer May 2014 @ Impala User Group
  • 2. Query Compilation in Impala Compile Query Execute Query Client Client SQL Text Executable Plan Query Results Impala Frontend (Java) Impala Backend (C++) Focus of this talk Flow of a SQL Query
  • 3. Query Compilation in Impala Client SQL Text Executable Plan Query Compilation Query Compiler SQL Parsing Semantic Analysis Query Planning Parse Tree Parse Tree + Analyzer
  • 4. Query Compilation in Impala Query Parsing SELECT c1, SUM(c2) FROM t1 JOIN t2 USING(id) WHERE c3 > 10 GROUP BY c1 SelectList TableRefs WhereClause SelectStmt GroupByClause ColRef AggExpr ColRef BinaryPredicate ColRef IntLiteral ColRefTableRef TableRef UsingClause ColRef • Applies SQL grammar, reports syntax errors • Produces parse tree capturing syntactic structure of query
  • 5. Query Compilation in Impala Semantic Analysis… • Precondition: Query is syntactically valid. Analysis operates on parse tree. • Consults table metadata • Do t1 and t2 exist? Does c1 exist in t1 or t2 (or both  error)? Does id exist in t1 and t2? • Does the user have privileges to SELECT from t1? • Checks type compatibility of expressions, adds implicit casts • c3 > 10  c3 > cast(10 as bigint) • SQL rules (semantic, not syntactic) • Does c1 appear in the GROUP BY clause? SELECT c1, SUM(c2) FROM t1 JOIN t2 USING(id) WHERE c3 > 10 GROUP BY c1
  • 6. Query Compilation in Impala … Semantic Analysis • Expression substitution for views • Resolve column references against base tables • Preparation for Planning • Register state in analyzer for correct predicate assignment during planning • Register predicates (WHERE, HAVING, ON, USING, etc.) • Register outer-joined tables • Compute value-transfer graph and equivalence classes for predicate inference • (…) • Postcondition: Query is valid. An executable plan can be produced. SELECT c1, SUM(c2) FROM (SELECT dept AS c1, revenue AS c2, month AS c3 FROM t1) AS v WHERE c3 > 10 GROUP BY c1 SELECT dept, SUM(revenue) FROM t1 WHERE month > 10 GROUP BY dept
  • 7. Query Compilation in Impala • Generate executable plan (“tree” of operators) • Maximize scan locality using DN block metadata • Minimize data movement • Full distribution of operators • Query operators • Scan, HashJoin, HashAggregation, Union, TopN, Exchange Query Planning: Goals
  • 8. Query Compilation in Impala Query Planning: Overview Semantic Analysis Parse Tree + Analyzer Query Planner Walk Parse Tree Parallelize & Fragment Single-node Plan Executable Plan
  • 9. Query Compilation in Impala Query Planning: Single-Node Plan • Four major functions: 1. Parse Tree  Plan Tree 2. Assigns predicates to lowest plan node 3. Optimizes join order 4. Prunes irrelevant columns
  • 10. Query Compilation in Impala Parse Tree  Single-Node Plan Tree HashJoin Scan: t1 Scan: t3 Scan: t2 HashJoin TopN Agg SELECT t1.dept, SUM(t2.revenue) FROM LargeHdfsTable t1 JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id) JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id) WHERE t3.category = 'Online‘ AND t1.id > 10 GROUP BY t1.dept HAVING COUNT(t2.revenue) > 10 ORDER BY revenue LIMIT 10
  • 11. Query Compilation in Impala SELECT t1.dept, SUM(t2.revenue) FROM LargeHdfsTable t1 JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id) JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id) WHERE t3.category = 'Online‘ AND t1.id > 10 GROUP BY t1.dept HAVING COUNT(t2.revenue) > 10 ORDER BY revenue LIMIT 10 Predicate Assignment & Inference HashJoin Scan: t1 Scan: t3 Scan: t2 HashJoin TopN Agg COUNT(t2.revenue) > 10 t1.id2 = t3.id t1.id1 = t2.id id1 > 10 category = ‘Online’ id > 10 Inferred Predicate
  • 12. Query Compilation in Impala Join-Order Optimization • Inner joins are commutative and associative • Query results correct independent of execution order • Query execution costs vary dramatically! • Hash table sizes, network transfers, #hash lookups • Join-order optimization • Impala only considers left-deep join trees • (Right join input is a table, not another join) • Find cheapest valid join order • Relies heavily on table and column statistics • Limitation: Choice of join order independent of join strategy
  • 13. Query Compilation in Impala Invalid Join Orders SELECT t1.dept, SUM(t2.revenue) FROM LargeHdfsTable t1 JOIN HugeHdfsTable t2 ON (t1.id1 = t2.id) JOIN SmallHbaseTable t3 ON (t1.id2 = t3.id) WHERE t3.category = 'Online‘ AND t1.id > 10 GROUP BY t1.dept HAVING COUNT(t2.revenue) > 10 ORDER BY revenue LIMIT 10 No explicit or implicit predicate between t2 and t3
  • 14. Query Compilation in Impala Join-Order Optimization HashJoin Scan: t1 Scan: t3 Scan: t2 HashJoin HashJoin Scan: t1 Scan: t2 Scan: t3 HashJoin HashJoin Scan: t2 Scan: t3 Scan: t1 HashJoin HashJoin Scan: t2 Scan: t1 Scan: t3 HashJoin HashJoin Scan: t3 Scan: t2 Scan: t1 HashJoin HashJoin Scan: t3 Scan: t1 Scan: t2 HashJoin Order: t1, t2, t3 Order: t1, t3, t2 Order: t2, t1, t3 Order: t2, t3, t1 Order: t3, t1, t2 Order: t3, t2, t1
  • 15. Query Compilation in Impala Join-Order Optimization • Impala’s Implementation: 1. Heuristic • Order tables descending by size • Best plan typically has largest table on the left (if valid) 2. Plan enumeration & costing • Generate all possible join orders starting from a given left-most table (starting with largest one) • Ignore invalid join orders • Estimate intermediate result sizes (key!) • Choose plan that minimizes intermediate result sizes
  • 16. Query Compilation in Impala Query Planning: Overview Semantic Analysis Parse Tree + Analyzer Query Planner Walk Parse Tree Parallelize & Fragment Single-node Plan Executable Plan
  • 17. Query Compilation in Impala Query Planning: Distributed Plans • Distributed Aggregation • Pre-aggregation where data is first materialized • Merge-aggregation partitioned by grouping columns • Distinct aggregation: additional level of pre- and merge aggregation • Distributed Top-N • Initial Top-N where data is first materialized • Final Top-N at coordinator • Distributed Union • Pre-aggregation/top-n placed into plans of each union operand • Union-operand plans executed in parallel, merged via exchange • Above strategies are currently fixed in Impala • Independent of column/table stats
  • 18. Query Compilation in Impala Query Planning: Distributed Joins • Broadcast Join • Join is co-located with left input • Broadcast right input to all nodes executing join • Build hash table on right input, streaming probe from left input •  Preferred for small right side (relative to left side) • Partitioned Join • Both tables hash-partitioned on join columns • Same build/probe procedure as above •  Preferred for joins where both left and right side are large • Cost-based decision based on table/column stats • Minimize required network transfer
  • 19. Query Compilation in Impala Query Planning: Distributed Plans HashJoinScan: t2 Scan: t3 Scan: t1 HashJoin TopN Pre-Agg MergeAgg TopN Broadcast Merge hash t2.idhash t1.id1 hash t1.custid at HDFS DN at HBase RS at coordinator HashJoin Scan: t2 Scan: t3 Scan: t1 HashJoin TopN Agg Single-Node Plan
  • 20. Query Compilation in Impala Explain Example: TPCDS Q42 SELECT d.d_year, i.i_category_id, i.i_category, SUM(ss_ext_sales_price) FROM store_sales ss JOIN date_dim d ON (ss.ss_sold_date_sk = d.d_date_sk) JOIN item i ON (ss.ss_item_sk = i.i_item_sk) WHERE i.i_manager_id = 1 AND d.d_moy = 12 AND d.d_year = 1998 GROUP BY d.d_year, i.i_category_id, i.i_category ORDER BY total_sales DESC, d_year, i_category_id, i_category LIMIT 100
  • 21. Query Compilation in Impala Explain Example: TPCDS Q42 +-----------------------------------------------------+ | Explain String | +-----------------------------------------------------+ | Estimated Per-Host Requirements: Memory=0B VCores=0 | | | | 06:TOP-N [LIMIT=100] | | 05:AGGREGATE [FINALIZE] | | 04:HASH JOIN [INNER JOIN] | | |--02:SCAN HDFS [tpcds1000gb.item i] | | 03:HASH JOIN [INNER JOIN] | | |--01:SCAN HDFS [tpcds1000gb.date_dim d] | | 00:SCAN HDFS [tpcds1000gb.store_sales ss] | +-----------------------------------------------------+ set explain_level=0; set num_nodes=1;
  • 22. Query Compilation in Impala Explain Example: TPCDS Q42 +---------------------------------------------------------------------+ | Explain String | +---------------------------------------------------------------------+ | Estimated Per-Host Requirements: Memory=3.76GB VCores=3 | | | | 12:TOP-N [LIMIT=100] | | 11:EXCHANGE [PARTITION=UNPARTITIONED] | | 06:TOP-N [LIMIT=100] | | 10:AGGREGATE [MERGE FINALIZE] | | 09:EXCHANGE [PARTITION=HASH(d.d_year,i.i_category_id,i.i_category)] | | 05:AGGREGATE | | 04:HASH JOIN [INNER JOIN, BROADCAST] | | |--08:EXCHANGE [BROADCAST] | | | 02:SCAN HDFS [tpcds1000gb.item i] | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | |--07:EXCHANGE [BROADCAST] | | | 01:SCAN HDFS [tpcds1000gb.date_dim d] | | 00:SCAN HDFS [tpcds1000gb.store_sales ss] | +---------------------------------------------------------------------+ set explain_level=0; set num_nodes=0;
  • 23. Query Compilation in Impala Explain Example: TPCDS Q42 | … | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: ss.ss_sold_date_sk = d.d_date_sk | | | hosts=10 per-host-mem=511B | | | tuple-ids=0,1 row-size=40B cardinality=8251124389 | | | | | |--07:EXCHANGE [BROADCAST] | | | | hosts=3 per-host-mem=0B | | | | tuple-ids=1 row-size=16B cardinality=29 | | | | | | | 01:SCAN HDFS [tpcds1000gb.date_dim d, PARTITION=RANDOM] | | | partitions=1/1 size=9.77MB | | | predicates: d.d_moy = 12, d.d_year = 1998 | | | table stats: 73049 rows total | | | column stats: all | | | hosts=3 per-host-mem=48.00MB | | | tuple-ids=1 row-size=16B cardinality=29 | | | | | 00:SCAN HDFS [tpcds1000gb.store_sales ss, PARTITION=RANDOM] | | partitions=1823/1823 size=1.10TB | | table stats: 8251124389 rows total | | column stats: all | | hosts=10 per-host-mem=3.75GB | | tuple-ids=0 row-size=24B cardinality=8251124389 | +--------------------------------------------------------------+ set explain_level=2; set num_nodes=0;
  • 24. Query Compilation in Impala Conclusion • Cost-based choice of join order and strategy • Critical for performance • Relies on table and column stats • Other plan optimizations currently independent of stats • Likely to expand plan choices in the future • Likely to increase reliance on stats • Helpful Impala commands • compute stats • show table/column stats • explain query/insert stmt • set explain_level=[0-3] • set num_nodes=0  show single-node plan
  • 25. Query Compilation in Impala Try It Out! •Questions/comments? • Download: cloudera.com/impala • Email: impala-user@cloudera.org • Join: groups.cloudera.org