Query transformation in Oracle can be heuristic or cost-based. This 2006 paper presents a cost-based transformation framework that combines logical transformation with physical optimization to find an optimal execution plan, along with efficient algorithms for enumerating the framework's search space.
2. Problem to Solve
Traditional query optimization (as of 2006) consists of two phases:
● logical phase: the query is rewritten based on heuristic rules.
● physical phase: the best implementations are chosen based on cost estimation.
Some decisions that the logical phase makes heuristically should instead be made in a cost-based manner.
For example:
● Join Reorder
● Group-By Placement
● Subquery Unnesting
● ...
3. Problem to Solve
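A Subquery Unnesting Example
[Figure: two query trees. Before unnesting: an Apply over a Join of e1 and j and a SubQuery. After unnesting: a Join of e1, j, and an Agg view.]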
4. Overview
● Transformation in Oracle
○ Heuristic Transformations
○ Cost-Based Transformations
● Cost-Based Transformation Framework
○ Overview
○ State Space Search Techniques
○ Interaction between Transformations
○ Optimization Performance
● Performance Study
5. Heuristic Transformation in Oracle
Subquery Unnesting
Two categories of subquery unnesting:
● unnesting that generates inline views.
● unnesting that merges a subquery into its outer query.
Note that dept_id in employees is a foreign key that references the primary key of departments.
[Figure: two query trees. An Apply over d and e is rewritten as a Join of d and e.]
6. Heuristic Transformation in Oracle
Join Elimination
Remove a table from a query when constraints on the join columns guarantee that the join does not change the result.
Note that dept_id in employees is a foreign key that references the primary key of departments.
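The effect of join elimination can be checked on a toy database. The sketch below uses Python's sqlite3 module only to compare the results of the two query forms; the schema and rows are assumptions made for illustration, and SQLite is not claimed to perform this rewrite itself.

```python
import sqlite3

# Hypothetical schema: employees.dept_id is a foreign key to departments.dept_id.
# SQLite does not enforce foreign keys by default; the sample rows respect the constraint.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'R&D');
    INSERT INTO employees VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', NULL);
""")

# Original form: departments is joined but none of its columns are selected.
with_join = """
    SELECT e.name FROM employees e
    JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
"""
# After join elimination: the foreign key guarantees a match for every
# non-NULL dept_id, so only the NULL check remains.
without_join = """
    SELECT e.name FROM employees e
    WHERE e.dept_id IS NOT NULL
    ORDER BY e.emp_id
"""

rows_with_join = conn.execute(with_join).fetchall()
rows_without_join = conn.execute(without_join).fetchall()
print(rows_with_join == rows_without_join)
```

The IS NOT NULL filter is what keeps the rewrite correct for employees without a department.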
8. Cost-Based Transformation in Oracle
Subquery Unnesting
Two categories of subquery unnesting:
● unnesting that generates inline views.
● unnesting that merges a subquery into its outer query.
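[Figure: two query trees. Before unnesting: an Apply over a Join of e1 and j and a SubQuery. After unnesting: a Join of e1, j, and an Agg view.]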
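As a rough illustration of unnesting that generates an inline view, the sketch below compares a correlated aggregate subquery with its unnested form on a toy table. The schema, data, and the view alias v are assumptions for illustration; sqlite3 is used only to confirm that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,
                            dept_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        (1, 'Ann', 10, 100), (2, 'Bob', 10, 200),
        (3, 'Cy', 20, 300), (4, 'Di', 20, 500), (5, 'Ed', 20, 400);
""")

# Nested form: the subquery is evaluated per outer row (Apply).
nested = """
    SELECT e1.name FROM employees e1
    WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2
                       WHERE e2.dept_id = e1.dept_id)
    ORDER BY e1.emp_id
"""
# Unnested form: the subquery becomes a group-by inline view joined to e1.
unnested = """
    SELECT e1.name FROM employees e1
    JOIN (SELECT dept_id, AVG(salary) AS avg_sal
          FROM employees GROUP BY dept_id) v
      ON e1.dept_id = v.dept_id
    WHERE e1.salary > v.avg_sal
    ORDER BY e1.emp_id
"""

rows_nested = conn.execute(nested).fetchall()
rows_unnested = conn.execute(unnested).fetchall()
print(rows_nested == rows_unnested)
```

Which form is cheaper depends on the data and available access paths, which is why this kind of unnesting is decided by cost rather than by rule.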
9. Cost-Based Transformation in Oracle
Group-By and Distinct View Merging
Merge a view that contains GROUP BY or DISTINCT into its outer query block.
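[Figure: two query trees. Before merging: a Join of e1, j, and an Agg view over e2. After merging: a single Agg over the Join of e1, j, and e2.]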
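A minimal sketch of group-by view merging, assuming a toy employees table (schema, data, and aliases are hypothetical). The view form aggregates first and joins afterwards; the merged form joins first and aggregates in the outer block. sqlite3 is used only to check that the two forms agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT,
                            dept_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        (1, 'Ann', 10, 100), (2, 'Bob', 10, 200),
        (3, 'Cy', 20, 300), (4, 'Di', 20, 500), (5, 'Ed', 20, 400);
""")

# View form: aggregate inside the view, then join.
with_view = """
    SELECT e1.name FROM employees e1
    JOIN (SELECT dept_id, AVG(salary) AS avg_sal
          FROM employees GROUP BY dept_id) v
      ON e1.dept_id = v.dept_id
    WHERE e1.salary > v.avg_sal
    ORDER BY e1.emp_id
"""
# Merged form: join first, then group by the outer row's key and
# move the comparison into HAVING.
merged = """
    SELECT e1.name FROM employees e1
    JOIN employees e2 ON e1.dept_id = e2.dept_id
    GROUP BY e1.emp_id, e1.name, e1.salary
    HAVING e1.salary > AVG(e2.salary)
    ORDER BY e1.emp_id
"""

rows_with_view = conn.execute(with_view).fetchall()
rows_merged = conn.execute(merged).fetchall()
print(rows_with_view == rows_merged)
```

Merging can enable more join orders but delays the aggregation until after the join, so neither form is always cheaper.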
10. Cost-Based Transformation in Oracle
Join Predicate Pushdown
Push join predicates into a view.
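[Figure: two query trees. Before pushdown: a Join involving e1, j, and an Agg view. After pushdown: a NestJoin/Apply involving e1, j, and l.]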
11. Cost-Based Transformation in Oracle
Join Factorization
Pull common join tables up to the outer UNION ALL query block.
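[Figure: two query trees. Before factorization: a Union of two Join branches over the tables d, e, j, and l. After factorization: the common join table is pulled above the Union and joined with it once.]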
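The rewrite can be sketched on a toy schema in which both UNION ALL branches join the same table. The tables employees, contractors, and locations are assumptions made for this illustration, not the tables from the slide's figure; sqlite3 is used only to compare results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (loc_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (name TEXT, loc_id INTEGER, salary REAL);
    CREATE TABLE contractors (name TEXT, loc_id INTEGER);
    INSERT INTO locations VALUES (1, 'Austin'), (2, 'Boston');
    INSERT INTO employees VALUES ('Ann', 1, 300), ('Bob', 2, 200);
    INSERT INTO contractors VALUES ('Cy', 2), ('Di', 1);
""")

# Original form: locations is joined once in each UNION ALL branch.
original = """
    SELECT e.name, l.city FROM employees e
    JOIN locations l ON e.loc_id = l.loc_id WHERE e.salary > 250
    UNION ALL
    SELECT c.name, l.city FROM contractors c
    JOIN locations l ON c.loc_id = l.loc_id
"""
# Factorized form: the common table is joined once, above the UNION ALL view.
factorized = """
    SELECT v.name, l.city
    FROM (SELECT name, loc_id FROM employees WHERE salary > 250
          UNION ALL
          SELECT name, loc_id FROM contractors) v
    JOIN locations l ON v.loc_id = l.loc_id
"""

# Sort in Python because UNION ALL does not guarantee row order.
rows_original = sorted(conn.execute(original).fetchall())
rows_factorized = sorted(conn.execute(factorized).fetchall())
print(rows_original == rows_factorized)
```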
12. Cost-Based Transformation in Oracle
Expensive Predicate Pullup
Pull expensive predicates up from the originating view to outer query block.
A predicate is considered expensive if it contains
● procedural language,
● user-defined operators,
● subqueries.
This transformation is only considered when a rownum (limit) predicate is specified.
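Why the limit matters can be seen in a small Python analogy (not Oracle code). The function expensive is a hypothetical stand-in for a costly predicate, and the view is modelled as a sort over a list of integers. When the predicate is pulled above the view, it is evaluated lazily and stops as soon as k rows qualify.

```python
from itertools import islice

calls = 0

def expensive(row):
    """Hypothetical stand-in for an expensive predicate; counts its invocations."""
    global calls
    calls += 1
    return row % 3 == 0

rows = list(range(100, 0, -1))  # unsorted input: 100, 99, ..., 1
k = 5                           # the rownum (limit) value

# Without pullup: the predicate runs inside the view on every row,
# then the view is sorted and the outer block keeps the first k rows.
calls = 0
inside_view = sorted(r for r in rows if expensive(r))[:k]
calls_inside = calls

# With pullup: the view only sorts; the predicate is applied in the outer
# block and evaluation stops once k rows have passed.
calls = 0
pulled_up = list(islice((r for r in sorted(rows) if expensive(r)), k))
calls_pulled = calls

print(inside_view == pulled_up, calls_inside, calls_pulled)
```

Both variants return the same rows, but the pulled-up variant calls the predicate far fewer times. Without a limit, every row would have to be tested either way, which is consistent with the transformation being considered only when a rownum predicate is present.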
13. Cost-Based Transformation Framework
The PhysicalOptimization component is used to:
● estimate query tree cost,
● generate the final physical execution plan.
The order in which transformations are applied matters, so they are applied in the following sequence:
common sub-expression factorization, SPJ view merging, join elimination, subquery unnesting, group-by (distinct) view merging, group pruning, predicate move around, set operator into join conversion, group-by placement, predicate pullup, join factorization, disjunction into union-all expansion, star transformation, and join predicate pushdown.
14. State Space Search Techniques
Definition of state in search space
Suppose a query consists of N objects (e.g., tables, join edges, predicates) and there are M transformations that can each apply to any of the N objects.
A state is then represented by an M*N bit matrix, and there are 2^(M*N) states in total.
If there is only one transformation, expensive predicate pullup, then a query with two candidate predicates has four states:
● [[0, 0]],
● [[0, 1]],
● [[1, 0]],
● [[1, 1]].
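The state space can be enumerated mechanically. This is a minimal sketch; the function name enumerate_states is made up for illustration, and each state is an M x N bit matrix stored as a tuple of rows.

```python
from itertools import product

def enumerate_states(m, n):
    """Yield every M x N bit matrix as a tuple of M row tuples."""
    for bits in product((0, 1), repeat=m * n):
        # Slice the flat bit vector into M rows of N bits each.
        yield tuple(tuple(bits[r * n:(r + 1) * n]) for r in range(m))

# One transformation (M = 1) over two objects (N = 2): the four states above.
states = list(enumerate_states(1, 2))
for state in states:
    print([list(row) for row in state])
```

With M = 2 and N = 3 the same function yields 2^(2*3) = 64 states, which shows how quickly exhaustive enumeration grows.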
15. State Space Search Techniques
How to search in state space
Four different techniques (considering only one transformation):
● Exhaustive: all 2^N states for N objects are considered.
● Two-pass: only 2 states are considered, [[0, 0, ...0]] and [[1, 1, ...1]].
● Linear: a dynamic programming approach that assumes different objects are independent; N+1 states are considered.
● Iterative:
a. start from an initial state and look for a local minimum state.
b. choose a different initial state and repeat step a until
■ no more new states can be found, or
■ some termination condition has been reached.
c. between N+1 and 2^N states are considered.
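The four techniques can be sketched over a toy cost function. Everything here is an assumption made for illustration: toy_cost stands in for the physical optimizer's cost estimate, states are bit tuples for a single transformation, and the iterative search is written as simple first-improvement hill climbing over single-bit flips.

```python
from itertools import product

N = 3

def toy_cost(state):
    """Hypothetical cost: objects 0 and 1 only pay off when transformed together."""
    cost = 100 + 5 * state[0] + 5 * state[1] - 3 * state[2]
    if state[0] and state[1]:
        cost -= 20
    return cost

def flip(state, i):
    return state[:i] + (1 - state[i],) + state[i + 1:]

def exhaustive(cost, n):
    return min(product((0, 1), repeat=n), key=cost)

def two_pass(cost, n):
    return min([(0,) * n, (1,) * n], key=cost)

def linear(cost, n):
    # Decide each object once, keeping the transformation only if it helps.
    best = (0,) * n
    best_cost = cost(best)
    for i in range(n):
        candidate = flip(best, i)
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best

def iterative(cost, n, starts):
    best, best_cost = None, float("inf")
    for start in starts:
        cur, cur_cost = start, cost(start)
        improved = True
        while improved:  # walk downhill until a local minimum is reached
            improved = False
            for i in range(n):
                nb = flip(cur, i)
                c = cost(nb)
                if c < cur_cost:
                    cur, cur_cost, improved = nb, c, True
                    break
        if cur_cost < best_cost:
            best, best_cost = cur, cur_cost
    return best

def run(strategy, *args):
    """Return the chosen state and the number of distinct states costed."""
    seen = set()
    def cost(state):
        seen.add(state)
        return toy_cost(state)
    return strategy(cost, N, *args), len(seen)

print(run(exhaustive))
print(run(two_pass))
print(run(linear))
print(run(iterative, [(0, 0, 0), (1, 1, 1)]))
```

In this toy setup the linear search costs only N+1 states but settles on a worse state than the exhaustive search, because the cost function deliberately violates the independence assumption.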
16. Interaction between Transformations
Interleaving
When two (or more) cost-based transformations apply to the same object such that one transformation becomes applicable only after the other has been applied, these transformations must be interleaved in order for the optimizer to determine the optimal plan.
We begin at S0. Applying T1 leads to S1, and applying T2 and then T3 leads to S2 and then S3.
If we don’t interleave T3 after T2, we can’t reach S3 or the best state Sfinal.
17. Interaction between Transformations
Juxtaposition
When two or more cost-based transformations apply to the same object in a way that precludes their sequential application, they must be applied one by one, each as an alternative to the others, in order for the optimizer to determine the optimal plan. This comparison of two or more cost-based transformations is called juxtaposition.
We begin at S0. We can apply T1 to reach S1 or T2 to reach S2, but T1 precedes T2 in the sequential order, and once we are at S1 we probably can no longer reach S2.
If we don’t also consider T2 when we apply T1, we can’t reach S2 or the best state Sfinal.
18. Interaction between Transformations
Juxtaposition
An example: view merging and join predicate push down must be juxtaposed with each other.
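[Figure: three query trees. Starting from a Join of e1, j, and an Agg view over e2, view merging yields an Agg over the Join of e1, j, and e2, whereas JPPD yields a NestJoin/Apply of e1 and j with the Agg view.]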
19. Optimization Performance
Reuse of Query Sub-Tree Cost Annotations
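Suppose a query has two subqueries, Qs1 and Qs2, and a transformation T that can be applied to each subquery.
Then there are four states, one for each combination of transformed and untransformed subqueries.
The cost information of Qs1, Qs2, T(Qs1), and T(Qs2) can be reused across these states.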
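The saving can be sketched with memoization. The cost numbers, the subtree_cost function, and the assumption that a state's cost is the sum of its subquery costs are all simplifications made for illustration; a real optimizer also costs the outer query block.

```python
from functools import lru_cache
from itertools import product

optimizer_calls = 0

@lru_cache(maxsize=None)
def subtree_cost(subquery, transformed):
    """Hypothetical physical-optimizer call; cached so each sub-tree is costed once."""
    global optimizer_calls
    optimizer_calls += 1
    table = {("Qs1", False): 40, ("Qs1", True): 25,
             ("Qs2", False): 30, ("Qs2", True): 35}
    return table[(subquery, transformed)]

def state_cost(state):
    # Simplification: total cost is the sum of the two subquery costs.
    return sum(subtree_cost(name, bool(bit))
               for name, bit in zip(("Qs1", "Qs2"), state))

# Cost all four states; bit i says whether T is applied to subquery i.
costs = {state: state_cost(state) for state in product((0, 1), repeat=2)}
best_state = min(costs, key=costs.get)
print(best_state, costs[best_state], optimizer_calls)
```

Four states with two subqueries each would need eight sub-tree costings without reuse; with the cached annotations only four are performed.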
20. Performance Study
Dataset
● 14000 tables representing HR, Financial, Order Entry, CRM, Supply Chain…
● 241000 queries
○ the average number of tables in a query is 8,
○ most of the queries are of simple Sel/Proj/Join type,
○ 8% of these queries have subqueries, GROUP-BY, DISTINCT or UNION ALL.
Result
● 5910 execution plans changed.
● the total run time improved by 20% on average.
● 18% of the affected queries degraded by 40%.
● the top 5% of longest-running queries improved by 27%.
● optimization time increased by only 40%.