6. Three main questions you should ask when
looking for an efficient execution plan:
How much data? How many rows / volume?
How scattered / clustered is the data?
Caching?
=> Know your data!
7. Why are these questions so important?
Two main strategies:
One “Big Job”
=> How much data, volume?
Few/many “Small Jobs”
=> How many times / rows?
=> Effort per iteration? Clustering / Caching
8. Optimizer’s cost estimate is based on:
How much data? How many rows / volume?
How scattered / clustered is the data? (partially)
Caching? Not at all
16. Impact is not limited to a “single table”
Influences the favored access path
(Full Table Scan, Index Access etc.)
Influences the join method and join order
(NESTED LOOP, HASH, MERGE)
=> An incorrect single table cardinality
potentially screws up the whole execution plan!
17. Oracle joins exactly two row sources at a time
If more than two row sources need to be joined,
multiple join operations are required
Many different join orders possible (factorial!)
23. Even for the most common form of a join
- the Equi-Join –
there are several challenges
Non-uniform join column value distribution
Partially overlapping join columns
Correlated column values
Expressions
Complex join expressions (multiple AND, OR)
25. Influences the join method and join order
(NESTED LOOP, HASH, MERGE)
=> An incorrect join cardinality/selectivity
potentially screws up the whole execution plan!
26. Don’t use ANALYZE … COMPUTE /
ESTIMATE STATISTICS anymore
Basic table statistics - Blocks, Rows, Avg Row Len
Nothing to configure there, always generated
Basic column statistics - Low / High Value,
Num Distinct, Num Nulls
=> Controlled via the METHOD_OPT option of DBMS_STATS
27. Controlling column statistics via METHOD_OPT
If you see FOR ALL INDEXED COLUMNS:
Question it! Only applicable if the author really knows
what he/she is doing! => Without basic column
statistics the Optimizer resorts to hard coded defaults!
Default in pre-10g releases:
FOR ALL COLUMNS SIZE 1: Basic column statistics
for all columns, no histograms
Default from 10g on:
FOR ALL COLUMNS SIZE AUTO: Basic column
statistics for all columns, histograms if Oracle
determines so
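For illustration, here is a minimal DBMS_STATS sketch of the two METHOD_OPT settings mentioned above; the table name T1 and the use of the current schema are placeholders, not part of the original demo:

    BEGIN
      -- Basic column statistics for all columns, no histograms
      -- (the pre-10g default METHOD_OPT):
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname    => USER,
        tabname    => 'T1',
        method_opt => 'FOR ALL COLUMNS SIZE 1'
      );

      -- Basic column statistics for all columns, histograms where Oracle
      -- determines they are useful (the 10g+ default); AUTO_SAMPLE_SIZE
      -- is also what enables the new 12c histogram types (see slide 30):
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => USER,
        tabname          => 'T1',
        method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
      );
    END;
    /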
28. Basic column statistics get generated along
with the base table statistics in a single pass (almost)
Each histogram requires a separate pass
Therefore Oracle resorts to sampling
if histograms get generated => small sample size
This limits the quality of histograms and their
significance – in particular HEIGHT
BALANCED histograms are a potential threat
29. Limited resolution of 254 value pairs maximum
Less than 255 distinct column values => FREQUENCY histogram
More than 254 distinct column values => HEIGHT BALANCED histogram
A HEIGHT BALANCED histogram is always a sample of data,
even when computing statistics!
30. Basic column statistics get generated along with
the base table statistics in a single pass (almost)
FREQUENCY / TOP FREQUENCY histograms can be
generated along this single pass with high quality
New HYBRID histograms still require a separate pass
Hybrid histograms do a much better job than Height
Balanced, not perfect though
Caution: You only get the new histogram types with
DBMS_STATS.AUTO_SAMPLE_SIZE
31. Default max resolution still 254 value pairs
maximum, but can be increased to 2048 max
Less than 255 distinct column values => FREQUENCY histogram
More than 254 distinct column values => TOP FREQUENCY / HYBRID histogram
Again: Watch out for old scripts /
implementations using non-default
ESTIMATE_PERCENT
32. Oracle generates a FREQUENCY histogram if
a column gets used as a filter and it has
less than 255 distinct values (up to 2048 in 12c if
explicitly requested via parameter / prefs)
Changes in behaviour of FREQUENCY histograms
introduced in 10.2.0.4 / 11g
Beware of the (no longer) new “value not found in
Frequency Histogram” behaviour
Beware of the edge case of very popular /
unpopular values
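To check which histogram type (if any) Oracle actually generated for a column, the data dictionary can be queried; a small sketch using the T / SKEWED_NUMBER example from the next slide:

    -- Histogram type and number of buckets per column
    SELECT column_name, num_distinct, num_buckets, histogram
      FROM user_tab_col_statistics
     WHERE table_name = 'T';

    -- The individual histogram endpoints
    SELECT endpoint_number, endpoint_value
      FROM user_tab_histograms
     WHERE table_name = 'T'
       AND column_name = 'SKEWED_NUMBER'
     ORDER BY endpoint_number;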
33. [Chart: SELECT SKEWED_NUMBER FROM T ORDER BY SKEWED_NUMBER –
10,000,000 rows, 100 popular values with 70,000 occurrences each;
250 histogram buckets each covering 40,000 rows (compute)
vs. approx. 22/23 rows (sampling)]
37. Still pretty common: Not using the appropriate
data type for storing dates or numbers
Frequently encountered: Storing dates in numbers
or strings, or numbers in strings
Several sanity checks were added to the optimizer
over time, like detecting numbers stored in strings
under certain circumstances
Yet there are still many problems with that
approach and quite often bad cardinality estimates
as a result
39. Histograms are still, even with the
latest version of Oracle, one of the key factors
for efficient execution plans generated by the
optimizer
Oracle does not know the questions you ask
about the data
Pre 12c: You may want to use FOR ALL
COLUMNS SIZE 1 as default and only
generate histograms where really needed
12c: You may get better results with the new
histogram types, but not always
40. There are data patterns that don’t work well
with histograms when generated via Oracle
=> Pre 12c: You may need to manually
generate histograms using
DBMS_STATS.SET_COLUMN_STATS for
critical columns
Don’t forget about Dynamic Sampling /
Function Based Indexes / Virtual Columns /
Extended Statistics
Know your data and your queries
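Manually crafting a histogram via DBMS_STATS.SET_COLUMN_STATS could look like the following minimal sketch; the column FK of table T and the three value/count pairs are made-up placeholders for illustration:

    DECLARE
      srec    DBMS_STATS.STATREC;
      numvals DBMS_STATS.NUMARRAY;
    BEGIN
      -- Distinct column values in ascending order ...
      numvals := DBMS_STATS.NUMARRAY(1, 2, 3);
      -- ... and the corresponding row counts per value (frequency histogram)
      srec.epc    := 3;
      srec.bkvals := DBMS_STATS.NUMARRAY(900000, 50000, 50000);
      DBMS_STATS.PREPARE_COLUMN_VALUES(srec, numvals);

      DBMS_STATS.SET_COLUMN_STATS(
        ownname => USER,
        tabname => 'T',
        colname => 'FK',
        distcnt => 3,
        nullcnt => 0,
        srec    => srec,
        avgclen => 4
      );
    END;
    /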
41. Cardinality and selectivity estimates determine whether
the “Big Job” or “Small Job” strategy should be
preferred
If the optimizer gets these estimates right, the
resulting execution plans will be within
the best possible plans of the given access paths
Co-author of “Expert Oracle Practices”
(Ex-) Oracle ACE Director: Acknowledged by Oracle for community contributions, part of the Oracle ACE program from 2009-2016
OakTable Network: Informal and independent group of people believing in a scientific approach towards Oracle
http://oracle-randolf.blogspot.com
https://www.youtube.com/c/RandolfGeist
If you were to find an efficient execution plan for a query, the main questions you would have to ask are these.
I’ll explain the meaning of each of these questions in the course of this presentation, but the main point here is that you need to know your data in order to answer these questions.
In principle the optimizer has to decide between two choices:
A “big job” approach that tries to process data all at once
A “small job” approach where a subset of the data is processed iteratively, usually involving a pre-ordered or pre-clustered data set like an index that allows efficient access to a subset of the data
Note that in reality it is not that black & white, so an execution plan can consist of a mixture of both, but the point is still a very valid one.
Except for extreme cases, like picking a single row out of ten million, there is a huge “grey zone” where both approaches might be applicable.
So in order to decide between the two properly, the efficiency of the two alternatives needs to be determined: for the “small job” approach this is in particular how often it needs to be run and how much work each iteration requires.
(
In Data Warehouse appliances there is a trend to simplify things here by taking this decision out of the equation. For example there are some databases and appliances specifically designed for Data Warehouse workloads. They assume the majority of queries needs the “big job” approach anyway and therefore don’t support the small job approach, which means that some of them don’t even support indexes. They are optimized for running the “big job” approach as fast as possible, usually based on a shared-nothing, massively parallel design.
Note that even with such a simplification these systems can still suffer from poor execution plans, but it makes the life of the optimizer much easier. You can’t run OLTP-like workloads on such systems, though, because the majority of OLTP can only run efficiently when using the “small job” approach.
Objective: Retrieve results with minimum effort => Efficient join order
)
It is probably not surprising that the cost estimate of Oracle’s optimizer is based on asking the same questions.
There are two significant differences in how the optimizer approaches these questions. The first one is mentioned here, the second one will be explained later.
It might however be surprising that it doesn’t cover all the questions to the same extent. For example the optimizer at present only considers the clustering of data partially and doesn’t take caching into account at all.
This means that sometimes the optimizer might not be aware of important aspects of the application and data simply because the model at present doesn’t cover these aspects.
So what is exactly meant by the question “How many rows?”
This topic can be broadly divided into these main categories.
Cardinality and Selectivity are two common terms in this context.
Cardinality simply means “number of rows”, and selectivity refers to the filter ratio, or basically “how many rows will remain after filtering or joining?”
Single Table Cardinality: This is actually pretty straightforward. You need to consider all the filters that apply to each table only, ignoring any join predicates.
So these are the “Single Table” predicates for table T1 in this example
And likewise the same for T2.
The question then is: When looking at each table on its own (for example T1), how many rows are there in total and how many will remain after applying the filters relevant to this particular table, or in other words how many rows pass the filter predicates?
In fact you can verify this by running a simple SELECT COUNT(*) FROM T1 WHERE ATTR1 = 1 AND ID > 0, so Single Table Cardinalities can be checked in an obvious way, as sketched below.
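A minimal sketch of comparing the optimizer’s Single Table Cardinality estimate against reality, assuming the T1 example above:

    -- Actual number of rows passing the single table filter predicates
    SELECT COUNT(*) FROM T1 WHERE ATTR1 = 1 AND ID > 0;

    -- Optimizer estimate for the same predicates: compare the Rows column
    -- of the T1 line in the plan with the COUNT(*) result above
    EXPLAIN PLAN FOR
    SELECT * FROM T1 WHERE ATTR1 = 1 AND ID > 0;

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);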
Although this might sound trivial, there are numerous challenges why the optimizer might have a hard time coming up with a reasonable estimate.
It is important to understand that what seems to be obvious to you is not necessarily obvious to the optimizer.
Here’s the story: Suppose this simple two-table join represents an important senior management report that is not run very often, for example an end-of-month report. Of course in reality this query would be more complex, but this is sufficient to make the point.
The data set is slightly nasty since it models two optimizer challenges:
Attribute1 and 2 columns’ data is highly skewed
and furthermore the data of the two columns ATTR1 and ATTR2 is 100% correlated, which means that filtering on one of them is the same as filtering on both. This violates the “independence” assumption of the optimizer.
Let’s assume further Oracle determined automatically by means of the “Column Usage” monitoring and the “SIZE AUTO” feature that these two columns require
histograms so that the optimizer is able to detect the data skew.
The report is run several times with different inputs, and we can see that depending on the inputs the optimizer has correctly detected the skew and opted for either the “Big Job” or the “Small Job” approach – luckily the developers either accidentally or deliberately used literals instead of bind variables, which is a good thing in this case.
So far everything goes well and no-one is complaining about the performance.
Note that there is a potential flaw in the current implementation: Picking up ten rows from table T1 by using a full table scan could very likely be made more efficient by using an index, but since no-one is complaining about performance and the report is only run a couple of times at month end, no-one bothers to address this.
Now a request for change is raised where the developer works out that a function needs to be applied to both ATTR1 and ATTR2 because different kind of data is expected to be processed in the future.
Since there is no index on the columns the developer thinks that applying these functions looks like an innocent and simple change that cannot change much, since only a full table scan is available anyway. He even confirms this and sees that the data distribution for the current data is exactly the same as before and therefore happily brings the change into production.
Guess what happens next time the report is run for the large data set?
Regarding troubleshooting of long-running queries: If you happen to have 11g and the Tuning Pack license, you can use Real-Time SQL Monitoring on this session.
What has happened: First of all, the developer was right: The access path to T1 hasn’t changed, it is still a Full Table Scan. Due to the functions applied the optimizer however has lost track of the data skew and estimated that the Full Table Scan will return only 100 rows instead of the actual 900,000.
Therefore, based on this assumption, the optimizer preferred the “Small Job” approach, which turns out to be very inefficient under these circumstances. It would be pretty efficient if the assumption was correct, however.
So a wrong Single Table Cardinality estimate ruined the whole execution plan. And it doesn’t take much imagination to picture a more complex execution plan where many more of these decisions have to be taken by the optimizer.
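Looking ahead to the features mentioned later (Virtual Columns / Extended Statistics), one possible remedy in such a case is to expose the expression to the optimizer so that a histogram can be gathered on it. A hedged sketch, assuming a hypothetical deterministic function MY_FUNC applied to ATTR1 (these names are placeholders, not from the original demo):

    -- Expose the expression as a virtual column
    -- (a user-defined function must be declared DETERMINISTIC for this)
    ALTER TABLE T1 ADD (ATTR1_FN AS (MY_FUNC(ATTR1)));

    -- Gather statistics including a histogram on the virtual column, so the
    -- optimizer can again see the skew behind MY_FUNC(ATTR1) = <literal>
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname    => USER,
        tabname    => 'T1',
        method_opt => 'FOR COLUMNS ATTR1_FN SIZE 254'
      );
    END;
    /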
The problem is that the optimizer by default is trying to perform the optimization as fast as possible. Therefore it doesn’t look at the actual data but uses the statistics that have been pre-generated.
But this means that the optimizer can only see what is available from these statistics.
If the data is not represented well by the statistics, or even more specifically, if the questions asked by the queries are not represented well, the optimizer’s assumptions can be way off.
Here is an important point: It doesn’t matter how fresh your statistics are (although stale statistics can be a problem); if the statistics aren’t representative of what is asked about the data then the optimizer will be easily misled.
Therefore you not only have to ask yourself if your statistics are up-to-date, but whether the statistics are representative for your data and queries or not.
Usually, if the assumptions of the optimizer were true, the resulting execution plans would perform reasonably. The potential inefficiency is caused by the wrong assumptions.
There is a wonderful optimizer feature called “Dynamic Sampling” where the optimizer actually looks at a small sample set of the data. Of course this adds overhead to the optimization phase so it is not always applicable, and depending on the data distribution it might not always be helpful, but sometimes it can be very handy, allowing the optimizer to arrive at a reasonable Single Table Cardinality estimate.
I’ve published a series on AllThingsOracle.com where I highlight different aspects of Dynamic Sampling.
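Dynamic Sampling can be requested per statement via a hint (or instance-wide via the OPTIMIZER_DYNAMIC_SAMPLING parameter); a small sketch using the correlated columns from the earlier example:

    -- Level 4 additionally samples tables with predicates on two or more
    -- columns, so the optimizer can check the combined effect of the
    -- correlated ATTR1 / ATTR2 filters instead of multiplying their
    -- individual selectivities
    SELECT /*+ DYNAMIC_SAMPLING(t1 4) */ COUNT(*)
      FROM t1
     WHERE attr1 = 1
       AND attr2 = 1;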
Note also that the SQL Tuning Advisor (available with Enterprise Edition + Diagnostic + Tuning Pack license) actually runs a given statement partially to determine exactly these cardinalities – it looks at the actual data. But then the runtime of that so-called “Offline Optimization” is measured in minutes, not fractions of a second.
So here is the main takeaway for the Single Table Cardinality:
It looks trivial and can be verified in a simple way, but the optimizer by default uses the available statistics to come up with an estimate that might be way off if the statistics are not representative
A wrong Single Table Cardinality estimate can ruin the whole execution plan – it potentially echoes through the whole plan
There is literally an arsenal of features at your and the database’s disposal that can be used to help the optimizer arrive at an improved Single Table Cardinality estimate
At least for your critical queries you should verify that these estimates are in the right ballpark
By doing so you probably eliminate the most common source for bad decisions of the optimizer
Joins are a different story for several reasons.
The first important point to understand is that Oracle joins at most two row sources at a time. If you need to join more tables, multiple join operations are required.
This also means that there is potentially a huge number of possible join orders for these tables. This alone makes finding a good plan a non-trivial task.
An execution plan for a simple three-table join therefore could look like the sketch below. A join will either join actual tables or the result of other join operations.
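Since the original plan graphic is not reproduced here, the following is a generic sketch of such a plan shape (table names and join methods are placeholders): two join operations are needed for three row sources, and the second join consumes the result of the first.

    -----------------------------------------
    | Id | Operation            | Name      |
    -----------------------------------------
    |  0 | SELECT STATEMENT     |           |
    |  1 |  HASH JOIN           |           |  -- joins the result of Id 2 with T3
    |  2 |   HASH JOIN          |           |  -- joins T1 with T2
    |  3 |    TABLE ACCESS FULL | T1        |
    |  4 |    TABLE ACCESS FULL | T2        |
    |  5 |   TABLE ACCESS FULL  | T3        |
    -----------------------------------------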
There are actually two challenges when it comes to joins:
The first one is that the optimizer needs to work out how selective a join is, meaning how many rows will remain after the join. The problem here is that a join between two row sources can result in anything between zero rows and a Cartesian product. This is an impressive range and again makes this a non-trivial task.
So taking two row sources that return 1,000 rows each I can end up with anything between 0 and 1,000,000 rows.
The second problem of the join is that its final join cardinality – the resulting number of rows – depends ultimately on the Single Table Cardinality estimates of the tables joined. This is the explanation why a wrong Single Table Cardinality estimate echoes through the whole execution plan.
So even if the optimizer gets the join selectivity estimate right, the resulting cardinality might be way off if the Single Table Cardinality of the tables joined is wrong.
This is exactly what happened in my previous demonstration. If my demo included further joins these operations would have been affected by the wrong join cardinality estimate caused by the incorrect Single Table Cardinality estimate.
Again, similar to the Single Table Cardinality there are numerous challenges and reasons why the optimizer might get the join selectivity wrong. In particular non-equi joins and anti-joins can be hard to predict.
And even for the most common form of a join – the Equi-Join – there are lots of scenarios that can mislead the optimizer.
Again a simple two table join. Note that the foreign key in T is highly skewed: The majority of rows are mapped to the same entry in the parent table D. There are proper primary and foreign key constraint enabled and validated on these tables.
When looking at the actual data distribution it is obvious to us that filtering on that single row in D to which most rows in T are mapped and then joining to T will return the majority of rows from T; likewise, when filtering on any other row in D, only a few rows will be returned from the join.
But this is not obvious to the optimizer. It only sees that there are 1,000 different values in D and therefore filtering on one will map on average to 1,000 rows of T.
Now if you watched closely you might argue that I have forgotten to tell the optimizer about the skew in the column FK, and you would be right.
So let’s add a histogram on the FK column. As you can see the optimizer is now aware of the skew in the column values.
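Adding that histogram could look like this minimal sketch (assuming the FK column of table T from the demo and the default 254-bucket maximum):

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname    => USER,
        tabname    => 'T',
        method_opt => 'FOR COLUMNS FK SIZE 254'
      );
    END;
    /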
But in fact the histogram doesn’t change the outcome of the join. You might ask: Why?
Simply because the optimizer doesn’t have a clue that filtering on DESC1 corresponds to the Primary Key of 1, and therefore when generating the execution plan the optimizer cannot see the massive skew in the join.
So again your mantra should be: What seems to be obvious to you isn’t necessarily obvious to the optimizer.
So the main takeaways for the Join Cardinality are these:
Checking join selectivities is a non-trivial task because there can be potentially a huge number of join combinations that need to be evaluated. If you wanted to do similar verification as with the Single Table Cardinality, you would have to evaluate either all of them or none.
This is the reason why many of the features that can be helpful for improving Single Table Cardinality estimates (for example, Dynamic Sampling or Cardinality Feedback, etc.) are not really useful for improving join selectivities
Join Cardinalities are affected by Single Table Cardinality estimates. This is the reason why wrong Single Table Cardinalities do have a global impact on an execution plan
Correcting wrong join selectivity estimates can be a tricky thing
And remember the mantra: Although something might look obvious to you it is not necessarily obvious to the optimizer.