Advanced query optimizer
tuning and analysis
Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
  – And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Catching slow queries, the old ways
● Watch the Slow query log
– Percona Server/MariaDB:
--log_slow_verbosity=query_plan
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
• Run SHOW PROCESSLIST periodically
– Run pt-query-digest on the log
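For reference, a minimal sketch of enabling the slow query log with query-plan details (the values are arbitrary examples; log_slow_verbosity exists in Percona Server and MariaDB only):
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;                -- log statements slower than 1 second
SET GLOBAL log_slow_verbosity = 'query_plan';  -- adds the query-plan lines shown above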
The new way: SHOW PROCESSLIST + SHOW EXPLAIN
• Available in MariaDB 10.0+
• Displays EXPLAIN of a running statement
MariaDB> show processlist;
+--+----+---------+-------+-------+----+------------+-------------------------...
|Id|User|Host |db |Command|Time|State |Info
+--+----+---------+-------+-------+----+------------+-------------------------...
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ...
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist
+--+----+---------+-------+-------+----+------------+-------------------------...
MariaDB> show explain for 1;
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where|
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
MariaDB [dbt3sf1]> show warnings;
+-----+----+-----------------------------------------------------------------+
|Level|Code|Message |
+-----+----+-----------------------------------------------------------------+
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
+-----+----+-----------------------------------------------------------------+
SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again?
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at
the same query plan or not.
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● use performance_schema
● Many ways to analyze via queries
– events_statements_summary_by_digest
● count_star, sum_timer_wait,
min_timer_wait, avg_timer_wait, max_timer_wait
● digest_text, digest
● sum_rows_examined, sum_created_tmp_disk_tables,
sum_select_full_join
– events_statements_history
● sql_text, digest_text, digest
● timer_start, timer_end, timer_wait
● rows_examined, created_tmp_disk_tables,
select_full_join
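Before relying on these tables, it is worth checking that the relevant consumers are enabled; a small sketch using the standard performance_schema setup tables:
select name, enabled from performance_schema.setup_consumers
where name like '%statements%';
-- statements_digest feeds events_statements_summary_by_digest,
-- events_statements_history feeds the history table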
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
→ high variance of execution time
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 |
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? is less and less selective
Actions after finding the slow query
• Bad query plan
  – Rewrite the query
  – Force a good query plan (see the sketch below)
• Bad optimizer settings
  – Do tuning
• Query is inherently complex
  – Don't waste time with it
  – Look for other solutions.
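A hedged sketch of what “force a good query plan” can look like (the index name i_o_orderdate is illustrative; FORCE INDEX and STRAIGHT_JOIN are standard MySQL/MariaDB syntax):
select * from orders FORCE INDEX (i_o_orderdate)
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
select STRAIGHT_JOIN *        -- fix the join order: customer first, then orders
from customer, orders
where c_custkey = o_custkey;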
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
● Run the query:
19 rows in set (7.65 sec)
• 15M rows were scanned, 19 rows in output
• The query plan seems inefficient
  – (note: this logic doesn't directly apply to GROUP BY / ORDER BY queries)
Query plan analysis
• The entire table is scanned
• The WHERE condition is checked after records are read
  – It is not used to limit #examined rows
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time: 19 rows in set (0.76 sec)
• Outcome
  – Down to reading 300K rows
  – Still, 300K >> 19 rows.
Finding out which indexes to add
Candidate indexes:
● index (o_orderdate)
● index (o_clerk)
Check the selectivity of the conditions that will use the index:
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
→ 306322 rows
select count(*) from orders where o_clerk='Clerk#000009506';
→ 1507 rows
Try adding composite indexes
● index (o_clerk, o_orderdate) – Bingo! 100% efficiency:
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
● index (o_orderdate, o_clerk) – Much worse!
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
• If a condition uses multiple columns, a composite index will be most efficient
• The order of columns matters
  – The explanation of why is outside the scope of this tutorial; it was covered in last year's tutorial.
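The ALTER TABLE statements behind the two plans above were presumably along these lines (index names taken from the EXPLAIN output):
alter table orders add key i_o_clerk_date (o_clerk, o_orderdate);
alter table orders add key i_o_date_clerk (o_orderdate, o_clerk);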
Conditions must be in SARGable form
• The condition must represent a range
• It must have a form that is recognized by the optimizer
Recognized:
  o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
  o_clerk='Clerk#000009506'
  o_clerk LIKE 'Clerk#000009506'
  column IN (1,10,15,21, ...)
Not recognized (no index range can be built):
  day(o_orderDate)=1992 and month(o_orderdate)=6
  TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06')
  o_clerk LIKE '%Clerk#000009506%'
  (col1, col2) IN ( (1,1), (2,2), (3,3), …)
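A small sketch of rewriting a non-recognized condition into an equivalent SARGable one (same date range as in the earlier examples):
-- Not recognized: a function is applied to the column
select count(*) from orders
where TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06');
-- Recognized: the bare column is compared to constants
select count(*) from orders
where o_orderDATE between '1992-06-06' and '1992-07-06';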
New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace='enabled=on';
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
      o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.
New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
Where does the rows estimate (306322) come from?
• It is the “records_in_range” estimate
• Done by diving into the index
• Usually fairly accurate
• Not affected by ANALYZE TABLE.
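One way to sanity-check the estimate is to compare it with the actual row count (the figure below is from the earlier selectivity check on this data set):
select count(*) from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
-- 306322 rows, matching the rows=306322 estimate in the EXPLAIN above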
Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
Now, we will skip some topics
One can also speed up simple selects with
● the index_merge access method
● the index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
148749 = rows to read from customer
1493631 = rows to read from orders
“Using where” = the c_custkey=o_custkey check
Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
  – Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
  – We know each customer has their own orders
  – size(customer x orders) = size(orders)
  – The lower bound is 1,493,631 + 148,749 + the cost of matching customers with orders.
Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
select * from customer, orders where c_custkey=o_custkey
ref access - analysis
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| |
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
select * from customer, orders where c_custkey=o_custkey
● One ref lookup scans 7 rows.
● In total: 7 * 148,749=1,041,243 rows
– `orders` has 1.4M rows
– no redundant reads from `orders`
● The whole query plan
– Reads all customers
– Reads 1M orders (of 1.4M)
● Efficient!
Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use largest prefix
– keypart1=... AND keypart2= … AND keypartK=... .
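A sketch of the “largest prefix” rule, using a hypothetical two-part index:
alter table orders add key i_o_cust_prio (o_custkey, o_orderpriority);
-- ref access can use both key parts (both conditions are equalities):
select * from customer, orders
where o_custkey = c_custkey and o_orderpriority = '1-URGENT';
-- ref access can use only the o_custkey part here, because the
-- second condition is not an equality:
select * from customer, orders
where o_custkey = c_custkey and o_orderpriority > '1-URGENT';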
Conditions that can't be used for ref access
● Doesn't work for non-equalities
  t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
  t1.key=t2.col1 OR t1.key=t2.col2
  – Except for ref_or_null: t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
  – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
  – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Is ref always efficient?
● Efficient if the column has many different values
  – Best case: a unique index (eq_ref)
● A few different values – not useful
● Skewed distribution: depends on which part of the data the join touches
ref access estimates - index statistics
• How many rows will match
tbl.key_column = $value
for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.
Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent,
engine-independent,
index-independent statistics.
Index statistics
● Cardinality allows calculating a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the
storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.
Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|
STATISTICS]
– A table is opened for the first time
(after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON.
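A minimal sketch of forcing a refresh and inspecting the result:
analyze table orders;
show index from orders;   -- the Cardinality column reflects the (re)collected statistics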
Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updatable
– Only available if innodb_analyze_is_persistent=ON
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
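Sketches of reading the stored statistics directly (table locations as listed above; column layouts differ between servers):
-- MariaDB 5.5 / Percona Server 5.5 (XtraDB):
select * from information_schema.innodb_index_stats where table_name = 'orders';
-- MySQL 5.6:
select * from mysql.innodb_index_stats where table_name = 'orders';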
Plan [in]stability
● Statistics may vary a lot (orders)
MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
+------------+-----------------+--------------+ +---------------+
| table_name | index_name | rows_per_key | | rows_per_key | error (actual)
+------------+-----------------+--------------+ +---------------+
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25%
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4)
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80)
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234)
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15)
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477)
+------------+-----------------+--------------+ +---------------+
MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
+-----------------+----------+ +----------+
| table_name | rows | | rows |
+-----------------+----------+ +----------+
| partsupp | 6524766 | | 9101065 | 28% (8000000)
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000)
| lineitem | 60062904 | | 59992655 | 0.1% (59986052)
+-----------------+----------+ +----------+
Controlling statistics (MySQL 5.6)
● Persistent and user-updatable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.
Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no
index):
– min_value, max_value: minimum/maximum value per
column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same
value.
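A sketch of collecting and inspecting the engine-independent statistics in MariaDB 10.0 (variable, statement, and table names as documented for that version; treat the exact settings as assumptions to verify):
set use_stat_tables = 'preferably';
analyze table orders persistent for all;
select column_name, min_value, max_value, nulls_ratio, avg_length, avg_frequency
from mysql.column_stats
where db_name = 'dbt3' and table_name = 'orders';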
Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+.
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.
Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
  – The condition itself is only visible in a debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal).
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?.
Selectivity of o_orderpriority='1-URGENT':
● select count(*) from orders – 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows
● 300K / 1.5M = 0.2
● What about o_orderpriority='1-URGENT'? Selectivity= 0.2
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
Reasoning about join plan efficiency - summary
The basic* approach to evaluating join plan efficiency:
for each table $T in the join order {
  Look at the conditions attached to table $T (a condition must
  use table $T, and may also use previous tables).
  Does the access method used for $T make good use
  of the attached conditions?
}
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
* some other details may also affect join performance
Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce the number of scans for subsequent tables
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Informing optimizer about attached conditions
Currently: a range access that's too expensive to use
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where|
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
explain extended
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal > 8000 and
o_orderpriority='1-URGENT';
● `orders` will be scanned 150081 * 36.22%= 54359 times
● This reduces the cost of join
– Has an effect when comparing potential join plans
● => The index on c_acctbal is not used for access, but it may still help the optimizer.
Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info: Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”, tomorrow, 2:20 pm @ Ballroom D.
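A hedged sketch of enabling histogram collection and use in MariaDB 10.0 (variable names and values as documented for 10.0; defaults may differ):
set histogram_size = 255;                     -- bytes per histogram; 0 disables collection
set histogram_type = 'SINGLE_PREC_HB';
set optimizer_use_condition_selectivity = 4;  -- use column statistics and histograms
analyze table orders persistent for all;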
How to check if the query plan matches the reality
Check if the query plan is realistic
● EXPLAIN shows what the optimizer expects. It may be wrong
  – Out-of-date index statistics
  – Non-uniform data distribution
● Other DBMSs: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, we have
  – Handler counters
  – “User statistics” (Percona, MariaDB)
  – PERFORMANCE_SCHEMA
Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>
Join analysis: handler counters (old)
FLUSH STATUS;
=> RUN QUERY
SHOW STATUS LIKE "Handler%";
+----------------------------+-------+
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1646 |
| Handler_read_last | 0 |
| Handler_read_next | 1462 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 184 |
| Handler_tmp_update | 1096 |
| Handler_tmp_write | 183 |
| Handler_update | 0 |
| Handler_write | 0 |
Join analysis: USERSTAT by Facebook
MariaDB, Percona Server
SET GLOBAL USERSTAT=1;
FLUSH TABLE_STATISTICS;
FLUSH INDEX_STATISTICS;
=> RUN QUERY
SHOW TABLE_STATISTICS;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3 | orders | 183 | 0 | 0 |
| dbt3 | lineitem | 1279 | 0 | 0 |
| dbt3 | customer | 183 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
SHOW INDEX_STATISTICS;
+--------------+------------+-----------------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-----------------------+-----------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 |
| dbt3 | orders | i_o_totalprice | 183 |
+--------------+------------+-----------------------+-----------+
Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● A superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
  => truncate the stats tables before running the query (see the sketch below)
● Possible bug – performance schema not ignored
  – Disable by:
    UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
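A sketch of resetting the relevant summary tables right before the query under study (TRUNCATE on performance_schema summary tables resets their counters):
truncate table performance_schema.table_io_waits_summary_by_table;
truncate table performance_schema.table_io_waits_summary_by_index_usage;
-- run the query, then select from these tables as shown on the next slides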
Analyze joins via PERFORMANCE SCHEMA:
SHOW TABLE_STATISTICS analogue
select object_schema, object_name, count_read, count_write,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_table
where object_schema = 'dbt3' and count_star > 0;
+---------------+-------------+------------+-------------+
| object_schema | object_name | count_read | count_write |
+---------------+-------------+------------+-------------+
| dbt3 | customer | 183 | 0 |
| dbt3 | lineitem | 1462 | 0 |
| dbt3 | orders | 184 | 0 |
+---------------+-------------+------------+-------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
Analyze joins via PERFORMANCE SCHEMA:
SHOW INDEX_STATISTICS analogue
select object_schema, object_name, index_name, count_read,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_index_usage
where object_schema = 'dbt3' and count_star > 0
and index_name is not null;
+---------------+-------------+-----------------------+------------+
| object_schema | object_name | index_name | count_read |
+---------------+-------------+-----------------------+------------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 |
| dbt3 | orders | i_o_totalprice | 184 |
+---------------+-------------+-----------------------+------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Batched joins
● An optimization for analytical queries
● Analytic queries shovel through lots of data
  – e.g. “average size of order in the last month”
  – or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to look at all the data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
Batched Key Access Idea
● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
● The BKA implementation accesses data in order
● It takes advantage of caches and prefetching
Batched Key access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● various BKA buffer sizes
● various sizes of the $DATE1...$DATE2 range
Batched Key Access Performance
[Chart: BKA join performance depending on buffer size – query time (sec) vs. buffer size (bytes), for query_size = 1, 2, 3, regular vs. BKA; annotations mark performance without BKA and performance with BKA given a sufficient buffer size]
● 4x-10x speedup
● The more the data, the bigger the speedup
● The buffer size setting is very important.
Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on' -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● To tune join_buffer_size further, watch
  – query performance
  – the Handler_mrr_init counter
  and increase join_buffer_size until either saturates (see the sketch below).
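A sketch of that tuning loop (Handler_mrr_init is a MariaDB status counter; adjust for your server):
flush status;
-- run the big join here
show status like 'Handler_mrr%';
-- raise join_buffer_size and repeat until the query time or the counter stops improving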
Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important
setting
● MariaDB's implementation is a superset of
MySQL's.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
ORDER BY
GROUP BY
aggregates
Aggregate functions, no GROUP BY
● COUNT, SUM, AVG, etc. need to examine all rows
  – select SUM(column) from tbl needs to examine the whole tbl
● MIN and MAX can use an index for a lookup; EXPLAIN then shows:
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
With index (o_orderdate):
  select max(o_orderdate) from orders
  select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
With index (o_orderpriority, o_orderdate):
  select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then
sort - “Using temporary; Using filesort”
Using an index to read data in order
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!
A problem with LIMIT N optimization
`orders` has 1.5 M rows
explain select * from orders order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
● A problem:
– 1.5M rows, 300K of them 'URGENT'
– Scanning by date, when will we find 10 'URGENT' rows?
– No good solution so far.
“Using filesort” strategy
● Have to read the entire first table
● For the remaining tables, LIMIT n can be applied
● ORDER BY can only use columns of tbl1.
“Using temporary; Using filesort”
● The ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting.
ORDER BY - conclusions
● Resolving ORDER BY with an index allows very efficient handling of LIMIT
  – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging
● Use sql_big_result or IGNORE INDEX FOR ORDER BY to force a different plan in that case (see the sketch below)
● Using filesort
  – Needs all ORDER BY columns in the first table
  – Takes advantage of LIMIT when joining to non-first tables
● Using temporary; Using filesort is the least efficient.
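A sketch of steering the optimizer away from the ordered-index plan, using the problem query from the earlier LIMIT slide (index name i_o_orderdate as before):
select * from orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
where o_orderpriority='1-URGENT'
order by o_orderdate desc limit 10;
-- SELECT SQL_BIG_RESULT ... is the other option mentioned above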
GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table
(Using temporary; [Using filesort]).
Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to the next group.
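A sketch of a query that can typically use a loose index scan, assuming the (o_orderpriority, o_orderdate) index from the aggregate-functions slide; when the optimization applies, EXPLAIN shows “Using index for group-by”:
select o_orderpriority, max(o_orderdate)
from orders
group by o_orderpriority;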
Execution of GROUP BY with temptable
Subqueries
Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share a common heritage (the MySQL 6.0 alpha)
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● though MySQL 5.6 handles some edge cases that MariaDB does not
Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.
What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a
subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'
Thanks!
Q & A
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerSergey Petrunya
 
Adapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cAdapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cMauro Pagano
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)Valeriy Kravchuk
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 

Ähnlich wie MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013 (20)

Advanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and AnalysisAdvanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and Analysis
 
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuning
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Workshop 20140522 BigQuery Implementation
Workshop 20140522   BigQuery ImplementationWorkshop 20140522   BigQuery Implementation
Workshop 20140522 BigQuery Implementation
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
MariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query Optimizer
 
Adapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cAdapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12c
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
Perf Tuning Short
Perf Tuning ShortPerf Tuning Short
Perf Tuning Short
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 

Mehr von Sergey Petrunya

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12Sergey Petrunya
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Sergey Petrunya
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesSergey Petrunya
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureSergey Petrunya
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что новогоSergey Petrunya
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeSergey Petrunya
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Sergey Petrunya
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standSergey Petrunya
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18Sergey Petrunya
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3Sergey Petrunya
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLSergey Petrunya
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Sergey Petrunya
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howSergey Petrunya
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBSergey Petrunya
 

Mehr von Sergey Petrunya (20)

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger picture
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что нового
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit hole
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3
 
MyRocks in MariaDB
MyRocks in MariaDBMyRocks in MariaDB
MyRocks in MariaDB
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQL
 
Say Hello to MyRocks
Say Hello to MyRocksSay Hello to MyRocks
Say Hello to MyRocks
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and how
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDB
 

Kürzlich hochgeladen

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Kürzlich hochgeladen (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013

  • 8. 8 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● use performance_schema ● Many ways to analyze via queries – events_statements_summary_by_digest ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait ● digest_text, digest ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join – events_statements_history ● sql_text, digest_text, digest ● timer_start, timer_end, timer_wait ● rows_examined, created_tmp_disk_tables, select_full_join 8
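A minimal sketch of putting these columns to work (using only the performance_schema table and column names listed above): the normalized statements that examined the most rows can be ranked directly from the digest summary.
select digest_text, count_star, sum_rows_examined,
       sum_timer_wait/1000000000000 as total_exec_sec  -- timers are in picoseconds
from performance_schema.events_statements_summary_by_digest
order by sum_rows_examined desc
limit 10;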
  • 9. 9 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] • Modified Q18 from DBT3 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > ? and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; • App executes Q18 many times with ? = 550000, 500000, 400000, ... 9
  • 10. 10 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Find candidate slow queries ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc ● Complex conditions: max execution time > X sec OR min/max time vary a lot: select max_timer_wait/avg_timer_wait as max_ratio, avg_timer_wait/min_timer_wait as min_ratio from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2\G
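The "simple tests" mentioned above can be expressed the same way; a sketch, again using only the digest-summary columns named earlier:
select digest_text, count_star, sum_select_full_join, sum_created_tmp_disk_tables
from performance_schema.events_statements_summary_by_digest
where sum_select_full_join > 0
   or sum_created_tmp_disk_tables > 0
order by sum_timer_wait desc;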
  • 11. 11 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] *************************** 5. row *************************** DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ? COUNT_STAR: 3 SUM_TIMER_WAIT: 3251758347000 MIN_TIMER_WAIT: 3914209000 → 0.0039 sec AVG_TIMER_WAIT: 1083919449000 MAX_TIMER_WAIT: 3204044053000 → 3.2 sec SUM_LOCK_TIME: 555000000 SUM_ROWS_SENT: 25 SUM_ROWS_EXAMINED: 0 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 3 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_RANGE: 3 SUM_SELECT_SCAN: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 25 SUM_SORT_SCAN: 3 SUM_NO_INDEX_USED: 0 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 1970-01-01 03:38:27 LAST_SEEN: 1970-01-01 03:38:43 max_ratio: 2.9560 min_ratio: 276.9192 High variance of execution time
  • 12. 12 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Check the actual queries and constants ● The events_statements_history table select timer_wait/1000000000000 as exec_time, sql_text from events_statements_history where digest in (select digest from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2) order by timer_wait;
  • 13. 13 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] +-----------+-----------------------------------------------------------------------------------+ | exec_time | sql_text | +-----------+-----------------------------------------------------------------------------------+ | 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | | 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | | 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | +-----------+-----------------------------------------------------------------------------------+ Observation: orders.o_totalprice > ? is less and less selective
  • 14. 14 07:48:08 AM Actions after finding the slow query • Bad query plan – Rewrite the query – Force a good query plan • Bad optimizer settings – Do tuning • Query is inherently complex – Don't waste time with it – Look for other solutions.
  • 15. 15 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 16. 16 07:48:08 AM Consider a simple select • 15M rows were scanned, 19 rows in output • Query plan seems inefficient – (note: this logic doesn't directly apply to group/order by queries). select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ 19 rows in set (7.65 sec) ● Check the query plan: ● Run the query:
  • 17. 17 07:48:08 AM Query plan analysis • Entire table is scanned • WHERE condition checked after records are read – Not used to limit #examined rows. +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506'
  • 18. 18 07:48:08 AM Let's add an index • Outcome – Down to reading 300K rows – Still, 300K >> 19 rows. alter table orders add key i_o_orderdate (o_orderdate); select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 19 rows in set (0.76 sec) ● Query time:
  • 19. 19 07:48:08 AM Finding out which indexes to add ● index (o_orderdate) ● index (o_clerk) Check selectivity of conditions that will use the index select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' select count(*) from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'; 306322 rows select count(*) from orders where o_clerk='Clerk#000009506' 1507 rows.
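For completeness, the combined selectivity of both conditions can be checked the same way; per the earlier EXPLAIN and result set, this should count the same 19 rows the query returns:
select count(*) from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
  and o_clerk='Clerk#000009506';  -- 19 rows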
  • 20. 20 07:48:08 AM +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where| +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where| +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ Try adding composite indexes ● index (o_clerk, o_orderdate) ● index (o_orderdate, o_clerk) Bingo! 100% efficiency Much worse! • If a condition uses multiple columns, a composite index will be most efficient • Order of columns matters – Explanation of why is outside the scope of this tutorial. Covered in last year's tutorial
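The two composite indexes compared above were presumably created with statements along these lines (the index names i_o_clerk_date and i_o_date_clerk are taken from the EXPLAIN outputs; the exact DDL is not shown in the deck):
alter table orders add index i_o_clerk_date (o_clerk, o_orderdate);
alter table orders add index i_o_date_clerk (o_orderdate, o_clerk);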
  • 21. 21 07:48:08 AM Conditions must be in SARGable form • Condition must represent a range • It must have a form that is recognized by the optimizer • Recognized: o_orderDate BETWEEN '1992-06-01' and '1992-06-30'; o_clerk='Clerk#000009506'; o_clerk LIKE 'Clerk#000009506' (no leading wildcard); column IN (1,10,15,21, ...); (col1, col2) IN ( (1,1), (2,2), (3,3), …) • Not recognized: day(o_orderDate)=1992 and month(o_orderdate)=6; TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06'); o_clerk LIKE '%Clerk#000009506%'
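As an illustration of the rule (a sketch, not from the original slides): a condition that wraps the column in a function can often be rewritten into a SARGable range over the bare column.
-- not SARGable: the function call hides o_orderDATE from the range optimizer
select * from orders
where TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06');
-- SARGable rewrite: an index on o_orderDATE can now be used for a range scan
select * from orders
where o_orderDATE between '1992-06-06' and '1992-07-06';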
  • 22. 22 07:48:08 AM New in MySQL-5.6: optimizer_trace ● Lets you see the ranges set optimizer_trace=1; explain select * from orders where o_orderDATE between '1992-06-01' and '1992-07-03' and o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04') select * from information_schema.optimizer_trace\G ● Will print a big JSON struct ● Search for range_scan_alternatives.
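If set optimizer_trace=1 is not accepted on a given 5.6 build, the documented form of the switch uses the enabled flag; a hedged equivalent of the sequence above:
set optimizer_trace='enabled=on';
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03'
  and o_orderdate not in ('1992-01-01','1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
set optimizer_trace='enabled=off';  -- tracing adds overhead, switch it back off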
  • 23. 23 07:48:08 AM New in MySQL-5.6: optimizer_trace ... "range_scan_alternatives": [ { "index": "i_o_orderdate", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 319082, "cost": 382900, "chosen": true }, { "index": "i_o_date_clerk", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 406336, "cost": 487605, "chosen": false, "cause": "cost" } ], ... ● Considered ranges are shown in range_scan_alternatives section ● This is actually original use case of optimizer_trace ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug) ● Still, very useful.
  • 24. 24 07:48:08 AM Source of #rows estimates for range select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ ? • “records_in_range” estimate • Done by diving into index • Usually is fairly accurate • Not affected by ANALYZE TABLE.
  • 25. 25 07:48:08 AM Simple selects: conclusions • Efficiency == “#rows_scanned is close to #rows_returned” • Indexes and WHERE conditions reduce #rows scanned • Index estimates are usually accurate • Multi-column indexes – “handle” conditions on multiple columns – Order of columns in the index matters • optimizer_trace allows to view the ranges – But misrepresents ranges over multi-column indexes.
  • 26. 26 07:48:08 AM Now, we will skip some topics One can also speed up simple selects with ● index_merge access method ● index access method ● Index Condition Pushdown We don't have time for these now, check out last year's tutorial.
  • 27. 27 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 28. 28 07:48:08 AM A simple join select * from customer, orders where c_custkey=o_custkey • “Customers with their orders”
  • 29. 29 07:48:08 AM Execution: Nested Loops join select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • Complexity: – Scans table customer – For each record in customer, scans table orders • Is this ok?
  • 30. 30 07:48:08 AM Execution: Nested loops join (2) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
  • 31. 31 07:48:08 AM Execution: Nested loops join (3) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ rows to read from customer rows to read from orders c_custkey=o_custkey
  • 32. 32 07:48:08 AM Execution: Nested loops join (4) select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • Scan a 1,493,361-row table 148,749 times – Consider 1,493,361 * 148,749 row combinations • Is this query inherently complex? – We know each customer has his own orders – size(customer x orders)= size(orders) – Lower bound is 1,493,361 + 148,749 + costs to match customer<->order.
  • 33. 33 07:48:08 AM Using index for join: ref access alter table orders add index i_o_custkey(o_custkey) select * from customer, orders where c_custkey=o_custkey
  • 34. 34 07:48:08 AM ref access - analysis +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ select * from customer, orders where c_custkey=o_custkey ● One ref lookup scans 7 rows. ● In total: 7 * 148,749=1,041,243 rows – `orders` has 1.4M rows – no redundant reads from `orders` ● The whole query plan – Reads all customers – Reads 1M orders (of 1.4M) ● Efficient!
  • 35. 35 07:48:08 AM Conditions that can be used for ref access ● Can use equalities – tbl.key=other_table.col – tbl.key=const – tbl.key IS NULL ● For multipart keys, will use largest prefix – keypart1=... AND keypart2= … AND keypartK=... .
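A small sketch of the "largest prefix" rule, using a hypothetical composite index that is not part of the DBT-3 schema used elsewhere in this deck:
-- hypothetical index, for illustration only
alter table orders add index i_o_custkey_clerk (o_custkey, o_clerk);
-- equalities on both keyparts: ref can use the whole (o_custkey, o_clerk) prefix
select * from customer, orders
where o_custkey = c_custkey and o_clerk = 'Clerk#000009506';
-- with only o_custkey = c_custkey in the WHERE, ref would use just the first keypart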
  • 36. 36 07:48:08 AM Conditions that can't be used for ref access ● Doesn't work for non-equalities t1.key BETWEEN t2.col1 AND t2.col2 ● Doesn't work for OR-ed equalities t1.key=t2.col1 OR t1.key=t2.col2 – Except for ref_or_null t1.key=... OR t1.key IS NULL ● Doesn't “combine” ref and range access – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col .
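The ref_or_null exception mentioned above covers patterns like the following (a sketch, assuming an index on o_clerk exists):
select * from orders
where o_clerk = 'Clerk#000009506' or o_clerk is null;
-- eligible for ref_or_null; an OR of two different equalities is not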
  • 37. 37 07:48:08 AM Is ref always efficient? ● Efficient, if the column has many different values – Best case – unique index (eq_ref) ● A few different values – not useful ● Skewed distribution: depends on which part of the data the join touches (figure: three example value distributions, labeled good, bad and depends)
  • 38. 38 07:48:08 AM ref access estimates - index statistics • How many rows will match tbl.key_column = $value for an arbitrary $value? • Index statistics show keys from orders where key_name='i_o_custkey' *************************** 1. row *************** Table: orders Non_unique: 1 Key_name: i_o_custkey Seq_in_index: 1 Column_name: o_custkey Collation: A Cardinality: 214462 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE show table status like 'orders' *************************** 1. row **** Name: orders Engine: InnoDB Version: 10 Row_format: Compact Rows: 1495152 Avg_row_length: 133 Data_length: 199966720 Max_data_length: 0 Index_length: 122421248 Data_free: 6291456 ... average = Rows /Cardinality = 1495152 / 214462 = 6.97.
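The same average can also be computed from the INFORMATION_SCHEMA views instead of SHOW output; a sketch using the standard TABLES and STATISTICS tables:
select t.table_rows / s.cardinality as avg_rows_per_key
from information_schema.tables t
join information_schema.statistics s
  on s.table_schema = t.table_schema and s.table_name = t.table_name
where t.table_schema = database()
  and s.table_name = 'orders'
  and s.index_name = 'i_o_custkey'
  and s.seq_in_index = 1;  -- cardinality of the first (only) keypart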
  • 39. 39 07:48:08 AM ref access – conclusions ● Based on t.key=... equality conditions ● Can make joins very efficient ● Relies on index statistics for estimates.
  • 40. 40 07:48:08 AM Optimizer statistics ● MySQL/Percona Server – Index statistics – Persistent/transient InnoDB stats ● MariaDB – Index statistics, persistent/transient ● Same as Percona Server (via XtraDB) – Persistent, engine-independent, index-independent statistics.
  • 41. 41 07:48:08 AM Index statistics ● Cardinality makes it possible to calculate a table-wide average #rows-per-key-prefix ● It is a statistical value (inexact) ● The exact collection procedure depends on the storage engine – InnoDB – random sampling – MyISAM – index scan – Engine-independent – index scan.
  • 42. 42 07:48:08 AM Index statistics in MySQL 5.6 ● Sample [8] random index leaf pages ● Table statistics (stored) – rows - estimated number of rows in a table – Other stats not used by optimizer ● Index statistics (stored) – fields - #fields in the index – rows_per_key - rows per 1 key value, per prefix fields ([1 column value], [2 columns value], [3 columns value], …) – Other stats not used by optimizer.
  • 43. 43 07:48:08 AM Index statistics updates ● Statistics are updated when: – ANALYZE TABLE tbl_name [, tbl_name] … – SHOW TABLE STATUS, SHOW INDEX – Access to INFORMATION_SCHEMA.[TABLES| STATISTICS] – A table is opened for the first time (after server restart) – A table has changed >10% – When InnoDB Monitor is turned ON.
  • 44. 44 07:48:08 AM Displaying optimizer statistics ● MySQL 5.5, MariaDB 5.3, and older – Issue SQL statements to count rows/keys – Indirectly, look at EXPLAIN for simple queries ● MariaDB 5.5, Percona Server 5.5 (using XtraDB) – information_schema.[innodb_index_stats, innodb_table_stats] – Read-only, always visible ● MySQL 5.6 – mysql.[innodb_index_stats, innodb_table_stats] – User updateable – Only available if innodb_analyze_is_persistent=ON ● MariaDB 10.0 – Persistent updateable tables mysql.[index_stats, column_stats, table_stats] – User updateable – + current XtraDB mechanisms.
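For example, the MySQL 5.6 tables named above can be queried directly; a sketch (stat_name values such as n_diff_pfx01 hold the per-prefix distinct-value estimates):
select index_name, stat_name, stat_value, last_update
from mysql.innodb_index_stats
where database_name = 'dbt3sf1' and table_name = 'orders';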
  • 45. 45 07:48:08 AM Plan [in]stability ● Statistics may vary a lot (orders) MariaDB [dbt3]> select * from information_schema.innodb_index_stats; +------------+-----------------+--------------+ +---------------+ | table_name | index_name | rows_per_key | | rows_per_key | error (actual) +------------+-----------------+--------------+ +---------------+ | partsupp | PRIMARY | 3, 1 | | 4, 1 | 25% | partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4) | partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80) | orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234) | orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15) | lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477) +------------+-----------------+--------------+ +---------------+ MariaDB [dbt3]> select * from information_schema.innodb_table_stats; +-----------------+----------+ +----------+ | table_name | rows | | rows | +-----------------+----------+ +----------+ | partsupp | 6524766 | | 9101065 | 28% (8000000) | orders | 15039855 | ==> | 14948612 | 0.6% (15000000) | lineitem | 60062904 | | 59992655 | 0.1% (59986052) +-----------------+----------+ +----------+ .
  • 46. 46 07:48:08 AM Controlling statistics (MySQL 5.6) ● Persistent and user-updateable InnoDB statistics – innodb_analyze_is_persistent = ON, – updated manually by ANALYZE TABLE or – automatically by innodb_stats_auto_recalc = ON ● Control the precision of sampling [default 8] – innodb_stats_persistent_sample_pages, – innodb_stats_transient_sample_pages ● No new statistics compared to older versions.
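A sketch of the knobs listed above (note: current 5.6 releases expose the persistence switch as innodb_stats_persistent; the innodb_analyze_is_persistent name on the slide comes from earlier 5.6 preview builds):
set global innodb_stats_persistent = ON;               -- keep statistics across restarts
set global innodb_stats_auto_recalc = ON;              -- recalculate after ~10% of rows change
set global innodb_stats_persistent_sample_pages = 64;  -- sample more leaf pages for better precision
analyze table orders;                                  -- force an immediate refresh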
  • 47. 47 07:48:08 AM Controlling statistics (MariaDB 10.0) Current XtraDB index statistics + ● Engine-independent, persistent, user-updateable statistics ● Precise ● Additional statistics per column (even when there is no index): – min_value, max_value: minimum/maximum value per column – nulls_ratio: fraction of null values in a column – avg_length: average size of values in a column – avg_frequency: average number of rows with the same value.
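A sketch of collecting and viewing the engine-independent statistics in MariaDB 10.0, assuming the use_stat_tables setting and the mysql.column_stats table named on the neighbouring slides:
set use_stat_tables = 'preferably';
analyze table orders persistent for all;  -- collect table, column and index statistics
select column_name, min_value, max_value, nulls_ratio, avg_length, avg_frequency
from mysql.column_stats
where db_name = database() and table_name = 'orders';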
  • 48. 48 07:48:08 AM Join condition pushdown
  • 49. 49 07:48:08 AM Join condition pushdown select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+.
  • 52. 52 07:48:08 AM Join condition pushdown select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ ● Conjunctive (ANDed) conditions are split into parts ● Each part is attached as early as possible – Either as “Using where” – Or as table access method.
  • 53. 53 07:48:08 AM Observing join condition pushdown EXPLAIN: { "query_block": { "select_id": 1, "nested_loop": [ { "table": { "table_name": "orders", "access_type": "ALL", "possible_keys": [ "i_o_custkey" ], "rows": 1499715, "filtered": 100, "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))" } }, { "table": { "table_name": "customer", "access_type": "eq_ref", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "c_custkey" ], "key_length": "4", "ref": [ "dbt3sf1.orders.o_custkey" ], "rows": 1, "filtered": 100, "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))" } ● Before mysql-5.6: EXPLAIN shows only “Using where” – The condition itself only visible in debug trace ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
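The JSON plan shown above is produced with the MySQL 5.6 syntax, roughly:
explain format=json
select * from customer, orders
where c_custkey = o_custkey
  and c_acctbal < -500
  and o_orderpriority = '1-URGENT'\G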
  • 54. 54 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ First table, “customer” ● type=ALL, 150 K rows ● select count(*) from customer where c_acctbal < -500 gives 6804. ● alter table customer add index (c_acctbal).
  • 55. 55 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; First table, “customer” ● type=ALL, 150 K rows ● select count(*) from customer where c_acctbal < -500 gives 6804. ● alter table customer add index (c_acctbal) +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ Now, access to 'customer' is efficient.
  • 56. 56 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; Second table, “orders” ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' ● ref access uses only c_custkey=o_custkey ● What about o_orderpriority='1-URGENT'?. +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
• 57. 57 07:48:08 AM Selectivity of o_orderpriority='1-URGENT' ● select count(*) from orders – 1.5M rows ● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows ● 300K / 1.5M = 0.2
  • 58. 58 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; Second table, “orders” ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' ● ref access uses only c_custkey=o_custkey ● What about o_orderpriority='1-URGENT'? Selectivity= 0.2 – Can examine 7*0.2=1.4 rows, 6802 times if we add an index: alter table orders add index (o_custkey, o_orderpriority) or alter table orders add index (o_orderpriority, o_custkey) +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
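A minimal sketch of applying that suggestion (the index name is an assumption, not from the slides); after the ALTER, key_len in EXPLAIN should grow to cover both key parts:
-- hypothetical index name; with two equality conditions either column order works
ALTER TABLE orders ADD INDEX i_o_custkey_priority (o_custkey, o_orderpriority);
-- re-check the plan: orders should now be looked up using both conditions
EXPLAIN SELECT * FROM customer, orders
WHERE c_custkey = o_custkey
  AND c_acctbal < -500
  AND o_orderpriority = '1-URGENT';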
  • 59. 59 07:48:08 AM Reasoning about join plan efficiency - summary Basic* approach to evaluation of join plan efficiency: for each table $T in the join order { Look at conditions attached to table $T (condition must use table $T, may also use previous tables) Does access method used with $T make a good use of attached conditions? } +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ * some other details may also affect join performance
  • 61. 61 07:48:08 AM Attached conditions ● Ideally, should be used for table access ● Not all conditions can be used [at the same time] – Unused ones are still useful – They reduce number of scans for subsequent tables select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
• 62. 62 07:48:08 AM Informing optimizer about attached conditions Currently: a range access that's too expensive to use +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ |id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra | +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where| +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ explain extended select * from customer, orders where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT'; ● `orders` will be scanned 150081 * 36.22% = 54359 times ● This reduces the estimated cost of the join – Has an effect when comparing potential join plans ● => The index on c_acctbal is not used for access, but it still may help the optimizer.
  • 63. 63 07:48:08 AM Attached condition selectivity ● Unused indexes provide info about selectivity – Works, but very expensive ● MariaDB 10.0 has engine-independent statistics – Index statistics – Non-indexed Column statistics ● Histograms – Further info: Tomorrow, 2:20 pm @ Ballroom D Igor Babaev Engine-independent persistent statistics with histograms in MariaDB.
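A minimal sketch of collecting engine-independent statistics in MariaDB 10.0 (the variable values are illustrative, not recommendations):
SET SESSION use_stat_tables = 'preferably';   -- let the optimizer prefer the collected stats
SET SESSION histogram_size = 100;             -- collect histograms with up to 100 buckets
ANALYZE TABLE orders PERSISTENT FOR ALL;      -- gather table, column and index statistics
-- the results land in the mysql.table_stats / column_stats / index_stats tables
SELECT * FROM mysql.column_stats WHERE table_name = 'orders';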
• 64. 64 07:48:08 AM How to check whether the query plan matches reality
• 65. 65 07:48:08 AM Check if the query plan is realistic ● EXPLAIN shows what the optimizer expects. It may be wrong – Out-of-date index statistics – Non-uniform data distribution ● Other DBMSs: EXPLAIN ANALYZE ● MySQL: no equivalent. Instead, we have – Handler counters – “User statistics” (Percona, MariaDB) – PERFORMANCE_SCHEMA
  • 66. 66 07:48:08 AM Join analysis: example query (Q18, DBT3) <reset counters> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; <collect statistics>
  • 67. 67 07:48:08 AM Join analysis: handler counters (old) FLUSH STATUS; => RUN QUERY SHOW STATUS LIKE "Handler%"; +----------------------------+-------+ | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_read_first | 0 | | Handler_read_key | 1646 | | Handler_read_last | 0 | | Handler_read_next | 1462 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 184 | | Handler_tmp_update | 1096 | | Handler_tmp_write | 183 | | Handler_update | 0 | | Handler_write | 0 |
  • 68. 68 07:48:08 AM Join analysis: USERSTAT by Facebook MariaDB, Percona Server SET GLOBAL USERSTAT=1; FLUSH TABLE_STATISTICS; FLUSH INDEX_STATISTICS; => RUN QUERY SHOW TABLE_STATISTICS; +--------------+------------+-----------+--------------+-------------------------+ | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes | +--------------+------------+-----------+--------------+-------------------------+ | dbt3 | orders | 183 | 0 | 0 | | dbt3 | lineitem | 1279 | 0 | 0 | | dbt3 | customer | 183 | 0 | 0 | +--------------+------------+-----------+--------------+-------------------------+ SHOW INDEX_STATISTICS; +--------------+------------+-----------------------+-----------+ | Table_schema | Table_name | Index_name | Rows_read | +--------------+------------+-----------------------+-----------+ | dbt3 | customer | PRIMARY | 183 | | dbt3 | lineitem | i_l_orderkey_quantity | 1279 | | dbt3 | orders | i_o_totalprice | 183 | +--------------+------------+-----------------------+-----------+
  • 69. 69 07:48:08 AM Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● summary tables with read/write statistics – table_io_waits_summary_by_table – table_io_waits_summary_by_index_usage ● Superset of the userstat tables ● More overhead ● Not possible to associate statistics with a query => truncate stats tables before running a query ● Possible bug – performance schema not ignored – Disable by UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
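For example, a sketch of resetting the relevant summary tables right before the query under analysis:
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_table;
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;
-- ... now run the query being analyzed, then read the summary tables as on the next slides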
  • 70. 70 07:48:08 AM Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue select object_schema, object_name, count_read, count_write, sum_timer_read, sum_timer_write, ... from table_io_waits_summary_by_table where object_schema = 'dbt3' and count_star > 0; +---------------+-------------+------------+-------------+ | object_schema | object_name | count_read | count_write | +---------------+-------------+------------+-------------+ | dbt3 | customer | 183 | 0 | | dbt3 | lineitem | 1462 | 0 | | dbt3 | orders | 184 | 0 | +---------------+-------------+------------+-------------+ +----------------+-----------------+ | sum_timer_read | sum_timer_write | ... +----------------+-----------------+ | 8326528406 | 0 | | 12117332778 | 0 | | 7946312812 | 0 | +----------------+-----------------+
  • 71. 71 07:48:08 AM Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue select object_schema, object_name, index_name, count_read, sum_timer_read, sum_timer_write, ... from table_io_waits_summary_by_index_usage where object_schema = 'dbt3' and count_star > 0 and index_name is not null; +---------------+-------------+-----------------------+------------+ | object_schema | object_name | index_name | count_read | +---------------+-------------+-----------------------+------------+ | dbt3 | customer | PRIMARY | 183 | | dbt3 | lineitem | i_l_orderkey_quantity | 1462 | | dbt3 | orders | i_o_totalprice | 184 | +---------------+-------------+-----------------------+------------+ +----------------+-----------------+ | sum_timer_read | sum_timer_write | ... +----------------+-----------------+ | 8326528406 | 0 | | 12117332778 | 0 | | 7946312812 | 0 | +----------------+-----------------+
• 72. 72 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
• 73. 73 07:48:08 AM Batched joins ● Optimization for analytical queries ● Analytic queries shovel through lots of data – e.g. “average size of order in the last month” – or “pairs of goods purchased together” ● Indexes, etc. won't help when you really need to look at all the data ● More data means a greater chance of being I/O-bound ● Solution: batched joins
• 74. 74 07:48:08 AM Batched Key Access Idea [slides 74–79: diagram-only slides building up the idea step by step]
  • 80. 80 07:48:08 AM Batched Key Access Idea ● Non-BKA join hits data at random ● Caches are not used efficiently ● Prefetching is not useful
• 81. 81 07:48:08 AM Batched Key Access Idea ● The BKA implementation accesses data in order ● Takes advantage of caches and prefetching
• 82. 82 07:48:08 AM Batched Key Access effect set join_cache_level=6; select max(l_extendedprice) from orders, lineitem where l_orderkey=o_orderkey and o_orderdate between $DATE1 and $DATE2 The benchmark was run with ● Various BKA buffer sizes ● Various sizes of the $DATE1...$DATE2 range
• 83. 83 07:48:08 AM Batched Key Access Performance [chart: “BKA join performance depending on buffer size” – query time (sec) vs. buffer size (bytes) for query_size=1/2/3, regular vs. BKA; annotations: “Performance without BKA”, “Performance with BKA, given sufficient buffer size”] ● 4x-10x speedup ● The more the data, the bigger the speedup ● Buffer size setting is very important.
• 84. 84 07:48:08 AM Batched Key Access settings ● Needs to be turned on set join_buffer_size= 32*1024*1024; set join_cache_level=6; -- MariaDB set optimizer_switch='batched_key_access=on' -- MySQL 5.6 set optimizer_switch='mrr=on'; set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only ● Further join_buffer_size tuning: watch – Query performance – the Handler_mrr_init counter and increase join_buffer_size until either of them saturates.
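A sketch of that tuning loop (counter names as reported by MariaDB; the buffer value is only an example):
SET join_buffer_size = 32*1024*1024;
FLUSH STATUS;
-- ... run the big join ...
SHOW STATUS LIKE 'Handler_mrr%';   -- watch Handler_mrr_init and the *_refills counters
-- if the refill counters keep growing, increase join_buffer_size and repeat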
  • 85. 85 07:48:08 AM Batched Key Access - conclusions ● Targeted at big joins ● Needs to be enabled manually ● @@join_buffer_size is the most important setting ● MariaDB's implementation is a superset of MySQL's.
• 86. 86 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 87. 87 07:48:08 AM ORDER BY GROUP BY aggregates
  • 88. 88 07:48:08 AM Aggregate functions, no GROUP BY ● COUNT, SUM, AVG, etc need to examine all rows select SUM(column) from tbl needs to examine the whole tbl. ● MIN and MAX can use index for lookup +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra | +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ |1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away| +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ index (o_orderdate) select max(o_orderdate) from orders select min(o_orderdate) from orders where o_orderdate > '1995-05-01' select max(o_orderdate) from orders where o_orderpriority='1-URGENT' index (o_orderpriority, o_orderdate)
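As a sketch for the last query on the slide, assuming the composite index mentioned there (the index name is made up):
ALTER TABLE orders ADD INDEX i_o_priority_date (o_orderpriority, o_orderdate);
-- MAX() can now be answered by a single dive to the end of the '1-URGENT' index range;
-- EXPLAIN should report "Select tables optimized away"
EXPLAIN SELECT MAX(o_orderdate) FROM orders WHERE o_orderpriority = '1-URGENT';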
  • 89. 89 07:48:08 AM ORDER BY … LIMIT Three algorithms ● Use an index to read in order ● Read one table, sort, join - “Using filesort” ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
  • 90. 90 07:48:08 AM Using index to read data in order ● No special indication in EXPLAIN output ● LIMIT n: as soon as we read n records, we can stop!
  • 91. 91 07:48:08 AM A problem with LIMIT N optimization `orders` has 1.5 M rows explain select * from orders order by o_orderdate desc limit 10; +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra| +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ |1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | | +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10; +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ |1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where| +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ ● A problem: – 1.5M rows, 300K of them 'URGENT' – Scanning by date, when will we find 10 'URGENT' rows? – No good solution so far.
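One workaround for this particular query is hinted at on the ORDER BY conclusions slide further below: force the optimizer away from the date-ordered scan (a sketch):
SELECT * FROM orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
WHERE o_orderpriority = '1-URGENT'
ORDER BY o_orderdate DESC LIMIT 10;
-- the server then typically filters on the WHERE clause first and sorts the survivors (filesort)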
  • 92. 92 07:48:08 AM Using filesort strategy ● Have to read the entire first table ● For remaining, can apply LIMIT n ● ORDER BY can only use columns of tbl1.
  • 93. 93 07:48:08 AM Using temporary; Using filesort ● ORDER BY clause can use columns of any table ● LIMIT is applied only after executing the entire join and sorting.
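A sketch contrasting the two filesort strategies on the DBT-3 tables (the exact plans depend on the join order the optimizer picks):
-- ORDER BY on a first-table column: typically "Using filesort" only, LIMIT can cut the join short
EXPLAIN SELECT c_name, o_totalprice FROM customer, orders
WHERE c_custkey = o_custkey ORDER BY c_acctbal LIMIT 10;
-- ORDER BY on a non-first-table column: typically "Using temporary; Using filesort",
-- the whole join result is materialized and sorted before LIMIT applies
EXPLAIN SELECT c_name, o_totalprice FROM customer, orders
WHERE c_custkey = o_custkey ORDER BY o_totalprice LIMIT 10;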
• 94. 94 07:48:08 AM ORDER BY - conclusions ● Resolving ORDER BY with an index allows very efficient handling of LIMIT – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging ● Use sql_big_result, IGNORE INDEX FOR ORDER BY ● Using filesort – Needs all ORDER BY columns in the first table – Takes advantage of LIMIT when joining to non-first tables ● Using temporary; Using filesort is the least efficient.
  • 95. 95 07:48:08 AM GROUP BY strategies There are three strategies ● Ordered index scan ● Loose Index Scan (LooseScan) ● Groups table (Using temporary; [Using filesort]).
• 96. 96 07:48:08 AM Ordered index scan ● Groups are enumerated one after another ● Can compute aggregates on the fly ● Loose index scan is also able to jump to the next group.
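A sketch of a query shape that can use the loose scan, assuming an index on (o_orderpriority, o_orderdate) exists:
-- EXPLAIN is expected to show "Using index for group-by" when the loose scan applies
EXPLAIN SELECT o_orderpriority, MIN(o_orderdate), MAX(o_orderdate)
FROM orders
GROUP BY o_orderpriority;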
  • 97. 97 07:48:08 AM Execution of GROUP BY with temptable
• 99. 99 07:48:08 AM Subquery optimizations ● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries” ● Queries that caused most of the pain – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins – SELECT … FROM (SELECT …) - derived tables ● MariaDB 5.3 and MySQL 5.6 – Share common ancestry (the MySQL 6.0 alpha) – Huge (100x, 1000x) speedups for painful areas – Other kinds of subqueries received a speedup, too – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations ● 5.6 handles some otherwise-unhandled edge cases, too
  • 100. 100 07:48:08 AM Tuning for subqueries ● “Before”: one execution strategy – No tuning possible ● “After”: similar to joins – Reasonable execution strategies supported – Need indexes – Need selective conditions – Support batching in most important cases ● Should be better 9x% of the time.
• 101. 101 07:48:08 AM What if it still picks a poor query plan? For both MariaDB and MySQL: ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table ● Google “site:kb.askmonty.org $subquery_keyword” or https://kb.askmonty.org/en/subquery-optimizations-map/ ● Find which optimization it was ● set optimizer_switch='$subquery_optimization=off'
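For example, a sketch of that workflow on a DBT-3-style semi-join subquery (which strategy shows up, and therefore which switch to flip, will vary):
EXPLAIN EXTENDED
SELECT c_name FROM customer
WHERE c_custkey IN (SELECT o_custkey FROM orders WHERE o_orderpriority = '1-URGENT');
-- suppose the plan shows a materialized subquery and it performs poorly:
SET optimizer_switch = 'materialization=off';
-- re-run EXPLAIN and the query, then compare timings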