Advanced query optimizer
tuning and analysis
Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Is there a problem with the query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
  – And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Catching slow queries, the old ways
● Watch the Slow query log
– Percona Server/MariaDB:
--log_slow_verbosity=query_plan
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
• Run SHOW PROCESSLIST periodically
– Run pt-query-digest on the log
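For reference, a minimal sketch of enabling the slow query log with query-plan details (the values are arbitrary examples; log_slow_verbosity exists in Percona Server and MariaDB only):
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;                -- log statements slower than 1 second
SET GLOBAL log_slow_verbosity = 'query_plan';  -- adds the query-plan lines shown above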
The new way: SHOW PROCESSLIST + SHOW EXPLAIN
• Available in MariaDB 10.0+
• Displays EXPLAIN of a running statement
MariaDB> show processlist;
+--+----+---------+-------+-------+----+------------+-------------------------...
|Id|User|Host |db |Command|Time|State |Info
+--+----+---------+-------+-------+----+------------+-------------------------...
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ...
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist
+--+----+---------+-------+-------+----+------------+-------------------------...
MariaDB> show explain for 1;
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where|
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+
MariaDB [dbt3sf1]> show warnings;
+-----+----+-----------------------------------------------------------------+
|Level|Code|Message |
+-----+----+-----------------------------------------------------------------+
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995|
+-----+----+-----------------------------------------------------------------+
SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again?
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at
the same query plan or not.
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● use performance_schema
● Many ways to analyze via queries
– events_statements_summary_by_digest
● count_star, sum_timer_wait,
min_timer_wait, avg_timer_wait, max_timer_wait
● digest_text, digest
● sum_rows_examined, sum_created_tmp_disk_tables,
sum_select_full_join
– events_statements_history
● sql_text, digest_text, digest
● timer_start, timer_end, timer_wait
● rows_examined, created_tmp_disk_tables,
select_full_join
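Before relying on these tables, it is worth checking that the relevant consumers are enabled; a small sketch using the standard performance_schema setup tables:
select name, enabled from performance_schema.setup_consumers
where name like '%statements%';
-- statements_digest feeds events_statements_summary_by_digest,
-- events_statements_history feeds the history table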
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
→ high variance of execution time
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;
Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 |
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? is less and less selective
Actions after finding the slow query
• Bad query plan
  – Rewrite the query
  – Force a good query plan (see the sketch below)
• Bad optimizer settings
  – Do tuning
• Query is inherently complex
  – Don't waste time with it
  – Look for other solutions.
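A hedged sketch of what “force a good query plan” can look like (the index name i_o_orderdate is illustrative; FORCE INDEX and STRAIGHT_JOIN are standard MySQL/MariaDB syntax):
select * from orders FORCE INDEX (i_o_orderdate)
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
select STRAIGHT_JOIN *        -- fix the join order: customer first, then orders
from customer, orders
where c_custkey = o_custkey;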
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
● Run the query:
19 rows in set (7.65 sec)
• 15M rows were scanned, 19 rows in output
• The query plan seems inefficient
  – (note: this logic doesn't directly apply to GROUP BY / ORDER BY queries)
Query plan analysis
• The entire table is scanned
• The WHERE condition is checked after records are read
  – It is not used to limit #examined rows
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time: 19 rows in set (0.76 sec)
• Outcome
  – Down to reading 300K rows
  – Still, 300K >> 19 rows.
Finding out which indexes to add
Candidate indexes:
● index (o_orderdate)
● index (o_clerk)
Check the selectivity of the conditions that will use the index:
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
→ 306322 rows
select count(*) from orders where o_clerk='Clerk#000009506';
→ 1507 rows
Try adding composite indexes
● index (o_clerk, o_orderdate) – Bingo! 100% efficiency:
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
● index (o_orderdate, o_clerk) – Much worse!
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
• If a condition uses multiple columns, a composite index will be most efficient
• The order of columns matters
  – The explanation of why is outside the scope of this tutorial; it was covered in last year's tutorial.
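The ALTER TABLE statements behind the two plans above were presumably along these lines (index names taken from the EXPLAIN output):
alter table orders add key i_o_clerk_date (o_clerk, o_orderdate);
alter table orders add key i_o_date_clerk (o_orderdate, o_clerk);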
Conditions must be in SARGable form
• The condition must represent a range
• It must have a form that is recognized by the optimizer
Recognized:
  o_orderDate BETWEEN '1992-06-01' and '1992-06-30'
  o_clerk='Clerk#000009506'
  o_clerk LIKE 'Clerk#000009506'
  column IN (1,10,15,21, ...)
Not recognized (no index range can be built):
  day(o_orderDate)=1992 and month(o_orderdate)=6
  TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06')
  o_clerk LIKE '%Clerk#000009506%'
  (col1, col2) IN ( (1,1), (2,2), (3,3), …)
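A small sketch of rewriting a non-recognized condition into an equivalent SARGable one (same date range as in the earlier examples):
-- Not recognized: a function is applied to the column
select count(*) from orders
where TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06');
-- Recognized: the bare column is compared to constants
select count(*) from orders
where o_orderDATE between '1992-06-06' and '1992-07-06';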
New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace='enabled=on';
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
      o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.
New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
Source of #rows estimates for range
select * from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
Where does the rows estimate (306322) come from?
• It is the “records_in_range” estimate
• Done by diving into the index
• Usually fairly accurate
• Not affected by ANALYZE TABLE.
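One way to sanity-check the estimate is to compare it with the actual row count (the figure below is from the earlier selectivity check on this data set):
select count(*) from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
-- 306322 rows, matching the rows=306322 estimate in the EXPLAIN above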
Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
Now, we will skip some topics
One can also speed up simple selects with
● the index_merge access method
● the index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Execution: Nested Loops join
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?
Execution: Nested loops join (2)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
for each order O {
if (C.c_custkey == O.o_custkey)
produce record(C, O);
}
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
148749 = rows to read from customer
1493631 = rows to read from orders
“Using where” = the c_custkey=o_custkey check
Execution: Nested loops join (4)
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• Scan a 1,493,631-row table 148,749 times
  – Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
  – We know each customer has their own orders
  – size(customer x orders) = size(orders)
  – The lower bound is 1,493,631 + 148,749 + the cost of matching customers with orders.
Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
select * from customer, orders where c_custkey=o_custkey
ref access - analysis
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| |
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+
select * from customer, orders where c_custkey=o_custkey
● One ref lookup scans 7 rows.
● In total: 7 * 148,749=1,041,243 rows
– `orders` has 1.4M rows
– no redundant reads from `orders`
● The whole query plan
– Reads all customers
– Reads 1M orders (of 1.4M)
● Efficient!
Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use largest prefix
– keypart1=... AND keypart2= … AND keypartK=... .
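A sketch of the “largest prefix” rule, using a hypothetical two-part index:
alter table orders add key i_o_cust_prio (o_custkey, o_orderpriority);
-- ref access can use both key parts (both conditions are equalities):
select * from customer, orders
where o_custkey = c_custkey and o_orderpriority = '1-URGENT';
-- ref access can use only the o_custkey part here, because the
-- second condition is not an equality:
select * from customer, orders
where o_custkey = c_custkey and o_orderpriority > '1-URGENT';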
Conditions that can't be used for ref access
● Doesn't work for non-equalities
  t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
  t1.key=t2.col1 OR t1.key=t2.col2
  – Except for ref_or_null: t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
  – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
  – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Is ref always efficient?
● Efficient if the column has many different values
  – Best case: a unique index (eq_ref)
● A few different values – not useful
● Skewed distribution: depends on which part of the data the join touches
ref access estimates - index statistics
• How many rows will match
tbl.key_column = $value
for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.
Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent,
engine-independent,
index-independent statistics.
Index statistics
● Cardinality allows calculating a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the
storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan.
Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.
Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|
STATISTICS]
– A table is opened for the first time
(after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON.
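A minimal sketch of forcing a refresh and inspecting the result:
analyze table orders;
show index from orders;   -- the Cardinality column reflects the (re)collected statistics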
Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updatable
– Only available if innodb_analyze_is_persistent=ON
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.
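Sketches of reading the stored statistics directly (table locations as listed above; column layouts differ between servers):
-- MariaDB 5.5 / Percona Server 5.5 (XtraDB):
select * from information_schema.innodb_index_stats where table_name = 'orders';
-- MySQL 5.6:
select * from mysql.innodb_index_stats where table_name = 'orders';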
Plan [in]stability
● Statistics may vary a lot (orders)
MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
+------------+-----------------+--------------+ +---------------+
| table_name | index_name | rows_per_key | | rows_per_key | error (actual)
+------------+-----------------+--------------+ +---------------+
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25%
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4)
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80)
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234)
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15)
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477)
+------------+-----------------+--------------+ +---------------+
MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
+-----------------+----------+ +----------+
| table_name | rows | | rows |
+-----------------+----------+ +----------+
| partsupp | 6524766 | | 9101065 | 28% (8000000)
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000)
| lineitem | 60062904 | | 59992655 | 0.1% (59986052)
+-----------------+----------+ +----------+
Controlling statistics (MySQL 5.6)
● Persistent and user-updatable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.
Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no
index):
– min_value, max_value: minimum/maximum value per
column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same
value.
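A sketch of collecting and inspecting the engine-independent statistics in MariaDB 10.0 (variable, statement, and table names as documented for that version; treat the exact settings as assumptions to verify):
set use_stat_tables = 'preferably';
analyze table orders persistent for all;
select column_name, min_value, max_value, nulls_ratio, avg_length, avg_frequency
from mysql.column_stats
where db_name = 'dbt3' and table_name = 'orders';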
Join condition pushdown
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+.
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.
Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
  – The condition itself is only visible in a debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal).
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?.
Selectivity of o_orderpriority='1-URGENT':
● select count(*) from orders – 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows
● 300K / 1.5M = 0.2
● What about o_orderpriority='1-URGENT'? Selectivity= 0.2
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
Reasoning about join plan efficiency - summary
The basic* approach to evaluating join plan efficiency:
for each table $T in the join order {
  Look at the conditions attached to table $T (a condition must
  use table $T, and may also use previous tables).
  Does the access method used for $T make good use
  of the attached conditions?
}
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
* some other details may also affect join performance
Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce the number of scans for subsequent tables
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and
o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Informing optimizer about attached conditions
Currently: a range access that's too expensive to use
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra |
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where|
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
explain extended
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal > 8000 and
o_orderpriority='1-URGENT';
● `orders` will be scanned 150081 * 36.22%= 54359 times
● This reduces the cost of join
– Has an effect when comparing potential join plans
● => The index on c_acctbal is not used for access, but it may still help the optimizer.
Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info: Igor Babaev, “Engine-independent persistent statistics with histograms in MariaDB”, tomorrow, 2:20 pm @ Ballroom D.
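A hedged sketch of enabling histogram collection and use in MariaDB 10.0 (variable names and values as documented for 10.0; defaults may differ):
set histogram_size = 255;                     -- bytes per histogram; 0 disables collection
set histogram_type = 'SINGLE_PREC_HB';
set optimizer_use_condition_selectivity = 4;  -- use column statistics and histograms
analyze table orders persistent for all;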
How to check if the query plan matches the reality
Check if the query plan is realistic
● EXPLAIN shows what the optimizer expects. It may be wrong
  – Out-of-date index statistics
  – Non-uniform data distribution
● Other DBMSs: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, we have
  – Handler counters
  – “User statistics” (Percona, MariaDB)
  – PERFORMANCE_SCHEMA
Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>
Join analysis: handler counters (old)
FLUSH STATUS;
=> RUN QUERY
SHOW STATUS LIKE "Handler%";
+----------------------------+-------+
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1646 |
| Handler_read_last | 0 |
| Handler_read_next | 1462 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 184 |
| Handler_tmp_update | 1096 |
| Handler_tmp_write | 183 |
| Handler_update | 0 |
| Handler_write | 0 |
Join analysis: USERSTAT by Facebook
MariaDB, Percona Server
SET GLOBAL USERSTAT=1;
FLUSH TABLE_STATISTICS;
FLUSH INDEX_STATISTICS;
=> RUN QUERY
SHOW TABLE_STATISTICS;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3 | orders | 183 | 0 | 0 |
| dbt3 | lineitem | 1279 | 0 | 0 |
| dbt3 | customer | 183 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
SHOW INDEX_STATISTICS;
+--------------+------------+-----------------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-----------------------+-----------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 |
| dbt3 | orders | i_o_totalprice | 183 |
+--------------+------------+-----------------------+-----------+
Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● A superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
  => truncate the stats tables before running the query (see the sketch below)
● Possible bug – performance schema not ignored
  – Disable by:
    UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
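A sketch of resetting the relevant summary tables right before the query under study (TRUNCATE on performance_schema summary tables resets their counters):
truncate table performance_schema.table_io_waits_summary_by_table;
truncate table performance_schema.table_io_waits_summary_by_index_usage;
-- run the query, then select from these tables as shown on the next slides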
Analyze joins via PERFORMANCE SCHEMA:
SHOW TABLE_STATISTICS analogue
select object_schema, object_name, count_read, count_write,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_table
where object_schema = 'dbt3' and count_star > 0;
+---------------+-------------+------------+-------------+
| object_schema | object_name | count_read | count_write |
+---------------+-------------+------------+-------------+
| dbt3 | customer | 183 | 0 |
| dbt3 | lineitem | 1462 | 0 |
| dbt3 | orders | 184 | 0 |
+---------------+-------------+------------+-------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
Analyze joins via PERFORMANCE SCHEMA:
SHOW INDEX_STATISTICS analogue
select object_schema, object_name, index_name, count_read,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_index_usage
where object_schema = 'dbt3' and count_star > 0
and index_name is not null;
+---------------+-------------+-----------------------+------------+
| object_schema | object_name | index_name | count_read |
+---------------+-------------+-----------------------+------------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 |
| dbt3 | orders | i_o_totalprice | 184 |
+---------------+-------------+-----------------------+------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
Batched joins
● An optimization for analytical queries
● Analytic queries shovel through lots of data
  – e.g. “average size of order in the last month”
  – or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to look at all the data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
Batched Key Access Idea
● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
● The BKA implementation accesses data in order
● It takes advantage of caches and prefetching
Batched Key access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● various BKA buffer sizes
● various sizes of the $DATE1...$DATE2 range
Batched Key Access Performance
[Chart: BKA join performance depending on buffer size – query time (sec) vs. buffer size (bytes), for query_size = 1, 2, 3, regular vs. BKA; annotations mark performance without BKA and performance with BKA given a sufficient buffer size]
● 4x-10x speedup
● The more the data, the bigger the speedup
● The buffer size setting is very important.
Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on' -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● To tune join_buffer_size further, watch
  – query performance
  – the Handler_mrr_init counter
  and increase join_buffer_size until either saturates (see the sketch below).
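A sketch of that tuning loop (Handler_mrr_init is a MariaDB status counter; adjust for your server):
flush status;
-- run the big join here
show status like 'Handler_mrr%';
-- raise join_buffer_size and repeat until the query time or the counter stops improving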
Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important
setting
● MariaDB's implementation is a superset of
MySQL's.
● Introduction
– What is an optimizer problem
– How to catch it
● old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries
ORDER BY
GROUP BY
aggregates
Aggregate functions, no GROUP BY
● COUNT, SUM, AVG, etc. need to examine all rows
  – select SUM(column) from tbl needs to examine the whole tbl
● MIN and MAX can use an index for a lookup; EXPLAIN then shows:
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
With index (o_orderdate):
  select max(o_orderdate) from orders
  select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
With index (o_orderpriority, o_orderdate):
  select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then
sort - “Using temporary; Using filesort”
Using an index to read data in order
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!
A problem with LIMIT N optimization
`orders` has 1.5 M rows
explain select * from orders order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
● A problem:
– 1.5M rows, 300K of them 'URGENT'
– Scanning by date, when will we find 10 'URGENT' rows?
– No good solution so far.
“Using filesort” strategy
● Have to read the entire first table
● For the remaining tables, LIMIT n can be applied
● ORDER BY can only use columns of tbl1.
“Using temporary; Using filesort”
● The ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting.
ORDER BY - conclusions
● Resolving ORDER BY with an index allows very efficient handling of LIMIT
  – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging
● Use sql_big_result or IGNORE INDEX FOR ORDER BY to force a different plan in that case (see the sketch below)
● Using filesort
  – Needs all ORDER BY columns in the first table
  – Takes advantage of LIMIT when joining to non-first tables
● Using temporary; Using filesort is the least efficient.
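A sketch of steering the optimizer away from the ordered-index plan, using the problem query from the earlier LIMIT slide (index name i_o_orderdate as before):
select * from orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
where o_orderpriority='1-URGENT'
order by o_orderdate desc limit 10;
-- SELECT SQL_BIG_RESULT ... is the other option mentioned above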
GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table
(Using temporary; [Using filesort]).
Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to the next group.
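A sketch of a query that can typically use a loose index scan, assuming the (o_orderpriority, o_orderdate) index from the aggregate-functions slide; when the optimization applies, EXPLAIN shows “Using index for group-by”:
select o_orderpriority, max(o_orderdate)
from orders
group by o_orderpriority;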
Execution of GROUP BY with temptable
Subqueries
Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share a common heritage (the MySQL 6.0 alpha)
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● though MySQL 5.6 handles some edge cases that MariaDB does not
Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.
What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a
subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'
Thanks!
Q & A
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerSergey Petrunya
 
Adapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cAdapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cMauro Pagano
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012Roland Bouman
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)Valeriy Kravchuk
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeWim Godden
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 

Ähnlich wie MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013 (20)

Advanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and AnalysisAdvanced Query Optimizer Tuning and Analysis
Advanced Query Optimizer Tuning and Analysis
 
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...ANALYZE for executable statements - a new way to do optimizer troubleshooting...
ANALYZE for executable statements - a new way to do optimizer troubleshooting...
 
Need for Speed: MySQL Indexing
Need for Speed: MySQL IndexingNeed for Speed: MySQL Indexing
Need for Speed: MySQL Indexing
 
Percona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuningPercona live-2012-optimizer-tuning
Percona live-2012-optimizer-tuning
 
Window functions in MySQL 8.0
Window functions in MySQL 8.0Window functions in MySQL 8.0
Window functions in MySQL 8.0
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Workshop 20140522 BigQuery Implementation
Workshop 20140522   BigQuery ImplementationWorkshop 20140522   BigQuery Implementation
Workshop 20140522 BigQuery Implementation
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Performance Schema for MySQL Troubleshooting
Performance Schema for MySQL TroubleshootingPerformance Schema for MySQL Troubleshooting
Performance Schema for MySQL Troubleshooting
 
Highload Perf Tuning
Highload Perf TuningHighload Perf Tuning
Highload Perf Tuning
 
MariaDB 10.0 Query Optimizer
MariaDB 10.0 Query OptimizerMariaDB 10.0 Query Optimizer
MariaDB 10.0 Query Optimizer
 
Adapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12cAdapting to Adaptive Plans on 12c
Adapting to Adaptive Plans on 12c
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
MariaDB 10.5 new features for troubleshooting (mariadb server fest 2020)
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
Mysql tracing
Mysql tracingMysql tracing
Mysql tracing
 
Perf Tuning Short
Perf Tuning ShortPerf Tuning Short
Perf Tuning Short
 
Beyond php - it's not (just) about the code
Beyond php - it's not (just) about the codeBeyond php - it's not (just) about the code
Beyond php - it's not (just) about the code
 
Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 

Mehr von Sergey Petrunya

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12Sergey Petrunya
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Sergey Petrunya
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesSergey Petrunya
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureSergey Petrunya
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesSergey Petrunya
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что новогоSergey Petrunya
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeSergey Petrunya
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Sergey Petrunya
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standSergey Petrunya
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18Sergey Petrunya
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3Sergey Petrunya
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLSergey Petrunya
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Sergey Petrunya
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howSergey Petrunya
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBSergey Petrunya
 

Mehr von Sergey Petrunya (20)

New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12New optimizer features in MariaDB releases before 10.12
New optimizer features in MariaDB releases before 10.12
 
Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8Improved histograms in MariaDB 10.8
Improved histograms in MariaDB 10.8
 
Improving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimatesImproving MariaDB’s Query Optimizer with better selectivity estimates
Improving MariaDB’s Query Optimizer with better selectivity estimates
 
JSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger pictureJSON Support in MariaDB: News, non-news and the bigger picture
JSON Support in MariaDB: News, non-news and the bigger picture
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Optimizer features in recent releases of other databases
Optimizer features in recent releases of other databasesOptimizer features in recent releases of other databases
Optimizer features in recent releases of other databases
 
MariaDB 10.4 - что нового
MariaDB 10.4 - что новогоMariaDB 10.4 - что нового
MariaDB 10.4 - что нового
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit hole
 
Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4Query Optimizer in MariaDB 10.4
Query Optimizer in MariaDB 10.4
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
MariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it standMariaDB 10.3 Optimizer - where does it stand
MariaDB 10.3 Optimizer - where does it stand
 
MyRocks in MariaDB | M18
MyRocks in MariaDB | M18MyRocks in MariaDB | M18
MyRocks in MariaDB | M18
 
New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3New Query Optimizer features in MariaDB 10.3
New Query Optimizer features in MariaDB 10.3
 
MyRocks in MariaDB
MyRocks in MariaDBMyRocks in MariaDB
MyRocks in MariaDB
 
Histograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQLHistograms in MariaDB, MySQL and PostgreSQL
Histograms in MariaDB, MySQL and PostgreSQL
 
Say Hello to MyRocks
Say Hello to MyRocksSay Hello to MyRocks
Say Hello to MyRocks
 
Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2Common Table Expressions in MariaDB 10.2
Common Table Expressions in MariaDB 10.2
 
MyRocks in MariaDB: why and how
MyRocks in MariaDB: why and howMyRocks in MariaDB: why and how
MyRocks in MariaDB: why and how
 
Эволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDBЭволюция репликации в MySQL и MariaDB
Эволюция репликации в MySQL и MariaDB
 

Kürzlich hochgeladen

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Kürzlich hochgeladen (20)

Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

MySQL/MariaDB query optimizer tuning tutorial from Percona Live 2013

  • 8. 8 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● use performance_schema ● Many ways to analyze via queries – events_statements_summary_by_digest ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait ● digest_text, digest ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join – events_statements_history ● sql_text, digest_text, digest ● timer_start, timer_end, timer_wait ● rows_examined, created_tmp_disk_tables, select_full_join 8
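A minimal sketch of putting these columns to work (using only the performance_schema table and column names listed above): the normalized statements that examined the most rows can be ranked directly from the digest summary.
select digest_text, count_star, sum_rows_examined,
       sum_timer_wait/1000000000000 as total_exec_sec  -- timers are in picoseconds
from performance_schema.events_statements_summary_by_digest
order by sum_rows_examined desc
limit 10;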
  • 9. 9 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] • Modified Q18 from DBT3 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > ? and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; • App executes Q18 many times with ? = 550000, 500000, 400000, ... 9
  • 10. 10 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Find candidate slow queries ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc ● Complex conditions: max execution time > X sec OR min/max time vary a lot: select max_timer_wait/avg_timer_wait as max_ratio, avg_timer_wait/min_timer_wait as min_ratio from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2\G
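The "simple tests" mentioned above can be expressed the same way; a sketch, again using only the digest-summary columns named earlier:
select digest_text, count_star, sum_select_full_join, sum_created_tmp_disk_tables
from performance_schema.events_statements_summary_by_digest
where sum_select_full_join > 0
   or sum_created_tmp_disk_tables > 0
order by sum_timer_wait desc;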
  • 11. 11 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] *************************** 5. row *************************** DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ? COUNT_STAR: 3 SUM_TIMER_WAIT: 3251758347000 MIN_TIMER_WAIT: 3914209000 → 0.0039 sec AVG_TIMER_WAIT: 1083919449000 MAX_TIMER_WAIT: 3204044053000 → 3.2 sec SUM_LOCK_TIME: 555000000 SUM_ROWS_SENT: 25 SUM_ROWS_EXAMINED: 0 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 3 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_RANGE: 3 SUM_SELECT_SCAN: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 25 SUM_SORT_SCAN: 3 SUM_NO_INDEX_USED: 0 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 1970-01-01 03:38:27 LAST_SEEN: 1970-01-01 03:38:43 max_ratio: 2.9560 min_ratio: 276.9192 High variance of execution time
  • 12. 12 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Check the actual queries and constants ● The events_statements_history table select timer_wait/1000000000000 as exec_time, sql_text from events_statements_history where digest in (select digest from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2) order by timer_wait;
  • 13. 13 07:48:08 AM Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] +-----------+-----------------------------------------------------------------------------------+ | exec_time | sql_text | +-----------+-----------------------------------------------------------------------------------+ | 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | | 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | | 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | +-----------+-----------------------------------------------------------------------------------+ Observation: orders.o_totalprice > ? is less and less selective
  • 14. 14 07:48:08 AM Actions after finding the slow query • Bad query plan – Rewrite the query – Force a good query plan • Bad optimizer settings – Do tuning • Query is inherently complex – Don't waste time with it – Look for other solutions.
  • 15. 15 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 16. 16 07:48:08 AM Consider a simple select • 15M rows were scanned, 19 rows in output • Query plan seems inefficient – (note: this logic doesn't directly apply to group/order by queries). select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ 19 rows in set (7.65 sec) ● Check the query plan: ● Run the query:
  • 17. 17 07:48:08 AM Query plan analysis • Entire table is scanned • WHERE condition checked after records are read – Not used to limit #examined rows. +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506'
  • 18. 18 07:48:08 AM Let's add an index • Outcome – Down to reading 300K rows – Still, 300K >> 19 rows. alter table orders add key i_o_orderdate (o_orderdate); select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 19 rows in set (0.76 sec) ● Query time:
  • 19. 19 07:48:08 AM Finding out which indexes to add ● index (o_orderdate) ● index (o_clerk) Check selectivity of conditions that will use the index select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' select count(*) from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'; 306322 rows select count(*) from orders where o_clerk='Clerk#000009506' 1507 rows.
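For completeness, the combined selectivity of both conditions can be checked the same way; per the earlier EXPLAIN and result set, this should count the same 19 rows the query returns:
select count(*) from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
  and o_clerk='Clerk#000009506';  -- 19 rows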
  • 20. 20 07:48:08 AM +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where| +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where| +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ Try adding composite indexes ● index (o_clerk, o_orderdate) ● index (o_orderdate, o_clerk) Bingo! 100% efficiency Much worse! • If a condition uses multiple columns, a composite index will be most efficient • Order of columns matters – Explanation of why is outside the scope of this tutorial. Covered in last year's tutorial
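The two composite indexes compared above were presumably created with statements along these lines (the index names i_o_clerk_date and i_o_date_clerk are taken from the EXPLAIN outputs; the exact DDL is not shown in the deck):
alter table orders add index i_o_clerk_date (o_clerk, o_orderdate);
alter table orders add index i_o_date_clerk (o_orderdate, o_clerk);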
  • 21. 21 07:48:08 AM Conditions must be in SARGable form • Condition must represent a range • It must have a form that is recognized by the optimizer • Recognized: o_orderDate BETWEEN '1992-06-01' and '1992-06-30'; o_clerk='Clerk#000009506'; o_clerk LIKE 'Clerk#000009506' (no leading wildcard); column IN (1,10,15,21, ...); (col1, col2) IN ( (1,1), (2,2), (3,3), …) • Not recognized: day(o_orderDate)=1992 and month(o_orderdate)=6; TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06'); o_clerk LIKE '%Clerk#000009506%'
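As an illustration of the rule (a sketch, not from the original slides): a condition that wraps the column in a function can often be rewritten into a SARGable range over the bare column.
-- not SARGable: the function call hides o_orderDATE from the range optimizer
select * from orders
where TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06');
-- SARGable rewrite: an index on o_orderDATE can now be used for a range scan
select * from orders
where o_orderDATE between '1992-06-06' and '1992-07-06';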
  • 22. 22 07:48:08 AM New in MySQL-5.6: optimizer_trace ● Lets you see the ranges set optimizer_trace=1; explain select * from orders where o_orderDATE between '1992-06-01' and '1992-07-03' and o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04') select * from information_schema.optimizer_trace\G ● Will print a big JSON struct ● Search for range_scan_alternatives.
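If set optimizer_trace=1 is not accepted on a given 5.6 build, the documented form of the switch uses the enabled flag; a hedged equivalent of the sequence above:
set optimizer_trace='enabled=on';
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03'
  and o_orderdate not in ('1992-01-01','1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
set optimizer_trace='enabled=off';  -- tracing adds overhead, switch it back off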
  • 23. 23 07:48:08 AM New in MySQL-5.6: optimizer_trace ... "range_scan_alternatives": [ { "index": "i_o_orderdate", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 319082, "cost": 382900, "chosen": true }, { "index": "i_o_date_clerk", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 406336, "cost": 487605, "chosen": false, "cause": "cost" } ], ... ● Considered ranges are shown in range_scan_alternatives section ● This is actually original use case of optimizer_trace ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug) ● Still, very useful.
  • 24. 24 07:48:08 AM Source of #rows estimates for range select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ ? • “records_in_range” estimate • Done by diving into index • Usually is fairly accurate • Not affected by ANALYZE TABLE.
  • 25. 25 07:48:08 AM Simple selects: conclusions • Efficiency == “#rows_scanned is close to #rows_returned” • Indexes and WHERE conditions reduce #rows scanned • Index estimates are usually accurate • Multi-column indexes – “handle” conditions on multiple columns – Order of columns in the index matters • optimizer_trace allows to view the ranges – But misrepresents ranges over multi-column indexes.
  • 26. 26 07:48:08 AM Now, we will skip some topics One can also speed up simple selects with ● index_merge access method ● index access method ● Index Condition Pushdown We don't have time for these now, check out last year's tutorial.
  • 27. 27 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 28. 28 07:48:08 AM A simple join select * from customer, orders where c_custkey=o_custkey • “Customers with their orders”
  • 29. 29 07:48:08 AM Execution: Nested Loops join select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • Complexity: – Scans table customer – For each record in customer, scans table orders • Is this ok?
  • 30. 30 07:48:08 AM Execution: Nested loops join (2) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
  • 31. 31 07:48:08 AM Execution: Nested loops join (3) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ rows to read from customer rows to read from orders c_custkey=o_custkey
  • 32. 32 07:48:08 AM Execution: Nested loops join (4) select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • Scan a 1,493,361-row table 148,749 times – Consider 1,493,361 * 148,749 row combinations • Is this query inherently complex? – We know each customer has his own orders – size(customer x orders)= size(orders) – Lower bound is 1,493,361 + 148,749 + costs to match customer<->order.
  • 33. 33 07:48:08 AM Using index for join: ref access alter table orders add index i_o_custkey(o_custkey) select * from customer, orders where c_custkey=o_custkey
  • 34. 34 07:48:08 AM ref access - analysis +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ select * from customer, orders where c_custkey=o_custkey ● One ref lookup scans 7 rows. ● In total: 7 * 148,749=1,041,243 rows – `orders` has 1.4M rows – no redundant reads from `orders` ● The whole query plan – Reads all customers – Reads 1M orders (of 1.4M) ● Efficient!
  • 35. 35 07:48:08 AM Conditions that can be used for ref access ● Can use equalities – tbl.key=other_table.col – tbl.key=const – tbl.key IS NULL ● For multipart keys, will use largest prefix – keypart1=... AND keypart2= … AND keypartK=... .
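A small sketch of the "largest prefix" rule, using a hypothetical composite index that is not part of the DBT-3 schema used elsewhere in this deck:
-- hypothetical index, for illustration only
alter table orders add index i_o_custkey_clerk (o_custkey, o_clerk);
-- equalities on both keyparts: ref can use the whole (o_custkey, o_clerk) prefix
select * from customer, orders
where o_custkey = c_custkey and o_clerk = 'Clerk#000009506';
-- with only o_custkey = c_custkey in the WHERE, ref would use just the first keypart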
  • 36. 36 07:48:08 AM Conditions that can't be used for ref access ● Doesn't work for non-equalities t1.key BETWEEN t2.col1 AND t2.col2 ● Doesn't work for OR-ed equalities t1.key=t2.col1 OR t1.key=t2.col2 – Except for ref_or_null t1.key=... OR t1.key IS NULL ● Doesn't “combine” ref and range access – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col .
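The ref_or_null exception mentioned above covers patterns like the following (a sketch, assuming an index on o_clerk exists):
select * from orders
where o_clerk = 'Clerk#000009506' or o_clerk is null;
-- eligible for ref_or_null; an OR of two different equalities is not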
  • 37. 37 07:48:08 AM Is ref always efficient? ● Efficient, if the column has many different values – Best case – unique index (eq_ref) ● A few different values – not useful ● Skewed distribution: depends on which part of the data the join touches (figure: three example value distributions, labeled good, bad and depends)
  • 38. 38 07:48:08 AM ref access estimates - index statistics • How many rows will match tbl.key_column = $value for an arbitrary $value? • Index statistics show keys from orders where key_name='i_o_custkey' *************************** 1. row *************** Table: orders Non_unique: 1 Key_name: i_o_custkey Seq_in_index: 1 Column_name: o_custkey Collation: A Cardinality: 214462 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE show table status like 'orders' *************************** 1. row **** Name: orders Engine: InnoDB Version: 10 Row_format: Compact Rows: 1495152 Avg_row_length: 133 Data_length: 199966720 Max_data_length: 0 Index_length: 122421248 Data_free: 6291456 ... average = Rows /Cardinality = 1495152 / 214462 = 6.97.
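The same average can also be computed from the INFORMATION_SCHEMA views instead of SHOW output; a sketch using the standard TABLES and STATISTICS tables:
select t.table_rows / s.cardinality as avg_rows_per_key
from information_schema.tables t
join information_schema.statistics s
  on s.table_schema = t.table_schema and s.table_name = t.table_name
where t.table_schema = database()
  and s.table_name = 'orders'
  and s.index_name = 'i_o_custkey'
  and s.seq_in_index = 1;  -- cardinality of the first (only) keypart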
  • 39. 39 07:48:08 AM ref access – conclusions ● Based on t.key=... equality conditions ● Can make joins very efficient ● Relies on index statistics for estimates.
  • 40. 40 07:48:08 AM Optimizer statistics ● MySQL/Percona Server – Index statistics – Persistent/transient InnoDB stats ● MariaDB – Index statistics, persistent/transient ● Same as Percona Server (via XtraDB) – Persistent, engine-independent, index-independent statistics.
  • 41. 41 07:48:08 AM Index statistics ● Cardinality makes it possible to calculate a table-wide average #rows-per-key-prefix ● It is a statistical value (inexact) ● The exact collection procedure depends on the storage engine – InnoDB – random sampling – MyISAM – index scan – Engine-independent – index scan.
  • 42. 42 07:48:08 AM Index statistics in MySQL 5.6 ● Sample [8] random index leaf pages ● Table statistics (stored) – rows - estimated number of rows in a table – Other stats not used by optimizer ● Index statistics (stored) – fields - #fields in the index – rows_per_key - rows per 1 key value, per prefix fields ([1 column value], [2 columns value], [3 columns value], …) – Other stats not used by optimizer.
  • 43. 43 07:48:08 AM Index statistics updates ● Statistics are updated when: – ANALYZE TABLE tbl_name [, tbl_name] … – SHOW TABLE STATUS, SHOW INDEX – Access to INFORMATION_SCHEMA.[TABLES| STATISTICS] – A table is opened for the first time (after server restart) – A table has changed >10% – When InnoDB Monitor is turned ON.
  • 44. 44 07:48:08 AM Displaying optimizer statistics ● MySQL 5.5, MariaDB 5.3, and older – Issue SQL statements to count rows/keys – Indirectly, look at EXPLAIN for simple queries ● MariaDB 5.5, Percona Server 5.5 (using XtraDB) – information_schema.[innodb_index_stats, innodb_table_stats] – Read-only, always visible ● MySQL 5.6 – mysql.[innodb_index_stats, innodb_table_stats] – User updateable – Only available if innodb_analyze_is_persistent=ON ● MariaDB 10.0 – Persistent updateable tables mysql.[index_stats, column_stats, table_stats] – User updateable – + current XtraDB mechanisms.
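For example, the MySQL 5.6 tables named above can be queried directly; a sketch (stat_name values such as n_diff_pfx01 hold the per-prefix distinct-value estimates):
select index_name, stat_name, stat_value, last_update
from mysql.innodb_index_stats
where database_name = 'dbt3sf1' and table_name = 'orders';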
  • 45. 45 07:48:08 AM Plan [in]stability ● Statistics may vary a lot (orders) MariaDB [dbt3]> select * from information_schema.innodb_index_stats; +------------+-----------------+--------------+ +---------------+ | table_name | index_name | rows_per_key | | rows_per_key | error (actual) +------------+-----------------+--------------+ +---------------+ | partsupp | PRIMARY | 3, 1 | | 4, 1 | 25% | partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4) | partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80) | orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234) | orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15) | lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477) +------------+-----------------+--------------+ +---------------+ MariaDB [dbt3]> select * from information_schema.innodb_table_stats; +-----------------+----------+ +----------+ | table_name | rows | | rows | +-----------------+----------+ +----------+ | partsupp | 6524766 | | 9101065 | 28% (8000000) | orders | 15039855 | ==> | 14948612 | 0.6% (15000000) | lineitem | 60062904 | | 59992655 | 0.1% (59986052) +-----------------+----------+ +----------+ .
  • 46. 46 07:48:08 AM Controlling statistics (MySQL 5.6) ● Persistent and user-updateable InnoDB statistics – innodb_analyze_is_persistent = ON, – updated manually by ANALYZE TABLE or – automatically by innodb_stats_auto_recalc = ON ● Control the precision of sampling [default 8] – innodb_stats_persistent_sample_pages, – innodb_stats_transient_sample_pages ● No new statistics compared to older versions.
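A sketch of the knobs listed above (note: current 5.6 releases expose the persistence switch as innodb_stats_persistent; the innodb_analyze_is_persistent name on the slide comes from earlier 5.6 preview builds):
set global innodb_stats_persistent = ON;               -- keep statistics across restarts
set global innodb_stats_auto_recalc = ON;              -- recalculate after ~10% of rows change
set global innodb_stats_persistent_sample_pages = 64;  -- sample more leaf pages for better precision
analyze table orders;                                  -- force an immediate refresh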
  • 47. 47 07:48:08 AM Controlling statistics (MariaDB 10.0) Current XtraDB index statistics + ● Engine-independent, persistent, user-updateable statistics ● Precise ● Additional statistics per column (even when there is no index): – min_value, max_value: minimum/maximum value per column – nulls_ratio: fraction of null values in a column – avg_length: average size of values in a column – avg_frequency: average number of rows with the same value.
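A sketch of collecting and viewing the engine-independent statistics in MariaDB 10.0, assuming the use_stat_tables setting and the mysql.column_stats table named on the neighbouring slides:
set use_stat_tables = 'preferably';
analyze table orders persistent for all;  -- collect table, column and index statistics
select column_name, min_value, max_value, nulls_ratio, avg_length, avg_frequency
from mysql.column_stats
where db_name = database() and table_name = 'orders';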
  • 48. 48 07:48:08 AM Join condition pushdown
  • 49. 49 07:48:08 AM Join condition pushdown select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+.
  • 52. 52 07:48:08 AM Join condition pushdown select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ ● Conjunctive (ANDed) conditions are split into parts ● Each part is attached as early as possible – Either as “Using where” – Or as table access method.
  • 53. 53 07:48:08 AM Observing join condition pushdown EXPLAIN: { "query_block": { "select_id": 1, "nested_loop": [ { "table": { "table_name": "orders", "access_type": "ALL", "possible_keys": [ "i_o_custkey" ], "rows": 1499715, "filtered": 100, "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))" } }, { "table": { "table_name": "customer", "access_type": "eq_ref", "possible_keys": [ "PRIMARY" ], "key": "PRIMARY", "used_key_parts": [ "c_custkey" ], "key_length": "4", "ref": [ "dbt3sf1.orders.o_custkey" ], "rows": 1, "filtered": 100, "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))" } ● Before mysql-5.6: EXPLAIN shows only “Using where” – The condition itself only visible in debug trace ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions.
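The JSON plan shown above is produced with the MySQL 5.6 syntax, roughly:
explain format=json
select * from customer, orders
where c_custkey = o_custkey
  and c_acctbal < -500
  and o_orderpriority = '1-URGENT'\G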
  • 54. 54 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ First table, “customer” ● type=ALL, 150 K rows ● select count(*) from customer where c_acctbal < -500 gives 6804. ● alter table customer add index (c_acctbal).
  • 55. 55 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; First table, “customer” ● type=ALL, 150 K rows ● select count(*) from customer where c_acctbal < -500 gives 6804. ● alter table customer add index (c_acctbal) +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ Now, access to 'customer' is efficient.
  • 56. 56 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; Second table, “orders” ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' ● ref access uses only c_custkey=o_custkey ● What about o_orderpriority='1-URGENT'?. +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
• 57. 57 07:48:08 AM Selectivity of o_orderpriority='1-URGENT' ● select count(*) from orders – 1.5M rows ● select count(*) from orders where o_orderpriority='1-URGENT' – 300K rows ● 300K / 1.5M = 0.2
  • 58. 58 07:48:08 AM Reasoning about join plan efficiency select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; Second table, “orders” ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' ● ref access uses only c_custkey=o_custkey ● What about o_orderpriority='1-URGENT'? Selectivity= 0.2 – Can examine 7*0.2=1.4 rows, 6802 times if we add an index: alter table orders add index (o_custkey, o_orderpriority) or alter table orders add index (o_orderpriority, o_custkey) +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
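A minimal sketch of applying that suggestion (the index name is an assumption, not from the slides); after the ALTER, key_len in EXPLAIN should grow to cover both key parts:
-- hypothetical index name; with two equality conditions either column order works
ALTER TABLE orders ADD INDEX i_o_custkey_priority (o_custkey, o_orderpriority);
-- re-check the plan: orders should now be looked up using both conditions
EXPLAIN SELECT * FROM customer, orders
WHERE c_custkey = o_custkey
  AND c_acctbal < -500
  AND o_orderpriority = '1-URGENT';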
  • 59. 59 07:48:08 AM Reasoning about join plan efficiency - summary Basic* approach to evaluation of join plan efficiency: for each table $T in the join order { Look at conditions attached to table $T (condition must use table $T, may also use previous tables) Does access method used with $T make a good use of attached conditions? } +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ |1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ * some other details may also affect join performance
  • 61. 61 07:48:08 AM Attached conditions ● Ideally, should be used for table access ● Not all conditions can be used [at the same time] – Unused ones are still useful – They reduce number of scans for subsequent tables select * from customer, orders where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
• 62. 62 07:48:08 AM Informing optimizer about attached conditions Currently: a range access that's too expensive to use +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ |id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra | +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ |1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where| |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where| +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ explain extended select * from customer, orders where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT'; ● `orders` will be scanned 150081 * 36.22% = 54359 times ● This reduces the estimated cost of the join – Has an effect when comparing potential join plans ● => The index on c_acctbal is not used for access, but it still may help the optimizer.
  • 63. 63 07:48:08 AM Attached condition selectivity ● Unused indexes provide info about selectivity – Works, but very expensive ● MariaDB 10.0 has engine-independent statistics – Index statistics – Non-indexed Column statistics ● Histograms – Further info: Tomorrow, 2:20 pm @ Ballroom D Igor Babaev Engine-independent persistent statistics with histograms in MariaDB.
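A minimal sketch of collecting engine-independent statistics in MariaDB 10.0 (the variable values are illustrative, not recommendations):
SET SESSION use_stat_tables = 'preferably';   -- let the optimizer prefer the collected stats
SET SESSION histogram_size = 100;             -- collect histograms with up to 100 buckets
ANALYZE TABLE orders PERSISTENT FOR ALL;      -- gather table, column and index statistics
-- the results land in the mysql.table_stats / column_stats / index_stats tables
SELECT * FROM mysql.column_stats WHERE table_name = 'orders';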
• 64. 64 07:48:08 AM How to check whether the query plan matches reality
• 65. 65 07:48:08 AM Check if the query plan is realistic ● EXPLAIN shows what the optimizer expects. It may be wrong – Out-of-date index statistics – Non-uniform data distribution ● Other DBMSs: EXPLAIN ANALYZE ● MySQL: no equivalent. Instead, we have – Handler counters – “User statistics” (Percona, MariaDB) – PERFORMANCE_SCHEMA
  • 66. 66 07:48:08 AM Join analysis: example query (Q18, DBT3) <reset counters> select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; <collect statistics>
  • 67. 67 07:48:08 AM Join analysis: handler counters (old) FLUSH STATUS; => RUN QUERY SHOW STATUS LIKE "Handler%"; +----------------------------+-------+ | Handler_mrr_key_refills | 0 | | Handler_mrr_rowid_refills | 0 | | Handler_read_first | 0 | | Handler_read_key | 1646 | | Handler_read_last | 0 | | Handler_read_next | 1462 | | Handler_read_prev | 0 | | Handler_read_rnd | 10 | | Handler_read_rnd_deleted | 0 | | Handler_read_rnd_next | 184 | | Handler_tmp_update | 1096 | | Handler_tmp_write | 183 | | Handler_update | 0 | | Handler_write | 0 |
  • 68. 68 07:48:08 AM Join analysis: USERSTAT by Facebook MariaDB, Percona Server SET GLOBAL USERSTAT=1; FLUSH TABLE_STATISTICS; FLUSH INDEX_STATISTICS; => RUN QUERY SHOW TABLE_STATISTICS; +--------------+------------+-----------+--------------+-------------------------+ | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes | +--------------+------------+-----------+--------------+-------------------------+ | dbt3 | orders | 183 | 0 | 0 | | dbt3 | lineitem | 1279 | 0 | 0 | | dbt3 | customer | 183 | 0 | 0 | +--------------+------------+-----------+--------------+-------------------------+ SHOW INDEX_STATISTICS; +--------------+------------+-----------------------+-----------+ | Table_schema | Table_name | Index_name | Rows_read | +--------------+------------+-----------------------+-----------+ | dbt3 | customer | PRIMARY | 183 | | dbt3 | lineitem | i_l_orderkey_quantity | 1279 | | dbt3 | orders | i_o_totalprice | 183 | +--------------+------------+-----------------------+-----------+
  • 69. 69 07:48:08 AM Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● summary tables with read/write statistics – table_io_waits_summary_by_table – table_io_waits_summary_by_index_usage ● Superset of the userstat tables ● More overhead ● Not possible to associate statistics with a query => truncate stats tables before running a query ● Possible bug – performance schema not ignored – Disable by UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
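For example, a sketch of resetting the relevant summary tables right before the query under analysis:
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_table;
TRUNCATE TABLE performance_schema.table_io_waits_summary_by_index_usage;
-- ... now run the query being analyzed, then read the summary tables as on the next slides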
  • 70. 70 07:48:08 AM Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue select object_schema, object_name, count_read, count_write, sum_timer_read, sum_timer_write, ... from table_io_waits_summary_by_table where object_schema = 'dbt3' and count_star > 0; +---------------+-------------+------------+-------------+ | object_schema | object_name | count_read | count_write | +---------------+-------------+------------+-------------+ | dbt3 | customer | 183 | 0 | | dbt3 | lineitem | 1462 | 0 | | dbt3 | orders | 184 | 0 | +---------------+-------------+------------+-------------+ +----------------+-----------------+ | sum_timer_read | sum_timer_write | ... +----------------+-----------------+ | 8326528406 | 0 | | 12117332778 | 0 | | 7946312812 | 0 | +----------------+-----------------+
  • 71. 71 07:48:08 AM Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue select object_schema, object_name, index_name, count_read, sum_timer_read, sum_timer_write, ... from table_io_waits_summary_by_index_usage where object_schema = 'dbt3' and count_star > 0 and index_name is not null; +---------------+-------------+-----------------------+------------+ | object_schema | object_name | index_name | count_read | +---------------+-------------+-----------------------+------------+ | dbt3 | customer | PRIMARY | 183 | | dbt3 | lineitem | i_l_orderkey_quantity | 1462 | | dbt3 | orders | i_o_totalprice | 184 | +---------------+-------------+-----------------------+------------+ +----------------+-----------------+ | sum_timer_read | sum_timer_write | ... +----------------+-----------------+ | 8326528406 | 0 | | 12117332778 | 0 | | 7946312812 | 0 | +----------------+-----------------+
• 72. 72 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
• 73. 73 07:48:08 AM Batched joins ● Optimization for analytical queries ● Analytic queries shovel through lots of data – e.g. “average size of order in the last month” – or “pairs of goods purchased together” ● Indexes, etc. won't help when you really need to look at all the data ● More data means a greater chance of being I/O-bound ● Solution: batched joins
• 74. 74 07:48:08 AM Batched Key Access Idea [slides 74–79: diagram-only slides building up the idea step by step]
  • 80. 80 07:48:08 AM Batched Key Access Idea ● Non-BKA join hits data at random ● Caches are not used efficiently ● Prefetching is not useful
• 81. 81 07:48:08 AM Batched Key Access Idea ● The BKA implementation accesses data in order ● Takes advantage of caches and prefetching
• 82. 82 07:48:08 AM Batched Key Access effect set join_cache_level=6; select max(l_extendedprice) from orders, lineitem where l_orderkey=o_orderkey and o_orderdate between $DATE1 and $DATE2 The benchmark was run with ● Various BKA buffer sizes ● Various sizes of the $DATE1...$DATE2 range
• 83. 83 07:48:08 AM Batched Key Access Performance [chart: “BKA join performance depending on buffer size” – query time (sec) vs. buffer size (bytes) for query_size=1/2/3, regular vs. BKA; annotations: “Performance without BKA”, “Performance with BKA, given sufficient buffer size”] ● 4x-10x speedup ● The more the data, the bigger the speedup ● Buffer size setting is very important.
• 84. 84 07:48:08 AM Batched Key Access settings ● Needs to be turned on set join_buffer_size= 32*1024*1024; set join_cache_level=6; -- MariaDB set optimizer_switch='batched_key_access=on' -- MySQL 5.6 set optimizer_switch='mrr=on'; set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only ● Further join_buffer_size tuning: watch – Query performance – the Handler_mrr_init counter and increase join_buffer_size until either of them saturates.
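A sketch of that tuning loop (counter names as reported by MariaDB; the buffer value is only an example):
SET join_buffer_size = 32*1024*1024;
FLUSH STATUS;
-- ... run the big join ...
SHOW STATUS LIKE 'Handler_mrr%';   -- watch Handler_mrr_init and the *_refills counters
-- if the refill counters keep growing, increase join_buffer_size and repeat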
  • 85. 85 07:48:08 AM Batched Key Access - conclusions ● Targeted at big joins ● Needs to be enabled manually ● @@join_buffer_size is the most important setting ● MariaDB's implementation is a superset of MySQL's.
• 86. 86 07:48:08 AM ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 87. 87 07:48:08 AM ORDER BY GROUP BY aggregates
  • 88. 88 07:48:08 AM Aggregate functions, no GROUP BY ● COUNT, SUM, AVG, etc need to examine all rows select SUM(column) from tbl needs to examine the whole tbl. ● MIN and MAX can use index for lookup +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra | +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ |1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away| +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+ index (o_orderdate) select max(o_orderdate) from orders select min(o_orderdate) from orders where o_orderdate > '1995-05-01' select max(o_orderdate) from orders where o_orderpriority='1-URGENT' index (o_orderpriority, o_orderdate)
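As a sketch for the last query on the slide, assuming the composite index mentioned there (the index name is made up):
ALTER TABLE orders ADD INDEX i_o_priority_date (o_orderpriority, o_orderdate);
-- MAX() can now be answered by a single dive to the end of the '1-URGENT' index range;
-- EXPLAIN should report "Select tables optimized away"
EXPLAIN SELECT MAX(o_orderdate) FROM orders WHERE o_orderpriority = '1-URGENT';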
  • 89. 89 07:48:08 AM ORDER BY … LIMIT Three algorithms ● Use an index to read in order ● Read one table, sort, join - “Using filesort” ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
  • 90. 90 07:48:08 AM Using index to read data in order ● No special indication in EXPLAIN output ● LIMIT n: as soon as we read n records, we can stop!
  • 91. 91 07:48:08 AM A problem with LIMIT N optimization `orders` has 1.5 M rows explain select * from orders order by o_orderdate desc limit 10; +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra| +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ |1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | | +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10; +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ |1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where| +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ ● A problem: – 1.5M rows, 300K of them 'URGENT' – Scanning by date, when will we find 10 'URGENT' rows? – No good solution so far.
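One workaround for this particular query is hinted at on the ORDER BY conclusions slide further below: force the optimizer away from the date-ordered scan (a sketch):
SELECT * FROM orders IGNORE INDEX FOR ORDER BY (i_o_orderdate)
WHERE o_orderpriority = '1-URGENT'
ORDER BY o_orderdate DESC LIMIT 10;
-- the server then typically filters on the WHERE clause first and sorts the survivors (filesort)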
  • 92. 92 07:48:08 AM Using filesort strategy ● Have to read the entire first table ● For remaining, can apply LIMIT n ● ORDER BY can only use columns of tbl1.
  • 93. 93 07:48:08 AM Using temporary; Using filesort ● ORDER BY clause can use columns of any table ● LIMIT is applied only after executing the entire join and sorting.
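A sketch contrasting the two filesort strategies on the DBT-3 tables (the exact plans depend on the join order the optimizer picks):
-- ORDER BY on a first-table column: typically "Using filesort" only, LIMIT can cut the join short
EXPLAIN SELECT c_name, o_totalprice FROM customer, orders
WHERE c_custkey = o_custkey ORDER BY c_acctbal LIMIT 10;
-- ORDER BY on a non-first-table column: typically "Using temporary; Using filesort",
-- the whole join result is materialized and sorted before LIMIT applies
EXPLAIN SELECT c_name, o_totalprice FROM customer, orders
WHERE c_custkey = o_custkey ORDER BY o_totalprice LIMIT 10;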
• 94. 94 07:48:08 AM ORDER BY - conclusions ● Resolving ORDER BY with an index allows very efficient handling of LIMIT – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging ● Use sql_big_result, IGNORE INDEX FOR ORDER BY ● Using filesort – Needs all ORDER BY columns in the first table – Takes advantage of LIMIT when joining to non-first tables ● Using temporary; Using filesort is the least efficient.
  • 95. 95 07:48:08 AM GROUP BY strategies There are three strategies ● Ordered index scan ● Loose Index Scan (LooseScan) ● Groups table (Using temporary; [Using filesort]).
• 96. 96 07:48:08 AM Ordered index scan ● Groups are enumerated one after another ● Can compute aggregates on the fly ● Loose index scan is also able to jump to the next group.
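A sketch of a query shape that can use the loose scan, assuming an index on (o_orderpriority, o_orderdate) exists:
-- EXPLAIN is expected to show "Using index for group-by" when the loose scan applies
EXPLAIN SELECT o_orderpriority, MIN(o_orderdate), MAX(o_orderdate)
FROM orders
GROUP BY o_orderpriority;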
  • 97. 97 07:48:08 AM Execution of GROUP BY with temptable
• 99. 99 07:48:08 AM Subquery optimizations ● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries” ● Queries that caused most of the pain – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins – SELECT … FROM (SELECT …) - derived tables ● MariaDB 5.3 and MySQL 5.6 – Share common ancestry (the MySQL 6.0 alpha) – Huge (100x, 1000x) speedups for painful areas – Other kinds of subqueries received a speedup, too – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations ● 5.6 handles some otherwise-unhandled edge cases, too
  • 100. 100 07:48:08 AM Tuning for subqueries ● “Before”: one execution strategy – No tuning possible ● “After”: similar to joins – Reasonable execution strategies supported – Need indexes – Need selective conditions – Support batching in most important cases ● Should be better 9x% of the time.
• 101. 101 07:48:08 AM What if it still picks a poor query plan? For both MariaDB and MySQL: ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table ● Google “site:kb.askmonty.org $subquery_keyword” or https://kb.askmonty.org/en/subquery-optimizations-map/ ● Find which optimization it was ● set optimizer_switch='$subquery_optimization=off'
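For example, a sketch of that workflow on a DBT-3-style semi-join subquery (which strategy shows up, and therefore which switch to flip, will vary):
EXPLAIN EXTENDED
SELECT c_name FROM customer
WHERE c_custkey IN (SELECT o_custkey FROM orders WHERE o_orderpriority = '1-URGENT');
-- suppose the plan shows a materialized subquery and it performs poorly:
SET optimizer_switch = 'materialization=off';
-- re-run EXPLAIN and the query, then compare timings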