SQL Tuning, takes 3 to tango

SQL Tuning
Takes three to tango
Mauro Pagano

Mauro Pagano @mautro
• Worked at Oracle, been at Enkitec (AEG) a while now
• Spend most of the time on performance problems
• Free tools: SQLd360, TUNAs360 etc (at Oracle: SQLT, SQLHC etc)
• Strong British accent
• “Newbie old fart” (approved by Bryn)

Poll time – “Historical” SQL Tuning
1. Do you use AWR for SQL Tuning?
• Do you start from it?
• Why?
• How do you use it?
2. Do you use ASH for SQL Tuning?
• Why?
3. Do you use SQL Monitoring for SQL Tuning?
• Why?

What are we doing here today?
• Oracle has a ton of diagnostics (awesome!)
• People tend to rely on GV$ / AWR more than ASH
• Some questions harder to answer (if possible) from GV$/AWR data
• Today’s goal is:
• Present scenarios where multiple sources needed
• Explain why & where to gather the missing info, make sense out of it
• Knowing what each source represents allows better use of it
• Focus is on diagnostics

What are we NOT doing here today?
• Argue about which one is better
• They complement each other, not exclude each other
• Need all (often AWR+ASH is enough) to have a full picture
• One could be enough depending on the case, still the other adds value
• Provide solutions to the scenarios presented
• Today it’s about diagnostics, not problem X or Y
• Once the behavior is identified correctly, the solution is often easier to find (if it exists)
• Talk about license / cost associated with the Packs used

Some (incorrect) terminology we’ll use today
• GV$: all views on X$, except X$ASH
• For example, GV$SQL
• AWR: all the tables in AWR except ASH
• For example, DBA_HIST_SQLSTAT and DBA_HIST_SQL_PLAN
• ASH: GV$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY
• SQL Mon: SQL Monitor
• Both raw data (GV$) and reports (current and historical)

How are AWR / (historical) ASH populated?
• AWR takes a picture every N minutes (or manual)
• Source views store accumulated data, the snapshot takes a picture of that at time T
• Historical ASH keeps a filtered subset of the samples in memory ASH
• Filtered samples may show info not important enough to show up in the accumulated data
• Source data includes info for each active session individually (not aggregated)
• Ratio is generally 1:10 (historical : in-memory samples)
• X$KEW[A|R]* help to narrow down what to collect
• Might make things a little harder to “break” in isolation 

Why do we need both?
• AWR: knows exactly how much water
• ASH: knows roughly who, when, how…

ASH samples - Why can we live with it?
[Chart: bars for elapsed time per execution and frequency of execution]
How you move these bars depends on your app

Before we begin…
• Every case is artificial, represents a real-life case without the noise
• Case itself is just a means to an end, not really the focus of the scenario
• Cases build on each other, start simple and get into a little more complex
• OF COURSE I’m cheating!
• What we’ll see can be applied to any environment
• Knowing how to interpret and spot things helps in dev too
• Charts used just to present large amount of info in small space
• Not trying to push for any specific tool
• Get into DataViz anyway, it makes life so much easier!

Two tables, all we need
-- 8x dba_objects
create table t_case1 as
select *
from dba_objects,
(select rownum n1 from dual connect by rownum <= 8);
create index t_case1_objtype on t_case1(object_type);
-- 32x dba_objects (4x t_case1)
create table t_case3 as
select *
from t_case1,
(select rownum n2 from dual connect by rownum <= 4);

1 - How long does my SQL take? Total
<<removed plan_hash_value&child_number just to make it fit, one child only>>
select elapsed_time, buffer_gets, executions,
trunc(elapsed_time/executions,2) elapsed_exec,
trunc(buffer_gets/executions,2) lio_exec
from gv$sql
where sql_id = 'apg0k1r43s8ak';
ELAPSED_TIME BUFFER_GETS EXECUTIONS ELAPSED_EXEC LIO_EXEC
-------------- ------------ ---------- ------------- -----------
2,446,627 48,884 11 222,420.63 4,444
Elapsed/exec ~220ms
Gets/exec ~4.4k

1 - How long does my SQL take? Good run
var b1 varchar2(20)
exec :b1 := 'RULE';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
| Id |Operation |Name |A-Rows|A-Time |Buffers|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 8|00:00.01| 13|
| 1| TABLE ACCESS BY INDEX ROWID B|T_CASE1 | 8|00:00.01| 13|
|* 2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 8|00:00.01| 5|
-----------------------------------------------------------------------------
Elapsed: 00:00:00.07
~10ms 13 buffer gets

1 - How long does my SQL take? Bad run
var b1 varchar2(20)
exec :b1 := 'JAVA CLASS';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
|Id|Operation |Name |A-Rows|A-Time |Buffers|Reads|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 305K|00:01.80| 54926| 8009|
| 1| TABLE ACCESS BY INDEX R B|T_CASE1 | 305K|00:01.80| 54926| 8009|
|*2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 305K|00:00.63| 22737| 1300|
-----------------------------------------------------------------------------
Elapsed: 00:05:11.88
~1.80s 55k buffer gets
Can we look at the data from a different POV?

1 - How long does my SQL take? ASH POV
select sample_time, sql_exec_id, sql_exec_start
from gv$active_session_history
where sql_id = 'apg0k1r43s8ak'
order by sample_time;
SAMPLE_TIME SQL_EXEC_ID SQL_EXEC_START
--------------------------- ----------- -------------------
20-AUG-17 11.39.58.754 AM 16777222 2017-08-20/11:39:46
20-AUG-17 11.40.11.761 AM
20-AUG-17 11.40.31.781 AM
20-AUG-17 11.40.40.791 AM
20-AUG-17 11.40.43.792 AM
20-AUG-17 11.40.48.799 AM
20-AUG-17 11.40.51.800 AM
20-AUG-17 11.40.57.801 AM
20-AUG-17 11.41.03.809 AM
20-AUG-17 11.41.05.811 AM
20-AUG-17 11.41.23.833 AM
20-AUG-17 11.41.35.846 AM
20-AUG-17 11.41.52.863 AM
20-AUG-17 11.41.54.864 AM
20-AUG-17 11.42.03.870 AM
20-AUG-17 11.42.08.875 AM
20-AUG-17 11.42.21.882 AM
20-AUG-17 11.42.31.891 AM
20-AUG-17 11.42.34.896 AM
• Jumps in time: session not always busy during the missing samples
• User experience is 5 minutes, not 2 secs
• Not much we can do from the DB perspective here

1 - How long does my SQL take? Summary
• Questions answered
• GV$SQL (and similar) report time spent in DB calls, not user experience
• GV$SQL (and similar) aggregates time over executions of same cursor
• ASH sampled data helps understand how DB Time is spread over clock time
• In this case showing how clock time was likely NOT spent inside the DB
• ASH data has many dimensions, can help narrow down further
• For example, all slow executions come from app server X
• Question not solved
• Why slow execution was slow (was easy this time, we provided the bind)
• Historical binds are sampled, no direct correlation with specific execution
• Ideally pick up value and run SQL to reproduce
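The "jumps in time" reading of the ASH output can be sketched outside the database. This is a minimal Python illustration, not part of the original deck: it takes the first few SAMPLE_TIME values shown in the ASH POV slide and flags the gaps between consecutive samples. It assumes the usual 1-second in-memory ASH sampling interval, and the function name idle_gaps is made up for this example.

```python
from datetime import datetime, timedelta

# First few SAMPLE_TIME values from the ASH POV slide (same execution).
samples = [
    "20-AUG-17 11.39.58.754 AM",
    "20-AUG-17 11.40.11.761 AM",
    "20-AUG-17 11.40.31.781 AM",
    "20-AUG-17 11.40.40.791 AM",
    "20-AUG-17 11.40.43.792 AM",
]

def parse(ts):
    # Oracle-style timestamp text as printed on the slide.
    return datetime.strptime(ts, "%d-%b-%y %I.%M.%S.%f %p")

def idle_gaps(sample_times, interval=timedelta(seconds=1)):
    """Return (previous_sample, gap) pairs where the session was not sampled.

    Assumption: in-memory ASH samples active sessions every `interval`,
    so a gap of two intervals or more means the session was not active
    on this SQL for at least one sample.
    """
    times = sorted(parse(t) for t in sample_times)
    gaps = []
    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap >= 2 * interval:
            gaps.append((prev, gap))
    return gaps

gaps = idle_gaps(samples)
# Each sample stands for roughly one interval of DB time.
estimated_db_seconds = len(samples) * 1

for start, gap in gaps:
    print(start.time(), "then no sample for", gap)
print("estimated DB time (s):", estimated_db_seconds)
```

The same idea applies to the full sample list: few samples spread over several minutes of clock time suggest the time was mostly spent outside the DB.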

2 - How long did my SQL take? AWR
SQL ID: 8gv4bwmnp8kmq
select /*+ LEADING(A) USE_NL(B) */ count(*)
from t_case1 a, t_case1 b
where rownum <= 1e10;
select snap_id, executions_delta e_d, executions_total e_t,
end_of_fetch_count_delta eof_d,
trunc(elapsed_time_delta/1e6) et_d_s,
trunc(elapsed_time_total/1e6) et_t_s,
buffer_gets_delta bg_d, buffer_gets_total bg_t
from dba_hist_sqlstat
where sql_id = '8gv4bwmnp8kmq' order by snap_id;
SNAP_ID E_D E_T EOF_D ET_D_S ET_T_S BG_D BG_T
---------- --- --- ----- ------ ------ ----------- ------------
3341 0 1 0 187 188 77,221,193 77,693,626
3342 0 1 0 126 314 51,883,866 129,577,492
3343 0 1 0 128 442 52,887,666 182,465,158
No info from snapshots on when the SQL started & ended
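As a quick sanity check on how the *_DELTA and *_TOTAL columns relate, the sketch below (Python, illustrative only) differences the totals copied from the output above. It assumes consecutive snapshots of the same cursor; the helper name deltas is hypothetical.

```python
# (snap_id, elapsed_time_total in seconds, buffer_gets_total) from the output above.
rows = [
    (3341, 188, 77_693_626),
    (3342, 314, 129_577_492),
    (3343, 442, 182_465_158),
]

def deltas(rows):
    """Difference each snapshot's totals against the previous snapshot."""
    out = []
    for (_, et_prev, bg_prev), (snap, et, bg) in zip(rows, rows[1:]):
        out.append((snap, et - et_prev, bg - bg_prev))
    return out

result = deltas(rows)
for snap, et_d, bg_d in result:
    print(snap, et_d, bg_d)
```

The differences line up with ET_D_S and BG_D for snapshots 3342 and 3343, which is why the totals are enough to rebuild the per-snapshot picture, but still say nothing about when the execution started or ended.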

2 - How long did my SQL take? AWR report
No trivial way to determine #concurrent execs from the AWR report.
Doable from the *_TOTAL raw info

2 - How long did my SQL take? Concurr Execs
[Diagram: executions from Session 1, Session 2 and Session 3 plotted over time across SNAP_ID 3341, 3342, 3343]
• Exec #1, starts second and completes second
• Exec #2, starts last and completes first
• Not expensive enough to get captured

2 - How long did my SQL take? ASH data
select sql_exec_id, sql_exec_start,
min(sample_time) first_sample, max(sample_time) last_sample,
max(sample_time)-sql_exec_start elapsed
from dba_hist_active_sess_history
where sql_id = '8gv4bwmnp8kmq'
group by sql_exec_id, sql_exec_start;
SQL_EXEC_ID SQL_EXEC_START      FIRST_SAMPLE              LAST_SAMPLE               ELAPSED
----------- ------------------- ------------------------- ------------------------- -----------------------
16777216    2017-08-20/13:04:43 20-AUG-17 01.04.52.779 PM 20-AUG-17 01.12.32.799 PM +000000000 00:07:49.79
Only one execution, took ~8 mins

2 - How long did my SQL take? Summary
• AWR only captures what mattered for the snapshot
• Can miss the start / stop “slice” of info if not impacting enough within the snapshot
• Raw info allows determining the number of concurrent executions, the AWR report does not
• Can only say how many started / ended, not which one
• ASH keeps only a subset of samples, but for each exec
• With approximation, allows determining the who, when, where of each exec
• Questions not answered
• What if my execution takes very little time? Sampling is a compromise / it may not matter
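The "with approximation" part can be made concrete. The sketch below (Python, illustrative only) recomputes the elapsed time of the single execution from the ASH values shown earlier, and adds an upper bound under the assumption that historical ASH keeps roughly one sample every 10 seconds (the 1:10 ratio mentioned earlier applied to 1-second in-memory samples). The variable names are made up for this example.

```python
from datetime import datetime, timedelta

# Values copied from the DBA_HIST_ACTIVE_SESS_HISTORY output for SQL_EXEC_ID 16777216.
sql_exec_start = datetime(2017, 8, 20, 13, 4, 43)
last_sample = datetime(2017, 8, 20, 13, 12, 32, 799000)

# Assumption: about one historical sample every 10 seconds.
HIST_SAMPLE_INTERVAL = timedelta(seconds=10)

# Lower bound: the execution was still active at the last sample.
approx_elapsed = last_sample - sql_exec_start
# Upper bound: it may have kept running until just before the next (missing) sample.
upper_bound = approx_elapsed + HIST_SAMPLE_INTERVAL

print("elapsed is at least", approx_elapsed)
print("and less than about", upper_bound)
```

For an execution lasting minutes, an uncertainty of a few seconds is irrelevant; for a sub-second execution the same uncertainty is the whole story, which is the sampling compromise.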

3 – How is my PX doing?
SQL ID: frzgf5tc9cscc
select /*+ LEADING(A) PARALLEL(4) */ count(*)
from t_case1 a, t_case1 b
where a.owner = b.owner;
<< while SQL running >>
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = 'frzgf5tc9cscc';
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ----------- ---------- ---------------------
0 11,677 172 1 0
1 34,682,909 6 0 0

3 – How is my PX doing? Running slow!
<< SQL still running >>
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 13,682 172 1 0
1 59,852,041 2,734 0 0
<< after CTRL+C (was taking too long) >>
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 519,951 172 1 0
1 96,314,353 13,205 0 8
Up to this point we know 8 sessions were involved, and aggregated stats only

3 – How is my PX doing? Checking plan
<< SQL was still running>>
---------------------------------------------------------------------------------------
| Id|Operation |Name |E-Rows|Cost (%CPU)| TQ |IN-OUT|PQ Distrib |
---------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | |19462 (100)| | | |
| 1| SORT AGGREGATE | | 1 | | | | |
| 2| PX COORDINATOR | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10002| 1 | |Q1,02| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | 1 | |Q1,02| PCWP | |
|* 5| HASH JOIN | | 27G|19462 (86)|Q1,02| PCWP | |
| 6| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | |
| 7| PX SEND HYBRID HASH |:TQ10000| 940K| 1377 (1)|Q1,00| P->P |HYBRID HASH|
| 8| STATISTICS COLLECTOR| | | |Q1,00| PCWC | |
| 9| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,00| PCWC | |
|*10| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,00| PCWP | |
| 11| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | |
| 12| PX SEND HYBRID HASH |:TQ10001| 940K| 1377 (1)|Q1,01| P->P |HYBRID HASH|
| 13| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,01| PCWC | |
|*14| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,01| PCWP | |
---------------------------------------------------------------------------------------
• Nothing surprising, plan you’d expect when dealing with a large #rows
• Not downgraded, used 8 processes
• Maybe PX Skewness?
• Can’t use V$PQ_TQSTAT, we CTRL+C'ed the exec

3 – How is my PX doing? PX Skew & ASH data
select session_id, session_serial#, program, count(*)
from gv$active_session_history
where sql_id = 'frzgf5tc9cscc'
and sql_exec_id = 16777217
group by session_id, session_serial#, program;
SESSION_ID SESSION_SERIAL# PROGRAM COUNT(*)
---------- --------------- -------------------- ----------
8 55195 oracle@oel7 (P003) 217
133 37006 oracle@oel7 (P001) 12
• QC not showing, nor most of the other processes; P003 is the top consumer
• Adding more ASH columns to the SQL we can drill down, e.g. the plan step where time goes

3 – How is my PX doing? PX Skew & SQL Mon
Much of the PX info in SQL Mon does NOT come from ASH

3 – How is my PX doing? PX Skew Summary
• Presence of skewness during / after SQL execution
• Regardless of V$PQ_TQSTAT view (tricky to use)
• Needs SQL Monitor to have low level info (buffer gets, accurate time, etc)
• What causes the skewness and how to resolve it (not investigated here)

4 – My PX SQL performance is unstable
SQL ID: 8nkpzgz08mdc8
select /*+ PARALLEL(4) */ count(*)
from t_case3 a, t_case3 b
where a.object_id = b.object_id;
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = '8nkpzgz08mdc8';
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12

4 – PX SQL perf unstable - Checking plan
--------------------------------------------------------------------------------------
| Id|Operation |Name |E-Rows|Cost(%CPU)| TQ |IN-OUT|PQ Distrib |
--------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | |7375 (100)| | | |
| 1| SORT AGGREGATE | | 1 | | | | |
| 2| PX COORDINATOR | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10002| 1 | |Q1,02| P->S |QC (RAND) |
|* 5| HASH JOIN | | 74M|7375 (1)|Q1,02| PCWP | |
| 6| PX RECEIVE | | 2363K|3664 (1)|Q1,02| PCWP | |
| 7| PX SEND HYBRID HASH |:TQ10000| 2363K|3664 (1)|Q1,00| P->P |HYBRID HASH|
| 8| STATISTICS COLLECTOR| | | |Q1,00| PCWC | |
| 9| PX BLOCK ITERATOR | | 2363K|3664 (1)|Q1,00| PCWC | |
|*10| TABLE ACCESS FULL |T_CASE3 | 2363K|3664 (1)|Q1,00| PCWP | |
| 11| PX RECEIVE | | 2363K|3664 (1)|Q1,02| PCWP | |
| 12| PX SEND HYBRID HASH |:TQ10001| 2363K|3664 (1)|Q1,01| P->P |HYBRID HASH|
| 13| PX BLOCK ITERATOR | | 2363K|3664 (1)|Q1,01| PCWC | |
|*14| TABLE ACCESS FULL |T_CASE3 | 2363K|3664 (1)|Q1,01| PCWP | |
--------------------------------------------------------------------------------------
Do 4 slaves make sense looking at this plan vs the SQL?

4 – PX SQL perf unstable – GV$SQL “history”
After 1st exec
------------ -------------- ------------ ---------- ---------------------
0 16,630 212 1 0
1 8,496,684 98,491 0 8
After 2nd exec
------------ -------------- ------------ ---------- ---------------------
0 3,491,757 96,975 2 0
1 8,496,684 98,491 0 8
After 3rd exec
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12
We got lucky here
Info is accumulated, thus downgrades are very hard to spot

4 – PX SQL perf unstable – ASH data
select distinct sql_exec_id, sql_exec_start,
case when px_flags is null then 'SERIAL'
else 'DoP '||trunc(px_flags/ 2097152)
end dop
where sql_id = '8nkpzgz08mdc8'
order by 2;
SQL_EXEC_ID SQL_EXEC_START DOP
----------- ------------------- ---------
16777216 2017-08-20/17:08:50 DoP 4
16777217 2017-08-20/17:09:16 SERIAL
16777218 2017-08-20/17:10:03 DoP 2
No need for luck, we’ve got ASH
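The DoP decoding used in the query above is just an integer division. A minimal Python sketch of the same expression follows; the sample px_flags values are hypothetical, and the only thing taken from the slide is the divisor 2097152 (2**21) and the NULL-means-serial rule. What the lower bits of px_flags carry is not covered here.

```python
PX_DOP_DIVISOR = 2_097_152  # 2**21, the constant used in the slide's query

def dop_label(px_flags):
    """Mimic: case when px_flags is null then 'SERIAL'
              else 'DoP '||trunc(px_flags/2097152) end"""
    if px_flags is None:
        return "SERIAL"
    # For non-negative integers, floor division matches TRUNC of the quotient.
    return f"DoP {px_flags // PX_DOP_DIVISOR}"

# Hypothetical px_flags values, built only to exercise the formula.
examples = [None, 4 * PX_DOP_DIVISOR + 123, 2 * PX_DOP_DIVISOR]
labels = [dop_label(v) for v in examples]
print(labels)
```

Because the value is stored per sample, the same decoding works per execution and per point in time, which is what makes downgrades visible without relying on accumulated counters.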

5 – My PX SQL perf is unstable – more fun
SQL ID: gcgmgk8m8v4vm
with a as (select /*+ materialize parallel(4)*/ a.object_id, b.object_name
from t_case3 a, t_case3 b
where a.object_id = b.object_id
and rownum <= 1e6)
select count(*)
from (select /*+ parallel(4) no_merge */ c.object_name
from t_case3 c, a
where a.object_id = c.object_id
and a.object_name = c.object_name);
------------ -------------- ------------ ---------- ---------------------
0 4,097,033 50,679 1 0
1 14,191,651 98,499 0 8

5 – My PX SQL perf is unstable - Checking plan
--------------------------------------------------------------------------------------------------------------------
| Id |Operation |Name |E-Rows | Cost | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | 11502| | | |
| 1| TEMP TABLE TRANSFORMATION | | | | | | |
| 2| LOAD AS SELECT (CURSOR DURATION MEMORY)|SYS_TEMP_0FD9D6B81_119A63B| | | | | |
|* 3| COUNT STOPKEY | | | | | | |
| 4| PX COORDINATOR | | | | | | |
| 5| PX SEND QC (RANDOM) |:TQ10002 | 74M| 7375|Q1,02| P->S | QC (RAND) |
| 6| BUFFER SORT | | 1000K| |Q1,02| PCWP | |
|* 7| COUNT STOPKEY | | | |Q1,02| PCWC | |
|* 8| HASH JOIN | | 74M| 7375|Q1,02| PCWP | |
| 9| PX RECEIVE | | 2363K| 3664|Q1,02| PCWP | |
| 10| PX SEND HYBRID HASH |:TQ10000 | 2363K| 3664|Q1,00| P->P | HYBRID HASH|
| 11| STATISTICS COLLECTOR | | | |Q1,00| PCWC | |
| 12| PX BLOCK ITERATOR | | 2363K| 3664|Q1,00| PCWC | |
|* 13| TABLE ACCESS FULL |T_CASE3 | 2363K| 3664|Q1,00| PCWP | |
| 15| PX SEND HYBRID HASH |:TQ10001 | 2363K| 3664|Q1,01| P->P | HYBRID HASH|
| 18| SORT AGGREGATE | | 1 | | | | |
| 19| PX COORDINATOR | | | | | | |
| 20| PX SEND QC (RANDOM) |:TQ20002 | 1 | |Q2,02| P->S | QC (RAND) |
| 22| VIEW | | 1000K| 4127|Q2,02| PCWP | |
|* 23| HASH JOIN | | 1000K| 4127|Q2,02| PCWP | |
| 25| PX SEND HASH |:TQ20000 | 1000K| 461|Q2,00| P->P | HASH |
| 26| VIEW | | 1000K| 461|Q2,00| PCWP | |
|* 28| TABLE ACCESS FULL |SYS_TEMP_0FD9D6B81_119A63B| 1000K| 461|Q2,00| PCWP | |
| 30| PX SEND HASH |:TQ20001 | 2363K| 3664|Q2,01| P->P | HASH |
--------------------------------------------------------------------------------------------------------------------

5 – My PX SQL perf is unstable – ASH solution
select sample_time, program, sql_plan_line_id, case when px_flags … dop from ash
where sql_exec_id = 16777216 and sql_id = 'gcgmgk8m8v4vm' order by 1, 2 ;
SAMPLE_TIME PROGRAM SQL_PLAN_LINE_ID DOP
----------------------------- -------------------- ---------------- --------
20-AUG-17 05.50.13.290 PM oracle@oel7 (P000) 6 DoP 4
20-AUG-17 05.50.16.294 PM sqlplus@Mauros-MBP.w 2 SERIAL

5 – My PX SQL perf is unstable – SQLMon sol

5 – My PX SQL perf is unstable – SQLM poking
select sid, process_name, px_maxdop, px_servers_requested, px_servers_allocated, px_server#,
px_server_group, px_server_set, px_qcsid
from gv$sql_monitor
where sql_exec_id = 16777216
and sql_id = 'gcgmgk8m8v4vm'
order by px_server_set nulls first, px_server# nulls first;
SID PROCE PX_MAXDOP PX_S_REQUESTED PX_S_ALLOC PX_SERVER# PX_SERVER_GROUP PX_SERVER_SET PX_QCSID
---------- ----- ---------- -------------- ---------- ---------- --------------- ------------- ----------
244 ora 4 16 8
373 p000 1 1 1 244
132 p001 2 1 1 244
256 p002 3 1 1 244
13 p003 4 1 1 244
133 p004 1 1 2 244
255 p005 2 1 2 244
372 p006 3 1 2 244
12 p007 4 1 2 244

4 & 5 PX SQL perf unstable – Summary
• Ability to determine DoP during / after execution
• Regardless of V$PX_SESSION (and others) views
• Ability to determine DoP on a per DFO-tree basis
• Pretty much impossible from GV$ / AWR
• Multiple dimensions can be added to drill down into slave execs (e.g waits)
• SQL Monitor only way to extract low level info per slave
• For example, buffer gets, accurate time, #rows, starts, etc
• What causes the downgrade (not investigated here)

Trivia – My SQL blew up TEMP
SQL ID: 0qnb575hn2mkr (FAILS) & dm53symv2vmy6 (WORKS)
select /*+ PARALLEL(4) LEADING(B A C)
USE_SWAP(c) USE_HASH(A) USE_HASH(C) FAILS|WORKS */
count(*)
from t_case3 a, t_case3 b, t_case3 c
where a.object_id = b.object_id and a.object_id = c.object_id;
ERROR at line 1:
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
<<hint, I’m messing with the env and with you>>

select sql_id, child_number, executions, px_servers_executions,
buffer_gets, disk_reads, direct_reads, direct_writes
from gv$sql
where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr')
order by 1,2;
SQL_ID CHILD EXECS PX_EXECS BUFFER_GETS DISK_READS DIRECT_READS DIRECT_WRITES
------ ------ ----- -------- ------------ ----------- ------------- -------------
0qnb57 0 1 0 104 0 0 0
0qnb57 1 0 8 145,756 147,300 147,300 73,253
dm53sy 0 1 0 15 0 0 0
dm53sy 1 0 8 145,752 145,068 145,068 0

sql_id = 'dm53symv2vmy6' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.16.06.259 PM 84,414,464 0
20-AUG-17 07.16.07.259 PM 766,644,224 0
20-AUG-17 07.16.08.264 PM 766,644,224 0
<<…>>
20-AUG-17 07.16.24.316 PM 766,644,224 0
20-AUG-17 07.16.25.316 PM 766,644,224 0
sql_id = '0qnb575hn2mkr' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.17.49.463 PM 17,075,200 61,865,984
20-AUG-17 07.17.50.463 PM 40,308,736 148,897,792
<<…>>
20-AUG-17 07.17.55.467 PM 58,396,672 509,607,936
20-AUG-17 07.17.56.466 PM 58,396,672 583,008,256
Used less PGA but spilled to TEMP

Trivia – My SQL blew up TEMP – SQL Monitor
[SQL Monitor reports side by side: dm53symv2vmy6 and 0qnb575hn2mkr]

select sql_id, child_number, optimizer_env_hash_value
from gv$sql
where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr');
SQL_ID CHILD_NUMBER OPTIMIZER_ENV_HASH_VALUE
------------- ------------ ------------------------
0qnb575hn2mkr 0 3821565029
0qnb575hn2mkr 1 128879201
dm53symv2vmy6 0 3821565029
dm53symv2vmy6 1 128879201
• Same CBO environment, aka same CBO params
• Not all _smm_* params make it into the CBO env!!!

6 – SQL blew up TEMP – prevention!!
SQL ID: 8d5h5p8znx8mx
select /*+ PARALLEL(4) LEADING(B A C) USE_SWAP(c)
USE_HASH(A) USE_HASH(C) */
count(*)
from t_case1 a,
t_case1 b,
t_case1 c
where a.object_id = b.object_id
and a.object_id = c.object_id;
<< not using GV$/AWR because we need to differentiate per exec >>

6 – SQL blew up TEMP – history
1st run
SAMPLE_TIME SQL_EXEC_ID PGA TEMP
------------------------------- ----------- ---------- ----------
21-AUG-17 10.09.12.674 AM 16777217 105.22 61
2nd run – data is growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.15.32.182 AM 16777218 17.12 20
21-AUG-17 10.15.33.182 AM 16777218 70.12 113
3rd run – data keeps growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.16.31.259 AM 16777219 2.26 0
21-AUG-17 10.16.32.259 AM 16777219 21.69 40
21-AUG-17 10.16.33.261 AM 16777219 70.12 110

6 – SQL blew up TEMP – Aggregated history
Aggregating over a few runs the trend is obvious (increasing memory usage)
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- -------------------------- ------- ----------
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
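The step from per-sample history to the aggregated view is a per-execution maximum. The Python sketch below is only an illustration: it takes the per-sample PGA / TEMP values of the 2nd and 3rd runs from the history slide and keeps the peak per SQL_EXEC_ID. The function name peak_per_exec is hypothetical, and units are left as shown on the slides.

```python
from collections import defaultdict

# (sql_exec_id, PGA, TEMP) per sample, copied from the 2nd and 3rd runs.
samples = [
    (16777218, 17.12, 20),
    (16777218, 70.12, 113),
    (16777219, 2.26, 0),
    (16777219, 21.69, 40),
    (16777219, 70.12, 110),
]

def peak_per_exec(samples):
    """Keep the highest PGA and TEMP value seen for each execution."""
    peaks = defaultdict(lambda: (0.0, 0))
    for exec_id, pga, temp in samples:
        best_pga, best_temp = peaks[exec_id]
        peaks[exec_id] = (max(best_pga, pga), max(best_temp, temp))
    return dict(peaks)

peaks = peak_per_exec(samples)
for exec_id, (pga, temp) in sorted(peaks.items()):
    print(exec_id, pga, temp)
```

The peaks match the PGA / TEMP columns of the aggregated rows for 16777218 and 16777219, so one row per execution is enough to follow the trend.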

6 – SQL blew up TEMP – Chart your data!
• ASH info is really easy to chart
• Faster to consume!

6 – SQL blew up TEMP – Keep executing
One new “break the pattern” execution showed up
----------- ------------------- ------------------------- ------- -----
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
Touched less PGA / TEMP but took longer

6 – SQL blew up TEMP – Drill into 1 exec
sql_id = '8d5h5p8znx8mx' and sql_exec_id = 16777222
SAMPLE_TIME SID PROGRAM EVENT
--------------------------- --- -------------------- ---------------------------------
21-AUG-17 10.26.28.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.33.480 AM 253 oracle@oel7 (P005)
21-AUG-17 10.26.33.480 AM 362 oracle@oel7 (P006)
21-AUG-17 10.26.34.482 AM 16 oracle@oel7 (P003) direct path write temp

6 – SQL blew up TEMP – Keep executing
One more execution showed up
----------- ------------------- ------------------------ ------- -----
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
16777223 2017-08-21/10:34:05 +000000000 00:00:02.126 69.12 108
Same PGA / TEMP as previous but much faster

6 – SQL blew up TEMP – Mystery solved
One more execution showed up, but they are from different sessions
SID SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
--- ----------- ------------------- ------------------------ ------- -----
130 16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
130 16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
130 16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
254 16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
254 16777223 2017-08-21/10:34:05 +000000000 00:00:02.126 69.12 108
130 16777224 2017-08-21/10:38:01 +000000000 00:00:05.402 139.94 201

6 – SQL blew up TEMP – Why not AWR?
select child_number, executions, px_servers_executions, elapsed_time,
direct_writes,
elapsed_time/nvl(nullif(px_servers_executions,0),executions) et_exec,
direct_writes/nvl(nullif(px_servers_executions,0),executions)
direct_wrtes_exec
from gv$sql
where sql_id = '8d5h5p8znx8mx';
CHILD_NUMBER EXECS PX_EXECS ELAP_TIME DIRECT_W ET_EXEC DIRECT_W_EXEC
------------ ----- -------- ---------- -------- ------------ -------------
0 9 0 7,803,972 0 867,108 0
1 0 71 91,106,462 154,559 1,283,189.61 2,176.88
You might be able to figure it out from GV$ but you need a lot of imagination and luck

6 – SQL blew up TEMP – Summary
• Ability to monitor spill at per-execution and per-session basis
• AWR would only show aggregated info
• Similar info available for IOPS and IO bytes (and memory scan in V$ASH)
• Charting info allows easy monitoring
• Large amount of info consumed quickly
• SQL Monitor relies on same ASH info
• Even without SQL Mon, tons of info can be extracted from ASH

7 – Making sense of “strange” executions
SQL ID: 06pbgg9w0bmgp
select /*+ mauro */ a.*
where a.owner = l_owner
and a.object_id = b.object_id
and burn_cpu(a.object_id/b.object_id) = 1
select child_number, executions, end_of_fetch_count,
elapsed_time, fetches, rows_processed
from gv$sql
where sql_id = '06pbgg9w0bmgp';
CHILD EXECS EOF_COUNT ELAPSED_TIME FETCHES ROWS_PROCESSED
----- ----- --------- ------------ -------- --------------
0 3 0 30,419,821 6 30
• This is a single session executing the SQL
• Why did none reach EOF?

7 – Making sense of “strange” executions
sql_id = '06pbgg9w0bmgp' and session_id = 377 order by sample_time;
SAMPLE_TIME SQLEXECID SEXECSTA
------------------ --------- --------
06.04.37.450 PM 16777222 18:04:36
06.04.38.450 PM 16777222 18:04:36
06.04.39.450 PM 16777222 18:04:36
06.04.40.450 PM 16777222 18:04:36
06.04.41.450 PM 16777223 18:04:41
06.04.42.450 PM 16777223 18:04:41
06.04.43.450 PM 16777223 18:04:41
06.04.44.450 PM 16777223 18:04:41
06.04.45.450 PM 16777223 18:04:41
06.04.46.450 PM 16777224 18:04:46
06.04.47.450 PM 16777224 18:04:46
06.04.48.450 PM 16777224 18:04:46
06.04.49.450 PM 16777224 18:04:46
06.04.50.450 PM 16777224 18:04:46
06.04.51.450 PM 16777224 18:04:46
06.04.52.450 PM 16777223 18:04:41
06.04.53.450 PM 16777223 18:04:41
06.04.54.450 PM 16777223 18:04:41
06.04.55.450 PM 16777223 18:04:41
06.04.56.450 PM 16777223 18:04:41
06.04.57.450 PM 16777224 18:04:46
06.04.58.450 PM 16777224 18:04:46
06.04.59.450 PM 16777224 18:04:46
06.05.00.450 PM 16777224 18:04:46
06.05.01.450 PM 16777224 18:04:46
06.05.02.450 PM 16777222 18:04:36
06.05.03.450 PM 16777222 18:04:36
06.05.04.450 PM 16777222 18:04:36
06.05.05.450 PM 16777222 18:04:36
06.05.06.450 PM 16777222 18:04:36
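The pattern in the samples above is easier to see once consecutive rows are collapsed. The Python sketch below is illustrative only: it rebuilds the SQL_EXEC_ID column from its run lengths as listed above and reports which executions are sampled again after a different execution was seen in the same session. The function name resumed_executions is made up for this example.

```python
from itertools import groupby

# Run-length encoding of the SQL_EXEC_ID column above: (exec_id, consecutive samples).
runs = [
    (16777222, 4), (16777223, 5), (16777224, 6),
    (16777223, 5), (16777224, 5), (16777222, 5),
]
exec_ids = [exec_id for exec_id, n in runs for _ in range(n)]

def resumed_executions(exec_ids):
    """Return exec ids that show up again after another exec id was sampled."""
    seen, resumed = set(), []
    for exec_id, _ in groupby(exec_ids):  # collapse consecutive duplicates
        if exec_id in seen and exec_id not in resumed:
            resumed.append(exec_id)
        seen.add(exec_id)
    return resumed

resumed = resumed_executions(exec_ids)
print(len(exec_ids), "samples, resumed executions:", resumed)
```

All three executions are picked up again later by the same session, which is consistent with several executions of the same SQL being open at once and none of them being fetched to the end.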

7 – Making sense of “strange” executions - Summary
• ASH data can be used to “slice” GV$ data and make more sense out of it
• In this specific case maybe not a cursor leak
• Since the cursor is used multiple times
• Same approach could be used to potentially spot a cursor leak
• Would require the SQL to take “long” enough to spot it
• Question not answered
• Why would somebody do anything like this 

Something worth knowing
• ASH data uses default values until the real value is “ready to consume”
• Adaptive Plans could take a while to resolve and until then PHV is 0
select /*+ LEADING(a) */ count(a.object_id)
from (select /*+ no_merge leading (a) */ 1 object_id, 'a' owner
from (select rownum from dual connect by rownum <= 1000) a,
(select rownum from dual connect by rownum <= 1000) b) a,
(select a.object_id
from t1 a, t2 b
where a.object_id = b.n1
and a.data_object_id = 1
and a.owner = 'SYS') b

----------------------------------------------------------------------------------------
| Id |Operation |Name|E-Rows|Cost (%CPU)| Pstart| Pstop |
----------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | 679 (100)| | |
| 1| SORT AGGREGATE | | 1| | | |
|- * 2| HASH JOIN | | 1| 679 (1)| | |
| 3| NESTED LOOPS | | 1| 679 (1)| | |
|- 4| STATISTICS COLLECTOR | | | | | |
| * 5| HASH JOIN | | 1| 405 (1)| | |
| 6| VIEW | | 1| 4 (0)| | |
| 7| MERGE JOIN CARTESIAN | | 1| 4 (0)| | |
| 8| VIEW | | 1| 2 (0)| | |
| 9| COUNT | | | | | |
| 10| CONNECT BY WITHOUT FILTERING | | | | | |
| 11| FAST DUAL | | 1| 2 (0)| | |
| 12| BUFFER SORT | | 1| 4 (0)| | |
| 13| VIEW | | 1| 2 (0)| | |
| 14| COUNT | | | | | |
| 15| CONNECT BY WITHOUT FILTERING| | | | | |
| 16| FAST DUAL | | 1| 2 (0)| | |
| * 17| TABLE ACCESS FULL |T1 | 1| 401 (1)| | |
| 18| PARTITION RANGE ITERATOR | | 49999| 274 (0)| KEY | KEY |
| * 19| TABLE ACCESS FULL |T2 | 49999| 274 (0)| KEY | KEY |
|- 20| PARTITION RANGE JOIN-FILTER | | 49999| 274 (0)|:BF0000|:BF0000|
|- 21| TABLE ACCESS FULL |T2 | 49999| 274 (0)|:BF0000|:BF0000|
----------------------------------------------------------------------------------------

select sample_time, sql_plan_hash_value, sql_plan_line_id
where sql_id = '8x52hyvsh1j45'
order by sample_time
SAMPLE_TIME SQL_PLAN_HASH_VALUE SQL_PLAN_LINE_ID
--------------------------- ------------------- ----------------
28-AUG-17 03.42.43.251 PM 0 7
28-AUG-17 03.42.44.251 PM 0 6
28-AUG-17 03.42.45.251 PM 0 6
28-AUG-17 03.42.46.252 PM 0 7
28-AUG-17 03.42.47.252 PM 0 7
28-AUG-17 03.42.48.252 PM 0 5

Things we just can’t do (as of now)
• Current diagnostics are very comprehensive
• They allow answering many questions around SQL execution
• Still some questions unanswered, some examples
• SQL Plan Baseline / SQL Patch used or not in the past (AWR limitation)
• High Version Count in the past (AWR “limitation”)
• Details of “old” CBO environment (encoded, no public API)
• Historical binds for slow execution (unless captured, requires luck)
• Changes in NLS environment in the past (current, V$SQL_SHARED_CURSOR)
• Probably not a big problem, unless you hit it 

Summary
• Oracle diagnostics rock when used properly
• No single source of info, needs combining to get the full picture
• ASH provides a different point of view into SQL execution
• Needed more than expected
• Regardless of the source, visualizing things makes it easier
• But this is Enkitec so you are stuck with me & SQL*Plus
• SQL Monitoring fills some of the gaps
• Still, much of the info comes from ASH, available even historically (more than SQLMon)
• Statspack + free ASH can provide useful info
• Unfortunately not as comprehensive as the “real” ones

Contact Information
• Blog: http://mauro-pagano.com
• Free tools
• SQLd360
• TUNAs360
• Pathfinder
• An “interesting” post every N posts
• Email: mauro.pagano@gmail.com
• Twitter: @mautro
