SQL Tuning
Takes three to tango
Mauro Pagano
Mauro Pagano @mautro
• Worked at Oracle, been at Enkitec (AEG) a while now
• Spend most of the time on performance problems
• Free tools: SQLd360, TUNAs360 etc (at Oracle: SQLT, SQLHC etc)
• Strong British accent
• “Newbie old fart” (approved by Bryn)
but….
Poll time – “Historical” SQL Tuning
1. Do you use AWR for SQL tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
2. Do you use ASH for SQL tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
3. Do you use SQL Monitoring for SQL tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
What are we doing here today?
• Oracle has a ton of diagnostics (awesome!)
• People tend to rely on GV$ / AWR more than ASH
• Some questions harder to answer (if possible) from GV$/AWR data
• Today’s goal is:
• Present scenarios where multiple sources needed
• Explain why & where to gather the missing info, make sense out of it
• Knowing what the info from each source represents leads to better use of it
• Focus is on diagnostics
What are we NOT doing here today?
• Arguing about which one is better
• They complement each other, not exclude each other
• Need all (often AWR+ASH enough) to have a full picture
• One could be enough depending on cases, still the other adds value
• Provide solution to scenarios presented
• Today it’s about diagnostics, not problem X or Y
• Once behavior identified correctly, solution is often easier to find (if exists)
• Talk license / cost associated with the Packs used
Some (incorrect) terminology we’ll use today
• GV$: all views on X$, except X$ASH
• For example, GV$SQL
• AWR: all the tables in AWR except ASH
• For example, DBA_HIST_SQLSTAT and DBA_HIST_SQL_PLAN
• ASH: GV$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY
• SQL Mon: SQL Monitor
• Both raw data (GV$) and reports (current and historical)
How are AWR / (historical) ASH populated?
• AWR takes a picture every N minutes (or manual)
• Source views store accumulated data, take a pic of that at time T
• Historical ASH filters out samples from memory ASH
• Filtered samples may show info not important enough to show up in the accumulated data
• Source data includes info for all active sessions individually (not aggregated)
• Ratio is generally 1:10 (one in-memory sample in ten is kept)
• X$KEW[A|R]* help to narrow down what to collect
• Might make things a little harder to “break” in isolation
Why do we need both?
AWR: knows exactly how much water
ASH: knows roughly who, when, how…
ASH samples - Why can we live with it?
[Chart axes: Elapsed time / execution, Frequency of execution]
How you move these bars depends on your app
Before we begin…
• Every case is artificial, represents a real life case without the noise
• The case itself is just a means to an end, not really the focus of the scenario
• Cases build on each other, start simple and get into a little more complex
• OF COURSE I’m cheating!
• What we’ll see can be applied to any environment
• Knowing how to interpret and spot things helps in dev too
• Charts used just to present large amount of info in small space
• Not trying to push for any specific tool
• Get into DataViz anyway, it makes life so much easier!
Two tables, all we need
-- 8x dba_objects
create table t_case1 as
select *
from dba_objects,
(select rownum n1 from dual connect by rownum <= 8);
create index t_case1_objtype on t_case1(object_type);
-- 32x dba_objects
create table t_case3 as
select *
from t_case1,
(select rownum from dual connect by rownum <= 4);
1 - How long does my SQL take?
SQL ID: apg0k1r43s8ak
SQL Text: select * from t_case1 where object_type = :b1;
Plan hash value: 3696583251
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_CASE1 |
|* 2 | INDEX RANGE SCAN | T_CASE1_OBJTYPE |
---------------------------------------------------------------
2 - access("OBJECT_TYPE"=:B1)
1 - How long does my SQL take? Total
<<removed plan_hash_value&child_number just to make it fit, one child only>>
select elapsed_time, buffer_gets, executions,
trunc(elapsed_time/executions,2) elapsed_exec,
trunc(buffer_gets/executions,2) lio_exec
from gv$sql
where sql_id = 'apg0k1r43s8ak';
ELAPSED_TIME BUFFER_GETS EXECUTIONS ELAPSED_EXEC LIO_EXEC
-------------- ------------ ---------- ------------- -----------
2,446,627 48,884 11 222,420.63 4,444
Elapsed/exec ~220ms
Gets/exec ~4.5k
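The per-execution figures above are plain averages over all 11 executions. The following is a minimal Python sketch of the same arithmetic, using the values from the GV$SQL output; the function name `per_exec` and the "10 fast runs" split are illustrative assumptions, not something shown in the deck. It hints at how a single slow run can hide behind an average.

```python
import math

def per_exec(total, executions, digits=2):
    """Mimic Oracle's trunc(total / executions, digits)."""
    factor = 10 ** digits
    return math.trunc(total / executions * factor) / factor

# Values copied from the GV$SQL output above (elapsed_time is in microseconds).
elapsed_exec = per_exec(2_446_627, 11)
lio_exec = per_exec(48_884, 11)

# Hypothetical split, NOT taken from the deck: if 10 of the 11 executions
# took about 10 ms each, the remaining one accounts for everything else.
remaining_us = 2_446_627 - 10 * 10_000

print(elapsed_exec, lio_exec, remaining_us)
```

Under that assumed split, one execution would carry well over two seconds of the total, which the ~220ms average does not reveal.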
1 - How long does my SQL take? Good run
var b1 varchar2(20)
exec :b1 := 'RULE';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
| Id |Operation |Name |A-Rows|A-Time |Buffers|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 8|00:00.01| 13|
| 1| TABLE ACCESS BY INDEX ROWID B|T_CASE1 | 8|00:00.01| 13|
|* 2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 8|00:00.01| 5|
-----------------------------------------------------------------------------
Elapsed: 00:00:00.07
~10ms 13 buffer gets
1 - How long does my SQL take? Bad run
var b1 varchar2(20)
exec :b1 := 'JAVA CLASS';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
|Id|Operation |Name |A-Rows|A-Time |Buffers|Reads|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 305K|00:01.80| 54926| 8009|
| 1| TABLE ACCESS BY INDEX R B|T_CASE1 | 305K|00:01.80| 54926| 8009|
|*2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 305K|00:00.63| 22737| 1300|
-----------------------------------------------------------------------------
Elapsed: 00:05:11.88
Can we look at the data from a different POV?
~1.80s 55k buffer gets
1 - How long does my SQL take? ASH POV
select sample_time, sql_exec_id, sql_exec_start from gv$active_session_history where sql_id =
'apg0k1r43s8ak' order by sample_time;
SAMPLE_TIME SQL_EXEC_ID SQL_EXEC_START
--------------------------- ----------- -------------------
20-AUG-17 11.39.58.754 AM 16777222 2017-08-20/11:39:46
20-AUG-17 11.40.11.761 AM
20-AUG-17 11.40.31.781 AM
20-AUG-17 11.40.40.791 AM
20-AUG-17 11.40.43.792 AM
20-AUG-17 11.40.48.799 AM
20-AUG-17 11.40.51.800 AM
20-AUG-17 11.40.57.801 AM
20-AUG-17 11.41.03.809 AM
20-AUG-17 11.41.05.811 AM
20-AUG-17 11.41.23.833 AM
20-AUG-17 11.41.35.846 AM
20-AUG-17 11.41.52.863 AM
20-AUG-17 11.41.54.864 AM
20-AUG-17 11.42.03.870 AM
20-AUG-17 11.42.08.875 AM
20-AUG-17 11.42.21.882 AM
20-AUG-17 11.42.31.891 AM
20-AUG-17 11.42.34.896 AM
Jumps in time – session not always busy during the missing sample
User experience is 5 minutes not 2 secs
Not much we can do from the DB perspective here
1 - How long does my SQL take? Summary
• Questions answered
• GV$SQL (and similar) report time spent in DB calls, not user experience
• GV$SQL (and similar) aggregates time over executions of same cursor
• ASH sampled data helps understand how DB Time is spread over clock time
• In this case showing how clock time was likely NOT spent inside the DB
• ASH data has many dimensions, can help narrow down further
• For example, all slow executions come from app server X
• Question not solved
• Why slow execution was slow (was easy this time, we provided the bind)
• Historical binds are sampled, no direct correlation with specific execution
• Ideally pick up value and run SQL to reproduce
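As a rough illustration of how DB time spreads over clock time, the sketch below parses a handful of the ASH sample times from this case and lists the gaps between consecutive samples. It assumes the usual one-second sampling interval of in-memory ASH, so each sample stands for roughly one second of activity; the helper names and the 1.5 second threshold are hypothetical choices for this example.

```python
from datetime import datetime

# Format of SAMPLE_TIME as printed in this case, e.g. "20-AUG-17 11.39.58.754 AM".
# Month-name parsing assumes an English locale.
ASH_FORMAT = "%d-%b-%y %I.%M.%S.%f %p"

def parse_samples(samples):
    """Turn SAMPLE_TIME strings into sorted datetime values."""
    return sorted(datetime.strptime(s, ASH_FORMAT) for s in samples)

def sample_gaps(times, threshold_s=1.5):
    """Return (previous, current, gap_seconds) for consecutive samples
    that are further apart than threshold_s."""
    gaps = []
    for prev, cur in zip(times, times[1:]):
        gap = (cur - prev).total_seconds()
        if gap > threshold_s:
            gaps.append((prev, cur, gap))
    return gaps

# First five samples of the slow execution.
samples = [
    "20-AUG-17 11.39.58.754 AM",
    "20-AUG-17 11.40.11.761 AM",
    "20-AUG-17 11.40.31.781 AM",
    "20-AUG-17 11.40.40.791 AM",
    "20-AUG-17 11.40.43.792 AM",
]
times = parse_samples(samples)
gaps = sample_gaps(times)
wall_clock_s = (times[-1] - times[0]).total_seconds()
sampled_db_time_s = len(times)  # one sample ~ one second of DB activity

for prev, cur, gap in gaps:
    print(f"{prev.time()} -> {cur.time()}: {gap:.3f}s without a sample")
print(f"~{sampled_db_time_s}s sampled vs {wall_clock_s:.3f}s of clock time")
```

Every pair of consecutive samples here is more than a second apart, which is consistent with the session spending most of the clock time outside the database.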
2 - How long did my SQL take? AWR
SQL ID: 8gv4bwmnp8kmq
select /*+ LEADING(A) USE_NL(B) */ count(*)
from t_case1 a, t_case1 b
where rownum <= 1e10;
select snap_id, executions_delta e_d, executions_total e_t,
end_of_fetch_count_delta eof_d,
trunc(elapsed_time_delta/1e6) et_d_s,
trunc(elapsed_time_total/1e6) et_t_s,
buffer_gets_delta bg_d, buffer_gets_total bg_t
from dba_hist_sqlstat
where sql_id = '8gv4bwmnp8kmq' order by snap_id;
SNAP_ID E_D E_T EOF_D ET_D_S ET_T_S BG_D BG_T
---------- --- --- ----- ------ ------ ----------- ------------
3341 0 1 0 187 188 77,221,193 77,693,626
3342 0 1 0 126 314 51,883,866 129,577,492
3343 0 1 0 128 442 52,887,666 182,465,158
No info from snapshots when SQL started & ended
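The *_TOTAL columns are cumulative, so per-snapshot deltas can be rebuilt from them by subtracting consecutive rows. A small Python sketch using the values from the output above (the function and key names are made up for the example) shows that the rebuilt deltas match the *_DELTA columns, while still saying nothing about when the execution started or ended:

```python
def deltas_from_totals(rows, column):
    """rows: dicts ordered by snap_id holding cumulative *_TOTAL values.
    Returns the difference between each row and the one before it."""
    return [cur[column] - prev[column] for prev, cur in zip(rows, rows[1:])]

# ET_T_S and BG_T copied from the DBA_HIST_SQLSTAT output above.
rows = [
    {"snap_id": 3341, "et_t_s": 188, "bg_t": 77_693_626},
    {"snap_id": 3342, "et_t_s": 314, "bg_t": 129_577_492},
    {"snap_id": 3343, "et_t_s": 442, "bg_t": 182_465_158},
]

et_deltas = deltas_from_totals(rows, "et_t_s")
bg_deltas = deltas_from_totals(rows, "bg_t")
print(et_deltas)  # compare with ET_D_S for snapshots 3342 and 3343
print(bg_deltas)  # compare with BG_D for snapshots 3342 and 3343
```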
2 - How long did my SQL take? AWR report
No trivial way to determine #concurrent execs.
Doable from *_TOTAL raw info
2 - How long did my SQL take? Concurr Execs
[Diagram: timeline over SNAP_ID 3341, 3342, 3343 showing executions from Session 1, Session 2 and Session 3]
• Exec #1, starts second and completes second
• Exec #2, starts last and completes first
• Not expensive enough to get captured
2 - How long did my SQL take? ASH data
select sql_exec_id, sql_exec_start,
min(sample_time) first_sample, max(sample_time) last_sample,
max(sample_time)-sql_exec_start elapsed
from dba_hist_active_sess_history
where sql_id = '8gv4bwmnp8kmq'
group by sql_exec_id, sql_exec_start;
SQL_EXEC_ID SQL_EXEC_START FIRST_SAMPLE
----------- ------------------- ---------------------------
16777216 2017-08-20/13:04:43 20-AUG-17 01.04.52.779 PM
LAST_SAMPLE ELAPSED
-------------------------- --------------------------
20-AUG-17 01.12.32.799 PM +000000000 00:07:49.79
Only one execution, took ~8 mins
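The ELAPSED column is simply the last sample minus SQL_EXEC_START. A short Python sketch of that calculation follows, with an error margin added on the assumption that historical ASH keeps roughly one sample every 10 seconds, so the execution may have continued for up to one interval after the last sample. The function name and the margin parameter are illustrative only.

```python
from datetime import datetime, timedelta

def approx_elapsed(sql_exec_start, last_sample, sample_interval_s=10):
    """Return (lower_bound, upper_bound) for the elapsed time of one execution.
    The upper bound assumes the SQL may have run for up to one more
    sampling interval after the last retained sample."""
    lower = last_sample - sql_exec_start
    upper = lower + timedelta(seconds=sample_interval_s)
    return lower, upper

# Values from the DBA_HIST_ACTIVE_SESS_HISTORY output above.
start = datetime(2017, 8, 20, 13, 4, 43)
last = datetime(2017, 8, 20, 13, 12, 32, 799000)

lower, upper = approx_elapsed(start, last)
print(lower, upper)
```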
2 - How long did my SQL take? Summary
• Questions answered
• AWR only captures what mattered for the snapshot
• Can miss start / stop “slice” of info if not impacting enough within snapshot
• Raw info lets you determine the number of concurrent executions; the AWR report does not
• Can only say how many started / ended, not which one
• ASH keeps only a subset of samples, but for each exec
• With approximation, allows to determine the who, when, where of each exec
• Questions not answered
• What if my execution takes very little time? Sampling is a compromise: it may be missed, and often it doesn’t matter
3 – How is my PX doing?
SQL ID: frzgf5tc9cscc
select /*+ LEADING(A) PARALLEL(4) */ count(*)
from t_case1 a, t_case1 b
where a.owner = b.owner;
<< while SQL running >>
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = 'frzgf5tc9cscc';
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ----------- ---------- ---------------------
0 11,677 172 1 0
1 34,682,909 6 0 0
3 – How is my PX doing? Running slow!
<< SQL still running >>
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 13,682 172 1 0
1 59,852,041 2,734 0 0
after CTRL+C (was taking too long)
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 519,951 172 1 0
1 96,314,353 13,205 0 8
Up to this point we know 8 sessions involved and aggregated stats only
3 – How is my PX doing? Checking plan
<< SQL was still running>>
---------------------------------------------------------------------------------------
| Id|Operation |Name |E-Rows|Cost (%CPU)| TQ |IN-OUT|PQ Distrib |
---------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | |19462 (100)| | | |
| 1| SORT AGGREGATE | | 1 | | | | |
| 2| PX COORDINATOR | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10002| 1 | |Q1,02| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | 1 | |Q1,02| PCWP | |
|* 5| HASH JOIN | | 27G|19462 (86)|Q1,02| PCWP | |
| 6| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | |
| 7| PX SEND HYBRID HASH |:TQ10000| 940K| 1377 (1)|Q1,00| P->P |HYBRID HASH|
| 8| STATISTICS COLLECTOR| | | |Q1,00| PCWC | |
| 9| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,00| PCWC | |
|*10| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,00| PCWP | |
| 11| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | |
| 12| PX SEND HYBRID HASH |:TQ10001| 940K| 1377 (1)|Q1,01| P->P |HYBRID HASH|
| 13| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,01| PCWC | |
|*14| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,01| PCWP | |
---------------------------------------------------------------------------------------
Nothing surprising, plan you’d expect when dealing with large #rows
Maybe PX Skewness? Can’t use V$PQ_TQSTAT, we CTRL+C'ed the exec
Not downgraded, used 8 processes
3 – How is my PX doing? PX Skew & ASH data
select session_id, session_serial#, program, count(*)
from gv$active_session_history
where sql_id = 'frzgf5tc9cscc'
and sql_exec_id = 16777217
group by session_id, session_serial#, program;
SESSION_ID SESSION_SERIAL# PROGRAM COUNT(*)
---------- --------------- -------------------- ----------
8 55195 oracle@oel7 (P003) 217
133 37006 oracle@oel7 (P001) 12
QC not showing nor most of the other processes, P003 top consumer
Adding new ASH cols in the SQL we can drill down, e.g. plan step where time goes
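One simple way to put a number on the skew is to turn the sample counts per PX process into shares of the total. The Python sketch below does that for the two rows returned above; the function name is hypothetical and the counts are the only input taken from the deck.

```python
def sample_share(counts):
    """Map each process to its fraction of all ASH samples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# COUNT(*) per PX process from the ASH query above.
counts = {"P003": 217, "P001": 12}

shares = sample_share(counts)
for name, share in shares.items():
    print(f"{name}: {share * 100:.1f}% of samples")
```

With 8 PX processes involved, a single process holding the vast majority of the samples is a strong hint of skew, although it does not say what causes it.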
3 – How is my PX doing? PX Skew & SQL Mon
Much of the PX info in SQL Mon is NOT COMING from ASH
3 – How is my PX doing? PX Skew Summary
• Questions answered
• Presence of skewness during / after SQL execution
• Regardless of V$PQ_TQSTAT view (tricky to use)
• Needs SQL Monitor to have low level info (buffer gets, accurate time, etc)
• Questions not answered
• What causes the skewness and how to resolve it (not investigated here)
4 – My PX SQL performance is unstable
SQL ID: 8nkpzgz08mdc8
select /*+ PARALLEL(4) */ count(*)
from t_case3 a, t_case3 b
where a.object_id = b.object_id;
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = '8nkpzgz08mdc8';
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12
4 – PX SQL perf unstable - Checking plan
--------------------------------------------------------------------------------------
| Id|Operation |Name |E-Rows|Cost(%CPU)| TQ |IN-OUT|PQ Distrib |
--------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | |7375 (100)| | | |
| 1| SORT AGGREGATE | | 1 | | | | |
| 2| PX COORDINATOR | | | | | | |
| 3| PX SEND QC (RANDOM) |:TQ10002| 1 | |Q1,02| P->S |QC (RAND) |
| 4| SORT AGGREGATE | | 1 | |Q1,02| PCWP | |
|* 5| HASH JOIN | | 74M|7375 (1)|Q1,02| PCWP | |
| 6| PX RECEIVE | | 2363K|3664 (1)|Q1,02| PCWP | |
| 7| PX SEND HYBRID HASH |:TQ10000| 2363K|3664 (1)|Q1,00| P->P |HYBRID HASH|
| 8| STATISTICS COLLECTOR| | | |Q1,00| PCWC | |
| 9| PX BLOCK ITERATOR | | 2363K|3664 (1)|Q1,00| PCWC | |
|*10| TABLE ACCESS FULL |T_CASE3 | 2363K|3664 (1)|Q1,00| PCWP | |
| 11| PX RECEIVE | | 2363K|3664 (1)|Q1,02| PCWP | |
| 12| PX SEND HYBRID HASH |:TQ10001| 2363K|3664 (1)|Q1,01| P->P |HYBRID HASH|
| 13| PX BLOCK ITERATOR | | 2363K|3664 (1)|Q1,01| PCWC | |
|*14| TABLE ACCESS FULL |T_CASE3 | 2363K|3664 (1)|Q1,01| PCWP | |
--------------------------------------------------------------------------------------
Do 4 slaves make sense looking at this plan vs SQL?
4 – PX SQL perf unstable – GV$SQL “history”
After 1st exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 16,630 212 1 0
1 8,496,684 98,491 0 8
After 2nd exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,491,757 96,975 2 0
1 8,496,684 98,491 0 8
After 3rd exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12
We got lucky here
Info is accumulated, thus it is very hard to spot downgrades
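The reason this worked is that GV$SQL was queried after every single execution, so the cumulative PX_SERVERS_EXECUTIONS could be differenced by hand. A Python sketch of that reasoning is below; it assumes two slave sets per execution (the plan shows P->P redistribution), so servers divided by two gives the DoP. Names are illustrative.

```python
def per_exec_px_servers(cumulative):
    """Difference a cumulative PX_SERVERS_EXECUTIONS series captured
    after each execution into the servers used by each execution."""
    previous = 0
    used = []
    for value in cumulative:
        used.append(value - previous)
        previous = value
    return used

# PX_SERVERS_EXECUTIONS for child 1 after the 1st, 2nd and 3rd execution.
px_servers_after_each_exec = [8, 8, 12]

servers = per_exec_px_servers(px_servers_after_each_exec)
# Assumption: two slave sets per execution, so DoP = servers // 2 (0 = serial).
dops = [n // 2 for n in servers]
print(servers, dops)
```

Without a capture after each execution only the final total of 12 is visible, and several different combinations of DoP could add up to it.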
4 – PX SQL perf unstable – ASH data
select distinct sql_exec_id, sql_exec_start,
case when px_flags is null then 'SERIAL'
else 'DoP '||trunc(px_flags/ 2097152)
end dop
from gv$active_session_history
where sql_id = '8nkpzgz08mdc8'
order by 2;
SQL_EXEC_ID SQL_EXEC_START DOP
----------- ------------------- ---------
16777216 2017-08-20/17:08:50 DoP 4
16777217 2017-08-20/17:09:16 SERIAL
16777218 2017-08-20/17:10:03 DoP 2
No need for luck, we got ASH
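The CASE expression in the query divides PX_FLAGS by 2097152, which is 2^21, and truncates; that is the same as shifting the value right by 21 bits. The Python sketch below mirrors the expression. The meaning of the lower 21 bits is not covered in the deck, so the sample values with low bits set are purely hypothetical.

```python
DOP_SHIFT = 21  # 2 ** 21 == 2097152, the divisor used in the query

def decode_dop(px_flags):
    """Mirror the CASE expression: NULL means serial, otherwise the DoP
    is whatever sits above the lower 21 bits of PX_FLAGS."""
    if px_flags is None:
        return "SERIAL"
    return f"DoP {px_flags >> DOP_SHIFT}"

# Hypothetical PX_FLAGS values built for illustration only.
print(decode_dop(4 * 2097152))        # upper bits hold 4
print(decode_dop(2 * 2097152 + 123))  # lower bits are ignored
print(decode_dop(None))
```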
5 – My PX SQL perf is unstable – more fun
SQL ID: gcgmgk8m8v4vm
with a as (select /*+ materialize parallel(4)*/ a.object_id, b.object_name
from t_case3 a, t_case3 b
where a.object_id = b.object_id
and rownum <= 1e6)
select count(*)
from (select /*+ parallel(4) no_merge */ c.object_name
from t_case3 c, a
where a.object_id = c.object_id
and a.object_name = c.object_name);
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 4,097,033 50,679 1 0
1 14,191,651 98,499 0 8
5 – My PX SQL perf is unstable - Checking plan
--------------------------------------------------------------------------------------------------------------------
| Id |Operation |Name |E-Rows | Cost | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | 11502| | | |
| 1| TEMP TABLE TRANSFORMATION | | | | | | |
| 2| LOAD AS SELECT (CURSOR DURATION MEMORY)|SYS_TEMP_0FD9D6B81_119A63B| | | | | |
|* 3| COUNT STOPKEY | | | | | | |
| 4| PX COORDINATOR | | | | | | |
| 5| PX SEND QC (RANDOM) |:TQ10002 | 74M| 7375|Q1,02| P->S | QC (RAND) |
| 6| BUFFER SORT | | 1000K| |Q1,02| PCWP | |
|* 7| COUNT STOPKEY | | | |Q1,02| PCWC | |
|* 8| HASH JOIN | | 74M| 7375|Q1,02| PCWP | |
| 9| PX RECEIVE | | 2363K| 3664|Q1,02| PCWP | |
| 10| PX SEND HYBRID HASH |:TQ10000 | 2363K| 3664|Q1,00| P->P | HYBRID HASH|
| 11| STATISTICS COLLECTOR | | | |Q1,00| PCWC | |
| 12| PX BLOCK ITERATOR | | 2363K| 3664|Q1,00| PCWC | |
|* 13| TABLE ACCESS FULL |T_CASE3 | 2363K| 3664|Q1,00| PCWP | |
| 14| PX RECEIVE | | 2363K| 3664|Q1,02| PCWP | |
| 15| PX SEND HYBRID HASH |:TQ10001 | 2363K| 3664|Q1,01| P->P | HYBRID HASH|
| 16| PX BLOCK ITERATOR | | 2363K| 3664|Q1,01| PCWC | |
|* 17| TABLE ACCESS FULL |T_CASE3 | 2363K| 3664|Q1,01| PCWP | |
| 18| SORT AGGREGATE | | 1 | | | | |
| 19| PX COORDINATOR | | | | | | |
| 20| PX SEND QC (RANDOM) |:TQ20002 | 1 | |Q2,02| P->S | QC (RAND) |
| 21| SORT AGGREGATE | | 1 | |Q2,02| PCWP | |
| 22| VIEW | | 1000K| 4127|Q2,02| PCWP | |
|* 23| HASH JOIN | | 1000K| 4127|Q2,02| PCWP | |
| 24| PX RECEIVE | | 1000K| 461|Q2,02| PCWP | |
| 25| PX SEND HASH |:TQ20000 | 1000K| 461|Q2,00| P->P | HASH |
| 26| VIEW | | 1000K| 461|Q2,00| PCWP | |
| 27| PX BLOCK ITERATOR | | 1000K| 461|Q2,00| PCWC | |
|* 28| TABLE ACCESS FULL |SYS_TEMP_0FD9D6B81_119A63B| 1000K| 461|Q2,00| PCWP | |
| 29| PX RECEIVE | | 2363K| 3664|Q2,02| PCWP | |
| 30| PX SEND HASH |:TQ20001 | 2363K| 3664|Q2,01| P->P | HASH |
| 31| PX BLOCK ITERATOR | | 2363K| 3664|Q2,01| PCWC | |
|* 32| TABLE ACCESS FULL |T_CASE3 | 2363K| 3664|Q2,01| PCWP | |
--------------------------------------------------------------------------------------------------------------------
5 – My PX SQL perf is unstable – ASH solution
select sample_time, program, sql_plan_line_id, case when px_flags … dop from ash
where sql_exec_id = 16777216 and sql_id = 'gcgmgk8m8v4vm' order by 1, 2 ;
SAMPLE_TIME PROGRAM SQL_PLAN_LINE_ID DOP
----------------------------- -------------------- ---------------- --------
20-AUG-17 05.50.13.290 PM oracle@oel7 (P000) 6 DoP 4
20-AUG-17 05.50.13.290 PM oracle@oel7 (P001) 6 DoP 4
20-AUG-17 05.50.13.290 PM oracle@oel7 (P002) 6 DoP 4
20-AUG-17 05.50.13.290 PM oracle@oel7 (P003) 6 DoP 4
20-AUG-17 05.50.14.290 PM oracle@oel7 (P000) 6 DoP 4
20-AUG-17 05.50.14.290 PM oracle@oel7 (P001) 6 DoP 4
20-AUG-17 05.50.14.290 PM oracle@oel7 (P002) 6 DoP 4
20-AUG-17 05.50.14.290 PM oracle@oel7 (P003) 6 DoP 4
20-AUG-17 05.50.15.289 PM oracle@oel7 (P000) 6 DoP 4
20-AUG-17 05.50.15.289 PM oracle@oel7 (P001) 6 DoP 4
20-AUG-17 05.50.15.289 PM oracle@oel7 (P002) 6 DoP 4
20-AUG-17 05.50.15.289 PM oracle@oel7 (P003) 6 DoP 4
20-AUG-17 05.50.16.294 PM sqlplus@Mauros-MBP.w 2 SERIAL
20-AUG-17 05.50.17.296 PM sqlplus@Mauros-MBP.w 23 SERIAL
20-AUG-17 05.50.18.296 PM sqlplus@Mauros-MBP.w 23 SERIAL
20-AUG-17 05.50.19.296 PM sqlplus@Mauros-MBP.w 23 SERIAL
5 – My PX SQL perf is unstable – SQLMon sol
5 – My PX SQL perf is unstable – SQLM poking
select sid, process_name, px_maxdop, px_servers_requested, px_servers_allocated, px_server#,
px_server_group, px_server_set, px_qcsid
from gv$sql_monitor
where sql_exec_id = 16777216
and sql_id = 'gcgmgk8m8v4vm'
order by px_server_set nulls first, px_server# nulls first;
SID PROCE PX_MAXDOP PX_S_REQUESTED PX_S_ALLOC PX_SERVER# PX_SERVER_GROUP PX_SERVER_SET PX_QCSID
---------- ----- ---------- -------------- ---------- ---------- --------------- ------------- ----------
244 ora 4 16 8
373 p000 1 1 1 244
132 p001 2 1 1 244
256 p002 3 1 1 244
13 p003 4 1 1 244
133 p004 1 1 2 244
255 p005 2 1 2 244
372 p006 3 1 2 244
12 p007 4 1 2 244
4 & 5 PX SQL perf unstable – Summary
• Questions answered
• Ability to determine DoP during / after execution
• Regardless of V$PX_SESSION (and others) views
• Ability to determine DoP on a per DFO-tree basis
• Pretty much impossible from GV$ / AWR
• Multiple dimensions can be added to drill down into slave execs (e.g. waits)
• SQL Monitor only way to extract low level info per slave
• For example, buffer gets, accurate time, #rows, starts, etc
• Questions not answered
• What causes the downgrade (not investigated here)
Trivia – My SQL blew up TEMP
SQL ID: 0qnb575hn2mkr (FAILS) & dm53symv2vmy6 (WORKS)
select /*+ PARALLEL(4) LEADING(B A C)
USE_SWAP(c) USE_HASH(A) USE_HASH(C) FAILS|WORKS */
count(*)
from t_case3 a, t_case3 b, t_case3 c
where a.object_id = b.object_id and a.object_id = c.object_id;
ERROR at line 1:
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
<<hint, I’m messing with the env and with you>>
Trivia – My SQL blew up TEMP
select sql_id, child_number, executions, px_servers_executions,
buffer_gets, disk_reads, direct_reads, direct_writes
from gv$sql
where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr')
order by 1,2;
SQL_ID CHILD EXECS PX_EXECS BUFFER_GETS DISK_READS DIRECT_READS DIRECT_WRITES
------ ------ ----- -------- ------------ ----------- ------------- -------------
0qnb57 0 1 0 104 0 0 0
0qnb57 1 0 8 145,756 147,300 147,300 73,253
dm53sy 0 1 0 15 0 0 0
dm53sy 1 0 8 145,752 145,068 145,068 0
Trivia – My SQL blew up TEMP
sql_id = 'dm53symv2vmy6' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.16.06.259 PM 84,414,464 0
20-AUG-17 07.16.07.259 PM 766,644,224 0
20-AUG-17 07.16.08.264 PM 766,644,224 0
<<…>>
20-AUG-17 07.16.24.316 PM 766,644,224 0
20-AUG-17 07.16.25.316 PM 766,644,224 0
sql_id = '0qnb575hn2mkr' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.17.49.463 PM 17,075,200 61,865,984
20-AUG-17 07.17.50.463 PM 40,308,736 148,897,792
<<…>>
20-AUG-17 07.17.55.467 PM 58,396,672 509,607,936
20-AUG-17 07.17.56.466 PM 58,396,672 583,008,256
Used less PGA but spilled to TEMP
Trivia – My SQL blew up TEMP – SQL Monitor
dm53symv2vmy6 vs 0qnb575hn2mkr
Trivia – My SQL blew up TEMP
select sql_id, child_number, optimizer_env_hash_value
from gv$sql
where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr');
SQL_ID CHILD_NUMBER OPTIMIZER_ENV_HASH_VALUE
------------- ------------ ------------------------
0qnb575hn2mkr 0 3821565029
0qnb575hn2mkr 1 128879201
dm53symv2vmy6 0 3821565029
dm53symv2vmy6 1 128879201
Same CBO environment aka same CBO params
Not all _smm_* params make it into CBO env!!!
6 – SQL blew up TEMP – prevention!!
SQL ID: 8d5h5p8znx8mx
select /*+ PARALLEL(4) LEADING(B A C) USE_SWAP(c)
USE_HASH(A) USE_HASH(C) */
count(*)
from t_case1 a,
t_case1 b,
t_case1 c
where a.object_id = b.object_id
and a.object_id = c.object_id;
<< not using GV$/AWR because we need to differentiate per exec >>
6 – SQL blew up TEMP – history
1st run
SAMPLE_TIME SQL_EXEC_ID PGA TEMP
------------------------------- ----------- ---------- ----------
21-AUG-17 10.09.12.674 AM 16777217 105.22 61
2nd run – data is growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.15.32.182 AM 16777218 17.12 20
21-AUG-17 10.15.33.182 AM 16777218 70.12 113
3rd run – data keeps growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.16.31.259 AM 16777219 2.26 0
21-AUG-17 10.16.32.259 AM 16777219 21.69 40
21-AUG-17 10.16.33.261 AM 16777219 70.12 110
6 – SQL blew up TEMP – Aggregated history
Aggregating over a few runs the trend is obvious (increasing memory usage)
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- -------------------------- ------- ----------
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
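The deck does not show the query behind this aggregated view. One plausible way to get from the per-sample rows to one row per execution is to keep the peak PGA and TEMP seen for each SQL_EXEC_ID, which is what the Python sketch below does. This is an assumption; it does reproduce the values for executions 16777217 to 16777219, whose samples appear on the previous slide.

```python
from collections import defaultdict

def peak_per_execution(samples):
    """samples: (sql_exec_id, pga, temp) tuples, one per ASH sample time,
    already summed across the sessions of that execution.
    Returns {sql_exec_id: (peak_pga, peak_temp)}."""
    peaks = defaultdict(lambda: (0.0, 0))
    for exec_id, pga, temp in samples:
        best_pga, best_temp = peaks[exec_id]
        peaks[exec_id] = (max(best_pga, pga), max(best_temp, temp))
    return dict(peaks)

# Per-sample rows from the "history" slide (units as shown in the deck).
samples = [
    (16777217, 105.22, 61),
    (16777218, 17.12, 20),
    (16777218, 70.12, 113),
    (16777219, 2.26, 0),
    (16777219, 21.69, 40),
    (16777219, 70.12, 110),
]

peaks = peak_per_execution(samples)
for exec_id, (pga, temp) in sorted(peaks.items()):
    print(exec_id, pga, temp)
```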
6 – SQL blew up TEMP – Chart your data!
ASH info is really easy to chart
Faster to consume!
6 – SQL blew up TEMP – Keep executing
One new “break the pattern” execution showed up
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- ------------------------- ------- -----
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
Touched less PGA / TEMP but took longer
6 – SQL blew up TEMP – Drill into 1 exec
sql_id = '8d5h5p8znx8mx' and sql_exec_id = 16777222
SAMPLE_TIME SID PROGRAM EVENT
--------------------------- --- -------------------- ---------------------------------
21-AUG-17 10.26.28.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.29.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.30.481 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.31.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.32.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.33.480 AM 253 oracle@oel7 (P005)
21-AUG-17 10.26.33.480 AM 362 oracle@oel7 (P006)
21-AUG-17 10.26.34.482 AM 16 oracle@oel7 (P003) direct path write temp
21-AUG-17 10.26.34.482 AM 135 oracle@oel7 (P000) direct path write temp
21-AUG-17 10.26.34.482 AM 255 oracle@oel7 (P001) direct path write temp
21-AUG-17 10.26.34.482 AM 373 oracle@oel7 (P002) direct path write temp
6 – SQL blew up TEMP – Keep executing
One more execution showed up
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- ------------------------ ------- -----
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
16777223 2017-08-21/10:34:05 +000000000 00:00:02.126 69.12 108
Same PGA / TEMP as previous but much faster
6 – SQL blew up TEMP – Mystery solved
One more execution showed up, and the executions turn out to come from different sessions
SID SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
--- ----------- ------------------- ------------------------ ------- -----
130 16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
130 16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
130 16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
254 16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
254 16777223 2017-08-21/10:34:05 +000000000 00:00:02.126 69.12 108
130 16777224 2017-08-21/10:38:01 +000000000 00:00:05.402 139.94 201
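Splitting the same rows by SID makes the two patterns easy to tell apart. The Python sketch below groups the TEMP values per session in execution order and checks whether each series keeps growing; the helper names are invented for this example and the data is copied from the table above.

```python
from collections import defaultdict

def temp_by_session(rows):
    """rows: (sid, sql_exec_id, temp). Returns {sid: [temp, ...]} with the
    values ordered by sql_exec_id."""
    by_sid = defaultdict(list)
    for sid, _exec_id, temp in sorted(rows, key=lambda r: r[1]):
        by_sid[sid].append(temp)
    return dict(by_sid)

def strictly_increasing(values):
    """True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

rows = [
    (130, 16777219, 110),
    (130, 16777220, 160),
    (130, 16777221, 188),
    (254, 16777222, 109),
    (254, 16777223, 108),
    (130, 16777224, 201),
]

series = temp_by_session(rows)
trend = {sid: strictly_increasing(values) for sid, values in series.items()}
print(series, trend)
```

Session 130 keeps the growing trend, while session 254 stays flat, which is why mixing both in one list looked like a broken pattern.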
6 – SQL blew up TEMP – Why not AWR?
select child_number, executions, px_servers_executions, elapsed_time,
direct_writes,
elapsed_time/nvl(nullif(px_servers_executions,0),executions) et_exec,
direct_writes/nvl(nullif(px_servers_executions,0),executions) direct_writes_exec
from gv$sql
where sql_id = '8d5h5p8znx8mx';
CHILD_NUMBER EXECS PX_EXECS ELAP_TIME DIRECT_W ET_EXEC DIRECT_W_EXEC
------------ ----- -------- ---------- -------- ------------ -------------
0 9 0 7,803,972 0 867,108 0
1 0 71 91,106,462 154,559 1,283,189.61 2,176.88
You might be able to figure it out from GV$ but need a lot of imagination and luck
6 – SQL blew up TEMP – Summary
• Questions answered
• Ability to monitor spill at per-execution and per-session basis
• AWR would only show aggregated info
• Similar info available for IOPS and IO bytes (and memory scan in V$ASH)
• Charting info allows easy monitoring
• Large amount of info consumed quickly
• SQL Monitor relies on same ASH info
• Even without SQL Mon, tons of info can be extracted from ASH
7 – Making sense of “strange” executions
SQL ID: 06pbgg9w0bmgp
select /*+ mauro */ a.*
from t_case1 a, t_case1 b
where a.owner = l_owner
and a.object_id = b.object_id
and burn_cpu(a.object_id/b.object_id) = 1
select child_number, executions, end_of_fetch_count,
elapsed_time, fetches, rows_processed
from gv$sql
where sql_id = '06pbgg9w0bmgp';
CHILD EXECS EOF_COUNT ELAPSED_TIME FETCHES ROWS_PROCESSED
----- ----- --------- ------------ -------- --------------
0 3 0 30,419,821 6 30
This is a single session
executing the SQL
Why none reached EOF?
7 – Making sense of “strange” executions
sql_id = '06pbgg9w0bmgp' and session_id = 377 order by sample_time;
SAMPLE_TIME SQLEXECID SEXECSTA
------------------ --------- --------
06.04.37.450 PM 16777222 18:04:36
06.04.38.450 PM 16777222 18:04:36
06.04.39.450 PM 16777222 18:04:36
06.04.40.450 PM 16777222 18:04:36
06.04.41.450 PM 16777223 18:04:41
06.04.42.450 PM 16777223 18:04:41
06.04.43.450 PM 16777223 18:04:41
06.04.44.450 PM 16777223 18:04:41
06.04.45.450 PM 16777223 18:04:41
06.04.46.450 PM 16777224 18:04:46
06.04.47.450 PM 16777224 18:04:46
06.04.48.450 PM 16777224 18:04:46
06.04.49.450 PM 16777224 18:04:46
06.04.50.450 PM 16777224 18:04:46
06.04.51.450 PM 16777224 18:04:46
06.04.52.450 PM 16777223 18:04:41
06.04.53.450 PM 16777223 18:04:41
06.04.54.450 PM 16777223 18:04:41
06.04.55.450 PM 16777223 18:04:41
06.04.56.450 PM 16777223 18:04:41
06.04.57.450 PM 16777224 18:04:46
06.04.58.450 PM 16777224 18:04:46
06.04.59.450 PM 16777224 18:04:46
06.05.00.450 PM 16777224 18:04:46
06.05.01.450 PM 16777224 18:04:46
06.05.02.450 PM 16777222 18:04:36
06.05.03.450 PM 16777222 18:04:36
06.05.04.450 PM 16777222 18:04:36
06.05.05.450 PM 16777222 18:04:36
06.05.06.450 PM 16777222 18:04:36
7 – Making sense of “strange” executions - Summary
• Questions answered
• ASH data can be used to “slice” GV$ data and make more sense out of it
• In this specific case maybe not a cursor leak
• Since the cursor is used multiple times
• Same approach could be used to potentially spot a cursor leak
• Would require the SQL to take “long” enough to spot it
• Question not answered
• Why would somebody do anything like this
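The pattern in the ASH samples of this case can be spotted mechanically: collapse the per-sample SQL_EXEC_ID values into runs and flag any execution that shows up again after a different one was sampled in between. The Python sketch below does this for the 30 samples of session 377; the function name is hypothetical, and reading the result as several executions being kept open and worked on alternately is an interpretation rather than something ASH states directly.

```python
from itertools import groupby

def interleaved_exec_ids(exec_ids):
    """Collapse consecutive duplicates into runs, then list the exec ids
    that reappear after another exec id was sampled in between."""
    runs = [key for key, _group in groupby(exec_ids)]
    seen = set()
    resumed = []
    for exec_id in runs:
        if exec_id in seen and exec_id not in resumed:
            resumed.append(exec_id)
        seen.add(exec_id)
    return runs, resumed

# SQL_EXEC_ID per sample, in SAMPLE_TIME order, as listed for this case.
exec_ids = (
    [16777222] * 4 + [16777223] * 5 + [16777224] * 6
    + [16777223] * 5 + [16777224] * 5 + [16777222] * 5
)

runs, resumed = interleaved_exec_ids(exec_ids)
print(runs)
print(resumed)
```

All three executions reappear after being interrupted by another one, which fits with GV$SQL showing 3 executions and no end-of-fetch.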
Something worth knowing
• ASH data uses default values until the real value is “ready to consume”
• Adaptive Plans could take a while to resolve and until then PHV is 0
select /*+ LEADING(a) */ count(a.object_id)
from (select /*+ no_merge leading (a) */ 1 object_id, 'a' owner
from (select rownum from dual connect by rownum <= 1000) a,
(select rownum from dual connect by rownum <= 1000) b) a,
(select a.object_id
from t1 a, t2 b
where a.object_id = b.n1
and a.data_object_id = 1
and a.owner = 'SYS') b
where a.object_id = b.object_id
Something worth knowing
----------------------------------------------------------------------------------------
| Id |Operation |Name|E-Rows|Cost (%CPU)| Pstart| Pstop |
----------------------------------------------------------------------------------------
| 0|SELECT STATEMENT | | | 679 (100)| | |
| 1| SORT AGGREGATE | | 1| | | |
|- * 2| HASH JOIN | | 1| 679 (1)| | |
| 3| NESTED LOOPS | | 1| 679 (1)| | |
|- 4| STATISTICS COLLECTOR | | | | | |
| * 5| HASH JOIN | | 1| 405 (1)| | |
| 6| VIEW | | 1| 4 (0)| | |
| 7| MERGE JOIN CARTESIAN | | 1| 4 (0)| | |
| 8| VIEW | | 1| 2 (0)| | |
| 9| COUNT | | | | | |
| 10| CONNECT BY WITHOUT FILTERING | | | | | |
| 11| FAST DUAL | | 1| 2 (0)| | |
| 12| BUFFER SORT | | 1| 4 (0)| | |
| 13| VIEW | | 1| 2 (0)| | |
| 14| COUNT | | | | | |
| 15| CONNECT BY WITHOUT FILTERING| | | | | |
| 16| FAST DUAL | | 1| 2 (0)| | |
| * 17| TABLE ACCESS FULL |T1 | 1| 401 (1)| | |
| 18| PARTITION RANGE ITERATOR | | 49999| 274 (0)| KEY | KEY |
| * 19| TABLE ACCESS FULL |T2 | 49999| 274 (0)| KEY | KEY |
|- 20| PARTITION RANGE JOIN-FILTER | | 49999| 274 (0)|:BF0000|:BF0000|
|- 21| TABLE ACCESS FULL |T2 | 49999| 274 (0)|:BF0000|:BF0000|
----------------------------------------------------------------------------------------
Something worth knowing
select sample_time, sql_plan_hash_value, sql_plan_line_id
from gv$active_session_history
where sql_id = '8x52hyvsh1j45'
order by sample_time
SAMPLE_TIME SQL_PLAN_HASH_VALUE SQL_PLAN_LINE_ID
--------------------------- ------------------- ----------------
28-AUG-17 03.42.43.251 PM 0 7
28-AUG-17 03.42.44.251 PM 0 6
28-AUG-17 03.42.45.251 PM 0 6
28-AUG-17 03.42.46.252 PM 0 7
28-AUG-17 03.42.47.252 PM 0 7
28-AUG-17 03.42.48.252 PM 0 5
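To see how many samples were taken before the plan resolved, the same samples can be grouped by plan hash value and the final plan pulled from the cursor cache. A sketch, assuming the cursor is still in the shared pool and the database is 12c or later (needed for the `+ADAPTIVE` format option); the aliases are illustrative.

```sql
-- Samples per plan hash value; in this scenario 0 means the adaptive plan
-- had not been resolved yet when the sample was taken
select sql_plan_hash_value,
       count(*)         samples,
       min(sample_time) first_sample,
       max(sample_time) last_sample
  from gv$active_session_history
 where sql_id = '8x52hyvsh1j45'
 group by sql_plan_hash_value
 order by first_sample;

-- Final plan from the cursor cache, including the inactive adaptive
-- steps (the ones marked with "-" in the plan output)
select *
  from table(dbms_xplan.display_cursor('8x52hyvsh1j45', null, '+ADAPTIVE'));
```

Samples reported under PHV 0 still carry a SQL_PLAN_LINE_ID, so they can be mapped onto the plan once it is known.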
Things we just can’t do (as of now)
• Current diagnostics are very comprehensive
• They answer many questions around SQL execution
• Some questions are still unanswered, for example:
• SQL Plan Baseline / SQL Patch used or not in the past (AWR limitation)
• High Version Count in the past (AWR “limitation”)
• Details of “old” CBO environment (encoded, no public API)
• Historical binds for slow execution (unless captured, requires luck)
• Changes in NLS environment in the past (current, V$SQL_SHARED_CURSOR)
• Probably not a big problem, unless you hit it 
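For contrast, a sketch of what is visible for these items today, assuming the cursor is still in memory and `&sql_id` is a placeholder for the SQL of interest. None of this recovers the past state; it only shows where the current (or luckily captured) info lives.

```sql
-- Baseline / patch / profile in use by the children currently in memory
select inst_id, child_number, plan_hash_value,
       sql_plan_baseline, sql_patch, sql_profile
  from gv$sql
 where sql_id = '&sql_id';

-- Why the current children were not shared (one Y/N flag per reason);
-- the column list varies by version, hence select *
select *
  from gv$sql_shared_cursor
 where sql_id = '&sql_id';

-- Bind values AWR happened to capture; they are sampled, so there is no
-- direct link to one specific slow execution
select snap_id, name, position, datatype_string, value_string, last_captured
  from dba_hist_sqlbind
 where sql_id = '&sql_id'
 order by snap_id, position;
```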
Summary
• Oracle diagnostics rock when used properly
• No single source of info, needs combining to get full picture
• ASH provides a different point of view into SQL execution
• Needed more often than expected
• Regardless of the source, visualizing things makes it easier
• But this is Enkitec so you are stuck with me & SQL*Plus
• SQL Monitoring fills some of the gaps
• Still, much of the info comes from ASH, available even historically (more than SQLMon)
• Statspack + free ASH can provide useful info
• Unfortunately not as comprehensive as the “real” ones
Contact Information
• Blog: http://mauro-pagano.com
• Free tools
• SQLd360
• TUNAs360
• Pathfinder
• An “interesting” post every N posts
• Email: mauro.pagano@gmail.com
• Twitter: @mautro

SQL Tuning, takes 3 to tango

  • 1. SQL Tuning Takes three to tango Mauro Pagano
  • 2. Mauro Pagano @mautro • Worked at Oracle, been at Enkitec (AEG) a while now • Spend most of the time on performance problems • Free tools: SQLd360, TUNAs360 etc (at Oracle: SQLT, SQLHC etc) • Strong British accent • “Newbie old fart” (approved by Bryn) 2
  • 3.
  • 5.
  • 6. 1. Do you use AWR for SQL Tuning? 1. Do you start from it? 2. Why? 3. How do you use it? 2. Do you use ASH for SQL Tuning? 1. Do you start from it? 2. Why? 3. How do you use it? 3. Do you use SQL Monitoring for SQL tuning? 1. Do you start from it? 2. Why? 3. How do you use it? Poll time – “Historical” SQL Tuning
  • 7. What are we doing here today? • Oracle has a ton of diagnostics (awesome!) • People tend to rely on GV$ / AWR more than ASH • Some questions harder to answer (if possible) from GV$/AWR data • Today’s goal is: • Present scenarios where multiple sources needed • Explain why & where to gather the missing info, make sense out of it • Knowing what info represent / source, better use of them • Focus is on diagnostics
  • 8. What are we NOT doing here today? • Argument about which one is better • They complement each other, not exclude each other • Need all (often AWR+ASH enough) to have a full picture • One could be enough depending on cases, still the other adds value • Provide solution to scenarios presented • Today it’s about diagnostics, not problem X or Y • Once behavior identified correctly, solution is often easier to find (if exists) • Talk license / cost associated with the Packs used
  • 9. Some (incorrect) terminology we’ll use today • GV$ all views on X$, except X$ASH • For Example, GV$SQL • AWR all the tables in AWR except ASH • For example, DBA_HIST_SQLSTAT and DBA_HIST_SQL_PLAN • ASH as GV$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY • SQL Mon as SQL Monitor • Both raw data (GV$) and reports (current and historical)
  • 10. How are AWR / (historical) ASH populated? • AWR takes a picture every N minutes (or manual) • Source views store accumulated data, take a pic of that at time T • Historical ASH filters out samples from memory ASH • Filtered may show info not important enough to show up in accumulated • Source data includes info for all active sessions individually (not aggregated) • Ratio is generally 1:10 • X$KEW[A|R]* help to narrow down what to collect • Might make things a little harder to “break” in isolation 
  • 11. Why do we need both? AWR Knows exactly how much water ASH Knows roughly who, when, how…
  • 12. ASH samples - Why can we live with it? Elapsed Time / execution Frequencyofexecution How you move these bars depends on your app
  • 13. Before we begin… • Every case is artificial, represents real file case without noise • Case itself is just a mean to an end, not really the focus of the scenario • Cases build on each other, start simple and get into a little more complex • OF COURSE I’m cheating! • What we’ll see can be applied to any environment • Knowing how to interpret and spot things helps in dev too • Charts used just to present large amount of info in small space • Not trying to push for any specific tool • Get into DataViz anyway, it makes life so much easier!
  • 14. Two tables, all we need -- 8x dba_objects create table t_case1 as select * from dba_objects, (select rownum n1 from dual connect by rownum <= 8); create index t_case1_objtype on t_case1(object_type); -- 32x dba_objects create table t_case3 as select * from t_case1, (select rownum from dual connect by rownum <= 4);
  • 15. 1 - How long does my SQL take? SQL ID: apg0k1r43s8ak SQL Text: select * from t_case1 where object_type = :b1; Plan hash value: 3696583251 --------------------------------------------------------------- | Id | Operation | Name | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_CASE1 | |* 2 | INDEX RANGE SCAN | T_CASE1_OBJTYPE | --------------------------------------------------------------- 2 - access("OBJECT_TYPE"=:B1)
  • 16. 1 - How long does my SQL take? Total <<removed plan_hash_value&child_number just to make it fit, one child only>> select elapsed_time, buffer_gets, executions, trunc(elapsed_time/executions,2) elapsed_exec, trunc(buffer_gets/executions,2) lio_exec from gv$sql where sql_id = 'apg0k1r43s8ak'; ELAPSED_TIME BUFFER_GETS EXECUTIONS ELAPSED_EXEC LIO_EXEC -------------- ------------ ---------- ------------- ----------- 2,446,627 48,884 11 222,420.63 4,444 Elapsed/exec ~220ms Gets/exec ~4.5k
  • 17. 1 - How long does my SQL take? Good run var b1 varchar2(20) exec :b1 := 'RULE'; select * from t_case1 where object_type = :b1; ----------------------------------------------------------------------------- | Id |Operation |Name |A-Rows|A-Time |Buffers| ----------------------------------------------------------------------------- | 0|SELECT STATEMENT | | 8|00:00.01| 13| | 1| TABLE ACCESS BY INDEX ROWID B|T_CASE1 | 8|00:00.01| 13| |* 2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 8|00:00.01| 5| ----------------------------------------------------------------------------- Elapsed: 00:00:00.07 ~10ms 13 buffer gets
  • 18. 1 - How long does my SQL take? Bad run var b1 varchar2(20) exec :b1 := 'JAVA CLASS'; select * from t_case1 where object_type = :b1; ----------------------------------------------------------------------------- |Id|Operation |Name |A-Rows|A-Time |Buffers|Reads| ----------------------------------------------------------------------------- | 0|SELECT STATEMENT | | 305K|00:01.80| 54926| 8009| | 1| TABLE ACCESS BY INDEX R B|T_CASE1 | 305K|00:01.80| 54926| 8009| |*2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 305K|00:00.63| 22737| 1300| ----------------------------------------------------------------------------- Elapsed: 00:05:11.88 Can we look at the data from a different POV? ~1.80s 55k buffer gets
  • 19. 1 - How long does my SQL take? ASH POV select sample_time, sql_exec_id, sql_exec_start from gv$active_session_history where sql_id = 'apg0k1r43s8ak' order by sample_time; SAMPLE_TIME SQL_EXEC_ID SQL_EXEC_START --------------------------- ----------- ------------------- 20-AUG-17 11.39.58.754 AM 16777222 2017-08-20/11:39:46 20-AUG-17 11.40.11.761 AM 20-AUG-17 11.40.31.781 AM 20-AUG-17 11.40.40.791 AM 20-AUG-17 11.40.43.792 AM 20-AUG-17 11.40.48.799 AM 20-AUG-17 11.40.51.800 AM 20-AUG-17 11.40.57.801 AM 20-AUG-17 11.41.03.809 AM 20-AUG-17 11.41.05.811 AM 20-AUG-17 11.41.23.833 AM 20-AUG-17 11.41.35.846 AM 20-AUG-17 11.41.52.863 AM 20-AUG-17 11.41.54.864 AM 20-AUG-17 11.42.03.870 AM 20-AUG-17 11.42.08.875 AM 20-AUG-17 11.42.21.882 AM 20-AUG-17 11.42.31.891 AM 20-AUG-17 11.42.34.896 AM Jumps in time – session not always busy during the missing sample User experience is 5 minutes not 2 secs Not much we can do from the DB perspective here
  • 20. 1 - How long does my SQL take? Summary • Questions answered • GV$SQL (and similar) report time spent in DB calls, not user experience • GV$SQL (and similar) aggregates time over executions of same cursor • ASH sampled data helps understand how DB Time is spread over clock time • In this case showing how clock time was likely NOT spent inside the DB • ASH data has many dimensions, can help narrow down further • For example, all slow executions come from app server X • Question not solved • Why slow execution was slow (was easy this time, we provided the bind) • Historical binds are sampled, no direct correlation with specific execution • Ideally pick up value and run SQL to reproduce
  • 21. 2 - How long did my SQL take? AWR SQL ID: 8gv4bwmnp8kmq select /*+ LEADING(A) USE_NL(B) */ count(*) from t_case1 a, t_case1 b where rownum <= 1e10; select snap_id, executions_delta e_d, executions_total e_t, end_of_fetch_count_delta eof_d, trunc(elapsed_time_delta/1e6) et_d_s, trunc(elapsed_time_total/1e6) et_t_s, buffer_gets_delta bg_d, buffer_gets_total bg_t from dba_hist_sqlstat where sql_id = '8gv4bwmnp8kmq' order by snap_id; SNAP_ID E_D E_T EOF_D ET_D_S ET_T_S BG_D BG_T ---------- --- --- ----- ------ ------ ----------- ------------ 3341 0 1 0 187 188 77,221,193 77,693,626 3342 0 1 0 126 314 51,883,866 129,577,492 3343 0 1 0 128 442 52,887,666 182,465,158 No info from snapshots when SQL started & ended
  • 22. 2 - How long did my SQL take? AWR report No trivial way to determine #concurrent execs. Doable from *_TOTAL raw info
  • 23. 2 - How long did my SQL take? Concurr Execs Time passing SNAP_ID 3341 3342 3343 Exec #1, starts second and completes second Not expensive enough to get captured Exec #2, starts last and completes first Session 1 Session 2 Session 3
  • 24. 2 - How long did my SQL take? ASH data select sql_exec_id, sql_exec_start, min(sample_time) first_sample, max(sample_time) last_sample, max(sample_time)-sql_exec_start elapsed from dba_hist_active_sess_history where sql_id = '8gv4bwmnp8kmq' group by sql_exec_id, sql_exec_start; SQL_EXEC_ID SQL_EXEC_START FIRST_SAMPLE ----------- ------------------- --------------------------- 16777216 2017-08-20/13:04:43 20-AUG-17 01.04.52.779 PM LAST_SAMPLE ELAPSED -------------------------- -------------------------- 20-AUG-17 01.12.32.799 PM +000000000 00:07:49.79 Only one execution, took ~8 mins
  • 25. 2 - How long did my SQL take? Summary • Questions answered • AWR only captures what mattered for the snapshot • Can miss start / stop “slice” of info if not impacting enough within snapshot • Raw info allows to determine number of concurrent executions, not AWR report • Can only say how many started / ended, not which one • ASH keeps only a subset of samples, but for each exec • With approximation, allows to determine the who, when, where of each exec • Questions not answered • What if my execution takes very little? Sample compromise / doesn’t matter
  • 26.
  • 27. 3 – How is my PX doing? SQL ID: frzgf5tc9cscc select /*+ LEADING(A) PARALLEL(4) */ count(*) from t_case1 a, t_case1 b where a.owner = b.owner; << while SQL running >> select child_number, elapsed_time, buffer_gets, executions, px_servers_executions from gv$sql where sql_id = 'frzgf5tc9cscc'; CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS ------------ -------------- ----------- ---------- --------------------- 0 11,677 172 1 0 1 34,682,909 6 0 0
  • 28. 3 – How is my PX doing? Running slow! << SQL still running >> CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS ------------ -------------- ------------ ---------- --------------------- 0 13,682 172 1 0 1 59,852,041 2,734 0 0 after CTRL+C (was taking too long) CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS ------------ -------------- ------------ ---------- --------------------- 0 519,951 172 1 0 1 96,314,353 13,205 0 8 Up to this point we know 8 sessions involved and aggregated stats only
  • 29. 3 – How is my PX doing? Checking plan << SQL was still running>> --------------------------------------------------------------------------------------- | Id|Operation |Name |E-Rows|Cost (%CPU)| TQ |IN-OUT|PQ Distrib | --------------------------------------------------------------------------------------- | 0|SELECT STATEMENT | | |19462 (100)| | | | | 1| SORT AGGREGATE | | 1 | | | | | | 2| PX COORDINATOR | | | | | | | | 3| PX SEND QC (RANDOM) |:TQ10002| 1 | |Q1,02| P->S |QC (RAND) | | 4| SORT AGGREGATE | | 1 | |Q1,02| PCWP | | |* 5| HASH JOIN | | 27G|19462 (86)|Q1,02| PCWP | | | 6| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | | | 7| PX SEND HYBRID HASH |:TQ10000| 940K| 1377 (1)|Q1,00| P->P |HYBRID HASH| | 8| STATISTICS COLLECTOR| | | |Q1,00| PCWC | | | 9| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,00| PCWC | | |*10| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,00| PCWP | | | 11| PX RECEIVE | | 940K| 1377 (1)|Q1,02| PCWP | | | 12| PX SEND HYBRID HASH |:TQ10001| 940K| 1377 (1)|Q1,01| P->P |HYBRID HASH| | 13| PX BLOCK ITERATOR | | 940K| 1377 (1)|Q1,01| PCWC | | |*14| TABLE ACCESS FULL |T_CASE1 | 940K| 1377 (1)|Q1,01| PCWP | | --------------------------------------------------------------------------------------- Nothing surprising, plan you’d expect when dealing with large #rows Maybe PX Skewness? Can’t use V$PQ_TQSTAT, we CTRL+Ced exec Not downgraded, used 8 processes
  • 30. 3 – How is my PX doing? PX Skew & ASH data select session_id, session_serial#, program, count(*) from gv$active_session_history where sql_id = 'frzgf5tc9cscc' and sql_exec_id = 16777217 group by session_id, session_serial#, program; SESSION_ID SESSION_SERIAL# PROGRAM COUNT(*) ---------- --------------- -------------------- ---------- 8 55195 oracle@oel7 (P003) 217 133 37006 oracle@oel7 (P001) 12 QC not showing nor most of the other processes, P003 top consumer Adding new ASH cols in the SQL we can drill down, e.g. plan step where time goes
  • 31. 3 – How is my PX doing? PX Skew & SQL Mon Many PX info in SQL Mon NOT COMING from ASH 
  • 32. 3 – How is my PX doing? PX Skew Summary • Questions answered • Presence of skewness during / after SQL execution • Regardless of V$PQ_TQSTAT view (tricky to use) • Needs SQL Monitor to have low level info (buffer gets, accurate time, etc) • Questions not answered • What causes the skewness and how to resolve it (not investigated here)
  • 33. 4 – My PX SQL performance is unstable
    SQL ID: 8nkpzgz08mdc8
    select /*+ PARALLEL(4) */ count(*)
      from t_case3 a, t_case3 b
     where a.object_id = b.object_id;

    select child_number, elapsed_time, buffer_gets, executions, px_servers_executions
      from gv$sql
     where sql_id = '8nkpzgz08mdc8';

    CHILD_NUMBER   ELAPSED_TIME  BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
    ------------ -------------- ------------ ---------- ---------------------
               0      3,498,326       97,015          3                     0
               1     13,187,086      196,073          0                    12
  • 34. 4 – PX SQL perf unstable - Checking plan
    --------------------------------------------------------------------------------------
    | Id|Operation                   |Name    |E-Rows|Cost(%CPU)| TQ  |IN-OUT|PQ Distrib |
    --------------------------------------------------------------------------------------
    |  0|SELECT STATEMENT            |        |      |7375 (100)|     |      |           |
    |  1| SORT AGGREGATE             |        |     1|          |     |      |           |
    |  2|  PX COORDINATOR            |        |      |          |     |      |           |
    |  3|   PX SEND QC (RANDOM)      |:TQ10002|     1|          |Q1,02| P->S |QC (RAND)  |
    |  4|    SORT AGGREGATE          |        |     1|          |Q1,02| PCWP |           |
    |* 5|     HASH JOIN              |        |   74M|  7375 (1)|Q1,02| PCWP |           |
    |  6|      PX RECEIVE            |        | 2363K|  3664 (1)|Q1,02| PCWP |           |
    |  7|       PX SEND HYBRID HASH  |:TQ10000| 2363K|  3664 (1)|Q1,00| P->P |HYBRID HASH|
    |  8|        STATISTICS COLLECTOR|        |      |          |Q1,00| PCWC |           |
    |  9|         PX BLOCK ITERATOR  |        | 2363K|  3664 (1)|Q1,00| PCWC |           |
    |*10|          TABLE ACCESS FULL |T_CASE3 | 2363K|  3664 (1)|Q1,00| PCWP |           |
    | 11|      PX RECEIVE            |        | 2363K|  3664 (1)|Q1,02| PCWP |           |
    | 12|       PX SEND HYBRID HASH  |:TQ10001| 2363K|  3664 (1)|Q1,01| P->P |HYBRID HASH|
    | 13|        PX BLOCK ITERATOR   |        | 2363K|  3664 (1)|Q1,01| PCWC |           |
    |*14|         TABLE ACCESS FULL  |T_CASE3 | 2363K|  3664 (1)|Q1,01| PCWP |           |
    --------------------------------------------------------------------------------------
    Do 4 slaves make sense looking at this plan vs. the SQL?
  • 35. 4 – PX SQL perf unstable – GV$SQL “history”
    After 1st exec
    CHILD_NUMBER   ELAPSED_TIME  BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
    ------------ -------------- ------------ ---------- ---------------------
               0         16,630          212          1                     0
               1      8,496,684       98,491          0                     8

    After 2nd exec
    CHILD_NUMBER   ELAPSED_TIME  BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
    ------------ -------------- ------------ ---------- ---------------------
               0      3,491,757       96,975          2                     0
               1      8,496,684       98,491          0                     8

    After 3rd exec
    CHILD_NUMBER   ELAPSED_TIME  BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
    ------------ -------------- ------------ ---------- ---------------------
               0      3,498,326       97,015          3                     0
               1     13,187,086      196,073          0                    12

    We got lucky here.
    The figures are cumulative, so downgrades are very hard to spot.
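Because the GV$SQL counters are cumulative, the only way to see what a single execution did is to subtract consecutive snapshots. A small Python sketch of that subtraction using the three snapshots from slide 35; it assumes a snapshot was captured after every execution and that the cursors were freshly loaded before the first one (which is the "luck" the slide refers to). The helper name `per_execution_deltas` is illustrative only:

```python
# Cumulative (elapsed_time, buffer_gets, executions, px_servers_executions)
# per child cursor, copied from the three GV$SQL snapshots on slide 35.
snapshots = [
    {0: (16_630, 212, 1, 0), 1: (8_496_684, 98_491, 0, 8)},
    {0: (3_491_757, 96_975, 2, 0), 1: (8_496_684, 98_491, 0, 8)},
    {0: (3_498_326, 97_015, 3, 0), 1: (13_187_086, 196_073, 0, 12)},
]


def per_execution_deltas(snaps):
    """Subtract each snapshot from the next one, child cursor by child cursor."""
    previous = {}
    deltas = []
    for snap in snaps:
        delta = {}
        for child, values in snap.items():
            before = previous.get(child, (0,) * len(values))
            delta[child] = tuple(now - old for now, old in zip(values, before))
        deltas.append(delta)
        previous = snap
    return deltas


deltas = per_execution_deltas(snapshots)

# PX server executions attributable to each run (summed over child cursors).
px_per_exec = [sum(d[3] for d in delta.values()) for delta in deltas]
print(px_per_exec)
```

The second run adds no PX server executions at all and the third adds 4 instead of 8, which is consistent with the SERIAL and DoP 2 executions that ASH reports for the same SQL ID.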
  • 36. 4 – PX SQL perf unstable – ASH data
    select distinct sql_exec_id, sql_exec_start,
           case when px_flags is null then 'SERIAL'
                else 'DoP '||trunc(px_flags/2097152) end dop
      from gv$active_session_history
     where sql_id = '8nkpzgz08mdc8'
     order by 2;

    SQL_EXEC_ID SQL_EXEC_START      DOP
    ----------- ------------------- ---------
       16777216 2017-08-20/17:08:50 DoP 4
       16777217 2017-08-20/17:09:16 SERIAL
       16777218 2017-08-20/17:10:03 DoP 2

    No need for luck, we've got ASH :)
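The divisor in the query on slide 36, 2097152, is 2^21, so `trunc(px_flags/2097152)` simply drops the low 21 bits of PX_FLAGS. A short Python sketch of the same arithmetic; the helper name `dop_label` and the sample flag values are made up for illustration, and the meaning of the low-order bits is not covered here:

```python
DOP_SHIFT = 21  # 2097152 == 2 ** 21, the divisor used in the slide's query


def dop_label(px_flags):
    """Mirror the CASE expression: NULL means serial, otherwise shift out the low bits."""
    if px_flags is None:
        return "SERIAL"
    return f"DoP {px_flags >> DOP_SHIFT}"


# Hypothetical PX_FLAGS values: the DoP in the high bits plus arbitrary low bits.
print(dop_label(4 * 2097152 + 5))
print(dop_label(None))
print(dop_label(2 * 2097152))
```

For non-negative integers the right shift gives the same result as the `trunc` of the division in the SQL.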
  • 37. 5 – My PX SQL perf is unstable – more fun
    SQL ID: gcgmgk8m8v4vm
    with a as (select /*+ materialize parallel(4)*/ a.object_id, b.object_name
                 from t_case3 a, t_case3 b
                where a.object_id = b.object_id
                  and rownum <= 1e6)
    select count(*)
      from (select /*+ parallel(4) no_merge */ c.object_name
              from t_case3 c, a
             where a.object_id = c.object_id
               and a.object_name = c.object_name);

    CHILD_NUMBER   ELAPSED_TIME  BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
    ------------ -------------- ------------ ---------- ---------------------
               0      4,097,033       50,679          1                     0
               1     14,191,651       98,499          0                     8
  • 38. 5 – My PX SQL perf is unstable - Checking plan
    ---------------------------------------------------------------------------------------------------------------------
    | Id |Operation                                |Name                      |E-Rows | Cost | TQ  |IN-OUT| PQ Distrib |
    ---------------------------------------------------------------------------------------------------------------------
    |   0|SELECT STATEMENT                         |                          |       | 11502|     |      |            |
    |   1| TEMP TABLE TRANSFORMATION               |                          |       |      |     |      |            |
    |   2|  LOAD AS SELECT (CURSOR DURATION MEMORY)|SYS_TEMP_0FD9D6B81_119A63B|       |      |     |      |            |
    |*  3|   COUNT STOPKEY                         |                          |       |      |     |      |            |
    |   4|    PX COORDINATOR                       |                          |       |      |     |      |            |
    |   5|     PX SEND QC (RANDOM)                 |:TQ10002                  |    74M|  7375|Q1,02| P->S | QC (RAND)  |
    |   6|      BUFFER SORT                        |                          |  1000K|      |Q1,02| PCWP |            |
    |*  7|       COUNT STOPKEY                     |                          |       |      |Q1,02| PCWC |            |
    |*  8|        HASH JOIN                        |                          |    74M|  7375|Q1,02| PCWP |            |
    |   9|         PX RECEIVE                      |                          |  2363K|  3664|Q1,02| PCWP |            |
    |  10|          PX SEND HYBRID HASH            |:TQ10000                  |  2363K|  3664|Q1,00| P->P | HYBRID HASH|
    |  11|           STATISTICS COLLECTOR          |                          |       |      |Q1,00| PCWC |            |
    |  12|            PX BLOCK ITERATOR            |                          |  2363K|  3664|Q1,00| PCWC |            |
    |* 13|             TABLE ACCESS FULL           |T_CASE3                   |  2363K|  3664|Q1,00| PCWP |            |
    |  14|         PX RECEIVE                      |                          |  2363K|  3664|Q1,02| PCWP |            |
    |  15|          PX SEND HYBRID HASH            |:TQ10001                  |  2363K|  3664|Q1,01| P->P | HYBRID HASH|
    |  16|           PX BLOCK ITERATOR             |                          |  2363K|  3664|Q1,01| PCWC |            |
    |* 17|            TABLE ACCESS FULL            |T_CASE3                   |  2363K|  3664|Q1,01| PCWP |            |
    |  18|  SORT AGGREGATE                         |                          |     1 |      |     |      |            |
    |  19|   PX COORDINATOR                        |                          |       |      |     |      |            |
    |  20|    PX SEND QC (RANDOM)                  |:TQ20002                  |     1 |      |Q2,02| P->S | QC (RAND)  |
    |  21|     SORT AGGREGATE                      |                          |     1 |      |Q2,02| PCWP |            |
    |  22|      VIEW                               |                          |  1000K|  4127|Q2,02| PCWP |            |
    |* 23|       HASH JOIN                         |                          |  1000K|  4127|Q2,02| PCWP |            |
    |  24|        PX RECEIVE                       |                          |  1000K|   461|Q2,02| PCWP |            |
    |  25|         PX SEND HASH                    |:TQ20000                  |  1000K|   461|Q2,00| P->P | HASH       |
    |  26|          VIEW                           |                          |  1000K|   461|Q2,00| PCWP |            |
    |  27|           PX BLOCK ITERATOR             |                          |  1000K|   461|Q2,00| PCWC |            |
    |* 28|            TABLE ACCESS FULL            |SYS_TEMP_0FD9D6B81_119A63B|  1000K|   461|Q2,00| PCWP |            |
    |  29|        PX RECEIVE                       |                          |  2363K|  3664|Q2,02| PCWP |            |
    |  30|         PX SEND HASH                    |:TQ20001                  |  2363K|  3664|Q2,01| P->P | HASH       |
    |  31|          PX BLOCK ITERATOR              |                          |  2363K|  3664|Q2,01| PCWC |            |
    |* 32|           TABLE ACCESS FULL             |T_CASE3                   |  2363K|  3664|Q2,01| PCWP |            |
    ---------------------------------------------------------------------------------------------------------------------
  • 39. 5 – My PX SQL perf is unstable – ASH solution
    select sample_time, program, sql_plan_line_id,
           case when px_flags … dop
      from ash
     where sql_exec_id = 16777216
       and sql_id = 'gcgmgk8m8v4vm'
     order by 1, 2;

    SAMPLE_TIME                   PROGRAM              SQL_PLAN_LINE_ID DOP
    ----------------------------- -------------------- ---------------- --------
    20-AUG-17 05.50.13.290 PM     oracle@oel7 (P000)                  6 DoP 4
    20-AUG-17 05.50.13.290 PM     oracle@oel7 (P001)                  6 DoP 4
    20-AUG-17 05.50.13.290 PM     oracle@oel7 (P002)                  6 DoP 4
    20-AUG-17 05.50.13.290 PM     oracle@oel7 (P003)                  6 DoP 4
    20-AUG-17 05.50.14.290 PM     oracle@oel7 (P000)                  6 DoP 4
    20-AUG-17 05.50.14.290 PM     oracle@oel7 (P001)                  6 DoP 4
    20-AUG-17 05.50.14.290 PM     oracle@oel7 (P002)                  6 DoP 4
    20-AUG-17 05.50.14.290 PM     oracle@oel7 (P003)                  6 DoP 4
    20-AUG-17 05.50.15.289 PM     oracle@oel7 (P000)                  6 DoP 4
    20-AUG-17 05.50.15.289 PM     oracle@oel7 (P001)                  6 DoP 4
    20-AUG-17 05.50.15.289 PM     oracle@oel7 (P002)                  6 DoP 4
    20-AUG-17 05.50.15.289 PM     oracle@oel7 (P003)                  6 DoP 4
    20-AUG-17 05.50.16.294 PM     sqlplus@Mauros-MBP.w                2 SERIAL
    20-AUG-17 05.50.17.296 PM     sqlplus@Mauros-MBP.w               23 SERIAL
    20-AUG-17 05.50.18.296 PM     sqlplus@Mauros-MBP.w               23 SERIAL
    20-AUG-17 05.50.19.296 PM     sqlplus@Mauros-MBP.w               23 SERIAL
  • 40. 5 – My PX SQL perf is unstable – SQLMon sol
  • 41. 5 – My PX SQL perf is unstable – SQLM poking
    select sid, process_name, px_maxdop, px_servers_requested, px_servers_allocated,
           px_server#, px_server_group, px_server_set, px_qcsid
      from gv$sql_monitor
     where sql_exec_id = 16777216
       and sql_id = 'gcgmgk8m8v4vm'
     order by px_server_set nulls first, px_server# nulls first;

           SID PROCE  PX_MAXDOP PX_S_REQUESTED PX_S_ALLOC PX_SERVER# PX_SERVER_GROUP PX_SERVER_SET   PX_QCSID
    ---------- ----- ---------- -------------- ---------- ---------- --------------- ------------- ----------
           244 ora            4             16          8
           373 p000                                                1               1             1        244
           132 p001                                                2               1             1        244
           256 p002                                                3               1             1        244
            13 p003                                                4               1             1        244
           133 p004                                                1               1             2        244
           255 p005                                                2               1             2        244
           372 p006                                                3               1             2        244
            12 p007                                                4               1             2        244
  • 42. 4 & 5 PX SQL perf unstable – Summary
    • Questions answered
      • Ability to determine the DoP during / after execution
      • Regardless of V$PX_SESSION (and other) views
      • Ability to determine the DoP on a per DFO-tree basis
      • Pretty much impossible from GV$ / AWR
      • Multiple dimensions can be added to drill down into slave executions (e.g. waits)
      • SQL Monitor is the only way to extract low-level info per slave
      • For example, buffer gets, accurate time, #rows, starts, etc.
    • Questions not answered
      • What causes the downgrade (not investigated here)
  • 43. Trivia – My SQL blew up TEMP
    SQL ID: 0qnb575hn2mkr (FAILS) & dm53symv2vmy6 (WORKS)
    select /*+ PARALLEL(4) LEADING(B A C) USE_SWAP(c) USE_HASH(A) USE_HASH(C) FAILS|WORKS */ count(*)
      from t_case3 a, t_case3 b, t_case3 c
     where a.object_id = b.object_id
       and a.object_id = c.object_id;

    ERROR at line 1:
    ORA-12801: error signaled in parallel query server P000
    ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

    << hint, I’m messing with the env and with you >>
  • 44. Trivia – My SQL blew up TEMP
    select sql_id, child_number, executions, px_servers_executions, buffer_gets,
           disk_reads, direct_reads, direct_writes
      from gv$sql
     where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr')
     order by 1,2;

    SQL_ID  CHILD EXECS PX_EXECS  BUFFER_GETS  DISK_READS  DIRECT_READS DIRECT_WRITES
    ------ ------ ----- -------- ------------ ----------- ------------- -------------
    0qnb57      0     1        0          104           0             0             0
    0qnb57      1     0        8      145,756     147,300       147,300        73,253
    dm53sy      0     1        0           15           0             0             0
    dm53sy      1     0        8      145,752     145,068       145,068             0
  • 45. Trivia – My SQL blew up TEMP
    sql_id = 'dm53symv2vmy6' group by sample_time order by 1;

    SAMPLE_TIME                  SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
    ---------------------------- ------------------ -------------------------
    20-AUG-17 07.16.06.259 PM            84,414,464                         0
    20-AUG-17 07.16.07.259 PM           766,644,224                         0
    20-AUG-17 07.16.08.264 PM           766,644,224                         0
    <<…>>
    20-AUG-17 07.16.24.316 PM           766,644,224                         0
    20-AUG-17 07.16.25.316 PM           766,644,224                         0

    sql_id = '0qnb575hn2mkr' group by sample_time order by 1;

    SAMPLE_TIME                  SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
    ---------------------------- ------------------ -------------------------
    20-AUG-17 07.17.49.463 PM            17,075,200                61,865,984
    20-AUG-17 07.17.50.463 PM            40,308,736               148,897,792
    <<…>>
    20-AUG-17 07.17.55.467 PM            58,396,672               509,607,936
    20-AUG-17 07.17.56.466 PM            58,396,672               583,008,256

    Used less PGA but spilled to TEMP
  • 46. Trivia – My SQL blew up TEMP – SQL Monitor
    dm53symv2vmy6 | 0qnb575hn2mkr
  • 47. Trivia – My SQL blew up TEMP
    select sql_id, child_number, optimizer_env_hash_value
      from gv$sql
     where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr');

    SQL_ID        CHILD_NUMBER OPTIMIZER_ENV_HASH_VALUE
    ------------- ------------ ------------------------
    0qnb575hn2mkr            0               3821565029
    0qnb575hn2mkr            1                128879201
    dm53symv2vmy6            0               3821565029
    dm53symv2vmy6            1                128879201

    Same CBO environment, i.e. same CBO params
    Not all _smm_* params make it into the CBO env!!!
  • 48. 6 – SQL blew up TEMP – prevention!!
    SQL ID: 8d5h5p8znx8mx
    select /*+ PARALLEL(4) LEADING(B A C) USE_SWAP(c) USE_HASH(A) USE_HASH(C) */ count(*)
      from t_case1 a, t_case1 b, t_case1 c
     where a.object_id = b.object_id
       and a.object_id = c.object_id;

    << not using GV$/AWR because we need to differentiate per exec >>
  • 49. 6 – SQL blew up TEMP – history
    1st run
    SAMPLE_TIME                     SQL_EXEC_ID        PGA       TEMP
    ------------------------------- ----------- ---------- ----------
    21-AUG-17 10.09.12.674 AM          16777217     105.22         61

    2nd run – data is growing
    ------------------------------- ----------- ---------- ----------
    21-AUG-17 10.15.32.182 AM          16777218      17.12         20
    21-AUG-17 10.15.33.182 AM          16777218      70.12        113

    3rd run – data keeps growing
    ------------------------------- ----------- ---------- ----------
    21-AUG-17 10.16.31.259 AM          16777219       2.26          0
    21-AUG-17 10.16.32.259 AM          16777219      21.69         40
    21-AUG-17 10.16.33.261 AM          16777219      70.12        110
  • 50. 6 – SQL blew up TEMP – Aggregated history
    Aggregating over a few runs makes the trend obvious (increasing memory usage)

    SQL_EXEC_ID SQL_EXEC_START      APPROX_ET                   PGA       TEMP
    ----------- ------------------- ----------------------- ------- ----------
       16777216 2017-08-21/10:07:19 +000000000 00:00:03.578   47.37         82
       16777217 2017-08-21/10:09:11 +000000000 00:00:01.674  105.22         61
       16777218 2017-08-21/10:15:30 +000000000 00:00:03.182   70.12        113
       16777219 2017-08-21/10:16:31 +000000000 00:00:02.261   70.12        110
       16777220 2017-08-21/10:17:21 +000000000 00:00:04.342  126.12        160
       16777221 2017-08-21/10:18:14 +000000000 00:00:05.449  134.94        188
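The per-execution figures on slide 50 are the per-sample rows of slide 49 collapsed to one row per SQL_EXEC_ID: peak PGA, peak TEMP, and the gap between SQL_EXEC_START and the last sample as an approximate elapsed time. A minimal Python sketch of that roll-up for three of the runs, using the per-sample totals from slide 49 and the start times from slide 50; the function name `summarize` and the tuple layout are assumptions made for this illustration:

```python
from collections import defaultdict
from datetime import datetime

FMT = "%H:%M:%S.%f"

# (sql_exec_id, sql_exec_start, sample_time, pga_mb, temp_mb), all on 21-AUG-17.
# PGA/TEMP are already summed over every session of the execution.
samples = [
    (16777217, "10:09:11.000", "10:09:12.674", 105.22, 61),
    (16777218, "10:15:30.000", "10:15:32.182", 17.12, 20),
    (16777218, "10:15:30.000", "10:15:33.182", 70.12, 113),
    (16777219, "10:16:31.000", "10:16:31.259", 2.26, 0),
    (16777219, "10:16:31.000", "10:16:32.259", 21.69, 40),
    (16777219, "10:16:31.000", "10:16:33.261", 70.12, 110),
]


def summarize(rows):
    """Collapse per-sample rows into peak PGA/TEMP and approximate elapsed time."""
    per_exec = defaultdict(list)
    for exec_id, start, sample, pga, temp in rows:
        per_exec[exec_id].append((start, sample, pga, temp))
    summary = {}
    for exec_id, items in per_exec.items():
        start = datetime.strptime(items[0][0], FMT)
        last = max(datetime.strptime(sample, FMT) for _, sample, _, _ in items)
        summary[exec_id] = {
            "approx_et": (last - start).total_seconds(),
            "pga": max(pga for _, _, pga, _ in items),
            "temp": max(temp for _, _, _, temp in items),
        }
    return summary


summary = summarize(samples)
for exec_id, row in sorted(summary.items()):
    print(exec_id, row)
```

The elapsed time is only approximate because the last sample can be taken up to about a second before the execution actually ends.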
  • 51. 6 – SQL blew up TEMP – Chart your data!
    ASH info is really easy to chart
    Faster to consume!
  • 52. 6 – SQL blew up TEMP – Keep executing
    One new “break the pattern” execution showed up

    SQL_EXEC_ID SQL_EXEC_START      APPROX_ET                   PGA  TEMP
    ----------- ------------------- ----------------------- ------- -----
       16777216 2017-08-21/10:07:19 +000000000 00:00:03.578   47.37    82
       16777217 2017-08-21/10:09:11 +000000000 00:00:01.674  105.22    61
       16777218 2017-08-21/10:15:30 +000000000 00:00:03.182   70.12   113
       16777219 2017-08-21/10:16:31 +000000000 00:00:02.261   70.12   110
       16777220 2017-08-21/10:17:21 +000000000 00:00:04.342  126.12   160
       16777221 2017-08-21/10:18:14 +000000000 00:00:05.449  134.94   188
       16777222 2017-08-21/10:26:27 +000000000 00:00:07.482   69.69   109

    Touched less PGA / TEMP but took longer
  • 53. 6 – SQL blew up TEMP – Drill into 1 exec
    sql_id = '8d5h5p8znx8mx' and sql_exec_id = 16777222

    SAMPLE_TIME                 SID PROGRAM              EVENT
    --------------------------- --- -------------------- ---------------------------------
    21-AUG-17 10.26.28.480 AM   254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
    21-AUG-17 10.26.29.480 AM   254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
    21-AUG-17 10.26.30.481 AM   254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
    21-AUG-17 10.26.31.480 AM   254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
    21-AUG-17 10.26.32.480 AM   254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
    21-AUG-17 10.26.33.480 AM   253 oracle@oel7 (P005)
    21-AUG-17 10.26.33.480 AM   362 oracle@oel7 (P006)
    21-AUG-17 10.26.34.482 AM    16 oracle@oel7 (P003)   direct path write temp
    21-AUG-17 10.26.34.482 AM   135 oracle@oel7 (P000)   direct path write temp
    21-AUG-17 10.26.34.482 AM   255 oracle@oel7 (P001)   direct path write temp
    21-AUG-17 10.26.34.482 AM   373 oracle@oel7 (P002)   direct path write temp
  • 54. 6 – SQL blew up TEMP – Keep executing
    One more execution showed up

    SQL_EXEC_ID SQL_EXEC_START      APPROX_ET                   PGA  TEMP
    ----------- ------------------- ----------------------- ------- -----
       16777216 2017-08-21/10:07:19 +000000000 00:00:03.578   47.37    82
       16777217 2017-08-21/10:09:11 +000000000 00:00:01.674  105.22    61
       16777218 2017-08-21/10:15:30 +000000000 00:00:03.182   70.12   113
       16777219 2017-08-21/10:16:31 +000000000 00:00:02.261   70.12   110
       16777220 2017-08-21/10:17:21 +000000000 00:00:04.342  126.12   160
       16777221 2017-08-21/10:18:14 +000000000 00:00:05.449  134.94   188
       16777222 2017-08-21/10:26:27 +000000000 00:00:07.482   69.69   109
       16777223 2017-08-21/10:34:05 +000000000 00:00:02.126   69.12   108

    Same PGA / TEMP as previous but much faster
  • 55. 6 – SQL blew up TEMP – Mystery solved
    One more execution showed up, but the executions come from different sessions

    SID SQL_EXEC_ID SQL_EXEC_START      APPROX_ET                   PGA  TEMP
    --- ----------- ------------------- ----------------------- ------- -----
    130    16777219 2017-08-21/10:16:31 +000000000 00:00:02.261   70.12   110
    130    16777220 2017-08-21/10:17:21 +000000000 00:00:04.342  126.12   160
    130    16777221 2017-08-21/10:18:14 +000000000 00:00:05.449  134.94   188
    254    16777222 2017-08-21/10:26:27 +000000000 00:00:07.482   69.69   109
    254    16777223 2017-08-21/10:34:05 +000000000 00:00:02.126   69.12   108
    130    16777224 2017-08-21/10:38:01 +000000000 00:00:05.402  139.94   201
  • 56. 6 – SQL blew up TEMP – Why not AWR?
    select child_number, executions, px_servers_executions, elapsed_time, direct_writes,
           elapsed_time/nvl(nullif(px_servers_executions,0),executions) et_exec,
           direct_writes/nvl(nullif(px_servers_executions,0),executions) direct_wrtes_exec
      from gv$sql
     where sql_id = '8d5h5p8znx8mx';

    CHILD_NUMBER EXECS PX_EXECS  ELAP_TIME DIRECT_W      ET_EXEC DIRECT_W_EXEC
    ------------ ----- -------- ---------- -------- ------------ -------------
               0     9        0  7,803,972        0      867,108             0
               1     0       71 91,106,462  154,559 1,283,189.61      2,176.88

    You might be able to figure it out from GV$, but you need a lot of imagination and luck :)
  • 57. 6 – SQL blew up TEMP – Summary
    • Questions answered
      • Ability to monitor spill on a per-execution and per-session basis
      • AWR would only show aggregated info
      • Similar info available for IOPS and IO bytes (and memory scan in V$ASH)
      • Charting the info allows easy monitoring
      • Large amount of info consumed quickly
      • SQL Monitor relies on the same ASH info
      • Even without SQL Mon, tons of info can be extracted from ASH
  • 58. 7 – Making sense of “strange” executions
    SQL ID: 06pbgg9w0bmgp
    select /*+ mauro */ a.*
      from t_case1 a, t_case1 b
     where a.owner = l_owner
       and a.object_id = b.object_id
       and burn_cpu(a.object_id/b.object_id) = 1

    select child_number, executions, end_of_fetch_count, elapsed_time, fetches, rows_processed
      from gv$sql
     where sql_id = '06pbgg9w0bmgp';

    CHILD EXECS EOF_COUNT ELAPSED_TIME  FETCHES ROWS_PROCESSED
    ----- ----- --------- ------------ -------- --------------
        0     3         0   30,419,821        6             30

    This is a single session executing the SQL
    Why did none of the executions reach EOF?
  • 59. 7 – Making sense of “strange” executions
    sql_id = '06pbgg9w0bmgp' and session_id = 377 order by sample_time;

    SAMPLE_TIME        SQLEXECID SEXECSTA
    ------------------ --------- --------
    06.04.37.450 PM     16777222 18:04:36
    06.04.38.450 PM     16777222 18:04:36
    06.04.39.450 PM     16777222 18:04:36
    06.04.40.450 PM     16777222 18:04:36
    06.04.41.450 PM     16777223 18:04:41
    06.04.42.450 PM     16777223 18:04:41
    06.04.43.450 PM     16777223 18:04:41
    06.04.44.450 PM     16777223 18:04:41
    06.04.45.450 PM     16777223 18:04:41
    06.04.46.450 PM     16777224 18:04:46
    06.04.47.450 PM     16777224 18:04:46
    06.04.48.450 PM     16777224 18:04:46
    06.04.49.450 PM     16777224 18:04:46
    06.04.50.450 PM     16777224 18:04:46
    06.04.51.450 PM     16777224 18:04:46
    06.04.52.450 PM     16777223 18:04:41
    06.04.53.450 PM     16777223 18:04:41
    06.04.54.450 PM     16777223 18:04:41
    06.04.55.450 PM     16777223 18:04:41
    06.04.56.450 PM     16777223 18:04:41
    06.04.57.450 PM     16777224 18:04:46
    06.04.58.450 PM     16777224 18:04:46
    06.04.59.450 PM     16777224 18:04:46
    06.05.00.450 PM     16777224 18:04:46
    06.05.01.450 PM     16777224 18:04:46
    06.05.02.450 PM     16777222 18:04:36
    06.05.03.450 PM     16777222 18:04:36
    06.05.04.450 PM     16777222 18:04:36
    06.05.05.450 PM     16777222 18:04:36
    06.05.06.450 PM     16777222 18:04:36
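One way to read the listing on slide 59 is to collapse the 30 samples into consecutive bursts of the same SQL_EXEC_ID. A short Python sketch of that slicing, using the execution IDs exactly in the order shown; the variable names are illustrative:

```python
from itertools import groupby

E22, E23, E24 = 16777222, 16777223, 16777224

# SQL_EXEC_ID of each ASH sample for session 377, in sample_time order (slide 59).
exec_ids = [E22] * 4 + [E23] * 5 + [E24] * 6 + [E23] * 5 + [E24] * 5 + [E22] * 5

# Collapse consecutive samples of the same execution into (exec_id, samples) bursts.
runs = [(exec_id, len(list(group))) for exec_id, group in groupby(exec_ids)]

# Count how many separate bursts each execution appears in.
runs_per_exec = {}
for exec_id, _ in runs:
    runs_per_exec[exec_id] = runs_per_exec.get(exec_id, 0) + 1

for exec_id, length in runs:
    print(exec_id, length)
```

Three executions each show up in two separate bursts, six bursts in total, which lines up with the 3 executions and 6 fetches reported by GV$SQL. Treating each burst as one fetch call is an interpretation of the samples, not something ASH records directly.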
  • 60. 7 – Making sense of “strange” executions - Summary
    • Questions answered
      • ASH data can be used to “slice” GV$ data and make more sense out of it
      • In this specific case it is probably not a cursor leak
      • The cursor is used multiple times
      • The same approach could potentially be used to spot a cursor leak
      • The SQL would have to run “long” enough to be sampled
    • Question not answered
      • Why would somebody do anything like this :)
  • 61. Something worth knowing
    • ASH data uses default values until the real value is “ready to consume”
    • Adaptive Plans can take a while to resolve, and until then the PHV is 0
    select /*+ LEADING(a) */ count(a.object_id)
      from (select /*+ no_merge leading (a) */ 1 object_id, 'a' owner
              from (select rownum from dual connect by rownum <= 1000) a,
                   (select rownum from dual connect by rownum <= 1000) b) a,
           (select a.object_id
              from t1 a, t2 b
             where a.object_id = b.n1
               and a.data_object_id = 1
               and a.owner = 'SYS') b
     where a.object_id = b.object_id
  • 62. Something worth knowing
    ------------------------------------------------------------------------------------------
    |  Id  |Operation                              |Name|E-Rows|Cost (%CPU)| Pstart| Pstop |
    ------------------------------------------------------------------------------------------
    |     0|SELECT STATEMENT                       |    |      |  679 (100)|       |       |
    |     1| SORT AGGREGATE                        |    |     1|           |       |       |
    |- *  2|  HASH JOIN                            |    |     1|    679 (1)|       |       |
    |     3|   NESTED LOOPS                        |    |     1|    679 (1)|       |       |
    |-    4|    STATISTICS COLLECTOR               |    |      |           |       |       |
    |  *  5|     HASH JOIN                         |    |     1|    405 (1)|       |       |
    |     6|      VIEW                             |    |     1|      4 (0)|       |       |
    |     7|       MERGE JOIN CARTESIAN            |    |     1|      4 (0)|       |       |
    |     8|        VIEW                           |    |     1|      2 (0)|       |       |
    |     9|         COUNT                         |    |      |           |       |       |
    |    10|          CONNECT BY WITHOUT FILTERING |    |      |           |       |       |
    |    11|           FAST DUAL                   |    |     1|      2 (0)|       |       |
    |    12|        BUFFER SORT                    |    |     1|      4 (0)|       |       |
    |    13|         VIEW                          |    |     1|      2 (0)|       |       |
    |    14|          COUNT                        |    |      |           |       |       |
    |    15|           CONNECT BY WITHOUT FILTERING|    |      |           |       |       |
    |    16|            FAST DUAL                  |    |     1|      2 (0)|       |       |
    |  * 17|      TABLE ACCESS FULL                |T1  |     1|    401 (1)|       |       |
    |    18|    PARTITION RANGE ITERATOR           |    | 49999|    274 (0)|  KEY  |  KEY  |
    |  * 19|     TABLE ACCESS FULL                 |T2  | 49999|    274 (0)|  KEY  |  KEY  |
    |-   20|   PARTITION RANGE JOIN-FILTER         |    | 49999|    274 (0)|:BF0000|:BF0000|
    |-   21|    TABLE ACCESS FULL                  |T2  | 49999|    274 (0)|:BF0000|:BF0000|
    ------------------------------------------------------------------------------------------
  • 63. Something worth knowing
    select sample_time, sql_plan_hash_value, sql_plan_line_id
      from gv$active_session_history
     where sql_id = '8x52hyvsh1j45'
     order by sample_time

    SAMPLE_TIME                 SQL_PLAN_HASH_VALUE SQL_PLAN_LINE_ID
    --------------------------- ------------------- ----------------
    28-AUG-17 03.42.43.251 PM                     0                7
    28-AUG-17 03.42.44.251 PM                     0                6
    28-AUG-17 03.42.45.251 PM                     0                6
    28-AUG-17 03.42.46.252 PM                     0                7
    28-AUG-17 03.42.47.252 PM                     0                7
    28-AUG-17 03.42.48.252 PM                     0                5
  • 64. Things we just can’t do (as of now)
    • Current diagnostics are very comprehensive
      • They answer many questions around SQL execution
    • Still, some questions remain unanswered, for example
      • Whether a SQL Plan Baseline / SQL Patch was used in the past (AWR limitation)
      • High Version Count in the past (AWR “limitation”)
      • Details of an “old” CBO environment (encoded, no public API)
      • Historical binds for a slow execution (unless captured, requires luck)
      • Changes in the NLS environment in the past (current, V$SQL_SHARED_CURSOR)
    • Probably not a big problem, unless you hit it :)
  • 65. Summary
    • Oracle diagnostics rock when used properly
      • No single source of info; sources need combining to get the full picture
    • ASH provides a different point of view into SQL execution
      • Needed more often than expected
    • Regardless of the source, visualizing things makes it easier
      • But this is Enkitec so you are stuck with me & SQL*Plus :)
    • SQL Monitoring fills some of the gaps
      • Much of the info still comes from ASH, available even historically (more than SQLMon)
    • Statspack + free ASH can provide useful info
      • Unfortunately not as comprehensive as the “real” ones
  • 67. Contact Information
    • Blog: http://mauro-pagano.com
      • Free tools
        • SQLd360
        • TUNAs360
        • Pathfinder
      • An “interesting” post every N posts
    • Email: mauro.pagano@gmail.com
    • Twitter: @mautro

Editor's notes

  1. Mention that ASH could
  2. Parameter controlling ratio is _ash_disk_filter_ratio ########## INSERT INTO wrh$_sqlstat ()SELECT ... FROM X$KEWRSQLIDTAB sie, X$KGLCURSOR_CHILD_SQLIDPH SQL WHERE sie.sqlid_kewrsie = SQL.kglobt03 AND nlssort(sie.sqlid_kewrsie,'nls_sort = binary') = nlssort(SQL.kglobt03,'nls_sort = binary') ########## INSERT INTO WRH$_ACTIVE_SESSION_HISTORY ( ) (SELECT /*+ PARAM('_module_action_old_length',0) */: FROM x$ash a, (SELECT h.sample_addr, h.sample_id FROM x$kewash h WHERE ((h.sample_id >= :begin_flushing) and (h.sample_id < :latest_sample_id)) and (nlssort(h.is_awr_sample,'nls_sort=BINARY') = nlssort('Y', 'nls_sort=BINARY'))) shdr WHERE (1 = 1) and shdr.sample_addr = a.sample_addr and shdr.sample_id = a.sample_id and nlssort(a.need_awr_sample, 'nls_sort=BINARY') = nlssort('Y', 'nls_sort=BINARY'))
  3. Ask where people would go to check how long the SQL takes (forget about AWR for now) -> GV$SQL would be a good answer ####### To run it in SQL*Plus var b1 varchar2(20) exec :b1 := '…';
  4. Ask where people would go to check how long the SQL takes (forget about AWR for now) -> GV$SQL would be a good answer ####### To run it in SQL*Plus var b1 varchar2(20) exec :b1 := '…';
  5. Why was elapsed 5 minutes? Is this a case a user would complain about? Probably yes (5 min is a lot compared to 70ms). Which of the two is representative of the user experience: GV$SQL or the time recorded by SQL*Plus? Is this strictly a DB problem? It takes only 1.8s inside the DB vs 5 mins outside.
  6. Many things we could question from the app perspective
  7. Can we find start / end from anywhere in the DB? This situation is not possible because “executions_total” two slides ago says 1
  8. Questions: How do you check the DoP the SQL is executing at? V$SESSION would be a good answer (another is V$PX_SESSION) What if you want to know metrics per session?
  9. Mention that if OWNER had a histogram in 12c we would have used PX SEND HYBRID HASH (SKEW)
  10. Questions: How do you check the DoP the SQL is executing at? V$SESSION would be a good answer (another is V$PX_SESSION) What if you want to know metrics per session?
  11. What’s the avg elapsed time? But we know it’s no good because the claim is unstable perf and so avg is bad What’s the DoP for my SQL?
  12. We got really lucky here to have the list of executions 1st exec all normal 2nd exec strange because #exec bumped but not px_servers 3rd exec strange too because px execs bumped by 4 instead of 8
  13. DBA_HIST_ACTIVE_SESS_HISTORY would have been the same
  14. Nothing strange here, just the SQL text might suggest something Next is looking on plan We don’t compare here
  15. select sample_time, program, sql_plan_line_id, case when px_flags is null then 'SERIAL' else 'DoP '||trunc(px_flags/ 2097152) end dop from gv$active_session_history where sql_exec_id = 16777216 and sql_id = 'gcgmgk8m8v4vm' and sql_exec_start = to_date('20170820175012','yyyymmddhh24miss') order by 1 ;
  16. select sample_time, program, sql_plan_line_id, case when px_flags is null then 'SERIAL' else 'DoP '||trunc(px_flags/ 2097152) end dop from gv$active_session_history where sql_exec_id = 16777216 and sql_id = 'gcgmgk8m8v4vm' and sql_exec_start = to_date('20170820175012','yyyymmddhh24miss') order by 1 ;
  17. select sample_time, program, sql_plan_line_id, case when px_flags is null then 'SERIAL' else 'DoP '||trunc(px_flags/ 2097152) end dop from gv$active_session_history where sql_exec_id = 16777216 and sql_id = 'gcgmgk8m8v4vm' and sql_exec_start = to_date('20170820175012','yyyymmddhh24miss') order by 1 ;
  18. This one is just to introduce the new topic
  19. This one is just to introduce the new topic
  20. This one is just to introduce the new topic
  21. _smm_px_max_size
  22. Background is critical SQL blew up TEMP and you want to make sure it doesn’t happen again, can you?
  23. History of some executions
  24. Aggregating by executions select sql_exec_id, sql_exec_start, max(sample_time)-sql_exec_start approx_et, max(pga) pga, max(temp) temp from (select sql_exec_id, sql_exec_start, sample_time, trunc(sum(pga_allocated)/1024/1024,2) pga, trunc(sum(temp_space_allocated)/1024/1024,2) temp from gv$active_session_history where sql_id = '8d5h5p8znx8mx' and sample_time >= sysdate-20/1440 group by sql_exec_id, sql_exec_start, sample_time)group by sql_exec_id, sql_exec_start order by 1,2
  25. Background is critical SQL blew up TEMP and you want to make sure it doesn’t happen again, can you?
  26. What can we do here to investigate what happened?
  27. QC waiting on enq KO, what might that be?
  28. What can we do here to investigate what happened?
  29. What can we do here to investigate what happened?
  30. There are two possibilities, this is a single session or multiple sessions. Multiple sessions would be trivial Single session gets interesting <- could even be cursor leaking
  31. Bug 26573174 SQL is just written to waste time and adapt later on, no real meaning
  32. The adaptive part is towards the end of the plan (steps 18-21)
  33. 5,6,7 are HJ, VIEW and MJC steps in the top block