2. Mauro Pagano @mautro
• Worked at Oracle, been at Enkitec (AEG) a while now
• Spend most of the time on performance problems
• Free tools: SQLd360, TUNAs360 etc (at Oracle: SQLT, SQLHC etc)
• Strong British accent
• “Newbie old fart” (approved by Bryn)
6. Poll time – “Historical” SQL Tuning
1. Do you use AWR for SQL Tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
2. Do you use ASH for SQL Tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
3. Do you use SQL Monitoring for SQL tuning?
   1. Do you start from it?
   2. Why?
   3. How do you use it?
7. What are we doing here today?
• Oracle has a ton of diagnostics (awesome!)
• People tend to rely on GV$ / AWR more than ASH
• Some questions are harder (if at all possible) to answer from GV$ / AWR data
• Today’s goal is to:
• Present scenarios where multiple sources are needed
• Explain why and where to gather the missing info, and make sense of it
• Know what each source represents, to make better use of it
• Focus is on diagnostics
8. What are we NOT doing here today?
• Arguing about which one is better
• They complement each other, not exclude each other
• Need all (often AWR+ASH is enough) to have a full picture
• One could be enough depending on the case, still the other adds value
• Providing solutions to the scenarios presented
• Today it’s about diagnostics, not problem X or Y
• Once the behavior is identified correctly, the solution is often easier to find (if one exists)
• Talking about license / cost associated with the Packs used
9. Some (incorrect) terminology we’ll use today
• GV$: all views on X$, except X$ASH
• For example, GV$SQL
• AWR: all the tables in AWR, except ASH
• For example, DBA_HIST_SQLSTAT and DBA_HIST_SQL_PLAN
• ASH: GV$ACTIVE_SESSION_HISTORY and DBA_HIST_ACTIVE_SESS_HISTORY
• SQL Mon: SQL Monitor
• Both raw data (GV$) and reports (current and historical)
10. How are AWR / (historical) ASH populated?
• AWR takes a picture every N minutes (or on demand)
• Source views store accumulated data, AWR takes a picture of that at time T
• Historical ASH keeps a filtered subset of the samples from in-memory ASH
• Filtered samples may show info not important enough to show up in the accumulated data
• Source data includes info for each active session individually (not aggregated)
• Ratio is generally 1:10 (one in-memory sample in ten is kept)
• X$KEW[A|R]* help narrow down what gets collected
• Might make things a little harder to “break” in isolation
11. Why do we need both?
• AWR: knows exactly how much water
• ASH: knows roughly who, when, how…
12. ASH samples - Why can we live with it?
[Chart: “Elapsed time / execution” vs “Frequency of execution”]
How you move these bars depends on your app
13. Before we begin…
• Every case is artificial, represents a real-life case without the noise
• The case itself is just a means to an end, not really the focus of the scenario
• Cases build on each other, start simple and get a little more complex
• OF COURSE I’m cheating!
• What we’ll see can be applied to any environment
• Knowing how to interpret and spot things helps in dev too
• Charts used just to present large amount of info in small space
• Not trying to push for any specific tool
• Get into DataViz anyway, it makes life so much easier!
14. Two tables, all we need
-- 8x dba_objects
create table t_case1 as
select *
from dba_objects,
(select rownum n1 from dual connect by rownum <= 8);
create index t_case1_objtype on t_case1(object_type);
-- 32x dba_objects
create table t_case3 as
select *
from t_case1,
(select rownum n2 from dual connect by rownum <= 4);
15. 1 - How long does my SQL take?
SQL ID: apg0k1r43s8ak
SQL Text: select * from t_case1 where object_type = :b1;
Plan hash value: 3696583251
---------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_CASE1 |
|* 2 | INDEX RANGE SCAN | T_CASE1_OBJTYPE |
---------------------------------------------------------------
2 - access("OBJECT_TYPE"=:B1)
16. 1 - How long does my SQL take? Total
<<removed plan_hash_value&child_number just to make it fit, one child only>>
select elapsed_time, buffer_gets, executions,
trunc(elapsed_time/executions,2) elapsed_exec,
trunc(buffer_gets/executions,2) lio_exec
from gv$sql
where sql_id = 'apg0k1r43s8ak';
ELAPSED_TIME BUFFER_GETS EXECUTIONS ELAPSED_EXEC LIO_EXEC
-------------- ------------ ---------- ------------- -----------
2,446,627 48,884 11 222,420.63 4,444
Elapsed/exec ~220ms
Gets/exec ~4.4k
17. 1 - How long does my SQL take? Good run
var b1 varchar2(20)
exec :b1 := 'RULE';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
| Id |Operation |Name |A-Rows|A-Time |Buffers|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 8|00:00.01| 13|
| 1| TABLE ACCESS BY INDEX ROWID B|T_CASE1 | 8|00:00.01| 13|
|* 2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 8|00:00.01| 5|
-----------------------------------------------------------------------------
Elapsed: 00:00:00.07
~10ms 13 buffer gets
18. 1 - How long does my SQL take? Bad run
var b1 varchar2(20)
exec :b1 := 'JAVA CLASS';
select * from t_case1 where object_type = :b1;
-----------------------------------------------------------------------------
|Id|Operation |Name |A-Rows|A-Time |Buffers|Reads|
-----------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 305K|00:01.80| 54926| 8009|
| 1| TABLE ACCESS BY INDEX R B|T_CASE1 | 305K|00:01.80| 54926| 8009|
|*2| INDEX RANGE SCAN |T_CASE1_OBJTYPE| 305K|00:00.63| 22737| 1300|
-----------------------------------------------------------------------------
Elapsed: 00:05:11.88
Can we look at the data from a different POV?
~1.80s 55k buffer gets
19. 1 - How long does my SQL take? ASH POV
select sample_time, sql_exec_id, sql_exec_start
from gv$active_session_history
where sql_id = 'apg0k1r43s8ak'
order by sample_time;
SAMPLE_TIME SQL_EXEC_ID SQL_EXEC_START
--------------------------- ----------- -------------------
20-AUG-17 11.39.58.754 AM 16777222 2017-08-20/11:39:46
20-AUG-17 11.40.11.761 AM
20-AUG-17 11.40.31.781 AM
20-AUG-17 11.40.40.791 AM
20-AUG-17 11.40.43.792 AM
20-AUG-17 11.40.48.799 AM
20-AUG-17 11.40.51.800 AM
20-AUG-17 11.40.57.801 AM
20-AUG-17 11.41.03.809 AM
20-AUG-17 11.41.05.811 AM
20-AUG-17 11.41.23.833 AM
20-AUG-17 11.41.35.846 AM
20-AUG-17 11.41.52.863 AM
20-AUG-17 11.41.54.864 AM
20-AUG-17 11.42.03.870 AM
20-AUG-17 11.42.08.875 AM
20-AUG-17 11.42.21.882 AM
20-AUG-17 11.42.31.891 AM
20-AUG-17 11.42.34.896 AM
Jumps in time – session not always busy during the missing sample
User experience is 5 minutes, not 2 secs
Not much we can do from the DB perspective here
20. 1 - How long does my SQL take? Summary
• Questions answered
• GV$SQL (and similar) report time spent in DB calls, not user experience
• GV$SQL (and similar) aggregates time over executions of same cursor
• ASH sampled data helps understand how DB Time is spread over clock time
• In this case showing how clock time was likely NOT spent inside the DB
• ASH data has many dimensions, can help narrow down further
• For example, all slow executions come from app server X
• Question not solved
• Why the slow execution was slow (easy this time, we provided the bind)
• Historical binds are sampled, no direct correlation with a specific execution
• Ideally pick up the value and run the SQL to reproduce
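A minimal sketch of the "many dimensions" point, assuming the in-memory ASH samples for this SQL are still available. MACHINE, PROGRAM and MODULE are standard ASH columns; which of them actually isolates the slow executions depends on the application, so treat the choice of dimensions as an assumption:

```sql
-- Sketch: break down the ASH samples for the SQL by where they came from.
-- Each in-memory ASH row is roughly one second of session activity.
select machine, program, module,
       count(*)         as samples,
       min(sample_time) as first_sample,
       max(sample_time) as last_sample
  from gv$active_session_history
 where sql_id = 'apg0k1r43s8ak'
 group by machine, program, module
 order by samples desc;
```

If one machine or module owns most of the samples, that is the place to start asking questions on the app side.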
21. 2 - How long did my SQL take? AWR
SQL ID: 8gv4bwmnp8kmq
select /*+ LEADING(A) USE_NL(B) */ count(*)
from t_case1 a, t_case1 b
where rownum <= 1e10;
select snap_id, executions_delta e_d, executions_total e_t,
end_of_fetch_count_delta eof_d,
trunc(elapsed_time_delta/1e6) et_d_s,
trunc(elapsed_time_total/1e6) et_t_s,
buffer_gets_delta bg_d, buffer_gets_total bg_t
from dba_hist_sqlstat
where sql_id = '8gv4bwmnp8kmq' order by snap_id;
SNAP_ID E_D E_T EOF_D ET_D_S ET_T_S BG_D BG_T
---------- --- --- ----- ------ ------ ----------- ------------
3341 0 1 0 187 188 77,221,193 77,693,626
3342 0 1 0 126 314 51,883,866 129,577,492
3343 0 1 0 128 442 52,887,666 182,465,158
Snapshots give no info on when the SQL started & ended
22. 2 - How long did my SQL take? AWR report
No trivial way to determine #concurrent execs.
Doable from *_TOTAL raw info
23. 2 - How long did my SQL take? Concurr Execs
[Diagram: timeline (“Time passing”) across SNAP_ID 3341, 3342, 3343, with bars for Session 1, Session 2 and Session 3]
• Exec #1, starts second and completes second
• Exec #2, starts last and completes first
• Not expensive enough to get captured
24. 2 - How long did my SQL take? ASH data
select sql_exec_id, sql_exec_start,
min(sample_time) first_sample, max(sample_time) last_sample,
max(sample_time)-sql_exec_start elapsed
from dba_hist_active_sess_history
where sql_id = '8gv4bwmnp8kmq'
group by sql_exec_id, sql_exec_start;
SQL_EXEC_ID SQL_EXEC_START FIRST_SAMPLE
----------- ------------------- ---------------------------
16777216 2017-08-20/13:04:43 20-AUG-17 01.04.52.779 PM
LAST_SAMPLE ELAPSED
-------------------------- --------------------------
20-AUG-17 01.12.32.799 PM +000000000 00:07:49.79
Only one execution, took ~8 mins
25. 2 - How long did my SQL take? Summary
• Questions answered
• AWR only captures what mattered for the snapshot
• Can miss the start / stop “slice” of info if not impacting enough within the snapshot
• Raw info (not the AWR report) allows determining the number of concurrent executions
• Can only say how many started / ended, not which one
• ASH keeps only a subset of samples, but for each exec
• With approximation, allows determining the who, when, where of each exec
• Questions not answered
• What if my execution takes very little time? Sampling is a compromise / it doesn’t matter
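The "who, when, where of each exec" can be approximated with a query along these lines. This is a sketch, not the query behind the slides: it assumes the default 1:10 disk filter ratio, so each persisted sample is counted as roughly 10 seconds of DB time.

```sql
-- Sketch: approximate per-execution DB time from historical ASH.
-- Assumption: default 1:10 filtering, so one persisted sample ~ 10 seconds.
select sql_exec_id, sql_exec_start,
       session_id, session_serial#,
       count(*) * 10    as approx_db_time_secs,
       min(sample_time) as first_sample,
       max(sample_time) as last_sample
  from dba_hist_active_sess_history
 where sql_id = '8gv4bwmnp8kmq'
 group by sql_exec_id, sql_exec_start, session_id, session_serial#
 order by sql_exec_start;
```

Short executions may have no persisted sample at all, which is the sampling compromise mentioned above.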
27. 3 – How is my PX doing?
SQL ID: frzgf5tc9cscc
select /*+ LEADING(A) PARALLEL(4) */ count(*)
from t_case1 a, t_case1 b
where a.owner = b.owner;
<< while SQL running >>
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = 'frzgf5tc9cscc';
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ----------- ---------- ---------------------
0 11,677 172 1 0
1 34,682,909 6 0 0
28. 3 – How is my PX doing? Running slow!
<< SQL still running >>
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 13,682 172 1 0
1 59,852,041 2,734 0 0
after CTRL+C (was taking too long)
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 519,951 172 1 0
1 96,314,353 13,205 0 8
Up to this point we know 8 sessions involved and aggregated stats only
30. 3 – How is my PX doing? PX Skew & ASH data
select session_id, session_serial#, program, count(*)
from gv$active_session_history
where sql_id = 'frzgf5tc9cscc'
and sql_exec_id = 16777217
group by session_id, session_serial#, program;
SESSION_ID SESSION_SERIAL# PROGRAM COUNT(*)
---------- --------------- -------------------- ----------
8 55195 oracle@oel7 (P003) 217
133 37006 oracle@oel7 (P001) 12
QC not showing nor most of the other processes, P003 top consumer
Adding new ASH cols in the SQL we can drill down, e.g. plan step where time goes
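A possible form of that drill-down, as a sketch: the same ASH filter as above, with the plan line and wait event added as dimensions. SQL_PLAN_LINE_ID, SQL_PLAN_OPERATION, SQL_PLAN_OPTIONS, SESSION_STATE and EVENT are standard ASH columns; the choice of which to add is illustrative.

```sql
-- Sketch: where does each PX process spend its samples?
-- EVENT is null while on CPU, so SESSION_STATE is used to label those rows.
select session_id, program,
       sql_plan_line_id, sql_plan_operation, sql_plan_options,
       decode(session_state, 'ON CPU', 'ON CPU', event) as activity,
       count(*) as samples
  from gv$active_session_history
 where sql_id = 'frzgf5tc9cscc'
   and sql_exec_id = 16777217
 group by session_id, program,
          sql_plan_line_id, sql_plan_operation, sql_plan_options,
          decode(session_state, 'ON CPU', 'ON CPU', event)
 order by samples desc;
```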
31. 3 – How is my PX doing? PX Skew & SQL Mon
Much of the PX info in SQL Mon is NOT COMING from ASH
32. 3 – How is my PX doing? PX Skew Summary
• Questions answered
• Presence of skewness during / after SQL execution
• Regardless of V$PQ_TQSTAT view (tricky to use)
• Needs SQL Monitor to have low level info (buffer gets, accurate time, etc)
• Questions not answered
• What causes the skewness and how to resolve it (not investigated here)
33. 4 – My PX SQL performance is unstable
SQL ID: 8nkpzgz08mdc8
select /*+ PARALLEL(4) */ count(*)
from t_case3 a, t_case3 b
where a.object_id = b.object_id;
select child_number, elapsed_time, buffer_gets, executions,
px_servers_executions
from gv$sql
where sql_id = '8nkpzgz08mdc8';
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12
35. 4 – PX SQL perf unstable – GV$SQL “history”
After 1st exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 16,630 212 1 0
1 8,496,684 98,491 0 8
After 2nd exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,491,757 96,975 2 0
1 8,496,684 98,491 0 8
After 3rd exec
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 3,498,326 97,015 3 0
1 13,187,086 196,073 0 12
We got lucky here
Info is accumulated, thus downgrades are very hard to spot
36. 4 – PX SQL perf unstable – ASH data
select distinct sql_exec_id, sql_exec_start,
case when px_flags is null then 'SERIAL'
else 'DoP '||trunc(px_flags/ 2097152)
end dop
from gv$active_session_history
where sql_id = '8nkpzgz08mdc8'
order by 2;
SQL_EXEC_ID SQL_EXEC_START DOP
----------- ------------------- ---------
16777216 2017-08-20/17:08:50 DoP 4
16777217 2017-08-20/17:09:16 SERIAL
16777218 2017-08-20/17:10:03 DoP 2
No need for luck, we’ve got ASH
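As a cross-check on the PX_FLAGS decoding, one could also count how many distinct sessions were sampled for each execution. This is a sketch under the assumption that every PX process was active long enough to be sampled at least once, which short executions may not satisfy:

```sql
-- Sketch: sessions sampled per execution, next to the decoded DoP.
-- The QC counts as a session too, when it gets sampled.
select sql_exec_id, sql_exec_start,
       count(distinct inst_id || ':' || session_id || ',' || session_serial#)
                                         as sampled_sessions,
       max(trunc(px_flags / 2097152))    as max_dop
  from gv$active_session_history
 where sql_id = '8nkpzgz08mdc8'
 group by sql_exec_id, sql_exec_start
 order by sql_exec_start;
```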
37. 5 – My PX SQL perf is unstable – more fun
SQL ID: gcgmgk8m8v4vm
with a as (select /*+ materialize parallel(4)*/ a.object_id, b.object_name
from t_case3 a, t_case3 b
where a.object_id = b.object_id
and rownum <= 1e6)
select count(*)
from (select /*+ parallel(4) no_merge */ c.object_name
from t_case3 c, a
where a.object_id = c.object_id
and a.object_name = c.object_name);
CHILD_NUMBER ELAPSED_TIME BUFFER_GETS EXECUTIONS PX_SERVERS_EXECUTIONS
------------ -------------- ------------ ---------- ---------------------
0 4,097,033 50,679 1 0
1 14,191,651 98,499 0 8
42. 4 & 5 PX SQL perf unstable – Summary
• Questions answered
• Ability to determine DoP during / after execution
• Regardless of V$PX_SESSION (and others) views
• Ability to determine DoP on a per DFO-tree basis
• Pretty much impossible from GV$ / AWR
• Multiple dimensions can be added to drill down into slave execs (e.g. waits)
• SQL Monitor is the only way to extract low level info per slave
• For example, buffer gets, accurate time, #rows, starts, etc
• Questions not answered
• What causes the downgrade (not investigated here)
43. Trivia – My SQL blew up TEMP
SQL ID: 0qnb575hn2mkr (FAILS) & dm53symv2vmy6 (WORKS)
select /*+ PARALLEL(4) LEADING(B A C)
USE_SWAP(c) USE_HASH(A) USE_HASH(C) FAILS|WORKS */
count(*)
from t_case3 a, t_case3 b, t_case3 c
where a.object_id = b.object_id and a.object_id = c.object_id;
ERROR at line 1:
ORA-12801: error signaled in parallel query server P000
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP
<<hint, I’m messing with the env and with you>>
45. Trivia – My SQL blew up TEMP
sql_id = 'dm53symv2vmy6' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.16.06.259 PM 84,414,464 0
20-AUG-17 07.16.07.259 PM 766,644,224 0
20-AUG-17 07.16.08.264 PM 766,644,224 0
<<…>>
20-AUG-17 07.16.24.316 PM 766,644,224 0
20-AUG-17 07.16.25.316 PM 766,644,224 0
sql_id = '0qnb575hn2mkr' group by sample_time order by 1;
SAMPLE_TIME SUM(PGA_ALLOCATED) SUM(TEMP_SPACE_ALLOCATED)
---------------------------- ------------------ -------------------------
20-AUG-17 07.17.49.463 PM 17,075,200 61,865,984
20-AUG-17 07.17.50.463 PM 40,308,736 148,897,792
<<…>>
20-AUG-17 07.17.55.467 PM 58,396,672 509,607,936
20-AUG-17 07.17.56.466 PM 58,396,672 583,008,256
Used less PGA but spilled to TEMP
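The queries on this slide are truncated. A sketch of what a query producing this kind of output might look like, guessed from the column headings (PGA_ALLOCATED and TEMP_SPACE_ALLOCATED are standard ASH columns), is:

```sql
-- Sketch (shape assumed from the output headings, not the original query):
-- total PGA and TEMP across all sessions of the SQL, per ASH sample.
select sample_time,
       sum(pga_allocated),
       sum(temp_space_allocated)
  from gv$active_session_history
 where sql_id = '0qnb575hn2mkr'
 group by sample_time
 order by 1;
```

Summing per SAMPLE_TIME matters for PX: each process reports its own allocation, so the total for the statement is the sum across the sessions sampled at that instant.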
46. Trivia – My SQL blew up TEMP – SQL Monitor
[SQL Monitor reports shown side by side: dm53symv2vmy6 and 0qnb575hn2mkr]
47. Trivia – My SQL blew up TEMP
select sql_id, child_number, optimizer_env_hash_value
from gv$sql
where sql_id in ('dm53symv2vmy6','0qnb575hn2mkr');
SQL_ID CHILD_NUMBER OPTIMIZER_ENV_HASH_VALUE
------------- ------------ ------------------------
0qnb575hn2mkr 0 3821565029
0qnb575hn2mkr 1 128879201
dm53symv2vmy6 0 3821565029
dm53symv2vmy6 1 128879201
Same CBO environment aka same CBO params
Not all _smm_* params make it into CBO env!!!
48. 6 – SQL blew up TEMP – prevention!!
SQL ID: 8d5h5p8znx8mx
select /*+ PARALLEL(4) LEADING(B A C) USE_SWAP(c)
USE_HASH(A) USE_HASH(C) */
count(*)
from t_case1 a,
t_case1 b,
t_case1 c
where a.object_id = b.object_id
and a.object_id = c.object_id;
<< not using GV$/AWR because we need to differentiate per exec >>
49. 6 – SQL blew up TEMP – history
1st run
SAMPLE_TIME SQL_EXEC_ID PGA TEMP
------------------------------- ----------- ---------- ----------
21-AUG-17 10.09.12.674 AM 16777217 105.22 61
2nd run – data is growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.15.32.182 AM 16777218 17.12 20
21-AUG-17 10.15.33.182 AM 16777218 70.12 113
3rd run – data keeps growing
------------------------------- ----------- ---------- ----------
21-AUG-17 10.16.31.259 AM 16777219 2.26 0
21-AUG-17 10.16.32.259 AM 16777219 21.69 40
21-AUG-17 10.16.33.261 AM 16777219 70.12 110
50. 6 – SQL blew up TEMP – Aggregated history
Aggregating over a few runs the trend is obvious (increasing memory usage)
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- -------------------------- ------- ----------
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
51. 6 – SQL blew up TEMP – Chart your data!
ASH info is really easy to chart
Faster to consume!
52. 6 – SQL blew up TEMP – Keep executing
One new “break the pattern” execution showed up
SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
----------- ------------------- ------------------------- ------- -----
16777216 2017-08-21/10:07:19 +000000000 00:00:03.578 47.37 82
16777217 2017-08-21/10:09:11 +000000000 00:00:01.674 105.22 61
16777218 2017-08-21/10:15:30 +000000000 00:00:03.182 70.12 113
16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
Touched less PGA / TEMP but took longer
53. 6 – SQL blew up TEMP – Drill into 1 exec
sql_id = '8d5h5p8znx8mx' and sql_exec_id = 16777222
SAMPLE_TIME SID PROGRAM EVENT
--------------------------- --- -------------------- ---------------------------------
21-AUG-17 10.26.28.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.29.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.30.481 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.31.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.32.480 AM 254 sqlplus@Mauros-iMac. enq: KO - fast object checkpoint
21-AUG-17 10.26.33.480 AM 253 oracle@oel7 (P005)
21-AUG-17 10.26.33.480 AM 362 oracle@oel7 (P006)
21-AUG-17 10.26.34.482 AM 16 oracle@oel7 (P003) direct path write temp
21-AUG-17 10.26.34.482 AM 135 oracle@oel7 (P000) direct path write temp
21-AUG-17 10.26.34.482 AM 255 oracle@oel7 (P001) direct path write temp
21-AUG-17 10.26.34.482 AM 373 oracle@oel7 (P002) direct path write temp
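Only the filter of the query survives on this slide. A sketch of a query that would give this kind of per-sample listing, with the select list inferred from the output headings, is:

```sql
-- Sketch (select list assumed from the headings above):
-- every sample of one execution, QC and PX processes included.
select sample_time, session_id as sid, program, event
  from gv$active_session_history
 where sql_id = '8d5h5p8znx8mx'
   and sql_exec_id = 16777222
 order by sample_time, session_id;
```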
55. 6 – SQL blew up TEMP – Mystery solved
More executions showed up, but they come from different sessions
SID SQL_EXEC_ID SQL_EXEC_START APPROX_ET PGA TEMP
--- ----------- ------------------- ------------------------ ------- -----
130 16777219 2017-08-21/10:16:31 +000000000 00:00:02.261 70.12 110
130 16777220 2017-08-21/10:17:21 +000000000 00:00:04.342 126.12 160
130 16777221 2017-08-21/10:18:14 +000000000 00:00:05.449 134.94 188
254 16777222 2017-08-21/10:26:27 +000000000 00:00:07.482 69.69 109
254 16777223 2017-08-21/10:34:05 +000000000 00:00:02.126 69.12 108
130 16777224 2017-08-21/10:38:01 +000000000 00:00:05.402 139.94 201
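One way to get the coordinator session next to each execution is sketched below. It extends the aggregation used for the earlier history slides and assumes that QC_SESSION_ID is populated for PX processes and that falling back to SESSION_ID identifies the coordinator's own samples:

```sql
-- Sketch: peak PGA / TEMP (MB) per execution, keyed by the coordinator session.
-- Assumption: nvl(qc_session_id, session_id) resolves to the QC for all rows.
select sid, sql_exec_id, sql_exec_start,
       max(sample_time) - sql_exec_start as approx_et,
       max(pga) as pga, max(temp) as temp
  from (select nvl(qc_session_id, session_id) as sid,
               sql_exec_id, sql_exec_start, sample_time,
               trunc(sum(pga_allocated)/1024/1024, 2)        as pga,
               trunc(sum(temp_space_allocated)/1024/1024, 2) as temp
          from gv$active_session_history
         where sql_id = '8d5h5p8znx8mx'
         group by nvl(qc_session_id, session_id),
                  sql_exec_id, sql_exec_start, sample_time)
 group by sid, sql_exec_id, sql_exec_start
 order by sql_exec_start;
```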
56. 6 – SQL blew up TEMP – Why not AWR?
select child_number, executions, px_servers_executions, elapsed_time,
direct_writes,
elapsed_time/nvl(nullif(px_servers_executions,0),executions) et_exec,
direct_writes/nvl(nullif(px_servers_executions,0),executions)
direct_writes_exec
from gv$sql
where sql_id = '8d5h5p8znx8mx';
CHILD_NUMBER EXECS PX_EXECS ELAP_TIME DIRECT_W ET_EXEC DIRECT_W_EXEC
------------ ----- -------- ---------- -------- ------------ -------------
0 9 0 7,803,972 0 867,108 0
1 0 71 91,106,462 154,559 1,283,189.61 2,176.88
You might be able to figure it out from GV$ but need a lot of imagination and luck
57. 6 – SQL blew up TEMP – Summary
• Questions answered
• Ability to monitor spill on a per-execution and per-session basis
• AWR would only show aggregated info
• Similar info available for IOPS and IO bytes (and memory scan in V$ASH)
• Charting info allows easy monitoring
• Large amount of info consumed quickly
• SQL Monitor relies on same ASH info
• Even without SQL Mon, tons of info can be extracted from ASH
58. 7 – Making sense of “strange” executions
SQL ID: 06pbgg9w0bmgp
select /*+ mauro */ a.*
from t_case1 a, t_case1 b
where a.owner = l_owner
and a.object_id = b.object_id
and burn_cpu(a.object_id/b.object_id) = 1
select child_number, executions, end_of_fetch_count,
elapsed_time, fetches, rows_processed
from gv$sql
where sql_id = '06pbgg9w0bmgp';
CHILD EXECS EOF_COUNT ELAPSED_TIME FETCHES ROWS_PROCESSED
----- ----- --------- ------------ -------- --------------
0 3 0 30,419,821 6 30
This is a single session executing the SQL
Why did none reach EOF?
60. 7 – Making sense of “strange” executions - Summary
• Questions answered
• ASH data can be used to “slice” GV$ data and make more sense out of it
• In this specific case maybe not a cursor leak
• Since the cursor is used multiple times
• Same approach could be used to potentially spot a cursor leak
• Would require the SQL to take “long” enough to spot it
• Question not answered
• Why would somebody do anything like this
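A sketch of the "slicing" idea: GV$SQL only shows 3 executions and 0 end-of-fetch in total, while ASH can lay the individual executions out in time per session. The query below is illustrative and only sees executions long enough to be sampled:

```sql
-- Sketch: one row per execution per session, with its sampled time window.
-- Overlapping windows within one session would hint at cursors left open.
select session_id, session_serial#,
       sql_exec_id, sql_exec_start,
       min(sample_time) as first_sample,
       max(sample_time) as last_sample,
       count(*)         as samples
  from gv$active_session_history
 where sql_id = '06pbgg9w0bmgp'
 group by session_id, session_serial#, sql_exec_id, sql_exec_start
 order by sql_exec_start;
```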
61. Something worth knowing
• ASH data uses default values until the real value is “ready to consume”
• Adaptive Plans can take a while to resolve, and until then PHV is 0
select /*+ LEADING(a) */ count(a.object_id)
from (select /*+ no_merge leading (a) */ 1 object_id, 'a' owner
from (select rownum from dual connect by rownum <= 1000) a,
(select rownum from dual connect by rownum <= 1000) b) a,
(select a.object_id
from t1 a, t2 b
where a.object_id = b.n1
and a.data_object_id = 1
and a.owner = 'SYS') b
where a.object_id = b.object_id
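A sketch of how the PHV 0 samples could be spotted in ASH. The SQL_ID of the statement above is not given in the slides, so `&sql_id` is a SQL*Plus substitution placeholder to fill in:

```sql
-- Sketch: samples per plan hash value for each execution.
-- Rows with sql_plan_hash_value = 0 would be the samples taken before
-- the adaptive plan resolved. &sql_id is a placeholder, not from the slides.
select sql_exec_id, sql_plan_hash_value,
       min(sample_time) as first_sample,
       max(sample_time) as last_sample,
       count(*)         as samples
  from gv$active_session_history
 where sql_id = '&sql_id'
 group by sql_exec_id, sql_plan_hash_value
 order by sql_exec_id, first_sample;
```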
64. Things we just can’t do (as of now)
• Current diagnostics are very comprehensive
• They allow answering many questions around SQL execution
• Still some questions remain unanswered, for example:
• SQL Plan Baseline / SQL Patch used or not in the past (AWR limitation)
• High Version Count in the past (AWR “limitation”)
• Details of “old” CBO environment (encoded, no public API)
• Historical binds for slow execution (unless captured, requires luck)
• Changes in NLS environment in the past (current, V$SQL_SHARED_CURSOR)
• Probably not a big problem, unless you hit it
65. Summary
• Oracle diagnostics rock when used properly
• No single source of info, they need combining to get the full picture
• ASH provides a different point of view into SQL execution
• Needed more often than expected
• Regardless of the source, visualizing things makes it easier
• But this is Enkitec so you are stuck with me & SQL*Plus
• SQL Monitoring fills some of the gaps
• Still, much of the info comes from ASH, available even historically (more than SQLMon)
• Statspack + free ASH can provide useful info
• Unfortunately not as comprehensive as the “real” ones
67. Contact Information
• Blog: http://mauro-pagano.com
• Free tools
• SQLd360
• TUNAs360
• Pathfinder
• An “interesting” post every N posts
• Email: mauro.pagano@gmail.com
• Twitter: @mautro
Editor's notes
Mention that ASH could
Parameter controlling ratio is _ash_disk_filter_ratio
##########
INSERT INTO wrh$_sqlstat ()SELECT ... FROM X$KEWRSQLIDTAB sie, X$KGLCURSOR_CHILD_SQLIDPH SQL WHERE sie.sqlid_kewrsie = SQL.kglobt03 AND nlssort(sie.sqlid_kewrsie,'nls_sort = binary') = nlssort(SQL.kglobt03,'nls_sort = binary')
##########
INSERT INTO WRH$_ACTIVE_SESSION_HISTORY ( ) (SELECT /*+ PARAM('_module_action_old_length',0) */: FROM x$ash a, (SELECT h.sample_addr, h.sample_id FROM x$kewash h WHERE ((h.sample_id >= :begin_flushing) and (h.sample_id < :latest_sample_id)) and (nlssort(h.is_awr_sample,'nls_sort=BINARY') = nlssort('Y', 'nls_sort=BINARY'))) shdr WHERE (1 = 1) and shdr.sample_addr = a.sample_addr and shdr.sample_id = a.sample_id and nlssort(a.need_awr_sample, 'nls_sort=BINARY') = nlssort('Y', 'nls_sort=BINARY'))
Ask where people would go to check how long does the SQL take (forget about AWR for now) -> GV$SQL would be a good answer
#######
To run it in SQL*Plus
var b1 varchar2(20)
exec :b1 := '…';
Why was elapsed 5 minutes?
Is this a case a user would complain about? Probably yes (5 min is a lot compared to 70ms)
Which of the two is representative of the user experience? GV$SQL or SQL*Plus time recorded?
Is this problem strictly a DB problem? It takes only 1.8s inside the DB vs 5 mins outside
Many things we could question from the app perspective
Can we find start / end from anywhere in the DB?
This situation is not possible because “executions_total” two slides ago says 1
Questions:
How do you check the DoP the SQL is executing at? V$SESSION would be a good answer (another is V$PX_SESSION)
What if you want to know metrics per session?
Mention that if OWNER had a histogram in 12c we would have used PX SEND HYBRID HASH (SKEW)
What’s the avg elapsed time? But we know it’s no good because the claim is unstable perf and so avg is bad
What’s the DoP for my SQL?
We got really lucky here to have the list of executions
1st exec all normal
2nd exec strange because #exec bumped but not px_servers
3rd exec strange too because px execs bumped by 4 instead of 8
DBA_HIST_ACTIVE_SESS_HISTORY would have been the same
Nothing strange here, just the SQL text might suggest something
Next is looking on plan
We don’t compare here
select sample_time, program, sql_plan_line_id,
case when px_flags is null then 'SERIAL'
else 'DoP '||trunc(px_flags/ 2097152)
end dop
from gv$active_session_history
where sql_exec_id = 16777216
and sql_id = 'gcgmgk8m8v4vm'
and sql_exec_start = to_date('20170820175012','yyyymmddhh24miss')
order by 1;
This one is just to introduce the new topic
_smm_px_max_size
Background is critical SQL blew up TEMP and you want to make sure it doesn’t happen again, can you?
History of some executions
Aggregating by executions
select sql_exec_id, sql_exec_start,
max(sample_time)-sql_exec_start approx_et,
max(pga) pga, max(temp) temp
from (select sql_exec_id, sql_exec_start, sample_time,
trunc(sum(pga_allocated)/1024/1024,2) pga,
trunc(sum(temp_space_allocated)/1024/1024,2) temp
from gv$active_session_history
where sql_id = '8d5h5p8znx8mx'
and sample_time >= sysdate-20/1440
group by sql_exec_id, sql_exec_start, sample_time)
group by sql_exec_id, sql_exec_start
order by 1,2
What can we do here to investigate what happened?
QC waiting on enq KO, what might that be?
There are two possibilities, this is a single session or multiple sessions.
Multiple sessions would be trivial
Single session gets interesting <- could even be cursor leaking
Bug 26573174
SQL is just written to waste time and adapt later on, no real meaning
The adaptive part is towards the end of the plan (steps 18-21)