Database Resource Manager (DBRM) is a crucial feature for managing database load, yet real-world practice shows it is not often used. This talk aims to change that and demystify the feature by explaining in detail how it works in different scenarios, the CPU math behind it, and how to measure it in real time using Python and SQL, and by exploring its more complex features to understand its behaviour. Special attention is paid to its internals wherever possible.
3. Agenda
What are we going to talk about?
Luís Marques - @drune - http://lcmarques.com
4. About Database Resource Manager, with a lot of questions, charts, arrows, screenshots and a Python script
5. Hand Raising
Is there a simple picture that summarizes Resource Manager CPU scheduling?
6. Before Database Resource Manager
[Diagram: user processes (P#n) and background processes (PMON, LGWR, SMON, DBWR) all sit together on the OS run-queue, from which the OS scheduler feeds CPU #1 and CPU #2]
• quantum defined by the OS
• Priority can be changed by the OS
• All Oracle user sessions have the same priority to be selected for CPU
7. After Database Resource Manager
[Diagram: sessions (S#n) wait for selection in a DBRM internal queue, which is priority-aware according to the DBRM plan; only the selected processes are placed on the OS run-queue, where the OS scheduler decides between them before putting them on CPU #1 and CPU #2. Background processes such as PMON and LGWR are still visible to the OS scheduler]
8. More about the DBRM scheduler…
• The DBRM scheduler is not database-workload agnostic
• Priority-based round-robin algorithm
• Fixed quantum time slice of 100 ms given to each process
(_dbrm_quantum)
• More intelligent scheduling:
• Aware of Oracle internal structures (e.g. mutexes, latching)
• Has code to avoid problems like priority inversion
• No CPU starvation of critical background processes
• 2 background processes: VKRM and DBRM
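The priority-aware round-robin scheme described above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Oracle's actual implementation: each priority class gets a weight (its share of quanta per cycle), and processes rotate round-robin within their class, so low-priority work still runs instead of starving.

```python
from collections import deque

QUANTUM_MS = 100  # cf. the _dbrm_quantum default mentioned above

def schedule(queues, weights, cycles):
    """Toy priority-based round robin (not Oracle's code).
    queues:  {priority: deque of process names}, lower number = higher priority
    weights: {priority: quanta handed out per cycle} - higher weight, more CPU
    Returns the order in which 100 ms quanta are granted."""
    timeline = []
    for _ in range(cycles):
        for prio in sorted(queues):
            for _ in range(weights[prio]):
                q = queues[prio]
                if q:
                    proc = q.popleft()
                    timeline.append(proc)
                    q.append(proc)  # rotate: round robin within the class
    return timeline

# Two high-priority processes and one low-priority process:
order = schedule({1: deque(["A", "B"]), 2: deque(["C"])},
                 weights={1: 2, 2: 1}, cycles=2)
print(order)  # ['A', 'B', 'C', 'A', 'B', 'C'] - C still runs, no starvation
```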
9. Hand Raising
Interesting! How do you prove that there are internal queues, and how are the processes in them chosen to go on CPU?
10. DBRM – Scheduling (VKRM)
• If a process must yield, the VKRM background process determines the next process to be placed on the OS run-queue
• perf Linux profiler output:
kgskrunnext – the function responsible for picking the next process for the OS run-queue?
11. DBRM – Scheduling (VKRM)
• Suspending VKRM will leave all your sessions waiting eternally for CPU.
• SQL> ORADEBUG SETOSPID 16568
Oracle pid: 10, Unix process pid: 16568, image: oracle@baco (VKRM)
• SQL> ORADEBUG SUSPEND
[Chart: after ORADEBUG SUSPEND, 100% of the time is spent on resmgr: cpu quantum; ORADEBUG RESUME restores normal behaviour]
12. DBRM – Scheduling (CPU run-queue)
• vmstat data with DBRM disabled:
• The OS run-queue grows as the session count increases: 41 sessions at the end for 2 CPUs
As soon as sessions increase, the OS run-queue increases
13. DBRM – Scheduling (CPU run-queue)
• Oracle maintains an internal queue for DBRM:
• vmstat data with DBRM active
• Increasing the session count gradually
The OS run-queue doesn’t increase, even with 41 sessions and 2 CPUs
14. Hand Raising
Nice theory, but…
I have a database with several schemas with different priorities.
How do I handle Resource Management?
15. presman – DBRM monitor script
• DBRM monitoring tool written in Python 2.x and cx_Oracle
• Runs on Windows, Linux and OSX
• Usage: ./presman.py -m measure -o filename -c column_id -p
• Available measures: CPU, SESSION_IO, PARALLEL, EMPHASIS
• Download: http://lcmarques.com/presman-dbrm-monitor/
• Available on github: https://github.com/lcmarques/presman
17. Hand Raising
Hmm… but the sum of all allocations over all levels is way over 100%?
How do I know the minimum CPU allocated per consumer group?
18. Emphasis – The Minimum CPU formula
• Minimum CPU for all DBRM-managed sessions, not a host-level minimum CPU allocation
• Minimum % of CPU for consumer group “n” is the value specified in plan directive mgmt_pn, discounted by what the levels above it already claimed:

Min%(group at level n) = mgmt_pn × Π (k = 1 … n−1) [1 − Σ mgmt_pk]

where Σ mgmt_pk is the sum of the mgmt_p allocations at level k, and Π is the product of the sequence.
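The formula can be checked with a short Python helper (the function name is mine; the values are the example plan from the next slide):

```python
def emphasis_minimum(level_allocs, level, mgmt_p):
    """Minimum CPU fraction for a group with allocation mgmt_p at `level`.
    level_allocs: per-level lists of mgmt_p fractions,
    e.g. [[0.65], [0.50, 0.40], [1.00]] (hypothetical layout)."""
    remaining = 1.0
    for k in range(level - 1):                  # levels above us claim CPU first
        remaining *= 1.0 - sum(level_allocs[k])
    return mgmt_p * remaining

plan = [[0.65], [0.50, 0.40], [1.00]]  # L1: RISK; L2: RSK_REPORT, ADHOC; L3: OTHER_GROUPS
print(round(100 * emphasis_minimum(plan, 1, 0.65), 2))  # 65.0  (RISK)
print(round(100 * emphasis_minimum(plan, 2, 0.50), 2))  # 17.5  (RSK_REPORT)
print(round(100 * emphasis_minimum(plan, 2, 0.40), 2))  # 14.0  (ADHOC)
print(round(100 * emphasis_minimum(plan, 3, 1.00), 2))  # 3.5   (OTHER_GROUPS)
```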
19. Emphasis – The Minimum CPU formula

Consumer Group   mgmt_p1  mgmt_p2  mgmt_p3  Maximum CPU  Minimum CPU
RISK             65%      -        -        100%         65%
RSK_REPORT       -        50%      -        100%         17.5%
ADHOC            -        40%      -        60%          14%
OTHER_GROUPS     -        -        100%     100%         3.5%
20. Hand Raising
Great stuff! Let’s go and test the Resource Manager plan, OK?
21. Test #1 – UTILIZATION_LIMIT
• ADHOC consumer group with UTILIZATION_LIMIT = 60%
• CPU burner: burn_cpu_adhoc.sql
• UTILIZATION_LIMIT is not a host CPU limit!
• UTILIZATION_LIMIT applies to the Oracle user sessions managed by DBRM
[vmstat: us ~66%, sys ~7%]
22. Hand Raising
Hey, hey, so how do I measure it easily?
23. Test #1 – UTILIZATION_LIMIT
• Use v$rsrcmgrmetric and v$osstat and do some math:
(cpu_consumed_time_sec / (60 * cpu_count)) * 100
• $ presman.py -m cpu -o oracle_cpu.csv -c 7 -p
[Chart: Oracle CPU in % by consumer group]
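The math above can be wrapped in a small function. The function name is mine; the inputs come from v$rsrcmgrmetric (cpu_consumed_time, reported per one-minute interval) and the CPU count from v$osstat:

```python
def oracle_cpu_pct(cpu_consumed_time_sec, cpu_count, interval_sec=60):
    """% of total host CPU a consumer group used during the sample interval.
    v$rsrcmgrmetric samples cover 60 seconds, hence the default interval."""
    return cpu_consumed_time_sec / (interval_sec * cpu_count) * 100

# A group that consumed 72 CPU seconds in one minute on a 2-CPU host:
print(oracle_cpu_pct(72, 2))  # 60.0
```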
24. Hand Raising
That is easy!
How do I test my plan’s CPU allocation?
25. Test #2 – Oracle CPU Consumption
• Step 0 – Start presman to measure CPU by consumer group
• $ presman.py -m cpu -o oracle_cpu.csv -c 5
• Step 1 – Fire up 3 sessions in the ADHOC consumer group
• Almost 100% of the CPU across all consumer groups is used by ADHOC
26. Test #2 – Oracle CPU Consumption
• Step 2 – Fire up 10 sessions in consumer group RISK
• RISK has a lot more sessions and a higher priority
• No UTILIZATION_LIMIT directive on the RISK consumer group
• ADHOC consumer group CPU drops to almost 20% of all consumer group CPU activity
27. Test #2 – Oracle CPU Consumption
• Step 3 – Fire up 5 sessions in consumer group RSK_REPORT
• ADHOC queries got canceled due to the CANCEL_SQL directive
• RISK and RSK_REPORT are consuming almost every CPU cycle.
28. Test #2 – Oracle CPU Consumption
• Step 4 – Fire up 3 sessions in consumer group ADHOC
• Real-world test vs plan directive CPU allocation

Consumer Group   Minimum CPU  Test Minimum CPU  Sessions
RISK             65%          66.74%            10
RSK_REPORT       17.5%        18.23%            5
ADHOC            14%          14.81%            3 + 3
OTHER_GROUPS     3.5%         0.22%             No sessions
30. Hand Raising
Clarified!
With so many sessions for a 4-CPU database, you surely have throttling, right?
My hand hurts…
31. Test #3 – Throttling by Wait Event
• Throttling by Resource Manager can be monitored through the wait event resmgr: cpu quantum (wait class Scheduler)
• Without Resource Manager, the time spent in “resmgr: cpu quantum” would instead be spent waiting on the operating system run-queue
• In an 11g AWR report, the indication of high waits on the run-queue comes from the server load numbers
• 12c AWR has more information on CPU wait
• resmgr: cpu quantum doesn’t necessarily mean you have an overloaded CPU (e.g. a UTILIZATION_LIMIT directive)
32. Test #3 – Throttling by Wait Event
• SQL> alter system set resource_manager_plan='';
• CPU available = 4 CPUs x 10.04 min x 60 = 2409.6 sec
• Consumed CPU = 2053.9 sec (85%)
• % of CPU wait = 99.79% − 42.7% = 57.09% of DB Time spent on the OS run-queue
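The slide's arithmetic, restated as code. This is just a sketch of the bookkeeping; the 10.04 minutes is the snapshot elapsed time from the slide, and the function name is mine:

```python
def cpu_available_sec(cpu_count, elapsed_min):
    # Total CPU seconds the host can provide during the snapshot window
    return cpu_count * elapsed_min * 60

avail = cpu_available_sec(4, 10.04)      # 2409.6 CPU seconds available
consumed = 2053.9
print(round(consumed / avail * 100, 1))  # ~85.2% of the CPU budget consumed

# Share of DB Time spent waiting on the OS run-queue (from the slide's figures)
run_queue_pct = 99.79 - 42.7
print(round(run_queue_pct, 2))           # 57.09
```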
33. Test #3 – Throttling by Wait Event
• SQL> alter system set resource_manager_plan='DBRM_PLAN';
• CPU available = 4 CPUs x 9.03 min x 60 = 2167.2 sec
• Consumed CPU = 1820.9 sec (84%)
• 63% of DB Time is spent waiting in the Resource Manager internal queue
• % of CPU wait = 36.64% − 28.1% = only 8.54% of DB Time spent on the OS run-queue
34. Hand Raising
Good! I’ve read that we can also handle parallel execution.
Handling all the parallel servers seems hard to me!
35. The DW for reporting – Plan #2

Consumer Group    RATIO  PARALLEL_DEGREE_LIMIT  SWITCH_TIME  SWITCH_GROUP     PARALLEL_SERVER_LIMIT  PARALLEL_QUEUE_TIMEOUT
OTHER_GROUPS      10     0                      120 sec      SHORT_REPORTING  -                      -
SHORT_REPORTING   5      -                      900 sec      LONG_REPORTING   50%                    -
LONG_REPORTING    1      -                      -            -                50%                    3600 sec

• RATIO was used on create_plan()
• Priority statements in OTHER_GROUPS have to execute serially
• To limit the parallel servers used by a consumer group, use the parallel_server_limit directive
36. Hand Raising
Hey hey...WAIT! Now you used plan directives with a
thing called RATIO or SHARE! What is that?
37. Ratio – The Minimum CPU formula
• Minimum % of CPU for consumer group “n” = the value specified in plan directive mgmt_pn, divided by the sum of all ratios

Consumer Group    Ratio (mgmt_p1)  Ratio as Emphasis
OTHER_GROUPS      10               10 / 16 = 62.5%
SHORT_REPORTING   5                5 / 16 = 31.25%
LONG_REPORTING    1                1 / 16 = 6.25%
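With RATIO the math is much simpler than with EMPHASIS; a one-function sketch (the function name is mine, the plan values are the slide's):

```python
def ratio_to_pct(ratios):
    """Convert single-level RATIO directives into the equivalent EMPHASIS %."""
    total = sum(ratios.values())  # 10 + 5 + 1 = 16 in this plan
    return {group: r / total * 100 for group, r in ratios.items()}

pcts = ratio_to_pct({"OTHER_GROUPS": 10, "SHORT_REPORTING": 5, "LONG_REPORTING": 1})
print(pcts)  # {'OTHER_GROUPS': 62.5, 'SHORT_REPORTING': 31.25, 'LONG_REPORTING': 6.25}
```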
38. Hand Raising
Can you go forward with the plan testing? I’m interested in the parallel details!
40. Test #1 – PARALLEL_DEGREE_LIMIT without Auto DOP
• Generating a PARALLEL plan when execution ends up serial is more expensive
• A large difference between the DOP assumed at optimization time (hard parse time) and the actual DOP at execution time might lead to suboptimal execution plans
41. Test #1 – PARALLEL_DEGREE_LIMIT with Auto DOP
• Auto DOP is enabled via parallel_degree_policy = AUTO (or ADAPTIVE in 12c)
• Only the new Auto DOP code path negotiates with DBRM
• alter session set "_px_trace"="high",all;
• $ burn_me.sh (1 session)
42. Test #2 – PARALLEL_SERVER_LIMIT
• The PARALLEL_SERVER_LIMIT directive is a percentage of the parameter parallel_servers_target
• Prevents a low-priority user or consumer group from grabbing all the parallel servers
• When the percentage of parallel servers is reached for a consumer group, statements are queued
• Auto DOP is enabled to enable Parallel Statement Queueing

Consumer Group    PARALLEL_SERVERS_TARGET  PARALLEL_SERVER_LIMIT
LONG_REPORTING    64                       50%
SHORT_REPORTING   64                       50%
43. Test #2 – PARALLEL_SERVER_LIMIT
• $ burn_me.sql (19 sessions) in LONG_REPORTING
• SQL> alter system set parallel_servers_target = 64;
• $ presman.py -m parallel
[16 statements running, 3 statements queued; 32 parallel servers = 50% of parallel_servers_target]
44. Hand Raising
Clear! What about giving more or less priority to my parallel statements when they are queued?
45. Test #3 – Priority of the Parallel Statement Queue
[Diagram: a statement is parsed and its Auto DOP is calculated. If there are enough parallel servers (PARALLEL_SERVER_LIMIT not reached), the statement executes in parallel right away. Otherwise it enters a FIFO statement queue per consumer group (not enough parallel servers, or the limit is reached); dequeuing priority is based on the RATIO/SHARES or EMPHASIS values of the consumer group, and once dequeued the statement executes in parallel.]
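The flow above can be modeled as per-group FIFO queues with ratio-weighted dequeuing. This is a toy model of the behaviour the slide describes, not Oracle's implementation; the class and method names are mine, and the group names and ratios come from the talk's example plan:

```python
import random
from collections import deque

RATIOS = {"OTHER_GROUPS": 10, "SHORT_REPORTING": 5, "LONG_REPORTING": 1}

class StatementQueues:
    def __init__(self, ratios):
        self.ratios = ratios
        self.queues = {g: deque() for g in ratios}  # one FIFO per consumer group

    def enqueue(self, group, stmt):
        self.queues[group].append(stmt)             # FIFO within the group

    def dequeue(self, rng=random):
        """Pick a non-empty group with probability proportional to its ratio,
        then release that group's oldest statement."""
        candidates = [g for g in self.queues if self.queues[g]]
        if not candidates:
            return None
        group = rng.choices(candidates,
                            weights=[self.ratios[g] for g in candidates])[0]
        return group, self.queues[group].popleft()

q = StatementQueues(RATIOS)
q.enqueue("SHORT_REPORTING", "stmt-1")
q.enqueue("SHORT_REPORTING", "stmt-2")
print(q.dequeue())  # ('SHORT_REPORTING', 'stmt-1') - oldest statement first
```

With statements queued in several groups, SHORT_REPORTING (ratio 5) is dequeued five times as often as LONG_REPORTING (ratio 1), which is the 5-to-1 behaviour measured later in the talk.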
46. Test #3 – Priority of the Parallel Statement Queue
• 35 sessions each for the SHORT and LONG reporting consumer groups
• $ burn_me_all_same_time.sh
• $ presman.py -m parallel -o queue_time.csv -c 4
• Step 1 – 16 statements running and 19 queued for each consumer group
47. Test #3 – Priority of the Parallel Statement Queue
• Step 2 – Dequeuing of parallel statements started
• Step 3 – Dequeuing continues as statements finish
• Step 4 – Almost every statement done; no queued statements
48. Test #3 – Priority of the Parallel Statement Queue
• SHORT_REPORTING queue time: 7,719,385 milliseconds
• LONG_REPORTING queue time: 11,375,129 milliseconds
• SHORT_REPORTING's queue time is 67.9% of LONG_REPORTING's, i.e. about 32% less
• The SHORT_REPORTING ratio is 5 to 1 against LONG_REPORTING
• SHORT_REPORTING has 5 times the probability of getting a statement dequeued compared to LONG_REPORTING.
49. Hand Raising
What if I have some critical reports that need to bypass the queue?
50. Critical Parallel Statement Queues
• Oracle 12c introduced parallel_stmt_critical on plan directives
• Allows one value: BYPASS_QUEUE
• Statements will start immediately and not wait in the queue.
• The parallel_max_servers init parameter is the hard threshold, and critical statements can run with a lower number of PX servers

dbms_resource_manager.create_plan_directive(
  plan => 'REPORTS_PLAN',
  group_or_subplan => 'CRITICAL_REPORT',
  comment => 'CRITICAL Reporting Queries',
  parallel_stmt_critical => 'BYPASS_QUEUE');
51. Q & A
I bet we don’t
have time for it
52. Want to know more?
• Dump the state of DBRM with:
• SQL> oradebug setmypid
• SQL> oradebug dump DBSCHEDULER 1
• Trace wait events with the 12c interface:
• SQL> alter session set events 'wait_event["resmgr:cpu quantum"] trace("%s\n", shortstack())';
• SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => true, binds => false, plan_stat => 'NEVER');
Speaker notes
How many of you are using DBRM?
It is underestimated because: it is very powerful, not very well understood, and poorly used.
Part 1: Theory on DBRM scheduler details, with CPU session scheduling in mind
Part 2: More practical: RM plan, testing and validating the most interesting features
A lot of images and arrows will appear during the presentation
The presentation will be driven by a guy who is constantly interrupting it and asking questions
Database Resource Manager is basically a scheduler, like the one you will find in your operating system. The difference is that it knows your workload and Oracle very well, because it lives inside it.
Priority decay can happen: if your mutex holder is eating a lot of CPU, its priority can be lowered, causing a priority inversion issue
1 – BLUE: Processes in the DBRM internal queue waiting to be selected and placed on the operating system run-queue.
This process selection is made according to your RM plan, and the VKRM background process will place the next process on the OS run-queue for selection
The operating system is then responsible for placing everything on the CPUs
2 – ORANGE: Please note that DBRM takes care of the priority of PMON (and other Oracle background processes) and tries to avoid any type of CPU starvation for it, even if that means your session must wait a little longer
Instead of waiting in the CPU run-queue, processes wait in an RM internal queue – I will prove that to you later on
A background process called VKRM is responsible for placing your next foreground session on the OS run-queue
Priority round-robin scheduling: it retains the advantage of round robin in reducing starvation, and also integrates the advantage of priority scheduling.
The quantum that RM gives to your session is by DEFAULT 100 ms, and it is basically a slice of CPU time – you will learn that you can play with it.
PMON starvation would also cause stability problems (freeing dead processes). Without DBRM, PMON has the same priority as your foreground sessions, which may be an issue
Remember that latches and mutexes are just memory structures in the SGA – the OS doesn’t have a clue about them
If the mutex holder is off the CPU, any other process that goes on CPU and wants the same mutex faces a very complex issue: it can’t get the mutex, will spin waiting for it and then sleep (not in 10.2)
In Oracle 11g the mutex getters do sleep instead of just yielding the CPU
You may ask: if the mutex holder is off the CPU, it should come back to the CPU very fast, right?
Answer: yes, if all the processes get the same priority. That’s where priority decay comes in: if your mutex holder is eating a lot of CPU, its priority can be lowered, causing a priority inversion issue
I will prove 2 different things:
- The VKRM job that places the next process on the OS run-queue
- The existence of a Resource Manager internal queue
I will not get into much detail here, but:
This is part of the output of perf on the VKRM process. perf is a Linux profiler and was run against the VKRM background process.
We are able to see some Oracle kernel functions here, but the function that pops up is kgskrunnext, and it gives us a hint about VKRM's job.
QUESTION: Let’s think: if somehow we can stop or suspend this background process, what will happen?
Test case:
1 – Oracle Database Resource Manager is throttling with UTILIZATION_LIMIT
2 – ORADEBUG SUSPEND will cause 100% resmgr: cpu quantum: no more sessions are scheduled onto the OS run-queue
3 – ORADEBUG RESUME will resume the normal behaviour
This only works if DBRM is actively throttling your session.
Test case: increase the number of sessions and watch the operating system run-queue size using vmstat
- When DBRM is not enabled, the OS run-queue size increases as soon as you start to increase the number of sessions.
- At the end the run-queue is 42 and the CPUs are totally busy with 0 idle time
Same test case:
The OS run-queue stays at the same values even while the session count increases – this shows that your sessions are not placed directly on the operating system run-queue
The internal queue holds all your sessions that are “waiting for selection”.
Here is an example of a schema consolidation plan that I will use to demonstrate some testing:
1 – switch_io_logical – number of logical I/Os for the length of the session
2 – LOG_ONLY – no action, just record the event to SQL Monitoring (12c only)
3 – ACTIVE_SESS_POOL_p1 – 5 maximum active sessions
CANCEL_SQL – if the statement runs for more than 120 seconds it will be canceled
Please note that UTILIZATION_LIMIT replaces MAX_UTILIZATION_LIMIT (in 11.2)
As of 12c, UTILIZATION_LIMIT also limits I/O if you are on Exadata, and parallel servers as a percentage of parallel_servers_target.
- Fairly easy for many of you, but I will make it complex
- If you sum the previous values over all levels you will end up with much more than 100%
EMPHASIS – for multilevel plans that use percentages – this is the default if you define a plan.
You may ask: WHAT IS THIS? Are you crazy?
The symbol is Π (capital pi), and it is basically the product of a sequence
Remember:
L1: 65%
L2: 50% and 40%
L3: 100%
Quick question – how many of you test your DBRM resource manager plans after creating them?
Way over 60% CPU utilization – difficult to tell whether Resource Manager is respecting your directive.
- As of 12c, UTILIZATION_LIMIT also limits I/O if you are on Exadata, and parallel servers as a percentage of parallel_servers_target.
A little over 60% here, but it is the best way you have to test whether Resource Manager is respecting your UTILIZATION_LIMIT directive.
That value is what Oracle thinks is being consumed per consumer group.
I will show a sequence of events: the first event is firing up 3 sessions in the ADHOC consumer group, which has a UTILIZATION_LIMIT of 60, as you can see on the screenshot
Only sessions from ADHOC are running, with a limit of 60% of Oracle CPU, consuming almost 100% of all consumer groups' CPU.
- The RISK consumer group has a 65% minimum CPU guarantee
120 seconds passed and the ADHOC sessions' queries got canceled
Every bit of CPU is now consumed by RISK and RSK_REPORT
That is how you should test your Resource Manager CPU allocation: fire up your workload and measure it.
If you are not satisfied with your results, go back to the drawing board, because defining an RM plan is an iterative process – you create it, test it, check the results, and if they are not what you expect you re-create it.
In the end you can just pick up your CSV, open Excel and make this kind of chart. It will help you a lot in visualizing what is consuming your CPU.
Values are in percentages
Explain why:
- parallel_server_limit is a percentage of the parameter parallel_servers_target
RATIO only works for plans with one level and expresses the relation between consumer groups
Expected: PX_COORDINATOR FORCED SERIAL
- Running at a very low DOP (for example DOP = 2) might actually be less efficient than running serially, because Parallel Execution comes with some (implementation) overhead that can make a parallel execution slower than its serial counterpart
SQL statements enter the database
The statement is parsed and Oracle determines the automatic DOP
If the number of active parallel servers reaches the value of PARALLEL_SERVER_LIMIT, parallel statements are queued
Statements are queued in FIFO statement queues, governed by the plan directives: mgmt_pn or ratio priority
Once everything is checked, and depending on the dequeue priority and the availability of parallel servers, the statement is allowed to run.
From 11.2.0.2 onward there is a separate FIFO queue per consumer group.