DATABASE RESOURCE MANAGER
Drill down into the most underestimated Oracle feature
Me
• Name: Luís Marques
http://lcmarques.com / @drune / lcarapinha@gmail.com
Luís Marques - @drune - http://lcmarques.com
Agenda
What are we going to talk about?
About Database Resource Manager, with a lot of questions, charts, arrows, screenshots and a Python script
Hand Raising
Is there a simple picture that summarizes Resource Manager CPU scheduling?
Before Database Resource Manager
[Diagram: processes (P#n), PMON, LGWR, SMON, DBWR and OS processes all sit in the OS run-queue in front of CPU #1 and CPU #2]
OS run-queue
• Quantum defined by the OS
• Priority can be changed by the OS
• All Oracle user sessions have the same priority to be selected for CPU
After Database Resource Manager
[Diagram: sessions (S#n) wait in the DBRM internal queue; selected sessions share the OS run-queue with PMON, LGWR and OS processes in front of CPU #1 and CPU #2]
DBRM internal queue
• Processes waiting for selection
• Priority aware, according to the DBRM plan
OS run-queue
• The OS scheduler decides between the processes in the run-queue
More about the DBRM scheduler…
• The DBRM scheduler is not database-workload agnostic
• Priority-based round-robin algorithm
• Fixed quantum (time slice) of 100 ms given to each process (_dbrm_quantum)
• More intelligent scheduling:
• Aware of Oracle internal structures (e.g. mutexes, latches)
• Has code to avoid problems like priority inversion
• No CPU starvation of critical background processes
• 2 background processes: VKRM and DBRM
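As a rough illustration of the "priority-based round robin with a fixed quantum" idea above, here is a toy sketch in Python. It is not Oracle's implementation: the job list, the numeric priorities (lower number runs first) and the function name are assumptions made for the example. Only the 100 ms quantum comes from the slide.

```python
from collections import deque

QUANTUM_MS = 100  # fixed time slice, as quoted on the slide (_dbrm_quantum)


def run_schedule(jobs, quantum_ms=QUANTUM_MS):
    """Toy priority-based round robin (hypothetical, for illustration only).

    jobs: list of (name, priority, cpu_needed_ms); a lower priority number
    is served first. Returns the order of (name, ms_run) time slices.
    """
    queues = {}
    for name, priority, needed_ms in jobs:
        queues.setdefault(priority, deque()).append((name, needed_ms))

    timeline = []
    while any(queues.values()):
        # Pick the most important non-empty queue.
        priority = min(p for p, q in queues.items() if q)
        name, needed_ms = queues[priority].popleft()
        ran = min(quantum_ms, needed_ms)
        timeline.append((name, ran))
        # A job that still needs CPU goes to the back of its own queue.
        if needed_ms > ran:
            queues[priority].append((name, needed_ms - ran))
    return timeline


if __name__ == "__main__":
    demo = [("A", 1, 250), ("B", 1, 100), ("C", 2, 50)]
    for name, ran in run_schedule(demo):
        print(name, ran)
```

A strict-priority toy like this one can starve low-priority work, which is exactly the kind of problem the slide says the real scheduler has extra code to avoid.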
Hand Raising
Interesting! How do you prove that there are internal queues, and how do the processes in them get chosen to run on CPU?
DBRM – Scheduling (VKRM)
• If a process must yield, the VKRM background process determines which process goes next onto the OS run-queue
• perf (Linux profiler) output:
[Screenshot: perf output. Annotation: kgskrunnext, the function responsible for picking the next process for the OS run-queue?]
DBRM – Scheduling (VKRM)
• Suspending VKRM leaves all your sessions waiting for CPU indefinitely.
• SQL> ORADEBUG SETOSPID 16568
Oracle pid: 10, Unix process pid: 16568, image: oracle@baco (VKRM)
• SQL> ORADEBUG SUSPEND
[Chart: between ORADEBUG SUSPEND and ORADEBUG RESUME, 100% resmgr:cpu quantum]
DBRM – Scheduling (CPU run-queue)
• vmstat data with DBRM disabled:
• The OS run-queue grows as the number of sessions increases: 41 sessions at the end, for 2 CPUs
[Chart annotation: as soon as sessions increase, the OS run-queue increases]
DBRM – Scheduling (CPU run-queue)
• Oracle maintains an internal queue for DBRM:
• vmstat data with DBRM active
• Number of sessions increased gradually
[Chart annotation: the OS run-queue doesn't increase, even with 41 sessions and 2 CPUs]
Hand Raising
Nice theory, but…
I have a database with several schemas with different priorities.
How do I handle Resource Management?
presman – DBRM monitor script
• DBRM monitoring tool written in Python 2.x and cx_Oracle
• Runs on Windows, Linux and OS X
• Usage: ./presman.py -m measure -o filename -c column_id -p
• Available measures: CPU, SESSION_IO, PARALLEL, EMPHASIS
• Download: http://lcmarques.com/presman-dbrm-monitor/
• Available on GitHub: https://github.com/lcmarques/presman
Use case: schema consolidation – Plan #1

Consumer Group   L1    L2    L3     UTILIZATION_LIMIT   SWITCH CRITERIA   SWITCH Consumer Group
RISK             65%   -     -      -                   12o Logical I/O   LOG_ONLY
RSK_REPORT       -     50%   -      -                   -                 -
ADHOC            -     40%   -      60%                 120 seconds       CANCEL_SQL
OTHER_GROUPS     -     -     100%   -                   -                 -
Hand Raising
Hmm... but the sum of all allocations across all levels is way over 100%?
How do I know the minimum CPU allocated per consumer group?
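Emphasis – The Minimum CPU formula
• This is the minimum CPU across all DBRM-managed sessions, not a minimum host CPU allocation
• Minimum CPU:
\[ \mathrm{MinCPU}_n = \mathrm{mgmt\_p}n \times \prod_{k=1}^{n-1} \frac{100 - \sum \mathrm{mgmt\_p}k}{100} \]
where:
• MinCPU_n: minimum % of CPU for consumer group "n"
• mgmt_pn: the value specified in the plan directive mgmt_pn
• the product runs over the levels above the group's own level, from mgmt_p1 up to mgmt_p(n-1)
• Σ mgmt_pk: the sum of the mgmt_p values of all consumer groups at level k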
Emphasis – The Minimum CPU formula

Consumer Group   mgmt_p1   mgmt_p2   mgmt_p3   Maximum CPU
RISK             65%       -         -         100%
RSK_REPORT       -         17,5%     -         100%
ADHOC            -         14%       -         60%
OTHER_GROUPS     -         -         3,5%      100%

(The values in the mgmt_p columns are the resulting minimum CPU per consumer group.)
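The minimum CPU figures above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name and the dictionary layout are made up for the example, and the level assignments (RISK at level 1, RSK_REPORT and ADHOC at level 2, OTHER_GROUPS at level 3) are read from the Plan #1 table.

```python
def emphasis_minimum_cpu(plan):
    """Minimum % of DBRM-managed CPU per consumer group (EMPHASIS method).

    plan: {group: (level, percent)}, with levels starting at 1.
    Each group gets its own percentage of whatever the levels above it
    left unallocated.
    """
    # Sum of the allocations made at each level.
    level_totals = {}
    for level, percent in plan.values():
        level_totals[level] = level_totals.get(level, 0.0) + percent

    minimums = {}
    for group, (level, percent) in plan.items():
        share = float(percent)
        for k in range(1, level):
            share *= (100.0 - level_totals.get(k, 0.0)) / 100.0
        minimums[group] = round(share, 2)
    return minimums


# Plan #1 from the slides (illustrative encoding).
PLAN_1 = {
    "RISK": (1, 65),
    "RSK_REPORT": (2, 50),
    "ADHOC": (2, 40),
    "OTHER_GROUPS": (3, 100),
}

if __name__ == "__main__":
    for group, pct in emphasis_minimum_cpu(PLAN_1).items():
        print(group, pct)
```

Level 1 leaves 35% unallocated, so the level 2 groups get 50% and 40% of that 35%; level 2 leaves 10% of the 35%, all of which goes to OTHER_GROUPS.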
Hand Raising
Great stuff! Let's go test the Resource Manager plan, OK?
Test#1 – UTILIZATION_LIMIT
• ADHOC consumer group with UTILIZATION_LIMIT = 60%
• CPU burner: burn_cpu_adhoc.sql
• UTILIZATION_LIMIT is not a host CPU limit!
• UTILIZATION_LIMIT applies to the Oracle user sessions managed by DBRM
[Screenshot annotation: us ~66%, sys ~7%]
Hand Raising
Hey, hey, so how do I measure it easily?
Test#1 – UTILIZATION_LIMIT
• Query v$rsrcmgrmetric and v$osstat and do some math:
(cpu_consumed_time_sec / (60 * CPU_count)) * 100
• $ presman.py -m cpu -o oracle_cpu.csv -c 7 -p
[Chart: Oracle CPU in % by consumer group]
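The "some math" above is small enough to write down as a function. A minimal sketch, assuming the consumed CPU time has already been converted to seconds and covers a 60-second sampling interval (the 60 in the slide's formula); the function and parameter names are illustrative and are not taken from presman.

```python
def oracle_cpu_percent(cpu_consumed_time_sec, cpu_count, interval_sec=60):
    """CPU used by a consumer group as a % of the CPU available in the interval.

    Implements (cpu_consumed_time_sec / (60 * CPU_count)) * 100 from the slide,
    with the interval length made explicit.
    """
    if cpu_count <= 0 or interval_sec <= 0:
        raise ValueError("cpu_count and interval_sec must be positive")
    return cpu_consumed_time_sec / (interval_sec * cpu_count) * 100.0


if __name__ == "__main__":
    # Hypothetical sample: 72 CPU-seconds consumed in one minute on 2 CPUs.
    print(round(oracle_cpu_percent(72, 2), 1))
```

With 2 CPUs there are 120 CPU-seconds available per minute, so 72 consumed seconds correspond to 60% of the CPU.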
Hand Raising
That is easy!
How do I test my plan's CPU allocation?
Test#2 – Oracle CPU Consumption
• Step 0 - Start presman to measure CPU by consumer group
• $ presman.py -m cpu -o oracle_cpu.csv -c 5
• Step 1 - Fire up 3 sessions in the ADHOC consumer group
• Almost 100% of the CPU used by all consumer groups goes to ADHOC
Test#2 – Oracle CPU Consumption
• Step 2 - Fire up 10 sessions in consumer group RISK
• RISK has a lot more sessions and a higher priority
• No UTILIZATION_LIMIT directive on the RISK consumer group
• ADHOC consumer group CPU drops to almost 20% of all consumer group CPU activity
Test#2 – Oracle CPU Consumption
• Step 3 - Fire up 5 sessions in consumer group RSK_REPORT
• ADHOC queries got canceled because of the CANCEL_SQL directive
• RISK and RSK_REPORT are consuming almost every CPU cycle.
Test#2 – Oracle CPU Consumption
• Step 4 - Fire up 3 sessions in consumer group ADHOC
• Real-world test vs. plan directives CPU allocation

Consumer Group   Minimum CPU   Test Minimum CPU   Sessions
RISK             65%           66,74%             10
RSK_REPORT       17,5%         18,23%             5
ADHOC            14%           14,81%             3 + 3
OTHER_GROUPS     3,5%          0,22%              No sessions
Test#2 – Oracle CPU Consumption
• presman historical CSV data file output_cpu.csv
Hand Raising
Clarified!
With so many sessions for a 4-CPU database, you surely have throttling, right?
My hand hurts…
Test#3 – Throttling by Wait Event
• Throttling by Resource Manager can be monitored through the wait event resmgr:cpu quantum (wait class Scheduler)
• Without Resource Manager, the time spent in "resmgr:cpu quantum" would instead be spent waiting on the operating system run queue.
• In an 11g AWR report, high waits on the run queue show up in the server load numbers
• The 12c AWR report has more information on CPU wait
• resmgr:cpu quantum doesn't necessarily mean you have an overloaded CPU (e.g. a UTILIZATION_LIMIT directive)
Test#3 – Throttling by Wait Event
• SQL> alter system set resource_manager_plan=''
• CPU available = 4 x 10.04 x 60 = 2409,6 sec
• Consumed CPU = 2053,9 (85%)
• % of CPU wait = 99.79% - 42.7% = 57,09% of DB time spent on the OS run queue
Test#3 – Throttling by Wait Event
• SQL> alter system set resource_manager_plan='DBRM_PLAN'
• CPU available = 4 x 9,03 x 60 = 2167,2 sec
• Consumed CPU = 1820,9 (84%)
• 63% of DB time is spent waiting in the Resource Manager internal queue
• % of CPU wait = 36,64% - 28,1% = only 8,54% of DB time spent on the OS run queue
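The arithmetic behind both runs (without a plan and with DBRM_PLAN) can be checked with a short Python sketch. The function names are illustrative. The slides do not name the two AWR percentages that are subtracted to get the run-queue wait, so the parameter names `total_pct` and `on_cpu_pct` are placeholders, not AWR column names.

```python
def cpu_available_sec(cpu_count, elapsed_min):
    """CPU seconds available during the snapshot interval."""
    return cpu_count * elapsed_min * 60


def consumed_pct(consumed_cpu_sec, available_sec):
    """Consumed CPU as a percentage of the CPU available."""
    return consumed_cpu_sec / available_sec * 100.0


def run_queue_wait_pct(total_pct, on_cpu_pct):
    """% of DB time spent waiting on the OS run queue (difference of the two figures)."""
    return total_pct - on_cpu_pct


if __name__ == "__main__":
    # Run 1: resource_manager_plan = '' (4 CPUs, 10.04 minutes elapsed)
    avail_1 = cpu_available_sec(4, 10.04)
    print(round(avail_1, 1), round(consumed_pct(2053.9, avail_1)),
          round(run_queue_wait_pct(99.79, 42.7), 2))

    # Run 2: resource_manager_plan = 'DBRM_PLAN' (4 CPUs, 9.03 minutes elapsed)
    avail_2 = cpu_available_sec(4, 9.03)
    print(round(avail_2, 1), round(consumed_pct(1820.9, avail_2)),
          round(run_queue_wait_pct(36.64, 28.1), 2))
```

Both runs consume about the same share of the available CPU; what changes is where the waiting happens: on the OS run queue without a plan, in the Resource Manager internal queue with one.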
Hand Raising
Good! I’ve read that we can handle parallel execution.
Handling all the parallel servers seems to be hard for
me!
The DW for reporting – Plan #2

Consumer Group    RATIO   PARALLEL_DEGREE_LIMIT   SWITCH_TIME   S_GROUP           PARALLEL_SERVER_LIMIT   PARALLEL_QUEUE_TIMEOUT
OTHER_GROUPS      10      0                       120 sec       SHORT_REPORTING   -                       -
SHORT_REPORTING   5       -                       900 sec       LONG_REPORTING    50%                     -
LONG_REPORTING    1       -                       -             -                 50%                     3600 sec

• The RATIO method was used in create_plan()
• Priority statements in OTHER_GROUPS have to execute serially
• To limit the parallel servers used by a consumer group, use the parallel_server_limit directive
Hand Raising
Hey hey...WAIT! Now you used plan directives with a
thing called RATIO or SHARE! What is that?
Ratio – The Minimum CPU formula
\[ \mathrm{MinCPU}_n = \frac{\mathrm{ratio}_n}{\sum_i \mathrm{ratio}_i} \times 100 \]
where:
• MinCPU_n: minimum % of CPU for consumer group "n"
• ratio_n: the value specified in the plan directive mgmt_pn (the mgmt_p1 ratio in the table)
• Σ ratio_i: the sum of all ratios

Consumer Group    mgmt_p1 Ratio   Ratio as Emphasis
OTHER_GROUPS      10              10 / 16 = 62,5%
SHORT_REPORTING   5               5 / 16 = 31,25%
LONG_REPORTING    1               1 / 16 = 6,25%
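The ratio-to-percentage conversion in the table can be sketched in Python as follows. The function name and the dictionary layout are assumptions for the example; the ratios are the ones from Plan #2.

```python
def ratio_minimum_cpu(ratios):
    """Convert RATIO plan values to a minimum % of CPU per consumer group.

    ratios: {group: ratio value}. Each group's minimum is its ratio
    divided by the sum of all ratios.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("the sum of all ratios must be positive")
    return {group: ratio / total * 100.0 for group, ratio in ratios.items()}


# Plan #2 ratios from the slides.
PLAN_2 = {"OTHER_GROUPS": 10, "SHORT_REPORTING": 5, "LONG_REPORTING": 1}

if __name__ == "__main__":
    for group, pct in ratio_minimum_cpu(PLAN_2).items():
        print(group, pct)
```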
Hand Raising
Can you go ahead with the plan testing? I'm interested in the parallel details!
Test#1 – PARALLEL_DEGREE_LIMIT without Auto DOP
• parallel_degree_policy = MANUAL
• OTHER_GROUPS with PARALLEL_DEGREE_LIMIT_P1 = 0 (DOP = 0)
• $ burn_me.sh (1 session)
• $ presman.py -m parallel
[Screenshot annotation: 1 parallel statement, no parallel servers]
Test#1 – PARALLEL_DEGREE_LIMIT without Auto DOP
• Generating a PARALLEL plan when execution ends up serial is more expensive
• A large difference between the DOP assumed at optimization time (hard parse time) and the actual DOP at execution time might lead to suboptimal execution plans
Test#1 – PARALLEL_DEGREE_LIMIT with Auto DOP
• Auto DOP is enabled via parallel_degree_policy = AUTO (or ADAPTIVE in 12c)
• Only the new Auto DOP code path negotiates with DBRM
• alter session set "_px_trace"="high",all;
• $ burn_me.sh (1 session)
Test#2 – PARALLEL_SERVER_LIMIT
• The PARALLEL_SERVER_LIMIT directive is a percentage of the parallel_servers_target parameter
• Prevents a low-priority user or consumer group from taking all parallel servers
• When the percentage of parallel servers is reached for a consumer group, the statement is queued
• Auto DOP is enabled in order to enable Parallel Statement Queuing

Consumer Group    PARALLEL_SERVERS_TARGET   PARALLEL_SERVER_LIMIT
LONG_REPORTING    64                        50%
SHORT_REPORTING   64                        50%
Test#2 – PARALLEL_SERVER_LIMIT
• $ burn_me.sql (19 sessions) to LONG_REPORTING
• SQL> alter system set parallel_servers_target = 64
• $ presman.py -m parallel
[Screenshot annotation: 16 statements running, 3 statements queued; 32 parallel servers = 50% of parallel_servers_target]
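The numbers in this test fit together as shown in the Python sketch below. It is only bookkeeping, not a model of how Oracle queues statements: the function names are made up, and the figure of 2 parallel servers per statement is an assumption inferred from 32 servers being shared by 16 running statements.

```python
def parallel_server_cap(parallel_servers_target, limit_pct):
    """Parallel servers a consumer group may use under PARALLEL_SERVER_LIMIT."""
    return parallel_servers_target * limit_pct // 100


def running_and_queued(statements, servers_per_statement, cap):
    """Split statements into (running, queued) once the cap is reached."""
    running = min(statements, cap // servers_per_statement)
    return running, statements - running


if __name__ == "__main__":
    cap = parallel_server_cap(64, 50)
    # Assumption: each statement uses 2 parallel servers (32 servers / 16 statements).
    print(cap, running_and_queued(19, 2, cap))
```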
Hand Raising
Clear! What about giving more or less priority to my parallel statements while they are queued?
Test#3 – Priority of the Parallel Statement Queue
[Diagram: parallel statement queuing flow]
• The statement is parsed and Auto DOP is calculated
• Enough parallel servers and PARALLEL_SERVER_LIMIT not reached: the statement executes in parallel
• Not enough parallel servers, or limit reached: the statement waits in a FIFO statement queue per consumer group
• Dequeuing priority is based on the RATIO / SHARES or EMPHASIS values of the consumer group; a dequeued statement then executes in parallel
Test#3 – Priority of the Parallel Statement Queue
• 35 sessions each for the SHORT_REPORTING and LONG_REPORTING consumer groups
• $ burn_me_all_same_time.sh
• $ presman.py -m parallel -o queue_time.csv -c 4
• Step 1 - 16 statements running and 19 queued for each consumer group
Test#3 – Priority of the Parallel Statement Queue
• Step 2 - Dequeuing of parallel statements starts
• Step 3 - Dequeuing continues as soon as some statements finish
• Step 4 - Almost every statement is done. No queued statements
Test#3 – Priority of the Parallel Statement Queue
• SHORT_REPORTING queue time: 7719385 milliseconds
• LONG_REPORTING queue time: 11375129 milliseconds
• SHORT_REPORTING queue time is about 67,9% of the LONG_REPORTING queue time (roughly 32% less)
• SHORT_REPORTING ratio is 5, versus 1 for LONG_REPORTING
• SHORT_REPORTING is 5 times more likely than LONG_REPORTING to get a statement dequeued.
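The two measured queue times can be compared with a couple of lines of Python. A minimal sketch; the function name is illustrative and the inputs are the millisecond totals quoted above.

```python
def queue_time_comparison(short_ms, long_ms):
    """Return (short as % of long, % less queue time for short)."""
    ratio_pct = short_ms / long_ms * 100.0
    return ratio_pct, 100.0 - ratio_pct


if __name__ == "__main__":
    ratio_pct, less_pct = queue_time_comparison(7719385, 11375129)
    print(round(ratio_pct, 1), round(less_pct, 1))
```

Measured this way, SHORT_REPORTING spends about two thirds of the time LONG_REPORTING spends in the queue, which is about one third less.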
Hand Raising
What if I have some critical reports that need to bypass the queue?
Critical Parallel Statement Queues
• Oracle 12c introduced parallel_stmt_critical on plan directives
• Allows one value: BYPASS_QUEUE
• Sessions will start immediately and not wait in the queue.
• The parallel_max_servers init parameter is the hard threshold, and critical statements can run with a lower number of PX servers
dbms_resource_manager.create_plan_directive(
  plan                   => 'REPORTS_PLAN',
  group_or_subplan       => 'CRITICAL_REPORT',
  comment                => 'CRITICAL Reporting Queries',
  parallel_stmt_critical => 'BYPASS_QUEUE');
Q & A
I bet we don’t
have time for it
Want to know more?
• Dump the state of DBRM with:
• SQL> oradebug setmypid
• SQL> oradebug dump DBSCHEDULER 1
• Trace wait events with the 12c interface:
• SQL> alter session set events 'wait_event["resmgr:cpu quantum"] trace("%s\n", shortstack())';
• SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => true, binds => false, plan_stat => 'NEVER');

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

Drill Down the most underestimate Oracle Feature - Database Resource Manager

  • 1. DATABASE RESOURCE MANAGER Drill-down the most underestimate Oracle feature
  • 2. Me • Name: Luís Marques http://lcmarques.com / @drune / lcarapinha@gmail.com
  • 3. Agenda: What are we going to talk about?
  • 4. About Database Resource Manager, with a lot of questions, charts, arrows, screenshots and a Python script
  • 5. Hand Raising: Is there a simple picture that summarizes Resource Manager CPU scheduling?
  • 6. Before Database Resource Manager
      [Diagram: CPU #1 and CPU #2 fed by a single OS run queue that holds Oracle processes (P#n), background processes (PMON, LGWR, SMON, DBWR) and other OS processes]
      • Quantum defined by the OS
      • Priority can be changed by the OS
      • All Oracle user sessions have the same priority to be selected for CPU
  • 7. After Database Resource Manager
      [Diagram: sessions (S#n) wait for selection in the DBRM internal queue before reaching the OS run queue, which feeds CPU #1 and CPU #2 alongside PMON, LGWR and other OS processes]
      • DBRM internal queue: priority aware according to the DBRM plan
      • OS run queue: the OS scheduler decides between the processes in the run queue
  • 8. More about the DBRM scheduler…
      • The DBRM scheduler is not database-workload agnostic
      • Priority-based round-robin algorithm
      • Fixed quantum (time slice) of 100 ms given to each process (_dbrm_quantum)
      • More intelligent scheduling:
          • Aware of Oracle internal structures (e.g. mutexes, latches)
          • Has code to avoid problems like priority inversion
          • No CPU starvation of critical background processes
      • 2 background processes: VKRM and DBRM
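To make "priority-based round robin with a fixed quantum" concrete, here is a minimal textbook-style sketch. It is not Oracle's implementation: the job tuples, the strict "lower number runs first" rule and the function name are assumptions for illustration only, and unlike DBRM this toy version does nothing to protect low-priority work from starvation. Only the 100 ms quantum comes from the slide.

```python
from collections import deque

QUANTUM_MS = 100  # value quoted on the slide for _dbrm_quantum


def priority_round_robin(jobs, quantum_ms=QUANTUM_MS):
    """Hand out fixed time slices, round robin within each priority level.

    jobs: list of (name, priority, cpu_needed_ms); a lower priority number
    is served first (hypothetical convention for this sketch).
    Returns the sequence of (name, slice_ms) in the order slices are granted.
    """
    queues = {}
    for name, priority, need in jobs:
        if need > 0:
            queues.setdefault(priority, deque()).append((name, need))

    timeline = []
    for priority in sorted(queues):
        queue = queues[priority]
        while queue:
            name, need = queue.popleft()
            used = min(quantum_ms, need)
            timeline.append((name, used))
            if need > used:
                # quantum expired: go to the back of the same priority queue
                queue.append((name, need - used))
    return timeline


slices = priority_round_robin([("S1", 1, 250), ("S2", 1, 100), ("S3", 2, 50)])
print(slices)
```

S1 and S2 alternate in 100 ms slices until both finish, and only then does S3 get the CPU.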
  • 9. Hand Raising: Interesting! How do you prove that you have internal queues, and how do the processes there get chosen to be on CPU?
  • 10. DBRM – Scheduling (VKRM)
      • If a process must yield, the VKRM background process determines which process goes next on the OS run queue
      • perf (Linux profiler) output: [screenshot]
      • kgskrunnext: the function responsible for placing the next process on the OS run queue?
  • 11. DBRM – Scheduling (VKRM)
      • Suspending VKRM leaves all your sessions waiting for CPU indefinitely
      • SQL> ORADEBUG SETOSPID 16568
        Oracle pid: 10, Unix process pid: 16568, image: oracle@baco (VKRM)
      • SQL> ORADEBUG SUSPEND
      [Chart: between ORADEBUG SUSPEND and ORADEBUG RESUME, sessions spend 100% of their time on resmgr:cpu quantum]
  • 12. DBRM – Scheduling (CPU run queue)
      • vmstat data with DBRM disabled
      • The OS run queue grows as soon as the number of sessions grows: 41 sessions at the end for 2 CPUs
  • 13. DBRM – Scheduling (CPU run queue)
      • Oracle maintains an internal queue for DBRM
      • vmstat data with DBRM active
      • The number of sessions is increased gradually
      • The OS run queue doesn't grow, even with 41 sessions and 2 CPUs
  • 14. Hand Raising: Nice theory, but… I have a database with several schemas with different priorities. How do I handle resource management?
  • 15. presman – DBRM monitor script
      • DBRM monitoring tool written in Python 2.x and cx_Oracle
      • Runs on Windows, Linux and OS X
      • Usage: ./presman.py -m measure -o filename -c column_id -p
      • Available measures: CPU, SESSION_IO, PARALLEL, EMPHASIS
      • Download: http://lcmarques.com/presman-dbrm-monitor/
      • Available on GitHub: https://github.com/lcmarques/presman
  • 16. Use case: schema consolidation – Plan #1
      Consumer group | L1  | L2  | L3   | UTILIZATION_LIMIT | Switch criteria | Switch consumer group
      RISK           | 65% |     |      |                   | 12o logical I/O | LOG_ONLY
      RSK_REPORT     |     | 50% |      |                   |                 |
      ADHOC          |     | 40% |      | 60%               | 120 seconds     | CANCEL_SQL
      OTHER_GROUPS   |     |     | 100% |                   |                 |
  • 17. Hand Raising: Hmm… but the sum of all allocations on all levels is way over 100%? How do I know the minimum CPU allocated per consumer group?
  • 18. Emphasis – The minimum CPU formula
      • This is the minimum CPU among all DBRM-managed sessions, not a minimum host CPU allocation
      • Minimum % of CPU for a consumer group whose directive sits at level n:
        \mathrm{MinCPU}_n = \mathrm{mgmt\_p}n \times \prod_{k=1}^{n-1} \left(1 - \sum \mathrm{mgmt\_p}k\right)
      • mgmt_pn is the value specified in the plan directive; the product runs over the levels above n, and each sum adds up all mgmt_pk allocations made at level k
  • 19. Emphasis – The minimum CPU formula
      Resulting minimum CPU, shown under the level of each group's directive:
      Consumer group | mgmt_p1 | mgmt_p2 | mgmt_p3 | Maximum CPU
      RISK           | 65%     |         |         | 100%
      RSK_REPORT     |         | 17.5%   |         | 100%
      ADHOC          |         | 14%     |         | 60%
      OTHER_GROUPS   |         |         | 3.5%    | 100%
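The emphasis arithmetic can be checked with a few lines of Python. This is a sketch of the formula as described on the slides, using the Plan #1 percentages; the function name and the list-of-dicts layout are illustrative choices, not part of presman or any Oracle API.

```python
def minimum_cpu(levels):
    """Return {group: minimum CPU %} for a multi-level EMPHASIS plan.

    levels: list of dicts, one per level (mgmt_p1 first), mapping a
    consumer group name to the percentage it is given at that level.
    """
    remaining = 100.0  # share of CPU not yet claimed by the levels above
    result = {}
    for level in levels:
        for group, pct in level.items():
            result[group] = remaining * pct / 100.0
        # whatever this level leaves unallocated flows down to the next one
        remaining = remaining * (100.0 - sum(level.values())) / 100.0
    return result


plan_1 = [
    {"RISK": 65},                     # level 1
    {"RSK_REPORT": 50, "ADHOC": 40},  # level 2
    {"OTHER_GROUPS": 100},            # level 3
]
minimums = minimum_cpu(plan_1)
for group, pct in minimums.items():
    print(f"{group}: {pct:.2f}%")
```

Level 2 only sees the 35% that RISK leaves behind, and level 3 only the 10% of that remainder which level 2 does not allocate, which is how the percentages can add up to more than 100% across levels.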
  • 20. Hand Raising: Great stuff! Let's go test the Resource Manager plan, OK?
  • 21. Test #1 – UTILIZATION_LIMIT
      • ADHOC consumer group with UTILIZATION_LIMIT = 60%
      • CPU burner: burn_cpu_adhoc.sql
      • UTILIZATION_LIMIT is not a host CPU limit!
      • UTILIZATION_LIMIT applies to Oracle user sessions managed by DBRM
      [Screenshot: host CPU at us ~66%, sy ~7%]
  • 22. Hand Raising: Hey, hey, so how do I measure it easily?
  • 23. Test #1 – UTILIZATION_LIMIT
      • Query v$rsrcmgrmetric and v$osstat and do some math: (cpu_consumed_time_sec / (60 * CPU_count)) * 100
      • $ presman.py -m cpu -o oracle_cpu.csv -c 7 -p
      [Chart: Oracle CPU in % by consumer group]
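The "some math" from the slide fits in a small helper. This sketch takes the inputs as plain numbers; fetching them from v$rsrcmgrmetric and v$osstat is left out, and the sample values (144 CPU seconds, 4 CPUs) are made up for illustration.

```python
def oracle_cpu_pct(cpu_consumed_time_sec, cpu_count, interval_sec=60):
    """CPU used by a consumer group as a percentage of total CPU capacity.

    Implements the slide's formula:
    (cpu_consumed_time_sec / (60 * CPU_count)) * 100
    The 60 is the length of the sampling interval in seconds.
    """
    if cpu_count <= 0 or interval_sec <= 0:
        raise ValueError("cpu_count and interval_sec must be positive")
    return cpu_consumed_time_sec / (interval_sec * cpu_count) * 100.0


# Hypothetical sample: a group burned 144 CPU seconds in one minute on 4 CPUs.
example = oracle_cpu_pct(144.0, 4)
print(f"{example:.1f}% of Oracle CPU capacity")
```

With 4 CPUs, one minute offers 240 CPU seconds, so a group capped at 60% should stay near 144 seconds per interval.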
  • 24. Hand Raising: That is easy! How do I test my plan's CPU allocation?
  • 25. Test #2 – Oracle CPU consumption
      • Step 0 – Start presman to measure CPU by consumer group
      • $ presman.py -m cpu -o oracle_cpu.csv -c 5
      • Step 1 – Fire up 3 sessions in the ADHOC consumer group
      • Almost 100% of the CPU used by all consumer groups goes to ADHOC
  • 26. Test #2 – Oracle CPU consumption
      • Step 2 – Fire up 10 sessions in the RISK consumer group
      • RISK has a lot more sessions and a higher priority
      • No UTILIZATION_LIMIT directive on the RISK consumer group
      • ADHOC consumer group CPU drops to almost 20% of all consumer group CPU activity
  • 27. Test #2 – Oracle CPU consumption
      • Step 3 – Fire up 5 sessions in the RSK_REPORT consumer group
      • ADHOC queries got canceled because of the CANCEL_SQL directive
      • RISK and RSK_REPORT are consuming almost every CPU cycle
  • 28. Test #2 – Oracle CPU consumption
      • Step 4 – Fire up 3 sessions in the ADHOC consumer group
      • Real-world test vs. plan directives CPU allocation:
      Consumer group | Minimum CPU | Test minimum CPU | Sessions
      RISK           | 65%         | 66.74%           | 10
      RSK_REPORT     | 17.5%       | 18.23%           | 5
      ADHOC          | 14%         | 14.81%           | 3 + 3
      OTHER_GROUPS   | 3.5%        | 0.22%            | No sessions
  • 29. Test #2 – Oracle CPU consumption
      • presman historical CSV data file output_cpu.csv
  • 30. Hand Raising: Clarified! With so many sessions for a 4-CPU database, you surely have throttling, right? (My hand hurts…)
  • 31. Test #3 – Throttling by wait event
      • Throttling by Resource Manager can be monitored through the wait event resmgr:cpu quantum (wait class Scheduler)
      • Without Resource Manager, the time spent in resmgr:cpu quantum would instead be spent waiting on the operating system run queue
      • In the 11g AWR report, the indication of high waits on the run queue comes from the server load numbers
      • The 12c AWR report has more information on CPU wait
      • resmgr:cpu quantum doesn't necessarily mean you have an overloaded CPU (e.g. a UTILIZATION_LIMIT directive also causes it)
  • 32. Test #3 – Throttling by wait event
      • SQL> alter system set resource_manager_plan=''
      • CPU available = 4 x 10.04 x 60 = 2409.6 sec
      • Consumed CPU = 2053.9 sec (85%)
      • % of CPU wait = 99.79% - 42.7% = 57.09% of DB time spent on the OS run queue
  • 33. Test #3 – Throttling by wait event
      • SQL> alter system set resource_manager_plan='DBRM_PLAN'
      • CPU available = 4 x 9.03 x 60 = 2167.2 sec
      • Consumed CPU = 1820.9 sec (84%)
      • 63% of DB time is spent waiting in the Resource Manager internal queue
      • % of CPU wait = 36.64% - 28.1% = only 8.54% of DB time spent on the OS run queue
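The arithmetic on the two slides above can be replayed as follows. This is a sketch: the slides do not name the two percentages that are subtracted, so the parameter names `cpu_plus_wait_pct` and `on_cpu_pct` are my reading of them (time on CPU plus waiting for CPU, versus time actually on CPU) and should be treated as an assumption.

```python
def cpu_budget(cpu_count, elapsed_min, consumed_cpu_sec):
    """Return (available CPU seconds, consumed CPU as % of available)."""
    available_sec = cpu_count * elapsed_min * 60
    return available_sec, consumed_cpu_sec / available_sec * 100.0


def os_run_queue_pct(cpu_plus_wait_pct, on_cpu_pct):
    """% of DB time spent waiting on the OS run queue (assumed meaning)."""
    return cpu_plus_wait_pct - on_cpu_pct


# Without a plan: 4 CPUs, 10.04 minutes, 2053.9 CPU seconds consumed
avail_off, used_off = cpu_budget(4, 10.04, 2053.9)
wait_off = os_run_queue_pct(99.79, 42.7)

# With DBRM_PLAN: 4 CPUs, 9.03 minutes, 1820.9 CPU seconds consumed
avail_on, used_on = cpu_budget(4, 9.03, 1820.9)
wait_on = os_run_queue_pct(36.64, 28.1)

print(f"no plan:   {avail_off:.1f}s available, {used_off:.1f}% used, {wait_off:.2f}% on OS run queue")
print(f"DBRM_PLAN: {avail_on:.1f}s available, {used_on:.1f}% used, {wait_on:.2f}% on OS run queue")
```

Both runs consume roughly the same share of the CPU; what changes is where the waiting happens, on the OS run queue without a plan or in the Resource Manager internal queue with one.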
  • 34. Hand Raising: Good! I've read that we can handle parallel execution. Handling all the parallel servers seems hard to me!
  • 35. The DW for reporting – Plan #2
      Consumer group  | RATIO | PARALLEL_DEGREE_LIMIT | SWITCH_TIME | S_GROUP         | PARALLEL_SERVER_LIMIT | PARALLEL_QUEUE_TIMEOUT
      OTHERS_GROUP    | 10    | 0                     | 120 sec     | SHORT_REPORTING |                       |
      SHORT_REPORTING | 5     |                       | 900 sec     | LONG_REPORTING  | 50%                   |
      LONG_REPORTING  | 1     |                       |             |                 | 50%                   | 3600 sec
      • RATIO was used in create_plan()
      • Priority statements in OTHERS_GROUP have to execute serially
      • To limit the parallel servers used by a consumer group, use the parallel_server_limit directive
  • 36. Hand Raising: Hey, hey... WAIT! Now you used plan directives with a thing called RATIO or SHARE! What is that?
  • 37. Ratio – The minimum CPU formula
      • Minimum % of CPU for consumer group n:
        \mathrm{MinCPU}_n = \frac{\mathrm{ratio}_n}{\sum_i \mathrm{ratio}_i}
      • ratio_n is the value specified in the plan directive (mgmt_p1); the denominator is the sum of all ratios in the plan
      Consumer group  | mgmt_p1 ratio | Ratio as emphasis
      OTHERS_GROUP    | 10            | 10 / 16 = 62.5%
      SHORT_REPORTING | 5             | 5 / 16 = 31.25%
      LONG_REPORTING  | 1             | 1 / 16 = 6.25%
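The ratio-to-percentage conversion in the table is a one-liner in Python. The sketch below uses the Plan #2 ratios from the slide; the function name is illustrative.

```python
def ratio_to_emphasis(ratios):
    """Convert RATIO shares into the equivalent minimum CPU percentages.

    ratios: dict mapping consumer group name -> ratio value (mgmt_p1).
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("at least one ratio must be positive")
    return {group: ratio / total * 100.0 for group, ratio in ratios.items()}


plan_2 = {"OTHERS_GROUP": 10, "SHORT_REPORTING": 5, "LONG_REPORTING": 1}
shares = ratio_to_emphasis(plan_2)
for group, pct in shares.items():
    print(f"{group}: {pct:.2f}%")
```

Because every group is divided by the same total, the shares always add up to 100%, which is why RATIO is easier to reason about than multi-level emphasis.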
  • 38. Hand Raising: Can you go forward with the plan testing? I'm interested in the parallel details!
  • 39. Test #1 – PARALLEL_DEGREE_LIMIT without Auto DOP
      • parallel_degree_policy = MANUAL
      • OTHERS_GROUP with PARALLEL_DEGREE_LIMIT_P1 = 0 (DOP = 0)
      • $ burn_me.sh (1 session)
      • $ presman.py -m parallel
      [Screenshot: 1 parallel statement, no parallel servers]
  • 40. Test #1 – PARALLEL_DEGREE_LIMIT without Auto DOP
      • Generating a parallel plan when the execution ends up serial is more expensive
      • A large difference between the DOP assumed at optimization time (hard parse) and the actual DOP at execution time might lead to suboptimal execution plans
  • 41. Test #1 – PARALLEL_DEGREE_LIMIT with Auto DOP
      • Auto DOP is enabled via parallel_degree_policy = AUTO (or ADAPTIVE in 12c)
      • Only the new Auto DOP code path negotiates with DBRM
      • alter session set "_px_trace"="high",all;
      • $ burn_me.sh (1 session)
  • 42. Test #2 – PARALLEL_SERVER_LIMIT
      • The PARALLEL_SERVER_LIMIT directive is a percentage of the parallel_servers_target parameter
      • It prevents a low-priority user or consumer group from taking all the parallel servers
      • When a consumer group reaches its percentage of parallel servers, further statements are queued
      • Auto DOP must be enabled to get parallel statement queuing
      Consumer group  | PARALLEL_SERVERS_TARGET | PARALLEL_SERVER_LIMIT
      LONG_REPORTING  | 64                      | 50%
      SHORT_REPORTING | 64                      | 50%
  • 43. Test #2 – PARALLEL_SERVER_LIMIT
      • $ burn_me.sql (19 sessions) in LONG_REPORTING
      • SQL> alter system set parallel_servers_target = 64
      • $ presman.py -m parallel
      [Screenshot: 16 statements running, 3 statements queued; 32 parallel servers = 50% of parallel_servers_target]
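The numbers in this test can be reproduced with a small model. It is a deliberately simplified sketch, not how Oracle decides queuing internally: it assumes every statement uses the same number of parallel servers, and the value of 2 servers per statement is inferred from the slide (32 servers for 16 running statements), not stated by it.

```python
def parallel_queueing(statements, parallel_servers_target,
                      parallel_server_limit_pct, servers_per_statement):
    """Estimate running vs. queued statements for one consumer group."""
    # PARALLEL_SERVER_LIMIT is a percentage of parallel_servers_target
    server_limit = parallel_servers_target * parallel_server_limit_pct // 100
    running = min(statements, server_limit // servers_per_statement)
    return {
        "server_limit": server_limit,
        "running": running,
        "queued": statements - running,
        "servers_in_use": running * servers_per_statement,
    }


# LONG_REPORTING: 19 sessions, target 64, limit 50%, 2 servers each (assumed)
outcome = parallel_queueing(19, 64, 50, 2)
print(outcome)
```

Raising parallel_servers_target or the group's PARALLEL_SERVER_LIMIT in this model is the only way to shrink the queue without lowering the degree of parallelism.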
  • 44. Hand Raising: Clear! What about giving more or less priority to my parallel statements when they are queued?
  • 45. Test #3 – Priority of the parallel statement queue
      [Diagram: flow of SQL statements through the parallel statement queue]
      • The statement is parsed and the Auto DOP is calculated
      • Enough parallel servers and PARALLEL_SERVER_LIMIT not reached: the statement executes in parallel
      • Not enough parallel servers or limit reached: the statement goes into the FIFO statement queue of its consumer group
      • Dequeuing priority is based on the RATIO / SHARES or EMPHASIS values of the consumer group; once dequeued, the statement executes in parallel
  • 46. Test #3 – Priority of the parallel statement queue
      • 35 sessions each for the SHORT_REPORTING and LONG_REPORTING consumer groups
      • $ burn_me_all_same_time.sh
      • $ presman.py -m parallel -o queue_time.csv -c 4
      • Step 1 – 16 statements running and 19 queued for each consumer group
  • 47. Test #3 – Priority of the parallel statement queue
      • Step 2 – Dequeuing of parallel statements starts
      • Step 3 – Dequeuing continues as soon as some statements finish
      • Step 4 – Almost every statement is done; no queued statements
  • 48. Test #3 – Priority of the parallel statement queue
      • SHORT_REPORTING queue time: 7719385 milliseconds
      • LONG_REPORTING queue time: 11375129 milliseconds
      • SHORT_REPORTING queue time is about 67.9% of LONG_REPORTING's, i.e. about 32% less
      • SHORT_REPORTING ratio is 5 against 1 for LONG_REPORTING
      • SHORT_REPORTING is 5 times more likely than LONG_REPORTING to get a statement dequeued
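The comparison of the two queue times is easy to get backwards, so here is the calculation spelled out. The two millisecond totals are the ones from the slide; the function name is illustrative.

```python
def queue_time_comparison(short_ms, long_ms):
    """Return (short as % of long, % reduction of short relative to long)."""
    share = short_ms / long_ms * 100.0
    return share, 100.0 - share


SHORT_REPORTING_MS = 7719385
LONG_REPORTING_MS = 11375129

share, reduction = queue_time_comparison(SHORT_REPORTING_MS, LONG_REPORTING_MS)
print(f"SHORT_REPORTING queued {share:.1f}% as long as LONG_REPORTING "
      f"({reduction:.1f}% less)")
```

The gap is much smaller than the 5:1 ratio might suggest, because the ratio only weights which queue is picked at each dequeue, while both groups still wait for parallel servers to free up.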
  • 49. Hand Raising: What if I have some critical reports that need to bypass the queue?
  • 50. Critical parallel statement queues
      • Oracle 12c introduced parallel_stmt_critical on plan directives
      • It allows one value: BYPASS_QUEUE
      • Sessions start immediately and do not wait in the queue
      • The parallel_max_servers init parameter remains the hard limit, so critical statements may run with fewer PX servers
        dbms_resource_manager.create_plan_directive(
          plan => 'REPORTS_PLAN',
          group_or_subplan => 'CRITICAL_REPORT',
          comment => 'CRITICAL Reporting Querys',
          parallel_stmt_critical => 'BYPASS_QUEUE');
  • 51. Q & A (I bet we don't have time for it)
  • 52. Want to know more?
      • Dump the state of DBRM with:
          • SQL> oradebug setmypid
          • SQL> oradebug dump DBSCHEDULER 1
      • Trace wait events with the 12c interface:
          • SQL> alter session set events 'wait_event["resmgr:cpu quantum"] trace("%s\n", shortstack())';
          • SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => true, binds => false, plan_stat => 'NEVER');

Editor's notes

  1. How many of you are using DBRM? It is underestimated because it is very powerful, not very well understood, and poorly used.
  2. Part 1: Theory on DBRM scheduler details, with CPU session scheduling in mind. Part 2: A more practical RM plan, testing and validating the most interesting features. A lot of images and arrows will appear during the presentation. The presentation will be driven by a guy who constantly interrupts it and asks questions.
  3. Database Resource Manager is basically a scheduler like the one you will find in your operating system. The difference is that it knows your workload and Oracle very well, because it lives inside the database.
  4. Priority decay can happen: if your mutex holder is eating a lot of CPU, its priority can be lowered, causing a priority inversion issue.
  5. 1 – BLUE: Processes in the DBRM internal queue waiting to be selected and placed on the operating system run queue. The selection is made according to your RM plan, and the VKRM background process places the next process on the OS run queue. The operating system is then responsible for placing everything on the CPUs. 2 – ORANGE: Note that DBRM takes care of the priority of PMON (and other Oracle background processes) and tries to avoid any type of CPU starvation for them, even if that means your session must wait a little longer.
  6. Instead of waiting in the CPU run queue, processes wait in an RM internal queue; I will prove that to you later on.
      • A background process called VKRM is responsible for placing your next foreground session on the OS run queue.
      • Priority round-robin scheduling retains the advantage of round robin in reducing starvation and also integrates the advantage of priority scheduling.
      • The quantum that RM gives to your session is 100 ms by default; it is basically a slice of CPU time. You will learn that you can play with it.
      • PMON starvation also causes stability problems (PMON frees dead processes). PMON will have the same priority as your foreground session, which may be an issue.
      • Remember that latches and mutexes are just memory structures in the SGA; the OS doesn't have a clue about them.
      • If the mutex holder is off the CPU and other processes that get on the CPU want the same mutex, it can become a very complex issue: they can't get it, so they spin waiting for it and then sleep (not in 10.2). In Oracle 11g the mutex getters do sleep instead of just yielding the CPU.
      • You may ask: if the mutex holder is off the CPU, it should come back to the CPU very fast, right? Answer: yes, if all the processes have the same priority. That is where priority decay can happen: if your mutex holder is eating a lot of CPU, its priority can be lowered, causing a priority inversion issue.
  7. I will prove 2 different things: the VKRM job that places the next process on the OS run queue, and the existence of a Resource Manager internal queue.
  8. I will not go into much detail here, but this is part of the output of perf on the VKRM process. perf is a Linux profiler and was run against the VKRM background process. We are able to see some Oracle kernel functions here, but the function that stands out is kgskrunnext, and it gives us a hint about VKRM's job. QUESTION: If somehow we could stop or suspend this background process, what would happen?
  9. Test case: 1 – Oracle Database Resource Manager is throttling with UTILIZATION_LIMIT. 2 – ORADEBUG SUSPEND causes 100% resmgr:cpu quantum: no more sessions are scheduled onto the OS run queue. 3 – RESUME restores normal behavior. This only works if DBRM is actively throttling your session.
  10. Test case: increase the number of sessions and watch the operating system run queue size using vmstat. When DBRM is not enabled, the OS run queue size increases as soon as you start to increase your number of sessions. At the end your run queue is 42 and the CPUs are totally busy, with 0 idle time.
  11. Same test case. The OS run queue stays at the same values even though your session number is increasing. This shows that your sessions are not placed directly on the operating system run queue. The internal queue holds all your "waiting for selection" sessions.
  12. Here is an example of a schema consolidation plan that I will use to demonstrate some testing. 1 – switch_io_logical: number of logical I/Os for the length of the session. 2 – LOG_ONLY: no action, just record the event in SQL Monitoring (12c only). 3 – ACTIVE_SESS_POOL_P1: maximum of 5 active sessions. CANCEL_SQL: if the statement runs for more than 120 seconds, it is canceled. Please note that UTILIZATION_LIMIT replaces MAX_UTILIZATION_LIMIT (used in 11.2). As of 12c, UTILIZATION_LIMIT also limits I/O if you are on Exadata, and parallel servers as a percentage of parallel_servers_target.
  13. Fairly easy for many of you, but I will make it complex. If you sum the previous values of all levels, you end up with much more than 100%.
  14. EMPHASIS is for multi-level plans that use percentages. This is the default if you define a plan. You may ask: WHAT IS THIS? Are you crazy? The symbol is a capital Pi, and it is basically the product of a sequence.
  15. Remember: L1: 65%; L2: 50% and 40%; L3: 100%.
  16. Quick question: how many of you test your DBRM resource manager plans after creating them?
  17. Way over 60% CPU utilization. It is difficult to test this way whether Resource Manager is respecting your directive. As of 12c, UTILIZATION_LIMIT also limits I/O if you are on Exadata, and parallel servers as a percentage of parallel_servers_target.
  18. A little over 60% here, but it is the best way you have to test whether Resource Manager is respecting your UTILIZATION_LIMIT directive. That value is what Oracle thinks each consumer group is consuming.
  19. I will show a sequence of events. The first event is firing up 3 sessions in the ADHOC consumer group, which has a UTILIZATION_LIMIT of 60, as you can see in the screenshot. Only sessions from ADHOC are running, with a limit of 60% of Oracle CPU, consuming almost 100% of the CPU used by all consumer groups.
  20. The RISK consumer group has a minimum CPU guarantee of 65%.
  21. 120 seconds have passed and the queries of the ADHOC sessions got canceled. Every bit of CPU is now consumed by RISK and RSK_REPORT.
  22. That is how you should test your Resource Manager CPU allocation: fire up your workload and measure it. If you are not satisfied with the results, go back to the drawing board, because defining an RM plan is an iterative process: you create it, test it, check the results, and if they are not what you expect, you re-create it.
  23. In the end you can just take your CSV, open Excel and draw this kind of chart. It helps a lot in visualizing what is consuming your CPU. Values are in percentages.
  24. Explain why
  25. parallel_server_limit is a percentage of the parallel_servers_target parameter.
  26. RATIO only works for plans with one level and expresses the relation between consumer groups.
  27. Expected: PX COORDINATOR FORCED SERIAL
  28. Running at a very low DOP (for example DOP = 2) might actually be less efficient than running serially, because parallel execution comes with some (implementation) overhead that can make it slower than its serial counterpart.
  29. SQL statements enter the database. Each statement is parsed and Oracle determines the automatic DOP. If the number of active parallel servers reaches the value of PARALLEL_SERVER_LIMIT, parallel statements are queued. Statements are queued in FIFO statement queues, and the plan directives (mgmt_pn or ratio) control the dequeue priority. Once everything is checked, and depending on dequeue priority and the availability of parallel servers, the statement is able to run. After 11.2.0.2 there is a separate FIFO queue per consumer group.