Practical Performance Management for Oracle RAC
Barb Lundhild, RAC Product Management
Michael Zoll, RAC Development, Performance
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
OBJECTIVE
RAC Fundamentals and Infrastructure
Oracle RAC Architecture (diagram): each node (1..n) runs its operating system, Oracle Clusterware, a database instance, an ASM instance, a VIP, and a listener; services are exposed across nodes over the public network. Shared storage holds the database and control files and the redo/archive logs of all instances (managed by ASM), plus the OCR and voting disks (raw devices).

Oracle Clusterware (diagram): each node runs EVMD, CRSD, OPROCD, ONS, a VIP, and CSSD; CSSD runs at real-time priority. The OCR and voting disks reside on shared storage (raw devices).

Under the Covers (diagram): each instance's SGA contains the buffer cache, log buffer, library cache, dictionary cache, and Global Resource Directory; background processes include LMS0, LMON, LMD0, DIAG, VKTM, LGWR, DBW0, SMON, and PMON. Instances communicate over the private high-speed cluster network; LMS runs at real-time priority. Each instance has its own redo log files, while data files and control files are shared.
Global Cache Service (GCS)
Cache Hierarchy: Data in Remote Cache (diagram): a local cache miss sends a data block request to the holding instance; on a remote cache hit, the data block is returned over the interconnect.

Cache Hierarchy: Data On Disk (diagram): a local cache miss sends a data block request; on a remote cache miss, a grant is returned and the requesting instance reads the block from disk.

Cache Hierarchy: Read Mostly (diagram): a local cache miss goes straight to a disk read; no message is required.
Performance of Cache Fusion (diagram): the requester initiates a send (~200-byte message) and waits; the LMS process on the holder receives the message, processes the block (e.g. 8K), and sends it back. Serialization times: 200 bytes/(1 Gb/sec) for the message, 8192 bytes/(1 Gb/sec) for the block. Total access time is e.g. ~360 microseconds (UDP over GbE). Network propagation delay ("wire time") is a minor factor in roundtrip time (approx. 6%, vs. 52% in the OS and network stack).
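The serialization component of that roundtrip can be checked with simple arithmetic. The 200-byte message, 8K block, 1 Gb/s link, and ~360 µs total come from the slide; the helper function below is illustrative:

```python
def wire_time_us(payload_bytes: int, bandwidth_gbps: float = 1.0) -> float:
    """Time to serialize one payload onto the wire, in microseconds."""
    return payload_bytes * 8 / (bandwidth_gbps * 1e9) * 1e6

# One cache-fusion roundtrip: a ~200-byte request plus an 8K block back.
request_us = wire_time_us(200)    # 1.6 us
block_us = wire_time_us(8192)     # ~65.5 us

total_us = 360.0                  # measured roundtrip, UDP over GbE (from the slide)
print(f"serialization: {request_us + block_us:.1f} us of {total_us:.0f} us total")
```

Even charging the full serialization time to the network leaves the bulk of the ~360 µs in the OS, network stack, and LMS processing, which is why the slide attributes only a small share of the roundtrip to the wire itself.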
Fundamentals: Minimum Latency (*), UDP/GbE and RDS/IB

RT (ms)     Block size:   2K      4K      8K      16K
UDP/GE                    0.30    0.31    0.36    0.46
RDS/IB                    0.12    0.13    0.16    0.20

(*) Roundtrip; blocks are not "busy", i.e. no log flush, no serialization ("buffer busy"). AWR and Statspack reports show averages as if they were normally distributed; the session wait history (included in Statspack in 10.2 and in AWR in 11g) shows the actual quantiles. The minimum values in this table are the optimal values for 2-way and 3-way block transfers, but can be taken as the expected values (i.e., 10 ms for a 2-way block would be very high).
Infrastructure: Private Interconnect
Infrastructure: Interconnect Bandwidth
Infrastructure: IPC Configuration
Infrastructure: Operating System
Common Problems and Symptoms
Common Problems and Symptoms
Misconfigured or Faulty Interconnect Can Cause:
“Lost Blocks”: NIC Receive Errors (db_block_size = 8K)

ifconfig -a:
eth0      Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
          inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
          TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0
          ...
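Scanning for non-zero RX error counters can be automated; a minimal sketch (the sample text mirrors the slide's output, the function name is illustrative):

```python
import re

def rx_errors(ifconfig_output: str) -> dict:
    """Return {interface: RX errors} parsed from classic `ifconfig -a` output."""
    result = {}
    current = None
    for line in ifconfig_output.splitlines():
        m = re.match(r"^(\S+)\s+Link encap", line)
        if m:
            current = m.group(1)
        m = re.search(r"RX packets:\d+\s+errors:(\d+)", line)
        if m and current:
            result[current] = int(m.group(1))
    return result

sample = """eth0      Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
          inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
          TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0"""

for nic, errs in rx_errors(sample).items():
    if errs:
        print(f"{nic}: {errs} receive errors -- check cabling, driver, MTU")
```

Non-zero `errors`, `dropped`, `overruns`, or `frame` counts on the interconnect NIC all warrant investigation; this sketch checks only the `errors` field.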
“Lost Blocks”: IP Packet Reassembly Failures

netstat -s
Ip:
    84884742 total packets received
    ...
    1201 fragments dropped after timeout
    ...
    3384 packet reassembles failed
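The reassembly counter can be pulled out of `netstat -s` the same way; a sketch (the sample mirrors the slide, the function name is illustrative):

```python
def reassembly_failures(netstat_s_output: str) -> int:
    """Return the IP 'packet reassembles failed' counter from `netstat -s` output."""
    for line in netstat_s_output.splitlines():
        line = line.strip()
        if line.endswith("packet reassembles failed"):
            return int(line.split()[0])
    return 0

sample = """Ip:
    84884742 total packets received
    1201 fragments dropped after timeout
    3384 packet reassembles failed"""

print(reassembly_failures(sample))  # 3384
```

A rising failure count means UDP datagrams larger than the MTU are being fragmented and not reassembled in time, i.e. blocks are being lost at the IP layer before Oracle ever sees them.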
Finding a Problem with the Interconnect or IPC

Top 5 Timed Events                                 Avg wait   % Total
Event               Waits      Time(s)    (ms)     Call Time  Wait Class
-------------------------------------------------------------------------
log file sync       286,038    49,872      174       41.7     Commit
gc buffer busy      177,315    29,021      164       24.3     Cluster
gc cr block busy    110,348     5,703       52        4.8     Cluster
gc cr block lost      4,272     4,953    1,159        4.1     Cluster   <-- should never be here
cr request retry      6,316     4,668      739        3.9     Other
Global Cache Lost Block Handling
Interconnect Statistics (Automatic Workload Repository)

Target      Avg Latency   Stddev     Avg Latency   Stddev
Instance    500B msg      500B msg   8K msg        8K msg
-----------------------------------------------------------
1           0.79          0.65       1.04          1.06
2           0.75          0.57       0.95          0.78
3           0.55          0.59       0.53          0.59
4           1.59          3.16       1.46          1.82

Latency probes for different message sizes. Exact throughput measurements (not shown). Send and receive errors, dropped packets (not shown).
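In the table above, instance 4's probe latencies stand out against the rest of the cluster. The comparison is easy to script; the numbers are the 8K-message averages from the report, while the function name and cutoff factor are illustrative:

```python
def flag_slow_targets(latencies_ms: dict, factor: float = 1.3) -> list:
    """Flag instances whose latency exceeds `factor` x the cluster median."""
    values = sorted(latencies_ms.values())
    median = values[len(values) // 2]
    return [inst for inst, lat in latencies_ms.items() if lat > factor * median]

# Avg 8K-message latencies (ms) per target instance, from the AWR excerpt.
avg_8k_ms = {1: 1.04, 2: 0.95, 3: 0.53, 4: 1.46}
print(flag_slow_targets(avg_8k_ms))  # [4]
```

Instance 4's high standard deviation (3.16 for 500B messages) is just as telling as the average: jittery probe latencies often point at CPU scheduling delays or a flaky NIC rather than steady-state network load.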
“Blocks Lost”: Solution
Disk IO Performance Issues
Cluster-Wide I/O Impact

Node 1: Top 5 Timed Events                     Avg wait   % Total
Event              Waits      Time(s)   (ms)   Call Time
------------------------------------------------------------
log file sync      286,038    49,872    174      41.7
gc buffer busy     177,315    29,021    164      24.3
gc cr block busy   110,348     5,703     52       4.8

Node 2: Load Profile            Per Second
--------------------------------------------
Redo size:                      40,982.21
Logical reads:                  81,652.41
Physical reads:                 51,193.37

An expensive query on node 2 means:
1. I/O on the disk group containing the redo logs is bottlenecked
2. Block shipping for "hot" blocks is delayed by log flush I/O
3. Serialization/queues build up
I/O and/or Bad SQL Problem Fixed

Top 5 Timed Events                                 Avg wait   % Total
Event                    Waits      Time(s)  (ms)  Call Time  Wait Class
-------------------------------------------------------------------------
CPU time                            4,580            65.4
log file sync            276,281    1,501      5     21.4     Commit
log file parallel write  298,045      923      3     13.2     System I/O
gc current block 3-way   605,628      631      1      9.0     Cluster
gc cr block 3-way        514,218      533      1      7.6     Cluster

1. Log file writes are normal
2. Global serialization has disappeared
Drill-down: An I/O Capacity Problem
Symptom of full table scans, I/O contention

Top 5 Timed Events                                    Avg wait   % Total
Event                      Waits        Time(s)  (ms)  Call Time  Wait Class
------------------------------------------------------------------------------
db file scattered read      3,747,683   368,301   98     33.3     User I/O
gc buffer busy              3,376,228   233,632   69     21.1     Cluster
db file parallel read       1,552,284   225,218  145     20.4     User I/O
gc cr multi block request  35,588,800   101,888    3      9.2     Cluster
read by other session       1,263,599    82,915   66      7.5     User I/O
IO Issues: Solution
CPU Saturation or Long Run Queues

Top 5 Timed Events                                    Avg wait   % Total
Event                       Waits      Time(s)  (ms)  Call Time  Wait Class
-----------------------------------------------------------------------------
db file sequential read     1,312,840  21,590    16     21.8     User I/O
gc current block congested    275,004  21,054    77     21.3     Cluster
gc cr grant congested         177,044  13,495    76     13.6     Cluster
gc current block 2-way      1,192,113   9,931     8     10.0     Cluster
gc cr block congested          85,975   8,917   104      9.0     Cluster

"Congested": LMS could not dequeue messages fast enough.
Cause: long run queue, CPU starvation.
High CPU Load: Solution
Contention

Event                    Waits     Time(s)   Avg (ms)   % Call Time
--------------------------------------------------------------------
gc cr block 2-way        317,062    5,767      18          19.0
gc current block 2-way   201,663    4,063      20          13.4
gc buffer busy           111,372    3,970      36          13.1
CPU time                            2,938                   9.7
gc cr block busy          40,688    1,670      41           5.5

Global contention on data serialization. It is very likely that the "gc cr block busy" and "gc buffer busy" waits are related.
Contention: Solution
High Latencies

Event                    Waits     Time(s)   Avg (ms)   % Call Time
--------------------------------------------------------------------
gc cr block 2-way        317,062    5,767      18          19.0
gc current block 2-way   201,663    4,063      20          13.4
gc buffer busy           111,372    3,970      36          13.1
CPU time                            2,938                   9.7
gc cr block busy          40,688    1,670      41           5.5

Tackle latency first, then tackle busy events.
Expected: 2-way and 3-way events.
Unexpected: averages > 1 ms (the AVG ms should be around 1 ms).
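The rule of thumb above (2-way and 3-way transfer averages should sit near 1 ms) can be encoded as a quick AWR sanity check; the event names and averages come from the report excerpt, while the 10x cutoff is an illustrative threshold:

```python
EXPECTED_MS = 1.0  # typical 2-way/3-way gc block transfer on a healthy interconnect

# Avg wait (ms) per event, from the AWR excerpt above.
avg_wait_ms = {
    "gc cr block 2-way": 18,
    "gc current block 2-way": 20,
    "gc buffer busy": 36,
    "gc cr block busy": 41,
}

# The latency rule of thumb applies to the plain transfer events; 'busy'
# events indicate serialization (log flush, contention), tackled afterwards.
suspects = [event for event, ms in avg_wait_ms.items()
            if event.endswith("2-way") and ms > 10 * EXPECTED_MS]
print(suspects)  # ['gc cr block 2-way', 'gc current block 2-way']
```

Here both 2-way averages are more than an order of magnitude above the expected ~1 ms, so network and CPU should be investigated before turning to the busy (serialization) events.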
High Latencies: Solution
Health Check
Application and Database Design
General Principles
Scalability Pitfalls
Health Check
Diagnostics and Problem Determination
Checklist for the Skeptical Performance Analyst (AWR-based)
Drill-down: An I/O Capacity Problem
Symptom of full table scans, I/O contention

Top 5 Timed Events                                    Avg wait   % Total
Event                      Waits        Time(s)  (ms)  Call Time  Wait Class
------------------------------------------------------------------------------
db file scattered read      3,747,683   368,301   98     33.3     User I/O
gc buffer busy              3,376,228   233,632   69     21.1     Cluster
db file parallel read       1,552,284   225,218  145     20.4     User I/O
gc cr multi block request  35,588,800   101,888    3      9.2     Cluster
read by other session       1,263,599    82,915   66      7.5     User I/O
Drill-down: SQL Statements

"Culprit": a query that overwhelms the I/O subsystem on one node:

Physical Reads   Executions   Reads per Exec   % Total
--------------------------------------------------------
182,977,469      1,055        173,438.4         99.3

SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC

The same query reads from the interconnect:

Cluster Wait   CWT % of       CPU
Time (s)       Elapsed Time   Time (s)    Executions
------------------------------------------------------
341,080.54     31.2           17,495.38   1,055

SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC
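The per-execution figure in the report is simply total physical reads divided by executions, a ratio worth re-deriving when reading AWR SQL sections to confirm the culprit is per-execution cost rather than execution count:

```python
# Figures from the AWR SQL section above.
physical_reads = 182_977_469
executions = 1_055

reads_per_exec = physical_reads / executions
print(f"{reads_per_exec:,.1f} physical reads per execution")  # 173,438.4
```

At ~173,000 reads per execution across only 1,055 executions, the fix is the query's access path (or partition pruning), not caching or throttling.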
Drill-Down: Top Segments

Tablespace   Object     Subobject   Obj     GC Buffer   % of
Name         Name       Name        Type    Busy        Capture
-----------------------------------------------------------------
ESSMLTBL     ES_SHELL   SYS_P537    TABLE   311,966      9.91
ESSMLTBL     ES_SHELL   SYS_P538    TABLE   277,035      8.80
ESSMLTBL     ES_SHELL   SYS_P527    TABLE   239,294      7.60
...

Apart from having the highest I/O demand, this table also had the highest number of block transfers AND the most global serialization.
Findings Summary in EM
Recommendations
ADDM Diagnosis for RAC
What ADDM Diagnoses for RAC
Q & A: QUESTIONS AND ANSWERS
OTHER SESSIONS TO CHECK OUT (Thursday)

S291242  Demystifying Oracle RAC Internals                              South 104   10:00 AM
S291662  Using Oracle RAC and Microsoft Windows 64-bit as the
         Foundation (with Intel and Talx)                               South 309    1:00 PM
S291670  Oracle Database 11g: First Experiences with Grid Computing
         (with Mobiltel and BCF)                                        South 310    4:00 PM
For More Information
http://search.oracle.com or otn.oracle.com/rac
REAL APPLICATION CLUSTERS
 

 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Oow2007 performance

  • 1.  
  • 2. Practical Performance Management for Oracle RAC Barb Lundhild RAC Product Management Michael Zoll RAC Development, Performance
  • 3. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 4.
  • 5.
  • 6. <Insert Picture Here> RAC Fundamentals and Infrastructure
  • 7. Oracle RAC Architecture Service public network Node1 Operating System Oracle Clusterware instance 1 ASM VIP1 Listener Node 2 Operating System Oracle Clusterware instance 2 ASM VIP2 Listener Service Node n Operating System Oracle Clusterware instance n ASM VIPn Listener Service /…/ Redo / Archive logs all instances shared storage Database / Control files OCR and Voting Disks Managed by ASM RAW Devices
  • 8. Oracle Clusterware Node1 public network EVMD CRSD OPROCD ONS VIP1 CSSD Node 2 EVMD CRSD OPROCD ONS VIP2 CSSD Node n EVMD CRSD OPROCD ONS VIPn CSSD /…/ shared storage CSSD Runs in Real Time Priority OCR and Voting Disks RAW Devices
  • 9. Under the Covers Node n Node 2 Data Files and Control Files Dictionary Cache VKTM LGWR DBW0 SMON PMON Library Cache Global Resource Directory LMS0 Instance 2 SGA Instance n Cluster Private High Speed Network LMON LMD0 DIAG Dictionary Cache VKTM LGWR DBW0 SMON PMON Library Cache Global Resource Directory LMS0 LMON LMD0 DIAG Dictionary Cache VKTM LGWR DBW0 SMON PMON Library Cache Global Resource Directory LMS0 LMON LMD0 DIAG Instance 1 Node 1 SGA SGA Runs in Real Time Priority Redo Log Files Redo Log Files Redo Log Files Log buffer Buffer Cache Log buffer Buffer Cache Log buffer Buffer Cache
  • 10.
  • 11. Cache Hierarchy: Data in Remote Cache Local Cache Miss Datablock Requested Datablock Returned Remote Cache Hit
  • 12. Cache Hierarchy: Data On Disk Local Cache Miss Datablock Requested Grant Returned Remote Cache Miss Disk Read
  • 13. Cache Hierarchy: Read Mostly Local Cache Miss No Message required Disk Read
  • 14. Performance of Cache Fusion Message:~200 bytes Block: e.g. 8K LMS Initiate send and wait Receive Process block Send Receive 200 bytes/(1 Gb/sec ) 8192 bytes/(1 Gb/sec) Total access time: e.g. ~360 microseconds (UDP over GBE) Network propagation delay ( “wire time” ) is a minor factor for roundtrip time ( approx.: 6% , vs. 52% in OS and network stack )
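The slide's timing figures can be reproduced with simple arithmetic. The sketch below is an editor's illustration, not part of the deck: it computes the serialization delay of the ~200-byte request and the 8K block on a 1 Gb/sec link and compares it with the quoted ~360 microsecond total roundtrip, confirming that wire transfer is a minor share of the total.

```python
# Back-of-envelope estimate of Cache Fusion wire time vs. total roundtrip,
# using the figures from the slide (illustrative values, not measurements).
GBE_BITS_PER_SEC = 1_000_000_000  # 1 Gb/sec link

def wire_time_us(nbytes, bits_per_sec=GBE_BITS_PER_SEC):
    """Serialization delay in microseconds for nbytes on the link."""
    return nbytes * 8 / bits_per_sec * 1_000_000

msg_us = wire_time_us(200)       # ~200-byte request message
block_us = wire_time_us(8192)    # 8K data block response
total_roundtrip_us = 360         # quoted total for UDP over GbE

wire_share = (msg_us + block_us) / total_roundtrip_us
print(f"request: {msg_us:.1f} us, block: {block_us:.1f} us")
print(f"wire share of roundtrip: {wire_share:.0%}")
```

Most of the roundtrip is thus spent in the OS, network stack and LMS processing rather than on the wire, which is the slide's point.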
  • 15. Fundamentals: Minimum Latency (*), UDP/GBE and RDS/IB

Roundtrip latency (ms) by block size:

Block size   2K     4K     8K     16K
UDP/GE       0.30   0.31   0.36   0.46
RDS/IB       0.12   0.13   0.16   0.20

(*) Roundtrip; blocks are not “busy”, i.e. no log flush, no serialization (“buffer busy”). AWR and Statspack reports show averages as if they were normally distributed; the session wait history, included in Statspack in 10.2 and AWR in 11g, shows the actual quantiles. The minimum values in this table are the optimal values for 2-way and 3-way block transfers, but can be taken as the expected values (i.e. 10 ms for a 2-way block would be very high).
  • 16.
  • 17.
  • 18.
  • 19.
  • 20. <Insert Picture Here> Common Problems and Symptoms
  • 21.
  • 22.
  • 23. “Lost Blocks”: NIC Receive Errors Db_block_size = 8K ifconfig -a: eth0 Link encap:Ethernet HWaddr 00:0B:DB:4B:A2:04 inet addr:130.35.25.110 Bcast:130.35.27.255 Mask:255.255.252.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95 TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0 …
  • 24. “Lost Blocks”: IP Packet Reassembly Failures netstat -s Ip:    84884742 total packets received    … 1201 fragments dropped after timeout    …    3384 packet reassembles failed
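The NIC and IP-layer counters shown on these two slides can be checked programmatically. Below is a minimal Python sketch (an editor's illustration, not from the deck) that scans `ifconfig` and `netstat -s` style output for the error counters discussed here; the sample strings mirror the slides, and real field names vary by OS and tool version.

```python
import re

# Sample output mirroring the slides (field names vary by platform/version).
ifconfig_output = "RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95"
netstat_output = """\
Ip:
    84884742 total packets received
    1201 fragments dropped after timeout
    3384 packet reassembles failed
"""

def nic_rx_errors(text):
    """Pull the RX error counters from ifconfig-style output."""
    m = re.search(r"RX packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)", text)
    packets, errors, dropped, overruns = map(int, m.groups())
    return {"errors": errors, "dropped": dropped, "overruns": overruns}

def ip_reassembly_failures(text):
    """Pull fragment/reassembly failure counters from `netstat -s` output."""
    counts = {}
    for pattern, key in [
        (r"(\d+) fragments dropped after timeout", "frag_dropped"),
        (r"(\d+) packet reassembles failed", "reasm_failed"),
    ]:
        m = re.search(pattern, text)
        counts[key] = int(m.group(1)) if m else 0
    return counts

# Any nonzero value here is a candidate cause of "lost blocks".
print(nic_rx_errors(ifconfig_output))
print(ip_reassembly_failures(netstat_output))
```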
  • 25. Finding a Problem with the Interconnect or IPC Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time(s)(ms) Time Wait Class ---------------------------------------------------------------------------------------------------- log file sync 286,038 49,872 174 41.7 Commit gc buffer busy 177,315 29,021 164 24.3 Cluster gc cr block busy 110,348 5,703 52 4.8 Cluster gc cr block lost 4,272 4,953 1159 4.1 Cluster cr request retry 6,316 4,668 739 3.9 Other Should never be here
  • 26.
  • 27. Interconnect Statistics Automatic Workload Repository (AWR ) Target Avg Latency Stddev Avg Latency Stddev Instance 500B msg 500B msg 8K msg 8K msg --------------------------------------------------------------------- 1 .79 .65 1.04 1.06 2 .75 .57 . 95 .78 3 .55 .59 .53 .59 4 1.59 3.16 1.46 1.82 --------------------------------------------------------------------- Latency probes for different message sizes Exact throughput measurements ( not shown) Send and receive errors, dropped packets ( not shown )
  • 28.
  • 29.
  • 30. Cluster-Wide I/O Impact Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time(s)(ms) Time ------------------------------ ------------ ----------- ------ ------ log file sync 286,038 49,872 174 41.7 gc buffer busy 177,315 29,021 164 24.3 gc cr block busy 110,348 5,703 52 4.8 Load Profile ~~~~~~~~~~~~ Per Second --------------- Redo size: 40,982.21 Logical reads: 81,652.41 Physical reads: 51,193.37 Node 2 Node 1 Expensive Query in Node 2 1. IO on disk group containing redo logs is bottlenecked 2. Block shipping for “hot” blocks is delayed by log flush IO 3. Serialization/Queues build up
  • 31. IO and/or Bad SQL problem fixed Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) Time Wait Class --------------------------- --------- ----------- ---- ------ ---------- CPU time 4,580 65.4 log file sync 276,281 1,501 5 21.4 Commit log file parallel write 298,045 923 3 13.2 System I/O gc current block 3-way 605,628 631 1 9.0 Cluster gc cr block 3-way 514,218 533 1 7.6 Cluster 1. Log file writes are normal 2. Global serialization has disappeared
  • 32. Drill-down: An IO capacity problem Symptom of Full Table Scans I/O contention Top 5 Timed Events Avg %Total wait Call Event Waits Time(s) (ms) Time Wait Class ---------------- -------- ------- ---- ---- ---------- db file scattered read 3,747,683 368,301 98 33.3 User I/O gc buffer busy 3,376,228 233,632 69 21.1 Cluster db file parallel read 1,552,284 225,218 145 20.4 User I/O gc cr multi block 35,588,800 101,888 3 9.2 Cluster request read by other session 1,263,599 82,915 66 7.5 User I/O
  • 33.
  • 34. CPU Saturation or Long Run Queues Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time(s) (ms) Time Wait Class ----------------- --------- ------ ---- ----- ---------- db file sequential 1,312,840 21,590 16 21.8 User I/O read gc current block 275,004 21,054 77 21.3 Cluster congested gc cr grant congested 177,044 13,495 76 13.6 Cluster gc current block 1,192,113 9,931 8 10.0 Cluster 2-way gc cr block congested 85,975 8,917 104 9.0 Cluster “ Congested” : LMS could not dequeue messages fast enough Cause : Long run queue, CPU starvation
  • 35.
  • 36. Contention Event Waits Time (s) AVG (ms) % Call Time ---------------------- --------- -------- -------- -------- gc cr block 2-way 317,062 5,767 18 19.0 gc current block 2-way 201,663 4,063 20 13.4 gc buffer busy 111,372 3,970 36 13.1 CPU time 2,938 9.7 gc cr block busy 40,688 1,670 41 5.5 ------------------------------------------------------- Global Contention on Data Serialization It is very likely that CR BLOCK BUSY and GC BUFFER BUSY are related
  • 37.
  • 38. High Latencies Event Waits Time (s) AVG (ms) % Call Time ---------------------- ---------- ---------- --------- -------- gc cr block 2-way 317,062 5,767 18 19.0 gc current block 2-way 201,663 4,063 20 13.4 gc buffer busy 111,372 3,970 36 13.1 CPU time 2,938 9.7 gc cr block busy 40,688 1,670 41 5.5 ------------------------------------------------------- Tackle latency first, then tackle busy events Expected: To see 2-way, 3-way Unexpected: To see > 1 ms (AVG ms should be around 1 ms)
  • 39.
  • 40.
  • 41. <Insert Picture Here> Application and Database Design
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47. Drill-down: An IO capacity problem Symptom of Full Table Scans I/O contention Top 5 Timed Events Avg %Total wait Call Event Waits Time(s) (ms) Time Wait Class ---------------- -------- ------- ---- ---- ---------- db file scattered read 3,747,683 368,301 98 33.3 User I/O gc buffer busy 3,376,228 233,632 69 21.1 Cluster db file parallel read 1,552,284 225,218 145 20.4 User I/O gc cr multi block 35,588,800 101,888 3 9.2 Cluster request read by other session 1,263,599 82,915 66 7.5 User I/O
  • 48. Drill-down: SQL Statements “ Culprit”: Query that overwhelms IO subsystem on one node Physical Reads Executions per Exec %Total -------------- ----------- ------------- ------ 182,977,469 1,055 173,438.4 99.3 SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC The same query reads from the interconnect: Cluster CWT % of CPU Wait Time (s) Elapsd Tim Time(s) Executions ------------- ---------- ----------- -------------- 341,080.54 31.2 17,495.38 1,055 SELECT SHELL FROM ES_SHELL WHERE MSG_ID = :msg_id ORDER BY ORDER_NO ASC
  • 49. Drill-Down: Top Segments GC Tablespace Subobject Obj Buffer % of Name Object Name Name Type Busy Capture -------- ------------- -------- ------ ------- ------ ESSMLTBL ES_SHELL SYS_P537 TABLE 311,966 9.91 ESSMLTBL ES_SHELL SYS_P538 TABLE 277,035 8.80 ESSMLTBL ES_SHELL SYS_P527 TABLE 239,294 7.60 … Apart from being the table with the highest IO demand it was the table with the highest number of block transfers AND global serialization
  • 50.
  • 51.
  • 52.
  • 53.
  • 54. Q & A: Questions and Answers
  • 55. OTHER SESSIONS TO CHECK OUT (Thursday)

S291242  Demystifying Oracle RAC Internals  South 104  10:00 AM
S291662  Using Oracle RAC and Microsoft Windows 64-bit as the Foundation (with Intel and Talx)  South 309  1:00 PM
S291670  Oracle Database 11g: First Experiences with Grid Computing (with Mobiltel and BCF)  South 310  4:00 PM
  • 56. For More Information http://search.oracle.com or otn.oracle.com/rac REAL APPLICATION CLUSTERS
  • 57.  

Editor's Notes

  1. This graphic focuses on the interconnect fabric. It is a network, like any other, with a single, dedicated (private) purpose: cluster communication. People can get creative with their network design, with use of VLANs, various switches, and different topologies. Latency is the key factor to minimize, along with reliability of the switches for cluster communications.
  2. OCSSD/OPROCD are running in RT. Total # of RT processes < # of CPUs.
  3. From the point of view of process architecture, one or more block server processes, called LMS, handle the bulk of the message traffic. The LMS processes are Oracle background processes. When a shadow process makes a request for data, it sends a message directly to an LMS process on another node, which in turn returns either the data or a grant (permission to read from disk, or to write to a data block) directly to the requester. The state objects used for globally cached data are maintained in the SGA and are accessed by all processes in an instance which need to maintain and manipulate global data consistently. LMSn runs in RT by default since 10gR2; predictable scheduling is needed for predictable performance in runtime cache fusion and broadcast-on-commit. VKTM is a new fatal background process. VKTM keeps updating a timer variable in the SGA, which reduces the CPU overhead for getting timing information considerably. VKTM needs to be in RT for correctness.
  4. The Global Cache Service manages data cached in the buffer caches of all instances which are part of the database cluster. In conjunction with an IPC transport layer, it initiates and handles the memory transfers for write access (CURRENT) or read access (CR) for all block types (e.g. data, index, undo, headers), the globally managed access permissions to cached data, and the global state of the block. The GCS can determine if and where a data block is cached and forwards data requests to the appropriate instances. It minimizes the access time to data, as the response time on a private network is faster than a read from disk. The message protocol scales and will at most involve 3 hops in a cluster of more than 2 nodes. In fact, the total number of messages is determined by the probability of finding the global state information for a data block on the local node or a remote node, and whether data is cached in the instance which also masters the global state for the data. Oracle RAC attempts to colocate buffered data and their global state as much as possible to minimize the impact of the message cost. Cache Fusion and the GCS constitute the infrastructure which allows the scale-out of a database tier by adding commodity servers.
  5. In the simplest case, if the data is not in the local buffer cache but in the buffer cache of another instance, a data request involves a message to the instance where the data block is cached. The request message is usually small, approx. 200 bytes in size. The requesting shadow process initiates the send and then waits until the response arrives. The message is sent to an LMS process on a remote instance. The LMS process receives the message, executes a handler, processes the message, and eventually sends either the data block or a grant message. The minimum roundtrip time involving an 8K data block is about 400 microseconds. It is obvious that the pure wire time consumes only an insignificant portion of the total time. It should also be clear that the key factors for performance are the time it takes to send, receive and process the data, which makes the responsiveness of LMS under load a critical factor.
  6. If the data is not cached in any of the instances and is on disk, a grant from the master may be required. The master can be thought of as the directory node for a block or an object. The global state of the resource (data block or object) – whether it is cached or on disk, which instances have the blocks cached, and whether the blocks can be shared immediately or have modifications pending – is completely known at the master. When data is on disk, B - (B / N) messages may be required (where B = number of disk reads and N = number of nodes), as the resource masters are distributed over all instances in the cluster (i.e. each instance can be master for a particular data block or object).
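The note's message-count formula can be made concrete with a tiny sketch (an editor's illustration; the integer division is an assumption about rounding):

```python
def expected_grant_messages(disk_reads, nodes):
    """Grant messages for disk reads per the note's formula B - (B / N):
    roughly 1/N of blocks are mastered locally and need no message."""
    return disk_reads - disk_reads // nodes

# e.g. 10,000 disk reads spread over a 4-node cluster
print(expected_grant_messages(10_000, 4))
```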
  7. In 11g, the case where data is on disk is optimized if the table/index/partition is found to be accessed mostly by reads. The read-mostly protocol detects the access pattern and marks an object as read-mostly. For read-only or read-mostly accesses, no messages are required. Many read-intensive applications or parts of applications in the OLTP, DW or business analytics space will be able to take advantage of it. The performance gain will depend on how frequently modifications to the read-mostly data are required, but the CPU saving can be very significant.
  8. In its simplest case, a data request involves a message to the instance where the data block is cached. The request message is usually small, approx. 200 bytes in size. The requesting shadow process initiates the send and then waits until the response arrives. The message is sent to an LMS process on a remote instance. The LMS process receives the message, executes a handler, processes the message, and eventually sends either the data block or a grant message. The minimum roundtrip time involving an 8K data block is about 400 microseconds. It is obvious that the pure wire time consumes only an insignificant portion of the total time. It should also be clear that the key factors for performance are the time it takes to send, receive and process the data, which makes the responsiveness of LMS under load a critical factor. The actual cost is determined by: message propagation delay, IPC CPU, operating system scheduling, block server process load, and interconnect stability.
  9. It should be stressed that these are the minimum roundtrip latencies measured at low to medium load (50% CPU utilization). It should be clear that the processing cost is affected by several factors, as just explained. Hot database blocks may incur an extra processing cost in user space. Most average values presented in AWR reports are from a large distribution with some amount of variance, i.e. higher values can skew the average, and one often sees 1 or 2 ms avg latency although the majority of accesses complete in less than 1 ms. The main purpose of this table is to serve as a reference for expected values. In 11g, latency probes for small and large messages allow you to correlate system load and average access time at run time. The results of the latency probes are stored in the AWR repository and can therefore be accessed and regressed easily.
  10. The private network is important for performance and stability. Bandwidth needs to be kept exclusive to the cluster to keep variation low. Dual-ported or multiple NICs are good to have for failover, but rarely needed for performance in OLTP systems, as the utilized bandwidth is usually lower than the total capacity of a GbE link. For DSS and DW environments, it is very likely that the bandwidth of a single GbE NIC is not sufficient, so other options such as NIC bonding, IB or 10GbE should be considered. It is difficult to predict the actual interconnect requirements without historical data, so planning should include a large tolerance. For data shipping in OLTP and DSS, larger MTUs are more efficient, because they reduce interrupt load, save CPU, and avoid fragmentation and therefore the probability of “losing blocks” if a fragment is dropped due to congestion control, buffer overflows in switches, or similar incidents related to the functioning of IPC and networks. Jumbo frames need to be supported by drivers, NICs and switches. They usually require a certain amount of additional configuration.
  11. In most known OLTP configurations to date, the bandwidth of 1 GbE is sufficient. The actual utilization depends on the size of the cluster nodes in terms of CPU power, the number of nodes accessing the same data, and the size of the working set for an application. Most applications have good cache locality, and there are no increasing interconnect requirements when scaling the application out by adding cluster nodes and distributing the work over more instances or adding additional load. For small working sets which could fit into a small percentage of the available global buffer cache, the interconnect traffic may increase when the set remains constant. The actual utilization is difficult to predict but in most cases is no reason for concern in the OLTP world when it comes to providing adequate bandwidth. Typical utilizations for OLTP are usually much lower than the total available network capacity of 1 GbE. As a rule of thumb, a total disk IO rate of 10,000 IOs/sec in a cluster with 4 nodes will require about 7.5 MB/sec of network bandwidth, given that the IOs read data into the buffer cache and are not direct reads (for a read-mostly workload, it will be only a small fraction of that as long as the read-mostly state is active). Direct reads or read-mostly (11g) do not require any messages for global cache synchronization. For DSS queries which use inter-instance communication between slaves, the size of the data sets and the distribution of work between query slaves suggests using multiple GbE NICs, 10GbE or IB. The rule of thumb here is that it is good design practice to provide for higher bandwidth than 1 GbE. For OLTP, a general rule is that if the number of CPUs in a cluster node exceeds 16-20, multiple NICs may be required to provide sufficient bandwidth.
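As a sanity check on the rule of thumb above, the implied messaging volume can be back-solved: 7.5 MB/sec at 10,000 IOs/sec is about 750 bytes of interconnect traffic per buffered read. The sketch below is an editor's illustration; the 750 bytes/IO figure is derived from the note, not a documented constant.

```python
def interconnect_mb_per_sec(io_per_sec, bytes_per_io=750):
    """Rough interconnect bandwidth estimate for buffered disk reads.
    bytes_per_io=750 is back-solved from the note's 7.5 MB/sec at
    10,000 IOs/sec figure (an assumption, not a documented value)."""
    return io_per_sec * bytes_per_io / 1_000_000

print(interconnect_mb_per_sec(10_000))  # matches the note's 7.5 MB/sec
```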
  12. It is recommended to check and test the network infrastructure and protocol stack configuration thoroughly before committing a system to production. Specifically: socket buffer sizes, NIC data buffer and queue length sizes, negotiated bit rate and duplex mode for NICs and switch ports, and flow control settings. For jumbo frames, consult with the hardware vendor as to the optimal setting, because the NIC and driver resources may have to be increased. In some cases, network interrupts are handled by a dedicated CPU. If that CPU becomes 100% busy, performance will suffer and the IPC will not scale; make sure this does not become a bottleneck by spreading the interrupt handling over more CPUs. While the cluster verification utility automates some of these checks, it is advisable to thoroughly test the hardware and OS configuration with non-Oracle tools, such as netperf, iperf and other publicly available software.
  13. As seen earlier, a lot of cycles for block access are actually spent in the OS on process wakeup and scheduling as well as network stack processing. The LMS (block server) processes are a crucial component. They should always be scheduled immediately when they need to run. On a very busy system with many concurrent processes, the system load may have an impact on how predictably LMS can be scheduled. The default number of LMS processes is based on the number of available CPUs, and the goal is to minimize their number while keeping individual LMS processes busy. Fewer LMS processes have the additional advantage of allowing better message aggregation and therefore more CPU-efficient processing. The default is computed as MIN(MAX(1/4 * cpu_count, 2), 10), i.e. a quarter of the number of CPUs but no more than 10 and no fewer than 2; on a single-CPU system it is 1. So even with fewer than 8 CPUs (or cores) per node, you still get a minimum of 2 LMS processes. You can use the gcs_server_processes parameter to change the number of LMS processes. In 10gR2, significant waits on events like 'gc cr block congested' or 'gc current block congested' likely mean that LMS processes were starved for CPU. Depending on the size of the buffer cache, multiple LMS processes can speed up instance reconfiguration, recovery and startup; this should be borne in mind when configuring machines with large SGAs. With a large buffer cache, more than one LMS is desirable, especially for fast failover. On most platforms, the block server processes run at high priority by default in order to minimize delays due to scheduling. The priority for LMS is set at startup time.
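The default LMS count formula from this note can be sketched as a small function (an editor's illustration of the formula as stated; the single-CPU case follows the note):

```python
def default_lms_count(cpu_count):
    """Default number of LMS processes per the note:
    MIN(MAX(cpu_count // 4, 2), 10), with a single-CPU special case."""
    if cpu_count <= 1:
        return 1  # single-CPU system, per the note
    return min(max(cpu_count // 4, 2), 10)

for cpus in (1, 4, 8, 16, 64):
    print(cpus, default_lms_count(cpus))
```

This shows why small nodes still get 2 LMS processes and why very large nodes are capped at 10 unless gcs_server_processes is set explicitly.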
  14. In the following slides, we present the most common issues you are likely to encounter with RAC and the global cache, along with their symptoms, possible solutions, and a guideline on how to diagnose them. A highly visible issue in 10g is the loss of messages due to network errors or congestion, usually visible as “lost blocks”. The disk subsystem may impact performance in RAC significantly: loads such as queries scanning large amounts of data, backups, and other concurrent work may affect the same disks or disk groups and cause bottlenecks. When these extra loads run on a particular node, other nodes may be affected even though they show no particular symptoms except higher average log write and disk read times. A high CPU utilization or context switching load can affect the performance of the global cache by adding run queue wait time to the access latencies. It is important to ensure that the LMS processes can run predictably and that interconnect messages and clusterware heartbeats can be processed predictably. Avoiding negative feedback when the servers slow down under load and existing connections are busy is an important best practice: unconstrained dumping of new connections onto the database instance can aggravate a performance issue and render a system unstable. Application contention, such as frequent access to the same blocks, can cause serialization on latches, in the buffer cache of an instance, and in the global cache. If the serialization is on globally accessed data, the response time impact can be significant. When these symptoms become dominant, regular application and schema tuning will take care of most of these bottlenecks. Unexpectedly high latencies for data access should be rare, but can occur in some cases of network configuration problems, high system load, process spins or other extreme events.
  15. The so-called “lost block” issue - you will actually see a wait event indicating that time is spent waiting for blocks which are “lost” - is almost always a network configuration or congestion issue. It means that a user process has made a request for data via the interconnect, the block was sent by an LMS from a remote node, and it has not arrived after a certain period of time (usually about 5 secs in 10g). The block is then considered “lost”, probably due to a flow control or congestion issue (buffer overflows in switches or NICs). If lost blocks or packets occur frequently, the impact in 10g is usually severe; this accounts for a large part of the performance-related escalations in RAC. Assuming that the interconnect is a private network, the most frequent symptoms which can be detected on servers using netstat or ifconfig commands are buffer overflows, packet reassembly failures, errors on the NIC, etc., and can be fixed by increasing receive and send buffer sizes or manipulating flow control settings. In cases where switches are involved, monitoring the ports which connect the nodes to the switch fabric is required. Sometimes a network sniffer (such as Ethereal) can be of great diagnostic value. The use of jumbo frames reduces the probability of lost blocks, as the Oracle data blocks are then not likely to be fragmented into small MTUs (e.g. an 8K block is sent in 5-6 frames over Ethernet).
  16. A “lost block” issue by example: receive errors on eth0 are detected with ifconfig. The ifconfig command should not show any positive values for errors, dropped or overruns. Overruns indicate that NIC internal buffers should be increased, while dropped may indicate that the driver and OS layers cannot drain the queued messages fast enough. Here the problem is in the lower portions of the network stack.
  17. Another lost block issue by example, a bit higher up in the network stack, namely at the IP layer. Oracle blocks are fragmented by the sender's IP stack and reassembled by the receiver's. An 8K Oracle block may require 5-6 packets of MTU size 1500 bytes. The OS buffers the arriving packets until the last fragment is received. If a fragment does not arrive within a certain time period, all fragments of the UDP packet which constitutes the Oracle block are discarded.
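The fragment count quoted in this note follows from the MTU arithmetic. A small sketch (an editor's illustration; the 28-byte IPv4+UDP header overhead is a simplifying assumption):

```python
import math

def fragments_per_block(block_size, mtu=1500, header_overhead=28):
    """Approximate number of IP packets for one Oracle block over UDP.
    header_overhead=28 assumes IPv4 (20 bytes) + UDP (8 bytes) headers;
    this is a simplified model, not an exact fragmentation trace."""
    payload = mtu - header_overhead  # usable bytes per packet
    return math.ceil(block_size / payload)

print(fragments_per_block(8192))  # an 8K block needs 6 packets at MTU 1500
```

With jumbo frames (e.g. MTU 9000) the same 8K block fits in a single packet, which is why they reduce the exposure to lost fragments.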
  18. Even higher up in the stack, at the application level, the lost block scenarios presented in the previous slides present themselves as time waited for a block that does not arrive. The request is cancelled and retried, as you can see in the top 5 wait events of this report. For obvious reasons, these two events often occur together and should never be prominent, i.e. in the top 5 list of wait events. Note that the other findings in this list do not look good either, but the network issue needs to be fixed first: it indicates that the infrastructure cannot achieve good performance and scalability, and the problem cannot be solved by any other means of tuning.
  19. In Oracle 11g, the impact of the lost block issue is mitigated by a lower detection time. The algorithm is robust and avoids false positives without causing any overhead. Although the impact of lost blocks is reduced, the issue is still of concern and should not be underestimated merely because the time spent waiting for data that does not arrive may not show up in the top 5 wait event list. Note that the “cr request retry” event can be a logical consequence of losing blocks. Even in 11g, where the impact of the failure is reduced, these events should never become significant.
  20. In 11g, probes of various sizes are sent infrequently from the IPC layer below the global cache to all instances. This results in a running “bottom line” for all messaging operations. This is a sample of a new section added to the AWR report which shows the interconnect statistics. The data summaries are stored in the AWR repository and are used by the automated diagnostics framework to provide advisories. The underlying V$ views can also be queried directly. The actual report has more statistics, such as throughput and send/receive errors and dropped packets. It also groups data by the clients which call into the IPC layer, e.g. the global cache, global enqueue management or the parallel execution layer. This data is also useful in detecting errors and dropped packets, obviating the need to use netstat or ifconfig.
  21. The solution to the lost block issue is almost always the same: network errors or congestion cause data requested by Oracle to be dropped. The problem can always be fixed by tuning buffer sizes, setting flow control and NIC hardware parameters correctly, replacing NICs or updating firmware.
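On Linux, for instance, the kernel socket buffer limits mentioned above are raised via sysctl. The values below are illustrative only, taken from the socket buffer settings commonly cited in Oracle’s Linux installation prerequisites; always follow the documentation for your platform and release:

```conf
# /etc/sysctl.conf -- example socket buffer settings (illustrative values)
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
```

After editing, `sysctl -p` applies the settings; undersized receive buffers ( rmem ) are a typical cause of the drops and overruns discussed in the previous slides.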
  22. Moving on to the next big group of issues, disk IO. Any IO capacity problem or bottleneck may impact RAC. First off, the storage is global to the cluster, and a badly behaving node or a badly balanced disk configuration can affect the disk read and write performance of all nodes. Some operations in the global cache may involve log flushes, in cases where a frequently modified block is also frequently read on all nodes in the cluster, i.e. read across the interconnect. If the changes ( i.e. redo ) for such a block have not been written to the logs when a read request from another node arrives, the global cache asks LGWR to synchronously write the redo before sending the block. For these blocks, the log file write or sync latency determines the access time for the other node. If the IO takes long, the users on other nodes wait longer for the data, and the increased access time may result in serialization. In a scenario where a “bad” query saturates disks which are also used for log files, the impact of the bad query on log file sync performance can be considerable. In 11g, ADDM and AWR present a global picture as well as instance specific drill downs, i.e. cluster-wide IO issues can be identified with more ease.
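The dependency described above can be sketched as a simple model, with purely illustrative numbers: when unwritten redo exists for a requested block, the remote node’s access time includes the serving instance’s log flush.

```python
def remote_access_time_ms(transfer_ms, log_flush_ms, needs_flush):
    """Simplified model of the time another node waits for a current block:
    if redo for the block is still unwritten, the serving instance must
    flush it ( a synchronous LGWR write ) before shipping the block."""
    return transfer_ms + (log_flush_ms if needs_flush else 0.0)

# Healthy log writes: the flush adds little to a sub-millisecond transfer.
print(remote_access_time_ms(0.5, 2.0, needs_flush=True))   # 2.5 ms
# LGWR competing with a disk-saturating query: access time balloons.
print(remote_access_time_ms(0.5, 40.0, needs_flush=True))  # 40.5 ms
```

The model makes the causal chain explicit: a slow log device on one node inflates global cache access times, and therefore user waits, on every other node.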
  23. Here is a cluster-wide IO issue by example; it is a real case. High IO volume on node 2, caused by a query with a plan that should not have been run there, impacts the log file sync on node 1. Note that the wait events for the global cache are marked as busy. If a wait event is marked as busy, it means that the block could not be sent immediately, and for data blocks it is highly likely that a log flush took long. If those blocks are frequently accessed by users from all nodes, serialization may become a secondary symptom, as indicated by the gc buffer busy wait event here.
  24. After the query issue was fixed, the system returned to normal, expected behaviour. Note that the log file sync time has come down considerably, and that the events marked as “busy” have disappeared. All blocks are sent immediately, i.e. no log flushes are required. This list of events also presents the goal of any tuning for the global cache: to only see events marked as 2-way or 3-way in the top 5 or with significant impact on the call time.
  25. In the post mortem for the problem in the previous slides, the top 5 wait event list from node 2 at the bad time shows clear signs of frequently executed table scans; scattered reads and multi block reads are the indicators. In this scenario, it is most important to realize that the IO issue needs to be identified and fixed first, before looking at any other symptoms.
  26. Best practice checkpoint: tuning IO layout and queries is most important in RAC, as in non-RAC systems. If there are clear signs of a disk performance problem, e.g. long lasting log syncs and read bottlenecks, identify the cause and remove it, i.e. add more disks, stripe them differently, or simply fix the queries. As you have seen in the previous examples, the secondary symptoms appeared to indicate an issue with the global cache, but the wait events in the top 5 list are not independent: they are correlated and causally connected to the IO problem. In this case, ADDM would have ranked the impact and significance of the problem, identified the query, and provided recommendations.
  27. A highly utilized server in the cluster, in terms of high CPU utilization or context switches, can affect the efficiency with which the block server processes respond to messages and process requests. If an LMS cannot be scheduled in order to process the messages which have arrived in its request queue, the time in the run queue adds to the data access time for users on other nodes. The hint “congested” indicates that it may have taken long to access a block because the block server process was too busy or did not get the CPU in order to serve the data request. This case should be relatively infrequent, as the LMS processes run at a higher priority than any other database processes, but it can still occur when processes external to the database run at an even higher priority and unfairly consume large shares of the CPU power. Of course, it is also possible that the high priority for the LMS processes could not be set at startup. Checking the priority of the LMS processes and eliminating external processes which may cause starvation are the most important actions to be taken here. Starting more block server processes is possible and recommended if the individual LMSs are already very busy ( 90-100% of a CPU ).
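The priority check mentioned above can be automated. The sketch below assumes Linux output shaped like `ps -eo pid,cls,rtprio,comm`, where the `cls` column shows the scheduling class ( TS for normal timeshare, RR/FF for the real-time classes ); the sample lines and process names are hypothetical.

```python
# Hypothetical `ps -eo pid,cls,rtprio,comm` output: lms0 is real-time (RR),
# lmd0 is a normal timeshare (TS) process, which is expected.
sample_ps = (
    " 4211  RR   1 ora_lms0_PROD1\n"
    " 4213  TS   - ora_lmd0_PROD1\n"
)

def lms_not_realtime(ps_output):
    """Return the LMS process lines that are NOT in a real-time
    scheduling class ( RR or FF ) -- these need attention."""
    return [line.strip() for line in ps_output.splitlines()
            if "_lms" in line and " RR " not in line and " FF " not in line]

print(lms_not_realtime(sample_ps))  # [] -- all LMS processes are real-time here
```

An empty result means the elevated priority was set at startup; any returned line is an LMS that competes with ordinary processes for CPU and can produce the “congested” hints described here.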
  28. As a short best practice checkpoint: when events marked as congested find their way into the top 5 events, CPU and process tuning is the correct course of action. The goal for the global cache is to minimize the wait time for global cache events and to only see waits marked as 2-way or 3-way in the top 5.
  29. In RAC as well as in non-RAC, contention and serialization in the application or schema design affect performance and scalability. Any frequently accessed data may have hotspots which are sensitive to how many users access the same data concurrently. Any slight increase in the access time to that data can cause queueing and serialization, which in a RAC cluster can magnify a bottleneck. To identify contention and serialization, a hint is added to the event which characterizes the time spent waiting. This hint is useful to identify the tables and indexes for which contention is high. SQL and schema tuning aimed at removing those hot spots is the correct action in such cases. However, in this example it is also very likely that the high average latency for immediate block transfers aggravates the contention and should be looked at first.
  30. The best practice checkpoint for the category of performance issues associated with contention is to find the hot spots and tune them, as one would in a non-RAC system. Note that if you are running a single instance, you will still see this problem, as buffer busy waits or latch contention; such contention is often an indicator that the bottleneck will cause a performance problem when moving from a non-RAC to a RAC system.
  31. For the previous example, which dealt with contention, it became clear that it is not always the events marked with busy hints that are the most important. In this case, the unexpectedly high access times for immediate sends are problematic and also have the highest impact on the response times. Remember we said that a transfer should take less than a millisecond; here we see high latency for the transfer of blocks. This is not really a RAC problem: RAC is the victim of either a network problem or high system load. The tuning goal here should be to minimize the latencies for the 2-way and 3-way accesses. The rule of thumb is that they should be around 1 ms on average, and that double-digit average access times, or access times that are slower than the average disk read IO, are suspect. Those must be tackled first, before moving on to removing the hot spots.
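The rule of thumb above can be turned into a quick sanity check. The thresholds below are simply the ones stated in this slide ( around 1 ms expected, double digits or slower-than-disk suspect ), not an official Oracle metric:

```python
def classify_gc_latency(avg_gc_ms, avg_disk_read_ms):
    """Classify an average 2-way/3-way block transfer latency:
    ~1 ms is expected; double-digit values, or anything slower than the
    average disk read, indicate a network or load problem to fix first."""
    if avg_gc_ms >= 10 or avg_gc_ms > avg_disk_read_ms:
        return "suspect"
    if avg_gc_ms <= 1:
        return "expected"
    return "borderline"

print(classify_gc_latency(0.8, 5.0))   # expected
print(classify_gc_latency(12.0, 5.0))  # suspect
```

Feeding the averages from an AWR report through such a check separates genuine RAC contention from cases where RAC is merely the victim of the network or the system load.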
  32. The best practice checkpoint for this scenario is to run some network diagnostics, ensure that the interconnect is a private network, and verify that the link is operating at the expected bit rate. Frequent retries due to network errors can also cause similar symptoms, since they may cause the user processes to spin for a short while. Of course, bugs are not excluded.
  33. Last but not least for this section, it is important to keep in mind what is “good” and “bad”, or in other words, which events and performance levels are expected. As a final health check and summary for this section, here is a list of “bad” symptoms that one should be aware of and tackle if they show up in the top 5 list of events for which time is spent waiting. The list is ordered by importance for performance, i.e. when these symptoms are removed, the performance and scalability of a RAC cluster will be acceptable. Network issues will always be a problem, and contention and serialization should be taken seriously and can almost always be solved by application and schema tuning. Load and system tuning will solve a large class of problems. In summary, the cluster should be tuned so that load, contention, network errors and unexpectedly high latencies do not show up in the top 5 list. The diagnostics framework built into the Oracle kernel will provide useful recommendations and guidance to facilitate this process.
  34. RAC Best Practices have accumulated a wealth of knowledge learned from real life environments. Following those practices most of the time eliminates any tuning effort. In general, good application and database design and SQL tuning will resolve a large majority of performance and scalability issues in RAC. The practices are therefore not fundamentally different from performance tuning in a non-RAC system. Existing bottlenecks are very likely to become worse when the application is migrated to RAC.
  35. Over the past few years, the performance and scalability issues have crystallized around a few fundamental themes. At the top of the list, hot spots and serialized access to data constitute the most serious scalability issues in RAC. A right-growing index, due to the use of sequence numbers for keys, can cause a severe response time increase when the index is modified from all instances in the cluster. So can a heavily accessed data block in a table, or the improper configuration of segments or tablespaces at create time. Full table scans in RAC may entail higher CPU consumption, but they may not be a good thing anyway, regardless of RAC. It is always a good idea to look at execution plans and consider parallel execution and direct reads if large scans are made. In 11g, read-mostly parts of an application will benefit from new optimizations, as will full table scans for buffered reads. Concurrent DDL, such as dropping or truncating tables, involves cross-instance cache invalidations; these are heavy-handed operations which may serialize. Creating or dropping partitions on the fly and using them immediately in online processes can cause additional invalidations of library cache objects and hard parses. As with concurrent DDL, the implication is that invalidations and parsing need to occur on all instances and must be globally synchronized.
  36. A lot of best practices have been accumulated and published over the years and are available in the form of tech notes and white papers. For 10g and 11g, these best practices have been incorporated into the AWR and ADDM advisories, which are accessible via reports or displayed directly in EM.
  37. In the previous section, we already outlined the basic flow of performance drill downs using AWR data. One starts top down with a look at where most of the time in the database is spent, then identifies the high impact events which are related to contention and system load, checks the measured latencies against the expected ones, and then moves on to the SQL and segments for which the impact is highest. In 10g and 11g, ADDM condenses this multi step approach into a single run; in fact, ADDM is run automatically for every statistics snapshot, and the findings can be queried. ADDM should always be consulted before considering the more detailed and less interpretive AWR statistics.
  38. The previous example illustrates the diagnostics flow: an IO response time issue with high impact on response times is found in the top 5 wait events. Looking at the events, it can be assumed that the majority of time is spent waiting for full table scans and that this causes contention on the buffer cache.
  39. In a drill down to identify the SQL, the section of the AWR report which reports the work done by individual queries is analyzed for queries with a high number of physical reads. In this case, a query which reads the table ES_SHELL is identified. For the same query, there is a disk IO and an interconnect finding. The report also gives the hash id of the SQL so that the execution plan can be inspected.
  40. The segment statistics allow us to conclude that the table accessed in the query is not only the one with most of the physical reads, but also one on which global contention is experienced. After this drill down, the IO issue is largely explained and can be tackled immediately.
  41. EM, based on ADDM and its automatically generated findings and recommendations, combines these steps and puts out database and cluster-wide impact rankings and recommendations. Note the affected instances in this global view of performance in the cluster. The findings also indicate that there are recommendations for the cluster interconnect and for SQL affected by interconnect latencies which can be resolved by SQL tuning.
  42. The entire performance diagnostics framework in 10g and 11g makes performance analysis and troubleshooting more efficient, virtually at the push of a button. The ranking of impact and the diagnostics flow which we gave by example in the previous section are incorporated into this framework. The recommendation is therefore to use it, as it saves time and reduces the effort of identifying issues in a cluster. For trending and post mortem diagnostics, it is a good practice to export the AWR repository regularly and archive it. The retention time for the snapshots is about 1 week by default, so that a weekly export and archiving of the export file, or import into a statistics warehouse, are infrequent and low impact operations.
  43. In 11g, ADDM is also global. The analyst obtains global and local findings ( for specific instances and for the entire cluster/database ) in the same report. Its findings include the impact of particular instances on one another, i.e. remote dependencies.
  44. A full spec of what ADDM does for RAC covers what we discussed in the previous sections. Contention and congestion are found and diagnosed, as well as network problems. It is a productivity infrastructure for performance diagnostics which we recommend exploiting in any RAC system, for the benefit of the users and the analysts.