[INSIGHT OUT 2011] A23 Database I/O Performance: Measuring and Planning
2. Alex Gorbachev
• CTO, The Pythian Group
• Blogger
• OakTable Network member
• Oracle ACE Director
• BattleAgainstAnyGuess.com
• President, Oracle RAC SIG
2 © 2009/2010 Pythian
3. Why Companies Trust Pythian
• Recognized Leader:
• Global industry-leader in remote database administration services and consulting for Oracle,
Oracle Applications, MySQL and SQL Server
• Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and
MDS Inc. to help manage their complex IT deployments
• Expertise:
• One of the world’s largest concentrations of dedicated, full-time DBA expertise.
• Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special
projects or emergency response
4. Why Measure I/O Performance?
Diagnostics & troubleshooting
Proof of impact
Capacity planning and monitoring
Platform validation / acceptance testing
5. Instrumentation:
Storage Stack vs Oracle Database
Oracle DB call ➡ Storage I/O call
1. read block ➡ UNKNOWN
2. read block
3. latch free
4. read block
5. enqueue
6. send result
We can profile a DB call, but we cannot profile an individual storage I/O call.
7. Direct Attached Storage Stack
Illustration from Guttina Srinivas's Blog - http://guttinasrinivas.wordpress.com/
8. Simplified Enterprise Storage Stack
Sample IBM Storage Stack - http://www.ibm.com/developerworks/tivoli/library/t-snaptsm1/index.html
10. Storage stack is too complex and heterogeneous to build an end-to-end I/O profile
11. Sources of I/O Performance Measurements
Database as an application consuming I/O services
MUST HAVE
Drill down into the rest of the I/O stack
ASM
Operating System
Storage arrays
Complementary ...
12. How is I/O Measured in the Database?
• I/O code paths (syscalls) are instrumented - I/O Waits
• timed_statistics=true
• Additional statistics are collected
• IO size, amount, time spent
• Granularity on different levels
• Global, session, datafile, service, module/action
• Stored in SGA as cumulative counters - X$ tables
• Externalized via V$ views
• Snapshots taken by various tools like Statspack, AWR, Snapper, etc.
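Because the SGA counters are cumulative, any per-interval figure is the difference between two snapshots. A minimal sketch of that delta logic (the statistic names are real V$SYSSTAT statistics, but the snapshot values here are made up):

```python
# Illustrative sketch: V$ views expose cumulative counters, so an interval
# measurement is always the delta between two snapshots taken by a tool
# such as Statspack, AWR or Snapper.
def interval_stats(snap_begin, snap_end):
    """Subtract two snapshots of cumulative counters (dicts: name -> value)."""
    return {name: snap_end[name] - snap_begin[name] for name in snap_begin}

snap_1 = {"physical reads": 1_000_000, "physical read total bytes": 8_192_000_000}
snap_2 = {"physical reads": 1_000_500, "physical read total bytes": 8_196_096_000}

delta = interval_stats(snap_1, snap_2)
# 500 reads and 4,096,000 bytes in the interval => 8 KB average read size
avg_read_size = delta["physical read total bytes"] / delta["physical reads"]
```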
13. WHAT Do We Measure?
Response Time
Throughput / Bandwidth
Skew & Patterns
I/O measurements are almost always
aggregate!
14. Reproducible issue?
10046 trace
response time
skew & patterns
15. Mr Tools - The Time-Saver
16. Example Profile: 4+ hour batch job
Wait Event / Syscall DURATION CALLS MEAN MIN MAX
----------------------------- ------------------------ ---------- ----------- ----------- -----------
db file sequential read 11861.295517 81.4% 201940 0.058737 0.000000 5.473023
log file switch (checkpoint.. 1941.262523 13.3% 49 39.617603 0.001443 211.405054
PL/SQL lock timer 764.452061 5.2% 765 0.999284 0.000008 1.003142
log buffer space 0.149762 0.0% 8 0.018720 0.006973 0.030125
undo segment extension 0.126689 0.0% 19 0.006668 0.001265 0.033682
6 others 0.201454 0.0% 14 0.014390 0.000004 0.059468
----------------------------- ------------------------ ---------- ----------- ----------- -----------
TOTAL (11) 14567.488006 100.0% 202795 0.071834 0.000000 211.405054
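A profile like the one above is just a group-sort-percentage pass over the raw wait records. A hypothetical sketch of that computation (the sample waits are invented; real tools such as Method R's mrskew parse the 10046 trace file itself):

```python
from collections import defaultdict

def resource_profile(waits):
    """waits: iterable of (event_name, elapsed_seconds) records, e.g. parsed
    from a 10046 trace. Returns (event, duration, pct, calls, mean) rows
    sorted by total duration, like the profile above."""
    totals = defaultdict(lambda: [0.0, 0])
    for event, ela in waits:
        totals[event][0] += ela   # accumulate duration per event
        totals[event][1] += 1     # count calls per event
    grand = sum(dur for dur, _ in totals.values())
    rows = [(event, dur, 100.0 * dur / grand, calls, dur / calls)
            for event, (dur, calls) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# invented sample: two single-block reads and one log buffer wait
profile = resource_profile([("db file sequential read", 0.008),
                            ("db file sequential read", 0.012),
                            ("log buffer space", 0.020)])
```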
17. I/O Response Time Histogram
Matched event names:
db file sequential read
Options:
group = ''
name = 'db file sequential read'
where = '1'
RANGE {min <= e < max} DURATION CALLS MEAN
----------------------- ------------------------ ---------- -----------
0.000000 0.000001 0.000000 0.0% 14 0.000000
0.000001 0.000010 0.000021 0.0% 8 0.000003
0.000010 0.000100 0.008654 0.0% 180 0.000048
0.000100 0.001000 41.040579 0.3% 86617 0.000474
0.001000 0.010000 201.892556 1.7% 36305 0.005561
0.010000 0.100000 1435.417470 12.1% 66754 0.021503
0.100000 1.000000 3730.265905 31.4% 9059 0.411775
1.000000 10.000000 6452.670332 54.4% 3003 2.148741
10.000000 100.000000 0.000000 0.0% 0
100.000000 1000.000000 0.000000 0.0% 0
1000.000000 Infinity 0.000000 0.0% 0
----------------------- ------------------------ ---------- -----------
TOTAL (8) 11861.295517 100.0% 201940 0.058737
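The RANGE buckets above are powers of ten. A rough sketch of how such a log-scale histogram can be built from raw elapsed times (illustrative only; the sample values are invented):

```python
import math

def rt_histogram(elapsed_times):
    """Bucket elapsed times (seconds) into powers-of-ten ranges, mirroring
    the RANGE {min <= e < max} rows above.
    Returns {bucket_min: (duration, calls)}."""
    buckets = {}
    for e in elapsed_times:
        # floor(log10) picks the power-of-ten bucket; zero-length waits go
        # into the sub-microsecond bucket
        exp = math.floor(math.log10(e)) if e > 0 else -7
        dur, calls = buckets.get(exp, (0.0, 0))
        buckets[exp] = (dur + e, calls + 1)
    return {10.0 ** exp: v for exp, v in sorted(buckets.items())}

# invented sample: three sub-millisecond (cached) reads, one multi-second read
hist = rt_histogram([0.0004, 0.0005, 0.0006, 2.1])
# the single slow call dominates total duration despite being 1 call of 4
```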
18. Datafile Skew?
Matched event names:
db file sequential read
Options:
group = '$p1'
name = 'db file sequential read'
where = '1'
File ID DURATION CALLS MEAN MIN MAX
6 2383.052786 20.1% 40086 0.059449 0.000000 4.825304
10 2131.333101 18.0% 21568 0.098819 0.000029 5.366355
12 2065.204816 17.4% 35353 0.058417 0.000000 5.104831
7 1870.332973 15.8% 32955 0.056754 0.000000 4.954959
11 1711.504204 14.4% 39065 0.043812 0.000000 4.819981
9 1659.888036 14.0% 23735 0.069934 0.000000 5.473023
14 36.206148 0.3% 3141 0.011527 0.000063 4.442775
8 3.532841 0.0% 5877 0.000601 0.000073 0.061977
13 0.193044 0.0% 126 0.001532 0.000343 0.104574
1 0.046855 0.0% 32 0.001464 0.000000 0.022407
3 0.000713 0.0% 2 0.000357 0.000311 0.000402
TOTAL (11) 11861.295517 100.0% 201940 0.058737 0.000000 5.473023
19. Analyzing Datafile Chunks
Matched event names:
db file sequential read
Options:
group = '$p1*1000000000+int($p2*8192/1024/1024)'
name = 'db file sequential read'
where = '$ela>0.1'
File Chunk DURATION CALLS MEAN MIN MAX
------------ ------------------------ ---------- ----------- ----------- -----------
10000008570 175.587622 1.7% 120 1.463230 0.134717 4.373926
6000000381 173.669439 1.7% 119 1.459407 0.107691 3.713161
10000008566 157.199899 1.5% 102 1.541175 0.167078 4.366412
10000008565 147.466754 1.4% 98 1.504763 0.128982 4.538604
6000008641 139.614461 1.4% 90 1.551272 0.127778 4.799470
10000008567 120.733972 1.2% 89 1.356561 0.100613 4.564558
9000008223 107.619815 1.1% 73 1.474244 0.118106 5.473023
10000008563 95.949235 0.9% 72 1.332628 0.115185 3.580435
9000008224 90.483791 0.9% 79 1.145364 0.129597 5.468010
6000006191 86.307121 0.8% 78 1.106502 0.102094 3.876378
4329 others 8888.304128 87.3% 11142 0.797730 0.100035 5.366355
------------ ------------------------ ---------- ----------- ----------- -----------
TOTAL (4339) 10182.936237 100.0% 12062 0.844216 0.100035 5.473023
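The group expression `'$p1*1000000000+int($p2*8192/1024/1024)'` packs the file number (p1) and the 1 MB chunk containing block p2 (at an 8 KB block size) into a single key: file number in the high digits, chunk ordinal in the low digits. The same arithmetic in Python, reproducing the top key in the table above:

```python
BLOCK_SIZE = 8192  # datafile block size in bytes, as in the slide

def chunk_key(file_no, block_no, chunk_mb=1):
    """Pack file# (p1) and the chunk containing block# (p2) into one key,
    mirroring mrskew's group='$p1*1000000000+int($p2*8192/1024/1024)'."""
    chunk = block_no * BLOCK_SIZE // (1024 * 1024 * chunk_mb)
    return file_no * 1_000_000_000 + chunk

# block 1,097,000 of file 10 lands in 1 MB chunk 8570 (the table's top row)
key = chunk_key(10, 1_097_000)   # 10000008570
```

With `chunk_mb=16` the same block yields key 10000000535, the top row of the coarser 16 MB-chunk report on the next slide, which is exactly what dividing the expression by a further 16 does.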
20. Playing with Chunks Size
Matched event names:
db file sequential read
Options:
group = '$p1*1000000000+int($p2*8192/1024/1024/16)'
name = 'db file sequential read'
where = '$ela>0.1'
File Chunk DURATION CALLS MEAN MIN MAX
----------- ------------------------ ---------- ----------- ----------- -----------
10000000535 846.934923 8.3% 633 1.337970 0.100168 4.564558
7000000029 315.398085 3.1% 353 0.893479 0.103097 3.670991
6000000023 280.162428 2.8% 330 0.848977 0.100183 3.713161
12000000171 261.555298 2.6% 268 0.975953 0.103535 4.014043
12000000170 193.130501 1.9% 166 1.163437 0.102184 3.937978
9000000513 175.100649 1.7% 124 1.412102 0.118106 5.473023
7000000157 173.111037 1.7% 160 1.081944 0.102949 4.237775
6000000540 140.663440 1.4% 91 1.545752 0.127778 4.799470
6000000386 130.590608 1.3% 172 0.759248 0.100873 3.876378
11000000156 122.062914 1.2% 135 0.904170 0.100622 3.748086
447 others 7544.226354 74.1% 9630 0.783409 0.100035 5.468010
----------- ------------------------ ---------- ----------- ----------- -----------
TOTAL (457) 10182.936237 100.0% 12062 0.844216 0.100035 5.473023
21. Time Periods Analysis
One minute average IO response time, seconds
(chart: per-minute average I/O response time over minutes 1-244, y-axis 0 to 2.0 seconds)
22. 10046 Trace Is Expensive... NOT!
• 10046 tracing overhead is insignificant
• This sample 4+ hour batch - trace <30 MB with 300K+ lines
• 10x compressed - 3 MB
• 30 batches per night - <1GB of traces
• 10x compressed - 100 MB per night
One month of complete 10046 trace
batch history is only 3GB compressed
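The arithmetic behind the claim, as a quick sanity check (numbers from the bullets above; the 30 nights per month is my assumption):

```python
# Sanity check of the trace-volume estimate above (sizes in MB; the 10x
# compression ratio is the slide's own figure)
trace_per_batch = 30
batches_per_night = 30
compression = 10
nights_per_month = 30  # assumption

raw_per_night = trace_per_batch * batches_per_night         # 900 MB, i.e. <1 GB
compressed_per_night = raw_per_night / compression          # 90 MB, ~100 MB
compressed_per_month_gb = compressed_per_night * nights_per_month / 1024  # ~2.6 GB
```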
23. Storing 3GB of data on Amazon S3
costs less than $1 per month
24. What Does 10046 Not Buy You?
• Throughput
• Doable, but needs quite a lot of traces to enable and process
• No accounting for non-database workload
• No visibility into how each I/O call translates into “real” I/Os
• Real I/Os - requests done by the DB server OS?
• Real I/Os - requests done by a SAN controller?
• Real I/Os - requests served by a disk controller?
• Caching impact
25. Measuring Throughput
• Database
• AWR & Statspack
• Host
• OS tools like sar, iostat, DTrace
• Storage array
• Storage vendor tools like EMC Symmetrix Performance Analyzer (SPA)
26. Average values make sense only if both event arrivals and response times are perfectly randomly distributed
27. Don’t Be Trapped by Averages!
• Averaging response times
• Losing skew info
• Losing I/O call attributes
• Sizes, offsets, data blocks
• Losing scope - what transaction is this I/O request for?
• Reduced time granularity
• Traditional Statspack & AWR snaps are hourly
• sar data is captured every 5 (or 10?) minutes by default
• SAN stats are usually aggregated as coarsely as 1 hour (SPA - 5 minutes?)
28. Choosing the Aggregation Interval
• 24-hour running window
• 95% of transactions should complete within 1 second
• 99% of transactions should complete within 10 seconds
• 10 seconds is the timeout, so 1% of transactions can fail and it’s OK
• 24 hours is 86,400 seconds => 1% is 864 seconds (14.4 min)
• 1-hour intervals => a few minutes of hiccups won’t be noticeable
• 5-minute intervals => significant spikes of I/O response time will likely be noticeable
• But we really want intervals within the typical transaction response times
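To see why interval size matters, here is a small made-up calculation: a 10-minute spike to 100 ms average read latency inside an otherwise healthy hour barely registers in the hourly average:

```python
# Made-up numbers: a steady 5 ms average read latency, a 10-minute spike
# to 100 ms, at a constant 1000 reads per second.
baseline_ms, spike_ms = 5.0, 100.0
reads_per_sec = 1000
spike_sec, hour_sec = 600, 3600

total_wait_ms = reads_per_sec * (baseline_ms * (hour_sec - spike_sec)
                                 + spike_ms * spike_sec)
hourly_avg_ms = total_wait_ms / (reads_per_sec * hour_sec)
# hourly_avg_ms is ~20.8 ms: elevated, but nothing like the 100 ms users saw;
# a 5-minute interval falling inside the spike would report the full 100 ms
five_min_avg_during_spike_ms = spike_ms
```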
29. The Random Arrivals concept applies 100% to I/O calls
Detecting violations of the Random Arrivals rule requires an averaging interval close to the response time
30. Monitoring I/O Performance and SLAs
• How do your transaction SLAs translate into I/O SLAs?
• Percentile requirements
• Commit to response time according to percentile requirements at pre-defined throughput and concurrency levels
• *average* 2,000 IOPS with up to 40 concurrent I/Os
• 99% of I/Os - <10 ms, 99.9% of I/Os - <100 ms
• 1-minute sliding window
• Monitoring such SLAs - must average over 1 minute and collect response time histograms
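A toy percentile check for an SLA like the one above (the limits mirror the slide; the window data and the naive index-based percentile are illustrative, not production code):

```python
def sla_ok(window_ms, p99_limit=10.0, p999_limit=100.0):
    """Naive percentile check for one sliding window of I/O response
    times in milliseconds; the limits mirror the SLA on the slide."""
    xs = sorted(window_ms)
    n = len(xs)
    # index-based percentiles: good enough for an illustration
    p99 = xs[min(n - 1, int(n * 0.99))]
    p999 = xs[min(n - 1, int(n * 0.999))]
    return p99 < p99_limit and p999 < p999_limit

# 1000 fast I/Os pass the 99% limit, but five 150 ms stragglers break 99.9%
ok_all_fast = sla_ok([1.0] * 1000)                   # True
ok_with_tail = sla_ok([1.0] * 1000 + [150.0] * 5)    # False: p99.9 is 150 ms
```

This is exactly why averages alone cannot monitor such an SLA: the tail that breaks the 99.9% limit moves the mean by less than a millisecond.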
31. Importance of Response Time Histograms
• Including histograms in the snapshots adds more color to the averaged measures
• A histogram is an indicator of skew
• They help select the right measurement interval
• Histograms can be built on any value - not just response times
• Histogram of I/O throughput per 5-minute interval to analyze whether we have bursts of I/O activity
• Histograms appeared in Statspack reports in 10g
• Histograms appeared in AWR reports in 11g
32. A Tool to Collect Short Interval Averages
• Requirements:
• 1 minute or less intervals
• Collect system level IO waits and stats
• Collect session level IO waits and stats
• Collect IO response time histograms (system and session)
• Nice to have - per service/module/action granularity
• Production collection example (6 years old)
• Oracle 9i RAC, HP-UX 64 cores
• thousands of DB calls per second, thousands of I/O calls per second
• *All* stats and waits with 1-5 minute snaps and at logoff
• Tanel Poder’s Snapper and Sesspack
33. ASH Data for I/O Measurements?
V$ACTIVE_SESSION_HISTORY
&
DBA_HIST_ACTIVE_SESS_HISTORY
• TIME_WAITED => 11.2 documentation is misleading
• DELTA_TIME
• DELTA_READ_IO_REQUESTS/BYTES
• DELTA_WRITE_IO_REQUESTS/BYTES
34. ASH itself is
misleading for I/O performance
measurements
Sampling tends to hide short waits
invalidating it for any response time
analysis
35. AWR Sources
• DBA_HIST_EVENT_HISTOGRAM
• DBA_HIST_FILEMETRIC_HISTORY *
• DBA_HIST_FILESTATXS
• DBA_HIST_IOSTAT_DETAIL/FILETYPE/FUNCTION
• DBA_HIST_SERVICE_STAT
• DBA_HIST_SESSMETRIC_HISTORY *
• DBA_HIST_SQLSTAT
• DBA_HIST_SYSTEM_EVENT
• DBA_HIST_SYSSTAT
• DBA_HIST_SYSMETRIC_HISTORY *
* These views have granularity of 1 minute
36. AWR Example - DBA_HIST_SYSMETRIC_HISTORY
-- Physical Reads Per Sec
-- Physical Writes Per Sec
-- I/O Requests per Second
-- I/O Megabytes per Second
-- Redo Generated Per Sec
-- Average Synchronous Single-Block Read Latency
SELECT begin_time, ROUND(value,1) v
FROM dba_hist_sysmetric_history
WHERE metric_name=
'Average Synchronous Single-Block Read Latency'
ORDER BY 1;
37. V$SESSION_WAIT_HISTORY?
• The last 10 wait events for each active session.
• Column WAIT_TIME_MICRO
• Amount of time waited (in microseconds)
38. Measuring at the OS Layer
• OS is not really transparent for IO requests
• Has IO requests queues
• Utilizes various I/O schedulers that decide on requests priority
• ASYNC I/O
• Filesystems and buffered I/O
• Impact of CPU scheduling
• Time spent in the OS layer becomes important as we move to SSD and Flash storage
• Difficult to directly associate OS stats with DB stats
39. Measuring at the SAN Layer
• Normally most of IO time is spent on physical disk but...
• Read cache impact
• Write cache impact
• Cache saturation situations
• Abnormal situations like controller/switch failure
• Quality of Service (QoS)
• Flash based storage shifts the balance of time again
• Non-disk component of IO response time becomes more prominent
• Difficult to associate SAN stats with OS & DB stats
• Virtualization kicks in
40. Exadata Storage Cell Measurement
• Replacement of SAN layer
• More than just stats per disk / controller, etc.
• Storage Cell now performs more than just I/O functions
• Much better accountability and association with the database
• Database segment visibility in flash cache
• IORM metrics - category, database, consumer groups
• Flash Cache metrics
• Cumulative and 1 minute aggregates
• Some stats are passed back to the database
• V$SYSSTAT, V$SQL, waits, XML cell stats in V$CELL_STATE
41. Increased Importance of Low Latency Network
• With traditional HDD random access times of 5-10ms
➡ Communication overhead is minimal - less than 10%
• FC storage latencies are a few hundred microseconds
• NFS-mounted storage adds less than 1 ms of latency
• IP stack is heavier on CPU => impact of OS CPU scheduler
• Flash read latency is an order of magnitude shorter
➡ Suddenly an InfiniBand SAN becomes a necessity!
• microsecond latencies
42. Exadata: Flash + InfiniBand = Very Low Latency?
• Let’s check some Exadata 10046 traces...
Matched event names:
cell single block physical read
Options:
group = ''
name = 'cell single block physical read'
where = '1'
RANGE {min <= e < max} DURATION CALLS MEAN
0.000000 0.000001 0.000000 0.0% 0
0.000001 0.000010 0.000000 0.0% 0
0.000010 0.000100 0.000000 0.0% 0
0.000100 0.001000 0.191839 95.5% 310 0.000619
0.001000 0.010000 0.008983 4.5% 3 0.002994
0.010000 0.100000 0.000000 0.0% 0
43. Exadata: Flash + InfiniBand = Very Low Latency?
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdn 0.50 0.00 188.50 0.00 1512.00 0.00 16.04 0.10 0.51 0.31 5.85
sdo 1.50 0.00 170.50 0.00 1376.00 0.00 16.14 0.14 0.79 0.38 6.40
sdp 2.50 0.00 157.00 0.00 1276.00 0.00 16.25 0.09 0.57 0.41 6.50
sdq 0.50 0.00 173.50 0.00 1392.00 0.00 16.05 0.11 0.62 0.40 7.00
sdr 0.50 0.00 166.50 0.00 1336.00 0.00 16.05 0.07 0.41 0.30 4.95
sds 1.00 0.00 175.50 0.00 1412.00 0.00 16.09 0.08 0.43 0.32 5.60
44. Measuring for Planning:
Aggregate Interval
1. Choose a large-ish interval
2. Analyze histograms - skewed inside the interval?
3. If Yes, reduce the interval
4. Repeat steps 1-3 until ...
a) you either see no skew, or ...
b) the business stops caring about skew inside that interval
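The loop above can be sketched as follows; the `skewed` predicate stands in for "analyze the histograms inside each interval" and is supplied by the analyst (everything here is hypothetical):

```python
def choose_interval(samples, interval_sec, skewed, min_interval_sec=60):
    """Hypothetical sketch of the procedure above: start with a large-ish
    aggregation interval and shrink it until the within-interval histograms
    no longer show skew. `skewed(samples, interval)` is a caller-supplied
    predicate, e.g. 'does any histogram bucket dominate total duration?'."""
    while interval_sec > min_interval_sec and skewed(samples, interval_sec):
        interval_sec //= 2   # halve instead of the slide's unspecified step
    return interval_sec

# toy predicate: pretend skew is visible at any interval longer than 5 minutes
chosen = choose_interval([], 3600, lambda s, i: i > 300)   # 225 seconds
```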
48. Measuring for Planning:
Distinguish Different Kinds of I/O
• Random vs sequential I/O
• If underlying disks are spinning media
• Small vs Large IOs
• Throughput is then measured either in IOPS or MBPS
• Reads vs Writes
• Sometimes can be generalized as the percentage of writes
49. Measuring for Planning:
Business Function Granularity
• Measure I/O at the right granularity
• Ideally per business transaction / function
• Practical - service, session, module/action, SQL
• “System” I/O - LGWR, ARCH, DBWR, etc.
• Indirect association to business transactions
• Helps build more realistic capacity planning models
51. Oracle Database CALIBRATE_IO
DBMS_RESOURCE_MANAGER.CALIBRATE_IO
(<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
• iops - max reads per second (random single-block reads)
• lat - actual average single-block latency at the iops rate
• mbps - max MB/s throughput (large reads)
Limitations:
• simplistic
• read-only
• outputs max values only
• needs a database
• requires ASYNC I/O
52. ORION - ORacle I/O Numbers
• Free tool from Oracle simulating database-like IOs
• No database required
• Same I/O libs / code-path
• Still requires ASYNC I/O
• Very flexible
• Large vs Small IOs; flexible sizes; mixed
• Random vs Sequential I/O patterns; mixed
• Configurable write I/O %
• Can simulate ASM striping layout
53. ORION Example 1: Scalability Anomaly
HP blades
HP Virtual Connect
Flex10
Big NetApp box
100 disks
54. ORION Example 1: Impact of Large IOs
HP blades
HP Virtual Connect
Flex10
Big NetApp box
100 disks
55. ORION Example 1: Write IO Impact
HP blades
HP Virtual Connect
Flex10
Big NetApp box
100 disks
56. ORION Example 2: Initial Run - Failed Expectations
NetApp NAS, 1 Gbit Ethernet, 42 disks
(charts: IOPS and latency vs load level 1-100; read-only run with IOPS axis up to 5,000 and latency axis up to 30 ms; read-write run with latency axis up to 50 ms)
57. ORION Example 2: Tune-Up Results
Switched from Intel to Broadcom NICs
(charts: IOPS and latency vs load level 1-100; after the tune-up, IOPS axes reach 10,000 and 15,000 with latency axes within 12 ms and 8 ms)
60. Presenting measurements:
Visualization is the Key
61. Q&A
Email me - gorbachev@pythian.com
Read my blog - http://www.pythian.com
Follow me on Twitter - @AlexGorbachev
Join Pythian fan club on Facebook & LinkedIn