
Understanding Performance with DTrace


  1. Understanding Performance with DTrace (While the customer yells at you), by Adam Leventhal (@ahl)
  2. DTrace = [image] + [image]
  3. Background • At the time, the biggest deal in our history • Performance problems almost immediately • Experimentation / CEO mea culpa / frustration • I visit the customer…
  4. “ZFS is a piece of shit; you could not have made a worse choice.” – Valued Customer
  5. The Plan: 1. Use DTrace 2. Figure out all their problems 3. Fix all their problems 4. Be annoyingly magnanimous with the customer
  6. DTrace • To diagnose problems you need data • To collect data you must modify the system • DTrace – Dynamic instrumentation – Configurable data collection – Safe, efficient, concise – Sun Solaris in 2003; open source in 2004 – Mac OS X, FreeBSD, Oracle Linux (and others)
  7. DTrace Summary • Applications (most languages) and OS • Broad coverage / stable known probes • Trace between processes / languages • Trace kernel interactions • Powerful data aggregation • Easy one-liners / scripts for tough stuff
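      As a taste of the one-liners the summary mentions, here is a generic example (not from the deck)
      that counts system calls by process name until Ctrl-C:

          # dtrace -n 'syscall:::entry { @[execname] = count(); }'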
  8. Customer System • Basically an NFS server (illumos / OpenZFS) – SAN backend – Ethernet-connected AIX client – Typically moving about 300 MB/s – 4 sockets, 6 cores/socket • Symptoms – Terrible latency reported from Oracle (AWR) – Sad / angry users
  9. MEASURING NFS LATENCY
  10. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
              /* ... */
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
              /* ... */
      }
  11. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
              self->ts = timestamp;
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
              @ = quantize(timestamp - self->ts);
              self->ts = 0;
      }
  12. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
              self->ts = timestamp;
      }

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
              @[probename == "op-write-done" ? "write" : "read"] =
                  quantize(timestamp - self->ts);
              self->ts = 0;
      }
  13. nfsv3:::op-write-start
      {
              self->sync = args[2]->stable != 0;
      }

      sdt:::arc-miss,
      sdt:::blocked-read
      {
              self->io = 1;
      }
  14. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
              self->ts = timestamp;
      }

      /* the slide-13 probes set self->io and self->sync */

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
              @[probename == "op-write-done" ?
                  (self->sync ? "sync write" : "async write") :
                  (self->io ? "uncached read" : "cached read")] =
                  quantize(timestamp - self->ts);
              self->ts = 0;
              self->io = 0;
              self->sync = 0;
      }
  15. nfsv3:::op-read-start,
      nfsv3:::op-write-start
      {
              self->ts = timestamp;
      }

      /* the slide-13 probes set self->io and self->sync */

      nfsv3:::op-read-done,
      nfsv3:::op-write-done
      {
              @[probename == "op-write-done" ?
                  (self->sync ? "sync write" : "async write") :
                  (self->io ? "uncached read" : "cached read"),
                  "microseconds"] =
                  quantize((timestamp - self->ts) / 1000);   /* ns -> µs, to match the label */
              self->ts = 0;
              self->io = 0;
              self->sync = 0;
      }
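      Saved to a file and run with dtrace(1M), the combined script from slides 13-15 prints one
      quantize histogram per category when it exits; the histograms on the following slides are that
      output. (The file name below is only illustrative.)

          # dtrace -qs nfs-latency.d
          (Ctrl-C to stop; the aggregations print on exit)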
  16. cached read
      microseconds
            value  ------------- Distribution ------------- count
                4 |                                         0
                8 |@@                                       7
               16 |@@@@@@@@@@@                              43
               32 |@@@@@@@@@@@@@@@@@@@                      79
               64 |@@@@@@                                   23
              128 |@@                                       8
              256 |                                         2
              512 |                                         0
             1024 |                                         1
             2048 |                                         0
  17. uncached read
      microseconds
            value  ------------- Distribution ------------- count
              128 |                                         0
              256 |@@@@@@@@                                 1612
              512 |@@                                       508
             1024 |@                                        200
             2048 |@                                        192
             4096 |@@@@@                                    1021
             8192 |@@@@@@@@@@@@@@@@@                        3411
            16384 |@@@@@@                                   1191
            32768 |@                                        128
            65536 |                                         2
           131072 |                                         1
           262144 |                                         0
  18. async write
      microseconds
            value  ------------- Distribution ------------- count
               16 |                                         0
               32 |@@@@@@@@@@                               442
               64 |@@@@@@@@@@@@@@@@@                        767
              128 |@@@@@                                    235
              256 |@@                                       109
              512 |@                                        59
             1024 |                                         16
             2048 |                                         15
             4096 |@                                        29
             8192 |@                                        28
            16384 |                                         12
            32768 |                                         0
            65536 |                                         0
           131072 |                                         0
           262144 |                                         0
           524288 |                                         11
          1048576 |@                                        51
          2097152 |@                                        41
          4194304 |                                         0
  19. sync write
      microseconds
            value  ------------- Distribution ------------- count
                8 |                                         0
               16 |                                         149
               32 |@@@@@@@@@@@@@@@@@@@@@                    8682
               64 |@@@@@                                    2226
              128 |@@@@                                     1743
              256 |@@                                       658
              512 |                                         95
             1024 |                                         20
             2048 |                                         19
             4096 |                                         122
             8192 |@@                                       744
            16384 |@@                                       865
            32768 |@@                                       625
            65536 |@                                        316
           131072 |                                         113
           262144 |                                         22
           524288 |                                         70
          1048576 |                                         94
          2097152 |                                         16
          4194304 |                                         0
      Callouts on slide: 13k < 1ms, 3k 1ms-100ms, 200 > ¼ second
  20. sync write time contribution [chart: total time contributed by each latency bucket of the sync-write distribution]
  21. I/O: read
      read  microseconds
            value  ------------- Distribution ------------- count
               16 |                                         0
               32 |                                         14
               64 |                                         33
              128 |@@@@                                     1249
              256 |@@@                                      998
              512 |@                                        268
             1024 |@                                        224
             2048 |@                                        257
             4096 |@@@@@@                                   1837
             8192 |@@@@@@@@@@@@@@@@@@@                      5725
            16384 |@@@@                                     1313
            32768 |                                         77
            65536 |                                         0
           131072 |                                         1
           262144 |                                         0
  22. I/O: write
      write  microseconds
            value  ------------- Distribution ------------- count
               16 |                                         0
               32 |                                         338
               64 |                                         490
              128 |                                         720
              256 |@@@@                                     15079
              512 |@@@@@                                    20342
             1024 |@@@@@@@                                  27807
             2048 |@@@@@@@@                                 28897
             4096 |@@@@@@@@                                 29910
             8192 |@@@@@                                    20605
            16384 |@                                        5081
            32768 |                                         1079
            65536 |                                         69
           131072 |                                         5
           262144 |                                         1
           524288 |                                         0
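      The deck doesn't show the script behind these backend I/O histograms; a plausible sketch (not
      necessarily what was run) uses DTrace's io provider, keying the start time by device and block
      number because io:::start and io:::done fire in different contexts:

          io:::start
          {
                  start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
          }

          io:::done
          /start[args[0]->b_edev, args[0]->b_blkno]/
          {
                  @[args[0]->b_flags & B_READ ? "read" : "write", "microseconds"] =
                      quantize((timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000);
                  start[args[0]->b_edev, args[0]->b_blkno] = 0;
          }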
  23. Basic Performance Goals • Get idle out of the system – Figure out why work isn’t getting done • Get idle into the system – Don’t waste cycles – Be more efficient
  24. Where are we going off-CPU?

      nfsv3:::op-write-start
      {
              self->ts = timestamp;
      }

      sched:::off-cpu
      /self->ts/
      {
              self->off = timestamp;
      }

      sched:::on-cpu
      /self->off/
      {
              @s[stack()] = quantize((timestamp - self->off) / 1000);
              self->off = 0;
      }

      nfsv3:::op-write-done
      /self->ts/
      {
              self->ts = 0;
      }
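      Run like the earlier scripts (the file name here is only illustrative), this prints one
      microsecond histogram per kernel stack on which NFS write requests went off-CPU; the next
      slide shows the dominant one:

          # dtrace -s offcpu.d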
  25.   genunix`cv_wait+0x61
        zfs`txg_wait_open+0x7a
        zfs`dmu_tx_wait+0xb3
        zfs`zfs_write+0x686
        genunix`fop_write+0x6b
        nfssrv`rfs3_write+0x50e
        nfssrv`common_dispatch+0x48b
        nfssrv`rfs_dispatch+0x2d
        rpcmod`svc_getreq+0x19c
        rpcmod`svc_run+0x171
        rpcmod`svc_do_run+0x81
        nfs`nfssys+0x765
        unix`_sys_sysenter_post_swapgs+0x149

            value  ------------- Distribution ------------- count
                4 |                                         0
                8 |                                         1
               16 |                                         1
               32 |                                         2
               64 |                                         1
              128 |                                         1
              256 |                                         0
              512 |                                         0
             1024 |                                         0
             2048 |                                         0
             4096 |                                         0
             8192 |                                         0
            16384 |                                         0
            32768 |                                         0
            65536 |                                         0
           131072 |                                         0
           262144 |                                         2
           524288 |                                         21
          1048576 |@@@@@@@@                                 373
          2097152 |@@@@@@@@@@@@@@                           675
          4194304 |@@@@@@@@@@                               491
          8388608 |@@@@@@@                                  336
         16777216 |                                         0
  26. How ZFS Processes Data • Three batches of data – “Open”: accepting new data – “Quiesced”: intermediate state – “Syncing”: writing data to disk • Originally no size limit • Throttle to avoid overwhelming the backend
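      The blocked stack on slide 25 ends in txg_wait_open(): writers sitting in the write throttle
      described above, waiting for the open transaction group to accept data. A hedged sketch (not
      from the deck) that times that wait directly with the fbt provider:

          fbt::txg_wait_open:entry
          {
                  self->ts = timestamp;
          }

          fbt::txg_wait_open:return
          /self->ts/
          {
                  @["txg_wait_open", "microseconds"] =
                      quantize((timestamp - self->ts) / 1000);
                  self->ts = 0;
          }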
  27. sync write microseconds (the same distribution as slide 19, shown again): 13k ops < 1ms, 3k ops 1ms-100ms, 200 ops > ¼ second
  28. Increasing Queue Depth [chart: write I/O latency distributions for various queue depths – 10, 32, 64, 128]
  29. Profiling
      • Profile provider:
            profile-199 { @[usym(arg1)] = count(); }
      • Pick individual functions to measure:
            ...
            zio_wait                     53us   (0%)
            dmu_objset_is_dirty          66us   (0%)
            spa_sync_config_object       75us   (0%)
            spa_sync_aux_dev             79us   (0%)
            list_is_empty                86us   (0%)
            dsl_scan_sync               124us   (0%)
            ddt_sync                    201us   (0%)
            txg_list_remove             519us   (0%)
            vdev_config_sync           1830us   (0%)
            bpobj_iterate              9939us   (0%)
            vdev_sync                 27907us   (1%)
            bplist_iterate            35301us   (1%)
            vdev_sync_done           346336us  (16%)
            dsl_pool_sync           1652050us  (79%)
            spa_sync                2077646us (100%)
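      The slide shows only the sampling clause and the resulting per-function times. One hedged way
      to produce numbers like the table above is to time individual functions with the fbt provider
      and sum the elapsed time per function; the three probes below are just examples (functions
      taken from the table), not the author's actual script:

          fbt::spa_sync:entry,
          fbt::dsl_pool_sync:entry,
          fbt::vdev_sync_done:entry
          {
                  ts[tid, probefunc] = timestamp;
          }

          fbt::spa_sync:return,
          fbt::dsl_pool_sync:return,
          fbt::vdev_sync_done:return
          /ts[tid, probefunc]/
          {
                  @[probefunc] = sum(timestamp - ts[tid, probefunc]);
                  ts[tid, probefunc] = 0;
          }

          END
          {
                  normalize(@, 1000);             /* report microseconds */
                  printa("%-24s %@uus\n", @);
          }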
  30. Lockstat

      Count   indv cuml rcnt     nsec Lock                Caller
      166416    8%  17% 0.00    88424 0xffffff0d4aaa4818  cv_wait+0x69

            nsec ------ Time Distribution ------ count    Stack
             512 |@                               7775    taskq_thread_wait+0x84
            1024 |@@                             14577    taskq_thread+0x308
            2048 |@@@@@                          31499    thread_start+0x8
            4096 |@@@@@@                         36522
            8192 |@@@                            19818
           16384 |@                              11065
           32768 |@                               7302
           65536 |@                               7932
          131072 |                                5537
          262144 |@                               7992
          524288 |@                               8003
         1048576 |@                               6017
         2097152 |                                2086
         4194304 |                                 198
         8388608 |                                  48
        16777216 |                                  37
        33554432 |                                   7
        67108864 |                                   1
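      The output above comes from lockstat(1M). An illustrative invocation (the flag values are
      guesses, not taken from the deck) that records lock-contention events with caller stacks while
      the workload runs:

          # lockstat -s 8 -D 20 sleep 60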
  31. Happy Ending • Re-wrote the OpenZFS write throttle • Removed tons of inefficiencies in the code • Broke up the scorching-hot lock [chart: latency in seconds, log scale, ZFS vs. OpenZFS]
  32. Lessons Learned • Remedies before diagnosis will cause anger • Look at the real problem, not a reproduction • Don’t let your software hide pain • Right tools, right questions • Iterate, iterate, iterate • Be magnanimous with the customer
  33. BACKUP SLIDES
  34. “Blah blah CDDL blah blah blah… 💩 😠”
