Hotsos Advanced Linux Tools

559 views

Slides from HotSos Symposium 2018

Published in: Technology

  1. 1. ADVANCED PERFORMANCE TUNING AND MONITORING WITH LINUX KELLYN POT’VIN-GORMAN TECHNICAL INTELLIGENCE MANAGER, DELPHIX @DBAKEVLAR
  2. 2. KELLYN POT’VIN-GORMAN TECHNICAL INTELLIGENCE MANAGER, DELPHIX • Multi-platform DBA, (Oracle, MSSQL, MySQL, Sybase, PostgreSQL, Informix…) • Oracle ACE Director, (Alumni) and Oak Table Network Member • Idera ACE 2018 • APEX Women in Technology Award, CTA • STEM education with Raspberry Pi and Python, including DevOxx4Kids, Oracle Education Foundation and TechGirls • President, Rocky Mtn Oracle User Group • President, Denver SQL Server User Group • DevOps author, instructor and presenter. • Author, blogger, (http://dbakevlar.com)
  3. 3. LINUX PERFORMANCE IS HUGE
  4. 4. WHAT YOU KNOW • *Top, *STAT, TCPDump, STrace • The Cool Tools • Snapper • DTrace4linux • SLOB, (benchmarking)
  5. 5. MONITORING TOOLS PIDSTAT NETSTAT DSTAT NMON LSOF
  6. 6. PIDSTAT • We all use IOSTAT, VMSTAT, MEMSTAT, etc. • Focusing on the PID and collecting quick information can eliminate the need to gather it elsewhere. • Statistics on a process, either global or by PID
  7. 7. GLOBAL PID STATS AND BY PID
  8. 8. INSPECTING IO FOR A USER WITH PIDSTAT
  9. 9. INSPECTING PAGE FAULTS AND MEMORY
  10. 10. WHAT IS DSTAT?
        • Monitoring tool for CPU, disk and network activity
        • Displays ongoing values at each interval until q(uit)
        • To install:
          $ yum install dstat -y
          or
          $ sudo apt-get install dstat
  11. 11. DSTAT UTILITY
  12. 12. DSTAT UTILITY
        • dstat for one disk, showing only CPU info and disk reads:
          $ dstat -cdl -D xvda1
          $ dstat -d
  13. 13. WHAT IS NMON?
        • Creates an information dump related to CPU, memory, IO or network.
        • Install: $ yum install nmon -y
        • Use the menu
        • q<enter> to quit, or <ctrl> c
  14. 14. NMON CPU • Press c<enter>
  15. 15. NMON MEMORY • m<enter>
  16. 16. NMON NFS • NFS<enter>
  17. 17. KNOWING ABOUT OPEN FILES • Knowing which files are open is important when waiting for a database to close. • Reveals file access by a process that may not otherwise be known. • Identifies which user has which files open.
  18. 18. LSOF AND FUSER • Open files • File access information • Ability to kill only the processes accessing a file for write
  19. 19. LSOF
        • LiSt Open Files
        • In our example, we want to see the files that are open in the Temp directory.
        • Although lsof +D with the directory name should display them, Linux may not provide data as specific as I'd prefer.
        • I want only the child PID 17669, only Temp, and I want it sorted.
        • Using grep, we can work around this:
          $ lsof | grep Temp | grep 17669 | sort
  20. 20. TEMP FILES, FOR CHILD 17669, SORTED
  21. 21. QUERYING OPEN DATAFILES
        $ lsof -c sqlservr | grep mdf
        sqlservr 7 root  84u REG 8,1  8388608 1050924 /var/opt/mssql/data/tempdb.mdf
        sqlservr 7 root 121u REG 8,1  4194304 1050887 /var/opt/mssql/data/master.mdf
        sqlservr 7 root 127u REG 8,1  8388608 1050889 /var/opt/mssql/data/model.mdf
        sqlservr 7 root 129u REG 8,1 13959168 1050891 /var/opt/mssql/data/msdbdata.mdf
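The lsof rows on this slide can also be post-processed in a script. The following is a minimal Python sketch, not part of the original deck: it assumes the default lsof column order (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME) and uses the rows from the slide as sample text. The function name `parse_lsof_rows` is hypothetical.

```python
# Sample text copied from the slide; in practice this would come from
# the output of `lsof -c sqlservr | grep mdf`.
SAMPLE = """\
sqlservr 7 root 84u REG 8,1 8388608 1050924 /var/opt/mssql/data/tempdb.mdf
sqlservr 7 root 121u REG 8,1 4194304 1050887 /var/opt/mssql/data/master.mdf
sqlservr 7 root 127u REG 8,1 8388608 1050889 /var/opt/mssql/data/model.mdf
sqlservr 7 root 129u REG 8,1 13959168 1050891 /var/opt/mssql/data/msdbdata.mdf
"""


def parse_lsof_rows(text):
    """Split lsof rows into dicts (assumes the nine default columns)."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 8)  # NAME is last and may contain spaces
        if len(parts) < 9:
            continue  # skip blank or truncated lines
        command, pid, user, fd, _ftype, _device, size, _node, name = parts
        if not size.isdigit():
            continue  # SIZE/OFF can hold an offset such as 0t0; skip those
        rows.append({"command": command, "pid": int(pid), "user": user,
                     "fd": fd, "size": int(size), "name": name})
    return rows


if __name__ == "__main__":
    for row in parse_lsof_rows(SAMPLE):
        mib = row["size"] / (1024 * 1024)
        print(f"{row['name']}: {mib:.1f} MiB (fd {row['fd']})")
```

Keeping the parsing in one small function makes it easy to swap the sample text for live lsof output later.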
  22. 22. WHAT FILES ARE OPENED BY USER
        • $ lsof -u <username>
        • The jsmith user is having issues working with an application.
        • The query reports back, but the corresponding commit doesn't occur.
        • We believe the login is incorrectly configured.
        • A touch of a file in the directory shows this:
          jsmith@e09627d558b7:/opt/mssql-tools/bin$ touch test.txt
          touch: cannot touch 'test.txt': Permission denied
  23. 23. WHAT DOES LSOF SHOW?
  24. 24. FUSER AND KILLING USERS
        • -k, --kill: kill processes accessing the named file
        • -w, --writeonly: kill only processes with write access
          $ fuser -i -k <port>/tcp
          $ fuser -kw <filename>
  25. 25. FUSER -v -m <FILENAME>
                                         USER   PID   ACCESS COMMAND
        /var/opt/mssql/scripts/push.sh:  root   29314 ....m  bash
                                         root   31503 ....m  bash
                                         jsmith 21963 ....m  bash
        $ fuser -kw push.sh
  26. 26. ADMINISTRATOR PERF TOOLS FTrace Perf-Events eBPF LTTng SystemTap KTap sysdig
  27. 27. ABOUT LINUX TRACE FILES
        • Calls are returned
        • -1 is commonly a sign of an error
        • Calls can be unfinished and then resumed
        • Calls are kept as close to readable as possible. Example: "ls -l /dev/null" is captured as:
          lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
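To make the "-1 is commonly a sign of an error" point concrete, here is a small Python sketch (not from the deck) that splits one strace-style line into call name, arguments and return value. The regular expression and the `parse_syscall` helper are illustrative assumptions; the sketch handles only decimal return values, and the failing `open` line is an invented sample in the usual strace error format.

```python
import re

# name(args) = ret [ERRNO (message)]
SYSCALL_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:\s+(?P<errno>[A-Z0-9]+)\s+\((?P<msg>[^)]*)\))?\s*$"
)


def parse_syscall(line):
    """Return a dict for a finished call, or None if the line does not match
    (for example an '<unfinished ...>' line or a hexadecimal return value)."""
    match = SYSCALL_RE.match(line.strip())
    if match is None:
        return None
    ret = int(match.group("ret"))
    return {
        "name": match.group("name"),
        "args": match.group("args"),
        "ret": ret,
        "failed": ret == -1,          # -1 commonly signals an error
        "errno": match.group("errno"),
    }


if __name__ == "__main__":
    ok = 'lstat("/dev/null", {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0'
    bad = 'open("/nope", O_RDONLY) = -1 ENOENT (No such file or directory)'
    for line in (ok, bad):
        print(parse_syscall(line))
```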
  28. 28. DEEP PERFORMANCE AND TRACING TRACE-CMD Perf-Tools DIG DXToolkit
  29. 29. GOAL • Not to duplicate previous in-depth sessions • Valuable and make you want to learn more • Easy to use • Variety of tools across resources • Hit main pain points
  30. 30. WHY TRACE-CMD AND PERF-TOOLS • Use FTRACE and Linux Perf Events • Replace a large share of separate tracing and performance tools, each with its own overhead and learning curve. • Already enabled and in use on most Linux hosts • Part of the Linux kernel
  31. 31. FTRACE AND THE CASE FOR TRACE-CMD
        • Great Github page. Already installed on most systems; if not, consider leaving this to a full admin vs. a DBA.
        • Requires root privileges and setup of debugging and tracing on the host, along with a mounted /sys volume.
        • If you want to run this in a container, it is possible, but the container must be created with privileged:true upon run.
        • Uses /sys/kernel/debug/tracing
        • Consider simplifying by using the trace-cmd package for DBA use.
        • Kernel config options: CONFIG_FUNCTION_TRACER, CONFIG_FUNCTION_GRAPH_TRACER, CONFIG_STACK_TRACER, CONFIG_DYNAMIC_FTRACE
  32. 32. THE NEXT TWO TOOLS USE • Tracepoints, i.e. kernel static tracing • Kernel dynamic tracing, aka kprobes • User-level tracing, also called uprobes • Produce kernel-level trace output
  33. 33. IN OR OUT In the kernel or outside the kernel, it's pretty simple. • In-tree: ftrace, perf_events, eBPF • Out-of-tree: SystemTap, ktap, LTTng, dtrace4linux, DTrace, sysdig
  34. 34. WHAT ARE TRACEPOINTS?
  35. 35. PROBES
  36. 36. TRACE CMD
        # trace-cmd record -e ext4 ls
        [...]
        # trace-cmd report
        version = 6
        CPU 1 is empty
        cpus=2
        trace-cmd-7374 [000] 1062.484227: ext4_request_inode: dev 253:2 dir 40801 mode 33188
        trace-cmd-7374 [000] 1062.484309: ext4_allocate_inode: dev 253:2 ino 10454 dir 40801 mode 33188
        Output is saved as a trace.dat file.
  37. 37. EXAMPLE OF OUTPUT
        trace-cmd report
        trace-cmd-16129 [002] 158126.498411: function: __mutex_unlock_slowpath <-- mutex_unlock
        trace-cmd-16131 [000] 158126.498411: kmem_cache_alloc: call_site=811223c5 ptr=0xffff88003ecf2b40 bytes_req=272 bytes_alloc=320 gfp_flags=GFP_KERNEL|GFP_ZERO
        trace-cmd-16130 [003] 158126.498411: function: do_splice_to <-- sys_splice
        sleep-16133 [001] 158126.498412: function: inotify_inode_queue_event <-- vfs_write
        trace-cmd-16129 [002] 158126.498420: lock_release: 0xffff88003f1fa4f8 &sb->s_type->i_mutex_key
        trace-cmd-16131 [000] 158126.498421: function: security_file_alloc <-- get_empty_filp
        sleep-16133 [001] 158126.498422: function: __fsnotify_parent <-- vfs_write
        trace-cmd-16130 [003] 158126.498422: function: rw_verify_area <-- do_splice_to
        trace-cmd-16131 [000] 158126.498424: function: cap_file_alloc_security <-- security_file_alloc
        trace-cmd-16129 [002] 158126.498425: function: syscall_trace_leave <-- int_check_syscall_exit_work
        sleep-16133 [001] 158126.498426: function: inotify_dentry_parent_queue_event <-- vfs_write
        [003] 158126.498426: function: security_file_permission <-- rw_verify_area
        trace-cmd-16129 [002] 158126.498428: function: audit_syscall_exit <-- syscall_trace_leave
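A report like the one on this slide gets long quickly, so a first step is often to count events per type and per task. The Python sketch below is an illustration rather than anything from the deck: it assumes each line follows the `task-pid [cpu] timestamp: event: details` layout seen above, and the helper name `count_events` is hypothetical. Lines that do not fit the layout (such as an entry without a task name) are skipped.

```python
import re
from collections import Counter

# Four lines copied from the slide as sample input.
REPORT = """\
trace-cmd-16129 [002] 158126.498411: function: __mutex_unlock_slowpath <-- mutex_unlock
trace-cmd-16131 [000] 158126.498411: kmem_cache_alloc: call_site=811223c5 ptr=0xffff88003ecf2b40
sleep-16133 [001] 158126.498412: function: inotify_inode_queue_event <-- vfs_write
trace-cmd-16129 [002] 158126.498420: lock_release: 0xffff88003f1fa4f8 &sb->s_type->i_mutex_key
"""

LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)


def count_events(report):
    """Return (events per type, events per task) for matching lines."""
    by_event, by_task = Counter(), Counter()
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header lines or entries without a task name
        by_event[match.group("event")] += 1
        by_task[match.group("task")] += 1
    return by_event, by_task


if __name__ == "__main__":
    events, tasks = count_events(REPORT)
    print("by event:", dict(events))
    print("by task:", dict(tasks))
```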
  38. 38. READING A FILTERED FTRACE REPORT • CPU=0 • Success/Error/Warning/Info • Success = 1
  39. 39. NETWORK TRACING WITH TRACE-CMD
        • Network trace from the FRODO host to the GANDALF host
        • The file is created on the target, GANDALF
        • Can view exactly what transpires during the network connection
          # trace-cmd record -N gandalf:12345 -e sched_switch -e sched_wakeup -e irq hackbench 50
  40. 40. LINUX OBSERVATORY PERF TOOLS
        • Function collectors: Funccount, Functrace, Funcslower, Funcgraph, kprobe
        • IO: Iosnoop, Iolatency, Bitesize
        • Snoop: Opensnoop, Execsnoop, syscount
        https://github.com/brendangregg/perf-tools
  41. 41. PERF-TOOLS • Sources the Linux kernel • Easily added via the linux-tools-common package • Dynamic buffering means very little overhead to collect performance data • Capable of profiling, CPU performance, user-level stack collection • Capable of consuming debug info for line tracing and local variables • No kernel programming at this time and very safe to install
  42. 42. IOSNOOP • By Device • By ioType • By Name • Includes • Start/end time • Usage • Duration • Block info • Queueing time • CPU Time
  43. 43. IOSNOOP - WORKLOAD DIFFERENCES
        SSD
        COMM     PID   TYPE DEV    BLOCK    BYTES LATms
        randread 20125 R    202,16 24803976 8192  0.15
        randread 20125 R    202,32 17527272 8192  0.15
        randread 20125 R    202,16 13382360 8192  0.15
        randread 20125 R    202,32 29727160 8192  0.19
        randread 20125 R    202,32 26965272 8192  0.18
        randread 20125 R    202,32 27222376 8192  0.17
        Standard Disk
        COMM     PID   TYPE DEV    BLOCK    BYTES LATms
        randread 6199  R    202,16 71136208 8192  5.32
        randread 6199  R    202,16 83134400 8192  9.26
        randread 6199  R    202,16 88477928 8192  3.46
        randread 6199  R    202,16 66953696 8192  10.69
        randread 6199  R    202,16 87832704 8192  3.68
        randread 6199  R    202,16 74963120 8192  4.62
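The difference between the two workloads is easier to see once the LATms column is summarized. The Python sketch below, which is an illustration and not part of the deck, assumes the seven-column layout shown on the slide (COMM, PID, TYPE, DEV, BLOCK, BYTES, LATms) and uses the slide's rows as sample data; `latencies_ms` and `summarize` are hypothetical helper names.

```python
from statistics import mean

SSD = """\
COMM PID TYPE DEV BLOCK BYTES LATms
randread 20125 R 202,16 24803976 8192 0.15
randread 20125 R 202,32 17527272 8192 0.15
randread 20125 R 202,16 13382360 8192 0.15
randread 20125 R 202,32 29727160 8192 0.19
randread 20125 R 202,32 26965272 8192 0.18
randread 20125 R 202,32 27222376 8192 0.17
"""

DISK = """\
COMM PID TYPE DEV BLOCK BYTES LATms
randread 6199 R 202,16 71136208 8192 5.32
randread 6199 R 202,16 83134400 8192 9.26
randread 6199 R 202,16 88477928 8192 3.46
randread 6199 R 202,16 66953696 8192 10.69
randread 6199 R 202,16 87832704 8192 3.68
randread 6199 R 202,16 74963120 8192 4.62
"""


def latencies_ms(text):
    """Extract the LATms column, skipping the header and malformed lines."""
    values = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7 or parts[0] == "COMM":
            continue
        values.append(float(parts[6]))
    return values


def summarize(text):
    """Return sample count, mean and maximum latency in milliseconds."""
    lat = latencies_ms(text)
    return {"count": len(lat), "mean_ms": mean(lat), "max_ms": max(lat)}


if __name__ == "__main__":
    for label, sample in (("SSD", SSD), ("Standard disk", DISK)):
        s = summarize(sample)
        print(f"{label}: n={s['count']} mean={s['mean_ms']:.2f} ms max={s['max_ms']:.2f} ms")
```

Reporting the maximum alongside the mean keeps outliers visible, which matters for the standard disk where single reads vary widely.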
  44. 44. OUTLIERS IN IO DATA
  45. 45. PRINTK • Function vs. Utility • Executed printk() • C programming language against the Linux Kernel • Prints the string to kernel log, allowing you to post messages directly.
  46. 46. RECENT CHANGES
        Recent changes in /etc/sysctl.conf. These numbers may not seem very important between the versions:
          kernel.printk = 4 4 1 7 -> kernel.printk = 3 4 1 3
        They actually correspond to:
        • console_loglevel
        • default_message_loglevel
        • minimum_console_loglevel
        • default_console_loglevel
  47. 47. PRINTK MESSAGE - OLDER
        KERN_EMERG   /* system is unusable */
        KERN_ALERT   /* action must be taken immediately */
        KERN_CRIT    /* critical conditions */
        KERN_ERR     /* error conditions */
        KERN_WARNING /* warning conditions */
        KERN_NOTICE  /* normal but significant condition */
        KERN_INFO    /* informational */
        KERN_DEBUG   /* debug-level messages */
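The four kernel.printk numbers and the KERN_* levels fit together: the levels are numbered 0 (KERN_EMERG) through 7 (KERN_DEBUG), and a message reaches the console only when its level is numerically lower than console_loglevel. The Python sketch below works through the two settings from the previous slide; it is an illustration, and the helper names `decode_printk` and `shown_on_console` are hypothetical.

```python
# Index in this list equals the numeric log level (0 = most severe).
LOG_LEVELS = [
    "KERN_EMERG", "KERN_ALERT", "KERN_CRIT", "KERN_ERR",
    "KERN_WARNING", "KERN_NOTICE", "KERN_INFO", "KERN_DEBUG",
]

PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def decode_printk(value):
    """Map a 'kernel.printk' value such as '4 4 1 7' to named fields."""
    numbers = [int(n) for n in value.split()]
    if len(numbers) != len(PRINTK_FIELDS):
        raise ValueError("expected four integers")
    return dict(zip(PRINTK_FIELDS, numbers))


def shown_on_console(console_loglevel):
    """Levels printed to the console: those numerically below the setting."""
    return LOG_LEVELS[:console_loglevel]


if __name__ == "__main__":
    for setting in ("4 4 1 7", "3 4 1 3"):
        decoded = decode_printk(setting)
        visible = shown_on_console(decoded["console_loglevel"])
        print(setting, "->", decoded)
        print("  console shows:", ", ".join(visible))
```

Under these rules, moving console_loglevel from 4 to 3 means KERN_ERR messages no longer appear on the console.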
  48. 48. EXECSNOOP • Use instead of atop • Inspects short lived CPU processes for demands on resources at a kernel level • Uses a “live log” approach for historical research Ever wanted to know exactly what was happening during a start up of an executable or while an application is performing a task?
  49. 49. EXECUTION TRACING
  50. 50. CAPTURE UNEXPECTED BEHAVIOR TIMESTAMPS DISPLAY EACH STEP IN AN APPLICATION PROCESS
  51. 51. OPENSNOOP • Similar to LSOF, focusing on open files • Doesn’t use strace, which is sourced for LSOF • More efficient
  52. 52. REVEALS SAME DATA AS LSOF
  53. 53. BENEFIT OVER LSOF Displays trace info on “files not found”
  54. 54. OK, THE MY LITTLE PONY AND LINUX TRACING?
  55. 55. DIG $ dig <host name> -<x> +<command> Can be used with .digrc file on host to provide more information about DNS. Can use it with monitoring and alerting for the DBA to know when hosts are experiencing DNS issues, which often are blamed on databases vs. DNS.
  56. 56. DIG EXAMPLES
  57. 57. DXTOOLKIT
        • Free from Delphix, written by some great folks on my team (Marcin, Eduardo, etc.)
        • Along with a ton of tools, it also troubleshoots performance issues on VMs
        • Written in Perl
        • Reads its config file in JSON format
        • Can be downloaded from Github: https://github.com/delphix/dxtoolkit
  58. 58. VM ENVIRONMENT ANALYSIS
        • dx_get_analytics
        dx_get_analytics -d Landshark -i 3600 -t standard -outdir /tmp
        Connected to Delphix Engine Landshark (IP delphix42)
        Generating cpu raw report file /tmp/Landshark-analytics-cpu-raw.csv
        Generating cpu aggregated report file /tmp/Landshark-analytics-cpu-aggregated.csv
        Generating disk raw report file /tmp/Landshark-analytics-disk-raw.csv
        Generating disk aggregated report file /tmp/Landshark-analytics-disk-aggregated.csv
        Generating iscsi raw report file /tmp/Landshark-analytics-iscsi-raw.csv
        Generating iscsi aggregated report file /tmp/Landshark-analytics-iscsi-aggregated.csv
        Generating network raw report file /tmp/Landshark-analytics-network-raw.csv
        Generating network aggregated report file /tmp/Landshark-analytics-network-aggregated.csv
        Generating nfs raw report file /tmp/Landshark-analytics-nfs-raw.csv
        Generating nfs aggregated report file /tmp/Landshark-analytics-nfs-aggregated.csv
  59. 59. VM CPU PERFORMANCE
        • dx_get_cpu
        dx_get_cpu -d DE1
        OK: DE1 cpu utilization 27.70
        Average CPU utilization for the last 5 minutes using 1-second samples, with the warning level set to 20 %:
        dx_get_cpu -d Landshark5 -w 20
        WARNING: Landshark5 cpu utilization 21.50
        Average CPU utilization for 20 April 2016 using 1-minute data:
        dx_get_cpu -d DE1 -i 60 -st "2016-04-20" -et "2016-04-21"
        OK: DE1 cpu utilization 30.80
        Raw CPU data for the last 5 minutes using 60-second samples:
        dx_get_cpu -d Landshark5 -raw -i 60
        #timestamp,util
        2016-04-21 09:22:00,26.20
        2016-04-21 09:23:00,23.77
        2016-04-21 09:24:00,35.94
        2016-04-21 09:25:00,35.88
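The raw output is plain CSV, so it can be fed straight into a script for alerting or trending. The Python sketch below is an assumption-laden illustration and not part of dxtoolkit: it takes the four sample rows from the slide, treats the line starting with `#` as a header, and reports the mean and peak utilization. The function name `summarize_cpu` is hypothetical.

```python
import csv
import io
from statistics import mean

# Sample rows copied from the slide (output of the -raw option).
RAW = """\
#timestamp,util
2016-04-21 09:22:00,26.20
2016-04-21 09:23:00,23.77
2016-04-21 09:24:00,35.94
2016-04-21 09:25:00,35.88
"""


def summarize_cpu(raw_csv):
    """Return sample count, mean utilization and the peak (timestamp, util)."""
    samples = []
    for row in csv.reader(io.StringIO(raw_csv)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the '#timestamp,util' header
        timestamp, util = row
        samples.append((timestamp, float(util)))
    peak = max(samples, key=lambda sample: sample[1])
    return {
        "samples": len(samples),
        "mean": mean(util for _, util in samples),
        "peak": peak,
    }


if __name__ == "__main__":
    summary = summarize_cpu(RAW)
    print(f"samples: {summary['samples']}")
    print(f"mean utilization: {summary['mean']:.2f}")
    print(f"peak: {summary['peak'][1]:.2f} at {summary['peak'][0]}")
```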
  60. 60. VM IO LATENCY
        • dx_get_disk_latency
        Average disk read and write latency for the last 5 minutes using 1-second samples:
        dx_get_disk_latency -d DE1
        OK: DE1 disk latency milliseconds 0.29
        Average disk write latency for the last 5 minutes using 1-second samples:
        dx_get_disk_latency -d DE1 -opname w
        OK: DE1 disk latency milliseconds 0.25
  61. 61. TESTING NETWORK LATENCY
        • dx_ctl_network_tests
        dx_ctl_network_tests -d Landshark5 -type latency -duration 30 -remoteaddr LINUXTARGET,linuxsource
        Starting job JOB-7645 for test .
        0 - 6 - 10 - 13 - 16 - 20 - 23 - 26 - 30 - 33 - 36 - 40 - 43 - 46 - 50 - 53 - 56 - 60 - 63 - 66 - 70 - 73 - 76 - 80 - 83 - 86 - 90 - 93 - 96 - 100
        Job JOB-7645 finished with state: COMPLETED
        Starting job JOB-7646 for test .
        0 - 6 - 10 - 13 - 16 - 20 - 23 - 26 - 30 - 33 - 36 - 40 - 43 - 46 - 50 - 53 - 56 - 60 - 63 - 66 - 70 - 73 - 76 - 80 - 83 - 86 - 90 - 93 - 96 - 100
        Job JOB-7646 finished with state: COMPLETED
  62. 62. SUMMARY • Tons of tools out there • Known tools provide data, but as DBAs, we commonly don't need to go more in depth. • The popularity of Snapper, DTrace, etc. should give us the incentive to dig into other Linux performance tools. • There's a tool for every problem. Use it.
  63. 63. THANK YOU! Twitter: @DBAKevlar Email: dbakevlar@gmail.com Blog: http://dbakevlar.com
