18. lsof -nPp <pid>
-n
Inhibits the conversion of network numbers to host names.
-P
Inhibits the conversion of port numbers to names for network files
FD TYPE NAME json
cwd DIR /var/www/myapp memcached
txt REG /usr/bin/ruby mysql
mem REG /json-1.1.9/ext/json/ext/generator.so http
mem REG /json-1.1.9/ext/json/ext/parser.so
mem REG /memcached-0.17.4/lib/rlibmemcached.so
mem REG /mysql-2.8.1/lib/mysql_api.so
0u CHR /dev/null
1w REG /usr/local/nginx/logs/error.log
2w REG /usr/local/nginx/logs/error.log
3u IPv4 10.8.85.66:33326->10.8.85.68:3306 (ESTABLISHED)
10u IPv4 10.8.85.66:33327->10.8.85.68:3306 (ESTABLISHED)
11u IPv4 127.0.0.1:58273->127.0.0.1:11211 (ESTABLISHED)
12u REG /tmp/RackMultipart.28957.0
33u IPv4 174.36.83.42:37466->69.63.180.21:80 (ESTABLISHED)
19. STRACE
trace system calls and signals
strace -cp <pid>
strace -ttTp <pid> -o <file>
20. strace -cp <pid>
-c
Count time, calls, and errors for each system call and report a
summary on program exit.
-p pid
Attach to the process with the process ID pid and begin tracing.
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
50.39 0.000064 0 1197 592 read
34.65 0.000044 0 609 writev
14.96 0.000019 0 1226 epoll_ctl
0.00 0.000000 0 4 close
0.00 0.000000 0 1 select
0.00 0.000000 0 4 socket
0.00 0.000000 0 4 4 connect
0.00 0.000000 0 1057 epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00 0.000127 4134 596 total
21. strace -ttTp <pid> -o <file>
-tt
If given twice, the time printed will include the microseconds.
-T
Show the time spent in system calls.
-o filename
Write the trace output to the file filename rather than to stderr.
epoll_wait(9, {{EPOLLIN, {u32=68841296, u64=68841296}}}, 4096, 50) = 1 <0.033109>
accept(10, {sin_port=38313, sin_addr="127.0.0.1"}, [1226]) = 22 <0.000014>
fcntl(22, F_GETFL) = 0x2 (flags O_RDWR) <0.000007>
fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000008>
setsockopt(22, SOL_TCP, TCP_NODELAY, [1], 4) = 0 <0.000008>
accept(10, 0x7fff5d9c07d0, [1226]) = -1 EAGAIN <0.000014>
epoll_ctl(9, EPOLL_CTL_ADD, 22, {EPOLLIN, {u32=108750368, u64=108750368}}) = 0 <0.000009>
epoll_wait(9, {{EPOLLIN, {u32=108750368, u64=108750368}}}, 4096, 50) = 1 <0.000007>
read(22, "GET / HTTP/1.1r"..., 16384) = 772 <0.000012>
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
poll([{fd=5, events=POLLIN|POLLPRI}], 1, 0) = 0 (Timeout) <0.000008>
write(5, "1000000-0003SELECT * FROM `table`"..., 56) = 56 <0.000023>
read(5, "25101,20x234m"..., 16384) = 284 <1.300897>
26. stracing ruby: sigprocmask
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.326334 0 3568567 rt_sigprocmask
0.00 0.000000 0 9 read
0.00 0.000000 0 10 open
0.00 0.000000 0 10 close
0.00 0.000000 0 9 fstat
0.00 0.000000 0 25 mmap
------ ----------- ----------- --------- --------- ----------------
100.00 0.326334 3568685 0 total
• debian/redhat compile ruby with --enable-pthread
• uses a native thread timer for SIGVTALRM
• causes excessive calls to sigprocmask: 30%
slowdown!
27. TCPDUMP
dump traffic on a network
tcpdump -i eth0 -s 0 -nqA
tcp dst port 3306
28. tcpdump -i <eth> -s <len> -nqA <expr>
tcpdump -i <eth> -w <file> <expr>
-i <eth>
Network interface.
-s <len>
Snarf len bytes of data from each packet.
-n
Don't convert addresses (host addresses, port numbers) to names.
-q
Quiet output. Print less protocol information.
-A
Print each packet (minus its link level header) in ASCII.
-w <file>
Write the raw packets to file rather than printing them out.
<expr>
libpcap expression, for example:
tcp src port 80
tcp dst port 3306
29. tcp dst port 80
19:52:20.216294 IP 24.203.197.27.40105 >
174.37.48.236.80: tcp 438
E...*.@.l.%&.....%0....POx..%s.oP.......
GET /poll_images/cld99erh0/logo.png HTTP/1.1
Accept: */*
Referer: http://apps.facebook.com/realpolls/?
_fb_q=1
30. tcp dst port 3306
19:51:06.501632 IP 10.8.85.66.50443 >
10.8.85.68.3306: tcp 98
E..."K@.@.Yy
.UB
.UD.....z....L............
GZ.y3b..[......W....
SELECT * FROM `votes` WHERE (`poll_id` =
72621) LIMIT 1
39. require 'sinatra'
$ ab -c 1 -n 50 http://127.0.0.1:4567/compute
$ ab -c 1 -n 50 http://127.0.0.1:4567/sleep
get '/sleep' do
sleep 0.25
'done' • Sampling profiler:
end
• 232 samples total
get '/compute' do • 83 samples were in /compute
proc{ |n|
a,b=0,1 • 118 samples had /compute on
the stack but were in
n.times{ a,b = b,a+b } another function
b
}.call(10_000) • /compute accounts for 50%
'done' of process, but only 35% of
time was in /compute itself
end
== Sinatra has ended his set (crowd applauds)
PROFILE: interrupts/evictions/bytes = 232/0/2152
Total: 232 samples
83 35.8% 35.8% 118 50.9% Sinatra::Application#GET /compute
56 24.1% 59.9% 56 24.1% garbage_collector
35 15.1% 75.0% 113 48.7% Integer#times
52. GDB
the GNU debugger
gdb <executable>
gdb attach <pid>
53. Debugging Ruby Segfaults
test_segv.rb:4: [BUG] Segmentation fault
ruby 1.8.7 (2008-08-11 patchlevel 72) [i686-darwin9.7.0]
def test
#include "ruby.h" require 'segv'
4.times do
VALUE Dir.chdir '/tmp' do
segv() Hash.new{ segv }[0]
{ end
VALUE array[1]; end
array[1000000] = NULL; end
return Qnil;
} sleep 10
test()
void
Init_segv()
{
rb_define_method(rb_cObject, "segv", segv, 0);
}
54. 1. Attach to running process
$ ps aux | grep ruby
joe 23611 0.0 0.1 25424 7540 S Dec01 0:00 ruby test_segv.rb
$ sudo gdb ruby 23611
Attaching to program: ruby, process 23611
0x00007fa5113c0c93 in nanosleep () from /lib/libc.so.6
(gdb) c
Continuing.
Program received signal SIGBUS, Bus error.
segv () at segv.c:7
7 array[1000000] = NULL;
2. Use a coredump
Process.setrlimit Process::RLIMIT_CORE, 300*1024*1024
$ sudo mkdir /cores
$ sudo chmod 777 /cores
$ sudo sysctl kernel.core_pattern=/cores/%e.core.%s.%p.%t
$ sudo gdb ruby /cores/ruby.core.6.23611.1259781224
55. def test
require 'segv'
4.times do
Dir.chdir '/tmp' do
Hash.new{ segv }[0]
end
end (gdb) where
end #0 segv () at segv.c:7
#1 0x000000000041f2be in call_cfunc () at eval.c:5727
test() ...
#13 0x000000000043ba8c in rb_hash_default () at hash.c:521
...
#19 0x000000000043b92a in rb_hash_aref () at hash.c:429
...
#26 0x00000000004bb7bc in chdir_yield () at dir.c:728
#27 0x000000000041d8d7 in rb_ensure () at eval.c:5528
#28 0x00000000004bb93a in dir_s_chdir () at dir.c:816
...
#35 0x000000000041c444 in rb_yield () at eval.c:5142
#36 0x0000000000450690 in int_dotimes () at numeric.c:2834
...
#48 0x0000000000412a90 in ruby_run () at eval.c:1678
#49 0x000000000041014e in main () at main.c:48
61. def test
require 'segv'
4.times do
Dir.chdir '/tmp' do
Hash.new{ segv }[0]
end
end
end
(gdb) ruby threads
test()
0xa3e000 main curr thread THREAD_RUNNABLE WAIT_NONE
node_vcall segv in test_segv.rb:5
node_call test in test_segv.rb:5
node_call call in test_segv.rb:5
node_call default in test_segv.rb:5
node_call [] in test_segv.rb:5
node_call test in test_segv.rb:4
node_call chdir in test_segv.rb:4
node_call test in test_segv.rb:3
node_call times in test_segv.rb:3
node_vcall test in test_segv.rb:9
63. mongrel sleeper thread
0x16814c00 thread THREAD_STOPPED WAIT_TIME(0.47) 1522 bytes
node_fcall sleep in lib/mongrel/configurator.rb:285
node_fcall run in lib/mongrel/configurator.rb:285
node_fcall loop in lib/mongrel/configurator.rb:285
node_call run in lib/mongrel/configurator.rb:285
node_call initialize in lib/mongrel/configurator.rb:285
node_call new in lib/mongrel/configurator.rb:285
node_call run in bin/mongrel_rails:128
node_call run in lib/mongrel/command.rb:212
node_call run in bin/mongrel_rails:281
node_fcall (unknown) in bin/mongrel_rails:19
def run
@listeners.each {|name,s|
s.run
}
$mongrel_sleeper_thread = Thread.new { loop { sleep 1 } }
end
64. god memory leaks
(gdb) ruby objects arrays 43 God::Process
elements instances 43 God::Watch
94310 3 43 God::Driver
94311 3 43 God::DriverEventQueue
94314 2 43 God::Conditions::MemoryUsage
94316 1 43 God::Conditions::ProcessRunning
43 God::Behaviors::CleanPidFile
5369 arrays 45 Process::Status
2863364 member elements 86 God::Metric
327 God::System::SlashProcPoller
many arrays with 327 God::System::Process
90k+ elements! 406 God::DriverEvent
5 separate god leaks fixed by Eric
Lindvall with the help of gdb.rb!
66. • BleakHouse
• installs a patched version of ruby: ruby-bleak-
house
• unlike gdb.rb, see where objects were created
(file:line)
• create multiple dumps over time with `kill -USR2
<pid>` and compare to find leaks
191691 total objects
Final heap size 191691 filled, 220961 free
Displaying top 20 most common line/class pairs
89513 __null__:__null__:__node__
41438 __null__:__null__:String
2348 ruby/site_ruby/1.8/rubygems/specification.rb:557:Array
1508 ruby/gems/1.8/specifications/gettext-1.9.gemspec:14:String
1021 ruby/gems/1.8/specifications/heel-0.2.0.gemspec:14:String
951 ruby/site_ruby/1.8/rubygems/version.rb:111:String
935 ruby/site_ruby/1.8/rubygems/specification.rb:557:String
834 ruby/site_ruby/1.8/rubygems/version.rb:146:Array
67. IOPROFILE
summarizes strace and lsof
wget http://aspersa.googlecode.com/svn/trunk/ioprofile
68. strace -cp <pid>
ioprofile is a script that captures one sample of lsof output then
starts strace for a specified amount of time.
after strace finishes, the results are processed.
below is an example that comes with ioprofile
$ ioprofile t/samples/ioprofile-001.txt
total pread read pwrite write filename
10.094264 10.094264 0.000000 0.000000 0.000000 /data/data/abd_2dia/aia_227_228.ibd
8.356632 8.356632 0.000000 0.000000 0.000000 /data/data/abd_2dia/aia_227_223.ibd
0.048850 0.046989 0.000000 0.001861 0.000000 /data/data/abd/aia_instances.ibd
0.035016 0.031001 0.000000 0.004015 0.000000 /data/data/abd/vo_difuus.ibd
0.013360 0.000000 0.001723 0.000000 0.011637 /var/log/mysql/mysql-relay.002113
0.008676 0.000000 0.000000 0.000000 0.008676 /data/data/master.info
0.002060 0.000000 0.000000 0.002060 0.000000 /data/data/ibdata1
0.001490 0.000000 0.000000 0.001490 0.000000 /data/data/ib_logfile1
0.000555 0.000000 0.000000 0.000000 0.000555 /var/log/mysql/mysql-relay-log.info
0.000141 0.000000 0.000000 0.000141 0.000000 /data/data/ib_logfile0
0.000100 0.000000 0.000000 0.000100 0.000000 /data/data/abd/9fvus.ibd
69. strace -ttTp <pid> -o <file>
-c CELL
specify what to put in the cells of the output. ‘times’, ‘count’,
or ‘sizes‘.
below is an example of -c sizes:
$ ioprofile -c sizes t/samples/ioprofile-001.txt
total pread read pwrite write filename
90800128 90800128 0 0 0 /data/data/abd_2dia/aia_227_223.ibd
52150272 52150272 0 0 0 /data/data/abd_2dia/aia_227_228.ibd
999424 0 0 999424 0 /data/data/ibdata1
638976 131072 0 507904 0 /data/data/abd/vo_difuus.ibd
327680 114688 0 212992 0 /data/data/abd/aia_instances.ibd
305263 0 149662 0 155601 /var/log/mysql/mysql-relay.002113
217088 0 0 217088 0 /data/data/ib_logfile1
22638 0 0 0 22638 /data/data/master.info
16384 0 0 16384 0 /data/data/abd/9fvus.ibd
1088 0 0 0 1088 /var/log/mysql/mysql-relay-log.info
512 0 0 512 0 /data/data/ib_logfile0
71. /proc/[pid]/pagemap
• (as of linux 2.6.25)
• /proc/[pid]/pagemap - find out which physical
frame each virtual page is mapped to and swap
flag.
• /proc/kpagecount - stores number of times a
particular physical page is mapped.
• /proc/kpageflags - flags about each page (locked,
slab, dirty, writeback, ...)
• read more: Documentation/vm/pagemap.txt
75. /proc/sys/kernel/core_pattern
• tell system where to output core dumps
• can also launch user program when
core dumps happen.
• echo "|/path/to/core_helper.rb %p %s %u
%g" > /proc/sys/kernel/core_pattern
• /proc/[pid]/ files are kept in place until your
handler exits, so full state of the process at
death may be inspected.
• http://gist.github.com/587443
76. and lots more
/proc/meminfo
/proc/scsi/*
/proc/net/*
/proc/sys/net/ipv4/*
...
78. • how do you know when a hard drive in
your RAID array fails?
• turns out there some command line tools
that most RAID vendors provide.
• adaptec - /usr/StorMan/arcconf getconfig 1
AL
• 3ware - /usr/bin/tw_cli
79. snooki:/# /usr/StorMan/arcconf getconfig 1 AL
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 3405
Controller Serial Number : 7C4911519E3
Physical Slot :2
Temperature : 42 C/ 107 F (Normal)
Installed memory : 128 MB
Copyback : Disabled
Background consistency check : Enabled
Automatic Failover : Enabled
Global task priority : High
Stayawake period : Disabled
Spinup limit internal drives :0
Spinup limit external drives :0
Defunct disk drive count :0
Logical devices/Failed/Degraded : 1/0/0
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (17304)
Firmware : 5.2-0 (17304)
Driver : 1.1-5 (2461)
80. • write a script to parse that
• run it with cron
• the ugly ruby script i use for adaptec:
http://gist.github.com/643666
84. CONFIG_HZ_100=y
CONFIG_HZ=100
• Set the timer interrupt frequency.
• Fewer timer interrupts means processes
run with fewer interruptions.
• Servers (without interactive software)
should have lower timer frequency.
88. sudo ethtool -K eth1 tso on
• Allows kernel to offload large packet
segmentation to the network adapter.
• Frees the CPU to do more useful work.
• After running the command above, verify
with:
[joe@timetobleed]% dmesg | tail -1
[892528.450378] 0000:04:00.1: eth1: TSO is Enabled
89. http://kerneltrap.org/node/397
Tx/Rx TCP file send long (bi-directional Rx/Tx):
w/o TSO: 1500Mbps, 82% CPU
w/ TSO: 1633Mbps, 75% CPU
Tx TCP file send long (Tx only):
w/o TSO: 940Mbps, 40% CPU
w/ TSO: 940Mbps, 19% CPU
91. CONFIG_DMADEVICES=y
CONFIG_INTEL_IOATDMA=y
CONFIG_DMA_ENGINE=y
CONFIG_NET_DMA=y
• these options enable Intel I/OAT DMA
engine present in recent Xeon CPUs.
• increases throughput because kernel can
offload network data copying to the DMA
engine.
• CPU can do more useful work.
• statistics about savings can be found in
sysfs: /sys/class/dma/
92. check if I/OAT is enabled
[joe@timetobleed]% dmesg | grep ioat
ioatdma 0000:00:08.0: setting latency timer to 64
ioatdma 0000:00:08.0: Intel(R) I/OAT DMA Engine
found, 4 channels, device version 0x12, driver
version 3.64
ioatdma 0000:00:08.0: irq 56 for MSI/MSI-X
94. CONFIG_DCA=y
• I/OAT includes Direct Cache Access (DCA)
• DCA allows a driver to warm CPU cache.
• Requires driver and device support.
• Intel 10GbE driver (ixgbe) supports this
feature.
• Must enable this feature in the BIOS.
• Some vendors hide BIOS option so you will
need a hack to enable DCA.
95. DCA enable hack
get a hack to enable DCA on some vendor
locked systems:
http://github.com/ice799/dca_force
read about the hack here:
http://timetobleed.com/enabling-bios-
options-on-a-live-server-with-no-rebooting/
97. insmod e1000e.ko
InterruptThrottleRate=1
• SOME drivers allow you to specify the interrupt throttling
algorithm.
• e1000e is one of these drivers.
• Two dynamic throttling algorithms: dynamic (1) and dynamic
conservative (3).
• The difference is the interrupt rate for “Lowest Latency”
traffic.
• Algorithm 1 is more aggressive for this traffic class.
• Read driver documentation for more information.
• Be careful to avoid receive livelock.
101. IRQ Affinity
• NIC, disk, etc IRQ handlers can be set to
execute on specific processors.
• The IRQ to CPU map can be found at:
/proc/interrupts
• Individual IRQs may be set in the file:
/proc/irq/[IRQ NUMBER]/smp_affinity
102. IRQ affinity
• Can pin IRQ handlers for devices to specific
CPUs.
• Can then use taskset to pin important
processes to other CPUs.
• The result is NIC and disk will not interrupt
important processes running elsewhere.
• Can also help preserve CPU caches.
105. CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
• oprofile is a system wide profiler that can
profile both kernel and application level code.
• oprofile has a kernel driver which collects data
from CPU registers (on x86 these are MSRs).
• oprofile can also annote source code with
performance information.
• this can help you find and fix performance
problems.