13. before we get this
horror show rolling
• kernels, drivers, glibc, and everything else
change.
• code snips will differ from what you are
running on your machines.
• some things are simplified in the interest of
time.
14. bprobe
• boundary IPFIX flow meter
• collects flow data by sniffing packets with libpcap
• also collects low level NIC data from the driver
• packets tx/rx
• bytes tx/rx
• ethernet collisions
• ethernet errors
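A rough sketch of reading the same kind of counters from userland. The slides say bprobe gets them from the driver; this stand-in parses /proc/net/dev style text instead, which is an assumption about the data source and not bprobe's method. The function name and the sample numbers are made up for illustration.

```python
def parse_proc_net_dev(text):
    """Return {ifname: {counter: int}} from /proc/net/dev style text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, values = line.split(":", 1)
        fields = [int(v) for v in values.split()]
        if len(fields) < 16:
            continue  # not the usual 8 receive + 8 transmit columns
        stats[name.strip()] = {
            "rx_bytes": fields[0],
            "rx_packets": fields[1],
            "rx_errors": fields[2],
            "tx_bytes": fields[8],
            "tx_packets": fields[9],
            "tx_errors": fields[10],
            "collisions": fields[13],
        }
    return stats


# Illustrative sample only; the numbers are invented.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
  eth0: 1000 10 1 0 0 0 0 0 2000 20 2 0 0 3 0 0
"""

counters = parse_proc_net_dev(SAMPLE)
print(counters["eth0"]["rx_packets"], counters["eth0"]["collisions"])
```

On a real host the same function could be fed the contents of /proc/net/dev.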
15. ethernet bonding (aka teaming)
• combine a group of physical NICs (eth0, eth1, ...)
into a single virtual device (bond0, bond1, ...).
• different modes
• active-passive
• round robin
• link aggregation
17. how does bonding work (on linux) ?
• at a high level...
• the bonding driver creates a “virtual device”
• when a packet is sent, bonding driver figures
out which physical NIC to transmit the packet
on.
• when a packet comes in, the NICs pass the
incoming packet up for the higher layers of the
network stack to figure out.
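The transmit decision described above can be modelled in a few lines. This is a toy model, not the bonding driver: the class name, the mode strings, and the failover rule are all assumptions made for illustration.

```python
from itertools import cycle


class ToyBond:
    """Toy model of a bond device choosing a physical NIC for transmit."""

    def __init__(self, name, slaves, mode="active-backup"):
        self.name = name
        self.slaves = list(slaves)
        self.mode = mode
        self.active = self.slaves[0]       # used in active-backup mode
        self._rotation = cycle(self.slaves)  # used in round-robin mode

    def pick_tx_slave(self):
        if self.mode == "round-robin":
            return next(self._rotation)
        return self.active

    def fail_over(self):
        """Pretend the active link died and move to the next NIC."""
        idx = self.slaves.index(self.active)
        self.active = self.slaves[(idx + 1) % len(self.slaves)]


rr = ToyBond("bond0", ["eth0", "eth2"], mode="round-robin")
print([rr.pick_tx_slave() for _ in range(3)])

ab = ToyBond("bond1", ["eth1", "eth3"])
print(ab.pick_tx_slave())
ab.fail_over()
print(ab.pick_tx_slave())
```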
18. bprobe and bonding
• bprobe discovers bonded network
interfaces.
• uses libpcap to monitor the underlying
physical NICs instead of bond devices.
• detecting link failures, etc
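One way to find the physical NICs behind a bond is the bonding driver's sysfs file /sys/class/net/<bond>/bonding/slaves, which holds a space-separated list of device names. The slides do not say how bprobe does its discovery, so treat this as a sketch under that assumption; the helper names are hypothetical.

```python
import os


def parse_slaves(text):
    """Split the contents of a bonding 'slaves' file into device names."""
    return text.split()


def bond_slaves(bond, sysfs_root="/sys/class/net"):
    """Read the physical NICs that make up one bond device."""
    path = os.path.join(sysfs_root, bond, "bonding", "slaves")
    with open(path) as handle:
        return parse_slaves(handle.read())


# Pure parsing can be tried anywhere; bond_slaves() needs a bonded Linux host.
print(parse_slaves("eth0 eth2 eth4\n"))
```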
22. Bug was filed...
• Debian Lenny, 64bit.
• Bonded ethernet interfaces.
• No incoming packets are showing up.
23. Step 0
• Take a step back.
• Breathe.
• Do not break the computer.
24. Step 1
• Examine our assumptions:
• The packets are making it to the kernel.
• The packets are being handed up from the
kernel to libpcap.
• libpcap doesn’t lose any packets before
bprobe examines them.
• bprobe has some weird bug in it.
32. Peel some layers away
• bprobe is really libpcap + packet analysis +
output.
• if this is a bug in the kernel or libpcap then
other programs that use libpcap (like
tcpdump) will also fail the same way.
• so, do they?
34. tcpdump
• bonded ethernet interfaces (on linux) are virtual
devices created by combining other devices.
• for example:
• bond0
• eth0
• eth2
• eth4
• ...
35. First, sniff bond0...
% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
37. Now eth0 (the active NIC in
bond0)
% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
51. Steps 3-5
• Dig until you see something you haven’t
seen before.
• Read all of the code and understand it.
• Go to step 2.
52. how are packets received?
• packets come in from the wire.
• a couple different ways for the kernel to
“know” about new packets.
• let’s just look at the simple case.
• an interrupt is raised when a packet arrives.
• both paths hand data up to the higher
layers in similar ways.
66. the layers, from the app down to the kernel
• bprobe/tcpdump/etc (in userland)
• libpcap (in userland)
• packet protocol family (in the kernel)
• network device agnostic layer (in the kernel)
67. libpcap (userland)
• creates a socket of type PF_PACKET
• two ways to get packets from the kernel:
• one by one (slow)
• via shared memory (fast)
• libpcap tries to use the fast method
• if it fails, it falls back to slow.
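The try-fast-then-fall-back behaviour can be sketched like this. It only shows the control flow; the two setup callables, the mode labels, and the handles are hypothetical stand-ins, not libpcap functions.

```python
def open_capture(setup_ring, setup_plain):
    """Try the fast (shared memory) path first, then the slow one.

    Both arguments are hypothetical callables standing in for libpcap
    internals; each returns a capture handle or raises OSError.
    """
    try:
        return "mmap", setup_ring()
    except OSError:
        # Fast path unavailable (for example, an older kernel):
        # fall back to reading packets one at a time.
        return "one-by-one", setup_plain()


def ring_unsupported():
    raise OSError("shared-memory ring not supported")  # simulated failure


def plain_socket():
    return "plain-handle"  # placeholder for a real socket


mode, handle = open_capture(ring_unsupported, plain_socket)
print(mode, handle)
```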
74. PF_PACKET (kernel)
• libpcap creates the PF_PACKET socket
• the PF_PACKET code in the kernel
(eventually) executes.
• this code does some initialization and
inserts a protocol hook...
77. network device agnostic layer
• pulls packets off the backlog queue.
• calls netif_receive_skb()
• has some logic to decide which device a packet is
attributed to when bonding is enabled.
• passes the packet through the protocol
hooks.
83. we now know the path packets take
so they can be examined by pcap apps.
87. back to the bug
• so, the bug was that packet sniffing on
physical NICs on bonded hosts was not
revealing incoming packets.
• what do we now know about our
environment?
• what would be the best place to look to
track down this bug?
97. Bug
• We overwrite the packet’s device with the bond
device.
• The protocol hook check tests whether the hook is
for the device on the packet.
• It isn’t.
• we are sniffing eth0
• skb->dev was overwritten to bond0.
• That’s why if you sniff “bond0” you see packets but if
you sniff “eth0” you see nothing.
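A toy model of that mismatch, in Python rather than kernel C. The names (Hook, deliver, also_match_orig_dev) are invented for illustration, and the "fix" shown, also matching hooks against the device the packet originally arrived on, is one plausible approach and not necessarily the patch that was applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hook:
    name: str
    dev: Optional[str] = None  # None means "any device"


def deliver(arrived_on, master_of, hooks, also_match_orig_dev=False):
    """Return the names of the hooks that would see the packet."""
    orig_dev = arrived_on
    # The overwrite: a packet from a bonded NIC is relabelled with the bond.
    skb_dev = master_of.get(arrived_on, arrived_on)
    delivered = []
    for hook in hooks:
        matches = hook.dev is None or hook.dev == skb_dev
        if also_match_orig_dev and hook.dev == orig_dev:
            matches = True
        if matches:
            delivered.append(hook.name)
    return delivered


MASTER_OF = {"eth0": "bond0"}
HOOKS = [Hook("sniff-bond0", "bond0"), Hook("sniff-eth0", "eth0")]

print(deliver("eth0", MASTER_OF, HOOKS))
print(deliver("eth0", MASTER_OF, HOOKS, also_match_orig_dev=True))
```

With the overwrite alone only the bond0 sniffer is matched; with the extra comparison both sniffers are.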
107. Now eth0 (the active NIC in
bond0)
% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
^C
0 packets captured
2 packets received by filter
0 packets dropped by kernel
121. Step 0
• Take a step back.
• Breathe.
• Do not break the computer.
122. Step 1
• Examine our assumptions:
• The kernel code is still broken.
• The incoming packets are being queued up for
libpcap to pull out of PF_PACKET properly.
• There probably isn’t a bug in bprobe or
tcpdump.
126. i used apt-get source to
retrieve the official source for
debian lenny’s libpcap and I
found something
surprising.
127. old way of doing pcap
• debian lenny’s kernel supports the new way
of getting packets out of the kernel via
mmap.
• but, debian lenny’s libpcap is not new
enough and therefore uses the old way
to examine packets.
• this also means that unless i statically link
the libpcap version i want, my app will just
perform worse on lenny.
130. that if statement fails.
• we are sniffing packets on a physical device
• BUT in the kernel we are changing the
device a packet comes in on to the bond
device (remember in netif_receive_skb?)
132. that if statement fails.
• the index of the bond device is different from
the index of the physical device we are sniffing
• so this if statement evaluates to TRUE
• libpcap returns without processing
the packet.
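The check can be paraphrased in Python. This is a model of the logic as the slides describe it, not the libpcap source; the function name, the interface indexes, and the use of -1 for "not bound to one device" are assumptions.

```python
def old_path_processes_packet(handle_ifindex, packet_ifindex):
    """Model of the old-path check: True if the packet gets processed."""
    # Assumption: -1 means the handle is not bound to a single device.
    if handle_ifindex != -1 and packet_ifindex != handle_ifindex:
        return False  # return without processing the packet
    return True


ETH0_INDEX = 2   # hypothetical index of the NIC being sniffed
BOND0_INDEX = 5  # hypothetical index of the bond device

# The kernel relabelled the packet with bond0, so the indexes differ.
print(old_path_processes_packet(ETH0_INDEX, BOND0_INDEX))
# A packet still labelled with eth0 would have been processed.
print(old_path_processes_packet(ETH0_INDEX, ETH0_INDEX))
```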
133. why?
this code exists to prevent a race condition
when sniffing packets the old way in some
kernels.
134. solution
• boot into our fixed debian lenny kernel.
• download a version of libpcap that is newer and
supports the mmap method for packet sniffing.
• new method doesn’t have this race condition
and has better performance.
• link bprobe/tcpdump/other pcap apps against it.
135. First, sniff bond0...
% sudo tcpdump -i bond0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
136. Next, sniff eth0...
% sudo tcpdump -i eth0 dst 172.16.209.136 and proto 1
12:57:26.275660 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 54, length 64
12:57:27.275731 IP 172.16.209.1 > 172.16.209.136: ICMP
echo request, id 62831, seq 55, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
140. summarize
• kernel bug when overwriting the device the
packet arrived on.
• fixed this bug, but bprobe/tcpdump still
failed.
• libpcap bug when pulling packets out of the
kernel the old way
• can avoid this bug and get better
performance with a newer libpcap
141. Step 0
• Take a step back.
• Breathe.
• Do not break the computer.
142. Step 1-5
• Examine your assumptions.
• Start digging.
• Keep going until you see something you
haven’t seen before.
• Read all of the code and understand it.
• Go to step 2.
159. W A T
• We have 2 programs:
• Both link against libraries in /usr/local/lib/
• Only one works.
• The broken program’s library is in /usr/local/lib/
161. Step 0
• Take a step back.
• Breathe.
• Do not break the computer.
162. Step 1
• Examine our assumptions:
• The programs and libraries are both 64bit.
• /usr/local/lib/ is in the library search path
174. Strange
• This is confusing.
• bprobe should fail.
• But, the names of the shared libraries a
particular binary dynamically links to at
runtime are recorded in the binary itself.
• So....
183. ah ha!
• bprobe works and can link because the
binary is storing the library path inside of
itself.
• but, now there are 2 more questions:
• how did the rpath tag get there?
• why doesn’t ipfix_reader have one?
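To compare the two binaries, the dynamic section printed by `readelf -d` can be picked apart with a small parser. This is a sketch: the function name is made up, and the sample text (including the library names) is illustrative, not output captured from bprobe or ipfix_reader.

```python
import re

# Matches lines such as:
#  0x000000000000000f (RPATH)   Library rpath: [/usr/local/lib]
_ENTRY = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def parse_dynamic(text):
    """Collect NEEDED libraries and RPATH/RUNPATH directories."""
    result = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for tag, value in _ENTRY.findall(text):
        if tag == "NEEDED":
            result[tag].append(value)
        else:
            # Search paths are colon-separated lists of directories.
            result[tag].extend(part for part in value.split(":") if part)
    return result


# Illustrative sample in the shape readelf -d prints.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libpcap.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [/usr/local/lib]
"""

info = parse_dynamic(SAMPLE)
print(info["NEEDED"])
print(info["RPATH"])
```

A binary with no rpath tag simply yields an empty RPATH list, which is the difference to look for between the working and the broken program.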