Anatomy of Neutron from the eagle eyes of troubleshooters
1. Anatomy Of OpenStack Neutron Through The Eagle Eyes Of Troubleshooters
Sadique Puthen
Cloud Success Architect
27/10/2016
2. Agenda
Explore troubleshooting neutron and its anatomy using real-life troubleshooting examples
Examples:
1. Security group rules are not effective.
2. Newly created instances cannot get an IP from DHCP.
3. Connections to a floating IP randomly fail.
4. Communication through provider networks is very slow.
○ Lessons learned.
● The examples are real-life troubleshooting cases.
● The solutions are applicable to the versions where they were hit.
○ They may not be relevant on the latest versions, as patches may have landed to fix some of them permanently.
● The prime focus of this session is not the problems and solutions,
○ but the anatomy of neutron and the troubleshooting methods applied to solve them.
Understand:
● We will explore only the limited anatomy associated with the problems explained here.
4. Security Groups Are Not Working
Rules are not effective; everything is allowed
The rules say only ping and ssh should be allowed from source x, but everything is allowed from everywhere.
● Understand how packets flow through the multiple iptables chains.
● Understand where exactly security group rules are applied.
● Try different security groups and rules, including the default one.
5. Security Groups Are Not Working
Packet flow through the iptables chains:
FORWARD -> neutron-openvswi-FORWARD -> neutron-openvswi-sg-chain -> neutron-openvswi-ixxx-x (incoming) / neutron-openvswi-oxxx-x (outgoing)
● Does the packet meet a RETURN rule in the per-port chain?
○ Yes: processing returns to the parent chain, further rules are processed, and the chain's default policy applies (ACCEPT).
○ No: the packet falls through to neutron-openvswi-sg-fallback, which DROPs it.
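The per-port chain names in the flow above can be derived from a port id. A minimal sketch, assuming the usual neutron-openvswi-&lt;o|i&gt;&lt;port-prefix&gt; naming and a hypothetical port id; the exact truncation length can vary by neutron version, so verify the names against `iptables -S` on your node:

```shell
# Derive the per-port security-group chain names from a (hypothetical) port id.
port_id="d5bf1700-5c11-4c21-8888-9999aaaabbbb"
prefix=$(printf '%s' "$port_id" | cut -c1-10)
echo "outgoing chain: neutron-openvswi-o$prefix"
echo "incoming chain: neutron-openvswi-i$prefix"
# On the compute node you would then list their rules, e.g.:
#   iptables -S "neutron-openvswi-o$prefix"
```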
6. Security Groups Are Not Working
Rules are not effective; everything is allowed
The rules say only ping and ssh should be allowed from source x, but everything is allowed from everywhere.
● Understand how packets flow through the multiple chains.
● Verify the rules are inserted into the required iptables chains.
● Understand where exactly security group rules are applied.
● Verify whether packets are going through the chain, using iptables logging. Example rule:
iptables -A CHAIN -j LOG --log-prefix "CHAIN:SG:" --log-level <level>
● When we added the rule, we saw that packets never traverse iptables, since nothing was logged.
● This led us to narrow our focus and hunt for global parameters that could bypass iptables. We found that the default kernel configuration is set not to send packets on a linux bridge through iptables:
net.bridge.bridge-nf-call-iptables = 1
● Nova dynamically enables this now.
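The toggle above can be checked without iptables logging by reading it straight from /proc, which is what sysctl does. A minimal sketch; the path only exists when the bridge netfilter module is loaded, and when it is absent, bridged traffic is not passed through iptables at all:

```shell
# Read the bridge-netfilter toggle directly from /proc.
f=/proc/sys/net/bridge/bridge-nf-call-iptables
if [ -r "$f" ]; then
  val=$(cat "$f")          # 0 = bridged packets bypass iptables, 1 = they don't
else
  val=missing              # module not loaded: bridged traffic bypasses iptables
fi
echo "bridge-nf-call-iptables=$val"
```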
8. Newly Created Instances Cannot Get DHCP IP
[Diagram: the instance broadcasts a DHCP discover; three DHCP servers (one per network node) each reply with an offer; the instance sends a DHCP request naming one server; that server ACKs and the others NACK or stay silent.]
Understand how HA for DHCP works.
● When a network is created, a DHCP server is spawned on each network node, depending on the value of dhcp_agents_per_network. In this case, 3.
● First the instance sends a DHCP discover.
● All DHCP servers respond with an offer.
● The instance replies with a DHCP request containing a server identifier.
● That server replies with an ACK; the rest either do not respond or send a NACK.
Instances created previously can still get their DHCP IP address on renewal or reboot.
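The election above can be sketched as a toy model (all addresses are made up); it only illustrates why exactly one server ACKs each request while the others stay silent or NACK:

```shell
# Toy model of the HA DHCP exchange: every server OFFERs, the client
# REQUESTs one server identifier, only that server ACKs.
servers="192.168.1.2 192.168.1.3 192.168.1.4"
for s in $servers; do echo "OFFER from $s"; done
chosen=192.168.1.3                  # server identifier the client picked
echo "REQUEST server-id $chosen"
acks=0
for s in $servers; do
  [ "$s" = "$chosen" ] && { echo "ACK from $s"; acks=$((acks+1)); }
done
```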
10. How does it work after the packet reaches the DHCP server?
nobody 27219 0.0 0.0 15552 540 ? S 13:35 0:00 dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo
--pid-file=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/host
--addn-hosts=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/opts
--dhcp-leasefile=/var/lib/neutron/dhcp/d0ad9f6b-4927-4070-abe6-b0f149165d1e/leases --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tapd5bf1700-5c
--dhcp-range=set:tag0,192.168.1.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
● We do not see any of the DHCP servers responding with an offer to the instance.
○ This requires exploring how neutron DHCP with dnsmasq works.
● Each DHCP server is a dnsmasq process bound to a tapxxx interface in its own namespace.
● The DHCP server reads mac -> ip mappings from a static host file and responds only to the mac -> ip pairs listed there.
● While exploring the host file, we found it contains mac -> ip mappings only for previous instances.
○ The file never gets populated with the mac -> ip mapping of any newly created instance.
● This led to further investigation into who is responsible for populating this file, and how.
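The host-file check described above can be scripted. A sketch against sample data standing in for /var/lib/neutron/dhcp/&lt;net-id&gt;/host; the MACs and entries here are illustrative:

```shell
# Check whether an instance's MAC made it into the dnsmasq hosts file.
hostsfile=$(mktemp)
cat > "$hostsfile" <<'EOF'
fa:16:3e:11:22:33,host-192-168-1-5.openstacklocal,192.168.1.5
fa:16:3e:aa:bb:cc,host-192-168-1-6.openstacklocal,192.168.1.6
EOF
mac="fa:16:3e:de:ad:01"             # MAC of the newly created instance
if grep -qi "^$mac," "$hostsfile"; then
  result=present
else
  result=missing                    # dhcp-agent never updated the file
fi
echo "$mac: $result"
rm -f "$hostsfile"
```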
11. Who is responsible for updating this file?
● dhcp-agent dynamically updates this file on changes to a port, driven through the message bus.
● This led us to explore the dhcp-agent logs, where we found:
2015-07-29 02:12:14.204 36387 TRACE root NotFound: Basic.consume: (404) NOT_FOUND - no queue 'dhcp_agent' in vhost '/'
The solution!
● We saw this error more than 5k times in each dhcp-agent.log.
● Upon further digging, it was found that the rpc/oslo.messaging code was missing the patch to reconnect to the message bus when it loses access to it.
● The immediate problem was solved by restarting dhcp-agent.
● A permanent fix was added by backporting the patch to always reconnect.
13. Communication to an instance through a floating IP randomly fails
Like 10 pings work, then it stops and loses another 20, then starts working and failing again
Random failure.
● 10 pings to the floating IP work, then some pings drop, then it works again and starts dropping again.
○ It's purely random; there is no pattern.
● Layer 3 HA is used, configured with:
max_l3_agents_per_router=3
l3_ha=True
● This creates three instances of the router in active/passive mode.
● Vxlan tunneling is used for communication between the compute node and the network node.
● The floating IP network is a vlan provider external network.
● Let us explore the anatomy of the L3 HA configuration before going very deep into our problem.
15. Troubleshooting steps
[Diagram: instance traffic on the compute node flows tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> patch-tun -> br-tun -> ethx, then over the tunnel to the network node: ethx -> br-tun -> patch-int -> br-int -> qrouter-xxx namespace (qr-yyy, ha-zzz, qg-xxx) -> int-br-ex -> phy-br-ex -> br-ex -> ethx -> external network.]
Only the anatomy of the master network node for the router is shown in the diagram.
● Ping the default gateway of the private network from the instance.
○ That is the IP of qr-yyy. 100% successful.
● Ping the base IP of qg-xxx from the instance. Every router has a base IP.
○ 100% successful.
● From the qrouter-xxx namespace, ping the default gateway of the external network.
○ This reproduces it!
ip netns exec qrouter-xxx ping <ip>
16. Troubleshooting steps
Only the anatomy of the master network node for the router is shown in the diagram.
● From an external system connected to the same floating IP network, we tried to ping the base IP of qg-xxx.
○ This reproduces it.
○ This helped us focus on br-ex for the rest of the troubleshooting.
● We constantly monitored ovs mac learning on the br-ex bridge. The mac -> port mapping was flapping randomly for the instance's mac address:
# ovs-appctl fdb/show br-ex
port VLAN MAC Age
1 0 00:2a:6a:8c:d6:c4 37
2 0 00:17:a4:77:10:2c 1
# ovs-appctl fdb/show br-ex
port VLAN MAC Age
1 0 00:2a:6a:8c:d6:c4 37
1 0 00:17:a4:77:10:2c 1
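Spotting the flap by eye across repeated fdb dumps is error-prone, so the comparison can be scripted. A sketch over sample lines mirroring the outputs above; in practice you would capture `ovs-appctl fdb/show br-ex` twice instead:

```shell
# Diff two fdb snapshots and report MACs that moved ports.
snap1=$(mktemp); snap2=$(mktemp)
printf ' 2  0  00:17:a4:77:10:2c  1\n' > "$snap1"
printf ' 1  0  00:17:a4:77:10:2c  1\n' > "$snap2"
# Field 1 = port, field 3 = MAC; remember each MAC's port from the first
# snapshot, then flag any MAC seen on a different port in the second.
flaps=$(awk 'NR==FNR {p[$3]=$1; next}
             ($3 in p) && p[$3]!=$1 {printf "FLAP: %s moved port %s -> %s\n", $3, p[$3], $1}' \
            "$snap1" "$snap2")
echo "$flaps"
rm -f "$snap1" "$snap2"
```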
17. Solution: Fix the loop from the switch/enclosure
● The command below gives the port number associated with each port in the ovs bridge.
# ovs-ofctl show br-ex
1 (eth0): addr:00:17:a4:77:10:14
…
2 (phy-br-ex): addr:5e:b6:f8:49:06:41
…
● The instance's mac address should always be mapped to port phy-br-ex for packets to reach the instance.
● We ran tcpdump on the physical interface to understand why the flapping happens.
# tcpdump -i eth0
15:20:03.050558 ARP, Request who-has 12.1.1.1 tell 12.1.1.2, length 28
15:20:03.050583 ARP, Request who-has 12.1.1.1 tell 12.1.1.2, length 28
15:20:03.050835 ARP, Reply 12.1.1.1 is-at 00:17:a4:77:10:2c, length 38
● This clearly indicated a loop from the switch: ovs sees the instance's mac arriving from outside the system and flips the mac -> port mapping to the physical interface.
● We also enabled DBG-level logging in ofproto_dpif_xlate to see what OVS learns when the loop happens.
# ovs-appctl vlog/set ofproto_dpif_xlate dbg
2016-06-14T05:02:13.155Z|10769|ofproto_dpif_xlate(x)|DBG|bridge br-ex: learned that 00:17:a4:77:10:2c is on port eth1 in VLAN 13
2016-06-14T05:02:13.155Z|10770|ofproto_dpif_xlate(x)|DBG|bridge br-ex: learned that 00:17:a4:77:10:2c is on port phy-br-ex in VLAN 13
● The loop on the hardware/switch was fixed to resolve this.
○ Beware: some bonding modes, or a misconfigured bonding setup, can exhibit the same problem.
19. Communication to an instance is very slow on a provider network
What are provider networks?
Provider networks enable direct communication from an instance to the external network.
● They allow you to add an instance directly to an external network.
○ The instance has the gateway IP of the external gateway.
● The compute node must be directly connected to the external network.
● The infra was set up to route packets to the external network via br-ex -> bond0.301 -> bond0 -> slaves.
● A vlan provider network was then created using:
# neutron net-create provider-vlan171 --provider:network_type vlan --router:external true --provider:physical_network physnet1 --provider:segmentation_id 171 --shared
● We ran tcpdump on the physical interface and found the packets were getting fragmented.
● Lowering the MTU to 1450 fixed the problem.
● But this is not a vxlan network; it is vlan. Is lowering the MTU the ultimate solution? Of course not!
● Let us explore the anatomy of the provider network to get to the bottom of it.
20. Communication to an instance is very slow on a provider network
Provider networks enable direct communication from an instance to the external network.
Let us see how a provider network works.
● The diagram shows the packet flow on a compute node: tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> int-br-ex -> phy-br-ex -> br-ex -> bond0.301 -> bond0 -> external network.
● When an outgoing packet reaches qvoxxx-x, ovs adds the internal vlan tag (10) associated with the provider network to the packet.
● When it reaches phy-br-ex, ovs strips the internal tag and adds the vlan tag (301) associated with the provider network.
● When the packet reaches bond0.301, it gets a vlan tag (301) added to the header again: the packet leaves carrying two tags.
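The arithmetic behind the resulting oversized frames can be checked directly. A sketch using standard Ethernet and 802.1Q sizes (excluding the FCS): each tag adds 4 bytes, so a full 1500-byte IP packet that fits a single-tagged link no longer fits once the second tag is added:

```shell
# Frame-size arithmetic for the double-tag case.
payload=1500; eth_hdr=14; vlan_tag=4
single=$((eth_hdr + vlan_tag + payload))        # what the link is provisioned for
double=$((eth_hdr + 2 * vlan_tag + payload))    # 4 bytes too big
echo "single tag: $single bytes"
echo "double tag: $double bytes (oversized: dropped or fragmented)"
```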
21. Solution: Add the plain interface to the ovs bridge, not the tagged interface
Avoid double vlan tagging
● The double vlan tag obviously causes the outgoing packet to exceed the MTU when it leaves the node.
● The solution is simple: add bond0 to the ovs bridge br-ex instead of bond0.301.
● This was an admin error: the admin was confused about how provider networks work and mixed in a doc that explains flat provider networks while configuring.
● But the troubleshooting was not that simple.
[Diagram: with bond0 attached to br-ex, the flow becomes tapxxx-x -> qbrxxx-x -> qvbxxx-x -> qvoxxx-x -> br-int -> int-br-ex -> phy-br-ex -> br-ex -> bond0 -> external network, and the frame carries a single vlan tag (301).]
23. Lessons learned
Some of the lessons learned while troubleshooting
● Collecting the prerequisite information to start troubleshooting is time consuming and confusing.
○ The compute node the instance runs on, the instance name, port details, the internal vlan tag on each node, etc.
● There are too many hops on which to run tcpdump for troubleshooting.
○ It is not easy to dump a patch peer; you need to mirror it to another port.
● Understanding the ovs topology is time consuming.
○ Can be mitigated significantly by using
● Do not assume neutron is always wrong.
○ It can be a user error, an OS issue, an issue with supporting services, or the neutron layer as well.
● Hunting for expertise in each of them is challenging.
● You may have to tread a lot of wrong paths before you get on the right track.
24. BREAKOUT SESSIONS - Thursday October 27th
● Anatomy Of OpenStack Neutron Through The Eagle Eyes Of Troubleshooters | Sadique Puthen | 9:00am-9:40am
● The Ceph Power Show :: Hands-on Lab to learn Ceph "The most popular Cinder backend" | Brent Compton, Karan Singh | 9:00am-10:30am
● Building self-healing applications with Aodh, Zaqar and Mistral | Zane Bitter, Lingxian Kong (Catalyst IT), Fei Long Wang (Catalyst IT) | 9:00am-9:40am
● Writing A New Puppet OpenStack Module Like A Rockstar | Emilien Macchi | 9:50am-10:30am
● Ambassador Community Report | Erwan Gallen, Kavit Munshi (Aptira), Jaesuk Ahn (SKT), Marton Kiss (Aptira), Akihiro Hasegawa (Bit-isle Equinix, Inc) | 9:50am-10:30am
● VPP: the ultimate NFV vSwitch (and more!)? | Franck Baudin, Uri Elzur (Intel) | 9:50am-10:30am
25. BREAKOUT SESSIONS - Thursday October 27th
● Zuul v3: OpenStack and Ansible Native CI/CD | James Blair | 11:00am-11:40am
● Container Defense in Depth | Thomas Cameron, Scott McCarty | 11:50am-12:30pm
● Analyzing Performance in the Cloud: solving an elastic problem with a scientific approach | Alex Krzos, Nicholas Wakou (Dell) | 11:50am-12:30pm
● One-stop-shop for OpenStack tools | Ruchika Kharwar | 1:50pm-2:30pm
● OpenStack troubleshooting: So simple even your kids can do it | Vinny Valdez, Jonathan Jozwiak | 1:50pm-2:30pm
● Solving Distributed NFV Puzzle with OpenStack and SDN | Rimma Iontel, Fernando Oliveira (VZ), Rajneesh Bajpai (BigSwitch) | 2:40pm-3:20pm
● Ceph, now and later: our plan for open unified cloud storage | Sage Weil | 2:40pm-3:20pm
26. BREAKOUT SESSIONS - Thursday October 27th
● How to configure your cloud to be able to charge your users using official OpenStack components! | Julien Danjou, Stephane Albert (Objectif Libre), Christophe Sauthier (Objectif Libre) | 2:40pm-4:10pm
● A dice with several faces: Coordinators, mentors and interns on OpenStack Outreach internships | Victoria Martinez de la Cruz, Nisha Yadav (Delhi Tech University), Samuel de Medeiros Queiroz (HPE) | 2:40pm-4:10pm
● Yo dawg I herd you like Containers, so we put OpenStack and Ceph in Containers | Sean Cohen, Sebastien Han, Federico Lucifredi | 3:30pm-4:10pm
● Picking an OpenStack Networking solution | Russell Bryant, Gal Sagie (Huawei), Kyle Mestery (IBM) | 4:40pm-5:20pm
● Forget everything you knew about Swift Rings - here's everything you need to know about Swift Rings | Christian Schwede, Clay Gerrard (Swiftstack) | 5:30pm-6:10pm