Tutorial at ONUG Spring 2015 on Network and Service Virtualization. The tutorial covers three converging trends: 1) network virtualization, 2) service virtualization, and 3) overlay networking for Docker and OpenStack. The talk concludes with pointers to the hands-on portion of the tutorial, which uses LorisPack, and with the operational lessons learned.
4. Application Rollout Today
• Poor automation for VLANs, service contexts, and VRFs
• Poor legacy application design?
[Diagram: three-tier application — Web tier, Application tier, Database tier]
5. Typical Data Center Design
[Diagram: core/aggregation/edge data center fabric connecting racks that host application groups A and B]
6. Problem: Network not ready for VMs
Over 70% of today’s servers are Virtual Machines, but VMs are not treated as first-class citizens by the network
‒ East-west traffic poorly managed
‒ Lack of prioritization and rate-limiting at VM level
‒ Traffic between VMs on same server often unsupervised
‒ IP/MAC overlap not allowed, and addressing limited by VLANs
[Diagram: hosts running many VMs and containers]
Symptoms of a broader problem: lack of proper network abstractions and policy layering
7. Solution: SDN and NFV
Business benefit → How?
• Reduced time to revenue → Speed-up of service provisioning
• OpEx savings → Automated operations and easier management of resources
• New revenue → New business models centered around on-demand usage
• Feature velocity → Introduce changes quickly according to business logic needs
• Improved policy compliance → Ensure that cloud workloads comply with enterprise policies (e.g., access control)
• Reduced OpEx during upgrades → Introduce new functions and services by replacing just the software stack
9. Network Virtualization Requirements
• Integration with legacy network: support bare-metal servers, appliances and gateways
• Traffic isolation across virtual networks: VLAN, VxLAN, GRE support, allowing IP overlap across tenants
• End-to-end visibility of VM traffic: edge-based control of VM traffic and scalable host tracking
• Troubleshooting support: end-to-end visibility that maps virtual to physical scalably
• Orchestrating virtual L4-L7 services: provisioning and chaining of virtual services
• Application policy: application-level policy across and within virtual networks
10. Trend #2: Service Virtualization
Step 1. Virtualizing network functions
Step 2. Chaining/Stitching them
11. NFV in Data Centers
1. Virtualizing the L4-L7 network service appliance (e.g., load-balancer)
2. Chaining services to ensure that traffic is routed through virtual appliances
3. Optimizing service delivery for applications
• Increasing number of virtual appliances
• Increasing CPU or memory of each appliance
• Placement of virtual appliances
• Offloading certain tasks to NIC or switch
[Diagram: compute, orchestration and SDN control layers; which pieces are open-source?]
15. Deployment mode #1: Underlay
[Diagram: a controller cluster (CLI, REST, GUI) performs custom routing over the physical fabric, with VPN termination and L3 routing toward the Internet; VMs attach directly to the fabric, and overlapping IP/MAC addresses appear across tenants]
• VNet identified using VLANs, VxLANs or GRE
• Tenant membership decided based on the {switch-port, MAC, IP} tuple in each flow
16. Performance Limitations
• Problem: SDN switches have resource limitations
‒ Weak CPUs incapable of traffic summarization, frequent statistics reporting, and packet marking
‒ Flow-table limitations in switches (e.g., 1500 exact-match entries)
‒ Switch-controller communication limits (e.g., 200 packet_in/sec)
‒ Firmware does not always expose the full capabilities of the chipset
• Solution:
‒ Next generation of hardware customized for OpenFlow
‒ New TCAMs with larger capacity
‒ Intelligent traffic aggregation
‒ Minimal offloading to vSwitches
17. Deployment mode #2: Overlay
[Diagram: a controller cluster (CLI, REST, GUI) programs a virtual data plane (vDP) in each host; tunnels between the vDPs carry tenant traffic over the legacy L2-switched/L3-routed fabric, with logical links to a virtual/physical gateway toward the Internet]
• vDP: Virtual Data Plane
• VM addressing is masked from the fabric by the tunnels
• Tenant membership is decided by the virtual interface on the vSwitch
18. VxLAN Tunneling
• Between VxLAN Tunnel End Points (VTEP) in each host server
• UDP source port allows better ECMP hashing
• In the absence of an SDN control plane, IP multicast is used for layer-2 flooding (broadcasts, multicasts and unknown unicasts)
Packet format: [Outer MAC (VTEP) | Outer IP | Outer UDP (source port, VxLAN dest port, UDP length, checksum) | VxLAN header (flags, reserved, 24-bit VNI, reserved) | Original L2 packet]
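A minimal sketch of setting up such a tunnel with Open vSwitch (not from the slides; the bridge and port names are arbitrary, and the peer IP is assumed):
ovs-vsctl add-br br-tun                      # bridge that will carry tunneled traffic
ovs-vsctl add-port br-tun vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.168.56.102 options:key=flow   # key=flow lets flow rules pick the 24-bit VNI
Repeat on the peer host with remote_ip pointing back, and the two VTEPs can exchange encapsulated L2 frames.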
19. Performance Limitations
• Problem:
‒ Overlay mode is CPU hungry at high line rates and has anecdotally fared poorly in the real world
• Solution:
‒ Offload it to the top-of-rack leaf switch
‒ Use a hardware gateway
Measured tunneling overhead:
• Linux Bridge: 9.3 Gbps throughput, 85% recv-side CPU, 75% send-side CPU
• OVS Bridge: 9.4 Gbps throughput, 82% recv-side CPU, 70% send-side CPU
• OVS-STT: 9.5 Gbps throughput, 70% recv-side CPU, 70% send-side CPU
• OVS-GRE: 2.3 Gbps throughput, 75% recv-side CPU, 97% send-side CPU
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/
20. Deployment mode #3: Hybrid
• Combined overlay and underlay (fabric) to achieve:
‒ end-to-end visibility
‒ complete control
‒ best mix of both worlds
• Also called P+V or Overlay-Underlay
‒ Vendors are converging towards this architecture
• The integration may need 1) link-local VLANs, or 2) integration with the VM manager to detect VM profiles
21. Deployment mode #3: Hybrid
• Decoupling elements inside the overlay and converging with the underlay to get the best of both worlds
• Current mode:
[Diagram: hosts A, B and C joined by a VxLAN overlay, with a VLAN segment at each host]
22. Deployment mode #3: Hybrid
• Traffic leaving the host carries a VLAN tag
• The VLAN + source MAC is mapped to a VxLAN
• Future mode:
[Diagram: hosts A, B and C run a distributed virtual switch or a VLAN-based overlay and connect over VLAN trunks; a controller and a VTEP manager (fed by OpenStack or vCenter) map those VLANs to VxLAN]
24. Typical Deployment Mode is Overlay
• vNF: Virtualized Network Function
• Services can be single-tenanted or multi-tenanted
[Diagram: compute, network and service controllers (driven via CLI, REST, GUI) steer overlay traffic from source VMs to a vFirewall, to a VIP, and on to the destination VM through pools of vNF instances]
25. Service Type: Stateful and Stateless
• Stateless service: no additional appliance needed; the vSwitch (OVS) changes the header and forwards to a specific VM. Typically stateless load-balancing and distributed access control.
• Stateful service: a virtual function deployed in a VM or container; traffic is proxied to that specific VM. Typically stateful LB, intrusion detection and SSL termination.
[Diagram: two hosts running OVS; traffic to one VIP is handled in the vSwitch itself (stateless), while traffic to the other VIP is proxied through a service VM (stateful)]
26. Service Scaling: Scale-out and Scale-up
• Scale-out:
‒ Deploy more network function instances
‒ Scale-out of workload is also necessary
• Scale-up:
‒ Give more resources to each network function instance
‒ Offloading simple tasks to vSwitch, pSwitch or pAppliance
27. Combined Solution
[Diagram: an orchestration layer (OpenStack) and a controller tie together compute, network plumbing (L3 spine, leaf, VTEP), DC network virtualization, L2-L7 service orchestration (service rollout and chaining), policy/QoS, troubleshooting and UI/analytics; VMs behind OVS on each host serve traffic destined to VIPs]
29. OpenStack Platform
• Most common platform for standardizing open APIs for networking, on which vendors innovate
• Neutron: high-level abstractions for creating and managing tenant virtual networks
‒ Flat L2 connectivity across DC
‒ DHCP enabled IP addressing
‒ Floating-IP (for outside-in access)
‒ L3 subnets and routers
‒ Gateway and VPN
‒ Load-balancer service
‒ Security groups
‒ ….
30. OpenStack API
Typical workflow:
1. Create a network
2. Associate a subnet with the network
3. Boot a VM and attach it to the network
4. Delete the VM
5. Delete any ports
6. Delete the network
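A hedged CLI sketch of this workflow using the 2015-era python-neutronclient and python-novaclient (names such as net1, subnet1 and vm1 are placeholders):
neutron net-create net1                                                    # step 1: create a network
neutron subnet-create net1 10.10.0.0/24 --name subnet1                     # step 2: associate a subnet
nova boot --image ubuntu --flavor m1.small --nic net-id=<net1-uuid> vm1    # step 3: boot a VM on the network
nova delete vm1                                                            # step 4: delete the VM
neutron port-delete <port-uuid>                                            # step 5: delete any leftover ports
neutron net-delete net1                                                    # step 6: delete the network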
[Diagram: layered architecture. Orchestration invokes the north-bound Neutron API; the ML2 plugin (with a vendor mechanism driver, or a vendor's custom API) drives an SDN controller running a network virtualization app; the controller programs the dataplane elements (vSwitches and pSwitches) via south-bound APIs such as OpenFlow and OVSDB]
31. Basic Technology for OpenStack Networking
• Namespace: containerized networking at the process level, managed under /proc; primarily used to create isolated contexts (e.g., per-tenant routers and DHCP servers)
• dnsmasq: open-source DNS/DHCP agent run on every host
• Linux Bridge: L2/MAC-learning switch built into the kernel, used for forwarding
• Open vSwitch: advanced programmable bridge that supports tunneling
‒ ovs-vsctl used to configure the bridge
‒ ovs-ofctl used to configure the forwarding flow rules
• NAT: network address translators are intermediate entities that translate IP addresses + ports (types: SNAT, DNAT)
• iptables: kernel policy engine used for managing packet forwarding, firewall and NAT features
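A few illustrative commands for poking at these building blocks (a sketch, not from the slides; the namespace name is made up):
ip netns add blue                      # create a network namespace
ip netns exec blue ip addr             # run a command inside it
ip netns list                          # OpenStack creates qrouter-* and qdhcp-* namespaces this way
iptables -t nat -L -n                  # inspect the SNAT/DNAT rules that iptables manages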
33. OpenStack OVS Networking Agents
• Basic free OpenStack software includes:
‒ OVS plugin that runs as a mechanism driver in the Neutron server, and
‒ OVS agent that runs in both the network and compute nodes
‒ No OpenFlow controller; just wrappers around the ovs-vsctl and ovs-ofctl CLIs
[Diagram: the Neutron server (with the OVS mech driver) and the Horizon UI on the controller talk over RPC on the management network to OVS agents on the network and compute nodes; each agent drives its local OVS via ovs-*ctl, and the nodes carry VM traffic over the data network]
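Roughly the kind of plumbing the agent performs through those wrappers (a sketch, not the agent's literal commands; the tap name is hypothetical):
ovs-vsctl add-br br-int                               # integration bridge that VM taps plug into
ovs-vsctl add-br br-tun                               # tunnel bridge for VxLAN/GRE toward other nodes
ovs-vsctl add-port br-int tap1234                     # attach a VM's tap interface
ovs-ofctl add-flow br-int priority=1,actions=normal   # install a default forwarding rule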
34. Distributed Virtual Routing
• Key feature that reduces bottlenecks at the network node
• View of one tenant's routing through namespaces
[Diagram: the compute node hosts a qrouter namespace (qr, qg and rfp interfaces) attached to br-int/br-tun, giving floating-IP VMs direct public access via br-ex/eth1; traffic from non-floating-IP VMs still goes through the snat namespace on the network node for outbound access]
35. OpenDaylight Controller
• Vendor-driven consortium (with Cisco, Brocade, and others) for developing an open-source SDN controller platform
36. OpenStack Networking in OpenDaylight
• Overlay-based OpenStack networking supported today
• All required features offered using Open vSwitch programming
38. Linux Containers
• Over the past few years, LXC emerged as an alternative to VMs for running workloads on hosts
• Each container looks like a clone of the host OS, sharing its kernel with its own root filesystem
• Docker brought Linux containers to prominence
‒ Tracks application configuration and optionally archives images to Docker Hub
[Diagram: containers 1-3, each with a guest root filesystem running apps X, Y and Z on a shared host OS]
39. Docker
• Excellent way to track application dependencies and configuration in a portable format.
• For instance, the Dockerfile below can be used to spawn a container with an nginx LB accessible at a host port:
$ docker build -t nginx .
$ docker images
$ docker run -d -P --name=nginx1 nginx
$ docker ps
$ docker inspect nginx1
# Pull base image.
FROM dockerfile/ubuntu

# Install Nginx.
RUN \
  add-apt-repository -y ppa:nginx/stable && \
  apt-get update && \
  apt-get install -y nginx && \
  rm -rf /var/lib/apt/lists/* && \
  echo "\ndaemon off;" >> /etc/nginx/nginx.conf && \
  chown -R www-data:www-data /var/lib/nginx

# Define mountable directories.
VOLUME ["/etc/nginx/sites-enabled", "/etc/nginx/certs", "/etc/nginx/conf.d", "/var/log/nginx"]

# Define working directory.
WORKDIR /etc/nginx

# Define default command.
CMD ["nginx"]

# Expose ports.
EXPOSE 80
EXPOSE 443
40. Networking Still in Early Stages
Today, Docker usage is predominantly within a single laptop or host. The default network settings below were allocated to the nginx container we spawned. But folks are exploring connecting containers across hosts.
"NetworkSettings": {
"Bridge": "docker0",
"Gateway": "172.17.42.1",
"IPAddress": "172.17.0.15",
"IPPrefixLen": 16,
"MacAddress":
"02:42:ac:11:00:0f",
"PortMapping": null,
"Ports": {
"443/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "49157"
}
],
"80/tcp": [
{
"HostIp": "0.0.0.0",
"HostPort": "49158"
}
]
}
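To extract just these fields, docker inspect accepts a Go-template --format argument (a sketch for the container created earlier):
docker inspect --format '{{ .NetworkSettings.IPAddress }}' nginx1
docker inspect --format '{{ .NetworkSettings.Ports }}' nginx1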
41. Many ways to network in Docker
• Many of these are similar to what we can do with a VM (except the Unix-domain socket method of direct access)
[Diagram: containers A-F on one host attached via different mechanisms]
• Direct host networking
• Unix-domain sockets and other IPC
• docker0 Linux bridge
• Docker proxy and port mapping (using iptables)
• Open vSwitch
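Two of these options as one-liners (a sketch; the image name and host port are arbitrary):
docker run -d -p 8080:80 --name=web1 nginx     # docker0 bridge + proxy/iptables port mapping to host port 8080
docker run -d --net=host --name=web2 nginx     # direct host networking: the container shares the host's stack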
42. Mechanisms for Multi-Host Networking
• Option 1: Flat IP space (at the container level) with routing (and possibly NAT) done by the host
‒ Step 1: Assign a /24 subnet CIDR to each host for its containers
‒ Step 2: Set up ip routes to ensure traffic to external subnets leaves from the host interface (e.g., eth0)
• Option 2: Create an overlay network
‒ Step 1: Create a parallel network for cross-host communication
‒ Step 2: Connect hosts in the cluster using encapsulation tunnels
‒ Step 3: Plug containers into the appropriate virtual networks
43. Option 1: Flat IP space
Step 1: Choose CIDR wisely when starting Docker daemon
Step 2: Add static routes to other containers’ subnets
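For Step 1, the per-host container CIDR can be set when starting the daemon; a sketch using the Docker 1.x --bip daemon flag, with CIDRs matching the figure below:
docker -d --bip=172.17.42.1/24     # on Host 1
docker -d --bip=172.17.43.1/24     # on Host 2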
[Diagram: Host 1 (eth0 192.168.50.16) runs nginx1 (172.17.42.18) and bash1 (172.17.42.19) behind its docker0 bridge (172.17.42.1); Host 2 (eth0 192.168.50.17) runs nginx2 (172.17.43.18) and bash2 (172.17.43.19) behind its docker0 bridge (172.17.43.1). Docker manages these address allocations.]
On Host 1: route add -net 172.17.43.0/24 gw 192.168.50.17
On Host 2: route add -net 172.17.42.0/24 gw 192.168.50.16
Quiz: What IP address do packets on the wire have?
(NAT rules are already in place to masquerade internal IP addresses.)
44. Option 2: Open vSwitch based Overlay
• Suggestion: create a parallel network that decouples container networking from the underlying infrastructure
[Diagram: Host 1 (192.168.50.16) and Host 2 (192.168.50.17) each run docker0 plus an Open vSwitch bridge; VxLAN tunnels interconnect the OVS bridges (and other cluster hosts), while containers such as nginx1/bash1 and ContainerX/ContainerY attach to the overlay]
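One way to plug a container into such an overlay (a sketch; the ovs-docker helper script ships with Open vSwitch, and the bridge, interface and address below are assumptions):
ovs-vsctl add-br br-ovs
ovs-vsctl add-port br-ovs vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.168.50.17
ovs-docker add-port br-ovs eth1 nginx1 --ipaddress=10.0.0.1/24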
45. Container and VM networking unified
• Edge-based overlays are even more important in the container world
• Open vSwitch already supports network namespaces
• VxLAN provides:
‒ isolation,
‒ improved L2/L3 scalability,
‒ overlapping MAC/IP addresses
[Diagram: a VxLAN-tunneled network spans hosts running the Docker Engine with OVS-attached containers and hosts running OVS-attached VMs managed by OpenStack (Neutron OVS agent); the container-side orchestration is still an open question]
47. Goal for Tutorial: Preview of Microsegmentation using VxLAN
• In this tutorial exercise, we will use the LorisPack toolkit, which makes it easy to create the parallel network and isolate container communication into its own pod/group
• Desired end goals:
1. Containers isolated into two virtual networks
2. c1 cannot access containers in a different virtual network
3. c1 can have an overlapping IP address
• Inter-host communication uses VxLAN encapsulation
[Diagram: Host 1 runs c1 (10.10.0.1) and c2 (10.10.0.1); Host 2 runs c3 (10.10.0.3) and c4 (10.10.0.4); c1 and c3 are in Virtual Network 1, c2 and c4 in Virtual Network 2, and cross-network access is blocked]
48. Setup 1: Installation
• Bring up two Linux VMs (preferably Ubuntu on VirtualBox) on your laptop with the following installed:
‒ Open vSwitch (version 2.1+)
‒ Docker (version 1.5+)
‒ LorisPack (git clone https://github.com/sdnhub/lorispack)
• The VMs should have a host-only adapter added as a second interface (eth1) so that they can communicate with each other.
• In my case,
‒ VM1 IP is 192.168.56.101
‒ VM2 IP is 192.168.56.102
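A quick sanity check of the prerequisites on each VM (a sketch):
ovs-vsctl --version     # expect 2.1 or newer
docker --version        # expect 1.5 or newer
ip addr show eth1       # confirm the host-only adapter is up with its 192.168.56.x address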
49. Setup 2: Docker and networking
On VM 192.168.56.101, we run:
# docker run --name c1 -dit ubuntu /bin/bash
# docker run --name c2 -dit ubuntu /bin/bash
# loris init
# loris cluster 192.168.56.102
# loris connect c1 10.10.0.1/24 1
# loris connect c2 10.10.0.1/24 2

On VM 192.168.56.102, we run:
# docker run --name c3 -dit ubuntu /bin/bash
# docker run --name c4 -dit ubuntu /bin/bash
# loris init
# loris cluster 192.168.56.101
# loris connect c3 10.10.0.3/24 1
# loris connect c4 10.10.0.4/24 2
50. Port Configuration
• Verify the Open vSwitch configuration that connects the two nodes with VxLAN and connects the two containers to the OVS.
# sudo ovs-vsctl show
873c293e-912d-4067-82ad-d1116d2ad39f
    Manager "pssl:6640"
    Bridge "br0"
        Port "br0"
            Interface "br0"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "tap3392"
            tag: 1000
            Interface "tap3392"
        Port "tap3483"
            tag: 1001
            Interface "tap3483"
    Bridge "br1"
        Controller "pssl:6634"
        Port "vxlanc0a83866"
            Interface "vxlanc0a83866"
                type: vxlan
                options: {in_key=flow, out_key=flow, remote_ip="192.168.56.102"}
        Port "br1"
            Interface "br1"
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    ovs_version: "2.3.90"

Notes: br0 is equivalent to OpenStack's br-int and br1 to br-tun; the tap ports (tags 1000 and 1001) attach containers c1 and c2, and "vxlanc0a83866" is the VxLAN tunnel port.
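To see the flow rules that implement the isolation over the tunnel (a sketch; bridge names as in the output above):
ovs-ofctl dump-flows br0
ovs-ofctl dump-flows br1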
51. Microsegmentation in Effect
• In our setup, we can verify reachability between containers using ping
• We observe that c1 is able to access c3, but not c4
• We observe that c4 is able to access c2 despite the IP overlap
[Diagram: VM 1 (192.168.56.101) hosts c1 (10.10.0.1) and c2 (10.10.0.1); VM 2 (192.168.56.102) hosts c3 (10.10.0.3) and c4 (10.10.0.4); cross-network traffic is blocked]

On VM 1:
# docker attach c1
root@c1:/# ping 10.10.0.3
Success!
root@c1:/# ping 10.10.0.4
Fails!

On VM 2:
# docker attach c4
root@c4:/# ping 10.10.0.1
Success!
55. Debugging is a Challenge
Symptom: Two VMs are unable to contact each other
• Plausible reason: improper subnet and access-control policies
‒ Check: run neutron client commands and verify the configuration
‒ Check: iptables -L -t nat rules on both compute nodes
‒ Check: ping from the VMs and inspect tcpdump
• Plausible reason: VM networking not configured right
‒ Check: neutron-debug ping-all, ssh
Symptom: Traffic from a VM is not reaching outside
• Plausible reason: DHCP failed because the subnet's dnsmasq is not accessible or is down
‒ Check: IP assignment and gateway in the VM
‒ Check: neutron-debug dhcping
• Plausible reason: network node inaccessible from the compute node
‒ Check: br-tun with ovs-vsctl to verify the VxLAN or GRE tunnels
• Plausible reason: S-NAT router in the network node misbehaving
‒ Check: router configuration in OpenStack
‒ Check: router namespace using ip netns exec <id> route -n
56. Debugging is a Challenge
Symptom: Traffic from outside is not reaching the VM
• Plausible reason: floating-IP not added to the VM
‒ Check: floating-IP assignment
• Plausible reason: NAT rules lost from the compute node
‒ Check: NAT rules on each compute node
• Plausible reason: DVR in the compute node misbehaving
‒ Check: router configuration in OpenStack
‒ Check: router namespace using ip netns exec <id> route -n
‒ Check: whether the VM can be pinged from within the router namespace (ip netns exec)
• Plausible reason: MTU is not set correctly in the network
‒ Check: run iperf -m between endpoints to check the effective MTU, and check all interfaces
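For the MTU symptom in particular, a quick check (a sketch; the peer address is a placeholder):
iperf -s                       # on one VM
iperf -c <peer-ip> -m          # on the other; -m reports the observed MSS/MTU
ip link show eth0              # compare interface MTUs along the path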
ping, tcpdump, ip netns, iptables, ovs-vsctl, ovs-ofctl, neutron-debug, and the neutron client will haunt your dreams!
57. Production Challenges: OpenStack
• The open-source version of OpenStack has challenges going to production without vendor support
‒ Overlay and underlay integration not available
‒ Lacks high availability for the agents
‒ Analytics, metering and other operational tools are immature
‒ Debugging is a tricky art
58. Production Challenges: Docker
• Similar challenges plague Docker networking too. In addition:
‒ A fast-evolving, overwhelming ecosystem of cute-sounding DevOps tools that is going through "natural selection"
‒ Storage and networking are second-order problems
[Diagram: ClusterHQ's approach to migrating containers (e.g., nginx) across hosts]
60. Summary
• SDN brings the operational goodness of the computing world to the networking world.
• Looking at service virtualization separately is not wise; we recommend a joint evaluation.
• Architectures vary, but network policy is increasingly compiled down from higher-level abstractions.
• VM and container networking work with similar network abstractions
‒ but at different scale and velocity
‒ Docker and OpenStack networking are fairly similar
• Edge-based overlay intelligence using Open vSwitch is powerful.
Note: You can use Neutron + OVS to manage VLANs without requiring commercial software.
The set of plugins included in the main Neutron distribution and supported by the Neutron community includes:
Open vSwitch Plugin
Cisco UCS/Nexus Plugin
Linux Bridge Plugin
Modular Layer 2 Plugin
Nicira Network Virtualization Platform (NVP) Plugin
Ryu OpenFlow Controller Plugin
NEC OpenFlow Plugin
Big Switch Controller Plugin
Cloudbase Hyper-V Plugin
MidoNet Plugin
Brocade Neutron Plugin
PLUMgrid Plugin
Additional plugins are available from other sources:
OpenContrail Plugin
Extreme Networks Plugin
Ruijie Networks Plugin
Mellanox Neutron Plugin
Juniper Networks Neutron Plugin