Among the different methods of implementing network virtualization, one approach is to create IP/MPLS-based Layer 2 and Layer 3 overlays between the compute servers. For NFV applications, where virtual network performance is one of the most important requirements, this presentation discusses how the OpenContrail vRouter implementation leveraged the Intel DPDK and ported the kernel-based distributed forwarding module to user space.
3. CONTRAIL (VENDOR-NEUTRAL) ARCHITECTURE
[Architecture diagram: a Contrail Controller and an Orchestrator (network plus compute/storage orchestration) sit above Linux hosts with hypervisors, each running a vRouter, interconnected over a physical IP fabric (no changes required to the fabric); a gateway connects the fabric to the Internet/WAN.]
Control Plane: BGP control plane (logically centralized, physically distributed Controller elements)
Config Plane: Bi-directional real-time message bus using XMPP
Multi-vendor VNFs can run on the same platform
Interoperates with different Orchestration systems
Integrates with
§ different Linux hosts,
§ multiple hypervisors, and
§ multi-vendor x86 servers
Multi-vendor SDN Gateway (any router that can talk BGP and the aforementioned tunneling protocols)
Data Plane: Overlay Tunnels (MPLSoGRE, MPLSoUDP, VXLAN); a header-layout sketch follows below
Automation: REST APIs to integrate with different Orchestration Systems
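To make the data-plane encapsulations concrete, the sketch below (written for this write-up, not taken from the Contrail code base) shows the header layouts behind the three overlay tunnel types. Field widths and well-known port/protocol numbers follow the published RFCs (GRE protocol 0x8847 for MPLS per RFC 4023, UDP destination port 6635 for MPLSoUDP per RFC 7510, 4789 for VXLAN per RFC 7348); the struct and helper names are illustrative.

/* Illustrative overlay header layouts; names are ours, values per the RFCs. */
#include <stdint.h>

/* One MPLS label stack entry: 20-bit label, 3-bit TC/EXP,
 * 1-bit bottom-of-stack, 8-bit TTL (written to the wire big-endian). */
struct mpls_hdr {
    uint32_t label_tc_s_ttl;
};

/* Basic GRE header carrying MPLS (protocol type 0x8847, MPLS unicast). */
struct gre_hdr {
    uint16_t flags_version;   /* 0 for basic GRE */
    uint16_t proto;           /* htons(0x8847) */
};

/* MPLS-over-UDP: UDP destination port 6635, MPLS label stack follows. */
#define MPLS_UDP_DPORT 6635

/* VXLAN header: UDP destination port 4789, 24-bit VNI. */
#define VXLAN_UDP_DPORT 4789
struct vxlan_hdr {
    uint8_t  flags;           /* 0x08 when a valid VNI is present */
    uint8_t  reserved1[3];
    uint8_t  vni[3];          /* 24-bit virtual network identifier */
    uint8_t  reserved2;
};

/* Helper: encode one MPLS label stack entry (host order; htonl before use). */
static inline uint32_t mpls_encode(uint32_t label, uint8_t tc,
                                   uint8_t bos, uint8_t ttl)
{
    return (label << 12) | ((uint32_t)(tc & 0x7) << 9) |
           ((uint32_t)(bos & 0x1) << 8) | ttl;
}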
5. NETWORK VIRTUALIZATION: LOGICAL / PHYSICAL
[Diagram, logical view (policy definition): three virtual networks, Green (G1-G3), Blue (B1-B3) and Yellow (Y1-Y3); traffic between them is governed by a Contrail security policy (firewall-like, e.g. allow only HTTP traffic) and by a Contrail policy with a firewall service. Physical view (policy enforcement): the VMs and virtualized network functions are pooled across hosts/hypervisors over the IP fabric (switch underlay); intra-network traffic flows directly, inter-network traffic traverses a service, and non-HTTP traffic is dropped by the security policy that allows only HTTP.]
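As a toy illustration of what "allow only HTTP traffic" means at the enforcement point, the sketch below (ours, not Contrail's policy engine, which looks flows up in its flow table and applies configured policy/security-group rules) classifies a flow by a simplified 5-tuple and admits only TCP traffic to or from port 80:

/* Toy policy check over an illustrative 5-tuple key. */
#include <stdbool.h>
#include <stdint.h>

struct flow_key {
    uint32_t src_ip, dst_ip;      /* host byte order for simplicity */
    uint16_t src_port, dst_port;
    uint8_t  proto;               /* 6 = TCP, 17 = UDP */
};

/* "Allow only HTTP traffic" between two virtual networks. */
static bool policy_allow_http_only(const struct flow_key *fk)
{
    return fk->proto == 6 && (fk->dst_port == 80 || fk->src_port == 80);
}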
6. vROUTER IN THE KERNEL
[Diagram: with the kernel vRouter, the vRouter forwarding module runs in host kernel space, while the Nova agent and the vRouter host agent run in host user space. The guest VM's application traffic leaves the guest through a virtio interface, is handled by vhost, and appears on the host as a tap interface (tap-xyz, a vif) that the QEMU layer connects to the vRouter.]
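For context on what the tap-xyz (vif) interface is, here is a minimal, self-contained sketch of how a tap device is created on Linux with the standard TUNSETIFF ioctl. This is the generic kernel tun/tap API (essentially what QEMU does per vif), not OpenContrail-specific code; the interface name is illustrative.

/* Create (or attach to) a tap interface named "tap-xyz". */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int open_tap(const char *name)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) {
        perror("open /dev/net/tun");
        return -1;
    }
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;    /* L2 frames, no packet-info header */
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        perror("TUNSETIFF");
        close(fd);
        return -1;
    }
    return fd;   /* read()/write() on fd now carries Ethernet frames */
}

int main(void)
{
    int fd = open_tap("tap-xyz");
    if (fd >= 0)
        close(fd);
    return 0;
}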
7. DPDK vROUTER IN THE USER SPACE
[Diagram: with the DPDK vRouter, the vRouter forwarding component (VRFWD) moves from the host kernel into host user space, linked against DPDK; the Nova agent and the vRouter host agent remain in user space. The application VM's guest interfaces (eth0, eth1) attach as vifs (shown here as TAP vifs) through the QEMU layer to the user-space vRouter.]
8. DPDK vROUTER ARCHITECTURE
[Diagram: on the compute node, the guest VM's virtio frontend and virtio ring connect to the user-space DPDK vRouter through vhost:]
• vHost-Net: kernel-space vhost (before QEMU 2.1)
• vHost-User: user-space vhost (QEMU 2.1, libvirt 1.2.7)
[The QEMU process (QEMU 2.2, one host process per VM) acts as the uvhost client and the DPDK vRouter runs the uvhost server; they exchange messages over a Unix socket once the VM is up, and the uvhost server assigns lcores to virtio interfaces based on those messages. The vRouter forwarding component (VRFWD) implements the virtio backend over memory mmap'ed from hugetlbfs (DPDK rings), created by the DPDK EAL (Environment Abstraction Layer). DPDK lcores are mapped 1:1 to the queues of the DPDK NIC and poll them. The vRouter agent (vnswad) programs the VRF, config and policy tables over netlink via the pkt0 interface, using a TCP connection that carries routes/nexthops/interfaces/flows. The vRouter is built on the DPDK 2.0 libraries.]
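The sketch below is a minimal, generic DPDK program written for this write-up (not the vRouter source) showing the pattern the diagram describes: the EAL initializes hugepage memory, a 64K-entry mbuf mempool is created, and each worker lcore is bound 1:1 to a NIC RX queue that it polls. It uses recent DPDK API names (e.g. RTE_LCORE_FOREACH_WORKER); the DPDK 2.x releases the slides refer to use slightly different names.

/* Minimal sketch: EAL init, 64K mbuf mempool, 1:1 lcore-to-RX-queue polling. */
#include <stdint.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NB_MBUFS   65536          /* "64k mempool for mbufs" from the slide */
#define BURST_SIZE 32

static struct rte_mempool *mbuf_pool;

/* Each forwarding lcore polls exactly one RX queue (queue id passed as arg). */
static int fwd_lcore_main(void *arg)
{
    uint16_t port = 0;                         /* assumes port 0 exists */
    uint16_t queue = (uint16_t)(uintptr_t)arg;
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        uint16_t nb = rte_eth_rx_burst(port, queue, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb; i++) {
            /* vRouter would do lookup / flow handling here; we just drop. */
            rte_pktmbuf_free(bufs[i]);
        }
    }
    return 0;
}

int main(int argc, char **argv)
{
    uint16_t port = 0, q, nb_queues;
    struct rte_eth_conf port_conf = { 0 };
    unsigned lcore_id;

    if (rte_eal_init(argc, argv) < 0)          /* EAL sets up hugepage memory */
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    nb_queues = rte_lcore_count() - 1;         /* one RX/TX queue per worker lcore */

    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NB_MBUFS, 256, 0,
                                        RTE_MBUF_DEFAULT_BUF_SIZE,
                                        rte_socket_id());
    if (mbuf_pool == NULL)
        rte_exit(EXIT_FAILURE, "mempool create failed\n");

    if (rte_eth_dev_configure(port, nb_queues, nb_queues, &port_conf) < 0)
        rte_exit(EXIT_FAILURE, "port configure failed\n");

    for (q = 0; q < nb_queues; q++) {
        rte_eth_rx_queue_setup(port, q, 512, rte_eth_dev_socket_id(port),
                               NULL, mbuf_pool);
        rte_eth_tx_queue_setup(port, q, 512, rte_eth_dev_socket_id(port), NULL);
    }
    rte_eth_dev_start(port);

    /* Launch one polling loop per worker lcore: queue i -> worker i. */
    q = 0;
    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(fwd_lcore_main, (void *)(uintptr_t)q, lcore_id);
        q++;
    }
    rte_eal_mp_wait_lcore();
    return 0;
}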
9. DPDK 2.1 vROUTER PWT (from NIC to Guest) for a DPDK VM
[Diagram: an Intel 82599 NIC feeds the DPDK vRouter, which runs as a multi-core process on CPU cores 0-3 and shares a 64K mbuf mempool; DPDK-VM1 (interface in virtual network VN1-Left, single virtio queue) and a non-DPDK VM2 attach through QEMU; vifs shown: 0/0 (physical NIC), 0/1 (vhost, host kernel), 0/2 (vRouter agent, pkt0), 0/3 and 0/4 (VM interfaces).]
STEPS:
1. For all traffic from the SDN Gateway, packets go from the Intel NIC to Core 0 for inner-IP lookup and hashing, because the NIC cannot hash on the inner header; it only sees the GRE header.
2. Core 0 hashes on the 5-tuple and dispatches the packet to Core 1, 2 or 3 (a sketch of this dispatch follows below).
3. Say Core 0 sends to Core 1. Core 1 performs the lookup, flow-table creation, rewrite and policy/SG enforcement, then sends the packet to DPDK-VM1.
* vRouter runs as a multi-core process that exists on all 4 cores. Packet processing scales out across multiple cores, so the performance of one VM is NOT limited by the performance of one core.
Note: the VM has a single queue that 3 lcores could be sending to.
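A minimal sketch of step 2's dispatch, under our own assumptions (not the vRouter implementation): the polling core computes a CRC hash over the inner 5-tuple and enqueues the mbuf onto a per-worker-core DPDK ring.

/* Sketch: hash the inner 5-tuple and hand the packet to a worker lcore. */
#include <stdint.h>
#include <rte_hash_crc.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define NB_WORKERS 3   /* cores 1..3 in the slide's example */

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* One software ring per worker core, created elsewhere with rte_ring_create(). */
extern struct rte_ring *worker_ring[NB_WORKERS];

static void dispatch_to_worker(struct rte_mbuf *m, const struct five_tuple *ft)
{
    /* CRC32-based hash over the 5-tuple, as DPDK applications commonly do. */
    uint32_t h = rte_hash_crc(ft, sizeof(*ft), 0);
    unsigned worker = h % NB_WORKERS;

    if (rte_ring_enqueue(worker_ring[worker], m) != 0)
        rte_pktmbuf_free(m);   /* worker ring full: drop */
}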
PERF (cache-misses): gives information per physical core
VIF: gives performance data per logical core (lcore) number
10. DPDK 2.1 vROUTER PWT (from Guest to NIC) for a DPDK VM
[Diagram: the same layout as the previous slide, with traffic now flowing from DPDK-VM1 through the multi-core DPDK vRouter (CPU cores 0-3, shared 64K mbuf mempool) out through the Intel 82599 NIC; the same vifs (0/0-0/4) apply.]
STEPS:
1. For return traffic, the VM sends to Core 0, Core 1, Core 2 or Core 3. This is decided by vRouter and is unknown to the VM.
2. Say VM1 sends to Core 3. Core 3 hashes on the 5-tuple and dispatches to Core 0, Core 1 or Core 2.
3. Say Core 3 sends to Core 1. Core 1 does all the packet handling: lookup, flow-table creation, rewrite and policy/SG enforcement, then sends the packet to the Intel NIC (see the transmit sketch below).
* vRouter runs as a multi-core process that exists on all 4 cores. Packet processing scales out across multiple cores, so the performance of one VM is NOT limited by the performance of one core.
Note: the VM has a single queue that 3 cores could be sending to.
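A minimal sketch of the final transmit step in step 3, again under our own assumptions (not vRouter code): the worker core drains its software ring and bursts the packets out of its own NIC TX queue.

/* Sketch: a worker lcore drains its ring and transmits on its TX queue. */
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define TX_BURST 32

static void drain_and_transmit(struct rte_ring *my_ring,
                               uint16_t port, uint16_t tx_queue)
{
    struct rte_mbuf *pkts[TX_BURST];
    unsigned n = rte_ring_dequeue_burst(my_ring, (void **)pkts, TX_BURST, NULL);
    if (n == 0)
        return;

    uint16_t sent = rte_eth_tx_burst(port, tx_queue, pkts, (uint16_t)n);
    while (sent < n)                        /* free anything the NIC didn't take */
        rte_pktmbuf_free(pkts[sent++]);
}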
PERF (cache-misses): gives information per physical core
VIF: gives performance data per logical core (lcore) number
11. vROUTER PERFORMANCE
                 TCP throughput between VMs   Packets-per-second between VMs
Kernel vRouter   9.1 Gbps                     500K
DPDK vRouter     9.1 Gbps                     > 12M
• TCP segmentation/Rx offload support implemented with DPDK in the upcoming Contrail release (4.0); a sketch of how TSO is requested from DPDK follows below
• Performance is the same as the kernel vRouter if the VM does not use DPDK
• Multi-queue virtio is supported with both the kernel and the DPDK vRouter
• Offloading vRouter packet processing to a SmartNIC is planned for an upcoming release
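For illustration only, here is the generic DPDK mechanism for TCP segmentation offload: the port is configured with the TSO TX offload and each large mbuf is marked for segmentation. The flag and field names are from the long-standing DPDK API (renamed in very recent releases); this is not Contrail 4.0 code.

/* Sketch: requesting TSO from a DPDK port and marking an outgoing mbuf. */
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static struct rte_eth_conf port_conf;

static void enable_tso(uint16_t port, uint16_t nb_queues)
{
    port_conf.txmode.offloads = DEV_TX_OFFLOAD_TCP_TSO |
                                DEV_TX_OFFLOAD_IPV4_CKSUM |
                                DEV_TX_OFFLOAD_TCP_CKSUM;
    rte_eth_dev_configure(port, nb_queues, nb_queues, &port_conf);
}

/* Called per large packet before rte_eth_tx_burst(). */
static void mark_for_tso(struct rte_mbuf *m, uint16_t l2_len,
                         uint16_t l3_len, uint16_t l4_len, uint16_t mss)
{
    m->l2_len = l2_len;
    m->l3_len = l3_len;
    m->l4_len = l4_len;
    m->tso_segsz = mss;                 /* segment size the NIC should cut to */
    m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IPV4 |
                   PKT_TX_IP_CKSUM | PKT_TX_TCP_CKSUM;
}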
15. SERVICE CHAINING
[Diagrams: R1/R2 are workloads in Virtual Network Red, G1/G2 in Virtual Network Green, and SVC 1/2/3 VMs are service instances. A service policy ("for all traffic between VN-Red and VN-Green use the SFC") steers inter-network traffic through the chain. Four variants are shown:]
• Multiple services in a service chain (e.g. SVC 1 VM -> SVC 2 VM)
• Multiple service chains between two networks, with policy-based service chaining (e.g. for a particular 5-tuple use SFC 1, else use SFC 2)
• Multiple service instances: scale-out services (active-active HA)
• Service instances with active-backup HA (active-backup services)