1. Accelerate networking innovation through programmable data plane
Removing switches from datacenters with TRILL/VNT and smartNIC
Ahmed Amamou, ahmed@gandi.net
Benoît Ganne, bganne@kalray.eu
2. •Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
•We provide both
–IaaS: Infrastructure As A Service
–PaaS: Platform As A Service
•We support the open source community:
–Provide open source code: https://github.com/Gandi
–Support open source projects: VLC, Debian, … *
* See http://www.gandi.net/supports/ for the full list
Who is Gandi?
3. New network challenges for IaaS
•Cisco Forecast report*:
–Cloud traffic was about 3.3 zettabytes (10^21 bytes) in 2013
–Cloud traffic will reach 6.6 zettabytes in 2016
–76% of cloud traffic is East-West (within the same datacenter)
A high density of links within a datacenter is needed
•Customers need full network access
–Should be isolated
– VM network configuration should not be restrictive
Overlaying tenant traffic should be considered
* Cisco Global Cloud Index Forecast and Methodology, 2011-2016.
4. •New protocols have been proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT…), but:
– Hardware integration is slow
– Protocol extensions are hard to integrate
•We believe the OpenCompute community can help us
–To define an open, vendor-neutral API for a programmable data plane
–To bring open hardware that fulfils those needs
Why OpenCompute?
5. •Switch from a classic datacenter architecture to a full-mesh one
•Upgrade hardware to improve performance
New datacenter architecture
6. TRILL @Gandi
•Gandi has used commodity hardware as TRILL RBridges since 2013
•We have not yet found hardware that suits our needs
7. •Layer 2 routing protocol
•Uses a control plane and a data plane
•Control plane: based on IS-IS, which computes all routing information
•Data plane: forwards packets using the information provided by the control plane
•Uses MAC-in-MAC encapsulation
TRILL: TRansparent Interconnection of Lots of Links
[Figure: the original payload encapsulated behind a TRILL header]
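To make the encapsulation concrete, here is a minimal C sketch that decodes the fixed part of a TRILL header from a raw frame. The field layout and the 0x22F3 Ethertype follow RFC 6325; the names `trill_hdr`, `rd16` and `trill_parse` are illustrative, and the sketch assumes there is no outer 802.1Q tag, so the Ethertype sits at byte 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_TRILL 0x22F3u /* TRILL Ethertype (RFC 6325) */

/* Decoded fixed part of a TRILL header (6 bytes on the wire). */
struct trill_hdr {
    uint8_t  version;      /* 2 bits */
    uint8_t  multi_dest;   /* 1 bit, set for multi-destination frames */
    uint8_t  op_length;    /* 5 bits, options length in 4-byte units */
    uint8_t  hop_count;    /* 6 bits */
    uint16_t egress_nick;  /* Egress RBridge nickname */
    uint16_t ingress_nick; /* Ingress RBridge nickname */
};

/* Read a big-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parse the TRILL header that follows the outer Ethernet header.
 * Assumption: no outer 802.1Q tag, so the Ethertype is at byte 12.
 * Returns 0 on success, -1 if the frame is too short or not TRILL. */
int trill_parse(const uint8_t *frame, size_t len, struct trill_hdr *out)
{
    assert(frame != NULL && out != NULL);
    if (len < 20 || rd16(frame + 12) != ETHERTYPE_TRILL)
        return -1;
    uint16_t w = rd16(frame + 14);       /* V | R | M | Op-Length | Hop Count */
    out->version      = (w >> 14) & 0x3;
    out->multi_dest   = (w >> 11) & 0x1;
    out->op_length    = (w >> 6) & 0x1F;
    out->hop_count    = w & 0x3F;
    out->egress_nick  = rd16(frame + 16);
    out->ingress_nick = rd16(frame + 18);
    return 0;
}
```

The original Ethernet frame starts after these 6 bytes plus any options counted by `op_length`.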
8. TRILL benefits
                Switching (L2)   Routing (L3)   TRILL
Configuration   Minimal          Intensive      Minimal
Plug & play     Yes              No             Yes
Discovery       Automatic        Configured     Automatic
Learning        Automatic        Configured     Automatic
Multipath       No               Yes            Yes
Convergence     Slow             Fast           Fast
Connectivity    Inflexible       Flexible       Flexible
Scale           Limited          Large          Large
10. Multitenancy: Virtual Network over TRILL (VNT)
New cloud architectures have to take multitenancy into account
TRILL does not provide mechanisms for handling multitenancy
→ We need to extend it
11. •Update both the control and data planes
–Control plane: prune the multicast tree to limit multicast traffic
–Data plane: forwarding is conditional on VNI support
VNT vs TRILL
VNT Encapsulation
[Figure: VNT frame layout. Outer destination MAC address | outer source MAC address | optional outer IEEE 802.1Q tag | TRILL header (egress RBridge nickname, ingress RBridge nickname: L2 routing information) | VNT header extensions (options description, TLV, 24-bit VNI tag: tenant identification) | original Ethernet frame (original packet payload)]
Publication:
Amamou, A., Haddadou, K., & Pujolle, G. (2014).
A TRILL-based multi-tenant data center network. Computer Networks.
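As a small illustration of the 24-bit VNI tag, the following C sketch packs and unpacks a VNI in three bytes. The byte order (most significant byte first) and the helper names `vni_write` and `vni_read` are assumptions made for this example; the slides do not give the exact position of the tag inside the VNT header.

```c
#include <assert.h>
#include <stdint.h>

#define VNI_MAX 0xFFFFFFu /* largest 24-bit value: 2^24 distinct tenant IDs */

/* Store a VNI in 3 bytes, most significant byte first (assumed order). */
void vni_write(uint8_t out[3], uint32_t vni)
{
    assert(vni <= VNI_MAX);
    out[0] = (uint8_t)(vni >> 16);
    out[1] = (uint8_t)(vni >> 8);
    out[2] = (uint8_t)vni;
}

/* Rebuild the VNI from the same 3-byte representation. */
uint32_t vni_read(const uint8_t in[3])
{
    return ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | (uint32_t)in[2];
}
```

A 24-bit tag allows about 16.7 million tenant networks, compared with 4096 for a 12-bit 802.1Q VLAN ID.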
14. Current VNT implementation on Linux
Control plane: Quagga daemon
Data plane: Linux bridge module
https://github.com/Gandi/
15. •Throughput is affected by the additional processing operations
•The processing time of a single packet is not affected
Data plane: performance
[Figures: throughput and delay measurements]
16. •Shift the data plane from the host to a smartNIC
–Increase performance
–Free the x86 host for other uses
•e.g. customer workloads
Improving performance
[Diagram: before, the host runs both the control plane and the data plane behind a plain NIC; after, the host keeps the control plane and the smartNIC runs the data plane]
17. •Fabless semiconductor company founded in 2008
•Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture
–Leading Performance / Energy Ratio Worldwide
–Time predictability and low latency
–Heterogeneous applications on the same chip
–High programmability
•Working with industry-leading partners and customers
•55 employees
•Offices in France and the US
KALRAY deterministic supercomputing on a chip
First MPPA®-256 chips in TSMC 28nm CMOS
18. Software Defined NIC
Smart packet classification/dispatching
256 cores for packet processing
Standard C/C++ with GCC-4.9
Advanced debugging and profiling
Low latency
Zero-copy Ethernet PCIe
< 1μs port-to-port transparent mode
< 1μs port to system memory
System integration
Linux support
Virtualization support
Low power
High throughput / Line rate
80 Gbps full-duplex line-rate (2 x 120 MPPS)
3400 instructions per packet @64B
AES, SHA-1, SHA-2, CRC accelerators
2 x PCIe Gen3 8-lanes
MPPA®-256 Bostan Networking Strengths
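The packet-rate figure can be sanity-checked with standard Ethernet arithmetic. The sketch below computes the maximum frame rate of a link, assuming the usual 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap); the function name `line_rate_pps` is illustrative.

```c
#include <assert.h>

/* Per-frame overhead on the wire:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Maximum frames per second on a link of link_bps bits per second,
 * for frames of frame_bytes bytes (FCS included). */
double line_rate_pps(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes >= 64); /* minimum Ethernet frame size */
    return link_bps / (8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}
```

For example, `line_rate_pps(80e9, 64)` gives roughly 119 million packets per second in one direction, which is consistent with the 2 x 120 MPPS quoted for 80 Gbps full duplex at 64-byte frames.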
19. MPPA®-256 Bostan
•64-bit processor
•Up to 800MHz
•High Performance
–845 GFLOPS SP / 422 GFLOPS DP
–1 TOPS
•High Bandwidth Network On a Chip
–2 x 12.8 GB/s
•High Speed Ethernet
–Up to 2x40 Gbps / 2x120 MPPS @ 64B
•DDR3 Memory interfaces
–2 x 64-bit + ECC @2133MT/s / 2 x 17GB/s
•PCIe Gen3 interface
–2 x 8-lanes / 2 x 8 GB/s full duplex
–End Point / Root Complex
•NoCX extension
–2 x 40 Gbps + 2 x 80 Gbps ILK
•Flash controller, GPIOs…
21. High Speed Ethernet Packet processing
•Ethernet Rx dispatcher
–8 classification tables
•Classify
•Extract fields
•Smart Dispatch
–Round-robin
–Flexible core allocation
•Round-robin vs. classification
•Per 10G port
•Ethernet Tx
–64 Tx FIFOs
–QoS between the FIFOs
–Flow Control between clusters and Tx FIFOs
Patent pending
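The round-robin dispatch mode can be pictured with a few lines of C. This is only a software sketch of the idea, not the hardware dispatcher itself: the structure `rr_dispatcher`, the bound `MAX_CORES` and the function `rr_next_core` are hypothetical names, and one such structure is assumed per 10G port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES 256 /* hypothetical upper bound on cores per port */

/* Per-port round-robin state: the cores allocated to the port
 * and the index of the core that receives the next packet. */
struct rr_dispatcher {
    uint16_t cores[MAX_CORES];
    unsigned n_cores;
    unsigned next;
};

/* Return the core that should process the next packet, then advance. */
uint16_t rr_next_core(struct rr_dispatcher *d)
{
    assert(d != NULL && d->n_cores > 0 && d->n_cores <= MAX_CORES);
    uint16_t core = d->cores[d->next];
    d->next = (d->next + 1) % d->n_cores;
    return core;
}
```

Round-robin spreads load evenly but ignores packet contents, whereas classification keeps all packets with the same extracted fields on the same cores.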
22. VNT on a programmable data plane Multicast forwarding example
•On-going work between Gandi and Kalray
–Explore programmable data plane opportunities
–Study the feasibility and architecture of a VNT smartNIC
•Multicast forwarding puts a high load on each node
[Diagram: x86 hypervisor and Kalray Bostan smartNIC. Labels: TRILL controller, userspace application, Linux networking stack, MPPA Linux Ethernet driver, IO Ethernet driver, 8x10GbE]
23. VNT on a programmable data plane Multicast forwarding example
•Dispatch the packet based on the Egress RBridge
–In case of multicast, the Egress RBridge is set to the tree root
–Each cluster “owns” a subset of the possible Egress RBridges (i.e. a FIB subset)
if (Packet[Ethertype] == TRILL) {
send to cluster #HASH(Egress RBridge)
}
<Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>
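The dispatch rule can be sketched in C as follows. The Ethertype value and header offsets come from RFC 6325; everything else is an assumption made for illustration: 16 clusters, a simple modulo standing in for HASH(), no outer 802.1Q tag, and the names `hash_nickname` and `dispatch_cluster`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_TRILL 0x22F3u /* TRILL Ethertype (RFC 6325) */
#define N_CLUSTERS      16u     /* assumed number of compute clusters */
#define NOT_TRILL       (-1)

/* Stand-in for HASH(): any function that spreads nicknames evenly would do. */
static unsigned hash_nickname(uint16_t nick)
{
    return nick % N_CLUSTERS;
}

/* Return the cluster that owns the frame's Egress RBridge,
 * or NOT_TRILL if the frame is too short or is not a TRILL frame.
 * Assumption: no outer 802.1Q tag, so the Ethertype is at byte 12. */
int dispatch_cluster(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < 20)
        return NOT_TRILL;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_TRILL)
        return NOT_TRILL;
    /* The egress nickname follows the 2 bytes of flags and hop count. */
    uint16_t egress = (uint16_t)((frame[16] << 8) | frame[17]);
    return (int)hash_nickname(egress);
}
```

Because the mapping depends only on the Egress RBridge, every frame for a given distribution tree root reaches the cluster holding the matching FIB subset.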
25. VNT on a programmable data plane Multicast forwarding example
•Look up the list of next-hop RBridges for this multicast tree
–RBridge owner clusters can be local or remote
•Look up the LIB for local ports, if any
FIB[Egress RBridge] = {
Egress RBridge MAC;
Egress RBridge Interface;
MCTree = [ RBx, RBy, … ];
VNI = [ VNI-1, VNI-2, … ];
}
LIB = {
(Local MACx, Local Portx, VNI-1);
…
}
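One possible C rendering of the two tables is shown below, together with the LIB lookup used for local delivery of a multicast frame. The field widths, the array bounds `MAX_NEXT_HOPS` and `MAX_VNIS`, and the function `lib_ports_for_vni` are assumptions chosen for the example, not the actual smartNIC data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NEXT_HOPS 8 /* hypothetical bound on next hops per tree */
#define MAX_VNIS      8 /* hypothetical bound on VNIs per RBridge */

/* One FIB entry, indexed by Egress RBridge nickname. */
struct fib_entry {
    uint8_t  mac[6];                 /* Egress RBridge MAC */
    uint16_t interface;              /* Egress RBridge interface */
    uint16_t mc_tree[MAX_NEXT_HOPS]; /* next-hop RBridges for the tree */
    size_t   n_mc_tree;
    uint32_t vni[MAX_VNIS];          /* VNIs supported by this RBridge */
    size_t   n_vni;
};

/* One LIB entry: a local MAC, the local port behind it, and its VNI. */
struct lib_entry {
    uint8_t  mac[6];
    uint16_t port;
    uint32_t vni;
};

/* Collect the local ports that belong to a VNI, up to max_out entries.
 * A port hosting several MACs of the same VNI is reported once per MAC.
 * Returns the number of ports written to out. */
size_t lib_ports_for_vni(const struct lib_entry *lib, size_t n, uint32_t vni,
                         uint16_t *out, size_t max_out)
{
    assert((lib != NULL || n == 0) && (out != NULL || max_out == 0));
    size_t found = 0;
    for (size_t i = 0; i < n && found < max_out; i++)
        if (lib[i].vni == vni)
            out[found++] = lib[i].port;
    return found;
}
```

An empty result means no local VM belongs to the tenant, so the frame only needs to be forwarded to remote next hops.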
26. VNT on a programmable data plane Multicast forwarding example
•Forward the frame
–Remote
•Forward to clusters owning the next-hop RBridge
–Local
•Decapsulate the inner frame
•Forward it to the local VM
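The local decapsulation step can be sketched as locating the inner frame behind the outer headers. The sketch assumes no outer 802.1Q tag and assumes the VNT extension is carried in the TRILL options area counted by the Op-Length field (RFC 6325 expresses it in 4-byte units); the name `trill_decap` is illustrative, and the Ethertype is assumed to have been checked earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OUTER_ETH_LEN   14u /* outer destination MAC, source MAC, Ethertype */
#define TRILL_FIXED_LEN  6u /* flags and hop count, egress and ingress nicknames */

/* Return a pointer to the inner Ethernet frame and store its length,
 * or return NULL if the frame is truncated. */
const uint8_t *trill_decap(const uint8_t *frame, size_t len, size_t *inner_len)
{
    assert(frame != NULL && inner_len != NULL);
    if (len < OUTER_ETH_LEN + TRILL_FIXED_LEN)
        return NULL;
    uint16_t w = (uint16_t)((frame[14] << 8) | frame[15]);
    size_t opt_bytes = 4u * ((w >> 6) & 0x1Fu); /* Op-Length in 4-byte units */
    size_t offset = OUTER_ETH_LEN + TRILL_FIXED_LEN + opt_bytes;
    if (len <= offset)
        return NULL;
    *inner_len = len - offset;
    return frame + offset;
}
```

The returned pointer refers into the original buffer, so the inner frame can be handed to the local VM without copying.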
27. VNT on a programmable data plane Multicast forwarding example
•Check if the RBridge supports the appropriate VNI
–If yes, forward to the RBridge
–If not, stop here
FIB[Egress RBridge] = {
Egress RBridge MAC;
Egress RBridge Interface;
MCTree = [ RBx, RBy, … ];
VNI = [ VNI-1, VNI-2, … ];
}
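The VNI check amounts to filtering the next-hop list by tenant. Here is a minimal C sketch under assumed names (`rbridge`, `rbridge_supports_vni`, `prune_next_hops`) and an assumed fixed-size VNI list per RBridge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VNIS 8 /* hypothetical bound on VNIs per RBridge */

/* The part of a FIB entry needed for the check. */
struct rbridge {
    uint16_t nickname;
    uint32_t vni[MAX_VNIS]; /* VNIs this RBridge serves */
    size_t   n_vni;
};

/* True if the RBridge serves the given VNI. */
bool rbridge_supports_vni(const struct rbridge *rb, uint32_t vni)
{
    assert(rb != NULL && rb->n_vni <= MAX_VNIS);
    for (size_t i = 0; i < rb->n_vni; i++)
        if (rb->vni[i] == vni)
            return true;
    return false;
}

/* Copy the nicknames of the next hops that serve vni into out,
 * which must have room for n entries. Returns how many were kept. */
size_t prune_next_hops(const struct rbridge *hops, size_t n, uint32_t vni,
                       uint16_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (rbridge_supports_vni(&hops[i], vni))
            out[kept++] = hops[i].nickname;
    return kept;
}
```

Next hops that do not serve the tenant never receive the frame, which is how multicast traffic stays limited to the RBridges that need it.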
29. •Solving SDN and network virtualization challenges requires new protocols
–e.g. VXLAN, NVGRE, TRILL/VNT…
•Efficiency generally means hardware support
…But hardware development cannot keep up with software, which slows down innovation
•Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation
…But we need open ecosystems, standards and APIs
Innovation and efficiency
30. Thank you for your attention!
Questions?
Ahmed Amamou, ahmed@gandi.net
Benoît Ganne, bganne@kalray.eu