We present a new open source project which provides IPv6 networking for Linux containers by generating a program for each individual container on the fly and running it as JITed BPF code in the kernel. Because the code is generated and compiled per container, each program is reduced to the minimally required feature set and then heavily optimised by the compiler, as parameters become compile-time constants. The upcoming addition of the eXpress Data Path (XDP) to the kernel will make this approach even more efficient, as the programs will be invoked directly from the network driver.
Cilium - Fast IPv6 Container Networking with BPF and XDP
1. Cilium: Fast IPv6 Container Networking with BPF and XDP
LinuxCon 2016, Toronto
Thomas Graf (@tgraf__)
Kernel, Cilium & Open vSwitch Team
Noiro Networks (Cisco)
2. The Cilium Experiment
Scale
– Addressing: IPv6?
– Policy: Linear lists don’t scale. Alternative?
Extensibility
– Can we be as extensible as userspace networking in the kernel?
Simplicity
– What is an appropriate abstraction away from traditional networking?
Performance
– Do we sacrifice performance in the process?
3. Scaling Addressing
Solution:
– IPv6 addresses with a host-scope allocator
Pros:
– Everything is globally addressable
– No NAT
– Path to ILA for mobility of tasks
Cons:
– Legacy IPv4-only endpoints/applications
→ Optional IPv4 addressing (+ NAT)
→ NAT46: Provide IPv6-only applications to IPv4-only clients
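A host-scope allocator lets each host hand out container addresses from a prefix it owns, so hosts never have to coordinate and every container stays globally addressable without NAT. The sketch below assumes a hypothetical per-host /112 prefix taken from the 2001:db8::/32 documentation range and a simple sequential counter with reuse of released IDs; it illustrates the idea and is not Cilium's actual address layout.

```python
import ipaddress


class HostScopeAllocator:
    """Hands out container IPv6 addresses from a prefix owned by one host.

    Each host owns a distinct prefix, so allocation is purely local:
    no cluster-wide coordination and no NAT are needed.
    """

    def __init__(self, host_prefix: str):
        self.network = ipaddress.IPv6Network(host_prefix)
        self.next_id = 1    # skip host ID 0 (the prefix address itself)
        self.released = []  # host IDs returned by containers that exited

    def allocate(self) -> ipaddress.IPv6Address:
        if self.released:
            host_id = self.released.pop()
        else:
            if self.next_id >= self.network.num_addresses:
                raise RuntimeError("host prefix exhausted")
            host_id = self.next_id
            self.next_id += 1
        # IPv6Address + int yields the address host_id positions later.
        return self.network.network_address + host_id

    def release(self, addr: ipaddress.IPv6Address) -> None:
        if addr not in self.network:
            raise ValueError(f"{addr} is not in {self.network}")
        self.released.append(int(addr) - int(self.network.network_address))
```

For example, `HostScopeAllocator("2001:db8:0:1::/112").allocate()` returns `2001:db8:0:1::1`, while a second host configured with `2001:db8:0:2::/112` allocates from a disjoint range.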
4. IPv6 Status in Kubernetes/Docker
● Kubernetes (CNI): Almost there
– Pods are IPv6-only capable as of k8s 1.3.6 (PR23317, PR26438, PR26439, PR26441)
– kube-proxy (services): not done yet
● Docker (libnetwork): Working on it
– PR826, “Make IPv6 Great Again”: not merged yet
6. Scaling Policy
[Diagram: three tiers, LB → Frontend (FE) → Backend (BE); policy rules LB → FE and FE → BE]
Policy:
NetworkPolicy: Kubernetes policy spec as discussed and standardized in the Networking SIG
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/network-policy.md
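The three-tier policy can be written purely in terms of labels, without mentioning any addresses. The Python sketch below models each rule as a hypothetical (from-label, to-label) pair, loosely in the spirit of the NetworkPolicy proposal; the `Rule` type and the single-label matching are simplifications for illustration, not the spec's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Allow traffic from endpoints carrying `src` to endpoints carrying `dst`."""
    src: str
    dst: str


# The policy from the diagram: LB -> FE and FE -> BE.
POLICY = [Rule("LB", "FE"), Rule("FE", "BE")]


def allowed(src_labels: set, dst_labels: set, policy=POLICY) -> bool:
    """Return True if any rule matches the two endpoints; default deny otherwise.

    This walks the rule list linearly, so the per-packet cost grows with
    the number of rules.
    """
    return any(r.src in src_labels and r.dst in dst_labels for r in policy)
```

Note that the check walks a list: that is exactly the "linear lists don't scale" concern raised at the start of the talk.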
7. Scaling Policy
[Diagram: the same LB → Frontend → Backend tiers deployed in two environments, QA (LB QA, FE QA, BE QA) and Prod (LB Prod, FE Prod, BE Prod); policy rules LB → FE and FE → BE]
8. Scaling Policy
[Diagram: QA and Prod deployments of LB → Frontend → Backend; policy rules LB → FE and FE → BE, plus QA requires QA and Prod requires Prod]
Policy:
“requires”: Cilium extension, not yet part of the Kubernetes spec
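One way to read the “requires” rule: a destination labeled QA accepts traffic only from sources that also carry QA, on top of the ordinary allow rules (and likewise for Prod). The sketch below encodes that reading; the `ALLOW` and `REQUIRES` tables and the `allowed` helper are hypothetical and only illustrate the semantics, not Cilium's policy format.

```python
# Ordinary allow rules: (source label, destination label).
ALLOW = {("LB", "FE"), ("FE", "BE")}

# "requires" constraints: a destination carrying the key label only accepts
# sources that carry every label in the value set.
REQUIRES = {"QA": {"QA"}, "Prod": {"Prod"}}


def allowed(src_labels: set, dst_labels: set) -> bool:
    # Step 1: at least one allow rule must match the pair of endpoints.
    if not any(s in src_labels and d in dst_labels for s, d in ALLOW):
        return False
    # Step 2: every "requires" constraint attached to the destination's
    # labels must be satisfied by the source's labels.
    for label in dst_labels:
        if not REQUIRES.get(label, set()) <= src_labels:
            return False
    return True
```

With these rules a QA frontend may reach a QA backend, but not a Prod backend, even though the tier rule FE → BE alone would permit it.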
9. Scaling Policy Enforcement
Policy:
[Diagram: LB → FE and FE → BE across QA and Prod; QA requires QA, Prod requires Prod]
Distributed Label ID Table:
– LB QA: 10
– FE QA: 11
– BE QA: 12
– LB Prod: 13
– FE Prod: 14
– BE Prod: 15
This ID is carried in the packet as metadata to provide security context at the destination host.
Policy enforcement cost becomes a single hashtable lookup regardless of the number of containers or policy complexity.
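Enforcement can be sketched in two phases: the control plane maps each label set to a numeric ID and precomputes the allowed (source ID, destination ID) pairs, and the datapath then does one hash lookup per packet using the ID carried in the packet metadata. The Python model below uses the IDs 10 to 15 from this slide (the pairing of IDs to label sets is an assumption based on the slide layout); `compile_policy` and `enforce` are hypothetical names, and in a real datapath the lookup would hit a BPF hashtable map rather than a Python set.

```python
# Assumed label-set -> numeric ID table, distributed to every host.
LABEL_IDS = {
    frozenset({"LB", "QA"}): 10,
    frozenset({"FE", "QA"}): 11,
    frozenset({"BE", "QA"}): 12,
    frozenset({"LB", "Prod"}): 13,
    frozenset({"FE", "Prod"}): 14,
    frozenset({"BE", "Prod"}): 15,
}

ALLOW = {("LB", "FE"), ("FE", "BE")}  # tier rules
ENVIRONMENTS = ("QA", "Prod")         # each "requires" itself


def compile_policy(label_ids):
    """Control plane: expand label rules into allowed (src_id, dst_id) pairs.

    Runs once per policy or membership change, never per packet.
    """
    table = set()
    for src_labels, src_id in label_ids.items():
        for dst_labels, dst_id in label_ids.items():
            tier_ok = any(s in src_labels and d in dst_labels for s, d in ALLOW)
            # "X requires X": if the destination has X, the source must too.
            requires_ok = all(
                env in src_labels for env in ENVIRONMENTS if env in dst_labels
            )
            if tier_ok and requires_ok:
                table.add((src_id, dst_id))
    return table


POLICY_TABLE = compile_policy(LABEL_IDS)


def enforce(src_id: int, dst_id: int) -> bool:
    """Datapath: a single hash lookup, independent of rule count."""
    return (src_id, dst_id) in POLICY_TABLE
```

All of the label matching happens in `compile_policy`; the per-packet path only ever sees two integers.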
13. BPF Features
(As of Aug 2016)
● Efficient data sharing via maps
– Per-CPU/global arrays & hashtables
● Rewrite packet content
● Extend/trim packet size
● Redirect to other net_device
● Attachment of tunnel metadata
● Cgroups integration
● Access to high performance perf ring buffer
● …
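Per-CPU maps avoid contention: a program running on one CPU updates only that CPU's copy of a value, with no lock or atomic operation, and userspace aggregates the copies when it reads the entry (a userspace lookup on a per-CPU map returns one value per possible CPU). The Python model below only simulates that access pattern; `PerCpuArray` and its methods are hypothetical stand-ins, not a BPF API.

```python
class PerCpuArray:
    """Simulated per-CPU array map: one independent slot per (CPU, index)."""

    def __init__(self, num_cpus: int, max_entries: int):
        self.slots = [[0] * max_entries for _ in range(num_cpus)]

    def increment(self, cpu: int, index: int, delta: int = 1) -> None:
        # Datapath side: code running on `cpu` touches only its own slot,
        # so concurrent CPUs never write the same memory.
        self.slots[cpu][index] += delta

    def lookup_all_cpus(self, index: int) -> list:
        # Userspace side: a lookup returns every CPU's copy of the value.
        return [per_cpu[index] for per_cpu in self.slots]


def total(values: list) -> int:
    """Aggregate per-CPU values into one counter, as userspace would."""
    return sum(values)
```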
16. Why is this awesome?
On-the-fly BPF program generation means:
● Extensibility of userspace networking in the kernel
● MAC, IP, port number, … all become constants
→ compiler can optimize heavily!
● BPF programs can be recompiled and replaced without interrupting the container and its connections
– Features can be compiled in/out at runtime with container granularity
● Access to fast BPF maps and perf ring buffer to interact with userspace
– Drop monitor in n*Mpps context
– Use notifications for policy learning, IDS, logging, ...
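The generation step can be illustrated by rendering, per container, a small C header in which the container's MAC, IP and port become preprocessor constants and each optional feature is either defined or absent. The BPF C source would include this header before compilation. The sketch below assumes hypothetical macro names (`LXC_MAC`, `LXC_IP`, `LXC_PORT`, `ENABLE_POLICY`, `ENABLE_NAT46`); it shows the shape of the approach, not Cilium's actual templates.

```python
import ipaddress


def render_container_header(mac: str, ip: str, port: int, features: dict) -> str:
    """Render a per-container C header with every parameter as a constant.

    Because the values are #defines rather than map lookups or arguments,
    the compiler can fold comparisons against them, and code guarded by
    an undefined feature macro is dropped entirely.
    """
    addr = ipaddress.IPv6Address(ip)
    mac_bytes = bytes(int(part, 16) for part in mac.split(":"))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")

    lines = ["/* Generated per container; do not edit. */"]
    # MAC and IPv6 address as brace-initializer byte lists.
    lines.append("#define LXC_MAC { " + ", ".join(f"0x{b:02x}" for b in mac_bytes) + " }")
    lines.append("#define LXC_IP { " + ", ".join(f"0x{b:02x}" for b in addr.packed) + " }")
    lines.append(f"#define LXC_PORT {port}")
    # Only enabled features are defined, so #ifdef blocks compile in or out.
    for name, enabled in sorted(features.items()):
        if enabled:
            lines.append(f"#define {name}")
    return "\n".join(lines) + "\n"
```

Regenerating the header after a configuration change, recompiling, and swapping the attached program is what lets features be switched per container without touching other containers.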
17. Available Building Blocks
● L3 forwarding (IPv6 & IPv4)
● Host connectivity
● Encapsulation (VXLAN/Geneve/GRE)
● ICMPv6 generation
● NDisc & ARP responder
● Access Control
Currently working on:
● Fragmentation handling
● Mobility
● Port Mapping (TCP/UDP)
● Connection tracking
● L3/L4 Load Balancer
● Statistics
● Events (perf ring buffer)
● Debugging framework
● NAT46
● End to end encryption
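As a concrete reference for one of the encapsulation formats listed, VXLAN (RFC 7348) places an 8-byte header, carrying a 24-bit VXLAN Network Identifier (VNI), after outer Ethernet, IP and UDP headers (UDP destination port 4789). The Python sketch below builds and parses only that 8-byte header; it illustrates the wire format and makes no claim about where in Cilium's datapath the header is produced.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)
VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI.

    Layout: flags (8 bits) | reserved (24) | VNI (24) | reserved (8)
    """
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # "!B3xI": network byte order, one flags byte, three zero pad bytes, then
    # a 32-bit word whose top 24 bits hold the VNI and low byte is reserved.
    return struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)


def vxlan_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header (inverse of vxlan_header)."""
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word >> 8
```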
19. Simplicity
● L3 only (Calico gets this right)
– No L2 scaling issues, no broadcast domains, no L2 vulnerabilities
● No “Networks”
– No need for containers to join multiple networks to access multiple isolation domains. No need for multiple addresses.
● Policy definition independent of addressing
– As specified in Kubernetes Networking SIG
– All policies based on container labels
23. Q&A
Image Sources:
● Cover (Toronto)
Rick Harris (https://www.flickr.com/photos/rickharris/)
● The Invisible Man
Dr. Azzacov (https://www.flickr.com/photos/drazzacov/)
Start hacking with BPF for containers:
http://github.com/cilium/cilium
Contact:
Slack: cilium.slack.com
Twitter: @tgraf__ Mail: tgraf@tgraf.ch
Team:
● André Martins
● Daniel Borkmann
● Madhu Challa
● Thomas Graf