FD.io Vector Packet Processing (VPP)
- 1. ©2015 Check Point Software Technologies Ltd.
FD.IO VECTOR PACKET PROCESSING
Overview
Kirill Tsym, Next Generation Enforcement team
- 2.
CHECK POINT SOFTWARE TECHNOLOGIES
The largest pure-play security vendor in the world
Protecting more than 100,000 companies with millions of users worldwide
$1.63B annual revenue in 2015
Over 4,300 employees
Partners in over 95 countries
- 3.
Lecture agenda
Linux networking stack vs. user-space networking initiatives
– Why user-space networking? Why so many projects around it?
Introduction to FD.io and VPP
– Architecture, vectors, graph, etc.
VPP data path
– Typical graphs
– Examples of supported topologies
VPP threads and scheduling
Single- and multi-core support
Supported topologies
- 5.
Linux kernel data path
(diagram: NIC1/NIC2 Rx/Tx in HW; drivers, the L2-L4 TCP/IP stack and sockets (L5) in kernel space; L7 applications in user space; a pass-through forwarding path and a to-application path)
Reference: Kernel Data Path
Design goals, or why the stack is in the kernel:
– Linux is designed as an Internet Host (RFC 1122), an "End-System" OS
– It needs to service multiple applications
– Separate user applications from sensitive kernel code
– Make applications as simple as possible
– Keep direct access to HW drivers in the kernel
Cost
– Not optimized for forwarding
– Every change requires a new kernel version
– Code is too generic
– The networking stack today is a huge part of the kernel
- 6.
Linux stack: the whole picture
Reference: Network_data_flow_through_kernel
- 7.
Linux stack packet processing
Packets are processed in the kernel one by one
– A lot of code is involved in processing each packet
– The processing path is monolithic; it is impossible to change it or load new stack modules
– Instruction-cache optimization is impossible in this model
– There are techniques to hijack kernel routines or define hooks, but no simple, standard way to replace tcp_input(), for example
skb processing is not cache optimized
– The sk_buff struct includes too much information
– Ideally, all needed sk_buffs would be loaded into cache before processing
– But an skb neither fits in a cache line nor is placed in a chain
– As a result there is no data-cache optimization, and usually a lot of cache misses
Every change requires a new kernel version
– Upstreaming a new protocol takes a very long time
– Standardization moves much faster than implementation
- 9.
Netmap
(diagram: the application talks to the netmap API in user space; netmap rings shadow the NIC rings in kernel space, bypassing the Linux networking stack)
Pros
– BSD, Linux and Windows ports
– Good scalability
– Data path is detached from the host stack
– Widely adopted
Cons
– No networking stack
– Routing is done in the host stack, which slows down initial processing
Performance (packet forwarding, Mpps):
– FreeBSD bridging: 0.690
– netmap + libpcap: 7.5
– netmap: 14.88
Reference: netmap - the fast packet I/O framework
- 10.
DPDK / forwarding engine
(diagram: NIC1/NIC2 in HW; DPDK drivers and the fast-path forwarding engine in user space; the slow path goes through the Kernel Networking Interface to the Linux networking stack for routing decisions; numbered steps 1-8 trace packets through the fast and slow paths)
Pros
– Kernel independent
– All packet processing done in user space
– The DPDK fast path is optimized for cache usage and a minimum of instructions
Cons
– No networking stack
– No routing stack
– Packets must be sent to the kernel for routing decisions
– Doesn't perform well in scaling tests
– No external API
– No integration with management
– Out-of-tree drivers
- 11.
OpenFastPath
A BSD networking stack on top of DPDK and ODP
OpenDataPlane (ODP) is a cross-platform, open-source data-plane API for SoC networking
Supported by Nokia, ARM, Cavium and ENEA
Includes optimized IP, UDP and TCP stacks
Routes and MACs are kept in sync with Linux through Netlink
- 12.
Other projects
OpenSwitch
̶ An OS whose main component is a DPDK-based Open vSwitch
̶ Various management and CLI daemons
̶ Routing decisions made by the Linux kernel (Ouch!)
̶ REST API
̶ Good for inter-VM communications
OpenOnload
̶ A user-level network stack from Solarflare
̶ Depends on Solarflare NICs (Ouch!)
• IO Visor
̶ XDP, or eXpress Data Path
̶ Not user-space networking!
̶ Tries to bring performance into the existing kernel with BPF
̶ No need for 3rd-party code
̶ Allows the option of busy polling
̶ No need to allocate large pages
̶ No need for dedicated CPUs
- 14.
FD.io Project overview
• FD.io is a Linux Foundation project
̶ A collection of several projects based on the Data Plane Development Kit (DPDK)
̶ Distributed under the Apache license
̶ Its key project, Vector Packet Processing (VPP), was donated by Cisco
̶ A proprietary version of VPP runs in the Cisco CRS-1 router
̶ There is no toolchain, OS, etc. in the open-sourced VPP version
̶ VPP is about 300K lines of code
̶ Major contributor: Cisco's Chief Technology and Architecture Office team
• Three main components
̶ Management Agent
̶ Packet Processing
̶ IO
• VPP roadmap
̶ The first release (June 16th, 16.06) includes 14 Mpps single-core L3 performance
̶ The 16.09 release includes integration with containers and orchestration
̶ The 17.01 release will include DPDK 16.11, DPDK CryptoDev, enhanced NAT, etc.
- 15.
VPP ideas
• CPU cycle budget
̶ 14 Mpps on a 3.5 GHz CPU = a budget of 250 cycles per packet
̶ A memory access is ~67 ns, the cost of fetching one cache line (64 bytes), or 134 CPU cycles
• Solution
̶ Perform all the processing with a minimum of code
̶ Process more than one packet at a time
̶ Grab all available packets from the Rx ring on every cycle
̶ Perform each atomic task in a dedicated node
• VPP optimization techniques
̶ Branch-prediction hints
̶ Use of vector instructions (SSE, AVX)
̶ Prefetching – but do not prefetch too much, to keep the cache warm
̶ Speculation – guess the packet destination instead of doing a full lookup
̶ Dual loops
Cache misses are unacceptable
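The dual-loop technique can be sketched in plain C. This is an illustration of the idea, not VPP's actual code (the struct and function names are invented for the sketch): process two packets per iteration while prefetching the next pair, so memory latency overlaps with useful work.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy packet: only the fields this sketch needs. */
typedef struct { uint8_t ttl; uint16_t checksum; } pkt_t;

static void process_one(pkt_t *p) {
    p->ttl -= 1;            /* e.g. the IPv4 forwarding TTL decrement */
    p->checksum += 0x0100;  /* incremental checksum fixup (simplified) */
}

/* Dual loop: handle two packets per iteration and prefetch the next
 * two, so their cache lines are warm by the time we touch them. */
void process_vector(pkt_t **pkts, size_t n) {
    size_t i = 0;
    while (i + 4 <= n) {
        __builtin_prefetch(pkts[i + 2]);
        __builtin_prefetch(pkts[i + 3]);
        process_one(pkts[i]);
        process_one(pkts[i + 1]);
        i += 2;
    }
    while (i < n)           /* single-loop tail for the leftovers */
        process_one(pkts[i++]);
}
```

VPP applies the same pattern (sometimes as quad loops) inside its nodes; the point is that handling several packets per iteration amortizes per-iteration overhead and hides memory latency.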
- 16.
VPP architecture
(diagram: NIC1/NIC2 in HW; DPDK, the VPP IP stack and VPP plugins in user space; on the fast path, VPP I/O tasks run the I/O polling logic plus L2, followed by L3 tasks and user-defined tasks)
Pros
– Kernel independent
– All packet processing done in user space
– DPDK based (or netmap, virtio, host, etc.)
– Includes a full-scale L2/L3 networking stack
– Routing decisions made by VPP
– Also includes a bridge implementation
– Good plugin framework
– Integrated with external management: Honeycomb
Cons
– Young project
– First stable release ~06/16
– Many open areas:
– OpenStack integration / Neutron
– Lack of transport-layer integration
– Control-plane API & stack
But what about L4/L7?
– The TLDK project
- 17.
Performance
̶ VPP data-plane throughput is not impacted by a large IPv4 FIB size
̶ OVS-DPDK data-plane throughput is heavily impacted by IPv4 FIB size
̶ VPP and OVS-DPDK were tested on a Haswell x86 platform with an E5-2698v3 2x16C 2.3 GHz (Ubuntu 14.04 Trusty)
Reference: FD.io intro, fd.io Foundation
- 18.
TLDK
VPP TLDK application layer (project)
(diagram: on top of VPP and DPDK in user space sit a purpose-built TLDK application, a socket application on a BSD socket layer, and a native Linux application via an LD_PRELOAD socket layer)
TLDK Application Layer
– Uses the TLDK library to process TCP and UDP packets
Purpose-Built Application
– Uses the TLDK API directly (as a VPP node)
– Provides the highest performance
BSD Socket Layer
– A standard BSD socket layer for applications that use sockets by design
– Lower performance, but good compatibility
LD_PRELOAD Socket Layer
– Used to allow a native binary Linux application to be ported into the system
– Allows existing applications to work without any change
- 19.
VPP Nodes and Graph
(diagram: Nodes 1-6 connected in a graph; a vector of packets flows between them)
Processing is divided per node
A node works on a vector of packets
Nodes are connected into a graph
The graph can be changed dynamically
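In VPP proper, nodes are registered with VLIB_REGISTER_NODE and dispatch vectors to named next nodes. The plain-C sketch below is only an illustration of that core idea, with invented names and a fixed pipeline instead of a real graph: each node is a function that consumes a whole vector and hands it to the next node.

```c
#include <stddef.h>

#define VEC_MAX 256  /* VPP also caps a vector at 256 packets */

typedef struct { int id; int ttl; } pkt_t;
typedef struct { pkt_t *pkts[VEC_MAX]; size_t n; } vector_t;

/* A "node" processes an entire vector, then forwards it. */
typedef void (*node_fn)(vector_t *v, size_t next);

static node_fn graph[4];  /* the graph: here just a 4-node pipeline */

static void dispatch(vector_t *v, size_t node) {
    if (node < 4 && graph[node]) graph[node](v, node + 1);
}

static void ethernet_input(vector_t *v, size_t next) {
    /* would parse L2 headers for all packets, then pass the vector on */
    dispatch(v, next);
}

static void ip4_input(vector_t *v, size_t next) {
    for (size_t i = 0; i < v->n; i++) v->pkts[i]->ttl--;
    dispatch(v, next);
}

static void ip4_rewrite(vector_t *v, size_t next) { dispatch(v, next); }
static void tx_output(vector_t *v, size_t next) { (void)v; (void)next; }

void run_graph(vector_t *v) {
    graph[0] = ethernet_input; graph[1] = ip4_input;
    graph[2] = ip4_rewrite;    graph[3] = tx_output;
    dispatch(v, 0);
}
```

Because each node touches every packet in the vector before the next node runs, the node's code stays hot in the instruction cache across the whole vector, which is the optimization the slides describe.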
- 21.
Data path: ping
• Full zero copy
• Packet data always resides in huge-pages memory
• The vector is passed from graph node to node during processing
Node chain (core 0): dpdk-input → ethernet-input → ipv4-input → ipv4-local → ipv4-icmp-input → ipv4-icmp-echo-request → ipv4-rewrite-local → GigabitEthernet-output → GigabitEthernet-tx → DPDK
Packets are placed into huge-pages memory by the NIC; the VPP vector of packet pointers is created while the input device node runs.
- 22.
Vector processing: split example
Node chain: input-device → ethernet-input → {ipv4-input | ipv6-input} → GigabitEthernet-output → GigabitEthernet-tx → DPDK
ethernet-input splits the input vector into output vector A (IPv4) and output vector B (IPv6); the next node is called twice by the thread scheduler. In the transmit queue, packets are reordered.
- 23.
Vector processing: cloning example
Node chain: dpdk-input → ethernet-input → ipv4-input → ipv4-frag → GigabitEthernet-output → GigabitEthernet-tx → DPDK (transmit queue)
ipv4-frag produces an output vector with twice the packets of the input vector. The maximum vector size is 256; if the output vector is full, two vectors are created.
- 24.
Rx features example: IPsec flow
Decrypt side: dpdk-input → ethernet-input → ipv4-input → ipsec-if-input → esp-decrypt → ipv4-local
Encrypt side: ipsec-if-output → esp-encrypt → ipv4-rewrite-local → GigabitEthernet-output → GigabitEthernet-tx → DPDK
The ipsec-if node is dynamically registered to receive IPsec traffic using Rx features when the interface comes up; this is done through the rewrite adjacency.
- 26.
Threads scheduling
[Restricted] ONLY for designated groups and individuals
One VPP scheduling cycle:
PRE-INPUT
– Purpose: Linux input and system control
– Examples: unix_epoll_input, dhcp-client, management stack interface
INPUT
– Purpose: packet input
– Examples: dpdk_io_input, dpdk_input, tuntap_rx
INTERRUPTS
– Purpose: run suspended processes
– Example: expired timers
PENDING NODES DISPATCH
– Purpose: process all vectors that need additional processing after changes
– Example: worker thread main
INTERNAL NODES DISPATCH
– Purpose: process all pending vectors on the VPP graph; the main work: L2/L3 stack processing and Tx
– Example: worker thread main
- 27.
Threads zoom-in
vpp# show run
Time 9.5, average vectors/node 0.00, last 128 main loops 0.00 per node 0.00
vector rates in 0.0000e0, out 0.0000e0, drop 0.0000e0, punt 0.0000e0
Name State Calls Vectors Suspends Clocks Vectors/Call
admin-up-down-process event wait 0 0 1 6.52e3 0.00
api-rx-from-ring active 0 0 6 1.04e5 0.00
cdp-process any wait 0 0 1 1.10e5 0.00
cnat-db-scanner any wait 0 0 1 5.34e3 0.00
dhcp-client-process any wait 0 0 1 6.58e3 0.00
dpdk-process any wait 0 0 3 2.73e6 0.00
flow-report-process any wait 0 0 1 6.19e3 0.00
gmon-process time wait 0 0 2 5.36e8 0.00
ip6-icmp-neighbor-discovery-ev any wait 0 0 10 1.81e4 0.00
startup-config-process done 1 0 1 2.64e5 0.00
unix-cli-stdin event wait 0 0 1 3.05e9 0.00
unix-epoll-input polling 24811921 0 0 9.48e2 0.00
vhost-user-process any wait 0 0 1 3.24e4 0.00
vpe-link-state-process event wait 0 0 1 7.10e3 0.00
vpe-oam-process any wait 0 0 5 1.37e4 0.00
vpe-route-resolver-process any wait 0 0 1 9.52e3 0.00
vpp# exit
# ps -elf | grep vpp
4 R root 20566 1 92 80 0 - 535432 - 16:10 ? 00:00:27 vpp -c /etc/vpp/startup.conf
0 S root 20582 1960 0 80 0 - 4293 pipe_w 16:10 pts/34 00:00:00 grep --color=auto vpp
#
- 28.
SINGLE AND
MULTICORE MODES
- 29.
VPP Threading modes
(diagrams: cores labeled Control, IO or Worker, each with its Rx/Tx assignment)
• Single-threaded
̶ Both control and the forwarding engine run on a single thread (one core handles Rx and Tx)
• Multi-threaded with workers only
̶ Control runs on the main thread (API, CLI)
̶ Forwarding is performed by one or more worker threads, each with its own Rx and Tx (e.g. core 0: control; cores 1-2: Rx/Tx workers)
• Multi-threaded with IO and workers
̶ Control on the main thread (API, CLI)
̶ An IO thread handles input and dispatches to worker threads (e.g. core 1: Rx; cores 2-3: Tx workers)
̶ Worker threads do the actual work, including interface Tx
̶ RSS is in use
• Multi-threaded with main and IO on a single thread
̶ Workers separated by core
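These modes are selected in VPP's startup.conf through the cpu stanza. A minimal sketch of the workers-only mode is below; the core numbers are placeholders to adapt to your machine:

```
cpu {
  main-core 0            # control thread (API, CLI)
  corelist-workers 1-2   # forwarding worker threads
}
```

With no cpu stanza, VPP runs single-threaded, the first mode above.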
- 30.
SUPPORTED
TOPOLOGIES
[Restricted] ONLY for designated groups and individuals
- 31.
Router and Switch for namespaces
- 33.
VPP Capabilities
• Why VPP?
̶ The Linux kernel is good, but moves too slowly because of backward compatibility
̶ Standardization today moves faster than implementations
̶ The main reason for VPP's speed: optimal usage of the ICACHE
̶ Do not thrash the cache with packet-by-packet processing as in the standard IP stack
̶ Separation of data plane and control plane; VPP is a pure data plane
• Main ideas
̶ Separation of data plane and control plane
̶ API generation; bindings available for Java, C and Python
̶ OpenStack integration
̶ Neutron ML2 driver
̶ OPNFV / ODL-GBP / ODL-SFC (service chaining: firewalls, NAT, QoS)
• Containers
̶ VPP can run in the host, connecting containers to each other
̶ Or VPP can run inside containers, which talk among themselves
- 34.
Connection between various layers
Node chain: dpdk-input → ethernet-input → ip-input → udp-local → plugin
– ethernet-input dispatches IPv4 via ethernet_register_input_type()
– ip-input dispatches UDP via ip4_register_protocol()
– The plugin's callback is defined in the plugin code
– vnet_hw_interface_rx_redirect_to_node() can redirect an interface's Rx directly to a node
– The next node after dpdk-input/handoff-dispatch is hardcoded
(in the diagram, callback arrows show registration; data arrows show packet flow along the registered edges)
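The registration pattern behind a call like ip4_register_protocol() can be sketched in self-contained C. These are not VPP's actual structures, just an illustration of the idea: a dispatch table maps a protocol number to the node that should receive matching packets, so the lower layer needs no hardcoded protocol switch.

```c
#include <stdint.h>
#include <stddef.h>

#define N_PROTOS 256

/* Toy stand-in for a graph-node handler. */
typedef void (*node_handler)(uint8_t *payload, size_t len);

static node_handler proto_nodes[N_PROTOS];

/* Analogous in spirit to ip4_register_protocol(): a plugin or stack
 * layer claims an IP protocol number at init time. */
void register_protocol(uint8_t proto, node_handler h) {
    proto_nodes[proto] = h;
}

static int udp_packets_seen;

static void udp_local(uint8_t *payload, size_t len) {
    (void)payload; (void)len;
    udp_packets_seen++;     /* a real node would parse the UDP header */
}

/* What an ip-input-like node does per packet: one table lookup
 * selects the next node. */
void ip_input(uint8_t proto, uint8_t *payload, size_t len) {
    if (proto_nodes[proto]) proto_nodes[proto](payload, len);
}

int udp_seen(void) { return udp_packets_seen; }
void setup(void) { register_protocol(17, udp_local); } /* 17 = UDP */
```

The same shape applies one layer down: ethernet_register_input_type() fills an ethertype-keyed table in ethernet-input instead of a protocol-keyed one in ip-input.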
- 35.
Output attachment point
Node chain: ipv4-input → ipv4-lookup → ipv4-rewrite-transit
VPP Rx features: a mechanism to add and rewrite the next node dynamically after ipv4-input.
Available nodes:
- input acl (*prerouting)
- source check rx
- source check any
- ipsec
- vpath
- lookup
*Currently impossible to do this from plugins
VPP adjacency: a mechanism to add and rewrite the next node dynamically after the routing lookup.
Available nodes:
- miss
- drop
- punt
- local
- rewrite
- classify
- map
- map_t
- sixrd
- hop_by_hop
*A possible place for a POSTROUTING hook
L3 nodes feed various L4 nodes and various post-routing nodes.
Editor's notes
- Partners (channel partner program, excluding those not in the program / revoked), with over 100K bookings in the past 2 years (2015-2016 YTD)