Copyright©2017 NTT corp. All Rights Reserved.
Software Stacks to enable
Software-Defined Networking and
Network Functions Virtualization
Yoshihiro Nakajima
<nakajima.yoshihiro@lab.ntt.co.jp, ynaka@lagopus.org>
NTT Network Innovation Laboratories
 Yoshihiro Nakajima
 Work for Nippon Telegraph and Telephone Corporation,
R&D Division
Network Innovation Laboratories
 Project lead of Lagopus SDN software switch
 Background
• High performance computing
• High performance networking and data processing
About me
 Trends
 Software-Defined Networking (SDN)
 Network Functions Virtualization (NFV)
 Dataplane
 Lagopus: SDN/OpenFlow software switch
• Overview and NFV
• Trials
 Controller/C-plane
 O3 project
 Ryu and gobgp
 Zebra2
 Future plan
Agenda
Networking is so stuck….
 Many technologies have emerged in the systems development area
 Language, Debugger, Testing framework, Continuous Integration, Continuous
Deployment….
 Network area …
 CLI with vendor-specific format 
 CLI over serial or telnet 
 Expect scripts 
 Good for humans,
but not for software
Trend shift in networking
 Closed (vendor lock-in) → Open (lock-in free)
 Yearly dev cycle → Monthly dev cycle
 Waterfall dev → Agile dev
 Standardization → De facto standard
 Protocol → API
 Special-purpose HW / appliance → Commodity HW / server
 Distributed control → Logically centralized control
 Custom ASIC / FPGA → Merchant chip
What is Software-Defined Networking?
Innovate services and applications
in software development speed!
Reference: http://opennetsummit.org/talks/ONS2012/pitt-mon-ons.pdf
Decouple control plane and data plane
→Free control plane out of the box
(APIs: OpenFlow, P4, …)
Logically centralized view
→Hide and abstract complexity of networks,
provide entire view of the network
Programmability via abstraction layer
→Enables flexible and rapid service/application
development
SDN conceptual model
Why SDN?
Differentiate services (Innovation)
• Improve user experience
• Provide unique services not yet on the market
Time-to-Market(Velocity)
•Not to depend on vendors’ feature roadmap
•Develop necessary feature when you need
Cost-efficiency
•Reduce OPEX by automated workflows
•Reduce CAPEX by COTS hardware
Open source network stacks for SDN/NFV

[Diagram: the classic SDN layering — Application Layer (business applications) over Control Layer (open-source controllers, network services) over Infrastructure Layer (open-source switch software, open-source hardware such as FBOSS), connected by open northbound APIs and an open southbound protocol; open-source cloud computing on top]
SDN/NFV history
[Timeline, 2008–2014]
- OpenFlow: Stanford University Clean Slate Program → OpenFlow Switch Consortium → Open Networking Foundation (standardization); protocol versions OF 0.8, 0.9, 1.0, 1.1, 1.2, 1.4
- SDN activity: controllers NOX, trema, Ryu, OpenDaylight; switches: Lagopus
- NFV: ETSI NFV
History of software vswitch/router and I/O library
[Timeline, 1995–2015]
- I/O libraries: DPDK (R1.0 2012, then OSS, later joined the Linux Foundation), netmap, XDP
- Software vswitches/routers: Click, BESS, OVS (Nicira; OVN), Lagopus, VPP
- Phases: research → internal product → OSS; 2012 BT vBRAS demo
Network Functions Virtualization
 Replace dedicated network nodes with virtual appliances
 Runs on general servers
 Leverage cloud provisioning and
monitoring system for management
 A virtual network function (VNF) may run on a virtual machine or a bare-metal server
Evaluate the benefits of SDN
by implementing our control plane and switch
 High-performance network I/O for all packet sizes
 Especially for small packets (< 256 bytes)
 Low latency and low jitter
 Network I/O & Packet processing
 Isolation
 Performance isolation between NFV VMs
 Security-related VM-to-VM isolation from untrusted apps
 Reliability, availability and serviceability (RAS) function for long-term
operation
NFV requirements from 30,000 feet
[Diagram: NFV virtualization layer — virtual machines on CPU cores over a virtual switch (vSwitch), NIC and hardware resources, with the Nf-Vi-H management-and-API interface; instruction/policing mapping and sequential-thread emulation]
 Still poor performance of NFV apps 
 Lower network I/O performance
 Large processing latency and jitter
 Limited deployment flexibility 
 SR-IOV has limitations in performance and configuration
 Combining DPDK apps on a guest VM with a DPDK-enabled vSwitch requires complex configuration
 Limited operational support 
 DPDK is good for performance, but has limited dynamic reconfiguration
 Maintenance features are not realized
What's the matter with NFV
 Not enough performance 
 Packet processing speed < 1 Gbps 
 Adding 10K flow entries takes > two hours 
 Flow management processing is too heavy 
 Development & extension difficulties 
 Too many abstraction layers cause confusion 
• Interface abstraction, switch abstraction, protocol abstraction, packet abstraction… 
 Multiple packet-processing code paths exist for the same processing 
• Userspace, kernel space, …
 Invisible flow entries make debugging chaotic 
Existing vSwitch is  @2013
vSwitch requirement from user side
 Run on commodity PC servers and NICs
 Provide a gateway function to connect various network domains
 Support frame types used in DC, IP-VPN, MPLS and access networks
 Achieve 10 Gbps wire rate with >= 1M flow rules
 Low-latency packet processing
 Flexible flow lookup using multiple tables
 High-performance flow rule setup/delete
 Run in userland and reduce tight dependency on the OS kernel
 Easy software upgrade and deployment
 Support various management and configuration protocols
Lagopus:
High-performance
SDN/OpenFlow Software Switch
Goal of Lagopus project
 Provide NFV/SDN-aware switch software stack
 Provide dataplane API with OpenFlow protocol and gRPC
 100Gbps-capable high-performance software dataplane
 DPDK extension for carrier requirements
 Cloud middleware adaptation
 Expand software-based packet processing to carrier
networks
 Lagopus is a small genus of birds in the grouse subfamily, commonly known as ptarmigans, all living in tundra or cold upland areas.
 Reference: http://en.wikipedia.org/wiki/Lagopus
What is Lagopus (雷鳥属)?
© Alpsdake 2013 / © Jan Frode Haugseth 2010
 Provide a high-performance software switch on Intel CPUs
 Over-100Gbps wire-rate packet processing / port
 Highly scalable flow handling
 Expand the SDN idea to many network domains
 Datacenter, NFV environment, mobile network
 Various management/configuration interfaces
Target of Lagopus switch
[Diagram: deployment targets — virtual switches on hypervisors (VM/NFV hosts) and ToR switches in the data center, gateways in the wide-area network, CPE in the access network, and the intranet]
 Open Source High performance SDN software switch
 Multicore-CPU-aware packet processing with DPDK
 Supports NFV environment
 Runs on Linux and FreeBSD
 Best OpenFlow 1.3 compliant software switch by Ryu
certification
 Support for many protocol frame matches and actions
• Ethernet, VLAN, MPLS, PBB, IPv4, IPv6, TCP, UDP, VxLAN, GRE, GTP
 Multiple flow tables, group table, meter table
 1M flow entries handling (4K flow mod/sec)
 Over-40Gbps-class packet processing (20MPPS)
 Open source under Apache v2 license
 http://lagopus.github.io/
What is Lagopus SDN software switch
How many packets must be processed for 10 Gbps?

[Chart: packets per second vs. packet size (0–1280 bytes) at 10 Gbps line rate]
Short packet (64 bytes): 14.88 MPPS, 67.2 ns/packet
• 2 GHz: 134 clocks
• 3 GHz: 201 clocks
1-KByte packet: 1.2 MPPS, 835 ns/packet
• 2 GHz: 1670 clocks
• 3 GHz: 2505 clocks
L1 cache access: 4 clocks
L2 cache access: 12 clocks
L3 cache access: 44 clocks
Main memory: 100 clocks
Max PPS between cores: 20 MPPS
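The arithmetic behind these numbers can be checked with a short sketch (function names and the 20-byte on-the-wire overhead constant — 7B preamble + 1B SFD + 12B inter-frame gap — are mine, not from the slide):

```python
# Back-of-the-envelope line-rate math for 10GbE.
# A 64B frame occupies 64 + 20 = 84 bytes of link time on the wire.

def max_pps(link_bps, frame_bytes, overhead_bytes=20):
    """Theoretical maximum packets/second for a given frame size."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_frame

def cycles_per_packet(link_bps, frame_bytes, cpu_hz):
    """CPU cycle budget per packet at line rate."""
    return cpu_hz / max_pps(link_bps, frame_bytes)

pps_64 = max_pps(10e9, 64)                        # ~14.88 Mpps
budget_2ghz = cycles_per_packet(10e9, 64, 2e9)    # ~134 cycles
budget_3ghz = cycles_per_packet(10e9, 64, 3e9)    # ~201 cycles
print(round(pps_64 / 1e6, 2), int(budget_2ghz), int(budget_3ghz))
# 14.88 134 201
```

With a cycle budget barely above one L3 miss, every cache line and memory access counts.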
PC architecture and limitation
[Diagram: dual-socket x86 server — two CPUs with local memory connected by QPI, NICs attached via PCI-Express. Reference: Supermicro X9DAi]
# of CPU cycles for typical packet processing

[Chart: required CPU cycles (0–6000) at 10 Gbps and 1 Gbps for L2 forwarding, IP routing, L2–L4 classification, TCP termination, simple OpenFlow processing, and DPI]
 Simple is better for everything
 Packet processing
 Protocol handling
 Straightforward approach
 Built from scratch (no reuse of existing vSwitch code)
 Userland packet processing as much as possible; keep kernel code small
 Kernel module updates are hard in operation
 Every component & algorithm can be replaced
Approach for vSwitch development
What is Lagopus vSwitch
[Diagram: Lagopus architecture — switch configuration datastore (config/stats API, switch DSL); OpenFlow 1.3 agent and SDN switch agent with JSON/SNMP/CLI interfaces; Lagopus software dataplane with the OpenFlow pipeline (flow tables, group table, meter table), flow lookup, flow cache, queue/policer; switch HAL over DPDK libs/PMD drivers, DPDK and non-DPDK NICs/vNICs, and the OS network stack]
• Full OpenFlow 1.3 support
• Controller-less basic L2 and
L3 support with
action_normal
SDN-aware
management API
• JSON-based control
• Ansible support
DPDK-enabled
OpenFlow-aware
software dataplane
• Over-10-Gbps performance
• Low latency packet processing
• high performance multi-layer flow
lookup
• Cuckoo hash for flow cache
Switch configuration
datastore
• Pub/sub mechanism
• Switch config DSL
• JSON-based control
Various I/O support
• DPDK-enabled NIC
• Standard NIC with raw socket
• tap
Virtualization support
• QEMU/KVM
• Vhost-user
• DPDK-enabled VNF
General packet processing on UNIX
NIC
skb_buf
Ethernet Driver API
Socket API
vswitch
packet
buffer
Data plane
User-space implementation
(Event-triggered)
1. Interrupt
& DMA
2. system call (read)
User
space
Kernel
space
Driver
4. DMA
3. system call (write)
Kernel-space implementation
(Event-triggered)
NIC
skb_buf
Ethernet Driver API
Socket API
vswitch
packet
buffer
1. Interrupt
& DMA
vswitch
Data plane
agentagent
2. DMA
Context switches
Massive interrupts
Many memory copies / reads
 x86 architecture-optimized data-
plane library and NIC drivers
 Memory structure-aware queue, buffer
management
 packet flow classification
 polling mode-based NIC driver
 Low-overhead & high-speed runtime optimized for data-plane processing
 Abstraction layer for heterogeneous server environments
 BSD license 
Data Plane Development Kit (DPDK)
[Diagram: DPDK apps exchange packets with the NIC directly via DMA to a user-space packet buffer, bypassing the kernel socket API and Ethernet driver]
What DPDK helps
[Diagram (Intel Network Platforms Group): "Packet Processing Kernel Space vs. User Space" — a kernel-space driver with socket buffers (skb's), system calls, interrupts and a packet-data copy into user space, vs. a DPDK user-space driver (PMD over UIO) with zero copy: descriptor rings and configuration mapped from the kernel, DMA directly into user-space memory]
Benefit #1: removes the data copy from kernel to user space
Benefit #2: no interrupts
Benefit #3: the network stack can be streamlined and optimized
Processing bypass for speed
NIC
skb_buf
Ethernet Driver API
Socket API
vswitch
packet
buffer
packet buffer
memory
Standard Linux application
1. Interrupt & DMA
2. system call (read)
User space
Kernel
space
Driver
4. DMA
3. system call (write)
NIC
Ethernet Driver API
User-mode I/O & HAL
vswitch
packet
buffer
Application with intel DPDK
1. DMA Write 2. DMA READ
DPDK Library
Polling-based packet handling
Event-based packet handling
Implementation strategy for vSwitch
 Massive RX interrupt handling for NIC devices
=> Polling-based packet receiving
 Heavy overhead of task switching
=> Thread assignment (one thread per physical CPU core)
 Lower PCI-Express I/O and memory bandwidth compared with the CPU
=> Reduce the number of I/O and memory accesses
 Shared data access is a bottleneck between threads
=> Lockless queues, RCU, batch processing
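As an illustration of the lockless-queue idea — not Lagopus code, just a minimal sketch of the single-producer/single-consumer pattern that DPDK's rte_ring generalizes:

```python
# Minimal single-producer/single-consumer ring buffer sketch.
# With one writer updating only `head` and one reader updating only
# `tail`, the two threads never write the same index, so no lock is
# needed on this fast path.

class SpscRing:
    def __init__(self, size):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.buf = [None] * size
        self.mask = size - 1
        self.head = 0   # written only by the producer
        self.tail = 0   # written only by the consumer

    def enqueue(self, pkt):
        if self.head - self.tail == len(self.buf):
            return False                      # ring full
        self.buf[self.head & self.mask] = pkt
        self.head += 1
        return True

    def dequeue(self):
        if self.tail == self.head:
            return None                       # ring empty
        pkt = self.buf[self.tail & self.mask]
        self.tail += 1
        return pkt

ring = SpscRing(4)
for i in range(3):
    ring.enqueue(i)
print([ring.dequeue() for _ in range(3)])   # [0, 1, 2]
```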
Basic packet processing
packet → Network I/O RX → Frame processing → Flow lookup & Action → QoS / Queue → Network I/O TX → packet

- Network I/O RX: packet classification & packet distribution to buffers
- Frame processing: packet parsing
- Flow lookup & Action: lookup, header rewrite, encap/decap
- QoS / Queue: policer, shaper, marking
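The stages above can be sketched as chained functions (stage and field names are illustrative, not the Lagopus API):

```python
# Toy version of the RX -> parse -> lookup/action -> QoS -> TX pipeline.

def rx_classify(raw):
    # Network I/O RX: classify and hand the packet to a buffer
    return {"raw": raw, "port": raw.get("port")}

def parse(pkt):
    # Frame processing: header parsing (stand-in for real parsing)
    pkt["eth_type"] = pkt["raw"].get("eth_type")
    return pkt

def lookup_and_act(pkt, table):
    # Flow lookup & action: header rewrite / encap / decap would go here
    pkt["action"] = table.get(pkt["eth_type"], "drop")
    return pkt

def qos(pkt):
    # Policer / shaper / marking: pick an output queue
    pkt["queue"] = 0
    return pkt

def tx(pkt):
    # Network I/O TX
    return (pkt["action"], pkt["queue"])

table = {0x0800: "forward"}
result = tx(qos(lookup_and_act(parse(rx_classify(
    {"port": 1, "eth_type": 0x0800})), table)))
print(result)   # ('forward', 0)
```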
What we did for performance 

packet → Network I/O RX → Frame processing → Flow lookup & Action → QoS / Queue → Network I/O TX → packet

Optimizations applied along the pipeline:
• Delayed packet frame evaluation
• Delayed action (processing) evaluation
• Packet batching to improve CPU cache efficiency
• Delayed flow stats evaluation
• Smart flow classification
• Thread assignment optimization
• Parallel flow lookup
• Lookup tree compaction
• High-performance lookup algorithm for OpenFlow (multi-layer, mask, priority-aware flow lookup)
• Flow cache mechanism
• Batch size tuning
 Exploit many core CPUs
 Reduce data copy & move (reference access)
 Simple packet classifier for parallel processing in I/O RX
 Decouple I/O processing and flow processing
 Improve D-cache efficiency
 Explicit thread assignment to CPU cores
Packet processing using multi core CPUs
[Diagram: NIC RX queues feed I/O RX threads (CPU0–1), which distribute packets through ring buffers to flow-lookup/packet-processing threads (CPU2–5), which feed I/O TX threads (CPU6–7) and the NIC TX queues]
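The I/O RX classifier step can be sketched as hash-based distribution — a toy model (not Lagopus code) of why packets of one flow always land on the same lookup core:

```python
# Sketch of the simple packet classifier in the I/O RX thread: hash a
# packet's flow key and pick a worker ring, so one flow is always
# processed by the same flow-lookup core (good D-cache locality, no
# cross-core reordering within a flow).

def flow_key_hash(pkt):
    key = (pkt["src"], pkt["dst"], pkt.get("proto", 0))
    return hash(key)

def distribute(pkt, worker_queues):
    idx = flow_key_hash(pkt) % len(worker_queues)
    worker_queues[idx].append(pkt)
    return idx

queues = [[], [], [], []]
a = distribute({"src": "10.0.0.1", "dst": "10.0.0.2"}, queues)
b = distribute({"src": "10.0.0.1", "dst": "10.0.0.2"}, queues)
print(a == b)   # True: same flow -> same worker
```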
 OpenFlow semantics include
 Match
• Protocol headers
– Port #, Ethernet, VLAN, PBB (MAC-in-MAC), MPLS, IPv4, IPv6, ARP, ICMP, TCP, UDP…
– Mask-enabled
• Priority
 Action
• Output port
• Packet-in / Packet-out
• Header rewrite, header push, header pop
Why the OpenFlow match is so hard
Many header matches

[Diagram: per-protocol header field layouts — Ethernet (eth_dst, eth_src, eth_type), IPv4, UDP(v4), ICMP, ARP, MPLS, VLAN; TCP and SCTP are analogous to UDP, MPLS carries IP and other protocols, and GRE, L2TP, OSPF… continue the list]
0: Linear search (first implementation)

[Diagram: every flow entry is checked against the parsed packet in turn — e.g. entry 0: eth_type 0x0800? ip_p 17? udp_dst 53?; entry 1: eth_type 0x0800? ip_p 6? tcp_dst 53?; entry 2: eth_type 0x0800? ip_p 6? tcp_dst 80?]
 Simplify the comparison routine
 A flow entry is composed of a mask and a value
 Still linear search
1: bitmap comparison with bitmask

[Diagram: packet & mask == flow-entry value — e.g. masks 0xffff / 0xff / 0xffff over eth_type / ip_p / tcp_src against entry values 0x0800 / 6 / 80]
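The mask-and-value comparison can be sketched as follows (the single-integer field packing is my simplification; real code compares per field):

```python
# A flow entry as (mask, value): a packet matches when
# packet_bits & mask == value.

def make_entry(value, mask):
    return (mask, value & mask)

def matches(packet_bits, entry):
    mask, value = entry
    return packet_bits & mask == value

# Pack 16-bit eth_type | 8-bit ip_proto | 16-bit dst_port into one int.
def pack(eth_type, ip_proto, dst_port):
    return (eth_type << 24) | (ip_proto << 16) | dst_port

# Entry: IPv4 (0x0800), TCP (6), dst port 80, all fields fully masked.
entry = make_entry(pack(0x0800, 6, 80), pack(0xFFFF, 0xFF, 0xFFFF))
print(matches(pack(0x0800, 6, 80), entry))    # True:  TCP to port 80
print(matches(pack(0x0800, 17, 53), entry))   # False: UDP to port 53
```

Wildcarded fields simply get a zero mask, which is what makes this one comparison routine cover every match type.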
 Compose a search tree on dedicated fields to narrow the lookup space
 Then compare each field
2: Fixed-type search tree

[Diagram: the packet walks an array / hash-table tree keyed on fixed fields, ending in a linear search over the remaining candidate entries]
 Scan all flow entries and compose a search tree, arranged in order from the most frequently searched field
3: Search tree with most-searched field first

[Diagram: the packet walks hash tables keyed on the most frequently searched fields (eth_dst / eth_src / eth_type, ip_dst, …), ending in a linear search]
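A toy model of "hash on the most-searched field, then scan" (the dict-based table and field names are illustrative, not the Lagopus data structure):

```python
# Bucket flows by eth_type; within a bucket, try entries in priority
# order with exact comparison on the remaining match fields.

def insert(table, entry):
    bucket = table.setdefault(entry["eth_type"], [])
    bucket.append(entry)
    bucket.sort(key=lambda e: -e["priority"])   # highest priority first

def lookup(table, pkt):
    for e in table.get(pkt["eth_type"], []):
        if all(pkt.get(f) == v for f, v in e["match"].items()):
            return e["action"]
    return "miss"

table = {}
insert(table, {"eth_type": 0x0800, "priority": 10,
               "match": {"ip_proto": 6, "tcp_dst": 80}, "action": "to_web"})
insert(table, {"eth_type": 0x0800, "priority": 1,
               "match": {}, "action": "default_ip"})

print(lookup(table, {"eth_type": 0x0800, "ip_proto": 6, "tcp_dst": 80}))
# to_web
print(lookup(table, {"eth_type": 0x0800, "ip_proto": 17}))   # default_ip
print(lookup(table, {"eth_type": 0x0806}))                   # miss
```

One hash probe eliminates every entry with a different eth_type, so the expensive per-field comparison runs only over a small bucket.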
 Reduce the number of locks taken on the flow lookup table
 Frequent locks are required for:
• the OpenFlow agent and counter retrieval for SNMP
• packet processing
Packet batching

●Naïve implementation: lock / unlock around every input packet — a lock is required for each packet
●Packet batching implementation: lock / unlock around each batch — locking is reduced by batching packets
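The batching idea in miniature (a sketch, not the Lagopus locking scheme):

```python
# Amortize the flow-table lock over a batch: take it once per batch
# instead of once per packet.

from threading import Lock

class FlowTable:
    def __init__(self, rules):
        self.rules = rules
        self.lock = Lock()   # also contended by the agent / stats reader

    def lookup_batch(self, packets):
        results = []
        with self.lock:              # 1 lock per batch, not per packet
            for pkt in packets:
                results.append(self.rules.get(pkt, "drop"))
        return results

table = FlowTable({"flow-a": "port1", "flow-b": "port2"})
print(table.lookup_batch(["flow-a", "flow-b", "flow-c"]))
# ['port1', 'port2', 'drop']
```

With a batch of 32 packets, the lock acquire/release cost per packet drops by ~32x, at the price of slightly higher per-packet latency.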
 Reduce the number of flow lookups across multiple tables
 Generate a composed flow hash function spanning the flow tables
 Introduce a flow cache for each CPU core
Bypass the pipeline with the flow cache

●Naïve implementation: every input packet traverses table1 → table2 → table3
●Lagopus implementation: 1. a new flow traverses the tables and the composed result is written to the flow cache; 2. subsequent packets of the flow hit the cache and bypass the pipeline
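A minimal model of the flow-cache bypass (Lagopus uses a cuckoo hash per core; a Python dict stands in here):

```python
# First packet of a flow walks every table; the composed action list is
# cached, so later packets of the same flow skip the whole pipeline.

def pipeline(flow_key, tables):
    actions = []
    for table in tables:
        actions.append(table.get(flow_key, "goto_next"))
    return actions

def process(flow_key, tables, cache):
    if flow_key in cache:                      # 2. cache hit: bypass
        return cache[flow_key], True
    actions = pipeline(flow_key, tables)       # 1. new flow: full lookup
    cache[flow_key] = actions                  # write flow cache
    return actions, False

tables = [{"f1": "push_vlan"}, {"f1": "route"}, {"f1": "output:1"}]
cache = {}
first, hit1 = process("f1", tables, cache)
again, hit2 = process("f1", tables, cache)
print(hit1, hit2, again)
# False True ['push_vlan', 'route', 'output:1']
```

Keeping one cache per core avoids any cross-core synchronization on the hot path; the cost is that each core warms its own cache.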
Best OpenFlow 1.3 compliant switch

| Type | Action | Set field | Match | Group | Meter | Total |
|---|---|---|---|---|---|---|
| # of test scenarios (mandatory, optional) | 56 (3, 53) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 36 (0, 36) | 991 (114, 877) |
| Lagopus 2014.11.09 | 59 (3, 56) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 34 (0, 34) | 980 (114, 866) |
| OVS (kernel) 2014.08.08 | 34 (3, 31) | 96 (0, 96) | 534 (108, 426) | 6 (3, 3) | 0 (0, 0) | 670 (114, 556) |
| OVS (netdev) 2014.11.05 | 34 (3, 31) | 102 (0, 102) | 467 (93, 374) | 8 (3, 5) | 0 (0, 0) | 611 (99, 556) |
| IVS 2015.02.11 | 17 (3, 14) | 46 (0, 46) | 323 (108, 229) | 3 (0, 2) | 0 (0, 0) | 402 (111, 291) |
| ofswitch 2015.01.08 | 50 (3, 47) | 100 (0, 100) | 708 (108, 600) | 15 (3, 12) | 30 (0, 30) | 962 (114, 848) |
| LINC 2015.01.29 | 24 (3, 21) | 68 (0, 68) | 428 (108, 320) | 3 (3, 0) | 4 (0, 4) | 523 (114, 409) |
| Trema 2014.11.28 | 50 (3, 47) | 159 (0, 159) | 708 (108, 600) | 15 (3, 12) | 34 (0, 34) | 966 (114, 854) |
 Summary
 Throughput: 10Gbps wire-rate
 Flow rules: 1M flow rules
 Evaluation models
 WAN-DC gateway
• MPLS-VLAN mapping
 L2 switch
• MAC address switching
Performance Evaluation
Typical carrier use case
Use case: Cloud-VPN gateway
Tomorrow:
• Automatic connection setup via northbound APIs
• The SDN controller maintains the mapping between tenant logical networks and VPNs
• Routes are advertised via eBGP; no need to configure ASBRs on the provider side
[Diagram: data center network ↔ MPLS-VPN via Inter-AS option B; the SDN controller drives the gateway via OpenFlow/API, routes are exchanged over eBGP with a VRF-enabled MPLS router. From the ONS2014 NTT Com presentation by Ito-san]
 Evaluation setup
 Server spec.
 CPU: Dual Intel Xeon E5-2660
• 8 cores (16 threads), 20M cache, 2.2 GHz, 8.00 GT/s QPI, Sandy Bridge
 Memory: DDR3-1600 ECC 64GB
• Quad-channel 8x8GB
 Chipset: Intel C602
 NIC: Intel Ethernet Converged Network Adapter X520-DA2
• Intel 82599ES, PCIe v2.0
Performance Evaluation

[Diagram: a tester generates flows against the Lagopus server, varying flow rules, packet size, and flow cache (on/off); throughput is measured in bps/pps/%]
WAN-DC Gateway
[Chart: "Throughput vs packet size, 1 flow, flow-cache" — throughput (0–10 Gbps) vs. packet size (0–1600 bytes) for 10 to 1M flow rules]
[Chart: "Throughput vs flows, 1518 bytes packet" — throughput (0–10 Gbps) vs. number of flows (1–1M) for 10k/100k/1M flow rules]
 10000 IP subnet entries test
L3 forwarding performance with 40G NIC

CPU: E5-2667 v3 3.20 GHz x2
Memory: DDR4 64 GB
NIC: Intel X710 x2
OS: Ubuntu 14.04 LTS
DPDK: 2.2
Lagopus: 0.2.4 with default options
 Full OpenFlow 1.3 support
 Limited OpenFlow 1.5 support (Flowmod-related instructions)
 General tunnel encap/decap extension (EXT-382 and EXT-566) support
• GRE, VxLAN, GTP, Ethernet, IPv4, IPv6
• Updated draft will be implemented soon
 Flexible & high-performance dataplane
 Hybrid-mode support
• ACTION_NORMAL (L2, L3)
 Various network I/O support
• DPDK NIC, vNIC (vhost-user with virtio-net), non-DPDK NIC (raw socket)
 Leverage the OS kernel network stack for the OpenFlow switch
• ARP, ICMP and routing control packets are escalated to the network stack (via a tap IF)
 Queue, Meter table
 Linux, FreeBSD, NetBSD support
 Virtualization support for NFV
 DPDK-enabled VNF on DPDK-enabled vSwitch
 QEMU/KVM support through virsh
Lagopus version 0.2.10
Hands-on and seminar
Collaboration with Lagopus
Business / research institutes and networks
Software switch collaboration White box switch collaboration
Yokoyama Laboratory
High-performance vNIC framework for
hypervisor-based NFV with userspace
vSwitch
• To provide novel components enabling high-performance NFV on general-purpose hardware
• To provide a high-performance vNIC with operation-friendly features
Issues on NFV middleware
Performance bottleneck in NFV with HV domain
[Diagram: packet path for a legacy guest — NIC → host kernel (NIC driver, bridge, TAP driver) → QEMU (TAP client, NIC HW emulator) → guest kernel (e1000 NIC driver, network stack) → legacy network apps, with copies, interrupts and register accesses at each hop]
• Packet receive/send causes VM transitions
• HW emulation costs CPU cycles & VM transitions
• Privileged register accesses for the vNIC cause VM transitions
• System calls cause context switches on the guest VM
• VM transition: ~800 CPU cycles
 Use para-virtualization NIC framework
 No full-virtualization (emulation-based)
 Global-shared memory-based packet exchange
 Reduce memory copy
 User-space-based packet data exchange
 No kernel-userspace packet data exchange
vNIC strategy for performance & RAS
 DPDK apps or legacy apps on guest VM
+ userspace DPDK vSwitch
 Connected by shared memory-based vNIC
 Reduce OS kernel implementation
Target NFV architecture with hypervisor
[Diagram: the NFV virtualization-layer picture again — VMs on CPU cores over a vSwitch and NIC; the vSwitch runs in userspace to avoid VM transitions and context switches, with memory-based packet transfer]
Existing vNIC for u-vSW and guest VM (1/2)
1. DPDK e1000 PMD with QEMU's e1000 full virtualization, vSwitch connected by tap
2. DPDK virtio-net PV PMD with the QEMU virtio-net framework, vSwitch connected by tap
3. DPDK virtio-net PV PMD with the vhost-net framework, vSwitch connected by tap
[Diagrams: in all three configurations the DPDK-enabled vSwitch reaches the guest through the host-kernel TAP driver, with extra packet copies, register accesses and — for the QEMU-emulated e1000/virtio-net paths — HW emulation]
Pros (all three): legacy and DPDK support, opposite status detection
Cons (all three): bad performance, many VM transitions, context switches
[Diagrams: (a) DPDK rings over the QEMU IVSHMEM device — hugepage-backed packet buffers mmap'd into both the guest and the vSwitch, zero copy; (b) the guest's virtio-net PMD with the vSwitch terminating the virtqueue via the DPDK vhost-user backend, one copy]
Existing vNIC for u-vSW and guest VM (2/2)
(a) DPDK ring via the QEMU IVSHMEM extension, vSwitch connected by shared memory
Pros: best performance
Cons: DPDK-only support, static configuration, no RAS
(b) DPDK virtio-net PV PMD with the QEMU virtio-net framework, vSwitch using the DPDK vhost-user API to connect to the virtio-net PMD
Pros: good performance, supports both legacy and DPDK
Cons: no status tracking of the opposite device
High performance
vNIC framework for NFV
This patch has already been merged into DPDK
 High-Performance
 10-Gbps network I/O throughput
 No virtualization transition between a guest VM and u-vSW
 Simultaneous support for DPDK apps and a DPDK u-vSW
 Functionality for operation
 Isolation between NFV VM and u-vSW
 Flexible service maintenance support
 Link status notification on both sides
 Virtualization middleware support
 Support open source hypervisor (KVM)
 DPDK app and legacy app support
 No OS (kernel) modification on a guest VM
vNIC requirements for NFV with u-vSW
 vNIC as an extension of the virtio-net framework
 Para-virtualization network interface
 Packet communication via global shared memory
 One packet copy to ensure VM-to-VM isolation
 Control messages via inter-process communication between pseudo devices
vNIC design
[Diagram: guest DPDK apps use the virtio-net PMD against a virtio-net-compatible device; the vSwitch dataplane attaches a pseudo PMD-enabled device; packets cross via global shared memory, control via IPC]
 Virtq-PMD driver: 4K LOC modification
 Virtio-net device with DPDK extension
 DPDK API and PV-based NIC (virtio-net) API
 Global shared memory-based packet transmission on hugeTLB
 UNIX domain socket based control message
• Event notification (link-status, finalization)
• Polling-based opposite-device check mechanism
 QEMU: 1K LOC modification
 virtio-net-ipc device on shared memory space
 Shared memory-based device mapping
vNIC implementation
[Diagram: guest virtio-net PMD ↔ virtio-net-ipc device in QEMU ↔ virtq-PMD in the vSwitch; packet buffers live on global shared memory / HugeTLB; a Unix domain socket carries kick/queue-address messages and event notifications]
Performance
[Diagram: measurement setups compared with testpmd and null PMDs — the virtq-PMD path (guest virtio-net PMD over a virtqueue), a vhost application on the host, a kernel virtio-net driver via the TAP driver, and a bare-metal baseline]
 Micro-benchmarking tool: testpmd
 A polling-based DPDK bridge app that reads packets from one NIC and writes them to another, in both directions
 null PMD: a DPDK-enabled dummy PMD that generates packets from a memory buffer and discards packets into a memory buffer
Performance benchmark
Performance evaluation

 Virtq PMD achieved great performance
 62.45 Gbps (7.36 MPPS) unidirectional throughput
 122.90 Gbps (14.72 MPPS) bidirectional throughput
 5.7x faster than the Linux driver at 64B, 2.8x faster at 1500B
 Virtq PMD also outperformed the vhost app, especially at large packet sizes
[Charts: MPPS and Gbps vs. packet size (64–1536B) for virtq PMD, vhost app and the Linux/kernel driver, each with a setting off/on]
Container adaptation
Vhost-user for container
 Vhost-user compatible
PMD for container
 Virtio-net-based backend
 Shared-memory-based
packet data exchange
 Event-trigger by shared file
 27.27 Gbps throughtput
67Copyright©2017 NTT corp. All Rights Reserved.
 Packet flow
 pktgen -> physical -> vswitch -> Container (L2Fwd) -> vswitch -> physical -> pktgen
Performance
[Diagram: pktgen-dpdk on the server → Lagopus or docker0 → container running L2Fwd or Linux Bridge → back]
OS: Ubuntu 16.04.1
CPU: Xeon E5-2697 v2 @ 2.70GHz
Mem: 64GB
68Copyright©2017 NTT corp. All Rights Reserved.
Performance
69Copyright©2017 NTT corp. All Rights Reserved.
SDN IX
@ Interop Tokyo 2015 ShowNet
Interop Tokyo is
the biggest Internet-related technology show in Japan.
This trial was a collaboration with the NECOMA project
(NAIST & University of Tokyo)
70Copyright©2017 NTT corp. All Rights Reserved.
 IX (Internet eXchange)
 Packet exchange point between ISPs and
DC-SPs
 Border routers of ISPs exchange route
information
 Issue
 Enhance automation in provisioning and
configuration
 DDoS attack is one of the most critical
issues
• ISPs want to reduce DDoS-related traffic at
its origin
• DDoS traffic occupies link bandwidth
Motivation of SDN-IX
[Diagram: IX topology — ISPs A–F exchanging traffic through the IX switch fabric]
71Copyright©2017 NTT corp. All Rights Reserved.
What is SDN IX?
 Next generation IX with SDN technology
 Web portal-based path provisioning between ISPs
• Inter-AS L2 connectivity
– VLAN-based path provisioning
– Private peer provisioning
 Protect network from DDoS attack
• On-demand 5-tuple-based packet filtering
 SDN IX controller and distributed SDN/OpenFlow IX core switches
developed by the NECOMA project
(NAIST and University of Tokyo)
72Copyright©2017 NTT corp. All Rights Reserved.
 Two Lagopus software switches are deployed as the
SDN-IX core switches
 Multiple 10Gbps links
 Dual Xeon E5 8core CPUs
Lagopus @ ShowNet 2015
73Copyright©2017 NTT corp. All Rights Reserved.
Lagopus @ ShowNet rack
74Copyright©2017 NTT corp. All Rights Reserved.
Connectivity between AS
[Diagram: inter-AS connectivity — lagopus-1 (DPID 2) and pf5240-1 (DPID 1) at Otemachi, lagopus-2 (DPID 4) and pf5240-2 (DPID 3) at Makuhari (venue); 10G-LR links interconnecting CRS-4, qfx10k, ne5kx8 and ax.note toward KDDI (AS290), JPIX, DIX-IE and JGNX (AS131154); VLAN IDs 799, 1600, 1060, 810, 910, 920 (temporary), 2, 3000]
75Copyright©2017 NTT corp. All Rights Reserved.
 Average 2Gbps throughput
 No packet drop
 No reboot & no trouble for 1 week during Interop Tokyo
 Sometimes 10Gbps burst traffic
Traffic on Lagopus @Makuhari
76Copyright©2017 NTT corp. All Rights Reserved.
Big change happened
Before: vSwitch has lots of issues on performance, scalability, stability, …
After: vSwitch works well without any trouble! Good performance, good stability.
77Copyright©2017 NTT corp. All Rights Reserved.
 The SDI special prize of the Best of Show Award
at Interop Tokyo 2015
 http://www.interop.jp/2015/english/exhibition/bsa.html
 Finalist
 The SDI category
 ShowNet demonstration
Award
78Copyright©2017 NTT corp. All Rights Reserved.
DPDK-enabled SDN/NFV middleware
with Lagopus & VNF with Vhost
@Interop Tokyo 2016
This trial was a collaboration with the
University of Tokyo and IP Infusion
79Copyright©2017 NTT corp. All Rights Reserved.
NFV middleware for scale-out VNFs
 Flexible load balancing for VNFs with
smart hash calculation and flow direction
 Hash calculation: NetFPGA-SUME
• Hash calculated from IP address pairs
• The hash value is injected into the MAC src field to direct flows to VNFs
 Classification and flow direction: Lagopus
• Flow direction via MAC src lookup
[Diagram: uplink → hash calc & MAC rewrite → Lagopus → MAC-based classification to VNFs on the hypervisor → Lagopus → downlink]
hash → dl_src: type1 → 52:54:00:00:00:01, type2 → 52:54:00:00:00:02, …, type256 → 52:54:00:00:00:FF
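The flow-direction idea above can be sketched in a few lines. The actual hash function in the NetFPGA-SUME stage is not specified on this slide, so CRC32 and the 52:54:00 MAC prefix here are stand-ins:

```python
import zlib

BUCKETS = 256  # one MAC suffix per hash type, as in the table above

def direct_mac(src_ip: str, dst_ip: str) -> str:
    """Map an IP address pair to a rewrite MAC (hypothetical sketch of the
    hash stage; the real hardware hash function is not given in the deck)."""
    h = zlib.crc32(f"{src_ip}-{dst_ip}".encode()) % BUCKETS
    return "52:54:00:00:00:%02X" % h

# Lagopus would then match dl_src to steer the flow to the right VNF
print(direct_mac("192.0.2.1", "198.51.100.7"))
```

All packets of one IP pair always hash to the same bucket, so a flow sticks to one VNF while load spreads across the 256 MAC values.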
80Copyright©2017 NTT corp. All Rights Reserved.
Bird in ShowNet
 Two Lagopus deployments
 NFV domain, SDN-IX
https://www.facebook.com/interop.shownet
81Copyright©2017 NTT corp. All Rights Reserved.
Challenges in Lagopus
 vNICs between DPDK-enabled Lagopus and DPDK-enabled
VNFs (VirNOS)
 Many vNICs and flow director (load balancing)
 8 VNFs and 18 vNICs in total
[Diagram: physical port1/port2 into Lagopus; ports 3–10 as vNIC pairs (Eth0/Eth1) to four VirNOS VNFs on the hypervisor]
82Copyright©2017 NTT corp. All Rights Reserved.
Explicit resource assignment for performance
 Packet-processing-workload-aware CPU assignment is required
for Lagopus and VNFs
[Diagram: dual-socket server (CPU0 and CPU1, each with its own memory and cores); the NIC and incoming traffic attach to CPU0]
83Copyright©2017 NTT corp. All Rights Reserved.
Resource assignment impacts packet-processing
performance
[Diagrams: two CPU placements of Lagopus and the 8 VNFs across the dual-socket server yield 10Gbps vs. 4.4Gbps — NUMA-aware placement relative to the NIC matters]
84Copyright©2017 NTT corp. All Rights Reserved.
[Diagram: the Lagopus + 4×VirNOS topology; one traffic direction achieves 10Gbps while the other drops to 4Gbps]
 DPDK-based systems need dedicated CPU cores for I/O because
DPDK uses polling-based network I/O
 Physical I/O is intensive compared to vNICs
CPU resource assignment for I/O (1/2)
85Copyright©2017 NTT corp. All Rights Reserved.
[Diagram: the same topology with traffic-path-aware core placement; both physical ports sustain 10Gbps, with 5Gbps on each vNIC path]
 Traffic-path-aware CPU assignment
 4 CPU cores were assigned to the I/O threads of Lagopus
CPU resource assignment for I/O (2/2)
86Copyright©2017 NTT corp. All Rights Reserved.
Performance evaluation
 Good performance and scalability
 But long packet journey
 Packet-in -> Physical NIC -> Lagopus -> vNIC -> VNF -> vNIC ->
Lagopus -> Physical NIC -> Packet-out
[Chart: throughput in Mbps vs. packet size (0–1500B); Lagopus tracks the wire rate across packet sizes]
87Copyright©2017 NTT corp. All Rights Reserved.
Other trials
88Copyright©2017 NTT corp. All Rights Reserved.
 Location-aware packet forwarding
+ Service Chain (NFV integration)
 Location-aware transparent security
check by NFV
 Virtual Network
 Intra network
• Web service and clients
• Malware site blocking
 Lab network
• Ixia tester for demo
• Policy management (Explicit routing for TE)
#1: Segment routing with Lagopus for campus network
[Diagram: three Lagopus pairs (lago0, lago1, lago2) forming the campus backbone, vFW VNFs at nfv0, Windows clients (win00/win01) and servers (Serv00/Serv01) on the virtual network, an untrusted-server blocking service on the web path, and an Ixia tester on the lab network]
89Copyright©2017 NTT corp. All Rights Reserved.
 Flexible video stream transmission
to multiple sites and devices
 Lagopus switch as a stream duplicator
 Simultaneous 4K video streaming to 50 sites
#2: transparent video stream duplication
[Diagram: live encoder → IP NW with Lagopus duplicating the stream → decoders in cinemas and public spaces]
90Copyright©2017 NTT corp. All Rights Reserved.
#2: Blackboard streaming without Teacher’s shadow
Human shadow transparent module
Pattern A
Pattern B
 Realtime image processing with a flow-direction switch
 Users can select modes with or without the teacher
 No configuration change other than video options
Students cannot see the board due to the shadow; the transparent processing
helps students 
Realtime image
processing
TM
O3 project: providing flexible wide
area networks with SDN
This research was executed under the "Research and Development of Network
Virtualization Technology" program commissioned by the Ministry of Internal Affairs and
Communications (FY2013-2016).
TM
Open Innovation over Network Platform
• 3 kinds of Contributions for User-oriented SDN
(1) Open development with OSS
(2) Standardization of architecture and interface
(3) Commercialization of new technologies
Toward open User-oriented SDN
©O3 Project 92
(1) Open (2) Standardization (3) Commercialization
TM
• Open, Organic, Optima
– Anyone, Anything, Anywhere
– Neutrality & Efficiency for Resource, Performance, Reliability, ….
– Multi-Layer, Multi-Provider, Multi-Service
• User-oriented SDN for WAN
– Softwarization: Unified Tools and Libraries
– On-demand, Dynamic, Scalable, High-performance
• Features
– Object-defined Network Framework
– SDN WAN Open Source Software
– SDN Design & Operations Guideline
• Accelerates
– Service Innovation, Re-engineering, Business Eco-System
The O3 Project Concept, Approach, & Goal
©O3 Project 93
TM
The O3 User-oriented SDN Architecture
©O3 Project 94
[Diagram: the D-plane comprises Path Nodes (optical/packet transport) and Switch Nodes (Lagopus, OF)]
The D-plane consists of Switch Nodes and Path Nodes; Switch Nodes provide programmability,
and Path Nodes provide various types of network resources.
The Orchestrator & Controllers create and configure virtual
networks for SDN users, enabling customized
control of each individual D-plane.
[Diagram: OTT-A and OTT-B control applications view and control their own virtual networks; a Network Orchestrator sits on a common control framework of controllers for the Switch Nodes (Lagopus, OF) and the Path Nodes, providing multi-layer, multi-domain control over the SDN nodes]
TM
• WAN experiments with Multi-vendor Equipment
Proof-of-Concept: Physical Configuration
©O3 Project 95
TM
PoC on Multi-Layer & Domain Control
©O3 Project 96
TM
PoC on Network Visualization
©O3 Project 97
The Hands-on training for ASEAN Smart Network
98Copyright©2017 NTT corp. All Rights Reserved.
Ryu SDN Framework
http://osrg.github.io/ryu/
99Copyright©2017 NTT corp. All Rights Reserved.
 OSS SDN Framework founded by NTT
 Software for building SDN control plane agilely
 Fully implemented in Python
 Apache v2 license
 More than 350 mailing list subscribers
 Supporting the latest southbound protocols
 OpenFlow 1.0, 1.2, 1.3, 1.4 (and Nicira extensions)
 BGP
 OF-Config 1.2
 OVSDB JSON
What’s RYU?
100Copyright©2017 NTT corp. All Rights Reserved.
Many users
and more…
101Copyright©2017 NTT corp. All Rights Reserved.
 Developed mainly for network operators
 Not for vendors selling a specific hardware switch
 Integration with existing networks
 Gradually 'SDN-ing' the existing networks
Ryu development principles
102Copyright©2017 NTT corp. All Rights Reserved.
 Your applications are free from the OF wire format (and some details
such as handshaking)
What ‘supporting OpenFlow’ means?
Python
Object
OF wire
protocol
Data
Plane
Ryu converts it
Python
Object
OF wire
Protocol
Ryu generates
Your application
does something here
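Concretely, every OpenFlow message begins with a fixed 8-byte big-endian header (version, type, length, xid); Ryu parses this framing and hands your handler a Python object instead. A minimal sketch of the wire side it hides:

```python
import struct

# The fixed 8-byte header that starts every OpenFlow message
# (version, type, length, xid), network byte order.
OFP_HEADER = struct.Struct("!BBHI")

def parse_header(buf: bytes) -> dict:
    version, msg_type, length, xid = OFP_HEADER.unpack_from(buf)
    return {"version": version, "type": msg_type, "length": length, "xid": xid}

# OFPT_HELLO (type 0) from an OpenFlow 1.3 switch (wire version 0x04)
hello = struct.pack("!BBHI", 0x04, 0, 8, 42)
print(parse_header(hello))  # -> {'version': 4, 'type': 0, 'length': 8, 'xid': 42}
```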
103Copyright©2017 NTT corp. All Rights Reserved.
Ryu development is automated
github
Push the new code
Unit tests are executed
Docker hub image
is updated
Ryu certification is
executed on test
lab
Ryu certification
site is updated
You can update
your Ryu
environment
with one command
104Copyright©2017 NTT corp. All Rights Reserved.
Lessons learned
105Copyright©2017 NTT corp. All Rights Reserved.
 What’s OpenStack?
 OSS for building IaaS
 You can run lots of VMs
 Many SDN solutions are supported
 What does SDN mean for OpenStack?
 The network for your VMs is separated from others
 A virtual L2 network on top of an L3 network
SDN in OpenStack
106Copyright©2017 NTT corp. All Rights Reserved.
 Virtual L2 on tunnels (VXLAN, GRE, etc)
Typical virtual L2 implementation
OVS
Agent
Compute node
VM VM
OVS
Agent
Compute node
VM VM
OVS
Agent
Compute node
VM VM
OVS
Agent
Compute node
VM VM
Tunnel
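For example, a VXLAN tunnel carries the virtual L2 frame behind an 8-byte header (RFC 7348) whose only meaningful fields are the I flag and the 24-bit VNI identifying the tenant network. A sketch of just that header:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348): flags byte 0x08 (VNI valid),
    three reserved bytes, then the 24-bit VNI over a final reserved byte."""
    assert 0 <= vni < 2**24
    return struct.pack("!II", 0x08000000, vni << 8)

# VNI 5001 (0x001389): flags + reserved, then VNI + reserved byte
print(vxlan_header(5001).hex())  # -> 0800000000138900
```

In a real deployment this sits inside a UDP/IP outer packet, which is why the virtual L2 segments ride transparently over the L3 fabric between compute nodes.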
107Copyright©2017 NTT corp. All Rights Reserved.
People advocated something like this
Data
Plane
Data
Plane
Data
Plane
OpenFlow Controller
OpenFlow Protocol
Application Logic
108Copyright©2017 NTT corp. All Rights Reserved.
 Same as other OpenFlow controllers
 The controller is connected to all the OVSes
Our first version OpenStack integration
Plugin
Neutron
Server
Ryu
OVS
RYU
Agent
Compute node
VM VM
Custom REST API
OpenFlow
OVS
Agent
Compute node
VM VM
OVS
Agent
Compute node
VM VM
OpenStack REST API
SDN
Operational
Intelligence
109Copyright©2017 NTT corp. All Rights Reserved.
 Scalability
 Availability
What’s the problems?
110Copyright©2017 NTT corp. All Rights Reserved.
 How many switches can a single controller handle?
 Can it handle hundreds or thousands?
 The controller does more than setting up flows
 Replying to ARP requests rather than flooding them to all
compute nodes
 Making OVS work as an L3 router rather than sending packets to a central router
 You could add more here
Scalability
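The ARP offload mentioned above amounts to the controller (or a local agent) synthesizing the 28-byte ARP reply itself instead of flooding the request. A hedged sketch, not Ryu or Neutron code, with the Ethernet header omitted:

```python
import struct
import socket

def arp_reply(req_sha: bytes, req_spa: str, our_mac: bytes, our_ip: str) -> bytes:
    """Craft the ARP payload a proxy-ARP controller would send back,
    answering 'who-has our_ip' with our_mac (sketch; Ethernet header omitted)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 2,                 # Ethernet/IPv4, opcode 2 = reply
        our_mac, socket.inet_aton(our_ip),  # sender = the answered host
        req_sha, socket.inet_aton(req_spa), # target = the original asker
    )

pkt = arp_reply(b"\x52\x54\x00\x00\x00\x01", "10.0.0.2",
                b"\x52\x54\x00\x00\x00\x63", "10.0.0.1")
print(len(pkt))  # -> 28
```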
111Copyright©2017 NTT corp. All Rights Reserved.
 The death of a controller leads to the death of the whole cloud
 No more network configuration
Availability
112Copyright©2017 NTT corp. All Rights Reserved.
 OFC on every compute node
 One controller handles only one OVS
Our second version (OFAgent driver)
Neutron
Server
OVS
RYU
Agent
Compute node
VM VM
OVS
RYU
Agent
Compute node
VM VM
OVS
RYU
Agent
(OFC)
Compute node
VM VM
OpenStack standard RPC
Over queue system
Released in Icehouse
OpenStack REST API
OpenFlow is used only inside a compute node
• Scalable with the number of compute nodes
• No single point of failure in OFAgent
SDN
Operational
Intelligence
113Copyright©2017 NTT corp. All Rights Reserved.
 Push more features to the edges
 Distribute features
 Place only the features you can't distribute (e.g. TE) on a central node
 Loosely couple the central node and the edges
 Tight coupling doesn't scale (e.g. OpenFlow connections between a controller
and all switches)
 Existing technologies like message queues work
SDN deployment for scale
114Copyright©2017 NTT corp. All Rights Reserved.
 NSA (National Security Agency)
More users: Tracking network activities
“The NSA is using NTT’s Ryu SDN controller. Larish says it’s a few thousand
lines of Python code that’s easy to learn, understand, deploy and troubleshoot”
http://www.networkworld.com/article/2937787/sdn/nsa-uses-openflow-for-tracking-its-network.html
115Copyright©2017 NTT corp. All Rights Reserved.
 TouIX (IX in France) : Replacing expensive legacy switch with
whitebox switch and Ryu
More users: IX (Internet Exchange)
“The deployment is leveraging Ryu, the NTT Labs open-source controller”
http://finance.yahoo.com/news/pica8-powers-sdn-driven-internet-120000932.html
Zebra 2.0
Open Source Routing Software
Open Source Revisited
• Apache License
• Written From Scratch in Go
• Goroutines & Go channels are used for
multiplexing
• Task Completion Model + Thread Model
• Single SPF Engine for OSPFv2/OSPFv3/IS-IS
• Forwarding Engine Abstraction for DPDK/OF-
DPA
• Configuration with Commit/Rollback
• gRPC for Zebra control
Architecture
• Single Process/Multithread Architecture
BGP OSPF RSVP-TE LDP
FEA
OpenConfigd
• Configuration system with Commit & Rollback support
• Configuration is defined by YANG
• CLI, NetConf, and REST API are automatically
generated
• `confsh` - bash based CLI command
• OpenConfig is fully supported
OpenConfigd Architecture
• gRPC is used for transport
• completion/show/config APIs between shell
OpenConfigd
Zebra 2.0 Lagopus
confsh
DB
completion show config
API API
Forwarding Engine Abstraction
• Various Forwarding Engines Exist Today
• OS Forwarder Layer
• DPDK
• OF-DPA
• FEA provides Common Layer for Forwarding Engine
• FEA provides
• Interface/Port Management
• Bridge Management
• Routing Table
• ARP Table
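The routing-table service FEA exposes is, at its core, longest-prefix match. A naive sketch of that lookup (real engines use tries or hardware tables, not a linear scan):

```python
import ipaddress

class Fib:
    """Minimal longest-prefix-match routing table, sketching the service an
    FEA-style abstraction exposes over any forwarding engine."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop) pairs

    def add(self, prefix: str, next_hop: str):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, addr: str):
        # Pick the matching route with the longest prefix
        dst = ipaddress.ip_address(addr)
        best = None
        for net, nh in self.routes:
            if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, nh)
        return best[1] if best else None

fib = Fib()
fib.add("0.0.0.0/0", "gw-default")
fib.add("10.0.0.0/8", "gw-10")
fib.add("10.1.0.0/16", "gw-10-1")
print(fib.lookup("10.1.2.3"))  # -> gw-10-1
```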
122Copyright©2017 NTT corp. All Rights Reserved.
Current limitation of software switch
 # of OF apps is limited
 Leverage Network OS for whitebox switch
• Opensnaproute, Linux kernel, OpenSwitch
 Hard to integrate with other management system
 OpenStack, libvirt,
 NetFlow, sFlow, BGP-extension
 OF pipeline does not cover all our requirements
 Tunnel termination
• IPsec, VxLAN,
 Control packet escalation/injection
 User-defined OAM functionality
 Heavy packet processing for OpenFlow flow entry
 Lookup, action, long pipeline
 Balance of programmability and existing network protocol support
 L2, L3 (IPv4, IPv6), GRE, VxLAN, MPLS
 Hybrid traffic control
123Copyright©2017 NTT corp. All Rights Reserved.
 Provide router-aware programmable
dataplane for network OS
 Protocol-aware pipeline and APIs, OpenFlow
 Integration with network OS
 Existing forwarding & routing protocol
support (BGP, OSPF)
 VPN framework over IP networks
 IP as a transport protocol
 VxLAN, GRE, IPsec tunnel support
 Decouple OpenFlow semantics and
wire protocol from the OpenFlow
protocol
 Provide gRPC switch control API
Next major upgrade: Lagopus SDN switch router
124Copyright©2017 NTT corp. All Rights Reserved.
Forwarding Engine Integration
[Diagram: Zebra 2.0 (BGP, OSPF, RSVP-TE, LDP over the FEA) and OpenConfigd (with configuration datastore DB) control the Lagopus dataplane via a Dataplane Manager: RIB/FIB control, interface/port and bridge management, and FIB/ARP/Stats DBs; c-plane packets are escalated via a tap IF while user traffic stays in the dataplane]
125Copyright©2017 NTT corp. All Rights Reserved.
 The new version will be available this summer
Availability?
126Copyright©2017 NTT corp. All Rights Reserved.
Hirokazu Takahashi, Tomoya Hibi, Ichikawa, Masaru Oki,
Motonori Hirano, Kiyoshi Imai, Takaya Hasegawa,
Tomohiro Nakagawa, Koichi Shigihara, Keisuke Kosuga,
Takanari Hayama, Tetsuya Mukawa, Saori Usami,
Kunihiro Ishiguro
Thanks our development team
127Copyright©2017 NTT corp. All Rights Reserved.
 Comments and collaboration are very welcome!
 Web
 https://lagopus.github.io
 Github
 Lagopus vswitch
• https://github.com/lagopus/lagopus
 Lagopus book
• http://www.lagopus.org/lagopus-book/en/html/
 Ryu with general tunnel ext
• https://github.com/lagopus/ryu-lagopus-ext
Conclusion
128Copyright©2017 NTT corp. All Rights Reserved.
Questions?
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Software Stacks to enable SDN and NFV

  • 1. 0Copyright©2017 NTT corp. All Rights Reserved. Software Stacks to enable Software-Defined Networking and Network Functions Virtualization Yoshihiro Nakajima <nakajima.yoshihiro@lab.ntt.co.jp, ynaka@lagopus.org> NTT Network Innovation Laboratories
  • 2. 1Copyright©2017 NTT corp. All Rights Reserved.  Yoshihiro Nakajima  Work for Nippon Telegraph and Telephone Corporation, R&D Division Network Innovation Laboratories  Project lead of Lagopus SDN software switch  Background • High performance computing • High performance networking and data processing About me
  • 3. 2Copyright©2017 NTT corp. All Rights Reserved.  Trends  Software-Defined Networking (SDN)  Network Functions Virtualization (NFV)  Dataplane  Lagopus: SDN/OpenFlow software switch • Overview and NFV • Trials  Controller/C-plane  O3 project  Ryu and gobgp  Zebra2  Future plan Agenda
  • 4. 3Copyright©2017 NTT corp. All Rights Reserved. Network is so stuck….  Many technologies have emerged in the system development area  Language, Debugger, Testing framework, Continuous Integration, Continuous Deployment….  Network area …  CLI with vendor format   CLI with serial or telnet   Expect script   Good for humans, but not for software
  • 5. 4Copyright©2017 NTT corp. All Rights Reserved. Trend shift in networking  Closed (Vendor lock-in)  Yearly dev cycle  Waterfall dev  Standardization  Protocol  Special purpose HW / appliance  Distributed cntrl  Custom ASIC / FPGA  Open (lock-in free)  Monthly dev cycle  Agile dev  De facto standard  API  Commodity HW / Server  Logically centralized cntrl  Merchant Chip
  • 6. 5Copyright©2017 NTT corp. All Rights Reserved. What is Software-Defined Networking? 5 Innovate services and applications in software development speed! Reference: http://opennetsummit.org/talks/ONS2012/pitt-mon-ons.pdf Decouple control plane and data plane →Free control plane out of the box (APIs: OpenFlow, P4, …) Logically centralized view →Hide and abstract complexity of networks, provide entire view of the network Programmability via abstraction layer →Enables flexible and rapid service/application development SDN conceptual model
  • 7. 6Copyright©2017 NTT corp. All Rights Reserved. Why SDN? 6 Differentiate services (Innovation) • Increase user experience • Provide unique service which is not in the market Time-to-Market(Velocity) •Not to depend on vendors’ feature roadmap •Develop necessary feature when you need Cost-efficiency •Reduce OPEX by automated workflows •Reduce CAPEX by COTS hardware
  • 8. 7Copyright©2017 NTT corp. All Rights Reserved. Open-Source Cloud Computing Open Northbound APIs Open-Source Controller Open-Source Hardware Open Southbound Protocol Open-Source Switch Software Infrastructure Layer Application Layer Business Applications Control Layer Network Services Network Services API API API v v v v Open source network stacks for SDN/NFV FBOSS
  • 9. 8Copyright©2017 NTT corp. All Rights Reserved. SDN/NFV history 2008 2009 2010 2011 2012 2013 OpenFlow development Open Networking Foundation ETSI NFV Open Daylight OpenFlow Switch Consortium Stanford University Clean Slate Program NFV SDN activity Standardization OpenFlow 2014 trema Ryu OpenDaylight OF1.1 OF1.2 OpenFlow protocols OF1.4 OF 0.8 OF 0.9 OF 1.0 NOX Lagopus
  • 10. 9Copyright©2017 NTT corp. All Rights Reserved. History of software vswitch/router and I/O library 1995 2000 2005 2010 2015 DPDK R1.0 OSS LF join 2012 BT vBRAS demo Research phase Internal product phase OSS phase netmap XDP Click BESS OVS Nicira OVN Lagopus VPP
  • 11. 10Copyright©2017 NTT corp. All Rights Reserved. Network Functions Virtualization  Replace dedicated network nodes with virtual appliances  Runs on general servers  Leverage cloud provisioning and monitoring systems for management  A virtual network function (VNF) may run on a virtual machine or a baremetal server
  • 12. 11Copyright©2017 NTT corp. All Rights Reserved. 11 Evaluate the benefits of SDN by implementing our control plane and switch
  • 13. 12Copyright©2017 NTT corp. All Rights Reserved.  High performance network I/O for all packet sizes  Especially at smaller packet sizes (< 256 bytes)  Low latency and low jitter  Network I/O & Packet processing  Isolation  Performance isolation between NFV VMs  Security-related VM-to-VM isolation from untrusted apps  Reliability, availability and serviceability (RAS) functions for long-term operation NFV requirements from 30,000 feet [diagram: Virtual machine Management and API, Virtual Switch (vSwitch), NIC, Hardware resources (cores), VMs, Nf-Vi-H, Instruction/Policing mapping and emulation, Sequential thread emulation]
  • 14. 13Copyright©2017 NTT corp. All Rights Reserved.  Still poor performance of NFV apps   Lower network I/O performance  Large processing latency and jitter  Limited deployment flexibility   SR-IOV has limitations in performance and configuration  Combination of DPDK apps on a guest VM and a DPDK-enabled vSwitch is hard to configure  Limited operational support   DPDK is good for performance, but has limited dynamic reconfiguration  Maintenance features are not realized What’s the matter in NFV
  • 15. 14Copyright©2017 NTT corp. All Rights Reserved.  Not enough performance   Packet processing speed < 1Gbps   10K flow entry add > two hours   Flow management processing is too heavy   Development & extension difficulties   Many abstraction layers cause confusion  • Interface abstraction, switch abstraction, protocol abstraction, packet abstraction…   Many packet processing codes exist for the same processing  • Userspace, Kernelspace,…  Invisible flow entries cause chaotic debugging  Existing vswitch is  @2013
  • 16. 15Copyright©2017 NTT corp. All Rights Reserved. vSwitch requirements from the user side  Run on commodity PC servers and NICs  Provide a gateway function to connect various network domains  Support packet frame types in DC, IP-VPN, MPLS and access NWs  Achieve 10Gbps wire rate with >= 1M flow rules  Low-latency packet processing  Flexible flow lookup using multiple tables  High performance flow rule setup/delete  Run in userland and decrease tight dependency on the OS kernel  Easy software upgrade and deployment  Support various management and configuration protocols.
  • 17. 16Copyright©2017 NTT corp. All Rights Reserved. Lagopus: High-performance SDN/OpenFlow Software Switch
  • 18. 17Copyright©2017 NTT corp. All Rights Reserved. Goal of Lagopus project  Provide NFV/SDN-aware switch software stack  Provide dataplane API with OpenFlow protocol and gRPC  100Gbps-capable high-performance software dataplane  DPDK extension for carrier requirements  Cloud middleware adaptation  Expand software-based packet processing to carrier networks
  • 19. 18Copyright©2017 NTT corp. All Rights Reserved.  Lagopus is a small genus of birds in the grouse subfamily, commonly known as ptarmigans. All living in tundra or cold upland areas.  Reference: http://en.wikipedia.org/wiki/Lagopus What is Lagopus (雷鳥属)? © Alpsdake 2013© Jan Frode Haugseth 2010
  • 20. 19Copyright©2017 NTT corp. All Rights Reserved.  Provide a high performance software switch on Intel CPUs  Over-100Gbps wire-rate packet processing / port  Highly-scalable flow handling  Expands the SDN idea to many network domains  Datacenter, NFV environment, mobile network  Various management / configuration interfaces Target of Lagopus switch TOR Virtual Switch Hypervisor VM VM Virtual Switch Hypervisor NFV NFV Virtual Switch Hypervisor VM VM Gateway CPE Data Center Wide-area Network Access Network Intranet
  • 21. 20Copyright©2017 NTT corp. All Rights Reserved.  Open Source High performance SDN software switch  Multicore-CPU-aware packet processing with DPDK  Supports NFV environment  Runs on Linux and FreeBSD  Best OpenFlow 1.3 compliant software switch by Ryu certification  Many protocol frame matches and actions support • Ethernet, VLAN, MPLS, PBB, IPv4, IPv6, TCP, UDP, VxLAN, GRE, GTP  Multiple-Flow table, Group table, meter table  1M flow entries handling (4K flow mod/sec)  Over-40Gbps-class packet processing (20MPPS)  Open source under Apache v2 license  http://lagopus.github.io/ What is Lagopus SDN software switch
  • 22. 21Copyright©2017 NTT corp. All Rights Reserved. How many packets to be processed for 10Gbps [chart: # of packets per second vs packet size (byte)] Short packet, 64Byte: 14.88 MPPS, 67.2 ns • 2GHz: 134 clocks • 3GHz: 201 clocks. 1KByte packet: 1.2MPPS, 835 ns • 2GHz: 1670 clocks • 3GHz: 2505 clocks. L1 cache access: 4 clocks, L2 cache access: 12 clocks, L3 cache access: 44 clocks, Main memory: 100 clocks. Max PPS between cores: 20MPPS
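The per-packet budget arithmetic on this slide can be reproduced in a few lines of C. This is a sketch: the 20 bytes added to each frame are the Ethernet preamble, start-of-frame delimiter and inter-frame gap that occupy the wire alongside the frame itself.

```c
#include <assert.h>

/* Line-rate budget per packet on 10GbE: preamble (7B) + SFD (1B) +
 * inter-frame gap (12B) = 20B of wire overhead per frame. */
double mpps(double frame_bytes) {
    const double line_rate_bps = 10e9;
    return line_rate_bps / ((frame_bytes + 20.0) * 8.0) / 1e6;
}

/* Nanoseconds available to process one packet at line rate. */
double ns_per_packet(double frame_bytes) {
    return 1000.0 / mpps(frame_bytes);
}

/* CPU cycle budget per packet at a given clock frequency. */
double cycles_per_packet(double frame_bytes, double ghz) {
    return ns_per_packet(frame_bytes) * ghz;
}
```

For 64-byte frames this yields ~14.88 Mpps and a 67.2 ns budget (~134 cycles at 2 GHz); for 1 KB frames ~1.2 Mpps and 835 ns, matching the slide's figures.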
  • 23. 22Copyright©2017 NTT corp. All Rights Reserved. PC architecture and limitation NIC CPU CPUMemory Memory NIC NICNIC QPI PCI-Exp PCI-Exp Reference: supermicro X9DAi
  • 24. 23Copyright©2017 NTT corp. All Rights Reserved. # of CPU cycles for typical packet processing [chart: required CPU cycles for L2 forwarding, IP routing, L2-L4 classification, TCP termination, simple OF processing and DPI, against the 10Gbps and 1Gbps per-packet budgets]
  • 25. 24Copyright©2017 NTT corp. All Rights Reserved.  Simple is better for everything  Packet processing  Protocol handling  Straight forward approach  Full scratch (No use of existing vSwitch code)  User land packet processing as much as possible, keep kernel code small  Kernel module update is hard for operation  Every component & algorithm can be replaced Approach for vSwitch development
  • 26. 25Copyright©2017 NTT corp. All Rights Reserved. What is Lagopus vSwitch [architecture: switch configuration datastore (config/stats API, SW DSL); non-DPDK NIC and DPDK NIC/vNIC over DPDK libs/PMD driver; Lagopus soft dataplane with flow lookup, flow cache, OpenFlow pipeline, queue/policer, flow tables, group table and meter table; switch HAL; OpenFlow 1.3 agent; JSON IF; SNMP; CLI; OS NW stack; SDN switch agent] • Full OpenFlow 1.3 support • Controller-less basic L2 and L3 support with action_normal SDN-aware management API • JSON-based control • Ansible support DPDK-enabled OpenFlow-aware software dataplane • Over-10-Gbps performance • Low latency packet processing • High performance multi-layer flow lookup • Cuckoo hash for flow cache Switch configuration datastore • Pub/sub mechanism • Switch config DSL • JSON-based control Various I/O support • DPDK-enabled NIC • Standard NIC with raw socket • tap Virtualization support • QEMU/KVM • vhost-user • DPDK-enabled VNF
  • 27. 26Copyright©2017 NTT corp. All Rights Reserved. General packet processing on UNIX NIC skb_buf Ethernet Driver API Socket API vswitch packet buffer Data plane User-space implementation (Event-triggered) 1. Interrupt & DMA 2. system call (read) User space Kernel space Driver 4. DMA 3. system call (write) Kernel-space implementation (Event-triggered) NIC skb_buf Ethernet Driver API Socket API vswitch packet buffer 1. Interrupt & DMA vswitch Data plane agent agent 2. DMA Context switches Massive interrupts Many memory copies / reads
  • 28. 27Copyright©2017 NTT corp. All Rights Reserved.  x86 architecture-optimized data-plane library and NIC drivers  Memory structure-aware queue and buffer management  Packet flow classification  Polling mode-based NIC driver  Low-overhead & high-speed runtime optimized for data-plane processing  Abstraction layer for heterogeneous server environments  BSD license  Data Plane Development Kit (DPDK) NIC Ethernet Driver API Socket API DPDK apps packet buffer 1. DMA Write 2. DMA READ DPDK dataplane DPDK apps
  • 29. 28Copyright©2017 NTT corp. All Rights Reserved. What DPDK helps [diagram: kernel-space vs user-space packet processing. Kernel-space driver path: NIC → DMA → descriptors and socket buffers (skb's) in kernel memory, with interrupts, system calls and CSR accesses, then a packet data copy up to the application. User-space driver with zero copy: DPDK PMD over a UIO driver, with descriptor rings and configuration mapped from the kernel into user space. Benefit #1: removes the data copy from kernel to user space. Benefit #2: no interrupts. Benefit #3: the network stack can be streamlined and optimized]
  • 30. 29Copyright©2017 NTT corp. All Rights Reserved. Processing bypass for speed NIC skb_buf Ethernet Driver API Socket API vswitch packet buffer packet buffer memory Standard linux application 1. Interrupt & DMA 2. system call (read) User space Kernel space Driver 4. DMA 3. system call (write) NIC Ethernet Driver API User-mode I/O & HAL vswitch packet buffer Application with intel DPDK 1. DMA Write 2. DMA READ DPDK Library Polling-base packet handling Event-base packet handling
  • 31. 30Copyright©2017 NTT corp. All Rights Reserved. Implementation strategy for vSwitch  Massive RX interrupts handling for NIC device => Polling-based packet receiving  Heavy overhead of task switch => Thread assignment (one thread/one physical CPU)  Lower performance of PCI-Express I/O and memory bandwidth compared with CPU => Reduction of # of access in I/O and memory  Shared data access is bottleneck between threads => Lockless-queue, RCU, batch processing
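The lockless-queue item above can be illustrated with a minimal single-producer/single-consumer ring using C11 atomics — a simplified sketch in the spirit of DPDK's rte_ring, not the DPDK API itself. Free-running head/tail counters and a power-of-two size let masking replace modulo.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal SPSC lockless ring (illustrative sketch).
 * RING_SIZE must be a power of two. */
#define RING_SIZE 256
#define RING_MASK (RING_SIZE - 1)

struct spsc_ring {
    void *slot[RING_SIZE];
    _Atomic size_t head; /* written only by the producer */
    _Atomic size_t tail; /* written only by the consumer */
};

bool ring_enqueue(struct spsc_ring *r, void *pkt) {
    size_t h = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return false; /* full */
    r->slot[h & RING_MASK] = pkt;
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

bool ring_dequeue(struct spsc_ring *r, void **pkt) {
    size_t t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (t == h)
        return false; /* empty */
    *pkt = r->slot[t & RING_MASK];
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}
```

With one producer thread and one consumer thread, the acquire/release pairs make the slot write visible before the updated counter, so no lock is needed between the I/O and worker threads.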
  • 32. 31Copyright©2017 NTT corp. All Rights Reserved. Basic packet processing Network I/O RX packet Frame processing Flow lookup & Action QoS / Queue Network I/O TX Packet classification & packet distribution to buffers Packet parsing Lookup, Header rewrite Encap/decap Policer, Shaper Marking packet
  • 33. 32Copyright©2017 NTT corp. All Rights Reserved. What we did for performance  Network I/O RX packet Frame processing Flow lookup & Action QoS / Queue Network I/O TX packet • Delayed packet frame evaluation • Delayed action (processing) evaluation • Packet batching to improve CPU $ efficiency • Delayed flow stats evaluation • Smart flow classification • Thread assignment optimization • Parallel flow lookup • Lookup tree compaction • High-performance lookup algorithm for OpenFlow (multi-layer, mask, priority-aware flow lookup) • Flow $ mechanism • Batch size tuning
  • 34. 33Copyright©2017 NTT corp. All Rights Reserved.  Exploit many core CPUs  Reduce data copy & move (reference access)  Simple packet classifier for parallel processing in I/O RX  Decouple I/O processing and flow processing  Improve D-cache efficiency  Explicit thread assign to CPU core Packet processing using multi core CPUs NIC 1 RX NIC 2 RX I/O RX CPU0 I/O RX CPU1 NIC 1 TX NIC 2 TX I/O TX CPU6 I/O TX CPU7 Flow lookup packet processing CPU2 Flow lookup packet processing CPU4 Flow lookup packet processing CPU3 Flow lookup packet processing CPU5 NIC 3 RX NIC 4 RX NIC 3 TX NIC 4 TX NIC RX buffer Ring buffer Ring buffer NIC TX buffer
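The simple packet classifier feeding the worker cores can be sketched as a 5-tuple hash spread over workers, so every packet of a flow lands on the same core and per-flow state stays core-local. The struct, the FNV-1a hash and the function names here are illustrative, not Lagopus code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flow 5-tuple used as the classification key. (A production version
 * would hash the packed header fields, not a padded struct.) */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* FNV-1a, a small well-known hash; a real dataplane might use CRC32-C. */
uint32_t fnv1a(const uint8_t *p, size_t len) {
    uint32_t h = 2166136261u;
    while (len--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Map a flow deterministically onto one of n_workers worker cores. */
unsigned classify_worker(const struct five_tuple *ft, unsigned n_workers) {
    return fnv1a((const uint8_t *)ft, sizeof(*ft)) % n_workers;
}
```

Because the mapping depends only on the 5-tuple, packets of the same flow never race between workers, which is what lets the lookup threads run without sharing flow state.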
  • 35. 34Copyright©2017 NTT corp. All Rights Reserved.  OpenFlow semantics includes  Match • Protocol headers – Port #, Ethernet, VLAN, PBB, MAC-in-MAC, MPLS, IPv4, IPv6, ARP, ICMP, TCP, UDP… – Mask-enabled • Priority  Action • Output port • Packet-in/Packet-out • Header rewrite, Header push, Header pop Why OpenFlow match is so hard
  • 36. 35Copyright©2017 NTT corp. All Rights Reserved. Many header match eth_dst eth_src eth_type eth_dst eth_src 0x0800 (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst eth_dst eth_src 0x0800 (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl 17 (ip_sum) ip_src ip_dst udp_src udp_dst eth_dst eth_src 0x0800 (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl 1 (ip_sum) ip_src ip_dst icmp_type icmp_code eth_dst eth_src 0x0806 (ar_hrd, ar_pro) (ar_hln, ar_pln) ar_op ar_sha ar_spa ar_tha ar_tpa eth_dst eth_src 0x8847 eth_dst eth_src 0x8100 Ethernet IPv4 UDP(v4) ICMP ARP MPLS VLAN mpls_label (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst vlan_pcp (vlan_cfi) vlan_vid 0x0800 mpls_exp mpls_bos mpls_ttl And GRE, L2TP, OSPF… Same as TCP, SCTP L3 header continues Includes IP and other protocols
  • 37. 36Copyright©2017 NTT corp. All Rights Reserved. Linear search (first implementation) eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst Is eth_type 0x0800? Is ip_p 17? Is udp_dst 53? 0x0800? 6? 53? packet entry 0 entry 1 Is eth_type Is ip_p Is tcp_dst 0x0800? 6? 80? entry 2 Is eth_type Is ip_p Is tcp_dst
  • 38. 37Copyright©2017 NTT corp. All Rights Reserved. 0: Linear search (first implementation) eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst Is eth_type 0x0800? Is ip_p 17? Is udp_dst 53? 0x0800? 6? 53? packet entry 0 entry 1 Is eth_type Is ip_p Is tcp_dst 0x0800? 6? 80? entry 2 Is eth_type Is ip_p Is tcp_dst
  • 39. 38Copyright©2017 NTT corp. All Rights Reserved.  Simplify comparison routine  Flow entry is composed by mask and value  Still linear search 1: bitmap comparison with bitmask eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst eth_dst eth_src 0xffff (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl 0xff (ip_sum) ip_src ip_dst tcp_src 0xffff eth_dst eth_src 0x0800 (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl 6 (ip_sum) ip_src ip_dst tcp_src 80 & == packet mask flow entry
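The mask-and-value matching above, combined with the linear search of the previous slides, can be sketched in C as follows. This is illustrative code, not the Lagopus implementation; entries are assumed pre-sorted by priority, and a wildcarded field simply has mask bytes of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A flow entry stores a value and a mask over the header bytes.
 * A packet matches when (header & mask) == value for every byte. */
struct flow_entry {
    const uint8_t *value;
    const uint8_t *mask;
    size_t len;
};

bool flow_match(const struct flow_entry *e, const uint8_t *hdr) {
    for (size_t i = 0; i < e->len; i++)
        if ((hdr[i] & e->mask[i]) != e->value[i])
            return false;
    return true;
}

/* Linear search over the table, highest priority first. */
int flow_lookup(const struct flow_entry *tbl, size_t n, const uint8_t *hdr) {
    for (size_t i = 0; i < n; i++)
        if (flow_match(&tbl[i], hdr))
            return (int)i;
    return -1; /* table miss */
}
```

This is exactly why linear search does not scale: every packet touches every entry until a match, which motivates the search trees of the next slides.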
  • 40. 39Copyright©2017 NTT corp. All Rights Reserved.  Compose a search tree on dedicated fields to narrow the lookup space  Then compare each field 2: Fixed-type search tree eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst packet eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst array eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst hash table eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst array Linear search
  • 41. 40Copyright©2017 NTT corp. All Rights Reserved.  Scan all flow entries and compose search tree with arrangement in order from the most frequently searched field 3: Search tree with most searched field first eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst packet (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src tcp_src tcp_dst hash table eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst hash table eth_dst eth_src eth_type (ip_v, ip_hl) ip_tos (ip_len) (ip_id, ip_off) ip_ttl ip_p (ip_sum) ip_src ip_dst tcp_src tcp_dst hash table eth_dst eth_src eth_type ip_dst Linear search
  • 42. 41Copyright©2017 NTT corp. All Rights Reserved.  Reduce # of locks on the flow lookup table  Frequent locks are required • Switch OpenFlow agent and counter retrieval for SNMP • Packet processing Packet batching Input packet Lookup table Input packet Lookup table Lock Unlock Lock Unlock ●Naïve implementation ●Packet batching implementation A lock is required for each packet Locks can be reduced due to packet batching
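The saving can be made concrete with a stub lock counter: taking the table lock once per batch instead of once per packet divides acquisitions by the batch size. The structure and function names here are illustrative only; a real dataplane would use a mutex or RCU rather than a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Stub "lock" that only counts acquisitions, to make the saving visible. */
struct table { unsigned lock_acquisitions; unsigned lookups; };

void table_lock(struct table *t)       { t->lock_acquisitions++; }
void table_unlock(struct table *t)     { (void)t; }
void table_lookup_one(struct table *t) { t->lookups++; }

/* Naïve: lock around every single packet. */
void process_per_packet(struct table *t, size_t n_pkts) {
    for (size_t i = 0; i < n_pkts; i++) {
        table_lock(t);
        table_lookup_one(t);
        table_unlock(t);
    }
}

/* Batched: one lock covers a whole batch of packets. */
void process_batched(struct table *t, size_t n_pkts, size_t batch) {
    for (size_t i = 0; i < n_pkts; i += batch) {
        size_t end = i + batch > n_pkts ? n_pkts : i + batch;
        table_lock(t);
        for (size_t j = i; j < end; j++)
            table_lookup_one(t);
        table_unlock(t);
    }
}
```

For 1024 packets and a batch of 32, the batched path takes 32 locks instead of 1024 while performing the same number of lookups.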
  • 43. 42Copyright©2017 NTT corp. All Rights Reserved.  Reduce # of flow lookups across multiple tables  Generate a composed flow hash function that covers the flow tables  Introduce a flow cache for each CPU core Bypass pipeline with flow $ ●Naïve implementation table1 table2 table3 Input packet table1 table2 table3 Input packet Flow cache 1. New flow 2. Success flow Output packet Multiple flow $ generation & mngmt Write flow $ ●Lagopus implementation Output packet Packet
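A direct-mapped flow cache in front of the pipeline can be sketched as below. This is a hypothetical simplification — the real Lagopus flow cache uses cuckoo hashing and one cache instance per core — but it shows the fast path: a hit bypasses the multi-table walk, a miss runs it and fills the cache.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 1024

struct cache_slot { uint64_t key; int action; int valid; };
struct flow_cache { struct cache_slot slot[CACHE_SLOTS]; unsigned hits, misses; };

/* Fibonacci-style multiplicative hash over the flow key. */
uint64_t key_hash(uint64_t key) { return key * 0x9e3779b97f4a7c15ull; }

/* Stand-in for the full multi-table OpenFlow pipeline walk. */
int full_pipeline_lookup(uint64_t key) {
    return (int)(key & 0xff);
}

int cached_lookup(struct flow_cache *c, uint64_t key) {
    struct cache_slot *s = &c->slot[key_hash(key) % CACHE_SLOTS];
    if (s->valid && s->key == key) { c->hits++; return s->action; }
    c->misses++;                       /* new flow: walk the pipeline once */
    s->key = key;
    s->action = full_pipeline_lookup(key);
    s->valid = 1;
    return s->action;                  /* subsequent packets hit the cache */
}
```

Only the first packet of a flow pays the multi-table cost; every later packet of the same flow resolves in one hash probe.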
  • 44. 43Copyright©2017 NTT corp. All Rights Reserved. Best OpenFlow 1.3 compliant switch — passed test scenarios by type (mandatory, optional):
  Type: Action | Set field | Match | Group | Meter | Total
  # of test scenarios: 56 (3, 53) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 36 (0, 36) | 991 (114, 877)
  Lagopus 2014.11.09: 59 (3, 56) | 161 (0, 161) | 714 (108, 606) | 15 (3, 12) | 34 (0, 34) | 980 (114, 866)
  OVS (kernel) 2014.08.08: 34 (3, 31) | 96 (0, 96) | 534 (108, 426) | 6 (3, 3) | 0 (0, 0) | 670 (114, 556)
  OVS (netdev) 2014.11.05: 34 (3, 31) | 102 (0, 102) | 467 (93, 374) | 8 (3, 5) | 0 (0, 0) | 611 (99, 556)
  IVS 2015.02.11: 17 (3, 14) | 46 (0, 46) | 323 (108, 229) | 3 (0, 2) | 0 (0, 0) | 402 (111, 291)
  ofswitch 2015.01.08: 50 (3, 47) | 100 (0, 100) | 708 (108, 600) | 15 (3, 12) | 30 (0, 30) | 962 (114, 848)
  LINC 2015.01.29: 24 (3, 21) | 68 (0, 68) | 428 (108, 320) | 3 (3, 0) | 4 (0, 4) | 523 (114, 409)
  Trema 2014.11.28: 50 (3, 47) | 159 (0, 159) | 708 (108, 600) | 15 (3, 12) | 34 (0, 34) | 966 (114, 854)
  • 45. 44Copyright©2017 NTT corp. All Rights Reserved.  Summary  Throughput: 10Gbps wire rate  Flow rules: 1M flow rules  Evaluation models  WAN-DC gateway • MPLS-VLAN mapping  L2 switch • MAC address switching Performance Evaluation 5Copyright©2015 NTTcorp. All Rights Reserved. Typical carrier usecase Usecase: Cloud-VPN gateway Tomorrow: • Automatic connection setup via North Bound APIs • SDN controller maintains mapping between tenant logical network and VPN • Routes are advertised via eBGP, no need to configure ASBRs on provider side Data center Network MPLS-VPN MPLS Inter-AS option B SDN controller eBGP OF API From ONS2014 NTTCOM Ito-san presentation VRF-enabled router with MPLS
  • 46. 45Copyright©2017 NTT corp. All Rights Reserved.  Evaluation setup  Server spec.  CPU: Dual Intel Xeon E5-2660 • 8 core(16 thread), 20M Cache, 2.2 GHz, 8.00GT/s QPI, Sandy bridge  Memory: DDR3-1600 ECC 64GB • Quad-channel 8x8GB  Chipset: Intel C602  NIC: Intel Ethernet Converged Network Adapter X520-DA2 • Intel 82599ES, PCIe v2.0 Performance Evaluation Server Lagopus Flow table tester Flows Throughput (bps/pps/%) Flow rules Packet size Flow cache (on/off)
  • 47. 46Copyright©2017 NTT corp. All Rights Reserved. WAN-DC Gateway [charts: Throughput (Gbps) vs packet size (byte), 1 flow, flow-cache on, for 10 / 100 / 1k / 10k / 100k / 1M flow rules; Throughput (Gbps) vs # of flows at 1518-byte packets, for 10k / 100k / 1M flow rules]
  • 48. 47Copyright©2017 NTT corp. All Rights Reserved.  10000 IP subnet entries test L3 forwarding performance with 40G NIC and CPU E5-2667v3 3.20GHz x2 Memory DDR4 64GB NIC Intel X710 x2 OS Ubuntu 14.04LTS DPDK 2.2 Lagopus 0.2.4 with default options
  • 49. 48Copyright©2017 NTT corp. All Rights Reserved.  Full OpenFlow 1.3 support  Limited OpenFlow 1.5 support (Flowmod-related instructions)  General tunnel encap/decap extension (EXT-382 and EXT-566) support • GRE, VxLAN, GTP, Ethernet, IPv4, IPv6 • Updated draft will be implemented soon  Flexible & high-performance dataplane  Hybrid-mode support • ACTION_NORMAL (L2, L3)  Various network I/O support • DPDK NIC, vNIC (vhost-user with virtio-net), non-DPDK NIC (raw-socket)  Leverage the network stack in the OS kernel for the OpenFlow switch • ARP, ICMP and routing control packets are escalated to the network stack (with tap IF)  Queue, Meter table  Linux, FreeBSD, NetBSD support  Virtualization support for NFV  DPDK-enabled VNF on DPDK-enabled vSwitch  QEMU/KVM support through virsh Lagopus version 0.2.10
  • 50. 49Copyright©2017 NTT corp. All Rights Reserved. Hands-on and seminar
  • 51. 50Copyright©2017 NTT corp. All Rights Reserved. Collaboration with Lagopus Business Research institutes and networks Software switch collaboration White box switch collaboration Yokoyama Laboratory
  • 52. 51Copyright©2017 NTT corp. All Rights Reserved. High-performance vNIC framework for hypervisor-based NFV with userspace vSwitch • To provide novel components to enable high-performance NFV with general purpose hardware • To provide a high-performance vNIC with operation-friendly features
  • 53. 52Copyright©2017 NTT corp. All Rights Reserved. Issues on NFV middleware
  • 54. 53Copyright©2017 NTT corp. All Rights Reserved. Performance bottleneck in NFV with HV domain User space on host Guest VM Kernel space Kernel space on host bridge NIC driver TAP driver QEMU KVM driver TAP client NIC HW emulator User space e1000 NIC driver NW stack Legacy NW apps NIC Packet buffer interrupt copy copy register access interrupt register access copy Pkt recv / send cause VM transition HW emulation needs CPU cycle & VM transition Privileged register accesses for vNIC cause VM transition System call cause context switch on guest VM VM transition: 800 CPU cycles
  • 55. 54Copyright©2017 NTT corp. All Rights Reserved.  Use para-virtualization NIC framework  No full-virtualization (emulation-based)  Global-shared memory-based packet exchange  Reduce memory copy  User-space-based packet data exchange  No kernel-userspace packet data exchange vNIC strategy for performance & RAS
  • 56. 55Copyright©2017 NTT corp. All Rights Reserved.  DPDK apps or legacy apps on a guest VM + userspace DPDK vSwitch  Connected by a shared memory-based vNIC  Reduce OS kernel implementation Target NFV architecture with hypervisor [diagram: Virtual machine Management and API, Virtual Switch (vSwitch), NIC, Hardware resources (cores), VMs, Nf-Vi-H, Instruction/Policing mapping and emulation, Sequential thread emulation] Run in userspace to avoid VM transition and context switch Memory-based packet transfer
  • 57. 56Copyright©2017 NTT corp. All Rights Reserved. Existing vNIC for u-vSW and guest VM (1/2) DPDK e1000 PMD with QEMU's e1000 FV and vSwitch connected by tap DPDK virtio-net PV PMD with QEMU virtio-net framework and vSwitch connected by tap DPDK virtio-net PV PMD with vhost-net framework and vSwitch connected by tap User space on host Kernel space on host DPDK ETHDEV / PMD UIO driver TAP driver QEMU KVM driver TAP client NIC HW emulator DPDK-enabled NIC interrupt copy copy DPDK-enabled vSwitch DPDK ETHDEV / pcap PMD copy Software dataplane DMA Guest VM Kernel space User space DPDK NW apps UIO driver DPDK ETHDEV/ e1000 PMD register access Packet buffer register access User space on host Kernel space on host DPDK ETHDEV / PMD UIO driver TAP driver DPDK-enabled NIC DPDK-enabled vSwitch DPDK ETHDEV / pcap PMD copy Software dataplane DMA Guest VM Kernel space KVM driver User space vhost-net DPDK NW apps TAP client ioeventfd register access copy copy DPDK ETHDEV/ virtio-net PMD UIO Packet buffer virtio ring User space on host Kernel space on host DPDK ETHDEV / PMD UIO driver TAP driver DPDK-enabled NIC copy DPDK-enabled vSwitch DPDK ETHDEV / pcap PMD copy Software dataplane DMA Guest VM Kernel space QEMU KVM driver TAP client Virtio-net device User space UIO copy DPDK NW apps register access register access DPDK ETHDEV/ virtio-net PMD Packet buffer virtio ring Pros: legacy and DPDK support, opposite status detection Cons: bad performance, many VM transitions, context switch Pros: legacy and DPDK support, opposite status detection Cons: bad performance, many VM transitions, context switch Pros: legacy and DPDK support, opposite status detection Cons: bad performance, many VM transitions, context switch
  • 58. 57Copyright©2017 NTT corp. All Rights Reserved. User space on host Kernel space on host DPDK ETHDEV / PMD UIO driver DPDK-enabled NIC DPDK-enabled vSwitch Software dataplane Guest VM Kernel space QEMU KVM driver IVSHMEM device User space Network apps DPDK RING API Hugepage DPDK RING API IVSHMEM driver Packet buffer Packet buffer DMA mma p mma p Packet buffer mma p Ring Ring Ring User space on host Kernel space on host DPDK ETHDEV / PMD UIO driver DPDK-enabled NIC DPDK-enabled vSwitch DPDK vhost-user API Software dataplane DMA Guest VM Kernel space User space UIO driver copy DPDK NW apps DPDK ETHDEV/ virtio-net PMD KVM driver Packet buffer virtio ring vhost-user backend QEMU virtio-net device Existing vNIC for u-vSW and guest VM (2/2) DPDK ring by QEMU IVSHMEM extension and vSwitch connected by shared memory DPDK virtio-net PV PMD with QEMU virtio-net framework and vSwitch with DPDK vhost-user API to connect to virtio- net PMD. Pros: Best performance Cons: only DPDK support, static configuration, no RAS Pros: good performance, both support of legacy and DPDK Cons: no status tracking of opposite device
  • 59. 58Copyright©2017 NTT corp. All Rights Reserved. High performance vNIC framework for NFV This patch has already been merged into DPDK
  • 60. 59Copyright©2017 NTT corp. All Rights Reserved.  High-Performance  10-Gbps network I/O throughput  No virtualization transition between a guest VM and u-vSW  Simultaneous support of DPDK apps and DPDK u-vSW  Functionality for operation  Isolation between NFV VM and u-vSW  Flexible service maintenance support  Link status notification on both sides  Virtualization middleware support  Support for open source hypervisors (KVM)  DPDK app and legacy app support  No OS (kernel) modification on a guest VM vNIC requirements for NFV with u-vSW
  • 61. 60Copyright©2017 NTT corp. All Rights Reserved.  vNIC as an extension of the virtio-net framework  Para-virtualization network interface  Packet communication via global shared memory  One packet copy to ensure VM-to-VM isolation  Control messages by inter-process communication between pseudo devices vNIC design User space on host Kernel space on host DPDK-enabled vSwitch Software dataplane Guest VM Kernel space User space virtio-net-compatible device DPDK NW apps DPDK ETHDEV/ virtio-net PMD Global shared memory DPDK ETHDEV / PMD pseudo PMD-enabled device IPC-based control communication
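The shared-memory packet path above can be pictured as a single-producer/single-consumer ring, the shape of the virtqueue both ends poll. The toy model below is the editor's sketch (not Lagopus code): it shows the head/tail discipline and the single copy per packet that preserves VM-to-VM isolation.

```python
class Ring:
    """Toy single-producer/single-consumer ring in the style of a virtqueue.

    The producer only writes `head`, the consumer only writes `tail`,
    so no locks are needed -- both sides just poll, as DPDK PMDs do.
    """

    def __init__(self, size=8):
        self.buf = [None] * size
        self.size = size
        self.head = 0  # advanced by the producer only
        self.tail = 0  # advanced by the consumer only

    def put(self, pkt) -> bool:
        if self.head - self.tail == self.size:
            return False  # ring full: caller retries (polling model)
        # The one copy: the consumer never sees the producer's buffer.
        self.buf[self.head % self.size] = bytes(pkt)
        self.head += 1
        return True

    def get(self):
        if self.tail == self.head:
            return None  # ring empty
        pkt = self.buf[self.tail % self.size]
        self.tail += 1
        return pkt
```

The real vNIC adds event notification over a UNIX domain socket so either side can detect the other going away, which this sketch omits.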
  • 62. 61Copyright©2017 NTT corp. All Rights Reserved.  Virtq-PMD driver: 4K LOC modification  Virtio-net device with DPDK extension  DPDK API and PV-based NIC (virtio-net) API  Global shared memory-based packet transmission on hugeTLB  UNIX domain socket based control message • Event notification (link-status, finalization) • Polling-based check of the opposite device's status  QEMU: 1K LOC modification  virtio-net-ipc device on shared memory space  Shared memory-based device mapping vNIC implementation User space on host Kernel space on host DPDK-enabled vSwitch Software dataplane Guest VM Kernel space User space virtio-net-ipc device DPDK NW apps DPDK ETHDEV/ virtio-net PMD Global shared memory/HugeTLB DPDK ETHDEV / virtq-PMD Unix domain socket kick/queue addr virtio-net device Event notification
  • 63. 62Copyright©2017 NTT corp. All Rights Reserved. Performance
  • 64. 63Copyright©2017 NTT corp. All Rights Reserved. vhost application on host Measurement point virtio-net PMD virtq PMD testpmd on Guest VM Null PMD Null PMD Vhost app testpmd on host Null PMD Bare-metal configuration virtq PMD Measurement point Testpmd on host Measurement point pcap PMD testpmd on Guest VM Null PMD Null PMD virtq PMD virtqueue TAP driver Virtio-net driver Kernel-driver testpmd on host virtq PMD Measurement point virtio-net PMD virtqueue testpmd on Guest VM Null PMD Null PMD virtq-pmd  micro benchmarking tool: Testpmd apps  Polling-based DPDK bridge app that reads data from a NIC and writes data to another NIC in both directions.  null-PMD: a DPDK-enabled dummy PMD to allow packet generation from memory buffer and packet discard to memory buffer Performance benchmark
  • 65. 64Copyright©2017 NTT corp. All Rights Reserved. Performance evaluation  Virtq PMD achieved great performance  62.45 Gbps (7.36 MPPS) unidirectional throughput  122.90 Gbps (14.72 MPPS) bidirectional throughput  5.7 times faster than the Linux driver at 64B, 2.8 times faster at 1500B  Virtq PMD achieved better performance than the vhost app for large packets [charts: MPPS and Gbps vs. packet size (64–1536B) for virtq PMD, vhost app, and Linux kernel driver, each with on/off variants]
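The MPPS and Gbps figures on this slide are related by frame size; a small helper (editor's sketch, not part of the benchmark) makes the arithmetic explicit, including the well-known 14.88 Mpps line rate of 10GbE at 64B frames once the 20B of preamble and inter-frame gap are counted.

```python
def throughput_gbps(mpps: float, frame_bytes: int) -> float:
    """Payload throughput in Gbps for a given frame size (no L1 overhead)."""
    return mpps * 1e6 * frame_bytes * 8 / 1e9

def line_rate_mpps(gbps: float, frame_bytes: int) -> float:
    """Max packet rate on the wire, counting 20B preamble + inter-frame gap."""
    return gbps * 1e9 / ((frame_bytes + 20) * 8) / 1e6

# 10GbE at 64B frames tops out near the well-known 14.88 Mpps
print(round(line_rate_mpps(10, 64), 2))  # → 14.88
```

For example, 7.36 MPPS at roughly 1060B payload per frame works out to about 62.4 Gbps, consistent with the unidirectional figure above.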
  • 66. 65Copyright©2017 NTT corp. All Rights Reserved. Container adaptation
  • 67. 66Copyright©2017 NTT corp. All Rights Reserved. Vhost-user for container  Vhost-user compatible PMD for containers  Virtio-net-based backend  Shared-memory-based packet data exchange  Event trigger by shared file  27.27 Gbps throughput
  • 68. 67Copyright©2017 NTT corp. All Rights Reserved.  Packet flow  pktgen -> physical -> vswitch -> Container (L2Fwd) -> vswitch -> physical -> pktgen Performance Lagopus or docker0 Server Container L2Fwd or Linux Bridge Container pktgen-dpdk OS: Ubuntu 16.04.1 CPU: Xeon E5-2697 v2 @ 2.70GHz Mem: 64GB
  • 69. 68Copyright©2017 NTT corp. All Rights Reserved. Performance
  • 70. 69Copyright©2017 NTT corp. All Rights Reserved. SDN IX @ Interop Tokyo 2015 ShowNet Interop Tokyo is the biggest Internet-related technology show in Japan. This trial was collaboration with NECOMA project (NAIST & University of Tokyo)
  • 71. 70Copyright©2017 NTT corp. All Rights Reserved.  IX (Internet eXchange)  Packet exchange point between ISPs and DC-SPs  Border routers of ISPs exchange route information  Issues  Enhance automation in provisioning and configuration  DDoS attack is one of the most critical issues • ISPs want to reduce DDoS-related traffic at the origin • DDoS traffic occupies link bandwidth Motivation of SDN-IX IX ISP-C ISP-A ISP-D ISP-B SW SW SW SW ISP-E ISP-F IX ISP-C ISP-A ISP-D ISP-B SW SW SW SW ISP-E ISP-F
  • 72. 71Copyright©2017 NTT corp. All Rights Reserved. What is SDN IX?  Next generation IX with SDN technology  Web portal-based path provisioning between ISPs • Inter-AS L2 connectivity – VLAN-based path provisioning – Private peer provisioning  Protect network from DDoS attack • On-demand 5-tuple-based packet filtering  SDN IX controller and distributed SDN/OpenFlow IX core switch Developed by NECOMA project (NAIST and University of Tokyo) ISP-C ISP-A ISP-D ISP-B ISP-E ISP-F ISP-C ISP-A ISP-D ISP-B SW SW SW SW ISP-E ISP-F
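The on-demand filtering above matches on the classic 5-tuple. A minimal sketch of the match semantics (editor's illustration, not the SDN-IX controller code; IPs are documentation addresses) where a `None` field acts as a wildcard, as in an OpenFlow match:

```python
from typing import NamedTuple, Optional

class FiveTuple(NamedTuple):
    src_ip: Optional[str]
    dst_ip: Optional[str]
    proto: Optional[int]      # IP protocol number (17 = UDP)
    src_port: Optional[int]
    dst_port: Optional[int]

def matches(rule: "FiveTuple", pkt: "FiveTuple") -> bool:
    """A rule field of None is a wildcard; anything else must match exactly."""
    return all(r is None or r == p for r, p in zip(rule, pkt))

# Block DNS-amplification traffic from one source toward the victim
block = FiveTuple("203.0.113.7", "198.51.100.1", 17, 53, None)
attack = FiveTuple("203.0.113.7", "198.51.100.1", 17, 53, 4321)
print(matches(block, attack))  # → True
```

In the real deployment such a rule would be pushed to the Lagopus core switches as an OpenFlow flow entry with a drop action.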
  • 73. 72Copyright©2017 NTT corp. All Rights Reserved.  Two Lagopus (soft switch) are deployed for SDN-IX core switch  Multiple 10Gbps links  Dual Xeon E5 8core CPUs Lagopus @ ShowNet 2015
  • 74. 73Copyright©2017 NTT corp. All Rights Reserved. Lagopus @ ShowNet rack
  • 75. 74Copyright©2017 NTT corp. All Rights Reserved. Connectivity between AS qfx10k ne5kx8 AS290 AS131154 DIX-IE JPIX KDDI CRS-4 10G-LR lagopus-1 (DPID:2) pf5240-1 (DPID:1) ax.note JGNX lagopus-2 (DPID:4) pf5240-2 (DPID:3) xg-89:0.1 (port 4) xg-83:0.0 (port 1) xg-89:00.0 (port 3) xg-83:00.1 (port 2) xg-83:0.0 (port 1) xg-83:0.1 (port 2) xg-89:0.0 (port 3) xg-1-0-49 (port 49) xg-1-0-51 (port 51) xg-1-0-52 (port 52) xg-1-0-50 (port 50) xg-1-0-49 (port 49) xg-1-0-50 (port 50) xg-1-0-51 (port 51) 799, 1600, 1060, 810, 910, 920 (temporarily) 2, 3000 ??? 100 ??? Otemachi Makuhari (Venue)
  • 76. 75Copyright©2017 NTT corp. All Rights Reserved.  Average 2Gbps throughput  No packet drop  No reboot & no trouble for 1 week during Interop Tokyo  Sometimes 10Gbps burst traffic Traffic on Lagopus @Makuhari
  • 77. 76Copyright©2017 NTT corp. All Rights Reserved. Big change happened Before After   vSwitch has lots of issues on performance, scalability, stability, ….. vSwitch works well without any trouble! Good performance, Good stability.
  • 78. 77Copyright©2017 NTT corp. All Rights Reserved.  The SDI special prize of the Best of Show Award at Interop Tokyo 2015  http://www.interop.jp/2015/english/exhibition/bsa.html  Finalist of the Best of Show Award (SDI category)  ShowNet demonstration award
  • 79. 78Copyright©2017 NTT corp. All Rights Reserved. DPDK-enabled SDN/NFV middleware with Lagopus & VNF with Vhost @Interop Tokyo 2016 This trial was collaboration with University of Tokyo and IPIfusion
  • 80. 79Copyright©2017 NTT corp. All Rights Reserved. NFV middleware for scale-out VNFs  Flexible load balancing for VNFs with smart hash calculation and flow direction  Hash calc: NetFPGA-SUME • Hash calculation using IP address pairs • Hash values are injected into the MAC src for flow direction to a VNF  Classification and flow direction: Lagopus • Flow direction with MAC src lookup HV VNF VNF VNF lagopus lagopus uplink downlink hash calc & mac rewrite MAC-based classification for VMs hash dl_src type1 52:54:00:00:00:01 type2 52:54:00:00:00:02 … … type256 52:54:00:00:00:FF
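In the demo the hash was computed on the NetFPGA-SUME; the same idea can be sketched in software (editor's illustration, helper name is ours): hash the IP address pair, take the result modulo the number of VNF types, and encode it in the MAC src so Lagopus can direct the flow with a plain exact-match lookup.

```python
import hashlib
import socket

def mac_for_flow(src_ip: str, dst_ip: str, n_vnfs: int = 256) -> str:
    """Hash the IP address pair and encode the bucket in the MAC src
    (52:54:00:00:00:XX), so the vSwitch steers the flow by dl_src match."""
    pair = socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
    h = int.from_bytes(hashlib.sha256(pair).digest()[:4], "big") % n_vnfs
    return "52:54:00:00:%02x:%02x" % (h >> 8, h & 0xFF)
```

Because the hash is deterministic, all packets of one flow land on the same VNF, while different flows spread across the 256 buckets in the table above.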
  • 81. 80Copyright©2017 NTT corp. All Rights Reserved. Bird in ShowNet  Two Lagopus deployments  NFV domain, SDN-IX https://www.facebook.com/interop.shownet
  • 82. 81Copyright©2017 NTT corp. All Rights Reserved. Challenges in Lagopus  vNIC between DPDK-enabled Lagopus and DPDK-enabled VNF (VirNOS)  Many vNICs and flow director (load balancing)  8 VNFs and total 18 vNICs HV VirNOS VirNOS VirNOS VirNOS lagopus lagopus port2 port4 port6 port8 port10 port9 port7 port5 port3 port1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1
  • 83. 82Copyright©2017 NTT corp. All Rights Reserved. Explicit resource assignment for performance  Packet processing workload aware assignment is required for Lagopus and VNF Memory Memory NIC core core core core core core core core core core core core core core core core CPU0 CPU1 Traffic
  • 84. 83Copyright©2017 NTT corp. All Rights Reserved. Resource assign impacts in packet processing performance Memory Memory NIC core core core core core core core core core core core core core core core core CPU0 CPU1 Traffic Lagopus 8 VNFs Memory Memory NIC core core core core core core core core core core core core core core core core CPU0 CPU1 Traffic Lagopus8 VNFs 10Gbps 4.4Gbps
  • 85. 84Copyright©2017 NTT corp. All Rights Reserved. HV VirNOS VirNOS VirNOS VirNOS lagopus lagopus port2 port4 port6 port8 port10 port9 port7 port5 port3 port1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1  DPDK-based systems need dedicated CPUs for I/O because network I/O in DPDK is polling-based  Physical I/O is more CPU-intensive than vNIC I/O CPU resource assignment for I/O (1/2) 84 10/4 Gbps 10Gbps
  • 86. 85Copyright©2017 NTT corp. All Rights Reserved. HV VirNOS VirNOS VirNOS VirNOS lagopus lagopus port2 port4 port6 port8 port10 port9 port7 port5 port3 port1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1 Eth0 Eth1  Traffic-path-aware CPU assignment  4 CPU cores were assigned to the I/O threads of Lagopus CPU resource assignment for I/O (2/2) 85 10Gbps 10Gbps 5Gbps 5Gbps
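In practice this pinning is expressed to DPDK's EAL as a hexadecimal core mask (the `-c` option), with the chosen cores on the NIC-local NUMA socket. A small helper (editor's sketch) showing how a core list maps to that mask:

```python
def coremask(cores) -> str:
    """Build the hex core mask DPDK's EAL expects (-c) from a list of core IDs."""
    mask = 0
    for c in cores:
        mask |= 1 << c  # one bit per logical core
    return hex(mask)

# I/O threads pinned to 4 cores on the NIC-local socket (cores 0-3 here)
print(coremask([0, 1, 2, 3]))  # → 0xf
```

Picking cores from the wrong socket forces every packet across the QPI link, which is exactly the 10 Gbps vs. 4.4 Gbps difference shown two slides back.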
  • 87. 86Copyright©2017 NTT corp. All Rights Reserved. Performance evaluation  Good performance and scalability  But a long packet journey  Packet-in -> Physical NIC -> Lagopus -> vNIC -> VNF -> vNIC -> Lagopus -> Physical NIC -> Packet-out [chart: traffic (Mbps) vs. packet size (bytes), Lagopus vs. wire rate]
  • 88. 87Copyright©2017 NTT corp. All Rights Reserved. Other trials
  • 89. 88Copyright©2017 NTT corp. All Rights Reserved.  Location-aware packet forwarding + Service Chain (NFV integration)  Location-aware transparent security check by NFV  Virtual Network  Intra network • Web service and clients • Malware site blocking  Lab network • Ixia tester for demo • Policy management (Explicit routing for TE) #1: Segment routing with Lagopus for campus network lago0-0 lago0-1 lago1-1 lago1-0 lago2-1 lago2-0 nfv0 win00 Serv01 Serv00 win01 p1 p1 vFW vFW web Untrusted server block service lago0 -0 lago0 -1 lago1 -1 lago1 -0 lago2 -1 lago2 -0 nfv0 Ixia p 3 vFW vFW Ixia
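Explicit routing as in the campus demo is commonly realized as a stack of segment labels; one concrete encoding is an MPLS label stack. The sketch below (editor's illustration of the generic header layout, not the demo's dataplane code) packs the 4-byte MPLS shim and pushes a segment list onto a payload:

```python
import struct

def mpls_header(label: int, tc: int = 0, s: int = 1, ttl: int = 64) -> bytes:
    """4-byte MPLS shim: label(20 bits) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)

def push_labels(payload: bytes, labels) -> bytes:
    """Segment list as a label stack: the last label pushed is outermost,
    and only the innermost entry carries the bottom-of-stack (S) bit."""
    stack = b""
    for i, lbl in enumerate(labels):
        stack = mpls_header(lbl, s=1 if i == 0 else 0) + stack
    return stack + payload
```

Each hop then pops the outermost label, which is how the explicit path (and the detour through the vFW in the service chain) is steered without per-flow state in the core.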
  • 90. 89Copyright©2017 NTT corp. All Rights Reserved.  Flexible video stream transmission for multiple sites and devices  Lagopus switch as a stream duplicator  Simultaneous 4K video streaming to 50 sites #2: transparent video stream duplication Encoder Decoder Live IP NW Cinema or public space
  • 91. 90Copyright©2017 NTT corp. All Rights Reserved. #3: Blackboard streaming without the teacher's shadow Human shadow transparent module Pattern A Pattern B  Realtime image processing with flow direction switching  Users can select modes with or without the teacher  No configuration change beyond video options Students cannot see due to the shadow; transparent processing helps students
  • 92. TM O3 project: providing flexible wide area networks with SDN This research was executed as part of the "Research and Development of Network Virtualization Technology" program commissioned by the Ministry of Internal Affairs and Communications. (FY2013-2016)
  • 93. TM Open Innovation over Network Platform • 3 kinds of Contributions for User-oriented SDN (1) Open development with OSS (2) Standardization of architecture and interface (3) Commercialization of new technologies Toward open User-oriented SDN ©O3 Project 92 (1) Open (2) Standardization (3) Commercialization
  • 94. TM • Open, Organic, Optima – Anyone, Anything, Anywhere – Neutrality & Efficiency for Resource, Performance, Reliability, …. – Multi-Layer, Multi-Provider, Multi-Service • User-oriented SDN for WAN – Softwarization: Unified Tools and Libraries – On-demand, Dynamic, Scalable, High-performance • Features – Object-defined Network Framework – SDN WAN Open Source Software – SDN Design & Operations Guideline • Accelerates – Service Innovation, Re-engineering, Business Eco-System The O3 Project Concept, Approach, & Goal ©O3 Project 93
  • 95. TM The O3 User-oriented SDN Architecture ©O3 Project 94 Path Nodes (Opt/Pkt Transport) Switch Nodes (Lagopus, OF) D-plane C-plane The D-plane consists of Switch and Path Nodes; Switch Nodes provide programmability, and Path Nodes provide various types of network resources. The Orchestrator & Controllers can create and configure virtual networks for SDN users, and enable customized control of individual D-planes. Virtual NW Virtual NW OTTs Carriers OTT-A Cnt. Appl. OTT-B Cnt. Appl. Controls on Virtual NW View from Virtual NW Network Orchestrator Switch Nodes (Lagopus, OF) Controller (Switch Nodes) Controller (Path Nodes) Controller (Switch Nodes) Common Control Framework SDN Nodes Multi-Layer, Multi-Domain Control
  • 96. TM • WAN experiments with Multi-vendor Equipment Proof-of-Concept: Physical Configuration ©O3 Project 95
  • 97. TM PoC on Multi-Layer & Domain Control ©O3 Project 96
  • 98. TM PoC on Network Visualization ©O3 Project 97 The Hands-on training for ASEAN Smart Network
  • 99. 98Copyright©2017 NTT corp. All Rights Reserved. Ryu SDN Framework http://osrg.github.io/ryu/
  • 100. 99Copyright©2017 NTT corp. All Rights Reserved.  OSS SDN Framework founded by NTT  Software for building SDN control planes agilely  Fully implemented in Python  Apache v2 license  More than 350 mailing list subscribers  Supporting the latest southbound protocols  OpenFlow 1.0, 1.2, 1.3, 1.4 (and Nicira extensions)  BGP  OF-Config 1.2  OVSDB JSON What's RYU?
  • 101. 100Copyright©2017 NTT corp. All Rights Reserved. Many users and more…
  • 102. 101Copyright©2017 NTT corp. All Rights Reserved.  Developed mainly for network operators  Not for vendors selling a specific hardware switch  Integration with existing networks  Gradually "SDN-ing" the existing networks Ryu development principles
  • 103. 102Copyright©2017 NTT corp. All Rights Reserved.  Your application is free from the OF wire format (and details like handshaking) What does 'supporting OpenFlow' mean? Python Object OF wire protocol Data Plane Ryu converts it Python Object OF wire Protocol Ryu generates Your application does something here
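To see what Ryu hides, it helps to look at the wire format it converts. Every OpenFlow message starts with the same 8-byte header defined by the spec: version (1B), type (1B), length (2B), xid (4B), all big-endian. This sketch parses it with plain `struct` (an editor's illustration of the layer Ryu handles for you, not Ryu's own code):

```python
import struct

# struct ofp_header: version, type, length, xid -- network byte order
OFP_HEADER = struct.Struct("!BBHI")

def parse_header(data: bytes) -> dict:
    """Unpack the common 8-byte OpenFlow header from raw wire bytes."""
    version, msg_type, length, xid = OFP_HEADER.unpack(data[:8])
    return {"version": version, "type": msg_type, "length": length, "xid": xid}

# A minimal OpenFlow 1.3 HELLO (type 0) as it travels on the wire
hello = OFP_HEADER.pack(0x04, 0, 8, 42)
print(parse_header(hello))  # → {'version': 4, 'type': 0, 'length': 8, 'xid': 42}
```

In a Ryu app you never touch these bytes: you receive and build Python objects, and the framework does this (de)serialization plus the HELLO handshake.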
  • 104. 103Copyright©2017 NTT corp. All Rights Reserved. Ryu development is automated github Push the new code Unit tests are executed Docker hub image is updated Ryu certification is executed on test lab Ryu certification site is updated You can update your Ryu environment with one command
  • 105. 104Copyright©2017 NTT corp. All Rights Reserved. Lessons learned
  • 106. 105Copyright©2017 NTT corp. All Rights Reserved.  What's OpenStack?  OSS for building IaaS  You can run lots of VMs  Many SDN solutions are supported  What does SDN mean for OpenStack?  The network for your VMs is separated from others  Virtual L2 network on top of an L3 network SDN in OpenStack
  • 107. 106Copyright©2017 NTT corp. All Rights Reserved.  Virtual L2 on tunnels (VXLAN, GRE, etc) Typical virtual L2 implementation OVS Agent Compute node VM VM OVS Agent Compute node VM VM OVS Agent Compute node VM VM OVS Agent Compute node VM VM Tunnel
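What keeps one tenant's virtual L2 separated from another's on the shared L3 underlay is a tenant identifier carried in the tunnel header; for VXLAN that is the 24-bit VNI. A sketch of the 8-byte VXLAN header layout per RFC 7348 (editor's illustration, not OVS code):

```python
import struct

VXLAN_FLAGS_VNI = 0x08  # "I" bit: the VNI field is valid (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """8-byte VXLAN header: flags(1) + reserved(3) + VNI(3) + reserved(1)."""
    return struct.pack("!B3s3sB", VXLAN_FLAGS_VNI, b"\x00" * 3,
                       vni.to_bytes(3, "big"), 0)

def vxlan_vni(header: bytes) -> int:
    """Recover the 24-bit tenant ID from a received header."""
    return int.from_bytes(header[4:7], "big")

hdr = vxlan_header(5001)
print(len(hdr), vxlan_vni(hdr))  # → 8 5001
```

Each tenant network gets its own VNI, so frames from different tenants can cross the same physical links without ever mixing.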
  • 108. 107Copyright©2017 NTT corp. All Rights Reserved. People advocated something like this Data Plane Data Plane Data Plane OpenFlow Controller OpenFlow Protocol Application Logic
  • 109. 108Copyright©2017 NTT corp. All Rights Reserved.  Same as other OpenFlow controllers  The controller is connected to all the OVSes Our first version of OpenStack integration Plugin Neutron Server Ryu OVS RYU Agent Compute node VM VM Custom REST API OpenFlow OVS Agent Compute node VM VM OVS Agent Compute node VM VM OpenStack REST API SDN Operational Intelligence
  • 110. 109Copyright©2017 NTT corp. All Rights Reserved.  Scalability  Availability What’s the problems?
  • 111. 110Copyright©2017 NTT corp. All Rights Reserved.  How many switches can a single controller handle?  Can it handle hundreds or thousands?  The controller does more than set up flows  Replying to ARP requests rather than flooding them to all compute nodes  Making OVS work as an L3 router rather than sending packets to a central router  You could add more here Scalability
  • 112. 111Copyright©2017 NTT corp. All Rights Reserved.  The death of the controller leads to the death of the whole cloud  No more network configuration Availability
  • 113. 112Copyright©2017 NTT corp. All Rights Reserved.  OFC on every compute node  One controller handles only one OVS Our second version (OFAgent driver) Neutron Server OVS RYU Agent Compute node VM VM OVS RYU Agent Compute node VM VM OVS RYU Agent (OFC) Compute node VM VM OpenStack standard RPC over queue system Released in Icehouse OpenStack REST API OpenFlow is used only inside a compute node • Scalable with the number of compute nodes • No single point of failure in OFAgent SDN Operational Intelligence
  • 114. 113Copyright©2017 NTT corp. All Rights Reserved.  Push more features to the edges  Distribute features  Place only the features you can't distribute (e.g. TE) on a central node  Loosely couple the central node and edges  Tight coupling doesn't scale (e.g. OpenFlow connections between a controller and switches)  Existing technology like queues works SDN deployment for scale
  • 115. 114Copyright©2017 NTT corp. All Rights Reserved.  NSA (National Security Agency) More users: Tracking network activities “The NSA is using NTT’s Ryu SDN controller. Larish says it’s a few thousand lines of Python code that’s easy to learn, understand, deploy and troubleshoot” http://www.networkworld.com/article/2937787/sdn/nsa-uses-openflow-for-tracking-its-network.html
  • 116. 115Copyright©2017 NTT corp. All Rights Reserved.  TouIX (IX in France) : Replacing expensive legacy switch with whitebox switch and Ryu More users: IX (Internet Exchange) “The deployment is leveraging Ryu, the NTT Labs open-source controller” http://finance.yahoo.com/news/pica8-powers-sdn-driven-internet-120000932.html
  • 117. Zebra 2.0 Open Source Routing Software
  • 118. Open Source Revisited • Apache License • Written from scratch in Go • Go routines & Go channels are used for multiplexing • Task Completion Model + Thread Model • Single SPF Engine for OSPFv2/OSPFv3/IS-IS • Forwarding Engine Abstraction for DPDK/OF-DPA • Configuration with Commit/Rollback • gRPC for Zebra control
  • 119. Architecture • Single Process/Multithread Architecture BGP OSPF RSVP-TE LDP FEA
  • 120. OpenConfigd • Commit & Rollback support in the configuration system • Configuration is defined by YANG • CLI, NetConf, and REST API are automatically generated • `confsh` - bash based CLI command • OpenConfig is fully supported
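The commit/rollback model above boils down to keeping a candidate configuration separate from the running one, with a history to back out of. A minimal sketch of that model (editor's illustration only, not OpenConfigd's actual API, which is gRPC-based):

```python
import copy

class ConfigStore:
    """Candidate/running configuration split with commit and rollback."""

    def __init__(self):
        self.running = {}     # what the device is actually using
        self.candidate = {}   # staged edits, invisible until commit
        self.history = []     # previous running configs, for rollback

    def set(self, path: str, value):
        self.candidate[path] = value

    def commit(self):
        self.history.append(copy.deepcopy(self.running))
        self.running = copy.deepcopy(self.candidate)

    def rollback(self):
        if self.history:
            self.running = self.history.pop()
            self.candidate = copy.deepcopy(self.running)
```

The practical payoff is that a bad change never half-applies: either the whole candidate becomes running, or one rollback restores the previous known-good state.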
  • 121. OpenConfigd Architecture • gRPC is used for transport • completion/show/config APIs between shell OpenConfigd Zebra 2.0 Lagopus confsh DB completion show config API API
  • 122. Forwarding Engine Abstraction • Various Forwarding Engine Exists Today • OS Forwarder Layer • DPDK • OF-DPA • FEA provides Common Layer for Forwarding Engine • FEA provides • Interface/Port Management • Bridge Management • Routing Table • ARP Table
  • 123. 122Copyright©2017 NTT corp. All Rights Reserved. Current limitations of software switches  # of OF apps is limited  Leverage Network OS for whitebox switches • OpenSnapRoute, Linux kernel, OpenSwitch  Hard to integrate with other management systems  OpenStack, libvirt,  NetFlow, sFlow, BGP extensions  OF pipeline does not cover all our requirements  Tunnel termination • IPsec, VxLAN,  Control packet escalation/injection  User-defined OAM functionality  Heavy packet processing for OpenFlow flow entries  Lookup, action, long pipeline  Balance of programmability and existing network protocol support  L2, L3 (IPv4, IPv6), GRE, VxLAN, MPLS  Hybrid traffic control
  • 124. 123Copyright©2017 NTT corp. All Rights Reserved.  Provide a router-aware programmable dataplane for network OS  Protocol-aware pipeline and APIs, OpenFlow  Integration with network OS  Support for existing forwarding & routing protocols (BGP, OSPF)  VPN framework over IP networks  IP as a transport protocol  VxLAN, GRE, IPsec tunnel support  Decouple OpenFlow semantics and wire protocol from the OpenFlow protocol  Provide gRPC switch control API Next major upgrade: Lagopus SDN switch router
  • 125. 124Copyright©2017 NTT corp. All Rights Reserved. Forwarding Engine Integration dataplane Lagopus BGP OSPF RSVP-TE LDP FEA Zebra 2.0 OpenConfigd DB Dataplane Manager Configuration datastore RIB/FIB control Interface/Port bridge mngmt Interface/Port bridge mngmt C-plane related packet User-traffic User-traffic C-plane related packet C-plane packet escalation via tap IF FIB ARP Stats DB C-plane packet escalation via tap IF
  • 126. 125Copyright©2017 NTT corp. All Rights Reserved.  The new version will be available this summer Availability?
  • 127. 126Copyright©2017 NTT corp. All Rights Reserved. Hirokazu Takahashi, Tomoya Hibi, Ichikawa, Masaru Oki, Motonori Hirano, Kiyoshi Imai, Takaya Hasegawa, Tomohiro Nakagawa, Koichi Shigihara, Keisuke Kosuga, Takanari Hayama, Tetsuya Mukawa, Saori Usami, Kunihiro Ishiguro Thanks to our development team
  • 128. 127Copyright©2017 NTT corp. All Rights Reserved.  Comments and collaboration are very welcome!  Web  https://lagopus.github.io  Github  Lagopus vswitch • https://github.com/lagopus/lagopus  Lagopus book • http://www.lagopus.org/lagopus-book/en/html/  Ryu with general tunnel ext • https://github.com/lagopus/ryu-lagopus-ext Conclusion
  • 129. 128Copyright©2017 NTT corp. All Rights Reserved. Questions?