As OpenStack matures, more users move from "dipping a toe" to deploying at large scale, with thousands of nodes.
OpenStack networking has long been a limiting factor in scaling beyond a few hundred nodes, forcing users to split into cells, or to offload networking entirely to the underlay and forfeit the overlay network altogether.
Dragonflow is a fully distributed, open source SDN implementation of Neutron that handles large-scale deployments without splitting into cells.
In our testing, we were able to scale to 4,000+ controllers (each controller is typically deployed on a compute node) while maintaining the same performance we measured on a small 30-node environment.
3. Highlights from Mirantis Perf&Scale Test (Dec’16)
• MOS 9.0 with Mitaka-based Neutron
• 3 hardware labs were used for testing
• The largest lab included 378 nodes
• Line-rate throughput was achieved
• Over 24,500 VMs were launched on a 200-node lab
• …and yes, Neutron works at scale!
https://www.mirantis.com/blog/openstack-neutron-performance-and-scalability-testing-summary/
4. Highlights from Mirantis Perf&Scale Test (Dec’16)
Configuration
• ML2 OVS
• VxLAN/L2 POP
• DVR
Behavior
• ARP tables exploded at 16K VMs (had to be increased)
• RabbitMQ & Ceph broke at 20K VMs
• Services and agents broke at 24.5K VMs
• Integrity test: Successful
[Diagram: Heat Stacks 1–125, each with 196 VMs, a DVR router and a subnet, spread across Compute 1…Compute n (n ≤ 378).]
5. Is it enough?
Full OpenStack per ~400 servers
Max 24,500 VMs per OpenStack
8. The Solution (for Networking):
Separate "Reads" from "Updates"
• Add a scalable "Read Replica" of the Neutron DB
• Use a well-distributed, well-scaling DB (e.g. Redis)
Lean Distributed Control Plane
• Manage small (1) virtual switches in each controller
• Controller should be small (e.g. not OpenDaylight)
Distribute Policy (vs. Flows)
• Small footprint
• Grows with workload (not with infrastructure)
• Transformed to southbound at the edge
Distribute Network Services
• "Run at the edge"
• Suppress control messages from going out
• Leverage the "predefined" nature of the cloud environment
10. Distributed Network Services in Dragonflow
[Architecture diagram: the Neutron Server hosts the Neutron API, the Dragonflow ML2 driver (L2, SG, Trunk Port) and service plugins (Router, BGP, TAP, LBaaS, FW — new in Ocata), and writes to the Dragonflow Network DB through pluggable DB drivers (OVSDB, ETCD, Redis; future Pike+). On each compute node a Dragonflow Controller runs its applications (L2, L3, DHCP, VLAN, SG, LBaaS, Metadata, Flat Net, IGMP, ICMP, Remote Port, Dist. SNAT, Trunk Port, Active Port Detection, TAP, FW) above a pluggable DB layer with NB DB drivers (OVSDB, ETCD, Redis, ØMQ, Neutron DB), pub/sub drivers (ØMQ, Redis, ETCD) and SB DB drivers (OVSDB, smartNIC). The controller programs OVS — ovsdb-server and vswitchd in user space, the kernel datapath module and NIC in kernel space — via OpenFlow and OVSDB.]
12. Brief Overview (SNAT vs. DNAT)
[Diagram: with SNAT, VMs on the compute nodes (CNs) keep private addresses (10.1.11.5, 10.1.13.8, 10.1.7.7) and share one public address (21.3.5.5), translated at the WAN gateway. With DNAT, each VM is mapped to its own public address (21.3.5.5, 21.3.5.7, 21.3.8.7) at the WAN gateway.]
13. SNAT
Implemented in Neutron DVR
14. Distributed SNAT
Implemented in Dragonflow
[Diagram: traffic from VMs on the compute nodes passes some vRouters (NAT #1) and some WAN gateways (NAT #2) on the way to the Internet.]
15. Distributed SNAT
Implemented in Dragonflow
[Diagram: the compute node runs OVS and the Dragonflow Controller, whose L2, L3 and Dist. SNAT apps sit on an abstraction layer above the pluggable DB layer (distributed DB). VMs attach to br-int through qvoXXX ports, and the controller programs br-int via OpenFlow.]
1. VM sends a packet
2. Classify the flow as Internet-bound (i.e. not on any of the internal routed networks)
3. Apply the NAT function in OVS
4. Forward the packet towards the Internet
5. Possibly, the Internet gateway applies a second NAT to the packet
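Step 2 above (classifying a flow as Internet-bound) amounts to checking the destination against the set of internally routed prefixes. Here is a minimal Python sketch of that decision; the subnet list and function names are hypothetical, and in Dragonflow the real classification and rewrite are done by OpenFlow rules installed in br-int, not by Python on the data path:

```python
import ipaddress

# Hypothetical list of internally routed networks known to the L3 App.
INTERNAL_NETS = [
    ipaddress.ip_network("10.1.0.0/16"),
    ipaddress.ip_network("192.168.0.0/24"),
]

def is_internet_bound(dst_ip: str) -> bool:
    """True when the destination is on none of the internal routed
    networks, i.e. the packet should take the SNAT path."""
    dst = ipaddress.ip_address(dst_ip)
    return not any(dst in net for net in INTERNAL_NETS)

def snat(src_ip: str, external_ip: str, dst_ip: str) -> str:
    """Rewrite the source address for Internet-bound traffic only."""
    return external_ip if is_internet_bound(dst_ip) else src_ip
```

With the addresses from the SNAT/DNAT overview slide, a packet from 10.1.11.5 to a public host would leave with source 21.3.5.5, while east-west traffic to 10.1.13.8 keeps its private source address.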
17. DHCP Implementation in Neutron
[Diagram: the Neutron server communicates over the message queue with the DHCP agent on the network node; the agent runs a dnsmasq instance inside a DHCP namespace per network.]
Example
• 100 tenants
• 3 vNets / tenant
= 300 DHCP servers
18. Distributed DHCP
Implemented in Dragonflow
[Diagram: as on the SNAT slide, the compute node runs OVS and the Dragonflow Controller; here the DHCP App (alongside the L2 and L3 apps) acts as the DHCP server for the VMs on br-int.]
1. VM sends DHCP_DISCOVER
2. Classify the flow as DHCP, forward to the controller
3. DHCP App sends DHCP_OFFER back to the VM
4. VM sends DHCP_REQUEST
5. Classify the flow as DHCP, forward to the controller
6. DHCP App populates DHCP_OPTIONS from DB/CFG and sends DHCP_ACK
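The exchange above is the standard DHCP DORA handshake, answered locally by the controller instead of a central dnsmasq. A toy Python sketch of the DHCP App's decision logic follows; the per-port options table and the function name are hypothetical stand-ins for what the app would read from the distributed DB (the DB/CFG in step 6):

```python
# Hypothetical per-port DHCP configuration, keyed by client MAC,
# standing in for the distributed DB.
PORT_DB = {
    "fa:16:3e:aa:bb:cc": {
        "ip": "10.1.11.5",
        "options": {"router": "10.1.11.1", "dns": "10.1.0.2", "lease": 86400},
    },
}

def handle_dhcp(msg_type: str, client_mac: str) -> dict:
    """Answer DISCOVER with OFFER and REQUEST with ACK;
    other message types are not answered."""
    port = PORT_DB.get(client_mac)
    if port is None:
        return {"type": "NAK"}          # unknown port on this node
    reply = {"DISCOVER": "OFFER", "REQUEST": "ACK"}.get(msg_type)
    if reply is None:
        return {}                       # not a message we answer
    # Step 6: populate the options from the DB alongside the lease.
    return {"type": reply, "yiaddr": port["ip"], **port["options"]}
```

Because the lookup is per-port and local, the 300-dnsmasq example from the previous slide collapses into one app instance per compute node.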
20. Test Plan
1. Baseline Neutron
– Measure Neutron API-to-DB latency
2. Baseline Dragonflow
– Measure Dragonflow in a small environment (1 controller per compute node) – Total 33
3. 4K scale
– Measure Dragonflow in a large environment (130 controllers per compute node) – Total 4031
4. Baseline Redis
– Measure Redis in a large environment (130 agents per compute node) – Total 4031
21. Baseline Test
[Topology diagram: Servers 1–32 each run one Dragonflow local controller with its own OVS (OVS1…OVS32) and br-int; Server 32 also hosts the DF Server. Servers 33–38 run the Redis cluster: Redis 1–3 as master DBs, Redis 4–6 as replica DBs.]
22. 4K Scale
[Topology diagram: same Redis cluster on Servers 33–38 (3 masters, 3 replicas); Servers 1–31 each run 130 Dragonflow local controllers (Controller 1–130, each with its own br-int-1…br-int-130), for a total of 4030 DF local controllers; Server 32 hosts the DF Server.]
24. Benchmark Conclusions
• Dragonflow performance is consistent with scale
• Neutron performance needs to improve (profiling needed)
– Multiple scripts against a single Neutron improve throughput by 250% (from 1.06 subnets/sec to 2.63 subnets/sec)
• Current performance is production-ready
– Faster than VM spin-up
– Comparable to container spin-up
– Scale-agnostic
• Redis performance far exceeds the requirements
– ~177 top-level network events per second, fully synchronized to 4161 nodes
28. DB Consistency: A Common Problem for All SDN Solutions
[Architecture diagram: the CLI / Dashboard (Horizon) / Orchestration Tool (Heat) drive the Neutron and Nova APIs. neutron-server — the ML2 core plugin with ML2.Drivers.Mechanism.XXX and service plugins (L3 services: Load Balancer, Firewall, VPN) — owns the Neutron DB and talks over the AMQP message queue to the Neutron plugin agents on the Nova compute nodes and to the Neutron L3 and DHCP agents. A separate SDN controller with its own SDN DB and apps (Topology Mgr., Overlay Mgr., Security) exposes a north-bound interface (REST?) and programs the virtual switches south-bound via OpenFlow or a vendor-specific API. Network state thus lives in two databases that must be kept consistent.]
29. DB Consistency: A Common Problem for All SDN Solutions
Problem 1
• A Neutron DB transaction is committed, but the related operations on the SDN controller DB have failed
Problem 2
• Concurrent APIs cause multiple transactions on a given Neutron object. The Neutron DB deals with this very well thanks to its ACID nature. How about the SDN controller DB?
Problem 3
• Nested transactions can be done in the Neutron DB. How about the SDN controller DB?
Problem N…
31. Dragonflow Data System vs. Neutron
Neutron DB
• Relational database
• ACID system
• Stores the whole virtualized network topology for OpenStack
Dragonflow DB
• Key-value store
• BASE system
• Stores a 'partial' virtualized network topology used in Dragonflow
32. DB Consistency in Dragonflow
• Introduce a distributed lock for coordination
– Guarantees the atomicity of a given API
– Implemented in the Neutron core plugin layer
– A project-based lock allows concurrency
[Diagram: CLI / Dashboard (Horizon) / Orchestration Tool (Heat) → Neutron API → neutron-server; the ML2 Dragonflow driver obtains the distributed lock before updating the Neutron DB and the Dragonflow NB API / DB. Dragonflow's SDN apps (Topology Mgr., Overlay Mgr., Security) sit behind the north-bound interface and program the switches south-bound via OpenFlow.]
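A project-based lock like the one described above can be sketched as an acquire loop over an atomic set-if-absent operation in the shared store. This is a minimal illustration, not Dragonflow's actual implementation: the in-memory `FakeKVStore` stands in for the distributed DB, and the class names are hypothetical.

```python
import time
import uuid

class FakeKVStore:
    """Stand-in for the distributed DB: only atomic set-if-absent
    and delete-if-owner are needed for the lock."""
    def __init__(self):
        self._data = {}

    def set_if_absent(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def delete_if_equal(self, key, value):
        if self._data.get(key) == value:
            del self._data[key]
            return True
        return False

class ProjectLock:
    """One lock per project (tenant), so API calls for different
    projects can proceed concurrently."""
    def __init__(self, store, project_id):
        self._store = store
        self._key = "lock/" + project_id
        self._token = uuid.uuid4().hex   # proves ownership on release

    def __enter__(self):
        while not self._store.set_if_absent(self._key, self._token):
            time.sleep(0.01)             # spin until the holder releases
        return self

    def __exit__(self, *exc):
        self._store.delete_if_equal(self._key, self._token)

store = FakeKVStore()
with ProjectLock(store, "project-a"):
    # The Neutron DB transaction and the Dragonflow NB DB update go
    # here, atomic with respect to other API calls on project-a.
    pass
```

Keying the lock by project rather than using one global lock is what preserves concurrency across tenants while still serializing conflicting operations within one tenant.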
33. DB Consistency in Dragonflow
• Introduce an object-synchronization mechanism
– All objects stored in both databases are versioned
– Takes advantage of the CAS operations of the Dragonflow DB
– Syncs the object when something unexpected happens
[Diagram: a network row in the Neutron DB (Network_ID, Name, Status, MTU, VLAN, Availability Zone, Subnets) is mirrored in the SDN DB as an object with Object_ID = Network_ID and Version = 5. Writers read the object and then compare-and-swap on the version; a notify reaches the subscriber in the Dragonflow local controller on each compute node, which flushes the vSwitch flows.]
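The version-based synchronization above boils down to a compare-and-swap on the object's version: a write succeeds only if the version the writer read is still current, and a lost race means re-reading and retrying (or, if retries are exhausted, falling back to a full resync). A minimal sketch under those assumptions; the store and function names are hypothetical, not Dragonflow's API:

```python
class VersionedStore:
    """Stand-in for the Dragonflow DB: each key holds
    (version, value) and all updates go through CAS."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, value):
        version, _ = self.get(key)
        if version != expected_version:
            return False                 # lost a race to another writer
        self._data[key] = (version + 1, value)
        return True

def sync_network(store, network_id, new_value, retries=3):
    """Push a Neutron-side change into the SDN DB; on a lost race,
    retry from the freshly read version."""
    for _ in range(retries):
        version, _ = store.get(network_id)
        if store.compare_and_swap(network_id, version, new_value):
            return version + 1           # version now current in the DB
    # The "something unexpected happens" case: trigger a full resync.
    raise RuntimeError("resync needed")
```

The monotonically increasing version is what lets the subscriber side detect a stale or missed update and request the object again instead of silently diverging.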