An in-depth critique of the existing OpenStack networking approach, with a focus on how the Nova network controller is more of a hindrance than a help. Discusses the functionality Quantum still needs in order to close the gap, and alternative solutions. How can we make networking in OpenStack robust, high-performance, and fault-tolerant? What do typical large-scale networks look like, and what lessons can we learn from them? Is there an approach to networking that works the same with a handful of servers as it does with hundreds of racks?
Networking is NOT Free: Lessons in Network Design
1. Networking is NOT Free:
Lessons In Network Design
Dan Sneddon
Member Technical Staff
Twitter: @dxs
Download: http://engineering.cloudscaling.com/portland13
CCA - NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses
2. Presenter Bio
• 20 years of network engineering and systems design
• Lead Global Network Engineer for Apple
• Network Security Architect for SLAC National Laboratory
• IT Architect for division of Schneider Electric
• Financial sector networking (banks and trading floors)
• Major startups, including Twitter
3. Our Journey Today
1. Datacenter Networking: Historical Perspective
2. Rise and Fall Of The VLANs
3. Networking At Cloud Scale
4. OpenStack Networking Models
5. Room For Improvement In OpenStack Networking
13. VLAN Pros and Cons
Pros:
• Provide a level of isolation
• Reduction in size of broadcast domain
• Manageable, up to a certain size (especially with VTP, etc)
Cons:
• Each VLAN can only reach other VLANs through routers
• Spanning-tree (when it breaks, everything breaks)
• 4096 VLAN limit (12-bit VLAN ID): assigning VLANs in blocks uses this up faster
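The arithmetic behind the VLAN limit can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the block sizes and the `max_tenants` helper are hypothetical, and the only facts relied on are that an 802.1Q VLAN ID is 12 bits wide and that IDs 0 and 4095 are reserved.

```python
# Rough arithmetic for the 802.1Q VLAN ID space.
VLAN_ID_BITS = 12
TOTAL_IDS = 2 ** VLAN_ID_BITS                 # 4096 possible values
RESERVED_IDS = {0, 4095}                      # reserved by 802.1Q
USABLE_IDS = TOTAL_IDS - len(RESERVED_IDS)    # 4094 usable VLANs


def max_tenants(block_size: int, usable: int = USABLE_IDS) -> int:
    """Hypothetical helper: tenants that fit if each gets a fixed block of VLAN IDs."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return usable // block_size


if __name__ == "__main__":
    # Block sizes are illustrative assumptions only.
    for block in (1, 8, 16, 64):
        print(f"{block:>2} VLANs per tenant -> at most {max_tenants(block)} tenants")
```

Handing out even modest blocks shrinks the tenant ceiling quickly, which is the point of the last bullet.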
14. Death Of the VLANs
VLANs Only Scale So Far
• In the late 2000s, high-density (1U) servers become standard
• There is no way to make spanned VLANs work for many thousands of servers
• A new model takes over: small layer 2 domains with layer 3 routing
15. Breaking Through The Scale Barrier
VLANs Only Scale So Far
VLAN Locally, Route Globally
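One way to picture "VLAN locally, route globally" is an address plan in which every rack is its own small layer 2 domain with its own routed subnet. The sketch below is an assumption-laden illustration, not the presenter's design: the `plan_racks` helper, the 10.0.0.0/16 supernet, the /24-per-rack size, and the choice of the first address as gateway are all hypothetical.

```python
import ipaddress


def plan_racks(supernet: str, rack_prefix: int, racks: int):
    """Carve one routed subnet per rack out of a larger supernet (hypothetical helper)."""
    net = ipaddress.ip_network(supernet)
    available = 2 ** (rack_prefix - net.prefixlen)
    if racks > available:
        raise ValueError(f"{supernet} only holds {available} /{rack_prefix} subnets")
    plan = []
    for rack_id, subnet in zip(range(racks), net.subnets(new_prefix=rack_prefix)):
        plan.append({
            "rack": rack_id,
            "subnet": str(subnet),
            # Assumption: the top-of-rack router takes the first address.
            "gateway": str(subnet.network_address + 1),
        })
    return plan


if __name__ == "__main__":
    for entry in plan_racks("10.0.0.0/16", 24, 4):
        print(entry)
```

Each layer 2 domain stays rack-sized, and growth means adding another routed subnet rather than stretching a VLAN across the datacenter.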
18. Two Cloud Infrastructure Models
1. Enterprise Virtualization: legacy apps
2. Elastic Infrastructure: new, dynamic apps
19. Elastic Cloud vs. Enterprise Virtualization
                       Enterprise Virtualization    Elastic Cloud
Applications           Traditional & Legacy         Dynamic
Scaling Architecture   Managed Silos                Horizontal
Technology Stack       Heavy & Proprietary          Distributed & Open
Price/Performance      Low                          High (4-7x better)
Failure Domains        Large                        Small
Provisioning           Slower & Manual              Faster & 100% API
Best For               Server consolidation and     On-demand, scale-out
                       lower datacenter mgmt costs  infrastructure for new apps
20. Nova-Network
Classic OpenStack Networking, With That Old-Timey Feel
4 Modes: Flat, Flat DHCP, VlanManager, FlatDHCP Multi-host HA
• Flat/Flat DHCP only support a single VLAN for everything
• VlanManager is the most feature-rich for multi-tenant
• VlanManager requires trunking all VLANs down to each host
• In a public cloud, the maximum of 4096 VLANs limits the number of tenants
21. OCS Nova-Networking L3 Plugin
Cloudscaling Exclusive Solution
• Layer 3 networking for VMs, with DHCP and NAT service
• Each VM is on its own Linux bridge, no shared layer 2
• Quantum not required
• DHCP service is local to each compute host
• AWS-like: floating IPs, elastic netblocks, and now VPC
22. Brokerless Messaging With ZeroMQ
Avoiding RabbitMQ’s Single Point Of Failure
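[Diagram: RabbitMQ vs. ZeroMQ]
• RabbitMQ (brokered): Nova-Compute, Nova-Scheduler, and Nova-API all exchange messages through the RabbitMQ broker, a single point of failure
• ZeroMQ (peer to peer): Nova-Compute, Nova-Scheduler, and Nova-API exchange messages directly, with no broker in the path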
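The failure-domain difference between the two topologies can be shown with a toy reachability model. This is only a sketch under stated assumptions: it does not use RabbitMQ or ZeroMQ at all, the service names and helper functions are hypothetical, and it models nothing beyond "which services can still talk after one node fails".

```python
from itertools import combinations

# Hypothetical service names, for illustration only.
SERVICES = ["nova-api", "nova-scheduler", "nova-compute-1", "nova-compute-2"]


def brokered_links(services, broker="broker"):
    """Star topology: every service talks only to the central broker."""
    return {frozenset((s, broker)) for s in services}


def peer_links(services):
    """Full mesh: every service can talk directly to every other service."""
    return {frozenset(pair) for pair in combinations(services, 2)}


def reachable_pairs(services, links, failed):
    """Count service pairs that can still exchange messages after `failed` goes down."""
    adjacency = {}
    for link in links:
        if failed in link:
            continue
        a, b = tuple(link)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    def connected(src, dst):
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            for nxt in adjacency.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    alive = [s for s in services if s != failed]
    return sum(connected(a, b) for a, b in combinations(alive, 2))


if __name__ == "__main__":
    print("broker down, brokered:", reachable_pairs(SERVICES, brokered_links(SERVICES), "broker"))
    print("one peer down, mesh:  ", reachable_pairs(SERVICES, peer_links(SERVICES), "nova-api"))
```

In this model, losing the broker cuts off every pair of services, while losing any one peer in the mesh only removes that peer.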
24. OpenStack Networking
APIs For All Your Networking Things
• “Quantum” is now known as “OpenStack Networking”
• Pluggable architecture, with APIs for all network functionality
• Basic L3 plugin (finally!), but designed for L3 on flat L2 network
• nova-network process still performs some very basic functions
• Some plugins are more complete/stable than others
25. OpenStack Networking
[Architecture diagram: OpenStack Network Service]
• Components shown: Horizon, Nova, Keystone, Ceilometer, Quantum API service (with Quantum plugin), DHCP agent, Quantum DB, Quantum agent(s) on the hypervisor of each compute node, virtual network plugin, SDN provider solution, network plugin, physical hardware
• Connection labels shown: REST over HTTP(S), REST, RPC, SQL, notifications, and "Varies" (plugin-specific)
26. OpenStack Networking Modes
• VLAN networks are supported using provider network plugins
• Layer 3 plugin
• GRE tunnel support using virtual network plugins
• May be used with Linux Namespaces to isolate tenants from one another within a hypervisor
• Many commercial vendor plugins
27. Quantum Compatibility
Lots Of Choices For Virtual Network/SDN Providers
• Open vSwitch. http://www.openvswitch.org/openstack/documentation
• Nicira NVP. quantum/plugins/nicira/nicira_nvp_plugin/README and http://www.nicira.com/support
• Midokura. http://www.midokura.com/midonet/openstack/
• BigSwitch. http://www.bigswitch.com/sites/default/files/sdn_resources/openstack_aag.pdf
• Cisco. quantum/plugins/cisco/README and http://wiki.openstack.org/cisco-quantum
• Linux Bridge. quantum/plugins/linuxbridge/README and http://wiki.openstack.org/Quantum-Linux-Bridge-Plugin
• Ryu. quantum/plugins/ryu/README and http://www.osrg.net/ryu/using_with_openstack.html
• NEC OpenFlow. http://wiki.openstack.org/Quantum-NEC-OpenFlow-Plugin
29. Default Layer 3 Design
OpenStack Networking Won’t Magically Configure Routing
[Diagram: VLANs]
* Diagram taken from OpenStack Networking official documentation
30. Gaps In Functionality
• VLAN networks are still problematic; Quantum doesn’t fix that
• Layer 3 network plugin still gets deployed on shared layer 2
• Dynamic routing protocols are not supported by the L3 plugin
• Overlay networks are great, unless something goes wrong: GRE tunnels are hard to troubleshoot, and we need tooling and diagnostics
• Load-balancer-, firewall-, and VPN-as-a-service are still in the design phase and may not be production-ready until the I or J release
31. How Can We Make Things Better?
There Are Plenty Of Ways To Contribute
• Further work needed on the “metaplugin” that allows more than one plugin simultaneously
• ZeroMQ support (there are known problems with DHCP, etc.)
• Better high-availability, including active-active DHCP
• Better support for custom tenant networks with overlapping IPs
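To make the overlapping-IP problem in the last bullet concrete, here is a minimal sketch using Python's standard `ipaddress` module. The tenant names, address ranges, and both helper functions are hypothetical; the sketch only shows why an address alone is ambiguous and must be scoped by tenant (for example, by a per-tenant namespace).

```python
import ipaddress

# Hypothetical tenant networks; two tenants picked the same private range.
tenant_networks = {
    "tenant-a": ipaddress.ip_network("10.0.0.0/24"),
    "tenant-b": ipaddress.ip_network("10.0.0.0/24"),
    "tenant-c": ipaddress.ip_network("192.168.1.0/24"),
}


def find_overlaps(networks):
    """Return sorted pairs of tenant names whose networks overlap."""
    names = sorted(networks)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if networks[a].overlaps(networks[b])
    ]


def scoped_address(tenant, address):
    """Key an address by tenant so identical IPs in different tenants stay distinct."""
    return (tenant, ipaddress.ip_address(address))


if __name__ == "__main__":
    print(find_overlaps(tenant_networks))
    print(scoped_address("tenant-a", "10.0.0.5") == scoped_address("tenant-b", "10.0.0.5"))
```

Any table keyed on the bare IP would conflate tenant-a and tenant-b here; keying on the (tenant, address) pair keeps them apart.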