This document discusses data center fabric architectures and HPE's approach. It begins with defining the goals of a data center fabric and how graph theory can help design efficient topologies. It then covers common fabric designs like CLOS/fat tree and discusses their advantages. The document presents HPE's flexible approach using software-defined networking and outlines options for building fabrics with layer 2, layer 3, or overlays. It also covers capabilities like LAN/SAN convergence, data center interconnect, and network virtualization. Finally, it introduces HPE's Altoline/OpenSwitch platform as an open network operating system.
Data Center Network Reference Architecture with HPE FlexFabric
1. #ATM16
Building the right
DC Fabric
Philippe Michelet, Senior Director, GPLM Data Center
HPE Aruba
March 2016
@ArubaNetworks |
2. 2#ATM16
Agenda
– Data Center Fabric: a Definition and an Introduction
– HPE Aruba Data Center Fabric: a Flexible Approach
– Layer 2 / Layer 3 / Overlay
– LAN/SAN Convergence
– In Between Multiple Data Centers: Data Center Interconnect
– The Foundation for Network Virtualization
– New! OpenSwitch on Altoline: the Most Open Network Operating System in the Industry
– Conclusion
4. 4#ATM16
Data Center Fabric
Definition/Goals
– How do you optimally interconnect 1,000, 10,000, 100,000 or more end points (servers, storage)?
– E/W traffic vs. N/S traffic
– Server-to-server traffic inside the DC
– Client-to-server traffic entering / server-to-client traffic leaving the DC
– Full bisection bandwidth
– Every end point has equal bandwidth (TX/RX) to every other peer in the fabric
– Minimizing the number of connections
– Cables/fiber/transceivers can represent 50% of the cost of a Data Center
– Minimizing the number of hops to reach any other peer in the fabric
– Latency impact
– Providing redundancy when any node or any link fails
– Failures will happen – it’s just a question of time
– Being flexible
– Can’t rewire a complete Data Center when going from 10,000 to 10,001 or 100,000 to 100,001 end points
5. 5#ATM16
How Graph Theory Can Help You...
Seven Bridges of Königsberg, Leonhard Euler, 1736
– What is the most efficient way to connect n nodes?
– Large scale/Tier 1: >100,000 nodes
– Typically using 32–48-port switches
– Choice of different topologies
–Full mesh
–Line (1-dimensional cube)
–Rings
–Cubes
–Etc.
– With metrics
–Switch radix
–Max hops
–Total switches
–Total ports
–Total links
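The trade-off these metrics capture can be sketched in a few lines. The formulas below are standard graph-theory results; the 48-node example and the function names are illustrative choices, not from the slide:

```python
# A minimal sketch comparing candidate topologies by the metrics above.
# Standard graph-theory formulas; the 48-switch radix is an assumption.

def full_mesh(n):
    """Every switch links directly to every other: 1 hop, n(n-1)/2 links."""
    return {"max_hops": 1, "total_links": n * (n - 1) // 2}

def line(n):
    """Switches in a row (1-dimensional cube): cheap, but up to n-1 hops."""
    return {"max_hops": n - 1, "total_links": n - 1}

def ring(n):
    """A line with the ends joined: one extra link halves the worst path."""
    return {"max_hops": n // 2, "total_links": n}

for name, topo in [("full mesh", full_mesh), ("line", line), ("ring", ring)]:
    m = topo(48)
    print(f"{name:9s} max_hops={m['max_hops']:2d} total_links={m['total_links']}")
```

Note how the full mesh minimizes hops but its link count grows quadratically with n – exactly why cabling can dominate cost, and why intermediate topologies such as trees are attractive.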
6. 6#ATM16
Standard Enterprise DC Deployment?
CLOS/Fat Tree
Note: Connectivity shown for some paths only, for clarity
– In a Fat Tree, there are as many edge ports as spine ports: NE = NS
– Now the most deployed solution, up to ~5000 edge ports
– Advantages:
– Non-blocking architecture
– Constant latency
– Fat Trees have constant bisection bandwidth
– Bisection bandwidth scales linearly with the number of nodes
– There are alternate paths for resiliency
– Formulas
FT(m, h): each node has m/2 children and m/2 parents
m = number of ports per switch
h = tree levels
machines = m * (m/2)^(h-1)
switches = (2h-1) * (m/2)^(h-1)
example: m = 24: h = 2 → 288 ports; h = 3 → 3456 ports
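As a sanity check, the FT(m, h) formulas above translate directly into code (a minimal sketch; the function names are mine):

```python
# Fat-tree sizing per the slide's formulas: m ports per switch, h tree
# levels, each internal node with m/2 child links and m/2 parent links.

def fat_tree_machines(m, h):
    """Number of edge (machine-facing) ports: m * (m/2)^(h-1)."""
    return m * (m // 2) ** (h - 1)

def fat_tree_switches(m, h):
    """Number of switches needed: (2h-1) * (m/2)^(h-1)."""
    return (2 * h - 1) * (m // 2) ** (h - 1)

for h in (2, 3):
    print(h, fat_tree_machines(24, h), fat_tree_switches(24, h))
# h=2: 288 edge ports, 36 switches; h=3: 3456 edge ports, 720 switches
```

Adding one tree level multiplies capacity by m/2, which is the "flexibility" goal from the earlier slide: you scale out by adding switches, not by rewiring the whole fabric.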
7. 7#ATM16
Example of a CLOS Network Design
One of the “Big Seven”
[Diagram: three-tier CLOS fabric – T1 leaf switches (ToR-1 … ToR-20), T2 aggregation switches, T3 spine switches]
9. 9#ATM16
So, which Data Center Fabric are you building in 2016?
There is not a single answer
[Diagram: three fabric options built from the same leaf (L) / router (R) building blocks × N – an L2 Fabric (Ethernet & VLANs), an L3 Fabric (IP routing domain), and an Overlay]
10. 10#ATM16
Why use Layer 2 in 2016?
– Legacy support
– It works … “If it ain’t broke, don’t fix it”
– Some customers just don’t have the bandwidth to make big changes
– Some applications can’t easily be rewritten, so Layer 2 extension remains a must-have
– And introducing overlays is a no-go considering the complexity (data, control and management planes are completely different)
– From a pure technical perspective, Layer 2 is not dead
– Standardization around distributed link aggregation is done – 802.1AX-2014 gives all vendors an opportunity to still use Layer 2 without any STP-like solution (or with STP kept as an “insurance” policy)
– HPE Aruba on the FlexFabric side is working on this implementation
– Proprietary implementations have been working for many years
– HPE Aruba Intelligent Resilient Fabric (IRF) – more than 5 years with the big chassis and ToR
– While TRILL remains a niche (supported by FlexFabric products), PBB/SPB provides an elegant alternative
– Compatible with all Ethernet L2 protocols
– Has native OAM (Operations, Administration & Maintenance) – something that overlays are struggling with
– Simpler in nature than new overlays like VXLAN
– Has been used in production very successfully by HP IT since 2011
11. 11#ATM16
Why use Layer 3 in 2016?
– Technically sounder than Layer 2
– No STP (always very complex to manage …)
– Fewer broadcast/security concerns
– Proven at scale / reduces the size of failure domains
– The choice made by the majority of big cloud providers
– Draft RFC on this subject (Facebook/Microsoft): “Use of BGP for Routing in Large-Scale Data Centers”
– Limited and controlled number of applications – different from the vast majority of Enterprise IT
– Simpler by nature if you know BGP
– Simpler control plane protocols
– eBGP (a different AS for each ToR) / ECMP (Equal Cost Multi-Path) / BFD (Bidirectional Forwarding Detection)
– The power of BGP, combined with ECMP for load distribution and BFD for protocol convergence acceleration (~200 ms)
– Considered more “secure”, even if more work will be required
– All protocols fully supported by the FlexFabric portfolio, and the foundation of the first release of OpenSwitch
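The eBGP/ECMP/BFD combination above can be sketched as a leaf-switch routing configuration. This is a hypothetical fragment in open-source FRRouting/Quagga style (not FlexFabric or OpenSwitch CLI); all AS numbers, addresses, and subnets are made up for illustration:

```
! Hypothetical leaf (ToR) switch: its own private AS, eBGP to two spines
router bgp 65101
 bgp router-id 10.0.0.101
 neighbor 10.1.1.0 remote-as 65000        ! spine 1 (eBGP, different AS)
 neighbor 10.1.2.0 remote-as 65000        ! spine 2
 neighbor 10.1.1.0 bfd                    ! BFD for ~200 ms failure detection
 neighbor 10.1.2.0 bfd
 address-family ipv4 unicast
  network 10.2.101.0/24                   ! server subnet behind this ToR
  maximum-paths 2                         ! ECMP load distribution over spines
```

Each ToR advertises only its local server subnet, so a failure domain is one rack; ECMP spreads flows over all spines, and BFD tears down a session long before the BGP hold timer would.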
12. 12#ATM16
Why use Overlays in 2016?
– Scalability – goes beyond the 802.1Q VLAN limitation (12 bits / 4096 IDs)
– Typically 16M services/tenants
– Essentially driven by VM mobility – L2 extension
– VXLAN as the de facto solution, pushed by VMware (NSX as part of their SDDC initiative)
– Encapsulation over IP – the ability to cross L3 boundaries
– The fabric becomes a big L3 domain with L2 processing (encapsulation/de-capsulation) at the edge (NIC or Leaf/ToR)
– Separation of the “underlay” (the L3 fabric per the previous slide) from the “overlay” (hypervisor/leaf/tunnel instantiation)
– All DC fabric vendors have an overlay solution today – including HPE Aruba with FlexFabric
– But keep in mind that careful attention is required
– A different data plane (additional header) makes Jumbo Frames a must-have, and it will continue to evolve …
– Standardization around the control plane is still work in progress (even if BGP EVPNs are here)
– Management is still a big issue – how do you quickly identify the root cause of a problem?
– Is it the underlay? Is it the overlay?
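The 16M-tenant figure and the Jumbo Frames remark both fall out of the VXLAN packet format. A minimal sketch, assuming the RFC 7348 header layout and an untagged-Ethernet/IPv4/UDP outer stack:

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348): I flag set, 24-bit VNI."""
    assert 0 <= vni < 2 ** 24            # 24-bit VNID -> ~16M tenants
    return struct.pack("!II", 0x08 << 24, vni << 8)

# Per-packet tunnel overhead on the wire (outer headers + VXLAN header):
OUTER_ETH, OUTER_IPV4, OUTER_UDP, VXLAN = 14, 20, 8, 8
OVERHEAD = OUTER_ETH + OUTER_IPV4 + OUTER_UDP + VXLAN   # 50 bytes

print(len(vxlan_header(5000)), OVERHEAD)  # 8 50
```

Fifty extra bytes per packet means a full-size 1500-byte tenant frame no longer fits in a standard Ethernet MTU – hence the requirement to run Jumbo Frames in the underlay.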
13. 13#ATM16
L2/L3/Overlays – Pros/Cons - My Perspective
Criterion | L2 | L3 | Overlays | Comments
Maturity | High/Very high | High/Very high | Low (VXLAN) | If the concept of overlay is not new, VXLAN is quite recent
Interoperability | Well understood | Well understood | Control plane still work in progress (BGP EVPN) |
Tenants (Scalability) | 4K – 802.1Q; 16M – PBB (802.1ah) | SW implementation (VRF/VPNs) | 16M – VXLAN/VNIDs | 24 bits / 16M tenants
Stability | Low – the key issue!!! | High | Jury is still out … |
Convergence time | 45 s (standard STP timers); 3–5 s (RSTP) | BFD ~200 ms (*); OSPF/BGP ~5 to 10 s | Real dependency on the underlay … |
Security (broadcast) | Low | Medium/High | Medium |
Multicast | Well understood | Well understood | A lot already done, but still work in progress |
OAM | Ethernet OAM | Protocol by protocol | Still many improvements required … – an area of innovation |
14. 14#ATM16
FlexFabric – What you change is the SW – Same HW
DC Network Design | Option 1 | Option 2 | Option 3
Traditional 3-Tier | IRF | MSTP | PVST/PVST+
Layer 2 Spine & Leaf | IRF | MSTP | PBB/SPB (IS-IS)
Layer 3 | BGP (v4/v6) | OSPF (v4/v6) | IS-IS (v4/v6)
L3 Overlay | IS-IS with centralized L3 GW | EVPN with centralized L3 GW | EVPN with distributed L3 GW
LAN/SAN Converged | Separate document available on hpe.com describing the use case
DC Interconnect | EVI (MAC over GRE) | EVI 2.0 (VXLAN/EVPN) | MPLS/VPLS
DCN with ToR (59xx) | Supported
OpenStack with ToR (59xx) | Supported
NSX-v with 59xx | Ongoing certification with VMware
15. 15#ATM16
HPE Data Center FlexFabric for Spine/Leaf Deployment
The industry’s best field tested and tried Ethernet fabric
Modular network OS with Intelligent Resilient Fabric
1/10/40GbE L2/L3 and converged switches
25GbE/100GbE (5950/Modular 5950)
[Diagram: spine/leaf fabric – core/spine switches 12900E, 7910, 7904; Top-of-Rack (ToR) leaf switches 5900, 5900CP, 5930/5940, Modular 5930/5940; HP Comware Network OS (L2/L3 IPv4/v6, MPLS/VPLS, VXLAN); HP IMC management; HP Technology Services and HP Consulting Services; SDN network virtualization]
16. 16#ATM16
HPE Data Center FlexFabric for LAN/SAN Convergence
Ideally positions existing FC/FCoE customers to transition to IP Storage
[Diagram: convergence stages – servers with separate NIC (Ethernet LAN) and HBA (native FC SAN) attached to separate Ethernet and FC switches; then servers with CNAs running FCoE into converged switches that feed both the Ethernet LAN and the FC SAN]
Before: multiple networks. Now: 50% CAPEX reduction, 66% OPEX reduction.
17. 17#ATM16
Data Center Interconnect and EVI
• HPE Ethernet Virtual Interconnect (EVI) can be deployed for active/active DCs over any network
• EVI provides additional benefits such as:
− Transport agnostic
− Up to 16 active/active DCs
− Active/active VRRP default gateways for VMs
− STP outages remain local to each DC
− Improves WAN utilization by dropping unknown frames and providing ARP suppression
[Diagram: two active Data Centers (DC 1 and DC 2), each with hypervisors hosting VMs on virtual overlay VXLAN tunnels over a physical L2 or L3 underlay network, interconnected across an L3 WAN by an EVI tunnel]
18. 18#ATM16
IMC Orchestration for Data Center
Orchestrate network fabrics
• Unified IRF/PBB/SPB/TRILL fabric management
• Manages across geographically dispersed locations (DCI/EVI)
• VMware vMotion playback
• Unified DCB, FCoE management
Complete the SDN architecture with management
• Configuration, monitoring & policy management for all SDN layers
• OpenFlow switch management
• SDN controller performance management
• One application for managing SDN and traditional environments
Accelerates deployment of services and applications
• “Just right” network services tuned to business requirements
• Simplifies provisioning, monitoring and troubleshooting of applications
• Eliminates manual provisioning of network service parameters
• Easy-to-use service modeling tool with drag-and-drop UI
22. 22#ATM16
Faster Time to Service – No Vendor Lock-in – Lower TCO
Agile and Scalable – Customer Choice – Affordable Capacity
HPE Altoline trusted open network switches solution
Accelerate disaggregation of cloud data center networks
• Optimized for scalable and agile cloud deployments
• Faster provisioning & time to service
• HPE worldwide service & support
• HPE Technology Services expertise
• HPE Altoline switches for open networking
• Open source, or commercial Linux OS license from HPE
• Global component buying options
• Global HPE support and services
• Direct sales and purchasing
• Lower CapEx and OpEx
• Open source, or commercial Linux tools and resources
• Consistent automation and SDN
23. 23#ATM16
HPE Altoline deployment models
Top-of-rack spine-leaf switches for cloud data centers
– HPE Altoline 6940 Spine ToR: 32 × 40GbE ports
– HPE Altoline 6920 Leaf ToR: 48 × 10GbE ports + 6 × 40GbE ports
[Diagram: Altoline 6940 spine switches and 6920 leaf ToR switches connecting racks of servers]
24. 24#ATM16
What is OpenSwitch? More details on openswitch.net
Community Based
• Launched with 8 charter contributors
• Over 90 non-HPE people / 30 companies on mailing lists
• Active weekly IRC chats
• Sample story:
• LinkedIn said on IRC that they would be interested, but wanted to see Ansible support
• Ansible jumped in, saying they want to help design the Ansible integration
• Several IRC chats and open email discussions since then
• Ansible is looking to use OpenSwitch as a template for native Ansible support
Open Source
• All HPE code for OpenSwitch is in publicly available Git, mirrored to GitHub
• All under Apache 2.0 (except some leveraged projects, e.g. Quagga)
• Anyone can download the source, tinker, and build for all supported platforms
• All HPE development for ToR is done upstream first
• Leverages Yocto build recipes and Linux menuconfig: build only the components you need
Full-Featured NOS
• L3: using Quagga, with significant investment to enhance it further – BGP, OSPF, …
• L2: open-sourcing internally developed protocols – MSTP, mLAG, …
• Classic management: CLI, SNMP
• GUI: Web UI
• Programmatic management: REST, Ansible, direct OVSDB, …
• Open vSwitch DB used for all state
• Highly available, per-service restartable
26. 26#ATM16
HPE Data Center Solutions – Built to Win
High-Performance DC Fabric | High Density / High Performance / Highly Scalable / Highly Resilient
Composable Infrastructure / SDN | Network Virtualization: “Instantiating open, complex networks and associated policies in minutes vs. weeks”
Zero Touch Provisioning – DC Fabric | IMC Platform (ZTP / Fabric Manager); DevOps (Python, Ansible …)
Hybrid Cloud Integration | Helion/OpenStack/CSA integration
27. 27#ATM16
Customers & Analysts trust us … Will you be next?
Solution: Entire HPE solution (Server/Storage/Networking/Technology Services)
“We chose HPE and got more than what we asked for. We wanted to standardize our
infrastructure and go with a single vendor to build our data center and reduce
management complexity.” Wahid S. Hammami, CIO, Ministry of Petroleum and
Mineral Resources
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA6-3550ENW&cc=us&lc=en
Solution: DC Core 12916
“We have the largest capacity flagship core switch HPE sells, and with that comes all
the flexibility we’ll ever need. With a 16-slot chassis, and 720 10Gb Ethernet ports,
it’s really a remarkable network core that will support whatever we want to do for the
next 10 years.” Bruce Allen, Director, Max Planck Institute for Gravitational Physics
http://h20195.www2.hp.com/V2/GetDocument.aspx?docname=4AA5-9943ENW
Solution: 2 data centers and launch new services in 8 months
“HP Networking solutions gave us the ability to rapidly scale our ShoreTel Sky
voice communications capacity from 130,000 users to more than a million users
in less than 8 months—and that’s a huge thing for us.” Dennis Schmidt, VP Network
Engineering, ShoreTel
More References here
28. 28#ATM16
Join Aruba’s Titans of Tomorrow
force in the fight against network
mayhem. Find out what your
IT superpower is.
Share your results with friends
and receive a free superpower
t-shirt.
www.arubatitans.com