2. Agenda
• Introduction
• Business Continuity Solutions
• DCI Solutions for LAN Extension
• OTV, EVPN and others
• Interconnecting Fabrics
• VXLAN (Stretched Fabric & Dual Fabrics)
• ACI (Stretched Fabric & Dual Fabrics)
• Key Takeaways
3. Network Evolution
Business drivers: 10G, then 40G & 100G, then DCI
Virtualization drives 10G to the edge
High-density 10G at the Edge; 40G & 100G in Core/Agg; Unified I/O & Fabric
Clustered Applications & Big Data Applications: East-West (E-W) traffic; non-blocking FabricPath / VXLAN / ACI; ECMP / predictable lower latency
Multi-Tenancy: Secure Segmentation; Resource Sharing
Business Continuity: Multi-Site DCI Extensions; Storage Extensions
Workload Mobility: More Virtual Workloads per Server; Large L2 Domains; Non-Disruptive Migration
Increased dependence on East-West traffic in the Data Center
Data Center Evolution: DC Fabrics are increasing in scale, extended across sites, and require more security
Low Cost, Standard Protocols, Open Architectures
Automated Policy-Driven Provisioning and Management
4. Business Continuity and Disaster Recovery
Ability to absorb the impact of a disaster and continue to provide an acceptable level of service.
“Disaster Avoidance (DA)”: planned or unplanned service continuity within the Metro, without interruption of services (including the loss of a single data center)
“Disaster Recovery (DR)”: loss of “regional data centers” leads to recovery in a remote data center
(Diagram: DC-1, DC-2 and DC-3; Metro-area pervasive data protection plus infrastructure rebalance)
Active-Active: Main/Active DCs, Business Continuity with no interruption
• Shorter distances
• Synchronous replication
• Low latency
• Distribute apps
• Hot live migration
• Active/Active DCs
• Integration of stateful devices
• Private cloud
• LAN extension
• RTO/RPO ≈ 0
Disaster-Recovery / Hybrid-Cloud:
• Long distances
• Beyond app latency
• Asynchronous replication
• Move apps, do not distribute them
• Cold migration: stateless services
• Public/hybrid cloud
• Subnet extension
• RTO > hours/days
• RPO > several seconds to minutes
5. Latency Considerations
Intrinsically, LAN extension technologies do not impose a latency limit between sites
However, some latency limits are imposed by DC technologies:
o Live Migration is in general limited to 10 ms maximum
o ACI Stretched Fabric is limited to 10 ms
Mostly, latency is imposed by data storage:
o Cisco I/O Acceleration improves latency by 50%
o EMC VPLEX Metro or NetApp MetroCluster (Active/Active) theoretically allow a maximum of 10 ms
o If replication is asynchronous, distance is “unlimited” (usually referred to as a “Geo cluster”)
Usually an A/A HA Stretched Cluster (Campus or Metro Cluster) allows a maximum of 10 ms
In addition, multi-tier applications cannot support high latency between tiers:
o Recommendation: move all tiers and storage for an application together (Host Affinity)
o Stateful devices imply return traffic to the owner of the stateful session
User-to-application traffic latency and bandwidth consumption can be mitigated using path optimization:
o LISP Mobility
o Host-route advertisement
6. Requirements for the Active-Active Metro Design – Hot Live Migration
Move Virtual Workload across Metro Data Centers while maintaining Stateful Services
Business Continuity Use Cases for Live Mobility
Most Business Critical Applications (Lowest RPO/RTO)
Stateful Live Workload Migrations
Operations Rebalancing / Maintenance / Consolidation of Live Workloads
Disaster Avoidance of Live Workloads
Application HA-Clusters spanning Metro DCs (<10 ms)
Hypervisor Tools for Live Mobility
VMware vMotion or Hyper-V Live Migration
Stretched HA-Clusters across Metro DCs (<10 ms)
Host Affinity rules to manage resource allocation
Distributed vCenter or System Center across Metro DCs
Metro DC Infrastructure to support Live Workload Mobility
Network: LAN Extension
Data Center Interconnect and Localized E-W traffic
Virtual Switches Distributed across Metro distances
Maintain Multi-Tenant Containers
Localized E-W traffic using distributed Default Gateway
Services: Maintain Stateful Services for active connections
Minimize traffic tromboning between Metro DCs
Compute: Support Single-Tier and Multi-Tier Applications
Storage: Shared Storage extended across Metro
Synchronous Data Replication
Distributed Virtual Volumes
Hyper-V Shared Nothing Live Migration (Storage agnostic)
Cisco Public
7. Business Continuity Use Cases for Cold Mobility
Less Business Critical Applications (Medium to High RPO/RTO)
Planned Workload Migrations of Stopped VMs
Operations Rebalancing / Maintenance / Consolidation of Stopped Workloads
Disaster Avoidance of Stopped Workloads
Disaster Recovery of Stopped Workloads
Hypervisor Tools for Cold Mobility
VMware Site Recovery Manager (SRM) or Hyper-V Failover Clustering
Geo-Clusters across A/A or A/S Geographically dispersed DCs
Host Affinity rules to manage resource allocation
Many-to-One Site Recovery Scenarios
VMDC Infrastructure to support Cold Workload Mobility
Network: Subnet Extension
Data Center Interconnect LAN Extension optional,
Localized N-S traffic using Ingress Path Optimization
Create new Multi-Tenant Containers
Cold migration across unlimited distances
Services: Service connections temporarily disrupted
New service containers created at new site
Traffic tromboning between DCs can be reduced
Compute: Support Single-Tier and Multi-Tier Applications
Storage: Asynchronous Data Replication to remote site (NetApp SnapMirror)
Hyper-V Replica Asynchronous Data Replication (Storage agnostic)
Virtual Volumes siloed to each DC
Requirements for Metro/Geo Data Centers – Cold Migration
Move a stopped virtual workload across Metro/Geo DCs, rebooting the machine at the new site
Subnet Extension
9. Scope of the L2 DCI Requirements
Must have:
Failure domain containment
Control-plane independence
o STP domain confined inside the DC
o EVPN multi-domains
Control-plane MAC learning
Reduce any flooding
Control the BUM* traffic
Site independence
Dual-homing with independent paths
Reduced hair-pinning
Distributed L2 gateway on the ToR
Localized E-W traffic
o FHRP isolation
o Anycast L3 default gateway
Fast convergence
Transport agnostic
Additional improvements:
ARP suppression
ARP caching
VLAN translation
IP Multicast or non-IP-Multicast transport choice
Multi-homing
Path diversity (VLAN-based / flow-based / IP-based)
Load balancing (A/S, VLAN-based, flow-based)
Localized N-S traffic (for long distances)
o Ingress path optimization (LISP)
o Works in conjunction with egress path optimization (FHRP localization, Anycast L3 gateway)
* Broadcast, Unknown Unicast and Multicast
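The Anycast L3 default gateway listed above is typically realized as the same SVI repeated on every leaf. A minimal NX-OS-style sketch (the VLAN, virtual MAC and addresses are illustrative assumptions, not from the deck):

```
! Hypothetical sketch: identical config on every leaf, so each site
! routes E-W traffic locally instead of tromboning to a remote gateway
feature interface-vlan
feature fabric forwarding
! One virtual MAC shared by all leafs for every anycast SVI
fabric forwarding anycast-gateway-mac 2020.0000.00aa
interface Vlan100
  no shutdown
  ip address 10.1.100.1/24          ! same gateway IP on every leaf
  fabric forwarding mode anycast-gateway
```

Because MAC and IP are identical everywhere, a migrated VM keeps its ARP entry valid at the new site.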
10. LAN Extension for DCI
Technology Selection
Ethernet (Metro style), over dark fiber or protected DWDM:
VSS & vPC: dual-site interconnection
FabricPath & VXLAN & ACI Stretched Fabric: multi-site interconnection
MPLS Transport (SP style):
EoMPLS: transparent point-to-point
VPLS: large scale & multi-tenant, point-to-multipoint
PBB-EVPN: large scale & multi-tenant, point-to-multipoint
IP Transport (IP style):
OTV: enterprise-style inter-site MAC routing
LISP: for subnet extension and path optimization
VXLAN/EVPN: emerging A/A site interconnect (Layer 2 only or with Anycast L3 gateway)
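For the dual-site Ethernet option above, back-to-back vPC between the two DCs is the classic design. A minimal NX-OS sketch for one switch of a vPC pair (domain IDs, port-channel numbers and addresses are illustrative assumptions):

```
! Hypothetical sketch: each site runs its own vPC domain; the DCI
! port-channel is a regular vPC toward the remote site's vPC pair
feature vpc
feature lacp
vpc domain 10
  peer-keepalive destination 192.168.0.2 source 192.168.0.1
interface port-channel1
  switchport mode trunk
  vpc peer-link                     ! intra-site peer link
interface port-channel20
  switchport mode trunk
  switchport trunk allowed vlan 100-150
  vpc 20                            ! back-to-back DCI link to the other DC
```

The remote site mirrors this with its own vPC domain, giving a fully redundant, STP-free dual-site interconnect.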
12. Traditional Layer 2 VPNs
Extending the Failure Domain
• Unknown unicast flooding is used to propagate MAC reachability
• The flooding domain is extended to every site
Our goal: provide Layer 2 connectivity, yet restrict the reach of the unknown unicast flooding domain in order to contain failures and preserve resiliency
(Diagram: MAC 1 traffic flooded from Site A propagates to Sites B and C)
13. Overlay Transport Virtualization
Technology Pillars
OTV is a “MAC in IP” technique to extend Layer 2 domains over any transport
Protocol Learning:
Built-in loop prevention
Preserves the failure boundary
Site independence
Automated multi-homing
Dynamic Encapsulation:
No pseudo-wire state maintenance
Optimal multicast replication
Multipoint connectivity
Point-to-cloud model
Platforms: Nexus 7000, first platform to support OTV (since NX-OS release 5.0); ASR 1000, now also supporting OTV (since XE release 3.5)
14. Overlay Transport Virtualization
OTV Terminology
Edge Device (ED): connects the site to the (WAN/MAN) core and is responsible for performing all the OTV functions
Internal Interfaces: L2 interfaces (usually 802.1Q trunks) of the ED that face the site
Join Interface: L3 interface of the ED that faces the core
Overlay Interface: logical multi-access, multicast-capable interface; it encapsulates Layer 2 frames in IP unicast or multicast headers
(Diagram: OTV ED with L2 Internal Interfaces toward the site and an L3 Join Interface toward the core, bound to the logical Overlay Interface)
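The terminology above maps directly onto the configuration. A minimal NX-OS sketch of one OTV edge device (interface numbers, VLAN ranges and multicast groups are illustrative assumptions):

```
! Hypothetical sketch of one OTV Edge Device
feature otv
interface Ethernet1/1               ! Join Interface (L3, faces the core)
  ip address 192.0.2.1/30
  ip igmp version 3                 ! needed to join the OTV control group
interface Overlay1                  ! logical Overlay Interface
  otv join-interface Ethernet1/1
  otv control-group 239.1.1.1       ! neighbor discovery and MAC advertisements
  otv data-group 232.1.1.0/28       ! multicast data traffic
  otv extend-vlan 100-150           ! VLANs carried over the overlay
  no shutdown
```

The Internal Interfaces need no OTV configuration: they are ordinary 802.1Q trunks carrying the extended VLANs.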
15. Overlay Transport Virtualization
OTV Control Plane
(Diagram: (1) the West ED learns new MACs (VLAN 100 MAC A, VLAN 100 MAC B, VLAN 300 MAC C); (2)-(3) OTV updates are exchanged via the L3 core; (4) the East and South sites install the remote entries, e.g. VLAN 100 / MAC A / IF = IP A)
• Neighbor discovery and adjacency over:
• Multicast (Nexus 7000 and ASR 1000)
• Unicast (Adjacency Server mode, currently available on Nexus 7000 from the 5.2 release)
• OTV proactively advertises/withdraws MAC reachability (control-plane learning)
• IS-IS is the OTV control protocol; no specific configuration required
16. Overlay Transport Virtualization
Inter-Site Packet Flow
(Diagram: (1) Server 1 in the West site sends a frame to MAC 3; (2) a Layer 2 lookup in the West ED MAC table returns IF = IP B for MAC 3, while local MACs 1 and 2 point at Ethernet ports; (3) the ED encapsulates the frame in an IP A to IP B header; (4) the packet crosses the transport infrastructure; (5) the East ED decapsulates it; (6) a Layer 2 lookup finds MAC 3 on a local Ethernet port; (7) the frame is delivered to Server 3)
17. OTV Failure Domain Isolation
Spanning-Tree Site Independence
Site transparency: no changes to the STP topology
Total isolation of the STP domain
Default behavior: no configuration is required
BPDUs are sent and received ONLY on Internal Interfaces
(Diagram: the BPDUs stop at the OTV EDs, at the L2/L3 boundary)
18. OTV Failure Domain Isolation
Preventing Unknown Unicast Storms
No requirement to forward unknown unicast frames
Assumption: end hosts are not silent or unidirectional
• Since 6.2(4), Selective Unicast Flooding
Default behavior: no configuration is required
(Diagram: a frame destined to MAC 3 is not flooded across the overlay, because there is no MAC 3 entry in the ED MAC table)
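For the silent-host corner case, Selective Unicast Flooding re-enables flooding for one specific destination only. A hedged NX-OS-style sketch (the MAC address and VLAN are illustrative assumptions):

```
! Hypothetical sketch: flood unknown unicast only toward one known
! silent host, instead of flooding every unknown destination
otv flood mac 0000.1234.5678 vlan 100
```

All other unknown unicast destinations remain dropped at the ED, preserving the failure-domain isolation.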
19. Authoritative ED Election
Site VLAN and Site Identifier
Fully automated multi-homing
Site Adjacency established across the site VLAN
Overlay Adjacency established via the Join Interface across the Layer 3 network
The OTV site VLAN is used to discover OTV neighbors in the same site
A single OTV device is elected as AED on a per-VLAN basis
The AED is responsible for:
o MAC address advertisement for its VLANs
o Forwarding its VLANs' traffic inside and outside the site
(Diagram: two EDs with Site-ID 1.1.1 exchange OTV Hellos over the site VLAN (Site Adjacency) and over the L3 core (Overlay Adjacency) to reach full adjacency; one becomes AED for the even VLANs, the other for the odd VLANs)
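The dual-homing behaviour above only requires the two EDs of a site to share the same site VLAN and site identifier; the even/odd AED split is then automatic. A hedged NX-OS sketch (values are illustrative assumptions), applied identically on both EDs:

```
! Hypothetical sketch: configured on BOTH OTV EDs of the same site
otv site-vlan 99             ! Site Adjacency is built over this VLAN
otv site-identifier 0x1      ! must match on both EDs of the site
! No AED configuration: per-VLAN AED election is fully automatic
```

A mismatched site identifier would make the EDs behave as two independent sites, so it must be unique per site but common per ED pair.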
20. OTV Fast Convergence
• AED Server: centralized model where a single edge device runs the AED election for each VLAN and assigns VLANs to edge devices
• Per-VLAN AED and backup AED assigned and advertised to all sites
• Fast remote convergence: on a remote AED failure, OTV routes are updated to the new AED immediately
• Fast failure detection: detect site-VLAN failures faster with BFD, and core failures with route tracking
Optimized local and remote convergence
21. Placement of the OTV Edge Device
OTV in the DC Aggregation:
L2-L3 boundary at the aggregation
DC Core performs only the L3 role
STP and L2 broadcast domains isolated between PODs
Intra-DC and inter-DC LAN extension provided by OTV
o Requires the deployment of dedicated OTV VDCs
Ideal for single-aggregation-block topologies
Recommended for greenfield deployments
o Nexus 7000 required in aggregation
OTV in the DC Core:
Easy deployment for brownfield
L2-L3 boundary in the DC core
DC Core devices perform L2, L3 and OTV functions
Leverage FabricPath between PODs
22. Placement of the OTV Edge Device
SVIs enabled on different platforms
The default gateways (SVIs) are distributed among the leafs (Anycast Gateway), or the firewalls host the default gateway
No SVIs at the Aggregation Layer or DCI Layer
No need for an OTV VDC
(Diagrams: a Core/Aggregation design with the default gateway on the firewalls and OTV at the aggregation; a Spine/Leaf design with an Anycast L3 gateway on the leafs and OTV in a dedicated DCI layer at the L2/L3 boundary of the border leafs)
23. DCI Convergence Summary
Robust HA is the guiding principle
Common failures (all converge sub-second):
1. Core failures: multipath routing (or TE FRR)
2. Join Interface failures: link aggregates across line cards
3. Internal Interface failures: multipath topology (vPC) & LAGs
4. ED component failures: HW/SW resiliency
Extreme failures (unlikely) trigger OTV reconvergence (< 5 s since release 6.2):
1x. Core partition
3x. Site partition
4x. Device down
(Diagram: East-A/East-B OTV VDCs above a vPC aggregation and access layer, joined to an OSPF core, with failure points 1-4 and 1x/3x/4x marked on the topology)
24. Additional Innovations with OTV
Selective Unknown Unicast Flooding (6.2.2)
Join Interface with Loopback Address (roadmap)
Tunnel Depolarization & Secondary IP (6.2.8)
VLAN Translation (6.2.2): direct translation or transit mode
Dedicated Broadcast Group (6.2.2)
OTV 2.5 VXLAN Encapsulation (7.2)
25. OTV Summary
Extensions over any transport (IP, MPLS)
Failure boundary preservation
Site independence
Optimal BW utilization (IP multicast)
Automated built-in multi-homing
End-to-end loop prevention
Operational simplicity: only 5 CLI commands
Improvements:
• Selective Unicast Flooding
• F-series internal interfaces
• Logical Source Interfaces w/ Multiple Uplinks*
• Dedicated distribution group for data broadcasting
• Tunnel depolarization
• VLAN translation
• VXLAN encapsulation
• Improved scale & convergence
28. What is a Fabric?
A new L3 switching model
Fat-tree architecture to maintain scalability with tens of thousands of servers
• Spine
• Evolution toward plain switching in the Spine
• Concept of Border-Spine to connect outside the Fabric
• Leaf
• Evolution toward enabling the L3 gateway at the Leaf
• Concept of Border-Leaf to connect to the outside world
Types of Fabrics:
• MAC-in-MAC encapsulation
• FabricPath
• MAC-in-IP encapsulation
• VXLAN
• ACI (Application Centric Infrastructure)
29. Stretched Fabric
Considerations
No clear boundary demarcation
Shared multicast domain
Gateway localization per site:
o Anycast gateway
o E-W traffic local routing
o N-S egress path optimization
o N-S ingress requires additional techniques (LISP, RHI, GSLB)
Hardware based:
o One global L3-only fabric
o Anycast VTEP L2 or L3 gateway distributed
o VXLAN EVPN (ToR)
o ACI (ToR)
o VLAN translation with local VLAN significance
(Diagram: a single fabric stretched across metro distance over dark fiber / DWDM)
30. Network Dual-Fabric
Considerations
DCI model with a dedicated DCI device: dual-site vPC over dark fiber/DWDM at metro distance, or OTV/EVPN over an L3 WAN at any distance
Failure domain isolation
E-W traffic localization:
o Distributed active L3 gateways
N-S traffic localization:
o Egress path optimization
o Ingress path optimization (LISP or IGP Assist)
o N-S traffic localization is a trade-off between efficiency (latency-sensitive applications) and operational complexity
Dual-homing
Flow control between sites
VLAN translation:
o per site, per ToR, per port
Unicast L3 WAN supported
Path diversity
32. VXLAN Stretched Fabric
From Transit Leaf Nodes
VXLAN/EVPN stretched fabric using Transit Leaf nodes:
o Host reachability information is distributed end-to-end
o The Transit Leaf (or Spine) node can be a pure Layer 3-only platform
o The data plane is stretched end-to-end (VXLAN tunnels are established from site to site)
When to use a VXLAN/EVPN stretched fabric?
o Across metro distances, private L3 DCI, IP multicast available end-to-end
o Currently up to 256 leaf nodes end-to-end
Why use it?
o VXLAN/EVPN intra-fabric within multiple greenfield DCs
What is the Cisco value?
o VXLAN EVPN MP-BGP control plane
o IRB symmetrical routing and Anycast L3 gateway
o Storm Control, BPDU Guard, HMM route tracking
o ARP suppression
o Bud-node support
(Diagrams: two variants of the stretched fabric, one interconnected via Transit Leaf nodes (the DCI leaf nodes do not necessarily terminate the overlay tunnel), one via Layer 3 Transit Spine nodes; in both, host reachability is end-to-end, traffic is encapsulated and de-encapsulated at each far-end side, and the sites run iBGP in AS 100 and AS 200 interconnected by eBGP)
33. VXLAN Stretched Fabric Design Considerations
Control-plane function delineation:
Underlay network (Layer 3): used to exchange VTEP reachability information; separate IGP areas, with Area 0 on the inter-site links
Overlay routing control plane: separate MP-iBGP domains (one AS per site) interconnected via MP-eBGP sessions
Data plane and host reachability information are end-to-end: VTEP tunnels are extended inside and across sites
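The overlay control-plane side of this split can be sketched on a leaf/VTEP. A hedged NX-OS-style sketch for one leaf of a site using AS 65100 (VNIs, groups, loopbacks and AS numbers are illustrative assumptions):

```
! Hypothetical sketch of one VTEP leaf in site AS 65100
nv overlay evpn
feature bgp
feature nv overlay
feature vn-segment-vlan-based
vlan 100
  vn-segment 10100                      ! map VLAN 100 to L2VNI 10100
interface nve1
  no shutdown
  source-interface loopback0            ! VTEP address, advertised by the underlay IGP
  host-reachability protocol bgp        ! EVPN control-plane learning
  member vni 10100
    mcast-group 239.1.1.100             ! BUM replication group
    suppress-arp
router bgp 65100
  neighbor 10.100.0.1 remote-as 65100   ! MP-iBGP to the site route reflector
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
```

The border/transit nodes would then peer MP-eBGP EVPN toward the remote site's AS, completing the iBGP-per-site / eBGP-between-sites design described above.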
35. ACI Multi-Fabric Design Options
Single APIC Cluster / Single Domain:
• Stretched Fabric: one ACI fabric spanning Site 1 and Site 2
• Multi-POD (Q2CY16): PODs 'A' and 'B' under one APIC cluster, interconnected by an IP network running MP-BGP EVPN
Multiple APIC Clusters / Multiple Domains:
• Dual-Fabric Connected: ACI Fabric 1 and ACI Fabric 2 with L2 and L3 extension
• Multi-Site (Future, Pre-CC): Site 'A' and Site 'B' interconnected with MP-BGP EVPN
36. Supported Distances and Interconnection Technologies
Dark Fiber
Transceivers (cable type SMF for all) and cable distance:
QSFP-40G-LR4: 10 km
QSFP-40GE-LR4: 10 km
QSFP-40G-LR4L: 2 km
QSFP-40G-ER4: 30 km in 1.0(4h) or earlier; 40 km in 1.1 and later
(Diagram: one ACI fabric with a 3-node APIC cluster stretched across DC Site 1 and DC Site 2 via transit leafs; vCenter server attached)
37. Stretched Fabric Future Enhancement
• QSA and 10G inter-site DWDM links
• Only supported on new platforms (-EX)
• Longer distance (10 ms RTT)
(Diagram: DC Site 1 (APIC nodes 1 and 3) and DC Site 2 (APIC node 2) interconnected by 10GE metro fiber at 10 ms RTT, with QSA adapters on the 40G fabric ports)
38. Supported Distances and Interconnection Technologies
Ethernet over MPLS (EoMPLS) for Speed Adaptation
Port-mode EoMPLS is used to stretch the ACI fabric over long distances:
o DC interconnect links can be 10G (minimum) or higher (100GE), with 40G facing the leafs/spines
o DWDM or dark fiber provides connectivity between the two sites (validated), or an MPLS core network with QoS for the control plane (not validated)
Tested and validated up to 10 ms RTT (public document *)
Other ports on the router are used for connecting to the WAN via L3Out
MACsec support on the DCI (DWDM between two ASR 9000s)
(Diagram: DC Site 1 (APIC nodes 1 and 3) and DC Site 2 (APIC node 2) connected by an EoMPLS pseudowire across the WAN; QSFP-40G-SR4 at 40G on the fabric side, 10G/40G/100G on the DCI side)
* http://www.cisco.com/c/en/us/td/docs/switches/datacenter/aci/apic/sw/kb/b_kb-aci-stretched-fabric.html
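The port-mode pseudowire above can be sketched on the DCI routers. A hedged IOS-XR-style sketch for one ASR 9000 (interface names, pseudowire id and neighbor address are illustrative assumptions):

```
! Hypothetical IOS-XR sketch: port-mode EoMPLS carrying the
! spine-facing 40G port transparently to the remote DCI router
interface FortyGigE0/0/0/0
 l2transport                    ! port facing the ACI spine/leaf
!
l2vpn
 xconnect group DCI
  p2p ACI-STRETCH
   interface FortyGigE0/0/0/0
   neighbor ipv4 192.0.2.2 pw-id 100   ! remote-site DCI router
```

Because the whole port is cross-connected, the ACI nodes see a plain point-to-point 40G link, regardless of the 10G/100G speed used across the WAN.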
40. ACI Multi-POD Solution
Details
Single APIC cluster:
• One APIC cluster manages all PODs, with a single namespace (VTEP address, VNID, class ID, GIPo, etc.)
• Host reachability is propagated via BGP EVPN and not exposed to the transit IP network
• The transit network is IP-based
• Ability to support advanced forwarding (direct ARP forwarding, no unknown unicast flooding)
• Support for multiple PODs
• Multicast in the inter-POD network
(Diagram: PODs 'A' and 'B' hosting Web/App and DB endpoints under a single APIC cluster, interconnected via MP-BGP EVPN; some items marked Future/Roadmap)
41. ACI Multi-POD Solution
WAN Connectivity
Each POD can have a dedicated connection to the WAN
Traditional L3Out configuration
Shared between tenants or dedicated per tenant (VRF-Lite)
VTEPs always select the local WAN connection based on the IS-IS metric
Requires an inbound path-optimization solution for achieving traffic symmetry
(Diagram: two PODs interconnected via MP-BGP EVPN, each with its own WAN connection)
42. ACI Multi-POD Solution
Topologies
• Two DC sites connected back-to-back: dark fiber/DWDM (up to 10 ms RTT), 40G/100G
• Intra-DC: POD 1 through POD n under one APIC cluster, 40G/100G
• Three DC sites: dark fiber/DWDM (up to 10 ms RTT), 40G/100G
• Multiple sites interconnected by a generic L3 network: 10G/40G/100G, speed agnostic
Target: up to 20 sites
44. Interconnecting Multiple Fabrics
Multiple Scenarios, Multiple Options
Multi-site models:
o Classical DC to Classical DC
o VXLAN Fabric to VXLAN Fabric
o ACI to ACI
o Any to Any
DCI models (2 sites):
o (Campus-Metro) Native Ethernet using dual-site vPC back-to-back (fiber/DWDM)
DCI models (2 or multiple sites):
o (Metro-Geo) MPLS-based using VPLS or PBB-EVPN
o (Metro-Geo) IP-based using OTV, VXLAN
Stretched Fabric (VXLAN, ACI)
Multi-Fabrics (VXLAN, ACI)
Localized E-W traffic (FHRP isolation, Anycast HSRP, Anycast L3 gateway)
Localized N-S traffic (LISP Mobility, IGP Assist, Host-Route Injection)