Migration WG whose task it is to examine existing examples of SDN deployment, ideally those with the goal of full transition. Roughly, the idea is to examine the cases where this has been done and to gather best practices from those experiences.
The Charter specifies two migration approaches, depicted in Figure 1. The first approach is the more direct method of upgrading existing networking equipment with OpenFlow Agents and decommissioning the Control Machine in favor of OpenFlow Controllers and Configurators.
The second approach is phased, as illustrated in Figure 2: OpenFlow devices are deployed alongside existing devices, and network operations are maintained by both the existing Control Machine and the OpenFlow Controllers and Configurators. Once services have been migrated to the OpenFlow target network, the starting network is decommissioned.
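The decommissioning rule of the phased approach can be sketched in a few lines of Python. This is only an illustrative model, not part of the Charter: the `PhasedMigration` class and the service names are hypothetical, and a real migration would track far more state than a set of service names.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PhasedMigration:
    """Tracks which services still run on the starting (legacy) network."""
    starting: Set[str] = field(default_factory=set)  # services on the starting network
    target: Set[str] = field(default_factory=set)    # services on the OpenFlow target network

    def migrate(self, service: str) -> None:
        """Move one service from the starting network to the target network."""
        if service not in self.starting:
            raise KeyError(f"{service!r} is not on the starting network")
        self.starting.remove(service)
        self.target.add(service)

    def can_decommission_starting_network(self) -> bool:
        """The starting network may be retired only once it carries no services."""
        return not self.starting


plan = PhasedMigration(starting={"dns", "wifi-auth"})  # hypothetical service names
plan.migrate("dns")
print(plan.can_decommission_starting_network())  # False: "wifi-auth" is still on the starting network
plan.migrate("wifi-auth")
print(plan.can_decommission_starting_network())  # True
```

While any service remains on the starting network, both control systems stay in operation, which is the co-existence period described above.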
Legacy devices are traditional switches/routers with integrated control and forwarding planes. OpenFlow devices are switches with only OpenFlow forwarding planes, with the control plane residing external to the device. Hybrid OpenFlow Switches are devices with both legacy control and data planes and OpenFlow capabilities.
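The three device categories can be expressed as a small classification, for instance when building an inventory ahead of a migration. The sketch below is an assumption-laden illustration: the `Device` fields and the `classify` function are hypothetical names, and the two boolean capabilities are a simplification of the definitions above.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceKind(Enum):
    LEGACY = "legacy"
    OPENFLOW = "openflow"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class Device:
    name: str
    has_legacy_control_plane: bool  # integrated control and forwarding planes
    supports_openflow: bool         # forwarding plane driven by an external controller


def classify(device: Device) -> DeviceKind:
    """Map a device's capabilities onto the three categories defined in the text."""
    if device.has_legacy_control_plane and device.supports_openflow:
        return DeviceKind.HYBRID
    if device.supports_openflow:
        return DeviceKind.OPENFLOW
    if device.has_legacy_control_plane:
        return DeviceKind.LEGACY
    raise ValueError(f"{device.name}: neither legacy nor OpenFlow capable")


# Hypothetical inventory entries.
print(classify(Device("core-1", True, False)))  # DeviceKind.LEGACY
print(classify(Device("edge-7", True, True)))   # DeviceKind.HYBRID
```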
Campus Networks are typically composed of multiple buildings interconnected with a central operations center. Components of the campus network include a campus-wide backbone. An egress point to the Wide Area Network is typically associated with a datacenter of some description. Each building will typically have a wiring closet and, in many cases, additional networking/datacenter facilities, be they for different academic departments, administration facilities, or campus-wide IT resources.
Enterprise Datacenters can range in size, but are typically composed of networking resources used to interconnect various sub-networks of servers (physical or virtual) together with associated storage (e.g. NAS or SAN), security, and networking functions (e.g. WAN acceleration, load balancing, etc.). Requirements for software-defined networking can vary, but application-driven services rank high on the list.
Multi-Tenant Datacenters have benefitted greatly from software-defined networking. These datacenters share many aspects of the typical Enterprise Datacenter; however, multiple tenants must typically share the physical resources. Virtualization of computing resources is almost a necessity, with robust features such as Virtual Machine migration facilitating a variety of capabilities, including resource balancing, maintenance, and disaster recovery. Soft switches within the computing resources themselves are a dominant component of the architecture. The net effect is that portions of the datacenter move and change, demanding that the overlay network move and change to echo those changes. Increasingly, however, software-defined networking devices help address these requirements.
Service Provider/Wide Area Networks introduce significant diversity. Service providers' network architectures and requirements vary. For example, a Mobile Cellular Service Provider will have a radio network, along with a mobile backhaul network that hands off to an access network and ultimately a core network. Different applications of OpenFlow and SDN are being developed and deployed today. Service Providers, such as Google, are using OpenFlow to manage their inter-exchange resources and to ensure appropriate bandwidth is available at appropriate times. Many use cases are being developed by the industry, with software-defined solutions addressing Layer 0 through 7 network domains.
– Goal was to create a new environment (co-existence model) and let experimenters use it.
– Gradual migration of users to OpenFlow over a 2-year period (Jun 2009 to Jun 2011).
– Use of a variety of switches and controllers, including HP, NEC, Nicira, and BigSwitch.
– 3 types of networks: wireline, experimental, and wireless (ofwifi with 30 APs).
– Emphasis on VLAN configuration: make a new VLAN, migrate users to it, then introduce OpenFlow. Even so, some problems on a VLAN did take down the whole network.
– 25 wireline users, 77 wireless users, about 30 APs, on the order of 100 subnets.
– Flow setup time less than 100 ms.
– Experimental work included traffic engineering and scalability exercises.
– Use of many existing/custom-built tools, including probing tools and VM-based tools (list can be shared).
– No major issues with loops.
– 200-300 flows/second on the wireline network and about 700 flows/second on the wireless network.
– Traffic engineering algorithms were key to deployment (throughput and rate limits).
– 3 major types of tools: additional probes on switches (dummy machines), user-installed software, collection on controller, VM circulated to different campuses (further info can be shared).
– The same switch had OpenFlow and non-OpenFlow VLANs. Users were moved from one to the other on the same switch.
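The VLAN-based procedure in these notes (make a new VLAN, migrate users to it, then introduce OpenFlow, all on the same switch) can be sketched as follows. This is a toy model under stated assumptions: the `Vlan` and `Switch` classes, the VLAN IDs, and the user names are hypothetical and do not correspond to any vendor API or to the tools used in the deployment.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Vlan:
    vlan_id: int
    openflow_enabled: bool = False
    users: Set[str] = field(default_factory=set)


@dataclass
class Switch:
    """One physical switch carrying both OpenFlow and non-OpenFlow VLANs."""
    vlans: Dict[int, Vlan] = field(default_factory=dict)

    def create_vlan(self, vlan_id: int) -> Vlan:
        if vlan_id in self.vlans:
            raise ValueError(f"VLAN {vlan_id} already exists")
        self.vlans[vlan_id] = Vlan(vlan_id)
        return self.vlans[vlan_id]

    def move_user(self, user: str, src: int, dst: int) -> None:
        self.vlans[src].users.remove(user)  # KeyError if the user is not on src
        self.vlans[dst].users.add(user)


def migrate_users(switch: Switch, users: Iterable[str], legacy_id: int, new_id: int) -> None:
    # Step 1: make a new VLAN (still non-OpenFlow at this point).
    new_vlan = switch.create_vlan(new_id)
    # Step 2: migrate the chosen users to it.
    for user in users:
        switch.move_user(user, legacy_id, new_id)
    # Step 3: only then introduce OpenFlow on the new VLAN.
    new_vlan.openflow_enabled = True


sw = Switch()
sw.create_vlan(10).users.update({"alice", "bob", "carol"})  # hypothetical legacy VLAN
migrate_users(sw, ["alice", "bob"], legacy_id=10, new_id=20)
print(sorted(sw.vlans[20].users), sw.vlans[20].openflow_enabled)  # ['alice', 'bob'] True
print(sorted(sw.vlans[10].users), sw.vlans[10].openflow_enabled)  # ['carol'] False
```

Keeping the legacy VLAN untouched until users have moved limits the number of users exposed to a fault, although, as the notes point out, a problem on one VLAN can still affect the whole network.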
Manage the Risk in Deploying
Eventual Goal: Expand OF support to several other L2 VLANs and then interconnect them at an L3 router
Tool Requirements: oftrace, Wireshark dissector for OF, Mininet, ofrewind, Hassel and NetPlumber, ATPG
Gap Analysis:
– Add safeguards within switch firmware or the OF controller to automatically revert configurations
– Stronger interoperability between the OF network and the non-OF network
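The safeguard called for in the gap analysis, automatically reverting a configuration, could take the shape of an apply-check-revert step. The sketch below is one possible reading, not something the source specifies: `apply_with_auto_revert`, the dictionary-based configuration, and the `health_check` callback are all hypothetical.

```python
import copy
from typing import Callable, Dict

Config = Dict[str, str]


def apply_with_auto_revert(
    running: Config,
    candidate: Config,
    health_check: Callable[[Config], bool],
) -> bool:
    """Apply candidate in place; restore the previous config if the check fails."""
    snapshot = copy.deepcopy(running)          # last known-good configuration
    running.clear()
    running.update(copy.deepcopy(candidate))
    try:
        healthy = health_check(running)
    except Exception:
        healthy = False                        # a crashing check counts as a failure
    if not healthy:
        running.clear()
        running.update(snapshot)               # automatic revert
    return healthy


config = {"vlan10": "legacy"}
ok = apply_with_auto_revert(config, {"vlan10": "openflow"}, lambda cfg: False)
print(ok, config)  # False {'vlan10': 'legacy'}
```

In practice the health check might probe reachability for a fixed period after the change; here it is reduced to a callback so the revert logic stays visible.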
In the traditional BGP deployment models, the edge router maintains numerous BGP adjacencies as well as a large number of BGP routes/paths for multiple address families such as IPv4, IPv6, VPNv4, and VPNv6. In addition, to meet customer SLAs, the edge router may be configured with aggressive BGP session or Bidirectional Forwarding Detection (BFD) timers. Handling the BGP state machine, processing BGP updates as per configured policies, and calculating best paths for each address family puts a heavy load on the router. Additionally, by definition, service changes are quite frequent on the edge routers to provision new customers or update customer policies. Because of the limited resources, including CPU and memory, as well as the proprietary nature of the OS, service acceleration and innovation are dependent on the vendor implementation.
In the traditional deployment model, the Provider Edge (PE) router runs BGP with external BGP-speaking peers. In a typical service provider environment, it is not uncommon for an edge router to maintain 500K+ Internet and/or L3VPN routes. Besides external peerings, the edge router also maintains internal peering sessions, typically with dual Route Reflectors (RR), as depicted in Figure 19. All the BGP sessions as well as policies are typically configured manually using vendor-specific CLI.
– Data plane and BGP control plane tightly coupled.
– Hard to keep up with BGP control plane changes or additional features on vendor-specific OS and platforms.
– Puts extra load on the edge router’s control plane, which can lead to failures.
– BGP scale limited by the CPU/memory resources available on the edge router.
– Makes BGP configuration, management, monitoring, and troubleshooting difficult and complex, especially for large-scale deployments.
– Network operator spends a significant amount of time creating/maintaining BGP peering sessions and policies manually.
BGP Free Core is becoming popular among network operators who run some form of encapsulation in the core.
Motivations:
– Simplified core architecture
– Lower cost of core infrastructure
– Increase in core speed
– Simplified core management
– Better control on traffic patterns in the core
– Direct preparation for optical switching
Lessons learnt and deployment practices. These are high level and not comprehensive, but can provide some guidelines for others who are planning a similar journey. For example, the lack of fault-tolerant OpenFlow controllers can be mitigated by provisioning multiple OpenFlow controllers to provide redundancy. Similarly, the lack of a BGP relay agent on the OF-enabled device to replicate BGP sessions for resiliency in the BGP Free Edge use case, and the need for resiliency of the BGP route controller, can be addressed by deploying the controller across multiple VMs and multiple physical servers, similar to cloud infrastructure and NFV.
– More work needed on requirements such as resiliency and redundancy for fault-tolerant OpenFlow controllers
– Alternative options available to mitigate the resiliency concerns:
  – Deploy multiple OpenFlow controllers to provide redundancy
  – Deploy BGP controller across multiple VMs/multiple physical servers to avoid a single point of failure
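The two mitigations can be illustrated with a short sketch: choosing the first reachable controller from an ordered list, and checking that controller instances are not all placed on one physical server. Both functions, the controller names, and the liveness callback are hypothetical; this is not how any particular OpenFlow controller implements redundancy.

```python
from typing import Callable, Dict, Optional, Sequence


def select_controller(
    controllers: Sequence[str],
    is_alive: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable controller in preference order, or None."""
    for controller in controllers:
        if is_alive(controller):
            return controller
    return None


def spans_multiple_servers(placement: Dict[str, str]) -> bool:
    """placement maps a controller instance (VM) to its physical server.

    True when the instances sit on more than one physical server, so a
    single server failure cannot take out every instance.
    """
    return len(set(placement.values())) > 1


alive = {"ctl-a": False, "ctl-b": True}  # hypothetical liveness state
active = select_controller(["ctl-a", "ctl-b"], lambda c: alive[c])
print(active)  # ctl-b

print(spans_multiple_servers({"bgp-vm1": "server-1", "bgp-vm2": "server-2"}))  # True
print(spans_multiple_servers({"bgp-vm1": "server-1", "bgp-vm2": "server-1"}))  # False
```

The placement check reflects the point made above: spreading VMs across physical servers, not merely running several VMs, is what removes the single point of failure.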