VL2: A Scalable and Flexible
Data Center Network
Microsoft Research
Presented by: Ankita Mahajan
INTRODUCTION
• Cloud services need data centers with
hundreds of thousands of servers and that
concurrently support a large number of
distinct services.
• To be profitable, DC must achieve high
utilization, and key to this is agility — the
capacity to assign any server to any service.
• Agility promises: improved risk management
and cost savings.
Agility in today’s DCN
• Designs for today’s DCN prevent agility in
several ways:
• Oversubscription: existing architectures do not
provide enough capacity between the servers
they interconnect.
• Traffic flood in 1 service affects other services.
• Topologically significant IP addresses.
• Dividing servers among VLANs creates
fragmentation of the address space.
Objectives of building VL2
To overcome these limitations, we need a network with
the following objectives:
• Uniform high capacity: Hot-spot free
• Performance isolation
• Layer-2 semantics: just as if the servers were
on a LAN, it should be easy to assign any server
to any service and to configure that server with
whatever IP address the service expects.
Conventional DCN Architecture
• 20 to 40 servers per rack, each singly connected to
a Top of Rack (ToR) switch with a 1 Gbps link.
• ToRs connect to two aggregation switches for redundancy, and
these switches aggregate further, connecting to access routers.
• At the top, core routers carry traffic between access routers.
• All links use Ethernet as a physical-layer protocol.
• To limit overheads (e.g., packet flooding and ARP broadcasts) and to isolate
different services, servers are partitioned into virtual LANs (VLANs).
Limitations
Three fundamental limitations:
• Limited server-to-server capacity: ToRs are
1:5 to 1:20 oversubscribed, and paths through
the highest layer can be 1:240 oversubscribed.
• Fragmentation of resources: spare capacity is
reserved by individual services.
• Poor reliability and utilization: the resilience
model forces each device and link to run at
no more than 50% of its maximum utilization.
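To make the oversubscription ratios concrete, the sketch below computes the bandwidth left per server in the worst case, when every server below a 1:N oversubscribed layer sends through that layer at once. It assumes the 1 Gbps server links described for the conventional design; the function name is illustrative only.

```python
def worst_case_mbps(link_gbps: float, oversubscription: int) -> float:
    """Per-server bandwidth in Mbps when all servers under a 1:N
    oversubscribed layer send through that layer simultaneously."""
    if oversubscription < 1:
        raise ValueError("oversubscription ratio must be >= 1")
    return link_gbps * 1000 / oversubscription


# Ratios quoted on the slide: ToR level (1:5 to 1:20) and highest layer (1:240).
for ratio in (5, 20, 240):
    print(f"1:{ratio} -> {worst_case_mbps(1.0, ratio):.1f} Mbps per server")
```

At 1:240, a server with a nominal 1 Gbps link is left with only about 4 Mbps under full load, which is why placement of servers relative to each other matters so much in this design.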
Data-Center Traffic Analysis
1. The ratio of traffic volume between servers inside
data centers to traffic entering/leaving data
centers is currently around 4:1.
2. Data-center computation is focused where high-speed
access to data in memory or on disk is fast
and cheap.
3. The demand for bandwidth between servers
inside a data center is growing faster than the
demand for bandwidth to external hosts.
4. The network is a bottleneck to computation.
Flow Distribution Analysis:
Distribution of flow sizes:
• Similar to Internet traffic, 99% of flows are
smaller than 100 MB.
• But the distribution is simpler and more
uniform than that of Internet traffic.
• More than 90% of bytes are in flows between
100 MB and 1 GB.
Flow Distribution Analysis:
Number of Concurrent Flows:
• More than 50% of the time, an average machine has
about ten concurrent flows.
• At least 5% of the time it has more than 80
concurrent flows.
• We almost never see more than 100 concurrent flows.
Both of the flow distribution analyses above imply that VLB
will perform well on this traffic, since even big flows are
only 100 MB.
Adaptive routing schemes may be difficult to implement in
the data center, since any reactive traffic engineering will
need to run at least once a second if it wants to react to
individual flows.
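The "once a second" figure follows from simple arithmetic on flow size and link speed. The sketch below works it through under two assumptions that are not stated on the slide: the flow gets the full 1 Gbps server link to itself, and 1 MB means 10^6 bytes.

```python
def transfer_seconds(flow_megabytes: float, link_gbps: float) -> float:
    """Time to send a flow of the given size over a dedicated link.
    Assumes decimal units (1 MB = 10**6 bytes) and no protocol overhead."""
    bits = flow_megabytes * 8e6
    return bits / (link_gbps * 1e9)


# A "big" 100 MB flow on a 1 Gbps server link lasts under a second,
# so a traffic-engineering loop slower than that cannot react to it.
print(f"{transfer_seconds(100, 1.0):.2f} s")
```

A controller that recomputes routes every few seconds would therefore see most such flows start and finish between two of its decisions.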
Traffic Matrix Analysis
• Poor summarizability of traffic patterns:
Is there regularity in the traffic that might be
exploited through careful measurement and traffic
engineering?
TM(t)ij clustering: for a day's worth of traffic in the
data center, even when approximating with 50-60
clusters, the fitting error remains 60%.
• Instability of traffic patterns: how predictable is the
traffic in the next interval given the current traffic?
Failure Characteristics
1. Pattern of networking equipment failures:
• Most failures are small in size: 50% of network
device failures involve < 4 devices and 95% of
network device failures involve < 20 devices.
• Large correlated failures are rare: the largest
correlated failure involved 217 switches.
• Downtimes can be significant: 95% of failures are
resolved in 10 min, 98% in < 1 hr, 99.6% in < 1 day,
but 0.09% last > 10 days.
2. Impact of networking equipment failures:
• In 0.3% of failures, all redundant components in a
network device group became unavailable.
• The main causes of these downtimes are network
misconfigurations, firmware bugs, and faulty
components (e.g., ports).
• With no obvious way to eliminate all failures from
the top of the hierarchy, VL2's approach is to
broaden the topmost levels of the network so
that the impact of failures is muted and
performance degrades gracefully,
moving from 1:1 redundancy to n:m redundancy.
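The benefit of broadening the top of the network can be illustrated with a small calculation. The sketch below assumes identical devices with load spread evenly across them, which is a simplification; the function name is hypothetical.

```python
def capacity_after_failures(n_devices: int, failed: int) -> float:
    """Fraction of aggregate capacity that remains when `failed` of
    `n_devices` identical, evenly loaded devices are down."""
    if not 0 <= failed <= n_devices:
        raise ValueError("failed must be between 0 and n_devices")
    return (n_devices - failed) / n_devices


# 1:1 redundancy: a pair of devices, one fails -> half the capacity is gone.
print(capacity_after_failures(2, 1))
# A broader layer: one failure among ten devices removes only a tenth.
print(capacity_after_failures(10, 1))
```

Under these assumptions, the same fraction is also the utilization each device can safely run at while still absorbing a failure, so a wider layer both degrades more gently and wastes less headroom.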
Terminology
• Goodput: useful information delivered per
second to the application layer.
• VLB (Valiant Load Balancing): each server independently
picks a path at random through the network for each of the
flows it sends to other servers in the data
center.
• ECMP (Equal-Cost Multi-Path): distributes traffic across
equal-cost paths.
• anycast addresses for the Directory System
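The per-flow path choice described under VLB can be sketched as follows. This is an illustration, not VL2's implementation: it replaces the random draw with a hash of the flow identifier, which is how ECMP-style schemes typically keep all packets of one flow on one path. The switch names, the 5-tuple representation, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from typing import Sequence, Tuple

# Hypothetical flow identifier: src IP, src port, dst IP, dst port, protocol.
Flow = Tuple[str, int, str, int, str]


def pick_intermediate(flow: Flow, switches: Sequence[str]) -> str:
    """Map a flow to one intermediate switch.
    The same flow always maps to the same switch (no packet reordering),
    while distinct flows spread roughly evenly across all switches."""
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(switches)
    return switches[index]


switches = ["int-1", "int-2", "int-3", "int-4"]  # made-up switch names
flow_a = ("10.0.0.1", 40001, "10.0.1.9", 80, "tcp")
flow_b = ("10.0.0.1", 40002, "10.0.1.9", 80, "tcp")
print(pick_intermediate(flow_a, switches))
print(pick_intermediate(flow_b, switches))
```

Spreading at flow granularity rather than packet granularity trades a little balance for in-order delivery, which works when individual flows are small relative to link capacity, as the flow-size analysis suggests.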