SlideShare a Scribd company logo
1 of 31
A Scalable, Commodity Data
Center Network Architecture
Mohammad AI-Fares , Alexander Loukissas , Amin Vahdat
Overview
• Background of Current DCN Architectures
• Desired properties in a DC Architecture
• Fat tree based solution
• Evaluation
• Conclusion
Common DC Topology
Internet
Servers
Layer-2 switchAccess
Data Center
Layer-2/3 switchAggregation
Layer-3 routerCore
Background Cont.
 Oversubscription:
 Ratio of the worst-case achievable aggregate bandwidth among
the end hosts to the total bisection bandwidth of a particular
communication topology
(1:1 indicates that all hosts may communicate with any arbitrary
host at full BW. Ex) 1Gb/s for commodity Ethernet)
 Lower the total cost of the design
 Typical designs: factor of 2.5:1 (400 Mbps)to 8:1(125 Mbps)
 Cost:
 Edge: $7,000 for each 48-port GigE switch
 Aggregation and core: $700,000 for 128-port 10GigE switches
 Cabling costs are not considered!
Current DC Network Architectures
2High level choices for building comm fabric for large scale clusters:
 Leverages specialized hardware and communication protocols,
such as InfiniBand, Myrinet.
– These solutions can scale to clusters of thousands of nodes with high
bandwidth
– Expensive infrastructure, incompatible with TCP/IP applications
 Leverages commodity Ethernet switches and routers to
interconnect cluster machines
– Backwards compatible with existing infrastructures, low-cost
– Aggregate cluster bandwidth scales poorly with cluster size, and achieving
the highest levels of bandwidth incurs non-linear cost increase with cluster
size
Problem With common DC topology
• Single point of failure
• Over subscript of links higher up in the
topology
Properties of the solution(FAT)
• Backwards compatible with existing
infrastructure
– No changes in applications.
– Support of layer 2 (Ethernet)
• Cost effective
– Low power consumption & heat emission
– Cheap infrastructure
• Allows host communication at line speed.
Clos Networks/Fat-Trees
• Adopt a special instance of a Clos topology:
“K-ary FAT tree”
• Similar trends in telephone switches led to
designing a topology with high bandwidth by
interconnecting smaller commodity switches.
Fat-Tree Based DC Architecture
• Inter-connect racks (of servers) using a fat-tree topology
K-ary fat tree: three-layer topology (edge, aggregation and core)
– each pod consists of (k/2)2 servers & 2 layers of k/2 k-port switches
– each edge switch connects to k/2 servers & k/2 aggr. switches
– each aggr. switch connects to k/2 edge & k/2 core switches
– (k/2)2 core switches: each connects to k pods
Fat-tree with
K=4
Fat-Tree Based Topology
• Why Fat-Tree?
– Fat tree has identical bandwidth at any bisections
– Each layer has the same aggregated bandwidth
• Can be built using cheap devices with uniform capacity
– Each port supports same speed as end host
– All devices can transmit at line speed if packets are distributed uniform along
available paths
• Great scalability: k-port switch supports k3/4 servers/hosts:
(k/2 hosts/switch * k/2 switches/pod * k pods)
Fat tree network with K = 6 supporting 54 hosts
FATtree DC meets the following goals:
• 1. Scalable interconnection bandwidth:
• Achieve the full bisetion bandwidth of clusters consisting
of tens of thousands of nodes.
• 2. Backward compatibility. No changes to end
hosts.
• 3. Cost saving.
• Enforce a special (IP) addressing scheme in
DC
– unused.PodNumber.switchnumber.Endhost
– Allows host attached to same switch to route
only through that switch
– Allows inter-pod traffic to stay within pod
• Exising routing protocals such as OSPF,
OSPF-ECMP are unavailable.
– Concentrate on traffic.
– Avoid congestion
– Fast Switch must be able to recognize!
FAT-Tree (Addressing)
FAT-Tree (2-level Look-ups)
• Use two level look-ups to distribute traffic
and maintain packet ordering:
– First level is prefix lookup
• used to route down the topology to hosts.
– Second level is a suffix lookup
• used to route up towards core
• maintain packet ordering by using same ports for
same server.
• Diffuses and spreads out traffic.
More on Fat-Tree DC Architecture
Diffusion Optimizations (optional dynamic
routing techniques)
1. Flow classification, Define a flow as a sequence
of packets; pod switches forward subsequent packets
of the same flow to same outgoing port. And
periodically reassign a minimal number of output
ports
– Eliminates local congestion
– Assign traffic to ports on a per-flow basis
instead of a per-host basis, Ensure fair
distribution on flows
More on Fat-Tree DC Architecture
2. Flow scheduling, Pay attention to routing large flows,
edge switches detect any outgoing flow whose size grows
above a predefined threshold, and then send notification to a
central scheduler. The central scheduler tries to assign non-
conflicting paths for these large flows.
– Eliminates global congestion
– Prevent long lived flows from sharing the same
links
– Assign long lived flows to different links
Fault-Tolerance
In this scheme, each switch in the network maintains a BFD
(Bidirectional Forwarding Detection) session with each of its
neighbors to determine when a link or neighboring switch fails. 2
kinds of failures:
 Failure between upper layer and core switches
 Outgoing inter-pod traffic: local routing table marks
the affected link as unavailable and chooses another
core switch
 Incoming inter-pod traffic, core switch broadcasts a tag
to upper switches directly connected signifying its
inability to carry traffic to that entire pod, then upper
switches avoid that core switch when assigning flows
destined to that pod
Fault-Tolerance
• Failure b/w lower and upper layer switches:
– Outgoing inter- and intra pod traffic from lower-layer:
– the local flow classifier sets the cost to infinity and
does not assign it any new flows, chooses another
upper layer switch
– Intra-pod traffic using upper layer switch as intermediary:
– Switch broadcasts a tag notifying all lower level
switches, these would check when assigning new
flows and avoid it
– Inter-pod traffic coming into upper layer switch:
– Tag to all its core switches signifying its ability to carry
traffic, core switches mirror this tag to all upper layer
switches, then upper switches avoid affected core
switch when assigning new flaws
Packing
Drawback: No of cables
needed.
Proposed packaging
solution:
• Incremental modeling
• Simplifies cluster
mgmt.
• Reduces cable size
• Reduces cost.
Evaluation
• Benchmark suite of communication mappings
to evaluate the performance of the 4-port fat-
tree using the TwoLevelTable switches,
FlowClassifier ans the FlowScheduler and
compare to hirarchical tree with 3.6:1
oversubscription ratio
Results: Network Utilization
Results: Heat & Power Consumption
Conclusion
Bandwidth is the scalability bottleneck in large
scale clusters
Existing solutions are expensive and limit cluster
size
Fat-tree topology with scalable routing and
backward compatibility with TCP/IP and Ethernet
Large number of commodity switches have the
potential of displacing high end switches in DC
the same way clusters of commodity PCs have
displaced supercomputers for high end
computing environments
Critical Analysis of Fat-Tree Topology
9/22/2013 COMP6611A In-class Presentation 23
Draw Backs
• Scalability
• Unable to support different traffic types
efficiently
• Simple load balancing
• Specialized hardware
• No inherent support for VLan traffic
• Ignored connectivity to the Internet
• Waste of address space
9/22/2013 COMP6611A In-class Presentation 24
Scalability
9/22/2013 COMP6611A In-class Presentation 25
• The size of the data center is fixed.
– For commonly available 24, 36, 48 and 64 ports
commodity switches, such a Fat-tree structure is
limited to sizes 3456, 8192, 27648 and 65536,
respectively.
• The cost of incremental deployment is
significant.
Unable to support different traffic
types efficiently
9/22/2013 COMP6611A In-class Presentation 26
• The proposed structure is switch-
oriented, which might not be sufficient to
support 1-to-x traffic.
• A flow can be setup between 2 any pairs of
servers with the help of the switching
fabric, but it is difficult to setup 1-to-many and
1-to-all flows efficiently using the proposed
algorithms.
Specialized hardware
9/22/2013 COMP6611A In-class Presentation 27
• This architecture still relies on a customized
routing primitive that does not exist in
commodity switches.
• To implement the 2-table lookup, the lookup
routine in a router have to be modified.
– The authors achieves this by modifying a NetFPGA
router.
– This modification is yet to be incorporated into
current commodity switches.
No inherent support for VLan traffic
9/22/2013 COMP6611A In-class Presentation 28
• Agility
– The capacity to assign any server to any services
• Performance Isolation
– Services should not affected each othe
Waste of address space
9/22/2013 COMP6611A In-class Presentation 29
• Semantic information in the IP address of
servers and switches.
– e.g. 10.pod.switch.0/24, port
– A large portion of the IP address is wasted.
• NAT is required at the border.
Possible Future Research
9/22/2013 COMP6611A In-class Presentation 30
• Flexible incremental expandability in data
center networks.
• Support Valiant Load Balancing or other load
balancing techniques in the network.
• Reduce cabling, or increase tolerance to mis-
wiring
References:
• A scalable Commodity DCN Archi, dept. of CSE- Univ of California
• pages.cs.wisc.edu/~akella/CS740/F08/DataCenters.ppt
• http://networks.cs.northwestern.edu/EECS495-
s11/prez/DataCenters-defence.ppt
• University of Hong Kong:
http://www.cse.ust.hk/~kaichen/courses/spring2013/comp6611/

More Related Content

What's hot

Routing algorithm
Routing algorithmRouting algorithm
Routing algorithmBushra M
 
Hardware Software Codesign
Hardware Software CodesignHardware Software Codesign
Hardware Software Codesigndestruck
 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processingAcad
 
3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cflSampath Kumar S
 
Interfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorInterfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorVikas Gupta
 
Data Communication & Computer Networks : Serial and parellel transmission
Data Communication & Computer Networks : Serial and parellel transmissionData Communication & Computer Networks : Serial and parellel transmission
Data Communication & Computer Networks : Serial and parellel transmissionDr Rajiv Srivastava
 
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)Mohammad Ilyas Malik
 
Mobile transport layer - traditional TCP
Mobile transport layer - traditional TCPMobile transport layer - traditional TCP
Mobile transport layer - traditional TCPVishal Tandel
 
IEEE 802.11 Architecture and Services
IEEE 802.11 Architecture and ServicesIEEE 802.11 Architecture and Services
IEEE 802.11 Architecture and ServicesDhrumil Panchal
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInteX Research Lab
 
Task communication
Task communicationTask communication
Task communication1jayanti
 
3.8 quick sort
3.8 quick sort3.8 quick sort
3.8 quick sortKrish_ver2
 
Distributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsDistributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsSachin Chauhan
 
Reset Metastability Issues.pptx
Reset Metastability Issues.pptxReset Metastability Issues.pptx
Reset Metastability Issues.pptxssuserfb39fe
 
Fisheye State Routing (FSR) - Protocol Overview
Fisheye State Routing (FSR) - Protocol OverviewFisheye State Routing (FSR) - Protocol Overview
Fisheye State Routing (FSR) - Protocol OverviewYoav Francis
 
Issues in routing protocol
Issues in routing protocolIssues in routing protocol
Issues in routing protocolPradeep Kumar TS
 

What's hot (20)

Routing algorithm
Routing algorithmRouting algorithm
Routing algorithm
 
Hardware Software Codesign
Hardware Software CodesignHardware Software Codesign
Hardware Software Codesign
 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processing
 
Mobile Transport layer
Mobile Transport layerMobile Transport layer
Mobile Transport layer
 
Query processing
Query processingQuery processing
Query processing
 
3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl3.5 equivalence of pushdown automata and cfl
3.5 equivalence of pushdown automata and cfl
 
Interfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessorInterfacing memory with 8086 microprocessor
Interfacing memory with 8086 microprocessor
 
Data Communication & Computer Networks : Serial and parellel transmission
Data Communication & Computer Networks : Serial and parellel transmissionData Communication & Computer Networks : Serial and parellel transmission
Data Communication & Computer Networks : Serial and parellel transmission
 
Global state routing
Global state routingGlobal state routing
Global state routing
 
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)
Finite Automata: Deterministic And Non-deterministic Finite Automaton (DFA)
 
Mobile transport layer - traditional TCP
Mobile transport layer - traditional TCPMobile transport layer - traditional TCP
Mobile transport layer - traditional TCP
 
IEEE 802.11 Architecture and Services
IEEE 802.11 Architecture and ServicesIEEE 802.11 Architecture and Services
IEEE 802.11 Architecture and Services
 
Instruction pipeline: Computer Architecture
Instruction pipeline: Computer ArchitectureInstruction pipeline: Computer Architecture
Instruction pipeline: Computer Architecture
 
Task communication
Task communicationTask communication
Task communication
 
3.8 quick sort
3.8 quick sort3.8 quick sort
3.8 quick sort
 
Distributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit ProtocolsDistributed Transactions(flat and nested) and Atomic Commit Protocols
Distributed Transactions(flat and nested) and Atomic Commit Protocols
 
Parallel computing(1)
Parallel computing(1)Parallel computing(1)
Parallel computing(1)
 
Reset Metastability Issues.pptx
Reset Metastability Issues.pptxReset Metastability Issues.pptx
Reset Metastability Issues.pptx
 
Fisheye State Routing (FSR) - Protocol Overview
Fisheye State Routing (FSR) - Protocol OverviewFisheye State Routing (FSR) - Protocol Overview
Fisheye State Routing (FSR) - Protocol Overview
 
Issues in routing protocol
Issues in routing protocolIssues in routing protocol
Issues in routing protocol
 

Similar to A Scalable Commodity Data Center Network Architecture

Data center network architectures v1.3
Data center network architectures v1.3Data center network architectures v1.3
Data center network architectures v1.3Jeong, Wookjae
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureGunawan Jusuf
 
Dc ch08 : local area network overview
Dc ch08 : local area network overviewDc ch08 : local area network overview
Dc ch08 : local area network overviewSyaiful Ahdan
 
Experimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing ProtocolExperimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing Protocolsmita gupta
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptPatrick Theuri
 
Rapid Survey on Routing in Data Centers
Rapid Survey on Routing in Data CentersRapid Survey on Routing in Data Centers
Rapid Survey on Routing in Data CentersGhazal Tashakor
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptVimalMallick
 
A distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theA distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theKamal Spring
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices21viveksingh
 
1 networking devices 2014
1 networking devices 20141 networking devices 2014
1 networking devices 2014Zuhaib Zaroon
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithmRoshan Karunarathna
 
Acn Experiment No 2
Acn Experiment No 2Acn Experiment No 2
Acn Experiment No 2Garima Singh
 
Cloud interconnection networks basic .pptx
Cloud interconnection networks basic .pptxCloud interconnection networks basic .pptx
Cloud interconnection networks basic .pptxRahulBhole12
 
Networking devices
Networking devicesNetworking devices
Networking devicesrupinderj
 
equipment list.pdf
equipment list.pdfequipment list.pdf
equipment list.pdfngusyirga
 
Packet Switching Technique in Computer Network
Packet Switching Technique in Computer NetworkPacket Switching Technique in Computer Network
Packet Switching Technique in Computer NetworkNiharikaDubey17
 
ROUTING PROTOCOLS new.pptx
ROUTING PROTOCOLS new.pptxROUTING PROTOCOLS new.pptx
ROUTING PROTOCOLS new.pptxAayushMishra89
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platformsSyed Zaid Irshad
 

Similar to A Scalable Commodity Data Center Network Architecture (20)

Data center network architectures v1.3
Data center network architectures v1.3Data center network architectures v1.3
Data center network architectures v1.3
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
27 Switching.pptx
27 Switching.pptx27 Switching.pptx
27 Switching.pptx
 
Dc ch08 : local area network overview
Dc ch08 : local area network overviewDc ch08 : local area network overview
Dc ch08 : local area network overview
 
Experimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing ProtocolExperimental Analysis Of On Demand Routing Protocol
Experimental Analysis Of On Demand Routing Protocol
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
Rapid Survey on Routing in Data Centers
Rapid Survey on Routing in Data CentersRapid Survey on Routing in Data Centers
Rapid Survey on Routing in Data Centers
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
Network layer
Network layerNetwork layer
Network layer
 
A distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase theA distributed three hop routing protocol to increase the
A distributed three hop routing protocol to increase the
 
Networking and Internetworking Devices
Networking and Internetworking DevicesNetworking and Internetworking Devices
Networking and Internetworking Devices
 
1 networking devices 2014
1 networking devices 20141 networking devices 2014
1 networking devices 2014
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithm
 
Acn Experiment No 2
Acn Experiment No 2Acn Experiment No 2
Acn Experiment No 2
 
Cloud interconnection networks basic .pptx
Cloud interconnection networks basic .pptxCloud interconnection networks basic .pptx
Cloud interconnection networks basic .pptx
 
Networking devices
Networking devicesNetworking devices
Networking devices
 
equipment list.pdf
equipment list.pdfequipment list.pdf
equipment list.pdf
 
Packet Switching Technique in Computer Network
Packet Switching Technique in Computer NetworkPacket Switching Technique in Computer Network
Packet Switching Technique in Computer Network
 
ROUTING PROTOCOLS new.pptx
ROUTING PROTOCOLS new.pptxROUTING PROTOCOLS new.pptx
ROUTING PROTOCOLS new.pptx
 
Physical organization of parallel platforms
Physical organization of parallel platformsPhysical organization of parallel platforms
Physical organization of parallel platforms
 

More from Ankita Mahajan

Rest api standards and best practices
Rest api standards and best practicesRest api standards and best practices
Rest api standards and best practicesAnkita Mahajan
 
Understanding Goods & Services Tax (GST), India
Understanding Goods & Services Tax (GST), IndiaUnderstanding Goods & Services Tax (GST), India
Understanding Goods & Services Tax (GST), IndiaAnkita Mahajan
 
Introduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureIntroduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureAnkita Mahajan
 
Virtualization in 4-4 1-4 Data Center Network.
Virtualization in 4-4 1-4 Data Center Network.Virtualization in 4-4 1-4 Data Center Network.
Virtualization in 4-4 1-4 Data Center Network.Ankita Mahajan
 
IPv6: Internet Protocol version 6
IPv6: Internet Protocol version 6IPv6: Internet Protocol version 6
IPv6: Internet Protocol version 6Ankita Mahajan
 
Introduction to SDN: Software Defined Networking
Introduction to SDN: Software Defined NetworkingIntroduction to SDN: Software Defined Networking
Introduction to SDN: Software Defined NetworkingAnkita Mahajan
 
VL2: A scalable and flexible Data Center Network
VL2: A scalable and flexible Data Center NetworkVL2: A scalable and flexible Data Center Network
VL2: A scalable and flexible Data Center NetworkAnkita Mahajan
 

More from Ankita Mahajan (8)

Eye training
Eye trainingEye training
Eye training
 
Rest api standards and best practices
Rest api standards and best practicesRest api standards and best practices
Rest api standards and best practices
 
Understanding Goods & Services Tax (GST), India
Understanding Goods & Services Tax (GST), IndiaUnderstanding Goods & Services Tax (GST), India
Understanding Goods & Services Tax (GST), India
 
Introduction to Data Center Network Architecture
Introduction to Data Center Network ArchitectureIntroduction to Data Center Network Architecture
Introduction to Data Center Network Architecture
 
Virtualization in 4-4 1-4 Data Center Network.
Virtualization in 4-4 1-4 Data Center Network.Virtualization in 4-4 1-4 Data Center Network.
Virtualization in 4-4 1-4 Data Center Network.
 
IPv6: Internet Protocol version 6
IPv6: Internet Protocol version 6IPv6: Internet Protocol version 6
IPv6: Internet Protocol version 6
 
Introduction to SDN: Software Defined Networking
Introduction to SDN: Software Defined NetworkingIntroduction to SDN: Software Defined Networking
Introduction to SDN: Software Defined Networking
 
VL2: A scalable and flexible Data Center Network
VL2: A scalable and flexible Data Center NetworkVL2: A scalable and flexible Data Center Network
VL2: A scalable and flexible Data Center Network
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

A Scalable Commodity Data Center Network Architecture

  • 1. A Scalable, Commodity Data Center Network Architecture Mohammad AI-Fares , Alexander Loukissas , Amin Vahdat
  • 2. Overview • Background of Current DCN Architectures • Desired properties in a DC Architecture • Fat tree based solution • Evaluation • Conclusion
  • 3. Common DC Topology Internet Servers Layer-2 switchAccess Data Center Layer-2/3 switchAggregation Layer-3 routerCore
  • 4. Background Cont.  Oversubscription:  Ratio of the worst-case achievable aggregate bandwidth among the end hosts to the total bisection bandwidth of a particular communication topology (1:1 indicates that all hosts may communicate with any arbitrary host at full BW. Ex) 1Gb/s for commodity Ethernet)  Lower the total cost of the design  Typical designs: factor of 2.5:1 (400 Mbps)to 8:1(125 Mbps)  Cost:  Edge: $7,000 for each 48-port GigE switch  Aggregation and core: $700,000 for 128-port 10GigE switches  Cabling costs are not considered!
  • 5. Current DC Network Architectures 2High level choices for building comm fabric for large scale clusters:  Leverages specialized hardware and communication protocols, such as InfiniBand, Myrinet. – These solutions can scale to clusters of thousands of nodes with high bandwidth – Expensive infrastructure, incompatible with TCP/IP applications  Leverages commodity Ethernet switches and routers to interconnect cluster machines – Backwards compatible with existing infrastructures, low-cost – Aggregate cluster bandwidth scales poorly with cluster size, and achieving the highest levels of bandwidth incurs non-linear cost increase with cluster size
  • 6. Problem With common DC topology • Single point of failure • Over subscript of links higher up in the topology
  • 7. Properties of the solution(FAT) • Backwards compatible with existing infrastructure – No changes in applications. – Support of layer 2 (Ethernet) • Cost effective – Low power consumption & heat emission – Cheap infrastructure • Allows host communication at line speed.
  • 8. Clos Networks/Fat-Trees • Adopt a special instance of a Clos topology: “K-ary FAT tree” • Similar trends in telephone switches led to designing a topology with high bandwidth by interconnecting smaller commodity switches.
  • 9. Fat-Tree Based DC Architecture • Inter-connect racks (of servers) using a fat-tree topology K-ary fat tree: three-layer topology (edge, aggregation and core) – each pod consists of (k/2)2 servers & 2 layers of k/2 k-port switches – each edge switch connects to k/2 servers & k/2 aggr. switches – each aggr. switch connects to k/2 edge & k/2 core switches – (k/2)2 core switches: each connects to k pods Fat-tree with K=4
  • 10. Fat-Tree Based Topology • Why Fat-Tree? – Fat tree has identical bandwidth at any bisections – Each layer has the same aggregated bandwidth • Can be built using cheap devices with uniform capacity – Each port supports same speed as end host – All devices can transmit at line speed if packets are distributed uniform along available paths • Great scalability: k-port switch supports k3/4 servers/hosts: (k/2 hosts/switch * k/2 switches/pod * k pods) Fat tree network with K = 6 supporting 54 hosts
  • 11. FATtree DC meets the following goals: • 1. Scalable interconnection bandwidth: • Achieve the full bisetion bandwidth of clusters consisting of tens of thousands of nodes. • 2. Backward compatibility. No changes to end hosts. • 3. Cost saving.
  • 12. • Enforce a special (IP) addressing scheme in DC – unused.PodNumber.switchnumber.Endhost – Allows host attached to same switch to route only through that switch – Allows inter-pod traffic to stay within pod • Exising routing protocals such as OSPF, OSPF-ECMP are unavailable. – Concentrate on traffic. – Avoid congestion – Fast Switch must be able to recognize! FAT-Tree (Addressing)
  • 13. FAT-Tree (2-level Look-ups) • Use two level look-ups to distribute traffic and maintain packet ordering: – First level is prefix lookup • used to route down the topology to hosts. – Second level is a suffix lookup • used to route up towards core • maintain packet ordering by using same ports for same server. • Diffuses and spreads out traffic.
  • 14. More on Fat-Tree DC Architecture Diffusion Optimizations (optional dynamic routing techniques) 1. Flow classification, Define a flow as a sequence of packets; pod switches forward subsequent packets of the same flow to same outgoing port. And periodically reassign a minimal number of output ports – Eliminates local congestion – Assign traffic to ports on a per-flow basis instead of a per-host basis, Ensure fair distribution on flows
  • 15. More on Fat-Tree DC Architecture 2. Flow scheduling, Pay attention to routing large flows, edge switches detect any outgoing flow whose size grows above a predefined threshold, and then send notification to a central scheduler. The central scheduler tries to assign non- conflicting paths for these large flows. – Eliminates global congestion – Prevent long lived flows from sharing the same links – Assign long lived flows to different links
  • 16. Fault-Tolerance In this scheme, each switch in the network maintains a BFD (Bidirectional Forwarding Detection) session with each of its neighbors to determine when a link or neighboring switch fails. 2 kinds of failures:  Failure between upper layer and core switches  Outgoing inter-pod traffic: local routing table marks the affected link as unavailable and chooses another core switch  Incoming inter-pod traffic, core switch broadcasts a tag to upper switches directly connected signifying its inability to carry traffic to that entire pod, then upper switches avoid that core switch when assigning flows destined to that pod
  • 17. Fault-Tolerance • Failure b/w lower and upper layer switches: – Outgoing inter- and intra pod traffic from lower-layer: – the local flow classifier sets the cost to infinity and does not assign it any new flows, chooses another upper layer switch – Intra-pod traffic using upper layer switch as intermediary: – Switch broadcasts a tag notifying all lower level switches, these would check when assigning new flows and avoid it – Inter-pod traffic coming into upper layer switch: – Tag to all its core switches signifying its ability to carry traffic, core switches mirror this tag to all upper layer switches, then upper switches avoid affected core switch when assigning new flaws
  • 18. Packing Drawback: No of cables needed. Proposed packaging solution: • Incremental modeling • Simplifies cluster mgmt. • Reduces cable size • Reduces cost.
  • 19. Evaluation • Benchmark suite of communication mappings to evaluate the performance of the 4-port fat- tree using the TwoLevelTable switches, FlowClassifier ans the FlowScheduler and compare to hirarchical tree with 3.6:1 oversubscription ratio
  • 21. Results: Heat & Power Consumption
  • 22. Conclusion Bandwidth is the scalability bottleneck in large scale clusters Existing solutions are expensive and limit cluster size Fat-tree topology with scalable routing and backward compatibility with TCP/IP and Ethernet Large number of commodity switches have the potential of displacing high end switches in DC the same way clusters of commodity PCs have displaced supercomputers for high end computing environments
  • 23. Critical Analysis of Fat-Tree Topology 9/22/2013 COMP6611A In-class Presentation 23
  • 24. Draw Backs • Scalability • Unable to support different traffic types efficiently • Simple load balancing • Specialized hardware • No inherent support for VLan traffic • Ignored connectivity to the Internet • Waste of address space 9/22/2013 COMP6611A In-class Presentation 24
  • 25. Scalability 9/22/2013 COMP6611A In-class Presentation 25 • The size of the data center is fixed. – For commonly available 24, 36, 48 and 64 ports commodity switches, such a Fat-tree structure is limited to sizes 3456, 8192, 27648 and 65536, respectively. • The cost of incremental deployment is significant.
  • 26. Unable to support different traffic types efficiently 9/22/2013 COMP6611A In-class Presentation 26 • The proposed structure is switch- oriented, which might not be sufficient to support 1-to-x traffic. • A flow can be setup between 2 any pairs of servers with the help of the switching fabric, but it is difficult to setup 1-to-many and 1-to-all flows efficiently using the proposed algorithms.
  • 27. Specialized hardware 9/22/2013 COMP6611A In-class Presentation 27 • This architecture still relies on a customized routing primitive that does not exist in commodity switches. • To implement the 2-table lookup, the lookup routine in a router have to be modified. – The authors achieves this by modifying a NetFPGA router. – This modification is yet to be incorporated into current commodity switches.
  • 28. No inherent support for VLan traffic 9/22/2013 COMP6611A In-class Presentation 28 • Agility – The capacity to assign any server to any services • Performance Isolation – Services should not affected each othe
  • 29. Waste of address space 9/22/2013 COMP6611A In-class Presentation 29 • Semantic information in the IP address of servers and switches. – e.g. 10.pod.switch.0/24, port – A large portion of the IP address is wasted. • NAT is required at the border.
  • 30. Possible Future Research 9/22/2013 COMP6611A In-class Presentation 30 • Flexible incremental expandability in data center networks. • Support Valiant Load Balancing or other load balancing techniques in the network. • Reduce cabling, or increase tolerance to mis- wiring
  • 31. References: • A scalable Commodity DCN Archi, dept. of CSE- Univ of California • pages.cs.wisc.edu/~akella/CS740/F08/DataCenters.ppt • http://networks.cs.northwestern.edu/EECS495- s11/prez/DataCenters-defence.ppt • University of Hong Kong: http://www.cse.ust.hk/~kaichen/courses/spring2013/comp6611/